WebVTT API

Web Video Text Tracks (WebVTT) are text tracks providing specific text "cues" that are time-aligned with other media, such as video or audio tracks. The WebVTT API provides functionality to define and manipulate these text tracks. It is primarily used for displaying subtitles or captions that overlay video content, but it has other uses: providing chapter information for easier navigation, and carrying generic metadata that needs to be time-aligned with audio or video content.

Concepts and usage

A text track is a container for time-aligned text data that can be played in parallel with a video or audio track to provide a translation, transcription, or overview of the content. A video or audio media element may define tracks of different kinds or in different languages, allowing users to display appropriate tracks based on their preferences or needs.

The different kinds of text data that can be specified are listed below. Note that browsers do not necessarily support all kinds of text tracks.

  • subtitles provide a textual translation of spoken dialog. This is the default type of text track, and if used, the source language must be specified.
  • captions provide a transcription of spoken text, and may include information about other audio such as music or background noise. They are intended for hearing impaired users.
  • chapters provide high-level navigation information, allowing users to more easily switch to relevant content.
  • metadata is used for any other kinds of time-aligned information.

The individual time-aligned units of text data within a track are referred to as "cues". Each cue has a start time, end time, and textual payload. It may also have "cue settings", which affect its display region, position, alignment, and/or size. Lastly, a cue may have a label, which can be used to select it for CSS styling.

A text track and cues can be defined in a file using the WebVTT File Format, and then associated with a particular <video> element using the <track> element.
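As a rough illustration of what such a file contains, the sketch below serializes a few cues into WebVTT text. The helper names formatTimestamp and toWebVTT, the shape of the cue objects, and the sample cue values are assumptions made for this example; they are not part of the WebVTT API, and the sketch does not validate the cue payload.

```javascript
// Hypothetical helpers (not part of the WebVTT API): build the text of a
// WebVTT file from plain objects.

// Convert a number of seconds to a WebVTT timestamp of the form hh:mm:ss.ttt.
function formatTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Each cue is { start, end, text } with optional `id` and `settings` strings.
function toWebVTT(cues) {
  const blocks = cues.map((cue) => {
    const lines = [];
    if (cue.id) lines.push(cue.id); // optional label line
    const timing = `${formatTimestamp(cue.start)} --> ${formatTimestamp(cue.end)}`;
    lines.push(cue.settings ? `${timing} ${cue.settings}` : timing);
    lines.push(cue.text); // textual payload
    return lines.join("\n");
  });
  // The file starts with the WEBVTT signature; blocks are separated by blank lines.
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}

const vttText = toWebVTT([
  { id: "intro", start: 0, end: 2.5, text: "Hello, world." },
  { start: 3, end: 5.25, text: "A second cue.", settings: "line:0 align:start" },
]);
console.log(vttText);
```

The resulting text could be saved to a file and referenced from the src attribute of a <track> element nested inside the <video> element.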

Alternatively, you can add a TextTrack to a media element in JavaScript using HTMLMediaElement.addTextTrack(), and then add individual VTTCue objects to the track with TextTrack.addCue().
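A minimal sketch of the JavaScript route follows. The function name addCaptionTrack, the track label, and the shape of the cueData entries are assumptions for illustration. The CueCtor parameter is likewise a convenience so the function can be exercised outside a browser; in a page it defaults to the built-in VTTCue constructor.

```javascript
// Sketch: create a text track on a media element and fill it with cues.
// `media` is assumed to be an HTMLMediaElement (<video> or <audio>).
function addCaptionTrack(media, cueData, CueCtor = globalThis.VTTCue) {
  // addTextTrack(kind, label, language) returns a new TextTrack.
  const track = media.addTextTrack("captions", "English captions", "en");
  // Tracks created this way start out hidden; "showing" makes the cues render.
  track.mode = "showing";
  for (const { start, end, text } of cueData) {
    // VTTCue takes a start time, an end time (both in seconds), and the cue text.
    track.addCue(new CueCtor(start, end, text));
  }
  return track;
}

// In a browser (the selector is hypothetical):
// const video = document.querySelector("video");
// addCaptionTrack(video, [{ start: 0, end: 2, text: "Hello" }]);
```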

The ::cue CSS pseudo-element can be used both in HTML and in a WebVTT file to style the cues for a particular element, for a particular tag within a cue, for a VTT class, or for a cue with a particular label. The ::cue-region pseudo-element is intended for styling cues in a particular region, but is not supported in any browser.
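The four selector forms mentioned above might look like the rules in the sketch below, which keeps the CSS in a string so that it can be injected from script. The class name "loud", the cue label "intro", the chosen colors, and the helper installCueStyles are all made up for this example.

```javascript
// Sketch: CSS rules using ::cue, held in a string for injection from script.
const cueStyles = `
/* All cues rendered over any <video> element */
video::cue {
  color: white;
  background-color: rgba(0, 0, 0, 0.6);
}
/* Text inside a <b> tag within a cue */
video::cue(b) {
  color: yellow;
}
/* Text marked with a VTT class, such as <c.loud>...</c> */
video::cue(.loud) {
  font-size: 150%;
}
/* The cue whose label is "intro" */
video::cue(#intro) {
  color: lime;
}
`;

// Append the rules to a document; `doc` is assumed to be a DOM Document.
function installCueStyles(doc) {
  const style = doc.createElement("style");
  style.textContent = cueStyles;
  doc.head.appendChild(style);
  return style;
}

// In a browser:
// installCueStyles(document);
```

The same rules could equally be written directly in a stylesheet; the script wrapper is only one way to apply them.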

Most important WebVTT features can be accessed using either the file format or the Web API.

Interfaces

VTTCue

Represents a cue, the text displayed in a particular time slice of the text track associated with a media element.

VTTRegion

Represents a portion of a video element onto which a VTTCue can be rendered.

TextTrack

Represents a text track, which holds the list of cues to display along with an associated media element at various points while it plays.

TextTrackCue

An abstract base class for various cue types, such as VTTCue.

TextTrackCueList

An array-like object that represents a dynamically updating list of TextTrackCue objects. An instance of this type is obtained from TextTrack.cues in order to get all the cues in the TextTrack object.

TextTrackList

Represents a list of the text tracks defined for a media element, with each track represented by a separate TextTrack instance in the list.

TrackEvent

Part of the HTML DOM API, this is the interface for the addtrack and removetrack events that are fired when a track is added or removed from TextTrackList (or more generally, when a track is added/removed from an HTML media element).
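To show how several of these interfaces fit together, the sketch below walks a media element's TextTrackList, reads each track's TextTrackCueList, and listens for the addtrack and removetrack events. The function names describeTextTracks and watchTextTracks and the shape of the returned summary objects are assumptions for this example.

```javascript
// Sketch: summarize the text tracks on a media element and react to changes.
// `media` is assumed to be an HTMLMediaElement.

// Collect each track's kind, language, mode, and cues into plain objects.
function describeTextTracks(media) {
  const summary = [];
  const tracks = media.textTracks; // TextTrackList
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i]; // TextTrack
    const cues = track.cues; // TextTrackCueList, or null when mode is "disabled"
    const cueSummaries = [];
    for (let j = 0; cues && j < cues.length; j++) {
      const cue = cues[j]; // TextTrackCue; `text` is available on VTTCue
      cueSummaries.push({ start: cue.startTime, end: cue.endTime, text: cue.text });
    }
    summary.push({
      kind: track.kind,
      language: track.language,
      mode: track.mode,
      cues: cueSummaries,
    });
  }
  return summary;
}

// Call `callback` with the event type and the affected track whenever a
// track is added or removed. Returns a function that stops listening.
function watchTextTracks(media, callback) {
  const handler = (event) => callback(event.type, event.track); // TrackEvent
  media.textTracks.addEventListener("addtrack", handler);
  media.textTracks.addEventListener("removetrack", handler);
  return () => {
    media.textTracks.removeEventListener("addtrack", handler);
    media.textTracks.removeEventListener("removetrack", handler);
  };
}

// In a browser (the selector is hypothetical):
// const video = document.querySelector("video");
// const stop = watchTextTracks(video, (type, track) => console.log(type, track.label));
// console.log(describeTextTracks(video));
```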

These CSS pseudo-elements are used to style cues in media with VTT tracks.

::cue

Matches cues within a selected element in media with VTT tracks.

Note: The specification defines another pseudo-element, ::cue-region, but this is not supported by any browser.