Media elements (audio and video, in this specification) implement the following interface:
interface HTMLMediaElement : HTMLElement {
  // error state
  readonly attribute MediaError? error;
  // network state
           attribute DOMString src;
  readonly attribute DOMString currentSrc;
           attribute DOMString crossOrigin;
  const unsigned short NETWORK_EMPTY = 0;
  const unsigned short NETWORK_IDLE = 1;
  const unsigned short NETWORK_LOADING = 2;
  const unsigned short NETWORK_NO_SOURCE = 3;
  readonly attribute unsigned short networkState;
           attribute DOMString preload;
  readonly attribute TimeRanges buffered;
  void load();
  DOMString canPlayType(DOMString type);
  // ready state
  const unsigned short HAVE_NOTHING = 0;
  const unsigned short HAVE_METADATA = 1;
  const unsigned short HAVE_CURRENT_DATA = 2;
  const unsigned short HAVE_FUTURE_DATA = 3;
  const unsigned short HAVE_ENOUGH_DATA = 4;
  readonly attribute unsigned short readyState;
  readonly attribute boolean seeking;
  // playback state
           attribute double currentTime;
  readonly attribute unrestricted double duration;
  readonly attribute Date startDate;
  readonly attribute boolean paused;
           attribute double defaultPlaybackRate;
           attribute double playbackRate;
  readonly attribute TimeRanges played;
  readonly attribute TimeRanges seekable;
  readonly attribute boolean ended;
           attribute boolean autoplay;
           attribute boolean loop;
  void play();
  void pause();
  // media controller
           attribute DOMString mediaGroup;
           attribute MediaController? controller;
  // controls
           attribute boolean controls;
           attribute double volume;
           attribute boolean muted;
           attribute boolean defaultMuted;
  // tracks
  readonly attribute AudioTrackList audioTracks;
  readonly attribute VideoTrackList videoTracks;
  readonly attribute TextTrackList textTracks;
  TextTrack addTextTrack(DOMString kind, optional DOMString label, optional DOMString language);
};
The media element attributes, src, crossorigin, preload, autoplay, mediagroup, loop, muted, and controls, apply to all media elements. They are defined in this section.
Media elements are used to present audio data, or video and audio data, to the user. This is referred to as media data in this section, since this section applies equally to media elements for audio or for video. The term media resource is used to refer to the complete set of media data, e.g. the complete video file, or complete audio file.
A media resource can have multiple audio and
video tracks. For the purposes of a media element, the video data of the
media resource is only that of the
currently selected track (if any) given by the element's
videoTracks attribute, and the audio data of the
media resource is the result of mixing
all the currently enabled tracks (if any) given by the element's
audioTracks attribute.
Both audio and video elements can be used for both audio and video. The main difference between the two is simply that the audio element has no playback area for visual content (such as video or captions), whereas the video element does.
error
Returns a MediaError object representing the current error state of the element.
Returns null if there is no error.
interface MediaError {
  const unsigned short MEDIA_ERR_ABORTED = 1;
  const unsigned short MEDIA_ERR_NETWORK = 2;
  const unsigned short MEDIA_ERR_DECODE = 3;
  const unsigned short MEDIA_ERR_SRC_NOT_SUPPORTED = 4;
  readonly attribute unsigned short code;
};
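As an illustration of how a script might report the error state, the sketch below maps the numeric constants from the interface above to their names. The helper name describeMediaError is hypothetical and not part of the interface; it only assumes an object with a code attribute, or null.

```javascript
// Numeric values copied from the MediaError interface above.
const MEDIA_ERROR_NAMES = {
  1: 'MEDIA_ERR_ABORTED',
  2: 'MEDIA_ERR_NETWORK',
  3: 'MEDIA_ERR_DECODE',
  4: 'MEDIA_ERR_SRC_NOT_SUPPORTED'
};

// Hypothetical helper: takes the value of a media element's error
// attribute (a MediaError-like object, or null) and returns a label.
function describeMediaError(error) {
  if (error === null || error === undefined) {
    return 'no error';
  }
  return MEDIA_ERROR_NAMES[error.code] || 'unknown error code ' + error.code;
}
```

In a page this could be called as describeMediaError(video.error), for example from an error event handler.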
error . code
Returns the current error's error code, from the list below.
MEDIA_ERR_ABORTED (numeric value 1)
MEDIA_ERR_NETWORK (numeric value 2)
MEDIA_ERR_DECODE (numeric value 3)
MEDIA_ERR_SRC_NOT_SUPPORTED (numeric value 4)
The media resource indicated by the src attribute was not suitable.
The src content attribute on media elements gives the address of the media resource (video, audio) to show. The attribute, if present, must contain a valid non-empty URL potentially surrounded by spaces.
The crossorigin content
attribute on media elements is a
CORS settings attribute.
currentSrc
Returns the address of the current media resource.
Returns the empty string when there is no media resource.
There are two ways to specify a media resource: the src attribute, or source elements. The attribute overrides the elements.
A media resource can be described in terms of
its type, specifically a MIME type, in some cases with a codecs parameter. (Whether the codecs parameter is allowed or not depends on the MIME
type.) [RFC4281]
Types are usually somewhat incomplete descriptions; for example
"video/mpeg" doesn't say anything except what
the container type is, and even a type like "video/mp4; codecs="avc1.42E01E, mp4a.40.2"" doesn't
include information like the actual bitrate (only the maximum
bitrate). Thus, given a type, a user agent can often only know
whether it might be able to play media of that type (with
varying levels of confidence), or whether it definitely
cannot play media of that type.
A type that the user agent knows it cannot render is one that describes a resource that the user agent definitely does not support, for example because it doesn't recognize the container type, or it doesn't support the listed codecs.
The MIME type "application/octet-stream" with no parameters is never a type that the user agent knows it cannot render. User agents must treat that type as equivalent to the lack of any explicit Content-Type metadata when it is used to label a potential media resource.
Only "application/octet-stream" with no parameters is special-cased here; if any parameter appears with it, it should be treated just like any other MIME type. This is a deviation from the rule that unknown MIME type parameters should be ignored.
canPlayType(type)
Returns the empty string (a negative response), "maybe", or "probably" based on how confident the user agent is that it can play media resources of the given type.
This script tests to see if the user agent supports a
(fictional) new format to dynamically decide whether to use a
video
element or a plugin:
<section id="video">
 <p><a href="playing-cats.nfv">Download video</a></p>
</section>
<script>
 var videoSection = document.getElementById('video');
 var videoElement = document.createElement('video');
 var support = videoElement.canPlayType('video/x-new-fictional-format;codecs="kittens,bunnies"');
 if (support != "probably" && "New Fictional Video Plugin" in navigator.plugins) {
   // not confident of browser support
   // but we have a plugin
   // so use plugin instead
   videoElement = document.createElement("embed");
 } else if (support == "") {
   // no support from browser and no plugin
   // do nothing
   videoElement = null;
 }
 if (videoElement) {
   while (videoSection.hasChildNodes())
     videoSection.removeChild(videoSection.firstChild);
   videoElement.setAttribute("src", "playing-cats.nfv");
   videoSection.appendChild(videoElement);
 }
</script>
The type attribute of the source element allows the user agent to
avoid downloading resources that use formats it cannot render.
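When a script chooses among several candidate types itself, the three possible answers from canPlayType() can be ranked. The following is a minimal sketch of that idea; pickBestType and the CONFIDENCE table are hypothetical names, and the function only assumes an object that offers a canPlayType(type) method.

```javascript
// Confidence order for the three possible canPlayType() answers.
const CONFIDENCE = { '': 0, 'maybe': 1, 'probably': 2 };

// Hypothetical helper: `media` is anything with a canPlayType(type)
// method, such as a video element; `types` is an array of MIME types.
function pickBestType(media, types) {
  let best = null;
  let bestScore = 0;
  for (const type of types) {
    const score = CONFIDENCE[media.canPlayType(type)] || 0;
    if (score > bestScore) {
      best = type;
      bestScore = score;
    }
  }
  return best; // null when every type got the empty string
}
```

Earlier entries win ties, so the caller can list its preferred formats first.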
networkState
Returns the current state of network activity for the element, from the codes in the list below.
NETWORK_EMPTY (numeric value 0)
NETWORK_IDLE (numeric value 1)
NETWORK_LOADING (numeric value 2)
NETWORK_NO_SOURCE (numeric value 3)
load()
Causes the element to reset and start selecting and loading a new media resource from scratch.
The preload attribute is an
enumerated attribute. The following
table lists the keywords and states for the attribute — the
keywords in the left column map to the states in the cell in the
second column on the same row as the keyword. The attribute can be
changed even once the media resource is being buffered or played;
the descriptions in the table below are to be interpreted with that
in mind.
| Keyword | State | Brief description | 
|---|---|---|
| none | None | Hints to the user agent that either the author does not expect the user to need the media resource, or that the server wants to minimise unnecessary traffic. This state does not provide a hint regarding how aggressively to actually download the media resource if buffering starts anyway (e.g. once the user hits "play"). | 
| metadata | Metadata | Hints to the user agent that the author does not expect the user to need the media resource, but that fetching the resource metadata (dimensions, track list, duration, etc), and maybe even the first few frames, is reasonable. If the user agent precisely fetches no more than the metadata, then the media element will end up with its readyState attribute set to HAVE_METADATA; typically though, some frames will be obtained as well and it will probably be HAVE_CURRENT_DATA or HAVE_FUTURE_DATA. When the media resource is playing, hints to the user agent that bandwidth is to be considered scarce, e.g. suggesting throttling the download so that the media data is obtained at the slowest possible rate that still maintains consistent playback. | 
| auto | Automatic | Hints to the user agent that the user agent can put the user's needs first without risk to the server, up to and including optimistically downloading the entire resource. | 
The empty string is also a valid keyword, and maps to the Automatic state. The attribute's missing value default is user-agent defined, though the Metadata state is suggested as a compromise between reducing server load and providing an optimal user experience.
Authors might switch the attribute from
"none" or "metadata" to "auto" dynamically once the user begins
playback. For example, on a page with many videos this might be
used to indicate that the many videos are not to be downloaded
unless requested, but that once one is requested it is to
be downloaded aggressively.
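The dynamic switch described above could look roughly like the sketch below. The helper name promotePreload is hypothetical; it only assumes an object with a writable preload property, as media elements have.

```javascript
// Hypothetical helper: raise a conservative preload hint to "auto".
// Returns true when the hint was changed, false when it was left alone.
function promotePreload(media) {
  if (media.preload === 'none' || media.preload === 'metadata') {
    media.preload = 'auto';
    return true;
  }
  return false;
}
```

A page with many videos might call this from whatever handler runs when the user starts playback of one of them; which event to hook is left to the author.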
The autoplay attribute can override the
preload attribute (since if the media
plays, it naturally has to buffer first, regardless of the hint
given by the preload attribute). Including both is not
an error, however.
buffered
Returns a TimeRanges object that represents the ranges of the media resource that the user agent has buffered.
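One common use of the buffered ranges is to estimate how much media is available ahead of the playback position. The sketch below assumes the usual TimeRanges shape (a length attribute plus start(index) and end(index) methods, which is defined outside this section); secondsBufferedAhead is a hypothetical helper name.

```javascript
// Hypothetical helper: `ranges` follows the TimeRanges shape, i.e. it has
// a length plus start(index) and end(index) methods returning seconds.
function secondsBufferedAhead(ranges, currentTime) {
  for (let i = 0; i < ranges.length; i += 1) {
    if (ranges.start(i) <= currentTime && currentTime <= ranges.end(i)) {
      return ranges.end(i) - currentTime;
    }
  }
  return 0; // the position is not inside any buffered range
}
```

In a page this might be called as secondsBufferedAhead(video.buffered, video.currentTime).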
duration
Returns the length of the media resource, in seconds, assuming that the start of the media resource is at time zero.
Returns NaN if the duration isn't available.
Returns Infinity for unbounded streams.
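Because duration has three distinct kinds of value (a finite number, NaN, or Infinity), display code usually needs to branch on them. A minimal sketch follows; formatDuration and the labels "unknown" and "live" are illustrative choices, not anything the interface defines.

```javascript
// Hypothetical helper: turn a duration value into a short display string.
function formatDuration(duration) {
  if (Number.isNaN(duration)) {
    return 'unknown'; // duration not available yet
  }
  if (duration === Infinity) {
    return 'live'; // unbounded stream
  }
  const total = Math.floor(duration);
  const minutes = Math.floor(total / 60);
  const seconds = total % 60;
  return minutes + ':' + String(seconds).padStart(2, '0');
}
```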
currentTime [ = value ]
Returns the official playback position, in seconds.
Can be set, to seek to the given time.
Will throw an InvalidStateError exception if there is no selected media resource or if there is a current media controller.
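Since setting currentTime can throw, a script that seeks at arbitrary moments may want to guard the assignment. The sketch below is one way to do that; trySeek is a hypothetical helper, and it assumes the thrown exception exposes the name "InvalidStateError".

```javascript
// Hypothetical helper: attempt a seek and report whether it was accepted.
function trySeek(media, seconds) {
  try {
    media.currentTime = seconds;
    return true;
  } catch (err) {
    if (err && err.name === 'InvalidStateError') {
      return false; // no selected resource, or a media controller is in charge
    }
    throw err; // anything else is unexpected, so let it propagate
  }
}
```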
The loop attribute is a boolean attribute that, if specified, indicates that the media element is to seek back to the start of the media resource upon reaching the end.
The loop attribute has no effect while the element has a current media controller.
readyState
Returns a value that expresses the current state of the element with respect to rendering the current playback position, from the codes in the list below.
HAVE_NOTHING (numeric value 0)
No information regarding the media resource is available. No data for the current playback position is available. Media elements whose networkState attribute is set to NETWORK_EMPTY are always in the HAVE_NOTHING state.
HAVE_METADATA (numeric value 1)
Enough of the resource has been obtained that the duration of the resource is available. In the case of a video element, the dimensions of the video are also available. The API will no longer throw an exception when seeking. No media data is available for the immediate current playback position.
HAVE_CURRENT_DATA (numeric value 2)
Data for the immediate current playback position is available, but either not enough data is available that the user agent could successfully advance the current playback position in the direction of playback at all without immediately reverting to the HAVE_METADATA state, or there is no more data to obtain in the direction of playback. For example, in video this corresponds to the user agent having data from the current frame, but not the next frame, when the current playback position is at the end of the current frame; and to when playback has ended.
HAVE_FUTURE_DATA (numeric value 3)
Data for the immediate current playback position is available, as well as enough data for the user agent to advance the current playback position in the direction of playback at least a little without immediately reverting to the HAVE_METADATA state, and the text tracks are ready. For example, in video this corresponds to the user agent having data for at least the current frame and the next frame when the current playback position is at the instant in time between the two frames, or to the user agent having the video data for the current frame and audio data to keep playing at least a little when the current playback position is in the middle of a frame. The user agent cannot be in this state if playback has ended, as the current playback position can never advance in this case.
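HAVE_ENOUGH_DATA (numeric value 4)
All the conditions described for the HAVE_FUTURE_DATA state are met, and, in addition, either of the following conditions is also true:
- The user agent estimates that data is being fetched at a rate where the current playback position, if it were to advance at the current rate of playback, would not overtake the available data before playback reaches the end of the media resource.
- The user agent has entered a state where waiting longer will not result in further data being obtained, and therefore nothing would be gained by delaying playback any further. (For example, the buffer might be full.)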
In practice, the difference between HAVE_METADATA and HAVE_CURRENT_DATA is
negligible. Really the only time the difference is relevant is when
painting a video
element onto a canvas,
where it distinguishes the case where something will be drawn
(HAVE_CURRENT_DATA or greater)
from the case where nothing is drawn (HAVE_METADATA or less). Similarly,
the difference between HAVE_CURRENT_DATA (only the
current frame) and HAVE_FUTURE_DATA (at least this
frame and the next) can be negligible (in the extreme, only one
frame). The only time that distinction really matters is when a
page provides an interface for "frame-by-frame" navigation.
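The two practical distinctions described above translate into simple comparisons on readyState. The following sketch uses the numeric values from the interface; the function names willDrawFrame and canStepToNextFrame are hypothetical.

```javascript
// Values copied from the HTMLMediaElement interface.
const HAVE_CURRENT_DATA = 2;
const HAVE_FUTURE_DATA = 3;

// Hypothetical helper: true when painting the video onto a canvas
// would draw something (HAVE_CURRENT_DATA or greater).
function willDrawFrame(media) {
  return media.readyState >= HAVE_CURRENT_DATA;
}

// Hypothetical helper: true when data for at least the next frame is
// available, which is what frame-by-frame navigation cares about.
function canStepToNextFrame(media) {
  return media.readyState >= HAVE_FUTURE_DATA;
}
```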
It is possible for the ready state of a media
element to jump between these states discontinuously. For example,
the state of a media element can jump straight from HAVE_METADATA to HAVE_ENOUGH_DATA without passing
through the HAVE_CURRENT_DATA and
HAVE_FUTURE_DATA states.
The autoplay attribute is a
boolean attribute. When present, the
user agent will automatically begin playback of the media resource as soon as it can do so
without stopping.
Authors are urged to use the autoplay attribute rather than using
script to trigger automatic playback, as this allows the user to
override the automatic playback when it is not desired, e.g. when
using a screen reader. Authors are also encouraged to consider not
using the automatic playback behavior at all, and instead to let
the user agent wait for the user to start playback explicitly.
paused
Returns true if playback is paused; false otherwise.
ended
Returns true if playback has reached the end of the media resource.
defaultPlaybackRate [ = value ]
Returns the default rate of playback, for when the user is not fast-forwarding or reversing through the media resource.
Can be set, to change the default rate of playback.
The default rate has no direct effect on playback, but if the user switches to a fast-forward mode, when they return to the normal playback mode, it is expected that the rate of playback will be returned to the default rate of playback.
When the element has a current media controller, the defaultPlaybackRate attribute is ignored and the current media controller's defaultPlaybackRate is used instead.
playbackRate [ = value ]
Returns the current rate of playback, where 1.0 is normal speed.
Can be set, to change the rate of playback.
When the element has a current media controller, the playbackRate attribute is ignored and the current media controller's playbackRate is used instead.
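The relationship between the two rates can be sketched with a pair of custom fast-forward controls. Both function names below are hypothetical, and the sketch assumes an element with no current media controller, since otherwise the element's own rates are ignored.

```javascript
// Hypothetical helper: enter a fast-forward mode at `factor` times
// the default rate.
function fastForward(media, factor) {
  media.playbackRate = media.defaultPlaybackRate * factor;
}

// Hypothetical helper: return to normal playback by restoring the
// default rate, as the description of defaultPlaybackRate expects.
function resumeNormalSpeed(media) {
  media.playbackRate = media.defaultPlaybackRate;
}
```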
played
Returns a TimeRanges object that represents the ranges of the media resource that the user agent has played.
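For instance, the total amount of media that has been played can be estimated by summing the ranges. This is only a sketch: totalSeconds is a hypothetical helper, and it assumes the TimeRanges shape (length, start(index), end(index)) with non-overlapping ranges.

```javascript
// Hypothetical helper: add up the lengths of all ranges, in seconds.
function totalSeconds(ranges) {
  let sum = 0;
  for (let i = 0; i < ranges.length; i += 1) {
    sum += ranges.end(i) - ranges.start(i);
  }
  return sum;
}
```

In a page, totalSeconds(video.played) would give the number of seconds of the resource that have been played at least once.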
play()
Sets the paused attribute to false, loading the media resource and beginning playback if necessary. If the playback had ended, will restart it from the start.
pause()
Sets the paused attribute to true, loading the media resource if necessary.
seeking
Returns true if the user agent is currently seeking.
seekable
Returns a TimeRanges object that represents the ranges of the media resource to which it is possible for the user agent to seek.
A media resource can have multiple embedded audio and video tracks. For example, in addition to the primary video and audio tracks, a media resource could have foreign-language dubbed dialogues, director's commentaries, audio descriptions, alternative angles, or sign-language overlays.
audioTracks
Returns an AudioTrackList object representing the audio tracks available in the media resource.
videoTracks
Returns a VideoTrackList object representing the video tracks available in the media resource.
In this example, a script defines a function that takes a URL to a video and a reference to an element where the video is to be placed. That function then tries to load the video, and, once it is loaded, checks to see if there is a sign-language track available. If there is, it also displays that track. Both tracks are just placed in the given container; it's assumed that styles have been applied to make this work in a pretty way!
<script>
 function loadVideo(url, container) {
   var controller = new MediaController();
   var video = document.createElement('video');
   video.src = url;
   video.autoplay = true;
   video.controls = true;
   video.controller = controller;
   container.appendChild(video);
   video.onloadedmetadata = function (event) {
     for (var i = 0; i < video.videoTracks.length; i += 1) {
       if (video.videoTracks[i].kind == 'sign') {
         var sign = document.createElement('video');
         sign.src = url + '#track=' + video.videoTracks[i].id; 
         sign.autoplay = true;
         sign.controller = controller;
         container.appendChild(sign);
         return;
       }
     }
   };
 }
</script>
AudioTrackList and VideoTrackList objects
The AudioTrackList and VideoTrackList interfaces are used by attributes defined in the previous section.
interface AudioTrackList : EventTarget {
  readonly attribute unsigned long length;
  getter AudioTrack (unsigned long index);
  AudioTrack? getTrackById(DOMString id);
           attribute EventHandler onchange;
           attribute EventHandler onaddtrack;
           attribute EventHandler onremovetrack;
};
interface AudioTrack {
  readonly attribute DOMString id;
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
           attribute boolean enabled;
};
interface VideoTrackList : EventTarget {
  readonly attribute unsigned long length;
  getter VideoTrack (unsigned long index);
  VideoTrack? getTrackById(DOMString id);
  readonly attribute long selectedIndex;
           attribute EventHandler onchange;
           attribute EventHandler onaddtrack;
           attribute EventHandler onremovetrack;
};
interface VideoTrack {
  readonly attribute DOMString id;
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
           attribute boolean selected;
};
audioTracks . length
videoTracks . length
Returns the number of tracks in the list.
audioTracks[index]
videoTracks[index]
Returns the specified AudioTrack or VideoTrack object.
audioTracks . getTrackById( id )
videoTracks . getTrackById( id )
Returns the AudioTrack or VideoTrack object with the given identifier, or null if no track has that identifier.
id
Returns the ID of the given track. This is the ID that can be used with a fragment identifier if the format supports the Media Fragments URI syntax, and that can be used with the getTrackById() method. [MEDIAFRAG]
kind
Returns the category the given track falls into. The possible track categories are given below.
label
Returns the label of the given track, if known, or the empty string otherwise.
language
Returns the language of the given track, if known, or the empty string otherwise.
enabled [ = value ]
Returns true if the given track is active, and false otherwise.
Can be set, to change whether the track is enabled or not. If multiple audio tracks are enabled simultaneously, they are mixed.
videoTracks . selectedIndex
Returns the index of the currently selected track, if any, or −1 otherwise.
selected [ = value ]
Returns true if the given track is active, and false otherwise.
Can be set, to change whether the track is selected or not. Either zero or one video track is selected; selecting a new track while a previous one is selected will unselect the previous one.
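As a sketch of how the enabled attribute might be used, the function below switches the audio to a single language. The name enableLanguage is hypothetical; the function relies only on the length attribute, indexed access, and the language and enabled attributes listed above.

```javascript
// Hypothetical helper: enable only the audio tracks in `language`.
// If no track matches, nothing is changed, so the audio is never
// silenced by accident. Returns the number of tracks left enabled
// by this call (0 when nothing matched).
function enableLanguage(audioTracks, language) {
  let matches = 0;
  for (let i = 0; i < audioTracks.length; i += 1) {
    if (audioTracks[i].language === language) {
      matches += 1;
    }
  }
  if (matches === 0) {
    return 0;
  }
  for (let i = 0; i < audioTracks.length; i += 1) {
    audioTracks[i].enabled = audioTracks[i].language === language;
  }
  return matches;
}
```

Because enabled audio tracks are mixed, disabling the non-matching tracks is what prevents two languages from playing at once.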
| Category | Definition | Applies to... | Examples | 
|---|---|---|---|
| "alternative" | A possible alternative to the main track, e.g. a different take of a song (audio), or a different angle (video). | Audio and video. | Ogg: "audio/alternate" or "video/alternate"; DASH: "alternate" without "main" and "commentary" roles, and, for audio, without the "dub" role (other roles ignored). | 
| "captions" | A version of the main video track with captions burnt in. (For legacy content; new content would use text tracks.) | Video only. | DASH: "caption" and "main" roles together (other roles ignored). | 
| "description" | An audio description of a video track. | Audio only. | Ogg: "audio/audiodesc". | 
| "main" | The primary audio or video track. | Audio and video. | Ogg: "audio/main" or "video/main"; WebM: the "FlagDefault" element is set; DASH: "main" role without "caption", "subtitle", and "dub" roles (other roles ignored). | 
| "main-desc" | The primary audio track, mixed with audio descriptions. | Audio only. | AC3 audio in MPEG-2 TS: bsmod=2 and full_svc=1. | 
| "sign" | A sign-language interpretation of an audio track. | Video only. | Ogg: "video/sign". | 
| "subtitles" | A version of the main video track with subtitles burnt in. (For legacy content; new content would use text tracks.) | Video only. | DASH: "subtitle" and "main" roles together (other roles ignored). | 
| "translation" | A translated version of the main audio track. | Audio only. | Ogg: "audio/dub". DASH: "dub" and "main" roles together (other roles ignored). | 
| "commentary" | Commentary on the primary audio or video track, e.g. a director's commentary. | Audio and video. | DASH: "commentary" role without "main" role (other roles ignored). | 
| "" (empty string) | No explicit kind, or the kind given by the track's metadata is not recognised by the user agent. | Audio and video. | Any other track type, track role, or combination of track roles not described above. | 
The 
audioTracks and 
videoTracks attributes allow scripts to select which
track should play, but it is also possible to select specific
tracks declaratively, by specifying particular tracks in the
fragment identifier of the URL of the media resource. The format of the fragment
identifier depends on the MIME type of the media resource. [RFC2046]
[RFC3986]
In this example, a video that uses a format that supports the Media Fragments URI fragment identifier syntax is embedded in such a way that the alternative angles labeled "Alternative" are enabled instead of the default video track. [MEDIAFRAG]
<video src="myvideo#track=Alternative"></video>
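A script that builds such URLs could assemble the fragment as in the sketch below. This assumes a format that supports the track=name form used in the examples in this section; withTracks is a hypothetical helper, and percent-encoding the names with encodeURIComponent is an assumption about what the format expects.

```javascript
// Hypothetical helper: replace any existing fragment on `url` with one
// that selects the given track names, e.g. "#track=Video&track=English".
function withTracks(url, trackNames) {
  const base = url.split('#')[0];
  if (trackNames.length === 0) {
    return base;
  }
  const pairs = trackNames.map((name) => 'track=' + encodeURIComponent(name));
  return base + '#' + pairs.join('&');
}
```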
Each media element can have a MediaController. A MediaController is an object that
coordinates the playback of multiple media elements, for
instance so that a sign-language interpreter track can be overlaid
on a video track, with the two being kept in sync.
By default, a media element has no MediaController. An implicit
MediaController can be assigned
using the mediagroup content attribute. An
explicit MediaController can be assigned
directly using the 
controller IDL attribute.
Media elements with a
MediaController are said to be
slaved to their controller. The MediaController modifies the
playback rate and the playback volume of each of the media elements slaved
to it, and ensures that when any of its slaved media elements
unexpectedly stall, the others are stopped at the same time.
When a media element is slaved to a MediaController, its playback rate
is fixed to that of the other tracks in the same MediaController, and any looping is
disabled.
enum MediaControllerPlaybackState { "waiting", "playing", "ended" };
[Constructor]
interface MediaController : EventTarget {
  readonly attribute unsigned short readyState; // uses HTMLMediaElement.readyState's values
  readonly attribute TimeRanges buffered;
  readonly attribute TimeRanges seekable;
  readonly attribute unrestricted double duration;
           attribute double currentTime;
  readonly attribute boolean paused;
  readonly attribute MediaControllerPlaybackState playbackState;
  readonly attribute TimeRanges played;
  void pause();
  void unpause();
  void play(); // calls play() on all media elements as well
           attribute double defaultPlaybackRate;
           attribute double playbackRate;
           attribute double volume;
           attribute boolean muted;
           attribute EventHandler onemptied;
           attribute EventHandler onloadedmetadata;
           attribute EventHandler onloadeddata;
           attribute EventHandler oncanplay;
           attribute EventHandler oncanplaythrough;
           attribute EventHandler onplaying;
           attribute EventHandler onended;
           attribute EventHandler onwaiting;
           attribute EventHandler ondurationchange;
           attribute EventHandler ontimeupdate;
           attribute EventHandler onplay;
           attribute EventHandler onpause;
           attribute EventHandler onratechange;
           attribute EventHandler onvolumechange;
};
MediaController()
Returns a new MediaController object.
controller [ = controller ]
Returns the current MediaController for the media element, if any; returns null otherwise.
Can be set, to set an explicit MediaController. Doing so removes the mediagroup attribute, if any.
readyState
Returns the state that the MediaController was in the last time it fired events as a result of reporting the controller state. The values of this attribute are the same as for the readyState attribute of media elements.
buffered
Returns a TimeRanges object that represents the intersection of the time ranges for which the user agent has all relevant media data for all the slaved media elements.
seekable
Returns a TimeRanges object that represents the intersection of the time ranges into which the user agent can seek for all the slaved media elements.
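To make "intersection of the time ranges" concrete, the sketch below intersects two lists of [start, end] pairs. The user agent computes the controller's buffered and seekable values itself; this hypothetical intersectRanges function only illustrates the meaning, and it assumes each input list is sorted and free of overlaps.

```javascript
// Hypothetical helper: intersect two sorted, non-overlapping lists of
// [start, end] pairs (in seconds) and return the common ranges.
function intersectRanges(a, b) {
  const result = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    const start = Math.max(a[i][0], b[j][0]);
    const end = Math.min(a[i][1], b[j][1]);
    if (start < end) {
      result.push([start, end]);
    }
    // Advance whichever range finishes first.
    if (a[i][1] < b[j][1]) {
      i += 1;
    } else {
      j += 1;
    }
  }
  return result;
}
```

With more than two slaved elements, the same function can be applied repeatedly, intersecting the running result with each further list.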
duration
Returns the difference between the earliest playable moment and the latest playable moment (not considering whether the data in question is actually buffered or directly seekable, but not including time in the future for infinite streams). Will return zero if there is no media.
currentTime [ = value ]
Returns the current playback position, in seconds, as a position between zero time and the current duration.
Can be set, to seek to the given time.
paused
Returns true if playback is paused; false otherwise. When this attribute is true, any media element slaved to this controller will be stopped.
playbackState
Returns the state that the MediaController was in the last time it fired events as a result of reporting the controller state. The value of this attribute is either "playing", indicating that the media is actively playing, "ended", indicating that the media is not playing because playback has reached the end of all the slaved media elements, or "waiting", indicating that the media is not playing for some other reason (e.g. the MediaController is paused).
pause()
Sets the paused attribute to true.
unpause()
Sets the paused attribute to false.
play()
Sets the paused attribute to false and invokes the play() method of each slaved media element.
played
Returns a TimeRanges object that represents the union of the time ranges in all the slaved media elements that have been played.
defaultPlaybackRate [ = value ]
Returns the default rate of playback.
Can be set, to change the default rate of playback.
This default rate has no direct effect on playback, but if the user switches to a fast-forward mode, when they return to the normal playback mode, it is expected that the rate of playback (playbackRate) will be returned to this default rate.
playbackRate [ = value ]
Returns the current rate of playback.
Can be set, to change the rate of playback.
volume [ = value ]
Returns the current playback volume multiplier, as a number in the range 0.0 to 1.0, where 0.0 is the quietest and 1.0 the loudest.
Can be set, to change the volume multiplier.
Throws an IndexSizeError if the new value is not in the range 0.0 .. 1.0.
muted [ = value ]
Returns true if all audio is muted (regardless of other attributes either on the controller or on any media elements slaved to this controller), and false otherwise.
Can be set, to change whether the audio is muted or not.
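Because an out-of-range volume throws, scripts that derive the value from user input (a slider, a key press) may prefer to clamp it first. A minimal sketch, with setVolumeSafely as a hypothetical helper that works on any object with a volume property:

```javascript
// Hypothetical helper: clamp `value` into 0.0 .. 1.0 before assigning,
// so the assignment cannot trigger an IndexSizeError.
// Non-finite input (NaN, Infinity) is ignored and the volume is kept.
function setVolumeSafely(target, value) {
  if (!Number.isFinite(value)) {
    return target.volume;
  }
  const clamped = Math.min(1, Math.max(0, value));
  target.volume = clamped;
  return clamped;
}
```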
The mediagroup content
attribute on media elements can be
used to link multiple media elements
together by implicitly creating a MediaController. The value is text;
media elements with
the same value are automatically linked by the user agent.
Multiple media elements
referencing the same media resource will share a single network
request. This can be used to efficiently play two (video) tracks
from the same media resource in two different places on
the screen. Used with the mediagroup attribute, these elements
can also be kept synchronised.
In this example, a sign-language interpreter track from a movie file is overlaid on the primary video track of that same video file using two video elements, some CSS, and an implicit MediaController:
<article>
 <style scoped>
  div { margin: 1em auto; position: relative; width: 400px; height: 300px; }
  video { position: absolute; bottom: 0; right: 0; }
  video:first-child { width: 100%; height: 100%; }
  video:last-child { width: 30%; }
 </style>
 <div>
  <video src="movie.vid#track=Video&track=English" autoplay controls mediagroup=movie></video>
  <video src="movie.vid#track=sign" autoplay mediagroup=movie></video>
 </div>
</article>
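To see which elements the user agent would link together, a script can group them by the value of the mediaGroup IDL attribute. The sketch below is illustrative only: groupByMediaGroup is a hypothetical helper, and the linking itself is done by the user agent, not by this code.

```javascript
// Hypothetical helper: collect elements by their mediaGroup value.
// Elements with no group (empty or missing value) are skipped.
function groupByMediaGroup(elements) {
  const groups = new Map();
  for (const element of elements) {
    const name = element.mediaGroup;
    if (!name) {
      continue;
    }
    if (!groups.has(name)) {
      groups.set(name, []);
    }
    groups.get(name).push(element);
  }
  return groups;
}
```

In a page, the input could be Array.from(document.querySelectorAll('video, audio')).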
A media element can have a group of associated text tracks, known as the media element's list of text tracks. The text tracks are sorted as follows:
1. The text tracks corresponding to track element children of the media element, in tree order.
2. Any text tracks added using the addTextTrack() method, in the order they were added, oldest first.
A text track consists of:
The kind of text track
This decides how the track is handled by the user agent. The kind is represented by a string. The possible strings are:
- subtitles
- captions
- descriptions
- chapters
- metadata
The kind of track can change dynamically, in the case of a text track corresponding to a track element.
A label
This is a human-readable string intended to identify the track for the user.
The label of a track can change dynamically, in the case of a text track corresponding to a track element.
When a text track label is the empty string, the user agent should automatically generate an appropriate label from the text track's other properties (e.g. the kind of text track and the text track's language) for use in its user interface. This automatically-generated label is not exposed in the API.
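Since the automatically generated label is not exposed, a page that draws its own track menu has to produce a fallback itself. The sketch below shows one possible fallback; displayLabel and the "kind (language)" format are illustrative assumptions, not what any user agent is required to show.

```javascript
// Hypothetical helper: pick a label for a custom track menu.
// `track` is any object with label, kind and language strings.
function displayLabel(track) {
  if (track.label !== '') {
    return track.label;
  }
  const parts = [track.kind];
  if (track.language) {
    parts.push('(' + track.language + ')');
  }
  return parts.join(' ');
}
```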
An in-band metadata track dispatch type
This is a string extracted from the media resource specifically for in-band metadata tracks to enable such tracks to be dispatched to different scripts in the document.
For example, a traditional TV station broadcast streamed on the Web and augmented with Web-specific interactive features could include text tracks with metadata for ad targeting, trivia game data during game shows, player states during sports games, recipe information during food programs, and so forth. As each program starts and ends, new tracks might be added or removed from the stream, and as each one is added, the user agent could bind them to dedicated script modules using the value of this attribute.
Other than for in-band metadata text tracks, the in-band metadata track dispatch type is the empty string. How this value is populated for different media formats is described in steps to expose a media-resource-specific text track.
A language
This is a string (a BCP 47 language tag) representing the language of the text track's cues. [BCP47]
The language of a text track can change dynamically, in the case of a text track corresponding to a track element.
One of the following:
Indicates that the text track's cues have not been obtained.
Indicates that the text track is loading and there have been no fatal errors encountered so far. Further cues might still be added to the track by the parser.
Indicates that the text track has been loaded with no fatal errors.
Indicates that the text track was enabled, but when the user agent attempted to obtain it, this failed in some way (e.g. URL could not be resolved, network error, unknown text track format). Some or all of the cues are likely missing and will not be obtained.
The readiness state of a text track changes dynamically as the track is obtained.
One of the following:
Indicates that the text track is not active. Other than for the purposes of exposing the track in the DOM, the user agent is ignoring the text track. No cues are active, no events are fired, and the user agent will not attempt to obtain the track's cues.
Indicates that the text track is active, but that the user agent is not actively displaying the cues. If no attempt has yet been made to obtain the track's cues, the user agent will perform such an attempt momentarily. The user agent is maintaining a list of which cues are active, and events are being fired accordingly.
Indicates that the text track is active. If no attempt has yet
been made to obtain the track's cues, the user agent will perform
such an attempt momentarily. The user agent is maintaining a list
of which cues are active, and events are being fired accordingly.
In addition, for text tracks whose kind is
subtitles or
captions, the
cues are being overlaid on the video as appropriate; for text
tracks whose kind is
descriptions,
the user agent is making the cues available to the user in a
non-visual fashion; and for text tracks whose kind is
chapters, the
user agent is making available to the user a mechanism by which the
user can navigate to any point in the media resource by selecting a cue.
A list of text track cues, along with rules for updating the text track rendering. For example, for WebVTT, the rules for updating the display of WebVTT text tracks. [WEBVTT]
The list of cues of a text track can change dynamically, either because the text track has not yet been loaded or is still loading, or due to DOM manipulation.
Each text track has a corresponding TextTrack object.
Each media element has a list of pending text tracks, which must initially be empty, a blocked-on-parser flag, which must initially be false, and a did-perform-automatic-track-selection flag, which must also initially be false.
When the user agent is required to populate the list of pending text tracks of a media element, the user agent must add to the element's list of pending text tracks each text track in the element's list of text tracks whose text track mode is not disabled and whose text track readiness state is loading.
Whenever a track element's parent node changes, the
user agent must remove the corresponding text track from any list of pending text tracks
that it is in.
Whenever a text track's text track readiness state changes to either loaded or failed to load, the user agent must remove it from any list of pending text tracks that it is in.
When a media element is created by an HTML parser or XML parser, the user agent must set the element's blocked-on-parser flag to true. When a media element is popped off the stack of open elements of an HTML parser or XML parser, the user agent must honor user preferences for automatic text track selection, populate the list of pending text tracks, and set the element's blocked-on-parser flag to false.
The text tracks of a media element are ready when both the element's list of pending text tracks is empty and the element's blocked-on-parser flag is false.
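The bookkeeping described above can be sketched in JavaScript. This is an illustrative model only, not a description of how any user agent is built: the plain objects and the names createMediaElementModel, populatePendingTextTracks, onReadinessChange, and textTracksAreReady are hypothetical stand-ins for the concepts in this section.

```javascript
// Hypothetical model of a media element's text track bookkeeping.
function createMediaElementModel(textTracks) {
  return {
    textTracks,              // the list of text tracks
    pendingTextTracks: [],   // initially empty
    blockedOnParser: false,  // initially false
  };
}

// Add each track that is not disabled and whose readiness state is "loading".
// Skipping tracks already in the list is an assumption of this sketch.
function populatePendingTextTracks(element) {
  for (const track of element.textTracks) {
    if (track.mode !== 'disabled' &&
        track.readinessState === 'loading' &&
        !element.pendingTextTracks.includes(track)) {
      element.pendingTextTracks.push(track);
    }
  }
}

// A track leaves the pending list once it is loaded or has failed to load.
function onReadinessChange(element, track, newState) {
  track.readinessState = newState;
  if (newState === 'loaded' || newState === 'failed to load') {
    element.pendingTextTracks =
      element.pendingTextTracks.filter((t) => t !== track);
  }
}

// Ready means: nothing pending and the parser is no longer blocking.
function textTracksAreReady(element) {
  return element.pendingTextTracks.length === 0 && !element.blockedOnParser;
}
```

In this model, a disabled track never delays readiness, because it is never added to the pending list in the first place.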
A text track cue is the unit of time-sensitive data in a text track. For subtitles and captions, for instance, it corresponds to the text that appears at a particular time and disappears at another time.
Each text track cue consists of:
An arbitrary string.
The time, in seconds and fractions of a second, that describes the beginning of the range of the media data to which the cue applies.
The time, in seconds and fractions of a second, that describes the end of the range of the media data to which the cue applies.
A boolean indicating whether playback of the media resource is to pause when the end of the range to which the cue applies is reached.
A writing direction, either horizontal (a line extends horizontally and is positioned vertically, with consecutive lines displayed below each other), vertical growing left (a line extends vertically and is positioned horizontally, with consecutive lines displayed to the left of each other), or vertical growing right (a line extends vertically and is positioned horizontally, with consecutive lines displayed to the right of each other).
If the writing direction is horizontal, then line position percentages are relative to the height of the video, and text position and size percentages are relative to the width of the video.
Otherwise, line position percentages are relative to the width of the video, and text position and size percentages are relative to the height of the video.
A boolean indicating whether the line's position is a line position (positioned to a multiple of the line dimensions of the first line of the cue), or whether it is a percentage of the dimension of the video.
Either a number giving the position of the lines of the cue, to be interpreted as defined by the writing direction and snap-to-lines flag of the cue, or the special value auto, which means the position is to depend on the other active tracks.
A text track cue has a text track cue computed line position whose value is that returned by the following algorithm, which is defined in terms of the other aspects of the cue:
If the text track cue line position is numeric, the text track cue snap-to-lines flag of the text track cue is not set, and the text track cue line position is negative or greater than 100, then return 100 and abort these steps.
If the text track cue line position is numeric, return the value of the text track cue line position and abort these steps. (Either the text track cue snap-to-lines flag is set, so any value, not just those in the range 0..100, is valid, or the value is in the range 0..100 and is thus valid regardless of the value of that flag.)
If the text track cue snap-to-lines flag of the text track cue is not set, return the value 100 and abort these steps. (The text track cue line position is the special value auto.)
Let cue be the text track cue.
If cue is not in a list of cues of a text track, or if that text track is not in the list of text tracks of a media element, return −1 and abort these steps.
Let track be the text track whose list of cues the cue is in.
Let n be the number of text tracks whose text track mode is showing and that are in the media element's list of text tracks before track.
Increment n by one.
Negate n.
Return n.
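The steps above can be sketched as a small JavaScript function. The object shapes are assumptions made for illustration: a cue is modeled as { line, snapToLines, track }, and a track as { mode, mediaElement } where mediaElement.textTracks is the list of text tracks. None of these shapes is part of the API defined here.

```javascript
// Sketch of the text track cue computed line position algorithm.
// cue.line is a number or the string 'auto' (hypothetical model objects).
function computedLinePosition(cue) {
  const numeric = typeof cue.line === 'number';

  // Step 1: an out-of-range percentage falls back to 100.
  if (numeric && !cue.snapToLines && (cue.line < 0 || cue.line > 100)) {
    return 100;
  }
  // Step 2: any other numeric value is used as-is.
  if (numeric) {
    return cue.line;
  }
  // Step 3: 'auto' without snap-to-lines is 100.
  if (!cue.snapToLines) {
    return 100;
  }
  // Steps 4-5: the cue must be in a track that is in a media element's list.
  const track = cue.track;
  const tracks = track && track.mediaElement
    ? track.mediaElement.textTracks
    : null;
  if (!tracks || !tracks.includes(track)) {
    return -1;
  }
  // Steps 6-7: count the showing tracks that come before this track.
  let n = 0;
  for (const t of tracks) {
    if (t === track) break;
    if (t.mode === 'showing') n += 1;
  }
  // Steps 8-10: increment, negate, return.
  return -(n + 1);
}
```

The negative results place automatic cues from successive showing tracks on successive lines counted from the end, so that they do not start out on the same line.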
A number giving the position of the text of the cue within each line, to be interpreted as a percentage of the video, as defined by the writing direction.
A number giving the size of the box within which the text of each line of the cue is to be aligned, to be interpreted as a percentage of the video, as defined by the writing direction.
An alignment for the text of each line of the cue, either start alignment (the text is aligned towards its start side), middle alignment (the text is aligned centered between its start and end sides), or end alignment (the text is aligned towards its end side). Which sides are the start and end sides depends on the Unicode bidirectional algorithm and the writing direction. [BIDI]
The raw text of the cue, and rules for its interpretation, allowing the text to be rendered and converted to a DOM fragment.
Each text track cue has a corresponding
TextTrackCue object. A text track cue's in-memory representation
can be dynamically changed through this TextTrackCue API.
In addition, each text track cue has two pieces of dynamic information:
This flag must be initially unset. The flag is used to ensure events are fired appropriately when the cue becomes active or inactive, and to make sure the right cues are rendered.
The user agent must synchronously unset this flag whenever the
text track cue is removed from its text track's text track list of cues;
whenever the text track itself is removed from its media element's list of text tracks or has its
text track mode changed to disabled; and whenever the media element's 
readyState is changed back to HAVE_NOTHING. When the flag is unset
in this way for one or more cues in text tracks that were
showing
prior to the relevant incident, the user agent must, after having
unset the flag for all the affected cues, apply the rules for
updating the text track rendering of those text tracks. For example,
for text tracks based on
WebVTT,
the 
rules for updating the display of WebVTT text tracks. [WEBVTT]
This is used as part of the rendering model, to keep cues in a consistent position. It must initially be empty. Whenever the text track cue active flag is unset, the user agent must empty the text track cue display state.
The text track cues of a media element's text tracks are ordered relative to each other in the text track cue order, which is determined as follows: first group the cues by their text track, with the groups being sorted in the same order as their text tracks appear in the media element's list of text tracks; then, within each group, cues must be sorted by their start time, earliest first; then, any cues with the same start time must be sorted by their end time, latest first; and finally, any cues with identical end times must be sorted in the order they were last added to their respective text track list of cues, oldest first (so e.g. for cues from a WebVTT file, that would initially be the order in which the cues were listed in the file). [WEBVTT]
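The text track cue order can be expressed as a comparator. The sketch below assumes each cue is a plain object with hypothetical fields trackIndex (position of its track in the list of text tracks), startTime, endTime, and addedOrder (a counter recording when the cue was last added to its list of cues); these fields are illustrative and are not exposed by the API.

```javascript
// Comparator for the text track cue order (model objects, not real cues).
function compareCues(a, b) {
  return (a.trackIndex - b.trackIndex)   // group by track, in list order
      || (a.startTime - b.startTime)     // earliest start time first
      || (b.endTime - a.endTime)         // latest end time first
      || (a.addedOrder - b.addedOrder);  // oldest addition first
}

// Returns a new array, leaving the input untouched.
function sortInCueOrder(cues) {
  return [...cues].sort(compareCues);
}
```

Each subtraction yields 0 when the two values are equal, so the || chain falls through to the next criterion only on a tie.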
A media-resource-specific text track is a text track that corresponds to data found in the media resource.
interface TextTrackList : EventTarget {
  readonly attribute unsigned long length;
  getter TextTrack (unsigned long index);
           attribute EventHandler onaddtrack;
           attribute EventHandler onremovetrack;
};
textTracks . length
Returns the number of text tracks associated with the media element (e.g. from track elements). This is the number of text tracks in the media element's list of text tracks.
textTracks[ n ]
Returns the TextTrack object representing the nth text track in the media element's list of text tracks.
track
Returns the TextTrack object representing the track element's text track.
enum TextTrackMode { "disabled", "hidden", "showing" };
interface TextTrack : EventTarget {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  readonly attribute DOMString inBandMetadataTrackDispatchType;
           attribute TextTrackMode mode;
  readonly attribute TextTrackCueList? cues;
  readonly attribute TextTrackCueList? activeCues;
  void addCue(TextTrackCue cue);
  void removeCue(TextTrackCue cue);
           attribute EventHandler oncuechange;
};
addTextTrack( kind [, label [, language ] ] )
Creates and returns a new TextTrack object, which is also added to the media element's list of text tracks.
kind
Returns the text track kind string.
label
Returns the text track label, if there is one, or the empty string otherwise (indicating that a custom label probably needs to be generated from the other attributes of the object if the object is exposed to the user).
language
Returns the text track language string.
inBandMetadataTrackDispatchType
Returns the text track in-band metadata track dispatch type string.
mode [ = value ]
Returns the text track mode, represented by a string from the following list:
"disabled": The text track disabled mode.
"hidden": The text track hidden mode.
"showing": The text track showing mode.
Can be set, to change the mode.
cues
Returns the text track list of cues, as a TextTrackCueList object.
activeCues
Returns the text track cues from the text track list of cues that are currently active (i.e. that start before the current playback position and end after it), as a TextTrackCueList object.
addCue( cue )
Adds the given cue to textTrack's text track list of cues.
removeCue( cue )
Removes the given cue from textTrack's text track list of cues.
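As a rough illustration of what activeCues reports, the sketch below filters a list of cues against a playback position. It works on plain model objects rather than real TextTrack objects. Two points are assumptions of this sketch and are not stated in this section: a cue whose start time equals the current position is treated as active while one whose end time equals it is not, and a disabled track yields null (mirroring the nullable type in the IDL).

```javascript
// Hypothetical model: track = { mode, cues: [{ startTime, endTime, ... }] }.
function activeCuesOf(track, currentTime) {
  // Assumption: a disabled track exposes no cue lists at all.
  if (track.mode === 'disabled') {
    return null;
  }
  // Assumption: start is inclusive, end is exclusive.
  return track.cues.filter(
    (cue) => cue.startTime <= currentTime && cue.endTime > currentTime
  );
}
```

With real objects, a page would typically read track.activeCues from an oncuechange handler instead of recomputing the set itself.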
In this example, an audio
element is used to play a specific sound-effect from a sound file
containing many sound effects. A cue is used to pause the audio, so
that it ends exactly at the end of the clip, even if the browser is
busy running some script. If the page had relied on script to pause
the audio, then the start of the next clip might be heard if the
browser was not able to run the script at the exact time
specified.
var sfx = new Audio('sfx.wav');
var sounds = sfx.addTextTrack('metadata');
// add sounds we care about
function addFX(start, end, name) {
  var cue = new TextTrackCue(start, end, '');
  cue.id = name;
  cue.pauseOnExit = true;
  sounds.addCue(cue);
}
addFX(12.783, 13.612, 'dog bark');
addFX(13.612, 15.091, 'kitten mew');
function playSound(id) {
  sfx.currentTime = sounds.cues.getCueById(id).startTime;
  sfx.play();
}
// play a bark as soon as we can
sfx.oncanplaythrough = function () {
  playSound('dog bark');
}
// meow when the user tries to leave
window.onbeforeunload = function () {
  playSound('kitten mew');
  return 'Are you sure you want to leave this awesome page?';
}
interface TextTrackCueList {
  readonly attribute unsigned long length;
  getter TextTrackCue (unsigned long index);
  TextTrackCue? getCueById(DOMString id);
};
length
Returns the number of cues in the list.
Returns the text track cue with index index in the list. The cues are sorted in text track cue order.
getCueById( id )
Returns the first text track cue (in text track cue order) with text track cue identifier id.
Returns null if none of the cues have the given identifier or if the argument is the empty string.
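The lookup rule for getCueById() is simple enough to model directly. The function below is a sketch over a plain array that is assumed to be already sorted in text track cue order; the name findCueById is hypothetical and chosen to avoid confusion with the real method.

```javascript
// Model of TextTrackCueList.getCueById() over a plain, pre-sorted array.
function findCueById(cuesInOrder, id) {
  // The empty string never matches, even if a cue has an empty identifier.
  if (id === '') {
    return null;
  }
  // Array.prototype.find returns the first match, or undefined if none.
  const match = cuesInOrder.find((cue) => cue.id === id);
  return match === undefined ? null : match;
}
```

Because the method can return null, callers that dereference the result directly (for example to read startTime) should be sure the identifier exists.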
enum AutoKeyword { "auto" };
[Constructor(double startTime, double endTime, DOMString text)]
interface TextTrackCue : EventTarget {
  readonly attribute TextTrack? track;
           attribute DOMString id;
           attribute double startTime;
           attribute double endTime;
           attribute boolean pauseOnExit;
           attribute DOMString vertical;
           attribute boolean snapToLines;
           attribute (long or AutoKeyword) line;
           attribute long position;
           attribute long size;
           attribute DOMString align;
           attribute DOMString text;
  DocumentFragment getCueAsHTML();
           attribute EventHandler onenter;
           attribute EventHandler onexit;
};
TextTrackCue( startTime, endTime, text )
Returns a new TextTrackCue object, for use with the addCue() method.
The startTime argument sets the text track cue start time.
The endTime argument sets the text track cue end time.
The text argument sets the text track cue text.
Returns the TextTrack object to which this text track cue belongs, if any, or null
otherwise.
Returns the text track cue identifier.
Can be set.
Returns the text track cue start time, in seconds.
Can be set.
Returns the text track cue end time, in seconds.
Can be set.
Returns true if the text track cue pause-on-exit flag is set, false otherwise.
Can be set.
Returns a string representing the text track cue writing direction, as follows:
If it is horizontal: the empty string.
If it is vertical growing left: the string "rl".
If it is vertical growing right: the string "lr".
Can be set.
Returns true if the text track cue snap-to-lines flag is set, false otherwise.
Can be set.
Returns the text track cue line
position. In the case of the value being auto, the string
"auto" is returned.
Can be set.
Returns the text track cue text position.
Can be set.
Returns the text track cue size.
Can be set.
Returns a string representing the text track cue alignment, as follows:
If it is start alignment: the string "start".
If it is middle alignment: the string "middle".
If it is end alignment: the string "end".
Can be set.
Returns the text track cue text in raw unparsed form.
Can be set.
Returns the text track cue text as a
DocumentFragment
of HTML elements and other DOM nodes.
Chapters are segments of a media resource with a given title. Chapters can be nested, in the same way that sections in a document outline can have subsections.
Each text track cue in a text track being used for describing chapters has three key features: the text track cue start time, giving the start time of the chapter, the text track cue end time, giving the end time of the chapter, and the text track cue text giving the chapter title.
The following snippet of a WebVTT file shows how nested chapters can be marked up. The file describes three 50-minute chapters, "Astrophysics", "Computational Physics", and "General Relativity". The first has three subchapters, the second has four, and the third has two. [WEBVTT]
WEBVTT

00:00:00.000 --> 00:50:00.000
Astrophysics

00:00:00.000 --> 00:10:00.000
Introduction to Astrophysics

00:10:00.000 --> 00:45:00.000
The Solar System

00:45:00.000 --> 00:50:00.000
Coursework Description

00:50:00.000 --> 01:40:00.000
Computational Physics

00:50:00.000 --> 00:55:00.000
Introduction to Programming

00:55:00.000 --> 01:30:00.000
Data Structures

01:30:00.000 --> 01:35:00.000
Answers to Last Exam

01:35:00.000 --> 01:40:00.000
Coursework Description

01:40:00.000 --> 02:30:00.000
General Relativity

01:40:00.000 --> 02:00:00.000
Tensor Algebra

02:00:00.000 --> 02:30:00.000
The General Relativistic Field Equations
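To illustrate how nesting can be recovered from start and end times, here is a simplified sketch that turns a flat list of chapter cues into a tree. It is not the algorithm a user agent is required to use: it simply treats a cue as a subchapter of the nearest earlier cue whose time range fully contains it. The function name buildChapterTree and the node shape are hypothetical.

```javascript
// cues: array of { startTime, endTime, text }, times in seconds.
function buildChapterTree(cues) {
  // Earliest start first; on ties, the longer (enclosing) cue comes first.
  const sorted = [...cues].sort(
    (a, b) => a.startTime - b.startTime || b.endTime - a.endTime
  );
  const roots = [];
  const open = []; // chain of currently enclosing chapters, outermost first

  for (const cue of sorted) {
    const node = {
      title: cue.text,
      startTime: cue.startTime,
      endTime: cue.endTime,
      children: [],
    };
    // Drop enclosing candidates that do not fully contain this cue.
    while (open.length > 0) {
      const top = open[open.length - 1];
      if (top.startTime <= node.startTime && node.endTime <= top.endTime) break;
      open.pop();
    }
    const siblings = open.length > 0 ? open[open.length - 1].children : roots;
    siblings.push(node);
    open.push(node);
  }
  return roots;
}
```

Applied to cues shaped like the file above, each 50-minute cue becomes a top-level chapter and the shorter cues inside its range become its children.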
The controls attribute is a
boolean attribute. If present, it
indicates that the author has not provided a scripted controller
and would like the user agent to provide its own set of
controls.
volume [ = value ]
Returns the current playback volume, as a number in the range 0.0 to 1.0, where 0.0 is the quietest and 1.0 the loudest.
Can be set, to change the volume.
Throws an IndexSizeError if the new value is not in the range 0.0 .. 1.0.
muted [ = value ]
Returns true if audio is muted, overriding the volume attribute, and false if the volume attribute is being honored.
Can be set, to change whether the audio is muted or not.
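The interaction between volume and muted can be sketched with a small model. The state object and the function names are hypothetical, and a plain Error whose name is set to 'IndexSizeError' stands in for the real exception so that the sketch runs outside a browser. Treating a muted element as having an effective volume of zero is an interpretation of "overriding the volume attribute", not wording from this section.

```javascript
// Hypothetical state object: { volume: number, muted: boolean }.
function setVolume(state, value) {
  // Written so that NaN is rejected along with out-of-range numbers.
  if (!(value >= 0.0 && value <= 1.0)) {
    const err = new Error('volume must be in the range 0.0 to 1.0');
    err.name = 'IndexSizeError'; // stand-in for the real exception type
    throw err;
  }
  state.volume = value;
}

// Assumption: muting overrides the volume without changing its stored value.
function effectiveVolume(state) {
  return state.muted ? 0 : state.volume;
}
```

Keeping the stored volume untouched while muted means that unmuting restores the level the user last chose.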
The muted attribute on media elements is a
boolean attribute that controls the
default state of the audio output of the media resource, potentially overriding user
preferences.
This attribute has no dynamic effect (it only controls the default state of the element).
This video (an advertisement) autoplays, but to avoid annoying users, it does so without sound, and allows the user to turn the sound on.
<video src="adverts.cgi?kind=video" controls autoplay loop muted></video>
Objects implementing the TimeRanges interface represent a list of
ranges (periods) of time.
interface TimeRanges {
  readonly attribute unsigned long length;
  double start(unsigned long index);
  double end(unsigned long index);
};
length
Returns the number of ranges in the object.
start(index)
Returns the time for the start of the range with the given index.
Throws an IndexSizeError if the index is out of range.
end(index)
Returns the time for the end of the range with the given index.
Throws an IndexSizeError if the index is out of range.
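A small stand-in for the TimeRanges interface makes the indexing rules concrete. TimeRangesModel and totalSeconds are hypothetical names used only in this sketch; real TimeRanges objects are created by the user agent (for example as the value of buffered), and a plain Error named 'IndexSizeError' stands in for the real exception.

```javascript
// Illustrative model of TimeRanges built from [start, end] pairs.
class TimeRangesModel {
  constructor(pairs) {
    this._ranges = pairs.map(([start, end]) => [start, end]);
  }
  get length() {
    return this._ranges.length;
  }
  start(index) {
    return this._at(index)[0];
  }
  end(index) {
    return this._at(index)[1];
  }
  _at(index) {
    if (!Number.isInteger(index) || index < 0 || index >= this._ranges.length) {
      const err = new Error('index is out of range');
      err.name = 'IndexSizeError'; // stand-in for the real exception type
      throw err;
    }
    return this._ranges[index];
  }
}

// Works with anything exposing length, start(i), and end(i).
function totalSeconds(ranges) {
  let total = 0;
  for (let i = 0; i < ranges.length; i += 1) {
    total += ranges.end(i) - ranges.start(i);
  }
  return total;
}
```

Because totalSeconds only relies on length, start(), and end(), the same loop could be pointed at a real object such as the buffered attribute of a media element.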
[Constructor(DOMString type, optional TrackEventInit eventInitDict)]
interface TrackEvent : Event {
  readonly attribute object? track;
};
dictionary TrackEventInit : EventInit {
  object? track;
};
track
Returns the track object (TextTrack, AudioTrack, or VideoTrack) to which the event relates.
This section is non-normative.
The following events fire on media elements as part of the processing model described above:
| Event name | Interface | Fired when... | Preconditions | 
|---|---|---|---|
| loadstart | Event | The user agent begins looking for media data, as part of the resource selection algorithm. | networkState equals NETWORK_LOADING |
| progress | Event | The user agent is fetching media data. | networkState equals NETWORK_LOADING |
| suspend | Event | The user agent is intentionally not currently fetching media data. | networkState equals NETWORK_IDLE |
| abort | Event | The user agent stops fetching the media data before it is completely downloaded, but not due to an error. | error is an object with the code MEDIA_ERR_ABORTED. networkState equals either NETWORK_EMPTY or NETWORK_IDLE, depending on when the download was aborted. |
| error | Event | An error occurs while fetching the media data. | error is an object with the code MEDIA_ERR_NETWORK or higher. networkState equals either NETWORK_EMPTY or NETWORK_IDLE, depending on when the download was aborted. |
| emptied | Event | A media element whose networkState was previously not in the NETWORK_EMPTY state has just switched to that state (either because of a fatal error during load that's about to be reported, or because the load() method was invoked while the resource selection algorithm was already running). | networkState is NETWORK_EMPTY; all the IDL attributes are in their initial states. |
| stalled | Event | The user agent is trying to fetch media data, but data is unexpectedly not forthcoming. | networkState is NETWORK_LOADING. |
| loadedmetadata | Event | The user agent has just determined the duration and dimensions of the media resource and the text tracks are ready. | readyState is newly equal to HAVE_METADATA or greater for the first time. |
| loadeddata | Event | The user agent can render the media data at the current playback position for the first time. | readyState newly increased to HAVE_CURRENT_DATA or greater for the first time. |
| canplay | Event | The user agent can resume playback of the media data, but estimates that if playback were to be started now, the media resource could not be rendered at the current playback rate up to its end without having to stop for further buffering of content. | readyState newly increased to HAVE_FUTURE_DATA or greater. |
| canplaythrough | Event | The user agent estimates that if playback were to be started now, the media resource could be rendered at the current playback rate all the way to its end without having to stop for further buffering. | readyState is newly equal to HAVE_ENOUGH_DATA. |
| playing | Event | Playback is ready to start after having been paused or delayed due to lack of media data. | readyState is newly equal to or greater than HAVE_FUTURE_DATA and paused is false, or paused is newly false and readyState is equal to or greater than HAVE_FUTURE_DATA. Even if this event fires, the element might still not be potentially playing, e.g. if the element is blocked on its media controller (e.g. because the current media controller is paused, or another slaved media element is stalled somehow, or because the media resource has no data corresponding to the media controller position), or the element is paused for user interaction or paused for in-band content. |
| waiting | Event | Playback has stopped because the next frame is not available, but the user agent expects that frame to become available in due course. | readyState is equal to or less than HAVE_CURRENT_DATA, and paused is false. Either seeking is true, or the current playback position is not contained in any of the ranges in buffered. It is possible for playback to stop for other reasons without paused being false, but those reasons do not fire this event (and when those situations resolve, a separate playing event is not fired either): e.g. the element is newly blocked on its media controller, or playback ended, or playback stopped due to errors, or the element has paused for user interaction or paused for in-band content. |
| seeking | Event | The seeking IDL attribute changed to true. | |
| seeked | Event | The seeking IDL attribute changed to false. | |
| ended | Event | Playback has stopped because the end of the media resource was reached. | currentTime equals the end of the media resource; ended is true. |
| durationchange | Event | The duration attribute has just been updated. | |
| timeupdate | Event | The current playback position changed as part of normal playback or in an especially interesting way, for example discontinuously. | |
| play | Event | The element is no longer paused. Fired after the play() method has returned, or when the autoplay attribute has caused playback to begin. | paused is newly false. |
| pause | Event | The element has been paused. Fired after the pause() method has returned. | paused is newly true. |
| ratechange | Event | Either the defaultPlaybackRate or the playbackRate attribute has just been updated. | |
| volumechange | Event | Either the volume attribute or the muted attribute has changed. Fired after the relevant attribute's setter has returned. | |
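One way to get a feel for the order in which these events arrive is to log them. The helper below is a sketch: logMediaEvents and MEDIA_EVENT_NAMES are hypothetical names, and the function accepts any object with addEventListener() and removeEventListener(), so it can be tried with a bare EventTarget as well as with a media element.

```javascript
// Event names taken from the table above.
const MEDIA_EVENT_NAMES = [
  'loadstart', 'progress', 'suspend', 'abort', 'error', 'emptied', 'stalled',
  'loadedmetadata', 'loadeddata', 'canplay', 'canplaythrough', 'playing',
  'waiting', 'seeking', 'seeked', 'ended', 'durationchange', 'timeupdate',
  'play', 'pause', 'ratechange', 'volumechange',
];

// Registers one listener per event name and returns a function that
// removes them all again.
function logMediaEvents(target, log = console.log) {
  const handler = (event) => log(event.type);
  for (const name of MEDIA_EVENT_NAMES) {
    target.addEventListener(name, handler);
  }
  return function stopLogging() {
    for (const name of MEDIA_EVENT_NAMES) {
      target.removeEventListener(name, handler);
    }
  };
}
```

In a page, the target would typically be an audio or video element, and the logged sequence can be compared against the "Fired when..." and "Preconditions" columns.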
The following events fire on MediaController objects:
| Event name | Interface | Fired when... | 
|---|---|---|
| emptied | Event | All the slaved media elements newly have readyState set to HAVE_NOTHING or greater, or there are no longer any slaved media elements. |
| loadedmetadata | Event | All the slaved media elements newly have readyState set to HAVE_METADATA or greater. |
| loadeddata | Event | All the slaved media elements newly have readyState set to HAVE_CURRENT_DATA or greater. |
| canplay | Event | All the slaved media elements newly have readyState set to HAVE_FUTURE_DATA or greater. |
| canplaythrough | Event | All the slaved media elements newly have readyState set to HAVE_ENOUGH_DATA or greater. |
| playing | Event | The MediaController is no longer a blocked media controller. |
| ended | Event | The MediaController has reached the end of all the slaved media elements. |
| waiting | Event | The MediaController is now a blocked media controller. |
| ended | Event | All the slaved media elements have newly ended playback. |
| durationchange | Event | The duration attribute has just been updated. |
| timeupdate | Event | The media controller position changed. |
| play | Event | The paused attribute is newly false. |
| pause | Event | The paused attribute is newly true. |
| ratechange | Event | Either the defaultPlaybackRate attribute or the playbackRate attribute has just been updated. |
| volumechange | Event | Either the volume attribute or the muted attribute has just been updated. |
This section is non-normative.
Playing audio and video resources on small devices such as
set-top boxes or mobile phones is often constrained by limited
hardware resources in the device. For example, a device might only
support three simultaneous videos. For this reason, it is a good
practice to release resources held by media elements when
they are done playing, either by being very careful about removing
all references to the element and allowing it to be garbage
collected, or, even better, by removing the element's src attribute and any source element descendants, and invoking
the element's load()
method.
Similarly, when the playback rate is not exactly 1.0, hardware, software, or format limitations can cause video frames to be dropped and audio to be choppy or muted.
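The resource-release practice described above (removing the src attribute and any source element descendants, then invoking load()) could look roughly like this. The function name releaseMediaResources is hypothetical; the sketch only uses removeAttribute(), querySelectorAll(), remove(), and load(), and assumes the argument is a media element or an object shaped like one.

```javascript
// Sketch: ask a media element to let go of its media resource.
function releaseMediaResources(media) {
  // Remove the element's own src attribute.
  media.removeAttribute('src');
  // Remove any source element descendants. The list is copied first so
  // that removing nodes cannot disturb the iteration.
  for (const source of Array.from(media.querySelectorAll('source'))) {
    source.remove();
  }
  // With no source left, load() resets the element.
  media.load();
}
```

A page might call this from an ended handler for clips that will not be replayed, so that a constrained device can reuse its decoders for other media elements.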