This Wiki page is edited by participants of the HTML Accessibility Task Force. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Task Force participants, WAI, or W3C. It may also have some very useful information.
Media Multiple Text Tracks API
JavaScript API for a Multitrack Media Resource
Summary
This is a proposal to introduce a JavaScript API for HTML5 media elements that allows Web authors to determine what data is available from a media resource. It exposes the tracks that the resource contains, the type of data in each track (e.g. audio/vorbis, text/srt, video/theora), the role this data plays (e.g. audio description, caption, sign language), and the language it is in (an RFC 3066 language code). It also enables control over the activation state of each track.
Rationale
The HTML5 video element specification right now states (bold parts added):
To make video and audio content accessible to people who are blind, deaf, or have other physical or cognitive disabilities, authors are expected to provide alternative media streams and/or to embed accessibility aids (such as caption or subtitle tracks, audio description tracks, or sign-language overlays) into their media streams.
User agents should provide controls to enable or disable the display of closed captions, audio description tracks, and other additional data associated with the video stream, though such features should, again, not interfere with the page's normal rendering.
If the controls attribute is present, or if scripting is disabled for the media element, then the user agent should expose a user interface to the user. This user interface should include features to begin playback, pause playback, seek to an arbitrary position in the content (if the content supports arbitrary seeking), change the volume, change the display of closed captions or embedded sign-language tracks, select different audio tracks or turn on audio descriptions, and show the media content in manners more suitable to the user (e.g. full-screen video or in an independent resizable window). Other controls may also be made available.
Currently, none of the released browsers that support the media elements expose controls for accessibility data contained inside media resources. Only experimental approaches to exposing UI controls for accessibility data in tracks of media resources are available. One reason for the lack of such controls is that there is no defined JavaScript API for these kinds of tracks.
The built-in video controller in top-of-tree WebKit on OS X and Windows has a button that toggles display of closed caption tracks in QuickTime and MPEG-4 movies. WebKit also has an experimental, "webkit"-prefixed JavaScript API to detect and toggle the display of closed captions, so custom media controls can also control the display of captions. These features are available for experimentation in WebKit nightly builds.
Similarly, there is a patch for Firefox that adds a menu to a video's controls for all subtitle tracks, but there is no JavaScript equivalent to it.
This proposal introduces a JavaScript API for extracting basic information about the tracks contained inside a media resource (audio or video), which may include audio descriptions, sign language video, closed captions or subtitles. It also encourages browser developers to expose visual and accessible controls for activating and deactivating tracks in multitrack media resources.
Draft Proposal
The HTMLMediaElement is extended with a tracks attribute that allows access to the tracks of a loaded (state: HAVE_METADATA) media resource:
interface HTMLMediaElement : HTMLElement {
  ...
  readonly attribute MediaTracks tracks;
  ...
};
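Because the tracks attribute is tied to a loaded resource (state: HAVE_METADATA), a script would probably have to wait for that state before reading it. The following is a minimal sketch, not part of the proposal: whenTracksReady is a hypothetical helper name, and fakeVideo is a hand-written stand-in, since no released browser implements tracks as proposed here. It relies only on the existing readyState attribute and loadedmetadata event of HTML5 media elements.

```javascript
// HTMLMediaElement.HAVE_METADATA has the numeric value 1 in HTML5.
const HAVE_METADATA = 1;

// Hypothetical helper: calls callback(media.tracks) once metadata is loaded.
function whenTracksReady(media, callback) {
  if (media.readyState >= HAVE_METADATA) {
    callback(media.tracks);
    return;
  }
  const onLoaded = () => {
    media.removeEventListener("loadedmetadata", onLoaded);
    callback(media.tracks);
  };
  media.addEventListener("loadedmetadata", onLoaded);
}

// Hand-written stand-in for a media element, for illustration only.
const fakeVideo = {
  readyState: 0, // HAVE_NOTHING
  tracks: [],
  listeners: {},
  addEventListener(type, fn) {
    (this.listeners[type] = this.listeners[type] || []).push(fn);
  },
  removeEventListener(type, fn) {
    this.listeners[type] = (this.listeners[type] || []).filter((f) => f !== fn);
  },
  // Simulates the browser finishing metadata loading.
  finishLoadingMetadata(tracks) {
    this.tracks = tracks;
    this.readyState = HAVE_METADATA;
    (this.listeners["loadedmetadata"] || []).slice().forEach((fn) => fn());
  },
};

let seenTrackCount = -1;
whenTracksReady(fakeVideo, (tracks) => { seenTrackCount = tracks.length; });
fakeVideo.finishLoadingMetadata([{ role: "caption" }, { role: "subtitle" }]);
```

In a page, the same helper would be passed a real video or audio element in place of fakeVideo, assuming the browser exposes the proposed attribute.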
The tracks are provided in a list with indexed access (similar to HTMLCollection, but members are MediaTrack, not a subclass of HTMLElement):
interface MediaTracks {
  readonly attribute unsigned long length;
  caller getter MediaTrack item(in unsigned long index);
  ...
};
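A minimal sketch of walking the list through length and item(). The names makeTrackList and collectTracks are hypothetical, and the list built here only imitates the IDL above so that the loop can run outside a browser.

```javascript
// Hypothetical stand-in that imitates the proposed MediaTracks interface:
// a length, an item(index) method, and indexed access.
function makeTrackList(tracks) {
  const list = {
    length: tracks.length,
    // Returning null for an out-of-range index is an assumption,
    // borrowed from how HTMLCollection behaves.
    item(index) {
      return index < tracks.length ? tracks[index] : null;
    },
  };
  tracks.forEach((track, i) => { list[i] = track; });
  return list;
}

// Copies the tracks into a plain array so that array methods can be used.
function collectTracks(trackList) {
  const result = [];
  for (let i = 0; i < trackList.length; i++) {
    result.push(trackList.item(i));
  }
  return result;
}

const list = makeTrackList([
  { name: "Main video", role: "main", type: "video/theora" },
  { name: "English captions", role: "caption", type: "text/srt" },
]);
const names = collectTracks(list).map((track) => track.name);
```

Copying into an array is a convenience chosen for this sketch; the proposal itself only promises length and indexed access.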
A track is composed of several attributes, most of which are read-only. A track can be either enabled (enabled=true) or disabled (enabled=false).
interface MediaTrack {
  readonly attribute DOMString name;
  readonly attribute DOMString src;
  readonly attribute DOMString role;
  readonly attribute DOMString type;
  readonly attribute DOMString language;
           attribute boolean enabled;
  ...
};
track . name
    Returns a DOMString with the name of the track if available.
track . src
    Returns a DOMString that identifies the source resource of the track: for external resources it is their URI, for internal ones it is the
    URI of the complete resource. In the future, track addressing via Media Fragments may become possible
    (see http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/).
track . role
    Returns the function of the track, e.g. "caption", "subtitle", "sign", or "audiodesc".
track . type
    Returns a valid MIME type, e.g. text/srt, or application/ttaf+xml.
track . language
    Returns the language of the track. If present, the value must be a valid RFC 3066 language code.
track . enabled = true / false
    Enables / disables the track for display.
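The attributes above would be enough to build a track menu for custom controls. The following is a sketch under that assumption: describeTrack and buildMenuModel are hypothetical helpers, and the track objects are plain literals imitating MediaTrack, not objects obtained from any browser.

```javascript
// Hypothetical helper: builds a human-readable label from the proposed attributes.
function describeTrack(track) {
  const parts = [track.name || "(unnamed track)", track.role];
  if (track.language) parts.push(track.language);
  return parts.join(" / ");
}

// Returns one menu entry per track; toggle flips the writable enabled attribute.
function buildMenuModel(tracks) {
  const entries = [];
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    entries.push({
      label: describeTrack(track),
      checked: () => track.enabled,
      toggle: () => { track.enabled = !track.enabled; },
    });
  }
  return entries;
}

// Plain objects standing in for MediaTrack instances.
const tracks = [
  { name: "English captions", src: "movie.ogv", role: "caption",
    type: "text/srt", language: "en", enabled: false },
  { name: "", src: "movie.ogv", role: "audiodesc",
    type: "audio/vorbis", language: "en", enabled: false },
];
const menu = buildMenuModel(tracks);
menu[0].toggle(); // enables the caption track
```

The menu entries hold no DOM nodes on purpose, so the same model could back either a visual menu or an accessible one.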
Note: The mapping from container format information to the values of the @role API will need to be defined, in particular for MPEG and Ogg; this is outside the HTML WG's efforts - the spec only provides a standard set of roles that container formats map to in order to be compatible.
(Note: we may later want to introduce a @media attribute:
track . media
    Returns the media attribute value associated with the track, which for multitrack media resources is the one attributed to the media element's
    source element, if available. A media query that evaluates to "false" means the track *must* not be enabled because it is not appropriate for the user's
    environment, so trying to enable it will throw a NOT_SUPPORTED_ERR.
)
Available Roles
The following roles have been defined so far (the text roles are also mentioned at http://www.w3.org/WAI/PF/HTML/wiki/Media_TextAssociations#The_Markup):
Text tracks:
- "caption"
- "subtitle"
- "textaudiodesc"
- "karaoke"
- "chapters"
- "tickertext"
- "lyrics"
Video tracks:
- "main"
- "alternate" (e.g. different camera angle)
- "sign" (for sign language)
Audio tracks:
- "main"
- "alternate" (probably linked to an alternate video track)
- "dub"
- "audiodesc"
- "music"
- "sfx" (sound effects)
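For scripts that need to check a role value, the lists above can be written down as a lookup table. This is only a sketch: KNOWN_ROLES and isKnownRole are hypothetical names, and the caller is assumed to already know whether a track is text, video, or audio, since "main" and "alternate" occur in more than one list.

```javascript
// The roles listed above, grouped by kind of track.
const KNOWN_ROLES = {
  text: ["caption", "subtitle", "textaudiodesc", "karaoke", "chapters", "tickertext", "lyrics"],
  video: ["main", "alternate", "sign"],
  audio: ["main", "alternate", "dub", "audiodesc", "music", "sfx"],
};

// Hypothetical helper: true if role is one of the roles listed for that kind.
// kind is expected to be "text", "video", or "audio"; anything else yields false.
function isKnownRole(kind, role) {
  const roles = KNOWN_ROLES[kind];
  return Array.isArray(roles) && roles.includes(role);
}
```

For example, isKnownRole("video", "sign") holds, while isKnownRole("video", "caption") does not, because "caption" is listed only for text tracks.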
Example use
 if (video.tracks[1].role == "caption") video.tracks[1].enabled = true;
     enables a caption track
 if (video.tracks[2].role == "subtitle" && video.tracks[2].language == "fr") video.tracks[2].enabled = true;
     enables a French subtitle track
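The indices in the examples above are hard-coded, and track order will differ between resources. A hedged alternative is to scan the list instead. Here enableFirstMatch and languageMatches are hypothetical helpers, the track objects are plain literals imitating MediaTrack, and the prefix match on the language code is a simplification chosen for this sketch, not something the proposal specifies.

```javascript
// Hypothetical helper: enables the first track with the given role and,
// optionally, language. Returns the index of that track, or -1 if none matches.
function enableFirstMatch(tracks, role, language) {
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track.role !== role) continue;
    if (language !== undefined && !languageMatches(track.language, language)) continue;
    track.enabled = true;
    return i;
  }
  return -1;
}

// Simplified matching: "fr" matches "fr" and "fr-CA", compared case-insensitively.
function languageMatches(trackLanguage, wanted) {
  const have = (trackLanguage || "").toLowerCase();
  const want = wanted.toLowerCase();
  return have === want || have.startsWith(want + "-");
}

// Plain objects standing in for MediaTrack instances.
const tracks = [
  { role: "main", language: "en", enabled: true },
  { role: "caption", language: "en", enabled: false },
  { role: "subtitle", language: "de", enabled: false },
  { role: "subtitle", language: "fr-CA", enabled: false },
];
const captionIndex = enableFirstMatch(tracks, "caption");
const frenchIndex = enableFirstMatch(tracks, "subtitle", "fr");
const missingIndex = enableFirstMatch(tracks, "sign");
```

In a page this would be called as enableFirstMatch(video.tracks, "subtitle", "fr"), assuming the browser exposes the proposed tracks attribute.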
Impact
Positive Effects
- Web authors will be able to find out whether accessibility data is contained inside the media resource and react accordingly, e.g. by displaying a menu for activating each track, or by automatically enabling a certain track based on the user's known accessibility needs
Negative Effects
- the API of the media element gets more complicated
References
HTML Working Group Issue
HTML ISSUE-152 Handling of additional tracks of a multitrack audio/video resource
Related Bugs
Bug 9452: Handling of additional tracks of a multitrack audio/video resource
Bug 8659: Media events to indicate captions and audio descriptions
Bug 5758: Insufficient accessibility fallback for <audio> or <video>

