This Wiki page is edited by participants of the HTML Accessibility Task Force. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Task Force participants, WAI, or W3C. It may also have some very useful information.

Media Multitrack Change Proposals Summary

From HTML accessibility task force Wiki

Resolution due: 22nd April

Four change proposals have currently been put forward to introduce support for multiple in-band and out-of-band audio-visual tracks for the audio and video elements. This page compares the proposals, setting out their differences, advantages and disadvantages, with the expectation that a common change proposal can be arrived at.


Previously discussed ideas:

Actual change proposals submitted:

  1. Expose in-band audio tracks
  2. Master with slave media and text tracks
  3. Master with slave media tracks
  4. Controller proposal

Agreements between the proposals

Although it may not be obvious, the existing change proposals have a lot in common, and they share some fundamental decisions that rule out other solutions.

For example, measured against the full list of discussed options, the existing proposals agree on:

  • exposing in-band tracks, which is a necessity
  • not overloading the <track> element for audio and video
  • not introducing further elements underneath audio and video for media tracks
  • not introducing a meta-element for grouping of media resources
  • using an attribute on the audio or video element for grouping of media resources

This is a big step forward from initial test designs.

Discussing the proposals

Proposal 1: Expose in-band audio tracks

  • only provides a solution for in-band audio tracks
  • is incorporated mostly in the other proposals
  • make sure the ability to find out about in-band audio tracks is provided


interface HTMLMediaElement : HTMLElement {

  // audio tracks
  readonly attribute unsigned long audioTrackCount;
  readonly attribute DOMString audioTrackLanguage[];
           attribute unsigned long currentAudioTrack;
};
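
A minimal sketch of how a page script might use the three attributes listed above to pick an audio track by language. The `InBandAudioTracks` type and the `selectAudioTrackByLanguage` helper are hypothetical names used only for illustration, the prefix matching of language tags is an assumption, and a plain object stands in for a media element because no implementation of this interface is assumed.

```typescript
// Hypothetical stand-in mirroring only the three attributes proposed above.
interface InBandAudioTracks {
  readonly audioTrackCount: number;
  readonly audioTrackLanguage: readonly string[];
  currentAudioTrack: number;
}

// Switches to the first audio track whose language equals `lang` or starts
// with `lang` followed by "-" (assumed matching rule, case-insensitive).
// Returns the selected index, or -1 if nothing matched and nothing changed.
function selectAudioTrackByLanguage(media: InBandAudioTracks, lang: string): number {
  const wanted = lang.toLowerCase();
  for (let i = 0; i < media.audioTrackCount; i++) {
    const have = (media.audioTrackLanguage[i] || "").toLowerCase();
    if (have === wanted || have.indexOf(wanted + "-") === 0) {
      media.currentAudioTrack = i;
      return i;
    }
  }
  return -1;
}

// Usage with a mock object standing in for a media element.
const mockMedia: InBandAudioTracks = {
  audioTrackCount: 3,
  audioTrackLanguage: ["en-US", "de", "fr-CA"],
  currentAudioTrack: 0,
};
const chosenTrack = selectAudioTrackByLanguage(mockMedia, "fr");
console.log(chosenTrack, mockMedia.currentAudioTrack);
```

Because only languages are exposed, a script following this sketch cannot tell a language dub from an audio description in the same language.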

Proposal 2: Master with slave media and text tracks

  • a master video is defined, and slaves are identified through the @timeline attribute
  • makes all track types equally independent from the main resource: text, video, audio
  • external tracks are rendered wherever they appear on the page, and CSS is required to move them into position
  • allows cues to exist without a video element to render into
  • in-band tracks can be made explicit to the page through the use of media fragment URIs in media elements


interface HTMLMediaElement : HTMLElement {
           attribute DOMString timeline;
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  readonly attribute DOMString name;
  readonly attribute boolean checked;

  readonly attribute TextTrack[] textTracks;
  MutableTextTrack addTextTrack(in DOMString kind, in optional DOMString label, in optional DOMString language);
};

interface CueTrack : HTMLMediaTrack {
  readonly attribute TextTrackCueList cues;
  readonly attribute TextTrackCueList activeCues;
           // event raised if a cue becomes active/inactive
           // with target being the activated/deactivated TextTrackCue
           attribute Function oncueenter;
           attribute Function oncueexit;
};
CueTrack implements EventTarget;

interface TextTrack {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;

  const unsigned short NONE = 0;
  const unsigned short LOADING = 1;
  const unsigned short LOADED = 2;
  const unsigned short ERROR = 3;
  readonly attribute unsigned short readyState;
           attribute Function onload;
           attribute Function onerror;

  const unsigned short OFF = 0;
  const unsigned short HIDDEN = 1;
  const unsigned short SHOWING = 2;
           attribute unsigned short mode;

  readonly attribute TextTrackCueList cues;
  readonly attribute TextTrackCueList activeCues;

           attribute Function oncuechange;
};
TextTrack implements EventTarget;
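
To illustrate how the `mode` constants and `activeCues` in the interface above might interact, the sketch below computes the active cues for a given playback time. The `SketchCue` and `SketchTextTrack` shapes, the `startTime`/`endTime` fields, the half-open interval test and the treatment of OFF versus HIDDEN are all assumptions for illustration; the proposal excerpt itself does not define them.

```typescript
// Mode constants as listed in the TextTrack interface above.
const OFF = 0;
const HIDDEN = 1;
const SHOWING = 2;

// Hypothetical cue and track shapes; startTime/endTime are assumptions.
interface SketchCue {
  id: string;
  startTime: number; // seconds
  endTime: number;   // seconds
}

interface SketchTextTrack {
  mode: number;
  cues: SketchCue[];
}

// Assumption: a track in OFF mode has no active cues, while HIDDEN and
// SHOWING tracks both do; a cue is active on [startTime, endTime).
function activeCuesAt(track: SketchTextTrack, time: number): SketchCue[] {
  if (track.mode === OFF) {
    return [];
  }
  return track.cues.filter((cue) => cue.startTime <= time && time < cue.endTime);
}

// Assumption: only SHOWING tracks are rendered; HIDDEN tracks stay
// available to script without being displayed.
function isRendered(track: SketchTextTrack): boolean {
  return track.mode === SHOWING;
}

const captions: SketchTextTrack = {
  mode: HIDDEN,
  cues: [
    { id: "c1", startTime: 0, endTime: 4 },
    { id: "c2", startTime: 3, endTime: 8 },
  ],
};
console.log(activeCuesAt(captions, 3.5).map((cue) => cue.id), isRendered(captions));
```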

Proposal 3: Master with slave media tracks

  • a master video is defined, and slaves are identified through the @timeline attribute
  • all slaved media elements turn their timeline-related attributes read-only; changes made on the master to currentTime, defaultPlaybackRate, playbackRate, autoplay, loop, volume and muted, and calls to play() and pause(), feed through to the slaves
  • makes audio and video track types independent from the main resource to avoid duplicating the HTMLMediaElement definition
  • external audio and video tracks are rendered wherever they appear on the page, and CSS is required to move overlay video into position
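
The feed-through behaviour described in the list above can be sketched roughly as follows. `SketchMediaElement`, `slaveTo`, `get` and `set` are hypothetical names rather than part of the proposal, only a subset of the listed attributes is modelled for brevity, and throwing on a write to a slave is just one possible reading of "readonly".

```typescript
// Subset of the timeline-related state that is fed through to slaves.
interface TimelineState {
  currentTime: number;
  playbackRate: number;
  volume: number;
  muted: boolean;
  paused: boolean;
}

class SketchMediaElement {
  private state: TimelineState = {
    currentTime: 0, playbackRate: 1, volume: 1, muted: false, paused: true,
  };
  private master: SketchMediaElement | null = null;
  private readonly slaves: SketchMediaElement[] = [];

  constructor(readonly id: string) {}

  // Slaves this element to `master` and adopts the master's current state.
  slaveTo(master: SketchMediaElement): void {
    this.master = master;
    master.slaves.push(this);
    this.state = { ...master.state };
  }

  get<K extends keyof TimelineState>(key: K): TimelineState[K] {
    return this.state[key];
  }

  // Changes are accepted on a master only and copied to every slave.
  set<K extends keyof TimelineState>(key: K, value: TimelineState[K]): void {
    if (this.master !== null) {
      throw new Error(`${this.id}: ${key} is read-only while slaved`);
    }
    this.state[key] = value;
    for (const slave of this.slaves) {
      slave.state[key] = value;
    }
  }

  play(): void { this.set("paused", false); }
  pause(): void { this.set("paused", true); }
}

const mainVideo = new SketchMediaElement("main");
const signVideo = new SketchMediaElement("sign");
signVideo.slaveTo(mainVideo);
mainVideo.set("currentTime", 12);
mainVideo.play();
console.log(signVideo.get("currentTime"), signVideo.get("paused"));
```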


interface HTMLMediaElement {
           attribute DOMString timeline;
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  readonly attribute DOMString name;
  readonly attribute boolean checked;
};

with the following meaning:

  • timeline - Synchronizes the timeline with another <audio> or <video> element. This attribute modifies the seeking behavior of the media elements to which it is applied. The timeline of an element with this attribute is slaved to that of the master, so the time and playback rate of both are always kept in sync.
  • srclang - Gives the language of the media data.
  • label - Gives a user-readable title for the track. This title is used by user agents when listing subtitle, caption, and audio description tracks in their user interface.
  • name - Marks a track as part of a mutually exclusive group: only one of the tracks in a group is ever enabled.
  • checked - A track is enabled when the media element's 'checked' attribute is set. In an exclusive group, only the first checked track is enabled.
  • kind - Lists the accessibility affordance or affordances the track satisfies.
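
As an illustration of the `name` and `checked` rules, the sketch below works out which tracks end up enabled. The `SketchTrack` shape and `enabledTrackIds` helper are hypothetical, and two points are assumptions rather than statements from the proposal: "first" is taken to mean first in list order, and a checked track without a `name` is treated as belonging to no exclusive group.

```typescript
// Hypothetical track description carrying only the attributes needed here.
interface SketchTrack {
  id: string;
  name?: string;   // exclusive group name, if any
  checked: boolean;
}

// Returns the ids of enabled tracks, in list order. Within a named group
// only the first checked track is enabled; later checked ones are skipped.
function enabledTrackIds(tracks: SketchTrack[]): string[] {
  const groupsAlreadyEnabled = new Set<string>();
  const enabled: string[] = [];
  for (const track of tracks) {
    if (!track.checked) {
      continue;
    }
    if (track.name !== undefined) {
      if (groupsAlreadyEnabled.has(track.name)) {
        continue;
      }
      groupsAlreadyEnabled.add(track.name);
    }
    enabled.push(track.id);
  }
  return enabled;
}

const exampleTracks: SketchTrack[] = [
  { id: "dub-en", name: "dub", checked: false },
  { id: "dub-de", name: "dub", checked: true },
  { id: "dub-fr", name: "dub", checked: true },
  { id: "sign", checked: true },
];
console.log(enabledTrackIds(exampleTracks));
```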

Proposal 4: Controller proposal

  • separate solution for in-band and external media elements
  • in-band tracks are explicitly present, or can be displayed through media fragment URI as tracks
  • for external grouped tracks, there is a Controller object that keeps the common state
  • because of the Controller object, all slaved media elements expose the same behaviour
  • because of the Controller object, it is not possible to get into a circular reference of two media elements being slaves to each other
  • since all media elements in a group are handled as equals, it is possible for a user to turn off the "main" video and, for example, just listen to the audio description of a different media element
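
The circular-reference point above can be made concrete with a small sketch. When each element names another element as its timeline, a loop is possible and has to be detected; when elements only name a group, there is no chain to follow. Both helpers and the element ids are hypothetical, and plain maps stand in for the attributes.

```typescript
// Follows hypothetical timeline references (slave id -> master id) starting
// at `start`; returns true if the chain ever revisits an element.
function hasTimelineCycle(timelineOf: Map<string, string>, start: string): boolean {
  const visited = new Set<string>();
  let current: string | undefined = start;
  while (current !== undefined) {
    if (visited.has(current)) {
      return true;
    }
    visited.add(current);
    current = timelineOf.get(current);
  }
  return false;
}

// With a shared controller, elements only name a group (element id -> group
// name), so membership is a flat lookup with no element-to-element chain.
function groupByMediaGroup(mediaGroupOf: Map<string, string>): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  mediaGroupOf.forEach((group, elementId) => {
    const members = groups.get(group) || [];
    members.push(elementId);
    groups.set(group, members);
  });
  return groups;
}

const mutualSlaves = new Map<string, string>([["video", "audio"], ["audio", "video"]]);
const grouped = groupByMediaGroup(new Map<string, string>([["video", "movie"], ["audio", "movie"]]));
console.log(hasTimelineCycle(mutualSlaves, "video"), grouped.get("movie"));
```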


interface HTMLMediaElement : HTMLElement {

  // media controller (for external media tracks)
           attribute DOMString mediaGroup;
           attribute MediaController controller;

  // tracks (for in-band tracks only)
  readonly attribute MultipleTrackList audioTracks;
  readonly attribute ExclusiveTrackList videoTracks;
};

interface MediaController {
  readonly attribute TimeRanges buffered;
  readonly attribute TimeRanges seekable;

  // playback state
  readonly attribute double duration;
           attribute double currentTime;

  readonly attribute boolean paused;
  readonly attribute TimeRanges played;
  void play();
  void pause();

           attribute double defaultPlaybackRate;
           attribute double playbackRate;

           attribute double volume;
           attribute boolean muted;

           attribute Function onemptied;
           attribute Function onloadedmetadata;
           attribute Function onloadeddata;
           attribute Function oncanplay;
           attribute Function oncanplaythrough;
           attribute Function onplaying;
           attribute Function onwaiting;

           attribute Function ondurationchange;
           attribute Function ontimeupdate;
           attribute Function onplay;
           attribute Function onpause;
           attribute Function onratechange;
           attribute Function onvolumechange;
};

interface TrackList {
  readonly attribute unsigned long length;
  DOMString getLabel(in unsigned long index);
  DOMString getLanguage(in unsigned long index);

           attribute Function onchange;
};

interface MultipleTrackList : TrackList {
  boolean isEnabled(in unsigned long index);
  void enable(in unsigned long index);
  void disable(in unsigned long index);
};

interface ExclusiveTrackList : TrackList {
  readonly attribute unsigned long selectedIndex;
  void select(in unsigned long index);
};

  • mediagroup content attribute is added to both audio and video to allow grouping declaratively
  • the ExclusiveTrackList makes sure that for in-band video only one video is ever displayed in the viewport; if more than one video track is to be displayed, an explicit video element using a media fragment URI has to be used
  • if one slaved media element stalls, all of them stall
  • looping is disabled on all media elements
  • playback rate of the slaves is fixed to the playback rate of the controller
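
To show the difference between the two list types, here is a rough in-memory model of TrackList, MultipleTrackList and ExclusiveTrackList. The `Sketch*` class names are hypothetical, and several behaviours are assumptions not stated in the proposal: an out-of-range index throws, `onchange` fires synchronously and only on an actual change, and the first video track is selected initially.

```typescript
interface SketchTrackInfo { label: string; language: string; }

class SketchTrackList {
  onchange: (() => void) | null = null;

  constructor(protected readonly tracks: SketchTrackInfo[]) {}

  get length(): number { return this.tracks.length; }
  getLabel(index: number): string { return this.at(index).label; }
  getLanguage(index: number): string { return this.at(index).language; }

  // Assumption: an out-of-range index is an error.
  protected at(index: number): SketchTrackInfo {
    const track = this.tracks[index];
    if (track === undefined) {
      throw new RangeError(`no track at index ${index}`);
    }
    return track;
  }

  protected changed(): void {
    if (this.onchange !== null) { this.onchange(); }
  }
}

// Any number of tracks may be enabled at once (audio).
class SketchMultipleTrackList extends SketchTrackList {
  private readonly enabled = new Set<number>();

  isEnabled(index: number): boolean {
    this.at(index);
    return this.enabled.has(index);
  }
  enable(index: number): void {
    this.at(index);
    if (!this.enabled.has(index)) { this.enabled.add(index); this.changed(); }
  }
  disable(index: number): void {
    this.at(index);
    if (this.enabled.delete(index)) { this.changed(); }
  }
}

// Exactly one track is selected at a time (video).
class SketchExclusiveTrackList extends SketchTrackList {
  private selected = 0; // assumption: the first track is selected initially

  get selectedIndex(): number { return this.selected; }
  select(index: number): void {
    this.at(index);
    if (index !== this.selected) { this.selected = index; this.changed(); }
  }
}

const audioList = new SketchMultipleTrackList([
  { label: "Main", language: "en" },
  { label: "Audio description", language: "en" },
]);
audioList.enable(0);
audioList.enable(1);
console.log(audioList.isEnabled(0), audioList.isEnabled(1));
```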

Silvia's Notes:

  • make TrackList more like Text Track, see
  • the TrackList only includes label and language attributes - in analogy to TextTrack it should probably rather include (id, label, language, kind); possibly also include getFragmentURL
  • proposals for values of kind are:
      for video:
        • sign language video (in different sign languages)
        • captions (as in: burnt-in video that may just be overlays)
        • different camera angle
        • associated video track (which might be a generalization of different camera angle). One use case is video mosaic.
      for audio:
        • audio descriptions
        • language dub
        • commentary (such as director's commentary)
        • clear audio
  • a group should be able to loop over the full multitrack
  • a group should be able to autoplay
  • some attributes of HTMLMediaElement that are missing from the MediaController might make sense as state collected from the slaves: readyState and ended.
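
One possible way to collect `readyState` and `ended` from the slaves is sketched below. The aggregation rules are assumptions and not part of any proposal: the group is only as ready as its least ready slave, and the group has ended only once every slave has ended. `SlaveState`, `groupReadyState` and `groupEnded` are hypothetical names.

```typescript
// Lowest readyState value defined for HTMLMediaElement.
const HAVE_NOTHING = 0;

// Hypothetical snapshot of the per-element state being collected.
interface SlaveState {
  readyState: number; // 0 (HAVE_NOTHING) to 4 (HAVE_ENOUGH_DATA)
  ended: boolean;
}

// Assumption: the group's readyState is the minimum across its slaves,
// and an empty group reports HAVE_NOTHING.
function groupReadyState(slaves: SlaveState[]): number {
  if (slaves.length === 0) {
    return HAVE_NOTHING;
  }
  return Math.min(...slaves.map((slave) => slave.readyState));
}

// Assumption: the group has ended only when it is non-empty and every
// slave has ended.
function groupEnded(slaves: SlaveState[]): boolean {
  return slaves.length > 0 && slaves.every((slave) => slave.ended);
}

const slaveSnapshot: SlaveState[] = [
  { readyState: 4, ended: true },
  { readyState: 2, ended: false },
];
console.log(groupReadyState(slaveSnapshot), groupEnded(slaveSnapshot));
```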

Key questions

The following list of key questions about the existing change proposals has surfaced in recent meetings:

  • Discovery of in-band tracks: the solution must allow the list of available in-band tracks to be discovered from JavaScript
  • Independent timelines: if offset, current time, playback rate and playback direction are independent, the timelines are uncoupled; the solution must have these change together
  • Multiple video and audio active in parallel: the solution must allow multiple video tracks to be active in parallel, just like audio, even for in-band tracks
  • Duration: the solution must create a duration that is the maximum over all active tracks, since the tracks are being united into a single resource
  • Width/height: the solution must create a unified width/height for the controller, which is the maximum of all of the contributing tracks
  • Disabled: the solution must make the CSS box invisible (i.e. transparent) when a video track is disabled (in particular for PIP video)
  • Controls: the solution must provide default controls across all elements as well as on the individual ones, and they must all be in sync
  • Loaded ranges: the solution must keep the same ranges loaded across tracks
  • Seeking: the solution must allow seeking
  • Controlling: should slave media elements have any means of control other than the controller?
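
The duration and width/height requirements in the list above both amount to taking a maximum across tracks. A minimal sketch follows, using the hypothetical names `ContributingTrack`, `unifiedDuration` and `unifiedSize`; treating audio-only tracks as contributing no width or height is an assumption.

```typescript
// Hypothetical description of one track in a group.
interface ContributingTrack {
  active: boolean;
  duration: number; // seconds
  width?: number;   // absent for audio-only tracks
  height?: number;  // absent for audio-only tracks
}

// Duration of the united resource: the maximum over all active tracks.
function unifiedDuration(tracks: ContributingTrack[]): number {
  return tracks
    .filter((track) => track.active)
    .reduce((longest, track) => Math.max(longest, track.duration), 0);
}

// Unified width/height: the maximum over all contributing tracks,
// taken independently per dimension.
function unifiedSize(tracks: ContributingTrack[]): { width: number; height: number } {
  let width = 0;
  let height = 0;
  for (const track of tracks) {
    width = Math.max(width, track.width || 0);
    height = Math.max(height, track.height || 0);
  }
  return { width, height };
}

const groupTracks: ContributingTrack[] = [
  { active: true, duration: 600, width: 1280, height: 720 },
  { active: true, duration: 605, width: 320, height: 480 },
  { active: false, duration: 900 },
];
console.log(unifiedDuration(groupTracks), unifiedSize(groupTracks));
```

Taking each dimension independently can yield a box that matches no single track (here 1280 by 720 happens to match the first one), which is one reason the requirement may need refining.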