This Wiki page is edited by participants of the HTML Accessibility Task Force. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Task Force participants, WAI, or W3C. It may also have some very useful information.
Media Multitrack Change Proposal 2
Synchronize separate media elements through attributes
SUMMARY
This is a change proposal for ISSUE-152, introducing markup, an API, and rendering instructions for media resources with multiple synchronized tracks of audio or video data.
RATIONALE
- Allowing users to control the rendering of audio, video and cue tracks embedded in a video file or provided as independent resources.
- Unifying and simplifying the handling of text track (cue) resources with that of video, since a time-aligned text track can be considered to be video with a special sample encoding.
- Enabling authors to control the rendering of sign-language tracks embedded in a video file or provided as independent resources.
- Enabling authors to control the rendering of recorded audio description tracks embedded in a video file or provided as independent resources.
- Enabling authors to control the rendering of alternative audio (director's commentary) tracks embedded in a video file or provided as independent resources.
- Enabling authors to provide features such as YouTube Doubler with synchronisation, including cases where the two media resources are to be synchronised with different starting offsets or different playback rates.
- Enabling authors to have short loops (e.g. a metronome sound) play over a longer track (e.g. a song), keeping the two tightly synchronised even if the longer file does stall while playing (due to network congestion).
- Allowing authors to select specific dubbed audio tracks based on the language of the track.
- Enabling the user to make use of "pause", "fast-forward", "rewind", "seek", "volume", and "mute" features in the above cases.
- Enabling the user to turn individual tracks on and off from a single menu that works across all the tracks of a video.
- Allowing authors to use CSS for presentation control, e.g. to control where multiple video channels are to be placed relative to each other.
- Allowing authors to control the volume of each track independently, and also control the volume of the mix.
RELATIONSHIP TO OTHER PROPOSALS
This proposal originates from Option 6 on the Media Multitrack API wiki page of the Accessibility Task Force. It is a modification of [1].
It addresses all of the issues stated in that proposal. The main differences are that this proposal:
- Eliminates the majority of the special case handling for tracks in [2] (Section 4.8.10.10 in the W3C dev draft), unifying the model for cues with that for audio and video.
- Treats tracks as first-class <cues> elements, which means that they can be added to and deleted from the DOM in the normal manner.
- Cue tracks do not need a special sourcing or loading mechanism (they can reuse the concepts established for audio and video). In particular, they can now be open-ended streams, which fixes a known bug with the existing mechanism.
- Cue tracks do not need special load or error events.
- The display mechanism is more cleanly integrated into CSS, rather than relying on special case rules that use a video element as a viewport. In particular a single cue file/stream can provide captions for multiple videos placed next to or over each other, in mashups or arranged in a PiP fashion.
- Eliminates a confusing level of indirection in the API.
DETAILS
Markup
New <audio> and <video> content attributes are:
- timeline - Synchronizes the timeline with another <audio> or <video> element. This attribute modifies the seeking behavior of the media elements to which it is applied. The timeline of an element with this attribute is slaved to that of the master, so the time and playback rate of both are always kept in sync.
- srclang - Gives the language of the media data
- label - Gives a user-readable title for the track. This title is used by user agents when listing subtitle, caption, and audio description tracks in their user interface.
- name - Marks a track as part of a mutually exclusive group: only one of the tracks in a group is ever enabled.
- checked - A track is enabled when the media element's 'checked' attribute is set. In an exclusive group, only the first checked track is enabled.
- kind - Lists the accessibility affordance or affordances the track satisfies.
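The interaction between name and checked can be sketched in script. The following is a minimal illustration, not normative text: it models tracks as plain objects whose fields mirror the content attributes above, and the function name enabledTracks is hypothetical. It also assumes that a checked track without a name belongs to no group and is therefore enabled on its own.

```javascript
// Hypothetical helper: decide which tracks are enabled, given plain
// objects that mirror the proposed content attributes.
function enabledTracks(tracks) {
  var seenGroups = Object.create(null); // groups that already have an enabled track
  var enabled = [];
  for (var i = 0; i < tracks.length; i++) {
    var t = tracks[i];
    if (!t.checked) continue;           // unchecked tracks are never enabled
    if (t.name) {
      // exclusive group: only the first checked track is enabled
      if (seenGroups[t.name]) continue;
      seenGroups[t.name] = true;
    }
    enabled.push(t.id);
  }
  return enabled;
}

var example = enabledTracks([
  { id: "a-en", name: "dub", checked: true },
  { id: "a-fr", name: "dub", checked: true }, // same group, checked later
  { id: "sign", checked: true },              // assumed: no name, no group
  { id: "captions", checked: false }
]);
console.log(example.join(", ")); // prints "a-en, sign"
```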
The <track> element is renamed to <cues> to reflect its actual semantic, and is no longer required to be embedded in a <video> element. <cues> may appear anywhere that an <audio> or <video> element may appear. It inherits the same IDL and functionality as a media element with the addition that it has a set of cues.
This markup example shows how a sign-language track and an audio description track can be added as external resources to a main video element. The captions are overlaid in the 'safe area' typically used by broadcasters, namely the central 80% rectangle:
<article>
  <style scoped>
   div { margin: 1em auto; position: relative; width: 400px; height: 300px; }
   video, audio { position: absolute; bottom: 0; right: 0; }
   video#v1 { width: 100%; height: 100%; }
   video#v2 { width: 30%; }
   cues { display:none; position: absolute; bottom: 10%; right: 10%; width: 80%; height: 80%;}
   cues[checked] {display:block}
  </style>
  <div>
    <!-- primary content -->
    <video id="v1" controls>
      <source src="video.webm" type="video/webm">
      <source src="video.mp4" type="video/mp4">
    </video>
    <!-- pre-recorded audio descriptions -->
    <audio id="a1" timeline="v1" kind="descriptions" srclang="en" label="English Audio Description" checked>
      <source src="audesc.ogg" type="audio/ogg">
      <source src="audesc.mp3" type="audio/mp3">
    </audio>
    <!-- sign language overlay -->
    <video id="v2" timeline="v1" kind="signing" srclang="asl" label="American Sign Language" checked>
      <source src="signing.webm" type="video/webm">
      <source src="signing.mp4" type="video/mp4">
    </video>
    <cues id="captions" timeline="v1" kind="captions" srclang="en" label="Captions">
      <source src="captions.vtt" type="text/vtt" />
      <source src="captions.xml" type="application/ttml+xml" />
    </cues>
  </div>
</article>
- The controls of the master should include a menu with a list of all the available tracks provided through the slave media elements.
Having <cues> tracks be external to <video> means that they can be placed on a page and slaved to an <audio> element, which has no rendering rectangle. This is one of the outstanding failures in all of the approaches to date, and fixing it would make it possible to achieve WCAG 2.0 compliance with an audio-only media presentation.
For example:
<article>
  <style scoped>
   div { margin: 1em auto; position: relative; width: 200px; height: 300px; }
   video#sign { position: absolute; bottom: 10%; right: 10%; width: 100%; height: 100%; }
   cues { display:none; position: absolute; bottom: 10%; right: 10%; width: 80%; height: 80%;}
   cues[checked] {display:block}
  </style>
  <div>
    <!-- primary content - no visual container -->
    <audio id="v1" controls>
      <source src="podcast.ogg" type="audio/ogg">
      <source src="podcast.mp3" type="audio/mp3">
    </audio>
    <!-- sign language translation -->
    <video id="sign" timeline="v1" kind="signing" srclang="asl" label="American Sign Language" checked>
      <source src="signing.webm" type="video/webm">
      <source src="signing.mp4" type="video/mp4">
    </video>
    <cues id="captions" timeline="v1" kind="captions" srclang="en" label="Captions">
      <source src="captions.vtt" type="text/vtt" />
      <source src="captions.xml" type="application/ttml+xml" />
    </cues>
  </div>
</article>
This next example shows how a single caption source can provide captioning for two videos placed side by side in a news-style interview presentation:
<article>
  <style scoped>
   div { margin: 1em auto; position: relative; width: 400px; height: 300px; }
   video#left { position: absolute; bottom: 0%; right: 0%; width: 50%; height: 100%; }
   video#right { position: absolute; bottom: 0%; left: 0%; width: 50%; height: 100%; }
   cues { display:none; position: absolute; bottom: 10%; right: 10%; width: 80%; height: 80%;}
   cues[checked] {display:block}
  </style>
  <div>
    <!-- two videos placed side by side each contains one participant in the interview -->
    <video id="left" controls>
      <source src="video1.ogg" type="video/ogg">
      <source src="video1.mp4" type="video/mp4">
    </video>
    <video id="right" timeline="left">
      <source src="video2.ogg" type="video/ogg">
      <source src="video2.mp4" type="video/mp4">
    </video>
    <!-- the captions for both participants -->
    <cues id="captions" timeline="left" kind="captions" srclang="en" label="Captions">
      <source src="captions.vtt" type="text/vtt" />
      <source src="captions.xml" type="application/ttml+xml" />
    </cues>
  </div>
</article>
This markup example shows how in-band tracks can be handled. Using a separate audio element for audio tracks allows independent volume and mute/unmute control. Having a separate video element for video tracks makes it possible to give each track its own CSS rendering area:
<article>
  <style scoped>
   div { margin: 1em auto; position: relative; width: 400px; height: 300px; }
   video, audio { position: absolute; bottom: 0; right: 0; }
   video#v1 { width: 100%; height: 100%; }
   video#v2 { width: 30%; }
   cues { position: absolute; bottom: 10%; right: 10%; width: 80%; height: 80%;}
  </style>
  <div>
    <!-- primary content -->
    <video id="v1" controls>
      <source src="video.webm" type="video/webm">
      <source src="video.mp4" type="video/mp4">
    </video>
    <!-- pre-recorded audio descriptions -->
    <audio id="a1" timeline="v1" kind="descriptions" srclang="en" label="English Audio Description">
      <source src="video.webm#track=en_description" type="audio/ogg">
      <source src="video.mp4#track=en_description" type="audio/mp3">
    </audio>
    <!-- sign language overlay -->
    <video id="v2" timeline="v1" kind="signing" srclang="asl" label="American Sign Language">
      <source src="video.webm#track=asl" type="video/webm">
      <source src="video.mp4#track=asl" type="video/mp4">
    </video>
    <!-- captions -->
    <cues id="captions" timeline="v1" kind="captions" srclang="en" label="Captions">
      <source src="video.webm#track=captions" type="text/vtt" />
      <source src="video.mp4#track=captions" type="application/ttml+xml" />
    </cues>
  </div>
</article>
- If no slaves are added to a video that has multiple audio, video and cue tracks, the resource defines which tracks are displayed and the UA, author and user have no means to turn tracks on/off.
- If the UA/author/user should be allowed to control the activation/deactivation of tracks, it is necessary to provide slave video or audio elements that link back to the master's individual tracks. In this case, all tracks are deactivated by default and are only activated when the page author adds a @checked attribute, the UA settings require them to be activated, the script author activates them through script, or the user activates them from the common menu.
- The timeline is defined through the master. Where a slave element has a @controls attribute, its controls have to show the identical state to the master. Where a slave's resource is longer than the master, it is ended at the duration of the master. Where a slave's resource is shorter, it is regarded as an empty track from its end onwards.
- When one element stalls, all stall. After a timeout, however, it could be possible for a UA to drop out on a slave track to be able to continue playing the other tracks. This is a QoS issue that is left to the UAs for implementation.
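The duration rules in the list above can be illustrated with a small function. This is a sketch under stated assumptions rather than specified behaviour: slaveStateAt is a hypothetical name, times are in seconds, and master and slave are assumed to start at the same offset and play at the same rate.

```javascript
// Hypothetical helper: what a slave presents at a given master time.
function slaveStateAt(masterTime, masterDuration, slaveDuration) {
  // The master defines the timeline, so clamp to its range first.
  var t = Math.min(Math.max(masterTime, 0), masterDuration);
  if (t >= slaveDuration) {
    // A shorter slave is treated as an empty track past its own end.
    return { time: t, hasData: false };
  }
  // A longer slave never advances past the master's duration.
  return { time: t, hasData: true };
}

// 90 s slave on a 60 s master: playback stops at the master's end.
var longSlave = slaveStateAt(75, 60, 90);
// 40 s slave on a 60 s master: empty from 40 s onwards.
var shortSlave = slaveStateAt(50, 60, 40);
console.log(longSlave.time, longSlave.hasData);   // prints "60 true"
console.log(shortSlave.time, shortSlave.hasData); // prints "50 false"
```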
JavaScript API
interface HTMLMediaElement : HTMLElement {
           attribute DOMString timeline;
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  readonly attribute DOMString name;
  readonly attribute boolean checked;
  // error state
  readonly attribute MediaError error;
  // network state
           attribute DOMString src;
  readonly attribute DOMString currentSrc;
  const unsigned short NETWORK_EMPTY = 0;
  const unsigned short NETWORK_IDLE = 1;
  const unsigned short NETWORK_LOADING = 2;
  const unsigned short NETWORK_NO_SOURCE = 3;
  readonly attribute unsigned short networkState;
           attribute DOMString preload;
  readonly attribute TimeRanges buffered;
  void load();
  DOMString canPlayType(in DOMString type);
  // ready state
  const unsigned short HAVE_NOTHING = 0;
  const unsigned short HAVE_METADATA = 1;
  const unsigned short HAVE_CURRENT_DATA = 2;
  const unsigned short HAVE_FUTURE_DATA = 3;
  const unsigned short HAVE_ENOUGH_DATA = 4;
  readonly attribute unsigned short readyState;
  readonly attribute boolean seeking;
  // playback state 
           attribute double currentTime;
  readonly attribute double initialTime;
  readonly attribute double duration;
  readonly attribute Date startOffsetTime;
  readonly attribute boolean paused;
           attribute double defaultPlaybackRate;
           attribute double playbackRate;
  readonly attribute TimeRanges played;
  readonly attribute TimeRanges seekable;
  readonly attribute boolean ended;
           attribute boolean autoplay;
           attribute boolean loop;
  void play();
  void pause();
  // controls
           attribute boolean controls;
           attribute double volume;
           attribute boolean muted;
};
interface CueTrack : HTMLMediaTrack {
  readonly attribute TextTrackCueList cues;
  readonly attribute TextTrackCueList activeCues;
  // event raised if a cue becomes active/inactive
  // with target being the activated/deactivated TextTrackCue
           attribute Function oncueenter;
           attribute Function oncueexit;
};
CueTrack implements EventTarget;
A script developer who wants to get the overall playback state of the multitrack resource (e.g. to run their own controls) should only ever read the IDL attributes of the master. For changing the playback position, only @currentTime of the master is relevant; @currentTime of the slaves is turned into a readonly attribute. On slaves, changes to autoplay, loop and playbackRate are ignored, as are calls to play() and pause().
With such an interface, a script can, for example, activate the first English audio description track of video v1 as follows:
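Custom controls therefore first need to find the master for whatever element they are handed. The sketch below shows one way to do that. It works on a plain lookup table of objects with id and timeline fields so that it runs outside a browser, and the name findMaster is hypothetical. It assumes that an element without a timeline value is a master, and it guards against dangling references and reference cycles, neither of which the proposal addresses.

```javascript
// Hypothetical helper: follow timeline references until an element
// without a timeline value (assumed to be the master) is reached.
function findMaster(elementsById, id) {
  var has = Object.prototype.hasOwnProperty;
  var visited = Object.create(null);
  var current = has.call(elementsById, id) ? elementsById[id] : null;
  while (current && current.timeline) {
    if (visited[current.id]) return null; // reference cycle
    visited[current.id] = true;
    current = has.call(elementsById, current.timeline)
      ? elementsById[current.timeline]
      : null;                             // dangling reference
  }
  return current;
}

var page = {
  v1: { id: "v1", timeline: "", currentTime: 12.5 },
  a1: { id: "a1", timeline: "v1" },
  captions: { id: "captions", timeline: "v1" }
};
var master = findMaster(page, "captions");
// Playback state is read from the master only.
console.log(master.id, master.currentTime); // prints "v1 12.5"
```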
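// get all audio elements that are slaved to v1
var audioTracks = [];
var audioElements = document.getElementsByTagName("audio");
for (var i = 0; i < audioElements.length; i++) {
  if (audioElements[i].timeline == "v1") {
    audioTracks.push(audioElements[i]);
  }
}
// enable the first English audio description track
for (var j = 0; j < audioTracks.length; j++) {
  if (audioTracks[j].kind == "descriptions" && audioTracks[j].language == "en") {
    // checked is readonly in the IDL above, so set the content attribute
    audioTracks[j].setAttribute("checked", "");
    break;
  }
}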
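Similarly, a script can expose all caption tracks:
// switch on captions for all videos
var cueElements = document.getElementsByTagName("cues");
for (var k = 0; k < cueElements.length; k++) {
  if (cueElements[k].kind == "captions") {
    // checked is readonly in the IDL above, so set the content attribute
    cueElements[k].setAttribute("checked", "");
  }
}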
Rendering
There are now multiple dependent media elements on the page, possibly each with controls. Rendering, including "display: none", is left to the author; default rendering follows the CSS layout model.
There is a need to add a menu to the controls displayed on the master element. This menu will contain lists of alternative and additionally available tracks on top of the main ones, as defined by the slave elements.
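One way a user agent or a script-built control bar might assemble that menu is sketched below. Nothing here is specified by the proposal: buildTrackMenu is a hypothetical name, tracks are plain objects mirroring the content and IDL attributes, and grouping the entries by kind is an assumption about how the menu would be organised.

```javascript
// Hypothetical helper: collect the slaves of one master into menu
// entries, grouped by their kind.
function buildTrackMenu(tracks, masterId) {
  var menu = Object.create(null);
  for (var i = 0; i < tracks.length; i++) {
    var t = tracks[i];
    if (t.timeline !== masterId) continue; // not slaved to this master
    var kind = t.kind || "other";          // assumed fallback group
    if (!menu[kind]) menu[kind] = [];
    menu[kind].push({
      id: t.id,
      label: t.label || t.id,              // fall back to the id
      language: t.language || "",
      enabled: !!t.checked
    });
  }
  return menu;
}

var menu = buildTrackMenu([
  { id: "a1", timeline: "v1", kind: "descriptions", language: "en",
    label: "English Audio Description", checked: true },
  { id: "v2", timeline: "v1", kind: "signing", language: "asl",
    label: "American Sign Language", checked: true },
  { id: "captions", timeline: "v1", kind: "captions", language: "en",
    label: "Captions", checked: false },
  { id: "other", timeline: "v9", kind: "captions", label: "Unrelated" }
], "v1");
console.log(Object.keys(menu).join(", ")); // prints "descriptions, signing, captions"
```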
IMPACT
Advantages
- + Video elements can be styled individually as their own CSS block elements and deactivated with "display: none".
- + Audio and video elements retain their full functionality, even though user interaction with any of the controls represents a synchronized interaction with all elements in the group.
- + Unifies video element and text track handling; and breaks the restriction that text tracks are limited to a video container.
- + Eliminates the confusing two level IDL API between track as an Element and the Track as cue container.
- + Doesn't require any new elements, just attributes; generalising cue handling to the same model removes a complicated concept from the specification.
Disadvantages
- - There are new attributes on the audio and video elements making them more complex.
Conformance impact
This is a new feature, which will require implementation in all UAs.
Risks
- If discussion of these proposals continues, a consensus solution can be expected. This is a new feature and not really a contentious issue.