Media Multitrack Change Proposal

Synchronize separate media elements through attributes

SUMMARY

This is a change proposal for ISSUE-152, introducing markup, an API, and rendering instructions for media resources with multiple synchronized tracks of audio or video data.


RATIONALE

  • Allowing users to control the rendering of audio and video tracks embedded in a video file or provided as independent resources.
  • Enabling authors to control the rendering of sign-language tracks embedded in a video file or provided as independent resources.
  • Enabling authors to control the rendering of recorded audio description tracks embedded in a video file or provided as independent resources.
  • Enabling authors to control the rendering of alternative audio (director's commentary) tracks embedded in a video file or provided as independent resources.
  • Enabling authors to provide features such as YouTube Doubler with synchronisation.
  • Allowing authors to select specific dubbed audio tracks based on the language of the track.
  • Enabling the user to make use of "pause", "fast-forward", "rewind", "seek", "volume", and "mute" features in the above cases.
  • Enabling the user to turn individual tracks on and off from a single menu that works across all the tracks of a video.
  • Allowing authors to use CSS for presentation control, e.g. to control where multiple video channels are to be placed relative to each other.
  • Allowing authors to control the volume of each track independently, and also control the volume of the mix.


RELATIONSHIP TO OTHER PROPOSALS

This proposal originates from Option 6 on the Media Multitrack API wiki page of the Accessibility Task Force.

It takes into account the discussions about all the other options on that page as well.

It also takes into account the ideas of the Proposal for Audio and Video Track Selection and Synchronisation for Media Elements.

The main differences to that proposal are that this proposal:

  • Does not rely on the creation of an abstract timeline through a MediaController per media element. Instead, this proposal requires the page author to specify the media element which has the master timeline.
  • Improves on the usability of the multitrack resource for users by including the concept of a single menu across all the slaved media elements that allows turning the individual tracks on or off.
  • Allows the page author to mark tracks as being part of a group from which only one may be enabled.
  • Allows existing multi-track capable browser implementations to work, while allowing page authors and users to override the default rendering choices.

Note that this approach is also related to the way in which the Timesheet implementations of http://labs.kompozer.net/timesheets/audio.html#htmlMarkup and http://labs.kompozer.net/timesheets/video.html#htmlMarkup synchronize multiple media resources (see also http://www.w3.org/TR/SMIL3/smil-timing.html#Timing-ControllingRuntimeSync).


DETAILS

Markup

The new <audio> and <video> content attributes are:

  • timeline - Synchronizes the timeline with another <audio> or <video> element. This attribute modifies the seeking behavior of the media elements to which it is applied. The timeline of an element with this attribute is slaved to that of the master, so the time and playback rate of both are always kept in sync.
  • srclang - Gives the language of the media data.
  • label - Gives a user-readable title for the track. This title is used by user agents when listing subtitle, caption, and audio description tracks in their user interface.
  • name - Marks a track as part of a mutually exclusive group: only one of the tracks in a group is ever enabled.
  • checked - A track is enabled when the media element's 'checked' attribute is set. In an exclusive group, only the first checked track is enabled.
  • kind - Lists the accessibility affordance or affordances the track satisfies.
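The resolution rule for @checked in a mutually exclusive group can be sketched as follows. This is an illustrative helper operating on plain track descriptors, not part of the proposed API; the function name is hypothetical:

```javascript
// Illustrative sketch (not part of the proposal): given the tracks that
// share one @name group, only the first track carrying @checked ends up
// enabled; every other track in the group is disabled.
function resolveExclusiveGroup(tracks) {
  var winner = -1;
  for (var i = 0; i < tracks.length; i++) {
    if (tracks[i].checked && winner === -1) {
      winner = i; // first checked track wins
    }
  }
  return tracks.map(function (t, i) {
    return { label: t.label, enabled: i === winner };
  });
}
```

So if two tracks in the same group both carry @checked, only the first is enabled and the second is silently disabled.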

This markup example shows how a sign-language track and an audio description track can be added as external resources to a main video element:

<article>
  <style scoped>
   div { margin: 1em auto; position: relative; width: 400px; height: 300px; }
   video, audio { position: absolute; bottom: 0; right: 0; }
   video.v1 { width: 100%; height: 100%; }
   video.v2 { width: 30%; }
  </style>
  <div>
    <!-- primary content -->
    <video id="v1" controls>
      <source src="video.webm" type="video/webm">
      <source src="video.mp4" type="video/mp4">
      <track kind="captions" srclang="en" src="captions.vtt">
    </video>
    <!-- pre-recorded audio descriptions -->
    <audio id="a1" timeline="v1" kind="descriptions" srclang="en" label="English Audio Description" checked>
      <source src="audesc.ogg" type="audio/ogg">
      <source src="audesc.mp3" type="audio/mp3">
    </audio>
    <!-- sign language overlay -->
    <video id="v2" timeline="v1" kind="signing" srclang="asl" label="American Sign Language" checked>
      <source src="signing.webm" type="video/webm">
      <source src="signing.mp4" type="video/mp4">
    </video>
  </div>
</article>
  • The controls of the master should include a menu with a list of all the available tracks provided through the slave media elements.

This markup example shows how an in-band sign-language track and an in-band audio description track can be handled. Using a separate audio element for audio tracks allows independent volume and mute/unmute control. Having a separate video element for video tracks makes it possible to give it its own CSS rendering area:

<article>
  <style scoped>
   div { margin: 1em auto; position: relative; width: 400px; height: 300px; }
   video, audio { position: absolute; bottom: 0; right: 0; }
   video.v1 { width: 100%; height: 100%; }
   video.v2 { width: 30%; }
  </style>
  <div>
    <!-- primary content -->
    <video id="v1" controls>
      <source src="video.webm" type="video/webm">
      <source src="video.mp4" type="video/mp4">
      <track kind="captions" srclang="en" src="captions.vtt">
    </video>
    <!-- pre-recorded audio descriptions -->
    <audio id="a1" timeline="v1" kind="descriptions" srclang="en" label="English Audio Description">
      <source src="video.webm#track=en_description" type="audio/ogg">
      <source src="video.mp4#track=en_description" type="audio/mp3">
    </audio>
    <!-- sign language overlay -->
    <video id="v2" timeline="v1" kind="signing" srclang="asl" label="American Sign Language">
      <source src="video.webm#track=asl" type="video/webm">
      <source src="video.mp4#track=asl" type="video/mp4">
    </video>
  </div>
</article>


  • If no slaves are added to a video that has multiple audio and video tracks, the resource defines which tracks are displayed and the UA, author and user have no means to turn tracks on/off.
  • If the UA/author/user should be allowed to control the activation/deactivation of tracks, it is necessary to provide slave video or audio elements that link back to the master's individual tracks. In this case, all tracks are by default deactivated and are only activated when either the page author adds a @checked attribute, the UA settings require them to be activated, the script author activates them through script, or the user activates them from the common menu.
  • The timeline is defined through the master and the duration of the longest track. Where an element has a @controls attribute, it has to show the same state as the master. Where a track is shorter than the full timeline, the UA displays a transparent image for video and silence for audio.
  • When one element stalls, all stall. After a timeout, however, it could be possible for a UA to drop out on a slave track to be able to continue playing the other tracks. This is a QoS issue that is left to the UAs for implementation.
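The slaving behavior described in these bullets could be approximated in script today, for example as a polyfill for UAs without native support. The following is a hedged sketch of such a loop, run on every "timeupdate" of the master; the 0.25-second drift tolerance and the function names are assumptions, not part of the proposal:

```javascript
// Sketch of a script-level approximation of the proposed slaving behavior.
// The 0.25s tolerance is an assumed quality threshold, not specified here.
function needsResync(masterTime, slaveTime, tolerance) {
  tolerance = tolerance || 0.25;
  return Math.abs(masterTime - slaveTime) > tolerance;
}

// A polyfill could call this from the master's "timeupdate" handler:
function syncSlave(master, slave) {
  if (needsResync(master.currentTime, slave.currentTime)) {
    slave.currentTime = master.currentTime; // snap back to the master timeline
  }
  if (master.paused !== slave.paused) {
    master.paused ? slave.pause() : slave.play(); // mirror playback state
  }
}
```

A native implementation would of course do this continuously at the media-pipeline level rather than at "timeupdate" granularity.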


JavaScript API

interface HTMLMediaElement {
  [...]
           attribute DOMString timeline;
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  readonly attribute DOMString name;
  readonly attribute boolean checked;
}

A script developer who wants to get the overall playback state of the multitrack resource (e.g. to run their own controls) should only ever read the IDL attributes of the master. For changing the playback position, only @currentTime of the master is relevant - @currentTime of the slaves is turned into a readonly attribute. Changes to autoplay, loop, and playbackRate on slaves are also ignored, as are calls to play() and pause().

With such an interface, we can, for example, use the following to activate the first English audio description track of video v1:

// get all audio elements that are slaved to v1
var audioTracks = [];
var audioElements = document.getElementsByTagName("audio");
for (var i = 0; i < audioElements.length; i++) {
  if (audioElements[i].timeline == "v1") {
    audioTracks.push(audioElements[i]);
  }
}
// enable the first English audio description track
for (var i = 0; i < audioTracks.length; i++) {
  if (audioTracks[i].kind == "descriptions" && audioTracks[i].language == "en") {
    audioTracks[i].checked = true;
    break;
  }
}


Rendering

There are now multiple dependent media elements on the page, each possibly with controls. Rendering, including "display: none", is left to the author; default rendering follows the CSS layout model.

A menu needs to be added to the controls displayed on the master element. This menu lists the alternative and additional tracks that the slave elements make available on top of the main ones.
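How such a menu could be assembled from the slaves can be sketched as follows. The helper works on plain track descriptors so the grouping logic is visible; a real implementation would walk the DOM for elements whose @timeline names the master, and the function name and label fallback are assumptions:

```javascript
// Hypothetical sketch: assembling the master's track menu from its slaved
// elements. Each entry keeps the information the menu needs: a label,
// the accessibility kind, the exclusive group (if any), and whether the
// track is currently enabled.
function buildTrackMenu(slaves) {
  var menu = [];
  for (var i = 0; i < slaves.length; i++) {
    var s = slaves[i];
    menu.push({
      label: s.label || (s.kind + " (" + s.language + ")"), // assumed fallback
      kind: s.kind,
      group: s.name || null, // tracks sharing @name are mutually exclusive
      enabled: !!s.checked
    });
  }
  return menu;
}
```

The UA's own menu would additionally include the master's in-band main tracks; this sketch only covers the part contributed by the slave elements.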


IMPACT


Advantages

  1. Video elements can be styled individually as their own CSS block elements and deactivated with "display: none".
  2. Audio and video elements retain their full functionality, even though user interaction with any controls represents a synchronized interaction with all elements in the group.
  3. Doesn't require any new elements, just attributes.

Disadvantages

  1. There are new attributes on the audio and video elements, making them more complex.

Conformance impact

This is a new feature, which will require implementation in all UAs.

Risks

  • If discussion around these proposals continues, a consensus solution can be expected: this is a new feature, not a particularly contentious issue.
Last modified on 11 April 2011, at 00:24