This Wiki page is edited by participants of the HTML Accessibility Task Force. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Task Force participants, WAI, or W3C. It may also have some very useful information.
Media Multitrack Media API
From HTML accessibility task force Wiki
Note: the discussion here is superseded by http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Change_Proposals_Summary
Multitrack Media API
Issue: http://www.w3.org/html/wg/tracker/issues/152 Feedback due by 21st February
Bug: http://www.w3.org/Bugs/Public/show_bug.cgi?id=9452
Use case: Media resources often have more than one audio and one video track. In particular, we often have sign language tracks, audio description tracks, and dubbed audio tracks, but also alternate viewing angles and similar additional or alternative tracks to the main a/v tracks. Sometimes such tracks are an inherent part of the main media resource; in other instances they are separate but synchronised resources. Currently there is no means in HTML5 to use such multitrack media resources. See http://www.longtailvideo.com/support/addons/audio-description/15136/audio-description-reference-guide for an example of audio descriptions in Flash that we want to replicate in HTML5.
Some example uses: http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_Rendering
Requirements: We need a means to make use of such multitrack media content in HTML5.
1. We need a means to provide multitrack media resources to the Web page where the multiple tracks come in multiple resources.
2. We need to define a JavaScript API that lets us control the display/playback of individual video/audio tracks, both for in-band multitrack media resources and for multitrack resources constructed from multiple external files.
Side conditions:
- we want to achieve a consistent API between in-band and external audio/video tracks.
- we want to be able to control the relative volume of an additional audio track and the positioning of video tracks as picture-in-picture or side-by-side viewports.
- we don't want to screw with the source selection algorithm which is already complicated enough as is.
- we want to support the most important and most scalable use case natively and encourage that as the main means to author content, namely by providing a main resource and additional tracks to complete that presentation. Other use cases should not get dedicated markup and can be satisfied through special JavaScript or server software.
- there is no new markup needed for in-band, just a JavaScript API and notes on how to render.
- we assume tracks are created in such a fashion that they can add to each other, not replace each other. This restricts authoring, but if somebody wants to do replacement, they can always define alternative audio and video elements and activate them through JavaScript.
- we assume that the alternate audio/video tracks are provided as a single file with approximately the same duration as the main resource and that synchronisation between them implies synchronizing their starting points and playback speed. Content in alternate audio/video tracks that goes beyond the duration of the main resource will be chopped off and never play back.
- situations where we have small snippets of audio that are synchronized to particular times in the video (as shown below) are not considered here. They can already be handled today with a <track> of @kind="metadata" pointing to a WebVTT file, with a hyperlink to the media resource(s) in each synchronized cue, and with JavaScript that interprets this content and plays back the linked resources at the right time. This approach also allows for providing textual descriptions in sync alongside recorded descriptions in the same WebVTT resource, as may be useful with a Braille device. It can also provide mixed text and audio descriptions in cases where, e.g., proper names would not be read out correctly by a screen reader.
-- silence from 0s-15s
-- video description #1 from 15s-20s
-- silence from 20s-30s
-- video description #2 from 30s-35s
-- silence from 35s-45s
-- video description #3 from 45s-50s
-- silence from 50s-60s
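The metadata-cue approach can be sketched as follows for the timeline above. This assumes each metadata cue has already been parsed into an object with `startTime` and `endTime` in seconds (as on a real TextTrackCue) plus a hypothetical `audioUrl` field holding the hyperlink from the cue text; the file names are made up. The function only decides which snippet should be playing at a given media time.

```javascript
// Hypothetical cue objects derived from a WebVTT file used with kind="metadata".
// Each cue links to a recorded description snippet (made-up file names).
const descriptionCues = [
  { startTime: 15, endTime: 20, audioUrl: "desc1.ogg" },
  { startTime: 30, endTime: 35, audioUrl: "desc2.ogg" },
  { startTime: 45, endTime: 50, audioUrl: "desc3.ogg" },
];

// Return the cue active at media time `t` (start inclusive, end exclusive),
// or null when the timeline is silent at that time.
function activeDescription(cues, t) {
  for (const cue of cues) {
    if (t >= cue.startTime && t < cue.endTime) {
      return cue;
    }
  }
  return null;
}

// In a page this would run from the main video's "timeupdate" handler and
// start/stop a separate audio element; here we only log the decision.
for (const t of [10, 16, 33, 52]) {
  const cue = activeDescription(descriptionCues, t);
  console.log(t, cue ? cue.audioUrl : "silence");
}
```

Keeping the decision separate from playback makes it easy to also render the cue text to a Braille display at the same moment.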
Possible solutions to the markup challenge:
(1) No markup in HTML - leave to a manifest file
For example synchronizing external audio description and sign language video with main video:
<video id="v1" poster="video.png" controls>
  <source src="manifest_webm" type="video/webm">
  <source src="manifest_mpg" type="video/mp4">
  <track kind="captions" srclang="en" src="captions.vtt">
</video>
In this approach we do not distinguish between the markup for a multitrack media resource where the tracks are provided in-band or externally. Instead we expect this information to be available in some kind of manifest file which the browser will parse and expose to the Web page as though all the tracks are available in-band.
Advantages:
- + There is no need to define any new markup (i.e. elements and attributes) for it, just a JavaScript API.
- + This could work well with adaptive streaming which probably also needs a manifest file.
- + Since a manifest file is restricted to a certain content type, this also makes it easy to provide the correctly encoded alternative media resources with the correct main resource (the "codec" issue).
- + The synchronization is completely handed to the browser and it will make sure that start time and progress line up.
- + This approach also allows for the introduction of snippet synchronization rather than fully synchronized audio descriptions, since the basis of adaptive streaming is a collection of snippets.
Disadvantages:
- - It is not obvious from the HTML whether there is an audio description track/sign language track/other track available (though that is also the case for in-band tracks) (the "discoverability" issue).
- - There is a need to define the manifest file format to deal with multiple tracks.
- - It is impossible(?) to style the tracks through CSS, e.g. to make one small and overlay it onto the video. For rendering we have to rely on the browser.
JavaScript API: We require a means to expose the list of available tracks for a media resource in JavaScript and a means to activate/deactivate the tracks. For example:
interface MediaTrack {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  const unsigned short OFF = 0;
  const unsigned short HIDDEN = 1;
  const unsigned short SHOWING = 2;
  attribute unsigned short mode;
};
interface HTMLMediaElement : HTMLElement {
  [...]
  readonly attribute TextTrack[] textTracks;
  readonly attribute MediaTrack[] mediaTracks;
};
With such an interface, we can e.g. use the following to activate/deactivate the first English audio description track:
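var video = document.getElementById("v1");
for (var i = 0; i < video.mediaTracks.length; i++) {
  var track = video.mediaTracks[i];
  if (track.kind == "description" && track.language == "en") {
    // the OFF/HIDDEN/SHOWING constants are defined on the MediaTrack interface
    track.mode = track.SHOWING;
    break;
  }
}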
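Since the text speaks of activating and deactivating, the loop could be wrapped in a toggle helper. This is a sketch only: no browser implements the proposed `mediaTracks` list, so it uses plain objects carrying the proposed mode constants, and the helper name `toggleTrack` and the "main" kind value are hypothetical.

```javascript
// Mode constants as proposed for the MediaTrack interface.
const OFF = 0;
const HIDDEN = 1;
const SHOWING = 2;

// Plain objects standing in for video.mediaTracks ("main" is a made-up kind).
const mediaTracks = [
  { kind: "main", label: "Main video", language: "en", mode: SHOWING },
  { kind: "description", label: "Audio description", language: "de", mode: OFF },
  { kind: "description", label: "Audio description", language: "en", mode: HIDDEN },
];

// Toggle the first track matching kind and language: SHOWING becomes OFF,
// anything else (OFF or HIDDEN) becomes SHOWING.
// Returns the changed track, or null if nothing matched.
function toggleTrack(tracks, kind, language) {
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track.kind === kind && track.language === language) {
      track.mode = track.mode === SHOWING ? OFF : SHOWING;
      return track;
    }
  }
  return null;
}

toggleTrack(mediaTracks, "description", "en"); // English description now SHOWING
toggleTrack(mediaTracks, "description", "en"); // and back to OFF
```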
Rendering:
There is only one video element on the page, but potentially several video tracks for this video.
- we probably need to render all of the tracks into the same video viewport with only one control, e.g. tiled, picture-in-picture, or as a scrollable list on the side of the main video, see http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_Rendering; this could be specified as a CSS style on the video element, video.style.tracks = {tiled, pip, list} with a default of "tiled".
- we need to be able to address the individual video tracks through CSS and be able to change some CSS styles such as background, border, width, height, opacity, transitions, transforms and animations. The use of a pseudo-element to address the different video tracks (and audio tracks for that matter) is probably necessary: something like ::mediaTrack(id) with id being the index in the MediaTrack[] list.
Then we can do the following:
video {
  tracks: pip;
}
video::mediaTrack(2) {
  width: 200px;
  opacity: 0.7;
}
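To make the three proposed layout values more concrete, here is a sketch of how a user agent might compute a viewport rectangle for each active video track. The `tiled`/`pip`/`list` names come from the proposal; the geometry (equal columns for tiled, quarter-size insets in the bottom-right corner for pip, a non-scrolling column on the right for list) and the function name `layoutTracks` are assumptions for illustration only.

```javascript
// Compute {x, y, width, height} rectangles for `count` video tracks inside a
// width x height viewport. Track 0 is the main video.
// The geometry chosen for each layout is an illustrative assumption.
function layoutTracks(layout, width, height, count) {
  const rects = [];
  if (layout === "tiled") {
    // Equal-width columns, one per track.
    const w = width / count;
    for (let i = 0; i < count; i++) {
      rects.push({ x: i * w, y: 0, width: w, height: height });
    }
  } else if (layout === "pip") {
    // Main video fills the viewport; other tracks are quarter-size insets
    // stacked upwards from the bottom-right corner.
    rects.push({ x: 0, y: 0, width: width, height: height });
    const w = width / 4;
    const h = height / 4;
    for (let i = 1; i < count; i++) {
      rects.push({ x: width - w, y: height - i * h, width: w, height: h });
    }
  } else if (layout === "list") {
    // Main video takes three quarters of the width; the other tracks are
    // listed in a column on the right.
    const mainWidth = (width * 3) / 4;
    rects.push({ x: 0, y: 0, width: mainWidth, height: height });
    const h = count > 1 ? height / (count - 1) : 0;
    for (let i = 1; i < count; i++) {
      rects.push({ x: mainWidth, y: (i - 1) * h, width: width - mainWidth, height: h });
    }
  } else {
    throw new Error("unknown layout: " + layout);
  }
  return rects;
}

console.log(layoutTracks("pip", 640, 360, 2));
```

CSS rules on individual tracks, such as the `width: 200px` above, would then override the default rectangle computed for that track.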
We probably also need to add the list of available tracks to the menu for track selection and probably make it possible to close individual tracks (e.g. through an "X" in a corner).
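A track-selection menu and a per-track close button could be driven directly from the track list. A minimal sketch, again with plain objects in place of the proposed `mediaTracks` list; `buildTrackMenu` and `closeTrack` are hypothetical helpers, and the kind values are only examples.

```javascript
// Mode constants as proposed for MediaTrack.
const OFF = 0;
const SHOWING = 2;

// Plain objects standing in for the proposed video.mediaTracks list.
const tracks = [
  { kind: "main", label: "Main video", language: "en", mode: SHOWING },
  { kind: "signlanguage", label: "Sign language", language: "ase", mode: SHOWING },
  { kind: "description", label: "Audio description", language: "en", mode: OFF },
];

// One menu entry per track; the index doubles as the id that the proposed
// ::mediaTrack(id) pseudo-element would use.
function buildTrackMenu(trackList) {
  return trackList.map((track, index) => ({
    index: index,
    text: track.label + " (" + track.language + ")",
    checked: track.mode === SHOWING,
  }));
}

// What clicking the "X" in a track's corner might do: switch that track off.
function closeTrack(trackList, index) {
  trackList[index].mode = OFF;
}

closeTrack(tracks, 1); // close the sign language overlay
console.log(buildTrackMenu(tracks));
```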
(2) Overload <track> inside <video>
For example synchronizing external audio description and sign language video with main video:
<video id="v1" poster="video.png" controls>
  <!-- primary content -->
  <source src="video.webm" type="video/webm">
  <source src="video.mp4" type="video/mp4">
  <track kind="captions" srclang="en" src="captions.vtt">
  <!-- pre-recorded audio descriptions -->
  <track src="audesc.ogg" kind="descriptions" type="audio/ogg" srclang="en" label="English Audio Description">
  <track src="audesc.mp3" kind="descriptions" type="audio/mpeg" srclang="en" label="English Audio Description">
  <!-- sign language overlay -->
  <track id="signwebm" src="signlang.webm" kind="signings" type="video/webm" srclang="ase" label="American Sign Language">
  <track id="signmp4" src="signlang.mp4" kind="signings" type="video/mp4" srclang="ase" label="American Sign Language">
</video>
In this approach we add a @type attribute to the <track> element, allowing it to also be used with external audio and video and not just text tracks.
Advantages:
- + The HTML markup clearly exposes what tracks are available (the "discoverability" issue).
- + All types of external tracks are perceived to be handled in the same way, no matter if text, video or audio.
Disadvantages:
- - The given example uses replication of <track> elements for alternative codec files (the "codec" issue). It would also be possible to introduce <source> elements under <track> to cover this need. Neither of these options seems particularly elegant.
- - It mixes media and text tracks in the same interface, which makes the markup harder to read, author, parse, and style through CSS, e.g. if we would like to style all text tracks.
- - We lose all the functionality that is available to audio and video resources in the <audio> and <video> elements, such as setting the volume, width, and height.
- - Since the <track> element now isn't a full media element, it does not expose the features of a media element such as error states, seeking position, controls, muting, its own volume, etc. This may also be an advantage...
- - It is necessary to define a default rendering means for the child a/v tracks. This may be overridden by CSS.
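The "codec" issue with replicated <track> elements means a user agent (or script) has to pick one file per set of alternatives, much like <source> selection. A sketch under two assumptions: alternatives are recognised by identical kind, srclang and label, and a stand-in `canPlay` predicate replaces what a browser would decide with something like `HTMLMediaElement.canPlayType()`. The helper name `selectTracks` is hypothetical.

```javascript
// Pick one playable file from each group of alternative <track> elements.
// Tracks are plain objects mirroring the markup; grouping by kind + srclang +
// label is an assumption about how alternatives would be recognised.
function selectTracks(tracks, canPlay) {
  const chosen = new Map();
  for (const track of tracks) {
    const key = track.kind + "|" + track.srclang + "|" + track.label;
    // As with <source>, the first playable alternative in document order wins.
    if (!chosen.has(key) && canPlay(track.type)) {
      chosen.set(key, track);
    }
  }
  return Array.from(chosen.values());
}

const trackElements = [
  { src: "audesc.ogg", kind: "descriptions", type: "audio/ogg", srclang: "en", label: "English Audio Description" },
  { src: "audesc.mp3", kind: "descriptions", type: "audio/mpeg", srclang: "en", label: "English Audio Description" },
  { src: "signlang.webm", kind: "signings", type: "video/webm", srclang: "ase", label: "American Sign Language" },
  { src: "signlang.mp4", kind: "signings", type: "video/mp4", srclang: "ase", label: "American Sign Language" },
];

// Stand-in for a browser that can only play MP3 audio and MP4 video.
const supported = new Set(["audio/mpeg", "video/mp4"]);
const selected = selectTracks(trackElements, (type) => supported.has(type));
console.log(selected.map((t) => t.src));
```

Nesting <source> elements under <track> would make the grouping explicit instead of relying on matching attributes.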
JavaScript API: We would reuse the TextTrack API for these types of tracks, too, and just introduce some further @kind values such as signlanguage or audiodescription. However, the part of the TextTrack API dealing with cues will be irrelevant, and we only need these parts:
interface TextTrack {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  const unsigned short OFF = 0;
  const unsigned short HIDDEN = 1;
  const unsigned short SHOWING = 2;
  attribute unsigned short mode;
  const unsigned short NONE = 0;
  const unsigned short LOADING = 1;
  const unsigned short LOADED = 2;
  const unsigned short ERROR = 3;
  readonly attribute unsigned short readyState;
  attribute Function onload;
  attribute Function onerror;
};
With such an interface, we can e.g. use the following to activate/deactivate the first English audio description track:
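var video = document.getElementById("v1");
// in this approach, audio/video tracks are listed alongside text tracks
for (var i = 0; i < video.textTracks.length; i++) {
  var track = video.textTracks[i];
  if (track.kind == "audiodescription" && track.language == "en") {
    track.mode = track.SHOWING;
    break;
  }
}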
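Because these tracks would carry the TextTrack `readyState` and `onload`/`onerror` members, a script may want to know whether the audio or video file actually loaded after switching a track on. A sketch with a plain object in place of a real track; `showTrack` is a hypothetical helper, it assumes `onload`/`onerror` are writable event handler attributes, and the load failure is simulated by calling the handler by hand.

```javascript
// readyState and mode constants as listed in the TextTrack subset above.
const NONE = 0;
const LOADING = 1;
const LOADED = 2;
const ERROR = 3;
const OFF = 0;
const SHOWING = 2;

// Switch a track on and report via `done(ok)` whether its resource loaded.
// If loading fails, the track is switched back off.
function showTrack(track, done) {
  track.mode = SHOWING;
  if (track.readyState === LOADED) {
    done(true);
    return;
  }
  if (track.readyState === ERROR) {
    track.mode = OFF;
    done(false);
    return;
  }
  // NONE or LOADING: wait for the user agent to fire load or error.
  track.onload = () => done(true);
  track.onerror = () => {
    track.mode = OFF;
    done(false);
  };
}

// Simulated track: a user agent would start fetching after the mode change
// and fire the handlers itself; here we fake a failed load by hand.
const track = {
  kind: "audiodescription", language: "en",
  mode: OFF, readyState: NONE, onload: null, onerror: null,
};
showTrack(track, (ok) => console.log("description track loaded:", ok));
track.readyState = ERROR;
track.onerror();
```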
Rendering:
Again, there is only one video element on the page, so we probably need to render all of the tracks into the same video viewport with only one control.
Again, there is the question of layout, which could be done as, e.g. tiled, picture-in-picture, or as a list on the side of the main video, see http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_Rendering; this could be specified as a CSS style on the video element, video.style.tracks = {tiled, pip, list} with a default of "tiled".
Since the individual tracks are explicitly marked up, there is probably no need for a pseudo-element (although it is unclear how in-band tracks would then be addressed).
We can do the following:
video {
  tracks: pip;
}
#signwebm, #signmp4 {
  width: 200px;
  opacity: 0.7;
}
(3) Introduce <audiotrack> and <videotrack>
Instead of overloading <track>, one could consider creating new track elements for audio and video, such as <audiotrack> and <videotrack>.
This allows keeping different attributes on these elements and having audio / video / text track lists separate in JavaScript.
Also, it allows for
