This Wiki page is edited by participants of the HTML Accessibility Task Force. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Task Force participants, WAI, or W3C. It may also have some very useful information.
Track Kinds
Background
- Support for multiple in-band tracks in HTML raises the question of how to determine what each track is. This could be done with external metadata, but this implies the page has a means to obtain and interpret that metadata, which rules out a number of architectures involving generic scripts and web page components.
- Increasing interest in and standardization of adaptive streaming means that multiple in-band audio and video tracks become viable, because it is possible to transport only the active tracks.
Current status (4/28/11)
- A getKind() method is included in the W3C editor’s draft
- A specific set of values is defined in the draft, based on some of the values supported by the Ogg container format. This list is repeated below for reference
- 3GPP have asked W3C whether it will define kind values for accessibility purposes, and provided us with a list of the values they have defined for other (non-accessibility) purposes
- MPEG is expected to align with 3GPP
- This page is a work-in-progress and is being actively studied by the HTML a11y working group
Questions from 3GPP
In [1] they ask:
- whether our hope to recommend use of W3C ‘role’ names, in our specification, seems achievable and reasonable, in your opinion;
- your thinking on the set of names;
- your schedule for defining at least a stable initial set of names;
- whether you will define a URN to identify the set you define.
Proposed answers (proposed by Mark)
- Yes, except that we call them "kinds"
- (to be decided - see below)
- Initial set to be decided by the Last Call (LC) deadline (May 22nd?)
- (to be decided - don't see why not)
The rationale for defining values in W3C rather than just reflecting what containers support is that scripts should not need to know what kind of container the track came from to be able to interpret the values. The values should be defined in a “container-independent” way.
Kinds in the W3C Editor's draft
These values are returned by the getKind() method on an audio or video track and are documented at http://dev.w3.org/html5/spec/video.html#dom-tracklist-getkind
| Value | Definition |
|---|---|
| alternative | A possible alternative to the main track, e.g. a different take of a song (audio), or a different angle (video) |
| description | An audio description of a video track. |
| main | The primary audio or video track. |
| sign | A sign-language interpretation of an audio track. |
| translation | A translated version of the main track. |
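Because the draft defines a closed set of values, a script reading getKind() can check each returned string against that set and treat anything else as unrecognized instead of failing. The TypeScript sketch below assumes a hypothetical `TrackListLike` interface modeled on the draft (a `length` plus a `getKind(index)` method); the exact signature in the draft may differ, and `groupByKind` and the fake track list are illustrative names only.

```typescript
// The five kind values listed in the W3C editor's draft.
const DRAFT_KINDS = ["alternative", "description", "main", "sign", "translation"] as const;
type DraftKind = (typeof DRAFT_KINDS)[number];

// Hypothetical shape modeled on the draft's track list (an assumption,
// not a normative API): getKind(index) returns the kind of that track.
interface TrackListLike {
  readonly length: number;
  getKind(index: number): string;
}

function isDraftKind(value: string): value is DraftKind {
  return (DRAFT_KINDS as readonly string[]).includes(value);
}

// Group track indices by kind; values outside the draft set are collected
// under "unknown" so the script can ignore them rather than break.
function groupByKind(tracks: TrackListLike): Map<DraftKind | "unknown", number[]> {
  const groups = new Map<DraftKind | "unknown", number[]>();
  for (let i = 0; i < tracks.length; i++) {
    const raw = tracks.getKind(i);
    const key: DraftKind | "unknown" = isDraftKind(raw) ? raw : "unknown";
    const list = groups.get(key) ?? [];
    list.push(i);
    groups.set(key, list);
  }
  return groups;
}

// Illustrative fake list: "commentary" is a 3GPP value not in the draft set.
const fake: TrackListLike = {
  length: 3,
  getKind: (i) => ["main", "sign", "commentary"][i],
};

console.log(groupByKind(fake));
```

Grouping rather than filtering keeps unrecognized tracks visible to the script, which matters while the value set is still being extended.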
Kinds that have been suggested
This section collects all the "kinds" which have been suggested and their source. There may be duplicates and one of the purposes of the table is to identify them.
| Value | Source | Definition | Proposal |
|---|---|---|---|
| video/main | Ogg "roles" [2] | The main video track. | Same as main |
| video/alternate | Ogg "roles" [2] | A possible alternative to the main track, e.g. a different camera angle. | Same as alternative |
| video/sign | Ogg "roles" [2] | A sign-language video track. | Same as sign |
| audio/main | Ogg "roles" [2] | The main audio track. | Same as main |
| audio/dub | Ogg "roles" [2] | The audio track but with speech in a different language to the original. | Same as translation |
| audio/audiodesc | Ogg "roles" [2] | An audio description recording for the vision-impaired. | Same as description |
| audio/music | Ogg "roles" [2] | A music track, e.g. when music, speech and sound effects are delivered in different tracks. | Not required |
| audio/speech | Ogg "roles" [2] | A speech track, e.g. when music, speech and sound effects are delivered in different tracks. | Not required |
| audio/sfx | Ogg "roles" [2] | A sound effects track, e.g. when music, speech and sound effects are delivered in different tracks. | Not required |
| main | 3GPP Liaison [1] | This stream is part of the main program content. | Same as main |
| supplementary | 3GPP Liaison [1] | For a main program that is audio, a supplementary video stream might provide, for example, dynamic graphics. | Not required |
| alternate | 3GPP Liaison [1] | Such a stream might provide a different camera viewpoint (and we strongly recommend the provision of further annotations to clarify the nature of the alternative). | Same as alternative |
| commentary | 3GPP Liaison [1] | | Use alternative |
| dub | 3GPP Liaison [1] | An alternative audio stream that contains a non-original language. | Same as translation |
| captions | 3GPP Liaison [1] | | Add to specification |
| subtitles | 3GPP Liaison [1] | | Add to specification |
| sign language | a11y TF | | Same as sign |
| Captions | a11y TF | As in burnt-in captions | Add to specification |
| Different camera angles | a11y TF | | Same as alternative |
| Video mosaic | a11y TF | | Very specific use case. Handle at page level |
| Language dub | a11y TF | | Same as translation |
| Audio descriptions | a11y TF | | Same as description |
| Commentary | a11y TF | As in director's commentary | Use alternative |
| Clear audio | a11y TF | See http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Clear_audio | Add to specification |
| Highcontrast | David Singer | (requirements unclear) | Needs further study |
| Lowcontrast | David Singer | (requirements unclear) | Needs further study |
| Colour blindness adjustments | David Singer | (requirements unclear) | Needs further study |
| Cognitive adjustments | David Singer | (requirements unclear) | Needs further study |
| Repetitive stimulus safe | David Singer | (not clear this is a kind rather than some other kind of property) | Needs further study |
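The Proposal column amounts to a lookup table from container-specific values to the draft kinds. The TypeScript sketch below shows that normalization in the spirit of the container-independence rationale above: a UA (or a script shim) would apply it so that pages only ever see the draft values. The names `normalizeKind`, `OGG_ROLE_TO_KIND` and `THREEGPP_TO_KIND` are hypothetical, and rows marked "Not required", "Add to specification" or "Needs further study" are deliberately left unmapped.

```typescript
type DraftKind = "alternative" | "description" | "main" | "sign" | "translation";

// Ogg roles mapped according to the "Proposal" column above.
const OGG_ROLE_TO_KIND: Readonly<Record<string, DraftKind>> = {
  "video/main": "main",
  "video/alternate": "alternative",
  "video/sign": "sign",
  "audio/main": "main",
  "audio/dub": "translation",
  "audio/audiodesc": "description",
};

// 3GPP liaison values mapped according to the same column.
const THREEGPP_TO_KIND: Readonly<Record<string, DraftKind>> = {
  main: "main",
  alternate: "alternative",
  commentary: "alternative",
  dub: "translation",
};

// Hypothetical tag for where a raw value came from.
type Source = "ogg" | "3gpp";

// Return the container-independent kind, or undefined when the proposal
// does not map the value onto a draft kind. The own-property check stops
// inherited names such as "toString" from matching.
function normalizeKind(source: Source, value: string): DraftKind | undefined {
  const table = source === "ogg" ? OGG_ROLE_TO_KIND : THREEGPP_TO_KIND;
  return Object.prototype.hasOwnProperty.call(table, value) ? table[value] : undefined;
}

console.log(normalizeKind("ogg", "audio/audiodesc")); // description
```

Returning `undefined` instead of passing the raw value through keeps container-specific strings from leaking to scripts.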
New kinds proposed for the specification
This section documents the kinds which are proposed to be added to the specification.
| Value | Definition |
|---|---|
| captions | Video with open ("burned-in") captions |
| subtitles | Video with open ("burned-in") subtitles |
| clearaudio | An alternative audio track in which all sounds other than dialog and other important non-speech information are attenuated. See http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Clear_audio |
The general rationale for exposing kind values is that they give UAs and JavaScript developers a means to determine what to do with a track. Mostly this means deciding whether a track should be turned on in addition to a main track, and whether it contains accessibility or internationalization information, so that the UA or script can turn it on when appropriate. Kind values may also be used to provide suitable UI elements for enabling and disabling the track. Human-readable information that might assist user selection would be exposed through getLabel().
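As an illustration, here is a TypeScript sketch of how a script might combine kind values with user preferences. It assumes a hypothetical `TrackInfo` record already populated from getKind() and getLabel(), and it assumes (this page does not say so) that `translation`, `captions`, `subtitles` and `clearaudio` tracks replace the main track of the same media type while `description` and `sign` tracks play alongside it. `selectTracks`, `menuEntries` and the example labels are invented for illustration.

```typescript
type MediaType = "audio" | "video";

// Hypothetical per-track record; `kind` and `label` are assumed to have
// been read from getKind() and getLabel() on the corresponding track list.
interface TrackInfo {
  index: number;
  media: MediaType;
  kind: string;
  label: string;
}

// Assumption: these kinds stand in for the main track of the same media type...
const REPLACES_MAIN: ReadonlySet<string> = new Set(["translation", "captions", "subtitles", "clearaudio"]);
// ...while these are played in addition to it.
const ADDS_TO_MAIN: ReadonlySet<string> = new Set(["description", "sign"]);

// Return the sorted indices of the tracks to enable for a user who wants
// the kinds in `wanted`.
function selectTracks(tracks: readonly TrackInfo[], wanted: ReadonlySet<string>): number[] {
  const enabled: number[] = [];
  for (const media of ["audio", "video"] as const) {
    const ofType = tracks.filter((t) => t.media === media);
    // Prefer a wanted replacement track; otherwise fall back to "main".
    const primary =
      ofType.find((t) => REPLACES_MAIN.has(t.kind) && wanted.has(t.kind)) ??
      ofType.find((t) => t.kind === "main");
    if (primary !== undefined) enabled.push(primary.index);
    // Add every wanted supplementary track of this media type.
    for (const t of ofType) {
      if (ADDS_TO_MAIN.has(t.kind) && wanted.has(t.kind)) enabled.push(t.index);
    }
  }
  return enabled.sort((a, b) => a - b);
}

// Menu entries for the optional tracks show the human-readable label;
// the kind only decides which tracks appear.
function menuEntries(tracks: readonly TrackInfo[]): string[] {
  return tracks.filter((t) => t.kind !== "main").map((t) => t.label);
}

// Illustrative data only.
const tracks: TrackInfo[] = [
  { index: 0, media: "video", kind: "main", label: "Main camera" },
  { index: 1, media: "video", kind: "sign", label: "Sign-language interpreter" },
  { index: 2, media: "audio", kind: "main", label: "English" },
  { index: 3, media: "audio", kind: "clearaudio", label: "English (clear audio)" },
  { index: 4, media: "audio", kind: "description", label: "English audio description" },
];

console.log(selectTracks(tracks, new Set(["clearaudio", "sign"]))); // [ 0, 1, 3 ]
```

Keeping the selection logic keyed on kind, and the menu text keyed on label, mirrors the split described above: kinds drive behaviour, labels serve the user.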
References
[1] http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_64/Docs/S4-110502.zip