This Wiki page is edited by participants of the HTML Accessibility Task Force. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Task Force participants, WAI, or W3C. It may also have some very useful information.

Track Kinds


Background

  • Support for multiple in-band tracks in HTML raises the question of how to determine what each track is. This could be done with external metadata, but this implies the page has a means to obtain and interpret that metadata, which rules out a number of architectures involving generic scripts and web page components.
  • Increasing interest in and standardization of adaptive streaming means that multiple in-band audio and video tracks become viable, because it is possible to transport only the active tracks.

Current status (4/28/11)

  • A getKind() method is included in the W3C editor’s draft
  • A specific set of values is defined in the draft, based on some of the values supported by the Ogg container format. This list is repeated below for reference
  • 3GPP have asked W3C if they will define kind values for accessibility purposes and have provided us with a list of the values they have defined for other (non-accessibility) purposes
  • MPEG is expected to align with 3GPP
  • This page is a work in progress and is being actively studied by the HTML Accessibility Task Force

Questions from 3GPP

In [1] they ask:

  1. whether our hope to recommend use of W3C ‘role’ names, in our specification, seems achievable and reasonable, in your opinion;
  2. your thinking on the set of names;
  3. your schedule for defining at least a stable initial set of names;
  4. whether you will define a URN to identify the set you define.

Proposed answers (proposed by Mark)

  1. Yes, except that we call them "kinds"
  2. (to be decided - see below)
  3. Initial set to be decided by LC deadline (May 22nd?)
  4. (to be decided - don't see why not)

The rationale for defining values in W3C, rather than simply reflecting what each container supports, is that scripts should not need to know which container format a track came from in order to interpret the values. The values should be defined in a “container-independent” way.
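As an illustration of this point (not drawn from any specification), the translation from container-specific role names to container-independent kind values would live in the user agent, not in page scripts. The following minimal TypeScript sketch uses a few Ogg and 3GPP values from the tables later on this page, together with the kind values from the editor's draft; the table and function names are hypothetical.

    // Hypothetical sketch of the mapping a user agent might apply while demuxing.
    // Page scripts would only ever see the container-independent values on the right.
    const OGG_ROLE_TO_KIND: Record<string, string> = {
      "video/main": "main",
      "video/alternate": "alternative",
      "video/sign": "sign",
      "audio/main": "main",
      "audio/dub": "translation",
      "audio/audiodesc": "description",
    };

    const GPP_VALUE_TO_KIND: Record<string, string> = {
      "main": "main",
      "alternate": "alternative",
      "commentary": "alternative",
      "dub": "translation",
    };

    // A script calling getKind() never needs to know which of these tables was used.
    function kindFromContainer(container: "ogg" | "3gpp", value: string): string | undefined {
      return container === "ogg" ? OGG_ROLE_TO_KIND[value] : GPP_VALUE_TO_KIND[value];
    }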

Kinds in the W3C Editor's draft

These values are returned by the getKind() method on an audio or video track and are documented here: http://dev.w3.org/html5/spec/video.html#dom-tracklist-getkind

  • alternative: A possible alternative to the main track, e.g. a different take of a song (audio), or a different angle (video).
  • description: An audio description of a video track.
  • main: The primary audio or video track.
  • sign: A sign-language interpretation of an audio track.
  • translation: A translated version of the main track.
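As a concrete, non-normative illustration, here is a short TypeScript sketch of how a page script might use these values to switch on an audio description track. The TrackList and MultipleTrackList shapes are assumed from the editor's draft cited above (length, getKind(index), getLabel(index), enable(index)); the exact member names may differ, and the helper function itself is hypothetical.

    // Track list shapes assumed from the editor's draft; details may differ.
    interface TrackList {
      readonly length: number;
      getKind(index: number): string;   // e.g. "main", "description", "sign", ...
      getLabel(index: number): string;  // human-readable label for UI purposes
    }
    interface MultipleTrackList extends TrackList {
      isEnabled(index: number): boolean;
      enable(index: number): void;
      disable(index: number): void;
    }

    // Hypothetical helper: enable the first in-band audio description track, if any.
    function enableAudioDescription(audioTracks: MultipleTrackList): boolean {
      for (let i = 0; i < audioTracks.length; i++) {
        if (audioTracks.getKind(i) === "description") {
          audioTracks.enable(i);
          return true;
        }
      }
      return false; // no description track is present in this resource
    }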

Kinds that have been suggested

This section collects all the "kinds" that have been suggested, together with their sources. There may be duplicates; one purpose of the list is to identify them.

  • video/main (Ogg "roles" [2]): The main video track. Proposal: same as main.
  • video/alternate (Ogg "roles" [2]): A possible alternative to the main track, e.g. a different camera angle. Proposal: same as alternative.
  • video/sign (Ogg "roles" [2]): A sign-language video track. Proposal: same as sign.
  • audio/main (Ogg "roles" [2]): The main audio track. Proposal: same as main.
  • audio/dub (Ogg "roles" [2]): The audio track but with speech in a different language from the original. Proposal: same as translation.
  • audio/audiodesc (Ogg "roles" [2]): An audio description recording for the vision-impaired. Proposal: same as description.
  • audio/music (Ogg "roles" [2]): A music track, e.g. when music, speech and sound effects are delivered in different tracks. Proposal: not required.
  • audio/speech (Ogg "roles" [2]): A speech track, e.g. when music, speech and sound effects are delivered in different tracks. Proposal: not required.
  • audio/sfx (Ogg "roles" [2]): A sound effects track, e.g. when music, speech and sound effects are delivered in different tracks. Proposal: not required.
  • main (3GPP Liaison [1]): This stream is part of the main program content. Proposal: same as main.
  • supplementary (3GPP Liaison [1]): For a main program that is audio, a supplementary video stream might provide, for example, dynamic graphics. Proposal: not required.
  • alternate (3GPP Liaison [1]): Such a stream might provide a different camera viewpoint (3GPP strongly recommend the provision of further annotations to clarify the nature of the alternative). Proposal: same as alternative.
  • commentary (3GPP Liaison [1]): Proposal: use alternative.
  • dub (3GPP Liaison [1]): An alternative audio stream that contains a non-original language. Proposal: same as translation.
  • captions (3GPP Liaison [1]): Proposal: add to specification.
  • subtitles (3GPP Liaison [1]): Proposal: add to specification.
  • sign language (a11y TF): Proposal: same as sign.
  • Captions (a11y TF): As in burnt-in captions. Proposal: add to specification.
  • Different camera angles (a11y TF): Proposal: same as alternative.
  • Video mosaic (a11y TF): Very specific use case. Proposal: handle at page level.
  • Language dub (a11y TF): Proposal: same as translation.
  • Audio descriptions (a11y TF): Proposal: same as description above.
  • Commentary (a11y TF): As in director’s commentary. Proposal: use alternative.
  • Clear audio (a11y TF): See http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Clear_audio. Proposal: add to specification.
  • Highcontrast (David Singer): Requirements unclear. Proposal: needs further study.
  • Lowcontrast (David Singer): Requirements unclear. Proposal: needs further study.
  • Colour blindness adjustments (David Singer): Requirements unclear. Proposal: needs further study.
  • Cognitive adjustments (David Singer): Requirements unclear. Proposal: needs further study.
  • Repetitive stimulus safe (David Singer): Not clear whether this is a kind rather than some other kind of property. Proposal: needs further study.

New kinds proposed for the specification

This section documents the kinds which are proposed to be added to the specification.

  • captions: Video with open ("burned in") captions.
  • subtitles: Video with open ("burned in") subtitles.
  • clearaudio: An alternative audio track in which sounds other than dialog and other important non-speech information are attenuated. See http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Clear_audio
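If these values are adopted, scripts could test for them in exactly the same way as for the existing kinds. A hedged TypeScript sketch follows; the preference flag and function are invented for the example, and the minimal track-list shape is assumed from the editor's draft.

    // Hypothetical: prefer a "clearaudio" track when the user has asked for it,
    // otherwise fall back to the main audio track.
    function chooseAudioTrackIndex(
      audioTracks: { length: number; getKind(index: number): string },
      wantsClearAudio: boolean
    ): number {
      let mainIndex = -1;
      for (let i = 0; i < audioTracks.length; i++) {
        const kind = audioTracks.getKind(i);
        if (wantsClearAudio && kind === "clearaudio") return i;
        if (kind === "main" && mainIndex < 0) mainIndex = i;
      }
      return mainIndex; // -1 if no main track was found
    }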

The general rationale for exposing kind values is that they give user agents and script developers a means to determine what to do with a track: chiefly, whether a track should be turned on in addition to the main track, and whether it carries accessibility or internationalization content that the UA or script can then enable. Kind values may also be used to present suitable UI elements for enabling and disabling tracks. Human-readable information that might assist user selection would go into getLabel() instead.
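To illustrate that split between machine-readable and human-readable information, here is a small hedged TypeScript sketch that combines getKind() and getLabel() to build a track selection menu; the track-list shape is again assumed from the editor's draft, and the menu structure is invented for the example.

    // Hypothetical menu entry combining machine- and human-readable track metadata.
    interface TrackMenuEntry {
      index: number;
      kind: string;   // machine-readable: drives default enable/disable behaviour
      label: string;  // human-readable: shown to the user in the menu
    }

    function buildTrackMenu(
      tracks: { length: number; getKind(index: number): string; getLabel(index: number): string }
    ): TrackMenuEntry[] {
      const entries: TrackMenuEntry[] = [];
      for (let i = 0; i < tracks.length; i++) {
        entries.push({ index: i, kind: tracks.getKind(i), label: tracks.getLabel(i) });
      }
      return entries;
    }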

References

[1] http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_64/Docs/S4-110502.zip

[2] http://wiki.xiph.org/SkeletonHeaders#Role