

Technical Requirements Prioritizations and Dependencies

Alternative Content Technologies

System Requirements

Reference | Requirement Brief Description | Types of Technologies Affected | Must / Should / May
Described video
(DV-1) Provide an indication that descriptions are available, and whether they are active or inactive. audio rendering, user interface, API, user preferences, markup must
(DV-2) Render descriptions in a time-synchronized manner, using the media resource as the timebase master. audio rendering, synchronization must
(DV-3) Support multiple description tracks (e.g., discrete tracks containing different levels of detail). API, multitrack, synchronization, navigation, markup, user interface must
(DV-4) Support recordings of real human speech as part of a media resource, or as an external file. synchronization, multitrack, API, markup must
(DV-5) Allow the author to independently adjust the volumes of the audio description and original soundtracks. audio rendering, API, user interface must
(DV-6) Allow the user to independently adjust the volumes of the audio description and original soundtracks, with the user's settings overriding the author's. user preferences, API, user interface must
(DV-7) Permit smooth changes in volume rather than stepped changes. The degree and speed of volume change should be under provider control. audio rendering, user interface, API must
(DV-8) Allow the author to provide fade and pan controls to be accurately synchronised with the original soundtrack. audio rendering, user interface must
(DV-9) Allow the author to use a codec which is optimised for voice only, rather than requiring the same codec as the original soundtrack. codecs must
(DV-10) Allow the user to select from among different languages of descriptions, if available, even if they are different from the language of the main soundtrack. markup, API, user interface must
(DV-11) Support the simultaneous playback of both the described and non-described audio tracks so that one may be directed at separate outputs (e.g., a speaker and headphones). user interface, audio rendering must
(DV-12) Provide a means to prevent descriptions from carrying over from one program or channel when the user switches to a different program or channel. synchronization must
(DV-13) Allow the user to relocate the description track within the audio field, with the user setting overriding the author setting. The setting should be re-adjustable as the media plays. user preferences, audio rendering must
(DV-14) Support metadata, such as copyright information, usage rights, language, etc. cue format, in-band cues, multitrack must
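As an illustration of DV-2, DV-5 and DV-6, a page script could slave a separate description audio file to the video's timebase and keep author and user volume settings distinct. This is a minimal sketch, not a normative mechanism; the function names and the 0.2-second drift tolerance are assumptions.

```javascript
// Sketch: keep a separate described-audio track (DV-4) slaved to the
// video's timebase (DV-2), with independent volume controls (DV-5/6).
// `video` and `descAudio` stand in for HTMLMediaElement instances.

// Re-align the description track when it drifts past `tolerance` seconds.
function resyncDescription(video, descAudio, tolerance = 0.2) {
  const drift = Math.abs(descAudio.currentTime - video.currentTime);
  if (drift > tolerance) {
    descAudio.currentTime = video.currentTime; // video is the timebase master
    return true;  // a correction was applied
  }
  return false;
}

// Independent volumes; user settings (if given) override author defaults,
// as DV-6 requires.
function applyVolumes(video, descAudio, author, user = {}) {
  video.volume = user.main ?? author.main;
  descAudio.volume = user.description ?? author.description;
}
```

In a real page, `resyncDescription` would be wired to the video's `timeupdate` event so corrections happen continuously during playback.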
Text video description
(TVD-1) Support presentation of text video descriptions through a screen reader or braille device cue format, audio rendering, visual rendering, synchronization, API, markup, speech synthesis must
(TVD-1) cont. Support playback speed control, voice control, and synchronization points with the video. user interface, speech synthesis, cue format, audio rendering, synchronization, API, markup must
(TVD-2) TVDs need to be provided in a format that contains a start time and text for each description cue (the duration is determined dynamically, though an end time could provide a cut point). cue format must
(TVD-2) cont. The format should optionally include speech-synthesis markup to improve the quality of the description. speech synthesis should
(TVD-2) cont. The format needs to support accompanying metadata labeling for speakers, language, etc. cue format, audio rendering, speech synthesis must
(TVD-3) Where possible, provide a text or separate audio track privately to those that need it in a mixed-viewing situation, e.g., through headphones. audio rendering should
(TVD-4) Where possible, provide options for authors and users to deal with the overflow case: continue reading, stop reading, and pause the video. cue format, rendering, user interface should
(TVD-5) Support the control over speech-synthesis playback speed, volume and voice, and provide synchronisation points with the video. user interface, audio rendering, speech synthesis, synchronization must
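The overflow handling in TVD-4 can be sketched as a policy decision: estimate how long a description takes to speak and compare it with the available gap before the next cue. The 180-words-per-minute default reading speed and the function names below are assumptions for illustration, not normative values.

```javascript
// Sketch for the TVD-4 overflow case. Estimate whether a text description
// fits the gap before the next cue, and apply an author/user policy when
// it does not.

function estimateSpeakingSeconds(text, wordsPerMinute = 180) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return (words / wordsPerMinute) * 60;
}

// policy: "continue" (keep speaking over the video), "stop" (truncate
// speech), or "pause" (pause the video until speech finishes) — the three
// options TVD-4 names. Returns "speak" when the description simply fits.
function overflowAction(text, gapSeconds, policy = "pause") {
  if (estimateSpeakingSeconds(text) <= gapSeconds) return "speak";
  return policy;
}
```

A user preference for reading speed (see ECC-5) would feed directly into the `wordsPerMinute` parameter.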
Extended video descriptions
(EVD-1) Support detailed user control as specified in (TVD-4) for extended video descriptions. cue format, rendering, user interface, API must
(EVD-2) Support automatically pausing the video and main audio tracks in order to play a lengthy description. rendering, user interface, API must
(EVD-3) Support resuming playback of video and main audio tracks when the description is finished. rendering, API must
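The pause-and-resume behaviour of EVD-2 and EVD-3 reduces to a small control flow. In this sketch, `speak(text, onend)` is a hypothetical wrapper around a speech backend such as the Web Speech API's `speechSynthesis`; it is injected as a parameter so the control flow stays independent of any particular engine.

```javascript
// Sketch of EVD-2/EVD-3: pause the video while a lengthy description is
// spoken, then resume when it ends. `video` is an HTMLMediaElement-like
// object; `speak(text, onend)` abstracts the speech backend.

function playExtendedDescription(video, speak, text) {
  video.pause();                    // EVD-2: halt video and main audio
  speak(text, () => video.play());  // EVD-3: resume when speech finishes
}
```

With the Web Speech API, `speak` could be implemented by creating a `SpeechSynthesisUtterance`, setting its `onend` handler, and calling `speechSynthesis.speak(utterance)`.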
Clear audio
(CA-1) Support speech as a separate, alternative audio track from other sounds. synchronization, multitrack, API must
(CA-2) Support the synchronisation of multitrack audio either within the same file or from separate files - preferably both. synchronization, multitrack, API, markup must
(CA-3) Support separate volume control of the different audio tracks. user interface, API must
(CA-4) Support pre-emphasis filters, pitch-shifting, and other audio-processing algorithms. audio rendering, API, user interface must
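One way to meet CA-2 and CA-3 in a browser is to route each audio track through its own `GainNode` with the Web Audio API. The helper below is a sketch under that assumption; in a real page `ctx` would be an `AudioContext` and the track elements would be `<audio>` elements, and further processing nodes (filters, pitch shifters, per CA-4) could be inserted into each chain.

```javascript
// Sketch of CA-2/CA-3 using the Web Audio API: route each audio element
// (e.g. a speech track and an ambience track) through its own GainNode so
// the user can adjust their volumes independently.

function buildClearAudioMix(ctx, trackElements) {
  const gains = {};
  for (const [name, el] of Object.entries(trackElements)) {
    const source = ctx.createMediaElementSource(el);
    const gain = ctx.createGain();
    source.connect(gain);           // element -> per-track gain
    gain.connect(ctx.destination);  // per-track gain -> output
    gains[name] = gain;             // expose per-track volume control (CA-3)
  }
  return gains;
}
```

Setting `gains.speech.gain.value` then raises or lowers the speech track without touching the other tracks.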
Content navigation by content structure
(CN-1) Provide a means to structure media resources so that users can navigate them by semantic content structure, e.g., through adding a track to the video that contains navigation markers (in table-of-contents style). This means must allow authors to identify ancillary content structures, and all media representations must be kept synchronized when users navigate. must
(CN-2) The navigation track should provide for hierarchical structures with titles for the sections. must
(CN-3) Support both global navigation by the larger structural elements of a media work, and also the most localized atomic structures of that work, even though authors may not have marked-up all levels of navigational granularity. must
(CN-4) Support third-party provided structural navigation markup. must
(CN-5) Keep all content representations in sync, so that moving to any particular structural element in media content also moves to the corresponding point in all provided alternate media representations (captions, described video, transcripts, etc) associated with that work. must
(CN-6) Support direct access to any structural element, possibly through URIs. must
(CN-7) Support pausing primary content traversal to provide access to such ancillary content in line. must
(CN-8) Support skipping of ancillary content in order to not interrupt content flow. must
(CN-9) Support access to each ancillary content item, including with "next" and "previous" controls, apart from accessing the primary content of the title. must
(CN-10) Support bilingual texts in which both the original and translated texts appear on screen, with both highlighted, line by line, in sync with the audio narration. must
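The "next"/"previous" traversal of CN-9 over a chapter-style navigation track can be sketched as a pure function over a sorted cue list. Because the result is a target time on the media timebase, seeking to it keeps all alternate representations in sync (CN-5). The 2-second grace window for "previous" is a common player convention, not a normative value.

```javascript
// Sketch of structural navigation over a chapters track: given a sorted
// list of cues ({ startTime, ... }) and the current playback time, compute
// the seek target for "next"/"previous" navigation (CN-9).

function navigate(cues, currentTime, direction) {
  const starts = cues.map(c => c.startTime);
  if (direction === "next") {
    const next = starts.find(t => t > currentTime);
    return next ?? currentTime;  // already at the last chapter: stay put
  }
  // "previous": go to the current chapter's start, or to the one before it
  // if we are within 2 seconds of the current start.
  const before = starts.filter(t => t <= currentTime);
  const cur = before[before.length - 1] ?? 0;
  return currentTime - cur > 2 ? cur : (before[before.length - 2] ?? 0);
}
```

In a browser, the cue list could come from a `TextTrack` with `kind="chapters"`, and the returned value would be assigned to the media element's `currentTime`.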
Captions/subtitles
(CC-1) render time-synchronized cues along the media timebase cue format, in-band cues, synchronisation, API, rendering, user preferences must
(CC-2) allow erasures, i.e. times when no text cues are active cue format, in-band cues must
(CC-3) allow gap-less cues cue format must
(CC-4) specify a character encoding cue format, internationalisation must
(CC-5) positioning on all parts of the screen, inside and outside the video viewport cue format, rendering, user preferences must
(CC-6) display of multiple text cues at the same time cue format, rendering, user preferences must
(CC-7) display of multiple text cues also in ltr or rtl languages cue format, rendering, internationalisation must
(CC-8) allow explicit line breaks cue format, rendering must
(CC-9) allow a range of font faces and sizes cue format, rendering, user preferences must
(CC-10) allow background colors and background opacity cue format, rendering, user preferences must
(CC-11) allow text colors and opacity cue format, rendering, user preferences must
(CC-12) allow thicker outline or a drop shadow on text cue format, rendering, user preferences must
(CC-13) enable/disable continuation of background color on erasures cue format, rendering, user preferences must
(CC-14) allow cue text rendering effects, e.g. paint on, pop on, roll up, appear cue format, rendering, user preferences must
(CC-15) support bottom 1/12 rendering rule cue format, rendering, user preferences must
(CC-16) support mixed language cues cue format, rendering, internationalisation must
(CC-17) support mixed language cue files cue format, rendering, internationalisation, API must
(CC-18) support furigana, ruby and other common typographical conventions cue format, rendering, internationalisation must
(CC-19) support full range of typographical glyphs, layout and punctuation marks cue format, rendering, internationalisation must
(CC-20) support semantic markup of mixed language cues cue format, rendering, internationalisation, speech synthesis must
(CC-21) support semantic markup of different speakers cue format, rendering, speech synthesis must
(CC-22) support the same API for in-band and external cue formats cue format, API must
(CC-23) synchronized display of cue text and media data cue format, API, synchronisation must
(CC-24) support user activation/deactivation of cue tracks API, synchronisation, user preferences must
(CC-25) support edited and verbatim caption alternatives API, synchronisation, user preferences must
(CC-26) support several cue tracks in different languages API, synchronisation, user preferences must
(CC-27) support live captioning cue format, API, synchronisation, user preferences must
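Track activation and deactivation (CC-24, CC-26) amounts to setting the `mode` of the matching text track and disabling the rest. The sketch below assumes array-like track objects with `kind`, `language`, and `mode` properties, as in the HTML `TextTrack` interface; the helper name is illustrative.

```javascript
// Sketch of CC-24/CC-26: show the caption/subtitle track matching the
// user's preferred language and disable all other caption tracks, leaving
// non-caption tracks (chapters, descriptions, ...) untouched.

function selectCaptionTrack(textTracks, preferredLang) {
  let chosen = null;
  for (const track of textTracks) {
    if (track.kind !== "captions" && track.kind !== "subtitles") continue;
    if (!chosen && track.language === preferredLang) {
      track.mode = "showing";   // activate the matching track
      chosen = track;
    } else {
      track.mode = "disabled";  // deactivate the rest
    }
  }
  return chosen;  // null if no track matches the preference
}
```

A user-preference cascade (see DAC-6 below) would typically call this with a list of acceptable languages in order, stopping at the first non-null result.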
Enhanced captions/subtitles
(ECC-1) support metadata markup of cue segments cue format should
(ECC-2) support hyperlinking on cue segments cue format should
(ECC-3) support extended cue times and overlap handling cue format, synchronisation, user preferences should
(ECC-4) support pausing on extended cue times or parallel display cue format, synchronisation, user preferences should
(ECC-5) allow users to specify their reading speed to deal with extended cues cue format, synchronisation, user preferences should
Sign translation
(SL-1) Support sign-language video either as a track as part of a media resource or as an external file. must
(SL-2) Support the synchronized playback of the sign-language video with the media resource. must
(SL-3) Support the display of sign-language video either as picture-in-picture or alpha-blended overlay, as parallel video, or as the main video with the original video as picture-in-picture or alpha-blended overlay. Parallel video here means two discrete videos playing in sync with each other. It is preferable to have one discrete <video> element contain all pieces for sync purposes rather than specifying multiple <video> elements intended to work in sync. must
(SL-4) Support multiple sign-language tracks in several sign languages. must
(SL-5) Support the interactive activation/deactivation of a sign-language track by the user. must
Transcripts
(T-1) Support the provisioning of a full text transcript for the media asset in a separate but linked resource, where the linkage is programmatically accessible to AT. must
(T-2) Support the provisioning of both scrolling and static display of a full text transcript with the media resource, e.g., in an area next to or underneath the video, which is also AT accessible. must
Access to interactive controls / menus
(KA-1) Support operation of all functionality via the keyboard on systems where a keyboard is (or can be) present, and where a unique focus object is employed. This does not forbid and should not discourage providing mouse input or other input methods in addition to keyboard operation. (UAAG 2.0 4.1.1) (NOTE: This means that all interaction possibilities with media elements need to be keyboard accessible; e.g., through being able to tab onto the play, pause, mute buttons, and to move the playback position from the keyboard.) must
(KA-2) Support a rich set of native controls for media operation, including but not limited to play, pause, stop, jump to beginning, jump to end, scale player size (up to full screen), adjust volume, mute, captions on/off, descriptions on/off, selection of audio language, selection of caption language, selection of audio description language, location of captions, size of captions, video contrast/brightness, playback rate, content navigation on same level (next/prev) and between levels (up/down) etc. This is also a particularly important requirement on mobile devices or devices without a keyboard. (NOTE: This means that the @controls content attribute needs to provide an extended set of control functionality including functionality for accessibility users.) must
(KA-3) All functionality available to native controls must also be available to scripted controls. The author would be able to choose any/all of the controls, skin them and position them. (NOTE: This means that new IDL attributes need to be added to the media elements for the extra controls that are accessibility related.) must
(KA-4) It must always be possible to enable native controls regardless of the author preference to guarantee that such functionality is available and essentially override author settings through user control. This is also a particularly important requirement on mobile devices or devices without a keyboard. (NOTE: This could be enabled through a context menu, which is keyboard accessible and its keyboard access cannot be turned off.) must
(KA-5) The scripted and native controls must go through the same platform-level accessibility framework (where it exists), so that a user presented with the scripted version is not shut out from some expected behaviour. (NOTE: This is below the level of HTML and means that the accessibility platform needs to be extended to allow access to these controls. ) must
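One way to keep scripted and native controls in parity (KA-1, KA-3) is to drive both from a single command table, so every operation reachable by pointer is also reachable from the keyboard. This is a sketch; the key bindings and helper names below are illustrative, not standardized.

```javascript
// Sketch of KA-1/KA-3: one command table serves both on-screen buttons
// and keyboard shortcuts. `v` is an HTMLMediaElement-like object.

const mediaCommands = {
  " ": v => (v.paused ? v.play() : v.pause()),              // toggle playback
  "m": v => { v.muted = !v.muted; },                        // mute/unmute
  "ArrowUp": v => { v.volume = Math.min(1, v.volume + 0.1); },
  "ArrowDown": v => { v.volume = Math.max(0, v.volume - 0.1); },
  "Home": v => { v.currentTime = 0; },                      // jump to beginning
};

// Returns true when the key was handled, so callers know whether to
// preventDefault() on the underlying keyboard event.
function handleMediaKey(video, key) {
  const command = mediaCommands[key];
  if (!command) return false;
  command(video);
  return true;
}
```

An on-screen "mute" button would call the same `mediaCommands["m"]` entry its keyboard shortcut does, which is what keeps the two control paths from diverging.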
Granularity level control for structural navigation
(CNS-1) All identified structures, including ancillary content as defined in "Content Navigation" above, must be accessible with the use of "next" and "previous," as refined by the granularity control. must
(CNS-2) Users must be able to discover, skip, play-in-line, or directly access ancillary content structures. must
(CNS-3) Users need to be able to access the granularity control using any input mode, e.g. keyboard, speech, pointer, etc. must
(CNS-4) Producers and authors may optionally provide additional access options to identified structures, such as direct access to any node in a table of contents. must
Time-scale modification
(TSM-1) The user can adjust the playback rate of the time-based media tracks to between 50% and 250% of real time. must
(TSM-2) Speech whose playback rate has been adjusted by the user maintains pitch in order to limit degradation of the speech quality. must
(TSM-3) All provided alternative media tracks remain synchronized across this required range of playback rates. must
(TSM-4) The user agent provides a function that resets the playback rate to normal (100%). must
(TSM-5) The user can stop, pause, and resume rendered audio and animation content (including video and animated images) that last three or more seconds at their default playback rate. (UAAG 2.0 4.9.6) must
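TSM-1, TSM-2 and TSM-4 map naturally onto the HTMLMediaElement `playbackRate` and `preservesPitch` attributes; the sketch below clamps requests to the required 50%–250% range. The helper names are illustrative.

```javascript
// Sketch of TSM-1/TSM-2/TSM-4: clamp the user-requested rate to the
// required 50%-250% range, keep pitch correction on where the attribute
// is supported, and provide a reset to normal speed.

function setPlaybackRate(video, requested) {
  const rate = Math.min(2.5, Math.max(0.5, requested)); // TSM-1 range
  video.playbackRate = rate;
  if ("preservesPitch" in video) video.preservesPitch = true; // TSM-2
  return rate;
}

function resetPlaybackRate(video) {
  video.playbackRate = 1.0;  // TSM-4: back to real time
}
```

TSM-3 then falls out of the timebase model: alternative tracks synchronized to the media timebase stay aligned because the rate change applies to the shared clock, not to each track separately.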
Production practice and resulting requirements
(PP-1) Support existing production practice for alternative content resources, in particular allow for the association of separate alternative content resources to media resources. Browsers cannot support all forms of time-stamp formats out there, just as they cannot support all forms of image formats (etc.). This necessitates a clear and unambiguous declared format, so that existing authoring tools can be configured to export finished files in the required format. must
(PP-2) Support the association of authoring and rights metadata with alternative content resources, including copyright and usage information. must
(PP-3) Support the simple replacement of alternative content resources even after publishing. This is again dependent on authoring practice - if the content creator delivers a final media file that contains related accessibility content inside the media wrapper (for example an MP4 file), then it will require an appropriate third-party authoring tool to make changes to that file - it cannot be demanded of the browser to do so. must
(PP-4) Typically, alternative content resources are created by different entities from the ones that create the media content. They may even be in different countries and not be allowed to re-publish each other's content. It is important to be able to host these resources separately, have the Web page author associate them, and eventually play them back synchronously to the user. must
Discovery and activation/deactivation of available alternative content by the user
(DAC-1) The user has the ability to have indicators rendered along with rendered elements that have alternative content (e.g., visual icons rendered in proximity of content which has short text alternatives, long descriptions, or captions). In cases where the alternative content has different dimensions than the original content, the user has the option to specify how the layout/reflow of the document should be handled. (UAAG 2.0 3.1.1). must
(DAC-2) The user has a global option to specify which types of alternative content to render by default and, in cases where the alternative content has different dimensions than the original content, how the layout/reflow of the document should be handled (UAAG 2.0 3.1.2). Media queries have been proposed as a way of meeting this need, along with the use of CSS for layout. must
(DAC-3) The user can browse the alternatives and switch between them. must
(DAC-4) Synchronized alternatives for time-based media (e.g., captions, descriptions, sign language) can be rendered at the same time as their associated audio tracks and visual tracks (UAAG 2.0 3.1.3). must
(DAC-5) Non-synchronized alternatives (e.g., short text alternatives, long descriptions) can be rendered as replacements for the original rendered content (UAAG 2.0 3.1.3). must
(DAC-6) Provide the user with the global option to configure a cascade of types of alternatives to render by default, in case a preferred alternative content type is unavailable (UAAG 2.0 3.1.4). must
(DAC-7) During time-based media playback, the user can determine which tracks are available and select or deselect tracks. These selections may override global default settings for captions, descriptions, etc. (UAAG 2.0 4.9.8) must
(DAC-8) Provide the user with the option to load time-based media content such that the first frame is displayed (if video), but the content is not played until explicit user request. (UAAG 2.0 4.9.2) must
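The preference cascade of DAC-6 can be sketched as a first-match lookup of the user's ordered alternative-content types against what a given resource actually provides. The kind names and function name below are illustrative.

```javascript
// Sketch of DAC-6: given the user's preference cascade (an ordered list
// of alternative-content kinds) and the kinds available on a resource,
// pick the first available alternative; null if none match.

function pickAlternative(cascade, availableKinds) {
  const available = new Set(availableKinds);
  return cascade.find(kind => available.has(kind)) ?? null;
}
```

For DAC-7, `availableKinds` would be enumerated from the resource's track lists at playback time, so per-resource selections can override this global default.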
Requirements on making properties available to the accessibility interface
(API-1) The existence of alternative-content tracks for a media resource must be exposed to the user agent. must
(API-2) Since authors will need access to the alternative content tracks, the structure needs to be exposed to authors as well, which requires a dynamic interface. must
(API-3) Accessibility APIs need to gain access to alternative content tracks no matter whether those content tracks come from within a resource or are combined through markup on the page. must
Requirements on the use of the viewport
(VP-1) It must be possible to deal with three different cases for the relation between the viewport size, the position of media and of alternative content:
  1. the alternative content's extent is specified in relation to the media viewport (e.g., picture-in-picture video, lower-third captions)
  2. the alternative content has its own independent extent, but is positioned in relation to the media viewport (e.g., captions above the audio, sign-language video above the audio, navigation points below the controls)
  3. the alternative content has its own independent extent and doesn't need to be rendered in any relation to the media viewport (e.g., text transcripts)

If alternative content has a different height or width than the media content, then the user agent will reflow the (HTML) viewport. (UAAG 2.0 3.1.4).
(NOTE: This may create a need to provide an author hint to the Web page when embedding alternate content in order to instruct the Web page how to render the content: to scale with the media resource, scale independently, or provide a position hint in relation to the media. On small devices where the video takes up the full viewport, only limited rendering choices may be possible, such that the UA may need to override author preferences.)

(VP-2) The user can change the following characteristics of visually rendered text content, overriding those specified by the author or user-agent defaults (UAAG 2.0 3.6.1; note: this should include captions and any text rendered in relation to media elements, so that rendered text can be magnified and simplified):
  1. text scale (i.e., the general size of text),
  2. font family, and
  3. text color (i.e., foreground and background).

(NOTE: This should be achievable through UA configuration or even through something like a greasemonkey script or user CSS which can override styles dynamically in the browser.)

(VP-3) Provide the user with the ability to adjust the size of the time-based media up to the full height or width of the containing viewport, with the ability to preserve aspect ratio and to adjust the size of the playback viewport to avoid cropping, within the scaling limitations imposed by the media itself. (UAAG 2.0 4.9.9)

(NOTE: This can be achieved by simply zooming into the Web page, which will automatically rescale the layout and reflow the content.)

(VP-4) Provide the user with the ability to control the contrast and brightness of the content within the playback viewport. (UAAG 2.0 4.9.11)

(NOTE: This is a user-agent device requirement and should already be addressed in the UAAG. In live content, it may even be possible to adjust camera settings to achieve this requirement. It is also a "SHOULD" level requirement, since it does not account for limitations of various devices.)

(VP-5) Captions and subtitles traditionally occupy the lower third of the video, where controls are also usually rendered. The user agent must avoid overlap between overlay content and controls on media resources. This must also hold if, for example, the controls are only visible on demand.

(NOTE: If there are several types of overlapping overlays, the controls should stay on the bottom edge of the viewport and the others should be moved above this area, all stacked above each other. )

Requirements on the use of alternate content, potentially on multiple devices in parallel
(MD-1) Support a platform-accessibility architecture relevant to the operating environment. (UAAG 2.0 2.1.1) must
(MD-2) Ensure accessibility of all user-interface components including the user interface, rendered content, and alternative content; make available the name, role, state, value, and description via a platform-accessibility architecture. (UAAG 2.0 2.1.2) must
(MD-3) If a feature is not supported by the accessibility architecture(s), provide an equivalent feature that does support the accessibility architecture(s). Document the equivalent feature in the conformance claim. (UAAG 2.0 2.1.3) must
(MD-4) If the user agent implements one or more DOMs, they must be made programmatically available to assistive technologies. (UAAG 2.0 2.1.4) This assumes the video element will write to the DOM. must
(MD-5) If the user can modify the state or value of a piece of content through the user interface (e.g., by checking a box or editing a text area), the same degree of write access is available programmatically (UAAG 2.0 2.1.5). must
(MD-6) If any of the following properties are supported by the accessibility-platform architecture, make the properties available to the accessibility-platform architecture (UAAG 2.0 2.1.6):
  1. the bounding dimensions and coordinates of rendered graphical objects;
  2. font family;
  3. font size;
  4. text foreground color;
  5. text background color;
  6. change state/value notifications.
(MD-7) Ensure that programmatic exchanges between APIs proceed at a rate such that users do not perceive a delay. (UAAG 2.0 2.1.7). must


The technology categories used in the checklist are as follows:

  • cue format: text format for providing time-aligned text
  • in-band cues: time-aligned text inside binary media resources
  • multitrack: dealing with multiple audio and video tracks in one resource
  • synchronisation: synchronisation between different resources
  • API: JavaScript API
  • rendering (video): visual rendering of accessibility information
  • audio rendering: audio rendering of accessibility information
  • speech synthesis: text-to-speech rendering of alternative text
  • user preferences: preference settings in Web browsers
  • internationalisation: satisfying the needs of users with different language backgrounds
  • navigation: dealing with relationships in the user interface
  • markup: needs specific markup in the HTML
  • codecs: requirements on types of audio and video codecs
  • user interface: part of visual rendering that affects user interaction