This Wiki page is edited by participants of the HTML Accessibility Task Force. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Task Force participants, WAI, or W3C. It may also have some very useful information.

TextFormat Mapping to Requirements


Technical Requirements Mapping for Time-aligned Text Alternatives

This page compares time-aligned text formats with the list of requirements that the media accessibility TF has collected.

The following formats are being considered:

1. TTML - Timed Text Markup Language. TTML is a self-contained XML format for describing the synchronised display of formatted text that can be associated with a given timeline.


2. WebSRT - Web Subtitle Resource Tracks. WebSRT is a self-contained line-based text format consisting of temporal cues together with their positioning, styling, and content.


Comparison of format features to the media accessibility requirements:

Each entry below lists the following fields: Reference | Requirement brief description | Types of technologies affected | Technology categorization | How TTML addresses the requirement | How WebSRT addresses the requirement
Described video
(DV-1) Provide an indication that descriptions are available, and are active/non-active. audio rendering, user interface, API, user preferences, markup UX This is a UA requirement; note that TTML does not specifically address audio rendering, but it has markup to indicate text as description, and is capable of embedding SSML or similar. WebSRT does not (yet) have markup to indicate cue content as description, but is capable of embedding SSML or any other content.

It is possible to include such metadata in a first cue of zero duration, as has been seen in SRT before:

00:00:00.000 --> 00:00:00.000
Kind=descriptions

Turning them on/off is a player issue, i.e. it will be done through the <track> element.

<video>
  <track id="dv1" kind="descriptions" srclang="en" src="dv1.wsrt">
</video>
(DV-2) Render descriptions in a time-synchronized manner, using the media resource as the timebase master. audio rendering, synchronization UX TTML rendering can be based on a media resource using the media clock mode. Descriptions can be annotated on individual cues within a file, and externally for the whole file using @kind.
 <p begin="0s" end="10s" role="description"> This is a description cue. </p>
 <p begin="0s" end="10s" role="caption"> This is a caption cue. </p>
 <p begin="11s" end="12s" role="description"> This is another description cue. </p>


WebSRT provides for the specification of descriptions for a media resource's timeline together with the <track> element:
<video>
  <track id="dv2" kind="descriptions" srclang="en" src="dv2.wsrt">
</video>
 00:00:00.000 --> 00:00:10.000
 This is a first description cue.

 00:00:10.000 --> 00:00:20.000
 This is a second description cue.
(DV-3) Support multiple description tracks (e.g., discrete tracks containing different levels of detail). API, multitrack, synchronization, navigation, markup, user interface SPECCED Each track can be a separate TTML file, or enclosed as separate divs in a single file (see the TTML sketch after the WebSRT example below). The <track> element of HTML together with WebSRT can provide for multiple description tracks through multiple WebSRT files.
<video>
  <track label="rough" kind="descriptions" srclang="en" src="dv1.wsrt">
  <track label="detailed" kind="descriptions" srclang="en" src="dv2.wsrt">
</video>
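For the TTML side, a minimal sketch of two levels of description enclosed as separate divs in a single file (the xml:id values and description text are illustrative):

<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
  <body>
    <div xml:id="rough">
      <p begin="0s" end="10s" ttm:role="description">A man waves.</p>
    </div>
    <div xml:id="detailed">
      <p begin="0s" end="10s" ttm:role="description">A man in a sports jacket and tie waves at the camera.</p>
    </div>
  </body>
</tt>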
(DV-4) Support recordings of real human speech as part of a media resource, or as an external file. synchronization, multitrack, API, markup SPECNEW N/A N/A
(DV-5) Allow the author to independently adjust the volumes of the audio description and original soundtracks. audio rendering, API, user interface UX N/A N/A
(DV-6) Allow the user to independently adjust the volumes of the audio description and original soundtracks, with the user's settings overriding the author's. user preferences, API, user interface UX N/A N/A
(DV-7) Permit smooth changes in volume rather than stepped changes. The degree and speed of volume change should be under provider control. audio rendering, user interface, API UX N/A N/A
(DV-8) Allow the author to provide fade and pan controls to be accurately synchronised with the original soundtrack. audio rendering, user interface SPECNEW, UX N/A N/A
(DV-9) Allow the author to use a codec which is optimised for voice only, rather than requiring the same codec as the original soundtrack. codecs NO N/A N/A
(DV-10) Allow the user to select from among different languages of descriptions, if available, even if they are different from the language of the main soundtrack. markup, API, user interface UX The <track> element of HTML together with TTML can provide for multiple description tracks through multiple TTML files.
<video>
  <track label="description" kind="descriptions" srclang="en" src="dv1.ttml">
  <track label="Beschreibung" kind="descriptions" srclang="de" src="dv2.ttml">
</video>
The <track> element of HTML together with WebSRT can provide for multiple description tracks through multiple WebSRT files.
<video>
  <track label="description" kind="descriptions" srclang="en" src="dv1.wsrt">
  <track label="Beschreibung" kind="descriptions" srclang="de" src="dv2.wsrt">
</video>
(DV-11) Support the simultaneous playback of both the described and non-described audio tracks so that one may be directed at separate outputs (e.g., a speaker and headphones). user interface, audio rendering UX N/A N/A
(DV-12) Provide a means to prevent descriptions from carrying over from one program or channel when the user switches to a different program or channel. synchronization UX N/A N/A
(DV-13) Allow the user to relocate the description track within the audio field, with the user setting overriding the author setting. The setting should be re-adjustable as the media plays. user preferences, audio rendering UX N/A N/A
(DV-14) Support metadata, such as copyright information, usage rights, language, etc. cue format, in-band cues, multitrack CUEFMT TTML supports arbitrary metadata, on a file/section/cue or part of cue basis:
<p dur="20s">
     <metadata>
        Title: The Prophet
        Binding: Paperback
        ISBN 10: 000100039X
        ISBN 13: 9780001000391
        Keywords: Prophet - Gibran
        Subjects: POETRY / Inspirational &amp; Religious; 
     </metadata>
     And when the shadow fades and is no more, the light that lingers becomes a shadow to another light.<br/>
     And thus your freedom when it loses its fetters becomes itself the fetter of a greater freedom.
     <span><metadata>Author: Kahlil Gibran</metadata>
           Khalil Gibran <span lang="ar">جبران خليل جبران بن ميخائيل بن سعد,</span> 
      </span>
</p>
WebSRT does not support file-wide metadata yet, but it supports timed metadata, and such information could go into the first cue. Here are some EBU SRT-inspired metadata fields.
00:00:00.000 --> 00:00:00.000
Kind=descriptions
Language = de
OriginalProgramTitle = Of Apples and Oranges
OriginalEpisodeTitle = Epilogue
SubtitleListRefCode = invoice123
CreationDate = 2010-11-24T11:15:52EST
RevisionDate = 2010-11-24T11:15:52EST
RevisionNumber = 99
MaxCharWidth = 30
MaxRows = 3
CountryOfOrigin = Australia
Publisher = Gingertech
Editor = Mr Bean
EditorContact = mrbean@gmail.com
Copyright = Gingertech
License = http://creativecommons.org/licenses/by-sa/2.5/
Text video description
(TVD-1) Support presentation of text video descriptions through a screen reader or braille device cue format, audio rendering, visual rendering, synchronization, API, markup, speech synthesis UX TTML has markup to indicate text as description, and is capable of embedding SSML or similar.
    <div region='descriptionArea'>
      <p xml:id='desc1' ttm:role='description' begin='18s' dur='1s'
        >Open on a man in sports jacket and tie in front of a plain white background waving. An email address sean@windows.com is overlaid</p>
      <p xml:id='desc2' ttm:role='description' begin='5.08s' dur='1s'
        >Woman in casual clothing stands in front of a white board covered in technical diagrams. An email address uche@windows.com is overlaid.</p>
     ...
WebSRT supports timed text descriptions:
<video>
  <track id="tvd1" kind="descriptions" srclang="en" src="tvd1.wsrt">
</video>
 00:00:00.000 --> 00:00:10.000
 This is a first description cue.

 00:00:10.000 --> 00:00:20.000
 This is a second description cue.
(TVD-1) cont. Support playback speed control, voice control, and synchronization points with the video. user interface, speech synthesis, cue format, audio rendering, synchronization, API, markup UX TTML is inherently synchronised with external media; the actor attribute or speech metadata can be used to control voice (see the TTML sketch after the WebSRT example below). WebSRT supports voice control via the <v> element. It is inherently synchronized with external media.
<video>
  <track id="tvd2" kind="descriptions" srclang="en" src="tvd2.wsrt">
</video>
 00:00:00.000 --> 00:00:10.000
 <1>This is a description cue by voice 1.

 00:00:10.000 --> 00:00:20.000
 <narrator>This is a description cue by the narrator.

 00:00:20.000 --> 00:00:30.000
 <music>This is some music.

Playback speed is a function of the screen reader software.
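For the TTML side, a minimal sketch of voice control via agent metadata, assuming the playback engine maps agents to TTS voices (the agent names are illustrative; a fuller agent example appears under TVD-2):

<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
  <head>
    <ttm:agent xml:id="voice1" type="person"/>
    <ttm:agent xml:id="narrator" type="person"/>
  </head>
  <body>
    <div>
      <p begin="0s" end="10s" ttm:agent="voice1" ttm:role="description">This is a description cue by voice 1.</p>
      <p begin="10s" end="20s" ttm:agent="narrator" ttm:role="description">This is a description cue by the narrator.</p>
    </div>
  </body>
</tt>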

(TVD-2) TVDs need to be provided in a format that contains start time and text per description cue (the duration is determined dynamically, though an end time could provide a cut point) cue format UX It is not clear what 'duration is determined dynamically' actually means here. Assuming the description is not extended, and thus the main timeline is still running, the description has to be inserted into a gap of known duration to avoid the existing soundtrack, so specifying a synchronised end point or duration on the cue is required. If the text is being read out by a TTS engine, there are multiple means to "fit" the audio to the gap: one is to speed the reader up, another is to slow the video down. Such approaches are, however, a TTS playback issue and not a text format issue. WebSRT has end times, which give the screen reader a rough time budget to read the text.
An example of a dynamic end time is given in TVD-4.
(TVD-2) cont TVDs need to be provided in a format that contains possibly a speech-synthesis markup to improve quality of the description speech synthesis UX TTML can embed SSML in its own namespace
<p begin="30s" dur="5s">
   <metadata>
   <speak xmlns="http://www.w3.org/2001/10/synthesis">
   <s xml:lang="en-US">
      <voice name="David" gender="male" age="25">
      <prosody duration="5s">
        You shall be free indeed when your days are not without a care nor 
        your nights without a want and a grief</voice>
     </prosody>
   </s>
   </speak>
   </metadata>
   You shall be free indeed when your days are not without a care nor 
   your nights without a want and a grief
</p>

Or parts of SSML e.g. just the duration hint:

<p begin="30s" dur="5s" tts:duration="5s" xmlns:tts="http://www.w3.org/2001/10/synthesis">
   You shall be free indeed when your days are not without a care nor 
   your nights without a want and a grief
</p>
WebSRT cues can contain SSML or any other speech synthesis markup, with a JavaScript library on the page then interpreting the SSML:
<video>
  <track id="tvd2" kind="metadata" srclang="en" src="tvd2.wsrt">
</video>
 00:00:00.000 --> 00:00:00.000
 Kind=metadata
 Type=application/ssml+xml
 Version=1.0
 Language=en-US
 Lexicon=http://www.example.com/lexicon.file
 Lexicon=http://www.example.com/strange-words.file

 00:00:00.000 --> 00:00:10.000
 <voice gender="male">
   <s>We stand in front of the <prosody rate="-20%">capitol in Washington.</prosody></s>
 </voice>

 00:00:10.000 --> 00:00:20.000
 <voice gender="male">
   <s>There is a crowd gathered <break/> and it is raining.</s>
  </voice>
(TVD-2) cont TVDs need to be provided in a format that contains accompanying metadata labelling for speakers, language, etc. cue format, audio rendering, speech synthesis UX TTML contains specific metadata for this, and can add arbitrary metadata.
<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
  <head>
    <ttm:agent xml:id="connery" type="person">
      <ttm:name type="family">Connery</ttm:name>
      <ttm:name type="given">Thomas Sean</ttm:name>
      <ttm:name type="alias">Sean</ttm:name>
      <ttm:name type="full">Sir Thomas Sean Connery</ttm:name>
    </ttm:agent>
    <ttm:agent xml:id="bond" type="character">
      <ttm:name type="family">Bond</ttm:name>
      <ttm:name type="given">James</ttm:name>
      <ttm:name type="alias">007</ttm:name>
      <ttm:actor agent="connery"/>
    </ttm:agent>
  </head>
  <body>
    <div>
      ...
      <p ttm:agent="bond">I travel, a sort of licensed troubleshooter.</p>
      ...
    </div>
  </body>
</tt>


WebSRT supports <v> voice labelling. It does not (yet) support language changes mid-cue or attachment of arbitrary metadata mid-cue. This would require the introduction of something like a <span> element or of lang and class attributes on the voice marker, e.g.
 00:00:00.000 --> 00:00:10.000
 <1>This is a description cue by voice 1.

 00:00:10.000 --> 00:00:20.000
 <narrator>This is spoken by <span class=fn>Charles de Gaulle</span>.

 00:00:20.000 --> 00:00:30.000
 <2 lang=fr>Bonjour mes amis.
(TVD-3) Where possible, provide a text or separate audio track privately to those that need it in a mixed-viewing situation, e.g., through headphones. audio rendering UX It could be possible to use metadata markup on the TTML region mapping for descriptions to achieve this, but it is largely a UA issue. This is a system-level requirement concerning where the descriptions are rendered to. For example
<video>
  <track id="tvd3" kind="descriptions" srclang="en" src="tvd3.wsrt">
</video>

would be given to the screen reader, which could route its output to a headphone or Braille device rather than the computer speaker.

(TVD-4) Where possible, provide options for authors and users to deal with the overflow case: continue reading, stop reading, and pause the video. cue format, rendering, user interface UX I think this issue is actually conflating a number of concerns. In an extended description, e.g.
<p begin="1s" dur="1t" role="x-extended-description>This description pauses the video"</p>

The video is paused, and so the text duration is minimal, since only its onset is of interest; there can be no overflow here. If the descriptions are not extended, then in order to maintain synchronisation an explicit duration on the cue needs to be specified, and the speech engine needs to fit the allotted duration. An author controls overflow by not having overlapping active cues; if cues do overlap, it is up to the TTS engine settings whether the audio is pre-empted, continues to completion, or is overlapped.

WebSRT does not support specification of a means to tell the player to stop the video and wait for the end of the speech synthesis. It could be done by making the end time controlled by the browser/screen reader, e.g.
 00:00:00.000 --> next
 This is a first description cue.

 00:00:10.000 --> finish
 This is a second description cue.

 00:00:30.000 --> 00:00:40.000
 This is a third description cue.

"next" could signify to stop reading when the next cue gets active if not finished reading it, "finish" could signify to pause the video until the screen reader is finished. When an end time is given, it would also stop there because it can be assumed that something in the main audio track is happening there that is more important.

(TVD-5) Support the control over speech-synthesis playback speed, volume and voice, and provide synchronisation points with the video. user interface, audio rendering, speech synthesis, synchronization UX N/A (use embedded SSML) N/A (that is functionality of the screen reader)


Extended video descriptions
(EVD-1) Support detailed user control as specified in (TVD-4) for extended video descriptions. cue format, rendering, user interface, API UX N/A N/A (it's for browser settings)
(EVD-2) Support automatically pausing the video and main audio tracks in order to play a lengthy description. rendering, user interface, API UX See the example above for use of the extended description role. WebSRT does not currently specify at which cue the video has to pause and for how long.
An example of a dynamic end time is given in TVD-4.
(EVD-3) Support resuming playback of video and main audio tracks when the description is finished. rendering, API UX See the example above for use of the extended description role. WebSRT does not currently specify at which cue the video has to pause and for how long.
An example of a dynamic end time is given in TVD-4.
Clear audio
(CA-1) Support speech as a separate, alternative audio track from other sounds. synchronization, multitrack, API SPECNEW N/A N/A
(CA-2) Support the synchronisation of multitrack audio either within the same file or from separate files - preferably both. synchronization, multitrack, API, markup SPECCED, UX N/A N/A
(CA-3) Support separate volume control of the different audio tracks. user interface, API UX N/A N/A
(CA-4) Support pre-emphasis filters, pitch-shifting, and other audio-processing algorithms. audio rendering, API, user interface UX N/A N/A
Content navigation by content structure
(CN-1) Provide a means to structure media resources so that users can navigate them by semantic content structure, e.g. through adding a track to the video that contains navigation markers (in table-of-contents style). This means must allow authors to identify ancillary content structures. Support keeping all media representations synchronised when users navigate. Multitrack, synchronisation, api, markup, navigation, user interface. SPECCED (chapters) A TTML file in a track with kind="chapters" and timebase "clock" can be interpreted as a hierarchical set of navigation points, with the begin and end times indicating the indexes into the media, and the timed text providing the labels.
<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="en" ttp:timebase="clock"
xmlns="http://www.w3.org/ns/ttml"
xmlns:ttp="http://www.w3.org/ns/ttml#parameter">
  <body  role="x-nav-work" timeContainer='seq'>
    <div role="x-nav-section" timeContainer='seq'>
      <p role="x-nav-section" timeContainer='seq'>
        <span role="x-nav-section" dur="11.300s">Index point 1.1.1 </span>
        <span role="x-nav-section" dur="20.100s">Index point 1.1.2 </span>
        <span role="x-nav-section" dur="12.900s">Index point 1.1.3 </span>
        <span role="x-nav-section" dur="13.700s">Index point 1.1.4 </span>
      </p>
      <p role="x-nav-section" timeContainer='seq'>
        <span role="x-nav-section" dur="7.200s">Index point 1.2.1 </span>
        <span role="x-nav-section" dur="28.500s">Index point 1.2.2 </span>
        <span role="x-nav-section" dur="31.090s">Index point 1.2.3 </span>
        <span role="x-nav-section" dur="41.000s">Index point 1.2.4 </span>
      </p>
     </div>
    <div role="x-nav-section" timeContainer='seq'>
      <p role="x-nav-section" timeContainer='seq'>
        <span role="x-nav-section" dur="11.300s">Index point 2.1.1 </span>
        <span role="x-nav-section" dur="20.100s">Index point 2.1.2 </span>
        <span role="x-nav-section" dur="12.900s">Index point 2.1.3 </span>
        <span role="x-nav-section" dur="13.700s">Index point 2.1.4 </span>
      </p>
      <p role="x-nav-section" timeContainer='seq'>
        <span role="x-nav-section" dur="1.300s">Index point 2.2.1 </span>
        <span role="x-nav-section" dur="2.100s">Index point 2.2.2 </span>
        <span role="x-nav-section" dur="2.900s">Index point 2.2.3 </span>
        <span role="x-nav-section" dur="3.700s">Index point 2.2.4 </span>
      </p>
    </div>
  </body>
</tt>
WebSRT provides support for chapter tracks in conjunction with the <track> element.
<video>
  <track id="cn1" kind="chapters" srclang="en" src="cn1.wsrt">
</video>
chapter-1
00:00:00,000 --> 00:00:18,000
Introductory Titles

chapter-2
00:00:18,001 --> 00:01:10,000
The Jack Plugs

chapter-3
00:01:10,001 --> 00:02:30,000
Robotic Birds
(CN-2) The navigation track should provide for hierarchical structures with titles for the sections. cue format, multi track, synchronisation, api, markup, navigation, user interface. SPECNEW, CUEFMT See example above for hierarchical navigation. WebSRT provides support for chapter tracks whose cue text provides the titles.
See example in CN-1.
(CN-3) Support both global navigation by the larger structural elements of a media work, and also the most localized atomic structures of that work, even though authors may not have marked up all levels of navigational granularity. Multitrack, synchronisation, api, markup, navigation, user interface. UX See example above. WebSRT has no hierarchical navigation included (yet). An option would be to include the dependency or hierarchical level in the cue.
chapter-1
00:00:07,000 --> 00:00:18,000
Digital Media

chapter-1.1
00:00:07,001 --> 00:00:10,000
<parent>chapter-1
Digital Video

chapter-1.2
00:00:10,001 --> 00:00:18,000
<parent>chapter-1
Digital Audio

It may also be possible for the accessibility API to navigate automatically on words and sentences in cues as the minimal navigational granularity, which does not need extra markup.

(CN-4) Support third-party provided structural navigation markup. cue format, Multitrack, synchronisation, api, markup, navigation, user interface. SPECCED Track contents are referenced by URI, and can thus be provided by a third party. Third parties can provide chapter tracks through <track>.
<video>
  <track id="cn1" kind="chapters" srclang="en" src="http://othersite.com/cn4.wsrt">
</video>
(CN-5) Keep all content representations in sync, so that moving to any particular structural element in media content also moves to the corresponding point in all provided alternate media representations (captions, described video, transcripts, etc) associated with that work. Multitrack, synchronisation, api, markup, navigation, user interface. UX This is up to the video element, but note that TTML can be randomly indexed at any time and so will keep up with any such jumps. N/A (this is the task of the <video> element)
(CN-6) Support direct access to any structural element, possibly through URIs. Multitrack, api SPECNEW URIs with a fragment identifier can reference a specific xml:id in a chapter TTML file

e.g. http://www.example.com/navmenu.ttml#climax
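A hedged sketch of what such a fragment could target within the chapter TTML (the xml:id value and label are illustrative):

<div xml:id="climax" role="x-nav-section" timeContainer='seq'>
  <span role="x-nav-section" dur="42.000s">The climax </span>
</div>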

N/A (media fragment URIs provide this)
(CN-7) Support pausing primary content traversal to provide access to such ancillary content in line. Multitrack, synchronisation, api, user preferences, markup, user interface UX This is up to the video element, but note that TTML can be randomly indexed at any time and so will keep up with any such pauses. N/A (this is the task of the <video> element)
(CN-8) Support skipping of ancillary content in order to not interrupt content flow. Multitrack, synchronisation, api, user preferences, markup, user interface UX This is up to the video element, but note that TTML can be randomly indexed at any time and so will keep up with any such jumps. N/A (this is the task of the UI)
(CN-9) Support access to each ancillary content item, including with "next" and "previous" controls, apart from accessing the primary content of the title. Multitrack, synchronisation, api, markup, navigation, user interface UX This is up to the video element, but note that TTML can be randomly indexed at any time and so will keep up with any such jumps. N/A (this is the task of the UI)
(CN-10) Support that in bilingual texts both the original and translated texts can appear on screen, with both the original and translated text highlighted, line by line, in sync with the audio narration. cue format, in-band cues, multi track, synchronisation, api, rendering (video), internationalization, navigation, user preferences, markup, user interface UX This doesn't seem like a navigation requirement, but it can be dealt with by TTML files; see the example at TTML Use cases. <track> allows time-overlapping cues from different resources, which can be in different languages.
Captioning
(CC-1) render time-synchronized cues along the media timebase cue format, in-band cues, synchronisation, API, rendering, user preferences UX TTML can use media time, and can specify cues at frame or user-defined tick intervals, or in absolute time. WebSRT synchronizes cues along the media timeline.
<video>
  <track id="cc1" kind="captions" srclang="en" src="cc1.wsrt">
</video>
 00:00:00.000 --> 00:00:10.000
 This is a first caption cue from 0 to 10 sec.

 00:00:10.000 --> 00:00:20.000
 This is a second caption cue from 10 to 20 sec.
(CC-2) allow erasures, i.e. times when no text cues are active cue format, in-band cues CUEFMT example:

<p begin='1s' dur='1s'>c1 </p> <p begin='3s' dur='1s'>c2 </p>

Regions have a property to determine whether the background should remain visible

WebSRT supports times without cues.
 00:00:00.000 --> 00:00:10.000
 This is a first caption cue from 0 to 10 sec.

 00:00:15.000 --> 00:00:20.000
 This is a second caption cue from 15 to 20 sec.
There is no caption between 10 and 15 sec.
(CC-3) allow gap-less cues cue format CUEFMT, UX Example:

<div timeContainer='seq'>

<p dur='1s'>c1 </p>

<p dur='1s'>c2 </p>

</div>"

WebSRT supports gap-less cues.

See example in CC-1.

(CC-4) specify a character encoding cue format, internationalisation CUEFMT TTML is XML and, in addition to the encodings XML mandates, can specify any IANA-registered character encoding. WebSRT supports only UTF-8 per specification.
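For example, a TTML file's character encoding can be declared in its XML prolog (ISO-8859-1 chosen purely for illustration):

<?xml version="1.0" encoding="ISO-8859-1"?>
<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml">
  ...
</tt>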
(CC-5) positioning on all parts of the screen, inside and outside the video viewport cue format, rendering, user preferences SPECNEW, CUEFMT region supports origin and extent attributes. These can be logical e.g. in terms of the video frame (and values greater than 100% and less than 0% are supported for off-video positions), or in video-pixel precise measures. WebSRT supports positioning in relation to a reference viewing area, which could be defined anywhere. The <track> element is, however, only defined for the video viewport, which is a problem for both TTML and WebSRT.

Example positioning inside the video viewport:

 00:00:00.000 --> 00:00:15.000 A:start S:100% L:100%
 This first caption is positioned at the top left corner.

 00:00:15.000 --> 00:00:20.000 A:end S:100% L:0%
 This second caption is positioned at the bottom right corner.
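For the TTML side, a minimal region sketch using origin and extent, including an off-video position as mentioned above (the percentages are illustrative):

<layout xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <region xml:id="topLeft" tts:origin="0% 0%" tts:extent="50% 10%"/>
  <region xml:id="belowVideo" tts:origin="0% 105%" tts:extent="100% 10%"/>
</layout>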
(CC-6) display of multiple text cues at the same time cue format, rendering, user preferences UX TTML supports multiple regions, and multiple cues within a region.
  <div region='Area1'>
    <p begin='1s' dur='2s'>text one</p>
    <p begin='3s' dur='2s'>text two, will append to text one for 1 second</p>
  </div>
  <div region='Area2'>
    <p dur='5s'>text in a completely other area which stays visible the whole time</p>
  </div>
WebSRT supports time-overlapping cues, which can each be positioned at different locations.

Example positioning inside the video viewport:

 00:00:00.000 --> 00:00:20.000
 This is the first caption displayed bottom centered.

 00:00:15.000 --> 00:00:20.000
 This second caption pushes the first up by a line
 since it overlaps in time.

 00:00:15.000 --> 00:00:20.000 A:middle T:50% L:50%
 This third caption appears with the second one, but centered on screen.
(CC-7) display of multiple text cues also in ltr or rtl languages cue format, rendering, internationalisation UX Writing direction is specifiable on a per-text-element basis
 <div s:writingMode="tb-lr" >
    <p>
      頭を<span m:role="x-ruby">
        股<span m:role="x-rubytext" >また</span>
      </span>に突つ込んで祈るわ<span s:writingMode="lr">ruby</span>
    </p>
  </div>

Note the use of tate-chuu-yoko (also called kumimoji and renmoji): a block of horizontal type laid out within a vertical type line, common in Asian texts.

WebSRT supports rendering directions for cue text.

Example vertical rendering:

 00:00:15,042 --> 00:00:18,042 A:start D:vertical L:98%
 <ruby>左<rt>ひだり</rt></ruby>に<ruby> 見<rt>み</rt></ruby>えるのは…

 00:00:18,750 --> 00:00:20,333 A:start D:vertical-lr L:98%
 <ruby>右<rt>みぎ</rt></ruby>に<ruby> 見<rt>み</rt></ruby>えるのは…

The first cue renders vertically right-to-left, the second left-to-right.

(CC-8) allow explicit line breaks cue format, rendering CUEFMT TTML supports the <br/> element and preserved whitespace, as well as the wrap and noWrap values of the wrapOption attribute.
<p xml:space="preserve"
>Example of a cue
split on three lines
with some    extra    whitespace
</p>
WebSRT line breaks in cues are explicit line breaks.
 00:00:00.000 --> 00:00:10.000
 This is a first caption cue
 broken in two lines.

 00:00:10.000 --> 00:00:20.000
 This is a
 second caption cue
 broken in three lines.
(CC-9) allow a range of font faces and sizes cue format, rendering, user preferences CUEFMT TTML supports the fontFamily and fontSize style properties.

This can be specified inline, making the caption file self-contained and under the control of the caption author; inline style is optional.

 <p xml:id="cc01" s:fontSize="14pt">Declared as fontsize 14pt</p>

But it can be overridden by the page author or a user stylesheet.

 <style>
  video::cue {
    font-size: larger!important;
  }
   video::cue#cc01 {
    color: red!important;
  }
</style>
WebSRT supports font styling through CSS - the default font is '0.1vh sans-serif'.
 <style>
  video track#cc9 ::cue {
    font: 15px arial,sans-serif;
  }
  video track#cc9 ::cue-part(1) {
    font: italic 12px Georgia, serif;
  }
 </style>
<video>
 <track id="cc9" label="caption1" kind="captions" srclang="en" src="cc9.wsrt">
</video>
 00:00:00.000 --> 00:00:10.000
 This cue is in arial 15px sans-serif.

 00:00:10.000 --> 00:00:20.000
 <1>This cue part</1> is in 12px Georgia, serif
(CC-10) allow background colors and background opacity cue format, rendering, user preferences CUEFMT TTML supports the backgroundColor style property on any element, which can be specified as an rgba value. It also supports an overall opacity on regions. This can be specified inline, making the caption file self-contained, but can be overridden by the page author or a user stylesheet. WebSRT defines a default cue background color and transparency of 'rgba(0,0,0,0.8)', which can be modified by CSS.
 <style>
  video track#cc10 ::cue#cue1 {
    background-color: red;
  }
  video track#cc10 ::cue#cue2 {
    background-color: rgba(255,255,0,0.5);
  }
 </style>
<video>
 <track id="cc10" label="caption1" kind="captions" srclang="en" src="cc10.wsrt">
</video>
 cue1
 00:00:00.000 --> 00:00:10.000
 This cue has a red background.

 cue2
 00:00:10.000 --> 00:00:20.000
 This cue has a semi-transparent yellow background.
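For comparison, a minimal TTML sketch of inline background colors with alpha (the #rrggbbaa values are illustrative):

<p begin="0s" end="10s" tts:backgroundColor="#ff0000ff">This cue has a red background.</p>
<p begin="10s" end="20s" tts:backgroundColor="#ffff0080">This cue has a semi-transparent yellow background.</p>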
(CC-11) allow text colors and opacity cue format, rendering, user preferences CUEFMT TTML supports the color style property on any element, which can be specified as an rgba value. This can be specified inline, making the caption file self-contained, but can be overridden by the page author or a user stylesheet. WebSRT uses CSS to style text - the default color and opacity is 'rgba(255,255,255,0)'.
 <style>
  video track#cc11 ::cue-part(1) {
    color: red;
  }
  video track#cc11 ::cue-part(2) {
    color: rgba(255,255,0,0.5);
  }
 </style>
<video>
 <track id="cc11" label="caption1" kind="captions" srclang="en" src="cc11.wsrt">
</video>
 00:00:00.000 --> 00:00:10.000
 <1>This cue text</1> is red.

 cue2
 00:00:10.000 --> 00:00:20.000
 <2>This cue text</2> is semi-transparent yellow.
(CC-12) allow thicker outline or a drop shadow on text cue format, rendering, user preferences UX TTML supports the textOutline style property, which has both a thickness and a blur radius. This can be specified inline, making the caption file self-contained, but can be overridden by the page author or a user stylesheet. WebSRT uses CSS to style text, including text outline and stroke.
 <style>
  video track#cc12 ::cue-part(1) {
    outline: yellow dotted thick;
  }
  video track#cc12 ::cue-part(2) {
    text-shadow: 2px 1px blue;
  }
 </style>
<video>
 <track id="cc12" label="caption1" kind="captions" srclang="en" src="cc12.wsrt">
</video>
 00:00:00.000 --> 00:00:10.000
 <1>This cue text</1> has a thick yellow dotted outline.

 cue2
 00:00:10.000 --> 00:00:20.000
 <2>This cue text</2> has a blue shadow.
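For comparison, a minimal TTML sketch of the textOutline property, written as color, thickness, and optional blur radius (the values are illustrative):

<p begin="0s" end="10s" tts:textOutline="yellow 2px">This cue text has a thick yellow outline.</p>
<p begin="10s" end="20s" tts:textOutline="blue 1px 2px">This cue text has a soft blue outline.</p>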
(CC-13) enable/disable continuation of background color on erasures cue format, rendering, user preferences UX region supports the showBackground style property, which controls whether region backgrounds remain present when no cue is active. This is animatable. By default, erasures don't show anything in WebSRT/<track>, but you could provide an empty cue and use CSS on it.
 00:00:00.000 --> 00:00:10.000
 This is the first cue.

 00:00:10.000 --> 00:00:15.000
  

 00:00:15.000 --> 00:00:20.000
 This is the second cue with a blank cue in between.

The blank cue has a background, but it remains unclear how to set the cue background box to a fixed size.
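For the TTML side, a minimal sketch of showBackground on a region; "always" keeps the region background visible during erasures, while "whenActive" hides it (the region geometry and color are illustrative):

<region xml:id="captionArea" tts:showBackground="always"
        tts:backgroundColor="#00000080"
        tts:origin="10% 80%" tts:extent="80% 15%"/>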

(CC-14) allow cue text rendering effects, e.g. paint-on, pop-on, roll-up, appear cue format, rendering, user preferences CUEFMT TTML timing can support these down to the character level if required; examples of each are given in the W3C TTML test suite.
<p dur="16.5s" s:wrapOption="noWrap">
  <span begin="0.5s">The </span>
  <span begin="1.3s">video </span>
  <span begin="2.1s">element </span>
  <span begin="2.9s">is </span>
  <span begin="3.7s">a </span>
  <span begin="4.5s">media </span>
  <span begin="5.3s">element </span>
  <span begin="6.1s">whose </span>
  <span begin="6.9s">media </span>
  <span begin="7.7s">data </span>
</p>
WebSRT specifies successive display and controls formatting via CSS.
 <style>
  video track#cc14 ::cue-part(future) {
    visibility: hidden;
  }
 </style>
<video>
 <track id="cc14" label="caption1" kind="captions" srclang="en" src="cc14.wsrt">
</video>
 cue1
 00:00:00.000 --> 00:00:20.000
 A <00:00:01.000>word <00:00:02.000>a <00:00:03.000>time <00:00:04.000>is <00:00:05.000>displayed.

 cue2
 00:00:06.000 --> 00:00:10.000
 Now the line just appears.

 cue3
 00:00:10.000 --> 00:00:20.000
 This <00:00:11.000>is <00:00:12.000>more <00:00:13.000>word <00:00:14.000>by <00:00:15.000>word <00:00:16.000>text.

Cue1 is timed and will display one word at a time at the given timestamps, which makes it roll-up mode. Cue2 just appears and is thus pop-on mode. Cue3 is added below cue1, replaces cue2, and also displays word by word. The same can be achieved for individual characters to make it paint-on.

The number of lines shown depends on the duration of the caption cues and the number of lines in each.

(CC-15) support bottom 1/12 rendering rule cue format, rendering, user preferences CUEFMT regions can be specified using a percentage of the video frame, and text alignment within regions can be to the after edge. WebSRT supports the bottom 1/12 rendering rule by default.
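A hedged TTML sketch under one reading of the rule, keeping captions within a region whose bottom edge sits one twelfth above the frame bottom (the percentages are illustrative):

<region xml:id="lowerArea" tts:origin="0% 66.66%" tts:extent="100% 25%" tts:displayAlign="after"/>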
(CC-16) support inserting left-to-right and right-to-left segments within a vertical run cue format, rendering, internationalisation CUEFMT, UX TTML supports xml:lang on any text element and is Unicode-based. WebSRT does not support mixed language inside a cue (yet), only per cue. In particular, mixed direction isn't supported. See CC-17.


(CC-17) support mixed language cue files cue format, rendering, internationalisation, API CUEFMT, UX TTML supports xml:lang on any text element, including the root (where it is required), and is Unicode-based. WebSRT does not (yet) support mixed language inside a cue. This would require the introduction of something like a <span> element or of lang and class attributes on the voice marker, e.g.
 00:00:10.000 --> 00:00:20.000
 <narrator>This is a piece <span lang="fr">en anglais</span>.

 00:00:20.000 --> 00:00:30.000
 <2 lang=fr>Bonjour mes amis.
(CC-18) support furigana, ruby and other common typographical conventions cue format, rendering, internationalisation CUEFMT

The semantics can be expressed using roles.

  <div region="r1" >
    <p>
      頭を<span ttm:role="x-ruby">
        股<span ttm:role="x-rubytext" tts:fontSize="50%">また</span>
      </span>に突つ込んで祈るわ
    </p>
  </div>

The presentation using CSS:

<style type="text/css">
  video::cue[m|role="x-ruby"]
  {
    display:ruby;
  }
  video::cue[m|role="x-rubytext"]
  {
    display:ruby-text;
  }
</style>
WebSRT supports <ruby> markup.

Example:

 00:00:15,042 --> 00:00:18,042 A:start D:vertical L:98%
 <ruby>左<rt>ひだり</rt></ruby>に<ruby> 見<rt>み</rt></ruby>えるのは…

 00:00:18,750 --> 00:00:20,333 A:start D:vertical L:98%
 <ruby>右<rt>みぎ</rt></ruby>に<ruby> 見<rt>み</rt></ruby>えるのは…
(CC-19) support full range of typographical glyphs, layout and punctuation marks cue format, rendering, internationalisation CUEFMT, UX TTML supports full Unicode and is font-based. WebSRT supports full Unicode.

See for example CC-18.

(CC-20) permit in-line mark-up for foreign words or phrases. cue format, rendering, internationalisation, speech synthesis CUEFMT TTML supports xml:lang on any text element and is Unicode-based (see the sketch after the WebSRT example below). WebSRT supports <b> and <i> markup for default caption rendering; as "metadata" it supports any other markup.

Example:

 00:00:00.000 --> 00:00:10.000
 This is <b>bold</b>.

 00:00:15.000 --> 00:00:20.000
 This is <i>italics</i>.
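For the TTML side, a minimal sketch of in-line foreign-phrase markup via xml:lang (the phrase is illustrative):

<p begin="0s" end="10s">This phrase is <span xml:lang="fr">en français</span>.</p>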
(CC-21) support semantic markup of different speakers cue format, rendering, speech synthesis CUEFMT, UX TTML has the <ttm:agent> element to describe different speaking agents and the ttm:agent attribute to refer to these. WebSRT supports <v> voice markup for speaker voices.

Example:

 00:00:00.000 --> 00:00:10.000
 <1>This is speaker one.

 00:00:15.000 --> 00:00:20.000
 <2>This is speaker two.

In the future, <v name> markup is planned to make this more useful.

(CC-22) support the same API for in-band and external cue formats cue format, API SPECCED TTML is in active use on the internet today for both in-band (MediaRoom, using MPEG-4) and out-of-band captioning (e.g. BBC iPlayer). WebSRT will easily fit into containers that already allow encapsulation of SRT, such as Ogg (through Ogg Kate) and Matroska/WebM. There is also software that can encode SRT into MPEG-4 already.
(CC-23) synchronized display of cue text and media data cue format, API, synchronisation UX TTML can use media time, and can specify cues at frame intervals or in absolute time (ms). WebSRT is synchronized to media time through the Web browser.
(CC-24) support user activation/deactivation of cue tracks API, synchronisation, user preferences UX N/A - this is a user agent requirement. N/A (issue for the UI and the <track> element)
(CC-25) support edited and verbatim caption alternatives API, synchronisation, user preferences SPECNEW, UX These can be provided as separate files, or as separate divs within one file. WebSRT allows both to be provided in separate files.

Example:

<video>
 <track id="cc25" label="caption" kind="captions" srclang="en" src="cc25.wsrt">
 <track id="cc25-ext" label="extended cc" kind="metadata" srclang="en" src="cc25-ext.wsrt">
</video>

Example extended caption:

 title-1
 00:00:00,000 --> 00:00:02,050
 <img src="http://people.xiph.org/~giles/2004/openweekend/theora-talk/xifish.png" alt="Xiph logo"/>
 <a href="http://www.xiph.org/about/" alt="Xiph about page">About <i>Xiph.org</i></a>

 title-2
 00:00:02,050 --> 00:00:05,450
 <img src="http://www.linuxcertified.com/images/redhat-logo.jpg" alt="RedHat logo"/>
 <a href="http://www.redhat.com/" alt="RedHat Website">Sponsored by <b>RedHat</b></a>

 title-3
 00:00:05,450 --> 00:00:07,450
 <a href="http://www.xiph.org/video/vid1.shtml" alt="Xiph video page">Original Publication</a>
 <a href="http://webchat.freenode.net/?channels=xiph" alt="Xiph irc">Chat with the creators of the video</a>
(CC-26) support several cue tracks in different languages API, synchronisation, user preferences SPECCED These can be provided as separate files, or as separate divs within one file. WebSRT allows these to be provided in separate files.

Example:

<video>
 <track id="cc26-en" label="Captions" kind="captions" srclang="en" src="cc26-en.wsrt">
 <track id="cc26-de" label="Gehoerlosen-Untertitel" kind="captions" srclang="de" src="cc26-de.wsrt">
</video>
(CC-27) support live captioning cue format, API, synchronisation, user preferences SPECCED TTML can be produced in real time (e.g. from 608 caption data). There are a variety of mechanisms for delivering data serially in real time using XML, although this is outside the scope of the TTML format specifically; it is in active use for in-band systems. Note that for out-of-band systems, the main issue is not production at the server, but rather transmitting the information in time to the playback engine. A typical mechanism for live streaming on the web is HTTP adaptive streaming, where an index file is used to point to a set of files on the server (which may be generated in real time); this approach can be used to point to the appropriate TTML files too, where each file maps to one AV segment. WebSRT allows live captioning easily through continued appending to the file. Also note that this is not strictly necessary, since the MutableTimedTrack API provides for such use cases.


Enhanced captions/subtitles
(ECC-1) support metadata markup of cue segments cue format CUEFMT TTML has the <metadata> element which can include arbitrary data. WebSRT could support microdata on <v>, <b> or <i> elements.
See TVD-2 for an example.
(ECC-2) support hyperlinking on cue segments cue format CUEFMT, UX Hyperlinks (e.g. in the HTML namespace) could be included in TTML as foreign namespace elements, as metadata, or using a role="x-hyperlink" and styling with CSS3 and XBL for semantics. Note that the link can have independent timing from the captions. WebSRT allows cues with any markup, which can include HTML markup with hyperlinks.
See CC-25 for an example.
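A hedged sketch of the foreign-namespace approach in TTML (the prefix binding and target URL are illustrative; rendering of foreign elements is up to the UA):

<p begin="0s" end="10s" xmlns:html="http://www.w3.org/1999/xhtml">
  More background at <html:a href="http://www.example.com/background">example.com</html:a>.
</p>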
(ECC-3) support extended cue times and overlap handling cue format, synchronisation, user preferences CUEFMT, UX TTML timing can be of arbitrary length. Cues can overlap in time in parallel time containers. WebSRT cues have arbitrary length and can overlap.
See CC-6 for an example.
(ECC-4) support pausing on extended cue times or parallel display cue format, synchronisation, user preferences CUEFMT, UX Since TTML is based on media time, this requires the media time to be paused and restarted; this could be handled by a proposed HTML-specific attribute on cues which causes the player to pause the media. WebSRT specifies parallel display of overlapping cues. The synchronization with video would need to be solved in the player.
See TVD-4 for an example.
(ECC-5) allow users to specify their reading speed to deal with extended cues cue format, synchronisation, user preferences UX If TTML is synced to the media, then the cues slow down whenever the media does, and if the media is paused the cues remain indefinitely. TTML could be synced to another external clock if required, although the relationship between that clock and the media clock would need to be specified elsewhere. N/A (UI issue and preference settings)


Sign translation
(SL-1) Support sign-language video either as a track as part of a media resource or as an external file. multitrack, synchronisation, API, rendering SPECNEW, SPECCED N/A N/A
(SL-2) Support the synchronized playback of the sign-language video with the media resource. synchronisation SPECNEW, SPECCED N/A N/A
(SL-3) Support the display of sign-language video either as picture-in-picture or alpha-blended overlay, as parallel video, or as the main video with the original video as picture-in-picture or alpha-blended overlay. Parallel video here means two discrete videos playing in sync with each other. It is preferable to have one discrete <video> element contain all pieces for sync purposes rather than specifying multiple <video> elements intended to work in sync. user interface, rendering, user preferences, markup UX N/A N/A
(SL-4) Support multiple sign-language tracks in several sign languages. internationalisation SPECCED N/A N/A
(SL-5) Support the interactive activation/deactivation of a sign-language track by the user. user interface, user preferences UX N/A N/A
Transcripts
(T-1) Support the provisioning of a full text transcript for the media asset in a separate but linked resource, where the linkage is programmatically accessible to AT. linkage SPECCED TTML is capable of long-form documents, and of animating within them (e.g. to highlight the current word or sentence). While WebSRT can in theory do this, it would be better to provide this in HTML. It is always possible to extract the cues as HTML in JavaScript and write them out similarly to http://annodex.net/~silvia/a11y_bcp/.
(T-2) Support the provisioning of both scrolling and static display of a full text transcript with the media resource, e.g. in an area next to the video or underneath the video, which is also AT accessible. linkage, rendering, user interface UX This can be controlled using TTML timing, or left to the user agent. While WebSRT can in theory do this, it would be better to provide this in HTML. A transformation of WebSRT to HTML could provide for this. It is always possible to extract the cues as HTML in JavaScript and write them out similarly to http://annodex.net/~silvia/a11y_bcp/.


Access to interactive controls / menus
(KA-1) Support operation of all functionality via the keyboard on systems where a keyboard is (or can be) present, and where a unique focus object is employed. This does not forbid and should not discourage providing mouse input or other input methods in addition to keyboard operation. (UAAG 2.0 4.1.1) user interface (NOTE: This means that all interaction possibilities with media elements need to be keyboard accessible; e.g., through being able to tab onto the play, pause, mute buttons, and to move the playback position from the keyboard.) UX N/A N/A
(KA-2) Support a rich set of native controls for media operation, including but not limited to play, pause, stop, jump to beginning, jump to end, scale player size (up to full screen), adjust volume, mute, captions on/off, descriptions on/off, selection of audio language, selection of caption language, selection of audio description language, location of captions, size of captions, video contrast/brightness, playback rate, content navigation on same level (next/prev) and between levels (up/down) etc. This is also a particularly important requirement on mobile devices or devices without a keyboard. user interface, user preferences, API (NOTE: This means that the @controls content attribute needs to provide an extended set of control functionality including functionality for accessibility users.) UX N/A N/A (some of these are related, but UI functionality)
(KA-3) All functionality available to native controls must also be available to scripted controls. The author would be able to choose any/all of the controls, skin them and position them. API (NOTE: This means that new IDL attributes need to be added to the media elements for the extra controls that are accessibility related.) NO N/A N/A
(KA-4) It must always be possible to enable native controls regardless of the author preference to guarantee that such functionality is available and essentially override author settings through user control. This is also a particularly important requirement on mobile devices or devices without a keyboard. user interface, linkage (NOTE: This could be enabled through a context menu, which is keyboard accessible and its keyboard access cannot be turned off.) UX N/A N/A
(KA-5) The scripted and native controls must go through the same platform-level accessibility framework (where it exists), so that a user presented with the scripted version is not shut out from some expected behaviour. API, linkage (NOTE: This is below the level of HTML and means that the accessibility platform needs to be extended to allow access to these controls. ) NO N/A N/A
Granularity level control for structural navigation
(CNS-1) All identified structures, including ancillary content as defined in "Content Navigation" above, must be accessible with the use of "next" and "previous," as refined by the granularity control. multitrack, synchronization, api, navigation, markup, user interface UX N/A N/A (UI matter)
(CNS-2) Users must be able to discover, skip, play-in-line, or directly access ancillary content structures. multitrack, synchronization, api, navigation, markup, user interface UX N/A N/A (UI matter)
(CNS-3) Users need to be able to access the granularity control using any input mode, e.g. keyboard, speech, pointer, etc. user interface UX N/A N/A (UI matter)
(CNS-4) Producers and authors may optionally provide additional access options to identified structures, such as direct access to any node in a table of contents. multitrack, synchronization, markup, navigation, user interface SPECCED N/A N/A (UI matter)
Time-scale modification
(TSM-1) The user can adjust the playback rate of the time-based media tracks to between 50% and 250% of real time. user interface, user preference UX N/A N/A (UI matter)
(TSM-2) Speech whose playback rate has been adjusted by the user maintains pitch in order to limit degradation of the speech quality. user interface UX N/A N/A (UI matter)
(TSM-3) All provided alternative media tracks remain synchronized across this required range of playback rates. synchronisation UX N/A N/A (<video> element matter)
(TSM-4) The user agent provides a function that resets the playback rate to normal (100%). user interface UX N/A N/A (UI matter)
(TSM-5) The user can stop, pause, and resume rendered audio and animation content (including video and animated images) that last three or more seconds at their default playback rate. (UAAG 2.0 4.9.6) user interface UX N/A N/A (UI matter)
Production practice and resulting requirements
(PP-1) Support existing production practice for alternative content resources, in particular allow for the association of separate alternative content resources to media resources. Browsers cannot support all forms of time-stamp formats out there, just as they cannot support all forms of image formats (etc.). This necessitates a clear and unambiguous declared format, so that existing authoring tools can be configured to export finished files in the required format. synchronisation, cue format NO N/A N/A (<video> element matter)
(PP-2) Support the association of authoring and rights metadata with alternative content resources, including copyright and usage information. cue format, multitrack CUEFMT N/A WebSRT does not (yet) support file-wide metadata.
See DV-14 for an example.
(PP-3) Support the simple replacement of alternative content resources even after publishing. This is again dependent on authoring practice - if the content creator delivers a final media file that contains related accessibility content inside the media wrapper (for example an MP4 file), then it will require an appropriate third-party authoring tool to make changes to that file - it cannot be demanded of the browser to do so. multitrack, cue format NO N/A N/A (Web server practice)
(PP-4) Typically, alternative content resources are created by different entities to the ones that create the media content. They may even be in different countries and not be allowed to re-publish the other one's content. It is important to be able to host these resources separately, associate them together through the Web page author, and eventually play them back synchronously to the user. synchronisation SPECCED N/A N/A (Web server and <video> element matter)
Discovery and activation/deactivation of available alternative content by the user
(DAC-1) (part a) (a)The user has the ability to have indicators rendered along with rendered elements that have alternative content (e.g., visual icons rendered in proximity of content which has short text alternatives, long descriptions, or captions). user interface, linkage UX N/A N/A (UI matter)
(DAC-1) (part b) (b) In cases where the alternative content has different dimensions than the original content, the user has the option to specify how the layout/reflow of the document should be handled. (UAAG 2.0 3.1.1). user interface, linkage UX N/A N/A (UI matter)
(DAC-2) The user has a global option to specify which types of alternative content by default and, in cases where the alternative content has different dimensions than the original content, how the layout/reflow of the document should be handled. (UAAG 2.0 3.1.2). rendering, audio rendering, user interface (Note: Media queries have been proposed as a way of meeting this need, along with the use of CSS for layout.) SPECNEW N/A N/A (UI matter)
(DAC-3) The user can browse the alternatives and switch between them. user interface, navigation UX N/A N/A (UI matter)
(DAC-4) Synchronized alternatives for time-based media (e.g., captions, descriptions, sign language) can be rendered at the same time as their associated audio tracks and visual tracks (UAAG 2.0 3.1.3). synchronisation, multitrack SPECCED N/A N/A (UI matter)
(DAC-5) Non-synchronized alternatives (e.g., short text alternatives, long descriptions) can be rendered as replacements for the original rendered content (UAAG 2.0 3.1.3). linkage SPECCED N/A N/A (UI matter)
(DAC-6) Provide the user with the global option to configure a cascade of types of alternatives to render by default, in case a preferred alternative content type is unavailable (UAAG 2.0 3.1.4). user preferences UX N/A N/A (UI matter)
(DAC-7) During time-based media playback, the user can determine which tracks are available and select or deselect tracks. These selections may override global default settings for captions, descriptions, etc. (UAAG 2.0 4.9.8) user interface, user preferences UX N/A N/A (UI matter)
(DAC-8) Provide the user with the option to load time-based media content such that the first frame is displayed (if video), but the content is not played until explicit user request. (UAAG 2.0 4.9.2) user interface, (autostart) UX N/A N/A
Requirements on making properties available to the accessibility interface
(API-1) The existence of alternative-content tracks for a media resource must be exposed to the user agent. user interface SPECCED N/A N/A (UI matter)
(API-2) Since authors will need access to the alternative content tracks, the structure needs to be exposed to authors as well, which requires a dynamic interface. API SPECCED N/A N/A (UI matter)
(API-3) Accessibility APIs need to gain access to alternative content tracks no matter whether those content tracks come from within a resource or are combined through markup on the page. multitrack, synchronisation, API, linkage SPECCED N/A N/A (UI matter)
Requirements on the use of the viewport
(VP-1) It must be possible to deal with three different cases for the relation between the viewport size, the position of media and of alternative content:
  1. the alternative content's extent is specified in relation to the media viewport (e.g., picture-in-picture video, lower-third captions)
  2. the alternative content has its own independent extent, but is positioned in relation to the media viewport (e.g., captions above the audio, sign-language video above the audio, navigation points below the controls)
  3. the alternative content has its own independent extent and doesn't need to be rendered in any relation to the media viewport (e.g., text transcripts)

If alternative content has a different height or width than the media content, then the user agent will reflow the (HTML) viewport. (UAAG 2.0 3.1.4).

rendering, user interface, linkage (NOTE: This may create a need to provide an author hint to the Web page when embedding alternate content in order to instruct the Web page how to render the content: to scale with the media resource, scale independently, or provide a position hint in relation to the media. On small devices where the video takes up the full viewport, only limited rendering choices may be possible, such that the UA may need to override author preferences.) UX N/A WebSRT will render into a given viewport only.
(VP-2) The user can change the following characteristics of visually rendered text content, overriding those specified by the author or user-agent defaults (UAAG 2.0 3.6.1). Note: this should include captions and any text rendered in relation to media elements, so as to be able to magnify and simplify rendered text):
  1. text scale (i.e., the general size of text) ,
  2. font family, and
  3. text color (i.e., foreground and background).
rendering (NOTE: This should be achievable through UA configuration or even through something like a greasemonkey script or user CSS which can override styles dynamically in the browser.) UX N/A WebSRT allows for all these, see examples in the caption section above.
(VP-3) Provide the user with the ability to adjust the size of the time-based media up to the full height or width of the containing viewport, with the ability to preserve aspect ratio and to adjust the size of the playback viewport to avoid cropping, within the scaling limitations imposed by the media itself. (UAAG 2.0 4.9.9) rendering (NOTE: This can be achieved by simply zooming into the Web page, which will automatically rescale the layout and reflow the content.) UX N/A WebSRT allows for all these rendering mechanisms, see CC-5 for an example.
(VP-4) Provide the user with the ability to control the contrast and brightness of the content within the playback viewport. (UAAG 2.0 4.9.11) user interface (NOTE: This is a user-agent device requirement and should already be addressed in the UAAG. In live content, it may even be possible to adjust camera settings to achieve this requirement. It is also a "SHOULD" level requirement, since it does not account for limitations of various devices.) UX N/A N/A (user-agent device matter)
(VP-5) Captions and subtitles traditionally occupy the lower third of the video, where controls are also usually rendered. The user agent must avoid overlapping overlay content and controls on media resources. This must also happen if, for example, the controls are only visible on demand. rendering (NOTE: If there are several types of overlapping overlays, the controls should stay on the bottom edge of the viewport and the others should be moved above this area, all stacked above each other.) UX N/A N/A (<track> matter)
Requirements on the parallel use of alternate content on potentially multiple devices in parallel
(MD-1) Support a platform-accessibility architecture relevant to the operating environment. (UAAG 2.0 2.1.1) linkage UX N/A N/A
(MD-2) Ensure accessibility of all user-interface components including the user interface, rendered content, and alternative content; make available the name, role, state, value, and description via a platform-accessibility architecture. (UAAG 2.0 2.1.2) user interface, linkage UX N/A N/A
(MD-3) If a feature is not supported by the accessibility architecture(s), provide an equivalent feature that does support the accessibility architecture(s). Document the equivalent feature in the conformance claim. (UAAG 2.0 2.1.3)  ?? UX N/A N/A
(MD-4) If the user agent implements one or more DOMs, they must be made programmatically available to assistive technologies. (UAAG 2.0 2.1.4) This assumes the video element will write to the DOM. API UX N/A N/A
(MD-5) If the user can modify the state or value of a piece of content through the user interface (e.g., by checking a box or editing a text area), the same degree of write access is available programmatically (UAAG 2.0 2.1.5). API UX, SPECCED N/A N/A
(MD-6) If any of the following properties are supported by the accessibility-platform architecture, make the properties available to the accessibility-platform architecture (UAAG 2.0 2.1.6):
  1. the bounding dimensions and coordinates of rendered graphical objects;
  2. font family;
  3. font size;
  4. text foreground color;
  5. text background color;
  6. change state/value notifications.
rendering UX N/A N/A
(MD-7) Ensure that programmatic exchanges between APIs proceed at a rate such that users do not perceive a delay. (UAAG 2.0 2.1.7). API UX N/A N/A