This Wiki page is edited by participants of the HTML Accessibility Task Force. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Task Force participants, WAI, or W3C. It may also have some very useful information.

Media TextTrack Issues

From HTML accessibility task force Wiki

The TextTrack interface and the <track> element as present in the current HTML5 specification are fairly well developed. However, there are some issues that we continue to come across and that will need discussion and feedback. This wiki page collects these issues so that a good proposal can be made.


Support of multiple formats

Multiple discussions are currently happening about supporting more than one file format in <track> elements.

Formats that people would like to see supported natively in browsers include: SRT (SubRip), WebVTT, TTML (formerly DFXP), SUB (SubViewer), SSA (SubStation Alpha), and YouTube XML.

The requirement for supporting more than one format comes from the drive to re-use existing content without transcoding. It seems that some browser vendors are open to this idea and are indeed planning to implement support for more than one format beyond WebVTT as the baseline.

The proposal is therefore to implement the same mechanism for <track> elements as we have for <audio> and <video> elements: namely to use the <source> element for format selection.

Example:

 <video controls>
   <source src="video.webm" type="video/webm">
   <source src="video.mp4" type="video/mp4">

   <track label="English Captions" kind="captions" srclang="en-US">
     <source src="cap_en.ttml" type="text/ttml">
     <source src="cap_en.srt" type="text/srt">
     <source src="cap_en.vtt" type="text/vtt">
   </track>
 </video>


The advantages of introducing such a source selection on track are:

  • it is possible to provide alternative formats for a single track, i.e. one (label, kind, srclang) combination
  • we do not have to implement content sniffing on the @src attribute of track when several formats are supported (note that content sniffing can become costly, in particular in a mobile environment and when many languages are supported)
  • it is possible to extend canPlayType to also query supported text track formats
  • it is easy for authors to re-use existing content without needing to transcode it all

The disadvantages are:

  • need to implement a source selection algorithm for text tracks, too
  • browsers may eventually have to support many text track formats
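If canPlayType were extended as suggested, authors could feature-detect text track format support from script before emitting markup. A minimal sketch, assuming such an extension exists (it does not today); supportedTrackFormats is an illustrative helper, not a platform API:

```javascript
// Hypothetical sketch: assumes canPlayType() has been extended to
// answer for text track MIME types, as proposed above.
function supportedTrackFormats(canPlayType, candidates) {
  // canPlayType returns "", "maybe" or "probably"
  return candidates.filter(function (type) {
    return canPlayType(type) !== '';
  });
}

// In a browser this would be called as e.g.:
//   supportedTrackFormats(video.canPlayType.bind(video),
//                         ['text/vtt', 'text/srt', 'text/ttml']);
```

An author could then generate <source> children for the <track> only in the formats the browser reports it can handle.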


Positioning of track cues

Currently, it is possible to style and position the cue text content through the ::cue pseudo-element.

However, the automatic rendering location of the cues is on top of the video and it is not possible to change the positioning via CSS.

The proposal is therefore to also introduce a ::track pseudo-element that allows moving the rendering box of the cues to another location on the page, no matter whether the track came from in-band or from an external resource.

It is as yet unclear how the rendering would work, e.g. whether a "display: block" would be necessary to move the rendering box off the video viewport. This needs to be clarified.
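As a sketch of how this might look to an author, assuming the proposed ::track pseudo-element were adopted (it is a proposal only; the accepted property set is undecided, and only the ::cue part reflects existing spec text):

```css
/* Styling cue text content is already possible via ::cue */
video::cue {
  color: yellow;
  background: rgba(0, 0, 0, 0.8);
}

/* Hypothetical: the proposed ::track pseudo-element would move the
   cue rendering box off the video viewport, e.g. below the video. */
video::track {
  display: block;   /* possibly required, as discussed above */
  position: absolute;
  top: 100%;        /* render underneath the video */
  width: 100%;
}
```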


Remove use of Array

The HTMLMediaElement exposes the textTracks as an Array. Arrays are not used in this manner anywhere else in the HTML specification; rather, similar content is exposed as a collection.

For consistency, the proposal is to change:

readonly attribute TextTrack[] textTracks;

to:

readonly attribute TextTrackCollection textTracks;

and add:

interface TextTrackCollection : HTMLCollection {
 // inherits length and item()
 caller getter TextTrack namedItem(in DOMString name); // overrides inherited namedItem()
};
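A sketch of the lookup semantics that namedItem() would provide, written here as a standalone function over an array-like list (hypothetical: the TextTrackCollection proposed above is not implemented anywhere yet):

```javascript
// Emulates the proposed TextTrackCollection.namedItem(): look a
// track up by its label, returning null when there is no match.
function namedItem(tracks, name) {
  for (var i = 0; i < tracks.length; i++) {
    if (tracks[i].label === name) return tracks[i];
  }
  return null;
}

// With a TextTrackCollection an author would instead write
//   video.textTracks.namedItem('English Captions')
// or, via the caller getter, video.textTracks('English Captions').
```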


More consistent attributes with HTMLMediaElement

Text tracks have the following IDL:

  interface HTMLTrackElement : HTMLElement {
             attribute DOMString kind;
             attribute DOMString src;
             attribute DOMString srclang;
             attribute DOMString label;
             attribute boolean default;

    readonly attribute TextTrack track;
  };

  interface TextTrack {
    readonly attribute DOMString kind;
    readonly attribute DOMString label;
    readonly attribute DOMString language;

    const unsigned short NONE = 0;
    const unsigned short LOADING = 1;
    const unsigned short LOADED = 2;
    const unsigned short ERROR = 3;
    readonly attribute unsigned short readyState;
             attribute Function onload;
             attribute Function onerror;

    const unsigned short OFF = 0;
    const unsigned short HIDDEN = 1;
    const unsigned short SHOWING = 2;
             attribute unsigned short mode;

    readonly attribute TextTrackCueList cues;
    readonly attribute TextTrackCueList activeCues;

             attribute Function oncuechange;
  };


Some of the IDL attributes of the HTMLMediaElement are very close in meaning:

interface HTMLMediaElement : HTMLElement {

  // error state
  readonly attribute MediaError error;

  // network state
           attribute DOMString src;
  readonly attribute DOMString currentSrc;
  const unsigned short NETWORK_EMPTY = 0;
  const unsigned short NETWORK_IDLE = 1;
  const unsigned short NETWORK_LOADING = 2;
  const unsigned short NETWORK_NO_SOURCE = 3;
  readonly attribute unsigned short networkState;
           attribute DOMString preload;
  readonly attribute TimeRanges buffered;
  void load();
  DOMString canPlayType(in DOMString type);

  // ready state
  const unsigned short HAVE_NOTHING = 0;
  const unsigned short HAVE_METADATA = 1;
  const unsigned short HAVE_CURRENT_DATA = 2;
  const unsigned short HAVE_FUTURE_DATA = 3;
  const unsigned short HAVE_ENOUGH_DATA = 4;
  readonly attribute unsigned short readyState;
  readonly attribute boolean seeking;

  // playback state
  [...]
};

It makes sense to harmonize some of these attributes.

Also, text tracks currently require a complete download (or failure) before processing proceeds. In order to support live-generated streams of captions being synchronised in a multi-track presentation, the network handling of text should be unified with that of binary media tracks.


For example, an HTMLNetworkSource could be introduced as the network-connected object that actually fetches the bits. It would then be re-used by both the HTMLMediaElement and the TextTrack:

interface HTMLNetworkSource {
  // error state
  readonly attribute MediaError error;

  // network state
  const unsigned short NETWORK_EMPTY = 0;
  const unsigned short NETWORK_IDLE = 1;
  const unsigned short NETWORK_LOADING = 2;
  const unsigned short NETWORK_NO_SOURCE = 3;
  readonly attribute unsigned short networkState;
           attribute DOMString preload;
  readonly attribute TimeRanges buffered;
  void load();

  // ready state
  const unsigned short HAVE_NOTHING = 0;
  const unsigned short HAVE_METADATA = 1;
  const unsigned short HAVE_CURRENT_DATA = 2;
  const unsigned short HAVE_FUTURE_DATA = 3;
  const unsigned short HAVE_ENOUGH_DATA = 4;
  readonly attribute unsigned short readyState;
};
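One payoff of the unification can be sketched in script: because HTMLMediaElement and TextTrack would share the same readyState vocabulary, a single helper could serve both kinds of objects. (The shared interface above is a proposal only; the constant below mirrors the existing HAVE_METADATA value.)

```javascript
// HAVE_METADATA as defined for HTMLMediaElement; under the proposal
// the same value would apply to TextTrack objects as well.
var HAVE_METADATA = 1;

// Works identically on a media element or a text track, since both
// would expose the same HTMLNetworkSource-style readyState.
function hasMetadata(source) {
  return source.readyState >= HAVE_METADATA;
}
```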


OnChange event on TextTrack

Currently, the only event that the TextTrack itself raises is oncuechange:

           attribute Function oncuechange;

There are multiple problems with that:

  • most authors will want to do something different when a cue ends than when a cue starts, so having to mix both behaviours into one event is not clean
  • there is no indication of which cue has changed in this event, so the author has to maintain a copy of the previous state of activeCues and determine the difference to find out which cues have actually changed
  • this event is both raised on the cues as well as the track for a new/finished cue, which might lead to double handling
  • it might be nicer to just mirror the onenter and onleave events of the cues into the track
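The bookkeeping an author currently has to do illustrates the problem: keep a copy of the previous activeCues list and diff it on every cuechange to recover which cues entered and exited. A sketch (cues are compared by identity; the event wiring is shown only in comments):

```javascript
// Workaround today: diff activeCues on every cuechange to recover
// per-cue enter/exit information that the event does not carry.
var previousCues = [];

function diffActiveCues(activeCues) {
  var current = Array.prototype.slice.call(activeCues);
  var entered = current.filter(function (c) {
    return previousCues.indexOf(c) === -1;
  });
  var exited = previousCues.filter(function (c) {
    return current.indexOf(c) === -1;
  });
  previousCues = current;
  return { entered: entered, exited: exited };
}

// track.oncuechange = function () {
//   var changes = diffActiveCues(this.activeCues);
//   // handle changes.entered and changes.exited separately
// };
```

Mirroring the cues' onenter and onexit events into the track, as suggested above, would make this state-keeping unnecessary.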


CueList change events on TextTrack

The oncuechange event of the TextTrack tracks the changes of the activeCues list.

However, particularly with in-band tracks, live streams, and mutableTextTracks, the list of cues itself can change during the course of playback of the media resource. In most media formats, text tracks are interleaved with the audio and video tracks, so the loadedmetadata event fires long before the cues list is complete.

Therefore, the proposal is to introduce another new event to track this:

           attribute Function oncuelistchange;

This will allow an author who is handling everything in JavaScript to identify that new cues have been added and preprocess them, such as display in a growing transcript on screen even before they are being active, or pre-fetch images that they point to.
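A sketch of how an author might use the proposed event to grow a transcript. (oncuelistchange is a proposal only; the helper below assumes cues are only ever appended, as in a live stream, and appendToTranscript is a hypothetical application function.)

```javascript
// Hypothetical: track how far into the cues list we have read, so
// each firing of the proposed cuelistchange event yields only the
// newly added cue texts. Assumes cues are appended, never removed.
var seenCount = 0;

function newCueTexts(cues) {
  var texts = [];
  for (var i = seenCount; i < cues.length; i++) {
    texts.push(cues[i].text);
  }
  seenCount = cues.length;
  return texts;
}

// track.oncuelistchange = function () {
//   newCueTexts(this.cues).forEach(appendToTranscript);
// };
```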


Definition of loaded on TextTrack

The readiness state of a TextTrack can be either "not loading", "loading", "loaded", or "failed to load".

For an in-band or dynamically loading track resource, the state "loaded" may never occur. http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#text-track-loaded

However, the media resource fetching algorithm only allows a media resource to reach HAVE_METADATA and fire the loadedmetadata event once the text tracks are ready, i.e. all active tracks are either "loaded" or "failed to load". http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-resource

This has unexpected consequences, e.g. it prevents media resources with in-band tracks from playing until the full resource has been buffered.

Therefore, the proposal is to introduce another state, "metadataloaded", into the readiness state list of TextTracks. Similarly to media resources, it would be reached once the decoding pipeline for the text track has been set up (i.e. the file format has been identified and the headers loaded) and it is clear that the data can be decoded.

Alternatively, the readyState states of the HTMLMediaElement could just be adopted for TextTracks, too.
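If the proposed state were adopted, script could start attaching cue handlers as soon as a track's headers are parsed, rather than waiting for a full load. A sketch with hypothetical numeric values: METADATA_LOADED does not exist in the current spec, and inserting it would shift the existing LOADED and ERROR values up by one.

```javascript
// Hypothetical constants for the proposed readiness state list;
// only NONE and LOADING match the current spec values.
var NONE = 0, LOADING = 1, METADATA_LOADED = 2, LOADED = 3, ERROR = 4;

function canDecode(track) {
  // Decoding can begin once the headers are parsed, without the
  // whole resource having been fetched; an errored track cannot.
  return track.readyState >= METADATA_LOADED && track.readyState < ERROR;
}
```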


Support for multiple metadata track types

Timed text metadata tracks will be used to carry application signaling data for cable video content. There are multiple kinds of metadata tracks that need to be easily distinguishable from one another by script. For example, a video program may have one metadata track containing timed application data indicating available advertisement insertion points and another metadata track containing timed application data for synchronizing Web content with the video program. These examples of application data are expected to be XML documents but could possibly be of some other format. There needs to be a way to distinguish between @kind=metadata tracks based on the type of application encoding.

In the case of out-of-band Timed Text Tracks, support for multiple file formats is being discussed, e.g. WebVTT, TTML, etc. (see "Support of multiple formats" above). So metadata may also come in various file formats.

Example (taken from "Support of multiple formats"):

 <video controls>
   <source src="video.webm" type="video/webm">
   <source src="video.mp4" type="video/mp4">

   <track label="English Captions" kind="captions" srclang="en-US">
     <source src="cap_en.ttml" type="text/ttml">
     <source src="cap_en.srt" type="text/srt">
     <source src="cap_en.vtt" type="text/vtt">
   </track>
 </video>

Three alternatives have been identified for differentiating metadata tracks.


Alternative 1

Expand @kind values, as is being done for multiple video and audio tracks, to reflect common types of application data.

Example:

 <video controls>
   <source src="video.webm" type="video/webm">
   <source src="video.mp4" type="video/mp4">

   <track label="…" kind="metadata/adinsertion" srclang="…">
     <source src="metaadinsertion.ttml" type="text/ttml">
     <source src="metaadinsertion.vtt" type="text/vtt">
   </track>
   <track label="…" kind="metadata/syncwebcontent" srclang="…">
     <source src="metasyncwebcontent.ttml" type="text/ttml">
     <source src="metasyncwebcontent.vtt" type="text/vtt">
   </track>
 </video>

Advantages of this approach are:

  • Uses the existing @kind attribute.
  • Is similar in direction to what’s being done to distinguish multiple audio and video tracks.
  • Works with the current <track> element definition.

Disadvantages of this approach are:

  • Complicates the semantics of @kind.
  • The list gets too long if there are more than 3 or 4 applications, or multiple document types per application.
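Under Alternative 1, script would distinguish metadata tracks by matching against the expanded @kind values. A sketch (the "metadata/…" kind values are proposed, not specified):

```javascript
// Hypothetical: assumes Alternative 1's expanded @kind values,
// e.g. kind="metadata/adinsertion".
function metadataTracksOfKind(tracks, subKind) {
  var wanted = 'metadata/' + subKind;
  var result = [];
  for (var i = 0; i < tracks.length; i++) {
    if (tracks[i].kind === wanted) result.push(tracks[i]);
  }
  return result;
}
```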


Alternative 2

Use @label to reflect the type of metadata. One way to represent XML document types is “xml/schema-def”.

Example:

 <video controls>
   <source src="video.webm" type="video/webm">
   <source src="video.mp4" type="video/mp4">

   <track label="xml/scte35schemadef" kind="metadata" srclang="…">
     <source src="metascte35.ttml" type="text/ttml">
     <source src="metascte35.vtt" type="text/vtt">
   </track>
   <track label="xml/eissschemadef" kind="metadata" srclang="…">
     <source src="metaeiss.ttml" type="text/ttml">
     <source src="metaeiss.vtt" type="text/vtt">
   </track>
 </video>

Advantages of this approach are:

  • Uses the existing @label attribute.
  • Works with the current <track> element definition.

Disadvantages of this approach are:

  • Complicates the semantics for @label.


Alternative 3

Add a "metadatatype" parameter to the MIME types that can contain metadata (as was done with the "codecs" parameter for certain video file formats). One way to represent XML document types is metadatatype="xml/schema-def".

Example:

 <video controls>
   <source src="video.webm" type="video/webm">
   <source src="video.mp4" type="video/mp4">

   <track label="…" kind="metadata" srclang="…">
     <source src="metascte35.ttml" type="text/ttml; metadatatype=xml/scte35schemadef">
     <source src="metascte35.vtt" type="text/vtt; metadatatype=xml/scte35schemadef">
   </track>
   <track label="…" kind="metadata" srclang="…">
     <source src="metaeiss.ttml" type="text/ttml; metadatatype=xml/eissschemadef">
     <source src="metaeiss.vtt" type="text/vtt; metadatatype=xml/eissschemadef">
   </track>
 </video>

Advantages of this approach are:

  • Is consistent with the semantics of MIME types and MIME type parameters.
  • Easy to add parameter values for new data types.

Disadvantages of this approach are:

  • Requires IETF standardization.
  • Will not work with <track> as presently defined. This could be addressed by allowing a @type attribute on the <track> element.


In-band sourced tracks can be handled in exactly the same way for each of the alternatives, with the user agent generating the @kind, @label, or metadatatype parameter based on appropriate specifications for the in-band data format. In Alternative 3, the @type would be set to the MIME type of the media resource transport format.

Alternative 3 seems to be the best choice.
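Under Alternative 3, script would distinguish metadata tracks by parsing the proposed metadatatype parameter out of the MIME type string. A sketch (both the parameter and the @type attribute on <track> that it would require are hypothetical):

```javascript
// Hypothetical: extracts the proposed "metadatatype" MIME parameter,
// e.g. from 'text/vtt; metadatatype=xml/scte35schemadef'.
// Returns null when the parameter is absent.
function metadataType(mimeType) {
  var match = /;\s*metadatatype=([^;\s]+)/.exec(mimeType);
  return match ? match[1] : null;
}
```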