MPTF/MPTF Discussions/TV services transport mapping

From Web and TV IG

Submitter: CableLabs


Tracker Issue ID: ISSUE-39

Description:

HTML5 supports media resources with multiple embedded media tracks[1] and text tracks[2]. For media tracks, the user agent creates AudioTrackList and VideoTrackList objects that contain the audio and video tracks in the media resource, but the HTML5 specification does not define how the user agent creates these objects from the media resource. For text tracks, a media-resource-specific text track can be created from data found in the media resource[3]; the HTML5 specification states “When a media resource contains data that the user agent recognizes and supports as being equivalent to a text track…”. Again, the HTML5 specification does not define how the user agent does this. Instead, [3] states: “Set the new text track's kind, label, and language based on the semantics of the relevant data, as defined by the relevant specification” (emphasis by the author).
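As a rough illustration of what a page sees once the user agent has created these objects, the sketch below lists the kind, label, and language of every audio, video, and text track. The interfaces `TrackInfo`, `TrackListLike`, and `MediaElementLike` and the helper `describeTracks` are assumptions made for this example, not names from the HTML5 specification; they mirror only the `audioTracks`, `videoTracks`, and `textTracks` attributes of a media element so that the code also runs outside a browser.

```typescript
// Minimal structural stand-ins for the HTML5 track objects. Only the
// attributes used below are modelled; the names are hypothetical.
interface TrackInfo {
  kind: string;
  label: string;
  language: string;
}

interface TrackListLike {
  length: number;
  [index: number]: TrackInfo;
}

interface MediaElementLike {
  audioTracks: TrackListLike;
  videoTracks: TrackListLike;
  textTracks: TrackListLike;
}

// Hypothetical helper: returns one descriptive line per track, in the
// order audio, video, text.
function describeTracks(media: MediaElementLike): string[] {
  const lists: Array<[string, TrackListLike]> = [
    ["audio", media.audioTracks],
    ["video", media.videoTracks],
    ["text", media.textTracks],
  ];
  const lines: string[] = [];
  for (const [type, list] of lists) {
    for (let i = 0; i < list.length; i++) {
      const t = list[i];
      lines.push(`${type}[${i}] kind=${t.kind} label=${t.label} language=${t.language}`);
    }
  }
  return lines;
}
```

Which tracks appear in these lists, and with what kind, label, and language, is exactly the part that HTML5 leaves to the user agent and to the "relevant specification".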

Motivation:

If multiple media tracks and text tracks are to be recognized by user agents in a standard manner, specifications need to be created that define how this is done. Various media delivery formats are in use today, e.g. WebM[4], Ogg[5], MPEG-2 transport stream[6], MPEG-4 file format[7], HTTP Live Streaming[8] and Smooth Streaming[9]. New delivery format specifications are nearing completion, e.g. MPEG DASH[10]. How tracks are recognized by user agents can be expected to differ by delivery format.

Dependencies:

The set of dependencies depends on which types of in-band track information user agents (UAs) are to recognize and which media transport formats they support. Table 1 shows an example set of track data (rows) that might be carried in various media formats (columns).

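Table 1: TV Services and Media Transport

Track data             | MPEG2 TS | MPEG4 ISOBMFF | DASH MPD | Other Fragmented File Formats
-----------------------|----------|---------------|----------|------------------------------
ETV triggers[11]       |          |               |          |
Ad insertion[12]       |          |               |          |
Content advisories[13] |          |               |          |
Secondary audio[14]    |          |               |          |
Audio descriptions[14] |          |               |          |
Closed captions[15]    |          |               |          |
Subtitles[15]          |          |               |          |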


Each row represents a service whose audio or application data is carried in the media resource and needs to be exposed to the web application through the HTML5 track or media elements. Each column represents a type of media transport that could be relevant. In addition to MPEG2 TS, we anticipate the development of an in-band mechanism for the MPEG4 fragmented file format (used in DASH). There are also specifications at various stages of development for out-of-band track types referenced in the DASH MPD and in other manifest file formats, such as HTTP Live Streaming.

What needs to be standardized:

Each cell in the table needs a specification that will define:

  1. How the audio/application data is carried and found in the media resource
  2. What formats should be recognized
  3. How the information in the tracks gets mapped to the appropriate HTML5 API/DOM objects
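The three items above can be pictured as one lookup entry per table cell. The following is a minimal sketch under stated assumptions, not a proposed design: the types `InBandData`, `CellMapping`, and `TrackAttributes`, the function `mapToTrack`, and the format identifiers are all hypothetical, and the choice of `kind` for each row is illustrative only. Item 1 (locating the data) is assumed to have been done already by whatever produced the `InBandData` record; the sketch covers items 2 and 3.

```typescript
// Which HTML5 track list the data ends up in.
type TrackFamily = "audio" | "video" | "text";

// Hypothetical record of data the user agent found in a media resource.
interface InBandData {
  transport: string; // a column of Table 1, e.g. "MPEG2 TS"
  service: string;   // a row of Table 1, e.g. "Closed captions"
  format: string;    // placeholder format identifier
  label: string;
  language: string;
}

// Attributes exposed on the resulting HTML5 track object.
interface TrackAttributes {
  family: TrackFamily;
  kind: string;
  label: string;
  language: string;
}

// One cell of Table 1: the recognized formats (item 2) and the HTML5
// mapping (item 3). The values used below are illustrative assumptions.
interface CellMapping {
  transport: string;
  service: string;
  recognizedFormats: string[];
  family: TrackFamily;
  kind: string;
}

const exampleMappings: CellMapping[] = [
  { transport: "MPEG2 TS", service: "Closed captions",
    recognizedFormats: ["example-caption-format"], family: "text", kind: "captions" },
  { transport: "MPEG2 TS", service: "Audio descriptions",
    recognizedFormats: ["example-audio-format"], family: "audio", kind: "descriptions" },
];

// Returns the track attributes for recognized data, or null when no cell
// covers the transport/service pair or the format is not recognized.
function mapToTrack(data: InBandData, table: CellMapping[]): TrackAttributes | null {
  for (const cell of table) {
    const sameCell = cell.transport === data.transport && cell.service === data.service;
    if (sameCell && cell.recognizedFormats.indexOf(data.format) !== -1) {
      return { family: cell.family, kind: cell.kind, label: data.label, language: data.language };
    }
  }
  return null;
}
```

Returning `null` for unrecognized data reflects the HTML5 wording that a track is created only for data the user agent "recognizes and supports"; a real specification for each cell would replace the placeholder identifiers with the signalling defined by the transport format.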