Mapping from MPEG-2 Transport to HTML5

Unofficial Draft 31 October 2011

Editor:
Bob Lund

Abstract

This specification defines a standard mechanism a user agent should use to expose the tracks in an MPEG-2 TS media container so that the tracks can be identified in a common way.

Status of This Document

This document is merely a public working draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organisation.

1. Introduction and Purpose

HTML5 user agents (UAs) [HTML5] may play back MPEG-2 TS media resources that contain a multiplex of video, audio, text and private data elementary streams. Television program providers and distributors use these streams to deliver services associated with the primary video and audio in the multiplexed stream. These services are collectively termed "TV Services". A consistent HTML presentation of these TV service tracks by UAs is essential in order for script to understand the specific type of service and interpret the track data, independent of the media resource provider. This specification defines requirements for how these MPEG-2 TS elementary streams should be translated by the HTML5 user agent into the equivalent HTML5 video, audio and text track elements.

Note that the Web page providing the user interface (e.g. program guide) is often not provided by the originator of the program content. For example, the guide may be provided by the television manufacturer or the cable or satellite TV provider, while the multiplexed streams are provided by hundreds of independent television program providers. Therefore, the Web page has no a priori knowledge of which streams are in the programs at any given time.

This specification defines the requirements for an HTML5 user agent to recognize and make the MPEG-2 TS program streams available to Web content in a consistent way that is independent of the program provider. Example TV Services are:

Closed Captioning: Textual representation of the media resource audio dialogue intended for the hearing impaired.
Subtitles: Alternate language textual representation of the media resource audio dialogue.
Content Advisories: Content rating information used by parental control applications.
Synchronized Content: Signaling messages to control the execution of a client application in a manner synchronized with the media resource playback.
Client ad insertion: Signaling messages that convey advertisement insertion opportunities to a client application.
Audio translations: Alternate language representation of the primary audio track.
Audio descriptions: Audio descriptions of the video intended for the visually impaired.

The requirements in this specification only apply to single program MPEG-2 transport streams; multi-program MPEG-2 transport streams are out of scope for this specification. Different specifications may define equivalent requirements for other media transport and container formats, MPEG-4 base media file format and MPEG DASH for example. The following sections define requirements for how the user agent must recognize MPEG-2 TS video, audio and other data tracks and how the HTML5 elements representing those tracks must be created.

2. Video, Audio and Text Track Creation

HTML5 VideoTrack, AudioTrack and TextTrack must be created as defined in [HTML5].

HTML5 VideoTrack, AudioTrack and TextTrack elements have additional attributes, beyond those referenced in this specification, that should be set by the user agent consistent with user preferences.

A user agent may be presented with previously processed in-band TextTracks, for example, when the viewer seeks back in the media resource, as controlled by the seekable time ranges attribute of the HTMLMediaElement. TextTrackCues are not removed from the TextTrack so the user agent must not create duplicate TextTrackCues in this case. How the user agent accomplishes this is implementation specific.

2.1 Track Description TextTrack

Recognition of specific types of video, audio and text tracks will, in general, be dependent on geographical region or service or content provider. In order that UA implementations are independent of region and provider, it is desirable that the UA recognize tracks in a generic manner and rely on a script to implement region, provider and application specific recognition of tracks. This requires script access to the MPEG program description in the program map table (PMT) as defined in [H.222.0] so that the tracks can be used correctly.

The UA must create a TextTrack in the media resource TextTrackList to make the PMT available to a script and set the TextTrack attributes using the following rules:

  1. kind = "metadata".
  2. label = "video/mp2t track-description", i.e. the string concatenation of the MPEG-2 TS MIME type [RFC3555] and "track-description".
  3. language = "" (empty string).
  4. mode = "HIDDEN".
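As an illustration of how a script might locate this track, the following is a minimal sketch rather than a normative algorithm. The `TrackLike` interface and the `findTrackDescription` function are hypothetical names introduced here; they model only the kind and label attributes set by the rules above.

```typescript
// Hypothetical structural type standing in for an HTML5 TextTrack object.
// Only the attributes used by this sketch are modelled.
interface TrackLike {
  kind: string;
  label: string;
  language: string;
}

// Label defined in this section for the track-description TextTrack.
const TRACK_DESCRIPTION_LABEL = "video/mp2t track-description";

// Returns the first metadata track carrying the track-description label,
// or null when the list contains no such track.
function findTrackDescription(tracks: ArrayLike<TrackLike>): TrackLike | null {
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track.kind === "metadata" && track.label === TRACK_DESCRIPTION_LABEL) {
      return track;
    }
  }
  return null;
}
```

In a page, the argument would typically be the text track list of the media element; any array-like collection of objects with these attributes works with the sketch.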

For each PMT received in the program stream, the UA must create a TextTrackCue only in the case where the PMT differs from the PMT represented by the previously created TextTrackCue. This is in recognition of the fact that the PMT is received at a minimum rate of every 140 msec but changes at a much lower rate.
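The change detection described above could be sketched as follows. This is one possible approach, assuming a plain byte-wise comparison of PMT sections; `samePmt` and `PmtChangeDetector` are hypothetical names, and a real UA may use a different test for whether the PMT has changed.

```typescript
// Byte-wise comparison of two PMT sections.
function samePmt(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Hypothetical helper that remembers the PMT behind the most recent cue.
class PmtChangeDetector {
  private last: Uint8Array | null = null;

  // Returns true when `section` differs from the previously accepted PMT,
  // which is the only case in which a new TextTrackCue is needed.
  accept(section: Uint8Array): boolean {
    if (this.last !== null && samePmt(this.last, section)) {
      return false;
    }
    this.last = new Uint8Array(section); // keep a private copy
    return true;
  }
}
```

With this shape, the repeated PMT transmissions map to repeated `accept` calls, and only the calls that return true lead to a new cue.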

For each new PMT, a UA must create a new TextTrackCue in the TextTrack as described in [HTML5] section "Text track model" with attributes set as follows:

  1. startTime is set to the current time in the media resource timeline.
  2. endTime = Infinity.
  3. Text of the cue is set to the PMT encoded in Base64 [BASE64].
  4. pauseOnExit=false.
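On the script side, the cue text has to be Base64-decoded and parsed before the tracks can be interpreted. The sketch below shows one way to do that. It assumes the cue text carries a complete PMT section starting at table_id, which this specification does not state explicitly, and it does not verify the CRC_32. The names `decodeBase64`, `parsePmt` and `PmtStream` are hypothetical; the field layout follows the program map section syntax in [H.222.0].

```typescript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decodes standard Base64 text; padding and other non-alphabet characters are skipped.
function decodeBase64(text: string): Uint8Array {
  const out: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < text.length; i++) {
    const value = B64.indexOf(text.charAt(i));
    if (value < 0) continue;
    buffer = ((buffer << 6) | value) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((buffer >> bits) & 0xff);
    }
  }
  return new Uint8Array(out);
}

interface PmtStream {
  streamType: number; // stream_type
  pid: number;        // elementary_PID
  language: string;   // first ISO_639_language_code, or "" if absent
}

// Parses the elementary stream loop of a PMT section (table_id 0x02).
function parsePmt(section: Uint8Array): PmtStream[] {
  if (section.length < 16 || section[0] !== 0x02) {
    throw new Error("not a PMT section");
  }
  const sectionLength = ((section[1] & 0x0f) << 8) | section[2];
  const end = 3 + sectionLength - 4; // stop before CRC_32
  if (end > section.length) throw new Error("truncated PMT section");
  const programInfoLength = ((section[10] & 0x0f) << 8) | section[11];
  const streams: PmtStream[] = [];
  let pos = 12 + programInfoLength;
  while (pos + 5 <= end) {
    const streamType = section[pos];
    const pid = ((section[pos + 1] & 0x1f) << 8) | section[pos + 2];
    const esInfoLength = ((section[pos + 3] & 0x0f) << 8) | section[pos + 4];
    const descEnd = Math.min(pos + 5 + esInfoLength, end);
    let language = "";
    let d = pos + 5;
    while (d + 2 <= descEnd) {
      const tag = section[d];
      const len = section[d + 1];
      // Descriptor tag 0x0A is the ISO 639 language descriptor.
      if (tag === 0x0a && len >= 3 && d + 5 <= descEnd && language === "") {
        language = String.fromCharCode(section[d + 2], section[d + 3], section[d + 4]);
      }
      d += 2 + len;
    }
    streams.push({ streamType: streamType, pid: pid, language: language });
    pos = descEnd;
  }
  return streams;
}
```

A script would call `parsePmt(decodeBase64(cueText))` on the text of the most recent cue in the track-description TextTrack and use the resulting PIDs to match entries against track labels.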

Other media container formats, Ogg and WebM for example, can contain multiple tracks along with metadata describing those tracks. The track-description text track would be useful for making that metadata available to script.

2.2 VideoTrack

For all MPEG-2 video stream types supported by the UA, the UA must create a new VideoTrack in the VideoTrackList of the media resource.

The HTML5 specification [HTML5] requires that the VideoTrackList must use the order defined by the media resource. VideoTracks must appear in the VideoTrackList in the same order as they appear in the PMT as defined in [H.222.0].

The UA must set the VideoTrack label attribute to a text string representing the packet ID (PID) of the equivalent MPEG-2 program stream.

NOTE: Using the label to carry the media resource track identifier works for MPEG-2 TS because there is no specified way for the label to be set by the media resource. Other container formats do provide a value to be used for the label. It would be preferable for a new IDL attribute to be defined for the express purpose of specifying the media resource track identifier.

The UA must set VideoTrackList[0].VideoTrack.kind = "main".

For all other VideoTrackList entries, the UA should set the VideoTrack kind attribute if it can determine the correct value.

The UA must set VideoTrack.language to the value of the ISO_639_language_code field [ISO639.2] in the ISO 639 descriptor, if present, associated with the video stream type in the PMT. If the UA cannot determine the VideoTrack kind and language attributes it must set them to the empty string.
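The VideoTrack rules in this section can be sketched as a mapping from PMT entries to track attributes. This is an illustration under stated assumptions, not a definitive implementation: the PID is rendered as a decimal string (this specification does not fix the string format), the UA is assumed to support stream types 0x01, 0x02 and 0x1B as video, and the UA is assumed unable to determine the kind of any track after the first. `EsEntry`, `VideoTrackAttrs` and `videoTrackAttributes` are hypothetical names.

```typescript
// One entry of the PMT elementary stream loop, in PMT order.
interface EsEntry {
  streamType: number;
  pid: number;
  language: string; // ISO_639_language_code, or "" when no descriptor is present
}

interface VideoTrackAttrs {
  label: string;
  kind: string;
  language: string;
}

// Assumption: the video stream types this hypothetical UA supports
// (0x01 MPEG-1 video, 0x02 MPEG-2 video, 0x1B AVC/H.264).
const VIDEO_STREAM_TYPES = [0x01, 0x02, 0x1b];

// Builds VideoTrack attributes in PMT order.
function videoTrackAttributes(pmtEntries: EsEntry[]): VideoTrackAttrs[] {
  const tracks: VideoTrackAttrs[] = [];
  for (let i = 0; i < pmtEntries.length; i++) {
    const es = pmtEntries[i];
    if (VIDEO_STREAM_TYPES.indexOf(es.streamType) < 0) continue;
    tracks.push({
      label: String(es.pid),                    // PID as a decimal string (assumption)
      kind: tracks.length === 0 ? "main" : "",  // first track is "main"; others undetermined
      language: es.language,                    // "" when it cannot be determined
    });
  }
  return tracks;
}
```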

2.3 AudioTrack

For all MPEG-2 audio stream types supported by the user agent, the user agent must create a new AudioTrack in the AudioTrackList of the media resource.

The HTML5 specification [HTML5] requires that the AudioTrackList must use the order defined by the media resource. AudioTracks must appear in the AudioTrackList in the same order as they appear in the PMT as defined in [H.222.0].

For each AudioTrack created the UA must set the label attribute to a text string representing the PID of the equivalent MPEG-2 program stream.

The UA must set AudioTrackList[0].AudioTrack.kind = "main".

For all other AudioTrackList entries, the UA should set the AudioTrack kind attribute if it can determine the correct value.

The UA must set AudioTrack.language to the value of the ISO_639_language_code field in the ISO 639 descriptor, if present, associated with the audio stream type in the PMT. If the UA cannot determine the AudioTrack kind and language attributes it must set them to the empty string.

2.4 Other TextTracks

For all MPEG-2 stream types that are not audio or video stream types, the UA must create a new TextTrack in the TextTrackList of the media resource.

The HTML5 specification [HTML5] requires that the TextTrackList must use the order defined by the media resource. TextTracks must appear in the TextTrackList in the same order as they appear in the PMT as defined in [H.222.0].

The UA should set the TextTrack kind attribute to one of the categories defined in [HTML5].

If the UA cannot determine the TextTrack kind attribute it must set it to "metadata".

If the UA sets the TextTrack.kind attribute to one of the categories defined in [HTML5], it should set the TextTrack language attribute if it can determine the appropriate value. If the UA cannot determine the TextTrack language attribute it must set it to the empty string.

The UA must:

  1. set the label attribute to a text string representing the PID of the equivalent MPEG-2 program stream.
  2. set mode = "disabled".

The MPEG-2 TS packets with the PID corresponding to the TextTrack contain either PES packets or private data packets as defined in [H.222.0]. For each PES or private data packet in the program stream represented by the TextTrack, the UA must create a TextTrackCue in the TextTrack as described in [HTML5] section "Text track model" with attributes set as follows:

  1. startTime is set to the current time in the media resource timeline.
  2. endTime is set to Infinity.
  3. the text of the cue is set to the PES or private data packet encoded in Base64 [BASE64].
  4. pauseOnExit is set to false.
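The cue attributes listed above can be illustrated with a small sketch of the UA-side step. The names `encodeBase64`, `CueAttrs` and `cueForPacket` are hypothetical, and the plain object returned here only stands in for a TextTrackCue; how a UA actually constructs the cue is implementation specific.

```typescript
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes bytes as standard Base64 with "=" padding.
function encodeBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const b1 = has1 ? bytes[i + 1] : 0;
    const b2 = has2 ? bytes[i + 2] : 0;
    out += ALPHABET.charAt(b0 >> 2);
    out += ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += has1 ? ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=";
    out += has2 ? ALPHABET.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

// Stand-in for the TextTrackCue attributes set by the rules above.
interface CueAttrs {
  startTime: number;
  endTime: number;
  text: string;
  pauseOnExit: boolean;
}

// Builds the cue attributes for one PES or private data packet.
// `currentTime` is the current position on the media resource timeline, in seconds.
function cueForPacket(packet: Uint8Array, currentTime: number): CueAttrs {
  return {
    startTime: currentTime,
    endTime: Infinity,
    text: encodeBase64(packet),
    pauseOnExit: false,
  };
}
```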

It is important to note that the above metadata TextTrack and TextTrackCue creation requirements, while minimizing region/provider/application knowledge in the UA, make the semantics of the metadata TextTrack and TextTrackCues opaque to the UA. So, for example, if the UA does not recognize subtitle tracks but creates a generic metadata text track as defined above, the user agent behavior defined in [HTML5] for subtitle tracks will not occur, since the UA is not aware that this is a subtitle track. It is up to a script to identify the subtitle track and process the subtitle messages in the TextTrackCues in a manner appropriate for the subtitle format. One way this could be done is for a script to receive the TextTrackCues, extract the subtitle messages and create a new subtitle TextTrack and TextTrackCues, in which case the UA behavior defined in [HTML5] would occur.
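The identification step left to the script could look like the following sketch. It assumes the script has already obtained the stream_type and PID of each elementary stream from the PMT, and that track labels are decimal PID strings (this specification does not fix the string format). `DataTrack`, `StreamInfo` and `tracksForStreamType` are hypothetical names, and which stream type signals a given service is region and provider specific.

```typescript
// Stand-in for the TextTrack attributes this sketch reads.
interface DataTrack {
  kind: string;
  label: string;
}

// stream_type and elementary_PID of one PMT entry.
interface StreamInfo {
  streamType: number;
  pid: number;
}

// Returns the metadata tracks whose label matches the PID of a PMT entry
// with the requested stream type.
function tracksForStreamType(
  tracks: ArrayLike<DataTrack>,
  streams: StreamInfo[],
  streamType: number
): DataTrack[] {
  const wantedLabels: string[] = [];
  for (let i = 0; i < streams.length; i++) {
    if (streams[i].streamType === streamType) {
      wantedLabels.push(String(streams[i].pid)); // decimal PID label (assumption)
    }
  }
  const result: DataTrack[] = [];
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track.kind === "metadata" && wantedLabels.indexOf(track.label) >= 0) {
      result.push(track);
    }
  }
  return result;
}
```

For example, a script looking for streams carried as PES private data could pass stream type 0x06 and then decode the cues of the returned tracks according to the subtitle format it expects.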

3. Closed Captioning

Closed captioning is delivered as part of the MPEG-2 TS video stream and must be recognized and made available by the UA.

A user agent (UA) that recognizes closed captioning must:

  1. Create a new TextTrack as defined in [HTML5] section "Sourcing in-band text tracks" with the track element attributes set as follows:
    1. kind = "captions".
    2. language is set to a BCP-47 [BCP47] conformant representation of the caption data language.
    3. label set to a text string representing the PID of the MPEG-2 video program stream containing the caption data.
  2. Create a new TextTrackCue in the TextTrack as described in [HTML5] for each caption with attributes set as follows:
    1. startTime is set to the presentation time set in the caption data converted into the equivalent time in seconds relative to the media resource timeline.
    2. endTime is set to the end of the presentation if specified in the caption data. If the end of the presentation is not specified endTime is set to Infinity.
    3. the text of the cue is set to the caption data. It is UA implementation specific how the type of the caption data cue is determined by the UA when the cue text is rendered or when getCueTextAsHTML() is called.
    4. pauseOnExit is set to false.
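The conversion of caption presentation times into media timeline seconds might be sketched as follows. This assumes the presentation time is an MPEG-2 presentation time stamp (a 33-bit counter running at 90 kHz) and that the UA knows the time stamp corresponding to time zero of the media resource timeline. `ptsToMediaTime`, `captionCueTimes` and the `firstPts` parameter are hypothetical, and the sketch handles at most one counter wraparound.

```typescript
const PTS_CLOCK_HZ = 90000;     // PTS ticks per second
const PTS_MODULUS = 8589934592; // 2^33, the PTS counter wraps at this value

// Converts a PTS into seconds relative to the media resource timeline.
// `firstPts` is the PTS that corresponds to timeline position 0 (assumption).
function ptsToMediaTime(pts: number, firstPts: number): number {
  let delta = pts - firstPts;
  if (delta < 0) {
    delta += PTS_MODULUS; // treat a negative difference as a single wraparound
  }
  return delta / PTS_CLOCK_HZ;
}

// Computes startTime and endTime for a caption cue.
// `endPts` is null when the caption data does not specify an end.
function captionCueTimes(
  startPts: number,
  endPts: number | null,
  firstPts: number
): { startTime: number; endTime: number } {
  return {
    startTime: ptsToMediaTime(startPts, firstPts),
    endTime: endPts === null ? Infinity : ptsToMediaTime(endPts, firstPts),
  };
}
```

Treating every negative difference as a wraparound is a simplification; a caption time stamped slightly before `firstPts` would need separate handling in a real implementation.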

Acknowledgements

Thanks are expressed by the editor to the following individuals for their input to and feedback on this specification to date (in alphabetical order).

Mukta Kar, Giuseppe Pascale, Ed Shrum, George Sarosi, Clarke Stevens, Mark Vickers and Eric Winkelman.

A. References

A.1 Normative references

[BASE64]
"The Base16, Base32, and Base64 Data Encodings"
[BCP47]
"Tags for Identifying Languages"
[H.222.0]
"Infrastructure of audiovisual services - Transmission multiplexing and synchronization"
[HTML5]
"A vocabulary and associated APIs for HTML and XHTML"
[ISO639.2]
"ISO 639.2, Code for the Representation of Names of Languages - Part 2: alpha-3 code"
[RFC3555]
"MIME Type Registration of RTP Payload Formats"

A.2 Informative references

No informative references.