Main Page

This page is the Main Page of the Wiki for the Media Resource In-band Tracks Community Group.

Overview

This group will develop a specification defining how user agents should expose in-band tracks as HTML5 media element video, audio and text tracks, so that Web applications can access the in-band track information through the media element in an interoperable manner across user agent implementations.

The CG is currently considering exposing track content for the following media stream container types:

  • MPEG-2 TS
  • ISOBMFF (MP4)
  • WebM
  • Ogg
  • DASH

Other formats could be considered in the future, such as RTP streams.

Input Documents

CableLabs Specification

Additional CableLabs draft specification

Mapping MP4 tracks to WebVTT

OIPF Specification

Requirements

This section specifies details beyond what is currently in HTML5 for exposing in-band tracks as HTML video, audio and text tracks.

Audio/Video Tracks

  1. An Audio/VideoTrack must be created for each audio/video elementary stream that the UA can render.
  2. The track order must be defined for all supported media resource container formats.
  3. A method for determining the in-band audio/video stream's 'kind' attribute must be defined for all supported media resource container formats.
  4. A method for determining the in-band audio/video stream's 'language' attribute must be defined for all supported media resource container formats.

Text Tracks

  1. A method for determining which in-band streams are text tracks must be defined for all supported media resource container formats.
  2. A method for determining the in-band data stream's 'kind' attribute must be defined for all supported media resource container formats.
  3. A method for determining the DataCue attributes must be defined for all in-band data streams created as metadata TextTracks.

Track Metadata

  1. Media resource metadata that describes the contents of an in-band video, audio or text track must be made available to a Web page.
  2. A Web page must be provided with a way to associate in-band track metadata (made available as described in the previous requirement) with the proper VideoTrack, AudioTrack or TextTrack object.
  3. The MIME type of the media resource containing in-band tracks must be available to a Web page so it knows how to parse the metadata for those in-band tracks.


Audio/Video/Text Track Creation

This section defines how audio, video and text tracks, and text track cues, are created for each of the media resource types.

MPEG-2 TS

Track order

Order of the elementary streams in the PMT. See ISO/IEC 13818-1:2013.

Track id attribute

Decimal representation of the PID, as listed in the PMT, of the elementary stream containing the track data. Note that in the case of CEA-708 closed captions, this is the PID of the video elementary stream that carries the captions in the Picture User Data.

Track label attribute

Empty string.

Track language attribute

The ISO_639_language_code field of the ISO_639_language_descriptor. See ISO/IEC 13818-1:2013.
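The MPEG-2 TS rules above (track order, id and language) can be illustrated with a minimal Python sketch that walks one complete TS_program_map_section. It assumes the caller has already reassembled the section from TS packets and verified its CRC; the names `parse_pmt`, `PmtStream` and `track_attributes` are hypothetical, and only the first language code of an ISO_639_language_descriptor (tag 0x0A) is used.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PmtStream:
    stream_type: int
    pid: int
    descriptors: List[Tuple[int, bytes]] = field(default_factory=list)  # (tag, data)

def parse_descriptors(buf: bytes) -> List[Tuple[int, bytes]]:
    """Split a descriptor loop into (tag, data) pairs."""
    out, i = [], 0
    while i + 2 <= len(buf):
        tag, length = buf[i], buf[i + 1]
        out.append((tag, bytes(buf[i + 2:i + 2 + length])))
        i += 2 + length
    return out

def parse_pmt(section: bytes) -> List[PmtStream]:
    """Return the elementary streams of a TS_program_map_section in PMT order."""
    if section[0] != 0x02:
        raise ValueError("not a TS_program_map_section")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4                 # stop before CRC_32
    program_info_length = ((section[10] & 0x0F) << 8) | section[11]
    i = 12 + program_info_length                 # skip program-level descriptors
    streams = []
    while i + 5 <= end:
        stream_type = section[i]
        pid = ((section[i + 1] & 0x1F) << 8) | section[i + 2]
        es_info_length = ((section[i + 3] & 0x0F) << 8) | section[i + 4]
        descriptors = parse_descriptors(section[i + 5:i + 5 + es_info_length])
        streams.append(PmtStream(stream_type, pid, descriptors))
        i += 5 + es_info_length
    return streams

def track_attributes(stream: PmtStream) -> dict:
    """id is the decimal PID, label is always empty, language comes from tag 0x0A."""
    language = ""
    for tag, data in stream.descriptors:
        if tag == 0x0A and len(data) >= 3:      # ISO_639_language_descriptor
            language = data[0:3].decode("latin-1")
            break
    return {"id": str(stream.pid), "label": "", "language": language}
```

The list returned by `parse_pmt` is already in PMT order, so its index can serve as the track order.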

Audio and Video Track Creation

The following table summarizes how the audio and video track kind attribute should be set.

Audio and Video kind attribute
| Applies to | kind | Detection |
|---|---|---|
| video | main | First video elementary stream in the PMT |
| audio | descriptions | AC-3: bsmod == 2 and full_svc == 0 |
| audio | main | First audio elementary stream in the PMT |
| audio | main-desc | AC-3: bsmod == 2 and full_svc == 1 |
| audio | translation | AC-3: Not the first audio elementary stream in the PMT and bsmod == 0 |
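One way to apply this table, sketched in Python: the function assumes `bsmod` and `full_svc` have already been read from the stream's AC-3 audio descriptor (they are `None` for non-AC-3 streams). The table does not say which rule wins when several match; checking the two bsmod == 2 rules first is an assumption of this sketch.

```python
from typing import Optional

def mpeg2ts_av_kind(is_video: bool, is_first_of_type: bool,
                    bsmod: Optional[int] = None,
                    full_svc: Optional[int] = None) -> str:
    """Return the AudioTrack/VideoTrack kind for one elementary stream."""
    if is_video:
        return "main" if is_first_of_type else ""
    # AC-3 audio description signalling (assumed to take precedence)
    if bsmod == 2 and full_svc == 0:
        return "descriptions"
    if bsmod == 2 and full_svc == 1:
        return "main-desc"
    if is_first_of_type:
        return "main"
    if bsmod == 0:                 # AC-3 complete main service, not first
        return "translation"
    return ""
```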

TextTrack Creation

The following table summarizes how the text track kind attribute should be set.

Text Track kind
| kind | Detection |
|---|---|
| captions | Presence of caption_service_descriptor or closed_caption_type in a video elementary stream |
| subtitles | stream_type == 0x82 that is rendered by the UA |
| metadata | stream_type == 0x05, 0x80 – 0xFF containing private sections (payload_unit_start_indicator == 1) |
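A hedged Python sketch of the kind decision above. `has_captions` stands for "a caption_service_descriptor or closed_caption_type was found for this video stream" and `rendered_by_ua` for whether the UA renders 0x82 subtitles itself; both are assumed to be computed elsewhere. Following the TextTrack.kind table later on this page, a 0x82 stream that the UA does not render is treated as metadata. The private-section check (payload_unit_start_indicator) is omitted for brevity.

```python
from typing import Optional

def mpeg2ts_text_kind(stream_type: int, has_captions: bool,
                      rendered_by_ua: bool) -> Optional[str]:
    """Return the TextTrack kind for an elementary stream, or None for no text track."""
    if has_captions:
        return "captions"      # CEA-708 captions carried in a video stream
    if stream_type == 0x82 and rendered_by_ua:
        return "subtitles"     # SCTE 27 subtitles rendered by the UA
    if stream_type == 0x05 or 0x80 <= stream_type <= 0xFF:
        return "metadata"      # private sections exposed to script
    return None
```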

Text Track Cue Creation

Text Track Cue Generation
| TextTrack.kind | Cue Generation |
|---|---|
| captions | Unspecified until a CEA708Cue subclass is defined |
| subtitles | Unspecified until a SCTE27Subtitle subclass is defined |
| metadata | The UA creates a DataCue for one complete private section table. Cue data is the entire private section table, from table_id through CRC_32. The cue startTime and endTime are both set to the time in the media resource timeline when the cue is created. Cue text is the empty string. Cue pauseOnExit is false. |
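The metadata rule above can be sketched in Python as follows, assuming the input buffer starts at a table_id byte. `DataCueSketch` is a hypothetical stand-in for DataCue, and the CRC check (CRC-32/MPEG-2, which yields 0 when run over a section that includes its own CRC_32) is an extra validation step that the table does not require.

```python
from dataclasses import dataclass

def crc32_mpeg2(data: bytes) -> int:
    """CRC-32/MPEG-2 (poly 0x04C11DB7, init 0xFFFFFFFF, MSB first, no final XOR)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

@dataclass
class DataCueSketch:
    """Hypothetical stand-in for a DataCue; fields mirror DataCue attributes."""
    startTime: float
    endTime: float
    data: bytes
    text: str = ""
    pauseOnExit: bool = False

def section_to_cue(buf: bytes, current_time: float) -> DataCueSketch:
    """Build a metadata cue from the complete private section at the start of buf."""
    if len(buf) < 3:
        raise ValueError("incomplete section header")
    section_length = ((buf[1] & 0x0F) << 8) | buf[2]
    if len(buf) < 3 + section_length:
        raise ValueError("incomplete section")
    section = bytes(buf[:3 + section_length])  # table_id through CRC_32
    # Only sections with section_syntax_indicator == 1 carry a CRC_32.
    if section[1] & 0x80 and crc32_mpeg2(section) != 0:
        raise ValueError("CRC_32 mismatch")
    return DataCueSketch(startTime=current_time, endTime=current_time, data=section)
```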

MPEG-2 TS Metadata Cue

Below is a proposal for an MPEG-2 TS metadata text track cue interface. There are three types of metadata exposed to script: transport stream program map table, transport stream description table, and private section. The tableId attribute identifies the specific type of metadata:

tableId assignments (see Table 2-26)
| Value | Description |
|---|---|
| 0x02 | Transport stream program map table |
| 0x03 | Transport stream description table |
| 0x40 - 0xFE | User private |

interface MpegTsSection : TextTrackCue { // Generic MPEG-2 TS section
    attribute octet tableId;
    attribute MpegTsSyntaxSection? syntaxSection; // null when section_syntax_indicator == 0
};

interface MpegTsSyntaxSection {
    attribute unsigned short tableIdExtension;
    attribute octet versionNumber;
    attribute boolean currentNextIndicator;
    attribute octet sectionNumber;
    attribute octet lastSectionNumber;
};

interface MpegTsDescriptor {
    attribute octet tag;
    attribute ArrayBuffer data;
};

interface MpegTsElementaryStreamMetadata {
    attribute octet streamType;
    attribute unsigned short elementaryPID;
    attribute MpegTsDescriptor[] descriptors;
};

interface MpegTsPmt : MpegTsSection { // Transport stream program map section (see Table 2-28)
    attribute unsigned short programNumber;
    attribute MpegTsDescriptor[] descriptors;
    attribute MpegTsElementaryStreamMetadata[] streams;
};

interface MpegTsDescriptionSection : MpegTsSection { // Transport stream description table (see Table 2-30-1)
    attribute MpegTsDescriptor[] descriptors;
};

interface MpegTsPrivateSection : MpegTsSection { // Private section (see Table 2-30)
    attribute boolean privateIndicator;
    // syntaxSection is inherited from MpegTsSection
    attribute ArrayBuffer privateData;
};
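To show how the generic section header maps onto the IDL attributes, here is a minimal Python sketch that decodes the first bytes of any section. The dataclasses only mirror the proposed interfaces and are not a browser API; `privateIndicator` is carried on the base type here for brevity, whereas the IDL puts it on MpegTsPrivateSection.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MpegTsSyntaxSection:
    tableIdExtension: int
    versionNumber: int
    currentNextIndicator: bool
    sectionNumber: int
    lastSectionNumber: int

@dataclass
class MpegTsSection:
    tableId: int
    privateIndicator: bool  # only meaningful for private sections
    syntaxSection: Optional[MpegTsSyntaxSection]

def table_kind(table_id: int) -> str:
    """Classify a tableId using the assignments listed above."""
    if table_id == 0x02:
        return "Transport stream program map table"
    if table_id == 0x03:
        return "Transport stream description table"
    if 0x40 <= table_id <= 0xFE:
        return "User private"
    return "Other"

def parse_section_header(section: bytes) -> MpegTsSection:
    """Decode the generic section header fields mirrored by the IDL."""
    syntax = None
    if section[1] & 0x80:  # section_syntax_indicator
        syntax = MpegTsSyntaxSection(
            tableIdExtension=(section[3] << 8) | section[4],
            versionNumber=(section[5] >> 1) & 0x1F,
            currentNextIndicator=bool(section[5] & 0x01),
            sectionNumber=section[6],
            lastSectionNumber=section[7],
        )
    return MpegTsSection(tableId=section[0],
                         privateIndicator=bool(section[1] & 0x40),
                         syntaxSection=syntax)
```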

ISOBMFF

Track order

Order of TrackBox (trak) boxes in the MovieBox (moov) container. See http://standards.iso.org/ittf/PubliclyAvailableStandards/c061988_ISO_IEC_14496-12_2012.zip.

Track id attribute

Decimal representation of the track_ID of a TrackHeaderBox (tkhd) in a TrackBox (trak) box. See http://standards.iso.org/ittf/PubliclyAvailableStandards/c061988_ISO_IEC_14496-12_2012.zip.

Track label attribute

The name field in the HandlerBox. See http://standards.iso.org/ittf/PubliclyAvailableStandards/c061988_ISO_IEC_14496-12_2012.zip.

Track language attribute

The language field in the MediaHeaderBox. See http://standards.iso.org/ittf/PubliclyAvailableStandards/c061988_ISO_IEC_14496-12_2012.zip.
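The four ISOBMFF rules above can be illustrated with a minimal Python sketch that walks the box tree of an in-memory file. It assumes a non-fragmented file with a single moov box and well-formed trak/mdia/hdlr/mdhd boxes; error handling is omitted, and the helper names are hypothetical.

```python
import struct
from typing import Iterator, Optional, Tuple

def iter_boxes(buf: bytes, start: int, end: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, payload_end) for each box in buf[start:end]."""
    i = start
    while i + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, i)
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            size = struct.unpack_from(">Q", buf, i + 8)[0]
            header = 16
        elif size == 0:  # box extends to the end of its container
            size = end - i
        yield box_type.decode("latin-1"), i + header, i + size
        i += size

def find_child(buf: bytes, start: int, end: int, wanted: str) -> Optional[Tuple[int, int]]:
    for box_type, s, e in iter_boxes(buf, start, end):
        if box_type == wanted:
            return s, e
    return None

def isobmff_tracks(buf: bytes) -> list:
    """Return id/label/language for each trak in moov, in trak order."""
    moov = find_child(buf, 0, len(buf), "moov")
    if moov is None:
        return []
    tracks = []
    for box_type, ts, te in iter_boxes(buf, *moov):
        if box_type != "trak":
            continue
        # tkhd: track_ID follows creation_time and modification_time,
        # which are 32-bit in version 0 and 64-bit in version 1.
        th, _ = find_child(buf, ts, te, "tkhd")
        track_id = struct.unpack_from(">I", buf, th + (20 if buf[th] == 1 else 12))[0]
        mdia = find_child(buf, ts, te, "mdia")
        # hdlr: version/flags, pre_defined, handler_type, reserved[3], then name
        hs, he = find_child(buf, *mdia, "hdlr")
        name = buf[hs + 24:he].split(b"\x00", 1)[0].decode("utf-8", "replace")
        # mdhd: language follows the times, timescale and duration fields
        ms, _ = find_child(buf, *mdia, "mdhd")
        packed = struct.unpack_from(">H", buf, ms + (32 if buf[ms] == 1 else 20))[0]
        # Three 5-bit letters, each stored as (ASCII code - 0x60)
        language = "".join(chr(((packed >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))
        tracks.append({"id": str(track_id), "label": name, "language": language})
    return tracks
```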

WebM

Track order

Order of the TrackEntry elements in the Tracks element of the WebM initialization segment. See http://www.w3.org/TR/media-source/#init-segment

Track id attribute

Decimal representation of the TrackUID in the track's TrackEntry element. See http://www.webmproject.org/code/specs/container/#ebml-basics and http://matroska.org/technical/specs/index.html#TrackUID

Track label attribute

The Name element in the track's TrackEntry element. See http://matroska.org/technical/specs/index.html#Name

Track language attribute

The Language element in the track's TrackEntry element. See http://matroska.org/technical/specs/index.html#Language
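A minimal Python sketch of the WebM rules, assuming the caller has already located the Tracks element and passes its data bytes. The element IDs are the Matroska ones for TrackEntry (0xAE), TrackUID (0x73C5), Name (0x536E) and Language (0x22B59C); unknown-size elements are not handled, and a missing Language element is simply left empty in this sketch.

```python
from typing import Iterator, Tuple

def read_vint(buf: bytes, i: int, keep_marker: bool) -> Tuple[int, int]:
    """Read an EBML variable-length integer at buf[i]; return (value, next_index).

    Element IDs keep their length-marker bit; element sizes drop it.
    """
    first, length, mask = buf[i], 1, 0x80
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    if length > 8:
        raise ValueError("invalid EBML variable-length integer")
    value = first if keep_marker else first & (mask - 1)
    for b in buf[i + 1:i + length]:
        value = (value << 8) | b
    return value, i + length

def iter_elements(buf: bytes, start: int, end: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (element_id, data_start, data_end) for each element in buf[start:end]."""
    i = start
    while i < end:
        element_id, i = read_vint(buf, i, keep_marker=True)
        size, i = read_vint(buf, i, keep_marker=False)
        yield element_id, i, i + size
        i += size

TRACK_ENTRY, TRACK_UID, NAME, LANGUAGE = 0xAE, 0x73C5, 0x536E, 0x22B59C

def webm_tracks(tracks_data: bytes) -> list:
    """Map each TrackEntry in the Tracks element's data to id/label/language."""
    result = []
    for element_id, s, e in iter_elements(tracks_data, 0, len(tracks_data)):
        if element_id != TRACK_ENTRY:
            continue
        attrs = {"id": "", "label": "", "language": ""}
        for child_id, cs, ce in iter_elements(tracks_data, s, e):
            value = tracks_data[cs:ce]
            if child_id == TRACK_UID:
                attrs["id"] = str(int.from_bytes(value, "big"))
            elif child_id == NAME:
                attrs["label"] = value.decode("utf-8")
            elif child_id == LANGUAGE:
                attrs["language"] = value.decode("ascii")
        result.append(attrs)
    return result
```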

Ogg

Track order

Order of the fisbone headers in Ogg Skeleton. See https://wiki.xiph.org/Ogg_Skeleton_4

Track id attribute

The "name" field of the message header fields of a fisbone header in Ogg Skeleton. See https://wiki.xiph.org/Ogg_Skeleton_4

If no Skeleton header is available, use a decimal representation of the stream serial number. See http://xiph.org/ogg/doc/oggstream.html

Track label attribute

The contents of the "title" message header field of a fisbone header in Ogg Skeleton, if a Skeleton header is available. See https://wiki.xiph.org/Ogg_Skeleton_4

Empty otherwise.

Track language attribute

The "language" message header field of a fisbone header in Ogg Skeleton, if a Skeleton header is available. See https://wiki.xiph.org/Ogg_Skeleton_4
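The Ogg rules can be sketched in Python as below, assuming the fisbone message header fields have already been extracted as text made of CRLF-separated "Name: value" lines. Matching header names case-insensitively, and falling back to the serial number when a Skeleton is present but has no Name field, are assumptions of this sketch. The role-to-kind mapping reuses the Ogg examples from the AudioTrack.kind/VideoTrack.kind table on this page.

```python
from typing import Dict, Optional

# Ogg Skeleton Role values from the kind table on this page, mapped to kind.
OGG_ROLE_TO_KIND = {
    "audio/main": "main", "video/main": "main",
    "audio/alternate": "alternative", "video/alternate": "alternative",
    "audio/audiodesc": "descriptions", "audio/dub": "translation",
    "video/sign": "sign",
}

def parse_message_headers(text: str) -> Dict[str, str]:
    """Split CRLF-separated 'Name: value' lines; names are lower-cased."""
    headers = {}
    for line in text.split("\r\n"):
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return headers

def ogg_track_attributes(serial_number: int, fisbone_headers: Optional[str]) -> Dict[str, str]:
    """Derive id, label, language and kind for one logical stream."""
    if fisbone_headers is None:  # no Skeleton: only the serial number is available
        return {"id": str(serial_number), "label": "", "language": "", "kind": ""}
    h = parse_message_headers(fisbone_headers)
    return {
        "id": h.get("name", str(serial_number)),
        "label": h.get("title", ""),
        "language": h.get("language", ""),
        "kind": OGG_ROLE_TO_KIND.get(h.get("role", ""), ""),
    }
```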

DASH

Track order

Order of the AdaptationSet elements in the MPD.

Track id attribute

id attribute in the AdaptationSet element. Empty string if id attribute is not present. See ISO/IEC 23009-1.

Track label attribute

Empty string.

Track language attribute

lang attribute in the AdaptationSet element. See ISO/IEC 23009-1.
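A minimal Python sketch of the DASH id, label and language rules, using the standard library XML parser. It assumes the MPD uses the urn:mpeg:dash:schema:mpd:2011 namespace and looks only at the first Period; Role descriptors and mimeType are collected as well because the kind rules depend on them.

```python
import xml.etree.ElementTree as ET

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

def dash_tracks(mpd_xml: str) -> list:
    """List the AdaptationSets of the first Period in MPD order."""
    root = ET.fromstring(mpd_xml)
    period = root.find("mpd:Period", MPD_NS)
    tracks = []
    if period is None:
        return tracks
    for aset in period.findall("mpd:AdaptationSet", MPD_NS):
        roles = [(r.get("schemeIdUri"), r.get("value"))
                 for r in aset.findall("mpd:Role", MPD_NS)]
        tracks.append({
            "id": aset.get("id", ""),          # empty string when id is absent
            "label": "",                       # always empty for DASH
            "language": aset.get("lang", ""),
            "mimeType": aset.get("mimeType", ""),
            "roles": roles,
        })
    return tracks
```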

Audio and Video Track Creation

The following table summarizes how the audio and video track kind attribute should be set.

Audio and Video Track kind Attribute
| Applies to | kind | Detection |
|---|---|---|
| video | main | Role schemeIdUri = urn:mpeg:dash:role:2011, value = main |
| audio | descriptions | Role schemeIdUri = urn:mpeg:dash:role:2011, value = supplementary and Role schemeIdUri = urn:dlna:dash:role:2012, value = description |
| audio | main | Role schemeIdUri = urn:mpeg:dash:role:2011, value = main |
| audio | main-desc | Role schemeIdUri = urn:mpeg:dash:role:2011, value = main and Role schemeIdUri = urn:dlna:dash:role:2012, value = description |
| audio | translation | Role schemeIdUri = urn:mpeg:dash:role:2011, value = dub |
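The audio/video kind table can be sketched as a Python function over the (schemeIdUri, value) pairs of an AdaptationSet's Role descriptors. Rule precedence is an assumption: main-desc is tested before main, and dub before main, so that "dub" plus "main" gives translation, as in the AudioTrack.kind table proposed later on this page.

```python
from typing import Set, Tuple

MPEG_ROLE = "urn:mpeg:dash:role:2011"
DLNA_ROLE = "urn:dlna:dash:role:2012"

def dash_av_kind(is_video: bool, roles: Set[Tuple[str, str]]) -> str:
    """roles holds (schemeIdUri, value) pairs from the AdaptationSet's Role elements."""
    main = (MPEG_ROLE, "main") in roles
    if is_video:
        return "main" if main else ""
    described = (DLNA_ROLE, "description") in roles
    if main and described:
        return "main-desc"
    if (MPEG_ROLE, "supplementary") in roles and described:
        return "descriptions"
    if (MPEG_ROLE, "dub") in roles:
        return "translation"
    if main:
        return "main"
    return ""
```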

TextTrack Creation

The following table summarizes how the text track kind attribute should be set.

Text Track kind Attribute
| kind | Detection |
|---|---|
| captions | Role schemeIdUri = urn:mpeg:dash:role:2011, value = caption |
| subtitles | Role schemeIdUri = urn:mpeg:dash:role:2011, value = subtitle |
| metadata | AdaptationSet mimeType with a top-level type of 'application' or 'text', and no Role descriptor with a value of caption or subtitle |
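A matching Python sketch for text tracks. As an assumption, AdaptationSets whose mimeType has a top-level type of video or audio are excluded first, so that a video AdaptationSet with burnt-in captions is not mistaken for a text track.

```python
from typing import Iterable, Optional, Tuple

MPEG_ROLE = "urn:mpeg:dash:role:2011"

def dash_text_kind(mime_type: str, roles: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the TextTrack kind for an AdaptationSet, or None if it is not a text track."""
    top_level = mime_type.split("/", 1)[0]
    if top_level in ("video", "audio"):
        return None
    values = {value for scheme, value in roles if scheme == MPEG_ROLE}
    if "caption" in values:
        return "captions"
    if "subtitle" in values:
        return "subtitles"
    if top_level in ("application", "text"):
        return "metadata"
    return None
```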

Exposing In-band Track Metadata

Three implementation alternatives are described for making media resource in-band track metadata available to Web applications:

  • As a separate TextTrack, hereafter referred to as "In-band Metadata TextTrack". This alternative is based on the CableLabs Specification.
  • As an attribute of the track object. This is based on a discussion on public-inbandtracks@w3.org.
  • Use the existing inBandMetadataTrackDispatchType attribute. This only makes metadata available for in-band text tracks of kind == 'metadata'.

In-band Metadata TextTrack Alternative

Media resource in-band track metadata is made available to the Web application as a TextTrack. The UA creates an in-band metadata TextTrack as appropriate for the media resource type.

The UA creates a DataCue that contains a representation of the media resource in-band track metadata as follows:

  • startTime is set to the current time in the media resource timeline.
  • endTime is set to Infinity.
  • data and text attributes are set as follows, based on the type of the media resource:
    • MPEG-2 TS:
      • data - Let pmt length be the value of the "program_info_length" field in the TS_program_map_section, interpreted as an integer as defined by the MPEG-2 specification. Set data to the program descriptor bytes, i.e. the pmt length bytes following the "program_info_length" field.
      • text is set to the empty string.
      • pauseOnExit is set to false.
    • MP4 ISOBMFF:
    • Ogg:
    • WebM:
    • DASH:
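The MPEG-2 TS bullet above can be sketched in Python as follows, assuming `section` holds one complete TS_program_map_section. The returned dictionary is a hypothetical stand-in for the DataCue, not a browser object.

```python
import math

def pmt_program_descriptor_bytes(section: bytes) -> bytes:
    """Return the program_info_length bytes that follow program_info_length."""
    if section[0] != 0x02:
        raise ValueError("not a TS_program_map_section")
    program_info_length = ((section[10] & 0x0F) << 8) | section[11]
    return bytes(section[12:12 + program_info_length])

def metadata_track_cue(section: bytes, current_time: float) -> dict:
    """Hypothetical DataCue-like record for the in-band metadata TextTrack."""
    return {
        "startTime": current_time,   # current time in the media resource timeline
        "endTime": math.inf,
        "data": pmt_program_descriptor_bytes(section),
        "text": "",
        "pauseOnExit": False,
    }
```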

In-band Metadata Attribute Alternative

Media resource in-band track metadata is made available to the Web application as a new attribute, e.g. trackMetadata, of the video, audio or text track. The existing inBandMetadataTrackDispatchType attribute already fulfills the function proposed for trackMetadata, but only for TextTracks whose kind is "metadata".

The UA sets the trackMetadata attribute for each video, audio and text track it creates, depending on the media resource type, as follows:

  • MPEG-2 TS: Let stream type be the value of the "stream_type" field describing the track's type in the media resource's program map section, interpreted as an 8-bit unsigned integer. Let length be the value of the "ES_info_length" field for the track in the same part of the program map section, interpreted as an integer as defined by the MPEG-2 specification. Let descriptor bytes be the length bytes following the "ES_info_length" field. Set trackMetadata to the concatenation of: the string "video/mp2t", the stream type byte and the descriptor bytes bytes, expressed in hexadecimal using uppercase ASCII hex digits.
  • MP4 ISOBMFF:
  • Ogg:
  • WebM:
  • DASH:
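A minimal Python sketch of the MPEG-2 TS rule, plus the inverse a Web application would need in order to recover the stream type and descriptors. It assumes the "video/mp2t" prefix is kept as literal text and only the stream type and descriptor bytes are hex-encoded, which is one reading of the rule above.

```python
from typing import List, Tuple

PREFIX = "video/mp2t"

def mpeg2ts_track_metadata(stream_type: int, descriptor_bytes: bytes) -> str:
    """Literal prefix + uppercase hex of the stream type byte and descriptor bytes."""
    return PREFIX + bytes([stream_type]).hex().upper() + descriptor_bytes.hex().upper()

def parse_track_metadata(value: str) -> Tuple[int, List[Tuple[int, bytes]]]:
    """Inverse of the above: recover stream_type and (tag, data) descriptors."""
    if not value.startswith(PREFIX):
        raise ValueError("not an MPEG-2 TS trackMetadata value")
    raw = bytes.fromhex(value[len(PREFIX):])
    stream_type, i, descriptors = raw[0], 1, []
    while i + 2 <= len(raw):
        tag, length = raw[i], raw[i + 1]
        descriptors.append((tag, raw[i + 2:i + 2 + length]))
        i += 2 + length
    return stream_type, descriptors
```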

Another option would be to expose the descriptor data as an ArrayBuffer, and add an additional attribute to indicate that the stream is MPEG-TS:

partial interface TextTrack {
   readonly attribute ArrayBuffer metadata; // binary descriptor data
};

partial interface HTMLMediaElement {
   readonly attribute DOMString mediaType; // e.g. "video/mp2t"
};

inBandMetadataTrackDispatchType Alternative

Media resource in-band track metadata is made available to the Web application using the inBandMetadataTrackDispatchType attribute defined in HTML5. This attribute is only defined for TextTracks with kind == 'metadata'; metadata is therefore not available to Web applications for video tracks, audio tracks, or text tracks whose kind is not 'metadata'.

Alternative Pros and Cons

In-band Track Metadata Pros/Cons

In-band Metadata TextTrack
  Pros:
  • No new HTML5 attribute or API.
  • Web app access to metadata for all track types.
  • Binary data doesn't need to be encoded.
  Cons:
  • Not implemented in most UAs.
  • Extra work for script to access track metadata.
  • Duplicates the function of inBandMetadataTrackDispatchType.
  • Mixing metadata with cues is possibly inelegant.

In-band Metadata Attribute
  Pros:
  • Web app access to metadata for all track types.
  • Simple for script to access track metadata.
  • Metadata as an attribute seems more elegant.
  • Since this is a new attribute, it could be an ArrayBuffer, avoiding an unnecessary string encoding of binary data.
  Cons:
  • Requires a new HTML5 attribute for audio and video tracks, and requires renaming the inBandMetadataTrackDispatchType attribute.
  • No use cases have been identified.

inBandMetadataTrackDispatchType
  Pros:
  • No new HTML5 attributes or APIs.
  • Implemented in UAs.
  • Covers known use cases.
  • Metadata as an attribute seems more elegant.
  Cons:
  • Web app access only to metadata text tracks. Is there a use case where this is a problem?
  • Need to encode binary data as a string (hexadecimal for MPEG-2 TS).

Proposed HTML spec changes

Audio and video kind table

Additions to the existing table are green

Return values for AudioTrack.kind and VideoTrack.kind
| Category | Definition | Applies to... | Examples |
|---|---|---|---|
| "alternative" | A possible alternative to the main track, e.g. a different take of a song (audio), or a different angle (video). | Audio and video. | Ogg: "audio/alternate" or "video/alternate"; DASH: "alternate" without "main" and "commentary" roles, and, for audio, without the "dub" role (other roles ignored). |
| "captions" | A version of the main video track with captions burnt in. (For legacy content; new content would use text tracks.) | Video only. | DASH: "caption" and "main" roles together (other roles ignored). |
| "descriptions" | An audio description of a video track. | Audio only. | Ogg: "audio/audiodesc"; AC-3 audio in MPEG-2 TS: bsmod=2 and full_svc=0; DASH: "supplementary" and urn:dlna:dash:role:2012 "description" roles together. |
| "main" | The primary audio or video track. | Audio and video. | Ogg: "audio/main" or "video/main"; WebM: the "FlagDefault" element is set; DASH: "main" role without "caption", "subtitle", and "dub" roles (other roles ignored); MPEG-2 TS: first audio (video) elementary stream in the PMT. |
| "main-desc" | The primary audio track, mixed with audio descriptions. | Audio only. | AC-3 audio in MPEG-2 TS: bsmod=2 and full_svc=1; DASH: "main" and urn:dlna:dash:role:2012 "description" roles together. |
| "sign" | A sign-language interpretation of an audio track. | Video only. | Ogg: "video/sign". |
| "subtitles" | A version of the main video track with subtitles burnt in. (For legacy content; new content would use text tracks.) | Video only. | DASH: "subtitle" and "main" roles together (other roles ignored). |
| "translation" | A translated version of the main audio track. | Audio only. | Ogg: "audio/dub"; DASH: "dub" and "main" roles together (other roles ignored); AC-3 audio in MPEG-2 TS: not the first audio elementary stream in the PMT and bsmod=0. |
| "commentary" | Commentary on the primary audio or video track, e.g. a director's commentary. | Audio and video. | DASH: "commentary" role without "main" role (other roles ignored). |
| "" (empty string) | No explicit kind, or the kind given by the track's metadata is not recognised by the user agent. | Audio and video. | Any other track type, track role, or combination of track roles not described above. |

Text kind table

Add this table in this section.

Return values for TextTrack.kind
| Category | Definition | Examples |
|---|---|---|
| "captions" | Text of the main dialogue. | MPEG-2 TS: presence of caption_service_descriptor or closed_caption_type in a video elementary stream; DASH: "caption" role. |
| "chapters" | | |
| "descriptions" | A text description of a video track. | |
| "subtitles" | Text translation of the main dialogue. | MPEG-2 TS: stream_type=0x82 that is rendered by the user agent; MP4 ISOBMFF: handler_type of 'subt'; DASH: "subtitle" role. |
| "metadata" | Tracks to be interpreted by the Web application. | MPEG-2 TS: stream_type=0x05, 0x80 - 0xFF, 0x82 that is not rendered by the user agent; MP4 ISOBMFF: handler_type of 'meta'. |

Guidelines for creating metadata text track cues

Add this table in this section.

Guidelines for creating metadata text track cues
| Media resource type | Cue creation guidelines |
|---|---|
| DASH | Cues are created as defined for the AdaptationSet mimeType. See ISO/IEC 23009-1. |
| MPEG-4 BMFF | Cues are created as defined for the MetaDataSampleEntry namespace or mime_format. See ISO/IEC 14496-12:2012. |
| MPEG-2 TS | Cue data is the entire private section table, from table_id through CRC_32. See ISO/IEC 13818-1:2013. The cue startTime and endTime are both set to the time in the media resource timeline when the cue is created. pauseOnExit is false. |
| Ogg | TBD. |
| WebM | TBD. |

Exposing a Media Resource Specific TextTrack

Add the following to Step 4.

If the media resource is a DASH MPD file

The text track in-band metadata track dispatch type must be set to the concatenation of the AdaptationSet element attributes and all child Role descriptors.

Media Resource Formats

The following tables give a summary per media type of how the different stream types are exposed.

MPEG-2 TS

The following table is based on the 2013 Edition of the MPEG-2 TS standard (Download ITU-T H.222.0 | ISO/IEC 13818-1:2013).

id: The id for every track should be a string containing the track's PID formatted as a decimal number.

language: The language should be determined from the ISO_639_language_descriptor.

NOTE: The list is probably too big, we could trim it and say that those not listed are not supported yet.
<BobLund131216>I updated this table.

  1. Unless there is a case where an audio/video elementary stream not rendered by the UA needs to be presented to the application, these streams should not cause creation of a text track.
  2. Stream types 0x05 and 0x80 - 0xFF are used for a variety of text tracks, e.g. 0x05 ETV signaling, 0x82 SCTE 27 subtitles. Others are user private but could still be used for Web applications, e.g. Nielsen rating signaling.

</BobLund131216>

NOTE: We could also extend it with DVB, ATSC, ... stream types, and/or using the SMPTE registration data.

MPEG-2 TS Stream Type HTML 5 Track attributes
Stream type value description Track Interface kind inBandMetadataTrackDispatchType cue payload
0x00 ITU-T - ISO/IEC Reserved N/A
0x01 ISO/IEC 11172-2 Video VideoTrack if decoding is supported, TextTrack otherwise
0x02 ITU-T Rec. H.262 - ISO/IEC 13818-2 Video or ISO/IEC 11172-2 constrained parameter video stream VideoTrack if decoding is supported + TextTrack if stream contains closed captions (as identified by a caption_service_descriptor in the PMT or a data_type == 0x03 in the Picture User Data), TextTrack otherwise
0x03 ISO/IEC 11172-3 Audio AudioTrack if decoding is supported, TextTrack otherwise
0x04 ISO/IEC 13818-3 Audio AudioTrack if decoding is supported, TextTrack otherwise
0x05 ITU-T Rec. H.222.0 - ISO/IEC 13818-1 private_sections TextTrack
0x06 ITU-T Rec. H.222.0 - ISO/IEC 13818-1 PES packets containing private data N/A
0x07 ISO/IEC 13522 MHEG N/A
0x08 ITU-T Rec. H.222.0 - ISO/IEC 13818-1 Annex A DSM-CC N/A
0x09 ITU-T Rec. H.222.1 N/A
0x0A ISO/IEC 13818-6 type A N/A
0x0B ISO/IEC 13818-6 type B N/A
0x0C ISO/IEC 13818-6 type C N/A
0x0D ISO/IEC 13818-6 type D N/A
0x0E ITU-T Rec. H.222.0 - ISO/IEC 13818-1 auxiliary N/A
0x0F ISO/IEC 13818-7 Audio with ADTS transport syntax AudioTrack if decoding is supported, TextTrack otherwise
0x10 ISO/IEC 14496-2 Visual VideoTrack if decoding is supported, TextTrack otherwise
0x11 ISO/IEC 14496-3 Audio with the LATM transport syntax as defined in ISO/IEC 14496-3/AMD-1 AudioTrack if decoding is supported, TextTrack otherwise
0x12 ISO/IEC 14496-1 SL-packetized stream or FlexMux stream carried in PES packets N/A
0x13 ISO/IEC 14496-1 SL-packetized stream or FlexMux stream carried in ISO/IEC14496_sections N/A
0x14 ISO/IEC 13818-6 Synchronized Download Protocol N/A
0x15 Metadata carried in PES packets N/A
0x16 Metadata carried in metadata_sections N/A
0x17 Metadata carried in ISO/IEC 13818-6 Data Carousel N/A
0x18 Metadata carried in ISO/IEC 13818-6 Object Carousel N/A
0x19 Metadata carried in ISO/IEC 13818-6 Synchronized Download Protocol N/A
0x1A IPMP stream (defined in ISO/IEC 13818-11, MPEG-2 IPMP) N/A
0x1B AVC video stream conforming to one or more profiles defined in Annex A of Rec. ITU-T H.264 - ISO/IEC 14496-10 or AVC video sub-bitstream of SVC, or MVC base view sub-bitstream, or AVC video sub-bitstream of MVC VideoTrack if decoding is supported, TextTrack otherwise
0x1C ISO/IEC 14496-3 Audio, without using any additional transport syntax, such as DST, ALS and SLS AudioTrack if decoding is supported, TextTrack otherwise
0x1D ISO/IEC 14496-17 Text TextTrack
0x1E Auxiliary video stream as defined in ISO/IEC 23002-3 VideoTrack if decoding is supported, TextTrack otherwise
0x1F SVC video sub-bitstream of an AVC video stream conforming to one or more profiles defined in Annex G of Rec. ITU-T H.264 - ISO/IEC 14496-10 VideoTrack if decoding is supported, TextTrack otherwise
0x20 MVC video sub-bitstream of an AVC video stream conforming to one or more profiles defined in Annex H of Rec. ITU-T H.264 - ISO/IEC 14496-10 VideoTrack if decoding is supported, TextTrack otherwise
0x21 Video stream conforming to one or more profiles as defined in Rec. ITU-T T.800 - ISO/IEC 15444-1 VideoTrack if decoding is supported, TextTrack otherwise
0x22 Additional view Rec. ITU-T H.262 - ISO/IEC 13818-2 video stream for service-compatible stereoscopic 3D services VideoTrack if decoding is supported, TextTrack otherwise
0x23 Additional view Rec. ITU-T H.264 - ISO/IEC 14496-10 video stream for service-compatible stereoscopic 3D services VideoTrack if decoding is supported, TextTrack otherwise
0x24-0x7E ITU-T Rec. H.222.0 - ISO/IEC 13818-1 Reserved N/A
0x7F IPMP stream N/A
0x80-0xFF User Private TextTrack
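The Track Interface column of this table can be condensed into a lookup, sketched in Python below. `decodable` (whether the UA can decode the stream) and `has_captions` (the caption signalling described for 0x02) are assumed inputs; stream types marked N/A produce no track.

```python
from typing import List

VIDEO_TYPES = {0x01, 0x02, 0x10, 0x1B, 0x1E, 0x1F, 0x20, 0x21, 0x22, 0x23}
AUDIO_TYPES = {0x03, 0x04, 0x0F, 0x11, 0x1C}
TEXT_TYPES = {0x05, 0x1D} | set(range(0x80, 0x100))

def track_interfaces(stream_type: int, decodable: bool, has_captions: bool = False) -> List[str]:
    """Return the track interfaces a UA would create for one PMT entry."""
    if stream_type in VIDEO_TYPES:
        if not decodable:
            return ["TextTrack"]
        tracks = ["VideoTrack"]
        if stream_type == 0x02 and has_captions:
            tracks.append("TextTrack")   # captions carried in the video stream
        return tracks
    if stream_type in AUDIO_TYPES:
        return ["AudioTrack" if decodable else "TextTrack"]
    if stream_type in TEXT_TYPES:
        return ["TextTrack"]
    return []                            # N/A rows: no track is created
```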

ISOBMFF

The following table is based on the MP4 Registration Authority codec table and on the 2004 Edition of the ISO Base Media File Format standard (Download ISO/IEC 14496-12:2004).

id: The id should be a string containing the track_ID in TrackHeaderBox, formatted as a decimal number.

label: The label should be the name in HandlerBox.

language: The language should be determined from the language in MediaHeaderBox.

ISOBMFF Types HTML 5 Track attributes
Track Sample Entry 4-Char-Code Description Stream type (Track handler) Defined in/by MPEG-4 Stream Type MPEG-4 Object Type Indication Track Interface kind inBandMetadataTrackDispatchType cue payload
3gvo 3GPP Video Orientation Metadata 3GPP TextTrack
ac-3 AC-3 audio Audio ETSI 0xA5 AudioTrack if decoding is supported, TextTrack otherwise
alac Apple lossless audio codec Audio Apple AudioTrack if decoding is supported, TextTrack otherwise
avc1 Advanced Video Coding Video AVC 0x20 VideoTrack if decoding is supported, TextTrack otherwise
avc2 Advanced Video Coding Video AVC 0x20 VideoTrack if decoding is supported, TextTrack otherwise
avcp Advanced Video Coding Parameters Video AVC 0x20 VideoTrack if decoding is supported, TextTrack otherwise
dra1 DRA Audio Audio DRA 0xA7 AudioTrack if decoding is supported, TextTrack otherwise
drac Dirac Video Coder Video Dirac 0xA4 VideoTrack if decoding is supported, TextTrack otherwise
dtsc DTS Coherent Acoustics audio Audio DTS 0xA9 AudioTrack if decoding is supported, TextTrack otherwise
dtse DTS Express low bit rate audio, also known as DTS LBR Audio DTS 0xAC AudioTrack if decoding is supported, TextTrack otherwise
dtsh DTS-HD High Resolution Audio Audio DTS 0xAA AudioTrack if decoding is supported, TextTrack otherwise
dtsl DTS-HD Master Audio Audio DTS 0xAB AudioTrack if decoding is supported, TextTrack otherwise
ec-3 Enhanced AC-3 audio Audio ETSI 0xA6 AudioTrack if decoding is supported, TextTrack otherwise
enca Encrypted/Protected audio Audio ISMAc, ISO
encs Encrypted Systems stream (various) ISO
enct Encrypted Text Text ISO
encv Encrypted/protected video Video ISMAc, ISO
fdp$20 File delivery hints Hint ISO
g719 ITU-T Recommendation G.719 (2008) Audio ITU G.719 0xA8 AudioTrack if decoding is supported, TextTrack otherwise
g726 ITU-T Recommendation G.726 (1990) Audio SDV AudioTrack if decoding is supported, TextTrack otherwise
ixse DVB Track Level Index Track Metadata DVB TextTrack
m2ts MPEG-2 transport stream for DMB Hint DMB-MAF
m4ae MPEG-4 Audio Enhancement Audio MP4v1 AudioTrack if decoding is supported, TextTrack otherwise
m4ae MPEG-4 Audio Enhancement Audio MP4v2 0x40, others AudioTrack if decoding is supported, TextTrack otherwise
mett Text timed metadata Metadata ISO TextTrack
metx XML timed metadata Metadata ISO TextTrack
mjp2 Motion JPEG 2000 Video MJ2 VideoTrack if decoding is supported, TextTrack otherwise
mlix DVB Movie level index track Metadata DVB TextTrack
mlpa MLP Audio Audio Dolby MLP AudioTrack if decoding is supported, TextTrack otherwise
mp4a MPEG-4 Audio Audio MP4v1 0x40, others AudioTrack if decoding is supported, TextTrack otherwise
mp4s MPEG-4 Systems (various) MP4V1 various TextTrack
mp4v MPEG-4 Visual Video MP4V1 0x20, others VideoTrack if decoding is supported, TextTrack otherwise
mvc1 Multiview coding Video AVC VideoTrack if decoding is supported, TextTrack otherwise
mvc2 Multiview coding Video AVC VideoTrack if decoding is supported, TextTrack otherwise
oksd OMA Keys Metadata OMA DRM XBS
pm2t Protected MPEG-2 Transport Hint ISO
prtp Protected RTP Reception Hint ISO
raw$20 Uncompressed audio Audio MJ2 AudioTrack if decoding is supported, TextTrack otherwise
resv Restricted Video Video AVC VideoTrack if decoding is supported, TextTrack otherwise
rm2t MPEG-2 Transport Reception Hint ISO
rrtp RTP reception Hint ISO
rsrp SRTP Reception Hint ISO
rtp$20 RTP Hints Hint ISO
s263 ITU H.263 video (3GPP format) Video 3GPP VideoTrack if decoding is supported, TextTrack otherwise
samr Narrowband AMR voice Audio 3GPP AudioTrack if decoding is supported, TextTrack otherwise
sawb Wideband AMR voice Audio 3GPP AudioTrack if decoding is supported, TextTrack otherwise
sawp Extended AMR-WB (AMR-WB+) Audio 3GPP AudioTrack if decoding is supported, TextTrack otherwise
sevc EVRC Voice Audio 3GPP2 0xA0 AudioTrack if decoding is supported, TextTrack otherwise
sm2t MPEG-2 Transport Server Hint ISO
sqcp 13K Voice Audio 3GPP2 0xE1 AudioTrack if decoding is supported, TextTrack otherwise
srtp SRTP Hints Hint ISO
ssmv SMV Voice Audio 3GPP2 0xA1 AudioTrack if decoding is supported, TextTrack otherwise
stpp Sub-titles (Timed Text) Text DECE TextTrack
svc1 Scalable Video Coding Video AVC VideoTrack if decoding is supported, TextTrack otherwise
svcM SVC metadata Metadata AVC TextTrack
tc64 64 bit timecode samples Timecode Apple TextTrack
text Textual meta-data with MIME type Metadata MPEG4 TextTrack
tmcd 32 bit timecode samples Timecode Apple TextTrack
twos Uncompressed 16-bit audio Audio MJ2 AudioTrack if decoding is supported, TextTrack otherwise
tx3g Timed Text stream Text 3GPP TextTrack
urim URI identified timed metadata Metadata ISO TextTrack
vc-1 SMPTE VC-1 Video SMPTE 0xA3 VideoTrack if decoding is supported, TextTrack otherwise
xml$20 XML-formatted meta-data Metadata MPEG4 TextTrack
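A Python sketch of how a UA might apply this table, keyed on the sample entry code and the table's stream type (track handler) column. Only a subset of the rows is included, `decodable` is an assumed input, and "$20" in the table is read as a trailing space (e.g. "xml$20" is "xml ").

```python
from typing import Optional

# Subset of the table above: sample entry code -> "Stream type (Track handler)".
SAMPLE_ENTRY_CATEGORY = {
    "avc1": "Video", "avc2": "Video", "mp4v": "Video", "vc-1": "Video",
    "s263": "Video", "mjp2": "Video", "svc1": "Video", "mvc1": "Video",
    "mp4a": "Audio", "ac-3": "Audio", "ec-3": "Audio", "alac": "Audio",
    "samr": "Audio", "sawb": "Audio", "twos": "Audio", "raw ": "Audio",
    "tx3g": "Text", "stpp": "Text",
    "mett": "Metadata", "metx": "Metadata", "urim": "Metadata", "xml ": "Metadata",
    "text": "Metadata", "3gvo": "Metadata", "ixse": "Metadata", "mlix": "Metadata",
    "tmcd": "Timecode", "tc64": "Timecode",
    "rtp ": "Hint", "fdp ": "Hint", "rm2t": "Hint", "sm2t": "Hint",
}

def isobmff_track_interface(sample_entry: str, decodable: bool) -> Optional[str]:
    """Return the track interface for a sample entry code, or None for no track."""
    category = SAMPLE_ENTRY_CATEGORY.get(sample_entry)
    if category == "Video":
        return "VideoTrack" if decodable else "TextTrack"
    if category == "Audio":
        return "AudioTrack" if decodable else "TextTrack"
    if category in ("Text", "Metadata", "Timecode"):
        return "TextTrack"
    # Hint tracks, encrypted entries (enca, encv, ...) and unknown codes
    # have no Track Interface in the table, so no track is created here.
    return None
```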

WebM / Matroska

See: The WebM spec and the Matroska spec.

id: The id should be a string containing the TrackUID formatted as a decimal number.

label: The label should be the Name, decoded from UTF-8.

language: The language should be determined from the Language.

OGG

id: If the file contains a Skeleton section, the id should be the contents of the Name header, decoded from UTF-8. Otherwise, it should be a string containing the stream serial number, formatted as a decimal number.

label: If the file contains a Skeleton section, the label should be the contents of the Title header.

language: If the file contains a Skeleton section, the language should be determined from the Language header.

TODO: The Skeleton section itself should probably be exposed as a metadata track, and so should any unknown streams.