Bug 13359 - <track> A way is needed to identify the type of data in a track element
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: PC Windows XP
Importance: P2 blocker
Target Milestone: ---
Assignee: Silvia Pfeiffer
QA Contact: HTML WG Bugzilla archive list
Keywords: a11y, a11ytf, media
Reported: 2011-07-25 22:29 UTC by Bob Lund
Modified: 2012-11-28 15:04 UTC

Attachments
Mapping in-band MPEG-2 TS tracks to HTML5 (31.31 KB, text/html)
2011-11-03 21:20 UTC, Bob Lund
Attachment for comment 57 (53.31 KB, application/pdf)
2012-07-03 15:37 UTC, Bob Lund

Description Bob Lund 2011-07-25 22:29:56 UTC
Problem Statement

HTML5 provides access to timed text tracks and their constituent data via the TextTrack and TextTrackCue interfaces, allowing (1) script control over showing or hiding the associated track content via TextTrack.mode, and (2) script access to the constituent track data (as cues) via TextTrackCue.getCueAsSource(). For application-authored script to (1) make decisions about showing or hiding track content and (2) interpret the raw source content, the script needs additional information about the form of the track content. Where TextTrack.kind == 'metadata', the script also needs information about the purpose of the track. At present, the only information provided is TextTrack.kind and TextTrack.label; this is insufficient to determine the form of the track content, since it identifies neither the media type nor, for metadata tracks, the purpose of the track. For example, if TextTrack.kind == 'metadata', the track content may contain ETV EISS data [1], ad insertion data (SCTE-35) [2], generic XML data, etc. Similarly, if TextTrack.kind == 'captions', the track content may contain WebVTT data [3], Timed Text (TTML) data [4], CEA-708 data [5], etc.


Proposed Solution

In order to resolve the above problem, it is proposed that a @type attribute be added to the track element and an associated IDL attribute be added to the TextTrack interface. When specified by the content author, this media type attribute would serve as a hint to application script as to the media type employed for the track content.
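
To make the gap concrete, here is a minimal sketch in JavaScript. It models tracks as plain objects rather than real TextTrack instances; `type` is the attribute proposed in this bug, not an existing API, and the media type strings are placeholders invented for illustration.

```javascript
// Plain-object stand-ins for TextTrack. `kind` and `label` are in the
// current draft; `type` is the attribute proposed in this bug.
// The media type strings are hypothetical placeholders.
const eissTrack = { kind: "metadata", label: "", type: "application/x-example-eiss" };
const adTrack = { kind: "metadata", label: "", type: "application/x-example-scte35" };

// What script can compare with the information available today.
function sameKindAndLabel(a, b) {
  return a.kind === b.kind && a.label === b.label;
}

// What script could compare if the proposed attribute existed.
function sameType(a, b) {
  return a.type === b.type;
}

console.log(sameKindAndLabel(eissTrack, adTrack)); // true
console.log(sameType(eissTrack, adTrack)); // false
```

With only `kind` and `label`, the two tracks look identical to script; the proposed attribute is what would tell them apart.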

[1] http://www.cablelabs.com/specifications/OC-SP-ETV-AM1.0-I06-110128.pdf
[2] http://www.scte.org/standards/Standards_Available.aspx
[3] http://www.whatwg.org/specs/web-apps/current-work/webvtt.html
[4] http://www.w3.org/TR/ttaf1-dfxp/
[5] http://www.ce.org/Standards/listings.asp
Comment 1 Michael[tm] Smith 2011-08-04 05:06:48 UTC
mass-moved component to LC1
Comment 2 Ian 'Hixie' Hickson 2011-08-19 16:30:13 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Did Not Understand Request
Change Description: no spec change
Rationale: I don't understand. What is the concrete use case where a script would be doing anything with a kind=metadata track and not know what the metadata is but yet would still be able to do something useful with it?
Comment 3 Glenn Adams 2011-08-19 17:24:29 UTC
(In reply to comment #2)
> Rationale: I don't understand. What is the concrete use case where a script
> would be doing anything with a kind=metadata track and not know what the
> metadata is but yet would still be able to do something useful with it?

My understanding (Bob may wish to correct) is as follows:

Page origin O exposes video/audio from a 3rd party P, which contains/refers to a metadata track sourced by P or sourced by a 4th party M.

Page origin O provides JS for interpreting metadata tracks with media types M1, M2, or M3.

If Page author O is provided with content type information for the metadata track contained in response headers (or by other means if fetched via non-HTTP), then O's JS can select appropriate handling of metadata content exposed to JS via TextTrackCue.getCueAsSource().

This information could be exposed to JS via a new TextTrack.type IDL attribute.
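
The handler selection described above could look roughly like the following sketch. It assumes the proposed `TextTrack.type` attribute exists; the media type strings standing in for M1, M2 and M3 are invented, and tracks are modeled as plain objects.

```javascript
// Hypothetical media types standing in for M1, M2 and M3.
const handlers = new Map([
  ["application/x-example-m1", (source) => ({ format: "M1", source })],
  ["application/x-example-m2", (source) => ({ format: "M2", source })],
  ["application/x-example-m3", (source) => ({ format: "M3", source })],
]);

// `track.type` is the proposed attribute; `cueSource` stands in for the
// string returned by TextTrackCue.getCueAsSource().
function interpretCue(track, cueSource) {
  const handler = handlers.get(track.type);
  // No registered handler: return null so the caller can skip the cue.
  return handler ? handler(cueSource) : null;
}
```

Unknown types fall through to `null`, so page O can ignore tracks it has no code for instead of guessing.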
Comment 4 Bob Lund 2011-08-19 22:18:04 UTC
(In reply to comment #3)
> My understanding (Bob may wish to correct) is as follows:
> 
> Page origin O exposes video/audio from a 3rd party P, which contains/refers to
> a metadata track sourced by P or sourced by a 4th party M.
> 
> Page origin O provides JS for interpreting metadata tracks with media types M1,
> M2, or M3.
> 
> If Page author O is provided with content type information for the metadata
> track contained in response headers (or by other means if fetched via
> non-HTTP), then O's JS can select appropriate handling of metadata content
> exposed to JS via TextTrackCue.getCueAsSource().
> 
> This information could be exposed to JS via a new TextTrack.type IDL attribute.

To expand, consider 3rd party P adding three in-band signaling tracks: IB1 - content advisories for parental control, IB2 - SCTE35 segment descriptors for targeted advertising and IB3 - EISS for interactive television. The user agent executing Page O recognizes these in-band tracks and sources them as three different track elements with kind = metadata as described in [1]. Since the user agent knows how to recognize these in-band tracks it can inform JS of the metadata type by a new TextTrack.type IDL attribute. Without this new attribute, JS will have to contain logic to infer the metadata type, redoing what the UA has already done.

[1] http://dev.w3.org/html5/spec/Overview.html#sourcing-in-band-text-tracks
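
As an illustration of the IB1 to IB3 scenario, the sketch below groups metadata tracks by the proposed `type` attribute. The track objects, their `id` fields and the media type strings are all assumptions made for the example; a real page would iterate the media element's text tracks instead.

```javascript
// Group metadata tracks by the proposed `type` attribute.
function groupMetadataTracksByType(tracks) {
  const byType = new Map();
  for (const track of tracks) {
    if (track.kind !== "metadata") continue; // ignore captions, subtitles, etc.
    const group = byType.get(track.type) || [];
    group.push(track);
    byType.set(track.type, group);
  }
  return byType;
}

// Plain-object stand-ins for the three in-band tracks plus a captions track.
const inBand = [
  { id: "IB1", kind: "metadata", type: "application/x-example-content-advisory" },
  { id: "IB2", kind: "metadata", type: "application/x-example-scte35" },
  { id: "IB3", kind: "metadata", type: "application/x-example-eiss" },
  { id: "CC", kind: "captions", type: "text/vtt" },
];
const grouped = groupMetadataTracksByType(inBand);
```

Without `type`, all three metadata tracks would land in a single undifferentiated list.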
Comment 5 Silvia Pfeiffer 2011-08-20 03:08:37 UTC
(In reply to comment #4)
> (In reply to comment #3)
> To expand, consider 3rd party P adding three in-band signaling tracks: IB1 -
> content advisories for parental control, IB2 - SCTE35 segment descriptors for
> targeted advertising and IB3 - EISS for interactive television. The user agent
> executing Page O recognizes these in-band tracks and sources them as three
> different track elements with kind = metadata as described in [1]. Since the
> user agent knows how to recognize these in-band tracks it can inform JS of the
> metadata type by a new TextTrack.type IDL attribute. Without this new
> attribute, JS will have to contain logic to infer the metadata type, redoing
> what the UA has already done.

How would a general-purpose UA (such as Opera, Chrome, Safari or Firefox) recognize IB1, IB2, and IB3? They are not specified in HTML so there is no requirement for them to be able to decode them and I haven't seen any moves that spans across these and other UAs to make IB1-3 a standard-supported format in them. If there is, then it would make a lot more sense to have actual kind=IB1-3 values as part of the spec IMHO.
Comment 6 Glenn Adams 2011-08-20 06:42:32 UTC
(In reply to comment #5)
> How would a general-purpose UA (such as Opera, Chrome, Safari or Firefox)
> recognize IB1, IB2, and IB3? They are not specified in HTML so there is no
> requirement for them to be able to decode them and I haven't seen any moves
> that spans across these and other UAs to make IB1-3 a standard-supported format
> in them. If there is, then it would make a lot more sense to have actual
> kind=IB1-3 values as part of the spec IMHO.

There is no requirement that an HTML5 UA decode PNG files, JPEG files, MPEG video, or even JavaScript or CSS for that matter. So what's the point of your question?
Comment 7 Silvia Pfeiffer 2011-08-20 07:12:03 UTC
(In reply to comment #6)
> There is no requirement that an HTML5 UA decode PNG files, JPEG files, MPEG
> video, or even JavaScript or CSS for that matter. So what's the point of your
> question?

There is also no field in HTML that provides a hint on which image file format is being used. Why would there need to be one that hints on which metadata format is being used?
Comment 8 Glenn Adams 2011-08-20 07:28:05 UTC
(In reply to comment #7)
> There is also no field in HTML that provides a hint on which image file format
> is being used. Why would there need to be one that hints on which metadata
> format is being used?

Because we are not talking about the UA interpreting the metadata here, we are talking about client JS interpreting the metadata (track). The utility of TextTrackCue.getCueAsSource() is put in question unless some information is provided on type. The alternative is that the JS client code must perform content sniffing on the result of getCueAsSource().

Also, I believe the request here is not for a markup supplied hint of type, but a UA determined actual type (presuming the UA has an embedded sniffer or access to other content type metadata in the transport or content) in order to supply actual type information via an IDL attribute.
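
For comparison, here is a rough sketch of what script-side content sniffing might involve. The heuristics are deliberately crude and are an assumption of this example, not something defined by any specification; `source` stands in for the string returned by getCueAsSource().

```javascript
// Guess the broad syntax of a cue's source text. Purely heuristic.
function sniffCueSource(source) {
  const text = source.trimStart();
  if (text.startsWith("<")) return "xml"; // covers "<?xml" and bare elements
  if (text.startsWith("{") || text.startsWith("[")) {
    try {
      JSON.parse(text);
      return "json";
    } catch {
      return "unknown"; // looked like JSON but did not parse
    }
  }
  return "unknown";
}
```

Even when this works, it only recovers the syntax ("xml"), not whether the XML is EISS, TTML or something else, which is the information the requested attribute would carry.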
Comment 9 Silvia Pfeiffer 2011-08-20 13:07:50 UTC
(In reply to comment #8)
> Because we are not talking about the UA interpreting the metadata here, we are
> talking about client JS interpreting the metadata (track). The utility of
> TextTrackCue.getCueAsSource() is put in question unless some information is
> provided on type. The alternative is that the JS client code must perform
> content sniffing on the result of getCueAsSource().


Why not just use a data-type attribute that provides this information? JS-interpreted data is what the data-* attributes are made for.


 
> Also, I believe the request here is not for a markup supplied hint of type, but
> a UA determined actual type (presuming the UA has an embedded sniffer or access
> to other content type metadata in the transport or content) in order to supply
> actual type information via an IDL attribute.


"it is proposed that a @type attribute be added to the track element and an associated IDL attribute be added to the TextTrack interface" - that's a request for a content attribute IIUC.
Comment 10 Bob Lund 2011-08-20 13:19:10 UTC
(In reply to comment #9)
> Why not just use a data-type attribute that provides this information?
> JS-interpreted data is what the data-* attributes are made for.
> 

The use case I offered concerns in-band tracks recognized by the user agent. To quote the current HTML5 spec draft: "These attributes are not intended for use by software that is independent of the site that uses the attributes." That is exactly the situation in the use case offered: the page comes from O and the content with in-band tracks comes from P.
> 
> 
> > Also, I believe the request here is not for a markup supplied hint of type, but
> > a UA determined actual type (presuming the UA has an embedded sniffer or access
> > to other content type metadata in the transport or content) in order to supply
> > actual type information via an IDL attribute.
> 
> 
> "it is proposed that a @type attribute be added to the track element and an
> associated IDL attribute be added to the TextTrack interface" - that's a
> request for a content attribute IIUC.

What is requested is a TextTrack.type IDL attribute.
Comment 11 Glenn Adams 2011-08-20 16:28:01 UTC
(In reply to comment #9)
> Why not just use a data-type attribute that provides this information?
> JS-interpreted data is what the data-* attributes are made for.
> 

In the use case mentioned previously, the page originator does not know a priori the type of the track metadata, so it cannot effectively provide a hint.
 
> 
> > Also, I believe the request here is not for a markup supplied hint of type, but
> > a UA determined actual type (presuming the UA has an embedded sniffer or access
> > to other content type metadata in the transport or content) in order to supply
> > actual type information via an IDL attribute.
> 
> 
> "it is proposed that a @type attribute be added to the track element and an
> associated IDL attribute be added to the TextTrack interface" - that's a
> request for a content attribute IIUC.

You're correct. One reads this as a request for a content authored hint of type, with the hint reflected into IDL. Perhaps Bob needs to adjust the request to reflect that it is for an IDL attribute that exposes a UA determined *actual* type, or if the UA cannot determine actual type, then a type indication based on envelope or other context information accompanying or referred to by the track content itself.

For example, if the track content is obtained via HTTP, then the UA could either sniff the content to verify actual type or it could pass along a Content-Type header value to the client JS via TextTrack.type.
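
A sketch of the pass-through option, assuming the UA (or a script-level polyfill) has the raw Content-Type header value in hand. The function name is hypothetical, and the parsing is a simplification: it drops parameters and does not implement full media type parsing.

```javascript
// Reduce a Content-Type header value to a bare "type/subtype" string,
// or "" when the value is missing or malformed.
function mediaTypeFromContentType(contentType) {
  if (typeof contentType !== "string") return "";
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const essence = contentType.split(";")[0].trim().toLowerCase();
  return /^[^\s\/;]+\/[^\s\/;]+$/.test(essence) ? essence : "";
}

console.log(mediaTypeFromContentType("text/vtt; charset=utf-8")); // "text/vtt"
```

An empty string here plays the role of "type unknown", which leaves room for the UA to fall back to sniffing as described above.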
Comment 12 Ian 'Hixie' Hickson 2011-08-23 05:23:24 UTC
Please note that there is no reason to quote the entirety of a comment when replying. Either quote nothing or quote just what you are responding to.


Is there a _concrete_ example here? I really don't understand the use case being presented. Re comment 3, for example, why would O be exposing content from P?
Comment 13 Glenn Adams 2011-08-23 05:52:03 UTC
(In reply to comment #12)
> Please note that there is no reason to quote the entirety of a comment when
> replying. Either quote nothing or quote just what you are responding to.
> 
> 
> Is there a _concrete_ example here? I really don't understand the use case
> being presented. Re comment 3, for example, why would O be exposing content
> from P?

CableTVCorp (O) creates an electronic program guide with a page depicted as a grid of videos V1...V4 provided by 3rd party content providers P1...P4. V1...V4, in turn, contain or refer to timed track metadata content (e.g., video descriptions), which the UA (upon decoding V1...V4) exposes to O's JS client code via TextTrackCue.getCueAsSource(). O does not (and generally cannot) know the timed track metadata content type a priori; the UA, however, does (or can) know it, since it decoded V1...V4 and was responsible for exposing the cue data to JS in the first place.

Knowing the actual content type of metadata M1...M4 (associated with V1...V4), O's JS client code can decode cue data and use it for various purposes, e.g., to annotate the chapters and scenes being depicted in the grid for V1...V4.
Comment 14 Bob Lund 2011-08-25 19:55:04 UTC
(In reply to comment #5)
> 
> How would a general-purpose UA (such as Opera, Chrome, Safari or Firefox)
> recognize IB1, IB2, and IB3? They are not specified in HTML so there is no
> requirement for them to be able to decode them and I haven't seen any moves
> that spans across these and other UAs to make IB1-3 a standard-supported format
> in them.

The types for IB1, IB2 and IB3 could be defined by specifications external to HTML5. The following language is in http://dev.w3.org/html5/spec/Overview.html#sourcing-in-band-text-tracks

"A media-resource-specific text track is a text track that corresponds to data found in the media resource. Rules for processing and rendering such data are defined by the relevant specifications, e.g. the specification of the video format if the media resource is a video"

I have made a proposal in the Web and TV IG Media Pipeline task force that these specs be created for an identified set of metadata and transport formats http://www.w3.org/2011/webtv/wiki/MPTF/MPTF_Discussions/TV_services_transport_mapping

The idea of a mapping specification was also raised in http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/031916.html. This is a long email. The relevant text from Ian was:

"There can be a standard way. The idea is that all the types of metadata 
tracks that browsers will support should be specified so that all browsers 
can map them the same way. I'm happy to work with anyone interested in 
writing such a mapping spec, just let me know."

This mapping is necessary but not sufficient to address the problem. There needs to be a way for the type of the data that results from the mapping to be conveyed to script - hence bug 13359.

> If there is, then it would make a lot more sense to have actual
> kind=IB1-3 values as part of the spec IMHO.

This is a possible approach, but the current use of "kind" seems to be categorical. What would be most useful to script is an indication of the syntax and semantics of the metadata, i.e. its type. Also, it can be expected that for a given category, e.g. advertising insertion point, there will be different types depending on the transport format: for example, SCTE-35 in MPEG-2 TS and some yet-to-be-defined format in a SMPTE TT track in DASH.
Comment 15 Bob Lund 2011-08-25 20:50:57 UTC
(In reply to comment #12)
> Please note that there is no reason to quote the entirety of a comment when
> replying. Either quote nothing or quote just what you are responding to.
> 
> 
> Is there a _concrete_ example here? I really don't understand the use case
> being presented. Re comment 3, for example, why would O be exposing content
> from P?

Content advisories will be added by the content producer. A service provider will ingest that content and insert ad insertion messages for doing targeted advertising in some markets, but not in others. The service provider will provide a page O representing the program guide to access this channel across those markets. So page O needs to deal with: a) a metadata text track containing the content advisories from the content producer and b) another metadata text track, which may or may not be present, containing ad insertion messages. There's a need to identify the type of parental control message used and to distinguish between the two different metadata text tracks in script.

Even if there were agreement across content producers and service providers about message types, one is left with the question of how script distinguishes between the tracks when there are multiple metadata text tracks sourced from in-band tracks in the media resource.
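
A minimal sketch of that distinction, assuming the proposed `type` attribute and invented media type strings. Tracks are plain objects here, and the ad insertion track is present in one market and absent in the other, as in the scenario above.

```javascript
// Hypothetical media types for the two kinds of in-band metadata.
const ADVISORY = "application/x-example-content-advisory";
const AD_INSERTION = "application/x-example-scte35";

// Return the first metadata track of the given type, or null if absent.
function findMetadataTrack(tracks, type) {
  return tracks.find((t) => t.kind === "metadata" && t.type === type) || null;
}

const marketWithAds = [
  { kind: "metadata", type: ADVISORY },
  { kind: "metadata", type: AD_INSERTION },
];
const marketWithoutAds = [{ kind: "metadata", type: ADVISORY }];
```

The same page script can then run in both markets and simply skip ad handling when `findMetadataTrack(tracks, AD_INSERTION)` returns `null`.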
Comment 16 contributor 2011-09-23 23:38:55 UTC
Checked in as WHATWG revision r6586.
Check-in comment: Avoid firing 'click' twice per click on a <command>, oops.
http://html5.org/tools/web-apps-tracker?from=6585&to=6586
Comment 17 Glenn Adams 2011-09-23 23:52:58 UTC
(In reply to comment #16)
> Checked in as WHATWG revision r6586.
> Check-in comment: Avoid firing 'click' twice per click on a <command>, oops.
> http://html5.org/tools/web-apps-tracker?from=6585&to=6586

I think you logged this comment against the wrong bug, as it has nothing to do with it.
Comment 18 Ian 'Hixie' Hickson 2011-09-26 22:26:02 UTC
(In reply to comment #13)
> CableTVCorp (O) creates an electronic program guide with a page depicted as a
> grid of videos V1...V4 provided by 3rd party content providers P1...P4.
>
> V1...V4, in turn, contain or refer to timed track metadata content (e.g., video
> descriptions, etc), which UA (upon decoding V1...V4) expose to O's JS client
> code via TextTrackCue.getCueAsSource().
>
> O does not (generally cannot) a priori
> know the timed track metadata content type

Yes it can. P1...P4 would just tell O what they are. I don't understand the problem here. This is a well-established pattern on the Web. An aggregator site gets information from third parties and those parties tell the aggregator exactly what they are providing, so that the aggregator can make use of it. Sometimes the aggregator explicitly specifies the format (e.g. Google Maps' transit directions feature has a specified format that transit companies can use), sometimes it's the other way around (e.g. Yahoo! Finance pulls information in from a number of different sources each of which might use its own format). Either way, though, the aggregator knows what the format of the data is.
Comment 19 Bob Lund 2011-09-26 23:13:56 UTC
(In reply to comment #18)
> (In reply to comment #13)
> > CableTVCorp (O) creates an electronic program guide with a page depicted as a
> > grid of videos V1...V4 provided by 3rd party content providers P1...P4.
> >
> > V1...V4, in turn, contain or refer to timed track metadata content (e.g., video
> > descriptions, etc), which UA (upon decoding V1...V4) expose to O's JS client
> > code via TextTrackCue.getCueAsSource().
> >
> > O does not (generally cannot) a priori
> > know the timed track metadata content type
> 
> Yes it can. P1...P4 would just tell O what they are. I don't understand the
> problem here. This is a well-established pattern on the Web. An aggregator site
> gets information from third parties and those parties tell the aggregator
> exactly what they are providing, so that the aggregator can make use of it.
> Sometimes the aggregator explicitly specifies the format (e.g. Google Maps'
> transit directions feature has a specified format that transit companies can
> use), sometimes it's the other way around (e.g. Yahoo! Finance pulls
> information in from a number of different sources each of which might use its
> own format). Either way, though, the aggregator knows what the format of the
> data is.

This could work if markup is used to reference out-of-band <track> where kind = metadata. This may also work if there is only one metadata track per program P. However, if the user agent is sourcing in-band metadata tracks, which is the case when cable content is re-purposed to the web, and a program P carries multiple metadata tracks, e.g. content advisories + SCTE-35 ad insertion messages, then even if aggregator O knows the type in the metadata tracks there is no way for script on O's page to know which exposed in-band track is which. Also, in the case of linear content where the referenced media resource is an unending stream with changing metadata tracks then there is also the same problem since the page from O might not be updated when there is a track change.
Comment 20 Bob Lund 2011-11-03 21:20:40 UTC
Created attachment 1040 [details]
Mapping in-band MPEG-2 TS tracks to HTML5

This page proposes a mechanism for how tracks in an in-band MPEG-2 TS should be exposed in HTML5 so script can identify the tracks. It is also proposed that this approach could be applicable to other container formats such as WebM and Ogg.
Comment 21 Ian 'Hixie' Hickson 2011-11-03 21:36:35 UTC
So the use case here is that there are in-band streams, e.g. TV, that have application-specific metadata, e.g. ad targeting keywords, game data, and the like, which are to be handled by specific modules on the client side as they arrive, basically "dispatching" each track to a separate blob of code that does not know ahead of time whether it will be needed or not.

We can't use "label" because some formats already provide a human-readable string even for metadata tracks (for debugging?).

Is there any particular reason there needs to be multiple metadata tracks for this, instead of the media stream having a single metadata stream that uses a convention of putting the "dispatch" type code in the first line of the cue, or something along those lines? (That would also avoid the problem of having to keep track of when these new tracks come along and enabling them.)
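
For illustration only, here is a minimal Python sketch of the single-stream convention suggested above, where the first line of each cue carries a "dispatch" code and the rest is the payload. The codes ("ad-insertion", "content-advisory") and the handlers are hypothetical; nothing in this thread defines them.

```python
from typing import Callable, Dict, Tuple


def split_dispatch(cue_text: str) -> Tuple[str, str]:
    """Split a cue into (dispatch code, payload) at the first line break."""
    code, _, payload = cue_text.partition("\n")
    return code.strip(), payload


def dispatch(cue_text: str, handlers: Dict[str, Callable[[str], str]]) -> str:
    """Route the payload to the handler registered for its dispatch code."""
    code, payload = split_dispatch(cue_text)
    handler = handlers.get(code)
    if handler is None:
        # No module registered for this code: the cue is skipped.
        return "ignored"
    return handler(payload)


# Hypothetical application modules keyed by hypothetical dispatch codes.
handlers = {
    "ad-insertion": lambda payload: "ad:" + payload,
    "content-advisory": lambda payload: "advisory:" + payload,
}
```

With this convention script only has to enable one metadata track, at the cost of every consumer seeing every cue.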
Comment 22 Bob Lund 2011-11-03 21:55:37 UTC
(In reply to comment #21)
> So the use case here is that there are in-band streams, e.g. TV, that have
> application-specific metadata, e.g. ad targeting keywords, game data, and the
> like, which are to be handled by specific modules on the client side as they
> arrive, basically "dispatching" each track to a separate blob of code that does
> not know ahead of time whether it will be needed or not.
> 
> We can't use "label" because some formats already provide a human-readable
> string even for metadata tracks (for debugging?).
> 
> Is there any particular reason there needs to be multiple metadata tracks for
> this, instead of the media stream having a single metadata stream that uses a
> convention of putting the "dispatch" type code in the first line of the cue, or
> something along those lines? (That would also avoid the problem of having to
> keep track of when these new tracks come along and enabling them.)

In the case of out-of-band tracks, multiple metadata text tracks would likely be used. So, having this model for in-band tracks seems to make sense. Another consideration is that an application might not be interested in all metadata tracks. Having the in-band tracks mapped to separate metadata text tracks allows this. Last, if the UA can determine the dispatch code then that's almost the same as identifying the type of metadata.

Some containers newer than MPEG-2 TS provide a MIME type for each track. This would be a natural fit with identifying the metadata track with a type.
Comment 23 Clarke Stevens 2011-11-09 22:52:40 UTC
At the F2F meetings in Santa Clara, the HTML WG appeared to support this bug as well as bug 14492 which requests a change event when tracks are removed. The combination of bugs 13359 and 14492 should address the needs of the Media Pipeline task force (MPTF).

The MPTF also supports the idea of a common reference in mapping track information to specific protocols. CableLabs has produced such a mapping reference for MPEG2 (http://www.w3.org/Bugs/Public/attachment.cgi?id=1040). Other mappings should be created for Ogg, WebM, etc.

At the HTML WG meeting in Santa Clara, no consensus was reached on who should maintain these mapping references.

The resolution suggested by the Media Pipeline task force (MPTF) is the same
one suggested in the original bug report. Here is a link to the MPTF
requirement:

http://www.w3.org/2011/webtv/wiki/MPTF/MPTF_Requirements#R3._Handling_of_In-band_Tracks
Comment 24 Ian 'Hixie' Hickson 2011-11-24 23:05:41 UTC
> In the case of out-of-band tracks, multiple metadata text tracks would likely
> be used.

Why?
Comment 25 Bob Lund 2011-11-28 17:23:39 UTC
(In reply to comment #24)
> > In the case of out-of-band tracks, multiple metadata text tracks would likely
> > be used.
> 
> Why?

Authoring of the metadata text tracks will be done independently for each application, e.g. the <track> with content insertion triggers may be sourced by the business entity that sells insertion opportunities, while the <track> with interactive content triggers may be sourced by the content provider that wants to add interactivity to the content.

Since these will come in as independent files it seems most efficient to be able to deliver them that way, as opposed to requiring a processing step to merge them into a single <track>.

This would be consistent with other text tracks, e.g. captions and subtitles, that are separate in <track>.

Bob
Comment 26 Ian 'Hixie' Hickson 2011-12-05 22:11:09 UTC
Fair enough.

Ok, I can add an attribute to TextTrack whose value is a string that the track is associated with in the in-band data stream where the track was found.

Can you give me a pointer to what field this information would be stored in for H.264, WebM, or Ogg streams, so that I can include an example to point implementors in the right direction?
Comment 27 Bob Lund 2011-12-07 20:33:33 UTC
(In reply to comment #26)
> Fair enough.
> 
> Ok, I can add an attribute to TextTrack whose value is a string that the track
> is associated with in the in-band data stream where the track was found.
> 
> Can you give me a pointer to what field this information would be stored in for
> H.264, WebM, or Ogg streams, so that I can include an example to point
> implementors in the right direction?

Once a UA determines that the track is a metadata text track, here are the fields, by media container, that I think should be concatenated in some fashion to go in this string (perhaps this should apply to all video, audio and text tracks where the UA cannot determine the kind):

Ogg[1] - Ogg skeleton header fields: content_type, role, name

MPEG-2 TS[2] - Program map table entry fields: stream_type, any descriptors

DASH[3] - MPD Role element, e.g.  <Role schemeIdUri=
Comment 28 Bob Lund 2011-12-08 21:38:37 UTC
(In reply to comment #27)

> 
> Once a UA determines that the track is a metadata text track, here are the
> fields, by media container, that I think should be concatenated in some fashion
> to go in this string (perhaps this should apply to all video, audio and text
> tracks where the UA cannot determine the kind):
> 
> Ogg[1] - Ogg skeleton header fields: content_type, role, name
> 
> MPEG-2 TS[2] - Program map table entry fields: stream_type, any descriptors
> 
> DASH[3] - MPD Role element, e.g.  <Role schemeIdUri=
Comment 29 Silvia Pfeiffer 2011-12-12 07:25:39 UTC
(In reply to comment #27)
>
> Ogg[1] - Ogg skeleton header fields: content_type, role, name

content_type == MIME type in Ogg; is that really what you're after?
role is already mapped to the @role attribute; what more do you expect from it?
name is already mapped to the @kind attribute; what more do you expect from it?


> MPEG-2 TS[2] - Program map table entry fields: stream_type, any descriptors
> 
> DASH[3] - MPD Role element, e.g.  <Role schemeIdUri=

Looks like your post got cut off.

You've missed WebM.
Comment 30 Bob Lund 2011-12-12 16:07:18 UTC
(In reply to comment #29)
> (In reply to comment #27)
> >
> > Ogg[1] - Ogg skeleton header fields: content_type, role, name
> 
> content_type == MIME type in Ogg; is that really what you're after?

If the script needs MIME type to identify the format of a metadata track, then yes.

> role is already mapped to the @role attribute; what more do you expect from it?

I don't see a role attribute in either the HTMLTrackElement or TextTrack interface.

> name is already mapped to the @kind attribute; what more do you expect from it?

Definition of @kind enumerates a set of kinds. There is only a single metadata kind. The definition of name in http://wiki.xiph.org/SkeletonHeaders#Name states it is a free text string. Inclusion of name could be used to discriminate the specific type of metadata.

> 
> 
> > MPEG-2 TS[2] - Program map table entry fields: stream_type, any descriptors
> > 
> > DASH[3] - MPD Role element, e.g.  <Role schemeIdUri=
> 
> Looks lie your post got cut off.
> 
> You've missed WebM.

Thanks - WebM, MPEG-4 and DASH got cut off. Here is the complete list again.

Ogg - Ogg skeleton header fields: content_type, role, name [1]

MPEG-2 TS - Program map table entry fields: stream_type field in the PMT, any descriptors [2]

DASH - MPD Role element, e.g.  <Role schemeIdUri="urn:mpeg:DASH:role:2011" value="subtitle"/> [3]. NOTE: DASH has defined a scheme with a set of values. It is expected that other organizations will define additional schemes as needed.

WebM - TrackType and CodecID [4].

ISOBMFF (MPEG-4) - Contents of the handler reference box (hdlr) in each track : attributes handler_type and name [5]. NOTE: MPEG only defines three types of track handlers: video, audio and hint. Other organizations can extend the MPEG handler reference box to define new types of tracks.

[1] http://wiki.xiph.org/SkeletonHeaders
[2] ITU H.222.0 (5/2006) Table 2-33
[3] http://developer.longtailvideo.com/trac/export/1601/branches/adaptive/doc/specs/dash.pdf
[4] http://matroska.org/technical/specs/index.html
[5] ISO/IEC 14496-12


Bob
Comment 31 Silvia Pfeiffer 2011-12-13 11:43:03 UTC
> > role is already mapped to the @role attribute; what more do you expect from it?
> 
> I don't see a role attribute in either the HTMLTrackElement of TextTrack
> interface.

Ah yes: I meant @kind , sorry.


> > name is already mapped to the @kind attribute; what more do you expect from it?
> 
> Definition of @kind enumerates a set of kinds. There is only a single metadata
> kind. The definition of name in http://wiki.xiph.org/SkeletonHeaders#Name
> states it is a free text string. Inclusion of name could be used to
> discriminate the specific type of metadata.

And here I meant @title - I really don't know how I got so confused! :-)
Comment 32 Silvia Pfeiffer 2011-12-13 21:36:30 UTC
> > > name is already mapped to the @kind attribute; what more do you expect from it?
> > 
> > Definition of @kind enumerates a set of kinds. There is only a single metadata
> > kind. The definition of name in http://wiki.xiph.org/SkeletonHeaders#Name
> > states it is a free text string. Inclusion of name could be used to
> > discriminate the specific type of metadata.
> 
> And here I meant @title - I really don't know how I got so confused! :-)

OK, Christmas is tiring: I meant @label.

However, @title is actually an attribute that we haven't determined what it's used for in <track> and it can take any advisory information for the element. So, if all we need is a means to map descriptive information from in-band tracks to an IDL attribute, it may be of use?

So, for your use case, @kind would be fixed and set to the string "metadata", @label to whatever is in Ogg's "name" field, and @title could contain a concatenation of Ogg's "content-type" field and whatever remains of Ogg's "role" field that is not "metadata".
Comment 33 Bob Lund 2011-12-13 23:34:20 UTC
(In reply to comment #32)
> > > > name is already mapped to the @kind attribute; what more do you expect from it?
> > > 
> > > Definition of @kind enumerates a set of kinds. There is only a single metadata
> > > kind. The definition of name in http://wiki.xiph.org/SkeletonHeaders#Name
> > > states it is a free text string. Inclusion of name could be used to
> > > discriminate the specific type of metadata.
> > 
> > And here I meant @title - I really don't know how I got so confused! :-)
> 
> OK, Christmas is tiring: I meant @label.
> 
> However, @title is actually an attribute that we haven't determined what it's
> used for in <track> and it can take any advisory information for the element.
> So, if all we need is a means to map descriptive information from in-band
> tracks to an IDL attribute, it may be of use?
> 
> So, for your use case, @kind would be fixed and set to the string "metadata",
> @label to whatever is in Ogg's "name" field, and @title could contain a
> concatenation of Ogg's "content-type" field and whatever remains of Ogg's
> "role" field that is not "metadata".

There isn't a @title in either the HTMLTrackElement or TextTrack interface; the new string proposed in Comment 26 would suffice. Then, as you suggest: @kind='metadata', @label=Ogg name. I still think it would be valuable to put the Ogg title into the new string so @new_string=Ogg content_type + role + title. The reason I think Ogg title is important is that it provides a way to further differentiate the type of metadata content beyond the more generic role.
Comment 34 Silvia Pfeiffer 2011-12-14 07:22:55 UTC
> There isn't a @title in either the HTMLTrackElement or TextTrack interface;

Every HTML element has a title attribute - it's a global attribute, see http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-title-attribute .

> the
> new string proposed in Comment 26 would suffice. Then, as you suggest:
> @kind='metadata', @label=Ogg name. I still think it would be valuable to put
> the Ogg title into the new string so @new_string=Ogg content_type + role +
> title, The reason I think Ogg title is important is that it provides a way to
> further differentiate the type of metadata content beyond the more generic
> role.

Ogg's "title" field is a free form string, human readable. So it probably better matches what @label should have. Ogg's "name" field is machine readable. So it's likely more useful in your new field.
Comment 35 Bob Lund 2011-12-14 15:34:27 UTC
(In reply to comment #34)
> > There isn't a @title in either the HTMLTrackElement or TextTrack interface;
> 
> Every HTML element has a title attribute - it's a global attribute, see
> http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-title-attribute
> .
> 
> > the
> > new string proposed in Comment 26 would suffice. Then, as you suggest:
> > @kind='metadata', @label=Ogg name. I still think it would be valuable to put
> > the Ogg title into the new string so @new_string=Ogg content_type + role +
> > title, The reason I think Ogg title is important is that it provides a way to
> > further differentiate the type of metadata content beyond the more generic
> > role.
> 
> Ogg's "title" field is a free form string, human readable. So it probably
> better matches what @label should have. Ogg's "name" field is machine readable.
> So it's likely more useful in your new field.

It wasn't entirely clear to me which of the title or name field to use; I think we should go with your advice. So @new_string = Ogg content_type + role + name.
Comment 36 Ian 'Hixie' Hickson 2012-01-26 22:10:59 UTC
So I went to add this, but the description in comment 30 is too vague to get interoperability here.

Could you describe the exact normative steps to generate a string from in-band text tracks for each of the formats we should be supporting?
Comment 37 Bob Lund 2012-01-27 20:58:36 UTC
(In reply to comment #36)
> So I went to add this, but the description in comment 30 is too vague to get
> interoperability here.
> 
> Could you describe the exact normative steps to generate a string from in-band
> text tracks for each of the formats we should be supporting?

Below is some example normative text. Let me know if this needs to be reworked in any way.

We've also developed a more general mechanism for exposing all track metadata in a media resource with in-band tracks (see http://html5.cablelabs.com/tracks/media-container-mapping.html). This is defined for Ogg, WebM, MPEG-2 TS and MPEG-4 File Format media types. Basically, the metadata is exposed via a text track to script.

This would be an alternative to the normative text below. This alternative is more general and also applies to video and audio tracks so in that sense I think it is a better solution. However, it may be too much to consider at this point.

Requested normative text
=====================================
The attribute (let's call it text_track_type) MUST be set to a string depending on the MIME type of the media resource:

video/ogg - The string contains the values of the Ogg Skeleton message headers content-type, role and name for this text track: '{"content_type":string containing content-type message header, "role":string containing the role message header, "name":string containing the name message header}'[1].

video/mp2t - The string contains the PMT elementary stream type and descriptors for this text track: '{"stream_type":string representation of the PMT stream_type field, "es_descriptors":[es_descriptor, ...]}'. Each es_descriptor is {"tag":string representation of the PMT descriptor tag, "desc_contents":Base64 encoded representation of the PMT elementary stream descriptor}[2]

video/webm - The string contains the TrackType and CodecID for this text track: '{"tracktype":string representation of the TrackType element contained in the TrackEntry element, "codecid":string representation of the CodecID element contained in the TrackEntry element}'[3].

video/mp4 - The string contains the track hdlr box and trak box contents for this track: '{"trak-mdia-hdlr-hdlr_type":string representation of the hdlr box hdlr_type field, "trak-mdia-hdlr-name":hdlr box name field, "trak":Base64 representation of the trak box (and all child boxes)}' [4]



[1] http://wiki.xiph.org/SkeletonHeaders
[2] ITU H.222.0 (5/2006) Table 2-33
[3] http://matroska.org/technical/specs/index.html
[4] ISO/IEC 14496-12
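
A minimal Python sketch of how the video/ogg and video/mp2t strings above might be assembled. It assumes the JSON-like syntax is meant literally and that "string representation" means a decimal string; neither point is settled in this thread, and the function names are made up for the example.

```python
import base64
import json
from typing import List, Tuple


def ogg_track_type(content_type: str, role: str, name: str) -> str:
    """Serialize the three Ogg Skeleton message headers for one track."""
    fields = {"content_type": content_type, "role": role, "name": name}
    return json.dumps(fields, separators=(",", ":"))


def mp2t_track_type(stream_type: int, descriptors: List[Tuple[int, bytes]]) -> str:
    """Serialize the PMT stream_type and elementary stream descriptors.

    Each descriptor is given as (tag, raw descriptor bytes).
    Assumption: numeric fields are rendered as decimal strings.
    """
    es_descriptors = [
        {"tag": str(tag), "desc_contents": base64.b64encode(raw).decode("ascii")}
        for tag, raw in descriptors
    ]
    fields = {"stream_type": str(stream_type), "es_descriptors": es_descriptors}
    return json.dumps(fields, separators=(",", ":"))
```

Script on the page could then recover the fields with an ordinary JSON parser rather than ad hoc string matching.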
Comment 38 Ian 'Hixie' Hickson 2012-02-01 03:40:10 UTC
That seems really complicated. Can browser vendors confirm whether this is the kind of thing they'd be willing to implement?
Comment 39 Robert O'Callahan (Mozilla) 2012-02-01 04:12:19 UTC
It does seem unnecessarily complicated. For the usage described (enabling a script to find its metadata track), all that's needed is to expose a single string from the track to the script. For WebM the contents of the track's CodecID element would be appropriate and sufficient. For Ogg, the contents of the 'name' field should be enough. Similar simplifications for the other formats should be possible.

Comment #37 seems to be trying to create a feature that exposes a subset of the container-specific per-track metadata to script, which is overkill for the use-cases described in comment #0. IMHO.
Comment 40 Bob Lund 2012-02-01 15:59:49 UTC
(In reply to comment #39)
> It does seem unnecessarily complicated.

All for possible simplification.

> For the usage described (enabling a
> script to find its metadata track), all that's needed is to expose a single
> string from the track to the script. For WebM the contents of the track's
> CodecID element would be appropriate and sufficient. 

If CodecID provides the same information as TrackType, e.g. distinguishes between subtitle and control, then I agree. Otherwise it seems TrackType provides essential information in determining a text track type.

>For Ogg, the contents of
> the 'name' field should be enough. Similar simplifications for the other
> formats should be possible.

According to [1], "Role describe what semantic type of content is contained in a track" and "This field (Name) provides the opportunity to associate a free text string with the track". If only one of these fields can be provided I think Role would be more appropriate as its values are defined. However, Role still doesn't fully identify the type of data in, for example, text/metadata, so it seems Name could be useful.

> 
> Comment #37 seems to be trying to create a feature that exposes a subset of the
> container-specific per-track metadata to script, which is overkill for the
> use-cases described in comment #0. IMHO.

[1] http://wiki.xiph.org/SkeletonHeaders
Comment 41 Robert O'Callahan (Mozilla) 2012-02-01 19:30:10 UTC
(In reply to comment #40)
> If CodecID provides the same information as TrackType, e.g. distinguishes
> between subtitle and control, then I agree. Otherwise it seems TrackType
> provides essential information in determining a text track type.

I can't imagine a situation where two tracks would have the same CodecID but different TrackTypes. More importantly, for the use-case described, authors can ensure that the CodecID is only used with one particular TrackType.

The same reasoning applies to Ogg 'name'.
Comment 42 Silvia Pfeiffer 2012-02-01 23:53:25 UTC
FWIW: I also don't think we need yet another mechanism to represent Ogg's role values in <track> - that's what @kind was created for. As for what is in Ogg's name - this was what <track>'s @label attribute was created for. Ogg has no means to specify what type of metadata is in a track other than giving it in the "name" field. So, from Ogg's POV there is nothing that we need extra.

Why can't MPEG also just throw all its extra information about codecs and metadata types etc into the @label attribute?
Comment 43 Bob Lund 2012-02-02 00:16:05 UTC
(In reply to comment #42)
> FWIW: I also don't think we need yet another mechanism to represent Ogg's role
> values in <track> - that's what @kind was created for. 

Ogg defines 10 text track, 3 video track and 7 audio track Role values. The current HTML5 spec enumerates 5 text track kinds and 6 audio/video kinds. So all of Ogg's roles can't be mapped to @kind. If the definition of @kind was expanded so it could take on the value of Role in Ogg (and designated fields in other media formats) then that would work.

> As for what is in Ogg's
> name - this was what <track>'s @label attribute was created for. Ogg has no
> means to specify what type of metadata is in a track other than giving it in
> the "name" field. So, from Ogg's POV there is nothing that we need extra.
> 
> Why can't MPEG also just throw all its extra information about codecs and
> metadata types etc into the @label attribute?

This is possible, although the current spec language says the label is human readable, which wouldn't be the case for MPEG metadata.
Comment 44 Ian 'Hixie' Hickson 2012-04-25 18:39:57 UTC
Exposing some author-provided machine label for metadata tracks seems reasonable, and not too much of a burden on implementors, _if_ we provide it by just exposing a single field from each format. You don't need more than one field, it just has to be something that the author can control on the production side and then read predictably on the consumption side. It has to be a field that the author can use to provide arbitrary data.

For ogg, it seems "name" is the closest match. It's not intended to be user-readable, as far as I can tell, and it's freeform.

For WebM/Matroska, I can't see an appropriate field, but maybe CodecID is sufficient for most use cases.

I don't know the other formats well enough to comment on those.


Note that currently we don't actually define even how "label" is mapped. The idea is that there will be format-specific specifications to define the mapping. The only difference here is that it's not at all as clear what the mapping should be for this field so I don't trust that user agents would implement it interoperably with similar vague hand-waving.
Comment 45 Bob Lund 2012-05-01 23:24:47 UTC
(In reply to comment #44)
> Exposing some author-provided machine label for metadata tracks seems
> reasonable, and not too much of a burden on implementors, _if_ we provide it by
> just exposing a single field from each format. You don't need more than one
> field, it just has to be something that the author can control on the
> production side and then read predictably on the consumption side. It has to be
> a field that the author can use to provide arbitrary data.
> 
> For ogg, it seems "name" is the closest match. It's not intended to be
> user-readable, as far as I can tell, and it's freeform.

I think that would do.

> 
> For WebM/Matroska, I can't see an appropriate field, but maybe CodecID is
> sufficient for most use cases.

I think the CodecID looks right.

> 
> I don't know the other formats well enough to comment on those.

The type of metadata can be determined as follows in MPEG-4 ISO Base Media File Format and MPEG-2 Single Program Transport Stream.

MPEG-4 File Format:

A timed metadata track is identified by this box in the file: moov:trak:mdia:hdlr ==  'meta' ([1] section 8.4.3.1). Such a metadata track may include text or XML formatted data.

A timed metadata track contains text if moov:trak:mdia:minf:stbl:stsd contains MetaDataSampleEntry('mett'). The type of text data is identified by a MIME type. In this case let TYPE = 'mp4-mett' and VALUE = TextMetaDataSampleEntry.mime_format ([1] section 8.5.2.2).

A timed metadata track contains XML if moov:trak:mdia:minf:stbl:stsd contains MetaDataSampleEntry('metx'). The format of the XML document is identified by its namespace. In this case let TYPE = 'mp4-metx' and VALUE = XMLMetaDataSampleEntry.namespace.

Set the HTML5 metadata TextTrack label attribute to "{TYPE : VALUE}". NOTE: this is a suggested syntax for type/value. If something else is more appropriate then that would be fine, as long as the representation allows script to distinguish whether the label attribute contains a text MIME type or an XML namespace.

MPEG-2 Single Program Transport Stream

A private data elementary stream in an MPEG-2 SPTS that should be exposed as an HTML5 TextTrack of kind='metadata' is identified by the MPEG-2 TS elementary stream type 'stream_type' in the program map table (PMT) of the MPEG-2 media resource ([2] section 2.4.4.8). A 'stream_type' of 0x05, 0x06, 0x08, 0x09, 0x0E, 0x15-0x1A, 0x7F, 0x82-0x86, 0x88-0xFF indicates such a private data elementary stream.

Further information about the type of private data in that elementary stream is identified by one or more descriptors associated with this stream type in the PMT. These descriptors occupy ES_info_length bytes immediately following the ES_info_length field in the PMT ([2] 2.4.4.9). Let TYPE = 'mp2t-ST' where ST is a string representation of the 'stream_type' field. Let VALUE = a string consisting of the Base64 encoded descriptor bytes.

Set the HTML5 metadata TextTrack label attribute to "{TYPE : VALUE}".

[1] ISO/IEC 14496-12 Third Edition 2008-10-15 "ISO base media file format"
[2] ITU-T H.222.0 05/2006 "Information technology – Generic coding of moving pictures and associated audio information: Systems"
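
As a rough Python sketch of the MPEG-2 steps above: check the stream_type against the listed private data values, then build the "{TYPE : VALUE}" label. Rendering ST as two uppercase hex digits is an assumption (the text only says "a string representation"), and the function names are illustrative.

```python
import base64

# stream_type values listed above as indicating a private data elementary stream.
PRIVATE_STREAM_TYPES = (
    {0x05, 0x06, 0x08, 0x09, 0x0E, 0x7F}
    | set(range(0x15, 0x1A + 1))
    | set(range(0x82, 0x86 + 1))
    | set(range(0x88, 0xFF + 1))
)


def is_private_data_stream(stream_type: int) -> bool:
    """True if the PMT stream_type should map to a kind='metadata' TextTrack."""
    return stream_type in PRIVATE_STREAM_TYPES


def mp2t_label(stream_type: int, descriptor_bytes: bytes) -> str:
    """Build the label from stream_type and the raw ES descriptor bytes."""
    if not is_private_data_stream(stream_type):
        raise ValueError("stream_type is not a private data stream")
    # Assumption: ST is written as two uppercase hex digits.
    type_part = "mp2t-%02X" % stream_type
    value_part = base64.b64encode(descriptor_bytes).decode("ascii")
    return "{%s : %s}" % (type_part, value_part)
```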

> 
> 
> Note that currently we don't actually define even how "label" is mapped. The
> idea is that there will be format-specific specifications to define the
> mapping. 

Populating the label in the manner we're talking about for TextTrack would be useful for video and audio as well. HTML5 has enumerated "kind" for certain types of video, audio and text tracks but this mechanism doesn't allow the content creator to create a track of a kind that's not in the enumeration and have it be identifiable by script. Filling in the label as discussed would address this limitation.


> The only difference here is that it's not at all as clear what the
> mapping should be for this field so I don't trust that user agents would
> implement it interoperably with similar vague hand-waving.
Comment 46 Ian 'Hixie' Hickson 2012-05-08 17:04:13 UTC
roc, how does comment 45 sound for MPEG-4 and MPEG-2 stream? Is it simple enough?

I'm not sure I really follow how XML works in the context of a metadata stream. Is there a separate XML document per cue? Or is it one big document? If it's one big document then it seems that it'll need much more of a spec to define the mapping anyway, and we can leave this issue to that spec, and just handle the text case.
Comment 47 Robert O'Callahan (Mozilla) 2012-05-08 20:37:07 UTC
(In reply to comment #46)
> roc, how does comment 45 sound for MPEG-4 and MPEG-2 stream? Is it simple
> enough?

Sounds OK.

But why not simply encode the type and value as "TYPE VALUE"? Less chance of things going wrong with author regexes if VALUE happens to contain "}". Need to require that TYPE does not contain a space.
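
A small Python sketch of the "TYPE VALUE" encoding, to show why it is easier to parse: splitting on the first space is unambiguous as long as TYPE contains no space, whatever characters VALUE holds. The function names are made up for the example.

```python
from typing import Tuple


def encode_label(type_part: str, value_part: str) -> str:
    """Join TYPE and VALUE with a single space; TYPE must not contain one."""
    if not type_part or " " in type_part:
        raise ValueError("TYPE must be non-empty and contain no space")
    return type_part + " " + value_part


def parse_label(label: str) -> Tuple[str, str]:
    """Split a label at the first space into (TYPE, VALUE)."""
    type_part, separator, value_part = label.partition(" ")
    if not separator:
        raise ValueError("label has no TYPE/VALUE separator")
    return type_part, value_part
```

A VALUE containing "}" or further spaces round-trips unchanged, which is the failure case the braces syntax invites.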
Comment 48 Bob Lund 2012-05-31 16:14:32 UTC
(In reply to comment #47)
> (In reply to comment #46)
> > roc, how does comment 45 sound for MPEG-4 and MPEG-2 stream? Is it simple
> > enough?
> 
> Sounds OK.
> 
> But why not simply encode the type and value as "TYPE VALUE"? Less chance of
> things going wrong with author regexes if VALUE happens to contain "}". Need to
> require that TYPE does not contain a space.

Sounds OK.

Does the editor have an idea for where the solution will be documented? One proposal is a separate W3C spec that is referenced by the HTML5 spec.

On a related issue, the solution to this bug will allow script to recognize the type of data in text tracks in a standard way, even if the UA does not recognize those tracks. Script access to these tracks is only possible if the UA exposes all text tracks in a media resource. As far as I can tell, the HTML5 spec has only one requirement for how the UA exposes media resource text tracks [1]. This could be read to imply that a UA must expose all text tracks. However, it could also be read to apply only to the text tracks that the UA makes available. It might be good to add an explicit requirement that all text tracks in the media resource be exposed. BTW, HTML5 has a similar track list ordering requirement for video and audio tracks [2]. An explicit requirement for the UA to expose all video and audio tracks in a media resource would be useful.

[1] http://dev.w3.org/html5/spec/single-page.html#list-of-text-tracks
[2] http://dev.w3.org/html5/spec/single-page.html#audiotracklist
Comment 49 Ian 'Hixie' Hickson 2012-06-28 03:50:03 UTC
(In reply to comment #45)
> 
> A timed metadata track contains text if moov:trak:mdia:minf:stbl:stsd contains
> MetaDataSampleEntry('mett'). The type of text data is identified by a MIME
> type. In this case let TYPE = 'mp4-mett' and VALUE =
> TextMetaDataSampleEntry.mime_format ([1] section 8.5.2.2).
> 
> A timed metadata track contains XML if moov:trak:mdia:minf:stbl:stsd contains
> MetaDataSampleEntry('metx'). The format of the XML document is identified by
> its namespace. In this case let TYPE = 'mp4-metx' and VALUE =
> XMLMetaDataSampleEntry.namespace.

I can't reconcile this with the MPEG4 spec, mostly because the MPEG4 spec seems to be impenetrable.

First, there's no such thing as a "moov:trak:mdia:minf:stbl:stsd box", it seems that the formal way to refer to this would be "the first stsd box in the first stbl box in the first minf box in the first mdia box in the first trak box in the first moov box, if such a box exists". Is that right? Second, I can't work out how to even formally phrase the stuff about "mett" and "mime_format" and so forth. I can't find the definition of SDL, and the SDL in the above-cited MPEG spec is the only place that mentions "mett". The MPEG files don't seem to actually contain anything called "TextMetaDataSampleEntry", that seems to just be a concept used in the spec's syntax definitions (?). Really not sure how to write this.
Comment 50 Ian 'Hixie' Hickson 2012-06-28 04:40:08 UTC
I think I've worked out how to phrase it, though it'll need review. Still working on the MPEG2 stuff before I can check this in.
Comment 51 Ian 'Hixie' Hickson 2012-06-28 05:02:14 UTC
I can't find the MPEG2 standard anywhere so I'm not able to spec that part.

I've specced the requirements for Ogg, WebM, and MPEG-4. I haven't specced requirements for DASH, since no proposal for DASH was made.

The patch will be given below.

(In reply to comment #48)
> 
> On a related issue, the solution to this bug will allow script to recognize the
> type of data in text tracks in a standard way, even if the UA does not
> recognize those tracks.

The UA can only expose tracks in formats it understands how to expose. Tracks in formats it doesn't understand how to expose obviously can't be exposed, since it doesn't understand how to expose them.
Comment 52 Ian 'Hixie' Hickson 2012-06-28 05:10:35 UTC
(leaving open since the fix is only applied to the WHATWG copy currently)
Comment 53 contributor 2012-06-28 05:11:23 UTC
Checked in as WHATWG revision r7150.
Check-in comment: Define textTrack.inBandMetadataTrackDispatchType
http://html5.org/tools/web-apps-tracker?from=7149&to=7150
Comment 54 Ian 'Hixie' Hickson 2012-06-28 05:34:59 UTC
Mike helped me find the MPEG-2 spec, so I added support for that too. Please do review the text, I have no idea what I'm doing here.
Comment 55 contributor 2012-06-28 05:36:07 UTC
Checked in as WHATWG revision r7151.
Check-in comment: Define textTrack.inBandMetadataTrackDispatchType for MPEG-2 also.
http://html5.org/tools/web-apps-tracker?from=7150&to=7151
Comment 56 Bob Lund 2012-06-29 15:49:25 UTC
(In reply to comment #54)
> Mike helped me find the MPEG-2 spec, so I added support for that too. Please do
> review the text, I have no idea what I'm doing here.

The text for MPEG-2 TS looks good but there is one thing missing that had been suggested.

As it stands in the proposed text, when the in-band metadata track dispatch type is set, there is nothing to indicate whether the text track came from an MPEG-2 TS, Ogg, WebM or MPEG-4 media resource, so script won't know how to interpret the attribute. One way to fix this would be to have inBandMetadataTrackDispatchType start with "mp2t-", "ogg-", "webm-" or "mp4-". For example,

If the media resource is an Ogg file
The text track in-band metadata track dispatch type must be set to _the concatenation of "ogg-" and_ the value of the Name header field. [OGGSKELETONHEADERS]
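
A sketch of how script might use such a prefix, assuming the UA prepends one of the four strings suggested above. The prefixes are only this comment's proposal, not checked-in spec text, and `splitContainerPrefix` is a hypothetical helper name.

```javascript
// Prefixes as proposed in this comment (an assumption, not spec text).
const CONTAINER_PREFIXES = ["mp2t-", "ogg-", "webm-", "mp4-"];

// Hypothetical helper: separate the container prefix from the rest of the
// dispatch type so the page can pick a container-specific handler.
function splitContainerPrefix(dispatchType) {
  for (const prefix of CONTAINER_PREFIXES) {
    if (dispatchType.startsWith(prefix)) {
      return {
        container: prefix.slice(0, -1), // drop the trailing "-"
        value: dispatchType.slice(prefix.length),
      };
    }
  }
  // No known prefix: leave the string as-is and report no container.
  return { container: null, value: dispatchType };
}
```

A page could pass `track.inBandMetadataTrackDispatchType` to this helper and choose its dispatch function from the returned `container`.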
Comment 57 Bob Lund 2012-07-03 15:37:37 UTC
Created attachment 1152 [details]
Attachment for comment 57
Comment 58 Bob Lund 2012-07-03 15:40:46 UTC
(In reply to comment #49)
> (In reply to comment #45)
> > 
> > A timed metadata track contains text if moov:trak:mdia:minf:stbl:stsd contains
> > MetaDataSampleEntry('mett'). The type of text data is identified by a MIME
> > type. In this case let TYPE = 'mp4-mett' and VALUE =
> > TextMetaDataSampleEntry.mime_format ([1] section 8.5.2.2).
> > 
> > A timed metadata track contains XML if moov:trak:mdia:minf:stbl:stsd contains
> > MetaDataSampleEntry('metx'). The format of the XML document is identified by
> > its namespace. In this case let TYPE = 'mp4-metx' and VALUE =
> > XMLMetaDataSampleEntry.namespace.
> 
> I can't reconcile this with the MPEG4 spec, mostly because the MPEG4 spec seems
> to be impenetrable.
> 
> First, there's no such thing as a "moov:trak:mdia:minf:stbl:stsd box", it seems
> that the formal way to refer to this would be "the first stsd box in the first
> stbl box in the first minf box in the first mdia box in the first trak box in
> the first moov box, if such a box exists". Is that right? Second, I can't work
> out how to even formally phrase the stuff about "mett" and "mime_format" and so
> forth. I can't find the definition of SDL, and the SDL in the above-cited MPEG
> spec is the only place that mentions "mett". The MPEG files don't seem to
> actually contain anything called "TextMetaDataSampleEntry", that seems to just
> be a concept used in the spec's syntax definitions (?). Really not sure how to
> write this.

Here is proposed text, based on what is already in http://www.whatwg.org/specs/web-apps/current-work/#text-track-in-band-metadata-track-dispatch-type. The attachment shows the change with the same formatting as the existing text.



If the media resource is an MPEG-4 file

Let sample entry be the contents of the stsd box, which is contained in the stbl box, which is contained in the minf box, which is contained in the mdia box, which is contained in the trak box. Let type be the type field of the stsd box. If type is neither "mett" nor "metx" then the text track in-band metadata track dispatch type must be set to the empty string. Otherwise, if type is "mett" then sample entry is a TextMetaDataSampleEntry and the text track in-band metadata track dispatch type must be set to the concatenation of the string "mett", a U+0020 SPACE character, and the value of the mime_format field of TextMetaDataSampleEntry. Otherwise, if type is "metx" then sample entry is an XMLMetaDataSampleEntry and the text track in-band metadata track dispatch type must be set to the concatenation of the string "metx", a U+0020 SPACE character, and the value of the namespace field of XMLMetaDataSampleEntry. [MPEG4]
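
The rule above can be sketched in script form as follows. This is illustrative only: the `sampleEntry` object shape and the name `mpeg4DispatchType` are assumptions, and a real UA would read these fields from the stsd box rather than from a JS object.

```javascript
// Sketch of the proposed MPEG-4 rule. `sampleEntry` is a hypothetical object
// with the fields { type, mime_format, namespace } already parsed out.
function mpeg4DispatchType(sampleEntry) {
  if (sampleEntry.type === "mett") {
    // TextMetaDataSampleEntry: "mett", U+0020 SPACE, then mime_format.
    return "mett " + sampleEntry.mime_format;
  }
  if (sampleEntry.type === "metx") {
    // XMLMetaDataSampleEntry: "metx", U+0020 SPACE, then namespace.
    return "metx " + sampleEntry.namespace;
  }
  // Neither "mett" nor "metx": the dispatch type is the empty string.
  return "";
}
```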
Comment 59 Ian 'Hixie' Hickson 2012-07-12 17:26:37 UTC
Wouldn't the page know what the format was? After all, it was the one that gave the URL.
Comment 60 contributor 2012-07-18 04:33:54 UTC
This bug was cloned to create bug 1 as part of operation convergence.
Comment 61 Bob Lund 2012-08-27 20:21:46 UTC
(In reply to comment #59)
> Wouldn't the page know what the format was? After all, it was the one that gave
> the URL.

I assume this is a reply to comment #56.

In the case where the URL is specified in the video element, the page author would need to use different versions of the JS dispatch function depending on the media type of the content pointed to by the URL. Will the author always be able to make this association? What about in a dynamically created page where the only piece of information about the content is the URL?

Also, wouldn't this be a source of errors? The metadata available to the page author might be wrong, saying that a piece of content is Ogg when in fact it is WebM. A UA that supports both will play the content but the wrong dispatch function will be used.

Having the UA set the type as proposed would address the above use cases.
Comment 62 Silvia Pfeiffer 2012-09-07 19:32:05 UTC
(In reply to comment #60)
> This bug was cloned to create bug 1 as part of operation convergence.

I think that didn't work and this is still a bug shared between WHATWG and HTMLWG.
Comment 63 Silvia Pfeiffer 2012-09-07 19:33:12 UTC
(In reply to comment #55)
> Checked in as WHATWG revision r7151.
> Check-in comment: Define textTrack.inBandMetadataTrackDispatchType for MPEG-2
> also.
> http://html5.org/tools/web-apps-tracker?from=7150&to=7151

Applied to HTMLWG spec at https://github.com/w3c/html/commit/4886ebc76b0335ceb8e47d2fa8c10868fc6b0be0 .
Comment 64 Silvia Pfeiffer 2012-09-07 19:41:31 UTC
(In reply to comment #61)
> (In reply to comment #59)
> > Wouldn't the page know what the format was? After all, it was the one that gave
> > the URL.
> 
> I assume this is a reply to comment #56.
> 
> In the case where the URL is specified in the video element, the page author
> would need to use different versions of the JS dispatch function depending on
> the media type of the content pointed to by the URL. Will the author always be
> able to make this association?

Since it's the author providing the URL, he should know the media type of the content behind the URL. Can you explain a situation where that wouldn't be the case?

> What about in a dynamically created page where
> the only piece of information about the content is the URL?

Even then the page author is the one writing the JS, so they would know.
 
> Also, wouldn't this be a source of errors? The metadata available to the page
> author might be wrong, saying that a piece of content is Ogg when in fact it is
> WebM. A UA that supports both will play the content but the wrong dispatch
> function will be used.

If the author doesn't know the media type/can't find out through probing, why would the browser get it right?
Comment 65 Bob Lund 2012-09-07 23:13:03 UTC
(In reply to comment #64)
> (In reply to comment #61)
> > (In reply to comment #59)
> > > Wouldn't the page know what the format was? After all, it was the one that gave
> > > the URL.
> > 
> > I assume this is a reply to comment #56.
> > 
> > In the case where the URL is specified in the video element, the page author
> > would need to use different versions of the JS dispatch function depending on
> > the media type of the content pointed to by the URL. Will the author always be
> > able to make this association?
> 
> Since it's the author providing the URL, he should know the media type of the
> content behind the URL. Can you explain a situation where that wouldn't be the
> case?
> 
> > What about in a dynamically created page where
> > the only piece of information about the content is the URL?
> 
> Even then the page author is the one writing the JS, so they would know.

If all the page generator (human author or otherwise) has is the URL to the content then how will it know the type of content?

> 
> > Also, wouldn't this be a source of errors? The metadata available to the page
> > author might be wrong, saying that a piece of content is Ogg when in fact it is
> > WebM. A UA that supports both will play the content but the wrong dispatch
> > function will be used.
> 
> If the author doesn't know the media type/can't find out through probing, why
> would the browser get it right?

The browser has access to the content-type when it tries to play the content, right?
Comment 66 Silvia Pfeiffer 2012-09-08 09:49:12 UTC
(In reply to comment #65)
> If all the page generator (human author or otherwise) has is the URL to the
> content then how will it know the type of content?

That's what mime types are for.


> > If the author doesn't know the media type/can't find out through probing, why
> > would the browser get it right?
> 
> The browser has access to the content-type when it tries to play the content,
> right?

The JS developer has that access, too, via the getResponseHeader() function, see http://www.w3.org/TR/XMLHttpRequest/#the-getresponseheader-method .
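
A minimal sketch of that approach, assuming the server answers a HEAD request with an accurate Content-Type header. `probeContentType` is a hypothetical helper name, and the optional constructor argument exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: ask the server for the Content-Type of a media URL.
// XHRCtor defaults to the browser's XMLHttpRequest. The callback receives the
// header value, or null if the header is absent or the request fails.
function probeContentType(url, callback, XHRCtor) {
  const Ctor = XHRCtor || XMLHttpRequest;
  const xhr = new Ctor();
  xhr.open("HEAD", url);
  xhr.onload = function () {
    callback(xhr.getResponseHeader("Content-Type"));
  };
  xhr.onerror = function () {
    callback(null);
  };
  xhr.send();
}
```

This only reports what the server claims about the resource as a whole; it says nothing about the type of data inside an individual in-band track.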

Is that sufficient or do you have a use case beyond this?
Comment 67 Glenn Adams 2012-09-09 01:30:42 UTC
(In reply to comment #66)
> (In reply to comment #65)
> > If all the page generator (human author or otherwise) has is the URL to the
> > content then how will it know the type of content?
> 
> That's what mime types are for.

Mime type metadata is merely a hint. Only the decoder of the data (in this case the UA) can determine the actual type.

> 
> > > If the author doesn't know the media type/can't find out through probing, why
> > > would the browser get it right?
> > 
> > The browser has access to the content-type when it tries to play the content,
> > right?
> 
> The JS developer has that access, too, via the getResponseHeader() function,
> see http://www.w3.org/TR/XMLHttpRequest/#the-getresponseheader-method .

That doesn't work when track data is embedded in a container format that doesn't expose metadata about its component content. It also doesn't work when you aren't using XHR, and it doesn't work if you are using XHR and the response doesn't match the actual type.

> Is that sufficient or do you have a use case beyond this?

Track data embedded in the user private data of a MPEG-2 TS.
Comment 68 Bob Lund 2012-09-10 14:43:15 UTC
(In reply to comment #66)
> (In reply to comment #65)
> > If all the page generator (human author or otherwise) has is the URL to the
> > content then how will it know the type of content?
> 
> That's what mime types are for.

If the page in question sets the src attribute of the video element with the content URL, there is no need for the mime type.
 
> 
> 
> > > If the author doesn't know the media type/can't find out through probing, why
> > > would the browser get it right?
> > 
> > The browser has access to the content-type when it tries to play the content,
> > right?
> 
> The JS developer has that access, too, via the getResponseHeader() function,
> see http://www.w3.org/TR/XMLHttpRequest/#the-getresponseheader-method .

I don't understand the relevance of XHR to this. All the page and UA have is a URL to a content item.

> 
> Is that sufficient or do you have a use case beyond this?

No, the suggestions aren't sufficient for the above-stated reasons.

If there is a way JS can determine the type of a content item given only the URL then I think that would suffice. Otherwise, I think it would be better for the UA to tag the dispatch string as suggested based on the mime type it will have when playing the content. This always works and makes no assumptions about what content metadata is available at page creation.
Comment 69 Robin Berjon 2012-11-27 16:18:34 UTC
I'm having a hard time figuring out if I can close this bug because we've taken the change, or if the further discussion requires leaving it open. I get a sense that the further discussion might actually justify a bug of its own.

Silvia: since you've been actively involved here, can you please make that call? This is the last LC1 bug. If it stays open it should probably be moved to .next.
Comment 70 Bob Lund 2012-11-27 16:43:01 UTC
Spec text for resolving this bug has already been created here [1]. This text looks good, with one change: s/of the first trak box/of a trak box/. I requested this change in response to an email from Ian.

With this change to the text in [1] the bug appears to be addressed.

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#sourcing-in-band-text-tracks

(In reply to comment #69)
> I'm having a hard time figuring out if I can close this bug because we've
> taken the change, or if the further discussion requires leaving it open. I
> get a sense that the further discussion might actually justify a bug of its
> own.
> 
> Silvia: since you've been actively involved here, can you please make that
> call? This is the last LC1 bug. If it stays open it should probably be moved
> to .next.
Comment 71 Silvia Pfeiffer 2012-11-28 15:04:38 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If
you are satisfied with this response, please change the state of
this bug to CLOSED. If you have additional information and would
like the Editor to reconsider, please reopen this bug. If you would
like to escalate the issue to the full HTML Working Group, please
add the TrackerRequest keyword to this bug, and suggest title and
text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this
document: http://dev.w3.org/html5/decision-policy/decision-policy-v2.html

Status: Accepted

Change Description:
Previous changes are already in the W3C spec:
http://dev.w3.org/html5/spec/single-page.html#sourcing-in-band-text-tracks
http://dev.w3.org/html5/spec/single-page.html#text-track-in-band-metadata-track-dispatch-type

Final patch is staged at:
https://github.com/w3c/html/commit/c1cee3b0ffdf29a46992caaf13db5552a4ced5b1

Rationale: Accepted the MPEG-4 parsing knowledge of the reporter