This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 26929 - [InbandTracks] How to expose ISOBMFF Tracks
Status: NEW
Alias: None
Product: HTML WG
Classification: Unclassified
Component: Sourcing In-band Media Resource Tracks
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Silvia Pfeiffer
QA Contact: HTML WG Bugzilla archive list
 
Reported: 2014-09-29 12:36 UTC by Cyril Concolato
Modified: 2014-10-28 15:41 UTC
CC List: 3 users

Description Cyril Concolato 2014-09-29 12:36:05 UTC
The following sentence is wrong and does not provide normative statements:
"A user agent recognises and supports data from a MPEG-4 TrackBox as being equivalent to a HTML track based on the value of the 'handler_type' field in the HandlerBox ('hdlr') of the MediaBox ('mdia') of the TrackBox"
It is wrong because a UA knows whether it supports decoding a track from the SampleEntry (code and content), not only from the handler type. It might not even care about the handler type.
The sentence should be rewritten with normative statements.

Also, same remark as for bug #26927: the overall picture is not clear. When should data be exposed as a VideoTrack: it shall be exposed as a VideoTrack if supported, otherwise it may be exposed as a TextTrack. Should all tracks be exposed (possibly via a negotiation mechanism)?

We might want to update the section, and its table, to cover the "KindBox" and "ExtendedLanguageBox", which are in the ongoing amendment of ISOBMFF.

Same remark as for bug #26921: the TTML and WebVTT parts should be moved to a separate section, and CEA708 and 3GPP Timed Text as well. They can be carried in TS, in MP4 or elsewhere.

I will provide updated text for all these changes.

Note: the whole section is also applicable to the MIME type "application/mp4", which may carry metadata-only tracks.
Comment 1 Cyril Concolato 2014-09-29 13:16:43 UTC
For those interested in the new ExtendedLanguageBox, see some details here:
http://gpac.wp.mines-telecom.fr/2014/09/23/language-tagging-in-gpac/
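To illustrate why the new box matters for the language attribute, here is a minimal sketch. It assumes the UA has already read the packed 15-bit language code from the MediaHeaderBox ('mdhd') and, if present, the tag string from the ExtendedLanguageBox ('elng'). The function names are hypothetical, and letting the extended tag take precedence is my reading of the amendment rather than something settled in this bug.

```python
def decode_mdhd_language(packed):
    """Decode the packed ISO 639-2/T code stored in 'mdhd'.

    Each of the three letters is stored in 5 bits as (character - 0x60).
    """
    return "".join(chr(((packed >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))


def track_language(mdhd_packed, elng_tag=None):
    """Prefer the extended language tag when present (assumed policy)."""
    return elng_tag if elng_tag else decode_mdhd_language(mdhd_packed)
```

For example, track_language(0x15C7) falls back to the three-letter code, while track_language(0x15C7, "en-US") returns the richer tag.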
Comment 2 Bob Lund 2014-10-02 21:40:23 UTC
(In reply to Cyril Concolato from comment #0)
> The following sentence is wrong and does not provide normative statements:
> "A user agent recognises and supports data from a MPEG-4 TrackBox as being
> equivalent to a HTML track based on the value of the 'handler_type' field in
> the HandlerBox ('hdlr) of the MediaBox ('mdia') of the TrackBox"
> It is wrong because a UA will know if it supports the decoding of a track
> based on the SampleEntry (code and content) and not only on the handler
> type. They might not even care about the handler type.

The fact that the UA could look at other fields doesn't make the use of the 'hdlr' box wrong. ISO/IEC 14496-12:2012(E) states that the 'hdlr' box is mandatory and "This box within a Media Box declares the process by which the media-data in the track is presented, and thus, the nature of the media in a track. For example, a video track would be handled by a video handler."
 
> The sentence should be rewritten with normative statements. 
> 
> Also same remark as for bug #26927, the overall picture is not clear: when
> should data be exposed as VideoTrack: it shall be exposed a VideoTrack if
> supported otherwise it may be exposed as TextTrack . Should all tracks be
> exposed (possibly via a negotiation mechanism) ?

What part of the following spec text isn't clear?

text track: the 'handler_type' value is "meta", "subt" or "text"
video track: the 'handler_type' value is "vide"
audio track: the 'handler_type' value is "soun"
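
The handler-based classification can be sketched as a simple lookup. This is only an illustration: the dictionary and function names are hypothetical and do not come from the sourcing spec. The handler codes are those of ISO/IEC 14496-12, where "vide" is the video handler and "soun" the audio handler.

```python
# Hypothetical lookup: 'handler_type' of the HandlerBox ('hdlr') to the
# kind of HTML track it would be exposed as. Names are illustrative only.
HANDLER_TO_HTML_TRACK = {
    "vide": "video",
    "soun": "audio",
    "meta": "text",
    "subt": "text",
    "text": "text",
}


def html_track_type(handler_type):
    """Return 'video', 'audio' or 'text', or None for an unlisted handler."""
    return HANDLER_TO_HTML_TRACK.get(handler_type)
```

Returning None for an unlisted handler leaves the "expose or ignore" decision to the caller.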
> 
> We might want to update the section regarding the "KindBox" and
> "ExtendedLanguageBox" which are in the on-going amendment of ISOBMFF to
> update the table.

Please provide a reference to this amendment.

> 
> Same remark as for bug #26921, the TTML and WebVTT parts should be moved in
> a separate section. CEA708 and 3GPP Timed Text as well. They can be carried
> in TS or in MP4 or else.

TTML and WebVTT are not carried in TS AFAIK. CEA708 is not spec'd for MP4. And as commented on in 26921, how TTML, WebVTT and CEA-708 are identified and found is container-specific. It might make sense to centralize how Cues are created from TTML and WebVTT content once found in the container.

> 
> I will provide updated text for all these changes.
> 
> Note: the whole section is also applicable to the MIME "application/mp4",
> which may carry metadata only tracks.
Comment 3 Cyril Concolato 2014-10-08 20:10:54 UTC
(In reply to Bob Lund from comment #2)
> (In reply to Cyril Concolato from comment #0)
> > The following sentence is wrong and does not provide normative statements:
> > "A user agent recognises and supports data from a MPEG-4 TrackBox as being
> > equivalent to a HTML track based on the value of the 'handler_type' field in
> > the HandlerBox ('hdlr) of the MediaBox ('mdia') of the TrackBox"
> > It is wrong because a UA will know if it supports the decoding of a track
> > based on the SampleEntry (code and content) and not only on the handler
> > type. They might not even care about the handler type.
> 
> The fact that the UA could look at other fields doesn't make the use of the
> 'hdlr' box wrong. ISO/IEC 14496-12:2012(E) states that the 'hdlr' box is
> mandatory and "This box within a Media Box declares the process by which the
> media-data in the track is presented, and thus, the nature of the media in a
> track. For example, a video track would be handled by a video handler."
What you cite does not describe the full process. The sentence in the sourcing spec is wrong because it is incomplete and because knowing that the track is video does not tell you whether it is AVC or HEVC. Actually, the same sample entry could be used with a different handler, and you would still be able to process it, because it is the SampleEntry that defines the track format (not the handler). The handler is just a very limited indication.
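
To make the point concrete, here is a small sketch in which two tracks share the same handler but differ in their sample entry code. The set of supported codes is an assumption standing in for whatever a given UA can decode, and the track dictionaries are a hypothetical, already-parsed representation.

```python
# Assumption: sample entry codes this imaginary UA is able to decode.
SUPPORTED_SAMPLE_ENTRIES = {"avc1", "mp4a", "wvtt"}


def can_decode(sample_entry_code):
    """Support is decided by the sample entry, not by the handler."""
    return sample_entry_code in SUPPORTED_SAMPLE_ENTRIES


tracks = [
    {"handler_type": "vide", "sample_entry": "avc1"},  # AVC video
    {"handler_type": "vide", "sample_entry": "hvc1"},  # HEVC video
]

# Both tracks are "video" according to the handler, yet only one is decodable.
decodable = [t["sample_entry"] for t in tracks if can_decode(t["sample_entry"])]
```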
 
>  
> > The sentence should be rewritten with normative statements. 
> > 
> > Also same remark as for bug #26927, the overall picture is not clear: when
> > should data be exposed as VideoTrack: it shall be exposed a VideoTrack if
> > supported otherwise it may be exposed as TextTrack . Should all tracks be
> > exposed (possibly via a negotiation mechanism) ?
> 
> What part of the following spec text isn't clear?
> 
> text track: the 'handler_type' value is "meta", "subt" or "text"
> video track: the 'handler_type' value is "vide"
> audio track: the 'handler_type' value is "soun"
I meant: Should a UA expose a track it does not understand? 

> > 
> > We might want to update the section regarding the "KindBox" and
> > "ExtendedLanguageBox" which are in the on-going amendment of ISOBMFF to
> > update the table.
> 
> Please provide a reference to this amendment.
The reference is: ISO/IEC 14496-12:2012/DAM 4. This draft is available here:
http://mpeg.chiariglione.org/sites/default/files/files/standards/parts/docs/w14574-v2-w14574.zip

> 
> > 
> > Same remark as for bug #26921, the TTML and WebVTT parts should be moved in
> > a separate section. CEA708 and 3GPP Timed Text as well. They can be carried
> > in TS or in MP4 or else.
> 
> TTML and WebVTT are not carried in TS AFAIK. 
Sorry, I should have said "could". They could be in WebM, Ogg, or whatever. My point here is that it should be out of the MP4 section. MP4 defines how to get the WebVTT file back, so for our purpose we can just say that the same processing applies as for WebVTT that is not in MP4 but is in an MPD.

> CEA708 is not spec'd for MP4.
I think Apple has a spec for that. We've had requests to support it in GPAC.


> And as commented on in 26921, how TTML, WebVTT, CEA-708 is identified and
> found is container specific. It might make sense to centralize how Cues are
> created from TTML and WebVTT content once found in the container.
Yes, please!
Comment 4 Bob Lund 2014-10-14 20:12:12 UTC
(In reply to Cyril Concolato from comment #3)
> (In reply to Bob Lund from comment #2)
> > (In reply to Cyril Concolato from comment #0)
> > > The following sentence is wrong and does not provide normative statements:
> > > "A user agent recognises and supports data from a MPEG-4 TrackBox as being
> > > equivalent to a HTML track based on the value of the 'handler_type' field in
> > > the HandlerBox ('hdlr) of the MediaBox ('mdia') of the TrackBox"
> > > It is wrong because a UA will know if it supports the decoding of a track
> > > based on the SampleEntry (code and content) and not only on the handler
> > > type. They might not even care about the handler type.
> > 
> > The fact that the UA could look at other fields doesn't make the use of the
> > 'hdlr' box wrong. ISO/IEC 14496-12:2012(E) states that the 'hdlr' box is
> > mandatory and "This box within a Media Box declares the process by which the
> > media-data in the track is presented, and thus, the nature of the media in a
> > track. For example, a video track would be handled by a video handler."
> What you cite does not describe the full process. The sentence in the
> sourcing spec is wrong because it is incomplete and because knowing the
> track is a video does not tell you if it's AVC or HEVC. Actually, the same
> sample entry could be used with different handler, and you'd still be able
> to process it because that is the SampleEntry that defines the track format
> (not the handler). The handler is just a very limited indication.

The section of the sourcing spec referred to in this bug is about determining if a track is audio, video or text based on the handler. The codec type is not relevant to this determination. When the codec type is relevant, e.g. how CEA708 Cues are sourced, identification of audio for the visually impaired, the codec type is referenced.
 
>  
> >  
> > > The sentence should be rewritten with normative statements. 
> > > 
> > > Also same remark as for bug #26927, the overall picture is not clear: when
> > > should data be exposed as VideoTrack: it shall be exposed a VideoTrack if
> > > supported otherwise it may be exposed as TextTrack . Should all tracks be
> > > exposed (possibly via a negotiation mechanism) ?
> > 
> > What part of the following spec text isn't clear?
> > 
> > text track: the 'handler_type' value is "meta", "subt" or "text"
> > video track: the 'handler_type' value is "vide"
> > audio track: the 'handler_type' value is "soun"
> I meant: Should a UA expose a track it does not understand?

This same question was asked in bug 26923. Comment #5 in that bug [1] provides an answer (Comment #6 provides the URL to the reference in #5). The DASH spec suggests "Clients may ignore Representations that rely on codecs or other rendering technologies they do not support or that are otherwise unsuitable".

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=26923#c5
> 
> > > 
> > > We might want to update the section regarding the "KindBox" and
> > > "ExtendedLanguageBox" which are in the on-going amendment of ISOBMFF to
> > > update the table.
> > 
> > Please provide a reference to this amendment.
> The reference is: ISO/IEC 14496-12:2012/DAM 4. This draft is available here:
> http://mpeg.chiariglione.org/sites/default/files/files/standards/parts/docs/
> w14574-v2-w14574.zip

Agreed about the ExtendedLanguageBox. Not sure about the KindBox. It seems that specific KindBox values depend on the schemeURI. So the sourcing spec needs a specific set of schemeURIs in order to specify a mapping from KindBox value to HTML track @kind.

> 
> > 
> > > 
> > > Same remark as for bug #26921, the TTML and WebVTT parts should be moved in
> > > a separate section. CEA708 and 3GPP Timed Text as well. They can be carried
> > > in TS or in MP4 or else.
> > 
> > TTML and WebVTT are not carried in TS AFAIK. 
> Sorry, I should have said "could". The could be in WebM, Ogg, or whatever.
> My point here is that it should be out of the MP4 section. MP4 defines how
> to get the WebVTT file back so our purpose, we can just say the same
> processing as for WebVTT out-of-MP4 but in MPD applies.
> 
> > CEA708 is not spec'd for MP4.
> I think Apple has a spec for that. We've had request to support it in GPAC.

CEA708 in AVC in MPEG-2 TS is specified. There is no spec for CEA708 in AVC in ISOBMFF AFAIK.

> 
> 
> > And as commented on in 26921, how TTML, WebVTT, CEA-708 is identified and
> > found is container specific. It might make sense to centralize how Cues are
> > created from TTML and WebVTT content once found in the container.
> Yes, please!
Comment 5 Cyril Concolato 2014-10-28 15:41:11 UTC
> The section of the sourcing spec referred to in this bug is about
> determining if a track is audio, video or text based on the handler. The
> codec type is not relevant to this determination. When the codec type is
> relevant, e.g. how CEA708 Cues are sourced, identification of audio for the
> visually impaired, the codec type is referenced.
I agree with your approach. The initial text was misleading. I proposed some text in a PR:
https://github.com/w3c/HTMLSourcingInbandTracks/pull/33

> Agreed about the ExtendedLanguageBox. 
Added to the PR.

> Not sure about the KindBox. It seems
> that specific KindBox values depend on the schemeURI. So the sourcing spec
> needs a specific set of schemeURI's to specify a KindBox value to HTML track
> @kind mapping.
Yes. I don't know whether MPEG should be defining them or not. I expect the schemeIdURI to be http://www.w3.org/TR/HTML for kinds defined by the HTML spec, but that is not specified anywhere.
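
A rough sketch of what such a mapping could look like, assuming a KindBox has already been parsed into its scheme URI and value strings. The scheme URI below is only the one I expect, not one defined anywhere, and the function name is hypothetical. The kind values are those HTML defines for audio and video tracks.

```python
# Assumption: this URI would identify kinds defined by the HTML spec.
# No specification currently says so; it is used only for illustration.
HTML_KIND_SCHEME = "http://www.w3.org/TR/HTML"

# Kind values HTML defines for audio and video tracks (besides "").
HTML_AV_KINDS = {
    "alternative", "captions", "descriptions", "main",
    "main-desc", "sign", "subtitles", "translation", "commentary",
}


def html_kind_from_kindbox(scheme_uri, value):
    """Map a KindBox (scheme URI, value) pair to an HTML track kind.

    Returns the empty string when the scheme or the value is not recognised.
    """
    if scheme_uri == HTML_KIND_SCHEME and value in HTML_AV_KINDS:
        return value
    return ""
```

Other schemes would need their own value tables, which is why the sourcing spec would have to list the scheme URIs it recognises.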

> CEA708 in AVC in MPEG-2 TS is specified. There is no spec for CEA708 in AVC
> in ISOBMFF AFAIK.
You can check https://developer.apple.com/library/mac/documentation/quicktime/qtff/QTFFChap3/qtff3.html#//apple_ref/doc/uid/TP40000939-CH205-SW87