Bug 18933 - Segment byte boundaries are not defined
Status: RESOLVED FIXED
Product: HTML WG
Classification: Unclassified
Component: Media Source Extensions
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assigned To: Mark Watson
Reported: 2012-09-20 12:37 UTC by Haakon Riiser
Modified: 2013-02-05 22:35 UTC

Description Haakon Riiser 2012-09-20 12:37:03 UTC
As far as I can tell, the MSE spec is not clear on how to determine where a segment ends. Since append() does not necessarily deliver complete segments, we need to be able to determine the size in bytes of both init and media segments based on the data provided by append().

With ISO BMFF, we currently assume that the last byte in an init segment is the last byte in the moov box, implying that an init segment will never have any other top-level boxes after the moov box.

For media segments, we assume that the last byte is the last byte in the segment's last mdat box (we expect that the segment contains exactly as many mdat boxes as there are traf boxes inside the segment's moof box, one for each track).

Since this is just guesswork, I think the spec should explicitly state how to determine init and media segment boundaries, for all possible container formats.
Comment 1 Mark Watson 2012-09-20 16:37:30 UTC
(In reply to comment #0)
> As far as I can tell, the MSE spec is not clear on how to determine where a
> segment ends. Since append() does not necessarily deliver complete segments, we
> need to be able to determine the size in bytes of both init and media segments
> based on the data provided by append().
> 
> With ISO BMFF, we currently assume that the last byte in an init segment is the
> last byte in the moov box, implying that an init segment will never have any
> other top-level boxes after the moov box.
> 
> For media segments, we assume that the last byte is the last byte in the
> segment's last mdat box (we expect that the segment contains exactly as many
> mdat boxes as there are traf boxes inside the segment's moof box, one for each
> track). 
> 
> Since this is just guesswork, I think the spec should explicitly state how to
> determine init and media segment boundaries, for all possible container
> formats.

I think for ISO BMFF it is sufficient to define how the beginning of a segment is detected - this implicitly signals the end of the preceding segment.

I propose that the beginning of a media segment is signalled by a moof box header and the beginning of an Initialization segment is signalled by any box header which is not moof or mdat.
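
A minimal sketch of the rule proposed above, for illustration only. It assumes the whole byte stream is already buffered and that every box uses a plain 32-bit size field. The names `box`, `top_level_boxes` and `segment_starts` are hypothetical, and treating consecutive non-moof/mdat boxes as one Initialization segment is an assumption of the sketch, not something stated in the proposal.

```python
import struct


def box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a minimal ISO BMFF box: 32-bit size, 4-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


def top_level_boxes(data: bytes):
    """Yield (box_type, offset, size) for each top-level box header in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:
            # size 0 (box runs to end of stream) and size 1 (64-bit largesize)
            # are deliberately left out of this sketch.
            raise ValueError(f"unsupported box size {size} at offset {offset}")
        yield box_type.decode("ascii"), offset, size
        offset += size


def segment_starts(data: bytes):
    """Return [(kind, offset)], where kind is 'init' or 'media'."""
    starts = []
    current = None  # kind of the segment currently being parsed
    for box_type, offset, _size in top_level_boxes(data):
        if box_type == "mdat":
            continue  # mdat never starts a segment
        if box_type == "moof":
            kind = "media"  # every moof header opens a new media segment
        else:
            kind = "init"
            if current == "init":
                continue  # assumption: further boxes extend the same init segment
        starts.append((kind, offset))
        current = kind
    return starts
```

With this rule, the end of each segment is simply the offset at which the next one starts, so the last segment in a stream has no known end until more data (or end of stream) arrives.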
Comment 2 Steven Robertson 2012-09-20 17:46:42 UTC
For Chrome, the current definitions we used are:

- Initialization Segment is a complete 'moov' box. Start is signaled by the 'moov' header, end is signaled by getting the number of bytes signaled in the 'moov' header.

- Media Segments begin with a 'moof' box header. Once the header is received, the furthest extent of the media data referenced in that 'moof' may be calculated. The end of the media segment is considered the end of the 'mdat' atom containing the furthest byte referenced by the 'moof'.

This avoids any heuristics about e.g. number of 'mdat' atoms per segment, while still being able to identify the end of a media segment at the actual end of that segment, instead of waiting for extra data like a new box header or an explicit EOS.

(Under Mark's proposal, the 'sidx' atom would be considered the start of an initialization segment. Since DASH Live profile typically uses a new 'sidx' atom in front of every 'moof', this would break, since an initialization segment must contain a 'moov'.)
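
A rough sketch of these definitions, not Chrome's actual code. `read_box_header`, `init_segment_end` and `media_segment_end` are hypothetical names. The sketch walks past any boxes that precede the 'moov' and takes the furthest byte referenced by the 'moof' as an input, because deriving it from the 'tfhd' and 'trun' boxes is left out here.

```python
import struct


def read_box_header(buf: bytes, offset: int = 0):
    """Return (box_type, header_len, total_size), or None if more bytes are needed.

    total_size is None when the size field is 0, meaning the box extends to
    the end of the stream and therefore has no byte boundary yet.
    """
    if len(buf) - offset < 8:
        return None
    size, box_type = struct.unpack_from(">I4s", buf, offset)
    header_len = 8
    if size == 1:
        # A size of 1 means the real size follows as a 64-bit largesize field.
        if len(buf) - offset < 16:
            return None
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_len = 16
    elif size == 0:
        return box_type, header_len, None
    if size < header_len:
        raise ValueError("box size is smaller than its own header")
    return box_type, header_len, size


def init_segment_end(buf: bytes):
    """Return the offset just past the 'moov' box, or None until it is fully buffered."""
    offset = 0
    while True:
        header = read_box_header(buf, offset)
        if header is None:
            return None  # header not complete yet; wait for the next append
        box_type, _header_len, size = header
        if size is None:
            return None  # open-ended box: boundary cannot be determined
        if box_type == b"moov":
            end = offset + size
            return end if len(buf) >= end else None
        offset += size  # skip boxes that precede the moov


def media_segment_end(mdat_ranges, furthest_byte):
    """Return the end of the 'mdat' containing furthest_byte, or None if not yet seen.

    mdat_ranges is a list of (start, end) offsets with end exclusive;
    furthest_byte is assumed to have been computed from the 'moof' already.
    """
    for start, end in mdat_ranges:
        if start <= furthest_byte < end:
            return end
    return None
```

Both lookups return None while data is still missing, so a caller can retry them after each append without waiting for the next box header.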
Comment 3 Mark Watson 2012-09-20 18:08:45 UTC
(In reply to comment #2)
> For Chrome, the current definitions we used are:
> 
> - Initialization Segment is a complete 'moov' box. Start is signaled by the
> 'moov' header, end is signaled by getting the number of bytes signaled in the
> 'moov' header.

There may be other boxes ahead of the moov. ftyp, styp for example, that should be considered part of the Initialization Segment (even if you do not use them).

> 
> - Media Segments begin with a 'moof' box header. Once the header is received,
> the furthest extent of the media data referenced in that 'moof' may be
> calculated. The end of the media segment is considered the end of the 'mdat'
> atom containing the furthest byte referenced by the 'moof'.
> 
> This avoids any heuristics about e.g. number of 'mdat' atoms per segment, while
> still being able to identify the end of a media segment at the actual end of
> that segment, instead of waiting for extra data like a new box header or an
> explicit EOS.
> 
> (Under Mark's proposal, the 'sidx' atom would be considered the start of an
> initialization segment. Since DASH Live profile typically uses a new 'sidx'
> atom in front of every 'moof', this would break, since an initialization
> segment must contain a 'moov'.)

Whilst DASH allows you to distribute sidx boxes around the file, I would expect them to be parsed by the script - in order to find the positions of the media segments - and not passed to the Media Source.

You could apply the rules I suggested (or those rules together with your rule about moof box references) with the assumption that unrecognized boxes, including sidx, are ignored by the UA. So if you get <box> <box> <box> <moov> (where <box> is an unrecognized box or sidx) you consider the "first box" to be the moov.
Comment 4 Steven Robertson 2012-09-20 19:02:03 UTC
(In reply to comment #3)
> Whilst DASH allows you to distribute sidx boxes around the file, I would expect
> them to be parsed by the script - in order to find the positions of the media
> segments - and not passed to the Media Source.

Sure, that's common for VOD, but for the typical case in live media, each live DASH segment (sidx-moof-mdat*) is provided at a separate URL, which serves as sufficient framing and thus removes the requirement for the app to parse the media. In this case, I would suggest that implementors anticipate receiving (and ignoring) 'sidx' atoms.
 
> You could apply the rules I suggested (or those rules together with your rule
> about moof box references) with the assumption that unrecognized boxes,
> including sidx, are ignored by the UA. 

Agreed! Chrome's implementation ignores every recognized top-level box apart from 'moov', 'moof', and 'mdat'. (It rejects any top-level box not defined in any version of 14496-12, since we've found it very difficult for authors to track down the source of framing errors on append without that.)

> So if you get <box> <box> <box> <moov>
> (where <box> is an unrecognized box or sidx) you consider the "first box" to be
> the moov.

IIUC: you're suggesting that for the purposes of avoiding adding a fourth conceptual parse state ('parsing something which is neither an initialization nor a media segment', in addition to 'parsing an initialization segment', 'parsing a media segment', and 'not parsing anything / at a parse-frame boundary'), consider unrecognized boxes to be a prefix of the init/media segment that they precede? If so, I agree here as well.

(I'm less a fan of suffixes, since for a hypothetical parser that did not do incremental parsing and instead waited for complete media segments, and a hypothetical app that only appended complete segments, a full extra segment would be required before data would be visible.)
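
To make the prefix idea concrete, here is a small sketch that groups a sequence of top-level box types into segments. It is an illustration under assumptions: `group_segments` is a hypothetical name, `IGNORED_BOXES` is a short, non-exhaustive stand-in for "recognized boxes that are ignored", and attaching an ignored box that sits between a 'moof' and its 'mdat' to the current media segment is a choice made only for this sketch.

```python
# Illustrative, non-exhaustive subset of top-level boxes that would be skipped.
IGNORED_BOXES = {"ftyp", "styp", "sidx", "free", "skip"}


def group_segments(box_types):
    """Group top-level box types into ('init' | 'media', [types]) entries.

    Returns (segments, pending), where pending holds ignored boxes that have
    not yet been followed by the segment they would be a prefix of.
    """
    segments = []
    pending = []  # ignored boxes waiting for the segment they precede
    for box_type in box_types:
        if box_type in IGNORED_BOXES:
            pending.append(box_type)
        elif box_type == "moov":
            segments.append(("init", pending + [box_type]))
            pending = []
        elif box_type == "moof":
            segments.append(("media", pending + [box_type]))
            pending = []
        elif box_type == "mdat":
            if not segments or segments[-1][0] != "media":
                raise ValueError("mdat without a preceding moof")
            # Sketch-only choice: boxes between moof and mdat stay in this segment.
            segments[-1][1].extend(pending + [box_type])
            pending = []
        else:
            # Unknown top-level boxes are rejected rather than skipped.
            raise ValueError(f"unknown top-level box {box_type!r}")
    return segments, pending
```

Because ignored boxes are only ever a prefix, a segment is complete as soon as its own last box arrives; a trailing 'sidx' simply waits in `pending` for the next 'moof'.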
Comment 5 Aaron Colwell (c) 2012-09-21 20:48:04 UTC
This all sounds reasonable to me. Mark or Steve, would you be willing to propose some text for Section 7.3 to clarify this?

Haakon, do you think anything needs to be updated in the WebM section?
Comment 6 Philip Jägenstedt 2012-09-24 08:26:01 UTC
We don't have any issue with the WebM section right now; we'll open new bugs in the future if we find something.
Comment 7 Adrian Bateman [MSFT] 2012-10-21 15:47:08 UTC
Assigned to Mark to work with Steven on text updates.
Comment 8 Aaron Colwell (c) 2012-12-28 21:28:25 UTC
Mark and/or Steve, please provide some new text for the spec that reflects what you have agreed upon here. 

Step 2 of the segment parser loop algorithm already provides a mechanism for skipping bytes that the bytestream specifications say should be ignored. I think you just need to provide text for the ISOBMFF section that outlines what top-level elements should be ignored.
Comment 9 Aaron Colwell (c) 2013-02-05 22:35:47 UTC
Changes committed.
https://dvcs.w3.org/hg/html-media/rev/77975abeec41

Added text about skipping top-level boxes for ISO-BMFF.