Bug 18642 - Handle timestamp overflow in append(data)
Status: RESOLVED FIXED
Product: HTML WG
Classification: Unclassified
Component: Media Source Extensions
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assigned To: Aaron Colwell (c)
QA Contact: HTML WG Bugzilla archive list
Depends on:
Blocks: 18400
Reported: 2012-08-21 14:34 UTC by Philip Jägenstedt
Modified: 2012-12-28 21:13 UTC (History)
5 users



Description Philip Jägenstedt 2012-08-21 14:34:25 UTC
http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#dom-append

There's already a check to trigger MEDIA_ERR_DECODE when any modified timestamp would underflow (< 0), but nothing about how to handle overflow. This is particularly likely with 32-bit timestamp fields, but can of course happen regardless of field width.

Suggestion: If any modified timestamp would overflow in the representation used in the byte stream, trigger a MEDIA_ERR_DECODE.

In addition, the byte stream specs could place the necessary requirements to make this situation as unlikely as possible.
Comment 1 Steven Robertson 2012-08-21 15:26:01 UTC
> In addition, the byte stream specs could place the necessary requirements 
> to make this situation as unlikely as possible.

It is not possible in the general case to incrementally produce a specification-compliant byte stream which represents the result of compositing media data from multiple independent streams into a source buffer. (For instance, a single ISO BMFF stream cannot contain multiple 'moov' atoms, and therefore cannot be updated with new codec descriptions or timescales on the fly.)

As a result, implementations cannot produce a decodable media stream by simply tweaking incoming media segments in place. Therefore, no implementation should be limited in its representation of timecodes by the format of the incoming byte stream.
Comment 2 Philip Jägenstedt 2012-08-23 11:46:18 UTC
"Find all timestamps inside data and add timestampOffset to them" certainly sounds like it's suggesting to rewrite the byte stream. The alternative would be to keep the offset as metadata together with the segment to be handled by a customized demuxer at the time of playback, but in that case the quoted step makes no sense as part of append().
Comment 3 Bob Lund 2012-08-28 21:44:12 UTC
(In reply to comment #0)
> http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#dom-append
> 
> There's already a check to trigger MEDIA_ERR_DECODE when any modified timestamp
> would underflow (< 0) but nothing about how to handle overflow. This is
> particularly likely with 32-bit timestamp fields, but can of course happen
> regardlessly.
> 
> Suggestion: If any modified timestamp would overflow in the representation used
> in the byte stream, trigger a MEDIA_ERR_DECODE.
> 
> In addition, the byte stream specs could place the necessary requirements to
> make this situation as unlikely as possible.

The MPEG-2 TS Presentation Time Stamp and Decode Time Stamp fields are 33-bit counts of a 90 kHz clock, which will roll over every 26.5 hours. This is a normal event in a linear (e.g. broadcast) video channel. This does not represent a decode error.
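The 26.5-hour figure follows directly from the field width and clock rate; a quick sketch of the arithmetic (illustrative only, not from any spec):

```javascript
// MPEG-2 TS PTS/DTS: a 33-bit tick count driven by a 90 kHz clock.
const TS_CLOCK_HZ = 90000;
const PTS_WRAP_TICKS = 2 ** 33; // counter wraps at 2^33 ticks

const rolloverHours = PTS_WRAP_TICKS / TS_CLOCK_HZ / 3600;
console.log(rolloverHours.toFixed(1)); // "26.5"
```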

It is not clear if/how a receiver could make this situation unlikely while still remaining compliant with the transport stream specification.
Comment 4 Philip Jägenstedt 2012-08-29 07:59:49 UTC
(In reply to comment #3)
> (In reply to comment #0)
> > http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#dom-append
> > 
> > There's already a check to trigger MEDIA_ERR_DECODE when any modified timestamp
> > would underflow (< 0) but nothing about how to handle overflow. This is
> > particularly likely with 32-bit timestamp fields, but can of course happen
> > regardlessly.
> > 
> > Suggestion: If any modified timestamp would overflow in the representation used
> > in the byte stream, trigger a MEDIA_ERR_DECODE.
> > 
> > In addition, the byte stream specs could place the necessary requirements to
> > make this situation as unlikely as possible.
> 
> The MPEG-2 TS Presentation Time Stamp and Decode Time Stamp fields are 33bit
> counts of a 90KHz clock, which will rollover every 26.5 hours. This is a normal
> event in a linear (e.g. broadcast) video channel. This does not represent a
> decode error.

It should be the byte stream specification that defines what timestamps can be represented. If MPEG-2 has no such restrictions, nothing would be considered overflowing and the condition would never be hit.

> It is not clear if/how a receiver could make this situation unlikely while
> still remaining compliant with the transport stream specification.

The receiver cannot do anything. What I mean is that the byte stream specs should recommend using the largest possible timestamps (I've been told that MPEG-4 supports both 32 and 64 bits) and as large-grained a timescale as possible. But this is just spec fluff, the main thrust of this bug is to handle overflow on the receiving side for MPEG-4 and WebM.
Comment 5 Aaron Colwell (c) 2012-08-30 02:05:37 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #0)
> > > http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#dom-append
> > > 
> > > There's already a check to trigger MEDIA_ERR_DECODE when any modified timestamp
> > > would underflow (< 0) but nothing about how to handle overflow. This is
> > > particularly likely with 32-bit timestamp fields, but can of course happen
> > > regardlessly.
> > > 
> > > Suggestion: If any modified timestamp would overflow in the representation used
> > > in the byte stream, trigger a MEDIA_ERR_DECODE.
> > > 
> > > In addition, the byte stream specs could place the necessary requirements to
> > > make this situation as unlikely as possible.

I agree. I believe text should be added to encourage using the largest timestamps possible to avoid overflow.

> > 
> > The MPEG-2 TS Presentation Time Stamp and Decode Time Stamp fields are 33bit
> > counts of a 90KHz clock, which will rollover every 26.5 hours. This is a normal
> > event in a linear (e.g. broadcast) video channel. This does not represent a
> > decode error.
> 
> It should be the byte stream specification that defines what time stamps can be
> represented. If MPEG-2 has no such restrictions, nothing would be considered
> overflowing and the condition will not be hit.

I believe Bob is saying that this WILL happen about once a day in MPEG-2 TS deployments so it needs to be addressed.

> 
> > It is not clear if/how a receiver could make this situation unlikely while
> > still remaining compliant with the transport stream specification.
> 
> The receiver cannot do anything. What I mean is that the byte stream specs
> should recommend using the largest possible timestamps (I've been told that
> MPEG-4 supports both 32 and 64 bits) and as large-grained a timescale as
> possible. But this is just spec fluff, the main thrust of this bug is to handle
> overflow on the receiving side for MPEG-4 and WebM.

I believe the best compromise is to require support for a rollover within a single media segment. It should only do this if it has exhausted the maximum available bits it can use to represent a timestamp (i.e. 64 for MP4, 56 for WebM, 33 for MPEG2-TS). If such a rollover happens within a media segment, it is pretty straightforward what to do: just add 2^x to the timestamps that have rolled over. It is the web application's responsibility to detect this case, though, and adjust SourceBuffer.timestampOffset to an appropriate value before appending the next segment so that the next timestamps map to the correct position in the timeline.

Since MPEG2-TS is the most frequent rollover example we are dealing with here, I believe it would be pretty easy for the web application to periodically inspect the bitstream and predict when it should expect a rollover. When that time approaches, it can inspect every media segment it appends until it finds where the rollover happens and then update SourceBuffer.timestampOffset at the appropriate time. This should only happen for a brief period once a day, so I don't think it should be a huge performance concern.
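The application-side bookkeeping this implies could be sketched roughly as follows (illustrative only; `makeTracker` and `offsetForSegment` are made-up names, and the segment's first PTS is assumed to already be demuxed into seconds):

```javascript
const PTS_WRAP_SECONDS = (2 ** 33) / 90000; // ~26.5 h for MPEG-2 TS

// Tracks the rollover state across appends.
function makeTracker() {
  return { lastPts: -Infinity, wrapCount: 0 };
}

// Given the first PTS (in seconds, still wrapped modulo 2^33 ticks) of the
// segment about to be appended, return the timestampOffset the app should set
// before appending. A large backwards jump between segments the app knows to
// be adjacent is treated as a rollover.
function offsetForSegment(tracker, firstPts) {
  if (firstPts < tracker.lastPts &&
      tracker.lastPts - firstPts > PTS_WRAP_SECONDS / 2) {
    tracker.wrapCount += 1;
  }
  tracker.lastPts = firstPts;
  return tracker.wrapCount * PTS_WRAP_SECONDS;
}

// Example: a segment just before the wrap, then one wrapped back near zero.
const t = makeTracker();
console.log(offsetForSegment(t, PTS_WRAP_SECONDS - 10)); // 0
console.log(offsetForSegment(t, 2)); // ~95443.7 (one wrap applied)
```

Only the application can do this, since (as discussed below) only it knows whether two segments are meant to be adjacent.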

What do you think?
Comment 6 Steven Robertson 2012-08-30 03:15:59 UTC
Sorry to be nit-picky here, but since our organization spends an astonishing amount of resources annually to re-encode perfectly compliant files in order to work around broken video stacks, I'm pretty keen on hammering down the details regarding media formats ;)

(In reply to comment #5)
> > > > In addition, the byte stream specs could place the necessary requirements to
> > > > make this situation as unlikely as possible.
> 
> I agree. I believe text should be added to encourage using the largest
> timestamps possible to avoid overflow.

I'm not sure I understand why this would be helpful.

Field-size decisions are local to a box in BMFF, and local to a field in EBML. This means that any media segment can be muxed to the maximum field width should its local timestamps overflow smaller field widths, regardless of previous timestamp sizes.

The only application I can see that might require modifying these values in-place would be to rewrite timestamps from JavaScript. This is, however, precisely the manipulation that the offset mechanism obviates. As a result, there does not seem to be a use for suggesting oversized timestamps in media formats from the JavaScript layer. Is there another use of rewriting values in-place that I'm not seeing?

As previously mentioned, it is not possible to implement a Source Buffer without demuxing media segments and storing metadata in a private, sample-based format, in order to support mandatory spec features such as incremental parsing of segments, 'abort()' of partially-appended media segments, and segment overlapping. Is there an implementation reason why such a private format would have a precision limited by the precision of the incoming media format?

Even if the implementation of the source buffer is only loosely coupled to the DOM, these Media Source-specific features require additional side-channel information beyond the media data pushed into the Source Buffer. If this information is provided via a side channel, it seems reasonable to me that the timestamp offset could be provided in the same side channel. In that case, there does not seem to be a reason to rewrite timestamps in place at the implementation level. Is there a reason why information like 'abort()' calls could be transmitted to the demuxer, but not information like the timestamp offset?

Is there a reason why spec-compliant files which did not use the maximum field width for all fields would result in a degraded experience beyond those discussed above?
Comment 7 Aaron Colwell (c) 2012-08-30 05:32:26 UTC
(In reply to comment #6)
> Sorry to be nit-picky here, but since our organization spends an astonishing
> amount of resources annually to re-encode perfectly compliant files in order to
> work around broken video stacks, I'm pretty keen on hammering down the details
> regarding media formats ;)
> 
> (In reply to comment #5)
> > > > > In addition, the byte stream specs could place the necessary requirements to
> > > > > make this situation as unlikely as possible.
> > 
> > I agree. I believe text should be added to encourage using the largest
> > timestamps possible to avoid overflow.
> 
> I'm not sure I understand why this would be helpful.
> 
> Field-size decisions are local to a box in BMFF, and local to a field in EBML.
> This means that any media segment can be muxed to the maximum field width
> should its local timestamps overflow smaller field widths, regardless of
> previous timestamp sizes.

I agree. I wasn't trying to say that the maximum timestamp field size must always be used. I was trying to say that we should only allow rollover if all possible bits have been exhausted. WebM allows 7 to 56 bits to be used, and I'd expect a muxer to use 7 bits for times near 0 and more bits later in the file. I'd only expect a rollover once the timestamps exhausted the available 56 bits.

> 
> The only application I can see that might require modifying these values
> in-place would be to rewrite timestamps from JavaScript. This is, however,
> precisely the manipulation that the offset mechanism obviates. As a result,
> there does not seem to be a use for suggesting oversized timestamps in media
> formats from the JavaScript layer. Is there another use of rewriting values
> in-place that I'm not seeing?

I was not intending to imply byte stream rewriting; it appears I need to clarify that text. I intended this to mean that the timestampOffset is applied to whatever timestamps result from demuxing the appended data. I'm assuming that the UA uses more than 32 bits to represent timestamps internally. I'll clarify this as well.

> 
> As previously mentioned, it is not possible to implement a Source Buffer
> without demuxing media segments and storing metadata in a private, sample-based
> format, in order to support mandatory spec features such as incremental parsing
> of segments, 'abort()' of partially-appended media segments, and segment
> overlapping. Is there an implementation reason why such a private format would
> have a precision limited by the precision of the incoming media format?

No. My intent was not to limit the presentation timestamp range to only what can be represented by the format. All the time-related fields in HTMLMediaElement are doubles, so I assumed that the UA would take advantage of that sizable range. Any UA that properly supports long-running live broadcasts in a format with only 32-bit timestamp fields should already deal with this and represent timestamps with more than 32 bits.

> 
> Even if the implementation of the source buffer is only loosely coupled to the
> DOM, these features require additional side-channel information for these Media
> Source-specific elements of a spec in addition to the media data pushed into
> the Source Buffer. If this information is provided via a side-channel, it seems
> reasonable to me that the information regarding the timestamp offset could be
> provided in the same side-channel. In that case, there does not seem to be a
> reason to rewrite timestamps in place at the implementation level. Is there a
> reason why information like 'abort()' calls could be transmitted to the
> demuxer, but not information like the timestamp offset?

This seems like an implementation detail that a UA needs to sort out. In Chrome we do not rewrite the byte stream data. We demux the data and then apply the offset to our internal timestamp representation, which happens to be a 64-bit integer. It was never my intent for a decoder error to occur if applying a timestamp offset would cause a rollover in the format's timestamp representation. It was my intent to have the format timestamp representation converted to a UA-internal representation which eventually gets exposed as a double through TimeRanges. I think it would be very surprising to a developer if the range of a double were truncated in some way by the byte stream format being used.
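Roughly, that pipeline looks like this (an illustrative sketch, not Chrome code; BigInt stands in for the internal 64-bit integer, a 90 kHz stream is assumed, and all function names are made up):

```javascript
const TIMESCALE = 90000; // 90 kHz stream assumed for this sketch

// Demuxed format timestamp (ticks) -> internal 64-bit representation.
// The byte stream's 33-bit field width no longer matters at this point.
function toInternalTicks(formatTicks, timestampOffsetSeconds) {
  const offsetTicks = BigInt(Math.round(timestampOffsetSeconds * TIMESCALE));
  return BigInt(formatTicks) + offsetTicks;
}

// Internal ticks -> the double eventually exposed through TimeRanges.
function toPresentationSeconds(internalTicks) {
  return Number(internalTicks) / TIMESCALE;
}

// A wrapped PTS plus a one-rollover offset lands past the 26.5 h mark:
const ticks = toInternalTicks(180000, (2 ** 33) / 90000); // 2 s into wrap #2
console.log(toPresentationSeconds(ticks)); // ~95445.7 seconds
```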

> 
> Is there a reason why spec-compliant files which did not use the maximum field
> width for all fields would result in a degraded experience beyond those
> discussed above?
If a format could use 64 bits but chose to wrap around at 32 bits, this would create an ambiguous situation for the UA. Within a single media segment it isn't ambiguous, but across media segments it is, because there is no way to determine whether the segment is actually a rollover or just an out-of-order append. Only the web application is in a position to determine what is going on, because it knows whether the two media segments are supposed to be adjacent or not.
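To illustrate the ambiguity with hypothetical numbers (a 32-bit wrap and a 90 kHz timescale are assumed):

```javascript
// A timestamp that wrapped at 32 bits, seen across two appended segments:
const WRAP_32 = 2 ** 32 / 90000;        // ~47721.9 s
const prevSegmentEnd = WRAP_32 - 5;     // previous segment ended near the wrap
const nextSegmentStart = 3;             // next segment starts near zero

// Interpretation A: rollover -- the new segment continues the timeline.
const continuedPosition = nextSegmentStart + WRAP_32; // ~47724.9 s
// Interpretation B: out-of-order append -- the segment really belongs at 3 s.
const seekPosition = nextSegmentStart;                // 3 s

// The demuxed bytes are identical either way; only the application knows
// which was intended, which is why timestampOffset is its responsibility.
console.log(continuedPosition > prevSegmentEnd); // true
console.log(seekPosition < prevSegmentEnd);      // true
```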
Comment 8 Philip Jägenstedt 2012-09-12 13:40:38 UTC
We've abandoned the strategy of rewriting the byte stream in favor of data+offset, which makes this bug invalid. Leaving open in case there's any clarification the editor wants to make, since the spec does invite such a misunderstanding by saying things like "Find all timestamps inside data and add timestampOffset to them."
Comment 9 Aaron Colwell (c) 2012-09-21 21:23:00 UTC
(In reply to comment #8)
> We've abandoned the strategy of rewriting the byte stream in favor of
> data+offset, which makes this bug invalid. Leaving open in case there's any
> clarification the editor wants to make, since the spec does invite such a
> misunderstanding by saying things like "Find all timestamps inside data and add
> timestampOffset to them."

Ok. I definitely think you are right about this text being confusing. I was trying to stay format neutral and not bias the text towards Chrome implementation details here, but clearly I failed to communicate my intent. I'll figure out a way to make this clearer.
Comment 10 Adrian Bateman [MSFT] 2012-10-21 16:19:47 UTC
Issues identified here:
1. The text needs to be reworked so people don't think that the bytestream is being rewritten in place.
2. Timestamp rollover behavior needs to be clarified. I believe the consensus was that rollover within a media segment must be handled, but it is up to the application to handle it across media segments via timestampOffset.
3. Conversion from bytestream timestamp to presentation timestamp double needs clarification.

Next Action: Reorganize append() algorithm into several sub-algorithms so that there are logical places to put the clarifications mentioned above. Several open issues require this.

Assigned to Aaron.
Comment 11 Aaron Colwell (c) 2012-12-28 21:13:02 UTC
I believe changes made in the following revisions have addressed the outstanding issues.

http://dvcs.w3.org/hg/html-media/rev/e1c91093dfdc
- Clarifies how MPEG2-TS timestamp rollover should be handled.

http://dvcs.w3.org/hg/html-media/rev/4d013fe2dbec
- Introduces a coded frame processing algorithm that explicitly states that byte stream timestamps are converted to doubles before the timestampOffset is applied.
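
The order of operations that second revision pins down can be sketched as (illustrative only; the function name is made up):

```javascript
// Convert byte stream ticks to a double first, then apply timestampOffset
// (itself a double, in seconds), per the coded frame processing algorithm.
function presentationTime(ticks, timescale, timestampOffset) {
  return ticks / timescale + timestampOffset;
}

// e.g. a sample at 90000 ticks with timescale 30000, shifted 10 s:
console.log(presentationTime(90000, 30000, 10)); // 13
```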