Bug 18615 - Define how SourceBuffer.buffered maps to HTMLMediaElement.buffered
Status: RESOLVED FIXED
Product: HTML WG
Classification: Unclassified
Component: Media Source Extensions
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assigned To: Aaron Colwell (c)
QA Contact: HTML WG Bugzilla archive list
Depends on:
Blocks:
Reported: 2012-08-17 23:11 UTC by Aaron Colwell (c)
Modified: 2013-01-31 19:56 UTC
CC: 4 users

Description Aaron Colwell (c) 2012-08-17 23:11:17 UTC
The current spec text does not specify how HTMLMediaElement.buffered is calculated.

Here is a proposal based on what we currently have implemented in Chrome.

- While readyState is "open", HTMLMediaElement.buffered is the intersection of the SourceBuffer.buffered ranges of all SourceBuffer objects in MediaSource.activeSourceBuffers.
- While readyState is "ended", HTMLMediaElement.buffered is the same intersection, except that the end time of the last range in the intersection is extended to the highest end time across all SourceBuffer.buffered ranges in MediaSource.activeSourceBuffers.

The behavior differs based on readyState because, in the "open" state, a difference between SourceBuffer.buffered ranges could indicate missing data that should stall playback. Once there has been a transition to "ended", the application has signalled that it has appended everything it wants played, so a mismatch in the last ranges simply indicates different stream lengths, not missing data.
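A minimal sketch of this proposal, assuming buffered ranges are modeled as sorted, non-overlapping [start, end] pairs (a hypothetical stand-in for TimeRanges; the helper names are mine, not the spec's):

```javascript
// Hypothetical sketch only: buffered ranges modeled as sorted [start, end]
// pairs rather than real TimeRanges objects.

// Intersect two sorted, non-overlapping range lists.
function intersect(a, b) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    const start = Math.max(a[i][0], b[j][0]);
    const end = Math.min(a[i][1], b[j][1]);
    if (start < end) out.push([start, end]);
    // Advance whichever list's current range ends first.
    if (a[i][1] < b[j][1]) i++; else j++;
  }
  return out;
}

// activeRanges: one non-empty range list per SourceBuffer in activeSourceBuffers.
function mediaBuffered(activeRanges, readyState) {
  if (activeRanges.length === 0) return [];
  let result = activeRanges[0];
  for (const ranges of activeRanges.slice(1)) result = intersect(result, ranges);
  if (readyState === "ended" && result.length > 0) {
    // Extend the last intersection range to the highest end time of any buffer.
    const highestEnd = Math.max(...activeRanges.map(r => r[r.length - 1][1]));
    const last = result[result.length - 1];
    result = result.slice(0, -1).concat([[last[0], Math.max(last[1], highestEnd)]]);
  }
  return result;
}

// Audio buffered [0,15], video buffered [0,10] and [20,30]:
// "open"  → [[0, 10]]
// "ended" → [[0, 30]]  (note the video gap at [10,20] is no longer visible)
```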
Comment 1 Philip Jägenstedt 2012-08-20 09:28:35 UTC
That sounds sensible, but I think there's a bug in the definition for "ended". If the video SourceBuffer has [0,10] and [20,30] buffered and the audio SourceBuffer has [0,15] buffered, does it really make sense to report [0,30] even though no data at all is available for [15,20]?
Comment 2 Aaron Colwell (c) 2012-10-05 18:01:30 UTC
(In reply to comment #1)
> That sounds sensible, but I think there's a bug in the definition for "ended".
> If the video SourceBuffer has [0,10] and [20,30] buffered and the audio
> SourceBuffer has [0,15] buffered, does it really make sense to report [0,30]
> even though no data at all is available for [15,20]?

You're right. I can think of two routes to avoid this situation.

Solution 1:
In MediaSource.endOfStream(), check to make sure that the last ranges in the activeSourceBuffers overlap and throw an exception if they don't. This would prevent a transition to "ended" when this situation occurs.

Solution 2:
1. Collect all the ranges that intersect and then find the highest end timestamp of that set. 
2. remove(highest end timestamp, duration) 
3. Update duration to highest end timestamp.
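Solution 2 could be sketched as follows, with buffered ranges again modeled as plain sorted [start, end] pair lists and remove() modeled as simple truncation (hypothetical names, not the real SourceBuffer API):

```javascript
// Hypothetical model: range lists are sorted, non-overlapping [start, end] pairs.
function intersect(a, b) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    const start = Math.max(a[i][0], b[j][0]);
    const end = Math.min(a[i][1], b[j][1]);
    if (start < end) out.push([start, end]);
    if (a[i][1] < b[j][1]) i++; else j++;
  }
  return out;
}

function endOfStreamTrim(buffers) {
  // Step 1: intersect all buffered ranges and find the highest end timestamp.
  let common = buffers[0];
  for (const ranges of buffers.slice(1)) common = intersect(common, ranges);
  const highestEnd = common.length > 0 ? common[common.length - 1][1] : 0;
  // Step 2: remove(highest end timestamp, duration), modeled here as truncation.
  const trimmed = buffers.map(ranges =>
    ranges.filter(([start]) => start < highestEnd)
          .map(([start, end]) => [start, Math.min(end, highestEnd)]));
  // Step 3: update duration to the highest end timestamp.
  return { trimmed, duration: highestEnd };
}
```

On the example from Comment 1 (video [0,10] and [20,30]; audio [0,15]), this trims everything past 10 from both buffers and sets the duration to 10.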

Currently Chrome implements solution 1 with the added criterion that the current playback position must be in the last range; if it isn't, then endOfStream() throws an exception. This was to prevent playback from stalling if there was a gap in the buffered data between the current position and the last ranges. This second condition may not be necessary anymore, since you can now append() your way out of "ended".

What do you think? Do you have a preference here or an alternate suggestion?
Comment 3 Philip Jägenstedt 2012-10-08 09:03:01 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > That sounds sensible, but I think there's a bug in the definition for "ended".
> > If the video SourceBuffer has [0,10] and [20,30] buffered and the audio
> > SourceBuffer has [0,15] buffered, does it really make sense to report [0,30]
> > even though no data at all is available for [15,20]?
> 
> You're right. I can think of two routes to avoid this situation.
> 
> Solution 1:
> In MediaSource.endOfStream(), check to make sure that the last ranges in the
> activeSourceBuffers overlap and throw an exception if they don't. This would
> prevent a transition to "ended" when this situation occurs.

Would you do this only for endOfStream(), or also for endOfStream("network")? In the network error case, it doesn't really make sense.

> Solution 2:
> 1. Collect all the ranges that intersect and then find the highest end
> timestamp of that set. 
> 2. remove(highest end timestamp, duration) 
> 3. Update duration to highest end timestamp.
> 
> Currently Chrome implements solution 1 with the added criteria that the current
> playback position must be in the last range. If it isn't then endOfStream()
> throws an exception. This was to prevent playback from getting stalled if there
> was a gap in the buffered data between the current position and the last
> ranges. This second condition may not be necessary anymore since you can
> append() your way out of "ended" now.
> 
> What do you think? Do you have a preference here or an alternate suggestion?

I do think it would be nice to have the same logic for all cases of endOfStream, to reduce the overall complexity. In other words, something like solution 2, where data that cannot be used is thrown away, sounds slightly preferable. However, I don't feel strongly about this; it is also a question of which behavior would be most useful to authors.
Comment 4 Aaron Colwell (c) 2012-10-08 19:47:42 UTC
HTMLMediaElement.buffered return value text changes committed.
http://dvcs.w3.org/hg/html-media/rev/c3de559a1c37

endOfStream() behavior will be updated in a followup change.
Comment 5 Philip Jägenstedt 2012-10-12 11:56:01 UTC
The new spec text is in http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#htmlmediaelement-attributes

The important bit is "If the highest intersection end time is less than the highest end time, then update the intersection range so that the highest intersection end time equals the highest end time." As far as I can tell, this has the same problem as noted in Comment 1, as the highest intersection end time will be 10 and the highest end time will be 30.

It's also ambiguous what "so that the highest intersection end time equals the highest end time" means, since it could mean either modifying the last range (intended) or adding a new range (not intended). Talking about the end time of the last range, and modifying that range, could make this clearer.
Comment 6 Adrian Bateman [MSFT] 2012-10-22 00:53:27 UTC
Assigned to Aaron to work with Philip to identify and remove any ambiguity.
Comment 7 Adrian Bateman [MSFT] 2013-01-29 03:55:02 UTC
Our proposal is as follows:

* The HTMLMediaElement.buffered range is the union of the time ranges of all the active audio and video ranges in the activeSourceBuffers collection.

* If there is video content but not audio then fill with silence.

* If there is audio content but not video content then fill with the final frame of video before the gap and play the audio.

* If there is no content for a range then stall the playback waiting for the application to fill it.

* We don't think this should change based on endOfStream.
Comment 8 Aaron Colwell (c) 2013-01-29 18:26:15 UTC
Just for the record, here is the updated algorithm that I proposed to the list.
http://lists.w3.org/Archives/Public/public-html-media/2013Jan/0021.html

I believe it addresses the issue that Philip raised.

1. If activeSourceBuffers.length equals 0 then return an empty TimeRanges object
   and abort these steps.
2. Let active ranges be the ranges returned by buffered for
   each SourceBuffer object in activeSourceBuffers.
3. Let highest end time be the largest range end time in the active
   ranges.
4. Let intersection ranges equal a TimeRange from 0 to highest end time.
5. For each SourceBuffer object in activeSourceBuffers run the following steps:
      1. Let source ranges equal the ranges returned by the buffered attribute on the current SourceBuffer object.
      2. If readyState is "ended", then set the end time on the last range in source ranges to highest end time.
      3. Let new intersection ranges equal the intersection between the intersection ranges and the source ranges.
      4. Replace the ranges in intersection ranges with the new intersection ranges.
6. Return the intersection ranges.
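For illustration, here is a direct transcription of the steps above, with TimeRanges modeled as sorted [start, end] pair lists (a hypothetical representation; the helper names are mine, not the spec's):

```javascript
// Hypothetical model: each range list is sorted, non-overlapping [start, end] pairs.
function intersect(a, b) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    const start = Math.max(a[i][0], b[j][0]);
    const end = Math.min(a[i][1], b[j][1]);
    if (start < end) out.push([start, end]);
    if (a[i][1] < b[j][1]) i++; else j++;
  }
  return out;
}

// activeRanges: one non-empty range list per SourceBuffer in activeSourceBuffers.
function mediaElementBuffered(activeRanges, readyState) {
  // Step 1: no active SourceBuffers → return empty ranges.
  if (activeRanges.length === 0) return [];
  // Step 3: highest end time across all active ranges.
  const highestEndTime = Math.max(...activeRanges.map(r => r[r.length - 1][1]));
  // Step 4: intersection ranges start as a single range [0, highest end time].
  let intersectionRanges = [[0, highestEndTime]];
  // Step 5: fold in each SourceBuffer's ranges.
  for (let sourceRanges of activeRanges) {
    if (readyState === "ended") {
      // Step 5.2: extend the last source range to the highest end time.
      const last = sourceRanges[sourceRanges.length - 1];
      sourceRanges = sourceRanges.slice(0, -1).concat([[last[0], highestEndTime]]);
    }
    // Steps 5.3–5.4: replace with the intersection.
    intersectionRanges = intersect(intersectionRanges, sourceRanges);
  }
  // Step 6.
  return intersectionRanges;
}
```

On Philip's example (video buffered [0,10] and [20,30]; audio buffered [0,15]) with readyState "ended", this returns [0,10] and [20,30] rather than [0,30], so the [10,20] gap stays visible.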
Comment 9 Aaron Colwell (c) 2013-01-29 18:45:42 UTC
(In reply to comment #7)

I have a few concerns with this proposal, especially in relation to demuxed content where different SourceBuffers are used for audio & video.

> Our proposal is as follows:
> 
> * The HTMLMediaElement.buffered range is the union of the time ranges of all
> the active audio and video ranges in the activeSourceBuffers collection.

My problem with using the union is that it hides the fact that we may not have data buffered for all active streams. If the audio source buffer has data buffered for [0-30] and the video source buffer has video buffered for [0-10] and [20-30] the union would mislead the web application into thinking that it has all the data for starting playback at 15. In this situation a seek to 15 should stall in my opinion. 
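To make the concern concrete, here is a small sketch (plain [start, end] pair lists and hypothetical helper names, not the real TimeRanges API) showing how the union reports time 15 as buffered even though the video buffer has nothing there:

```javascript
// Hypothetical model: range lists are sorted [start, end] pairs.
function union(a, b) {
  // Copy pairs before sorting so the inputs are never mutated.
  const all = [...a, ...b].map(([s, e]) => [s, e]).sort((x, y) => x[0] - y[0]);
  const out = [];
  for (const [start, end] of all) {
    const last = out[out.length - 1];
    if (last && start <= last[1]) last[1] = Math.max(last[1], end);
    else out.push([start, end]);
  }
  return out;
}

function containsTime(ranges, t) {
  return ranges.some(([start, end]) => start <= t && t < end);
}

const audio = [[0, 30]];
const video = [[0, 10], [20, 30]];
const merged = union(audio, video);   // [[0, 30]]

containsTime(merged, 15); // true  — the union claims 15 is buffered
containsTime(video, 15);  // false — but there is no video data at 15
```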

> 
> * If there is video content but not audio then fill with silence.
> 
> * If there is audio content but not video content then fill with the final
> frame of video before the gap and play the audio.
> 
> * If there is no content for a range then stall the playback waiting for the
> application to fill it.

For muxed content I don't really have a problem with this, but when using demuxed content where the audio & video are in different source buffers, I think these rules could lead to unexpected behavior. Say I only append data to the video source buffer. If I understand these rules then playback should start immediately and silence should be output until the audio data actually gets appended. This doesn't seem like a good user experience. Perhaps I'm misunderstanding something here.

Another scenario where these rules seem to be problematic is when playback encounters a gap where one of the SourceBuffers is missing data like the example I give above. If playback starts at 0, I'd expect playback to stall when it gets to 10 since there is missing video data for the range [10-20]. If I understand these rules correctly, you'd want playback to continue over this gap and playback not to stall.

> 
> * We don't think this should change based on endOfStream.

The reason I have this behavior changing based on endOfStream is that, until endOfStream() is called, there is no way to differentiate whether we are actually at the end of the content or whether we are waiting for more content to be appended. In the former case, it seems like we could follow the rules you suggest, but in the latter case, where we don't know whether more content is going to be appended, it seems better to stall when the playback position reaches an area where there is missing data. That is why I was suggesting the use of the intersection instead of the union: the intersection reflects the ranges where playback won't stall.
Comment 10 Aaron Colwell (c) 2013-01-30 21:29:38 UTC
Changes committed
https://dvcs.w3.org/hg/html-media/rev/aae26333e7d1

I replaced the existing algorithm with my proposal in Comment 8 so that Philip's example is addressed. This simply fixes the corner case he uncovered and doesn't represent a fundamental change in the existing interpretation of this field.

I'm leaving this bug open though until the discussion w/ Adrian has completed since that discussion may lead to further changes to this algorithm.
Comment 11 Adrian Bateman [MSFT] 2013-01-31 19:56:10 UTC
We've discussed this again and believe we can work with Aaron's proposal. I'm resolving the bug as fixed by Aaron's change. If anyone disagrees with the resolution they should reopen.