18921 – append(data) should accept any part of media segment

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	Media Source Extensions
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Adrian Bateman [MSFT]
QA Contact:	HTML WG Bugzilla archive list

Blocks:	18922

Reported:	2012-09-19 19:03 UTC by Hadar Weiss
Modified:	2012-10-21 15:56 UTC
CC List:	6 users

Description Hadar Weiss 2012-09-19 19:03:50 UTC

From step 6 of the append method, as defined in the spec:
If data is part of a media segment and timestampOffset is not 0:
1. Find all timestamps inside data and add timestampOffset to them.
2. If any of the modified timestamps are earlier than the presentation start time, then call endOfStream("decode"), and abort these steps.
3. Copy the contents of data, with the modified timestamps, into the source buffer.
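
The three quoted steps can be modelled roughly as follows. This is only a sketch of the logic, not the browser's implementation: `applyTimestampOffset` and `OffsetResult` are hypothetical names, timestamps are assumed to be plain numbers in seconds, and the "decode" result stands in for the call to endOfStream("decode").

```typescript
// Simplified model of the quoted steps (hypothetical names, not a browser API).
type OffsetResult =
  | { ok: true; timestamps: number[] }
  | { ok: false; error: "decode" };

function applyTimestampOffset(
  timestamps: number[],
  timestampOffset: number,
  presentationStartTime: number
): OffsetResult {
  // Step 1: add timestampOffset to every timestamp found in the data.
  const shifted = timestamps.map((t) => t + timestampOffset);

  // Step 2: any shifted timestamp earlier than the presentation start time
  // is treated as a decode error, and the remaining steps are skipped.
  if (shifted.some((t) => t < presentationStartTime)) {
    return { ok: false, error: "decode" };
  }

  // Step 3: the shifted timestamps are what would be copied into the buffer.
  return { ok: true, timestamps: shifted };
}
```

With an offset of 0 the function returns the timestamps unchanged, which matches the spec text only applying these steps when timestampOffset is not 0.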

The definition of "part of a media segment" leaves several conditions that need to be addressed, in particular a part of a media segment that is not contiguous with a previously appended buffer.

In a seeking scenario, after the current source buffer is aborted, a user might append a buffer that does not start at the beginning of a segment. In that case, the browser should discard all bytes that precede the beginning of a segment. This would protect playback from failing. In some cases there can be several "small" appends that belong to the previous segment; all of them should be discarded until a buffer containing the start of a segment is appended.

Without that, JS developers must employ parsers to obtain the accurate offsets of all segments.
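
To give a sense of the kind of parser meant here, the sketch below scans a buffer for segment start offsets. It assumes the media is fragmented MP4 (ISO BMFF), where each media segment begins at a top-level `moof` box; the thread itself does not name a format, and `findMoofOffsets` is a hypothetical helper, not part of any API.

```typescript
// Hypothetical helper: byte offsets of top-level `moof` boxes in an
// ISO BMFF buffer. Assumes the buffer starts on a box boundary.
function findMoofOffsets(bytes: Uint8Array): number[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const offsets: number[] = [];
  let pos = 0;

  // Each box starts with a 32-bit big-endian size and a 4-character type.
  while (pos + 8 <= bytes.byteLength) {
    let size = view.getUint32(pos);
    const type = String.fromCharCode(
      view.getUint8(pos + 4),
      view.getUint8(pos + 5),
      view.getUint8(pos + 6),
      view.getUint8(pos + 7)
    );

    if (size === 1) {
      // A size of 1 means a 64-bit size follows the type field.
      if (pos + 16 > bytes.byteLength) break;
      size = view.getUint32(pos + 8) * 0x100000000 + view.getUint32(pos + 12);
    } else if (size === 0) {
      // A size of 0 means the box runs to the end of the data.
      size = bytes.byteLength - pos;
    }
    if (size < 8) break; // malformed box; stop scanning

    if (type === "moof") offsets.push(pos);
    pos += size;
  }
  return offsets;
}
```

Note that the scan only works when it starts on a box boundary, which is the same limitation raised in the replies: from an arbitrary byte position there is no reliable way to resynchronize.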

Comment 1 Mark Watson 2012-09-19 19:14:40 UTC

(In reply to comment #0)
> From step 6 of the append method, as defined in the spec:
> If data is part of a media segment and timestampOffset is not 0:
> 1. Find all timestamps inside data and add timestampOffset to them.
> 2. If any of the modified timestamps are earlier than the presentation start
> time, then call endOfStream("decode"), and abort these steps.
> 3. Copy the contents of data, with the modified timestamps, into the source
> buffer.
> 
> The definition of "part of a media segment" leaves several conditions that
> need to be addressed, in particular a part of a media segment that is not
> contiguous with a previously appended buffer.
> 
> In a seeking scenario, after the current source buffer is aborted, a user
> might append a buffer that does not start at the beginning of a segment.

No. At the beginning, or after flushing the buffer, appending must begin at the start of a segment.

It is not assumed that the media file format supports 'discovery' of segment boundaries.

> In that case, the browser should discard all bytes that precede the
> beginning of a segment.

It may not have any way to recognize segment boundaries. The application should be aware of segment boundaries and append only from the beginning of a segment.

> This would protect playback from failing. In some cases there can be
> several "small" appends that belong to the previous segment; all of them
> should be discarded until a buffer containing the start of a segment is
> appended.
> 
> Without that, JS developers must employ parsers to obtain the accurate
> offsets of all segments.

Yep.

Comment 2 Hadar Weiss 2012-09-19 19:27:38 UTC

(In reply to comment #1)

The spec says: "It must be possible to identify segment boundaries and segment type (initialization or media) by examining the byte stream alone."

If there is a way for the browser to discover the boundaries, even a complicated one, I think it should be done. It matters a great deal to users of the API.

Comment 3

(In reply to comment #2)

This doesn't mean that it can identify the boundaries at arbitrary points in the stream. It was only intended to mean that the browser should be able to tell the beginning of an initialization segment from the beginning of a media segment without any signalling from JavaScript. It doesn't mean you can append arbitrary combinations in the middle of other segments. That would make validating the byte stream EXTREMELY difficult and would likely cause more developer confusion.
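
As a rough illustration of that distinction, the sketch below classifies the first bytes of an append. It assumes WebM and ISO BMFF as the container formats (the thread does not name them) and `classifySegmentStart` is a hypothetical name. A real implementation validates far more than the leading bytes.

```typescript
type SegmentKind = "initialization" | "media" | "unknown";

// Hypothetical helper: guesses the segment type from the leading bytes only.
function classifySegmentStart(bytes: Uint8Array): SegmentKind {
  if (bytes.byteLength < 8) return "unknown";
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // WebM: the first four bytes are an EBML element ID.
  const id = view.getUint32(0);
  if (id === 0x1a45dfa3) return "initialization"; // EBML header
  if (id === 0x1f43b675) return "media"; // Cluster

  // ISO BMFF: bytes 4..7 hold the four-character box type.
  const boxType = String.fromCharCode(
    view.getUint8(4),
    view.getUint8(5),
    view.getUint8(6),
    view.getUint8(7)
  );
  if (boxType === "ftyp" || boxType === "moov") return "initialization";
  if (boxType === "styp" || boxType === "moof") return "media";

  // Anything else, such as data from the middle of a segment, is not
  // recognizable from its leading bytes alone.
  return "unknown";
}
```

Data taken from the middle of a segment falls into the "unknown" case, which is why the check works at the start of a segment but not at arbitrary points in the stream.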

This is an advanced media API, and it expects the web application to be able to identify where segment boundaries are. The expectation is that some sort of manifest, like those DASH or HLS have, will be available to the application so that it can determine where the segment boundaries are without having to implement format parsing in JavaScript.
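
A minimal sketch of how an application might use such a manifest when seeking: look up the segment that contains the target time, then request exactly that segment's bytes. The `SegmentEntry` shape and both function names are hypothetical and do not follow DASH or HLS syntax; the index is assumed to be sorted by start time.

```typescript
// Hypothetical manifest entry; field names are assumptions for illustration.
interface SegmentEntry {
  startTime: number; // presentation time of the segment start, in seconds
  byteOffset: number; // offset of the segment's first byte in the resource
  byteLength: number; // total size of the segment in bytes
}

// Binary search for the last segment whose startTime is <= time.
// Returns -1 when the time precedes the first segment.
function findSegmentIndexForTime(index: SegmentEntry[], time: number): number {
  let lo = 0;
  let hi = index.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const entry = index[mid];
    if (entry !== undefined && entry.startTime <= time) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// HTTP Range header value covering one whole segment (inclusive end).
function rangeHeaderFor(entry: SegmentEntry): string {
  const last = entry.byteOffset + entry.byteLength - 1;
  return "bytes=" + entry.byteOffset + "-" + last;
}
```

Because the request always starts at a segment's first byte, the data appended after a seek begins at a segment boundary, as required above.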

Comment 4 Hadar Weiss 2012-09-21 23:33:36 UTC

(In reply to comment #3)

One thing that could help is to expose the segments' cue points to JavaScript (as in Flash video); they should be available after the video is parsed. This would help developers implement seeking easily. In particular, SeekToMediaSegmentAt(video.currentTime) would be very easy.

So what I'm suggesting is to expose parsed metadata about the video once its initialization has finished.

Comment 5

(In reply to comment #4)

I think you are assuming that all the segments come from a single file. The intent of this API is not to tie itself to that restriction. I believe you are also assuming that the cues are always at the beginning of the file, which is not necessarily true of the supported formats. Something would need to parse parts of the file to properly fetch the cues from the end.

This API deals with sequences of segments, not files. It intentionally breaks some of the restrictions imposed by files so that you can compose presentations from pieces of different files.
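
One way to picture composing a presentation from pieces of different files is as a plan of appends, each with its own timestampOffset so that the pieces land back to back on the presentation timeline. The sketch below is an illustration under assumptions: the `SourceSegment` and `PlannedAppend` shapes, the field names, and `planAppends` are all hypothetical, and times are in seconds.

```typescript
// Hypothetical description of a segment taken from some file.
interface SourceSegment {
  url: string; // where the segment's bytes come from
  internalStart: number; // timestamp of the segment's start inside its file
  duration: number; // length of the segment
}

interface PlannedAppend {
  url: string;
  presentationStart: number; // where the segment lands in the presentation
  timestampOffset: number; // value to add to the segment's own timestamps
}

// Lays the segments end to end, starting at presentation time 0.
function planAppends(segments: SourceSegment[]): PlannedAppend[] {
  const plan: PlannedAppend[] = [];
  let cursor = 0;
  for (const segment of segments) {
    plan.push({
      url: segment.url,
      presentationStart: cursor,
      // internalStart + timestampOffset must equal the target position.
      timestampOffset: cursor - segment.internalStart,
    });
    cursor += segment.duration;
  }
  return plan;
}
```

Nothing in the plan depends on the segments sharing a file, which is why a per-file cue list would not cover this case.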

Why don't you just do manifest creation on the backend?

Comment 6 Adrian Bateman [MSFT] 2012-10-21 15:56:45 UTC

It is a requirement that applications have accurate offsets for all segments.