This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 21333 - consequences of requirement in section 8 on initialisation segments
Status: RESOLVED LATER
Alias: None
Product: HTML WG
Classification: Unclassified
Component: Media Source Extensions
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Adrian Bateman [MSFT]
QA Contact: HTML WG Bugzilla archive list
Reported: 2013-03-19 12:44 UTC by Jon Piesing (OIPF)
Modified: 2013-04-08 21:33 UTC

Description Jon Piesing (OIPF) 2013-03-19 12:44:52 UTC
This issue arises from joint discussions between the Open IPTV Forum, HbbTV and the UK DTG. These organizations originally sent a liaison statement to the W3C Web & TV IG:

https://lists.w3.org/Archives/Member/member-web-and-tv/2013Jan/0000.html (W3C member only link)

We have some concerns about the following language in section 8.1.

"The following rules apply to all initialization segments within a byte stream: 
1. The number and type of tracks must be consistent.
For example, if the first initialization segment has 2 audio tracks and 1 video track, then all initialization segments that follow it in the byte stream must describe 2 audio tracks and 1 video track.
....
4. Codecs changes are not allowed.

For example, a byte stream that starts with an initialization segment that specifies a single AAC track and later contains an initialization segment that specifies a single AMR-WB track is not allowed. Support for multiple codecs is handled with multiple SourceBuffer objects."
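
As a rough illustration of rules 1 and 4 quoted above, the sketch below checks whether a later initialization segment could follow an earlier one in the same byte stream. `TrackInfo`, `InitSegmentInfo` and `canShareSourceBuffer` are hypothetical names and are not part of MSE. Comparing plain codec names while ignoring track IDs is a simplifying assumption, not the spec's full rule set.

```typescript
// Hypothetical description of what an initialization segment declares.
interface TrackInfo {
  kind: "audio" | "video" | "text";
  codec: string; // plain codec name such as "AAC" or "AMR-WB" (illustrative)
}

interface InitSegmentInfo {
  tracks: TrackInfo[];
}

// Sorted "kind/codec" entries, so the order of tracks does not matter.
function trackSignature(init: InitSegmentInfo): string[] {
  return init.tracks.map((t) => `${t.kind}/${t.codec}`).sort();
}

// True when `next` has the same number and type of tracks and the same
// codecs as `first` (rules 1 and 4 as quoted, under the assumptions above).
function canShareSourceBuffer(first: InitSegmentInfo, next: InitSegmentInfo): boolean {
  const a = trackSignature(first);
  const b = trackSignature(next);
  return a.length === b.length && a.every((entry, i) => entry === b[i]);
}

// Example: main content with 2 audio tracks and 1 video track, and an
// advert with a single audio track in a different codec.
const mainContent: InitSegmentInfo = {
  tracks: [
    { kind: "audio", codec: "AAC" },
    { kind: "audio", codec: "AAC" },
    { kind: "video", codec: "H.264" },
  ],
};
const advert: InitSegmentInfo = {
  tracks: [
    { kind: "audio", codec: "AMR-WB" },
    { kind: "video", codec: "H.264" },
  ],
};

console.log(canShareSourceBuffer(mainContent, mainContent)); // true
console.log(canShareSourceBuffer(mainContent, advert)); // false
```

Under these assumptions the advert fails both checks, which is the situation the questions below are about.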

Firstly we are trying to understand the consequences for our advert insertion use-cases.

Adverts are unlikely to have the same video and audio tracks as the main content item and may not have the same codec (particularly audio). Hence these restrictions would seem to force adverts to be handled as separate SourceBuffer objects from the main content items.

Is a (relatively) seamless transition between content from different SourceBuffers possible with MSE?

Secondly all 3 organisations had lengthy and controversial debates last year about requirements for a common initialisation segment in DASH. It is understood that MPEG are defining new boxes to enable new initialisation data to be carried in-band. This language would seem to exclude the use of those new boxes and again would force the use of separate SourceBuffers. Is this correct?
Comment 1 Jon Piesing (OIPF) 2013-03-19 12:51:25 UTC
To avoid any confusion, this language is in section 8 in the first public working draft and section 10 in the current editor's draft.
Comment 2 Simon Waller 2013-03-19 13:58:06 UTC
Related to this is the requirement for consecutive media segments to be "monotonically increasing in time without any gaps" (taken from definition of Append Sequence). Adverts will have their own timeline unrelated to each other or the main content.
Comment 3 Adrian Bateman [MSFT] 2013-03-19 14:29:14 UTC
(In reply to comment #2)
> Related to this is the requirement for consecutive media segments to be
> "monotonically increasing in time without any gaps" (taken from definition
> of Append Sequence). Adverts will have their own timeline unrelated to each
> other or the main content.

See timestampOffset (https://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#widl-SourceBuffer-timestampOffset).
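
A minimal sketch of how timestampOffset could address the timeline concern: the offset is the desired position on the presentation timeline minus the first timestamp carried in the media. `HasTimestampOffset`, `offsetFor` and `placeAt` are hypothetical helpers, and the times are made-up numbers. A real SourceBuffer may reject the assignment while an append is in progress, which this sketch does not model. It also says nothing about the codec and track restrictions discussed in the description.

```typescript
// Structural stand-in for the one SourceBuffer property used here.
interface HasTimestampOffset {
  timestampOffset: number;
}

// Offset that maps media whose first timestamp is `firstTimestampSeconds`
// to `insertAtSeconds` on the presentation timeline.
function offsetFor(insertAtSeconds: number, firstTimestampSeconds: number): number {
  return insertAtSeconds - firstTimestampSeconds;
}

// Sets the offset before the segments of the next item are appended.
function placeAt(
  target: HasTimestampOffset,
  insertAtSeconds: number,
  firstTimestampSeconds: number,
): void {
  target.timestampOffset = offsetFor(insertAtSeconds, firstTimestampSeconds);
}

// Example with a plain object standing in for a SourceBuffer.
const fakeBuffer: HasTimestampOffset = { timestampOffset: 0 };

// A 30 s advert whose own timeline starts at 0 is placed at 600 s.
placeAt(fakeBuffer, 600, 0);
console.log(fakeBuffer.timestampOffset); // 600

// Main content resumes: its media timestamp 600 should now appear at 630 s.
placeAt(fakeBuffer, 630, 600);
console.log(fakeBuffer.timestampOffset); // 30
```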
Comment 4 Adrian Bateman [MSFT] 2013-03-19 14:40:30 UTC
(In reply to comment #0)
> Adverts are unlikely to have the same video and audio tracks as the main
> content item and may not have the same codec (particularly audio). Hence
> these restrictions would seem to force adverts to be handled as separate
> SourceBuffer objects from the main content items.

How would this work without using a different SourceBuffer? I don't think it's common for media engines to be able to consume content of a different encoding mid-stream. MSE only abstracts away the network layer (how data gets to the media element).

> Is a (relatively) seamless transition between content from different
> SourceBuffers possible with MSE?

This might be an area where we need a proposal from you.

> Secondly all 3 organisations had lengthy and controversial debates last year
> about requirements for a common initialisation segment in DASH. It is
> understood that MPEG are defining new boxes to enable new initialisation
> data to be carried in-band. This language would seem to exclude the use of
> those new boxes and again would force the use of separate SourceBuffers. Is
> this correct?

The specific Byte Stream Format sections of the spec are informative and not required to be supported. The idea is that if you want to support one of these formats then you should do it in the way presented to promote interoperability. If there are new formats created then these would need to be handled according to the general requirements for byte stream formats at the start of that section.
Comment 5 Adrian Bateman [MSFT] 2013-03-19 14:44:52 UTC
I think this bug might be about two issues:

1) How to seamlessly transition between SourceBuffers

2) A proposal for a modification/new section for the byte stream format based on new work by MPEG.

I would recommend closing this bug and opening separate bugs to track specific issues. Alternatively, if only one of those issues is intended, this bug could be changed to track the more specific issue.
Comment 6 Jon Piesing (OIPF) 2013-03-19 14:55:33 UTC
At the moment we (i.e. the participants in the HbbTV / OIPF / UK DTG co-operation) are still trying to understand MSE and how to co-operate with the W3C. If splitting out more focussed separate issues as they become apparent is the W3C way of working then of course that can be done.
Comment 7 Michael Thornburgh 2013-03-25 23:04:09 UTC
regarding "1) How to seamlessly transition between SourceBuffers", i also feel this is crucial, both for acceptable user experience and for maintaining a reasonable and achievable level of burden for the Javascript programmer.  i strongly support the Reporting Party opening a new bug specific to this issue.

(user experience) if a transition between buffers must be managed from Javascript (based on current timeline playback position or triggers/cues in a text track), you can't guarantee frame-precise transitions because of Javascript scheduling latency. depending on how the media is laid out, you could have a late transition to an ad end up showing an inappropriate or premature part of the main program, or missing a beginning portion of an ad.  depending on how the video decoder works (and how many video decoder contexts the platform supports), missing the first keyframe of the ad could cause a skip or blackness til the next keyframe (so potentially not showing several seconds of the ad).

(Javascript burden) it is conceivable that the main program and every ad to be inserted (potentially multiple back-to-back ads in one ad break) may have a different format requiring separate SourceBuffers in the current model. especially in circumstances where the format isn't known ahead of time (it should be sufficient to just extract the format at play time directly from the media), handling the possibility of different formats without having to parse the media files in Javascript, this could lead to a large number of SourceBuffers that must be maintained in Javascript, switched between for playback and seek, and routed to during append.  the browser may not allow enough SourceBuffers to be instantiated simultaneously to maintain the desired amount of ahead-buffering.  the Javascript application will have to manually do garbage collection on all these SourceBuffers as their respective portions of the timeline are evicted.
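
To make the bookkeeping concern concrete, here is a small sketch that groups the items of a planned timeline by format and so shows how many SourceBuffers the application would have to track if every distinct format needs its own. `TimelineItem` and `groupByFormat` are hypothetical names, and the format strings are illustrative placeholders rather than values taken from real content.

```typescript
// One entry on the planned presentation timeline (hypothetical shape).
interface TimelineItem {
  label: string; // e.g. "main part 1", "ad A"
  format: string; // opaque format key, e.g. a MIME type with codecs
}

// Groups item labels by format. Each key in the result corresponds to one
// SourceBuffer the application would have to create, route appends to,
// and eventually clean up.
function groupByFormat(items: TimelineItem[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const item of items) {
    const existing = groups[item.format];
    if (existing) {
      existing.push(item.label);
    } else {
      groups[item.format] = [item.label];
    }
  }
  return groups;
}

// Example: main content interrupted by a break with two adverts.
const timeline: TimelineItem[] = [
  { label: "main part 1", format: "format-main" },
  { label: "ad A", format: "format-ad-a" },
  { label: "ad B", format: "format-ad-b" },
  { label: "main part 2", format: "format-main" },
];

const buffersNeeded = Object.keys(groupByFormat(timeline)).length;
console.log(buffersNeeded); // 3
```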

having to maintain the separate SourceBuffers seems like an unnecessary and artificial complication to the API, when conceptually you have a timeline of media that you've laid out and that you just want to play out [as] seamlessly [as possible].

i propose removing the constraint that a SourceBuffer can't have its codecs or number/type of tracks change, and dispatch events to notify Javascript when transitions/track configuration changes are reached and automatically handled.  this can guarantee the most seamless transition possible on the platform, with simple Javascript and no time-critical JS execution required.
Comment 8 Adrian Bateman [MSFT] 2013-03-26 04:25:50 UTC
(In reply to comment #7)
> i propose removing the constraint that a SourceBuffer can't have its codecs
> or number/type of tracks change, and dispatch events to notify Javascript
> when transitions/track configuration changes are reached and automatically
> handled.  this can guarantee the most seamless transition possible on the
> platform, with simple Javascript and no time-critical JS execution required.

Since MSE attempts to abstract away the network tier from HTML Media Elements, this suggestion seems equivalent to letting data that is being progressively downloaded for a <source> element change format mid-download. I'm not aware of anyone building something like this. I don't think SourceBuffers should be required to support multiple formats.

If you have multiple SourceBuffers then you need to switch tracks, but you have this situation with HTML Media Elements already. It's not clear to me why the problem of track switching is specific to MSE. Isn't this actually a HTML5 issue that could be required of any media element including those performing progressive download over HTTP? As I suggested on the last call, this might be something solved outside the scope of MSE.
Comment 9 Aaron Colwell (c) 2013-03-26 18:47:28 UTC
(In reply to comment #7)
> regarding "1) How to seamlessly transition between SourceBuffers", i also
> feel this is crucial, both for acceptable user experience and for
> maintaining a reasonable and achievable level of burden for the Javascript
> programmer.  i strongly support the Reporting Party opening a new bug
> specific to this issue.
> 
> (user experience) if a transition between buffers must be managed from
> Javascript (based on current timeline playback position or triggers/cues in
> a text track), you can't guarantee frame-precise transitions because of
> Javascript scheduling latency. depending on how the media is laid out, you
> could have a late transition to an ad end up showing an inappropriate or
> premature part of the main program, or missing a beginning portion of an ad.
> depending on how the video decoder works (and how many video decoder
> contexts the platform supports), missing the first keyframe of the ad could
> cause a skip or blackness til the next keyframe (so potentially not showing
> several seconds of the ad).
> 
> (Javascript burden) it is conceivable that the main program and every ad to
> be inserted (potentially multiple back-to-back ads in one ad break) may have
> a different format requiring separate SourceBuffers in the current model.
> especially in circumstances where the format isn't known ahead of time (it
> should be sufficient to just extract the format at play time directly from
> the media), handling the possibility of different formats without having to
> parse the media files in Javascript, this could lead to a large number of
> SourceBuffers that must be maintained in Javascript, switched between for
> playback and seek, and routed to during append.  the browser may not allow
> enough SourceBuffers to be instantiated simultaneously to maintain the
> desired amount of ahead-buffering.  the Javascript application will have to
> manually do garbage collection on all these SourceBuffers as their
> respective portions of the timeline are evicted.
> 
> having to maintain the separate SourceBuffers seems like an unnecessary and
> artificial complication to the API, when conceptually you have a timeline of
> media that you've laid out and that you just want to play out [as]
> seamlessly [as possible].
> 
> i propose removing the constraint that a SourceBuffer can't have its codecs
> or number/type of tracks change, and dispatch events to notify Javascript
> when transitions/track configuration changes are reached and automatically
> handled.  this can guarantee the most seamless transition possible on the
> platform, with simple Javascript and no time-critical JS execution required.

I don't believe this should be a goal for v1. The existing spec is already particularly taxing for existing media engines and I believe that this will just add another huge amount of work that isn't really needed right now. I think this is a "nice to have" not a "must have". If content providers & ad providers decide not to unify their content, then that is a choice, but like any choice there are consequences. I feel like we already have allowed much more change than the current HTML5 spec allows so I think we should wait and see how the things we've specced out so far pan out before adding something like this.
Comment 10 Michael Thornburgh 2013-03-26 20:42:58 UTC
(In reply to comment #8)
> Since MSE attempts to abstract away the network tier from HTML Media
> Elements, this suggestion seems equivalent to letting data being
> progressively downloaded for a <source> element change format mid-download.
> I'm not aware of anyone building something like this. I don't think
> SourceBuffers should be required to support multiple formats.

if i gave a URL for a media file to a <video> element, and it was playable, the least surprising thing to do would be to make a best effort to play it even if the format changed partway through.  and note that some format changes are allowed already (like video resolution or number of audio channels), but some are not allowed (like codec, or number of tracks, or track IDs when there's more than one track of a particular kind).

> If you have multiple SourceBuffers then you need to switch tracks, but you
> have this situation with HTML Media Elements already. It's not clear to me
> why the problem of track switching is specific to MSE. Isn't this actually a
> HTML5 issue that could be required of any media element including those
> performing progressive download over HTTP? As I suggested on the last call,
> this might be something solved outside the scope of MSE.

i agree that MSE isn't *necessarily* the place to solve the "seamless track changes at exactly the right time" problem (although the existence of the problem in the first place feels unnecessary to me).  however, seamless playback is only part of the problem. i also enumerated issues from the Javascript side, including SourceBuffer explosion and having to manage/create/destroy multiple SourceBuffers in general, when all you want to accomplish is to lay out some media in the timeline as naturally as possible and have it play as smoothly as possible.
Comment 11 Michael Thornburgh 2013-03-26 20:54:08 UTC
(In reply to comment #9)
> I don't believe this should be a goal for v1. The existing spec is already
> particularly taxing for existing media engines and I believe that this will
> just add another huge amount of work that isn't really needed right now. I
> think this is a "nice to have" not a "must have". If content providers & ad
> providers decide not to unify their content, then that is a choice, but like
> any choice there are consequences. I feel like we already have allowed much
> more change than the current HTML5 spec allows so I think we should wait and
> see how the things we've specced out so far pan out before adding something
> like this.

i think a very common use case will involve ads from ad providers that will be multiplexed video and english-only audio (and no captions), and primary content with separated video and audio, with audio in multiple languages, and caption tracks in multiple languages.  video engines would already have to support being reconfigured on the fly under Javascript control. having it be automatic solves the problem once properly, decreases complexity for the JS programmer, and improves the user experience.
Comment 12 Aaron Colwell (c) 2013-03-26 21:23:23 UTC
(In reply to comment #11)
> (In reply to comment #9)
> > I don't believe this should be a goal for v1. The existing spec is already
> > particularly taxing for existing media engines and I believe that this will
> > just add another huge amount of work that isn't really needed right now. I
> > think this is a "nice to have" not a "must have". If content providers & ad
> > providers decide not to unify their content, then that is a choice, but like
> > any choice there are consequences. I feel like we already have allowed much
> > more change than the current HTML5 spec allows so I think we should wait and
> > see how the things we've specced out so far pan out before adding something
> > like this.
> 
> i think a very common use case will involve ads from ad providers that will
> be multiplexed video and english-only audio (and no captions), and primary
> content with separated video and audio, with audio in multiple languages,
> and caption tracks in multiple languages.  video engines would already have
> to support being reconfigured on the fly under Javascript control. having it
> be automatic solves the problem once properly, decreases complexity for the
> JS programmer, and improves the user experience.

I think this is a bad idea. Ad providers won't do this if we don't let them. I don't think this will simplify things. You would still have to provide mappings for how the selected tracks change over time. This mapping would be particularly fragile if the application allowed any sort of access to the video element xxxTracks lists. Allowing track count & type changes within a SourceBuffer also opens a can of worms with regard to how to map tracks with each other across these boundaries.

I also think you are overestimating how many SourceBuffers initial implementations of MSE are going to allow. The UAs have the ability to reject SourceBuffer creation and will likely do so based on the limitations of the existing media engines. Ad providers should not assume that they can just create as many SourceBuffers as they want.
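
A hedged sketch of what coping with a rejected SourceBuffer creation could look like from the application side. `tryAddSourceBuffer` and `makeLimitedFactory` are hypothetical helpers. The error name checked is the one used by the current MSE specification for this case, so treat it as an assumption for any particular draft or browser. The factory is only a stand-in that imitates a user agent with a fixed limit.

```typescript
// Anything with an addSourceBuffer method; a browser MediaSource fits this shape.
interface SourceBufferFactory<B> {
  addSourceBuffer(type: string): B;
}

// Returns the new buffer, or null when the user agent refuses to create one.
// Any other failure is rethrown so it is not silently swallowed.
function tryAddSourceBuffer<B>(mediaSource: SourceBufferFactory<B>, type: string): B | null {
  try {
    return mediaSource.addSourceBuffer(type);
  } catch (err) {
    const name = (err as { name?: string } | null)?.name;
    if (name === "QuotaExceededError") {
      return null; // assumed signal for "no more SourceBuffers available"
    }
    throw err;
  }
}

// Stand-in for a user agent that allows at most `limit` buffers.
function makeLimitedFactory(limit: number): SourceBufferFactory<string> {
  let created = 0;
  return {
    addSourceBuffer(type: string): string {
      if (created >= limit) {
        const error = new Error("cannot create more SourceBuffers");
        error.name = "QuotaExceededError";
        throw error;
      }
      created += 1;
      return `${type}#${created}`;
    },
  };
}

// Example: the third request is refused, so the application has to fall
// back, for instance by reusing an existing buffer or skipping the item.
const limited = makeLimitedFactory(2);
console.log(tryAddSourceBuffer(limited, "format-main")); // format-main#1
console.log(tryAddSourceBuffer(limited, "format-ad-a")); // format-ad-a#2
console.log(tryAddSourceBuffer(limited, "format-ad-b")); // null
```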

I think we should stick with the existing restrictions for v1, get some implementation experience, and then see if it still makes sense to loosen the restrictions to allow this sort of stuff in.
Comment 13 Adrian Bateman [MSFT] 2013-03-26 22:01:35 UTC
(In reply to comment #12)
> I think we should stick with the existing restrictions for v1, get some
> implementation experience, and then see if it still makes sense to loosen
> the restrictions to allow this sort of stuff in.

I agree. We want something that is standalone and hits the key use cases. Implementation experience will inform what we need to do next. This capability isn't required to make MSE useful.

(In reply to comment #11)
> i think a very common use case will involve ads from ad providers that
> will be multiplexed video and english-only audio (and no captions),
> and primary content with separated video and audio, with audio in
> multiple languages, and caption tracks in multiple languages.  video
> engines would already have to support being reconfigured on the fly
> under Javascript control. having it be automatic solves the problem
> once properly, decreases complexity for the JS programmer, and
> improves the user experience.

If there are API design constraints that you think make it impossible to loosen this later then I'd be interested in hearing concrete proposals to fix that. I recommend closing this bug and having people pursue separate bugs for the sub-issues as appropriate (as I suggested in comment #5).
Comment 14 Kevin Streeter 2013-03-31 16:44:14 UTC
 
> I think this is a bad idea. Ad providers won't do this if we don't let them.
> I don't think this will simplify things. You would still have to provide
> mappings for how the selected tracks change over time. This mapping would be
> particularly fragile if the application allowed any sort of access to the
> video element xxxTracks lists. Allowing track count & type changes within a
> SourceBuffer also opens a can of worms with regard to how to map tracks with
> each other across these boundaries.
> 

Unfortunately a publisher doesn't always have a lot of control over the ad provider.  

While often a publisher will use "1st party" ads that they encode (meaning they can make sure it matches their content), almost all publishers have a lot of unsold ad inventory in their streams.  Publishers will use ad networks and other 3rd-party sources of ad content to fill the remnant inventory.

When using a 3rd-party source for ads the publisher has no connection to the ad provider, and no way to enforce that the ad content matches the primary content.  These restrictions will essentially make it impossible to use 3rd-party sources for ads.
Comment 15 Aaron Colwell 2013-04-08 21:33:51 UTC
(In reply to comment #14)
> Unfortunately a publisher doesn't always have a lot of control over the ad
> provider.  
> 
> While often a publisher will use "1st party" ads that they encode (meaning
> they can make sure it matches their content), almost all publishers have a
> lot of unsold ad inventory in their streams.  Publishers will use ad
> networks and other 3rd-party sources of ad content to fill the remnant
> inventory.
> 
> When using a 3rd-party source for ads the publisher has no connection to the
> ad provider, and no way to enforce that the ad content matches the primary
> content.  These restrictions will essentially make it impossible to use
> 3rd-party sources for ads.

I think this is a little overstated. These restrictions simply keep the status quo for what is currently possible with HTML5. If the Ad networks and publishers don't coordinate their efforts then, yes, they don't get the seamless splicing benefits that MSE can provide right now. If they do work together, then they can benefit. I think this is a reasonable compromise for a v1 since we enable some improvements, but don't necessarily improve every conceivable possibility.

I think it is important to get an initial version of the spec done and gain implementation experience before we add more complexity by easing these restrictions. Also there is nothing stopping people from experimenting with extensions to MSE that do what you want. I just don't think it belongs in the v1 spec.