22137 – changes in number of audio tracks during advert insertion

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: changes in number of audio tracks during advert insertion

Status:	RESOLVED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	Media Source Extensions
Version:	unspecified
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	---
Assignee:	Adrian Bateman [MSFT]
QA Contact:	HTML WG Bugzilla archive list

Whiteboard:	PRE_LAST_CALL

Depends on:	22785

Reported:	2013-05-22 14:14 UTC by Jon Piesing (OIPF)
Modified:	2013-07-25 14:45 UTC
CC List:	9 users

Description Jon Piesing (OIPF) 2013-05-22 14:14:25 UTC

In many cases, the main content will have more than one audio track. This could be multiple languages (depending on the market) as well as accessible audio such as either audio description for the visually impaired or clean audio for the hearing impaired (see section E.4 of TS 101 154 for the latter) or indeed both. In many cases, the audio in adverts may not include any of these tracks and could easily just include one track for audio in the most common language. 

In cases where more than one audio track is combined in the same byte stream (e.g. MPEG-2 Transport Stream), it is unlikely to be practical or economic to produce versions of each advert supporting each combination of audio tracks matching the content that the advert could be used with.

We request the W3C relax the restrictions in section 11 that “apply to all initialization segments in a byte stream” – particularly requirements #1 (“The number and type of tracks must be consistent”) and #3 (“Track IDs must be the same across initialization segments if the segment describes multiple tracks of a single type”) – at least for audio tracks.
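
For illustration, the two requirements quoted above can be read as a check applied to each new initialization segment. The following is a minimal sketch only: the `TrackInfo` shape and the `checkConsistency` function are hypothetical names invented here, not part of MSE, and comparing track IDs as sorted lists is an assumption about what "the same" means.

```typescript
// Hypothetical model of the tracks announced by an initialization segment.
type TrackType = "audio" | "video" | "text";
interface TrackInfo { id: number; type: TrackType; }

// Track IDs of one type, sorted so that two segments can be compared.
function sortedIds(seg: TrackInfo[], type: TrackType): number[] {
  return seg.filter(t => t.type === type).map(t => t.id).sort((a, b) => a - b);
}

// Returns a description of each violation of requirements #1 and #3.
function checkConsistency(first: TrackInfo[], next: TrackInfo[]): string[] {
  const problems: string[] = [];
  const types: TrackType[] = ["audio", "video", "text"];
  for (const type of types) {
    const a = sortedIds(first, type);
    const b = sortedIds(next, type);
    if (a.length !== b.length) {
      // Requirement #1: the number and type of tracks must be consistent.
      problems.push(`#1: ${type} track count changed from ${a.length} to ${b.length}`);
    } else if (a.length > 1 && a.some((id, i) => id !== b[i])) {
      // Requirement #3: IDs must match when there are several tracks of one type.
      problems.push(`#3: ${type} track IDs differ across initialization segments`);
    }
  }
  return problems;
}

// Main content with two audio tracks, advert with only one.
const mainContent: TrackInfo[] = [
  { id: 1, type: "video" }, { id: 2, type: "audio" }, { id: 3, type: "audio" },
];
const advert: TrackInfo[] = [{ id: 1, type: "video" }, { id: 2, type: "audio" }];

console.log(checkConsistency(mainContent, advert));
```

With this sample data the advert fails requirement #1 for audio, which is the situation the request is about.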

NOTE: This issue arises from joint discussions between the Open IPTV Forum, HbbTV and the UK DTG. These organizations originally sent a liaison statement to the W3C Web & TV IG which is archived here:

https://lists.w3.org/Archives/Member/member-web-and-tv/2013Jan/0000.html (W3C member only link)

Comment 1 Aaron Colwell 2013-05-23 18:30:52 UTC

Marking all pre-Last Call bugs

Comment 2 Aaron Colwell 2013-05-24 00:40:40 UTC

It is not that simple to relax these requirements. Please provide a detailed description of how the UA is supposed to determine what tracks map to each other across these boundaries. Indicate how the UA is supposed to maintain and update the set of selected/enabled tracks when this happens, keeping in mind that on every boundary the track IDs can be completely different.

Here is an initial set of questions that I think need detailed answers before we can proceed:
1. What extra information is required to make this work? 
2. Is this information mandated in all the current file formats the spec supports? If not, what are the fallback behaviors?
3. How can these transitions be achieved with minimal to no burden on the web application?
4. What is acceptable fallback behavior for implementations that can't fully support these types of changes in track count?
5. What information in the UA is expected to guide selection of these extra tracks?

Comment 3 Aaron Colwell 2013-05-29 00:19:27 UTC

Over the weekend I was thinking about this a little more and I think I have come up with a way of relaxing the constant track count restriction to accommodate this use case without introducing the large increase in complexity that I am worried about. The initial idea is to limit mandatory behavior to only playing back a single track of each type. If we start here then it is pretty straightforward to allow some flexibility on this front w/o having to significantly increase complexity.

In the single-track-per-type case it is pretty straightforward to specify rules for which track to select in each initialization segment. For example:
- For each track type (i.e. audio, video, text) select the first track in the initialization segment that has its "default flag" set.
- If the initialization segment has tracks of a particular type but none of them have the "default flag" set, then select the first track, of that type, that appears in the initialization segment.
- The first initialization segment appended must contain at least one track for each type that the application wants rendered. If a track for a specific type is not specified in this first initialization segment, then the UA does not have to render a track of that type if it happens to appear in a later initialization segment.
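
The three rules above can be sketched as a pure selection function. This is an illustration under stated assumptions, not proposed spec text: `TrackInfo`, `isDefault`, `typesRendered` and `selectTracks` are hypothetical names, and the sketch chooses never to render a type that was missing from the first initialization segment, although the rule only says the UA does not have to.

```typescript
// Hypothetical model of a track as described by an initialization segment.
type TrackType = "audio" | "video" | "text";
interface TrackInfo { id: number; type: TrackType; isDefault: boolean; }

// Rule 3: only types present in the first initialization segment are rendered.
function typesRendered(firstSegment: TrackInfo[]): TrackType[] {
  const all: TrackType[] = ["audio", "video", "text"];
  return all.filter(type => firstSegment.some(t => t.type === type));
}

// Rules 1 and 2: per type, pick the first track with the default flag set,
// otherwise the first track of that type in the segment.
function selectTracks(segment: TrackInfo[], rendered: TrackType[]): TrackInfo[] {
  const selected: TrackInfo[] = [];
  for (const type of rendered) {
    const ofType = segment.filter(t => t.type === type);
    if (ofType.length === 0) continue; // this segment has no track of the type
    const defaults = ofType.filter(t => t.isDefault);
    selected.push(defaults.length > 0 ? defaults[0] : ofType[0]);
  }
  return selected;
}

// Main content: one video track and two audio tracks, the second flagged default.
const mainSegment: TrackInfo[] = [
  { id: 1, type: "video", isDefault: true },
  { id: 2, type: "audio", isDefault: false },
  { id: 3, type: "audio", isDefault: true },
];
// Advert: a single audio track with no default flag.
const advertSegment: TrackInfo[] = [
  { id: 1, type: "video", isDefault: false },
  { id: 5, type: "audio", isDefault: false },
];

const rendered = typesRendered(mainSegment);
console.log(selectTracks(mainSegment, rendered).map(t => t.id));
console.log(selectTracks(advertSegment, rendered).map(t => t.id));
```

The result is the same for a given segment no matter how many extra tracks it carries, which is the deterministic behaviour the rules aim for.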

This provides a deterministic way to select tracks no matter how many exist in the initialization segment, while at the same time bounding the number of tracks the UA actually has to handle. These rules could likely be expanded to multiple tracks of the same type by using the kind, label & language to figure out what tracks should be mapped to the same logical track. The multi-track case, though, would NOT be mandatory for implementations. This is how complexity for mobile and embedded devices could be mitigated.
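
One possible reading of the kind, label & language idea is to treat tracks with identical values for all three as the same logical track. The sketch below assumes exactly that; the comment does not define a matching rule, and `TrackMeta`, `logicalKey` and `mapTracks` are hypothetical names.

```typescript
// Hypothetical per-track metadata; the field names follow the HTML5
// AudioTrack attributes kind, label and language.
interface TrackMeta { id: number; kind: string; label: string; language: string; }

// Assumption: identical kind, language and label means "same logical track".
function logicalKey(t: TrackMeta): string {
  return [t.kind, t.language, t.label].join("\u0000");
}

// For each track before the boundary, find its counterpart after it.
// A `to` of null means the track has no counterpart in the next segment.
function mapTracks(prev: TrackMeta[], next: TrackMeta[]): { from: number; to: number | null }[] {
  return prev.map(p => {
    const matches = next.filter(n => logicalKey(n) === logicalKey(p));
    return { from: p.id, to: matches.length > 0 ? matches[0].id : null };
  });
}

// Main content: main audio plus audio description; the advert has main audio only.
const contentAudio: TrackMeta[] = [
  { id: 2, kind: "main", label: "English", language: "en" },
  { id: 3, kind: "descriptions", label: "Audio description", language: "en" },
];
const advertAudio: TrackMeta[] = [
  { id: 7, kind: "main", label: "English", language: "en" },
];

console.log(mapTracks(contentAudio, advertAudio));
```

Here the main audio track maps across the boundary even though its ID changes, while the audio description track has no counterpart in the advert.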

I realize that this may not be ideal, but it is the best I can think of right now to contain complexity and give you a little of what you want.

Comment 4 Jon Piesing (OIPF) 2013-06-11 10:00:58 UTC

This comment is submitted on behalf of the participants in the joint discussions between the Open IPTV Forum, HbbTV and the UK DTG.

This is certainly an improvement. Does an HTML5 UA have a concept of user preferences for audio language or accessibility?

If it does not then what is proposed is OK.

If it does then permitting the selection of tracks based on these preferences would seem to give a better user experience than what is proposed.

Comment 5 Jerry Smith 2013-07-09 03:13:00 UTC

We've not yet closed on this topic, so are bringing it back for discussion.  We have concluded that the request is understandable and valid.  The experience could have some rough patches, but that is perhaps preferable to returning an error when the track count isn't precisely aligned.

There are some tricky aspects though, like playing a track in one language and having it flip to another during ad insertion.  This is a tough issue to solve, since it’s fundamental to the premise that some tracks can be abruptly dropped and later resumed.  Including language in the heuristic would help in cases where the playback language matches the operating system localization, which should be a common case.
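
A language-aware heuristic of the kind described above might look like the following sketch. It is an illustration only: `AudioTrackInfo`, `pickAudioTrack` and the `preferredLanguages` input are hypothetical, exact string matching of language tags is a simplification, and the fallback order (default flag, then first track) is borrowed from comment 3.

```typescript
// Hypothetical description of an audio track in an initialization segment.
interface AudioTrackInfo { id: number; language: string; isDefault: boolean; }

// Pick the first track matching the preference list in order; otherwise fall
// back to the first default-flagged track, then to the first track.
function pickAudioTrack(tracks: AudioTrackInfo[], preferredLanguages: string[]): AudioTrackInfo | null {
  for (const lang of preferredLanguages) {
    const matches = tracks.filter(t => t.language === lang);
    if (matches.length > 0) return matches[0];
  }
  const defaults = tracks.filter(t => t.isDefault);
  if (defaults.length > 0) return defaults[0];
  return tracks.length > 0 ? tracks[0] : null;
}

// Main content offers English and French; the advert offers English only.
const contentTracks: AudioTrackInfo[] = [
  { id: 2, language: "en", isDefault: true },
  { id: 3, language: "fr", isDefault: false },
];
const advertTracks: AudioTrackInfo[] = [
  { id: 9, language: "en", isDefault: false },
];

// A user who prefers French and accepts English.
const preferences = ["fr", "en"];
console.log(pickAudioTrack(contentTracks, preferences));
console.log(pickAudioTrack(advertTracks, preferences));
```

Because the choice is recomputed from the preference list at every boundary, the French track is picked again once the main content resumes. The sketch does not address the flip to English during the advert, which is the tricky aspect noted above.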

Persisting user selections also seems tricky.  If we are playing content in one language, insert an ad and play another, what should be done with any user selected changes during the ad?  Presumably post-ad we normally would revert to the previously playing track that disappeared during the ad, but if the user makes a selection, should we persist it?  

Also, can we continue to prohibit dynamically increasing the number of audio tracks?  This would be more difficult to implement.

Comment 6 Adrian Bateman [MSFT] 2013-07-09 16:04:03 UTC

As discussed on the call today, I propose resolving this bug WONTFIX for MSE since it really points to an underlying HTML5 issue.

Instead we should file an HTML5 bug proposing a solution for seamless track switching at a given presentation time. This would allow this use case to be solved with multiple SourceBuffers and still have seamless transitions.

Comment 7 Michael Thornburgh 2013-07-09 21:22:36 UTC

the original description for this bug is concerned with the "main content has several languages but the spliced ad only has one" case.

i believe a much more common case will be "main content has separated video and audio" (using separate SourceBuffers for video and audio to accommodate multiple language tracks) but the spliced ad will have multiplexed video and audio. in other words, during main content appending, one SourceBuffer will have one video track and zero audio tracks, and the other SourceBuffer will have zero video tracks and one audio track. during ad appending, one SB will have one video track and one audio track, and the other will have zero video and zero audio tracks.

in this case, not appending anything to the "other" SB will cause a stall.
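
The stall can be illustrated with a small model in which playback at a given time requires every active SourceBuffer to hold data for that time. This is a sketch under assumptions: `BufferModel` and `stallingBuffers` are hypothetical, the time ranges are invented sample values, and the scenario presumes the track-count restriction has been relaxed so that the muxed advert can be appended to the first buffer at all.

```typescript
// Hypothetical model of a SourceBuffer: a name plus its buffered time ranges.
interface TimeRange { start: number; end: number; }
interface BufferModel { name: string; ranges: TimeRange[]; }

// True if the buffer holds media covering the given time.
function isBuffered(buf: BufferModel, time: number): boolean {
  return buf.ranges.some(r => r.start <= time && time < r.end);
}

// Names of the active buffers that would stall playback at `time`.
function stallingBuffers(active: BufferModel[], time: number): string[] {
  return active.filter(b => !isBuffered(b, time)).map(b => b.name);
}

// Main content (0-30 s) is demuxed across two buffers. The advert (30-60 s)
// is muxed and appended to the first buffer only.
const firstBuffer: BufferModel = { name: "video", ranges: [{ start: 0, end: 60 }] };
const otherBuffer: BufferModel = { name: "audio", ranges: [{ start: 0, end: 30 }] };

console.log(stallingBuffers([firstBuffer, otherBuffer], 10));
console.log(stallingBuffers([firstBuffer, otherBuffer], 35));
```

At 10 s no buffer stalls. At 35 s the "other" buffer has nothing to offer, so playback waits on it even though the advert's audio and video are both available in the first buffer.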

one way to handle this would be to treat SourceBuffers as a filter & switch for append-time buffering behavior rather than being directly connected to decoder buffers. SBs could indicate the track configuration of the current initialization segment, and a means of controlling the mapping of those tracks onto HTMLMediaElement tracks could be given. if a SB had no mapped tracks, then it wouldn't stall playback if nothing was appended. otherwise data would have to be filtered and mapped through the SB and onto the media timeline for playback.

by decoupling SourceBuffers from the decoders, it could be reasonable to have more than two SourceBuffers supported in an implementation. one could have three, for example: one for demuxed video, one for demuxed audio, and one for muxed video+audio. the SB's tracks could be mapped/unmapped *at append time* to control what was laid down into the decode buffer for playback. this would allow for seamless (format changes notwithstanding) transitions without a time-critical timeline-synced JavaScript track change call.

if SBs are filters&mappers, they could be created and destroyed on-the-fly during append without affecting data already placed into the decode buffers. the media engine would be responsible for playing back the decode buffers as seamlessly as possible.

Comment 8 Aaron Colwell 2013-07-18 18:02:29 UTC

I have a concern that this is making a fundamental change rather late in 
the spec development process. It isn't clear to me that this is a must have
feature for v1. It also isn't clear to me why putting a mechanism like this
into MSE is better than a more generic track switching mechanism that could
work for all HTML5 media content. As far as I can tell your use case could be
easily handled with 3 SourceBuffers and a mechanism to specify what points in
the presentation timeline track switches should occur.

Like I said on the call, I think we should keep the spec as is, and start
work on a new track switching spec that would allow a web application to
specify where seamless track switches should happen in the presentation
timeline. I believe this issue is orthogonal to MSE and should be handled
as such.

Comment 9 Michael Thornburgh 2013-07-22 21:26:19 UTC

(In reply to comment #8)
> [...] As far as I can tell your use case could be
> easily handled with 3 SourceBuffers and a mechanism to specify what points in
> the presentation timeline track switches should occur.

i agree, except for two things: 1) earlier conversations on the list and calls indicated that browsers are unlikely to support more than two SourceBuffers per media element; 2) there needs to be some way to disable a SourceBuffer at append time so that not appending data to it won't cause a stall.

Comment 10 Aaron Colwell 2013-07-22 21:48:42 UTC

(In reply to comment #9)
> (In reply to comment #8)
> > [...] As far as I can tell your use case could be
> > easily handled with 3 SourceBuffers and a mechanism to specify what points in
> > the presentation timeline track switches should occur.
> 
> i agree, except for two things: 1) earlier conversations on the list and
> calls indicated that browsers are unlikely to support more than two
> SourceBuffers per media element; 

This will likely be true for initial MSE implementations, but as time goes on it is likely that more SourceBuffers will be supported as these implementations mature. I think this is a quality of implementation issue.

> 2) there needs to be some way to disable a
> SourceBuffer at append time so that not appending data to it won't cause a
> stall.

Just disable/unselect the tracks that are provided by that SourceBuffer. That should cause the SourceBuffer to be removed from activeSourceBuffers which in turn should prevent it from causing playback to stall.
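
The effect described here can be modelled as follows. This is a sketch, not browser code: `SourceBufferModel`, `activeBuffers` and `canPlayAt` are hypothetical names, and the model only captures the idea that a SourceBuffer counts as active while it provides at least one enabled or selected track.

```typescript
// Hypothetical model: each buffer provides tracks and has data up to bufferedEnd.
interface TrackState { id: number; enabledOrSelected: boolean; }
interface SourceBufferModel { name: string; tracks: TrackState[]; bufferedEnd: number; }

// A buffer is active only while it provides an enabled or selected track.
function activeBuffers(all: SourceBufferModel[]): SourceBufferModel[] {
  return all.filter(sb => sb.tracks.some(t => t.enabledOrSelected));
}

// Playback can proceed at `time` only if every active buffer has data there.
function canPlayAt(all: SourceBufferModel[], time: number): boolean {
  return activeBuffers(all).every(sb => time < sb.bufferedEnd);
}

const videoBuffer: SourceBufferModel = {
  name: "video", tracks: [{ id: 1, enabledOrSelected: true }], bufferedEnd: 60,
};
const audioBuffer: SourceBufferModel = {
  name: "audio", tracks: [{ id: 2, enabledOrSelected: true }], bufferedEnd: 30,
};
const buffers = [videoBuffer, audioBuffer];

const beforeDisable = canPlayAt(buffers, 35); // the audio buffer has run dry
audioBuffer.tracks[0].enabledOrSelected = false; // disable its only track
const afterDisable = canPlayAt(buffers, 35); // the audio buffer no longer counts

console.log(beforeDisable, afterDisable);
```

In the model, disabling the track removes the empty buffer from the active set, so it stops blocking playback. The model says nothing about when the change takes effect relative to playback.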

Comment 11 Michael Thornburgh 2013-07-22 22:08:36 UTC

(In reply to comment #10)
> [...] 
> Just disable/unselect the tracks that are provided by that SourceBuffer.
> That should cause the SourceBuffer to be removed from activeSourceBuffers
> which in turn should prevent it from causing playback to stall.

it's my understanding that enabling/disabling tracks takes effect immediately (at playback time), not at append/buffering time.

if you scheduled a track disable for a particular point in playback, you might not be able to reach that point because playback could already be stalled for lack of data in a SourceBuffer.  if you let buffers run dry, then enable/disable the appropriate tracks, then begin appending again, you'd avoid the SourceBuffer stall but you'd have a playback stall/glitch.

if you enable/disable tracks while appending, tracks would enter or leave playback at the wrong times, potentially not presenting the intended tracks to the user or showing inappropriate tracks to the user.

Comment 12 Aaron Colwell 2013-07-22 22:46:30 UTC

(In reply to comment #11)
> (In reply to comment #10)
> > [...] 
> > Just disable/unselect the tracks that are provided by that SourceBuffer.
> > That should cause the SourceBuffer to be removed from activeSourceBuffers
> > which in turn should prevent it from causing playback to stall.
> 
> it's my understanding that enabling/disabling tracks takes effect
> immediately (at playback time), not at append/buffering time.

Yes, given the current HTML5 track selection mechanism. I believe there is an assumption here that append time is the only time that you'd want to decide when to adjust the track selection. I believe decoupling track selection from appending would be a more flexible solution.

> 
> if you scheduled a track disable for a particular point in playback, you
> might not be able to reach that point because playback could already be
> stalled for lack of data in a SourceBuffer.  if you let buffers run dry,
> then enable/disable the appropriate tracks, then begin appending again,
> you'd avoid the SourceBuffer stall but you'd have a playback stall/glitch.

It is up to the application to make sure that there has been enough data appended and the proper tracks are selected to reach the next switch point. I agree that with the current HTML5 spec there would be a slight stall/glitch if switches were keyed off of the existing events. This is why I'm advocating for a separate time-based track switching spec, so that we have a generic way to signal to the media engine when these track changes should occur in a way that is independent of how the media is being sourced. All the application has to do is make sure the switches are described to the UA some reasonable time before the current position reaches the switch point. The web application could choose to do this alongside its appending activity or defer the decisions until a later time.
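
To make the idea concrete, a time-based switch description could be as simple as a sorted list of (time, track) pairs that the media engine consults as playback advances. Everything in this sketch is hypothetical: no such API exists in HTML5 or MSE, and `TrackSwitchSchedule`, `schedule` and `activeTrackAt` are names invented for illustration.

```typescript
// Hypothetical description of one scheduled switch on the presentation timeline.
interface TrackSwitch { time: number; trackId: number; }

class TrackSwitchSchedule {
  private initialTrackId: number;
  private switches: TrackSwitch[] = [];

  constructor(initialTrackId: number) {
    this.initialTrackId = initialTrackId;
  }

  // Register a switch. The application is expected to call this some
  // reasonable time before the current position reaches `time`.
  schedule(time: number, trackId: number): void {
    this.switches.push({ time, trackId });
    this.switches.sort((a, b) => a.time - b.time);
  }

  // The track that should be playing at `position`, independent of how and
  // when the media data was appended.
  activeTrackAt(position: number): number {
    let active = this.initialTrackId;
    for (const s of this.switches) {
      if (s.time > position) break;
      active = s.trackId;
    }
    return active;
  }
}

// Main audio is track 2; the advert's audio is track 9 and runs from 30 s to 60 s.
const audioSchedule = new TrackSwitchSchedule(2);
audioSchedule.schedule(60, 2); // back to main content
audioSchedule.schedule(30, 9); // into the advert (registration order is free)

console.log([10, 30, 45, 60].map(t => audioSchedule.activeTrackAt(t)));
```

Since the schedule is keyed to presentation time rather than to append calls, the application can register switches alongside its appending or later, as long as each one is registered before playback reaches it.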

> 
> if you enable/disable tracks while appending, tracks would enter or leave
> playback at the wrong times, potentially not presenting the intended tracks
> to the user or showing inappropriate tracks to the user.

Yes. If the web application is not careful about how it constructs presentations, I agree this could happen.

Comment 13 Michael Thornburgh 2013-07-23 22:10:20 UTC

As discussed on the call today, i opened bug 22785 for a means to schedule track selection changes.

Comment 14 Aaron Colwell 2013-07-25 14:45:02 UTC

Resolving as WONTFIX now that bug 22785 has been filed to address this issue in another spec.