This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 22785 - there should be a way to schedule audio/video track selection changes in HTMLMediaElements
Status: RESOLVED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: All, OS: All
Importance: P2 enhancement
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 22137
 
Reported: 2013-07-23 22:04 UTC by Michael Thornburgh
Modified: 2016-04-20 19:39 UTC
CC List: 12 users

See Also:


Attachments

Description Michael Thornburgh 2013-07-23 22:04:28 UTC
As part of the Media Source Extensions (MSE) work, specifically in the context of ad insertion and media segment splicing, a need was identified for scheduling the selection or deselection of video and audio tracks. Such a mechanism would allow seamless switching between the video and audio track(s) of a main program with one track configuration and the video and audio track(s) of a secondary program (such as an ad) with a different track configuration. For additional information, see bug 22137.

One use case for MSE is insertion of an advertisement into a media stream. It will often be the case that an advertisement will have a different track configuration from the main program. For example: a main program may comprise demultiplexed video and audio stream files, represented in MSE as two SourceBuffers, with one SourceBuffer contributing a VideoTrack and one contributing an AudioTrack. However, the advertisement may comprise a single multiplexed video and audio stream file, represented in MSE as one SourceBuffer contributing both a VideoTrack and an AudioTrack.

Currently, selecting and deselecting the video and audio tracks of an HTMLMediaElement from JavaScript takes effect immediately. However, JavaScript execution can't be guaranteed to be synchronized to within one video frame or one audio sample, which is what a seamless transition from the tracks of a main program to the tracks of an advertisement requires. JavaScript execution latency could cause media playback stalls or glitches.

The editors of the MSE spec indicated that a general solution to this problem would be a mechanism for scheduling track selections/deselections along the media playback timeline, so that the media playback engine can effect a frame-accurate, seamless track transition without time-critical JavaScript execution.

Since media tracks can be added to and removed from an HTMLMediaElement over time, a natural way to accomplish this might be to add something like a "selectedRanges" property to VideoTrack and something like an "enabledRanges" property to AudioTrack. Unfortunately, the TimeRanges interface is read-only and doesn't support modifications.
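TimeRanges is read-only, so a hypothetical "selectedRanges"/"enabledRanges" property would need a mutable backing structure. The TypeScript sketch below is one assumption of how such a schedule could behave: ranges are kept sorted and merged, and the media engine could ask whether a track should be active at a given media time. `ScheduledRanges`, `add`, and `isActiveAt` are illustrative names, not part of any specification; `length`, `start()`, and `end()` merely mirror the TimeRanges accessors.

```typescript
// Hypothetical mutable counterpart to the read-only TimeRanges interface.
// Ranges are half-open [start, end) in seconds of media time, kept sorted and non-overlapping.
class ScheduledRanges {
  private ranges: Array<[number, number]> = [];

  get length(): number {
    return this.ranges.length;
  }

  start(index: number): number {
    return this.ranges[index][0];
  }

  end(index: number): number {
    return this.ranges[index][1];
  }

  // Add [start, end), merging it with any existing range it overlaps or touches.
  add(start: number, end: number): void {
    if (!(start < end)) {
      throw new RangeError("start must be less than end");
    }
    const merged: Array<[number, number]> = [];
    let s = start;
    let e = end;
    for (const [rs, re] of this.ranges) {
      if (re < s || rs > e) {
        merged.push([rs, re]); // disjoint: keep unchanged
      } else {
        s = Math.min(s, rs); // overlapping or adjacent: absorb into the new range
        e = Math.max(e, re);
      }
    }
    merged.push([s, e]);
    merged.sort((a, b) => a[0] - b[0]);
    this.ranges = merged;
  }

  // True if the track should be selected/enabled at media time t.
  isActiveAt(t: number): boolean {
    return this.ranges.some(([s, e]) => t >= s && t < e);
  }
}

// Example: main program video selected except during a 30-second ad break at 120 s.
const mainVideoRanges = new ScheduledRanges();
mainVideoRanges.add(0, 120);
mainVideoRanges.add(150, 600);
const adVideoRanges = new ScheduledRanges();
adVideoRanges.add(120, 150);
```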
Comment 1 Aaron Colwell 2013-07-23 23:04:07 UTC
(In reply to comment #0)
> Since media tracks can be added and deleted from an HTMLMediaElement over
> time, it seems that a natural way to accomplish this would be to add
> something like a "selectedRanges" property to VideoTrack, and something like
> an "enabledRanges" property to AudioTrack. Unfortunately, the TimeRanges
> interface doesn't support modifications.

FYI, I have always thought of this feature as being implemented as a form of metadata text track. It seems like representing switches as TextTrackCues would provide most of the functionality we'd need. I don't feel strongly one way or the other about this, I just figured I'd share some preliminary thoughts I had.
Comment 2 Silvia Pfeiffer 2013-07-23 23:22:11 UTC
I agree with Aaron: the way to implement this is to define a TextTrack (e.g. through a WebVTT file included via a <track> element) of kind="metadata" with cues spanning the duration of the advertisement. The cues can include the identifier(s) of the track(s) to activate to play the advertisement and of the tracks to switch back to once it has finished. The onenter and onexit event handlers allow JavaScript to execute the track switching at the start and end times of the cues, see http://www.w3.org/html/wg/drafts/html/master/single-page.html#texttrackcue .
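A rough TypeScript sketch of this suggestion: one metadata cue per ad break, whose text carries the ids of the tracks to activate and the tracks to restore. The JSON payload shape (`activate`/`restore`) and the `AdBreak`, `toCueSpecs`, and `parsePayload` names are assumptions for illustration, not part of WebVTT or HTML. In a browser, each start/end/text triple would be passed to the `VTTCue(startTime, endTime, text)` constructor and added with `addCue()` to a track created by `addTextTrack("metadata")`.

```typescript
// Hypothetical description of one ad break on the media timeline (seconds).
interface AdBreak {
  start: number;
  end: number;
  adTrackIds: string[]; // ids of the ad's AudioTrack/VideoTrack objects
  mainTrackIds: string[]; // ids of the main program's tracks to restore afterwards
}

// Plain data mirroring the VTTCue(startTime, endTime, text) constructor arguments.
interface CueSpec {
  startTime: number;
  endTime: number;
  text: string;
}

// Assumed JSON payload carried in each metadata cue's text.
interface SwitchPayload {
  activate: string[];
  restore: string[];
}

// Turn ad breaks into cue specs, sorted by start time.
function toCueSpecs(breaks: AdBreak[]): CueSpec[] {
  return [...breaks]
    .sort((a, b) => a.start - b.start)
    .map((b) => {
      if (!(b.start < b.end)) {
        throw new RangeError(`ad break must have start < end, got ${b.start}..${b.end}`);
      }
      const payload: SwitchPayload = { activate: b.adTrackIds, restore: b.mainTrackIds };
      return { startTime: b.start, endTime: b.end, text: JSON.stringify(payload) };
    });
}

// Parse a cue's text back into a payload, e.g. inside an enter/exit handler.
function parsePayload(text: string): SwitchPayload {
  const parsed = JSON.parse(text) as SwitchPayload;
  if (!Array.isArray(parsed.activate) || !Array.isArray(parsed.restore)) {
    throw new Error("cue text is not a track-switch payload");
  }
  return parsed;
}

// Example: one 30-second ad break starting two minutes in.
const cueSpecs = toCueSpecs([
  { start: 120, end: 150, adTrackIds: ["ad-v", "ad-a"], mainTrackIds: ["main-v", "main-a"] },
]);

// In a browser (not executed here):
//   const track = video.addTextTrack("metadata");
//   for (const c of cueSpecs) track.addCue(new VTTCue(c.startTime, c.endTime, c.text));
```

The track switch itself would still run in the cues' enter/exit handlers, which is the just-in-time JavaScript step that later comments in this bug question.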
Comment 3 Robert O'Callahan (Mozilla) 2013-07-24 03:13:22 UTC
Can you give an example showing why JS track selection might be needed in a case like this?
Comment 4 Silvia Pfeiffer 2013-07-24 03:15:55 UTC
(In reply to comment #3)
> Can you give an example showing why JS track selection might be needed in a
> case like this?

"you" being Michael, I assume
Comment 5 Giuseppe Pascale 2013-07-24 07:30:21 UTC
I suspect that doing track selection in JS may not meet the requirement of seamless transitions, is that right?
Comment 6 Giuseppe Pascale 2013-07-24 07:37:04 UTC
On the other hand, if you "describe" the ad breaks in a TextTrack, the player could interpret them ahead of playback and do the switches without requiring user scripts to do anything.

events are still useful if the application wants to be notified of the ad start/end, e.g. for analytics purposes.
Comment 7 Michael Thornburgh 2013-07-24 17:38:08 UTC
there are multiple ways today of doing track switching in JavaScript. as Silvia points out, one way to align track switches with the right time in the media playout is via a metadata TextTrack having cues describing the desired track switches.

any scheme using JavaScript for track switching will have multiple issues:

  1) JavaScript execution can't be guaranteed to be timely or to take effect at a precise frame/sample moment, particularly on platforms/implementations where media playback happens in a hardware component running independently of the browser & JavaScript.

  2) in the case of MSE, some of the tracks might have no data depending on whether playback is in the main program or the advertisement section, and the media segments' durations might not be exactly as indicated in a manifest (for example, missing a few frames). playback could therefore stall before reaching a switch point, so the switch would never happen: the tracks with media to play would not be enabled, and the tracks with no media to play (which cause the stall) would not be disabled.

for #1, having the switches happen automatically, under the control of the media engine, allows them to happen seamlessly at exactly the desired points. JavaScript execution latency and the likely decoupling of the JS environment from the media engine all but guarantee that there will be glitches. a glitch could include showing a few frames of an ad slate/slug from the main program stream, which looks sloppy and gives a poor user experience.

for #2, the media engine can know ahead of time that a switch is scheduled and can allow for a little slop to avoid a stall.
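Point #2 can be illustrated with a hypothetical engine-side rule, sketched in TypeScript. It assumes the engine knows the scheduled switch time and treats a small shortfall in the outgoing tracks' buffered data as having reached the switch point. `decideAtBufferEnd`, its three outcomes, and the tolerance value are illustrative assumptions; real media engines are free to handle this differently.

```typescript
type Decision = "keep-playing" | "switch-now" | "stall";

// Hypothetical decision made when playback reaches the end of the outgoing tracks' data.
// bufferedEnd: where the outgoing tracks' data runs out (seconds of media time).
// switchTime: the scheduled switch point, or null if none is scheduled.
// toleranceSec: how much missing media the engine is willing to skip over.
function decideAtBufferEnd(
  currentTime: number,
  bufferedEnd: number,
  switchTime: number | null,
  toleranceSec: number,
): Decision {
  if (currentTime < bufferedEnd) {
    return "keep-playing"; // outgoing tracks still have data
  }
  if (switchTime !== null && switchTime - bufferedEnd <= toleranceSec) {
    return "switch-now"; // only a few frames short of the switch point: absorb the slop
  }
  return "stall"; // no switch scheduled nearby: genuinely waiting for data
}
```

For example, with 25 fps video, two missing frames leave a shortfall of 0.08 s, so a tolerance of 0.1 s would let the engine switch instead of stalling, while a tolerance of 0.05 s would not.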
Comment 8 Aaron Colwell 2013-07-24 17:51:21 UTC
(In reply to comment #6)
> On the other hand, if you "describe" the ad breaks in a TextTrack, the player
> could interpret them ahead of playback and do the switches without requiring
> user scripts to do anything. 

Yes. My intent was that the cues "describe" the switches to the media engine. The expectation is that the application provides these descriptions "early enough" that the media engine can use them to provide seamless transitions. If the application tries to add a cue "too close" to the current playback position, there are very limited guarantees about how seamless the transition will be. This would provide some wiggle room for any buffering and delays that may be present in the media engine or JavaScript engine. "Too close" is probably somewhere between 20-500ms depending on the media engine.
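The "too close" rule could be checked before scheduling a switch. The TypeScript sketch below assumes a 0.5 s minimum lead (the conservative end of the 20-500ms estimate); `classifySwitch` and its status values are hypothetical names, and a "too-close" result only signals that seamlessness is not guaranteed, not that the switch is rejected.

```typescript
// Assumed minimum lead time: the conservative end of the 20-500 ms estimate.
const DEFAULT_MIN_LEAD_SEC = 0.5;

type ScheduleStatus = "in-past" | "too-close" | "ok";

// Classify a requested switch time relative to the current playback position (seconds).
// "too-close" means a seamless transition is not guaranteed; it does not forbid the switch.
function classifySwitch(
  currentTime: number,
  switchTime: number,
  minLeadSec: number = DEFAULT_MIN_LEAD_SEC,
): ScheduleStatus {
  const lead = switchTime - currentTime;
  if (lead < 0) {
    return "in-past";
  }
  if (lead < minLeadSec) {
    return "too-close";
  }
  return "ok";
}
```

The lead is measured in media time; at a playback rate above 1, the corresponding wall-clock lead is shorter.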

> 
> events are still useful if the application wants to be notified of the ad
> start/end, e.g. for analytics purposes.

That was my thought as well.
Comment 9 Robert O'Callahan (Mozilla) 2013-07-24 21:51:30 UTC
It would still be helpful for someone to address comment #3.
Comment 10 Michael Thornburgh 2013-07-24 22:05:07 UTC
(In reply to comment #3)
> Can you give an example showing why JS track selection might be needed in a
> case like this?

track selection from JavaScript is already possible, using the "selected" property on VideoTrack and "enabled" property on AudioTrack.

as described in the initial bug description, when using MSE and inserting an advertisement into a main program where the advertisement has a different track configuration, different MSE SourceBuffers will be used, and there will be two VideoTracks (one from the main program and one from the advertisement) and two AudioTracks (one from the main program and one from the advertisement) on the HTMLMediaElement.

on a transition from the main program content to the advertisement, it is necessary to select the ad's video and audio tracks and deselect the main program's tracks. when it's time to return to the main program, the ad's tracks must be deselected and the main program's tracks selected.
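Done directly from script, that switch could look like the TypeScript sketch below. It uses minimal stand-ins for AudioTrack and VideoTrack (`AudioTrackLike`, `VideoTrackLike`, and `switchTracks` are hypothetical names) so it runs outside a browser; in a page, the tracks would come from the media element's `audioTracks` and `videoTracks` lists. Because at most one VideoTrack can be selected at a time, the sketch rejects a target set that matches more than one video track.

```typescript
// Minimal stand-ins for the DOM AudioTrack/VideoTrack objects (only the members used here).
interface AudioTrackLike {
  id: string;
  enabled: boolean;
}
interface VideoTrackLike {
  id: string;
  selected: boolean;
}

// Enable exactly the audio tracks whose ids are in targetIds and select the matching
// video track, deselecting everything else. A browser deselects other VideoTracks
// automatically when one is selected; plain objects need it done explicitly.
function switchTracks(
  audioTracks: AudioTrackLike[],
  videoTracks: VideoTrackLike[],
  targetIds: ReadonlySet<string>,
): void {
  const matchingVideo = videoTracks.filter((t) => targetIds.has(t.id));
  if (matchingVideo.length > 1) {
    throw new Error("at most one VideoTrack can be selected at a time");
  }
  for (const t of audioTracks) {
    t.enabled = targetIds.has(t.id);
  }
  for (const t of videoTracks) {
    t.selected = targetIds.has(t.id);
  }
}

// Example: main program (separate audio and video tracks) plus an ad whose single
// multiplexed SourceBuffer contributes one video and one audio track.
const audio: AudioTrackLike[] = [
  { id: "main-a", enabled: true },
  { id: "ad-a", enabled: false },
];
const video: VideoTrackLike[] = [
  { id: "main-v", selected: true },
  { id: "ad-v", selected: false },
];
switchTracks(audio, video, new Set(["ad-v", "ad-a"])); // enter the ad
```

Running this at exactly the right moment is the part that, per the bug description, JavaScript cannot guarantee.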

this bug requests a means to schedule automatic track selection/deselection to happen at pre-arranged times, without just-in-time JavaScript involvement.
Comment 11 Robert O'Callahan (Mozilla) 2013-07-24 22:18:22 UTC
Why can't the media element automatically play the tracks it has data for, without JS intervention?
Comment 12 Michael Thornburgh 2013-07-24 22:34:23 UTC
(In reply to comment #11)
> Why can't the media element automatically play the tracks it has data for,
> without JS intervention?

if the track is enabled but there is no data, it will cause a playback stall rather than rendering "nothing".

there may be periods of overlap (particularly in the case of a main program having an ad slate/slug, with an advertisement overlaid from a different track).
Comment 13 Robert O'Callahan (Mozilla) 2013-07-24 23:16:34 UTC
Maybe you just need a way to alter that behavior, then.
Comment 14 Giuseppe Pascale 2013-07-25 07:32:07 UTC
(In reply to comment #11)
> Why can't the media element automatically play the tracks it has data for,
> without JS intervention?

I assume you can have multiple ad tracks to switch to, so you need to say which one to use.

Also, I'm wondering whether the switch always happens in places where the "main" track has no data. It could well be that there is data for that time slot, and the app may decide on the fly whether to keep playing the original content or to switch to an ad for a given time and then back to the main track.
Comment 15 Silvia Pfeiffer 2013-08-05 12:13:57 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > On the other hand, if you "describe" the ad breaks in a TextTrack, the player
> > could interpret them ahead of playback and do the switches without requiring
> > user scripts to do anything. 
> 
> Yes. My intent was that the cues "describe" the switches to the media
> engine. The expectation is that the application provides these descriptions
> "early enough" that the media engine can use them to provide seamless
> transitions. If the application tries to add a cue "too close" to the
> current playback position then there are very limited guarantees about how
> seamless the transition will be. This would provide some wiggle room for any
> sort of buffering & delays that may be present in the media engine or
> JavaScript engine. "Too close" is probably somewhere between 20-500ms
> depending on the media engine.

For something like this to work, we'd need to define a new @kind of text track that has this expected behavior - namely to switch between media tracks.

I think that, given the work on Web Animations, we may not even need a WebVTT track to pre-schedule transitions between different video tracks. Web Animations should be capable of providing that functionality: http://dev.w3.org/fxtf/web-animations/ together with the use of track fragments to specify which track to switch to: http://www.w3.org/TR/2011/CR-media-frags-20111201/#naming-track .
Comment 16 Giuseppe Pascale 2013-08-08 09:32:46 UTC
I'm not familiar with this spec (just starting to read it now). Would you mind summarizing how you expect it to be used for the use case discussed here? Can it be used as is, or would it require changes?
Comment 17 Michael Thornburgh 2013-08-08 23:55:45 UTC
(In reply to comment #15)
> [...]
>
> I think given the work on Web Animations, we may not even need the use
> of a WebVTT track to pre-schedule transitions between different video tracks
> - Web Animations should be capable of providing that functionality:
> http://dev.w3.org/fxtf/web-animations/ together with the use of track
> fragments to specify which track to switch to:
> http://www.w3.org/TR/2011/CR-media-frags-20111201/#naming-track .

from my initial reading of Web Animations, this approach seems appealing, provided one could animate a track's enabled/selected attribute from false to true at a particular time, and the HTMLMediaElement's playback timeline could serve as the animation timeline/clock.

it didn't seem like the enabled/selected attribute would be among the class of "animatable" attributes. if you instead needed a timed event, then you're back to the "JavaScript driving track changes" problem.

i think the timing model of Web Animations, in which time is sampled and items animate instantaneously according to the (transformed) time at that instant, won't work for seamless track switches: to effect a seamless transition during playback, the media engine must already have buffered data from the to-be-selected tracks and begun decoding ahead of the switch. as Aaron pointed out earlier, the media engine may need to know significantly ahead of time (he stated 20-500ms) to effect a seamless transition.
Comment 18 Jerry Smith 2013-11-02 00:55:00 UTC
I like the proposal to have a cue for video track changes in ad insertion scenarios. This was discussed during MSE spec development, but not implemented. Pre-buffering for ads is possible today, but it's problematic for a JS client to smoothly switch from one currently playing sourceBuffer to another. A new cue could allow JS to schedule changes between two or more sourceBuffers, with the switches executed smoothly by the UA at the cue's enter/exit times. The cue could also fire enter/exit events to allow any JS-specific changes as well, including removing played content from sourceBuffers.

I also believe a provision like this would help resolve a concern for HbbTV using MSE, so resolving this may be somewhat time-sensitive.
Comment 19 Arron Eicholz 2016-04-20 19:39:37 UTC
HTML5.1 Bugzilla Bug Triage: 

This bug constitutes a request for a new feature of HTML. Our current guidelines, rather than tracking such requests as bugs or issues, are to create a proposal for the desired behavior, or at least a sketch of what is wanted (much of which is probably contained in this bug), and to start the discussion/proposal in the WICG (https://www.w3.org/community/wicg/). As your idea gains interest and momentum, it may be brought back into HTML through the Intent to Migrate process (https://wicg.github.io/admin/intent-to-migrate.html).