Re: Proposal for Audio and Video Track Selection and Synchronisation for Media Elements

On Wed, 23 Mar 2011 03:10:51 +0100, Ian Hickson <ian@hixie.ch> wrote:

> On Tue, 22 Mar 2011, Philip Jägenstedt wrote:
>> >
>> > Well if you know ahead of time which track you want you can just use
>> > fragment identifiers in the 'src' attribute, but yes, worst case it
>> > would look something like this:
>> >
>> >  var controller = new MediaController();
>> >  var video1 = document.createElement('video');
>> >  video1.controller = controller;
>> >  video1.src = 'video.foo';
>> >  video1.onloadedmetadata = function () {
>> >    video1.videoTracks.select(1);
>> >  };
>> >  var video2 = document.createElement('video');
>> >  video2.controller = controller;
>> >  video2.src = 'video.foo';
>> >  video2.onloadedmetadata = function () {
>> >    video2.muted = true;
>> >    video2.videoTracks.select(2);
>> >  };
>> >
>> >
>> > > This seems like it would work, but what if steps 2 and 3 happen in
>> > > the other order?
>> >
>> > So long as they happen in the same task, it doesn't matter; the
>> > loading of the media resource happens asynchronously after the task
>> > ends.
>>
>> It's inevitable that someone will set video2.controller in
>> video2.onloadedmetadata.
>
> Sure, but by that point they have much bigger problems (e.g. the videos
> will be playing out of sync).

I assumed that when you assign a common controller to two videos, they
would be synchronized from then on; isn't that the case? As long as
neither of them has begun playing yet, why would they fall out of sync?
Is this part of the spec commented out so that I just haven't read it
yet?
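
For concreteness, here's the pattern I expect authors to stumble into (a
sketch reusing the controller and videoTracks API from your example
above):

  var video2 = document.createElement('video');
  video2.src = 'video.foo';
  video2.onloadedmetadata = function () {
    // The controller is assigned only once metadata has arrived; by
    // this point video1 may already be playing, so video2 joins late.
    video2.controller = controller;
    video2.videoTracks.select(2);
  };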

>> In your proposal, the decoding of two video streams from the same
>> resource is not as tightly coupled; one can only make educated guesses.
>
> There's no guesswork. You know it's the same resource, and you know
> they're being synchronised. What is there to guess about?

The problem is that for a single resource there are basically two modes of  
operation:

1. n streams playing in sync at the same playback rate and offset.

2. n streams playing at different playback rates or offsets, although  
possibly still synchronized by a controller.

With traditional media frameworks, these two modes would be implemented  
differently:

1. 1 decoding pipeline with 1 demuxer and n decoders (this is how track  
switching is usually implemented).

2. n decoding pipelines, each with 1 demuxer and 1 decoder (somewhat  
wasteful).

Since nothing prevents script authors from switching between these modes  
at any time, one will either have to always stick to the wasteful option  
2, or be prepared to tear down and set up decoding pipelines at any time  
during playback. Making that transition transparent would be very  
challenging. The guessing I was referring to is guessing when it's safe to  
stick to one mode or the other. It never is, right?
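
To make the flip concrete, here's a sketch (same API as your example):
playback starts in mode 1, then a single line of script forces mode 2
mid-playback:

  var controller = new MediaController();
  var video1 = document.createElement('video');
  var video2 = document.createElement('video');
  video1.controller = video2.controller = controller;
  video1.src = video2.src = 'video.foo';
  video1.play();
  video2.play();
  // Mode 1: same resource, same rate, same offset -- one demuxer
  // feeding two decoders would suffice.

  document.onclick = function () {
    // Mode 2: video2 is now offset from video1, so it needs its own
    // demuxing, and the shared pipeline has to be torn down and
    // rebuilt while both elements are playing.
    video2.currentTime += 10;
  };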

>> I'm not saying that this is necessarily a bad thing, I just want to make
>> sure that everyone is fully aware of the complexity. I think there are a
>> few sane ways of going about this:
>>
>> * Only allow setting the controller of an audio/video element while
>> readyState == HAVE_NOTHING, so that the browser knows when starting to
>> fetch a resource whether or not it's a good idea to share the decoding
>> pipeline between two elements.
>
> That doesn't help for the case where the tracks start in sync and the
> author suddenly advances one so it's offset, or changes the playback rate
> of a single track.

True.

>> * Make the connection more explicit by letting each video track expose a
>> Stream object which could be set to the .src of another video element.
>> The other video element would not be able to change the playbackRate, so
>> it would always be safe to reuse a single demuxer.
>
> We could do that, certainly. It seems a bit weird in that it makes a
> <video> into a feature with two orthogonal purposes, but I suppose that
> might be ok. (Maybe alternatively we could have a way to get a stream
> back from an XMLHttpRequest and we could use that, it would make more
> sense.)
>
> However, I'm not sure it really is any better than comparing the URL,
> which you already have to do anyway to share the download in cases such
> as this in other parts of the platform -- and once you're sharing the
> download, sharing the decoding pipeline seems like an obvious move.

It's obviously theoretically possible, but implementation complexity is  
the issue here.
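
To illustrate, the wiring I had in mind would look something like this
(hypothetical: a stream attribute on each video track that can be used
as a source):

  var master = document.createElement('video');
  master.src = 'video.foo';
  master.onloadedmetadata = function () {
    var slave = document.createElement('video');
    // Hypothetical API: the track exposes a Stream object.
    slave.src = master.videoTracks[1].stream;
    // slave can't seek or change playbackRate independently, so a
    // single demuxer can always be shared.
  };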

>> OK, for the record I question the necessity of:
>>
>> * Synchronized playback of tracks of the same resource at different
>> playbackRates and offsets.
>>
>> * Synchronized playback of looping resources.
>
> From the spec point of view support for these is almost free, so I'm
> reluctant to drop support for them unless it's really hard to implement.
> From the implementation point of view, it seems that there's nothing
> especially different about supporting that than supporting
> synchronisation of separate files. Could you elaborate on this point?

The first point is explained above.

Synchronized looping just seems like a somewhat odd feature that adds  
implementation complexity, so I'd like to see some concrete use cases. The  
metronome example isn't very realistic, because music performed by people  
isn't at a constant BPM (beats/min), so I don't think the result would be  
very impressive. If that's the use case, we'd be better off with an Audio
API that allows one to do adaptive BPM detection and sync the metronome to  
that.

Another problem with looping is that one has to know the exact duration  
(down to the sample) for it to work when seeking, and that information  
sometimes just isn't available without decoding the whole file and  
counting the samples.
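
As a sketch of how that error compounds: mapping an absolute timeline
position onto a looping resource is just t mod duration, so any error in
the duration is multiplied by the number of completed loops:

  function loopPosition(t, duration) {
    return t - Math.floor(t / duration) * duration; // t mod duration
  }

  // If the real duration is 2.000 s but the demuxer reports 1.990 s
  // (without decoding everything it can't count the samples), then
  // seeking the group to t = 60 s lands 30 loops * 10 ms = 300 ms off:
  loopPosition(60, 2.000); // 0.0
  loopPosition(60, 1.990); // ~0.3, audibly out of sync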

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Thursday, 24 March 2011 09:58:37 UTC