Re: Tech Discussions on the Multitrack Media (issue-152)

On Feb 16, 2011, at 15:53, Silvia Pfeiffer wrote:

> On Wed, Feb 16, 2011 at 6:33 PM, David Singer <singer@apple.com> wrote:
>> I think we might be in agreement here, but I am not being clear.
>> 
>> On Feb 16, 2011, at 15:22, Silvia Pfeiffer wrote:
>> 
>>>> b) and the timing does not need to change, but the audio description has, as part of its mix-down, the appropriate portions of the main audio, then make the video the primary source, and offer a <track> which has multiple sources, one or more of which are the plain audio, others are the audio description
>>> 
>>> That would require the author to pull the main video into two separate
>>> resources
>> 
>> no...I am not being clear.
>> 
>> <video src="just the video, madam" />
>> <track src="primary audio" />
>> <track src="audio description mixed with the right bits of primary audio" kind="audio-desc-of-video" />
> 
> I think I did understand correctly. Now you have three resources: one
> with just video, one with just main audio and one with mixed-in audio
> description, rather than just two: the main resource (a/v) and the
> mixed-in audio description.

sorry, when you said "the main video into two" I took that as meaning the media type, but you meant the main program.  My misunderstanding.

yes, but this is a consequence of needing to replace the audio.  The alternative would be to have some instruction somewhere to mute the audio in the muxed resource when the audio description is used.  But that introduces a different de-optimization: the client downloads audio that it doesn't want (bandwidth waste), and we have to find some way to express the muting instruction.
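
Purely to illustrate, the muting instruction might be an attribute on <video>; the attribute name here is invented, not a proposal:

<video src="the usual normal program, muxed audio and video"
       mute-main-audio-for="audio-desc-of-video"> <!-- hypothetical attribute -->
  <track src="audio description only, no main audio" kind="audio-desc-of-video" />
</video>

When the described track is selected, the UA would play it and discard the muxed audio it has already downloaded.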

> 
> What I am saying is that this requires the author to pull the main
> resource (a/v) into two resources: one a and one v, which is not the
> typical way in which <video> deals with resources. Thinking about
> formats (mp4, webm, ogg), this can result in quite an explosion of
> files.
> 
> 
>>> 
>>> In case of a mix-down audio description - which I regard as the 20%
>>> use case
>> 
>> Oh.  I think it's the 90% use case.  Usually there isn't enough quiet time in the primary audio to give the audio description.
> 
> Yes, but you don't have to provide the audio description as a mixed-in
> resource, i.e. you can very well provide just the recorded speaker and
> let the browser do the mixing. When I have authored audio descriptions
> in the past, I have recorded my voice while listening to the main
> audio on the headphones, so I had a separate audio track for the audio
> description. I would think that's the easiest and best way of doing it
> because it does not degrade the main audio and allows for the "clear
> audio" use case.

But what do you present to the user who wants audio description?  Both the main audio and the audio description?

> 
> 
>>>> c) and the timing needs to change; offer two or more sources, one or more of which have the normal audio and normal timing, and one or more of which have the audio description with the revised timing
>>> 
>>> The problem here is that not just the audio changes timing, but also
>>> the video.
>> 
>> again, I am being unclear
>> 
>> <video>
>>  <source src="the usual normal program, muxed audio and video" />
>>  <source src="a described program, with timing changes in it, muxed audio and video" kind="audio-desc-of-video" />
>> </video>
> 
> This is not possible, because <source> selection simply picks the
> first source whose codec the browser understands.

Well, I both made a mistake and am suggesting that we add kind-based selection as well.  The mistake is that the more specific source has to be first:

<video>
  <source src="a described program, with timing changes in it, muxed audio and video" kind="audio-desc-of-video" />
  <source src="the usual normal program, muxed audio and video" />
</video>

so that it gets skipped if the user doesn't want audio-desc-of-video.
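
To be explicit, this kind-based selection would compose with today's type-based selection; a sketch, with type values included just for illustration:

<video>
  <source src="a described program" type="video/webm" kind="audio-desc-of-video" />
  <source src="a described program" type="video/mp4" kind="audio-desc-of-video" />
  <source src="the usual normal program" type="video/webm" />
  <source src="the usual normal program" type="video/mp4" />
</video>

The UA would walk the list in order, first skipping any source whose kind the user has not asked for, then applying the usual type/codec test.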

> They have
> nothing to do with multitrack. You would have to put them inside a
> <track> element if you want to follow Eric's model, but then you have
> two alternative media resources in a <track> element with no resource
> on the <video>:
> 
> <video>
>  <track>
>    <source src=""the usual normal program, muxed audio and video" />
>  </track>
>  <track>
>    <source src="a described program, with timing changes in it, muxed
> audio and video" kind="audio-desc-of-video" />
>  </track>
> </video>
> 

But that expresses that the two can be played together, since tracks are additive, which is not the case here: the two programs are alternatives.
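
By contrast, <track> does fit genuinely additive cases, such as a sign-language video that plays alongside the main program rather than replacing it; a sketch, with an illustrative kind value:

<video src="the usual normal program, muxed audio and video">
  <track src="sign-language interpretation, video only" kind="sign-language-video" />
</video>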

> 
>>>> while it is technically true that the user-agent may be able to make all sorts of ingenious displays, it's not a great system design to assume that the UA and the user will have the time or skills to make the choices over lots of ingenious possibilities.
>>> 
>>> We do in fact have to discuss how the display of multiple videos would
>>> work. Would they be expected to be displayed as picture-in-picture?
>> 
>> I'd love to be able to give them display areas, and adjust the page as needed to suit.  That's why I originally thought of media queries; they can be used as needed to adjust the entire page layout, and also 'style' the tracks in the video.
> 
> Can you give an example of how you think the media query would achieve
> this? I don't follow.


I am not enthused about writing a fully-worked example, but if the main video and the tracks are CSS-visible and have CSS-managed rendering areas, then I could do something like the following for a horizontal layout, apportioning the rendering area:

Program without sign-language shown (user does not have the show-sign-language preference set):

  <video takes 50% of rendering width> <div takes 30%> <advert div takes 20%>

Program with sign-language shown:

  <video takes 50%> <track takes 10%> <div takes 25%> <advert div takes 15%>

Here I have styled the div and the advert with two different rules depending on the user's desire for sign-language, and given the sign-language video 10% as well.
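
As a rough sketch of that intent in CSS (everything below is hypothetical: there is no 'sign-language' user-preference media feature today, and no defined way to select a track's rendering area):

<style>
  video { width: 50%; }
  video::track(sign-language-video) { display: none; } /* hypothetical selector */
  .content { width: 30%; }
  .advert { width: 20%; }

  @media (sign-language) { /* hypothetical media feature */
    video::track(sign-language-video) { display: block; width: 10%; }
    .content { width: 25%; }
    .advert { width: 15%; }
  }
</style>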

David Singer
Multimedia and Software Standards, Apple Inc.

Received on Wednesday, 16 February 2011 08:14:46 UTC