Re: timing model of the media resource in HTML5

Deep breath...

On Wed, 27 Jan 2010 12:57:51 +0100, Silvia Pfeiffer  
<silviapfeiffer1@gmail.com> wrote:

> Ken Harrenstien from Google wrote this to me (and allowed me to quote
> him, which is why I cc-ed him):
>> The principal reason for wanting to allow explicit markup is latency
>> and infrastructure overhead.
>>
>> Without the markup, the only way to know what's in-band is to start
>> streaming the video.  How long will it take to find out what kinds of
>> captions it contains and whether they are supported?  How much
>> bandwidth and setup is wasted in the process?  At Google we care very
>> deeply about those things.
>>
>> I think this information is very, if not exactly, analogous to the
>> other markup provided for <video>. I need it to tell immediately if the
>> video is even playable/watchable for me (as a hearing-impaired person).
>
> I believe he has a strong case.

I don't agree that this is an important enough use case to add extra HTML
markup for. The issue seems to be that perhaps finding the tracks in the
resource is slow. If that's the case and they want to immediately present
a menu of all available tracks, I suggest embedding the information using
data-* attributes or similar. Since the information can easily get out of
sync with the actual resource when markup is copied around, I wouldn't be
willing to rely on any such markup for the browser's native controls or
context menus. Native controls that get their information directly from
the resource, plus an API to expose track information for scripted
controls, are the way to go, in my opinion. Getting the information would
be done as part of reaching HAVE_METADATA and I really don't think this
will add noticeably to load time.
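
To make the data-* suggestion concrete, here is a minimal sketch. It
assumes a made-up "data-tracks" attribute holding a JSON list of
{role, lang} entries; neither the attribute name nor the format is
specified anywhere. The hints are treated as advisory only, since they can
be stale or malformed:

```javascript
// Hypothetical markup this sketch has in mind:
//   <video src="video.ogv" data-tracks='[{"role":"caption","lang":"en"}]'>
// "data-tracks" and its JSON shape are assumptions, not part of any spec.

// `dataset` mimics HTMLElement.dataset (data-tracks -> dataset.tracks).
function parseTrackHints(dataset) {
  if (!dataset || typeof dataset.tracks !== "string") return [];
  try {
    const hints = JSON.parse(dataset.tracks);
    if (!Array.isArray(hints)) return [];
    // Keep only entries that at least name a role.
    return hints.filter((h) => h && typeof h.role === "string");
  } catch (e) {
    // Markup that was copied around may be broken; treat it as "no hints".
    return [];
  }
}

// True if the hints claim a caption track, optionally in a given language.
function hintsClaimCaptions(hints, lang) {
  return hints.some(
    (h) => h.role === "caption" && (lang === undefined || h.lang === lang)
  );
}
```

A scripted menu could call parseTrackHints(video.dataset) right away and
then correct itself from the real track information once the resource has
reached HAVE_METADATA.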

> Also, it is really important to expose the role (and the language)
> that a track takes on within a multitrack media file, such that a UA
> can decide whether to display a track or not and where to display it.
> I do believe that the control of which tracks are being displayed
> should stay with the UA and not be forced by the file or the media
> framework. I cannot see a better way for exposing this functionality
> uniformly across multiple media file types other than explicit markup.

Again, I'd rather rely on the information in the media resources.

> If we buried the track information in a javascript API, we would
> introduce an additional dependency and we would remove the ability to
> simply parse the Web page to get at such information. For example, a
> crawler would not be able to find out that there is a resource with
> captions and would probably not bother requesting the resource for its
> captions (or other text tracks).

Surely, robots would just index the resources themselves?

> Eric further said:
>> It seems to me that because it will require new specialized tools to
>> get the information, and because it will be really difficult to do
>> correctly (ten digit serial numbers?), people are likely to just skip
>> it completely.
>
> There is a need for addressing the track in a unique way, i.e.
> javascript needs to be able to tell the media framework exactly which
> track it is talking about (e.g. to turn it on or off).

The API for exposing tracks should simply have something like .enable()
on each track object (or similar); there's no need to expose unique IDs
to do this.
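
As a sketch of what I mean, with every name below being hypothetical and
not taken from any spec, a script can pick a track by its properties and
then act on the object itself, so no ID is ever passed around:

```javascript
// Hypothetical track object; the class and its members are assumptions.
class Track {
  constructor(role, lang) {
    this.role = role;
    this.lang = lang;
    this.enabled = false;
  }
  enable() {
    this.enabled = true;
  }
  disable() {
    this.enabled = false;
  }
}

// Stand-in for whatever list of tracks the media element would expose.
const tracks = [
  new Track("video"),
  new Track("audio", "en"),
  new Track("caption", "en"),
];

// Find the first track with the given role (and language, if given) and
// enable it. Returns the track, or undefined if nothing matched.
function enableFirst(trackList, role, lang) {
  const match = trackList.find(
    (candidate) =>
      candidate.role === role &&
      (lang === undefined || candidate.lang === lang)
  );
  if (match) match.enable();
  return match;
}
```

Calling enableFirst(tracks, "caption", "en") turns on the English captions
without the page ever needing to know a serial number or track ID.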

Unique IDs might be needed if we want to have e.g.
<video tracks="video0,audio3"> or similar. Media Fragments URI is supposed
to provide a syntax for addressing individual tracks; perhaps we can hook
into that at some level?

> Incidentally, we do need to develop the javascript API for exposing
> the video's tracks no matter whether we do it in declarative syntax or
> not. Here's a start at a proposal for this (obviously inspired by the
> markup):
>
>   video.numberTracks(); -> return number of available tracks
>   video.firstTrack(); -> returns first track ("first" to be defined -
> e.g. there is no inherent order in Ogg)
>   video.lastTrack(); -> returns last track ("last" to be defined)
>   track.next(); -> returns next track in list
>   track has the following attributes: type, ref, lang, role, media
> (and the usual contenders, e.g. id, style)

Yes, we need something like this.
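
For illustration only, here is a mock of the shape proposed above, to show
how a script would walk it. It is a sketch under two assumptions that the
proposal leaves open: next() returns null after the last track, and the
order is simply the order in which the tracks were listed.

```javascript
// Mock object exposing numberTracks(), firstTrack() and lastTrack(), with
// a next() method on each track. Nothing here is a real browser API.
function makeMockVideo(descriptions) {
  const tracks = descriptions.map((d) => ({ ...d }));
  tracks.forEach((t, i) => {
    // Assumption: next() yields null once the end of the list is reached.
    t.next = () => (i + 1 < tracks.length ? tracks[i + 1] : null);
  });
  return {
    numberTracks: () => tracks.length,
    firstTrack: () => (tracks.length ? tracks[0] : null),
    lastTrack: () => (tracks.length ? tracks[tracks.length - 1] : null),
  };
}

// Collect all tracks with a given role by walking firstTrack()/next().
function tracksWithRole(video, role) {
  const found = [];
  for (let t = video.firstTrack(); t !== null; t = t.next()) {
    if (t.role === role) found.push(t);
  }
  return found;
}

const video = makeMockVideo([
  { role: "video" },
  { role: "audio", lang: "en" },
  { role: "caption", lang: "en" },
]);
```

With this mock, tracksWithRole(video, "caption") gives a script everything
it needs to build a captions menu, which is the kind of use the real API
would have to support.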

> Philip said:
>> <source> is a void element, so this markup does not degrade nicely in
>> any shipped <video>-capable browsers. Try
>> <http://software.hixie.ch/utilities/js/live-dom-viewer/saved/318>.
>> Firefox puts the second <source> element inside nested <track> elements
>> and Safari just drops it.
>
> That is disappointing. This means we have to try and find a different
> way of marking it up. Maybe we can just throw a <tracks> element
> underneath each <source> element, as in this:
>
> <video>
>   <source src="video.ogv" type="video/ogg">
>   <tracks>
>     <track id='ogg_v' role='video' ref='serialno:1505760010'></track>
>     <track id='ogg_a' role='audio' lang='en' ref='serialno:0821695999'></track>
>     <track id='ogg_ad' role='auddesc' lang='en' ref='serialno:1421614520'></track>
>     <track id='ogg_s' role='sign' lang='ase' ref='serialno:1413244634'></track>
>     <track id='ogg_cc' role='caption' lang='en' ref='serialno:1421849818'></track>
>   </tracks>
>   <source src="video.mp4" type="video/mp4">
>   <tracks>
>     <track id='mp4_v' role='video' ref='trackid:1'></track>
>     <track id='mp4_a' role='audio' lang='en' ref='trackid:2'></track>
>   </tracks>
>   <overlay>
>     <source src="en.srt" lang="en-US">
>     <source src="hans.srt" lang="zh-CN">
>   </overlay>
> </video>
>
> Is it guaranteed that the order is retained and therefore, can we
> guarantee the association of the tracks element to the previous source
> element?

This is technically possible, but a bit odd. Before going any further with  
this we should establish that there is an actual need for marking up  
tracks like this, which I have yet to be convinced of.

> An alternative would be to have such resource composition stored in a
> separate file - a resource composition xml file (?) - on the server
> and to link to it in the <source> element (or the <video> element if
> there's only one). Then, it's not polluting the html markup and the UA
> doesn't have to parse a lengthy media file but rather only has to
> parse a separately retrieved xml file. For example:
>
> <video>
>  <source src="video.ogg" type="video/ogg" rcf="video.ogg.rcf">
>  <source src="video.mp4" type="video/mp4" rcf="video.mpg.rcf">
>  <overlay>
>    <source src="en.srt" lang="en-US">
>    <source src="hans.srt" lang="zh-CN">
>  </overlay>
> </video>

Would this be any different from linking to the resource directly, which  
quite certainly knows its composition best?

> Now, let's talk about the <overlay> element.
>
> I am not too fussed about renaming <itextlist> to <overlay>. I can see
> why you would go for this name - because most text will be rendered on
> top of or next to the video generally. It essentially provides a "div"
> into which the data can be rendered, rather than an abstract structure
> like my "itextlist". My intention was to keep the structure and the
> presentation separate from each other. But if it's general agreement
> that "overlay" is a better name, I'm happy to go with it. (Also, I'm
> happy to rename "itext" to "source", since that was already what I had
> started doing in
> http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/
> , where I've also renamed "category" to "role").
>
> I'm assuming that in an example like this one below (no matter in
> which way the tracks are exposed), the caption track of the ogg file
> would be another track in the <source> element if the UA chose that
> video.ogv file over the video.mp4 file?
>
> <video>
>   <source src="video.ogv" type="video/ogg">
>   <tracks>
>     <track id='ogg_v' role='video' ref='serialno:1505760010'></track>
>     <track id='ogg_a' role='audio' lang='en' ref='serialno:0821695999'></track>
>     <track id='ogg_ad' role='auddesc' lang='en' ref='serialno:1421614520'></track>
>     <track id='ogg_s' role='sign' lang='ase' ref='serialno:1413244634'></track>
>     <track id='ogg_cc' role='caption' lang='en' ref='serialno:1421849818'></track>
>   </tracks>
>   <source src="video.mp4" type="video/mp4">
>   <tracks>
>     <track id='mp4_v' role='video' ref='trackid:1'></track>
>     <track id='mp4_a' role='audio' lang='en' ref='trackid:2'></track>
>   </tracks>
>   <overlay>
>     <source src="en.srt" lang="en-US">
>     <source src="hans.srt" lang="zh-CN">
>   </overlay>
> </video>
>
> I.e. it would be parsed to something like:
>
> <video>
>   <source src="video.ogv" type="video/ogg">
>   <overlay>
>    <source src="en.srt" lang="en-US">
>    <source src="hans.srt" lang="zh-CN">
>    <source ref='serialno:1421849818' lang="en">
>  </overlay>
> </video>
>
> This makes it an additional caption track to display. Is this right?
> There are no alternative choices between tracks?
>
>
> I would actually suggest that if we want to go with <overlay>, we need
> to specify different overlays for different types of text. In this way
> we can accommodate textual audio descriptions, captions, subtitles
> etc. Then, I would suggest that for every type of text there should
> ever only be one <source> displayed. It is not often that you want
> more than one subtitle track displayed. You most certainly never want
> to have more than one caption track displayed and never more than one
> textual audio description track. But you do want each one of them
> displayed in addition to the other.
>
> For example:
>
> <video src="video.ogg">
>   <overlay role="caption" style="font-size:2em;padding:1em;text-align:center; display: block;">
>     <source src="en-us.srt" lang="en-US">
>     <source src="en.srt" lang="en">
>   </overlay>
>   <overlay role="tad" style="z-index: -100; display: block;" aria-live="assertive">
>     <source src="tad-en.srt" lang="en">
>     <source src="tad-de.srt" lang="de">
>   </overlay>
>   <overlay role="subtitle" style="font-size:2em;padding:1em;text-align:center; display: block;">
>     <source src="de.srt" lang="de">
>     <source src="sv.srt" lang="sv">
>     <source src="fi.srt" lang="fi">
>   </overlay>
> </video>
>
>
> BTW: somewhere along the discussion between Philip and Maciej you lost
> me, so no comments on those.

I agree on adding something like role="". On the naming, Maciej pointed
out, and I now agree, that <overlay> is presentational and not really a
brilliant choice. I think the presentation should be controlled by CSS in
some way or another.

What we agree on so far seems to be:

<video src="video">
   <sourcelist role="subtitle">
     <source src="subtitles.en.srt" lang="en">
   </sourcelist>
</video>

Where <sourcelist> is whatever name we can agree on. Maybe something that  
sounds like it has to do with timed text, I don't know.

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Thursday, 28 January 2010 13:40:52 UTC