TextTrack types and TextTrackCue types

21 Sep 2016

See also: IRC log


Present: atai2, addison, dsinger, bob, kirkwood, Joshue108, kinjim, satoshin, bob_bailey, paulj, stevez, cyril, zcorpan, francois, ericc, mav, ddahl, gadams, travis


<scribe> scribeNick: nigel


atai2: Intro, Requirements, Status, Actions


atai2: There's also a summary from Monday's Web & TV IG meeting

<paulj> Paul Jessop, Recording Industry Association of America

atai2: In HTML5 video element a track element can be added.
... it is missing a type attribute that tells the user agent which type the resource is that will be loaded for the track.
... So the markup doesn't let the UA know if it can support the resource.

cyril: For example if you have subtitles in different formats you have to make the http request to discover the type
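
As an illustration of the point above (not something shown at the meeting), here is a minimal sketch of how a type hint could let a script or UA skip resources without making any HTTP request. The `type` field models the attribute proposed in this discussion; HTML does not define it on the track element, and the names `TrackHint` and `tracksWorthFetching` are hypothetical.

```typescript
// Hypothetical description of a track element that carries a `type` hint.
// The `type` field is the attribute proposed in the discussion, not an HTML feature.
interface TrackHint {
  src: string;
  kind: string;
  type?: string; // e.g. "text/vtt" or "application/ttml+xml"
}

// Decide, without any network request, which tracks are worth fetching.
// A track with no hint is kept, because a missing hint rules nothing out.
function tracksWorthFetching(
  tracks: ReadonlyArray<TrackHint>,
  supportedTypes: ReadonlyArray<string>
): TrackHint[] {
  return tracks.filter(
    (t) => t.type === undefined || supportedTypes.indexOf(t.type.toLowerCase()) !== -1
  );
}

const candidates: TrackHint[] = [
  { src: "subs-en.vtt", kind: "subtitles", type: "text/vtt" },
  { src: "subs-en.ttml", kind: "subtitles", type: "application/ttml+xml" },
  { src: "subs-fr", kind: "subtitles" },
];

// A UA that only understands WebVTT would skip the TTML resource entirely.
const fetchable = tracksWorthFetching(candidates, ["text/vtt"]);
```

The unhinted track is still fetched here, which matches the later remark that a type can only ever be a hint.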

atai2: The HTML markup is parsed into an HTMLTrackElement object or one that supports that interface. That has
... an object that supports the TextTrack interface, which holds the content as TextTrackCues.
... Each TextTrackCue can be one caption with a defined entry and exit time.
... These are format independent interfaces for TextTrack and TextTrackCue.
... In the real world only the derived VTTCue can be constructed as an object in most environments, aside
... from Edge.

cyril: What's the status of the DataCue?
... I remember it was drafted but I'm not sure on its status.

zcorpan: It may be in the W3C HTML.

ericc: We (Apple) implemented it.


atai2: The user agent should be able to get a hint of the format type of an external, out-of-band text track resource without fetching and introspecting the resource.
... Make the TextTrackCue usable for different caption and subtitle formats

ericc: This is only an issue if the UA supports more than one format.

nigel: It's also useful for a UA that supports only one format because it allows the UA to avoid loading resources that it cannot process.

cyril: This is a bug in browsers then?

zcorpan: No, the idea was that browsers would support VTT and we wouldn't need to change formats for the track element.
... If we end up supporting different formats then obviously we would need to support type information.

atai2: That's not exactly in the HTML5 spec but it has been assumed in the implementations.

cyril: So is this an action on the HTML spec?

atai2: Let's come to that later if we identify it as a requirement.
... A minimum layer for the second requirement is that the user agent should provide a constructor for a generic, extensible TextTrackCue.
... The background is that developers wanting to support other formats like TTML can only initialise a VTTCue now
... and add some properties to get the payload. That's not the preferred approach from a developer's perspective.
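
To make the workaround just described concrete (this is an illustration, not code shown at the meeting), the sketch below builds a cue purely for its timing and attaches the real payload as an extra property. The `payload` property and the names `CueLike`, `makeCarrierCue` and `FakeCue` are hypothetical; the constructor is injected so the sketch also runs outside a browser.

```typescript
// Minimal shape shared with the browser's VTTCue(startTime, endTime, text) constructor.
interface CueLike {
  startTime: number;
  endTime: number;
  text: string;
}
type CueConstructor = new (startTime: number, endTime: number, text: string) => CueLike;

// Hypothetical carrier: a cue with an extra, non-standard `payload` property.
type CueWithPayload<P> = CueLike & { payload: P };

// Create a cue for timing only and hang the real (e.g. TTML-derived) payload off it.
function makeCarrierCue<P>(
  Cue: CueConstructor,
  start: number,
  end: number,
  payload: P
): CueWithPayload<P> {
  const cue = new Cue(start, end, "") as CueWithPayload<P>;
  cue.payload = payload;
  return cue;
}

// Stand-in for VTTCue so the sketch can run without a DOM.
class FakeCue implements CueLike {
  constructor(public startTime: number, public endTime: number, public text: string) {}
}

const carrier = makeCarrierCue(FakeCue, 1.5, 4, { ttml: "<p>Hello</p>" });
```

In a browser, passing `VTTCue` in place of `FakeCue` would give the pattern atai2 describes, including its drawback: the cue still carries all the WebVTT-specific properties that the payload format has no use for.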

zcorpan: Is this for implementing TTML in JavaScript?

atai2: Yes

cyril: Is the addCue() function not sufficient? Why do you need the constructor?

atai2: addCue() returns a cue - it only makes a VTTCue.

zcorpan: There's only a constructor for a VTTCue, not a TextTrackCue.

gadams: Doesn't addCue() take a cue as an argument? So you have to construct it to pass it in as an argument.

zcorpan: Perhaps you're thinking about addTextTrack().

gadams: You can create a DataCue but that only creates metadata and is constrained, so it can't be used for presentation.

zcorpan: If you want it rendered then some spec would need to describe the rendering rules. Or you would
... need to provide the primitives for rendering.

gadams: I'm in favour of putting html in because all browsers know how to present that. Or even just SVG,
... now that it's part of HTML5.

atai2: [HTML5 TextTrackCue interface] https://www.w3.org/TR/html5/embedded-content-0.html#timed-text-tracks

gadams: Interestingly there is a getCueAsHTML() method on TextTrackCue.

zcorpan: But that doesn't return the same DOM that the browser uses to display. In most implementations they're disconnected.

atai2: I think there are additional requirements, as Glenn mentioned, an alternative specialised from the TextTrackCue such as HTMLCue.
... For the first two requirements, does everyone agree that these are needed?

dsinger: This would be a type attribute on the track element?

atai2: Yes.

dsinger: You could do a HEAD request and consider trusting the content type.

zcorpan: Nobody does that in practice.
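
For reference, a small sketch of the second half of dsinger's suggestion: reducing a Content-Type header value to a bare media type that can be compared against supported formats. It is an illustration under stated assumptions rather than anything proposed at the meeting; the function name `mediaTypeOf` is hypothetical and the HEAD request itself is left out so the code has no network or DOM dependency.

```typescript
// Extract the bare media type from a Content-Type header value,
// e.g. "text/vtt; charset=utf-8" becomes "text/vtt".
// Returns null when the header is absent or empty.
function mediaTypeOf(contentType: string | null): string | null {
  if (contentType === null) {
    return null;
  }
  // Everything after the first ";" is parameters such as charset.
  const [first = ""] = contentType.split(";");
  const bare = first.trim().toLowerCase();
  return bare.length > 0 ? bare : null;
}
```

In a browser the input would typically come from `response.headers.get("Content-Type")` after `fetch(url, { method: "HEAD" })`, which still costs a round trip per track; that cost is what a type hint in the markup would avoid.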

gadams: Given that all uses of type are just hints and use introspection to come up with a reliable answer,
... I'm not sure that this would work other than making it symmetric with other features that use a type attribute.
... The type attribute is a hint that can be ignored.

atai2: I agree, but that is the same problem on the source element. It's helpful anyway, and there are well
... defined MIME types for VTT, TTML etc.

zcorpan: The main use would be avoiding download of definitely not supported data.

gadams: That's true.
... So that would be dangerous to have a type because the UA might avoid loading something that is actually desired.

ericc: It could be helpful too - in WebKit we populate a menu with the tracks, and we get bug reports that
... some tracks are not functional and the reason is that the UA doesn't understand the format.

<cyril> https://www.w3.org/TR/html51/semantics-embedded-content.html#datacue-datacue

nigel: A couple of other questions: 1. Do we need a "load anyway" attribute to tell UA to load a resource it doesn't understand, if there's js present that can process it?
... and 2. Should the actually loaded tracks be used to drive UI e.g. the list of available tracks?
... and 3. Should we bundle up caption tracks that contain the same essence but in different formats in e.g.
... a srcset grouping so the UA can load just the one(s) that it knows how to process?
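
Nigel's third question can be pictured with the sketch below, which is purely illustrative and was not presented at the meeting. It models one logical caption track offered in several formats and picks the first alternative the UA can process. The `TrackGroup` shape and `pickAlternative` are hypothetical; no such grouping exists in HTML.

```typescript
// Hypothetical grouping: the same caption essence offered in several formats.
interface TrackAlternative {
  src: string;
  type: string;
}
interface TrackGroup {
  label: string;
  alternatives: TrackAlternative[];
}

// Return the first alternative whose type is supported, in author order,
// or undefined when none of them can be processed.
function pickAlternative(
  group: TrackGroup,
  supportedTypes: ReadonlyArray<string>
): TrackAlternative | undefined {
  for (const alt of group.alternatives) {
    if (supportedTypes.indexOf(alt.type) !== -1) {
      return alt;
    }
  }
  return undefined;
}

const englishCaptions: TrackGroup = {
  label: "English",
  alternatives: [
    { src: "en.vtt", type: "text/vtt" },
    { src: "en.ttml", type: "application/ttml+xml" },
  ],
};

// A UA that only handles TTML would load just the second resource.
const chosen = pickAlternative(englishCaptions, ["application/ttml+xml"]);
```

Because the group carries a single label, a track menu built from groups rather than from individual resources would show one "English" entry regardless of how many formats are on offer.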

paulj: I can think of a use case for that for lyrics that either have profanities in or not, and allow a choice between.

ericc: Another one is for forced content, i.e. translation subtitles with or without included hard of hearing subtitles.

atai2: Another is do we need developer documentation for content authors for how to use text track and text track cue for different formats

zcorpan: I think if we really want to have web developers implement other text track formats then we should
... expose the primitives for rendering stuff on a video, what the browser uses e.g. to render the VTTCues
... should be in an API to do the same thing or your own thing instead of overlaying with CSS.

gadams: Isn't it the case that in WebKit and Blink the primitives are in fact HTML fragments in a div?

zcorpan: It's not exposed.

gadams: yes but that is what is actually used, is that right?

zcorpan: There are DOM nodes.

ericc: It is, but in both WebKit and Chrome it uses a custom renderer that only renders cues. It happens to
... use markup hidden in the shadow DOM, which the custom renderer renders differently from ordinary HTML.

gadams: So it's like an augmented version of HTML+CSS with customised rendering semantics?

ericc: yes

atai2: For me the question is should it be a strict requirement that every subtitle or caption resource has a
... type with documentation for how it is rendered and interacts with the cue interface. I found in the HTML5
... spec two or three areas that say cues need to be handled or rendered according to the spec for that format.
... It's not clear from the spec if it is a strict requirement to define how the cues are actually rendered.

zcorpan: I think it is a requirement on browsers but not on web developers. Web developers can do whatever they want independent of spec.

nigel: Going back to the primitives question are there off the shelf things we could use like HTML or SVG or others?

zcorpan: It's not the primitives that are the issue but the JavaScript API.

ericc: We would need to expose more 'knobs' on the interface for adjusting the display of cues.

nigel: Isn't this a separate thing to VTTCue though, and a different specialisation of TextTrackCue could have its own way of doing layout and display?

ericc: Well you would still need to provide hooks to allow developers to set the positions etc.

gadams: Let's posit a GenericTextTrackCue, the only thing you would need is a way to interpret its payload format.

ericc: You would still need to define the position of the video window etc.

nigel: There's some context information needed to feed into the document processor, that could be exposed.

ericc: Maintaining sync with the video is not easy in JavaScript - right now you have to poll the video, which
... drags down performance, and you have to handle seeks etc.

cyril: So currently it's using shadow DOM to render?

ericc: That's an implementation detail.

cyril: Do you provide access to that shadow DOM?

ericc: No absolutely not.

gadams: I agree, that would be a bad idea.

atai2: Adds a requirement: If other format types need to be supported by the user agent, they need to offer an API with rendering rules for subtitles and captions. Other formats may need to specify a similar API to WebVTT's.

cyril: I checked that DataCue in HTML5.1 is specified and has a constructor.

ericc: It does not render.

atai2: It does not work with @kind=captions

ericc: Correct.

gadams: it is constructable but constrained to metadata only.

cyril: I don't know if rendering rules for subtitles and captions need to be there, I could imagine using js
... to create a graphical representation to be synchronised with the video based on data.

zcorpan: It's not that we need the rendering rules, but that the web developer is in control of the rendering.

cyril: Typically the web developer would create SVG, HTML whatever and give that to the browser to display
... in a synchronised manner.

ericc: yes, you could stick whatever data you want in it.

gadams: That's the missing piece, right now the only option is VTTCue but if that's insufficient then you're stuck.

ericc: I think there would be lots of problems trying to find a way to render a DataCue.

nigel: +1 you would need a different generic cue type that's intended for rendering than DataCue.

cyril: So you could in js create something that has a payload that can be rendered and create js event handlers that tell the browser to render it?

ericc: yes you could even with DataCue.

Travis: Only if you're not in full screen

ericc: You can make an ancestor element of the video go full screen in everything except iOS. Otherwise
... the full screen video element does not allow js to draw over it.

zcorpan: That point is about format types supported by the user agent, but the discussion was about formats
... supported by the web developer, not the user agent.

atai2: But both are true?

zcorpan: Yes but we also want to enable formats that the UA can support?

nigel: yes, it should be generalisable to both cases, ones that the UA can support natively or that need JavaScript to process.

gadams: A question for ericc - when the technique of DOM nodes is used are there synchronisation issues?

ericc: I was describing what you have to do now, and there are issues.

gadams: You have to keep track of time?

ericc: You can keep track of time using a timeupdate handler. It can be slightly out of sync and has more
... performance impacts than native support.
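
The script-side bookkeeping ericc describes can be sketched as follows. This is an illustration rather than code from the meeting, and `TimedCue` and `activeCuesAt` are hypothetical names. The function recomputes the active cues from the current playback position, treating a cue as active when its start time is at or before the position and its end time is after it.

```typescript
// Only the timing fields matter for deciding which cues are active.
interface TimedCue {
  startTime: number;
  endTime: number;
}

// Cues active at time t: start is inclusive, end is exclusive.
function activeCuesAt<C extends TimedCue>(cues: ReadonlyArray<C>, t: number): C[] {
  return cues.filter((c) => c.startTime <= t && t < c.endTime);
}

const demoCues = [
  { startTime: 0, endTime: 2, label: "first" },
  { startTime: 2, endTime: 5, label: "second" },
  { startTime: 4, endTime: 6, label: "overlapping" },
];

// At t = 4.5 both the second and the overlapping cue are active.
const activeLabels = activeCuesAt(demoCues, 4.5).map((c) => c.label);
```

In a page this would typically be called from a `timeupdate` listener with `video.currentTime`. Because that event only fires periodically, a cue can appear or disappear slightly late, and a seek simply shows up as a jump in the time passed in.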

atai2: adds "Mechanism to find out if a user agent supports a format type"

zcorpan: So this is feature detecting native support for a format. We don't need an API for that because we
... can already detect if the interface type is supported - e.g. if VTTCue object is there you can assume support
... and similarly if e.g. TTMLCue were supported then you would be able to assume support for it.
... I realise that's not how it works today for UAs that support TTML, but that's the right way to do it.
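
The detection approach zcorpan outlines amounts to checking whether a cue constructor exists. The sketch below illustrates that idea only and was not shown at the meeting; `hasCueInterface` is a hypothetical helper, and `TTMLCue` is the hypothetical interface name used in the discussion, not an existing API. The scope object is passed in so the check can be exercised outside a browser.

```typescript
// Report whether the given scope exposes a constructor under `name`.
function hasCueInterface(scope: Record<string, unknown>, name: string): boolean {
  return typeof scope[name] === "function";
}

// Stand-in for a browser global scope that natively supports WebVTT only.
const fakeScope: Record<string, unknown> = {
  VTTCue: function VTTCue() {},
};

const supportsVtt = hasCueInterface(fakeScope, "VTTCue");
const supportsTtml = hasCueInterface(fakeScope, "TTMLCue");
```

In a real page the scope would be the global object. As zcorpan notes, a UA that supports TTML without exposing a matching interface would not be detected this way.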

ericc: That's the right way to do it.


atai2: How can standards be improved?
... Adding a type attribute to TextTrack would be a concrete proposal to add to HTML5.

zcorpan: I think that's starting from the wrong end, adding that attribute.

atai2: Let's not put these in order, but I see your point.

ericc: That's certainly the easiest step.

zcorpan: But it doesn't help!

ericc: No.

atai2: In Edge you can create a generic TextTrackCue object without the VTTCue's properties.

zcorpan: I think it's like that because we originally had just TextTrackCue and then changed it to VTTCue but Edge hasn't caught up.

gadams: In Blink there was a generic interface left for TextTrackCue but it cannot be constructed as an object.

nigel: Regardless of what Edge does and whether it is right or wrong we should think about what we want the specs to say and the implementations to do.

dsinger: What the specs say

atai2: We also need to describe how to use TextTrackCue with different formats

cyril: So you want to allow web developers to use other formats using the VTTCue? What if VTTCue doesn't support some semantics?

ericc: That's what this is about, not using VTTCue

atai2: e.g. define API to render Cues and define API for other formats

How to continue with this and make a proposal

atai2: I'm not sure if this is a task for WHATWG or Web Platform WG or there should first be a CG to evaluate different options.

zcorpan: Web Incubator Community Group would be a good place.

atai2: Show of hands for interest in contributing?

group: [several hands go up including contributors to the discussion]

atai2: Thanks I think it was a good discussion!

zcorpan: We would need to figure out how to expose enough primitives to allow a JavaScript implementation
... of WebVTT to produce the same output as the existing VTTCue.

nigel: Exactly.
... If the primitives were exposed then a single js library could be used everywhere to give a more consistent and accessible experience.

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.144 (CVS log)
$Date: 2016/09/21 13:01:59 $
