W3C

- DRAFT -

Subtitle Format Support of TextTrack

08 Nov 2017

Agenda

Attendees

Present
Andreas_Tai, Alex_Deacon, Evan_Yamanishi, Cyril_Concolato, Eric_Carlson, Pierre_Lemieux, Kyosuke_Mizutani, Nigel_Megitt
Regrets
Chair
Andreas
Scribe
nigel

Introductions

Alex_Deacon: MPAA

evan: Evan Yamanishi, WW Norton & Co, textbook publisher

cyril: Cyril Concolato, Netflix

ericc: Eric Carlson, Apple

pal: Pierre Lemieux, Movielabs

Kyosuke_Mizutani: Kyosuke Mizutani, Japan Commercial Broadcasters Association

Andreas: Andreas Tai, IRT

nigel: Nigel Megitt, BBC

Recap

Andreas: We held a similar session last year.
... TextTrack has no type attribute to identify formats.
... Using TextTrackCue for anything other than WebVTT means you have to use the VTTCue interface,
... because TextTrackCue has no constructor.
... Example: <video controls><track kind="captions" src="..."> - cannot say if it is VTT or TTML

ericc: Only matters when the browser supports more than one format

Andreas: Question is if there is a use case for that?

cyril: If your browser supports only one format then you only want to load the resources you can process

ericc: In a world where there aren't browsers that support more than one format, it's not so important.

Andreas: The second issue was that there are more formats, used more widely, and using
... the text track model makes a lot of sense, and then you have to use VTTCue with
... onenter and onexit events to overlay an html payload on the video.
... You have to instantiate an object with a lot of properties that are not needed, so it's not perfect.
... As last year: we can use DataCue, but it's not used for rendering.
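
Illustration (not shown at the session): the workaround Andreas describes, where a cue is used only as a timing carrier and its enter and exit events drive an HTML overlay, might look roughly like this. `CueLike`, `OverlayLike` and `bindOverlay` are hypothetical names; the two interfaces are stand-ins for a browser VTTCue and an element positioned over the video, so that the sketch runs without a DOM.

```typescript
// Hypothetical stand-ins for the browser objects, so the sketch runs without a DOM.
// In a page these roles would be played by a VTTCue and an element over the video.
interface CueLike {
  startTime: number;
  endTime: number;
  text: string; // the cue carries a text field even when the real payload lives elsewhere
  onenter: (() => void) | null;
  onexit: (() => void) | null;
}

interface OverlayLike {
  innerHTML: string;
}

// Use the cue purely as a timing carrier: show the HTML payload when the cue
// becomes active and clear it again when the cue ends.
function bindOverlay(cue: CueLike, overlay: OverlayLike, htmlPayload: string): void {
  cue.onenter = () => {
    overlay.innerHTML = htmlPayload;
  };
  cue.onexit = () => {
    overlay.innerHTML = "";
  };
}

// Usage with plain objects standing in for the browser types.
const carrierCue: CueLike = { startTime: 1, endTime: 4, text: "", onenter: null, onexit: null };
const overlayBox: OverlayLike = { innerHTML: "" };
bindOverlay(carrierCue, overlayBox, "<p>Hello</p>");
```

The unused `text` field reflects the complaint above: the carrier object has properties the script never needs.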

cyril: In HTML 5.1 it was "at risk", in 5.2 it is not "at risk" and in WHATWG HTML it is absent.

Andreas: Captions are not the use case for DataCue.

ericc: The purpose of adding it was to have timed data that the browser doesn't know anything
... about but so that scripts can receive timed data to do whatever is appropriate.

Andreas: This was discussed last year, consensus was it is not good to use for captions.
... Should not set kind="captions"

ericc: No, you should not

Andreas: The kind should be "metadata"

cyril: That's not what the spec says.

ericc: If you have captions then you should tag the text track as containing captions, but if
... they're not actually captions but are metadata then that doesn't make sense.

cyril: That means you can still have captions in DataCue, it is not a problem

ericc: There are issues we may cover later.

cyril: DataCue vs VTTCue is orthogonal to the kind attribute value.
... On caniuse there's no support for DataCue.

ericc: Firefox doesn't support it and they took it out of Chrome. Last time I checked
... Edge had primitive support for part of the WebVTT spec.

Andreas: Last year Simon Pieters said the main use for the type attribute is to avoid downloading
... of definitely not supported data.
... Something else is to prioritise formats based on which is more appropriate.
... Then if we open the model up beyond WebVTT, then what needs to be done?
... Comments from Simon and ericc that there have to be rendering rules, and an assumption
... that the web developer has an interface to change the cue and the presentation.
... For TTML working with VTTCue there's no method for interacting with the cue or changing
... presentation.
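
Illustration (not shown at the session): the two uses of a type attribute mentioned above, skipping unsupported resources and prioritising formats, could be sketched as a simple selection step. The `type` field models the hypothetical attribute; the track element has no such attribute today, and `TrackSource` and `pickTrackSource` are invented names.

```typescript
// A candidate subtitle resource. The `type` field models the hypothetical
// `type` attribute discussed here; <track> has no such attribute today.
interface TrackSource {
  src: string;
  type: string; // MIME type, e.g. "text/vtt" or "application/ttml+xml"
}

// Walk the player's formats in preference order and return the first candidate
// that matches. Returns undefined when nothing matches, in which case no
// subtitle resource needs to be downloaded at all.
function pickTrackSource(
  candidates: TrackSource[],
  supportedInPreferenceOrder: string[],
): TrackSource | undefined {
  for (const wanted of supportedInPreferenceOrder) {
    const match = candidates.find((c) => c.type === wanted);
    if (match !== undefined) {
      return match;
    }
  }
  return undefined;
}

const subtitleCandidates: TrackSource[] = [
  { src: "subs.vtt", type: "text/vtt" },
  { src: "subs.ttml", type: "application/ttml+xml" },
];

// A player that can handle both formats but prefers TTML.
const chosenSource = pickTrackSource(subtitleCandidates, ["application/ttml+xml", "text/vtt"]);
```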

nigel: Is there an API for querying the user's preferences about caption presentation?

ericc: No, and I think it would be very hard to persuade people that it would be worth
... the increased fingerprinting footprint - it is quite large.

nigel: OK, is there a way to ask for a local stylesheet to be applied?

ericc: No, it would give the same issue of fingerprinting

nigel: I see, yes, after applying the CSS your JS could still query the resultant style properties.

ericc: Yes

Andreas: Do we need what we asked for last time - an extended API for exposing the properties
... for styling the captions?

ericc: Yes, if our goal is to come up with a way for a script to be able to parse a cue format
... that the browser doesn't understand but then hand over the responsibility for rendering to the browser
... then we need APIs so the script can give the browser all the information that it has about
... how to render that cue. Stating the obvious, I think that's a good goal because the browser
... has information the script can't ever have, like how does the user of this computer want
... to have their captions rendered? Does the person want to choose a font size, etc. These
... are system level settings. And so that the user agent can also make a choice about which
... (if there's more than one text track) kind of track to present based on those settings.

lsullam: Lillian Sullam, Penguin Random House

Andreas: The last point is that you can bypass TextTrackCue and do your own rendering.

ericc: A lot of people do that

nigel: We do that in the BBC

Andreas: It has performance problems.

ericc: Yes and other implications, like there's no way to apply the user's preferences to it.

Andreas: The conclusion last year was to go on with this, there was interest, and I volunteered
... to create a proposal and feed it back to last year's participants and start an activity
... in WICG but I did not manage it. This is a second try, to discuss how to do that and if
... it is a valid way to go.
... Also, some parts of HTML5 I didn't notice last year.
... For example "type" - HTML5 is aware there's no type attribute, and it says that there is
... no common way to check the MIME types. It proposes to use the HTTP header to look for
... the content type, and we agreed last year it was not a good way to do it.

ericc: It is not reliable.

Andreas: Sniffing the payload is not good either.

ericc: [agrees]

Andreas: Then there's a section 4.7.14.11.4 about in band data. There are a few constraints
... which an implementation needs to apply, like converting to text track cues and conforming
... to what is specified in HTML 5 about text track cue, which is not a lot.

ericc: As far as I know WebKit is the only engine that does anything with in-band text tracks.

nigel: It's missing from MSE, should it be there?

ericc: MSE has tracks of type text but I don't think anyone uses it.

cyril: It's worse - if you put text in then it doesn't render because there's nothing in the source buffer.

ericc: I think that's a result of the lack of support. As far as I know we've never had a bug
... report about this.

pal: I raised this when MSE was being designed 3-4 years ago. I got the response that if you
... are not happy with the functions in the browser for media container parsing, then do the
... parsing on your own.

ericc: Yes but that doesn't always work. Even if you have the code to do the parsing yourself.

pal: That philosophy may have driven the decisions in the specs.
... It was a top level direction in the MSE group.

ericc: The MSE spec does support text tracks but I don't think any of the browsers do.

Andreas: In general for out of band text tracks or text track data if there is another format
... then at least a mapping is needed from the format feature to what is defined for text track cue.
... There is one document, an unofficial draft, that Cyril contributed to, about in band tracks.
... It mentions encapsulation of TTML in ISOBMFF. It says the best option would be to map
... it to a not-yet-defined TTMLCue, or second if not available, to map to a VTTCue, or
... thirdly possibly use a DataCue object. This is actually referenced by HTML at the moment.
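
Illustration (not shown at the session): the order of preference described in the in-band tracks draft amounts to a fallback chain, sketched below. "TTMLCue" is not defined anywhere yet and appears only as a label; `CueKind`, `CUE_PREFERENCE` and `chooseCueKind` are invented names.

```typescript
// Cue interfaces named in the in-band tracks draft, in its stated order of preference.
// "TTMLCue" is not defined anywhere yet; it appears here only as a label.
type CueKind = "TTMLCue" | "VTTCue" | "DataCue";

const CUE_PREFERENCE: CueKind[] = ["TTMLCue", "VTTCue", "DataCue"];

// Pick the most preferred cue interface that the (hypothetical) environment offers,
// or undefined when none of them is available.
function chooseCueKind(available: ReadonlySet<CueKind>): CueKind | undefined {
  return CUE_PREFERENCE.find((kind) => available.has(kind));
}

// Example: an environment with VTTCue and DataCue but no TTMLCue.
const pickedKind = chooseCueKind(new Set<CueKind>(["DataCue", "VTTCue"]));
```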

Discussion and next steps

Andreas: Two years ago there was a more specific target to have a TTMLCue or HTMLCue
... and then last year we discussed a more generic cue object. The question is what do we
... need to define to get a generic cue concept, and where should it live. Should it be in HTML
... or something specified elsewhere?

ericc: That's putting the cart before the horse. My opinion is that it is a worthwhile activity for the
... reasons I have said before. We would certainly support it. Last year we talked about
... starting a CG and coming up with some kind of a specification, and if it gets to the point
... where it looks like a reasonable thing, figure out if it is an extension spec or if it should
... go into HTML, but I don't think we should spend time thinking about that yet.

Andreas: That was why Simon suggested using WICG

ericc: Yes

Andreas: Is it still a good option?

cyril: I think it is considered a good approach to go through WICG with a prototype, some
... use cases, get feedback, and then begin a CG or even a WG.
... First is the use cases. You have to convince people that the current VTTCue is not good enough.

ericc: I don't think it will be hard at all to convince people of that.

nigel: A point to note is that you could conceive of this as possibly being an activity of
... the Timed Text Working Group, and that group's charter expires at the end of March 2018.
... So that could set a timeline for going through this WICG process and generating output,
... if we want it in the TTWG charter.

Andreas: For incubating, a more neutral home would be good.

ericc: +1

Andreas: The HTML spec expects a more generic cue type

ericc: I would be very surprised if that ever happens.

Andreas: The browser manufacturers only implement VTTCue but the HTML spec itself is neutral.
... And VTT gives all the rendering rules that HTML requires.
... If we now want a more generic cue would it fit in that model? Possibly not 100%.
... That generic cue would also be usable for VTT.

ericc: Sure, I don't think it would make sense to remove the existing interface, but if there
... were a script library that did a better job of handling WebVTT using the generic cue
... mechanism then that might make sense.

Andreas: So it would be an extension of the HTML5 TextTrackCue?

ericc: Yes

Andreas: What is needed?

ericc: I think what is needed is to define the JavaScript interface - what are its properties,
... methods, constructor inputs etc.

Andreas: There are so many properties that affect subtitle presentation, not just positioning,
... also fonts, colours etc. Should this all go in the API? There is not clear alignment between
... VTT and TTML, with some features not in both formats. So maybe we have to concentrate
... on more fundamental features.

ericc: Yes exactly. We have to focus on the things that are really needed for the web. If it
... ends up being a huge spec then it would be unlikely to be implemented.

pal: There's HTML and TTML that both exist already. Why does the cue interface need to
... be anything other than set HTML or set TTML?

ericc: Set TTML would only work if the browser understands it.

pal: So why not set HTML?

ericc: Maybe, I'm not sure.

pal: I've been discussing with the folks at Shaka Player. They have their own extension of
... TextTrackCue. I asked how to extend the API to support TTML, which allows font size
... to change, italics in the middle of the cue etc, not just a single setting for a whole cue.
... So I could try to reproduce all the features and we would end up with TTML essentially.
... It would be a lot of effort for little point. Then there's also HTML, so if there's a good
... mapping to HTML then we would not have to duplicate all this work. Part of the issue
... is that captions are complex, it's more than one setting per cue. How do you do that in
... an API?

cyril: Canvas!

pal: Maybe not the right answer but in my mind browsers know HTML and it's already there,
... so why not just use it.

Andreas: What would there be in addition to TextTrackCue?

pal: Ideally you'd setTTML(), but maybe throwing in an idea, setHTML()?

ericc: There's a method on VTTCue that returns a DOM fragment. I think you're saying there
... would be a generic cue that would set the content for a DOM fragment. When we were
... working on VTTCue in the first place people had a lot of concerns about setting arbitrary
... CSS on it.

pal: I remember that - there were concerns about making any HTML behaviour happen. But
... that's already possible in general, so what is special about doing it in a Cue? I don't know.

Andreas: Then how do you expose features to the web developer like setting fonts etc?

ericc: If there were a constructor that returned you a cue and it took as its argument the
... start time, duration and a DOM fragment, then in your format parser it would be responsible
... for making that DOM fragment and doing whatever is needed to make it render the right way.
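
Illustration (not shown at the session): the constructor ericc outlines, taking a start time, a duration and a DOM fragment, might be sketched as follows. `GenericCue` is a hypothetical class, not an existing browser interface; the type parameter stands in for a DOM fragment so the sketch runs without a browser, and `ParsedSubtitle` and `toGenericCue` are invented for the example.

```typescript
// Hypothetical generic cue: NOT an existing browser interface. The type parameter F
// stands in for a DOM fragment so the sketch runs without a browser.
class GenericCue<F> {
  readonly startTime: number; // seconds
  readonly duration: number; // seconds
  readonly fragment: F;

  constructor(startTime: number, duration: number, fragment: F) {
    if (duration < 0) {
      throw new RangeError("duration must not be negative");
    }
    this.startTime = startTime;
    this.duration = duration;
    this.fragment = fragment;
  }

  get endTime(): number {
    return this.startTime + this.duration;
  }
}

// The format parser owns the job of building the fragment. Here the "fragment"
// is just a string of markup taken from an invented subtitle record.
interface ParsedSubtitle {
  begin: number;
  end: number;
  markup: string;
}

function toGenericCue(sub: ParsedSubtitle): GenericCue<string> {
  return new GenericCue(sub.begin, sub.end - sub.begin, sub.markup);
}

const sketchCue = toGenericCue({ begin: 2, end: 5.5, markup: "<p><i>Hello</i> world</p>" });
```

In this shape the cue exposes no styling properties of its own: everything about presentation lives in the fragment, which is why the question of applying user preferences comes up next.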

pal: Back to what you were saying, if you move user preference away from the UA and give
... it to the app then the app has to make that happen within the HTML.

Andreas: We have that problem with TVs too, where there is an app that handles subtitles,
... and there are user preferences in the TV, and the question is how to allow them to interact.

ericc: Yes. On macOS and iOS there's a user setting for font colour, background colour, the
... stroke, and some other attributes, so any interface is going to require that the browser
... is going to be able to apply those styles.

nigel: Isn't this the earlier point, that after presentation the JavaScript could query the settings,
... and that would be open to fingerprinting?

ericc: No, you would hide that information from the JavaScript and only apply it on presentation.

nigel: I'm not sure it is a good model to have system-wide settings - you might want provider-based settings.

group: [discussion of customisation options]

pal: I just want to confirm that the vision for user setting, and preserving across apps...

ericc: It's a system setting, so it's up to the application that renders to apply the system settings.

pal: How does that prevent fingerprinting?

ericc: It's not exposed to scripts.

pal: Thank you.

Summary and close

Andreas: We got to a similar point to last year so it would be good to start this activity!

ericc: yes

Andreas: If this is fine, or someone else wants to volunteer, then I will do that.

ericc: This is important, if you do that we will support you.

Andreas: Should we go for both models, the generic cue and ...?

ericc: We're talking about the same thing, just about how to get the cues into the user agent.

pal: Applying the styling is the complex area.

evan: ARIA is working on settings and personalisation semantics.

<evan> https://w3c.github.io/personalization-semantics/

Andreas: Thank you all for joining.

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.152 (CVS log)
$Date: 2017/11/08 22:05:55 $
