Meeting minutes
cpn: We're running this meeting under WICG contribution requirements, not the M&E IG.
… Goal today is to try to get alignment on what we want to propose with regard to TextTrackCue.
… I'd like to do that in a way that takes into account the proposals that have been put forward.
… Ideally, I'd like to end with something that's compatible with Apple's proposal, especially since they already shipped the DataCue API, and that satisfies Rob's requirements.
RobSmith: That sounds good to me. I have some diagrams to guide us.
… [showing hierarchy of VTTCue and TextTrackCue]
… If we start with VTTCue in the blue box, that has text content and getCueAsHTML() as a method, and some other functionality.
… That inherits from TextTrackCue which has a start time and an end time.
… In the yellow box, we have DataCue. VTTCue is the Timed Text carrier. DataCue is the timed metadata carrier.
… This design works.
… The little circles show which interfaces can be constructed. TextTrackCue does not have a constructor.
… Also, DataCue does not actually exist in practice across browsers.
… It does not have wider adoption.
… If we had wider support, the conversation would probably end here.
… With that in mind, here is a slightly different proposal, which is what I proposed in WICG/datacue.
… Most of this is the same. VTTCue and TextTrackCue are as they were, except that TextTrackCue can now be extended because it has a constructor.
… The MyExtendedCue class is user-defined. The only change is to allow access to the constructor of TextTrackCue.
… If there is a constructor, the class of an extended cue can be used to identify the type of the cue.
… It does away with the need for DataCue. It has a type, it has a value. And it has custom functions in there as well.
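The extension model Rob describes can be illustrated with a minimal sketch. It assumes a constructible base class taking (startTime, endTime); since TextTrackCue cannot be constructed in browsers today, a hypothetical stand-in class is used, and the `LocationCue` name and its fields are illustrative only.

```javascript
// Hypothetical stand-in for a constructible TextTrackCue. In browsers today,
// TextTrackCue has no public constructor, so this models the proposed change.
class TextTrackCueStandIn {
  constructor(startTime, endTime) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

// User-defined cue type; the class name and fields are illustrative only.
class LocationCue extends TextTrackCueStandIn {
  constructor(startTime, endTime, latitude, longitude) {
    super(startTime, endTime);
    this.latitude = latitude;
    this.longitude = longitude;
  }

  // Custom function defined on the cue class itself.
  toGeoJSON() {
    return { type: "Point", coordinates: [this.longitude, this.latitude] };
  }
}

const cue = new LocationCue(0, 5, 51.5, -0.12);
console.log(cue instanceof LocationCue); // true
console.log(cue.constructor.name);       // "LocationCue"
console.log(cue.toGeoJSON());
```

In this sketch the class itself plays the role of the DataCue type, and the instance fields play the role of its value.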
cpn: Essentially, the web platform should allow TextTrackCue to be constructed so that I can create my own derived class. And then I get the functionality from TextTrackCue.
RobSmith: Yes, exactly, it gives access to the functionality, which does not have much to do with "text tracks" in practice.
cpn: HTML already has the concept of different types of text tracks (e.g., chapters).
… We probably have to live with the fact that it's called TextTrackCue in any case.
RobSmith: I have a tiny change to suggest to Apple's proposal but we can then propose it for integration, I think.
… Looking at Apple's proposal, in this design, my understanding is that VTTCue is modified here. The text content still exists, but it now depends on cueNode, which is an attribute of TextTrackCue.
… At the top, we have startTime and endTime, then cue and cueBackground are boolean flags, related to rendering the cues on screen.
cpn: My reading of the proposal is that these would actually be CSS pseudo-classes, not attributes on the TextTrackCue.
… It would be a style that you can attach to any HTML element.
RobSmith: My understanding is that these get attached to cueNode, a DocumentFragment, some element in the document.
… The cue and cueBackground are part of the constructor but you have to have a cueNode of the right type.
… I'm a little unclear as to why cueNode and getCueAsHTML() are different.
cpn: If we're trying to implement rendering of TTML cues, I would parse the TTML document, use that to create a DocumentFragment to pass to the TextTrackCue constructor, and then it would be rendered at the right time by the browser.
Pierre: That's my understanding as well.
cpn: There's a whole different set of questions as to whether this proposed solution solves TTML/IMSC text track rendering. Perhaps we don't need to get into that here. The thing we're interested in here is carriage of more general-purpose metadata than caption rendering.
… When I look at that picture from that perspective, I see weirdness in relation to DataCue, as DataCue does not have any business with DOM rendering.
… You'd like TextTrackCue to be completely generic, and then you'd like derived classes that have the functionality you need.
Alicia: I agree with that. Ideally, you'd have a TimedEvent generic class, and then derived classes.
… Unfortunately, it might be a bit too late to change that.
RobSmith: I agree, it's called cue because it has a start time and an end time.
… Start time and end time is what is shared across scenarios.
cpn: I'm ignoring the weirdness of the inheritance of DataCue for a moment. The proposal is to add a constructor with an additional parameter on top of start time and end time, to set the cueNode.
… If that parameter were optional, you'd get the generic instance that you need to build your user-defined derived class.
… That's the change I was wondering about, making the parameter optional.
Alicia: Didn't Apple have the same problem when they implemented DataCue in WebKit?
cpn: Yes, I assume that they had to do something like make things null, yes.
Alicia: Or maybe not. I'm not familiar with the proposal.
RobSmith: Another proposal is to make the cue region pseudo-element global instead of limiting it to VTTCue. Presumably to apply to TextTrackCue as well.
cpn: I guess that's what I'd like to understand. Whether that has meaning from a DataCue perspective.
RobSmith: I think they're trying to link things to user preferences, e.g., if you have a dark theme set, then cues would match that theme. There are some restrictions today related to privacy, I believe.
Alicia: Dark/Light theme is already exposed though.
cpn: There are other parameters, e.g., size.
Alicia: Yes, there is some tension there.
… If one of the goals is to have a non-VTT text track, one of the things you need is to position things on the screen. I assume that's part of the motivation. I haven't read the proposal in enough depth to comment for sure.
RobSmith: That puzzles me. If you want big text, then the cue is going to pop up huge on the screen. I would assume some balance is going to be needed.
Alicia: Yes. In broadcast, there are rules for closed captions overlapping the underlying video.
RobSmith: The proposal also wants to support legacy formats, to integrate that with web browsers.
… That's out of my domain.
Alicia: One of these formats is CEA/CPA/[lots of names] 608 that was used in TV broadcasting for closed captions.
… This is also used in the wild outside of the TV world.
Pierre: What problems are we trying to solve? It'd be good to list them. Is it a synchronization issue? Subtitle captioning issue? Something else?
cpn: I don't think it's a captioning issue. We're trying to enable web applications to synchronize any kind of metadata alongside a plain audio/video stream.
… Rob synchronizes geolocation information alongside a video.
Pierre: Today, you can do that with VTTCues.
… With an empty cue.
… It's already in the browser anyway.
RobSmith: Yes, but that comes with heavy plumbing.
Pierre: OK, so I would not even talk about TextTrackCue. That's confusing. Every time we get back to that discussion, we get back to cue rendering and CEA 608 and the like, and that's totally orthogonal to what you seem to need.
… TextTrackCue is "text". The name says "text track".
RobSmith: Here is a proposal for how to harmonize my proposal and Apple's proposal.
… Same proposal as Apple's, except I decoupled it from TextTrackCue, through a new FragmentCue interface. Functionally, it's the same thing. Whatever functionality they want to add is fine.
cpn: This makes sense to me. That's the hierarchy I was envisioning indeed.
… Back to Pierre's point, you can use VTTCue today. The wrinkle is that you have to hold the payload as text cue content.
RobSmith: You don't, you can add arbitrary attributes.
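The two ways of carrying metadata on a VTTCue mentioned in this exchange can be sketched as follows. This is an illustration only: the payload shape and the `payload` property name are assumptions, and a hypothetical stand-in class is substituted when VTTCue is not available (for example outside a browser).

```javascript
// Use the browser's VTTCue when available; otherwise fall back to a
// hypothetical stand-in so the sketch also runs outside a browser.
const Cue = globalThis.VTTCue ?? class {
  constructor(startTime, endTime, text) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.text = text;
  }
};

// Option 1: hold the payload as text cue content (here, serialized JSON).
const textCue = new Cue(10, 15, JSON.stringify({ lat: 51.5, lon: -0.12 }));
const decoded = JSON.parse(textCue.text);

// Option 2: empty cue text, with the payload attached as an ad hoc property.
// `payload` is not part of the VTTCue interface; it is an illustrative name.
const emptyCue = new Cue(10, 15, "");
emptyCue.payload = { lat: 51.5, lon: -0.12 };

console.log(decoded.lat, emptyCue.payload.lon);
```

In a page, such cues would typically be added with `addCue()` to a track created via `video.addTextTrack("metadata")`, so that nothing is rendered on screen.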
cpn: Is there a conclusion that the design is imperfect, and that we would do it differently if starting again from scratch, but if it's usable for that purpose, do we need anything new at all?
… Are we pursuing a real problem?
RobSmith: The polyfill builds on top of VTTCue. But the only bits it's interested in are start time and end time. And there's nothing that needs the rest of the functionality.
… From that perspective, I agree, this is a practical solution.
… If you look at MDN and you look at TextTrackCue, it says that it is an abstract base class.
… That is what I propose to do in my proposal: add a constructor, but keep the class as abstract by rejecting on direct instantiation.
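One way to model "a constructor that rejects direct instantiation" in plain JavaScript is a `new.target` check. The sketch below is a hypothetical model of that behavior, not the proposal's actual definition; `AbstractCue` and `SensorCue` are illustrative names.

```javascript
// Hypothetical model: the constructor exists, but direct instantiation is
// rejected, so the class stays abstract while remaining extensible.
class AbstractCue {
  constructor(startTime, endTime) {
    if (new.target === AbstractCue) {
      throw new TypeError("Illegal constructor: derive a class instead");
    }
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

// A derived class reaches the base constructor through super(), where
// new.target is SensorCue, so the check above does not fire.
class SensorCue extends AbstractCue {
  constructor(startTime, endTime, reading) {
    super(startTime, endTime);
    this.reading = reading;
  }
}

let directError = null;
try {
  new AbstractCue(0, 1);
} catch (err) {
  directError = err;
}

console.log(directError instanceof TypeError); // true
console.log(new SensorCue(0, 1, 42).reading);  // 42
```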
Alicia: What you're changing is that you're allowing JS apps to derive from TextTrackCue and call its constructor.
RobSmith: That's exactly it. Exposing it to user access.
… See the Wiki page I prepared to demo the proposal.
… The constructor takes an object that has a foreground and background and that's about it. Nothing further is required.
… Unlike DataCue, the class defines everything.
… It's a more modern version of JavaScript.
Alicia: DataCue was also meant to support in-band tracks, that's why it has a type attribute as well. I'm ok with users creating their fancy cues.
cpn: The reason why your proposal does not support in-band is that you need some interface that the browser needs to instantiate when it parses in-band tracks, so that it may render or expose them.
… But that's a simple extension to this model. DataCue can still exist alongside this proposal.
RobSmith: This is a simpler change compared to DataCue. With a minor change, it makes it available to users.
cpn: I'm wondering now. If we were to introduce a TextTrackCue constructor, would web developers start implementing their own DataCue class that would prevent standardization of the class later on?
Alicia: Chris, that's the case for everything. Window is already exposed.
cpn: I have this recollection of people being constrained because polyfills exist as browsers don't want to break existing user code. But this is so niche that it may not be a concern.
Alicia: I don't think it's necessary to reserve the term.
cpn: With this proposal, the DataCue proposal refocuses on in-band scenarios. I think it all hangs together. The question in my mind: do we really need to inhibit TextTrackCue construction?
RobSmith: If you allow that, that undermines the type identification in my model. If you allow TextTrackCue objects to be instantiated directly, less able programmers will just use that and use JS to add functionalities. Then we wouldn't be able to tell the difference. And then you'll have to guess the type of cues.
… Blocking the constructor forces them to do the right thing.
cpn: I'm trying to think how the problem arises. Combining code from different authors who made different choices?
RobSmith: You load the class. You construct a cue. You add event handlers, and then you add it to a text track. Then it's handled by the scheduler.
… When the cue is dispatched to an event handler, the handler needs to figure out what type of cue it is. And it could be coming from somewhere else on the page.
… You could have a location but you can also have sensor data coming in as well. Different types of cues.
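The type identification concern can be illustrated with a small sketch. It assumes, hypothetically, a base cue class that can be constructed directly; all class and function names here are illustrative stand-ins rather than part of any proposal.

```javascript
// Hypothetical stand-in for a TextTrackCue that can be constructed directly.
class GenericCue {
  constructor(startTime, endTime) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

// Two illustrative cue types, as if supplied by different libraries.
class GeoCue extends GenericCue {}
class AccelerometerCue extends GenericCue {}

// A handler that receives cues from anywhere on the page and has to
// work out what kind of cue it has been given.
function describeCue(c) {
  if (c instanceof GeoCue) return "location";
  if (c instanceof AccelerometerCue) return "sensor";
  return "unknown"; // only guessable from ad hoc properties
}

// A cue built directly from the base class, with data bolted on in plain JS.
const untyped = new GenericCue(0, 1);
untyped.lat = 51.5;

console.log(describeCue(new GeoCue(0, 1)));           // "location"
console.log(describeCue(new AccelerometerCue(0, 1))); // "sensor"
console.log(describeCue(untyped));                    // "unknown"
```

Cues created from derived classes can be told apart with `instanceof`, whereas the directly constructed cue falls through to the "unknown" branch.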
Francois: That's up to the developer. What you propose forces developers into one way of doing it, but other ways may be equally valid.
RobSmith: I'm concerned that two different people may be creating different cue types.
Alicia: You may be using two different libraries that provide you with different cues. If one of those makes the bad decision of instantiating TextTrackCue instead of deriving it, your life is going to be worse. Regarding ergonomics, you would want to guide JS users to do the right thing.
cpn: We need to stop here because of time.
… Easiest for me would be two weeks from now, same time.
… To continue the discussion.
… Or this time on Thursday next week.
RobSmith: I'm planning to propose a breakout session for Breakouts Day 2025.
cpn: OK, let's do Thursday next week then.
Francois: Regrets from me.
cpn: In the diagrams, I would suggest adding the DataCue interface as it exists today.