28 June 2021


Present: Chris_Needham, Iraj_Sodagar, Kaz_Ashimura, Nigel_Megitt, Rob_Smith, Yasser_Syed
Scribe: cpn, nigel

Meeting minutes

Unbounded Cues

Rob: The discussion is now about how to support unbounded cues represented through bounded cues.
… A cue with a start time and no end time would be ignored by engines that don't support unbounded cues. That's correct.
… If you update the cue to have a bounded end time, you're effectively repeating it as a bounded cue. The engine can understand that, but it may be late.
… As an example, take a cue from 1 minute to forever, then update it at 2 minutes to end at 3 minutes. That's graceful degradation, the best we can do.
… Can we fill the gap between 1 minute and 2 minutes, so that the system displays the cue earlier? One suggestion is for the unbounded cue to be displayed forever, but then how do you end the cue?
… Whatever you do there is wrong. Looking at it differently, we know how it degrades when displayed late, but what you want is a bounded cue that runs from 1 minute to 2 minutes.
… I think it's an insoluble problem. Either update the engine to understand unbounded cues, or construct your WebVTT file using only bounded cues, chopping it into sections.
… The bounded cue engine would understand that, but it requires the VTT file to be authored in that way, which requires knowledge of the receiver's capabilities.
… I don't see how to have the unbounded cue displayed instantly. The only thing to do at that point is to display it forever, which may not be right.
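
Rob's second option, authoring the WebVTT file with only bounded cues, could be sketched as follows. This is a minimal illustration, not part of any proposal: it assumes the author knows how far the cue can currently be vouched for (for example, the live edge) and emits consecutive fixed-length bounded cues up to that point. The names `chopUnboundedCue`, `formatTimestamp` and `toWebVTT`, and the 30-second segment length used below, are hypothetical.

```typescript
// Hypothetical sketch: represent a cue whose end is not yet known as a run of
// short bounded cues, so an engine without unbounded-cue support can show it.
// Times are integer milliseconds to avoid floating-point drift.

interface BoundedCue {
  startMs: number;
  endMs: number;
  text: string;
}

// Zero-pad a non-negative integer to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as a WebVTT timestamp (hh:mm:ss.ttt).
function formatTimestamp(ms: number): string {
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}.${pad(millis, 3)}`;
}

// Split [startMs, knownUntilMs) into consecutive bounded cues of at most
// segmentMs each. Later sections would be appended as the stream continues.
function chopUnboundedCue(
  startMs: number,
  knownUntilMs: number,
  segmentMs: number,
  text: string
): BoundedCue[] {
  if (segmentMs <= 0) throw new Error("segmentMs must be positive");
  const cues: BoundedCue[] = [];
  for (let t = startMs; t < knownUntilMs; t += segmentMs) {
    cues.push({ startMs: t, endMs: Math.min(t + segmentMs, knownUntilMs), text });
  }
  return cues;
}

// Serialise bounded cues as a WebVTT document.
function toWebVTT(cues: BoundedCue[]): string {
  const blocks = cues.map(
    (c) => `${formatTimestamp(c.startMs)} --> ${formatTimestamp(c.endMs)}\n${c.text}`
  );
  return "WEBVTT\n\n" + blocks.join("\n\n") + "\n";
}
```

With Rob's example of a cue starting at 1 minute and the live edge at 2 minutes, `chopUnboundedCue(60000, 120000, 30000, "Live")` yields two cues, 00:01:00.000 to 00:01:30.000 and 00:01:30.000 to 00:02:00.000. The segment length trades the number of cues against how finely the published sections follow the live edge; either way, as Rob notes, the author has to choose this form with the receiver's capabilities in mind.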

Chris: Other than diverging WebVMT from WebVTT?

Rob: That puts a big overhead on it. If you can modify existing bounded cues, you need to be able to identify the cue so you can go back and modify it later.
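
Rob's point about identifying cues could be illustrated with a small registry keyed by cue identifier. This is a hypothetical sketch: `CueRegistry` and its methods are invented for illustration, and it assumes, as in the unbounded-cue discussion, that an unbounded cue is represented by an end time of `Infinity`. In a browser, the nearest existing operations would be `track.cues.getCueById(id)` followed by assigning to the cue's `endTime`.

```typescript
// Hypothetical sketch: keep cues addressable by id so that an unbounded cue
// (endTime = Infinity, an assumption here) can later be given a bounded end.

interface MutableCue {
  id: string;
  startTime: number; // seconds
  endTime: number; // seconds; Infinity while unbounded
  text: string;
}

class CueRegistry {
  private cues: { [id: string]: MutableCue } = {};

  addUnbounded(id: string, startTime: number, text: string): MutableCue {
    if (this.cues[id] !== undefined) throw new Error(`duplicate cue id: ${id}`);
    const cue: MutableCue = { id, startTime, endTime: Infinity, text };
    this.cues[id] = cue;
    return cue;
  }

  // The "go back and modify it later" step: only possible if the cue
  // can be found again by its identifier.
  setEnd(id: string, endTime: number): void {
    const cue = this.cues[id];
    if (cue === undefined) throw new Error(`unknown cue id: ${id}`);
    if (endTime < cue.startTime) throw new Error("end time before start time");
    cue.endTime = endTime;
  }

  // Cues active at time t, using the half-open interval [startTime, endTime).
  activeAt(t: number): MutableCue[] {
    return Object.keys(this.cues)
      .map((k) => this.cues[k])
      .filter((c) => c.startTime <= t && t < c.endTime);
  }
}
```

The half-open test `startTime <= t < endTime` follows the HTML rule for which cues are active at the current playback position; the bookkeeping cost Rob mentions is the id-to-cue table that must be kept for as long as a cue might still be updated.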

Nigel: We discussed the model in last week's TTWG call. There may be something relevant there: creating an external construct, e.g., a chapter that begins when you don't yet know when it ends.
… You then add that construct into an entry with a start and end time. Could this be adapted here? It's just the start of an idea. I recommend looking at the minutes from that meeting and considering the data model.
… We want to be clear about the semantic model for the cue times, and about other entities you may want to model that have a different lifecycle. That could be a way of splitting those out.

<nigel> Minutes from last week's TTWG meeting

Chris: Let's organise a VTT-specific follow-up meeting, as we have both that and DASH emsg events to look at.

DASH emsg and MSE

Chris: [talks through MSE and video media timeline]

Iraj: Why not deliver all components via MSE? That would make it easier to manage timeline and buffer consistency between them.

Nigel: That would make sense and be simpler overall. If you're watching a live stream, the memory for storing the text tracks could grow forever. If you did it with MSE, it would be clearer what the timeline is, and you could remove things that are no longer needed.
… Seeking back through text track cues would work in the same way. It seems strange that you can't do this now.

Chris: For those browsers that support inband captions in MSE, I don't know what happens when you remove an MSE buffer range.

Nigel: Nothing. There's a cue API you can use to remove captions from the text track.

Iraj: The issues are synchronization and data management. The text track exists forever, and MSE has a limited buffer size. What about the timeline alignment between the text track and the video?

Nigel: Synchronization is less of an issue than how you provide the data.

Chris: [talks through algorithm to define cue handling for emsg DataCues]
… Have I described the equivalence correctly for step v?

Iraj: Is there a separate mechanism for populating the text track cues?

Chris: Two ways:
… It could be MSE extracting media events from the media.
… Or it could be the website populating the same track with MPD events.

Iraj: Having those two possible mechanisms might be a problem.
… The reason is that the equivalency rules talk about event message instances received through the same mechanism.
… So if there are inband events, then the equivalency rules apply to those event instances.
… They don't go across different delivery mechanisms.
… When you use text track cues to maintain the list of already despatched events,
… since that text track may get populated by another mechanism,
… there are inconsistencies possible.

Chris: When I said "yes" I actually meant you could choose to do it, but you're not required to.
… In other words, an application could populate separate text tracks for inband vs MPD events.
… Then if we specify the equivalency rules as operating within a track, then that could work.
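
The per-track arrangement Chris describes could be sketched as follows, assuming the DASH rule that event messages with the same `scheme_id_uri`, `value` and `id` are equivalent and should be dispatched only once (the same event is typically repeated across segments so that a player joining mid-stream still receives it). `EventTrack`, `EventMessage` and `equivalenceKey` are hypothetical names. Each track keeps its own table of despatched events, so the equivalency rules never operate across delivery mechanisms.

```typescript
// Hypothetical sketch: equivalency rules applied within a single track only.
// Field names follow the emsg box (scheme_id_uri, value, id).

interface EventMessage {
  schemeIdUri: string;
  value: string;
  id: number;
  presentationTime: number; // seconds on the media timeline
}

// JSON-encode the identifying fields so a separator character inside a
// string field cannot make two different events collide.
function equivalenceKey(e: EventMessage): string {
  return JSON.stringify([e.schemeIdUri, e.value, e.id]);
}

class EventTrack {
  private dispatched: { [key: string]: boolean } = {};
  readonly cues: EventMessage[] = [];

  constructor(readonly label: string) {}

  // Adds the event and returns true if no equivalent event has been
  // despatched on this track; otherwise ignores it and returns false.
  dispatch(e: EventMessage): boolean {
    const key = equivalenceKey(e);
    if (this.dispatched[key]) return false;
    this.dispatched[key] = true;
    this.cues.push(e);
    return true;
  }
}
```

For example, with one `EventTrack` for inband events and another for MPD events, a repeated inband copy of an event is dropped, while the same event arriving via the MPD is still added to its own track.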

Iraj: Yes, that would mean there is no confusion in the equivalency rules.
… There's one more possible issue, regarding the lifetime of text track cues and the fact that there is no purging.
… When I wrote that document, there was an internal buffer for maintaining already despatched events.
… The question was raised: what is the lifetime of that table, how far back should it go?
… I said it was left to implementations.
… But then I thought the simpler model is the same as the length of the MSE buffer.
… In this case it seems to me that you are saying the lifetime of an event is forever.

Chris: Yes ...

Iraj: I need to check if that will cause problems.
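
Iraj's simpler model, where the table of already despatched events has the same lifetime as the MSE buffer, might look like the sketch below. The buffered ranges are passed as plain `[start, end)` pairs standing in for a `TimeRanges` object, and `purgeToBuffered`, `isBuffered`, `DispatchedRecord` and `BufferedRange` are hypothetical names; this illustrates the idea rather than any specified behaviour.

```typescript
// Hypothetical sketch: retain despatched-event records only while their
// presentation time lies inside the currently buffered media.

interface DispatchedRecord {
  key: string; // e.g. an equivalence key for the event
  presentationTime: number; // seconds
}

// [start, end) in seconds; a stand-in for one entry of a TimeRanges object.
type BufferedRange = [number, number];

function isBuffered(t: number, ranges: BufferedRange[]): boolean {
  for (let i = 0; i < ranges.length; i++) {
    if (ranges[i][0] <= t && t < ranges[i][1]) return true;
  }
  return false;
}

// Called after the buffered ranges change (e.g. after a removal);
// returns only the records that are still inside buffered media.
function purgeToBuffered(
  records: DispatchedRecord[],
  ranges: BufferedRange[]
): DispatchedRecord[] {
  return records.filter((r) => isBuffered(r.presentationTime, ranges));
}
```

One consequence of this model: once a record has been purged, re-appending the same media (for example after seeking back) makes its events look new, so they would be despatched again. Whichever lifetime is chosen, that is the behaviour that would need to be consistent across user agents.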

Chris: At the moment, removing from the source buffer is the application's responsibility.
… If the application says it wants to remove a time range that is in the past,
… the application could also inspect the text tracks to remove any events that lie within the same time window.
… We could leave it all to the application, and then the fact that the text track lasts forever
… maybe in practice does not matter because the application is going to remove cues
… so that the timeline matches the MSE buffers.
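
Chris's application-driven option could be sketched as a helper that the player calls alongside `SourceBuffer.remove(start, end)`. `TrackLike` is a hypothetical interface that mirrors only the parts of the DOM `TextTrack` used here (`cues` and `removeCue()`), so the sketch runs outside a browser; a real `TextTrack` should satisfy it structurally. `removeCuesInRange` is likewise a hypothetical name.

```typescript
// Hypothetical sketch: remove text track cues overlapping the same window
// that the application removes from the MSE SourceBuffer.

interface CueLike {
  startTime: number;
  endTime: number;
}

interface TrackLike<C extends CueLike> {
  readonly cues: ArrayLike<C> | null; // null when the track is disabled
  removeCue(cue: C): void;
}

// Removes every cue overlapping [start, end) and returns how many were removed.
// Cues are collected first and removed afterwards, because the cue list may be
// live and would shrink while we iterate over it.
function removeCuesInRange<C extends CueLike>(
  track: TrackLike<C>,
  start: number,
  end: number
): number {
  const list = track.cues;
  if (list === null) return 0;
  const toRemove: C[] = [];
  for (let i = 0; i < list.length; i++) {
    const cue = list[i];
    if (cue.startTime < end && cue.endTime > start) toRemove.push(cue);
  }
  toRemove.forEach((cue) => track.removeCue(cue));
  return toRemove.length;
}
```

In a player this would sit next to the buffer removal, e.g. `sourceBuffer.remove(start, end)` followed by `removeCuesInRange(textTrack, start, end)`. `SourceBuffer.remove()` completes asynchronously (signalled by `updateend`), while cue removal is synchronous, so the two are aligned by range rather than by timing.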

Iraj: Yes. What's important is the consistency between the behaviour of different applications.
… So that when a content author provides data then they know that every UA's behaviour will be the same,
… even in random access seeking.
… If a browser keeps all events for all time, then all browsers should do it.
… Or if the retained lifetime is the MSE buffer's, then all browsers should do that,
… so that it is predictable in any given scenario.

Chris: Yes. Do we think that having a model where removing a range of audio and video from
… MSE also removes the corresponding Text Track Cues in the same time range would be a good route?

Iraj: It depends on the implementation requirement.
… If the MSE implementation is required also to maintain the buffer of Text Track Cues as internal book-keeping,
… that becomes very simple because the equivalency between the buffers is simple. The MSE keeps one single range.
… If we build a model in which, to support the events,
… the MSE needs to go through the Text Track Cues and maintain the despatched events in that Text Track,
… then we need to define consistent buffer management rules between MSE and Text Tracks (outside MSE).

Chris: Yes.

Iraj: I was a bit uncomfortable because of this hybrid model, where there has to be some correlation with MSE somehow.
… Does that mean that every deployment of browsers has to maintain that model?
… Or is it possible to instantiate MSE and not Text Track Cues?
… I believe we need to have a single model, either always handle events in Text Track Cues, or purely within MSE and not use Text Tracks.

Chris: The way I'm thinking about this is to ask what should implement the requirements: fully in the browser, or partly
… in the browser and partly in the player?
… The browser could be specified to apply the processing rules to the indefinite buffer and then that would still be consistent,
… but not be complete in terms of what you're looking for, because the player would be required
… to remove the cues when you remove the media segments from the MSE.
… Another option is more along the lines of closer integration of the text tracks with MSE.

Nigel: Could there be a programmatic way to link MSE to a text track and then have some required behaviour?

Iraj: I think that makes more sense than leaving it to the application. Then you can assume uniform behaviour across applications.

<Zakim> nigel, you wanted to check if the cardinalities on the first diagram are really 1..* or could be 0..* and to

Chris: We could check with implementers on their plans for VideoTrack, AudioTrack and TextTrack in MSE.

Iraj: With live content and segmented delivery of captions, there's no side-car file for TextTrackCues unless you add significant delay? How does live streaming work with subtitles?

Chris: The player requests the captions and uses the TextTrack API to add the cues.

Iraj: If MSE v2 supports event parsing, would also supporting captions make sense?

Chris: I would think so, yes

Next meeting

Chris: Meet in 3 weeks, July 19th?

Iraj: We need to discuss this with the MSE people.

Chris: I'll email the MSE editors.
… I'll also arrange a follow-up on WebVTT, and will contact Gary and Rob to schedule a time.

Minutes manually created (not a transcript), formatted by scribe.perl version 136 (Thu May 27 13:50:24 2021 UTC).

