Media Timed Events / DataCue

Meeting minutes

TextTrackCue end time implementation

Rob: I have detailed instructions on how to contribute to Chromium. How much support is already implemented?
… When I submitted the unbounded cues changes, I ran the WPT tests, and I have figures for Firefox, Chrome and Safari
… There's a significant difference between them in the number of tests that pass and fail. The number of tests also differs for each browser

Chris: All the browsers should have TextTrackCue and VTTCue

Rob: Do the tests do feature detection to enable or disable certain tests?
… I've contributed tests for unbounded cues and the VTTCue constructor
… Three tests: two modified, one added
… WPT was easy to set up locally

Chris: I think building Chromium needs a lot of disk space and time

Rob: The steps seem to be well described

emsg equivalency rules

<RobSmith> Unbounded cue Web Platform Test change details: https://github.com/web-platform-tests/wpt/pull/28394#issuecomment-814920479

Chris: https://github.com/WICG/datacue/issues/28

Nigel: At the last meeting, we discussed some specific questions to understand exactly how things are processed
… Chris's summary didn't match my understanding though

Iraj: We use different terminology, so there may be a mismatch causing some confusion
… I may lack some understanding of TextTrackCue, but let's confirm so we can understand better
… [Reviews Nigel's question in #28]
… When you say a cue, is that one instance of an event?

Nigel: That's right. There's an algorithm that runs all the time while media is playing, which processes the list of text track cues
… Whenever the playhead moves past the begin time of a cue, an onenter handler is run, and whenever it moves past the cue end time, there's a similar onexit handler
… Any state associated with a cue that applies during media playback has opportunity to change

Iraj: When the playhead reaches the TextTrackCue start time, what happens?

Nigel: Within a short period of time after the begin time, an event handler is put on the JavaScript event queue, so there's a short delay before that gets executed
… The same thing happens at the end of the cue, in a separate onexit handler
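A simplified sketch of the enter/exit behaviour Nigel describes. The `SimpleCue` type and `updateActiveCues` function are hypothetical; the sketch compares which cues are active before and after a playhead move, using the HTML rule that a cue is active while its start time is at or before the current position and its end time is after it. It is not the full HTML "time marches on" algorithm, which also handles seeking, cues skipped over entirely between two updates, and event ordering.

```typescript
// Hypothetical, simplified cue model (not the real TextTrackCue interface).
interface SimpleCue {
  id: string;
  startTime: number; // seconds
  endTime: number; // seconds; Infinity for an unbounded cue
}

// A cue is active while startTime <= time < endTime.
function isActive(cue: SimpleCue, time: number): boolean {
  return cue.startTime <= time && time < cue.endTime;
}

// Compare the cues active before and after a playhead move and report
// which cues would get an "enter" event and which an "exit" event.
function updateActiveCues(
  cues: SimpleCue[],
  previousTime: number,
  currentTime: number,
): { entered: string[]; exited: string[] } {
  const entered: string[] = [];
  const exited: string[] = [];
  for (const cue of cues) {
    const wasActive = isActive(cue, previousTime);
    const nowActive = isActive(cue, currentTime);
    if (!wasActive && nowActive) entered.push(cue.id);
    if (wasActive && !nowActive) exited.push(cue.id);
  }
  return { entered, exited };
}
```

With a cue from 1 s to 3 s and an unbounded cue starting at 2 s, moving from 0.5 s to 2.5 s enters both; moving on to 3.5 s exits only the bounded one.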

Iraj: So a TextTrack contains multiple cues, and there could be one or more active cues at any point in that timeline?

Nigel: That's right

Iraj: Should the new event at Tn+1 end the previous cue? It creates a new cue with start time?
… So in your question, the cue at Tn+1 defines the end time of the previous cue?
… This is how the two models differ from each other
… The three values we have, when they're equivalent, what does it mean for the application?
… Any cues with those three values are equivalent. They're not necessarily continuations. That's up to the application
… If one is processed, you don't need to process the next one
… Also the order is not important. If you process one, you don't need to process the next one
… If you miss one (e.g., doing random access), it's as if you processed it
… Do we need signaling of updating a cue - e.g., changing payload or end time?
… There is a mismatch. There's no explicit signaling for this kind of continuation update.
… The only thing the processing rules say is whether the cues are equivalent
… If at time Tn a cue is received, it's put in the TextTrack. If at Tn+1 a cue with the same 3 values is received, even with a different payload, it replaces the previous one
… If the UA doesn't receive the emsg at Tn but instead at Tn+1, it creates the message as defined at Tn+1, because the UA doesn't have the history of the Tn event
… So your example doesn't work in the processing model, because the payload is different
… If we have two different versions of the emsg, e.g., using v1, which can signal events prior to their arrival time, the event instance will be the same as in Tn
… If it has end time, there's no end time in either
… The second one could have a new end time. It could put a new end time at Tn+1, but what's important is that the application should consider that the event message at Tn+1 may not be processed, because the one at Tn has already been processed
… If it needs updating, it should use a new id. In the payload it says it's an update of the previous message - so the application processes it as an update, not the UA

Nigel: That helps, thank you. My confusion came from considering what happens if you rewind, should it recreate the state?
… The alternative algorithm: when you see a repeated cue with same 3 values, you just discard it

Iraj: Yes, that's covered in the document
… It's a simple check. If it's not there, add the id to the table
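A minimal sketch of the equivalency check, assuming the "three values" are the DASH event message fields `scheme_id_uri`, `value` and `id` (the minutes do not name them). The `EmsgEvent` type and `EventDispatcher` class are hypothetical and not part of any browser API. The payload is deliberately left out of the key, so an equivalent message with a different payload is dropped; as Iraj says, an application that needs an update should use a new id.

```typescript
// Hypothetical representation of a parsed DASH 'emsg' event.
interface EmsgEvent {
  schemeIdUri: string;
  value: string;
  id: number;
  presentationTime: number; // seconds
  payload: Uint8Array;
}

// Remembers which events have been dispatched, keyed on the
// (scheme_id_uri, value, id) triple. The payload is not part of the key.
class EventDispatcher {
  private readonly seen = new Set<string>();
  readonly dispatched: EmsgEvent[] = [];

  // JSON encoding keeps the key unambiguous even if a field contains
  // a character that could otherwise be mistaken for a separator.
  private static key(e: EmsgEvent): string {
    return JSON.stringify([e.schemeIdUri, e.value, e.id]);
  }

  // Returns true if the event was dispatched, or false if an equivalent
  // event was already seen and this one was dropped.
  receive(e: EmsgEvent): boolean {
    const k = EventDispatcher.key(e);
    if (this.seen.has(k)) return false;
    this.seen.add(k);
    this.dispatched.push(e);
    return true;
  }
}
```

Note that this table grows without limit; bounding it is the subject of the later discussion on the length of the equivalency buffer.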

Rob: Thanks for the clear description. The scenario Iraj described sounds like unbounded cues and unbounded cue updates
… Some confusion - are you actually updating an existing cue? At the app level, you may think you are, but at the UA level you're just creating a new cue. That cue can be linked to an existing cue instance
… This is exactly what unbounded cues does

Iraj: It's up to the application to do that interpretation, not the UA

Rob: That's my understanding too

Nigel: I thought the UA did process the cues in this model, hence some confusion

Rob: I'm breaking down the unbounded cue use case, using only bounded cues, and describing why it's equivalent
… It's the conceptual model that means you don't know (until later). I'll finish writing that, but it sounds like we're talking about the same thing here
… I'm not proposing updating bounded cues in any way. The only thing I propose adding to a cue is the ability to change a previously unspecified end time to a specified one
… Infinity just means we don't know the end time
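A rough illustration of the unbounded cue idea, using a hypothetical `MetadataCue` class rather than the actual VTTCue or DataCue interfaces: the cue is created with an end time of `Infinity`, meaning "not yet known", and the only change allowed afterwards is to replace that unknown end time with a finite one. This is a sketch of the constraint Rob describes, not of how any browser implements it.

```typescript
// Hypothetical cue class: endTime === Infinity means "end time not yet known".
class MetadataCue {
  constructor(
    readonly startTime: number,
    private _endTime: number,
    readonly payload: string,
  ) {
    // Reject NaN and end times before the start time.
    if (!(_endTime >= startTime)) throw new RangeError("endTime must be >= startTime");
  }

  get endTime(): number {
    return this._endTime;
  }

  get isUnbounded(): boolean {
    return this._endTime === Infinity;
  }

  // Only an unbounded cue may have its end time set, and only once,
  // to a finite value no earlier than the start time.
  setEndTime(endTime: number): void {
    if (!this.isUnbounded) throw new Error("end time is already specified");
    if (!Number.isFinite(endTime) || endTime < this.startTime) {
      throw new RangeError("invalid end time");
    }
    this._endTime = endTime;
  }
}
```

Once an end time has been specified, the cue behaves as an ordinary bounded cue and cannot be modified further.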

Nigel: Two separate conversations: unbounded cues and emsgs

Rob: There may well be a common solution for both

Chris: So we can define a processing rule to say that if a cue with the same 3 values already exists on the TextTrack, we can drop the new one

Iraj: We should develop the processing model for event messages, then see how well it maps to the TextTrackCue model

Rob: WebVMT has stateful transitions. Having developed that, I can see it maps to TextTracks. As a separate issue, if you want to update cues knowing the start time and payload, but not the end time
… If at a later time you have a payload that coincidentally links the previous one, you don't have to link them
… Happy to help define the emsg processing model, it's a common problem for metadata
… For example, a sensor sample (speed, air quality, etc.) that you subsequently update with a replacement value, the next in the sequence for that value
… We discussed at TPAC last year how to do interpolation between two consecutive samples
… This may be more relevant to emsg - or consider it as a step change: the previous value holds until the next message is received
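The two interpretations of a sensor sequence mentioned above can be compared in a short sketch, assuming timestamped numeric samples sorted by time in a hypothetical `Sample` type: `holdValue` treats each sample as a step change that holds until the next one arrives, while `interpolateValue` interpolates linearly between consecutive samples. Neither function is part of WebVMT or any cue API.

```typescript
// Hypothetical timestamped sensor reading.
interface Sample {
  time: number; // seconds
  value: number; // e.g. speed
}

// Index of the last sample at or before `time`, or -1 if there is none.
// Assumes samples are sorted by time.
function lastIndexAtOrBefore(samples: Sample[], time: number): number {
  let idx = -1;
  for (let i = 0; i < samples.length; i++) {
    if (samples[i].time <= time) idx = i;
    else break;
  }
  return idx;
}

// Step change: the previous value holds until the next sample.
function holdValue(samples: Sample[], time: number): number | undefined {
  const i = lastIndexAtOrBefore(samples, time);
  return i >= 0 ? samples[i].value : undefined;
}

// Linear interpolation between the samples either side of `time`;
// after the last sample, the last value is held.
function interpolateValue(samples: Sample[], time: number): number | undefined {
  const i = lastIndexAtOrBefore(samples, time);
  if (i < 0) return undefined;
  if (i === samples.length - 1) return samples[i].value;
  const a = samples[i];
  const b = samples[i + 1]; // b.time > time >= a.time, so no division by zero
  const t = (time - a.time) / (b.time - a.time);
  return a.value + t * (b.value - a.value);
}
```

For samples of 10 at t=0 s and 20 at t=10 s, the value at t=5 s is 10 with `holdValue` and 15 with `interpolateValue`; which one is appropriate depends on the application.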

Chris: The use of interpolation is application specific

Nigel: Coming back to the idea that if you see an emsg event with the same 3 values, you already have a cue for it, and you are allowed to drop it
… Do we need a stronger requirement, so that you *must* drop it? There's a performance issue if there are lots of messages, each with an event handler, and they all get called

<Zakim> nigel, you wanted to ask if we actually must drop duplicates

Iraj: Chris asked about the duration of the equivalency rule. Two levels of answers for this.
… MPEG-DASH treats it as an optimisation: the client may drop instances based on the 3 values, but it doesn't have to
… For an MSE spec, we can make it required behaviour, so the app developer knows all browsers will behave the same
… There's a simple way of making it a 'must' requirement: define dropping messages with the same 3 values, and make the duration of the equivalency buffer for event dispatch the length of the MSE media buffer
… The length of time ids are kept could be required to be the same. If the app developer knows the id occurred 5 minutes earlier, and the buffer length is 5 minutes, it knows the id won't be in the dispatch table
… It could make the UA processing consistent.
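A sketch of a bounded dispatch table along the lines Iraj proposes, assuming a fixed retention window (for example 300 seconds) as a stand-in for the MSE media buffer length, and assuming the key is the (scheme_id_uri, value, id) triple; real MSE buffers are not a fixed duration, so this is a simplification. The `BoundedEventTable` class is hypothetical and records the media time at which each key was first seen. Once a key ages out, a repeated message is dispatched again, which an application that knows the window size can predict.

```typescript
// Hypothetical equivalency table with a fixed retention window.
class BoundedEventTable {
  // Maps event key -> media time at which the key was first seen.
  private readonly entries = new Map<string, number>();

  constructor(private readonly retentionSeconds: number) {}

  // Forget keys first seen more than retentionSeconds before `now`.
  // Deleting entries inside Map.prototype.forEach is safe.
  private evict(now: number): void {
    this.entries.forEach((firstSeen, key) => {
      if (now - firstSeen > this.retentionSeconds) this.entries.delete(key);
    });
  }

  // Returns true if the event should be dispatched, or false if an
  // equivalent event is still remembered within the window.
  shouldDispatch(schemeIdUri: string, value: string, id: number, now: number): boolean {
    this.evict(now);
    const key = JSON.stringify([schemeIdUri, value, id]);
    if (this.entries.has(key)) return false;
    this.entries.set(key, now);
    return true;
  }
}
```

Tying memory use to the window keeps the table finite for a 24/7 live stream, at the cost of re-dispatching events that reappear after the window has passed.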

Nigel: Those two things seem to be separable to me

Iraj: The reason they're tied together, you can't have an infinite length table. Consider a 24/7 live stream, the browser joins the stream at some point in time.
… It can't know what happened previously. It could pause, time passes, then join again, similar to skipping, and receive the same events again
… All those behaviours depend on the length of the table it keeps. Is there a minimum we can require the UA to keep? It could be the minimum of the MSE media buffer
… If the web app seeks to a time prior to the media buffer, it needs to request the segment again and change the append window size
… It doesn't maintain any information on whether it's been done in the past, so treats this as a new segment

Chris: MSE buffers and TextTracks currently are separate

Iraj: Reason for describing this way is to add it to MSE v2
… One way is to define the input to the TextTrack from MSE, so the UA processes the cues and puts them into the TextTrack
… We assume the content is coming in segments in realtime
… With TextTrack, you could have the entire document. So you don't need the MSE model here
… Two cases: 1) MSE streaming, short window, segment based processing. 2) You have the entire presentation

Chris: File based playback with VTT files

Chris: App-level vs browser-level responsibility?

Iraj: Is there a model right now for MSE playback mapped to TextTrack cues?

Nigel: Delivering TextTrack cues via MSE has been suggested, but not done yet
… Two points here. If you have a long-running stream, you don't want unlimited memory use from cues being added
… With DASH live streaming, there's a rewind window, so you don't want cues outside that period
… Also user experience and acquisition time. If the cue events are chapter markers, it would be strange if you got different chapter markers depending on how long you'd be watching for
… You'd expect to get all the chapter markers, then be able to seek
… If you have to re-issue those frequently, sending them over and over again, that works nicely with the idea of dropping anything outside the buffer, as you don't need it

Iraj: An example of that kind of application: thumbnail navigation. In DASH we have a specific representation. You download one piece of media with an array of images, parsed by application to build a timeline with thumbnails
… It provides ability to navigate without downloading the entire media, just download thumbnails
… I believe it's similar to a side-car file: you download the whole file, but don't go through MSE to do that, because the content is too long and you only have a short window in time.

Iraj: Download the whole thing at once, parse timeline. It's used for seeking to points in the media timeline, navigation.

Next steps

Iraj: I'll be available June 21st or 28th

Chris: Can we align the TextTrack based processing model?

Iraj: Create a draft for MSE, and see how it maps to the DataCue

<RobSmith> I'd prefer 28th June if possible

Chris: I'll try to write something to consolidate what we've discussed
… Thank you all, this discussion has helped clarify things

[adjourned]

– DRAFT –

24 May 2021