14:53:41 RRSAgent has joined #me
14:53:41 logging to https://www.w3.org/2021/05/24-me-irc
14:53:45 Zakim has joined #me
14:53:56 Meeting: Media Timed Events / DataCue
14:54:10 Agenda: https://www.w3.org/events/meetings/2793e495-c4d0-4e3e-8e29-e302ff479154
14:54:19 Chair: Chris
14:55:04 scribenick: cpn
14:58:59 nigel has joined #me
15:03:43 Present+ Nigel_Megitt
15:04:01 Present+ Chris_Needham, Rob_Smith
15:07:48 RobSmith has joined #me
15:08:37 Topic: TextTrackCue end time implementation
15:09:09 Rob: I have detailed instructions on how to contribute to Chromium. How much support is already implemented?
15:09:33 ... When I submitted the unbounded cues changes, I ran the WPT tests, and have figures for Firefox, Chrome and Safari
15:10:10 ... There's a significant difference in the number of tests that pass and fail between them. There's also a different number of tests for each of the browsers
15:11:07 Chris: All the browsers should have TextTrackCue and VTTCue
15:11:46 Rob: Do the tests do feature detection to enable or disable certain tests?
15:12:52 ... I've contributed tests for unbounded cues and the VTTCue constructor
15:13:04 rrsagent, make log public
15:13:06 ... Three tests: two modified, one added
15:13:09 rrsagent, draft minutes
15:13:09 I have made the request to generate https://www.w3.org/2021/05/24-me-minutes.html kaz
15:13:58 ... WPT was easy to set up locally
15:14:28 present+ Kaz_Ashimura, Iraj_Sodagar
15:14:41 Chris: I think building needs a lot of disk space and time
15:14:51 Rob: The steps seem to be well described
15:16:13 Topic: emsg equivalency rules
15:16:16 Unbounded cue Web Platform Test change details: https://github.com/web-platform-tests/wpt/pull/28394#issuecomment-814920479
15:16:19 Chris: https://github.com/WICG/datacue/issues/28
15:17:47 Nigel: From the last meeting, we discussed some specific questions to understand exactly how things are processed
15:17:57 ... Chris's summary didn't match my understanding though
15:18:46 Iraj: We use different terminology, so there may be a mismatch causing some confusion
15:19:19 ... I may lack some understanding of TextTrackCue, but let's confirm so we can understand better
15:19:42 ... [Reviews Nigel's question in #28]
15:20:11 ... When you say a cue, is that one instance of an event?
15:20:38 Nigel: That's right. There's an algorithm that runs all the time while media is playing, which processes the list of text track cues
15:21:05 ... Whenever the playhead moves past the begin time of a cue, an onenter handler is run, and whenever it moves past the cue end time, there's a similar onexit handler
15:21:28 ... Any state associated with a cue that applies during media playback has the opportunity to change
15:21:46 Iraj: When the playhead reaches the TextTrackCue start time, what happens?
15:22:21 Nigel: Within a short period of time after the begin time, an event handler is put on the JavaScript event queue, so there's a short delay before that gets executed
15:22:45 ... The same thing happens at the end of the cue, in a separate onexit handler
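A minimal sketch of the enter/exit model Nigel describes, assuming a page with a video element; the track kind, times and payload are illustrative, not part of the minuted discussion:

```javascript
// Sketch of the cue processing model described above: the UA queues the
// handlers on the event loop shortly after the playhead crosses the cue's
// start and end times, hence the short delay Nigel notes.
const video = document.querySelector('video');
const track = video.addTextTrack('metadata'); // mode defaults to 'hidden'

// A bounded cue, active from t=10s to t=20s on the media timeline.
const cue = new VTTCue(10, 20, '{"chapter": "intro"}');

cue.onenter = () => console.log('cue became active:', cue.text);
cue.onexit = () => console.log('cue became inactive');

track.addCue(cue);
```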
15:23:29 Iraj: Then a TextTrack contains multiple cues. There could be one or more active cues at any point in that timeline?
15:23:36 Nigel: That's right
15:24:51 Iraj: Should the new event at Tn+1 end the previous cue? Does it create a new cue with a start time?
15:25:26 ... So in your question, the cue at Tn+1 defines the end time of the previous cue?
15:25:42 ... This is how the two models differ from each other
15:26:02 ... The three values we have, when they're equivalent, what does it mean for the application?
15:26:29 ... Any cues with those three values are equivalent. They're not necessarily continuations. That's up to the application
15:26:42 ... If one is processed, you don't need to process the next one
15:26:55 ... Also the order is not important. If you process one, you don't need to process the next one
15:27:19 ... If you miss one (e.g., doing random access), it's as if you processed it
15:27:38 ... Do we need signaling of updating a cue - e.g., changing payload or end time?
15:28:20 ... There is a mismatch. There's no explicit signaling for this kind of continuation update.
15:28:40 ... The only thing the processing rules say is whether the cues are equivalent
15:29:16 ... If at time Tn a cue is received, it's put in the TextTrack. If at Tn+1 a cue with the same 3 values is received, even with a different payload, it replaces the previous one
15:29:59 ... If the UA doesn't receive the emsg at Tn, and instead receives it at Tn+1, it creates the message as defined at Tn+1, because the UA doesn't have the history of the Tn event
15:30:16 ... So your example doesn't work in the processing model, because the payload is different
15:30:59 ... We could have two different versions of the emsg, e.g., using v1, which can signal events prior to the arrival time. The event instance will be the same as at Tn
15:31:07 ... If it has an end time - there's no end time in either
15:32:00 ... The second one could have a new end time. It could put a new end time at Tn+1, but what's important is that the application should consider that the event message at Tn+1 may not be processed, because Tn is processed
15:32:37 ... If it needs updating, it should use a new id. In the payload it says it's an update of the previous message - so the application processes it as an update, not the UA
15:32:52 q+
15:33:04 Nigel: That helps, thank you. My confusion came from considering what happens if you rewind: should it recreate the state?
15:33:34 ... The alternative algorithm: when you see a repeated cue with the same 3 values, you just discard it
15:33:45 Iraj: Yes, that's covered in the document
15:34:06 ... It's a simple check. If it's not there, add the id to the table
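A minimal sketch of the discard check Iraj describes. The three values aren't named in this part of the discussion; the sketch assumes the DASH emsg fields scheme_id_uri, value and id, and dispatchToApplication is a hypothetical placeholder for surfacing the event (e.g. as a metadata cue):

```javascript
// Equivalency check: dispatch a received emsg only if no equivalent message
// (same three identifying values) has been seen already.
const dispatchTable = new Set();

function onEmsgReceived(emsg) {
  const key = `${emsg.schemeIdUri}|${emsg.value}|${emsg.id}`;
  if (dispatchTable.has(key)) {
    return; // equivalent event already processed; order and payload don't matter
  }
  dispatchTable.add(key); // "if it's not there, add the id to the table"
  dispatchToApplication(emsg); // hypothetical: e.g. create a metadata cue
}
```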
15:34:59 Rob: Thanks for the clear description. The scenario Iraj described sounds like unbounded cues and unbounded cue updates
15:35:39 ... Some confusion - are you actually updating an existing cue? At the app level, you may think you are, but at the UA level you're just creating a new cue. That cue can be linked to an existing cue instance
15:36:23 ... This is exactly what unbounded cues do
15:36:36 Iraj: It's up to the application to do that interpretation, not the UA
15:36:43 Rob: That's my understanding too
15:37:19 Nigel: I thought the UA did process the cues in this model - some confusion there
15:37:50 Rob: I'm breaking down the unbounded cue use case, using only bounded cues, and describing why it's equivalent
15:38:27 ... It's the conceptual model that means you don't know (until later). I'll finish writing that, but it sounds like we're talking about the same thing here
15:39:02 ... I'm not proposing updating bounded cues in any way. The only thing I propose adding to a cue is to change a previously-unspecified end time to be specified
15:39:13 ... Infinity just means we don't know the end time
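A minimal sketch of that idea, assuming a UA that implements the unbounded cue proposal (browsers without it may reject a non-finite end time); the times and payload are illustrative:

```javascript
// An unbounded cue: start time known, end time not yet known, so Infinity
// stands in for "unspecified end".
const track = document.querySelector('video').addTextTrack('metadata');
const liveCue = new VTTCue(30, Infinity, '{"event": "started"}');
track.addCue(liveCue);

// Later, once the end is known, the previously-unspecified end time is made
// specific; no other part of the cue is updated.
function endLiveEvent(endTimeSeconds) {
  liveCue.endTime = endTimeSeconds;
}
```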
15:39:27 Nigel: Two separate conversations: unbounded cues and emsgs
15:39:40 Rob: There may well be a common solution for both
15:40:55 Chris: So we can define a processing rule to say that if a cue with the same 3 values exists on the TextTrack, we can drop it
15:41:11 q+ to ask if we actually must drop duplicates
15:41:30 Iraj: We should develop the processing model for event messages, then see how well it maps to the TextTrackCue model
15:41:40 ack RobSmith
15:42:28 Rob: WebVMT has stateful transitions. Having developed that, I can see it maps to TextTracks. As a separate issue: if you want to update cues, you know the start time and payload, but don't know the end time
15:42:50 ... If at a later time you have a payload that coincidentally links to the previous one, you don't have to link them
15:43:06 ... Happy to help define the emsg processing model, it's a common problem for metadata
15:43:40 ... For example, a sensor sample (speed, air quality, etc), which you subsequently update with a replacement value, the next in the sequence for that value
15:44:03 ... We discussed at TPAC last year how to do interpolation between two consecutive samples
15:44:29 ... This may be more relevant to emsg - or consider it as a step change: the previous value holds until the next message is received
15:45:26 Chris: The use of interpolation is application specific
15:46:05 Nigel: Coming back to the idea that if you see an emsg event with the same 3 values, and you already have a cue for it, you are allowed to drop it
15:46:51 ... Do we need a stronger requirement, so you *must* drop it? Performance issue if there are lots of messages, each with an event handler, and they all get called
15:46:55 ack n
15:46:55 nigel, you wanted to ask if we actually must drop duplicates
15:47:15 Iraj: Chris asked about the duration of the equivalency rule. There are two levels of answer to this.
15:47:40 ... MPEG-DASH considers it an optimisation: the client may drop instances based on the 3 values, it doesn't have to
15:48:00 ... For an MSE spec, we can make it required behaviour, so the app developer knows all browsers will behave the same
15:49:07 ... There's a simple way of making it a 'must' requirement. Define dropping messages with the same 3 values. The duration of the equivalency buffer for event dispatch is the length of the MSE media buffer
15:49:40 ... The lifetime of ids could be required to be the same. If the app developer knows the id happened 5 minutes before, and the buffer length is 5 minutes, it knows the id won't be in the dispatch table
15:49:57 ... It could make the UA processing consistent.
15:50:17 Nigel: Those two things seem to be separable to me
15:50:54 Iraj: The reason they're tied together is that you can't have an infinite length table. Consider a 24/7 live stream: the browser joins the stream at some point in time.
15:51:24 ... It can't know what happened previously. It could pause, time passes, then join again, similar to skipping, and receive the same events again
15:52:05 ... All those behaviours depend on the length of the table it keeps. Is there a minimum we can require the UA to keep? It could be the minimum of the MSE media buffer
15:52:34 ... If the web app seeks to a time prior to the media buffer, it needs to request the segment again and change the append window size
15:52:51 ... It doesn't maintain any information on whether it's been done in the past, so it treats this as a new segment
15:54:48 Chris: MSE buffers and TextTracks are currently separate
15:55:02 Iraj: The reason for describing it this way is to add it to MSE v2
15:55:29 ... One way is to define the input to the TextTrack from MSE, so the UA processes the cues and puts them into the TextTrack
15:55:38 ... We assume the content is coming in segments in realtime
15:56:09 ... With TextTrack, you could have the entire document. So you don't need the MSE model here
15:56:34 ... Two cases: 1) MSE streaming, short window, segment based processing. 2) You have the entire presentation
15:56:49 q+
15:56:50 q+
15:57:52 Chris: File based playback with VTT files
16:00:06 Chris: App level vs browser-level responsibility?
16:00:47 Iraj: Is there a model right now for MSE playback mapped to TextTrack cues?
16:01:04 Nigel: Delivering TextTrack cues via MSE has been suggested, but not done yet
16:01:30 ... Two points here. If you have a long-running stream, you don't want unlimited memory use from cues being added
16:01:45 ... With DASH live streaming, there's a rewind window, so you don't want cues outside that period
16:02:16 ... Also user experience and acquisition time. If the cue events are chapter markers, it would be strange if you got different chapter markers depending on how long you'd been watching
16:02:28 ... You'd expect to get all the chapter markers, then be able to seek
16:03:03 ... If you have to frequently re-issue those, sending them over and over again, it works nicely with the idea of dropping stuff outside the buffer, as you don't need it
16:04:04 Iraj: An example of that kind of application: thumbnail navigation. In DASH we have a specific representation. You download one piece of media with an array of images, parsed by the application to build a timeline with thumbnails
16:04:25 ... It provides the ability to navigate without downloading the entire media, just the thumbnails
16:04:59 ... I believe it's similar to a side-car file: you download the whole file, but don't go through MSE to do that, because the content is too long and you just have a short window in time.
16:05:14 q-
16:06:00 Iraj: Download the whole thing at once, parse the timeline. It's used for seeking to points in the media timeline, navigation.
16:06:15 q-
16:06:44 Topic: Next steps
16:07:00 Iraj: I'll be available June 21st or 28th
16:08:08 Chris: Can we align the TextTrack based processing model?
16:09:05 Iraj: Create a draft for MSE, and see how it maps to the DataCue
16:13:57 I'd prefer 28th June if possible
16:16:55 Chris: I'll try to write something to consolidate what we've discussed
16:17:32 ... Thank you all, this discussion has helped to clarify things
16:17:40 [adjourned]
16:17:46 rrsagent, draft minutes
16:17:46 I have made the request to generate https://www.w3.org/2021/05/24-me-minutes.html cpn
19:09:03 nigel_ has joined #me
20:13:54 Zakim has left #me
21:08:03 nigel has joined #me
21:24:07 kaz has joined #me