14:49:18 RRSAgent has joined #me
14:49:18 logging to https://www.w3.org/2021/09/20-me-irc
14:49:22 Zakim has joined #me
14:56:35 Meeting: Media Timed Events
14:56:45 Present+ Chris_Needham
14:56:47 Chair: Chris
14:56:49 Agenda: https://www.w3.org/events/meetings/2b88a9a9-b1bc-463e-973f-018e98cb1558/20210920T160000
14:56:57 scribenick: cpn
15:01:49 Present+ Nigel_Megitt
15:01:52 alicia has joined #me
15:01:58 present+ Alicia_Boya, Louay_Bassbouss
15:03:03 present+ Gary_Katsevman
15:04:38 https://github.com/w3c/webvtt/issues/496#issuecomment-921999893
15:04:40 Topic: Unbounded cues in WebVTT
15:05:03 Gary: I posted some comments on the issue. For unbounded cues, one of the things we have concerns about is backwards compatibility
15:05:13 ... If you have a new unbounded cue, what do old parsers do?
15:05:42 ... Thinking about it, I'm leaning towards what Rob was saying: there isn't a good way to represent unbounded cues in the old way
15:06:03 ... You may not know ahead of time what the unbounded cues represent, e.g., a cue may never get an end time
15:06:14 ... Or it will get an end time, but you don't know when
15:06:54 ... They could be represented differently. For example, if you know a cue will never end, you could put 99 hours as the end time. Or, if it does have an end time, copy the cue in small duration increments
15:07:22 Nigel: Is that an argument for not needing anything more than what we have now?
15:07:50 Gary: For this use case (unbounded cues), it's fine if there's no way for them to show up in old parsers
15:08:06 Nigel: Looking at the use cases document?
15:08:34 Gary: I'm talking specifically about being able to represent unbounded cues rather than a particular use case
15:09:06 ... David asked how likely this is to go into WebVTT, as MPEG needs to decide whether to keep their changes in or not
15:09:32 ... Is it possible to scope this narrowly enough that the feature can be added now and expanded later?
15:10:16 https://github.com/w3c/media-and-entertainment/blob/master/media-timed-events/unbounded-cues.md
15:10:38 Chris: We haven't completed the use case list. Do we need them all to make progress on representation?
15:11:34 Gary: I don't think we need an exhaustive list of use cases. We need to decide whether we can ship a constrained feature, then expand it later to cover all the uses we'd want
15:12:06 ... If we agree it can be constrained enough without blocking other use cases, we could reasonably ship it
15:12:20 ... Otherwise go back to MPEG and say we'll be unlikely to ship within their timeframe
15:13:02 Chris: What dependency does MPEG have on our work?
15:13:35 Gary: They're using the API, adding support for unbounded cues, but they realise there's no defined representation yet
15:13:52 ... So they're relying on us to implement that before they ship their next spec
15:14:11 ... If we don't have a representation they'll pull their changes
15:14:16 present+ Iraj_Sodagar
15:15:56 Chris: Summarise Rob's proposal?
15:16:15 Gary: If a cue is unbounded, just allow updating its end time, and don't allow anything else to be updated
15:16:36 ... That could be constrained enough that it doesn't prevent doing the other things should we want to
15:16:53 Chris: Use cases include updating a cue end time from unknown to known
15:17:08 ... Also changing a cue end time from some known time to another known time
15:17:16 ... And updating other cue attributes
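[Scribe note: a minimal sketch of how Rob's proposal could look at the API level, assuming a future change in which VTTCue.endTime accepts Infinity to mean "unbounded"; today's VTTCue only accepts finite end times, so this is illustrative rather than current API.]

```typescript
// Sketch of the proposal discussed above: an unbounded cue is created with an
// unknown end time, and only its end time may later be updated.
// Assumes VTTCue.endTime is changed to accept Infinity (not valid today).
const video = document.querySelector('video')!;
const track = video.addTextTrack('metadata', 'live events');

// Cue starts now; its end time is unknown, so mark it unbounded.
const cue = new VTTCue(video.currentTime, Infinity, '{"event": "started"}');
track.addCue(cue);

// Later, when the real end time becomes known, assign it. Per the proposal,
// no other attribute of an unbounded cue may be changed.
function endCue(at: number): void {
  cue.endTime = at;
}
```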
15:19:04 Chris: Is there consistency across existing implementations?
15:19:28 Gary: They'd ignore a cue with a missing end time
15:20:00 q+ to ask if there's been any development on the data model, in terms of "does a VTTCue represent state or presentation?"
15:20:22 Chris: Is that an acceptable fallback behaviour? Because if not, you need a marker value such as 99 hours to represent unbounded?
15:20:41 ack n
15:20:41 nigel, you wanted to ask if there's been any development on the data model, in terms of "does a VTTCue represent state or presentation?"
15:21:04 Nigel: Looking at the document, returning to the data model topic: what is a cue?
15:21:42 ... In segmented delivery, where you keep sending small chunks and there may be repetition, you're not representing state; it's more "here's what to do at a period of time"
15:22:12 ... The "updating a sports score" use case changes the use of a cue significantly. The cue payload includes some state, and the cue timing relates to that state
15:22:56 ... So it seems fundamental that we should be clear about what a cue is. Any more consideration from that point of view? Is that a helpful way to think about it?
15:23:27 Gary: It's worth considering, but I haven't thought from that perspective much
15:23:59 ... One thing is that WebVTT currently has a karaoke mode, not that it's implemented anywhere. In authoring you can say "show these words at this time for this cue"
15:24:05 ... This seems to fall along those lines
15:24:37 Nigel: Is that the syntactical way of updating cues?
15:24:51 Gary: Yes, but no browser implements it, so it may be removed
15:25:19 Nigel: It's a syntactic niceness. You could get that effect by repeating cues instead
15:25:45 Gary: Yes, but you get large VTT files. It would be nice to keep karaoke mode for that reason
15:26:16 Nigel: It seems orthogonal. You're not updating an end time. The parser could update the payload
15:27:01 Gary: Live chapterisation is a question. We know when we change scenes we change the chapter. The sports score use case helped me better understand that
15:27:09 ... The old marker needs to be updated
15:27:55 Nigel: In the chapterisation model, do you need to repeat the chapter information? In segmented media delivery you need to repeat it so you don't need to search back
15:28:20 Gary: Yes. I'm not sure that needs to be in the spec, but we'd want to have an answer for it
15:28:44 ... It is possible for segmented media; you'd want to copy the unbounded cues over each time, potentially every segment
15:30:02 Chris: Segmented delivery has this issue regardless of unbounded cues. Is that defined anywhere?
15:30:49 Gary: It may not be defined anywhere, but cues generally are repeated. We talked at FOMS about writing a WebVTT Note saying that cues spanning multiple segments should be copied through as many segments as necessary until the cue ends
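[Scribe note: a hypothetical sketch of the "copy cues through segments" approach Gary described; the Cue shape, function names, and clipping behaviour here are illustrative, not taken from WebVTT or any Note.]

```typescript
// Serialize one WebVTT segment covering [segStart, segEnd). A cue whose real
// end time is still unknown is written with the segment end as a provisional
// end time, and is written again into each following segment until it ends.
interface Cue {
  id: string;
  start: number;      // seconds on the media timeline
  end: number | null; // null while the real end time is unknown
  payload: string;
}

function writeSegment(cues: Cue[], segStart: number, segEnd: number): string {
  const lines = ['WEBVTT', ''];
  for (const cue of cues) {
    if (cue.start >= segEnd) continue;                      // not started yet
    if (cue.end !== null && cue.end <= segStart) continue;  // already over
    const end = Math.min(cue.end ?? segEnd, segEnd);        // provisional end
    lines.push(cue.id, `${ts(cue.start)} --> ${ts(end)}`, cue.payload, '');
  }
  return lines.join('\n');
}

// Format seconds as a WebVTT timestamp, e.g. 65 -> "00:01:05.000".
function ts(t: number): string {
  const h = Math.floor(t / 3600);
  const m = Math.floor((t % 3600) / 60);
  const s = (t % 60).toFixed(3).padStart(6, '0');
  return `${String(h).padStart(2, '0')}:${String(m).padStart(2, '0')}:${s}`;
}
```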
15:31:15 Nigel: I don't think there's any presentation defined for chapters. So if you're providing content you'd have to understand the user experience you want
15:31:55 ... If you want a reasonable acquisition time for chapters you'd have to repeat them often enough. The client would have to understand it's the same chapter
15:32:24 ... To my previous question: what is the VTTCue modelling? It seems to be enough information for the player to do what it needs to do, but not really a model of data changing over time
15:32:47 ... So if you're referring to some data entity consistently from cue to cue, there'd need to be an external way to identify it
15:33:28 ... If the client needs a running view of the scoreline, it could keep one. A VTT cue in your MP4 payload could be a "score" metadata cue with an id so you know it's a score
15:33:38 ... and the cue with that id can be updated
15:34:39 ... It's not the cue that represents the data that's changing; the cue is painting a state that's modelled in the application, which updates the score
15:35:37 ... So you don't need to set the cue as unbounded. For segmented delivery you know what's in the segment when you create it, so you can use the segment timing for the cue duration
15:35:52 ... Then the application keeps its own model of the data and does whatever it needs to do
15:36:27 Gary: To support jumping into a live stream we have to chop the cue into multiple parts anyway, which is one of the fallback approaches for a missing end time
15:39:18 Chris: So in the model Nigel described, we don't need unbounded cues, as the cue timing is defined by the segment timing, and "unboundedness" is up to the application
15:39:19 Nigel: Yes
15:40:01 Gary: Would knowing that a cue is going to be repeated through multiple segments be useful to clients?
15:40:44 Nigel: As a data point, with VTT for captions, if there's a requirement to tear down captions and rebuild them, a key UX point is that people don't want to see flicker
15:41:38 ... Two ways to achieve that: one is to maintain an identifier so there's a contract between the data provider and consumer, so a cue with ID 43 in one segment is promised to be the same as the cue with that ID in another segment
15:42:00 ... The other technique is comparison of the payloads, merging cues together if they're the same
15:42:41 ... With the maintained-IDs approach, you need to define how that works across segments
15:42:52 Gary: Not defined in WebVTT
15:43:10 Nigel: We have the same problem with TTML; it's not defined
15:43:36 Gary: It could be a backwards compatibility issue. Although, since people don't use IDs in practice, it may not be an issue
15:43:58 Nigel: So players would have to do comparisons and then apply updates to the future state
15:44:28 Gary: A question is: in the live caption use case, if the content or cue settings differ but the id is the same, do you update the cue?
15:45:02 Nigel: Is the id scoped to the document? Use the begin and end times to work out what's visible at a given time
15:46:10 ... In the TTML model, where ids are scoped to the containing document, no claims are made about ids across segments
15:46:45 ... In that case, do a model comparison between now and next (discounting ids). I favour having similar data models if we can
15:46:51 Gary: Makes sense to me
15:47:31 Chris: Would two segments be considered two different documents?
15:47:34 Gary: Right now, yes
15:48:40 ... It sounds like what we're saying is that, for segmented WebVTT, if you have unbounded timed events that you want to represent, you have to be able to let the user avoid loading all the captions from the beginning of time, and copy the cues between segments
15:48:51 q+ to mention moments in time
15:49:27 ... It's not that different from having a bunch of short cues. A signal that tells us this cue is going to be repeated would be nice, but it's not necessarily required
15:50:30 ... The signal would be that the cue has an unbounded end time; that indicates it will be repeated, until it ends and is assigned an end time
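[Scribe note: a sketch of the client-side merging Nigel described, to avoid teardown flicker when a later segment repeats a cue that is already active; illustrative only, as neither WebVTT nor TTML currently defines this behaviour.]

```typescript
// When a new segment repeats an already-active cue (matched here by id and
// payload, combining the two techniques discussed), extend the existing cue
// instead of removing and re-adding it, which causes visible flicker.
function mergeCue(track: TextTrack, incoming: VTTCue): void {
  const cues = Array.from(track.cues ?? []) as VTTCue[];
  const existing = cues.find(
    (c) => c.id === incoming.id && c.text === incoming.text
  );
  if (existing) {
    // Same cue carried over into a later segment: just extend its end time.
    existing.endTime = Math.max(existing.endTime, incoming.endTime);
  } else {
    track.addCue(incoming);
  }
}
```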
15:51:25 Nigel: This reminds me of something related. In VTT and TTML, things have beginning and end times, so everything has a duration
15:52:11 ... Do we need a concept of a "moment" in time?
15:53:47 Gary: An example is ID3 metadata. Right now the way those are done is you get a start time, and the end time is the duration of the video or the start time of the next cue in the ID3 cue points track
15:54:07 ... That's a workaround for representing a moment in time that uses a duration
15:54:19 Nigel: [looking at WebVTT karaoke mode]
15:54:47 ... The idea is to indicate a per-word start time with no duration. I think that's missing from the general timed text data model?
15:55:05 ... People have suggested use cases where that could be useful
15:56:02 ... You could have a metadata tag repeated in each segment, with the duration matching the segment. But what there isn't is an initial definition of the state, and then, at a given moment, here's a new state
15:57:06 ... I'm reminded of a demo: the idea was to be able to change the number of words shown at once based on the dynamic display. Using timestamps instead of durations would allow you to customise that from a user perspective
15:58:00 ... That could be more interesting than the idea of having unbounded cues, but it's a very different way of doing things. In the same way, TTML doesn't support timed metadata without it being attached to something in the document, such as a div
16:01:21 Chris: Could you do a separate out-of-band query to get the complete state at a moment in time?
16:02:00 Chris: Next steps?
16:02:12 Gary: A follow-up meeting when Rob can join
16:02:43 ... We need to decide what to tell David on timescale
16:03:11 Nigel: I don't think we've identified a strong need to make a change from today's discussion; it just moves the logic elsewhere
16:03:52 Gary: Is saying we don't think it'll go in right now reasonable?
16:04:03 Chris: Do we miss an opportunity?
16:04:39 Gary: It may be another year or so. We shouldn't close the door forever, but it will severely delay it
16:05:13 Nigel: But that's on the basis that nothing is missing now. If we don't have use cases that can't be fulfilled now, less change is good
16:07:00 Chris: We've talked about this for timed metadata; what about also for captions?
16:07:04 Gary: The immediate use case doesn't require a spec change, it seems
16:07:57 Nigel: If there is another model for live captions that people need, where it's firing a caption directly and minimising repetition, that would need more discussion
16:08:09 ... Possibly in an RTC scenario
16:08:24 Gary: We did get a question around it for live captioning recently: https://github.com/w3c/webvtt/issues/320#issuecomment-917386887
16:08:50 Topic: Next meeting
16:09:26 Gary: Next week if possible, assuming Rob can make it
16:09:28 Chris: OK
16:10:58 [adjourned]
16:11:02 rrsagent, draft minutes
16:11:02 I have made the request to generate https://www.w3.org/2021/09/20-me-minutes.html cpn
16:11:05 rrsagent, make log public
18:08:08 Zakim has left #me
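[Scribe note, appended after the meeting: a sketch of the ID3 workaround Gary mentioned for representing a "moment" with duration-based cues; each metadata cue runs from its timestamp to the start of the next cue, or to the media duration for the last one. The Moment shape and function name are illustrative, not from any spec.]

```typescript
// Turn instantaneous metadata "moments" into duration-based cues: each cue
// ends where the next one starts, standing in for a true zero-duration event.
interface Moment {
  time: number;    // the instant the metadata applies, in seconds
  payload: string; // e.g. serialized ID3 frame data
}

function momentsToCues(moments: Moment[], mediaDuration: number): VTTCue[] {
  const sorted = [...moments].sort((a, b) => a.time - b.time);
  return sorted.map((m, i) => {
    const end = i + 1 < sorted.length ? sorted[i + 1].time : mediaDuration;
    return new VTTCue(m.time, end, m.payload);
  });
}
```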