14:49:18 RRSAgent has joined #me
14:49:18 logging to https://www.w3.org/2021/09/20-me-irc
14:49:22 Zakim has joined #me
14:56:35 Meeting: Media Timed Events
14:56:45 Present+ Chris_Needham
14:56:47 Chair: Chris
14:56:49 Agenda: https://www.w3.org/events/meetings/2b88a9a9-b1bc-463e-973f-018e98cb1558/20210920T160000
14:56:57 scribenick: cpn
15:01:49 Present+ Nigel_Megitt
15:01:52 alicia has joined #me
15:01:58 present+ Alicia_Boya, Louay_Bassbouss
15:03:03 present+ Gary_Katsevman
15:04:38 https://github.com/w3c/webvtt/issues/496#issuecomment-921999893
15:04:40 Topic: Unbounded cues in WebVTT
15:05:03 Gary: I posted some comments on the issue. For unbounded cues, one of the things we have concerns about is backwards compatibility
15:05:13 ... If you have a new unbounded cue, what do old parsers do?
15:05:42 ... Thinking about it, I'm leaning towards what Rob was saying: there isn't a good way to represent unbounded cues in the old way
15:06:03 ... You may not know ahead of time what the unbounded cues represent, e.g., a cue may never get an end time
15:06:14 ... Or it will get an end time, but you don't know when
15:06:54 ... They could be represented differently. For example, if you know a cue will never end, you could put 99 hours as the end time. Or, if it does have an end time, copy the cue in small duration increments
15:07:22 Nigel: Is that an argument for not needing anything more than what we have now?
15:07:50 Gary: For this use case (unbounded cues), it's fine if there's no way for them to show up in old parsers
15:08:06 Nigel: Looking at the use cases document?
15:08:34 Gary: I'm talking specifically about being able to represent unbounded cues rather than a particular use case
15:09:06 ... David asked how likely this is to go into WebVTT, as MPEG needs to decide whether to keep their changes in or not
15:09:32 ... Is it possible to scope this narrowly enough that the feature can be added now and expanded later?
15:10:16 https://github.com/w3c/media-and-entertainment/blob/master/media-timed-events/unbounded-cues.md
15:10:38 Chris: We haven't completed the use case list. Do we need them all to make progress on representation?
15:11:34 Gary: I don't think we need an exhaustive list of use cases. We need to decide whether we can ship a constrained feature, then expand it later to cover all the uses we'd want
15:12:06 ... If we agree it can be constrained enough without blocking other use cases, we could reasonably ship it
15:12:20 ... Otherwise go back to MPEG and say we'll be unlikely to ship within their timeframe
15:13:02 Chris: What dependency does MPEG have on our work?
15:13:35 Gary: They're using the API, adding support for unbounded cues, but they realise there's no defined representation yet
15:13:52 ... So they're relying on us to implement that before they ship their next spec
15:14:11 ... If we don't have a representation they'll pull their changes
15:14:16 present+ Iraj_Sodagar
15:15:56 Chris: Summarise Rob's proposal?
15:16:15 Gary: If a cue is unbounded, just allow updating its end time, and don't allow anything else to be updated
15:16:36 ... That could be constrained enough that it doesn't prevent doing the other things should we want to
15:16:53 Chris: Use cases include updating a cue end time from unknown to known
15:17:08 ... Also changing a cue end time from some known time to another known time
15:17:16 ... And updating other cue attributes
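[Scribe note: a minimal sketch of how Rob's proposal could look at the API level, assuming a future change in which VTTCue.endTime accepts Infinity to mean "unbounded"; today's VTTCue only accepts finite end times, so this is illustrative rather than current API.]

```typescript
// Sketch of the proposal discussed above: an unbounded cue is created with an
// unknown end time, and only its end time may later be updated.
// Assumes VTTCue.endTime is changed to accept Infinity (not valid today).
const video = document.querySelector('video')!;
const track = video.addTextTrack('metadata', 'live events');

// Cue starts now; its end time is unknown, so mark it unbounded.
const cue = new VTTCue(video.currentTime, Infinity, '{"event": "started"}');
track.addCue(cue);

// Later, when the real end time becomes known, assign it. Per the proposal,
// no other attribute of an unbounded cue may be changed.
function endCue(at: number): void {
  cue.endTime = at;
}
```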
15:19:04 Chris: Is there consistency across existing implementations?
15:19:28 Gary: They'd ignore a cue with a missing end time
15:20:00 q+ to ask if there's been any development on the data model, in terms of "does a VTTCue represent state or presentation?"
15:20:22 Chris: Is that an acceptable fallback behaviour? Because if not, you need a marker value such as 99 hours to represent unbounded?
15:20:41 ack n
15:20:41 nigel, you wanted to ask if there's been any development on the data model, in terms of "does a VTTCue represent state or presentation?"
15:21:04 Nigel: Looking at the document, returning to the data model topic: what is a cue?
15:21:42 ... In segmented delivery, where you keep sending small chunks and there may be repetition, you're not representing state; it's more "here's what to do at a period of time"
15:22:12 ... The "updating a sports score" use case changes the use of a cue significantly. The cue payload includes some state, and the cue timing relates to that state
15:22:56 ... So it seems fundamental that we should be clear about what a cue is. Any more consideration from that point of view? Is that a helpful way to think about it?
15:23:27 Gary: It's worth considering, but I haven't thought from that perspective much
15:23:59 ... One thing is that WebVTT currently has a karaoke mode, not that it's implemented anywhere. In authoring you can say "show these words at this time for this cue"
15:24:05 ... This seems to fall along those lines
15:24:37 Nigel: Is that the syntactical way of updating cues?
15:24:51 Gary: Yes, but no browser implements it, so it may be removed
15:25:19 Nigel: It's a syntactic niceness. You could get that effect by repeating cues instead
15:25:45 Gary: Yes, but you get large VTT files. It would be nice to keep karaoke mode for that reason
15:26:16 Nigel: It seems orthogonal. You're not updating an end time. The parser could update the payload
15:27:01 Gary: Live chapterisation is a question. We know when we change scenes we change the chapter. The sports score use case helped me better understand that
15:27:09 ... The old marker needs to be updated
15:27:55 Nigel: In the chapterisation model, do you need to repeat the chapter information? In segmented media delivery you need to repeat it so you don't need to search back
15:28:20 Gary: Yes. I'm not sure that needs to be in the spec, but we'd want to have an answer for it
15:28:44 ... It is possible for segmented media; you'd want to copy the unbounded cues over each time, potentially every segment
15:30:02 Chris: Segmented delivery has this issue regardless of unbounded cues. Is that defined anywhere?
15:30:49 Gary: It may not be defined anywhere, but cues generally are repeated. We talked at FOMS about writing a WebVTT Note saying that cues spanning multiple segments should be copied through as many segments as necessary until the cue ends
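[Scribe note: a hypothetical sketch of the "copy cues through segments" approach Gary described; the Cue shape, function names, and clipping behaviour here are illustrative, not taken from WebVTT or any Note.]

```typescript
// Serialize one WebVTT segment covering [segStart, segEnd). A cue whose real
// end time is still unknown is written with the segment end as a provisional
// end time, and is written again into each following segment until it ends.
interface Cue {
  id: string;
  start: number;      // seconds on the media timeline
  end: number | null; // null while the real end time is unknown
  payload: string;
}

function writeSegment(cues: Cue[], segStart: number, segEnd: number): string {
  const lines = ['WEBVTT', ''];
  for (const cue of cues) {
    if (cue.start >= segEnd) continue;                      // not started yet
    if (cue.end !== null && cue.end <= segStart) continue;  // already over
    const end = Math.min(cue.end ?? segEnd, segEnd);        // provisional end
    lines.push(cue.id, `${ts(cue.start)} --> ${ts(end)}`, cue.payload, '');
  }
  return lines.join('\n');
}

// Format seconds as a WebVTT timestamp, e.g. 65 -> "00:01:05.000".
function ts(t: number): string {
  const h = Math.floor(t / 3600);
  const m = Math.floor((t % 3600) / 60);
  const s = (t % 60).toFixed(3).padStart(6, '0');
  return `${String(h).padStart(2, '0')}:${String(m).padStart(2, '0')}:${s}`;
}
```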
15:31:15 Nigel: I don't think there's any presentation defined for chapters. So if you're providing content you'd have to understand the user experience you want
15:31:55 ... If you want a reasonable acquisition time for chapters you'd have to repeat them often enough. The client would have to understand it's the same chapter
15:32:24 ... To my previous question: what is the VTTCue modelling? It seems to be enough information for the player to do what it needs to do, but not really a model of data changing over time
15:32:47 ... So if you're referring to some data entity consistently from cue to cue, there'd need to be an external way to identify it
15:33:28 ... If the client needs a running view of the scoreline, it could keep one. A VTT cue in your MP4 payload could be a "score" metadata cue with an id so you know it's a score
15:33:38 ... and the cue with that id can be updated
15:34:39 ... It's not the cue that represents the data that's changing; the cue is painting a state that's modelled in the application, which updates the score
15:35:37 ... So you don't need to set the cue as unbounded. For segmented delivery you know what's in the segment when you create it, so you can use the segment timing for the cue duration
15:35:52 ... Then the application keeps its own model of the data and does whatever it needs to do
15:36:27 Gary: To support jumping into a live stream we have to chop the cue into multiple parts anyway, which is one of the fallback approaches for a missing end time
15:39:18 Chris: So in the model Nigel described, we don't need unbounded cues, as the cue timing is defined by the segment timing, and "unboundedness" is up to the application
15:39:19 Nigel: Yes
15:40:01 Gary: Would knowing that a cue is going to be repeated through multiple segments be useful to clients?
15:40:44 Nigel: As a data point, with VTT for captions, if there's a requirement to tear down captions and rebuild them, a key UX point is that people don't want to see flicker
15:41:38 ... Two ways to achieve that: one is to maintain an identifier so there's a contract between the data provider and consumer, so a cue with ID 43 in one segment is promised to be the same as the cue with that ID in another segment
15:42:00 ... The other technique is comparison of the payloads, merging cues together if they're the same
15:42:41 ... With the maintained-IDs approach, you need to define how that works across segments
15:42:52 Gary: Not defined in WebVTT
15:43:10 Nigel: We have the same problem with TTML; it's not defined
15:43:36 Gary: It could be a backwards compatibility issue. Although, since people don't use IDs in practice, it may not be an issue
15:43:58 Nigel: So players would have to do comparisons and then apply updates to the future state
15:44:28 Gary: A question is: in the live caption use case, if the content or cue settings differ but the id is the same, do you update the cue?
15:45:02 Nigel: Is the id scoped to the document? Use the begin and end times to work out what's visible at a given time
15:46:10 ... In the TTML model, where ids are scoped to the containing document, no claims are made about ids across segments
15:46:45 ... In that case, do a model comparison between now and next (discounting ids). I favour having similar data models if we can
15:46:51 Gary: Makes sense to me
15:47:31 Chris: Would two segments be considered two different documents?
15:47:34 Gary: Right now, yes
15:48:40 ... It sounds like what we're saying is that, for segmented WebVTT, if you have unbounded timed events that you want to represent, you have to be able to let the user avoid loading all the captions from the beginning of time, and copy the cues between segments
15:48:51 q+ to mention moments in time
15:49:27 ... It's not that different from having a bunch of short cues. A signal that tells us this cue is going to be repeated would be nice, but it's not necessarily required
15:50:30 ... The signal would be that the cue has an unbounded end time; that indicates it will be repeated, until it ends and is assigned an end time
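[Scribe note: a sketch of the client-side merging Nigel described, to avoid teardown flicker when a later segment repeats a cue that is already active; illustrative only, as neither WebVTT nor TTML currently defines this behaviour.]

```typescript
// When a new segment repeats an already-active cue (matched here by id and
// payload, combining the two techniques discussed), extend the existing cue
// instead of removing and re-adding it, which causes visible flicker.
function mergeCue(track: TextTrack, incoming: VTTCue): void {
  const cues = Array.from(track.cues ?? []) as VTTCue[];
  const existing = cues.find(
    (c) => c.id === incoming.id && c.text === incoming.text
  );
  if (existing) {
    // Same cue carried over into a later segment: just extend its end time.
    existing.endTime = Math.max(existing.endTime, incoming.endTime);
  } else {
    track.addCue(incoming);
  }
}
```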
15:51:25 Nigel: This reminds me of something related. In VTT and TTML, things have beginning and end times, so everything has a duration
15:52:11 ... Do we need a concept of a "moment" in time?
15:53:47 Gary: An example is ID3 metadata. Right now the way those are done is you get a start time, and the end time is the duration of the video or the start time of the next cue in the ID3 cue points track
15:54:07 ... That's a workaround for representing a moment in time that uses a duration
15:54:19 Nigel: [looking at WebVTT karaoke mode]
15:54:47 ... The idea is to indicate a per-word start time with no duration. I think that's missing from the general timed text data model?
15:55:05 ... People have suggested use cases where that could be useful
15:56:02 ... You could have a metadata tag repeated in each segment, with the duration matching the segment. But what there isn't is an initial definition of the state, and then, at a given moment, here's a new state
15:57:06 ... I'm reminded of a demo: the idea was to be able to change the number of words shown at once based on the dynamic display. Using timestamps instead of durations would allow you to customise that from a user perspective
15:58:00 ... That could be more interesting than the idea of having unbounded cues, but it's a very different way of doing things. In the same way, TTML doesn't support timed metadata without it being attached to something in the document, such as a div
16:01:21 Chris: Could you do a separate out-of-band query to get the complete state at a moment in time?
16:02:00 Chris: Next steps?
16:02:12 Gary: A follow-up meeting when Rob can join
16:02:43 ... We need to decide what to tell David on timescale
16:03:11 Nigel: I don't think we've identified a strong need to make a change from today's discussion; it just moves the logic elsewhere
16:03:52 Gary: Is saying we don't think it'll go in right now reasonable?
16:04:03 Chris: Do we miss an opportunity?
16:04:39 Gary: It may be another year or so. We shouldn't close the door forever, but it will severely delay it
16:05:13 Nigel: But that's on the basis that nothing is missing now. If we don't have use cases that can't be fulfilled now, less change is good
16:07:00 Chris: We've talked about this for timed metadata; what about also for captions?
16:07:04 Gary: The immediate use case doesn't require a spec change, it seems
16:07:57 Nigel: If there is another model for live captions that people need, where it's firing a caption directly and minimising repetition, that would need more discussion
16:08:09 ... Possibly in an RTC scenario
16:08:24 Gary: We did get a question around it for live captioning recently: https://github.com/w3c/webvtt/issues/320#issuecomment-917386887
16:08:50 Topic: Next meeting
16:09:26 Gary: Next week if possible, assuming Rob can make it
16:09:28 Chris: OK
16:10:58 [adjourned]
16:11:02 rrsagent, draft minutes
16:11:02 I have made the request to generate https://www.w3.org/2021/09/20-me-minutes.html cpn
16:11:05 rrsagent, make log public
18:08:08 Zakim has left #me
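[Scribe note, appended after the meeting: a sketch of the ID3 workaround Gary mentioned for representing a "moment" with duration-based cues; each metadata cue runs from its timestamp to the start of the next cue, or to the media duration for the last one. The Moment shape and function name are illustrative, not from any spec.]

```typescript
// Turn instantaneous metadata "moments" into duration-based cues: each cue
// ends where the next one starts, standing in for a true zero-duration event.
interface Moment {
  time: number;    // the instant the metadata applies, in seconds
  payload: string; // e.g. serialized ID3 frame data
}

function momentsToCues(moments: Moment[], mediaDuration: number): VTTCue[] {
  const sorted = [...moments].sort((a, b) => a.time - b.time);
  return sorted.map((m, i) => {
    const end = i + 1 < sorted.length ? sorted[i + 1].time : mediaDuration;
    return new VTTCue(m.time, end, m.payload);
  });
}
```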