14:52:58 RRSAgent has joined #me
14:52:58 logging to https://www.w3.org/2021/03/15-me-irc
14:53:02 Zakim has joined #me
14:53:19 meeting: WICG DataCue
14:53:24 chair: Chris
14:53:41 present+ Chris_Needham
15:05:48 fghilardi has joined #me
15:24:45 present+ Kaz_Ashimura, Chris_Needham, Yasser_Syed, Iraj_Sodagar
15:26:16 Agenda: https://lists.w3.org/Archives/Public/public-web-and-tv/2021Mar/0033.html
16:04:43 nigel has joined #me
17:34:18 Zakim has left #me
20:08:27 rrsagent, draft minutes
20:08:27 I have made the request to generate https://www.w3.org/2021/03/15-me-minutes.html cpn
20:08:50 Present: Kaz_Ashimura, Chris_Needham, Yasser_Syed, Iraj_Sodagar, Nigel_Megitt, Rob_Smith
20:08:54 Topic: emsg handling in MSE
20:09:02 scribenick: cpn
20:09:06 Chris: I created a working document to see where to change the MSE and related specs to add emsg box handling
20:09:12 https://docs.google.com/document/d/1J3QtUa0udRycz1u-B3QtVDmTQ8F-sotcPVVYnqBpfFw/edit#
20:09:18 Chris: Looking at where to add processing steps for emsg. The Segment Parser Loop goes to Coded Frame Processing, which talks about timed text coded frames. Could add steps there? emsg fits the definition of coded frames
20:09:23 Iraj: Or have a separate processing algorithm, then it's easier to handle things differently where we need to
20:09:28 Yasser: What if the emsg contains a splice point - e.g., a SCTE-35 message containing a splice point definition?
20:09:33 Chris: Do we need the MSE implementation to handle a seamless switch to the ad/replacement content?
20:09:44 Iraj: No, the expectation is the web app would handle it, and queue up the new content at the beginning of a media segment. We're not talking about doing the splice in MSE in the same source buffer. emsg describes the splice point time; it's the job of the application to switch to the alternate stream
20:09:49 Yasser: Is emsg processed before media is decoded?
20:09:55 Iraj: That's a content authoring issue. You can put emsg at any point in the timeline, at the start of each segment.
It describes a time, either now or in the future. For authoring, emsg could be immediate, e.g., show a score on screen, where the start time is right at the beginning. If the app needs time to process the event, e.g., a switch point in the future, it needs time to fetch the media segment ahead of time, so the starting point of the event is in the future. emsg is generic/agnostic [CUT]
20:10:09 ... There are two dispatch modes: on-receive gives advance notice to the web app; on-start is similar to media samples, so on the media timeline
20:10:16 ... We don't have an API model for the on-receive case, but TextTrack fits for triggering on the media timeline
20:10:23 ... For on-receive mode, it's similar to the media sample presentation time. The presentation time could be the same as the media segment start time. So it's similar to on-start
20:10:30 ... Presentation time of the event: for on-receive mode, it's the earliest presentation time of the segment that contains it.
20:10:34 ... We can then treat them the same: MSE dispatches the event at the presentation time
20:10:39 Chris: How does MSE know whether to treat it as on-receive or on-start?
20:10:52 Iraj: The scheme owner (scheme_id_uri + value) defines it. MSE is not aware of those schemes, so the app should set the dispatch mode. There are two parts to this: which messages the app wants to receive (emsg schemes), and which dispatch mode is to be used. Add an API for the app to call to set this up
20:10:59 Yasser: For emsg messages, is the application responsible for handling repetition of emsg messages?
20:11:06 Iraj: There's a definition of equivalency for emsg boxes: scheme_id_uri, value, id - equivalent if all the same. If two messages have the same values, MSE should handle it by only dispatching the first one
20:11:12 ... What's the lifetime of the equivalency? How long to keep track of whether it dispatched the message? The DASH spec has not defined this. Need to think about it
20:11:17 Yasser: Could have a defined duration?
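The equivalency rule described above can be sketched as a small dedup helper. This is purely illustrative - the class, method, and field names are made up, not from the MSE or DASH specs - showing only the stated rule: two emsg boxes are equivalent when scheme_id_uri, value, and id all match, and only the first of an equivalent pair is dispatched.

```javascript
// Hypothetical sketch of the emsg equivalency rule discussed above.
// Two emsg boxes are equivalent when scheme_id_uri, value, and id all
// match; only the first equivalent message should be dispatched.
// (All names here are illustrative, not from any spec.)
function emsgKey(emsg) {
  return [emsg.schemeIdUri, emsg.value, emsg.id].join('|');
}

class EmsgDeduper {
  constructor() {
    this.dispatched = new Set();
  }
  // Returns true the first time an equivalent emsg is seen, false after.
  shouldDispatch(emsg) {
    const key = emsgKey(emsg);
    if (this.dispatched.has(key)) return false;
    this.dispatched.add(key);
    return true;
  }
}
```

How long entries should live in the set - the lifetime of the equivalency - is exactly the open question raised above; the append window discussed next is one candidate bound.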
20:11:23 Iraj: MSE has an append window that defines the segment of the media timeline that MSE maintains. Anything outside the append window doesn't exist. For handling messages, could look at the append window. https://w3c.github.io/media-source/#append-window
20:11:29 ... The length of the append window is left to the implementation. For a live stream with unknown duration, it could be a fixed amount. It grows to a maximum size. It depends on how much buffering the player wants to have, for timeshift plus some time ahead for future buffering
20:11:35 Chris: Any changes for the emsg box to the append error algorithm in the MSE ISO BMFF byte stream spec? It doesn't discuss what happens if the boxes aren't well formed
20:11:41 Iraj: The suggestion is to discard the emsg box if it's not well formed. Boxes, if they exist, should have the right format. If emsg doesn't have a scheme_id_uri, we may prefer to throw it away rather than raise an append error
20:11:46 Yasser: Other use cases for emsg than splicing?
20:11:50 Iraj: Yes, sparse metadata on the media timeline, sports scores, ticker information
20:11:55 Yasser: At the app level, are there any errors that may be due to not getting information on time?
20:12:00 Iraj: If the error depends on the emsg payload, it would be message and app dependent. For a splice point, the message body points to the past. Is it an error or not? Whether it's an error depends on how important it is. Some applications could stop playing
20:12:04 Yasser: What if the event is so early the app couldn't process it?
20:12:12 Iraj: If the application can't buffer an event for a long time into the future, the app could issue an error. emsg just carries a message; the app needs to interpret it to understand it
20:12:20 Action: Chris to schedule a follow-up call to continue detail on emsg and MSE, 8am next Monday
20:12:24 Topic: TextTrackCue endTime
20:12:32 Nigel: Do we need a serialization syntax in WebVTT, WebVMT, or is it just the API that changes?
20:12:39 Rob: It does.
In the cue line, the timings for the cue: WebVTT requires a start and end time. WebVMT allows you to omit the end time. WebVMT also allows you to update the cue, e.g., in a live streaming situation. Record the start time with a blank end time. You can go back to fill in the end time, and then it becomes a valid WebVTT cue. If the end time is omitted in WebVTT, the cue is ignored as invalid, and processing continues with subsequent cues
20:12:45 Nigel: Not discussed further whether that's the correct behaviour. Not obvious to me - if the parser doesn't understand it (e.g., an older parser), it ignores the cue altogether. It could fill in a reasonable alternative value for the data and keep going. It won't get it right for everyone, but something would be displayed instead of nothing. Nobody has come up with the desirable semantic yet for a cue that's supposed to be active but isn't because it wasn't parsed
20:12:51 Rob: Unbounded cues are not supported until you implement them, so it's acceptable. Adding them to a later format fails gracefully, as the feature isn't supported yet. It doesn't stop processing, and doesn't do anything with the cue. If you want it to do something, that would require an implementation change
20:12:56 Nigel: In this case, it skips the entire cue, not just the part of the cue that it couldn't parse
20:13:01 Rob: It skips the cue block associated with it, as it's an error in the cue block
20:13:07 Nigel: If you wanted to supply data in a form that would show something for old-style parsers, and something with improved behaviour for new-style parsers, that seems impossible as defined now
20:13:12 Rob: Yes, there'd need to be some update
20:13:16 ... In the old syntax, it's an error to omit the end time
20:13:23 Nigel: Need to think about some addition to the parsing spec that results in something for old parsers, but allows a new-style parser to assign an infinite end time, rather than the fallback value used in an old-style parser.
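The unbounded-cue lifecycle under discussion can be sketched with plain objects standing in for cues. In a browser this would be `new VTTCue(start, end, text)` added to a TextTrack; the helper names below are made up for illustration, assuming the proposed semantics where a missing end time means Infinity.

```javascript
// Sketch of an unbounded live cue: created with an unknown end time
// (Infinity), later bounded once the end is known. Plain objects stand
// in for VTTCue; helper names are illustrative, not from any spec.
function makeUnboundedCue(startTime, text) {
  return { startTime, endTime: Infinity, text };
}

// Fill in the end time once it is known; the cue then has the finite
// timing that a serialized WebVTT cue requires.
function endCue(cue, endTime) {
  cue.endTime = endTime;
  return cue;
}

// A cue is active at media time t when startTime <= t < endTime, so an
// unbounded cue stays active indefinitely once entered.
function isActive(cue, t) {
  return cue.startTime <= t && t < cue.endTime;
}
```

This is the live-captions pattern mentioned below: create the cue when the applause starts, and set its end time only when the narrator cue arrives.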
It may be hard to define in WebVTT
20:13:32 Rob: The alternative is to put Infinity, but that still wouldn't parse. It would be invalid syntax and presumably would be ignored. Disadvantage: if you subsequently want to bound the cue, you'd have to go back, remove the Infinity value, and replace it with the actual cue end time. It's easier to leave it blank, from that point of view
20:13:35 Chris: What's the WebVTT use case?
20:13:41 Nigel: Live captions, so you could construct a VTTCue with no end time, then later modify its endTime when you know the value, e.g., audience applause where you don't know the end time, followed by a narrator cue. If there's a way to do that programmatically, it would be a nice way to arrange things
20:13:46 ... The follow-on is how to construct this, e.g., WebVTT in MP4. The MPEG spec change is to handle VTT cues with no end time. You don't have to have an identifier on the cue in MP4, so how do you go back and update?
20:13:51 Rob: The answer is to set an identifier, then send the cue again with the same identifier
20:13:55 Nigel: I don't think anything is defined in WebVTT for modifying an existing cue
20:13:58 Rob: Chris, did you investigate?
20:14:04 Chris: Yes, modifying cue start/end times to see what HTML TextTrackCue events get fired. Still need to file implementation bugs on that
20:14:13 Nigel: There's a difference between the serialization format and the in-memory processing model. Imagine having an MP4 document with a cue with an infinite end time, then another cue arrives with a non-infinite end time. How to handle that needs to be defined
20:14:18 Rob: Reuse a cue identifier?
20:14:24 Nigel: That's not come up so far, as all cues have finite duration. What's the requirement for identifiers to be unique? In the MPEG context, you deliver lots of small WebVTT files.
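Rob's suggestion - set an identifier, then send the cue again with the same identifier - could look something like this sketch. Nothing in WebVTT or WebVTT-in-MP4 currently defines this behaviour; the store and its names are hypothetical.

```javascript
// Hypothetical sketch of cue update-by-id, as suggested above: a
// re-delivered cue whose id matches an earlier one updates it in place,
// e.g. to fill in a previously unknown end time. No spec defines this
// yet; all names are illustrative.
class CueStore {
  constructor() {
    this.byId = new Map();
  }
  deliver(cue) {
    const existing = this.byId.get(cue.id);
    if (existing !== undefined) {
      // Same id: treat the new delivery as an update to the old cue.
      Object.assign(existing, cue);
      return existing;
    }
    this.byId.set(cue.id, cue);
    return cue;
  }
}
```

The open issue raised next applies directly: this only works if ids are unique across all the delivered files, not just within one file.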
If you want to deliver in separate documents/files - a cue with no end time, then a modification to the cue end time - you need to define uniqueness across all the files, e.g., across a DASH or HLS manifest
20:14:30 Chris: Changing any arbitrary cue information?
20:14:34 Nigel: Not clear what happens if the constraint is broken
20:14:42 Rob: Wondering about the purpose. Use case: a score in a sports game, put on screen as a TextTrackCue. We don't know how long it lasts, or what the updated text will be when the score changes. This could have a 'score' id, and you want to update the content of the cue
20:14:51 Nigel: I'd expect the time of the event to be reflected in the cue time, and some other identifier. These are separate events that happen. The end time on each of those when issued is Infinity. When a score change occurs, we go back and update the end time. The score itself is a metadata key; there can be multiple keys
20:14:56 ... It could be a metadata identifier, not necessarily a VTTCue id. If you only have one cue with the score, you wouldn't be able to replay the changes to the event information (the score)
20:15:00 ... Metadata cues are part of the data model for WebVTT. The cue text itself identifies that the cue contains the score. So there'd need to be separate cue identifiers. And conceptually, you don't want to modify a single cue; rather, stop presenting one and start presenting another
20:15:05 Rob: Could have a cue that's shown across the entire match, and update the text in it
20:15:09 Nigel: But what happens if you rewind - does it show the wrong score?
20:15:13 Rob: WebVMT handles it by replaying all the cues from the beginning
20:15:19 Nigel: The TextTrackCue model handles it by having start/end times
20:15:24 Rob: The update happens at a known time, so it's handled in the same way as a cue, I think
20:15:28 Chris: emsg has a similar thing: rules for updating cues based on the id
20:15:38 Nigel: TextTrackCue already provides logic for stepping through time and updating what's valid.
If a scheme allows update after a seek event, you'd also do some other fetching and processing to modify cues to reflect that point in time
20:15:43 Rob: Questioning whether it's possible in the current spec
20:15:48 Nigel: Race condition. If you want to notice that the current playhead time has moved and reset the cues in the TextTrackCueList to reflect the state at that point in time, and the UA does its own updating... it will be hard to define... application and UA both updating
20:15:52 Chris: Time marches on will activate/deactivate
20:15:59 Nigel: If the cue is somehow modified, the text has changed, and the current playback position is still in the start-end interval, there's no change, so an onenter/onexit event shouldn't happen
20:16:03 Chris: We don't have an event to let the web app know that some arbitrary attribute of a cue was changed. Do we need that?
20:16:08 Rob: I don't see any wording to prevent changes to the cue content
20:16:14 Nigel: The WebVTT spec doesn't talk about it at all
20:16:17 Chris: What does TTML define?
20:16:22 Nigel: There are a few things to consider. Nothing in a TTML spec talks about modifying cues. It does describe how to modify the presentation. How it typically works: in a live stream protocol, e.g., DASH with MP4, choose a segmentation period and send chunks of TTML, one document for each segment (a sample in the MPEG spec, often one sample per segment)
20:16:28 ... If I want a subtitle to appear that's red, then change it to blue, I arrange for it to have an end time, then add another p element with 'blue'
20:16:49 ... All of the live streaming type applications of TTML don't rely on there being any state in the receiver. Similar to the WebVTT cue identifier, there's a requirement that no XML id is duplicated, but no such constraint across multiple documents - e.g., with EBU-TT Live, where there's a sequence of documents
20:17:01 ... The specs for those define an order of precedence. At most one document can be active.
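As an illustration only - the actual precedence rules are defined by the EBU-TT Live specification and are not reproduced here - the "at most one document active" model can be sketched as selecting, from a sequence of documents ordered by begin time, the last one whose begin time has been reached:

```javascript
// Illustrative model only; the real resolution rules live in the
// EBU-TT Live spec. Given documents sorted by ascending begin time, at
// most one is active at media time t: the last whose begin <= t (each
// document's effective end is the next document's begin).
function activeDocument(docs, t) {
  let active = null;
  for (const doc of docs) {
    if (doc.begin <= t) active = doc;
    else break;
  }
  return active;
}
```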
If you want some information to persist across sample boundaries or documents in your sequence, you just duplicate it between the documents. This allows you to seek and not be dependent on having seen the previous documents
20:17:09 Rob: Could WebVTT do the same thing?
20:17:18 Nigel: There isn't a TTMLCue, so there isn't an identifier you can use to go back and modify the cue. A higher level issue: there's no defined mapping from TTML cues to TextTrackCue
20:17:25 ... You can have an entry in TTML with no end time; it's scoped to its parent's end time. It could be active forever. Modifying it afterwards isn't possible
20:17:34 ... With the sequence of documents, document 1 could have text with no end time, and document 2 does have a begin time. The end time of document 1 is the begin time of document 2. By definition that applies the end time to document 1. So if you have more than one document (the normal case), you'd have time bounds on the TTML documents in the sequence. When that happens, anything with an infinite end time doesn't have one anymore
20:17:41 Rob: A TTML document is broken into segments that are self contained. So it's like putting lots of documents in a stream. Handle it in the same way?
20:17:46 Nigel: Look at MP4. Groups of WebVTT documents are handled very differently to groups of TTML documents. What's being delivered is a bunch of cues, and the cues can have been delivered by multiple documents
20:17:52 Rob: What if you treated the stream of video and text and metadata as a series of small files? There's one of those for each period, and they don't overlap. Could construct the VTT files in a similar way
20:17:57 Nigel: You could, but the model may not be built that way. There'd need to be a semantic defined for that model
20:18:04 Rob: Thinking about how a cue update would work: in a live stream you'd have an unbounded cue that ends at some point. You'd go back and end that cue, then add a new one starting at that time (which could also be unbounded).
That would play back correctly
20:18:09 Chris: Time marches on shouldn't be affected in that case. Do we need a "cue changed" event for WebVTT in MP4? Does it allow things other than the end time to be changed?
20:18:15 Nigel: The WebVTT in MP4 spec is the only place that describes how to package VTT files within one track (e.g., per language). Possibly HLS also? It imposes a requirement on cue ids across files, and behaviour if something different changes. What should happen if other attributes change along with the cue end time?
20:18:22 ... The wrapper has a timing structure; I'm not sure how it works in WebVTT. You can deliver cue payload data that goes outside sample boundaries
20:18:27 Action: Rob to create web platform tests for the TextTrackCue endTime change
20:18:31 Chris: TextTrackCue has no constructor, so tests could use VTTCue
20:18:36 Nigel: A sequence of WebVTT files with overlapping cues - cue id uniqueness across multiple files is not defined so far
20:18:41 Rob: If you can go back and modify a previous TextTrackCue object, what are the use cases? What parts of the data model need to change?
20:18:50 Nigel: The way we've defined this so far is by having an unknown end time. Alternatively, you could set a cue with a known end time, and replace it with another one, as often as you need. This would work with existing parsers. Then you could layer on an extra tag - this cue is a time extension of another cue - to indicate to an implementation to update an existing cue's end time. This could achieve the same result without breaking existing stuff
20:18:55 Rob: But what if the end time update came too late?
20:19:00 Nigel: You'd have to allow for that. It's a delivery and packaging issue, e.g., segment duration
20:19:08 Nigel: It would be good to get feedback on the WebVTT missing cue end time / update model.
We need more detail on the semantics of modifying previously-sent cues
20:19:12 Rob: Could be done by id
20:19:16 Nigel: But ids could be scoped by file, so it needs to be a combination of file + id. A rule for uniqueness across files may need a file identifier
20:19:20 Rob: To update a cue that's still active, there needs to be a way to update that cue. Could you repeat the whole cue - start time, end time, content - to replace the previous cue?
20:19:25 ... I don't see a purpose to changing everything on the fly
20:19:29 ... Maybe it doesn't require an id; just match it on the content and start time
20:19:34 ... The requirement is to be able to stop a cue, and to be able to identify the cue uniquely
20:19:37 Topic: Next call
20:19:44 Chris: Next week, focusing on emsg and MSE, then in 4 weeks for general Media Timed Events
20:19:49 [adjourned]
20:19:55 rrsagent, draft minutes
20:19:55 I have made the request to generate https://www.w3.org/2021/03/15-me-minutes.html cpn
20:20:01 rrsagent, make log public v2
20:20:08 rrsagent, make log public