W3C

– DRAFT –
WICG DataCue

15 March 2021

Attendees

Present
Chris_Needham, Iraj_Sodagar, Kaz_Ashimura, Nigel_Megitt, Rob_Smith, Yasser_Syed
Regrets
-
Chair
Chris
Scribe
cpn

Meeting minutes

emsg handling in MSE

Chris: I created a working document to explore where to change the MSE and related specs to add emsg box handling

https://docs.google.com/document/d/1J3QtUa0udRycz1u-B3QtVDmTQ8F-sotcPVVYnqBpfFw/edit#

Chris: Looking at where to add processing steps for emsg. Segment Parser Loop goes to Coded Frame Processing, which talks about timed text Coded Frames. Could add steps here? emsg fits the definition of coded frames

Iraj: Or have a separate processing algorithm, then it's easier to handle differently where we need to

Yasser: What if the emsg contains a splice point - e.g., a SCTE35 message containing splice point definition?

Chris: Do we need the MSE implementation to handle seamless switch to the ad/replacement content?

Iraj: No, the expectation is the web app would handle it, and queue up the new content at beginning of a media segment. We're not talking about doing the splice in MSE in the same source buffer. emsg describes the splice point time, it's the job of the application to switch to the alternate stream

Yasser: Is emsg processed before media is decoded?

Iraj: That's a content authoring issue. You can put emsg at any point in the timeline, e.g., at the start of each segment. It describes a time, either now or in the future. For authoring, emsg could be immediate, e.g., show a score on screen, where the start time is right at the beginning. If the app needs time to process the event, e.g., the switch point is in the future, it needs time to fetch the media segment ahead of time; the starting point of the event is in the future. emsg is generic/agnostic
… There are two dispatch modes: on-receive gives advance notice to web app, on-start is similar to media samples, so on media timeline
… We don't have API model for the on-receive case, but TextTrack fits for triggering on media timeline
… For on-receive mode, dispatch is still keyed to a presentation time, which could be the same as the media segment start time. So the mechanism is similar to on-start
… Presentation time of the event: for on-receive mode, it's the earliest presentation time of the segment that contains it.
… We can then treat the same, MSE dispatches the event at the presentation time
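The two dispatch modes Iraj describes can be sketched as a small timing rule. This is an illustrative sketch, not spec text; the function name and parameter shapes are assumptions:

```javascript
// Sketch of the two emsg dispatch modes discussed above.
// Names (dispatchTime, presentationTime) are illustrative, not from any spec.

/**
 * @param {{presentationTime: number}} event  parsed emsg event, times in seconds
 * @param {number} segmentStart  earliest presentation time of the containing segment
 * @param {'on-receive'|'on-start'} mode  dispatch mode chosen by the application
 * @returns {number} media-timeline time at which MSE would surface the event
 */
function dispatchTime(event, segmentStart, mode) {
  // on-receive: surface as soon as the segment is appended, i.e. at the
  // segment's earliest presentation time, giving the app advance notice.
  if (mode === 'on-receive') return segmentStart;
  // on-start: surface when playback reaches the event's own start time,
  // like a media sample or a TextTrack cue.
  return event.presentationTime;
}
```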

Chris: How does MSE know whether to treat an event as on-receive or on-start?

Iraj: The scheme owner (scheme_id_uri + value) defines it. MSE is not aware of those schemes, so the app should set the dispatch mode. There are two parts to this: which messages the app wants to receive (emsg schemes), and which dispatch mode is to be used. Add an API for the app to call to set this up
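The registration API suggested here could take the following shape. Nothing below is a real MSE API; the class and method names are invented for this sketch:

```javascript
// Hypothetical registration surface for the API Iraj describes: the app
// opts in per scheme and chooses the dispatch mode per subscription.
class EmsgSubscriptions {
  constructor() {
    this.modes = new Map(); // key: `${schemeIdUri}|${value}` -> dispatch mode
  }
  subscribe(schemeIdUri, value, mode /* 'on-receive' | 'on-start' */) {
    this.modes.set(`${schemeIdUri}|${value}`, mode);
  }
  // MSE would consult this while parsing: undefined means "not subscribed,
  // drop the box"; otherwise the stored mode drives dispatch.
  modeFor(schemeIdUri, value) {
    return this.modes.get(`${schemeIdUri}|${value}`);
  }
}
```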

Yasser: For emsg messages, is the application responsible for handling repetition of emsg messages?

Iraj: There's a definition of equivalency for emsg boxes: scheme_id_uri, value, id -- equivalent if all the same. If two messages have the same values, MSE should handle it by only dispatching the first one
… What's the lifetime of the equivalency? How long to keep track of whether it dispatched the message? The DASH spec has not defined this. Need to think about it
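The equivalency rule stated above can be sketched as a deduplicating filter. This is illustrative only; in particular, the lifetime question Iraj raises is left open, so the seen-set below simply grows unboundedly:

```javascript
// Sketch of the emsg equivalency rule: two boxes are equivalent when
// scheme_id_uri, value and id all match, and only the first should be
// dispatched.
function makeEmsgDeduper() {
  const seen = new Set();
  return function shouldDispatch(emsg) {
    const key = `${emsg.scheme_id_uri}|${emsg.value}|${emsg.id}`;
    if (seen.has(key)) return false; // an equivalent box was already dispatched
    seen.add(key);
    return true;
  };
}
```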

Yasser: Could have a defined duration?

Iraj: MSE has an append window that defines the segment of the media timeline that MSE maintains. Anything outside the append window doesn't exist. For handling messages, could look at the append window. https://w3c.github.io/media-source/#append-window
… Length of append window is left to the implementation. For a live stream with unknown duration, could be a fixed amount. It grows to a maximum size. Depends on how much buffering the player wants to have, for timeshift plus some time ahead for future buffering

Chris: Any changes for the emsg box to the append error algorithm in the MSE ISO BMFF byte stream spec? It doesn't discuss what happens if the boxes aren't well formed

Iraj: Suggestion is to discard the emsg box if not well formed. Boxes, if they exist, should have the right format. If emsg doesn't have scheme_id_uri, we may prefer to throw away than append error
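The suggested error handling can be sketched as a filter that silently drops malformed boxes rather than raising an append error. The specific checks below are an illustrative minimum, not a spec requirement:

```javascript
// Sketch: discard a malformed emsg (for example, one missing scheme_id_uri)
// instead of triggering the append error algorithm. Field names follow the
// emsg box definition.
function filterWellFormedEmsgs(emsgs) {
  return emsgs.filter((e) =>
    typeof e.scheme_id_uri === 'string' && e.scheme_id_uri.length > 0 &&
    typeof e.value === 'string' &&
    Number.isFinite(e.presentation_time)
  );
}
```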

Yasser: Other use cases for emsg than splicing?

Iraj: Yes, sparse metadata with media timeline, sports scores, ticker information

Yasser: At the app level, are there any errors that may be due to not getting information on time?

Iraj: If the error depends on the emsg payload, it would be message and app dependent. For a splice point, the message body points to the past. Is it an error or not? Whether it's an error depends on how important it is. Some applications could stop playing

Yasser: If the event is so early the app couldn't process it?

Iraj: If the application can't buffer an event for a long time into the future, the app could issue an error. emsg just carries a message, the app needs to interpret to understand it

Action: Chris to schedule a follow up call to continue detail on emsg and MSE, 8am next Monday

TextTrackCue endTime

Nigel: Do we need a serialization syntax in WebVTT, WebVMT, or is it just the API that changes?

Rob: It does. In the cue line, the timings for the cue - WebVTT requires a start and end time. WebVMT allows you to omit the end time. WebVMT also allows you to update the cue, e.g., for a live streaming situation. Record the start time with a blank end time. You can go back to fill in the end time, and then it becomes a valid WebVTT cue. If the end time is omitted in WebVTT, the cue is ignored as invalid, and the parser continues to process subsequent cues
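The WebVMT-style omitted end time Rob describes could be handled by a timings-line parser along these lines. This is a sketch under the assumption that a blank end time maps to an unbounded (Infinity) end; a strict WebVTT parser would instead drop the whole cue block:

```javascript
// Minimal WebVTT-style timestamp: hh:mm:ss.mmm
const TS = /(\d{2,}):(\d{2}):(\d{2})\.(\d{3})/;

function parseTimestamp(s) {
  const m = TS.exec(s);
  if (!m) return null;
  return (+m[1]) * 3600 + (+m[2]) * 60 + (+m[3]) + (+m[4]) / 1000;
}

function parseTimingsLine(line) {
  if (!line.includes('-->')) return null; // not a timings line
  const [startPart, endPart = ''] = line.split('-->').map((s) => s.trim());
  const start = parseTimestamp(startPart);
  if (start === null) return null; // invalid cue block: skip it
  if (endPart === '') return { start, end: Infinity }; // unbounded live cue
  const end = parseTimestamp(endPart);
  return end === null ? null : { start, end };
}
```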

Nigel: Not discussed further whether that's the correct behaviour. Not obvious to me - if the parser doesn't understand it (e.g., an older parser), it ignores it altogether. It could fill in a reasonable alternative value for the data and keep going. It won't get it right for everyone, but something would be displayed instead of nothing. Nobody has come up with the desirable semantic yet for a cue that's supposed to be active but isn't because it wasn't parsed

Rob: Unbounded cues are not supported until you implement them, so it's acceptable. Adding them as a later format addition fails gracefully, as the feature isn't supported yet. It doesn't stop processing, and doesn't do anything with the cue. If you want it to do something, that would require an implementation change

Nigel: In this case, it skips the entire cue, not just the part of the cue that it couldn't parse

Rob: It skips the cue block associated with it, as it's an error in the cue block

Nigel: If you wanted to supply data in a form that would show something for old-style parsers, and something with improved behaviour for new-style parsers, seems impossible as defined now

Rob: Yes, there'd need to be some update
… In old syntax, it's an error to omit the end time

Nigel: Need to think about some addition to the parsing spec that results in something for old parsers, but allows a new-style parser to assign an infinite end time, rather than the fallback value used in an old-style parser. May be hard to define in WebVTT

Rob: Alternative is to put Infinity, but that still wouldn't parse. It would be invalid syntax and presumably would be ignored. Disadvantage: if you subsequently want to bound the cue, you'd have to go back, remove the Infinity value and replace it with the actual cue end time. Easier to leave it blank, from that point of view

Chris: What's the WebVTT use case?

Nigel: Live captions, so you could construct a VTTCue with no end time, then later modify its endTime when you know the value, e.g., audience applause where you don't know the end time, followed by a narrator cue. If there's a way to do that programmatically, it would be a nice way to arrange things
… Follow-on is how to construct this, e.g., WebVTT in MP4. The MPEG spec change is to handle VTT cues with no end time. You don't have to have an identifier on the cue in MPEG-4, so how to go back and update?
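The live-caption flow Nigel describes could look like the following. A plain object stands in for the browser's VTTCue so the flow can be shown outside a UA; the helper names are invented for this sketch:

```javascript
// Sketch: create a caption cue with an unbounded end time, then patch
// endTime once the real end is known (e.g., when the narrator cue starts).
function makeLiveCue(startTime, text) {
  return { startTime, endTime: Infinity, text }; // active until further notice
}

function closeCue(cue, endTime) {
  if (endTime < cue.startTime) throw new RangeError('end before start');
  cue.endTime = endTime; // in a UA this would be vttCue.endTime = endTime
  return cue;
}
```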

Rob: Answer is to set an identifier, then send the cue again with the same identifier

Nigel: Don't think anything is defined in WebVTT for modifying an existing cue

Rob: Chris did you investigate?

Chris: Yes, modifying cue start/end times to see what HTML TextTrackCue events get fired. Still need to file implementation bugs on that

Nigel: Difference between serialization format and the in-memory processing model. Imagine having an MP4 document with a cue with infinite end time, then another cue arrives with a non-infinite end time. Needs to be defined how to handle

Rob: Reuse a cue identifier?

Nigel: Not come up so far, as all cues have finite duration. What's the requirement for identifiers to be unique? In the MPEG context, you deliver lots of small WebVTT files. If you want to deliver in separate documents/files - a cue with no end time, then a modification to the cue end time, need to define uniqueness across all the files, e.g., across a DASH or HLS manifest
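The cross-file identifier problem above can be sketched with a composite key. The (fileId, cueId) pairing is an assumption for illustration; as Nigel notes, no spec currently defines uniqueness across files:

```javascript
// Sketch: if cue ids are only unique per WebVTT file, updates delivered in
// later files need a key that combines a file identifier with the cue id.
function makeCueStore() {
  const cues = new Map();
  return {
    upsert(fileId, cue) {
      const key = `${fileId}\u0000${cue.id}`;
      const existing = cues.get(key);
      if (existing) Object.assign(existing, cue); // update the stored cue in place
      else cues.set(key, { ...cue });
      return cues.get(key);
    },
    get(fileId, cueId) {
      return cues.get(`${fileId}\u0000${cueId}`);
    },
  };
}
```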

Chris: Changing any arbitrary cue information?

Nigel: Not clear what happens if the constraint is broken

Rob: Wondering about the purpose. Use case: a score in a sports game, put as a TextTrackCue on screen. We don't know how long it lasts, or what the updated text will be when the score changes. This could have a 'score' id whose cue content you want to update

Nigel: I'd expect the time of the event to be reflected in the cue time, and some other identifier. Separate events that happen. The end time on each of those when issued is Infinity. When a score change occurs, we go back and update the end time. The score itself is a metadata key; there can be multiple keys
… Could be a metadata identifier, not necessarily a VTTCue id. if you only have one cue with the score, you wouldn't be able to replay the changes to the event information (score)
… Metadata cues are part of the data model for WebVTT. The cue text itself identifies that the cue contains the score. So there'd need to be separate cue identifiers. And conceptually, you don't want to modify a single cue, rather stop presenting one and start presenting another

Rob: Could have a cue that's shown across the entire match, update the text in it

Nigel: But what happens if you rewind, it shows the wrong score?

Rob: WebVMT handles it by replaying all the cues from the beginning

Nigel: The TextTrackCue model handles it by having start/end times

Rob: The update happens at a known time, so it's handled in the same way as a cue, I think

Chris: emsg has a similar thing, rules for updating cues based on the id

Nigel: TextTrackCue already provides logic for stepping through time and updating what's valid. If a scheme allows update after a seek event, also do some other fetching and processing to modify cues to reflect the point in time

Rob: Questioning whether it's possible in the current spec

Nigel: Race condition. If you want to notice that the current playhead time has moved and reset the cues in the TextTrackCueList to reflect the state at that point in time, and the UA does its own updating ... will be hard to define ... application and UA both updating

Chris: Time marches on will activate/deactivate

Nigel: If the cue is somehow modified, the text has changed, and the current playback position is still in the start-end interval. There's no change so an onenter/onexit event shouldn't happen
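Nigel's point about activation can be sketched as follows. This is a simplification of the HTML "time marches on" steps, reduced to a single cue, for illustration only:

```javascript
// Sketch: onenter/onexit fire only when a cue crosses into or out of its
// [startTime, endTime) interval, not when its content changes while active.
function activationEvents(cue, wasActive, currentTime) {
  const isActive = currentTime >= cue.startTime && currentTime < cue.endTime;
  const events = [];
  if (isActive && !wasActive) events.push('enter');
  if (!isActive && wasActive) events.push('exit');
  // A text change with no activation change produces no events here, which
  // is why a separate "cue changed" notification is being discussed.
  return { isActive, events };
}
```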

Chris: We don't have an event to let the web app know that some arbitrary attribute of a cue was changed. Do we need that?

Rob: I don't see any wording to prevent changes to the cue content

Nigel: WebVTT spec doesn't talk about it at all

Chris: What does TTML define?

Nigel: There are a few things to consider. Nothing in a TTML spec talks about modifying cues. It does describe how to modify the presentation. How it typically works: in a live stream protocol, e.g., DASH with MP4, choose a segmentation period and send chunks of TTML document, one for each segment (a sample in the MPEG spec, often one sample per segment)
… If I want a subtitle to appear that's red, then change it to blue, I arrange it to have an end time, then add another p element with 'blue'
… All of the live streaming type applications of TTML don't rely on there being any state in the receiver. Similar to the WebVTT cue identifier, there's a requirement that no XML id is duplicated, but no such constraint across multiple documents - e.g., with EBU-TT Live, where there's a sequence of documents
… The specs for those define order of precedence. At most one document can be active. If you want some information to persist across sample boundaries or docs in your sequence, just duplicate them between the documents. This allows you to seek and not be dependent on having seen the previous documents

Rob: Could WebVTT do the same thing?

Nigel: There isn't a TTMLCue, so there isn't an identifier you can use to go back to, to modify the cue. A higher level issue: no defined mapping to TextTrackCue from TTML cues
… You can have an entry in TTML with no end time; it's scoped to its parent's end time. Could be active forever. Modifying that afterwards isn't possible
… With the sequence of documents, document 1 could have text with no end time, and document 2 does have a begin time. The end time of document 1 is the begin time of document 2. By definition that applies the end time to document 1. So if you have more than one document (the normal case), you'd have time bounds on the TTML documents in the sequence. When that happens, anything with an infinite end time doesn't have one anymore
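The sequence rule Nigel describes can be sketched as a small bounds calculation. Documents are assumed sorted by begin time; the field names are illustrative:

```javascript
// Sketch: each TTML document in a sequence is implicitly bounded by the
// begin time of the next one, so an "unbounded" entry in document N
// effectively ends when document N+1 begins.
function effectiveDocumentBounds(docs) {
  return docs.map((doc, i) => ({
    begin: doc.begin,
    end: i + 1 < docs.length ? docs[i + 1].begin : Infinity,
  }));
}
```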

Rob: TTML document is broken into segments that are self contained. So it's like putting lots of documents in a stream. Handle in the same way?

Nigel: Look at MPEG-4. Groups of WebVTT documents, with handling defined very differently from groups of TTML documents. What's being delivered is a bunch of cues, and the cues can have been delivered by multiple documents

Rob: If you treated the stream of video and text and metadata as a series of small files. There's one of those for each period, and they don't overlap. Could construct the VTT files in a similar way

Nigel: You could, but the model may not be built that way. There'd need to be a semantic defined for that model

Rob: Thinking about how a cue update would work, in a live stream you'd have an unbounded cue that ends at some point. You'd go back and end that cue, add a new one starting at that time (could also be unbounded). That would play back correctly

Chris: Time marches on shouldn't be affected in that case. Do we need a "cue changed" event for the WebVTT in MP4? Does it allow things other than the end time to be changed?

Nigel: The WebVTT in MP4 spec is the only place that describes how to package VTT files within one track (e.g., per language). Possibly HLS also? It imposes a requirement on cue ids across files, and behaviour if something changes. What should happen if other attributes change along with the cue end time?
… The wrapper has timing structure, not sure how it works in WebVTT. You can deliver cue payload data that goes outside sample boundaries

Action: Rob to create web platform tests for the TextTrackCue endTime change

Chris: TextTrackCue has no constructor, so tests could use VTTCue

Nigel: Sequence of WebVTT files with overlapping cues, cue id uniqueness across multiple files is not defined so far

Rob: If you can go back and modify a previous TextTrackCue object, what are the use cases? What parts of the data model need to change?

Nigel: The way we've defined this so far is by having an unknown end time. Alternatively, you could set it with a known end time, and replace it with another one, as often as you need. This would work with existing parsers. Then you could layer an extra tag: this cue is a time extension of another cue, to indicate to an implementation to update an existing cue end time. This could achieve the same result but without breaking existing stuff
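Nigel's layering idea could work along these lines: deliver bounded cues an old parser can show as-is, plus an extension marker a new parser uses to stitch them into one logical cue. The `extends` field is hypothetical, invented for this sketch:

```javascript
// Sketch: fold "time extension" cues into the cues they extend. Old parsers
// would just show each bounded cue; a new parser merges them.
function applyExtensions(cues) {
  const byId = new Map(cues.filter((c) => c.id).map((c) => [c.id, { ...c }]));
  for (const cue of cues) {
    if (!cue.extends) continue;
    const base = byId.get(cue.extends);
    if (base) base.endTime = Math.max(base.endTime, cue.endTime); // extend in time
  }
  // Keep only base cues; extension cues have been folded in.
  return [...byId.values()].filter((c) => !c.extends);
}
```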

Rob: But what if the end time update came too late?

Nigel: You'd have to allow for that. It's a delivery and packaging issue, e.g., segment duration

Nigel: Good to get feedback on WebVTT missing cue end time / update model. Need more detail on the semantic of modifying previously-sent cues

Rob: Could be done by id

Nigel: But ids could be scoped by file, so needs to be a combination of file + id. Rule for uniqueness across files may need a file identifier

Rob: To update a cue that's still active, there needs to be a way to update that cue. Could you repeat the whole cue: start time, end time, content, to replace the previous cue?
… Don't see a purpose to changing everything on the fly
… Maybe doesn't require an id, just match it on the content and start time
… Requirement is to be able to stop a cue, and be able to identify the cue uniquely

Next call

Chris: Next week, focusing on emsg and MSE, then in 4 weeks for general Media Timed Events

[adjourned]

Summary of action items

  1. Chris to schedule a follow up call to continue detail on emsg and MSE, 8am next Monday
  2. Rob to create web platform tests for the TextTrackCue endTime change
Minutes manually created (not a transcript), formatted by scribe.perl version 127 (Wed Dec 30 17:39:58 2020 UTC).

Diagnostics

Maybe present: Chris, Iraj, Nigel, Rob, Yasser