Media Timed Events / DataCue

19 April 2021


Andy Rosen, Chris Lorenzo, Chris Needham, Iraj Sodagar, Kaz Ashimura, Kazuhiro Hoya, Nigel Megitt, Rob Smith

Meeting minutes

TextTrackCue end time

Rob: It was proposed at TPAC Lyon, and has been steadily progressing. Things have accelerated thanks to Gary, who pointed out that the WPT needed looking at
… I've written tests, and proposed a change to WebVTT as well, as it inherits from HTML's TextTrackCue
… There are three pull requests ready to go
… Any final reviews, and then we're done

Chris: Any indications of implementer support?

Rob: WebKit interested, Eric has a use case where they want to use this, mentioned in the WebVTT pull request
… Discussion on whether there should be WebVTT syntax changes. The proposed change was minimal
… Need to validate, as NaN or -Infinity is not supported
… The VTTCue constructor is needed for WPT, because TextTrackCue doesn't have a constructor
… There's no support in WebVTT for unbounded cues, in WebVMT can omit the end time
… Simple example, in discussion with WebVTT, a 0-0 game score, we don't know when it will change, and we don't know when the end of the game is
… so 0-0 could be an unbounded cue. Should it be handled in WebVTT syntax? What's the best way to do that?
… Driven by use cases, propose syntax changes
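
The validation Rob mentions could look something like the following sketch: a hypothetical helper (not part of any spec) in which an unbounded cue uses positive Infinity as its end time, while NaN and -Infinity are rejected.

```javascript
// Hypothetical end-time validation mirroring the behaviour discussed:
// finite times and +Infinity (unbounded) are allowed; NaN and
// -Infinity are rejected.
function validateCueEndTime(endTime) {
  if (Number.isNaN(endTime)) {
    throw new TypeError("Cue end time must not be NaN");
  }
  if (endTime === Number.NEGATIVE_INFINITY) {
    throw new TypeError("Cue end time must not be -Infinity");
  }
  return endTime;
}

// A 0-0 game score cue whose end is unknown when it is created:
const scoreCue = {
  startTime: 0,
  endTime: validateCueEndTime(Number.POSITIVE_INFINITY),
  text: "Score: 0-0",
};
```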

Chris: Where to discuss, here or in TTWG?

Nigel: Not clear we have a validated set of use cases yet. Either as this WICG group or MEIG, I think any validated use cases we can use to test proposed solutions would be helpful
… Rob has put in detailed comments. Before changing the spec, which has some complexity, e.g., identifying cues to update, across multiple documents
… Finding a simple solution that meets those requirements, so requirements as input to the process, real world use cases would be helpful
… So MEIG could make it easier for people to provide input

Chris: Agree, we can do that under MEIG

<RobSmith> Unbounded cue use cases and WebVTT syntax issue https://github.com/w3c/webvtt/issues/496

Kaz: I'd like to remind you that MEIG used to produce use case documents, could use the use case template updated by the WoT IG

<kaz> use case template (MD)

<kaz> use case template (HTML)

Chris: Stakeholders?

Rob: Eric at Apple has an existing use case, I have WebVMT. Gary mentioned FOMS, cue update matching that Nigel mentioned - by start time and content
… Need to relax the time ordering, can only update the end time to bound an unbounded cue. May be out of order if matching by start time
… Issue #496 discusses this. For WebVMT, I've thought about the syntax, which could work well in WebVTT
… Keen to support in WebVMT as a recording format. Two proposals for cue matching: cue content, and cue identifiers
… If you use start+content to match cues, can do across different WebVTT files. But it's repetitive as you need to repeat the content, and add the end time
… Using cue identifiers, which ties with the WebKit use cases, where you have a cue to be updated periodically, but you don't know when
… There's a sequence of cues, content modified at each step, involves no repetition. WebVMT does this for the interpolation scheme
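
The two matching strategies Rob describes could be sketched as follows. All names here are hypothetical, and neither strategy is specified in WebVTT today; this only illustrates the trade-off between matching by start time plus content and matching by identifier.

```javascript
// Strategy 1: match by start time + cue content (repetitive, but
// works across different WebVTT files without shared identifiers).
function findCueByStartAndContent(cues, startTime, text) {
  return cues.find((c) => c.startTime === startTime && c.text === text);
}

// Strategy 2: match by cue identifier (no repetition of content,
// but relies on identifiers being stable).
function findCueById(cues, id) {
  return cues.find((c) => c.id === id);
}

// Bounding an unbounded cue once its real end time is known:
const cues = [
  { id: "score", startTime: 0, endTime: Infinity, text: "Score: 0-0" },
];

const match = findCueById(cues, "score");
// equivalently: findCueByStartAndContent(cues, 0, "Score: 0-0")
if (match) {
  match.endTime = 1234.5; // e.g. the moment the score changes
}
```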

Nigel: It's really important to understand what problem we're solving before designing solutions
… The solution design belongs in TTWG, the requirements input we need to get right, as input to that

Rob: I disagree, I have the use cases for WebVMT

Nigel: There are edge cases not fully explored, we need to understand other use cases, e.g., specifically for WebVTT
… Don't want to paint ourselves in a corner

Rob: There are 4 use cases in the issue, please give feedback

Nigel: Needs a validation exercise. People may not be aware that they need to give input

Rob: Let's publicise it, there's some interest already

Chris: What's the scope? WebVTT based formats, WebVMT, in-band emsg?

Rob: Cue format is shared between WebVTT and WebVMT, some additional bits in WebVTT, VTTRegion isn't needed for metadata. The way the cue works is identical, so the solution could be shared between WebVTT and WebVMT

Nigel: Specifically, we need to hear from people using WebVTT in a fragmented MP4 context, who may want to use unbounded cues
… That explores the edge case of IDs not needing to be unique across multiple documents, and not depending on cue start time and cue text being unique

<Andy> Hey Iraj - if the end time is very large, then would a packager such as Rufael's need to repeat the payload inside LL fragments?

Nigel: Would be useful to try to engage those

Andy: Hope Iraj can clarify. I remember a call between myself, Iraj, and Rufael - when subtitles are long and going into a packager making segments, you end up repeating the IMSC payload in each chunk
… If we start putting large end times in WebVTT payloads, issues with low latency?

Iraj: I suppose that if the target is low latency delivery, durations for either IMSC or WebVTT, it doesn't add latency for decoding, so you get the target latency. It would be part of the encoding characteristics

Andy: So we don't end up with the 0-0 score being repeated in every chunk

Iraj: The main concern for low latency is other components than subtitles

Nigel: You ask a good question, Andy. What happens if you miss the cue? If you also didn't see the beginning, how do you know it's supposed to be there?

Iraj: I thought that part of WebVTT design relies on repeating the cues

Nigel: That's true for fragmented TTML and IMSC design. There's a clear model for how it works
… The WebVTT model is different, not sure of the details

Iraj: Is that WebVTT in fMP4 or as a side file?

Nigel: I think both should be considered. The one where that needs a good answer is fragmented MP4
… Non-MP4 delivery needs to work also

Iraj: Isn't it possible to repeat WebVTT cues in fragmented MP4?
… I thought that was one of the key features of the design
… I was involved recently in event message tracks discussion, which was based on WebVTT. you repeat messages in case client missed earlier messages

Rob: Talking about fragments, are they fragments of a single WebVTT file, or are they a sequence of consecutive WebVTT files?

Iraj: In terms of fragmentation, assume you have a WebVTT file. When you package it in ISO BMFF fragments, you break down the file into different cues. You put each WebVTT cue in an envelope as samples
… Each sample can have multiple WebVTT cues, and samples that are empty
… Each WebVTT cue is the same as in the WebVTT file, which defines start and end and text and positional information
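
The packaging Iraj describes could be sketched roughly as below. This is illustrative only, with hypothetical names; the normative WebVTT-in-ISOBMFF packaging rules are in ISO/IEC 14496-30. The key point is that each fragment time range becomes a sample holding the cues active in that range, possibly none, so a long cue is effectively repeated across fragments.

```javascript
// Map WebVTT cues to per-fragment samples: a cue appears in every
// sample whose time range it overlaps; samples may be empty.
function packageCues(cues, fragmentDuration, totalDuration) {
  const samples = [];
  for (let t = 0; t < totalDuration; t += fragmentDuration) {
    const end = t + fragmentDuration;
    const active = cues.filter((c) => c.startTime < end && c.endTime > t);
    samples.push({ start: t, end, cues: active });
  }
  return samples;
}

// A 6-second cue packaged into hypothetical 4-second fragments:
const samples = packageCues(
  [{ startTime: 1, endTime: 7, text: "Score: 0-0" }],
  4,
  12
);
// The cue lands in samples [0,4) and [4,8); sample [8,12) is empty.
```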

Rob: Are the VTT cues treated as being in a single VTT file, or as separate VTT files because they're fragmented

Iraj: I assume when you get streams of VTT they come from the same file. I'd need to check. Not sure if you can multiplex cues from different files in the same fragment
… I thought the design is to break a file into fragments to package into ISOBMFF, that's the scope of the spec

Rob: Do you have an example you could add to the issue?

Iraj: I'd suggest asking David Singer, as one of the editors of that spec

Chris: DASH-IF events group looking at repeating emsg boxes

Iraj: That's right, not just across fragments but also across periods

Chris: So we could draft a use case doc based on the info in issue #496. Any volunteers?

Rob: Happy to help.

Nigel: Gary also could give input
… Suggest contacting Gary and Media WG chairs, ask for people to be involved

<RobSmith> Gary's comment on use cases: https://github.com/w3c/webvtt/pull/493#issuecomment-808411871

Chris: Also WAVE?

Nigel: Their interest may be more in IMSC, they may not have an issue

<RobSmith> Eric's WebKit use case comment: https://github.com/w3c/webvtt/pull/493#issuecomment-808429391

Chris: Support from Chromium to implement the TextTrackCue?

Rob: We talked about me doing it myself, also Firefox, as they're open source
… We could ask someone at Google about their interest in implementing
… I opened issues in the various browser bug trackers.

Chris: So next step is to reply to those issues saying we have spec PRs ready and WPTs

Rob: I can do that

DASH emsg in MSE

Chris: https://github.com/WICG/datacue/issues/26

Iraj: Depends on whether the emsg is v0 or v1. In v0 the event start time is signalled by an offset from the earliest presentation of the segment carrying the event
… With v0, you don't need any additional information. The timestampOffset is subtracted from the earliest presentation time of the segment, to give the location in the buffer
… To find where the event sits in the buffer, you just add the start time offset of the event. It's not the same buffer
… For v1, the reference time is the presentation start time, which MSE doesn't know. Similar to timestampOffset which is provided by the application, the UA can also provide a new offset, event timestampOffset, and that value can be subtracted to give the location of the start of the message
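
The offset arithmetic Iraj outlines might be sketched as follows. The field names and exact sign conventions here are assumptions for illustration, not the normative emsg processing rules: for v0 the event time is taken relative to the earliest presentation time of the carrying segment, while for v1 an application-supplied offset maps the presentation time onto the buffer timeline.

```javascript
// Hypothetical computation of an emsg event's position on the media
// timeline, in seconds.
function emsgPresentationTime(emsg, segmentEarliestTime, eventTimestampOffset) {
  if (emsg.version === 0) {
    // v0: delta relative to the segment that carries the event.
    return segmentEarliestTime + emsg.presentationTimeDelta / emsg.timescale;
  }
  // v1: relative to the presentation start, adjusted by the offset
  // the application provides (analogous to MSE's timestampOffset).
  return emsg.presentationTime / emsg.timescale - eventTimestampOffset;
}

// v0 event 2s into a segment whose earliest presentation time is 10s:
const t0 = emsgPresentationTime(
  { version: 0, presentationTimeDelta: 2000, timescale: 1000 }, 10, 0);

// v1 event at presentation time 30s with a 5s application offset:
const t1 = emsgPresentationTime(
  { version: 1, presentationTime: 30000, timescale: 1000 }, 0, 5);
```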

Chris: We need to write the processing rules

Iraj: Just came from a MPEG meeting. I'm writing a contribution with graphs that show how the timing works compared to the MSE, showing how all the offsets can be used
… It'll go to MPEG to be discussed

Chris: That's exactly what we need, to answer questions from the MSE spec editors about emsg integration


Minutes manually created (not a transcript), formatted by scribe.perl version 127 (Wed Dec 30 17:39:58 2020 UTC).