MEIG WebVTT Unbounded Cues

Meeting minutes

Use cases

Gary: Live captioning
… Overlap between live captioning and segmented VTT files

Chris: How is overlap handled with segmented delivery?

Gary: If you know the cue end time, you can just put that, and not repeat the cue in the next segment. That should generally work
… At FOMS (a long time ago) we talked about segmented VTT delivery. There wasn't the idea of unbounded cues then. If consecutive cues have matching start/end times and the same text, most caption displays won't remove and redisplay the caption. It will look like a continuous cue.
… So you could have the previous cue end at the segment boundary, then repeat the cue at the start of the next segment, with start time set to the end of the previous segment
… That's not ideal, and would require display clients to not blink the caption out in between
… At the time, we thought that should be a TTWG note that the client shouldn't do that in these cases
… Would be nice if there were a more explicit signal
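
Editor's illustration (not part of the discussion): a minimal Python sketch of the workaround Gary describes, assuming cues are plain records with millisecond times and a fixed segment duration. The names `Cue`, `split_at_boundaries` and `merge_contiguous` are hypothetical. The first function ends a cue at each segment boundary and repeats it in the next segment; the second shows how a client could rejoin touching pieces with identical text so the caption does not blink out in between.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cue:
    start_ms: int
    end_ms: int
    text: str


def split_at_boundaries(cue: Cue, segment_ms: int) -> List[Cue]:
    """Cut one cue into pieces that each fit inside a single segment."""
    pieces: List[Cue] = []
    start = cue.start_ms
    while start < cue.end_ms:
        # End of the segment that contains `start`.
        boundary = (start // segment_ms + 1) * segment_ms
        end = min(cue.end_ms, boundary)
        pieces.append(Cue(start, end, cue.text))
        start = end
    return pieces


def merge_contiguous(cues: List[Cue]) -> List[Cue]:
    """Rejoin pieces that touch and carry the same text."""
    merged: List[Cue] = []
    for cue in sorted(cues, key=lambda c: c.start_ms):
        last = merged[-1] if merged else None
        if last and last.text == cue.text and last.end_ms == cue.start_ms:
            # Same caption continuing across a boundary: extend it.
            merged[-1] = Cue(last.start_ms, cue.end_ms, cue.text)
        else:
            merged.append(cue)
    return merged
```

Matching on identical text and touching times is only a heuristic, which is why a more explicit signal would help.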

Chris: Any difference between captioning and chapter segments?

Gary: For live chapterisation, it would be the same. It ends at some point, then when you know the next chapter you can update the previous one
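
Editor's illustration (not part of the discussion): a small Python sketch of the live chapterisation case, assuming a chapter whose end is not yet known is stored with `end_ms = None`. This is a stand-in for an unbounded cue, not a statement of how WebVTT represents one. `Chapter` and `start_chapter` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chapter:
    start_ms: int
    title: str
    end_ms: Optional[int] = None  # None: end not yet known


def start_chapter(chapters: List[Chapter], start_ms: int, title: str) -> None:
    """Open a new chapter and close the previous one if it is still open."""
    if chapters and chapters[-1].end_ms is None:
        # The previous chapter ends where the next one begins.
        chapters[-1].end_ms = start_ms
    chapters.append(Chapter(start_ms, title))
```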

Chris: Last discussed in TTWG: https://www.w3.org/2021/05/13-tt-minutes.html

Chris: At that meeting, Pierre talked about live captioning where a caption may be updated.
… Is this a use case to be added to the document - updating existing captions?

Gary: Do you mean updating the text of the cue?

Chris: That seems to be the use case

Gary: That makes sense. Instead of making it into two cues, you could have a single cue that gets updated
… It would be useful if Nigel or Pierre were on this call. Not sure how other caption formats handle this
… Having an overview of that should help us, so we're not hitting the same issues with live

Chris: So specific scenarios are listed in the requirements section of the doc: Req1 is updating an existing cue
… Does this need to be more specific, or do 1a, b, c cover all cases?

Gary: I think so

Chris: Updating just the text or any other attributes?

Gary: VTTCue API allows anything to be updated, so could add that
… Is this requirements, or more of a wish-list? We may decide we want all or only some of these

Chris: [Describes Use cases vs Requirements]

Gary: Useful to capture the use cases and see how they translate into requirements.
… We could add a note to say that when we move to the spec we may not address all the requirements. Some could be done in user-land

Chris: Looking at use cases, perhaps VTT in fMP4 should move to be a requirement

Gary: Yes, it seems lower level. It's equivalent to segmented VTT. In fMP4 the last cue can be unbounded. I'm not completely clear on that though

Chris: In the TTWG meeting, Cyril talked about sync samples.

Gary: If you have segmented VTT, you can seek to the middle of the video, and you'd only load the segmented VTT from that point on.
… If you have unbounded cues you'd need to collate them across multiple segments. Otherwise the client needs to collate them

Chris: And that means merging duplicates or reconciling cues that have been updated between the segments

Gary: Yes
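
Editor's illustration (not part of the discussion): a minimal Python sketch of client-side collation across segments. It assumes each cue carries a stable identifier that repeats across segments and that the most recently delivered copy is authoritative. Neither rule was agreed in the meeting. `Cue` and `collate` are hypothetical names, and `end_ms = None` stands in for a cue whose end is not yet known.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Cue:
    id: str
    start_ms: int
    end_ms: Optional[int]  # None: end not yet known
    text: str


def collate(segments: Iterable[List[Cue]]) -> List[Cue]:
    """Merge duplicate cues and keep the latest version of each updated cue."""
    latest: Dict[str, Cue] = {}
    for segment in segments:  # segments in delivery order
        for cue in segment:
            # A later copy with the same id replaces the earlier one.
            latest[cue.id] = cue
    return sorted(latest.values(), key=lambda c: c.start_ms)
```

For example, a cue delivered as open-ended in one segment and repeated with updated text and a known end time in a later segment collapses to the single later version.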

Chris: That's kind of in use case 2a, b. Wondering whether to put "live captioning" as the use case, and move those to the requirements list

Gary: This would potentially apply to VoD segmented too. For that you already know the timing, so shouldn't be an issue, so calling it "live captioning" should be good

Chris: Nigel said: "it may be worth understanding and describing a working model for how to deliver live captions in a VTT context."

Gary: Known end times, and segment accordingly. That's because WebVTT gives you that, so can't do anything else
… Could be part of the reason why people ship 608/708 to the client

Zack: With 608/708, one of the main reasons we send that for live streams is the state machine that would need to run at the encoder to output the cues. It's a non-trivial lift. Once we have plans to accept TTML-based or other captioning formats, we'd then normalise everything to WebVTT at that point.

Chris: Does this lead to a requirement for unbounded cues?

Zack: We haven't dived deep enough to know yet. If you're trying to capture whole sentences in 608 as single cues, then unbounded is more likely what you want, then update the unbounded cue
… Otherwise you might put out bounded captions based on live segment size

Gary: For that, would you have the danger of cutting a sentence in the middle, then starting a new cue in the middle of the sentence?

Zack: You wouldn't have time, performing updates across

Gary: There's an unofficial draft from 2014 of how to convert 608/708 to WebVTT, it could potentially help

<zacharycava> https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/608toVTT.html

Gary: It's out of date with respect to new WebVTT features, e.g., first version of regions
… If there's ways to ensure you don't get cues that are really short

Chris: I think we have a set of well-known use cases, which should be reasonably easy to define, and derive a set of requirements from

Gary: And for each requirement, state which use case it relates to

Chris: To conclude, we need to update the document to capture what we've discussed

Gary: I can help, yes

Chris: The doc was an initial idea, happy for things to move around
… Once the doc is updated, we can get Nigel and Pierre's input on live TTML / IMSC captioning

Gary: Could ask at the next TTWG meeting
… Once we have the doc more fleshed out, then we can decide if we want another call
… We can use an upcoming MTE DataCue call to get an update too

Chris: Also make sure we've captured the WebVMT requirement properly. I can follow up with Rob on that

[adjourned]

– DRAFT –
15 June 2021
