Meeting minutes
Use cases
Gary: Live captioning
… Overlap between live captioning and segmented VTT files
Chris: How is overlap handled with segmented delivery?
Gary: If you know the cue end time, you can just put that, and not repeat the cue in the next segment. That should generally work
… At FOMS (a long time ago) we talked about segmented VTT delivery. There wasn't the idea of unbounded cues then. If you have the start/end time and text, most caption displays won't remove and redisplay the caption. It will look like a continuous cue.
… So you could have the previous cue end at the segment boundary, then repeat the cue at the start of the next segment, with its start time set to the end of the previous segment
… That's not ideal, and would require display clients to not blink the caption out in between
… At the time, we thought there should be a TTWG note saying that the client shouldn't do that in these cases
… Would be nice if there were a more explicit signal
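The split-and-repeat approach described above can be sketched as follows. This is a minimal illustration, not something presented in the meeting: the `Cue` type, the function names, and the use of integer milliseconds are all assumptions made for the example. It cuts a cue at segment boundaries on the packaging side, then shows how a client could rejoin abutting pieces with identical text so the caption does not blink.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cue:
    """Hypothetical cue model; times are integer milliseconds."""
    start_ms: int
    end_ms: int
    text: str


def split_at_segment_boundaries(cue: Cue, segment_ms: int) -> List[Cue]:
    """Cut one cue into pieces so that no piece crosses a segment boundary."""
    pieces: List[Cue] = []
    start = cue.start_ms
    while start < cue.end_ms:
        # End of the segment that contains `start`.
        boundary = (start // segment_ms + 1) * segment_ms
        end = min(cue.end_ms, boundary)
        pieces.append(Cue(start, end, cue.text))
        start = end
    return pieces


def merge_contiguous(cues: List[Cue]) -> List[Cue]:
    """Client side: rejoin pieces that abut exactly and carry identical text."""
    merged: List[Cue] = []
    for cue in sorted(cues, key=lambda c: c.start_ms):
        last = merged[-1] if merged else None
        if last and last.end_ms == cue.start_ms and last.text == cue.text:
            # Extend the previous piece instead of starting a new cue.
            merged[-1] = Cue(last.start_ms, cue.end_ms, cue.text)
        else:
            merged.append(cue)
    return merged
```

Matching on identical text and exactly abutting times is a heuristic; it cannot distinguish a deliberately repeated caption from a split one, which is one reason a more explicit signal would help.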
Chris: Any difference between captioning and chapter segments?
Gary: For live chapterisation, it would be the same. It ends at some point, then when you know the next chapter you can update the previous one
Chris: Last discussed in TTWG: https://
Chris: At that meeting, Pierre talked about live captioning where a caption may be updated.
… Is this a use case to be added to the document - updating existing captions?
Gary: Do you mean updating the text of the cue?
Chris: That seems to be the use case
Gary: That makes sense. Instead of making it into two cues, you could have a single cue that gets updated
… It would be useful if Nigel or Pierre were on this call. Not sure how other caption formats handle this
… Having an overview of that should help us, so we're not hitting the same issues with live
Chris: So the specific scenarios I listed in the requirements section of the doc: Req1 is updating an existing cue
… Does this need to be more specific, or do 1a, b, c cover all cases?
Gary: I think so
Chris: Updating just the text or any other attributes?
Gary: VTTCue API allows anything to be updated, so could add that
… Is this requirements, or more of a wish-list? We may decide we want all or only some of these
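On the point about updating an existing cue rather than emitting a second one, a rough sketch of the idea follows. It is illustrative only: the `Cue` model, the `update_cue` helper, the use of a cue identifier as the key, and `math.inf` as a stand-in for an unbounded end time are all assumptions, not part of any API discussed here.

```python
import math
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Cue:
    """Hypothetical cue model; end_ms of math.inf stands for 'unbounded'."""
    id: str
    start_ms: int
    end_ms: float
    text: str


def update_cue(cues: Dict[str, Cue], cue_id: str, **changes) -> Cue:
    """Replace selected attributes of an existing cue, keeping its identity."""
    if cue_id not in cues:
        raise KeyError(f"no cue with id {cue_id!r}")
    updated = replace(cues[cue_id], **changes)
    cues[cue_id] = updated
    return updated


# Example: a live caption is first emitted unbounded, then corrected,
# then closed once its end time is known.
live_cues = {"c1": Cue("c1", 0, math.inf, "Helo wrld")}
update_cue(live_cues, "c1", text="Hello world")
update_cue(live_cues, "c1", end_ms=2500)
```

Allowing any attribute to change mirrors the observation that the cue API lets anything be updated; a requirement could equally restrict updates to text and end time only.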
Chris: [Describes Use cases vs Requirements]
Gary: Useful to capture the use cases and see how they translate into requirements.
… We could add a note to say that when we move to the spec we may not address all the requirements. Some could be done in user-land
Chris: Looking at use cases, perhaps VTT in fMP4 should move to be a requirement
Gary: Yes, it seems lower level. It's equivalent to segmented VTT. In fMP4 the last cue can be unbounded. I'm not completely clear on that though
Chris: In the TTWG meeting, Cyril talked about sync samples.
Gary: If you have segmented VTT, you can seek to the middle of the video, and you'd only load the segmented VTT from that point on.
… If you have unbounded cues you'd need to collate them across multiple segments. Otherwise the client needs to collate them
Chris: And that means merging duplicates or reconciling cues that have been updated between the segments
Gary: Yes
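The client-side collation just discussed (merging duplicates and reconciling cues updated between segments) might look roughly like this. It is a sketch under stated assumptions: cues are assumed to carry a stable identifier, a later segment's copy of a cue is assumed to supersede an earlier one, and the names `Cue` and `collate_segments` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Cue:
    """Hypothetical cue model; end_ms of math.inf stands for 'unbounded'."""
    id: str
    start_ms: int
    end_ms: float
    text: str


def collate_segments(segments: Iterable[List[Cue]]) -> List[Cue]:
    """Merge cues from segments given in delivery order.

    A cue repeated in a later segment replaces the earlier copy, so both
    exact duplicates and updated versions collapse to a single cue.
    """
    by_id: Dict[str, Cue] = {}
    for segment in segments:
        for cue in segment:
            by_id[cue.id] = cue  # later segment wins
    return sorted(by_id.values(), key=lambda c: (c.start_ms, c.id))
```

A client that seeks into the middle of the stream only sees the segments from that point on, so this works only if any cue still active at the seek point is repeated in the first segment loaded.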
Chris: That's kind of in use case 2a, b. Wondering whether to put "live captioning" as the use case, and move those to the requirements list
Gary: This would potentially apply to VoD segmented too. For that you already know the timing, so shouldn't be an issue, so calling it "live captioning" should be good
Chris: Nigel said: "it may be worth understanding and describing a working model for how to deliver live captions in a VTT context."
Gary: Known end times, and segment accordingly. That's because WebVTT gives you that, so can't do anything else
… Could be part of the reason why people ship 608/708 to the client
Zack: With 608/708, one of the main reasons we send that for live streams is the state machine needed at the encoder to output the cues. It's a non-trivial lift, not easy. Once we have plans to accept TTML-based or other captioning formats, we'd normalise everything to WebVTT at that point.
Chris: Does this lead to a requirement for unbounded cues?
Zack: We haven't dived deep enough to know yet. If you're trying to capture whole sentences in 608 as single cues, then unbounded is more likely what you want, then update the unbounded cue
… Otherwise you might put out bounded captions based on live segment size
Gary: For that, would you have the danger of cutting a sentence in the middle, then starting a new cue in the middle of the sentence?
Zack: You wouldn't have time, performing updates across
Gary: There's an unofficial draft from 2014 of how to convert 608/708 to WebVTT, it could potentially help
<zacharycava> https://
Gary: It's out of date with respect to new WebVTT features, e.g., first version of regions
… If there are ways to ensure you don't get cues that are really short
Chris: I think we have a set of well-known use cases, which should be reasonably easy to define and derive a set of requirements from
Gary: And for each requirement, state which use case it relates to
Chris: To conclude, we need to update the document to capture what we've discussed
Gary: I can help, yes
Chris: The doc was an initial idea, happy for things to move around
… Once the doc is updated, we can get Nigel and Pierre's input on live TTML / IMSC captioning
Gary: Could ask at next TTWG meetings
… Once we have the doc more fleshed out, then we can decide if we want another call
… We can use an upcoming MTE DataCue call to get an update too
Chris: Also make sure we've captured the WebVMT requirement properly. I can follow up with Rob on that
[adjourned]