W3C

– DRAFT –
Media Timed Events: WebVTT unbounded cues

11 October 2021

Attendees

Present
Chris_Lorenzo, Chris_Needham, Larry_Zhao, Rob_Smith, Xabier_Rodriguez_Calvar, Zachary_Cava
Regrets
-
Chair
Chris_Needham
Scribe
cpn

Meeting minutes

Agenda

ChrisN: Covered unbounded cues in segmented media last time
… Any features or syntax changes in WebVMT that should be added to WebVTT?
… Discuss whether we need to identify cues across WebVTT documents, if so how, and where to specify?
… Anything more generally on DataCue?

WebVMT features into WebVTT

Rob: This ties to what's happening with DataCue. One feature worth porting across is aligning metadata with VTT and DataCue
… Metadata is JSON and it's amorphous, with no formatting. A small amount of restriction on that would make it more useful or interoperable

ChrisN: In terms of other features, not sure what Gary had in mind

Zack: How far along is WebVMT?

Rob: Not on standards track. Dash-cam market has use cases that would benefit from it, which currently uses proprietary formats
… A way to export the data allows continued use of proprietary formats, while users can export to a common format to share data

ChrisN: Let's look at the DataCue API relation.

Rob: DataCue has a 'type' field and content in a 'value' field, which can be anything
… Too open-ended. In WebVMT, it's just a WebVTT-style document and the cue content is only JSON
… The simple addition of the 'type' such as a URN allows you to recognise what it is
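
A minimal sketch (TypeScript) of the 'type' plus 'value' shape being discussed, assuming the constructor form proposed in the WICG explainer; the URN and sensor payload below are purely illustrative, not defined anywhere:

  // Sketch only: DataCue is not a standard API yet; this declaration
  // mirrors the shape discussed in the WICG explainer.
  declare class DataCue {
    constructor(startTime: number, endTime: number, value: unknown, type: string);
    readonly startTime: number;
    readonly endTime: number;
    readonly value: unknown;
    readonly type: string;
  }

  // Illustrative sensor payload; the URN is hypothetical, chosen only to
  // show how a 'type' field lets consumers recognise the data.
  const sensorCue = new DataCue(
    10.0,       // start time in seconds
    Infinity,   // unbounded end time, per the unbounded cues proposal
    { lat: 51.5074, lng: -0.1278, speedMps: 12.4 },
    'urn:x-example:dashcam.sensor'
  );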

ChrisN: Examples in the explainer: https://github.com/WICG/datacue/blob/main/explainer.md such as org.id3
… What's the scope of the type field? Are they globally defined, or do we have types specific to HLS defined in one place, WebVMT types defined elsewhere?

Rob: Thinking of an IANA type registration, to stop different uses stepping on each other's toes

Zack: The HLS spec includes the date range type. Prior to that, Disney and Hulu implemented their own, and used com.hulu
… The same functionality as the Apple-defined type, but non-conflicting
… When things get more standardised and more adopted, the ability to change the name is helpful
… Having a URN makes it flexible

Rob: When to do this? In the dash-cam market there'll be different variants, so having an open format is beneficial. There may be commonality, which may lead to something like mime types being set up

ChrisN: Is it enough to say this cue is WebVMT data, or do you need to be more specific?

Rob: We don't really have WebVMT data. Started with location, now can be anything. Speed, direction, acceleration. With drones, there's altitude, camera orientation, sensors
… So the data is really sensor data

ChrisN: Thinking of use cases. In HLS, cues are surfaced by the browser to the web app
… This is the in-band timed metadata cues case, same with emsg boxes in DASH
… The other case is where the web app creates cues after reading a WebVMT document
… Where are the interop points?
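
A minimal sketch of the first interop point, where the user agent surfaces in-band timed metadata as cues on a 'metadata' text track (e.g. the HLS case mentioned above); the track handling uses standard TextTrack APIs, but how the payload appears on each cue is implementation-specific:

  // Listen for browser-created metadata tracks and react to their cues.
  const video = document.querySelector('video')!;

  video.textTracks.addEventListener('addtrack', (event: TrackEvent) => {
    const track = event.track as TextTrack | null;
    if (!track || track.kind !== 'metadata') return;

    track.mode = 'hidden'; // receive cues without rendering them
    track.addEventListener('cuechange', () => {
      const active = track.activeCues;
      if (!active) return;
      for (let i = 0; i < active.length; i++) {
        // With DataCue the app could switch on cue.type here instead of
        // sniffing the payload shape.
        console.log('metadata cue active at', active[i].startTime, active[i]);
      }
    });
  });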

Rob: WebVMT was designed for devices without connectivity, autonomous. Recording data such as temperature with minimum overhead
… Being able to record in a common format allows others to recognise what it is without converting

ChrisN: We can make setting up a registry for data types a part of the proposal
… Although as a developer I should be able to put any arbitrary data in a DataCue without registering

Zack: SCTE-35 has schemes in the spec, and people use it without registering their scheme. If people use URNs, you'd rarely get conflicts
… Having a spec with optional registration encourages private adoption, proprietary uses. If they later need external interop, they'll be able to register at that time

ChrisN: Thinking about DataCue more generally - we've focused a lot on defining the emsg mapping, without really resolving it
… We could perhaps usefully split DataCue into two parts: the first is the API, as it is in WebKit; the second is the mapping to emsg
… If there's still interest in emsg, I'd like to come back to that
… Unclear on the extent of interop there

ChrisN: If we end up with app-level emsg parsing, we'd still need a DataCue to put the data on the timeline. Current solution is using VTTCue
… Still need a way to distinguish VTT caption cues from metadata cues

Rob: Can use different TextTracks, one for captions, one for metadata
… If you have a metadata track, all the JSON objects would have top-level 'type' and 'data' fields that would allow DataCues to be created
… It would be easy to inspect the list of cues to extract the ones you're interested in
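
A minimal sketch of the VTTCue workaround being discussed, assuming the JSON envelope with top-level 'type' and 'data' fields described above; the URN and chapter payload are hypothetical:

  // Script-created metadata track carrying JSON payloads in VTTCue text.
  const video = document.querySelector('video')!;
  const metadataTrack = video.addTextTrack('metadata', 'Timed metadata');

  const envelope = { type: 'urn:x-example:chapter', data: { chapter: 1 } };
  metadataTrack.addCue(new VTTCue(0, 30, JSON.stringify(envelope)));

  // Inspect the cue list and extract only the cues of interest.
  function cuesOfType(track: TextTrack, type: string): VTTCue[] {
    const result: VTTCue[] = [];
    const cues = track.cues;
    if (!cues) return result;
    for (let i = 0; i < cues.length; i++) {
      const cue = cues[i];
      if (!(cue instanceof VTTCue)) continue;
      try {
        if (JSON.parse(cue.text).type === type) result.push(cue);
      } catch {
        // not a JSON metadata cue; ignore
      }
    }
    return result;
  }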

ChrisN: So the presence of the 'type' field in itself is helpful, which isn't available with VTTCue
… We can write this into the explainer

Rob: With the current VTTCue workaround, it's a more complex object than a DataCue, as it has presentation stuff we don't need

ChrisN: Mapping of emsg box type information is complex, depends on the DASH-IF event interop work
… Need to summarise the open questions

Identifying cues across WebVTT documents

ChrisN: Context is segmented delivery. Each segment would have its VTT document specific to that section
… Do we need cue identifiers, or identifiers to higher level concepts?
… eg: { chapter: 1 } with consistent cue ids across VTT documents
… or { id: 'some-id', chapter: 1 }

Rob: Do you need an identifier at the type level?

Zack: The answer could be driven by the need for de-duplication. I'd expect user agents to do the collapsing
… Up-levelling anything needed to enable de-duplication is important. So pulling the id and type out, to bring them together
… In DASH, you have event id, start time, payload. The id allows de-duplication across period boundaries without parsing the data itself
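
A minimal sketch of the de-duplication described here, keyed on an id pulled up to the same level as the type; the MetadataCue shape is an assumption for illustration:

  // Hypothetical shape for an up-levelled metadata cue.
  interface MetadataCue {
    id: string;        // e.g. 'some-id', stable across segment documents
    type: string;      // e.g. 'urn:x-example:chapter'
    startTime: number;
    payload: unknown;  // e.g. { chapter: 1 }, never parsed for de-duplication
  }

  const seen = new Set<string>();

  function isFirstSighting(cue: MetadataCue): boolean {
    // Scope the id to the type so independent schemes cannot collide.
    const key = `${cue.type}#${cue.id}`;
    if (seen.has(key)) return false; // already delivered from an earlier segment
    seen.add(key);
    return true;
  }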

Rob: So if the id is at the same level as type, do we get the same issue - cue ids should be unique within a document, but not between documents?

Zack: From a media perspective, I'd expect it to be unique within the track it's operating in. Could be multiple documents making up the track
… It's not an unheard-of property. It happens for audio and video, where track descriptions are shared across segments

ChrisN: Next steps: Follow up with Gary on WebVTT cue ids, revisit emsg mapping proposal, talk with DASH-IF about emsg cue id scope

[adjourned]

Minutes manually created (not a transcript), formatted by scribe.perl version 136 (Thu May 27 13:50:24 2021 UTC).
