Media Timed Events TF / WICG DataCue

17 Jun 2019


Chris_Needham, Kazuyuki_Ashimura, Steve_Morris, Rob_Smith, Vladimir_Levantovsky, So_Vang, Nigel_Megitt


<scribe> scribenick: kaz



Chris: There's a lot to share today and get your input on.
... We have some input from DASH-IF to review.
... I would like to present an initial DataCue API proposal I've been working on.
... Review the DataCue explainer and start to capture a list of open questions for the API.
... Finally, next steps.

<Zakim> nigel, you wanted to ask if looking at design proposals is in scope of this IG?

Nigel: Can we discuss design proposals in an IG call?

Chris: I should have said at the start, it was mentioned in the agenda email, that we're running this call under the WICG terms.
... We're pretty much settled with the requirements, so I would like to move forward to design proposal.

Nigel: So anybody who would like to make contribution should join the WICG.

Chris: Yes
... AOB?

Rob: One item, I'll be presenting at OGC

Action items


Chris: A few things from the last call
... I added detail on events supported in HLS to the IG Note. There's an open PR

PR 47

Chris: The next item was around terminology, some disambiguation
... This is done, there's a PR open, some discussion on the changes.

PR 49

Chris: Nigel, there was one question from you regarding payload type signalling.
... I think the document captures that, unless I've missed something? Please let me know.
... I will merge in the next few days, if there's no more feedback.

Nigel: I'll check

Chris: Next, Kaz to set up GH repo so that Chris can add changes.
... And Nigel to draft text to add to section 3.4 use cases around high rate captions.

Nigel: I've not done that yet.

Chris: And Rob to raise an issue in the DataCue repo for unknown / to-be-determined cue end times.

Rob: I raised 2 issues, for cue end time at end of media, and for updating cue times.
... Updating cue times may be a non-goal, if it's already covered by the existing API.



Chris: I've not done much work on the explainer document recently, focused on producing an initial API proposal

DASH-IF client processing model

<cpn> https://github.com/Dash-Industry-Forum/Events/blob/ba8ab99552c1e17ed4b05c5f440149757091f56e/Output/event.pdf

Chris: (Explains diagram in Figure 1)
... Some of the events are DASH-specific, so go to the DASH Client Control module, rather than the Application.
... Architecture here is somewhat generic. In HTML we have some of these components, e.g., the DASH Access API could be thought of in part as MSE.
... Different architecture for web-based players, where handling the MPD is done at the application level (or by the UA for HbbTV).
... There are two dispatch modes, called on-receive and on-start. In the IG's media timed events document, we call them type 1 dispatch and type 2 dispatch, based on our early conversations with DASH-IF.
... (Figure 2) Here, the media timeline would be what's exposed by an HTML media element.
... (Figure 3) There's some detail on the structure of emsg events: timestamps, schema ID, message payload as a byte array.
... And then discussion around triggering timing.
... This document is quite useful, I like the diagrams that capture the different timing models visually.
... (Figure 6). MPD events timing model with on-receive and on-start triggering.
... The document doesn't present IDL for an API.
... (Figure 12) There's a mapping of how these values are obtained for MPD and emsg.
... This is useful input, we have two groups working in parallel, some communication to make sure we're reasonably aligned.
... I think what I'll present next is compatible with this.
... Any feedback or questions?
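
A minimal sketch of the emsg structure mentioned for Figure 3 (timestamps, scheme ID, byte-array payload). The interface and field names are illustrative assumptions, loosely modelled on the emsg box fields; they were not presented on the call and are not part of any proposed API.

```typescript
// Hypothetical shape for a parsed emsg event. Field names are illustrative.
interface SketchEmsgEvent {
  schemeIdUri: string;      // identifies the event scheme
  value: string;            // distinguishes event types within the scheme
  timescale: number;        // ticks per second
  presentationTime: number; // start time, in timescale ticks
  eventDuration: number;    // duration, in timescale ticks
  id: number;               // identifies this event instance
  messageData: Uint8Array;  // opaque message payload
}

// Convert the tick-based times to seconds on the media timeline.
function emsgTimesInSeconds(e: SketchEmsgEvent): { start: number; end: number } {
  const start = e.presentationTime / e.timescale;
  const end = start + e.eventDuration / e.timescale;
  return { start, end };
}

const sampleEmsg: SketchEmsgEvent = {
  schemeIdUri: "urn:example:events", // made-up scheme for illustration
  value: "1",
  timescale: 1000,
  presentationTime: 5000,
  eventDuration: 2000,
  id: 1,
  messageData: new Uint8Array([104, 105]),
};

const sampleTimes = emsgTimesInSeconds(sampleEmsg);
```

With a timescale of 1000, the sample event above starts at 5 seconds and ends at 7 seconds.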

Nigel: One question about synchronization. Does this document describe the exact mechanism?
... On the web, we have the event loop, so not sure how this fits with the model we have in HTML with TextTrackCue.

Chris: It's a good question, but I haven't read this in enough detail to know.
Nigel: OK, let's come back to that.

DataCue API design


Chris: My proposal is that we use the existing TextTrack mechanism as much as possible, rather than defining a new set of APIs that are timed metadata specific.
... TextTrack gives us a lot of the capabilities we need, and we can extend it to add the parts that are missing.
... One of the things missing is the on-receive model; we only have on-start.
... I have a proposal that simplifies adding event handlers to cues.
... I read the TAG's API design guidelines about feature detection.
... I also based the API proposal on the WebKit DataCue and aspects of HbbTV's emsg support.
... I'm also proposing no change to the track element, i.e., not proposing you could author a track element that references some kind of timed metadata track, e.g., a WebVMT document, and expect the UA to do something with it natively. I'd expect the application to handle that.
... That would give the flexibility in terms of the data formats that we use, because we can handle at the application level,
... rather than expecting the UA to handle every different kind of timed metadata format, or attempting to standardise on a particular format.

Rob: I would agree with that. The existing API includes a way of allowing metadata, rather than prescribing it, so I think that's sufficient.

Chris: Thanks. I looked at the HbbTV API, this has a native DASH player.
... It says that there'll be a TextTrack for each event type signalled in the MPD.
... For emsg, it will provide a TextTrack for each event type in the media representation currently being played.
... For any changes to the set of event streams, e.g., on switching representations or on MPD update, there are addtrack / removetrack events.
... My proposal is different to that. Our requirements analysis said events should only be surfaced on application request, so I'm proposing an opt-in model.
... It would be an application responsibility to know what kind of events are present.
... There isn't a discovery mechanism to know what kinds of events are present in the content.
... I think that's consistent with the requirements we have, but if there is a need for discovery, we'd have to go back to our requirements to capture that.

Nigel: How would the application know? Would this be a track element inside a media element, with an identifier that tells the UA to expect a certain kind of event?

Chris: Yes, exactly. The application would have to know enough about the content to know what kinds of events would be present.
... (Shows a big diagram starting with "HTMLMediaElement" on the upper/left side).
... I mapped out the TextTrack API. Changes to support DataCue and in-band events are shown in bold text.
... First question was: How would we let an application register its interest in receiving some kind of in-band event?
... I looked at the existing addTextTrack method, takes kind, and optional label, language parameters.
... That in itself doesn't allow us to identify an in-band event. Looking at the TextTrack interface, it has an inBandMetadataTrackDispatchType, which seems to be the existing identifier.

Nigel: Is it not the id?

Chris: That's the unique identifier for the text track, it doesn't tell you anything about the content of the text track.
... My first idea was overloading addTextTrack and adding a TextTrackConfig dictionary, where you can set the inBandMetadataTrackDispatchType.
... Seemed OK, and compatible with the existing API, UA can check the received parameter type to see how to handle.
... Then I remembered TAG guidance on feature detection. The only way you could detect this is to call it and see if it works or throws an error.
... A different way is to add a new method, e.g., addInBandMetadataTrack.
... TextTrackList doesn't change. In TextTrack, I've added oncuereceived, which is intended to support the on-receive dispatch mode that DASH-IF are interested in.
... This would be called when a timed metadata cue is extracted from the media content, as opposed to when its start time is reached on the media timeline.
... I added oncueenter and oncueexit events for the on-start dispatch mode.
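
The feature-detection point can be sketched as follows. This assumes the new-method option: addInBandMetadataTrack is only the name floated on the call, not a shipping API, and the two objects below are plain stand-ins for a media element rather than DOM objects. A separate method can be detected by checking for its presence, whereas an overloaded addTextTrack could only be probed by calling it.

```typescript
// Returns true when `target` (or its prototype chain) exposes a callable
// member with the given name.
function hasMethod(target: object, name: string): boolean {
  return (
    name in target &&
    typeof (target as Record<string, unknown>)[name] === "function"
  );
}

// Stand-in for a media element without the proposed method.
const legacyMedia = {
  addTextTrack: (kind: string) => ({ kind }),
};

// Stand-in for a media element with the hypothetical new method.
const proposedMedia = {
  addTextTrack: (kind: string) => ({ kind }),
  addInBandMetadataTrack: (dispatchType: string) => ({
    kind: "metadata",
    inBandMetadataTrackDispatchType: dispatchType,
  }),
};

const legacySupported = hasMethod(legacyMedia, "addInBandMetadataTrack");
const proposedSupported = hasMethod(proposedMedia, "addInBandMetadataTrack");
```

In a browser, the same check could be run against a video element instance before deciding whether to fall back to application-level parsing.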

Nigel: Isn't duplicating the enter and exit handlers at another level in the object hierarchy against the current design?

Chris: One reason for this is so that you don't have to inspect each individual cue to attach handlers.
... If you're creating cues in the web application, you can do that, but there's no opportunity to do that for cues generated by the UA.
... The only way you can act on those cues is by using oncuechange, which leads to the sync problem because you have to inspect the activeCues list, which may not reflect what's currently active.
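
As an illustration of the workaround described here, an application using only oncuechange has to compare successive snapshots of the active cue list to work out which cues entered or exited. The sketch below uses a plain stand-in type (ActiveCueLike is an assumed name, not a DOM interface) and assumes cues carry unique ids.

```typescript
// Minimal stand-in for a cue; real TextTrackCue objects carry more state.
interface ActiveCueLike {
  id: string;
  startTime: number;
  endTime: number;
}

// Compare the previous and current active sets to find what changed.
function diffActiveCues(
  previous: ActiveCueLike[],
  current: ActiveCueLike[],
): { entered: ActiveCueLike[]; exited: ActiveCueLike[] } {
  const previousIds = new Set(previous.map((c) => c.id));
  const currentIds = new Set(current.map((c) => c.id));
  return {
    entered: current.filter((c) => !previousIds.has(c.id)),
    exited: previous.filter((c) => !currentIds.has(c.id)),
  };
}

const cueA: ActiveCueLike = { id: "a", startTime: 0, endTime: 4 };
const cueB: ActiveCueLike = { id: "b", startTime: 2, endTime: 8 };
const cueC: ActiveCueLike = { id: "c", startTime: 5, endTime: 9 };

// Snapshot at around t=3 versus snapshot at around t=6.
const cueDiff = diffActiveCues([cueA, cueB], [cueB, cueC]);
```

A cue that becomes active and inactive again between two inspections appears in neither snapshot, so a diff like this cannot report it.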

Nigel: Why not have an oncueadded or even an ontexttrackchanged event handler, so you can iterate through the TextTrackCues that have been added or removed and add the handlers to them?
... The problem you started with is different to the solution you've ended up with.

Chris: That's interesting, you could actually solve this with oncuereceived, because that gives you the cue when it's parsed from the content, and you wouldn't need these extra oncueenter / oncueexit handlers.

Nigel: That feels cleaner.
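
A rough simulation of this idea, assuming an oncuereceived handler that passes the cue to the application. SketchCue and SketchTrack are illustrative stand-ins, not the DOM TextTrack or TextTrackCue interfaces, and the timeline handling is simplified (a cue skipped over entirely gets no enter or exit call here).

```typescript
// Stand-in for a cue with per-cue enter/exit handlers.
class SketchCue {
  readonly startTime: number;
  readonly endTime: number;
  readonly value: unknown;
  onenter: (() => void) | null = null;
  onexit: (() => void) | null = null;

  constructor(startTime: number, endTime: number, value: unknown) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.value = value;
  }
}

class SketchTrack {
  // Hypothetical handler, named after the proposal discussed on the call.
  oncuereceived: ((cue: SketchCue) => void) | null = null;
  private cues: SketchCue[] = [];
  private active = new Set<SketchCue>();

  // Called when the (simulated) UA parses a cue out of the media.
  receive(cue: SketchCue): void {
    this.cues.push(cue);
    this.oncuereceived?.(cue);
  }

  // Move the simulated media timeline and fire per-cue enter/exit handlers.
  advanceTo(time: number): void {
    for (const cue of this.cues) {
      const isActive = cue.startTime <= time && time < cue.endTime;
      const wasActive = this.active.has(cue);
      if (isActive && !wasActive) {
        this.active.add(cue);
        cue.onenter?.();
      }
      if (!isActive && wasActive) {
        this.active.delete(cue);
        cue.onexit?.();
      }
    }
  }
}

const cueLog: string[] = [];
const sketchTrack = new SketchTrack();
sketchTrack.oncuereceived = (cue) => {
  cueLog.push(`received ${String(cue.value)}`);
  // Attach handlers once, at the point the UA hands over the cue.
  cue.onenter = () => { cueLog.push(`enter ${String(cue.value)}`); };
  cue.onexit = () => { cueLog.push(`exit ${String(cue.value)}`); };
};

sketchTrack.receive(new SketchCue(5, 7, "ad-break")); // parsed ahead of time
sketchTrack.advanceTo(6); // inside the cue
sketchTrack.advanceTo(8); // past the cue
```

The received callback runs when the cue is parsed, before the timeline reaches its start, so the application gets one opportunity to attach handlers without any track-level oncueenter / oncueexit.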

Rob: My understanding of oncuechange is that it has two sub-types, onenter and onexit, and is onreceive not the one that happens before all of them?
... Would that not be more consistent with the current design, and also give what you want?
... Need to double check, but I think that's what oncuechange gives.

Chris: (Figure 10 in the DASH-IF doc). oncuereceived corresponds to the Arrival Time (AT) here, and the cue's onenter corresponds to the Presentation/Start Time (ST).

Rob: So this diagram shows it clearly that onreceive happens before onstart and onend.
... Are those not part of oncuechange?

Chris: I'd need to go back to the HTML spec to study oncuechange.


scribe: How do you know which cue is entered or exited? Need to use the activeCues list.

Rob: The cue should be passed with the event.

Nigel: The trouble is having multiple cues that could have been made active or not active. I wouldn't expect oncuechange to be called when a cue is received, because being received doesn't make it active or not active.

Chris: And doing that may give a compatibility issue with existing usage of oncuechange.

Rob: My concern was having duplication of onenter and onexit, in addition to oncuechange.
... I'm wondering if we need to discuss further.

Chris: I'll update this based on this feedback, oncuereceived may be all that's needed
... On TextTrackCue, we change endTime so that it can be set to Infinity, which I think requires unrestricted double, although that also allows other strange values (e.g., NaN).
... And finally on the DataCue interface, as specified in HTML5 there's an ArrayBuffer with the data payload.
... We're proposing to use the WebKit approach, which is to have a value and a type, and the value could be an ArrayBuffer or an object or string or whatever,
... either depending on what you've given it, for application created events, or for UA created events it would be up to the UA to decide the appropriate representation.
... and we'd want to specify how that's done for emsg events for example.
... That's a summary of the proposal so far, a few changes to the existing interfaces rather than a new distinct set of APIs.
... Our requirements map well to the existing structure.
... I then looked in detail at the inBandMetadataTrackDispatchType in HTML.
... The spec describes how that field is populated for different kinds of media.
... I'm not so familiar with the details of these formats, but this is a summary from HTML.
... Not all browsers have support for DataCue or the inBandMetadataTrackDispatchType field.
... Interestingly, Chrome doesn't expose inBandMetadataTrackDispatchType.
... I looked at HbbTV, the TextTrack property / emsg binding. You concatenate the scheme id and value, to identify the event stream.
... We would need to consider if we can use this, or something similar.
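
A minimal sketch of the cue shape summarised above: a value of any representation plus a type, and an end time that may be Infinity. SketchDataCue is a plain illustrative class, not the DOM DataCue interface; the argument order, the type string and the validation rules are assumptions made for the example.

```typescript
// Illustrative only: models a cue with a typed value and an open-ended end time.
class SketchDataCue {
  readonly startTime: number;
  readonly endTime: number;
  readonly type: string;   // tells the application how to interpret `value`
  readonly value: unknown; // ArrayBuffer, object, string, ...

  constructor(startTime: number, endTime: number, value: unknown, type: string) {
    // An unrestricted double admits NaN as well as Infinity, so NaN has to
    // be excluded explicitly.
    if (Number.isNaN(startTime) || Number.isNaN(endTime)) {
      throw new TypeError("cue times must not be NaN");
    }
    if (endTime < startTime) {
      throw new RangeError("endTime must not precede startTime");
    }
    this.startTime = startTime;
    this.endTime = endTime;
    this.value = value;
    this.type = type;
  }

  // True when the cue has no known end time.
  get isUnbounded(): boolean {
    return this.endTime === Infinity;
  }
}

// Application-created cue with an object payload and no known end time.
const openEndedCue = new SketchDataCue(
  10,
  Infinity,
  { lat: 51.5, lon: -0.1 },
  "org.example.location", // made-up type identifier
);
```

Allowing Infinity while rejecting NaN is one way to handle the "strange values" concern; whether that check belongs in the interface definition or in prose is left open here.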

Next steps

Chris: What I'm proposing to do next is update the DataCue explainer with the latest text from the IG note,
... and add example code that illustrates all these features I've just described,
... and then schedule a call with implementers.
... I had the opportunity to talk with Apple during the Fraunhofer Media Web Symposium recently,
... they suggested coming up with a strawman proposal and scheduling a call, so that's what I've shared today.
... I want to make sure we have support from everybody, it will be interesting to see what in-band events different implementers want to support.
... Any feedback?

Rob: Thanks, Chris. That looks good, thank you very much!

Chris: Thank YOU!

Rob: I like your approach with minimum changes, the existing stuff has been thought through.
... Should we follow up about the events?

Chris: Yes, we don't have to wait for the next scheduled call, let's do it separately.

Rob: OK

Chris: I want to schedule the call with implementers as soon as we can, so don't want to wait for the next scheduled TF call on July 15th.

Open Geospatial Consortium meeting

Rob: There is an OGC technical committee meeting coming up next week.
... I will present WebVMT and some of this DataCue WICG activity, to show there's more work going on in this area and generate interest in video and location from the Open Geospatial side.
... It's my first OGC meeting, I've proposed a breakout session.
... This would cover a wider area, to get some activity going either there or elsewhere in OGC, bring in different groups.
... They have something similar to WICG, known as the testbed.
... They're very application focused, from a geospatial point of view. Interested to get feedback.
... I have created a GitHub issue

<RobSmith> https://github.com/w3c/sdw/issues/1130

Chris: Excellent, thank you, I wish you well with that.
... I'm away myself next week, so I don't know exactly when we'll have the call with implementers, but I'll likely follow up the week after next on that.



[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.152 (CVS log)
$Date: 2019/06/25 07:35:34 $