W3C

- DRAFT -

Media Timed Events TF / DataCue API

20 Jul 2020

Agenda

Attendees

Present
Chris_Needham, Rob_Smith, John_Simmons, Andy_Rosen, Francois_Daoust, Gary_Katsevman, Rufael_Mekuria, Kazuyuki_Ashimura, Nigel_Megitt, Kazuhiro_Hoya, So_Vang
Regrets
Chair
Chris
Scribe
tidoust

Contents

  * Introduction
  * TextTrackCue end time
  * TextTrack timing accuracy
  * How should DataCue expose parsed vs unparsed data, or subsets of emsg data?
  * Single vs multiple metadata tracks
  * Looking at requirements
  * Summary of Action Items
  * Summary of Resolutions

<scribe> scribenick: cpn

Introduction

<kaz> Agenda: https://lists.w3.org/Archives/Public/public-web-and-tv/2020Jul/0012.html

TextTrackCue end time

https://github.com/whatwg/html/issues/5297

Rob: There may have been a misunderstanding, so I've looked again at how cue end times work
... A better way to put it is an unbounded cue: it has a defined start time, then extends indefinitely
... The cue can still end: if the user rewinds the media you'll get an exit event
... Where to go from here?

Chris: Two things: one is to confirm whether media duration includes text tracks as well as A/V tracks

Rob: I'd assume audio and video tracks would be the same length, but metadata tracks are not taken into account for duration
... You can have cues with negative cue times
... So a cue that extends beyond the media duration shouldn't affect the duration

Nigel: Regarding negative cue times, in WebVTT the parsing algorithm doesn't accept a sign, so you can't put a negative cue time in a file. Same in TTML
... That's separate from the API in HTML, which allows that
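
[Illustrative sketch of the distinction Nigel describes, assuming a browser with VTTCue support: the HTML API accepts a negative start time that the WebVTT file parser would reject.]

    // The HTML API places no lower bound on cue times, so this is
    // accepted programmatically:
    const earlyCue = new VTTCue(-0.25, 0, "pre-roll marker");
    // ...whereas the equivalent WebVTT file line,
    //   -00:00:00.250 --> 00:00:00.000
    // is rejected, because the WebVTT timestamp parser accepts no sign.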

<arosen> The Digital Production Partnership explicitly recommends tolerances on short content (commercial messages, promotions, and music videos). An audio that is a quarter second short head and tail is by definition free of gaps.

John: If it's in HTML, why is it that way?

Rob: As I understand it, you can have a cue that starts before the start of media, or cues with start and end both negative

John: Would it be a legitimate constraint to not allow that?

Rob: Yes, I don't think it's relevant to our use cases here

Chris: It may be as simple as the spec using double, which has no way to express an unbounded (infinite) end time
... Eric suggested last time creating a pull request, so let's do that
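
[Illustrative sketch of what an unbounded cue could look like if such a change lands; the use of Infinity as an end time is the proposal under discussion, not current behaviour.]

    // Assumes the proposed change: endTime becomes an unrestricted
    // double, so Infinity can mark a cue with no defined end.
    const video = document.querySelector("video") as HTMLVideoElement;
    const track = video.addTextTrack("metadata");
    const cue = new VTTCue(10, Infinity, "live region begins");
    track.addCue(cue);
    cue.onenter = () => console.log("cue entered at 10s");
    // An exit event still fires if the user rewinds past the start:
    cue.onexit = () => console.log("cue exited");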

<scribe> ACTION: Rob and Chris to draft PR for unbounded cue end times

Rufael: Does this work also cover subtitles, or only metadata cues?

Rob: It would be both, as TextTracks cover both
... There's also a captioning use case: a video with a timed events channel, where a label is overlaid on the video

TextTrack timing accuracy

https://github.com/whatwg/html/issues/5306

Chris: This change has been merged into HTML. Chromium still working on implementation, not sure of progress

Nigel: Can implementers let us know the status of their implementations?

<scribe> ACTION: Chris to ask browser vendors on implementation status of timing accuracy for TextTrackCues

<arosen> If you download the UK HD Commercials and Sponsorship [Template] from https://www.thedpp.com/ please look at the figure in clause five. Implicitly, cues with a negative time of 250ms should be common for European TV commercial spots.

How should DataCue expose parsed vs unparsed data, or subsets of emsg data?

https://github.com/WICG/datacue/issues/21

<arosen> It's currently in revision but this issue won't go away due to legacy equipment restrictions.

<tidoust> scribe: tidoust

cpn: If you have unparsed data, you expose that as an ArrayBuffer. If you have parsed data that happens to be binary, you would also expose that as an ArrayBuffer. Question being: can you distinguish between the two?
... It seems to be an unusual use case in practice

nigel: It could happen with a payload such as MP4 (missed?); that could potentially be the way you do it.
... I would suggest caution here. Instead of assuming that it won't happen, let's assume that it could. It's easy to design to allow it now, and costly to fix later if we don't and it turns out to be needed.

cpn: Payload could be fragmented MP4, is that the use case you're suggesting?

nigel: Potentially. Payload could be audio for instance.

cpn: It's good to have a concrete example.

Johnsim: A generalization of that would be the input/output of decoders.

cpn: I tend to agree. It's low cost at this stage and the existing HTML5 DataCue has a field already for exposing the unparsed data. I think we'd then be building on top of an "existing" API.
... I realize that Eric is not around today to present counter arguments. What I'd like to do for now is update the explainer to build on top of the existing API.
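
[Illustrative sketch of the shape being discussed, modelled on the HTML5-era DataCue; the field names and optionality here are assumptions, not a settled design.]

    // Keep the raw bytes and any parsed result in separate fields, so
    // parsed binary data stays distinguishable from unparsed data.
    interface DataCueSketch extends TextTrackCue {
      data: ArrayBuffer;  // unparsed payload, as in the HTML5 DataCue
      value?: unknown;    // parsed result, which may itself be binary
      type?: string;      // message scheme, e.g. an emsg scheme_id_uri
    }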

kaz: I agree we should clarify the actual use case description for this topic. Also I was wondering if the WebCodecs folks have ideas or specific formats in mind. At the least, we should be consistent with each other.

cpn: Yes, WebCodecs is really interesting. This raises a more general point about what we're doing here and WebCodecs.
... Given what we discussed on the IG call a couple of weeks ago: WebCodecs won't provide a muxing/demuxing API.
... The ability to expose timed events is an interesting question. I suspect that they would expect any parsing of event tracks to be the responsibility of applications rather than of the browser.
... I think we need a good argument in place as to why we continue to need DataCue.
... Having a solid story for this API in that context is important.

nigel: The two proposals (WebCodecs and DataCue) seem different though

cpn: Right, but if you're using WebCodecs, your application would already be doing the demuxing on its own. In that kind of model, it could also be an application-level responsibility to parse the event cues from a media container.
... If the application already has to do the demuxing, then the additional burden of parsing the emsg boxes is maybe not so significant.
... But it seems we're going to have two models of media playback on the Web: video element + MSE (which is where DataCue is useful), and then a much lower-level set of APIs around WebCodecs where much more of the media handling is up to the application.
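
[Illustrative sketch of the parsing burden in question: reading a version-0 emsg box per ISO/IEC 23009-1, assuming the application's demuxer has already located the box bytes.]

    function parseEmsgV0(buf: ArrayBuffer) {
      const view = new DataView(buf);
      let pos = 8;                         // skip box size and 'emsg' type
      const version = view.getUint8(pos);
      pos += 4;                            // version (1 byte) + flags (3 bytes)
      if (version !== 0) throw new Error("sketch handles version 0 only");
      const readString = () => {           // null-terminated UTF-8 string
        const start = pos;
        while (view.getUint8(pos) !== 0) pos++;
        const s = new TextDecoder().decode(new Uint8Array(buf, start, pos - start));
        pos++;                             // skip the terminator
        return s;
      };
      const scheme_id_uri = readString();
      const value = readString();
      const timescale = view.getUint32(pos); pos += 4;  // big-endian, per ISO BMFF
      const presentation_time_delta = view.getUint32(pos); pos += 4;
      const event_duration = view.getUint32(pos); pos += 4;
      const id = view.getUint32(pos); pos += 4;
      return { scheme_id_uri, value, timescale, presentation_time_delta,
               event_duration, id, message_data: buf.slice(pos) };
    }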

nigel: It seems that if there is a playback method that does not use text track cues at all, then DataCue is irrelevant, but it is useful any time that text track cues are being used.
... Are they not proposing to feed WebCodecs into MSE?

cpn: My understanding is that it doesn't go into MSE buffers, it's giving you video frames that you can then present [rendering to canvas]
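
[Illustrative sketch of that frame-based model using the WebCodecs VideoDecoder shape; the codec string and data source are placeholders.]

    const canvas = document.querySelector("canvas") as HTMLCanvasElement;
    const ctx = canvas.getContext("2d")!;
    const decoder = new VideoDecoder({
      // Decoded frames come straight to the app, which paints them itself:
      output: (frame) => { ctx.drawImage(frame, 0, 0); frame.close(); },
      error: (e) => console.error(e),
    });
    decoder.configure({ codec: "avc1.42E01E" });  // placeholder codec string
    // chunkBytes would come from the application's own demuxer:
    // decoder.decode(new EncodedVideoChunk({ type: "key", timestamp: 0, data: chunkBytes }));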

Single vs multiple metadata tracks

https://github.com/WICG/datacue/issues/20

cpn: If you have a number of different cue events, we came to the conclusion that they should be exposed in a single metadata track. We didn't see compelling scenarios for multiple event tracks.
... The only implementation I know of that does it this way is HbbTV, so it would be a change from their perspective to move to this new API.
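
[Illustrative sketch of consuming events from a single shared metadata track, telling schemes apart per cue rather than per track; the per-cue type field is the assumed one from the earlier sketch.]

    const videoEl = document.querySelector("video") as HTMLVideoElement;
    videoEl.textTracks.addEventListener("addtrack", (e) => {
      const track = (e as TrackEvent).track as TextTrack;
      if (track.kind !== "metadata") return;
      track.mode = "hidden";        // receive cue events without rendering
      track.oncuechange = () => {
        for (const cue of Array.from(track.activeCues ?? [])) {
          // One track carries all event schemes; dispatch per cue:
          console.log("active cue scheme:", (cue as any).type ?? "unknown");
        }
      };
    });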

Looking at requirements

<kaz> https://github.com/WICG/datacue/blob/master/requirements.md

cpn: This document is separate from the explainer and more focused on media industry requirements. The goal is to have something more tailored for DASH-IF and the CTA WAVE Project.
... Andy and Rufael, I'd really welcome your feedback here.
... The introduction talks about information that describes the media: metadata and control messages (cues used client-side, such as those for ad insertion or manifest updates)
... The first use case is around ad insertion. Relates to SCTE 35 and related specs for describing insertion points. This in particular is an area where I would like review. I do not know that area well.
... I would like to verify this from an application-level point of view.
... What does an application need to do to respond to these kinds of cues?

arosen: May I share this material with folks at Fraunhofer involved in DASH player and others?
... They might have something to say.

cpn: That would be very welcome!

Johnsim: In terms of feedback that you'd like to get, is there a timeline? We could submit this to CTA WAVE as well, and other companies, and get that feedback.
... In what format?

cpn: Ideally, as issues in this GitHub repository. If people are unwilling to do that, people can contact me privately.

Johnsim: Issues would seem good.

<kaz> Issues for datacue discussion

cpn: Regarding the timescale: the goal is, by TPAC this year, meaning October 2020, to have good alignment on the requirements so that we can take them to browser vendors to make the case for why this is needed, and then start spec development for real.
... So all relevant media industry feedback by October, and then on to browser vendors.
... Plan to share this with DASH-IF during 31 July 2020 call. And then in August with the CTA WAVE.

Johnsim: 31 July 2020 seems a relevant deadline to post the feedback so that you can discuss all issues during the DASH-IF call.
... And then setup a follow-up call with CTA WAVE if that seems needed.

arosen: For media companies, there's a meeting a few minutes from now where a link to the issue tracker could be posted to the minutes.

Johnsim: I'll sync up with Thomas on this.
... What I think would be useful is to communicate the plan, with dates, to browser vendors, so as to set the expectation that things will happen.

<scribe> ACTION: cpn to summarize the plan to collect requirements for DataCue and share with browser vendors

cpn: OK, thanks to all!

<kaz> [adjourned]

Summary of Action Items

[NEW] ACTION: Chris to ask browser vendors on implementation status of timing accuracy for TextTrackCues
[NEW] ACTION: cpn to summarize the plan to collect requirements for DataCue and share with browser vendors
[NEW] ACTION: Rob and Chris to draft PR for unbounded cue end times
 

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version (CVS log)
$Date: 2020/07/21 11:49:41 $