W3C

- DRAFT -

DataCue and "Time marches on" in HTML

17 Sep 2019

Attendees

Present
romain, gkatsev, pal, chcunningham, ericc, jkamata, stepsteg, Nigel
Regrets
Chair
Chris Needham
Scribe
Cyril, nigel

<cyril> scribe: Cyril

<scribe> meeting: DataCue and "Time marches on" in HTML

chris: the topics to discuss are the DataCue API and the "Time marches on" algorithm

DataCue API

chris: if anybody is not familiar so far
... our goal is we want to introduce native user agent support
... for DASH events
... as part of support for MPEG CMAF content
... alongside this, if the UA has native support for DASH playback
... we would like to support out of band MPD events
... HbbTV is an example of a player that has native DASH support
... I'd like to discuss implementer interest for other metadata cue formats
... for example Safari has support for ID3
... we also want API support for application-generated timed metadata cues
... when generated by a player
... the existing approach is to use the VTT cue
... either inline or as a reference in the VTTCue object
... having a more convenient DataCue API that lets us store data in the preferred format would be better
... our goals are: sync arbitrary data with video
... e.g. dashcam sensor data
... there is a lot of interest from the Open Geospatial Consortium
... the applicability is broader than for the M&E IG
... [shows current support for DataCue API]
... basically no support so far in Chrome or Firefox
... some support in Safari with an extended API
... [showing the data structure for emsg]
... there are 2 versions that differ with respect to the timing
... there would need to be a mapping between the emsg timing and the cue timing
... in v0 the timing is relative
... the data is a byte array, so you need a schema to identify the data
... DASH-IF is working in parallel around specifying a delivery and processing model for DASH events
... they are considering more types of players, not web platform only
... one of the requirements they have identified
... is that in order for an application to get prepared for presenting a cue, e.g. a video overlay
... that may require fetching other resources
... they signal the event to the application ahead of time
... to be able to render at the appropriate time
... so we have 2 events: onreceive and onstart
... I have a number of questions:
... it relates to the early discussion around this, should in band events be exposed as a byte array
... or should they be exposed as objects
... the second approach makes it easier for app devs
... this may be desirable for cues that are commonly used
... for example within DASH players
... the emsg can be used for application specific events
... and we don't need support from browsers for those
... there is a question of how we identify inband tracks
... there are various fields
... all of them seem to enable identifying the kind of metadata
... it is not clear to me, reading the spec and comparing implementations,
... what the level of support is
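
Illustrative sketch (not presented in the meeting): the mapping between emsg timing and cue timing mentioned above might look roughly like this. The field names mirror the emsg box fields; the interfaces, the function name `emsgToCueTiming`, and the `segmentStartSeconds` parameter are hypothetical. The sketch assumes that a v0 delta is relative to the start of the containing segment, that a v1 time is already on the media timeline, and it ignores any presentation time offset or period start.

```typescript
// Hypothetical shapes for a parsed emsg box; only the timing fields are shown.
interface EmsgV0 {
  version: 0;
  timescale: number;             // ticks per second
  presentationTimeDelta: number; // ticks, relative to the containing segment
  eventDuration: number;         // ticks
}

interface EmsgV1 {
  version: 1;
  timescale: number;             // ticks per second
  presentationTime: number;      // ticks, on the media timeline
  eventDuration: number;         // ticks
}

type Emsg = EmsgV0 | EmsgV1;

interface CueTiming {
  startTime: number; // seconds
  endTime: number;   // seconds
}

// Convert emsg timing to cue start/end times in seconds.
// segmentStartSeconds is only used for version 0 (relative timing).
function emsgToCueTiming(emsg: Emsg, segmentStartSeconds: number): CueTiming {
  const startTime =
    emsg.version === 0
      ? segmentStartSeconds + emsg.presentationTimeDelta / emsg.timescale
      : emsg.presentationTime / emsg.timescale;
  return { startTime, endTime: startTime + emsg.eventDuration / emsg.timescale };
}
```

With a v0 event, the player has to know where the carrying segment starts before it can place the cue; with v1 the event carries its own position on the timeline.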

chris_c: on your first bullet
... is it a reasonable behavior to fall back to the opaque array buffer when you don't understand the type?

chris: I'd like to understand what common subset can be supported?
... but the fallback could be a good approach if we have an API for that

francois: you'll end up with 2 representations for the same data
... so in the end the devs have to handle the opaque case
... so in this case we shouldn't bother about the structured object

eric: I disagree strongly
... there are metadata formats that are very difficult for JS to parse
... correctly
... so that's why I added the implementation to WebKit
... because we had lots of requests to support datacues
... and just supporting arrays is not doing web authors a service

francois: it is better to have a system that works across browsers

eric: if we decide that structured data is important
... we need to agree on a set of types
... that we want to support
... there will always be custom metadata
... people can put anything in a container format
... and they want to have access to it
... it does not make sense to have support for a limited set

chris: I can imagine a world where an impl wants to provide access to ID3 and another not
... it's the responsibility of the dev to know that

eric: I agree that we should not end up in this situation

mounir: is there benefit in trying to avoid that?

eric: I think so
... we don't have to end up there
... if we can come up with a way to describe the cue
... and require that a browser that uses that identifier provides structured data

mounir: there could be security issues and different parsing if the browsers do it themselves

richard: if the parsing is within the browser and uses WebAssembly, how could there be a security issue?

mounir: if you use WebAssembly that's ok
... we try to avoid doing parsing in C++

chris_c: I'm trying to understand what the fallback would look like
... maybe the ID3 would not be contentious

chris: having an API structure that lets the application introspect the cue

gkatsev: ID3 in HLS, Safari parses it, but in other browsers you have to do it yourself

nigel: is the data in the array buffer a registered type?

ericc: no, the data has no indication

cyril: no magic number?

ericc: no

chris: the emsg also indicates the scheme id

ericc: with the current data cue api, the array buffer would have that whole thing from start to end
... and you'd have to sniff the bits to figure out if it's an emsg or ID3
... and the app is going to have to parse it to determine whether it's an emsg or not

nigel: imagine that we expose this data through MSE
... the bytestream would be identifiable

ericc: the UA, thing that parses the raw media container, does have a signal about what kind of metadata it is
... if the data cue had a scheme and identifier for the type of metadata and an array buffer
... then in theory it could know how to parse it
... the reason I decided that was not practical for us
... is that there are metadata values that are extremely complex to parse
... like HLS has a pList
... writing a parser for a binary pList in JS
... is not easy
... practically speaking, WebKit does not have access to the raw pList
... the low level does the parsing
... and we get it as a native object
... a representation of the data
... which WebKit converts into a JS object attached to the datacue
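
Illustrative sketch (not presented in the meeting): the idea discussed above, a cue carrying a scheme identifier plus an array buffer, with a fallback to the opaque bytes when the type is not understood, could be modelled as below. All names here (`RawMetadataCue`, `ParsedCue`, `parseCue`, and the `urn:example:counter` scheme) are hypothetical and do not correspond to any existing browser API.

```typescript
// Hypothetical cue shape: a scheme identifier plus the raw payload.
interface RawMetadataCue {
  schemeIdUri: string;
  data: ArrayBuffer;
}

// A parser turns the raw payload into some structured value.
type Parser = (data: ArrayBuffer) => unknown;

// Either a structured value, or the untouched bytes as a fallback.
type ParsedCue =
  | { kind: "structured"; schemeIdUri: string; value: unknown }
  | { kind: "opaque"; schemeIdUri: string; data: ArrayBuffer };

// Look up a parser by scheme; fall back to the opaque buffer if none is known.
function parseCue(cue: RawMetadataCue, parsers: Map<string, Parser>): ParsedCue {
  const parser = parsers.get(cue.schemeIdUri);
  if (parser === undefined) {
    return { kind: "opaque", schemeIdUri: cue.schemeIdUri, data: cue.data };
  }
  return { kind: "structured", schemeIdUri: cue.schemeIdUri, value: parser(cue.data) };
}

// Example registry with one made-up scheme whose payload is a 32-bit
// big-endian counter.
const exampleParsers = new Map<string, Parser>([
  ["urn:example:counter", (data) => new DataView(data).getUint32(0)],
]);
```

This also shows the concern raised earlier in the discussion: application code still has to handle the `opaque` branch, so it deals with two representations.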

greg: most of the conversation is about inband
... I can see datacue useful for out of band use cases

ericc: that is a part of this
... from script you can make a new data cue with start/end
... and attach anything

chris: the explainer is incomplete and in a very early stage
... it does not explain everything

ericc: any solution we come up with has to support cues from script

<nigel> scribe: nigel

cyril: Comment on synchronisation
... The payload of the metadata may trigger behaviour with unbounded complexity
... so that's why you probably need to process it in advance and to know in advance the practical bound.
... To me this is similar to how video content is processed.
... We don't have two timestamps, one for receiving, the other for presenting.
... The implementation has to know when to preprocess things.
... So I'm not convinced that having two events is a good approach.

ericc: I agree and am strongly opposed to having two.

<cyril> eric: I strongly oppose to having 2 timestamps

<scribe> scribe: cyril

UNKNOWN_SPEAKER: in addition you cannot predict how much time it is going to take in the app to do the processing
... if what you are suggesting is that a cue should be delivered as soon as it is available
... that's going to vary widely
... depending on where the parsing happens

francois: perhaps it's useful to look at why

<inserted> scribe: nigel

cyril: I agree, 3 categories of event:
... 1. Overlay, maybe after js processing.
... 2. Network impact, like making requests or sending messages
... 3. Modifying the DOM
... The 3rd category - you should be able to pre-render in advance and keep your frames until they're ready
... The other two I'm not sure about yet.

<scribe> scribe: cyril

chris: I'd like to move on to the next part, synchronization

Time marches on

chris: web apps use the oncuechange
... triggered by the time marches on
... and the spec says there is an upper limit
... but in practice some implementations do follow the upper limit
... this means that it is possible for an application to miss a short duration cue entirely
... the cuechange event is fired, the app inspects the active cues list
... and acts
... it's quite possible that between two cuechange events there are cues that the app never sees
... there is a bug report raised by Jon Piesing, HbbTV
... and the recommendation is not to create short cues
... but it's worse than that
... you have to take execution time into account
... use of oncuechange is problematic for handling cues
... the good news is that if you want to avoid missing cues
... you can attach handlers to onenter and onexit

nigel: but if it was missed, enter/exit are triggered at the same time
... and if there are visual changes they will be missed
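
Illustrative sketch (not presented in the meeting): a simplified model of the two points above. If an application only inspects the active cue list at discrete times, a short cue that starts and ends between two inspections is never seen; enter/exit events are still due for it, but both in the same step. The `Cue` shape and function names are hypothetical, and the model is a deliberate simplification of the HTML "time marches on" steps (normal forward playback only, no seeking).

```typescript
interface Cue {
  id: string;
  startTime: number; // seconds
  endTime: number;   // seconds
}

// A cue is considered active at t when startTime <= t < endTime.
function isActiveAt(cue: Cue, t: number): boolean {
  return cue.startTime <= t && t < cue.endTime;
}

// Ids of cues that are never active at any of the inspection times,
// i.e. cues an app would miss if it only looked at the active list then.
function missedByPolling(cues: Cue[], sampleTimes: number[]): string[] {
  return cues
    .filter((cue) => !sampleTimes.some((t) => isActiveAt(cue, t)))
    .map((cue) => cue.id);
}

interface CueEvents {
  enter: string[];
  exit: string[];
}

// Enter/exit events due when playback advances from prev to now.
// A cue lying entirely between prev and now gets both events in one step.
function eventsForStep(cues: Cue[], prev: number, now: number): CueEvents {
  const enter: string[] = [];
  const exit: string[] = [];
  for (const cue of cues) {
    const wasActive = isActiveAt(cue, prev);
    const nowActive = isActiveAt(cue, now);
    const skipped = cue.startTime > prev && cue.endTime <= now;
    if ((!wasActive && nowActive) || skipped) enter.push(cue.id);
    if ((wasActive && !nowActive) || skipped) exit.push(cue.id);
  }
  return { enter, exit };
}
```

For example, with a 50 ms cue from 0.30 s to 0.35 s and inspections every 250 ms, the cue never appears in the active list, and its enter and exit are both reported in the step from 0.25 s to 0.5 s.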

foolip: the time marches on steps are not defined to run every 250ms
... it's meant to be continuous
... only the events are triggered every 250ms

chris: that's not my reading of the spec

foolip: the problem is that implementations are not following the spec because that's easier to do

ericc: if you run a test to look at the variance
... you'll see 10-20ms
... because we don't use the time marches on
... but look at the cues
... this is a quality of implementation issue

nigel: this is a spec question
... [reading the spec]

foolip: it's just for the timeupdate event
... not for the cue events
... [explaining how it worked in Presto]

nigel: chrome does it this way

foolip: not because the spec is wrong

nigel: but the spec allows it

chris: we need a follow-up to understand that

foolip: maybe open a bug in chromium

chris_n: the spec does not mandate 250ms

ericc: we limit how often we fire timeupdate events so as not to overload the system
... we could, but that would cause other issues

pierre: the spec guarantees that every single cue will be fired
... regardless of the algorithm?

gkatsev: no, some have been missed

scott: the text says some cues can be skipped

ericc: cues can be dropped

pal: if I have a cue that has a duration of d, is there a requirement that the difference between onenter and onexit is close to d?

ericc: no

pal: you could get them simultaneously

ericc: but if there is onenter/onexit it should be fired

foolip: that's a good idea

chris: another related issue is
... we want a more accurate firing of these events
... driven by the need to align captions with shots or scene changes in the video
... and we came up with a number of 20ms
... that gives a chance to the application

nigel: you want to replace the number 250 with 20?

chris: no

richard: the lower the time limit goes, the more power you're going to use; it's exponential

foolip: the reason the scheduling is poor is not for battery saving

ericc: it was because it was simpler to write
... it's not possible to guarantee any kind of specific latency
... because the browser is under the same constraints as anything else

nigel: that depends on the frame rate

ericc: cues are not tied to frames

foolip: but frames have timestamps

ericc: in my system the frames are rendered by a different subsystem

foolip: there is a quality of implementation issue

ericc: no matter what wording we put in the spec
... it won't help you
... you have to file bugs to get what you need

chris_n: I wouldn't close an issue because it is ok with the spec

chris: I see some inconsistencies between implementations
... when the application moves cues around in the timeline
... if you change time of the cues
... and if you seek the media
... and seek over some cues

ericc: have you filed bugs?

chris: not yet

chris_n: a spec update is not necessary but it may be useful to avoid others making the same mistake

ericc: we should not wait for TPAC to file bugs
... if we want to have the issue fixed quickly

nigel: it's hard to file a bug with the given spec

ericc: if you file a bug with an example and it is not good enough even if it matches the spec we should fix it
... we could get the spec improved

foolip: all specs are wrong every other paragraph!

richard: sometimes I've asked to fix an implementation but been told that the impl is within the spec

foolip: it happens that implementers consider the spec as untouchable but you should escalate

chris: [showing a waveform library demo]
... I'm using VTTCues
... adjusting the times on cues
... it's not the only use case
... [showing a table of what events get fired in practice]

ericc: you should file a bug

chris: the next stage is the meeting on Friday, joint Media WG and Timed Text
... we should figure out how to use that time productively

<nigel> Blink bug

<nigel> blink bug

chris: it seems filing bugs is the recommendation

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2019/09/18 03:01:14 $
