14:55:51 RRSAgent has joined #me
14:55:51 logging to https://www.w3.org/2022/05/16-me-irc
14:55:59 Zakim has joined #me
14:56:00 xfq has joined #me
14:56:13 Meeting: Media Timed Events
14:56:23 Chair: Chris_Needham
14:57:03 Agenda: https://www.w3.org/events/meetings/8a8b3816-aa7e-4f6f-8e84-1875fb98d85c
15:04:08 present: Yuhao_Fu, Fuqiao_Xue, Rob_Smith, Kazuyuki_Ashimura
15:07:32 scribe+ cpn
15:07:46 present+ Iraj_Sodagar
15:08:16 Topic: Proposed changes to DataCue explainer
15:08:29 Chris: https://github.com/WICG/datacue/pull/32/files
15:09:44 ... This proposes to separate the explainer into two parts. One is for DASH events, which covers browser-native handling of DASH emsg events
15:10:15 ... The second is purely about the DataCue API itself
15:13:02 ... Separating allows us to pursue the two parts independently. Collaboration with DASH-IF suggested producing a polyfill implementation based on their event model
15:13:16 ... There would be more to do to figure out the details in an MSE-based implementation
15:13:37 ... We don't yet know if any browsers are supportive of implementing native DASH event handling
15:14:26 ... One of the arguments I've heard is that parsing the media segments to extract events is not costly in terms of application performance, so it could be left to the media player library or web application rather than the browser
15:15:01 ... We would perhaps need performance data to show there is a real need for browser-native event handling
15:18:35 ... The intent is to allow us to progress DataCue by itself; the reason is that even with application-level parsing, there's a need to trigger events at the right time during media playback
15:20:10 ... That's what DataCue provides. The proposal for native emsg handling needs other capabilities, such as subscribing to specific event types and setting the dispatch mode
15:20:46 ... Does this approach sound OK? Open to alternative suggestions
15:21:56 Rob: Sounds sensible to me. My understanding is DataCue is a mechanism to present the raw data; hopefully it contains a label to indicate the type of data, but there's no action taken
15:22:44 ... DASH events require some action by the UA, so I see them as parallel activities. Simplest to expose the data first, then the second part talks about processing
15:25:40 Yuhao: Question about parsing the container to create DataCues. If I want to use emsg, do I need to parse the video myself, rather than get it from the video element?
15:29:23 ... If I use the appendBuffer API, does it mean the browser would get the fMP4 data, parse the emsg box, and create the DataCues internally? So if I want to parse the container myself and prevent the browser from creating DataCues, how do I do that?
15:31:03 Chris: Good question. I don't know if we've considered that yet. A way it could work is to use a subscribe/unsubscribe API. The idea is that if your app subscribes to certain events, then you're asking the browser to handle those events
15:31:25 ... When you subscribe you'd pass in an event type identifier
15:32:38 Yuhao: So you'd subscribe, and use addTrack from the application side. So I can still get an addtrack event. If I invoke the TextTrack's addTrack, the listener will trigger
15:33:29 RobSmith has joined #me
15:33:47 Chris: This is an early draft, I think those details would need to be figured out
15:34:42 Yuhao: How can I tell if the addtrack event is caused by a track being added manually, or by the browser automatically generating the track? Maybe not for now...
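A minimal TypeScript sketch of the subscribe/unsubscribe idea discussed above. This is purely illustrative: no such API exists or has been specified, and the EmsgSubscriptionTarget interface, subscribe()/unsubscribe() names, and example scheme URN are all invented here; only the addtrack/TextTrack parts use existing APIs.

    // Hypothetical shape only: the subscribe/unsubscribe API is an idea under
    // discussion, not a specified or implemented interface. EmsgSubscriptionTarget
    // and subscribe()/unsubscribe() are invented names used purely to illustrate
    // "pass in an event type identifier when you subscribe".
    interface EmsgSubscriptionTarget {
      subscribe(schemeIdUri: string, value?: string): void;
      unsubscribe(schemeIdUri: string, value?: string): void;
    }

    const video = document.querySelector('video')!;

    // Pretend the media element exposed such a subscription surface.
    const target = video as HTMLVideoElement & EmsgSubscriptionTarget;

    // Subscribing would ask the browser to surface matching emsg boxes from
    // segments appended via MSE, instead of the app parsing the container itself.
    // The scheme id below is a made-up example, not a registered scheme.
    target.subscribe('urn:example:my-events', '1');

    // The app could then pick up the browser-generated metadata track via the
    // existing addtrack event. Note the open question from the minutes: there is
    // currently no way to tell a browser-generated track from one the app added.
    video.textTracks.addEventListener('addtrack', (e) => {
      const track = e.track as TextTrack | null;
      if (track && track.kind === 'metadata') {
        track.mode = 'hidden'; // receive cue events without any rendering
      }
    });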
15:35:44 Chris: We should review based on the DASH-IF processing model too
15:37:10 ... We need DataCue because otherwise you have to use VTTCue, which is for captions and doesn't allow you to store arbitrary data structures
15:37:46 ... VTTCue can be used as a workaround today for application-level timed metadata events
15:38:59 ... The DataCue API itself is proposed to be the same as the WebKit implementation, which we think has everything needed
15:39:33 ... It has 'value' and 'type' fields
15:40:36 ... The proposal is to deprecate the 'data' ArrayBuffer in favour of the new 'value' field
15:42:16 Iraj: If the separation makes it easier for browser vendors to support, it's a good thing
15:44:54 ... We also need to discuss MSE handling of events. But would DataCue need extension in future to support browser handling of emsg events?
15:45:28 +1 keep it simple
15:45:41 ... Anything that lowers the bar is a good thing. Even if the first version of DataCue is simplified and needs further extension, that's OK
15:46:35 Chris: The DASH events proposal should cover that, I think
15:47:17 Iraj: Can we map the event timing model? The only question maybe is whether you need a subscription model, or whether the application can request all events coming in
15:48:14 ... In terms of other timing information, if the cue has a start time and duration, we may need to look at that
15:48:22 Chris: Timing requirements for emsg?
15:49:31 Karen has joined #ME
15:49:38 Iraj: The two dispatch modes, on-receive and on-start. For an MSE integration, both are needed. On-receive mode is there for the application to process the event ahead of time, so it's ready when the data becomes active
15:49:46 ... Is that valid in the DataCue model?
15:51:22 Chris: DataCue only gives on-start processing, so on-receive handling is done by the media player or web app
15:52:03 ... Timing accuracy requirements for on-start processing?
15:53:39 Iraj: Every timed element has a start time and duration expressed in a timescale (in ticks). Both are integers. Those are used in metadata tracks. The cue uses double floating point for the time, so you can do fractions, which should be OK
15:54:14 ... You can achieve frame-accurate events on the media timeline with those data types
15:55:35 Chris: This is where it gets more difficult, JS execution is not closely tied to media playback
15:56:48 Iraj: Two questions: can the API definition support it, and can implementations support frame accuracy? Maybe some platforms such as DVB or HbbTV could define whether they need that
15:59:21 Chris: On the first question, the time precision should be enough. But on the second, the HTML spec may not allow frame-accurate rendering
15:59:39 ... If frame accuracy is needed, we should look at those use cases more closely and propose a different rendering model
16:00:23 ... Do we need to redefine our goals?
16:01:47 Rob: I'm involved in a geo-alignment activity in Immersive Web, which considers not only where you are, but where you're looking. It needs frame accuracy. You need to show where you're looking in a video stream for AR applications
16:02:41 Chris: Should we join up with others in that discussion?
16:03:59 Rob: Yes, but it's early. Geo-alignment was being looked at in the Immersive Web group a few years ago. There's been a shift away from AR to VR, so there's a move to drop the geo-alignment work
16:04:09 ... In OGC, there's a group doing geo-pose with 6DoF for AR, and a new activity with W3C just starting
16:07:57 ... I made a short video that syncs video with location for a street drone car. The video is up on YouTube
16:08:38 OGC T17 Moving Features autonomous vehicle analysis: https://youtu.be/-BjeAp_hgQc
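A short TypeScript sketch tying together the DataCue points above, under stated assumptions: the app or player library has already parsed an emsg box itself (per the earlier point that app-level parsing is cheap), and the EmsgEvent shape, cueFromEmsg, and scheduleEvent names are illustrative only. It shows the tick-to-seconds conversion Iraj described, the WebKit-style DataCue with 'value' and 'type' where available, the VTTCue workaround otherwise, and the cue 'enter' event as on-start dispatch.

    // Illustrative event shape, with field names following the ISO BMFF emsg
    // definition; in practice this would come from the player's segment parser.
    interface EmsgEvent {
      schemeIdUri: string;
      value: string;
      timescale: number;        // ticks per second
      presentationTime: number; // in ticks
      eventDuration: number;    // in ticks
      id: number;
      messageData: Uint8Array;
    }

    // DataCue with 'value' and 'type' is WebKit-only today, so look it up
    // dynamically rather than assuming it exists.
    const DataCueCtor = (window as any).DataCue as
      (new (start: number, end: number, value: unknown, type?: string) => TextTrackCue) | undefined;

    function cueFromEmsg(ev: EmsgEvent): TextTrackCue {
      // emsg times are integers in 'timescale' ticks; cue times are double
      // seconds, so fractional values are representable.
      const start = ev.presentationTime / ev.timescale;
      const end = start + ev.eventDuration / ev.timescale;

      if (DataCueCtor) {
        // Proposed shape: arbitrary structured 'value' plus a 'type' string.
        return new DataCueCtor(start, end, { id: ev.id, data: ev.messageData }, ev.schemeIdUri);
      }
      // The VTTCue workaround mentioned above: carry the payload as JSON text
      // (binary message_data would need e.g. base64 encoding here).
      return new VTTCue(start, end, JSON.stringify({ id: ev.id, scheme: ev.schemeIdUri, value: ev.value }));
    }

    // Attach cues to a hidden metadata track; the cue's 'enter' event gives
    // on-start dispatch. On-receive handling stays in the player or app.
    const videoEl = document.querySelector('video')!;
    const metadataTrack = videoEl.addTextTrack('metadata', 'timed events');
    metadataTrack.mode = 'hidden';

    function scheduleEvent(ev: EmsgEvent): void {
      const cue = cueFromEmsg(ev);
      cue.onenter = () => {
        // Fires when playback reaches cue.startTime (on-start semantics).
      };
      metadataTrack.addCue(cue);
    }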
16:08:54 Topic: Next meeting
16:09:01 Chris: June 20
16:09:08 [adjourned]
16:09:14 rrsagent, draft minutes
16:09:14 I have made the request to generate https://www.w3.org/2022/05/16-me-minutes.html cpn
16:09:18 rrsagent, make log public
16:35:54 Karen has joined #ME
18:35:03 Karen has joined #ME
19:38:07 Zakim has left #me