W3C

– DRAFT –
Media Timed Events TF

21 February 2022

Attendees

Present
Chris_Needham, Francois_Daoust, Fuqiao_Xue, Kaz_Ashimura, Nigel_Megitt, Rob_Smith, Takio_Yamaoka, Xabier_Rodriguez_Calvar, Yuhao_Fu
Regrets
-
Chair
ChrisN
Scribe
cpn, tidoust

Meeting minutes

DataCue status update

Chris: Work has slowed recently, not clear what to do next
… Two main parts to the proposal: the first is a generic DataCue API for timed metadata events, which can store any object and trigger events during media playback
… The second part was for surfacing DASH emsg events through DataCue
… If people want to progress the emsg part then we need additional contributors
… The work started in MEIG, with a presentation from Giri with ATSC and 3GPP requirements
… Needs input to develop the technical proposal
… Can't and shouldn't do this myself
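
A minimal sketch of the generic DataCue usage Chris describes above, assuming the constructor shape outlined in the WICG explainer (start time, end time, an arbitrary value, and a type string); the type string and payload here are made-up illustrations:

    // Attach a metadata text track to a media element
    const video = document.querySelector('video');
    const track = video.addTextTrack('metadata');

    // Hypothetical DataCue constructor per the explainer: store any object
    // on the media timeline and fire events during playback
    const cue = new DataCue(10.0, 15.0, { title: 'Chapter 2' }, 'com.example.chapter');
    track.addCue(cue);

    cue.addEventListener('enter', () => {
      // Fires when playback reaches cue.startTime
      console.log('cue entered', cue.value);
    });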

Kaz: Can you share the existing resources?

Chris: The documents are linked from GitHub page: https://github.com/WICG/datacue

<kaz> issue 21

<kaz> Explainer

<kaz> Requirements

Kaz: We might want to look for an additional moderator?

Chris: Yes, also an editor who can help write documents, specifically the emsg integration with MSE
… But also for DataCue itself, which has value separate from emsg support
… Would like to progress that by itself
… Easier proposition, but needs editorial help

Kaz: So you can continue as moderator?

Chris: Yes
… We don't have many companies pushing for emsg support, so we want to confirm the need for this

Kaz: Go back to the use cases and requirements and ask people about their interest

Chris: Yes

<kaz> Use Cases

Francois: Also update the GitHub issues to say progress is blocked?

Chris: Could be a good thing to do next

<kaz> datacue issues

Chris: Will also talk with people at DASH-IF, they're working on interop issues around timed events

SEI events

Chris: Thank you to all who replied to the GitHub issues
https://github.com/leonardoFu/video-sei-event

<kaz> Issue 82 - Video SEI events

Chris: Explainer: https://github.com/leonardoFu/video-sei-event/blob/main/explainer.md
… and two open issues https://github.com/leonardoFu/video-sei-event/issues
… Let's discuss the open issues and decide how to update the explainer
… The goal is to understand the use cases for SEI events and turn that into a technical API proposal
… One thing I would like to understand is if this is a new API proposal, or if it aligns with DataCue

<kaz> leonardo's repo - issue 2 - Interaction with Encrypted Media Extensions

<kaz> leonardo's repo - issue 3 - Timing accuracy and decode/presentation ordering

Issue 2 (interaction with Encrypted Media Extensions)

<kaz> leonardo's repo - issue 2 - Interaction with Encrypted Media Extensions

Chris: On issue 2 (EME integration). This was mostly a clarification question: are the SEI events part of the encrypted bitstream?
… It seems so
… So it would be good to describe this in the explainer

Nigel: I agree, sounds like the only reasonable answer

<Zakim> nigel, you wanted to agree this must be the right answer

Chris: I recommend describing that in the explainer, to set the scope of the solution proposal
… So with EME it may not be possible to surface SEI events, so it's worth clarifying in the explainer

Yuhao: Why can't we get the information? After decryption we have raw video frames
… With EME the decrypted video frame is handled by the rendering pipeline, so in this case maybe we can't get the information
… When using WebCodecs, how would this be handled? Does WebCodecs work with EME?

Chris: I don't think it does, so we don't have the same limitation

Yuhao: If we use WebCodecs, we need another way to do decryption, WASM or JS

Chris: With WebCodecs you would parse the video bitstream from the container in WASM or JS, then pass the video bitstream to WebCodecs
… If you're using encryption, decryption would have to happen after parsing the container
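
A minimal sketch of the pipeline described above, where the application demuxes the container itself and feeds encoded chunks to WebCodecs; demuxNextSample() is a hypothetical application-level demuxer, not a platform API:

    const decoder = new VideoDecoder({
      output: (frame) => {
        // Decoded frames arrive here in presentation order
        frame.close();
      },
      error: (e) => console.error(e),
    });
    decoder.configure({ codec: 'avc1.42E01E' });

    // Application-level demuxing (JS or WASM); SEI NAL units could be parsed
    // here, and any decryption would also have to happen at this level
    const sample = demuxNextSample();
    decoder.decode(new EncodedVideoChunk({
      type: sample.isKeyframe ? 'key' : 'delta',
      timestamp: sample.timestamp,
      data: sample.data,
    }));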

Kaz: Let's clarify each use case, and what kind of framework and mechanism should be applied to each part for the expected service
… What kind of extension is expected here?

Chris: Yes, encryption is a point of detail really

Yuhao: Should we look at the media handling to see if we can get the video bitstream?

Chris: It's worth looking, yes. But I'm unsure whether it's possible

Yuhao: I can spend some time on that, to see whether we can get the information after the EME pipeline and still get the SEI information

Chris: Let's update issue #2 with that information, then use this to write a short description into the explainer

Issue 3 (timing accuracy and ordering)

Chris: https://github.com/leonardoFu/video-sei-event/issues/3

Chris: I notice that we said we want to have access to SEI events in decode order rather than presentation order
… Looking at WebCodecs issue https://github.com/w3c/webcodecs/issues/198
… This is the related issue for WebCodecs
… We should bring our requirements to this GitHub issue
… The WebCodecs spec describes that VideoFrames are output in presentation order https://w3c.github.io/webcodecs/#videodecoder-methods
… So if we want the SEI events in decode order, how does that affect the API we propose?
… If we want SEI events in presentation order, we can propose to attach them to VideoFrame
… But if we want them in decode order, we may need an event handler so the VideoDecoder can surface the events earlier
… The explainer doesn't describe WebCodecs currently
https://github.com/leonardoFu/video-sei-event/blob/main/explainer.md
… So should we add it? Or should we focus on the video element?
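
A rough sketch of the two shapes discussed above; neither exists in WebCodecs today, and both the metadata field and the event name are assumptions (see webcodecs issue 198):

    // Option A (presentation order): SEI attached to the decoded VideoFrame
    const decoder = new VideoDecoder({
      output: (frame) => {
        const sei = frame.metadata().sei;  // hypothetical field on VideoFrame metadata
        if (sei) handleSei(sei);           // handleSei() is an app-level placeholder
        frame.close();
      },
      error: (e) => console.error(e),
    });

    // Option B (decode order): a hypothetical event fired by the decoder as it
    // encounters SEI NAL units, before the corresponding frame is output
    decoder.addEventListener('sei', (event) => handleSei(event.data));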

Yuhao: I worked with WebCodecs recently. In some cases it's a better way to get the information from the media stream
… And it works well with WebRTC, so it should be considered

Nigel: Would it make sense to require WebCodecs to fire these events if it sees them?
… It would know how to fire them in the right order. If WebCodecs decodes the video it should provide this functionality
… You want anything that's decoding video to do it, somehow, but it may be harder to define in the general case

Chris: Do we need to write something in the explainer, or do we simply add our input to https://github.com/w3c/webcodecs/issues/198
… The explainer could reference the GitHub issue

Francois: What I'd like to see is more of a consideration of the synchronisation needs that the use cases have
… If we're talking about frame accuracy, are events a good approach?
… The explainer has AI-based subtitles for a live stream, which might not require frame accuracy
… If you use SEI metadata for volumetric video, where you need strong synchronization between SEI metadata and video, then events aren't going to work

Chris: Issue 3 talks about timing requirements: https://github.com/leonardoFu/video-sei-event/issues/3

Francois: Events could be good enough for 100 ms accuracy

<xfq> +1 to tidoust

Chris: Do we have a use case that requires more timing accuracy?

Yuhao: In my use case, we don't need to render the SEI information on the exact frame where the SEI is carried. It can stand a 100 ms tolerance
… If we really want to synchronise the frame with SEI, we can use WebCodecs with Canvas, to really control the frame to render
… The SEI can be a property on the VideoFrame
… I can see use cases such as video editing, where the need for accuracy is high
… Seeking a video element isn't as accurate as needed. So use WebCodecs and Canvas to really control rendering
… But SEI event may not be for this case
… The proposal is intended to be easy to use, e.g. to calculate the end-to-end latency, so I don't need to match the SEI with the video frame
… It just needs to be as exact as possible; also, the presentation versus decode order may not be so important
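
A small sketch of the frame-accurate path Yuhao mentions, rendering decoded frames to a canvas so the page controls exactly which frame is shown; the per-frame SEI argument and overlaySeiInfo() are placeholders:

    const canvas = document.querySelector('canvas');
    const ctx = canvas.getContext('2d');

    function renderFrame(frame, sei) {
      // frame is a WebCodecs VideoFrame; VideoFrame is a valid CanvasImageSource
      ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);
      if (sei) overlaySeiInfo(sei);  // app-defined overlay, placeholder
      frame.close();
    }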

Francois: It could be useful to describe that the explainer is for the loose synchronization use cases, and WebCodecs is for highly accurate synchronization

Kaz: I agree we should work on expectations for the use cases and the actual needs of services, then look at potential applications later
… One potential use case that might require precise synchronization could be interactive multimodal avatars
… Responding to speech in a realtime manner, which needs low latency
… Clarify if we need to handle such advanced use cases

Rob: The video editor case is interesting; there's no real-time element, but you want precision to associate metadata with video frames
… Regarding loose vs tight sync, we're looking at this in WebVMT. Location is a low sync use case, but orientation is a tight sync use case
… If you turn the camera around, the video frames change very quickly. AR use cases. Not sure what solutions are being used in AR
… A mobile phone streaming video, with a website that overlays information using AR

<Zakim> nigel, you wanted to ask about sync of message vs eventual correctness

Nigel: With applications that are doing something with metadata in realtime, human sensitivity to the timing may be higher when doing video editing
… In video editing you need to know the timing. When you pause you want the correct state for the video frame that you're paused on

<RobSmith> Accuracy vs latency

Nigel: Synchronisation of when messages are fired, so how quickly you can respond, then how quickly the state is updated
… For example if events are fired 120 ms behind live playback, then you pause. You want the events to come out correctly, so that the view you end up with is the correct one for that frame

Chris: Do we have that captured in a doc already? If not, we should add it to this explainer

Nigel: In a real world application for authoring captions and subtitles

Chris: Helpful to put application description into the explainer

RobSmith: I created a demo with video on two smartphones, then used WebVMT to sync them, then had to deal with latency in the browser
… Multiple cameras observing the same scene

Chris: For the explainer, describe the synchronization goals
… If that's loose sync, we should look at potential alignment with DataCue, which already gives loose sync
… Applications can use VTTCue today, so can be a way to prototype
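
A sketch of the prototyping approach Chris mentions, serialising metadata into VTTCue text on a 'metadata' text track and reacting to cuechange; the payload shown is illustrative:

    const video = document.querySelector('video');
    const track = video.addTextTrack('metadata');

    // Place the metadata on the media timeline as a short cue
    const cue = new VTTCue(12.0, 12.1, JSON.stringify({ kind: 'sei', value: 42 }));
    track.addCue(cue);

    track.addEventListener('cuechange', () => {
      const cues = track.activeCues;
      for (let i = 0; i < cues.length; i++) {
        console.log('metadata cue active', JSON.parse(cues[i].text));
      }
    });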

<RobSmith> WebVMT video sync demo: https://webvmt.org/webvmt/blog#20201229

Chris: I'll add a GitHub issue about DataCue, let's discuss there
… Use an existing mechanism if we can, and avoid proposing multiple solutions to similar problems

Kaz: Will we try to generate some concrete use case descriptions, or continue discussing the explainer?

Chris: I think the use cases could have some more detail

Takio: There are other mechanisms such as ID3, which video.js can use. I recommend clarifying the use cases, then doing a gap analysis

Chris: I agree

Next meeting

Chris: Next planned meeting is March 21. Should we meet earlier?
… We can talk about the plan once we have the explainer ready

<kaz> [adjourned]

Minutes manually created (not a transcript), formatted by scribe.perl version 185 (Thu Dec 2 18:51:55 2021 UTC).