W3C

– DRAFT –
Media Timed Events TF

15 August 2022

Attendees

Present
Chris_Needham, Fuqiao_Xue, Kazuyuki_Ashimura, Rob_Smith, Yuhao_Fu
Regrets
-
Chair
Chris_Needham
Scribe
cpn

Meeting minutes

Introduction

ChrisN: Main topic is preparations for upcoming TPAC meeting
… Agenda for Media WG and MEIG is in planning, we could include an update on the work here

TPAC plans

ChrisN: I've started preparing some slides
… We have open questions, particularly about SEI events
… It's not clear that DataCue is the right solution in all cases

ChrisN: For example, if we have metadata on every video frame, then DataCue is not the right solution
… Need something more like the proposal from Yuhao

Yuhao: Do we need additional description for this scenario?

ChrisN: I think so, yes. Is that a use case you are interested in?

Yuhao: Yes, sometimes, but I don't think it's a good fit. If the video is 60 fps, each frame is only 16 ms, so the metadata would trigger at that frequency, which is a problem

Kaz: Wondering about TPAC agenda
… Should we concentrate on the SEI event and clarify the use cases? We could look into an expected template for that description
… For device and network settings, expected behaviour and pain points, e.g., time resolution

ChrisN: We do need a clear written description of what we propose for the SEI event, linked to application use cases

ChrisN: For TPAC, what do we want to say? What do we want input on, what do we want from browser vendors, etc?
… There's interest here in adding support for SEI events
… We've discussed, separately from these meetings, about possible alignment with DataCue proposal
… But it's not clear DataCue is the right solution

ChrisN: [talks through draft slides]
… Overall, I think we can separate the DataCue API part from the in-band event surfacing parts
… There's interest from DASH-IF on MSE and DASH emsg events and MPEG timed metadata tracks
… But that needs further exploration, and that may depend on how much implementer interest this has
… Do we do more analysis work on figuring out the MSE implementation algorithms? Or do we seek indications of level of interest before going ahead with that?
… MSE issue: https://github.com/w3c/media-source/issues/189
… Question about the SEI events, and what do we want to report there?
… Needs its own consideration; there may be different solution proposals than DataCue
… https://github.com/leonardoFu/video-sei-event/blob/main/explainer.md
… I need your help to drive that forward
… Link to draft slides: https://docs.google.com/presentation/d/1Xy6nb2RIxMCdbiO6rxRYbmjFug1cvESjWdVdXhiujkY/edit
… What additional information to include for SEI event support?
… Options: We could have a fully developed proposal, go into the detail of the problem space and requirements
… Or we can mention briefly but say that this is ongoing work and we would develop a proposal in the future
… Or we present the existing proposal

RobSmith: So I think that's a good approach: the two parts being the API (accessing the data) and in-band handling (processing the data)

ChrisN: The browser would only parse to extract the events, then present them to the JS layer
… Otherwise the browser would need to understand the data format in the in-band event

RobSmith: There's a type field
… So it can present the bytes and the type?

ChrisN: Yes. In the emsg case, the type field could say "this is a DASH emsg event", and here are the bytes.
… But within the emsg there's its own type information

RobSmith: Nested type information
… But if it's exposed as the top-level type, with the data, it's useful, and as a first step to explore how to process further
… If we want the UA to do something as a result, could handle in some automatic way. It would allow developers and browsers to explore further on what's needed
… On VTTCue vs DataCue, it'll work, but there are at least two overheads: stringifying the data, and the unnecessary rendering attributes

ChrisN: There are different kinds of tracks, metadata. Does using a metadata track prevent rendering of a VTTCue?

RobSmith: If you have a VTTCue on a metadata track, it's still valid. Does it see the text and apply region and other rendering attributes? It could be an overhead
… DataCue is a cut down version of VTTCue
… Another thought: if we expose data through the API, we'd quickly find out the responsiveness issues from the processing overhead of VTTCues and their attributes (processing time, memory, latency)
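
[Scribe note: a minimal sketch of the workaround discussed above, assuming the payload is carried in a VTTCue on a metadata text track; the payload shape and the handleEvent function are hypothetical, and DataCue is a proposal rather than a shipped cross-browser API.]

  const video = document.querySelector('video');
  // addTextTrack returns a track in 'hidden' mode, so cue events still fire
  // without any rendering.
  const track = video.addTextTrack('metadata', 'app-events');

  function addMetadataCue(startTime, endTime, payload) {
    // Workaround today: the data must be stringified into the cue text,
    // and the VTTCue carries unused rendering attributes (region, line, ...)
    const cue = new VTTCue(startTime, endTime, JSON.stringify(payload));
    cue.onenter = () => handleEvent(JSON.parse(cue.text)); // hypothetical app handler
    track.addCue(cue);
  }

  // By contrast, a DataCue-style cue (per the proposal) could expose a type
  // string, e.g. identifying a DASH emsg, plus the raw bytes, leaving the
  // nested emsg fields for the application to parse.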

ChrisN: That would be part of the further exploration, e.g., around MSE integration

RobSmith: May help gauge interest and use cases

Kaz: I think the two-step proposal might be OK. We might want to check the existing possible solutions and workarounds used by vendors and broadcasters, as basis for further discussion
… What has been done to cope with the existing problems?

ChrisN: Many media players have this
… Ready to present progress?

RobSmith: Ready to present a first draft, we're fairly happy that that's well designed. That would be a talking point for feedback on the next stage

ChrisN: I agree, the second stage, MSE integration for DASH emsg, is a lot of work. It seems dependent on whether browser vendors are interested, in principle, in a solution. Then, it's dependent on the group coming up with a proposed solution
… I see it similarly with SEI events
… Potential interest in WebCodecs
… But that's only a small part of a solution. More generally, it's not only MSE playback but also HLS playback on devices that don't support MSE
… I don't think there's time to document it all ahead of TPAC

Kaz: I'm ok with starting with the initial draft proposal, and have more discussion at TPAC

ChrisN: I was thinking of presenting this at the Media WG meeting. We may only have 30 minutes, not sure we'll have a full hour
… WG agenda should be finalised during this week

ChrisN: At this stage, I'd like to include the key points for SEI events in the presentation

Yuhao: What's the most important thing to explain?

ChrisN: We currently have two possible proposals. One is to use DataCue. The other is to define an event listener on the <video> element.
… Do we need both? Which is most useful?
… Or would an integration with requestVideoFrameCallback be more useful (e.g., if the metadata is per video frame)?
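
[Scribe note: a rough sketch of the surfaces being compared; the 'seimessage' event name and any SEI field on the frame metadata are purely illustrative assumptions, while requestVideoFrameCallback itself exists today as a WICG API.]

  const video = document.querySelector('video');

  // Option A (illustrative only): a dedicated listener on the <video> element,
  // roughly in the spirit of the video-sei-event explainer.
  video.addEventListener('seimessage', (e) => {
    // e.data would carry the SEI payload bytes (assumed shape)
  });

  // Option B: requestVideoFrameCallback fires once per presented frame with
  // timing metadata (mediaTime, presentedFrames, ...); surfacing SEI here
  // would mean extending that metadata, which is not specified today.
  function onFrame(now, metadata) {
    // per-frame work goes here
    video.requestVideoFrameCallback(onFrame);
  }
  video.requestVideoFrameCallback(onFrame);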

Yuhao: On the per-frame question, it needs to be limited. I think DataCue isn't suitable for this case
… The listener can trigger quickly. How do we limit it? The listener function would run at a high frequency, which is harmful for the application
… The SEI proposal is not for this use case; it's not a high-accuracy event
… On the question of DataCue, apart from the per-frame case, is it suitable?

ChrisN: That may depend on the latency we can achieve. Ideally you need to see the event some time before playback reaches the time at which it needs to trigger

Yuhao: Is it not designed for low latency scenarios?

ChrisN: It may not be. Because the SEI event is instantaneous, it has zero duration. In that case it may not be visible from a cuechange handler
… The only way to handle instantaneous DataCues is to use the onenter event, and you can only use that if the event has been created already
… If the web app creates the DataCue, it's simple. The web app can assign the onenter handler at the same time it creates the cue, and it will be triggered at the right time.
… If the UA creates cue, how does the web app see the cue before it must be triggered, so that it can assign the onenter handler?
… This is the potential difficulty I see
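
[Scribe note: a minimal sketch of the app-created case described above, using VTTCue since DataCue is not shipped cross-browser; seiTime, payloadText and handleSei are hypothetical placeholders.]

  const video = document.querySelector('video');
  const track = video.addTextTrack('metadata', 'sei');

  // App-created case: the onenter handler is attached before the cue can
  // become active, so an instantaneous (zero-duration) cue can be handled,
  // assuming browsers fire enter/exit for zero-duration cues (to be tested).
  const cue = new VTTCue(seiTime, seiTime, payloadText);
  cue.onenter = () => handleSei(cue.text);
  track.addCue(cue);

  // UA-created case (the open question): the app would need to see the cue
  // early enough, before its start time, to attach onenter at all.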

Yuhao: For zero duration events, are onenter and onexit suitable?

ChrisN: I recommend we test in browsers. But my understanding is that browsers should trigger the onenter and onexit events, in the right order
… But you may not see the event if you use the cuechange event, because of the delay

ChrisN: The delay comes from the firing of the cuechange event. When you handle cuechange, the event does not include the list of cues that triggered the event.
… Instead, your application has to query the textTrack.activeCues list
… activeCues gives the list of cues that overlap the current playback position
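
[Scribe note: a minimal sketch of the cuechange pattern just described; handleCue is a hypothetical application function.]

  const video = document.querySelector('video');
  const track = video.textTracks[0]; // whichever track carries the cues

  track.addEventListener('cuechange', () => {
    // The event carries no cue list; the app queries activeCues, which holds
    // the cues overlapping the current playback position at this moment.
    const cues = track.activeCues;
    for (let i = 0; i < cues.length; i++) {
      handleCue(cues[i]);
    }
  });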

Yuhao: So zero-duration cues will exist in activeCues?

ChrisN: We should test this, because they may not

Yuhao: If we add zero duration cue support, and the cuechange can be triggered, what's the problem?

ChrisN: That could be a potential solution, but I think activeCues is intended for non-zero durations, so a zero-duration cue doesn't meet the definition of an active cue

Yuhao: They're not in an active status

ChrisN: Yes

Yuhao: What if we made the SEI duration match the frame duration, e.g., 33 ms for 30 fps?
… I think that's fair
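
[Scribe note: the arithmetic behind this suggestion, as a small sketch; frameRate, seiTime and payloadText are assumed to be known to the application.]

  const frameRate = 30;                  // 1/30 s ≈ 33 ms; at 60 fps, ≈ 16.7 ms
  const frameDuration = 1 / frameRate;
  const cue = new VTTCue(seiTime, seiTime + frameDuration, payloadText);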

ChrisN: The other consideration is how often the web app can query activeCues
… If that frequency is low, you may miss the cue, because the duration is short
… How important is it that the web app sees every cue?

RobSmith: Two different use cases. If you use cuechange, you assume there's no latency between handling the event and reading the activeCues list
… If you want to know if a cue has fired, onenter and onexit must be fired, so you see the history

ChrisN: TextTrackCue is designed for subtitles, which have much longer durations, so those cues are unlikely to be missed
… Making it handle short duration cues is different

RobSmith: For WebVMT, reporting temperature data synchronised to video, it's only measured at discrete times, but you want a continuous measurement
… Draw a line between the measurements. Or other examples, like counting number of people in an image
… That would be an instantaneous example, associated with the frame, but interpolation isn't appropriate
… There's a way to do it in WebVMT, use an instantaneous cue to interrupt an unbounded cue, which represents an ongoing measurement
… Could publish guidance on how the tools are expected to be used

ChrisN: For SEI, does it matter if the application misses a single event?

Yuhao: Sometimes, it's important. For different use cases, the importance is not the same. For example, using SEI to trigger opening something. Here it's important
… Sometimes the payload is just for sync between video stream and position information for rendering something
… In this case, if we lost one or two frames, it doesn't matter
… So it's important to be able to support both kinds of scenario

RobSmith: Are particular kinds of SEI events always important, and others always not important?

Yuhao: So users define the meaning of the SEI, whether it's important or not

ChrisN: Is that definition intrinsic to the SEI event data itself, or is it defined by the application?

Yuhao: So having a spec where the SEI information carries some indication, so the browser knows how to handle it, is a good idea
… A configuration API for users. Another question is about production, generating SEI with this signalling
… It's the provider's problem, not the consumer's problem
… The provider is producing content not only for the web app but also for other apps, so they'd need to modify the SEI structure. It depends on how we define the SEI data structure so as not to affect other applications
… It's a good point, for the design

ChrisN: Don't need to solve right now

Next steps

ChrisN: For SEI I think we need a more detailed description, that captures the considerations we discussed

Yuhao: I'm wondering if we need a design, defining the API and behaviour?

ChrisN: Not at this stage, I would focus on describing the user scenario and the limitations of the browsers that exist today
… Maybe also consider alternative approaches

Kaz: I agree. Yuhao, from your viewpoint, most of the points are clear, but for W3C standardisation activity, we need to describe the detail so that we can explain ideas to others, and get good feedback

ChrisN: My recommendation is to take the notes from this meeting, and the notes from our previous discussion. Everything is there, but it's not in the explainer document so far

Yuhao: I updated the explainer, just to extend use cases. But it doesn't include the problems we discussed
… Which part of the problems?

ChrisN: Describe the current solution you have
… Is this something you currently cannot do at all? Or is it something you can do, but with expensive workarounds?
… Also, look at the existing API support in browsers (in web specifications), and describe their limitations
… For example, the TextTrackCue limitation that we discussed today

Yuhao: Include DataCue in that?

ChrisN: Yes

Yuhao: So I could extend the explainer to cover solutions, WebCodecs, DataCue, and then include limitations and use cases

ChrisN: We also have this report, we already published: https://www.w3.org/TR/media-timed-events/
… So you could refer to this, where you see an issue in common
… It's possible you don't need to write from scratch
… This report does not cover your use case exactly, and it does not cover SEI technology
… But hopefully it's helpful

ChrisN: I'll continue to prepare the presentation, and I'll report back on TPAC outcomes
… I'm hoping you can also join too
… I'll share the meeting details later
… Date for next meeting, to be confirmed

[adjourned]

Minutes manually created (not a transcript), formatted by scribe.perl version 192 (Tue Jun 28 16:55:30 2022 UTC).