Meeting minutes
SEI events
Chris: Welcome. Topic for today is SEI metadata proposal and possible alignment with our proposed DataCue API
… DataCue allows webapps to put media timed events on the timeline: https://
… Here is the video SEI event proposal: https://
… Limitation on iOS is that you don't get access to the SEI event information
… In browsers that support MSE, it's possible to parse the media segments to extract the NAL units and then get the SEI data
… We see this in media player libraries such as dash.js, hls.js, and video.js. They all have the ability to parse 608 and 708 format captions, which are contained in the SEI data
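[Scribe note: the segment parsing described above can be sketched as follows. This is a hypothetical helper for length-prefixed AVC samples as stored in MP4 tracks, not code taken from dash.js, hls.js, or video.js.]

```javascript
// Sketch: scan one length-prefixed AVC sample (MP4-style, not Annex B)
// for SEI NAL units (H.264 nal_unit_type 6). Illustrative names only.
function extractSeiNals(sample, lengthSize = 4) {
  const seiPayloads = [];
  let offset = 0;
  while (offset + lengthSize <= sample.length) {
    // Read the big-endian NAL unit length prefix.
    let nalLength = 0;
    for (let i = 0; i < lengthSize; i++) {
      nalLength = (nalLength << 8) | sample[offset + i];
    }
    offset += lengthSize;
    if (nalLength === 0 || offset + nalLength > sample.length) break;
    const nalType = sample[offset] & 0x1f; // low 5 bits of the NAL header
    if (nalType === 6) {
      // SEI: keep the payload bytes after the 1-byte NAL header.
      seiPayloads.push(sample.subarray(offset + 1, offset + nalLength));
    }
    offset += nalLength;
  }
  return seiPayloads;
}
```

[A real player would additionally strip emulation prevention bytes and parse the SEI payload type and size fields.]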
… We may be in a similar position with SEI data as we are with DASH emsg, where it becomes a performance question
… Is the performance cost of JS or WASM parsing to extract the timed metadata sufficient to motivate browser-native API support?
… Performance data may help to make the case for such an API
… Otherwise the argument will be to do it in JS and WASM
… How to approach, given the iOS limitations? Talk directly with the WebKit developers, make a standards proposal?
Nigel: Interesting that performance is the argument. It should be about the need for an interoperable standard for how things behave
… Not sure on the level of need, but if there is a common need to expose data in a standardised way, that should drive standardisation
Rob: Can't comment on SEI, but the other justification for DataCue is to make metadata available in a more standard way
… There's redundant overhead for metadata, for text styling. Depends also on volume of data. If there's a continuous stream of data, it could have significant overhead
Chris: I've heard the performance argument from Mozilla in terms of WebCodecs
… WebCodecs is a lower-level API, so apps can build features on top; less need for standardising at the browser level
Chris: SEI explainer has two use cases: subtitle rendering, and facial detection
Yuhao: In real time communication, we'll produce a live stream and the media server will mark the active speaker and put data in SEI
… When we look at the recorded video, we can extract SEI information to see the active speaker and where they are
… In real time communication the content will be checked in case something illegal is said
… That's another use case
… My company is contributing to WebRTC, we have users using our WebRTC system. The media server will use the information and produce a livestream to people who'll check the video
… The SEI information could be about the user, the speaker
… Another use case: in the background system, the user will rewind the video to check some part again. In this case we need to keep the SEI event data
… Without something like DataCue, we'd need to keep it outside the video somewhere
… When the user seeks to a previous position we'd match up the time with the SEI data again. Being able to see the cue in the timeline would be helpful
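[Scribe note: the re-matching on seek described above amounts to keeping SEI records sorted by presentation time and looking up the most recent one at or before the seek position. A minimal sketch, with illustrative names not taken from any spec:]

```javascript
// Sketch: store SEI records outside the video so they can be re-matched
// after a seek. Records are kept sorted by presentation time.
class SeiTimeline {
  constructor() { this.records = []; }
  add(time, data) {
    // Insert while keeping the array sorted by time.
    let i = this.records.length;
    while (i > 0 && this.records[i - 1].time > time) i--;
    this.records.splice(i, 0, { time, data });
  }
  // Binary search: most recent record at or before the seek position.
  latestAt(seekTime) {
    let lo = 0, hi = this.records.length - 1, best = null;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (this.records[mid].time <= seekTime) { best = this.records[mid]; lo = mid + 1; }
      else hi = mid - 1;
    }
    return best;
  }
}
```

[With DataCue on the timeline, the browser would do this bookkeeping instead of the application.]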
Chris: WebCodecs editors may be open to adding SEI support, where the SEI data is attached to an output VideoFrame
… Other scenario is playing video with a video element, either with MSE or native HLS support
… This doesn't use WebCodecs at all
Chris: Nigel, any discussions around APIs for surfacing 608 and 708 captions?
Nigel: No, TTWG focuses on data formats rather than APIs, and haven't heard interest expressed in that
… If you're exposing data in arbitrary formats that a web player may not understand, there's an argument for converting it, e.g., populating a TextTrackCue
Chris: Trying to get to the bottom of what we'd actually need to propose
Nigel: Legacy formats like 608/708 have in-built issues, but they continue to be used
Chris: So not good as motivating examples
Yuhao: With the video element and MSE, we could also use WebCodecs alongside MSE
… If we have a DASH or HLS stream, parse the container (TS or MP4) and extract the AVC NAL units frame by frame
… On each frame we can construct an encoded video chunk that includes the SEI in the NAL
… So we need to provide access to the SEI in the VideoFrame
… With WebCodecs there can be several frames of delay in the decoder, as frames may be buffered. I think we need this information in both EncodedVideoChunk and VideoFrame
… If we don't want to play the video chunk with MSE, we want to render the video frame manually
… In this case we need the SEI information to render. Don't then need additional code for synchronisation, just render both together
… For EncodedVideoChunk, if you can play with MSE, we get the information from the AVC NAL units. It would make it easier for users, and is a good way to get the information
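[Scribe note: the decoder-delay problem above means SEI parsed from an EncodedVideoChunk must be looked up again by timestamp when the matching VideoFrame is emitted. A sketch of that bookkeeping only; no actual WebCodecs calls, and the class name is illustrative:]

```javascript
// Sketch: correlate SEI parsed at enqueue time with the decoded frame
// that eventually comes out of the decoder's output callback, keyed by
// the chunk/frame timestamp (VideoDecoder preserves timestamps).
class SeiCorrelator {
  constructor() { this.pending = new Map(); } // timestamp -> SEI payloads
  // Call when queueing a chunk into the decoder.
  onChunk(timestampUs, seiPayloads) {
    if (seiPayloads.length) this.pending.set(timestampUs, seiPayloads);
  }
  // Call from the decoder's output callback; returns SEI for that frame.
  onFrame(timestampUs) {
    const sei = this.pending.get(timestampUs) ?? [];
    this.pending.delete(timestampUs);
    return sei;
  }
}
```

[If SEI were exposed directly on VideoFrame, as suggested above, this map would be unnecessary.]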
Chris: I'd suggest adding more detail about these use cases in the explainer
… Maybe the explainer could cover both the WebCodecs and MSE needs?
Chris: With your SEI proposal, with DataCue, an application could receive an 'sei' event and construct a DataCue with the data
… And then the application uses DataCue onenter and onexit events to act in response to the SEI?
Chris: I can see a couple of options of how to use DataCue, depending on if the browser or the web application creates the DataCue
Yuhao: SEI standard doesn't include duration, so just belongs to the frame. The cue start and end time can be the same
… Would enter and exit events be triggered at the same time?
Chris: Yes, one would immediately follow the other
Yuhao: With video element timeupdate events, the event is only generated every 250 milliseconds
Chris: enter and exit events are not dependent on the timeupdate, so don't have the same limitation
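[Scribe note: the zero-duration behaviour discussed above can be illustrated with a toy dispatcher; this is not the browser's actual TextTrack event machinery, just a model of the observable order of events.]

```javascript
// Sketch: fire enter/exit for cues crossed between two playback times.
// For a zero-duration cue (startTime === endTime), exit immediately
// follows enter, independent of timeupdate granularity.
function fireCues(cues, prevTime, currentTime, log) {
  for (const cue of cues) {
    // A cue is crossed if playback moved past its start time.
    if (cue.startTime > prevTime && cue.startTime <= currentTime) {
      log.push(["enter", cue.startTime]);
      if (cue.endTime <= currentTime) log.push(["exit", cue.endTime]);
    }
  }
}
```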
Chris: Suggest updating the explainer to add more detail on the video playback use cases with MSE, and also the RTC use case
… And to add information on timing accuracy requirements
… Also consider WebCodecs
… Having example code is good, but first want to have a good understanding of the use case and requirement
Next meeting
Chris: Scheduled April 18
… Happy to talk more before then
[adjourned]