15:55:07 RRSAgent has joined #me
15:55:08 logging to https://www.w3.org/2022/03/21-me-irc
15:55:12 Zakim has joined #me
15:55:21 Meeting: Media Timed Events TF
15:56:02 Agenda: https://www.w3.org/events/meetings/25f35a2b-0c9e-4c3a-b0e2-b5135ac9b1a7
15:56:09 scribe: cpn
16:00:04 xfq has joined #me
16:01:05 present: Chris_Needham, Kazuyuki_Ashimura
16:01:17 RobSmith has joined #me
16:02:43 present: Fuqiao_Xue, Amber_Ryan, Rob_Smith
16:05:24 present+ Yuhao_Fu
16:06:22 nigel has joined #me
16:07:58 present+ Nigel_Megitt
16:08:56 Chris: Welcome. The topic for today is the SEI metadata proposal and possible alignment with our proposed DataCue API
16:10:07 ... DataCue allows web apps to put media timed events on the timeline: https://github.com/WICG/datacue/blob/main/explainer.md
16:11:02 ... Here is the video SEI event proposal: https://github.com/leonardoFu/video-sei-event
16:12:09 ... A limitation on iOS is that you don't get access to the SEI event information
16:12:42 ... In browsers that support MSE, it's possible to parse the media segments to extract the NAL units and then get the SEI data
16:13:32 ... We see this in media player libraries such as dash.js, hls.js, and video.js. They all have the ability to parse 608 and 708 format captions, which are carried in the SEI data
16:14:36 ... We may be in a similar position with SEI data as we are with DASH emsg, where it becomes a performance question
16:15:22 ... Is the performance cost of JS or WASM parsing to extract the timed metadata sufficient to motivate native browser API support?
16:15:48 ... Performance data may help to make the case for such an API
16:16:21 ... Otherwise the argument will be to do it in JS and WASM
16:18:17 ... How should we approach this, given the iOS limitations? Talk directly with the WebKit developers, or make a standards proposal?
16:18:40 Nigel: Interesting that performance is the argument. It should be about the need for an interoperable standard for how things behave
16:19:03 q+
16:19:05 ... Not sure on the level of need, but if there is a common need to expose data in a standardised way, that should drive standardisation
16:19:11 ack R
16:20:32 Rob: I can't comment on SEI, but the other justification for DataCue is to make metadata available in a more standard way
16:21:34 ... There's redundant overhead for metadata, for text styling. It also depends on the volume of data. If there's a continuous stream of data, it could have significant overhead
16:22:17 Chris: I've heard the performance argument from Mozilla in terms of WebCodecs
16:24:13 ... WebCodecs is a lower-level API, so apps can build features on top, with less need for standardising at the browser level
16:26:00 Chris: The SEI explainer has two use cases: subtitle rendering and facial detection
16:26:40 Yuhao: In real-time communication, we'll produce a live stream and the media server will mark the active speaker and put data in SEI
16:27:22 ... When we look at the recorded video, we can extract the SEI information to see the active speaker and where they are
16:27:55 ... In real-time communication the content will be checked in case something illegal is said
16:28:03 ... That's another use case
16:33:00 ... My company is contributing to WebRTC, and we have users using our WebRTC system. The media server will use the information and produce a livestream for people who'll check the video
16:33:25 ... The SEI information could be about the user, the speaker
16:34:23 ... Another use case is when the user rewinds the video. In the background system, the user will rewind the video to check some part again. In this case we need to keep the SEI event data
16:34:41 ... Without something like DataCue, we'd need to keep it outside the video somewhere
16:35:17 ... When the user seeks to the previous position we'd match up the time with the SEI data again. Being able to see the cue in the timeline would be helpful
16:37:00 Chris: The WebCodecs editors may be open to adding SEI support, where the SEI data is attached to an output VideoFrame
16:37:44 ... The other scenario is playing video with a video element, either with MSE or native HLS support
16:37:55 ... This doesn't use WebCodecs at all
16:40:22 Chris: Nigel, have there been any discussions around APIs for surfacing 608 and 708 captions?
16:40:47 Nigel: No, the TTWG focuses on data formats rather than APIs, and I haven't heard interest expressed in that
16:41:34 ... If you're exposing data in arbitrary formats that a web player may not understand, there's an argument for converting it, e.g., populating a TextTrackCue
16:42:45 Chris: I'm trying to get to the bottom of what we'd actually need to propose
16:44:50 Nigel: Legacy formats like 608/708 have in-built issues, but they continue to be used
16:45:04 Chris: So not good as motivating examples
16:47:23 Yuhao: With the video element and MSE, we could use WebCodecs with MSE
16:48:02 ... If we have a DASH or HLS stream, pass the container TS or MP4, and extract the AVC NAL units frame by frame
16:49:31 ... For each frame we can construct an encoded video chunk that includes the SEI in the NAL
16:50:28 ... So we need to provide access to it in the VideoFrame
16:51:24 ... With WebCodecs there can be several frames of delay in the decoder, if frames are cached. I think we need this information in EncodedVideoChunk and in VideoFrame
16:51:43 ... If we don't want to play the video chunk with MSE, we want to render the video frame manually
16:52:24 ... In this case we need the SEI information to render. We then don't need additional code for synchronisation, just render both together
16:53:03 ... For EncodedVideoChunk, if you can play with MSE and get the information from the AVC NAL units, it would make it easier for users, a good way to get the information
16:53:47 Chris: I'd suggest adding more detail about these use cases in the explainer
16:55:14 ... Maybe the explainer could cover both the WebCodecs and MSE needs?
16:56:44 Chris: With your SEI proposal, with DataCue, an application could receive an 'sei' event and construct a DataCue with the data
16:58:07 ... And then the application uses the DataCue onenter and onexit events to act in response to the SEI?
17:01:43 Chris: I can see a couple of options for how to use DataCue, depending on whether the browser or the web application creates the DataCue
17:02:17 Yuhao: The SEI standard doesn't include a duration, so the data just belongs to the frame. The cue start and end time can be the same
17:02:49 ... Would the enter and exit events be triggered at the same time?
17:02:56 Chris: Yes, one would immediately follow the other
17:04:01 Yuhao: The video element's timeupdate event is only generated every 250 milliseconds
17:05:54 Chris: The enter and exit events are not dependent on timeupdate, so they don't have the same limitation
17:07:25 Chris: I suggest updating the explainer to add more detail on the video playback use cases with MSE, and also the RTC use case
17:07:35 ... And to add information on timing accuracy requirements
17:07:59 ... Also consider WebCodecs
17:09:07 ... Having example code is good, but first we want to have a good understanding of the use cases and requirements
17:09:28 Topic: Next meeting
17:09:34 Chris: Scheduled for April 18
17:09:58 ... Happy to talk more before then
17:11:35 [adjourned]
17:11:42 rrsagent, draft minutes
17:11:42 I have made the request to generate https://www.w3.org/2022/03/21-me-minutes.html cpn
17:11:47 rrsagent, make log public
19:10:04 Zakim has left #me
22:12:16 Karen has joined #ME
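
[Editor's sketch] The JS-side SEI extraction Chris describes (parsing MSE media segments to get NAL units and then the SEI data, as dash.js/hls.js do for captions) can be sketched roughly as below. This is a minimal illustration, not code from any of those libraries; it assumes the app already has Annex B framed H.264 bytes (e.g. demuxed from a TS segment) and, for brevity, it skips emulation-prevention byte (00 00 03) removal, which a real parser must do.

```javascript
// Split an Annex B byte stream into NAL unit payloads (start codes stripped).
function splitNalUnits(bytes) {
  const nals = [];
  let start = -1;
  for (let i = 0; i + 2 < bytes.length; i++) {
    // Detect a 00 00 01 start code (a preceding 00 makes it the 4-byte form).
    if (bytes[i] === 0 && bytes[i + 1] === 0 && bytes[i + 2] === 1) {
      if (start >= 0) {
        let end = i;
        if (end > start && bytes[end - 1] === 0) end--; // trim 4-byte form
        nals.push(bytes.subarray(start, end));
      }
      start = i + 3;
      i += 2;
    }
  }
  if (start >= 0) nals.push(bytes.subarray(start));
  return nals;
}

// Extract SEI messages from H.264 SEI NAL units (nal_unit_type 6).
// payloadType and payloadSize use the spec's 0xFF-extension coding.
function extractSeiMessages(bytes) {
  const messages = [];
  for (const nal of splitNalUnits(bytes)) {
    if ((nal[0] & 0x1f) !== 6) continue; // not an SEI NAL unit
    let i = 1;
    while (i < nal.length && nal[i] !== 0x80) { // 0x80 = RBSP trailing bits
      let payloadType = 0;
      while (nal[i] === 0xff) { payloadType += 255; i++; }
      payloadType += nal[i++];
      let payloadSize = 0;
      while (nal[i] === 0xff) { payloadSize += 255; i++; }
      payloadSize += nal[i++];
      messages.push({ payloadType, payload: nal.subarray(i, i + payloadSize) });
      i += payloadSize;
    }
  }
  return messages;
}
```

For example, a segment containing one SEI NAL (type 6, payloadType 5 = user_data_unregistered) and one slice NAL yields a single message; the slice is ignored. It is this per-segment scanning cost, repeated in JS/WASM by every player library, that the performance question above is about.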
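
[Editor's sketch] Yuhao's WebCodecs point — that the decoder can output frames several frames after the chunk carrying the SEI, so the data is wanted on both EncodedVideoChunk and VideoFrame — can be worked around today on the app side by keying extracted SEI on the chunk timestamp and re-associating it when the matching frame is output. The class name below is hypothetical; a real VideoDecoder's output callback would call `onFrame` with `frame.timestamp`.

```javascript
// App-side association of SEI with decoded frames across decoder delay.
class SeiAssociator {
  constructor() {
    this.pending = new Map(); // chunk timestamp -> SEI messages
  }
  // Record SEI extracted from a chunk before it is sent to the decoder.
  onChunk(timestamp, seiMessages) {
    if (seiMessages.length) this.pending.set(timestamp, seiMessages);
  }
  // Look up (and release) the SEI for an output frame by its timestamp.
  onFrame(timestamp) {
    const sei = this.pending.get(timestamp) ?? [];
    this.pending.delete(timestamp);
    return sei;
  }
}
```

This keeps rendering code free of extra synchronisation logic, matching the point at 16:52:24: when the frame arrives, its SEI arrives with it, and both are rendered together. Native SEI support on VideoFrame would make this bookkeeping unnecessary.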
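
[Editor's sketch] The zero-duration cue behaviour agreed at 17:02:56 — for an SEI cue whose startTime equals its endTime, enter fires and exit immediately follows — can be modelled in plain JS. DataCue itself is not assumed to exist here; this only simulates the event ordering a UA would produce, and the `step` driver stands in for the UA's internal playhead checks (which, per Chris, are not limited to timeupdate's 250 ms granularity).

```javascript
// Simulation of cue enter/exit ordering for zero-duration cues.
class SimulatedCue {
  constructor(startTime, endTime, value) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.value = value; // e.g. the parsed SEI payload
    this.onenter = null;
    this.onexit = null;
  }
}

class SimulatedTrack {
  constructor() { this.cues = []; }
  addCue(cue) {
    cue._entered = false;
    cue._exited = false;
    this.cues.push(cue);
  }
  // Advance the playhead. A cue with startTime === endTime fires enter and
  // then exit within the same step, enter first — never missed entirely.
  step(time) {
    for (const cue of this.cues) {
      if (!cue._entered && time >= cue.startTime) {
        cue._entered = true;
        cue.onenter?.();
      }
      if (cue._entered && !cue._exited && time >= cue.endTime) {
        cue._exited = true;
        cue.onexit?.();
      }
    }
  }
}
```

A cue at t=1.0s with zero duration fires 'enter' then 'exit' back to back once the playhead passes 1.0, which is the behaviour an application acting on per-frame SEI cues would rely on.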