15:52:57 RRSAgent has joined #me
15:52:57 logging to https://www.w3.org/2022/01/17-me-irc
15:53:01 Zakim has joined #me
15:53:11 Meeting: Media Timed Events
15:53:16 Chair: Chris
15:53:20 Agenda: https://www.w3.org/events/meetings/cdfcd0dc-e3be-4967-a4f4-f7a814f890b9
15:57:27 takio has joined #me
15:59:18 present: Chris_Needham, Fuqiao_Xue, Yuhao_Fu
16:01:06 RobSmith has joined #me
16:01:21 present+ Takio_Yamaoka
16:01:32 present+ Rob_Smith
16:02:01 present+ Xabier_Rodriquez_Calvar
16:02:26 present+ Kaz_Ashimura
16:04:48 present+ Karl_Carter, Amber
16:05:03 present+ Amber_Ryan
16:05:06 present- Amber
16:05:41 rrsagent, make log public
16:05:58 rrsagent, draft minutes
16:05:58 I have made the request to generate https://www.w3.org/2022/01/17-me-minutes.html kaz
16:06:30 scribenick: cpn
16:07:38 Topic: Introduction
16:07:40 agenda: https://lists.w3.org/Archives/Public/public-web-and-tv/2022Jan/0001.html
16:07:51 Chris: This is the Media Timed Events TF call
16:09:30 port6665 has joined #me
16:11:05 ... Welcome to Amber and Karl
16:11:13 Topic: DataCue review
16:11:18 Amber-SN has joined #me
16:11:58 Chris: We had some feedback from C2PA, who were looking at the DataCue API as a potential solution
16:12:33 ... The most recent feedback is that they aren't looking at the DASH emsg event as the carrier for their metadata
16:12:44 Karl_SN has joined #me
16:13:31 ... The certificate check that was proposed may no longer be a requirement
16:14:12 ... Related issue: https://github.com/WICG/datacue/issues/21
16:14:30 q+
16:16:11 ... The outcome is that we need to review the explainer, and possibly review the requirements related to encryption
16:16:44 Rob: Reading that issue, can you explain the use case?
16:17:42 Chris: It's about demonstrating provenance: how content has been edited along the way
16:17:51 -> https://github.com/WICG/datacue/blob/main/explainer.md Explainer
16:18:07 Rob: I have a similar use case: dashcam evidence for the police, where provenance is important
16:18:32 ... The public can submit footage to the police; it must be as-captured and not edited
16:18:35 ... Not sure how they determine that
16:18:58 ... With a sidecar file like WebVMT it could be more difficult
16:20:22 q?
16:20:22 Chris: It does sound interesting to explore. Here's the info: https://c2pa.org/
16:20:26 ack r
16:20:58 Rob: Another use case is evidence when an event is reported using a smartphone. Is it authentic, has metadata been constructed or genuinely captured?
16:21:18 ... Disaster relief, state crime, etc.
16:22:02 q?
16:22:24 Chris: I haven't had much time to contribute to the explainer or draft spec
16:23:10 ... We talked last time about separating the DataCue part from the sourcing of the in-band emsg part
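
For context on the API under review, here is a minimal sketch of exposing timed metadata through DataCue, assuming a DataCue(startTime, endTime, value, type) constructor along the lines sketched in the WICG explainer linked above. The exact shape is still under discussion, and the payload and type string below are invented for illustration.

    // Sketch only: DataCue is a proposal, not a shipped feature; the payload
    // and the 'org.example.provenance' type string are invented for illustration.
    const video = document.querySelector('video');
    const track = video.addTextTrack('metadata', 'timed metadata');
    track.mode = 'hidden';

    // Example: attach an application-defined metadata payload to a time range.
    track.addCue(new DataCue(10.0, 15.0, { manifestRef: 'example' }, 'org.example.provenance'));

    track.addEventListener('cuechange', () => {
      // TextTrackCueList is indexed, not iterable, so use a plain loop.
      for (let i = 0; i < track.activeCues.length; i++) {
        const cue = track.activeCues[i];
        console.log('active metadata cue', cue.type, cue.value);
      }
    });
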
16:23:43 Topic: SEI event handling
16:24:20 -> https://github.com/leonardoFu/video-sei-event/blob/main/explainer.md video SEI event Explainer
16:24:33 Chris: This was presented at a previous MEIG meeting: https://www.w3.org/2021/12/07-me-minutes.html
-> https://www.w3.org/2011/webtv/wiki/images/c/cf/Video-SEI-Event-Proposal.pdf Yuhao's slides for the Dec-7 meeting
16:24:57 Yuhao: I'm in the web media team at ByteDance. There are many scenarios where we need SEI information
16:25:38 ... Broadcasters use software to publish the stream. Events that describe when something happened go into the RTMP stream as SEI events
16:25:42 (SEI stands for Supplemental Enhancement Information of H.264.)
16:26:05 ... The player receives the SEI information, parses it, and synchronises it between the demuxer and the video current time
16:26:31 ... I raised the proposal to see if we can get SEI information directly from the video element, to make it easier to synchronise SEI information with the video frame
16:26:50 ... Also to make it easier for the demuxer, which then doesn't need to parse manually
16:26:54 q?
16:28:13 Chris: In this group we've looked at DASH emsg, which is in the media container
16:28:36 ... SEI events are in the media bitstream rather than the container
16:28:55 Yuhao: We use SEI commonly in China; we produce different live stream formats: DASH, HLS, FLV
16:29:31 ... It's in the AVC or HEVC stream NAL unit, so there's no need to transfer the data, it can be put in any container
16:30:02 Chris: How does it interact with EME?
16:31:59 ... Can the SEI information be extracted before the media enters the CDM?
16:32:33 Yuhao: In most scenarios, the information in SEI is simple, it doesn't need EME
16:32:42 q+
16:33:34 ack ta
16:34:04 Takio: Should timing be before or after decoding? There's a decoding order and a presentation order
16:34:45 ... If you need the message before decoding, we should recommend at which timing SEI events are fired
16:35:51 q+
16:35:53 ... For emsg, the MP4 container describes the timestamp for decoding. The video stream doesn't present any timestamp, so we could make clear the use case and requirement for firing the message
16:36:25 Kaz: Thank you for the proposal and discussion. Based on the discussion in the December meeting, it could be useful to clarify the use cases
16:36:40 ... and describe the timing of decoding and the integration with EME
16:37:20 ack k
-> https://github.com/leonardoFu/video-sei-event/blob/main/explainer.md video SEI event Explainer on GitHub
16:37:22 Chris: Let's capture questions in GitHub, and use that to update the explainer
16:38:44 ... 1. Interaction with EME
16:39:09 ... 2. Timing of event firing, decode or presentation order
16:40:07 ... You describe the bullet chat use case, where information is used to describe where overlays can be placed in the image
16:40:34 Yuhao: In China, SEI is used to describe the shape and position of the body, and in the player we make a transparent mask
Masking in Bullet Chatting -> https://w3c.github.io/danmaku/usecase.html#masking
16:40:52 Chris: So is the composition of the image done in the client?
16:41:36 ... So the client uses the metadata to place the overlaid content while playing the video
16:42:11 Yuhao: In iOS Safari we cannot get the stream content, we cannot demux, so it's not as accurate as we expect
16:42:37 Chris: Interesting from an implementation feasibility perspective
16:43:05 ... In this case do you simply give the video element the HLS manifest?
16:44:10 Yuhao: Yes
16:44:59 ... Another use case is WebRTC. We can solve the RTC problem with insertable streams, but if we can get SEI information from the video directly, it will be simpler
16:45:35 ... Getting the information from the video is a simpler solution
16:45:54 Chris: What are the synchronization requirements?
16:46:30 ... The video playback runs separately to the DOM
16:46:58 ... It can be difficult to synchronize changes to the DOM with the video frames
16:47:41 ... In the bullet chat use case, are there SEI messages for every frame?
16:48:08 Yuhao: For a 60fps video, maybe we just need a 15fps frequency of updates to the information
16:48:50 ... You may not be able to see the difference, and storing data at 60fps would use more bandwidth
16:49:29 Chris: How precise does the overlay rendering need to be, in relation to the video?
16:50:20 Yuhao: In most cases for playing video, we synchronise everything to currentTime. In most cases we don't need more precision
16:51:02 Chris: How many milliseconds accuracy, roughly?
16:52:05 Yuhao: In our scenarios we use 60 fps video, so 16 ms is the smallest unit, so 10 ms is maybe enough
16:53:08 ... Maybe we can use requestVideoFrameCallback. When a video frame is rendered the callback will trigger, and we can synchronize to that
16:53:40 ... https://wicg.github.io/video-rvfc/
16:54:22 q+
16:54:24 ... The callback includes the presentation time of the current frame, so we can use that to synchronize on the specific frame
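
As a concrete illustration of what Yuhao describes, here is a minimal sketch using requestVideoFrameCallback (https://wicg.github.io/video-rvfc/) to drive an overlay from per-frame metadata. lookUpCues and drawOverlay are hypothetical application functions standing in for the SEI-derived data lookup and the rendering code.

    // Sketch of per-frame synchronisation via requestVideoFrameCallback.
    function startOverlaySync(video) {
      const onFrame = (now, metadata) => {
        // metadata.mediaTime is the presentation timestamp (in seconds) of the
        // frame that was just presented, which is more precise than video.currentTime.
        const cues = lookUpCues(metadata.mediaTime);          // hypothetical lookup of SEI-derived data
        drawOverlay(cues, metadata.width, metadata.height);   // hypothetical rendering of the mask/overlay
        video.requestVideoFrameCallback(onFrame);             // re-register for the next frame
      };
      video.requestVideoFrameCallback(onFrame);
    }
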
16:54:38 ack rob
16:54:53 Rob: This relates to a problem I'm thinking about with perspective imagery
16:55:22 ... If we're drawing on video, it matters which frame we're looking at. If it's wrong it could be noticeable
16:55:37 ... Is there any overlap also with AR applications, e.g., WebXR?
16:56:08 ... The other problem is latency: how fast can you respond to a frame?
16:58:25 Chris: Good to look at rVFC, to see how our proposal fits
16:58:59 Rob: My use case is geo-pose: location and orientation (tilt, roll) related to geographic coordinates
16:59:27 ... The problem I have is that it's easy to sample location, e.g., every second. But orientation can change very quickly
16:59:47 ... So how fast do you need to sample it?
17:00:33 Chris: I also have questions about WebCodecs, e.g., related to https://github.com/w3c/webcodecs/issues/198
17:00:35 q+
17:01:22 ... Do we need a proposal for WebCodecs and also a proposal for HTML
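
To make the shape of an HTML-media-element-level proposal concrete: the following is purely hypothetical, since neither the event name nor the payload fields are specified anywhere in the explainer or any spec; it only illustrates the kind of surface being discussed, with the timing (decode vs. presentation order) and EME questions above still open.

    // Hypothetical only: 'seimessage', e.payloadType, e.payload and e.mediaTime
    // are invented names, not part of any spec or of the explainer.
    video.addEventListener('seimessage', (e) => {
      // e.payloadType / e.payload: SEI payload type and raw bytes (invented)
      // e.mediaTime: presentation time the payload applies to (invented)
      handleSeiPayload(e.payloadType, e.payload, e.mediaTime); // hypothetical app handler
    });
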