IRC log of me on 2022-01-17

Timestamps are in UTC.

15:52:57 [RRSAgent]
RRSAgent has joined #me
15:52:57 [RRSAgent]
logging to
15:53:01 [Zakim]
Zakim has joined #me
15:53:11 [cpn]
Meeting: Media Timed Events
15:53:16 [cpn]
Chair: Chris
15:57:27 [takio]
takio has joined #me
15:59:18 [cpn]
present: Chris_Needham, Fuqiao_Xue, Yuhao_Fu
16:01:06 [RobSmith]
RobSmith has joined #me
16:01:21 [cpn]
present+ Takio_Yamaoka
16:01:32 [cpn]
present+ Rob_Smith
16:02:01 [cpn]
present+ Xabier_Rodriquez_Calvar
16:02:26 [kaz]
present+ Kaz_Ashimura
16:04:48 [cpn]
present+ Karl_Carter, Amber
16:05:03 [kaz]
present+ Amber_Ryan
16:05:06 [kaz]
present- Amber
16:05:41 [kaz]
rrsagent, make log public
16:05:58 [kaz]
rrsagent, draft minutes
16:05:58 [RRSAgent]
I have made the request to generate kaz
16:06:30 [cpn]
scribenick: cpn
16:07:38 [cpn]
Topic: Introduction
16:07:51 [cpn]
Chris: This is the Media Timed Events TF call
16:09:30 [port6665]
port6665 has joined #me
16:11:05 [cpn]
... Welcome to Amber and Karl
16:11:13 [cpn]
Topic: DataCue review
16:11:18 [Amber-SN]
Amber-SN has joined #me
16:11:58 [cpn]
Chris: We had some feedback from C2PA, who were looking at DataCue API as a potential solution
16:12:33 [cpn]
... Most recent feedback is they aren't looking at the DASH emsg event as carrier for their metadata
16:12:44 [Karl_SN]
Karl_SN has joined #me
16:13:31 [cpn]
... The certificate check that was proposed may no longer be a requirement
16:14:12 [cpn]
... Related issue:
16:16:11 [cpn]
... Outcome is that we need to review the explainer, and possibly review the requirements related to encryption
16:16:44 [cpn]
Rob: Reading that issue, can you explain the use case?
16:17:42 [cpn]
Chris: It's about demonstrating the provenance, how content has been edited along the way
16:17:51 [kaz]
-> Explainer
16:18:07 [cpn]
Rob: I have a similar use case, dashcam evidence for the police, provenance is important
16:18:32 [cpn]
... Public can submit footage to the police, must b as-captured and not be edited
16:18:35 [cpn]
... Not sure how they determine that
16:18:52 [kaz]
s/must b/must be/
16:18:58 [cpn]
... With a sidecar file like WebVMT could be more difficult
16:20:22 [cpn]
Chris: It does sound interesting to explore. Here's the info:
16:20:26 [kaz]
ack r
16:20:58 [cpn]
Rob: Another use case is evidence when an event is reported using a smartphone. Is it authentic, has the metadata been constructed or genuinely captured?
16:21:18 [cpn]
... Disaster relief, state crime, etc
16:22:24 [cpn]
Chris: I haven't had much time to contribute to the explainer or draft spec
16:23:10 [cpn]
... We talked last time about separating the DataCue part from the sourcing of in-band emsg part
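As a rough illustration of the "sourcing of in-band emsg" part: a player has to walk the ISO BMFF boxes of each DASH segment to find the emsg boxes before it can surface them (e.g. as DataCue). The sketch below is illustrative only; it reads just the box framing (32-bit sizes, top-level boxes), and parsing the emsg payload itself (scheme_id_uri, timescale, etc., per ISO/IEC 23009-1) is omitted.

```javascript
// Minimal sketch: scan top-level ISO BMFF boxes for a given fourcc
// (e.g. 'emsg'). Function and field names are illustrative, not an API.
function findBoxes(bytes, fourcc) {
  // Accept a Uint8Array (use its underlying buffer) or an ArrayBuffer.
  const view = new DataView(bytes.buffer ?? bytes);
  const found = [];
  let offset = 0;
  while (offset + 8 <= view.byteLength) {
    const size = view.getUint32(offset); // box size, header included
    const type = String.fromCharCode(
      view.getUint8(offset + 4), view.getUint8(offset + 5),
      view.getUint8(offset + 6), view.getUint8(offset + 7));
    if (size < 8) break;                 // malformed box; bail out
    if (type === fourcc) found.push({ offset, size });
    offset += size;                      // step to the next top-level box
  }
  return found;
}
```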
16:23:43 [cpn]
Topic: SEI event handling
16:24:20 [kaz]
-> video SEI event Explainer
16:24:33 [cpn]
Chris: This was presented at a previous MEIG meeting:
16:24:57 [cpn]
Yuhao: I'm in the web media team at ByteDance. There are many scenarios where we need SEI information
16:25:38 [cpn]
... Broadcasters use software to publish the stream. Events to describe when something happened, go into RTMP stream as SEI event
16:25:42 [kaz]
(SEI stands for Supplemental Enhancement Information of H.264.)
16:26:05 [cpn]
... The player receives the SEI information and parses it, synchronises it, between demuxer and video current time
16:26:31 [cpn]
... I raised the proposal to see if we can get SEI information directly from the video element, to make it easier to synchronise SEI information with the video frame
16:26:50 [cpn]
... Also to make it easier for the demuxer, don't need to parse manually
16:28:13 [cpn]
Chris: In this group we've looked at DASH emsg which is in the media container
16:28:36 [cpn]
... SEI events are in the media bitstream rather than the container
16:28:55 [cpn]
Yuhao: We use SEI commonly, we produce different live stream formats: DASH, HLS, FLV
16:29:17 [kaz]
s/commonly/commonly in China/
16:29:31 [cpn]
... It's in the AVC or HEVC stream NAL unit, so there's no need to transfer the data, it can be put in any container
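To illustrate the manual parsing Yuhao mentions, here is a hedged sketch that scans an unencrypted H.264 Annex B byte stream for SEI NAL units (nal_unit_type 6). It is a simplification: it does not strip emulation-prevention bytes, and HEVC uses a different NAL header (SEI types 39/40) that this does not handle. The function name is illustrative.

```javascript
// Minimal sketch: find H.264 SEI NAL units (type 6) in an Annex B stream.
function findSeiNalUnits(bytes) {
  const units = [];
  let i = 0;
  while (i + 3 < bytes.length) {
    // Look for the 00 00 01 start code (a 4-byte 00 00 00 01 start code
    // is found one byte later, which is fine for locating the NAL unit).
    if (bytes[i] === 0 && bytes[i + 1] === 0 && bytes[i + 2] === 1) {
      const start = i + 3;
      // The NAL unit ends at the next start code, or the end of the buffer.
      let end = bytes.length;
      for (let j = start; j + 2 < bytes.length; j++) {
        if (bytes[j] === 0 && bytes[j + 1] === 0 &&
            (bytes[j + 2] === 1 || (bytes[j + 2] === 0 && bytes[j + 3] === 1))) {
          end = j;
          break;
        }
      }
      const nalType = bytes[start] & 0x1f; // low 5 bits of the NAL header
      if (nalType === 6) {                 // 6 = SEI in H.264
        units.push(bytes.slice(start + 1, end)); // SEI payload bytes
      }
      i = end;
    } else {
      i++;
    }
  }
  return units;
}
```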
16:30:02 [cpn]
Chris: How does it interact with EME?
16:31:59 [cpn]
... Can the SEI information be extracted before the media enters the CDM?
16:32:33 [cpn]
Yuhao: In most scenarios, the information in SEI is simple, it doesn't need EME
16:32:47 [kaz]
i|I'm in the web|-> Yuhao's slides for the Dec-7 meeting|
16:33:34 [kaz]
ack ta
16:34:04 [cpn]
Takio: Should timing be before or after decoding? There's a decoding order and a presentation order
16:34:45 [cpn]
... If you need the message before decoding, we should recommend at which timing SEI events should be fired
16:35:53 [cpn]
... For emsg, the MP4 container describes the timestamp for decoding. The video stream doesn't present any timestamp, so we could make clear the use case and requirement for firing the message
16:36:25 [cpn]
Kaz: Thank you for the proposal and discussion. Based on the discussion in the December meeting, it could be useful to clarify use cases
16:36:40 [cpn]
... and describe the timing of decoding and integration of EME
16:37:20 [kaz]
ack k
16:37:22 [cpn]
Chris: Let's capture questions in GitHub, use that to update the explainer
16:38:44 [cpn]
... 1. Interaction with EME
16:38:57 [kaz]
-> video SEI event Explainer on GitHub
16:39:09 [cpn]
... 2. Timing of event firing, decode or presentation order
16:39:25 [kaz]
s|-> video SEI event Explainer on GitHub||
16:39:40 [kaz]
i|Let's ca|-> video SEI event Explainer on GitHub|
16:40:07 [cpn]
... You describe the bullet chat use case, where information is used to describe where overlays can be placed in the image
16:40:34 [cpn]
Yuhao: In China, SEI is used to describe the shape and position of the body, and in the player we make a transparent mask
16:40:52 [cpn]
Chris: So is the composition of the image done in the client?
16:41:00 [xfq]
Masking in Bullet Chatting ->
16:41:36 [cpn]
... So the client uses the metadata to place the overlaid content while playing the video
16:42:00 [kaz]
s|Masking in Bullet Chatting ->||
16:42:09 [kaz]
i|So is the compos|Masking in Bullet Chatting ->|
16:42:11 [cpn]
Yuhao: In iOS Safari we cannot get the stream content, so we cannot demux, and it's not as accurate as we expect
16:42:37 [cpn]
Chris: Interesting from an implementation feasibility perspective
16:43:05 [cpn]
... In this case do you simply give the video element the HLS manifest?
16:44:10 [cpn]
Yuhao: Yes
16:44:59 [cpn]
... Another use case is WebRTC. We can solve the RTC problem with insertable streams. If we can get SEI information from the video directly, it will be simpler
16:45:35 [cpn]
... Getting the information from the video is a simpler solution
16:45:54 [cpn]
Chris: What are the synchronization requirements?
16:46:30 [cpn]
... The video playback runs separately to the DOM
16:46:58 [cpn]
... It can be difficult to synchronize changes to the DOM with the video frames
16:47:41 [cpn]
... In the bullet chat use case, are there SEI messages for every frame?
16:48:08 [cpn]
Yuhao: For a 60fps video, maybe we just need 15 fps frequency of updates to the information
16:48:50 [cpn]
... You may not be able to see the difference, storing data at 60fps would use more bandwidth
16:49:29 [cpn]
Chris: How precise does the overlay rendering need to be, in relation to the video?
16:50:20 [cpn]
Yuhao: In most cases for playing video, we synchronise everything to currentTime. In most cases we don't need more precision
16:51:02 [cpn]
Chris: How many milliseconds accuracy, roughly?
16:52:05 [cpn]
Yuhao: In our scenarios, we use 60 fps video, so 16 ms is the smallest unit, so 10 ms is maybe enough
16:53:08 [cpn]
... Maybe we can play with the requestVideoFrameCallback API. When a video frame is rendered the callback will trigger, and we can synchronize to that
16:54:24 [cpn]
... The callback includes the presentation time of the current frame, so we can use that to synchronize on the specific frame
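The synchronization approach described here can be sketched as a lookup of the SEI-derived cue whose time range covers the frame's presentation time (metadata.mediaTime in requestVideoFrameCallback). The cue shape and names below are illustrative, not a proposed API; the browser-driven part is shown only in comments since it needs a live video element.

```javascript
// Minimal sketch: pick the cue active at a given media time (seconds).
function activeCue(cues, mediaTime) {
  // cues: array of { start, end, data }, e.g. derived from SEI messages.
  for (const cue of cues) {
    if (mediaTime >= cue.start && mediaTime < cue.end) return cue;
  }
  return null; // no cue covers this frame
}

// In a browser this might be driven per rendered frame (not runnable here):
// function onFrame(now, metadata) {
//   const cue = activeCue(cues, metadata.mediaTime);
//   if (cue) drawOverlay(cue.data);            // e.g. the bullet-chat mask
//   video.requestVideoFrameCallback(onFrame);  // re-register for next frame
// }
// video.requestVideoFrameCallback(onFrame);
```

Since updates may arrive at 15fps against 60fps video, several consecutive frames would resolve to the same cue, which matches the bandwidth trade-off discussed above.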
16:54:38 [kaz]
ack rob
16:54:53 [cpn]
Rob: This relates to a problem I'm thinking about with perspective imagery
16:55:22 [cpn]
... If we're drawing on video, it matters which frame we're looking at. If it's wrong it could be noticeable
16:55:37 [cpn]
... Any overlap also with AR applications, e.g., WebXR?
16:56:08 [cpn]
... Other problem is latency, how fast can you respond to a frame
16:58:25 [cpn]
Chris: Good to look at rVFC, to see how our proposal fits
16:58:59 [cpn]
Rob: My use case is geo-pose, location and orientation (tilt, roll), related to geographic coordinates
16:59:27 [cpn]
... The problem I have is that it's easy to sample location, e.g., every second. But orientation can change very quickly
16:59:47 [cpn]
... So how fast do you need to sample it?
17:00:33 [cpn]
Chris: I also have questions about WebCodecs, e.g., related to
17:01:22 [cpn]
... Do we need a proposal for WebCodecs and also a proposal for HTML <video> based playback?
17:02:02 [cpn]
Kaz: Do we want to create a TF for this discussion, or continue within the MTE TF or main call?
17:02:51 [cpn]
Chris: Don't think we've figured out yet whether DataCue is the right solution for SEI, it may or may not be
17:03:06 [cpn]
... When should we meet next to continue the discussion?
17:03:40 [cpn]
... Can we raise questions in your GitHub repo?
17:04:01 [cpn]
Yuhao: Yes, that's OK
17:04:02 [kaz]
s/Can we/Yuhao, can we/
17:04:23 [cpn]
Chris: I'll do that
17:05:33 [cpn]
... And when should we schedule our next meeting?
17:06:08 [cpn]
... The next scheduled MTE call is February 21
17:08:17 [cpn]
... A future call we could discuss moving the proposal in GitHub to W3C space (probably WICG)
17:09:26 [cpn]
rrsagent, draft minutes
17:09:26 [RRSAgent]
I have made the request to generate cpn
17:09:31 [cpn]
rrsagent, make log public
17:09:40 [kaz]
rrsagent, draft minutes
17:09:40 [RRSAgent]
I have made the request to generate kaz
17:12:15 [kaz]
rrsagent, stop