13:44:55 logging to https://www.w3.org/2021/10/26-webrtc-irc
13:45:10 Slideset: https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0012/MEDIA-WEBRTC-10-26-2021.pdf
13:45:21 Meeting: Joint Media/Audio/WebRTC WG Meeting
13:45:32 Agenda: https://www.w3.org/2011/04/webrtc/wiki/TPAC_2021#Joint_WebRTC.2FMedia.2FAudio_WG_Meeting
14:02:10 present+ Mark_Foltz
14:02:25 scribenick: cpn
14:02:36 Topic: Introduction
14:02:42 Bernard: Welcome to the joint meeting
14:03:01 ... Slides are at https://docs.google.com/presentation/d/1XKNdYR0JWTtO1EIGu_sQy5rqfXMorrCKpu0UTX6kizQ
14:04:54 ... We'll talk about next generation APIs and areas to work on
14:05:34 ... We'll have time at the end for wrap-up and next steps
14:05:57 ... We're not trying to solve problems in this meeting, just identify problems
14:06:14 ... There may be issues we don't realise we have that have not been filed yet
14:06:23 Topic: Next Generation Media APIs
14:06:46 Bernard: Streaming and real-time communications evolved in silos
14:07:13 ... Real-time streaming could only support a modest audience, ~100s
14:07:29 ... The pandemic has been pivotal: transformation, user-driven innovation
14:07:50 ... What have you observed?
14:08:18 ... One application that summarises some of the trends is "together mode", which superimposes participants in a virtual experience
14:08:57 ... With large gatherings restricted, the goal was to include fans virtually
14:09:09 ... The video was processed for background removal and composited
14:09:33 ... Developers don't want to choose between streaming and real-time communication silos
14:09:47 ... Next-gen media APIs provide access through a single set of tools
14:10:08 ... Low-level building blocks such as capture, encode/decode, transport, and rendering
14:10:22 ... [Shows some of the APIs]
14:10:38 ... Capture APIs are in development in the WebRTC WG, encode/decode in the Media WG
14:10:55 ... WebTransport and WHATWG Streams, WASM
14:11:16 ... The APIs support multi-threading. Support for transferable media stream tracks was added
14:11:35 ... Things not on the list, but useful additions, include JS libraries for containerisation / decontainerisation
14:11:51 ... Some APIs are modelled on streams, others can be wrapped in a stream-like API
14:12:16 ... Allows use of special effects in the pipeline. MediaCapture-Transform API to convert a track to a stream of A/V frames
14:12:28 ... The transport packetizes and takes care of FEC
14:12:52 ... Encoded chunks are received, decoded, and presented with WebGL or WebGPU
14:13:02 ... [Pipeline model example]
14:13:27 ... Typically this runs in a Worker. The pipeThrough functions are implemented as TransformStreams
14:14:45 ... Can use any transport, WebTransport or WebSocket; can choose reliable and ordered, reliable and unordered, or unreliable and unordered
14:15:06 ... Does this story hang together? What are we missing?
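A minimal sketch of the pipeline model described above, running in a Worker; the effects helper, the encode TransformStream and the packetizing sink are illustrative names, not defined APIs:

```js
// In a Worker: camera track -> VideoFrames -> effects -> encode -> transport.
// `track` is a MediaStreamTrack transferred from the main thread.
const processor = new MediaStreamTrackProcessor({ track });

// Illustrative TransformStream applying an effect to each VideoFrame.
const effects = new TransformStream({
  async transform(frame, controller) {
    const processed = await applyBackgroundRemoval(frame); // hypothetical helper
    frame.close();                                         // frames must be closed explicitly
    controller.enqueue(processed);
  }
});

// `encodeTransform` is assumed to wrap a WebCodecs VideoEncoder as a TransformStream,
// and `wt` is an open WebTransport session.
await processor.readable
  .pipeThrough(effects)
  .pipeThrough(encodeTransform)
  .pipeTo(makePacketizerSink(wt)); // hypothetical WritableStream doing packetization/FEC
```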
14:15:21 ... One thing I noticed recently: we spent time talking about workers
14:15:36 ... There is no meta-spec describing the overall requirements for Worker support across these APIs
14:15:53 ... There's no guarantee that the browser will support all the APIs needed by an app
14:16:10 ... No browser supports both MSE v2 and RTCDataChannel in Workers
14:16:17 ... So you can't fully make use of Workers
14:16:46 ... Could say it's an implementation issue, but we're spending time figuring out the fundamental tools needed to support media apps
14:17:07 ... We may not have discovered the full set of tools, and there's no single spec that can do that
14:17:25 ... Another issue is testing. WebRTC is difficult to test using WPT, as the tests use multiple endpoints
14:17:41 ... In the WebRTC and WebTransport WGs, we're extending WPT to add an echo test server
14:17:50 ... You can test some aspects of protocol performance with this
14:18:01 ... WebCodecs may also benefit from an echo testing framework
14:18:16 ... Another issue is performance in WHATWG streams
14:18:37 ... We don't currently have performance criteria, data on performance, or processes to bring discussion to closure
14:18:49 ... For example, a process to have joint discussion with WHATWG on streams
14:19:10 ... Some consider streams to be suitable until we prove otherwise, some think the opposite
14:19:32 ... We don't have a single place to discuss this all together, between W3C WGs and WHATWG
14:19:50 ... In theory, we have client/server and P2P transport, supporting multiple modes
14:20:01 ... But why does it feel like there are gaps when we build an app?
14:20:19 ... Apps using RTCDataChannel replace congestion control to control latency
14:20:44 ... What if media is sent by the browser, e.g. video ingest in the browser or a video conference?
14:21:08 ... How do the transports usable by WebCodecs compare with WebRTC?
14:21:37 ... WebRTC was actually modified by YouTube Live to get better quality for video ingestion
14:22:06 ... Because it optimises latency over video quality, the encoder bitrate target is adjusted
14:22:19 ... WebRTC probes and can restore dropped layers
14:22:53 ... After a loss, TCP additively increases, as does SCTP, but this produces a delay to recover quality
14:23:18 ... RTMP / RTCDataChannel work well for video upload, but not so well for video conferencing
14:23:32 ... So it would be beneficial to have access to a low-latency transport, RTP
14:24:03 ... On congestion control, there's an issue with the interaction: average bitrate target overshoot
14:24:21 ... You'll lose packets if you can't build a queue, and need to re-send a keyframe
14:24:49 ... You could lower the average bitrate target or reduce the resolution of the keyframe, but you'll still get a bandwidth spike
14:25:20 ... Some bigger things (bigger because they involve an ecosystem)...
14:25:44 ... Selective Forwarding Units are the basis of real-time streaming. They make it difficult to include end-to-end security
14:26:03 ... Overall, we need a next generation of SFUs to go with the next-gen APIs
14:26:19 ... APIs aren't enough, we need protocol standards for how audio and video are carried
14:26:33 ... On the streaming side, we've struggled to replace RTMP, which doesn't support next-gen codecs
14:26:46 ... Many contenders: SRT, WHIP, RUSH
14:27:30 ... A paper, "The QUIC fix for optimal video streaming", looked at the value of differential reliability
14:27:55 ... That's eliminating HoL blocking, keyframes vs delta frames, and discardable frames with lower reliability to avoid holding up keyframes
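A rough sketch of how differential reliability might look with WebTransport: keyframes on a reliable unidirectional stream, delta frames as unreliable datagrams. The URL and the chunk handling are illustrative, and a real packetizer would split payloads larger than the datagram size limit:

```js
const wt = new WebTransport('https://ingest.example/session'); // illustrative URL
await wt.ready;
const datagramWriter = wt.datagrams.writable.getWriter();

// Called with each EncodedVideoChunk produced by a WebCodecs encoder.
async function sendChunk(chunk) {
  const payload = new Uint8Array(chunk.byteLength);
  chunk.copyTo(payload);
  if (chunk.type === 'key') {
    // Keyframes are too important to lose: use a reliable stream.
    const stream = await wt.createUnidirectionalStream();
    const writer = stream.getWriter();
    await writer.write(payload);
    await writer.close();
  } else {
    // Delta frames are discardable: send unreliably, with no head-of-line blocking.
    await datagramWriter.write(payload);
  }
}
```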
14:28:32 ... Another area that could be a big missing piece is the combination of WebCodecs and content protection
14:28:59 ... Content protection is associated with containerisation. WebCodecs doesn't use containerised media
14:30:02 MarkFoltz: In the pipeline, I didn't see a step for executing ML models. Have you looked at the Web ML WG, compatible with WHATWG streams?
14:30:19 present+
14:30:27 Bernard: I put that into the effects block. What is the performance like?
14:30:46 ... Arguments on main vs worker thread. The great question is whether we get the performance we need in this model
14:31:21 ... Some use cases have a lot of users at once. Performance issues often aren't surfaced in the WGs handling media
14:31:28 ... Want to make this stuff possible
14:32:03 Piers: On low latency, we need APIs for measuring throughput accurately to allow ABR algorithms to work properly
14:32:29 ... The stream APIs don't provide timestamping, so it needs to be done at the user level, so there's a lack of facilities for decent performance measurements
14:33:12 Bernard: In some of the ingestion proposals, implementers are integrating directly with QUIC, so they get that from the QUIC stack; they're avoiding the web APIs and protocols for that reason
14:33:39 Piers: Timestamping of data delivery currently must be done by getting the time of day, but it could be done at a lower level
14:34:09 ... Especially with chunked transfer delivery, where there are potentially gaps in delivery
14:34:36 Bernard: It's a good question whether you can build the congestion control today; the answer is probably no
14:35:07 Topic: WebRTC and WebCodecs
14:35:18 Harald: I'm trying to get a feel for where we are and what's moving
14:35:42 ... The send-side and receive-side are approximately equivalent
14:36:09 ... When you want to send data, in WebRTC you create a MediaStreamTrack connected to a camera or microphone
14:36:24 ... There are multiple feedback paths in the RTCRtpSender
14:36:39 ... There's feedback from the transmitter that continually modifies the sending bitrate of the codec
14:36:54 ... The whole thing is designed to keep the video rolling; freezing is not acceptable
14:37:23 ... We did insert the ability to have a MediaStreamTrack processed in JS, using the breakout box
14:37:41 ... You connect a track to a processor and get out a stream of video frames
14:37:59 ... That's perfect for feeding to a WebCodecs encoder, then you packetize and send
14:38:21 ... But that's not WebRTC. The breakout box is a proposal in the WG, we haven't come to an agreement to accept it
14:38:34 ... Should the stream of video frames be visible on the main thread or not?
14:38:56 ... More people think that it shouldn't than that it should, which is the opposite of the WebCodecs conclusion
14:39:13 ... Insertable Streams were originally designed for inserting a stream into an RTCRtpSender
14:39:36 ... You get out a stream of encoded frames, not the same as you get from WebCodecs
14:39:54 ... There are a number of creative things people want to do, e.g. encode once, send to many
14:40:06 ... Those things don't work. It looks like it works but it doesn't
14:40:41 ... Why does integration with realtime and stored video streams differ? For realtime, you need to keep the media flowing
14:41:13 ... and adjust according to bandwidth. SVC allows you to drop part of the stream, creating a less power-hungry stream without having to ask the sender to change it
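For example, the breakout-box output could feed a WebCodecs encoder configured for real-time use with an SVC mode, roughly as below; the codec string, bitrate and the packetizing callback are illustrative:

```js
const encoder = new VideoEncoder({
  output: (chunk, metadata) => packetizeAndSend(chunk, metadata), // hypothetical sender
  error: (e) => console.error('encode error', e)
});

encoder.configure({
  codec: 'vp09.00.10.08',   // VP9 profile 0 (illustrative)
  width: 1280,
  height: 720,
  bitrate: 1_500_000,
  framerate: 30,
  latencyMode: 'realtime',  // favour latency over quality, as WebRTC does
  scalabilityMode: 'L1T3'   // 3 temporal layers; receivers can drop layers
});

// For each VideoFrame coming out of the MediaStreamTrackProcessor:
function onFrame(frame) {
  encoder.encode(frame);
  frame.close();
}
```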
14:41:40 ... The opposite is to deliver stored media, e.g. YouTube. If you get congestion, you might switch to a different source encoding
14:41:52 ... Encoding speed doesn't matter too much. When you have bandwidth, you can catch up
14:42:06 ... People tolerate stalls in the video
14:42:12 ... SVC is not so useful in this context
14:42:39 ... There are some desirable patterns we want to be able to do: connect the incoming stream to the outgoing encoded stream
14:42:56 ... Sending to multiple destinations, or some to WebTransport or RTP transport as needed
14:43:06 ... We want to be able to use all the tools with all the other tools, but we can't
14:43:17 ... We can only use them according to how they were initially designed
14:44:03 ... Design choices may not be optimal for all circumstances. I get asked why not let the app control the congestion control
14:44:12 ... People want to experiment
14:44:36 ... SDP is an old and clunky language for describing media streams across the network
14:45:01 ... But it has expressive power. We'd like to make sure that (a) people who don't need to deal with SDP don't have to
14:45:15 ... and (b) if something is possible in SDP, it's still possible with new interfaces
14:45:46 ... Some controls have to be reacted to immediately, where hopping to JS and asking the user what to do can lead to suboptimal responses
14:46:00 ... But in other circumstances, asking the user is what we want to do
14:46:16 ... We haven't started the investigation to figure out what we need to do
14:46:45 ... In summary, WebCodecs and WebRTC are powerful tools. Some things fit together, and some don't
14:46:50 ... So we need to learn more
14:47:44 Cullen: I agree things don't fit together
14:48:03 ... Back-propagation of parameters, changing bandwidth etc., we talked about in WebRTC
14:48:19 ... Do video coding in the camera, before it gets to the browser. Is that something we look at fixing?
14:48:31 Harald: Yes, make components, not systems
14:48:55 Cullen: Some of these things we imagined doing, but didn't in order to ship quickly. So now we need to go back
14:49:25 Harald: I looked at the ORTC effort. The linkage between codec and transmission hadn't been taken apart in that effort
14:49:48 ... YAGNI (you aren't gonna need it)
14:49:59 ... Use cases drive designs
14:50:27 Cullen: I think your use case of sending the video you receive is interesting, as well as back-propagating bandwidth into the pipeline
14:50:47 Bernard: WHATWG streams has an idea of backpressure, but that may be different to what we mean by backpressure
14:51:04 Cullen: I mean things that a scalable codec would want: bandwidth, resolutions
14:51:23 ... A single scene represented by multiple video flows
14:51:29 ... Similar with audio
14:51:45 TimPanton: We shouldn't neglect peer-to-peer applications
14:51:58 ... It would be a mistake to focus too much on server-centric APIs
14:52:22 ... I want to keep the symmetry of WebRTC and do P2P stuff without processing in the middle. That'll become more necessary
14:52:53 MarkWatson: To add to the idea of backchannel information, I'd like to add high-resolution timing information when packets are received
14:53:09 ... You need to carefully manage and understand what's happening
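Today, arrival timing like this has to be measured in the application, for instance by timestamping each datagram as it is read. A simple sketch, assuming an open WebTransport session `wt` and a hypothetical `reportThroughput` hook:

```js
// App-level receive timing: the platform does not expose packet arrival times,
// so record performance.now() as each datagram is read.
async function measureReceiveRate(wt) {
  const reader = wt.datagrams.readable.getReader();
  let bytes = 0, windowStart = performance.now();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    const now = performance.now();
    bytes += value.byteLength;
    if (now - windowStart >= 1000) {
      const kbps = (bytes * 8) / (now - windowStart); // rough throughput estimate
      reportThroughput(kbps);                         // hypothetical ABR hook
      bytes = 0;
      windowStart = now;
    }
  }
}
```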
14:53:39 Harald: There are two kinds of timestamps: the video frame / audio chunk timestamp, which is the presentation intent, and timestamps along the way
14:53:50 ... The former needs to be set by the originator and never change
14:54:01 ... Timestamps along the way are an important consideration too
14:54:10 scribe+ tidoust
14:54:58 Topic: Audio Challenges
14:55:14 -> https://docs.google.com/presentation/d/1XKNdYR0JWTtO1EIGu_sQy5rqfXMorrCKpu0UTX6kizQ/edit#slide=id.gf9de700b1d_0_741 Audio Challenges slides
14:56:37 padenot: Going to talk about some of the audio challenges. First, explain the main problems that audio has and video does not.
14:56:46 ... Then identify problems and proposals.
14:57:04 ... The main problem is that audio is hard real-time. It should never fail.
14:57:25 ... With video, you have an event 60 times per second. In audio, an event every ms or every 2 ms.
14:57:58 ... Considering computers are likely under some load, audio data, which again should never glitch, should only touch real-time threads.
14:58:04 ... That's for PCM (decoded audio).
14:58:25 ... Any other setup will lead to resilience issues, e.g. if some code is delayed due to GC or the like.
14:59:04 ... There is a proposal for a push model for the audio.
14:59:14 ... All the audio in computers works as a pull model.
14:59:24 ... It follows that we need to insert a buffer between the push and the pull.
15:00:21 ... This is more or less what we have now. What is important is that we should be able to have a media stream, connect it to the real-time thread,
15:00:41 ... and the real-time thread needs to know that there are missing bytes at the input.
15:01:01 ... In the audio worklet today, this is missing.
15:01:26 ... A strawman proposal: "process" method with inputs, output and params as parameters.
15:01:37 Present+
15:02:04 ... If we say that we can make the size of the buffers known, then we can tell when we run into buffer underruns.
15:02:17 ... Useful in different scenarios, not only in WebRTC.
15:02:36 [slide 48]
15:03:07 ... Thought experiment with Chris Cunningham recently. Works well with WebCodecs. Low latency. SharedArrayBuffer being used.
15:03:29 ... A system like this is in production today in Gecko.
15:03:51 ... Extremely high resiliency for audio. Perceptually, we think it is better than the opposite.
15:04:49 ... The question is: what pulls? As the API stands today, you have to pull from a non real-time thread.
15:05:42 Harald: When we are pre-jitter buffer, if we want the processing, we need something that can access the jitter buffer between the input and the audio buffer.
15:05:57 padenot: I haven't found any advantage of not doing it only on the real-time thread.
15:06:19 Harald: The only problem is security. Some kind of guard against occupying the CPU.
15:06:35 padenot: Something we had to implement for AudioWorklet.
15:06:49 ... We do it explicitly in Gecko for security reasons.
15:07:50 Youenn: You looked at MediaStreamTrackProcessor and what is available in Web Audio. You found a potential gap, and with this new API you think you close the gap.
15:07:55 ... Is that correct?
15:08:13 padenot: Yes, I found one gap. You can't distinguish between buffer silence and a buffer underrun.
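A minimal sketch of the kind of setup being described: decoded audio is pushed into a SharedArrayBuffer ring buffer, and an AudioWorkletProcessor pulls from it on the real-time thread. The ring-buffer helper is illustrative, and the explicit underrun signalling is exactly the part that is missing today:

```js
// worklet.js — runs on the real-time audio thread.
class RingBufferPlayer extends AudioWorkletProcessor {
  constructor(options) {
    super();
    // Shared with the producer (e.g. a WebCodecs AudioDecoder output handler).
    this.rb = new RingBufferReader(options.processorOptions.sab); // hypothetical helper
  }
  process(inputs, outputs /*, parameters */) {
    const out = outputs[0][0];       // mono output, 128 frames per callback
    const read = this.rb.read(out);  // pull PCM from the shared ring buffer
    if (read < out.length) {
      // Underrun: today this is indistinguishable from silence for the consumer;
      // the strawman proposal is to surface it explicitly.
      out.fill(0, read);
    }
    return true;
  }
}
registerProcessor('ring-buffer-player', RingBufferPlayer);
```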
15:08:34 Youenn: There were mentions about timestamps in the GitHub discussion. Any idea?
15:08:48 padenot: Clock domain traversal is the fundamental problem.
15:09:02 ... There is a certain number of frames per second in the clock domain of the sending device.
15:09:31 ... When you play out one second of audio from someone that has recorded 1 s of audio data on their computer, you need to reconcile the drift.
15:09:50 ... The timestamps allow you to measure how much faster or slower your computer is running.
15:10:19 ... There are different ways to reconcile, e.g. looking at and adjusting the jitter buffer.
15:10:45 ... When the lengths of the data packets don't match, then you can identify precisely where the problem is.
15:11:15 ... This logic is needed without all that has been discussed, because you can already connect devices with different clock domains today.
15:12:14 ... Re. A/V sync, you can match a frame with some time in the audio stream. Now, it's important to understand output latency. It can be significant when you're using e.g. Bluetooth devices.
15:12:55 ... For display, it's important that you delay your video rendering to account for the audio latency.
15:13:15 fluffy: My point is that you need both. You have to deal with audio and video.
15:13:29 padenot: I agree completely.
15:14:04 Topic: Next Generation Audio Codecs
15:14:23 Harald: This is me trying to exercise my imagination, as it's not something that I know the most about.
15:14:40 [slide 50]
15:14:49 ... We do have Opus. Gives reasonable bandwidth.
15:14:54 Present+ PaulAdenot, CarineBournez, FrancoisDaoust, HaraldAlvestrand, BernardAboba, ChrisCunningham, RandellJesup, SteveLee, Tove
15:15:00 [slide 51]
15:15:01 ... Other codecs may actually work better.
15:15:48 ... You could for instance have meaning-based encodings, using ML, e.g. with the shape of the mouth, etc., or use text-to-speech or speech-to-text.
15:15:50 Present+ AlbrechtSchwarz, ChenCheng, ChrisLilley, CullenJennings, EeroHakkinen
15:15:52 [slide 52]
15:15:59 ... Tons of things you can imagine. How do you get these deployed?
15:16:29 Present+ EladAlon, Eric, FlorentCastelli, GregFreedman, HiroshiKajihata, HongchanChoi, JamesCraig, JanIvarBruaroey, JungkeeSong
15:16:38 ... Obtain the licenses, then you run the experiment, then you integrate it into an open source codebase, then you push that to make it available on all platforms to make sure that you have interoperability on it.
15:16:40 ... And then you win.
15:16:56 ... It would be great if you could start winning at step 2.
15:17:03 Present+ Kajihata, lideping, lilin, MarkWatson, MattParadis, PhilippeMilot, PiersOhanlon, RobinRaymond, ShinNagata
15:17:06 [slide 53]
15:17:29 Present+ SongXu, TakioYamaoka, TimPanton, TuukkaToivonen, Varun, XiaohanWang, YanChangQing, Youenn
15:17:44 ... If we can get performant interfaces to raw and encoded data, and precise timing guarantees, then the only challenge is to minimize underruns.
15:17:45 Present+ ZhangLei
15:18:36 ... We can imagine that all the rest is just a codec. A typical deployment model for this kind of codec should be that we develop it as WASM, deploy it as part of the page, and experiment with it before we integrate it, once we have proved the value,
15:18:46 ... as opposed to integrating it to start with.
15:19:27 ... The reason why we want codecs as separable components is that, if we want to deploy a new way of doing audio decoding/encoding, then we need these raw and performant interfaces.
15:19:32 ... That's my vision on new codecs
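In that model, a page-supplied codec might simply sit in the pipeline as another TransformStream, roughly as sketched below; the `MyWasmEncoder` module and its API are entirely hypothetical, and a mono microphone track is assumed:

```js
// Hypothetical page-supplied audio codec compiled to WASM.
const wasmEncoder = await MyWasmEncoder.create({ sampleRate: 48000, bitrate: 32000 });

const encodeWithWasm = new TransformStream({
  transform(audioData, controller) {
    // audioData is a WebCodecs AudioData frame (raw PCM) from a track processor.
    const pcm = new Float32Array(audioData.numberOfFrames); // mono assumed
    audioData.copyTo(pcm, { planeIndex: 0 });
    audioData.close();
    controller.enqueue(wasmEncoder.encode(pcm)); // returns an encoded packet
  }
});

// track processor -> experimental WASM codec -> transport
await new MediaStreamTrackProcessor({ track: micTrack }).readable
  .pipeThrough(encodeWithWasm)
  .pipeTo(sendOverTransport); // hypothetical WritableStream sink
```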
15:20:10 Tim: That seems to me to be a big-company vision. Without access to IPR, it's hard to compete in that environment.
15:20:18 ... One of the joys of Opus is that we compete on common ground.
15:20:30 ... Closed IPR for new codecs.
15:20:38 Harald: It is a challenge.
15:21:07 ... It also means that "two men in a garage" can create a codec without having to integrate it into Chrome or Mozilla.
15:21:31 Youenn: I was not sure about the scope here. New codecs for WebRTC? Or in general?
15:21:41 ... In general, you could use WebTransport.
15:22:10 ... One good thing about targeting audio first is that packetization is much easier to solve than with video.
15:22:17 present+ jcraig, youenn_fablet
15:22:25 ... If we want to open the box for WebRTC, starting with audio seems logical to me.
15:22:51 Harald: When we did the breakout box for WebRTC, we started with audio, and extended it to cover video.
15:23:03 ... And now we're cycling back to do video-only.
15:23:26 Youenn: For me, these are orthogonal. WebRTC Encoded Streams being a third one.
15:24:30 Topic: WebCodecs Challenges
15:24:34 [slide 39]
15:24:58 chcunningham: The first issue I wanted to raise is containers.
15:24:58 [slide 40]
15:25:11 ... Folks come and ask how to do muxing/demuxing.
15:25:54 ... The answer is to go and find a JS/WASM lib. I actually like that answer, except that finding a library is a challenge.
15:26:12 ... Been using MP4Box for our demos, but there are other container formats.
15:27:02 ... I'm going to measure how configurable using libavformat is, how heavy that is.
15:27:04 [slide 41]
15:27:49 Present+ MichelBuffa
15:27:55 chcunningham: Another issue that is relevant is reclamation. The user agent can reclaim a codec for foreground apps. Yielding the codec, as done in Chrome today.
15:28:13 ... Imagine a future where the video element is implemented in JS.
15:28:36 ... The challenge with this is identifying which apps are in the foreground.
15:28:52 ... There are heuristics in Chrome to detect this.
15:29:28 ... There are apps like movie production apps. A long encode job, so you might leave that task in the background. It would not be great to come back and realize that zero progress was made.
15:30:11 ... We don't have a great solution to these problems right now. You should expect some proposal in the coming quarter. Taking feedback!
15:30:13 [slide 42]
15:30:35 chcunningham: Finally, we have content protection. E.g. live streaming of sports events.
15:30:51 ... There has been past discussion on whether this should be an EME extension, or SFrame.
15:31:17 ... I'm not a WebRTC expert, but my understanding is that SFrame was introduced to solve E2E encryption.
15:31:28 ... There was some discussion on applying SFrame to JS.
15:31:44 ... Should SFrame be part of WebCodecs? My thought is that it shouldn't.
15:32:19 ... EME protects the content more rigorously. Even if you use SFrame, you have a bunch of other things to look at, which EME covers.
15:32:29 ... It would be crazy to re-invent all of that.
15:32:46 ... Especially since most folks will want to depend on the same server-side infrastructure.
15:33:13 ... Also, the whole thing about JS not being trusted seems weird to me.
15:33:31 ... If you cannot trust your JS, you basically cannot do things such as banking.
15:34:23 ... I just wanted to call that out. I'm not a WebRTC expert. To the extent that folks are reasoning about SFrame, we should reconcile our views.
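On the container point: with no muxing/demuxing in WebCodecs itself, a typical decode path pairs a JS/WASM demuxer library with a VideoDecoder, roughly as below; the demuxer API shown is hypothetical:

```js
// Demuxing is left to a library (e.g. an MP4 demuxer compiled to JS/WASM);
// WebCodecs only sees codec bitstreams.
const demuxer = await openDemuxer(fileBlob);   // hypothetical library call
const videoTrack = demuxer.videoTracks[0];

const decoder = new VideoDecoder({
  output: (frame) => { paint(frame); frame.close(); }, // render, then release
  error: (e) => console.error('decode error', e)
});

decoder.configure({
  codec: videoTrack.codec,              // e.g. 'avc1.64001f'
  codedWidth: videoTrack.width,
  codedHeight: videoTrack.height,
  description: videoTrack.decoderConfig // e.g. avcC box contents
});

for await (const sample of demuxer.samples(videoTrack)) {
  decoder.decode(new EncodedVideoChunk({
    type: sample.isKey ? 'key' : 'delta',
    timestamp: sample.timestampUs,
    data: sample.data
  }));
}
await decoder.flush();
```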
15:35:37 dom: Re. untrusted JS, you don't want or need JS access to the media stream. For example, imagine we're running a conferencing system for a high-level transaction; we don't want the company that provides the conferencing service to be able to access the decoded content.
15:35:56 ... Same as for EME, where you're protecting against the end user.
15:36:27 ... SFrame with a browser-managed key exchange system would go some way to addressing some envisioned scenarios.
15:37:01 chcunningham: Does SFrame need additional protection then? That's my fear, if you have access to decoded frames.
15:37:27 dom: The ultimate model for SFrame is that the JS wouldn't be able to do that, because it wouldn't have access to the key that the UA is using.
15:38:15 chcunningham: The concern is that, if we acknowledge that SFrame has this gap, and you want to solve that gap in a reasonable timeline without re-doing the whole exercise, you want to lean on EME.
15:38:45 ... It may be worth wondering whether SFrame is useful at all. If you have EME, what else do you need?
15:39:52 Tim: The ability to remove potential crypto footguns. The class of mistakes that devs make. Not allowing errors to be made is valuable, which is what SFrame provides.
15:40:03 ... The other interesting bit is integration with SFUs.
15:40:26 ... I have the feeling that EME may have side effects, e.g. when a participant leaves.
15:40:57 chcunningham: There is some provision for key rotation in EME, I think, but we'd be getting out of my expertise here.
15:41:29 ... I just haven't seen anything that couldn't be solved by EME now, so nothing that justifies inventing something else.
15:42:09 jib: SFrame is exposed in Encoded MediaStreamTransform. The goal there is to protect the keys from JS.
15:43:09 ... I think we should start from use cases. We should first go to the people developing the SFrame protocol if we have specific needs.
15:43:23 ... Mozilla has a different proposal in that space, by the way.
15:44:09 chcunningham: The idea of protecting the keys is worthwhile. I do think that there is a gap in protecting the media. If you're worried about the keys, I think you should be worried about the decoded media as well.
15:44:40 ... It's a big endeavor to protect media. I think we should try to avoid building a second mechanism for that.
15:45:08 jib: Re. keys, it's more to avoid them being in the process context.
15:46:28 MarkWatson: I'm sort of confused. EME currently has registries where it refers to different container formats. There is nothing in EME for RTP as a container format. If you're going to introduce that, then SFrame may be the right mechanism there.
15:47:01 ... One of the things with WebCodecs is: what is the container? What is the encryption?
15:47:26 Bernard: To be clear, SFrame is a frame encryption mechanism, independent of RTP
15:47:46 MarkWatson: Then it may be the right frame encryption mechanism to be used in EME.
15:48:06 chcunningham: That could make sense. I just want to avoid re-creating things that already exist.
15:48:20 ... There are no containers at the WebCodecs level.
15:49:04 Richard: Same place as Mark. You can view EME as E2E encryption itself. And SFrame could be used for frame encryption.
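As a rough illustration of the browser-managed direction being discussed, a sketch along the lines of the SFrameTransform proposal in WebRTC encoded transform; interface details, key import parameters and key IDs shown here may well differ from the eventual design:

```js
// Sketch: browser-managed SFrame, so the key material is handed to the UA and
// the JS never touches the frames on the wire path.
const key = await crypto.subtle.importKey(
  'raw', keyBytes, 'HKDF', false, ['deriveBits', 'deriveKey']
);

const sframe = new SFrameTransform({ role: 'encrypt' });
await sframe.setEncryptionKey(key, 1);  // key with key ID 1 (illustrative)

const sender = pc.addTrack(videoTrack, stream);
sender.transform = sframe;              // frames are encrypted inside the UA
```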
15:49:25 fluffy: If you look at the use cases, EME is not a good match.
15:49:31 ... We looked at it in the broader context.
15:49:44 ... Unlike a streaming scenario, there are lots of people and lots of keys.
15:49:55 ... Lots of security requirements.
15:50:10 ... It doesn't match how keys are distributed. I don't think that EME works very well at all.
15:50:25 Bernard: I think that the use case here is streaming with WebCodecs as the decoder.
15:50:39 fluffy: I get that, but then we need a more generalized mechanism.
15:51:27 ... The part of the requirements that is similar is not having access to the decoded media once decoded
15:52:14 Bernard: Any feedback on how to move forward?
15:52:52 chcunningham: Cullen, it would help if you can provide some links to investigation that you may have done. Next steps could be extensions that may be worth considering, and how SFrame could be used with EME.
15:53:27 MarkWatson: If the use case is streaming, then you're going to have to support Common Encryption somehow.
15:53:49 Bernard: Any other problems that people are interested in raising?
15:54:40 Harald: I would definitely want to follow up more on finding out where exactly we interface real-time audio with protected content.
15:54:59 ... I think we need to have a safe interface there.
15:55:58 Youenn: Getting input from Web Audio people is very useful. I think we should make decisions based on what audio folks are doing.
15:56:33 ??: Where can group members follow up?
15:56:53 Harald: I don't know. It should belong in either WebCodecs or Web Audio. Probably WebCodecs.
15:58:17 jib: Further input on the approach of creating readable streams of video frames: the lifetime model uses a close method that requires apps to explicitly call close, which triggers headaches as you're not necessarily aware of how many frames have been buffered.
15:58:38 ... Please join us in requesting features from WHATWG to make sure that we can have real-time streams.
15:59:03 Bernard: Yes, I think that we should have a mechanism for following up with WHATWG. It's not clear to me that we have a process in place.
15:59:18 jib: I understand that we're going to have a meeting with WHATWG.
15:59:49 cpn: A quick note on container investigation: I'm certainly interested in that. Primarily a Media WG thing.
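On the lifetime point jib raised: with today's model, every consumer of a frame stream has to close frames explicitly, for example along these lines (the canvas consumer is illustrative):

```js
// Every VideoFrame holds on to (possibly GPU-backed) memory until close()
// is called; forgetting this can stall capture once the frame pool is exhausted.
const reader = new MediaStreamTrackProcessor({ track }).readable.getReader();
for (;;) {
  const { value: frame, done } = await reader.read();
  if (done) break;
  try {
    await drawToCanvas(frame); // hypothetical consumer
  } finally {
    frame.close();             // must be explicit; GC is not sufficient
  }
}
```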