WebRTC WG / Media WG Joint Meeting

Meeting minutes

Slideset: https://lists.w3.org/Archives/Public/www-archive/2023Sep/att-0014/WEBRTC-MEDIA-2023-09-15.pdf

[Slide 2]

cpn: [goes through reminders]

[Slide 5]

[Slide 6]

jean-yves: If I have audio related issues, may I raise them today?

cpn: We'll see how we manage the schedule, if time allows

[Slide 7]

cpn: [reviewing tips]

Agenda

[Slide 8]

Introduction

[Slide 10]

Bernard: Some background. Streaming and RTC converging in general.
… Game streaming, broadcast with fan-out, perhaps to be called low-latency.
… Point is to combine things at large scale.
… WebCodecs combined with WebRTC Data channel.
… We see this solved differently.
… Raises concerns about duplication of efforts.
… WebRTC encoded transform often used as Poor man's WebCodecs
… Some things built into WebCodecs but not in WebRTC.
… Also two distinct code paths in the browsers. That creates issues

[Slide 11]

Bernard: Here are some examples of similar issues in both worlds.
… Example of QP-based rate control issue in Chromium. We have it in WebCodecs, not in WebRTC.
… [goes through other examples, including HDR support, encoding/decoding times]
… These encoder/decoder APIs need to run across a huge range of hardware, and platforms.
… That's difficult to test.
… Also differences in codec support, e.g., HEVC and AV1 with subtle differences.
… And then support for SVC and simulcast.
… Issues opened in WebCodecs.

[Slide 12]

Bernard: Another question has come up in WebRTC: whether goal is to support every desirable feature or to enable apps to build their own support?
… Examples: In WebRTC streaming, interest in HEVC which is not in WebRTC (work in progress in WebCodecs). In music contexts, AAC.
… Some of the use cases may be addressable with a combination of WebCodecs and WebRTC transport.
… Unified encoder API, which Erik proposes. Under the cover, not a JavaScript API, but it illustrates some of the issues we're seeing that might benefit from being addressed in a more uniform way.

[Slide 13]

Bernard: This is an example of an issue I discovered yesterday.

[Slide 14]

Bernard: Look at the frame RTT.

[Slide 15]

Bernard: The glass-to-glass latency is slightly larger.
… Somewhere in the system, we're adding 200ms of delay and it's not due to network. That's in the browser.

[Slide 16]

Bernard: Encoding latency is pretty low, that looks good.

[Slide 17]

Bernard: But the decoding latency is excessive
… That seems pretty weird.
… Example of something that does not happen in WebRTC but happens in WebCodecs, and that needs testing.

Randell: Have you validated whether the bug is a decoder stack issue or due to the codec (AV1)?

Bernard: It's not the API, something to do with the decode pipeline.

Jan-Ivar: In Firefox, we now support VideoDecoder, feel free to give it a try.

QP-based rate control in WebCodecs

<padenot> (this only works in Fx Nightly right now fwiw, so don't use a release build)

[Slide 20]

eugene: Recent change in WebCodecs to allow app to ask about bitrate mode and quantizer use.
… Some AV1 specific option, which is why it appears in that specific part.

[Slide 21]

eugene: I was able to create a demo.
… which shows how to achieve desired bitrates.
… Feel free to give it a try

[Slide 22]

eugene: My point is that, even with the most basic algorithm, I was able to achieve pretty good results for bitrate control.
… I think that makes it valuable.
… Also, very quick response, frame-level response to changing conditions.
… It allows to work around bugs in GPU drivers. We see in Chrome that, sometimes, their rate control algorithms contain bugs.
… It gives ability to set lower bounds on image quality: "never give images lower than something", as no one likes pixelated images.
… I encourage people to try.
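
A minimal sketch of the kind of basic per-frame controller described above. It is not the algorithm used in the demo: the helper name `nextQp`, the 10% dead band, the single-step adjustment and the 0..63 default range are all assumptions made for illustration. In WebCodecs the resulting value would be handed to the encoder per frame when it is configured in quantizer bitrate mode; that call is left out so the sketch stays self-contained.

```typescript
// Hypothetical helper, not part of WebCodecs: picks the quantizer (QP) for the
// next frame from the size of the previously encoded frame.
interface QpLimits {
  minQp: number; // best quality the app allows
  maxQp: number; // worst quality the app tolerates ("never lower than this")
}

function nextQp(
  currentQp: number,
  lastFrameBytes: number,
  targetBitsPerSecond: number,
  framesPerSecond: number,
  limits: QpLimits = { minQp: 0, maxQp: 63 }, // assumed range, for illustration
): number {
  const targetBytesPerFrame = targetBitsPerSecond / 8 / framesPerSecond;
  let qp = currentQp;
  if (lastFrameBytes > targetBytesPerFrame * 1.1) {
    qp += 1; // frame too large: quantize more coarsely
  } else if (lastFrameBytes < targetBytesPerFrame * 0.9) {
    qp -= 1; // frame too small: spend more bits on quality
  }
  // Clamp so that quality never drops below the app's chosen floor.
  return Math.min(limits.maxQp, Math.max(limits.minQp, qp));
}
```

Because the decision is taken after every frame, the controller reacts at frame level, and `maxQp` plays the role of the lower bound on image quality.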

hta: Very interesting. I tried your demo. You don't touch resolution at all, is that correct?

eugene: Yes.

hta: I was impressed by the result. For that codec, that seems like a very useful mechanism.
… May be room to harmonize between codecs.

Bernard: Very interesting exercise. That's an example of how you can write a PR in WebCodecs that would require a complex process in WebRTC. Not everyone might want this in WebRTC. Lots of use cases to validate.

Randell: Issue is not only that QP values vary from one codec to another, but that they also vary between implementations of a given codec.

cpn: That variability, should we test it?

Randell: Yes. I would imagine hardware implementations could vary in their response as well.

eugene: Correlation between the bitrate becoming smaller and the resolution is the same regardless of the implementation.

Erik: In Chrome on Windows, we use this type of external controlled per-frame QP.

eugene: Yes, this allows us to work around bugs in rate control, as I mentioned earlier.
… I wanted to encourage other browser vendors to implement this as well.

Hardware Encode/Decode Error Handling

[Slide 25]

Bernard: The related issues are listed here

[Slide 26]

Bernard: Little bit of background for issue 146. You can get encode/decode error outside of SDP negotiation.
… Slide lists examples of when that can happen.
… You can switch from hardware to software and vice versa.
… Also, we're seeing increasingly that some profiles are hardware-only.

[Slide 27]

<fluffy> I don't understand the case when we get a parsing error for encoder. Someone point me the right way ?

fippo: Some things that we can do.
… [goes through the list]
… More telemetry is always a good thing.

[Slide 28]

fippo: For WebRTC, how are we going to expose the decode errors?
… [goes through list of options in the slide]
… We need to be precise about where we want to expose the event.

[Slide 29]

Bernard: Two main directions to go. This is proposal A.
… Reuse RTCError event.
… You can see in the dictionary and enum that we can list a number of reasons.
… Might be a good idea to add in the timestamp so that you know when something happens.

[Slide 30]

Bernard: Proposal B would be to create a custom event.
… Some sketching in the slide on how that might work.
… Just wanna get some feedback on which one of these proposals makes sense to people.

fluffy: Supportive of this either way. When we talk about parsing error for encoder, I wonder what that can be.

Bernard: More a decoder thing indeed.
… Parse error.

fluffy: OK, regardless, much needed.
… No preference from me.

Henrik: I prefer proposal B because I think that the error is different enough from other errors.
… Rather than an unsigned short error number, we should rather have an enum.

Bernard: Yes, we can do that.

florent: [missed]

hta: I also prefer proposal B, to avoid coupling.

cpn: The naming here is all RTC specific. If we were to introduce that in WebCodecs, we might need a more general name.

Bernard: Instead of RTCRtpSender or RTCRtpReceiver, we might want to use Encoder/Decoder.

<dom__> [maybe inherit from ErrorEvent rather than Event? and thus moved the error specific info in the error attribute?]

Henrik: One event handler per decoding could be used.

[Slide 31]

Bernard: WebCodecs does have errors. EncodingError for errors about data and OperationError for resource issue.

eugene: Done spec-wise. In Chromium, nothing done.
… It would be nice to make this recommendation more explicit in the spec so that people know what to expect.
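
As an illustration of the distinction mentioned above, an app could branch on the exception name it receives in a WebCodecs error callback. This is a sketch only: the helper `classifyCodecError` and the category labels are hypothetical, and the mapping simply restates what was said in the discussion (EncodingError for data problems, OperationError for resource problems).

```typescript
// Hypothetical helper: maps the name of the exception passed to a WebCodecs
// error callback onto the two categories mentioned in the discussion.
type CodecErrorKind = "data" | "resource" | "other";

function classifyCodecError(error: { name: string }): CodecErrorKind {
  switch (error.name) {
    case "EncodingError":
      return "data"; // problem with the encoded or raw data itself
    case "OperationError":
      return "resource"; // e.g. the codec resource became unavailable
    default:
      return "other"; // anything not covered by the two cases above
  }
}
```

In a browser, such a function could be called from the `error` callback given to the `VideoEncoder` or `VideoDecoder` constructor, for instance to decide whether reconfiguring the codec is worth attempting.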

New Video Encoder API

[Slide 34]

Erik: This is the view of how it works today in Chrome for WebRTC. Huge entangled ways of doing things.
… The most important thing is scalability.
… In the end, that is implemented in the WebCodecs wrapper.

[Slide 35]

Erik: Plan to do an overhaul of the internal WebRTC video encoder API.
… Anything related to RTP/Transport, we want that to be external.
… And we want everything to be asynchronous.

[Slide 36]

Erik: We think that's a good opportunity to align WebCodecs and WebRTC to avoid duplicate code.

[Slide 37]

Erik: What I would like to see is in this slide.
… One scalability controller in WebRTC.
… If you want to do that yourself with WebCodecs, you can.

[Slide 38]

Erik: Things we'd like to solve include codec selection, flexible reference structures, minimizing codec specifics as much as possible, and rate control.

[Slide 39]

Erik: The browser can be smart but cannot always make the best choice automatically.
… Maybe one choice is optimal for the sending, but suboptimal for the receiver.

[Slide 40]

Erik: To solve this, we want the app to be in full control. If you know the context, you know how to select and prioritize.

[Slide 41]

Erik: We had all of these scalability mode systems.
… and yet they are not enough

[Slide 42]

Erik: So many other things you could do.
… E.g., you might want to do B-frames, or whatever magic your scenario might need.
… Not feasible to support everything in the browser.

[Slide 43]

Erik: Again, solution is to let the app be in charge.
… [goes through slide]

[Slide 44]

Erik: With these hooks, you can implement all of the scalability modes yourself, and do more, in a codec-agnostic way.
… As a side effect, if you do this, you need minimal feedback from the decoder.
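
To give a flavour of what implementing a scalability mode in the app could involve, here is a sketch that assigns temporal layer ids in a dyadic hierarchy, the pattern behind modes such as L1T2 and L1T3. It does not use the API from the slides: the helper `temporalLayerId` is hypothetical, and choosing which buffers each frame references is left out.

```typescript
// Hypothetical helper: temporal layer id of a frame in a dyadic hierarchy
// with `layers` temporal layers.
function temporalLayerId(frameIndex: number, layers: number): number {
  const period = 1 << (layers - 1); // number of frames in the repeating pattern
  const pos = frameIndex % period;
  if (pos === 0) return 0; // first frame of each period is the base layer
  // Count trailing zero bits of the position: more zeros means a lower layer.
  let trailingZeros = 0;
  for (let p = pos; (p & 1) === 0; p >>= 1) trailingZeros++;
  return layers - 1 - trailingZeros;
}
```

With three layers the ids repeat as 0, 2, 1, 2, so dropping the highest layer halves the frame rate and dropping the two highest layers quarters it.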

[Slide 45]

Erik: Not going to talk about rate control, Eugene covered it already.

[Slide 46]

Erik: Illustration of the concepts that were discussed. Take it as an abstraction for now.

[Slide 47]

[Slide 48]

Erik: Some mechanism to query bitrate control capabilities. CQP or CBR.

[Slide 49]

Erik: Total number of buffers you have available. Max number of references, max temporal and spatial layers (output frames per input frame).

[Slide 50]

Erik: Which input format is accepted.
… What pixel formats.

[Slide 51]

Erik: Same thing for the output.

[Slide 52]

Erik: A bunch of other discussions about what else we could have.

[Slide 53]

Erik: How do we actually select and create an encoder

[Slide 54]

Erik: Enumeration. Gives you capabilities, implementation name, codec name, codec specifics.

[Slide 55]

Erik: encoder settings that will apply to the lifetime of the encoder.

[Slide 56]

Erik: The main method is encode()

[Slide 57]

Erik: The input frame is just a frame. The content hint, the speed setting and how should you do frame drop.

[Slide 58]

Erik: Params you can give to the encoder

[Slide 59]

Erik: Apart from control, you have these layers parameters.

fluffy: Ignoring all of the details of the API, arbitrary buffers referenced cannot be passed around in the underlying codecs.
… I think that you'll have a hard time guessing what pieces might work for a given type of hardware.

Erik: For hardware, it depends on drivers. In Chrome OS, we already do that behind the scenes.

fluffy: So, works for VP8?

Erik: Yes.

fluffy: HEVC?

Erik: We talked with a few vendors. Some can do it. Some API limits.

[Slide 60]

[Slide 61]

Erik: This is a complete example of how that would work.
… Can skip these frames, look at them offline!

[Slide 62]

[Slide 63]

[Slide 64]

Erik: The API is not as bad as it looks regarding fingerprinting. All of it can be derived somehow.

Bernard: One of my questions would be: what would be the effects on the JavaScript that we have?

Erik: My understanding is that we have some sort of software fallback that could clash with this.
… I don't really have an opinion on what the best route forward is for this.

jan-ivar: In WebRTC, we had a problem getting powerEfficient. I'm having a hard time seeing how we can expose so much stuff to JavaScript.
… Double-edged sword is that there's a lot of copy-and-paste on the Web. Good defaults are needed.
… Tying browser vendors to do the right thing.
… Puts a lot of pressure on the client to implement things correctly.

Erik: Agree. I think we could have something separate for WebCodecs that gives some help. Not sure I like that.
… Would one of us write that? Or would we hope that the community does?

Jan-Ivar: I worry about how people may approach these expert APIs.

Erik: Yes, I'm thinking about 3D cases where WebGL is not your go-to target but rather your game engine.

Elad: With fingerprinting, would giving some capabilities through permissions on microphones, cameras help?
… Regarding libraries, people are good at creating them.

jan-ivar: Asking for cameras, microphones could be seen as permission escalation.
… Better direction.

hta: Is this an API that you would expose in workers, main thread?

Erik: Not an expert in that.

hta: The current position in WebRTC encoded transform is that it's worker-only.

eugene: Everywhere where WebCodecs is available seems like a good approach.

<hta> correction to minutes: I said that the current position in webrtc encoded transform is that we're still quarreling.

Francois: The API allows enumerating decoders, why do you need the exact list?

<Zakim> tidoust, you wanted to wonder about enumerateDevices

erik: if you just ask for a particular codec it's hard to reason what you do with it

paul: most of this is doable in WebCodecs, so prefer you reframe it in terms of WebCodecs
… do a gap analysis between web exposed capabilities and what's needed
… e.g., automatic fallback is not a thing. we have capabilities, a registry with per-codec settings
… we're duplicating a lot here, which we should avoid

+1 paul

erik: is this feasible? should we move towards doing that gap analysis?

paul: file issues. professional creator users and rtc users both have needs, lots of communities engaging
… reaching a uniform API is good, but a lot of what you describe is doable
… avoid duplicating lots of work

hta: you're emphasising precise user control of features, we want to have these base features and leave the higher level modes like SVC, rate control and simulcast documented as implemented in terms of these primitives

<padenot> +1 hta

hta: we should do that style of spec more. I want WebCodecs core to be a primitive used by WebRTC and a more user friendly WebCodecs interface, but the core is as clear and simple as possible

Erik: My thoughts as well

Florent: On complexity of the API, shouldn't be a problem, there are a few expert APIs on the web, WebCrypto and WebGPU
… If this were introduced, libraries would make it easier to use
… similar happened with WebCrypto

Bernard: A cautionary note - it looks like we're on the verge of a major hardware change
… ML based codecs for audio. The nature of the hardware is likely to change in the coming years, how would these fit the framework?
… Per-macroblock QP for segmentation - this would require API changes to WebCodecs
… The API meets demands over the last few years, but need to look to future demands

Erik: On inter-picture references, I haven't seen them breaking the mold drastically, something to watch for

Xiaohan: How many of these can be optional, and have good defaults? So it remains a simple higher level API?

Chris: Agree on the need for a gap analysis. Previous approach has been to add to WebCodecs incrementally, e.g., the per-frame QP. Do we want to move to exposing everything? We've heard concerns about enumeration and potential fingerprinting
… Next step is to meet again when we have a gap analysis

Web Audio

jya: Currently, use canPlayType with opus, etc, not sufficient
… have something on top of MediaRecorder, do you support recording with multiple channels
… the hope is if you can play it you can decode it, not always true

Xiaohan: Is that the MSE or WebRTC case?

jya: It's MSE, file playback also
… It's a Media Capabilities decoding query. The requirements to decode aren't always the same as for playing

Xiaohan: Not so familiar with WebAudio, but the next step could be to raise an issue in Media Capabilities API, and we can follow up
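
For reference, a sketch of what a Media Capabilities decoding query for audio looks like today. The helper names `audioDecodingQuery` and `canDecode` are hypothetical; whether a positive answer also means the audio can be decoded outside of playback is exactly the open question raised above.

```typescript
// Shape of the audio part of a Media Capabilities decoding query.
interface AudioDecodingQuery {
  type: "file" | "media-source" | "webrtc";
  audio: { contentType: string; channels?: string; samplerate?: number; bitrate?: number };
}

// Hypothetical helper that builds a file-playback query for a codec,
// channel count and sample rate.
function audioDecodingQuery(
  contentType: string,
  channels: number,
  samplerate: number,
): AudioDecodingQuery {
  // The API takes the channel count as a string.
  return { type: "file", audio: { contentType, channels: String(channels), samplerate } };
}

// Runs the query where the API exists; resolves to undefined elsewhere.
async function canDecode(query: AudioDecodingQuery): Promise<boolean | undefined> {
  const mc = (globalThis as any).navigator?.mediaCapabilities;
  if (!mc) return undefined; // not running in a browser that exposes the API
  const info = await mc.decodingInfo(query);
  return info.supported;
}
```

For example, `canDecode(audioDecodingQuery('audio/webm; codecs="opus"', 6, 48000))` would ask about 5.1 Opus at 48 kHz.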

Media Capabilities issue 185

Chris: [recaps the issue]

hta: When you pass a mime type, question is whether it contains parameters or not

jan-ivar: it still references the webrtc spec, would we want it to move to MC API?

hta: that was on purpose, so you can pass it to the setCodecCapabilities
… should we make the capabilities convertible, ideal if they both were the same, but they're both deployed

chris: Discussion needs youenn, so let's follow up in a future meeting

Wrap up

chris: Nothing else to discuss, so let's close here. We'll follow up in future calls

[adjourned]

– DRAFT –
15 September 2023

Attendees