WebCodecs VideoEncoderConfig

Presenter: Chris Cunningham (Google)
Duration: 9 minutes
Slides: PDF

Slide 1 of 5

Hi, I'm Chris Cunningham. If you missed my earlier WebCodecs primer talk, I encourage you to check that one out first.

In this talk, I'm going to focus on WebCodecs encoding configuration. The idea is to surface what we have and the reasoning behind it, and have you tell us what to prioritize next. If you look at our configuration structure and compare that to FFmpeg or libvpx, it will immediately be clear that we have lots of knobs we could consider adding.

Having said that, there are lots of folks who are using WebCodecs and encoding in media production settings and apps. Don't let this talk scare you off.

Slide 2 of 5

Before I get into the structure, I want to mention the underlying design principle, which is don't constrain features or quality unless the user asks for it.

For encoding defaults, this might mean we don't constrain quality for the sake of latency. We don't constrain bit rate unless requested. It's VBR by default. We allow B-frames wherever the profile that is configured would allow B-frames.

Slide 3 of 5

Another thing to mention: when you start encoding, the output callback will also emit a metadata dictionary that contains the video decoder config. This is obviously very important if you want to eventually decode the content you encoded.

And then it emits that same config anytime the configuration changes, meaning you now need a slightly different configuration to decode what is being output.

I mention this because it's useful to describe some of the encoder configuration parameters we're about to get into, in terms of their effect on the emitted decoder config.
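A minimal sketch of wiring up that callback (the chunk and config storage here is just an illustrative stand-in for whatever your app does with them):

    // Collect encoded chunks along with the decoder config needed to decode them.
    const chunks: EncodedVideoChunk[] = [];
    let decoderConfig: VideoDecoderConfig | null = null;

    const encoder = new VideoEncoder({
      output: (chunk, metadata) => {
        // metadata.decoderConfig is present on the first output and again
        // whenever the configuration of the emitted stream changes.
        if (metadata?.decoderConfig) {
          decoderConfig = metadata.decoderConfig;
        }
        chunks.push(chunk);
      },
      error: (e) => console.error("encode error:", e),
    });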

Slide 4 of 5

All right, so here we have just a screenshot from the spec. This is the VideoEncoderConfig dictionary. At the top, we've got the required parameters. We have codec, which is a codec string, very similar to what you would see in MSE or MediaCapabilities. For H.264, it's avc1 dot profile byte, level byte. For VP9, it's vp09 dot profile dot so on. For AV1, it's av01. For all codecs that are usable with WebCodecs, you can go to the codec registry in the WebCodecs specification and look up the specific codec strings to use for a given codec.
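As a rough illustration, these strings look like the following; the specific profile and level bytes are arbitrary examples, not recommendations:

    // Illustrative codec strings; see the WebCodecs codec registry for the
    // exact format of each.
    const h264 = "avc1.42001f";   // avc1 + profile, constraint, and level bytes
    const vp9  = "vp09.00.10.08"; // vp09 + profile, level, bit depth
    const av1  = "av01.0.04M.08"; // av01 + profile, level and tier, bit depth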

All right, so next we have width and height, and this is the number of pixels to encode. For video frames, this corresponds to visible width and visible height. For the decoder config, this corresponds to display... no, it's again visible width and visible height. If you give us a video frame whose visible width and height don't match these dimensions, we will actually scale the frame before we encode. That was originally motivated by RTC use cases, where the camera is opened at a high resolution, but then you want the encoder to be able to encode at whatever resolution you need at the moment, because your needs fluctuate with changes in bit rate and CPU load.

Okay, so moving on, we are now into the optional parameters. There's displayWidth and displayHeight. If not provided, these default to the width and height we just talked about, and generally, that's probably what most people want. Where this gets interesting is for things like anamorphic content, where a pixel is actually wider than it is tall when finally rendered, so we stretch the rendered video just slightly.
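Putting the dimension fields together, a sketch of an anamorphic configuration might look like this (the codec string and sizes are just example values):

    const anamorphicConfig: VideoEncoderConfig = {
      codec: "avc1.42001f",  // example codec string
      width: 720,            // visible pixels to encode; mismatched frames are scaled
      height: 480,
      displayWidth: 853,     // stretch the non-square pixels out to roughly 16:9
      displayHeight: 480,    // displayWidth/displayHeight default to width/height
    };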

Then we have bitrate. The default is implementer-defined. In Chrome, it's gonna be whatever the underlying codec, let's say libvpx, would use if you didn't specify a bitrate. That's probably not what you want. If you're using this in production, I recommend setting bitrate, in which case I also recommend setting the next knob, which is framerate. The value here can be rough; variable frame rate video is common. But the idea is to guide the rate controller: bitrate is bits per second, so you need to know how many frames per second to know how many bits to give a given frame.

On our roadmap, we intend to also use frame timestamps to refine this. If you said 30 fps but we detect it's actually closer to 25 in one segment of your video, or much higher in some other segment, we should adjust rate control accordingly. But even then, it's good to provide some sort of baseline upfront, because we won't have any timestamps until some initial frames come in.
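For instance, a sketch with an explicit bitrate and a rough framerate baseline (the numbers are arbitrary examples):

    const vbrConfig: VideoEncoderConfig = {
      codec: "vp09.00.10.08", // example codec string
      width: 1280,
      height: 720,
      bitrate: 2_000_000,     // bits per second
      framerate: 30,          // rough baseline to guide the rate controller
    };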

All right, so I'm gonna go out of order for just a second and keep talking about bitrate. The next thing is bitrateMode. This is second from the bottom. It's "variable" by default, and then the alternative is, of course, "constant".

We aren't super prescriptive about how constant or how variable, like what is the buffer size that we're talking about here? And that reflects the reality that WebCodecs is built on top of many codec implementations, which, themselves, are not super prescriptive or consistent.

Having said that, some libraries do offer a lot more knobs here, so options to configure the buffer size, the amount of overshoot or undershoot, and we are open to adding those. Let us know if you need them.
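As a sketch, requesting constant bitrate is one extra field; the values are illustrative, and how strictly "constant" is enforced depends on the underlying implementation:

    const cbrConfig: VideoEncoderConfig = {
      codec: "avc1.42001f",    // example codec string
      width: 1920,
      height: 1080,
      bitrate: 6_000_000,
      framerate: 30,
      bitrateMode: "constant", // the default is "variable"
    };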

Okay, so now I'm gonna go back up and start with the ones that I missed.

hardwareAcceleration: so the wording here reads like a hint: "no-preference", "prefer-hardware". But browsers have the option to treat this as a hint or to treat it as a requirement, and in Chrome, we treat it as a requirement.

If you say prefer hardware and we don't have the hardware, or if we can't support some aspect of your configuration in hardware, we will fail to configure, which is fine; it can actually be very useful. You can use the isConfigSupported check in advance to figure out where the support will lie. So, let's say you were really worried about power savings and you had a lot of flexibility in the codec you were choosing: you could use this to iterate over options on the platform and see, okay, it has accelerated encoding for H.264 but maybe not for VP9, so that's the one I'm gonna use because I'm very power-conscious on this platform. Or maybe you have a WASM fallback that you prefer: you wanna use our hardware encoding, but if it's gonna be software, you prefer to use your own software, and that's also totally fine.

Most folks will probably just take the default of "no-preference", and for that default, Chrome will always try to find hardware for the codec and then fall back to software if hardware isn't available.

Actually, a small caveat on that: it will only use hardware above a certain resolution cut-off. Below a certain resolution, using a hardware-accelerated codec is not desirable; it can actually cost a bit more in startup and give no savings in power, in which case we use software.
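One way to use this, sketched here with a hand-picked list of candidate codec strings (the candidates and dimensions are assumptions, not recommendations):

    // Probe for a codec with hardware-accelerated encoding on this platform.
    // In Chrome, "prefer-hardware" is treated as a requirement, so an
    // unsupported result means no suitable hardware encoder was found.
    async function findHardwareEncodedCodec(): Promise<string | null> {
      const candidates = ["avc1.42001f", "vp09.00.10.08", "av01.0.04M.08"];
      for (const codec of candidates) {
        const { supported } = await VideoEncoder.isConfigSupported({
          codec,
          width: 1280,
          height: 720,
          hardwareAcceleration: "prefer-hardware",
        });
        if (supported) {
          return codec;
        }
      }
      return null; // e.g. fall back to "no-preference" or your own WASM encoder
    }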

All right, scalabilityMode. This is a string that identifies patterns of layering for scalable codecs. For media production, you probably don't care about this; you probably just want one layer. This is probably more interesting for RTC folks. The linked spec here, WebRTC-SVC, has lots more detail on the diagrams and patterns of scalability layering.

And then finally, latencyMode. This is "quality" by default, where the other option is "realtime". Most folks in this group probably want quality, but it's interesting to talk about realtime, because it's obviously very useful for realtime scenarios. There's an interaction with some of the knobs above, which is that, in realtime mode, we strive to get the frames out and hit the bitrate and frame rate targets more strictly. We'll actually use frame rate as kind of a deadline: if we can't encode a frame in realtime mode and maintain the frame rate in terms of our output, then we will drop frames, where encoders support that.
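For completeness, a sketch of an RTC-flavored configuration using both knobs; media production workflows would more typically keep the defaults of a single layer and "quality":

    const rtcConfig: VideoEncoderConfig = {
      codec: "vp09.00.10.08",  // example codec string
      width: 640,
      height: 360,
      bitrate: 800_000,
      framerate: 30,
      scalabilityMode: "L1T3", // one spatial layer, three temporal layers (WebRTC-SVC)
      latencyMode: "realtime", // the default is "quality"
    };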

Slide 5 of 5

Okay, so those are the different knobs we have so far. This is a link to our GitHub. Please let us know what you need. We like to prioritize things by impact, so tell us why you need it, what you plan to do with it, and how it's gonna be a better experience for your users. And if you come to our GitHub and already see an issue filed for the knob you want, pile on. Again, that's impact: hey, there are five apps that want this knob. Okay, we'll pay attention to that, sure.

Yeah, that's it. Thank you so much for watching.
