
Real-time video processing with WebCodecs and Streams

By François Daoust

See also the slides.

Transcript

Hello!

This is a really quick dive into processing video frames in real-time on the Web.

Video on demand.

Live streaming.

Online conferences.

Cloud gaming.

What seemed impossible on the Web a few years ago is now just part of our daily lives.

In parallel, computation capabilities are exploding on the Web, with technologies such as WebAssembly, WebGPU or WebNN.

Well, and you know, JavaScript itself is far from being slow.

New requirements have emerged as a result.

They boil down to: how can these technologies be mixed together to process video frames in real-time?

One typical scenario is removing the background of a video in a teleconferencing system.

To better understand the constraints and get a better grasp of the performance of different processing mechanisms, my colleague Dom and I played a bit with web technologies last year.

The result is this test page.

It uses WHATWG Streams to create video processing workflows, and leverages WebCodecs, WebGPU, WebAssembly and JavaScript to process video frames.
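
To give a concrete idea, here is a minimal sketch of the kind of pipeline the test page assembles. It assumes a Chromium-based browser, since MediaStreamTrackProcessor and MediaStreamTrackGenerator are not available everywhere, and processFrame is a hypothetical placeholder for the actual processing step:

    // Break a camera track out into VideoFrames, transform them,
    // and reassemble the result into a video track.
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    const [track] = stream.getVideoTracks();

    // Exposes the track as a ReadableStream of VideoFrame objects.
    const processor = new MediaStreamTrackProcessor({ track });

    // Consumes processed VideoFrames and turns them back into a track.
    const generator = new MediaStreamTrackGenerator({ kind: 'video' });

    const transformer = new TransformStream({
      async transform(frame, controller) {
        // processFrame is a hypothetical placeholder for the actual
        // processing step (pure JavaScript, WebAssembly, WebGPU...).
        const processedFrame = await processFrame(frame);
        frame.close(); // Release the original frame as soon as possible.
        controller.enqueue(processedFrame);
      }
    });

    processor.readable.pipeThrough(transformer).pipeTo(generator.writable);

    // The generator is a regular MediaStreamTrack: it can be rendered
    // in a video element or sent over a WebRTC connection.
    document.querySelector('video').srcObject = new MediaStream([generator]);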

The actual processing is of very little interest.

What's more interesting is that this makes it possible to evaluate the performance impact of using this or that technology to process video frames.

I will not go into details, but if you're interested, you can check the README in the underlying repository, or have a look at the two articles I wrote back in March for webrtcHacks.

One challenge we identified is that it is currently hard to predict the overall performance of a processing workflow a priori, because performance depends on the number of memory copies of the raw video frames that the browser needs to make.

And that count depends on a variety of factors.

Here is an example.

Without going into details, the table on the left represents a simple processing workflow, optimized to leverage processing on the GPU (and, you know, processing on the GPU is really super fast).

The overall workflow typically takes 17ms per frame.

That's great!
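
For reference, here is roughly what the hand-off to the GPU looks like: WebGPU can import a VideoFrame directly as an external texture, with no explicit copy in the best case. A minimal sketch, assuming a GPUDevice named device and a frame coming out of the pipeline shown earlier:

    // Import the frame as an external texture that a shader can sample,
    // e.g. to run a background removal filter on the GPU.
    const externalTexture = device.importExternalTexture({ source: frame });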

Now, the table on the right is the same workflow, except I added a slow WebAssembly processing step in the middle (WebAssembly is not slow per se, but it uses separate memory, so each video frame needs to be copied back and forth when WebAssembly is used to process it).
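
Concretely, the round trip looks like this. A minimal sketch, where wasmProcess is a hypothetical call into the WebAssembly module, and where the buffer would in practice live in the module's linear memory:

    // First copy: read the frame's pixels into CPU memory that
    // WebAssembly can access.
    const buffer = new ArrayBuffer(frame.allocationSize());
    const layout = await frame.copyTo(buffer);

    // Hypothetical WebAssembly processing call on the raw pixels.
    wasmProcess(buffer, layout);

    // Second copy: rebuild a VideoFrame from the processed pixels.
    const processedFrame = new VideoFrame(buffer, {
      format: frame.format,
      codedWidth: frame.codedWidth,
      codedHeight: frame.codedHeight,
      timestamp: frame.timestamp
    });
    frame.close();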

That slow step takes 12ms... and yet it only increases the overall processing time by 4ms on my machine.

Why is that?

That's probably because the browser uses a video encoder bound to CPU memory, which incurs a costly memory copy when the video frame sits in GPU memory.
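
Applications cannot choose where the encoder runs; at best, they can hint at a preference when configuring it, as in this minimal sketch (whether the hint is honored, and which memory the encoder is bound to, remains up to the browser):

    const encoder = new VideoEncoder({
      output: (chunk, metadata) => { /* consume encoded chunks */ },
      error: (e) => console.error(e)
    });
    encoder.configure({
      codec: 'vp8',
      width: 1280,
      height: 720,
      // Only a hint: a hardware encoder is more likely to read
      // directly from GPU memory.
      hardwareAcceleration: 'prefer-hardware'
    });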

The supposedly optimized workflow on the left turns out not to be super optimized in practice.

Browsers manage a number of memory boundaries.

Some are physically disjoint, others are logically separated for security reasons or because the shape of objects in memory differs.

Browsers typically abstract these boundaries away from web applications, and that's usually a good thing... except perhaps when applications need to reason about memory copies.

All in all, the exciting news is that we're getting there!

Ongoing work on WebCodecs, WebGPU, WebAssembly, WebNN, WebRTC and Streams creates an extremely powerful processing platform.

But it's not easy and some improvements are going to take time, partly because work happens in a number of different groups that have different objectives.

So let's finish with a call to action!

If you have specific needs with regard to audio and video processing that arise when you combine web technologies, please articulate these needs and reach out so that we can better organize coordination efforts between the underlying groups.

Thanks!
