A Non-linear Video Editor built with WebAssembly

Presenter: Junyue Cao (ByteDance)
Duration: 7 minutes
Slides: PDF

Slide 1 of 9

Hi, my name is Junyue. I'm a software engineer on the multimedia team at ByteDance. Today I will talk about our web-based non-linear video editor, written in C++ and built with WebAssembly and Emscripten.

Slide 2 of 9

This is the agenda of my talk. First I will introduce what the video editor is. Secondly I will briefly introduce how it works. And lastly I will talk about some of the needs we have run into during development.

Slide 3 of 9

Video editing is very important for every video content creator. It is one of the last steps of video creation, and it turns your media clips into a complete story. There is a lot of video editing software, including desktop applications, mobile applications, and some cloud-based software.

What we are building is a web-based multi-track video editor. Users can use a web browser to add video clips, audio clips, subtitles, transitions and special effects. By taking advantage of web technology and cloud-based storage, users can open their projects on any computer at any time and continue to work.

Slide 4 of 9

Our core engine was originally written in C++ for native platforms. It is a multi-threaded engine which runs on Android, iOS, Windows and macOS. With the improvement of browser support for WebAssembly, we now have the opportunity to migrate the engine to the web.

We use WebGL and other technologies for real-time video rendering. In fact, WebAssembly is not a very new technology anymore, so I won't talk about every detail of WebAssembly and Emscripten here.

Slide 5 of 9

This diagram shows a simplified structure of the application. We provide JS APIs to the web page to control the engine. The JS APIs call into the C++ engine, which is compiled with Emscripten. The C++ engine in turn calls the browser feature APIs provided by Emscripten: WebGL for video and special effect rendering, WebAudio for audio playback, Web Workers for the multi-threaded runtime and IndexedDB for file system persistence.
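The layering described above, a JS API on top of a C++ engine, can be sketched with Emscripten's embind. This is a minimal illustration only: the `Engine` class, its `addClip` and `trackCount` methods and the module name `editor_engine` are hypothetical and are not taken from the talk. The binding is guarded so the same file also compiles natively.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

#ifdef __EMSCRIPTEN__
#include <emscripten/bind.h>
#endif

// Hypothetical, heavily simplified stand-in for the C++ editing engine.
class Engine {
public:
    // Adds a clip to a track and returns the number of clips now on that track.
    int addClip(int track, const std::string& uri) {
        assert(track >= 0);
        const std::size_t index = static_cast<std::size_t>(track);
        if (index >= tracks_.size()) {
            tracks_.resize(index + 1);  // create any missing tracks
        }
        tracks_[index].push_back(uri);
        return static_cast<int>(tracks_[index].size());
    }

    int trackCount() const { return static_cast<int>(tracks_.size()); }

private:
    std::vector<std::vector<std::string>> tracks_;
};

#ifdef __EMSCRIPTEN__
// Exposes the class to JavaScript; only compiled when building with Emscripten.
EMSCRIPTEN_BINDINGS(editor_engine) {
    emscripten::class_<Engine>("Engine")
        .constructor<>()
        .function("addClip", &Engine::addClip)
        .function("trackCount", &Engine::trackCount);
}
#endif
```

When built with Emscripten and embind enabled, the page could then create the object with something like `new Module.Engine()` and call `addClip(0, "intro.mp4")` on it from JavaScript.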

Slide 6 of 9

This diagram illustrates the simplest processing route and shows which browser features are used along it.

First, the video part. For each video track, we first use WebAssembly to decode the video frames. After we get a decoded frame, we convert it into a texture, and then blend the textures of multiple video tracks and display the result on a canvas through WebGL.
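To make the blending step concrete, here is a CPU sketch of compositing several track frames onto one canvas. It is not the editor's renderer: the talk does this on the GPU through WebGL, and the pixel layout (non-premultiplied RGBA8), the opaque black background and the name `compositeTracks` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded frame as tightly packed, non-premultiplied RGBA8 pixels (assumed layout).
using Frame = std::vector<std::uint8_t>;

// Composites the tracks in order (index 0 is the bottom layer) onto an opaque
// black canvas using the "source over" rule, and returns the resulting canvas.
Frame compositeTracks(const std::vector<Frame>& tracks, std::size_t pixelCount) {
    Frame canvas(pixelCount * 4, 0);
    for (std::size_t i = 3; i < canvas.size(); i += 4) {
        canvas[i] = 255;  // the canvas stays fully opaque throughout
    }
    for (const Frame& layer : tracks) {
        assert(layer.size() == canvas.size());
        for (std::size_t i = 0; i < canvas.size(); i += 4) {
            const unsigned alpha = layer[i + 3];
            for (std::size_t c = 0; c < 3; ++c) {
                // Weighted mix of layer and canvas, rounded to the nearest integer.
                const unsigned mixed =
                    layer[i + c] * alpha + canvas[i + c] * (255u - alpha) + 127u;
                canvas[i + c] = static_cast<std::uint8_t>(mixed / 255u);
            }
        }
    }
    return canvas;
}
```

In a WebGL renderer the same per-pixel mix would typically be done by the GPU's blending stage or in a fragment shader, with each track's frame uploaded as a texture.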

For the audio part, we also use WebAssembly to decode, and then send the decoded data to the Web Audio API for playback.

Of course, for any kind of video player, audio and video synchronization is indispensable. What I show here is just the simplest processing route. I have omitted many steps, and the actual situation is much more complex than what is shown in the figure.
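The talk does not describe how synchronization is done. One common approach, shown here purely as an assumption, is to treat the audio clock as the master and display the latest video frame whose timestamp is not ahead of it. The function name `selectFrame` and the use of microsecond timestamps are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the index of the frame to show at `audioClockUs`, or -1 if no frame
// is due yet. `ptsUs` holds presentation timestamps in microseconds and must
// be sorted in ascending order.
int selectFrame(const std::vector<std::int64_t>& ptsUs, std::int64_t audioClockUs) {
    assert(std::is_sorted(ptsUs.begin(), ptsUs.end()));
    // Find the first frame whose timestamp is strictly after the audio clock;
    // the frame just before it is the one that should be on screen.
    const auto next = std::upper_bound(ptsUs.begin(), ptsUs.end(), audioClockUs);
    return static_cast<int>(next - ptsUs.begin()) - 1;
}
```

In a browser, the audio clock could plausibly be derived from the Web Audio `AudioContext.currentTime` value, converted to the same time base as the video timestamps.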

Slide 7 of 9

So far, we've basically made the whole application work, but that doesn't mean it's perfect. Some parts are still unsatisfactory.

The first problem is decoding performance. We need to decode multiple video tracks at the same time, which is a great challenge for the performance of video decoding. Decoding in WebAssembly consumes a lot of CPU, and it is not as fast as native code. At present, in order to support as many video tracks as possible, we limit the video resolution to 480p.
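A rough way to see why resolution is traded against track count is to compare how many pixels the decoder must produce per second. The figures used below (854x480 for 480p, 1920x1080 for 1080p, 30 frames per second) are assumptions, not numbers from the talk, and real decoding cost does not scale exactly with pixel count.

```cpp
#include <cassert>
#include <cstdint>

// Pixels that must be decoded per second for `tracks` simultaneous tracks.
std::int64_t decodedPixelsPerSecond(int width, int height, int fps, int tracks) {
    assert(width > 0 && height > 0 && fps > 0 && tracks >= 0);
    return static_cast<std::int64_t>(width) * height * fps * tracks;
}
```

Under these assumptions, five 480p tracks amount to about 61.5 million pixels per second, slightly less than a single 1080p track at about 62.2 million.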

With the introduction of WebCodecs, we may have the opportunity to increase our video resolution. We would use WebCodecs in a thread, that is, in a worker, so a synchronous WebCodecs API would be much easier to integrate with C++ code.
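The sketch below shows the calling shape that C++ code usually prefers: a function that returns the decoded frame directly, wrapped around a callback-style decoder. Everything here is hypothetical. `decodeAsync` is a stand-in that runs on a plain `std::thread` and is not the WebCodecs API, and the "decoding" is a placeholder computation.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <thread>

// Hypothetical callback-style decoder. It "decodes" on another thread and
// reports the result through `onFrame`. The caller must join the thread.
std::thread decodeAsync(int chunkId, std::function<void(int)> onFrame) {
    return std::thread([chunkId, onFrame] { onFrame(chunkId * 2); });
}

// Blocking wrapper: the call returns only once the frame is available.
int decodeSync(int chunkId) {
    assert(chunkId >= 0);
    std::promise<int> result;
    std::future<int> ready = result.get_future();
    std::thread worker =
        decodeAsync(chunkId, [&result](int frame) { result.set_value(frame); });
    const int frame = ready.get();  // blocks until the callback has fired
    worker.join();
    return frame;
}
```

This wrapper works natively because the callback runs on a different thread. In a browser worker, callbacks are normally delivered through that worker's own event loop, so blocking the worker like this would likely keep the callback from ever running, which is one reason a built-in synchronous API is attractive.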

Slide 8 of 9

Another need is a better debugging experience.

When DWARF debug information is enabled, the Wasm file becomes very large. In our case, it is more than 1 GB, which makes the browser very unstable. The browser crashes easily after running for a while. And it's slow. Even when debugging on localhost, loading the DWARF information still takes more than 10 seconds. And the devtools often respond slowly and get stuck.

Secondly, in a multi-threaded application, a meaningful thread name, that is, the name of the worker, is very useful for debugging. Currently, a Web Worker's name can only be specified when the worker is created. We hope a worker can be renamed while it is running, which would make it much easier to find the desired thread.

Thirdly, we need a way to stop the world, that is, to pause all threads at once. At present, in devtools we can only select a thread and click the pause button, which pauses that worker or the main thread. When there are a lot of threads, I have to click the pause button for each one. Similarly, the resume button needs to be clicked for each thread. In this regard, I think we can refer to the debugging habits of the popular IDEs.

Slide 9 of 9

As shown in the figure, Emscripten provides a file system that supports the direct use of native POSIX file APIs. In the browser, it mainly includes MEMFS and IDBFS.

However, video files are often large. When we use MEMFS to read large files, memory consumption is high because it loads the whole file into memory, which can easily cause out-of-memory errors. At the same time, multi-threaded access is always proxied to the main thread, which hurts the performance of the main thread.

Therefore, I think we need a file system that offers better multi-threaded access, lower memory consumption and better persistence.
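For reference, this is the kind of standard file code that can stay unchanged when an engine is compiled with Emscripten. It is a sketch with a hypothetical function name, `readInChunks`, and it only bounds the application's own buffer: under MEMFS the file contents themselves still live in memory, which is the problem described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Streams `path` through one reusable buffer of `chunkSize` bytes.
// Returns the number of bytes read, or -1 if the file cannot be opened.
long long readInChunks(const char* path, std::size_t chunkSize) {
    assert(chunkSize > 0);
    std::FILE* file = std::fopen(path, "rb");
    if (file == nullptr) {
        return -1;
    }
    std::vector<unsigned char> buffer(chunkSize);
    long long total = 0;
    std::size_t got = 0;
    while ((got = std::fread(buffer.data(), 1, buffer.size(), file)) > 0) {
        // A real engine would hand this chunk to its demuxer or decoder here.
        total += static_cast<long long>(got);
    }
    std::fclose(file);
    return total;
}
```

With a file system that streams data on demand, code like this would keep the memory footprint close to the chunk size instead of the size of the whole file.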

OK. That's all for me. Thank you very much.
