Thoughts and considerations on building audio apps on the web

Presenter: Hongchan Choi (Google)
Duration: 12 minutes
Slides: PDF

Slide 1 of 21

Hello, my name is Hongchan and today I'm going to talk about some thoughts and considerations in building audio apps on the web. The goal of this presentation is to lay out some discussion topics on web-based media production.

Slide 2 of 21

Just to briefly introduce myself, I'm the tech lead of the Chrome Web Audio team and the co-chair of the W3C Audio Working Group.

Slide 3 of 21

Here's my question to you: what are the things you need to think about if you were to build a web audio app today?

Slide 4 of 21

Obviously, the first thing you need to take a look at is the Web Audio API, but I'm not going to talk about how to use it here today.

It has been around for more than a decade and we have plenty of code examples and tutorials out there. Instead, I would like to discuss its architecture and performance characteristics.

Slide 5 of 21

Two notes. First, the Web Audio API is a graph-based audio programming environment. There are a handful of audio nodes you can interconnect to create a graph.

Secondly, the graph renderer runs on a dedicated high-priority thread, which is usually a real-time thread.

This design was inevitable because the Web Audio API is a part of the web platform.
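To make the graph model concrete, here is a minimal sketch that connects a few built-in nodes; the specific nodes and values are arbitrary.

    // A minimal Web Audio graph: oscillator -> gain -> destination.
    // (In a real page the context usually has to be resumed from a user gesture.)
    const context = new AudioContext();

    const oscillator = new OscillatorNode(context, { frequency: 440 });
    const gain = new GainNode(context, { gain: 0.5 });

    // connect() builds the graph on the main thread; the actual rendering
    // happens on the dedicated audio thread.
    oscillator.connect(gain).connect(context.destination);
    oscillator.start();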

Slide 6 of 21

Processing audio streams directly on the application's main thread causes a poor user experience in general. This is why the Web Audio nodes live on the main thread while the actual audio processing, what I call the internals, happens on a dedicated, isolated thread.

For better or worse, the Web Audio API hides the low-level audio implementation away from the developer. It means that you don't have to write an oscillator or a filter or a compressor from scratch; they are provided by the implementation. But it also means that things can get complicated quickly when you want to touch the bare metal, such as implementing your own filter that manipulates audio samples.

Slide 7 of 21

For that sort of use case, Web Audio API has AudioWorklet. With this object, you can write your own audio processing modules with JavaScript and WebAssembly.
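As a minimal illustration, here is roughly what an AudioWorklet module and its main-thread counterpart look like; the processor and file names are placeholders.

    // noise-processor.js: runs on the audio rendering thread.
    class NoiseProcessor extends AudioWorkletProcessor {
      process(inputs, outputs) {
        for (const channel of outputs[0]) {
          for (let i = 0; i < channel.length; ++i) {
            channel[i] = Math.random() * 2 - 1; // white noise
          }
        }
        return true; // keep the processor alive
      }
    }
    registerProcessor('noise-processor', NoiseProcessor);

    // main.js: runs on the main thread.
    const context = new AudioContext();
    await context.audioWorklet.addModule('noise-processor.js');
    const noise = new AudioWorkletNode(context, 'noise-processor');
    noise.connect(context.destination);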

Slide 8 of 21

Another interesting aspect is that the Web Audio API is a JavaScript API. As you already know, JavaScript is a garbage-collected language with some controversial quirks around typing and scoping, et cetera. When you are building a larger-scale, real-world product, you will encounter problems related to garbage collection and performance.

It's something you cannot control and it varies across browsers, but you have to be mindful.

Technically, garbage collection should not impact the Web Audio API's renderer because it runs on a different thread, but that's not always the case. Even if your own code is flawless and doesn't create any garbage, the libraries you're using might be wasteful and might be inducing garbage collection. Creating too many objects at once will eventually put pressure on the audio renderer, because audio nodes are garbage-collected objects; the internals are not, but the two are still associated with each other.
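One common mitigation, sketched below under the assumption of the default 128-frame render quantum, is to preallocate buffers once and reuse them in every callback instead of allocating per render quantum; the pass-through processing here is only a placeholder workload.

    // Allocate scratch memory once in the constructor and reuse it in process(),
    // instead of creating new arrays (and therefore garbage) hundreds of times
    // per second on the audio thread.
    class PreallocatedPassthrough extends AudioWorkletProcessor {
      constructor() {
        super();
        this._scratch = new Float32Array(128); // reused every render quantum
      }
      process(inputs, outputs) {
        const input = inputs[0][0];
        const output = outputs[0][0];
        if (!input || !output) return true;
        this._scratch.set(input);   // no per-callback allocation
        output.set(this._scratch);  // pass-through shown for brevity
        return true;
      }
    }
    registerProcessor('preallocated-passthrough', PreallocatedPassthrough);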

Slide 9 of 21

So what do you do? You inspect and profile the performance. It gives you insights on when it happened and how it happened.

In Chrome, you can use the Web Audio perf toolkit and that's my first offer today.

Slide 10 of 21

First, the Web Audio DevTools panel. This is a very simple tool that allows you to monitor the health of the audio system and its rendering capacity.

If you're experiencing audio glitches, it's most likely one of two cases.

A: the callback timing is irregular. This can happen when the renderer runs on a lower-priority thread.

And B: the audio processing load goes beyond the CPU capacity. This can happen for many reasons, but in the end you're trying to do too much and the callback misses its deadline.

The DevTools panel provides the metrics for both.

Slide 11 of 21

Secondly, we have the audio graph visualizer extension. This is the most recent addition to our toolkit. It is not shipped with Chrome, so you will have to install it from the Chrome Web Store, which is just a one-time process.

This tool is useful in at least two cases. First, a larger scale web audio application typically constructs and destroys a lot of audio nodes. It's really hard to spot a wrong connection between them by reading the source code.

The visualization is immensely better at pinpointing a mistake.

Secondly, it allows you to understand the level of redundancy in your graph. You might be creating too many gain nodes for no reason; using several gain nodes to wrap a subgraph is a very common technique.

Also, there might be an orphaned node that is created, but not connected to anything, which is surprisingly common as well.
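As a hypothetical example of the kind of thing the visualizer surfaces: a per-voice wrapper gain node that is never disconnected keeps piling up in the graph long after the voice has finished.

    // Hypothetical voice-management helper: each one-shot voice gets its own
    // wrapper gain node. Forgetting the disconnect() leaves orphaned wrappers
    // behind, which the visualizer makes easy to spot.
    function playVoice(context, buffer, destination) {
      const wrapper = new GainNode(context);
      const source = new AudioBufferSourceNode(context, { buffer });
      source.connect(wrapper).connect(destination);
      source.start();
      source.onended = () => wrapper.disconnect(); // clean up when the voice ends
      return wrapper;
    }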

Slide 12 of 21

Lastly, you can use Chrome's Tracing tool. This is a bit more involved compared to the previous options, but it is comprehensive and it's full of insights.

You can use this by going to chrome://tracing. I suggest reading the article that I wrote to explain how to use it for an audio application. You can just Google "Profiling Web Audio apps."

This tool is also important for two reasons. First, this shows exactly when things went down and how they happened. You will be able to see when an audio stream glitches, like buffer underruns, and make an informed guess about why.

Secondly, this is incredibly useful when you communicate with Chromium engineers. It is very likely that we don't have the exact same setup as you, so reproducing the issue on our end might be impossible. So, when fixing bugs, exchanging a trace file with us really helps the communication.

Slide 13 of 21

Okay, let's shift gears and talk about other issues like device latency and user privacy.

Slide 14 of 21

As you're building a client-side application, like an instrument, an audio recorder, an editor, or a DAW, you will soon realize that the lack of access to audio devices is a big gap between the web and the native platform.

It means that device-related settings, such as the number of channels, sample rate, and buffer size, are not readily available to your application.
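For reference, this is roughly the device-related information an AudioContext does expose today; exact values and support vary by browser and platform.

    const context = new AudioContext();

    console.log(context.sampleRate);                  // e.g. 48000
    console.log(context.baseLatency);                 // processing latency, in seconds
    console.log(context.outputLatency);               // output device latency, where supported
    console.log(context.destination.maxChannelCount); // channels of the current output device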

We, browser implementers, are actually aware that this is a huge pain point for developers, but it is not without reason.

This device related information can be exploited by advertisers or attackers to infer the user's identity. This technique is called fingerprinting and it is one of the reasons that we cannot have nice things on the web.

There are, of course, countermeasures to this type of exploitation: a constraint-based API pattern, for example. The app can make an inquiry and the platform will accept or reject it depending on the current client's capability.

It's like asking, "Hey, my app needs four channels at 48k and the lowest possible latency," and the platform will say yes or no.

That way, it is much harder to sneak in with drive-by fingerprinting and at the same time, we don't lose much API usability.
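getUserMedia's constraint model is one example of this pattern. A sketch is below; which constraints a given browser actually honors varies, so treat it as illustrative.

    // Ask for a specific configuration; the platform grants what it can.
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        channelCount: { ideal: 4 },
        sampleRate: { ideal: 48000 },
        latency: { ideal: 0 },   // "as low as possible"
        echoCancellation: false, // processing you usually don't want for music
      },
    });

    // Inspect what the platform actually granted.
    console.log(stream.getAudioTracks()[0].getSettings());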

Protecting user privacy was considered a hassle and it was definitely a limiting factor of the web platform, but I believe this so-called privacy-first approach to API design is gradually becoming the norm, even on native platforms.

These days, you will find similar protection mechanisms, like a system-wide permission UI for microphone access, in other operating systems like macOS or Windows.

Slide 15 of 21

Now let's talk a little bit about latency. I'm well aware that this is a thorny issue when it comes to the web platform, and at least for Chrome Web Audio, we are not doing particularly well in the audio latency department.

For audio production apps, latency is important for at least two reasons. First, the minimum possible latency matters when you're recording or monitoring, but accurate latency reporting from the platform is also critical for compensating for it after the fact.

But it's a tricky problem for browsers. The browser needs to support a variety of configurations on many different platforms, which means we are spread thin and might be missing some obvious platform-specific optimizations.

When seasoned audio developers jump into Chrome's audio infrastructure and point out problems, we are always grateful for that, and that has actually happened several times in the past.

Also, Web Audio is not the only audio API on the platform. In Chrome, WebRTC and the media element share the same audio infrastructure as Web Audio. This makes it hard to bring in a big change that only benefits Web Audio and not the others.

RTC and media usually focus on resilience, which means more buffering, but Web Audio cares more about low latency and interactivity, which means less buffering. This conflict makes it hard to apply aggressive optimizations that only benefit Web Audio.

Slide 16 of 21

What's the reality today?

For Web Audio, you have to use getUserMedia for microphone input, and the output simply goes to the system default audio device.
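In practice that looks something like this minimal sketch:

    // Microphone input today: getUserMedia, then wrap the stream in a source node.
    const context = new AudioContext();
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const micSource = new MediaStreamAudioSourceNode(context, { mediaStream: stream });
    micSource.connect(context.destination); // output goes to the system default device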

But what if you want to use an audio device other than the default one? The only known solution is to use the audio element, streaming the Web Audio output to an audio element associated with the selected device.
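Roughly, that workaround looks like the sketch below; selectedDeviceId is assumed to come from enumerateDevices(), and setSinkId() support varies by browser.

    // Route Web Audio output to a chosen device via an audio element.
    const context = new AudioContext();
    const streamDestination = new MediaStreamAudioDestinationNode(context);
    // ...connect your graph to streamDestination instead of context.destination...

    const audioElement = new Audio();
    audioElement.srcObject = streamDestination.stream;
    await audioElement.setSinkId(selectedDeviceId); // selectedDeviceId: placeholder
    audioElement.play();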

Here, streaming usually means there is some buffering going on somewhere down there. That can't be good for latency.

Slide 17 of 21

What can we do about it? The Audio Working Group is currently working to create a new API that allows you to select the audio output device for an audio context. Theoretically, this will guarantee the code path that minimizes the output latency.
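The shape under discussion looks roughly like the sketch below; names and details may change before it ships, and the device selection logic is purely illustrative.

    // Proposed shape (subject to change): pick the output device directly on the
    // AudioContext, with no audio element detour.
    const devices = await navigator.mediaDevices.enumerateDevices();
    const output = devices.find((d) => d.kind === 'audiooutput'); // pick your device here

    const context = new AudioContext({ sinkId: output.deviceId });
    // ...or switch later:
    await context.setSinkId(output.deviceId);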

Slide 18 of 21

One can also dream about creating a new API for input device selection. I'm curious how many people would want that. Please let me know what you think.

Slide 19 of 21

That's all I have for today and here's the conclusion.

Slide 20 of 21

We talked about the design and architecture of the Web Audio API, I introduced the Web Audio perf toolkit from Chrome, and we discussed the problems in device access and latency.

By all means, this is just a conversation starter, not a comprehensive guideline.

Slide 21 of 21

With that, I would like to invite you to a survey, so we, browser implementers, can understand your needs better. Here's a link.

Lastly, with my Chrome tech lead hat on, building a healthy ecosystem for Web Audio is my job, and I'm open to having a chat with anyone who is interested in a partnership with my team.

Please feel free to email me or DM me on Twitter. Thank you for watching. Be safe and stay healthy.

