Minutes of live session 1 –
WebCodecs, Web Audio and media synchronization

Also see the agenda, the minutes of session 2 and the minutes of session 3.

Present: Andrew MacPherson (Soundtrap/Spotify), Animesh Kumar (InVideo), Anton Hedblad (Spotify), Bruce Devlin (SMPTE SVP), Charles Van Winkle (Descript), Chris Cunningham (Google), Chris Needham (BBC), Christoph Guttandin (Media Codings/InVideo/Source Elements), Dan Tatut, Daniel Gómez (InVideo), Enrique Ocaña (Igalia), Eric Desruisseaux (Autodesk), Francois Daoust (W3C), Gerrit Wessendorf, Abhilash Hande (InVideo), He Zhi (China Mobile), Hongchan Choi (Google), James Pearce (Grass Valley), Jeffrey Jaffe (W3C), Junyue Cao (Bytedance), Karen Myers (W3C), Kazuyuki Ashimura (W3C), Kelly (CaptionFirst), Kevin Streeter (Adobe), Louay Bassbouss (Fraunhofer FOKUS), Luis Barral (Avid), Marie-Claire Forgue (W3C), Matt Paradis (BBC), Max Grosse (Walt Disney), Mun Wai Kong (Grabyo), Nigel Megitt (BBC), Oliver Temmler (ARRI), Paul Adenot (Mozilla), Paul Randall (Avid), Paul Turner, Peter Salomonsen (WebAssembly Music), Pierre-Anthony Lemieux (Sandflow Consulting / MovieLabs), Qiang Fu (Bilibili), Rachel Yager (W3Chapter), Sacha Guddoy (Grabyo), Sahil Bajaj, Song XU (China Mobile), Spencer Dawkins (Tencent), Steve Noble (Pearson), Takio Yamaoka (Yahoo! JAPAN), Ulf Hammarqvist (Soundtrap/Spotify), Van Nguyen (Vidbase), Wolfgang Heppner (Fraunhofer IIS), Yuhao Fu (Bytedance).

Table of contents

  1. Opening remarks
  2. Agenda bashing
  3. WebCodecs
  4. Web Audio API
  5. Media synchronization
  6. Next session

Opening remarks

See Opening remarks slides and transcript.

Agenda bashing

Chris Needham: In the first session, we said we would focus largely on three main areas, looking at WebCodecs, Web Audio and media synchronization issues.
... What we have done is review all of the talks that were submitted. Thank you to everybody who submitted presentations. We pulled out some of the questions that were raised in the video presentations.
... All of these that we have collected, we have added into GitHub and for some of these we have got some conversations started.
... I have tried to reply on particular issues where perhaps I have some input to give, but we thought we would go through the issues that have been captured and have a conversation around each one, to say what requirements exist and what we should be considering in terms of the development of the web platform.

WebCodecs

Quality control knob

Related GitHub issue: issue #54.

Chris Needham: Let's start by looking at WebCodecs. Please use the Slido to add any WebCodecs-specific questions, but I think what I would like to do is to come initially to a question that Chris Cunningham asked in his presentation, which is: WebCodecs offers a number of configuration options allowing you to control the media output, but what other control parameters would you like to see?
... I would like to open it up for any comments or thoughts on that particular question.

Kevin Streeter: An area that's interesting to discuss is how you control encoding quality in terms of the amount of time spent, the rates and those kinds of things.

Chris Cunningham: Super interested to discuss that topic. Could you tell me about APIs that you have seen that do that and which you have liked?

Kevin Streeter: There are a few different ways that is typically done. Sometimes you get the fairly granular slow/fast behaviors, where internally that's making assumptions about the quantization levels and, in some respects, the amount of work put in, for example, into optimizing motion vectors and finding patterns.
... And then others have a more direct kind of approach where you can more or less put in a direct numeric value representing sort of that rate-distortion balance, and you have the implementation of WebCodecs honor that.
... The former is very easy to use: even if you don't have a really deep understanding of how the codec works, you get more or less reasonable results, though with variations on different content. With the latter you need a deeper understanding of how the codec operates, but you have a lot more control.

Chris Cunningham: That sounds great. So I can follow up after the call and read more, for the latter example, is there a tag for instance or a specific API you're thinking of?
... For example, a case where you have the ability to just use the high-level API into the tool to control how much the presets need to be tweaked, but then you can also tunnel through parameters directly to the codec library, whichever one you're using. Most codecs have a next level of control over how the encoding process operates.
... I have not done a full investigation. The challenge with WebCodecs is that you need to have a unifying layer on top of the libraries.
... What you're describing sounds doable and you should expect we'll get to it.

Kevin Streeter: To get the control, you have to really understand what's happening under the hood and that's exposing details of the implementation, so that's a hard balance.
... Some of this depends on what you're doing. At least in the work we're doing right now, what's happening in the browser is a preview type of operation: you're rendering something to see how it will look, and we're not really relying on that for the final export.
... A final export would be something that would happen on a server someplace with encoders that we control. But we would like to be able to later port that to the browser; that would be great, not having to do a round trip to the server for that kind of thing. That's really where having deeper control over the encoding process matters; people care at that point where every pixel is.

Chris Cunningham: Makes a lot of sense. When you think of the challenges between different platforms, let's say I can give you what you're asking for on Windows but I couldn't on Android. How would you like to see that distinction surfaced with the API? We have done some knobs as more of a hint. There is a knob for instance that says "prefer quality over latency". That's a hint to the extent that if the encoder can't do it, we do our best.
... Then there are also knobs, like preferring hardware acceleration, that are, at least in Chrome, more of a guarantee: if we can't give you this, we fail to configure the encoder entirely. How do you reason about these particular settings?

Kevin Streeter: I think both treating it as a hint or a requirement and failing could work.
... If a hint, you would still want to know whether the hint was honored. You want some way to be able to introspect or test that you're getting the behavior that you want.
... In particular, you don't want it to change from version to version of a browser. You don't want it to be something that didn't use to work, so you have some code in there managing that, and then suddenly the behavior changes and things break. You want to know if it is active or not.

Chris Cunningham: Got it. That's probably all my questions, makes a lot of sense. All that is left is the homework.
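
To make the knobs discussed above concrete, here is a minimal sketch of a WebCodecs encoder configuration as it exists today, assuming the Chrome behaviour Chris describes: latencyMode acts as a hint, hardwareAcceleration as a hard requirement, and VideoEncoder.isConfigSupported() provides the kind of introspection Kevin asks for. All numeric values are illustrative.

    // Minimal sketch; all numeric values are illustrative.
    const config: VideoEncoderConfig = {
      codec: "avc1.42001E",                     // H.264 Baseline, illustrative
      width: 1280,
      height: 720,
      bitrate: 2_000_000,                       // 2 Mbps
      framerate: 30,
      latencyMode: "quality",                   // hint: prefer quality over latency
      hardwareAcceleration: "prefer-hardware",  // in Chrome, configure() fails if unavailable
    };

    // Introspection: check up front whether the configuration can be honored,
    // rather than discovering a silent behaviour change later.
    const { supported } = await VideoEncoder.isConfigSupported(config);
    if (!supported) {
      config.hardwareAcceleration = "no-preference"; // fall back explicitly
    }

    const encoder = new VideoEncoder({
      output: (chunk, metadata) => { /* mux or store the EncodedVideoChunk */ },
      error: (e) => console.error(e),
    });
    encoder.configure(config);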

Chris Needham: Any more thoughts from anyone else on this particular aspect?

Pierre-Anthony Lemieux: Just kind of a meta question that I'm sure we'll ask a number of times over the next three days. What is the best way for folks to provide feedback on that kind of issue? Really pragmatically, for instance, what kind of knobs should the API expose?
... Call directly to you? Is it on the WebCodecs GitHub, what's the right way for the community to provide feedback?

Chris Cunningham: GitHub is the best way. And also, if there is a feature request on the GitHub, pile on, you know; that helps us to get clarity on the features that are most requested.

(De-)Muxing API

Related GitHub issue: issue #35.

Pierre-Anthony Lemieux: We have a bunch of questions online so maybe we start with the first one. I know WebCodecs has no (de-)muxing capabilities. I wonder if there is an open-source library that fills that gap in user land. Who wants to take that on? Somebody says, well, WebCodecs is great, it allows me to decode the bitstream but how do I get to the bitstream? What's the right approach to that?

Chris Cunningham: I can take a quick shot. We have the Swiss army knife, FFmpeg, and we have seen that compiled to WASM, and then you have everything.
... That's an involved process if you have not done that. I know there are a lot of builds and different things available now. But I recognize that is an adventure to say the least.
... This is on the agenda for this quarter, to experiment with this more myself. If I can find an elegant approach, I can fix this in one pass; if it is super painful, if it doesn't meet the needs of a large group of users, whatever, I'm open to considering other plans!
... In JavaScript, there is MP4Box.js, a great library, and we have written demos with that library, and there are a handful of other libraries that are less well known. We have another example on the WebCodecs GitHub, and there are a dozen or so libraries in similar situations that are not well supported.
... That's not a super satisfying answer for folks that need to get something done.

Christoph Guttandin: I was asking the question. Thank you. I was specifically asking because, maybe I'm wrong, but it kind of defeats the benefits of WebCodecs if I have to ship FFmpeg anyway; then why should I use WebCodecs, besides the acceleration? What can I split off when WebCodecs is doing the decoding?

Chris Cunningham: I completely agree with you.
... If I find that it is impossible to do that, I would be very disappointed.
... FFmpeg is probably the wrong word, in that, I would only want to include the libavformat part. Based on experience doing this in Chrome, Chrome uses FFmpeg extensively and we have a script that configures just the codecs we want, just the decoders we want in Chrome and this is probably generally supported and something that we can do.
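
As an illustration of that user-land path, here is a rough sketch using MP4Box.js to demux an MP4 and feed its samples to a WebCodecs VideoDecoder. The MP4Box.js usage is simplified, and for AVC/HEVC content a codec description (the avcC/hvcC box) would also need to be passed, which is omitted here.

    // Assumes MP4Box.js has been loaded (e.g. mp4box.all.js); typings simplified.
    declare const MP4Box: any;
    declare const buffer: ArrayBuffer;  // the MP4 file's bytes

    const decoder = new VideoDecoder({
      output: (frame) => { /* draw or re-encode the VideoFrame */ frame.close(); },
      error: (e) => console.error(e),
    });

    const file = MP4Box.createFile();

    file.onReady = (info: any) => {
      const track = info.videoTracks[0];
      // NOTE: for AVC/HEVC you would also pass the avcC/hvcC box as "description".
      decoder.configure({ codec: track.codec });
      file.setExtractionOptions(track.id);
      file.start();
    };

    file.onSamples = (_id: number, _user: unknown, samples: any[]) => {
      for (const s of samples) {
        decoder.decode(new EncodedVideoChunk({
          type: s.is_sync ? "key" : "delta",
          timestamp: (s.cts * 1_000_000) / s.timescale,      // microseconds
          duration: (s.duration * 1_000_000) / s.timescale,
          data: s.data,
        }));
      }
    };

    (buffer as any).fileStart = 0;  // MP4Box.js needs to know the buffer's file offset
    file.appendBuffer(buffer);
    file.flush();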

Pierre-Anthony Lemieux: Anything else on that topic?

Paul Adenot: A quick one. I'm Paul from Mozilla, and we have been moving demuxers from native code to running them in WASM in Firefox, and for now the performance difference has been in the noise. So if performance is a concern for you, I would try it anyway; even for big workloads we have not been able to notice any problem. Differences are not that big.

Chris Needham: In terms of an open-source library that I have developed, I was looking at replacing Web Audio's decodeAudioData with WebCodecs. decodeAudioData does everything, so moving that to the application layer, I have to ship a library for every format I want to support, for all of the different audio formats that people may choose to use.

Paul Adenot: That's the concern. The idea was that you take the most expensive part of the process and, more importantly, you get the hardware decoders and encoders to be used with a high degree of flexibility; demuxing was left for after because it felt like doing everything at once would be a bit complicated.
... I think Chris mentioned already that it is not excluded forever.
... Let's see how this fares and let's reconsider if need be.

Chris Needham: Makes sense. Let's come to the related questions.

Memory copies

Related GitHub issue: issue #30.

Pierre-Anthony Lemieux: James Pearce asked: Does a Web Assembly demuxer require a memory copy from an ArrayBuffer in JavaScript land to the Web Assembly heap, and can we avoid that? Is it simply that the overhead is meaningless?

Paul Adenot: For now it does require a memory copy, but there are a few different solutions in the works. The important thing to note here is that it is on the encoded packets, so it is less of a problem really; the memory footprint is much, much smaller.
... There are different discussions happening about skipping copies, via a new API on the codec side of things, or in the WebAssembly spec, where you would be able to hand over memory and free memory you're using, and then you will be able to do the zero-copy stuff. For now, there is nothing finished or designed to completion that everybody agrees on. It is in the works.
... I talked about this in my talk, and there are a number of links you can follow from the slides to the various efforts and issues. Not only for WebCodecs but for Web Audio and other things too.

Chris Cunningham: I agree with Paul. I just want to say, in terms of the browser, in Chrome, we do copy pretty liberally. Definitely in practice, Paul is correct, the copies, they're not a performance killer.
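
To make the copy in question concrete, here is a hypothetical sketch assuming an Emscripten-built demuxer whose build exports _malloc, _free and the HEAPU8 view; the _demux_packet symbol is made up for illustration.

    // Hypothetical: push one encoded packet into an Emscripten-built WASM demuxer.
    function pushPacketToWasm(module: any, packet: Uint8Array): void {
      const ptr = module._malloc(packet.byteLength); // allocate on the WASM heap
      module.HEAPU8.set(packet, ptr);                // the copy being discussed
      module._demux_packet(ptr, packet.byteLength);  // made-up exported function
      module._free(ptr);
    }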

Back to quality control knob

Related GitHub issue: issue #54.

Pierre-Anthony Lemieux: Okay. Next, going back to quality, we touched on that a little bit, we talked about a quality knob. But what's a quality knob? Is it QP? Is it bitrate, variable bitrate? Any thoughts on what a quality knob would look like, or is it completely TBD, with community input being sought?

Chris Cunningham: Definitely input is needed, it is a wide open question for anybody who has a favorite quality knob.
... This is covered to some extent in the first question with Kevin. If folks have additional ideas, different knobs from what was suggested, we're definitely listening.

Kevin Streeter: I feel it is also about understanding what the problem is. There are two modalities that we work in.
... One is where interactive performance matters, and so what you're usually doing is saying: I'm willing to wait this amount of time for a frame or some number of frames; how do I configure the encoder to give me the best quality that fits within that time budget?
... The other modality is: I'm willing to wait a long time, how do I configure this encoder to give me the best possible quality? Those are the two use cases that we probably want to consider.

Support for media production codecs

Related GitHub issues: issue #25 and issue #40.

Chris Needham: We have a couple more general codec questions in the Slido. The first one being, apart from decoding the current browser compatible formats, what codecs are or will be supported in the near future? This is interesting in the professional production sense, there are certain codecs that are quite specific to that use case that are not used for general distribution and playback.
... Do you have any thoughts or indications on the different codecs that browsers are looking to support through WebCodecs?

Chris Cunningham: From the Chrome side, the history of codec support is a mix of market demand, the browser's position on licensing and open source codecs, and what we think is best for the web platform. With WebCodecs basically none of that has changed.
... What may have changed is demand for codecs that we hadn't seen for the video tag generally. Definitely I'm interested to hear about that and keep an eye on it and compile the requests. At this point you should expect basically parity with what was already in the video tag for that browser, what was already part of WebRTC or MediaRecorder in the browser. We'll go from there.

Paul Adenot: Is there anything in particular that would be needed? For Firefox, we plan to more or less align with what Chrome does.

Pierre-Anthony Lemieux: Codecs that are in wide use in professional applications include ProRes and JPEG 2000.

Paul Adenot: Those are patented.

Pierre-Anthony Lemieux: JPEG 2000 is royalty free. It could be a different issue for ProRes, but JPEG 2000, and there are a bunch of others. One question, what is the right place to have that discussion?

Chris Needham: One of the presentation videos mentioned: is there a way to enable some kind of plug-in architecture, so that if the system has such a codec available, it could be discovered and exposed?

Chris Cunningham: I haven't considered this, the first thought is that it sounds hard.
... You know, WebCodecs is already basically plumbing the underlying libraries, and so conceptually I guess all that's needed is just to open a door to configure any codec that the library supports, but getting the signaling right for that seems pretty tricky.
... I don't know. I guess this is not a firm no. I don't know immediately how it would be done.

Chris Needham: And James in the question raises the point about when different formats potentially have different quality control knobs as well.

Paul Adenot: For sure.
... We have two layers that we can expose knobs on. We have the global layer, with the bitrate for example, and then we can add codec-specific options if they make sense for the format.
... For each codec, you have the special things; that's the logic underpinning it.

Chris Needham: Right, so we could define that on a per-codec basis, based on a registry.

Paul Adenot: Technically, yes. I don't remember, Chris, do you remember if we have some of those? I mean, we have the format of the bitstream itself?

Chris Cunningham: That's right.

Paul Adenot: For Opus as well I think. We don't have anything specifically related to quality yet, I don't think.

James Pearce: The idea of being able to access and install software codecs is a potential security risk. It goes back to the days of ActiveX or something like that, where you are running potentially unknown code in the browser.

Paul Adenot: We're well aware. Any web browser in production today includes various lists of things that can and cannot be used based on the crash reports and the security testing.
... It is not really a blanket "here is the list of stuff that can be decoded on this machine".
... Not just because of that, but there is also the problem of content that nobody could play on the web, right. That is also a problem. Something to be weighed. It could still be done carefully.

SEI metadata management

Related GitHub issue: issue #198 in WebCodecs repository.

Chris Needham: Anonymous on Slido asked about metadata handling. Some applications need metadata generated by the codec, or metadata that could be inserted by the application directly, like SEI. Is there any advice for how we would handle that?

Chris Cunningham: There isn't a provision for this currently, at least none that I'm aware of. I don't know SEI enough to know how it is typically done.
... I guess for instance if you typically pass the messages to FFmpeg in the bitstream, right alongside your framed data, that's already possible today.
... If it is more typical to have a higher level SEI parameter to the codec, that's obviously something we don't have.
... I guess a question back to the folks that use this sort of mechanism: how does it typically work?

Chris Needham: I don't know who asked the question, feel free to jump in and give your point of view.

Pierre-Anthony Lemieux: I didn't ask the question, but I'm going to try to channel the person who asked it.
... The issue to date, it doesn't work very well, right? That's been a perennial issue of trying to extract the metadata from the bitstream, when it's not present at the container level.
... Assuming that there is some data that's identified, that's within the bitstream that's useful to expose, could that be exposed by WebCodecs or is there a fundamental architecture issue that we have?

Chris Cunningham: I think I get it, the desire is not to configure the codec, rather it is in the bitstream and you want to get the codec to tell you what it finds. Is that right?

Pierre-Anthony Lemieux: Exactly. That's how I interpreted the question. There is some metadata deep down in the bitstream, and sure the application could parse the bitstream, extract that and pass it again to WebCodecs to be decoded but the question is, could WebCodecs just expose that information directly? I think that's the question.

Chris Cunningham: I don't have any philosophical reasons not to do that, it sounds like a useful thing. It will depend on the libraries of course and maybe the folks who asked the question are familiar with the native side or FFmpeg. What we could do is bounded by what they do already.
... If it is already broadly supported and in a uniform manner, I think it is very much low-hanging fruit, but I'm not sure whether that's actually true.

Paul Adenot: Something related: in WebCodecs there is also a way to do image decoding, and there are talks about exposing the EXIF metadata. If you find something in the bitstream, it could be exposed, if it is structured enough and it can be exposed in a simple way.
... Then we have to discuss with the people that need this to make sure that it is useful, or we could check actually. It would have to be in the codec bitstream for this to be useful; otherwise it is out of scope, as discussed previously.

Chris Cunningham: I think we have an issue in the GitHub requesting certain information (it doesn't come to mind right now), and this would be a great moment where folks that are interested in it, who clearly are more knowledgeable than me on what they're interested in, should head over to GitHub, maybe find that issue, or file a new issue, and we can compile the use cases and consider a possible structure for the data.

Chris Needham: A use case that came up in the discussions around HDR is the metadata, the luminance information and so on that may be present in the bitstream.

Pierre-Anthony Lemieux: It comes naturally for the web community to go and make requests on GitHub. I think that's a pretty unusual process for the professional media production community.
... Really, this is a point to emphasize. If you have a request for a feature in the WebCodecs API, the right place to do it, certainly to start the discussion, is by filing an issue and providing sample bitstreams. For instance, if you'd really like WebCodecs to expose a certain parameter in the bitstream, maybe open an issue on the WebCodecs repo with a sample bitstream and what the API should look like. Is that a fair statement, is that the right mode for folks to provide feedback?

Paul Adenot: We want the APIs to be useful. So at the end of the day, we want to discuss it with anyone that's using it.

Pierre-Anthony Lemieux: If there is any folk on this particular meeting, for instance, that may not be really used to that type of work, feel free to contact Chris or myself to guide you. We can also have some pre-discussion, for instance in the Media & Entertainment Interest Group or in one of the Community Groups such as the Color on the Web Community Group if it is color related. Ultimately, issues get solved on the web by filing issues on GitHub or in some issue tracker with an example and some suggestions.

Chris Needham: Absolutely. I think in our wrap up session on Friday we will talk about the different groups that W3C has where, if we want to have conversations before filing issues, then there are various places that we can discuss as well.

Priming in the video domain and pre-charge in the audio domain

Chris Needham: Is any work planned to support codecs that support priming in the video domain and pre-charge in the audio domain?

Paul Adenot: The thinking has been that it is a low-level API, so you get your preroll, the priming samples, in the output, and you know how many there are, whether it is fixed or whatever. Then you discount those yourself.
... It is only at the API level; there is no way to prime it for you. For the video part, I know much less about it.

Getting a decoded video frame from a video element

Chris Needham: Yuhao Fu asked a question about video element integration, is it possible to get a decoded video frame from a video element so that we can do some processing on it?

Paul Adenot: There is a way that's not quite standard yet, although it is somewhat shaped already in MediaStreamTrack Insertable Media Processing using Streams. You use the HTML video element's captureStream method. That gives you a stream, and then you construct a MediaStreamTrackProcessor from its track; that gets you a WHATWG Stream, and with this you can consume the frames.
... There are lots of discussions here at the moment. It was designed more for WebRTC use cases, so we'll see how it fares for this. For example, what happens if the processing takes longer than the interval between frames?

Chris Needham: Is there anything we can look at that captures the discussion or some example?

Paul Adenot: See MediaStreamTrack Insertable Media Processing using Streams.

Chris Cunningham: On the same point, we have an example on GitHub. It takes camera frames, encodes them using VP8 or VP9, puts that in a file, and uses the API that Paul just mentioned, the MediaStreamTrackProcessor, to grab the frames from a user media stream. It is not exactly the same, you don't have a video element as the source, but it is mostly the same with that one exception.

Chris Needham: Right. You can take the example, adjust it to capture the stream from the video element.

Paul Adenot: See the above spec. You will see, it is an unofficial draft. You create the MediaStream. There are two objects, the MediaStreamTrackProcessor and the MediaStreamTrackGenerator for the two directions.
... One thing we mentioned: you may have a video element that's paused. When it is not paused, a video element is a media stream source; but the VideoFrame constructor also accepts the element as an input, so you can create video frames by passing in the video element, and that will grab whatever frame is currently presented. We recognize that's a little bit hard to synchronize with an actively playing video element.
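
Here is a rough sketch of the path described above, assuming a browser that implements MediaStreamTrackProcessor (Chromium at the time of the workshop): capture a playing video element and read VideoFrames from the resulting track. For a paused element, new VideoFrame(video) grabs the currently presented frame instead.

    const video = document.querySelector("video")!;
    await video.play();

    // captureStream() gives a MediaStream whose video track mirrors the element.
    const stream = (video as HTMLVideoElement & { captureStream(): MediaStream })
      .captureStream();
    const [track] = stream.getVideoTracks();

    // MediaStreamTrackProcessor exposes the track as a ReadableStream of VideoFrame.
    const processor = new MediaStreamTrackProcessor({ track });
    const reader = processor.readable.getReader();

    for (;;) {
      const { value: frame, done } = await reader.read();
      if (done) break;
      // ... process the frame, e.g. pass it to a VideoEncoder ...
      frame.close(); // close promptly so capture does not stall
    }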

Web Audio API

Will decodeAudioData be deprecated?

Chris Needham: In the interest of time, we should move to some Web Audio topics. There were a couple of questions. With WebCodecs, will decodeAudioData be deprecated, now that we have the AudioDecoder in WebCodecs?

Paul Adenot: Generally, things are not deprecated on the web, so no, I guess. It will continue working, but there are so many problems with decodeAudioData that people who want to do something with a higher degree of control would prefer to do something else, even just working at the sample level. Generally, it is going to continue working.

Hongchan Choi: Is there a need for the deprecation?

Chris Needham: I agree with Paul's views. Once the APIs are shipped, there is a long term commitment to supporting those. I think it is just too late to remove the popular API at this point.
... The question came from someone in Slido. If you'd like to put your point of view, that would be welcome.

Measuring the intrinsic and output latency

Related GitHub issues: issue #31 and Web audio issue #469.

Chris Needham: One of the big topics that came out through the video presentations that were submitted was around latency and Web Audio. Real time media production applications need low latency so that you could, for example, record at the same time as you're playing back and not be terribly out of sync.
... A question is, are you able to query the latency of a Web Audio node or a complete audio graph?

Paul Adenot: There is no way to query that directly. There is a way to compute it: input an impulse to the node and measure the output.
... For audio nodes, it depends on the parameters, the simplest example being the delay: you can change the delay, so the latency can depend on that parameter as well.
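
A rough sketch of that impulse approach (with the limitations Charles raises next): render a one-sample impulse through a node offline and take the position of the peak as a latency estimate. The DynamicsCompressorNode is just an example of a node with a fixed look-ahead.

    // Estimate, in seconds, the latency a node introduces; not robust in general.
    async function estimateNodeLatency(): Promise<number> {
      const sampleRate = 48000;
      const ctx = new OfflineAudioContext(1, sampleRate, sampleRate); // 1 s buffer
      const impulse = ctx.createBuffer(1, 1, sampleRate);
      impulse.getChannelData(0)[0] = 1;

      const src = new AudioBufferSourceNode(ctx, { buffer: impulse });
      const nodeUnderTest = new DynamicsCompressorNode(ctx); // example node
      src.connect(nodeUnderTest).connect(ctx.destination);
      src.start();

      const rendered = await ctx.startRendering();
      const data = rendered.getChannelData(0);
      let peak = 0, peakIndex = 0;
      for (let i = 0; i < data.length; i++) {
        const v = Math.abs(data[i]);
        if (v > peak) { peak = v; peakIndex = i; }
      }
      return peakIndex / sampleRate; // position of the peak = latency estimate
    }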

Charles Van Winkle: Thank you, Paul. This was my question. I appreciate you confirming that.
... The impulse method won't be robust and work in every scenario, because there could be some arbitrary node that, unbeknownst to you, actually mutes the audio depending on its parameters, and then you have no way to measure it.
... I agree with you, in a lot of scenarios, like with the filter, something like that, that could be a reasonable approach.

Paul Adenot: You would have to be able to build a special subgraph, and then you can do an addition, because Web Audio itself doesn't add latency between the nodes, just the sum of the latencies of the nodes themselves; it is direct synchronous processing. If you build a replica of the processing graph and you send the impulse through it, then you have the latency, granted you disable the nodes that mute or whose output depends on the input. It would be really hard to do. It is probably possible to do it analytically most of the time.

Charles Van Winkle: Any filter would smear the output of the impulse, which would require a lot of DSP. The reason I'm asking is that I'm trying to synchronize multiple streams of audio, and some streams have a chain of nodes and another stream does not.
... A typical Digital Audio Workstation example. And I need to delay the playback of the stream that doesn't have the nodes, so that when they come out of their respective chains and I mix them together, they're time-aligned.
... That's something that is handled very well in plugins on the desktop, and it's something that most people won't notice right now, but at some point I will have a customer, a project, where it becomes egregious.
... Today I can add the delay and align things manually, moving a slider up, but it would be nice to do that in a way that, when I prefetch audio, I could guarantee the latency when it comes out of the respective graphs.
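
A minimal sketch of that manual compensation, assuming the processed chain's latency is known (measured with an impulse as above, or taken from a table of per-node latencies); the 6 ms figure and the compressor are purely illustrative.

    // Delay the dry path by the wet chain's latency so both mix time-aligned.
    function connectWithCompensation(ctx: AudioContext,
                                     dryTrack: AudioNode,
                                     wetTrack: AudioNode,
                                     chainLatencySeconds = 0.006) {
      // Processed path: e.g. a compressor with a fixed look-ahead.
      const compressor = new DynamicsCompressorNode(ctx);
      wetTrack.connect(compressor).connect(ctx.destination);

      // Dry path: add the same amount of delay before the mix point.
      const compensation = new DelayNode(ctx, { delayTime: chainLatencySeconds });
      dryTrack.connect(compensation).connect(ctx.destination);
    }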

Paul Adenot: It sounds like exactly the same problem that we're facing. For now I have a page up that lists the delay of the nodes, and there are things that are variable. For example, if you do wave shaping with oversampling, it depends on the resampler that's being used; it is the same idea.
... The filter is the same; it is the variability between web browsers that needs to be tested.
... Generally this is exactly the same, they're exactly the same. For most effects, the compressor for example, it is exactly the same, although it is not the best.
... You have a fixed look-ahead. Generally it is fairly consistent.

Charles Van Winkle: Thank you.

Peter Salomonsen: One question regarding latency. If you are rendering audio in real time, using Web Audio, how do you detect if you're going beyond the rendering window timeframe, if you're spending too much time in your rendering code? Currently, I haven't seen any way of doing that.

Hongchan Choi: You're asking about the render capacity, not the latency itself?

Peter Salomonsen: Yes, although they are kind of related.

Hongchan Choi: Right now we don't expose the Performance API or high-resolution time on the audio thread, but I think there is some discussion on that. In the meantime, the shape of the new render capacity API is almost complete.
... I think we all agree on the API shape, we just need to add spec text and make a first implementation in Chrome with an origin trial so you could test it out. We're progressing there.

Peter Salomonsen: So there is an API for this.

Hongchan Choi: It will show the capacity of your render thread.
... If you miss anything, if you try to do too much on the audio processor, basically it will show in the number.
... You can have it running with the audio processing, in any part of a Web Audio graph, so you can do something with it.

Peter Salomonsen: That's excellent. Thank you.
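
For context, here is a hedged sketch of roughly what the render capacity API under discussion could look like; the names (renderCapacity, averageLoad, peakLoad, underrunRatio, updateInterval) reflect the proposal at the time and may change before it ships.

    // Sketch only: API shape not final at the time of this discussion.
    const ctx = new AudioContext();
    const capacity = (ctx as any).renderCapacity; // proposed AudioRenderCapacity

    capacity.addEventListener("update", (e: any) => {
      // averageLoad / peakLoad: fraction of the render-quantum budget used;
      // underrunRatio: how often the audio callback missed its deadline.
      if (e.peakLoad > 0.9 || e.underrunRatio > 0) {
        console.warn("Audio rendering near or over budget", e.averageLoad);
        // e.g. reduce the number of active voices or effects here
      }
    });
    capacity.start({ updateInterval: 1 }); // proposed option: update every second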

AudioContext in workers

Related GitHub issue: issue #43.

Chris Needham: There is a general question from James around plans for Web Audio in a Worker context.

Hongchan Choi: This was actually decided: myself and other Audio Working Group members agree on the need to support AudioContext in a Worker. I think the rest of the work is basically implementing the functionality.

Support for synthesized speech and object-based audio

Kazuyuki Ashimura: I was just wondering if we can think about synthesized speech and also object-based audio, other kinds of elements and instances.

Paul Adenot: For synthesized speech, there are no provisions to do it. The problem is that it relies on systems that are not well suited to real time processing.
... Sometimes it is remote, it necessitates server round trips. It can also come directly from the system and go directly to the speaker or the headset, and we don't have any control over the waveform. We don't even see the samples from it.
... It is unclear to us how we can do something considering this is the case for now in the current systems and implementations.
... It would be great actually, but unfortunately I don't have a better answer for now.

Kazuyuki Ashimura: That's what I thought, I'm investigating a possible workshop on voice interaction, maybe next year.

DSP languages support

Chris Needham: Thank you. James, would you like to put your question?

James Pearce: Just to give context, what we're building is a proxy-based editor. Rendering is done on the server using native code. A nice thing about WebGL is that the shaders are standardized and we pass that to the native renderer, so it's consistent between browser and native.
... Is there a way of doing that with audio? Some kind of DSP code that we may be able to give to the Web Audio API and then share that with native code somehow? It was just a thought I had as we were talking.

Paul Adenot: Generally you can write the processing code in any language, really.
... C++, FAUST, etc.
... People have implemented, I think, FAUST, PureData as well, and there have been other things.
... If you need compatibility across a bunch of those, maybe from different standards, that kind of thing, then the compatibility story would be via plug-ins. What would you plug in? AudioWorklet modules, which are the standardized way to package effects so that they are addressable via a URL.
... Then you point at this URL and you insert that, and it contains all of the resources, all of the assets.
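
A minimal sketch of that URL-based plug-in path, assuming the shared DSP code has been compiled or ported into an AudioWorkletProcessor; the module URL and the processor name "my-effect" are hypothetical.

    const ctx = new AudioContext();
    // Fetches and registers the worklet module; it carries its own resources.
    await ctx.audioWorklet.addModule("https://example.com/effects/my-effect.js");

    const effect = new AudioWorkletNode(ctx, "my-effect"); // hypothetical processor name
    const source = new OscillatorNode(ctx);
    source.connect(effect).connect(ctx.destination);
    source.start();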

Measuring the input and round-trip latency

Related GitHub issue: issue #31.

Andrew MacPherson: I work for Spotify, someone wants to record something while the audio plays back, or they want to record a new instrument and we have to sync that up for them, we have to know something about the audio latency.
... We wonder, is there anything missing spec-wise to enable us to get a full round-trip picture, to give you a good idea of the latency without a measurement?

Paul Adenot: Spec-wise, nothing is missing. There is no web browser reporting the input latency as far as I know.
... Even if that were the case, the more I look into it, the more I find that getting a reliable output latency figure is really, really hard.
... I constantly grab a new machine and find weird things happening after the callback, directly in the system or the kernel, that are not being reported by the system API; they should be, but they aren't. And that messes everything up.
... For now, it messes things up for us, but for a digital audio workstation it messes things up as well.
... The goal is to provide the number, so Chrome should have the output latency and we implement the input latency, yeah. Hopefully, you can then convince the users to figure out a way to configure the system so that it is reported accurately.
... That's not been my experience so far; I don't know how many laptops I have tried, it is not aligned. It is not the same API for input and output, and access to input devices has been separate from access to output devices for a long time now.

Hongchan Choi: I have a similar experience to what Paul just described basically. I have good news and bad news. For MediaStreamTrack, the latency in Chrome, I don't think it is working correctly. The value you get from the API is not really accurate last time I checked.
... The good news is that we just sent out an intent to ship outputLatency on the AudioContext in Chrome. We have a working prototype, but we still have to go through the launch review process to ship the functionality to you.

Andrew MacPherson: That sounds great.
... The reliability of the numbers could be a problem as Paul explained. For the most part, it is the main gap with native in the platform.

Paul Adenot: More professional users often know how to align things between the mic and the speakers, and it should be workable.
... For general usage, the problem is the accuracy. Frankly, we have the numbers internally and we are just not exposing them.

Ulf Hammarqvist: So just to add, I acknowledge it being really hard to get the numbers, so there are kind of two aspects to that. One is the detectability of it: you would initially need to know whether the numbers are good or not.
... It pushes the problem back. I understand it is very hard. And the other one is: what do you think you can do about it? I know there's a problem in the whole industry in a sense.
... How do you get things implemented behind the scenes, can you kind of put your weight behind certification programs or whatnot? I'm just dumping things here, but has anything like that ever been done?

Paul Adenot: In a simpler way, for Windows in particular we have ways to disable those system audio effects in the web browser, in Firefox, and I had it enabled for some time and the latency was much better and much more stable.
... Unfortunately, what happens is that some users require those effects to be on for the microphone to work correctly, and so we can't enable it for everybody. At this point I don't know what to do.
... Do we put it behind an option or something? I just didn't want to break everybody.

Seeking and variable bitrate

Chris Needham: So we have talked a bit around latencies and so on.
... Nigel, you have a question on a slightly different topic.

Nigel Megitt: Sort of related really. It is just thinking about the variable bitrate encoding and the need sometimes to seek to a specific moment in the decoded resource.
... I wonder if anyone else has hit this problem. I have hit this problem explicitly trying to seek in an MP3 file.
... It could happen in video as well, exactly the same thing. With the WebCodecs API, it is all about the decode, it is all about the input chunks rather than the decoded output, and I'm wondering if anybody has been thinking about that and how to get the precise location of the output samples when you're not really sure which input chunk they'll be in.

Chris Cunningham: I think it has been a long time since I worked on this issue.
... I remember in Chrome, seeing a bug about variable bitrate MP3s and I discovered that these MP3 files have a table of contents sometimes and it will tell you what byte offset corresponds to what timestamp.
... That may or may not be helpful depending on the content.
... From a WebCodecs point of view, if you can solve that problem using the table of contents or whatever external mechanism, what WebCodecs is left to do is just to timestamp the chunk accordingly.
... Even if they are variable bitrate, if you know the timestamp at the start, you should know the timestamp in the next packet and so on, the codec should honor that timestamp.
... Is it amidst the encoded audio data?

Nigel Megitt: Part of the problem is that approach, actually.
... If you want precision, the table of contents is actually too coarse. I have definitely seen that in implementations.
... They go vaguely close, based on assumptions maybe about the encoded duration and encoded size, and with the table of contents they don't get accurately to the correct place, and that causes difficulties.

Paul Adenot: It is a really, really hard problem without a container.
... We should be decoding the file and jumping in to the offset.
... Even for the media element itself, we do a binary search and that is just not good enough.
... Depending on the flexibility of the codec, it means that we can have associated metadata; I know, for example, if you have produced it yourself, there is a way.

Nigel Megitt: It seems like the answer is you use WebCodecs to decode the whole input until you're at the point you want and then seek in the decoded output.

Chris Cunningham: Note, you can quickly discard things when you see you haven't reached your point.
... To keep the memory footprint low, just decode and discard immediately until you find you have what you need; it is a lot more efficient.
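
A rough sketch of that decode-and-discard approach with the WebCodecs AudioDecoder; the codec configuration is illustrative, and the chunks variable stands for the output of whatever demuxing or framing code is used (see the (de-)muxing discussion earlier).

    declare const chunks: EncodedAudioChunk[]; // produced by your own demuxer/framer

    const seekTargetUs = 30_000_000; // e.g. seek to 30 s, in microseconds
    const kept: AudioData[] = [];

    const decoder = new AudioDecoder({
      output: (data) => {
        if (data.timestamp + data.duration <= seekTargetUs) {
          data.close();      // before the seek point: discard immediately
        } else {
          kept.push(data);   // at or after the seek point: keep for playback
        }
      },
      error: (e) => console.error(e),
    });

    decoder.configure({ codec: "mp3", sampleRate: 44100, numberOfChannels: 2 });
    for (const chunk of chunks) decoder.decode(chunk);
    await decoder.flush();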

Charles Van Winkle: I was going to add that I don't think what you're asking for is actually possible in the generic sense even with MP3 and definitely not for other audio codecs.
... I have tried that with the desktop APIs from Apple and Microsoft: you can set up their decoders and demuxers to call back for the actual file reads, such that they're calling you for the file reads, for the bitstream reads. And even with the APIs saying "don't read the entire file" or "do fast seeking", there are either bugs or implementation issues; they have to go back when you seek to a particular point, and so desktop applications, again, decode stuff all the way through, even when trying to do the byte offset calculations.
... MP3 is a streaming format, and so there can be arbitrary data in the middle of the stream. It's technically possible, and I have seen only a few such files in my career, but a file could have some giant ID3 markers in the middle; would that throw off the offset table?
... Also as a streaming format, the format of the file could change mid-way through. Usually that doesn't happen but again it's technically possible and so I think in the general case there is no way we can do this.
... I have not seen it done in the general case for desktop technologies so having it on the web doesn't bring extra magic for us.

Media synchronization

Related GitHub issue: issue #48.

Chris Needham: We're coming towards the end of our time for this meeting.
... There was one further topic on the agenda that we proposed which was around synchronization.
... This was referenced in Sacha's talk around synchronization of media playback with updates to the DOM.
... Sacha, if you're there, would you like to put your specific question about this?

Sacha Guddoy: Thank you for that. This is related to what people were saying about latency, Web Audio, other processing methods, and how UIs that reflect the state of media can sync to that latency.
... What may be interesting to explore is a mechanism for creating a hard synchronization between what's happening in the media and what's happening in the DOM. For example, we have some players that have a video player and an audio level display, so being able to measure that audio latency and have the display match up exactly with when the audio is getting to the user's speakers, I think would be really good. I don't know if that's essentially possible.

Paul Adenot: It is possible when you get the latency for this particular case.
... You need to artificially delay what you see.
... Let's say you have 50 milliseconds latency, you may need to draw what you measured to the screen 50 milliseconds later. It is an audio synchronization issue with the Web Audio API.
... Essentially this is it.
... I wrote a blog post about that (23 July 2019), looking into what different OSes do, what level of precision you can expect, what the APIs are and, of course, how to use them; should you query them once, or should you query them every time you want to draw something?
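
A minimal sketch of that idea, assuming an AnalyserNode-based level meter: each measurement is stamped with the time at which it will actually be audible (currentTime plus outputLatency) and only drawn once that time has been reached. The drawMeter function stands for whatever UI code is in use.

    const ctx = new AudioContext();
    const analyser = new AnalyserNode(ctx);
    // ... connect: source -> analyser -> ctx.destination ...

    const pending: { audibleAt: number; level: number }[] = [];
    const samples = new Float32Array(analyser.fftSize);

    function drawMeter(level: number): void {
      // hypothetical: update the level-meter UI here
    }

    function frame() {
      analyser.getFloatTimeDomainData(samples);
      let peak = 0;
      for (const s of samples) peak = Math.max(peak, Math.abs(s));

      // Stamp the measurement with the time at which it becomes audible.
      const latency = ctx.outputLatency || ctx.baseLatency;
      pending.push({ audibleAt: ctx.currentTime + latency, level: peak });

      // Draw only measurements whose audible time has passed.
      while (pending.length && pending[0].audibleAt <= ctx.currentTime) {
        drawMeter(pending.shift()!.level);
      }
      requestAnimationFrame(frame);
    }
    requestAnimationFrame(frame);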

Chris Needham: Does the output latency only apply to the AudioContext, or does it also apply to the audio or video element?

Paul Adenot: This is particular to the AudioContext. The media element's clock includes the audio latency offsets built in, because the purpose is not the same.
... If the currentTime is exactly 1.00 seconds, it is expected that you hear the sound at that same time, 1.00 seconds. The video frames have been offset for you; it's a much higher level construct.
... There is also the case of taking the output of the media element into an AudioContext. That's different, you have to offset at this point.

Chris Needham: Does that answer make sense or is there more to it?

Sacha Guddoy: It definitely makes sense. The API that you need to do this was just talked about and is being shipped by Chrome in the future, so it should be covered there.

Next session

Chris Needham: We have a couple of minutes left.
... Pierre-Anthony Lemieux, François Daoust, do we need to spend this time to wrap up the session? What do you think?

Pierre-Anthony Lemieux: I personally have a hard stop. I imagine others do. Maybe we should spend the last two minutes thanking folks and talking about the next sessions, and we can carry over topics to follow-up sessions if that makes sense.
... I think we have done a good job going over the questions we had.

Chris Needham: I think so. We have three remaining questions and perhaps we can carry those through to the next session.

Pierre-Anthony Lemieux: When is the next session, maybe you can remind everybody?

Chris Needham: That will be tomorrow at 23:00 UTC. The focus is WebRTC, Web Assembly and file system integration. We hope to have some of the relevant browser experts joining us for that.
... My apologies if we didn't get to your question today; we will make a note of those and we'll cover them in the third session. That's on Friday.

François Daoust: Last session is about trying to agree on next possible steps for the issues. Take note of the discussions that we had today, take notes of where you would like things to go and we'll try to get back to them during the third session on Friday.

Chris Needham: Yes. The final thing I would say on this, it is that we have captured all of the questions in GitHub, so please do take a look at the GitHub repo. Your inputs, on any of these particular topics would be very welcome.
... All of the responses that go there become part of the overall workshop proceedings and so even if we don't get to cover it in the live session it is great that we have a record of the discussions that also are happening in GitHub.
... With that, we're out of time for today.
... I would like to thank you all for joining. It has been a really really good discussion, and as we said at the outset, it is great to bring the different communities, the production community and the web community together to look at this. We look forward to seeing you all again tomorrow.

Pierre-Anthony Lemieux: Thank you very much for the candid discussions and looking forward to continuing it in the second and third session.

Sponsor

Adobe