Meeting minutes
MSTP for Audio / AudioTrackGenerator
Slideset: https://
Guido: Proposal to add audio support in the mediacapture-transform spec.
… The spec allows working with raw media and MediaStreamTracks.
… Given a track, access to raw media.
… Given raw media, produce a track.
… Raw media uses interfaces from WebCodecs, with consensus to use VideoFrame.
<dom> MediaCapture Transform repo
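For context, a minimal sketch of the model being described, using the spec's MediaStreamTrackProcessor and VideoTrackGenerator in a worker (availability varies by browser; Chrome's shipped variant uses a main-thread MediaStreamTrackGenerator instead):

    // In a worker: given a track, read raw VideoFrames; given frames, produce a track.
    const processor = new MediaStreamTrackProcessor({ track }); // track -> raw media
    const generator = new VideoTrackGenerator();                // raw media -> track
    processor.readable
      .pipeThrough(new TransformStream({
        transform(frame, controller) {
          // ... inspect or modify the VideoFrame here ...
          controller.enqueue(frame);
        }
      }))
      .pipeTo(generator.writable);
    // generator.track is the combined, transformed MediaStreamTrack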
Guido: No consensus yet on AudioData for audio, but this is shipped in Chrome and I'm going to report on our experience there.
Guido: VideoTrackSource allows producing a track, given frames.
Guido: Both can be combined to produce a transformed track.
Guido: My proposal is to try to achieve consensus on audio.
… We don't have consensus yet because it's kind of redundant with AudioWorklet, which allows you to do audio processing.
… Audio processing sometimes requires a realtime thread all the way through; otherwise you will see glitches.
Guido: There are good use cases where the transform thread is a good fit, for example if buffering is acceptable.
… if there is no audio playback.
… If the processing has high variance, or if you want the processing to use very little CPU.
… In the worker, if you use buffering, you may have tolerance for variance.
… You may want to combine audio and video processing in the same worker, which you can do with WebCodecs.
… If you directly use the data that comes from AudioData, you can directly access the input metadata such as capture timestamps.
… It also saves you a realtime thread, which is more expensive.
… Some examples of applications include audio analysis, encoding (leaving the main thread untouched), or transforms with no playback, for example if you want to send the result to the network.
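A sketch of the analysis-style use described above (Processor with no generator and no playback), assuming the Chrome behavior where the Processor emits WebCodecs AudioData chunks; audioTrack is an assumed input:

    // Inside an async function in a worker.
    const processor = new MediaStreamTrackProcessor({ track: audioTrack });
    const reader = processor.readable.getReader();
    for (;;) {
      const { value: audioData, done } = await reader.read();
      if (done) break;
      // Input metadata (e.g. the capture timestamp) rides along on each chunk.
      const pcm = new Float32Array(audioData.numberOfFrames);
      audioData.copyTo(pcm, { planeIndex: 0, format: 'f32-planar' });
      // ... compute metrics from pcm, tagged with audioData.timestamp ...
      audioData.close();
    }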
Guido: Comparing usage of MediaStreamTrackProcessor, the generator, and AudioWorklet, in general (not audio specific).
… Both are popular APIs.
Guido: If you look at mediacapture-transform usage, it's about 50% of AudioWorklet usage; audio is ~27% of all Processor usage. Widely used, and mainly used without connecting to a generator.
… E.g., to compute metrics about the audio.
… Generator usage for audio is ~2.6%, less used than Processor. Main use case is transformations that output only to the network.
… For video, the generator is used 47% more than the processor. For audio, it's the opposite, by hundreds of percent.
Guido: Processor for Audio is widely used. Generator has more limited use cases. It's not good for playback. The use cases are more restricted, but experience suggests that people understand that.
… My proposal is to add it to the spec, with guidance on when to use it and when not to use it.
Youenn: I would see a use case for live recording. Sending data through WebSockets, for example. In many cases, they use MediaRecorder, but not always.
… It's used to generate PCM, which works but is a bit of an abuse.
… The other is that avoiding realtime threads would help.
… It would be interesting to do measurements.
… For generator, I feel that the only use case is doing processing before encoding the data in peer connections.
… You have your microphone, doing some processing, and then sending over.
… If you're using the generator elsewhere, that's not great.
Markus: Recording audio is also a use case for generator.
… Doing realtime processing on the audio.
Youenn: If you have a peer connection approach…
… I will concentrate on a peer connection approach for the generator.
… We're only talking about making things more CPU-friendly. We have some time.
Guido: It's a niche use case, but it's quite good for it.
Youenn: We should put priority on the Processor first.
jib: The whole thing can easily be shimmed over AudioWorklet.
… The interop gap is more problematic.
… We don't see a lot of traction as a result.
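For reference, the kind of shim being referred to is roughly the following sketch (illustrative, not any particular shipped polyfill): an AudioWorkletProcessor that forwards each 128-frame render quantum off the realtime thread via its MessagePort.

    // forwarder.js (worklet module)
    class Forwarder extends AudioWorkletProcessor {
      process(inputs) {
        const input = inputs[0];
        if (input.length > 0)
          this.port.postMessage(input.map(ch => ch.slice())); // copy; the engine reuses buffers
        return true; // keep the processor alive
      }
    }
    registerProcessor('forwarder', Forwarder);

    // Wiring (inside an async function); track is the audio MediaStreamTrack.
    const ctx = new AudioContext();
    await ctx.audioWorklet.addModule('forwarder.js');
    const src = ctx.createMediaStreamSource(new MediaStream([track]));
    const node = new AudioWorkletNode(ctx, 'forwarder');
    src.connect(node);
    node.port.onmessage = ({ data: channels }) => { /* PCM, ~128 frames per chunk */ };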
Markus: I'm not sure that you can observe timestamps in the right way.
Youenn: That's something that we can look at in the AudioWorklet context.
Markus: But then it goes through a high-priority thread.
Youenn: For the use cases considered, that seems reasonable.
Youenn: One issue with Generator is usage in contexts when it should not be used, and one implementation may be optimized, leading to applications that ship because they work well in one particular browser, but then not as well in others.
jib: Even once we get implementations, I worry that it might negatively impact progress on web types.
Guido: We saw that people were smart enough to use the processor and not the generator, so people understand the constraints.
… The argument that we shouldn't support a very good use case because the API can also be used for use cases where it's less good is weak.
Youenn: I'm more thinking about it in terms of priorities.
Guido: Re. spec compliance, as long as we fix the issues, we're happy to align.
dom: We want to hear feedback from audio folks.
Paul: As mentioned by jib, we shipped a shim, it's easy and it works. There is no need to do something more specific.
… On modern OSes, the energy consumption gap is between 0 and 1 threads, and there is almost no gap between 1 and 2.
… Careful measurements have shown that improvements exist between 0 and 1, not above.
… You do have a realtime thread.
Guido: If you want to do processing that's doing something such as encoding, why would I want to spin up a realtime thread?
Paul: If you have some audio running, you already have a realtime thread. If you have an audio context, you don't need another one.
dom: The point is not only about performance, more about audio and video processing being combined and requiring separate contexts to run as a result.
Youenn: In Safari, we have a limit on the number of audio contexts that you can create.
… If we can reuse some of them, that's good.
… This proposal for Processor would allow us to reuse, which seems good and measurable.
Markus: The type of processing we're considering here is WebGPU, WebNN.
Paul: In our polyfill, we send the data zero copy, that's fine.
mjwilson: [refers to a proposal worth considering as part of this discussion]
<mjwilson> https://
Youenn: Having performance measurements would help convince people.
… The Web Audio proposal would be worth hearing.
Decoder failure API
Steve: Edge team, talking about adding decoder fallback and failure events to WebRTC.
… We're continuing to operate on feedback that we received since last year.
Steve: Game streaming platforms rely on hardware decoding, but they don't have access to mic/cameras, so they don't know when a decoder falls back from hardware to software.
… We want applications to know when this happens and help them analyze what happens.
Steve: We don't want to expose any vendor-specific hardware information.
Steve: Current proposal is two new events on the RTCRtpReceiver.
Steve: Here's the proposal for the decoder state change event.
… The codecParameters and powerEfficientDecoder are nullable to allow implementations to determine whether to report them.
Steve: You can send telemetry for example.
Steve: The error event could be useful if you have a hardware decoder that cannot fall back to software.
… Questions?
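For concreteness, consuming the two proposed events might look like the sketch below; the event and field names are illustrative readings of the proposal, not final IDL, and sendTelemetry is an app-defined stand-in.

    receiver.addEventListener('decoderstatechange', (e) => {
      // codecParameters and powerEfficientDecoder are nullable, so an
      // implementation can choose whether to report them.
      sendTelemetry({
        powerEfficient: e.powerEfficientDecoder,
        codec: e.codecParameters?.mimeType,
      });
    });
    receiver.addEventListener('decodererror', (e) => {
      console.warn('Decoder failure:', e.errorText);
    });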
Youenn: I talked with our privacy folks. For the nullable properties, it should be pretty easy to get PING's attention through their repository.
… That would be a good next step to do for this proposal.
jib: Some nits on the API. First, why on the peer connection rather than the transceiver?
… The example is not entirely clear when the event fires.
Steve: The proposal is that there will be an initial event when it starts decoding.
jib: It may be worth exposing another attribute then.
jib: Agree with the need to do a privacy review.
hta1: Since this is decoder, the only place it belongs is on the receiver.
Steve: That's what we did!
hta1: You did it right.
Youenn: When I'm looking at errorText, I'm wondering whether it will be platform-specific. How do you make use of that error code or error text? Are you trying to support "should I retry?" or is it more for stats purposes?
Steve: That's a good question. We copied this from the WebRTC error.
… It's mainly for error reporting and troubleshooting.
… There may be some more errors where this could be interesting.
Nishita: Another scenario is telemetry for the app. Codec negotiation, or showing the user some report.
Youenn: For telemetry, there's a proposal from Guido to use the Reporting API. That may be a useful place for this as well.
<dom> Discussion on Reporting API for WebRTC this morning
Youenn: What can I do with it for my users might be the right question for this proposal.
cpn: Wondering whether PING already reviewed RTCRtpCodecParameters? I'm wondering whether we're opening the discussion to a much broader scope.
hta1: We switched to Media Capabilities as a result of previous discussions. What PING thinks about it right now, I do not know.
Media Playback Quality
Markus: Wanted to collect feedback on a proposal for improved metrics for temporal video quality.
Markus: To recap, the video conferencing problem is: you capture a bunch of frames with timestamps, and the job of the receiver is to present them accordingly.
Markus: One way to measure that is through the framerate. It doesn't use timestamps at all.
… You don't see glitches with this.
… Good for understanding video. For screen sharing, it's not so good anymore because it's variable framerate.
… Accuracy is very good.
Markus: More complicated territory.
… The harmonic mean, and the way it averages across frame durations.
… This is unfortunately not available for video tiles, because we don't expose them.
… What's good is that since the weights are the frame durations themselves, lengthy durations will be over-represented.
… We will see the glitches more.
… But not so good for accuracy.
… WebRTC does not know when the frames will be rendered.
… It also cannot see frame drops.
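One reading of the description above (not a formula from the slides): with per-frame durations d_i, a harmonic mean of the instantaneous rates 1/d_i weighted by the durations themselves reduces to

    \mathrm{harmonicFps} = \frac{\sum_i w_i}{\sum_i w_i d_i} = \frac{\sum_i d_i}{\sum_i d_i^2}, \qquad w_i = d_i

so a long freeze contributes quadratically to the denominator, which is why glitches show up in this metric.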
Markus: Proposal is to extend getVideoPlaybackQuality().
… Harmonic FPS.
… And then we have the info for local video tiles.
… Great accuracy.
… We can still not measure frame drops.
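Today's API surface, for comparison; the proposed fields would be additions to the VideoPlaybackQuality object (the slide contents were not captured in the minutes, so the new field name below is hypothetical):

    // Existing HTMLVideoElement API: a snapshot of playback quality counters.
    const q = video.getVideoPlaybackQuality();
    console.log(q.creationTime, q.totalVideoFrames, q.droppedVideoFrames);
    // A proposed addition might surface the harmonic FPS here, e.g.:
    // console.log(q.harmonicFps); // hypothetical field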
Markus: We also propose to add another measure which is reproduction jitter RMSE.
… The difference should be 0, but it fluctuates.
… Root mean square. With that, you can measure how accurate the reproduction is.
… It can actually miss frame drops.
Markus: And then add a reproduction jitter metric.
… Accuracy for all current codecs is 90 kHz.
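One plausible formalization of the metric as described, with render times r_i and capture timestamps c_i (carried on the 90 kHz RTP clock for current codecs), comparing successive inter-frame intervals:

    \mathrm{reproductionJitterRMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl((r_i - r_{i-1}) - (c_i - c_{i-1})\bigr)^2}

Each term should be 0 under perfect reproduction but fluctuates in practice; as noted, a dropped frame can stretch both intervals equally, so the metric can miss frame drops.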
Markus: Here's the idea, with a few new fields.
[showing vibe coded demo]
Markus: I'm doing video processing here in terms of burning the CPU.
… I can then play with it and look at the graphs.
… [going through different demo settings display harmonic FPS, WebRTC harmonic FPS]
… [and jitter reproduction metric]
cpn: Last TPAC, we talked about Video Playback Quality API. Your proposal just adds more things, right?
Markus: Yes.
cpn: Do the metrics belong on this or on getStats()?
Markus: I don't think getStats() is the right place for this.
… E.g., YouTube playback quality.
Youenn: Using RTP timestamps, I guess? What if there are none?
Markus: Then I don't [missed]
cpn: Are there comments on the definitions of the stats?
dom: Are these standard metrics for some definition of standard?
… Are we paving the cowpath or are we trying something new?
Markus: A bit of both.
Youenn: Some A/V stats may be computed by native apps.
Markus: Only Chrome does it, I think.
… There is a performance hit as well, as you need every frame.
… In getStats(), there's "jitter squared duration" or something.
<dom> https://
jib: I don't think provisional stats have gone through a standardization process.
dom: totalSquaredInterFrameDelay, it is.
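For reference, a sketch of deriving an inter-frame delay spread from those inbound-rtp stats; totalSquaredInterFrameDelay is provisional, pc is an assumed RTCPeerConnection, and the sample count assumes one delay sample per decoded frame after the first:

    const report = await pc.getStats(); // inside an async function
    for (const s of report.values()) {
      if (s.type === 'inbound-rtp' && s.kind === 'video' && s.framesDecoded > 1) {
        const n = s.framesDecoded - 1;
        const mean = s.totalInterFrameDelay / n;
        const variance = s.totalSquaredInterFrameDelay / n - mean * mean;
        console.log('inter-frame delay stddev (s):', Math.sqrt(Math.max(variance, 0)));
      }
    }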
cpn: There's a question around the Media Playback Quality spec as it is.
… A few years ago, we agreed to hand it over to the WHATWG.
… What you're proposing now is to make it a more lively document.
… And the question is whether the Media WG should handle it.
… That requires discussion with the WHATWG.
Francois: The charter suggests that the Media WG will not do any actual work on the spec other than to transition it.
cpn: Let's discuss as part of the Media Playback Quality repository.