W3C

– DRAFT –
WebRTC April 2022 Virtual Interim

26 April 2022

Attendees

Present
Anssi, Carine, ChrisCunningham, Dom, Elad, Florent, Harald, Jan-Ivar, MichaelSeydl, Ningxin, PatrickRockhill, PhilippHancke, Sergio, TimP, Youenn
Regrets
-
Chair
Bernard, Harald, Jan-Ivar
Scribe
dom, youenn

Meeting minutes

Recording: https://www.youtube.com/watch?v=qSlXLqouxCs

Slideset: https://lists.w3.org/Archives/Public/www-archive/2022Apr/att-0004/WEBRTCWG-2022-04-26.pdf

WebNN Integration with real-time video processing 🎞︎

[Slide 10]

[Slide 11]

[Slide 12]

[Slide 13]

[Slide 15]

ningxin: slide 15 shows the high-level design of the background blur video pipeline

Two implementations: WebGL and WebGPU/WebNN.

Textures are uploaded to the GPU in both cases.

The last shader takes 3 inputs: the original image, the blurred image, and the computed segmentation map.

Description of the perf issues, in particular CPU usage and GC.
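
A rough sketch of the per-frame loop slide 15 describes (not the prototype's actual code): it assumes Chromium's mediacapture-transform API, and runSegmentation / blendWithBlur are hypothetical placeholders for the WebNN and WebGPU stages.

```js
// Sketch of the background-blur loop: capture -> segment -> blur -> blend.
// runSegmentation() and blendWithBlur() are hypothetical placeholders.
const [track] = (await navigator.mediaDevices.getUserMedia({ video: true }))
  .getVideoTracks();

const processor = new MediaStreamTrackProcessor({ track });
const generator = new MediaStreamTrackGenerator({ kind: 'video' });

const transformer = new TransformStream({
  async transform(frame, controller) {
    // The frame is uploaded to the GPU in both the WebGL and WebGPU/WebNN
    // variants, the segmentation model produces a foreground map, and the
    // final shader blends the original and blurred images using that map.
    const mask = await runSegmentation(frame);        // hypothetical helper
    const output = await blendWithBlur(frame, mask);  // hypothetical helper
    frame.close(); // release the input frame promptly
    controller.enqueue(output);
  },
});

processor.readable.pipeThrough(transformer).pipeTo(generator.writable);
const processedStream = new MediaStream([generator]);
```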

Bernard: is there a copy on the output at offscreencanvas level?

Ningxin: not sure.

Tim: is the perf acceptable? or do we need to make massive improvements?

ningxin: we need to measure battery impact

dom: we are doing this prototype to evaluate what HW acceleration can bring us, and to identify potential roadblocks when doing video processing on media capture
… for instance, color conversion or pixel formats.

youenn: looking at 20% CPU on GC - can that be fixed by implementations, or is it an architectural issue with having lots of objects created per frame?
… on native, there is usually a buffer pool to help with that
… does that need to be surfaced to the JS, or can that be dealt with solely by the UA?

ningxin: GPUBuffers are created beforehand. Some objects, like textures, are created for every frame.

There are ways to avoid many object allocations at the JS level; UA optimisations might also help.
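
A minimal sketch of the kind of JS-level reuse being described (illustrative only; the buffer size and pipeline details are assumptions):

```js
// Sketch of the JS-level reuse ningxin describes: scratch buffers are
// allocated once and refilled every frame, and frames are closed promptly
// instead of waiting for GC. Buffer size is illustrative.
const pixelBuffer = new Uint8Array(1280 * 720 * 4); // reused across frames

async function handleFrame(frame /* a WebCodecs VideoFrame */) {
  // VideoFrame.copyTo() fills an existing buffer, avoiding a fresh
  // allocation per frame.
  await frame.copyTo(pixelBuffer);
  frame.close(); // release the frame's backing memory immediately

  // ... feed pixelBuffer to the segmentation step, reusing the GPUBuffers
  // and textures that were created up front where the API allows it ...
}
```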

dom: what are the next steps for this project?

ningxin: 1. Enable the WebGPU backend.

2. Try the new APIs that allow importing frames as GPU textures and see whether that improves efficiency.

3. The VideoFrame GC improvement PR: we will try it out once it is merged in Chrome.

youenn: re CPU efficiency - this moves between the main thread and a worker thread, which may have a small perf impact
… doing everything in the worker might be helpful once that's possible
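
For illustration, one way to keep the per-frame work off the main thread with today's APIs (a sketch: blur-worker.js is a hypothetical worker script, and the code relies on transferable streams plus Chromium's mediacapture-transform):

```js
// Sketch: run the whole per-frame loop in a worker by transferring the
// mediacapture-transform streams (ReadableStream/WritableStream are
// transferable). "blur-worker.js" is a hypothetical worker script.
const [track] = (await navigator.mediaDevices.getUserMedia({ video: true }))
  .getVideoTracks();
const processor = new MediaStreamTrackProcessor({ track });
const generator = new MediaStreamTrackGenerator({ kind: 'video' });

const worker = new Worker('blur-worker.js');
worker.postMessage(
  { readable: processor.readable, writable: generator.writable },
  [processor.readable, generator.writable]
);
// Inside blur-worker.js: readable.pipeThrough(transform).pipeTo(writable),
// so frames never bounce back to the main thread.
```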

WebRTC Extensions 🎞︎

[Slide 19]

Issue #95 🎞︎

Media Capabilities issue 185

[Slide 20]

[Slide 21]

[Slide 22]

[Slide 23]

[Slide 24]

Bernard: the question to the WG is: is it a goal for Media Capabilities to deprecate getCapabilities?

youenn: my understanding is that media capabilities is really about audio/video capabilities
… so it doesn't make sense to expose e.g. CN there
… they should stay in WebRTC getCapabilities
… getCapabilities() being sync is problematic; that's less of an issue for software capabilities such as CN
… so deprecating getCapabilities fully is not a goal, but partially, yes

Florent: +1 on the approach; the usability of the resulting split is a concern

chris: seems fine to use that split; do we want to return RTC codec info from media capabilities?
… if so, please take a look at https://github.com/w3c/media-capabilities/issues/185
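
For context, the kind of query being discussed in media-capabilities issue 185 might look roughly like this (a sketch assuming the "webrtc" configuration type proposed there, which had not shipped at the time):

```js
// Sketch: a Media Capabilities query for an RTC decode configuration,
// assuming the "webrtc" type discussed in media-capabilities issue 185.
const info = await navigator.mediaCapabilities.decodingInfo({
  type: 'webrtc',
  video: {
    contentType: 'video/VP9',
    width: 1280,
    height: 720,
    bitrate: 1_000_000,
    framerate: 30,
  },
});
console.log(info.supported, info.smooth, info.powerEfficient);
```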

youenn: +1 on disambiguating the outcome of this situation
… listing all codecs in just one call is a non-goal
… an SFU is typically only interested in a few codecs
… for P2P, setCodecPreferences is probably not needed in the first place - you can deal with a generic codec negotiation

jib: it would be good to clarify whether we want to deprecate "real" codecs from getCapabilities? that sounds like a good long-term goal to me

harald: I worry that RTX/RED/FEC info needs to be available somewhere
… getCapabilities has known problems and would be the only way to get it
… changing getCapabilities is actually harder than deprecating it
… in the long run, it's best to deprecate getCapabilities and replace it with a better dedicated API

Florent: two different scenarios for setCodecPreferences: talking with an SFU, in which case you can make specific codec queries; in a P2P scenario, if you can't enumerate all the codecs, you won't be able to call setCodecPreferences
… this would require hardcoding a list of codecs
… is there a way to make getCapabilities evolve into a shape that would satisfy everyone?
… getCapabilities+setCodecPreferences has a lot of current usage, it will be hard to deprecate
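
For reference, the current pattern Florent refers to looks roughly like this (existing API; the VP8-first filtering is just an example):

```js
// Enumerate codecs synchronously with getCapabilities(), reorder, then pin
// the preference with setCodecPreferences() before negotiation.
const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver('video');

const { codecs } = RTCRtpReceiver.getCapabilities('video');
// Put VP8 first but keep the remaining entries (including RTX/RED/FEC) so
// retransmission and forward error correction still get negotiated.
const preferred = [
  ...codecs.filter(c => c.mimeType === 'video/VP8'),
  ...codecs.filter(c => c.mimeType !== 'video/VP8'),
];
transceiver.setCodecPreferences(preferred);
```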

Issue #100 🎞︎

[Slide 25]

[Slide 27]

youenn: might be fine, but I worry about the defaults - would they be the same across browsers?
… there are current codecs that are defaults, but that may need to evolve over time
… this could create Web compat issues

Sergio: some of the codecs are receive-only
… the list would be based on common sense, but I don't have a strong opinion

youenn: my worry is about P2P - if the defaults aren't the same across UAs, the negotiation will fail

sergio: my suggestion was to use defaults in the offer, and adapt the answer based on the offer

harald: two interfaces are needed: the list of codecs you are currently willing to offer, and the set of codecs you can offer
… the 1st one might be getCapabilities, the proposal on the slide for the 2nd
… in terms of interop, MTI codecs should be the safety net, and they should be in the mandatory-to-offer
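
Purely as illustration of the two interfaces Harald describes (every new name below is hypothetical; nothing here is specified):

```js
// Hypothetical sketch only: neither setOfferedCodecs() nor getOfferedCodecs()
// exists in any spec.
const transceiver = new RTCPeerConnection().addTransceiver('video');

// 1) Everything the UA can offer (today roughly
//    RTCRtpSender/RTCRtpReceiver.getCapabilities()):
const all = RTCRtpSender.getCapabilities('video').codecs;

// 2) What this transceiver will actually put in its next offer. MTI codecs
//    (e.g. VP8 and H.264 Constrained Baseline for video) would remain
//    mandatory-to-offer as the interop safety net.
transceiver.setOfferedCodecs(all.filter(c => c.mimeType !== 'video/H265'));
const willOffer = transceiver.getOfferedCodecs();
```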

florent: the proposal seems to have a lot of overlap with setCodecPreferences / getParameters - could we improve these instead of coming up with a new API?

[Philipp supports this on the chat]

Sergio: would be fine; I started from the RTP header extensions, maybe that should apply there?

florent: the difference is that there is already an API to set codec preferences

sergio: but header extensions could be added there too?

Bernard: let's continue the discussion in the issue
… or work on a matching PR

https://github.com/w3c/mediacapture-extensions/issues/47 Voice Isolation Constraint 🎞︎

[Slide 41]

[Slide 42]

Resolution for issue 95: mark issue as ready for PR

[Slide 43]

[Slide 44]

[Slide 45]

youenn: it makes sense; reasonable to ignore `noiseSuppression`
… there is also `echoCancellation` in the audio pipeline
… does it make sense to do `echoCancellation` when this is set?

harald: I think it's mostly orthogonal

youenn: so `echoCancelation: false` is compatible with `voiceIsolation: true`
… it may be challenging for some implementations to support these combinations

jan-ivar: I like this too; what should the default be? that may bring concerns

harald: we can discuss this in the PR
… conservatively, the default should be the current behavior (false)

dom: instead of a boolean, we could use strings for extra flexibility.
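
As a rough illustration of the constraint under discussion (issue #47; the name and boolean shape are the current proposal, not a shipped API):

```js
// Sketch only: voiceIsolation is the constraint proposed in
// mediacapture-extensions issue #47; it is not standardized yet.
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    voiceIsolation: true,    // proposed: suppress everything but the voice
    echoCancellation: false, // per the discussion, the two are orthogonal
  },
});
const [track] = stream.getAudioTracks();
console.log(track.getSettings()); // a supporting UA would report the value
```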

RESOLUTION: mark voiceIsolation issue #47 as ready for PR

support for contentHint in Capture Handle 🎞︎

[Slide 48]

[Slide 49]

[Slide 50]

[Slide 51]

youenn: setting the track hint is unnecessary - if the capturee sets the hint on its side, the UA knows that the track being captured is text - there is no need to transmit it to the capturer
… except maybe if WebCodecs is in the picture
… having the UA take care of this seems preferable

elad: the suggestion would be that the captured content self-declares its type and the UA uses it?
… but that removes the liberty of the capturer to decide whether to use the hint or not
… which could be based on e.g. an allowlist
… autodetection by the UA would have its own limitations

bernard: re the WebCodecs case - contentHint is not automatically consumed by WebCodecs, it's up to the app to apply it as a codec setting
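
For context, the capturer-side pattern under discussion looks roughly like this; contentHint on MediaStreamTrack is an existing API, and (per Bernard's point) any mapping onto WebCodecs encoder settings is up to the app, so the tuning below is purely illustrative:

```js
// Capturer side: tag the captured track so downstream consumers can tune
// for legibility. contentHint on MediaStreamTrack is an existing API.
const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
const [track] = stream.getVideoTracks();
track.contentHint = 'text';

// WebCodecs does not consume the hint automatically; the app has to
// translate it into encoder settings itself. The mapping is illustrative.
function handleChunk(chunk, metadata) {
  // e.g. packetize and send the encoded chunk
}
const encoder = new VideoEncoder({ output: handleChunk, error: console.error });
encoder.configure({
  codec: 'vp8',
  width: 1920,
  height: 1080,
  bitrate: track.contentHint === 'text' ? 2_500_000 : 1_000_000,
});
```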

jib: I agree with youenn that the UA is in a good place to short-circuit the capturer part
… the proposal could be useful for the capturee side
… exposing further metadata to the controller might be an interesting addition to my CaptureController proposal

youenn: it could be exposed at the VideoFrame level

jib: I see agreement on the need, not yet on the API shape

WebRTC Extensions 🎞︎

Avoid user-confusion by avoiding offering undesired audio sources 🎞︎

[Slide 54]

[Slide 55]

[Slide 56]

Tim: is this only applicable for echo management?

elad: it could be that an application is interested in recording a specific tab, no more than that.

Tim: this use case does not seem addressed: identifying the desired tab would be needed.

Elad: some VC applications usually do not want to capture system audio.

Jan Ivar: supportive; how about reusing the displaySurface constraint here?

Elad: Might work for me.

Jan Ivar: I would like to remove monitor from here.

dom: if we do not include monitor here, audio: true might capture system audio. But applications would not be able to explicitly ask for system audio.

dom: displaySurface would be a strange name for audio.

youenn: let's enumerate the different approaches: avoidSystemAudio, displaySurface, sources

youenn: scope is unclear, we need to clarify this before going to PR.

youenn: different properties allow feature detection of what kind of recording the UA can do

elad: my focus is only on limiting access to system audio, but I also think flexibility is helpful

timp: back to my echo cancellation point - the constraint could be linked to whether the source can be echo-cancelled

Harald: source being echo cancellable is a second concern. Biggest point is avoiding system audio.

Tim: as well as window audio.

harald: echoCancellation is a secondary concern - capturing system audio could disclose info from a 3rd party
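
To make the options concrete, the shapes enumerated above might look roughly like this (all hypothetical at this point; none of these audio options were specified at the time of the meeting):

```js
// Hypothetical shapes only: these are the approaches being enumerated,
// none of them specified.

// (a) a dedicated opt-out for system audio:
const optionA = { video: true, audio: { avoidSystemAudio: true } };

// (b) reusing displaySurface semantics for audio (Jan-Ivar's suggestion),
//     e.g. restricting offered audio to tab ("browser") surfaces:
const optionB = { video: true, audio: { displaySurface: ['browser'] } };

// (c) an explicit list of acceptable audio sources:
const optionC = { video: true, audio: { sources: ['tab', 'window'] } };

// Whichever shape is picked, usage stays the same:
const stream = await navigator.mediaDevices.getDisplayMedia(optionA);
```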

RESOLUTION: continue discussions on GitHub.

Region Capture 🎞︎

[Slide 59]

youenn: #11 is an issue on the shape of the CropTarget API
… given current chrome implementation work, feels it's useful to converge on the API shape

[Slide 60]

youenn: do we want to attach the API to element or to MediaDevices?
… element feels like a better path

jib: +1

elad: I prefer mediaDevices given its linkage to screen capture

youenn: cropTarget is linked to MediaStreamTrack, not mediaDevices
… and it's really tied to an element

elad: it can be used through an object you get from getDisplayMedia

youenn: but with a detached mediaDevices, you can't reject the promise

dom: prefer element option.

youenn: next question is attribute vs method
… slight pref for attribute, but no strong feeling
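
For reference, the two shapes under discussion might look roughly like this (hedged: the MediaDevices method mirrors the Chrome prototype of the time, the element attribute is the alternative being argued for, and cropTo() is the Region Capture call the minted target feeds into):

```js
// Sketch of the candidate shapes for minting a CropTarget (issue #11).
const element = document.getElementById('capture-area'); // hypothetical id

// Option A: method on MediaDevices (roughly the Chrome prototype shape):
const targetA = await navigator.mediaDevices.produceCropTarget(element);

// Option B: attribute (or method) on the element itself:
const targetB = element.cropTarget; // hypothetical attribute form

// Either way, the resulting transferable token is applied to a
// self-capture track obtained from getDisplayMedia:
const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
const [track] = stream.getVideoTracks();
await track.cropTo(targetA);
```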

elad: there is a cost to minting a crop target - we mark the element in the rendering pipeline in specific ways that we shouldn't abuse

youenn: I thought you were going to use a lazy approach to reduce that cost

elad: lazy tagging might help, but this needs more thinking

jib: +1 to attribute
… developer value trumps implementer value

elad: I don't think it matters much to developers in the first place

harald: I disagree with messing with the element interface, and with hiding the fact that the operation has a cost
… also async (promises) may be needed for some implementations
… let's not hide the reality of the situation

jib: the cost seems to be Chrome-specific
… the real goal of this API is a transferable reference

youenn: +1
… other APIs in the past have re-used the element interface, have made similar decisions on methods / attributes, async vs sync
… we should follow existing implemented patterns

dom: is there any other API that may use this transferable reference?

youenn: that's something I bring up in the issue

elad: this may create unsafe usage for this well-defined target

jan-ivar: this could be evaluated

hta: but this shouldn't block progress on the specific narrow goal we have

youenn: my focus is aligning with current API patterns for this API

elad: the TAG will chime in; but if they don't give a clear specific suggestion
… we could move forward with the current design, which can be polyfilled

Summary of resolutions

  1. mark voiceIsolation issue #47 as ready for PR
  2. continue discussions on GitHub.
Minutes manually created (not a transcript), formatted by scribe.perl version repo-links-187 (Sat Jan 8 20:22:22 2022 UTC).