W3C

– DRAFT –
WebRTC April 23 2024 meeting

23 April 2024

Attendees

Present
Bernard, Carine, Dom, Eero, Elad, Florent, FrederikSolenberg, Guido, Harald, Jan-Ivar, Riju, Sameer, SunShin, TimP, TonyHerre, Tove
Regrets
-
Chair
Bernard, HTA, Jan-Ivar
Scribe
dom

Meeting minutes

Slideset: https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf

Custom Codecs

[Slide 10]

[Slide 11]

[Slide 12]

[Slide 13]

Harald: this requires the ability to set the mime type of a frame, which can be done in two ways: with a frame constructor (merged in #233), or via setMetadata (#202), which has stalled
… setMetadata feels like a better fit from my perspective
… but at least the constructor allows for this, and so we may not need two different ways
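
The two approaches could be sketched as follows (a hypothetical illustration only: the frame/metadata shapes and names here are assumptions, not the merged #233 or proposed #202 API):

```javascript
// Option 1: mutate the frame in place via setMetadata (#202-style).
function retagViaSetMetadata(frame, mimeType) {
  const metadata = frame.getMetadata();
  metadata.mimeType = mimeType; // e.g. a custom codec's mime type
  frame.setMetadata(metadata);  // no copy of the frame data
  return frame;
}

// Option 2: copy-construct a new frame carrying updated metadata
// (#233-style). FrameCtor stands in for the encoded-frame constructor.
function retagViaConstructor(FrameCtor, frame, mimeType) {
  const metadata = frame.getMetadata();
  metadata.mimeType = mimeType;
  return new FrameCtor(frame, { metadata }); // copies the frame
}
```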

jan-ivar: I'm supportive of the API shape; on the question of constructor vs setMetadata - it's a bit complicated
… because these encoded frames are mutable, unlike webcodecs
… that's a bit unfortunate but it makes sense in the context of encryption
… in webcodecs, frames are immutable, which would require a copy-constructor step

Harald: with immutable data, we would have to have a copy constructor with a separate argument for the data itself

Jan-Ivar: iow, I don't have a clear answer to your question

bernard: also supportive of this; setMetadata should be fine here, we don't have the same constraints we had in WebCodecs
… for WebCodecs, we didn't want data to change while an operation is in progress
… here setMetadata should be safe
… it would be nice to allow for this without making a copy
… For some codecs like H264, it's not just the mime type, it's also a profile, packetization mode, etc
… can you set this here as well?

harald: yes, it includes all the parameters

[TimP: supportive of this]

Harald: based on the feedback, it sounds like moving forward with #202 would be worth looking into again

Guido: setMetadata feels like a better fit for this use case (although I was supportive of the copy constructor for a separate one)

Jan-Ivar: let's follow up on github

[TimP: any issue with having several transforms in sequence?]

Harald: if they're connected by pipelines, this creates good hand-off points from one to the next

Jan-Ivar: given this, I think the copy constructor would be better fit
… setMetadata can end up with @@@ issues
… not clear that we should extend the problem we have with data to metadata

Bernard: in WebCodecs, immutable data was a way to avoid race conditions with the work being done in a separate thread

Jan-Ivar: this is handled via the transfer step here

Bernard: setMetadata could only be called from the transform right? not after it has been enqueued?

Jan-Ivar: setMetadata can only be called if the object is still there…
… It feels to me like having setMetadata is redundant with the copy constructor

Harald: right now, the copy constructor is expensive

Jan-Ivar: let's continue the discussion on #202

RESOLUTION: Consensus on #186, discussion to continue on #202

Captured Surface Switching

[Slide 17]

[Slide 18]

[Slide 19]

[Slide 20]

[Slide 21]

[Slide 22]

[Slide 23]

[Slide 24]

[Slide 25]

Tove: is this a promising way forward?

[TimP: Is simply supplying an event handler enough to discriminate ? Do we actually need the surface/session property?]

Tove: we discussed in the December meeting whether an event handler (back then, a callback) would be enough to discriminate
… and there is a design principle against changing behavior based on whether an event handler is registered

Jan-Ivar: indeed; there are cases where that would be OK
… we haven't talked about stopping tracks here
… it might be OK for the user agent to optimize away user-visible behavior when it comes to how quickly the indicator state/permission UX changes

Jan-Ivar: for backwards compatibility, I think we're in agreement the UA could optimize the case when no event handler has been added

Tove: the original proposal was that you would always get the two kinds of tracks, which would still need to be managed even if you don't need them
… hence this new proposal that lets apps pick which tracks they want

Jan-Ivar: If I opt-in to the surface track, what would getDisplayMedia return?

Tove: I'm proposing getDisplayMedia returns the session track, and the event exposes the surface track
… but I'm open to other approaches

Elad: what if we had a getter for the session track, but only returned the surface track from getDisplayMedia
… that way you don't have to wait for an event; you could access either at any point
… stopping for unused surface tracks could be handled by the capturecontroller

Jan-Ivar: I like the behavior and concepts of surface/session tracks
… but asking developers to pick one upfront feels artificial
… I could move from one tab to another tab with audio, but then stay in tab+audio mode moving forward
… hence why I was proposing to expose both and let the app close the ones they don't want
… I was initially worried this would lead to confusing indicators
… but Youenn convinced me this could be optimized away

Harald: if I want to write an app that handles switching of surfaces and have code that covers both cases, I would struggle to maintain two code paths to manage what gets presented to the end user

Tove: the problem I see with Jan-Ivar's proposal is that we lose the guarantee that one track represents one surface which I think is an attractive invariant

Jan-Ivar: I don't think Web developers need to care about that; there is an isolation principle that when switching from one surface to another, you're also switching sources
… I like slide 19 - the only thing missing is stopping tracks
… if a developer doesn't care about surface track at all, don't register an event handler
… you would want to stop old tracks in the event handler
… this would also let the developer choose live which tracks they can support
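
The pattern Jan-Ivar describes could be sketched like this (hypothetical: the event property names `surfaceTrack`/`sessionTrack` are assumptions, not an agreed API):

```javascript
// The app keeps only the kind of track it supports and stops the rest,
// letting capture indicators clear for the stopped tracks.
function handleSurfaceSwitch(currentTrack, event, wantSurfaceTrack) {
  const keep = wantSurfaceTrack ? event.surfaceTrack : event.sessionTrack;
  const drop = wantSurfaceTrack ? event.sessionTrack : event.surfaceTrack;
  currentTrack.stop(); // stop the old track from before the switch
  drop.stop();         // stop the unwanted new track as well
  return keep;         // the app continues rendering this track
}
```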

Elad: what happens if the app doesn't stop either track?

Jan-Ivar: the backwards compatible design is injection; would we be talking about ending that model?

RESOLUTION: more discussion is needed on the lifecycle of surface tracks

Racy devicechange event design has poor interoperability in Media Capture and Streams

[Slide 28]

[Slide 29]

Jan-Ivar: this is modeled on the RTC track event

Jan-Ivar: any objection to merging this PR?

Guido: what does "current result from enumerateDevices" mean?

Jan-Ivar: good point, I should rephrase that - it's the device list at the time the event is fired
… this would be a synchronous equivalent to what enumerateDevices would produce
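
With a snapshot carried on the event, an app could diff device lists without re-calling enumerateDevices. An illustrative helper only (MediaDeviceInfo-like objects with a deviceId are assumed):

```javascript
// Diff two device-list snapshots, e.g. the app's previous list against
// the snapshot the devicechange event would carry.
function diffDevices(previous, current) {
  const prevIds = new Set(previous.map((d) => d.deviceId));
  const currIds = new Set(current.map((d) => d.deviceId));
  return {
    added: current.filter((d) => !prevIds.has(d.deviceId)),
    removed: previous.filter((d) => !currIds.has(d.deviceId)),
  };
}
```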

Guido: I agree with the change, but the language should be clarified

Dom: is there an existing internal slot we could refer to?

Jan-Ivar: there is one, but with too much info in it, although we have an algorithm to filter it

RESOLUTION: merge #972 with language clarified on the current device list

WebRTC API

Convert RTCIceCandidatePair dictionary to an interface

Jan-Ivar: FYI - please take a look and chime in if you have an opinion

setCodecPreferences should trigger negotiationneeded

[Slide 30]

Jan-Ivar: prompted by ongoing implementation of setCodecPreferences in Firefox
… is it a good idea to trigger negotiationneeded as needed? if so, what would "as needed" actually encompass?

Harald: when does setCodecPreferences make a difference? when you're in the middle of a negotiation, it will make a difference in the answer; it doesn't affect the local state, it can only change the remote state, which can only happen after negotiation
… wouldn't it be simpler to just fire negotiationneeded?

Jan-Ivar: there are edge cases when you're not in a stable state and negotiationneeded is fired
… it sounds like you're agreeing that firing negotiationneeded would be good

harald: I'm trying to figure out when to fire and not to fire
… it could be we fire it when the list of codecs is different from what is in remote description
… wouldn't fire when setCodecPreferences doesn't change the list (including because the negotiation trims down the list of codec preferences)
… that would mean we need an internal slot to keep track of the last codec preferences call
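
Harald's "fire only when it makes a difference" rule could be sketched as follows (illustrative: comparing on mimeType/clockRate/sdpFmtpLine is an assumption, as is how the internal slot would be consulted):

```javascript
// Compare a new codec-preferences list against the codecs from the
// current remote description; only a real difference would warrant
// firing negotiationneeded.
function codecsEqual(a, b) {
  if (a.length !== b.length) return false;
  return a.every((c, i) =>
    c.mimeType === b[i].mimeType &&
    c.clockRate === b[i].clockRate &&
    (c.sdpFmtpLine ?? "") === (b[i].sdpFmtpLine ?? ""));
}

function shouldFireNegotiationNeeded(newPreferences, remoteCodecs) {
  return !codecsEqual(newPreferences, remoteCodecs);
}
```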

jan-ivar: probably indeed, if we want to optimize the cases where setCodecPreferences looks like it would make a difference but doesn't

Florent: It's a nice idea to trigger negotiationneeded by sCP, but I'm worried about backwards compatibility issues
… it could cause issues if apps get negotiation needed at unexpected times
… given the complexities of identifying cases where it's needed and backwards compatibility issues, I'm not sure we can move forward

Jan-Ivar: negotiationneeded is a queued task that can't happen during a negotiation
… in other words, you would face the same issues if that was handled manually by the app developer
… although I recognize there may be concerns in the transition

Florent: sCP is already used by a lot of widely deployed applications - I agree this might have been a better design, but it's not clear changing it now is the right trade-off at this point
… atm, negotiationneeded is triggered in a very limited number of API calls; adding it to another API call may break expectations

Jan-Ivar: if you're not using the negotiationneeded event, you wouldn't be affected by this
… if you're using sCP in remote-answer, neither

Florent: this may be problematic if that were to happen later, in the middle of a transaction, since apps wouldn't have been built to handle this
… I'm also worried about the complexity of specifying "as needed"
… maybe this could be obtained via a different mechanism, e.g. an additional parameter in addTransceiver

Jan-Ivar: thanks - worth documenting these concerns in the github issue

receiver.getParameters().codecs seems under-specified

[Slide 31]

[Slide 32]

[Slide 33]

[Slide 34]

Harald: the attempt was to make sure that we have the conceptual list of everything that we could possibly negotiate, and that we could add to this list over time
… and this had to be per transceiver
… I missed this particular usage of the list
… we have to decide what we want to represent
… if we want to make sure we represent only codecs that we are able to receive at the moment, unimplemented codecs can't be received of course
… we could do this by making the enabled flag mean "currently willing to receive"
… ie it would have to match the most recently accepted local description
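
Harald's suggestion could be sketched as follows (hypothetical: the matching key and list shapes are illustrative, not spec text):

```javascript
// Mark a receiver codec as enabled ("currently willing to receive")
// only if it appears in the most recently accepted local description.
function markEnabled(conceptualCodecs, localDescriptionCodecs) {
  const negotiated = new Set(
    localDescriptionCodecs.map((c) => `${c.mimeType}/${c.clockRate}`));
  return conceptualCodecs.map((c) => ({
    ...c,
    enabled: negotiated.has(`${c.mimeType}/${c.clockRate}`),
  }));
}
```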

Jan-Ivar: ok, so this sounds like there is something worth re-instantiating from the previous algorithm

Jan-Ivar: these slides would likely apply to sendCodecs as well, but I haven't had the chance to check in detail

Background segmentation mask

[Slide 37]

[Slide 38]

Video of the background mask demo

[Slide 39]

Riju: in background mask, the original frame remains intact and the mask gets provided in addition to the original frame
… both frames are provided in the same stream
… we expect to put up a PR sometime this week based on this

Elad: this looks very interesting
… do I understand correctly that the masks get interleaved in the stream?

Riju: the driver provides the mask data; the code on slide 39 shows how to operate on it

Eero: the order is first masked frame, then original frame
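
With that ordering, pairing each mask with its original frame could look like this (illustrative only; real code would consume the track's frame stream rather than a plain array):

```javascript
// Pair up the interleaved sequence described above: a mask frame first,
// then the original frame it belongs to.
function pairMaskedFrames(frames) {
  const pairs = [];
  for (let i = 0; i + 1 < frames.length; i += 2) {
    pairs.push({ mask: frames[i], original: frames[i + 1] });
  }
  return pairs;
}
```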

Elad: this could be confusing; could the actual frame be provided with metadata instead of providing it as a different frame?
… getting all the data at the same time would seem easier

Riju: the synthetic frame was easier for demo purposes, but we could add something like you suggested
… IIRC we got comments on the blur flag that having both the original and the processed frame was useful

Harald: this reminds me of discussion of alpha channels and masks which were very much about how to express the metadata
… this particular approach has the question of how you transmit it
… if this is encoded as metadata, the question is how it gets encoded
… have you looked into encoding the mask in the alpha channel?

eero: in Chrome, the GPU doesn't have access to an alpha channel

Jan-Ivar: +1 that the alpha channel feels intuitively a better place for this
… to clarify, this isn't a background replacement constraint

Riju: right, the app can do whatever they want with the mask

Bernard: currently we're not doing a great job of supporting the alpha channel - e.g. webcodecs doesn't support it
… it's just being added to AV1
… lots of holes currently
… I would encourage you to file bug and spec issues

Riju: as Elad mentioned, this would be mostly for local consumption

Frederik: is the API shape sustainable, e.g. when adding gesture detection, face detection?
… can we add them all to metadata?

Riju: we've been looking at these other features

Bernard: there were discussions in the Media WG to add more metadata to VideoFrames and how encoders should react to it
… they're not preserved in the encoded chunks, they get dropped

Jan-Ivar: part of my comments on face detection was about to what extent this needed to be tied to the camera driver, and whether instead this should be exposed as generic media processing work

Riju: background segmentation is a priority because you get 2x or 3x performance improvements

Jan-Ivar: but is there something about masking that makes it worth dealing with it as a camera feature?

Riju: this is supported on any camera on Windows or Mac
… it takes advantage of the local optimized models available to native

Harald: what controls what gets masked?

Riju: only background/foreground

Riju: if there is rough support, we can start with a PR and iterate on it

Jan-Ivar: my concern is how it relates to generic media processing pipelines
… background blur was a way to mitigate what was being provided by the platform and needed to allow for opt-in/opt-out from apps
… opening up an open-ended area of features would be a concern for us
… this sounds like something that ought to be part of generic media processing library

Riju: this provides a primitive that is generally useful across videoconferencing apps - green screen, blur, replacement

Bernard: there was another discussion in the Media WG to discuss media processing

dom: the tension is between doing a hardware-acceleration specific approach vs generic media processing

Riju: the motivation here is the performance boost

Jan-Ivar: no clear interest from us at this point, but this may change based on market interest

Summary of resolutions

  1. Consensus on #186, discussion to continue on #202
  2. more discussion is needed on the lifecycle of surface tracks
  3. merge #972 with language clarified on the current device list
Minutes manually created (not a transcript), formatted by scribe.perl version 222 (Sat Jul 22 21:57:07 2023 UTC).