14:58:55 RRSAgent has joined #webrtc
14:59:00 logging to https://www.w3.org/2024/04/23-webrtc-irc
14:59:00 Meeting: WebRTC April 23 2024 meeting
14:59:01 Agenda: https://www.w3.org/2011/04/webrtc/wiki/April_23_2024
14:59:01 Slideset: https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf
14:59:01 Chairs: HTA, Jan-Ivar, Bernard
14:59:40 Present+ Dom, Bernard, Tove
14:59:59 Present+ Harald
15:00:07 Present+ Elad
15:00:15 Present+ Sameer
15:00:40 Present+ Eero
15:01:09 Present+ Riju
15:01:23 Present+ TonyHerre
15:01:26 Present+ Jan-Ivar
15:01:38 Present+ Guido
15:02:24 Present+ FrederikSolenberg
15:02:43 RRSAgent, make log public
15:03:36 Present+ Florent
15:04:39 Recording is starting
15:05:15 Present+ Carine
15:05:50 Present+ TimP
15:06:14 scribe+
15:06:58 Topic: -> https://github.com/w3c/webrtc-encoded-transform/pull/186 Custom Codecs
15:06:58 [slide 10]
15:07:36 [slide 11]
15:08:06 [slide 12]
15:09:18 [slide 13]
15:09:56 Harald: this requires the ability to set the mime type of a frame, which can be done in two ways: with a frame constructor (merged in #233), or via setMetadata (#202), which has stalled
15:10:13 ... setMetadata feels like a better fit from my perspective
15:10:34 ... but at least the constructor allows for this, so we may not need two different ways
15:11:00 Jan-Ivar: I'm supportive of the API shape; on the question of constructor vs setMetadata, it's a bit complicated
15:11:16 ... because these encoded frames are mutable, unlike WebCodecs
15:11:30 ... that's a bit unfortunate, but it makes sense in the context of encryption
15:11:56 ... in WebCodecs, frames are immutable, which would require a copy-constructor step
15:12:13 Harald: with immutable data, we would have to have a copy constructor with a separate argument for the data itself
15:12:23 Jan-Ivar: in other words, I don't have a clear answer to your question
15:12:45 Bernard: also supportive of this; setMetadata should be fine here, we don't have the same constraints we had in WebCodecs
15:13:01 ... for WebCodecs, we didn't want data to change while an operation is in progress
15:13:07 ... here setMetadata should be safe
15:13:18 ... it would be nice to allow for this without making a copy
15:13:38 ... for some codecs like H264, it's not just the mime type, it's also a profile, packetization mode, etc.
15:13:40 ... can you set this here as well?
15:13:50 Harald: yes, it includes all the parameters
15:14:05 [TimP: supportive of this]
15:15:04 Harald: based on the feedback, it sounds like moving forward with #202 would be worth looking into again
15:15:28 Guido: setMetadata feels like a better fit for this use case (although I was supportive of the copy constructor for a separate one)
15:15:42 Jan-Ivar: let's follow up on GitHub
15:16:07 [TimP: any issue with having several transforms in sequence?]
15:16:26 Harald: if they're connected by pipelines, this creates good hand-off points from one to the next
15:16:57 Jan-Ivar: given this, I think the copy constructor would be a better fit
15:17:32 ... setMetadata can end up with @@@ issues
15:18:23 ... it's not clear that we should extend the problem we have with data to metadata
15:18:48 Bernard: in WebCodecs, immutable data was a way to avoid race conditions with the work being done in a separate thread
15:19:12 Jan-Ivar: this is handled via the transfer step here
15:19:52 Bernard: setMetadata could only be called from the transform, right? not after it has been enqueued?
15:20:56 Jan-Ivar: setMetadata can only be called if the object is still there…
15:21:18 ... it feels to me like having setMetadata is redundant with the copy constructor
15:21:44 Harald: right now, the copy constructor is expensive
15:21:53 Jan-Ivar: let's continue the discussion on #202
15:22:30 RESOLVED: Consensus on #186, discussion to continue on #202
15:22:37 Topic: -> https://github.com/w3c/mediacapture-screen-share-extensions/issues/4 Captured Surface Switching
15:22:37 [slide 17]
15:23:52 [slide 18]
15:24:51 [slide 19]
15:25:45 [slide 20]
15:27:33 [slide 21]
15:28:23 [slide 22]
15:29:12 [slide 23]
15:30:06 [slide 24]
15:31:29 [slide 25]
15:31:53 Tove: is this a promising way forward?
15:32:26 [TimP: Is simply supplying an event handler enough to discriminate? Do we actually need the surface/session property?]
15:33:21 Tove: we discussed in the December meeting whether an event handler (back then, a callback) would be enough to discriminate
15:34:03 ... and there is a design principle against changing behavior based on whether an event handler is registered
15:34:31 Jan-Ivar: indeed; there are cases where that would be OK
15:34:38 ... we haven't talked about stopping tracks here
15:35:19 ... it might be OK for the user agent to optimize away user-visible behavior when it comes to how quickly the indicator state/permission UX changes
15:36:08 Jan-Ivar: for backwards compatibility, I think we're in agreement the UA could optimize the case where no event handler has been added
15:37:01 Tove: the original proposal was that you would always get the two kinds of tracks, which, even if you don't need them, would still need to be managed
15:37:18 ... hence this new proposal that lets apps pick which tracks they want
15:38:07 Jan-Ivar: if I opt in to the surface track, what would getDisplayMedia return?
15:38:29 Tove: I'm proposing getDisplayMedia returns the session track, and the event exposes the surface track
15:38:42 ... but I'm open to other approaches
15:39:25 Elad: what if we had a getter for the session track, but only returned the surface track from getDisplayMedia?
15:39:42 ... that way you don't have to wait for an event; you could access either at any point
15:40:15 ... stopping unused surface tracks could be handled by the CaptureController
15:40:27 Jan-Ivar: I like the behavior and concepts of surface/session tracks
15:40:49 ... but asking developers to pick one upfront feels artificial
15:41:13 ... I could move from one tab to another tab with audio, but then stay in tab+audio mode moving forward
15:41:28 ... hence why I was proposing to expose both and let the app close the ones they don't want
15:41:46 ... I was initially worried this would lead to confusing indicators
15:41:54 ... but Youenn convinced me this could be optimized away
15:42:58 Harald: if I want to write an app that handles switching of surfaces and have code that covers both cases, I would struggle to maintain two code paths to manage what gets presented to the end user
15:44:27 Tove: the problem I see with Jan-Ivar's proposal is that we lose the guarantee that one track represents one surface, which I think is an attractive invariant
15:45:38 Jan-Ivar: I don't think Web developers need to care about that; there is an isolation principle that when switching from one surface to another, you're also switching sources
15:45:59 ... I like slide 19 - the only thing missing is stopping tracks
15:46:31 ... if a developer doesn't care about the surface track at all, don't register an event handler
15:46:52 ... you would want to stop old tracks in the event handler
15:47:28 ... this would also let the developer choose live which tracks they can support
15:49:03 Elad: what happens if the app doesn't stop either track?
15:49:57 Jan-Ivar: the backwards-compatible design is injection; would we be talking about ending that model?
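[The "stop old tracks in the event handler" pattern discussed above can be sketched as follows. This is a minimal illustration only: the API is still under discussion, so MockTrack, SurfaceTrackManager, and onSurfaceSwitch are hypothetical stand-ins, not the proposed interface names.]

```javascript
// Mock standing in for MediaStreamTrack so the lifecycle logic runs
// outside a browser; only stop() and readyState are modeled.
class MockTrack {
  constructor(label) { this.label = label; this.readyState = "live"; }
  stop() { this.readyState = "ended"; }
}

// Keeps at most one live surface track: when a surface switch delivers a
// new track, the app stops the previous one, preserving the
// one-live-track-per-surface invariant Tove mentions.
class SurfaceTrackManager {
  constructor() { this.current = null; }
  onSurfaceSwitch(newTrack) {
    if (this.current) this.current.stop(); // drop the old surface's track
    this.current = newTrack;
  }
}

const manager = new SurfaceTrackManager();
const tabA = new MockTrack("tab A");
const tabB = new MockTrack("tab B");
manager.onSurfaceSwitch(tabA);
manager.onSurfaceSwitch(tabB); // user switched surfaces
console.log(tabA.readyState, tabB.readyState); // "ended" "live"
```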
15:50:47 RESOLVED: more discussion is needed on the lifecycle of surface tracks
15:51:06 Topic: -> https://github.com/w3c/mediacapture-main/issues/972 Racy devicechange event design has poor interoperability in Media Capture and Streams
15:51:06 [slide 28]
15:54:13 [slide 29]
15:54:37 Jan-Ivar: this is modeled on the RTC track event
15:54:51 Jan-Ivar: any objection to merging this PR?
15:55:27 Guido: what does "current result from enumerateDevices" mean?
15:55:57 Jan-Ivar: good point, I should rephrase that - it's the devices at the time the event is fired
15:56:19 ... this would be a synchronous equivalent to what enumerateDevices would produce
15:56:44 Guido: I agree with the change, but the language should be clarified
15:57:12 Dom: is there an existing internal slot we could refer to?
15:57:32 Jan-Ivar: there is one, but with too much info in it, although we have an algorithm to filter it
15:57:46 RESOLVED: merge 972 with language clarified on the current device list
15:57:58 Topic: -> https://github.com/w3c/webrtc-pc/ WebRTC API
15:57:58 Subtopic: -> https://github.com/w3c/webrtc-pc/pull/2961 Convert RTCIceCandidatePair dictionary to an interface
15:58:07 [slide 30]
15:58:27 Jan-Ivar: prompted by the ongoing implementation of setCodecPreferences in Firefox
15:59:58 ... is it a good idea to trigger negotiationneeded as needed? if so, what would "as needed" actually encompass?
16:01:38 Present+ SunShin
16:03:05 Harald: when does setCodecPreferences make a difference? when you're in the middle of a negotiation, it will make a difference in the answer; it doesn't affect the local state, it can only change the remote state, which can only happen after negotiation
16:03:49 ... wouldn't it be simpler to just fire negotiationneeded?
16:04:13 Jan-Ivar: there are edge cases when you're not in a stable state and negotiationneeded is fired
16:04:38 ... it sounds like you're agreeing that firing negotiationneeded would be good
16:04:46 Harald: I'm trying to figure out when to fire and when not to fire
16:05:00 ... it could be that we fire it when the list of codecs is different from what is in the remote description
16:05:27 ... we wouldn't fire when setCodecPreferences doesn't change the list (including because the negotiation trims down the list of codec preferences)
16:05:54 ... that would mean we need an internal slot to keep track of the last codec preferences call
16:06:25 Jan-Ivar: probably indeed, if we want to optimize the cases where setCodecPreferences looks like it would make a difference but doesn't
16:07:37 Florent: it's a nice idea to trigger negotiationneeded from sCP, but I'm worried about backwards-compatibility issues
16:07:58 ... it could cause issues if apps get negotiationneeded at unexpected times
16:09:11 ... given the complexity of identifying the cases where it's needed and the backwards-compatibility issues, I'm not sure we can move forward
16:09:38 Jan-Ivar: negotiationneeded is a queued task that can't happen during a negotiation
16:09:52 ... in other words, you would face the same issues if that was handled manually by the app developer
16:10:06 ... although I recognize there may be concerns in the transition
16:11:07 Florent: sCP is already used by a lot of widely deployed applications - I agree this might have been a better design, but it's not clear changing it now is the right trade-off at this point
16:11:42 ... at the moment, negotiationneeded is triggered by a very limited number of API calls; adding it to another API call may break expectations
16:12:10 Jan-Ivar: if you're not using the negotiationneeded event, you wouldn't be affected by this
16:12:25 ... if you're using sCP in remote-answer, neither
16:13:01 Florent: this may be problematic if it happens later, in the middle of a transaction, since apps wouldn't have been built to handle this
16:13:14 ... I'm also worried about the complexity of specifying "as needed"
16:14:10 ... maybe this could be obtained via a different mechanism, e.g. an additional parameter in addTransceiver
16:14:39 Jan-Ivar: thanks - worth documenting these concerns in the GitHub issue
16:14:50 Subtopic: -> https://github.com/w3c/webrtc-pc/issues/2956 receiver.getParameters().codecs seems under-specified
16:14:51 [slide 31]
16:17:00 [slide 32]
16:17:57 [slide 33]
16:19:11 [slide 34]
16:21:10 Harald: the attempt was to make sure that we have the conceptual list of what we can possibly negotiate, and that we could add to this list over time
16:21:15 ... and this had to be per transceiver
16:21:28 ... I missed this particular usage of the list
16:21:38 ... we have to decide what we want to represent
16:22:00 ... if we want to make sure we represent only codecs that we are able to receive at the moment, unimplemented codecs can't be received of course
16:22:20 ... we could do this by making the enabled flag mean "currently willing to receive"
16:22:38 ... i.e. it would have to match the most recently accepted local description
16:23:57 Jan-Ivar: ok, so it sounds like there is something worth re-instantiating from the previous algorithm
16:25:30 Jan-Ivar: these slides would likely apply to sendCodecs as well, but I haven't had the chance to check in detail
16:29:01 Topic: Background segmentation mask
16:29:01 [slide 37]
16:29:54 [slide 38]
16:31:15 -> https://drive.google.com/file/d/1vw8gLSGzdeqM7w1N7B4uolrxqE-8mU5f/view?resourcekey Video of the background mask demo
16:32:25 [slide 39]
16:33:33 Riju: in background mask, the original frame remains intact and the mask gets provided in addition to the original frame
16:33:40 ... both frames are provided in the same stream
16:34:56 ... we expect to put up a PR sometime this week based on this
16:35:11 Elad: this looks very interesting
16:35:45 ... do I understand correctly that the masks get interleaved in the stream?
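[The interleaving under discussion - mask frame first, then the original frame, in the same stream - can be sketched as follows. This is a minimal illustration only: plain objects stand in for VideoFrames, and pairMaskedFrames is a hypothetical helper, not part of the proposal.]

```javascript
// Groups an interleaved sequence of [mask, original, mask, original, ...]
// frames into {mask, original} pairs, mask first per the demo's ordering.
function* pairMaskedFrames(frames) {
  for (let i = 0; i + 1 < frames.length; i += 2) {
    yield { mask: frames[i], original: frames[i + 1] };
  }
}

// Plain objects standing in for VideoFrames, matched by timestamp.
const frames = [
  { kind: "mask", ts: 0 },  { kind: "original", ts: 0 },
  { kind: "mask", ts: 33 }, { kind: "original", ts: 33 },
];
const pairs = [...pairMaskedFrames(frames)];
console.log(pairs.length); // 2
console.log(pairs[0].mask.kind, pairs[0].original.kind); // "mask" "original"
```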
16:36:24 Riju: the driver provides the mask data; the code on slide 39 shows how to operate on it
16:36:50 Eero: the order is first the masked frame, then the original frame
16:37:22 Elad: this could be confusing; could the actual frame be provided with metadata instead of providing it as a different frame?
16:37:38 ... getting all the data at the same time would seem easier
16:38:00 Riju: the synthetic frame was easier for demo purposes, but we could add something like you suggested
16:38:23 ... we got comments on the blur flag that having both the original and the processed one was useful, IIRC
16:40:11 Harald: this reminds me of the discussion of alpha channels and masks, which was very much about how to express the metadata
16:40:48 ... this particular approach raises the question of how you transmit it
16:41:14 ... if this is encoded as metadata, the question is how it gets encoded
16:41:23 ... have you looked into encoding the mask in the alpha channel?
16:41:44 Eero: in Chrome, the GPU doesn't have access to an alpha channel
16:42:16 Jan-Ivar: +1 that the alpha channel intuitively feels like a better place for this
16:42:50 ... to clarify, this isn't a background replacement constraint
16:43:06 Riju: right, the app can do whatever they want with the mask
16:47:24 Bernard: currently we're not doing a great job of supporting the alpha channel - e.g. WebCodecs doesn't support it
16:47:29 ... it's just being added to AV1
16:47:38 ... lots of holes currently
16:47:59 ... I would encourage you to file bugs and spec issues
16:48:15 Riju: as Elad mentioned, this would be mostly for local consumption
16:48:40 Frederik: is the API shape sustainable, e.g. when adding gesture detection, face detection?
16:48:51 ... can we add them all to metadata?
16:49:11 Riju: we've been looking at these other features
16:49:42 Bernard: there were discussions in the Media WG about adding more metadata to VideoFrames and how encoders should react to it
16:49:57 ... they're not preserved in the encoded chunks; they get dropped
16:50:47 Jan-Ivar: part of my comments on face detection was to what extent this needed to be tied to the camera driver, and whether instead this should be exposed in generic media processing work
16:51:23 Riju: background segmentation is a priority because you get 2x or 3x performance improvements
16:51:44 Jan-Ivar: but is there something about masking that makes it worth dealing with as a camera feature?
16:52:03 Riju: this is supported on any camera on Windows or Mac
16:52:37 ... it takes advantage of the locally optimized models available to native code
16:53:52 Harald: what controls what gets masked?
16:53:59 Riju: only background/foreground
16:54:13 RRSAgent, draft minutes
16:54:14 I have made the request to generate https://www.w3.org/2024/04/23-webrtc-minutes.html dom
16:55:00 Riju: if there is rough support, we can start with a PR and iterate on it
16:55:45 Jan-Ivar: my concern is how it relates to generic media processing pipelines
16:56:14 ... background blur was a way to mitigate what was being provided by the platform, and needed to allow for opt-in/opt-out from apps
16:56:28 ... opening up an open-ended area of features would be a concern for us
16:57:38 ... this sounds like something that ought to be part of a generic media processing library
16:58:49 Riju: this provides a primitive that is generally useful across videoconferencing apps - green screen, blur, replacement
16:59:38 Bernard: there was another discussion in the Media WG about media processing
17:01:06 Dom: the tension is between a hardware-acceleration-specific approach vs generic media processing
17:01:29 Riju: the motivation here is the performance boost
17:02:06 Jan-Ivar: no clear interest from us at this point, but this may change based on market interest
17:02:09 RRSAgent, draft minutes
17:02:10 I have made the request to generate https://www.w3.org/2024/04/23-webrtc-minutes.html dom
18:32:03 Zakim has left #webrtc