14:58:55 RRSAgent has joined #webrtc
14:59:00 logging to https://www.w3.org/2024/04/23-webrtc-irc
14:59:00 Meeting: WebRTC April 23 2024 meeting
14:59:01 Agenda: https://www.w3.org/2011/04/webrtc/wiki/April_23_2024
14:59:01 Slideset: https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf
14:59:01 Chairs: HTA, Jan-Ivar, Bernard
14:59:40 Present+ Dom, Bernard, Tove
14:59:59 Present+ Harald
15:00:07 Present+ Elad
15:00:15 Present+ Sameer
15:00:40 Present+ Eero
15:01:09 Present+ Riju
15:01:23 Present+ TonyHerre
15:01:26 Present+ Jan-Ivar
15:01:38 Present+ Guido
15:02:24 Present+ FrederikSolenberg
15:02:43 RRSAgent, make log public
15:03:36 Present+ Florent
15:04:39 Recording is starting
15:05:15 Present+ Carine
15:05:50 Present+ TimP
15:06:14 scribe+
15:06:58 Topic: -> https://github.com/w3c/webrtc-encoded-transform/pull/186 Custom Codecs
15:06:58 [slide 10]
15:07:36 [slide 11]
15:08:06 [slide 12]
15:09:18 [slide 13]
15:09:56 Harald: this requires the ability to set the mime type of a frame, which can be done in two ways: with a frame constructor (merged in #233), or via setMetadata (#202), which has stalled
15:10:13 ... setMetadata feels like a better fit from my perspective
15:10:34 ... but at least the constructor allows for this, so we may not need two different ways
15:11:00 Jan-Ivar: I'm supportive of the API shape; on the question of constructor vs setMetadata, it's a bit complicated
15:11:16 ... because these encoded frames are mutable, unlike WebCodecs
15:11:30 ... that's a bit unfortunate, but it makes sense in the context of encryption
15:11:56 ... in WebCodecs, frames are immutable, which would require a copy-constructor step
15:12:13 Harald: with immutable data, we would have to have a copy constructor with a separate argument for the data itself
15:12:23 Jan-Ivar: in other words, I don't have a clear answer to your question
15:12:45 Bernard: also supportive of this; setMetadata should be fine here, we don't have the same constraints we had in WebCodecs
15:13:01 ... for WebCodecs, we didn't want data to change while an operation is in progress
15:13:07 ... here setMetadata should be safe
15:13:18 ... it would be nice to allow for this without making a copy
15:13:38 ... for some codecs like H264, it's not just the mime type, it's also a profile, packetization mode, etc.
15:13:40 ... can you set this here as well?
15:13:50 Harald: yes, it includes all the parameters
15:14:05 [TimP: supportive of this]
15:15:04 Harald: based on the feedback, it sounds like moving forward with #202 would be worth looking into again
15:15:28 Guido: setMetadata feels like a better fit for this use case (although I was supportive of the copy constructor for a separate one)
15:15:42 Jan-Ivar: let's follow up on GitHub
15:16:07 [TimP: any issue with having several transforms in sequence?]
15:16:26 Harald: if they're connected by pipelines, this creates good hand-off points from one to the next
15:16:57 Jan-Ivar: given this, I think the copy constructor would be a better fit
15:17:32 ... setMetadata can end up with @@@ issues
15:18:23 ... it's not clear that we should extend the problem we have with data to metadata
15:18:48 Bernard: in WebCodecs, immutable data was a way to avoid race conditions with the work being done in a separate thread
15:19:12 Jan-Ivar: this is handled via the transfer step here
15:19:52 Bernard: setMetadata could only be called from the transform, right? not after it has been enqueued?
15:20:56 Jan-Ivar: setMetadata can only be called if the object is still there…
15:21:18 ... it feels to me like having setMetadata is redundant with the copy constructor
15:21:44 Harald: right now, the copy constructor is expensive
15:21:53 Jan-Ivar: let's continue the discussion on #202
15:22:30 RESOLVED: Consensus on #186, discussion to continue on #202
15:22:37 Topic: -> https://github.com/w3c/mediacapture-screen-share-extensions/issues/4 Captured Surface Switching
15:22:37 [slide 17]
15:23:52 [slide 18]
15:24:51 [slide 19]
15:25:45 [slide 20]
15:27:33 [slide 21]
15:28:23 [slide 22]
15:29:12 [slide 23]
15:30:06 [slide 24]
15:31:29 [slide 25]
15:31:53 Tove: is this a promising way forward?
15:32:26 [TimP: Is simply supplying an event handler enough to discriminate? Do we actually need the surface/session property?]
15:33:21 Tove: we discussed in the December meeting whether an event handler (back then, a callback) would be enough to discriminate
15:34:03 ... and there is a design principle against changing behavior based on whether an event handler is registered
15:34:31 Jan-Ivar: indeed; there are cases where that would be OK
15:34:38 ... we haven't talked about stopping tracks here
15:35:19 ... it might be OK for the user agent to optimize away user-visible behavior when it comes to how quickly the indicator state/permission UX changes
15:36:08 Jan-Ivar: for backwards compatibility, I think we're in agreement the UA could optimize the case where no event handler has been added
15:37:01 Tove: the original proposal was that you would always get the two kinds of tracks, which, even if you don't need them, would still need to be managed
15:37:18 ... hence this new proposal that lets apps pick which tracks they want
15:38:07 Jan-Ivar: if I opt in to the surface track, what would getDisplayMedia return?
15:38:29 Tove: I'm proposing getDisplayMedia returns the session track, and the event exposes the surface track
15:38:42 ... but I'm open to other approaches
15:39:25 Elad: what if we had a getter for the session track, but only returned the surface track from getDisplayMedia?
15:39:42 ... that way you don't have to wait for an event; you could access either at any point
15:40:15 ... stopping unused surface tracks could be handled by the CaptureController
15:40:27 Jan-Ivar: I like the behavior and concepts of surface/session tracks
15:40:49 ... but asking developers to pick one upfront feels artificial
15:41:13 ... I could move from one tab to another tab with audio, but then stay in tab+audio mode moving forward
15:41:28 ... hence why I was proposing to expose both and let the app close the ones they don't want
15:41:46 ... I was initially worried this would lead to confusing indicators
15:41:54 ... but Youenn convinced me this could be optimized away
15:42:58 Harald: if I want to write an app that handles switching of surfaces and have code that covers both cases, I would struggle to maintain two code paths to manage what gets presented to the end user
15:44:27 Tove: the problem I see with Jan-Ivar's proposal is that we lose the guarantee that one track represents one surface, which I think is an attractive invariant
15:45:38 Jan-Ivar: I don't think Web developers need to care about that; there is an isolation principle that when switching from one surface to another, you're also switching sources
15:45:59 ... I like slide 19 - the only thing missing is stopping tracks
15:46:31 ... if a developer doesn't care about the surface track at all, don't register an event handler
15:46:52 ... you would want to stop old tracks in the event handler
15:47:28 ... this would also let the developer choose live which tracks they can support
15:49:03 Elad: what happens if the app doesn't stop either track?
15:49:57 Jan-Ivar: the backwards-compatible design is injection; would we be talking about ending that model?
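[The "stop old tracks in the event handler" pattern discussed above can be sketched as follows. This is a minimal illustration only: the API is still under discussion, so MockTrack, SurfaceTrackManager, and onSurfaceSwitch are hypothetical stand-ins, not the proposed interface names.]

```javascript
// Mock standing in for MediaStreamTrack so the lifecycle logic runs
// outside a browser; only stop() and readyState are modeled.
class MockTrack {
  constructor(label) { this.label = label; this.readyState = "live"; }
  stop() { this.readyState = "ended"; }
}

// Keeps at most one live surface track: when a surface switch delivers a
// new track, the app stops the previous one, preserving the
// one-live-track-per-surface invariant Tove mentions.
class SurfaceTrackManager {
  constructor() { this.current = null; }
  onSurfaceSwitch(newTrack) {
    if (this.current) this.current.stop(); // drop the old surface's track
    this.current = newTrack;
  }
}

const manager = new SurfaceTrackManager();
const tabA = new MockTrack("tab A");
const tabB = new MockTrack("tab B");
manager.onSurfaceSwitch(tabA);
manager.onSurfaceSwitch(tabB); // user switched surfaces
console.log(tabA.readyState, tabB.readyState); // "ended" "live"
```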
15:50:47 RESOLVED: more discussion is needed on the lifecycle of surface tracks
15:51:06 Topic: -> https://github.com/w3c/mediacapture-main/issues/972 Racy devicechange event design has poor interoperability in Media Capture and Streams
15:51:06 [slide 28]
15:54:13 [slide 29]
15:54:37 Jan-Ivar: this is modeled on the RTC track event
15:54:51 Jan-Ivar: any objection to merging this PR?
15:55:27 Guido: what does "current result from enumerateDevices" mean?
15:55:57 Jan-Ivar: good point, I should rephrase that - it's the devices at the time the event is fired
15:56:19 ... this would be a synchronous equivalent to what enumerateDevices would produce
15:56:44 Guido: I agree with the change, but the language should be clarified
15:57:12 Dom: is there an existing internal slot we could refer to?
15:57:32 Jan-Ivar: there is one, but with too much info in it, although we have an algorithm to filter it
15:57:46 RESOLVED: merge 972 with language clarified on the current device list
15:57:58 Topic: -> https://github.com/w3c/webrtc-pc/ WebRTC API
15:57:58 Subtopic: -> https://github.com/w3c/webrtc-pc/pull/2961 Convert RTCIceCandidatePair dictionary to an interface
15:58:07 [slide 30]
15:58:27 Jan-Ivar: prompted by the ongoing implementation of setCodecPreferences in Firefox
15:59:58 ... is it a good idea to trigger negotiationneeded as needed? if so, what would "as needed" actually encompass?
16:01:38 Present+ SunShin
16:03:05 Harald: when does setCodecPreferences make a difference? when you're in the middle of a negotiation, it will make a difference in the answer; it doesn't affect the local state, it can only change the remote state, which can only happen after negotiation
16:03:49 ... wouldn't it be simpler to just fire negotiationneeded?
16:04:13 Jan-Ivar: there are edge cases when you're not in a stable state and negotiationneeded is fired
16:04:38 ... it sounds like you're agreeing that firing negotiationneeded would be good
16:04:46 Harald: I'm trying to figure out when to fire and when not to fire
16:05:00 ... it could be that we fire it when the list of codecs is different from what is in the remote description
16:05:27 ... we wouldn't fire when setCodecPreferences doesn't change the list (including because the negotiation trims down the list of codec preferences)
16:05:54 ... that would mean we need an internal slot to keep track of the last codec preferences call
16:06:25 Jan-Ivar: probably indeed, if we want to optimize the cases where setCodecPreferences looks like it would make a difference but doesn't
16:07:37 Florent: it's a nice idea to trigger negotiationneeded from sCP, but I'm worried about backwards-compatibility issues
16:07:58 ... it could cause issues if apps get negotiationneeded at unexpected times
16:09:11 ... given the complexity of identifying the cases where it's needed and the backwards-compatibility issues, I'm not sure we can move forward
16:09:38 Jan-Ivar: negotiationneeded is a queued task that can't happen during a negotiation
16:09:52 ... in other words, you would face the same issues if that was handled manually by the app developer
16:10:06 ... although I recognize there may be concerns in the transition
16:11:07 Florent: sCP is already used by a lot of widely deployed applications - I agree this might have been a better design, but it's not clear changing it now is the right trade-off at this point
16:11:42 ... at the moment, negotiationneeded is triggered by a very limited number of API calls; adding it to another API call may break expectations
16:12:10 Jan-Ivar: if you're not using the negotiationneeded event, you wouldn't be affected by this
16:12:25 ... if you're using sCP in remote-answer, neither
16:13:01 Florent: this may be problematic if it happens later, in the middle of a transaction, since apps wouldn't have been built to handle this
16:13:14 ... I'm also worried about the complexity of specifying "as needed"
16:14:10 ... maybe this could be obtained via a different mechanism, e.g. an additional parameter in addTransceiver
16:14:39 Jan-Ivar: thanks - worth documenting these concerns in the GitHub issue
16:14:50 Subtopic: -> https://github.com/w3c/webrtc-pc/issues/2956 receiver.getParameters().codecs seems under-specified
16:14:51 [slide 31]
16:17:00 [slide 32]
16:17:57 [slide 33]
16:19:11 [slide 34]
16:21:10 Harald: the attempt was to make sure that we have the conceptual list of what we can possibly negotiate, and that we could add to this list over time
16:21:15 ... and this had to be per transceiver
16:21:28 ... I missed this particular usage of the list
16:21:38 ... we have to decide what we want to represent
16:22:00 ... if we want to make sure we represent only codecs that we are able to receive at the moment, unimplemented codecs can't be received of course
16:22:20 ... we could do this by making the enabled flag mean "currently willing to receive"
16:22:38 ... i.e. it would have to match the most recently accepted local description
16:23:57 Jan-Ivar: ok, so it sounds like there is something worth re-instantiating from the previous algorithm
16:25:30 Jan-Ivar: these slides would likely apply to sendCodecs as well, but I haven't had the chance to check in detail
16:29:01 Topic: Background segmentation mask
16:29:01 [slide 37]
16:29:54 [slide 38]
16:31:15 -> https://drive.google.com/file/d/1vw8gLSGzdeqM7w1N7B4uolrxqE-8mU5f/view?resourcekey Video of the background mask demo
16:32:25 [slide 39]
16:33:33 Riju: in background mask, the original frame remains intact and the mask gets provided in addition to the original frame
16:33:40 ... both frames are provided in the same stream
16:34:56 ... we expect to put up a PR sometime this week based on this
16:35:11 Elad: this looks very interesting
16:35:45 ... do I understand correctly that the masks get interleaved in the stream?
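[The interleaving under discussion - mask frame first, then the original frame, in the same stream - can be sketched as follows. This is a minimal illustration only: plain objects stand in for VideoFrames, and pairMaskedFrames is a hypothetical helper, not part of the proposal.]

```javascript
// Groups an interleaved sequence of [mask, original, mask, original, ...]
// frames into {mask, original} pairs, mask first per the demo's ordering.
function* pairMaskedFrames(frames) {
  for (let i = 0; i + 1 < frames.length; i += 2) {
    yield { mask: frames[i], original: frames[i + 1] };
  }
}

// Plain objects standing in for VideoFrames, matched by timestamp.
const frames = [
  { kind: "mask", ts: 0 },  { kind: "original", ts: 0 },
  { kind: "mask", ts: 33 }, { kind: "original", ts: 33 },
];
const pairs = [...pairMaskedFrames(frames)];
console.log(pairs.length); // 2
console.log(pairs[0].mask.kind, pairs[0].original.kind); // "mask" "original"
```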
16:36:24 Riju: the driver provides the mask data; the code on slide 39 shows how to operate on it
16:36:50 Eero: the order is first the masked frame, then the original frame
16:37:22 Elad: this could be confusing; could the actual frame be provided with metadata instead of providing it as a different frame?
16:37:38 ... getting all the data at the same time would seem easier
16:38:00 Riju: the synthetic frame was easier for demo purposes, but we could add something like you suggested
16:38:23 ... we got comments on the blur flag that having both the original and the processed one was useful, IIRC
16:40:11 Harald: this reminds me of the discussion of alpha channels and masks, which was very much about how to express the metadata
16:40:48 ... this particular approach raises the question of how you transmit it
16:41:14 ... if this is encoded as metadata, the question is how it gets encoded
16:41:23 ... have you looked into encoding the mask in the alpha channel?
16:41:44 Eero: in Chrome, the GPU doesn't have access to an alpha channel
16:42:16 Jan-Ivar: +1 that the alpha channel intuitively feels like a better place for this
16:42:50 ... to clarify, this isn't a background replacement constraint
16:43:06 Riju: right, the app can do whatever they want with the mask
16:47:24 Bernard: currently we're not doing a great job of supporting the alpha channel - e.g. WebCodecs doesn't support it
16:47:29 ... it's just being added to AV1
16:47:38 ... lots of holes currently
16:47:59 ... I would encourage you to file bugs and spec issues
16:48:15 Riju: as Elad mentioned, this would be mostly for local consumption
16:48:40 Frederik: is the API shape sustainable, e.g. when adding gesture detection, face detection?
16:48:51 ... can we add them all to metadata?
16:49:11 Riju: we've been looking at these other features
16:49:42 Bernard: there were discussions in the Media WG about adding more metadata to VideoFrames and how encoders should react to it
16:49:57 ... they're not preserved in the encoded chunks; they get dropped
16:50:47 Jan-Ivar: part of my comments on face detection was to what extent this needed to be tied to the camera driver, and whether instead this should be exposed in generic media processing work
16:51:23 Riju: background segmentation is a priority because you get 2x or 3x performance improvements
16:51:44 Jan-Ivar: but is there something about masking that makes it worth dealing with as a camera feature?
16:52:03 Riju: this is supported on any camera on Windows or Mac
16:52:37 ... it takes advantage of the locally optimized models available to native code
16:53:52 Harald: what controls what gets masked?
16:53:59 Riju: only background/foreground
16:54:13 RRSAgent, draft minutes
16:54:14 I have made the request to generate https://www.w3.org/2024/04/23-webrtc-minutes.html dom
16:55:00 Riju: if there is rough support, we can start with a PR and iterate on it
16:55:45 Jan-Ivar: my concern is how it relates to generic media processing pipelines
16:56:14 ... background blur was a way to mitigate what was being provided by the platform, and needed to allow for opt-in/opt-out from apps
16:56:28 ... opening up an open-ended area of features would be a concern for us
16:57:38 ... this sounds like something that ought to be part of a generic media processing library
16:58:49 Riju: this provides a primitive that is generally useful across videoconferencing apps - green screen, blur, replacement
16:59:38 Bernard: there was another discussion in the Media WG about media processing
17:01:06 Dom: the tension is between a hardware-acceleration-specific approach vs generic media processing
17:01:29 Riju: the motivation here is the performance boost
17:02:06 Jan-Ivar: no clear interest from us at this point, but this may change based on market interest
17:02:09 RRSAgent, draft minutes
17:02:10 I have made the request to generate https://www.w3.org/2024/04/23-webrtc-minutes.html dom
18:32:03 Zakim has left #webrtc