W3C

– DRAFT –
WebRTC June 2025 meeting

17 June 2025

Attendees

Present
Carine, Dom, Elad, Guido, Harald, Jan-Ivar, KacperWasniowski, PeterThatcher, RichardBarnes, TimP, Youenn
Regrets
-
Chair
Guido, Jan-Ivar, Youenn
Scribe
dom

Meeting minutes

Recording: https://www.youtube.com/watch?v=T66E24lWLoA

Slideset: https://docs.google.com/presentation/d/19txOdIoxN6SWEyeSyvE8mIkwzOSoejjG06xSOBayzZA/edit?slide=id.g2bb12bc23cb_0_0#slide=id.g2bb12bc23cb_0_0 (archived PDF copy)

Screen capture 🎞︎

getDisplayMedia: Distinct “Error” for Cancellation 🎞︎

[Slide 10]

[Slide 11]

[Slide 12]

[Slide 13]

[Slide 14]

[Slide 15]

Jan-Ivar: this seems like a valid use case to solve, if all browsers implement transient activations
… Firefox already returns a "NotFoundError" when hitting an OS limitation
… in an iframe with a policy limitation, it does return NotAllowedError which might indeed be improved
… [NB a mistake with the constraint attribute on slide 14]
… -1 on relying on prototype, would be preferable to have an additional attribute (e.g. boolean "userInitiated")

Youenn: could we use the Permission API to determine this already?
… it tells you if the user persistently denied access - which would be the same situation if denied by iframe policy
… I'm not sure if they need to be distinguished
… I think NotFoundError + persistent denied should be sufficient

Elad: the permission API can't be used for this: it's asynchronous, and it requires asking all the time

Youenn: if the user has put a setting to always deny getDisplayMedia, it will not be "ask" again, it will be systematically "deny"

Elad: I don't think this is possible in any browser at the moment, not even sure it should be exposed
… but that's an edge case compared to the majority of cases
… the asynchronous nature of the permission API doesn't allow tying a rejection to a specific call to getDisplayMedia

Youenn: the same issue exists with getUserMedia - what is specific to getDisplayMedia here?

Elad: getDisplayMedia() will always involve a new prompt - each call has its own state
… getUserMedia can persist in a given session or even across sessions
… likewise, usually a choice on denying camera is unlikely to be changed, whereas canceling a screen share can reflect a temporary decision

Youenn: not seeing a big difference between the two

Harald: you listed 3 sources of failures - that point towards a string rather than a boolean (e.g. "user-denied", "internal-failure", "os-disallowed")

Elad: I tried this with mute reason before, but this wasn't too well received, hence I'm focusing on the narrowest need

Jan-Ivar: +1 to solving the use case; no strong opinion on the approach, but Youenn's point made sense to consider

Elad: the spec doesn't give any suggestion for persistent denial at the moment

Jan-Ivar: the Permission API allows adding a track handler, which could probably solve this

Guido: the normal behavior is "always ask" - right after "deny" it becomes "ask" again

Jan-Ivar: it would have to be set to ask for all situations that aren't user initiated

Elad: that would still make it a lot more complex than integrating it in the error message

Jan-Ivar: but there is value in not shipping new API surface

TimP: not convinced that synchronous handling is needed in the examples you gave (stats, prompt strategy)

Elad: one example: getDisplayMedia called from a button - should that button stay disabled while checking the permission state?

TimP: doesn't feel like a hard requirement
… I think the simplicity argument is more compelling

Harald: I don't see how Youenn's suggestion would work

Youenn: let's explore on github what use cases can or cannot be addressed with the two approaches
… we need to see which situations would be distinguishable based on permission/error type

Guido: synchronicity is needed for correctness checking

RESOLUTION: continue discussion on github issue
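
To illustrate the error-attribute approach discussed above, here is a minimal sketch of how an application might branch on a rejection. It assumes a hypothetical string attribute named `reason` carrying the three example values Harald mentioned; no such attribute exists in any specification today, and the mapping to follow-up actions is purely illustrative.

```typescript
// Hypothetical reason strings, taken from the examples mentioned in the discussion.
// None of these are defined by any spec at the time of these minutes.
type CaptureFailureReason = "user-denied" | "internal-failure" | "os-disallowed";

// Minimal stand-in for a getDisplayMedia() rejection.
// `name` mirrors DOMException.name; `reason` is an assumption for illustration only.
interface CaptureRejection {
  name: string;
  reason?: CaptureFailureReason;
}

// Illustrative follow-up actions an application might take.
type NextStep = "offer-retry" | "show-help" | "report-error";

function nextStepFor(rejection: CaptureRejection): NextStep {
  // With the hypothetical attribute, the decision is tied to this specific call.
  if (rejection.reason === "user-denied") return "offer-retry";
  if (rejection.reason === "os-disallowed") return "show-help";
  if (rejection.reason === "internal-failure") return "report-error";

  // Without it, only the error name is available, and it is ambiguous:
  // NotAllowedError may mean a user cancellation or a policy restriction.
  switch (rejection.name) {
    case "NotFoundError":
      return "show-help";
    case "NotAllowedError":
      return "offer-retry";
    default:
      return "report-error";
  }
}
```

The fallback branch shows the gap motivating the proposal: with the error name alone, a user cancellation and a policy denial lead to the same action.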

Expose capturer/capturee overlap 🎞︎

[Slide 18]

[Slide 19]

[Slide 20]

[Slide 21]

[Slide 22]

[Slide 23]

[Slide 24]

[Slide 25]

Youenn: in the example with the PiP, the Web app would like to know the current situation and what would happen with PiP
… PiP might trigger an overlap of its own

Elad: I'll cover this when describing proposal #2
… but note that PiP is only one of the situations you want to manage
… the first question would be whether PiP has any chance to be useful - which it can't be if there is no overlap at all

[Slide 26]

[Slide 27]

[Slide 28]

Jan-Ivar: leaving aside our current open position on document PiP
… this feels like a good use case
… re proposal 1, could we expose only the percentages instead of the values needed to calculate them?

Elad: the absolute values can be recovered from the percentages

Jan-Ivar: but conversely requires exposing fewer attributes

Elad: I can live with either approach

Jan-Ivar: re initial vs dynamic (slide 28), making it dynamic could lead to the browser fighting with the user

Elad: this would be a readonly value

Jan-Ivar: but there could be a fight via PiP

Elad: there is no such mechanism at the moment for the Web app to control the position of the PiP, and if there was, the problem would exist independently of exposing that dynamic data

Jan-Ivar: I think starting with the value at "opening" time would be safer

Elad: the only use case I can imagine is if the web app wants to offer a better layout based on detecting changes in the dynamic values, but it's arguably hypothetical
… some activity is already observable through the change of the captured window sizes

Jan-Ivar: let's focus on open for now, and discuss if/when to update the values
… I'd support initialPercentage

TimP: I don't see anything problematic in terms of privacy

Youenn: percentage seems fine; I'm wondering whether a hint to the UA would be sufficient
… Could you file an issue which would illustrate how the percentage would be used? sometimes what is overlapping is important
… my primary concern is about managing focus rather than dealing with PiP
… understanding the PiP use cases better would be important

Jan-Ivar: let's distinguish PiP as a remedy vs PiP as a source of overlap

Elad: the MVP for me is not to trigger PiP when disruptive, trigger it when it's clear it would help, and let web app developers explore the in between

RESOLUTION: agreement on the validity of use case, continue discussion on to-be-created github issue
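
As a rough illustration of the percentage discussed above, the following sketch computes how much of a captured surface is covered by the capturing window. It assumes the percentage means the share of the captured surface's area that the capturer overlaps; the slides may define it differently, and the `Rect` type and function names are hypothetical.

```typescript
// Axis-aligned rectangle in a shared screen coordinate space (hypothetical type).
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Area of the intersection of two rectangles, or 0 if they do not intersect.
function overlapArea(a: Rect, b: Rect): number {
  const w = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  return w > 0 && h > 0 ? w * h : 0;
}

// Share of the captured surface covered by the capturer, from 0 to 100.
// Assumption: the percentage is relative to the captured surface's own area.
function overlapPercentage(captured: Rect, capturer: Rect): number {
  const area = captured.width * captured.height;
  if (area <= 0) return 0;
  return (overlapArea(captured, capturer) / area) * 100;
}

// A 100x100 captured window half covered by the capturing window.
const half = overlapPercentage(
  { x: 0, y: 0, width: 100, height: 100 },
  { x: 50, y: 0, width: 100, height: 100 },
); // 50
```

Under this assumption, an application that knows the captured surface's dimensions can multiply the percentage back into an absolute area, which is consistent with Elad's remark that the absolute values can be recovered from the percentages.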

WebRTC: How to find the remote fingerprint? 🎞︎

[Slide 31]

[Slide 32]

[Elad departs]

[Slide 33]

[Slide 34]

[Slide 35]

[Slide 36]

Peter: re slide 32 - does getRemoteCertificates() return an RTCCertificate?

TimP: it returns a blob

Peter: it could be fixed to return something on which we could call getFingerprint(); re c), what's your worry about "being too late"?

TimP: receiving unexpected RTP, earlier than expected; I'm keen to not have a connection up if it's not expected, before any packet exchange. The DTLS handshake is already finished by the time of getRemoteCertificates()

Peter: there are alternatives (although not great): you could set up a send-only RTP transceiver, and switch it to sendrecv after the certificate is verified

TimP: you'd still be talking to someone that is likely hostile - letting this be filtered by the upper layer feels weaker

Peter: but doesn't the existing API already allow achieving this?

TimP: I think you can't; this would protect you against a malicious SCTP ack

Peter: you could wait to negotiate until you've verified the fingerprint

TimP: I'm trying to make a proper API; getFingerprint() was defined for IdP which never happened, I'm trying to make better use of it.

Peter: re slide 36, I like b), but it can only be done if it's fully bundled - which applies to other options

Harald: I think this is at the wrong level - fingerprints are a transport attribute, not a connection attribute. I also don't understand the API here - the fingerprint is what the other end tells you to verify they are who they say, it shouldn't be set by you

TimP: the problem we're trying to solve is persistence: imagine two devices have established trust and stored each other's fingerprints in a trusted context, and want to re-use that trust relationship when exchanging SDP, à la QUIC zero, even in a case of a less trusted signaling mechanism

Youenn: is there a github issue to continue this discussion?

TimP: will file one.

Peter: d) would be sufficient to check the remote certificate is known (without having to manually parse the SDP)
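
For context on the manual SDP parsing that Peter refers to, here is a minimal sketch of what an application has to do today to compare a remote description against a previously stored fingerprint. It relies only on the standard `a=fingerprint:` SDP attribute; the `Fingerprint` type and function names are hypothetical, and this is not one of the API options from the slides.

```typescript
// A certificate fingerprint as carried in SDP (hypothetical helper type).
interface Fingerprint {
  algorithm: string; // e.g. "sha-256"
  value: string; // colon-separated hex, e.g. "AB:CD:..."
}

// Extract every a=fingerprint attribute from an SDP blob.
// Algorithm is normalized to lowercase and the hex value to uppercase.
function fingerprintsFromSdp(sdp: string): Fingerprint[] {
  const result: Fingerprint[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=fingerprint:(\S+)\s+(\S+)$/.exec(line.trim());
    if (!match) continue;
    const [, algorithm, value] = match;
    if (algorithm && value) {
      result.push({ algorithm: algorithm.toLowerCase(), value: value.toUpperCase() });
    }
  }
  return result;
}

// True only if the SDP carries at least one fingerprint and all of them match
// the stored one. An unbundled session may carry several; rejecting on any
// mismatch is a conservative choice made for this sketch.
function sdpMatchesStoredFingerprint(sdp: string, stored: Fingerprint): boolean {
  const found = fingerprintsFromSdp(sdp);
  return (
    found.length > 0 &&
    found.every(
      (f) =>
        f.algorithm === stored.algorithm.toLowerCase() &&
        f.value === stored.value.toUpperCase(),
    )
  );
}
```

An application could run such a check on the remote description before applying it. It confirms only what the signaling channel claims, so it does not replace the check the browser performs during the DTLS handshake.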

WebRTC Encoded Transform 🎞︎

SFrameTransform mode per-packet vs per-frame 🎞︎

[Slide 39]

[Slide 40]

[Slide 41]

[Slide 42]

Jan-Ivar: why would it be difficult to support per-packet with ScriptTransform?

Youenn: ScriptTransform generates frames, which would make it difficult to deal with packets. ScriptTransform could not deal with decryption

Jan-Ivar: ScriptTransform could still be used to deal with frame manipulation on top of SFrameTransform

Youenn: right, that's not the use case I'm discussing as out of scope

[Slide 43]

[Slide 44]

Youenn: there could be an option C, per transceiver

Jan-Ivar: the global option feels a bit artificially limiting; which mode you pick might depend on which SFU you talk to

Harald: if you want to switch from SFrame to JS or the other way around, you need to add new media lines - that argues against using RTCConfiguration, it should be a transceiver parameter.

Youenn: will there be two m-line sections, one using per-packet and another using per-frame? didn't seem very compelling

Harald: the use case I was thinking of is one using SFrame and the other not using it at all

Youenn: that's controlled by setting the transform on a transceiver basis; this is about setting the sframe flavor

Jan-Ivar: it does seem more like a per-transceiver thing; re slide 42, why an interface for the options?

Youenn: this allows the UA to check whether it's a scripttransformoptions and thus switch to a different processing flow, allowing to have the type as a second parameter to the constructor

Jan-Ivar: not sure I like that - we can bikeshed that; the problem is the ambiguity between options and message to the worker - we had that issue with setting codecs as well. There are alternatives we could discuss in the PR.

Youenn: the PR uses a 4th option object as a dictionary

Jan-Ivar: that would be preferable from my perspective; let's discuss in the PR

Youenn: per-transceiver seems fine

Jan-Ivar: Per transform might be better

Youenn: that'll depend on how SDP negotiation happens; last I heard there would be an sframe a-line per-packet or per-frame to the m-section. In that case, both ends need to abide by it. If this is only "use sframe", senders and receivers could do different things

Jan-Ivar: why not Transceiver.sframeTransform = true?

Youenn: that's equivalent to option A… Sounds like more discussion needed, and not clear Option B is getting much support. This will depend on the SDP negotiation for SFrame on which I expect progress this month

[Slide 45]

SFrame cipher suite #256 🎞︎

[Slide 46]

Jan-Ivar: LGTM

Harald: nothing to add beyond my comments on the PR

RESOLUTION: move forward with proposal

SFrame RTCEncodedVideoFrame on receiver side 🎞︎

[Slide 47]

Youenn: it's for the case where decryption is done via scripttransform

Peter: if you decrypt yourself, you could look at the payload format yourself

Youenn: the scripttransform on the receiver side exposes an RTCEncodedVideoFrame whose content is encrypted - what should the type attribute indicate?

Peter: if your job is to decrypt, it's not your job to set this; this sounds like 2 different stages

Youenn: it's required to expose a value here - it could be empty, or another value

Peter: saying "unknown" would be better

Youenn: except if that's provided by an RTP header extension

Peter: likewise for spatialIndex/temporalIndex

Youenn: those are optional; they would only be exposed if the RTP header extension is present; the spec doesn't say how they're set in any case at the moment

Jan-Ivar: If I'm using SFrame and ScriptTransform, on the receiver side, do I get an encrypted or a decrypted frame?

Youenn: the former - it's up to you to decrypt it

Jan-Ivar: that's the old model we have today; why not have the browser deal with the decryption?

Youenn: you could do that

Jan-Ivar: that'd seem cleaner; you could imagine specifying separately JS transform and sframe transform since they're orthogonal

Youenn: When you're doing SFrame at the frame level and are already using ScriptTransform, being able to do the decryption as part of ScriptTransform feels simpler; for instance, you could apply different algorithms on the SFrame configuration itself based on inspection of the frame - this couldn't be done with separate transforms. Separating the two processing steps can already be done with the current API.

Jan-Ivar: worth clarifying the use cases on github

TimP: re frame type, it should be "encrypted" or "sframe"

Youenn: is there an RTCEncodedAudioFrame.type? We would also want the audio frame to know it is encrypted, so maybe this needs to be a different attribute to signal encryption that would apply to both audio and video frames

Jan-Ivar: what is the use case for ScriptTransforming an encrypted frame?

Youenn: that's the current way of doing things; ideally, for ease of migration, we should allow apps to continue using ScriptTransform using native decryption, via the SFrameTransform stream writable/readable.

Summary of resolutions

  1. continue discussion on github issue
  2. agreement on the validity of use case, continue discussion on to-be-created github issue
  3. move forward with proposal
Minutes manually created (not a transcript), formatted by scribe.perl version 235 (Thu Sep 26 22:53:03 2024 UTC).