W3C

– DRAFT –
WebRTC WG March 2022 call

15 March 2022

Attendees

Present
BenWagner, Bernard, Dom, Eero, Elad, Guido, Harald, Jan-Ivar, JohannesKron, Riju, Tuukka, Varun, Youenn
Regrets
caribou
Chair
Bernard, Harald, Jan-Ivar
Scribe
dom

Meeting minutes

Recording: https://youtu.be/GM56xH-jF8Q

Slideset: https://lists.w3.org/Archives/Public/www-archive/2022Mar/att-0004/WEBRTCWG-2022-03-15.pdf

[Slide 1]

[Slide 3]

TPAC 2022 🎞︎

[Slide 8]

Dom: TPAC is being considered as a hybrid event this year - please indicate whether you think you might join such an event physically?

[from online poll: 3 Yes, 4 No, 4 don't know]

WebRTC-SVC 🎞︎

[Slide 11]

Bernard: issue #68 relates to behavior of getParameters() - unclear about re-negotiation (vs before/after negotiation)
… PR #69 has proposed text that clarifies that we're talking about **initial** negotiation (before/after)
… if you re-negotiate, you'll still get the currently configured scalability mode
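
A minimal sketch of the behavior PR #69 describes, assuming pc is an already-negotiated RTCPeerConnection whose video sender was configured with scalabilityMode "L1T2" (the names and values are illustrative):

    // Per PR #69: getParameters() keeps reporting the currently configured mode;
    // a renegotiation only changes the reported value once the new codec is in use.
    const [sender] = pc.getSenders();
    const params = sender.getParameters();
    console.log(params.encodings[0].scalabilityMode); // "L1T2" while VP8 is still being sent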

Harald: wfm

Jan-Ivar: is this correct? getParameters() algos are very explicit about what you get based e.g. on localDescription
… some come from pending, others from current

Bernard: let's say you change preference order for codecs, and you renegotiate (e.g. from VP8 with L1T2 to H264 that doesn't support scalability) - what happens then?
… at what point do things change?

JIB: even without setCodecPreferences, getParameters() may return different values depending on whether re-negotiation is happening or not
… e.g. if you have a local offer, it might affect the results

Bernard: looking at the VP8→H264 case, what should happen?

HTA: as long as you're sending VP8, you should get L1T2 back
… when you switch to H264, you get L1T1 back

Bernard: that's what I would expect and what the text tries to convey
… nothing changes until the new codec starts being used
… JIB, could you write up your concern in #68 ?

RESOLUTION: Continue discussion in issue #68

WebRTC-Extensions 🎞︎

[Slide 16]

Bernard: Fippo gathered a list of hardware acceleration bugs that have been encountered
… which raises the question of allowing developers to disable hardware acceleration
… WebCodecs provides an enum to hint whether or not to use hardware acceleration

[Slide 17]

Bernard: I looked into 2 approaches: setParameters, setCodecPreferences
… the first one doesn't really work since the envelope of changes may not include hardware alternatives
… it also only makes sense if a mid-stream switch is necessary
… the second approach goes through re-negotiation via setCodecPreferences()
… How would you discover this?
… Media capabilities may need amendment https://github.com/w3c/media-capabilities/issues/185
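
A hedged sketch of the second approach (the codec filter is purely illustrative; setCodecPreferences does not itself distinguish hardware from software implementations, so the app would have to decide which codec to avoid):

    // Re-order/filter codec preferences on a transceiver and then renegotiate.
    const transceiver = pc.getTransceivers()[0];
    const codecs = RTCRtpReceiver.getCapabilities("video")?.codecs ?? [];
    // Hypothetical predicate: drop the codec whose hardware path is misbehaving.
    transceiver.setCodecPreferences(codecs.filter(c => c.mimeType !== "video/H264"));
    // ...then createOffer()/setLocalDescription() etc. to renegotiate.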

Dom: should this be managed by the browser rather than left for developers to detect and manage?

Bernard: this would be useful *when* developers detect a problem so that they don't need to wait for browsers to react to it

Florent: there are also cases where a decoder interacts badly with a specific encoder

JIB: for setParameters, there are read-only properties
… putting it in codec capability (which is returned to developers) means doubling the number of entries

Bernard: you may not have to return it from Capability

JIB: but then it doesn't fit very well with a notion of codec preference
… we've also moved fingerprinting surface to media capabilities
… I wouldn't want to reintroduce concerns without good reasons
… it doesn't seem necessary to include that info if it is tackled as a preference

Johannes: I understand this as developers wanting to disable hardware encoding as a short-term patch until the browser gets it fixed
… it sounds like a recovery mode, more than a capability
… also agree it's hard for developers to use it, but that it would have its uses

Harald: routing around bugs is for specific implementations of the codec, which requires they know the specific implementation
… does that point toward media capability as the right way to go?

Bernard: that's where you'd find out if it's "smooth", "power efficient", "supported"

Harald: if it's X's hardware encoder with software version Y, that may be the information you need to know whether or not to use it
… not sure that fits with the Media Capabilities model

Johannes: it would seem challenging
… Also, the bugs that have been identified seem to be browser-specific
… there are block-lists for this or that hardware; it may be worth investigating the possibility to move towards dynamic blocklists from browsers

Riju: we share the GPU blocklist defined in Chrome with our driver team to get them fixed platform by platform

Harald: no clear resolution, but some suggested paths worth exploring

[Slide 18]

Harald: issue #99 about RTP header extension
… if an implementation supports an extension, it doesn't show up in Capabilities at the moment
… is this problematic? if not, no change needed; if it is, we may need to surface that it exists but is disabled by default
… you can get the information by inspecting the offer, so this may not be needed

Bernard: it's a convenience in the use case; there will be scenarios where you don't want to set it on by default

Dom: is anyone asking for it?

JIB: if this is for debugging, looking at the SDP is fine; if it's to control running code, it should be an API

Harald: the most likely example would be if transport-cc is not supported, I fall back to another congestion control
… I think it can be shimmed by creating an offer and dancing with a throw-away peer connection
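
A hedged sketch of the shim Harald mentions, checking for the transport-cc header extension in a throw-away offer (the substring match is illustrative):

    // Inside an async function: create a throw-away peer connection, generate an
    // offer, and look for the transport-wide congestion control extension in the SDP.
    const pc = new RTCPeerConnection();
    pc.addTransceiver("video");
    const offer = await pc.createOffer();
    const supportsTransportCc = offer.sdp.includes("transport-wide-cc");
    pc.close();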

Dom: not hearing a lot of pushback, nor a lot of demand either; maybe wait until we have more demand if it can be designed in a way that is backwards compatible

Harald: yes, it can be done later in a backwards compatible way

RESOLUTION: close #99 with no change

Avoiding the “Hall of Mirrors” 🎞︎

[Slide 21]

[Slide 22]

[Slide 23]

[Slide 24]

Elad: the proposal would be to add a new member to DisplayMediaStreamConstraints à la includeCurrentTab to hint to the UA whether or not to include the current tab

[Slide 25]

Elad: influencing the user decision in picking display surfaces has security implications
… but I argue that in this case, it is not problematic: the risks of selection are of two kinds:
… - the attacker influences the user to share a surface under the attacker's control
… - the attacker influences the user to share a tab with sensitive content (e.g. their bank account)
… but excluding-self is orthogonal to these

[Slide 26]

Elad: if we agree this is worth solving, the question becomes what the default value should be
… if we make it optional, this could be left as a UA dependent default

[Slide 27]

Elad: a potential expansion would cover additional surfaces (e.g. screen)

JIB: #209 has the detailed discussion - what is the proposal we're reviewing?

Elad: I suggest adding a dictionary member (either include or exclude) that serves as a hint, with no change to current behavior
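
A hedged sketch of what the hint could look like (the member name follows the includeCurrentTab spelling from the slides; both the name and the default were still open at this point):

    // Hint only: the UA remains free to let the user pick the current tab anyway.
    const stream = await navigator.mediaDevices.getDisplayMedia({
      video: true,
      includeCurrentTab: false   // proposed dictionary member, not a shipped API
    });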

JIB: I like this API, but would want the default to be "false"
… I don't think this is so much about the hall of mirrors - that's a symptom the UA could address either way
… the real issue is that in many cases, self-capture is NOT the intent
… long term, self-capture would be getViewportMedia
… some sites want self-capture to be part of the selection - they would need to opt in
… also, TAG guidance is that undefined maps to false

Elad: re default true - agree
… re alternative approaches Youenn suggested, I don't think it works for current tab (it would work for current screen)
… I agree with your characterization that the root cause is if you're not ready to self capture
… I suggest we don't take getViewportMedia into account since there is little visibility in terms of its adoption
… I think we should avoid breaking apps, even if only briefly

JIB: I think we should keep that separate from what implementations do
… here the question is what's the most frequent case; most sites wouldn't want it

Elad: lots of self-capture happening every year; assume a lot of it is not accidental

Youenn: re security, the current spec doesn't deal much with tab capture in that regard
… we're bringing more and more control to what UAs will show, and that means we need to strengthen the guidance to UAs
… Chrome has some mitigations in this space that might serve as a starting point
… If this is a hint, this is fine
… Some implementations might remove entirely the possibility to select the tab, that's something new
… hints allow pushing users towards the more meaningful choice, but leave the user in charge of the final choice
… re hall of mirrors - I don't think this is solving it
… some native apps have implemented current-app blurring to solve the issue
… cropping would be another way to solve the issue
… if it's only a hint, it's fine; but if it brings a required behavior, I don't think we should go there
… also want more security guidance
… and keep issue open on addressing other aspects of hall of mirrors

Elad: could you help with the security guidance?

Youenn: Ideally would like to get the work that Chrome has done

Dom: +1 on a hint; if boolean is problematic, we can use an enum to avoid the default value fallback

Elad: happy to help with getting the security considerations with guidance from Youenn on what he wants to see

Harald: hearing overall support to continue in that direction, towards a hint

Display Surface Hints 🎞︎

[Slide 30]

Elad: similar to previous issue, but distinct
… some apps want to hint to the UA that it is well geared toward a particular display surface type
… I think there is agreement that this is worth supporting
… but we've struggled to find an approach that everyone likes
… I'm suggesting a compromise based on the discussion which would be:
… - use constraints as a mechanism
… - make it a hint with UA dependent behavior
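
A minimal sketch of that compromise, with displaySurface used as an ideal-style hint (UA-dependent behavior; "exact" would be rejected or ignored per the discussion below):

    // The app hints that it is geared toward window capture; the user still chooses.
    const stream = await navigator.mediaDevices.getDisplayMedia({
      video: { displaySurface: "window" }
    });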

Youenn: hint is fine; it could be a constraint as a model, but with an improved simpler WebIDL surface

Elad: reject on "exact"?

Youenn: "exact" would be ignored

Harald: -1 on integrating this in the proposal - I hate irregularities

JIB: +1 to Harald; "exact" is already a type error in getDisplayMedia which already narrows down the constraint mechanism
… agree with reusing displaySurface
… I have concerns with an app asking for a monitor - I don't think we should provide this level of control
… I proposed text to steer away users from monitor capture

Elad: this is a hint - UAs can decide not to follow it

Dom: with a hint, UAs can provide the best experience they can
… not sure the SHOULD would achieve much if the main target isn't interested in SHOULD

Youenn: the SHOULD would be useful for new implementers

Elad: there is merit to that
… non-normative language pointing to the risk would be good

JIB: the SHOULD already allows for this; given Chrome has a good motivation, this feels like an exact reason why SHOULD would be used

RESOLUTION: modulo discussion on SHOULD guidance, we adopt the displaySurface constraint proposal to manage Surface Hints

getViewportMedia update 🎞︎

[Slide 31]

JIB: FYI, there is a PR up to describe getViewportMedia, which I hope to bring to a call for adoption soon

Viewport Capture Unofficial Draft

Youenn: we probably need a different set of constraints than the ones for getDisplayMedia
… re audio, we need to think about whether to include system level audio or just current tab

JIB: currently restricted to current tab

Harald: if it can't be isolated, no audio should be captured

JIB: there are pending PRs that I hope will be merged before we start the call for adoption

Elad: the general intent of this work is awesome; looking forward to seeing it implemented
… that said, until we see it adopted, we need to be careful in basing our decisions on this work, or consider relaxing some of the restrictions

Youenn: has there been any outreach to web developers re x-origin isolation?

Elad: the feedback I got from developers was this was a blocker for them

Bernard: ditto

JIB: I agree this is taking the long view here
… hence the flexibility we're showing on getDisplayMedia
… re using different constraints, we can change it when the need shows up

Youenn: displaySurface would be one case where this is needed

MediaCapture Extensions proposals 🎞︎

[Slide 34]

Riju: this is follow up from a conversation that started at TPAC

[Slide 35]

Riju: PR #48 is allowing in-browser face detection
… when we showed this last time, the feedback included:
… - tie it to VideoFrame rather than MediaStreamTrack, which the PR reflects
… - future-proofing the bounding box approach - this is addressed with the Contour described in the PR, with a way for the developer to request something other than the default 4 points
… - another request was to have a face mesh - which is now exposed as an additional property (although there is no native support for it today)
… - face expression was raised as a concern, so we removed it
… - making face detection work with transform stream

[Slide 36]

Riju: we've put up an example to show how they would work together
… we've done early testing that shows improved power consumption - more specific numbers to be shared soon
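
A hedged sketch of how such a combination could look (detectedFaces is the attribute proposed in PR #48, not a shipped API; MediaStreamTrackProcessor comes from mediacapture-transform; the contour/blur usage is illustrative):

    const [track] = (await navigator.mediaDevices.getUserMedia({ video: true })).getVideoTracks();
    const processor = new MediaStreamTrackProcessor({ track });
    const detector = new TransformStream({
      transform(frame, controller) {
        for (const face of frame.detectedFaces ?? []) {
          // e.g. feed the face contour / bounding box into a background-blur step
        }
        controller.enqueue(frame);
      }
    });
    // processor.readable.pipeThrough(detector).pipeTo(generator.writable);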

Youenn: good to expose it on VideoFrame; but would also be good to expose in requestVideoFrame callback e.g. for use with canvas
… re using "exact" constraints - I would expect "exact" not to be allowed in this
… There seem to be switches to give hints to cameras - do we need several switches to allow per-algo enabling, or could we have a single "face detection" switch?

Riju: e.g. "is face detection supported"?

Youenn: why multiple switches if a single one is good enough, leaving it to the Web app to deal with what they're obtaining

Riju: for instance, contour points would allow future support for additional more detailed contours

Youenn: since the camera is doing the work, not clear we need to give more hints to the driver

Riju: contour/mesh were added for extensibility

Youenn: maybe reduce to what's implementable, while future-proofing it

Bernard: high level questions about the API surface
… I understand the supported constraints & capabilities are used to provide the basic parameters for the algorithm in the driver
… videoFrame.detectedFaces is already done by the driver
… as opposed to having a promise-based method to which the parameters would be given
… if your camera driver doesn't support it, you wouldn't have it

Riju: going through promises, this would impact performance and redo work the driver has already done
… OS level face analysis would duplicate computation already done in the driver

JIB: so, it's a camera API - only available to sources that are camera?

Riju: right

JIB: my concern is that there is another effort in the WICG, the shape detection API - how does it relate to it?
… would be unfortunate to have it to deal with face detection differently depending on the source

Riju: shape detection works on images, can be called multiple times
… it has no face tracking, which helps detect faces across frames efficiently
… face detection is based on OS level face analysis, which duplicates the driver work and is less power efficient / robust
… we started from that API in our effort in this space - we feel this new approach gives much better results
… FaceDetector is only supported in Windows atm; the work has stopped afaict

Bernard: so you're saying the WICG work is not going ahead?

Riju: I can check the status with Reilly (but my team was the one behind the implementation)

Harald: I share some of JIB's worries
… we have functions today that depend on high quality face detection e.g. background blur
… I'm worried about having these different interfaces to solve the same problem
… esp if some interfaces end up proprietary
… if the proprietary interfaces provide much higher quality than what standard interfaces can provide
… hence my pushback on making contours and meshes available in the API
… I'm still not happy with the design that seems to be totally focused on basing this on hardware/driver resources rather than a representation API
… it has a bit of that flavor, but there is still a lot of a sense of configuring the camera
… also I'm surprised this only gives a 50% factor over MediaPipe
… but in general, this feels like a major new way of treating media information
… I'd like to see this brought as a proposal, not as a set of API patches
… with an explainer, use cases, examples - what we typically put together before agreeing to take something up

Riju: no need to configure the driver
… the PR includes examples

Harald: I'm thinking of what applications would use this, what problems it would solve

Dom: what an explainer would cover

Riju: I can come up with that

Dom: happy to help with the logistics of making it happen

Riju: is the question about whether this is useful or not?

Harald: yes

Bernard: or rather whether it handles all the use cases people want

Jan-Ivar: e.g. tying this with camera may become obsolete or too limiting
… having an API that isn't as strongly tied to hardware acceleration

Harald: I'd like to have a better understanding of which apps want a rectangle around a face

Youenn: encoders actually optimize around faces if such metadata are available
… +1 on defining API that can obtain metadata from the hardware or a TransformStream

JIB: among other things, having less hardware-dependency allows UAs to step in

[Slide 37]

Riju: backgroundBlur has more platform API support than replacement

Youenn: iOS has the ability to switch on & off background blur, fully outside of the Web app, and fully dynamic
… the Web app could not unblur if the user has set this up at the OS level
… (but not vice versa)
… that situation is not well supported by constraints
… we may need a way to surface whether a constraint *can* be changed (and to signal when it can no longer be changed)

JIB: this is a case where constraints work very well - the app states its ideal
… background blur is popular, would be good to support it
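
A hedged sketch of the ideal-based pattern Jan-Ivar describes (backgroundBlur as a constraint is the mediacapture-extensions proposal under discussion, not a shipped API):

    const stream = await navigator.mediaDevices.getUserMedia({
      video: { backgroundBlur: { ideal: true } }
    });
    const [track] = stream.getVideoTracks();
    console.log(track.getSettings().backgroundBlur); // what the UA/platform actually applied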

Youenn: I don't think "ideal" suffices to expose the situation
… re backgroundBlur level - it's not settable on iOS; are there platforms that would benefit from it?

Riju: no platform API supports this, but some software models have that parameter
… but I understand some platforms are working towards making it settable

Youenn: but without knowing the algorithm, setting a particular value would be hard for developers
… we may need a boolean instead

JIB: part of the question is whether this needs to be controllable by apps vs the UA

Harald: in audio, we've encountered cases where manipulating settings that are supposed to be useful in the driver actually creates issues
… e.g. double echo cancellation control
… the most important control we have is to turn platform effects off; the second was to detect the situation to ask the user to turn it off

Riju: on the last three proposals (lighting correction, face framing, eye gaze correction), any sense of interest?
… the goal is to give options to developers on whether or not to use hardware capabilities

Bernard: should we get back to this in April?

JIB: from Mozilla's perspective, we don't have strong interest in this approach given possible interop cross-OS issues
… we don't see any urgency

Harald: for face detection, we have a pretty solid way forward via the explainer with use cases and justifications to support adoption
… some of these additional camera controls may fit into that new document
… if we accept constraints as a way to control camera drivers, grouping them together makes sense

JIB: but adding individual constraints is something we've used mediacapture-extensions for in the past

Youenn: the complexity of a boolean constraint is very different from the more complex Face API detection

Dom: I'll work with the chairs to agree on a clearer path forward then :)

Summary of resolutions

  1. Continue discussion in issue #68
  2. close #99 with no change
  3. modulo discussion on SHOULD guidance, we adopt the displaySurface constraint proposal to manage Surface Hints
Minutes manually created (not a transcript), formatted by scribe.perl version repo-links-187 (Sat Jan 8 20:22:22 2022 UTC).