WebRTC June 2022 VI – 23 June 2022

Meeting minutes

dom: we need a new charter, and to discuss issue #70 (migrating work to another group)

youenn: would this require rechartering Media WG?

dom: not sure, possibly

aboba: we have a joint meeting with Media WG at TPAC

dom: but we can't wait until then to come to a conclusion

youenn: migrating items between charters is always challenging from our side, in its administrative aspects

bernard: do we need meetings with the media wg to discuss this?

dom: if we want to migrate the work, yes - but we first need to decide whether we want to do that

bernard: the fact that we have dependencies on videoframe makes it an interesting question to consider

elad: how busy is the Media WG? would it increase or decrease the pace of our work?

harald: that would have to be something to discuss with the chairs

dom: bringing it to the Media WG would bring more of a media perspective, whereas this group comes more from a transmission perspective

harald: so the chairs will discuss this with the Media WG chairs

dom: no objection from the group in exploring this?

[none]

Region Capture Issues 🎞︎

Issue #17: the case for making CropTarget Sync 🎞︎

[Slide 13]

Jan-Ivar: this is about the cropping API - issue #17 is about whether it should be sync vs async
… long discussion on the issue - I'll be presenting arguments why it should be sync
… The TAG design principles include encouragement to use sync APIs when appropriate, with some exceptions (incl cross-process communications)

[Slide 14]

Jan-Ivar: the API currently in the spec (that doesn't have consensus) is async
… so you have to `await CropTarget.fromElement(element)`
… I'm proposing it doesn't need to be async,
… the purpose of the operation is associating an identifier with an element
… as currently specified, it can't fail
… the goal of the crop target is to share it over postMessage across documents
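For readers comparing the two shapes under discussion, here is a minimal sketch. Only the `await CropTarget.fromElement(element)` call shape comes from the discussion above; `FakeElement`, `CropTargetModel`, `fromElementSync`, the numeric token, and the reuse of a token for a repeated element are all hypothetical stand-ins, not the browser API.

```typescript
// Hypothetical model, not the browser API: minting only associates an
// identifier with an element, which is the operation described above.
class FakeElement {
  name: string;
  constructor(name: string) {
    this.name = name;
  }
}

class CropTargetModel {
  private static nextToken = 1;
  // Assumption of this sketch: one token per element, held weakly.
  private static readonly byElement = new WeakMap<FakeElement, CropTargetModel>();

  readonly token: number;
  private constructor(token: number) {
    this.token = token;
  }

  // Sync shape (the proposal): returns the target directly.
  static fromElementSync(element: FakeElement): CropTargetModel {
    let target = CropTargetModel.byElement.get(element);
    if (target === undefined) {
      target = new CropTargetModel(CropTargetModel.nextToken++);
      CropTargetModel.byElement.set(element, target);
    }
    return target;
  }

  // Async shape (the current draft): the same association wrapped in a
  // promise, so callers need `await` and the type allows for rejection.
  static fromElement(element: FakeElement): Promise<CropTargetModel> {
    return Promise.resolve(CropTargetModel.fromElementSync(element));
  }
}

const element = new FakeElement("main-content");
const syncTarget = CropTargetModel.fromElementSync(element);
const pendingTarget = CropTargetModel.fromElement(element);
```

In this model the async variant does no extra work; the difference a caller sees is only the promise wrapper.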

[Slide 15]

Jan-Ivar: multiple actions need to happen before we're cropping anything
… cropTarget can be done ahead of time or later
… if it gets postMessaged to the top-level document, that document can offer the user the option to crop to that target
… it's only at the end of this process that there is a clear intent to crop
… UA can optimize this by running some of the underlying tasks early, but that creates risks in case this doesn't go through
… the complexity of that situation shouldn't be exposed to developers

[Slide 16]

Jan-Ivar: #48 is asking to allow failure from the minting process due to resource exhaustion
… which seems linked to optimizations implemented in Chrome
… The issue is that it allows a random document to exhaust cropping resources
… since the API is not gated by permissions - creating DOS risks
… and may expose apps that don't deal well with that failure
… if resource allocation is moved to the cropTo step, this risk disappears
… similar to mediaSource.getHandle
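The resource-exhaustion argument can be sketched with a toy model. It assumes a fixed pool of cropping resources; `CropResourcePool`, the capacity of 2, and the `mintEager`, `mintLazy`, and `cropToLazy` helpers are hypothetical and do not describe any browser's implementation.

```typescript
// Hypothetical model of the two allocation strategies; not any browser's code.
class CropResourcePool {
  private readonly holders = new Set<number>();
  private readonly capacity: number;
  constructor(capacity: number) {
    this.capacity = capacity;
  }
  // Returns false when every resource is already held by another token.
  tryAcquire(token: number): boolean {
    if (this.holders.has(token)) return true;
    if (this.holders.size >= this.capacity) return false;
    this.holders.add(token);
    return true;
  }
  get inUse(): number {
    return this.holders.size;
  }
}

let nextToken = 1;

// Strategy 1: reserve a resource when the token is minted.
// Minting can fail, and any document that mints can drain the pool.
function mintEager(pool: CropResourcePool): number | null {
  const token = nextToken++;
  return pool.tryAcquire(token) ? token : null;
}

// Strategy 2: minting is free; the resource is reserved only at cropTo time.
function mintLazy(): number {
  return nextToken++;
}
function cropToLazy(pool: CropResourcePool, token: number): boolean {
  return pool.tryAcquire(token);
}

// A document mints three tokens it never crops to.
const eagerPool = new CropResourcePool(2);
const eagerResults = [0, 1, 2].map(() => mintEager(eagerPool));

// With lazy allocation the same minting leaves the pool untouched, so a
// later cropTo from the capturing document still succeeds.
const lazyPool = new CropResourcePool(2);
const unusedTokens = [mintLazy(), mintLazy(), mintLazy()];
const wantedToken = mintLazy();
const cropped = cropToLazy(lazyPool, wantedToken);
```

Under these assumptions the third eager mint fails, while the lazy pool holds exactly one resource after the single `cropToLazy` call.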

[Slide 17]

Jan-Ivar: I believe my proposed API is faster, simpler and still optimizable
… I don't think we need to block on inter-process communication

[Slide 18]

jan-ivar: failing of optimization shouldn't imply a failure of the operation
… optimizations that influence API design decisions tend to generate further issues
… because optimizations create new side effects
… there are no developer benefits to this API being async
… and there are general developer costs to async APIs - they create pre-emption points which risk data races
… multiplying failure points for rare error cases is a footgun
… and async is slower as I've shown
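The pre-emption point mentioned above can be illustrated with a small sketch. `mintAsync`, `shareSelection`, and the `selectedElement` state are hypothetical names; the point is only that other code may run between an `await` and its continuation.

```typescript
// Hypothetical application state: which element the user wants to share.
let selectedElement = "slides";
const observed: string[] = [];

// Stand-in for an async minting call; it resolves immediately.
async function mintAsync(elementName: string): Promise<string> {
  return `token-for-${elementName}`;
}

async function shareSelection(): Promise<void> {
  const chosen = selectedElement;
  // Pre-emption point: the function suspends here and other code may run.
  const token = await mintAsync(chosen);
  // By now `selectedElement` may no longer match `chosen`.
  observed.push(`${token} minted, selection is now ${selectedElement}`);
}

const pending = shareSelection();
// This assignment runs while shareSelection is suspended at its `await`.
selectedElement = "speaker-notes";
```

The continuation after `await` has not run when the last line executes; when it does run, it sees the updated selection even though the token was minted for the old one.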

elad: the TAG offers design principles, but also meta principles - not sure it's productive to discuss other browsers implementation
… the fingerprinting risk is reasonable, but the spec doesn't force implementations to surface this
… a promise doesn't have to fail in your implementation either

jan-ivar: the API should be designed on principles
… Mozilla is here to push a better API for the Web

Issue #17: the case for making CropTarget Async 🎞︎

[Slide 20]

Elad: this API is used in production through origin trial
… we know the API works and makes developers and customers happy
… we've learned a lot of lessons by implementing and shipping this

[Slide 21]

Elad: the question is whether minting a token needs to be sync or async
… I'll explain the Chrome decision and that it doesn't negatively impact anyone else

[Slide 22]

Elad: our implementation keeps track of which apps have a crop target
… once it's postMessage'd, this allows Chrome to optimize the time-to-cropping when the cropTo method is actually invoked
… that makes it simpler and more performant, in particular in case of CPU congestion in the capturing document

[Slide 23]

[Slide 24]

Elad: Chrome needs this to be async, whereas Mozilla prefers it to be sync
… what's the harm of having an async API?

[Slide 25]

Elad: Priority of constituencies goes through users, developers, implementers, spec writers

[Slide 26]

Elad: from a user perspective, seems mostly orthogonal
… from a developer perspective - what we've heard is that they don't care as it doesn't change much in their huge existing codebase
… implementors - as an implementor, we see this as an imperative for us

[Slide 27]

Elad: what is the negative impact then? Is this theoretical purity?
… given that IPC is involved, async makes sense

[Slide 28]

Elad: the TAG actually insists that theoretical purity doesn't trump implementers' needs

[Slide 29]

Elad: the TAG discussed this API
… they were satisfied with the API
… they haven't seen much ergonomic gain for sync
… they also highlighted that interop should drive the work of the group

[Slide 30]

Elad: I've shown that constituencies either don't care about sync vs async or, in the case of at least one implementor, need async
… also, it's easy to go from async to sync, while the other way is difficult

#17 discussion 🎞︎

Bernard: by making it async, you're saying that this implies that when you have a cropTarget, you know it's ready to use
… we've had situations in the WebRTC WG where we've found that sync APIs needed in fact to be async

Elad: cf slide 22

Youenn: I'm surprised you're saying this is a MUST - I thought this was implementable as a sync API, but that you favored the trade-off that async allows
… as I've pointed out, this trade-off creates a fingerprinting and interop issue - so a footgun
… so I thought both were implementable but sync would be more complex in Chrome

Elad: I don't agree with that characterization

Youenn: I'm surprised that both approaches are claiming to be faster - both can't be true

Youenn: usually sync APIs are more efficient, except when they create a blocking situation, which I'm not hearing is the case here
… a sync API helps developers, at least a bit

Elad: resource allocation design decision is orthogonal to sync vs async
… I don't think the resource mitigation limits fingerprinting risks can be mitigated through a per-iframe limit
… in terms of performance, what needs to be fast is cropTo - anything before that, the user doesn't notice

youenn: sync cropTarget minting is faster, but you're saying this is not a relevant optimization compared to cropTo

elad: anything that comes before cropTo is irrelevant to user perceived performance (and mostly negligible in any case)

TimP: as a developer, I have a mild preference to keep it sync as it's easier to use
… I don't see a developer benefit to making it async
… there may be a user benefit in terms of the crop transition UX
… that may convince me of the value if it can be shown
… from the developer perspective, managing interesting failures on obtaining target would also be convincing, but I haven't heard that

Elad: it's a particular choice of trade-offs that requires async, and async isn't going to harm other constituencies

TimP: developers will suffer

Elad: I claim it's negligible

TimP: I don't agree - this can generate non-trivial changes, although it's certainly doable

Jan-Ivar: re no downside - nobody is claiming that Chrome optimizations aren't useful
… I don't understand why these optimizations can't be done with a sync API
… why can't you implement a fallback?
… other downsides include fingerprinting, DOS, proliferation of async, failure management for developers
… why does it need to be async?

Harald: code complexity itself is a risk; this particular implementation has been used and tested
… anyone that depends on cropTo and doesn't notice it failing has an issue
… it's time to stop this discussion - we have seen that Chrome claims that a sync implementation would make it significantly more complex
… the impact on developers is irritating but not fatal
… we have not seen compelling arguments that we need to change what has been proposed
… I don't see consensus for change, there is an implementation of the current spec - I suggest we declare the API to be async and move on

Jan-Ivar: I hear ease of implementation - complexity in the API trumps complexity of implementation
… because of the priority of constituencies

Harald: a more complex API that does the right thing is better than a simple API that does the wrong thing

Elad: I'm not seeing consensus, but I don't think there is any remaining benefit in discussing this

dom: we could either run a vote, or wait for more implementation experience

TimP: I would be inclined to say that getting other implementations is most important
… sync would be more elegant, but it doesn't look like we're going to get that

Issue #18: Is CropTarget name too generic? 🎞︎

[Slide 33]

Youenn: "crop" as a term isn't used too broadly so far, so probably OK
… not sure that "target" helps

[Slide 34]

[Slide 35]

Youenn: "object whose sole purpose is to be given cropTo" - may not be limited to elements
… the term CropRegion might work better to represent
… and would align with the spec name ("region capture")
… thoughts?

TimP: the name should reflect that it is a token, not a region or a target itself
… if it's opaque, it should say so

Youenn: it may not remain opaque

TimP: a region sounds like something you could do math on, e.g. calculate its surface area
… which you can't

elad: similar reservation - a region feels like something with coordinates; also, a cropTarget isn't static, it can move, which makes CropRegion more misleading

harald: this is bikeshedding; I don't see benefit in changing it

Bernard: +1 that CropRegion is confusing; I prefer the current name

Youenn: any interest in clarifying the definition (i.e. whether it's a reference to an element or something more generic)
… I guess that can be done later

RESOLUTION: close #18 without changing the name of CropTarget

Issue #63: Cropping non-self-capture tracks 🎞︎

[Slide 38]

Elad: currently you can crop only to current tab
… I suggest we allow cropping arbitrary tabs

[Slide 39]

[Slide 40]

Jan-Ivar: a concern is that it might allow sites to censor themselves when captured

Elad: the capturing app can ignore crop targets
… in fact, cropping automatically would make no sense

Jan-Ivar: I want the group to be aware of the risk

Elad: but is it likely?

Jan-Ivar: I won't predict the future; I don't see other issues with this

harald: please write this up in the github issue; I don't understand the risk

TimP: I support allowing this beyond self-capture

Elad: can we agree that by next meeting we agree to expand this unless a compelling case is made against it?

Jan-Ivar: imagine a bank wanting to redact what would get shared over screen capture
… can write this up by next meeting

Harald: I support this too

dom: me too

Making CropTargets stringifiable 🎞︎

[Slide 41]

Elad: making croptargets stringifiable would help e.g. for communication over capture handle
… not sure I understand the risks

Youenn: a string makes it much more difficult to garbage-collect a croptarget

Jan-Ivar: +1 to Youenn
… I haven't heard use cases that justify this
… having GCable croptarget is good to keep

youenn: if a croptarget comes with resource allocation, being able to end these allocations is a good thing

elad: can't you just associate the string to the element, and when gc'ing the croptarget, remove that association?

TimP: this removes the opacity that I relied on in my previous support
… stringifying makes it harder to reason about the safety of this

elad: the only difference between the two is equality

jan-ivar: there are differences in garbage collection

TimP: from a developer perspective, there is a difference
… there is a very limited number of paths to get a cropTarget
… once it's a string, many more paths can be used
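The garbage-collection and "number of paths" concerns can be sketched with a toy model. `OpaqueToken`, `TargetElement`, the two registries, and the `crop-N` string format are hypothetical; this is not how any browser stores crop targets.

```typescript
// Hypothetical model; not how any browser stores crop targets.
class OpaqueToken {}

class TargetElement {
  label: string;
  constructor(label: string) {
    this.label = label;
  }
}

// Opaque tokens: the registry is keyed weakly by the token object. Once no
// script holds the token, its entry is unreachable and can be collected.
const opaqueRegistry = new WeakMap<OpaqueToken, TargetElement>();

function mintOpaque(element: TargetElement): OpaqueToken {
  const token = new OpaqueToken();
  opaqueRegistry.set(token, element);
  return token;
}

// String tokens: a string can be copied, stored, or rebuilt from parts, so
// the registry cannot tell when the last copy is gone. Entries stay until
// they are removed explicitly.
const stringRegistry = new Map<string, TargetElement>();
let nextId = 0;

function mintString(element: TargetElement): string {
  const token = `crop-${nextId++}`;
  stringRegistry.set(token, element);
  return token;
}

const slide = new TargetElement("slide");
const opaque = mintOpaque(slide);
const text = mintString(slide);

// A string rebuilt elsewhere still resolves to the element...
const rebuilt = "crop-" + String(0);
const viaRebuilt = stringRegistry.get(rebuilt);
// ...whereas a different object never matches an opaque token.
const viaOtherObject = opaqueRegistry.get(new OpaqueToken());
```

In this model the string form is reachable by any code that can produce the same characters, while the opaque form is reachable only through the object that was minted.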

Youenn: can you bring that argument to the github?

Elad: would like to get resolution to this; we can skip the predictable errors

harald: part of the issue seems to be about reconstructing a croptarget from a string (not about stringifying per se)

Face Detection 🎞︎

Face Detection explainer

[Slide 45]

[Slide 46]

riju: this shows our proposal helps reduce power consumption - almost 2x at 15fps compared to using TF.js

youenn: the 1st column has no face detection, and the 2nd is doing face detection in the driver?

tuukka: right

youenn: in some OSes, the two might be equal if the camera is doing it systematically

riju: indeed, in Android this might be the case

[Slide 47]

Riju: having a persistent id is very important for face tracking
… re keeping the probability optional - sure, but all platforms provide this
… a developer may use this to decide to apply further processing (e.g. funny hat)
… but open to making it optional
… Re VideoFrameMetadata, you suggest coordination on WebCodecs?

Youenn: yes, we need to engage with them to find the right construct

Riju: re API surface, we started with a minimal set, increased it based on feedback
… but we can re-reduce it for the MVP
… e.g. remove the mesh parts
… the contour was Harald's request
… face landmarks are usually important in post-processing, think we should keep in MVP

youenn: my point was in terms of priorities & focus
… e.g. for the next 6 months

riju: removing mesh, but keep landmarks

harald: I still have a problem with the API
… the power consumption improvement is nice-to-have
… attachment to the videoframe is nice
… but still unclear what to use it for
… the explainer doesn't help much with it
… what can I do with the output of that API?
… what would the MVP be viable for?

riju: e.g. landmarks would be used for post-processing, e.g. eye-gaze correction
… the platforms only give bounding boxes at this stage

Harald: I would like to see a more complete use case

Youenn: one use case is that some encoders optimize based on specific bounding boxes

Bernard: +1 to youenn - segmentation helps with encoding

Jan-Ivar: the explainer talks about attaching to videoframe, but the API is still anchored in mediastreamtrack (e.g. for capabilities)
… how would this API be usable on non-camera sources?
… e.g. on recorded videos

riju: we couldn't use the same platform APIs to get the power consumption benefits

youenn: I think cameras should be our primary target
… for recorded videos, you could add this through a media capture transform

jan-ivar: adding these metadata through the transform?

youenn: yes

eero: our proposal has support for setting custom metadata in videoframe

harald: the constraints are used to instruct the driver to produce the info, which is then attached to the videoframe
… that makes sense to me
… but writing up the enhanced encoding use case would help make it compelling
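As a rough illustration of the encoding use case, the sketch below attaches face bounding boxes to a frame description and derives a per-block region-of-interest map that an encoder could consume. Every name here (`BoundingBox`, `DetectedFace`, `FrameWithMetadata`, `attachFaces`, `regionOfInterestMap`) and the 16-pixel block size are hypothetical; none of it is taken from the explainer or from WebCodecs.

```typescript
// Hypothetical shapes; not the explainer's API and not WebCodecs.
interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
}
interface DetectedFace {
  id: number;
  box: BoundingBox;
}
interface FrameWithMetadata {
  width: number;
  height: number;
  faces: DetectedFace[];
}

// Stand-in for the step where detection results are attached to a frame.
function attachFaces(width: number, height: number, faces: DetectedFace[]): FrameWithMetadata {
  return { width, height, faces };
}

// Marks each blockSize x blockSize block that overlaps any face box, so an
// encoder could spend more bits on those blocks.
function regionOfInterestMap(frame: FrameWithMetadata, blockSize: number): boolean[][] {
  const cols = Math.ceil(frame.width / blockSize);
  const rows = Math.ceil(frame.height / blockSize);
  const map: boolean[][] = [];
  for (let r = 0; r < rows; r++) {
    const row: boolean[] = [];
    for (let c = 0; c < cols; c++) {
      const bx = c * blockSize;
      const by = r * blockSize;
      row.push(
        frame.faces.some(
          ({ box }) =>
            bx < box.x + box.width &&
            bx + blockSize > box.x &&
            by < box.y + box.height &&
            by + blockSize > box.y,
        ),
      );
    }
    map.push(row);
  }
  return map;
}

const frame = attachFaces(64, 32, [{ id: 7, box: { x: 16, y: 0, width: 16, height: 16 } }]);
const roi = regionOfInterestMap(frame, 16);
```

For the 64x32 frame above, only the second block of the first row overlaps the face box.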

riju: any support for prototyping this?

harald: yes - we need to find compelling applications

riju: also heard support from youenn

jan-ivar: I still have some concerns whether this would reveal differences across platforms
… would suggest raising an issue on Mozilla's standards positions

bernard: useful to prototype; the metadata discussion should be brought to the WebCodecs folks

riju: will follow up accordingly

– DRAFT –

Attendees