WebRTC November 2021 Virtual interim

24 November 2021


Present: Anssi, BenW, Bernard, Carine, Dom, Eero, Elad, Florent, Guido, Harald, Jan-Ivar, PatrickRockhill, Riju, TimPanton, TonyHerre, Tuukka, Youenn
Chairs: Bernard, Harald, Jan-Ivar
Scribes: Dom, Florent

Meeting minutes

Slideset: https://lists.w3.org/Archives/Public/www-archive/2021Nov/att-0005/WEBRTCWG-2021-11-24.pdf

Media Capture Transform

[Jan-Ivar gives updates on discussion with WHATWG re Streams]

Harald: we have issues that need solving, and we're fairly confident we'll be able to solve them.

[Harald presents how to evolve his and Jib's proposals]

Youenn: Is audio part of the presentation?

Harald: Audio is out, we have no consensus

Youenn: want stronger guarantees that we'll be able to solve the issues

Dom: We need to be convinced we have a solution for those problems.

TimPanton: What happens if you don't close it (a stream)?

Harald: They will hang around until garbage collection.

TimPanton: We need to be clear about the expectations to avoid surprise browser slowdowns.

Jib: We could add notes in the document about life cycle issues.

Youenn: As long as we do not have a good API, we cannot make progress. I feel more positive now.

Jib: I feel confident, we have 2 solutions for this problem.

Bernard: Looking at samples, we have not done all the required cleanup, especially related to error conditions.

Bernard: AI: First step is to make the changes to the draft, announce it on the list, and make a call for adoption when it's ready.
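The stream lifecycle concerns raised above (closing frames, cleanup on error paths) can be sketched roughly as follows. This is a minimal illustration assuming the MediaStreamTrackProcessor / MediaStreamTrackGenerator shape from the mediacapture-transform proposal; `transformFrame` is a hypothetical callback that consumes an input VideoFrame and returns a new output VideoFrame.

```javascript
// Minimal sketch of explicit pipeline cleanup, assuming the
// MediaStreamTrackProcessor / MediaStreamTrackGenerator names from the
// mediacapture-transform proposal. transformFrame is a hypothetical callback
// that consumes an input VideoFrame and returns a new output VideoFrame.
async function runTransform(track, transformFrame) {
  const processor = new MediaStreamTrackProcessor({ track });
  const generator = new MediaStreamTrackGenerator({ kind: 'video' });
  const reader = processor.readable.getReader();
  const writer = generator.writable.getWriter();
  try {
    for (;;) {
      const { value: frame, done } = await reader.read();
      if (done) break;
      try {
        await writer.write(transformFrame(frame));
      } finally {
        // Close each input frame promptly: unclosed frames linger until
        // garbage collection and can hold on to scarce media resources.
        frame.close();
      }
    }
  } finally {
    // Runs even if transformFrame or write() threw, so nothing leaks.
    reader.releaseLock();
    writer.releaseLock();
  }
  return generator;
}
```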

Region Capture

Elad presents the rationale for the Region Capture API and shows code examples.

Youenn: You can use message channels to transfer, it shouldn't be a problem. I think you can use decorations instead of crop ids

Elad: we want it to work with getDisplayMedia (GDM), getViewportMedia (GVM) and extensions. It should work with multiple tabs from different domains. It seems this is a faster way to ship.

Dom: What happens if the targeted element is no longer visible?

Elad: There's a spec draft with more details on my Github. It suggests that the track is muted when the element is no longer visible and unmuted when it’s again visible.

Youenn: Adding features like cropping to GDM tab capture can be risky, we want to move away from it. If we add to GDM people might stick to it while GVM is safer. The more we provide benefits in GVM and the less there is in GDM, the safer the web is.

Jib: We should avoid the word "id" or exposing an ID in an API as it might invite a lot of scrutiny. Suggests adding an interface instead. It helps with garbage collection.

Jib: Using applyConstraints() instead might be better; it could help with the timing and avoid presenting an uncropped frame.

Elad: Constraints might be more difficult implementation-wise and have weaker guarantees on when they apply.

Jib: Constraints may be underspecified and we could improve applyConstraints() as well.

TimPanton: I think we should do this quickly. I like the idea of an opaque token better than transferring a stream. Not fond of constraints and in favor of the API.

Elad: What is a problem with string IDs?

TimPanton: It's about avoiding a conversation regarding potential risks. Advises making an interface and adding strings later.

Elad: Open to that idea.

Dom: UUIDs can indicate returning users. It's better to avoid the discussion with the Privacy Interest Group.

Youenn: Supports token but not constraints.

Harald: Interfaces can be serialisable, and if so, they can be strings in disguise. UUIDs are fine if they are short-lived, but there is a privacy risk.

[Discussion about if we want a call for adoption for the document or if a call for review is possible.]

Jib: Suggests we can have the CFA after the document is updated to use an opaque token.

Dom: Will Elad offer to be an editor?

Elad: Yes

Dom: AI: Need to confirm this with the chairs. Updating the document with the token and then CFA.
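The opaque-token direction discussed above could look roughly like this. `CropTarget.fromElement()` and `track.cropTo()` are assumed names used for illustration only, not an adopted API at the time of this meeting.

```javascript
// Hedged sketch of the opaque-token direction discussed above.
// CropTarget.fromElement() and track.cropTo() are assumed names.

// In the target document: mint a serializable token for the element to crop
// to, which can then be posted (e.g. via a message channel) to the capturer.
async function mintCropTarget(element) {
  const target = await CropTarget.fromElement(element);
  return target;
}

// In the capturing document: apply the token to a tab-capture track.
async function captureRegion(cropTarget) {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const [track] = stream.getVideoTracks();
  await track.cropTo(cropTarget); // frames are now cropped to the element's box
  return track;
}
```

An interface-typed token keeps the identifier opaque (unlike a string or UUID) and ties its lifetime to garbage collection, which is the privacy property Jib and Dom raise above.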

WebRTC NV Use Cases

[ Slide 30 ]

Bernard: lots of systemic changes since our FPWD of NV use cases, in particular due to the pandemic and technological advances

[ Slide 31 ]

Bernard: NV use cases include 3 use cases from the original use cases that could be improved
… not much API support for these - essentially webrtc-ice and webrtc-svc
… lots of references to ICE
… the first 2 related use cases aren't necessarily the most requested
… not clear that ICE is such a key enabler either
… the video conferencing use case might be approached differently, with webcodecs rather than by extending WebRTC

[ Slide 32 ]

Bernard: in terms of new use cases
… the only API that applies is RTCDataChannel in workers
… no progress on the others, despite the use cases themselves having found broad adoption

[ Slide 33 ]

Bernard: for use cases 3.6-8, quite a bit of activity in the WG - mediacapture transform, machine learning
… not so much for 3.9
… still discussions around intersection with ML
… no specifics on face/body tracking API
… nothing on data exchange in service workers

[ Slide 34 ]

Bernard: does the doc reflect current industry priorities? current state of tech (incl webcodecs)?
… none of the use cases have all their requirements met by API proposals
… only 4 have at least one proposal
… substantial gaps in requirements around data transport
… The document doesn't talk about the long term architecture
… current doc seems to build on the view of "extending" webrtc, but this may need to evolve based e.g. on the webcodecs view of the world

TimP: in terms of environment change, a lot more is happening on mobile than there used to be and than we would have expected

Bernard: true; incl for game streaming

TimP: also for e.g. small social / family gathering
… also a question worth addressing is the P2P architecture
… in reality, most of the WebRTC usage isn't P2P
… WebTransport is not designed for P2P
… maybe WebRTC is P2P and WT is the centralized architecture?

Youenn: some of the new technologies like WebTransport are providing more flexibility
… where WebRTC is more of an integrated system
… the need for metadata synchronization (e.g. in metaverse) seems very relevant, needs more detailed anchoring in our APIs
… RTP header extensions might be exposed for non-browser handling
… also +1 to TimP's point about mobile browsers - getting more interop across iOS / android would be good
… e.g. handling of muting in case of priority audio capture in mobile (e.g. in mobile phone)
… in general, providing more consistency across os/browsers would be good

harald: there is a lot of stuff that uses webrtc outside the browser - e.g. recently in a ring doorbell
… being able to contact these (pseudo-)webrtc endpoints from browsers is important
… our use case driven approach hasn't worked too well to track what's going on
… in terms of long term architecture, it's hard to manage - trade-off between consistency and fitness

anssi: ML WG chair here - we're very interested in making sure our WebNN API helps with the webrtc use cases

<anssik> https://github.com/webmachinelearning/webnn/issues/226

Integration with real-time video processing #226

anssi: we've started developing a prototype based on background blurring

Jan-Ivar: apart from funny hats, most of the use cases focus on PeerConnection
… we've seen lots of use cases around media capture (e.g. screen sharing)
… Mozilla takes use cases pretty seriously - some use cases are marked as not having consensus, would prefer we call for consensus or remove them

Face Detection API

[ Slide 36 ]

Riju: I hope to present proposals that help address 4 of the use cases that were presented
… today focusing on face detection, following the related breakout at TPAC last month

WebRTC Intelligent Collaboration TPAC 2021 breakout

Riju: we have an updated proposal for what an API might look like

Face detection proposal

[ Slide 37 ]

Riju: developers could request a specific number of points for the contour
… a face mesh is unlikely to be available in the short term, but it is documented in the API for the sake of completeness
… the proposal includes a set of expressions that can be obtained from drivers without DNN
… again, we can decide to remove items

Harald: contours are an improvement over squares or rectangles - particularly needed for e.g. background blur
… I worry about relying on what's available from drivers
… instead, I would like us to approach this based on what's available in the frames of an MST, no matter how it was added
… likewise, I would like to be able to add that data to frames when I'm a producer
… the API should allow consumption, production and even refinement of annotations attached to a track
… e.g. a transform could improve the rough contour identified by the driver to bring an improved annotation downstream
… the shape of the API has the right amount of metadata, but it shouldn't be described as a one-way consumption API

Riju: I'll try to show a proposal for background blur early January
… what we're trying to provide here is computation-free processing, because it already happens in the driver

TimP: +1 to Harald in terms of enabling successive refinements, on top of the compute-free results
… I'm also nervous about the expression enums
… the others are reasonably factual, while expressions are more subjective

riju: is the concern about restricting the list? blink and smile are available across platforms

TimP: my concern is that the detection could be wrong, in particular for some subgroups
… given the level of subjectiveness

Riju: note taken

Jan-Ivar: what was the previous agreement on this topic? what is the question you're bringing to the WG?

Riju: last time I presented a proposal on top of image capture, it was suggested to bring to mediacapture-extensions
… also there was a request of making it more generic
… the goal would be to bring it to the mediacapture extension specs

jan-ivar: there would need to be a process on whether to adopt or not this API
… mediacapture-extensions sounds like a good place for a future proposal
… I'm not entirely sure how to deal with this for now

Bernard: this is interesting; I have concerns with emotion analysis in terms of accuracy
… in terms of how this would be used - it's a method on MediaStreamTrack, but it would have to be designed to work with Media Capture Transform
… I would see it as a TransformStream to be used e.g. for background blur
… the information would be used to execute the blur faster
… the provided information (e.g. the contour) is meant to help process something in the GPU buffer
… when the information itself might be CPU-side

riju: the performance on ChromeOS/Windows based on CPU doesn't depend on GPU memory
… I think it's giving good results

bernard: in terms of API shape, having this on mediastreamtrack feels wrong - I want it to work on a videoframe

Youenn: similar feedback to Bernard - exposing driver data is a decision at the mediastreamtrack level, but the data should not be on MST - it should be synchronized with videoframes
… either by getting it from the frame, or getting it at the same time as a video frame
… that's the kind of model that would make sense to me
… Given that the idea is to expose driver info, it's good to be as specific as possible
… we should separate driver-specific metadata from more general approaches
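The frame-synchronized model Bernard and Youenn describe might look roughly like this. `frame.metadata().faces` and the per-face bounding-box shape are hypothetical, and `toPixelBox` is an illustrative helper assuming normalized [0..1] coordinates.

```javascript
// Hypothetical: face metadata delivered with each VideoFrame (synchronized
// per frame), rather than via a method on MediaStreamTrack.
function drawFaceBoxes(frame, canvasCtx) {
  const { faces = [] } = frame.metadata?.() ?? {};
  canvasCtx.drawImage(frame, 0, 0);
  for (const face of faces) {
    // Assumed shape: one normalized bounding box per detected face.
    const box = toPixelBox(face.boundingBox, frame.displayWidth, frame.displayHeight);
    canvasCtx.strokeRect(box.x, box.y, box.width, box.height);
  }
  frame.close(); // release the frame once consumed
}

// Illustrative helper: convert a normalized [0..1] box to pixel coordinates.
function toPixelBox(box, frameWidth, frameHeight) {
  return {
    x: Math.round(box.x * frameWidth),
    y: Math.round(box.y * frameHeight),
    width: Math.round(box.width * frameWidth),
    height: Math.round(box.height * frameHeight),
  };
}
```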

Bernard: next steps?

Riju: could show some demos with performance numbers

dom: heard consistent feedback to anchor this in VideoFrame
… defined in WebCodecs

Youenn: let's have the discussion in mediacapture-extensions and identify an architecture there

Face Detection. #289

[now at https://github.com/w3c/mediacapture-extensions/issues/44 ]

Minutes manually created (not a transcript), formatted by scribe.perl version 185 (Thu Dec 2 18:51:55 2021 UTC).