W3C

– DRAFT –
WebRTC March 26 2024 meeting

26 March 2024

Attendees

Present
Bernard, Carine, Dom, Elad, Florent, Guido, Jan-Ivar, PeterThatcher, PhilipEliasson, ShridharMajali, SunShin, TimP, TonyHerre, TovePetersson, Youenn
Regrets
Harald
Chair
Bernard, Jan-Ivar
Scribe
dom

Meeting minutes

Recording: https://youtu.be/kkcW7cxFfwc

Slideset: https://lists.w3.org/Archives/Public/www-archive/2024Mar/att-0005/WEBRTCWG-2024-03-26.pdf

WebRTC API 🎞︎

Alternative storage for RTCCertificates needed 🎞︎

[Slide 11]

[Slide 12]

[Slide 13]

[Slide 14]

[Slide 15]

Youenn: a possible way would be to extract the information and store it in the cloud so that the Web page can access it on reload; but that doesn't help when you're not connected
… I would prefer not to expose key material to JavaScript; it's not clear this would help with this use case anyway
… in terms of privacy, allowing it to be kept for a longer lifetime harms privacy

Jan-Ivar: +1 to NOT exposing key material to JS - this shouldn't be a conclusion from the E2E

Dom: seems hard to reconcile persistent storage with privacy protection

Youenn: Chrome has a persistent storage API that may help
… but it's not implemented in Safari

StorageManager.persist() API

Dom: +1 on exploring this rather than try to design a new solution in this space from scratch
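A minimal sketch of using the StorageManager.persist() API mentioned above to keep site data (e.g. IndexedDB entries holding an RTCCertificate) out of best-effort eviction. The helper name and the injectable `storage` parameter are our own; in a browser you would pass `navigator.storage`:

```javascript
// Hedged sketch, not a spec'd pattern: ask the browser to make the
// origin's storage bucket durable so stored certificates survive
// storage pressure. `storage` is a StorageManager-like object.
async function ensurePersistentStorage(storage) {
  // persisted() reports whether the bucket is already durable.
  if (await storage.persisted()) return true;
  // persist() may prompt the user; resolves to a boolean grant.
  return storage.persist();
}
```

In a page this would be called as `ensurePersistentStorage(navigator.storage)`; as noted above, persist() is not implemented in Safari, so the result must be treated as advisory.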

Should web applications be aware of reaction effects added by OS to camera feeds? #118 🎞︎

[Slide 18]

Youenn: some OSes allow users to insert reactions into captured videos based on user gestures
… this isn't under the control of the app or the browser

[Slide 19]

[Slide 20]

Youenn: Jan-Ivar made a proposal in PR #141

Elad: great solution for what is a significant problem
… looking forward to having it

Youenn: there is an API in current macOS that allows checking whether reactions are on or off
… it's not currently possible to disable/enable reactions at this point, but this is under consideration

Elad: enabling/disabling is what we're most looking forward to

Youenn: the user is in control of enabling/disabling, so apps could still guide the user toward that as a first step
… this proposal is compatible with both levels of flexibility

Elad: practically speaking, I'm skeptical teleconferencing apps would try and guide users, so really hoping we get to the 2nd level

Jan-Ivar: I'm supportive too; very similar to background blur
… adding this API removes an obstacle towards making it available with possible read/write capabilities in the future

RESOLUTION: Proceed with proposal based on PR #141, with name bikeshedding left to editors

Media Capture Specifications 🎞︎

Clarify each source is responsible for specifying mute/unmute/ended and constraints behavior 🎞︎

[Slide 24]

Jan-Ivar: we agreed on a resolution to #984 at our last meeting - that requires some spec cleanup in places where MediaStreamTrack sources get defined

Jan-Ivar: I opened a batch of issues to match:

Review mute/unmute/ended and constraints on tracks from getDisplayMedia()

Review mute/unmute/ended and constraints on RTCRtpReceiver's track

Review mute/unmute/ended and constraints on new VideoTrackGenerator().track

Review mute/unmute/ended and constraints on tracks from element.captureStream()

Review mute/unmute/ended and constraints on tracks from canvas.captureStream()

Review mute/unmute/ended and constraints on track in audioContext.createMediaStreamDestination().stream

[Slide 26]

Elad: +1 to closing mediacapture-screen-share#298

Youenn: +1 too

RESOLUTION: close mediacapture-screen-share#298

[Slide 27]

Youenn: muting tracks on stopped transceivers sounds like a WebKit bug, indeed; is there a matching WPT test?
… re unmuting ahead of RTP - waiting for the RTP packet could create a race where a frame gets sent while you still think you're muted; how valuable is it to wait for the RTP packet?
… since there isn't interop on this yet, there may be room to change/simplify

Jan-Ivar: I'm not aware that this has created an issue for FF
… an event that fires on the main thread will potentially be put on a busy queue
… but it may be worth filing an issue to reconsider the currently spec'd behavior, ideally with some measurements/experiments to back the question
… or we leave webrtc-pc#2915 open?

Youenn: a separate issue sounds good
… any input from Chromium on crbug 941740?

Guido: that's behavior inherited from old language; we need to look at it to check web compat, but this shouldn't block progress on the spec
… I agree with the spec clarification

Youenn: if this isn't web compatible, this would be useful feedback

RESOLUTION: close webrtc-pc#2942 as reviewed and webrtc-pc#2915, open an issue on timing of unmuting on RTP reception

[Slide 28]

RESOLUTION: close mediacapture-transform#109

[Slide 29]

Youenn: AFAIK, this is only implemented in Chrome; we probably should start from what's implemented
… when content is tainted, it cannot be untainted except by restarting the streaming (ending the track)

Jan-Ivar: FF has an implementation (prefixed as mozCaptureStream)

Youenn: then that would be worth integrating in our analysis as well

[Slide 30]

Jan-Ivar: mediacapture-fromelement#99 is a dup of mediacapture-fromelement#82

Youenn: +1; we should look at origin-clean - I think a tainted canvas stays tainted, but I could be wrong
… otherwise, we should consider ending the track

Guido: in Chrome, the muting behavior is based on whether we're sending frames

Jan-Ivar: what happens when calling captureStream(0) in that situation?

Guido: it will be muted too

Jan-Ivar: we should clarify since I don't think there is interop on this atm

Guido: +1 on aligning within web compat
… I don't envision big challenges

RESOLUTION: no implementation-defined behavior for mediacapture-fromelement#82; close mediacapture-fromelement#99

Youenn: we should file tests for this
… Safari is not firing events afaict
… I'm still not seeing how a tainted canvas can be cleaned - this could be a different issue

Jan-Ivar: +1 on new clarified issues

[Slide 31]

Jan-Ivar: this is up to the WebAudio WG, but is there any additional advice we would like to give them?

Youenn: it makes sense; I wonder about muting when suspending a context - not sure there is a demand for it

Jan-Ivar: maybe an ended event in case of a terminal action on the audiocontext?
… in any case, please chime in on the webaudio issue if interested WebAudio/web-audio-api#2571

mimeType ambiguity: what does "video/webm;codecs=vp8" mean? 🎞︎

[Slide 32]

Youenn: +1 to align the spec with Chrome
… the audio codec might depend on the container

RESOLUTION: Align spec with Chrome
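To illustrate the ambiguity discussed above: "video/webm;codecs=vp8" names only a video codec, so the string alone doesn't say whether audio is excluded or left to a container default - hence the resolution to align on one interpretation. A small parser (the helper name is ours, not from any spec) showing what the string actually encodes:

```javascript
// Illustrative sketch: split a MediaRecorder-style mimeType into its
// container and the explicitly listed codecs. Anything not listed
// (e.g. an audio codec) is exactly the ambiguous part.
function parseMimeType(mimeType) {
  const [container, ...params] = mimeType.split(";").map((s) => s.trim());
  const codecsParam = params.find((p) => p.toLowerCase().startsWith("codecs="));
  const codecs = codecsParam
    ? codecsParam
        .slice("codecs=".length)
        .replace(/^"|"$/g, "") // the codecs value may be quoted
        .split(",")
        .map((s) => s.trim())
        .filter(Boolean)
    : [];
  return { container, codecs };
}
```

For example, `parseMimeType("video/webm;codecs=vp8")` yields `{ container: "video/webm", codecs: ["vp8"] }` - a video codec only, with the audio behavior left implicit.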

RTCRtpEncodedSource 🎞︎

[Slide 35]

[TimP joins]

[Slide 36]

Guido: this is the result of iteration on GitHub, now matching the pattern of encoded transform, with only a writable part

[Slide 37]

[Slide 38]

[Slide 39]

Guido: having this wouldn't preclude a packet-based API if useful for other use cases

[Slide 40]

TimP: I like this; some worry that it's not obvious that it's the same frame from multiple sources
… not sure how the metadata gets preserved in the routing

Guido: re metadata, this is under control of the overall application - this doesn't have to be solved at the level of that spec

TimP: right, but I wonder if there is a question about the timeline of exposing metadata in the frame

Tony: the header extension dependency descriptor should help

TimP: but will that be available in the worker?

Guido: right - but this can be considered separately

Youenn: the API shape is fine, the use case as well; there may be some refinement on names
… in encoded transform, we're making it hard to trigger errors
… here there are many error triggering opportunities, which will need to be signalled in a timely manner to the Web app
… this is basically modeling an encoder - we should do that modeling work early on

Guido: Harald's model with bandwidth congestion etc can help think about these questions

Youenn: for instance, we should list all the error cases we can think of, and determine how they get handled (signal to the app, drop the data, ignore)

Guido: +1

Bernard: so this is an alternative to addTrack/replaceTrack - does that imply additional work where other parts of the system assume it's a track but it's not, e.g. stats?

Guido: since this is very similar to a track, it's probably a matter of having clarifications in the text
… from the point of view of the system, it's very similar to a MediaStreamTrack - we need to identify the parts where they differ (e.g. it's already encoded)
… the initial proposal I made had identified a few monkey patches, but they weren't substantial

Bernard: would track events pass through? are there equivalent to ended events? how would replaceTrack work since it depends on the state of the track?

Guido: the track signals shouldn't be much of a problem

Youenn: re impact on stats, this would have been more difficult with the initial proposal; with this new API shape, the UA can determine that stats are no-ops

Jan-Ivar: thanks for explaining the use case; it looks ambitious, but I like the small step approach
… it does raise a number of questions at the model level as we just discussed
… I like the API shape
… re the constructor for RTCEncodedVideoFrame - is there a copy of the ArrayBuffer when constructing a new frame?

Guido: there should be a copy (although it could be optimized with copy-on-write)

Jan-Ivar: other potential sources, e.g. WebCodecs?

Guido: there could be, yes
… this would require better integrating encoded chunks into RTCEncodedFrames

Jan-Ivar: in the fan-out/fan-in example - if one of the nodes has specific encoding capabilities, this may create limitations

Guido: this is an app decision

Jan-Ivar: how would you deal with simulcast layers?

Guido: that's part of the error modeling we need to describe with appropriate signals

Jan-Ivar: this looks promising to me

Philip: do you have an example of what kind of metadata would be passed?

Guido: e.g. consistent frame ids
… this is up to the application

Tony: the RTCEncodedVideoFrameMetadata dictionary exposes the frame id via the header extension

Jan-Ivar: we have to figure out how a sender without a track works, how setParameters works, ...

Youenn: we should definitely rely on WebCodecs

Guido: I'll explore creating an encoded frame from encoded chunks

Peter: getting these constructors is a fantastic idea
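As Guido notes above, metadata handling and source selection in the fan-out/fan-in example are app decisions, not part of the proposed spec. A minimal sketch of the kind of app-side selection logic involved - all names here are hypothetical and deliberately independent of the proposed RTCRtpEncodedSource shape:

```javascript
// Hypothetical app-side fan-in selection: encoded frames arrive from
// several upstream sources, and only frames from the currently active
// source are forwarded downstream. The frame's `sourceId` field is an
// illustrative app-level tag, not spec'd metadata.
function makeFanInSelector(initialSourceId) {
  let active = initialSourceId;
  return {
    // Switch the forwarded source, e.g. on an active-speaker change.
    setActive(sourceId) {
      active = sourceId;
    },
    // Returns true when the frame should be forwarded downstream.
    accept(frame) {
      return frame.sourceId === active;
    },
  };
}
```

The error and simulcast questions raised above sit exactly at this boundary: the app decides which frames flow, while the platform would need to signal when a forwarded frame cannot be sent.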

WebRTC Extended Use Cases 🎞︎

PR #118 Bandwidth feedback Speed (Configurable RTCP transmission interval) 🎞︎

[Slide 42]

[Slide 43]

[Slide 44]

[Slide 45]

TimP: I understand the need, but I'm bothered by milliseconds; it feels like it should be tied to the frame interval rather than an absolute time
… I like the general idea

Bernard: I'm not sure I understand the need for a control for L4S in the WebRTC API

Youenn: I had the same question - if the UA knows L4S is in use, it can just do it
… it feels like this would be something the UA would want to enable in any case when it's available

Sun: we think different apps may have different interval configurations
… video recovery in loss conditions may also need different parameters

Bernard: RFC 8888 talks about the overhead of the feedback channel

Jan-Ivar: game streaming sounds like a good use case to support, but I'm not sure about making some of these under the control of the app vs the UA
… maybe the requirement can be phrased in a way that avoids imposing a particular model

Bernard: the goal is to avoid building queues, reduce congestion
… I agree a more general statement would work better

Jan-Ivar: I'm hearing the current configuration of UAs isn't optimized for game streaming - the question is whether a different configuration would work better for all use cases or only some

Bernard: part of the challenge is that it is under deployment at the moment

Joachim: at Meta, we have also found the need for these APIs
… RTCP being restricted to 5% of bandwidth ties our hand
… in the game streaming use case, it would be useful to remove that restriction

Bernard: connections tend to be so asymmetric though

Peter: not limiting RTCP for feedback from congestion control would make sense

Bernard: there can be a huge amount of feedback since the messages are large

Peter: but you're only doing this to figure out how much you can send - you can allocate more of that for the feedback with a lower estimate
… this is part of the congestion control mechanism - it's a different category from the rest of RTCP

TimP: that illustrates why a hint would be useful - there are use cases where this would be detrimental
… e.g. in videoconference, you want to optimize the bandwidth for audio, not feedback
… in game streaming, this may be a very different trade-off

Peter: I think it makes sense to let the app choose the trade-off between the amount of bandwidth and the responsiveness of feedback messages

Bernard: I like Tim's idea of a hint that lets the UA decide
… there is also discussion on the jitter buffer target
… I wonder if there is a set of hints that we could describe that covers the needs, without making them settings

Jan-Ivar: the current N48 req is very specific - maybe it could say the app "must be able to lower the timing…" or have some influence. Would that work?

Sun: yes

PR #129 video decoding recovery after packet loss 🎞︎

[Slide 46]

Bernard: video recovery would be good - it's more complicated in the conferencing case; there are a variety of proposals to improve this

[Slide 47]

[Slide 48]

Peter: it'd be great to have support for LTR frames or support for control of the temporal layer in general
… I'd be curious about how much we want to make this a WebRTC encoder API vs a WebCodecs API, or both
… but dependency control is an important API to have

Bernard: if it was on WebCodecs, it would bubble up to us through some of the changes we're discussing
… I'd suggest generalizing the requirement to recovery reference control or additional recovery mechanisms

Summary of resolutions

  1. Proceed with proposal based on PR #141, with name bikeshedding left to editors
  2. close mediacapture-screen-share#298
  3. close webrtc-pc#2942 as reviewed and webrtc-pc#2915, open an issue on timing of unmuting on RTP reception
  4. close mediacapture-transform#109
  5. no implementation-defined behavior for mediacapture-fromelement#82; close mediacapture-fromelement#99
  6. Align spec with Chrome
Minutes manually created (not a transcript), formatted by scribe.perl version 222 (Sat Jul 22 21:57:07 2023 UTC).