W3C

– DRAFT –
WebRTC TPAC 2023 meeting

12 September 2023

Attendees

Present
aheggestad, Bernard, Carine, ChengChen, DomHM, ErikSprang, Fippo, Florent, Guido, HenrikB, HTA, JIB, LouayBassbouss, Palak, PaulAdenot, PeterThatcher, RandellJesup, Riju, Sameer, ShridharMajali, SunShin, TimPanton, TonyHerre, Tove, Varun, Youenn, ZhangShishen
Regrets
-
Chair
Bernard, HTA, Jan-Ivar
Scribe
caribou, dom__, youenn, youennf

Meeting minutes

Recording: https://www.youtube.com/watch?v=xTGBiq4uUBU

Slideset: https://lists.w3.org/Archives/Public/www-archive/2023Sep/att-0010/WEBRTCWG-2023-09-12.pdf

[reviewing WebRTC related meetings and breakout sessions during TPAC]

State of the WebRTC WG 🎞︎

[Slide 13]

[Slide 14]

hta: I've looked at which repos have had what activity since last TPAC
… mediacapture-main still progressing towards Rec, including removing unimplemented features, e.g. the permissions query
… mediacapture-extensions has been active as a holding pen for new ideas
… webrtc-pc fixing bugs and merging some extensions, using the mechanism to keep track of diffs from the Rec
… somewhat painful, but it works
… stats behaving like a living standard - adding and removing stuff
… lots of discussions on use cases
… webrtc-encoded-transform, looking at new functionalities
… webrtc-ice - may need to be killed off now that the work is migrating into webrtc-extensions
… platform processing needs synchronization with the Media WG, there were discussions yesterday about video filters
… Screen Capture - largely pursued in SCCG with whom we meet on Thu
… The remaining docs are mostly stable - they're implemented but haven't seen much change
… some because no change is needed, some because no one is driving the changes

Youenn: re mediacapture-main being close to Rec, it still has 30 open issues
… should they be triaged, moving some to -extensions,
… or should the group focus its effort on closing them?

Dom: +1 on triaging issues - JIB is looking at it AFAIK
… on the unattended repos, some have pending requests from the community; I would hope we get more attention/ownership on them

JIB: re mediacapture-main, I don't think there are any big remaining open issues - please chime in if you feel differently

HTA: one of the sticky issues is around localization of device names; discussions between TC39 and the I18N WG have made significant progress on establishing a JS pattern for this
… I personally feel that the lack of progress means we shouldn't block on this

WebRTC Extended Use Cases 🎞︎

Relationship between use case docs and other resources 🎞︎

[Slide 19]

[Slide 20]

(no objection expressed on the proposal)

[Slide 22]

(no objection expressed on the proposal)

[Slide 24]

HTA: if there are links from use cases to API proposals, they need to be updated each time new proposals emerge
… I would rather say that use cases should usually not link to API proposals

(no objection expressed on the amended proposal)

Funny Hats use case 🎞︎

[Slide 26]

hta: e.g. if background blur can happen at different layers, frames need to be annotated with the fact they've been processed - which means deviating from the standard frame format
… which needs to be surfaced in SDP as a non-standard codec

Bernard: we need to add matching requirements

HTA: I can make a PR towards that

Bernard: there is also a metadata issue related to this topic

HTA: I wanted confirmation the WG recognizes these usages as relevant

<youenn> youenn: To help this use case, being more precise about the potential type of metadata/application might be more convincing.

Riju: re background blur happening at different layers, …

henrik: re metadata - are we thinking app-specific metadata or standardized metadata?

hta: so far, we've seen app-specific metadata, and I don't see them going away even if there are standardized ones

henrik: +1

hta: they need to be standardized when we need interoperability

JIB: SDP negotiation seems to assume we need to modify metadata in frames - another approach might be to send the metadata over data channels, synchronized with the media
… it would help to add details on when in-frame metadata is needed

henrik: the problem with data channels is there is no guarantee the metadata has been received before the frame
… plus the need to timestamp the info
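[A minimal sketch of the data-channel alternative being weighed here; `pc` is an assumed RTCPeerConnection and the message shape is illustrative:]

    // Illustrative: a data channel dedicated to per-frame metadata.
    const metadataChannel = pc.createDataChannel("frame-metadata");

    // Sender side: ship app-specific metadata keyed by RTP timestamp.
    function sendFrameMetadata(rtpTimestamp, blurApplied) {
      metadataChannel.send(JSON.stringify({ rtpTimestamp, blurApplied }));
    }

    // Receiver side: buffer metadata until the matching frame arrives --
    // nothing guarantees it beats the frame (Henrik's ordering concern).
    const pendingMeta = new Map();
    metadataChannel.onmessage = ({ data }) => {
      const meta = JSON.parse(data);
      pendingMeta.set(meta.rtpTimestamp, meta);
    };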

RESOLUTION: HTA to get a PR to articulate the requirements for this use case

Low Latency Streaming: Summary 🎞︎

[Slide 27]

Low Latency Streaming: Game Streaming use case 🎞︎

[Slide 28]

[Slide 29]

Bernard: issue #80 - gaming has e.g. spatial audio; they need raw audio data to manage this
… there are also related discussions about WASM codecs

[Slide 30]

[Slide 31]

Bernard: do we need to add related requirements? L16 is popular and important for spatial audio in gaming

HTA: what is raw audio data? we should be specific if we mean L16
… "access to" is unclear on whether that's at the sender or receiver side, which impacts what goes across the wire

Bernard: in the original issue, it's more on the sender side (although the receiver will need to know about the codec)

HTA: which then intersects with the SDP munging requirement

Youenn: I think the real issue is about pluggable audio codecs (rather than raw audio data)
… Access to raw audio data is already provided by the platform
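[One existing platform path to raw audio that this may refer to; MediaStreamTrackProcessor was Chromium-only at the time, so this is a sketch, not a portable recipe:]

    // Sketch: pulling raw AudioData from a microphone track via
    // MediaStreamTrackProcessor (assumes a module with top-level await).
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const [track] = stream.getAudioTracks();
    const reader = new MediaStreamTrackProcessor({ track }).readable.getReader();
    for (;;) {
      const { value: audioData, done } = await reader.read();
      if (done) break;
      // audioData is a WebCodecs AudioData: PCM planes, sampleRate, channels.
      audioData.close();
    }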

Henrik: how is this different from WebCodecs?
… is this distinct from our discussion of plugging WebCodecs in RTCPC?

Youenn: it's specific to audio; WebCodecs is likely one of the solutions or part of it

HTA: should we rename this to pluggable audio codec?

Bernard: does this issue need to be addressed for the low latency use case? or does it go somewhere else?

HTA: it's not linked specifically to the low-latency use case; there is a requirement for a pluggable audio codec, but not for this particular use case

Youenn: cloud gaming is behind this use case, but it's not about low latency streaming
… not sure if this is about a non-standard codec, or a codec not supported by the platform

Peter: +1 to Henrik - the desire to use a different (possibly custom/proprietary) codec is not streaming-specific, it's broader

Bernard: do we need a new use case for this?

Peter: if we don't have one, we should

Youenn: this is linked to wanting it on RTP (rather than via a datachannel)

Peter: data channels for audio is problematic due to e.g. congestion control, or compatibility with existing RTP endpoints

Youenn: but the latter conflicts with non-standard codecs

HTA: although not with codecs not supported by browsers

RESOLUTION: detach #80 from low-latency use case; create a new use case related to pluggable audio codecs

Bernard: I can take care of defining the use case

Peter: is this specific to audio? or should that encompass video?

Bernard: the OP was specific to audio
… the question exists in video for HEVC - but maybe we can assume it will be natively supported in WebRTC

HTA: a major difference between audio & video is the integration with breakout box

[Slide 33]

Harald: we should write an explainer noting that jitterBufferTarget satisfies N38

RESOLUTION: add a note in webrtc-extensions that jitterBufferTarget satisfies req N38
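[For reference, a minimal sketch of the webrtc-extensions API the note would point at; `pc` is an assumed RTCPeerConnection:]

    // Ask the receive side to hold ~100 ms in the jitter buffer; the UA
    // treats this as a hint, per webrtc-extensions.
    for (const receiver of pc.getReceivers()) {
      receiver.jitterBufferTarget = 100; // milliseconds
    }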

SunShin: presenting PR #118 to clarify game streaming requirements

[Slide 34]

[Slide 35]

<jesup> Requirements make sense. Layered codecs for example may help with recovery and consistent latency at the cost of resolution changes. UAs are allowed to display corrupt frames today instead of pausing, but there's no way to ask for it

SunShin: these four requirements would help improve the current experience of game streaming

[Slide 36]

Bernard: RFC8884 already recommends RPSI - the IETF requires it for the stack
… the issue is not whether to implement, but there are practical issues

[Slide 37]

Bernard: RPSI support in HEVC was mentioned in discussion on HEVC in WebRTC
… support for RPSI with VP8/9 was removed from libwebrtc in 2017
… if we can figure out a way to address this in the code base, the W3C requirements would not be an obstacle
… since it's already recommended by the IETF, I'm not sure we need a requirement for it

Youenn: is there a case where a WebApp would want to enable or disable RPSI, or can that be left to the UA?

SunShin: we would want app-control for RPSI

Bernard: this can be negotiated with SDP munging (if it were in WebRTC)

Youenn: but SDP munging isn't great
… if we think it needs to be app-exposed, then having the requirement explicit is good, so that it gets exposed via an API

Bernard: I don't recall a stack with app control on RPSI; it's typically something the RTP stack just does
… that's how it was originally implemented for VP8/VP9
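[A sketch of the munging Bernard alludes to, adding RPSI feedback (RFC 4585 "nack rpsi") to the offer; whether the underlying RTP stack honors it is the open question:]

    const offer = await pc.createOffer();
    // Append "nack rpsi" feedback wherever "nack pli" is already offered.
    offer.sdp = offer.sdp.replace(
      /a=rtcp-fb:(\d+) nack pli/g,
      "a=rtcp-fb:$1 nack pli\r\na=rtcp-fb:$1 nack rpsi");
    await pc.setLocalDescription(offer);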

Elad: @@@
… we would want to switch between different modes

HTA: there was a Google decision to switch from RPSI to Loss Notification - any recollection about this?

HTA: looks like we need more information
… some of it needs to go back to IETF

RESOLUTION: re N48 & N49, investigate what information needs to be passed on the wire and figure if it's IETF or API work to satisfy this requirement

Bernard: this will come out of the HEVC implementation in WebRTC from which it should bubble up

SunShin: re N50 - we still need a transport-wide RTCP

ErikSprang: there should be implementations for that

Bernard: we're running out of time, maybe finish discussion in overflow session

HTA: slides 38-43 for overflow

Modifications for Low Latency Fanout (WebRTC Encoded Transform) 🎞︎

[Slide 46]

[Slide 47]

[Slide 48]

[Slide 49]

[Slide 50]

HTA: platform objects with serialization and de-serialization don't require that the output be JS-observable

Peter: you're talking about encoded frames

Palak: yes

Peter: would a constructor for encoded frames be sufficient? would it alleviate some of these needs? or is it unrelated?

HTA: we discussed constructors; for this particular use case, defining the constructor would require defining all the metadata
… whereas only modifying the needed metadata is a significant simplification
… constructors are needed for other use cases

Peter: re forwarding an encoded frame to another PC - how do you know the available bandwidth, to know what you're capable of forwarding or not?

HTA: the use case we're considering is for a PC on the same network where we assume similar bandwidth characteristics

Henrik: when forwarding or sending a frame, the question of having enough bandwidth is unrelated

JIB: I have a concern about how we got here; WebRTC encoded transform is taking a pipe out of the wall and plugging JS in, to allow transforming incoming frames
… this has become a very powerful opportunity to bring more changes
… it changes the UA's ability to make optimal decisions based on understanding what's happening
… I wonder if we need a different / more specialized API surface
… re structured cloning, encoded frames don't come with different ownership (e.g. GPU) so I don't see an issue

Youenn: similar concerns in terms of use cases
… bandwidth estimation may get fooled by JS sending much bigger or smaller data than the UA estimated
… a different API shape would be preferable, making the JS responsible for encoding and telling the UA clearly that it needs to approach bandwidth estimation differently
… not sure what mechanism would be used in case of packet loss
… and a Web app can't detect whether the network is susceptible to packet loss
… an API where the Web App is responsible for encoding and is notified when something gets lost

HTA: we know this can be done because we have done it
… there are alternative API shapes; I would try to pursue a few of them, but it hasn't been done
… this particular one works and will be part of a solution in any case

Youenn: I think a dedicated API would be better

HTA: I think webrtc-encoded-transform is the right document; I wouldn't want to wait for the perfect solution to arrive

Youenn: this isn't a transform - this is about sending

HTA: new API shapes haven't been conclusive, and using the existing API has proved workable

Youenn: we still don't have a solution for the metadata issue with encoded transform
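[A rough sketch of the forwarding pattern HTA describes, using the existing RTCRtpScriptTransform API; `receiver` and `sender` are assumed to come from two already-negotiated RTCPeerConnections, and the metadata and bandwidth concerns raised above are deliberately elided:]

    // main.js - tap frames from a receiving connection and push them out
    // through a sending one.
    const worker = new Worker("fanout-worker.js");
    receiver.transform = new RTCRtpScriptTransform(worker, { side: "receive" });
    sender.transform = new RTCRtpScriptTransform(worker, { side: "send" });

    // fanout-worker.js - "side" is an app-chosen option, not part of the API.
    let forward = null;
    onrtctransform = ({ transform: { readable, writable, options } }) => {
      if (options.side === "send") {
        forward = writable.getWriter();
        // Drain (and drop) the local encoder's output that we are replacing.
        readable.pipeTo(new WritableStream({ write() {} }));
      } else {
        // Forward each incoming encoded frame to the second connection.
        readable.pipeTo(new WritableStream({
          write(frame) { forward?.write(frame); },
        }));
      }
    };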

WebRTC & Media Capture 🎞︎

[Slide 34]

Henrik summarising the issue

<caribou> youenn: I would prefer to only start counting if there's a call; that's not the case since it's a getter

youenn: we encourage developers to do more on the main thread with that kind of API

PaulA: is this reimplementing stable state?
… that's efficient, nice

Paul: the stable state concept in the HTML spec can be used for that.

HTA: update the counters whenever in stable state. A cache is fine to remove the overhead if you do not read.

Henrik: how about I modify this PR or do a follow-up to handle stable state?

Jan-Ivar: naming might be a remaining issue (videoStats vs. stats) but synchronous is fine.

Follow-up issues to handle: whether to use stable state, and whether to rename.
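[A sketch of the synchronous getter shape under discussion; the attribute name (stats vs. videoStats) and the use of stable state are the open follow-ups, and `stream` is an assumed MediaStream:]

    const [track] = stream.getVideoTracks();
    // Synchronous read -- no await, unlike getStats(); field names as
    // proposed in mediacapture-extensions at the time of this meeting.
    const { deliveredFrames, discardedFrames, totalFrames } = track.stats;
    console.log(`discarded ${discardedFrames} of ${totalFrames} frames`);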

Issue 137 Undesirable prompt from selectAudioOutput({deviceId}) if valid device removed 🎞︎

[Slide 57]

Jan Ivar explaining the issue

youenn: how about extending the possibility for enumerateDevices to expose the output device a little bit longer,
… like what is done in Firefox when prompting for cameras and microphones.

HTA: If the devicechange event is fired, the app will know the AirPods are gone. If a reload happens before, it will not know.

HTA: a one-time courtesy is not great. How about returning NotFoundError?

JI: Maybe a SHOULD would be good enough to give User Agents enough leeway.
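[A sketch of the flow under discussion; the NotFoundError behavior is the proposal here, not current spec text, and `audioElement` is an assumed media element:]

    // Re-request a previously granted deviceId after the device (e.g.
    // AirPods) was unplugged; under the proposal, the app can catch
    // NotFoundError instead of triggering an undesirable prompt.
    try {
      const device = await navigator.mediaDevices.selectAudioOutput({
        deviceId: localStorage.getItem("preferredSpeakerId"),
      });
      await audioElement.setSinkId(device.deviceId);
    } catch (e) {
      if (e.name === "NotFoundError") {
        // Device gone; fall back to the default output, no prompt.
      }
    }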

Issue 2899 No way to observe DataChannel-only transport events in initial negotiation 🎞︎

[Slide 59]

[Slide 60]

Peter: I like proposal A. Proposal C might be to add a method getIceTransport().

Henrik: is it useful to have the SCTP transport?

Jan Ivar: yes (see code in slide 53)

Florent: I like proposal A, but would not make maxMessageSize nullable.

HTA: I like option C. For the DTLS transport, web pages had to write extra code to get it.

Jan Ivar: during initial negotiation, it is not clear which transport we will expose.

HTA: with maxBundle, you will get one transport. And you can compare the transport objects if you have multiple.

HTA: Proposal C (without doing proposal A).

Rough consensus in the room to go with proposal A.
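[For context, how an app reaches the ICE transport today on a datachannel-only connection; the gap is that pc.sctp is null during initial negotiation, which is what proposals A and C address:]

    pc.createDataChannel("data");
    await pc.setLocalDescription(await pc.createOffer());
    // ... only once negotiation has created the transports:
    const ice = pc.sctp?.transport?.iceTransport;
    ice?.addEventListener("statechange", () => console.log(ice.state));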

ICE Controller API 🎞︎

Issue 171 ICE candidate pair selection 🎞︎

[Slide 64]

HTA: why ArrayBuffer for transactionId?

Peter: transactionId is 20 bytes.

youenn: is it ok in a window? should it be in a worker?

Peter: nice to have in worker as well

RESOLUTION: consensus to move on with a PR for that API.

RTPTransport 🎞︎

[Slide 74]

Peter: new createRtpTransport()

Youenn: can you mix legacy senders/receivers with RtpTransport?

Peter: yes, it should be feasible

Stefan: what about overshooting the bandwidth estimate?

Peter: packets would be queued and/or dropped.

Jan Ivar: what about creating several rtp transports?

Peter: will cover SDP negotiation later
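[Nothing below is specified: createRtpTransport() is the only name taken from the slides, and the packet callback is invented purely to illustrate the packet-level shape being discussed:]

    // Hypothetical shape only -- onpacket and its argument do not exist
    // in any spec; they stand in for app-level packet access.
    const rtpTransport = pc.createRtpTransport();
    rtpTransport.onpacket = (rtpPacket) => {
      // App-level packetization/depacketization would happen here.
    };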

[Slide 71]

Randell: should be discussed in this WG. Incremental API is better. Worker-only.

Jan Ivar: similar to Randell, worker-only is good.
… examples with WebRTC encoded transform would be good.

HTA: this is interesting. Why not have an RTPTransport getter on the peer connection?

Peter: tied to what is negotiated. Maybe this is enough.
… trying to avoid the situation where the packets sent are not what was negotiated in SDP

Bernard: Good to have workers.
… Maybe good to separate them.

youenn: workers are good.
… I would concentrate on WebCodecs + RTPTransport. Interaction with WebRTC encoded transform can come later.

youenn: having a discussion of frame-level vs. packet-level APIs would be good, to come to a conclusion.

TimPanton: would be nice to be able to demultiplex processing.

HTA: limitation if there is a need for a new SDP m-line to talk with other peers. Maybe it is not that big a problem nowadays since apps tend to control both sides.

Jan Ivar: might need to think about back pressure (events vs streams)

HTA: the sense of the room is that there is interest. Need to sort out the thread story (main thread / non-main thread), and whether to aim for interop or not (SDP changes or not).

Consensus to get this work done here.

Should probably be a new document.

SDP negotiation 🎞︎

Summary: PT to MimeType mapping - arguments on both sides. Setting packetisers is also accepted but needs further discussion.

Back to WebRTC use case issues 🎞︎

<vr000m_> The Ultra low latency broadcast with fanout seems to be Application Layer Multicast, ALMs. There are quite a few papers and implementations with RTP.

Next step for PR #123: continue discussion on GitHub with the intent to merge.

PR 123

Wrap-up and next steps 🎞︎

Transport draft could be a candidate for a new document.

HTA: we have more meetings this week
… breakout on WebRTC UCs tomorrow as well

[end]


Summary of resolutions

  1. HTA to get a PR to articulate the requirements for this use case
  2. detach #80 from low-latency use case; create a new use case related to pluggable audio codecs
  3. add a note in webrtc-extensions that jitterBufferTarget satisfies req N38
  4. re N48 & N49, investigate what information needs to be passed on the wire and figure if it's IETF or API work to satisfy this requirement
  5. consensus to move on with a PR for that API.
Minutes manually created (not a transcript), formatted by scribe.perl version 208 (Wed Dec 21 15:03:26 2022 UTC).