WebRTC October 2022 meeting – 18 October 2022

Meeting minutes

Slideset: https://lists.w3.org/Archives/Public/www-archive/2022Oct/att-0001/WEBRTCWG-2022-10-18.pdf

Encoded Transform - Overflow from TPAC 🎞︎

Harald: during TPAC, we discussed the concept of a packet API, with an explainer, use cases and architecture - not yet done
… other issues didn't get covered

Issue #109 & #119 Depacketization order 🎞︎

[Slide 12]

Harald: packets don't arrive in order on the network (they get lost or retransmitted)
… frames need to be in order for the decoder
… in general, a transformation is simpler when happening in decoding order
… this requires a jitter buffer in front of the decoder
… if the transformer itself introduces jitter, it doesn't get compensated
… currently Chromium has the jitter after the transformer

Bernard: this isn't the only place where we're encountering this problem
… are you imagining an explicit API for jitter buffer - e.g. a jitter buffer provided as a transform stream?

Harald: we could say "frames arrive in the order they arrive in", vs "the UA reorders them, incl waiting for frames" (probably not good), with a flag allowing one or the other

Youenn: I recall we discussed this previously
… iirc, we thought that in-order matched the Web developers expectations
… it may make it harder to implement for UA
… we should look at use cases where having out-of-order would be a benefit
… it's a possible footgun; if there are good use cases for it, then we should look for a solution, but otherwise, we should stick with in-order as in the spec

Bernard: for the crypto use case, is out-of-order even doable?

Youenn: for SFrame yes
… the counter may not be monotonic in that situation
… which would lead to dropped frames
… but it shouldn't be an issue from a decryption perspective

Bernard: so is out-of-order a speed concern?

hta: my worry about in-order is the case of lost frames
… without nack, rtx - you have to give up at some point
… if we accept in-order frames, we accept that lost frames will cause delays of some magnitude

dom: the wait-for-loss delay could be provided by the developer?

youenn: having both options would create complexity for developers
… if the transform is taking sometimes 2ms and sometimes much longer, a jitter buffer would then be beneficial
… for decryption or metadata passing, it should be fairly stable
… not sure of the value of a jitter buffer positioned after

hta: sounds like we need more time on use cases

Tony: moving the jitter buffer earlier means increased packet loss (given that it removes the processing time from the jitter buffer)
… there will be delays introduced from operating in a worker (rather than say a real time worklet)

youenn: currently chrome & safari implementations do out-of-order, which doesn't match the spec
… is Chrome planning to move to in-order? if implementations don't intend to align with the spec, that's also a consideration

hta: switching to in-order would require a compelling argument

jib: unless the transform has side effects (time-dependent), it shouldn't matter too much
… use cases would be helpful
… out-of-order seems a footgun - why should developers worry about that?

hta: if delay matters, in-order is a footgun

youenn: so we should collect use cases for both in-order and out-of-order
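
A minimal sketch of the trade-off discussed above, assuming a hypothetical per-frame counter `seq` and a hypothetical `ReorderBuffer` class (neither is part of the Encoded Transform API). It releases frames in order and gives up on a missing frame once the oldest waiting frame has been held for `maxWaitMs`, which is the "lost frames cause delays of some magnitude" cost hta describes.

```typescript
// Hypothetical frame shape: `seq` is a counter in decoding order.
interface Frame {
  seq: number;
  data: string;
}

class ReorderBuffer {
  private pending: { frame: Frame; arrivedAtMs: number }[] = [];
  private nextSeq: number;
  private maxWaitMs: number;

  constructor(firstSeq: number, maxWaitMs: number) {
    this.nextSeq = firstSeq;
    this.maxWaitMs = maxWaitMs;
  }

  // Takes a frame in arrival order; returns the frames that can now be
  // released in sequence order (possibly none).
  push(frame: Frame, nowMs: number): Frame[] {
    // Frames older than the release point arrive too late and are dropped.
    if (frame.seq >= this.nextSeq) {
      this.pending.push({ frame: frame, arrivedAtMs: nowMs });
      this.pending.sort((a, b) => a.frame.seq - b.frame.seq);
    }
    const released: Frame[] = [];
    while (this.pending.length > 0) {
      const head = this.pending[0];
      if (head === undefined) break;
      if (head.frame.seq < this.nextSeq) {
        this.pending.shift(); // duplicate of an already released frame
      } else if (head.frame.seq === this.nextSeq) {
        released.push(head.frame);
        this.pending.shift();
        this.nextSeq++;
      } else {
        // Gap: keep waiting unless the oldest held frame has waited too long.
        const oldest = Math.min(...this.pending.map((e) => e.arrivedAtMs));
        if (nowMs - oldest < this.maxWaitMs) break;
        this.nextSeq = head.frame.seq; // give up on the missing frame(s)
      }
    }
    return released;
  }
}
```

With `maxWaitMs` set to 0 the buffer degenerates to passing frames through as they arrive (dropping only late ones), which is closer to the out-of-order behaviour; a "bring your own jitter buffer" application would tune this value itself.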

Issue #143 generateKeyFrame 🎞︎

[Slide 13]

TPAC discussion

Fippo: I wanted to suggest a 4th proposal - an empty return value, but allow the app to pass any subset of the rids to generate keyframes
… some encoders can generate keyframes from individual rids, others can't - it depends on the codecs

hta: the argument list API would thus be strictly more powerful without additional implementor burden

youenn: at TPAC our conclusion was one rid was good & simple enough; we didn't have use cases for 2 layers hitting the same frame
… an encoder-behavior dependent API isn't so helpful, but I agree it isn't a big burden to add either

hta: medium objection to single value, no strong objection to array - should we go with the array args?

RESOLUTION: pass an array argument to generateKeyFrame

fippo: I'll do the PR
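
A rough sketch of how an array-of-rids argument could be resolved against the configured simulcast layers. The helper name `selectKeyFrameRids` is hypothetical, and two behaviours are assumptions not settled in the discussion above: an empty array is treated as "all layers", and an unknown rid is treated as an error.

```typescript
// Hypothetical helper, not part of any spec: decides which simulcast
// layers a key frame request targets.
function selectKeyFrameRids(requested: string[], configured: string[]): string[] {
  // Assumption: an empty request means "every configured layer".
  if (requested.length === 0) return configured.slice();

  // Assumption: asking for a rid that was never configured is an error.
  const unknown = requested.filter((rid) => configured.indexOf(rid) === -1);
  if (unknown.length > 0) {
    throw new Error("unknown rid(s): " + unknown.join(", "));
  }

  // Return the subset in configured order, which also removes duplicates.
  return configured.filter((rid) => requested.indexOf(rid) !== -1);
}
```

Whether an encoder can actually produce a key frame for a single layer is codec dependent, as Fippo notes, so an implementation might still generate key frames for more layers than the subset returned here.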

Issue #158 / PR #140: add mimeType to metadata 🎞︎

[Slide 14]

HTA: figuring out the meaning of a payload requires parsing the SDP to figure out what was negotiated
… the UA already knows which mime type is associated with which payload type

Fippo: another argument for it is that we don't specify how the data is structured
… being able to specify it as depending on the mime type would be good

youenn: thanks, this provides a good use case
… I think that's a pattern we already apply elsewhere

Fippo: in stats, indeed

Florent: isn't that available via getParameters? that exposes the list of payload types

HTA: but only if you have the PC

Fippo: that's harder in workers

RESOLUTION: Add mimeType to metadata
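
To illustrate the work an application has to do without a mimeType in the frame metadata, here is a minimal sketch that derives a payload type to MIME type map from the `a=rtpmap` lines of an SDP blob. The function name `payloadTypeToMimeType` is hypothetical, and the sketch assumes payload types are unique across the whole description, which is not guaranteed for unbundled sessions.

```typescript
// Builds a map such as { 111: "audio/opus", 96: "video/VP8" } from SDP text.
// Assumption: each payload type number is used for one codec only.
function payloadTypeToMimeType(sdp: string): { [payloadType: number]: string } {
  const result: { [payloadType: number]: string } = {};
  let kind = ""; // media kind of the current m= section ("audio", "video", ...)

  for (const line of sdp.split(/\r?\n/)) {
    const media = /^m=(\w+) /.exec(line);
    if (media) {
      kind = media[1];
      continue;
    }
    // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
    const rtpmap = /^a=rtpmap:(\d+) ([^\/\s]+)\//.exec(line);
    if (rtpmap && kind !== "") {
      result[Number(rtpmap[1])] = kind + "/" + rtpmap[2];
    }
  }
  return result;
}
```

In a worker there may be no access to the peer connection or its descriptions at all, which is the point HTA and Fippo make above.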

Issue #154: add rtp seqNum to inbound audio 🎞︎

[Slide 15]

Fippo: we have a custom decoder that relies on the rtp sequence number to detect loss in the audio
… relatively easy to add to incoming frames for audio
… more complicated for video, or for outgoing frames

HTA: for incoming audio, you have one packet resulting in one set of samples

youenn: coming back to in/out-of order, this would expose that
… if we're not doing in-order, this may create confusion

Fippo: in our use case, we have our custom JS jitter buffer; we don't reenqueue the frame into the pipeline

HTA: so that's also a use case for out-of-order: bring your own jitter buffer

Fippo: I can get that written up as input to the other discussion

HTA: are we happy to expose this only for audio incoming frames, as a non required dictionary?

jib: I think it would still be interesting to better understand these one-ended use cases

HTA: ok, so let's wait for the use cases before proceeding then
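
A minimal sketch of the kind of loss detection a custom audio decoder could do with the RTP sequence number, assuming one packet per audio frame as HTA describes. RTP sequence numbers are 16 bits wide and wrap around, so the difference is taken modulo 65536. The function name `lostBefore` is hypothetical.

```typescript
// Returns how many packets appear to be missing between the previously
// seen sequence number and the current one. A backwards step (reordered
// or duplicated packet) is reported as 0 rather than as a huge loss.
function lostBefore(prevSeq: number, currSeq: number): number {
  const forward = (currSeq - prevSeq) & 0xffff; // distance modulo 2^16
  if (forward === 0 || forward >= 0x8000) {
    return 0; // duplicate, or an older packet arriving late
  }
  return forward - 1;
}
```

If frames are delivered out of order, as Youenn points out, the "late packet" branch will be hit regularly and the application has to decide how long to wait before treating a gap as a real loss.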

Issue #131: Packetization API 🎞︎

[Slide 16]

HTA: any more comment on the packetization API beyond what was discussed at TPAC?

Youenn: we could start with things like MTU

HTA: in the frame API?

Fippo: MTU is mostly an issue for audio; I don't think we hit that threshold even with redundancy
… it becomes an issue with a transform that changes the size significantly

Youenn: I don't think adding the MTU to the frame API would make sense - more at the context level, with changes signaled via events
… the frame is coming from the encoder, that's not where the MTU info lives

Media Capture Extensions 🎞︎

PR #77: Add MediaStreamTrack framesCaptured and framesEmitted 🎞︎

[Slide 18]

Henrik: `track.getSettings().frameRate` tells you the configured, but not the actual, frame rate
… knowing the actual frame rate and the dropped frames would be useful
… some of that is exposed in stats, or in media playback metrics
… but the measurements are happening later in the pipeline - e.g. if the frame is dropped as soon as it is produced, it won't show up
… and we shouldn't force a webrtc PC to get track specific info

[Slide 19]

henrik: my proposal is to add a frame counter to the track API, with a `getStats()` method

youenn: all APIs that are using an MST will allow you to get the number of frames that you're actually receiving
… Media capture transform gives you the count of frames, likewise for WebRTC & HTMLMediaElement
… what you want is focused between the sink & the source
… not sure I understand the diff between emitted and captured - that feels a bit specific to a specific pipeline
… in our model, it's not clear it would be easy to specify an interoperable way to distinguish captured from emitted
… so maybe focusing first on captured?

henrik: that makes sense; captured is the main gap in any case

jan-ivar: framesCaptured makes sense with a low-lighting camera use case (although we could revisit the constraint model for that)
… share Youenn's concerns for emitted, which feels implementation dependent
… I'm not sure about `getStats()` vs a constraint

Bernard: next step?

Henrik: I'm hearing support for framesCaptured in some form, and leave emitted for later

HTA: framesEmitted makes sense for consistency, but I see the argument that it may be redundant
… so let's start with framesCaptured as accepted

RESOLUTION: move forward with framesCaptured only for now
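
A minimal sketch of how an application could turn a `framesCaptured` counter into an actual frame rate by sampling it twice. The `FrameCounterSample` shape and the `measuredFrameRate` helper are assumptions for illustration; the API surface for reading the counter was still open at this meeting.

```typescript
// Hypothetical snapshot of the counter together with the time it was read.
interface FrameCounterSample {
  framesCaptured: number;
  timestampMs: number;
}

// Average frames per second between two samples of the counter.
function measuredFrameRate(earlier: FrameCounterSample, later: FrameCounterSample): number {
  const elapsedMs = later.timestampMs - earlier.timestampMs;
  if (elapsedMs <= 0) {
    return 0; // samples taken at the same time or out of order
  }
  return ((later.framesCaptured - earlier.framesCaptured) * 1000) / elapsedMs;
}
```

Comparing this value with `track.getSettings().frameRate` would show, for example, a camera that is configured for 30 fps but delivers fewer frames in low light.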

WebRTC & Simulcast 🎞︎

Issue #2732: Inconsistent rules for rid in RTCRtpEncodingParameters 🎞︎

[Slide 23]

jib: following up on our discussions started at TPAC about rid length
… limiting RID length to 16 characters would help with web compat
… an erratum has been published on RFC 8851 removing - and _ characters
… the feedback is that restricting the length would be hard as an erratum, but could be done in a -bis

hta: note that the empty string is outlawed by the BNF

dom: if we wait for -bis, are implementations going to be updated to match the allowed lengths?

florent: it should be possible to update chrome in that direction if we think it's a good idea

hta: we don't know of any use case where 17 characters are necessary

youenn: we could limit to 16 characters with a note mentioning ongoing IETF discussion

jib: we could also have a separate decision on addTransceiver vs accepting incoming offers and answers

dom: I don't think it goes against the protocol to limit what the API accepts to generate rids (we should definitely accept any valid rid in O/A)

jib: but then you have an API that doesn't let you set values that you accept from a remote description
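
A minimal sketch of the API-side check under discussion, assuming the rules mentioned above: alphanumeric characters only (following the RFC 8851 erratum), non-empty (per the BNF), and a 16-character limit that was proposed but not yet agreed. The function name `isAcceptableRid` is hypothetical.

```typescript
// Checks a rid supplied by the application, e.g. via addTransceiver().
// Assumption: the length limit is the 16 characters proposed in the meeting.
function isAcceptableRid(rid: string, maxLength: number = 16): boolean {
  if (rid.length === 0 || rid.length > maxLength) {
    return false;
  }
  // Letters and digits only: "-" and "_" are excluded per the erratum.
  return /^[A-Za-z0-9]+$/.test(rid);
}
```

As dom and jib note, a check like this would apply to rids the application generates; rids arriving in a remote offer or answer could still be longer, which is where the asymmetry jib raises comes from.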

Issue #2764: What is the intended behavior of rollback of remote simulcast offer? 🎞︎

[Slide 25]

RESOLUTION: proceed with the proposed clarification

Issue #2737 / PR #2788: Modifications to [[SendEncodings]] from setParameters and sLD/sRD can be racy 🎞︎

[Slide 26]

hta: should that addition also be guarded by "if remote is true"?

jib: it would have to also have a "is an answer" gate - I can update the PR

henrik: if you restart and apply the steps again, wouldn't you implicitly rollback anything changed by the in-parallel operations?
… to do that correctly, you would have to wait until the SDP is applied

jib: this is run before we call the success callback
… we would wait until all setParameters are settled
… similar to if a remote description came right after

henrik: so this is done before the SDP process?

jib: right

Issue #2762: Simulcast: Implementations do not fail (and that seems good) 🎞︎

[Slide 27]

[Varun, Youenn depart]

[Slide 28]

RESOLUTION: close #2762 as is

WebRTC Extensions: Data Channels 🎞︎

Issue #114: RTCDataChannel transfer and maxMessageSize 🎞︎

[Slide 32]

florent: RTCDataChannels are transferable; maxMessageSize in RTCSctpTransport needs to be checked before sending data over a channel
… with a channel transferred to a worker, the maxMessageSize may be renegotiated on the main thread, which wouldn't be visible to the worker trying to send data

[Slide 33]

Florent: we could prevent changing the maxMessageSize during renegotiation - doesn't really happen in practice
… then that value could be kept in the transferred rtcdatachannel and keep the send algorithm as is
… the other aspect to consider is that the datachannel might have been transferred before the initial negotiation
… updating that value of maxMessageSize could be done as part of the "announcing a data channel as open" algorithm

dom: how confident are we that freezing maxMessageSize in renegotiation is web compatible?

florent: we would want to confirm that indeed
… sending too much data closes the data channel, so developers already need to pay attention

Bernard: the only time you would see this is in some weird maintenance scenarios - it should be very rare

florent: we can add some measurement in Chrome to see if that happens

dom: +1 to these solutions if they're web compatible

florent: so we can start with copying the value in opening, and measure web-compatibility of rejecting a renegotiated size

jib: would maxMessageSize end up being exposed on the data channel?

florent: we could do that, but that's not part of this proposal
… this wasn't useful in the context of running everything in the same context as peerconnection
… but with transferred channels, this makes more sense to consider

dom: it would be clunky not to expose it
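
A minimal sketch of the proposal as described above: the negotiated maxMessageSize is copied into the channel when it is announced as open and never changes afterwards, so a transferred channel can check sizes without consulting the main thread. The class `ChannelSendGuard` and its method names are hypothetical and only model the bookkeeping, not the real RTCDataChannel.

```typescript
// Models the per-channel copy of maxMessageSize for a transferred channel.
class ChannelSendGuard {
  private maxMessageSize: number | null = null;

  // Called when the channel is announced as open. Later calls are ignored,
  // modelling the idea that renegotiation cannot change the value.
  onOpen(negotiatedMaxMessageSize: number): void {
    if (this.maxMessageSize === null) {
      this.maxMessageSize = negotiatedMaxMessageSize;
    }
  }

  // True if the payload may be sent: the channel has opened and the
  // payload fits within the frozen limit.
  canSend(payload: Uint8Array): boolean {
    if (this.maxMessageSize === null) {
      return false; // not open yet, no limit known
    }
    return payload.byteLength <= this.maxMessageSize;
  }
}
```

Exposing the frozen value on the channel itself, as jib and dom suggest, would let applications split large messages up front instead of discovering the limit by failure.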

Issue #115: Need to specify behavior of detached RTCDataChannel objects 🎞︎

[Slide 34]

florent: we need to document a [[Detached]] internal slot per the HTML spec for transferable platform objects
… we would keep [[isTransferable]] for a datachannel that has already sent data

[no objection]

jib: it remains unclear what happens to data channels when they're transferred in the main thread

[Slide 35]

florent: should transferred data channels be garbage collectable in the main thread? they're "closed" which makes them collectable without a strong reference
… we could add a new state "detached" on top of opening, open, closed etc

Bernard: I prefer Proposal 2

jib: transferable objects are more like a clone, leaving an inoperative clone behind
… so the broader question is how a [[Detached]] data channel should behave, how it should affect the existing algorithms

florent: because they're closed, this already impacts the methods close() and send()

florent: hearing some support for introducing a "detached" state, and a [[Detached]] internal slot

hta: what about garbage collection?

florent: let's discuss on github
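
A toy model of the "detached" state idea, to make the open questions concrete. Everything here is an assumption for illustration: the class `ModelChannel`, the choice that `send()` throws on a detached object, and the choice that `close()` on a detached object is a no-op. The real behaviour was left for discussion on GitHub.

```typescript
// The four existing RTCDataChannel states plus the proposed "detached".
type ChannelState = "connecting" | "open" | "closing" | "closed" | "detached";

class ModelChannel {
  readyState: ChannelState;
  private hasSent: boolean = false; // models losing [[IsTransferable]] after send

  constructor(state: ChannelState = "open") {
    this.readyState = state;
  }

  send(message: string): void {
    if (this.readyState !== "open") {
      throw new Error("InvalidStateError: channel is " + this.readyState);
    }
    this.hasSent = true; // the message itself is discarded in this model
  }

  // Models transferring to a worker: the returned object carries on,
  // the original is left behind in the "detached" state.
  transfer(): ModelChannel {
    if (this.hasSent || this.readyState === "detached") {
      throw new Error("DataCloneError: channel is not transferable");
    }
    const moved = new ModelChannel(this.readyState);
    this.readyState = "detached";
    return moved;
  }

  close(): void {
    // Assumption: closing the detached original does not affect the moved channel.
    if (this.readyState !== "detached") {
      this.readyState = "closed";
    }
  }
}
```

With the alternative of reusing "closed" for the original object, the first branch of `close()` disappears, but applications can no longer tell a transferred channel from one that was actually closed.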

Capture Handle 🎞︎

Elad: the proposal is to add some structure to capture handle
… for crop targets (possibly with specific content hints)

jib: what about a messageport?

elad: still not structured, so leads to tight coupling

jib: I'm not sure we want to specify all the different things that applications might need to agree on
… per #11, I don't think we should re-invent postMessage

elad: a messageport informs the capturee they're being captured
… capture handle is a unidirectional message port
… being able to update the handle is useful given that the captured content is going to change
… a messageport can be useful in general, but for different use cases

jib: can we take a step back to understand the requirements we have?
… what API surface would be exposed here?

elad: adding structure for a crop target in the capture handle instead of a simple string
… croptarget would have contenthints, and also add a messageport as a separate suggestion

dom: I think maybe a unidirectional messageport would work for what we want?

elad: several suggestions: move from string to object in capture handle - already needed for tightly coupled apps
… for loosely coupled apps, similar to what capture actions already allow, adding explicit support for croptargets / contenthints would go a long way to help

elad: what about the first suggestion - moving from a string to an object?

jib: would re-iterate #11 - let's not reinvent postMessage

elad: but this adds ability to decouple capturees/capturer

jib: but adding this to the browser API when in fact it's down to the app to use it or not
… that's odd

elad: it's similar to capture actions, not really more formalized in semantics

hta: are there established standardized protocols over messageport already?

dom: don't know off the top of my head, would have to check

hta: if we were to have to come up with that, this feels scary

dom: re going with an object, would that be for serializable objects?

elad: yes

jib: the original purpose for handle was an identifier; now we're talking about passing objects, that changes the nature of the API

[Slide 51]

elad: a messageport doesn't address all the use cases - it's not structured
… I'm hearing support for the use cases, and not seeing an alternative proposal

jib: I remain a bit lost on the requirements we're solving with this API
… e.g. it could be a separate field instead of being part of the handle
… I'm not sure why we should allow random web sites to specify crop targets

elad: slide 47 illustrates how this could be a purely user-driven process to avoid any user tricking

jib: but I'm not sold we need to allow this for random web sites

harald: what criteria would make a web site eligible for this?

jib: with a messageport?

elad: but that makes it more likely to create situations where a web site might want to trick another provider?

jib: I still don't see a compelling case for making handle an object

elad: is the video provider / vc collaboration use case compelling?

jib: yes - we should figure a better way

elad: what way though?

hta: I'm hearing 2 proposals: make handle an object with some pre-defined fields for specific purposes (e.g. listing croptargets); and a messageport for tightly coupled apps
… these are 2 independent proposals that should be evaluated separately

jib: would allowing any serializable object be safe to expose to the capturer? that seems problematic

elad: the security properties are similar to (or even somewhat safer than) those of a messageport

Ben: with arbitrary objects, could that raise OOM concerns?

elad: 1. The captured page would be attacking itself first and foremost.
… 2. The captured page would be attacking an unknown capturer that likely doesn't even exist.
… 3. We can neuter the attack by ensuring the capture-handle is no-op on the capturer if the capturer does not read the handle. But that's for the future, if the attack comes up in the wild, which is unlikely.

ben: are there objects that could create risks for the receiver?

elad: not that I'm aware of

dom: I think the chairs will have to propose steps to unblock this conversation
… maybe an explainer would help figure out all the considerations that need to be taken into account

hta: the chairs will do so

– DRAFT –
WebRTC October 2022 meeting

18 October 2022

Attendees