W3C

– DRAFT –
WebRTC October 2022 meeting

18 October 2022

Attendees

Present
Ben, Bernard, Dom, Elad, fippo, Florent, Harald, Henrik, Jan-Ivar, MikeEnglish, PatrickRockhill, Tony, Tove, Varun, Youenn
Regrets
-
Chair
Bernard, HTA, Jan-Ivar
Scribe
dom

Meeting minutes

Recording: https://youtu.be/cbLpKnU6RoI

Slideset: https://lists.w3.org/Archives/Public/www-archive/2022Oct/att-0001/WEBRTCWG-2022-10-18.pdf

Encoded Transform - Overflow from TPAC 🎞︎

Harald: during TPAC, we discussed the concept of a packet API, with an explainer, use cases and architecture - not yet done
… other issues didn't get covered

Issue #109 & #119 Depacketization order 🎞︎

[Slide 12]

Harald: packets don't arrive in order on the network (they get lost or retransmitted)
… frames need to be in order for the decoder
… in general, a transformation is simpler when happening in decoding order
… this requires a jitter buffer in front of the decoder
… if the transformer itself introduces jitter, it doesn't get compensated
… currently Chromium has the jitter after the transformer

Bernard: this isn't the only place where we're encountering this problem
… are you imagining an explicit API for jitter buffer - e.g. a jitter buffer provided as a transform stream?

Harald: we could say "frames arrive in the order they arrive in", vs "the UA reorders them, incl waiting for frames" (probably not good), with a flag allowing one or the other

Youenn: I recall we discussed this previously
… iirc, we thought that in-order matched the Web developers expectations
… it may make it harder to implement for UA
… we should look at use cases where having out-of-order would be a benefit
… it's a possible footgun; if there are good use cases for it, then we should look for a solution, but otherwise, we should stick with in-order as in the spec

Bernard: for the crypto use case, is out-of-order even doable?

Youenn: for SFrame yes
… the counter may not be monotonic in that situation
… which would lead to dropped frames
… but it shouldn't be an issue from a decryption perspective

Bernard: so is out-of-order a speed concern?

hta: my worry about in-order is the case of lost frames
… without nack, rtx - you have to give up at some point
… if we accept in-order frames, we accept that lost frames will cause delays of some magnitude

dom: the wait-for-loss delay could be provided by the developer?

youenn: having both options would create complexity for developers
… if the transform is taking sometimes 2ms and sometimes much longer, a jitter buffer would then be beneficial
… for decryption or metadata passing, it should be fairly stable
… not sure of the value of a jitter buffer positioned after

hta: sounds like we need more time on use cases

Tony: moving the jitter buffer earlier means increased packet loss (given that it removes the processing time from the jitter buffer)
… there will be delays introduced from operating in a worker (rather than say a real time worklet)

youenn: currently the Chrome & Safari implementations do out-of-order, which doesn't match the spec
… is Chrome planning to move to in-order? if implementations don't intend to align with the spec, that's also a consideration

hta: switching to in-order would require a compelling argument

jib: unless the transform has side effects (time-dependent), it shouldn't matter too much
… use cases would be helpful
… out-of-order seems a footgun - why should developers worry about that?

hta: if delay matters, in-order is a footgun

youenn: so we should collect use cases for both in-order and out-of-order
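
As a rough illustration of the in-order option discussed above (not something presented at the meeting), the following TypeScript sketch reorders frames that arrive out of order and lets the application give up on a missing frame. The `SketchFrame` shape, the `seq` decode-order index and the `ReorderBuffer` class are hypothetical names introduced for this example; they are not part of the Encoded Transform API.

```typescript
// Hypothetical frame shape for illustration; not the RTCEncodedVideoFrame interface.
interface SketchFrame {
  seq: number; // assumed decode-order index, increasing by 1 per frame
  payload: string;
}

// Releases frames strictly in order, holding back frames that arrive early.
class ReorderBuffer {
  private pending = new Map<number, SketchFrame>();
  private nextSeq: number;

  constructor(firstSeq: number = 0) {
    this.nextSeq = firstSeq;
  }

  // Accepts a frame in arrival order and returns the frames that can now be
  // delivered in order (possibly none).
  push(frame: SketchFrame): SketchFrame[] {
    if (frame.seq < this.nextSeq) return []; // late or duplicate frame: dropped
    this.pending.set(frame.seq, frame);
    return this.drain();
  }

  // Gives up on the missing frame(s), e.g. after an application-chosen timeout,
  // and releases the earliest frames still held.
  skipMissing(): SketchFrame[] {
    let lowest = Infinity;
    this.pending.forEach((_frame, seq) => {
      if (seq < lowest) lowest = seq;
    });
    if (lowest === Infinity) return [];
    this.nextSeq = lowest;
    return this.drain();
  }

  private drain(): SketchFrame[] {
    const ready: SketchFrame[] = [];
    let frame = this.pending.get(this.nextSeq);
    while (frame !== undefined) {
      ready.push(frame);
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
      frame = this.pending.get(this.nextSeq);
    }
    return ready;
  }
}
```

The call to `skipMissing()` is where the delay cost of in-order delivery shows up: until the application decides to give up, every later frame waits behind the lost one.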

Issue #143 generateKeyFrame 🎞︎

[Slide 13]

TPAC discussion

Fippo: I wanted to suggest a 4th proposal - an empty return value, but allow the app to pass any subset of the rids to generate keyframes
… some encoders can generate keyframes from individual rids, others can't - it depends on the codecs

hta: the argument list API would thus be strictly more powerful without additional implementor burden

youenn: at TPAC our conclusion was one rid was good & simple enough; we didn't have use cases for 2 layers hitting the same frame
… an encoder-behavior dependent API isn't so helpful, but I agree it isn't a big burden to add either

hta: medium objection to single value, no strong objection to array - should we go with the array args?

RESOLUTION: pass an array argument to generateKeyFrame

fippo: I'll do the PR

Issue #158 / PR #140: add mimeType to metadata 🎞︎

[Slide 14]

HTA: determining the meaning of a payload requires parsing the SDP to figure out what was negotiated
… the UA already knows which mime type is associated with which payload type

Fippo: another argument for it is that we don't specify how the data is structured
… being able to specify it as depending on the mime type would be good

youenn: thanks, this provides a good use case
… I think that's a pattern we already apply elsewhere

Fippo: in stats, indeed

Florent: isn't that available via getParameters? that exposes the list of payload types

HTA: but only if you have the PC

Fippo: that's harder in workers

RESOLUTION: Add mimeType to metadata
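
To illustrate the SDP parsing that HTA describes applications needing today, here is a minimal TypeScript sketch that derives a payload type to MIME type map from `m=` and `a=rtpmap` lines. The function name `payloadTypeToMimeType` is made up for this example, and the sketch assumes a payload type number means the same thing in every media section, which is a simplification.

```typescript
// Maps RTP payload types to MIME types by scanning the m= and a=rtpmap lines
// of an SDP blob. Simplification: one shared map for all media sections.
function payloadTypeToMimeType(sdp: string): Map<number, string> {
  const result = new Map<number, string>();
  let kind = ""; // media kind of the current m= section, e.g. "audio"
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    const media = /^m=(\w+) /.exec(line);
    if (media) {
      kind = media[1] ?? "";
      continue;
    }
    // e.g. "a=rtpmap:111 opus/48000/2" yields payload type 111, codec "opus"
    const rtpmap = /^a=rtpmap:(\d+) ([^/\s]+)\//.exec(line);
    if (rtpmap && kind !== "") {
      result.set(Number(rtpmap[1]), `${kind}/${rtpmap[2]}`);
    }
  }
  return result;
}
```

With a `mimeType` field in the frame metadata, as resolved above, a transform running in a worker would not need this kind of lookup.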

Issue #154: add rtp seqNum to inbound audio 🎞︎

[Slide 15]

Fippo: we have a custom decoder that relies on the rtp sequence number to detect loss in the audio
… relatively easy to add to incoming frames for audio
… more complicated for video, or for outgoing frames

HTA: for incoming audio, you have one packet resulting in one set of samples

youenn: coming back to in/out-of order, this would expose that
… if we're not doing in-order, this may create confusion

Fippo: in our use case, we have our custom JS jitter buffer; we don't reenqueue the frame into the pipeline

HTA: so that's also a use case for out-of-order: bring your own jitter buffer

Fippo: I can get that written up as input to the other discussion

HTA: are we happy to expose this only for audio incoming frames, as a non required dictionary?

jib: I think it would still be interesting to better understand these one-ended use cases

HTA: ok, so let's wait for the use cases before proceeding then
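
For context on the loss-detection use case Fippo mentions, a minimal TypeScript sketch of gap detection over 16-bit RTP sequence numbers follows. It is not Fippo's decoder; the helper names `seqDistance` and `missingBetween` are invented for illustration, and the sketch assumes one packet per audio frame, as HTA notes for incoming audio.

```typescript
// Forward distance from `prev` to `next` in 16-bit sequence space (0..65535),
// so that the wrap from 65535 to 0 counts as a distance of 1.
function seqDistance(prev: number, next: number): number {
  return (next - prev + 0x10000) & 0xffff;
}

// Counts packets missing between two consecutively processed packets.
// Returns 0 for consecutive packets, and -1 when `next` is a duplicate or
// looks like an old or reordered packet (half the sequence space or more away).
function missingBetween(prev: number, next: number): number {
  const distance = seqDistance(prev, next);
  if (distance === 0 || distance >= 0x8000) return -1;
  return distance - 1;
}
```

The -1 case is where the in-order versus out-of-order question above becomes visible to the application: with out-of-order delivery, a "backwards" sequence number is normal rather than an error.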

Issue #131: Packetization API 🎞︎

[Slide 16]

HTA: any more comment on the packetization API beyond what was discussed at TPAC?

Youenn: we could start with things like MTU

HTA: in the frame API?

Fippo: MTU is mostly an issue for audio; I don't think we hit that threshold even with redundancy
… it becomes an issue with transforms that change the size significantly

Youenn: I don't think adding the MTU to the frame API would make sense - more at the context level, with changes signaled via events
… the frame is coming from the encoder, that's not where the MTU info lives

Media Capture Extensions 🎞︎

PR #77: Add MediaStreamTrack framesCaptured and framesEmitted 🎞︎

[Slide 18]

Henrik: `track.getSettings().frameRate` tells you the configured frame rate, but not the actual one
… knowing the actual frame rate and the dropped frames would be useful
… some of that is exposed in stats, or in media playback metrics
… but the measurements are happening later in the pipeline - e.g. if the frame is dropped as soon as it is produced, it won't show up
… and we shouldn't force a webrtc PC to get track specific info

[Slide 19]

henrik: my proposal is to add a frame counter to the track API, with a `getStats()` method

youenn: all APIs that are using an MST will allow you to get the number of frames that you're actually receiving
… Media capture transform gives you the count of frames, likewise for WebRTC & HTMLMediaElement
… what you want is focused between the sink & the source
… not sure I understand the diff between emitted and captured - that feels a bit specific to a specific pipeline
… in our model, it's not clear it would be easy to specify an interoperable way to distinguish captured from emitted
… so maybe focusing first on captured?

henrik: that makes sense; captured is the main gap in any case

jan-ivar: framesCaptured makes sense with a low-lighting camera use case (although we could revisit the constraint model for that)
… share Youenn's concerns for emitted, which feels implementation dependent
… I'm not sure about `getStats()` vs a constraint

Bernard: next step?

Henrik: I'm hearing support for framesCaptured in some form, and leave emitted for later

HTA: framesEmitted makes sense for consistency, but I see the argument that it may be redundant
… so let's start with framesCaptured as accepted

RESOLUTION: move forward with framesCaptured only for now
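
As a sketch of how an application might use such a counter, the TypeScript below derives the actual frame rate from two readings of a `framesCaptured` value. The `CaptureSample` shape and the `measuredFrameRate` helper are assumptions made for this example; how the counter is exposed (`getStats()` or otherwise) was left open in the discussion above.

```typescript
// Hypothetical snapshot of the proposed counter, taken at a known time.
interface CaptureSample {
  timestampMs: number; // when the counter was read, in milliseconds
  framesCaptured: number; // cumulative frames captured up to that time
}

// Average captured frames per second between two snapshots.
// Returns 0 when no time has elapsed between them.
function measuredFrameRate(earlier: CaptureSample, later: CaptureSample): number {
  const elapsedMs = later.timestampMs - earlier.timestampMs;
  if (elapsedMs <= 0) return 0;
  return ((later.framesCaptured - earlier.framesCaptured) * 1000) / elapsedMs;
}
```

Comparing this value with `track.getSettings().frameRate` would show the gap between configured and actual frame rate that Henrik describes.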

WebRTC & Simulcast 🎞︎

Issue #2732: Inconsistent rules for rid in RTCRtpEncodingParameters 🎞︎

[Slide 23]

jib: following up on our discussions started at TPAC about rid length
… limiting RID length to 16 characters would help with web compat
… an erratum has been published on RFC8851 removing the - and _ characters
… the feedback is that restricting the length would be hard to do as an erratum, but could be done in a -bis

hta: note that the empty string is outlawed by the BNF

dom: if we wait for -bis, are implementations going to be updated to match the allowed lengths?

florent: it should be possible to update chrome in that direction if we think it's a good idea

hta: we don't know of any use case where 17 characters are necessary

youenn: we could limit to 16 characters with a note mentioning ongoing IETF discussion

jib: we could also have a separate decision on addTransceiver vs accepting incoming offers and answers

dom: I don't think it goes against the protocol to limit what the API accepts to generate rids (we should definitely accept any valid rid in O/A)

jib: but then you have an API that doesn't let you set values that you accept from a remote description
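
For illustration only, the rule under discussion (alphanumeric characters only, non-empty, at most 16 characters) could be checked as in the TypeScript sketch below. The 16-character limit is the proposal being debated here, not an agreed requirement, and `isAcceptableRid` and `PROPOSED_MAX_RID_LENGTH` are names invented for this example.

```typescript
// Proposed limit discussed at the meeting; an assumption, not settled text.
const PROPOSED_MAX_RID_LENGTH = 16;

// True when `rid` is non-empty, purely alphanumeric (no "-" or "_", following
// the RFC 8851 erratum mentioned above) and within the proposed length limit.
function isAcceptableRid(rid: string): boolean {
  return rid.length <= PROPOSED_MAX_RID_LENGTH && /^[A-Za-z0-9]+$/.test(rid);
}
```

Whether such a check would apply only to rids passed to addTransceiver, or also to rids received in offers and answers, is the open question raised by jib and dom.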

Issue #2764: What is the intended behavior of rollback of remote simulcast offer? 🎞︎

[Slide 25]

RESOLUTION: proceed with the proposed clarification

Issue #2737 / PR #2788: Modifications to [[SendEncodings]] from setParameters and sLD/sRD can be racy 🎞︎

[Slide 26]

hta: should that addition also be guarded by "if remote is true"?

jib: it would have to also have a "is an answer" gate - I can update the PR

henrik: if you restart and apply the steps again, wouldn't you implicitly rollback anything changed by the in-parallel operations?
… to do that correctly, you would have to wait until the SDP is applied

jib: this is run before we call the success callback
… we would wait until all setParameters are settled
… similar to if a remote description came right after

henrik: so this is done before the SDP process?

jib: right

Issue #2762: Simulcast: Implementations do not fail (and that seems good) 🎞︎

[Slide 27]

[Varun, Youenn depart]

[Slide 28]

RESOLUTION: close #2762 as is

WebRTC Extensions: Data Channels 🎞︎

Issue #114: RTCDataChannel transfer and maxMessageSize 🎞︎

[Slide 32]

florent: RTCDataChannels are transferable; maxMessageSize in RTCSctpTransport needs to be checked before sending data over a channel
… with a channel transferred to a worker, the maxMessageSize may be renegotiated on the main thread, which wouldn't be visible to the worker trying to send data

[Slide 33]

Florent: we could prevent changing the maxMessageSize during renegotiation - doesn't really happen in practice
… then that value could be kept in the transferred rtcdatachannel and keep the send algorithm as is
… the other aspect to consider is that the datachannel might have been transferred before the initial negotiation
… updating that value of maxMessageSize could be done as part of the "announcing a data channel as open" algorithm

dom: how confident are we that freezing maxMessageSize during renegotiation is web compatible?

florent: we would want to confirm that indeed
… sending too much data closes the data channel, so developers already need to pay attention

Bernard: the only time you would see this is in some weird maintenance scenarios - it should be very rare

florent: we can add some measurement in Chrome to see if that happens

dom: +1 to these solutions if they're web compatible

florent: so we can start with copying the value in opening, and measure web-compatibility of rejecting a renegotiated size

jib: would maxMessageSize end up being exposed on the data channel?

florent: we could do that, but that's not part of this proposal
… this wasn't useful in the context of running everything in the same context as peerconnection
… but with transferred channels, this makes more sense to consider

dom: it would be clunky not to expose it
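
One way to picture Florent's proposal is the TypeScript model below: the channel copies the negotiated maxMessageSize once when it opens and checks every send against that frozen copy. This is a simplified sketch, not spec text or an implementation; `ChannelSizeGuard`, `onOpen` and `canSend` are hypothetical names.

```typescript
// Models a transferred data channel that keeps its own copy of maxMessageSize.
class ChannelSizeGuard {
  // Unknown until the channel is announced as open.
  private maxMessageSize: number | null = null;

  // Called once when the channel opens; later values are ignored, modelling
  // the idea that renegotiation may not change the size.
  onOpen(negotiatedMaxMessageSize: number): void {
    if (this.maxMessageSize === null) {
      this.maxMessageSize = negotiatedMaxMessageSize;
    }
  }

  // True when a message of `byteLength` bytes fits the frozen limit.
  // A limit of Infinity accepts any size; an unopened channel accepts nothing.
  canSend(byteLength: number): boolean {
    if (this.maxMessageSize === null) return false;
    return byteLength <= this.maxMessageSize;
  }
}
```

Because the copy lives with the channel, a worker holding the transferred channel can run the check without reaching back to the RTCSctpTransport on the main thread.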

Issue #115: Need to specify behavior of detached RTCDataChannel objects 🎞︎

[Slide 34]

florent: we need to document a [[Detached]] internal slot per the HTML spec for transferable platform objects
… we would keep [[isTransferable]] for a datachannel that has already sent data

[no objection]

jib: it remains unclear what happens to data channels when they're transferred in the main thread

[Slide 35]

florent: should transferred data channels be garbage collectable in the main thread? they're "closed" which makes them collectable without a strong reference
… we could add a new state "detached" on top of opening, open, closed etc

Bernard: I prefer Proposal 2

jib: transferable objects are more like a clone, leaving an inoperative clone
… so the broader question is how a [[Detached]] data channel should behave, how it should affect the existing algorithms

florent: because they're closed, this already impacts the methods close() and send()

florent: hearing some support for introducing a "detached" state, and a [[Detached]] internal slot

hta: what about garbage collection?

florent: let's discuss on github

Capture Handle 🎞︎

[Slide 39]

[Slide 40]

[Slide 41]

[Slide 42]

[Slide 43]

[Slide 44]

[Slide 45]

[Slide 46]

[Slide 47]

[Slide 48]

[Slide 49]

Elad: the proposal is to add some structure to capture handle
… for crop targets (possibly with specific content hints)

jib: what about a messageport?

elad: still not structured, so leads to tight coupling

jib: I'm not sure we want to specify all the different things that applications might need to agree on
… per #11, I don't think we should re-invent postMessage

elad: a messageport informs the capturee they're being captured
… capture handle is a unidirectional message port
… being able to update the handle is useful given that the captured content is going to change
… a messageport can be useful in general, but for different use cases

jib: can we take a step back to understand the requirements we have?
… what API surface would be exposed here?

elad: adding structure for a crop target in the capture handle instead of a simple string
… croptarget would have contenthints, and also add a messageport as a separate suggestion

dom: I think maybe a unidirectional messageport would work for what we want?

elad: several suggestions: move from string to object in capture handle - already needed for tightly coupled apps
… for loosely coupled apps, similar to what capture actions already allow, adding explicit support for croptargets / contenthints would go a long way to help

elad: what about the first suggestion - moving from a string to an object?

jib: would re-iterate #11 - let's not reinvent postMessage

elad: but this adds ability to decouple capturees/capturer

jib: but adding this to the browser API when in fact it's down to the app to use it or not
… that's odd

elad: it's similar to capture actions, not really more formalized in semantics

hta: are there established standardized protocols over messageport already?

dom: don't know off the top of my head, would have to check

hta: if we were to have to come up with that, this feels scary

dom: re going with an object, would that be for serializable objects?

elad: yes

jib: the original purpose for handle was an identifier; now we're talking about passing objects, that changes the nature of the API

[Slide 51]

elad: a messageport doesn't address all the use cases - it's not structured
… I'm hearing support for the use cases, and not seeing an alternative proposal

jib: I remain a bit lost on the requirements we're solving with this API
… e.g. it could be a separate field instead of being part of the handle
… I'm not sure why we should allow random web sites to specify crop targets

elad: slide 47 illustrates how this could be a purely user-driven process to avoid any user tricking

jib: but I'm not sold we need to allow this for random web sites

harald: what criteria would make a web site eligible for this?

jib: with a messageport?

elad: but that makes it more likely to create situations where a web site might want to trick another provider?

jib: I still don't see a compelling case for making handle an object

elad: is the video provider / vc collaboration use case compelling?

jib: yes - we should figure a better way

elad: what way though?

hta: I'm hearing 2 proposals: make handle with some pre-defined fields for specific purposes (e.g. listing croptargets); and a messageport for tightly coupled apps
… these are 2 independent proposals that should be evaluated separately

jib: would allowing any serializable object be safe to expose to the capturer? that seems problematic

elad: the security properties are similar to (or even somewhat safer than) those of a messageport

Ben: with arbitrary objects, could that raise OOM concerns?

elad: 1. The captured page would be attacking itself first and foremost.
… 2. The captured page would be attacking an unknown capturer that likely doesn't even exist.
… 3. We can neuter the attack by ensuring the capture-handle is no-op on the capturer if the capturer does not read the handle. But that's for the future, if the attack comes up in the wild, which is unlikely.

ben: are there objects that could create risks for the receiver?

elad: not that I'm aware of

dom: I think the chairs will have to propose steps to unblock this conversation
… maybe an explainer would help figure out all the considerations that need to be taken into account

hta: the chairs will do so

Summary of resolutions

  1. pass an array argument to generateKeyFrame
  2. Add mimeType to metadata
  3. move forward with framesCaptured only for now
  4. proceed with the proposed clarification
  5. close #2762 as is
Minutes manually created (not a transcript), formatted by scribe.perl version repo-links-187 (Sat Jan 8 20:22:22 2022 UTC).