WebRTC November 2022 meeting – 15 November 2022

Meeting minutes

Recording: https://youtu.be/YHLpqvcRAlY

Slideset: https://lists.w3.org/Archives/Public/www-archive/2022Nov/att-0000/WEBRTCWG-2022-11-15.pdf

Encoded Transform

[Slide 10]

Harald: I offered to use the IETF Hackathon to experiment with encoded transform (on my own, for lack of participants)

[Slide 11]

[Slide 12]

Harald: developed 2 demos to evaluate the API (but not for signals)

[Slide 13]

Harald: I had initially thought I needed both producers and consumers, but while writing the demos, only the producers seemed necessary

[Slide 14]

Harald: the processing is done via a user-defined JS class that you insert in the processing pipeline, but without requiring a single PC at both ends of the pipe
… this led to the conclusion that the API could be used
… Peter worked separately on how that one-way API approach could be done with the existing two-way APIs

[Slide 15]

Peter: I got it working with transport, codec

[Slide 16]

Peter: a constructor would help
… also missing signals for congestion control

[Slide 17]

Peter: pretty straightforward on the receiver side

[Slide 18]

Peter: again, missing a way to control e.g. the encoder bitrate based on congestion control

[Slide 19]

Peter: for a Decoder, we would again want a constructor for the encoded video frame, and signals to detect the need for a key frame

[Slide 20]

Peter: Harald's approach would satisfy these needs

Youenn: with regard to these 5 gaps, there is already a solution for the keyframe problem
… for constructors, I'm not sure why we need something on top of what WebCodecs provide from raw data; what's the point of using PC for incoming data?

Peter: WebCodecs doesn't have a built-in jitter buffer, whereas this would

Youenn: but we've been discussing letting the app define the jitter buffer
… so it's not clear that there is a benefit

Peter: it would still let you get the same behavior that you get from WebRTC without having to write your own jitter buffer

Youenn: I think this would benefit from clearer use cases

Harald: one of the use cases that needs this is taking an incoming video frame and passing it on to a different peer connection
… or passing it to 2 peer connections

Youenn: to re-forward it?

harald: possibly, yes

Youenn: this may be mostly about serialization, rather than a constructor

harald: metadata may need rewriting
… let's see about use cases

Jan-Ivar: what's the high level problem we're solving? would this be instead of encoded transform? re-imagining it? identifying issues with it?
… we have readable and writable streams on mediastreamtracks
… so I can already receive a track and forward it
… what's the difference?

Harald: this relates to the use cases discussed at TPAC
… there were compelling arguments that this could not be addressed without substantive changes to the webrtc encoded transform API
… not clear if this should replace or extend it - depending on where the shape lands

Peter: you cannot forward well without bandwidth estimation
… you could re-use the encoded(audio|video)frame to forward them as is, but you probably need to re-packetize which you can't do without a constructor

Jan-ivar: OK; still unclear how this would affect the API shape

Peter: I was focused on identifying the gaps at this stage

Harald: I explicitly shied away from presenting an API shape, to focus on use cases and requirements at this stage
… this is to stimulate the discussions

Peter: my impression is that this could be added with fairly minimal changes (constructors, signals)
… not a big delta from what we have

Harald: so the next step is to enumerate use cases a bit more before making a change proposal
… Peter and I will continue to iterate on this

WebRTC PC

Issue #2795: Missing URL in RTCIceCandidateInit

[Slide 24]

Youenn: this follows from discussion at the previous meeting
… the server URL used to be exposed in the event, and it has been proposed to move it to the candidate object itself
… but we didn't discuss whether it would survive JSON serialization / deserialization
… so far serialization/deserialization has been without information loss
… should that apply to the URL attribute?

[Slide 25]

Youenn: this impacts whether it gets submitted to remote parties by default (although this is only about defaults, not about protecting the info in general since it remains available to JS)
… in general, do we want to keep the invariant of non-lossiness on this object?
… Personally, I don't think there are good use cases to pass the url to remote parties, and we should keep the model consistent with regard to lossiness
… so we should keep the url attribute on the event rather than the object
… it can be shimmed easily from one to the other

[Slide 26]

fippo: toJSON conveys information that is needed for ICE
… additional properties were added to avoid having developers parse data out of the candidate string
… e.g. to determine the network topology

[Slide 27]

youenn: the question is about convenience / POLA

fippo: exposing the data on candidate is best to avoid having to go through stats
… you can't correlate your event with stats except through IP address matching

Jan-Ivar: I'm hearing that candidates already have information that isn't exposed in toJSON

Fippo: right, e.g. relayProtocol

Jan-Ivar: so that already breaks the supposed invariant on non-lossiness

Youenn: if so, that goes against the spirit of the spec
… if that's not the case, this may require clarifying the spec or aligning it with the invariant

fippo: the problem is that we're trying to treat local and remote candidates the same
… but local candidates can have more info

youenn: that's why I thought the event was a good way to expose local information

Fippo: in stats we distinguish a lot between local & remote

jan-ivar: my preference is to not send it to remote parties, and so not include it in toJSON
… the design pattern for events is also not to expose properties on the event when it can be exposed on the underlying object
… so I lean towards exposing it in the object

Youenn: we would then at least need to change the constructor

harald: the candidate is behaving like a data object, without inherent behavior
… people expect to copy data objects, and they would expect toJSON() to allow this - breaking such a pattern is a bad idea
… we have a backwards compatibility problem since toJSON is used to send data to remote parties
… I think it was a mistake to use toJSON for transmission
… I think putting the url data on the candidate is right
… I think the right direction would be to add a method that only exposes the right info for the remote party
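
A minimal sketch of the direction Harald describes, assuming the candidate is treated as a plain data object: `toJSON()` stays a lossless copy, and a separate method produces only what would be sent to the remote party. `CandidateData`, `SketchCandidate` and `toRemoteInit()` are hypothetical names used for illustration and are not part of the WebRTC API; the candidate string and server URL in the usage note are made up.

```typescript
// Hypothetical data shape for illustration; not the RTCIceCandidate interface from the spec.
interface CandidateData {
  candidate: string;
  sdpMid: string | null;
  sdpMLineIndex: number | null;
  usernameFragment: string | null;
  // Local-only information: the STUN/TURN server the candidate was gathered from.
  url: string | null;
}

// The subset of fields this sketch assumes the remote party needs for ICE.
type RemoteCandidateData = Pick<
  CandidateData,
  "candidate" | "sdpMid" | "sdpMLineIndex" | "usernameFragment"
>;

class SketchCandidate {
  private readonly data: CandidateData;

  constructor(data: CandidateData) {
    // Copy so later changes to the caller's object do not leak in.
    this.data = { ...data };
  }

  // Lossless copy: every field survives, including the local-only url.
  toJSON(): CandidateData {
    return { ...this.data };
  }

  // Hypothetical method: returns only the fields meant for the remote party.
  toRemoteInit(): RemoteCandidateData {
    const { candidate, sdpMid, sdpMLineIndex, usernameFragment } = this.data;
    return { candidate, sdpMid, sdpMLineIndex, usernameFragment };
  }
}
```

With this shape, `JSON.stringify(candidate)` round-trips the `url`, while signaling code would send `candidate.toRemoteInit()` instead. It does not address the backwards compatibility concern that existing applications already send the output of `toJSON()`.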

Youenn: another approach would be to distinguish local vs remote candidates

Harald: that's an interesting idea

Jan-Ivar: I agree we have a wart here, but I don't think we should chase technical purity
… subclassing would not solve the backwards compat issue

Youenn: let's iterate on this discussion on github

Issue #2796: A simulcast transceiver saved from rollback by addTrack doesn’t re-associate, but unicast does

[Slide 28]

Jan-Ivar: more corner cases, esp. to do with rollbacks

Harald: proposal seems reasonable to me

Bernard: +1

Harald: Jan-Ivar will propose a PR

Issue #2724: The language around setting a description appears to prohibit renegotiation of RIDs

[Slide 29]

Jan-Ivar: see also PR #2794

[Slide 30]

Jan-Ivar: this would match Chrome & Safari, although there is a remaining inconsistency identified in Chrome

Harald: this is the one where you discovered Chrome disabled layers rather than removing them
… this sounds reasonable given our previous agreement on this

Jan-Ivar: it's a small change that doesn't introduce new behaviors, but extends existing ones

Harald: I think this works
… Will you add tests too?

Jan-Ivar: yes, along with FF implementation of setParameters

Timing Model & WebCodecs

Bernard: the group created a videoframemetadata registry with a process
… an example of that is the request to register human face metadata #607
… this also relates to the requestVideoFrameCallback spec (being merged in HTML)
… which also exposes metadata, so there is a question of whether these should be exposed there as well
… the rVFC spec exposes timing info at all aspects of the pipeline (captureTime, rtpTimestamp, receiveTime, processingDuration, expectedDisplayTime, presentationTime)
… it mixes codec-related timing but also rtp-related info
… this brings up a number of questions: where is the metadata exposed in our APIs (e.g. mediacapture transform)
… should I expect .captureTime to be visible in a videoframe
… likewise, there are assumptions on what things should happen in WebRTC (e.g. setting the rtpTimestamp)
… is metadata passed through the pipeline: when converting a video frame with mediacapture transform and passing it to webrtc, is it still visible at the end in rVFC? in encoded transform?
… in WebCodecs encoded chunks?
… do we need to file related issues?
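
To make the timing fields concrete, here is a small sketch that derives one latency figure from the metadata rVFC hands to its callback. The field names are the ones listed above; the `TimingMetadata` type and the `receiveToDisplayMs` helper are hypothetical, and treating `receiveTime` and `expectedDisplayTime` as directly comparable millisecond timestamps is an assumption of the sketch.

```typescript
// Hypothetical subset of the per-frame timing metadata discussed above.
interface TimingMetadata {
  presentationTime: number;
  expectedDisplayTime: number;
  processingDuration?: number;
  // The fields below are assumed to be present only for some sources
  // (capture or WebRTC), hence optional.
  captureTime?: number;
  receiveTime?: number;
  rtpTimestamp?: number;
}

// Time between the frame arriving over the network and its expected display,
// or undefined when the frame carries no receiveTime (e.g. a local source).
function receiveToDisplayMs(metadata: TimingMetadata): number | undefined {
  if (metadata.receiveTime === undefined) {
    return undefined;
  }
  return metadata.expectedDisplayTime - metadata.receiveTime;
}
```

In a page, the helper could be fed from `video.requestVideoFrameCallback((now, metadata) => ...)`; whether the same fields remain visible after a frame passes through other parts of the pipeline is exactly the open question raised here.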

Youenn: I filed some of these issues - captureTime etc are planned to move to videoframemetadata
… that should bring consistency throughout the pipeline
… mediacapture transform will not preserve it magically - if you clone the frame, metadata will be cloned along with it
… likewise if it goes through WebRTC PC
… encoded chunks don't expose that metadata - maybe we should; we haven't heard feedback or use cases for that yet
… in terms of what the WG may need to discuss: how do we compute presentationTime? VideoTrackGenerator allows setting the timestamp, but we're not defining what happens on rendering (e.g. re jitter buffer)

Harald: if a processing element has metadata defined both as part of input & output, should we have a general rule about metadata it doesn't understand?
… for the metadata info it knows about (e.g. width and height for an encoder), it won't remain unchanged
… but for metadata that isn't understood, should we have a rule to leave it unchanged?
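
One possible reading of such a rule, sketched under the assumption that frame metadata is a simple key/value bag: the processing step overwrites the entries it understands and copies everything else through untouched. `FrameMetadata` and `passThroughMetadata` are hypothetical names, and the sample keys in the usage note are placeholders rather than registry entries.

```typescript
// Hypothetical metadata bag: keys stand in for registry entry names,
// values are opaque to this sketch.
type FrameMetadata = Record<string, unknown>;

// Returns the output metadata for a processing step. Entries the step
// understands are replaced by the values it recomputed; every other entry is
// copied through unchanged. The input object is not modified.
function passThroughMetadata(
  input: FrameMetadata,
  recomputed: FrameMetadata
): FrameMetadata {
  return { ...input, ...recomputed };
}
```

For example, a downscaler given `{ width: 1280, height: 720, captureTime: 1234.5 }` that recomputes `{ width: 640, height: 360 }` would emit the new dimensions and keep `captureTime` as it was. A shallow copy like this shares nested values between input and output, which a real rule would need to address.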

Bernard: the registry rule is that this is up to the registry-linked spec to define
… not sure we can have a rule that is imposed on all WGs
… a rule would have to be proposed to be enforced

Youenn: individual metadata specs could describe how they're handled by processors

Bernard: next step would be to file specific issues on specific specs

Youenn: the main remaining issue might be on rendering time
… in media capture main

Face Detection

[Slide 33]

Tuukka: the face detection proposal now uses videoframemetadata object

Tuukka: looking for feedback on the general direction

Youenn: thanks - looks like a great improvement, and exciting to see this moving forward
… dictionary members probably don't need to be nullable, but some may need to be marked as required
… re center points vs bounding box vs best possible contours: I'm not sure if a sequence is best vs different fields
… not sure about faceDetectionMaxCountourPoints - do we really need this now? can we leave this for later? or have a hint?
… if developers just want a bounding box, maybe we should let developers express it, and send back a detailed contour otherwise
… the example may need an update wrt @@@
… I guess this means the proposal will be split across webcodecs and mediacapture-extensions

Tuukka: the metadata and the constraints are both specified in mediacapture extensions
… are you suggesting the former should be done in webcodecs?

Youenn: not sure - I guess this is testing the registry process
… the registry entry could either define the metadata or link to the mediacapture extensions spec

Tuukka: the constraints and metadata are co-dependent
… they need to be maintained together

youenn: that makes sense; webcodecs has been asking to be able to review metadata when they change, so it may be best to have something in webcodecs space
… we can iterate with webcodecs folks on the details

timp: I like this - looks useful & interesting
… it would be good to document the lifespan and meaning of the id - in particular, that it doesn't allow correlating faces across streams
… re contour & bounding box, I agree with Youenn that they're not the same and should be handled separately, not rely on 4 items == bounding box
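
As a rough illustration of keeping the two notions apart, the sketch below gives the bounding box and the contour their own fields instead of inferring a box from a four-point sequence. All type and function names here are hypothetical and do not come from the proposal; coordinates are plain numbers in whatever unit the contour uses.

```typescript
interface Point {
  x: number;
  y: number;
}

interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical metadata entry: the box is always present, the contour is an
// optional, more detailed outline. Neither is inferred from the other's length.
interface SketchDetectedFace {
  id: number;
  boundingBox: BoundingBox;
  contour?: Point[];
}

// Smallest axis-aligned box that contains every contour point.
function boundingBoxOf(contour: Point[]): BoundingBox {
  if (contour.length === 0) {
    throw new RangeError("contour must contain at least one point");
  }
  const xs = contour.map((p) => p.x);
  const ys = contour.map((p) => p.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return {
    x: minX,
    y: minY,
    width: Math.max(...xs) - minX,
    height: Math.max(...ys) - minY,
  };
}

// Builds an entry that carries both representations explicitly.
function faceFromContour(id: number, contour: Point[]): SketchDetectedFace {
  return { id, boundingBox: boundingBoxOf(contour), contour };
}
```

A consumer that only needs a box reads `boundingBox` and never has to guess whether a short `contour` is meant to be one.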

tuukka: the goal here was to avoid cluttering metadata as new contour approaches emerge

jan-ivar: looking at the broader question of merging this
… from a privacy perspective, it looks like it doesn't add any concerns over having the detection done in JS
… this looks good to me

Youenn: let's see a PR that editors can iron out and then run a CfC?

MessagePort on Capture Handle

Youenn: having a message channel between capturer and capturee makes sense
… a few things off in the API shape that we can iterate on (e.g. event handler in a dictionary - they're usually on objects)
… I'm not sure about the "supportsMessagePort" boolean
… I would prefer we start from a minimal API surface
… also for messageportinvalidated - we should discuss this with the HTML spec folks
… this underlying behavior already exists with other messageports
… I would prefer a name different from "getMessagePort" given its side effects
… I like the integration with capture handle

elad: +1 to "openMessagePort" instead of get...
… I'm happy to discuss reduction of API surface
… s/handle/controller
… my proposal deals both with capture handle and controller - how do you feel about integration with handle?

Youenn: event handler in a dictionary feels wrong
… don't have strong feelings on handle vs mediaDevices in general

elad: the link to capture handle happens both on capturer & capturee
… you commented on only one side?

youenn: on the other side, I would move it to capture controller

jan-ivar: I really like the 1st part of the presentation - agree on use cases & requirements
… would like to iterate on github on the API shape
… generally would agree with youenn to move it to controller rather than track
… I think the direction you're presenting makes sense as a starting point

Elad: so the next step is to surface similar events following that pattern on capture controller
… we should revisit this at the next meeting

enumerateDevices & Focus

[Slide 53]

jan-ivar: PR #912 allows the behavior in Safari by relaxing the focus requirements a little bit

[Slide 54]

[Slide 55]

Youenn: I like this proposal; LGTM

Harald: my reading is that it waits after the gUM prompt has been replied to?

jan-ivar: after it has shown up, not responded to (since that requires focus in any case)

harald: I'll re-read the PR carefully to make sure it doesn't introduce issues

Elad: can you clarify the "anti-spying" behavior?

Jan-Ivar: the PR doesn't change the focus requirement, only its timing

Elad: ok, I'll bring the question on github then

[Slide 56]

Jan-Ivar: we also had developers complaining that enumerateDevices() blocks when there is no focus (which is marked as an optional behavior)
… the PR proposes to make it tied to visibility, not focus
… this helps backwards compat, and still satisfies the anti-fingerprinting requirement (anti-spying only applies to getUserMedia)
… this would make the check deterministic as requested by the developer

[Slide 57]

Youenn: so the goal is to reduce friction for developers and align user agents' behaviors - that's a good goal
… do you foresee compat issues in implementing this?
… will it fix existing firefox issues that developers were complaining about or does that require developer adoption before it does?

jan-ivar: they would have to add the visibilityState check to avoid being "blocked"
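
A sketch of what such a check could look like on the application side, assuming the call would otherwise wait while the page is hidden. `VisibilitySource` and `whenVisible` are hypothetical names; the interface only mirrors the parts of `document` the helper needs so that it can be exercised without a browser.

```typescript
// Minimal view of the page's visibility API that the helper depends on.
interface VisibilitySource {
  visibilityState: string;
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Runs `action` right away when the page is visible; otherwise runs it once,
// the first time the page becomes visible.
function whenVisible(doc: VisibilitySource, action: () => void): void {
  if (doc.visibilityState === "visible") {
    action();
    return;
  }
  const onChange = (): void => {
    if (doc.visibilityState !== "visible") {
      return; // still hidden, keep waiting
    }
    doc.removeEventListener("visibilitychange", onChange);
    action();
  };
  doc.addEventListener("visibilitychange", onChange);
}
```

In a page this might be used as `whenVisible(document, () => { void navigator.mediaDevices.enumerateDevices().then(render); })`, where `render` is a placeholder for the application's own handler. The application then decides what to do while hidden, rather than holding a pending promise.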

elad: I could use more time to review this

youenn: I think it would be good to get feedback from other UAs and developers

Bernard: do we need a CfC?

Jan-Ivar: developers should be happy given that it relaxes the behavior

Dom: does this need an updated privacy review?

jan-ivar: I don't think so since the behavior was already optional
… and the fuzzing advice is already in the spec

Harald: I'll have to review this in detail

Dom: so we can delegate this for final review by Harald, Elad & Youenn?

JIB: SGTM

– DRAFT –

Attendees