Meeting minutes
Recording: https://
Slideset: https://
Encoded Transform 🎞︎
Harald: I offered to use the IETF Hackathon to experiment with encoded transform (on my own, for lack of participants)
Harald: developed 2 demos to evaluate the API (but not for signals)
Harald: I had initially thought I needed both producers and consumers, but while writing the demos, only the producers seemed necessary
Harald: the processing is done via a user-defined JS class that you insert in the processing pipeline, but without requiring a single PC used at both ends of the pipe
… this led to the conclusion that the API could be used
… Peter worked separately on how that one-way API approach could be done with the existing two-way APIs
Peter: I got it working with transport, codec
Peter: a constructor would help
… also missing signals for congestion control
Peter: pretty straightforward on the receiver side
Peter: again, missing a way to control e.g. the encoder bitrate based on congestion control
Peter: for a Decoder, we would again want a constructor for the encoded video frame, and signals to detect the need for a key frame
Peter: Harald's approach would satisfy these needs
Youenn: with regard to these 5 gaps, there is already a solution for the keyframe problem
… for constructors, I'm not sure why we need something on top of what WebCodecs provides from raw data; what's the point of using PC for incoming data?
Peter: WebCodecs doesn't have a built-in jitter buffer, whereas this would
Youenn: but we've been discussing letting the app define the jitter buffer
… so it's not clear that there is a benefit
Peter: it would still let you get the same behavior that you get from WebRTC without having to write your own jitter buffer
Youenn: I think this would benefit from clearer use cases
Harald: one of the use cases that needs this is getting an incoming video frame and passing it out to a different peer connection
… or passing it to 2 peer connections
Youenn: to re-forward it?
harald: possibly, yes
Youenn: this may be mostly about serialization, rather than a constructor
harald: metadata may need rewriting
… let's see about use cases
Jan-Ivar: what's the high level problem we're solving? would this be instead of encoded transform? re-imagining it? identifying issues with it?
… we have readable and writable streams on mediastreamtracks
… so I can already receive a track and forward it
… what's the difference?
Harald: this relates to the use cases discussed at TPAC
… there were compelling arguments that this could not be addressed without substantive changes to the webrtc encoded transform API
… not clear if this should replace or extend it - depending on where the shape lands
Peter: you cannot forward well without bandwidth estimation
… you could re-use the encoded(audio|video)frame to forward them as is, but you probably need to re-packetize which you can't do without a constructor
Jan-ivar: OK; still unclear how this would affect the API shape
Peter: I was focused on identifying the gaps at this stage
Harald: I explicitly shied away from presenting an API shape, to focus on use cases and requirements at this stage
… this is to stimulate the discussions
Peter: my impression is that this could be added with fairly minimal changes (constructors, signals)
… not a big delta from what we have
Harald: so next step is to enumerate use cases a bit more before making a change proposal
… Peter and I will continue to iterate on this
WebRTC PC 🎞︎
Issue #2795: Missing URL in RTCIceCandidateInit 🎞︎
Youenn: this follows from discussion at the previous meeting
… the server URL used to be exposed in the event, and it has been proposed to move it to the candidate object itself
… but we didn't discuss whether it would survive JSON serialization / deserialization
… so far serialization/deserialization has been without information loss
… should that apply to the URL attribute?
Youenn: this impacts whether it gets submitted to remote parties by default (although this is only about defaults, not about protecting the info in general since it remains available to JS)
… in general, do we want to keep the invariant of non-lossiness on this object?
… Personally, I don't think there are good use cases to pass the url to remote parties, and we should keep the model consistent with regard to lossiness
… so we should keep the url attribute on the event rather than the object
… it can be shimmed easily from one to the other
fippo: toJSON conveys information that is needed for ICE
… additional properties were added to avoid having developers parse data out of the candidate string
… e.g. to determine the network topology
youenn: the question is about convenience / POLA
fippo: exposing the data on candidate is best to avoid having to go through stats
… you can't correlate your event with stats except through IP address matching
Jan-Ivar: I'm hearing that candidates already have information that isn't exposed in toJSON
Fippo: right, e.g. relayProtocol
Jan-Ivar: so that already breaks the supposed invariant on non-lossiness
Youenn: if so, that goes against the spirit of the spec
… if that's not the case, this may require clarifying the spec or aligning it with the invariant
fippo: the problem is that we're trying to treat local and remote candidates the same
… but local candidates can have more info
youenn: that's why I thought the event was a good way to expose local information
Fippo: in stats we distinguish a lot between local & remote
jan-ivar: my preference is to not send it to remote parties, and so not include it in toJSON
… the design pattern for events is also not to expose properties on the event when it can be exposed on the underlying object
… so I lean towards exposing it in the object
youenn: we then at least need to change the constructor
harald: the candidate is behaving like a data object, without inherent behavior
… people expect to copy data objects, and they would expect toJSON() to allow this - breaking such a pattern is a bad idea
… we have a backwards compatibility problem since toJSON is used to send data to remote parties
… I think it was a mistake to use toJSON for transmission
… I think putting the url data on the candidate is right
… I think the right direction would be to add a method that only exposes the right info for the remote party
Youenn: another approach would be to distinguish local vs remote candidates
Harald: that's an interesting idea
Jan-Ivar: I agree we have a wart here, but I don't think we should chase technical purity
… subclassing would not solve the backwards compat issue
Youenn: let's iterate on this discussion on github
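A minimal sketch of one option raised in this discussion: the url lives on the candidate object but is left out of toJSON(), so it is not sent to remote parties by default. The names `LocalCandidateSketch` and `SignalingFields`, the candidate string, and the TURN URL are hypothetical illustrations, not part of any spec; only the four serialized members mirror RTCIceCandidateInit.

```typescript
// Hypothetical illustration only; this is not the RTCIceCandidate interface.
// The four members below mirror RTCIceCandidateInit.
interface SignalingFields {
  candidate: string;
  sdpMid: string | null;
  sdpMLineIndex: number | null;
  usernameFragment: string | null;
}

class LocalCandidateSketch {
  readonly fields: SignalingFields;
  // Local-only information: the STUN/TURN server the candidate came from.
  readonly url: string | null;

  constructor(fields: SignalingFields, url: string | null) {
    this.fields = fields;
    this.url = url;
  }

  // JSON.stringify() calls toJSON(), so the url is not serialized by default.
  toJSON(): SignalingFields {
    return { ...this.fields };
  }
}

const local = new LocalCandidateSketch(
  {
    candidate:
      "candidate:1 1 udp 41885439 192.0.2.10 50000 typ relay raddr 0.0.0.0 rport 0",
    sdpMid: "0",
    sdpMLineIndex: 0,
    usernameFragment: "abcd",
  },
  "turn:turn.example.org:3478",
);

// What would typically be handed to the signaling channel.
const wire: string = JSON.stringify(local);
```

The same sketch shows the cost Harald pointed out: anyone copying the object through toJSON() loses the url.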
Issue #2796: A simulcast transceiver saved from rollback by addTrack doesn’t re-associate, but unicast does 🎞︎
Jan-Ivar: more corner cases, esp. to do with rollbacks
Harald: proposal seems reasonable to me
Bernard: +1
Harald: Jan-Ivar will propose a PR
Issue #2724: The language around setting a description appears to prohibit renegotiation of RIDs 🎞︎
Jan-Ivar: see also PR #2794
Jan-Ivar: this would match Chrome & Safari, although there is a remaining inconsistency identified in Chrome
Harald: this is the one where you discovered Chrome disabled layers rather than removing them
… this sounds reasonable given our previous agreement on this
Jan-Ivar: it's a small change that doesn't introduce new behaviors, but extends existing ones
Harald: I think this works
… Will you add tests too?
Jan-Ivar: yes, along with FF implementation of setParameters
Timing Model & WebCodecs 🎞︎
Bernard: the group created a videoframemetadata registry with a process
… an example of that is the request to register human face metadata #607
… this also relates to the requestVideoFrameCallback spec (being merged in HTML)
… which also exposes metadata, raising the question of whether they should be exposed there as well
… the rVFC spec exposes timing info at all aspects of the pipeline (captureTime, rtpTimestamp, receiveTime, processingDuration, expectedDisplayTime, presentationTime)
… it mixes codec-related timing but also rtp-related info
… this brings up a number of questions: where is the metadata exposed in our APIs (e.g. mediacapture transform)
… should I expect .captureTime to be visible in a videoframe
… likewise, there are assumptions on what things should happen in WebRTC (e.g. setting the rtpTimestamp)
… is metadata passed through the pipeline: converting a video frame with mediacapture transform and passing it to webrtc - is this still visible at the end in rVFC? in encoded transform?
… in WebCodecs encoded chunks?
… do we need to file related issues?
Youenn: I filed some of these issues - captureTime etc are planned to move to videoframemetadata
… that should bring consistency throughout the pipeline
… mediacapture transform will not preserve it magically - if you clone the frame, metadata will be cloned along with it
… likewise if it goes through WebRTC PC
… encoded chunks don't expose that metadata - maybe we should; we haven't heard feedback or use cases for that yet
… in terms of what the WG may need to discuss: how do we compute presentationTime? VideoTrackGenerator allows setting the timestamp, but we're not defining what happens on rendering (e.g. re jitter buffer)
Harald: if a processing element has metadata defined both as part of input & output, should we have a general rule about metadata it doesn't understand?
… for the metadata info it knows about (e.g. width and height for an encoder), it won't remain unchanged
… but for metadata that isn't understood, should we have a rule to leave it unchanged?
Bernard: the registry rule is that this is up to the registry-linked spec to define
… not sure we can have a rule that is imposed on all WGs
… a rule would have to be proposed to be enforced
Youenn: individual metadata specs could describe how they're handled by processors
Bernard: next step would be to file specific issues on specific specs
Youenn: the main remaining issue might be on rendering time
… in media capture main
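The pass-through rule Harald asked about (leave metadata that a processing element doesn't understand unchanged) could look roughly like the sketch below. No such general rule was agreed in this discussion; `FrameMetadata` and `carryMetadata` are hypothetical names, and the metadata keys are only examples.

```typescript
// Hypothetical helper; no spec defines this rule today.
type FrameMetadata = Record<string, unknown>;

function carryMetadata(
  input: FrameMetadata,
  produced: FrameMetadata,
): FrameMetadata {
  // Entries the processor does not understand are copied unchanged;
  // entries it produces itself replace the incoming values.
  return { ...input, ...produced };
}

const incoming: FrameMetadata = { captureTime: 1234.5, width: 1280, height: 720 };

// A hypothetical downscaling step that only understands width and height.
const outgoing = carryMetadata(incoming, { width: 640, height: 360 });
```

Here captureTime survives the step untouched while width and height are rewritten, and the input object itself is not modified.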
Face Detection 🎞︎
Tuukka: the face detection proposal now uses videoframemetadata object
Tuukka: looking for feedback on the general direction
Youenn: thanks - looks like a great improvement, and exciting to see this moving forward
… dictionary members probably don't need to be nullable, but some may need to be marked as required
… re center points vs bounding box vs best possible contours: I'm not sure if a sequence is best vs different fields
… not sure about faceDetectionMaxCountourPoints - do we really need this now? can we leave this for later? or have a hint?
… if developers just want a bounding box, maybe we should let developers express it, and send back a detailed contour otherwise
… the example may need an update wrt @@@
… I guess this means the proposal will be split across webcodecs and mediacapture-extensions
Tuukka: the metadata and the constraints are both specified in mediacapture extensions
… are you suggesting the former should be done in webcodecs?
Youenn: not sure - I guess this is testing the registry process
… the registry entry could either define the metadata or link to the mediacapture extensions spec
Tuukka: the constraints and metadata are co-dependent
… they need to be maintained together
youenn: that makes sense; webcodecs has been asking to be able to review metadata when they change, so it may be best to have something in webcodecs space
… we can iterate with webcodecs folks on the details
timp: I like this - looks useful & interesting
… it would be good to document the lifespan and meaning of the id - in particular, that it doesn't allow correlating faces across streams
… re contour & bounding box, I agree with Youenn that they're not the same and should be handled separately, not rely on 4 items == bounding box
tuukka: the goal here was to avoid cluttering metadata as new contour approaches emerge
jan-ivar: looking at the broader question of merging this
… from a privacy perspective, it looks like it doesn't add any concerns over having the detection done in JS
… this looks good to me
Youenn: let's see a PR that editors can iron out and then run a CfC?
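To illustrate the point made by Youenn and timp that a bounding box and a contour are different things, here is a sketch in which they are separate fields rather than a single sequence where 4 items would imply a bounding box. All type and member names (`DetectedFaceSketch`, `boundingBox`, `contour`) are hypothetical and are not taken from the proposal.

```typescript
// Hypothetical shapes for illustration; not the dictionaries in the proposal.
interface Point {
  x: number;
  y: number;
}

interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface DetectedFaceSketch {
  id: number;
  boundingBox: Box;
  // Optional finer-grained outline; its length carries no special meaning.
  contour?: Point[];
}

function describeFace(face: DetectedFaceSketch): string {
  const box = `${face.boundingBox.width}x${face.boundingBox.height}`;
  if (face.contour === undefined) {
    return `face ${face.id}: box ${box}`;
  }
  return `face ${face.id}: box ${box}, contour of ${face.contour.length} points`;
}

// A diamond-shaped contour with 4 points that is clearly not a bounding box.
const diamond: DetectedFaceSketch = {
  id: 1,
  boundingBox: { x: 0, y: 0, width: 2, height: 2 },
  contour: [
    { x: 1, y: 0 },
    { x: 2, y: 1 },
    { x: 1, y: 2 },
    { x: 0, y: 1 },
  ],
};

const boxOnly: DetectedFaceSketch = {
  id: 2,
  boundingBox: { x: 5, y: 5, width: 10, height: 12 },
};
```

With separate fields, a consumer that only wants the box never has to guess whether a 4-point sequence is a box or a coarse contour.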
MessagePort on Capture Handle 🎞︎
Youenn: having a message channel between capturer and capturee makes sense
… a few things off in the API shape that we can iterate on (e.g. event handler in a dictionary - they're usually on objects)
… I'm not sure about the "supportsMessagePort" boolean
… I would prefer we start from a minimal API surface
… also for messageportinvalidated - we should discuss this with the HTML spec folks
… this underlying behavior already exists with other messageports
… I would prefer a name different from "getMessagePort" given its side effects
… I like the integration with capture handle
elad: +1 to "openMessagePort" instead of get...
… I'm happy to discuss reduction of API surface
… s/handle/controller
… my proposal deals both with capture handle and controller - how do you feel about integration with handle?
Youenn: event handler in a dictionary feels wrong
… don't have strong feelings on handle vs mediaDevices in general
elad: the link to capture handle happens both on capturer & capturee
… you commented on only one side?
youenn: on the other side, I would move it to capture controller
jan-ivar: I really like the 1st part of the presentation - agree on use cases & requirements
… would like to iterate on github on the API shape
… generally would agree with youenn to move it to controller rather than track
… I think the direction you're presenting makes sense as a starting point
Elad: so the next step is to surface similar events following that pattern on capture controller
… we should revisit this at the next meeting
enumerateDevices & Focus 🎞︎
jan-ivar: PR #912 allows the behavior in Safari by relaxing the focus requirements a little bit
Youenn: I like this proposal; LGTM
Harald: my reading is that it waits after the gUM prompt has been replied to?
jan-ivar: after it has shown up, not responded to (since that requires focus in any case)
harald: I'll re-read the PR carefully to make sure it doesn't introduce issues
Elad: can you clarify the "anti-spying" behavior?
Jan-Ivar: the PR doesn't change the focus requirement, only its timing
Elad: ok, I'll bring the question on github then
Jan-Ivar: we also had developers complaining that enumerateDevices() blocks when there is no focus (which is marked as an optional behavior)
… the PR proposes to make it tied to visibility, not focus
… this helps backwards compat, and still satisfies the anti-fingerprinting requirement (anti-spying only applies to getUserMedia)
… this would make the check deterministic as requested by the developer
Youenn: so the goal is to reduce friction for developers and align user agent behaviors - that's a good goal
… do you foresee compat issues in implementing this?
… will it fix existing firefox issues that developers were complaining about or does that require developer adoption before it does?
jan-ivar: they would have to add the visibilityState check to avoid being "blocked"
elad: I could use more time to review this
youenn: I think it would be good to get feedback from other UAs and developers
Bernard: do we need a CfC?
Jan-Ivar: developers should be happy given that it relaxes the behavior
Dom: does this need an updated privacy review?
jan-ivar: I don't think so since the behavior was already optional
… and the fuzzing advice is already in the spec
harald: I'll have to review this in detail
Dom: so we can delegate this for final review by Harald, Elad & Youenn?
JIB: SGTM
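A sketch of the visibilityState check Jan-Ivar mentioned, assuming the PR lands as described (enumerateDevices() tied to visibility rather than focus). The function `enumerateWhenVisible` and the `PageLike` / `DeviceSourceLike` types are hypothetical; they stand in for `document` and `navigator.mediaDevices` so the logic can be read and exercised outside a browser.

```typescript
// Hypothetical stand-ins for `document` and `navigator.mediaDevices`.
interface PageLike {
  visibilityState: string;
}

interface DeviceSourceLike {
  enumerateDevices(): Promise<unknown[]>;
}

// Resolves to null without calling enumerateDevices() when the page is not
// visible, so the caller never waits on a call that would not settle yet.
async function enumerateWhenVisible(
  page: PageLike,
  source: DeviceSourceLike,
): Promise<unknown[] | null> {
  if (page.visibilityState !== "visible") {
    return null;
  }
  return source.enumerateDevices();
}
```

In a page this would be called as `enumerateWhenVisible(document, navigator.mediaDevices)`, retrying on the `visibilitychange` event if it resolves to null.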