W3C

– DRAFT –
WebRTC WG 2023-04-18

Attendees

Present
-
Regrets
Dom, Harald
Chair
Bernard, Jan-Ivar
Scribe
scribe

Meeting minutes

Recording: https://www.youtube.com/watch?v=VkBeQdbVjWs

Date: April 18, 2023

Slideset: https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf

PR 147: Add RTCRtpEncodingParameters.codec to change the active codec (Florent)

[Slide 12]

Florent: We failed to mention that this API could also be used for audio. We intend it to cover audio as well. If you have any objections, please mention them now.

Jan-Ivar: Simulcast on audio is probably not on anyone’s table. The use case for audio here is avoiding renegotiation; I just want to clarify that.

If people use this API, what codecs can they choose from? Can they go outside what was negotiated?

Florent: No, only what is negotiated. Any negotiated codec supported by the user agent.

Peter: If someone asks for something that is not negotiated, what happens?

Florent: An exception is thrown, it’s part of the PR.
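
A minimal sketch of the shape under discussion (illustration only, assuming PR 147’s proposed codec member on RTCRtpEncodingParameters):

    async function preferNegotiatedCodec(sender, mimeType) {
      const params = sender.getParameters();
      // Only negotiated codecs may be used; per the PR, anything else
      // makes setParameters() throw.
      const codec = params.codecs.find((c) => c.mimeType === mimeType);
      if (!codec) throw new Error(mimeType + " was not negotiated");
      for (const encoding of params.encodings) {
        encoding.codec = codec; // proposed member from PR 147
      }
      await sender.setParameters(params);
    }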

RESOLUTION: No objection, let’s include audio.

PR Stats/751: RTX and FEC stats are incomplete (Fippo)

[Slide 13]

Fippo: Outbound has RTX stats, but inbound does not. Inbound has FEC stats, but outbound does not, except in the provisional stats. Proposal: add metrics for both sides and merge once there is an implementation.

Jan-Ivar: No objection, but you are asking for more than what is shown on the slide. Is that correct? WebRTC stats is in Candidate Recommendation; if metrics are not implemented, we put them in the provisional spec. Are you providing implementations?

Fippo: Yes I’m providing implementations.

Jan-Ivar: No objection.

RESOLUTION: No objection, Fippo to make PRs: merge into the main spec if an implementation is provided, or into the provisional spec if not.
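
For context, a sketch of the asymmetry as it looks from getStats() today; the symmetric counter names the PRs would add are not settled, so they are only hinted at in comments:

    const report = await pc.getStats();
    for (const stats of report.values()) {
      if (stats.type === "outbound-rtp") {
        // RTX is only counted on the send side today.
        console.log("RTX packets sent:", stats.retransmittedPacketsSent);
      } else if (stats.type === "inbound-rtp") {
        // FEC is only counted on the receive side today.
        console.log("FEC packets received:", stats.fecPacketsReceived);
        // The proposal adds the missing counterparts (e.g. a received
        // RTX counter here, and FEC counters on outbound-rtp).
      }
    }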

Issue 146: Exposing decode errors / SW fallback as an Event (Bernard)

[Slide 14]

Bernard: In the Media WG a useful distinction was made between data errors and resource problems. HW encoders are more restrictive than SW ones. With a data error there is nothing to show the user, but the developer has to debug the bitstream.

If there is a resource problem we may want to prompt the user. Application might re-acquire resources.

That’s my recommendation here. What do people think about distinguishing these two?

Youenn: We already support SW fallback today, so that seems good as-is. I don’t see any reason to notify the app. A resource problem seems similar to “intense CPU usage”. In Safari we have a UI to show this, so again I don’t see what the application can do to help the user that the browser cannot already do instead.

Jan-Ivar: Thumbs up.

Florent: There is also the case of the SW decoder not being able to decode the data. That also needs to be handled.

HW to SW fallback: what if the application wants to change codec instead?

Jan-Ivar: It’s unclear to me if the only option is to fall back to SW. We should be clear about what problem we’re trying to solve.

Henrik: The distinction makes sense, but I don’t see how this slide solves the original problem of letting the app change codec if it doesn’t get HW. If you’re saying SW fallback solves the problem, I don’t agree.

Bernard: The proposal is an app-observable event that distinguishes between the two. In the Media Working Group we want to get this information out of WebCodecs. Just to be clear, this is under investigation; it’s not perfect, but Eugene is trying to address this.

Jan-Ivar: Because of the privacy issue I think we [are hesitant]. (Scribe: not sure what was said here.)

Bernard: The benefit of this approach is that we don’t need more info than this. It’s two bits of information.
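
A purely hypothetical sketch of what those “two bits of information” could look like to an app; none of these names exist or have been proposed in WebIDL:

    // Hypothetical: an app-observable event distinguishing the two cases.
    receiver.addEventListener("decodeerror", (event) => {
      if (event.reason === "data") {
        // Bad bitstream: nothing to show the user; developers debug it.
        logBitstreamProblem(event);   // app-defined helper
      } else if (event.reason === "resource") {
        // Resource problem (e.g. HW exhausted): the app might re-acquire
        // resources or switch codec.
        recoverResources();           // app-defined helper
      }
    });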

Youenn: The privacy concern is real.

RESOLUTION: Waiting for Media Working Group.

Issue 170: Incompatible SVC Metadata (Bernard)

[Slide 15]

[Slide 16]

[Slide 17]

[Slide 18]

Bernard: What metadata do you need to handle things? Dropping spatial layers is easy, but adding them back is not easy because of the way SVC works. Let me show a slide: a diagram that shows how a frame may not be decodable if a frame it depends on was not received for some reason. The same problem can happen if there is a dependency across multiple frames; the first frame could be lost. You need an unbroken chain of dependencies.

Bernard: Another problem is if a mobile device can receive all frames but may not be able to decode them, because the frames are too big for it to handle.

Bernard: The receiver needs to be able to quickly determine whether the frames are decodable, and whether a received frame is necessary for a desired resolution.

[Slide 19]

[Slide 20]

Bernard: We have a PR for EncodedVideoFrameMetadata for SVC (see slide and link to detailed proposal).
… Peter investigated the state of this in Chromium for WebCodecs.

Peter: Not all codecs implemented all of this. I don’t think there was anything in there for chains.

Bernard: The question is why this stuff is in encoded transform if it has not been implemented. Should we remove it?

Florent: Temporal and spatial index are implemented, but it might depend on the codec implementation. E.g. it could be in the VP8 frame descriptor but not in the AV1 bitstream. But if you provide it, you should be able to get most of this. That doesn’t mean there are no bugs. I am working on some tests for SVC in Chrome.
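
As an illustration, a sketch of how an encoded transform might consume these fields today, assuming getMetadata() exposes temporalIndex for the negotiated codec:

    // Keep only the base temporal layer; availability of temporalIndex
    // depends on the codec, as Florent notes.
    const dropHigherLayers = new TransformStream({
      transform(encodedFrame, controller) {
        const { temporalIndex } = encodedFrame.getMetadata();
        if (temporalIndex === undefined || temporalIndex === 0) {
          controller.enqueue(encodedFrame);
        } // else: drop frames from higher temporal layers
      },
    });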

Bernard: So we have this incompatibility between WebCodecs and WebRTC metadata. How to make these compatible?

SUMMARY: Needs further investigation?

Issue 93: MediaStreamTrack audio delay/glitch capture stats (Henrik)

[Slide 21]

Henrik: The audio capture metrics listed on the slide were recently added to getStats(), but issues were filed and the metrics have been marked as a feature at risk. The metrics are only applicable if the media source represents an audio capture device. They allow calculating important quality metrics not available elsewhere, such as audio glitches or average capture delay.

[Slide 22]

Henrik: The issue in w3c/webrtc-stats#741 is that getUserMedia is frequently used without WebRTC, and that we should talk about audio frames rather than audio samples; the stats spec actually means audio frames, as clarified in its terminology section. Lastly, it was not clear why audio may be dropped, so we need to clarify that this happens when audio is not consumed from the real-time audio device in a timely manner.

Henrik: So the main issue, other than clarifications, is that the metrics are in getStats() but they may be useful outside of WebRTC. The proposal is to move them to MediaStreamTrack. This is similar to how we recently added video capture stats to track.getFrameStats(). The second point is name bikeshedding: should we reuse getFrameStats(), rename it, or add a new method specific to audio capture stats?
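
A hypothetical sketch of the proposed shape; the method and field names below are placeholders, since the naming is exactly what is being bikeshedded:

    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const [track] = stream.getAudioTracks();
    // Placeholder name: reuse getFrameStats(), rename it, or add a new
    // audio-specific method; that is the open question.
    const stats = await track.getCaptureStats();
    // Placeholder fields, mirroring the metrics on the slide: glitches
    // occur when audio is not consumed from the device in time.
    const glitchRatio = stats.droppedFrames / stats.totalFrames;
    const avgDelayMs = (stats.totalCaptureDelay / stats.totalFrames) * 1000;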

Youenn: Makes sense. No strong opinion on names.

Jan-Ivar: Correct direction. I’m not sure why we didn’t just call this track.getStats().

RESOLUTION: Moving them to MediaStreamTrack makes sense, Henrik to write a PR.

PR 173 adding presentationTime to RTCEncodedVideoFrameMetadata (Tony)

[Slide 23]

Tony: The PR suggests adding a presentationTime to RTCEncodedVideoFrameMetadata, since the two timestamps don’t match.
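
A sketch of how the proposed field might be read in an encoded transform; presentationTime is the name from the PR, and the final name is still open:

    const logTimestamps = new TransformStream({
      transform(encodedFrame, controller) {
        const metadata = encodedFrame.getMetadata();
        // encodedFrame.timestamp is the existing RTP timestamp; the
        // proposed presentationTime is a separate clock, so the two
        // don't match in general.
        console.log(encodedFrame.timestamp, metadata.presentationTime);
        controller.enqueue(encodedFrame);
      },
    });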

Bernard: Any objections?

Jan-Ivar: We’re overall supportive, but there is a comment about naming bikeshedding.

Bernard: We’re heading down a road where we’re recreating WebCodecs.

Jan-Ivar: No objection that we need it, but the name needs to be figured out.

Tony: Let’s continue on the PR.

WebRTC Combines Media and Transport

[Slide 26]

[Slide 27]

Bernard: WebRTC combines media and transport. Good and bad. Good if it does what you want, but bad if you’re trying to do different things like:
… * Some things can’t be expressed in SDP.
… * Difficult to support bleeding edge codecs
… What if we could separate media and transport?

[Slide 28]

Peter: Tightly coupled, but what if we could cut in two? Then the app can sit between “media” and “transport” parts and tweak. See slide.
… The left part looks a lot like WebCodecs. The right part: RtpTransport?
… The RTP sender API is “stream of packets in; stream of packets out on the wire”. The RTP receiver is “stream of packets received on the wire; stream of packets out”. That is RtpTransport.
… Here is how we could allow constructing these independently of a PeerConnection: (slide).
… This would allow the app to bring its own packetization, jitter buffer and rate adaptation.
… This is similar to WebTransport with datagrams. But it can also be P2P and latency-sensitive congestion control. RtpTransport would solve the following use cases:
… NV07, NV08, NV09, NV15, and (non-consensus) NV38, NV39, NV40.
… See slides with WebIDL.
… If we wanted to do jitter buffers there is more we could do, but the focus right now is RtpTransport. Is this a good direction to go, giving the application more control over transport?
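
A purely hypothetical sketch of the split; every name here is invented to illustrate the “packets in, packets out” shape on the slides, not proposed WebIDL:

    // Hypothetical: construct a transport independently of a PeerConnection.
    const rtpTransport = new RtpTransport(iceTransport);   // invented name
    // Send side: the app brings its own packetization and pacing.
    for (const packet of myPacketizer.packetize(encodedChunk)) {
      rtpTransport.sendRtpPacket(packet);                  // invented name
    }
    // Receive side: the app brings its own jitter buffer and rate adaptation.
    rtpTransport.onrtppacket = (packet) => myJitterBuffer.insert(packet);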

Jan-Ivar: The NV use cases kind of pre-date WebTransport. I think a lot of these use cases have already been met. Some of these use cases are not actually use cases but requirements for use cases. I feel like I’ve already objected to this.

Bernard: No, the NV use cases labeled “non-consensus” don’t have consensus, but the other ones do have WG consensus.

Jan-Ivar: There are now two implementations of WebTransport. I’d like to push back. It seems a bit early to discuss WebIDL.

Youenn: We see a lot of people trying to push a lot of stuff. There is a desire to have an ICE transport. I think it is the right approach. Are there sufficient use cases to warrant this? That is a good question, and we should spend some time there. But if we can gather enough use cases…

Florent: Regarding overlap with WebTransport, I don’t believe it will work in most cases that involve peer to peer, since WebTransport only works between browser and server, not browser and browser.

Randell: I think this is interesting. I have concerns similar but not identical to Jan-Ivar’s. It does seem easy for something to violate bandwidth constraints; perhaps there would be mechanisms in place to block that somehow. It strikes me as being very low level, and it is inviting applications to implement their own implementation of WebRTC. Doing it is one thing, but doing it well is not so easy. I am a little concerned about that, so I think it would be worth discussing a little more what use cases this would enable and what alternatives there might be to resolve those use cases. I have some reservations about the details here. There are two issues with WebTransport: one, as Florent mentioned, it is only client to server. That’s not to say it couldn’t be peer to peer, but that would be additional work to get there. The second issue is that the congestion control currently implemented for WebTransport is not real-time friendly, but that could be fixed with an “I want real-time congestion control” flag.

Peter: I’m a fan of WebTransport and I would like to see it solve both P2P and congestion control. But it will never have the same interop with endpoints. So even if WebTransport becomes everything we want, I think there will still be a need for web apps to have RTP controls. The reason I mentioned jitter buffer and packetization is that these are the main things for somebody to implement; a high quality jitter buffer is not an easy task. I have a proposal for that, actually. Part of the detailed design I have for this addresses how the browser could stop the application from screwing things up. Related to that, sequence number reuse: I do have a solution for that too. But the overall question is whether we want to have these discussions.

Randell: All things being equal, if I had a choice, I would prefer to solve this via WebTransport. Perhaps there are some cases… but I’d rather solve it there if we can. I want clarification on whether this can be solved via WebTransport for the use cases we decide we care about.

Bernard: The IETF will not use WebTransport for peer-to-peer. This working group decided not to work on QUIC. Anyway the protocol question is for IETF and it is not clear that they will solve this.

Jan-Ivar: This would only provide a data channel, which we already have. So are there peer-to-peer use cases that are not already solved through existing technology? You can already use VideoTrackGenerator to send data this way. If there are not enough use cases, this sounds like a premature optimization.

Peter: Regarding use cases: we’ve been talking about them forever, and we’ve never gotten around to making a solid proposal. It has been far too long; this is a shame. We’re just talking about talking about talking about use cases. What I’m proposing is a solution to all of these and more. We’re not getting anywhere talking about use cases.

Youenn: There was talk about being able to replace WebRTC. It would be great to collect exactly what people are trying to solve. If we have enough RTP use cases we should solve them instead of trying to solve them via other APIs. That should not be the path forward.

Bernard: This would need some time. We might want to consider this for TPAC or future meetings.

WebRTC ICE improvements

Sameer: Continuing our discussions on ICE improvements.
… Last time we had 3 proposals. Feedback: can we split this into increments? This is what Peter and I have been trying to do: harmonize into a single proposal with a lot of common ground. It meets all the NV requirements.
… Order of incremental improvements…
… 1-3: candidate pair controls
… 4-5: low latency
… 6-9: ICE connectivity checks, controlling timing and observing checks
… 10-11: candidate gathering
… 12-13: prevent premature pruning or custom pruning
… 14-15: ICE without peer connection and forking
… Peter will talk about the API shapes.

Peter: Cancellable events versus direct control: A and B on the slides. For today, just pick the option you like (we can decide that later) and judge the API as a whole based on the one you like.
… Lots of WebIDL. See slides!

Bernard: If you’re passing an ICE gatherer, I guess you’re max-bundle?

Peter: If you’re willing to do ICE forking, you’re probably willing to do max-bundle.

Jan-Ivar: I do think we want to go in this direction. When it comes down to cancellable events, I think their semantics may make sense; there is a valuable pattern there. On the other hand, if the application wants to remove a pair not in response to an event, there is value to that too. So it would be good if both could be supported.
… It’s hard to imagine how the JavaScript would look when using all of these, but it does seem to me like a lot of defining custom events. I imagine we only need to create custom events in certain cases.

Peter: It might be possible to instead have attributes or something.

Jan-Ivar: But overall it’s a good direction.

Youenn: It’s a long list of interfaces, so I cannot sign off on all of them. But it is good to have a path forward that seems pretty clear. I am hoping that the first API bits are the ones that developers will use first, because then we can start prototyping and shipping more. If the first three add value, we have more motivation to implement more. So let’s dig into that and start being nit-picky about the design and so on.

Peter: We made the list in the order that we think the developers are asking for the most. One option is to really nail down those three and then work from there.

Sameer: Regarding how the JS looks: for the new proposal I don’t have a full example yet, but on my GitHub I do have an example of how the old API looks when used. It is slightly different of course, but it should give some view of how this might look in general.
… Regarding cancellable events versus the other approach: we would still have explicit methods for specific actions. The difference is only in how you affect the default behavior: do you decide ahead of time, or do you decide when the event fires?

Jan-Ivar: Reacting to events versus calling methods seem like different uses.

Peter: The method to remove a candidate pair, for example, exists in both proposals. The difference is just with how to prevent the default action.
… Examples: let’s get down and dirty.
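
An invented sketch contrasting the two shapes (all names hypothetical): option A vetoes the default action when the event fires, option B decides up front via methods:

    // Option A: cancellable event; decide when it fires.
    iceTransport.onicecandidatepairadded = (event) => {   // hypothetical
      if (tooExpensive(event.candidatePair)) event.preventDefault();
    };
    // Option B: direct control; the explicit method exists either way.
    iceTransport.removeCandidatePair(pair);               // hypothetical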

RESOLUTION: The working group supports the direction this is taking. Sameer/Peter to create PRs to solve things piece by piece, starting with the most important use cases (first few bullet points). As we progress we’re expecting to see reason to continue down the list and spec + implement more.

playoutDelay (Jan-Ivar)

[Slide 81]

Jan-Ivar: Chrome has implemented this as “playoutDelayHint”. Multiple issues have been raised. Firefox is now trying to implement and address issues. Questions:
… * Jitter buffer delay OR jitter buffer plus playout delay?
… * Milliseconds versus seconds?
… * How to test?
… * The positive goal should be jitter-free media.
… Delay is a measure of a negative side-effect; it is vague, which makes it hard to test and confusing for implementers. Chrome is inconsistent.

[Slide 82]

Jan-Ivar: Proposal: jitterBufferDelay = value in milliseconds. Let the application compare this to getStats.
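
A sketch under that proposal (the attribute name is the proposed one, not shipped): set a target in milliseconds, then check what the jitter buffer actually did via the existing getStats() counters:

    receiver.jitterBufferDelay = 500; // proposed attribute, in milliseconds
    // Compare against observed behavior using existing inbound-rtp stats:
    const report = await receiver.getStats();
    for (const stats of report.values()) {
      if (stats.type === "inbound-rtp" && stats.jitterBufferEmittedCount) {
        const avgMs =
          (1000 * stats.jitterBufferDelay) / stats.jitterBufferEmittedCount;
        console.log("average jitter buffer delay:", avgMs, "ms");
      }
    }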

Henrik: I like everything you are presenting and think this is how we should have done it in the first place. But I just want to point out that I think the jitter buffer issue in Chrome is a bug in getStats() rather than a bug in playoutDelay. So if this is essentially already implemented under a different name, is it really worth the migration effort to change the name, versus changing the spec?

Jan-Ivar: The spec also says to throw an exception, which Chrome doesn’t do. If you could set this to 20 seconds, that would change what this API can be used for. WebRTC is used for real-time, and 20 seconds is not real time.

Henrik: My understanding is that Chrome clamps. So if you set it to 20 it might not throw, but you still only get 4 seconds, or whatever the max value is, I think. I’m not saying it’s better, but overall the difference between the spec and the implementation seems rather nit-picky, and I’m wondering if it is worth having to migrate.

Jan-Ivar: But “hint” sounds like this is optional. We want to have control surfaces that we can test.

Henrik: I agree; it’s just a question of whether it’s worth the extra effort. I mean, I’m not going to object if we go down this route, but I have a feeling this will come back onto my table. Heh.

Jan-Ivar: Anyone else? So is that it?

RESOLUTION: Let’s go in Jan-Ivar’s direction. Jan-Ivar to make PRs.

Issue 39: Solve user agent camera/microphone double-mute

[Slide 85]

Youenn: The user agent can pause the camera in Safari. This affects the muted state. But applications tend to implement their own mute function, so it would be good if there was a way for the website and UA to sync.

[Slide 86]

Youenn: An OS-side indicator is widely deployed in the UX, so when websites really want to mute they tend to stop() the track rather than mute it.
… With an OS-level microphone indicator, applications tend not to stop() the track, so that speech detection keeps working (the “are you speaking?” hint).
… Proposal: allow application to request to mute/unmute capture.

[Slide 87]

Youenn: It could be placed on MediaStreamTrack, InputDeviceInfo or Navigator. I would personally prefer MediaStreamTrack or InputDeviceInfo.
… Thoughts?

Jan-Ivar: I’m supportive, but there are some privacy concerns. I think the privacy concerns need to be met, which I think means requiring user activation. My preference would be to put it on the track. My bikeshed preference would be to simply call it mute() instead of requestMute().

Youenn: Maybe muting happened a day ago and the user may need to accept a prompt again. This is why it returns a promise.

Jan-Ivar: What is the use case of mute?

Youenn: People tend to clone tracks; what you actually want to do is mute the source, and it is error-prone to hunt down all the tracks.

Jan-Ivar: If I’m transferring a track and there is a clone, I could also request to mute?

Youenn: Muting one track could mute all clones. If a website wants to mute, it is good if we could do that roughly at the source level. Hence the InputDeviceInfo proposal.

Jan-Ivar: I want it on the track, and to focus on mute. I’m not as clear about unmute.
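
A hypothetical sketch of the track-based shape Jan-Ivar prefers, assuming user activation is required and the UA may prompt (hence the promise):

    muteButton.onclick = async () => {
      try {
        await track.requestMute();   // proposed name; bikeshed: mute()
        // Muting acts at the source, so clones of the track mute too.
      } catch (e) {
        // The UA or the user declined the request.
      }
    };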

RESOLUTION: No objections.

Summary of resolutions

  1. No objection, let’s include audio.
  2. No objection, Fippo to make PRs: merge into the main spec if an implementation is provided, or into the provisional spec if not.
  3. Waiting for Media Working Group.
  4. Moving them to MediaStreamTrack makes sense, Henrik to write a PR.
  5. The working group supports the direction this is taking. Sameer/Peter to create PRs to solve things piece by piece, starting with the most important use cases (first few bullet points). As we progress we’re expecting to see reason to continue down the list and spec + implement more.
  6. Let’s go in Jan-Ivar’s direction. Jan-Ivar to make PRs.
  7. No objections.
Minutes manually created (not a transcript), formatted by scribe.perl version 208 (Wed Dec 21 15:03:26 2022 UTC).