W3C

– DRAFT –
WebRTC January 2024 Meeting

16 January 2024

Attendees

Present
Bernard, Fippo, Florent, Guido, Harald, Henrik, Jan-Ivar, Tony, Youenn
Regrets
Dom
Chair
Bernard, Harald, Jan-Ivar
Scribe
Henrik, scribe

Meeting minutes

Recording: https://youtu.be/LhlxjzncXeU

Slideset: https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf

WG Document Status

Bernard: What’s going on with these specs, and how can we make progress? Issues not advancing is a red flag.

WEBRTC-PC:

[Slide 11]

Bernard: How often should we recycle this?

Harald: We have extension specs too that should go from candidate recommendation to recommendation. It would make sense to recycle about once per year.

[Slide 12]

Bernard: 50 open issues, 20 open > 1 year. We’re not in great shape for recycling each year.

Mediacapture-Streams

[Slide 13]

[Slide 14]

Bernard: This is unusual in that it is widely implemented but only a candidate recommendation. 31 open issues, 9 open > 1 year. It doesn’t look like we’re on the road to proposed recommendation.

[Slide 15]

Bernard: Are there issues with the WPT tests, any surprising failures?

Jan-Ivar: There are some issues with testing device infrastructure.

Harald: Are the transferable track errors a sign of the feature not being implemented, or a difficulty with testability?

Jan-Ivar/Youenn: We have not implemented this yet.

Youenn: Maybe we should better organize the specs; the transferable track is an extension.

RESOLUTION: We should move mediacapture extension tests.

MST-ContentHint

[Slide 16]

[Slide 17]

Bernard: This being a working draft seems to be in sync with its status; we should push to advance. It’s not a huge list of issues to push to CR.

WebRTC-SVC

[Slide 18]

[Slide 20]

Bernard: This is also a working draft, but it has been implemented in Chromium. Safari Tech Preview indicates support, but it’s not passing the WPT tests. Is there a Sonoma dependency?

Youenn: I need to look at these tests.

Encoded Transform

[Slide 21]

[Slide 22]

Bernard: This is a working group draft, but the test results are odd in that you have some passes for Firefox and Safari but very little on Edge and Chrome. This is a little bit worrisome; is this an issue with the spec or with the implementations?

Youenn: Chrome is passing 23/27 in the tentative folder; the tests in the top folder follow the spec. Firefox/Safari implement ScriptTransform but not SFrameTransform, so that could be a feature at risk.

Harald: This is a spec problem, we don’t have agreement on a couple of key features of the spec, so the tests in tentative reflect the state of implementation before we come to agreement.

Bernard: So this one seems to have some legitimate issues keeping it back. But they won’t go away if we don’t talk about them.
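
For reference, a minimal sketch of the ScriptTransform API discussed here (RTCRtpScriptTransform as specified in webrtc-encoded-transform; the worker file name, the `side` option key, and the pass-through logic are illustrative, and the options bag is app-defined):

```ts
// Main thread: attach a transform to the first sender.
// Assumes an existing RTCPeerConnection `pc`.
const worker = new Worker('encoded-transform-worker.js');
const [sender] = pc.getSenders();
sender.transform = new RTCRtpScriptTransform(worker, { side: 'sender' });

// encoded-transform-worker.js: forward every encoded frame unchanged.
onrtctransform = (event) => {
  const { readable, writable } = event.transformer;
  readable
    .pipeThrough(
      new TransformStream({
        transform(frame, controller) {
          // frame.data (an ArrayBuffer) could be inspected or encrypted here
          controller.enqueue(frame);
        },
      })
    )
    .pipeTo(writable);
};
```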

MediaCapture Transform

[Slide 23]

[Slide 24]

[Slide 25]

Bernard: Not much has been happening since October 2022. Chromium and Safari Tech Preview implement it. 18 open issues, 17 open > 1 year.

Youenn: It’s partially implemented, but like with ScriptTransform, Chromium implements a previous version, so this could be a similar status. There are two API versions in different browsers.

Bernard: But the functionality is the same; the difference is only in the API shape?

Youenn: Yes.

Harald: The key issue is availability on the main thread, and we have a problem with transferability. Transferring MediaStreamTrack is not implemented by any browser.

Bernard: So MediaStreamTrack transferability relates to the implementability of this spec.

Guido: It’s similar, but it’s not quite the same as encoded transform, in the sense that Chromium implements mostly a superset of what the current spec says, which is basically availability on window and support for audio. There is one small difference in API shape, which is MediaStreamTrackGenerator; it’s very similar to the generator that we have in the older version, and we could very easily make a version that is compatible with the spec. But the main thing blocking is the transferability of the track, which nobody has implemented.
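
For reference, a minimal sketch of the Chromium (window-exposed) shape Guido describes; the spec’s worker-side VideoTrackGenerator shape is analogous but depends on MediaStreamTrack transferability, which no browser ships yet:

```ts
// No-op video pipeline: camera track → processor → generator.
async function passthrough(): Promise<MediaStreamTrack> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();
  const processor = new MediaStreamTrackProcessor({ track });
  const generator = new MediaStreamTrackGenerator({ kind: 'video' });
  processor.readable
    .pipeThrough(
      new TransformStream({
        transform(frame, controller) {
          controller.enqueue(frame); // a real app would process the VideoFrame
        },
      })
    )
    .pipeTo(generator.writable);
  return generator; // the generator is itself a MediaStreamTrack
}
```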

Youenn: We have a prototype that will probably be available in Safari Tech Preview in the coming months. Just video, not audio. It’s not enabled by default or complete yet.

Bernard: In summary, good news: widely implemented specs, but the specs are lagging behind implementations. It doesn’t seem like a huge task. But then there are the transforms where there are real spec issues. So in the next couple of meetings we should try to make progress on these blocking issues.

BLOCKING ISSUES

setCodecPreferences vs unidirectional codecs

[Slide 30]

Fippo: setCodecPreferences does not directly affect the send codec, but webrtc-pc looks at both send and recv. We could either…
… Fix webrtc-pc by removing mentions of send codecs.
… Clarify the codec match algorithm.
… Do we agree that we should remove the send codec?

[Thumbs up from several people.]
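
For context, a minimal sketch of steering codec selection from receive-side capabilities, which matches the direction agreed above (VP9 as the preferred codec is an illustrative choice):

```ts
// Prefer VP9 for a video transceiver; assumes an RTCPeerConnection `pc`.
const transceiver = pc.addTransceiver('video');
const { codecs } = RTCRtpReceiver.getCapabilities('video')!;
transceiver.setCodecPreferences([
  ...codecs.filter((c) => c.mimeType === 'video/VP9'),
  ...codecs.filter((c) => c.mimeType !== 'video/VP9'),
]);
```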

Harald: I’m doing a deep dive in a follow-up issue and would like to get input.

WebRTC spec should explicitly specify all causes of a PC sourced track being muted

Jan-Ivar: We’ve discussed mute in a lot of specs, and there still seems to be some doubt about what mute means in webrtc-pc (remote tracks). Part of the problem is that mediacapture-main has two jobs: defining what MediaStreamTrack is and defining device capture. In short, the mute event is controlled by the user agent and driven by the source. But for WebRTC, the source is an incoming track.
… WebRTC-PC defines its own set of mute/unmute steps, but there is a lack of clarity about whether what mediacapture-main says about muting, which is more specific to camera and microphone, still applies here.
… The way I read the spec, WebRTC-PC’s definition is a full description that replaces mediacapture-main’s.
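
For context, the user-agent-controlled mute state under discussion is observable on a remote track like this (a minimal sketch; the log messages reflect webrtc-pc’s RTP-based mute/unmute steps):

```ts
// Assumes an RTCPeerConnection `pc`; remote tracks start muted until RTP arrives.
pc.ontrack = ({ track }) => {
  track.onmute = () => console.log(`${track.id}: muted (no media flowing)`);
  track.onunmute = () => console.log(`${track.id}: unmuted (media resumed)`);
};
```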

Harald: I think there are situations where it is natural to mute that are not listed, for example if the network is disconnected, or the state of the peer connection goes to disconnected. It would seem reasonable to mute the tracks.

Youenn: We should get consensus on when mute should happen. I would try to get consensus on why we mute and we should list that.

Jan-Ivar: My proposal is that webrtc-pc list all the reasons. We did this for constraints.

Henrik: I think we should separate the question of where we define mute reasons from the question of whether all mute reasons are listed. I agree with Harald that we should mute if the PC disconnects, but I think webrtc-pc should say this, not mediacapture-main.

Youenn: I agree and we can add reasons in a PR.

Jan-Ivar: We should focus on what has already been implemented.

Harald: I think we have consensus that for any case where we agree that browsers should mute, like the BYE, that should be in the spec. I don’t think we have consensus if it is up to the user agent to mute at other times.

Jan-Ivar: Mute and unmute steps defined in different specs could overlap and cause races.

Youenn: Maybe WebRTC-PC can remain open-ended, but other specs should not be; hopefully nobody is implementing mute for canvas capture, and it would be good if the spec said so. Then we could follow up on that discussion.

Jan-Ivar: I think the reason mediacapture-main’s mute is open-ended was privacy, which may not apply to other specs. Hopefully other specs don’t need open-endedness, so that we can get implementations to converge.

Youenn: It would be good if we could get a list of reasons why Chromium might mute.

General approach to capabilities negotiation

[Slide 32]

Bernard: MediaCapabilities indicates “supported”, “powerEfficient” and “smooth”.
… PING did a review in March 2021. They liked the fingerprinting analysis but questioned why we expose device capabilities for the purpose of negotiation, as opposed to having the user agent negotiate based on capabilities and pick the one it likes best. The problem is that this does not work with the RTC media negotiation model; it sounds more like a streaming use case model. No progress for years, and PING wants progress.
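
For reference, a minimal sketch of the MediaCapabilities query under discussion, using the spec’s “webrtc” decoding type (the configuration values are illustrative):

```ts
// Ask whether a WebRTC VP9 decode is supported, smooth, and power-efficient.
async function probeDecode(): Promise<void> {
  const info = await navigator.mediaCapabilities.decodingInfo({
    type: 'webrtc',
    video: {
      contentType: 'video/VP9',
      width: 1280,
      height: 720,
      bitrate: 1_500_000,
      framerate: 30,
    },
  });
  console.log(info.supported, info.smooth, info.powerEfficient);
}
```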

[Slide 33]

[Bernard is doing PR #212 (see slide) and wants reviews.]

Align exposing scalabilityMode with WebRTC “hardware capabilities” check

[Slide 34]

[Slide 35]

Bernard: PING also did a review of scalabilityMode and said it would expose additional fingerprinting surface. But it’s not exposed in MediaCapabilities.
… Through trial and error you could set scalabilityMode to check which modes are or are not supported, but that does not tell you whether it is hardware or software. Maybe you can figure it out via performance in getStats.
… The bottom line is that in webrtc-svc you only get a subset of what is exposed in MediaCapabilities. We also don’t want to add a hardware check for MC since it can be used for streaming use cases.
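
A minimal sketch of the trial-and-error probing Bernard describes, assuming the webrtc-svc behavior where an unsupported mode causes addTransceiver to throw:

```ts
// Returns true if the user agent accepts `mode`; says nothing about
// whether the implementation is hardware- or software-backed.
function supportsScalabilityMode(mode: string): boolean {
  const pc = new RTCPeerConnection();
  try {
    pc.addTransceiver('video', { sendEncodings: [{ scalabilityMode: mode }] });
    return true;
  } catch {
    return false; // unsupported modes are rejected
  } finally {
    pc.close();
  }
}

console.log(supportsScalabilityMode('L3T3'));
```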

Henrik: scalabilityMode is a subset of MC, so I don’t understand: did PING say MC is OK, or is it that they haven’t had time to object to MC yet? These issues are entangled, so I think we need to be consistent.

Florent: We need them to understand the way RTC works on the Internet. How about we invite them and explain the situation?

Jan-Ivar: What’s unique with SDP is that it is exposed to JavaScript, so there is no way not to expose this. But if MediaCapabilities is not exposed, you could still do a suboptimal call, so we need to figure out if that is tenable. We could determine the minimum set of codecs that need to be exposed, and if those are the same across browsers then it wouldn’t say much.

Harald: I don’t think we should waste time discussing such redesign, at least not on this basis. Our current webrtc-pc is what it is.

Bernard: Codecs tend to come in waves, so really all you’re learning is whether they have a new device or not; it’s not a huge privacy risk.

Youenn: We don’t have the same analysis; we think it is a real issue. As the older devices diminish, it will become a very important fingerprinting surface.

Bernard: I will continue to work on the privacy analysis.

How does generator.mute change track stats?

[Slide 36]

Bernard: What happens when you mute via the generator’s muted attribute? One option is to fire the event directly; another is to queue a task to fire mute on all of the clones.

Proposal: let’s go with the second option.

RESOLUTION: Let’s go with second option
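
A minimal sketch of the behavior this resolution implies, using the worker-side VideoTrackGenerator and its muted attribute from mediacapture-transform:

```ts
// In a worker: muting the generator queues a task to fire "mute" on the
// generated track and all of its clones (per the resolution above).
const generator = new VideoTrackGenerator();
const clone = generator.track.clone();
clone.onmute = () => console.log('clone saw mute');
generator.muted = true; // both generator.track and clone fire "mute"
```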

Is RTCEncodedVideoFrameMetadata.frame_id actually an unsigned long long or does it wrap at 16 bits?

[Slide 37]

Tony: In Chromium this is implemented from the dependency descriptor, which is 16 bits, but unwrapped into a 64-bit unsigned integer on the receiver side.
… Proposing to keep unsigned long long. frameId is a monotonically increasing frame counter, and its lower 16 bits will match the frame_number of the DD header extension.

Bernard: Do we care about dependency chains? There could be circumstances where the dependency is fulfilled. [...] I support what this slide is saying, we can talk about chains in a separate issue.

RESOLUTION: Consensus to move forward with Tony’s proposal
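
For illustration, a 16-bit-to-64-bit unwrap of the kind Tony describes on the receiver side (a sketch, not Chromium’s actual code):

```ts
// Extends the 16-bit DD frame_number into a monotonically increasing frameId.
class FrameIdUnwrapper {
  private last: bigint | null = null;

  unwrap(frameNumber: number): bigint {
    if (this.last === null) {
      this.last = BigInt(frameNumber); // first frame seeds the counter
      return this.last;
    }
    // Signed 16-bit delta in [-0x8000, 0x7fff] relative to the last value.
    const low = Number(this.last & 0xffffn);
    const delta = ((frameNumber - low + 0x8000) & 0xffff) - 0x8000;
    this.last += BigInt(delta);
    return this.last;
  }
}

const unwrapper = new FrameIdUnwrapper();
console.log(unwrapper.unwrap(0xfffe), unwrapper.unwrap(0x0001)); // 65534n 65537n
```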

Mark resizeMode, sampleRate and latency as feature at risk

Jan-Ivar: Some constraints only have one implementation, so the proposal is to mark them as feature at risk.

Guido: I object because resizeMode is widely used by people who use Chromium; some users have even requested additional resize modes. The other three Chromium implements to varying degrees; latency is used particularly on Windows for users to select capturing with the lowest possible capture sizes. So even if we eventually remove them from the spec, we would not be able to remove them from the web because it would break the web. sampleSize is implemented and exposed by Chromium, but I’m not aware of any use case.

Henrik: I think sampleRate relates to another issue where people today use SDP munging to change the codec sample rate; a possible outcome of that was whether it should use the track’s sample rate, but I’m not sure about the status of that.

Youenn: Maybe we can move these to mediacapture-extensions instead of marking them as feature at risk? Or maybe both. But eventually we may remove them from mediacapture-main.

Guido: I think it makes sense for sampleRate, sampleSize and latency but for resizeMode I think it is important.

Jan-Ivar: Is Chromium’s default to automatically downscale?

Guido: Yes.

Jan-Ivar: We’re also planning to make this the default, so I’m curious what the remaining use case is for developers to turn this off.

Guido: Some people want to make sure they get a native resolution.
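
For context, that native-resolution use case maps to the resizeMode constraint like this (a minimal sketch; “none” disables the user agent’s crop-and-scale):

```ts
// Ask for frames at the camera's native resolution, with no downscaling.
async function openNativeCamera(): Promise<MediaStream> {
  return navigator.mediaDevices.getUserMedia({
    video: { resizeMode: 'none' },
  });
}
```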

RESOLUTION: Move sampleSize, sampleRate and latency to the extension spec. And then work harder on resizeMode.

Highly detailed text in video content

[Slide 40]

Harald: We have some text saying that if you set contentHint to “text” you activate some flags in AV1. The original PR was intended to acknowledge that in some scripts, details matter more than in others (see examples). If you downscale those fonts, they become unreadable sooner than ASCII would.
… Bernard also noted that red text on a yellow background will work worse than black and white if 4:2:0 coding is used, and recommends 4:4:4. We may not want to mandate that due to extra overhead, but we could…
… Reword the addition to note that encoding of colored text may cause readability issues.
… Recommend 4:4:4 if colored text dominates when contentHint “text” is used.
… Mandate use of 4:4:4 for this case.

Youenn: I wouldn’t go with mandating; it is a contentHint, so I’m all fine with saying “hey user agents, please advise…”, but in terms of mandating, I’m not sure we will have wording that is always right, so I think that’s too far. So 3 is out.

Bernard: 3 is also out for me; there’s a lot of extra bandwidth, and is this even supported? Anyway, it’s certainly not prevalent, so mandating seems too much. Even recommending seems pretty strong for something like this; it’s almost like saying that someone who implements AV1 must implement it.

Jan-Ivar: I would also gravitate towards the lower-numbered proposals. In addition, there could be an API that is not hint-based, for example constraints that specify this explicitly. I’m reluctant to add new functionality that only acts on a hint on the track. Perhaps there should be a corresponding API on the sink instead. That would rule out 2 and 3.

Fippo: We do have 4:4:4 support for H264, but I wouldn’t recommend it too much; people can codec-negotiate all they want. I’d go for 1.

RESOLUTION: Consensus on proposal 1 (note, not recommend).
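
For reference, the hint under discussion is applied per track (a minimal sketch; the screen-share source is illustrative):

```ts
// Mark a screen-share track as text-heavy so encoders can favor readability.
async function shareTextContent(): Promise<MediaStreamTrack> {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const [track] = stream.getVideoTracks();
  track.contentHint = 'text'; // ignored by user agents that don't support it
  return track;
}
```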

Comments and request from APA review

[Slide 41]

Harald: The APA has reviewed contentHint. See slides for issues that we may or may not need to address.
… We don’t do links between things at this level. I suggest saying that in the track model, a track is a track is a track. Things that link tracks together need to be specified at a higher level. We should not have regions on videos. I think in general we should reject.

Bernard: I think in general this is not a problem for the MST contentHint spec, but there are things in the media capture working group worth discussing. There may be some regulation that applies to some of this.

Harald: But these things can be addressed on a higher layer, for example I just turned on CC so it is possible to have a separate track with subtitles.

Bernard: I’m just worried that APA gets ignored like PING, maybe we should have a joint meeting.

Harald: We could do this at TPAC if we get the right people into the same room.

Harald to draft a reply.

WebRTC-Extensions: API to control encode complexity

[Slide 45]

Florent: We want to be able to optimize the tradeoff between device resource usage and compression efficiency for different use cases, affecting CPU, video bitrate and quality.
… We looked at similar APIs: in Android Media it’s a 0-9 integer, on Azure Media Services it’s “speed, balanced, quality”, and in x264 (an H264 library) there is a wide range of presets from ultrafast to veryslow.
… The actual results could vary depending on the codec or specific encoder used and are not meant to be fixed by the specification. But we expect average encode time and QP to be affected as per the slide, depending on a low, normal or high complexity mode.
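
To make the proposal concrete, a purely hypothetical sketch of a per-encoding knob; `encodeComplexity` and its values are invented for illustration and are not in any spec:

```ts
// Hypothetical: raise encode complexity on the most important stream.
// `encodeComplexity` does not exist; the name and values are invented.
async function preferQuality(sender: RTCRtpSender): Promise<void> {
  const params = sender.getParameters();
  (params.encodings[0] as any).encodeComplexity = 'high'; // 'low'|'normal'|'high'
  await sender.setParameters(params);
}
```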

[Slide 46]

Youenn: Sometimes compressing more is better for battery, because transmitting less data uses less power. How does an app decide what a good setting is, and do web applications already have this information, or is it not available? Have you considered specifying whether you prefer battery or quality as an alternative API shape?

Florent: There are different options that could be discussed. I think it is important to have a per stream setting, for example the presentation is more important than the face.

Youenn: I wonder how the page could know which setting to use.

Florent: It’s more about knowing what the stream is used for. Driven by use case. For example thumbnail is less important. But the app could also monitor encode time in getStats and use that to decide.

Jan-Ivar: I was going to say we’re going from the user agent, which has a lot of information, to the web application, which has less information, so it could make things worse; but you make a good case that it can know that one stream is more important than another. So I just have a bikeshed question on the naming. But I’m also concerned that a web app might just ask for high quality across the board.

Florent: I’m not opposed to changing the names. This is more to say whether this is more important or less important. This is mostly about CPU time allocated to encoding. It’s very common in other APIs.

Jan-Ivar: Would medium or middle mean that the user agent decides?

Florent: Yes, the user agent decides, but you can also have the web application tell the user agent to …

Jan-Ivar: It might be better to have the default be unset.

Fippo: Should this also exist for audio? It sounds a lot like a setting that exists in Opus, where there is a value between 1 and 10.

Florent: I’m not opposed, but it would be browser- and codec-dependent, so it’s more like a hint to the browser. But there is nothing preventing us from doing this for audio as well.

Youenn: The user agent is still doing degradation and adaptation, so this sounds more like a priority between streams rather than CPU or QP.

Florent: But if you use less time that would affect the QP.

Henrik: I don’t think this is just about priority between streams - it’s that too, but I think even for a single stream you could have one use case where you only care about bitrate but another use case where it’s all about quality. Right?

Florent: Yes.

Bernard: What about upper or lower bounds? Is there a limit? Can it affect jitter buffer?

Florent: It’s still WebRTC deciding; it’s up to the user agent to ensure the impact is minimal.

Harald: The control knob should be specified in terms of encoding, not priority; we already have priority APIs. I don’t much care about the name, but it needs to be specific to encoding.

Bernard: Do we have consensus we want to go ahead with this?

Youenn: I think it’s worth exploring.

[Fippo gives thumbs up.]

[Florent will come up with a PR so we can iterate]

Summary of resolutions

  1. We should move mediacapture extension tests.
  2. Let’s go with second option
  3. Consensus to move forward with Tony’s proposal
  4. Move sampleSize, sampleRate and latency to the extension spec. And then work harder on resizeMode.
  5. Consensus on proposal 1 (note, not recommend).
Minutes manually created (not a transcript), formatted by scribe.perl version 222 (Sat Jul 22 21:57:07 2023 UTC).