W3C

Media WG Teleconference - 2022-12-13

13 December 2022

Attendees

Present
Alastor Wu, Bernard Aboba, Chris Needham, Dale Curtis, Eric Carlson, Francois Daoust, Frank Liberato, Harald Alvestrand, Jer Noble, Matt Wolenetz, Peter Thatcher, Sushanth Rajasankar, Youenn Fablet
Regrets
-
Chair
-
Scribe
cpn, tidoust

Meeting minutes

ITU-T SG16 Liaison statement on WebCodecs

cpn: We received an incoming liaison statement from ITU-T SG16.

https://github.com/w3c/media-wg/blob/main/liaisons/2022-10-28-itu-t-sg16.md <- Draft reply

cpn: Around WebCodecs, and also around new VVC codec.
… I drafted a reply, which describes WebCodecs, the use cases, a few indications about our own plans such as current work on VideoFrame metadata registry.
… I shared this. Got a thumbs up from Bernard, Jer, Paul.
… I want to make sure that everything we write here is representative.
… I was hoping to get this out before the Christmas break.
… If you haven't had a chance to look at it yet, now would be a good time.

youenn: I like the fact that you state that the group would be open to add a registration provided there was support from implementors.
… I assume that means user agent implementors?

cpn: That's a question for the group perhaps. H.263 comes to mind for instance.

Dale_Curtis: I don't think we want to be gatekeepers of what the registry contains; a codec could be registered even if there isn't support in web browsers per se.
… We'd still want some technical constraints to be met.

cpn: Right. That would apply to any future registration as well.

WebKit update on Audio focus/audio session API

Slideset: https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf

[Slide 2]

Youenn: We received reports that audio handling on iOS isn't easy, e.g., for video conferencing applications
… The intent of the application may not match our heuristics for setting up the audio pipeline
… So a new API may be appropriate
… You might remember the Audio Focus API, initially in Media Session, then split out from that
… There's an explainer, linked from the slides
… The overall goal is to get feedback, is the scope right, next steps?
… Compared to the original Audio Focus API, we wanted to reduce scope, for the iOS platform
… We focused on the audio session category, and interruptions
… The API should support future features such as requesting or abandoning audio focus
… Handling audio providers as a group
… We wrote an explainer, and a prototype in WebKit

[Slide 3]

Youenn: Some examples: setting the audio session category, you can open the demo in iOS
… playAudio and capture functions, for microphone input
… If you call playAudio initially, then capture, it's disruptive in iOS. The reason is that when you play using Web Audio, it's ambient
… Two different audio levels when going from ambient to play & record. Something we want to avoid
… The setCategory function allows you to set the category to play & record, don't use ambient
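The flow Youenn describes might be sketched as follows, assuming the `navigator.audioSession` getter and `type` attribute from the explainer (names come from the proposal and may change; the helper below is hypothetical):

```javascript
// Sketch of avoiding the ambient -> play-and-record disruption by declaring
// intent up front. `navigator.audioSession` is the proposed API; availability
// and exact names are assumptions based on the explainer.
function prepareForCapture(nav) {
  const session = nav.audioSession;
  if (!session) return "unsupported"; // feature-detect the proposal
  // Declare play-and-record *before* starting playback, so the UA can set up
  // the audio pipeline once instead of reconfiguring it when capture starts.
  session.type = "play-and-record";
  return session.type;
}
```

In the demo's terms, calling something like this before `playAudio()` would mean the later `capture()` call no longer forces a jump between two different audio levels.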

[Slide 4]

Youenn: On interruption, when you're in a video call, you might receive a phone call, which is higher priority, and the website is interrupted, capture stopped, audio or video elements may be stopped
… But the website may not know that
… It's also not clear to the website whether to restart audio after the phone call
… Providing the concept of an audio session, which can go between active and interrupted, allows the website to change what is visible to the user
… On an interruption, it could show a UI, or UI to allow the user to restart capture

[Slide 5]

Youenn: We tried to keep the API small. There's an audio session state and audio session type. Then we added an AudioSession interface, which we thought was clearer
… Use that to say it's ambient (mix with others), or play & record, so the UA can set the audio pipeline accordingly
… There are event handlers, no constructor. For simple use cases, a getter on navigator to get the audio session
… A default global audio session. Use this object to query or tailor it
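A hedged sketch of reacting to interruptions with this API shape (attribute, state, and event names are taken from the slides and explainer and may change; the `ui` helper is hypothetical):

```javascript
// Sketch: reacting to audio session interruptions (e.g. an incoming phone
// call) using the proposed default global audio session. Names follow the
// explainer and are assumptions, not shipped API.
function watchInterruptions(session, ui) {
  session.onstatechange = () => {
    if (session.state === "interrupted") {
      ui.showResumePrompt(); // let the user decide whether to restart capture
    } else if (session.state === "active") {
      ui.hideResumePrompt();
    }
  };
}
```

This is the pattern from slide 4: on an interruption the site can surface UI instead of silently losing capture, and let the user choose whether to resume.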

[Slide 6]

Youenn: My main interest is not to go into specific issues. More issues are welcome
… Question: is this of interest, is it going in the right direction? Any thoughts on potential next steps?

Dale: From a Chrome point of view, Mounir and Becca worked on it. At a glance, seems reasonable. There might be worry about duplication between Media Session and Audio Session, but no specific thoughts on that

Youenn: The API shape is different; there might eventually be more than one Media Session in a page, but only one Audio Session
… The decision made in the past to split the two is OK
… We decided to delay the grabbing and releasing of audio focus. There might be other things to consider, e.g., auto play
… A question I have, is it's not yet submitted in the WG. Is it already in scope?

cpn: Looking at the charter, the Audio Focus API is in the list of potential normative deliverables
… We just need to run a call for consensus to adopt the spec into the Media WG

Sushanth: How to handle audio from multiple tabs?

Youenn: This would help with that

Sushanth: If the audio type requested by one tab is playback, and from another is ambient, only one can exist at a time

Youenn: You'd mimic what two native applications would do. One session with playback would probably not be interrupted by another that requests ambient

cpn: At what point would we be ready to run a call for consensus on this?

youenn: If there's already consensus in this call, we'd be interested to run it as soon as possible.
… No particular hurry, but the sooner the better.
… If there's no consensus, we'd like to know what to work on.

cpn: Just worried about support from other browser vendors.

youenn: We talked a bit with Mozilla. I can check with them and get back to you.

alwu: From Mozilla Firefox perspective, that's an API we'd be interested in supporting as well.

Dale: And no reason to hold off calling for consensus while we figure things out internally.

jernoble: In the meantime, feedback on existing issues is welcome.

cpn: So proposed resolution is to run a CfC.

Consistent SVC metadata between WebCodecs and Encoded Transform API

Slideset: https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf

[Slide 2]

[Slide 3]

[Slide 4]

[Slide 5]

Bernard: [going through slides]. Sequence of unsigned long dependencies. There's also some missing information.
… We're essentially re-inventing WebCodecs in another spec, perhaps not the right way to go.
… Two different SVC metadata dictionaries could be avoided.
… Temporal may be shipping in Safari, but spatial is not shipping anywhere.

Dale: I'm in favor of unifying what we can.

Bernard: Proposal is for a few of us to get together and prepare a PR to harmonize things
… This would at least avoid future issues.
… We made some progress in the last couple of days, and Youenn prepared a bunch of PRs that solved a number of type mismatches.

cpn: Is this something for the WebCodecs spec itself or the metadata registry?

Bernard: This is for encoded metadata for which we don't have a registry.

Media Pipeline architecture - Media WG input and WebRTC collaboration planning

cpn: Back at TPAC, we identified several places where we may benefit from coordination between groups.
… This is picking up on where we're at with this.

[Slide 6]

Bernard: We created a Media Pipeline architecture repo following discussions.
… Issues and pointers to sample code covering integration of next generation web media apis.
… Also to go beyond just the specs we mentioned already, e.g. WebTransport which could be used to transport media.
… From time to time, it's hard to understand whether performance issues lie in the specs, the implementations, or the sample code.

[Slide 7]

Bernard: When I started out, I was thinking about capture with Media Capture and Streams Extensions, then encode/decode with WebCodecs (and also MSE v2 to some extent), transport (WebTransport, WebRTC data channels in workers), and frameworks (WHATWG Streams, WASM)

[Slide 8]

Bernard: The pipeline model is based on WHATWG Streams, through TransformStreams piped together.
… When you're sending frames, you have several options, e.g. reliable/unreliable, etc.
… To string these pipelines together, you have to use all of these APIs together. Does it all make sense?
… I don't know that many developers who understand all of these APIs.
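The pipe-based model Bernard describes can be sketched with plain WHATWG Streams. The stage names below are hypothetical stand-ins; in a real app they would wrap MediaStreamTrackProcessor, VideoEncoder, a WebTransport writer, and so on:

```javascript
// Sketch of the TransformStream pipeline model from slide 8: each stage is a
// TransformStream and stages are composed with pipeThrough(). These stages
// are placeholders that just tag each frame with the stage it passed through.
function makeStage(name) {
  return new TransformStream({
    transform(frame, controller) {
      controller.enqueue({ ...frame, path: [...frame.path, name] });
    },
  });
}

async function runPipeline(frames) {
  const source = new ReadableStream({
    start(controller) {
      for (const f of frames) controller.enqueue({ ...f, path: [] });
      controller.close();
    },
  });
  const out = [];
  await source
    .pipeThrough(makeStage("encode"))
    .pipeThrough(makeStage("serialize"))
    .pipeThrough(makeStage("transport"))
    .pipeTo(new WritableStream({ write(f) { out.push(f); } }));
  return out;
}
```

The composition is the point: each capture/encode/transport step is an independent stage, and the whole pipeline is just a chain of `pipeThrough()` calls ending in `pipeTo()`.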

[Slide 9]

Bernard: Some issues already created in the repo.

Media Pipeline architecture repo

Bernard: A lot of the issues are focused on transport.
… There are a few things that are worth discussing here.
… E.g. rendering and timing. Media Capture Transform is an interesting API. Does VideoTrackGenerator have a jitter buffer? Does it not?
… That is not particularly well defined in the spec.

[Slide 10]

Bernard: We have two samples at the moment. One is a WebCodecs encode/decode in worker in the WebCodecs repo.
… The second one adds WebTransport to that. This one took more work to optimize the transport. It adds serialization/deserialization.
… We use frame/stream transport. That's not exactly RTP but it's close.
… We're using SVC at baseline and partial reliability.
… Overall, it's working surprisingly well.
… I had to do a reorder buffer but still not a full jitter buffer.
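The reorder buffer Bernard mentions, which reorders out-of-sequence arrivals without the timing or loss-recovery logic of a full jitter buffer, might look roughly like this (class and field names are hypothetical, not from the sample):

```javascript
// Sketch of a minimal reorder buffer: releases frames strictly in sequence
// order, holding out-of-order arrivals until the gap is filled. Unlike a
// full jitter buffer it has no timing, pacing, or loss-recovery logic.
class ReorderBuffer {
  constructor() {
    this.next = 0;            // next sequence number to release
    this.pending = new Map(); // seq -> frame, held until in order
  }
  // Accept one received frame; returns the (possibly empty) run of frames
  // that can now be released in order.
  push(frame) {
    this.pending.set(frame.seq, frame);
    const ready = [];
    while (this.pending.has(this.next)) {
      ready.push(this.pending.get(this.next));
      this.pending.delete(this.next);
      this.next++;
    }
    return ready;
  }
}
```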

[Slide 11]

Bernard: Here are some of the things that you can play with.
… You can play with this stuff. At the end, it generates a Frame RTT graph. That does not really give you glass-to-glass measurements.
… Performance is pretty reasonable now after some work.

[Slide 12]

Bernard: Slide shows an example with AV1 at full-HD.
… What's interesting is that key frames can be transmitted within a single congestion window.
… General question is what do we do with this?

cpn: That's really great to get that practical feedback from building things.

Bernard: Yes, we're seeing a lot of stuff. Similarly, there are a few things where I don't know enough of the internals to understand what needs to be done.
… You have to be cautious of await calls with WHATWG Streams, since they are going to block. Debugging is also hard.

youenn: Note you may use JS implementations of ReadableStream and WritableStream to ease debugging.

Bernard: Good idea. You can get a dozen stages and you don't really know where things are in the different queues. It's not easy to figure out what happens. The code is fairly small though.
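One way to act on Youenn's suggestion, for the "dozen stages and you don't know where things are queuing" problem, is a pass-through logging stage dropped between any two real stages (a hypothetical helper, not an existing API):

```javascript
// Sketch of a pass-through debugging stage: logs each chunk as it crosses a
// named point in the pipeline, without altering the data. Insert it between
// any two pipeThrough() calls to see where chunks are flowing or stalling.
function tapStage(label, log = console.log) {
  let count = 0;
  return new TransformStream({
    transform(chunk, controller) {
      count++;
      log(`[${label}] chunk #${count}`);
      controller.enqueue(chunk); // pass through unchanged
    },
  });
}
```

For example, `source.pipeThrough(tapStage("after-encode")).pipeTo(sink)` shows how far frames get before a downstream `await` blocks the pipe.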

cpn: Immediate next step?

Bernard: Adding APIs in multiple groups raises questions. It's worthwhile checking in on this periodically.
… I don't want to act like I have a handle on this.

cpn: OK, we'll talk more about how to improve that cross-group collaboration.

cpn: Our next meeting will be in the new year. Happy Christmas and looking forward to seeing you next year!

Minutes manually created (not a transcript), formatted by scribe.perl version 196 (Thu Oct 27 17:06:44 2022 UTC).