Media WG meeting – 30 April 2025

Meeting minutes

Media Capabilities #231

Mark: I created this PR in response to issue 152, which covers a site querying for decoding support with parameters that include color gamut and transfer function
… The discussion was about updating the steps in the spec to take these into account when they don't match the MIME type
… If the passed MIME type isn't compatible with the color gamut and transfer function parameters, we want to return unsupported
… We rewrote the steps to check MIME type support in #222
… There are some details to check on
… I think these parameters only matter when decoding video. The way the steps are written, they're passed as extra input to Check MIME Support, but they're undefined unless you're checking for video support
… That seemed the cleanest way to do it, otherwise you'd have to fork the steps
… The second thing is: I now realise this doesn't cover HDR metadata. I don't know if this was intentionally omitted from the discussion, or an oversight. It could easily be added, with the other parameters
… Finally, there's one test case that checks for the mismatch between color gamut and MIME type. It passes in Chrome and Edge but not in Safari or Firefox
… If we land this I'd want to add tests for all three parameters
… I think the PR is good to land, unless someone feels strongly they want HDR metadata included
… I'd like review feedback
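
A minimal sketch of the kind of query under discussion, assuming the Media Capabilities `decodingInfo()` call with the optional `colorGamut` and `transferFunction` video members. The local type names (`VideoQuery`, `DecodingQuery`, `CapabilitiesLike`), the helper names, and the fixed resolution, bitrate and framerate are illustrative assumptions, not text from the PR.

```typescript
// Local, minimal shapes so the sketch compiles without DOM typings.
type ColorGamut = "srgb" | "p3" | "rec2020";
type TransferFunction = "srgb" | "pq" | "hlg";

interface VideoQuery {
  contentType: string;
  width: number;
  height: number;
  bitrate: number;
  framerate: number;
  colorGamut?: ColorGamut;
  transferFunction?: TransferFunction;
}

interface DecodingQuery {
  type: "file" | "media-source";
  video: VideoQuery;
}

interface DecodingResult {
  supported: boolean;
}

// Stand-in for navigator.mediaCapabilities (assumption: only decodingInfo is needed).
interface CapabilitiesLike {
  decodingInfo(query: DecodingQuery): Promise<DecodingResult>;
}

// Builds a video decoding query; the color parameters are only set when given,
// mirroring the point that they are undefined unless the caller supplies them.
function buildVideoQuery(
  contentType: string,
  colorGamut?: ColorGamut,
  transferFunction?: TransferFunction
): DecodingQuery {
  const video: VideoQuery = {
    contentType,
    width: 1920,      // illustrative values
    height: 1080,
    bitrate: 5000000,
    framerate: 30,
  };
  if (colorGamut !== undefined) video.colorGamut = colorGamut;
  if (transferFunction !== undefined) video.transferFunction = transferFunction;
  return { type: "media-source", video };
}

// Resolves to false when the UA reports the combination as unsupported,
// which is the intended outcome for a MIME type / color parameter mismatch.
async function canDecode(mc: CapabilitiesLike, query: DecodingQuery): Promise<boolean> {
  const info = await mc.decodingInfo(query);
  return info.supported;
}
```

In a browser, `canDecode(navigator.mediaCapabilities, buildVideoQuery('video/webm; codecs="vp09.00.10.08"', "rec2020", "pq"))` would issue the query; whether that particular combination counts as a mismatch depends on the codec spec and the implementation.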

Jer: I can have a look at the PR, to see if I still have comments
… WebKit already does some validation of the MIME type to see it matches other parameters, e.g., height and width, so this is in the same category

Mark: That seems to be how the discussion concluded. Also want feedback if the steps make sense in spec language
… I can look into the test failures too

Jer: I can look too

Francois: Some of the checks look at the color gamut for the MIME type. Is there a way to make that clearer, from an interop point of view? Does it mean you need to check the codec spec for the MIME type? Is there a risk of different interpretations?

Mark: Codecs inherently have support for one or more color gamuts. That hopefully is expressed through profile arguments in the MIME type. So you'd have to refer to individual codec specs to know what's valid

Jer: To expand on that: one is that the codec doesn't support a color gamut or transfer function. Or if parsing the MIME type is somehow in conflict with other parameters passed in

Mark: VP9 has some profile info to say what the color space is

Youenn: A question about MIME type validation. WebRTC is looking at this for MediaRecorder
… Would it be a good approach to reference MIME type parsing from MC API?

Mark: You can query using the 'record' type

Chris: The algorithms could well be reusable, if not exported already

Youenn: So we could call the algorithm from MediaRecorder, and reject if not supported? Are there hooks in the spec for that?

Mark: There's an xxx algorithm

Youenn: I'll file an issue and tag you, Mark
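
A rough sketch of how a recording path might reuse the Media Capabilities check before recording, assuming the `encodingInfo()` call with the 'record' type that Mark mentions. The names `RecordQuery`, `EncoderCapabilitiesLike`, `buildRecordQuery` and `assertRecordable` are hypothetical, and this is not the spec hook Youenn asks about.

```typescript
interface RecordVideoQuery {
  contentType: string;
  width: number;
  height: number;
  bitrate: number;
  framerate: number;
}

interface RecordQuery {
  type: "record";
  video: RecordVideoQuery;
}

interface EncodingResult {
  supported: boolean;
}

// Stand-in for navigator.mediaCapabilities (assumption: only encodingInfo is needed).
interface EncoderCapabilitiesLike {
  encodingInfo(query: RecordQuery): Promise<EncodingResult>;
}

// Builds an encoding query of type 'record' with illustrative video parameters.
function buildRecordQuery(contentType: string): RecordQuery {
  return {
    type: "record",
    video: { contentType, width: 1280, height: 720, bitrate: 2500000, framerate: 30 },
  };
}

// Rejects when the UA reports the MIME type as unsupported for recording.
async function assertRecordable(mc: EncoderCapabilitiesLike, contentType: string): Promise<void> {
  const info = await mc.encodingInfo(buildRecordQuery(contentType));
  if (!info.supported) {
    throw new Error(`Unsupported recording type: ${contentType}`);
  }
}
```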

Media Session #358

Tommy: When we added the enterpictureinpicture event, I suggested adding a flag to know why it was triggered
… We didn't have use cases at the time, so we omitted it for launch
… But we found sites do have a use, some way to distinguish between manually and automatically triggered PiP
… There are web developers who want more info than that. Right now Chrome only automatically opens PiP on a tab switch, but we're thinking about other scenarios, such as minimising
… So we might want an enum rather than a boolean
… Youenn asked about a more declarative way
… I want to propose two things. Some automatic PiP API you can turn on or off, and on a video element
… Additionally, in the Media Session action details, have a reason enum to say why it's triggered
… If a website wants auto-pip on, they want to know the reason. Developers want both
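
A minimal sketch of how a site might consume a reason enum in the action details, assuming the proposal is adopted. The `reason` member and its values ("user-action", "tab-switch", "minimize") are hypothetical and appear in no spec; only the `enterpictureinpicture` action name comes from the discussion.

```typescript
// Hypothetical reason values; the proposal does not define these names.
type PipReason = "user-action" | "tab-switch" | "minimize";

interface EnterPipDetails {
  action: "enterpictureinpicture";
  // Hypothetical and optional, so a UA that cannot report a reason may omit it.
  reason?: PipReason;
}

type PipTrigger = "manual" | "automatic" | "unknown";

// Distinguishes manually from automatically triggered PiP, falling back to
// "unknown" when the UA supplies no reason.
function classifyPipTrigger(details: EnterPipDetails): PipTrigger {
  if (details.reason === undefined) return "unknown";
  if (details.reason === "user-action") return "manual";
  return "automatic";
}

// In a page this could be wired up roughly as:
//   navigator.mediaSession.setActionHandler("enterpictureinpicture", (details) => {
//     recordStat(classifyPipTrigger(details)); // recordStat is a hypothetical helper
//   });
```

Using an enum rather than a boolean leaves room for further automatic triggers without changing the shape of the details.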

Youenn: I think a declarative autoplay policy is something we could consider. Being able to decide in the action handler whether to auto-pip or not isn't something we may be able to implement
… Having a way to declare they want auto-pip is fine. It would just be a preference, and the UA decides
… Having a boolean in the details is useful for statistics, to understand what the user is doing, but not for deciding whether to auto-pip or not?

Tommy: I think having both makes sense, both for stats, and for sites to decide on the fly so they don't set a boolean over and over

Youenn: And not auto-pipping would stop playing the video?
… If they want to decide on the fly, it means on iOS the auto-pip would start, and the video playback would stop. Is that good for the user?

Tommy: Could they close?

Jer: There's a distinction between closing and returning to inline from PiP

Tommy: So in the iOS case their only option would be to turn it off in advance
… Do you oppose adding the details? Is having declarative more important?

Youenn: We haven't discussed in detail internally. The declarative might be higher priority. We can discuss and comment on the issue

Tommy: I also need to update the issue based on discussion with web developers

Jer: Whether or not the details are added, I don't think it would be possible for us to implement it given the architecture of PiP on iOS, so it wouldn't be something we expose or use
… So as long as the spec doesn't require those values to be used...

Tommy: If you open an auto-pip, I don't think it's required to call the enter-pip action. So you can avoid it if you want

Youenn: I agree, the dictionary has attributes that aren't required

Tommy: With iOS, when would you call the enter-pip handler?

Jer: I don't think there's anywhere we expose an enter pip. I don't want to preclude it in the future

Tommy: I'm fine with it being optional, and happy to have both declarative and the extra details for websites to make their UX user friendly

Jer: Declarative is higher priority because the iOS auto-pip is already declarative. You have to say which things will go into auto-pip

Tommy: For the declarative approach, make as part of Media Session, or something on Window or video maybe?

Jer: Good question, will need to think about it

Tommy: I can capture this in the issue

Chris: Suggest raising an issue against the PiP spec for the declarative part

Youenn: Agree. Could be a boolean, but having a way to select which element to auto-pip

Jer: If you make it an attribute on the media element that reflects to a DOM property, it's the most declarative approach
… PiP is a shared resource, one at a time. So you may have a situation where you reduce but not eliminate ambiguity. So having it on Media Session, making it a choice which is auto-pipped, is more declarative

Tommy: If all we have is a way to say "this video element is the one", that's not enough. We have document PiP, so there may not be a video element
… I'll file the issue

Audio Session #6

Chris: w3c/audio-session#6

Chris: Should AudioSession be able to specify the output speaker and/or route options?

Sunggook: Last time we discussed making this part of Audio Session. But every frame could have their own audio session, so this proposal is making it global
… Currently we have setSinkId, and it globally changes the output device for the top level and all sub frames
… It can be called from any iframe
… Question at the time: Do we need a new permission to call setSinkId? setSinkId is per AudioContext or element
… Second issue is iframes, children and siblings. Do we support the vertical tree only? Proposal: allow it to be called from the top frame only, or same-origin

Chris: Is this the same as discussed last time? Top level + permission policy to allow calling from iframes

Sunggook: Use the existing speaker-selection permission? Call from top level and from same-origin iframes
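
The proposed calling rule can be sketched as a simple predicate, assuming a caller is allowed when it is the top-level frame or shares the top-level origin. `FrameInfo` and `mayCallGlobalOutputApi` are hypothetical names for illustration; how this would interact with the speaker-selection permission is left out.

```typescript
// Hypothetical description of the calling frame.
interface FrameInfo {
  isTopLevel: boolean;
  origin: string;         // origin of the calling frame
  topLevelOrigin: string; // origin of the top-level document
}

// Allows the call from the top-level frame and from same-origin iframes;
// cross-origin sub-frames are refused.
function mayCallGlobalOutputApi(frame: FrameInfo): boolean {
  return frame.isTopLevel || frame.origin === frame.topLevelOrigin;
}
```

Under the stricter top-frame-only option, the predicate would reduce to `frame.isTopLevel`.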

Youenn: I'm not sure whether we want the whole tree or the vertical tree. For the top-frame only, we don't have to decide...
… I think it's fine to set at top level, but we should discuss whole tree versus vertical tree
… I'm not sure I like any iframe to set the output for the whole page
… We could allow the API call for the top frame only and for sub-frames to say not supported or not allowed

Sunggook: That would be clearer in this case
… Assuming we only allow calls from the top frame, where to put the API?

Youenn: No preference. For the vertical tree, I prefer to have the mechanism in Audio Session, but it's not clear yet whether it's vertical or whole tree. It could be in Audio Session or in the Audio Output spec (in WebRTC)

Sunggook: Can there be multiple audio sessions in a single frame?

Youenn: Not possible currently. We could have an audio session constructor, to let you tie different audio producers to different audio sessions
… We'd need web developer input to do that work

Youenn: Ok to discuss here, but we'll need to discuss with WebRTC

Nigel: Is this a declarative API? There could be calls at different times at different points in the hierarchy, so which takes precedence? So having a declarative model could help

Sunggook: It's a global API, so the latest call takes over

Nigel: Is that good for users? Can be confusing for users if the order in which you do things becomes important

Sunggook: That depends on the developer providing multiple ways to confuse the user, to call at different times. This is like getUserMedia
… Hence the discussion today about restricting to the top frame

Nigel: The use case in the explainer includes routing different audio to different devices. Seems difficult to set up, a declarative model might be clearer and easier to manipulate

Sunggook: In the explainer there's an example with the web page and a native player both playing audio. So there's a different default device for all pages

Nigel: A related example is an accessibility use case. You might have a group of people watching the same people, someone wants to hear it with audio description mixed in, others don't

Sunggook: I think that's already supported through setSinkId. The audio element or audio context can have its own sink id

Nigel: What if someone changes at the global level, to override a specific setting elsewhere?

Sunggook: If someone chooses a specific output, this global API wouldn't override it. They use the default output from the iframe. So there's an existing setSinkId, that any frame can use. The global API doesn't affect them
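
The precedence Sunggook describes can be sketched as follows, assuming a sink chosen via the existing per-element or per-context setSinkId wins over the proposed global default. `resolveOutputDevice` and its parameters are hypothetical; the empty string stands for the system default output, as with setSinkId.

```typescript
// Picks the output device for one audio producer.
// specificSinkId: what the element or AudioContext set via setSinkId, if anything.
// globalDefaultSinkId: what the proposed global API set for the page, if anything.
function resolveOutputDevice(
  specificSinkId: string | undefined,
  globalDefaultSinkId: string | undefined
): string {
  // A specific, non-default choice is never overridden by the global setting.
  if (specificSinkId !== undefined && specificSinkId !== "") {
    return specificSinkId;
  }
  // Otherwise fall back to the global default, or to the system default ("").
  return globalDefaultSinkId !== undefined ? globalDefaultSinkId : "";
}
```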

Nigel: I may need to do some more reading

Sunggook: Please file an issue, we can continue to discuss

Chris: Summary?

Youenn: There's some consensus that going with top-frame only in the short term is fine
… Two questions to address: Which spec should these go in? And in future should we define the API in terms of the vertical tree or the whole tree?
… For the second issue, a use case to check is WebRTC solutions in websites to provide video calls. They're in an iframe. What do they want to do? Route only their own audio, or route the whole page audio?
… It could depend on where the device picker is. Is it in their hands, or are they only controlling the rendering and sending of audio?

Chris: And continue the discussion here before taking to WebRTC WG

Next meeting

Chris: Our next call is in 2 weeks, at the later time

[adjourned]
