W3C

– DRAFT –
Media Working Group TPAC meeting

14 November 2025

Attendees

Present
Alastor Wu, Andy Estes, Bernd_Czelhan, Chris_Needham, Dana Estra, Dom_HazaelMassieux, Eric Carlson, Erik Sprang, Eugene Zemtsov, Francois_Daoust, Fredrik Hubinette, Gabriel Brito, Greg Freedman, Guido Urdaneta, igarashi, Jean-Yves Avenard, Jordan Bayles, Kensaku Komatsu, Lei_Zhao, Mark Foltz, Markus_Handell, Nigel_Megitt, Paul_Adenot, Randell Jesup, Steven Becker, Sushanth Rajasankar, Wolfgang_Schildbach, Xiaohan Wang, Youenn Fablet, Yuichi Morioka
Regrets
-
Chair
Chris, Marcos
Scribe
nigel, cpn, tidoust

Meeting minutes

Agenda

cpn: Quick review of agenda
… EME - I have some extra info from CTA WAVE
… Then Markus on Media Pipeline performance
… Then the morning break
… AB visit to talk about Process
… WebCodecs Reference Frame Control and Encoded Transform, following up from joint meeting with WebRTC yesterday
… iframe media parsing
… lunch
… Media Capabilities
… Audio Session
… Registries
… MSE
… wrap-up
… Any changes to the agenda?

no changes requested

cpn: [Health and safety rules, Code of conduct reminder, antitrust and competition policy]
… IRC channel: #mediawg
… Please join - we would like to use that for the speaker queue, q+ yourself
… I will also try to respond to people raising a hand on zoom

xhwang: I was just seeing if it worked!

Charter

cpn: We've just rechartered
… Main changes are looking at a protected media pipeline including WebCodecs
… Nothing on that today
… We're very specific about what we intend to do with EME; we have coverage for the features Xiaohan will take us through
… We have 10 specifications in progress, all in WD except Media Playback Quality in ED.
… Question if we should hand that over to WHATWG or keep working on it here.
… My goal as Chair is to help us make progress and advance specs to their next maturity status, eventually to Rec.
… Next after WD is Candidate Recommendation Snapshot.
… That needs us to address feedback and complete Horizontal Review and Wide Review

cpn: Want to maximise the time we have today to move what we can forward.

EME

xhwang: [shares screen]
… Suggest following the order of issues on the agenda

<tidoust> Key System divergence on setServerCertificate() fallback behavior

Key System divergence

issue #573

xhwang: Recent issue
… Some key systems do setServerCertificate but not all.
… Overall I feel we should not regulate this in the spec but relax it a bit.
… There's a note saying it's intended to be an optimization, and is not required.
… There are existing applications that do not call `setServerCertificate()`
… I think it's a nice optimization but the spec shouldn't need to enforce it.
… I added details about Fairplay and Widevine

cpn: If it's not a required step, the developer needs to know which system they're using to know whether to call it?

xhwang: I actually have this situation.
… The key system is required to support requesting cert from the server via a message
… so why bother doing it?
… Our current spec makes it look like there's optimization built in but it can still work if not called,
… in reality I don't think that's the case.

xhwang: My proposal is to say SHOULD call `setServerCertificate()` otherwise it may fail.
… and a note to say that the message option is not supported by all key systems
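
A minimal sketch of what the proposed guidance looks like in application code, assuming a hypothetical key system string and certificate endpoint:

```js
// Pre-provision the server certificate where the app has it, per the
// proposed SHOULD; tolerate key systems where the call is unsupported.
const access = await navigator.requestMediaKeySystemAccess('com.example.drm', [{
  initDataTypes: ['cenc'],
  videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.42E01E"' }],
}]);
const mediaKeys = await access.createMediaKeys();
try {
  const cert = await fetch('/drm/server-cert').then(r => r.arrayBuffer());
  await mediaKeys.setServerCertificate(cert); // optimization where supported
} catch (e) {
  // Some key systems instead request the certificate via a message exchange,
  // but per the proposed note that path is not supported by all of them.
  console.warn('setServerCertificate not usable here', e);
}
```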

cpn: Does this look okay to people?
… You had a question around Fairplay behaviour. Is the answer to that going to change the proposal?

xhwang: I don't think so. I don't see a reason to regulate it, it's too strict.
… Unless Fairplay or Apple really wants to implement it.

eric-carlson: It seems like a reasonable change, not having a MUST when not every system can support it.

xhwang: Agreed

cpn: Agreed
… If it's not a PR already please open one, it seems uncontroversial

xhwang: I will work on that

Mixed encrypted and unencrypted content

<cpn> w3c/encrypted-media#251

xhwang: Not all systems support specifying mixed encrypted and unencrypted content

<tidoust> Specify mixed encrypted/unencrypted content

xhwang: There are a lot of discussions in the issue.
… My summary of where we agreed:
… UAs must support switching between clear and encrypted streams if MediaKeys is set before playback starts
… Sets clear expectations
… Ad insertion is a use case
… The UA can set up the media pipeline correctly if this is set.
… Otherwise the media pipeline assumes clear playback and when the stream switches to encrypted then in many cases it causes a failure
… For UAs to support this especially with hardware decoding it is complicated
… Having this text in the spec will make the implementation a lot easier
… That's the proposal
… There are some questions about the detail of the spec

cpn: I haven't had time to review your answers.
… Do any of them change the proposed text?

xhwang: I think it's more about the notes.
… You ask a good question what does it mean "before playback starts"?
… I think it's something like when the readyState is HAVE_NOTHING

cpn: Makes sense to tie it to the media element state

xhwang: We can work on those details.
… Does anyone have any bigger comments or objections to this direction?

jya: From what I've seen working on Safari bugs, when there is a switch of content,
… whether playback starts with encrypted content or switches to it, especially with Netflix I've seen
… them send a new init segment that indicates the encryption state,
… so there's no notice period
… They might play 1 minute of clear then switch to encrypted.
… From the UA perspective all I see is a new init segment
… and no other information from the website

xhwang: That's true. Subtle difference is that the clear lead allows them to send a signal that the content is encrypted
… but all the frames are clear at first
… The media pipeline doesn't get any signal.
… Either way you're right that there's no other signal
… My point is that in implementations like Chromium, the pipeline cannot support the switch
… So the page needs a workaround like a new media element, but the switch is not smooth
… Intent of the proposal is to prep, without needing even to fetch a license.
… Cost is minimal, useful signal for setup.

jya: I understand from an EME perspective but in MSE I don't think you can indicate the encryption

xhwang: True

greg: What signal does jya need? We are doing exactly what he says

xhwang: For MSE change type we should have some indication for the switch

greg: You're right we can indicate when we switch from clear to encrypted
… and we were in favour of this proposal to ensure this continues to work.
… We're open to suggestions but would prefer not to change media sources.
… That's one of the alternatives - we want it as seamless as possible but don't object to that.

jya: MediaSource has been made so that you simply enqueue a new init segment that indicates what
… you are going to change to. It's commonly used even though there's no clear indication that you're going to switch.

xhwang: Are you proposing an addition to this proposal?

jya: No, good to have coordination between the two [MSE and EME]
… When you do the switch it's already too late. We need an early signal.
… I agree some coordination for additional information would be nice.
… Will the UA or application do anything differently with the additional signal?
… (asking myself!)

jya: I'm not too sure. A lot of things I've seen recently start with MSE and clear content,
… and start a new media key session later in the stream and it just magically works.
… Not seen any way to signal that pattern. I believe it's supported by all UAs so maybe it is a non-issue.
… I don't personally deal with hardware that has such restrictions.
… But there's no way to query if starting with clear and then switching to encrypted, with the same MSE object, is supported.

xhwang: Right, there's no such API
… We do work on this issue a lot with hardware

greg: Not all UAs handle this in practice

jya: Oh I see. I've seen Firefox and Chrome do it, but maybe not on all hardware.

xhwang: There are more issues with hardware pipelines related to this
… I feel the change we're saying is pretty straightforward for apps to do and the cost is minimal

cpn: The normative change is the first part, requiring implementations to support switching if the MediaKeys is set.
… Concern about notes re quality of implementation issues and if some of it should be normative
… Do we have enough capability detection, maybe we don't?

tidoust: I'm wondering if having some test cases would help make sure we have a shared understanding
… about what it means to switch and what "before playback starts" means.
… Making sure that this proposed statement can be tested.

cpn: And whether the mechanism is to have a separate media source.
… MSE has a detachable source buffer change - does it have a bearing on this?

greg: It's related and we're in favour of that for other reasons but this is another issue.
… jya's proposal allows a solution to a different problem, keeping buffers while switching sources

xhwang: What MSE does at the javascript level doesn't affect this issue.
… I agree we can give some examples.
… Example would be code with MSE start with clear and switch to encrypted
… and either set mediakeys up front or not. That would be the test.
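
A rough shape of that test, assuming a hypothetical appendSegment() helper that appends a buffer and awaits updateend:

```js
const video = document.querySelector('video');
await video.setMediaKeys(mediaKeys);  // before playback starts (HAVE_NOTHING)

const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);
await new Promise(r =>
  mediaSource.addEventListener('sourceopen', r, { once: true }));

const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
await appendSegment(sb, clearInitSegment);   // clear lead
await appendSegment(sb, clearMediaSegment);
// New init segment signalling encryption, then encrypted media; under the
// proposal the UA must support this switch because MediaKeys was set early.
await appendSegment(sb, encryptedInitSegment);
await appendSegment(sb, encryptedMediaSegment);
```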

sushraja: Question: does this imply that the DRM systems now all have to support clear content?
… We know that older OSes don't support "clear-lead" on MSE.

xhwang: I don't know whether this implies that or not.
… That's a great question.
… I suppose somewhere in EME it says clear-lead should be supported.
… The original thing here was about switching from clear to encrypted.

sushraja: I think we need capability detection then

xhwang: The fact that one key system doesn't support one case should not dominate this discussion.
… We are trying to say the reasonable correct behaviour. We are separately trying to fix PlayReady behaviour so that won't
… continue as an issue anymore. The spec is long lasting.

sushraja: We will fix it but not for old implementations so we will need capability detection for clear-lead

xhwang: I see

sushraja: Then systems that are supporting it would work

xhwang: I think that's a different issue. I filed an issue about more feature detection - clearly this is an example.
… I had a different example.
… There are a lot of reasons for a common system to detect support.
… We can probably discuss this in that other issue.

cpn: Is that the plan, to move ahead with this proposal and investigate the need for capability detection

xhwang: That's my suggestion

cpn: I see Greg nodding. Any objections?

no objections

cpn: That sounds like a plan. Can we open a PR for this and then fold in the detailed points?

xhwang: Yes. I propose we continue that discussion especially about the Notes, and then open the pull request

Key rotation support in EME

<tidoust> Support continuous key rotation per MPEG Common Encryption (ISO/IEC 23001-7)

xhwang: In the Common Encryption spec we are saying embedded keys should be supported,
… but it wasn't present originally in 2011. It was added in 2012 as an amendment.
… The title includes "key rotation" even though the spec doesn't mention that at all.
… There are many ways to embed keys. Just one example:
… Root keys and embedded keys, the embedded keys are encrypted with the longer lived root key
… The embedded keys can be used to decrypt the actual stream.
… How does the root key get delivered to the CDM? There's no one way.
… Some people bake it into the client, others have a separate delivery channel.
… There's an EME way, with a normal EME licence exchange used to deliver the root key to the client.
… With that one, there's packaging, then the encrypted content keys are in the "moof" PSSH box.
… At playback we continue to get the embedded keys and use the CDM to decrypt the key and use it to decrypt the media.
… This is about performance and efficiency
… Especially for live streams millions of clients might cause a client storm getting the updated root key when it rotates.
… Problems with the current spec?

<cpn> ... [Reads the explainer]

xhwang: Big summary is that now the spec says the keys are only used [scribe missed]
… Spec is incompatible with the key rotation problem we describe.
… Not a new problem. There were a lot of discussions before,
… both technical, and the spec editor was really trying to get EME to Rec and didn't have time to deal with this.
… I'm trying to state the requirements for Compat, Simplicity and Interop.
… Not sure about all systems, but I know some have a key rotation feature.
… Many TV industry people request this feature.
… 1st proposal:
… Relax generateRequest() to accept data without generating a key request
… [goes through sequence diagram]
… I did check with Joey the owner of Shaka player and former spec editor
… He said most players don't track the pair of generateRequest() and key message,
… so just having generateRequest() shouldn't break anything, so I believe this won't break existing implementations.
… The 2nd thing is more interesting. What if one session closes while another one is opening - keep the root key from the first or discard it?
… When a session is closed we say now that keys in other sessions must be unaffected
… But in our new model if one session is closed then others might be affected.
… Trying to introduce a parent-child model for sessions.
… [goes through 2nd proposal sequence diagram]
… Proposing a new attribute on media key session called "parent".
… s2 can point to s1 as a parent.
… Then if s1 closes then it closes s1 and s2.
… But it is not required.
… If for some system the key is baked in the client such that all the sessions are independent of each other
… then closing one session wouldn't close another, so it's versatile in that case.
… Single Session Mode: the implementation requires every session to be closed before another is opened.
… For compat, must close s1 before creating s2.
… In this case the key can only work with one session so there's something like "parent-merged"
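
A sketch of the two proposals together; the `parent` attribute and the no-message generateRequest() behaviour are proposed, not specced, and the helper names are illustrative:

```js
// 1) Root-key session: a normal license exchange delivers the root key.
const s1 = mediaKeys.createSession();
s1.addEventListener('message', e => forwardToLicenseServer(e)); // illustrative
await s1.generateRequest('cenc', rootKeyInitData);

// 2) Embedded-key session: with the relaxed generateRequest(), passing the
// init data from a 'moof' PSSH box need not generate a key request; the CDM
// decrypts the embedded keys using the root key.
const s2 = mediaKeys.createSession();
await s2.generateRequest('cenc', embeddedKeyInitData);
console.assert(s2.parent === s1); // proposed parent-child relationship
// Closing s1 would then also close s2 — unless the sessions are independent,
// e.g. because the root key is baked into the client.
```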

cpn: Time check - any thoughts of what we've seen so far?
… Have others had time to review this before the meeting?

sushraja: Context question. In s2 session if the keys are changed and the UA handles the chain of keys internally, why do we need this?

xhwang: Yes, some systems work like that.
… In current implementations, in Chromium, the PSSH is not passed in the media pipeline at all.
… Also if we do that then the demuxer needs to know which PSSH to use.
… They are specific. I don't know how the browser can tell which one is which.
… I think moof box handling rules are too specific for EME, there are other box types.

sushraja: So you'd need to parse the media segment in JavaScript?

xhwang: Two options for the JS when it sees initData and creates a session: 1. keep the new session and manage the keys there; 2. if the CDM can support it in one session, close the new session and manage them as one

<tidoust> [xhwang: to stay closer to WebIDL and make it easier for external people to review, I would suggest reformulating "Add a new attribute `readonly attribute MediaKeySession.parent`" as "Add a new attribute to `MediaKeySession`, defined as `readonly attribute MediaKeySession parent`" (this clarifies which interface it goes on, defines the attribute type, and drops the ".")]

cpn: I think people need time to digest this and work through the detail. Can schedule for a future call.

xhwang: Of course, I'm just raising awareness
… I think it's an important feature

cpn: Very much so, I'm hearing that from other industry groups

tidoust: Thank you for the explainer
… the next step is to send this to TAG for review.
… TAG sent some comments about being careful with EME.
… The explainer highlights the key aspects that justify such a change, such as access to live streams as the main use case.
… Maybe worth reformulating the explainer a bit to really highlight what end users will get.

xhwang: I see, highlighting the use cases.

tidoust: For example the part that says you avoid impacting servers is good for content providers but not for end users

cpn: Emphasise the end user

nigel: That might be a false distinction, as reducing load on servers improves the user experience

tidoust: what I'm highlighting is the benefit of having live streams in the first place

Media Pipeline Performance

Slideset: https://lists.w3.org/Archives/Public/www-archive/2025Nov/att-0004/Worker_QoS_reboot.pdf

markus: Web Worker quality of service, right now there's no notion of QoS

markus: History - Intel presented a proposal in 2023

<dom> Intel presentation on Web Workers Quality of Service

markus: at TPAC. That proposal was that the worker constructor is given a dictionary of options

<dom> Web Worker Quality of Service Explainer

markus: Work on this stopped, not sure why.

<dom> Minutes of "Worker QoS" breakout in TPAC 2023

markus: Now we have more and more issues with the AI worklet; we want to run speech recognition at the same time as the user running LLM queries
… At the same time, VC applications run background blur, de-noising, etc

[Slide 2]

markus: So it's hard for the system to understand what to prioritise, so you get jitter in video, etc

[Slide 3]

markus: QoS issues include audio glitches, dropped video frames, janky video, etc

[Slide 4]

markus: The proposal was power-focused

[Slide 5]

markus: I want to reboot this proposal. Remove the high/low/default classes, and have workload-descriptive hints, e.g., video-capture, audio-recording, or a long running batch computation, where you don't want to interfere with showing video frames

[Slide 6]

markus: If you use WebGPU or WebNN in a worker, the proposal is to let these usage hints influence priorities of those other APIs in the system
… Abuse control, you could have an audio application take priority.
… Audio application is woken by stuff related to playout, or a video application woken by camera frames

[Slide 7]

markus: Alternatives, heuristically classify workloads. Why not just give them the priority they need, with no APIs

[Slide 8]

markus: This is where we are, a forced opt-in to the system. Giving applications more control seems beneficial
… Want to also bring this to Web Performance, and Audio WG
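
A hypothetical shape for the rebooted proposal; the option name and hint values below are illustrative, not specced:

```js
// Workload-descriptive hints instead of high/low/default classes.
const captureWorker = new Worker('video-pipeline.js', {
  type: 'module',
  qos: 'video-capture',     // woken by camera frames; keep latency low
});
const batchWorker = new Worker('ml-batch.js', {
  type: 'module',
  qos: 'background-batch',  // long-running compute; fine to deprioritise
});
// Per the proposal, the same hints could also influence the priority of
// WebGPU/WebNN work submitted from these workers.
```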

Youenn: I read the 2023 breakout minutes; Paul Adenot said audio needs to use audio threads in addition to worklets. It's the same scenario, want to ensure the worker and worklet schedule at the same pace
… If something bad happens, both could be demoted to regular priority, which would be bad

Andy: About the descriptive priorities, UAs would implement based on platform understanding of user-initiated, background etc
… What if those vary across platforms, leading to differing behaviour?
… Would the spec need to address, in terms of a ranking of the descriptions

Markus: Not sure how much you could spec. Good to be as relaxed as possible in the spec

Dom: Issue is interoperability. I wonder about describing, in an ideal world, what you want to prioritise for a given hint. UAs may not always honour all of it; that leaves some discretion to the UA
… If we find the right criteria for what the hints are optimising for, and leave to the UA to decide, better for interop

Youenn: [Example of a source to a native sink, piping]. Different implementations might have different priorities for native sources and sinks, but all should have same QoS

Dom: That's about groupings?

Youenn: Audio worklet you know is real time, except when everything falls apart

Dom: Maybe what we should describe is the pipeline

Youenn: Yes, the big use cases are audio processing and camera capture

Markus: Various use cases: audio processing in WebRTC, where we have problems. Also present problems, e.g. the Intel issue linked in the presentation
… On Windows, it's easy to make the system misbehave

Youenn: Suggest grouping priorities of parts of the pipeline, some in JS, some native

Markus: Is that a proposal to not expose categories?

Youenn: If a group connects to a native source or sink, the UA knows the priority
… Could be a way to reduce interop concerns

Markus: So instead of exposing categories, do it automatically?

Dom: No, have a way to tell the UA that a set of workers are connected to a sink, so the UA can optimise to make that operate smoothly
… So declaring the pipeline rather than each piece

Markus: In that case, how would the UA identify the constituent pieces?

Dom: Could give an API the sink and source

Markus: Could put a QoS on getUserMedia, then anything connected inherits that
… There are opportunities for that system to misunderstand what's going on

<Zakim> nigel, you wanted to ask if user needs should be able to impact implementation and interpretation of the priorities

Nigel: On interop and consistent implementation, the user need might influence the priority, e.g., if they're using a screen reader, they wouldn't want that to influence the priority
… In that case, janky video might not be so bad. Just something to consider

Markus: Yes, it could factor in usage of screen reader, for example

Mark: For realtime media applications, what's the ideal model from the application point of view, from each media input, camera or over the network
… Browser could learn the inherent data rate feature of each pipeline. Then, e.g., prioritise the audio pipeline over the video pipeline
… What's the application's ideal model? Give the app time to process each incoming or outgoing frame, and tell it the time box that's available, such as a time deadline

Markus: Specifying deadlines would be the best, then we don't need to talk about priorities any more
… Not many OSes can do something with that

Mark: You'd add the deadlines to the scheduler so it makes the priority decision

Youenn: From UA perspective, hard to know the time of input or output for the work item. Could describe more, e.g., I'm an image processing tool doing blur
… If you know the timing of input and output

Mark: Streams API gives you a way to define a pipeline, so could be a place to expose

Nigel: How to test?

Markus: Tests might be flaky, when you have real time tests
… It's a hard problem. A big challenge is the range of OSes that can or can't support this in a good way

Dom: Groups involved?

Youenn: Media WG, WebRTC, Web Performance
… Audio

Dom: Suggest writing an explainer, and circulate across the groups
… Get something started

Mark: WebGPU and WebNN would want input

Markus: I can produce the explainer

Suggestions for improving the W3C Process

cpn: Welcome Igarashi-san from the advisory board

igarashi: Thank you, I'm here to discuss potential Process issues.
… The Process CG discusses issues with the Process and gets feedback from the community.
… The AB has decided to participate in each WG's meeting during TPAC to get more feedback.
… The Process is very complicated so in a short session we may not have time for details,
… but I'd like to get any feedback about concerns with the Process.

song: In the last few days XXX has had discussions so we are going to work together for the Media WG.
… The Process document is the first document we need to check when drafting the Charter,
… and we are trying to improve the documentation so that's the goal from us.
… If there are any frustrations about the Process for the organisation you can answer the questionnaire.

igarashi: Let's open it for discussion.

cpn: Straw poll: how many people are aware of it and how many have read it in detail?

8 hands raised

cpn: Maybe a third to a half of the group
… It's a big question.
… Are there things that cause concern?
… Is it understandable?
… Does it help us achieve our goals?
… Not going to review it all

Mark_Foltz: I don't think I've ever read the entire document.
… Whenever I have a question I ask someone like Francois who knows it and answers correctly.
… That works very well. Maybe better socialisation would mean I wouldn't have to ask so many questions.

Youenn: +1

cpn: I recognise that too.
… The Process has flexibility and different WGs can operate differently.
… Without Francois we would really struggle.
… That points to... there's complexity, and how much is essential vs how much is accidental.

Mark_Foltz: In any development environment, when there's a complex process we usually build tools to manage it,
… which distils it into a series of manageable steps.
… I've noticed with things like Wide Review is that someone creates a GitHub issue with tick boxes,
… but maybe it needs to be supported by better tooling to automate it.
… For a long time in Blink when we did intent to implement everyone wrote an email,
… it got complicated so we developed a web tool to make it easier to do consistently.

tidoust: Same thing, the AB is interested in what could change in the Process,
… I guess the two main impacts for WGs are the Chartering process that we do every 2 years.
… I refresh the Charter and come back to the group to get feedback.
… I'm not sure how useful that is, maybe it could be simplified.
… The other impact is Wide Review, where I see some struggle in the MediaWG. Chris showed the list with everything in WD
… earlier. If we find it hard to get to CRS maybe the Process is making it more difficult than it needs to be.
… I see that some of the specs are widely implemented but are still WD, so some steps are missing.
… I'm taking feedback on tooling, we've had a lot of discussion about wide review earlier this week as well.
… I'm dreaming of a world where WGs naturally track wide review as changes are made.
… So it's not too late to get review when things have been shipped.
… Some things need tooling but not Process changes, others need Process changes.

cpn: In each of our specs we have e.g. a CR tracking issue, so we can work out what we want to resolve before entering CR.
… Looking at "adequate implementation experience" - I need to check the Process each time to see if we are meeting the right criteria.
… It is all described here but I personally find it hard to keep track of.
… Then what does it mean in practice for each spec.
… Is shipping enough, or do we need developer feedback for Wide Review.
… Turning these requirements into more specific criteria in our case.

igarashi: Thank you very much, very interesting proposals about tooling.
… Chris, we may have other comments, some sort of polling about complexity?

cpn: Good suggestion. Anything that [scribe missed] would be a good idea. I don't think we need to poll right now.

Song: We will try to get feedback and it will be a priority for the AB to improve the Process document.

cpn: I look forward to hearing more about where you take this next.
… Thank you for bringing it to us, it is helpful.

WebCodecs Reference Frame Control

<tidoust> Reference frame control

Slideset: https://docs.google.com/presentation/d/1gnhEvCFPUsmaiz-jpQyiGS7WDzjCKIhH59hu-DdnVcI/edit and archived PDF copy

[Slide 1]

erik: I work on Google Meet. We want to use WebCodecs more, but it lacks some features
… We want to be able to express any reference structure using minimal tools in a codec-agnostic way

[Slide 2]

erik: This is the proposal from Eugene, from 2 years ago
… There's a new 'manual' scalability mode

[Slide 3]

erik: Call getAllFrameBuffers(), and for each frame you encode, you signal the references
… To ship something, we have some constraints. You get an array of buffers out. VP9 can have 7 buffers but you can only reference 3 at once
… Since the encoder doesn't know the layer you're encoding....
… We don't deal with spatial scalability. Hoping to address in future

[Slide 4]

erik: Can do 80% of the things we'd like to with this API update
… Scale down the frame rate. Frame 0 is a leaf node, Frame 2 references the temporal layer

[Slide 5]

erik: In code, it's straightforward
… I made some demos
… Top left is my self preview, AV1 bitstream with 3 temporal layers. On right, at top we show entire bitstream, and middle we drop frames
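
A sketch of the API as described in the explainer; getAllFrameBuffers() and the per-frame reference options follow the proposal and may still change:

```js
const encoder = new VideoEncoder({ output: onChunk, error: console.error });
encoder.configure({
  codec: 'vp09.00.10.08',
  width: 1280, height: 720,
  scalabilityMode: 'manual',               // proposed new mode
});
const buffers = encoder.getAllFrameBuffers(); // e.g. up to 7 for VP9

// Base-layer frame: update buffer 0 so later frames can reference it.
encoder.encode(frame0, { updateBuffer: buffers[0] });
// Upper-layer frame: reference buffer 0 and update nothing — droppable
// without breaking the base-layer chain.
encoder.encode(frame1, { referenceBuffers: [buffers[0]] });
```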

<dom> [shows demo https://sprangerik.github.io/webcodecs-demos/webcodecs_manual_scalability.html ]

erik: Rate control demo

<dom> [shows https://sprangerik.github.io/webcodecs-demos/rate_control_demo.html ]

erik: Compares target and actual bitrate. Use the CBR rate controller in VP9 and AV1
… It's a realtime rate control, so you have to do some guessing. The bitrate fluctuates
… With external rate control, I set the quantiser value
… It's a good rate controller, reacting faster
… Instead of always referencing the last frame, I have two that I alternate between
… Re-do with a new QP, it's a semi-realtime encode mode. We use this for screensharing. When you do a slideshare, there's a huge spike in bitrate. That's what this is made for
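
The per-frame quantizer part of this is already in the WebCodecs codec registry; a sketch of app-side rate control, with cameraFrames() and myRateController as stand-ins:

```js
encoder.configure({
  codec: 'vp09.00.10.08',
  width: 1280, height: 720,
  bitrateMode: 'quantizer',  // app sets QP per frame
});
for await (const frame of cameraFrames()) {  // stand-in frame source
  const qp = myRateController.nextQp();      // external rate control
  encoder.encode(frame, { vp9: { quantizer: qp } });
  frame.close();
}
```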

[Slide 6]

[Slide 7]

erik: Another use case is long term references, where you have an active feedback signal
… When you send a frame, send a signal back. Then you have a guarantee that the frame will exist, even if there's loss in the network

<dom> [shows LTR demo https://sprangerik.github.io/webcodecs-demos/ltr_demo.html ]

erik: The spatial quality suffers, as it uses references that are older

[Slide 8]

erik: Lots of use cases. TL;DR: it helps you be dynamic and react to network conditions, with knowledge of the state of the system
… Status: There's a Chrome implementation. I published an explainer yesterday
… Intent to experiment in Q4
… Implementation-wise, we're rolling out support for D3D12 hardware video encoders on Windows
… Works in same way for H264 and H265 and AV1
… Should work across Intel, NVIDIA, AMD chipsets. Some work to get it stable

[Slide 9]

[Slide 10]

erik: Next step? Interest from others in using this?

Youenn: Long term, if you have a peer connection object, should web apps just forget about PeerConnection and use this WebCodecs fine tuning?

Erik: This unlocks more customisation and experimentation. Long term, we can move more logic into libraries instead of being baked into the UA
… For conferencing, the transport option for delivering is a question. Not just the encoded frame, you need metadata with it

Youenn: How does this relate to the complexity / priority idea?

Erik: This is the minimum thing that gets us most of the way there. Should have speed control
… Need to know what speed settings the encoder supports
… Skipping that for now to have something shippable

<dom> TPAC 2024 breakout: Evolved Video Encoding with WebCodecs

Erik: Not much of a fingerprinting issue?

Chris: Who do you want feedback from, other apps than Google Meet, browser vendors?

Alastor: I'll ask my colleague who works on this

Youenn: Implementation-wise, with software encoders it's easy. Less sure about hardware codecs
… Two teams involved: hardware-level, and the API surface
… Feedback from WebCodecs users like Zoom?

Erik: From Microsoft Teams, yes.

Randell: It's interesting, want to look more for implications on hardware support. We'd have to see from a prioritisation perspective
… Demos look interesting.

cpn: Good next step, all to take a closer look, investigate feasibility of implementation at hardware level.
… bring it back to a future meeting when you're ready.

WebCodecs and WebRTC encoded transform

Youenn: We mostly converged... We have WebCodecs and encoded transform interfaces
… Doing similar things, expressing encoded video and audio data
… They're used differently, and expose different information
… One thing that could be done is to use EncodedVideoChunk, which could reduce the WebRTC Encoded Transform spec
… Second topic, there's a proposal in WebRTC WG to add an encoded source to a peer connection
… WebCodecs encoded output piped into a PeerConnection
… How to have a good match between EncodedAudio|VideoChunk and the peer connection pipeline
… The encoded source proposal has an API. Should we try to increase exposure of encoded audio frame or reduce XXX?
… Do WebCodecs folks have thoughts?

<dom> EncodedAudioChunk interface

<dom> EncodedAUdioFrame interface

Youenn: Interfaces are quite different currently.
… Basically, you have a metadata object which was meant for use in transforms.
… EncodedVideoChunk is quite different. It's immutable.
… You can construct it very easily.
… If you have a WebCodecs encoder that is giving you such a chunk, you will need to create a new destination and create an EncodedAudioFrame from that chunk.
… WebRTC tries to reuse as much as possible from WebCodecs.
… But we cannot change interfaces for old usage.

Eugene: From a WebRTC point of view, wondering about adding dependency?

Youenn: It's already the case in practice.

<GabrielBrito> Sorry, my mic was open by accident

Eugene: If you're already using EncodedVideoChunk and VideoFrame, I'm all in favor of extending that.

Youenn: What's missing is the metadata.

Eugene: I see what you mean.

Erik: My preference would be to add metadata as part of the WebRTC call rather than in WebCodecs

guidou: To give a little history, encoded and raw were practically developed at the same time, but encoded came out first.
… Now that we have encoded chunks and new use cases, it would make sense to use the WebCodecs one.
… But we need to add the metadata to the raw version one.
… The idea would be to add metadata here and support this for this use case, which lets you use EncodedAudioChunks directly without having to convert.

Youenn: About the fan-out use case, in that case, the input is really an rtc encoded audio object.
… You can transfer the data from what I can tell.
… It should be fast as well.
… Going to encoded audio chunks should work.
… I'm hearing that if there's consensus, then we should extend WebCodecs, and not do it in WebRTC.
… About the second part, is it something that WebCodecs encoders should implement or is it easy to do?

Eugene: We had the discussion before.
… We came to the conclusion that the encoders/decoders won't be doing the work of transferring the metadata from the inputs to the output.
… Since we don't have a 1:1 correspondence between video frames and encoded chunks and so on, it's up to apps to do the mapping based on timestamps and so on.

cpn: No need here for metadata on AudioData then?

Youenn: Right.

Markus: Is it fully possible to associate this metadata with what we send to the encoder?

Guido: We would need to add that separately via timestamp matching

Markus: So you need to match on the outside.

Erik: For the drop case, garbage collecting metadata for something that isn't known may be hard.
… That's a separate discussion.

Eugene: If you feel that your API design would be more ergonomic with RTC versions of WebCodecs constructs, go for it. We're not going to make the encoders/decoders pass the metadata.
… For VideoFrame, the main reason for metadata is that we have different sources, with different parameters and extra information.
… For encoded audio/video chunks, if they come from RTC streams, I can also imagine that they have metadata attached to them and it makes sense to have the metadata there as well.
… What I mean is that, originally, we added metadata on VideoFrame because they can come from different sources (cameras, canvas, etc.), with different information attached to them. From a camera, you may have rectangles attached to detected faces for example.
… We have metadata on VideoFrame because of that.
… The same logic can be applied to encoded audio/video chunks if we get them from different streams.
… E.g. demuxed streams, webrtc streams.
… I see that as an argument that we can potentially add metadata to audio/video encoded chunks. Curious to see what Paul thinks about it.

cpn: Paul is not in the room right now.

Youenn: What Eugene proposes is arbitrary metadata, whereas the proposal we have is more specific metadata.

jesup: Will discuss with Paul.

Youenn: I was looking for guidelines to put encoded sources in the right path. Next step is, I guess, to come up with a concrete proposal and let it be reviewed by both groups.
… I suspect Guido will be driving this.

Guido: Yes, we can perhaps write an explainer for encoded sources. How to work with encoded chunks and putting emphasis on integration with encoders/decoders.

cpn: Sounds good to me.

iframe media pausing

iframe media pausing

Slideset: https://docs.google.com/presentation/d/1uzyCPbnLvQ-ME_CNETFubUrUZQLiPbFhwzzRDUT9pcQ/edit?usp=sharing and archived PDF copy

[Slide 1]

GabrielBrito: Software engineer for Microsoft Edge. We have been working on this feature.

[Slide 2]

GabrielBrito: Main motivation is when an iframe needs to be hidden away for some reason. Since the iframe can be arbitrarily complex, if the application chooses to hide it, it has no control over the media, and needs to destroy the iframe to make sure the media pauses. Re-creating an iframe is resource intensive.
… If it does not destroy and the audio keeps playing, there's no way to control it from a UX perspective.

[Slide 3]

GabrielBrito: We propose a new permission policy that can be used to prevent an iframe and its children from playing media when the iframe is hidden.
… We have integration with media playback, Web Audio, and autoplay.
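
A sketch of how an embedder might use it; the policy name here follows the WICG incubation (media-playback-while-not-visible) and may change:

```js
const iframe = document.createElement('iframe');
iframe.src = 'https://third-party.example/player.html';
// Deny media playback while the frame is not visible, for it and its children.
iframe.allow = "media-playback-while-not-visible 'none'";
document.body.append(iframe);

// Hiding the iframe now pauses its media elements (and, per the draft,
// interrupts Web Audio) instead of letting audio keep playing.
iframe.style.display = 'none';
```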

[Slide 4]

GabrielBrito: We have a working prototype behind a flag in Chromium. Mozilla now supports the feature and we're incubating the feature in WICG.
… The spec is fairly simple.

[Slide 5]

GabrielBrito: That's what we've been doing the last year.
… Looking forward for questions, ideas.

SteveBeckerMicrosoft: Some developers have adopted the feature and provided feedback.
… Use cases include use of cross-origin iframes with complex apps being loaded.

Youenn: What about the case of a top-level document that never wants an iframe to play media?
… Has it been discussed?

SteveBeckerMicrosoft: There's a possibility to consider other uses for the permission.
… If there are other customers for these scenarios, happy to iterate.

Eric: Did you say when the iframe becomes invisible, do you pause playback or do you mute playback?

GabrielBrito: The current spec says that it should be paused.

Eric: Have you found compat issues with scripts that don't expect playback to be paused externally?

GabrielBrito: Haven't heard problems so far.

Eric: There are some sites that have problems when playback is paused by a script action they didn't initiate themselves.
… It's a big problem at some sites.

<Zakim> nigel, you wanted to ask why generic media playback is intersecting with visibility specifically - what about audio only media?

nigel: This is expressed as media element playback. But the condition is visibility. What if the element is only playing audio?
… If it's not producing visible changes, should it be restricted to video playback?

GabrielBrito: The permission policy would apply to the iframe, not to the media element. It's when the iframe itself is hidden.

cpn: In the audio case, a music player may want to continue playing music. The top level document would choose not to use the feature.

Mark_Foltz: I was wondering whether the explainer talks about the integration with the MediaSession API?

GabrielBrito: This is something we need to take into consideration.

Andy: Directly pausing the media element should fire the event with a reason.
… It might be more compatible with sites that don't expect the playback to be paused.

Randell: About these sites and audio only cases, because that is being applied at the iframe level, in most cases, they know that this is the case.
… If an app encloses arbitrary content, they probably know that things can break. I don't see that as a blocker.

GabrielBrito: Yes, AudioContext comes to mind as well for the interrupted state.

Eric: I think this is a great feature, just raised the issue as something to watch for.

nigel: Question about picture-in-picture. If the iframe is hidden, does PiP still continue?

Youenn: We should consider that it is still visible, yes.

<Zakim> nigel, you wanted to ask about the interaction with PiP

Randell: I think the spec should detail this and have opinion on it.

Picture in Picture spec on interaction with page visibility: https://www.w3.org/TR/picture-in-picture/#page-visibility

Youenn: Also relation with the AudioSession spec. One of the purposes is to expose to the page "Hey, you're interrupted". This is already happening for a page on iOS, when another application starts playing audio.
… AudioSession has an algorithm that says how to interrupt media playback in a document.
… It would be good to relate the two as I think there could be hooks between the specs.
… Similarly, for MediaSession, AudioSession may allow you to resume, I think... [checking]
… Ah, no.
… I would recommend reading the spec, and let Alastor and I know if adjustments to the spec may be needed to ease usage.

cpn: You have a PR opened. Are you looking for feedback on that before iterating?

GabrielBrito: I was discussing this with Paul.
… Looking for feedback from Mozilla team.

cpn: Sounds good. I think you have a good starting point. And then some of the issues that were raised today can perhaps be addressed separately in different issues.

GabrielBrito: I'll create issues for things we discussed today.

Media Capabilities

https://docs.google.com/presentation/d/e/2PACX-1vT4gctxWcgjcsWk9mIm8M3cR7lwdlBTQlBRkGnVk1_RGG0H-g2mlmZp_89jfSjxoOfYbL-X1PKpOpzV/pub?start=false&loop=false&delayms=60000

Mark: I became spec editor a year ago. I'm working through existing PRs and issues
… Making good progress on clearing up the backlog of issues to resolve

[Slide 2]

Mark: The current status, the spec is a Working Draft on the Rec Track. We have work to do to go to CR
… There are working implementations in major engines
… We still have about 33 open issues in GitHub. Several are questions or feature ideas, or things that don't impact the spec so much
… Short term, we have 14 v1 issues, that we should resolve. Of those, about 8 require changes to normative text, so want to make sure we have consensus on those

[Slide 3]

Mark: Two open PRs from before I took over
… Interop-wise, it looks OK. We added tests on being stricter on validating codec strings and mime types
… For example, using a video mime type for audio and vice verse. Some interop work needed there
… There's an open issue for Chrome, on main issues failing WPTs
… Haven't found someone with time to work on it yet
… If other vendors want to work on them, that's encouraged

[Slide 4]

Horizontal reviews

Mark: Some were done a while ago. Do we need new reviews? Some have pending actions for us
… There's one on a11y, seems more of a question on whether you can use MC API to detect whether media has a text track, and infer user preferences
… I don't think it's relevant to the spec, but needs an answer
… Some substantial TAG feedback, all the issues have been done and closed. Nothing needing TAG input now, AFAIK
… Talking with Chris, there was a Privacy review that raised questions. We responded and added text
… So all the things they fed back on we've addressed. Might want to ask for another review
… Security review might still need doing

Francois: Security reviews are restarting, so can do that now

Mark: Should we have separate privacy and security sections? An issue was raised, then they changed their mind

cpn: Privacy group always has concerns over exposing capabilities to applications.
… Due to fingerprinting considerations. It seems to me that any feature we add might impact that. I'm wondering whether the current spec is materially different from the one that was reviewed, warranting another round of review.
… One question was: is capability detection the right approach to start with? Why doesn't the site offer a set of choices that the user agent could then choose from.
… But I think we answered this question.
… Partly because they would become observable anyway.

Fredrik: Querying capabilities is very powerful. It's very much the right choice. You can always infer all of this information anyway by trying videos out and figuring out what works. Media Capabilities does not reveal new bits, but it does make it slightly easier to query these capabilities.

Eric: If it's easy to do by attempting to play video, then question could be: why do we need that in the first place?
… It's much easier to use the API.
… But it also makes it much easier for sites to get additional fingerprinting bits.
… I second Chris in getting an additional privacy review.

Mark_Foltz: Next step would be to gather the list of changes that were made since last review and determine whether that warrants another review.
… I just want to point out that this is not a new conversation.
… I will put it in the queue of things to do. Help would be welcome!

cpn: There's a whole question here around text tracks.
… Around querying some capabilities of the media you're playing.
… which is not what the spec does.
… But it may be about captioning support.

Mark_Foltz: There's no algorithm in the spec to report on any text track capability.
… If it says something, we'll want to add a note about not linking that to user settings, only to browser settings.

cpn: I'm reading the issue as coming from the perspective of someone who thought the spec allowed querying the capabilities of the media being played.

hta: To me, it makes more sense to compile a delta of changes, and send that to horizontal review groups.
… Reviewers looking at a delta might be faster.

Eric: I was going to say the same thing.
… Opening up for an entire review for a simple question like that seems unnecessary. It would work to answer the question.
… A full review would take a substantial amount of time.

cpn: OK, so let's compose an answer to the question that was put forward and see if that satisfies the reviewers.

cpn: Security has now been restarted. Simone now leads the activity at W3C. I assume that's now an expected part of the horizontal review process.
… We should do a self-review and request review afterwards.

Mark_Foltz: If you can look at the self review from 2020, and update it accordingly, that would be great!

cpn: Happy to do that.

[Slide 5]

Media Capabilities and webrtc

w3c/media-capabilities#185

Mark_Foltz: Feature that allows site to get additional information when they query the API for webrtc use cases.
… There was a PR that was put together some time ago to add the feature.
… Since then, we did a refactoring of the spec.
… I rebased the PR, no substantive changes introduced in that rebase.
… Now ready to be reviewed.
… I wanted to assess if it's still accurate. No implementations for now.
… It looks like it could be done with a small amount of JS. Is it something that the API needs to do?
… Is there something that the browser knows that the site does not?
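
For reference, the query side is already in the spec (the PR adds extra returned fields, not shown here):

```js
const info = await navigator.mediaCapabilities.decodingInfo({
  type: 'webrtc',
  video: {
    contentType: 'video/VP9',
    width: 1280, height: 720,
    bitrate: 1_500_000,
    framerate: 30,
  },
});
console.log(info.supported, info.smooth, info.powerEfficient);
```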

Youenn: In my mind, it's more convenience.
… There may be issues around clock rate.
… Normally, you should be able to implement everything yourself.
… If you're using the WebRTC API to set preferences, then it should still work.

hta: I think you're wrong on that one.
… The way that setCodecPreferences work is that it requires an exact match between a codec that the system knows about and the codec that is requested.
… If you try to construct one through JS, you're likely to miss what the platform is capable of, as it will vary across platforms.
… You want to set complete different parameters depending on the device.
… We need to get back the exact information that you need to put in the RTC space.
… Also, H.264 defaults to 0, and the only sensible value is 1.
… The underlying spec is very old and now outdated.

Mark_Foltz: Two jobs for the user agent here: 1. pick the correct values depending on the implementation of the codec, and 2. produce correct values with defaults.

Youenn: Yes.

hta: The Media Capabilities spec should say that what is returned by the query is one member of the family of codecs that the device is capable of supporting.

Youenn: If you have two browsers on the same machine, you want them to report the same values.

hta: In the WebRTC codec world, we landed on specifying only VP8, OPUS, and GSMA were to be supported by everyone, with other codecs being up to implementations.

Mark_Foltz: As an editor, I need to know where the steps to populate this are.
… If there's a spec I can normatively reference, that's good, otherwise I need someone to provide the steps.

hta: In the WebRTC spec, there's a step that says "populate this with a platform-defined...". You can point at that.

wschildbach: I wasn't familiar with Media Capabilities API being used to query codec capabilities for WebRTC usage. Now I wonder about codec support that does not work for WebRTC.

Mark_Foltz: On the right side, you get "supported: false" in that case.

wschildbach: So there's a flag to query only for WebRTC usage.

Mark_Foltz: Yes.

cpn: The flag also distinguishes file based media.

Mark_Foltz: I will make an attempt to use the existing steps to populate the things on the right.

Youenn: I'm hoping that with the hook on WebRTC, you get a list of codecs, and then filter the list, and then pick up the first one in the ordered list as I suspect there are cases when you still have multiple entries.
… That is, I'm hoping it's a sorted list, and that the first one is the preferred one.

Mark_Foltz: It might be implementation independent.

Stereoscopic video support

Hubbe: From Meta. We want to query stereoscopic video support. Media Capabilities seems the right place to do that.
… Most browsers will not support the stereo mode.
… Small change, pretty straightforward.
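
A hypothetical shape for the query; the member name and values here come from the discussion, not from any spec text:

```js
const info = await navigator.mediaCapabilities.decodingInfo({
  type: 'media-source',
  video: {
    contentType: 'video/mp4; codecs="av01.0.08M.08"',
    width: 3840, height: 1920,
    bitrate: 8_000_000,
    framerate: 60,
    stereoscopicMode: 'multiview',  // hypothetical member and value
  },
});
if (!info.supported) fallBackToMonoscopic();  // hypothetical app fallback
```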

cpn: Is it a decoding or rendering capability? That's a difference we make.
… If it's more a rendering capability, then Media Capabilities may not be the right spec for that.

Hubbe: There's something for audio capabilities.

cpn: There is, but the general feeling is that this may not have been a good decision.

Hubbe: Question being where else should it go to answer the question correctly?

Mark_Foltz: It feels similar to HDR where we landed with a two-part approach: one query for whether you can decode HDR, and one query for whether your display supports HDR.
… I'm thinking that the second could be used to detect stereoscopic display.

Hubbe: We don't necessarily have this capability in CSS or so on. It's purely for videos.
… Media Capabilities also has this efficiency parameter, that is often tied to rendering as the defect is often in the renderer.

Mark_Foltz: I don't object to using the API to query into the metadata support.
… Details about the rendering aspects of it would need to be discussed somewhere else.
… There's CSS. There's document.screen.

Hubbe: If you try an AV1 video, it won't work as it's software-decoded. With an H.264 video, it will because it's hardware-decoded.

Mark_Foltz: I'm not disagreeing. Media Capabilities is the right place for that part. I'm saying that whether the screen can actually render is out of scope.

hta: left-right, top-bottom is probably not the right thing for some codecs that encode stereo in one stream.

Hubbe: There's a "multiview" value for that.

cpn: What would a pure decoding capability query look like?
… I'm wondering about the distinction between decoding and rendering.

Mark_Foltz: What happens when you render one of these videos to a canvas?

Hubbe: It depends on what type of video. For top-bottom, you get two halves. For multiview, you usually cannot see that they are even there.

Mark_Foltz: In that scenario, you could be decoding the video but not putting the video anywhere on a display. You are rendering in some sense but not targeting any device.
… Similar to HDR support.

Eric_Cabanier: The CSS part you mentioned wasn't handled in this group?

Mark_Foltz: No, that was handed over to CSS.

Hubbe: There's dynamic-range and also video-dynamic-range [CSS media queries].

Xiaohan: In Media Capabilities, we do have HDR metadata. I feel the line between decoding and rendering is not super clear. Some capabilities are very close to the screen.

cpn: Are there pieces of this that could go to the screen interface? I wonder whether they can be factored out and done through media queries or some other mechanism.

Hubbe: If we implement this in our browser today, can we do that in the meantime while we figure things out?
… The videos look very ugly if the browser does not support stereoscopic.

Mark_Foltz: I see the value in the use case for media capabilities for querying whether the browser can at least understand the stereoscopic metadata.
… Most of my feedback is around the API shape, whether enum values can be implemented through various specs that deal with stereo.
… I think we need more discussion there as to whether these are the right values.
… or maybe start with a boolean.
… And then the developer needs to work with the implementation to understand whether it supports all of the values.

Hubbe: Totally fine to continue the discussion on GitHub about the actual value.

cpn: The overall objective seems good. It's just figuring out how.

Eric_Cabanier: Should we also talk to the CSS WG?

cpn: It may be too early to do that.
… That's the model we followed for HDR rendering, but I would suggest we reach to that conclusion first before we bring them in.

Interaction between Media Capabilities with WebCodecs

Interaction between Media Capabilities with WebCodecs

Mark_Foltz: Two related APIs to query for codec support. WebCodecs has isConfigSupported().
… The APIs are related but query different things.
… The APIs have some overlap.
… WebCodecs have this registry.
… Media Capabilities is more open ended.
… But you may want them to work together.
… E.g., encode with WebCodecs and playback with WebRTC or MSE.
… It would be nice to be able to use the same query in both APIs.
… I checked. They're pretty much aligned.
… WebCodecs has an unsigned long to describe audio channels, Media Capabilities has a DOMString (with an open issue to describe the format).
… Also spatial capability query is different.
… So I think we should resolve #73. Then have some working examples of how the two APIs can work together so we can devise a follow-up plan.
… That's my very short version.
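
The two queries side by side (both shapes are in the current specs):

```js
const mcInfo = await navigator.mediaCapabilities.decodingInfo({
  type: 'media-source',
  video: { contentType: 'video/mp4; codecs="av01.0.04M.08"',
           width: 1920, height: 1080, bitrate: 4_000_000, framerate: 30 },
});
const wcSupport = await VideoDecoder.isConfigSupported({
  codec: 'av01.0.04M.08',
  codedWidth: 1920, codedHeight: 1080,
});
console.log(mcInfo.supported, wcSupport.supported);
// decodingInfo answers "supported / smooth / powerEfficient for playback";
// isConfigSupported answers "can this exact decoder config be created".
```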

hta: At IETF, there's ongoing work for defining a multi-channel codec. There may be prior art to copy from.

Audio Session setDefaultSinkId

Slideset: https://docs.google.com/presentation/d/1t3aK1CuqyFO4ytWHubLeUiXFesRSuTYYWXlgxSC_TEE/edit?usp=sharing and archived PDF copy

Should AudioSession be able to specify the output speaker and/or route options (a la sinkId)?

[Slide 1]

SteveBeckerMicrosoft: From Edge team. We talked about pausing media in iframe. This proposal builds on that

[Slide 2]

SteveBeckerMicrosoft: [conferencing system with a.com and z.com]. No cooperation between the two sites, no way to change audio input.

[Slide 3]

SteveBeckerMicrosoft: Existing API: selectAudioOutput, enumerateDevices to select the input and output.
… But the problem is that there's no way for the top level to call setSinkId for its whole window, including its iframes.
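
A sketch of the flow from the slides; setDefaultSinkId() here is the experimental API from the explainer, and where it should live (MediaDevices vs AudioSession) is the open question:

```js
// Top-level page lets the user pick an output device…
const device = await navigator.mediaDevices.selectAudioOutput();
// …then applies it as the default for the whole frame tree (experimental).
await navigator.mediaDevices.setDefaultSinkId(device.deviceId);
// Media in embedded iframes routes to that device unless an iframe
// overrides it with its own element-level setSinkId().
```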

[Slide 4]

[Slide 5]

SteveBeckerMicrosoft: We published an explainer in 2024.
… We launched an experiment in April 2025.
… Some feedback already. We'd like to continue to gather feedback.
… We'd like to spec the API if there's support, not exactly clear where.

Youenn: Why not in the AudioSession API directly?

SteveBeckerMicrosoft: I think this is broader.

Youenn: AudioSession of the top level document would trickle down to its children, except if the iframe itself would override setSinkId()

alwu: Is it restricted to the top level page?

SteveBeckerMicrosoft: I think we restricted to the top level page only.

Youenn: It would be exposed to the web but it would not be supported and we should not worry about an iframe expecting support.
… We will want to expose this in AudioSession objects anyway.
… Either it's on MediaDevice or in AudioSession.

Youenn: AudioSession is implemented in Safari, main use case is on mobiles, but also exposed in desktop.

alwu: Wondering about integration afterwards in AudioSession. Done before?

Youenn: At some point in the future, we might want to construct AudioSession objects, linked to media elements.
… That's something we cannot get with MediaDevices. Small difference, I agree.

Mark_Foltz: When a top level frame sets the default sink, does the iframe get an event?
… If there's a UI, that may cause an issue.
… The user may be confused that the audio may be coming through the speaker if the UI says otherwise.

Youenn: The iframe can always override.

Mark_Foltz: But if they don't get an event.

SteveBeckerMicrosoft: That's a good use case. We can file an issue.
… In order to do that, you also need a permission policy.

guidou: AudioContext and media element have a sinkId. Do you change that property?

Youenn: How do you get the deviceId in the top level frame?
… selectDeviceOutput()?

SteveBeckerMicrosoft: That would work.

Youenn: [mentions no exposure of default speakers to top level]
… The top level frame who wants to set setSinkId would not have the IDs.
… That may require some changes to the media capture spec.
… Microphone and camera access would need to be granted.

[discussion on cross-origin rules and implications for sinkId]

?: It may just be that people interested in this already have that permission anyway.

Youenn: Yes and we are trying to converge.

SteveBeckerMicrosoft: Where we would like to spec this remains the biggest question.
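A minimal sketch of the flow being proposed, assuming the shape in the Edge explainer; setDefaultSinkId() is not standardized, and its name and placement on MediaDevices here are assumptions taken from the slide title:

    // Illustrative only; requires a user gesture and, per the discussion,
    // would be restricted to the top-level page.
    async function routeAllAudioToChosenDevice() {
      // Existing API: prompt the user to pick an output device.
      const device = await navigator.mediaDevices.selectAudioOutput();

      // Proposed (hypothetical name): make that device the default sink
      // for the whole page, including cross-origin iframes.
      await navigator.mediaDevices.setDefaultSinkId(device.deviceId);
    }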

Media WG Registries

Francois: We have a number of registries, in WebCodecs, MSE, and EME
… The Registry Track is relatively new. I'd hoped that other WGs would have progressed their Registries, so we can learn from their experience
… But we're one of the first to publish registries.
… We have a Draft Registry status. We can move to Candidate Registry, then Registry
… Shouldn't be too hard, as we don't plan to change the registry definitions
… We would be able to change the registry entries at any time, according to the requirements in the registry itself
… As a group, if we're fine with what the Draft Registry says, in terms of who the custodian is and how we review and approve additions, the next step is Wide Review
… Not sure what the review groups will review.
… I'd suggest we do them all at once, to ease the reviewers' workload

Chris: This is the equivalent of moving to CR for specs

Paul: Sounds good. We've exercised the WebCodecs registries a few times, to add entries
… So, for the WebCodecs ones, I'm happy as editor.

Mark: Does it trigger a call for exclusions?

Francois: There are no patent implications as far as I know

Media Source Extensions

seekable range for a MediaSource with finite duration

Model seekable range for a MediaSource with finite duration

cpn: When you're using MSE, the seekable attribute returns a range from 0 to the duration if the duration is finite.
… For live, it returns infinity.
… You may have a finite-duration stream where time 0 is no longer seekable.
… For example, events at the BBC where we overrun the 24-hour capacity of our CDNs.
… My colleague who raised this issue notes that JS players tend to ignore the seekable attribute on the media element and implement their own seekable time-range logic.
… I'm wondering whether this is something we should look to address in the spec in some way.

jya: I looked at the WebKit implementation.
… I saw comments there.
… It seems that the seekable range, when the duration is set, is the intersection of the buffered ranges of the media element and the seekable range on the media source itself.
… Is it really the case that it's continuous from 0 to duration?
… Am I incorrect? I have the feeling that the problem raised here is not actually one.

jya: endOfStream does not make the range continuous.
… I am not sure that there's an actual problem here.

cpn: Is seekable supposed to indicate what the user agent buffered?

jya: It does not mean that 0 is always seekable if the media source is ended.

cpn: Maybe we need more details or a test case to show where this is causing problems.

jya: seekable is a TimeRanges, and it does not have to be continuous.
… What the bug describes may be a particular user agent problem. I don't think that's what is in the spec.
… Can I take more time to look more into it?

cpn: I think that would help, yes.
… Also look at the way JS players implement these timing properties.

jya: It seems to me that if implementations follow the spec, 0 may not be seekable.

cpn: That's the mismatch: the difference between what the server can seek to and what the user agent says.
… Maybe there's a question there about what implementations do, and whether that's per spec or whether the spec has issues.
… I can check to see if it's just a mismatch in the spec or if we're seeing actual problems in implementations.
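A minimal sketch of the point jya makes: seekable is a TimeRanges that may be discontinuous and need not start at 0, so players should inspect it rather than assume [0, duration] (illustrative only):

    const video = document.querySelector('video');
    const seekable = video.seekable; // TimeRanges; may have gaps, may not start at 0

    for (let i = 0; i < seekable.length; i++) {
      console.log(`range ${i}: ${seekable.start(i)}..${seekable.end(i)}`);
    }

    // Clamp a seek target into the seekable window instead of assuming 0:
    function clampToSeekable(t) {
      if (seekable.length === 0) return t;
      const min = seekable.start(0);
      const max = seekable.end(seekable.length - 1);
      return Math.min(Math.max(t, min), max);
    }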

Detachable MediaSource

Proposal: Have a detachable MediaSource object

cpn: We've had a number of calls that point to the need for this when players want to avoid having to maintain two different media elements, each with its own buffer.
… I'm wondering whether WebKit has an implementation of this.

jya: Yes, behind a flag.
… I let that rot a little bit.
… I also have a series of tests that I've written, to be integrated in WPT.
… It's simple for the spec to define what to do.
… Maybe the next step would be to write a spec proposal?

cpn: Some worry about objects having to keep a large amount of data around.

jya: It will only be kept if the script keeps a reference to the objects.
… We still have an issue around garbage-collecting MediaStreams; a bug was opened by Mozilla around 10 years ago.
… I don't believe this to be a problem in practice.

cpn: Moving this forward into a proposal sounds like a valuable next step.

jya: For the underlying tests, I avoided the never-expiring issue through detach, which has a very clear lifecycle.
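A sketch of how a detachable MediaSource might look from script, based on the WebKit experiment described above; nothing here is standardized, and detach() and the re-attachment mechanism are hypothetical names:

    // Hypothetical API, for illustration only.
    const mediaSource = new MediaSource();
    videoA.src = URL.createObjectURL(mediaSource);
    // ... append SourceBuffer data while attached to videoA ...

    // Hypothetical: detach while the script reference keeps the buffered
    // data alive, addressing the lifecycle concern raised above.
    mediaSource.detach();

    // Re-attach the same buffers to a second element without re-fetching.
    videoB.srcObject = mediaSource;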

Media Capabilities - Dolby Vision HDR Metadata

DolbyVision HDR Metadata

Mark_Foltz: Sites may want to know the availability of playback with DolbyVision HDR metadata. There are multiple MIME types.
… I read through the issues and it wasn't clear what spec changes were requested.
… There were some comments about a "cross compatibility ID".
… The other question is how we try to get multiple implementations.
… Compatibility queries are generally delegated to the OS/CDM.
… I don't see major issues with this, but would like to see examples of how to parse the MIME types.

wschildbach: There are certain profiles of DolbyVision that are backward compatible.
… Meaning that you can decode the same stream in different ways.
… That's not the case in every profile.
… The ask is that it's possible to query the system to know whether it can decode.

Mark_Foltz: So a single stream which can be decoded either as SDR or HDR.
… Would the profile be different? Would it have to be part of the query?
… Would I be able to make a second query to know whether I can decode as SDR or HDR?

wschildbach: Unfortunately, no.
… If you take dvh1 as an example.
… The answer might be "yes", but you don't know at this point whether it can decode into a stream that supports DolbyVision.
… It's a property of the decoder.
… Having an enum for HDR Metadata type for DolbyVision would solve it.

cpn: If the solution is to add an enumeration value, the question becomes what is that string.
… We don't have an example of a vendor specific value.
… The question is the criteria that we have to add an enum value.

wschildbach: If you have an enum, somewhere you need to specify what this enum means, with a reference to some specification.

Eric: Is there a publicly available specification?

wschildbach: There are documents that describe profiles, but it depends on what your expectations are.

Eric: It sounds that the answer is no.
… That was the issue when we originally talked about this.

wschildbach: Why is it an issue?

Mark_Foltz: That is an issue with proprietary codecs in general.
… For Media Capabilities, we need to know how to parse a MIME type and parameters. The minimum that needs to be public is that.

wschildbach: The user agent needs to understand what they need to do to query the underlying system, right?

Mark_Foltz: Yes.

wschildbach: OK. Not public today, but could perhaps be made public in the future.
… There's still going to be something that is specific to the implementations in practice.
… If the implementation knows how to talk to codecs in Windows, that should work.

Mark_Foltz: As far as the API shape goes, I'm still looking at what's missing from the current API to enable this at all.
… HDR Metadata, and then a type fallback.

wschildbach: No, there is no additional thing that someone would need to query for the MIME type.

Mark_Foltz: Are all DolbyVision streams going to be represented by these MIME types?

wschildbach: These are the only ones right now in the scope of this discussion.
… If a codec returns "can decode", it doesn't imply that it can decode DolbyVision. It may decode HDR10.

Mark_Foltz: So you need a query that can tell whether you can support DolbyVision without a fallback.
… I think I understand the problem more.
… I can take that back internally.
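A sketch of the enum addition being discussed, assuming today's Media Capabilities video configuration; the "dolbyVision" HDR metadata value and the codec string are hypothetical examples, not spec text:

    // Today's HdrMetadataType values are "smpteSt2086", "smpteSt2094-10"
    // and "smpteSt2094-40"; "dolbyVision" below is the hypothetical addition.
    const info = await navigator.mediaCapabilities.decodingInfo({
      type: 'media-source',
      video: {
        contentType: 'video/mp4; codecs="dvh1.08.07"', // example DolbyVision MIME type
        width: 3840,
        height: 2160,
        bitrate: 10_000_000,
        framerate: 24,
        colorGamut: 'rec2020',
        transferFunction: 'pq',
        hdrMetadataType: 'dolbyVision', // hypothetical enum value
      },
    });
    // info.supported would then mean "can decode as DolbyVision",
    // not merely "can decode the base layer as HDR10".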

Minutes manually created (not a transcript), formatted by scribe.perl version 248 (Mon Oct 27 20:04:16 2025 UTC).
