W3C

– DRAFT –
Media & Entertainment Interest Group

07 April 2026

Attendees

Present
Atsushi Shimono, Bernd Czelhan, Chris Needham, Francois Daoust, Hiroki Endo, Hisayuki Ohmata, Kazuyuki Ashimura, Niko Farber, Paul Adenot, Rob Smith, Shunsuke Iwamura, Song Xu, Wolfgang Schildbach
Regrets
-
Chair
Chris Needham, Song Xu, Wolfgang Schildbach
Scribe
cpn, song, tidoust

Meeting minutes

Announcements

<song> Walkthrough of the use case and requirements document

Two topics: first, the WebTransport API is entering its final stage, and the group is looking for wide review feedback.

People have experience using it in web applications.

Suggestions for changes or improvements to WebTransport are very welcome.

The group has set a deadline of the end of this month.

https://www.w3.org/TR/webtransport/

Second, video delivery and streaming in combination with WebCodecs for lower latency.

https://github.com/w3c/ColorWeb-CG/blob/main/hdr-big-picture.md

SMPTE ST 2094-50

The spec is there; follow it through to the GitHub issues, and people can send feedback.

A couple of documents for review and feedback.

If you're an HDR specialist, the document will be interesting for you.

Might need to be adapted to support HDR, including canvas and CSS.

The processing model for it is going through the spec process.

Proposed to move on, unless anybody has clarifying questions.

For the color work, we recommend joining the Color on the Web Community Group.

We'll leave a bit of time at the end for AOB.

Next Generation Audio

Next Generation Audio. Wolfgang will lead it.

Wolfgang: The document is a group draft note

thanks

Wolfgang: I suggest opening the document as I talk you through it
… I have a presentation too that summarises the document
https://w3c.github.io/me-next-generation-audio
… As some history, we decided in TPAC to make a Group Note
… Made a draft in December
… I'm hoping we can have a call for consensus
… The Note has use cases, requirements, gap analysis, and privacy considerations
… What use cases do we want to enable? Dolby and Fraunhofer have developed this together
… And what would an API need to provide?
… The requirements describes cross-cutting concerns, e.g., it should work for all codecs
… The gap analysis answers why we think existing APIs can't be used to support the use cases
… The privacy considerations is a collection of thoughts, regarding privacy implications
… Please interrupt to ask questions
… The first use case is selecting a preselection. Interacting with the gain or volume, e.g., to increase the dialog
… Related is position interactivity, where you could put the dialog in a position where it doesn't overlap other audio elements
… Then, selecting individual audio elements in the mix. e.g., musical instruments to apply gain to them
… Another is where all these elements are controlled in conjunction
… The document has more detail
… Requirements - the first is to be codec agnostic. We're not asking for an API that's specific to one company's codec
… It should work for protected media. If the API only works in non-protected use cases, we think it doesn't solve the commercially relevant use cases
… It should work where there are multiple media streams. At a minimum, you'll have audio and video. (By the way, some of these concepts could also apply to video)
… With multiple audio streams, the personalisation can apply to one of those, so apply to the media stream, not just the device
… Controls should happen in realtime. If a user selects a specific preselection, they want it to be active right away, no perceivable latency
… Non-blocking hardware access: what we mean is that users will interact with the media, as it plays, and the APIs should be asynchronous, and not require waiting until a presentation is done. It should be async to the media playback
… Gap Analysis. In meetings so far, we were asked to explain why existing web APIs can't be used
… There's HTMLMediaElement, WebCodecs, e.g., in conjunction with AudioNode. Or implement it all in WASM or JS?
… HTMLMediaElement has an audioTracks attribute. It could conceivably be used to select audio preselections. But doing this confuses tracks and preselections. Some subtle problems are described in the Note
… Some audio tracks have very few attributes, e.g., language and kind, not enough to select by the user
… Selection semantics may not work, selecting audio tracks is mutually exclusive and may not work for preselections
… Use WebCodecs and AudioNodes to mix and process the result? A limitation here is that WebCodecs output is in the clear, and AudioNode input is in the clear
… So would work for non-protected content. And no object audio support. If we want to support spatial and object audio, moving it in space, it would have to be built in. It's not available today.
… If the pipeline is built by the application developer, the content creator has no control of the end result. This is important to creatives to ensure their product is presented
… JS and WASM implementations have a performance concern. It might work on a PC, but not on a TV set. Battery life is related to performance limitations. Also this doesn't work with content protection
… Finally, the privacy considerations. It's hard to compartmentalise these. Everything comes down to fingerprinting mechanisms. Not all of these are really germane to this API, they apply to any new API. For example, if the API is supported on one platform but not others, that's fingerprinting surface. It might also provide information about the media the user consumes
… That might happen, through same-origin information leakage. If you open a media stream and query what's available, or what the default is, it might leak information about the user's default
… If the user sets preferences and those are shared between sessions, another session might query those set in the first session.
… This might happen implicitly, e.g., a smart implementation might pre-filter the personalisation options available against some preferences. So if you are able to get at the list of pre-filtered choices, it reveals something about what was filtered out
… These are considerations for implementers
… Data persistence, as preselection choices might persist beyond one session
… Any questions?

(none)
… Can we do the CfC?

Paul: Might need some wider exposure on the mailing list. So stakeholders can read through it

<RobSmith> Is there a relevant audio CG from which we could seek feedback?

In essence, an IG cannot publish a spec. We could go through the same process as a WG.

The mental model would be that if a browser implementer looks at it, the document exposes enough information.

Chris: Is the document comprehensive, in terms of which codecs it includes? We've had a liaison in Media WG on 3GPP IVAS

In terms of the set of codecs it's considering, the next generation audio codecs are described in the Group Note, like 3GPP IVAS.

The API approach would work across all of those codecs.

Chris: So we could reach out to those groups

Bernd: Could we make a list? We could put a first version out there?

RobSmith: A suggestion, would putting together a demo be a good idea, to show how it works? It invites others to review how it fits their own model

Wolfgang: We've made demos, e.g., one at a previous TPAC

Bernd: There are standards around the world using NGA with a certain toolset. We could do another demo, but I feel we wouldn't make progress on the formal status of this Note.

Wolfgang: We'll need support of browser vendors, so that will need demos. I don't think demos progress the Group Note

<tidoust> Support for content protection in WebCodecs

Paul: Worth reading the WebCodecs issue on protection in WebCodecs

Rob: WebVMT was published as a Note 3 years ago. The process we used was to invite review with a 6 or 8 week deadline, and to advertise it. I got some good feedback.

cpn: I think next step is getting stakeholders feedback
… and then do the formal publication later on.

Wolfgang: I'll send this around and set a deadline for review. 4-6 weeks.
… It's understood that we need to have more discussions with implementers.

Kaz: As Rob mentioned, there's a chicken and egg question. Asking Audio WG makes sense. However, the requirements in Section 4 is broader than Web Audio. So I suggest we ask the other W3C groups, like WoT and Voice Interaction for comments also. Voice Interaction guys organised a workshop on smart voice agents, and discussion included time synchronisation among multiple data streams.

Talking with them would make sense

<kaz> fyi, report from the Smart Voice Agents Workshop

Next meeting

Chris: It's scheduled for 5 May, but I'll be away

[adjourned]

Minutes manually created (not a transcript), formatted by scribe.perl version 248 (Mon Oct 27 20:04:16 2025 UTC).
