Meeting minutes
(Some more photos from the MEIG Meeting are available online (Member-only))
(The minutes from the joint meeting with APA WG and TTWG are available online.)
(Some more photos from the joint meeting are also available online (Member-only))
M&E IG Introduction
<kaz> Slides
Chris: Igarashi-san, Chris Lorenzo and myself co-chair the IG. Kaz is our staff contact.
<kaz> MEIG Charter
Chris: The role of this
group is to provide a forum for media related discussions.
… We maintain relationships with other groups
within W3C and with other organizations, to help bring requirements from
outside of W3C into W3C,
… so that they can be used as input to Working
Group discussions.
… Looking at the entire spectrum from media
inception to rendering.
… The group was at the inception of what became
Media Source Extensions and Encrypted Media
Extensions.
… Ongoing revisions of these specs are now
happening.
… The web platform is much more powerful regarding
media.
… We look at use cases, requirements and gaps to
feed into existing working groups. When it's something new, we can
look into creating a community group to incubate a solution before
it transitions to a working group.
Chris: [presenting a slide with links to M&E IG resources]
Chris: 3 sessions today,
this morning, then joint session with Timed Text, and then again
this afternoon.
… [going through the agenda of the morning
session]
Next Generation Audio codec API proposal
<kaz> Slides
<Niko> Presenter: Bernd Czelhan
Bernd: Together with
Chris from BBC and colleagues from Fraunhofer, we've been working
on a proposal for next generation audio codec API.
… Next generation audio experiences are defined in
an EBU factsheet.
… The definition is a little abstract but has some
key points.
Slideset:
https://
Bernd: The demo shows
different mixes, switches between different language commentaries,
possibility to change the prominence of the dialogues.
… And you can change the presets.
Bernd: There are
differences with traditional audio codecs. The main difference is the
use of embedded metadata that allows the user to select a preferred
content version. It's also possible to adjust the gain/prominence of
certain components.
… It's also possible to adjust the position of
certain audio components.
… We need a dedicated interface to accept these
knobs.
… From a use case perspective, you could imagine
Home Team and Away Team presets.
… Second use case is prominence of audio objects.
Very useful to follow a video with audio in a foreign language when
the actor can be hard to understand for non-native
speakers.
Bernd: Third use case is selecting among a set of audio objects, components representing different content languages
Bernd: The fourth use case is position interactivity, e.g., if you are visually impaired.
Bernd: Last use case is controlling multiple components at the same time. For example, the braking noise of a car may be very important for storytelling.
Bernd: I will sketch a possible NGA API, that could perhaps be incubated in a Community Group.
Bernd: Some goals
include covering the full potential of NGA and be codec
agnostic.
… Thank you for the integration.
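For illustration only, a minimal TypeScript sketch of the kind of interface described above; nothing here exists today, and all the names (NGAController, Preselection, AudioObject, getNGAController) are hypothetical placeholders for whatever an incubation would actually define:

  // Hypothetical shapes for the metadata an NGA decoder exposes.
  interface AudioObject {
    id: string;
    kind: "dialogue" | "music" | "effects" | "other";
    language?: string;                                   // e.g. alternative commentary languages
    gainDb: number;                                      // prominence/gain adjustment
    position?: { azimuth: number; elevation: number };   // position interactivity
  }

  interface Preselection {
    id: string;          // e.g. "home-team", "away-team"
    label: string;
    objects: AudioObject[];
  }

  // Hypothetical control surface a browser could expose per media element.
  interface NGAController {
    readonly preselections: Preselection[];              // parsed from the embedded metadata
    select(preselectionId: string): void;                // choose a preset/mix
    setGain(objectId: string, gainDb: number): void;     // e.g. dialogue prominence
    setPosition(objectId: string, azimuth: number, elevation: number): void;
  }

  // Hypothetical accessor; a real API might instead hang off HTMLMediaElement or AudioTrack.
  declare function getNGAController(video: HTMLVideoElement): NGAController | null;

  const video = document.querySelector("video")!;
  const nga = getNGAController(video);
  if (nga) {
    nga.select("home-team");            // use case: preset selection
    nga.setGain("commentary-en", 6);    // use case: make the commentary more prominent
  }

The only point of the sketch is that the selectable presets and per-object knobs come from codec-embedded metadata, and the application asks the decoder to apply them.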
Mark_Foltz: To what extent are the use cases specific to pre-recorded content?
<Wolfgang> Bernd: metadata can be created for live as well as pre-recorded content.
Bernd: Someone needs to take care of creating the metadata, but the demo I showed was a live show.
Mark_Foltz: To what extent do we expect that the metadata already exists or could be synthesized?
Bernd: Assumption so far is that metadata already exists.
Bernd: Our idea is to make NGA available in the Web context.
Nigel: Can you be more specific about the metadata being exposed?
Bernd: metadata has been defined in MPEG ISOBMFF
Nigel: This is to be able to expose options to the user in the player.
Bernd: Yes.
Nigel: Why is that a good thing to depart from the DASH/HLS model?
<Zakim> nigel, you wanted to ask what are the API requirements
?1: It allows you to be more efficient, e.g., with one mono-channel in two different languages.
Nigel: It's more efficient at the provider side?
Niko Färber: Yes.
Wolfgang: There are
other use cases that make more sense. I wouldn't say that the DASH
model is being made obsolete, or that the proposal is even
comparable with the DASH model.
… What you need is a way to control the decoder
with regards to how it uses the metadata.
Nigel: Thinking about
the accessibility of this, do you have a scheme in mind for a way
to associate the choice that the user might make in terms of
preselection and an associated text track.
… Let's say you switch to French. Do you think
anything additional is needed to associate the chosen language with
the subtitles?
Bernd: The API would give you the metadata so that you can easily see what languages are included.
Wolfgang: If you're using this with DASH, I would expect all the preselection metadata to be also available in the DASH manifest.
Chris: What's being proposed here is to take these use cases and move to the next stage of work: taking these requirements and turning them into some kind of Web APIs that browsers would implement in order to enable web applications to make use of these customization features.
<kaz> Nigel_Megitt, Francois_Daoust, Timo_Kunkel, Wolfgang_Schildbach, Chris_Needham, Tatsuya_Igarashi, Kaz_Ashimura, Kensaku_Komatsu, Mark_Foltz, Nikolaus_Faerber, Bernd_Czelhan, Hiroki_Endo, Hisayuki_Ohmata, Song_Xu
Chris: The question to the room is, do we think that's a valuable thing to be working on? If so, what do we do to take it to that next stage? If not, what are the concerns or barriers that might exist?
Nigel: My response to
that question is: who is interested in implementing and deploying this?
There's work going on in ISO for defining some of this stuff.
What's the implementation outlook for this?
… Is there a chicken and egg situation?
Wolfgang: In terms of
implementations, I know that the Dolby and Fraunhofer NGA codecs
have supported these features since the beginning. These features
are just not exposed in a Web context.
… We know some consumers who use these features in
a native space and would like to use them in a web context too, but
cannot right now for lack of an API.
Wolfgang: The MPEG Systems group at ISO has recently published a final draft. It's final, it won't be changing anymore.
Francois: That's all
about an audio API - there's probably metadata in the video space as
well,
… created by cameras etc. Is there an interest in
approaching that with both angles at once?
… Like we do in other contexts, treating audio and
video the same?
Wolfgang: We presented
this as an audio API because we're audio guys
… In ISO/MPEG this was discussed in a media
independent way, and it does transfer
… to video, for example with picture in picture or
multiview video.
… if you have codecs where HDR is available as a
post-processing layer, switching it on or off
… could be exposed via preselections as well. Use
cases like picture in picture or multi-view video.
… So it could be relevant for video.
… I haven't thought about subtitles, maybe it could
be a use case as well
Bernd: We're interested
in a general API and our idea is that if we move to the next
stage
… then people will bring their contributions to
it.
… (video people, not us, we're audio
people!)
Mark_Foltz: Pieces of
feedback: Before drafting an API, try to understand precisely what
the technical gaps are.
… And two, if you're not targeting codecs per se,
try to explain the limitations of what you're trying to do. For
example, metadata that is carried within the codec or on the
side.
Bernd: The metadata may impose certain requirements, for example that you cannot play some components instead of others, or that there is no way to disable a component entirely.
Wolfgang: Maybe I'll
clarify the out-of-bitstream delivery. With ISOBMFF, the metadata
will be in the init segment. With MSE, you would be able to inspect
the metadata.
… The practical gap is telling the decoder which of
these metadata to apply. I don't think that can be easily
emulated.
Kaz: Applying this to
3D audio and 3D video, at some point we may want to look at the
relationship between this API and geolocation information of the
speaker and listener.
… A way to express the actual location of the audio
input and output.
Wolfgang: For the
audio, it's possible to qualify what the layout of the audio
is.
… I think that would be possible to express this in
the metadata. That would be a question more for the file
formats.
Chris: With this kind of codec, does it work well with outputs such as headphones binaural to position things in a 3D space?
Bernd: Live meetings could come to mind. That's a possibility, but then someone needs to create the metadata in the first place.
Wolfgang: I think sources can be positioned in a room, and then some metadata can be used to create interactivity, but that's already a very advanced use case. The very basic "selecting one of a set of versions" works with immersive audio, but it's more basic.
Youenn: The API would
be about allowing the web app to tell the user agent to select
this or that setting. By default, I would expect the user agent to
be good at selecting the right choice.
… Describing exactly why the web application may be
in a better position to choose would help push this
forward.
… In terms of accessibility, usually, the user agent
is in a good place.
Bernd: In the use cases I presented, the ability to change the prominence of a component is a good example of where things may be hard to determine for the user agent.
Chris: Back to identifying gaps, it seems to me that what is being proposed here is coming from a place where new codecs were developed to satisfy a set of use cases. Mark, you're suggesting to look at it independently of whether new codecs had been created, to understand precisely where the gaps are.
Mark_Foltz: One example is real-time mixing. How can the metadata interact with the Web Audio API. Identifying the gaps would be useful for that. I don't think browsers today understand NGA codecs.
Wolfgang: I believe that the browser needs to understand a specific codec in practice, the metadata is codec agnostic.
… In response to Youenn: I agree we should investigate which are the cases where the browsers' default choice is not optimal. As Bernd says, there are some "one size does not fit all" cases.
Song: I'm working on
NGVC (Next Generation Video Codec). To introduce a new codec, I
need to prove what the new codec brings is necessary.
… Examples of power consumption, efficiency. I'm
not familiar with next generation audio. Is the task to introduce
new codecs and/or the API?
… Do we have any plan to introduce NGV to the Web
too?
Bernd: We're really just focusing on the interface. The technology is deployed today otherwise.
Song: If we introduce the NGA to the web, do we have any plan to introduce the NGV?
Wolfgang: I don't think
a plan exists. As we mentioned earlier, this proposal that we're
bringing, we think, applies to video as well.
… We're looking for feedback on what needs to be
changed or added to also make it work for video.
Chris: Trying to bring
some conclusion. There isn't strong support that incubating
an API is the right next step. There's more support for specifying
more thoroughly the analysis of use cases, requirements and gaps. That
is something that the Media and Entertainment IG can do.
… I would find it helpful to hear whether continuing work in
this direction is seen as worthwhile.
… Focusing on audio as we have been, the following
question is about broadening the scope to video.
… Both of these things can be addressed in the
IG.
… How much interest is there in the outcome of the
work?
… Who's interested in helping write this analysis
document?
Wolfgang: We're willing to contribute to the gap analysis. We'd be looking forward to having more contributors to the gap analysis.
ZheWang: Huawei would also be happy to join this gap analysis.
Nigel: For players,
there's a place where players could be made aware of different
options. Maybe we should think from that perspective. For example,
we mentioned DASH and NGA, there's no point if metadata
disagree.
… I can think of many cases where this kind of
thing is helpful for audio. I don't think I've ever seen different
video options available.
… Sign-language interpretation could perhaps come
to mind.
… A holistic view of the picture would be useful. A
step beyond that, how should they be making a decision for what the
users need? From an accessibility perspective, it is not easy to
know what settings need to be applied. MovieLabs has an algorithm
to compute scores that can help make a choice.
Wolfgang: I can relate to the difficulty of making a selection.
Chris: Proposed next step: elaborate a gap analysis in this IG. The scope can include the use cases. We may include both NGA and NGV. That's what I'm proposing that we do next.
<Zakim> nigel, you wanted to note that signalling alternative playback options more generally could be useful
Francois: Anyone willing to become the main editor who can drive the analysis?
Bernd: Happy to.
Media Capabilities
<kaz> Slides
Timo: We brought
forward this proposal last year at TPAC in the AV4browsers CG.
We're now thinking of moving this proposal to this IG.
… This is a quick update.
… Starting point is the Media Capabilities
specification.
… We've been looking at how to specify HDR format
support.
… Currently, you have color gamut and transfer
function, which is a great start but there's no way to target
commercial formats.
Timo: Proposal to adopt similar rules for HdrMetadataType
Timo: Proposal is to
add commercial formats, not in the enum but through a registry
instead
… Would love to see a way to develop this further
and make it a reality
… Define registry entries, the three existing ones,
plus Dolby Vision and any others that are needed
… Can look at how this could be
materialised
… Continue discussion in this group and bring it
over from the CG to this group
… One comment that came up:
… To have more documentation
available.
… Since last year we have published a lot of
documentation online about how Dolby Vision works
… and how to implement.
… [slide] Links to resources
… Hope these slides will be shared
later.
Slideset:
https://
Timo: Nothing new
compared to last year, just a recap as to what we discussed.
… This is just a recap.
… Comments, are we making omissions?
Chris: Is it sufficient to have a single Dolby Vision Metadata value, or is more granularity needed?
Timo: to my
understanding yes, but there is flexibility for new values if
needed in the Registry in the future.
… At the moment, it's just one.
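For reference, a small sketch of how an application already probes HDR support through the Media Capabilities API; "smpteSt2094-10" is one of the three existing HdrMetadataType values, and a registry-defined identifier for a commercial format would be queried the same way (the "dolbyVision" string in the comment is purely hypothetical; depending on the TypeScript lib.dom version, hdrMetadataType may need a cast):

  // Sketch: probing HDR decode support with the existing Media Capabilities API.
  async function probeHdrSupport(): Promise<void> {
    const info = await navigator.mediaCapabilities.decodingInfo({
      type: "media-source",
      video: {
        contentType: 'video/mp4; codecs="hev1.2.4.L153.B0"',  // example HEVC Main 10 string
        width: 3840,
        height: 2160,
        bitrate: 20_000_000,
        framerate: 60,
        colorGamut: "rec2020",
        transferFunction: "pq",
        // One of the current enum values; a registry entry for a commercial
        // format (e.g. a hypothetical "dolbyVision" identifier) would go here instead.
        hdrMetadataType: "smpteSt2094-10",
      },
    });
    console.log(info.supported, info.smooth, info.powerEfficient);
  }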
<Zakim> nigel, you wanted to ask about Registry definition ownership
Nigel: For a Registry, you need a WG to own the Registry Definition. Have you talked to any WG about this?
Timo: Not yet
Nigel: Also they could delegate management of the Registry table values to an IG, potentially.
Chris: Media WG would be
a good fit for owning this, if they agree they want to do it.
… Media WG has a large number of existing
Registries.
… Like for EME, MSE, WebCodecs, etc. Used to
managing Registries in that group.
… Would change the API Specification, from having
an enum value to other identifiers.
… There's complexity with enums, we switched to
strings.
… We could revisit the decision to have a Registry
if it's felt that just adding an enum value to the
… existing spec is the appropriate way to
go.
… The concern that we had when we discussed it last
year was variability between implementations
… that meant that the presence of the Registry
wouldn't necessarily mean all implementations
… would be required to include it. Is that
correct?
Timo: Not sure, there
was concern about adding a commercial enum value.
… From our point of view we would like to be able
to address the format correctly,
… don't mind how, we're happy with both
approaches.
Wolfgang: We're not
asking for any specific technology to be supported.
… Just an addition to the media capability API so
that the client code can ask what is supported.
… "No" is a perfectly acceptable answer.
Chris: Trying to recall why we concluded that a Registry would be worthwhile last time.
Francois: Might depend
on whether a normative definition of what it means is needed.
… In this case, if there's no public specification,
it means that you end up with a normative reference
… to a commercial specification, which you may not
want.
… The Registry provides an intermediate
layer.
AramZS: I noticed when
reviewing the spec, you should try to get it early through
PING.
… Adding a navigator object that returns
implementation support details is a fingerprinting
vector.
… Should get PING review early rather than
later.
Francois: Seems to be a
more generic question about the Media Capabilities API, or just on
this new value?
… The API already has an identified issue about
fingerprinting and mitigations.
… There's been interaction with PING on this
already.
… You may not like the solution!
Chris: Concerns about
what you mentioned did come out.
… To the extent of questioning the whole MSE and
EME approach,
… where the web app is doing things not just the
browser.
… Their argument is why not hand a URL to the
browser and let it handle all the details.
… We need to discuss this on Thursday - the
rationale for why the design is how it is.
AramZS: Yeah that's
pretty normal, it just occurred to me that a different approach
might be
… more satisfactory from a privacy
focus.
Chris: Yes, that's about
the API in general not just the individual value.
… Thank you, it's a very good point.
AramZS: Thanks
Timo: Thinking about
the Audiovisual Media Formats for Browsers CG, discussing next
steps, keep going or close it.
… Want to give members a chance to share their
opinions.
… At the moment we don't have too much
traction.
… Any thoughts, please let us know.
… Someone signed up only this morning, so there is
some momentum!
Chris: We created the CG
to focus on a specific scope, and as a place not to design
solutions,
… but to be more of an interest group contribution
structure.
… I'd like to fold the activity there into this
IG.
… The only question is your (Dolby) membership
status, to solve that problem.
… Then you'd get the support of Chairs here to help
move it forwards.
… What tends to happen is that the CG list is
public and can be discovered and joined
… but if there's no real activity happening there
then they end up with nothing to see or follow up on.
… Having a place where it's actively being worked
on is helpful.
… That's just my suggestion
… If people feel that a dedicated group is more
appropriate structure, happy to operate that way
… I suggest thinking about that over the
break.
Kaz: Another WG is also
thinking about Registries for various reasons.
… Would like you all to think about the whole
mechanism of the expected Registry and
… which parts would be handled by which Registry
and what our role is here.
… Please discuss during the meeting on
Thursday.
Chris: Happy to talk to the WoT WG about how they handle Registries.
Nigel: TTWG has Registries too, happy to share experiences.
Chris: Adjourn for now, next after the break is joint meeting with Timed Text WG.
TTWG / MEIG Joint Meeting
<kaz> Slides
Nigel: I chair TTWG,
with Gary Katsevman
… There are 3 topics in the agenda: Update on TTWG,
a look at DAPT, and discussion on Timed Text in MSE
… Anything else to cover?
(nothing)
Updates from TTWG
<kaz> IMSC HRM Rec
Nigel: We published the
IMSC HRM as Recommendation
… It used to be in IMSC, but we've updated it and
made a test suite, and met Rec requirements
… The biggest area of activity is DAPT, some activity
on WebVTT, and less on TTML2
… We recharter in April next year
… TTML 2nd Edition has been in CR since
2021
… Friday's TTWG meeting will assess what we
need to do to get to Rec
… IMSC has been Rec for a while. Three versions
simultaneously active
… We'll think about whether we do a new version of
IMSC
… A requirement that came up is the ability to do
superscript and subscript
… And removing the HRM, as it's in a separate
spec
… On WebVTT, it's been at CR since April
2019
… Apple will discuss interop issues on Friday, and
attribute block for metadata, and an inline block backdrop
element
… DAPT is a working draft. We aim to resolve to get
to CR on Friday
… Other things, we got feedback last year. It's
hard for beginners to the TTML ecosystem to get started. Is that a
shared issue for people here?
… We are happy to work with industry partners on
the best way to share information
… MDN documentation is user-focused, not
implementer-focused
… CCSUBS group has industry partners, sharing
implementation experience
Chris: On the TTML ecosystem, where is the feedback coming from?
Nigel: Someone in the
APA WG mentioned it. But it seems easy for people who're immersed
in it. Interested to hear views
… Also happy to take feedback outside the
meeting
Francois: Could depend
on the audience. IMSC documentation in MDN, seems a good entry
point for developers. If you're targeting standards groups, e.g.,
the W3C accessibility community, it's a different audience
… A document for them could be useful?
Nigel: The MDN doc is
useful for people making caption documents, less useful for
implementers building players
… Lots of implementations not in browsers, and
variety in the implementations
… People may want to use advanced features but not
know where to get started
DAPT
<kaz> Dubbing and Audio description Profiles of TTML2
Nigel: The purpose of
the spec, it's a profile of TTML2, transcripts of audio or video,
transform them into scripts
… Dubbing scripts, or audio description scripts,
with instructions for the player
… Can be used to play in the browser. I have JS
code to do that, using TextTrackCue and Web Audio
… Your first step creating dubbing scripts is to
capture the original language. Because it's based on TTML, easy to
transform to IMSC
… It's an improvement to the audience
experience.
Cyril: We've had
reports at Netflix where content isn't dubbed or described
consistently
… We've brought our TTAL spec to W3C for
standardisation. It's something we really care about
… I have colleagues who are German-speaking, with
dubbing and captions in German. If the entire sentence is
different, it's really annoying
Nigel: That happens
because when people localise the media, they send it to different
providers for dubbing and for captioning
… DAPT is good, as it allows people to do the
transcript and translation stage once, then go the dubbing and
captions separately - from a common source
Cyril: DAPT aims to fill the gap between the different vendors. Scripts weren't shareable between formats, Excel, scanned PDFs with notes. So a standard machine-readable format helps
Nigel: Current status,
it's a working draft. The spec is defined with a data model,
defines entities to represent the scripts, then how to represent
the data model in TTML2
… It helps guide us producing tests
… The data model is a script, with script events
(timed). Those contain text objects, potentially in different
languages. Then what that represents - dialog or non-dialog sounds,
something in-vision (text or objects)
… If text, is it a credit or a location
indicator?
… We may use a registry for the content descriptors
for what each thing represents
… Happy to answer questions. Anything
else?
Cyril: We tried to
write the spec in a way that you don't have to understand the
complexity of TTML
… Incrementally understanding the concepts.
Hopefully we did a good job, interested in feedback
… Also, a simple way to implement, which can be very
restricted to what you actually need, because DAPT is a strict
subset of TTML
… DAPT isn't about presentation, subtitles in a
position with colour. It's about the timing and the text
Nigel: But you can add the styling and position data if useful
Nigel: There's
significant interest in using this. People are using proprietary
standards. Some of the AD standards are old and hard to
extend
… So by doing this, and improving the feature set,
we want to make it easier for people creating localised
versions
… For the end user, for dubbed versions, we don't
intend to expose the ... directly. No expectation to deliver to the
browser for mass distribution
… For AD, could all be provider-side, but there's
an additional benefit of delivering to the browser
… The user can adjust the mix to their own
needs
… In the UK broadcast market, you can adjust the
mix levels. Clash between background sounds and dialog in the
description
… As well as changing the mix, you can expose to
assistive technology, e.g., a braille display. Watch video and get
the braille description simultaneously
Wolfgang: How is the use case of changing the audio levels being implemented today in web apps?
Nigel: I use the DAPT
description to drive gain nodes in Web Audio. It is feasible, but
not typically done in the web space. Similar use case to earlier
this morning
… If you wanted to create an NGA version, with
preselection, you could do it by processing a DAPT document. Happy
to work with you on that
Wolfgang: I'd be interested to try that
Nigel: The gap is that DAPT can contain the text and the audio components, but with NGA I don't think there's a way to put the text equivalent in. How would you sync that with NGA?
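As a rough illustration of the pattern Nigel describes (not his actual code), programme audio routed through a GainNode can be ducked for the duration of each audio description event taken from a pre-parsed DAPT document; for simplicity the times here are assumed to be on the AudioContext clock:

  // Sketch: ducking programme audio while an audio description clip plays.
  const ctx = new AudioContext();
  const video = document.querySelector("video")!;
  const programmeGain = ctx.createGain();
  ctx.createMediaElementSource(video).connect(programmeGain).connect(ctx.destination);

  // One timed event from a (hypothetical, already parsed) DAPT script.
  interface DescriptionEvent {
    begin: number;       // seconds, AudioContext time in this sketch
    end: number;
    clip: AudioBuffer;   // rendered or recorded description audio
    duckedGain: number;  // e.g. 0.3 to attenuate the programme mix
  }

  function scheduleDescription(ev: DescriptionEvent): void {
    const src = ctx.createBufferSource();
    src.buffer = ev.clip;
    src.connect(ctx.destination);
    src.start(ev.begin);
    // Duck the programme mix while the description is audible, then restore it.
    programmeGain.gain.setValueAtTime(ev.duckedGain, ev.begin);
    programmeGain.gain.setValueAtTime(1.0, ev.end);
  }

Mapping media-timeline positions onto the AudioContext clock, and exposing the chosen mix to assistive technology, are the parts a real implementation would still need to add.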
Timed text in MSE
<kaz> Media Source Extensions WD
Nigel: This topic has
come up several times over the years, and not gone anywhere
… If you're familiar with MSE, you know you
typically use them to fill a buffer pipeline of media
… It works well for audio and video, but not
specified for timed text
… When I've talked to my BBC colleagues who write
our players, they say it could simplify their
implementation
… Buffer management all done in the same way,
rather than a different mechanism for text
… Could create TextTrackCues for the data. It can
be difficult to manage the size of the TextTrackCue list, these
could grow and grow, so good to have a way to manage the
buffering
… Not suggesting doing anything different for
rendering
… Is this a shared requirement? Suggest adding Text
Tracks in MSE to Media WG?
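For context, a sketch of roughly what a player does today on the text path (parseCues is a placeholder): audio and video segments go through SourceBuffer.appendBuffer(), while timed text is fetched in parallel, turned into cues by the application, and pruned by the application, which is the buffer management the proposal would move into MSE:

  const video = document.querySelector("video")!;
  const track = video.addTextTrack("subtitles", "English", "en");
  track.mode = "showing";

  // Format-specific parsing (WebVTT, IMSC, ...) is assumed; parseCues is a placeholder.
  declare function parseCues(body: string): { start: number; end: number; text: string }[];

  async function appendTextSegment(url: string): Promise<void> {
    const response = await fetch(url);
    for (const { start, end, text } of parseCues(await response.text())) {
      track.addCue(new VTTCue(start, end, text));
    }
  }

  // App-managed eviction: drop cues well behind the playhead so the cue list
  // does not grow without bound - the part a unified MSE buffer model would handle.
  function pruneCues(keepBehindSeconds = 30): void {
    const cutoff = video.currentTime - keepBehindSeconds;
    for (const cue of Array.from(track.cues ?? [])) {
      if (cue.endTime < cutoff) track.removeCue(cue);
    }
  }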
Wolfgang: So today, subtitles are rendered by the web app?
Nigel: They could be,
or passed to the browser for rendering
… But in DASH or HLS, the pipeline management isn't
done through MSE, but in parallel by the app
Wolfgang: Playout in times?
Chris: The media element handles that, by triggering the cues at the right time
Cyril: The problem that we have to have different code paths for AV and text assets is really there
Wolfgang: So I understand it is not about playout in sync, but rather about buffer management.
Cyril: On rendering,
ideally it should be possible to download and buffer text assets
like audio and video and defer rendering to the browser, if it has
the capabilities, or the JS level
… Options for how you preserve styles. Apple want
to have OS styles applied to captions, whereas Netflix wants to
have control over the styles across the
Chris: Fetch as separate files or within the media container?
Nigel: That happens
now, you wrap the data in the MP4, that's how they're packaged
according to CMAF
… If you have the infrastructure for fetching MP4
segments, why do something different for timed text compared to the
audio and video?
Nikolaus: Question about the rendering
Cyril: It is a separable problem, but it never was separated. Feed data to the MSE, and it gets rendered. Here you need a callback to the app. How do you ensure a bad actor doesn't create many, many cues to overload the browser?
Nigel: Why would you want to do that?
Cyril: It's about the browsers need to be defensive
Chris: The requirement would be that there is some notification from MSE to the web app that a cue has been extracted from the media.
Chris: Currently we have enter and exit events, if the browser manages all this. Is this adequate and in time?
Cyril: it is too
late.
… would have to have the notification in
advance
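For reference, a minimal sketch of the existing notification path Chris refers to: per-cue enter/exit handlers and the track-level cuechange event, which fire when playback reaches the cue, i.e. at presentation time rather than in advance:

  const video = document.querySelector("video")!;
  const track = video.addTextTrack("subtitles", "English", "en");

  const cue = new VTTCue(10, 12, "example cue");
  cue.onenter = () => console.log("cue became active");   // fires at the cue start time
  cue.onexit = () => console.log("cue became inactive");  // fires at the cue end time
  track.addCue(cue);

  // Track-level notification when the set of active cues changes.
  track.oncuechange = () => {
    console.log("active cues:", track.activeCues?.length ?? 0);
  };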
Nigel: The Safari tech
preview, it puts each Cue into an HTML fragment, then the cue
rendering is obvious
… If you think about a video decoder pipeline, it's
not instantaneous either. It needs a few frames to do
something
… The processing load for putting text on screen is
unlikely to be higher than that
Chris: why does the web app not extract the cues from the media, and then purge out cues no longer relevant?
Chris: So this is about having a unified approach to the memory management
Nigel: Yes
Aram: Is there a reason why the Safari approach of HTML fragments isn't desirable?
Nigel: That moves the
presentation side to a better place than it is now
… Partly, it relates to the privacy issue we
discussed earlier. Pass the content to the browser. Modelled on
WebVTT at the moment
… But it's no good if you don't want to use WebVTT.
It provides a canonical format for WebVTT
… I have questions about exactly how it works, but
aims to be more flexible
Aram: It seems desirable from a developer perspective to have the HTML fragments in place. Developer movement to using HTML fragments, so there could be more people familiar now than even a year ago
Nigel: There are lots of formats people use in practice. We can try to avoid that mattering, than have a canonical way to pass that information through
Chris: Are there
implementations that do this with VTT today? Like creation of
cues and pulling them out of the media.
… DataCue is an example. There are implementations
that exist but are not specced.
Nigel: How to get to a conclusion? We could stop worrying about it, if it's not something we want to do. Or try to get the ball rolling
Cyril: Apple has this proposal, implemented. How did other browsers react? Was there a lack of interest?
Nigel: One proposal is
to manage the buffer pipeline for timed text through the MSE
buffer. I don't know of any implementations of that
… The other is decoding and passing to the web app,
as HTML fragments
Cyril: The second depends on the first.
Nigel: From previous
conversations, e.g., with Eric, his view was the way browsers
implement WebVTT is to already create an HTML fragment then expose
it to the shadow DOM
… A consequence of that is to move the WebVTT decoding
upstream, then it's easier to polyfill
… WebVTT spec has been in CR for 5 years, not
enough implementation experience to move ahead. But people aren't
implementing it
… This could be a path out of that. You could
democratise the format requirement, so not have a dependency on the
single format, but also make it easier to standardise the formats
people want
Nigel: Standardising the presentation part means it's easier to standardise what goes in to MSE
Cyril: If you have a
way to put text into MSE, without a way to surface that, the only
alternative is native implementation
… Browsers don't like that because of attack
surface area, complexity, etc
Chris: This is all widely
supported in other environments
… Allowing a plurality of formats overall seems a
good thing, personally
Nigel: It's a part of the web standards ecosystem that doesn't make sense to me. A lot of people view the WebVTT format as the web's format. But it's not a Rec
Chris: Is there interest to work on VTT interop?
Nigel: Apple want to talk about that on Friday in TTWG
Chris: Next steps for this?
Nigel: Text Tracks in MSE isn't on TTWG agenda. It would need Media WG support
Nigel: How to
summarise. There's some interest and benefit to pushing timed text
through MSE. Nobody said it's a bad idea. Possibly need to consider
together a canonical handover format on the output of the MSE
pipeline to the MSE processor
… And Apple have a prototype implementation based
on HTML fragments
Chris: We also discussed similar requirements around emsg events and timed metadata. But timed text seems a better motivating use case
Nigel: If you're
delivering packaged segments of timed text info, do you convert to
the canonical format on the way in to the MSE buffer or on the way
out
… I think I prefer on the way out
Chris: Need to get implementer input, we have significant streaming providers interested in using it, but the folks aren't in the room right now
Aram: Want to see improvement in this area, so it does seem worth doing
Chris: Suggest catching up with people during TPAC week
Nigel: Action should be to tell Media WG that we've discussed, and there are some views on how might be implemented
Media Breakouts
Igarashi: There are several media breakout sessions proposed.
<kaz2> Sync on Web, now and next of realtime media services on web Breakout
Komatsu (NTT Com): On Wednesday, we
have a session on Media Over QUIC. People interested in low latency
features
… Also synchronising media and data with the video
and how can be realised with MoQ
… I'll demonstrate some use cases. I think there's
a gap, so we'd like to discuss
<kaz2> Smart Cities Breakout
Kaz: Smart cities will be at 4pm
https://
<kaz2> HTTPS for Local Networks Breakout
Igarashi: HTTPS on the local network, I was co-chairing the CG, but others have suggested the topic. Media streaming on the home network
Song: Is that MoQ?
China Mobile is interested, we released a joint development at MWC
in Barcelona
… How can I find the details on the
breakout?
https://
<Igarashi>
https://
<Igarashi> Evolved Video Encoding with WebCodecs
<tidoust> Official breakout schedule
CTA WAVE Streaming Media Test Suite
<kaz> Slides
Louay: Sorry I couldn't
be there in person
… You may have heard about this already, we
presented to an MEIG teleconference
… I'll explain about CTA WAVE in
general
… Consumer Technology Association, Web Application
Video Ecosystem
… Web delivered media and commercial audio and
video on consumer electronics
… WAVE defines a set of specs, test suites, and
test tools
Louay: Listed here are
some of the most important specs, Device Playback Capabilities is
the focus today, others like CMCD, CMSD, Web Media API
… Test suites are available to be able to test
these specs and whether consumer devices are compliant
… Web Media API Snapshot test suite, but here I'll
focus on DPC
… Conformance test software, GCP is a conformance
tool to make sure DASH content is conformant to different
profiles
… Going into the DPC spec, it defines normative
requirements for content playback on devices
… CTA WAVE content is not only content and
programmes, which is basically a sequence of CMAF
content
… Media playback model, DRM protected media, WAVE
content playback requirements etc
Louay: With playback
requirements, there are multiple scenarios, where DPC makes sure
devices are compliant, switching sets, random access, sequential
track playback
… These are defined in the DPC spec and the Test
Content Format spec
… They define what steps are needed to run a
sequence of tests and pass/fail criteria
… W3C testing activities, mainly happens in the WPT
project. In the DPCTF test suite we're introducing additional
components to what is already in WPT
… Slide 5
… The main components, with GitHub links. Mezzanine
content: annotated A/V content to make the testing process
easier
… The main requirement here is to make the test
process automatic, no human support to run the tests, which would
be time consuming
… Detect if frames are being skipped on the
device
… Next are the test content repositories. Different
codecs and configurations
… These are used in the test conformance validator.
Joint project between DASH-IF, CTA WAVE, HbbTV
… You can run the tests on devices in different
environments. We focus on the web environment, MSE and
EME
… Slide 4 shows there are multiple tests. For each,
there's a test implemented
… The most important component is the observation
framework
… This is basically a camera that records A/V
content playing on the device
… After a test is done, the recording is
automatically analysed to understand if the test passes or
fails
… Slide 6
… A test video is playing, video includes
annotations, rotating QR codes on each frame. It includes info
about the current frame displayed
… It allows the OF to automatically
analyse
… The red arrows allow you to ensure the content is
properly displayed on the screen
… QR codes for audio and video sync
… Time codes are shown
… We have audio annotations, provided by Dolby and
Fraunhofer IIS, partners in the project
… Also available in different frame rates and
fractional frame rates. CMAF test content is generated from the
mezzanine content
… Slide 7
… CMAF media segments and fragments, with metadata
describing the segments
… DASH MPD, but you can use HLS as well, as they
reference the same content
… We have validators for DASH content
… The content will be encoded in different
combinations of content options. Special content for
debugging. Different resolutions for switching sets, and DRM
protected content
… There's a matrix between the tests and the
content. And a validation process that uses JCCP DASH
validator
… Test implementations are in HTML and JS
templates
… Slide 8
… The templates reference the specification section
numbers, clear definitions for how the tests should
work
… The templates allow a single test to be run with
multiple content options
… A test is a combination of HTML, JS, and CTA WAVE
content
… Using MSE and EME for encrypted tests. We have
common libraries used in different tests, MPD parsing
… Why not use existing players like dash.js or
Shaka Player? We're trying to have a minimal implementation,
independent of external players, the test instructions work at the
MSE level, buffering
… The players don't give the flexibility from a
testing perspective
… They include what is to be tested and nothing
else. Sequential playback, random access to fragments, buffer
underrun and recovery
… This is just a small snapshot of the
tests
… Slide 9
… The HTML and JS templates are similar to those in
the WPT repository, e.g., for WebRTC, Fetch API, etc
… Tests focus on the functionalities of specific
APIs. Similar here
… We focus on embedded devices, streaming
devices and smart TVs, but this also applies to any mobile or desktop
device
… The test runner takes the tests from the test
repository and runs them on a device
… It's built on top of WPT. We extended WPT to
support embedded devices, where you don't have much flexibility on
user inputs
… Also limited ability to install a test runner,
e.g., an HTTP server
… Our extension is contributed back to the WPT
project
… You can use these with existing W3C WPTs, not
only the tests developed by WAVE
… You have a Device under test, which just needs to
open a web page, scan a QR code
… You have a companion app where you can select the
tests to run
… You can do a full test run, or a selection of
tests
… You can filter the tests, e.g., a set for HbbTV
specifications
… Slide 10
… The last component is the observation framework.
One component of this is recording. You can use a smartphone to
record
… One requirement is the recording
quality
… Each frame needs to be detected, so the frame
rate needs to be at least 2x the frame rate on the TV
… With new smartphone models you can record at
120fps
… Most of the content is 50 or 60fps or a
fractional frame rate. In future we'll have 120fps content, so
you'd need to record at 240fps
… Record offline, then copy the video file to a
server and run the OF script on it
… This will analyse the content frame by frame,
using the QR codes
… It detects if frames are skipped and so
on
… With automatic observation, it's important to
have automatic generation of test reports
… This picture shows 2 additional QR codes. The one
on the right is generated by the test app, not in the video
content. It includes info about the status of the tests and the
video playback
… You can compare the frame timestamp with the time
recorded by the video element
… For the random access to segment tests, you can
check the right events are generated. All this info is embedded in
the QR codes
… The tests check that the media player integration in
the device is working.
… Slide 11
… To make it easy to use the test suite, we made a
landing page, with an explainer
… You can instal and run the tests
yourself
… Everything is open source, in GitHub
… You can raise issues in GitHub or provide
feedback
… I recommend the landing page as the starting
point, follow the instructions there
… [Shows demo video]
… Slide 12
… Cable connects the audio output from the
TV
… Now the test is playing on the TV, you can follow
the progress on the companion screen
… We have assertions that the video is playing
properly and no errors from the media element or MSE
… Now the video is ended. We have special QR codes
for the start and end frames, because in many cases there are
issues with playing the first or the last frame of the
content
… The next content is then loaded. A special QR
code is shown between the tests
… The OF knows where the test results are to be
submitted
… When all the tests are finished, you get a QR
code on the test screen, the OF detects the test is
ended
… These aren't the final results as you then need
to do the video analysis. The page is then updated
… You can download the results in HTML or JSON
format, same as WPT
… Slide 13
… Technical requirements for installing the test
suite. We prefer Linux, everything is containerised with
Docker
… If you're running the DRM tests you'll need TLS
server certificates
… And you need to record at 120fps
… Slide 14
… Test report is similar to WPT
… In this case every video frame assertion is
failed, because a frame was skipped
… Another failure is the playback duration, which
doesn't match the CMAF track duration
… Slide 15
… We use the HbbTV plugfest to do testing and
validation. There have been testing events in the last 2 years, the
next is in 2 weeks at Fraunhofer Fokus in Berlin
… We'll ensure the test suite is working across
devices
… We use this daily in our labs
… Slide 16
… How does it work with HbbTV? We use a setup with a
DVB modulator where you can signal the app to the TV, which is the
test suite landing page
… The device under test is an HbbTV
terminal
… But it could be any device with MSE and
EME
… Slide 17
… Test results from the plugfest last year. We run
the test suite on the latest TVs on the market
… Slide 18
… Any questions? You're welcome to get in
touch
Wolfgang: You use HbbTV
as a convenient way to start the tests on the TV
… Are you aware of others using DIAL or other ways
to launch?
Louay: You can use
DIAL, as the way to launch the tests
… In some situations we also have issues with
external networks at testing events, you sometimes discover the
terminal through DIAL, due to restrictions on the local
network
… There are 3 browser engines on the TV. The HbbTV
browser, the runtime for running smart TV apps, like Tizen or WebOS
- these are browsers with additional features
… We use hosted web apps. Your app is a small file
that points to the landing page and you can run the tests in the
same way
Chris: What is happening next?
Louay: Audio test
content from Dolby and Fraunhofer IIS. We're now going to validate
the tests
… In terms of the OF, the most wanted feature is to
do this in real time. Currently you record with the smartphone,
then upload, then analyse
… So while you're recording we'd start the analysis
in parallel. Maybe we can get it working in real time
… It'll mean you don't need to wait to get the
results
Kaz: wondering about NHK's interest in this mechanism for testing. Differences with Hybridcast, but we should be able to use this framework
Louay: HbbTV is one
platform, but we don't use the broadcast features or HbbTV specific
APIs. We only rely on HTML, JS, MSE
… and EME, for the DRM tests
Ohmata: IPTV Forum Japan is a
member. We tried to use the suite on Hybridcast TV. The limited
memory and CPU on TVs causes some to freeze
… We're trying now to use all the tests
Louay: Non-functional
requirements. If you test on older TVs, you may have issues with
memory management. If the test suite fails because of these
limitations, it's an indication that an open source player like
DASH.js or Shaka might also have issues
… It's helpful for us, create a GitHub issue and
we'll do our best to support you
<nigel> Ohmata: Thank you
Chris: Consistency of implementations or any W3C spec issues that we can help with for interop?
Louay: MSE is the most
important API. There are examples in the tests for using MSE. The
thing we see most is skipped frames at the beginning
… Was it really an issue with the test
implementation, or with the TV? It was a repeating failure on many
devices
… Also some with A/V synchronisation
… If you look at the validated tests now, those
pass on major TV sets
Chris: This is such an important contribution. Congratulations on the work you've done here
Web Media Snapshot Update
<kaz> Slides
<kaz> Web Media Snapshot 2024
<kaz> WMAS
John: One of the groups
in WAVE is the HTML5 API Task Force
… Some updates worth bringing to the group
here
… Since 2017 every year we've released a snapshot
document: the minimum set of standards for media playback across
the 4 major browsers
… Drives an understanding of what features are
available on TV devices at the time the snapshot is
published
… The work happens in WAVE and the W3C Web Media
API CG
… We jointly publish the spec, W3C and
CTA
… We publish every December. Starting last year
with WMAS 2023, we set the target date to November 1st
… Standards groups reference it, e.g., ATSC, so we
try to fit their timescales
… ATSC 3.0 they reference the 2021 snapshot, HbbTV
2.4 references 2021
… There is a WMAS test suite that Louay and team
have developed
… Launch through DIAL or a TV
modulator
… The primary tests are from WPT, also ECMAScript
and WebGL tests
… You can test the full set of features in
browsers for the TVs for each year
… 2024 snapshot updates. The expectations in the
next release: CSS to 2023 snapshot. This includes Media Queries
Level 4, which reduces the media types to print and screen. TV was
deprecated
… There's a WPT for the deprecation, and we tested
on devices, so it's supported and accepted
… It was added to the spec in 2011, so we were
comfortable making the change. If you're concerned let us know. Use
the media feature option instead.
… We updated to ECMAScript 2024, which has a couple of
exceptions based on what's implemented
… WHATWG are living standards, we reference a
recent review draft from WHATWG, and tell people to use this
version or later
… Also WebSockets, which is in its own WHATWG
spec
… Features anticipated to be included,
WebTransport, WebAssembly, Push API
… WT doesn't have full support yet
… You can identify which features are included,
downstream groups typically reference a year or two
earlier
… You can watch the GitHub or comment
Chris: What sources of information do you use to decide what to include?
John: Start with MDN and caniuse, validate with WPT, which we rely heavily on
Chris: Some inconsistencies in WPTs and coverage. Does it lead to interop issues?
John: Sometimes the
tests go in before the features land
… I'll check why the tests are failing, are they a
newer version of the spec? There's not an easy way to know, for a
given spec, what is the state of the spec?
… That's the majority of the work, determining
where the tests fall, e.g., the CSS tests getting
updated
MarkV: I always thought
that doing the work to find areas of incompatibility really points
out problems developers have to face
… It seems there's a missing process, to take these
exceptions to interop as a to-do list and drive solving
them
… I don't know if that's ever happened. Would that
make sense?
Francois: It is something we should do. There's an ongoing activity that's relevant, in the WebDX CG and the notion of Baseline, which appears in MDN and caniuse
<kaz> WebDX CG
<kaz> baseline-status
Francois: We have
fine-grained information, but doesn't give you high level
information. The web features project is about finding the middle
ground
… A mapping of features that make sense to web
developers, and an interoperability score, which can detect
anomalies
… This mapping exercise is ongoing, Google is
investing resources in the CG. By end of 2024 all features will have
been mapped
… It's ongoing as new features will be
added.
Chris: The browsers run annual Interop events, we could promote certain areas to be looked at?
Chris: What holds ATSC and HbbTV back from using more recent WMAS snapshots?
John: The specs could
end up in certain regulations, in certain countries. They want to
give manufacturers time to catch up. So being behind allows
manufacturers time to be up to date
… In practice I find most devices are up to
date
Chris: You seem to be doing much of the work. Do you need help?
John: Flagging tests
from updated specs. Future considerations: some suggestions from
HbbTV for WASM
… If you see we've missed something, we welcome
those updates, that would be helpful
Kaz: Given your
snapshot document includes WebRTC, WebSockets, you might be
interested in MoQ breakout on Wednesday
… WebCodecs and WebTransport
Nigel: Subtitles and
captions, the core requirements needed to implement a JS renderer
are met, but there's nothing in the snapshot that says browsers
should play back TTML or IMSC, as expected
… How much are browsers influenced by the snapshot?
Do they use as motivation for interop?
John: Haven't seen the
main browsers looking for those things. I have seen feedback from
WebKit and WPE. We hope the future considerations section will help
push things
… It's more on the CE side than the
browsers
Nigel: Successfully
passing tests or running 3rd party libraries, e.g., dash.js or
hls.js, imsc.js. If you had a stack that couldn't run that, it
would be a problem
… Have you looked at testing not just what's in the
browsers, but these 3rd-party libraries?
John: Not yet. Want a suitable way to test that accurately.
<kaz> DPCTF tests
Chris: Could do using the DPCTF tests and observation framework, to test end to end
Louay: Could extend the
OF to check subtitles are properly rendered in the right position
and right time
… Need to do it in an automatic way. For video it's
easier, so would need to come up with a solution for
subtitles
Originator Profile
<kaz> Originator Profile Web site
Michiko: The issue is:
is the info I'm being shown correct? Declaration from the
originator that the content is correct
… Show that information has not been
tampered with
… We use cryptographic techniques. Implementation
has begun this year
… Local governments are using this in 2024
… We have a breakout on Wednesday. We want to share
the challenges and discuss them. We start at 8:30
<JohnRiv> This appears
to be the event:
https://
<kaz> Breakout on Trust the Origin, Trust the Content - Originator Profile (Catalina 2)
Media Content Metadata Japanese CG update
<kaz> Slides
Endo: I'm from NHK,
grateful to participate, I was remote at last TPAC
… I'll introduce our recent activities on
interoperability of metadata for media content
… The mission of the CG is to improve metadata
interop and promote media content distribution across
industries
… Focuses on actual demand across
industry
<nigel> MCM-JP CG home page
Endo: The group is not
trying to create a new metadata specification
… We have 41 participants from 19 companies
… Slide 3
… The focus on interoperability of metadata related
to media content
… Two parts: between participants that belong to
the media industry. Then between media industry and non-media
content industry
… Each has its own organisations and
specifications
… We won't work on a single protocol. Explore
potential interop
… One goal is for various operators to use the case
studies to develop a variety of services that use media
content
… Slide 4
<nigel> MCM-JP GitHub page
Endo: We held 3 online
meetings
… We received case studies from various
industries
… Slide 5
… This case study is from the publishing industry.
Linking media content, e.g., an online dictionary
… Appropriate metadata for each
industry
… Slide 6
… TV programme metadata are used for measurement of
effectiveness of sales promotion
… It demonstrates the potential of increasing the
value of media content, across industry
… Co-creation of use cases is expected.
Unfortunately there isn't a widely used specification for TV
programmes
… Slide 7
… Some prototypes for validating interop between
publishing and broadcast
… NHK made two prototypes
… The first is where ePub content recommends a TV
program
… The second is where TV recommends ePub
content
… The prototypes were implemented based on existing
specs like ePub for eBooks (a W3C standard) and the Hybridcast spec
in Japan, with existing metadata
… Some use cases can already be realised by
combining knowledge from multiple industries
… We plan to increase the case studies and publish
a CG report next year
… We want to contribute to MEIG by sharing the
progress and results. Also contribute to W3C through the Publishing
BG and Smart Cities IG
Chris: Any indications of requirements coming from the use cases already considered?
Endo: Currently we
don't find any technical specification needs, existing
specifications are developed all over the world, and they work
well
… But the use cases cannot be realised with only one
specification currently. We found that multiple specifications and
knowledge can be needed; each specification's spread is limited to
its own industry
… So each operator in each industry should share
knowledge for the other industry members
… Common rules can be identified. The CG would like
to share data exchange examples, not technical specifications but
things necessary for actual usage of metadata
Kaz: I have attended
their group meetings. They have started to clarify current
practices and best practices, and pain points and potential
requirements for new spec work
… Next steps could include documenting those best
practices
<nigel> Chris: In your use case, as with the publishing, I'm used to something like Amazon X-Ray, where it can
<nigel> .. show you the related characters or other things to what you are watching.
<nigel> .. The Amazon ecosystem has a lot of metadata about things and objects.
<nigel> .. It makes me wonder if we need consistent identifiers for products,
<nigel> .. or is it okay just to point to different vendors.
<nigel> .. Some kind of service that can resolve an identifier and take you to an outlet.
Kaz: Could use DID in
addition to existing ISBN codes or similar
… Discussion should include what kinds of
identifiers to use
Nigel: Companies like GS1 do product identifiers
Kaz: the MCM-JP CG should organize a survey on IDs including GS1's work
Endo: Existing identifier
systems are provided, like schema.org or IMDb; I don't think this effort has
identified a technical lack there. From 20 years ago, there's a metadata
format, it's used in DVB-I
… It's a nice format, but other industries cannot
easily adopt these existing formats
… We're hopeful that in the near future
more common specs for other features can be developed
Kaz: Previous work, like TV Anytime, as Nigel just mentioned should be included in the document
Endo: Yes. I explain
these efforts in Europe to those in Japan, and we collect outputs
from them
… We plan to publish the CG report. It contains
existing ID systems and metadata. Case studies for existing specs,
IMDB, TV Anytime
Kaz: Another comment for this CG - even though it's a Japanese group, they'd like to collaborate with MEIG and others, so your input and comments are welcome
Endo: We plan to discuss business related case studies in the CG, and discuss technical issues and further specs in MEIG or other groups
Chris: Does the group meet at TPAC?
Endo: No, only online events
Chris: We look forward to the CG report and taking the work forward, when you're ready
<kaz> [ MEIG Meeting on Monday adjourned; continue discussion during our monthly calls ]