Meeting minutes
(Some more photos from the MEIG Meeting are available online (Member-only))
(The minutes from the joint meeting with APA WG and TTWG are available online.)
(Some more photos from the joint meeting are also available online (Member-only))
M&E IG Introduction
<kaz> Slides
Chris: Igarashi-san, Chris Lorenzo and myself co-chair the IG. Kaz is our staff contact.
<kaz> MEIG Charter
Chris: The role of this
group is to provide a forum for media related discussions.
… We maintain relationships with other groups
within W3C and with other organizations, to help bring requirements from
outside of W3C into W3C,
… so that they can be used as input to Working
Group discussions.
… Looking at the entire spectrum from media
inception to rendering.
… The group was at the inception of what became
Media Source Extensions and Encrypted Media
Extensions.
… Ongoing revisions of these specs are now
happening.
… The web platform is much more powerful regarding
media.
… We look at use cases, requirements and gaps to
feed into existing working groups. When it's something new, we can
look into creating a community group to incubate a solution before
it transitions to a working group.
Chris: [presenting a slide with links to M&E IG resources]
Chris: 3 sessions today,
this morning, then joint session with Timed Text, and then again
this afternoon.
… [going through the agenda of the morning
session]
Next Generation Audio codec API proposal
<kaz> Slides
<Niko> Presenter: Bernd Czelhan
Bernd: Together with
Chris from BBC and colleagues from Fraunhofer, we've been working
on a proposal for next generation audio codec API.
… Next generation audio experiences are defined in
an EBU factsheet.
… The definition is a little abstract but has some
key points.
Slideset:
https://
Bernd: The demo shows
different mixes, switches between different language commentaries,
possibility to change the prominence of the dialogues.
… And you can change the presets.
Bernd: There are
differences with traditional audio codecs. The main difference is the
use of embedded metadata that allows the user to select a preferred
content version. It's also possible to adjust the gain/prominence of
certain components.
… It's also possible to adjust the position of
certain audio components.
… We need a dedicated interface to accept these
knobs.
… From a use case perspective, you could imagine
Home Team and Away Team presets.
… Second use case is prominence of audio objects.
Very useful to follow a video with audio in a foreign language when
the actor can be hard to understand for non-native
speakers.
Bernd: Third use case is selecting among a set of audio objects, components representing different content languages
Bernd: The fourth use case is position interactivity, e.g., if you are visually impaired.
Bernd: Last use case is controlling multiple components at the same time. For example, the braking noise of a car may be very important for storytelling.
Bernd: I will sketch a possible NGA API, that could perhaps be incubated in a Community Group.
Bernd: Some goals
include covering the full potential of NGA and be codec
agnostic.
… Thank you for the integration.
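For illustration only, a minimal TypeScript sketch of the kind of interface described above; nothing here exists today, and all the names (NGAController, Preselection, AudioObject, getNGAController) are hypothetical placeholders for whatever an incubation would actually define:

  // Hypothetical shapes for the metadata an NGA decoder exposes.
  interface AudioObject {
    id: string;
    kind: "dialogue" | "music" | "effects" | "other";
    language?: string;                                   // e.g. alternative commentary languages
    gainDb: number;                                      // prominence/gain adjustment
    position?: { azimuth: number; elevation: number };   // position interactivity
  }

  interface Preselection {
    id: string;          // e.g. "home-team", "away-team"
    label: string;
    objects: AudioObject[];
  }

  // Hypothetical control surface a browser could expose per media element.
  interface NGAController {
    readonly preselections: Preselection[];              // parsed from the embedded metadata
    select(preselectionId: string): void;                // choose a preset/mix
    setGain(objectId: string, gainDb: number): void;     // e.g. dialogue prominence
    setPosition(objectId: string, azimuth: number, elevation: number): void;
  }

  // Hypothetical accessor; a real API might instead hang off HTMLMediaElement or AudioTrack.
  declare function getNGAController(video: HTMLVideoElement): NGAController | null;

  const video = document.querySelector("video")!;
  const nga = getNGAController(video);
  if (nga) {
    nga.select("home-team");            // use case: preset selection
    nga.setGain("commentary-en", 6);    // use case: make the commentary more prominent
  }

The only point of the sketch is that the selectable presets and per-object knobs come from codec-embedded metadata, and the application asks the decoder to apply them.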
Mark_Foltz: To what extent are the use cases specific to pre-recorded content?
<Wolfgang> Bernd: metadata can be created for live as well as pre-recorded content.
Bernd: Someone needs to take care of creating the metadata, but the demo I showed was a live show.
Mark_Foltz: To what extent do we expect that the metadata already exists or could be synthesized?
Bernd: Assumption so far is that metadata already exists.
Bernd: Our idea is to make NGA available in the Web context.
Nigel: Can you be more specific about the metadata being exposed?
Bernd: metadata has been defined in MPEG ISOBMFF
Nigel: This is to be able to expose options to the user in the player.
Bernd: Yes.
Nigel: Why is that a good thing to depart from the DASH/HLS model?
<Zakim> nigel, you wanted to ask what are the API requirements
?1: It allows you to be more efficient, e.g., with one mono-channel in two different languages.
Nigel: It's more efficient at the provider side?
Niko Färber: Yes.
Wolfgang: There are
other use cases that make more sense. I wouldn't say that the DASH
model is being made obsolete, or that the proposal is even
comparable with the DASH model.
… What you need is a way to control the decoder
with regards to how it uses the metadata.
Nigel: Thinking about
the accessibility of this, do you have a scheme in mind for a way
to associate the choice that the user might make in terms of
preselection and an associated text track.
… Let's say you switch to French. Do you think
anything additional is needed to associate the chosen language with
the subtitles?
Bernd: The API would give you the metadata so that you can easily see what languages are included.
Wolfgang: If you're using this with DASH, I would expect all the preselection metadata to be also available in the DASH manifest.
Chris: What's being proposed here is to take these use cases and move to the next stage of work: taking these requirements and turning them into some kind of Web APIs that browsers would implement in order to enable web applications to make use of these customization features.
<kaz> Nigel_Megitt, Francois_Daoust, Timo_Kunkel, Wolfgang_Schildbach, Chris_Needham, Tatsuya_Igarashi, Kaz_Ashimura, Kensaku_Komatsu, Mark_Foltz, Nikolaus_Faerber, Bernd_Czelhan, Hiroki_Endo, Hisayuki_Ohmata, Song_Xu
Chris: The question to the room is, do we think that's a valuable thing to be working on? If so, what do we do to take it to that next stage? If not, what are the concerns or barriers that might exist?
Nigel: My response to
that question is: who is interested in implementing and deploying this?
There's work going on in ISO for defining some of this stuff.
What's the implementation outlook for this?
… Is there a chicken and egg situation?
Wolfgang: In terms of
implementations, I know that the Dolby and Fraunhofer NGA codecs
have supported these features since the beginning. These features
are just not exposed in a Web context.
… We know some consumers who use these features in
a native space and would like to use them in a web context too, but
cannot right now for lack of an API.
Wolfgang: The MPEG Systems group at ISO has recently published a final draft. It's final, it won't be changing anymore.
Francois: That's all
about an audio API - there's probably metadata in the video space as
well,
… created by cameras etc. Is there an interest in
approaching that with both angles at once?
… Like we do in other contexts, treating audio and
video the same?
Wolfgang: We presented
this as an audio API because we're audio guys
… In ISO/MPEG this was discussed in a media
independent way, and it does transfer
… to video, for example with picture in picture or
multiview video.
… if you have codecs where HDR is available as a
post-processing layer, switching it on or off
… could be exposed via preselections as well. Use
cases like picture in picture or multi-view video.
… So it could be relevant for video.
… I haven't thought about subtitles, maybe it could
be a use case as well
Bernd: We're interested
in a general API and our idea is that if we move to the next
stage
… then people will bring their contributions to
it.
… (video people, not us, we're audio
people!)
Mark_Foltz: Pieces of
feedback: Before drafting an API, try to understand precisely what
the technical gaps are.
… And two, if you're not targeting codecs per se,
try to explain the limitations of what you're trying to do. For
example, metadata that is carried within the codec or on the
side.
Bernd: The metadata may impose certain requirements, for example that you cannot play some components instead of others, or that there is no way to disable a component entirely.
Wolfgang: Maybe I'll
clarify the out-of-bitstream delivery. With ISOBMFF, the metadata
will be in the init segment. With MSE, you would be able to inspect
the metadata.
… The practical gap is telling the decoder which of
these metadata to apply. I don't think that can be easily
emulated.
Kaz: Applying this to
3D audio and 3D video, at some point we may want to look at the
relationship between this API and geolocation information of the
speaker and listener.
… A way to express the actual location of the audio
input and output.
Wolfgang: For the
audio, it's possible to qualify what the layout of the audio
is.
… I think that would be possible to express this in
the metadata. That would be a question more for the file
formats.
Chris: With this kind of codec, does it work well with outputs such as headphones binaural to position things in a 3D space?
Bernd: Live meetings could come to mind. That's a possibility, but then someone needs to create the metadata in the first place.
Wolfgang: I think sources can be positioned in a room, and then some metadata can be used to create interactivity, but that's already a very advanced use case. The very basic "selecting one of a set of versions" works with immersive audio, but it's more basic.
Youenn: The API would
be about allowing the web app to tell the user agent to select
this or that setting. By default, I would expect the user agent to
be good at selecting the right choice.
… Describing exactly why the web application may be
in a better position to choose would help push this
forward.
… In terms of accessibility, usually, the user agent
is in a good place.
Bernd: In the use cases I presented, the ability to change the prominence of a component is a good example of where things may be hard to determine for the user agent.
Chris: Back to identifying gaps, it seems to me that what is being proposed here is coming from a place where new codecs were developed to satisfy a set of use cases. Mark, you're suggesting to look at it independently of whether new codecs had been created, to understand precisely where the gaps are.
Mark_Foltz: One example is real-time mixing. How can the metadata interact with the Web Audio API. Identifying the gaps would be useful for that. I don't think browsers today understand NGA codecs.
Wolfgang: I believe that the browser needs to understand a specific codec in practice, the metadata is codec agnostic.
… In response to Youenn: I agree we should investigate which are the cases where the browsers' default choice is not optimal. As Bernd says, there are some "one size does not fit all" cases.
Song: I'm working on
NGVC (Next Generation Video Codec). To introduce a new codec, I
need to prove what the new codec brings is necessary.
… Examples of power consumption, efficiency. I'm
not familiar with next generation audio. Is the task to introduce
new codecs and/or the API?
… Do we have any plan to introduce NGV to the Web
too?
Bernd: We're really just focusing on the interface. The technology is deployed today otherwise.
Song: If we introduce the NGA to the web, do we have any plan to introduce the NGV?
Wolfgang: I don't think
a plan exists. As we mentioned earlier, this proposal that we're
bringing, we think, applies to video as well.
… We're looking for feedback on what needs to be
changed or added to also make it work for video.
Chris: Trying to bring
some conclusion. There isn't strong support that incubating
an API is the right next step. There's more support for specifying
more thoroughly the analysis of use cases, requirements and gaps. That
is something that the Media and Entertainment IG can do.
… I would find it helpful to hear whether continuing work in
this direction is seen as worthwhile.
… Focusing on audio as we have been, the following
question is about broadening the scope to video.
… Both of these things can be addressed in the
IG.
… How much interest is there in the outcome of the
work?
… Who's interested in helping write this analysis
document?
Wolfgang: We're willing to contribute to the gap analysis. We'd be looking forward to having more contributors to the gap analysis.
ZheWang: Huawei would also be happy to join this gap analysis.
Nigel: For players,
there's a place where players could be made aware of different
options. Maybe we should think from that perspective. For example,
we mentioned DASH and NGA, there's no point if metadata
disagree.
… I can think of many cases where this kind of
thing is helpful for audio. I don't think I've ever seen different
video options available.
… Sign-language interpretation could perhaps come
to mind.
… A holistic view of the picture would be useful. A
step beyond that, how should they be making a decision for what the
users need? From an accessibility perspective, it is not easy to
know what settings need to be applied. MovieLabs has an algorithm
to compute scores that can help make a choice.
Wolfgang: I can relate to the difficulty of making a selection.
Chris: Proposed next step: elaborate a gap analysis in this IG. The scope can include the use cases. We may include both NGA and NGV. That's what I'm proposing that we do next.
<Zakim> nigel, you wanted to note that signalling alternative playback options more generally could be useful
Francois: Anyone willing to become the main editor who can drive the analysis?
Bernd: Happy to.
Media Capabilities
<kaz> Slides
Timo: We brought
forward this proposal last year at TPAC in the AV4browsers CG.
We're now thinking of moving this proposal to this IG.
… This is a quick update.
… Starting point is the Media Capabilities
specification.
… We've been looking at how to specify HDR format
support.
… Currently, you have color gamut and transfer
function, which is a great start but there's no way to target
commercial formats.
Timo: Proposal to adopt similar rules for HdrMetadataType
Timo: Proposal is to
add commercial formats, not in the enum but through a registry
instead
… Would love to see a way to develop this further
and make it a reality
… Define registry entries, the three existing ones,
plus Dolby Vision and any others that are needed
… Can look at how this could be
materialised
… Continue discussion in this group and bring it
over from the CG to this group
… One comment that came up:
… To have more documentation
available.
… Since last year we have published a lot of
documentation online about how Dolby Vision works
… and how to implement.
… [slide] Links to resources
… Hope these slides will be shared
later.
Slideset:
https://
Timo: Nothing new
compared to last year, just a recap as to what we discussed.
… This is just a recap.
… Comments, are we making omissions?
Chris: Is it sufficient to have a single Dolby Vision Metadata value, or is more granularity needed?
Timo: to my
understanding yes, but there is flexibility for new values if
needed in the Registry in the future.
… At the moment, it's just one.
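For reference, a small sketch of how an application already probes HDR support through the Media Capabilities API; "smpteSt2094-10" is one of the three existing HdrMetadataType values, and a registry-defined identifier for a commercial format would be queried the same way (the "dolbyVision" string in the comment is purely hypothetical; depending on the TypeScript lib.dom version, hdrMetadataType may need a cast):

  // Sketch: probing HDR decode support with the existing Media Capabilities API.
  async function probeHdrSupport(): Promise<void> {
    const info = await navigator.mediaCapabilities.decodingInfo({
      type: "media-source",
      video: {
        contentType: 'video/mp4; codecs="hev1.2.4.L153.B0"',  // example HEVC Main 10 string
        width: 3840,
        height: 2160,
        bitrate: 20_000_000,
        framerate: 60,
        colorGamut: "rec2020",
        transferFunction: "pq",
        // One of the current enum values; a registry entry for a commercial
        // format (e.g. a hypothetical "dolbyVision" identifier) would go here instead.
        hdrMetadataType: "smpteSt2094-10",
      },
    });
    console.log(info.supported, info.smooth, info.powerEfficient);
  }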
<Zakim> nigel, you wanted to ask about Registry definition ownership
Nigel: For a Registry, you need a WG to own the Registry Definition. Have you talked to any WG about this?
Timo: Not yet
Nigel: Also they could delegate management of the Registry table values to an IG, potentially.
Chris: Media WG would be
a good fit for owning this, if they agree they want to do it.
… Media WG has a large number of existing
Registries.
… Like for EME, MSE, WebCodecs, etc. Used to
managing Registries in that group.
… Would change the API Specification, from having
an enum value to other identifiers.
… There's complexity with enums, we switched to
strings.
… We could revisit the decision to have a Registry
if it's felt that just adding an enum value to the
… existing spec is the appropriate way to
go.
… The concern that we had when we discussed it last
year was variability between implementations
… that meant that the presence of the Registry
wouldn't necessarily mean all implementations
… would be required to include it. Is that
correct?
Timo: Not sure, there
was concern about adding a commercial enum value.
… From our point of view we would like to be able
to address the format correctly,
… don't mind how, we're happy with both
approaches.
Wolfgang: We're not
asking for any specific technology to be supported.
… Just an addition to the media capability API so
that the client code can ask what is supported.
… "No" is a perfectly acceptable answer.
Chris: Trying to recall why we concluded that a Registry would be worthwhile last time.
Francois: Might depend
on whether a normative definition of what it means is needed.
… In this case, if there's no public specification,
it means that you end up with a normative reference
… to a commercial specification, which you may not
want.
… The Registry provides an intermediate
layer.
AramZS: I noticed when
reviewing the spec, you should try to get it early through
PING.
… Adding a navigator object that returns
implementation support details is a fingerprinting
vector.
… Should get PING review early rather than
later.
Francois: Seems to be a
more generic question about the Media Capabilities API, or just on
this new value?
… The API already has an identified issue about
fingerprinting and mitigations.
… There's been interaction with PING on this
already.
… You may not like the solution!
Chris: Concerns about
what you mentioned did come out.
… To the extent of questioning the whole MSE and
EME approach,
… where the web app is doing things not just the
browser.
… Their argument is why not hand a URL to the
browser and let it handle all the details.
… We need to discuss this on Thursday - the
rationale for why the design is how it is.
AramZS: Yeah that's
pretty normal, it just occurred to me that a different approach
might be
… more satisfactory from a privacy
focus.
Chris: Yes, that's about
the API in general not just the individual value.
… Thank you, it's a very good point.
AramZS: Thanks
Timo: Thinking about
the Audiovisual Media Formats for Browsers CG, discussing next
steps, keep going or close it.
… Want to give members a chance to share their
opinions.
… At the moment we don't have too much
traction.
… Any thoughts, please let us know.
… Someone signed up only this morning, so there is
some momentum!
Chris: We created the CG
to focus on a specific scope, and as a place not to design
solutions,
… but to be more of an interest group contribution
structure.
… I'd like to fold the activity there into this
IG.
… The only question is your (Dolby) membership
status, to solve that problem.
… Then you'd get the support of Chairs here to help
move it forwards.
… What tends to happen is that the CG list is
public and can be discovered and joined
… but if there's no real activity happening there
then they end up with nothing to see or follow up on.
… Having a place where it's actively being worked
on is helpful.
… That's just my suggestion
… If people feel that a dedicated group is more
appropriate structure, happy to operate that way
… I suggest thinking about that over the
break.
Kaz: Another WG is also
thinking about Registries for various reasons.
… Would like you all to think about the whole
mechanism of the expected Registry and
… which parts would be handled by which Registry
and what our role is here.
… Please discuss during the meeting on
Thursday.
Chris: Happy to talk to the WoT WG about how they handle Registries.
Nigel: TTWG has Registries too, happy to share experiences.
Chris: Adjourn for now, next after the break is joint meeting with Timed Text WG.
TTWG / MEIG Joint Meeting
<kaz> Slides
Nigel: I chair TTWG,
with Gary Katsevman
… There are 3 topics in the agenda: Update on TTWG,
a look at DAPT, and discussion on Timed Text in MSE
… Anything else to cover?
(nothing)
Updates from TTWG
<kaz> IMSC HRM Rec
Nigel: We published the
IMSC HRM as Recommendation
… It used to be in IMSC, but we've updated it and
made a test suite, and met Rec requirements
… The biggest area of activity is DAPT, some activity
on WebVTT, and less on TTML2
… We recharter in April next year
… TTML 2nd Edition has been in CR since
2021
… Friday's TTWG meeting will assess what we
need to do to get to Rec
… IMSC has been Rec for a while. Three versions
simultaneously active
… We'll think about whether we do a new version of
IMSC
… A requirement that came up is the ability to do
superscript and subscript
… And removing the HRM, as it's in a separate
spec
… On WebVTT, it's been at CR since April
2019
… Apple will discuss interop issues on Friday, and
attribute block for metadata, and an inline block backdrop
element
… DAPT is a working draft. We aim to resolve to get
to CR on Friday
… Other things, we got feedback last year. It's
hard for beginners to the TTML ecosystem to get started. Is that a
shared issue for people here?
… We are happy to work with industry partners on
the best way to share information
… MDN documentation is user-focused, not
implementer-focused
… CCSUBS group has industry partners, sharing
implementation experience
Chris: On the TTML ecosystem, where is the feedback coming from?
Nigel: Someone in the
APA WG mentioned it. But it seems easy for people who're immersed
in it. Interested to hear views
… Also happy to take feedback outside the
meeting
Francois: Could depend
on the audience. IMSC documentation in MDN, seems a good entry
point for developers. If you're targeting standards groups, e.g.,
the W3C accessibility community, it's a different audience
… A document for them could be useful?
Nigel: The MDN doc is
useful for people making caption documents, less useful for
implementers building players
… Lots of implementations not in browsers, and
variety in the implementations
… People may want to use advanced features but not
know where to get started
DAPT
<kaz> Dubbing and Audio description Profiles of TTML2
Nigel: The purpose of
the spec, it's a profile of TTML2, transcripts of audio or video,
transform them into scripts
… Dubbing scripts, or audio description scripts,
with instructions for the player
… Can be used to play in the browser. I have JS
code to do that, using TextTrackCue and Web Audio
… Your first step creating dubbing scripts is to
capture the original language. Because it's based on TTML, easy to
transform to IMSC
… It's an improvement to the audience
experience.
Cyril: We've had
reports at Netflix where content isn't dubbed or described
consistently
… We've brought our TTAL spec to W3C for
standardisation. It's something we really care about
… I have colleagues who are German-speaking, with
dubbing and captions in German. If the entire sentence is
different, it's really annoying
Nigel: That happens
because when people localise the media, they send it to different
providers for dubbing and for captioning
… DAPT is good, as it allows people to do the
transcript and translation stage once, then go the dubbing and
captions separately - from a common source
Cyril: DAPT aims to fill the gap between the different vendors. Scripts weren't shareable between formats, Excel, scanned PDFs with notes. So a standard machine-readable format helps
Nigel: Current status,
it's a working draft. The spec is defined with a data model,
defines entities to represent the scripts, then how to represent
the data model in TTML2
… It helps guide us producing tests
… The data model is a script, with script events
(timed). Those contain text objects, potentially in different
languages. Then what that represents - dialog or non-dialog sounds,
something in-vision (text or objects)
… If text, is it a credit or a location
indicator?
… We may use a registry for the content descriptors
for what each thing represents
… Happy to answer questions. Anything
else?
Cyril: We tried to
write the spec in a way that you don't have to understand the
complexity of TTML
… Incrementally understanding the concepts.
Hopefully we did a good job, interested in feedback
… Also, a simple way to implement, which can be very
restricted to what you actually need, because DAPT is a strict
subset of TTML
… DAPT isn't about presentation, subtitles in a
position with colour. It's about the timing and the text
Nigel: But you can add the styling and position data if useful
Nigel: There's
significant interest in using this. People are using proprietary
standards. Some of the AD standards are old and hard to
extend
… So by doing this, and improving the feature set,
we want to make it easier for people creating localised
versions
… For the end user, for dubbed versions, we don't
intend to expose the ... directly. No expectation to deliver to the
browser for mass distribution
… For AD, could all be provider-side, but there's
an additional benefit of delivering to the browser
… The user can adjust the mix to their own
needs
… In the UK broadcast market, you can adjust the
mix levels. Clash between background sounds and dialog in the
description
… As well as changing the mix, you can expose to
assistive technology, e.g., a braille display. Watch video and get
the braille description simultaneously
Wolfgang: How is the use case of changing the audio levels being implemented today in web apps?
Nigel: I use the DAPT
description to drive gain nodes in Web Audio. It is feasible, but
not typically done in the web space. Similar use case to earlier
this morning
… If you wanted to create an NGA version, with
preselection, you could do it by processing a DAPT document. Happy
to work with you on that
Wolfgang: I'd be interested to try that
Nigel: The gap is that DAPT can contain the text and the audio components, but with NGA I don't think there's a way to put the text equivalent in. How would you sync that with NGA?
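As a rough illustration of the pattern Nigel describes (not his actual code), programme audio routed through a GainNode can be ducked for the duration of each audio description event taken from a pre-parsed DAPT document; for simplicity the times here are assumed to be on the AudioContext clock:

  // Sketch: ducking programme audio while an audio description clip plays.
  const ctx = new AudioContext();
  const video = document.querySelector("video")!;
  const programmeGain = ctx.createGain();
  ctx.createMediaElementSource(video).connect(programmeGain).connect(ctx.destination);

  // One timed event from a (hypothetical, already parsed) DAPT script.
  interface DescriptionEvent {
    begin: number;       // seconds, AudioContext time in this sketch
    end: number;
    clip: AudioBuffer;   // rendered or recorded description audio
    duckedGain: number;  // e.g. 0.3 to attenuate the programme mix
  }

  function scheduleDescription(ev: DescriptionEvent): void {
    const src = ctx.createBufferSource();
    src.buffer = ev.clip;
    src.connect(ctx.destination);
    src.start(ev.begin);
    // Duck the programme mix while the description is audible, then restore it.
    programmeGain.gain.setValueAtTime(ev.duckedGain, ev.begin);
    programmeGain.gain.setValueAtTime(1.0, ev.end);
  }

Mapping media-timeline positions onto the AudioContext clock, and exposing the chosen mix to assistive technology, are the parts a real implementation would still need to add.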
Timed text in MSE
<kaz> Media Source Extensions WD
Nigel: This topic has
come up several times over the years, and not gone anywhere
… If you're familiar with MSE, you know you
typically use them to fill a buffer pipeline of media
… It works well for audio and video, but not
specified for timed text
… When I've talked to my BBC colleagues who write
our players, they say it could simplify their
implementation
… Buffer management all done in the same way,
rather than a different mechanism for text
… Could create TextTrackCues for the data. It can
be difficult to manage the size of the TextTrackCue list, these
could grow and grow, so good to have a way to manage the
buffering
… Not suggesting doing anything different for
rendering
… Is this a shared requirement? Suggest adding Text
Tracks in MSE to Media WG?
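For context, a sketch of roughly what a player does today on the text path (parseCues is a placeholder): audio and video segments go through SourceBuffer.appendBuffer(), while timed text is fetched in parallel, turned into cues by the application, and pruned by the application, which is the buffer management the proposal would move into MSE:

  const video = document.querySelector("video")!;
  const track = video.addTextTrack("subtitles", "English", "en");
  track.mode = "showing";

  // Format-specific parsing (WebVTT, IMSC, ...) is assumed; parseCues is a placeholder.
  declare function parseCues(body: string): { start: number; end: number; text: string }[];

  async function appendTextSegment(url: string): Promise<void> {
    const response = await fetch(url);
    for (const { start, end, text } of parseCues(await response.text())) {
      track.addCue(new VTTCue(start, end, text));
    }
  }

  // App-managed eviction: drop cues well behind the playhead so the cue list
  // does not grow without bound - the part a unified MSE buffer model would handle.
  function pruneCues(keepBehindSeconds = 30): void {
    const cutoff = video.currentTime - keepBehindSeconds;
    for (const cue of Array.from(track.cues ?? [])) {
      if (cue.endTime < cutoff) track.removeCue(cue);
    }
  }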
Wolfgang: So today, subtitles are rendered by the web app?
Nigel: They could be,
or passed to the browser for rendering
… But in DASH or HLS, the pipeline management isn't
done through MSE, but in parallel by the app
Wolfgang: Playout in times?
Chris: The media element handles that, by triggering the cues at the right time
Cyril: The problem that we have to have different code paths for AV and text assets is really there
Wolfgang: So I understand it is not about playout in sync, but rather about buffer management.
Cyril: On rendering,
ideally it should be possible to download and buffer text assets
like audio and video and defer rendering to the browser, if it has
the capabilities, or the JS level
… Options for how you preserve styles. Apple want
to have OS styles applied to captions, whereas Netflix wants to
have control over the styles across the
Chris: Fetch as separate files or within the media container?
Nigel: That happens
now, you wrap the data in the MP4, that's how they're packaged
according to CMAF
… If you have the infrastructure for fetching MP4
segments, why do something different for timed text compared to the
audio and video?
Nikolaus: Question about the rendering
Cyril: It is a separable problem, but it never was separated. Feed data to the MSE, and it gets rendered. Here you need a callback to the app. How do you ensure a bad actor doesn't create many, many cues to overload the browser?
Nigel: Why would you want to do that?
Cyril: It's about the browsers need to be defensive
Chris: The requirement would be that there is some notification from MSE to the web app that a cue has been extracted from the media.
Chris: Currently we have enter and exit events, if the browser manages all this. Is this adequate and in time?
Cyril: it is too
late.
… would have to have the notification in
advance
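For reference, a minimal sketch of the existing notification path Chris refers to: per-cue enter/exit handlers and the track-level cuechange event, which fire when playback reaches the cue, i.e. at presentation time rather than in advance:

  const video = document.querySelector("video")!;
  const track = video.addTextTrack("subtitles", "English", "en");

  const cue = new VTTCue(10, 12, "example cue");
  cue.onenter = () => console.log("cue became active");   // fires at the cue start time
  cue.onexit = () => console.log("cue became inactive");  // fires at the cue end time
  track.addCue(cue);

  // Track-level notification when the set of active cues changes.
  track.oncuechange = () => {
    console.log("active cues:", track.activeCues?.length ?? 0);
  };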
Nigel: The Safari tech
preview, it puts each Cue into an HTML fragment, then the cue
rendering is obvious
… If you think about a video decoder pipeline, it's
not instantaneous either. It needs a few frames to do
something
… The processing load for putting text on screen is
unlikely to be higher than that
Chris: why does the web app not extract the cues from the media, and then purge out cues no longer relevant?
Chris: So this is about having a unified approach to the memory management
Nigel: Yes
Aram: Is there a reason why the Safari approach of HTML fragments isn't desirable?
Nigel: That moves the
presentation side to a better place than it is now
… Partly, it relates to the privacy issue we
discussed earlier. Pass the content to the browser. Modelled on
WebVTT at the moment
… But it's no good if you don't want to use WebVTT.
It provides a canonical format for WebVTT
… I have questions about exactly how it works, but
aims to be more flexible
Aram: It seems desirable from a developer perspective to have the HTML fragments in place. Developer movement to using HTML fragments, so there could be more people familiar now than even a year ago
Nigel: There are lots of formats people use in practice. We can try to avoid that mattering, than have a canonical way to pass that information through
Chris: Are there
implementations that do this with VTT today? Like creation of
cues and pulling them out of the media.
… DataCue is an example. There are implementations
that exist but are not specced.
Nigel: How to get to a conclusion? We could stop worrying about it, if it's not something we want to do. Or try to get the ball rolling
Cyril: Apple has this proposal, implemented. How did other browsers react? Was there a lack of interest?
Nigel: One proposal is
to manage the buffer pipeline for timed text through the MSE
buffer. I don't know of any implementations of that
… The other is decoding and passing to the web app,
as HTML fragments
Cyril: The second depends on the first.
Nigel: From previous
conversations, e.g., with Eric, his view was the way browsers
implement WebVTT is to already create an HTML fragment then expose
it to the shadow DOM
… A consequence of that is to move the WebVTT decoding
upstream, then it's easier to polyfill
… WebVTT spec has been in CR for 5 years, not
enough implementation experience to move ahead. But people aren't
implementing it
… This could be a path out of that. You could
democratise the format requirement, so not have a dependency on the
single format, but also make it easier to standardise the formats
people want
Nigel: Standardising the presentation part means it's easier to standardise what goes in to MSE
Cyril: If you have a
way to put text into MSE, without a way to surface that, the only
alternative is native implementation
… Browsers don't like that because of attack
surface area, complexity, etc
Chris: This is all widely
supported in other environments
… Allowing a plurality of formats overall seems a
good thing, personally
Nigel: It's a part of the web standards ecosystem that doesn't make sense to me. A lot of people view the WebVTT format as the web's format. But it's not a Rec
Chris: Is there interest to work on VTT interop?
Nigel: Apple want to talk about that on Friday in TTWG
Chris: Next steps for this?
Nigel: Text Tracks in MSE isn't on TTWG agenda. It would need Media WG support
Nigel: How to
summarise. There's some interest and benefit to pushing timed text
through MSE. Nobody said it's a bad idea. Possibly need to consider
together a canonical handover format on the output of the MSE
pipeline to the MSE processor
… And Apple have a prototype implementation based
on HTML fragments
Chris: We also discussed similar requirements around emsg events and timed metadata. But timed text seems a better motivating use case
Nigel: If you're
delivering packaged segments of timed text info, do you convert to
the canonical format on the way in to the MSE buffer or on the way
out
… I think I prefer on the way out
Chris: Need to get implementer input, we have significant streaming providers interested in using it, but the folks aren't in the room right now
Aram: Want to see improvement in this area, so it does seem worth doing
Chris: Suggest catching up with people during TPAC week
Nigel: Action should be to tell Media WG that we've discussed, and there are some views on how might be implemented
Media Breakouts
Igarashi: There are several media breakout sessions proposed.
<kaz2> Sync on Web, now and next of realtime media services on web Breakout
Komatsu (NTT Com): On Wednesday, we
have a session on Media Over QUIC. People interested in low latency
features
… Also synchronising media and data with the video
and how can be realised with MoQ
… I'll demonstrate some use cases. I think there's
a gap, so we'd like to discuss
<kaz2> Smart Cities Breakout
Kaz: Smart cities will be at 4pm
https://
<kaz2> HTTPS for Local Networks Breakout
Igarashi: HTTPS on the local network, I was co-chairing the CG, but others have suggested the topic. Media streaming on the home network
Song: Is that MoQ?
China Mobile is interested, we released a joint development at MWC
in Barcelona
… How can I find the details on the
breakout?
https://
<Igarashi>
https://
<Igarashi> Evolved Video Encoding with WebCodecs
<tidoust> Official breakout schedule
CTA WAVE Streaming Media Test Suite
<kaz> Slides
Louay: Sorry I couldn't
be there in person
… You may have heard about this already, we
presented to an MEIG teleconference
… I'll explain about CTA WAVE in
general
… Consumer Technology Association, Web Application
Video Ecosystem
… Web delivered media and commercial audio and
video on consumer electronics
… WAVE defines a set of specs, test suites, and
test tools
Louay: Listed here are
some of the most important specs, Device Playback Capabilities is
the focus today, others like CMCD, CMSD, Web Media API
… Test suites are available to be able to test
these specs and whether consumer devices are compliant
… Web Media API Snapshot test suite, but here I'll
focus on DPC
… Conformance test software, GCP is a conformance
tool to make sure DASH content is conformant to different
profiles
… Going into the DPC spec, it defines normative
requirements for content playback on devices
… CTA WAVE content is not only content and
programmes, which is basically a sequence of CMAF
content
… Media playback model, DRM protected media, WAVE
content playback requirements etc
Louay: With playback
requirements, there are multiple scenarios, where DPC makes sure
devices are compliant, switching sets, random access, sequential
track playback
… These are defined in the DPC spec and the Test
Content Format spec
… They define what steps are needed to run a
sequence of tests and pass/fail criteria
… W3C testing activities, mainly happens in the WPT
project. In the DPCTF test suite we're introducing additional
components to what is already in WPT
… Slide 5
… The main components, with GitHub links. Mezzanine
content: annotated A/V content to make the testing process
easier
… The main requirement here is to make the test
process automatic, no human support to run the tests, which would
be time consuming
… Detect if frames are being skipped on the
device
… Next are the test content repositories. Different
codecs and configurations
… These are used in the test conformance validator.
Joint project between DASH-IF, CTA WAVE, HbbTV
… You can run the tests on devices in different
environments. We focus on the web environment, MSE and
EME
… Slide 4 shows there are multiple tests. For each,
there's a test implemented
… The most important component is the observation
framework
… This is basically a camera that records A/V
content playing on the device
… After a test is done, the recording is
automatically analysed to understand if the test passes or
fails
… Slide 6
… A test video is playing, video includes
annotations, rotating QR codes on each frame. It includes info
about the current frame displayed
… It allows the OF to automatically
analyse
… The red arrows allow you to ensure the content is
properly displayed on the screen
… QR codes for audio and video sync
… Time codes are shown
… We have audio annotations, provided by Dolby and
Fraunhofer IIS, partners in the project
… Also available in different frame rates and
fractional frame rates. CMAF test content is generated from the
mezzanine content
… Slide 7
… CMAF media segments and fragments, with metadata
describing the segments
… DASH MPD, but you can use HLS as well, as they
reference the same content
… We have validators for DASH content
… The content will be encoded in different
combinations of content options. Special content for
debugging. Different resolutions for switching sets, and DRM
protected content
… There's a matrix between the tests and the
content. And a validation process that uses JCCP DASH
validator
… Test implementations are in HTML and JS
templates
… Slide 8
… The templates reference the specification section
numbers, clear definitions for how the tests should
work
… The templates allow a single test to be run with
multiple content options
… A test is a combination of HTML, JS, and CTA WAVE
content
… Using MSE and EME for encrypted tests. We have
common libraries used in different tests, MPD parsing
… Why not use existing players like dash.js or
Shaka Player? We're trying to have a minimal implementation,
independent of external players, the test instructions work at the
MSE level, buffering
… The players don't give the flexibility from a
testing perspective
… They include what is to be tested and nothing
else. Sequential playback, random access to fragments, buffer
underrun and recovery
… This is just a small snapshot of the
tests
… Slide 9
… The HTML and JS templates are similar to those in
the WPT repository, e.g., for WebRTC, Fetch API, etc
… Tests focus on the functionalities of specific
APIs. Similar here
… We focus on embedded devices, streaming
devices and smart TVs, but this also applies to any mobile or desktop
device
… The test runner takes the tests from the test
repository and runs them on a device
… It's built on top of WPT. We extended WPT to
support embedded devices, where you don't have much flexibility on
user inputs
… Also limited ability to install a test runner,
e.g., an HTTP server
… Our extension is contributed back to the WPT
project
… You can use these with existing W3C WPTs, not
only the tests developed by WAVE
… You have a Device under test, which just needs to
open a web page, scan a QR code
… You have a companion app where you can select the
tests to run
… You can do a full test run, or a selection of
tests
… You can filter the tests, e.g., a set for HbbTV
specifications
… Slide 10
… The last component is the observation framework.
One component of this is recording. You can use a smartphone to
record
… One requirement is the recording
quality
… Each frame needs to be detected, so the frame
rate needs to be at least 2x the frame rate on the TV
… With new smartphone models you can record at
120fps
… Most of the content is 50 or 60fps or a
fractional frame rate. In future we'll have 120fps content, so
you'd need to record at 240fps
… Record offline, then copy the video file to a
server and run the OF script on it
… This will analyse the content frame by frame,
using the QR codes
… It detects if frames are skipped and so
on
… With automatic observation, it's important to
have automatic generation of test reports
… This picture shows 2 additional QR codes. The one
on the right is generated by the test app, not in the video
content. It includes info about the status of the tests and the
video playback
… You can compare the frame timestamp with the time
recorded by the video element
… For the random access to segment tests, you can
check the right events are generated. All this info is embedded in
the QR codes
… The tests check that the media player integration in
the device is working.
… Slide 11
… To make it easy to use the test suite, we made a
landing page, with an explainer
… You can instal and run the tests
yourself
… Everything is open source, in GitHub
… You can raise issues in GitHub or provide
feedback
… I recommend the landing page as the starting
point, follow the instructions there
… [Shows demo video]
… Slide 12
… Cable connects the audio output from the
TV
… Now the test is playing on the TV, you can follow
the progress on the companion screen
… We have assertions that the video is playing
properly and no errors from the media element or MSE
… Now the video is ended. We have special QR codes
for the start and end frames, because in many cases there are
issues with playing the first or the last frame of the
content
… The next content is then loaded. A special QR
code is shown between the tests
… The OF knows where the test results are to be
submitted
… When all the tests are finished, you get a QR
code on the test screen, the OF detects the test is
ended
… These aren't the final results as you then need
to do the video analysis. The page is then updated
… You can download the results in HTML or JSON
format, same as WPT
… Slide 13
… Technical requirements for installing the test
suite. We prefer Linux, everything is containerised with
Docker
… If you're running the DRM tests you'll need TLS
server certificates
… And you need to record at 120fps
… Slide 14
… Test report is similar to WPT
… In this case every video frame assertion is
failed, because a frame was skipped
… Another failure is the playback duration, which
doesn't match the CMAF track duration
… Slide 15
… We use the HbbTV plugfest to do testing and
validation. There have been testing events in the last 2 years, the
next is in 2 weeks at Fraunhofer Fokus in Berlin
… We'll ensure the test suite is working across
devices
… We use this daily in our labs
… Slide 16
… How does it work with HbbTV? We use a setup with a
DVB modulator where you can signal the app to the TV, which is the
test suite landing page
… The device under test is an HbbTV
terminal
… But it could be any device with MSE and
EME
… Slide 17
… Test results from the plugfest last year. We run
the test suite on the latest TVs on the market
… Slide 18
… Any questions? You're welcome to get in
touch
Wolfgang: You use HbbTV
as a convenient way to start the tests on the TV
… Are you aware of others using DIAL or other ways
to launch?
Louay: You can use
DIAL, as the way to launch the tests
… In some situations we also have issues with
external networks at testing events, you sometimes discover the
terminal through DIAL, due to restrictions on the local
network
… There are 3 browser engines on the TV. The HbbTV
browser, the runtime for running smart TV apps, like Tizen or WebOS
- these are browsers with additional features
… We use hosted web apps. Your app is a small file
that points to the landing page and you can run the tests in the
same way
Chris: What is happening next?
Louay: Audio test
content from Dolby and Fraunhofer IIS. We're now going to validate
the tests
… In terms of the OF, the most wanted feature is to
do this in real time. Currently you record with the smartphone,
then upload, then analyse
… So while you're recording we'd start the analysis
in parallel. Maybe we can get it working in real time
… It'll mean you don't need to wait to get the
results
Kaz: wondering about NHK's interest in this mechanism for testing. Differences with Hybridcast, but we should be able to use this framework
Louay: HbbTV is one
platform, but we don't use the broadcast features or HbbTV specific
APIs. We only rely on HTML, JS, MSE
… and EME, for the DRM tests
Ohmata: IPTV Forum Japan is a
member. We tried to use the suite on Hybridcast TV. The limited
memory and CPU on TVs causes some to freeze
… We're trying now to use all the tests
Louay: Non-functional
requirements. If you test on older TVs, you may have issues with
memory management. If the test suite fails because of these
limitations, it's an indication that an open source player like
DASH.js or Shaka might also have issues
… It's helpful for us, create a GitHub issue and
we'll do our best to support you
<nigel> Ohmata: Thank you
Chris: Consistency of implementations or any W3C spec issues that we can help with for interop?
Louay: MSE is the most
important API. There are examples in the tests for using MSE. The
thing we see most is skipped frames at the beginning
… Was it really an issue with the test
implementation, or with the TV? It was a repeating failure on many
devices
… Also some with A/V synchronisation
… If you look at the validated tests now, those
pass on major TV sets
Chris: This is such an important contribution. Congratulations on the work you've done here
Web Media Snapshot Update
<kaz> Slides
<kaz> Web Media Snapshot 2024
<kaz> WMAS
John: One of the groups
in WAVE is the HTML5 API Task Force
… Some updates worth bringing to the group
here
… Since 2017 every year we've released a snapshot
document: the minimum set of standards for media playback across
the 4 major browsers
… Drives an understanding of what features are
available on TV devices at the time the snapshot is
published
… The work happens in WAVE and the W3C Web Media
API CG
… We jointly publish the spec, W3C and
CTA
… We publish every December. Starting last year
with WMAS 2023, we set the target date to November 1st
… Standards groups reference it, e.g., ATSC, so we
try to fit their timescales
… ATSC 3.0 they reference the 2021 snapshot, HbbTV
2.4 references 2021
… There is a WMAS test suite that Louay and team
have developed
… Launch through DIAL or a TV
modulator
… The primary tests are from WPT, also ECMAScript
and WebGL tests
… You can test the full set of features in
browsers for the TVs for each year
… 2024 snapshot updates. The expectations in the
next release: CSS to 2023 snapshot. This includes Media Queries
Level 4, which reduces the media types to print and screen. TV was
deprecated
… There's a WPT for the deprecation, and we tested
on devices, so it's supported and accepted
… It was added to the spec in 2011, so we were
comfortable making the change. If you're concerned let us know. Use
the media feature option instead.
… We updated to ECMAScript 2024, which has a couple of
exceptions based on what's implemented
… WHATWG are living standards, we reference a
recent review draft from WHATWG, and tell people to use this
version or later
… Also WebSockets, which is in its own WHATWG
spec
… Features anticipated to be included,
WebTransport, WebAssembly, Push API
… WT doesn't have full support yet
… You can identify which features are included,
downstream groups typically reference a year or two
earlier
… You can watch the GitHub or comment
Chris: What sources of information do you use to decide what to include?
John: Start with MDN and caniuse, validate with WPT, which we rely heavily on
Chris: Some inconsistencies in WPTs and coverage. Does it lead to interop issues?
John: Sometimes the
tests go in before the features land
… I'll check why the tests are failing, are they a
newer version of the spec? There's not an easy way to know, for a
given spec, what is the state of the spec?
… That's the majority of the work, determining
where the tests fall, e.g., the CSS tests getting
updated
MarkV: I always thought
that doing the work to find areas of incompatibility really points
out problems developers have to face
… It seems there's a missing process, to take these
exceptions to interop as a to-do list and drive solving
them
… I don't know if that's ever happened. Would that
make sense?
Francois: It is something we should do. There's an ongoing activity that's relevant, in the WebDX CG and the notion of Baseline, which appears in MDN and caniuse
<kaz> WebDX CG
<kaz> baseline-status
Francois: We have
fine-grained information, but doesn't give you high level
information. The web features project is about finding the middle
ground
… A mapping of features that make sense to web
developers, and an interoperability score, which can detect
anomalies
… This mapping exercise is ongoing, Google is
investing resources in the CG. By end of 2024 all features will have
been mapped
… It's ongoing as new features will be
added.
Chris: The browsers run annual Interop events, we could promote certain areas to be looked at?
Chris: What holds ATSC and HbbTV back from using more recent WMAS snapshots?
John: The specs could
end up in certain regulations, in certain countries. They want to
give manufacturers time to catch up. So being behind allows
manufacturers time to be up to date
… In practice I find most devices are up to
date
Chris: You seem to be doing much of the work. Do you need help?
John: Flagging tests
from updated specs. Future considerations: some suggestions from
HbbTV for WASM
… If you see we've missed something, we welcome
those updates, that would be helpful
Kaz: Given your
snapshot document includes WebRTC, WebSockets, you might be
interested in MoQ breakout on Wednesday
… WebCodecs and WebTransport
Nigel: Subtitles and
captions, the core requirements needed to implement a JS renderer
are met, but there's nothing in the snapshot that says browsers
should play back TTML or IMSC, as expected
… How much are browsers influenced by the snapshot?
Do they use as motivation for interop?
John: Haven't seen the
main browsers looking for those things. I have seen feedback from
WebKit and WPE. We hope the future considerations section will help
push things
… It's more on the CE side than the
browsers
Nigel: Successfully
passing tests or running 3rd party libraries, e.g., dash.js or
hls.js, imsc.js. If you had a stack that couldn't run that, it
would be a problem
… Have you looked at testing not just what's in the
browsers, but these 3rd-party libraries?
John: Not yet. Want a suitable way to test that accurately.
<kaz> DPCTF tests
Chris: Could do using the DPCTF tests and observation framework, to test end to end
Louay: Could extend the
OF to check subtitles are properly rendered in the right position
and right time
… Need to do it in an automatic way. For video it's
easier, so would need to come up with a solution for
subtitles
Originator Profile
<kaz> Originator Profile Web site
Michiko: The issue is:
is the info I'm being shown correct? Declaration from the
originator that the content is correct
… Show that information has not been
tampered with
… We use cryptographic techniques. Implementation
has begun this year
… Local governments are using this in 2024
… We have a breakout on Wednesday. We want to share
the challenges and discuss them. We start at 8:30
<JohnRiv> This appears
to be the event:
https://
<kaz> Breakout on Trust the Origin, Trust the Content - Originator Profile (Catalina 2)
Media Content Metadata Japanese CG update
<kaz> Slides
Endo: I'm from NHK,
grateful to participate, I was remote at last TPAC
… I'll introduce our recent activities on
interoperability of metadata for media content
… The mission of the CG is to improve metadata
interop and promote media content distribution across
industries
… Focuses on actual demand across
industry
<nigel> MCM-JP CG home page
Endo: The group is not
trying to create a new metadata specification
… We have 41 participants from 19 companies
… Slide 3
… The focus on interoperability of metadata related
to media content
… Two parts: between participants that belong to
the media industry. Then between media industry and non-media
content industry
… Each has its own organisations and
specifications
… We won't work on a single protocol. Explore
potential interop
… One goal is for various operators to use the case
studies to develop a variety of services that use media
content
… Slide 4
<nigel> MCM-JP GitHub page
Endo: We held 3 online
meetings
… We received case studies from various
industries
… Slide 5
… This case study is from the publishing industry.
Linking media content, e.g., an online dictionary
… Appropriate metadata for each
industry
… Slide 6
… TV programme metadata are used for measurement of
effectiveness of sales promotion
… It demonstrates the potential of increasing the
value of media content, across industry
… Co-creation of use cases is expected.
Unfortunately there isn't a widely used specification for TV
programmes
… Slide 7
… Some prototypes for validating interop between
publishing and broadcast
… NHK made two prototypes
… The first is where ePub content recommends a TV
program
… The second is where TV recommends ePub
content
… The prototypes were implemented based on existing
specs like ePub for eBooks (a W3C standard) and the Hybridcast spec
in Japan, with existing metadata
… Some use cases can already be realised by
combining knowledge from multiple industries
… We plan to increase the case studies and publish
a CG report next year
… We want to contribute to MEIG by sharing the
progress and results. Also contribute to W3C through the Publishing
BG and Smart Cities IG
Chris: Any indications of requirements coming from the use cases already considered?
Endo: Currently we
don't find any technical specification needs, existing
specifications are developed all over the world, and they work
well
… But the use cases cannot be realised with only one
specification currently. We found that multiple specifications and
knowledge can be needed; each specification's spread is limited to
its own industry
… So each operator in each industry should share
knowledge for the other industry members
… Common rules can be identified. The CG would like
to share data exchange examples, not technical specifications but
things necessary for actual usage of metadata
Kaz: I have attended
their group meetings. They have started to clarify current
practices and best practices, and pain points and potential
requirements for new spec work
… Next steps could include documenting those best
practices
<nigel> Chris: In your use case, as with the publishing, I'm used to something like Amazon X-Ray, where it can
<nigel> .. show you the related characters or other things to what you are watching.
<nigel> .. The Amazon ecosystem has a lot of metadata about things and objects.
<nigel> .. It makes me wonder if we need consistent identifiers for products,
<nigel> .. or is it okay just to point to different vendors.
<nigel> .. Some kind of service that can resolve an identifier and take you to an outlet.
Kaz: Could use DID in
addition to existing ISBN codes or similar
… Discussion should include what kinds of
identifiers to use
Nigel: Companies like GS1 do product identifiers
Kaz: the MCM-JP CG should organize a survey on IDs including GS1's work
Endo: Existing identifier
systems are provided, like schema.org or IMDb; I don't think this effort has
identified a technical lack there. From 20 years ago, there's a metadata
format, it's used in DVB-I
… It's a nice format, but other industries cannot
easily adopt these existing formats
… We're hopeful that in the near future
more common specs for other features can be developed
Kaz: Previous work, like TV Anytime, as Nigel just mentioned should be included in the document
Endo: Yes. I explain
these efforts in Europe to those in Japan, and we collect outputs
from them
… We plan to publish the CG report. It contains
existing ID systems and metadata. Case studies for existing specs,
IMDB, TV Anytime
Kaz: Another comment for this CG - even though it's a Japanese group, they'd like to collaborate with MEIG and others, so your input and comments are welcome
Endo: We plan to discuss business related case studies in the CG, and discuss technical issues and further specs in MEIG or other groups
Chris: Does the group meet at TPAC?
Endo: No, only online events
Chris: We look forward to the CG report and taking the work forward, when you're ready
<kaz> [ MEIG Meeting on Monday adjourned; continue discussion during our monthly calls ]