Meeting minutes
Slideset: https://
Intro and agenda
Chris: Welcome to the first MEIG meeting of the year.
… Following up on discussions in the Media WG about media capabilities.
… They've done some triage and prioritisation.
… I saw that there were a number of issues where the WG could use wider industry input, which is what today is about.
… There are some proposed features raised in issues that the WG has not prioritised.
… We can re-evaluate those, discuss whether they continue to be useful, and provide advice into the Media WG to help with that prioritisation.
Chris: Recap:
… The Media Capabilities API is a browser API.
… It provides info to the page about the browser's ability to decode and play various media formats.
… It also covers encoding and transmission, which applies more in a WebRTC context.
… This one API works in both a streaming media context and a WebRTC context.
… One interesting aspect of the design is that the Media Capabilities API is intended to focus on the decoding and encoding capabilities; the ability to render the decoded media is not really in scope, apart from a couple of exceptions.
… It's a design choice. e.g. things to do with the properties of the display are excluded.
… It works by providing a MIME type and, if it's HDR content, parameters to describe the use of HDR metadata and colour spaces.
… The information you get back indicates whether the format can be decoded and, if so, whether playback is expected to be smooth.
… In some implementations this depends on real-time feedback, based on previous experience that the browser might accumulate.
… A flag tells you whether playback is power efficient, which could be due to hardware acceleration.
… In some cases software decoding could be as efficient.
… I'm focusing on MSE decoding and playback for today, rather than encoding or WebRTC, but if you're interested in those we can talk about them.
… This is intended to get your feedback on these open questions.
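[For reference, a minimal sketch of the kind of decodingInfo() query described above; the codec string, resolution, and HDR parameters are illustrative values, not ones from the meeting:]

```ts
// Ask the browser whether it can decode an HDR HEVC stream via MSE, and
// whether playback is expected to be smooth and power efficient.
const info = await navigator.mediaCapabilities.decodingInfo({
  type: "media-source", // MSE playback, as opposed to "file" or "webrtc"
  video: {
    contentType: 'video/mp4; codecs="hvc1.2.4.L153.B0"', // illustrative codec string
    width: 3840,
    height: 2160,
    bitrate: 15_000_000, // bits per second
    framerate: 50,
    // HDR fields are from the draft spec and may not yet be in all typings:
    transferFunction: "pq",
    hdrMetadataType: "smpteSt2086",
    colorGamut: "rec2020",
  },
});
console.log(info.supported, info.smooth, info.powerEfficient);
```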
Chris: Implementation status. There are differences in how up to date the implementations are with the draft specification.
… Particular items for today:
… Text Track Capabilities
… Ability to decode multiple streams simultaneously.
… Transitions (e.g. for ad insertion)
… Rendering and Decoding capabilities.
… Open question: are there other requirements that should be prioritised beyond the ones I chose today?
… Any thoughts or initial questions?
no group questions
Text track capabilities
Chris: 3 GitHub issues from review.
… The Accessibility horizontal review highlighted that audio and video media are often accompanied by text tracks, either embedded or separate. They asked whether Media Capabilities' scope would include that.
… w3c/media-capabilities#157 is "text tracks not supported".
… Raised by Mike Dolan, who pointed to a general need: detection for TTML, IMSC, and WebVTT is out of scope.
… There's some discussion in the thread.
… The other issue, #99, is more specific.
… w3c/media-capabilities#99 is about SEI messages in the video stream carrying 608 or 708 captions.
… As I understand it, of the major browser engines, only Safari has support for embedded timed text media.
… Chrome and Firefox don't do that.
Nigel: Can you explain embedded timed text media?
Chris: I was avoiding using the term "in-band"
Nigel: Is it something in the manifest, or something multiplexed with the video itself? Or an HTML track element?
Chris: I see the HTML <track> element as a separate thing. Text track cues can be programmatically added to a track object.
… This depends on how you're providing the media.
… For example, if you were to give an HTML <video> element a DASH or HLS manifest and it can play whatever is in that manifest, then maybe there's a case for knowing whether it can play it.
… For example, if it can play TTML subtitles delivered as separate files.
… You'd need to know if the UA can handle and render those.
… I was also thinking about MSE-based playback, where the browser doesn't know about manifests; at that point the player that's running in the browser would be fetching the timed text content and then using the Text Track API to render (see the sketch below).
… Then there's the case where the multiplexed video includes other content, e.g. SEI messages, or VTT or TTML multiplexed in, e.g. into an MPEG-2 TS.
… I assume MP4 and CMAF have similar capabilities.
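[For reference, a minimal sketch of the player-driven approach Chris describes, where the player fetches timed text itself and renders via the Text Track API; parseSubtitles() is a hypothetical helper standing in for the player's own TTML/IMSC or WebVTT parser:]

```ts
// Hypothetical parser: real players ship their own for this step.
declare function parseSubtitles(
  doc: string
): { start: number; end: number; text: string }[];

// Under MSE the browser never sees a manifest, so the player fetches
// the timed text resource and adds cues programmatically.
const video = document.querySelector("video")!;
const track = video.addTextTrack("subtitles", "English", "en");
track.mode = "showing";

const res = await fetch("subs/en.ttml"); // illustrative URL
for (const cue of parseSubtitles(await res.text())) {
  track.addCue(new VTTCue(cue.start, cue.end, cue.text));
}
```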
Nigel: MP4 is generally predicated on having separated components, but that's not essential.
… CMAF has a hard requirement that each media type is in a separate MP4 sample.
… So the CMAF profile says you must provide IMSC and may have WebVTT, and each is in a separate resource whose URL can be derived from the manifest.
… So although CMAF might have provision for the embedding, I think it's not encouraged.
Chris: That's interesting, because that suggests to me that this multiplexing and embedding use case is maybe not one that needs to be prioritised.
Nigel: Interested in others' views, but thinking about the BBC's approach I'd agree. But I suspect there are others with different needs; please speak up.
Chris: Certainly very interested to hear from anyone else who has a view on this.
Nigel: Some of the US-based folks may be interested in 608/708, possibly translated into TTML, so separate resources might be how things are expected to go.
Chris: You're saying timed text content would be in a separate resource, fetched by the player, and then rendered either in the browser or in custom script via the Text Track API?
Nigel: I think so, yes
Chris: In that sense, I'm not hearing a strong requirement that Media Capabilities needs to do anything at all with respect to text track cues or formats, because it would all be handled through script.
… For those browsers that have VTT support, knowing which user agent you're on, e.g. a Safari browser, would give enough information, rather than needing to extend this API.
Nigel: There's something here we shouldn't skip past. The architectural model for presenting captions and subtitles is that the page doesn't know about the user's choice,
… either the choice to fetch and render the captions, or any customisation options.
… If you're using MSE, and you're fetching caption resources, you already have script that knows it's fetching timed text.
… But with presentation, and the additional layering in of customisation for user preferences, if you do it all in script you have to have your own model for the user preferences.
… These may be different from what's set in the browser. And those can't be queried in the web context.
… Or you need to pass the content to the browser, and the browser applies the preferences.
… In principle Safari allows IMSC to be used directly, so you can have IMSC in an HLS manifest, and Apple's documentation says it will play back.
… If you're doing that, the browser controls the rendering and appearance. What could be useful is for the web app to know if the UA can handle that presentation,
… or if you need to do that yourself, and if you do, whether you need to handle personalisation, i.e. not use the user's browser-level preferences.
Kaz: I would agree with Nigel's view, and I think there are two points here:
… how to deal with the capability for timed text handling itself,
… and how to deal with embedded data via the API.
… For both purposes, thinking about more use cases would be helpful. For example:
… When I watch Netflix at home I prefer English captions, but my wife prefers Japanese captions.
… It would be useful for us if we could see both at the same time.
… We ourselves don't care how the timed text is provided, embedded or separate.
… Do we really need embedded captions or not? I personally am okay with separate files for each language, and allowing the browser to iterate over all of them.
… If someone needs the embedded style, that should be clarified as a concrete use case.
Nigel: Style? Most formats support styling.
Kaz: I mean in-band in the MP4 rather than separate TTML. If we really need that option we should clarify the use case.
… Using CMAF is fine, but...
… I mean where the caption information lives, rather than the appearance of the captions themselves.
… Data structure, carriage, etc.
Chris: This is a conversation we've had a few times, this idea of the application of user preferences, and the privacy issue and so on.
… I'm wondering to what extent that has an impact on the Media Capabilities API.
… In what Nigel was saying there was a suggestion that that in itself could be something that's useful to query.
JohnRiv: Along those lines, it's been a while since I discussed it at work, but in the past we talked about the case where the user sets preferences on one device and wants to carry them through to another device.
… It might be interesting to be able to query a binary "does the user have presentation preferences or not".
Nigel: I'd be interested to know how you'd use that information
JohnRiv: To decide whether to use the browser-based preferences or the app-level preferences. I'm not sure.
Nigel: I'm concerned that the model doesn't really work. Every page you visit collects data about usage, e.g. for doing A/B tests.
… But the web mechanism here directly pushes against that. It says the needs of the user in terms of privacy are more important than the needs of the user for an improved product.
… That makes me uncomfortable. A mechanism for using that data would be helpful. But by not providing any data for improving players and products, it acts against the community whose privacy we're protecting.
… So I kind of want both.
… This may be away from the core question, i.e. querying the UA for the ability to play timed text.
… As a player developer, if you're in a browser that says it will take your subtitle format and present it, that would really help some implementers.
… It also helps users, as there's a unified world for how things work in every page for timed text support.
Chris: What do we do next with this?
… I've seen various proposals, particularly from Apple, on how to provide support for custom formats.
… When they say that, they mean anything that is not VTT.
… I'm conscious of turning this into something we can propose, or a next step...
Nigel: I think the idea of having a canonical transfer format for text tracks that need to be rendered, probably based on HTML and DOM elements, is a good one.
… It separates out the decoding of your format. But in that case the query is simply: which types of TextTrackCue do you implement, only a VTTCue or an HTML Cue (or whatever we call it)?
Chris: Makes sense to me
Nigel: Then the follow-on is: why do we need a separate API? Because you could test for the HTML Cue constructor, so there'd be nothing to do in the MC API.
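[For reference, a minimal sketch of the constructor feature detection Nigel describes; "HTMLCue" is a hypothetical name for the proposed cue type, not a shipped interface:]

```ts
// Feature-detect which TextTrackCue subclasses the UA implements.
const hasVTTCue = typeof VTTCue === "function";

// "HTMLCue" is hypothetical: whatever name a canonical HTML-based cue
// type ends up with, the same constructor test would apply.
const hasHTMLCue = typeof (globalThis as any).HTMLCue === "function";

if (hasVTTCue && !hasHTMLCue) {
  // Fall back to converting the subtitle format to VTT cues in script.
}
```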
Chris: Right, there's a detection mechanism already in a more relevant place.
… I came to this from the idea that embedded support might be needed,
… but if that's less critical, then for these issues that have been raised, like SEI events with 608 and 708:
… if the browser is not going to do anything with those captions (no browser does), then there's no need to query a capability for them.
… The expectation is you present them as a separate resource.
… Ultimately the Media Capabilities API is about providing choices for which encoding of media to deliver.
… I think we have come to a reasonable place on this, unless there's more input from anybody.
Transitions
Chris: There's been some discussion on this, but we haven't specified anything in the WG yet.
… In MSE v2 there's the SourceBuffer changeType() method, which allows you to tell MSE that the format of the media you're passing in is changing.
… This allows you to do codec or media container transitions within a source buffer (see the sketch at the end of this section).
… The use case that's driving the need for that API is advertising content that may be encoded differently or have different encryption from the primary content.
… There may be implementations that support different kinds of transitions between the two, depending on the decoding mechanism: whether it can be seamless, requires reinitialisation, etc.
… There are issues in the GitHub repo discussing how this might look in terms of an API shape.
… We don't need to go through all the proposals,
… but one is a transition method where you query for one set of capabilities and then make another call to say: given that state, can you transition to this other one?
… Another proposal is: given two encodings, can the transition be supported? It answers to say supported, smooth, power efficient, etc., as applicable.
… A third is to use the API as it stands, with an additional flag reported in the information you get back, so it's not just supported and power efficient, but also what codec switching is supported.
… If both encodings say they support transitions, that tells the web app that smooth transitions are supported.
… There are these different proposals; the one shown now is the current one.
… I think there's a pull request, but it hasn't been merged yet or strongly prioritised.
… The question here is whether this is something that's important and needs to be solved.
… Are there other ways of doing this that mean it's not a high priority for websites or players?
… I'm looking for input and feedback from anybody here as to the immediate need for this, and whether the WG should prioritise specifying it.
… If you have thoughts about the shape of the API, that's important.
… The issue number is #102 for the ergonomics. If you don't want to say anything in this meeting, that's okay; you're welcome to leave comments in the GitHub issue as well.
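[For reference, a minimal sketch of the MSE changeType() transition described above; the codec strings, segment variables, and switch point are illustrative:]

```ts
// Hypothetical pre-fetched segments for the main content and the ad.
declare const mainContentSegment: ArrayBuffer;
declare const adInitSegment: ArrayBuffer;

// Switch a SourceBuffer to differently encoded ad content without
// tearing down the media pipeline.
const video = document.querySelector("video")!;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener("sourceopen", () => {
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.640028"');
  sb.appendBuffer(mainContentSegment); // primary content

  sb.addEventListener("updateend", () => {
    // At the ad boundary: declare the new format, then append an
    // initialisation segment for it, followed by its media segments.
    sb.changeType('video/webm; codecs="vp9"'); // illustrative target format
    sb.appendBuffer(adInitSegment);
  }, { once: true });
});
```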
Decoding and Rendering
Chris: The Media Capabilities API is really about querying the browser's ability to decode.
… We have a separate set of feature proposals that focus on the display capabilities for the rendering side.
… CSS Media Queries let you query for dynamic range and colour gamut support.
… We've proposed additions to these: video- prefixed features that allow you to query the video plane on the TV device, which might be different to the graphics plane.
… These may have separate capabilities (on each plane).
… The idea is to find out if the TV supports 4K and HDR even if the graphics plane is only sRGB, say.
… There's a similar question around the height and width of the video plane, to query for its resolution.
… We have a proposal in the linked issues, which has a lot of conversation.
… We ended up exposing a pixel ratio as a numeric value, but not through a media query, which is yes/no only; a number is more useful here than a series of yes/no queries for each resolution.
… This needs to be taken back to the CSS WG to progress it. It was last discussed a few years ago.
… The question here is whether these are useful features for us to progress.
… They're not really in the Media WG, they're CSS WG, so we shouldn't necessarily look to the Media WG to push them.
… This is where the IG can have a role and make recommendations to the CSS WG to move them forwards.
… Anything you can do to help raise the priority, e.g. commenting on the issue or commenting here, would help.
… The other concern is that as these CSS features progress through the Rec track, they will need to show implementations.
… It would be helpful for us to point to devices (TVs) that include these features,
… otherwise there's a risk of the features being removed because of a lack of uptake.
… We're out of time; any more thoughts from members now?
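[For reference, a minimal sketch of probing these from script; (dynamic-range) and (color-gamut) are shipped media features for the graphics plane, while the video- prefixed forms are the proposed additions discussed above and are not widely implemented (an unrecognised feature simply evaluates to false):]

```ts
// Shipped queries: capabilities of the graphics plane.
const graphicsHdr = window.matchMedia("(dynamic-range: high)").matches;
const graphicsWideGamut = window.matchMedia("(color-gamut: rec2020)").matches;

// Proposed video- prefixed query for the separate video plane on TVs;
// false on browsers without support.
const videoHdr = window.matchMedia("(video-dynamic-range: high)").matches;

console.log({ graphicsHdr, graphicsWideGamut, videoHdr });
```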
Kaz: Do you want to discuss the remaining 3 slides as well?
… Maybe you can ask people for feedback by email?
… Or talk about them again during the next call.