W3C

– DRAFT –
MEIG meeting

06 February 2024

Attendees

Present
Chris_Needham, Francois_Daoust, Hisayuki_Ohmata, Javier_Arellano, John_Riviello, Kaz_Ashimura, Kinji_Matsumura, Louay_Bassbouss, Nigel_Megitt, Ray_Holton, Ryo_Yasuoka, Tatsuya_Igarashi
Regrets
-
Chair
Chris_Needham, Tatsuya_Igarashi
Scribe
cpn, kaz, nigel

Meeting minutes

Slideset: https://www.w3.org/2011/webtv/wiki/images/4/46/2024-02-06-W3C-MEIG-Meeting-Media-Capabilities.pdf

[Slide 1]

Intro and agenda

Chris: Welcome to the first MEIG of the year.
… Following up on discussions in the Media WG about media capabilities.
… They've done some triage and prioritisation.
… I saw that there were a number of issues where the WG could use wider industry input,
… which is what today is about.
… There are some proposed features raised in issues that the WG has not prioritised.
… We can re-evaluate those and discuss if they continue to be useful, and provide advice into the Media WG
… to help with that prioritisation.

[Slide 2]

[Slide 3]

Chris: Recap:
… Media Capabilities API is a browser API
… Provides info to the page about the browser's ability to decode and play various media formats.
… Also for encoding and transmission, applies more in a WebRTC context.
… This one API works in both a streaming media context and also in a WebRTC context.
… One interesting aspect of the design is that the Media Capabilities API is intended to focus on
… the decoding and encoding capabilities, and the ability to render the decoded media
… is not really in scope except for a couple of exceptions.
… It's a design choice.
… e.g. things to do with the properties of the display is excluded.
… It works by providing a MIME type and, if it's HDR content, parameters describing the use of HDR metadata
… and colour spaces.
… The information you get back indicates if the format can be decoded, and if so, if playback is expected to be smooth.
… This is in some implementations dependent on real time feedback based on previous experience
… that the browser might accumulate.
… A flag tells you if playback is power efficient, which could be due to hardware acceleration.
… In some cases software decoding could be as efficient.
… I'm focusing on MSE decoding and playback for today, rather than encoding or WebRTC,
… but if you're interested in those we can talk about them.
… This is intended to get your feedback on these open questions.
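
A minimal sketch of the kind of decodingInfo() query described above. The codec string, resolution, and HDR parameters are illustrative only, not values from the meeting:

  // Ask whether the browser can decode 4K HDR HEVC for MSE-based playback
  async function checkDecodeSupport(): Promise<void> {
    const info = await navigator.mediaCapabilities.decodingInfo({
      type: 'media-source',  // streaming/MSE context; 'file' and 'webrtc' are also defined
      video: {
        contentType: 'video/mp4; codecs="hvc1.2.4.L153.B0"',  // illustrative codec string
        width: 3840,
        height: 2160,
        bitrate: 25000000,
        framerate: 50,
        hdrMetadataType: 'smpteSt2086',
        colorGamut: 'rec2020',
        transferFunction: 'pq'
      }
    });
    // Three booleans come back: can it be decoded, is playback expected to be
    // smooth, and is decoding power efficient (e.g. hardware accelerated)
    console.log(info.supported, info.smooth, info.powerEfficient);
  }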

[Slide 4]

Chris: Implementation status. There are differences in how up to date the implementations are
… with the draft specification.
… Particular items for today:
… Text Track Capabilities
… Ability to decode multiple streams simultaneously.
… Transitions (e.g. for ad insertion)
… Rendering and Decoding capabilities.
… Open question: are there other requirements that should be prioritised beyond the ones I chose today?
… Any thoughts or initial questions?

no group questions

Text track capabilities

[Slide 5]

Chris: 3 GitHub issues from review.
… Accessibility HR highlighted that audio and video media are often accompanied by text tracks,
… either embedded or separate. They asked if media capabilities' scope would include that.
… w3c/media-capabilities#157 is "text tracks not supported"
… Raised by Mike Dolan, who pointed to a general need: detection for TTML, IMSC, and WebVTT is currently out of scope.
… There's some discussion in the thread.
… The other issue, 99 is more specific.
… w3c/media-capabilities#99 is about SEI messages in the video stream carrying 608 or 708 captions
… As I understand it, of the major browser engines, only Safari has support for embedded timed text media.
… Chrome and Firefox don't do that.

Nigel: Can you explain embedded timed text media?

Chris: I was avoiding using the term "in-band"

Nigel: Is it something in the manifest, or something multiplexed with the video itself? Or an HTML track element?

Chris: I see the HTML <track> element as a separate thing. Text Track Cues can be programmatically
… added to a track object.
… This depends on how you're providing the media.
… For example, if you were to give an HTML <video> element a DASH or HLS manifest and it can
… play whatever is in that manifest then maybe there's a case for knowing whether it can play it.
… For example if it can play TTML subtitles delivered as separate files.
… You'd need to know if the UA can handle and render those.
… I was also thinking about MSE-based playback, where the browser doesn't know about manifests,
… at that point the player that's running in the browser would be fetching the timed text content
… and then using the Text Track API to render.
… Then there's the case where the multiplexed video includes other content, e.g. SEI messages,
… VTT or TTML multiplexed in, e.g. to an MPEG2 TS.
… I assume MP4 and CMAF have similar capabilities.
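
As a rough illustration of the MSE-style flow described above, where the player fetches timed text itself and renders it through the existing Text Track API (the track metadata and cue below are made up):

  const video = document.querySelector('video')!;
  // Create a script-controlled track; the player would convert its fetched
  // TTML/IMSC or WebVTT document into cues and append them
  const track = video.addTextTrack('subtitles', 'English', 'en');
  track.mode = 'showing';
  track.addCue(new VTTCue(0, 4, 'Hello, world'));  // illustrative cue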

Nigel: MP4 is predicated generally on having separated components, but that's not essential
… CMAF has a hard requirement that each media type is in a separate MP4 sample
… So the CMAF profile says you must provide IMSC and may have WebVTT, and each is in a separate resource whose URL can be derived from the manifest
… So although CMAF might have provision for the embedding, I think it's not encouraged

Chris: That's interesting because that suggests to me that this multiplexing and embedding use case
… is maybe not one that needs to be prioritised.

Nigel: Interested in others' views, but thinking about BBC's approach I'd agree. But I suspect there are others with different needs, please speak up

Chris: Certainly very interested to know from anyone else who has a view on this.

Nigel: Some of the US based folks may be interested in 608/708, possibly translated into TTML, so separate resources might be how things are expected to go

Chris: You're saying timed text content would be in a separate resource, fetched by the player and
… then rendered either in the browser or in custom script via the Text Track API?

Nigel: I think so, yes

Chris: In that sense, I'm not hearing a strong requirement that media capabilities needs to do anything at all
… with respect to text track cues or formats, because it would all be handled through script.
… For those browsers that have VTT support you can know from the user agent that you're on, e.g.
… a Safari browser would give enough information, rather than needing to extend this API.

Nigel: There's something here we shouldn't skip past. The architectural model for presenting captions and subtitles is that the page doesn't know about the user's choice
… Either the choice to fetch and render the captions, or any customisation options
… If you're using MSE, and you're fetching caption resources, you already have script that knows it's fetching timed text
… But with presentation, and the additional layering in of customisation for user preferences, if you do it all in script, you have to have your own model for the user preferences
… These may be different from what's set in the browser. And those can't be queried in the web context
… Or you need to pass the content to the browser and the browser applies the preferences
… In principle Safari allows for IMSC to be used directly, so you can have IMSC in an HLS manifest, and Apple's documentation says it will play back
… If you're doing that, the browser controls the rendering and appearance. What could be useful is for the web app to know if the UA can handle that presentation
… Or if you need to do that yourself, and if you do, do you need to handle personalisation, so not use the user's browser level preferences

Kaz: I would agree with Nigel's view and I think there are two points here.
… how to deal with the capability for timed text handling itself,
… and how to deal with embedded data by the API.
… For both purposes, thinking about more use cases would be helpful. For example:
… When I watch Netflix at home I prefer English captions but my wife prefers Japanese captions.
… It would be useful for us if we could see both at the same time.
… We ourselves don't care how the timed text is provided, embedded or separate.
… Do we really need embedded captions or not? I personally am okay with separate files for each language
… and allowing the browser to iterate all of them.
… If someone needs embedded style that should be clarified as a concrete use case.

Nigel: Style? Most formats support styling.

Kaz: I mean in-band in the MP4 rather than separate TTML. If we really need that option we should clarify the use case.
… Using CMAF is fine, but...
… I mean where to have the caption information, rather than the appearance of the captions themselves.
… Data structure, carriage etc.

Chris: This is a conversation we've had a few times, this idea of the application of user preferences
… and the privacy issue and so on.
… I'm wondering to what extent that has an impact on the Media Capabilities API.
… In what Nigel was saying there was a suggestion that that in itself could be something that's useful
… to query.

JohnRiv: Along those lines, it's been a while since I discussed it at work, but in the past we talked about
… if the user sets preferences in one device, and wants to carry them through to another device, that's
… a use case.
… It might be interesting to be able to query a binary "does the user have presentation preferences or not"

Nigel: I'd be interested to know how you'd use that information

JohnRiv: To say, use the browser based preferences or the app-level preferences. I'm not sure

Nigel: I'm concerned that the model doesn't really work. Every page you visit collects data about usage, e.g., for doing A/B tests
… But the web mechanism here directly pushes against that. It says the needs of the user in terms of privacy are more important than the needs of the user for improved product
… That makes me uncomfortable. A mechanism for using that would be helpful. But by not providing any data for improving players and products, it acts against the community whose privacy we're protecting
… So I kind of want both
… This may be away from the core question, i.e., querying the UA for ability to play timed text
… As a player developer, if you're in a browser that says it will take your subtitle format and present it, that would really help some implementers
… Also helps users as there's a unified world for how things work in every page for timed text support

Chris: What do we do next with this?
… I've seen various proposals, particularly from Apple, on how to provide support for custom formats.
… When they say that they mean anything that is not VTT.
… I'm conscious that turning this into something we can propose, or next step...

Nigel: I think the idea of having a canonical transfer format for text tracks that need to be rendered, probably based on HTML and DOM elements, is a good one
… It separates the decoding of your format. But in that case the query is simply: which types of TextTrackCue do you implement, only a VTTCue or an HTML Cue (or whatever we call it)?

Chris: Makes sense to me

Nigel: Then, the follow on is, why need a separate API? Because you could test for the HTML Cue constructor, so there'd be nothing to do in MC API
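
A sketch of the kind of feature detection suggested here: VTTCue exists today, while "HTMLCue" below is purely a placeholder name for the proposed HTML/DOM-based cue type, which is not specified anywhere yet:

  // Does this browser expose a VTT cue constructor?
  const hasVTTCue = 'VTTCue' in globalThis;
  // Placeholder check for a hypothetical HTML/DOM-based cue constructor;
  // no such interface is currently specified
  const hasHTMLCue = 'HTMLCue' in globalThis;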

Chris: Right, there's a detection mechanism already in a more relevant place
… I came to this from the idea that embedded support might be needed,
… but if that's less critical then these issues that have been raised like SEI events with 608 and 708,
… if the browser is not going to do anything with those captions, i.e. no browser handles them,
… then there's no need to query a capability for them.
… The expectation is you present them as a separate resource.
… Ultimately the Media Capabilities API is about providing choices for which encoding of media to deliver.
… I think we have come to a reasonable place on this, unless there's more input from anybody.

Transitions

[Slide 10]

Chris: There's been some discussion on this, but we haven't specified anything in the WG yet.
… In MSE v2 there's the SourceBuffer changeType() method, which allows you to tell MSE that
… the format of the media you're passing in is changing.
… This allows you to do codec or media container transitions within a source buffer.
… The use case that's driving the need for that API is advertising content that may be encoded differently
… or have different encryption from the primary content.
… There may be implementations that support different kinds of transitions between each,
… depending on the decoding mechanism, if it can be seamless or requires reinitialisation etc.
… There are issues in the GitHub repo to talk about how this might look in terms of an API shape.
… We don't need to go through all the proposals,
… but one is a transition method where you query for one set of capabilities and then
… make another call to say given that state, can you transition to this other one?
… Another proposal is, given two encodings, can the transition be supported,
… and it answers to say supported, smooth, power efficient etc as applicable.
… A third is to use the API as it stands with an additional flag reported in the information you get back
… so it's not just supported and power efficient, but also whether codec switching is supported.
… If both encodings say they support transition then that tells the web app that smooth transitions are supported.
… There are these different proposals. The one we're seeing now is the current one.
… I think there's a pull request but it hasn't been merged yet or strongly prioritised.
… The question here is whether this is something that is important and needs to be solved.
… Are there other ways of doing this that mean it's not a high priority for websites or players?
… I'm looking for input and feedback from anybody here as to the immediate need for this and
… whether the WG should prioritise specifying this.
… If you have thoughts about the shape of the API that's important.
… The issue numbers are #102 for the ergonomics. If you don't want to say anything in this meeting
… that's okay, you're welcome to leave comments in the GitHub issue as well.
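
A minimal sketch of the SourceBuffer changeType() transition mentioned at the start of this discussion. The codec strings are illustrative, and the media segments are assumed to have been fetched already:

  // Wait for a SourceBuffer append to complete before continuing
  function appendAsync(sb: SourceBuffer, data: BufferSource): Promise<void> {
    return new Promise((resolve, reject) => {
      sb.addEventListener('updateend', () => resolve(), { once: true });
      sb.addEventListener('error', () => reject(new Error('append failed')), { once: true });
      sb.appendBuffer(data);
    });
  }

  // Splice differently encoded ad content into the same SourceBuffer
  async function spliceAd(sb: SourceBuffer, mainSegment: BufferSource, adSegment: BufferSource): Promise<void> {
    await appendAsync(sb, mainSegment);                    // main content, e.g. H.264
    sb.changeType('video/mp4; codecs="hvc1.1.6.L93.B0"');  // declare the new format (HEVC here)
    await appendAsync(sb, adSegment);                      // ad content in the new format
  }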

[Slide 13]

Decoding and Rendering

[Slide 15]

Chris: The Media Capabilities API is really about querying the browser's ability to decode.
… We have a separate set of feature proposals that focus on the display capabilities for the rendering side.
… CSS Media Queries let you query for dynamic range and color gamut support.
… We've proposed additions to these: video-prefixed features that allow you to query the video plane
… on a TV device, which might be different from the graphics plane.
… These may have separate capabilities (on each plane).
… The idea is to find out if the TV supports 4K and HDR even if the graphics plane is only sRGB, say.
… Similar question around the height and width of the video plane, to query for its resolution.
… We have a proposal in the linked issues, which has a lot of conversation.
… We ended up proposing to expose a pixel ratio as a numeric value rather than through a media query, which only gives
… a yes/no answer; a number is more useful here than a series of yes/no queries for each resolution.
… This needs to be taken back to the CSS WG to progress it. Last discussed a few years ago.
… The question here is whether these are useful features for us to progress.
… They're not really in the Media WG's scope, they're CSS WG features, so we shouldn't necessarily look to the Media WG to push them.
… This is where the IG can have a role and make recommendations into CSS WG to move them forwards.
… Anything you can do to help raise the priority, e.g. commenting on the issue, or commenting here would help.
… The other concern is that as CSS progresses through the Rec track they will need to see implementations
… of these features. It would be helpful for us to point to devices that include these features (TVs),
… otherwise there's a risk of the features being removed
… because of a lack of uptake.
… We're out of time, any more thoughts from members now?
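
A small sketch of how a page might check these rendering features from script. The dynamic-range and color-gamut media queries are standard and implemented; the video-prefixed variant for a separate video plane is still a proposal and may simply not match on current browsers:

  // Graphics plane capabilities (widely implemented media queries)
  const graphicsHdr = window.matchMedia('(dynamic-range: high)').matches;
  const graphicsWideGamut = window.matchMedia('(color-gamut: p3)').matches;
  // Proposed video-plane variant; an unrecognised feature simply never matches
  const videoPlaneHdr = window.matchMedia('(video-dynamic-range: high)').matches;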

Kaz: Do you want to discuss the remaining 3 slides as well?
… Maybe you can ask people for feedback by email?
… Or talk about them again during the next call?

Minutes manually created (not a transcript), formatted by scribe.perl version 196 (Thu Oct 27 17:06:44 2022 UTC).