W3C

- DRAFT -

M&E IG Monthly Call

4 Jun 2019

See also: IRC log

Attendees

Present
Nigel_Megitt, Francois_Daoust, Scott_Low, Tatsuya_Igarashi, Barbara_Hochgesang, Francois_Beaufort, Mounir_Lamouri, Greg_Freedman, John_Luther, Larry_Zhao, Mark_Vickers, Xu_Song, mg_chen, wushan, Rob_Smith, Jer_Noble, Chris_Needham
Regrets
Chair
Chris
Scribe
nigel, cpn, tidoust


<kaz> Recorded video from this call

<nigel> scribenick: nigel

Chris: Thank you everyone, welcome to M&E IG call
... Francois will tell us about the Media Session API.
... Introduction to the API? What is the purpose of the API? Then we can discuss specific details.

Mounir: I am present too, and can take over the Media Session API part.

<cpn> scribenick: cpn

Media Session API

<tidoust> Media Session Standard

Mounir: The initial goal of the Media Session API was to produce a UI that most mobile operating systems have, when you play media.
... We wanted that to be available for mobile web browsers.
... The way it works, the web page can specify metadata regarding what is currently playing, which we call the media session.
... The API assumes there is only one media session for a given web page.
... If the web page plays multiple pieces of content at the same time, this is an edge case, we'd have to figure out how to express that through the Media Session API.
... The metadata in the Media Session API allows you to expose the artist name, title, album name, then a list of media images.
... We define a MediaImage dictionary, with source, and sizes and type, a classic set that you get from any type of image on the web.
... Web App Manifest has the same, icons in HTML also have the same thing, we use the image type to choose between image formats, e.g., if some browsers don't support SVG,
... and use the size to pick the best size depending on where the browser wants to show the image.
... So the first part of the Media Session API is to allow websites to expose metadata, so we can have a richer UI.
... The second part of the Media Session API is around actions. The API allows the website to expose what type of actions it supports and how to handle them.
... Basic actions are play and pause. The API suggests the UA should have a default behaviour for these.
... For example, in Chrome these would play and pause the active media element.
... If there is more than one, it would be all of them at the same time.
... And then richer actions like seek backward, seek forward, previous track, next track.
... We recently added skip-ad, which we may remove (or not). In Chrome we're still deciding whether we'll end up using it. We had an experiment; it depends on whether there are websites that are interested.
... We also added stop, a fairly common UI action, to pause notifications entirely.
... Seek back and previous track etc are much harder for a UA to implement by default. Previous and next track, for obvious reasons.
... Seek backward and forward is more about compatibility, we see a lot of compatibility issues when we tried it.
... I believe Safari used to handle seeking from the Touch Bar, but we haven't tried to do something similar, so unless a website decides to help with that, we will not.
... The time of seeking isn't only dependent on the website. We only tell the website that seeking has been asked, it can seek as many seconds as they specified.
... A website can set an action handler with a callback.
... When the action is specified, i.e., the handler is not null, the UA should notify the website, and have a UI if possible to expose these actions. And the website can later set the callback to null.
... We also recently added position state, which allows the website to tell the UA how long the media session is, the duration, the playback rate, and the current position.
... This allows the UA to try to guess where we are in the media playback, and we can then draw a timeline and allow the user to seek,
... which would require some new actions being discussed in GitHub right now, to have a seek action that is not seek forward/backward, but seek to a specific time.
... [...] To allow users to do whatever they want with the timeline.
... This is a bit harder to do on mobile. I think it was there in the Android Q Beta, on tablets you can have a UI where you can use the mouse to more easily drag the timeline.
... What we're looking at in the future, there's a Media Session action for seeking to an arbitrary time, for skip-ad there's a discussion around skip as a generic action, or skip ad, skip episode, or skip intro, actions like that for Netflix or YouTube.
... And then we have some use cases around WebRTC, should we have actions for hanging up or muting.
... Maybe we should have a mute action, or even volume actions. These are not top priorities now for the Media Session API.
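
As a rough illustration of the page-side flow described above (metadata, action handlers, position state), a minimal TypeScript sketch might look like the following. It is not taken from the discussion or from any shipping player. The interfaces `TrackDescription`, `PlaybackPosition`, `SessionLike`, and `PlayerLike`, the function `wireMediaSession`, and the 10-second seek step are all hypothetical names and values chosen for the example. The session and the metadata constructor are passed in as parameters so that the sketch does not depend on a particular browser.

```typescript
// Structural stand-ins for the browser objects, so the sketch has no DOM dependency.
// All type and function names in this block are hypothetical.
interface TrackDescription {
  title: string;
  artist: string;
  album: string;
  artwork: { src: string; sizes: string; type: string }[];
}

interface PlaybackPosition {
  duration: number;
  playbackRate: number;
  position: number;
}

interface SessionLike<M> {
  metadata: M | null;
  setActionHandler(action: string, handler: (() => void) | null): void;
  setPositionState(state: PlaybackPosition): void;
}

interface PlayerLike {
  currentTime: number;
  readonly duration: number;
  readonly playbackRate: number;
  play(): unknown;
  pause(): void;
}

// Assumption: the page, not the UA, decides how far one seek step goes.
const SEEK_STEP_SECONDS = 10;

function wireMediaSession<M>(
  session: SessionLike<M>,
  makeMetadata: (track: TrackDescription) => M,
  player: PlayerLike,
  track: TrackDescription
): void {
  // Part one: expose metadata so the UA can show a richer UI.
  session.metadata = makeMetadata(track);

  // Tell the UA where playback is, but only once the duration is known.
  function publishPosition(): void {
    if (isFinite(player.duration)) {
      session.setPositionState({
        duration: player.duration,
        playbackRate: player.playbackRate,
        position: player.currentTime,
      });
    }
  }

  // Part two: declare which actions the page supports by registering handlers.
  session.setActionHandler("play", () => { player.play(); });
  session.setActionHandler("pause", () => { player.pause(); });
  session.setActionHandler("seekbackward", () => {
    player.currentTime = Math.max(0, player.currentTime - SEEK_STEP_SECONDS);
    publishPosition();
  });
  session.setActionHandler("seekforward", () => {
    player.currentTime = Math.min(player.duration, player.currentTime + SEEK_STEP_SECONDS);
    publishPosition();
  });

  publishPosition();
}
```

In a browser that implements the API, this would presumably be called with `navigator.mediaSession` as the session, a factory such as `(track) => new MediaMetadata(track)`, and a media element as the player. Passing `null` as a handler later withdraws the corresponding action.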

<tidoust> scribenick: tidoust

Chris: You mentioned this is to make use of the native controls on mobile devices. Have you given any thought to how this might work on a big screen type implementation, e.g. TV?

<cpn> scribenick: cpn

Mounir: We haven't done a lot of work around that. The Media Session API in Chrome is exposed via an API in Android, also named Media Session. That is used on Android TV. If you have an Android phone connected to the Android TV, the Media Session API shows actions on the TV, or even Android Auto.
... Most of the use cases for TV have been around cast, this is not using MediaSession (Chrome specifically).

Scott: We recently upstreamed a change to add support on Windows desktop. There's integration with system transport controls on Windows.

Jer: Absolutely, this is something we're interested in, as Apple has a big screen interface. I'm hopeful this kind of API would allow a remote to drive a web based presentation, as a media remote control API.

Chris: This is something that would be interesting to us as well.

Mounir: I agree, it would be interesting. We looked at it when implementing Remote Playback API on Chrome for Android.
... Cast already exposes actions, we could add more actions, but the cost was fairly high.
... We'd need a bigger opportunity to make the cost worth it. Same with Presentation API, we have all the actions we need.

Chris: What would happen if you have multiple browser tabs that each want to activate Media Session, how is it mediated between multiple tabs?

Mounir: Media Session is different from Audio Focus. Each page has a media session, but it doesn't mean that it's playing, and it can be playing at the same time as another tab.
... We have some ideas around the Audio Focus API to make audio focus on the web a little bit clearer.
... Today, audio focus is mostly browser specific. In short, most browsers on mobile have audio focus, and don't do anything on desktop platforms.
... In ChromeOS we started audio focus for some specific cases, but MediaSession doesn't care about that. If you have audio focus, the actions will be sent to your page, but not if you don't have audio focus, so play and pause actions will not be sent to you if another tab is playing. That's the only interaction between Audio Focus and Media Session.
... Scott mentioned Media Session on Windows support, using system transport controls.
... A reason why we're working on mobile first is that, we have media session handling on all platforms in Chrome, system controls on Windows, on macOS we use native APIs, same on Linux.
... On ChromeOS we are working to show a notification using the Media Session.

Francois: What's the relationship between some of the Media Session actions, such as play, pause, seek, and the media remote controls being defined in KeyboardEvents, for typical TV remote control buttons?

Mounir: We don't look at that for Media Session, we use platform-specific APIs. On macOS there are APIs for media actions at the system level, which is what we are listening to. It means that how that action shows up doesn't matter, it can be a keyboard, a Bluetooth headset, or anything that triggers that action.

<Zakim> tidoust, you wanted to wonder about relationship between some media session actions and media remote controls defined (KeyboardEvent)

Jer: At the beginning of the Media Session API discussion, we considered the use of keyboard events instead of something more explicit, like actions, and came up against the problem of how a web page indicates to the UA which keyboard events it supports.
... With just a keyboard event handler, there's no way to know whether to handle play, pause, toggle, seek forward, seek backward. How does the UA know what UI to elevate based on what the page can do? That was a significant shortcoming to using keyboard events for media controls.
... The use case we were considering was not necessarily general purpose remote controls for TVs. It was originally about the UI that's elevated in mobile devices.
... Specifically, for Apple, touch pad and touch controls. IIRC that's why we didn't choose to go in the direction of keyboard events.

<Barbara_Intel> Timeline - Working Draft and Candidate Recommendation?

Mounir: The main blocker today is having a second implementation. AFAIK there's no other implementation of Media Session.
... Otherwise, the API has been progressing recently, as we've been looking at the desktop use cases.
... Picture in Picture adds some use cases for Media Session, but for about a year we haven't touched the API at all. It's stable, so we can make a cut of a v1 we're happy with, and send that to the Rec track, and work on a v2 if we have any features to add to it.

<tidoust> scribenick: tidoust

Chris: Coming back to the first scenario you described, which is presentation of metadata around the content. Does the spec give any guideline with regards to the length of strings that should be supported through the native rendering of the system?
... How would I know what those constraints are?

<cpn> scribenick: cpn

Mounir: We haven't done that. We don't have that information. In the case of Android there are so many different phone sizes, strings could be shown on a TV, in a car, on a watch.
... In general, web developers should expect it to be shown on a small screen, and should be ready to handle long strings using ellipses, or scrolling. Devices handle that fairly well, so we didn't see a need to expose the length.
... It would be hard for us to figure it out, and that data could be used for fingerprinting.

<Zakim> nigel, you wanted to ask if accessibility related media actions are considered in scope

<nigel> MediaAction

<nigel> My question is if MediaAction has in scope not just play/pause/seek etc but also accessibility actions like subtitles/captions On/off?

Chris: Nigel's having audio issues, let's come back to that later.

Jer: The length for strings isn't something we want to expose, it's a non-feature, because the place where that data will be presented will change dynamically based on what the user's doing.
... We don't want the web developer to have to respond to display length change notifications for them to update the string. It will be up to the UA to figure out how to display, in the variety of places the description will show up.
... For accessibility, we haven't discussed in the development of Media Session. But, platforms have a large number of accessibility features around media playback, and one of the benefits of the Media Session API in general is to make those same accessibility features available to web pages that do custom presentations. The Media Session API in itself is an accessibility feature for platforms that already support accessibility for media playback.

<nigel> Thanks Jer, that didn't really answer my question!

Mounir: We haven't really thought of toggling captions, haven't had feature requests about that. If others would be willing to implement it, I don't see why not. It sounds like it could make sense.
... We have accessibility features at the OS level, e.g., on Android you can have captions on by default, may want to show captions at different times.
... The other issue I see here is that the Media Session use case is to expose the media being played, outside of where you see the media, e.g., controlling the media from the lock screen, from your watch, from your TV. So captions may not make much sense; showing captions on a watch may not be that useful.
... But there might be some use case around that. If we get user or developer feedback on this, I see no reason why not.

Jer: The web has a general problem with accessibility and captions right now, where many websites choose to roll their own caption behaviour rather than use the APIs provided by the web in general to do native captions.
... We are very interested in making that work better. For those websites that choose to present their own captions non-natively, having an action to turn on captions would probably be a big benefit. I don't know that Apple has any specific UI around enabling or disabling captions that would benefit from this. The original intent of this API was to let mobile presentations work for complex web applications.

<nigel> +1 to there being a big benefit

Jer: This wasn't a use case we considered. If there are implementers in the TV space that have a caption on/off button, we should consider that.

<nigel> There are system requirements in some countries to have for example a "subtitles" button on remote controls, I see this as analogous.

Jer: If you're just coming across this API, we'd love to get feedback on what features seem to be missing. If we're not considering all the use cases that you are, please go to GitHub and add issues.

<nigel> Should do for Audio Description too.

Jer: A question for Mounir, Francois, Scott. What adoption are you seeing from websites, now that this feature is available in Edge and in Chrome?

Mounir: Fairly big, YouTube, Spotify using it. We have metrics, YouTube is high. There's good adoption. Big one missing is probably Netflix.
... It's an easy API to use, so if you want to try, I'd be very happy. You can really improve the user experience, especially on mobile.

Picture in Picture API

<nigel> Picture-in-Picture Draft Community Group Report

FrancoisB: Picture in Picture is a feature that allows users to watch videos in a floating resizeable window that is always on top of other windows, so they can keep an eye on what they're watching while working on other things.
... As background, in 2016, Safari added Picture in Picture support through a WebKit API in macOS Sierra.
... A year after that, we shared our intent to standardise and build a web API that would be feature compatible with Safari.
... Last year, we launched Picture in Picture for the HTML video element on Chrome for desktop.
... It takes the video layer and moves it to a window that follows rules specific to Picture in Picture, such as always on top, fixed aspect ratio, etc.
... From there we iterated to not only a regular video element, but also video captured from a camera, or a stream from a canvas element.
... Then we added Media Session support. If you set previous/next track action handlers in the Media Session API, it would add controls to the Picture in Picture window.
... This has been used by Spotify recently, where they have a mini pop-out player that allows the user to control music playback in a Picture in Picture window.
... We're now iterating on this version of Picture in Picture, as we want to allow any HTML element to be in a Picture in Picture window. We call that Picture in Picture v2, or Picture in Picture for arbitrary content.
... Looking at the GitHub issues, it's now full of v2 issues.
... We're focused on v2 now. v1 was used by Spotify, not used yet by YouTube, which is one of the reasons we're thinking of arbitrary content.
... Some other big players are experimenting now with Picture in Picture v1.
... Integration with Media Session helps build some compelling use cases.
... Are we allowing arbitrary content in scope for Media WG, or focus on v1?
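
The Media Session integration mentioned above (previous/next track handlers surfacing as controls in the Picture in Picture window) can be sketched from the page's side roughly as follows. This is only an illustration under stated assumptions: `TrackActionsLike` and `wireTrackControls` are hypothetical names, the playlist is reduced to a track count and an index, and the sketch withdraws a handler by setting it to `null` when there is no previous or next track, on the assumption that the UA then hides the matching control.

```typescript
// Hypothetical structural stand-in for the part of the media session used here.
interface TrackActionsLike {
  setActionHandler(
    action: "previoustrack" | "nexttrack",
    handler: (() => void) | null
  ): void;
}

// Registers previous/next handlers for a playlist of `trackCount` items and
// keeps them in sync with the current index. `onChange` is called with the new
// index so the page can load the matching track.
function wireTrackControls(
  session: TrackActionsLike,
  trackCount: number,
  onChange: (index: number) => void
): void {
  let index = 0;

  function goTo(next: number): void {
    index = next;
    onChange(index);
    refresh();
  }

  function refresh(): void {
    // A null handler means "this action is not supported right now".
    session.setActionHandler(
      "previoustrack",
      index > 0 ? () => goTo(index - 1) : null
    );
    session.setActionHandler(
      "nexttrack",
      index < trackCount - 1 ? () => goTo(index + 1) : null
    );
  }

  refresh();
}
```

In a browser, the session argument would presumably be `navigator.mediaSession`, and the window itself would be opened separately with `video.requestPictureInPicture()` in response to a user gesture.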

Mounir: I think the whole API is part of the Media WG, no need to separate arbitrary content. Depending on consensus, we can have a v1 go to Rec and work on the v2. If we don't have implementations, it would have to wait.
... Picture in Picture v1 is mostly trying to be feature compatible with the Safari API.
... It offers a very simple API for websites, but a lot of websites we worked with felt the API was not powerful enough.
... For example, if you have EME content. Many websites wanted to show timed overlays on top of the video. With the API as it is today, it's not interactive. You can overlay with a canvas, but that doesn't work with EME.
... We had a partner that could not use Picture in Picture because of that. We have partners that require more actions.
... [?] are experimenting with Picture in Picture, but it would be much more exciting if they could do whatever they want in the Picture in Picture UI.
... It's a balance between what web developers want and what's best for users.
... We believe at this stage we have a good implementation of the Picture in Picture API for the video element, and we are looking at Picture in Picture API for the DOM. It's similar to what happened with Full Screen, it started with video and got extended.

Chris: I sent some feedback on this. One of the use cases we have is the ability to customise the controls that are visible in the Picture in Picture window for video. Our video player has a certain look and feel, we provide our own play and pause controls and we'd want to give the same visual styling to the Picture in Picture window, as we do in the regular video element. Is this something you've heard from other users?

Mounir: That's the no. 1 feedback we've got, branding. People want to keep the look and feel associated with their brand. For play and pause buttons, we could offer ways to customise slightly, different colour, but that would be limited. If a website decided the play button should be a square, we would not offer this.
... Instead of going down the rabbit hole of offering customisation options, we felt it would be better to let you do whatever you want as a website. That way we have a simple API. If, as a website you don't care about the look and feel, you can use this. For media controls with a video element, you can put this in a Picture in Picture API for the video element.
... Another one that's fairly common is subtitles. We have an issue, a non-issue in Chrome is we don't show the native subtitles. But a lot of websites [...]
... We could tell them to use the native subtitles, but they're not really going to change everything for Picture in Picture. If they want to show subtitles, they need a way.
... Today, they could use a canvas tag to draw subtitles on top, but that's a lot of work. Subtitles are usually injected on top of the video, and to make those drawn with WebGL is a significant amount of work.
... I know only of one website that draws subtitles with WebGL today.

Jer: I think subtitles is an orthogonal problem, for the reasons I mentioned earlier.
... Drawing subtitles yourself through WebGL or as native DOM elements has accessibility problems.
... Any work we do to enable subtitles should be to fix holistically, in my opinion. We should not try to take one-off actions to make the currently bad presentation of subtitles work in more places. We should address those feature issues that keep websites from using the native subtitling support.

<nigel> Where should subtitles be shown, regardless of native/non-native, when video is in a PIP window?

Jer: There was added to the Media WG, as a possible incubation target, a new subtitle API that would hopefully allow more native presentation of subtitles for those websites that are using non-WebVTT subtitle formats, with all the custom styling rules they feel they need to present the subtitles on top of the content.

<MarkVickers> +1 to Media WG taking on subtitles rework.

Jer: For me, the subtitle issue should get fixed on the web platform level, so that it does work for Picture in Picture v1, for those user agents that can present subtitles inside a Picture in Picture window.

Nigel: Interesting point. If you have a small video presentation window, where should the subtitles go? There could be an accessibility problem, regardless of whether it's native or non-native presentation.
... In Picture in Picture, where on the screen should they go, or in a separate window?

<fbeaufort_> Quick question: Jer, will Safari implement PiP V2 (arbitrary content)?

Jer: One of the problems with the v2 approach is that it constrains the solution space so that every website has to come up with their own implementation specific presentation of subtitles. The UA can't do anything to help in that case.
... Netflix want very specifically to control the presentation of subtitles regardless of the user's current platform choices.
... One of the benefits of having native subtitle experience is that users who have varying ability to see the subtitles can make those decisions on a platform basis, and we as a UA can provide a separate window for the subtitles, or different renderings, different fonts, colours, outlines, backgrounds, etc, without that having to be exposed by the website itself, and having every website have to make those decisions and custom presentations based on user choices.
... This is one reason I'm pushing back on subtitles being a driving feature for PiP v2, because it leaves out the existing users of the Picture in Picture window that's currently window, and it constrains the UA's ability to make good decisions for the presentation of subtitles.
... I hadn't considered the case of having a second PiP window for subtitles, but I can see perhaps resizing the PiP window to give a separate space along the bottom or top for having a highly visible set of subtitles.

Mounir: To clarify, subtitles isn't a driving use case for v2, but it's one of the many use cases we've heard.
... It's not a big one, as we can usually tell them that there's a solution for them.
... It's a death by a thousand cuts that websites are facing. With Picture in Picture as it is today, there's so many things they cannot do, that they'd rather not spend the time doing it.

Jer: We've seen the same thing, where Safari's own prefixed Picture in Picture API is not heavily used outside of a number of websites, probably for the same reasons. We let people get into Picture in Picture without requiring the website to adopt it specifically. We see that the feature is popular, and would like more to use it. I don't know that we'd ever be able to give a fully interactive HTML content PiP window, given the constraints of our own UI, and our own UI guidelines for Mac and iOS more generally.

Nigel: Regarding native vs non-native playback with the interface design, it would also be awkward for the BBC in the same way as for Netflix.

<Barbara_Intel> Separate topic: Media Capture Stream and Depth API. How will you work with that media API? Will it become part of the media working group?

Nigel: What we'd like to see from a requirements perspective is something that allows us to factor that decision out and take it elsewhere. You mentioned an API approach that would be useful. If the answer was that subtitles only worked in PiP if you use native playback, that would be a broken architecture. So I'd like to un-mix those things.

Jer: Why would it be that subtitles would only work in native playback?

Nigel: I'm responding to the suggestion that fixing subtitles and captions by using native is the best way, and for the moment that's not clear-cut. Something like PiP, it would be useful to have a model where you can say where subtitles should go or what should happen, or what the user expectation should be by default. Or even a place you can put them, at an API level, whether it's DOM or a specific subtitle API.

Jer: What would prevent you from using native subtitles? What feature is missing?

Nigel: That's a broader topic, but let's make that discussion orthogonal to PiP.

Jer: That discussion will be interesting from a Media WG perspective, as we are looking at a spec that should allow more customised presentation of subtitles, that we're hoping will address some of what you said for why the current state of subtitles on the web isn't sufficient for web pages.
... Let's have that as a separate discussion, but it will become relevant very soon.

Nigel: I agree with the summary that there's a problem with how subtitles and captions work on the web, not sure everybody agrees on the solution, happy to keep working on it.

Barbara: Media capture is also evolving, I'm trying to get a strategic direction from the Media WG, will you be looking at Media Capture?

Mounir: I don't think that Media Capture is in the Charter for the Media WG. But Picture in Picture and Media Capture work well together. You should be able to send a video from Media Capture to Picture in Picture.

Barbara: So these are two distinct APIs, going down their own path to implementation, draft and release candidates?

Mounir: Yes

Jer: Picture in Picture, Media Capabilities, and Media Session APIs all interface with the media element that's described elsewhere, in HTML. We want to make these APIs that we're incubating and shipping work with the media element. We also want to make them work correctly with Media Capture that's defined elsewhere.
... Just because it's in another WG, we shouldn't forego the possibility of making specific features to make capture work with Picture in Picture or Media Session or Audio Focus API.

Barbara: I'd suggest producing an architectural view of the Media Working Group and how you work with different APIs, it would help understanding who's doing what.

Chris: This could relate to the W3C's website, the Overview of Media Technologies for the Web, could be a good place to capture it.

Barbara: Potentially, they're all interrelated.

Chris: How does Picture in Picture work on mobile or TV devices?

FrancoisB: We haven't shipped on Chrome Android, we're thinking about it. For TV, I've contacted the LG OS folks regarding Picture in Picture.

Mounir: On mobile, we have a feature called [?] which sends a video picture in picture automatically. You watch full screen and hide the page by pressing the home button or similar. It's a fairly popular feature. We're starting to work on the Picture in Picture API implementation on Android.

Chris: Any more questions on Picture in Picture before we move on?

Media Working Group

Chris: As Mounir and Jer are both here, as co-chairs of the Media WG, what are your plans for starting the group, and initial focuses?

<tidoust> Media WG home page

<tidoust> [Please note (and use!) the "Join this group" link on the Media WG home page!]

<tidoust> Media WG deliverables (normative and potentially normative)

Mounir: The main focuses are Picture in Picture and Media Session. For Media Session, there's work to do getting it to Rec, for Picture in Picture there's the v2 part.
... Media Capabilities is a big one, there's a lot of work to do, we get new use cases almost every day.
... Work around AutoPlay, there's what used to be called Video Playback Quality that was part of MSE, we get requests to make that compatible.
... Obviously, MSE and EME, adding new features based on browser and developer feedback. The main one is around Workers, and transitions.
... For EME there is a handful of small changes that we added.
... As Jer mentioned, there's things around TextTracks and DataCue.

Jer: What are your priorities among those?

Mounir: I think those are the priorities, it's hard to order them.
... Looking at the Charter, we have normative specs and potential normative specs, those are the P1s and P2s for the WG.
... One priority we have is getting things that have been done in WICG for a couple of years into the Rec track.
... Media Capabilities isn't quite ready, as we get a lot of feature requests, then taking care of MSE and EME. Those are very mature, some small changes.
... And then there are new features, in the potential normative specs, lower priority as we have lots to do already.
... Maybe Francois and Jer have different visions?

Jer: EME has particular importance, as it has been neglected for so long for process reasons, has lots of little bug fixes that can be taken care of quickly, then have the entire EME discussion put to bed for a while.
... MSE has new features that are interesting from a web page point of view. A lot of the pain points around MSE adoption have been raised over the last couple of years, and new features are being proposed to address them. That's one of the highest priorities from my point of view.
... Next is to bring all the WICG work that's been done to a Rec track, specifically the Picture in Picture and Media Session APIs.

<fbeaufort_> I assume it was you ;)

Chris: When are you planning a first F2F meeting, conference calls, or online discussion?

Jer: Mounir and I haven't discussed that yet. I haven't joined the WG yet, it's stuck in the process at Apple. Our first F2F meeting is scheduled at TPAC, mid-September this year.
... In my opinion, I think we should have yearly or semi-yearly F2F.
... GitHub issues for change proposals and feature requests have been working so far, no pressing need to move away from that.
... Mounir and I haven't discussed WG process yet.

Mounir: I agree we should have one or two F2F per year.
... GitHub is fine where issues are for minor changes or aren't controversial, can be resolved with GitHub fairly quickly.
... As soon as there is controversy, it gets harder to resolve discussions on GitHub, or it can take a long time, that's where regular F2F meetings can help.
... I'm not a huge fan of [?] meetings with a lot of people, depends how we can schedule them. I would rather have F2F meetings.

Jer: Regular phone calls can happen more frequently and don't require travel. What I found with the Audio WG is that regular phone calls did manage to iterate on problems relatively quickly.
... Mounir and I will have to work out what process works best for the wide variety of specs we're dealing with. Nothing has been settled. Please let us know what you'd prefer as well. It can be a bigger discussion than just between the WG chairs.

Chris: This is something to follow up on. And separately, the relationship between the Interest Group and the Working Group and how the IG can bring use cases and requirements and new candidate topics for the WG.

Mark: We should discuss how the IG and the WG can best work together. I'm sure there'll be a lot of contribution from the IG to the GitHub repos, and participation in calls and F2F.
... Anytime the WG would want to outsource some requirements gathering to the IG, happy to do that. We have contact with a large number of media companies and media experts. We could do requirements gathering on any particular topic.

Chris: I agree. And also we have open liaisons with other media organisations, so if you need input from ATSC or HbbTV, that's something we can help with.

Wrap up

Chris: We've covered all the agenda items. Anything else?

[Nothing]

Chris: Thank you everyone for joining, especially Mounir, Francois, and Jer for giving an update on Picture in Picture and Media Session. These are interesting and exciting new APIs that you're bringing to the web, to improve the user experience of media on the web.
... I'd like to invite you back to cover some of the other topics for the Working Group, we'll follow up separately.

<tidoust> scribenick: tidoust

Next call

Chris: Next call will be on Tuesday July 2nd, topic to be decided.

[Adjourned]

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.152 (CVS log)
$Date: 2019/07/09 06:21:03 $