See also: IRC log
<kaz> Recorded video from this call
<nigel> scribenick: nigel
Chris: Thank you everyone, welcome to the
M&E IG call
... Francois will tell us about the Media Session API.
... Introduction to the API? What is the purpose of the API? Then
we can discuss specific details.
Mounir: I am present too, and can take over the Media Session API part.
<cpn> scribenick: cpn
<tidoust> Media Session Standard
Mounir: The initial goal of the Media
Session API was to produce a UI that most mobile operating systems
have, when you play media.
... We wanted that to be available for mobile web browsers.
... The way it works, the web page can specify metadata regarding
what is currently playing, which we call the media session.
... The API assumes there is only one media session for a given web
page.
... If the web page plays multiple pieces of content at the same time,
that's an edge case; we'd have to figure out how to express that
through the Media Session API.
... The metadata in the Media Session API allows you to expose the
artist name, title, album name, then a list of media images.
... We define a MediaImage dictionary, with source, sizes, and
type, a classic set that you get from any type of image on the
web.
... Web App Manifest has the same, and icons in HTML also have the
same thing. We use the image type to choose between image formats,
e.g., if some browsers don't support SVG,
... and use the size to pick the best size depending on where the
browser wants to show the image.
... So the first part of the Media Session API is to allow websites
to expose metadata, so we can have a richer UI.
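[A minimal sketch of the metadata part just described, using the Media Session API as specified; the titles and image URLs below are illustrative.]

```typescript
// Sketch: expose "now playing" metadata so the UA can build a richer UI.
// All string values and URLs here are illustrative.
if ('mediaSession' in navigator) {
  navigator.mediaSession.metadata = new MediaMetadata({
    title: 'Episode title',
    artist: 'Artist name',
    album: 'Album or series name',
    artwork: [
      // As with Web App Manifest icons, the UA uses sizes/type to pick the best image.
      { src: 'https://example.com/cover-96.png',  sizes: '96x96',   type: 'image/png' },
      { src: 'https://example.com/cover-512.png', sizes: '512x512', type: 'image/png' },
    ],
  });
}
```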
... The second part of the Media Session API is around actions. The
API allows the website to expose what type of actions it supports
and how to handle them.
... Basic actions are play and pause. The API suggests the UA
should have a default behaviour for these.
... For example, in Chrome these would play and pause the active
media element.
... If there is more than one, it would play or pause all of them at
the same time.
... And then richer actions like seek backward, seek forward,
previous track, next track.
... We recently added skip-ad, which we may remove (or not). We're
still deciding in Chrome whether we'll end up using it. We ran an
experiment; it depends on whether there are websites that are interested.
... We also added stop, a fairly common UI action, to pause
notifications entirely.
... Seek back and previous track etc are much harder for a UA to
implement by default. Previous and next track, for obvious
reasons.
... Seek backward and forward is more about compatibility; we saw a
lot of compatibility issues when we tried it.
... I believe Safari used to handle seeking from the Touch Bar, but
we haven't tried to do something similar, so unless a website
decides to help with that, we will not.
... The amount of seeking isn't only dependent on the website. We
only tell the website that seeking has been requested, and it can
seek by as many seconds as specified.
... A website can set an action handler with a callback.
... When the action is specified, i.e., the handler is not null, the
UA should notify the website and have a UI, if possible, to expose
these actions. And the website can later set the callback to
null.
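[A minimal sketch of the action-handler part, assuming a single page-level video element; the playlist helpers are hypothetical and the seek offsets are illustrative defaults.]

```typescript
// Sketch: declare which Media Session actions the page supports.
const video = document.querySelector('video')!;

// Hypothetical site-specific playlist helpers (stubs for illustration).
function playPreviousItem() { /* load the previous item in the site's playlist */ }
function playNextItem()     { /* load the next item in the site's playlist */ }

// Basic actions the UA could also handle by default.
navigator.mediaSession.setActionHandler('play',  () => { video.play(); });
navigator.mediaSession.setActionHandler('pause', () => { video.pause(); });

// Richer actions the UA cannot implement without help from the site.
navigator.mediaSession.setActionHandler('previoustrack', playPreviousItem);
navigator.mediaSession.setActionHandler('nexttrack', playNextItem);

// Seeking: the UA only says a seek was requested; the site decides how far to move.
navigator.mediaSession.setActionHandler('seekbackward', (details) => {
  video.currentTime -= details.seekOffset ?? 10;
});
navigator.mediaSession.setActionHandler('seekforward', (details) => {
  video.currentTime += details.seekOffset ?? 10;
});

// Later, setting a handler back to null tells the UA the action is no longer supported:
// navigator.mediaSession.setActionHandler('previoustrack', null);
```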
... We also recently added position state, which allows the website
to tell the UA how long the media session is, the duration, the
playback rate, and the current position.
... This allows the UA to try to guess where we are in the media
playback, and we can then draw a timeline and allow the user to
seek,
... which would require some new actions being discussed in GitHub
right now, to have a seek action that is not seek forward/backward,
but seek to a specific time.
... [...] To allow users to do whatever they want with the
timeline.
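[A minimal sketch of the position-state part, together with the "seekto" action under discussion; the handler assumes that proposal lands roughly as described.]

```typescript
// Sketch: report duration, playback rate and position so the UA can draw a timeline.
const video = document.querySelector('video')!;

function reportPositionState() {
  navigator.mediaSession.setPositionState({
    duration: video.duration,
    playbackRate: video.playbackRate,
    position: video.currentTime,
  });
}
video.addEventListener('loadedmetadata', reportPositionState);
video.addEventListener('ratechange', reportPositionState);
video.addEventListener('seeked', reportPositionState);

// Proposed "seekto" action: the UA hands the page an absolute target time.
navigator.mediaSession.setActionHandler('seekto', (details) => {
  if (details.seekTime != null) {
    video.currentTime = details.seekTime;
    reportPositionState();
  }
});
```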
... This is a bit harder to do on mobile. I think it was there in
the Android Q Beta; on tablets you can have a UI where you can use
the mouse to more easily drag the timeline.
... What we're looking at in the future, there's a Media Session
action for seeking to an arbitrary time, for skip-ad there's a
discussion around skip as a generic action, or skip ad, skip
episode, or skip intro, actions like that for Netflix or
YouTube.
... And then we have some use cases around WebRTC, should we have
actions for hanging up or muting.
... Maybe we should have a mute action, or even volume actions.
These are not top priorities now for the Media Session API.
<tidoust> scribenick: tidoust
Chris: You mentioned this is to make use of the native controls on mobile devices. Have you given any thought to how this might work on a big screen type implementation, e.g. TV?
<cpn> scribenick: cpn
Mounir: We haven't done a lot of work
around that. The Media Session API in Chrome is exposed via an API
in Android, also named Media Session. That is used on Android TV.
If you have an Android phone connected to the android TV, the Media
Session API shows actions on the TV, or even Android Auto.
... Most of the use cases for TV have been around Cast, which does
not use Media Session (in Chrome specifically).
Scott: We recently upstreamed a change to add support on Windows desktop. There's integration with system transport controls on Windows.
Jer: Absolutely, this is something we're interested in, as Apple has a big screen interface. I'm hopeful this kind of API would allow a remote to drive a web based presentation, as a media remote control API.
Chris: This something that would be interesting to us as well.
Mounir: I agree, it would be
interesting. We looked at it when implementing Remote Playback API
on Chrome for Android.
... Cast already exposes actions, we could add more actions, but
the cost was fairly high.
... We'd need a bigger opportunity to make the cost worth it. Same
with Presentation API, we have all the actions we need.
Chris: What would happen if you have multiple browser tabs that each want to activate Media Session? How is it mediated between multiple tabs?
Mounir: Media Session is different
from Audio Focus. Each page has a media session, but it doesn't
mean that it's playing, and it can be playing at the same time as
another tab.
... We have some ideas around the Audio Focus API to make audio
focus on the web a little bit clearer.
... Today, audio focus is mostly browser specific. In short, most
browsers on mobile have audio focus, and don't do anything on
desktop platforms.
... In ChromeOS we started audio focus for some specific cases, but
MediaSession doesn't care about that. If you have audio focus, the
actions will be sent to your page, but not if you don't have audio
focus, so play and pause actions will not be sent to you if another
tab is playing. That's the only interaction between Audio Focus and
Media Session.
... Scott mentioned Media Session on Windows support, using system
transport controls.
... A reason why we're working on mobile first is that we have
media session handling on all platforms in Chrome: system controls
on Windows, native APIs on macOS, and the same on Linux.
... On ChromeOS we are working to show a notification using the
Media Session.
Francois: What's the relationship between some of the Media Session actions, such as play, pause, seek, and the media remote controls being defined in KeyboardEvents, for typical TV remote control buttons?
Mounir: We don't look at that for Media Session, we use platform-specific APIs. On macOS there are APIs for media actions at the system level, which is what we are listening to. It means that how that action shows up doesn't matter: it can be a keyboard, a Bluetooth headset, or anything that triggers that action.
<Zakim> tidoust, you wanted to wonder about relationship between some media session actions and media remote controls defined (KeyboardEvent)
Jer: At the beginning of the Media
Session API discussion, we considered the use of keyboard events
instead of something more explicit, like actions, and came up
against the problem of how a web page indicates to the UA which
keyboard events it supports.
... With just a keyboard event handler, there's no way to know
whether to handle play, pause, toggle, seek forward, seek backward.
How does the UA know what UI to elevate based on what the page can
do? That was a significant shortcoming to using keyboard events for
media controls.
... The use case we were considering was not necessarily general
purpose remote controls for TVs. It was originally about the UI
that's elevated on mobile devices.
... Specifically, for Apple, touch pad and touch controls. IIRC
that's why we didn't choose to go in the direction of keyboard
events.
<Barbara_Intel> Timeline - Working Draft and Candidate Recommendation?
Mounir: The main blocker today is
having a second implementation. AFAIK there's no other
implementation of Media Session.
... Otherwise, the API has been progressing recently, as we've been
looking at the desktop use cases.
... Picture in Picture adds some use cases for Media Session, but
for about a year we haven't touched the API at all. It's stable, so
we can make a cut of a v1 we're happy with, and send that to the
Rec track, and work on a v2 if we have any features to add to
it.
<tidoust> scribenick: tidoust
Chris: Coming back to the first
scenario you described, which is presentation of metadata around
the content. Does the spec give any guideline with regards to the
length of strings that should be supported through the native
rendering of the system?
... How would I know what those constraints are?
<cpn> scribenick: cpn
Mounir: We haven't done that. We
don't have that information. In the case of Android there are so
many different phone sizes, strings could be shown on a TV, in a
car, on a watch.
... In general, web developers should expect it to be shown on a
small screen, and should be ready to handle long strings using
ellipses, or scrolling. Devices handle that fairly well, so we
didn't see a need to expose the length.
... It would be hard for us to figure it out, and that data could
be used for fingerprinting.
<Zakim> nigel, you wanted to ask if accessibility related media actions are considered in scope
<nigel> MediaAction
<nigel> My question is if MediaAction has in scope not just play/pause/seek etc but also accessibility actions like subtitles/captions On/off?
Chris: Nigel's having audio issues, let's come back to that later.
Jer: The length for strings isn't something we want to expose, it's
a non-feature, because the place where that data will be presented
will change dynamically based on what the user's doing.
... We don't want the web developer to have to respond to display
length change notifications for them to update the string. It will
be up to the UA to figure out how to display, in the variety of
places the description will show up.
... For accessibility, we haven't discussed in the development of
Media Session. But, platforms have a large number of accessibility
features around media playback, and one of the benefits of the
Media Session API in general is to make those same accessibility
features available to web pages that do custom presentations. The
Media Session API in itself is an accessibility feature for
platforms that already support accessibility for media
playback.
<nigel> Thanks Jer, that didn't really answer my question!
Mounir: We haven't really thought of
toggling captions, haven't had feature requests about that. If
others would be willing to implement it, I don't see why not. It
sounds like it could make sense.
... We have accessibility features at the OS level, e.g., on
Android you can have captions on by default, may want to show
captions at different times.
... The other issue I see here is that the Media Session use case
is to expose the media being played outside of where you see the
media, e.g., controlling the media from the lock screen, from your
watch, from your TV. So captions may not make much sense there;
showing captions on a watch may not be that useful.
... But there might be some use case around that. If we get user or
developer feedback on this, I see no reason why not.
Jer: The web has a general problem
with accessibility and captions right now, where many websites
choose to roll their own caption behaviour rather than use the APIs
provided by the web in general to do native captions.
... We are very interested in making that work better. For those
websites that choose to present their own captions non-natively,
having an action to turn on captions would probably be a big
benefit. I don't know that Apple has any specific UI around
enabling or disabling captions that would benefit from this. The
original intent of this API was to let mobile presentations work
for complex web applications.
<nigel> +1 to there being a big benefit
Jer: This wasn't a use case we considered. If there are implementers in the TV space that have a caption on/off button, we should consider that.
<nigel> There are system requirements in some countries to have for example a "subtitles" button on remote controls, I see this as analogous.
Jer: If you're just coming across this API, we'd love to get feedback on what features seem to be missing. If we're not considering all the use cases that you are, please go to GitHub and add issues.
<nigel> Should do for Audio Description too.
Jer: A question for Mounir, Francois, Scott. What adoption are you seeing from websites, now that this feature is available in Edge and in Chrome?
Mounir: Fairly big: YouTube and Spotify are
using it. We have metrics; YouTube is high. There's good adoption.
The big one missing is probably Netflix.
... It's an easy API to use, so if you want to try, I'd be very
happy. You can really improve the user experience, especially on
mobile.
<nigel> Picture-in-Picture Draft Community Group Report
FrancoisB: Picture in Picture is a
feature that allows users to watch videos in a floating resizeable
window that is always on top of other windows, so they can keep an
eye on what they're watching while working on other things.
... As background, in 2016, Safari added Picture in Picture support
through a WebKit API in macOS Sierra.
... A year after that, we shared our intent to standardise and
build a web API that would be feature compatible with Safari.
... Last year, we launched Picture in Picture for the HTML video
element on Chrome for desktop.
... It takes the video layer and moves it to a window that follows
rules specific to Picture in Picture, such as always on top, fixed
aspect ratio, etc.
... From there we iterated to not only a regular video element, but
also video captured from a camera, or a stream from a canvas
element.
... Then we added Media Session support. If you set previous/next
track action handlers in the Media Session API, it would add
controls to the Picture in Picture window.
... This has been used by Spotify recently, where they have a mini
pop-out player that allows the user to control music playback in a
Picture in Picture window.
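[A minimal sketch of the video-element Picture in Picture request and the Media Session integration just described; the button id and the playlist handlers are illustrative.]

```typescript
// Sketch: toggle a Picture in Picture window for a <video>, and expose
// previous/next track so the UA can add those controls to the PiP window.
const video = document.querySelector('video')!;
const pipButton = document.querySelector<HTMLButtonElement>('#pip-toggle')!; // illustrative id

pipButton.addEventListener('click', async () => {
  if (document.pictureInPictureElement) {
    await document.exitPictureInPicture();
  } else if (document.pictureInPictureEnabled) {
    await video.requestPictureInPicture();
  }
});

// With these handlers set, Chrome adds previous/next controls to the PiP window.
navigator.mediaSession.setActionHandler('previoustrack', () => { /* site-specific */ });
navigator.mediaSession.setActionHandler('nexttrack', () => { /* site-specific */ });
```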
... We're now iterating on this version of Picture in Picture, as
we want to allow any HTML element to be in a Picture in Picture
window. We call that Picture in Picture v2, or Picture in Picture
for arbitrary content.
... Looking at the GitHub issues, it's now full of v2 issues.
... We're focused on v2 now. v1 was used by Spotify, not used yet
by YouTube, which is one of the reasons we're thinking of arbitrary
content.
... Some other big players are experimenting now with Picture in
Picture v1.
... Integration with Media Session helps build some compelling use
cases.
... Is allowing arbitrary content in scope for the Media WG, or
should we focus on v1?
Mounir: I think the whole API is part
of the Media WG, no need to separate arbitrary content. Depending
on consensus, we can have a v1 go to Rec and work on the v2. If we
don't have implementations, it would have to wait.
... Picture in Picture v1 is mostly trying to be feature compatible
with the Safari API.
... It offers a very simple API for websites, but a lot of websites
we worked with felt the API was not powerful enough.
... For example, if you have EME content: many websites wanted to
show timed overlays on top of the video. With the API as it is
today, it's not interactive. You can overlay with a canvas, but
that doesn't work with EME.
... We had a partner that could not use Picture in Picture because
of that. We have partners that require more actions.
... [?] are experimenting with Picture in Picture, but it would be
much more exciting if they could do whatever they want in the
Picture in Picture UI.
... It's a balance between what web developers want and what's best
for users.
... We believe at this stage we have a good implementation of the
Picture in Picture API for the video element, and we are looking at
Picture in Picture API for the DOM. It's similar to what happened
with Full Screen, it started with video and got extended.
Chris: I sent some feedback on this. One of the use cases we have is the ability to customise the controls that are visible in the Picture in Picture window for video. Our video player has a certain look and feel, we provide our own play and pause controls and we'd want to give the same visual styling to the Picture in Picture window, as we do in the regular video element. Is this something you've heard from other users?
Mounir: That's the no. 1 feedback
we've got, branding. People want to keep the look and feel
associated with their brand. For play and pause buttons, we could
offer ways to customise slightly, different colour, but that would
be limited. If a website decided the play button should be a
square, we would not offer this.
... Instead of going down the rabbit hole of offering customisation
options, we felt it would be better to let you do whatever you want
as a website. That way we have a simple API. If, as a website, you
don't care about the look and feel of the media controls on a video
element, you can use the Picture in Picture API for the video
element.
... Another one that's fairly common is subtitles. We have an
issue: a known issue in Chrome is that we don't show the native
subtitles. But a lot of websites [...]
... We could tell them to use the native subtitles, but they're not
really going to change everything for Picture in Picture. If they
want to show subtitles, they need a way.
... Today, they could use a canvas tag to draw subtitles on top,
but that's a lot of work. Subtitles are usually injected on top of
the video, and to make those drawn with WebGL is a significant
amount of work.
... I know only of one website that draws subtitles with WebGL
today.
Jer: I think subtitles is an
orthogonal problem, for the reasons I mentioned earlier.
... Drawing subtitles yourself through WebGL or as native DOM
elements has accessibility problems.
... Any work we do to enable subtitles should fix the problem
holistically, in my opinion. We should not try to take one-off
actions to make the currently bad presentation of subtitles work in
more places. We should address the feature issues that keep
websites from using the native subtitling support.
<nigel> Where should subtitles be shown, regardless of native/non-native, when video is in a PIP window?
Jer: There was added to the Media WG, as a possible incubation target, a new subtitle API that would hopefully allow more native presentation of subtitles for those websites that are using non-WebVTT subtitle formats, with all the custom styling rules they feel they need to present the subtitles on top of the content.
<MarkVickers> +1 to Media WG taking on subtitles rework.
Jer: For me, the subtitle issue should get fixed on the web platform level, so that it does work for Picture in Picture v1, for those user agents that can present subtitles inside a Picture in Picture window.
Nigel: Interesting point. If you have
a small video presentation window, where should the subtitles go?
There could be an accessibility problem, regardless of whether it's
native or non-native presentation.
... In Picture in Picture, where on the screen should they go, or
in a separate window?
<fbeaufort_> Quick question: Jer, will Safari implement PiP V2 (arbitrary content)?
Jer: One of the problems with the v2
approach is that it constrains the solution space so that every
website has to come up with their own implementation-specific
presentation of subtitles. The UA can't do anything to help in that
case.
... Netflix want very specifically to control the presentation
of subtitles regardless of the user's current platform
choices.
... One of the benefits of having native subtitle experience is
that users who have varying ability to see the subtitles can make
those decisions on a platform basis, and we as a UA can provide a
separate window for the subtitles, or different renderings,
different fonts, colours, outlines, backgrounds, etc, without that
having to be exposed by the website itself, and having every
website have to make those decisions and custom presentations based
on user choices.
... This is one reason I'm pushing back on subtitles being a
driving feature for PiP v2, because it leaves out the existing
users of the Picture in Picture window as it currently is, and
it constrains the UA's ability to make good decisions for the
presentation of subtitles.
... I hadn't considered the case of having a second PiP window for
subtitles, but I can see perhaps resizing the PiP window to give a
separate space along the bottom or top for having a highly visible
set of subtitles.
Mounir: To clarify, subtitles isn't a
driving use case for v2, but it's one of the many use cases we've
heard.
... It's not a big one, as we can usually tell them that there's a
solution for them.
... It's a death by a thousand cuts that websites are facing. With
Picture in Picture as it is today, there's so many things they
cannot do, that they'd rather not spend the time doing it.
Jer: We've seen the same thing, where Safari's own prefixed Picture in Picture API is not heavily used outside of a number of websites, probably for the same reasons. We let people get into Picture in Picture without requiring the website to adopt it specifically. We see that the feature is popular, and would like more to use it. I don't know that we'd ever be able to give a fully interactive HTML content PiP window, given the constraints of our own UI, and our own UI guidelines for Mac and iOS more generally.
Nigel: Regarding native vs non-native playback with the interface design, it would be awkward also for the BBC in the same way as for Netflix.
<Barbara_Intel> Separate topic: Media Capture Stream and Depth API. How will you work with that media API? Will it become part of the media working group?
Nigel: What we'd like to see from a requirements perspective is something that allows us to factor that decision out and take it elsewhere. You mentioned an API approach that would be useful. If the answer was that subtitles only worked in PiP if you use native playback, that would be a broken architecture. So I'd like to un-mix those things.
Jer: Why would it be that subtitles would only work in native playback?
Nigel: I'm responding to the suggestion that fixing subtitles and captions by using native is the best way, and for the moment that's not clear-cut. For something like PiP, it would be useful to have a model where you can say where subtitles should go or what should happen, or what the user expectation should be by default. Or even a place you can put them, at an API level, whether it's DOM or a specific subtitle API.
Jer: What would prevent you from using native subtitles? What feature is missing?
Nigel: That's a broader topic, but let's make that discussion orthogonal to PiP.
Jer: That discussion will be
interesting from a Media WG perspective, as we are looking at a
spec that should allow more customised presentation of subtitles,
that we're hoping will address some of what you said for why the
current state of subtitles on the web isn't sufficient for web
pages.
... Let's have that as a separate discussion, but it will become
relevant very soon.
Nigel: I agree with the summary that there's a problem with how subtitles and captions work on the web, not sure everybody agrees on the solution, happy to keep working on it.
Barbara: Media capture is also evolving, I'm trying to get a strategic direction from the Media WG, will you be looking at Media Capture?
Mounir: I don't think that Media Capture is in the Charter for the Media WG. But Picture in Picture and Media Capture work well together. You should be able to send a video from Media Capture to Picture in Picture.
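[A minimal sketch of that combination, assuming a camera capture routed through a video element into Picture in Picture; entering PiP normally needs a user gesture.]

```typescript
// Sketch: send a captured camera stream into a Picture in Picture window.
async function captureCameraToPiP(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const video = document.createElement('video');
  video.srcObject = stream;
  video.muted = true;
  await video.play();                    // PiP needs the video to have loaded and be playing
  await video.requestPictureInPicture(); // usually must be triggered by a user gesture
}
```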
Barbara: So these are two distinct APIs, going down their own path to implementation, draft and release candidates?
Mounir: Yes
Jer: Picture in Picture, Media
Capabilities, and Media Session APIs all interface with the media
element that's described elsewhere, in HTML. We want to make these
APIs that we're incubating and shipping work with the media
element. We also want to make them work correctly with Media
Capture that's defined elsewhere.
... Just because it's in another WG, we shouldn't forgo the
possibility of making specific features to make capture work with
Picture in Picture or Media Session or the Audio Focus API.
Barbara: I'd suggest producing an architectural view of the Media Working Group and how you work with different APIs, it would help understanding who's doing what.
Chris: This could relate to the W3C's website, the Overview of Media Technologies for the Web, could be a good place to capture it.
Barbara: Potentially, they're all interrelated.
Chris: How does Picture in Picture work on mobile or TV devices?
FrancoisB: We haven't shipped on Chrome Android, we're thinking about it. For TV, I've contacted the LG OS folks regarding Picture in Picture.
Mounir: On mobile, we have a feature called [?] which sends a video picture in picture automatically. You watch full screen and hide the page by pressing the home button or similar. It's a fairly popular feature. We're starting to work on the Picture in Picture API implementation on Android.
Chris: Any more questions on Picture in Picture before we move on?
Chris: As Mounir and Jer are both here, as co-chairs of the Media WG, what are your plans for starting the group, and initial focuses?
<tidoust> Media WG home page
<tidoust> [Please note (and use!) the "Join this group" link on the Media WG home page!]
<tidoust> Media WG deliverables (normative and potentially normative)
Mounir: The main focuses are Picture
in Picture and Media Session. For Media Session, there's work to do
getting it to Rec, for Picture in Picture there's the v2
part.
... Media Capabilities is a big one, there's a lot of work to do,
we get new use cases almost every day.
... Work around AutoPlay, there's what used to be called Video
Playback Quality that was part of MSE, we get requests to make that
compatible.
... Obviously, MSE and EME, adding new features based on browser
and developer feedback. The main one is around Workers, and
transitions.
... For EME there is a handful of small changes that we
added.
... As Jer mentioned, there's things around TextTracks and
DataCue.
Jer: What are your priorities among those?
Mounir: I think those are the
priorities, it's hard to order them.
... Looking at the Charter, we have normative specs and potential
normative specs, those are the P1s and P2s for the WG.
... One priority we have is getting things that have been done in
WICG for a couple of years into the Rec track.
... Media Capabilities isn't quite ready, as we get a lot of
feature requests, then taking care of MSE and EME. Those are very
mature, some small changes.
... And then there are new features, in the potential normative
specs, lower priority as we have lots to do already.
... Maybe Francois and Jer have different visions?
Jer: EME has particular importance,
as it has been neglected for so long for process reasons, has lots
of little bug fixes that can be taken care of quickly, then have
the entire EME discussion put to bed for a while.
... MSE has new features that are interesting from a web page point
of view. A lot of the pain points around MSE adoption have been
raised over the last couple of years, and new features are being
proposed to address them. That's one of the highest priorities from
my point of view.
... Next is to bring all the WICG work that's been done to a Rec
track, specifically the Picture in Picture and Media Session
APIs.
<fbeaufort_> I assume it was you ;)
Chris: When are you planning a first F2F meeting, conference calls, or online discussion?
Jer: Mounir and I haven't discussed
that yet. I haven't joined the WG yet, it's stuck in the process at
Apple. Our first F2F meeting is scheduled at TPAC, mid-September
this year.
... In my opinion, I think we should have yearly or semi-yearly
F2F.
... GitHub issues for change proposals and feature requests has
been working so far, no pressing need to move away from that.
... Mounir and I haven't discussed WG process yet.
Mounir: I agree we should have one or
two F2F per year.
... GitHub is fine where issues are minor changes or aren't
controversial; those can be resolved on GitHub fairly quickly.
... As soon as there is controversy, it gets harder to resolve
discussions on GitHub, or it can take a long time, that's where
regular F2F meetings can help.
... I'm not a huge fan of [?] meetings with a lot of people,
depends how we can schedule them. I would rather have F2F
meetings.
Jer: Regular phone calls can happen
more frequently and don't require travel. What I found with the
Audio WG is that regular phone calls did manage to iterate on
problems relatively quickly.
... Mounir and I will have to work out what process works best for
the wide variety of specs we're dealing with. Nothing has been
settled. Please let us know what you'd prefer as well. It can be a
bigger discussion than just between the WG chairs.
Chris: This is something to follow up on. And separately, the relationship between the Interest Group and the Working Group and how the IG can bring use cases and requirements and new candidate topics for the WG.
Mark: We should discuss how the IG
and the WG can best work together. I'm sure there'll be a lot of
contribution from the IG to the GitHub repos, and participation in
calls and F2F.
... Anytime the WG would want to outsource some requirements
gathering to the IG, we're happy to do that. We have contact with a
large number of media companies and media experts. We could do
requirements gathering on any particular topic.
Chris: I agree. And also we have open liaisons with other media organisations, so if you need input from ATSC or HbbTV, that's something we can help with.
Chris: We've covered all the agenda items. Anything else?
[Nothing]
Chris: Thank you everyone for
joining, especially Mounir, Francois, and Jer for giving an update
on Picture in Picture and Media Session. These are interesting and
exciting new APIs that you're bringing to the web, to improve the
user experience of media on the web.
... I'd like to invite you back to cover some of the other topics
for the Working Group, we'll follow up separately.
<tidoust> scribenick: tidoust
Chris: Next call will be on Tuesday July 2nd, topic to be decided.
[Adjourned]