W3C

Media WG / WebRTC Joint Meeting

23 May 2022

Attendees

Present
Bernard Aboba, Brian Baldino, Chris Needham, Dominique Hazael-Massieux, Elad Alon, Eric Carlson, Francois Daoust, Frank Liberato, Harald Alvestrand, Jan-Ivar Bruaroey, Jer Noble, Tommy Steimel, Xabier Rodriguez Calvar, Youenn Fablet
Regrets
-
Chair
-
Scribe
cpn, tidoust

Meeting minutes

Media session API

cpn: I reviewed the notes from last meeting. We didn't reach a conclusion but had a good exploration of the space, of the stated goals, and potential interactions that this raises for the Media Session API.
… Would it help to restate some of that?

youenn: WebRTC develops tab capture. E.g.: WebEx wants to capture a tab and you may want to interact with the content in the tab, e.g. a slideshow.
… Elad provided a way to ease the interaction between the capturer and the capturee, in the Capture Handle spec.
… E.g. to tell the capturee to go to next/previous slide.
… We knew there would be some overlap with Media Session API. Also similar to use cases that involve picture-in-picture windows.
… Goal is to understand the overlap, whether we can share the work, and how.

cpn: The Media Session API is quite simple for the time being. You receive the actions that are UA-provided and you're invoking a script on the page.
… If we're looking at page-initiated actions, there are security implications. What I'm wondering is what the related specifications should be covering: whether the implementation of the next/previous slide actions would be defined in the Capture Handle spec or in the Media Session API.
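
A minimal sketch of this model, in which the page registers handlers and the user agent invokes them. `setActionHandler` and the `"nexttrack"` / `"previoustrack"` action names come from the Media Session API. `SessionLike`, `FakeSession`, `SlideDeck` and `registerSlideHandlers` are hypothetical names introduced here so that the sketch runs outside a browser, and mapping track actions onto slides is only an assumption for illustration, not something the group agreed on.

```typescript
// Structural stand-in for the part of MediaSession used below. In a page,
// navigator.mediaSession is expected to fit this shape (assumption).
type ActionHandler = () => void;
interface SessionLike {
  setActionHandler(action: string, handler: ActionHandler | null): void;
}

// Hypothetical slide deck model; not part of any specification.
class SlideDeck {
  index = 0;
  count: number;
  constructor(count: number) {
    this.count = count;
  }
  next(): void {
    this.index = Math.min(this.index + 1, this.count - 1);
  }
  previous(): void {
    this.index = Math.max(this.index - 1, 0);
  }
}

// The page only registers callbacks; deciding when to run them is up to the UA.
function registerSlideHandlers(session: SessionLike, deck: SlideDeck): void {
  session.setActionHandler("nexttrack", () => deck.next());
  session.setActionHandler("previoustrack", () => deck.previous());
}

// In-memory fake that plays the role of the user agent dispatching an action.
class FakeSession implements SessionLike {
  private handlers: Record<string, ActionHandler | undefined> = {};
  setActionHandler(action: string, handler: ActionHandler | null): void {
    this.handlers[action] = handler === null ? undefined : handler;
  }
  dispatch(action: string): boolean {
    const handler = this.handlers[action];
    if (!handler) return false;
    handler();
    return true;
  }
}

const demoDeck = new SlideDeck(3);
const demoSession = new FakeSession();
registerSlideHandlers(demoSession, demoDeck);
demoSession.dispatch("nexttrack"); // simulates a UA-provided action
console.log(demoDeck.index); // 1
```

The page never triggers an action itself in this sketch, which is the property that page-initiated actions would change.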

youenn: I would distinguish between capturer and capturee. For capturer, I think this should remain in WebRTC land.
… For the capturee side, most would still be owned by Media Session API.

tommy: Some of the things that you're targeting, I don't think they mean much for Media Session. We're providing the action when a page has audio focus, and this is slightly different here.

elad: I'm concerned that we need opt-in on both sides here (capturer and capturee) and we don't really know whether there's appetite for this. I would suggest that we start with something like an origin trial.

jan-ivar: I think we did a good job of separating concerns into issues. I would start there.

https://github.com/w3c/mediasession/issues/274

jan-ivar: We would need to look at concrete concerns first and then look at whether there is overlap between actions defined in Media Session and Capture Handle.

youenn: It makes sense to me to start with #274. If we see that we cannot do it, then that gives us a reasonable clue that we may need to stop looking at other issues.

cpn: A key issue is whether we see Capture Handle as being closely related to having audio focus.

youenn: Some input from François Beaufort in the issues.

tommy: The existing Picture-in-Picture API allows you to open a video element in an always-on-top window. The experimentation is with extending it to arbitrary DOM elements.
… Whether we want to do this depends on whether we see motivation for it.
… I think we'd have to change the API in a way that would be backward-compatible to handle the opt-in.

youenn: We could certainly have opt-in with booleans for instance, add dictionary arguments. To say "I'm ok getting more callbacks than just UA actions".

https://github.com/w3c/mediasession/issues/275

youenn: Also, capturee might want to restrict some of these actions based on the origin. "Is it somebody I trust?"
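
The opt-in and origin-restriction ideas in this exchange could be sketched as follows. Everything here is hypothetical: the Media Session API has no options argument on `setActionHandler`, and `ActionRegistry`, `HandlerOptions`, `allowCapturer`, `trustedOrigins` and `ActionSource` are names invented for this illustration. The sketch only shows one way a capturee could say "I'm ok getting more than UA actions" and limit which capturer origins it trusts.

```typescript
// Who triggered the action: the user agent itself, or a capturing page.
type ActionSource =
  | { kind: "ua" }
  | { kind: "capturer"; origin: string };

// Hypothetical opt-in dictionary (assumption, not in any spec).
interface HandlerOptions {
  allowCapturer?: boolean; // opt in to actions coming from a capturer
  trustedOrigins?: string[]; // if set, only these capturer origins are accepted
}

type SourcedHandler = (source: ActionSource) => void;

interface RegistryEntry {
  handler: SourcedHandler;
  options: HandlerOptions;
}

class ActionRegistry {
  private entries: Record<string, RegistryEntry | undefined> = {};

  setActionHandler(
    action: string,
    handler: SourcedHandler,
    options: HandlerOptions = {}
  ): void {
    this.entries[action] = { handler, options };
  }

  // Returns true if the action was delivered to the page.
  deliver(action: string, source: ActionSource): boolean {
    const entry = this.entries[action];
    if (!entry) return false;
    if (source.kind === "capturer") {
      // Without explicit opt-in, capturer actions are dropped.
      if (!entry.options.allowCapturer) return false;
      const trusted = entry.options.trustedOrigins;
      if (trusted && trusted.indexOf(source.origin) === -1) return false;
    }
    entry.handler(source);
    return true;
  }
}

const demoRegistry = new ActionRegistry();
demoRegistry.setActionHandler("nexttrack", () => {}, {
  allowCapturer: true,
  trustedOrigins: ["https://meet.example"],
});
console.log(demoRegistry.deliver("nexttrack", { kind: "ua" })); // true
console.log(
  demoRegistry.deliver("nexttrack", { kind: "capturer", origin: "https://meet.example" })
); // true
console.log(
  demoRegistry.deliver("nexttrack", { kind: "capturer", origin: "https://other.example" })
); // false
```

Defaulting `options` to an empty dictionary keeps existing callers working unchanged, which is the backward-compatibility concern raised about changing the API.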

elad: Only a single entity can receive the actions, right?

tommy: In Chrome, we would only send the pause/play actions to the tab we consider active, but there are other actions and situations where we can share the actions. All can receive actions at the same time.

elad: So a) actions are not exactly the same, b) velocity will be slowed down, c) I haven't heard compelling reasons to reuse Media Session here.

frank: I also tend to think that this is a marriage of convenience rather than a good match.
… Not sure users will actually use the action through media session.

jan-ivar: There's a chicken-and-egg issue. Depending on the software I use, e.g. Google Slides in Chrome may have built-in controls. But this is brand new. It's still clear that I would need to have the buttons.
… I'm not too interested in the technicalities of the solution. It's not clear to me what the criteria are to be or not to be in the list, e.g. toggle microphone / camera. It may be that the same arguments apply here.

jer: The original set of Media Session actions was added to give hardware-related controls to web pages. E.g. play/pause/next track/previous track very often appear on keyboards.
… Then it made sense to represent the current playing state, current time, slider. The media session actions grew organically out of these use cases.
… For communications, the system provides mute/unmute/... actions. It was a natural fit to extend the list of media sessions for these system-provided actions.
… That's the background. Some developers wanted to have something like media session to access further keyboard controls.

[scribe missed question]

jan-ivar: Thanks for the background. Interesting that this has come up before. Maybe it's ok to discuss some of the consequent capturer/capturee use cases.
… This might allow us to justify the decision for actions.
… Would there be any use for a capturer to issue play/pause/next track/previous track actions?

youenn: We discussed that last time. I think there was rough agreement that mute/unmute/toggle camera would not be a great fit for Capture Handle. But play/pause/next track/etc. can still make sense when e.g. there are videos on a slide.
… I would expect that other actions on top of next/previous slide would fit.

jan-ivar: About the Picture-in-Picture mode, what can you do already?
… When you're presenting, it would be nice to have next/previous slide on that.

youenn: Maybe it would work. Question we asked during previous meeting was whether we could reuse next track/previous track for slides.
… One example is a page that registers both.

jan-ivar: I agree it is a detail here. What's interesting here is that the idea of what is a media session gets complicated during capture, and maybe web developers can get some help here, given that there are multiple origins involved.
… Media session could be a good spec where we could include the concept of actions linked to a secondary captured origin.
… That seems valuable to at least explore to me.
… If the working group wants to dismiss that out of the gate, I can see arguments for that too.

jer: Something that occurred to me. If we had added Next/Previous slides to start with, a lot of this conversation would be moot. We'd have to figure out how things worked during capture.
… I can imagine a different API shape where the capturer would delegate media session actions to the capturee and everything would fall out from that.
… The user agent would pass the actions to the capturee. No cross-origin issue.
… I realize that this is a big departure from what we're currently talking about.
… The Media Session API is the API that is currently used to deliver UA commands to applications. It seems reasonable to look into extending it to other non-purely media related scenarios.
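
One possible reading of this delegation idea, as a sketch only: the capturer hands its actions over to the capturee, and the user agent does the routing so the two pages never call each other. `DelegatingRouter`, `delegate`, `stopDelegating` and the tab identifiers are hypothetical names chosen for this illustration and do not correspond to any existing or proposed API.

```typescript
type ActionCallback = () => void;

// Stand-in for the user agent: it knows which tab registered which handlers
// and which capturer tab has delegated its actions to which capturee tab.
class DelegatingRouter {
  private handlersByTab: Record<string, Record<string, ActionCallback> | undefined> = {};
  private delegations: Record<string, string | undefined> = {};

  register(tabId: string, action: string, callback: ActionCallback): void {
    const handlers = this.handlersByTab[tabId] ?? {};
    handlers[action] = callback;
    this.handlersByTab[tabId] = handlers;
  }

  // Called on behalf of the capturer while a capture is active.
  delegate(capturerTab: string, captureeTab: string): void {
    this.delegations[capturerTab] = captureeTab;
  }

  stopDelegating(capturerTab: string): void {
    delete this.delegations[capturerTab];
  }

  // Routes an action aimed at the focused tab. Returns the id of the tab
  // whose handler ran, or null if nobody handled it.
  dispatch(focusedTab: string, action: string): string | null {
    const target = this.delegations[focusedTab] ?? focusedTab;
    const handlers = this.handlersByTab[target];
    const callback = handlers ? handlers[action] : undefined;
    if (!callback) return null;
    callback();
    return target;
  }
}

const demoRouter = new DelegatingRouter();
let currentSlide = 0;
demoRouter.register("slides-tab", "nexttrack", () => {
  currentSlide += 1;
});
demoRouter.delegate("meeting-tab", "slides-tab");
console.log(demoRouter.dispatch("meeting-tab", "nexttrack"), currentSlide); // slides-tab 1
```

Because only the router touches both tabs, no data flows directly between the two origins in this sketch, which is the "no cross-origin issue" property mentioned above.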

youenn: It could be complementary.

jan-ivar: It could also be browser-provided buttons.
… Another benefit of having next slide, previous slide is that it would not require tight integration between capturer and capturee.
… It would definitely seem useful if we can get an API that can look into "what about malicious applications" scenarios.

frank: I want to be sure I understand the model. I would register media session actions for next page / previous page. And if someone is capturing, then that could be another source of actions. Would someone want to use media session actions apart from capture?

jan-ivar: Definitely. For instance if I'm projecting on another monitor. Remote clicker, remote control, etc.
… It provides more context to the user agent.

tommy: I don't see a world where we would take over the next/previous track for an application. People get annoyed in the rare occasions when we need to do it. If you're not capturing, then you're on the page itself.

jan-ivar: I wasn't implying that. It would be linked to an active presentation mode. How that gets defined is up for discussion. User agent should be able to detect whether presentation is occurring.

elad: Do we want to have more actions? I think the answer would be generally yes. Do we want to tie the two mechanisms? Media session and capture? Here, I'm not hearing new requirements. Have requests come from web developers that this would be useful?
… Here, we're trying to create a close-ish relationship between two applications. That makes sense. But the rest seems a bit distant from what I've been hearing web developers ask about.

jer: Insofar as we have had web developers' input on keyboard media actions, I don't recall that they requested a tie-in between current media session actions and capture scenario actions. Even if there is a separate API or entry point for this kind of thing, there would need to be some coordination that happens if two things register for the same hardware button.

jan-ivar: Microphone and camera don't seem to fit in with a regular media session, they only fit with a presentation.
… But then the link between presentation and microphone and camera only holds if you're presenting online, not if you're presenting in a conference room for instance.

youenn: I'm wondering what we should do on next steps.
… Should we summarize the discussion on GitHub and ask people to comment there?

cpn: It would help if we could get agreement on whether the next slide / previous slide actions could be added to Media Session. I'm hearing that they were not initially considered to be in scope, but I'm not hearing objections so far to including them.
… And then there's obviously the broader question of, assuming there's a concept of an active presentation session, the proper routing of the actions to the capturee.

hta: It's kind of interesting that we haven't defined what it means to toggle the camera if an active presentation is ongoing.

cpn: I would suggest capturing that in a separate issue against Media Session.
… If we think that Media Session is not the right home for this, where would be the right home? It would be useful to me to compare some different options.

elad: The immediate alternative is the Capture Handle actions registry repository. We should at least compare to that.

cpn: And then perhaps the argument to put it into Media Session is if there are applications that want to use the actions outside of presentations.

elad: Exactly. Right now, I don't know whether there is appetite for that.
… I think what matters is a mechanism to deliver from capturer to capturee. If you add next slide / previous slide, it may be that you actually want to listen to scroll down / scroll up.
… What's missing here is a way to connect capturee and capturer.

cpn: I would recommend that we capture all of this in GitHub issues. How long should we wait before we meet again on this topic?
… Proposing 4 weeks.

Minutes manually created (not a transcript), formatted by scribe.perl version 185 (Thu Dec 2 18:51:55 2021 UTC).