15:53:46 RRSAgent has joined #mediawg
15:53:46 logging to https://www.w3.org/2022/04/25-mediawg-irc
15:53:49 Zakim has joined #mediawg
15:53:56 RRSAgent, make logs public
15:54:32 Meeting: WebRTC WG / Media WG Joint Meeting
15:55:04 Agenda: https://github.com/w3c/media-wg/blob/main/meetings/2022-04-25-WebRTC_Working_Group_Media_Working_Group_Joint_Meeting-agenda.md
16:01:05 present+ Francois_Daoust, Bernard_Aboba, Chris_Needham, Eric_Carlson, Jan-Ivar_Bruaroey, Youenn_Fablet
16:01:41 dom has joined #mediawg
16:02:17 scribe+ cpn
16:03:37 present+ Jer_Noble
16:03:54 Present+
16:04:05 present+ Elad_Alon
16:04:56 present+ Greg_Freedman
16:05:11 jib has joined #mediawg
16:05:11 jernoble has joined #mediawg
16:05:22 Chair: Bernard Aboba, Jan-Ivar Bruaroey, Harald Alvestrand (WebRTC WG), Jer Noble, Chris Needham (Media WG)
16:05:44 steely-glint has joined #mediawg
16:06:12 eric_carlson has joined #mediawg
16:06:22 present+ Harald_Alvestrand
16:06:36 Topic: Capture Handle Actions & Media Session Actions
16:06:42 scribe+ jernoble
16:07:09 jan-ivar: Two specifications, Capture Handle Identity & Capture Handle Actions
16:07:17 -> https://w3c.github.io/mediacapture-handle/identity/index.html Capture Handle Identity
16:07:17 Can someone PM me the webex password - somehow my password manager has got into a fight with the w3c's auth - Thanks.
16:07:32 -> https://w3c.github.io/mediacapture-handle/actions/index.html Capture Handle Actions
16:07:54 ... When presenting via a WebRTC call, if the presentation is in another browser window, the user may background the browser tab/view that is running the presentation
16:08:19 ... The use case would be an integrated solution, where video conferencing and presentation are going on at the same time
16:08:49 ... Short of that, it would be good to be able to control the presentation through the browser view that is doing the call
16:09:15 ... The goal is to standardize the actions that can be supported through presentation/capture pairs
16:10:26 youenn: One of the Media Session use cases is to control the page, for example through a Picture-in-Picture window
16:10:30 present+ Chris_Cunningham
16:10:38 ... This seems very similar to what we are trying to achieve through Media Session actions.
16:10:54 ... Except in this case, the actions would be sent to the capturing page, not to the presentation page
16:11:24 ... One thing that is nice about Media Session actions is that they are already in place and are supported
16:11:44 ... It would be nice if we could reuse the existing supported API for WebRTC, rather than come up with our own
16:12:09 ... If we were able to share or reuse that API, adoption would come for free.
16:13:05 ... On the other hand, if sites must adopt a new, capture-only API, WebRTC clients may support it, but the sites the user is capturing (e.g. presentation sites) may not add support for those actions
16:13:34 jan-ivar: Media Session does appear to pave the way for useful actions, like "next track" and "previous track"; however, these do not work on Google Slides today.
16:13:52 ... Perhaps because these are more music- or media-related; is "next track" the same as "next slide"?
16:14:35 ... We considered whether these were in-page buttons provided by the page; however, would those instead be UI presented by the UA itself?
16:15:21 ... There are security concerns; for example, even with a user gesture (UG) requirement, it's the capturing page that is controlling the presentation page; there may be security implications to allowing these pages to communicate with or control each other cross-site.
16:15:34 Elad: Things I've heard from web developers:
16:15:35 s/jean/jan/
16:15:49 ... 1. They don't really like UA-provided controls; those clash with their own UI
16:16:02 s/jean-yvar/jan-ivar/
16:16:03 ... And they can't provide a consistent UI across different browsers.
16:16:44 present+ Tim_Panton
16:16:45 ... 2. When a capturing site captures Google Slides, there's a login pattern where both tabs must be signed in to the same account
16:17:23 ... This is a pattern they require. It's not clear that, even if this API existed, sites would want arbitrary other sites to control their presentations
16:19:14 youenn: That would be something worth investigating with Google.
16:19:54 Elad: To make this work generally, the API has to provide more information, namely the origin that generated the message.
16:20:26 youenn: That's something that's already solvable through JavaScript
16:20:44 ... We're targeting the 80% case, and allowing JS to handle the remaining 20%
16:21:27 jernoble: This sounds like something BroadcastChannel already provides
16:22:21 Elad: If you have multiple sessions with Google Slides, you don't want them all to respond
16:22:41 ... so you use Capture Handle Identity, and Capture Handle Actions, which lets you talk directly to the thing you're capturing
16:23:30 jan-ivar: The concern is that this only leads to siloing; can we provide a baseline set of actions that needs minimal setup, covering the 80/20 case?
16:24:03 harald: With media actions, there's an interop concern with different device buttons and applications
16:24:55 Harald: The goal would be to allow a page written by a Google developer to control presentations made with Microsoft Office, or vice versa
16:25:53 ... Something to consider is whether there could be a common registry of actions and models across different presentation types (e.g. Spotify vs. slideshows)
16:26:24 ... And the Media Session actions carry a lot of metadata (speed, seeking to a particular time)
16:26:39 q?
16:27:09 q+
16:27:50 jan-ivar: Two options: we could have two APIs, where developers would have to opt into both, with separate implementations; or there could be a single API that's driven by either hardware buttons or a capturing site
16:28:42 dom: I want to give other examples of where sites are already doing this. For example, when embedding a YouTube video, sites must use postMessage to communicate with the embedded player
16:28:57 ... There has been a natural convergence on these APIs in a non-standard way.
16:29:33 ... So this is an example of an existing situation where different sites/origins want to communicate actions to each other
16:30:09 ... It would be useful to reduce the semantics across these use cases to a common set.
16:30:19 ack me
16:30:53 Elad: We should not go with an API shape that makes everything work with existing sites; there are security implications to allowing messages to be sent cross-origin
16:31:18 youenn: We need to study and enumerate those security issues and provide mitigations if necessary
16:31:43 cpn: Can we hear from someone from a Media Session perspective?
16:32:18 ... Are we imagining a combined set of actions across media and presentation use cases?
16:33:07 eric_carlson: I can imagine a page wanting to provide both media actions and slide actions, so having separate actions for the two use cases would remove the possibility of confusion about which action to perform
16:33:25 ... and we have already added new actions to the Media Session API
16:33:43 youenn: Agreed, you may want to "play/pause" media within a slide in a presentation
16:33:56 ... The Media Session registry could handle that
16:34:07 jan-ivar: What does the "hangup" action do?
16:34:49 eric_carlson: It allows UAs to provide a "mute" or "hangup" action similar to the one a page would provide
16:35:33 jan-ivar: A conservative view would be that Media Session is narrowly about AV playback; however, "mute" and "hangup" are more about camera capture
16:35:51 ... Do people think we should reuse the "next track" and "previous track" actions to support page changes?
16:36:38 jernoble: Web authors have wanted to reuse the Media Session API to move between slides, so it seems reasonable to add actions for those cases
16:37:04 Elad: How do sites know what actions are supported across origins?
16:37:23 ... e.g., how do sites know whether they should send the "next track" or "next slide" action?
16:37:43 youenn: For WebRTC, the site might need to know what actions are registered.
16:38:13 ... Perhaps we need to provide that information through a new capture API
16:38:43 Elad: From the side of the site being captured, it's not confusing
16:39:12 ... but from the capturing side, it could be confusing which action should be sent
16:39:39 ... What happens when the user hits the "next" button on their keyboard?
16:41:14 jernoble: The UA knows which actions have been registered, so it can route the user input from hardware controls accordingly
16:41:49 ... You want the action to go to the frontmost; at least in one implementation, it goes to the currently playing browser tab
16:42:17 ... This is outside the spec; on iOS, only one thing can play audio at a time, so it would be the most recently played browser tab
16:42:40 ... On macOS, where multiple things can play audio, it would be the one that most recently started playing
16:43:38 dom: It seems to me we should try to figure out how to move forward with the broader discussion on whether application semantics can be exposed to the browser, and to sites
16:44:20 ... Part of the question is whether next/previous slide is something that could get traction, a question of feasibility: would sites implement it, and would browsers provide controls in their chrome?
16:44:40 ... For website-to-website control, there's a security framework question: can we delegate controls, and under what conditions?
16:44:51 ... How do we go about discussing this more deeply?
16:45:30 jan-ivar: If Media Session wanted to move closer to capture actions by using next/previous slide, there'd have to be a current capture session. I can open issues on Media Capture Session if that's a way forward
16:46:08 nigel has joined #mediawg
16:46:33 eric: Sounds good to me
16:47:05 chcunningham: I'll check with the Media Session team internally; the current editors have moved on, and I'll reach out to them to nominate a new editor
16:47:15 ... If others want to edit the spec, that would be welcome
16:48:01 cpn: Are we saying that controls within a page can influence actions on the captured page?
16:48:33 ... It's my understanding that the Media Session API is meant to allow the UA to control a page; does it fit with the design of Media Session to allow another page, rather than the UA, to send actions?
16:48:45 eric_carlson: It does make sense to me.
16:49:11 jan-ivar: There are security implications; perhaps "toggle mic" is not the best thing to expose cross-site
16:49:34 ... There's also another argument, that you could use Morse code (or similar) to communicate arbitrary data across sites
16:49:50 ... However, for capture, there's already a lot of information flowing from the captured page to the capturer
16:50:28 dom: It's not just security across the two sites; it's also about the impact on the end user. This will require analysis of the risks the end user will face.
16:50:57 jan-ivar: This is why remote control of a site is out of scope for WebRTC.
16:51:38 dom: The recipient's understanding is that the action is coming from the UA and not from another site
16:51:57 ... The expectations of the two may not match.
16:52:08 ... This may not be a real issue, but it does need analysis.
16:52:37 harald: If the event can come from multiple sources, the message should include enough information to tell the sources apart.
16:53:20 Elad: There are 3 levels: 1. knowing that this came from another origin, 2. knowing the origin that the message came from, and 3. knowing which user on that other origin issued the message.
16:54:12 Tim: I would like to refine that: above and beyond knowing that the message came from another site, we need to know that it came from a local user. How do we know that the event didn't originate outside the local machine, e.g. from another user on the call?
16:54:28 [shared control of slideset would actually be useful too]
16:54:46 ... We should be more precise about whether we can prove that the local user was the origin of the message
16:54:57 Elad: And a user gesture requirement does not guarantee intent
16:55:20 Tim: We do need careful thought about these potential security issues
16:55:46 Elad: That is why I think we need the remote site to adopt a specific API, as a form of caveat emptor
16:56:10 Tim: We need more in the origin than just the origin, if that makes sense.
16:56:40 jan-ivar: We have an existing issue on whether we should extend Media Session to support new actions
16:56:59 ... We would need a separate issue to track whether actions should be sent across origins.
16:57:18 cpn: Would we use the Media Session repo for these discussions?
16:58:04 jan-ivar: The questions raised are more for Media Session: whether the scope of Media Session should be expanded to allow actions to be sent from a page
16:58:31 Elad: What is the argument for using Media Session if we need specific adoption anyway?
16:58:52 ... Are the two APIs truly similar enough to justify a single API surface for both?
16:59:04 RRSAgent, draft minutes
16:59:04 I have made the request to generate https://www.w3.org/2022/04/25-mediawg-minutes.html tidoust
16:59:30 Harald: We have competing concerns: functional concerns about having the correct thing happen when you press a button, and security concerns as well.
16:59:36 present+ Tommy_Steimel
17:00:12 ACTION: capture these concerns and issues in the Media Session GitHub repo
17:00:28 ACTION: Chris to follow up internally about new editors for the Media Session specification itself.
17:00:39 cpn: What would be the timeline for this?
17:00:51 Harald: Two weeks would be good; four weeks at the maximum
17:01:10 cpn: Let's continue to work together across the two WGs.
17:02:19 RRSAgent, draft minutes
17:02:19 I have made the request to generate https://www.w3.org/2022/04/25-mediawg-minutes.html tidoust
17:17:17 nigel has joined #mediawg
17:35:42 nigel has joined #mediawg
17:54:24 nigel has joined #mediawg
18:11:17 nigel has joined #mediawg
18:40:55 nigel has joined #mediawg
19:07:33 nigel has joined #mediawg
19:39:11 Zakim has left #mediawg
19:44:30 nigel has joined #mediawg
20:42:07 nigel has joined #mediawg
22:43:42 nigel has joined #mediawg