15:53:46 <RRSAgent> RRSAgent has joined #mediawg
15:53:46 <RRSAgent> logging to https://www.w3.org/2022/04/25-mediawg-irc
15:53:49 <Zakim> Zakim has joined #mediawg
15:53:56 <tidoust> RRSAgent, make logs public
15:54:32 <tidoust> Meeting: WebRTC WG / Media WG Joint Meeting
15:55:04 <tidoust> Agenda: https://github.com/w3c/media-wg/blob/main/meetings/2022-04-25-WebRTC_Working_Group_Media_Working_Group_Joint_Meeting-agenda.md
16:01:05 <tidoust> present+ Francois_Daoust, Bernard_Aboba, Chris_Needham, Eric_Carlson, Jan-Ivar_Bruaroey, Youenn_Fablet
16:01:41 <dom> dom has joined #mediawg
16:02:17 <cpn> scribe+ cpn
16:03:37 <tidoust> present+ Jer_Noble
16:03:54 <dom> Present+
16:04:05 <tidoust> present+ Elad_Alon
16:04:56 <tidoust> present+ Greg_Freedman
16:05:11 <jib> jib has joined #mediawg
16:05:11 <jernoble> jernoble has joined #mediawg
16:05:22 <tidoust> Chair: Bernard Aboba, Jan-Ivar Bruaroey, Harald Alvestrand (WebRTC WG), Jer Noble, Chris Needham (Media WG)
16:05:44 <steely-glint> steely-glint has joined #mediawg
16:06:12 <eric_carlson> eric_carlson has joined #mediawg
16:06:22 <tidoust> present+ Harald_Alvestrand
16:06:36 <jernoble> Topic: Capture Handle Actions & Media Session Actions
16:06:42 <tidoust> scribe+ jernoble
16:07:09 <jernoble> jan-ivar: Two specifications, Capture Handle Identity & Capture Handle Actions
16:07:17 <tidoust> -> https://w3c.github.io/mediacapture-handle/identity/index.html Capture Handle Identity
16:07:17 <steely-glint> Can someone PM me the webex password - somehow my password manager has got into a fight with the w3c's auth - Thanks.
16:07:32 <tidoust> -> https://w3c.github.io/mediacapture-handle/actions/index.html Capture Handle Actions
16:07:54 <jernoble> ... When presenting via a WebRTC Call, if the presentation is in another browser window, the user may background the browser tab/view which is running the presentation
16:08:19 <jernoble> ... The use case would be an integrated solution, where video conferencing and presentation are going at the same time
16:08:49 <jernoble> ... short of that, it would be good to be able to control the presentation through the browser view which is doing the call
16:09:15 <jernoble> ... The goal is to standardize the actions which can be supported through presentation/capture pairs
16:10:26 <jernoble> youenn: One of the Media Session use cases, among others, is to control the page through e.g. a Picture-in-Picture window
16:10:30 <tidoust> present+ Chris_Cunningham
16:10:38 <jernoble> ... This seems very similar to what we are trying to achieve through Media Session actions.
16:10:54 <jernoble> ... Except in this case, the actions would be sent to the capturing page, not to the presentation page
16:11:24 <jernoble> ... One thing that is nice about Media Session Actions is that they are already in place and are supported
16:11:44 <jernoble> ... It would be nice if we could re-use the existing supported API for WebRTC, rather than come up with our own
16:12:09 <jernoble> ... If we were able to share or re-use that API, adoption would come for free.
16:13:05 <jernoble> ... On the other hand, if it's required to adopt a new, capture-only API, while WebRTC clients may support it, sites which the user is capturing (e.g. presentation sites) may not add support for those actions
16:13:34 <jernoble> jan-ivar: Media Session does appear to pave the way for useful actions, like "next track" and "previous track"; however, these do not work on Google Slides today.
16:13:52 <jernoble> ... Perhaps because these are more music or media related; is "next track" the same as "next slide"?
16:14:35 <jernoble> ... We considered whether these were in-process buttons provided by the page; however, would those instead be UI presented by the UA itself?
16:15:21 <jernoble> ... There are security concerns; for example, even with a user gesture (UG) requirement, it's the capturing page that is controlling the presentation page; there may be security implications to allowing these pages to communicate or control cross-site.
16:15:34 <jernoble> Elad: Things I've heard from web developers:
16:15:35 <dom> s/jean/jan/
16:15:49 <jernoble> ... 1. they don't really like UA provided controls; it clashes with their own UI
16:16:02 <jib> s/jean-yvar/jan-ivar/
16:16:03 <jernoble> ... And they can't provide a consistent UI across different browsers.
16:16:44 <tidoust> present+ Tim_Panton
16:16:45 <jernoble> ... 2. When a video capturing site captures Google Slides, there's a login pattern where they require the same account to be signed into both tabs
16:17:23 <jernoble> ... This is a pattern they require. It's not clear that, even if this API existed, sites would want arbitrary other sites to control their presentation
16:19:14 <jernoble> youenn: That would be something worth investigating with Google.
16:19:54 <jernoble> Elad: to make this work generally, the API has to provide more information; namely the origin that generated the message.
16:20:26 <jernoble> youenn: that's something that's already solvable through javascript
16:20:44 <jernoble> ... We're targeting the 80% case, and allowing JS to handle the remaining 20%
16:21:27 <cpn> jernoble: This sounds like something BroadcastChannel already provides
16:22:21 <cpn> elad: If you have multiple sessions with Google Slides, you don't want them all to respond
16:22:41 <cpn> ... so use Capture Handle Identity, and Capture Handle Actions, which let you talk directly to the thing you're capturing
16:23:30 <cpn> jan-ivar: the concern is that this only leads to siloing. Can we provide a baseline set of actions that needs minimal setup, the 80-20 case?
16:24:03 <cpn> harald: With media actions there's an interop concern with different device buttons and applications
16:24:55 <jernoble> Harald: The goal would be to allow a page written by a Google developer to control presentations written by Microsoft Office or vice versa
16:25:53 <jernoble> ... Something to consider is whether there should be a common registry of actions and models across different presentation types (e.g. Spotify vs. slideshows)
16:26:24 <jernoble> ... And the Media Session actions have a lot of metadata about those actions (speed, seeking to particular time)
16:26:39 <dom> q?
16:27:09 <dom> q+
16:27:50 <jernoble> jan-ivar: two options: we could have two APIs, where developers would have to opt into both with separate implementations, or there could be a single API that's driven by either hardware buttons or a capturing site
16:28:42 <jernoble> dom: I want to give other examples of where sites are using these APIs already. For example, when embedding a YouTube video, sites must use postMessage to communicate with the embedded player
16:28:57 <jernoble> ... There has been a natural convergence on these APIs in a non-standard way.
16:29:33 <jernoble> ... So this is an example of an existing situation where different sites/origins want to communicate actions to each other
16:30:09 <jernoble> ... It would be useful to reduce the semantics across these use cases to a common set.
16:30:19 <dom> ack me
16:30:53 <jernoble> Elad: We should not go with an API shape that makes everything work with existing sites; there are security implications to allowing sending messages cross-origin
16:31:18 <jernoble> youenn: we need to study and enumerate those security issues and provide mitigations if necessary
16:31:43 <jernoble> cpn: Can we hear from someone from a Media Session perspective?
16:32:18 <jernoble> ... Are we imagining a combined set of actions between media and presentation use cases?
16:33:07 <jernoble> eric_carlson: I can imagine a page wanting to provide both media actions and slide actions; so having separate actions for the two use cases would remove the possibility of confusion about which action to perform
16:33:25 <jernoble> ... and we have already added new actions to the Media Session API
16:33:43 <jernoble> youenn: Agreed, you may want to "play/pause" media within a slide in a presentation
16:33:56 <jernoble> ... The Media Session registry could handle that
16:34:07 <jernoble> jan-ivar: What does the "hangup" action do?
16:34:49 <jernoble> eric_carlson: It allows UAs to provide a "mute" or "hangup" action similar to the one a page would provide
16:35:33 <jernoble> jan-ivar: A conservative view would be that Media Session is narrowly about AV playback; however, "mute" and "hangup" are more about camera capture
16:35:51 <jernoble> ... would people think we should re-use "next track" and "previous track" actions to support page changes?
16:36:38 <cpn> jernoble: web authors have wanted to reuse the media session API to move between slides, so seems reasonable to add actions for those cases
16:37:04 <jernoble> Elad: How do sites know what actions are supported across origins?
16:37:23 <jernoble> ... e.g., how do sites know whether they should send the 'next track' or 'next slide' action?
16:37:43 <jernoble> youenn: for WebRTC, the site might need to know what actions are registered.
16:38:13 <jernoble> ... Perhaps we need to provide that information through a new capture API
16:38:43 <jernoble> Elad: from the side of the site being captured, it's not confusing
16:39:12 <jernoble> ... but from the capturing side, it could get confusing about which action should be sent
16:39:39 <jernoble> ... what happens when the user hits the "next" button on their keyboard?
16:41:14 <cpn> jernoble: The UA knows which actions have been registered so can route the user input from hardware controls accordingly
16:41:49 <cpn> ... You want the action to go to the frontmost; at least in one implementation, it goes to the currently playing browser tab
16:42:17 <cpn> ... This is outside the spec. On iOS, only one thing can play audio at a time, so it would be the most recently played browser tab
16:42:40 <cpn> ... For macOS, where multiple things can play audio, it would be the one that most recently started playing
16:43:38 <cpn> dom: it seems to me we should try to figure out how to move forward with the broader discussion on whether application semantics can be exposed to the browser, and to sites
16:44:20 <cpn> ... part of the question is: is next/previous slide something that could get traction? It's a question of feasibility: would sites implement it, and would browsers provide controls in their chrome?
16:44:40 <cpn> ... For website to website, there's a security framework question, can we delegate controls and under what conditions?
16:44:51 <cpn> ... How to go about discussing more deeply?
16:45:30 <cpn> jan-ivar: if media session wanted to move closer to capture actions, by using next/prev slide there'd have to be a current capture session. I can open issues on Media Capture Session if that's a way forward
16:46:08 <nigel> nigel has joined #mediawg
16:46:33 <cpn> eric: sounds good to me
16:47:05 <cpn> chcunningham: I'll check with the Media Session team internally here; the current editors have moved on, and I'll reach out to them to nominate a new editor
16:47:15 <cpn> ... if others want to edit the spec, that would be welcome
16:48:01 <jernoble> cpn: Are we seeing that control within a page can influence actions on the captured page?
16:48:33 <jernoble> ... it's my understanding that media session API is to allow the UA to control a page; does this fit with the design of Media Session to allow another page to send actions rather than the UA?
16:48:45 <jernoble> eric_carlson: It does make sense for me.
16:49:11 <jernoble> jan-ivar: there are security implications; perhaps "toggle mic" is not the best thing to expose cross site
16:49:34 <jernoble> ... there's also another argument that you can use Morse code (or similar) to communicate arbitrary data across
16:49:50 <jernoble> ... however, for capture, there's already a lot of information flowing from the captured page to the capturer
16:50:28 <jernoble> dom: It's not just security across the two sites; it's also about the impact to the end user. This will require analysis of the risks the end user will face.
16:50:57 <jernoble> jan-ivar: This is why remote control of a site is out of scope for WebRTC.
16:51:38 <jernoble> dom: It is the recipient's understanding that the action is coming from the UA and not another site
16:51:57 <jernoble> ... the expectations of the two may not match.
16:52:08 <jernoble> ... this may not be a real issue, but it does need analysis.
16:52:37 <jernoble> harald: if the event can come from multiple sources, the message should include enough information to tell the difference between the sources.
16:53:20 <jernoble> Elad: There are 3 levels: 1. knowing that this came from another origin, 2. knowing the origin that the message came from, and 3. knowing which user on that other origin issued the message.
16:54:12 <jernoble> Tim: I would like to refine that: above and beyond knowing that the message came from another site, we need to know that it came from a local user. How do we know that the event didn't originate outside the local machine, like from another user on the call?
16:54:28 <dom> [shared control of slideset would actually be useful too]
16:54:46 <jernoble> ... We should be more distinct about whether we can prove that the local user was the origin of the message
16:54:57 <jernoble> Elad: And a user gesture requirement does not guarantee the intent
16:55:20 <jernoble> Tim: We do need careful thought about these potential security issues
16:55:46 <jernoble> Elad: That is why I think we need the remote site to adopt a specific API, as a caveat-emptor
16:56:10 <jernoble> Tim: We need more in the origin than just the origin, if that makes sense.
16:56:40 <jernoble> jan-ivar: We have an existing issue on whether we should extend Media Session to support new actions
16:56:59 <jernoble> ... We would need a separate issue to track whether actions should be sent across origins.
16:57:18 <jernoble> cpn: Would we use the Media Session repo for these discussions?
16:58:04 <jernoble> jan-ivar: The questions raised are more for Media Session; to consider whether the scope of Media Session should be expanded to send actions from a page
16:58:31 <jernoble> Elad: What is the argument for using Media Session if we need specific adoption?
16:58:52 <jernoble> ... are the two APIs truly similar enough to justify only a single API surface for both?
16:59:04 <tidoust> RRSAgent, draft minutes
16:59:04 <RRSAgent> I have made the request to generate https://www.w3.org/2022/04/25-mediawg-minutes.html tidoust
16:59:30 <jernoble> Harald: we have competing concerns: both functional concerns about having the correct thing happen when you press a button, and security concerns as well.
16:59:36 <tidoust> present+ Tommy_Steimel
17:00:12 <jernoble> ACTION: capture these concerns and issues in the Media Session github
17:00:28 <jernoble> ACTION: Chris to follow up internally about new editors for the Media Session specification itself.
17:00:39 <jernoble> cpn: what would be the timeline for this?
17:00:51 <jernoble> Harald: two weeks would be good; four weeks at the maximum
17:01:10 <jernoble> cpn: let's continue to work together across the two WGs.
17:02:19 <tidoust> RRSAgent, draft minutes
17:02:19 <RRSAgent> I have made the request to generate https://www.w3.org/2022/04/25-mediawg-minutes.html tidoust