Capture Handle - Bootstrapping Collaboration when Screensharing

Abstract

This document proposes a mechanism by which an application APP can opt in to exposing certain information to another application CAPTR, if CAPTR is screen-capturing the tab in which APP is running. It describes a mechanism for tab capture only.

Consider a web-application, running in one tab, which we’ll name "main_app." Assume main_app calls getDisplayMedia and the user chooses to share another tab, where an application is running which we’ll call "captured_app."

Note that:

main_app does not know what it is capturing.
captured_app does not know that it is being captured, let alone by whom.

Both of these traits are desirable in the general case, but there are legitimate use cases where the browser would want to allow applications to opt in to bridging that gap and enabling a connection.

We wish to enable the legitimate use cases while keeping the general case as it was before.

Consider presentation software and video-conferencing (VC) software that collaborate with each other. Assume the user is in a VC session and starts sharing a presentation. Both applications are interested in letting the VC app discover that it is capturing a slides session, which application is hosting it, and even which session it is, so that the VC application can expose controls to the user for flipping through slides. When the user clicks those controls, the VC app can send messages to the presentation app, requesting that it do such things as flip through slides, enter/leave presentation-mode, etc.

The means for transmitting these messages are outside the scope of this document. Some options are:

Shared cloud infrastructure.
Messaging via a worker. (Note: Storage Partitioning might disrupt this option.)
A rudimentary messaging API might be added expressly for this purpose.

Capturing applications often wish to gather statistics on which applications their users tend to capture. For example, VC applications would like to know how often their users share presentation applications from specific providers, Wikipedia, CNN, etc. Such information can be used to improve service for the users by introducing new collaborations, such as the one described above.

Users sometimes choose to share the wrong tab. Sometimes they switch to sharing the wrong tab by clicking the share-this-tab-instead button by mistake. A benevolent application could try to protect the user by presenting an in-app dialog for re-confirmation, if it believes that the user may have made a mistake.

This use-case is a sub-case of #3, but deserves its own section due to its importance. The "Hall of Mirrors" effect occurs when users choose to share the tab in which the VC call takes place. When detecting self-capture, a VC application can avoid displaying the captured stream back to the user, thereby avoiding the dreaded effect.
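One way a VC application might detect self-capture is to publish a handle that only it knows and compare it with the handle observed on the captured track. The following is a minimal sketch under that assumption; the helper name isSelfCapture and the ObservedHandle type are hypothetical and are not part of this proposal.

```typescript
// Hypothetical local mirror of what a capturer observes; it follows the
// CaptureHandle dictionary defined later in this document.
interface ObservedHandle {
  origin?: string;
  handle: string;
}

// Returns true when the observed handle matches the handle this application
// published for its own tab. `ownHandle` is assumed to be a hard-to-guess
// value (for example a random ID) set through setCaptureHandleConfig.
function isSelfCapture(ownHandle: string, observed: ObservedHandle | null): boolean {
  if (observed === null || ownHandle === "") {
    // Nothing observable, or nothing published: self-capture cannot be concluded.
    return false;
  }
  return observed.handle === ownHandle;
}
```

When this returns true, the application could skip rendering the captured stream locally.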

Applications are allowed to expose information to capturing applications. They would typically do so before knowing if they even are captured. The mechanism used is calling setCaptureHandleConfig with an appropriate CaptureHandleConfig.

The CaptureHandleConfig dictionary is used to instruct the user agent what information the captured application intends to expose, and to which applications it is willing to expose said information.

dictionary CaptureHandleConfig {
  boolean exposeOrigin = false;
  DOMString handle = "";
  sequence<DOMString> permittedOrigins = [];
};

exposeOrigin

If true, the user agent MUST expose the captured application's origin through the origin field of CaptureHandle. If false, the user agent MUST NOT expose the captured application's origin.

handle

The user agent MUST expose this value as handle.

Note: Values of this field are limited to 1024 16-bit characters. This limitation is specified further in setCaptureHandleConfig.

permittedOrigins

Valid values of this field include:

The empty list.
A list with the single item "*"
A list consisting of valid origins.

If permittedOrigins consists of the single item "*", then the CaptureHandle is observable by all capturers. Otherwise, the CaptureHandle is observable only by capturers whose origin is listed in permittedOrigins.
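The observability rule for permittedOrigins can be sketched as a small predicate. This is an illustration only: HandleConfig is a local mirror of the CaptureHandleConfig dictionary with its defaults applied, and isObservableBy is a hypothetical helper name, not an API defined by this document.

```typescript
// Local mirror of the CaptureHandleConfig dictionary, defaults already applied.
interface HandleConfig {
  exposeOrigin: boolean;
  handle: string;
  permittedOrigins: string[];
}

// Returns true if a capturer with the given origin may observe the CaptureHandle.
function isObservableBy(config: HandleConfig, capturerOrigin: string): boolean {
  const origins = config.permittedOrigins;
  if (origins.length === 1 && origins[0] === "*") {
    return true; // Wildcard: observable by all capturers.
  }
  // Otherwise only listed origins qualify; the empty list permits nobody.
  return origins.indexOf(capturerOrigin) !== -1;
}
```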

MediaDevices is extended with a method - setCaptureHandleConfig - which accepts a CaptureHandleConfig object. By calling this method, an application informs the user agent which information it permits capturing applications to observe.

Note

There is no consensus yet on how setCaptureHandleConfig should behave if called more than once, due to concerns over it being misused as a cross-origin messaging channel itself. This is under discussion in issue #11.

partial interface MediaDevices {
  undefined setCaptureHandleConfig(optional CaptureHandleConfig config = {});
};
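A captured application might publish its handle roughly as follows. This is a sketch rather than a reference implementation: HandleCapableMediaDevices is declared locally because the method may be missing from standard TypeScript DOM typings, and publishSessionHandle, the session identifier, and the trusted origin are all illustrative assumptions.

```typescript
// Local mirror of the CaptureHandleConfig dictionary (all members optional).
interface HandleConfig {
  exposeOrigin?: boolean;
  handle?: string;
  permittedOrigins?: string[];
}

// Minimal, locally declared view of MediaDevices with the proposed method.
interface HandleCapableMediaDevices {
  setCaptureHandleConfig(config?: HandleConfig): void;
}

// Publishes a session identifier to a single trusted capturer origin.
// Returns false if the method is unavailable or the user agent rejects the config.
function publishSessionHandle(
  mediaDevices: Partial<HandleCapableMediaDevices>,
  sessionId: string,
  trustedOrigin: string
): boolean {
  if (typeof mediaDevices.setCaptureHandleConfig !== "function") {
    return false; // Feature not available; the application keeps working without it.
  }
  try {
    mediaDevices.setCaptureHandleConfig({
      exposeOrigin: true,
      handle: sessionId,
      permittedOrigins: [trustedOrigin],
    });
    return true;
  } catch {
    return false; // For example, an invalid handle or a non-top-level context.
  }
}
```

In a page, the first argument would be navigator.mediaDevices, cast to the local interface.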

setCaptureHandleConfig

The user agent MUST run the following validations:

If handle is set to an invalid value, the user agent MUST reject by raising TypeError.
If permittedOrigins is set to an invalid value, the user agent MUST reject by raising NotSupportedError.
If the call to setCaptureHandleConfig() is not from the top-level browsing context, the user agent MUST reject by raising InvalidStateError.

If all validations passed, the user agent MUST accept the new config. The user agent MUST forget any previous call to setCaptureHandleConfig; from now on, the application's CaptureHandleConfig is config.
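The validations above can be sketched as a pure function that returns the name of the error to raise. Several details are assumptions made for illustration: an "invalid" handle is taken to mean one longer than 1024 16-bit code units, a "valid origin" is taken to mean a string that parses as a URL and equals its own serialized origin, and the checks run in the order listed. The names validateConfig and isValidOrigin are hypothetical.

```typescript
// Local mirror of the CaptureHandleConfig dictionary (all members optional).
interface HandleConfig {
  exposeOrigin?: boolean;
  handle?: string;
  permittedOrigins?: string[];
}

type ValidationError = "TypeError" | "NotSupportedError" | "InvalidStateError";

const MAX_HANDLE_LENGTH = 1024; // Counted in 16-bit code units, like String.length.

// Assumption: a valid origin round-trips through URL.origin unchanged,
// e.g. "https://example.com" but not "https://example.com/".
function isValidOrigin(value: string): boolean {
  try {
    return new URL(value).origin === value;
  } catch {
    return false; // Not parseable as a URL at all.
  }
}

// Returns the name of the error to raise, or null if the config is accepted.
function validateConfig(config: HandleConfig, isTopLevel: boolean): ValidationError | null {
  const handle = config.handle === undefined ? "" : config.handle;
  const origins = config.permittedOrigins === undefined ? [] : config.permittedOrigins;

  if (handle.length > MAX_HANDLE_LENGTH) {
    return "TypeError";
  }
  const isWildcard = origins.length === 1 && origins[0] === "*";
  if (!isWildcard && !origins.every(isValidOrigin)) {
    return "NotSupportedError"; // Includes "*" mixed with other entries.
  }
  if (!isTopLevel) {
    return "InvalidStateError";
  }
  return null;
}
```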

The observable CaptureHandle is re-evaluated for all capturing applications.

For every capturing application for which the new observable CaptureHandle is different than prior to the call to setCaptureHandleConfig, the user agent MUST fire an event named capturehandlechange.
The user agent MUST report the new observable CaptureHandle whenever getCaptureHandle is called.
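The re-evaluation step amounts to comparing, for each capturer, the handle it could observe before the call with the handle it can observe afterwards. The sketch below illustrates only that comparison; CapturerView, sameObservedHandle, and capturersToNotify are hypothetical names, and how a user agent tracks its capturers is not specified here.

```typescript
// Local mirror of the CaptureHandle dictionary.
interface ObservedHandle {
  origin?: string;
  handle: string;
}

// One capturer's view before and after a call to setCaptureHandleConfig.
// null means that the capturer could not observe any CaptureHandle.
interface CapturerView {
  capturerOrigin: string;
  before: ObservedHandle | null;
  after: ObservedHandle | null;
}

function sameObservedHandle(a: ObservedHandle | null, b: ObservedHandle | null): boolean {
  if (a === null || b === null) {
    return a === b; // Equal only if both are null.
  }
  return a.origin === b.origin && a.handle === b.handle;
}

// Returns the origins of the capturers that should receive capturehandlechange.
function capturersToNotify(views: CapturerView[]): string[] {
  return views
    .filter((view) => !sameObservedHandle(view.before, view.after))
    .map((view) => view.capturerOrigin);
}
```

A capturer whose observable handle is unchanged, for instance one that was not permitted before and is still not permitted, receives no event.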

Capturing applications which are permitted to observe a track's CaptureHandle have two ways of reading it.

Reading the current value returned by getCaptureHandle.
Registering an EventListener at oncapturehandlechange.
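A capturing application could combine both approaches: read the current value once, then re-read it whenever the change event fires. The sketch below assumes a locally declared HandleAwareTrack interface, since these members may be missing from standard TypeScript DOM typings; watchCaptureHandle is a hypothetical helper name.

```typescript
// Local mirror of the CaptureHandle dictionary.
interface ObservedHandle {
  origin?: string;
  handle: string;
}

// Minimal, locally declared view of a MediaStreamTrack with the proposed additions.
interface HandleAwareTrack {
  getCaptureHandle(): ObservedHandle | null;
  oncapturehandlechange: (() => void) | null;
}

// Invokes onHandle with the current handle immediately, then again after each change.
function watchCaptureHandle(
  track: HandleAwareTrack,
  onHandle: (handle: ObservedHandle | null) => void
): void {
  onHandle(track.getCaptureHandle());
  track.oncapturehandlechange = () => {
    // The event only signals a change; the new value is read from the track.
    onHandle(track.getCaptureHandle());
  };
}
```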

The user agent exposes information about the captured application to the capturing application through the CaptureHandle dictionary. Note that a CaptureHandle object MUST NOT be given to a capturing application that is not permitted to observe it.

dictionary CaptureHandle {
  DOMString origin;
  DOMString handle;
};

origin: If the captured application opted in to exposing its origin (by setting exposeOrigin to true), then the user agent MUST set origin to the origin of the captured application. Otherwise, origin is not set.
handle: The user agent MUST set this field to the value which the captured application set in handle.

MediaStreamTrack is extended with a method called getCaptureHandle. When the MediaStreamTrack is a video track derived from screen capture, getCaptureHandle returns the latest observable CaptureHandle. Otherwise it returns null.

Note

There is no consensus yet on whether getCaptureHandle belongs on MediaStreamTrack or on a dedicated controller object that is neither clonable nor transferable, to separate messaging affecting all tracks from consumption of a single track. This is under discussion in issue #12.

partial interface MediaStreamTrack {
  CaptureHandle? getCaptureHandle();
};

getCaptureHandle

If the track in question is not a video track, or does not represent a browser display surface, then the user agent MUST return null.

If the track is ended, then the user agent MUST return null.

If the captured application did not set a CaptureHandleConfig, or if it most recently set the empty CaptureHandleConfig, then the user agent MUST return null.

The user agent MUST compare the origin of the capturing document to those which the captured application listed in permittedOrigins. If the capturing origin is not permitted to observe the CaptureHandle, then the user agent MUST return null.

If all previous validations passed, then the user agent MUST return a CaptureHandle dictionary with the values derived from the last CaptureHandleConfig set by the captured application.
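The steps above can be sketched as a single function over a snapshot of the relevant state. This is an illustration under stated assumptions: TrackState is a hypothetical structure, not something a user agent is required to keep, and "the empty CaptureHandleConfig" is interpreted here as a config in which every member has its default value.

```typescript
// Local mirrors of the CaptureHandleConfig and CaptureHandle dictionaries.
interface HandleConfig {
  exposeOrigin: boolean;
  handle: string;
  permittedOrigins: string[];
}
interface ObservedHandle {
  origin?: string;
  handle: string;
}

// Hypothetical snapshot of the state consulted by getCaptureHandle.
interface TrackState {
  kind: "audio" | "video";
  isBrowserDisplaySurface: boolean;
  ended: boolean;
  capturerOrigin: string;
  capturedOrigin: string;
  capturedConfig: HandleConfig | null; // null: the captured app never set a config.
}

// Assumption: "empty" means all members equal their dictionary defaults.
function isEmptyConfig(config: HandleConfig): boolean {
  return !config.exposeOrigin && config.handle === "" && config.permittedOrigins.length === 0;
}

function getCaptureHandleSketch(state: TrackState): ObservedHandle | null {
  if (state.kind !== "video" || !state.isBrowserDisplaySurface) return null;
  if (state.ended) return null;

  const config = state.capturedConfig;
  if (config === null || isEmptyConfig(config)) return null;

  const origins = config.permittedOrigins;
  const permitted =
    (origins.length === 1 && origins[0] === "*") ||
    origins.indexOf(state.capturerOrigin) !== -1;
  if (!permitted) return null;

  const result: ObservedHandle = { handle: config.handle };
  if (config.exposeOrigin) {
    result.origin = state.capturedOrigin; // Only set when the captured app opted in.
  }
  return result;
}
```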

Whenever the observable CaptureHandle for a given capturing application changes, the user agent fires an event named capturehandlechange. This can happen in the following cases:

The captured application calls setCaptureHandleConfig() with a new CaptureHandleConfig. (Note that the new CaptureHandleConfig might or might not cause the observable CaptureHandle to change, e.g. if changing permittedOrigins.)
The captured application's top-level browsing context is navigated cross-document.
The user agent switches the track to follow a new application.

Events are not fired when the track ends, nor after it ends.

MediaStreamTrack is extended with an EventHandler attribute called oncapturehandlechange.

partial interface MediaStreamTrack {
  attribute EventHandler oncapturehandlechange;
};

oncapturehandlechange: EventHandler for events named capturehandlechange.

Capture Handle - Bootstrapping Collaboration when Screensharing

Abstract

Status of This Document

1. Conformance

2. Problem Description

2.1 Generic Problem Description

2.2 Use-case #1: Driving Presentations from Video Conferencing Apps

2.3 Use-case #2: Analytics

2.4 Use-case #3: Detecting Unintended Captures

2.5 Use-case #4: Avoiding "Hall of Mirrors"

3. The Capture-Handle Mechanism

4. Captured Side

4.1 `CaptureHandleConfig`

4.2 MediaDevices.setCaptureHandleConfig()

5. Capturing Side

5.1 `CaptureHandle`

5.2 MediaStreamTrack.getCaptureHandle()

5.3 On-Change Event

5.3.1 capturehandlechange

5.3.2 oncapturehandlechange

A. References

A.1 Normative references

A.2 Informative references
