Screen Capture

Abstract

This document defines how a user's display, or parts thereof, can be used as the source of a media stream using getDisplayMedia, an extension to the Media Capture API [GETUSERMEDIA].

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is not complete. It is subject to major changes and, while early experimentation is encouraged, it is not intended for implementation.

This document was published by the Device and Sensors Working Group and the Web Real-Time Communications Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-media-capture@w3.org (subscribe, archives). All comments are welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures (Device and Sensors Working Group) and a public list of any patent disclosures (Web Real-Time Communications Working Group) made in connection with the deliverables of each group; these pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

4. Terminology

This document uses the definition of NavigatorUserMedia, MediaStreamTrack, and ConstrainablePattern from [GETUSERMEDIA].

Screen capture encompasses the capture of several different types of screen-based surfaces. Collectively, these are referred to as display surfaces, of which this document defines the following types:

A monitor display surface represents a physical display. Some systems have multiple monitors, which can be identified separately. Multiple monitors might also be aggregated into a single logical monitor. An aggregated display surface is captured as a single MediaStreamTrack.
A window display surface is a single contiguous surface that is used by a single application.
A single application might have several windows available to it, and those can be aggregated into a single application surface, representing all the windows available to that application and therefore presented as a single MediaStreamTrack.
A browser display surface is the rendered form of a single document. This is not strictly limited to HTML [HTML5] documents, though the discussion in this document will address some specific concerns with the capture of HTML.

This document draws a distinction between two variants of each type of display surface:

A logical display surface is the surface that an operating system makes available to an application for the purposes of rendering.
A visible display surface is the portion of a logical display surface that is rendered to a monitor.

Some operating systems permit windows from different applications to occlude other windows, in whole or in part, so the visible display surface can be a strict subset of the logical display surface.

5. Capturing Displayed Media

Capture of displayed media is enabled through the addition of a new getDisplayMedia method on the NavigatorUserMedia interface, which is similar to getUserMedia [GETUSERMEDIA].

getDisplayMedia supports all constraints that are defined for use with getUserMedia. However, it does not use these constraints to select a specific source or to narrow the set of options for source selection (see 5.2 Constraining Display Surface Selection). Two new constraints allow an application to observe what type of information was acquired.

5.1 `NavigatorUserMedia` Additions

partial interface NavigatorUserMedia {
    Promise<MediaStream> getDisplayMedia (MediaStreamConstraints constraints);
};

5.1.1 Methods

getDisplayMedia

This method operates identically to getUserMedia, except that it acquires media from output devices.

Parameter	Type	Nullable	Optional	Description
constraints	`MediaStreamConstraints`	✘	✘

Return type: Promise<MediaStream>
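As a rough illustration of how a page might call the method defined above, the following TypeScript sketch requests a display capture and returns the video tracks of the resulting stream. The `DisplayCapableNavigator`, `StreamLike`, and `TrackLike` interfaces and the `requestDisplayTracks` helper are hypothetical names introduced only so that the example stands alone; in a user agent implementing this draft, the object exposing NavigatorUserMedia would be expected to play the first role.

```typescript
// Minimal structural types (hypothetical names) so the sketch is self-contained.
interface TrackLike {
  kind: string;
}
interface StreamLike {
  getVideoTracks(): TrackLike[];
}
interface DisplayCapableNavigator {
  // Mirrors the partial interface in 5.1: one required constraints argument.
  getDisplayMedia(constraints: {
    video?: boolean | object;
    audio?: boolean | object;
  }): Promise<StreamLike>;
}

// Ask the user agent for a display capture. The user, not the page,
// chooses which display surface is shared.
async function requestDisplayTracks(
  nav: DisplayCapableNavigator,
): Promise<TrackLike[]> {
  const stream = await nav.getDisplayMedia({ video: true });
  return stream.getVideoTracks();
}
```

Because the method is described as operating identically to getUserMedia, a caller would presumably handle a rejected promise in the same way, for example when the user declines to share anything.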

5.2 Constraining Display Surface Selection

The getDisplayMedia function permits the use of constraints in the same way that constraints are defined for getUserMedia. However, these constraints MUST NOT constrain the selection of choices that are presented to users in the same way.

Once a display surface has been selected, constraints apply to the MediaStreamTrack instances that are returned. This allows for changes such as adjustments to frame rate or resolution, or other constraints.
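The following sketch shows one way an application might adjust frame rate and resolution once a surface has been selected. It relies on the `applyConstraints` method that [GETUSERMEDIA] defines for MediaStreamTrack; the `ConstrainableTrack` interface and the `limitCaptureQuality` helper are hypothetical names used only to keep the example self-contained.

```typescript
// Hypothetical structural type: only the one MediaStreamTrack method used here.
interface ConstrainableTrack {
  applyConstraints(constraints: {
    frameRate?: { max: number };
    width?: { max: number };
    height?: { max: number };
  }): Promise<void>;
}

// Cap frame rate and resolution of an already-selected display surface.
// This adjusts the returned track; it cannot change which surface was chosen.
async function limitCaptureQuality(
  track: ConstrainableTrack,
  maxFrameRate: number,
  maxWidth: number,
  maxHeight: number,
): Promise<void> {
  await track.applyConstraints({
    frameRate: { max: maxFrameRate },
    width: { max: maxWidth },
    height: { max: maxHeight },
  });
}
```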

5.2.1 New Constraints for Captured Display Surfaces

Two constraints are defined that allow an application to observe properties of the selected display surface. Since the source of media cannot be changed after a MediaStreamTrack has been returned, these constraints cannot be changed by an application.

The displaySurface constraint allows an application to observe the type of display surface that is being captured.

The logicalSurface constraint allows an application to express a preference or requirement to capture the logical display surface, rather than the visible display surface.

partial dictionary MediaTrackConstraintSet {
             ConstrainDOMString displaySurface;
             ConstrainBoolean   logicalSurface;
};

5.2.1.1 Dictionary `MediaTrackConstraintSet` Members

displaySurface of type ConstrainDOMString: The type of display surface that is being captured. This assumes values from the DisplayCaptureSurfaceType enumeration.
logicalSurface of type ConstrainBoolean: A value of true indicates capture of a logical display surface; a value of false indicates capture of a visible display surface.

5.2.1.2 `DisplayCaptureSurfaceType`

The DisplayCaptureSurfaceType enumeration describes the different types of display surface.

enum DisplayCaptureSurfaceType {
    "monitor",
    "window",
    "application",
    "browser"
};

Enumeration description
`monitor`	a monitor display surface, physical display, or collection of physical displays
`window`	a window display surface, or single application window
`application`	an application display surface, or entire collection of windows for an application
`browser`	a browser display surface, or single browser window
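As an illustration of how an application might observe the two properties described above, the sketch below turns a settings object into a short description. It assumes that an implementation reports `displaySurface` and `logicalSurface` through the track's `getSettings()` result, which this document does not state explicitly; `SurfaceSettings`, `isSurfaceType`, and `describeCapture` are hypothetical names.

```typescript
// The four values of the DisplayCaptureSurfaceType enumeration.
type SurfaceType = "monitor" | "window" | "application" | "browser";

// Hypothetical shape of the settings an application might read back from a track.
interface SurfaceSettings {
  displaySurface?: string;
  logicalSurface?: boolean;
}

const SURFACE_TYPES: string[] = ["monitor", "window", "application", "browser"];

// Narrow an arbitrary string to one of the enumeration values.
function isSurfaceType(value: string | undefined): value is SurfaceType {
  return value !== undefined && SURFACE_TYPES.indexOf(value) !== -1;
}

// Produce a short human-readable summary of what is being captured.
function describeCapture(settings: SurfaceSettings): string {
  if (!isSurfaceType(settings.displaySurface)) {
    return "unknown display surface";
  }
  let variant = "";
  if (settings.logicalSurface === true) variant = "logical ";
  if (settings.logicalSurface === false) variant = "visible ";
  return variant + settings.displaySurface + " display surface";
}
```

A summary of this kind could be shown next to a preview of the capture so that the sender can confirm what was selected.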

5.3 Device Identifiers

Each potential source of capture is treated by this API as a discrete media source. However, display capture sources MUST NOT be enumerated by enumerateDevices, since this would reveal too much information about the host system.

Display capture sources MUST NOT be selected with the deviceId constraint, since this would allow applications to influence selection. A display capture source is represented in the MediaStreamTrack API as having a deviceId parameter that is randomized each time a MediaStreamTrack is connected. The only other constraint on this value is that it cannot duplicate any existing values for deviceId.
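The deviceId rule above can be sketched from the user agent's side as follows. This is only an illustration: `freshDeviceId` is a hypothetical helper, and `randomId` stands in for whatever random identifier generator an implementation actually uses.

```typescript
// Sketch of how a user agent might pick a deviceId for a newly connected
// display capture track. `randomId` is a hypothetical, injected source of
// random identifiers; a real user agent would use its own generator.
function freshDeviceId(
  existingIds: Set<string>,
  randomId: () => string,
): string {
  let candidate = randomId();
  // The only stated constraint: never duplicate an existing deviceId.
  while (existingIds.has(candidate)) {
    candidate = randomId();
  }
  // Record the new value so later calls cannot hand it out again.
  existingIds.add(candidate);
  return candidate;
}
```

Calling the helper each time a MediaStreamTrack is connected yields a new identifier per connection, so the value cannot be used to recognize the same display surface across captures.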

6. Security and Permissions

This section is informative; however, it notes some serious risks to platform security if the advice it contains is not adhered to.

Issue 1

This is consistent with other documents, but the absence of strong normative language here is a little worrying.

The risks to user privacy and security posed by capture of displayed content are twofold. The immediate and obvious risk is that users inadvertently share content that they did not wish to share, or might not have realized would be shared.

Display capture presents a less obvious risk to the cross-site request forgery protections offered by the browser sandbox. Display and capture of information that is also under the control of an application, even indirectly, can allow that application to access information that would otherwise be inaccessible to it directly. For example, the canvas API does not permit sampling of a canvas, or conversion to an accessible form, if it is not origin-clean [2DCONTEXT].

This issue is discussed in further detail in [RTCWEB-SECURITY-ARCH] and [RTCWEB-SECURITY].

Display capture that includes browser windows, particularly those that are under any form of control by the application, risks violation of these basic security protections. This risk is not entirely confined to browser windows, since control channels might exist between browser applications and other applications, depending on the operating system. The key consideration is whether the captured display surface could be somehow induced to present information that would otherwise be secret from the application that is receiving the resulting media.

6.1 Capturing Logical or Visible Display Surfaces

Capture of logical display surfaces creates the potential for content to be shared without the user being aware of it. A logical display surface might render information that a user did not intend to expose. This can be more easily recognized if the information is visible. Such means are likely ineffectual against a machine, but a human recipient is less able to process content that appears only briefly.

Information that is not currently rendered to the screen SHOULD be obscured in captures unless the application has been specifically authorized to access that content (this might require elevated permissions).

How obscured areas of the logical display surface are captured to produce a visible display surface capture MAY vary. Some applications, like presentation software, benefit from having obscured portions of the screen render the image that appeared prior to being obscured. Freezing images can cause visual artifacts for changing content, or hide the fact that content is being obscured. Note that frozen portions of a capture can be incorrectly perceived as a bug. Alternatively, obscured areas might be replaced with content that marks them as being obscured, such as a grey color or hatching.
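The last option, replacing obscured areas with a marker color, might look roughly like the sketch below. Everything here is an assumption made for illustration: the frame is modeled as one 8-bit greyscale sample per pixel in row-major order, occluded regions are given as integer pixel rectangles, and `OccludedRect` and `maskObscuredAreas` are hypothetical names. A real user agent would operate on its own frame format.

```typescript
// Hypothetical rectangle type, in integer pixel coordinates of the captured surface.
interface OccludedRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Replace every pixel covered by an occluding rectangle with a grey value.
// `frame` holds one 8-bit greyscale sample per pixel in row-major order
// (an assumption made to keep the sketch short).
function maskObscuredAreas(
  frame: Uint8Array,
  frameWidth: number,
  frameHeight: number,
  occluded: OccludedRect[],
  grey: number = 128,
): Uint8Array {
  const masked = new Uint8Array(frame); // copy; the input frame is left untouched
  for (const rect of occluded) {
    // Clamp the rectangle to the frame so out-of-range input cannot
    // write outside the buffer or wrap into neighbouring rows.
    const x0 = Math.max(0, rect.x);
    const y0 = Math.max(0, rect.y);
    const x1 = Math.min(frameWidth, rect.x + rect.width);
    const y1 = Math.min(frameHeight, rect.y + rect.height);
    for (let y = y0; y < y1; y++) {
      for (let x = x0; x < x1; x++) {
        masked[y * frameWidth + x] = grey;
      }
    }
  }
  return masked;
}
```

Unlike freezing the last visible image, a flat marker color makes it apparent to the recipient that part of the surface is being withheld.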

Some systems MAY only capture the logical display surface. Devices with small screens, for instance, do not typically have the concept of a window, and render applications in full screen modes only. These systems might provide a capture of an application that is not currently visible, which could be unusable without capturing the logical display surface.

An important consideration when capturing a window or other display surface that is partially transparent is that content from the background might be shared. A user agent MUST NOT capture content from the background of a captured display surface.

6.2 Authorizing Display Capture

This document recommends that implementations provide additional limitations on the mechanisms used to affirm user consent. These limitations are designed to mitigate the security and privacy risks that the API poses.

Two forms of consent interaction are described: active user consent and a range of elevated permissions. These are non-normative recommendations only.

6.2.2 Elevated Permissions

It is strongly advised that elevated permissions be required to access any display surface that might be used to circumvent cross-origin protections for content. The key goal of this consent process is not just to demonstrate that a user intends to share content, but also to determine that the user exhibits an elevated level of trust in the application that is being granted access.

Several different controls might be provided to grant elevated permissions. This section describes several different capabilities that could be independently granted. A user agent might opt to prohibit access to any capability that requires elevated permissions.

If access to these surfaces is supported, it is strongly advised that any mechanism to acquire elevated permissions not rely solely on simple prompts for user consent. Any action needs to ensure that a decision to authorize an application with elevated privileges is deliberate. For instance, a user agent might require a process equivalent to software installation to signify that user consent for elevated permissions is granted.

An elevated permissions experience could allow the user agent to communicate the risks associated with enabling this feature, or at least to convey the need for augmented trust in the application.

Note that elevated permissions are not a substitute for active user consent. It is advised that user agents still present users with the ability to select what is shared, even for applications that have elevated permissions.

6.2.3 Capabilities Depending on Elevated Permissions

Elevated permissions are recommended as a prerequisite for access to capture of monitor or browser display surfaces. Note that capture of a complete monitor is included because this could include a window from the user agent.

Similarly, elevated permissions are a recommended prerequisite for access to logical display surfaces, where that would not ordinarily be provided.
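The recommendations in this section could be encoded by a user agent along the following lines. This is a simplified sketch under stated assumptions: `needsElevatedPermissions` is a hypothetical helper, and it ignores the qualification that logical surfaces might ordinarily be provided on some systems, such as those that can only capture the logical display surface.

```typescript
// The four values of the DisplayCaptureSurfaceType enumeration.
type CaptureSurfaceType = "monitor" | "window" | "application" | "browser";

// Sketch of the recommendation in 6.2.3, as a user agent might encode it.
// Returns true when elevated permissions are the recommended prerequisite.
function needsElevatedPermissions(
  surface: CaptureSurfaceType,
  logicalSurface: boolean,
): boolean {
  // A monitor capture could include a window from the user agent itself,
  // and a browser capture exposes a rendered document directly.
  if (surface === "monitor" || surface === "browser") {
    return true;
  }
  // Logical surfaces can contain content that is not visible to the user.
  return logicalSurface;
}
```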

A user agent SHOULD persist any elevated permissions that are granted to an origin. An elevated permissions process in part relies on its novelty to ensure that it correctly captures user intent.

6.3 Feedback and Interface During Capture

Implementations are advised to provide user feedback and control mechanisms similar to those offered users when sharing a camera or microphone, as recommended in [GETUSERMEDIA].

It is important that a user be aware that content is being shared when content is actively being captured. User agents are advised to display a prominent indicator while content is being captured. In addition to an indicator, a user agent is advised to provide a means to learn precisely what is being shared; while this capability is trivially provided by an application by rendering the captured content, this information allows a user to accurately assess what is being shared.

In addition to feedback mechanisms, a means for the user to stop any active capture is advisable.
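If a user agent offers such a stop control, an application would most likely learn about it through the `ended` event that [GETUSERMEDIA] defines for MediaStreamTrack. The sketch below assumes that behavior; `EndableTrack` and `notifyWhenCaptureStops` are hypothetical names introduced for the example.

```typescript
// Hypothetical structural type: the one EventTarget method used here.
interface EndableTrack {
  addEventListener(type: "ended", listener: () => void): void;
}

// Run `onCaptureStopped` once when the track ends, for instance because the
// user stopped sharing through the user agent's own controls.
function notifyWhenCaptureStops(
  track: EndableTrack,
  onCaptureStopped: () => void,
): void {
  let notified = false;
  track.addEventListener("ended", () => {
    // Guard against duplicate notifications if the listener runs twice.
    if (!notified) {
      notified = true;
      onCaptureStopped();
    }
  });
}
```

An application could use the callback to remove its own preview of the captured content and to update any sharing indicator it displays.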

B. References

B.1 Normative references

[GETUSERMEDIA]: Daniel Burnett; Adam Bergkvist; Cullen Jennings; Anant Narayanan; Bernard Aboba. W3C. Media Capture and Streams. 19 May 2016. W3C Candidate Recommendation. URL: https://www.w3.org/TR/mediacapture-streams/
[RFC2119]: S. Bradner. IETF. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[RTCWEB-SECURITY]: Eric Rescorla. IETF. Security Considerations for WebRTC. 22 January 2014. Active Internet-Draft. URL: http://datatracker.ietf.org/doc/draft-ietf-rtcweb-security/
[RTCWEB-SECURITY-ARCH]: Eric Rescorla. IETF. WebRTC Security Architecture. 22 January 2014. Active Internet-Draft. URL: http://datatracker.ietf.org/doc/draft-ietf-rtcweb-security-arch/
[WEBIDL]: Cameron McCormack; Boris Zbarsky. W3C. WebIDL Level 1. 8 March 2016. W3C Candidate Recommendation. URL: https://www.w3.org/TR/WebIDL-1/

B.2 Informative references

[2DCONTEXT]: Rik Cabanier; Jatinder Mann; Jay Munro; Tom Wiltzius; Ian Hickson. W3C. HTML Canvas 2D Context. 19 November 2015. W3C Recommendation. URL: https://www.w3.org/TR/2dcontext/
[ECMA-262]: Ecma International. ECMAScript Language Specification. URL: https://tc39.github.io/ecma262/
[HTML5]: Ian Hickson; Robin Berjon; Steve Faulkner; Travis Leithead; Erika Doyle Navara; Edward O'Connor; Silvia Pfeiffer. W3C. HTML5. 28 October 2014. W3C Recommendation. URL: https://www.w3.org/TR/html5/
[MEDIACAPTURE-DEPTH]: Anssi Kostiainen; Ningxin Hu; Rob Manson. W3C. Media Capture Depth Stream Extensions. 12 July 2016. W3C Working Draft. URL: https://www.w3.org/TR/mediacapture-depth/
[WEBRTC]: Adam Bergkvist; Daniel Burnett; Cullen Jennings; Anant Narayanan; Bernard Aboba. W3C. WebRTC 1.0: Real-time Communication Between Browsers. 31 May 2016. W3C Working Draft. URL: https://www.w3.org/TR/webrtc/

Screen Capture

W3C Working Draft 14 July 2016

Abstract

Status of This Document

1. Introduction

2. Conformance

3. Example

4. Terminology

5. Capturing Displayed Media

5.1 `NavigatorUserMedia` Additions

5.1.1 Methods

5.2 Constraining Display Surface Selection

5.2.1 New Constraints for Captured Display Surfaces

5.2.1.1 Dictionary `MediaTrackConstraintSet` Members

5.2.1.2 `DisplayCaptureSurfaceType`

5.3 Device Identifiers

6. Security and Permissions

6.1 Capturing Logical or Visible Display Surfaces

6.2 Authorizing Display Capture

6.2.1 Active User Consent

6.2.2 Elevated Permissions

6.2.3 Capabilities Depending on Elevated Permissions

6.3 Feedback and Interface During Capture

7. Change Log

Changes since 2014-02-07

Changes since 2014-10-31

A. Acknowledgements

B. References

B.1 Normative references

B.2 Informative references
