Copyright © 2015 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This document defines how a user's display, or parts thereof, can be used as the source of a media stream using getOutputMedia, an extension to the Media Capture API [GETUSERMEDIA].
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is not complete. It is subject to major changes and, while early experimentations are encouraged, it is therefore not intended for implementation.
This document was published by the Web Real-Time Communication Working Group and Device APIs Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-media-capture@w3.org (subscribe, archives). All comments are welcome.
Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures (Web Real-Time Communication Working Group, Device APIs Working Group) made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 August 2014 W3C Process Document.
This section is non-normative.
This document describes an extension to the Media Capture API [GETUSERMEDIA] that enables the acquisition of a user's display, or part thereof, in the form of a video stream. This enables a number of applications, including screen sharing using WebRTC [WEBRTC].
This feature has significant security implications. Applications that use this API to access information that is displayed to users could access confidential information from other origins if that information is under the control of the application. This includes content that would otherwise be inaccessible due to the protections offered by the user agent sandbox.
This document concerns itself primarily with the capture of video, but the general mechanisms defined here could be extended to other types of media, of which audio [GETUSERMEDIA] and depth [MEDIACAPTURE-DEPTH] are currently defined.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST NOT, and SHOULD are to be interpreted as described in [RFC2119].
This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Implementations that use ECMAScript [ECMA-262] to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [WEBIDL], as this specification uses that specification and terminology.
The following example demonstrates a request for display capture using the navigator.mediaDevices.getOutputMedia method defined in this document.
navigator.mediaDevices.getOutputMedia({ video: true })
  .then(stream => {
    // we have a stream, attach it to a feedback video element
    videoElement.srcObject = stream;
  }, error => {
    console.log("Unable to acquire screen capture", error);
  });
A more complicated example shows the use of constraints to limit the choice to windows and their visible display surface.
navigator.mediaDevices.getOutputMedia({
  video: {
    displaySurface: { exact: "window" },
    logicalSurface: { exact: false }
  }
}).then(stream => {
  // we have a stream, attach it to a feedback video element
  videoElement.srcObject = stream;
}, error => {
  console.log("Unable to acquire screen capture", error);
});
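Screen capture encompasses the capture of several different types of screen-based surfaces. Collectively, these are referred to as display surfaces, of which this document defines the following types:
- A monitor display surface represents a physical display, or a collection of physical displays.
- A window display surface represents a single application window.
- An application display surface represents the entire collection of windows for an application.
- A browser display surface represents a single browser window.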
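This document draws a distinction between two variants of each type of display surface:
- A logical display surface is the surface that an operating system makes available to an application for the purposes of rendering.
- A visible display surface is the portion of a logical display surface that is rendered to a physical display.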
Some operating systems permit a window to be occluded, in whole or in part, by windows belonging to other applications. In that case, the visible display surface of the occluded window is a strict subset of its logical display surface.
Output capture is enabled through the addition of a new getOutputMedia method on the NavigatorMediaDevices interface, which is similar to getUserMedia [GETUSERMEDIA]. New constraints allow an application to control what type of information is requested.
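Because getOutputMedia is a new method that a given user agent might not expose, an application would presumably check for it before use. The following is a minimal sketch under that assumption; the helper names supportsOutputMedia and requestDisplayCapture, and the choice to reject rather than fall back to getUserMedia, are illustrative and not defined by this document. The MediaDevices object is passed in as a parameter so that the sketch can be exercised outside a browser.

```javascript
// Hypothetical helper: report whether a MediaDevices-like object exposes
// getOutputMedia as a callable method.
function supportsOutputMedia(mediaDevices) {
  return mediaDevices != null && typeof mediaDevices.getOutputMedia === "function";
}

// Hypothetical helper: request display capture only when it is supported.
function requestDisplayCapture(mediaDevices, constraints = { video: true }) {
  if (!supportsOutputMedia(mediaDevices)) {
    // Reject instead of silently falling back to camera capture, which
    // would share something very different from what the user expects.
    return Promise.reject(new Error("getOutputMedia is not supported"));
  }
  return mediaDevices.getOutputMedia(constraints);
}
```

In a page, this might be called as requestDisplayCapture(navigator.mediaDevices).then(stream => { videoElement.srcObject = stream; }).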
Two constraints are defined that allow an application to limit the display surfaces that are selected. This only allows an application to control the type of surface that is selected, not to identify a specific surface.
The displaySurface constraint allows an application to express a preference or requirement for the type of display surface that is acquired.
The logicalSurface constraint allows an application to express a preference or requirement to capture the logical display surface, rather than the visible display surface.
partial dictionary MediaTrackConstraintSet {
ConstrainDOMString displaySurface;
ConstrainBoolean logicalSurface;
};
MediaTrackConstraintSet Members
displaySurface of type ConstrainDOMString
The type of display surface to capture. Values are taken from the OutputCaptureSurfaceType enumeration.
logicalSurface of type ConstrainBoolean
A value of true requests or requires capture of a logical display surface; a value of false requests or requires capture of a visible display surface.
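To illustrate the difference between a preference and a requirement, the following sketch shows one way a user agent might narrow the display surfaces it offers to the user. It assumes the usual [GETUSERMEDIA] convention that a bare value or an ideal value in a basic constraint set is only a preference, while an exact value is a requirement. The candidate objects (one per possible capture, such as { displaySurface: "window", logicalSurface: false }) and the function names are hypothetical.

```javascript
// Returns false only when a required ("exact") value is not met.
// Bare values, bare arrays and { ideal: ... } express preferences, which
// never exclude a candidate; a user agent might use them to order choices.
function meetsRequirement(constraint, value) {
  if (constraint === null || typeof constraint !== "object" || Array.isArray(constraint)) {
    return true;
  }
  if (!("exact" in constraint)) {
    return true;
  }
  const required = constraint.exact;
  // ConstrainDOMString allows a single string or a list of acceptable strings.
  return Array.isArray(required) ? required.includes(value) : required === value;
}

// Hypothetical user-agent-side filter over possible captures.
function filterSurfaces(candidates, video) {
  // `video: true` places no restriction on the kind of surface.
  if (video === null || typeof video !== "object") {
    return candidates;
  }
  return candidates.filter(c =>
    meetsRequirement(video.displaySurface, c.displaySurface) &&
    meetsRequirement(video.logicalSurface, c.logicalSurface));
}
```

With the constraints from the second example above, only visible window captures survive the filter. In [GETUSERMEDIA], a request whose required constraints cannot be satisfied fails; this document does not say whether getOutputMedia behaves the same way.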
OutputCaptureSurfaceType
The OutputCaptureSurfaceType enumeration describes the different types of display surface.
enum OutputCaptureSurfaceType {
"monitor",
"window",
"application",
"browser"
};
Enumeration description
Value | Description
---|---
monitor | a monitor display surface: a physical display, or collection of physical displays
window | a window display surface: a single application window
application | an application display surface: the entire collection of windows for an application
browser | a browser display surface: a single browser window
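Because displaySurface is declared as a ConstrainDOMString rather than as the enumeration type itself, a misspelled value would presumably not be rejected by the bindings and would only surface as a constraint that no display surface can satisfy. An application might therefore validate values against the enumeration before building its constraints. This is a minimal sketch; the helper name buildDisplayConstraints is hypothetical.

```javascript
// The values of the OutputCaptureSurfaceType enumeration.
const OUTPUT_CAPTURE_SURFACE_TYPES = Object.freeze(["monitor", "window", "application", "browser"]);

// Hypothetical application-side helper: require one of the given surface
// types and, if `logical` is a boolean, a logical (true) or visible (false)
// display surface.
function buildDisplayConstraints(surfaceTypes, logical) {
  if (surfaceTypes.length === 0) {
    throw new TypeError("At least one surface type is required");
  }
  for (const type of surfaceTypes) {
    if (!OUTPUT_CAPTURE_SURFACE_TYPES.includes(type)) {
      throw new TypeError(`Unknown display surface type: ${type}`);
    }
  }
  const video = {
    // A single value or a list of acceptable values, both allowed by ConstrainDOMString.
    displaySurface: { exact: surfaceTypes.length === 1 ? surfaceTypes[0] : [...surfaceTypes] },
  };
  if (typeof logical === "boolean") {
    video.logicalSurface = { exact: logical };
  }
  return { video };
}
```

For example, buildDisplayConstraints(["window"], false) yields the same constraints as the second example near the start of this document.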
Github#2: Determine whether we want to support application capture.
Each potential source of capture is treated by this API as a discrete media source. However, display capture sources MUST NOT be enumerated by enumerateDevices, since this would reveal too much information about the host system.
Display capture sources MUST NOT be selected with the deviceId constraint, since this would allow applications to influence which specific surface is selected rather than only its type. A display capture source is represented in the MediaStreamTrack API as having a deviceId parameter that is randomized each time a MediaStreamTrack is connected. The only other requirement on this value is that it cannot conflict with any existing values for deviceId.
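The deviceId rule above can be sketched from the user agent's side as follows. This is a minimal illustration, assuming Node.js's crypto module as the randomness source and a hypothetical Set of deviceId values already in use; it is not a normative algorithm. Because a new value is drawn on every connection, an application cannot reuse it to select the same source later.

```javascript
const nodeCrypto = require("crypto");

// Hypothetical user-agent-side sketch: give a display capture track a fresh,
// random deviceId each time it is connected. `inUseIds` is an assumed Set of
// deviceId values already used by other media sources.
function assignDisplayDeviceId(inUseIds) {
  let id;
  do {
    // 128 random bits, hex-encoded. Collisions are vanishingly unlikely,
    // but the loop still guarantees no conflict with existing values.
    id = nodeCrypto.randomBytes(16).toString("hex");
  } while (inUseIds.has(id));
  inUseIds.add(id);
  return id;
}
```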
This section is informative; however, it notes some serious risks to platform security if the advice it contains is not adhered to.
This is consistent with other documents, but the absence of strong normative language here is a little worrying.
The risks to user privacy and security posed by capture of displayed content are twofold. The immediate and obvious risk is that users inadvertently share content that they did not wish to share, or might not have realized would be shared.
Display capture presents a less obvious risk to the cross site request forgery protections offered by the browser sandbox. Display and capture of information that is also under the control of an application, even indirectly, can allow that application to access information that would otherwise be inaccessible to it directly. For example, the canvas API does not permit sampling of a canvas, or its conversion to an accessible form, if the canvas is not origin-clean [2DCONTEXT]; capturing a display that shows such a canvas would bypass that restriction.
This issue is discussed in further detail in [RTCWEB-SECURITY-ARCH] and [RTCWEB-SECURITY].
Display capture that includes browser windows, particularly those that are under any form of control by the application, risks violation of these basic security protections. This risk is not confined to browser windows, however: depending on the operating system, control channels can exist between browser applications and other applications. The key consideration is whether the captured display surface could somehow be induced to present information that would otherwise be secret from the application that is receiving the resulting media.
Capture of logical display surfaces creates the potential for content to be shared without the user being aware of it. A logical display surface might render information that a user did not intend to expose. If this information is visible, the user can notice it and quickly rectify the situation. Such means are of course ineffective against a machine recipient, but a human recipient is less able to process content that appears only briefly.
Information that is not currently rendered to the screen SHOULD be obscured in captures unless the application has been specifically authorized to access that content (this might require elevated permissions).
How obscured areas of the logical display surface are captured to produce a visible display surface capture MAY vary. Some applications, like presentation software, benefit from having obscured portions of the screen render the image that appeared prior to being obscured. Freezing images can cause visual artifacts for changing content, or hide the fact that content is being obscured. Note that frozen portions of a capture can be incorrectly perceived as a bug. Alternatively, obscured areas might be replaced with content that marks them as being obscured, such as a grey color or hatching.
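The two treatments of obscured areas described above, freezing and marking, can be sketched as a per-frame composition step. This sketch assumes that frames are RGBA byte arrays (4 bytes per pixel, row-major), that the obscured regions are known as pixel rectangles lying within the frame, and that grey (128) is the marker color; the function name and all parameters are hypothetical. Either treatment keeps content that is not currently rendered out of the capture.

```javascript
const OBSCURED_GREY = 128;

// Hypothetical sketch: derive a visible display surface frame from a logical
// display surface frame. `previousOutput` is the last frame produced (used by
// "freeze" mode); rectangles are { x, y, width, height } in pixels and are
// assumed to lie within the frame.
function composeVisibleFrame(logicalFrame, previousOutput, frameWidth, obscuredRects, mode) {
  // Copy so the logical frame itself is never modified.
  const out = Uint8ClampedArray.from(logicalFrame);
  for (const r of obscuredRects) {
    for (let y = r.y; y < r.y + r.height; y++) {
      for (let x = r.x; x < r.x + r.width; x++) {
        const i = (y * frameWidth + x) * 4;
        if (mode === "freeze" && previousOutput) {
          // Keep showing what was captured before the area was obscured.
          out.set(previousOutput.subarray(i, i + 4), i);
        } else {
          // Mark the area as obscured with opaque grey.
          out[i] = out[i + 1] = out[i + 2] = OBSCURED_GREY;
          out[i + 3] = 255;
        }
      }
    }
  }
  return out;
}
```

Freezing suits content such as slides, at the cost of the artifacts noted above; the grey fill makes the obscured area obvious to the recipient.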
Some systems MAY only capture the logical display surface. Devices with small screens, for instance, do not typically have the concept of a window, and render applications in full screen modes only. These systems might provide a capture of an application that is not currently visible, which could be unusable without capturing the logical display surface.
Implementations can handle feedback and control mechanisms in a similar way to how they handle sharing of a camera or microphone, as recommended in [GETUSERMEDIA].
It is important that a user be aware that content is being shared while it is actively being captured. User agents are advised to display a prominent indicator while content is being captured. In addition to an indicator, a user agent is advised to provide a means to learn precisely what is being shared; while an application can trivially provide this capability by rendering the captured content, information from the user agent allows a user to accurately assess what is being shared.
In addition to feedback mechanisms, a means for the user to stop any active capture is advisable.
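The indicator and stop control advised above could be driven by a small registry of active captures. This is a minimal sketch from the user agent's side; the class name CaptureIndicator and its callback are hypothetical stand-ins for real browser UI, and the tracked objects only need a stop() method, as MediaStreamTrack provides.

```javascript
// Hypothetical user-agent-side sketch: show an indicator while any display
// capture is live, and offer one control that stops all of them.
class CaptureIndicator {
  constructor(onIndicatorChange) {
    this.active = new Set();
    this.onIndicatorChange = onIndicatorChange; // called with true or false
  }

  started(track) {
    this.active.add(track);
    // Turn the indicator on when the first capture begins.
    if (this.active.size === 1) this.onIndicatorChange(true);
  }

  ended(track) {
    // Turn the indicator off only when the last capture ends.
    if (this.active.delete(track) && this.active.size === 0) {
      this.onIndicatorChange(false);
    }
  }

  // Invoked when the user activates the "stop sharing" control.
  stopAll() {
    for (const track of [...this.active]) {
      track.stop();
      this.ended(track);
    }
  }
}
```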
This section will be removed before publication.
Using monitor to describe screen/display share.
Changed getUserMedia to new method getOutputMedia.
The editors wish to thank ....