Copyright © 2017 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This specification extends the Media Capture and Streams specification [GETUSERMEDIA] to allow a depth-only stream or combined depth+color stream to be requested from the web platform using APIs familiar to web authors.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This extension specification defines a new media type and constrainable property per the Extensibility guidelines of the Media Capture and Streams specification [GETUSERMEDIA]. Horizontal reviews and feedback from early implementations of this specification are encouraged.
This document was published by the Device and Sensors Working Group and the Web Real-Time Communications Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-media-capture@w3.org (subscribe, archives). All comments are welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures (Device and Sensors Working Group) and a public list of any patent disclosures (Web Real-Time Communications Working Group) made in connection with the deliverables of each group; these pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 September 2015 W3C Process Document.
Depth cameras are increasingly being integrated into devices such as phones, tablets, and laptops. Depth cameras provide a depth map, which conveys the distance information between points on an object's surface and the camera. With depth information, web content and applications can be enhanced by, for example, the use of hand gestures as an input mechanism, or by creating 3D models of real-world objects that can interact and integrate with the web platform. Concrete applications of this technology include more immersive gaming experiences, more accessible 3D video conferences, and augmented reality, to name a few.
To bring depth capability to the web platform, this specification extends the MediaStream interface [GETUSERMEDIA] to enable it to also contain depth-based MediaStreamTracks. A depth-based MediaStreamTrack, referred to as a depth stream track, represents an abstraction of a stream of frames that can each be converted to objects which contain an array of pixel data, where each pixel represents the distance between the camera and the objects in the scene for that point in the array. A MediaStream object that contains one or more depth stream tracks is referred to as a depth-only stream or depth+color stream.
Depth cameras usually produce 16-bit depth values per pixel, so this specification defines a 16-bit grayscale representation of a depth map.
This specification attempts to address the Use Cases and Requirements for accessing a depth stream from a depth camera. See also the Examples section for concrete usage examples.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key word MUST is to be interpreted as described in [RFC2119].
This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [WEBIDL], as this specification uses that specification and terminology.
The MediaStreamTrack and MediaStream interfaces this specification extends are defined in [GETUSERMEDIA].
The Constraints, MediaTrackSettings, MediaTrackConstraints, MediaTrackSupportedConstraints, MediaTrackCapabilities, and MediaTrackConstraintSet dictionaries this specification extends are defined in [GETUSERMEDIA].
The getUserMedia() and getSettings() methods and the NavigatorUserMediaSuccessCallback callback are defined in [GETUSERMEDIA].
The concepts muted, disabled, and overconstrained as applied to MediaStreamTrack are defined in [GETUSERMEDIA].
The terms source and consumer are defined in [GETUSERMEDIA].
The MediaDeviceKind enumeration is defined in [GETUSERMEDIA].
The video element and ImageData (and its data attribute and Canvas Pixel ArrayBuffer), VideoTrack, HTMLMediaElement (and its srcObject attribute), HTMLVideoElement interfaces and the CanvasImageSource enum are defined in [HTML].
The terms media data, media provider object, assigned media provider object, and the concept potentially playing are defined in [HTML].
The term permission and the permission name "camera" are defined in [PERMISSIONS].
The DataView, Uint8ClampedArray, and Uint16Array buffer source types are defined in [WEBIDL].
The term depth+color stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "depth" (depth stream track) and one or more MediaStreamTrack objects whose videoKind of Settings is "color" (color stream track).
The term depth-only stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "depth" (depth stream track) only.
The term color-only stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "color" (color stream track) only, and optionally of kind "audio".
The term depth stream track means a MediaStreamTrack object whose videoKind of Settings is "depth". It represents a media stream track whose source is a depth camera.
The term color stream track means a MediaStreamTrack object whose videoKind of Settings is "color". It represents a media stream track whose source is a color camera.
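As a rough illustration of the stream terms above, the following sketch classifies a stream by inspecting the settings of its video tracks. The helper name `classifyStream` and the strings it returns are hypothetical and not defined by this specification; the sketch only assumes that `getTracks()`, `kind`, and `getSettings()` behave as described in [GETUSERMEDIA], and it ignores audio tracks.

```javascript
// Hypothetical helper (not part of this specification): classify a
// MediaStream-like object using the videoKind setting of its video tracks.
function classifyStream(stream) {
  const videoKinds = stream
    .getTracks()
    .filter((track) => track.kind === "video")
    .map((track) => track.getSettings().videoKind);

  const hasDepth = videoKinds.includes("depth");
  const hasColor = videoKinds.includes("color");

  if (hasDepth && hasColor) return "depth+color";
  if (hasDepth) return "depth-only";
  if (hasColor) return "color-only";
  return "unknown"; // no video track reported a known videoKind
}
```

In a page, the argument would typically be the stream that `getUserMedia()` resolves with.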
A depth map is an abstract representation of a frame of a depth stream track. A depth map is an image that contains information relating to the distance of the surfaces of scene objects from a viewpoint. A depth map consists of pixels referred to as depth map values. An invalid depth map value is 0 (the user agent is unable to acquire depth information for the given pixel for any reason).
A depth map has an associated near value which is a double. It represents the minimum range in meters.
A depth map has an associated far value which is a double. It represents the maximum range in meters.
A depth map has an associated horizontal focal length which is a double. It represents the horizontal focal length of the depth camera, in pixels.
A depth map has an associated vertical focal length which is a double. It represents the vertical focal length of the depth camera, in pixels.
The data type of a depth map is 16-bit unsigned integer. The algorithm to convert the depth map value to grayscale, given a depth map value d, is as follows:
The rules to convert using range linear are as given in the following formula:
$$d_n = \frac{d - near}{far - near}$$
$$d_{16bit} = \lfloor d_n \cdot 65535 \rfloor$$
The depth measurement d (in meter units) is recovered by solving the rules to convert using range linear for d as follows:
$$d_n = \frac{d_{16bit}}{65535}$$
$$d = d_n \cdot (far - near) + near$$
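The range linear rules above can be sketched in script as follows. The function names `depthToUint16` and `uint16ToDepth` are illustrative assumptions, not part of this specification, and the sketch assumes that `d` already lies within the range from `near` to `far`.

```javascript
// Illustrative helpers (names are assumptions, not defined by this spec).

// Quantize a depth value d (in meters) to a 16-bit unsigned integer.
function depthToUint16(d, near, far) {
  const dn = (d - near) / (far - near); // normalize to [0, 1]
  return Math.floor(dn * 65535);
}

// Recover an approximation of the depth value (in meters).
function uint16ToDepth(d16bit, near, far) {
  const dn = d16bit / 65535;
  return dn * (far - near) + near;
}
```

Because of the floor operation, the recovered value can be smaller than the original depth by up to (far − near) / 65535 meters.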
MediaTrackSupportedConstraints dictionary
partial dictionary MediaTrackSupportedConstraints {
    boolean videoKind = true;
    boolean depthNear = true;
    boolean depthFar = true;
    boolean focalLengthX = true;
    boolean focalLengthY = true;
};
MediaTrackSupportedConstraints Members
videoKind of type boolean, defaulting to true
See videoKind for details.
depthNear of type boolean, defaulting to true
See depthNear for details.
depthFar of type boolean, defaulting to true
See depthFar for details.
focalLengthX of type boolean, defaulting to true
See focalLengthX for details.
focalLengthY of type boolean, defaulting to true
See focalLengthY for details.
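A page might check these members before requesting a depth stream. The sketch below is a minimal illustration; `missingDepthConstraints` and `DEPTH_CONSTRAINT_NAMES` are hypothetical names, and the function accepts any object shaped like the dictionary above.

```javascript
// Names of the members this specification adds to
// MediaTrackSupportedConstraints.
const DEPTH_CONSTRAINT_NAMES = [
  "videoKind",
  "depthNear",
  "depthFar",
  "focalLengthX",
  "focalLengthY",
];

// Hypothetical helper: list the depth-related constraints that the given
// supported-constraints object does not report as true.
function missingDepthConstraints(supported) {
  return DEPTH_CONSTRAINT_NAMES.filter((name) => supported[name] !== true);
}
```

In a browser, the argument would be the result of `navigator.mediaDevices.getSupportedConstraints()`; an empty result suggests that the user agent recognizes all five constrainable properties.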
MediaTrackCapabilities dictionary
partial dictionary MediaTrackCapabilities {
    DOMString videoKind;
    (double or DoubleRange) depthNear;
    (double or DoubleRange) depthFar;
    (double or DoubleRange) focalLengthX;
    (double or DoubleRange) focalLengthY;
};
MediaTrackCapabilities Members
videoKind of type DOMString
See videoKind for details.
depthNear of type (double or DoubleRange)
See depthNear for details.
depthFar of type (double or DoubleRange)
See depthFar for details.
focalLengthX of type (double or DoubleRange)
See focalLengthX for details.
focalLengthY of type (double or DoubleRange)
See focalLengthY for details.
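Since each numeric capability may be reported either as a single double or as a DoubleRange, script that consumes capabilities may want to normalize both forms. The following is a minimal sketch; `toDoubleRange` is a hypothetical helper and not part of this specification.

```javascript
// Hypothetical helper: normalize a (double or DoubleRange) capability
// into an object with min and max members.
function toDoubleRange(capability) {
  if (capability === undefined) {
    return undefined; // the capability is not reported
  }
  if (typeof capability === "number") {
    // A single value means the property is fixed.
    return { min: capability, max: capability };
  }
  return { min: capability.min, max: capability.max };
}
```

For a depth stream track, the argument could be, for example, `track.getCapabilities().depthNear`.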
partial dictionary MediaTrackConstraintSet {
    ConstrainDOMString videoKind;
    ConstrainDouble depthNear;
    ConstrainDouble depthFar;
    ConstrainDouble focalLengthX;
    ConstrainDouble focalLengthY;
};
MediaTrackConstraintSet Members
videoKind of type ConstrainDOMString
See videoKind for details.
depthNear of type ConstrainDouble
See depthNear for details.
depthFar of type ConstrainDouble
See depthFar for details.
focalLengthX of type ConstrainDouble
See focalLengthX for details.
focalLengthY of type ConstrainDouble
See focalLengthY for details.
MediaTrackSettings dictionary
partial dictionary MediaTrackSettings {
    DOMString videoKind;
    double depthNear;
    double depthFar;
    double focalLengthX;
    double focalLengthY;
};
MediaTrackSettings Members
videoKind of type DOMString
See videoKind for details.
depthNear of type double
See depthNear for details.
depthFar of type double
See depthFar for details.
focalLengthX of type double
See focalLengthX for details.
focalLengthY of type double
See focalLengthY for details.
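The settings members above could be read from a track roughly as follows. This is a sketch only; `readDepthSettings` and the shape of the object it returns are hypothetical, and the code relies solely on `getSettings()` as defined in [GETUSERMEDIA].

```javascript
// Hypothetical helper: collect the depth-related settings of a track,
// or return null if the track is not a depth stream track.
function readDepthSettings(track) {
  const settings = track.getSettings();
  if (settings.videoKind !== "depth") {
    return null;
  }
  return {
    near: settings.depthNear,           // meters
    far: settings.depthFar,             // meters
    focalLengthX: settings.focalLengthX, // pixels
    focalLengthY: settings.focalLengthY, // pixels
  };
}
```

The near and far values obtained this way are the ones needed to convert 16-bit depth map values back to meters.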
The following constrainable properties are defined to apply only to video MediaStreamTrack objects:

Property Name | Values | Notes |
---|---|---|
videoKind | ConstrainDOMString | This string should be one of the members of VideoKindEnum. The members describe the kind of video that the camera can capture. Note that getConstraints() may not return exactly the same string for strings not in this enum. This preserves the possibility of using a future version of WebIDL enum for this property. |

enum VideoKindEnum {
    "color",
    "depth"
};

Enumeration description | |
---|---|
color | The source is capturing color images. |
depth | The source is capturing depth maps. |
The MediaStream consumer for the depth-only stream and depth+color stream is the video element [HTML].
If a MediaStreamTrack whose videoKind of Settings is "depth" is muted or disabled, it MUST render frames as if all the pixels were 0.
This section is non-normative.
A color stream track and a depth stream track can be combined into one depth+color stream. The rendering of the two tracks is intended to be synchronized, their resolutions are intended to be the same, and their coordinate systems are intended to be calibrated. These are not hard requirements, since it might not be possible to synchronize tracks from different sources.
The following constrainable properties are defined to apply only to depth stream tracks:

Property Name | Values | Notes |
---|---|---|
depthNear | ConstrainDouble | The near value, in meters. |
depthFar | ConstrainDouble | The far value, in meters. |
focalLengthX | ConstrainDouble | The horizontal focal length, in pixels. |
focalLengthY | ConstrainDouble | The vertical focal length, in pixels. |
The depthNear and depthFar constrainable properties, when set, allow the implementation to pick the best depth camera mode optimized for the range [depthNear, depthFar] and help minimize the error introduced by the lossy conversion from the depth value d to the quantized 16-bit value and back to an approximation of the depth value d.
If the depthFar property's value is less than the depthNear property's value, the depth stream track is overconstrained.
If the near value, far value, horizontal focal length or vertical focal length is fixed due to a hardware or software limitation, the corresponding constrainable property's value MUST be set to the value reported by the underlying implementation. (For example, the focal lengths of the lens may be fixed, or the underlying platform may not expose the focal length information.)
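A page could assemble constraints for a depth stream track along the following lines. This is a sketch under stated assumptions: `buildDepthConstraints` is a hypothetical helper, the use of `ideal` rather than `exact` for the range is an illustrative choice, and the early range check merely mirrors the overconstrained condition described above rather than replacing the user agent's own handling.

```javascript
// Hypothetical helper: build a getUserMedia() constraints object for a
// depth stream track optimized for the range [near, far], in meters.
function buildDepthConstraints(near, far) {
  if (far < near) {
    // A track constrained this way would be overconstrained, so the
    // request is rejected on the page side before it is made.
    throw new RangeError("depthFar must not be less than depthNear");
  }
  return {
    video: {
      videoKind: { exact: "depth" },
      depthNear: { ideal: near },
      depthFar: { ideal: far },
    },
  };
}
```

The resulting object would be passed to `navigator.mediaDevices.getUserMedia()`. The values actually applied can then be read back with `getSettings()`, since the hardware may fix some of them.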
WebGLRenderingContext interface
This section is non-normative.
A video element whose source is a MediaStream object containing a depth stream track may be uploaded to a WebGL texture of format RGB and type UNSIGNED_BYTE. [WEBGL]
For each pixel of this WebGL texture, the R component holds the lower 8 bits of the 16-bit depth value, the G component holds the upper 8 bits, and the value of the B component is not defined.
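The byte layout described above can also be decoded on the CPU, for instance from pixel data read back from a framebuffer. The sketch below assumes a flat byte array in which each pixel occupies `bytesPerPixel` bytes with R first and G second; `decodeDepthPixels` is a hypothetical helper, not part of this specification.

```javascript
// Hypothetical helper: rebuild 16-bit depth map values from pixel bytes in
// which R holds the lower 8 bits and G holds the upper 8 bits.
function decodeDepthPixels(bytes, bytesPerPixel) {
  const pixelCount = Math.floor(bytes.length / bytesPerPixel);
  const depthValues = new Uint16Array(pixelCount);
  for (let i = 0; i < pixelCount; i++) {
    const low = bytes[i * bytesPerPixel];      // R component
    const high = bytes[i * bytesPerPixel + 1]; // G component
    depthValues[i] = (high << 8) | low;
  }
  return depthValues;
}
```

With RGBA data, `bytesPerPixel` would be 4, and the remaining components of each pixel are ignored.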
This section is non-normative.
navigator.mediaDevices.getUserMedia({
  video: {videoKind: {exact: "color"}, groupId: {exact: id}}
}).then(function (stream) {
  // Wire the media stream into a <video> element for playback.
  // The RGB video is rendered.
  var video = document.querySelector('#video');
  video.srcObject = stream;
  video.play();
});
navigator.mediaDevices.getUserMedia({
  video: {videoKind: {exact: "depth"}, groupId: {exact: id}}
}).then(function (stream) {
  // Wire the depth-only stream into another <video> element for playback.
  // The depth information is rendered in its grayscale representation.
  var depthVideo = document.querySelector('#depthVideo');
  depthVideo.srcObject = stream;
  depthVideo.play();
});
// This code sets up a video element from a depth stream, uploads it to a WebGL
// texture, and samples that texture in the fragment shader, reconstructing the
// 16-bit depth values from the red and green channels.
navigator.mediaDevices.getUserMedia({
  video: {videoKind: {exact: "depth"}}
}).then(function (stream) {
  // wire the stream into a <video> element for playback
  var depthVideo = document.querySelector('#depthVideo');
  depthVideo.srcObject = stream;
  depthVideo.play();
}).catch(function (reason) {
  // handle gUM error here
});
// ... later, in the rendering loop ...
gl.texImage2D(
  gl.TEXTURE_2D,
  0,
  gl.RGB,
  gl.RGB,
  gl.UNSIGNED_BYTE,
  depthVideo
);
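<script id="fragment-shader" type="x-shader/x-fragment">
  // WebGL 1 fragment shaders have no default float precision. Use highp
  // where the device supports it to keep all 16 bits.
  precision mediump float;
  varying vec2 v_texCoord;
  // u_tex points to the texture unit containing the depth texture.
  uniform sampler2D u_tex;
  uniform float far;
  uniform float near;
  void main() {
    vec4 floatColor = texture2D(u_tex, v_texCoord);
    // R holds the lower 8 bits and G the upper 8 bits, each normalized to
    // [0, 1]; (255 * r + 65280 * g) / 65535 simplifies to (r + 256 * g) / 257.
    float dn = (floatColor.r + 256.0 * floatColor.g) / 257.0;
    // Recover the depth in meters using the range linear rules.
    float depth = dn * (far - near) + near;
    // ...
  }
</script>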
This section is non-normative.
The privacy and security considerations discussed in [GETUSERMEDIA] apply to this extension specification.
Thanks to everyone who contributed to the Use Cases and Requirements and sent feedback and comments. Special thanks to Ningxin Hu for experimental implementations, as well as to Project Tango for their experiments.