Media Capture Depth Stream Extensions

1. Introduction

Depth cameras are increasingly being integrated into devices such as phones, tablets, and laptops. Depth cameras provide a depth map, which conveys the distance information between points on an object's surface and the camera. With depth information, web content and applications can be enhanced by, for example, the use of hand gestures as an input mechanism, or by creating 3D models of real-world objects that can interact and integrate with the web platform. Concrete applications of this technology include more immersive gaming experiences, more accessible 3D video conferences, and augmented reality, to name a few.

To bring depth capability to the web platform, this specification extends the MediaStream interface [GETUSERMEDIA] to enable it to also contain depth-based MediaStreamTracks. A depth-based MediaStreamTrack, referred to as a depth stream track, represents an abstraction of a stream of frames that can each be converted to objects which contain an array of pixel data, where each pixel represents the distance between the camera and the objects in the scene for that point in the array. A MediaStream object that contains one or more depth stream tracks is referred to as a depth-only stream or depth+color stream.

Depth cameras usually produce 16-bit depth values per pixel, so this specification defines a 16-bit grayscale representation of a depth map.

2. Use cases and requirements

This specification attempts to address the Use Cases and Requirements for accessing a depth stream from a depth camera. See also the Examples section for concrete usage examples.

3. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key word MUST is to be interpreted as described in [RFC2119].

This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.

Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [WEBIDL], as this specification uses that specification and terminology.

4. Dependencies

The MediaStreamTrack and MediaStream interfaces this specification extends are defined in [GETUSERMEDIA].

The Constraints, MediaTrackSettings, MediaTrackConstraints, MediaTrackSupportedConstraints, MediaTrackCapabilities, and MediaTrackConstraintSet dictionaries this specification extends are defined in [GETUSERMEDIA].

The getUserMedia() and getSettings() methods and the NavigatorUserMediaSuccessCallback callback are defined in [GETUSERMEDIA].

The concepts muted, disabled, and overconstrained as applied to MediaStreamTrack are defined in [GETUSERMEDIA].

The terms source and consumer are defined in [GETUSERMEDIA].

The MediaDeviceKind enumeration is defined in [GETUSERMEDIA].

The video element; the ImageData interface (including its data attribute and the Canvas Pixel ArrayBuffer); the VideoTrack, HTMLMediaElement (including its srcObject attribute), and HTMLVideoElement interfaces; and the CanvasImageSource typedef are defined in [HTML].

The terms media data, media provider object, assigned media provider object, and the concept potentially playing are defined in [HTML].

The term permission and the permission name "camera" are defined in [PERMISSIONS].

The DataView, Uint8ClampedArray, and Uint16Array buffer source types are defined in [WEBIDL].

5. Terminology

The term depth+color stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind setting is "depth" (depth stream tracks) and one or more MediaStreamTrack objects whose videoKind setting is "color" (color stream tracks).

The term depth-only stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind setting is "depth" (depth stream tracks) only.

The term color-only stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind setting is "color" (color stream tracks) only, optionally together with tracks of kind "audio".

The term depth stream track means a MediaStreamTrack object whose videoKind setting is "depth". It represents a media stream track whose source is a depth camera.

The term color stream track means a MediaStreamTrack object whose videoKind setting is "color". It represents a media stream track whose source is a color camera.

5.1 Depth map

A depth map is an abstract representation of a frame of a depth stream track. A depth map is an image that contains information relating to the distance of the surfaces of scene objects from a viewpoint. A depth map consists of pixels referred to as depth map values. The invalid depth map value is 0; it indicates that the user agent was unable, for whatever reason, to acquire depth information for the given pixel.

A depth map has an associated near value which is a double. It represents the minimum range in meters.

A depth map has an associated far value which is a double. It represents the maximum range in meters.

A depth map has an associated horizontal focal length which is a double. It represents the horizontal focal length of the depth camera, in pixels.

A depth map has an associated vertical focal length which is a double. It represents the vertical focal length of the depth camera, in pixels.
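As a non-normative illustration of a depth map and its associated values, the TypeScript sketch below bundles the pixel data with the near value, far value, and focal lengths. The DepthMap interface and depthMapValueAt helper are hypothetical names, not interfaces defined by this specification, and the row-major Uint16Array layout is an assumption.

```typescript
// Hypothetical in-memory model of a depth map (not an API defined by this specification).
interface DepthMap {
  width: number;
  height: number;
  data: Uint16Array;    // depth map values, row-major (assumed layout)
  near: number;         // near value, in meters
  far: number;          // far value, in meters
  focalLengthX: number; // horizontal focal length, in pixels
  focalLengthY: number; // vertical focal length, in pixels
}

// The invalid depth map value.
const INVALID_DEPTH_MAP_VALUE = 0;

// Returns the depth map value at (x, y), or null if that pixel is invalid.
function depthMapValueAt(map: DepthMap, x: number, y: number): number | null {
  if (x < 0 || y < 0 || x >= map.width || y >= map.height) {
    throw new RangeError(`pixel (${x}, ${y}) is outside the depth map`);
  }
  const value = map.data[y * map.width + x];
  return value === INVALID_DEPTH_MAP_VALUE ? null : value;
}
```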

The data type of a depth map is 16-bit unsigned integer. The algorithm to convert a depth map value to grayscale, given a depth map value d, is as follows:

1. Let near be the near value.
2. Let far be the far value.
3. If the given depth map value d is unknown or invalid, return the invalid depth map value.
4. Apply the rules to convert using range linear to d to obtain the quantized value d_16bit.
5. Return d_16bit.

The rules to convert using range linear are as given in the following formula:

$d_{n} = \frac{d - \mathit{near}}{\mathit{far} - \mathit{near}}$

$d_{16\mathrm{bit}} = \lfloor d_{n} \cdot 65535 \rfloor$
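The following TypeScript sketch implements the conversion algorithm and the range linear rules above. The function name depthToUint16 is hypothetical, null or NaN stands in for an unknown or invalid measurement, and clamping results for measurements outside [near, far] into [0, 65535] is an assumption, since the algorithm does not say how such measurements are handled.

```typescript
// The invalid depth map value.
const INVALID_DEPTH_MAP_VALUE = 0;

// Converts a depth measurement d (meters) to a 16-bit depth map value using the
// rules to convert using range linear. null or NaN represents an unknown or
// invalid measurement (a convention assumed by this sketch).
function depthToUint16(d: number | null, near: number, far: number): number {
  if (d === null || Number.isNaN(d)) {
    return INVALID_DEPTH_MAP_VALUE;
  }
  if (!(far > near)) {
    throw new RangeError("far must be greater than near");
  }
  const dn = (d - near) / (far - near); // d_n
  const d16 = Math.floor(dn * 65535);   // d_16bit
  // Assumption: clamp measurements outside [near, far]; the algorithm does not cover this case.
  return Math.min(65535, Math.max(0, d16));
}
```

With this formula, a measurement exactly at near also quantizes to 0, the same value that marks an invalid pixel.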

Note

An approximation of the depth measurement d (in meters) is recovered by solving the rules to convert using range linear for d as follows:

1. If d_16bit is 0, let d be the invalid depth map value, and return it.
2. Otherwise, given d_16bit, the near value near, and the far value far, normalize d_16bit to the [0, 1] range:
$d_{n} = \frac{d_{16\mathrm{bit}}}{65535}$
3. Solve the rules to convert using range linear for d:
$d = d_{n} \cdot (\mathit{far} - \mathit{near}) + \mathit{near}$
4. Return d.
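A matching TypeScript sketch of the inverse steps in this note follows. The function name uint16ToDepth is hypothetical, and returning null for the invalid depth map value is a convention chosen for the example.

```typescript
// Recovers an approximate depth in meters from a 16-bit depth map value.
// Returns null for the invalid depth map value 0 (a convention assumed here).
function uint16ToDepth(d16: number, near: number, far: number): number | null {
  if (d16 === 0) {
    return null;
  }
  const dn = d16 / 65535;          // normalize d_16bit to [0, 1]
  return dn * (far - near) + near; // d = d_n * (far - near) + near
}
```

Because the forward conversion uses a floor, the recovered value is lower than the original measurement by less than (far - near) / 65535 meters, which is about 61 micrometers for near = 0.5 and far = 4.5.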

6. Extensions

6.1 `MediaTrackSupportedConstraints` dictionary

partial dictionary MediaTrackSupportedConstraints {
    boolean videoKind = true;
    boolean depthNear = true;
    boolean depthFar = true;
    boolean focalLengthX = true;
    boolean focalLengthY = true;
};

6.1.1 Dictionary `MediaTrackSupportedConstraints` Members

videoKind of type boolean, defaulting to true: See videoKind for details.
depthNear of type boolean, defaulting to true: See depthNear for details.
depthFar of type boolean, defaulting to true: See depthFar for details.
focalLengthX of type boolean, defaulting to true: See focalLengthX for details.
focalLengthY of type boolean, defaulting to true: See focalLengthY for details.
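Web content can use the supported-constraints dictionary for feature detection before requesting a depth stream track. In a browser the dictionary would come from navigator.mediaDevices.getSupportedConstraints() [GETUSERMEDIA]; the sketch below takes a plain object instead so it runs outside a browser. missingDepthConstraints is a hypothetical helper, and it treats an absent member as unsupported.

```typescript
// Names of the constrainable properties added by this specification.
const DEPTH_CONSTRAINT_NAMES = [
  "videoKind", "depthNear", "depthFar", "focalLengthX", "focalLengthY",
] as const;

// Returns the names from this specification that a supported-constraints
// dictionary does not report as supported (absent or not true).
function missingDepthConstraints(supported: Record<string, boolean | undefined>): string[] {
  return DEPTH_CONSTRAINT_NAMES.filter((prop) => supported[prop] !== true);
}
```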

6.2 `MediaTrackCapabilities` dictionary

partial dictionary MediaTrackCapabilities {
    DOMString               videoKind;
    (double or DoubleRange) depthNear;
    (double or DoubleRange) depthFar;
    (double or DoubleRange) focalLengthX;
    (double or DoubleRange) focalLengthY;
};

6.2.1 Dictionary `MediaTrackCapabilities` Members

videoKind of type DOMString: See videoKind for details.
depthNear of type (double or DoubleRange): See depthNear for details.
depthFar of type (double or DoubleRange): See depthFar for details.
focalLengthX of type (double or DoubleRange): See focalLengthX for details.
focalLengthY of type (double or DoubleRange): See focalLengthY for details.

6.3 MediaTrackConstraints

partial dictionary MediaTrackConstraintSet {
    ConstrainDOMString videoKind;
    ConstrainDouble    depthNear;
    ConstrainDouble    depthFar;
    ConstrainDouble    focalLengthX;
    ConstrainDouble    focalLengthY;
};

6.3.1 Dictionary `MediaTrackConstraintSet` Members

videoKind of type ConstrainDOMString: See videoKind for details.
depthNear of type ConstrainDouble: See depthNear for details.
depthFar of type ConstrainDouble: See depthFar for details.
focalLengthX of type ConstrainDouble: See focalLengthX for details.
focalLengthY of type ConstrainDouble: See focalLengthY for details.

6.4 `MediaTrackSettings` dictionary

partial dictionary MediaTrackSettings {
    DOMString videoKind;
    double    depthNear;
    double    depthFar;
    double    focalLengthX;
    double    focalLengthY;
};

6.4.1 Dictionary `MediaTrackSettings` Members

videoKind of type DOMString: See videoKind for details.
depthNear of type double: See depthNear for details.
depthFar of type double: See depthFar for details.
focalLengthX of type double: See focalLengthX for details.
focalLengthY of type double: See focalLengthY for details.

6.5 Constrainable properties

The following constrainable properties are defined to apply only to video MediaStreamTrack objects:

Property Name	Values	Notes
videoKind	`ConstrainDOMString`	This string should be one of the members of `VideoKindEnum`. The members describe the kind of video that the camera can capture. Note that `getConstraints` may not return exactly the same string for strings not in this enum. This preserves the possibility of using a future version of WebIDL enum for this property.

enum VideoKindEnum {
    "color",
    "depth"
};

Enumeration description
`color`	The source is capturing color images.
`depth`	The source is capturing depth maps.

Note

If an application requests a combined depth+color stream, the constraints should be satisfied by devices that belong to the same group or the same physical device. Which device pair is selected to satisfy the request is left to the implementation.
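One way for an application to make the same-group requirement explicit is to constrain both the color and the depth request to a shared groupId, a constrainable property defined in [GETUSERMEDIA]. The sketch below only builds the constraint objects; pairedConstraints and DepthAwareConstraints are hypothetical names, and the groupId value would typically come from a MediaDeviceInfo returned by navigator.mediaDevices.enumerateDevices().

```typescript
// Minimal local type for the constraint objects built below (an assumption of this
// sketch; in a browser, MediaStreamConstraints from [GETUSERMEDIA] would apply).
interface DepthAwareConstraints {
  video: {
    videoKind: { exact: "color" | "depth" };
    groupId: { exact: string };
  };
}

// Hypothetical helper: one constraint object per track kind, both pinned to
// the same device group so that the pair comes from one physical device.
function pairedConstraints(groupId: string): { color: DepthAwareConstraints; depth: DepthAwareConstraints } {
  const forKind = (kind: "color" | "depth"): DepthAwareConstraints => ({
    video: { videoKind: { exact: kind }, groupId: { exact: groupId } },
  });
  return { color: forKind("color"), depth: forKind("depth") };
}

// In a browser, each object could be passed to navigator.mediaDevices.getUserMedia()
// and the resulting tracks combined into one MediaStream.
```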

The MediaStream consumer for the depth-only stream and depth+color stream is the video element [HTML].

Note

New consumers may be added in a future version of this specification.

If a MediaStreamTrack whose videoKind setting is "depth" is muted or disabled, it MUST render frames as if all the pixels were 0.

6.5.1 Implementation considerations

This section is non-normative.

A color stream track and a depth stream track can be combined into one depth+color stream. The rendering of the two tracks is intended to be synchronized, their resolutions are intended to be the same, and their coordinates are intended to be calibrated against each other. These are not hard requirements, since it might not be possible to synchronize tracks from different sources.
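Because matching resolution and calibration are not guaranteed, an application might inspect the settings of both tracks (for example, the results of getSettings()) before relying on a pixel-to-pixel correspondence. The hypothetical describePairing helper below checks only what settings can reveal, namely resolution and device group; synchronization and calibration cannot be verified this way.

```typescript
// Subset of MediaTrackSettings members used here, declared locally so the
// sketch runs outside a browser.
interface TrackSettingsLike {
  width?: number;
  height?: number;
  groupId?: string;
}

// Hypothetical helper: reports which intended (but not required) properties of
// a depth+color pair can be confirmed from the two tracks' settings.
function describePairing(color: TrackSettingsLike, depth: TrackSettingsLike) {
  return {
    sameResolution:
      color.width !== undefined && color.width === depth.width &&
      color.height !== undefined && color.height === depth.height,
    sameGroup: color.groupId !== undefined && color.groupId === depth.groupId,
  };
}
```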

The following constrainable properties are defined to apply only to depth stream tracks:

Property Name	Values	Notes
depthNear	`ConstrainDouble`	The near value, in meters.
depthFar	`ConstrainDouble`	The far value, in meters.
focalLengthX	`ConstrainDouble`	The horizontal focal length, in pixels.
focalLengthY	`ConstrainDouble`	The vertical focal length, in pixels.

The depthNear and depthFar constrainable properties, when set, allow the implementation to pick the best depth camera mode optimized for the range [depthNear, depthFar] and help minimize the error introduced by the lossy conversion from the depth value d to a quantized d_16bit and back to an approximation of d.

If the depthFar property's value is less than the depthNear property's value, the depth stream track is overconstrained.
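Since a 16-bit depth map value spans [depthNear, depthFar] in 65535 steps, each step covers (depthFar - depthNear) / 65535 meters, which makes the benefit of a narrower range concrete. The hypothetical helper below computes that step and rejects a reversed range, mirroring the overconstrained rule; it accepts plain numbers only, not the full ConstrainDouble forms.

```typescript
// Hypothetical helper: the depth spacing, in meters, between adjacent 16-bit
// depth map values for the range [depthNear, depthFar].
function depthQuantizationStep(depthNear: number, depthFar: number): number {
  if (depthFar < depthNear) {
    // Mirrors the rule above: such a depth stream track is overconstrained.
    throw new RangeError("overconstrained: depthFar is less than depthNear");
  }
  return (depthFar - depthNear) / 65535;
}
```

For example, narrowing the range from [0.5, 4.5] to [0.5, 1.5] shrinks the step from about 61 to about 15 micrometers.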

If the near value, far value, horizontal focal length or vertical focal length is fixed due to a hardware or software limitation, the corresponding constrainable property's value MUST be set to the value reported by the underlying implementation. (For example, the focal lengths of the lens may be fixed, or the underlying platform may not expose the focal length information.)

6.6 `WebGLRenderingContext` interface

6.6.1 Implementation considerations

This section is non-normative.

Note

This section is currently a work in progress and subject to change.

A video element whose source is a MediaStream object containing a depth stream track may be uploaded to a WebGL texture of format RGB and type UNSIGNED_BYTE [WEBGL].

For each pixel of this WebGL texture, the R component holds the lower 8 bits of the 16-bit depth value, the G component holds the upper 8 bits, and the value of the B component is undefined.
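Code that has the texel bytes on the CPU can reassemble each 16-bit value as R + 256 * G. The TypeScript sketch below assumes the bytes are already available as tightly packed R, G, B triples in a Uint8Array; how they are read back from WebGL is outside its scope, and unpackDepthFromRgb is a hypothetical name.

```typescript
// Hypothetical helper: rebuilds 16-bit depth values from RGB texel bytes,
// where R holds the low byte, G the high byte, and B is ignored.
function unpackDepthFromRgb(rgb: Uint8Array): Uint16Array {
  if (rgb.length % 3 !== 0) {
    throw new RangeError("expected a whole number of RGB triples");
  }
  const out = new Uint16Array(rgb.length / 3);
  for (let i = 0; i < out.length; i++) {
    const low = rgb[3 * i];      // R component: lower 8 bits
    const high = rgb[3 * i + 1]; // G component: upper 8 bits
    out[i] = (high << 8) | low;
  }
  return out;
}
```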

B.1 Normative references

[GETUSERMEDIA]: Media Capture and Streams. Daniel Burnett; Adam Bergkvist; Cullen Jennings; Anant Narayanan; Bernard Aboba. W3C. 19 May 2016. W3C Candidate Recommendation. URL: https://www.w3.org/TR/mediacapture-streams/
[HTML]: HTML Standard. Anne van Kesteren; Domenic Denicola; Ian Hickson; Philip Jägenstedt; Simon Pieters. WHATWG. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[PERMISSIONS]: The Permissions API. Mounir Lamouri; Marcos Caceres. W3C. 7 April 2015. W3C Working Draft. URL: https://www.w3.org/TR/permissions/
[RFC2119]: Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WEBIDL]: Web IDL. Cameron McCormack; Boris Zbarsky; Tobie Langel. W3C. 15 December 2016. W3C Working Draft. URL: https://www.w3.org/TR/WebIDL-1/

B.2 Informative references

[WEBGL]: WebGL Specification, Version 1.0. Chris Marrin (Apple Inc.). Khronos. 10 February 2011. URL: https://www.khronos.org/registry/webgl/specs/1.0/

Media Capture Depth Stream Extensions

W3C Working Draft 27 February 2017

Abstract

Status of This Document

1. Introduction

2. Use cases and requirements

3. Conformance

4. Dependencies

5. Terminology

5.1 Depth map

6. Extensions

6.1 `MediaTrackSupportedConstraints` dictionary

6.1.1 Dictionary `MediaTrackSupportedConstraints` Members

6.2 `MediaTrackCapabilities` dictionary

6.2.1 Dictionary `MediaTrackCapabilities` Members

6.3 MediaTrackConstraints

6.3.1 Dictionary `MediaTrackConstraintSet` Members

6.4 `MediaTrackSettings` dictionary

6.4.1 Dictionary `MediaTrackSettings` Members

6.5 Constrainable properties

6.5.1 Implementation considerations

6.6 `WebGLRenderingContext` interface

6.6.1 Implementation considerations

7. Examples

Playback of depth and color streams from same device group.

WebGL Fragment Shader based post-processing

8. Privacy and security considerations

A. Acknowledgements

B. References

B.1 Normative references

B.2 Informative references
