This specification extends the Media Capture and Streams specification [GETUSERMEDIA] to allow a depth stream to be requested from the web platform using APIs familiar to web authors.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is not complete and is subject to change. Early experimentation is encouraged to allow the Media Capture Task Force to evolve the specification based on technical discussions within the Task Force, implementation experience gained from early implementations, and feedback from other groups and individuals.

This document was published by the Device APIs Working Group and Web Real-Time Communications Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-media-capture@w3.org (subscribe, archives). All comments are welcome.

Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures (Device APIs Working Group, Web Real-Time Communications Working Group) made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 August 2014 W3C Process Document.


1. Introduction

This specification extends the MediaStream interface [GETUSERMEDIA] to enable it to also contain depth-based MediaStreamTracks. A depth-based MediaStreamTrack, referred to as a depth track, represents an abstraction of a stream of frames that can each be converted to objects which contain an array of pixel data, where each pixel represents the distance between the camera and the objects in the scene for that point in the array. A MediaStream object that contains one or more depth tracks is referred to as a depth stream.

2. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this specification are to be interpreted as described in [RFC2119].

This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.

Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [WEBIDL], as this specification uses that specification and terminology.

3. Terminology

The Constraints, MediaStreamConstraints, and MediaTrackConstraints dictionaries, and the MediaStreamTrack and MediaStream interfaces that this specification extends, are defined in [GETUSERMEDIA].

The NavigatorUserMediaSuccessCallback callback is defined in [GETUSERMEDIA].

The ImageData interface and its data attribute are defined in [2DCONTEXT2].

The ArrayBuffer, ArrayBufferView and Uint16Array types are defined in [TYPEDARRAY].

A depth stream is a MediaStream object that contains one or more depth tracks.

A depth track represents media sourced from a depth camera or other similar source.

Depth data represents the underlying depth data of an area of a canvas element.

4. Extensions

4.1 MediaStreamConstraints dictionary

partial dictionary MediaStreamConstraints {
    (boolean or MediaTrackConstraints) depth = false;
};

The depth attribute MUST return the value it was initialized to. When the object is created, this attribute MUST be initialized to false. If true, the attribute represents a request that the MediaStream object returned as an argument of the NavigatorUserMediaSuccessCallback contains a depth track. If a Constraints structure is provided, it further specifies the nature and settings of the depth track.
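For illustration, a depth stream might be requested as follows, using the callback-based getUserMedia() API defined in [GETUSERMEDIA] at the time of this draft. The handler names and the combination of constraints are illustrative, not mandated by this specification.

```javascript
// Request both a video track and a depth track. The constraint names
// follow this specification; the callbacks are illustrative.
var constraints = { video: true, depth: true };

function gotStream(mediaStream) {
  // The returned MediaStream is expected to contain a depth track.
  console.log('Depth tracks:', mediaStream.getDepthTracks().length);
}

function gotError(error) {
  console.error('getUserMedia failed:', error.name);
}

// Guarded so the sketch is inert outside a supporting user agent.
if (typeof navigator !== 'undefined' && navigator.getUserMedia) {
  navigator.getUserMedia(constraints, gotStream, gotError);
}
```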

4.2 MediaStream interface

partial interface MediaStream {
    sequence&lt;MediaStreamTrack&gt; getDepthTracks ();
};

The getDepthTracks() method, when invoked, MUST return a sequence of MediaStreamTrack objects representing the depth tracks in this stream.

The getDepthTracks() method MUST return a sequence that represents a snapshot of all the MediaStreamTrack objects in this stream's track set whose kind is equal to "depth". The conversion from the track set to the sequence is user agent defined and the order does not have to be stable between calls.
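The behaviour above can be modelled as a filter over the track set. The sketch below uses plain mock objects in place of MediaStreamTrack instances to show the kind-based selection; it is a model of the semantics, not the user agent's internal algorithm.

```javascript
// Illustrative model of getDepthTracks(): a snapshot of the track set
// filtered by kind === "depth". Tracks here are plain mock objects.
function getDepthTracks(trackSet) {
  return trackSet.filter(function (track) {
    return track.kind === 'depth';
  });
}

var tracks = [
  { kind: 'video', label: 'RGB camera' },
  { kind: 'depth', label: 'Depth camera' },
  { kind: 'audio', label: 'Microphone' }
];

// Only the track whose kind is "depth" survives the filter.
var depthTracks = getDepthTracks(tracks);
```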

4.3 MediaStreamTrack interface

The kind attribute MUST, on getting, return the string "depth" if the object represents a depth track.

4.4 DepthData interface


Depth cameras usually produce 16-bit depth values per pixel. However, the canvas drawing surface used to draw and manipulate 2D graphics on the web platform does not currently support 16 bits per pixel.

To address the issue, this specification defines a new DepthData interface and extends the CanvasRenderingContext2D interface to provide pixel manipulation constructors and methods that create, and interact with, DepthData objects.

[Constructor(unsigned long sw, unsigned long sh),
 Constructor(Uint16Array data, unsigned long sw, optional unsigned long sh)]
interface DepthData {
    readonly    attribute unsigned long    width;
    readonly    attribute unsigned long    height;
    readonly    attribute Uint16Array      data;
    readonly    attribute CameraParameters parameters;
};

New DepthData objects MUST be initialised so that their width attribute is set to the number of entries per row in the depth data, their height attribute is set to the number of rows in the depth data, and their data attribute, except where an existing array is provided, is initialised to a new Uint16Array object. The Uint16Array object MUST use a new Canvas depth ArrayBuffer for its storage, and MUST have a zero start offset and a length equal to the length of its storage, in bytes.

A Canvas Depth ArrayBuffer is an ArrayBuffer whose data is represented in left-to-right order, row by row top to bottom, starting with the top left, with each pixel's depth component being given in that order for each pixel. Each depth component of each pixel represented in this array MUST be in the range 0..65535, representing the 16-bit value for that depth component. The depth components MUST be assigned consecutive indices starting with 0 for the top left pixel's depth component.
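The row-major layout described above means the depth component for pixel (x, y) lives at index y × width + x. A short sketch, using illustrative dimensions:

```javascript
// Row-major layout of a Canvas Depth ArrayBuffer: index 0 is the top-left
// pixel, rows run left to right, top to bottom. Sizes are illustrative.
var width = 4, height = 3;
var buffer = new ArrayBuffer(width * height * 2); // 2 bytes per 16-bit value
var data = new Uint16Array(buffer);               // zero offset, full length

// The depth component for pixel (x, y) lives at index y * width + x.
function depthIndex(x, y, width) {
  return y * width + x;
}

// Bottom-right pixel, maximum representable depth component.
data[depthIndex(3, 2, width)] = 65535;
```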

4.5 CameraParameters interface

Issue 1

What is the minimum set of metadata CameraParameters should expose? At minimum, the parameters of the general pinhole camera model must be calculable from it. For related discussion, see the Focal length/fov capabilities and general camera intrinsics thread.

Each DepthData interface is associated with a CameraParameters object. It represents the parameters of a pinhole camera model that describes the mathematical relationship between the coordinates of a 3D point and its projection onto the image plane.

interface CameraParameters {
    readonly    attribute double focalLength;
    readonly    attribute double horizontalViewAngle;
    readonly    attribute double verticalViewAngle;
};

The focalLength attribute, on getting, MUST return the focal length of the camera in millimeters.

The horizontalViewAngle attribute, on getting, MUST return the horizontal angle of view in degrees.

The verticalViewAngle attribute, on getting, MUST return the vertical angle of view in degrees.
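Under the pinhole model these parameters let an author recover a focal length in pixel units and deproject a depth value into camera-space coordinates. The sketch below assumes the principal point sits at the image centre, and the 640×480 resolution and 60-degree angle are illustrative values, not mandated by this specification.

```javascript
// Derive a focal length in pixels from a view angle (degrees) and the
// corresponding image dimension, under the pinhole camera model.
function focalLengthInPixels(imageSize, viewAngleDegrees) {
  var halfAngle = (viewAngleDegrees * Math.PI / 180) / 2;
  return (imageSize / 2) / Math.tan(halfAngle);
}

// Deproject pixel (u, v) with depth d into camera-space coordinates,
// assuming the principal point is at the image centre.
function deproject(u, v, d, width, height, fx, fy) {
  return {
    x: (u - width / 2) * d / fx,
    y: (v - height / 2) * d / fy,
    z: d
  };
}

var fx = focalLengthInPixels(640, 60); // ~554.26 pixels for a 60-degree angle
```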

4.6 CanvasRenderingContext2D interface

partial interface CanvasRenderingContext2D {
    DepthData createDepthData (double sw, double sh);
    DepthData createDepthData (DepthData depthdata);
    DepthData getDepthData (double sx, double sy, double sw, double sh);
    void      putDepthData (DepthData depthdata, double dx, double dy);
    void      putDepthData (DepthData depthdata, double dx, double dy, double dirtyX, double dirtyY, double dirtyWidth, double dirtyHeight);
};
Issue 2

Define the algorithms for the createDepthData(), getDepthData(), and putDepthData() methods.

We may want to file bugs against [2DCONTEXT2] to add extension points this specification can hook into to facilitate reuse of common algorithms and avoid monkey patching.
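Pending those algorithm definitions, by analogy with putImageData() in [2DCONTEXT2] one might expect putDepthData() with a dirty rectangle to copy only that sub-region of the source. The pure-array sketch below models that expected behaviour on plain objects; it is an illustration, not a normative algorithm.

```javascript
// Model of dirty-rect putDepthData() semantics, by analogy with
// putImageData: copy only the dirty sub-region of src into dest,
// placed at (dx + dirtyX, dy + dirtyY). Not a normative algorithm.
function putDepthData(dest, src, dx, dy, dirtyX, dirtyY, dirtyWidth, dirtyHeight) {
  for (var row = 0; row < dirtyHeight; row++) {
    for (var col = 0; col < dirtyWidth; col++) {
      var srcIndex = (dirtyY + row) * src.width + (dirtyX + col);
      var destIndex = (dy + dirtyY + row) * dest.width + (dx + dirtyX + col);
      dest.data[destIndex] = src.data[srcIndex];
    }
  }
}

var dest = { width: 4, height: 4, data: new Uint16Array(16) };
var src = { width: 2, height: 2, data: Uint16Array.from([10, 20, 30, 40]) };

// Copy the whole 2x2 source into dest at (1, 1).
putDepthData(dest, src, 1, 1, 0, 0, 2, 2);
```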

A. Acknowledgements

Thanks to everyone who contributed to the Use Cases and Requirements, and to those who sent feedback and comments. Special thanks to Ningxin Hu for experimental implementations, and to the Project Tango team for their experiments.

B. References

B.1 Normative references

Rik Cabanier; Jatinder Mann; Jay Munro; Tom Wiltzius; Ian Hickson. HTML Canvas 2D Context, Level 2. 28 August 2014. W3C Working Draft. URL: http://www.w3.org/TR/2dcontext2/
Daniel Burnett; Adam Bergkvist; Cullen Jennings; Anant Narayanan. Media Capture and Streams. 3 September 2013. W3C Working Draft. URL: http://www.w3.org/TR/mediacapture-streams/
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: http://www.ietf.org/rfc/rfc2119.txt
David Herman; Kenneth Russell. Typed Array Specification. 26 June 2013. Khronos Working Draft. URL: https://www.khronos.org/registry/typedarray/specs/latest/
Cameron McCormack. Web IDL. 19 April 2012. W3C Candidate Recommendation. URL: http://www.w3.org/TR/WebIDL/