Media Capture Depth Stream Extensions

Abstract

This specification extends the Media Capture and Streams specification [GETUSERMEDIA] to allow a depth stream to be requested from the web platform using APIs familiar to web authors.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

The following changes were made since the W3C First Public Working Draft 07 October 2014:

Removed the DepthData interface
Added depth value presentation into the ImageData interface
Added a depth video element as an image source in the CanvasImageSource typedef
Added support for direct assignment to media elements for MediaStream object containing a depth track
Added new properties to the Settings dictionary
Added new definitions to the terminology section
Added a non-normative implementation considerations section regarding combining a depth track and a video track into one MediaStream object
Added a non-normative implementation considerations section considering the WebGLRenderingContext interface
Added examples for 2D canvas context and WebGL fragment shader based post-processing

This document is not complete and is subject to change. Early experimentations are encouraged to allow the Media Capture Task Force to evolve the specification based on technical discussions within the Task Force, implementation experience gained from early implementations, and feedback from other groups and individuals.

This document was published by the Device APIs Working Group and Web Real-Time Communications Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-media-capture@w3.org (subscribe, archives). All comments are welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures (Device APIs Working Group, Web Real-Time Communications Working Group) made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 August 2014 W3C Process Document.

4. Terminology

The MediaStreamTrack and MediaStream interfaces this specification extends are defined in [GETUSERMEDIA].

The Constraints, Settings, MediaStreamConstraints, and MediaTrackConstraints dictionaries this specification extends are based upon the Constrainable pattern defined in [GETUSERMEDIA].

The NavigatorUserMediaSuccessCallback callback is defined in [GETUSERMEDIA].

The CanvasRenderingContext2D and ImageData interfaces, and the CanvasImageSource typedef are defined in [2DCONTEXT2].

The ArrayBuffer, ArrayBufferView and Uint16Array types are defined in [TYPEDARRAY].

A depth stream is a MediaStream object that contains one or more depth tracks.

A depth track represents media sourced from a depth camera or other similar source.

A video track represents media sourced from an RGB camera or other similar source.

5. Extensions

5.1 `MediaStreamConstraints` dictionary

partial dictionary MediaStreamConstraints {
    (boolean or MediaTrackConstraints) depth = false;
};

The depth attribute MUST return the value it was initialized to. When the object is created, this attribute MUST be initialized to false. If true, the attribute represents a request that the MediaStream object returned as an argument of the NavigatorUserMediaSuccessCallback contains a depth track. If a Constraints structure is provided, it further specifies the nature and settings of the depth track.

5.2 `MediaStream` interface

partial interface MediaStream {
    sequence<MediaStreamTrack> getDepthTracks ();
};

The getDepthTracks() method, when invoked, MUST return a sequence of MediaStreamTrack objects representing the depth tracks in this stream.

The getDepthTracks() method MUST return a sequence that represents a snapshot of all the MediaStreamTrack objects in this stream's track set whose kind is equal to "depth". The conversion from the track set to the sequence is user agent defined and the order does not have to be stable between calls.

5.2.1 Implementation considerations

This section is non-normative.

A MediaStreamTrack object representing a video track and a MediaStreamTrack object representing a depth track can be combined into one MediaStream object. The rendering of the two tracks are intended to be synchronized. The resolution of the two tracks are intended to be same. And the coordination of the two tracks are intended to be calibrated. These are not hard requirements, since it might not be possible to synchronize tracks from sources.

5.3 `MediaStreamTrack` interface

The kind attribute MUST, on getting, return the string "depth" if the object represents a depth track.

5.4 Direct assignment to media elements

User agents that support MediaStream direct assignment to media elements MUST allow a MediaStream object containing a depth track to be assigned directly to a media element.

For each MediaStreamTrack representing a depth track in the MediaStream, the user agent MUST create a corresponding VideoTrack as defined in [HTML5].

5.5 `CanvasImageSource` typedef

Note

Several methods in the CanvasRenderingContext2D API take the union type CanvasImageSource as an argument. This specification extends the list of image sources for 2D rendering contexts defined in [2DCONTEXT2].

A video element whose source is a MediaStream object containing a depth track is said to be a depth video element.

A depth video element may be used as a CanvasImageSource.

5.6 `ImageData` interface

Note

Depth cameras usually produce 16-bit depth values per pixel. However, the canvas drawing surface used to draw and manipulate 2D graphics on the web platform does not currently support 16bpp.

To address the issue, this specification defines a new data representation for current Canvas Pixel ArrayBuffer of ImageData interface to represent the 16bpp depth image produced by depth cameras.

An ImageData object is said to represent depth data, when the CanvasImageSource used as the image source for the CanvasRenderingContext2D is a depth video element.

When representing a depth image, the Canvas Pixel ArrayBuffer is an ArrayBuffer whose data is represented in left-to-right order, row by row top to bottom, starting with the top left, with each pixel's lower 8 bit of 16 bit depth value, upper 8 bit of 16 bit depth value, 8 bit reserved data, and another 8 bit reserved data being given in that order for each pixel. Each component of each pixel represented in this array must be in the range 0..255, representing the 8 bit value for that component. The components must be assigned consecutive indices starting with 0 for the top left pixel's lower 8 bit of 16 bit depth value component.

5.7 `Settings` dictionary

When the getSettings() method is invoked on a MediaStreamTrack object that represents a depth track, the user agent MUST return a Settings dictionary with the additional properties listed below. When the getSettings() method is invoked on a MediaStreamTrack object that represents a video track, the user agent MAY return a Settings dictionary with the additional properties listed below:

partial dictionary Settings {
    double focalLength = null;
    double horizontalFieldOfView = null;
    double verticalFieldOfView = null;
};

The focalLength attribute MUST return the value it was initialized to. When the object is created, this attribute MUST be initialized to null. It represents the focal length of the camera in millimeters.

The horizontalFieldOfView attribute MUST return the value it was initialized to. When the object is created, this attribute MUST be initialized to null. It represents the horizontal angle of view in degrees.

The verticalFieldOfView attribute MUST return the value it was initialized to. When the object is created, this attribute MUST be initialized to null. It represents the vertical angle of view in degrees.

5.8 `WebGLRenderingContext` interface

5.8.1 Implementation considerations

This section is non-normative.

A video element whose source is a MediaStream object containing a depth track may be uploaded to a WebGL texture of format RGB and type UNSIGNED_BYTE. [WEBGL]

For each pixel of this WebGL texture, the R component represents the lower 8 bit value of 16 bit depth value, the G component represents the upper 8 bit value of 16 bit depth value and the value in B component is not defined.

6. Examples

This section is non-normative.

2D Canvas Context based post-processing

Example 1

var canvasContext = document.createElement("canvas").getContext("2d");
var fps = 60;

navigator.mediaDevices.getUserMedia({
  video: true,
  depth: true,
}).then(function (stream) {
  // wire the stream into a <video> element for playback
  var video = document.querySelector('#video');
  video.srcObject = stream;
  video.play();
  // wire the depth stream into another <video> element to convert kind
  // NOTE: Only the R and G bytes are set to carry 16 bits of data
  var depthVideo = document.querySelector('#depthVideo');
  // construct a new MediaStream out of the existing depth track(stream)
  var depthStream = new MediaStream(stream.getDepthTracks()[0]);
  depthVideo.srcObject = depthStream;
  depthVideo.play();

  depthVideo.onloadedmetadata = function () {
    setInterval(function () {
      canvasContext.drawImage(video);
      var rgbImageData = canvasContext.getImageData(0, 0, w, h);
      var pixels = rgbImageData.data;
      depthCanvasContext.drawImage(depthVideo);
      var depthImageData = depthCanvasContext.getImageData(0, 0, w, h);
      var dexels = depthImageData.data;

      // iterate through depth pixels to convert 2 bytes into 1 Uint16
      for (var x = 0; x < w ; ++x) {
        for (var y = 0; y < h; ++y) {
          var i = (x + y * w) * 4;
          // combine the R & G pixels at (x, y) to get
          // the 16 bit depth pixel value
          var depth = dexels[i] | dexels[i + 1] << 8;
        }
      }
      // do things with pixels and dexels here
    }, 1000/fps);
  };
}).catch(function (reason) {
  // handle gUM error here
});

WebGL Fragment Shader based post-processing

Example 2

// This code sets up a video element from a depth stream, uploads it to a WebGL
// texture, and samples that texture in the fragment shader, reconstructing the
// 16-bit depth values from the red and green channels.
navigator.mediaDevices.getUserMedia({
  depth: true,
}).then(function (stream) {
  // wire the stream into a <video> element for playback
  var depthVideo = document.querySelector('#depthVideo');
  depthVideo.srcObject = stream;
  depthVideo.play();
}).catch(function (reason) {
  // handle gUM error here
});

// ... later, in the rendering loop ...
gl.texImage2D(
   gl.TEXTURE_2D,
   0,
   gl.RGB,
   gl.RGB,
   gl.UNSIGNED_BYTE,
   depthVideo
);

<script id="fragment-shader" type="x-shader/x-fragment">
  varying vec2 v_texCoord;
  // u_tex points to the texture unit containing the depth texture.
  uniform sampler2D u_tex;
  void main() {
    vec4 floatColor = texture2D(u_tex, v_texCoord);
    vec3 rgb = floatColor.rgb;
    // ...
    float depth = rgb.r + 256. * rgb.g;
    // ...
  }
</script>

Media Capture Depth Stream Extensions

W3C Working Draft 29 January 2015

Abstract

Status of This Document

Table of Contents

1. Introduction

2. Use cases and requirements

3. Conformance

4. Terminology

5. Extensions

5.1 `MediaStreamConstraints` dictionary

5.2 `MediaStream` interface

5.2.1 Implementation considerations

5.3 `MediaStreamTrack` interface

5.4 Direct assignment to media elements

5.5 `CanvasImageSource` typedef

5.6 `ImageData` interface

5.7 `Settings` dictionary

5.8 `WebGLRenderingContext` interface

5.8.1 Implementation considerations

6. Examples

2D Canvas Context based post-processing

WebGL Fragment Shader based post-processing

A. Acknowledgements

B. References

B.1 Normative references

B.2 Informative references

Abstract

Status of This Document

Table of Contents

1. Introduction

2. Use cases and requirements

3. Conformance

4. Terminology

5. Extensions

5.1 MediaStreamConstraints dictionary

5.2 MediaStream interface

5.2.1 Implementation considerations

5.3 MediaStreamTrack interface

5.4 Direct assignment to media elements

5.5 CanvasImageSource typedef

5.6 ImageData interface

5.7 Settings dictionary

5.8 WebGLRenderingContext interface

5.8.1 Implementation considerations

6. Examples

2D Canvas Context based post-processing

WebGL Fragment Shader based post-processing

A. Acknowledgements

B. References

B.1 Normative references

B.2 Informative references

5.1 `MediaStreamConstraints` dictionary

5.2 `MediaStream` interface

5.3 `MediaStreamTrack` interface

5.5 `CanvasImageSource` typedef

5.6 `ImageData` interface

5.7 `Settings` dictionary

5.8 `WebGLRenderingContext` interface