Copyright © 2022 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This specification extends the Media Capture and Streams specification [GETUSERMEDIA] to allow a depth-only stream or combined depth+color stream to be requested from the web platform using APIs familiar to web authors.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Web Real-Time Communications Working Group as a Discontinued Draft using the Recommendation track.
Publication as a Discontinued Draft implies that this document is no longer intended to advance or to be maintained. It is inappropriate to cite this document as other than abandoned work.
The Working Group has decided to discontinue work on this specification due to lack of implementation momentum.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 2 November 2021 W3C Process Document.
Depth cameras are increasingly being integrated into devices such as phones, tablets, and laptops. Depth cameras provide a depth map, which conveys the distance information between points on an object's surface and the camera. With depth information, web content and applications can be enhanced by, for example, the use of hand gestures as an input mechanism, or by creating 3D models of real-world objects that can interact and integrate with the web platform. Concrete applications of this technology include more immersive gaming experiences, more accessible 3D video conferences, and augmented reality, to name a few.
        To bring depth capability to the web platform, this specification
        extends
        the MediaStream interface [GETUSERMEDIA] to enable it to also
        contain depth-based MediaStreamTracks. A depth-based
        MediaStreamTrack, referred to as a depth stream track,
        represents an abstraction of a stream of frames that can each be
        converted to objects which contain an array of pixel data, where each
        pixel represents the distance between the camera and the objects in the
        scene for that point in the array. A MediaStream object that
        contains one or more depth stream tracks is referred to as a
        depth-only stream or depth+color stream.
      
This specification attempts to address the Use Cases and Requirements for accessing a depth stream from a depth camera. See also the Examples section for concrete usage examples.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key word MUST in this document is to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, it appears in all capitals, as shown here.
This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [WEBIDL], as this specification uses that specification and terminology.
        The MediaStreamTrack
        and MediaStream
        interfaces this specification extends are defined in [GETUSERMEDIA].
      
        The concepts Constraints,
        Capabilities,
        ConstraintSet,
        and Settings, and
        types
        of constrainable properties are defined in [GETUSERMEDIA].
      
        The ConstrainDOMString
        type is defined in [GETUSERMEDIA].
      
        MediaTrackSettings,
        MediaTrackConstraints,
        MediaTrackSupportedConstraints,
        MediaTrackCapabilities,
        and MediaTrackConstraintSet
        dictionaries this specification extends are defined in
        [GETUSERMEDIA].
      
        The getUserMedia()
        method is defined in [GETUSERMEDIA].
      
        The concepts muted and
        disabled as applied
        to MediaStreamTrack are defined in [GETUSERMEDIA].
      
The terms source and consumer are defined in [GETUSERMEDIA].
        The MediaDeviceKind
        enumeration is defined in [GETUSERMEDIA].
      
        The video
        element and Canvas
        Pixel ArrayBuffer interfaces are defined in
        [HTML].
      
The meaning of dictionary member being present or not present is defined in [WEBIDL].
        The term depth+color stream means a MediaStream
        object that contains one or more MediaStreamTrack objects whose
        videoKind of Settings is "depth"
        (depth stream track) and one or more MediaStreamTrack
        objects whose videoKind of Settings is
        "color" (color stream track).
      
        The term depth-only stream means a MediaStream object
        that contains one or more MediaStreamTrack objects whose
        videoKind of Settings is "depth"
        (depth stream track) only.
      
        The term color-only stream means a MediaStream object
        that contains one or more MediaStreamTrack objects whose
        videoKind of Settings is "color"
        (color stream track) only, and optionally of kind
        "audio".
      
        The term depth stream track means a MediaStreamTrack
        object whose videoKind of Settings is
        "depth". It represents a media stream track whose
        source is a depth camera.
      
        The term color stream track means a MediaStreamTrack
        object whose videoKind of Settings is
        "color". It represents a media stream track whose
        source is a color camera.
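
The stream terms above can be illustrated with a small helper. This is a minimal sketch, not part of the specification: the function name classifyStream and its return strings are hypothetical, and it assumes every video track reports videoKind through getSettings(). Tracks that do not report videoKind are ignored.

```javascript
// Hypothetical helper: classify a MediaStream-like object using the terms
// defined above. Only getVideoTracks() and getSettings() are relied upon.
function classifyStream(stream) {
  const kinds = stream.getVideoTracks()
      .map((track) => track.getSettings().videoKind);
  const hasDepth = kinds.includes('depth');
  const hasColor = kinds.includes('color');
  if (hasDepth && hasColor) return 'depth+color';
  if (hasDepth) return 'depth-only';
  if (hasColor) return 'color-only';
  // No track reported a videoKind setting.
  return 'unknown';
}
```

In a user agent that implements this extension, the helper could be called with the stream passed to the getUserMedia() promise callback.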
      
A depth map is an abstract representation of a frame of a depth stream track. A depth map is a two-dimensional array that contains information relating to the perpendicular distance of the surfaces of scene objects to the camera's near plane. The numeric values in the depth map are referred to as depth map values and represent distances to the near plane, normalized against the distance between the far and near planes.
A normalized depth map value has a range from 0 to 1, where the maximum depth map value of 1 corresponds to distances equal to the far plane. A normalized depth map value is represented using floating-point or unsigned fixed-point formats (see [OpenGL ES 3.0.5], section 2.1.6).
A depth map's near plane and far plane are concepts of 3D graphics that define the camera viewing volume (view frustum). Their definition is outside the scope of this specification.
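As a worked illustration of the normalization described above, the following sketch converts a 16-bit unsigned fixed-point sample to a normalized depth map value, and a normalized value to a distance from the near plane. The function names and the near and far parameters are hypothetical; this specification does not expose the near and far plane distances, so they are assumed to be known to the application by other means.

```javascript
// Convert a 16-bit unsigned fixed-point sample (0..65535) to a normalized
// depth map value in the range 0..1.
function normalizeUint16(raw) {
  return raw / 65535;
}

// Convert a normalized depth map value to a distance from the near plane.
// `near` and `far` are assumed inputs (same unit, for example meters).
function distanceFromNearPlane(value, near, far) {
  return value * (far - near);
}
```

For example, with an assumed near plane at 0.5 and far plane at 4.5, a normalized value of 0.5 corresponds to a distance of 2 from the near plane.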
If the implementation is unable to report the value represented by any of the dictionary members, they are not present in the dictionary.
MediaTrackSupportedConstraints dictionary
        
          MediaTrackSupportedConstraints dictionary represents the list
          of Constraints recognized by a user agent for
          controlling the Capabilities of a MediaStreamTrack
          object.
        
          Partial dictionary MediaTrackSupportedConstraints extends the
          original dictionary defined in [GETUSERMEDIA]. The dictionary
          value true represents an applicable constraint.
        
          An applicable constraint is not omitted by the user
          agent in step 6.2.2 in the getUserMedia() algorithm.
        
partial dictionary MediaTrackSupportedConstraints {
    // Applies to both depth stream track and color stream track:
    boolean videoKind = true;
};
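A page can check whether the user agent recognizes the videoKind constraint before requesting a depth stream. The following is a minimal sketch: the helper name supportsVideoKind is hypothetical, and it only relies on getSupportedConstraints() as defined in [GETUSERMEDIA]. In user agents that do not implement this extension, the member is absent and the helper returns false.

```javascript
// Hypothetical helper: returns true only if the given MediaDevices-like
// object lists `videoKind` among its supported constraints.
function supportsVideoKind(mediaDevices) {
  if (!mediaDevices ||
      typeof mediaDevices.getSupportedConstraints !== 'function') {
    return false;
  }
  return mediaDevices.getSupportedConstraints().videoKind === true;
}

// In a browser: supportsVideoKind(navigator.mediaDevices)
```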
      MediaTrackCapabilities dictionary
        
          MediaTrackCapabilities dictionary represents the
          Capabilities of a MediaStreamTrack object.
        
          Partial dictionary MediaTrackCapabilities extends the original
          MediaTrackCapabilities dictionary defined in
          [GETUSERMEDIA].
        
partial dictionary MediaTrackCapabilities {
    // Applies to both depth stream track and color stream track:
    DOMString videoKind;
};
      MediaTrackConstraintSet dictionary
        
          ConstraintSet dictionary specifies each member's set of
          allowed values.
        
          The allowed values for ConstrainDOMString type are
          defined in [GETUSERMEDIA].
        
partial dictionary MediaTrackConstraintSet {
    // Applies to both depth stream track and color stream track:
    ConstrainDOMString videoKind;
};
      MediaTrackSettings dictionary
        
          MediaTrackSettings dictionary represents the Settings
          of a MediaStreamTrack object.
        
          Partial dictionary MediaTrackSettings extends the original
          MediaTrackSettings dictionary.
        
partial dictionary MediaTrackSettings {
    // Applies to both depth stream track and color stream track:
    DOMString           videoKind;
};
        
            The videoKind constrainable property is defined to
            apply to both color stream track and depth stream
            track. The videoKind member specifies the
            video kind of the source.
          
enum VideoKindEnum {
    "color",
    "depth"
};
          
            The VideoKindEnum enumeration defines the valid video
            kinds: color for
            color stream track whose source is a color camera,
            and depth for depth
            stream track whose source is a depth camera.
          
            The MediaStream consumer for the depth-only
            stream and depth+color stream is the video
            element [HTML].
          
            If a MediaStreamTrack whose videoKind is
            depth is muted or
            disabled, it MUST render frames as if all the pixels were
            0.
          
This section is non-normative.
Depth map values that the camera produces are often in 16-bit normalized unsigned fixed-point format. An application developer can access the data using the red color component of the canvas pixel ArrayBuffer, but that causes a precision loss because the component is in 8-bit normalized unsigned fixed-point format.
            The same precision loss applies when using [WEBGL]
            UNSIGNED_BYTE textures. To access the full
            precision, an application developer can
            use [WEBGL] floating-point textures.
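
The precision loss described above can be illustrated with a short calculation. This is a sketch with hypothetical helper names that rescales a 16-bit normalized sample to 8 bits and back, as happens when the value passes through an 8-bit color component.

```javascript
// Rescale a 16-bit normalized sample (0..65535) to 8 bits (0..255).
function toUint8(raw16) {
  return Math.round(raw16 / 65535 * 255);
}

// Rescale an 8-bit normalized sample (0..255) back to 16 bits (0..65535).
function toUint16(raw8) {
  return Math.round(raw8 / 255 * 65535);
}

const original = 1000;
const roundTripped = toUint16(toUint8(original));
// roundTripped is 1028: each 8-bit step covers 257 distinct 16-bit values.
```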
          
Several use cases are a good fit for at least partial implementation on the GPU, such as motion recognition, pattern recognition, background removal, and 3D point clouds.
This section explains which APIs can be used for some of these use cases; concrete examples are provided in the Examples section.
              A video element whose source is a MediaStream
              object containing a depth stream track may be
              uploaded to a [WEBGL] texture of format
              RGBA or RED and type
              FLOAT. See the specification [WEBGL] and the
              upload to float texture example code.
            
For each pixel of this WebGL texture, the R component represents the normalized floating-point depth map value.
Here we list some of the possible approaches.
This section is non-normative.
navigator.mediaDevices.getUserMedia({
  video: {videoKind: {exact: "color"}, groupId: {exact: id}}
}).then(function (stream) {
    // Wire the media stream into a <video> element for playback.
    // The RGB video is rendered.
    var video = document.querySelector('#video');
    video.srcObject = stream;
    video.play();
  }
);
navigator.mediaDevices.getUserMedia({
  video: {videoKind: {exact: "depth"}, groupId: {exact: id}}
}).then(function (stream) {
    // Wire the depth-only stream into another <video> element for playback.
    // The depth information is rendered in its grayscale representation.
    var depthVideo = document.querySelector('#depthVideo');
    depthVideo.srcObject = stream;
    depthVideo.play();
  }
);
This code sets up a video element from a depth stream and uploads it to a WebGL 2.0 float texture.
navigator.mediaDevices.getUserMedia({
  video: {videoKind: {exact: "depth"}}
}).then(function (stream) {
  // wire the stream into a <video> element for playback
  var depthVideo = document.querySelector('#depthVideo');
  depthVideo.srcObject = stream;
  depthVideo.play();
}).catch(function (reason) {
  // handle gUM error here
});
let gl = canvas.getContext("webgl2");
// Enable the standard WebGL 2.0 extension that makes floating-point textures
// such as R32F color-renderable, which is needed to read them back through a
// framebuffer.
gl.getExtension('EXT_color_buffer_float');
// Later, in the rendering loop ...
gl.bindTexture(gl.TEXTURE_2D, depthTexture);
gl.texImage2D(
   gl.TEXTURE_2D,
   0,
   gl.R32F,
   gl.RED,
   gl.FLOAT,
   depthVideo);
This example extends the upload to float texture example.
This code creates the texture to which we will upload the depth video frame. Then, it sets up a named framebuffer, attaches the texture as a color attachment and, after uploading the depth video to the texture, reads the texture content into a Float32Array.
// Initialize texture and framebuffer for reading back the texture.
let depthTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, depthTexture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
let framebuffer = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
gl.framebufferTexture2D(
  gl.FRAMEBUFFER,
  gl.COLOR_ATTACHMENT0,
  gl.TEXTURE_2D,
  depthTexture,
  0);
let buffer;
// Later, in the rendering loop ...
gl.bindTexture(gl.TEXTURE_2D, depthTexture);
gl.texImage2D(
   gl.TEXTURE_2D,
   0,
   gl.R32F,
   gl.RED,
   gl.FLOAT,
   depthVideo);
if (!buffer) {
  buffer =
      new Float32Array(depthVideo.videoWidth * depthVideo.videoHeight);
}
gl.readPixels(
  0,
  0,
  depthVideo.videoWidth,
  depthVideo.videoHeight,
  gl.RED,
  gl.FLOAT,
  buffer);
            Use
            gl.getParameter(gl.IMPLEMENTATION_COLOR_READ_FORMAT);
            to check whether readPixels to gl.RED or gl.RGBA float is
            supported.
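
The check mentioned above could be wrapped as follows. This is a minimal sketch: the helper name chooseFloatReadback and the shape of its return value are hypothetical. It assumes a WebGL 2.0 context with the float framebuffer currently bound, and falls back to gl.RGBA with gl.FLOAT, which WebGL 2.0 accepts for floating-point color buffers, when single-channel float readback is not reported.

```javascript
// Hypothetical helper: choose a readPixels format for a float framebuffer
// and allocate a Float32Array of matching size.
function chooseFloatReadback(gl, width, height) {
  // Both queries reflect the currently bound read framebuffer.
  const format = gl.getParameter(gl.IMPLEMENTATION_COLOR_READ_FORMAT);
  const type = gl.getParameter(gl.IMPLEMENTATION_COLOR_READ_TYPE);
  const singleChannel = format === gl.RED && type === gl.FLOAT;
  const channels = singleChannel ? 1 : 4;
  return {
    format: singleChannel ? gl.RED : gl.RGBA,
    type: gl.FLOAT,
    channels: channels,
    buffer: new Float32Array(width * height * channels),
  };
}
```

With the RGBA fallback, the depth map value of each pixel is in the first of every four floats in the buffer.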
          
This section is non-normative.
The privacy and security considerations discussed in [GETUSERMEDIA] apply to this extension specification.
Thanks to everyone who contributed to the Use Cases and Requirements and sent feedback and comments. Special thanks to Ningxin Hu for experimental implementations, as well as to Project Tango for their experiments.