Copyright © 2017 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This specification extends the Media Capture and Streams specification [GETUSERMEDIA] to allow a depth-only stream or combined depth+color stream to be requested from the web platform using APIs familiar to web authors.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This extension specification defines a new media type and constrainable property per the Extensibility guidelines of the Media Capture and Streams specification [GETUSERMEDIA]. Horizontal reviews and feedback from early implementations of this specification are encouraged.
This document was published by the Device and Sensors Working Group and the Web Real-Time Communications Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-media-capture@w3.org (subscribe, archives). All comments are welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures (Device and Sensors Working Group) and a public list of any patent disclosures (Web Real-Time Communications Working Group) made in connection with the deliverables of each group; these pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 March 2017 W3C Process Document.
Depth cameras are increasingly being integrated into devices such as phones, tablets, and laptops. Depth cameras provide a depth map, which conveys the distance information between points on an object's surface and the camera. With depth information, web content and applications can be enhanced by, for example, the use of hand gestures as an input mechanism, or by creating 3D models of real-world objects that can interact and integrate with the web platform. Concrete applications of this technology include more immersive gaming experiences, more accessible 3D video conferences, and augmented reality, to name a few.
To bring depth capability to the web platform, this specification extends the MediaStream interface [GETUSERMEDIA] to enable it to also contain depth-based MediaStreamTracks. A depth-based MediaStreamTrack, referred to as a depth stream track, represents an abstraction of a stream of frames that can each be converted to objects which contain an array of pixel data, where each pixel represents the distance between the camera and the objects in the scene for that point in the array. A MediaStream object that contains one or more depth stream tracks is referred to as a depth-only stream or depth+color stream.
This specification attempts to address the Use Cases and Requirements for accessing a depth stream from a depth camera. See also the Examples section for concrete usage examples.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key word MUST is to be interpreted as described in [RFC2119].
This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [WEBIDL], as this specification uses that specification and terminology.
The MediaStreamTrack and MediaStream interfaces this specification extends are defined in [GETUSERMEDIA].
The concepts Constraints, Capabilities, ConstraintSet, and Settings, and the types of constrainable properties are defined in [GETUSERMEDIA].
The ConstrainDOMString, ConstrainDouble, ConstrainBoolean, and DoubleRange types are defined in [GETUSERMEDIA].
The MediaTrackSettings, MediaTrackConstraints, MediaTrackSupportedConstraints, MediaTrackCapabilities, and MediaTrackConstraintSet dictionaries this specification extends are defined in [GETUSERMEDIA].
The getUserMedia() and getSettings() methods and the NavigatorUserMediaSuccessCallback callback are defined in [GETUSERMEDIA].
The concepts muted, disabled, and overconstrained as applied to MediaStreamTrack are defined in [GETUSERMEDIA].
The terms source and consumer are defined in [GETUSERMEDIA].
The MediaDeviceKind enumeration is defined in [GETUSERMEDIA].
The video element and ImageData (and its data attribute and Canvas Pixel ArrayBuffer), VideoTrack, HTMLMediaElement (and its srcObject attribute), and HTMLVideoElement interfaces and the CanvasImageSource typedef are defined in [HTML].
The terms media data, media provider object, assigned media provider object, and the concept potentially playing are defined in [HTML].
The term permission and the permission name "camera" are defined in [PERMISSIONS].
The DataView, Uint8ClampedArray, and Uint16Array buffer source types are defined in [WEBIDL].
The meaning of a dictionary member being present or not present, and its default value, are defined in [WEBIDL].
The term depth+color stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "depth" (depth stream track) and one or more MediaStreamTrack objects whose videoKind of Settings is "color" (color stream track).
The term depth-only stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "depth" (depth stream track) only.
The term color-only stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "color" (color stream track) only, and optionally of kind "audio".
The term depth stream track means a MediaStreamTrack object whose videoKind of Settings is "depth". It represents a media stream track whose source is a depth camera.
The term color stream track means a MediaStreamTrack object whose videoKind of Settings is "color". It represents a media stream track whose source is a color camera.
A depth map is an abstract representation of a frame of a depth stream track. A depth map is a two-dimensional array that contains information relating to the perpendicular distance of the surfaces of scene objects to the camera's near plane. The numeric values in the depth map are referred to as depth map values and represent distances to the near plane normalized against the distance between the far and near planes.
A normalized depth map value has a range from 0 to 1, where the maximum depth map value of 1 corresponds to distances equal to the far value. Following the conversion between depth map value and distance, the minimum value of 0 would correspond to distances equal to the near value, but 0 has a special meaning: it is an invalid depth map value and represents that the user agent is unable to acquire depth information for the given pixel for any reason. A normalized depth map value is represented using floating-point or unsigned fixed-point formats ([OpenGL ES 3.0.5], section 2.1.6).
A depth map has an associated near value which is a double. It represents the minimum range in meters and it defines the near plane, which is a plane perpendicular to the camera viewing direction at distance near value from the camera origin.
A depth map has an associated far value which is a double. It represents the maximum range in meters and it defines the far plane, which is a plane perpendicular to the camera viewing direction at distance far value from the camera origin.
A depth map has an associated horizontal focal length which is a double. It represents the horizontal focal length of the depth camera, in pixels.
A depth map has an associated vertical focal length which is a double. It represents the vertical focal length of the depth camera, in pixels.
A depth map has an associated principal point, specified by principal point x and principal point y coordinates which are doubles. It is a concept defined in the pinhole camera model: the projection of the perspective center onto the image plane.
A depth map has an associated transformation from depth to video, which is a transformation matrix represented by a Transformation dictionary. It is used to translate a position in the depth camera's 3D coordinate system to the 3D coordinate system of the RGB video stream's camera (identified by videoDeviceId). After projecting depth 2D pixel coordinates to 3D space, we use this matrix to transform depth camera 3D space coordinates to RGB video camera 3D space.
Both depth and color cameras usually introduce significant distortion caused by the camera and lens. While in some cases the effects are not noticeable, these distortions cause errors in image analysis. To map depth map pixel values to corresponding color video track pixels, we use two DistortionCoefficients dictionaries: deprojection distortion coefficients and projection distortion coefficients.
Deprojection distortion coefficients are used to compensate for camera distortion when deprojecting 2D pixel coordinates to 3D space coordinates. Projection distortion coefficients are used in the opposite case, when projecting camera 3D space points to pixels. One track doesn't have both of the coefficients specified. The most common scenario is that the depth track has deprojection distortion coefficients or that the color video track has projection distortion coefficients. For the details, see the algorithm to map depth pixels to color pixels.
A depth map value is a distance to the near plane normalized against the distance between the far and near planes:

\[ \mathit{depth} = \frac{d - \mathit{near}}{\mathit{far} - \mathit{near}} \]

\[ d = \mathit{depth} \cdot (\mathit{far} - \mathit{near}) + \mathit{near} \]
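As a non-normative illustration, this conversion can be written in JavaScript as follows (the function names are illustrative, not part of any API):

function depthToMeters(depth, near, far) {
  // 0 is the invalid depth map value; no distance can be recovered.
  if (depth === 0) return NaN;
  return depth * (far - near) + near;
}

function metersToDepth(d, near, far) {
  return (d - near) / (far - near);
}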
If the implementation is unable to report the value represented by any of the dictionary members, they are not present in the dictionary.
MediaTrackSupportedConstraints dictionary

The MediaTrackSupportedConstraints dictionary represents the list of Constraints recognized by a user agent for controlling the Capabilities of a MediaStreamTrack object.

The partial dictionary MediaTrackSupportedConstraints extends the original dictionary defined in [GETUSERMEDIA]. The dictionary value true represents an applicable constraint. An applicable constraint is not omitted by the user agent in step 6.2.2 of the getUserMedia() algorithm.
partial dictionary MediaTrackSupportedConstraints {
    // Apply to both depth stream track and color stream track:
    boolean videoKind = true;
    boolean focalLengthX = false;
    boolean focalLengthY = false;
    boolean principalPointX = false;
    boolean principalPointY = false;
    boolean deprojectionDistortionCoefficients = false;
    boolean projectionDistortionCoefficients = false;
    // Apply to depth stream track:
    boolean depthNear = false;
    boolean depthFar = false;
    boolean depthToVideoTransform = false;
};
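A non-normative usage sketch: before requesting a depth stream, a page can check whether the user agent recognizes these constraints, using getSupportedConstraints() as defined in [GETUSERMEDIA]:

const supported = navigator.mediaDevices.getSupportedConstraints();
if (supported.videoKind) {
  // The user agent recognizes the videoKind constraint and will not
  // omit it in the getUserMedia() algorithm.
}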
MediaTrackCapabilities dictionary

The MediaTrackCapabilities dictionary represents the Capabilities of a MediaStreamTrack object. The partial dictionary MediaTrackCapabilities extends the original MediaTrackCapabilities dictionary defined in [GETUSERMEDIA].
partial dictionary MediaTrackCapabilities {
    // Apply to both depth stream track and color stream track:
    DOMString videoKind;
    (double or DoubleRange) focalLengthX;
    (double or DoubleRange) focalLengthY;
    (double or DoubleRange) principalPointX;
    (double or DoubleRange) principalPointY;
    boolean deprojectionDistortionCoefficients;
    boolean projectionDistortionCoefficients;
    // Apply to depth stream track:
    (double or DoubleRange) depthNear;
    (double or DoubleRange) depthFar;
    boolean depthToVideoTransform;
};
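A non-normative sketch, assuming stream is a MediaStream obtained from getUserMedia(): the capabilities of a live track are read with getCapabilities() [GETUSERMEDIA]:

const [track] = stream.getVideoTracks();
const capabilities = track.getCapabilities();
if (capabilities.videoKind === "depth") {
  // depthNear and depthFar may each be a double or a DoubleRange.
  console.log(capabilities.depthNear, capabilities.depthFar);
}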
MediaTrackConstraintSet dictionary

The ConstraintSet dictionary specifies each member's set of allowed values. The allowed values for the ConstrainDOMString, ConstrainDouble, and ConstrainBoolean types are defined in [GETUSERMEDIA].
partial dictionary MediaTrackConstraintSet {
    // Apply to both depth stream track and color stream track:
    ConstrainDOMString videoKind;
    ConstrainDouble focalLengthX;
    ConstrainDouble focalLengthY;
    ConstrainDouble principalPointX;
    ConstrainDouble principalPointY;
    ConstrainBoolean deprojectionDistortionCoefficients;
    ConstrainBoolean projectionDistortionCoefficients;
    // Apply to depth stream track:
    ConstrainDouble depthNear;
    ConstrainDouble depthFar;
    ConstrainBoolean depthToVideoTransform;
};
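A non-normative sketch of requesting a depth stream track constrained to a given working range (the numeric values are illustrative):

navigator.mediaDevices.getUserMedia({
  video: {
    videoKind: {exact: "depth"},
    depthNear: 0.5, // meters
    depthFar: 4.0   // meters
  }
}).then(function (stream) {
  // Use the depth stream.
});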
MediaTrackSettings dictionary

The MediaTrackSettings dictionary represents the Settings of a MediaStreamTrack object. The partial dictionary MediaTrackSettings extends the original MediaTrackSettings dictionary.
partial dictionary MediaTrackSettings {
    // Apply to both depth stream track and color stream track:
    DOMString videoKind;
    double focalLengthX;
    double focalLengthY;
    double principalPointX;
    double principalPointY;
    DistortionCoefficients deprojectionDistortionCoefficients;
    DistortionCoefficients projectionDistortionCoefficients;
    // Apply to depth stream track:
    double depthNear;
    double depthFar;
    Transformation depthToVideoTransform;
};
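A non-normative sketch, assuming depthTrack is a depth stream track: the current settings, including the camera intrinsics used by the mapping algorithm below, are read with getSettings() [GETUSERMEDIA]:

const settings = depthTrack.getSettings();
if (settings.videoKind === "depth") {
  const {focalLengthX, focalLengthY,
         principalPointX, principalPointY,
         depthNear, depthFar} = settings;
  // Feed the intrinsics to, for example, the shader uniforms used in
  // the 3D point cloud rendering example.
}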
DistortionCoefficients dictionary
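The dictionary's shape, sketched here from the member list described below (each coefficient is a double):

dictionary DistortionCoefficients {
    double k1;
    double k2;
    double p1;
    double p2;
    double k3;
};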
The DistortionCoefficients dictionary has the k1, k2, p1, p2 and k3 dictionary members that represent the deprojection distortion coefficients or projection distortion coefficients. k1, k2 and k3 are radial distortion coefficients, while p1 and p2 are tangential distortion coefficients. Radial and tangential distortion coefficients are used to deproject a depth value to 3D space or to project a 3D value to 2D video frame coordinates. See the algorithm to map depth pixels to color pixels and the Brown-Conrady distortion model implementation in the 3D point cloud rendering example GLSL shader.
Transformation dictionary

dictionary Transformation {
    Float32Array transformationMatrix;
    DOMString videoDeviceId;
};
The Transformation dictionary has the transformationMatrix dictionary member: a 16-element array that defines the transformation matrix from the depth map camera's 3D coordinate system to the video track camera's 3D coordinate system. The first four elements of the array correspond to the first matrix row, followed by the four elements of the second matrix row, and so on. It is in a format suitable for use with WebGL's uniformMatrix4fv.

The videoDeviceId dictionary member represents the deviceId of the video camera that the depth stream must be synchronized with. The value of videoDeviceId can be used as the deviceId constraint in [GETUSERMEDIA] to get the corresponding video and audio streams.
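A non-normative sketch tying the two members together, assuming depthTrack is a depth stream track and u_depth_to_color is the mat4 uniform location from the example shader:

const transform = depthTrack.getSettings().depthToVideoTransform;
// Upload the 16-element matrix; in WebGL the transpose argument of
// uniformMatrix4fv must be false.
gl.uniformMatrix4fv(u_depth_to_color, false, transform.transformationMatrix);
// Request the color stream this depth track is calibrated against.
navigator.mediaDevices.getUserMedia({
  video: {deviceId: {exact: transform.videoDeviceId}}
}).then(function (colorStream) {
  // Use the synchronized color stream.
});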
The following constrainable properties are defined to apply to both color stream track and depth stream track.
videoKind

The videoKind member specifies the video kind of the source.

enum VideoKindEnum {
    "color",
    "depth"
};
The VideoKindEnum enumeration defines the valid video kinds: "color" for a color stream track whose source is a color camera, and "depth" for a depth stream track whose source is a depth camera.
The MediaStream consumer for the depth-only stream and depth+color stream is the video element [HTML].

If a MediaStreamTrack whose videoKind is "depth" is muted or disabled, it MUST render frames as if all the pixels were 0.
A color stream track and a depth stream track can be combined into one depth+color stream. The rendering of the two tracks is intended to be synchronized, their resolutions are intended to be the same, and the coordinates of the two tracks are intended to be calibrated. These are not hard requirements, since it might not be possible to synchronize tracks from sources.
This approach is simple to use but comes with the following caveats: it might not be supported by the implementation, and making the resolutions of the two tracks the same can require downsampling and degrade quality. The alternative approach is that a web developer implements the algorithm to map depth pixels to color pixels. See the 3D point cloud rendering example code.
focalLengthX

The focalLengthX member specifies the horizontal focal length, in pixels.

focalLengthY

The focalLengthY member specifies the vertical focal length, in pixels.

principalPointX

The principalPointX member specifies the principal point x coordinate, in pixels.

principalPointY

The principalPointY member specifies the principal point y coordinate, in pixels.

deprojectionDistortionCoefficients

The deprojectionDistortionCoefficients member specifies the MediaStreamTrack's deprojection distortion coefficients used when deprojecting from 2D to 3D space.

projectionDistortionCoefficients

The projectionDistortionCoefficients member specifies the MediaStreamTrack's projection distortion coefficients used when projecting from 3D to 2D space.
The following constrainable properties are defined to apply only to depth stream track.
depthNear and depthFar

The depthNear member specifies the near value, in meters. The depthFar member specifies the far value, in meters.

The depthNear and depthFar constrainable properties, when set, allow the implementation to pick the best depth camera mode optimized for the range [depthNear, depthFar] and help minimize the error introduced by the lossy conversion from the depth value d to a quantized 8-bit value and back to an approximation of d.
If the depthFar property's value is less than the depthNear property's value, the depth stream track is overconstrained.
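A non-normative sketch of adjusting the range on a live track, assuming depthTrack is a depth stream track (the values are illustrative):

depthTrack.applyConstraints({
  depthNear: {min: 0.5}, // meters
  depthFar: {max: 4.0}   // meters
}).catch(function (error) {
  // A range that cannot be satisfied, such as depthFar less than
  // depthNear, leaves the track overconstrained and rejects with an
  // OverconstrainedError.
  console.error(error.name, error.constraint);
});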
depthToVideoTransform

The depthToVideoTransform member specifies the transformation from the depth map camera's 3D coordinate system to the video camera's 3D coordinate system.
This section is non-normative.
Depth map values that the camera produces are often in 16-bit normalized unsigned fixed-point format. An application developer can access the data using the canvas pixel ArrayBuffer's red color component, but that would cause a precision loss given that it is in 8-bit normalized unsigned fixed-point format. The same precision loss applies to the use of [WEBGL] UNSIGNED_BYTE textures. In order to access the full precision, an application developer can use [WEBGL] floating-point textures.
There are several use cases which are a good fit to be, at least partially, implemented on the GPU, such as motion recognition, pattern recognition, background removal, and 3D point cloud rendering. This section explains which APIs can be used for some of these use cases; the concrete examples are provided in the Examples section. Here we list some of the possible approaches.
A video element whose source is a MediaStream object containing a depth stream track may be uploaded to a [WEBGL] texture of format RGBA or RED and type FLOAT. See the [WEBGL] specification and the upload to float texture example code. For each pixel of this WebGL texture, the R component represents the normalized floating-point depth map value.
This section is non-normative.
The algorithms presented in this section explain how a web developer can map depth and color pixels. A concrete example of how to do the mapping is provided in the example vertex shader used for 3D point cloud rendering.
When rendering, we want to position a color value from the color video frame at the corresponding depth map value, or at the 3D point in space defined by the depth map value. We use deprojection distortion coefficients to compensate for camera distortion when deprojecting 2D pixel coordinates to 3D space coordinates, and projection distortion coefficients in the opposite case, when projecting camera 3D space points to pixels.
The algorithm to map depth pixels to color pixels is as follows:
The algorithm to deproject a depth map value to a point in depth camera 3D space is as follows:

Let dx and dy be the 2D coordinates, in pixels, of a pixel in the depth map.
Let dz be the depth map value of the same pixel in the depth map.
Let fx and fy be the depth map's horizontal focal length and vertical focal length, respectively.
Let cx and cy be the depth map's principal point 2D coordinates.
Let the 3D coordinates (Xd, Yd, Zd) be the output of this step: a 3D point in the depth camera's 3D coordinate system.

\[ px = \frac{dx - cx}{fx}, \qquad py = \frac{dy - cy}{fy} \]

If deprojection distortion coefficients are not present, the 3D coordinates (Xd, Yd, Zd) in depth camera space are calculated as:

\[ Xd = dz \cdot px, \qquad Yd = dz \cdot py, \qquad Zd = dz \]

If deprojection distortion coefficients are present, the 3D coordinates (Xd, Yd, Zd) in depth camera space are calculated as:

\[ r2 = px^2 + py^2 \]
\[ r = 1 + k1 \cdot r2 + k2 \cdot r2^2 + k3 \cdot r2^3 \]
\[ Xd = dz \cdot \left( px \cdot r + 2 \cdot p1 \cdot px \cdot py + p2 \cdot (r2 + 2 \cdot px^2) \right) \]
\[ Yd = dz \cdot \left( py \cdot r + 2 \cdot p2 \cdot px \cdot py + p1 \cdot (r2 + 2 \cdot py^2) \right) \]
\[ Zd = dz \]

See the depth_deproject function in the 3D point cloud rendering example.
The result of the deproject depth map value to 3D point step, the 3D point (Xd, Yd, Zd), is in the depth camera's 3D coordinate system. To get the coordinates of the same point in space, but in the color camera's 3D coordinate system, we multiply the transformation from depth to video matrix by the (Xd, Yd, Zd) 3D point vector.

Let (Xc, Yc, Zc) be the output of this step: the 3D coordinates of the depth map value projected into color camera 3D space.
Let M be the transformation matrix defined in the depth map's depthToVideoTransform field.

To multiply the 4x4 matrix by a 3-element vector, we extend the 3D vector by one element to a 4-dimensional vector. After multiplication, we use the resulting vector's x, y and z coordinates as the result:

\[ \begin{pmatrix} Xc \\ Yc \\ Zc \end{pmatrix} = \left( M \times \begin{pmatrix} Xd \\ Yd \\ Zd \\ 1 \end{pmatrix} \right)_{xyz} \]
In the 3D point cloud rendering example, this is done by:

vec4 color_point = u_depth_to_color * vec4(depth_point, 1.0);
To project from color camera 3D coordinates to 2D pixel coordinates we use the corresponding color track's MediaTrackSettings. We get the color track using the depth map's Transformation.videoDeviceId: it identifies the target color video device whose deviceId should be used as a constraint with a [GETUSERMEDIA] call to get the corresponding color video stream track. After that, we use the color track's getSettings() to access its MediaTrackSettings.

Let \(fx_c\) and \(fy_c\) be the color track's horizontal focal length and vertical focal length, respectively.
Let \(cx_c\) and \(cy_c\) be the color track's principal point 2D coordinates.
The result of this step is the 2D coordinate (x, y) of a pixel in the color video frame.
If projection distortion coefficients are present, the position (x, y) of the pixel in the color frame image is calculated as:

\[ r2_c = \left(\frac{Xc}{Zc}\right)^2 + \left(\frac{Yc}{Zc}\right)^2 \]
\[ r = 1 + k1 \cdot r2_c + k2 \cdot r2_c^2 + k3 \cdot r2_c^3 \]
\[ px_c = r \cdot \frac{Xc}{Zc}, \qquad py_c = r \cdot \frac{Yc}{Zc} \]
\[ x = \left( px_c + 2 \cdot p1 \cdot px_c \cdot py_c + p2 \cdot (r2_c + 2 \cdot px_c^2) \right) \cdot fx_c + cx_c \]
\[ y = \left( py_c + 2 \cdot p2 \cdot px_c \cdot py_c + p1 \cdot (r2_c + 2 \cdot py_c^2) \right) \cdot fy_c + cy_c \]

If projection distortion coefficients are not present, the position (x, y) of the pixel in the color frame image is calculated as:

\[ px_c = \frac{Xc}{Zc}, \qquad py_c = \frac{Yc}{Zc} \]
\[ x = px_c \cdot fx_c + cx_c, \qquad y = py_c \cdot fy_c + cy_c \]
See the color_project function in the 3D point cloud rendering example.
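The complete mapping can also be written CPU-side. The following non-normative JavaScript sketch processes one depth pixel, assuming the intrinsics and distortion coefficients have already been read from the depth and color tracks' settings, and that M is the row-major 16-element transformationMatrix described above (all names are illustrative):

// Map one depth map pixel (dx, dy) with distance dz (in meters)
// to a pixel (x, y) in the color frame.
function mapDepthPixelToColorPixel(dx, dy, dz, depth, color, M) {
  // Deproject: depth map pixel to a 3D point in depth camera space.
  let px = (dx - depth.principalPointX) / depth.focalLengthX;
  let py = (dy - depth.principalPointY) / depth.focalLengthY;
  if (depth.deprojectionDistortionCoefficients) {
    const {k1, k2, p1, p2, k3} = depth.deprojectionDistortionCoefficients;
    const r2 = px * px + py * py;
    const r = 1 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
    const ux = px * r + 2 * p1 * px * py + p2 * (r2 + 2 * px * px);
    const uy = py * r + 2 * p2 * px * py + p1 * (r2 + 2 * py * py);
    px = ux;
    py = uy;
  }
  const Xd = dz * px, Yd = dz * py, Zd = dz;
  // Transform to color camera space (M is row-major, as described in
  // the Transformation dictionary section).
  const Xc = M[0] * Xd + M[1] * Yd + M[2] * Zd + M[3];
  const Yc = M[4] * Xd + M[5] * Yd + M[6] * Zd + M[7];
  const Zc = M[8] * Xd + M[9] * Yd + M[10] * Zd + M[11];
  // Project: 3D point in color camera space to a color frame pixel.
  let pxc = Xc / Zc, pyc = Yc / Zc;
  if (color.projectionDistortionCoefficients) {
    const {k1, k2, p1, p2, k3} = color.projectionDistortionCoefficients;
    const r2 = pxc * pxc + pyc * pyc;
    const r = 1 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
    const sx = pxc * r, sy = pyc * r;
    pxc = sx + 2 * p1 * sx * sy + p2 * (r2 + 2 * sx * sx);
    pyc = sy + 2 * p2 * sx * sy + p1 * (r2 + 2 * sy * sy);
  }
  return {
    x: pxc * color.focalLengthX + color.principalPointX,
    y: pyc * color.focalLengthY + color.principalPointY
  };
}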
This section is non-normative.
navigator.mediaDevices.getUserMedia({
video: {videoKind: {exact: "color"}, groupId: {exact: id}}
}).then(function (stream) {
// Wire the media stream into a <video> element for playback.
// The RGB video is rendered.
var video = document.querySelector('#video');
video.srcObject = stream;
video.play();
});
navigator.mediaDevices.getUserMedia({
video: {videoKind: {exact: "depth"}, groupId: {exact: id}}
}).then(function (stream) {
// Wire the depth-only stream into another <video> element for playback.
// The depth information is rendered in its grayscale representation.
var depthVideo = document.querySelector('#depthVideo');
depthVideo.srcObject = stream;
depthVideo.play();
});
This code sets up a video element from a depth stream and uploads its frames to a WebGL 2.0 float texture.
navigator.mediaDevices.getUserMedia({
video: {videoKind: {exact: "depth"}}
}).then(function (stream) {
// wire the stream into a <video> element for playback
var depthVideo = document.querySelector('#depthVideo');
depthVideo.srcObject = stream;
depthVideo.play();
}).catch(function (reason) {
// handle gUM error here
});
let gl = canvas.getContext("webgl2");
// Activate the standard WebGL 2.0 extension for using single component R32F
// texture format.
gl.getExtension('EXT_color_buffer_float');
// Later, in the rendering loop ...
gl.bindTexture(gl.TEXTURE_2D, depthTexture);
gl.texImage2D(
gl.TEXTURE_2D,
0,
gl.R32F,
gl.RED,
gl.FLOAT,
depthVideo);
This section is non-normative.
This example extends the upload to float texture example. This code creates the texture to which we will upload the depth video frame. Then, it sets up a named framebuffer, attaches the texture as its color attachment and, after uploading the depth video frame to the texture, reads the texture content into a Float32Array.
// Initialize texture and framebuffer for reading back the texture.
let depthTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, depthTexture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
let framebuffer = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
gl.framebufferTexture2D(
gl.FRAMEBUFFER,
gl.COLOR_ATTACHMENT0,
gl.TEXTURE_2D,
depthTexture,
0);
let buffer;
// Later, in the rendering loop ...
gl.bindTexture(gl.TEXTURE_2D, depthTexture);
gl.texImage2D(
gl.TEXTURE_2D,
0,
gl.R32F,
gl.RED,
gl.FLOAT,
depthVideo);
if (!buffer) {
buffer =
new Float32Array(depthVideo.videoWidth * depthVideo.videoHeight);
}
gl.readPixels(
0,
0,
depthVideo.videoWidth,
depthVideo.videoHeight,
gl.RED,
gl.FLOAT,
buffer);
Use gl.getParameter(gl.IMPLEMENTATION_COLOR_READ_FORMAT) to check whether readPixels to gl.RED or gl.RGBA float is supported.
This vertex shader is used for 3D point cloud rendering. The code shows how a web developer can implement the algorithm to map depth pixels to color pixels. The draw call used is gl.drawArrays(gl.POINTS, 0, depthMap.width * depthMap.height). The shader outputs the 3D position of each vertex (gl_Position) and the color texture sampling coordinates per vertex.
<script id="fragment-shader" type="x-shader/x-fragment">#version 300 es
#define DISTORTION_NONE 0
#define USE_DEPTH_DEPROJECTION_DISTORTION_COEFFICIENTS 1
#define USE_COLOR_PROJECTION_DISTORTION_COEFFICIENTS 2
uniform mat4 u_mvp;
uniform vec2 u_color_size;
uniform vec2 u_depth_size;
uniform highp usampler2D s_depth_texture;
uniform float u_depth_scale_in_meter;
uniform mat4 u_depth_to_color;
uniform vec2 u_color_offset;
uniform vec2 u_color_focal_length;
uniform float u_color_coeffs[5];
uniform int u_color_projection_distortion;
uniform vec2 u_depth_offset;
uniform vec2 u_depth_focal_length;
uniform float u_depth_coeffs[5];
uniform int u_depth_deprojection_distortion;
out vec2 v_tex;
vec3 depth_deproject(vec2 pixel, float depth)
{
vec2 point = (pixel - u_depth_offset) / u_depth_focal_length;
if(u_depth_deprojection_distortion == USE_DEPTH_DEPROJECTION_DISTORTION_COEFFICIENTS)
{
float r2 = dot(point, point);
float f = 1.0 + u_depth_coeffs[0] * r2 + u_depth_coeffs[1] * r2 * r2 + u_depth_coeffs[4] * r2 * r2 * r2;
float ux = point.x * f + 2.0 * u_depth_coeffs[2] * point.x * point.y +
u_depth_coeffs[3] * (r2 + 2.0 * point.x * point.x);
float uy = point.y * f + 2.0 * u_depth_coeffs[3] * point.x * point.y +
u_depth_coeffs[2] * (r2 + 2.0 * point.y * point.y);
point = vec2(ux, uy);
}
return vec3(point * depth, depth);
}
vec2 color_project(vec3 point)
{
vec2 pixel = point.xy / point.z;
if(u_color_projection_distortion == USE_COLOR_PROJECTION_DISTORTION_COEFFICIENTS)
{
float r2 = dot(pixel, pixel);
float f = 1.0 + u_color_coeffs[0] * r2 + u_color_coeffs[1] * r2 * r2 +
u_color_coeffs[4] * r2 * r2 * r2;
pixel = pixel * f;
float dx = pixel.x + 2.0 * u_color_coeffs[2] * pixel.x * pixel.y +
u_color_coeffs[3] * (r2 + 2.0 * pixel.x * pixel.x);
float dy = pixel.y + 2.0 * u_color_coeffs[3] * pixel.x * pixel.y +
u_color_coeffs[2] * (r2 + 2.0 * pixel.y * pixel.y);
pixel = vec2(dx, dy);
}
return pixel * u_color_focal_length + u_color_offset;
}
void main()
{
vec2 depth_pixel;
// generate lattice pos; (0, 0) (1, 0) (2, 0) ... (w-1, h-1)
depth_pixel.x = mod(float(gl_VertexID) + 0.5, u_depth_size.x);
depth_pixel.y = clamp(floor(float(gl_VertexID) / u_depth_size.x) + 0.5, 0.0, u_depth_size.y);
// get depth
vec2 depth_tex_pos = depth_pixel / u_depth_size;
uint depth = texture(s_depth_texture, depth_tex_pos).r;
float depth_in_meter = float(depth) * u_depth_scale_in_meter;
vec3 depth_point = depth_deproject(depth_pixel, depth_in_meter);
vec4 color_point = u_depth_to_color * vec4(depth_point, 1.0);
vec2 color_pixel = color_project(color_point.xyz);
// map [0, w) to [0, 1]
v_tex = color_pixel / u_color_size;
gl_Position = u_mvp * vec4(depth_point, 1.0);
}
</script>
This section is non-normative.
The privacy and security considerations discussed in [GETUSERMEDIA] apply to this extension specification.
Thanks to everyone who contributed to the Use Cases and Requirements, sent feedback and comments. Special thanks to Ningxin Hu for experimental implementations, as well as to the Project Tango for their experiments.