WebXR Hand Input Module - Level 1

W3C First Public Working Draft, 22 October 2020

This version:
https://www.w3.org/TR/2020/WD-webxr-hand-input-1-20201022/
Latest published version:
https://www.w3.org/TR/webxr-hand-input-1/
Editor's Draft:
https://immersive-web.github.io/webxr-hand-input/
Issue Tracking:
GitHub
Inline In Spec
Editors:
(Invited Expert)
Participate:
File an issue (open issues)
Mailing list archive
W3C’s #immersive-web IRC
Unstable API

The API represented in this document is under development and may change at any time.

For additional context on the use of this API please reference the Hand Input Module Explainer.


Abstract

The WebXR Hand Input module expands the WebXR Device API with the functionality to track articulated hand poses.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

The Immersive Web Working Group maintains a list of all bug reports that the group has not yet addressed. This draft highlights some of the pending issues that are still to be discussed in the working group. No decision has been taken on the outcome of these issues including whether they are valid. Pull requests with proposed specification text for outstanding issues are strongly encouraged.

This document was published by the Immersive Web Working Group as a Working Draft. This document is intended to become a W3C Recommendation.

This document is a First Public Working Draft.

Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 15 September 2020 W3C Process Document.

This WebXR Hand Input Module is designed as a module to be implemented in addition to the WebXR Device API, and was originally included in the WebXR Device API, which was divided into core and modules.

1. Introduction

On some XR devices it is possible to get fully articulated information about the user’s hands when they are used as input sources.

This API exposes the pose of each skeleton joint of the user’s hands. This can be used to do gesture detection or to render a hand model in VR scenarios.

2. Initialization

If an application wants to view articulated hand pose information during a session, the session MUST be requested with an appropriate feature descriptor. The string "hand-tracking" is introduced by this module as a new valid feature descriptor for articulated hand tracking.

The "hand-tracking" feature descriptor should only be granted for an XRSession when its XR device has physical hand input sources that support hand tracking.

The user agent MAY gate support for hand-based XRInputSources based upon this feature descriptor.

NOTE: This means that if an XRSession does not request the "hand-tracking" feature descriptor, the user agent may choose not to support input controllers that are hand-based.
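The request described above can be sketched as follows. This is a minimal illustration, not normative text: buildSessionInit and SessionInitLike are hypothetical names, while the requiredFeatures and optionalFeatures members come from the XRSessionInit dictionary of the WebXR Device API.

```typescript
// Hypothetical local shape mirroring XRSessionInit from the WebXR Device API.
interface SessionInitLike {
  requiredFeatures?: string[];
  optionalFeatures?: string[];
}

// Build a session init dictionary that asks for articulated hand tracking.
// `required` decides whether session creation should fail when the feature
// cannot be granted (required) or proceed without hands (optional).
function buildSessionInit(required: boolean): SessionInitLike {
  const feature = "hand-tracking";
  return required
    ? { requiredFeatures: [feature] }
    : { optionalFeatures: [feature] };
}

// In a browser the result would be passed along the lines of:
//   navigator.xr.requestSession("immersive-vr", buildSessionInit(false))
const optionalInit = buildSessionInit(false);
console.log(JSON.stringify(optionalInit));
```

Listing the descriptor as optional lets the session start on devices without hand tracking, at the cost of having to check for hand data later.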

3. Physical Hand Input Sources

An XRInputSource is a physical hand input source if it tracks a physical hand. A physical hand input source supports hand tracking if it supports reporting the poses of one or more skeleton joints defined in this specification.

Physical hand input sources MUST include the input profile name of "generic-hand-select" in their profiles.

3.1. XRInputSource

partial interface XRInputSource {
   readonly attribute XRHand? hand;
};

The hand attribute on a physical hand input source that supports hand tracking will be an XRHand object giving access to the underlying hand-tracking capabilities. hand will have its input source set to this.

If the XRInputSource belongs to an XRSession that has not been requested with the "hand-tracking" feature descriptor, hand MUST be null.
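Because hand may be null, applications need to check it before use. A minimal sketch, assuming simplified stand-in types (InputSourceLike and HandLike are hypothetical and only mirror the fields used here):

```typescript
// Hypothetical stand-ins for XRHand and XRInputSource, reduced to the fields used here.
interface HandLike { length: number; }
interface InputSourceLike {
  handedness: string;      // "left", "right" or "none" in the WebXR Device API
  hand: HandLike | null;   // null when hand tracking was not requested or is unsupported
}

// Return only the input sources that expose articulated hand data.
function sourcesWithHands(sources: InputSourceLike[]): InputSourceLike[] {
  return sources.filter((source) => source.hand !== null);
}

// Mock data: one tracked hand and one ordinary controller.
const mockSources: InputSourceLike[] = [
  { handedness: "left", hand: { length: 25 } },
  { handedness: "right", hand: null },
];
console.log(sourcesWithHands(mockSources).map((s) => s.handedness));
```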

3.2. Skeleton Joints

A physical hand input source is made up of many skeleton joints.

A skeleton joint for a given hand can be uniquely identified by a skeleton joint index, which is a nonnegative integer.

A skeleton joint may have an associated bone, which it is named after and which is used to orient its -Z axis. The associated bone of a skeleton joint is the bone that comes after the joint when moving towards the fingertips. The tip and wrist joints have no associated bones.

A skeleton joint has a radius which is the radius of a sphere placed at its center so that it roughly touches the skin on both sides of the hand.

This specification defines the following skeleton joints:

Skeleton joint                          Skeleton joint index
Wrist                                   0
Thumb           Metacarpal              1
                Proximal Phalanx        2
                Distal Phalanx          3
                Tip                     4
Index finger    Metacarpal              5
                Proximal Phalanx        6
                Intermediate Phalanx    7
                Distal Phalanx          8
                Tip                     9
Middle finger   Metacarpal              10
                Proximal Phalanx        11
                Intermediate Phalanx    12
                Distal Phalanx          13
                Tip                     14
Ring finger     Metacarpal              15
                Proximal Phalanx        16
                Intermediate Phalanx    17
                Distal Phalanx          18
                Tip                     19
Little finger   Metacarpal              20
                Proximal Phalanx        21
                Intermediate Phalanx    22
                Distal Phalanx          23
                Tip                     24

Visual aid demonstrating joint layout
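The index layout is regular: the wrist comes first, then four thumb joints, then five joints for each remaining finger. A minimal sketch of turning an index back into a readable name, where describeJoint is a hypothetical helper and not part of the API:

```typescript
const THUMB_JOINTS = ["Metacarpal", "Proximal Phalanx", "Distal Phalanx", "Tip"];
const FINGERS = ["Index finger", "Middle finger", "Ring finger", "Little finger"];
const FINGER_JOINTS = [
  "Metacarpal", "Proximal Phalanx", "Intermediate Phalanx", "Distal Phalanx", "Tip",
];

// Map a skeleton joint index (0..24) to a human-readable joint name.
function describeJoint(index: number): string {
  if (index % 1 !== 0 || index < 0 || index > 24) {
    throw new RangeError("skeleton joint index must be an integer from 0 to 24");
  }
  if (index === 0) return "Wrist";
  // The thumb has no intermediate phalanx, so it occupies only indices 1..4.
  if (index <= 4) return "Thumb " + THUMB_JOINTS[index - 1];
  // The remaining fingers each occupy five consecutive indices starting at 5.
  const offset = index - 5;
  return FINGERS[Math.floor(offset / 5)] + " " + FINGER_JOINTS[offset % 5];
}

console.log(describeJoint(7));
```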

3.3. XRHand

[Exposed=Window]
interface XRHand {
    iterable<XRJointSpace>;
    readonly attribute unsigned long length;
    getter XRJointSpace joint(unsigned long jointIndex);

    const unsigned long WRIST = 0;

    const unsigned long THUMB_METACARPAL = 1;
    const unsigned long THUMB_PHALANX_PROXIMAL = 2;
    const unsigned long THUMB_PHALANX_DISTAL = 3;
    const unsigned long THUMB_PHALANX_TIP = 4;

    const unsigned long INDEX_METACARPAL = 5;
    const unsigned long INDEX_PHALANX_PROXIMAL = 6;
    const unsigned long INDEX_PHALANX_INTERMEDIATE = 7;
    const unsigned long INDEX_PHALANX_DISTAL = 8;
    const unsigned long INDEX_PHALANX_TIP = 9;

    const unsigned long MIDDLE_METACARPAL = 10;
    const unsigned long MIDDLE_PHALANX_PROXIMAL = 11;
    const unsigned long MIDDLE_PHALANX_INTERMEDIATE = 12;
    const unsigned long MIDDLE_PHALANX_DISTAL = 13;
    const unsigned long MIDDLE_PHALANX_TIP = 14;

    const unsigned long RING_METACARPAL = 15;
    const unsigned long RING_PHALANX_PROXIMAL = 16;
    const unsigned long RING_PHALANX_INTERMEDIATE = 17;
    const unsigned long RING_PHALANX_DISTAL = 18;
    const unsigned long RING_PHALANX_TIP = 19;

    const unsigned long LITTLE_METACARPAL = 20;
    const unsigned long LITTLE_PHALANX_PROXIMAL = 21;
    const unsigned long LITTLE_PHALANX_INTERMEDIATE = 22;
    const unsigned long LITTLE_PHALANX_DISTAL = 23;
    const unsigned long LITTLE_PHALANX_TIP = 24;
};

Every XRHand has an associated input source, which is the physical hand input source that it tracks.

Each XRHand has a list of joint spaces, which is a list of XRJointSpaces corresponding to each skeleton joint defined in this specification. Each of these has its hand set to this.

If an individual device does not support a joint defined in this specification, it MUST emulate it instead.

The list of joint spaces MUST NOT change over the course of a session.

The length attribute MUST return the number 25.

The joint(jointIndex) getter, when invoked, runs the following steps:

  1. Look for an XRJointSpace in this's list of joint spaces with joint index corresponding to jointIndex.

  2. Handle the result of the search as follows:

    If found:
      Return the XRJointSpace.
    Otherwise:
      Return null.
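The lookup steps above can be sketched as below. HandSketch and JointSpaceLike are hypothetical stand-ins used only to illustrate the search; a user agent would implement this natively.

```typescript
// Hypothetical stand-in for XRJointSpace, carrying only its joint index.
interface JointSpaceLike { jointIndex: number; }

// Illustrative model of an XRHand's list of joint spaces and its joint() lookup.
class HandSketch {
  readonly length = 25;
  private readonly jointSpaces: JointSpaceLike[] = [];

  constructor() {
    // One joint space per skeleton joint defined by the specification.
    for (let i = 0; i < this.length; i++) {
      this.jointSpaces.push({ jointIndex: i });
    }
  }

  // Step 1: search the list; step 2: return the match, or null if none exists.
  joint(jointIndex: number): JointSpaceLike | null {
    for (const space of this.jointSpaces) {
      if (space.jointIndex === jointIndex) return space;
    }
    return null;
  }
}

const sketchHand = new HandSketch();
console.log(sketchHand.joint(4), sketchHand.joint(25));
```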

3.4. XRJointSpace

[Exposed=Window]
interface XRJointSpace: XRSpace {};

The native origin of an XRJointSpace is the position and orientation of the underlying joint.

The native origin of the XRJointSpace may only be reported when native origins of all other XRJointSpaces on the same hand are being reported. When a hand is partially obscured the user agent MAY emulate the obscured joints, or it MAY report null poses for all of the joints.

Note: This means that when fetching poses you will either get an entire hand or none of it.

This by default precludes faithfully exposing polydactyl/oligodactyl hands, however for fingerprinting concerns it will likely need to be a separate opt-in, anyway. See Issue 11 for more details.

The native origin has its -Y direction pointing perpendicular to the skin, outwards from the palm, and its -Z direction pointing along its associated bone, away from the wrist.

For tip skeleton joints where there is no associated bone, the -Z direction is the same as that for the associated distal joint, i.e. the direction is along that of the previous bone. For wrist skeleton joints the -Z direction SHOULD point roughly towards the center of the palm.
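Given a joint pose expressed as a 4x4 matrix, these axis conventions let an application recover the bone direction and the out-of-palm direction. The sketch below assumes the column-major, 16-element layout used by XRRigidTransform's matrix in the WebXR Device API; boneDirection and outOfPalmDirection are hypothetical helper names.

```typescript
type Vec3 = [number, number, number];

// The joint's -Z axis in base-space coordinates: along the associated bone,
// away from the wrist. In a column-major matrix the Z axis is elements 8..10.
function boneDirection(matrix: Float32Array): Vec3 {
  return [-matrix[8], -matrix[9], -matrix[10]];
}

// The joint's -Y axis in base-space coordinates: perpendicular to the skin,
// outwards from the palm. The Y axis is elements 4..6.
function outOfPalmDirection(matrix: Float32Array): Vec3 {
  return [-matrix[4], -matrix[5], -matrix[6]];
}

// With an identity pose, the joint axes coincide with the base space axes.
const identityPose = new Float32Array([
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, 0,
  0, 0, 0, 1,
]);
console.log(boneDirection(identityPose), outOfPalmDirection(identityPose));
```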

Every XRJointSpace has an associated hand, which is the XRHand that created it.

Every XRJointSpace has an associated joint index, which is the joint index corresponding to the joint it tracks.

Every XRJointSpace has an associated joint, which is the skeleton joint corresponding to its joint index.

4. Frame Loop

4.1. XRFrame

partial interface XRFrame {
    XRJointPose? getJointPose(XRJointSpace joint, XRSpace baseSpace);
    boolean fillJointRadii(sequence<XRJointSpace> jointSpaces, Float32Array radii);

    boolean fillPoses(sequence<XRSpace> spaces, XRSpace baseSpace, Float32Array transforms);
};

The getJointPose(XRJointSpace joint, XRSpace baseSpace) method provides the pose of joint relative to baseSpace as an XRJointPose, at the XRFrame's time.

When this method is invoked, the user agent MUST run the following steps:

  1. Let frame be this.

  2. Let session be frame’s session object.

  3. If frame’s active boolean is false, throw an InvalidStateError and abort these steps.

  4. If baseSpace’s session or joint’s session is different from session, throw an InvalidStateError and abort these steps.

  5. Let pose be a new XRJointPose object in the relevant realm of session.

  6. Populate the pose of joint in baseSpace at the time represented by frame into pose, with force emulation set to false.

  7. If pose is null return null.

  8. Set pose’s radius to the radius of joint, emulating it if necessary.

  9. Return pose.
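As an illustration of how the returned pose and radius might be used for gesture detection, the sketch below tests for a pinch between the thumb tip (index 4) and the index finger tip (index 9). The types, the mockFrame helper, and the 5 mm slack are all assumptions made for this example; only getJointPose, transform.position, and radius correspond to the API.

```typescript
// Hypothetical minimal shapes; real code would use XRFrame, XRJointSpace and XRJointPose.
interface Point3 { x: number; y: number; z: number; }
interface JointPoseLike { transform: { position: Point3 }; radius: number; }
interface JointSpaceLike { jointIndex: number; }
interface FrameLike {
  getJointPose(joint: JointSpaceLike, baseSpace: object): JointPoseLike | null;
}

// Report a pinch when the two tips are closer than the sum of their radii
// plus a small slack. The default slack is an arbitrary assumption.
function isPinching(frame: FrameLike, thumbTip: JointSpaceLike, indexTip: JointSpaceLike,
                    baseSpace: object, slackMeters: number = 0.005): boolean {
  const thumb = frame.getJointPose(thumbTip, baseSpace);
  const index = frame.getJointPose(indexTip, baseSpace);
  // A null pose means the joint could not be located for this frame.
  if (thumb === null || index === null) return false;
  const a = thumb.transform.position;
  const b = index.transform.position;
  const dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  const distance = Math.sqrt(dx * dx + dy * dy + dz * dz);
  return distance <= thumb.radius + index.radius + slackMeters;
}

// Mock frame for illustration: joint positions keyed by joint index.
function mockFrame(positions: { [jointIndex: number]: Point3 }): FrameLike {
  return {
    getJointPose(joint) {
      const p = positions[joint.jointIndex];
      return p ? { transform: { position: p }, radius: 0.008 } : null;
    },
  };
}

const demoFrame = mockFrame({ 4: { x: 0, y: 0, z: 0 }, 9: { x: 0.01, y: 0, z: 0 } });
console.log(isPinching(demoFrame, { jointIndex: 4 }, { jointIndex: 9 }, {}));
```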

The fillJointRadii(sequence<XRJointSpace> jointSpaces, Float32Array radii) method populates radii with the radii of the jointSpaces, and returns a boolean indicating whether all of the spaces have a valid pose.

When this method is invoked on an XRFrame frame, the user agent MUST run the following steps:

  1. Let frame be this.

  2. Let session be frame’s session object.

  3. If frame’s active boolean is false, throw an InvalidStateError and abort these steps.

  4. For each joint in the jointSpaces:

    1. If joint’s session is different from session, throw an InvalidStateError and abort these steps.

  5. If the length of jointSpaces is larger than the number of elements in radii, throw a TypeError and abort these steps.

  6. Let offset be a new number with the initial value of 0.

  7. Let allValid be true.

  8. For each joint in the jointSpaces:

    1. Set the float value of radii at offset to NaN.

    2. If the user agent can determine the pose of joint, set the float value of radii at offset to the radius of joint.

    3. If the user agent cannot determine the pose of joint, set allValid to false.

    4. Increase offset by 1.

  9. Return allValid.
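The core of these steps (the length check and the fill loop) can be sketched as below. The session and active-frame checks are omitted, and lookupRadius is a hypothetical callback standing in for the device: it returns a radius in meters, or null when the pose of the joint cannot be determined.

```typescript
// Hypothetical stand-in for XRJointSpace.
interface JointSpaceLike { jointIndex: number; }

function fillJointRadiiSketch(
  jointSpaces: JointSpaceLike[],
  radii: Float32Array,
  lookupRadius: (joint: JointSpaceLike) => number | null,
): boolean {
  // The output array must have room for one radius per joint space.
  if (jointSpaces.length > radii.length) {
    throw new TypeError("radii is too small for the given joint spaces");
  }
  let offset = 0;
  let allValid = true;
  for (const joint of jointSpaces) {
    // Start from NaN so an undeterminable joint is clearly marked.
    radii[offset] = NaN;
    const radius = lookupRadius(joint);
    if (radius !== null) {
      radii[offset] = radius;
    } else {
      allValid = false;
    }
    offset += 1;
  }
  return allValid;
}

// Illustration: joint 0 is tracked, joint 1 is not.
const radiiOut = new Float32Array(2);
const radiiOk = fillJointRadiiSketch(
  [{ jointIndex: 0 }, { jointIndex: 1 }],
  radiiOut,
  (joint) => (joint.jointIndex === 0 ? 0.01 : null),
);
console.log(radiiOk, radiiOut);
```

Returning a single boolean alongside NaN markers lets callers cheaply skip rendering a hand whose data is incomplete while still being able to inspect individual entries.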

The fillPoses(sequence<XRSpace> spaces, XRSpace baseSpace, Float32Array transforms) method populates transforms with the matrices of the poses of the spaces relative to the baseSpace, and returns a boolean indicating whether all of the spaces have a valid pose.

When this method is invoked on an XRFrame frame, the user agent MUST run the following steps:

  1. Let frame be this.

  2. Let session be frame’s session object.

  3. If frame’s active boolean is false, throw an InvalidStateError and abort these steps.

  4. For each space in the spaces sequence:

    1. If space’s session is different from session, throw an InvalidStateError and abort these steps.

  5. If baseSpace’s session is different from session, throw an InvalidStateError and abort these steps.

  6. If the length of spaces multiplied by 16 is larger than the number of elements in transforms, throw a TypeError and abort these steps.

  7. Let offset be a new number with the initial value of 0.

  8. Initialize pose as follows:

    If fillPoses() was called previously, the user agent MAY:
      Let pose be the same object as used by an earlier call.
    Otherwise:
      Let pose be a new XRPose object in the relevant realm of session.

  9. Let allValid be true.

  10. For each space in the spaces sequence:

    1. Populate the pose of space in baseSpace at the time represented by frame into pose.

    2. If pose is null, perform the following steps:

      1. Set 16 consecutive elements of the transforms array starting at offset to NaN.

      2. Set allValid to false.

    3. If pose is not null, copy all elements from pose’s matrix member to the transforms array starting at offset.

    4. Increase offset by 16.

  11. Return allValid.
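The packing performed by these steps can be sketched as below. The session and active-frame checks are omitted and baseSpace is folded into lookupMatrix, a hypothetical callback that returns the 16-element matrix of a space relative to the base space, or null when no pose is available.

```typescript
function fillPosesSketch(
  spaces: object[],
  transforms: Float32Array,
  lookupMatrix: (space: object) => Float32Array | null,
): boolean {
  // Each space needs 16 floats (one 4x4 matrix) in the output array.
  if (spaces.length * 16 > transforms.length) {
    throw new TypeError("transforms is too small for the given spaces");
  }
  let offset = 0;
  let allValid = true;
  for (const space of spaces) {
    const matrix = lookupMatrix(space);
    if (matrix === null) {
      // No pose: mark all 16 slots for this space as NaN.
      for (let i = 0; i < 16; i++) transforms[offset + i] = NaN;
      allValid = false;
    } else {
      // Copy the matrix into its slot in the packed output.
      transforms.set(matrix.subarray(0, 16), offset);
    }
    offset += 16;
  }
  return allValid;
}

// Illustration: the first space has an identity pose, the second has none.
const identityMatrix = new Float32Array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]);
const spaceA = { id: "a" };
const spaceB = { id: "b" };
const packed = new Float32Array(32);
const posesOk = fillPosesSketch(
  [spaceA, spaceB],
  packed,
  (space) => (space === spaceA ? identityMatrix : null),
);
console.log(posesOk, packed);
```

Writing every matrix into one preallocated Float32Array avoids creating a pose object per joint per frame, which is presumably the motivation for offering a batch method next to getJointPose.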

4.2. XRJointPose

An XRJointPose is an XRPose with additional information about the size of the skeleton joint it represents.

[Exposed=Window]
interface XRJointPose: XRPose {
    readonly attribute float radius;
};

The radius attribute returns the radius of the skeleton joint in meters.

The user agent MUST set radius to an emulated value if the XR device does not have the capability of determining this value, either in general or in the current animation frame (e.g. when the skeleton joint is partially obscured).

5. Privacy & Security Considerations

The WebXR Hand Input API is a powerful feature that carries significant privacy risks.

Since this feature returns new sensor data, the User Agent MUST ask for explicit consent from the user at session creation time.

Data returned from this API MUST NOT be so specific that one can detect individual users. If the underlying hardware returns data that is too precise, the User Agent MUST anonymize this data (e.g. by adding noise or rounding) before revealing it through the WebXR Hand Input API.
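As one possible illustration of the rounding approach, the sketch below snaps a joint position to a fixed grid. The helper name quantizePosition and the step size are assumptions chosen for the example; this specification does not prescribe a particular technique or precision.

```typescript
type Position = [number, number, number];

// Snap each coordinate (in meters) to the nearest multiple of stepMeters,
// discarding precision finer than the step.
function quantizePosition(position: Position, stepMeters: number): Position {
  if (!(stepMeters > 0)) {
    throw new RangeError("stepMeters must be a positive number");
  }
  return [
    Math.round(position[0] / stepMeters) * stepMeters,
    Math.round(position[1] / stepMeters) * stepMeters,
    Math.round(position[2] / stepMeters) * stepMeters,
  ];
}

// Illustration with a 1 cm grid (an arbitrary choice).
console.log(quantizePosition([0.1234, -0.5678, 1.0], 0.01));
```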

This API is only supported in XRSessions created with XRSessionMode of "immersive-vr" or "immersive-ar". "inline" sessions MUST NOT support this API.

IDL Index

partial interface XRInputSource {
   readonly attribute XRHand? hand;
};

[Exposed=Window]
interface XRHand {
    iterable<XRJointSpace>;
    readonly attribute unsigned long length;
    getter XRJointSpace joint(unsigned long jointIndex);

    const unsigned long WRIST = 0;

    const unsigned long THUMB_METACARPAL = 1;
    const unsigned long THUMB_PHALANX_PROXIMAL = 2;
    const unsigned long THUMB_PHALANX_DISTAL = 3;
    const unsigned long THUMB_PHALANX_TIP = 4;

    const unsigned long INDEX_METACARPAL = 5;
    const unsigned long INDEX_PHALANX_PROXIMAL = 6;
    const unsigned long INDEX_PHALANX_INTERMEDIATE = 7;
    const unsigned long INDEX_PHALANX_DISTAL = 8;
    const unsigned long INDEX_PHALANX_TIP = 9;

    const unsigned long MIDDLE_METACARPAL = 10;
    const unsigned long MIDDLE_PHALANX_PROXIMAL = 11;
    const unsigned long MIDDLE_PHALANX_INTERMEDIATE = 12;
    const unsigned long MIDDLE_PHALANX_DISTAL = 13;
    const unsigned long MIDDLE_PHALANX_TIP = 14;

    const unsigned long RING_METACARPAL = 15;
    const unsigned long RING_PHALANX_PROXIMAL = 16;
    const unsigned long RING_PHALANX_INTERMEDIATE = 17;
    const unsigned long RING_PHALANX_DISTAL = 18;
    const unsigned long RING_PHALANX_TIP = 19;

    const unsigned long LITTLE_METACARPAL = 20;
    const unsigned long LITTLE_PHALANX_PROXIMAL = 21;
    const unsigned long LITTLE_PHALANX_INTERMEDIATE = 22;
    const unsigned long LITTLE_PHALANX_DISTAL = 23;
    const unsigned long LITTLE_PHALANX_TIP = 24;
};

[Exposed=Window]
interface XRJointSpace: XRSpace {};

partial interface XRFrame {
    XRJointPose? getJointPose(XRJointSpace joint, XRSpace baseSpace);
    boolean fillJointRadii(sequence<XRJointSpace> jointSpaces, Float32Array radii);

    boolean fillPoses(sequence<XRSpace> spaces, XRSpace baseSpace, Float32Array transforms);
};

[Exposed=Window]
interface XRJointPose: XRPose {
    readonly attribute float radius;
};

Issues Index

This by default precludes faithfully exposing polydactyl/oligodactyl hands, however for fingerprinting concerns it will likely need to be a separate opt-in, anyway. See Issue 11 for more details.