1. Introduction
Hardware that enables Virtual Reality (VR) and Augmented Reality (AR) applications is now broadly available to consumers, offering an immersive computing platform with both new opportunities and challenges. The ability to interact directly with immersive hardware is critical to ensuring that the web is well equipped to operate as a first-class citizen in this environment.
Immersive computing introduces strict requirements for high-precision, low-latency communication in order to deliver an acceptable experience. It also brings unique security concerns for a platform like the web. The WebXR Device API provides the interfaces necessary to enable developers to build compelling, comfortable, and safe immersive applications on the web across a wide variety of hardware form factors.
Other web interfaces, such as the RelativeOrientationSensor and AbsoluteOrientationSensor, can be repurposed to surface input from some devices to polyfill the WebXR Device API in limited situations. These interfaces cannot support multiple features of high-end immersive experiences, however, such as 6DoF tracking, presentation to headset peripherals, or tracked input devices.
1.1. Terminology
This document uses the acronym XR throughout to refer to the spectrum of hardware, applications, and techniques used for Virtual Reality, Augmented Reality, and other related technologies. Examples include, but are not limited to:
- Head-mounted displays, whether they are opaque, transparent, or utilize video passthrough
- Mobile devices with positional tracking
- Fixed displays with head tracking capabilities

The important commonality between them is that they all offer some degree of spatial tracking with which to simulate a view of virtual content.
Terms like "XR Device", "XR Application", etc. are generally understood to apply to any of the above. Portions of this document that only apply to a subset of these devices will indicate so as appropriate.
The terms 3DoF and 6DoF are used throughout this document to describe the tracking capabilities of XR devices.
- A 3DoF device, short for "Three Degrees of Freedom", is one that can only track rotational movement. This is common in devices which rely exclusively on accelerometer and gyroscope readings to provide tracking. 3DoF devices do not respond to translational movements from the user, though they may employ algorithms to estimate translational changes based on modeling of the neck or arms.
- A 6DoF device, short for "Six Degrees of Freedom", is one that can track both rotation and translation, enabling precise 1:1 tracking in space. This typically requires some level of understanding of the user's environment. That environmental understanding may be achieved via inside-out tracking, where sensors on the tracked device itself (such as cameras or depth sensors) are used to determine the device's position, or outside-in tracking, where external devices placed in the user's environment (like a camera or light-emitting device) provide a stable point of reference against which the XR device can determine its position.
1.2. Application flow
Most applications using the WebXR Device API will follow a similar usage pattern:
1. Query navigator.xr.isSessionSupported() to determine if the desired type of XR content is supported by the hardware and UA.
2. If so, advertise the XR content to the user.
3. Wait for the user to trigger a user activation event indicating they want to begin viewing XR content.
4. Request an XRSession within the user activation event with navigator.xr.requestSession().
5. If the XRSession request succeeds, use it to run a frame loop to respond to XR input and produce images to display on the XR device in response.
6. Continue running the frame loop until the session is shut down by the UA or the user indicates they want to exit the XR content.
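A minimal sketch of that flow is shown below; the 'xr-go' button element and the rendering setup are hypothetical placeholders:

```js
navigator.xr.isSessionSupported('immersive-vr').then((supported) => {
  if (!supported) return;

  // Advertise the XR content to the user; 'xr-go' is a placeholder element.
  const button = document.getElementById('xr-go');
  button.style.display = 'block';

  button.addEventListener('click', () => {
    // Request the session from within the user activation event.
    navigator.xr.requestSession('immersive-vr').then(onSessionStarted);
  });
});

function onSessionStarted(session) {
  // Perform rendering setup here, then begin the session's frame loop.
  session.requestAnimationFrame(onXRFrame);
}

function onXRFrame(time, frame) {
  frame.session.requestAnimationFrame(onXRFrame);
  // Respond to XR input and render imagery for this frame.
}
```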
2. Model
2.1. XR device
An XR device is a physical unit of hardware that can present imagery to the user. On desktop clients, this is usually a headset peripheral. On mobile clients, it may represent the mobile device itself in conjunction with a viewer harness. It may also represent devices without stereo-presentation capabilities but with more advanced tracking.
An XR device has a list of supported modes (a list of strings) that contains the enumeration values of XRSessionMode
that the XR device supports.
The user agent MUST have an inline XR device, which is an XR device that MUST contain "inline" in its list of supported modes. The inline XR device will report as much pose information as possible for the physical device the user agent is rendering to. This device MAY be the same as the immersive XR device if one is present, but doesn't have to be.
Note: On phones, the inline XR Device will report gyroscopic pose information of the phone itself. On desktops and laptops without gyroscopes, the inline XR Device will not be able to report a pose. If the user agent is already running on an XR device, the inline XR device will be the same device and may support multiple views.
3. Initialization
3.1. navigator.xr
```webidl
partial interface Navigator {
  [SecureContext, SameObject] readonly attribute XR xr;
};
```
The xr
attribute’s getter MUST return the XR
object that is associated with the context object.
3.2. XR
```webidl
[SecureContext, Exposed=Window]
interface XR : EventTarget {
  // Methods
  Promise<boolean> isSessionSupported(XRSessionMode mode);
  [NewObject] Promise<XRSession> requestSession(XRSessionMode mode, optional XRSessionInit options = {});

  // Events
  attribute EventHandler ondevicechange;
};
```
The user agent MUST create an XR
object when a Navigator
object is created and associate it with that object.
An XR
object is the entry point to the API, used to query for XR features available to the user agent and initiate communication with XR hardware via the creation of XRSession
s.
An XR
object has a list of immersive XR devices (a list of XR device), which MUST be initially an empty list.
An XR
object has an immersive XR device (null or XR device) which is initially null and represents the active XR device from the list of immersive XR devices.
The user agent MUST be able to enumerate immersive XR devices attached to the system, at which time each available device is placed in the list of immersive XR devices. Subsequent algorithms requesting enumeration MUST reuse the cached list of immersive XR devices. Enumerating the devices should not initialize device tracking. After the first enumeration the user agent MUST begin monitoring device connection and disconnection, adding connected devices to the list of immersive XR devices and removing disconnected devices.
Each time the list of immersive XR devices changes the user agent should select an immersive XR device by running the following steps:
1. Let oldDevice be the immersive XR device.
2. If the list of immersive XR devices is an empty list, set the immersive XR device to null.
3. If the list of immersive XR devices' size is one, set the immersive XR device to the list of immersive XR devices[0].
4. Set the immersive XR device as follows:
    - If there are any active XRSessions and the list of immersive XR devices contains oldDevice: set the immersive XR device to oldDevice.
    - Otherwise: set the immersive XR device to a device of the user agent's choosing.
5. If this is the first time devices have been enumerated or oldDevice equals the immersive XR device, abort these steps.
6. Set the XR compatible boolean of all WebGLRenderingContextBase instances to false.
7. Queue a task to fire an event named devicechange on the context object.
Note: The user agent is allowed to use any criteria it wishes to select an immersive XR device when the list of immersive XR devices contains multiple devices. For example, the user agent may always select the first item in the list, or provide settings UI that allows users to manage device priority. Ideally the algorithm used to select the default device is stable and will result in the same device being selected across multiple browsing sessions.
The user agent ensures an immersive XR device is selected by running the following steps:

1. If the context object's immersive XR device is not null, abort these steps.
2. Enumerate immersive XR devices if it has not already been done.
3. Select an immersive XR device.
The ondevicechange
attribute is an Event handler IDL attribute for the devicechange
event type.
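For example, a page might re-query support whenever the set of available devices changes; updateXRButton() below is a hypothetical page-specific UI helper:

```js
navigator.xr.addEventListener('devicechange', () => {
  // Device availability changed; re-check whether immersive content is supported.
  navigator.xr.isSessionSupported('immersive-vr')
    .then((supported) => updateXRButton(supported));
});
```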
The isSessionSupported(mode) method queries whether a given mode is supported by the user agent and device capabilities.
When this method is invoked, it MUST run the following steps:
1. Let promise be a new Promise.
2. If mode is "inline", resolve promise with true and return it.
3. Run the following steps in parallel:
    1. If the requesting document's origin is not allowed to use the "xr-spatial-tracking" feature policy, reject promise with a "SecurityError" DOMException and abort these steps.
    2. If the immersive XR device is null, resolve promise with false and abort these steps.
    3. If the immersive XR device's list of supported modes does not contain mode, resolve promise with false and abort these steps.
    4. Resolve promise with true.
4. Return promise.
Calling isSessionSupported()
MUST NOT trigger device-selection UI as this would cause many sites to display XR-specific dialogs early in the document lifecycle without user activation. Additionally, calling isSessionSupported()
MUST NOT interfere with any running XR applications on the system, and MUST NOT cause XR-related applications to launch such as system trays or storefronts.
The following example checks whether immersive-vr sessions are supported:

```js
navigator.xr.isSessionSupported('immersive-vr').then((supported) => {
  if (supported) {
    // 'immersive-vr' sessions are supported.
    // Page should advertise support to the user.
  } else {
    // 'immersive-vr' sessions are not supported.
  }
});
```
The XR
object has a pending immersive session boolean, which MUST be initially false
, an active immersive session, which MUST be initially null
, and a list of inline sessions, which MUST be initially empty.
The requestSession(mode, options)
method attempts to initialize an XRSession
for the given mode if possible, entering immersive mode if necessary.
When this method is invoked, the user agent MUST run the following steps:
1. Let promise be a new Promise.
2. Let immersive be true if mode is "immersive-vr", and false otherwise.
3. Check whether the session request is allowed as follows:
    - If immersive is true:
        1. Check if an immersive session request is allowed, and if not reject promise with a "SecurityError" DOMException and return promise.
        2. If pending immersive session is true or active immersive session is not null, reject promise with an "InvalidStateError" DOMException and return promise.
        3. Set pending immersive session to true.
    - Otherwise: check if an inline session request is allowed, and if not reject promise with a "SecurityError" DOMException and return promise.
4. Run the following steps in parallel:
    1. Choose device as follows:
        - If immersive is true: set device to the immersive XR device.
        - Otherwise: set device to the inline XR device.
    2. Queue a task to perform the following steps:
        1. If device is null or device's list of supported modes does not contain mode, run the following steps:
            1. Reject promise with a "NotSupportedError" DOMException.
            2. If immersive is true, set pending immersive session to false.
            3. Abort these steps.
        2. Let session be a new XRSession object.
        3. Initialize the session with session, mode, and device.
        4. Resolve the requested features given by options' requiredFeatures and options' optionalFeatures values for session, and let resolved be the returned value.
        5. If resolved is false, run the following steps:
            1. Reject promise with a "NotSupportedError" DOMException.
            2. If immersive is true, set pending immersive session to false.
            3. Abort these steps.
        6. Potentially set the active immersive session as follows:
            - If immersive is true: set the active immersive session to session, and set pending immersive session to false.
            - Otherwise: append session to the list of inline sessions.
        7. Resolve promise with session.
5. Return promise.
The following example attempts to retrieve an immersive-vr XRSession:

```js
let xrSession;

navigator.xr.requestSession("immersive-vr").then((session) => {
  xrSession = session;
});
```
3.3. XRSessionMode
The XRSessionMode
enum defines the modes that an XRSession
can operate in.
```webidl
enum XRSessionMode {
  "inline",
  "immersive-vr"
};
```
- A session mode of inline indicates that the session's output will be shown as an element in the HTML document. inline session content MUST be displayed in mono (i.e., with a single view). It MAY allow for viewer tracking. User agents MUST allow inline sessions to be created.
- A session mode of immersive-vr indicates that the session's output will be given exclusive access to the immersive XR device display and that content is not intended to be integrated with the user's environment.
In this document, the term inline session is synonymous with an inline
session and the term immersive session is synonymous with an immersive-vr
session.
Immersive sessions MUST provide some level of viewer tracking, and content MUST be shown at the proper scale relative to the user and/or the surrounding environment. Additionally, immersive sessions MUST be given exclusive access to the immersive XR device, meaning that while the immersive session is "visible" the HTML document is not shown on the immersive XR device's display, nor does content from any other source have exclusive access. Exclusive access does not prevent the user agent from overlaying its own UI; however, this UI SHOULD be minimal.
Note: Future specifications or modules may expand the definition of immersive session to include additional session modes.
Note: Examples of ways exclusive access may be presented include stereo content displayed on a virtual reality headset.
Note: As an example of overlaid UI, the user-agent or operating system in an immersive session may show notifications over the rendered content.
3.4. Feature Dependencies
Some features of an XRSession
may not be universally available for a number of reasons, among which is the fact that not all XR devices can support the full set of features. Another consideration is that some features expose sensitive information which may require a clear signal of user intent before functioning.
Since it is a poor user experience to initialize the underlying XR platform and create an XRSession
only to immediately notify the user that the application cannot function correctly, developers can indicate required features by passing an XRSessionInit
dictionary to requestSession()
. This will block the creation of the XRSession
if any of the required features are unavailable due to device limitations or in the absence of a clear signal of user intent to expose sensitive information related to the feature.
Additionally, developers are encouraged to design experiences which progressively enhance their functionality when run on more capable devices. Optional features which the experience does not require but will take advantage of when available must also be indicated in an XRSessionInit
dictionary to ensure that user intent can be determined before enabling the feature if necessary.
```webidl
dictionary XRSessionInit {
  sequence<any> requiredFeatures;
  sequence<any> optionalFeatures;
};
```
The requiredFeatures
array contains any Required features for the experience. If any value in the list is not a recognized feature descriptor the XRSession
will not be created. If any feature listed in the requiredFeatures
array is not supported by the XR Device or, if necessary, has not received a clear signal of user intent the XRSession
will not be created.
The optionalFeatures
array contains any Optional features for the experience. If any value in the list is not a recognized feature descriptor it will be ignored. Features listed in the optionalFeatures
array will be enabled if supported by the XR Device and, if necessary, given a clear signal of user intent, but will not block creation of the XRSession
if absent.
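For instance, a sketch of an experience that cannot function without floor-relative tracking, but will take advantage of bounded tracking when present, might request features like this:

```js
navigator.xr.requestSession('immersive-vr', {
  requiredFeatures: ['local-floor'],    // Session creation fails without this.
  optionalFeatures: ['bounded-floor']   // Enabled only if supported and consented to.
}).then((session) => {
  // 'local-floor' is guaranteed to be among the session's enabled features.
}).catch((err) => {
  // Rejected with a NotSupportedError if 'local-floor' could not be enabled.
});
```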
Values given in the feature lists are considered a valid feature descriptor if the value is one of the following:
- Any XRReferenceSpaceType enum value
Future iterations of this specification and additional modules may expand the list of accepted feature descriptors.
Note: Features are accepted as an array of any
values to ensure forwards compatibility. It allows unrecognized optional values to be properly ignored as new feature descriptor types are added.
Depending on the XRSessionMode
requested, certain feature descriptors are added to the requiredFeatures
or optionalFeatures
lists by default. The following table describes the default features associated with each session type and feature list:
| Feature | Sessions | List |
| --- | --- | --- |
| "viewer" | Inline sessions and immersive sessions | requiredFeatures |
| "local" | Immersive sessions | requiredFeatures |
The combined list of feature descriptors given by the requiredFeatures
and optionalFeatures
are collectively considered the requested features for an XRSession
.
Some feature descriptors, when present in the requested features list, are subject to feature policy and/or requirements that user intent to use the feature is well understood, via either explicit consent or implicit consent. The following table describes the feature requirements that must be satisfied prior to being enabled:
| Feature | Feature Policy Required | Consent Required |
| --- | --- | --- |
| "local" | "xr-spatial-tracking" | Inline sessions require consent |
| "local-floor" | "xr-spatial-tracking" | Always requires consent |
| "bounded-floor" | "xr-spatial-tracking" | Always requires consent |
| "unbounded" | "xr-spatial-tracking" | Always requires consent |
Note: "local"
is always included in the requested features of immersive sessions as a default feature, and as such immersive sessions always need to obtain explicit consent or implicit consent.
Requested features can only be enabled for a session if the XR Device is capable of supporting the feature, which means that the feature is known to be supported by the XR Device in some configurations, even if the current configuration has not yet been verified as supporting the feature. The user agent MAY apply more rigorous constraints if desired in order to yield a more consistent user experience.
Note: For example, several VR devices support either configuring a safe boundary for the user to move around within or skipping boundary configuration and operating in a mode where the user is expected to stand in place. Such a device can be considered to be capable of supporting "bounded-floor" XRReferenceSpaces even if it is currently not configured with safety boundaries, because it's expected that the user could configure the device appropriately if the experience required it. This is to allow user agents to avoid fully initializing the XR Device or waiting for the user's environment to be recognized prior to resolving the requested features if desired. If, however, the user agent already knows the boundary state at the time the session is requested without additional initialization, it may choose to reject the "bounded-floor" feature if the safety boundary is not already configured.
To resolve the requested features given requiredFeatures and optionalFeatures for an XRSession
session, the user agent MUST run the following steps:
1. Let consentRequired and consentOptional be new empty lists.
2. Add every feature descriptor in the default features table associated with session's mode to the indicated feature list if it is not already present.
3. For each feature in requiredFeatures perform the following steps:
    1. If feature is not a valid feature descriptor, return false.
    2. If the requesting document's origin is not allowed to use any feature policy required by feature as indicated by the feature requirements table, return false.
    3. If session's XR Device is not capable of supporting the functionality described by feature or the user agent has otherwise determined to reject the feature, return false.
    4. If the functionality described by feature requires explicit consent, append it to consentRequired.
    5. Else append feature to session's list of enabled features.
4. For each feature in optionalFeatures perform the following steps:
    1. If feature is not a valid feature descriptor, continue to the next entry.
    2. If the requesting document's origin is not allowed to use any feature policy required by feature as indicated by the feature requirements table, continue to the next entry.
    3. If session's XR Device is not capable of supporting the functionality described by feature or the user agent has otherwise determined to reject the feature, continue to the next entry.
    4. If the functionality described by feature requires explicit consent, append it to consentOptional.
    5. Else append feature to session's list of enabled features.
5. If consentRequired or consentOptional are not empty, request explicit consent to use the functionality described by those features.
6. For each feature in consentRequired perform the following steps:
    1. If a clear signal of user intent to enable feature has not been given, return false.
    2. Else append feature to session's list of enabled features.
7. For each feature in consentOptional perform the following steps:
    1. If a clear signal of user intent to enable feature has not been given, continue to the next entry.
    2. Else append feature to session's list of enabled features.
8. Return true.
4. Session
4.1. XRSession
Any interaction with XR hardware is done via an XRSession
object, which can only be retrieved by calling requestSession()
on the XR
object. Once a session has been successfully acquired, it can be used to poll the viewer pose
, query information about the user’s environment, and present imagery to the user.
The user agent, when possible, SHOULD NOT initialize device tracking or rendering capabilities until an XRSession
has been acquired. This is to prevent unwanted side effects of engaging the XR systems when they’re not actively being used, such as increased battery usage or related utility applications from appearing when first navigating to a page that only wants to test for the presence of XR hardware in order to advertise XR features. Not all XR platforms offer ways to detect the hardware’s presence without initializing tracking, however, so this is only a strong recommendation.
```webidl
enum XRVisibilityState {
  "visible",
  "visible-blurred",
  "hidden",
};

[SecureContext, Exposed=Window]
interface XRSession : EventTarget {
  // Attributes
  readonly attribute XRVisibilityState visibilityState;
  [SameObject] readonly attribute XRRenderState renderState;
  [SameObject] readonly attribute XRInputSourceArray inputSources;

  // Methods
  void updateRenderState(optional XRRenderStateInit state = {});
  [NewObject] Promise<XRReferenceSpace> requestReferenceSpace(XRReferenceSpaceType type);

  long requestAnimationFrame(XRFrameRequestCallback callback);
  void cancelAnimationFrame(long handle);

  Promise<void> end();

  // Events
  attribute EventHandler onend;
  attribute EventHandler onselect;
  attribute EventHandler oninputsourceschange;
  attribute EventHandler onselectstart;
  attribute EventHandler onselectend;
  attribute EventHandler onvisibilitychange;
};
```
Each XRSession
has a mode, which is one of the values of XRSessionMode
.
To initialize the session, given session, mode, and device, the user agent MUST run the following steps:
1. Set session's mode to mode.
2. Set session's XR device to device.
3. If no other features of the user agent have done so already, perform the necessary platform-specific steps to initialize the device's tracking and rendering capabilities, including showing any necessary instructions to the user.
Note: Some devices require additional user instructions for activation. For example, going into immersive mode on a phone-based headset device requires inserting the phone into the headset, and doing so on a desktop browser connected to an external headset requires wearing the headset. It is the responsibility of the user agent — not the author — to ensure any such instructions are shown.
A number of different circumstances may shut down the session, which is permanent and irreversible. Once a session has been shut down the only way to access the XR device's tracking or rendering capabilities again is to request a new session. Each XRSession
has an ended boolean, initially set to false
, that indicates if it has been shut down.
When an XRSession
is shut down the following steps are run:
1. Let session be the target XRSession object.
2. Set session's ended value to true.
3. If the active immersive session is equal to session, set the active immersive session to null.
4. Remove session from the list of inline sessions.
5. Reject any outstanding promises returned by session with an InvalidStateError, except for any promises returned by end().
6. If no other features of the user agent are actively using them, perform the necessary platform-specific steps to shut down the device's tracking and rendering capabilities. This MUST include:
    - Releasing exclusive access to the XR device if session is an immersive session.
    - Deallocating any graphics resources acquired by session for presentation to the XR device.
    - Putting the XR device in a state such that a different source may be able to initiate a session with the same device if session is an immersive session.
7. Queue a task that fires an XRSessionEvent named end on session.
The end()
method provides a way to manually shut down a session. When invoked, it MUST run the following steps:
1. Let promise be a new Promise.
2. Shut down the target XRSession object.
3. Queue a task to perform the following steps:
    1. Wait until any platform-specific steps related to shutting down the session have completed.
    2. Resolve promise.
4. Return promise.
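For example, content might offer its own exit control; the 'exit-xr' element here is a hypothetical page-specific button and xrSession an already active session:

```js
document.getElementById('exit-xr').addEventListener('click', () => {
  xrSession.end().then(() => {
    // Resolved once platform-specific shutdown has completed. The session's
    // 'end' event also fires, which covers UA-initiated shutdown as well.
  });
});
```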
Each XRSession has a list of enabled features, which is a list of feature descriptors which MUST be initially an empty list.
Each XRSession
has an active render state which is a new XRRenderState
, and a pending render state, which is an XRRenderState
which is initially null
.
The renderState
attribute returns the XRSession
's active render state.
Each XRSession
has a minimum inline field of view and a maximum inline field of view, defined in radians. The values MUST be determined by the user agent and MUST fall in the range of 0
to PI
.
Each XRSession
has a minimum near clip plane and a maximum far clip plane, defined in meters. The values MUST be determined by the user agent and MUST be non-negative. The minimum near clip plane SHOULD be less than 0.1
. The maximum far clip plane SHOULD be greater than 1000.0
(and MAY be infinite).
The updateRenderState(newState)
method queues an update to the active render state to be applied on the next frame. Unset fields of the XRRenderStateInit
newState passed to this method will not be changed.
When this method is invoked, the user agent MUST run the following steps:
1. Let session be the target XRSession.
2. If session's ended value is true, throw an InvalidStateError and abort these steps.
3. If newState's baseLayer was created with an XRSession other than session, throw an InvalidStateError and abort these steps.
4. If newState's inlineVerticalFieldOfView is set and session is an immersive session, throw an InvalidStateError and abort these steps.
5. Let activeState be session's active render state.
6. If session's pending render state is null, set it to a copy of activeState.
7. If newState's depthNear value is set, set session's pending render state's depthNear to newState's depthNear.
8. If newState's depthFar value is set, set session's pending render state's depthFar to newState's depthFar.
9. If newState's inlineVerticalFieldOfView is set, set session's pending render state's inlineVerticalFieldOfView to newState's inlineVerticalFieldOfView.
10. If newState's baseLayer is set, set session's pending render state's baseLayer to newState's baseLayer.
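For example, setting up rendering for a newly created session might queue state like this (gl is assumed to be a WebGL context created with xrCompatible set to true):

```js
xrSession.updateRenderState({
  baseLayer: new XRWebGLLayer(xrSession, gl),
  depthNear: 0.1,
  depthFar: 100.0
});
// The queued state is applied at the next frame boundary, after which it is
// visible via xrSession.renderState.
```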
When requested, the XRSession
MUST apply the pending render state by running the following steps:
1. Let session be the target XRSession.
2. Let activeState be session's active render state.
3. Let newState be session's pending render state.
4. Set session's pending render state to null.
5. Set activeState to newState.
6. If activeState's inlineVerticalFieldOfView is less than session's minimum inline field of view, set activeState's inlineVerticalFieldOfView to session's minimum inline field of view.
7. If activeState's inlineVerticalFieldOfView is greater than session's maximum inline field of view, set activeState's inlineVerticalFieldOfView to session's maximum inline field of view.
8. If activeState's depthNear is less than session's minimum near clip plane, set activeState's depthNear to session's minimum near clip plane.
9. If activeState's depthFar is greater than session's maximum far clip plane, set activeState's depthFar to session's maximum far clip plane.
10. Let baseLayer be activeState's baseLayer.
11. Set activeState's composition disabled and output canvas as follows:
    - If session's mode is "inline" and baseLayer is an instance of an XRWebGLLayer with composition disabled set to true: set activeState's composition disabled boolean to true and set activeState's output canvas to baseLayer's context's canvas.
    - Otherwise: set activeState's composition disabled boolean to false and set activeState's output canvas to null.
The requestReferenceSpace(type) method constructs a new XRReferenceSpace of a given type, if possible.
When this method is invoked, the user agent MUST run the following steps:
1. Let promise be a new Promise.
2. Run the following steps in parallel:
    1. Create a reference space, referenceSpace, with the XRReferenceSpaceType type.
    2. If referenceSpace is null, reject promise with a NotSupportedError and abort these steps.
    3. Resolve promise with referenceSpace.
3. Return promise.
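A typical session setup might request a reference space immediately after session creation, with a sketch of a fallback for unsupported types:

```js
let xrRefSpace = null;

xrSession.requestReferenceSpace('local')
  .then((refSpace) => { xrRefSpace = refSpace; })
  .catch((err) => {
    // NotSupportedError: 'local' is unavailable; fall back to 'viewer',
    // which every session must support.
    return xrSession.requestReferenceSpace('viewer')
      .then((refSpace) => { xrRefSpace = refSpace; });
  });
```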
Each XRSession
has a list of active XR input sources (a list of XRInputSource
) which MUST be initially an empty list.
Each XRSession
has an XR device, which is an XR device set at initialization.
The inputSources
attribute returns the XRSession
's list of active XR input sources.
The user agent MUST monitor any XR input sources associated with the XR Device, including detecting when XR input sources are added, removed, or changed.
When new XR input sources become available for XRSession
session, the user agent MUST run the following steps:
1. Let added be a new list.
2. For each new XR input source:
    1. Let inputSource be a new XRInputSource.
    2. Add inputSource to added.
3. Queue a task to perform the following steps:
    1. Extend session's list of active XR input sources with added.
    2. Fire an XRInputSourcesChangeEvent named inputsourceschange on session with added set to added.
-
When any previously added XR input sources are no longer available for XRSession
session, the user agent MUST run the following steps:
1. Let removed be a new list.
2. For each XR input source that is no longer available:
    1. Let inputSource be the XRInputSource in session's list of active XR input sources associated with the XR input source.
    2. Add inputSource to removed.
3. Queue a task to perform the following steps:
    1. Remove each XRInputSource in removed from session's list of active XR input sources.
    2. Fire an XRInputSourcesChangeEvent named inputsourceschange on session with removed set to removed.
-
When the handedness, targetRayMode, profiles, or presence of a gripSpace for any XR input sources change for XRSession session, the user agent MUST run the following steps:

1. Let added be a new list.
2. Let removed be a new list.
3. For each changed XR input source:
    1. Let oldInputSource be the XRInputSource in session's list of active XR input sources previously associated with the XR input source.
    2. Let newInputSource be a new XRInputSource.
    3. Add oldInputSource to removed.
    4. Add newInputSource to added.
4. Queue a task to perform the following steps:
    1. Remove each XRInputSource in removed from session's list of active XR input sources.
    2. Extend session's list of active XR input sources with added.
    3. Fire an XRInputSourcesChangeEvent named inputsourceschange on session with added set to added and removed set to removed.
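Applications typically track all of these changes with a single inputsourceschange listener; the sketch below assumes hypothetical helpers for managing per-controller scene content:

```js
xrSession.addEventListener('inputsourceschange', (event) => {
  for (const inputSource of event.added) {
    addControllerVisual(inputSource);    // hypothetical helper
  }
  for (const inputSource of event.removed) {
    removeControllerVisual(inputSource); // hypothetical helper
  }
});
```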
Each XRSession
has a visibility state value, which is an enum which MUST be set to whichever of the following values best matches the state of session.
- A state of visible indicates that imagery rendered by the XRSession can be seen by the user and requestAnimationFrame() callbacks are processed at the XR device's native refresh rate. Input is processed by the XRSession normally.
- A state of visible-blurred indicates that imagery rendered by the XRSession may be seen by the user, but is not the primary focus. requestAnimationFrame() callbacks MAY be throttled. Input is not processed by the XRSession.
- A state of hidden indicates that imagery rendered by the XRSession cannot be seen by the user. requestAnimationFrame() callbacks will not be processed until the visibility state changes. Input is not processed by the XRSession.
The visibilityState
attribute returns the XRSession
's visibility state. The onvisibilitychange
attribute is an Event handler IDL attribute for the visibilitychange
event type.
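A sketch of reacting to these transitions, for instance to pause audio while the session is unfocused:

```js
xrSession.addEventListener('visibilitychange', () => {
  switch (xrSession.visibilityState) {
    case 'visible':
      // Frame callbacks run at the native refresh rate; resume normal operation.
      break;
    case 'visible-blurred':
      // Content may still be shown but input is not delivered; consider pausing audio.
      break;
    case 'hidden':
      // Frame callbacks stop entirely until the visibility state changes again.
      break;
  }
});
```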
The visibility state MAY be changed by the user agent at any time other than during the processing of an XR animation frame, and the user agent SHOULD monitor the XR platform when possible to observe when session visibility has been affected external to the user agent and update the visibility state accordingly.
Note: The XRSession
's visibility state does not necessarily imply the visibility of the HTML document. Depending on the system configuration the page may continue to be visible while an immersive session is active. (For example, a headset connected to a PC may continue to display the page on the monitor while the headset is viewing content from an immersive session.) Developers should continue to rely on the [Page Visibility API](https://w3c.github.io/page-visibility/) to determine page visibility.
Each XRSession
has a viewer reference space, which is an XRReferenceSpace
of type "viewer"
with an identity transform origin offset. The viewer reference space has a list of views, which is a list of views corresponding to the views provided by the XR device. If the XRSession
's renderState
's composition disabled boolean is set to true
the list of views MUST contain a single view.
The onend
attribute is an Event handler IDL attribute for the end
event type.
The oninputsourceschange
attribute is an Event handler IDL attribute for the inputsourceschange
event type.
The onselectstart
attribute is an Event handler IDL attribute for the selectstart
event type.
The onselectend
attribute is an Event handler IDL attribute for the selectend
event type.
The onselect
attribute is an Event handler IDL attribute for the select
event type.
4.2. XRRenderState
An XRRenderState
represents a set of configurable values which affect how an XRSession
's output is composited. The active render state for a given XRSession
can only change between frame boundaries, and updates can be queued up via updateRenderState()
.
```webidl
dictionary XRRenderStateInit {
  double depthNear;
  double depthFar;
  double inlineVerticalFieldOfView;
  XRWebGLLayer? baseLayer;
};

[SecureContext, Exposed=Window]
interface XRRenderState {
  readonly attribute double depthNear;
  readonly attribute double depthFar;
  readonly attribute double? inlineVerticalFieldOfView;
  readonly attribute XRWebGLLayer? baseLayer;
};
```
Each XRRenderState has an output canvas, which is an HTMLCanvasElement initially set to null. The output canvas is the DOM element that will display any content rendered for an "inline" XRSession.
Each XRRenderState also has a composition disabled boolean, which is initially false. The XRRenderState is considered to have composition disabled if rendering commands performed for an "inline" XRSession are executed in such a way that they are directly displayed into the output canvas, rather than first being processed by the XR Compositor.
Note: At this point the XRRenderState will only have an output canvas if it has composition disabled, but future versions of the specification are likely to introduce methods for setting output canvases that support more advanced uses like mirroring and layer compositing that will require composition.
When an XRRenderState
object is created for an XRSession
session, the user agent MUST initialize the render state by running the following steps:
1. Let state be the newly created XRRenderState object.
2. Initialize state's depthNear to 0.1.
3. Initialize state's depthFar to 1000.0.
4. Initialize state's inlineVerticalFieldOfView as follows:
    - If session is an inline session: initialize state's inlineVerticalFieldOfView to PI * 0.5.
    - Else: initialize state's inlineVerticalFieldOfView to null.
5. Initialize state's baseLayer to null.
The depthNear
attribute defines the distance, in meters, of the near clip plane from the viewer. The depthFar
attribute defines the distance, in meters, of the far clip plane from the viewer.
The values of depthNear and depthFar are used in the computation of the projectionMatrix of XRViews and determine how the values of an XRWebGLLayer depth buffer are interpreted. depthNear MAY be greater than depthFar.
The inlineVerticalFieldOfView
attribute defines the default vertical field of view in radians used when computing projection matrices for "inline"
XRSession
s. The projection matrix calculation also takes into account the aspect ratio of the output canvas. This value MUST be null
for immersive sessions.
The baseLayer
attribute defines an XRWebGLLayer
which the XR compositor will obtain images from.
4.3. Animation Frames
The primary way an XRSession
provides information about the tracking state of the XR device is via callbacks scheduled by calling requestAnimationFrame()
on the XRSession
instance.
```webidl
callback XRFrameRequestCallback = void (DOMHighResTimeStamp time, XRFrame frame);
```
Each XRFrameRequestCallback
object has a cancelled boolean initially set to false
.
Each XRSession
has a list of animation frame callbacks, which is initially empty, and an animation frame callback identifier, which is a number that is initially zero.
The requestAnimationFrame(callback)
method queues up callback for being run the next time the user agent wishes to run an animation frame for the device.
When this method is invoked, the user agent MUST run the following steps:
1. Let session be the target XRSession object.
2. Increment session's animation frame callback identifier by one.
3. Append callback to session's list of animation frame callbacks, associated with session's animation frame callback identifier's current value.
4. Return session's animation frame callback identifier's current value.
The cancelAnimationFrame(handle)
method cancels an existing animation frame callback given its animation frame callback identifier handle.
When this method is invoked, the user agent MUST run the following steps:
1. Let session be the target XRSession object.
2. Find the entry in session's list of animation frame callbacks that is associated with the value handle.
3. If there is such an entry, set its cancelled boolean to true and remove it from session's list of animation frame callbacks.
When an XRSession
session receives updated viewer state from the XR device, it runs an XR animation frame with a timestamp now and an XRFrame
frame, which MUST run the following steps regardless of whether the list of animation frame callbacks is empty:
1. If session's pending render state is not null, apply the pending render state.
2. If session's renderState's baseLayer is null, abort these steps.
3. If session's mode is "inline" and session's renderState's output canvas is null, abort these steps.
4. Let callbacks be a list of the entries in session's list of animation frame callbacks, in the order in which they were added to the list.
5. Set session's list of animation frame callbacks to the empty list.
6. Set frame's active boolean to true.
7. Set frame's animationFrame boolean to true.
8. For each entry in callbacks, in order:
    1. If the entry's cancelled boolean is true, continue to the next entry.
    2. Invoke the Web IDL callback function, passing now and frame as the arguments.
    3. If an exception is thrown, report the exception.
9. Set frame's active boolean to false.
Window requestAnimationFrame() callbacks may not be processed while an immersive session is active, for instance on a mobile or standalone device where the immersive content completely obscures the HTML document. As such, developers must not rely on Window requestAnimationFrame() callbacks to schedule XRSession requestAnimationFrame() callbacks and vice versa, even if they share the same rendering logic. Applications that do not follow this guidance may not execute properly on all platforms. A more effective pattern for applications that wish to transition between these two types of animation loops is demonstrated below:
```js
let xrSession = null;

function onWindowAnimationFrame(time) {
  window.requestAnimationFrame(onWindowAnimationFrame);

  // This may be called while an immersive session is running on some devices,
  // such as a desktop with a tethered headset. To prevent two loops from
  // rendering in parallel, skip drawing in this one until the session ends.
  if (!xrSession) {
    renderFrame(time, null);
  }
}

// The window animation loop can be started immediately upon the page loading.
window.requestAnimationFrame(onWindowAnimationFrame);

function onXRAnimationFrame(time, xrFrame) {
  xrSession.requestAnimationFrame(onXRAnimationFrame);
  renderFrame(time, xrFrame);
}

function renderFrame(time, xrFrame) {
  // Shared rendering logic.
}

// Assumed to be called by a user gesture event elsewhere in code.
function startXRSession() {
  navigator.xr.requestSession('immersive-vr').then((session) => {
    xrSession = session;
    xrSession.addEventListener('end', onXRSessionEnded);
    // Do necessary session setup here.
    // Begin the session's animation loop.
    xrSession.requestAnimationFrame(onXRAnimationFrame);
  });
}

function onXRSessionEnded() {
  xrSession = null;
}
```
Applications which use "inline"
sessions for rendering to the HTML document do not need to take any special steps to coordinate the animation loops, since the user agent will automatically suspend the animation loops of any "inline"
sessions while an immersive session is active.
4.4. The XR Compositor
The user agent MUST maintain an XR Compositor which handles presentation to the XR device and frame timing. The compositor MUST use an independent rendering context whose state is isolated from that of any graphics contexts created by the document. The compositor MUST prevent the page from corrupting the compositor state or reading back content from other pages or applications. The compositor MUST also run in a separate thread or process to decouple the performance of the page from the ability to present new imagery to the user at the appropriate framerate. The compositor MAY composite additional device or user agent UI over rendered content, like device menus.
Note: Future extensions to this spec may utilize the compositor to composite multiple layers coming from the same page as well.
5. Frame Loop
5.1. XRFrame
An XRFrame
represents a snapshot of the state of all of the tracked objects for an XRSession
. Applications can acquire an XRFrame
by calling requestAnimationFrame()
on an XRSession
with an XRFrameRequestCallback
. When the callback is called it will be passed an XRFrame
. Events which need to communicate tracking state, such as the select
event, will also provide an XRFrame
.
```webidl
[SecureContext, Exposed=Window]
interface XRFrame {
  [SameObject] readonly attribute XRSession session;

  XRViewerPose? getViewerPose(XRReferenceSpace referenceSpace);
  XRPose? getPose(XRSpace space, XRSpace baseSpace);
};
```
Each XRFrame
has an active boolean which is initially set to false
, and an animationFrame boolean which is initially set to false
.
The session
attribute returns the XRSession
that produced the XRFrame
.
The getViewerPose(referenceSpace)
method provides the pose of the viewer relative to referenceSpace as an XRViewerPose
, at the time represented by the XRFrame
.
When this method is invoked, the user agent MUST run the following steps:
1. Let frame be the target XRFrame.
2. Let session be frame's session object.
3. If frame's animationFrame boolean is false, throw an InvalidStateError and abort these steps.
4. Let pose be a new XRViewerPose object.
5. Populate the pose of session's viewer reference space in referenceSpace at the time represented by frame into pose.
6. If pose is null, return null.
7. Let xrviews be an empty list.
8. For each view view in the list of views on the viewer reference space of session, perform the following steps:
    1. Let xrview be a new XRView object.
    2. Initialize xrview's underlying view to view.
    3. Initialize xrview's frame to frame.
    4. Let offset be an XRRigidTransform equal to the view offset of view.
    5. Set xrview's transform property to the result of multiplying the XRViewerPose's transform by the offset transform.
    6. Append xrview to xrviews.
9. Set pose's views to xrviews.
10. Return pose.
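In practice, getViewerPose() is called once per animation frame; in this sketch xrRefSpace is assumed to be a previously requested XRReferenceSpace:

```js
function onXRFrame(time, frame) {
  frame.session.requestAnimationFrame(onXRFrame);

  const viewerPose = frame.getViewerPose(xrRefSpace);
  if (viewerPose) {
    for (const view of viewerPose.views) {
      // Render this view, using view.projectionMatrix as the projection
      // matrix and view.transform.inverse.matrix as the view matrix.
    }
  }
}
```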
The getPose(space, baseSpace)
method provides the pose of space relative to baseSpace as an XRPose
, at the time represented by the XRFrame
.
When this method is invoked, the user agent MUST run the following steps:
1. Let frame be the target XRFrame.
2. Let pose be a new XRPose object.
3. Populate the pose of space in baseSpace at the time represented by frame into pose.
4. Return pose.
6. Spaces
A core feature of the WebXR Device API is the ability to provide spatial tracking. Spaces are the interface that enable applications to reason about how tracked entities are spatially related to the user’s physical environment and each other.
6.1. XRSpace
An XRSpace
represents a virtual coordinate system with an origin that corresponds to a physical location. Spatial data that is requested from the API or given to the API is always expressed in relation to a specific XRSpace
at the time of a specific XRFrame
. Numeric values such as pose positions are coordinates in that space relative to its origin. The interface is intentionally opaque.
```webidl
[SecureContext, Exposed=Window]
interface XRSpace : EventTarget { };
```
Each XRSpace
has a session which is set to the XRSession
that created the XRSpace
.
Each XRSpace
has a native origin that is tracked by the XR device's underlying tracking system, and an effective origin, which is the basis of the XRSpace
's coordinate system. The transform from the effective space to the native origin's space is defined by an origin offset, which is an XRRigidTransform
initially set to an identity transform.
The effective origin of an XRSpace
can only be observed in the coordinate system of another XRSpace
as an XRPose
, returned by an XRFrame
's getPose()
method. The spatial relationship between XRSpace
s MAY change between XRFrame
s.
To populate the pose of an XRSpace
space in an XRSpace
baseSpace at the time represented by an XRFrame
frame into an XRPose
pose, the user agent MUST run the following steps:
1. If frame's active boolean is false, throw an InvalidStateError and abort these steps.
2. Let session be frame's session object.
3. If space's session does not equal session, throw an InvalidStateError and abort these steps.
4. If baseSpace's session does not equal session, throw an InvalidStateError and abort these steps.
5. Check if poses may be reported and, if not, throw a SecurityError and abort these steps.
6. Let limit be the result of whether poses must be limited between space and baseSpace.
7. Let transform be pose's transform.
8. Query the XR device's tracking system for space's pose relative to baseSpace at the time represented by frame, then perform the following steps:
    - If limit is false and the tracking system provides a 6DoF pose whose position is actively tracked or statically known for space's pose relative to baseSpace:
        1. Set transform's orientation to the orientation of space's effective origin in baseSpace's coordinate system.
        2. Set transform's position to the position of space's effective origin in baseSpace's coordinate system.
        3. Set pose's emulatedPosition to false.
    - Else if limit is false and the tracking system provides a 3DoF pose or a 6DoF pose whose position is neither actively tracked nor statically known for space's pose relative to baseSpace:
        1. Set transform's orientation to the orientation of space's effective origin in baseSpace's coordinate system.
        2. Set transform's position to the tracking system's best estimate of the position of space's effective origin in baseSpace's coordinate system. This MAY include a computed offset such as a neck or arm model. If a position estimate is not available, the last known position MUST be used.
        3. Set pose's emulatedPosition to true.
    - Else if space's pose relative to baseSpace has been determined in the past:
        1. Set transform's position to the last known position of space's effective origin in baseSpace's coordinate system.
        2. Set transform's orientation to the last known orientation of space's effective origin in baseSpace's coordinate system.
        3. Set pose's emulatedPosition boolean to true.
    - Else if space's pose relative to baseSpace has never been determined: set pose to null.
Note: The XRPose
's emulatedPosition
boolean does not indicate whether baseSpace’s position is emulated or not, only whether evaluating space’s position relative to baseSpace relies on emulation. For example, a controller with 3DoF tracking would report poses with an emulatedPosition
of true
when its targetRaySpace
or gripSpace
are queried against an XRReferenceSpace
, but would report an emulatedPosition
of false
if the pose of the targetRaySpace
was queried in gripSpace
, because the relationship between those two spaces should be known exactly.
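For instance, a sketch of querying a controller pose each frame, where inputSource is assumed to come from the session's inputSources and drawController() is a hypothetical helper:

```js
const gripPose = frame.getPose(inputSource.gripSpace, xrRefSpace);
if (gripPose) {
  // emulatedPosition is true when the position is estimated, e.g. via an
  // arm model on a 3DoF controller.
  drawController(gripPose.transform.matrix, gripPose.emulatedPosition);
}
```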
6.2. XRReferenceSpace
An XRReferenceSpace
is one of several common XRSpace
s that applications can use to establish a spatial relationship with the user’s physical environment.
XRReferenceSpace
s are generally expected to remain static for the duration of the XRSession
, with the most common exception being mid-session reconfiguration by the user. The native origin for every XRReferenceSpace
describes a coordinate system where +X
is considered "Right", +Y
is considered "Up", and -Z
is considered "Forward".
```webidl
enum XRReferenceSpaceType {
  "viewer",
  "local",
  "local-floor",
  "bounded-floor",
  "unbounded"
};

[SecureContext, Exposed=Window]
interface XRReferenceSpace : XRSpace {
  [NewObject] XRReferenceSpace getOffsetReferenceSpace(XRRigidTransform originOffset);

  attribute EventHandler onreset;
};
```
Each XRReferenceSpace
has a type, which is an XRReferenceSpaceType
.
An XRReferenceSpace
is most frequently obtained by calling requestReferenceSpace()
, which creates an instance of an XRReferenceSpace
or an interface extending it, determined by the XRReferenceSpaceType
enum value passed into the call. The type indicates the tracking behavior that the reference space will exhibit:
- Passing a type of viewer creates an XRReferenceSpace instance. It represents a tracking space with a native origin which tracks the position and orientation of the viewer. Every XRSession MUST support "viewer" XRReferenceSpaces.
- Passing a type of local creates an XRReferenceSpace instance. It represents a tracking space with a native origin near the viewer at the time of creation. The exact position and orientation will be initialized based on the conventions of the underlying platform. When using this reference space the user is not expected to move beyond their initial position much, if at all, and tracking is optimized for that purpose. For devices with 6DoF tracking, local reference spaces should emphasize keeping the origin stable relative to the user's environment.
- Passing a type of local-floor creates an XRReferenceSpace instance. It represents a tracking space with a native origin at the floor in a safe position for the user to stand. The y axis equals 0 at floor level, with the x and z position and orientation initialized based on the conventions of the underlying platform. If the floor level isn't known it MUST be estimated. If the estimated floor level is determined with a non-default value, it MUST be rounded sufficiently to prevent fingerprinting. When using this reference space the user is not expected to move beyond their initial position much, if at all, and tracking is optimized for that purpose. For devices with 6DoF tracking, local-floor reference spaces should emphasize keeping the origin stable relative to the user's environment.

  Note: If the floor level of a "local-floor" reference space is adjusted to prevent fingerprinting, rounding to the nearest 1cm is suggested.

- Passing a type of bounded-floor creates an XRBoundedReferenceSpace instance if supported by the XR device and the XRSession. It represents a tracking space with its native origin at the floor, where the user is expected to move within a pre-established boundary, given as the boundsGeometry. Tracking in a bounded-floor reference space is optimized for keeping the native origin and boundsGeometry stable relative to the user's environment.
- Passing a type of unbounded creates an XRReferenceSpace instance if supported by the XR device and the XRSession. It represents a tracking space where the user is expected to move freely around their environment, potentially even long distances from their starting point. Tracking in an unbounded reference space is optimized for stability around the user's current position, and as such the native origin may drift over time.
Devices that support "local"
reference spaces MUST support "local-floor"
reference spaces, through emulation if necessary, and vice versa.
The onreset
attribute is an Event handler IDL attribute for the reset
event type.
When an XRReferenceSpace
is requested, the user agent MUST create a reference space by running the following steps:
1. Let session be the XRSession object that requested creation of a reference space.
2. Let type be set to the XRReferenceSpaceType passed to requestReferenceSpace().
3. If the reference space is supported for type and session, run the following steps:
    1. Initialize referenceSpace as follows:
        - If type is bounded-floor: let referenceSpace be a new XRBoundedReferenceSpace.
        - Otherwise: let referenceSpace be a new XRReferenceSpace.
    2. Initialize referenceSpace's type to type.
    3. Initialize referenceSpace's session to session.
    4. Return referenceSpace.
4. Return null.
To determine if a reference space is supported for an XRReferenceSpaceType type and XRSession session, run the following steps:

1. If type is not contained in session's list of enabled features, return false.
2. If type is viewer, return true.
3. If type is local or local-floor, and session is an immersive session, return true.
4. If type is local or local-floor, and the XR device supports reporting orientation data, return true.
5. If type is bounded-floor and session is an immersive session, return the result of whether bounded reference spaces are supported by the XR device.
6. If type is unbounded, session is an immersive session, and the XR device supports stable tracking near the user over an unlimited distance, return true.
7. Return false.
The getOffsetReferenceSpace(originOffset) method MUST perform the following steps when invoked:
1. Let base be the XRReferenceSpace the method was called on.
2. Initialize offsetSpace as follows:
    - If base is an instance of XRBoundedReferenceSpace: let offsetSpace be a new XRBoundedReferenceSpace and set offsetSpace's boundsGeometry to base's boundsGeometry, with each point multiplied by the inverse of originOffset.
    - Else: let offsetSpace be a new XRReferenceSpace.
3. Set offsetSpace's native origin to base's native origin.
4. Set offsetSpace's origin offset to the result of multiplying base's origin offset by originOffset.
5. Return offsetSpace.
Note: It’s expected that some applications will use getOffsetReferenceSpace()
to implement scene navigation controls based on mouse, keyboard, touch, or gamepad input. This will result in getOffsetReferenceSpace()
being called frequently, at least once per-frame during periods of active input. As a result UAs are strongly encouraged to make the creation of new XRReferenceSpace
s with getOffsetReferenceSpace()
a lightweight operation.
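Such navigation might look like the following sketch, where baseRefSpace is a previously requested reference space and sceneZ is a hypothetical application-maintained offset updated from keyboard input:

```js
let xrRefSpace = baseRefSpace;
let sceneZ = 0;

function applyNavigation() {
  // Recreate the offset space each frame while navigation input is active.
  const offset = new XRRigidTransform({ x: 0, y: 0, z: sceneZ });
  xrRefSpace = baseRefSpace.getOffsetReferenceSpace(offset);
}
```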
6.3. XRBoundedReferenceSpace
XRBoundedReferenceSpace extends XRReferenceSpace to include boundsGeometry, indicating the pre-configured boundaries of the user's space.
```webidl
[SecureContext, Exposed=Window]
interface XRBoundedReferenceSpace : XRReferenceSpace {
  readonly attribute FrozenArray<DOMPointReadOnly> boundsGeometry;
};
```
The origin of an XRBoundedReferenceSpace
MUST be positioned at the floor, such that the y
axis equals 0
at floor level. The x
and z
position and orientation are initialized based on the conventions of the underlying platform, typically expected to be near the center of the room facing in a logical forward direction.
Note: Other XR platforms sometimes refer to the type of tracking offered by a bounded-floor
reference space as "room scale" tracking. An XRBoundedReferenceSpace
is not intended to describe multi-room spaces, areas with uneven floor levels, or very large open areas. Content that needs to handle those scenarios should use an unbounded
reference space.
Each XRBoundedReferenceSpace
has a native bounds geometry describing the border around the XRBoundedReferenceSpace
, which the user can expect to safely move within. The polygonal boundary is given as an array of DOMPointReadOnly
s, which represents a loop of points at the edges of the safe space. The points describe offsets from the native origin in meters. Points MUST be given in a clockwise order as viewed from above, looking towards the negative end of the Y axis. The y
value of each point MUST be 0
and the w
value of each point MUST be 1
. The bounds can be considered to originate at the floor and extend infinitely high. The shape it describes MAY be convex or concave.
Each point in the native bounds geometry MUST be limited to a reasonable distance from the reference space’s native origin.
Note: It is suggested that points of the native bounds geometry be limited to 15 meters from the native origin in all directions.
Each point in the native bounds geometry MUST also be quantized sufficiently to prevent fingerprinting. For the user’s safety, quantized point values MUST NOT fall outside the bounds reported by the platform.
Note: It is suggested that points of the native bounds geometry be quantized to the nearest 5cm.
The boundsGeometry
attribute is an array of DOMPointReadOnly
s such that each entry is equal to the entry in the XRBoundedReferenceSpace
's native bounds geometry premultiplied by the inverse
of the origin offset. In other words, it provides the same border in XRBoundedReferenceSpace
coordinates relative to the effective origin.
Note: Content should not require the user to move beyond the boundsGeometry
. It is possible for the user to move beyond the bounds if their physical surroundings allow for it, resulting in position values outside of the polygon they describe. This is not an error condition and should be handled gracefully by page content.
Note: Content generally should not provide a visualization of the boundsGeometry
, as it’s the user agent’s responsibility to ensure that safety critical information is provided to the user.
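The following non-normative sketch reads boundsGeometry to estimate the extents of the safe area. The helper name getBoundsExtents is an assumption of this example.

function getBoundsExtents(boundedReferenceSpace) {
  let minX = Infinity, maxX = -Infinity;
  let minZ = Infinity, maxZ = -Infinity;
  for (let point of boundedReferenceSpace.boundsGeometry) {
    // Each point lies on the floor (y == 0), expressed in meters relative to
    // the effective origin.
    minX = Math.min(minX, point.x); maxX = Math.max(maxX, point.x);
    minZ = Math.min(minZ, point.z); maxZ = Math.max(maxZ, point.z);
  }
  // Rough width and depth of the safe area in meters.
  return { width: maxX - minX, depth: maxZ - minZ };
}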
7. Views
7.1. XRView
An XRView
describes a single view into an XR scene for a given frame.
Each view corresponds to a display or portion of a display used by an XR device to present imagery to the user. They are used to retrieve all the information necessary to render content that is well aligned to the view's physical output properties, including the field of view, eye offset, and other optical properties. Views may cover overlapping regions of the user’s vision. No guarantee is made about the number of views any XR device uses or their order, nor is the number of views required to be constant for the duration of an XRSession
.
A view has an associated internal view offset, which is an XRRigidTransform
describing the position and orientation of the view in the viewer reference space's coordinate system.
A view has an associated projection matrix which is a matrix describing the projection to be used when rendering the view, provided by the underlying XR device. The projection matrix MAY include transformations such as shearing that prevent the projection from being accurately described by a simple frustum.
A view has an associated eye which is an XREye
describing which eye this view is expected to be shown to. If the view does not have an intrinsically associated eye (the display is monoscopic, for example) this value MUST be set to "none"
.
Note: Many HMDs will request that content render two views, one for the left eye and one for the right, while most magic window devices will only request one view, but applications should never assume a specific view configuration. For example: A magic window device may request two views if it is capable of stereo output, but may revert to requesting a single view for performance reasons if the stereo output mode is turned off. Similarly, HMDs may request more than two views to facilitate a wide field of view or displays of different pixel density.
enum XREye {
  "none",
  "left",
  "right"
};

[SecureContext, Exposed=Window]
interface XRView {
  readonly attribute XREye eye;
  readonly attribute Float32Array projectionMatrix;
  [SameObject] readonly attribute XRRigidTransform transform;
};
The eye
attribute describes the eye of the underlying view. This attribute’s primary purpose is to ensure that pre-rendered stereo content can present the correct portion of the content to the correct eye.
The projectionMatrix
attribute is the projection matrix of the underlying view. It is strongly recommended that applications use this matrix without modification or decomposition. Failure to use the provided projection matrices when rendering may cause the presented frame to be distorted or badly aligned, resulting in varying degrees of user discomfort. This attribute MUST be computed by obtaining the projection matrix for the XRView
.
The transform
attribute is the XRRigidTransform
of the viewpoint. It represents the position and orientation of the viewpoint in the XRReferenceSpace
provided in getViewerPose()
.
Each XRView
has an associated frame which is the XRFrame
that produced it.
Each XRView
has an associated underlying view which is the underlying view that it represents.
Each XRView
has an associated internal projection matrix which stores the projection matrix of its underlying view. It is initially null
.
Note: The transform
can be used to position camera objects in many rendering libraries. If a more traditional view matrix is needed by the application one can be retrieved by calling view.transform.inverse.matrix
.
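The following non-normative sketch feeds a view’s matrices to WebGL. The uniform locations projectionLoc and viewLoc are assumed to have been looked up by the application elsewhere.

function setViewUniforms(gl, xrView) {
  // Use the provided projection matrix unmodified to avoid distortion.
  gl.uniformMatrix4fv(projectionLoc, false, xrView.projectionMatrix);
  // A traditional view matrix is the inverse of the view's transform.
  gl.uniformMatrix4fv(viewLoc, false, xrView.transform.inverse.matrix);
}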
To obtain the projection matrix for a given XRView view, the user agent MUST run the following steps:
-
If view’s internal projection matrix is not
null
, perform the following steps: -
If the operation
IsDetachedBuffer
on internal projection matrix isfalse
, return view’s internal projection matrix. -
Set view’s internal projection matrix to a new matrix which is equal to view’s underlying view's projection matrix.
-
Return view’s internal projection matrix.
7.2. XRViewport
An XRViewport
object describes a viewport, or rectangular region, of a graphics surface.
[SecureContext, Exposed=Window]
interface XRViewport {
  readonly attribute long x;
  readonly attribute long y;
  readonly attribute long width;
  readonly attribute long height;
};
The x
and y
attributes define an offset from the surface origin and the width
and height
attributes define the rectangular dimensions of the viewport.
The exact interpretation of the viewport values depends on the conventions of the graphics API the viewport is associated with:
-
When used with an
XRWebGLLayer
thex
andy
attributes specify the lower left corner of the viewport rectangle, in pixels, with the viewport rectangle extendingwidth
pixels to the right ofx
andheight
pixels abovey
. The values can be passed to the WebGL viewport function directly.
The following example loops through the XRViews of an XRViewerPose, queries an XRViewport from an XRWebGLLayer for each, and uses them to set the appropriate WebGL viewports for rendering.
xrSession.requestAnimationFrame((time, xrFrame) => {
  let viewer = xrFrame.getViewerPose(xrReferenceSpace);

  gl.bindFramebuffer(gl.FRAMEBUFFER, xrWebGLLayer.framebuffer);
  for (let xrView of viewer.views) {
    let xrViewport = xrWebGLLayer.getViewport(xrView);
    gl.viewport(xrViewport.x, xrViewport.y, xrViewport.width, xrViewport.height);

    // WebGL draw calls will now be rendered into the appropriate viewport.
  }
});
8. Geometric Primitives
8.1. Matrices
WebXR provides various transforms in the form of matrices. WebXR uses the WebGL conventions when communicating matrices, in which 4x4 matrices are given as 16 element Float32Array
s with column major storage, and are applied to column vectors by premultiplying the matrix from the left. They may be passed directly to WebGL’s uniformMatrix4fv
function, used to create an equivalent DOMMatrix
, or used with a variety of third party math libraries.
Consider a Float32Array laid out like so:
[a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15]
Applying this matrix as a transform to a column vector specified as a DOMPointReadOnly
like so:
{x:X, y:Y, z:Z, w:1}
Produces the following result:
a0  a4  a8   a12      X      a0*X + a4*Y + a8*Z  + a12
a1  a5  a9   a13   *  Y   =  a1*X + a5*Y + a9*Z  + a13
a2  a6  a10  a14      Z      a2*X + a6*Y + a10*Z + a14
a3  a7  a11  a15      1      a3*X + a7*Y + a11*Z + a15
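The following non-normative sketch expresses the same multiplication in JavaScript; transformPoint is a hypothetical helper, not part of the API.

function transformPoint(m, p) {
  // m is a 16 element column-major Float32Array, p is {x, y, z, w}.
  return {
    x: m[0] * p.x + m[4] * p.y + m[8]  * p.z + m[12] * p.w,
    y: m[1] * p.x + m[5] * p.y + m[9]  * p.z + m[13] * p.w,
    z: m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14] * p.w,
    w: m[3] * p.x + m[7] * p.y + m[11] * p.z + m[15] * p.w
  };
}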
8.2. Normalization
There are several algorithms which call for a vector or quaternion to be normalized, which means to scale the components to have a collective magnitude of 1.0
.
To normalize a list of components the UA MUST perform the following steps:
-
Let length be the square root of the sum of the squares of each component.
-
If length is
0
, throw anInvalidStateError
and abort these steps. -
Divide each component by length and update the component with the resulting value.
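The following non-normative sketch applies the normalization steps above to a quaternion-like object; the function name normalize is an assumption of this example.

function normalize(q) {
  let length = Math.sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
  // Mirrors the InvalidStateError case in the steps above.
  if (length === 0) throw new Error("Cannot normalize a zero-length value.");
  return { x: q.x / length, y: q.y / length, z: q.z / length, w: q.w / length };
}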
8.3. XRRigidTransform
An XRRigidTransform
is a transform described by a position
and orientation
. When interpreting an XRRigidTransform
the orientation
is always applied prior to the position
.
An XRRigidTransform
contains an internal matrix which is a matrix.
[SecureContext, Exposed=Window]
interface XRRigidTransform {
  constructor(optional DOMPointInit position = {},
              optional DOMPointInit orientation = {});
  [SameObject] readonly attribute DOMPointReadOnly position;
  [SameObject] readonly attribute DOMPointReadOnly orientation;
  readonly attribute Float32Array matrix;
  [SameObject] readonly attribute XRRigidTransform inverse;
};
The XRRigidTransform(position, orientation)
constructor MUST perform the following steps when invoked:
-
Let transform be a new
XRRigidTransform
. -
If position is not a
DOMPointInit
initialize transform’sposition
to{ x: 0.0, y: 0.0, z: 0.0, w: 1.0 }
. -
Initialize transform’s
position
’sx
value to position’s x dictionary member,y
value to position’s y dictionary member,z
value to position’s z dictionary member andw
to1.0
. -
Initialize transform’s
orientation
as follows:- If orientation is not a
DOMPointInit
- Initialize transform’s
orientation
to{ x: 0.0, y: 0.0, z: 0.0, w: 1.0 }
. - Else
- Initialize transform’s
orientation
’sx
value to orientation’s x dictionary member,y
value to orientation’s y dictionary member,z
value to orientation’s z dictionary member andw
value to orientation’s w dictionary member.
- If orientation is not a
-
Initialize transform’s internal matrix to
null
. -
Normalize
x
,y
,z
, andw
components of transform’sorientation
. -
Return transform.
The position
attribute is a 3-dimensional point, given in meters, describing the translation component of the transform. The position
's w
attribute MUST be 1.0
.
The orientation
attribute is a quaternion describing the rotational component of the transform. The orientation
MUST be normalized to have a length of 1.0
.
The matrix
attribute returns the transform described by the position
and orientation
attributes as a matrix. This attribute MUST be computed by obtaining the matrix for the XRRigidTransform
.
Note: This matrix when premultiplied onto a column vector will rotate the vector by the 3D rotation described by orientation
, and then translate it by position
. Mathematically in column-vector notation, this is M = T * R
, where T
is a translation matrix corresponding to position
and R
is a rotation matrix corresponding to orientation
.
To obtain the matrix for a given XRRigidTransform transform, the user agent MUST run the following steps:
-
If transform’s internal matrix is not
null
, perform the following steps: -
If the operation
IsDetachedBuffer
on internal matrix isfalse
, return transform’s internal matrix. -
Let translation be a new matrix which is a column-vector translation matrix corresponding to
position
. Mathematically, ifposition
is(x, y, z)
, this matrix is -
Let rotation be a new matrix which is a column-vector rotation matrix corresponding to
orientation
. Mathematically, iforientation
is the unit quaternion(qx, qy, qz, qw)
, this matrix is -
Set transform’s internal matrix to a new matrix set to the result of multiplying translation and rotation with translation on the left (
translation * rotation
). Mathematically, this matrix is -
Return transform’s internal matrix.
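The following non-normative sketch composes translation * rotation into a column-major Float32Array using the standard quaternion-to-matrix expansion. The helper name rigidTransformMatrix is an assumption of this example, and position and orientation are plain objects here rather than DOMPointReadOnly instances.

function rigidTransformMatrix(position, orientation) {
  // position is {x, y, z}; orientation is a unit quaternion {x, y, z, w}.
  const { x: qx, y: qy, z: qz, w: qw } = orientation;
  return new Float32Array([
    // Column 0 (rotation)
    1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy + qz * qw), 2 * (qx * qz - qy * qw), 0,
    // Column 1 (rotation)
    2 * (qx * qy - qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz + qx * qw), 0,
    // Column 2 (rotation)
    2 * (qx * qz + qy * qw), 2 * (qy * qz - qx * qw), 1 - 2 * (qx * qx + qy * qy), 0,
    // Column 3 (translation)
    position.x, position.y, position.z, 1
  ]);
}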
The inverse
attribute returns an XRRigidTransform
which, if applied to an object that had previously been transformed by the original XRRigidTransform
, would undo the transform and return the object to its initial pose. This attribute SHOULD be lazily evaluated. The XRRigidTransform
returned by inverse
MUST return the originating XRRigidTransform
as its inverse
.
An XRRigidTransform
with a position
of { x: 0, y: 0, z: 0, w: 1 }
and an orientation
of { x: 0, y: 0, z: 0, w: 1 }
is known as an identity transform.
To multiply two XRRigidTransform
s, B and A, the UA MUST perform the following steps:
-
Let result be a new
XRRigidTransform
object. -
Set result’s
matrix
to the result of premultiplying B’smatrix
from the left onto A’smatrix
. -
Set result’s
orientation
to the quaternion that describes the rotation indicated by the top left 3x3 sub-matrix of result’smatrix
. -
Set result’s
position
to the vector given by the fourth column of result’smatrix
. -
Return result.
result is a transform from A’s source space to B’s destination space.
Note: This is equivalent to constructing an XRRigidTransform
whose orientation
is the composition of the orientation of A and B, and whose position
is equal to A’s position
rotated by B’s orientation
, added to B’s position
.
9. Pose
9.1. XRPose
An XRPose
describes a position and orientation in space relative to an XRSpace
.
[SecureContext, Exposed=Window]
interface XRPose {
  [SameObject] readonly attribute XRRigidTransform transform;
  readonly attribute boolean emulatedPosition;
};
The transform
attribute describes the position and orientation relative to the base XRSpace
.
The emulatedPosition
attribute is false
when the transform
represents an actively tracked 6DoF pose based on sensor readings, or true
if its position
value includes a computed offset, such as that provided by a neck or arm model.
9.2. XRViewerPose
An XRViewerPose
is an XRPose
describing the state of a viewer of the XR scene as tracked by the XR device. A viewer may represent a tracked piece of hardware, the observed position of a user’s head relative to the hardware, or some other means of computing a series of viewpoints into the XR scene. XRViewerPose
s can only be queried relative to an XRReferenceSpace
. It provides, in addition to the XRPose
values, an array of views which include rigid transforms to indicate the viewpoint and projection matrices. These values should be used by the application when rendering a frame of an XR scene.
[SecureContext, Exposed=Window]
interface XRViewerPose : XRPose {
  [SameObject] readonly attribute FrozenArray<XRView> views;
};
The views
array is a sequence of XRView
s describing the viewpoints of the XR scene, relative to the XRReferenceSpace
the XRViewerPose
was queried with. Every view of the XR scene in the array must be rendered in order to display correctly on the XR device. Each XRView
includes rigid transforms to indicate the viewpoint and projection matrices, and can be used to query XRViewport
s from layers when needed.
Note: The XRViewerPose
's transform
can be used to position graphical representations of the viewer for spectator views of the scene or multi-user interaction.
10. Input
10.1. XRInputSource
An XRInputSource
represents an XR input source, which is any input mechanism which allows the user to perform targeted actions in the same virtual space as the viewer. Example XR input sources include, but are not limited to, handheld controllers, optically tracked hands, and gaze-based input methods that operate on the viewer's pose. Input mechanisms which are not explicitly associated with the XR Device, such as traditional gamepads, mice, or keyboards SHOULD NOT be considered XR input sources.
enum XRHandedness {
  "none",
  "left",
  "right"
};

enum XRTargetRayMode {
  "gaze",
  "tracked-pointer",
  "screen"
};

[SecureContext, Exposed=Window]
interface XRInputSource {
  readonly attribute XRHandedness handedness;
  readonly attribute XRTargetRayMode targetRayMode;
  [SameObject] readonly attribute XRSpace targetRaySpace;
  [SameObject] readonly attribute XRSpace? gripSpace;
  [SameObject] readonly attribute FrozenArray<DOMString> profiles;
};
The handedness
attribute describes which hand the XR input source is associated with, if any. Input sources with no natural handedness (such as headset-mounted controls) or for which the handedness is not currently known MUST set this attribute to "none"
.
The targetRayMode
attribute describes the method used to produce the target ray, and indicates how the application should present the target ray to the user if desired.
-
gaze
indicates the target ray will originate at the viewer and follow the direction it is facing. (This is commonly referred to as a "gaze input" device in the context of head-mounted displays.) -
tracked-pointer
indicates that the target ray originates from either a handheld device or other hand-tracking mechanism and represents that the user is using their hands or the held device for pointing. The orientation of the target ray relative to the tracked object MUST follow platform-specific ergonomics guidelines when available. In the absence of platform-specific guidance, the target ray SHOULD point in the same direction as the user’s index finger if it was outstretched. -
screen
indicates that the input source was an interaction with the canvas element associated with an inline session’s output context, such as a mouse click or touch event.
The targetRaySpace
attribute is an XRSpace
that has a native origin tracking the position and orientation of the preferred pointing ray of the XRInputSource
, as defined by the targetRayMode
.
The gripSpace
attribute is an XRSpace
that has a native origin tracking the pose that should be used to render virtual objects such that they appear to be held in the user’s hand. If the user were to hold a straight rod, this XRSpace
places the native origin at the centroid of their curled fingers and where the -Z
axis points along the length of the rod towards their thumb. The X
axis is perpendicular to the back of the hand being described, with the back of the user’s right hand pointing towards +X
and the back of the user’s left hand pointing towards -X
. The Y
axis is implied by the relationship between the X
and Z
axis, with +Y
roughly pointing in the direction of the user’s arm.
The gripSpace
MUST be null
if the input source isn’t inherently trackable such as for input sources with a targetRayMode
of "gaze"
or "screen"
.
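The following non-normative sketch retrieves poses for an input source’s target ray and grip during an XR animation frame. The names referenceSpace and drawInputSource are assumptions of this example.

function drawInputSource(frame, referenceSpace, inputSource) {
  let targetRayPose = frame.getPose(inputSource.targetRaySpace, referenceSpace);
  if (targetRayPose) {
    // ... render a pointing ray using targetRayPose.transform ...
  }

  // gripSpace is null for input sources that are not inherently trackable.
  if (inputSource.gripSpace) {
    let gripPose = frame.getPose(inputSource.gripSpace, referenceSpace);
    if (gripPose) {
      // ... render a controller model using gripPose.transform ...
    }
  }
}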
The profiles
attribute is a list of input profile names indicating both the preferred visual representation and behavior of the input source.
An input profile name is a lowercase DOMString
containing no spaces, with separate words concatenated with a hyphen (-
) character. A descriptive name should be chosen, using the preferred verbiage of the device vendor when possible. If the platform provides an appropriate identifier, such as a USB vendor and product ID, it MAY be used. Values that uniquely identify a single device, such as serial numbers, MUST NOT be used. The input profile name MUST NOT contain an indication of device handedness. If multiple user agents expose the same device, they SHOULD make an effort to report the same input profile name. The WebXR Input Profiles Registry is the recommended location for managing input profile names.
Profiles are given in descending order of specificity. Any input profile names given after the first entry in the list should provide fallback values that represent alternative representations of the device. This may include a more generic or prior version of the device, a more widely recognized device that is sufficiently similar, or a broad description of the device type (such as "generic-trigger-touchpad"). If multiple profiles are given, the layouts they describe must all represent a superset or subset of every other profile in the list.
If the XRSession
's mode is "inline"
, profiles
MUST be an empty list.
If the input device cannot be reliably identified, or the user agent wishes to mask the input device being used, it MAY choose to only report generic input profile names or an empty list.
For example, the Samsung HMD Odyssey’s controller is a design variant of the standard Windows Mixed Reality controller. Both controllers share the same input layout. As a result, the profiles
for a Samsung HMD Odyssey controller could be: ["samsung-odyssey", "microsoft-mixed-reality", "generic-trigger-squeeze-touchpad-thumbstick"]
. The appearance of the controller is most precisely communicated by the first profile in the list, with the second profile describing an acceptable substitute, and the last profile a generic fallback that describes the device in the roughest sense. (It’s a controller with a trigger, squeeze button, touchpad and thumbstick.)
Similarly, the Valve Index controller is backwards compatible with the HTC Vive controller, but the Index controller has additional buttons and axes. As a result, the profiles
for the Valve Index controller could be: ["valve-index", "htc-vive", "generic-trigger-squeeze-touchpad-thumbstick"]
. In this case the input layout described by the "valve-index"
profile is a superset of the layout described by the "htc-vive"
profile. Also, the "valve-index"
profile indicates the precise appearance of the controller, while the "htc-vive"
controller has a significantly different appearance. In this case the UA would have deemed that difference acceptable. And as in the first example, the last profile is a generic fallback.
(Exact strings are examples only. Actual profile names are managed in the WebXR Input Profiles Registry.)
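The following non-normative sketch walks the profiles list in order of specificity to pick the best available visual representation. The availableModels map and its fallback key are assumptions of this example.

function selectControllerModel(inputSource, availableModels) {
  // profiles is ordered from most to least specific, so the first match wins.
  for (let profile of inputSource.profiles) {
    if (profile in availableModels) {
      return availableModels[profile];
    }
  }
  // Hypothetical default asset used when no profile is recognized.
  return availableModels["generic-fallback"];
}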
Note: XRInputSource
s in an XRSession
's inputSources
array are "live". As such values within them are updated in-place. This means that it doesn’t work to save a reference to an XRInputSource
's attribute on one frame and compare it to the same attribute in a subsequent frame to test for state changes, because they will be the same object. Therefore developers that wish to compare input state from frame to frame should copy the content of the state in question.
Each XR input source SHOULD define a primary action. The primary action is a platform-specific action that, when engaged, produces selectstart
, selectend
, and select
events. Examples of possible primary actions are pressing a trigger, touchpad, or button, speaking a command, or making a hand gesture. If the platform guidelines define a recommended primary input then it should be used as the primary action, otherwise the user agent is free to select one.
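The following non-normative sketch reacts to a completed primary action. The names xrSession and referenceSpace are assumed to be the application’s active session and chosen reference space.

xrSession.addEventListener("select", (event) => {
  // event.frame corresponds to the time the action completed.
  let targetRayPose = event.frame.getPose(event.inputSource.targetRaySpace, referenceSpace);
  if (targetRayPose) {
    // ... hit test the scene along the target ray and act on the result ...
  }
});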
When an XR input source for XRSession
session begins its primary action the UA MUST run the following steps:
-
Queue a task to fire an
XRInputSourceEvent
namedselectstart
on session.
When an XR input source source for XRSession
session ends its primary action the UA MUST run the following steps:
-
Let frame be a new
XRFrame
withsession
session for the time the event occurred. -
Queue a task to perform the following steps:
-
Fire an input source event with name
select
, frame frame, and source source. -
Fire an input source event with name
selectend
, frame frame, and source source.
-
Sometimes platform-specific behavior can result in a primary action being interrupted or cancelled. For example, an XR input source may be removed from the XR device after the primary action is started but before it ends.
When an XR input source source for XRSession
session has its primary action cancelled the UA MUST run the following steps:
-
Let frame be a new
XRFrame
withsession
session for the time the event occurred. -
Queue a task to fire an input source event an
XRInputSourceEvent
with nameselectend
, frame frame, and source source.
10.2. Transient input
Some XR Devices may support transient input sources, where the XR input source is only meaningful while performing its primary action. An example would be mouse, touch, or stylus input against an "inline"
XRSession
, which MUST produce a transient XRInputSource
with a targetRayMode
set to screen
. Transient input sources are only present in the session’s list of active XR input sources for the duration of the selectstart
, select
, and selectend
event sequence.
Transient input sources follow a slightly different sequence when firing primary action events:
When a transient input source source for XRSession
session begins its primary action the UA MUST run the following steps:
-
Let frame be a new
XRFrame
withsession
session for the time the event occurred. -
Queue a task to perform the following steps:
-
Fire any
"pointerdown"
events produced by the XR input source's action, if necessary. -
Add the XR input source to the list of active XR input sources.
-
Fire an input source event with name
selectstart
, frame frame, and source source.
-
When a transient input source source for XRSession
session ends its primary action the UA MUST run the following steps:
-
Let frame be a new
XRFrame
withsession
session for the time the event occurred. -
Queue a task to perform the following steps:
-
Fire an input source event with name
select
, frame frame, and source source. -
Fire any
"click"
events produced by the XR input source's action, if necessary. -
Fire an input source event with name
selectend
, frame frame, and source source. -
Remove the XR input source from the list of active XR input sources.
-
Fire any
"pointerup"
events produced by the XR input source's action, if necessary.
-
When a transient input source source for XRSession
session has its primary action cancelled the UA MUST run the following steps:
-
Let frame be a new
XRFrame
withsession
session for the time the event occurred. -
Queue a task to perform the following steps:
-
Fire an input source event with name
selectend
, frame frame, and source source. -
Remove the XR input source from the list of active XR input sources.
-
Fire any
"pointerup"
events produced by the XR input source's action, if necessary.
-
10.3. XRInputSourceArray
An XRInputSourceArray
represents a list of XRInputSource
s. It is used in favor of a frozen array type when the contents of the list are expected to change over time, such as with the XRSession
inputSources
attribute.
[SecureContext, Exposed=Window]
interface XRInputSourceArray {
  iterable<XRInputSource>;
  readonly attribute unsigned long length;
  getter XRInputSource(unsigned long index);
};
The length
attribute of XRInputSourceArray
indicates how many XRInputSource
s are contained within the XRInputSourceArray
.
The indexed property getter of XRInputSourceArray
retrieves the XRInputSource
at the provided index.
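The following non-normative sketch shows that an XRInputSourceArray can be used both as an iterable and with indexed access; xrSession is assumed to be an active session.

// Iterate over all active input sources.
for (let inputSource of xrSession.inputSources) {
  console.log(inputSource.handedness, inputSource.targetRayMode);
}

// Or use the length attribute and indexed getter directly.
for (let i = 0; i < xrSession.inputSources.length; i++) {
  console.log(xrSession.inputSources[i].profiles);
}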
11. Layers
Note: While this specification only defines the XRWebGLLayer
layer, future extensions to the spec are expected to add additional layer types and the image sources that they draw from.
11.1. XRWebGLLayer
An XRWebGLLayer
is a layer which provides a WebGL framebuffer to render into, enabling hardware accelerated rendering of 3D graphics to be presented on the XR device.
typedef (WebGLRenderingContext or WebGL2RenderingContext) XRWebGLRenderingContext;

dictionary XRWebGLLayerInit {
  boolean antialias = true;
  boolean depth = true;
  boolean stencil = false;
  boolean alpha = true;
  boolean ignoreDepthValues = false;
  double framebufferScaleFactor = 1.0;
};

[SecureContext, Exposed=Window]
interface XRWebGLLayer {
  constructor(XRSession session,
              XRWebGLRenderingContext context,
              optional XRWebGLLayerInit layerInit = {});

  // Attributes
  readonly attribute boolean antialias;
  readonly attribute boolean ignoreDepthValues;
  [SameObject] readonly attribute WebGLFramebuffer framebuffer;
  readonly attribute unsigned long framebufferWidth;
  readonly attribute unsigned long framebufferHeight;

  // Methods
  XRViewport? getViewport(XRView view);

  // Static Methods
  static double getNativeFramebufferScaleFactor(XRSession session);
};
Each XRWebGLLayer
has a context object, initially null
, which is an instance of either a WebGLRenderingContext
or a WebGL2RenderingContext
.
Each XRWebGLLayer
has an associated session, which is the XRSession
it was created with.
The XRWebGLLayer(session, context, layerInit)
constructor MUST perform the following steps when invoked:
-
Let layer be a new
XRWebGLLayer
-
If session’s ended value is
true
, throw anInvalidStateError
and abort these steps. -
If context is lost, throw an
InvalidStateError
and abort these steps. -
If session is an immersive session and context’s XR compatible boolean is
false
, throw anInvalidStateError
and abort these steps. -
Initialize layer’s context to context.
-
Initialize layer’s session to session.
-
Initialize layer’s
ignoreDepthValues
as follows:- If layerInit’s
ignoreDepthValues
value isfalse
and the XR Compositor will make use of depth values - Initialize layer’s
ignoreDepthValues
tofalse
- Otherwise
- Initialize layer’s
ignoreDepthValues
totrue
- If layerInit’s
-
Initialize layer’s composition disabled boolean as follows:
- If session is an inline session
- Initialize layer’s composition disabled to
true
- Otherwise
- Initialize layer’s composition disabled boolean to
false
-
- If layer’s composition disabled boolean is
false
: -
-
Initialize layer’s
antialias
to layerInit’santialias
value. -
Initialize layer’s
framebuffer
to a new opaque framebuffer created with context and layerInit’sdepth
,stencil
, andalpha
values. -
Allocate and initialize resources compatible with session’s XR device, including GPU accessible memory buffers, as required to support the compositing of layer.
-
If layer’s resources were unable to be created for any reason, throw an
OperationError
and abort these steps.
-
- Otherwise
-
-
Initialize layer’s
antialias
to layer’scontext
's actual context parametersantialias
value. -
Initialize layer’s
framebuffer
tonull
.
-
- If layer’s composition disabled boolean is
-
Return layer.
Note: If an XRWebGLLayer
's composition disabled boolean is set to true
all values on the XRWebGLLayerInit
object are ignored, since the WebGLRenderingContext
's default framebuffer was already allocated using the context’s actual context parameters and cannot be overridden.
The context
attribute is the WebGLRenderingContext
the XRWebGLLayer
was created with.
Each XRWebGLLayer
has a composition disabled boolean which is initially set to false
. If set to true
it indicates that the XRWebGLLayer
MUST NOT allocate its own WebGLFramebuffer
, and all properties of the XRWebGLLayer
that reflect framebuffer
properties MUST instead reflect the properties of the context's default framebuffer.
The framebuffer
attribute of an XRWebGLLayer
is an instance of a WebGLFramebuffer
which has been marked as opaque if composition disabled is false
, and null
otherwise. The framebuffer
size cannot be adjusted by the developer after the XRWebGLLayer
has been created.
An opaque framebuffer functions identically to a standard WebGLFramebuffer
with the following changes that make it behave more like the default framebuffer:
-
An opaque framebuffer MAY support antialiasing, even in WebGL 1.0.
-
An opaque framebuffer's attachments cannot be inspected or changed. Calling
framebufferTexture2D
,framebufferRenderbuffer
,deleteFramebuffer
, orgetFramebufferAttachmentParameter
with an opaque framebuffer MUST generate anINVALID_OPERATION
error. -
An opaque framebuffer is considered incomplete outside of a
requestAnimationFrame()
callback. When not in a requestAnimationFrame() callback, calls to checkFramebufferStatus MUST generate a FRAMEBUFFER_UNSUPPORTED
error and attempts to clear, draw to, or read from the opaque framebuffer MUST generate anINVALID_FRAMEBUFFER_OPERATION
error. -
An opaque framebuffer initialized with
depth
false
will not have an attached depth buffer. -
An opaque framebuffer initialized with
stencil
false
will not have an attached stencil buffer. -
An opaque framebuffer's color buffer will have an alpha channel if and only if
alpha
istrue
. -
The XR Compositor will assume the opaque framebuffer contains colors with premultiplied alpha. This is true regardless of the
premultipliedAlpha
value set in thecontext
's actual context parameters.
Note: User agents are not required to respect true
values of depth
and stencil
, which is similar to WebGL’s behavior when creating a drawing buffer.
The buffers attached to an opaque framebuffer MUST be cleared to the values in the table below when first created, or prior to the processing of each XR animation frame. This is identical to the behavior of the WebGL context’s default framebuffer.
Buffer | Clear Value |
---|---|
Color | (0, 0, 0, 0) |
Depth | 1.0 |
Stencil | 0 |
Note: Implementations may optimize away the required implicit clear operation of the opaque framebuffer as long as a guarantee can be made that the developer cannot gain access to buffer contents from another process. For instance, if the developer performs an explicit clear then the implicit clear is not needed.
Each XRWebGLLayer
has a target framebuffer, which is the framebuffer
if composition disabled is false
, and the context's default framebuffer otherwise.
The framebufferWidth
and framebufferHeight
attributes return the width and height of the target framebuffer's attachments, respectively.
The antialias
attribute is true
if the target framebuffer supports antialiasing using a technique of the UAs choosing, and false
if no antialiasing will be performed.
The ignoreDepthValues
attribute, if true
, indicates the XR Compositor MUST NOT make use of values in the depth buffer attachment when rendering. When the attribute is false
it indicates that the content of the depth buffer attachment will be used by the XR Compositor and is expected to be representative of the scene rendered into the layer.
Depth values stored in the buffer are expected to be between 0.0
and 1.0
, with 0.0
representing the distance of depthNear
and 1.0
representing the distance of depthFar
, with intermediate values interpolated linearly. This is the default behavior of WebGL. (See documentation for the depthRange function for additional details.)
Note: Making the scene’s depth buffer available to the compositor allows some platforms to provide quality and comfort improvements such as improved reprojection.
Each XRWebGLLayer
MUST have a list of viewports which is a list containing one WebGL viewport for each XRView
the XRSession
currently exposes. The viewports MUST NOT be overlapping. If composition disabled is true
, the list of viewports MUST contain a single WebGL viewport that covers the context's entire default framebuffer.
getViewport()
queries the XRViewport
the given XRView
should use when rendering to the layer.
The getViewport(view)
method, when invoked on an XRWebGLLayer
layer, MUST run the following steps:
-
Let frame be view’s frame.
-
If frame’s
session
is not equal to layer’s session, throw anInvalidStateError
and abort these steps. -
If frame’s active boolean is
false
, throw anInvalidStateError
and abort these steps. -
Let glViewport be the WebGL viewport from the list of viewports associated with view.
-
Let viewport be a new
XRViewport
instance. -
Initialize viewport’s
x
to glViewport’sx
component. -
Initialize viewport’s
y
to glViewport’sy
component. -
Initialize viewport’s
width
to glViewport’swidth
. -
Initialize viewport’s
height
to glViewport’sheight
. -
Return viewport.
Each XRSession
MUST identify a native WebGL framebuffer resolution, which is the pixel resolution of a WebGL framebuffer required to match the physical pixel resolution of the XR device.
The native WebGL framebuffer resolution is determined by running the following steps:
-
Let session be the target
XRSession
. -
If session’s mode value is not
"inline"
, set the native WebGL framebuffer resolution to the resolution required to have a 1:1 ratio between the pixels of a framebuffer large enough to contain all of the session’sXRView
s and the physical screen pixels in the area of the display under the highest magnification and abort these steps. If no method exists to determine the native resolution as described, the recommended WebGL framebuffer resolution MAY be used. -
If session’s mode value is
"inline"
, set the native WebGL framebuffer resolution to the size of the session’srenderState
's output canvas in physical display pixels and reevaluate these steps every time the size of the canvas changes or the output canvas is changed.
Additionally, the XRSession
MUST identify a recommended WebGL framebuffer resolution, which represents a best estimate of the WebGL framebuffer resolution large enough to contain all of the session’s XRView
s that provides an average application a good balance between performance and quality. It MAY be smaller than, larger than, or equal to the native WebGL framebuffer resolution.
Note: The user agent is free to use any method of its choosing to estimate the recommended WebGL framebuffer resolution. If there are platform-specific methods for querying a recommended size it is recommended that they be used, but not required.
The getNativeFramebufferScaleFactor(session)
method, when invoked, MUST run the following steps:
-
Let session be the target
XRSession
. -
If session’s ended value is
true
, return0.0
and abort these steps. -
Return the value that the session’s recommended WebGL framebuffer resolution must be multiplied by to yield the session’s native WebGL framebuffer resolution.
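The following non-normative sketch requests a layer at the native framebuffer resolution rather than the recommended default. The names xrSession and gl are assumed to be an active immersive session and an XR compatible WebGL context.

// Scale the recommended resolution up (or down) to the device's native resolution.
let nativeScale = XRWebGLLayer.getNativeFramebufferScaleFactor(xrSession);
xrSession.updateRenderState({
  baseLayer: new XRWebGLLayer(xrSession, gl, { framebufferScaleFactor: nativeScale })
});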
11.2. WebGL Context Compatibility
In order for a WebGL context to be used as a source for immersive XR imagery it must be created on a compatible graphics adapter for the immersive XR device. What is considered a compatible graphics adapter is platform dependent, but is understood to mean that the graphics adapter can supply imagery to the immersive XR device without undue latency. If a WebGL context was not already created on the compatible graphics adapter, it typically must be re-created on the adapter in question before it can be used with an XRWebGLLayer
.
Note: On an XR platform with a single GPU, it can safely be assumed that the GPU is compatible with the immersive XR devices advertised by the platform, and thus any hardware accelerated WebGL contexts are compatible as well. On PCs with both an integrated and discrete GPU the discrete GPU is often considered the compatible graphics adapter since it is generally a higher performance chip. On desktop PCs with multiple graphics adapters installed, the one with the immersive XR device physically connected to it is likely to be considered the compatible graphics adapter.
Note: "inline"
sessions render using the same graphics adapter as canvases, and thus do not need xrCompatible
contexts.
partial dictionary WebGLContextAttributes {
  boolean xrCompatible = false;
};

partial interface mixin WebGLRenderingContextBase {
  Promise<void> makeXRCompatible();
};
When a user agent implements this specification it MUST set a XR compatible boolean, initially set to false
, on every WebGLRenderingContextBase
. Once the XR compatible boolean is set to true
, the context can be used with layers for any XRSession
requested from the current immersive XR device.
The XR compatible boolean can be set either at context creation time or after context creation, potentially incurring a context loss. To set the XR compatible boolean at context creation time, the xrCompatible
context creation attribute must be set to true
when requesting a WebGL context.
When the HTMLCanvasElement
's getContext()
method is invoked with a WebGLContextAttributes
dictionary with xrCompatible
set to true
, run the following steps:
-
Create the WebGL context as usual, ensuring it is created on a compatible graphics adapter for the immersive XR device.
-
Let context be the newly created WebGL context.
-
Set context’s XR compatible boolean to true.
-
Return context.
The following example creates a WebGL context that is compatible with an immersive XR device and then uses it to create an XRWebGLLayer.
function onXRSessionStarted(xrSession) {
  let glCanvas = document.createElement("canvas");
  let gl = glCanvas.getContext("webgl", { xrCompatible: true });

  loadWebGLResources();

  xrSession.updateRenderState({ baseLayer: new XRWebGLLayer(xrSession, gl) });
}
To set the XR compatible boolean after the context has been created, the makeXRCompatible()
method is used.
The makeXRCompatible()
method ensures the WebGLRenderingContextBase
is running on a compatible graphics adapter for the immersive XR device.
When this method is invoked, the user agent MUST run the following steps:
-
Let promise be a new Promise.
-
Run the following steps in parallel:
-
Let context be the target
WebGLRenderingContextBase
object. -
If context’s WebGL context lost flag is set, reject promise with an
InvalidStateError
and abort these steps. -
If context’s XR compatible boolean is
true
, resolve promise and abort these steps. -
If the immersive XR device is
null
:-
Set context’s XR compatible boolean to
false
. -
Reject promise with an
InvalidStateError
and abort these steps.
-
-
If context was created on a compatible graphics adapter for the immersive XR device:
-
Set context’s XR compatible boolean to
true
. -
Resolve promise and abort these steps.
-
-
Queue a task to perform the following steps:
-
Force context to be lost and handle the context loss as described by the WebGL specification.
-
If the canceled flag of the "webglcontextlost" event fired in the previous step was not set, reject promise with an
AbortError
and abort these steps. -
Restore the context on a compatible graphics adapter for the immersive XR device.
-
Set context’s XR compatible boolean to
true
. -
Resolve promise.
-
-
-
Return promise.
Additionally, when any WebGL context is lost run the following steps prior to firing the "webglcontextlost" event:
-
Set the context’s XR compatible boolean to
false
.
The following example creates an XRWebGLLayer from a pre-existing WebGL context.
let glCanvas = document.createElement("canvas");
let gl = glCanvas.getContext("webgl");

loadWebGLResources();

glCanvas.addEventListener("webglcontextlost", (event) => {
  // Indicates that the WebGL context can be restored.
  event.canceled = true;
});

glCanvas.addEventListener("webglcontextrestored", (event) => {
  // WebGL resources need to be re-created after a context loss.
  loadWebGLResources();
});

function onXRSessionStarted(xrSession) {
  // Make sure the canvas context we want to use is compatible with the device.
  // May trigger a context loss.
  return gl.makeXRCompatible().then(() => {
    return xrSession.updateRenderState({ baseLayer: new XRWebGLLayer(xrSession, gl) });
  });
}
12. Events
12.1. XRSessionEvent
XRSessionEvent
s are fired to indicate changes to the state of an XRSession
.
[SecureContext, Exposed=Window]
interface XRSessionEvent : Event {
  constructor(DOMString type, XRSessionEventInit eventInitDict);
  [SameObject] readonly attribute XRSession session;
};

dictionary XRSessionEventInit : EventInit {
  required XRSession session;
};
The session
attribute indicates the XRSession
that generated the event.
12.2. XRInputSourceEvent
XRInputSourceEvent
s are fired to indicate changes to the state of an XRInputSource
.
[SecureContext, Exposed=Window]
interface XRInputSourceEvent : Event {
  constructor(DOMString type, XRInputSourceEventInit eventInitDict);
  [SameObject] readonly attribute XRFrame frame;
  [SameObject] readonly attribute XRInputSource inputSource;
};

dictionary XRInputSourceEventInit : EventInit {
  required XRFrame frame;
  required XRInputSource inputSource;
};
The inputSource
attribute indicates the XRInputSource
that generated this event.
The frame
attribute is an XRFrame
that corresponds with the time that the event took place. It may represent historical data. Any XRViewerPose
queried from the frame
MUST have an empty views
array.
When the user agent has to fire an input source event with name name, XRFrame
frame, and XRInputSource
source it MUST run the following steps:
-
Create an XRInputSourceEvent event with type name, frame frame, and inputSource source.
-
Set frame’s active boolean to true.
-
Dispatch event on frame’s session.
-
Set frame’s active boolean to false.
12.3. XRInputSourcesChangeEvent
XRInputSourcesChangeEvent
s are fired to indicate changes to the XRInputSource
s that are available to an XRSession
.
[SecureContext, Exposed=Window]
interface XRInputSourcesChangeEvent : Event {
  constructor(DOMString type, XRInputSourcesChangeEventInit eventInitDict);
  [SameObject] readonly attribute XRSession session;
  [SameObject] readonly attribute FrozenArray<XRInputSource> added;
  [SameObject] readonly attribute FrozenArray<XRInputSource> removed;
};

dictionary XRInputSourcesChangeEventInit : EventInit {
  required XRSession session;
  required FrozenArray<XRInputSource> added;
  required FrozenArray<XRInputSource> removed;
};
The session
attribute indicates the XRSession
that generated the event.
The added
attribute is a list of XRInputSource
s that were added to the XRSession
at the time of the event.
The removed
attribute is a list of XRInputSource
s that were removed from the XRSession
at the time of the event.
12.4. XRReferenceSpaceEvent
XRReferenceSpaceEvent
s are fired to indicate changes to the state of an XRReferenceSpace
.
[SecureContext, Exposed=Window]
interface XRReferenceSpaceEvent : Event {
  constructor(DOMString type, XRReferenceSpaceEventInit eventInitDict);
  [SameObject] readonly attribute XRReferenceSpace referenceSpace;
  [SameObject] readonly attribute XRRigidTransform? transform;
};

dictionary XRReferenceSpaceEventInit : EventInit {
  required XRReferenceSpace referenceSpace;
  XRRigidTransform transform;
};
The referenceSpace
attribute indicates the XRReferenceSpace
that generated this event.
The transform
attribute describes the post-event position and orientation of the referenceSpace
's native origin in the pre-event coordinate system.
12.5. Event Types
The user agent MUST provide the following new events. Registration for and firing of the events must follow the usual behavior of DOM4 Events.
The user agent MUST fire a devicechange
event on the XR
object to indicate that the availability of immersive XR devices has been changed unless the document’s origin is not allowed to use the "xr-spatial-tracking" feature policy. The event MUST be of type Event
.
A user agent MUST dispatch a visibilitychange
event on an XRSession
each time the visibility state of the XRSession
has changed. The event MUST be of type XRSessionEvent
.
A user agent MUST dispatch an end
event on an XRSession
when the session ends, either by the application or the user agent. The event MUST be of type XRSessionEvent
.
A user agent MUST dispatch an inputsourceschange
event on an XRSession
when the session’s list of active XR input sources has changed. The event MUST be of type XRInputSourcesChangeEvent
.
A user agent MUST dispatch a selectstart
event on an XRSession
when one of its XRInputSource
s begins its primary action. The event MUST be of type XRInputSourceEvent
.
A user agent MUST dispatch a selectend
event on an XRSession
when one of its XRInputSource
s ends its primary action or when an XRInputSource
that has begun a primary action is disconnected. The event MUST be of type XRInputSourceEvent
.
A user agent MUST dispatch a select
event on an XRSession
when one of its XRInputSource
s has fully completed a primary action. The event MUST be of type XRInputSourceEvent
.
A user agent MUST dispatch a reset
event on an XRReferenceSpace
when discontinuities of the native origin or effective origin occur, i.e. there are significant changes in the origin’s position or orientation relative to the user’s environment. (For example: After user recalibration of their XR device or if the XR device automatically shifts its origin after losing and regaining tracking.) A reset
event MUST also be dispatched when the boundsGeometry
changes for an XRBoundedReferenceSpace
. A reset
event MUST NOT be dispatched if the viewer's pose experiences discontinuities but the XRReferenceSpace
's origin physical mapping remains stable, such as when the viewer momentarily loses and regains tracking within the same tracking area. A reset
event also MUST NOT be dispatched as an unbounded
reference space makes small adjustments to its native origin over time to maintain space stability near the user, if a significant discontinuity has not occurred. The event MUST be of type XRReferenceSpaceEvent
, and MUST be dispatched prior to the execution of any XR animation frames that make use of the new origin.
Note: This does mean that the session needs to hold on to strong references to any XRReferenceSpace
s that have reset
listeners.
Note: Jumps in viewer position can be handled by the application by observing the emulatedPosition
boolean. If a jump in viewer position coincides with emulatedPosition
switching from true
to false
, it indicates that the viewer has regained tracking and their new position represents a correction from the previously emulated values. For experiences without a "teleportation" mechanic, where the viewer can move through the virtual world without moving physically, this is generally the application’s desired behavior. However, if an experience does provide a "teleportation" mechanic, it may be needlessly jarring to jump the viewer's position back after tracking recovery. Instead, when such an application recovers tracking, it can simply resume the experience from the viewer's current position in the virtual world by absorbing that sudden jump in position into its teleportation offset. To do so, the developer calls getOffsetReferenceSpace()
to create a replacement reference space with its effective origin adjusted by the amount that the viewer's position jumped since the previous frame.
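The following non-normative sketch listens for reset events and recreates any derived spaces afterwards. The names xrReferenceSpace, navigationTransform, and offsetReferenceSpace are application-side assumptions of this example.

xrReferenceSpace.addEventListener("reset", (event) => {
  // event.transform, when present, gives the post-event native origin
  // expressed in the pre-event coordinate system.
  // Recreate any derived spaces so frames after the reset use the new origin.
  offsetReferenceSpace = xrReferenceSpace.getOffsetReferenceSpace(navigationTransform);
});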
13. Security, Privacy, and Comfort Considerations
The WebXR Device API provides powerful new features which bring with them several unique privacy, security, and comfort risks that user agents must take steps to mitigate.
13.1. Sensitive Information
In the context of XR, sensitive information includes, but is not limited to, user-configurable data such as interpupillary distance (IPD) and sensor-based data such as XRPose
s. All immersive sessions will expose some amount of sensitive data, due to the user’s pose being necessary to render anything. However, in some cases, the same sensitive information will also be exposed via "inline"
sessions.
13.1.1. Active and focused document
A document MUST be active and focused at the time that sensitive information is requested.
To determine if a given Document document is active and focused, the user agent MUST run the following steps:
-
If the currently focused area does not belong to document, return
false
-
If document is not of the same origin-domain as the active document, return
false
-
Return
true
13.1.2. Trustworthy documents and origins
In order to expose any sensitive information the requesting document MUST be considered trustworthy.
To determine if a given Document document is trustworthy, the user agent MUST run the following steps:
-
If document is not a responsible document, return
false
-
If document is not active and focused, return
false
-
Return
true
13.2. User intention
It is often necessary to be sure of user intent before exposing sensitive information or allowing actions with a significant effect on the user’s experience. This intent may be communicated or observed in a number of ways.
13.2.1. User activation
Events which are triggered by user activation MAY serve as an indication of user intent in some scenarios.
13.2.2. Launching a web application
In some environments a page may be presented as an application, installed with the express intent of running immersive content. In that case launching a web application MAY also serve as an indication of user intent.
13.2.3. Implicit and Explicit consent
A user agent MAY use implicit consent based, for example, on the install status of a web application or frequency and recency of visits. Given the sensitivity of XR data, caution is strongly advised when relying on implicit signals.
It is often useful to get explicit consent from the user before exposing sensitive information. When gathering explicit user consent, user agents present an explanation of what is being requested and provide users the option to decline. Requests for user consent can be presented in many visual forms based on the features being protected and user agent choice.
13.2.4. Duration of consent
It is recommended that once explicit consent is granted for a specific origin that this consent persist until the browsing context has ended. User agents may choose to lengthen or shorten this consent duration based upon implicit or explicit signals of user intent, but implementations are advised to exercise caution when deviating from this recommendation, particularly when relying on implicit signals. For example, it may be appropriate for a web application installed with the express intent of running immersive content to persist the user’s consent, but not for an installed web application where immersive content is a secondary feature.
Regardless of how long the user agent chooses to persist the user’s consent, sensitive information MUST only be exposed by an XRSession
which has not ended.
13.3. Mid-session consent
There are multiple non-XR APIs which cause user agents to request explicit consent to use a feature. If the user agent will request the user’s consent while there is an active immersive session, the user agent MUST shut down the session prior to displaying the consent request to the user. If the user’s consent for the feature had been granted prior to the active immersive session being created the session does not need to be terminated.
Note: This limitation is to ensure that there is behavioral parity between all user agents until consensus is reached about how user agents should manage mid-session explicit consent. It is not expected to be a long term requirement.
13.4. Data adjustments
In some cases, security and privacy threats can be mitigated through data adjustments such as throttling, quantizing, rounding, limiting, or otherwise manipulating the data reported from the XR device. This may sometimes be necessary to avoid fingerprinting, even in situations when user intent has been established. However, data adjustment mitigations MUST only be used in situations which would not result in user discomfort.
13.4.1. Throttling
Throttling is when sensitive information is reported at a lower frequency than otherwise possible. This mitigation has the potential to reduce a site’s ability to infer user intent, infer location, or perform user profiling. However, when not used appropriately, throttling runs a significant risk of causing user discomfort. In addition, under many circumstances it may be inadequate to provide a complete mitigation.
13.4.2. Rounding, quantization, and fuzzing
Rounding, quantization, and fuzzing are three categories of mitigations that modify the raw data that would otherwise be returned to the developer. Rounding decreases the precision of data by reducing the number of digits used to express it. Quantization constrains continuous data to instead report a discrete subset of values. Fuzzing is the introduction of slight, random errors into the data. Collectively, these mitigations are useful to avoid fingerprinting, and are especially useful when doing so does not cause noticeable impact on user comfort.
13.4.3. Limiting
Limiting is when data is reported only when it is within a specific range. For example, it is possible to comfortably limit reporting positional pose data when a user has moved beyond a specific distance away from an approved location. Care should be taken to ensure that the user experience is not negatively affected when employing this mitigation. It is often desirable to avoid a 'hard stop' at the end of a range as this may cause disruptive user experiences.
13.5. Protected functionality
The sensitive information exposed by the API can be divided into categories that share threat profiles and necessary protections against those threats.
13.5.1. Immersiveness
Users must be in control of when immersive sessions are created because the creation causes invasive changes on a user’s machine. For example, starting an immersive session will engage the XR device sensors, take over access to the device’s display, and begin presenting immersive content which may terminate another application’s access to the XR hardware. It may also incur significant power or performance overhead on some systems or trigger the launching of a status tray or storefront.
To determine if an immersive session request is allowed the user agent MUST run the following steps:
- If the request was not triggered by user activation or launching a web application, return false.
- If the requesting document is not considered trustworthy, return false.
- If user intent to begin an immersive session is not well understood, either via explicit consent or implicit consent, return false.
- Return true.
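Note: In practice, this means pages should issue immersive session requests from within a user activation event. A non-normative example, where xrButton and onSessionStarted are placeholders for page-specific code:

    // Request an immersive session in response to a user gesture; outside
    // of user activation the returned promise would be rejected.
    xrButton.addEventListener('click', () => {
      navigator.xr.requestSession('immersive-vr').then((session) => {
        onSessionStarted(session); // placeholder for app rendering setup
      }).catch((err) => {
        // Rejected: e.g. no user activation, an untrusted document, or
        // user intent could not be established.
        console.error('Immersive session request denied:', err);
      });
    });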
Starting an "inline"
session does not implicitly carry the same requirements, though additional requirements may be imposed depending on the session’s requested features.
To determine if an inline session request is allowed the user agent MUST run the following steps:
- If the session request contained any required features or optional features and the request was not triggered by user activation or launching a web application, return false.
- If the requesting document is not responsible, return false.
- Return true.
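Note: For example, a featureless inline session may be requested as soon as the page loads, while an inline session that requests features is subject to the activation requirement above. A non-normative sketch:

    // A featureless inline session does not require user activation, so it
    // can be requested on page load.
    navigator.xr.requestSession('inline').then((session) => {
      // Render inline content into a regular page element.
    });

    // By contrast, this request names a feature, so outside of user
    // activation it would be rejected:
    // navigator.xr.requestSession('inline', { requiredFeatures: ['local'] });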
13.5.2. Poses
When based on sensor data, XRPose and XRViewerPose will expose sensitive information that may be misused in a number of ways, including input sniffing, gaze tracking, or fingerprinting.
To determine if poses may be reported to an XRSession session, the user agent MUST run the following steps:
- Let document be the document that owns session.
- If the request does not originate from document, return false.
- If document is not active and focused, return false.
- If session’s visibilityState is not "visible", return false.
- Determine if the pose data can be returned as follows:
  - If the pose data is known by the user agent to not expose fingerprintable sensor data, return true.
  - If data adjustments will be applied to the underlying sensor data to prevent fingerprinting or profiling, return true.
  - If user intent is well understood, either via explicit consent or implicit consent, return true.
  - Otherwise, return false.
Note: The method by which a user agent determines that poses do not expose fingerprintable data is left to the user agent’s discretion.
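Note: Because any of these checks may cause poses to be withheld on a given frame, applications should treat a null result from getViewerPose() as an expected state rather than an error. A non-normative example, where refSpace is assumed to be an XRReferenceSpace obtained earlier:

    function onXRFrame(time, frame) {
      const session = frame.session;
      session.requestAnimationFrame(onXRFrame);

      // May be null while the session is not "visible", the document is
      // unfocused, or the user agent is withholding tracking data.
      const viewerPose = frame.getViewerPose(refSpace);
      if (!viewerPose) {
        return; // Skip rendering this frame; tracking may resume later.
      }

      for (const view of viewerPose.views) {
        // ... render the scene for each view ...
      }
    }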
The primary difference between XRViewerPose and XRPose is the inclusion of XRView information. When more than one view is present and the physical relationship between these views is configurable by the user, the relationship between these views is considered sensitive information as it can be used to fingerprint or profile the user.
If the relationship between XRViews could uniquely identify the XR device, then the user agent MUST anonymize the XRView data to prevent fingerprinting. The method of anonymization is at the discretion of the user agent.
Note: Furthermore, if the relationship between XRViews is affected by a user-configured interpupillary distance (IPD), then it is strongly recommended that the user agent require explicit consent during session creation, prior to reporting any XRView data.
13.5.3. Reference spaces
Depending on the reference spaces used, several different types of sensitive information may be exposed to the application.
- On devices which support 6DoF tracking, "local" reference spaces may be used to perform gait analysis, allowing user profiling and fingerprinting.
- On devices which support 6DoF tracking, "local-floor" reference spaces may be used to perform gait analysis, allowing user profiling and fingerprinting. In addition, because "local-floor" reference spaces provide an established floor level, it may be possible for a site to infer the user’s height, allowing user profiling and fingerprinting.
- "bounded-floor" reference spaces, when sufficiently constrained in size, do not enable developers to determine geographic location. However, because the floor level is established and users are able to walk around, it may be possible for a site to infer the user’s height or perform gait analysis, allowing user profiling and fingerprinting. In addition, it may be possible to perform fingerprinting using the bounds reported by a bounded reference space.
- "unbounded" reference spaces reveal the largest amount of spatial data and may result in user profiling and fingerprinting. For example, this data may enable determining the user’s specific geographic location or performing gait analysis.
As a result, the various reference space types have restrictions placed on their creation to ensure the sensitive information they expose is handled safely:
Most reference spaces require that user intent to use the reference space is well understood, either via explicit consent or implicit consent. See the feature requirements table for details.
Any group of "local", "local-floor", and "bounded-floor" reference spaces that are capable of being related to one another MUST share a common native origin. This restriction only applies when the creation of "unbounded" reference spaces has been restricted.
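Note: This is why reference spaces other than "viewer" must be declared as features when the session is created, giving the user agent an opportunity to establish user intent. A non-normative example, assumed to run inside a user activation handler:

    // Declare the reference spaces the experience needs up front. Listing
    // "bounded-floor" as optional lets the session be created even if user
    // intent for it cannot be established.
    navigator.xr.requestSession('immersive-vr', {
      requiredFeatures: ['local-floor'],
      optionalFeatures: ['bounded-floor'],
    }).then((session) => {
      return session.requestReferenceSpace('local-floor');
    }).then((refSpace) => {
      // ... use refSpace when querying poses ...
    });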
To determine if poses must be limited between two spaces, space and baseSpace, the user agent MUST run the following steps:
- If either space or baseSpace is an XRBoundedReferenceSpace and the other space’s native origin falls further outside the native bounds geometry than a reasonable distance determined by the user agent, return true.
- If either space or baseSpace is an XRReferenceSpace with a type of "local" or "local-floor" and the distance between the spaces' native origins is greater than a reasonable distance determined by the user agent, return true.
- Return false.
Note: It is suggested that poses reported relative to a "local" or "local-floor" reference space be limited to a distance of 15 meters from the XRReferenceSpace's native origin.
Note: It is suggested that poses reported relative to an XRBoundedReferenceSpace be limited to a distance of 1 meter outside the XRBoundedReferenceSpace's native bounds geometry.
13.6. Gaze Tracking
While the API does not yet expose eye tracking capabilities, much can be inferred about where the user is looking by tracking the orientation of their head. This is especially true of XR devices that have limited input capabilities, such as Google Cardboard, which frequently require users to control a "gaze cursor" with their head orientation. This means that it may be possible for a malicious page to infer what a user is typing on a virtual keyboard or how they are interacting with a virtual UI based solely on monitoring their head movements. For example, if not prevented from doing so, a page could estimate what URL a user is entering into the user agent’s URL bar.
To prevent this risk the user agent MUST set the visibility state of all XRSessions to "hidden" when the user is interacting with sensitive, trusted UI such as URL bars or system dialogs. Additionally, to prevent a malicious page from being able to monitor input on other pages, the user agent MUST set the XRSession's visibility state to "hidden" if the currently focused area does not belong to the document which created the XRSession.
13.7. Trusted Environment
If the virtual environment does not consistently track the user’s head motion with low latency and at a high frame rate the user may become disoriented or physically ill. Since it is impossible to force pages to produce consistently performant and correct content the user agent MUST provide a tracked, trusted environment and an XR Compositor which runs asynchronously from page content. The compositor is responsible for compositing the trusted and untrusted content. If content is not performant, does not submit frames, or terminates unexpectedly the user agent should be able to continue presenting a responsive, trusted UI.
Additionally, page content has the ability to make users uncomfortable in ways not related to performance. Badly applied tracking, strobing colors, and content intended to offend, frighten, or intimidate are examples of content which may cause the user to want to quickly exit the XR experience. Removing the XR device in these cases may not always be a fast or practical option. To accommodate this the user agent SHOULD provide users with an action, such as pressing a reserved hardware button or performing a gesture, that escapes out of WebXR content and displays the user agent’s trusted UI.
When navigating between pages in XR the user agent should display trusted UI elements informing the user of the security information of the site they are navigating to, such as the URL and encryption status, which is normally presented by the 2D UI.
XRSessions MUST have their visibility state set to "hidden" when the user is interacting with potentially sensitive UI from the user agent (such as entering a URL) in the trusted environment.
13.8. Context Isolation
The trusted UI must be drawn by an independent rendering context whose state is isolated from any rendering contexts used by the page. (For example, any WebGL rendering contexts.) This is to prevent the page from corrupting the state of the trusted UI’s context, which may prevent it from properly rendering a tracked environment. It also prevents the possibility of the page being able to capture imagery from the trusted UI, which could lead to private information being leaked.
Also, to prevent CORS-related vulnerabilities, each page will see a new instance of objects returned by the API, such as XRSession. Attributes such as the context set by one page must not be able to be read by another. Similarly, methods invoked on the API MUST NOT cause an observable state change on other pages. For example, no method will be exposed that enables a system-level orientation reset, as this could be called repeatedly by a malicious page to prevent other pages from tracking properly. The user agent MUST, however, respect system-level orientation resets triggered by a user gesture or system menu.
13.9. Fingerprinting
Given that the API describes hardware available to the user and its capabilities, it will inevitably provide additional surface area for fingerprinting. While it’s impossible to avoid this completely, user agents should take steps to mitigate the issue. This spec limits the reporting of available hardware to only a single device at a time, which prevents the rare case of multiple connected headsets from being used as a fingerprinting signal. Also, the devices that are reported have no string identifiers and expose very little information about the device’s capabilities until an XRSession is created, which requires additional protections when sensitive information will be exposed.
14. Integrations
14.1. Feature Policy
This specification defines a policy-controlled feature that controls whether any XRSession that requires the use of spatial tracking may be returned by requestSession(), and whether support for session modes that require spatial tracking may be indicated by either isSessionSupported() or devicechange events on the navigator.xr object.
The feature identifier for this feature is "xr-spatial-tracking".
The default allowlist for this feature is ["self"].
Note: If the document’s origin is not allowed to use the "xr-spatial-tracking" feature policy, any immersive sessions will be blocked, because all immersive sessions require some use of spatial tracking. Inline sessions will still be allowed, but restricted to only using the "viewer" XRReferenceSpace.
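Note: For example, a page that wishes to delegate spatial tracking to a cross-origin iframe can do so with the allow attribute. A non-normative example, where example.com is a placeholder origin:

    <!-- Delegate spatial tracking to a cross-origin iframe. -->
    <iframe src="https://example.com/xr-content"
            allow="xr-spatial-tracking"></iframe>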
15. Acknowledgements
The following individuals have contributed to the design of the WebXR Device API specification:
- Sebastian Sylvan (Formerly Microsoft)
And a special thanks to Vladimir Vukicevic (Unity) for kick-starting this whole adventure!