W3C

– DRAFT –
Immersive Web March 2024 FTF

25 March 2024

Attendees

Present
ada, alcooper, Alex Cooper, Atsushi Shimono, bajones, bialpio, Brandel, Brandon Jones, Brett_, cwilso, etienne_, Javier Fernandez, jfernandez, Laszlo_Gombos, Laszlo Gombos, m-alkalbani, marisha, mblix, nick-niantic, Omegahed, Piotr Bialecki, Rik Cabanier, yonet
Regrets
-
Chair
Ada
Scribe
alcooper, bajones, Brandel, Brett_, cabanier, marisha

Meeting minutes

<cwilso> IRC instruction manual : https://github.com/immersive-web/administrivia/blob/main/IRC.md

Guide on Scribing: https://w3c.github.io/scribe2/scribedoc.html

<yonet> item: immersive-web/webxr#1365

Give developers control over "overlay" browser (webxr#1365)

<Brett_> hello laszlo

<Brett_> :wave:

<giorgio> no video from room also here

<giorgio> Nice to see you guys! ;-)

<ada> immersive-web/webxr#1365

cabanier: A little while ago the Quest Browser released a feature where if you click the Quest button, instead of a plain overlay, we bring up the 2D browser

cabanier: We keep the visual blurred so the developers can change what's in the 2D experience

cabanier: Or could change the settings of your WebXR experience

cabanier: We got many requests to trigger and exit this manually

cabanier: Right now the only way is to hit the Quest button and exit by clicking Resume. Right now we are the only ones with this feature, but we think it's a good feature and could be done by other manufacturers

cabanier: There could be an API to call this at any time. The 'resume' API should probably be user-activated

cabanier: We got lots of feedback that devs want to control this
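
A purely hypothetical sketch of the trigger/resume split cabanier describes; no such API exists today, and every method and element name below is invented for illustration only:

    // Purely hypothetical – nothing here is specified. A page could bring up
    // the system browser overlay programmatically at any time...
    xrSession.addEventListener('select', () => {
      xrSession.requestSystemOverlay();        // hypothetical method name
    });
    // ...but returning to the immersive session would be gated on a user
    // activation, e.g. a click on the page's own 2D "resume" button.
    resumeButton.addEventListener('click', () => {
      xrSession.resumeFromSystemOverlay();     // hypothetical method name
    });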

<Zakim> ada, you wanted to ask about DOMOverlay

ada: It's a cool idea and it seems like this would be a good fit for DOM overlay

ada: Some DOM overlay semantics wouldn't make sense, like what 'fullscreen' means, when you're just showing the full browser window

ada: The bit that developers could trigger and bring up might be more interesting if it was with the DOM overlay API where they could more fully customize it

cabanier: Right now it's not just the page but the full Browser navigation: refresh button, address bar, etc.

cabanier: DOM overlay takes an element, makes it fullscreen, not sure how this would work

ada: You could take the full window, wouldn't need the Browser chrome

cabanier: I think that would break a lot of assumptions for pages

cabanier: Maybe something like a DOM overlay for when devs want to trigger it

Brandel: Right now you're sending visible: blurred and only head placement is updated

Brandel: seems like it's straying from what visible:blurred was for

cabanier: No, we always had this overlay screen

Brandel: But that was meant to have WebXR in a frozen state. But now indirectly you have a way to react in the WebXR session via the 2D page

Brandel: So it's not at the same scope as visible: blurred

cabanier: visible:blurred just indicates you don't have controls any more, only head pose

Brandel: My question is more about whether that's all that we want to vend. What do we expect to come through vs not come through?

Brandel: visible:blurred was not going to update the world for any reason before

Brandel: My main question is whether this is adequately descriptive of what it is and what it's for

bajones: visible:blurred means that something else is taking input right now but page content is still visible and can still do head tracking. Tracking could be at a lower rate

bajones: The intent of that state is: you can still see the world, but none of the inputs
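
For reference, a minimal sketch (not from the discussion) of how a page can observe the visibility states bajones describes; resumeGameplay and pauseGameplay are placeholder functions:

    // A minimal sketch of reacting to XRSession visibility changes.
    function watchVisibility(session) {
      session.addEventListener('visibilitychange', () => {
        switch (session.visibilityState) {
          case 'visible':
            resumeGameplay();        // full input and tracking are available again
            break;
          case 'visible-blurred':
            // Content still renders and head pose may update (possibly at a
            // reduced rate), but controller input and select events are withheld.
            pauseGameplay();
            break;
          case 'hidden':
            pauseGameplay();         // frames are not being presented at all
            break;
        }
      });
    }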

Brandel: For cases of audio, keyboard, and DOM events, you wouldn't lose anything by virtue of being in overlay

cabanier: You're right in that mode, keyboard events would start working again

Brandel: The concern is which set of events are expected in WebXR vs the 2D page

nick-niantic: How does this work today, and how could it work in ways that are more beneficial? When you go into this mode, where does the Browser show and where does the 3D content show?

cabanier: Browser is on top, right in front of the user

Brandel: I believe it's in whatever position it previously was

nick-niantic: So the Browser is not occluded by anything in the scene. If I had a scene with a car in it, I could bring up this window and click a button to immediately update the color of the car?

cabanier: yes

nick-niantic: This sounds like what we were asking for with DOM layers a while back, except it's either in a fixed position or the browser is always open; but then you'd need to be able to get it out of your way

nick-niantic: But not doing depth-testing with the scene also is a little bit undesirable, if something was in front of that browser window you could move it out of the way

cabanier: I don't think there's anything stopping us from placing the browser where they want it

Brandel: The user can put the window wherever they want

cabanier: I think nick wants it programmatically

nick-niantic: Even programmatically is tricky, trying to find a placement for it. We also allow the user to move it around and it can follow your head around

nick-niantic: A system to manage that automatically would be okay, but having it be embedded in the scene and movable

nick-niantic: Being able to do things like press a button on your wrist to make it appear and disappear would be desirable

nick-niantic: We would want it to be interactable with the rest of the scene

ada: Something like that would work on Vision Pro quite well if you don't have hand tracking requested, because we have transient pointers

ada: If you were to target the DOM content, you could just not bubble it up to the WebXR content

Brandel: The only reason it would work with Vision Pro is because it wouldn't work with Meta

Brandel: When you have the ability to track user inputs to a webpage, it exposes serious security issues (like watching someone put input into a banking website)

nick-niantic: Today if I have a webpage and I have an iframe, that would be an external source

Brett_: You could just have the input disappear when user inputs over an iframe

bajones: Yes but that would be weird

ada: You could re-implement something like transient-pointer in order to make this work

ada: I think everyone wants a way to get DOM into WebXR. This seems like the closest approach so far.

cabanier: You also want occlusion and depth-sorting? That would be pretty difficult and expensive

cabanier: I think the OS supports it but you have to resolve depth for every frame

<Zakim> Brett_, you wanted to fully flesh out this idea for HTML support in WebXR

Brandel: Regarding trust and safety, we also don't want the Browser chrome to be obscured

Brett_: I want to be able to use HTML in VR and position it. The one thing holding it back right now is security. We can't have a cursor here because it could be an iframe

Brett_: We could have a security policy, like the one required for SharedArrayBuffers, to not include iframes from cross-origin domains

Brett_: It would allow us to start experimenting with DOM overlay in a safe environment

cabanier: I think there is a proposal here. The HTML would have to come from its own document

bajones: In general the conclusion with this group is that when we get to a point where we can start doing that, this is very likely going to be a restriction set in place

bajones: There are other technical issues around it. But in general I think we're in agreement that that is the safest way to start pushing into that space

cabanier: Every DOM layer would be its own document with the same origin

Brett_: Right now you can only have one WebXR session but you could maybe later have two WebXR sessions overlayed on each other

cabanier: In the overlay Browser you have to block all input

Brett_: But what about a scenario where you don't blur everything

bajones: Overlay browser is the topic of discussion currently

bajones: for DOM layers that is DOM content in the world, while you retain input in the XR session, I think is desirable and everyone recognizes how powerful that could be.

bajones: Interaction with the DOM layer gets tricky: if the developer has their own cursor pointer (like from a gun), then it diverges from the cursor pointer that interacts with the DOM layer

Brandel: WebXR need not be the only pathway to spatial content on the internet

Brett_: But the only way to use HTML content in VR... how do we do that?

<Zakim> bajones, you wanted to ask about potential for pages to grief users

bajones: There would be many ways

bajones: One of the things that came to mind about the gesture to pull up the Browser overlay: if I'm pulling up the Browser while in the session, it's probably because I want the user to do something within the session, rather than allowing them to run away

bajones: This seems like an opportunity to grief the user if I don't let them do the thing they want to do, if a programmatic trigger works

bajones: You could trick the user into never triggering the actual system gesture

bajones: You need a way to mitigate that, and it also implies the version of the Browser that I bring up programmatically should also have some sort of visible difference to the typical browser

bajones: Maybe it doesn't have a tab bar, an abbreviated chrome (it feels slightly less useful that way)

bajones: If you're going to allow positioning of the window programmatically - it feels like a bad idea, you don't want to allow them to hide it behind the user's head or have it run away from their hand

bajones: But the ability to hint to the Browser that "this might be a good place for content" to spawn the Browser there optionally

bajones: I'm more concerned about the griefing scenarios where devs would deny users access to the real browser

bajones: This goes back to the API where there's a gesture to always reliably open the Browser

bajones: In order to fake the system gesture for this, it's a huge ordeal. With a programmatic gesture, you are invited to generate and dismiss this object

bajones: If you have a trigger but not a resume gesture, the concern is quite a bit less

cabanier: The proposal is to require user action to resume the WebXR session

bajones: Clicking the 'resume' button is not propagated to the page?

cabanier: It is not

Brandel: Are the select/click actions trusted in the XR environment?

cabanier: I'm not sure

ada: Wasn't the point of 'select' to have a trusted event in WebXR?

bajones: That was the intention for the event. You also have to initialize the session in the context of a trusted event

bajones: If the user can trigger something and it not be closed arbitrarily, it seems like it should be the full browser

bajones: You said you don't need a user activation to call this API. Would it be sufficient to require that but initially open into this mode?
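
For reference, a minimal sketch of the existing rule being referenced: requesting an immersive session must already happen inside a user activation such as a click (enterVrButton is a placeholder element):

    // A minimal sketch of entering a session from a trusted user gesture.
    enterVrButton.addEventListener('click', async () => {
      const session = await navigator.xr.requestSession('immersive-vr');
      // ... set up the WebGL layer and frame loop here
    });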

alcooper: You mentioned a use case to direct users to a separate site for payment - why do devs want a separate site rather than showing their own content - is directing to a separate 2D page also spoofable?

alcooper: If the developer can guess what your browser looks like and pretend to be it

Brett_: What about one gesture to open the full browser and another button to open a partial web view (same dom origin)

alcooper: That sounds like what we were discussing as a separate thing and why you'd want to pull up the full browser

alcooper: Maybe for payments there would be a separate API for trusted scenarios

cabanier: I agree it's a concern, maybe multiple tabs should not be available...

Phu: So bringing it up with the user action (not programmatic event) - what are scenarios where people will bring up the Browser without leaving the WebXR session?

cabanier: Right now they don't have a choice, the browser just comes up automatically

Phu: Bringing up the browser only partially solves the problem that DOM overlay would try to solve. Doing it programmatically sounds additionally useful

Phu: But there is the spoofable concern. Today a webpage can declare whether a webpage can appear in an iframe or not. Maybe we update the contract to allow communication for other modes (webxr)

cabanier: That sounds like a whole new paradigm for the browser

Phu: The website could declare their preferences or declare their trusted origins

alcooper: There might be permission policy stuff that might allow that today. But this is in reverse of that almost

Phu: Could at least enable it in developer mode or something so people could validate the scenario

Phu: Everyone who uses our SDK asks for a way to embed web content

Backgrounded Tabs and WebXR (webxr#1364)

<ada> immersive-web/webxr#1364

ada: Now we have several stand-alone WebXR devices, the original metaphor for how WebXR was devised isn't how we're running them. We don't seem to have broad agreement about what is and is not running in this new model

ada: e.g. video, getUserMedia, various animation events etc,

ada: My issue posits three buckets for definite yes, definite no and some "maybes" for things we could gain some benefit from but we should discuss them here!

alcooper: We ran into this even on desktop VR. The 2D window 'lost focus', resulting in some input shenanigans. We need to find clarity on this.

alcooper: one question is power consumption - what are people permitted to shut down vs. what is obligatory for the session

cabanier: I would expect there to be a difference between minimizing a browser to tab-switching to entering an XR session

ada: I had asked for the page's rAF but grudgingly accept the importance of suspending it

cabanier: WebApps may provide an overall listing of which capabilities are available within a minimized or an inactive tab, but it may be down to the individual feature

Brandel: I'm aware that in the initial treatment of AudioContext, we suspended the context on session start.

cabanier: We should defer to the logic of the 'hidden state' API for individual features rather than attempt to collect or mandate the status of features on our own

Phu: Working on those features one-by-one in/for webXR is messy, and should be left to the feature owners there

ada: Should we seek these people out to clarify what the behavior is and discuss the best resolution for these?

alcooper: I would need to go code spelunking in order to find the relevant owners and discuss the right action

mblix: We have been talking about this from the spec perspective - are there alternatives like performance, to assess the relative costs of enabling various features like window.rAF?

cabanier: we have looked at this in the past, and a lot of people run exorbitantly expensive ads that would negatively affect webXR performance etc.

mblix: could we pursue the optional enablement of various features on the basis of explicit page intent?

cabanier: that seems reasonable

cabanier: This would be difficult to polyfill across different devices and vendors, given the hard boundaries for capability.

<Zakim> bajones, you wanted to say that we do reference the page visibility state in the spec

<bajones> https://immersive-web.github.io/webxr/#ref-for-visibilitystate-attribute

ada: It's *not* something we can likely polyfill, we would probably need to return the success or failure in the features request for the session.

bajones: We do mention the visibility state in an `inline` session description. It's likely we can do something more useful - even if it's non-normative language describing this.

bajones: I've been spelunking through similar APIs to look for precedent, and found FullScreen and Picture-in-picture. There is scant mention of anything useful for our purposes.

Brett_: Could we have a range of user-configurable capabilities to control this? Ideally both developer- and user-facing capabilities to configure?

bajones: I would recommend against granularity exposed to the user, given the level of legibility those options necessarily entail. I'm concerned people won't understand this.

bajones: What does this enable for the user beyond what reasonable defaults furnish for them?

Indicate "preferred" immersive mode (webxr#1360 )

cabanier: I propose this because it is similar to an OpenXR API. If a user is already in pass-through, maybe we should allow users to remain in pass-through. Likewise, for devices that support AR and VR but _prefer_ VR, it might be useful to hint to the user that the device will do better with a given mode.

cabanier: I'm not sure of the scope of additional information disclosure, but it seems like it's only a little. The API would indicate what is preferred, but wouldn't enforce it.

yonet: Generally I understand developers to be explicitly developing _for_ AR or VR. How does this work for them?

cabanier: you will be able to return the mode that is preferred from the developer's side.

cabanier: a requested AR session would still be an AR session, likewise with VR - this is more about the capacity to provide continuity. Developers could still tailor for individual devices

cabanier: This is about the _device_ providing the indication of preferred mode, rather than the content or the user.

cabanier: The system could let the user decide, but it would be a system indication. A developer could still launch what's available

bajones: There is a long list of people who care a lot about a very little amount of information, so any exposure is something we need to be wary of

bajones: especially in concert with other browser signals, this serves to flesh out even more contextual information about exactly what the user is experiencing. I'm not personally overly concerned myself, but other people are.

bajones: I echo yonet's thoughts about nudging users and authors towards a specific (and potentially more disclosing) context for interaction, which is both unfortunate and _primarily_ a library problem

bajones: though realistically, people do stupid things.

bajones: User preferences could provide a way to override information like this


bajones: particularly remembering preferences could make this less frustrating

ada: if this could be requested along with a session rather than in `isSessionSupported`, it could be more frictionless - given that it's mainly about a preference but not a required capability.

ada: gating the time at which the information is vended also reduces the scope of how it can be integrated into general information-gathering as well

ada: whether frameworks would care to honor this information depends on them - but the basic presence of "supports preferred" might be enough on the isSessionSupported
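
A purely hypothetical sketch of the shape ada suggests, with the preference surfaced when the session is granted rather than via isSessionSupported(); the feature string and attribute below are invented for illustration only:

    // Purely hypothetical – nothing here is specified.
    const session = await navigator.xr.requestSession('immersive-vr', {
      optionalFeatures: ['preferred-mode'],     // invented feature name
    });
    // Invented attribute: what the device/system would prefer; the page could
    // re-negotiate or tailor content before rendering starts in earnest.
    if (session.preferredMode === 'immersive-ar') {
      // ...
    }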

bajones: Most of the time, the main thing that distinguishes an AR session from a VR one is that the AR simply loads and displays _less_

idris: this seems like a good proposal, but the preferences should lie primarily with the developer's awareness of the experience rather than the user's

idris: the user should have ability to agree or disagree to a given set of arrangements, but it's on the developer to decide what to offer

nick-niantic: the uncertainty of not knowing what's going to be entered might imperil the load-status of everything required for the session, but it's not a big deal

ada: given that invoking a session requires a prompt anyway, `preferred-vr` and `preferred-ar` both going into the session would just defer the decision to another moment unless we specify a default

ada: this would probably only cater to people who have the ability to construct the absolute best experience possible across all capabilities

cabanier: generally, AR sessions request many more features than VR. AR wants hit-testing, lighting estimation, world geometry - where VR often only needs local-floor etc.

cabanier: that kind of thing results in additional overhead, given the per-frame computations required for some of them. That becomes expensive if it's not essential and by design for the experience

bajones: we could say if you use 'sessionPreferred' you can only use optional features, but that doesn't feel great.

bajones: I am trying to figure out the circumstances under which you might decide between AR and VR based on the knowledge that one is _more_ preferred? Ultimately, I think an author is going to be highly opinionated

bajones: it's hard to tell how an author would make different choices based on the results of knowing this setting

bajones: The preloading feature makes knowing ahead of time more compelling

alcooper: we have different features for VR vs. AR, so there's an immediate divergence based on that decision from the start

alcooper: as an author, if I already know the category of device, it's likely I can make decisions about what to offer

bialpio: given that there are a range of states for a given device, it's not likely that we can know what the preferred mode should be based on the hardware.

bialpio: the branching between committing to AR and VR become tangled, especially when some features have been enabled and others haven't.

<Zakim> ada, you wanted to say what's your preferred? incorrect, you're getting VR instead

bialpio: It's not clear to me how much this affects the complexity for pre-fetching, whether people would pull everything or need to make decisions at that moment when the information is known

<Brett_> Would it be nice for a normal webpage to know if it's being opened in VR or AR?

ada: if we return this at the moment that the session is actually granted, while you don't get the preload ability, you still have the opportunity to re-negotiate the terms of the session before it has started in earnest

bialpio: Right now the only way to load either AR or VR is to have two buttons - one for each high-level kind

Brandel: we will always have a permission dialog here, so trying to smooth out the friction this kind of query requires isn't as important

Brett_: Does a page know the context it's in?

lunch break


Add Scene Description API for automation and a11y (webxr#1363)

ada: Topic keeps being brought up around accessibility discussions and automation; but if nothing else this is a good foundation for an accessibility story
… which is currently a bit lacking
… two part API, first is a stencil buffer where they render the ID of each object as a color with no antialiasing

<ada> immersive-web/webxr#1363

ada: Scene Graph attached to a session somehow; where each object in the scene graph contains an ID that matches the color in the stencil and a few properties to make it useful
… Some proposals to make this useful: 4x4 transformation matrix (not sure if local or global), bounding box with identity applied (even if not currently visible), attachment to a specific XR space, name/description (description like an alt tag), role in the environment (aria role)
… this could also be used to tab through content and even fire select events without needing inputs
… sound effects or whether things represent a person on a person
… navigation to another page (aria role relationship), and/or any state that it has
… Does this sound useful?
… Probably wouldn't be too difficult to implement from a browser standpoint (hook a buffer up to screen reader functionalities), also useful for automation
… Downside is that this wouldn't be automatic; developers would need to do additional work; but unblocks usage in areas where accessibility is required
… What carrots should we provide to encourage developers to use it?
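
A purely hypothetical sketch of the per-object description ada outlines above, not any specified API; all field names and values are invented for illustration only:

    // Purely hypothetical – invented names throughout.
    const sceneDescription = [{
      id: 42,                                  // matches the color ID in the stencil buffer
      transform: new XRRigidTransform(),       // pose (per bajones' note: a rigid transform, not a raw 4x4)
      boundingBox: { min: [-0.5, 0.0, -0.5], max: [0.5, 1.8, 0.5] },  // metres
      space: 'local-floor',                    // which XRSpace the pose is relative to
      name: 'Ticket kiosk',
      description: 'Kiosk for buying tickets', // alt-text style description
      role: 'button',                          // ARIA-like role
      onselect: () => { /* could be fired by assistive tech without an input source */ },
    }];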

bajones: This has been talked about a bit in Editor's meetings, a few highlights
… really interesting idea, been talking about need for accessibility for a long time; not just WebXR but anything that creates a buffer of pixels
… requires hacks to be accessible today
… possibility of extending to WebGL/WebGPU, which may also be a bit of a curse
… Talked about this a bit more abstractly, but hard to find a way to meet in the middle with accessibility folks and/or prototype
… Really need people who can prototype this to figure out that we're moving in the right direction, and then need to provide encouragement to developers
… e.g. developers have to do application-specific ray tracing, and we can use this to maybe provide hints or objects that were hit-tested
… Maybe something more primitive than color buffers of objects like just bounding boxes
… If you have the color buffer we can ID those a bit better

bajones: Gives the opportunity for the browser to also highlight what you're looking at which the page may not be able to do, e.g. on the Apple Vision Pro
… whatever we talk about here, we should disconnect it from a session
… may want to build this outside of having a session
… and then attach it to a space
… Minor nitpick: Rigid Transforms, not 4x4 matrices


nick-niantic: A couple observations, a lot of 3D frameworks lean into these representations that don't give you tools to get a full description of the scene
… a lot of ways to manipulate geometry of scene beyond initial scene description
… therefore it's indeterminate and hard to do custom hit testing
… In some ways having a full description of the scene is impossible
… I'd try to think more concretely about what values people can get and write a spec to give them a way to get benefits out of those values
… need to be clear what these scene-reading applications are and what value they provide
… but also not looking for a solution that fits everything, since likely need to run code to do a hit test

ada: Goal wasn't to force a developer to fully enumerate all these properties; just that they get more benefit to providing more of them
… if a developer doesn't do anything, things will still work as before
… for automation kind of just want to use it with WebDriver to figure out where buttons are
… a lot of sites rely on automation testing, which is currently hard in WebXR since there aren't the same feedback mechanisms as HTML/CSS provide
… accessibility is the most important story; automation is a side benefit, everything else is meant to be developer benefit/motivation to use this
… maybe search engines can use this to index part of the page

<Zakim> forgot, you wanted to paste image https://manipulation.csail.mit.edu/data/coco_instance_segmentation.jpeg

Brett_: Is it like this image https://manipulation.csail.mit.edu/data/coco_instance_segmentation.jpeg

ada: Yeah kind of

Brett_: So this is an accessibility frame buffer?

Ada: yes, with an object on the side

Brett_: Loading HTML into WebXR would solve this for that content (given its existing accessibility support), and same for model/image tags

ada: But that doesn't solve the WebGL problem

Brett_: But could you describe this all via HTML?

ada: We'd essentially have to rebuild HTML for the web, which is a 30+ year project

<cabanier> https://www.w3.org/WAI/APA/wiki/WebXR_Standards_and_Accessibility_Architecture_Issues

cabanier: We've discussed this a few times, including link from Accessibility group
… Is this the same proposal?

ada: Very similar

ada: so full scene graph including things that aren't rendered and you'd also have the framebuffer

cabanier: In that case why do we need matrices/bounding boxes and why not just pixels?

ada: Thought it could be cheap and interesting, but if it's not useful we shouldn't list it. Initial list is just initial impressions of what may be useful

bajones: From an accessibility standpoint, how important is it to know that something off the screen exists
… maybe if you have mobility issues you can be targeted towards it
… (via tabbing)
… This makes me say we probably want more than just a pixel buffer approach, and they are complementary

cabanier: Might be very hard to calculate the bounding boxes

bajones: This gets into what Nick was talking about that things may become impractical

Brandel: One place this is definitely needed is to include/give information about things that aren't in view, and pixels won't be enough

cabanier: DOM would still give you this information
… unsure if there is a need for the extended scene description if there's a pixel buffer and the fallback dom
… fallback dom wouldn't have spaces/bounding boxes, is just standard html

ada: If you just had an object that was one set of objects/vertices with a smart shader, you could still work out where things like heads/arms are to be rendered accurately in the scene graph
… scene graph doesn't have to be literally what's being rendered to the page, could fake it a bit/provide basic positions

<Zakim> ada, you wanted to suggest a descriptive torch

ada: Another useful thing could be if pixel buffer sometimes came from eye or sometimes from hand
… e.g. could scan with hand like a torch (flashlight) vs. just what looking at

bajones: Definitely need secondary buffer to get info when pointing sideways vs. what is being viewed; concerned about performance

ada: Resolution doesn't have to be same as pixel buffer

bajones: This does effectively mean doing a second render pass.
… similar if this buffer isn't antialiased but color buffers are
… lots of pitfalls that can lead to bad performance

Brandel: Despite the fact that most engines are using direct scene graphs, that shouldn't necessarily be what's submitted for this since instancing, etc. can lead to inaccuracies
… but this does mean that it can be simpler
… Doesn't have to update as often
… These aren't dealbreakers
… may also need this for things like shadertoys that are just two triangles
… WebGL doesn't have a scene graph and just has pixels as well
… conceptually harder to construct this sometimes
… shouldn't overindex on what some people have to do to create these views

ada: Could use this to update instances if browser keeps the two in sync

Brett_: If we do use model tags could automatically generate scene graph
… on a smartphone today, can use accessibility features in 3D css/dom overlay

<Zakim> Brett, you wanted to talk about Model tag smart-shader + 3D CSS Dom Overlay Smart shader

ada: It's possible to do this with DOM overlay today, but falls apart via VR today

Potentially incorrect wording in the specification (immersive-web/depth-sensing#44)

bialpio: When talking with Rik about the depth sensing API we realized that there may be a mistake in the spec. Want to see if there's any potential future changes of this sort.
… Feels like we will have to introduce a breaking change with what Chrome is already returning in order to be more concrete.
… In ARCore we're not returning from the near plane, just returning as-is. Doesn't match Quest, need to offset to make them match.
… We don't want to have to apply this offset to the depth buffer ourselves, which incurs a perf penalty, so need to return an offset for the user to use.
… That's the breaking change. The data will be the same, but if users don't change how they interpret the data it won't work cross platform.
… Do we need to account for anything beyond the near plane? For example, how is the data interpreted between near/far planes?

cabanier: How did we come to the conclusion that the values returned are between the near/far plane?

bialpio: We assume 0 means near plane + 0 mm?

cabanier: I don't know if that's true?

bialpio/cabanier: Discussion of how to use projection matrices to get values. Math math math.

bialpio: First thing is that we need to change the spec to not say values are from the near plane, because it turns out to not be true for any devices today.
… We also have to expose the near value that it's relative to, or there's no way it can work. Once you have a depth buffer you already have data that you have to fix up.
… Might have to expose depth far? Only if the depth buffer data is normalized. In order for the developer to know how to use it for anything other than occlusion they need to know these values.
… Occlusion might work if the values match the near/far planes.
… How can we future proof, since we're already making a breaking change?

cabanier: I think the depth should always be linear

Brandel: Logarithmic depth is beneficial in some cases, not necessarily in AR context given the scales involved.

nick-niantic: ARCore returns 16-bit ints with distance in mm

cabanier: Meta returns floating point meters

bialpio: Bigger the numbers, the more inaccurate the reporting is going to be.

bialpio: AR Core says beyond 8 meters precision falls off.

bajones: How does the different scale get resolved today?

bialpio: Already reports scaling factor and data format.

Ada: Could we provide a conversion function?

bialpio: That exists on the CPU, but how do we do it on GPU?

nick-niantic: Could you provide a matrix?

bialpio: For non linear values? Probably not.

cabanier: Could we tell devices that decide to go the non-linear route that they have to expose a v2 of the API?

bialpio: Yeah, that would work.
… Last question for Rik: your depth far is at infinity? Do we have to expose that?
… We just expose absolute values.

bajones: Doesn't that imply that depth far is at infinity?

bialpio: Probably? That sounds right but not sure.
… I think we can skip it. Should be able to absorb that math into the scaling factor.

Brandel: Did you say the values are perpendicular in ARCore? Because that would make the values non linear.

<more discussion about if we need more math or less math>

nick-niantic: Trying to understand depth near on Oculus. Why do we need to have it? Due to how the cameras compute it?

cabanier: Yes.
… Only reason we might need depth far is if implementations use something other than infinity for the far plane.

bialpio: Does openXR enforce units?

cabanier: depth 16, yes.

bialpio: I think the math will work out.

Brett_: Is the near plane supposed to represent where the depth starts counting from, moving outward from the headset?

bialpio: Yes.

Feature for the UA to handle viewing the system inputs during a session (immersive-web/webxr#1366)

bialpio: Think I have what I need to make progress.

<marisha> How to scribe: https://w3c.github.io/scribe2/scribedoc.html#scribe

Ada: On visionOS, when you don't enable hand tracking, because the hands aren't visible to the page we show the user's real hands. We are thinking it might be nice to have this when you do turn on hand tracking: pages could opt in and use their real hands. In general, instead of using WebXR input profiles you use system rendering. You might be able to have the controller do stuff with depth buffers for interaction. Want to gauge interest and[CUT]

like when pages cut out information for security reasons like with dom layers

bajones: Depth compositing isn't a factor. Today my understanding is that on Vision Pro you can request the hands without hand tracking and get the hands, and then it stops if you ask for the hands

ada: We don't want to stop people doing existing hand rendering, so they would opt into this

brandon: Do you envision this as a real-time toggleable thing you could turn on and off?

brandon: Or a flag at session time, for all meat all the time?

ada: describing transition between system hands

idea: about putting on boxing gloves but being cute, switching from real hands to controller hands

ada: maybe it should be toggleable on the fly, like a method in the session;

brett: what about clipping your hands in and out with depth sensing

brandon: by layering the hands video feed on purpose, you could composite it without depth

ada: this will be a feature flag and you turn it on, and then there is an open PR that is about inputs and whether or not they should be rendered

ada: in this situation we would set that to true to show that the hands are already rendered somewhere else

rik: on the Quest that flag is always false unless you add this feature

rik: Could this work in AR? It would work for AR; depth sorting hand visibility, or depth sensing, can do this too

ada: it could be viable for AR, and I guess you would be doing whatever the system does, whatever the system could do

rik: Leave the hands up to the UA, including interacting with the controller? ada: yes

this is about systems rendering inputs

Felix: Is this about hands only? Or what is currently supported by the system? What about tracked sources?

Ada: Tracked sources will be rendered by the system; you're in VR and the system is rendering controllers, they could appear on the table, you also have system-rendered hands; currently the system does the hand rendering

Felix: What about an input rig?

I need to ask more about that

Felix: One more question, do you see value in just rendering the active input sources?

Ada: Transient pointers are an example of something that isn't even tracked like that

Ada: Per input? Maybe it's gonna be like... like when you do hit testing you give it a target ray mode to track, it could be like that

Ada: Maybe it's only for tracked pointers, like things that would get rendered; actually I don't think so, it would be for all tracked pointers

Felix: Turn off rendering of active input sources as an option?

Ada: Quest perspective: put controllers down, have hands, pick them up...

Brandon: We talked about trackers representing the body before


Ada: That would be interesting, the user agent would render those?

Felix: We have the tracked sources, that's the controllers, we are using hands right now; controllers not being blocked out when depth is wrong can be distracting

Ada: Would you want to be able to cut out the controller with depth buffer?

Ada: technically?

Ada: Session startup?

Rik: Would know if punch-through or not and can not render the controller. Your punch-through hand would remove the hand you're rendering.

Ada: User agent could request the feature, then you render, then tell them not to render these, others can or can't

Felix: I got an idea for the possibility to toggle this at runtime? Make this universal: one rendering of a hand is one XR input source, a list of masks could be exposed, and find the mapping of the rendering to the XR input source

Felix: Then system rendering can be turned off and people can toggle rendering per input source

Ada: We can't expose the mask raw data

Ada: Per input, seems harder to do it for every input

Rick: but one controller and one hand

Brandon: I don't know if this is rendering a mask per hand

Brett: Feet, hands, body, this can expand to the rest

Ada: compositing hands with virtual objects is an exciting possibility, like light saber

brandon: I would like to know more about levels of control

brandon: simple start might be good

other idea: making it good to start with is good; this might be an easy thing to change at runtime; a global 'render all controllers' setting sounds fine to me

Brett: With masks we can put our feet through portals, we don't see the mask data but layering the masks is still amazing

Rik: If you're interacting with your room, you aren't interested in your hands; there is an option to filter hands out of the depth, so this is important here too


Evaluate if any aspects of the API should be deprecated. (immersive-web/webxr#1341)

bajones: we've brought this up at TPAC
… we didn't decide on anything concrete there
… are there any aspects that we want to reconsider or deprecate?
… deprecation is an arduous process
… but it's ok to look back on our bets and decide if they didn't pan out
… maybe it's healthier for implementors and users
… inline sessions are an example
… but stereo inline or magic window could come in the future but not necessarily using inline session
… very few people use them. afaik only the samples page does
… maybe we can simplify things by dropping them

Brandel: you mentioned that inline isn't used much. Do you have analytics

bajones: we have some but they're not likely very good
… it very much depends if they have a headset or not
… I don't believe our numbers show that this is needed
… I have not seen much use. Things are based on three or babylon or playcanvas and those don't use inline sessions
… only the sample page
… the only other use is pages based on the samples

Brett_: inline mode won't get you anything XR related

bajones: you can get orientation data on android
… you can use offset reference space and other things
… but the only thing is that if you built your framework around the webxr frameloop, you can reuse it on the 2d page
… but it's not a big deal to switch between window.requestAnimationFrame and the XRSession's frame loop (sketched below)
… maybe offset reference spaces?
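
For reference, a minimal sketch of the switch bajones describes between an inline session's frame loop and plain window.requestAnimationFrame; drawScene is a placeholder:

    // A minimal sketch: reuse the XRSession frame loop on the 2D page when
    // available, otherwise fall back to window.requestAnimationFrame.
    async function startRenderLoop(drawScene) {
      if (navigator.xr) {
        // 'inline' sessions are required to be supported wherever WebXR exists.
        const session = await navigator.xr.requestSession('inline');
        const refSpace = await session.requestReferenceSpace('viewer');
        const onXRFrame = (time, frame) => {
          session.requestAnimationFrame(onXRFrame);
          drawScene(time, frame.getViewerPose(refSpace));
        };
        session.requestAnimationFrame(onXRFrame);
      } else {
        const onFrame = (time) => {
          window.requestAnimationFrame(onFrame);
          drawScene(time, null);    // plain 2D path, no XR pose
        };
        window.requestAnimationFrame(onFrame);
      }
    }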

ada: I use those

cabanier: I've seen them used as well

cabanier: can we get rid of secondary views?

alcooper: I believe hololens has them

Brandel: NTT has a prototype that uses them

ada: for the a11y pixel fallback, would that be a secondary view?

bajones: that would be a different thing

ada: it's potentially used for a buffer that you can share with people

bajones: I believe that's one of the use cases for secondary views
… I believe hololens had an observer viewport
… but I believe we discussed using it as an observer view
… I don't know how well supported that use case is in the spec language

ada: I'm not super attached to it
… as a web developer, I haven't seen much adoption

<marisha> It looks like this fairly popular mathematics animation library might be using inline sessions: CindyJS/CindyJS

bialpio: a bunch of api use xrview and secondary views are allowed there
… so you could add camera data to such a secondary view
… for layers, we don't want to add more pressure on the system
… we only use them because we already have a lot of data.
… for raw camera access, I was thinking of creating artificial views

nick-niantic: for reference spaces, can we get rid of them?
… for example, you want an unbounded grounded reference space
… you can ask for the ground on android and it doesn't really know
… the avp lets you ask for an unbounded reference space and you can only walk a meter

ada: the vision pro case is a bug

idris: magic leap and hololens support unbounded reference spaces

cabanier: I believe Quest supports all reference spaces correctly but it doesn't support unbounded yet

nick-niantic: the intent of reference space is not super well implemented
… let's not give the developer tools that are not consistent
… we only want to start at the origin

alcooper: are you saying there should be more clarity around the reference spaces?
… so not deprecate but make them better defined

nick-niantic: as a developer I want the ground to be at 0
… I always want the origin facing forward
… I want you to be able to set my own content boundary

bajones: are you saying that as a developer you want things but maybe that is not what everyone wants
… for unbounded space, I might want things pinned around the house
… you want things to be preserved
… webxr doesn't have facilities to preserve that

Brandel: in the context of inline sessions, it doesn't seem that anyone uses it so we can take it out
… for reference spaces there is enough reason to keep them
… are there things that we want people to do with them?

bajones: it's certainly possible to have clarifications
… I wouldn't mind doing a review on these
… and maybe we should normalize that and codify that in a spec
… there are edge cases

Brandel: it's pretty clear that layers benefit from spaces
… offset reference spaces
… and maybe we should add that to the spec

bajones: maybe it makes sense to add a note
… to the layers API

ada: you can just teleport around

Brett_: is the Steam Deck the perfect inline device?

bajones: any handheld device, yes
… if all you're looking for is rendering something inline, yes. But you can just do it with the standard web APIs

Brett_: so you can write a polyfill

bajones: yes but likely I'd be the only user
… if we deprecate it, give it a long lead time to keep supporting it.
… give console warning, create a polyfill, etc
… reach out to developers to tell them to stop
… in the case of the sample, inline just goes away

<Zakim> alcooper, you wanted to talk about native support

alcooper: we should be careful removing features that have openxr counterparts

<Zakim> ada, you wanted to ask about what if a future device

alcooper: if the thing I needed are secondary views
… and webxr removed support

ada: what if we remove it and suddenly a popular device supports it
… it = multiple views
… will it just work?

bajones: there are devices like that like Varjo
… but they support rendering to just 2 views
… so part of the reason this was introduced
… some devices have a very large view with 4 large displays but even those render to 2 and then break it up in 4
… there is a lot of content that will just break if there are more than 2 views

<Brandel> I'm muddy on how this relates to https://developer.mozilla.org/en-US/docs/Web/API/OVR_multiview2, but it seems related

cabanier: (unbounded reference space issues)

bajones: I'd love to have the discussion to hash this out

ada: in the early days, we added it for hololens but it was never shipped
… but now with ML and Meta implementing it, we need to go over the issues.

idris: we got customer requests to implement

marisha: an admin question, for inline session, I find a bunch of code on github that is using it
… how do you make decisions on what's being used

bajones: every time a browser is removing a feature, we look at the metrics
… we should get a pretty clear idea
… add deprecation warnings
… and if we get feedback, we might keep it in

Brett_: if we remove secondary views, would caves be removed from the spec?

bajones: those devices might just budget and do it within their own implementation
… and ship their own custom implementation

unconference

Splats are the new mesh

<atsushi> [Nick going through slide]

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics

Maybe present: bialpio/cabanier, brandon, brett, cabanier, Felix, idea, idris, Phu, rick, rik

All speakers: ada, alcooper, bajones, bialpio, bialpio/cabanier, Brandel, brandon, brett, Brett_, cabanier, Felix, idea, idris, marisha, mblix, nick-niantic, Phu, rick, rik, yonet

Active on IRC: ada, alcooper, atsushi, bajones, bialpio, Brandel, Brandel_, Brett_, cabanier, cwilso, etienne_, Felix_Z, giorgio, idris, jfernandez, lgombos, m-alkalbani, marisha, mblix, nick-niantic, Omegahed, Phu, yonet