W3C

– DRAFT –
Immersive Web 2025 March face-to-face Day 1

18 March 2025

Attendees

Present
alcooper, bajones, Brandel, cabanier, Emerson, jinch, Laszlo_Gombos, Lucas, m-alkalbani, mkeblx, trevorPicoXR, yonet
Regrets
-
Chair
yonet
Scribe
trevorPicoXR, Emerson, Brandel, Lucas, MikelSalazar, mkeblx

Meeting minutes

<ada> Sorry people on IRC, the camera is still being set up

<ada> Sorry still having audio issues from the room.

<ada> ada: explains the minutes

<Zakim> ada, you wanted to demonstrate how to ask a question

<bialpio> no audio on the VC if anyone's speaking

Geospatial CG Introduction

<ada> [demo]

<bialpio> room is not audible at all on VC

<bajones> Working on that, sorry.

ali to do demo

<bialpio> I think the audio only worked because the presenter was not muted (i.e. it wasn't the room's mic that was being used)

<bajones> Yeah, I just wanted to check if the room's mic had come back.

<Lucas> Sorry, I can't see the calendar invite. Could I get a link to the zoom meeting?

<ada> Sorry we can't paste it into IRC please log in to w3.org and click on your profile to access the calendar and the links

ali: Open AR Cloud is creating a way to anchor content to real-world locations

ali: spatial clients provide a way to access location information to position virtual content in the world.

ali: shows a way to map our current location to visualize points of interest around our area

ali: once you are localized you can view GLB content or WebXR content

brandon: great presentation, interesting demo

brandon: drilling down into logistical problems, the big thing is coordinates on a global scale: you run into floating point issues. On the web we use 64-bit doubles, not sure if that is sufficient

brandon: services process these numbers as 32-bit floats, which might not be enough precision to get a smooth range of motion. Would like to know how that can be solved.

mikes: we have a test that is working with normal numbers in JavaScript; it is working and we are targeting high precision.

brandon: to clarify, are all your calculations being done with the number type, and has that been sufficient?

mikes: we had doubts ourselves but it seems we can hit 10 micron accuracy, only see issues when you get very close

brandon: good to know, we might run into some issues with WebXR as it exists today. Parts of the spec return matrices with lower precision. Other parts of the spec use arrays of numbers. Some parts of the API have high accuracy but others have worse, so there might be some issues here. We might have a two-tier solution where we pick one spot in geo space but do local math using local space.

mikes: on current hardware, you receive a geopose with an entity and you transform it into the local coordinate space. This is how it works. Would like to accelerate this with improvements to the API here
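
For illustration, a minimal sketch of the two-tier idea brandon describes (do the global math once in 64-bit doubles, then work in a local tangent frame near that origin). It assumes standard WGS84 constants; the function names are illustrative only and not from any spec:

    // Sketch of "pick one geo origin, do local math locally", in plain JS numbers (doubles).
    const A = 6378137.0;             // WGS84 semi-major axis (m)
    const F = 1 / 298.257223563;     // WGS84 flattening
    const E2 = F * (2 - F);          // first eccentricity squared

    type Geodetic = { lat: number; lon: number; h: number }; // radians, radians, metres

    // Geodetic -> Earth-Centred Earth-Fixed (metres).
    function toEcef({ lat, lon, h }: Geodetic): [number, number, number] {
      const n = A / Math.sqrt(1 - E2 * Math.sin(lat) ** 2);
      return [
        (n + h) * Math.cos(lat) * Math.cos(lon),
        (n + h) * Math.cos(lat) * Math.sin(lon),
        (n * (1 - E2) + h) * Math.sin(lat),
      ];
    }

    // Offset of point p from a chosen origin, expressed in the origin's local
    // East/North/Up frame. Near the origin these numbers stay small, so even
    // float32 consumers keep enough precision for smooth motion.
    function toLocalEnu(origin: Geodetic, p: Geodetic): [number, number, number] {
      const [ox, oy, oz] = toEcef(origin);
      const [px, py, pz] = toEcef(p);
      const [dx, dy, dz] = [px - ox, py - oy, pz - oz];
      const { lat, lon } = origin;
      const east = -Math.sin(lon) * dx + Math.cos(lon) * dy;
      const north =
        -Math.sin(lat) * Math.cos(lon) * dx -
        Math.sin(lat) * Math.sin(lon) * dy +
        Math.cos(lat) * dz;
      const up =
        Math.cos(lat) * Math.cos(lon) * dx +
        Math.cos(lat) * Math.sin(lon) * dy +
        Math.sin(lat) * dz;
      return [east, north, up];
    }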

<yonet> Ali's presentation link: https://www.openarcloud.org/oscp/w3c

rik: it seems this could be a specialization of the unbounded ref space. Get a geo ref space which gives you a patch of space and a position within that space. This could solve the accuracy problem

mikes: sphere has a limit when you move a far distance, you end up in a different position, need to be aware that if you create a geopose with more than 90 deg of latitude, you are on the other side of the world

rik: this is handled by the reset event, having poses in different geo spaces should just work

mikes: think about this as being able to generate local spaces across the world. Having a connection to a satellite or tracking a spaceship, you can do so by thinking in terms of the earth

rik: if you create a ref space in the geopose, you might need to add a param to say which geopose

mikes: yes there are many coordinate systems and want to limit as much as possible

rik: thinks this sounds possible

mikes: we are working on translation tools to convert between coordinate spaces

rik: let's standardize on one so we don't run into endless types
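
A purely speculative sketch of what rik's "geo ref space" specialization of unbounded could look like. Neither a "geo" reference space type nor a geoPose option exists in WebXR today; both names are placeholders:

    // Hypothetical only: "geo" reference spaces and the geoPose option are not in WebXR.
    // This just illustrates the shape discussed above (assumes WebXR DOM types are available).
    declare const session: XRSession;

    const geoSpace = await (session as any).requestReferenceSpace("geo", {
      // GeoPose-style origin: a patch of space anchored at this lat/lon/height,
      // with the quaternion giving the orientation of the local frame (e.g. ENU).
      geoPose: {
        position: { lat: 48.8584, lon: 2.2945, h: 35.0 },
        quaternion: { x: 0, y: 0, z: 0, w: 1 },
      },
    });

    // Poses queried against geoSpace would then be small local offsets from that
    // origin, sidestepping the global-scale floating point problem discussed above.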

<Zakim> alcooper, you wanted to ask which features need to be enabled

alcooper: tech question, you use WebXR incubation features, which ones did you need?

ali: not aware of which

mikes: it was needed a while ago, now I'm not sure

ali: will follow up on this

alcooper: the only spec not enabled by default is the layers spec; you might not actually be using these, as camera access was shipped a while ago

mikes: we didn't have camera properties, so we had to create our own set of metadata to store this info

Emerson: Precision even over a small area: if you scan a parking area and someone did something similar 200 m away, when you do SLAM you do a bundle adjustment and have to keep doing it. The demo might work in a certain area, but does it work if you are moving across distances and have many entities? Even at a small scale, is there something that needs to be solved to be continuous? What are your thoughts? Does mm precision scale to moving far distances?

mikes: when I say mm I mean for interaction; when it comes to movement around a large space, we delegate this issue to other systems. I don't know how to answer this directly.

nazih: when you scan some area, there is accuracy, and the further you go the less accuracy there will be. Other systems like Street View can handle this for accuracy.

ali: in a dream world, if Google anchors are GeoPose compliant, that will be a better way than users scanning directly. AI can handle position correcting in the background and provide more accuracy

brandon: the problem can be hit with just something in the same room. If you walk around a table and just use the single ref space, you might see some drift over time. Anchors exist to solve this problem: they have local points processed internally in the device, and it uses landmarks to update the pose. For geopose you can get info from the cloud, the global position (using lat/long, or something for global). Then unbounded space is around that to get the initial position. Then you create an anchor which is used to manage local updates for better accuracy.

brandon: I am curious about getting the initial lat/long; GPS alone is not sufficient. Using visual positioning systems was talked about. We don't want to build a perception that you need to ping a big company's server to get location. If it needs to rely on a visual positioning system or similar, the question of where that comes from becomes a concern. If there was something well understood, we could point people to use some service. If we want to build this into the standard, we need to describe it in concise terms in a way that is not dependent on a single known service. It is fuzzy how we can do that without jumping out to an external server.

brandon: what do you perceive as the process for that?

mikes: the GeoPose standard is what we are working on. The basic unit is a position plus an orientation quaternion. This is agnostic of how you obtain it.

ali: it can be a geoposed QR code, it doesn't matter how these poses are obtained

ali: would like to set the standard of how to obtain/use it; it doesn't matter where it comes from

emmerson: the spec could have something like, if you use Google or another service, they specify the precision.

mikes: this issue isn't solved but we have specific formats to deal with that

brandon: from a user's perspective, there needs to be a well understood flow of how this info gets connected. I can see this encoded into a QR code; if that was common, we could tell people this is how you can use the API. People in this room might think this is a good use case. The spec might say we don't worry about how the user gets this information.

brandon: maintaining services like this by browser developers can be expensive and a complex/political process to set up

brandon: we could say that at the API level we don't deal with where the location comes from; the user of the application can choose where to pull from (a specific service or QR code) that associates a specific lat/long with a certain ref space

brandon: this could make this service stuff external to the API, otherwise the browser will have a location service that they have to pick a location provider for

mikes: we have the need to operate with geospatial information that is encoded in a specific way. We don't have a camera that is geospatial; how do we create a camera that can integrate that data? We have a camera, with lat/long, rotation, north. That is everything we need.

rik: would be better if every geo ref space could come with its own anchor you can create. The more anchors you have the slower things become

mikes: those anchors are also entities in the geospatial element. There is no origin point other than the North Pole

rik: anchor is just a point in space, can define that as the center of your geo ref space

mikes: you are always in a ref relative to the earth. The frame of ref is always the center of the earth. Those anchors are at the same level as anything else you located

brandon: when you manage to geolocate yourself, you will get a geocoordinate as perceived by the device. It would be convenient for the developer to be able to drop an anchor at the time of location, where the device itself has a series of landmarks so the anchor is always relative to those. It has nothing to do with the global pose. If we have an anchor at the same time as the global pose, the device can position the anchor always tied to that specific geolocation.

mikes: we want to generate points as we go; that is the responsibility of those working on the geolocation API. While it is important to have a good geolocation API, it doesn't change the idea we are proposing. If the device knows where it is, I don't mind if other entities exist.

ali: if you are in your home you don't need geopose. But the device can also know everything around it; if it is tied to an anchor you can place things relative to other anchors. At some point this should be at the device level, where you pick which geo service you want.

brandon: what you are showing so far relies on visual positioning systems. Do you have a sense of what information is needed to get an accurate geolocation? Is it a single image, a series of images, a point cloud, etc?

ali: you just send one image along with lat/long; it uses GPS to narrow down its map search to match it

brandon: cool that it comes from a single image

ali: the area has to be scanned earlier

ali: there were 21 images to scan the area at first, and this gave an accurate geopose at the service provider level to generate things beforehand.

brandon: OK, so the user takes one picture.

ali: you can see where you should be to localize. It's not everywhere; there is a predefined location where you start and take the image. Things will be easier with AI. At the backend there can be localization done with anchors.

brandon: if the API looks like WebXR delivering an image taken locally from the camera, then we can take the image and metadata, the cloud can figure out where the picture was taken, and I can figure out where to place the anchor and start building out from there. This could allow for a system that isn't tied to a specific service. The API is a little odd but we talked about something similar with QR codes. This might not be right but I could see it working

ali: agree

mikes: great for a geolocation API but it's separate from WebXR; if we only have 5 mm accuracy it's not great but we can work with that

lunch time

<Ali> Here is the link for my slides

<Ali> https://www.dropbox.com/scl/fi/rps3p43rarclvkdx9gvh0/GeoPose-SWG-W3C-Immersive-Web-Face-to-Face-March-2025.pdf?rlkey=w7a45h01vuo0tziipivu8dp2v&dl=0

<Ali> and

<Ali> the web site URL for today's demo is below

<Ali> https://www.openarcloud.org/oscp/w3c

[lunch break]

<ada> Sorry we're still running late

<alcooper> Starting back up. Brandon is discussing in person logistics. We will have the current setup for the rest of the day

ada is explaining the next steps

Hint for VR in AR WebXR sessions immersive-web/webxr#1402

ada: issue relates to AR passthrough features

ada: should we have a new session type

cabanier: how do we expose it, maybe set it in the render state
… open to any other ideas

bajones: Render state is probably the right spot for it

bajones: there is a convoluted way to get by without introducing a new API, but it's a pain

MikelSalazar: do we care if it's accurate at one frame

cabanier: we would only disable the second stage, it would come up right away

bajones: yes, everybody can guarantee the type described in the spec
… these days it's mostly visionOS, OpenXR and some other solutions

bajones: the bigger question I would have is whether it is a hint or a guarantee
… video passthrough devices might end up implying you have to do another clear or copy if a device doesn't have the ability to stop the passthrough

I don't need the pass through right now, you're allowed to disable it

I feel the naming is important

<Zakim> Brandel, you wanted to discuss what else is meant by being "in AR" as a bundle of permissions

Brandel: makes sense as a method of transition between AR and VR
… what is understood to be meant between an AR and VR session for a user
… it would be worthwhile to look at a lower level switch based on user permission

alcooper: on mobile you would already be in AR

Brandel: user notification is important

cabanier: could just render black over passthrough
… we already start in AR, the user is already past the permission

alcooper: I'm not sure about environment blend modes
… for Android XR I don't know 100% how it works

bajones: we do in some cases shuffle the camera image around
… agree with Rik, in terms of permissions, we want to deal with it at the session level
… in an AR session, once you've done that, it's not really a mode change, you don't need to spend time processing these pixels

I'm not revoking permission by saying I don't want to share the camera; optical passthrough is a good analogy

Brandel: its a matter of duration

bajones: I can get on a Zoom call, the meeting goes muted, I walk off and my camera could still be running; to a certain degree that is on me as the user

there is only so much we can do to protect users from themselves

the most practical thing would be to have some sort of indicator
… like if I bring up a task bar, highlight it

ada: on a slightly different topic: we don't explicitly tie features to session types. There are certain features only expected to work if you explicitly say where you can access them.

alcooper: I somewhat disagree a bit

there could be valid scenarios for developers to do that

on Chrome we have VR and AR as separate permissions

It should be up to the user agent if it's OK to grant permissions based on the "permission model" of the UA
… without raw camera access, sites don't have access to the raw camera that is part of the passthrough
… the privacy concerns are a little redundant as the site can do that anyway

ada: would like to clarify: the assumption was we disable all AR features, but if this is just for rendering, then OK

cabanier: the only thing that doesn't work is for unbounded spaces

trevorPicoXR: whatever we solve for this may apply to other features as well

ada: maybe you couldn't do unbounded; would you want to have a feature for this

if users require an unbounded space they should know if it's not going to work

cabanier: even if it's unbounded there is not much we can do

unbounded has nothing to do with passthrough

if you are in unbounded you don't get the guardian

ada: probably ok

alcooper: per Brandon's point, indicate it's a hint not a guarantee

there are form factors that simply cannot do it

ada: what is the current proposal name

<ada> {uShallNotPassThrough}

<ada> {passThroughObscured}

bajones: passthrough obscured

jinch: pass through limited

<ada> {dontNeedPassThrough}

bajones: there might be some devices that can't do it

<Brandel> framing it positively as {needsPassthrough}

bajones: we want something that indicates the state of the app
… if it's a heavy lift we are giving permission to turn it off

ada: could we mandate to clear to black

<alcooper> canDisablePassthrough(Compositing)

bajones: we may not wish to mandate

<MikelSalazar> suggested passtrhoughAllowed"

MikelSalazar: can disable passthrough

bajones: we want the default to be falsy
… having something where we are actively saying it's allowed to be disabled is probably the better framing

ada: I like passThroughObscured

<cabanier> isPassthroughObscured?

<cabanier> isPassthroughFullyObscured?

alcooper: passthrough obscured vs fully obscured

<ada> passThroughObscured

<ada> passThroughFullyObscured

<cabanier> -1

ada: voting on the above

<ada> +1

<bajones> +1

<trevorPicoXR> +1

<alcooper> Fully = +1 no Fully = -1

<atsushi> +1

<MikelSalazar> -1

<alcooper> +1

<Lucas> +1

<Emerson> +1

<Brandel> +1

<m-alkalbani> -1

<yonet> +1

ada: Pass Through Fully Obscured wins atm

<cabanier> passthroughFullyObscured passed

ada: we can wrap up that issue
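
A minimal sketch of how the hint just voted on might look from an app, assuming it ends up as a boolean on XRRenderState named passthroughFullyObscured (the exact spelling and location are still open):

    // Hypothetical usage of the hint voted on above; the attribute name and its
    // placement in updateRenderState() are assumptions, not shipped API.
    function enterFullyImmersivePortion(session: XRSession) {
      // The app promises it will cover every pixel, so the UA may stop compositing
      // (or even capturing) passthrough. It is a hint: some form factors can't comply.
      session.updateRenderState({ passthroughFullyObscured: true } as any);
    }

    function returnToBlendedPortion(session: XRSession) {
      session.updateRenderState({ passthroughFullyObscured: false } as any);
    }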

<MikelSalazar> To be a bit pedantic obscured is to make it dark,

<MikelSalazar> occluded is to hide it behind something else

<MikelSalazar> Also, here is the link to the slides of the previous presentation: https://docs.google.com/presentation/d/1ozukf7t8WPiZn9p74gn16JNJuwKNiR07UTs0Pd1M_AU/edit?usp=sharing

Depth Sensing: Raw vs Smooth Depth

immersive-web/depth-sensing#51

alcooper: these next few issues are related. 51 + 52 made us come up with 53 and broader changes to the API shape.
… ARCore and OpenXR, things we use, support "raw" and "smoothed" depth information.
… Currently, we don't differentiate or inform about the characteristics of the depth texture. The PR I'm working on adds a preference for if you want raw or smooth
… It would be a preference, such that if a UA has only the ability to offer one it can do so.
… As I started going back through the spec to re-work three things instead of two, I ran into some things that made me want to refactor.

cabanier: We don't have support for different types - ours is only raw and we advise authors to do smoothing of their own.

cabanier: We _could_ add an extra pass?

alcooper: We would have to make it optional so that it supports environments where only one can be offered.

cabanier: We should also report if a hint is denied, when you ask and don't get the type

alcooper: I think if you ask and don't get it, it should be able to reject - the main question is probably in a retro-fit situation

cabanier: I would like it so that if you ask for smooth and the session reflects raw, that's a good signal that the author will have to do the smoothing themselves

alcooper: so you'd prefer if it _never_ rejects?

cabanier: I think so, does that seem reasonable?

alcooper: That does make sense - it probably breaks less in the event that people have stuff that works today that might break if a "smooth" request is now denied on Quest

cabanier: If we get feedback from developers indicating that they _don't_ want to do this work themselves, we could add one. It seems like what we'd do is more or less equivalent to what they'd do.

alcooper: There's a possibility that a UA is in a better position (either by way of additional data etc) to be able to do a better job of this, but doesn't have to be a hard guarantee
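
For context, a sketch of what the preference could look like next to the existing depth-sensing feature init. usagePreference and dataFormatPreference are the shipped dictionary members; "depthTypeRequest" is a placeholder name for the raw/smooth preference being discussed, not a finalized attribute:

    // Sketch only: "depthTypeRequest" is a stand-in for the raw-vs-smooth
    // preference under discussion (ordered preference, never rejects per cabanier).
    const session = await navigator.xr!.requestSession("immersive-ar", {
      requiredFeatures: ["depth-sensing"],
      depthSensing: {
        usagePreference: ["gpu-optimized", "cpu-optimized"],
        dataFormatPreference: ["luminance-alpha", "float32"],
        depthTypeRequest: ["smooth", "raw"], // hypothetical
      },
    } as any);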

immersive-web/depth-sensing#52

alcooper: On 52 - Trevor helped foreshadow this one
… Should we be able to pause or resume sending up depth data, based on the expected cost? Maybe use snapshotted data etc once an object has been placed?
… Pausing could be void and resume might need to be a promise, since it could take a few frames to resume a service

cabanier: Depth takes a couple of frames to get back to us - but we might serve frames that happen to be empty

alcooper: I think we return null rather than empty data until we get meaningful things

yonet: In the case that you need to do extra scanning, what would you do?

cabanier: We would wait the few frames like we currently have to at the outset of a session

bajones: Nit: I'm not sure that this belongs in "render state", since it's not a feature of what is visually displayed
… Render state tends to have guarantees about when things are going to take effect. It's not the end of the world to put this information in there.
… The depth data comes in on the frame? [It's queried on the frame]

alcooper: Rik has a PR to get it in on an additional view. Right now it's _aligned_ with the frame's view and the PR is to make it come from a different one.

alcooper: I _think_ they come through the WebGPU binding, at least for the GPU one. The CPU one I'm not sure

bajones: In terms of where we set this, my initial instinct is to put it on the binding [cabanier shakes head vigorously]

alcooper: The XRFrame carries CPU depth info, XRBinding carries GPU depth info
… I'm not sure if cabanier's change impacts that

bajones: My gut tells me the session is a better home for this information
… that also helps preserve the function of the renderstate to provide guarantees about timing information about this

cabanier: I'm fine with this not being in renderstate, I don't think it needs to have functions or fulfill a promise

trevorPicoXR: I like the idea of this being on the session, I'm curious about whether there's a path to follow this pattern for other features as well
… we might want to (for example) disable Layers, or hit testing - we'd need to enable the most at the outset, but potentially pausing features and resume them
… Is the challenge here that the API is complicated, or that we want to _make_ this simple?

cabanier: The existing API is supposed to be able to accept null textures and dummy data

alcooper: What trevorPicoXR is suggesting is a more comprehensive model of being able to toggle settings at a more granular level. I think a promise would complicate that
… But anything that's null that's not _expected_ to be null could complicate that and delay things

cabanier: I'm conscious that the slate of capabilities present slightly differently, which makes a unified approach more challenging to resolve

bajones: We can resolve some of that at the API level so that it's more of a hint and the user-space code isn't significantly affected

cabanier: Some of the things that trevorPicoXR suggested, though, aren't session-level features

bajones: someone gave an example of toggling on hit-test. For things like "requestHitTestSourceForTransientInput" you can cancel that in order to lighten the burden, and it does currently live on the session
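
For reference, the cancel() pattern bajones mentions already exists in the hit-test module; a short sketch (session and viewerSpace are assumed to exist):

    // Existing hit-test module precedent: release a source when no longer needed.
    declare const session: XRSession;
    declare const viewerSpace: XRReferenceSpace;

    const hitTestSource = await session.requestHitTestSource!({ space: viewerSpace });
    // ... consume hit test results each frame while placing content ...
    hitTestSource.cancel(); // stop paying for hit testing once placement is done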

alcooper: Lighting estimation looks similar - you ask for a light probe and could probably just drop it after you don't need it
… 53 gets toward a more unified approach that makes use of a tracker, and you can just destroy this object to indicate a disinterest in using those capabilities on an ongoing basis
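
A straw sketch of the pause/resume shape being discussed, placed on the session as bajones suggests; the method names are hypothetical:

    // Straw-man only: these methods do not exist. Pause is synchronous; resume
    // returns a promise because the depth pipeline may need a few frames to restart.
    interface XRSessionWithDepthControl extends XRSession {
      pauseDepthSensing(): void;
      resumeDepthSensing(): Promise<void>;
    }

    async function placeObjectThenStopDepth(session: XRSessionWithDepthControl) {
      // ... use depth data while the user is placing an object ...
      session.pauseDepthSensing();        // stop paying the per-frame cost
      // later, when occlusion/physics is needed again:
      await session.resumeDepthSensing(); // may take a few frames; depth may be null until then
    }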

immersive-web/depth-sensing#53

alcooper: Should we reconsider the API shape of depth? As I was trying to implement this, I was wondering if a site might want both raw _and_ smooth, for example.
… We also have other constructs that have a requirement like this, like a hit test, that you _indicate_ at start up but don't necessarily consume immediately or at all times.
… I have a straw proposal that has an XRDepthConfiguration that indicates some preferences
… If we're already saying that we need to start and stop things, or ask for both kinds of things at the same time, then maybe more things will follow in that pattern as we look harder
… I'm not aware of other features that currently require an `init` - maybe DOMOverlay
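
A hedged reading of the straw proposal: depth requested as a configured, destroyable object rather than a single session-wide init. All names below are placeholders inferred from the discussion, not spec text:

    // Placeholder shapes for the #53 straw proposal.
    interface XRDepthConfiguration {
      type: "raw" | "smooth";
      usage: "cpu-optimized" | "gpu-optimized";
      dataFormat: "luminance-alpha" | "float32" | "unsigned-short";
    }

    interface XRDepthSubscription {
      readonly configuration: XRDepthConfiguration;
      destroy(): void; // signals the UA that the page no longer needs this stream
    }

    // Hypothetical: at most one raw and one smooth subscription at a time,
    // requested after session start instead of declared up front.
    declare function requestDepthSubscription(
      session: XRSession,
      config: XRDepthConfiguration
    ): Promise<XRDepthSubscription>;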

cabanier: So in your proposal, you could have _multiple_ depth configurations at the same time?

alcooper: at most one raw, one smooth - predicated and ordered on the depth type

cabanier: Depth is currently our most expensive capability, so I don't know that it's a good idea to make it easy to request something so hard/expensive

bialpio: I think a subscription-based model might be more sensible here, since there are smooth/raw, cpu/gpu - not necessarily things that an author can pre-declare and express ahead of time
… It might be easier to use a promise-based model where you can engage in an ongoing negotiation about whether you get A, and or then B, then C etc.
… Rather than a single config that doesn't necessarily give a collection of options that the author has the ability to immediately respond with
… I also wanted to respond to cabanier's thoughts on multiple sources of the same broad type - a system could reject things if it's at capacity. Sites would simply have to start tolerating more failure cases
… It might become harder to write an app against disparate implementations, but maybe that's okay?

<Zakim> alcooper, you wanted to mention tighter preferences

alcooper: one thing bialpio mentioned that contributed to my motivations on this, in addition to the raw/smooth, is the idea that an author says "If I get raw data, I need it on the GPU, whereas I may be able to cope with smoothed on CPU etc."
… letting developers specify a tighter set, this configuration gives them more control to make a more precise request of things

cabanier: I am apprehensive that if there's too much variability about how this kind of session-granting can establish, it will make it too complicated for developers to anticipate how different devices will run
… (and maybe we need to be a little more prescriptive)

alcooper: We do have a shipping API that we might be able to steer towards a more conflict-resistant approach
… a new API allows us to build something that can better resolve some of the potential conflicts that we're currently tip-toeing around

cabanier: I think even where there are big disparities, we _can_ resolve them by doing things like dropping GPU buffers into something CPU-readable

bajones: I don't love when APIs construct elaborate preference sets in order to get around variability. I'm aware WebBluetooth has something like this, as do aspects of video selection
… It's my impression that this is pretty daunting to authors - in general we have tended toward a more prescriptive model that _tells_ authors what to expect
… Leaning more heavily on one or two primary views etc - and providing _support_ for edge (or CAVE) cases, but making sure that the most common path is going to be the best-catered to

bialpio: The only problem I have with mandating a specific course of action for all classes of devices, is that we might be effectively penalizing one of them if it happens to be very expensive to do that thing
… if a category of devices simply doesn't _get_ data in a form that's amenable to the 'expected transformation'
… this is a lot like the course that cabanier was trying to mitigate with re-projection of depth data, for example
… it's not a reason not to prescribe things, but we need to be conscious about the implications of saying "this is the expected presentation for a given source of data", and jeopardizing an API shape because it ended up hard

alcooper: Agree with bialpio. We have these flexibilities because different devices do things differently - it feels reasonable to say "if you ask for depth, you will get it in this way. You can _request_ if you want it CPU/GPU, or Smooth/raw" - sometimes those things are cheap and we should be able to send it to an author

cabanier: I think we've learned a lot since depth V1. I want to reiterate that if we give authors too many options they will get confused. We should be able to make an API that says "this is the best way to work with this data" - and provide hints for what options exist

alcooper: We are using CPU data, it wouldn't be hard for us to upload that to GPU but it would be expensive to pull it back down

bajones: when you're in the WebGL then you can pass things around. The bigger challenge is when you have a bubble in the data that takes time to resolve
… If you're willing to make the data asynchronous then the performance could probably benefit

alcooper: Back to my earlier concerns - yes it would be better to say there's one right way to do this, but I don't know that our devices are at a place where we can refactor all of these devices streams to behave the same
… We might want to wait a few years to converge

cabanier: I don't know that we _will_ converge though

trevorPicoXR: this is kind of like camera data that can, at times, go straight to the GPU while some has to go through the CPU
… but a developer can be pretty indignant about having that complexity masked, if it has impacts on the behavior and perf they can expect

alcooper: If we can make minor changes that are no worse than the current world then that would be good. We could make something less "opt-in"

cabanier: You might not like my proposal - what if we just drop CPU? Is anything going to use CPU directly, or do they just jam it into the GPU?

alcooper: I have a developer who wants CPU, (but also they say that one of the first things they do is throw it in the GPU)

bialpio: I think the use-cases for CPU are physics and collisions, if you want depth for occlusion it goes to GPU.
… per bajones suggestion, I would lean toward dropping GPU because it's simply easier to throw things _into_ the GPU. That might put an upper bound on the resolution of the buffers that this is viable with though.
… I don't have a good intuition of where people might use CPU data other than physics - AI possibly
… like pathfinding based on the environment, but I don't know how common that use case is

alcooper: The notes from the developer said that "raw depth is best on CPU, for physics; smooth = occlusion = GPU."
… we might be able to collapse some of this to say that "raw goes on CPU, smoothed goes on GPU." I'm not sure if that helps anyone

cabanier [shakes head]

bajones: Something that came to mind is that AI is almost certainly going to go on the GPU.
… for physics I can _see that_ being more convenient on the CPU, but we also have a lot more compute-based abilities coming with WebGPU.
… bialpio is right that it's almost always easier to get data _on_ the GPU rather than off it, but am conscious that these resources are going to continue to get bigger
… There is an overlap in that use-case with things like environment meshing and awareness features. It seems like that might be preferable to depth in that event, though.
… Depth might allow you to expand, resolve and refine a mesh, but it won't let you consider things behind you, for example.
… I do think that convergence on being able to use one thing for all kinds of tasks is multiple years down the line, though.
… In the long term, I do think that GPU-based usage is going to supersede CPU-based approaches

bialpio: When I said "AI" I didn't mean ML, I meant _fun_ AI like A* and minimax etc
… given that we have a pretty even split about use-cases, it makes me think we should just offer an occlusion feature itself, rather than furnish the app with the data and expect them to do it themselves
… that might serve to mitigate the need to do too many different things with this information

cabanier: It's not really feasible for us to composite in that way with today's technology unfortunately
… most of the time you do occlusion in AR, so you only have to apply occlusion to a small character rather than the full overdraw of doing it on the whole buffer

<MikelSalazar> Sadly, I am not fast enough with the keyboard :-/

bajones: To support Rik on the Occlusion API, we can't just take the depth as provided and use it as a Depth Buffer
… 1. It's not aligned and it's a lower resolution than what you want. If you want to do high quality occlusion with these depth buffers, it's something you need to do as a render pass. Basically, Rik is right

ada: This is wild and might be terrible. What if you were to foveate the depth buffer. That way the depth would be cheaper to calculate
… Trying to reduce the number of pixels

cabanier: Foveation cuts the frag cost, so this wouldn't apply

alex: It seems like we might move forward with depth v2. I don't know if there's enough to do away with the request model we have now. I think we may want to support GPU depth

ada: might want to take that back and think on it with company / clients

cabanier: we don't support cpu, so it's not a problem

alcook: Is depth v2 worth pursuing? I think we should land the improvements we have in the meantime to give users a better experience sooner

cabanier: Maybe by the time v2 comes out, there will be a clear case to not support CPU

alcooper: Maybe data format wise, it would be easy to keep them all

cabanier: Which formats do we support?

<Zakim> Brandel, you wanted to boggle/reflect on the disparity of resolution on the sensors

alcooper: 3

Brandel: I didn't realize the resolutions of the two approaches were so different. It makes sense that there's a difference in needs

alcooper: I think that's the crux of why we have CPU + GPU, then there's the element of having smooth and raw for the different purposes

Brandel: We should acknowledge the function of the depth API for both platforms

cabanier: I think we both want to use it for occlusions

bajones: In the spec we have luminance-alpha, float 32, and unsigned short formats

bialpio: I think there were other resolutions that could be supported on our platform, but it might require dedicated hardware. Still a bit away from 1k x 1k
… The other thing here is that there may be something different on how the depth buffer is computed. I think ARCore is trying to do live scanning of the environment.
… Feeds in a bunch of data... This might be the reason why the resolution is still pretty small
… I think they don't want to assume what kind of device hardware is supported. It really depends on how things end up developing there. For now, it seems we're stuck with fairly low res on phones without dedicated hardware

alcooper: I need to think about some stuff here. I don't know how to proceed, but not sure if more discussion is beneficial for the time being

immersive-web/layers#310

<ada> immersive-web/layers#310

bajones: I don't think we came to a conclusion on API shape

cabanier: three.js is moving to a two-pass model, so in order to apply foveation you need to apply foveation to the render texture, and there is no WebGL extension to do so

cabanier: and WebGL is kind of on the way out, and it's slow to get an extension

cabanier: proposal: foveatedTexture with one parameter (bikeshed name)

bajones: is that the API shape: it points at a texture and says foveate that, and then whenever it is rendered to it is foveated?

cabanier: yes, that's how it would work

bajones: this would probably be applied on the WebGL binding; it would be different on the WebGPU backend

cabanier: you just say where the eyes are and the regions beyond that, but there are still hand-wavy things

bajones: anything else that needs to be set?

cabanier: it kind of has some weird attributes but that's similar to other layers, the WebGL layer

bajones: is it the best path to have a texture that, once set as foveated, will be foveated and can't be turned off? If you want it off you recreate a new one without it, etc.

bajones: traditionally with WebGL you can usually query this state; do we need that? i.e. 'is this texture foveated?'

cabanier: then we would need to keep track of state; we are taking the easy way out w.r.t. the 'proper' WebGL way

ada: it would be weird if, once you left the WebXR experience, you potentially see the foveation artifacts in a non-WebXR experience

<Zakim> bajones, you wanted to ask about eye tracked foveation

ada: maybe a set with textures that have foveation, so developers can do cleanup on those textures / discard / re-create etc

bajones: this would be something that persists outside of the session so that could be a negative; 'leave the campsite as you found it' is a good principle

bajones: we tried to replace fixed foveation as this would be better, but there is a dependency on eye-tracked foveation to get the biggest worthwhile benefit

bajones: question to the Apple folks: can we do this in a privacy-preserving way?

ada: can't speak for Apple in terms of whether it would be possible

Brandel: it's done at system level on visionOS, so would have to be in a way safari/webkit wouldn't know about it

Brandel: I don't know if visionOS lets apps do dynamic (non-fixed) foveation

bajones: even if you don't know the exact foveation texture, you can figure it out through various indirect aspects of the foveation

ada: others could speak more on what is really possible, and we would need that to really give fuller answers

Brandel: visionOS has a more complicated foveated rendering even for fixed

bajones: do you have any sense how you would apply foveation in the deeper layers, and how you would do this from WebKit?

Brandel: don't know

bajones: want to know just so that we can make sure the proposal is implementable on the platform

ada: overall seems a good idea

cabanier: so should the state persist after the session has ended?

cabanier: texture is created as focus, but has foveation applied

ada: opening a potential trap or visual weirdness or closing potential options for headsets that do more unusual things with foveation

cabanier: answering whether you need to re-create if you change the foveation level: no

ada: I don't like the idea of device state leaking after session end

cabanier: we could do something to unapply foveation, but it feels ugly

bajones: but maybe worth it, as it's cleanup and could be good to do from session begin to end; it's weird to have some hidden state changed that you can't tell about

bajones: could do some best effort thing to not leave it hanging out

cabanier: so would have to add that to the spec?

bajones: yes would have to add something related to session end

cabanier: could use a weak list, as we don't want to hold on to them; there could be 1000s of textures

bajones: I'm sure other specs have weak lists; it exists but I don't know the details exactly

bajones: can't use a weak list as it's not iterable

ada: can do with weak set

cabanier: OK, so something is possible

cabanier: is the name foveateTexture alright with everyone?

bajones: likely for WebGPU we would have to do something different, likely at the beginning of the render .. we would have to look into it; but for WebGL this seems fine

<Zakim> Brandel, you wanted to discuss other hints about VRR

Brandel: foveation is not the only reason to have variable raster rate

Brandel: is that something we'd want to address?

bajones: correct that variable rate stuff is useful outside WebXR, but 1) the WebGL working group is not interested in doing a lot of the work required for maintenance-mode WebGL

bajones: for WebGPU we do want to consider variable rate and it may overlap with eye tracking, but that's farther off

bajones: for now I like the more straightforward foveation being described here

bajones: typically variable rate has a generated texture map of shading rates and we may eventually go that direction

bajones: I assume this would take a param of foveation level?

cabanier: no, it would take the layer's foveation level setting, a float

bajones: how do we make the association between layer and texture?

cabanier: so I guess we would have to add a param

bajones: either we can pass it at creation time or find some way to associate them

cabanier: the GitHub issue proposal has the param there

bajones: is there any way to change it so we don't need a bound texture for param 0, i.e. have it unbound?

cabanier: we would have to handle binding if we didn't do that, which is ugly

cabanier: we want to avoid storing strong references to these textures
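
An illustrative sketch of the shape discussed: a method on the WebGL binding that marks a render texture as foveated using a layer's foveation level. Every name and signature here is a bikeshed placeholder per the discussion above, and the exact parameters (bound texture slot, layer association) are still open:

    // Bikeshed placeholder: "foveateTexture" and its parameters are not spec'd;
    // this only illustrates associating a render texture with a layer's foveation.
    interface XRWebGLBindingWithFoveation extends XRWebGLBinding {
      foveateTexture(target: GLenum, layer: XRProjectionLayer): void;
    }

    function setUpFoveatedRenderTarget(
      gl: WebGL2RenderingContext,
      binding: XRWebGLBindingWithFoveation,
      layer: XRProjectionLayer
    ) {
      const renderTexture = gl.createTexture();
      gl.bindTexture(gl.TEXTURE_2D, renderTexture);
      gl.texStorage2D(gl.TEXTURE_2D, 1, gl.RGBA8, 2048, 2048);
      // Mark the currently bound texture as foveated; the idea is that it inherits
      // the layer's fixedFoveation level (and, where available, eye-tracked centers).
      binding.foveateTexture(gl.TEXTURE_2D, layer);
      // Cleanup expectation from the discussion: the UA would best-effort unapply
      // foveation from such textures when the session ends.
    }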

: React Three Fiber is nice and has some ways to make UI a little easier in WebXR, but it doesn't work with all HTML etc; it's restricted and most companies don't want to have to rebuild, so we're building a platform somewhat similar to React Native that lets developers build 3D applications easily across platforms in a multi-app way

: we want to come to all platforms like Meta, Android XR, etc and want to collaborate; originally we wanted to build upon the WebXR platform but needed native features and functionality

<bajones> WICG/canvas-place-element

bajones: awesome, great to see the experimentation. The model tag is something that gets out of WebXR's fully-immersive limitation; also wanted to point out the placeElement API which will allow UI and similar things to be much easier

bajones: Also there are possibilities like DOM layers that would be helpful for UI among other things

: If we don't use HTML are we losing the spirit of the web?

: We don't want users to have to rewrite their HTML etc, we do have concepts like a volumetric view

: I don't like that UIs are still so flat

: We can do 3D stuff, we can map HTML/DOM to native elements that may do 3D; open to exploring more but still early

: think of this like Electron, which can talk to native stuff but is still web-y; think of it like that, combined with React Native

<Zakim> Brandel, you wanted to talk about the uniquely dangerous role of a browser

Brandel: one reason browsers are slower is security: content is untrusted and needs to be checked. So combining things to get web-like benefits and concepts while avoiding some of the security issues, as a trusted concept, is interesting; but eventually we would like the web itself to be able to do many of these things, and it's cool that you can experiment before that exists

Minutes manually created (not a transcript), formatted by scribe.perl version 244 (Thu Feb 27 01:23:09 2025 UTC).

Diagnostics

Succeeded: s/item: /topic: Hint for VR in AR WebXR sessions /

Succeeded: s/requestHitTestForTransientInput/requestHitTestSourceForTransientInput

Succeeded: s/case it/case is

Maybe present: ada, alcook, alex, ali, bialpio, brandon, emmerson, MikelSalazar, mikes, nazih, rik

All speakers: ada, alcook, alcooper, alex, ali, bajones, bialpio, Brandel, brandon, cabanier, Emerson, emmerson, jinch, MikelSalazar, mikes, nazih, rik, trevorPicoXR, yonet

Active on IRC: ada, alcooper, Ali, atsushi, bajones, bialpio, Brandel, cabanier, Emerson, jinch, lgombos, Lucas, m-alkalbani, MikelSalazar, mkeblx, trevorPicoXR, yonet