Immersive-web WG/CG face-to-face 2023/04 Day 2

Meeting minutes

<cabanier> new repo (real-world-meshing)

Add support for 3D geometry

cabanier: I made a spec in the new repo
… We don't support meshes like the hololens does but you can create shapes that come back as 3D geometry in your room
… with the 2D planes API there's no practical way to return them, but we want to be able to return these objects
… They are not fine meshes, they are outlines
… Should there be an attribute on the mesh itself that can do basic hit testing or object detection?
… I'm not sure if the hololens supports multiple meshes or a single mesh
… The API right now has separate meshes, a table, a chair, a room
… But I'm not sure if the hololens segments the meshes
… On magic leap it just returned the whole thing

bajones: A couple of observations looking through the spec
… First, re Hololens: what I recall is that they would have submeshes not associated with any particular object
… And then the submeshes would update over time
… So that'd be one of my first concerns, I can see on the XR Mesh you've defined that there's a last change time
… I would want something where it could go through and indicate 'these are the meshes that have changed since last time' so I'm not scanning through every mesh every time
… that would require me to keep a list of all meshes that I've seen before and compare against the list every single frame
… would be cool to refine down to an event
… I also recall that meshes on the Hololens had a unique ID to help with that process
… And the last point of feedback is that vertices are a frozen array of DOMPointReadOnly
… I would suggest that there's very little point to this, we probably want that to be a float32 array
… since I want to put them into float32 anyway, it would make javascript easier but at this point the 3D world involves pushing things into buffers so much that it wouldn't be a problem
… Since we're dealing with potentially large arrays
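
[Editor's sketch, not part of the discussion: the conversion bajones describes, copying a list of point objects into a packed Float32Array before handing it to the GPU or a physics engine. `PointLike` and `flattenVertices` are hypothetical names; `PointLike` only stands in for the x/y/z fields of a DOMPointReadOnly.]

```typescript
// Hypothetical stand-in for DOMPointReadOnly; only x, y, z are used here.
interface PointLike {
  readonly x: number;
  readonly y: number;
  readonly z: number;
}

// Copy point objects into a tightly packed [x0, y0, z0, x1, y1, z1, ...] buffer,
// the layout that vertex buffers and most physics engines expect.
function flattenVertices(points: readonly PointLike[]): Float32Array {
  const out = new Float32Array(points.length * 3);
  points.forEach((p, i) => {
    out[i * 3] = p.x;
    out[i * 3 + 1] = p.y;
    out[i * 3 + 2] = p.z;
  });
  return out;
}
```

[If the API returned a Float32Array directly, this per-vertex copy would not be needed at all.]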

cabanier: Some of your feedback also applies to planes
… For planes I could do one thing and meshes I would have to do something else

bajones: I expect there to be more meshes than planes in the environment, especially in the Hololens
… the Meta approach right now seems a bit different
… If you want it to refine further like to Magic Leap or Hololens (which seems reasonable) then I would expect a high volume of meshes and changes to be coming through
… more than planes
… But I'm not opposed to making those approaches more unified

bialpio: To respond to some of Brandon's comments, we're thinking about ID for planes as well
… but the conclusion was that since planes are live objects, we're just updating for you
… if you want to assign some kind of ID, you can
… but it's not a best practice
… but you can use Javascript equality operator to compare things
… it puts the responsibility on you to hold on to the objects from the last callback to compare against
… to evaluate meshes that are new or are removed
… They will be able to emulate the event based approach
… If we do want to have an event-based API, do we want to have it in a context where we can issue GL calls?
… if not, we should stick to the approach of letting the developers emulate the event-based approach
… The other point was for planes, we expect lower amount of data, we just give the outline of the plane
… Can we make sure this design works with whatever Hololens is doing?
… their data is likely to be wavier than a user annotating their environment
… especially with Hololens having a dynamic environment
… In other versions of the Mesh API we were concerned with that
… Just want to do the right thing here so we won't need a different API to support Hololens
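
[Editor's sketch, not part of the discussion: how an app might emulate the event-based approach by holding on to the objects from the previous frame, as bialpio describes. It assumes that the same object is returned every frame for the same mesh and that each mesh exposes a last-changed timestamp. `MeshLike`, `MeshDiff` and `MeshTracker` are hypothetical names.]

```typescript
// Hypothetical minimal shape of a detected mesh; only the timestamp is assumed.
interface MeshLike {
  readonly lastChangedTime: number;
}

interface MeshDiff<M> {
  added: M[];
  updated: M[];
  removed: M[];
}

// Remembers which mesh objects were seen before, and the change time last
// processed for each, so per-frame sets can be turned into event-like lists.
class MeshTracker<M extends MeshLike> {
  private seen = new Map<M, number>();

  update(current: ReadonlySet<M>): MeshDiff<M> {
    const diff: MeshDiff<M> = { added: [], updated: [], removed: [] };

    // New or changed meshes: compare by object identity, then by timestamp.
    current.forEach((mesh) => {
      const last = this.seen.get(mesh);
      if (last === undefined) diff.added.push(mesh);
      else if (mesh.lastChangedTime > last) diff.updated.push(mesh);
      this.seen.set(mesh, mesh.lastChangedTime);
    });

    // Stale meshes: remembered from an earlier frame but absent from this one.
    this.seen.forEach((_time, mesh) => {
      if (!current.has(mesh)) diff.removed.push(mesh);
    });
    diff.removed.forEach((mesh) => this.seen.delete(mesh));

    return diff;
  }
}
```

[Calling `update` once per frame with the frame's mesh set yields the three lists; only `added` and `updated` meshes would need their buffers uploaded again.]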

cabanier: So you think the spec should say the user agent should simplify the mesh?

bialpio: If Hololens wants to use it without having to simplify the mesh, they just won't use it
… If there is a device that wants to implement this API but also has concerns similar to Hololens, we don't want to have to scratch this API and invent a new one that would support the future

cabanier: What would that API look like?

bialpio: That's the question
… We have this design from Magic Leap I don't remember what was on it

cabanier: There was a fan mesh that was changed while walking

bialpio: I remember a lot of conversation about how to do the updates efficiently
… Do we have anyone from Microsoft in the room?

cabanier: Things would also go away and come back, which is different from now

bialpio: I'm not saying this API won't work for them, I just don't know if it will

bajones: I assume that as long as we can be assured that the process of advertising new meshes, updated, and removed meshes, can be detected and responded to,
… the actual content of this doesn't seem like it's that bad
… the thing you want to avoid is putting the users in a position where they can't tell if things have changed
… so they just upload every buffer every frame
… I do think the float32 array would be a requirement here
… the most flexible way to shove things off to the GPU
… We don't know if the user wants to send this to a physics engine or something but they take float arrays too

bialpio: The basic thing is that Hololens submeshing, they can just model it as a distinct XR mesh

bajones: and that's how the Hololens already does things

bialpio: If you don't have any concerns...

bajones: I will look through their documentation and make sure I remember correctly but yes
… I think your argument for why we don't need the same integer index is compelling
… You can do comparison or set your own as you want it
… as long as we are returning the same XR Mesh javascript wrapper every frame
… that probably takes care of itself
… removal might be the trickiest

bialpio: You will see something that is stale and not in the new XR frame
… it is bookkeeping that the app needs to do
… Maybe then one more question: Do we intend to attach a semantic label to it ever?

cabanier: yes

bialpio: that might throw a wrench into Hololens design

bajones: The only question I would have about semantic bits is do we expect any devices to reasonably support them?

cabanier: Not yet but in the future yes

bajones: On Hololens where they generate a mesh but has no semantic meaning, it just comes through as unknown empty string

<yonet> There are semantic labels for the mesh on HoloLens.

Nick-Niantic: I want to share our API for this
… (displaying screen) This is from our developer documentation
… It explains what we give today for meshes when you use our web-based systems
… basically we have meshfound events, and the meshfound event has an ID, has a transform of the mesh, and geometry index array and attributes
… the attributes have a float32 position array, float32 color array
… This is what we give people
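
[Editor's sketch, not part of the discussion: an ID-keyed registry driven by found / updated / lost events of the general kind Nick describes. Every type and field name below is illustrative and paraphrases the minutes; none is taken from Niantic's documentation.]

```typescript
// All names are illustrative only, not a vendor API.
type Vec3 = [number, number, number];
type Quat = [number, number, number, number];

interface MeshFoundEvent {
  id: string;
  position: Vec3;
  rotation: Quat;
  indices: Uint32Array; // triangle list
  positions: Float32Array; // xyz per vertex
  colors: Float32Array; // rgb per vertex
}

interface MeshUpdatedEvent {
  id: string;
  position: Vec3;
  rotation: Quat;
}

class MeshRegistry {
  private meshes = new Map<string, MeshFoundEvent>();

  // New mesh: geometry arrives once, so this is where buffers would be uploaded.
  onFound(e: MeshFoundEvent): void {
    this.meshes.set(e.id, e);
  }

  // Drift correction: only the pose changes, the geometry is left untouched.
  onUpdated(e: MeshUpdatedEvent): boolean {
    const mesh = this.meshes.get(e.id);
    if (!mesh) return false; // update for a mesh that was never announced
    mesh.position = e.position;
    mesh.rotation = e.rotation;
    return true;
  }

  // Tracking lost: forget the mesh so its buffers can be released.
  onLost(id: string): boolean {
    return this.meshes.delete(id);
  }

  get(id: string): MeshFoundEvent | undefined {
    return this.meshes.get(id);
  }
}
```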

bajones: What do you use the color for and where does it come from?

Nick-Niantic: These meshes come from previous scans either from developer or community

bajones: How high res are they?

Nick-Niantic: Whatever the resolution of the mesh is - relatively coarse
… Color is useful for visualizing the mesh in the space, it's not high fidelity
… I don't know that it's required but it is what we give today
… Our meshupdates have anchor, position, rotation to account for drift
… you can update the position of a mesh
… I'm not sure how Oculus handle drift within a session but it might be useful to tweak location
… Event for no longer tracking a mesh as well
… typically we don't have semantic labels, but we can give per-vertex semantic labels within a mesh
… our preferred method of semantic labels is through a dense mask of the image
… since you can test any portion of the image
… I'm not sure where I stand for usefulness of labeling vertices with semantic info, but if we wanted to, having granularity on the submesh level would be valuable for us to remain compatible with WebXR APIs
… It's better to have callbacks for these things than provide every frame

yonet: There's more than one way to mesh on Hololens, and semantic labels are part of it like for windows or ceilings
… Portions of a mesh are labeled, it has to be a closed mesh

Nick-Niantic: One way to do semantic labels is a map from vertices to semantic label

cabanier: So every mesh would have an array of submeshes

Nick-Niantic: That just came to mind just now
… Not sure how it would work with multiple disconnected meshes
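
[Editor's sketch, not part of the discussion: one way a "map from vertices to semantic label" could be turned into per-label submeshes. The rule that a triangle belongs to a label only when all three of its vertices share it, and the "unknown" fallback, are assumptions made for this example. `groupTrianglesByLabel` is a hypothetical name.]

```typescript
// Group a triangle list into per-label index lists, given one label per vertex.
// Triangles whose vertices disagree (or have no label) go under "unknown".
function groupTrianglesByLabel(
  indices: Uint32Array,
  vertexLabels: readonly string[]
): Map<string, number[]> {
  const groups = new Map<string, number[]>();
  for (let i = 0; i + 2 < indices.length; i += 3) {
    const tri = [indices[i], indices[i + 1], indices[i + 2]];
    const labels = tri.map((v) => vertexLabels[v] ?? "unknown");
    const label =
      labels[0] === labels[1] && labels[1] === labels[2] ? labels[0] : "unknown";
    const bucket = groups.get(label) ?? [];
    bucket.push(...tri);
    groups.set(label, bucket);
  }
  return groups;
}
```

[Each resulting index list can share the original vertex buffer, so the split does not duplicate vertex data.]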

cabanier: Accounting for drift, a mesh has an anchor associated with it, so it won't drift

Nick-Niantic: Okay so it's being kept consistent internally

cabanier: I asked if it should return a fine mesh or not?

adarose: It seems like a fine idea, but it would have to change over time as it gets updated

cabanier: Hololens could use it for occlusion

Nick-Niantic: Maybe could be an optional feature
… Could use meshes for things like physics
… even occlusion may be improved with a coarse mesh even though fine is better for that
… if the user requests a mesh quality level, then you don't have to give it to them

cabanier: We don't have to stop them but could inform them

bajones: Nick, in your system you have an update event for the anchor moving around. How do you handle actual updates to the mesh itself?

Nick-Niantic: Currently our meshes are based on pregenerated user scans so they don't update
… I pushed the team to allow update API but they didn't add it

bajones: I do wonder, looking through Microsoft mapping, they have two systems now for doing this,
… They have a breakdown between spatial mapping and scene understanding APIs
… the scene understanding SDK is that it gives back static versions of spatial mapping data
… Is there value to call out whether a particular mesh will be static or not
… There is some certainty about whether it would be static or dynamic
… Metadata would be considered mostly static

cabanier: People can run room setup while in XR session so mesh could disappear

bajones: You could say meshes have been removed and put in, while still static

Brandel_: That makes sense, especially as different purposes and functions for meshes arise
… I don't love the terms fine and coarse because over time, 640x480 was once high-res
… Would suggest more explicit terms for the function of the data
… Otherwise we have to get into doublefine and ultrafine, which are fun but not good names

bajones: It's hard to prescribe a particular use for a mesh
… For example it might be useful for physics, but it might just be a box
… Tabletop and refrigerator would be good for occlusion but not a statue
… Fine and coarse doesn't seem like the right distinction
… marking as something for occlusion doesn't seem right either

Nick-Niantic: The thing that makes sense to me is that you are trying to give an indication of 'don't spend too much work to get this data to me' vs 'spend extra work to make this data better'
… You're looking at a performance tradeoff
… they don't care about how fine is too fine, vs spending work on the data
… I don't know if that applies to Oculus since users are defining the meshes
… someone could draw their mesh very carefully for a fine mesh

cabanier: I could leave the spec as is
… if it becomes an issue in the future we could modify it

adarose: Maybe a "could be better" designation

bajones: Are there going to be systems where you can meaningfully turn that dial?
… where the user can be as detailed as they want
… if someone is scanning, they can specify the resolution?

Nick-Niantic: There are different resolutions

bialpio: Does it matter from the perspective of the app?
… if all the meshes are static, I could use a different version of the code, but if things are going to change, it doesn't matter whether one thing is changing or all of them
… Does that help us with design here?
… It sounds like most of our current devices for this API would be handing out static data
… don't know about Microsoft looking into this implementation, maybe it doesn't matter

cabanier: With the example of planes, static on device, it would break on Android

bialpio: but I can't assume that this data won't change, I have to handle it in my code
… The assumption that things won't change doesn't come from us, it comes from experience

cabanier: Sounds like we should not have a static

bajones: I can't think of a specific thing that would drastically change
… You can tell a GPU whether it's going to be a static or dynamic mesh
… but it's not going to matter if you're not updating the same buffer
… unless you can assume that indices are never going to change

bialpio: for Planes we do guarantee that positions can change without touching vertices
… we say that change of position will not affect last-update time since pose is a property of a pair of things and not the thing

bajones: I would expect the same to be here
… for data that you're shoving off to the GPU, if there were a dynamic mesh, you couldn't guarantee the same number of vertices every time
… so you're going through the same process as a brand new mesh
… same for the physics systems
… In that case, it's a good observation that there's not too much difference in how the user will handle that.

bajones: Dom Float array, you want to make sure we indicate the meshes are going to be the same object frame to frame if they represent the same data

cabanier: that's already in there

bialpio: We don't have a way to annotate it in IDL since we have sets, we cannot say if you have the same object

Proposals - Some sort of local shared space

<yonet> issue

adarose: We talked about shared anchors for a long time, but persistent and shared anchors were a future thing
… It would be good to do some work to do a shared space, a lot of people would like it
… How would we do this? I think there's different options
… We could build an example where you print out a piece of paper with three numbers, use that to generate an offset reference space that both people agree on, and just have that as an example that people can do
… That would be a way to get people started with doing this kind of thing
… but it might be nice to standardize something for creating a shared reference space
… Could have it so that the same space could be maintained for future sessions
… like going from one domain to another without having to reset the space
… People would like it and it's not necessarily worth waiting for shared anchors

bialpio: If we are okay with requiring that the users are coordinating, hit testing on the same thing would be the easiest way
… would have to maintain the same orientation..

adarose: Yes would align on three points

bialpio: Image tracking is something we don't have an API for but it becomes easier if it only needs to work for a few frames initially
… based on camera access experiment
… might not be performant enough to do the entire session
… could establish space, create an anchor, then assume that users have the anchor in the correct space
… is there any other API that we could leverage? Planes on ARCore wouldn't work since they're not static
… Depth sensing might give you some information you could correlate but that'd be difficult

bajones: Agree with Piotr,
… if you want to approach this with guidelines of how to set this up with existing mechanisms,
… we already have examples like "observer mode"
… you could do the same thing here. This would make a spectacular demo to have people standing around with their Android phones, and people with a Quest Pro, and they're all seeing the same thing in the same space
… This would be the coolest way to show off why the web matters in this space
… Would be worthwhile, I want to see this on Twitter
… My biggest worry is that it'd be a bit flaky
… In terms of what you say about going between sites, at last TPAC we talked about per-device caching or saving anchors. Is that something you've implemented?

cabanier: Yes it's shipped, but it only works per origin

adarose: It's something we'd have to do as a user agent
… it'd be a user agent guide or wizard which would then give you the session

bajones: That would be the mechanism, if each device can retain its sense of where the shared marker is
… It would take some coordination between the sites
… This would be one of the most compelling demos I can think of for XR on the web
… especially between mobiles and headsets
… I want to help organize and coordinate that if it could realistically happen

adarose: Would be good for "help wanted" post

cabanier: From our perspective there are a lot of people who have multiple devices at home, we are focusing more on mixed reality and social mixed reality
… so we are interested in exploring this area
… if we can do it with a polyfill that'd be awesome
… our systems do support shared anchors but they are a pain to set up

bajones: We're not even talking about shared anchors at this point

cabanier: more like a shared coordinate space

bajones: If I print out a big A B and C in the corners, and you have an app where you point at A, B, and C, and generates three anchors, triangulates them
… If I can do that on Quest Pro, then on ARCore, they can presumably both start sharing their approximation of the same space
… but it wouldn't require anything new
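
[Editor's sketch, not part of the discussion: the geometry behind the A/B/C idea. Each device measures the three markers in its own local coordinates (for example via hit tests), builds an orthonormal frame from them, and expresses positions relative to that frame. All names are hypothetical, and the sketch assumes a right-handed coordinate system and markers that are not collinear. It ignores measurement noise and anchor updates.]

```typescript
type Vec3 = [number, number, number];

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (a: Vec3): Vec3 => {
  const len = Math.sqrt(dot(a, a));
  if (len === 0) throw new Error("degenerate marker layout");
  return [a[0] / len, a[1] / len, a[2] / len];
};

interface SharedFrame {
  origin: Vec3;
  x: Vec3;
  y: Vec3;
  z: Vec3;
}

// Origin at A, x axis towards B, y axis along the normal of the ABC plane,
// z axis completing a right-handed basis.
function frameFromMarkers(a: Vec3, b: Vec3, c: Vec3): SharedFrame {
  const x = normalize(sub(b, a));
  const y = normalize(cross(sub(b, a), sub(c, a)));
  const z = cross(x, y);
  return { origin: a, x, y, z };
}

// Express a point given in device-local coordinates in the shared frame.
function toShared(frame: SharedFrame, p: Vec3): Vec3 {
  const d = sub(p, frame.origin);
  return [dot(d, frame.x), dot(d, frame.y), dot(d, frame.z)];
}
```

[Two devices that each call `frameFromMarkers` with their own measurements of the same markers should get matching `toShared` coordinates for the same physical point, up to tracking error.]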

cabanier: If you want to skip even that step...

bajones: That would be awesome but is a much harder technical problem

adarose: even some sort of shared function?

bajones: might not be consistent enough

cabanier: The thing could take three hit tests

adarose: If there are two different sites but they have a flipped orientation... would be good to have a function on the session that generates space from three points
… it's not trivial
… doing it with anchors and it is updating, this space will probably change over time
… If it's something that takes care of tracking those anchors and making the space for you
… but we probably need to show the value of it

bialpio: I want to also mention cross-origin persistent anchors... they will have to be browser-coordinated, user would have to agree to it,
… one idea of how we do this is to use the origin of a local space, localize it in a consistent manner when the user agrees to use that information
… might be the simplest thing to do to maintain that space across multiple sessions
… Not sure if that would work for ARCore
… Might need to have some manner of conveying the information to the experience that the space is stable
… with the example of some games, the site can assume the space is consistent for a user, but the challenge is how do they coordinate with each other?

adarose: Maybe some kind of menu option to "resync space"
… if you sit down with another person, could both resync space

Brandel: Presumably all the devices we're talking about have magnetometers, compasses and IMUs that can detect gravity..

cabanier: I don't think we do

Brandel: Everyone agrees on gravity, it's pretty stable
… if magnetometers were more reliable across devices, you could use that for helping anchor spaces

cabanier: If you know the gravity, how does that help?

Brandel: You know which way is up, and then you'd know which direction people were in

cabanier: For refinement, I see

Brandel: If you have three points, someone could be looking from underneath. But if you have gravity and people are looking down, that's unambiguous

adarose: ABC helps too with the right-hand rule
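
[Editor's sketch, not part of the discussion: how a known "up" direction removes the from-above / from-underneath ambiguity of a plane through three points. `upFacingNormal` is a hypothetical name; `up` is assumed to be the direction opposite to gravity in the device's local coordinates. The returned normal is not normalized.]

```typescript
type V3 = [number, number, number];

// Normal of the plane through a, b, c, flipped if needed so that it points
// into the same half-space as `up`.
function upFacingNormal(a: V3, b: V3, c: V3, up: V3): V3 {
  const ab: V3 = [b[0] - a[0], b[1] - a[1], b[2] - a[2]];
  const ac: V3 = [c[0] - a[0], c[1] - a[1], c[2] - a[2]];
  const n: V3 = [
    ab[1] * ac[2] - ab[2] * ac[1],
    ab[2] * ac[0] - ab[0] * ac[2],
    ab[0] * ac[1] - ab[1] * ac[0],
  ];
  const facing = n[0] * up[0] + n[1] * up[1] + n[2] * up[2];
  return facing >= 0 ? n : [-n[0], -n[1], -n[2]];
}
```

[With labelled markers the point order already fixes the normal by the right-hand rule, as adarose notes; gravity is an independent check. The check is weak when the markers are on a near-vertical surface, where the dot product is close to zero.]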

cabanier: I'm not sure if you can use the local space and just transfer the local space
… We could have a different space, something you can request, and if it's not there it is the local space

bajones: You're allowed to not have local space be exactly where you start

bialpio: I think experiences might assume you start at 0

bajones: I remember stressing about this text when we wrote it because we didn't want local space to shift between pages or between refresh of the page
… "A local space represents tracking space with the origin near the viewer at the time of creation"

bialpio: Something the platforms can tweak but is close enough to you...

bajones: If you're doing it with Cardboard or devices where that's the mode you're working in, it probably just uses 0, 0, 0 every time
… but for the Quest it doesn't matter whether the origin is here or slightly off here (gesturing)

mkeblx: The three points on a plane, piece of paper, another idea is to start in the same orientation, start in the same chair

bajones: I thought about the same thing, but the most likely use you're going to get for that, you're going to put a paper on a flat surface, it's hard to guarantee everyone is standing at the same space
… They could all configure themselves at roughly the same time

bialpio: At least for ARCore, you probably need to ensure we are already in reliable tracking state
… At session start there might be a phone wave thing where you want the user to go through that first before they start walking around
… I don't know how much relying on the same starting point would work there
… but if we are in a reliable tracking state and the user is going to be in the same place, maybe could work with that

adarose: five minute break!

adarose: Wait do people think we should make an API for this?

bajones: Might be better first as a library
… Do we have image tracking enabled on Android?

bialpio: Behind a flag or raw camera access in a way I cannot comment on without knowledge

bajones: That could just speed it up, I don't believe the Meta Quest Pro has any sort of image...
… you could still do image recognition without saying the image you've registered has an anchor, without giving away the camera data
… Does Meta have image recognition?

Marisha: I don't think so, at least not released

proposals#84 Immersive Capable/WebXR meta tag

ada: One from me, but a continuation from Apr '22
… It would be handy if there was a way to identify that this page is WebXR-enabled
… indicating the features you want to support, conferring potential benefits like SEO for XR-prioritized search
… and used for ambient badging in search results, or on a URL bar

<bkardell_> is this on github?

ada: so the UA could also invoke the XR session rather than having the user hunt around for the button to invoke

<bkardell_> ah https://github.com/immersive-web/proposals/issues/84

ada: and for archives and researchers, it would be possible to more easily identify pages that happen to have been XR in some kind of future past
… so this is a general query about the value and function of this - is it useful, where would you use it, where would it go? In an HTTP header, a web manifest, where?

Nick-Niantic: I got caught on one of the first things you said - declaring ahead of time the required permissions for a page
… The inability to 'pre-flight' permissions causes us grief
… we want to build an HTML lobby that gets everyone ready re:permissions so that players can launch into experiences with all the prerequisite permissions
… it's a shift from the initial remit of the discussion but important to us and related

Nick-Niantic: what is meant by 'ambient badging'?

Ada: Ambient badging is a property on things like PWAs, indicating the "appness" of a page, allowing UAs to invoke actions related to that
… so for an approach like that, you could have the browser take on that responsibility, but we have a separate session for that

bkardell_: I am on the call! It is hard to hear

ada: It was suggested that wolvic has ambient badging of XR capability that could allow these things to launch

bkardell_: ..maybe?

cabanier: I think the badging just guesses that an experience is XR-capable

bajones: I would imagine that most pages are searching for isSessionSupported, which is a good guess that the page intends to use it
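
[Editor's sketch, not part of the discussion: the kind of `isSessionSupported` probing bajones refers to. `XRSystemLike` is a hypothetical structural subset of the WebXR `XRSystem` interface so that the function can be exercised without a browser; in a page the argument would be `navigator.xr`. `supportedModes` is a hypothetical helper name.]

```typescript
// Structural subset of XRSystem used by this sketch.
interface XRSystemLike {
  isSessionSupported(mode: string): Promise<boolean>;
}

// Return the subset of `modes` the user agent reports as supported.
// A missing XR system or a rejected query is treated as "not supported".
async function supportedModes(
  xr: XRSystemLike | undefined,
  modes: readonly string[]
): Promise<string[]> {
  if (!xr) return [];
  const results = await Promise.all(
    modes.map((m) => xr.isSessionSupported(m).catch(() => false))
  );
  return modes.filter((_mode, i) => results[i]);
}
```

[As marcosc points out later in the discussion, the presence of such calls is a weak signal of intent, since the same probing is used for fingerprinting.]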

cabanier: showing the button has the drawback that it may not account for the loading of prerequisite assets - often pages need a lot of things to go into XR.

cabanier: the search engines would need to be updated to leverage this
… and people could abuse this, so that could be a problem

bajones: We can't have nice things - anything that allows people to game SEO things, becomes effectively meaningless once adequately abused
… and if you're using this to put a badge in the URL bar etc, without a hard obligation to invoke the session, that can be weird
… it seems like the page should need to do a little more work in order to guarantee that this hinting is accurate
… PWAs are a little different, because of the difference in context and function - the manifest is a better guarantee of expected functionality
… I'm not sure of the importance and needs of archiving
… if you're not using JS to scan through web pages, as an archivist today, you're probably not doing it properly
… that said, I don't _object_ to it - I just question whether it will do the thing it's intended for

CharlesL: Discoverability via a meta tag would help, and for ePubs in the future
… in which case, schema.org would probably want to be involved - it looks like "virtual location" is the only similarly-defined attribute today

bialpio: we already have a schema proposal, whose status isn't known, that may include this information
… model-viewer has some representation, but I don't know whether it has what we want - but it's likely a step in the right direction

<bialpio> https://schema.org/3DModel

marcosc: As someone close to the web manifest, it's useful to talk about the difference between "app" and "page", since an app can span pages
… And question if / how this is a new display mode. Within Apple we often use OG as a representation as well

bkardell_: I am finding it hard to hear, so I'm not sure if marcosc and I are saying the same thing
… I was going to ask - there is a radical range of things that can be done with XR, so is there a single attribute that is relevant to identify XR pages / apps with?

marcosc: checking API calls isn't a great proxy for actual, legitimate use because of the prevalence of fingerprinting
… in the past, we've had things like that - simple boolean checks typically haven't changed the UI of systems, so we shouldn't do too much

adarose: So in summary, it would be useful but shouldn't change things in a way to alter the presentation

marcosc: to reiterate bajones' point, the page ought to have to do work in order to make it a fair guarantee that this is legitimately a function of the page

adarose: Last issue before lunch, and then lunch!

[technical setup to share]

<adarose> https://lists.w3.org/Archives/Public/public-immersive-web/2023Apr/0006.html

[anticipation builds]

webxr#1264 Proposal: lower friction to enter WebXR sessions

cabanier: This is about the 'offer session', similar to the 'request session'
… it's like request session, in that it allows for the inclusion of optional extras
… the button appears in the URL bar. That's the whole demo!

bajones: This was the best demo we've seen all day

cabanier: I haven't added to the spec via PR, but did have some questions. I have written them down.
… what happens when you call 'offerSession' multiple times? do we reject the promises of the earlier sessions?
… similarly, what do we do about iFrames who pass in and out of existence?

<bkardell_> is it necessary to support it in an iframe?

cabanier: I think that there's only one offerSession in action at a time, and can't be revoked by the offerer
… this might require user-education, since they could be tripped up without it

bajones: multiple session calls should probably just override prior ones. Libraries might misbehave, but *shrugs*
… we should have a sense that the `offerSession` establishes some pending promise on the page, and it should be able to go away with the page offering
… I'm not sure that it's critical to be able to make a call to cancel the 'offer'
… it seems like users might want to cancel the offer and dismiss the chip, but that should be non-normative and up to the UA
… we wouldn't be the first, so that can be followed the same way

bajones: 'the chip' refers to the button and surrounding information
… a promise, rather than a callback or an event, seems like a better match with our 'requestSession' approach we take today
… promises feel like the right primitive here - it does mean that sessions would need to be 're-offered' if it's accepted and then dropped out

bajones: sessions would need to be re-offered after a session has ended
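
[Editor's sketch, not part of the discussion: what "re-offering" could look like if `offerSession` is promise-based like `requestSession`. The `offerSession` signature here is an assumption based only on the description in these minutes, since the proposal has no spec text yet. `SessionLike`, `OfferingXRLike` and `keepOffering` are hypothetical names.]

```typescript
// Minimal shapes assumed for the sketch; not spec text.
interface SessionLike {
  addEventListener(type: "end", listener: () => void): void;
}

interface OfferingXRLike {
  offerSession(
    mode: string,
    init?: { optionalFeatures?: string[] }
  ): Promise<SessionLike>;
}

// Keep one offer outstanding: when the user accepts and the session later
// ends, offer again so the entry point becomes available once more.
function keepOffering(
  xr: OfferingXRLike,
  mode: string,
  onSession: (session: SessionLike) => void
): void {
  xr.offerSession(mode).then(
    (session) => {
      onSession(session);
      session.addEventListener("end", () => keepOffering(xr, mode, onSession));
    },
    () => {
      // Offer rejected or replaced by a later one: do not retry automatically.
    }
  );
}
```

[Whether a rejected offer should ever be retried is left open here, since the behaviour of repeated offers was still under discussion.]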

bialpio: unless the UA has the 2D _and_ the XR session present

marcosc: can we see the code again?

<Zakim> adarose, you wanted to ask about abort signals

bajones: It's the same as requestSession, but doing so in a deferred manner

adarose: There would be ways to hide this signal, like offering a null session
… a hack would be to invoke an iFrame that constructs an overriding offerSession, and then terminates it in order to clear it
… so it would be good for us to give developers the right way to deal with this
… a use case includes a time-sensitive offer based on the availability of another participant in an XR session, which could be terminated by someone getting bored

bajones: `fetch` has a way of being terminable, via some kind of signal
… we could support the same thing here, we'd just need to explicitly opt into our use of it

bajones: it would require attending to an 'abort' signal, which we might want to incorporate into the 'requestSession' syntax as well

marcosc: We need to have a good reason for taking the agency away from the user on this

adarose: It's not trivial and not impossible to make codes to wipe out a session, but it will be messy

marcosc: there is a similar problem in payments, where a payment sheet has to be cancelled by destroying the establishing context

adarose: for this API, I would suggest using the (standard) abort controller

bajones: the fetch API covers this approach
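
[Editor's sketch, not part of the discussion: the fetch-style abort pattern applied to a pending offer, seen from the page's side. `AbortController` and `AbortSignal` are the standard DOM interfaces; `withAbort` is a hypothetical helper. It only makes the page stop waiting on the promise. Actually removing the offer from the browser UI would need user agent support, for example a `signal` option on the call, which is not specified.]

```typescript
// Wrap a pending promise so the caller can stop waiting via an AbortSignal.
function withAbort<T>(pending: Promise<T>, signal: AbortSignal): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("offer withdrawn"));
      return;
    }
    signal.addEventListener("abort", () => reject(new Error("offer withdrawn")), {
      once: true,
    });
    pending.then(resolve, reject);
  });
}

// Usage: a time-sensitive offer that is withdrawn when the other participant leaves.
const controller = new AbortController();
const pendingOffer = new Promise<string>(() => {}); // stands in for an unanswered offer
withAbort(pendingOffer, controller.signal).catch((e: Error) => console.log(e.message));
controller.abort(); // e.g. the other participant got bored and left
```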

adarose: Quest devices might be able to offer both immersive AR and immersive VR - giving users the ability to choose

bajones: you _could_ make the chip a drop-down - but many people's "AR" sessions are basically just VR

Nick-Niantic: I am a little confused about what this is in service of

<bajones> Abort Controller docs

Nick-Niantic: generally when a UA manages things, developers have to put big CSS arrows to point to the window chrome in order to draw user attention to it
… relying on devices to self-report capabilities is fraught, because many phones mis-report their capabilities, or do so in a way we are not familiar with the capabilities of
… the case where this seems to make the most sense to me is in a windows, connected headset and you are _sending_ it to your HMD
… in an oculus[sic], you probably want to get into that immersion ASAP, e.g. in a PWA
… but I don't see why this is a meaningful addition

cabanier: Last year we looked into all the problems with the button presentation for these things - often very small, or non-presenting at all
… including a failure for a 2D page to account for a resized window at all

<bkardell_> Maybe it's like controls in media element/full screen?

Nick-Niantic: I have _less_ reservation based on the presence of requestSession, but still don't see the criticality of this as a solution

cabanier: we have seen many people struggle with this, so we feel we are solving a problem people have

bialpio: I'm concerned by the 'last-writer' resolution for this - is this safe?

cabanier: you still need to explicitly grant xr permission to iFrames, so that's under some degree of control

bialpio: It still seems like the impact on debugging could get difficult, to track down who is the ultimate offerer. Maybe we could broker that permission explicitly

bajones: we have the "XR Spatial Tracking" permission policy, we need to continue to respect that
… MatterPort / sketchFab / etc do a lot of their work through iFrames, so we need to keep letting them do that without random ads being able to jump in arbitrarily

bajones: we would apply the conventions that come from abort controllers and the requestSession syntax
… I'm still not convinced that we need this to be abortable, but could be convinced
… maybe on a gallery page where some content is XR capable and some are not - but it sounds complicated

alcooper: 2 things: Initially I thought this sounds great, but some of nick's comments made me wonder:
… why isn't developer education our first priority to solve this problem?
… (bigger buttons etc)
… and second, what is the strategy for landing this, where does it go?
… it seems like requiring UAs to add things to their OS chrome is a big ask

<Zakim> adarose, you wanted to ask about offering unsupported sessions

adarose: when you click on the button in the URL bar, do you bypass the permission prompt? a: no
… if it didn't entail additional permissions, would it be possible to bypass permissions?

<Zakim> bajones, you wanted to discuss issues with developers ONLY calling offerSession

cabanier: maybe, but anything additional like hands or room would necessarily require a prompt

bajones: In the past I have worried that if this is supported unevenly, we end up with the opposite problem
… if Meta devices support it well, but then Android devices are still only relying on requestSession, then the developer-selected user signals become fragmented

cabanier: This is similar to the problem of having uneven support for modules like layers

bajones: Yes, but it's slightly different in the default expectations of if/how things can fail
… If the method simply doesn't exist, developers should notice that it can't be used - I will have to think more about the consequences
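A minimal sketch of the feature-detection concern raised here, assuming `offerSession` is exposed as a method next to `requestSession` when a UA supports it. The `XRLike` type, `EntryPlan`, and `planEntryPoints` are hypothetical names used only for illustration; they are not part of any spec.

```typescript
// Minimal structural type so the sketch runs without browser typings.
// `offerSession` is optional because some user agents may not expose it.
interface XRLike {
  requestSession: (mode: string) => unknown;
  offerSession?: (mode: string) => unknown;
}

interface EntryPlan {
  inPageButton: boolean; // always true: requestSession stays the baseline path
  browserOffer: boolean; // true only when the UA exposes offerSession
}

// Hypothetical helper: decide which entry points a page should wire up.
function planEntryPoints(xr: XRLike): EntryPlan {
  return {
    inPageButton: true,
    browserOffer: typeof xr.offerSession === "function",
  };
}
```

Keeping the in-page button unconditionally means a page does not depend on the browser-provided affordance being present.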

cabanier: It wouldn't be my expectation that everyone would use this API

bajones: There's a difference between using the API and not exposing the end-point though

<Zakim> adarose, you wanted to ask about offering unsupported sessions

bajones: we have allowed people to sign on to things we don't intend to implement user-facing end-points for the sake of developer convenience

adarose: in your example you offer a session without checking that it's supported - what would happen if it wasn't supported?

cabanier: it would be rejected immediately
… unsupported sessions wouldn't overwrite supported sessions and kick them out
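The behaviour described in this exchange (last writer wins, but an unsupported offer is rejected immediately and leaves the existing offer alone) could be modelled roughly as below. This is only an illustrative model of the discussion, not spec text or any browser's implementation; `OfferSlot`, `Offer`, and the `owner` label are hypothetical.

```typescript
interface Offer {
  owner: string; // hypothetical label for who made the offer (e.g. a frame)
  mode: string;
}

class OfferSlot {
  private current: Offer | null = null;
  private supportedModes: string[];

  constructor(supportedModes: string[]) {
    this.supportedModes = supportedModes;
  }

  // Returns false when the offer is rejected; the existing offer is untouched.
  offer(owner: string, mode: string): boolean {
    if (this.supportedModes.indexOf(mode) === -1) {
      return false; // unsupported: rejected immediately
    }
    this.current = { owner: owner, mode: mode }; // last writer wins
    return true;
  }

  active(): Offer | null {
    return this.current;
  }
}
```

With only "immersive-vr" supported, an "immersive-ar" offer from an iframe returns false and the main page's earlier offer stays active; a later supported offer from the iframe replaces it.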

bialpio: I want to refer back to the browser UI real-estate as being scarce - we would want the spec to say that this isn't always supported
… I would want the site to know that this failed, so it's not relying on it
… this is now the biggest API that allows you to do the same thing in two different ways, which is an entire alternate entry point
… which can encourage developers to have divergent ways of getting in, with potentially different permissions and code paths
… in practice, there are a lot of things we do only when resources are explicitly requested
… our "SessionSupported" is only looking at Android, rather than e.g. ARCore presence
… I would need to look more deeply in the 'offer' timeframe to make appropriate determinations about which sessions can overwrite which ones

cabanier: The request sessions are lightweight on Quest

bialpio: so are ours, but the spec allows us to be optimistic about the scope of capability
… but we need to determine satisfiability of a potentially overriding offerSession action, so that an unsupported session bumps a supported one

cabanier: I think we can just bump it

bajones: there might be some benefits that come from showing things inside trusted / scarce UI, but it is limited
… you don't get to choose from an unbounded range of options there - you should probably use requestSession for that
… we don't want sites to rely exclusively on this affordance. it helps accessibility etc - but this should probably just be for the minimum, most basic action that can be taken

bialpio: But does that mean that we are incentivizing developers to build experiences that target the least set of capabilities possible?

bajones: those options can be negotiated, but the UA has the ability to make those decisions and it doesn't break page logic to have those change
… e.g. the UA could simply mute that request, if it's being too loud
… if a user has repeatedly rejected the request

adarose: Let's put lots of pins in this to get back to in the un-conference time

Yih: I have been trying to gauge the scope of where these things are decided, on UA vs. page

webxr#1317 Some WebXR Implementations pause the 2D browser page in XR, make this optional?

<Jared> https://github.com/immersive-web/webxr/issues/1317

adarose: Been working on attaching DOM things to WebGL things via CSS, to animate properties of an object per frame. This works well until entering WebXR, specifically in Meta Quest Browser: the page no longer updates CSS queries each frame, and custom properties that are being animated stop.

adarose: I would like an option to not background the page, to not be asleep

<bajones> https://immersive-web.github.io/webxr/#xr-animation-frame

bajones: Mostly surfacing previous discussion on this topic (referencing documentation link), headset and window may not coincide with each other

<bajones> https://github.com/immersive-web/webxr/issues/225

bajones: previous conversation in May 2017,
… it was discussed quite extensively in the past (referencing above two links)

Nick-Niantic: We also want the page to update in a timely fashion, even in background, generally support this notion.

mkeblx: (shows demo of animation illustrating point)

Marisha: What is the current status of DOM layers and is there overlap here?

cabanier: Main page would still be frozen, but DOM layer would be animating

Brandel: You can use the DOM to process things, CSS animations may be what is missing

cabanier: Switching to another tab will also not work, CSS animations will be frozen on backgrounded tab if switching to another tab

cabanier: Framerate may also not be smooth

adarose: The animations do look smooth, not an issue
… not using requestAnimationFrame
… using headset to query values

cabanier: If requestAnimationFrame is running, and framerate changes, wouldn't be a problem

adarose: Most web browsers can run at 120fps

cabanier: Would be nice if only CSS animations are being run, and not redrawing the entire page
… Not sure what group to ask

adarose: If we're in immersive web, the page shouldn't count as being backgrounded

bajones: In general, browser should recognize that DOM is being displayed, similar to DOM layers
… DOM is being read and reproduced, not recognized by the system. Possibly the solution is ubiquitous DOM layers. If I'm querying CSS, it should return an animation value.
… this is possibly outside the purview of this group, worth finding the group to implement the fix to what sounds like a broken situation

adarose: Valid situation for CSS animations to continue playing, such as a backgrounded tab

bajones: Question I have: Is this actual intentional behavior?

adarose: Could be specs or compatibility issue

bialpio: Does it matter that the XR rAF is using predicted display time, whereas CSS might be using a different timeline? FPS might not matter, but it'll be a slightly different timeline

adarose: If I had to do something time sensitive, I wouldn't use CSS animations — timeline isn't as important

bialpio: Might be the case that this was accidentally omitted

adarose: There is a significant performance impact when attempting to do this with large number of elements, big part of frame budget is checking for updates

mkeblx: As a developer, one takes on the performance impact as an accepted tradeoff

adarose: I try to cover most cases and account for contingencies. Despite performance overhead, using page logic to control WebGL is an ideal approach
… goal is to not put page to sleep, and keep it alive
… for highest browser compat, would like a simple way of getting this solved

cabanier: Something when you request a session sounds like a reasonable push. Don't know if we need to talk to different groups about this, sounds like something we can solve in isolation.

adarose: It could be something like turning on a property by default.

<bkardell_> I play a 1s audio loop on my android device, otherwise my music player keeps dropping the actual sound (any player)

bajones: CSS animations across browsers do not seem to pause timing when I switch tabs, seems to be consistent, not retaining state. CSS animation seems to be jumping forward to meet timeline, doesn't seem to be a concern.

mkeblx: If by default, we change this behavior, what is the expected result?
… is there a downside to doing this?

adarose: A developer could be doing something improper, perhaps we can follow Three.js lead here

adarose: Could a background page that's still running impact WebXR?

cabanier: Yes, it could

adarose: I'll try putting something together and put it forward to get input.

Nick-Niantic: Are we going to make time to discuss DOM overlay later?

navigation

adarose: This is a great unconference topic.

ada: At the last face-to-face we discussed having a means to go from one page to another while staying in WebXR. There have been some ideas, like waiting for an event to go to a new page; Brandon had proposed a more complex solution. I think it would be good to talk about it again.

bajones: At TPAC and the last face-to-face we had feedback. The proposal that I made was my "pie in the sky" idea: let's solve navigation and a11y at the same time. I like the core idea, but maybe not solving all the things at the same time.

bajones: In the proposal that was made last time, the capture session should be the means. Rick's proposal had a session granted.

bajones: The fact that this is ambiguous is problematic.

cabanier: Session granted is a trigger. In the session granted event handler you request a session and continue on.

bajones: I thought session granted provided you with a session.

cabanier: No.

bajones: Do you have many people that are trying to use that?

cabanier: I don't have data on that

ada: I encounter this a lot when I am debugging A-Frame pages. A-Frame supports it and it is turned on by default. Not that I know many multi-page A-Frame pages.

cabanier: That is the first time I have seen it used.

Ada: I am tempted to add it to a project. Rather than having to refresh a lot.

bajones: I was just reviewing the readme on navigation so far. We need to update it. It indicates you get a session from the session granted event. I am going to assume that when you fire off the session granted event and you don't respond..

cabanier: You have an amount of time to respond.

bajones: I have a request for that. The session granted readme is out of date. If you could list out the mechanics to have it reflect what the Meta experiment is doing, it would be helpful. Given that, I am going to leave the catching side alone then if it has been working well for you. And some of my assumptions were wrong. We should go back to having you initiate the session.

cabanier: You can navigate however you want to. If you don't resume the session within a number of seconds you get kicked out.

bajones: that is currently limited to same session?

cabanier: same origin
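The mechanics clarified above (the event is only a trigger; the destination page must call requestSession itself) could look roughly like this on the page side. This is a sketch under assumptions: the event name `sessiongranted` is taken from the discussion, `XRSystemLike` is a hypothetical stand-in for `navigator.xr` so the code runs without browser typings, and the grace period before the user is kicked out is whatever the UA chooses.

```typescript
// Hypothetical stand-in for the parts of navigator.xr used here.
interface XRSystemLike {
  addEventListener(type: "sessiongranted", listener: () => void): void;
  requestSession(mode: string): PromiseLike<unknown>;
}

// Registers a handler that resumes immersion after a same-origin navigation.
function resumeOnSessionGranted(
  xr: XRSystemLike,
  mode: string,
  onSession: (session: unknown) => void
): void {
  xr.addEventListener("sessiongranted", () => {
    // The event carries no session; the page has to request one itself,
    // and should do so promptly, before the UA's grace period runs out.
    xr.requestSession(mode).then(onSession);
  });
}
```

In a browser this would be called early in page load, for example `resumeOnSessionGranted(navigator.xr, "immersive-vr", startRendering)`, where `startRendering` is the page's own (hypothetical) setup function.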

bajones: None of this is new from the slide decks previously, the thing this kind of misses out on that we would want in a complete nav solution, it doesn't provide a way to provide context to the user where they are going when they navigate and it doesn't provide the user a way to have them know an action is a navigating action. So I think that, I don't think we should take that ability away. Web pages can do this, a mouse move can be a val[CUT]
… We also have a href that we can use.. we should have a specific gesture that says, okay I want to do a navigation gesture now, and then go to the last place they want to go that are the navigation target.
… Those two things allow you to determine where the navigation target is, and the nav is predictable and owned by the platform. I can then hold down the menu button for two seconds and navigate away.
… You have more chances for the UA to mention something and context about where you are going (bank, etc)
… Those are the core elements that I feel somehow should be present. I don't know or care about the API that should provide them. There are two slides, TPAC and others in which we mark up everything, and others where we markup just navigation elements.
… Starting with where you are going to navigate seems like a good place to start.

cabanier: The issue is that every time you navigate there is darkness; there could be several seconds of dark. The website you navigate to may not render or be ready to render, especially the first time. If we have a solution, it needs to somehow allow preloading. And the page can say I am ready.
… So you can have a seamless transition, I don't know what that would look like. I looked at something that Google proposed that was portals. It was a noninteractive page, it doesn't know it isn't being loaded but is being displayed. It could be built on and be shown in a non-interactive DOM layer. Maybe a meta DOM that allows you to navigate to that.
… At that point session granted could fire and seamlessly take over.
… That kind of solves the problem. You want the navigation to be seamless.

bajones: So, I generally agree. I have so many fanciful wonderful ideas on how we could bridge the two sites to make them nicer. A twitter thread about endless elevators. I think those are ideas that we can add on top of the core of navigating. I would like to not always stop at that point because we figured out and more concretely build on top. I like the idea of portals.
… If we have something like a broadcast to the page, this is where you will go if you navigate now. This can be a signal that I as a UA could be proactive about it at that point, but you don't always want to do that. If everything is an anchor, loading all of them.
… The gesture is also something in which you need to press a button for a second or two. There is nothing to stop you thinking they are going to abandon the gesture but I am going to preload now.
… If you do the nav this way you have more control to go in and out, otherwise you hit a black wall and the page just disappears, maybe you can show a spinner. If a user initiates a gesture it gives you more chance as the UA to smooth it over. And the browser to do things like show the skybox or whatever.

brandel: There are a bunch of things you alluded to, an explicit path you are going to or a specific action for that specific thing. Makes me wonder there aren't trusted UI or trusted events.
… Do you have that?

bajones: We have had bits of trusted UI, it's been iffy. We have had a floating thing that was telling you but it was annoying but then we made it https only. There was a thing where you could ask for permission for camera in XR and I know during the time we had browser for Daydream we did have UI for that. It may have backgrounded the session. There are ways to do that. We are in a scenario where we take over your whole view. There is no gau[CUT]
… the pixels are being rendered by the UI, if there is a prominent browser it becomes easier to spoof.
… An environment that the browser can make that is recognizable, the browser can always render on top. Positioning is tricky they can try and put stuff above and below. There are ways that you can be more assured the thing is the browser. The concrete guarantee we don't really have that option.

brandel: do you have it rick?

cabanier: we don't

bajones: the reason we want the gesture is to know that it is explicit, whatever button on the Quest when I am in a session it will bring me to a panel. I have no doubt it came from the OS because the page cannot override or intercept. That button will always take me to that page. If the other side is held for two seconds and then it switches away, and says you are about to navigate to evil.com the only way I could get that is from the browser.
… While it is difficult to render it into the scene it is difficult to trust, but you can know for certainty you know came from you..

ada: something that is new but was very pie in the sky since we last talked about this, I don't know if it has landed landed, the fade transition effect.

<adarose> https://developer.mozilla.org/en-US/docs/Web/API/PageTransitionEvent

ada: In 2D. Once we know the navigate from one page to another. The 2D web had been doing this. There will be a link in IRC.
… This is a part of the page transition api that allows you to animate smoothly between pages and it does the animation. I think we could hook into these events.
… Instead of a 2D transition we could have a developer know that they could do a nice transition for an interstitial state. I think it could be nice.
… we can probably end that conversation there if there isn't much to add. Coffee?
… After this is unconference items. Then it's the end

unconference

<scribe> unconference topic doc

ada: There is a doc, I will paste it in IRC. You can add stuff to it if you want.

DOM layers

Nick-Niantic: cabanier mentioned that DOM layers will have a different document, what are the ramifications?

cabanier: each DOM layer is like a dialog
… fullscreen only works for a single element, so it's a better fit

Nick-Niantic: so if we want to bring something from the original page to the 3D session, how can we keep things in sync with the dom-layer
… so could the DOM layer be the whole document?

cabanier: people might want multiple DOM layers
… you have to provide the URL (same origin, no foreign content) when requesting the DOM layer

<bkardell_> postmessage?

Nick-Niantic: using a separate URL could be problematic

cabanier: you could create an empty dialog then populate it from the main page

brandon: but what about event handlers etc... if we move element to a different document

Nick-Niantic: if something moves to the dialog then back, what will break?

brandel: the CSS cascade will change

cabanier: you could getComputedStyle() on everything before sending it to the dialog

Nick-Niantic: but that wouldn't work well after class changes etc...

ada: looking at the adopt node spec, it doesn't mention event listeners

cabanier: if they are on the same element they should just work

Nick-Niantic: so the recommendation is to load `about:blank` then programmatically populate everything

Marisha: how does the origin work for `about:blank`

ada: we might need to explicitly state that it'll work for same-origin and data URLs

Nick-Niantic: the existing "dom tablet" concept should be implementable with enough framework work

cabanier: still afraid that it'll be too much of a pain and people will roll their own

Nick-Niantic: what about clicks / events?

cabanier: event handlers should fire

cabanier: the author will have to intercept the select event, and relay where on the DOM layer's quad it should be dispatched
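As a rough illustration of what "relay where on the DOM layer's quad" might involve, the sketch below intersects a pointing ray with a quad and converts the hit to pixel coordinates. Everything here is an assumption made for illustration: the `Quad` shape (full width and height in meters plus a pixel size), the convention that the ray is already expressed in the quad's local space (quad centred at the origin in the z = 0 plane, +y up), and the name `hitToPixel`. It does not check occlusion by other scene content, which only the author can know.

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Hypothetical description of a quad layer: physical size and raster size.
interface Quad {
  widthMeters: number;
  heightMeters: number;
  widthPx: number;
  heightPx: number;
}

// Returns the pixel the ray hits (origin at the top-left, y pointing down),
// or null if the ray misses the quad or points away from it.
function hitToPixel(origin: Vec3, dir: Vec3, quad: Quad): { x: number; y: number } | null {
  if (dir.z === 0) return null;      // ray is parallel to the quad's plane
  const t = -origin.z / dir.z;       // distance along the ray to z = 0
  if (t < 0) return null;            // the quad is behind the ray
  const x = origin.x + t * dir.x;
  const y = origin.y + t * dir.y;
  if (Math.abs(x) > quad.widthMeters / 2) return null;
  if (Math.abs(y) > quad.heightMeters / 2) return null;
  const u = x / quad.widthMeters + 0.5;   // 0 at left edge, 1 at right edge
  const v = 0.5 - y / quad.heightMeters;  // 0 at top edge, 1 at bottom edge
  return { x: u * quad.widthPx, y: v * quad.heightPx };
}
```

An author could run something like this in the XR select handler and pass the resulting coordinates to whatever dispatch mechanism the layer ends up exposing, which is still unspecified.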

ada: if it's coming from the xr select event it should be trusted

brandel: but is the target un-ambiguous?

cabanier: that's why we have the same origin and security restrictions, you would be lying to yourself :D

cabanier: should the fullscreen API work there?

ada: it would be handy for videos

brandon: I have a demo!
… demoing moving a button between windows
… the script continues to work, event bindings work, but the style doesn't move between windows (obviously)

adarose: we should make a repo and have a place for people to file issues

cabanier: there might be a PR to layers

Marisha: why do we need to pass a URL instead of passing a document fragment directly?

cabanier: we need it to get a new document

brandel: document have a path, need dimensions etc...

cabanier: yes you need to pass dimensions too

<bialpio> https://www.w3.org/TR/webxr-dom-overlays-1/#xr-overlay

cabanier: but dom _overlays_ work similarly to fullscreen so you don't need to pass dimensions
… and you can only have one

adarose: if you only want one element, could you get the same treatment as overlays
… oh but it wouldn't work for events

bajones: 1. how do we determine the rasterization size? which is distinct from the size of the element in space, and needs a limit
… 2. interaction wise how do I know what I'm pointing at? which ties into the keyboard integration from yesterday
… the keyboard blurs the scene, but that maybe wouldn't work for dom panels. but maybe it's a good place to start
… the user needs to be confident about where they're pointing

adarose: but again, with the same origin limitations as a developer you can't do anything you couldn't do before
… no need for a user input hack

bajones: correct, but still concerned. do we need an explicit mode switch depending on what the user is interacting with
… what would actually trigger the pointer events?

cabanier: needs to be specified, but would probably be an API on the layer

Nick-Niantic: could the browser do everything here?

cabanier: how? the layer could be occluded (and we can't read back the depth buffer)

cabanier: when casting a ray, only the author knows if you hit the layer

adarose: accessibility wise, we have access to the accessibility tree here, could we do anything based on gaze? move a virtual cursor?

cabanier: things like hover events are going to be complex

mkeblx: we need more than click interactions (scroll etc...)

adarose: maybe the events should be treated as non-artificial events
… so more things would work (sliders etc...)
… and then we could just do pointer events

brandel: users of this API would tailor the interactions for this setup

Nick-Niantic: but we would like to make the effort minimal for developers, and have things working on phones etc....

bajones: I would expect an even split between people building XR UIs with the DOM, and people bringing existing UIs in

cabanier: the main document might have the same restrictions. so you couldn't have 3rd party scripts on the main document handling events from the DOM layer

Nick-Niantic: the use case we care the most about is having the outer page controlling the DOM layer's content
… so anything you can do on your one page should work here
… for scroll alone, it seems that it would be much easier if a controller could be assigned to the DOM layer, letting the browser handle event dispatch (not the author)

bajones: the more I think about it, the more I think we should have a mode switch where the OS takes over (and handles input events)
… on daydream the platform conventions for scrolling were different than on quest
… so the author couldn't reproduce it while respecting the OS conventions

adarose: this feels like the onbeforeselect event
… both get the event but the xr content can ignore them

bajones: on the quest, do you show a cursor or just a ray?

Marisha: yes we show a cursor
… for a 2D browser

bajones: the idea would be to have the OS cursor be the "source of truth" for where the input event will be dispatched
… the web content stays in control of the ray

Marisha: we might have some implementation issues, but theoretically it could work

brandel: where do we put the limit? would the document also get devicemotion?
… and resize

cabanier: currently you can't resize a layer

adarose: one thing developers might use this for is a custom DOM-layer based keyboard (to avoid the blur?)
… and for the extra control

brian: some people do that on mobile already

bajones: native apps too

adarose: let's wrap up, it's a very cool thing

cabanier: need to confirm if the CSP needs to be set on the parent document or not

adarose: which would break people loading three from a CDN etc...

brandel: (and others) the CSP is set in the HTTP header, not the page, so github pages would break too

cabanier: points out that a meta tag is also available

brandel: but then who wins?

adarose: we definitely don't want things like bank's iframes being loaded
… but scripts should be fine?

cabanier: the spec says that if you use the meta tag, you could load scripts _before_ the meta tag and it'd work

adarose: it would be nice if only the subpage had the restrictions

– DRAFT –

25 April 2023

Attendees