W3C

– DRAFT –
Immersive-Web WG/CG face-to-face day 1

24 April 2023

Attendees

Present
adarose, atsushi, bajones, bialpio, bialpio_, bkardell_, Brandel, cabanier, CharlesL, cwilso, Dat_Chu, Dylan_XR_Access, etienne, felix_Meta_, Jared, jfernandez, kdashg, Laszlo_Gombos, Leonard, lgombos, Manishearth_, marcosc, Marisha, mats_lundgren, mjordan, mkeblx, Nick-Niantic, rigel, vicki, Yih, Yonet
Regrets
-
Chair
Ada
Scribe
adarose, cabanier, CharlesL, Dylan_XR_Access, lgombos, Manishearth_, Marisha, mjordan

Meeting minutes

Introductions

intro

Ada Rose Cannon, Apple; into declarative stuff

Nick, sr director at Niantic; AR web and geodata platform for devs

<adarose> https://hackmd.io/@jgilbert/imm-web-unconf

additional introductions available on request

webxr-gamepads-module#58 Add support for a PCM buffer to the gamepad actuator

<adarose> https://github.com/immersive-web/administrivia/blob/main/F2F-April-2023/schedule.md

Ada: First item on agenda is Add support for a PCM buffer to the gamepad actuator

<cabanier> https://github.com/WebKit/standards-positions/issues/1

<cabanier> https://github.com/w3c/gamepad/issues/186

Rik: had support for intensity and vibration of rumble/haptic on controller, but done through nonstandard API
… Google wanted to extend API, Apple objected; should use .WAV file and leave implementation up to developer
… Can send multiple frequencies through motor; want to add API to pass audio buffer to controller
… Haptic actuator is nonstandard, people want to get rid of it; but alternate proposals haven't been developed in two years
… Also based on touch events, focused on mouse more than controller
… Want API for it in WebXR, so instead of going through input profile, gamepad, haptic actuator, etc.; just go straight through WebXR
… Complication is a constant source of problems
… Really just need a method to play audio file
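For context, a rough sketch of the nonstandard path Rik describes as it ships today (exact API shape varies by browser; treat this as illustrative, not normative):

```js
// Nonstandard today: GamepadHapticActuator.pulse() on WebXR controllers
// (e.g. Quest Browser), or Chrome's "dual-rumble" effect on regular gamepads.
function buzz(xrInputSource) {
  const gamepad = xrInputSource.gamepad;
  if (!gamepad) return;
  const actuator = gamepad.hapticActuators && gamepad.hapticActuators[0];
  if (actuator && actuator.pulse) {
    actuator.pulse(0.8, 100); // intensity 0..1, duration in ms
  } else if (gamepad.vibrationActuator) {
    gamepad.vibrationActuator.playEffect("dual-rumble", {
      duration: 100,
      strongMagnitude: 0.8,
      weakMagnitude: 0.4,
    });
  }
}
```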

Marcos: putting on web apps working group chair hat, I work on gamepad API with colleagues at Google
… objection was shared, and based on the idea that Xbox live folks were using dual rumble
… dual rumble was supported in chrome; objected that this is a terrible design, all you do is pass enum and get things to rumble
… no fine-grained haptics there. Implemented in WebKit, Safari, but we all found it abhorrent as a working group
… Putting Apple hat on instead, would object to it moving because compared to Core Haptics, using only audio to represent haptics is not good enough; can't get the fidelity you need
… must synchronize audio and haptics together; not sure what a WAV file would lead to on a gamepad
… for more complicated haptic devices, there are different regions, multiple actuators; the proposal from Microsoft is more region-based, e.g. a glove with an actuator for each finger
… In web apps, we claimed actuator part because of gamepad; want to figure out in this space whether it's the right time to do generalization
… Minefield with regards to IPR as well. It's a new area for the web, fraught with potential issues
… e.g. vibration API is not in WebKit because you can do vibrations that feel like system vibrations, alerts; could be scary
… Together, many issues that increase complexity; just sending an audio stream isn't good enough

<CharlesL> +1 to include region based tactile as well

Marcos: But acknowledge that a lot of devices do take audio input. Need to find a happy medium

Brandon: In addition to being concerned that this would map to devices we're trying to support, I feel strongly that putting an API on object A when it likely belongs on object B because we aren't getting what we want from group B is not the right direction
… Could be applicable to any gamepad; we should be improving this for all gamepad-like objects
… Would want to see evidence that what we're doing only applies to WebXR devices
… PSVR2 has rumble in the headset. Could see argument for "let's give the session itself as a proxy for the device the ability to rumble" (though an edge case right now)
… Don't just try to leapfrog bureaucracy using the spec - shouldn't take exclusive ownership of this capability

Rik: Some frustration because haptic actuator has been festering for years. Shipped it nonstandard, leaving us in bad situation
… Some frustration over lack of progress. OpenXR supports ___ PCM, with plenty of experiences that use the API without problems. Not sure if there's something missing by playing an audio file

Ada: From a politics standpoint, is there anything we can do as a group to encourage discussion? "Festering" is an unfortunately accurate verb

Marcos: For those of us with ownership over gamepad, we meet once a month Thurs at 4pm; could be a good time to grab Microsoft folks and push discussion
… On the scope questions that came up, targeting gamepads, writing instruments, etc.; could be overly generic. How much of this is an XR issue?
… Folks at Apple adamant that audio isn't going to cut it. Need better synchronization
… Must synchronize haptics to audio itself. Renderers need to sync with each other, which is challenging

Manish: Heard a bunch of political/technical reasons for trickiness; sounds like there might also be a lack of people to do the work
… Quite a bit of interest here, in this group. Worth wondering if there's a way for people in this group to submit proposal to help

Marcos: Yes, that would be great. Have been wanting to rage-rewrite it for a while, it's a mess. But it's a matter of resource allocation - need testing framework, etc.
… Would be great to apply resources from multiple companies, have a nice base to apply future WebXR work as well

Charles: From accessibility POV, having only an audio API would be an issue. Having multiple ways to target different regions could be very beneficial if e.g. audio is only coming from your right or left

<Zakim> adarose, you wanted to ask about a web audio node

Ada: This is probably wrong group for this, but could be cool if it was a web audio node

Manish: as hacky as that sounds, it might be the best way to do this

<Zakim> bajones, you wanted to point out compatibility challenges if only targeting the highest end haptics

Brandon: Want to caution against the perfect being the enemy of the good. In some cases, you've just got a little motor that buzzes
… Would be a shame if we ignore pressing need for haptics in devices available today because people want to be architectural astronauts
… Balance to be made between quick and dirty, vs planning for the future

Brandel: on topic of devices that exist today, Xbox One has 4 actuators, with intention of spatializing a haptic moment. The accessibility controller also has haptics
… need a higher level signal to make judgments on what spatialization entails

Marcos: Sony are editors of gamepad spec, have asked them to take a look at uploading audio from the web
… Concern that comes up is that controllers were never designed to take random files from the web
… From a security perspective, not sure whether harm can be done. e.g. overloading the motor
… iPhone considered a gamepad as well

Rik: for reference, Quest Pro controller has 4 haptic actuators, including a fancy one; all take audio, system downsamples to do something reasonable

Brandel: does it expose relative position?

Rik: no, the gamepad is just supposed to know which one is which

Marcos: have a demo that could show how it does work with audio. How it synchronizes, etc.

Rik: Everything is synchronized to display time. Pass it a time, it plays at that time.

Marcos: Send it like a URL?

Rik: No, it's a web audio buffer. Already in memory
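To make the exchange above concrete, a purely hypothetical sketch of the buffer-based call being discussed; `haptics` and `playBuffer` are invented names, not part of any spec:

```js
// Hypothetical API shape, invented for illustration only.
async function playHapticClip(inputSource, frame) {
  const audioCtx = new AudioContext();
  const response = await fetch("haptic-click.wav");
  const clip = await audioCtx.decodeAudioData(await response.arrayBuffer());
  // Imagined: hand an already-decoded buffer to the controller's haptic
  // device, synchronized against the frame's predicted display time.
  inputSource.haptics.playBuffer(clip, {
    startTime: frame.predictedDisplayTime,
  });
}
```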

Ada: We should set a cross-group meeting

Marcos: next meeting on Thursday May 11th

webxr#1320 Discuss Accessibility Standards Process

Dylan: prior a11y discussions: webxr has *some* control over this but it's fundamentally a low level system
… we should figure out what of this is under our scope, and what falls under other groups
… case study: charles & i are a part of an NSF team, making nonverbal communication in XR accessible to low-vision people. making gestures, physical proximity, and 3d/2d content and turning that into sound and haptics
… some things here we can help handle, some things like gestures or emoji are beyond the webxr level

<Dylan_XR_Access> Resource: How Do You Add Alternative Text and Metadata to glTF Objects? https://equalentry.com/accessibility-gltf-objects/

Dylan: <missed>
… could create a task force from XR a11y, separate from tuesday meetings
… can bring recs to this group as a whole, and bring in the APA/etc

Charles: <last two lines>

Dylan: A lot of the current screenreaders in VR are about pointing at what you want and OCRing what you see, as opposed to looking at "everything in the space"

<Jared> https://www.w3.org/2019/08/inclusive-xr-workshop/

Jared: back in 2019 there was a workshop. there have been quite a few shifts around responsibilities in the w3c
… could be interesting for this group to have one resource for what the responsibilities currently are
… i've had a hard time discovering what that is. would be good to come up with consensus on the current state of things

<Dylan_XR_Access> XR Access github: https://bit.ly/xraccess-github

<Zakim> adarose, you wanted to ask about standardisation work we can do in this group

Dylan: one thing is that we don't have things like a list of legal responsibilities for XR, and that's one of the problems
… good to have minimum guidelines around this

ada: 100%, if we had such minimal guidelines we could start building the things we need so people can satisfy them
… also this is a good group to do this work, to do it in the group or as a separate task force formed from this group
… something mentioned last week, might be a good idea to ... like the a11y object model ... the visual parts of that model are quite tied in, but nothing like that for WebGL. Giving people the option to generate that themselves will be useful
… and then if there are minimum-viable standards later, we can say "hey we made this easy for you" (and if you don't do it, there's the stick)

Nick: when we talk about a11y we talk about alt text, ARIA tags, ... markup
… as ada said we now have webgl/gpu which don't know anything about what they're rendering
… but you also have frameworks like a-frame/etc that integrate with DOM/etc
… and they can perhaps do more semantic a11y stuff
… otoh there's pushback against them for being heavy on the DOM

Nick: in other words; do you think we should make a standard like a-frame, or something else?

adarose: would like an imperative API, where you build some kind of tree
… probably has access to hit boxes / etc

Nick: could it be declarative? like a json file?

ada: i guess you could and then parse it into tree

ada: part of my instinct is to keep the DOM for stuff that is rendered in DOM
… especially as we get more DOM integration
… a-frame has shown you can have a nice matchup bw the DOM tree and the scenegraph

Manishearth_: I would prefer an imperative API; I would not want to standardise A-Frame for a11y. For a DOM-based API, more ARIA tags which declarative libraries like A-Frame could use would be nice. But an imperative API would work for everyone. While I think the DOM-based API is fine, I wouldn't want to force everyone through it.
… there are lots of tools for those scenarios; I don't want people doing a11y in XR to be stuck with that approach. Imperative APIs can be integrated into the DOM-based APIs without it being an additional cost
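A purely hypothetical sketch of the imperative tree being described here; every name below is invented for illustration:

```js
// Hypothetical: attach semantic nodes to objects rendered via WebGL/WebXR.
const a11yRoot = xrSession.accessibilityTree; // imagined entry point
const inventoryButton = a11yRoot.createNode({
  role: "button",
  label: "Open inventory",
  // Imagined: a bounding volume so assistive tech can target and navigate it.
  bounds: { position: [0, 1.2, -0.5], radius: 0.1 },
});
inventoryButton.addEventListener("activate", () => {
  console.log("inventory opened");
});
```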

Dylan: another player that we should keep in mind here is screenreaders
… there's gonna be a big q of ... when they get this, how do they interpret it
… what are they going to do with it
… would be very curious to see what the differences are when it comes to how they acquire their content, and how different screenreaders fare when fed these things
… if there's a way we can make these experiences at least navigable, and the user experience relatively similar, so people aren't coming to this completely confused as to the way it was built

ada: i think things like unity, when they're targeting the web, ...
… things in the session or on the document itself , they should be able to use it
… because it's a new rendering mode, existing screenreaders would have to write additional APIs to hook into it. needs to be easily accessible, not deeply ingrained in a way that you wouldn't get from the DOM tree + executed JS

cabanier: not sure if we ever wrote down results of a TPAC session about a lot of this

ada: hopefully minuted. wasn't our meeting, might be an a11y group

cabanier: at the time we thought we had something that covers most of what is needed by webxr

<cwilso> There was a workshop

ada: going to make a repo to start work here. it's going to have to be implemented in the browser

<Brandel> cwilso: Do you have the link to the Webex?

<cwilso> https://www.w3.org/2019/08/inclusive-xr-workshop/papers/XR_User_Requirements_Position_Paper_draft.html

cwilso: <is banging on the gates>

fetchez la vache

Jared: what kind of process exists to ensure we follow the success criteria that each spec has to have an a11y section

ada: we generally ensure that the webxr APIs are more accessible than what they are building on
… big problem is that devs aren't really using stuff we have at the per-spec level, doing something like this might work but nobody else is doing that kind of work

Charles: The concept i was thinking of was the w3c registry
… screenreaders already know how to navigate the DOM, that might make sense
… as long as the new portions of the DOM get updated as you move around
… parallel with the publishing group in the w3c, created a separate group

ada: regarding last point; pretty much all of our specs are done in parallel
… so a module would fit in very well

Nick: conversation earlier, may want to consider a spec that's not only for web devs but also useful for unity/etc people
… on one hand a thorny problem to solve at an api level. thinking of GLTF as a format; maybe a way to do a11y tags is as a part of the gltf spec
… and then you have browser libs/etc that read source information in that scenegraph
… not perfect, if you're not using GLTF and doing runtime stuff, there's no real recourse

ada: for the model tag discussion we're going to need this kind of thing

Dylan: we can connect with the devs working with screenreaders
… other thing is we can work with unity/etc people who need to integrate it
… also need to figure out where we expose it at each level

<Zakim> bajones, you wanted to point out https://github.com/WICG/aom

bajones: at tpac 2022 we had a meeting with the a11y object model group, part of WICG
… part of the programmatic extension of ARIA
… can we make imperative canvas/webgl stuff more accessible
… mostly just "everyone defines this through js". problem becomes "how do we motivate that". should continue to interface with them, they were quite interested in working with us
… second point: idk how well this would apply here. One of the things we did to the webgpu api was an abundance of labels. This is just for development purposes; so you can have good error messages
… appealing to dev's selfish nature here ... works!
… anything we can do to make object-picking, debugging, etc easier; whatever carrot we can dangle, that would prob be good
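The WebGPU labels Brandon mentions look roughly like this (this part is real API; the label only surfaces in error messages and tooling):

```js
// In a module / async context.
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
const vertexBuffer = device.createBuffer({
  label: "player-mesh-vertices", // shows up in validation errors and debuggers
  size: 1024,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});
```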

<CharlesL> +1 to the a11y labels carrot for devs!

ada: reminded of Google driving SEO with this

manish: note on process; we do have
… a11y at CR review. if we want to do more than what the review requires we can also do that, tricky. generally in favor of having an a11y-focused model that other specs build on

Dylan: making things accessible makes it more readable to machines too
… one thing we do is to get univs to teach a11y but also get people to work on these kinds of challenges
… if there is oss code from this group, that's something we're interested in making easier to access
… encourage people to reach out!!!

<Dylan_XR_Access> Prototype for the people project - would be happy to add anything from this conversation that needs additional development muscle: https://xraccess.org/workstreams/prototype-for-the-people/

<Jared> I'm interested in that. Working on OSS WebXR samples now with lots of people in the community.

Charles: what about building a11y checker tools

ada: currently not much exists; a lot of tooling is around rendering

Charles: might end up becoming a legal requirement, even.

ada: really pro there being a11y standards for XR
… lots of places won't do a11y unless legally mandated to do so
… unsure if it's us or a different group

Dylan: def agree
… we need to surface text, even, at the AOM model /etc
… do we have info for the xaur group
… e.g. if you're in a social vr setting you should be able to tell where peoples' avatars are
… ensure that the right concerns get directed to the right group

Nick: q for googlers: relatively recently Google transitioned Docs from being DOM-based to canvas-based
… improves compat and smoothness, but now you have to reinvent a11y

bajones: idk about what efforts went through to make it accessible
… i was under the impression that it had happened but recently i went spelunking and there was still some DOM there
… so the transition may not be as complete

Nick: hm. find-in-page at least doesn't work

bajones: do not expect it was done in a way that was necessarily easy to replicate outside of google

<bajones> A relevant link for the Docs canvas transition: https://workspaceupdates.googleblog.com/2021/05/Google-Docs-Canvas-Based-Rendering-Update.html

Nick: interesting that Docs is kinda in the opposite situation where they're moving from a structured model to 2d rendering

Dylan: path forward: do we work with folks like unity/8thwall/etc to come up with the solution? can we require users to use something

Nick: yeah even figuring out the level at which to do this is hard

Nick: At least for the 2d web the browser knows everything about what's going on, we're nowhere near that here

ada: one approach i'd like to take is have a go at speccing out an API to let libraries add the info needed
… "this is a thing we're proposing, a11y, SEO, etc", showing it to the various people who it's relevant to
… "does this fit with what you're building"
… then we can approach the model people with "these libraries have ways to add things to rendering, but these models are opaque blobs"

ada: even if we do something like that, it won't be useful in all situations
… "there is a fox person in front of you with red ears and ..." is not necessarily as useful as "there is person A in front of you, they are walking away, slightly frowning" in many contexts

Dylan: our nsf grant is helping figure that out

ada: have opinions about avatars on the web, think we need to drive standardization before we get into a problem

Charles: reach out to the various a11y groups at tpac?

ada: good call, haven't started assembling an agenda but can

Manish: a big diff b/w the 2d web and XR is that the 2d web can be represented as a roughly 1-dimensional thing (a traversable tree) with some jumping around, whereas for XR that's very... not true; what is and isn't important, and how that changes over *time*, leads to trickiness, and different applications will want to highlight different things. we do need something low level

Yonet: <offer to help Dylan and also solicit other help>

Dylan: to give a sneak preview of the stuff we're doing, we did an AR thing to use e.g. the hololens for real spaces, to e.g. help blind people navigate to the right bus stop
… when you try to make everything audible at once everything is irrelevant

<Jared> I am interested in participating in the accessibility initiative too.

<Yonet> Great

Dylan: would like help setting the group up

semantic-labels

<adarose> https://github.com/immersive-web/semantic-labels/issues/4

cabanier: planes give you the different surfaces, hit testing lets you point rays at things and see the intersections

<Jared> Is there a link for this issue?

Rik: quest browser gives back planes etc. and where they are in the real world, but you're not sure what you are hitting / which planes you are hitting. the user has to manually set up the room and what those objects are: table, chair, etc.

cabanier: but you don't know what you're actually hitting. in quest the user tells us what their things are when they set stuff up (manually). we want to expose that to webxr
… so you know if something is a door or window or something

<Yonet> Jared https://github.com/immersive-web/semantic-labels/issues/4

Rik: update the two existing specs: in the array of attributes, a single DOMString label attribute.
… set up a repo that defines all that.
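A minimal sketch of how a page might consume the label, assuming it lands as a `semanticLabel` string on detected planes as in the semantic-labels repo (the exact shape may still change; `placeGameBoardOn` is a hypothetical app helper):

```js
function onXRFrame(time, frame) {
  // Assumes the session was created with the 'plane-detection' feature.
  for (const plane of frame.detectedPlanes ?? []) {
    if (plane.semanticLabel === "table") {
      placeGameBoardOn(plane); // hypothetical app helper
    }
  }
  frame.session.requestAnimationFrame(onXRFrame);
}
```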

bajones: topic came up before. only expose metadata on hits, correct? (Yes)
… hit tests could get a rough idea; as you point, you can have items call out
… curious about expected use cases, if I can only get back the real item you are pointing at in the real world

Rik: Planes API, Meshes API. you can query all the planes in a scene
… the quest browser gives a link to the privacy policy on what data you are giving up.
… if you are putting furniture in a room you put it on the floor, and likewise a painting should be put on a wall and not a window.

bialpio: there are some products that exist that label context: a "mask" of what the user sees that annotates every pixel. if devices do operate like this, how do we expose this info through the API? where is the sky, etc.
… wondering about an annotated buffer; we may not know where all the pixels are. for tabletop board games, where is the table? how do we integrate with a buffer-based approach? limited APIs might limit it to a bitmask: sky vs. wall vs. window.

Rik: going outside is still unsolved for VR. tied to the room, even walking between rooms.
… not really implemented correctly. Semantic Labelling comes from OpenXR.
… would be an optional label.

bajones: assume real world meshing paired with semantic labels, would this help with the tagged buffer?

???: will there be a viewport like the sky?

bajones: if I am in my living room and I can label couch / chair, but when I go outside I won't know there is a mountain vs.. sky.

???: No; a confidence level could be useful, a tagged buffer could expose a confidence level. I am looking for problems here, not sure if they are real.
… we need to make sure the API is flexible

bajones: masking out sky, star chart AI.
… anyone know what they are using for masking out the sky in those applications.

Nick: we employ two scenes and one with a mask over it.

<Yonet> https://github.com/immersive-web/semantic-labels

Nick: do you have a list of semantic labels?

<Yonet> https://github.com/immersive-web/semantic-labels#list-of-semantic-labels-for-webxr

Rik: you can add more.

<Dylan_XR_Access> Desk, couch, floor ceiling, wall, door, window, other

Rik: "other" right now is undefined, empty
… if you manually draw a table it won't have the semantics. one label per object.

Ada: is it an array of one item? e.g. "table", or "round table", "brown table"?

Rik: we should not invent it now.
… confidence level: I don't like that, it pushes the decision to the developer; avoiding a confidence level would be good.

Nick: confidence level: content fades out along the edges, having confidence level is helpful. per pixel confidence level.

Rik: Depth.

??: ARCore could give you depth information. one issue about that: consider how to expose this. one buffer with both confidence level and data, but they changed that.

bialpio: the OpenXR extension comes from Meta.
… who implements the extension?

Dylan: a11y impact: being able to label what's in the user's environment is very important, and where the edges are; edge enhancement around the borders is very important.

Rik: on quest we do this so you don't trip over the edge of an object.

bialpio: computer vision: if we do expose the confidence levels, content may not render; the app may ignore a table with 30% confidence and not render it. but leaving it up to the AI is probably not a good idea.

<Dylan_XR_Access> "The difference between something that might go wrong and something that can't possibly go wrong is that when the thing that can't possibly go wrong goes wrong, it's generally much harder to get at and fix." -Douglas Adams

bialpio: making sure we don't paint ourselves into a corner; make sure sky detection, with blending of sky vs. not-sky, works for this.

Nick: hit test for meshes / planes as headsets go outdoors. building meshes outside may be challenging. it may be useful to have labels per vertex, not per mesh: this region of a scene is a bush or a tree.

Rik: could have thousands of vertices

Nick: could be used to place content in a smart way.
… having labels per plane could be useful but outdoors could have multiple meshes for multiple objects.

Rik: Is there hardware?

Nick: there are classifiers and mesh generation from scanning.

bajones: which methods expose which data? are these semantic labels / classifications, with the ability to add to them later? planes, pixels, meshes — seems to make sense. propose: we should have a registry of semantic labels.
… for image-based masks, different pixels could be labelled, with confidence levels etc.; give each of these values an enum. we should have concrete values, an integer value for each label.
… Sounds like the answer on semantic labels is Yes.

<Leonard> ... email: Dylan [AT] xraccess [DOT] org

Model Element

marcosc: not much has been done since the last meeting
… mostly because the needed stuff is easy
… the issue is that we need to agree that model is a good idea
… I was waiting for mozilla's position, standard's position
… I think cwilso has an opinion
… there are more questions that I've been grappling with
… like is this a media element?
… is it like a 3d video? What about a11y?
… how do we describe the accessibility of it?
… we have a bunch of web content that moves and that has a11y content
… one of the elephants is the format
… I'm not pushing but we have gltf and usdz
… and we designed it format agnostic

<Jared> Okay, is this related to https://modelviewer.dev/ ?

marcosc: there's going to be an industry push for a standard format
… how are we going to work out the format issues? We're going to have a conversation about that
… this is roughly where we're at
… in webkit we landed width/height attributes
… (ccormack worked on that for half a day)
… feedback please
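A hedged sketch of how a page could feature-detect the proposed element today; the detection trick is standard DOM behavior, while `<model>` itself is only in experimental WebKit builds:

```js
// Unknown tag names come back as HTMLUnknownElement, so this distinguishes
// browsers that have wired up <model> from those that have not.
const probe = document.createElement("model");
const hasModelElement = !(probe instanceof HTMLUnknownElement);
if (!hasModelElement) {
  // Fall back to <model-viewer>, three.js, Babylon.js, etc.
}
```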

Leonard: fundamentally this is a good idea to display 3d format
… but there are a whole bunch of issues
… like how to ensure render quality, animation, interactivity
… how do you get the camera in the scene
… it's really hard to solve these issues
… the formats are going to be an issue but the concept should work out first

marcosc: I didn't really mean that easy :-)
… you are right that rendering is quite hard
… and those are things that I need help with defining
… I'm unsure if they will be easy

Leonard: in the issues, the specs are out of sync (??)

marcosc: we don't have versions in html
… it's not trivial. there are test suites that may have fidelity
… there is not the idea that we have versions
… specs can move faster than implementations
… there is nothing process-wise stopping us from making progress quickly
… I don't want to merge things in the spec without other implementor feedback
… I don't want to add prototype-y stuff

Marisha: it came up earlier that webxr is a black box
… there is a huge number of developers that can't participate because it's so complicated
… the web is inherently semantic so the model would be very helpful

bajones: I think that the desire is understandable
… especially in light of the a11y discussion
… for developers that don't want to do the imperative thing
… but my issue is that this feels like an unbounded space
… working on webgpu and webxr, when talking about the model tag, what can the web already do that does that
… three.js, babylon can add on new modules and grow in complexity forever
… which is ok for a user space library
… but I'm not comfortable with that on a web spec

<cwilso> +1

bajones: I don't want to become unreal engine in a tag
… is there a reasonable way to cap that complexity
… is there something that we're willing to limit it to?
… I don't know what the escape valves are looking like
… getting gpu buffers from the model tag is likely not a solution
… we'd feel much better if there was a clear scope of work

marcosc: I couldn't agree more
… I though you were going to mention the video element
… which is what I'm envisioning
… given a file, render something on the screen

bajones: I've had conversation about glb/usdz
… because people think that you can just extend it
… we really don't want to add things like hair rendering

<Leonard> USD connectors are the extension mechanism

bajones: even gltf has a bunch of extensions
… things like refraction index, thickness of glass
… there should be a line
… and there's a temptation to keep pushing the line

dulce: physics is a big problem in xr and you will always be pushing that

bajones: for context, babylon worked with the havok team so now we have high quality physics for the web
… do physics need to be part of the web?
… will this reduce the complexity? People will want to push the line

Marisha: do you see a cap?

bajones: I don't know what that is
… but it shouldn't be infinity
… I think we can find something that doesn't require us to build a new silo

marcosc: how did video cope with that?

bajones: mpeg spec has a lot of extensions that nobody implements
… if we look at how video is actually used, they are very complex
… but in terms of behavior they are well bounded
… nobody expects all the pixels have physics attributes
… which could be reasonable for the model tag

marcosc: why don't I take that back to the usd team?
… how do we limit it so it doesn't get out of hand

<Leonard> glTF team does not want to limit the capabilities

bajones: in the case of gltf, we'd likely support the base spec and a limited set of extensions
… I'm sure there's a similar set for usd

marcosc: that is important to take back

kdashg: the format is the primary concern
… it would be bad if authors would have to provide 2 formats so it works everywhere
… we need 1 path forward
… whatever we do, we still going to be subsetting it
… this is what happened with webm and Matroska
… where webm is a subset
… and it's explicitly cut down
… so we'd need to do the same. People shouldn't have to experiment
… use cases are also important
… generally we don't see the model tag as something that makes it easier to draw 3d content
… we're handling 3d content well today
… we're focusing on narrower use cases
… some of the things there. For instance privileged interactions
… like an AR scenario where you'd not need to give depth information to the web site
… so it would work with untrusted website
… the other thing is that you can interact with other privileged content like iframes
… which is what we should be focusing on. Triage our efforts
… and not focus on making something it can already do easier. Focus on what it can't do
… it's going to be really tempting to show a demo
… dropping a model into an AR scene
… we can already do

Nick-Niantic: obviously there's a lot here.

<cwilso> +1 to "focus on what doesn't demo well, rather than what does demo well."

Nick-Niantic: echoing what other people say
… models are easy and accessible on the web today
… with very little markup you can embed a model on the web today
… we don't see a lot of wanting to stop at a model
… most use cases require a dynamic presentation
… for instance, change a color on a model, you don't want to download a new model for each color
… usually you have a bit of code to draw animations
… running in offline mode is less compelling than something that is prebaked

marcosc: can you talk more about swapping in a model?

Nick-Niantic: I can show it on my screen

<Leonard> glTF has KHR_materials_variants that holds multiple materials for a single model

Nick-Niantic: (demoing) this is an example of a website with different models
… this is telling aframe to make changes
… other cases are character cameras

<bialpio_> (additional case study: model viewer which uses glTF, but don't ask me how it works internally: https://modelviewer.dev/examples/scenegraph/#swapTextures)

Nick-Niantic: on the one hand, I agree that complexity can grow high
… I don't agree that low complexity is what we want

marcosc: (????)

Nick-Niantic: enough functionality grows quickly
… talking about 3D video, holograms are popular. (volumetric captures)
… for the need of the market, there are a lot of formats to consider

marcosc: we don't know what's coming down the pipe

Nick-Niantic: yes but we shouldn't limit us too much at the start
… a lot of the interesting cases with the video tag
… applying video interestingly in a 3d space
… where if you were to have a model tag, the question is how to get vertices and textures out of the model
… so it's limited that way
… we talked about gltf extensions, where they might grow and be extended over time
… maybe we add semantic information inside the gltf
… if the model tag is too limited, people will become frustrated
… finally, we were talking about a11y
… this could be embedded with the model
… what we want is an annotated scene graph like what aframe lets you do?

marcosc: what does aframe do with a11y?

Nick-Niantic: aframe lets you declare your scene as html in dom elements
… this lets you hook into the browser a11y engine
… it won't work out of the box today but it might require a new rendering engine
… in short the key is: without a lot of care, a model element is not as useful as what's in the market today
… what is the better more useful thing

Brandel: as someone who plays on the internet a lot
… we're always going to be disappointed
… to that end, I'm not concerned.
… what we need to find is the bare minimum that is useful
… I was looking at the model proposal. we talk about using the environment map without needing a user request
… or without needing access to the textures
… on the complexity, we should aim at what is the simplest thing possible
… and we should focus on the benefits
… knowing that there is more content in the future

cwilso: my biggest concern is that if a lot of functionality is in the format, that is problematic
… this moves interop to a spec
… implementations can put things together quickly
… safari uses its usdz component to implement their model element
… and now everyone else has to use the same component
… there are massive layers of features that need to be implemented
… if hair rendering was added, the model spec didn't change but the implementation did
… people don't like to implement multiple formats
… focusing on what demos well is indeed the wrong thing
… do people remember activex?
… people could build fallbacks but didn't
… this kept internet explorer alive
… baking this much complexity in an engine without a web spec, is hard
… you don't want to expose things to user or developer code
… the boundaries have to be part of the standard
… I'm worried that this is going to create a massive interop fracture
… HTML should have defined an image and video format
… and an audio one because we still don't have good ones today :-)

Dylan_XR_Access: we were talking about how much we want to push it
… from usability perspective
… I'm wondering, are there certain things that we bake into this tag?
… is it controlled by the user?
… should we define the core things that are part of the tag?
… where does it all fit into it?

adarose: one benefit is that it won't render the same on different devices
… if I want to show a model on a watch I don't want to use expensive shader or refractions
… but if I show it on a high end computer, I would want all those expensive things turned on
… if you want to be pixel perfect, webgl is the thing to do
… different renditions are a feature and not a bug

Leonard: many of the engines already have an idea of device capabilities
… the bigger issue is that they should look the same on the different browser on the same device
… you can differentiate but different browser should have the same rendering engine
… usd is not a format, it's an api
… making a new format takes at least 2 years

marcosc: we estimate that any spec takes 5 years :-)
… having a single format, we've not seen such a thing
… we've seen disasters happen with formats. We've seen implementations becoming the standard
… we generally understand what we want
… if we do it in the w3c, we could all agree
… we could decide today to just use usdz
… but it's going to be challenging

<bajones> <narrator>: They did not all agree.

Leonard: the modelviewer tag can do most of what you're talking about
… you should have demos that shows what modelviewer can't do
… show the community what the model tag can do that can't be done with other capabilities
… there was a discussion about gltf extensions, if the model tag allows them it would break the system

marcosc: we would only do that across browser vendors
… like anything in ecmascript. there's a standard and aim for the same shipping date

Leonard: so it's extensions for browser?

Marisha: why can't we just decide to not have 2 supported formats?

<Leonard> USD is not a format. It is an API

bajones: there are platform issues. adding usdz is easy for apple but hard for others
… usd is not a standard. it's basically a black box
… you can put a lot of things in usd but apple will only render their own content

Marisha: is there no desire for USDZ as a standard format?

bajones: there is not really a standard

Marisha: is there no document?

marcosc: there's a github repo and a reference renderer

kdashg: this is not surmountable
… the video codec space, many millions of users can't decode h264 video because of patents
… it's because authors just use their defacto assets
… people choose the easiest and then users have problems
… we as browser vendors can't tear things apart and repackage it

<cwilso> +1

<Zakim> cwilso, you wanted to point out but if it looks better on Apple watch than on Samsung watch...

cwilso: the problem with having 2 formats, does that mean that they are both required?
… that means that they are not web standards
… you end up exploring what works in browser a and not browser b
… and we have a responsibility to make things interoperable
… yes, things can look different on different devices but it should be roughly the same on similar devices

adarose: let's wrap it up there

<Leonard> Thank you Ada. Are there any TODOs or takeaway tasks from this discussion?

<adarose> https://hackmd.io/@jgilbert/imm-web-unconf

webxr#1321 Control over system keyboard's positioning

Emmanuel: we implemented keyboard integration for the user to trigger keyboard and use for input - first point of feedback was wanting to control where the keyboard appears

<adarose> https://github.com/immersive-web/webxr/issues/1321

Emmanuel: We currently provide z-position but looking for feedback about what folks think about availability for positioning:

bajones: What level of control do native apps have around this (like OpenXR or old Oculus APIs)? Or do native apps invent their own wheel here

Emmanuel: Not sure what native apps do

cabanier: There's maybe an OpenXR extension for this... Emmanuel what do you do?

Emmanuel: This is brand new, not very mature

bajones: Maybe it's too soon to try to standardize?

cabanier: there are standards in Unity that are used as an Android-ism

cabanier: If you are in Android, you can specify where on the screen the input is, and the keyboard will try to move itself to that position

cabanier: In immersive, that goes away

bajones: The thing that makes sense is to specify the bounds/coords of the input rect. But maybe that's more complicated than what I'm thinking (what if someone specifies 3mi away) - unless you want devs to specify exact coordinates where they want keyboard to appear

Emmanuel: Right now the keyboard renders at the same depth as the cylinder
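To ground bajones's suggestion about describing the input rect, a purely hypothetical shape for such a hint; nothing like this exists in WebXR today and all names are invented:

```js
// Hypothetical: let the page suggest where the system keyboard should appear,
// e.g. just below the focused input's quad, expressed in the XR reference space.
xrSession.systemKeyboard.setPlacementHint({
  space: xrReferenceSpace,
  rect: { x: -0.25, y: 1.1, z: -1.0, width: 0.5, height: 0.15 }, // meters
});
```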

Nick-Niantic: When I think about the placement of content, there are the nuances of the current scene + the current viewer. Asking developers to navigate that complexity can be challenging. Could offload some of the complexity onto the user and let them determine a better spot

Nick-Niantic: We had a project with a dom-tablet where you can pull things in and out of the 3D space - the way this was moved around was by grabbing and moving it in a radius around yourself, and follows you when you walk around. Making it easy for the user to move the keyboard is best.

CharlesL: From an accessibility point of view, a person with low vision may need keyboard to be in a very specific spot

Dylan_XR_Access: We've heard from folks that to read things they usually bring things close to their face, but they often can't do that in an XR environment. Ideally we'd have system settings for this

cabanier: This is the system keyboard so if it adds accessibility settings, you'd get that part for free, like high contrast or letter size

cabanier: It could be a pain for the user to be able to move the keyboard

bajones: The two things are not mutually exclusive - can make the keyboard not occlude the input, but also make it moveable for users.

bajones: The worst case scenario is two different platforms that have two different conventions for where the keyboard is placed, giving inconsistent results to users

bajones: You don't want to rely on user's control of the keyboard but should enable it

Emmanuel: Team is still working on "Follow" functionality vs fixed w/ a toggle. This gets to the question of how we surface this to webxr devs

Nick-Niantic: *Showing demo on screen* This is the dom-tablet, it's not a burden, it's easy for users to use and place wherever they want

Nick-Niantic: If they get too far away from it it will also follow the user. An idiom like this is useful. Also we'd love this (dom content) as a native WebXR feature.

Dylan_XR_Access: Something that comes to mind when it comes to interaction - we don't want just pointing, should have equivalents to tab, enter, gaze controls, etc, because there will be folks that have trouble pointing and need things like arrow keys

adarose: One heavily-requested feature has been DOM-overlay for VR, or some kind of DOM layer for XR that's interactive. But as much as it's desired, it's very difficult to implement. It's been discussed for years without a lot of movement.

Nick-Niantic: We can offer our existing implementation as a reference.

Dylan_XR_Access: What part of this is being handled by the WebXR vs the system?

adarose: There's a rectangle that the user is carrying around that has the HTML content on it, with all the accessibility features you'd expect for HTML.

adarose: Currently all we have is DOM Overlay which is only applicable to handheld mixed reality experiences. It's difficult to establish what it should do in virtual reality

bajones: There's a demo for this and how you can take content and display it, but no one has described how this should work for virtual reality specifically

Emmanuel: These are great discussions: touching on some of the accessibility - one of the features for a system keyboard that will provide a strip at the top to show what content is in the input that is being modified.

Rigel: when thinking about text input in VR: hand tracking is becoming more popular, the raycaster handheld approach means that the keyboard is beyond the reach of your own hands. But with hand tracking you want something more touch type, and have to think about the distance from the user, have to think about input methods

bajones: If the system has been designed such that the keyboard can be accessed via touch typing, it should bring up a hands-friendly version of the keyboard. The system should know what input method is being used.

proposals#83 Proposal for panel distance API in VR

Bryce: I'm an engineer from the Browser team at Meta

<adarose> https://github.com/immersive-web/proposals/issues/83

Bryce: This is outside the context of WebXR, it is about exposing the distance of a virtual 2D panel to the user

Bryce: This could be like for a weather application - what is displayed depends on how close the user is to the panel.

Bryce: Another example is a picture window effect, as you get closer to it you can see more and more what is "outside" the picture window

Bryce: Do those examples make sense? At a high level - is there any precedent around this? Has it already been attempted? Just want to open up to the group for questions and considerations.

mkeblx: You alluded to the idea of a picture changing size - previous ideas in this group are things like a magic picture app - you don't have just the distance but also orientation and user's position relative to the screen. Do people still want that even though we dropped it for a long time? And would your idea be a subset of our previous idea

mkeblx: Another similar feature is the magic leap browser which exposed not just position but orientation via Javascript

adarose: One concern is that it could potentially be a privacy vulnerability. Maybe users don't want you to know if they're sitting or standing, where their head is in relation to the panel. I don't like the idea of giving user position to web pages.

Dylan_XR_Access: For some folks, being able to get close is necessary to see something. If it suddenly changes, that could be frustrating to users. But if that's something the user could control or have a setting for, that could be a feature.

Jared: What is a panel app?

Bryce: Panel app in this context is just a 2D Browser outside of the context of WebXR. If you're in VR viewing a standard 2D browser window

Nick-Niantic: My understanding from previously is that there were non-immersive modes to the WebXR spec that were meant to handle cases like this. If you wanted to have DOM content but also have a magic window

bajones: Clarification - there is a non-immersive (inline) mode, but it does no tracking of any sort. The thing it gives you is the ability to use the same sort of render loop with immersive and non-immersive content. So you can use the XRSession's requestAnimationFrame. Nobody uses it much, I wish I hadn't spent so much time on it.

bajones: We talked a lot about the magic window mode, that involves tracking users position. there were privacy considerations, implementation questions. We could revisit that, but that doesn't sound like what's being discussed here.

Bryce: Yeah, in its simplest form it's just "how far away is the user in this virtual space". Following discussion about XRSession, I was thinking it could be for devs who don't know anything about WebXR. It could be like the geolocation API that's just surfaced in the navigator.

<bkardell_> That is what I was going to say actually

cabanier: So we don't really need to know how far away the user is to the centimeter. We just need to know are they close, sorta far away, or really far away. It could be like a CSS media-query to resolve some of the privacy considerations. We don't need to know exactly how far away they are in relation to the window.

cabanier: It could be something on navigator but could also be some CSS-y thing that automatically reformats itself

bajones: I like the idea of a CSS media query in this context. It seems like the right abstraction. This isn't necessarily about how far away the panel is, more about the angular resolution of the panel (includes both how far away, how wide/tall, etc)

bajones: There is still some fingerprinting in the media query but you're not looking at what the user's head is doing. It seems like it slots in nicely with Zoom level, what frame, etc. You could maybe call this perceptual width or something - how big the user perceives the page to be, and have CSS adjust to that.

Brandel: What might be confusing for folks: Head tracking and how far away the element is are essentially the same question - just depends on spatial/temporal resolution. CSS media query is one approach, other approach is to have exact xyz coordinates. While they are technically the same thing, they can be used to serve very different purposes.

Brandel: There was a discussion about having a universal virtual unit like a millimeter

Brandel: If there were reasonable limits on update frequency and such, it could be very useful

adarose: You could even use the existing media queries, if you bin the information about panel size
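A hypothetical sketch of the media-query idea; `xr-perceived-size` is an invented feature name used only to illustrate the binned approach:

```js
// Hypothetical media feature, invented for illustration: react to coarse,
// binned changes in how large the panel appears to the user, instead of
// exposing an exact distance.
const far = window.matchMedia("(xr-perceived-size: small)");
far.addEventListener("change", (event) => {
  document.body.classList.toggle("simplified-layout", event.matches);
});
```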

<Brandel> https://www.youtube.com/watch?v=ES9jArHRFHQ is McKenzie's presentation

Brian: My immediate reaction is it sounds like it would be ideal in CSS, like a media-query list with listeners. CSS already has lots of things related to perceptual distance and calculations, since it is already used on televisions. This doesn't seem like it would be particularly hard, it fits well.

adarose: This might be more for the CSS working group instead of here

cabanier: Bryce first brought it to web apps, who told him to bring it here

<bkardell_> wow I must have missed that

cabanier: But we're more looking to get people's opinions on it. Sounds like people don't have too many problems with it as a CSS media query and binning.

<bkardell_> bryce can you share the css issue?

mkeblx: You mentioned the weather thing. But the Meta Quest browser stays at the same distance. What implementation are you imagining?

cabanier: Trying to do more mixed reality stuff, where people are expected to walk around more

<bkardell_> cabanier: do you know what the css issue # is? I don't see a handle here that is bryce

Bryce: With mixed reality over time, you might have more scenarios where a panel is attached to physical space

Jared: If you utilized existing media queries via virtual screen size, there might be some good tools to play around with

<cabanier> @ bkardell_ : we didn't file a CSS issue yet. Bryce went to webapps first because he wanted to extend navigator

Bryce: I wanted to ask about fingerprinting risk - if there were a permission dialog, does this group handle that sort of thing?

adarose: Usually permission prompts are not determined by this group, left up to the Browser

bajones: Usually specifications don't determine what is shown or said regarding permissions. We can sometimes say "user consent is needed for this feature" and mention permission prompts as an example but we don't dictate that that needs to be how consent needs to be given.

adarose: No one on queue, should we wrap up for coffee break?

Bryce: sounds good to me

webxr#1273 Next steps for raw camera access

<adarose> https://github.com/immersive-web/webxr/issues/1273

Nick-Niantic_: next steps for raw camera access
… goals around consent and developer use cases
… reviewed the Google Chrome implementation and reviewed it for headsets. challenge with the render loop
… unlock new use cases: reflections, adapting the scale of the screen, media sharing, image targets, qr codes
… sky effects demo
… running a neural network in the background to build a sky map, create a cube map

<bajones> "It

<bajones> "It's hard to get close to the sky" [citation needed]

Nick-Niantic_: In general Niantic cares about use cases outside (sky, ground, foliage)
… marker based ar demo
… camera texture processing
… part of why demos are not polished further as Chrome API is still experimental

bialpio_: raw camera access (for smartphone) launched (no longer experimental) in late 2022

<Yonet> Nick, can you share the slides so we can add them to the meeting notes. Thanks!

<alcooper> Enabled by default for Chrome since M107

bialpio_: only smartphone-specific/capable APIs are released to stable

<Zakim> bajones, you wanted to ask what API (OpenXR presumably) Meta uses to handle passthrough.

bialpio_: other Chromium based browser running on headset (Hololens, Quest) do not support raw camera access
… headsets typically do not expose camera for web

<adarose> ack. cabanier

bialpio_: the API Nick-Niantic_ proposed is a simple adapter
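For reference, a sketch of the smartphone raw camera access API as shipped in Chrome, to the best of my reading of the explainer (requires the `camera-access` feature; the headset browsers discussed here don't expose it):

```js
const glBinding = new XRWebGLBinding(session, gl);

function onXRFrame(time, frame) {
  const pose = frame.getViewerPose(referenceSpace);
  if (pose) {
    for (const view of pose.views) {
      if (view.camera) {
        // A WebGLTexture containing the camera image for this view.
        const cameraTexture = glBinding.getCameraImage(view.camera);
        // Feed it to a shader, sky-segmentation model, QR decoder, etc.
      }
    }
  }
  session.requestAnimationFrame(onXRFrame);
}
```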

cabanier: On Quest Pro nobody gets access to camera feed

Nick-Niantic_: lots of advancements in SIMD execution of neural networks
… 200 fps on handhelds with SIMD

<bkardell_> sorry it's a little noisy here at the moment - is the question whether any device would give wolvic those permissions?

<Jared> We could help with that... Also provide it a virtual means to do it

cabanier: unlikely to get realtime access to the camera even later..
… Nick-Niantic_ is exploring other headset providers to expose raw camera access

Yih: question regarding camera feed processing

Nick-Niantic_: slide 8.. only meaning to show middle demo "location"
… point is 6DoF tracking on the phone

Jared: interested in helping.. actual and virtual devices.. what is the input to the algorithm? just color?

Nick-Niantic_: needs rgb texture
… FoV of camera

Nick-Niantic_: will share the presentation

<Nick-Niantic_> https://github.com/immersive-web/raw-camera-access/issues/11

Nick-Niantic_: these are the needs

Jared: I can imagine an implementation on a virtual environment.. that later can work on the headset

CharlesL: I was wondering .. to add a secondary camera

cabanier: can not comment on future devices

bialpio_: we have been exploring marker detection.. it is in chromium repo, will link to it
… we used opencv marker tracking module
… not easy to get a performant implementation

<bialpio_> https://source.chromium.org/chromium/chromium/src/+/main:third_party/webxr_test_pages/webxr-samples/proposals/camera-access-marker.html

cabanier: presenting stereo, how does that work with camera feed

bialpio_: the app could reproject
… the user sees exactly what the website has information about

cabanier: for passthrough?

Nick-Niantic_: for passthrough, unlikely to have this problem

cabanier: is it timewarped ?

bialpio_: ARCore introduces a lag .. when image tracking is on
… can not use camera for effects.. you might get frames from "future"

cabanier: we predict what camera feed will be

cabanier: on hololens you can get access to one of the cameras.. not all of them

bajones: hololens requirements are different.. not the whole scene

Nick-Niantic_: timewarping has to happen at some point.. image and timeline needs to be aligned

cabanier: api does not give you a snapshot

Nick-Niantic_: event based API

Nick-Niantic_: it does not have to run on every frame

Brandel: does it have to be raw stereo

Nick-Niantic_: does not have to be stereo

<Jared> Could be interesting to check out some of what is trending as being exposed for certain types of wearable XR in native APIs or extensions

<Jared> https://registry.khronos.org/OpenXR/specs/1.0/html/xrspec.html#XR_HTC_passthrough

Nick-Niantic_: does not necessarily need to be shown to the user

Jared: similar concept, underlays
… can be used as a prototype

<Jared> Sorry, I was mistaken. It doesn't give you access to the pixels.

adarose: new topic https://github.com/immersive-web/webxr/issues/892

webxr#892 Evaluate how/if WebXR should interact with audio-only devices.

<adarose> scribenick: mjordan

bajones: likes point about AirPods as audio only devices, fairly common
… thinks that these are consuming 5.1 audio and hands-off?
… doesn't seem like audio generated on the fly. Can get 3-dof pose data, but maybe not a way to get data back into the scene?
… are they designed to be interacted with as controllers necessarily...
… Bose devices were trying to explicitly be XR devices.

Manishearth_: Can have devices that are audio only. Like a headset without lenses. Could be able to get poses from certain devices.
… main benefit is that you could get pose based control, as well as controller based control.
… an experience that you want to work everywhere might not need pose because you have other controls. but if you don't have those devices, could you initiate a session that is not backed by one of those devices? Might be good to look into.

Brandel: headphones are looked at as display devices
… public available apis do return at least 3dof, and sometimes acceleration.
… back to Gamepad discussion earlier, you can get orientation from gamepads, and those can be considered.

Jared: Anecdote - ghost story narrative where it's audio only with directional whispers, etc.

Nick-Niantic_: Curious about expectation.
… for audio-only headset, you are looking at a device in real space, so if you go into immersive mode, what should happen?
… What is the expected use case, interface, user experience?

adarose: Like HTC vive attached to computer, maybe render a 3-dof view.
… on the phone. Or maybe doesn't render something, but you still get 3dof audio.
… could run independently on device, but maybe get audio transcription on device.

Nick-Niantic_: do you need an XR session for that?

bajones: probably inclined to treat as a different type of session?
… could get back poses every frame.

Nick-Niantic_: What would happen on quest, when you ask for immersive audio?

bajones: might have limitations because of the device functionality.
… maybe normalize around devices where this is the norm, or expected use case?
… if you're trying to get poses, you could do some interesting accessibility stuff, sensors might not be super accurate?
… Would give it its own session type.

CharlesL: There is a link on how OpenXR should interact with audio-only devices, but not a lot of info there. Blind users do turn off their screens, so this seems reasonable.

Dylan_XR_Access: Being able to support things like spatial sound is necessary for a lot of experiences.
… should support those use cases.

adarose: shouldn't do a different session type, would say something about the user not wanting to view video.
… can get immersive sounds while not having an immersive session

cabanier: can't find 5.1 support in browsers? Certain devices, or special video formats may have that. What do we need to do to get 5.1 support in browsers?
… maybe manually decoding the audio streams?

Chris_Wilson: Can be supported in web audio, but need to use 3d panner to do positioning, but nothing that does 3d panning inside of 5.1 audio.
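A minimal Web Audio sketch of the 3D panner Chris mentions (this is existing API); getting a head pose from an audio-only device into the listener is the missing piece being discussed:

```js
const ctx = new AudioContext();
const source = ctx.createBufferSource(); // assume an AudioBuffer is assigned elsewhere
const panner = new PannerNode(ctx, {
  panningModel: "HRTF",
  positionX: 1.5, positionY: 0, positionZ: -2, // emitter position in the scene
});
source.connect(panner).connect(ctx.destination);
// A 3DoF headphone pose would be applied by updating ctx.listener's
// forward/up orientation parameters each frame.
source.start();
```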

bialpio_: I know we don't say how many views you get, but can we say we get only one?

bajones: you only get 1 or 2, unless you explicitly ask for them.
… even if allowed, wouldn't want scenarios where you get 0
… don't want to expose the fact that users are using accessibility settings. So you could advertise that the content is maybe audio-only, and put the acceptance choice back on the user.
… try and make the page as unaware of what the user chooses as possible.

bialpio_: Be careful how you classify the session if there is a special session for this, so that it doesn't give that away.

adarose: should not be able to secretly overlay other audio over user's expected streams

bialpio_: had to refactor part of the spec for this.
… do we pause other audio if they're already running?

: sometimes background apps can play audio and sometimes not
… sometimes confusing around ducking audio from other sources.

??: we say exclusive audio, but maybe not exclusive-exclusive. Sometimes the OS can interrupt, etc.

cabanier: chrome will sometimes keep running when display is off
… audio session might be like that.

??: exclusive used to be the term, but now it's immersive

adarose: if a media thing wanted to differentiate, there would be a difference between directional and directional where moving your head did something.

rigel: walking down a street where you can have audio-only immersion would be cool.
… different elements in a scene have different emitters, but currently tied to phone position. would be neat to get head pose instead of having to move phone around.
… today need to move phone around.

??: on the issue for this topic, Jared had a link to someone on the native side with motion info

brandel_: Can shake or nod and get that input from the headphones.

Jared: Using tools like Unity, you can use things like colliders and things are helpful for making immersive audio experiences.

adarose: This seemed like a fun thing to end the day with, and this was a lovely discussion.

Minutes manually created (not a transcript), formatted by scribe.perl version 197 (Tue Nov 8 15:42:48 2022 UTC).