Immersive Web WG/CG F2F

29 Jan 2019



cwilso, madlaina-kalunder, Jillian_Munson, dom, NellWaliczek, lgombos, trevorfsmith, Alberto_Elias_(remote), Atsushi_Shimono_(remote), Fernando_Mozilla_(remote), Hirokazu_Egashira_(remote), Laszlo_Gombos_(remote), Artem_(remotely), bertf, jungkees, ada, Madlaina_Kalunder_(remotely), Phu_Le_(remotely), Winston_(remotely), Tony_Brainwaive
Ada, ChrisWilson
Manish Goregaokar


<trevorfsmith> Hey, folks. We don't have wifi at the venue, yet, but we're working on it.

<trevorfsmith> We're doing a round of intros in the meantime.

<trevorfsmith> For some reason the WebEx chat isn't working.

<fernandojsg> trevorfsmith: I just wrote something and it seems to work on my side

<trevorfsmith> Hmmm. The third try worked. Strange.

<trevorfsmith> Ok, we have no wifi in the venue yet (😢) and the video stream isn't working. The Samsung site people are working on it, but it's a tough situation.

<fernandojsg> ok

<fernandojsg> I guess we will have audio at least right? :P

<trevorfsmith> We're setting up a WebEx audio channel.

<fernandojsg> cool, thanks!

<trevorfsmith> Ok, we're calling a 10 minute break while we attempt to get this sorted. Sorry, folks! We thought this was taken care of beforehand.

<cwilso> No audio on WebEx?

<fernandojsg> cwilso: is working already I can hear ada talking

<cwilso> Cool, thx

<dom> scribenick: johnpallett

<scribe> chair: trevorfsmith

<scribe> chair: cwilso

<scribe> chair: ada

<trevorfsmith> Ok, we're going to get started. The IRC channel is the place where notes will be taken and where you can send in questions. The Audio in WebEx should be working.

<trevorfsmith> Unfortunately, there's no video. 😭

<trevorfsmith> We'll link to the PRs and Issues in this channel so that remote folks can see what we're talking about.

Input and Controllers #336, #392 Pull #462

nell: this issue is long-standing; in late 2018 bajones put forward a PR to expose input information
... we got some feedback about the problem space, and are now presenting an alternate proposal
... framing of problem space
... when talking about motion controllers there are 5 things:
... (1) render it in the scene (2) get axis data from the controller (3) map the first two things together, i.e. render it as it is in the real world
... (4) render a legend around the motion controller to teach users what to do
... (5) get events that can drive actions
... first proposal only addressed #1 and #2; thank you for excellent feedback

bajones: going through proposal now... this is PR 462 and very recent PR 499

<dom> Custom interface for controller button/axis state (Variant A) #462

<dom> Gamepad based button/axis state (Variant B) #499

bajones: two variants of how to expose more of the intrinsic state to developers. Started off originally saying 'input is a messy thing, several APIs will have conflicting ideas, maybe we can expose a single button' but that had restricted utility and only worked for some use cases.
... lots of helpful feedback saying we needed more capabilities.
... e.g. need to get access to whole state of controller, e.g. buttons and axis
... Also, what is the controller that the user is holding? This is important so that rendering the virtual controller looks like the actual controller being held.
... That's extra-important for tutorials where you need to point at specific buttons and say what they do
... ended up with a unique ID that specifies which controller type the user is holding
... e.g. "Oculus-Touch" or (per Alex) something a little more mechanical such as USBVendor+ProductID - not sure yet if this works for Android+Bluetooth and certain other platforms
... but otherwise, ideally it's not something that everyone has to hardcode themselves.
... A few caveats - (1) no handedness included, since there's already data on the controller that says this
... e.g. oculus touch controllers have handedness
... And (2) this is explicitly carved out for privacy-sensitive situations, where the UA can report "unknown"
... e.g. the UA's privacy policy says you can't provide the controller ID, and then the site needs to render something generic
... we're trying to avoid enabling developers to filter out users based on input type, i.e. "If your controller isn't a Vive, you can't access this site"
... but it's probably a good idea to use this ID as a basis for a lot more mapping... and ideally there's a community-driven, open database that allows developers to fall back to a database of controller IDs and types instead of hard-coding into the site or the UA
... proposal breaks into two parts regarding how we expose axis and button state
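
[Illustrative sketch] The fallback chain just described -- a community database of controller IDs, with a generic controller when the UA reports "unknown" for privacy reasons -- might look roughly like this. The database contents and the `getControllerProfile` name are made up for illustration, not part of any spec:

```javascript
// Hypothetical lookup: resolve a controller ID reported by the UA to a
// rendering profile, falling back to a generic model when the UA reports
// "unknown" (privacy policy) or the ID is missing from the database.
const COMMUNITY_DB = {
  // Entries like these would live in an open, community-maintained repo.
  "Oculus-Touch": { model: "oculus-touch.glb", hasThumbstick: true },
  "045E-065D":    { model: "wmr-controller.glb", hasTouchpad: true },
};

const GENERIC_PROFILE = { model: "generic-controller.glb" };

function getControllerProfile(controllerId) {
  if (!controllerId || controllerId === "unknown") {
    // UA withheld the ID; render something generic rather than failing.
    return GENERIC_PROFILE;
  }
  return COMMUNITY_DB[controllerId] || GENERIC_PROFILE;
}
```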

<NellWaliczek> The renderid proposal part is here: https://github.com/immersive-web/webxr/pull/479

bajones: They are labelled "Variant A" and "Variant B" in the PR

<dom> Add renderId to XRInputSource #479

bajones: tried previously to do something based on the GamePad API, where we'd inject XR-style gamepads into the gamepad array
... it was weird, and there wasn't a proper mapping, and because of that there would be cases where developers would improperly mask out gamepads because they weren't identified properly
... so instead we're inventing a new thing that looks kind of like GamePad but is a little more specifically structured towards what we're trying to do directly
... [in PR see the XRTrackedControllerState and XRTrackedController interfaces for more details]
... this is a more attractive option because it's purpose-built, it's clearly XR-centric - makes types of controllers easier, e.g. triggers, touchpads, joysticks

<dom> [reviewing IDL at https://github.com/immersive-web/webxr/pull/462/files#diff-6ea1f8ee087a12d7d770e854f7dbadb7R428]

bajones: also allows the group to extend if we get new requirements

bajones: in XRTrackedControllerState there is also a name string; it's a localized string so that it can work across languages if the developer puts it on-screen
... we don't love the interface name. "Input" is overloaded, also can become XRInputInputtyInput if we keep using it... but in the process of talking through the proposal we reviewed the GamePad API, it's already shipping in browsers and the language already exists.

<dom> [reviewing https://github.com/immersive-web/webxr/pull/499/files#diff-6ea1f8ee087a12d7d770e854f7dbadb7R416]

bajones: When we reviewed the Gamepad API structure and the GamepadButton API, they include things like 'connected', which we'd ignore; timestamps, which we don't need; and 'mapping', which could be reused to communicate that we're using an XR standard

<dom> [reviewing mapping at https://github.com/immersive-web/webxr/pull/499/files#diff-6ea1f8ee087a12d7d770e854f7dbadb7R344]

bajones: When using the mapping value xr-standard, buttons[0] would be primary trigger, buttons[1] is always touchpad/joystick click, etc. (see PR)
... so we could take all the common elements that we expose and then, rather than putting the gamepad into the gamepad array and doing a weird mapping, instead we'd put a gamepad source into the XR input sources. It wouldn't show up in the traditional gamepad array and that'd only be used for traditional game console-style controllers.
... upside of this approach - more compatible with gamepads. Downside - more documentation required for how mappings should be interpreted.
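
[Illustrative sketch] Under the proposed "xr-standard" mapping (Variant B, PR 499), a page might read the primary trigger like this; the shape of `inputSource` here is assumed for illustration:

```javascript
// Sketch of reading button state under the proposed "xr-standard" mapping:
// buttons[0] is the primary trigger, buttons[1] the touchpad/joystick click.
function readPrimaryTrigger(inputSource) {
  const gamepad = inputSource.gamepad;
  if (!gamepad || gamepad.mapping !== "xr-standard") {
    return null; // Unknown layout; don't guess at button meanings.
  }
  const trigger = gamepad.buttons[0];
  return { pressed: trigger.pressed, value: trigger.value };
}
```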

<Zakim> ada, you wanted to show how to add a message

bajones: we're going to have to do a lot of that work in either case, so it might be in our benefit to rely on work already done w.r.t. the gamepad API

nell: two related thoughts. First, I'm inclined to not hold up 'finishing' WebXR to create a brand new design for something that solves many of the gamepad problems.
... one opportunity with this approach is that we can work on improving gamepad, and decouples the spec work for XR from detailed input spec work
... tough to re-invent gamepad without all the gamepad people involved (though bajones is also involved with gamepad API)
... so that's the second thought, it'd be ideal if we can separate XR from detailed input work
... that's going on already with gamepad

bajones: after nell gives her section, let's do a strawpoll with people in the room, on IRC on whether the custom gamepad idea is good or not

<klausw> cwilso: wait for Nell

nell: [going through some more details on spec design] - recapped 5 ideas above on gamepad requirements
... 1. draw it. 2. when are buttons pressed, axis, 3. visualize data from #2, 4. legend for how to use controller, 5. eventing
... focusing on data sources, mapping back to visuals, and labelling for a minute (items #2-#4)
... web has rich tradition of things starting as open source projects
... when thinking about how to animate models, a user agent is not going to deliver a 3D model file for the controller - not their business, and they're big binaries, and it's not clear how to rig the model
... so what about a schema where for a given gamepad ID, what if we could define a mapping that broke down expectations for that model
... then we put those expectations in an open source public repo next to a 3D model showing how to rig that model based on those expectations

webex: fail. then recovers. apologies.

nell: ideally then vendors can contribute their own models and schemas for how the models should be represented.
... now talking through schema and how it should be represented (this is brand new)
... [referencing branch at: webxr repo -> branches -> gamepad mapping schema] - this isn't a diff, they are brand new files

<dom> gamepad-mapping-schema branch

nell: referencing: https://github.com/immersive-web/webxr/tree/gamepad-mapping-schema gamepad-mapping-schema branch
... talking through schema in the context of the schema (JSON) files for a few controllers

<dom> oculus touch description

nell: thanks dom
... example: https://github.com/immersive-web/webxr/blob/gamepad-mapping-schema/gamepad-descriptions/Oculus-Touch.json oculus touch description
... id tells you what the controller is
... some of the names aren't ideal, ignore them for now.

<dom> [reviewing https://github.com/immersive-web/webxr/blob/gamepad-mapping-schema/gamepad-descriptions/Oculus-Touch.json#L71]

nell: looking at the physical building blocks of motion controllers: thumb sticks, D-pads, buttons (analog or digital, touch or not), and ...
... buttons: analog, digital, touch or not... but they're pressed, or not.
... thumbsticks: have a direction, and a pressed state
... touchpads provide X/Y data indicating where your finger is

<dom> [reviewing https://github.com/immersive-web/webxr/blob/gamepad-mapping-schema/gamepad-descriptions/045E-065D.json#L79]

nell: touchpads might also have edge or center-press buttons; almost D-pad like where you can put your finger on the diagonal
... note that that has rendering consequences, but this file breaks apart rendering (animation) from data sources
... note that schema file follows gltf conventions - could switch to ids later to match
... then in each element, 'gamepadAxisIndex" is how you map back to the gamepad mapping
... where this gets interesting is when we look at the responses section

<dom> [reviewing https://github.com/immersive-web/webxr/blob/gamepad-mapping-schema/gamepad-descriptions/045E-065D.json#L100]

nell: it's another array of chunks with a set of response types. These are the pieces of the model that contribute data to decide how the model is deformed
... for example, if you touch the thumbstick, it deforms the parts of the thumbstick that move relative to the parts that don't; the values of each x/y axis and the button value combine into a single transform that should be applied to the thumbstick.
... the maker of the model file should put nodes that have no model data, just used for transforms.
... then the schema defines which node in the tree should have the transform applied to it
... there isn't a dedicated thumbstick press item since you need to touch it to press it
... touchpad 'touch' moves a dot indicating where it'd have to be on the model.
... extents show bounds. Don't need to know anything about the model file itself, schema maps to the file directly.
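
[Illustrative sketch] The 'responses' idea -- combining axis and button values into one transform applied to a schema-named node -- could look like this. The extent values are made-up stand-ins for what the schema's min/max extents would supply:

```javascript
// Illustrative: combine thumbstick x/y axes and press value into a single
// transform for the model node named by the schema.
function thumbstickTransform(x, y, pressValue) {
  const MAX_TILT_RAD = 0.4;   // assumed rotational extent from the schema
  const MAX_PRESS_M = 0.003;  // assumed press depth from the schema
  return {
    rotationX: -y * MAX_TILT_RAD,  // pushing forward tilts the stick forward
    rotationZ: -x * MAX_TILT_RAD,  // pushing right tilts the stick right
    translationY: -pressValue * MAX_PRESS_M,  // clicking sinks the stick
  };
}
```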

<dom> [reviewing https://github.com/immersive-web/webxr/blob/gamepad-mapping-schema/gamepad-descriptions/045E-065D.json#L17]

nell: wanted to make sure we were as flexible as possible
... in our example, data source 3 is the touchpad - the 'labelTransformId' is what you use to indicate the safe place on the model for a label
... that's the components that say what makes up the visualization of the controller
... if you look at the oculus-Touch file

<dom> [reviewing https://github.com/immersive-web/webxr/blob/gamepad-mapping-schema/gamepad-descriptions/Oculus-Touch.json#L5]

nell: there's also a 'hands' section which gives you a connection between the visualization of hands with the controller IDs and how they should be connected
... note that the left and right hands aren't the same in this case since the controller isn't symmetrical
... looking at primaryAxes, there's typically a default suggestion from manufacturers as to what the default button should be. This is still hand-wavy
... in general this approach could work with either of the proposed Variant A or Variant B

<Leonard> +q

nell: there's also one more top-level thing not in this example, which is user agent overrides. This section would let developers avoid manually hacking around particular UAs; it would give an escape hatch around bugs in the browser.

<dom> Schema explanation

bajones: So this is a lot. The intent is to NOT require the use of XR controllers in the browser, rather it's a way to provide a more robust, professional experience to web applications that want it.
... if you're just building a video player you can probably ignore all of this, just render a remote control in the user's hand.
... you can build something consistent and reusable without diving into the deep end.
... but we know there are developers out there that want to provide the best experience for their users. This is an attempt to provide that without being an encoding nightmare for everyone involved.

nell: related to that, if it's in an open source repo then proposed models could be ingested and tested online so that nodes are hooked up correctly

josh@mozilla: to clarify, this isn't a spec, this is something you'd bundle into your application, right? (nell: yes)

josh: I like the idea of reusing old APIs. The problem I see is the advantage/disadvantage is that it's tightly coupled to the gamepad API and associated bugs... generally the implementations are buggy and inconsistent.
... concerned about coupling to something outside our control

nell: me too. As part of the schema, X-axis could have a left and right value assigned to it, so that the file says what range to use instead of the gamepad API to override bugs
... but if a UA has a failure to comply then that's where the UA-override section would apply.
... one thing that's flaky with gamepad is the getGamepads() call itself. That wouldn't happen here, since the controllers are returned from the XR input sources

bajones: and, as an implementer of gamepad in Chrome.... I'm personally deeply sorry.
... and please log bugs!
... for this spec, though, we can say 'we're using the API in this way, and when we do these restrictions apply'
... but there will always be bugs, we should fix them when they come up. In general, though, I agree with your concern and I believe we have mechanisms to address issues.

klaus@google: this proposal defines physical properties of the input device - question: it's common in some cases for buttons to be combined into quadrants

klaus: is the idea here that the community could help define nuances and different use cases and mappings?

nell: there's a lot of opportunity for improved API shape; community overriding buttons might be valid, but we're not sure yet, it's early. We can iterate on schema later. Not trying to pack all the solutions into the first iteration.

klaus: Do we want mapping to allow updates later?

bajones: it's not novel to have an action mapping approach; from my perspective it feels like a spec rabbit hole where we're trying to ship something.
... doesn't personally feel like it's the right approach for v1 of the API to have complex mappings. Could in theory have JS libraries that provide mapping capabilities in the future; maybe something at the UA level for mappings in the future.
... e.g. VR API can give similar functionality by scheme. We'd have opportunities to make those more nuanced at the UA level without doing a web API
... but may have action mapping API later
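
[Illustrative sketch] The JS-library action-mapping layer bajones suggests could come later might look like this; all names here are hypothetical:

```javascript
// Hypothetical action-mapping layer living in a JS library, not the spec:
// apps bind semantic actions to button indices rather than reading raw
// indices directly in application code.
class ActionMap {
  constructor() {
    this.bindings = new Map(); // action name -> button index
  }
  bind(action, buttonIndex) {
    this.bindings.set(action, buttonIndex);
  }
  isActive(action, gamepad) {
    const index = this.bindings.get(action);
    if (index === undefined || !gamepad.buttons[index]) return false;
    return gamepad.buttons[index].pressed;
  }
}
```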

<Zakim> ada, you wanted to ask about l3/r3 style buttons

ada@samsung: wanted feedback about touch on joysticks -- could be useful for controllers with L3/R3 buttons; was trying to see if that was something useful

<Zakim> johnpallett, you wanted to ask whether specific types of inputs are necessary or whether generalized axis/input data is enough

<ada> JohnPallet, Google: Were alternatives considered for the schema implementation?

johnpallett: specifically, a more generalized axis/button approach that didn't rely on specific D-pads, Touchpads, Thumbsticks?

nell: one point of clarification, hand input wouldn't be part of this, e.g. a knuckles controller that has buttons and joint controllers would have multiple inputs
... we can rev the schema to support new types of inputs - this is a library in the proposal - e.g. some new 'D-pad'
... can support fallbacks in the future if a UA is behind the times. For right now, we did an inventory of motion controller hardware that I'm aware of to make sure all known components are covered.

johnpallett: clarification: but do you need defined types? can you just use axis/buttons or is there a reason for 'd-pad'?

nell: not sure yet. Was thinking about things that contribute to motion. It's possible that we could encapsulate all data source types into a single model for motion and inputs

<klausw> ^^^ clarification for 10:58 "combined into quadrants" - a trackpad may be treated as four buttons by splitting it into quadrants, for controllers such as the Vive wands which don't have many buttons. Question was if this is up to app/library, or if we want to expose a mapping layer at a lower level.

nell: this current revision was a way of thinking through ways of combining ideas... will continue to explore

bajones: also, this isn't tied to a particular file format

nell: used the term 'node' - we need to understand the dependency

leonard: likes the idea of not holding back XR 'version 1' to get this all correct
... curious about capturing motion of the entire controller, or detailed press like a Wacom tablet, will these be considered?

bajones: actual motion (e.g. velocity, acceleration) - because we're not trying to add onto the gamepad API (unlike previous, not good versions) - any sort of pose or velocity information would come from the input source and space attached to that.
... we have an xrPose and there's a transform, and that's where we could slot in acceleration and velocity. We haven't right now because it's not clear whether can get a consistent signal from the native APIs.
... but we have a way to scale up to that if necessary.
... larger-canvas tracking might require more axes in the gamepad API, not sure about ways of doing multi-touch input. No great answer at this time, may require something other than the gamepad API

leonard: not clear on future-proofing, though?

bajones: could extend the default mappings that we have for the gamepad API. Likely evolution: start with a gamepad approach, then move to an action-based approach.

<Zakim> alexturn, you wanted to discuss semantic clustering of the button mappings vs. explicit trigger/touchpad/joystick/grip mappings

alexturner@microsoft: This feels like layer A / layer B, where things exist at the app layer, and there's a question about how things could get built on top.

alexturner: Need to figure out how much of this needs to appear in the standard itself vs. an external thing.
... proposal may be mixing two separable things, (1) do we get shot down if we do a new API? and (2) do we have explicit grip/trigger/other types of inputs? Are these two design principles that could be separated?
... could be some benefits to relying on a different approach to semantic clustering
... and, what if someone uses button[0] and assumes it's reliable in a particular way, and then ignore voice-based or hand-based input
... could be subtle accidents as well - feeding data from one type of input to another type by accident (e.g. default at 0, vs not) - wonder if this next layer of abstractions will help?

bajones: going with gamepads forces us into semantic clustering, you get arrays and indices and that's it.
... that's why we want this more detailed approach of mapping
... this isn't the final say of how we expose input through the API. It's intended to be 'most impact, least effort' to ship API and then we'll have to layer more data on top of the API later.

alexturner: I think we can still separate them. Think of gamepad inputs, if you had same degree of explicitness you'd have more empty slots but you'd have extended IDs and other mechanisms for dealing with explicit types.

cwilso: observation, if we are going to rely on gamepad it seems like it'd behoove us to take a stronger hand in developing the gamepad API
... right now that connection is <fanfare> bajones who is an editor there

nell: At TPAC the chairs were asked whether the gamepad API should become part of XR

cwilso: gaming on the web is the other use case outside of XR. Gaming on the web doesn't have a group working on it.

nell: my answer at TPAC was - let's prove we can ship one spec, then consider a 2nd one.
... but yes we definitely should invest in gamepad API over time

bajones: Let's do a strawpoll, this is non-binding
... do you have an opinion on whether we should go with (1) custom button/axis solution, or (2) ride on gamepad coattails?

nell: (note that the question about the schema is orthogonal, ignore that for now)

<ada> strawpoll imminent: If you are not in the room +1 for should we do a schema or +2 for gamepad

<albertoelias> +2

bajones: Type +1 for custom solution, +2 for gamepad solution

<lgombos> +2

room: 1 vote for custom solution (#1), many many votes for gamepad solution (#2)

<albertoelias> I think we should aim for the simplest route to get the spec out there, but that gives developers access to all the underlying details controllers provide. We can then aim for nicer APIs also looking at what kinds of things libraries do

(thank you IRC participants for voting, overwhelmingly for #2)

<ada> Ada would like to change her vote for the gamepad solution, so it's 100% support

thanks alberto, read your note to the room

originOffset behaviour #477

<dom> Does originOffset behave differently for identity reference space? #477

ada: https://github.com/immersive-web/webxr/issues/477 Does originOffset behave differently for identity reference space? #477

bajones: there's ambiguity in the text surrounding the originOffset for reference spaces
... it's designed to let developers say what origin all poses should be relative to. Useful for touch scrolling on inline videos, etc.
... but there's ambiguity in how it's described which could be one of two things. Purpose here is to pick which one is best.
... by default, origin of virtual space is on floor in center of room
... first way of thinking about origin offsets is that you take origin of physical room and shifted where it appears in virtual space, in the image it's Z=-3
... and X=1. So the whole room moves by X+1 and Z-3
... this was the intent, but it's easily interpreted as option 2, where the same offset is applied to the origin of the VIRTUAL world instead
... so it's either offsetting the origin of the real world, or the virtual world. Either is valid.
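
[Illustrative sketch] With a pure translation offset (x: 1, z: -3) as in the issue's diagram, the two interpretations differ only in sign; with rotations involved they differ in multiplication order. This is the scribe's reading of the two options, not spec text:

```javascript
// Interpretation A: originOffset positions the TRACKING (physical) origin
// within virtual space, so tracked poses are shifted BY the offset.
function applyOffsetA(pose, offset) {
  return { x: pose.x + offset.x, z: pose.z + offset.z };
}

// Interpretation B: originOffset positions the VIRTUAL origin within
// tracking space, so tracked poses are shifted by the INVERSE of the offset.
function applyOffsetB(pose, offset) {
  return { x: pose.x - offset.x, z: pose.z - offset.z };
}
```

A pose at the physical origin ends up at (1, -3) under A, but at (-1, 3) under B -- the whole room appears to move in opposite directions depending on the reading.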

<alexturn> +q to discuss if it's well-formed to reason about the relationship between two XRSpaces if you offset both of them

<Zakim> alexturn, you wanted to discuss if it's well-formed to reason about the relationship between two XRSpaces if you offset both of them

bajones: strawpoll, should origin offsets be applied to the tracking origin (i.e. conceptually to the physical world) - A
... or should it apply to the virtual scene's origin - B

nick_@_8thwall: it's common to see this in community apps where the user wants to position the camera in the scene they're creating

nick: so if you're starting at t=0 you want to start at a position in the scene.
... that's consistent with A. What I'm stuck on with B is directionality?

<Leonard> +q

bajones: order of operations is spec'd out in WebXR and should be unambiguous but...

<fernandojsg> Hirokazu_: could you please mute yourself on webex? I'm hearing some background noise coming from your mic

bajones: would expect room to rotate around new origin in A
... but in B... would need to think about this a bit more. This might be an argument for A?

alexturner: one argument for option B: what we're offsetting is moving something in its natural coordinate system. So it's fairly unambiguous to describe coordinates as poses in the natural space, and then you're just specifying the origin.
... I find it easier to offset spaces relative to each other; this means moving the virtual space relative to a real-world origin.
... this means I can multiply in the offset relative to poses that I get back. Using B it's easy to address multiple offsets, but with A it's harder.

bajones: the math is unambiguous, so it's really a question about which makes more sense.

leonard: does this apply to AR?

bajones: both in VR and AR you'd be adjusting virtual content and adjusting to physical world in some capacity, which we choose doesn't affect the math but might change usage and developer understanding

<alexturn> +q for alexis to ask a question

bajones: note that everyone's going to experiment anyway :)

brendan@apple: we flipped the trackpad axis to move towards a more direct metaphor for input

brendan: where people had a mouse wheel rather than directly manipulating what user was working on
... since we don't have a point of view in the XR spec, it means it probably doesn't matter as much

<Zakim> johnpallett, you wanted to clarify understanding of Alex's point

josh@MOZ: Feels like the scrolling problem which dates to the late 70s so they just picked one

alexturner: Option B is about having a natural origin, and then you're moving virtual origins relative to it

<Zakim> alexturn, you wanted to discuss alexis to ask a question

johnpallett: is working on clarifying this for the IRC. :)

bajones: if you think about this as window positioning - it's totally natural to position a single window relative to my natural space
... and that matches AR as well
... but if you're fully immersed in the window, though, it starts to feel weird and backwards.
... so either will feel wrong in some circumstances
... So, strawpoll time!
... IRC strawpoll - issue 477

<ada> vote with +a or +b

bajones: If you like Interpretation in issue 477 please put +a

johnpallett: (commentary: this is offsetting the physical, natural origin relative to virtual space)

bajones: If you like Interpretation B please put +b

johnpallett: 7 in the room vote for A
... 12+ people in room vote for B

<fernandojsg> dom: could you please mute the people on webex? someone is snoring and I can't hear you correctly -_-

bajones: OK - there's no perfect solution, going with B for now. Thanks everyone.

Lunch. Back in 1h10m (around 1pm Pacific Time)

<Manishearth> scribe: Manish Goregaokar

<Manishearth> scribenick: Manishearth

<josh_marinacci> blerh

<scribe> chair: trevorfsmith

<scribe> chair: cwilso

<scribe> chair: ada

Session creation (#423,#424,#433)

<dom> Allow session creation to be blocked on required features #423

<dom> Define how to request features which require user consent #424

<dom> Added session feature handling to the explainer. #433

NellWaliczek: first a recap; this has been dormant for a bit

bajones: so for a little while there was this sense that we really wanted to have a way to do permissions within the api that tried to alleviate permissions dialog fatigue for users
... esp in AR use cases if you want to share environment information that's very privacy sensitive
... you need user consent, especially for cameras, environment objects, etc
... we were envisioning all these apis where you need to be able to request consent for various things but we don't want them to show up as modals
... there was some r&d saying we could ask for a bundle of perms at session creation time up front, which can show up as a single modal
... we've gone back and forth on that, we haven't committed yet, because this would be quite a different pattern from what happens on the web today
... it's unlikely that we as a tech focused WG will come up with a solution for permissions fatigue on a whim
... so it feels weird to just say we wish to inject a new security model in our api and run with it
... there was a very direct conversation with NellWaliczek about what we actually need this for today (e.g. camera permissions)
... we prob shouldn't design an api we don't have many uses for ... yet
... this has led us to a point where we're looking for the required and desired features list, so it's kind of on hold on the editor's mind

NellWaliczek: the required and desired list ask comes from the permissions side but also because we don't want to spin up sessions when we will shut them down immediately
... things we're thinking of are: camera access, geolocation/orientation, spatial tracking stuff
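
[Illustrative sketch] A bundled required/desired feature request at session creation -- letting the UA show a single consent prompt and reject early -- might look like this. The option keys and feature names are hypothetical; nothing here is decided:

```javascript
// Hypothetical shape for bundling consent-requiring features into one
// session request, so the UA can prompt once up front instead of showing
// modal after modal mid-session.
async function startSession(xr) {
  try {
    return await xr.requestSession("immersive-vr", {
      requiredFeatures: ["spatial-tracking"], // reject session if unavailable
      optionalFeatures: ["camera-access"],    // proceed without if denied
    });
  } catch (err) {
    // Session creation is blocked up front rather than failing later.
    console.log("Session rejected:", err.message);
    return null;
  }
}
```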

<johnpallett> +q to ask whether the privacy & security explainer has been used as an input to the session creation conversation

NellWaliczek: our goal is not to come up with a design, but to ask if the first version of webxr should attempt to unify such a model
... want to make sure we're not overengineering a solution; see if we can collab with privacy group (etc)
... i'd like to open the floor for thoughts on this

cabanier: we talked about this before when we thought y'all were going to look into the permissions api
... initially thought we could just inherit from the permissions of the origin -- why can't we do that?

NellWaliczek: we may have looked into it at the time but i can't recall if we concluded on something. but also we need to address the other side of avoiding session creation when perms aren't available

<max> Manishearth - you can use ... if the same person keeps talking while scribing

Manishearth: thanks


NellWaliczek: the only thing today that's qualifying is the spacial tracking thing

<dom> -> https://wicg.github.io/permissions-request/#api Permission Request API (split off from https://w3c.github.io/permissions/)

NellWaliczek: we can have it be an additive thing for now

<trevorfsmith> Nell and Rik are discussing that there are aspects of checking that hw can support a feature that are different than checking whether the user gives permissions.

NellWaliczek: some things may be permissions gates, but some may also be things the hardware doesn't support

<Zakim> johnpallett, you wanted to ask whether the privacy & security explainer has been used as an input to the session creation conversation

johnpallett: so i don't have answers. but i think part of the discussion is on inputs and that's also been happening on the privacy and security repo
... there are two conversations here, one is the challenges with the permissions structure
... the other thing is a partial list of what the UA may wish to ask user consent for

<dom> Immersive Web Privacy and Security

johnpallett: happy to have the discussion but a lot of this info already exists

<dom> Explainer += [Cameras, Permissions] #15

NellWaliczek: should we recap this, or table for now?

johnpallett: can recap what i did at tpac

NellWaliczek: let's strike for now, talk about it on the next WG call and sync with johnpallett later

cwilso: i think we should have this conversation now

bajones: to give johnpallett a bit of time to prep let's switch to talking about hit testing now

Hit Testing and Anchors

NellWaliczek: i have a couple PRs in the queue

<johnpallett> johnpallett is ready to present the privacy slides from TPAC

NellWaliczek: first is about viewer space, building on the work that was done to unify pose retrieval behavior

<dom> Add XRSession.viewerSpace and tidy XRSpace explanations #491

<johnpallett> (aimed at chairs) :)

NellWaliczek: #491, #492, #493
... #491 adds a viewer space object so you can relate viewer to other xr spaces in the world without mathematics
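(a rough sketch of what #491 enables; `viewerSpace` and `XRFrame.getPose` are the names assumed from the PR, and `metersFromViewer` is a hypothetical app helper, not spec API:)

```javascript
// Hypothetical helper: distance from the viewer to some other XRSpace,
// using frame.getPose(target, base) instead of hand-rolled matrix math.
// `viewerSpace` is the object #491 proposes to add to XRSession.
function metersFromViewer(frame, viewerSpace, targetSpace) {
  const pose = frame.getPose(targetSpace, viewerSpace); // target relative to viewer
  if (!pose) return null;
  const { x, y, z } = pose.transform.position;
  return Math.hypot(x, y, z);
}
```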

<dom> Restructures input-explainer.md to improve clarity and prep for hit-testing proposal #492

NellWaliczek: #492 should be fairly noncontroversial, not adding anything just refactoring. i realized that when getting into hit testing i wasn't sure where to add things so i restructured it to get a more logical flow

<dom> Adding an explainer for real-world hit testing #493

NellWaliczek: will merge both unless someone complains by tomorrow
... moving to #493
... big shout out to max and blair and (?) and alex who did some important prep work on the hittesting repo that was open on the CG
... based on a bunch of the investigations and explorations ...
... i spent a bunch of time thinking about how to feather in the requirements for real world hit testing that would work across all platforms and hardware
... some points: #1 many uas are structured such that the tracking stuff runs in a separate process from the user's tab
... we have a choice where we can take that behavior and make it an async request, but this makes it near impossible to render a stable cursor
... the alternate is registering for hit test events from a particular source, which is useful for cursors
... with async you can have results packaged with the xrframe object
... if you look at how xr input sources are defined: they're created and destroyed during a tap, which isn't great if you're registering async handlers
... we want to do this in a way that avoids undesirable perf hits on folks who don't wish to use this for hittesting
... https://github.com/immersive-web/webxr/blob/eeb899d38657a6c2bded097566dc41912c2bb8da/hit-testing-explainer.md#requesting-a-hit-test-source
... here's an example
... i've added an alternate to address some concerns about delay
... https://github.com/immersive-web/webxr/blob/eeb899d38657a6c2bded097566dc41912c2bb8da/hit-testing-explainer.md#automatic-hit-test-source-creation
... the dev that actually wants to opt in to hit testing can do so by providing a hit test source
... the third use case is when you want to just do a single hit test
... https://github.com/immersive-web/webxr/blob/eeb899d38657a6c2bded097566dc41912c2bb8da/hit-testing-explainer.md#hit-test-results
... goal for today is to make y'all familiar with this
... it's our first "real AR" thing
... on top of this is what we can use to build anchors
... thinking through it: if we want to place an anchor that position is also tied to a frame
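(a sketch of the two halves of the proposal; the names `requestHitTestSource` and `getHitTestResults` are taken from the #493 explainer draft and may change before the spec lands:)

```javascript
// Register a persistent hit test source once. The async registration lets
// a UA that runs tracking in a separate process schedule work ahead of time.
function requestCursorSource(session, viewerSpace) {
  return session.requestHitTestSource({ space: viewerSpace });
}

// Results come packaged with each XRFrame rather than resolving async
// later, so a cursor drawn from them is stable rather than one frame late.
function cursorPose(frame, hitTestSource, referenceSpace) {
  const results = frame.getHitTestResults(hitTestSource);
  return results.length ? results[0].getPose(referenceSpace) : null;
}
```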

klausw: quick comment, we probably need to return the hit test result relative to the current frame (not the frame it was requested on)
... should we support both options?

NellWaliczek: yeah we can support both too

<alexturn> +q to talk about which frame

klausw: as long as we don't require impls to support both bc it may not be easy

<Zakim> klausw, you wanted to say distinguishing spec-level permissions from UA dialogs and recommendations? and to say I think impl restrictions may require hit test results for the frame

max: just wanted to say i love this, it's great, it's addressing things i haven't thought of before
... based on my understanding of the general use cases for async you generally want to just place an object somewhere
... i think ergonomically this is great, works well with thread boundaries, great job!

RafaelCintron: correct me if i'm wrong but the current way XRFrame is specced it's only valid within the rAF of the session
... so if we use old frames they may not work when the user calls functions on them

NellWaliczek: yeah, XRFrames are currently short lived, but we could pin XRFrames till their promises go away
... not sure how worried we should be about this

bajones: (this is mostly nell's work, but): one thing i wanted to point out here is in the ideal world you want to ask every single frame to give instantaneous hit results immediately
... but it's not feasible
... the tradeoff is: do i want to know instantaneously what the hit test is , i have to schedule that ahead of time so i get results on a future frame
... the other alternative is "i want to know a one time hit from this ray, which comes back to me whenever". in that scenario it's questionable how much you care about the exact data of the frame it happened on
... in most cases it doesn't matter *exactly* where the hit occurred. if the accuracy *is* critical we can do the hit test source route which gets us sync results
... i think this api design gives us a balance between "i need to know exactly" (slightly more latent), vs "i need to know basically" (faster, but inaccurate)

RafaelCintron: the fact remains we need to store around all the XRFrame data

NellWaliczek: one alternative design is to put the request on the session object with a promise
... feedback im looking for is not whether or not this is a problem : if this PR is a good start at a high level
... would hate to hold up the whole PR on individual issues we can file

<Zakim> johnpallett, you wanted to ask nell to cover virtual scenario

johnpallett: could you cover the virtual object stuff, particularly how the 3d engine would hook into what the intent of this is

NellWaliczek: when i originally sketched out the sample code i forgot we don't have occlusion
... so i wrote it so that you request virtual hit test wrt your pose and scene graph, and looked at which object was closer, and that's what you got
... it's problematic since real world objects can get in the way of the hit test
... sample code i put in here is that if you get a virtual hit test result that will always win, since you may have accidentally put a real world object in the way
... and app devs don't have enough info to prevent that
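(the tie-break rule from the sample code in miniature; this is engine-level policy, not spec behavior, and `pickHit` is a made-up name:)

```javascript
// Policy from the explainer's sample code: if a virtual hit exists it
// always wins, because the app can't reliably know whether real-world
// geometry accidentally sits in front of its virtual objects.
function pickHit(realHit, virtualHit) {
  if (virtualHit) return virtualHit;
  return realHit || null;
}
```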

bajones: one quick footnote: the TLDR of virtual hit testing is "engine code goes here", the spec doesn't actually give more than examples

johnpallett: one possible answer is to somehow extend this with things which are rendered with xr and things that are rendered with .. other things, but there are privacy concerns

<Zakim> alexturn, you wanted to talk about which frame

NellWaliczek: (this is just sample code, not part of the spec)

alexturn: in terms of timing it seems like there are three times here, time a (button press), time b (frame hit test query), time c (hit test is answered, in a different frame)

<johnpallett> johnpallett figured out that the virtual object hit test section of the PR was purely sample code and not part of the spec. Sorry it took me a bit to get there.

alexturn: the middle frame where you happened to make the request but didn't have the answer seems a bit arbitrary, it's probably not meaningful so we shouldn't focus too much on it

cabanier: it seems like you could have hundreds of hit tests at the same time, that let you scan the entire room.

NellWaliczek: they have to come from a source, can't have an offset on them. last time we decided we were not worried about this
... the idea was that you have an AR light vs full mode and the full mode just lets you hit test

cabanier: but such a session would need to ask for perms, yes

NellWaliczek: yes, this *must* be in an ar immersive session

<blair> we talked about AR Lite as being controlled by the UA, so it would be compatible with this

NellWaliczek: we can have lower-perm apis that let devs simply specify floor/etc info that won't leak issues

<blair> Ummmm ... what?

ada: you were saying you can only use this in immersive ar, what about inline-ar?

NellWaliczek: no such thing :)

<blair> what do you mean there is no lite thing?

<blair> oh, there's no inline, sorry. right.


we have inline-vr

<blair> audio in webex is crap, can't hear most of what's being said

max: i think we can just go ahead with what you have now and file issues. this async version will interact with anchors, so getting this out of the way so that we can answer questions about anchors is good
... this can be a convo we revisit as we move further

(sounds of agreement across the room)

<blair> +1

NellWaliczek: straw poll on the PRs?


<max> +1

<dom> +1

<bajones> +1


<dkrowe> +1

<johnpallett> +1

<blair> I'm unable to understand what most people are saying ... lol

<adrian> +1

<cwilso> +1

<dulce303> +1

<blair> +1

<Kip> +1.1

<alexis_menard> +1

<daoshengmu> +1

<klausw> +1

<dom> +♥

<trevorfsmith> +1

<max> @blair :(

<bertf> +1

<blair> yes

<josh_marinacci> +1

<jungkees> +1

<blair> I vote in favor of all the mumbling

<trevorfsmith> +🌸

cwilso: i declare this passed, go forth and merge

<blair> good job, nell

NellWaliczek: congratulations! we now actually are XR!

<trevorfsmith> WOOT

(loud clapping and whooping across the room)

cwilso: 10min break
... we have rewrangled the schedule
... not going to cover privacy (which we added) bc john wants us to look at the repo

FoV / ViewMatrix (#461/#447) Expose combined frustum (#203) Default FoV for Magic Window (#272)

bajones: there are several different topics we have clustered around dealing with different FOV forms

<dom> Remove ProjectionMatrix in favor of FOV values? #461

bajones: starting with #461
... we have two ways of doing FOV
... (snip)
... we also have, largely for historical reasons, a view matrix, a thing you can feed directly into webgl
... and there's also the projection matrix, a 16 element float32 array for feeding directly into webgl, which gives you all of the FOV values for a given view encoded into the format

the matrix is quite flexible

scribe: this can include many things like scale, view, etc
... mostly not used. but we have heard in the past, certain devices (hololens) broke the entire spec bc you can do arbitrary projection matrices
... we want to undo those changes now, maybe?
... is there a use for continuing to provide matrices but can we instead break this down into fov angles? some apis do this (oculus)
... impetus: i've heard anecdotally that people are using the projection matrix for fov values. i'm not sure if there's hardware out there that supports arbitrary projection matrices over FOV values
... would like to ask if we can/should remove the matrix in favor of FOV or just keep it
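(what "using the projection matrix for fov values" typically looks like in app code; a sketch, and `fovFromProjection` is a hypothetical name. note this only works for matrices that really are off-axis perspective projections, which is exactly the hardware concern raised next:)

```javascript
// Recover per-edge FOV angles from a 16-element column-major WebGL-style
// perspective projection matrix, using:
//   m[0] = 2/(tanR - tanL), m[8] = (tanR + tanL)/(tanR - tanL)
//   m[5] = 2/(tanU - tanD), m[9] = (tanU + tanD)/(tanU - tanD)
// This is the code that silently breaks on hardware whose projection
// matrices are not plain off-axis perspective projections.
function fovFromProjection(m) {
  return {
    leftRad:  Math.atan((m[8] - 1) / m[0]),
    rightRad: Math.atan((m[8] + 1) / m[0]),
    downRad:  Math.atan((m[9] - 1) / m[5]),
    upRad:    Math.atan((m[9] + 1) / m[5]),
  };
}
```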

cabanier: i've talked with the openxr folks who are running into the same issue, and we do need the 16-value matrix
... they say if you go and do away with the matrix the result will look blurry

bajones: okay, that's an important datapoint -- if we can identify at least one piece of hardware with this limitation we must handle it
... it's not unreasonable similar issues will crop up on other hardware
... not sure how we can address cases where people decompose these matrices into the four values
... and now you've broken this code on some hardware

Klaus_Weidner_google: the way i understand it is that each view can have different directions for their transforms which can already cause issues for a threejs camera


Klaus_Weidner_google: the projection matrix in ML is just a normal projection matrix

<Zakim> Manishearth, you wanted to give direct access to nullable fov values

Manishearth: perhaps we should just give them access to FOV values which are nullable so they're forced to think about it

bajones: folks may just make assumptions based on hardware they have and end up with lots of nulls

<alexturn> +q to talk about shader assumptions

bajones: kernel of truth: we should find out what the data folks are trying to actually get out is, and perhaps provide that directly

<Zakim> dkrowe, you wanted to see if we can provide both a matrix and additional data with no assumptions

dkrowe: is there also some value in giving values like FOV with the understanding that folks will still use projection matrix
... i.e. will people construct projection matrix from the FOV

<kaiping> U

bajones: even when there are red flags all over the place saying use THIS not THAT people will still do it wrong from looking in the devtools or inspector or something and it gets copied around
... we can't necessarily save people from themselves ;)

NellWaliczek: that said we try to create pits of success, not pits of failure
... carefully providing bits of data like "the culling FOV" (not the FOV) is one way to solve it

bajones: we can also do things like providing a culling frustum

<Zakim> alexturn, you wanted to talk about shader assumptions

<leonard> +q

alexturn: what do we do for the actual rendering projection, how do we communicate this to the user
... do apps assume that the two views are parallel, etc
... for the most part i've found that the projection assumptions are less about your engine ingesting the matrix
... but your shaders may not expect weird projection matrices,
... it may not be an actual big deal, but perhaps we will see it a lot, we should go for more compat
... the last part is: striking this balance between power and success is sometimes a bit of a negotiation process, so if an app is going to make such assumptions, by default we provide a "normal" projection matrix and engines can request the real matrix


bajones: this is a good point. there's a difference in how we end up providing the wrong info to a subset of users: doing the wrong thing by default vs people stumbling upon the wrong thing

<adrian> on the user end this is driven by copypasta


leonard: so this is a quick question: i know the matrices won't cause problems, but for simpler things will this interfere with or disable orthographic projections

bajones: i can't imagine what this kind of thing would be like

leonard: handheld or architectural situation?

bajones: might be out of scope, the way our pipeline works has high affinity to this
... high affinity to projection views. have a hard time envisioning it
... regardless, if such a thing were to happen, providing proj matrices is probably the safer bet

<adrian> I'm afraid of the wrong results (correct on most headsets) becoming a snippet on stack overflow or something, but which is wrong on a lesser used platform (like say ML)

<adrian> so I would say projection matrices should be the only input, leaving decomposition to the engine/user

<adrian> for which there would be copypasta code for sure

<Zakim> Manishearth, you wanted to an example use case

<Zakim> NellWaliczek, you wanted to ask about inline fovs

Manishearth: real world use case: porting an existing application that cares about these values. correct solution is to rewrite, but that's cumbersome

NellWaliczek: i'd hate for us to be in a situation where we encourage devs to always use projection in immersive but not in inline

<Zakim> klaus_google, you wanted to say quick handwaving of what is and isn't covered by FOV angles for matrices

klaus_google: to the best of my understanding if you express angles/etc as these matrices you can represent any rectangular thing as these. as long as the screen is rectangular you should be good with just angles

NellWaliczek: but that leads to the q: as that scrolls up the page (in inline), how do we give devs control over which way they want it to work

<Kip> Assumption: No curved displays in inline? https://images.techhive.com/images/article/2013/10/lg_g_flex_02-100066355-orig.jpg

bajones: straw poll time!

<cwilso> +1 if you think we should stick with a projection matrix, -1 if you think we should explore some other structure

bajones: who in the room feels reasonably strongly that we should stick with a projection matrix, or who feels we should be investigating alternatives to the proj matrix for communicating that info

<klaus_google> to clarify, left/right/top/bottom angles (+near/far distance) should be able to express a projection matrix for a rectangular screen in an arbitrary orientation in space, including tilted HMD screens. Forward vector points in a perpendicular direction to the plane the screen is in.

<cabanier> +1

<ada> +1

<RafaelCintron> +1

<adrian> +1

<klaus_google> -1

<dkrowe> +1

alexturn: are we voting on the exact format of delivery or the constraints we have?
... i.e. if we say we vote for a matrix are we voting for a fully general matrix

<trevorfsmith> +1

bajones: we are voting on a highly documented matrix or a hopefully self documented structure

<art> +1

<leonard> +1

<jungkees> +1


<Kip> +1

<alexis_menard> +1

<alexturn> -1

bajones: straw poll result: no clear conclusion. prob means addl discussion
... we should jump around to more FOV topics
... moving to #272
... great many cases in XR where we have the projection terms dictated to us
... hardware says "this is how you show it or it's wrong"

<dom> Default FoV for magic window canvases #272

bajones: i.e. i'm on my phone, just looking at the screen, for inline content, there's nothing dictating what the FOV should be
... there's really nothing to go off of unless you have fancy tracking tech
... ideally devs should be able to give us feedback on what the fov should be
... (a) what should the default fov be?
... (b) also give devs a way to specify what they want the fov to be
... do we want to allow devs to specify horiz or vertical FOV, sometimes they want one sometimes they want the other

<leonard> +q

bajones: what would be a sensible way to come to a reasonable default FOV so when you create a session .. something useful comes out even if you haven't set the numbers

(?): certainly within a mobile context you can figure out some of these numbers like skew wrt eyes

(unsure if i got that right, it was very fast)

bajones: there are some cases where we know exactly where your eyes are wrt that canvas
... this does butt up against privacy issues
... bc with this info you can do gaze tracking, which advertisers would just ADORE
... so we need to have some perms model before we expose this, so we still need a reasonable default

klaus_google: if you're using a phone the actual fov the screen extends over is (??)
... possible just return a null proj matrix and let the app figure out that it's arbitrary

bajones: would love to avoid people having lots of if statements in their render pipelines
... don't want this to get wires crossed with immersive mode

<Zakim> Kip, you wanted to recommend allowing user override (eg, wider to reduce dizziness)

<Zakim> klaus_google, you wanted to say do we even need a default FOV? Return a null projection matrix to clarify it's arbitrary?

<klaus_google> ^^^ 15:05 clarification "screen extends over" - the actual angular extent of a phone screen at arm's length is very small, so you'd only see a small part of the scene when you use it as-is. Usually you'd want an arbitrary larger FOV for that scenario.

leonard: 0.4 - 0.5 is good in my experience

bajones: we could allow for crazy nonuniform scaling but this can break assumptions
... i feel like we should allow for both and default to horiz if we have to guess

<klaus_google> ^^^ 15:07 clarification for "0.4 - 0.5" - that's 0.4-0.5 times Math.PI. 0.5 times PI would be 90 degrees.

<adrian> why not just leave it up to the UA

<adrian> and make it choose something "reasonable"

<dkrowe> +q for changing FOV after inline session begins

Nick-8thWall: just a couple observations from what we see: at 8thwall we encourage devs to not just show the camera feed, so we ask them to expand the FOV
... so we actually encourage devs to be aware of this issue and handle it themselves so to me it shouldn't be a requirement in the api to return a fake value
... you're already communicating things like your clipping planes in these apis

Nick-8thWall: so it's not too hard to ask what the fov should be

bajones: fascinated with the point about ar without the camera feed, thank you for bringing it up

Kip: mine kinda segues into that. i shipped a game with a magic window without a camera feed
... we got some insights about why people want the FOV
... there's a bit of a sensitive issue related to a11y. age of users also came into play. also limitations of eyesight
... given the ability to vary FOV they were able to hold the devices more comfortably
... giving the UA control over FOV lets us tweak these things per-user

bajones: this is a good point. the structure of our api means the UA always has the final say, so this is all hinting to the UA
... so the UA can have the ability to do what it wants

Kip: also, streaming contexts (e.g. on twitch) may wish to tweak this

<Zakim> dkrowe, you wanted to discuss changing FOV after inline session begins

dkrowe: question about fov in inline sessions i had: i can see the value of having a consistent rendering path but if someone already has some application (some webgl app with a camera etc) and they're using the inline session just to get the tracking
... (with a headset it makes sense to use the matrices), if you have an inline session you may want to switch between multiple cameras but in immersive you may want to lock to one
... rephrasing: it seems like making them specify fov upfront means this is what folks will opt in to to keep their rendering path

bajones: if you're talking about an existing webgl app i feel like they won't take advantage of the inline capabilities
... but if it's ground-up immersive we already have feedback that folks want to keep a single rendering path
... we have an api like updaterenderstate where changes you make occur in the next frame, so you can't change properties in the middle of a frame. but you can change them at any point in the run time, and if you gave such a hint you can use that to do things like pinch zoom
... it's not just something you set at creation time and it stays forever
... (this is an old issue we should update)
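(the pinch-zoom pattern bajones describes, sketched under the assumption of a render state field named `inlineVerticalFieldOfView`; both the field name and `applyPinchZoom` are assumptions here, not settled spec:)

```javascript
// Changes submitted via updateRenderState take effect on the next frame,
// never mid-frame, so a pinch gesture can just keep submitting a new FOV.
function applyPinchZoom(session, baseFovRad, zoomFactor) {
  // Clamp so a runaway gesture can't request a degenerate frustum.
  const fov = Math.min(Math.PI - 0.01, Math.max(0.05, baseFovRad / zoomFactor));
  session.updateRenderState({ inlineVerticalFieldOfView: fov });
  return fov;
}
```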

<Zakim> NellWaliczek, you wanted to respond to nick

NellWaliczek: i want to comment and clarify something Nick-8thWall said
... you said folks were using 8thwall as a control mechanism for an opaque experience
... in the context of webxr this would be an inline session that has gone fullscreen with tracking access
... it's not actually ar as you're not seeing the world, but just using tracking -- it's an "opaque experience" that has access to tracking
... related to the questions of should we or shouldn't we allow inline sessions to have matrices go through XR, i don't see much value here
... i feel they should be specced to be nullable but non-null for immersive
... nothing XR adds to the mix by having them be non nullable, aside from *one* case: drawing rays for a screen tap
... the catch is if we do this we can't have screen-based input sources behave well
... doesn't seem unreasonable to have a special system but limited to inline sessions
... i think we have enough info to say: we don't want arbitrary defaults
... and dev can provide them

bajones: i don't think we want to say we don't want arbitrary defaults

NellWaliczek: but we need a way to set them from the outset

bajones: agreed

<Zakim> johnpallett, you wanted to ask Kip whether they have details on how precise the custom FOV needs to be

NellWaliczek: okay, i think i have enough to make a PR here

johnpallett: followup about a11y from Kip's comment: is there a fingerprinting risk?
... and do you have data on how precise this needs to be?

Kip: so the ui wasn't an analog control screen, it was basically a pinch-zoom-ish ui, which we can roughly quantize

johnpallett: can you list specific use cases?

Kip: some people needed to hold the device further away to compensate for eyesight, but now you need to view a narrower FOV so you can see the text
... and others are holding it closer and need a wider fov

johnpallett: will look into research from a privacy pov

<josh_marinacci> if we have extra time I would like to do a lightning talk.

lightning talks

RRSAgent please draft the minutes

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2019/01/29 23:42:58 $
