<atsushi> rrsagent. make log global
<atsushi> Charter Review [Chris and Ada]
<atsushi> rrsagent. make minutes global
<trevorfsmith> Scribe: Trevor F. Smith
<trevorfsmith> ScribeNick: trevorfsmith
Chairs: Chris Wilson, Ada Rose Cannon
Brandon: The big status update since the last f2f is the WebXR spec has shipped in a couple of browsers. Chrome and Oculus browser. Is it in Edge?
bajones: And Firefox is on the horizon.
kip: Yes, soon.
bajones: That's pretty cool, people are already doing useful things with it. We still need to get the spec to candidate recommendation status and that's been on the editors to push the last bit. Holidays and other aspects have interfered, but we're pushing through and should have CR fairly soon.
... The AR spec is pretty far along.
Manishearth: The AR spec has one remaining item and will remain pretty small, and it's pretty close to ready.
bajones: For handheld, the AR API will be in Chrome 81 which is in beta end of next week. So, it's on track. Along with that we're going to be shipping hit-test functionality.
piotr: We're shipping non-transient and transient varieties of hit-testing. It's to be in 81, which will be stable mid-March.
bajones: Other work is happening, since those are still in progress we'll address those more individually over the next two days.
... We did ship the initial version of the WebXR input profiles lib that Nell was working on. So, that's code and art assets for controllers so it can be consumed by implementers. Three.js and Babylon are picking it up and others have expressed interest. I'll have a lightning talk about that later.
Manishearth: A major piece of work for the core is permissions so we need to get that through final review.
bajones: Another in-progress item is DOM overlay, which is being pushed mainly by Klaus from Google, primarily pointed at mobile (handheld) AR but could eventually be lifted up into headset AR.
... That is aimed to ship in Chrome 82.
cwilso: We used the input from CFCs for topics for this meeting. We have more room for lightning talks, and also if you have bigger topics let us know. The calendar is somewhat flexible.
Manishearth: I intend this to be mostly an intro. Right now the hand input repo is a bunch of Issues and I have an idea for an API but it isn't fleshed out.
... The idea is to have articulated hand input, knowing where fingers and hands are. Right now, hand input is done by emulating a controller with primary and squeeze input.
jrossi: We just shipped a version for experimentation, but that's not the final form. We're looking to learn from the API we built for Unity apps and learn from that.
Manishearth: Yes, Firefox also emulates a controller. We'd like full hand input but without fingerprinting and with other security aspects.
... The idea is that your hand is made out of bones and each bone is an XR space with extra parameters, and there's a hierarchy with named bones. Different platforms have different bones, so you index into the hand array.
... I invite people to take a look and I'm hoping to discuss this more at CG meetings, especially if there are specific things people want to talk about.
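As a rough illustration of the bones-as-spaces idea: a sketch only, where the joint names, ordering, and array layout are assumptions for illustration, not the proposed API.

```javascript
// Hypothetical sketch of indexing into a hand's joint array by name.
// The joint names and array layout here are illustrative assumptions;
// the actual proposal (immersive-web/webxr-hand-input) may differ.
const JOINT_NAMES = [
  'wrist',
  'thumb-metacarpal', 'thumb-phalanx-proximal', 'thumb-phalanx-distal', 'thumb-tip',
  'index-finger-metacarpal', 'index-finger-phalanx-proximal',
  'index-finger-phalanx-intermediate', 'index-finger-phalanx-distal',
  'index-finger-tip',
  // ...remaining fingers would follow the same pattern...
];

// Map each name to its index once, so per-frame lookups are O(1).
const JOINT_INDEX = new Map(JOINT_NAMES.map((name, i) => [name, i]));

// A joint a platform doesn't track would simply be null in the array,
// so engines can feature-detect rather than assume a fixed skeleton.
function getJoint(handJoints, name) {
  const i = JOINT_INDEX.get(name);
  return i === undefined ? null : handJoints[i] ?? null;
}
```

An engine would then resolve each joint's XRSpace pose per frame; the point of a flat array plus name lookup is cheap per-frame access while letting platforms omit joints they cannot track.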
Ravi: Would the API specify gestures or would it be something for third parties to implement?
Manishearth: We should expose data but not higher level gestures. That said, gestures are better for privacy so that's open for discussion.
bajones: Select and squeeze are part of the core XR input source so anything that's an input device is generally expected to produce those for broad compatibility. So unless hand input is divorced from the rest of XR input then we'd expect to have those gestures as that normalizes interaction with the other controllers.
Manishearth: There's one open question about what the input profile string should be, which we should close out.
Ravi: On our device we have a specific gesture for push.
bajones: Select and squeeze are left up to device providers to map appropriately. So, we don't specify that select has to be the trigger. So Oculus' hands, with a specific pinch-to-select gesture, make sense to map directly into the spec.
trevorfsmith: Are we handling loss of hand tracking separately from the regular API?
Manishearth: XR spaces give us all of the information that we need.
bajones: I was talking about hand tracking and a11y, what happens when someone is missing a finger?
Manishearth: See Issue #11. For a missing finger, implementers should be able to fall back to gestures that work. With an extra finger, you have similar issues but the additional problem of rendering the additional finger. I'm not sure that we can force implementers/frameworks to handle these cases.
bajones: I imagine that this relies heavily on platform implementations. The number of fingers also opens up specific fingerprinting since few people have more or fewer than five.
<Zakim> kip, you wanted to ask about balance of accessibility and fingerprinting
kip: Has there been thought about normalizing the hand models? Some people have unusually long or short fingers, so I wonder if there's a way to normalize that model to avoid fingerprinting. If you do that, what sort of constraints would you use to ensure gesture recognizers can still work?
Manishearth: I haven't thought about that, but it's something we should address.
kirby: What is the plan for the libs and hands?
bajones: We can't push a "canonical hand model" for obvious reasons and it's not clear what the right path forward is there. For the input library, the goal isn't a perfect representation of reality. There are specific models but also generic models. In that spirit, if we had something that was appropriately abstract then apps that aren't built with hands in mind could still display hands. It's still an open question.
... It depends on how we surface hands in the API and I'm interested in figuring it out.
Manishearth: The OpenXR API differentiates between "faithful" (controller-accurate) and "natural" hands.
Ravi: So, we're not planning to reveal a full hand mesh?
Manishearth: It's a thing but fingerprinting is hard to avoid with a full hand mesh, so we're tackling skeletons first. But, it's open to exploration.
<Zakim> mounir, you wanted to not move lighting earlier
avadacatavra: This isn't hand tracking related, but this might be a good time. My boss's boss noted that there are a few specs that are looking for standards positions from Mozilla or TAG review but haven't gone through the working group or had a second implementation.
mounir: We have to ask for a TAG review. I'm not sure what the other issue is.
avadacatavra: We're asking for the standards position and TAG review before it is in the WG or anyone else is working on it.
... Should Sean talk directly with Alex about this?
mounir: We are working in the open and getting feedback from others. This has been discussed for ages. The CG / WG stamp is administrivia. We've been shipping stuff in CGs for a long time. Hit-testing and DOM overlay are moving to the WG. I'd like to understand the issue. Is it the WG/CG split or is it the process?
kip: This is the first WG I've worked in, but we need more opportunity to work with peers and review with our security groups. The best we can do is to defer a position when we don't have time for security review.
Manishearth: The concern from my side is that the APIs are mostly fine but are designed only for handhelds, so the issue is less about having one implementation and more that it's only one form factor. The rush doesn't feel right.
cwilso: Speaking not as a chair, my day job is technical program management for Blink, which includes getting Blink engineers to work with the standards process. So loop me and Alex in. The answer is kind of complex. Reviews are often triggered by our "intent to ship" stage, which is meant to cause engineers to make sure that they're using the standards process.
... So, the effect is that we're often requesting TAG review much earlier, when it's still an experiment. We're looking at multi-stage TAG review or other review out in the open. Right now Blink engineers ask for TAG review before the standards group is ready.
... If you're concerned about it, then work with the engineer or me or Alex. The Blink and Chrome team tend to move fast. We understand the risk and I encourage Google engineers to be ready to change.
mounir: If you complain it will be taken into account. Chrome won't sit on their hands waiting for Mozilla.
... The issue is not two implementations, it's targeting a single form factor.
Ravi: We started working with Klaus on DOM overlay to make it less handheld-oriented. We also want more care taken to ensure support for both handheld and immersive displays.
klausw: For DOM overlay, I've shown both handheld and VR implementations. So, our main goal is handheld but it's unfair to say we aren't considering VR. This is meant to be a simple API and there's still room for a more complex layers approach. There's room for both.
Manishearth: I wish other browsers would implement it, but my main issue is that the design is only for handheld.
cwilso: You said that if the API is for handheld and not immersive then ...
Manishearth: Yes, my concern is that we need implementations on more than one type of use case (handheld).
klausw: Yes, that's the part I disagree with. I implemented it for VR. I would like other browser teams to implement this.
<Zakim> cwilso, you wanted to clarify
Manishearth: Yes, I mean mostly hit-testing. I know that DOM overlay has both.
cwilso: So, you're saying that if we go down a path for an API that should cover both but implement it on only one form factor, then we'll be unable to make good APIs?
ada: One process point: APIs will be developed in the CG until there are multiple implementations, so it won't be a standard until then. As long as it's still in the CG then there should be an expectation that the spec will change in the future.
Ok, we're taking a break.
<alexturn> We just figured the #immersive-web queue had become very productive and we'd worked out how to have 8 conversations in parallel.
<alexturn> Is there anywhere our WebEx video would show up, or should we turn it off?
<atsushi> alexturn: please check email at internal-immersive-web list
<alexturn> Thanks! We're on the WebEx for audio - just wondering if video is meaningful for this meeting.
<alexturn> I see no others with video on
<atsushi> the meeting room is joining via phone, so no video from here.
<alexturn> Got it - thanks!
<cwilso> Proposed charter: https://w3c.github.io/immersive-web-wg-charter/immersive-web-wg-charter.html
adarose: Ok, we're going to address the charter topic instead of AR lighting so people involved with AR lighting can be online at the right time.
<alexturn> Was hand input already talked about?
cwilso: Apparently I didn't send this out last week, so sorry about that. Our goal is in the next 18 minutes to review the new charter document based on feedback from you. We're not going to argue the merits of each item. We'll talk about AR lighting, then lunch, then we'll dig into this charter.
... We started talking about it months ago and the charter vote needs to be open for 4 weeks.
adarose: We'll extend the existing charter while that happens.
cwilso: The first link I sent is the charter and the next is a diff so you can see the changes.
... Feel free to queue up and ask questions while we review the diff.
... Basically, we removed some of the background; the scope is effectively the same with tweaks for new or retired products. We're still working on XR, big surprise. We didn't change much about the out of scope section. We did leave out of scope how the browser is implemented or designed. And we left out mechanisms for large scale AR browsing. So, no coordination for sharing content on a global coordinate architecture.
... This is a two year charter, but feel free to bring it up if that's a problem.
... WebXR is in there. Gamepads and AR modules are in there. There are a series of new modules that we expect to land in this timeframe. When we discuss later, everyone should weigh in on those (missing or unneeded). Hit testing and DOM overlay modules are in there. Lighting estimation. Hand input. Real world geometry.
... There are two with only one person requesting them: anchors and more advanced WebXR layers. Both are on the agenda for later. We also put in the "registry of WebXR gamepad assets", the WebXR test API, and the polyfill.
... We haven't declared a timeline for each of these but need to take a stab at that. The only other things that significantly changed is that we will collaborate with other groups who do horizontal review and the GPU for the web discussions.
... Does anyone have any questions about that?
mounir: I think the anchors API is something we should have.
cwilso: Yes, it's been around for a while.
mounir: This is an item that we're working on now. We expect to launch it soon.
rik: Are we discussing in-scope items now?
cwilso: We could talk about those deliverables now in the 10 minutes before AR lighting.
mounir: Is the test API a deliverable for this group?
cwilso: It's not clear whether that's a normative recommendation or a definitive deliverable of the WG.
mounir: I have no idea. Other groups don't mention them in the charters.
klausw: If there are new things coming up, would they be blocked?
cwilso: We can add things, but we want to make sure that we carefully define the scope. For example, we could create anchors because it was in scope of the charter. We couldn't work on things mentioned in the out-of-scope section. So, if we wanted to add an occlusion module then we can do that because it's in scope.
klausw: Things like computer vision?
cwilso: Yes, there are four things that mounir sent to the group and we should address those, too.
... Yes, we have some flexibility as long as it's in scope of the charter. It doesn't mean we'll have recommendations for each of them in two years.
ada: We can recharter if necessary, too.
cwilso: Yes, if it's something we feel strongly about we can recharter.
<Zakim> alexturn, you wanted to chat after the charter discussion to talk about the progress made in the OpenXR F2F today on normalizing hands across vendors in native APIs
cwilso: For some topics you want different people in the room so it would be out of scope.
alexturn: While I had folks here, there are hand tracking updates from the OpenXR meeting that's happening now.
Manishearth: To talk about the test API: we did have a meeting with test folks at TPAC and they said that the test API is the right way to do it, but we're the only ones who have done it. Most of them felt it was fine as a draft spec. At Mozilla there's an issue where they want to create a test API as a WebExtension API. It's not that all browsers need to agree to WebExtensions, but the test API as a WebExtension means we would need to freeze it.
RafaelCintron: When I talk to counsel about charters, they won't review them until they're almost final.
cwilso: I would hope by the end of the day today we'll have it pretty well locked except for remote participants who weigh in during the next bit. It'll probably take us until next week to have timelines in there, so the end of next week would be when we send it in.
... The process: propose charter, the AC votes (open for a month) where they send in problems or tweaks, then the AC votes to approve with those edits. So legal team edits can happen during that process, too. Ideally, sooner than later.
cwilso: Lawyers get involved when it's time to rejoin the WG, so after the AC vote.
<Zakim> alexturn, you wanted to discuss hand tracking
(Switching to hand input)
alexturn: I'm at the OpenXR f2f and we talked through hand-input, including with Oculus.
??? (with Alex): We're all in rough consensus about the shape of the bones and how they're represented as coordinate systems.
alexturn: Right is X, Up is Y, Back is +Z, with -Z pointing out from the wrist toward the tips of the fingers. The feedback was strong to have hands be consistent (not t-pose). There was a discussion about wrist orientation (pseudo-carpals) with a virtual bone between the wrist and middle finger. For the forearm joint it wasn't clear if they could simulate a non-identity joint, so we're holding off on a forearm extension. And we discussed the format of the spaces.
... We think it'll be 26 spaces with a helper function.
... There was a lot of GH discussion about abstracting over different APIs, but we're trying to get the right superset of features with the right naming, naming each joint after the bone that is on the way to the tip of the finger. We'll write this up and put it into the GH Issues.
???: We expect every joint specified in the spec to be reported regardless of whether it's tracked; all will be invalid if they're not tracked. All joints will ???? so you always get the joint location in another space, so the app can build articulation with a little math assuming the bone structure hierarchy. In the spec we don't standardize the hierarchy; we expose the joint locations individually so different meshing methods work.
<alexturn> Bye everyone!
bajones: We'll be reviewing the WebXR lighting estimation proposals and work that's happened so far. It started with Kip's work two f2fs ago. Also, we've had someone from Google who was experimenting with exposing it so we'll discuss that and how we can combine those efforts.
... (showing a digital rocket in several AR scenes with different lighting) The lighting is adapting to surroundings, shadows are in the right direction, lighter and dimmer as appropriate, and the reflections are appropriate for the env.
They've generated a cubemap that generally approximates the environment.
bajones: There are two known APIs that cover this, but let me know if there are others: the mobile phone APIs (ARCore on Android, ARKit on iOS).
... ARCore gives you a single implicit probe with an ambient term, a single directional light/intensity, and an HDR cubemap.
... ARKit gives you an HDR env map (somewhat undocumented) and metadata with extents of the cubemap for things like parallax mapping. They provide an automatic mode that places lighting probes for you. ARKit only has directional lighting for facial AR from the front facing camera.
... Both APIs provide low power ambient only estimation (two floats) with color intensity and temperature.
... The initial explainer from Kip pretty directly serves up some of the info like spherical harmonics. There are minor technical quirks like in XRFrame we shouldn't do anything Promise-based.
bajones: Will on the Chrome team did an implementation of this API but was structurally more complex, taking ARCore info and surfacing it up to the web as directly as possible to evaluate how well it works. There are a couple of bits like XRWorldInformation from a proposed planes API which seemed reasonable at the time, but aren't appropriate for the standard.
RafaelCintron: I have a question about Kip's explainer. Is "createEnvironmentCube" meant to be called once?
kip: I proposed a feature passed into the session that causes the browser to make that data available at any time, so effectively you're subscribing to the data when creating the session.
bajones: Will's implementation has a stance on this that we can go over. We should discuss what approach is appropriate. One minor point is that we should provide context for where the cubemap would be used.
Manishearth: This is a q for Kip: Which coefficients would this be for?
Kip: The first would be RGB and the rest are in that order.
Manishearth: Would it be better to be a more structured API instead of an array?
kip: The reason it's an array is that's how the lower level APIs provide it.
Manishearth: I understand that they are but it wasn't obvious how the array worked.
kip: I filed an Issue for better packing the data and exactly what those numbers mean. We could definitely use feedback on how to nail down those definitions.
Manishearth: In physics we use our own names, but it could be more clear here. We can talk about it more later.
kip: The intent is lower order is first.
bajones: Feedback on structure and naming is very welcome. Three.js at the very least likes to suck in data in that form.
Nick-8thWall: If you're providing a WebGL texture with an env cube you probably need camera permissions.
kip: Yes, I filed an Issue just for that.
bajones: This is an important point. Structurally I hope that we can make that part of consent prompts at the beginning of the session. It will depend on individual platforms, of course. ARCore provides a cubemap that is 16x16 per side, so not terribly detailed but each platform will need to decide how much is too much.
bajones: Using Will's implementation I created a sample implementation with dinosaurs. They're probably not the best models because they're cartoony but you can see that there's a basic level of fit to the environment.
<kip> About camera-like permissions for cube map: https://github.com/immersive-web/lighting-estimation/blob/master/lighting-estimation-explainer.md#xrreflectionprobe
bajones: That's because the implementation has some problems. (shows slides of spheres with various reflections)
... Compared to Filament (Android SceneView) there are some problems but ideally we'll be able to make the WebXR equivalent similar in terms of cubemap, taking into account the quality of the provided cubemap.
... I would like to propose a simplified structure. I'd like to open this up so I'll go over a few things then hear from you. I'd like to stick closer to Kip's original structure than Will's implementation. Because iOS can manually place light probes we shouldn't block that but in the meantime favor automatic placement. I'd prefer lighting estimation be a feature indicated at the beginning of an XR session.
mounir: There's a cost to lighting in the hw, even in the background, so not doing it now doesn't mean we can't add it in the future.
<kip> Feature descriptor section added to explainer: https://github.com/immersive-web/lighting-estimation/blob/master/lighting-estimation-explainer.md#feature-descriptor
bajones: For low power ambient mode, is this something we'll need in the future? The docs mention it as a much lower power mode compared to full HDR lighting. But, this was from before the lighting probe feature so maybe it's only for backward compatibility?
Nick-8thWall: At 8th wall we have our own ambient light detector which is cheap computationally and we really don't see people using it to good effect. It doesn't make enough of a difference in most cases.
... For the env cubemap stuff it might be more valuable but we haven't played around with that. We created experiments/examples using the location of the Sun to provide shadows, but it's more important to have any shadow than for it to be right. Geospatial shadowing can create long bad shadows that aren't helpful. So, having something is more important, and sometimes better, than something exactly correct.
bajones: Yes, the shadows in my examples are basic and static but they make a big difference.
Nick-8thWall: If you don't have the right cubemap with the bottom reflection then it's going to look pretty bad. Most prebaked cubemaps don't have the right properties.
dino: I think there's value in the vague ambient lighting even though we've moved on.
bajones: Thank you. Along those lines, I'd like to propose that if we expose the low power ambient mode then, as Kip has mentioned, it's possible to encode that in the spherical harmonics. So, we could do that in a way that's invisible to developers, to make it a little easier to author content that's agnostic to how those signals are provided on the back-end. If there's no directional light we could provide a 0 intensity light.
... Being able to offer the content once is more important.
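Kip's suggestion of folding a flat ambient term into the spherical harmonics could look roughly like this. This is a sketch only: the 9-coefficient RGB layout is an assumption, and the trick is standard SH math, where only the constant L0 band carries the ambient signal.

```javascript
// Sketch: folding a flat ambient colour into the constant (L0) band of a
// 9-coefficient spherical-harmonics set, so content written against SH
// lighting also works when the platform only supplies low-power ambient data.
// The layout (RGB triples per coefficient, L0 first) is an assumption here.
const Y00 = 0.5 * Math.sqrt(1 / Math.PI); // constant SH basis function

function ambientToSH(r, g, b) {
  const sh = new Float32Array(9 * 3); // 9 coefficients * RGB, all zero
  // Scale so evaluating the SH in any direction returns the ambient colour.
  sh[0] = r / Y00;
  sh[1] = g / Y00;
  sh[2] = b / Y00;
  return sh;
}

// Evaluating only the constant band reproduces the ambient term exactly;
// a renderer evaluating all 9 bands gets the same result since the rest are 0.
function evalSHConstant(sh) {
  return [sh[0] * Y00, sh[1] * Y00, sh[2] * Y00];
}
```

Because the higher-order bands are zero, an SH-consuming renderer sees a directionless light, which is exactly the low power ambient behaviour.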
dino: One of the technical challenges with low power ambient lighting is that ARCore doesn't provide a way to transition between low and high power mode. Back to ??? shadows. It would be interesting to use the plane detection API to provide ambient occlusion information that the developer could use.
Nick-8thWall: When I talk about ambient lighting my experience is with mobile AR, where the phone itself is modulating the gain on the display, which is different than head-worn AR where ambient lighting would be more important because the display isn't amping itself up.
bajones: I'm not sure how to do light matching in headworn AR, but that's a good point.
<Zakim> kip, you wanted to mention that the explainer is updated, and please review
kip: I wanted to mention that the explainer is updated. Please take a look at the security implication section.
bajones: One of the question about how to make the API more broadly available is handling of cubemaps. The underlying platforms surface this differently and WebXR will want to surface it as a WebGL texture, which is an expensive resource so you don't want to do a lot of per-frame processing.
... For some forms of PBR you may want to do some computation (e.g. mipmaps over a cubemap) because that would give you a more precise result using roughness. We know ARCore gives arrays of floats. For ARKit it returns a Metal texture with no documentation.
<DaveHill> light-matching on see-through AR devices is still valuable -- we looked at this on HoloLens a while back... light-direction, for example, still contributes a lot to scene matching
bajones: Even if they both give back the same size/format, it's not clear that this would be a strict standard going forward as the platforms might change. Because it's a float or half-float, will that be exposed to WebGL 2.0 only?
... It's weird to pass in a WebGL context but then reject if they haven't enabled an extension when the context was created. It seems like a safe bet that if it's HDR then it'll need to be floating point of some sort but some platforms might do it differently?
... One question is whether we should be trying to provide mipmaps at all?
Will: If we're going to force mipmapping then wouldn't that assume all colors are in linear colorspace?
bajones: It is the question of how many assumptions/guarantees we can make about the data.
klausw: The parameters are complex and not easily interpreted so we either need clear explanations so we ensure consistency or we need assurances that the data is consistent.
bajones: Yes, I agree and we need to provide flags like blend-mode to ensure consistent behavior.
dino: I don't know what's inside the Metal texture. As to the WebGL texture I wonder if it would be OK to... every WebGL implementation has float/half-float extensions so I wonder if it would be OK to enable that for them?
... That seems better than expecting the developer to handle it.
bajones: That seems like a good thing to bring up at the WebGL WG meeting next week.
dino: I don't see any negative effects of enabling that extension. But yes, we can bring it up in the WebGL WG.
bajones: The order in which extensions are enabled matters and may affect other renderers.
... If you can figure out the internals of the Metal texture that would be helpful, thank you.
... Because the cubemap is relatively expensive should we try to make them reusable? If so, we want to notify the developer if it has changed internally. Will's API updateWebGLEnvironmentCube has an optional parameter for passing in a target WebGLTexture. We've been dinged in the past for that pattern.
... While we could carry forward with it, I think we could get away with a slightly different approach of providing the texture and then updating it in place.
... Because of the async nature of generated cubemaps it probably won't be updated per-frame. It's currently about once per second. So, it doesn't need to be on XRFrame but instead it should be another object.
DaveHill: A question about updating cubemaps: when the cubemap changes and there's only one, the transition needs to be hard. If we provide a new one then it can provide a smooth transition.
bajones: Yes, Kip and I were talking about this kind of transition. I've seen that effect used in other contexts, like if I have light probes at fixed points in the env one could weight them appropriately. The current proposal doesn't allow for that but it's good to consider.
... The underlying platforms don't accommodate that presently. I'm not sure how that would work out on ARKit. So, yes, a good thing to consider.
Nick-8thWall: I wonder if you could talk about why it's undesirable to have an API to query the texture per-frame? That seems simpler from an implementation perspective.
bajones: The primary thing I'd be concerned about is if the developer needs to do work on the texture. In the dinosaur example I was doing work on each texture but it turns out that three.js was broken. Still, it's an understandable thing for a developer to do. It might not break your time budget, but things like generating mipmaps could be expensive. If the developer has to do work on the texture then we should let them know the
Nick-8thWall: Developers could compare objects.
bajones: Yes, that leads to leaving cubemaps on the ground or having an object updated without the developer knowing. None of these are intractable problems but I'm paranoid about it being expensive but they probably won't be terrible.
mounir: I can imagine with the event if ARCore and ARKit lets us know that there's a new texture at the same time as the RAF then we could fire the event then and it would be used in the next RAF callback. I wonder if it could be a boolean (not an event).
<Zakim> kip, you wanted to comment on per-frame cube map generation vs event driven updates, and how it relates to amortization of work over frames
bajones: The way ARCore handles this is it has a timestamp of the last change so the developer could check that on a per-frame basis. I think ARKit does give you more of an event but I would need to double-check that. We have other places in the WebXR spec where we dictate timing of events so we could do that here, too.
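The timestamp-polling pattern bajones describes for ARCore might be wrapped like this on the app side (a sketch; `lastUpdated` is a hypothetical field name, not settled API):

```javascript
// Sketch of the timestamp-polling pattern: the app checks a lastUpdated stamp
// each frame and only re-runs expensive cubemap work (re-upload, mipmapping)
// when the estimate actually changed, which is roughly once a second on
// current platforms. `probe.lastUpdated` is a hypothetical field.
function makeCubemapCache(expensiveUpdate) {
  let seenTimestamp = -1;
  let updates = 0;
  return {
    // Called once per rAF with the current light probe; returns the total
    // number of real updates performed so far.
    onFrame(probe) {
      if (probe.lastUpdated !== seenTimestamp) {
        seenTimestamp = probe.lastUpdated;
        expensiveUpdate(probe);
        updates++;
      }
      return updates;
    },
  };
}
```

The same wrapper would work for an event-driven design by setting a dirty flag in the event handler instead of comparing timestamps.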
kip: When we're talking about things like cubemaps, you might use it on that frame but most likely you'll use it over several frames. One reason is that engines will amortize work over several frames. An engine might decide to update one face of the cubemap per frame.
... We have different goals from a native platform. The native goal is to change the object as quickly as possible during env changes. For the web, I'd suggest applying a filter that smooths out the content over time. The cost might be higher in GPU time, but you don't have a specific event usable for fingerprinting things like co-presence of users when a light changes.
... There may be somewhat of a balance between looking more natural and the security of the information. So, hanging it off of XRFrame or XRSession is part of that.
<kip> Please review and comment on the issue I've added for this: https://github.com/immersive-web/lighting-estimation/issues/25
<kip> Issue added for Brandon's open question on spaces: https://github.com/immersive-web/lighting-estimation/issues/24
bajones: I'm also proposing that XRLightProbe hang off of XRFrame since you're going to be putting it into a shader anyway. There are still open questions: How many guarantees should we give about data? What's the way to communicate the XRSpace in which the information is given? What level of granularity do we give to feature names?
ada: I suggest raising issues if there are further concerns.
<dino> Can someone provide a link to bajones's slides?
<cwilso> Brandon's publishing them into the FTF meeting folder in https://github.com/immersive-web/administrivia/tree/master/F2F-Feb-2020
<cwilso> (not done yet)
<Manishearth> scribenick: Manishearth
cwilso: happy to have folks leave comments, otherwise i'm going to walk through stuff
cwilso: want to call attn to mounir's point re
... "Finally, we would like to have this work ..." in the email
... if we decide to incubate one of these in the CG we shouldn't have problems doing so, i don't think we need to add these as normative today
mounir: the media WG has defined things as being in scope in the charter
... this is being chartered for 2/3 years and we should make sure any possible stuff is in scope
cwilso: we've talked about this before, and one thing you felt was left out was specifically image/face detection
... and this is one thing with security implications / etc
... and your point was that we might be exposing these from an underlying system and we may just want a better way to expose this
... e.g. image detection
mounir: to give more context
... we want to avoid doing raw camera as much as possible
... but image detection is something we have low level APIs for in arcore and arkit, so we could expose those
... google partners are asking for occlusion which we're hoping to push on too
ada: to clarify, for image detection you're talking about a system where you detect marker images, yes
... we will be working on this stuff if there's a way out of the cg
ada: your proposal is for a new section in the charter for stuff we don't want a deliverable for but we want them to be in scope, yes?
<Zakim> cwilso, you wanted to poll
mounir: yes. i can link to the media wg charter, i think it's been done elsewhere too
cwilso: yes, it has
... want to see if there are objections to doing this?
Nick-8thWall: what are the exact changes
cwilso: four APIs: raw camera, image detection, depth/occlusion, face detection, adding them to explicit scope but not as deliverables
ada: any other items people wish to add?
trevorfsmith: one item that i wanted to discuss: extending webextensions for AR capabilities
... webextensions are longer lived than sessions so it may make sense to have xr capabilities. we have an issue but hasn't received much traction
... maybe add this to scope-but-not-deliverables
cwilso: innate reaction: interesting space, need someone to spend time prototyping before figuring out what it turns into
... not really ... out of scope, but i worry about setting expectations that we're really looking into it
Nick-8thWall: having mechanisms to share AR experiences to social networks
... for AR Core: having access to the final composited output that can be used to generate image or video (e.g. a gif)
... kinda related to some CV items where there's potential for an application to access camera pixels, but it's an important use case
... I don't know how this would work with magic leap but there's a use case for AR headsets where you can share the scene to express your experience with people who are not using the headsets
cwilso: to clarify: your core point is access to compositing output
Nick-8thWall: kinda a byproduct
... point is really that you want to create artifacts of your experience
... point is not that i'm going to analyze the composited output, it's to take the experience and make it shareable
cwilso: the reason i ask is that when you're talking about sharing experiences if we're both wearing headsets
ada: i think ... storing or capturing compositing output is worth ...
... thinking of ways to do it without an API
... maybe worth raising it as an issue on the raw camera frames repo
<ada> Manishearth: Two things, I want to quickly point out that there is no API that gives access to the composited output,
<ada> Room: screen capture API
<ada> Manishearth: oh, that does take into account visited links and stuff,
<ada> ... anyway I queued myself to add a scope item to this list: navigation
Manishearth: should navigation be on the scope but not deliverable thing?
<Zakim> ada, you wanted to reply
cabanier: +1 same question
ada: i think this is really really important, part of the reason why we hadn't looked at it before is that it was viewed as infeasible because of creating secure contexts for users
... given that we have implementations is this something we feel is plausible?
* (the group has voted ada "best chair")
RafaelCintron: what about global scale anchors/etc?
cwilso, mounir: cloud anchors, defining things in unbounded ref spaces
cwilso: explicitly want to leave out of the charter specifically: we didn't want to define a cloud anchor system
... the subquestion here is "do you think we should try to take on sharing of anchors somehow?"
RafaelCintron: no preference, just want to clarify
mounir: I could be wrong, but for FPWD the spec is going to be compared with the charter
cwilso: if we get to the point where everyone somehow agrees on a cloud anchors system and has written all the spec text i'll go back to the AC and get it reapproved
ravi: to add to what RafaelCintron was saying we thought cloud anchors would never be added
... so very local stuff
... the other point: as a platform we support stuff like experience sharing
... so not sure if it needs to be part of XR or a device/platform thing
Nick-8thWall: to clarify, the XR spec currently disallows accessing composited output
Manishearth: (through web APIs)
kirby: remote assistance is another use case here (factory, mfg, etc)
<Zakim> kip, you wanted to ask if extending screen capture API to support capturing XR sessions would fall under our charter
kip: just taking a quick look at the screencap api there's an enum for if you want the whole screen etc
... if we are going to expand the screencap API is that part of something we can do as webxr?
<ada> Manishearth: to mention we are doing Dom Overlays similarly that it makes more sense to change a spec, whereas dom overlays is piggy backign on a spec
Manishearth: DOM overlays is similarly piggybacking on a spec
mounir: and yeah we should just edit the enum upstream
cwilso: general disagreement about a potential normative spec for the four things mounir had mentioned, plus experience sharing
... going back to the rest of the charter
... no q about the rest of the modules
avadacatavra: any desire to add gaze tracking potentially?
ada: anyone looking into it?
avadacatavra: mozilla is
... mozilla is looking into prototyping some of this
ada: related to foveated rendering?
avadacatavra: yes, part of it
DaveHill: to clarify, is this dwell time/etc ?
avadacatavra: yes. not just head pose, actually where the eyes are looking. we're looking at devices with this specifically enabled
ada: if mozilla is interested we should prob add this to "potentially new normative specs"
jrossi: these would still go through CG yes?
cwilso: yes, we're basically "licking the cookie" here
ada: is there an issue in the proposals repo talking about this?
avadacatavra: no, we started talking about this last week
cwilso: something you'd want to ship on by default possibly in the next two years?
<Zakim> mounir, you wanted to talk about anchors
avadacatavra: next two years? perhaps yes
ada: anyone has plans for anchors?
bajones: it seems like when we were first talking about webxr, anchors were a big part of the conversation; a little surprised to see a muted reaction to the feature
... maybe a change to the content building patterns ? maybe people no longer feel like they need it right this second but yeah it should happen
Mike: we've been able to get by with hit testing and ar mode, but yeah this would be nice
ravi: can we take a little time and get back to this?
ada, cwilso: sure
cwilso: when google was doing early investigations a year and a half ago, it seemed like it was valuable even for short experiences
ada: to be clear there is no rush, feel free to check with your company and respond in a few days
mounir: people want the best experience on both native and web
klausw: for the people working on implementations: our session durations are like 5s, which is different from actual users
... case in point we forgot to turn off the screensaver in AR
... so if folks ask for anchors just because we've noticed *our* experiences to not need this doesn't override that
... also good for hit testing in local coordinates
cwilso: modules that we've added:
... please speak up if you contest
... hit test module?
cwilso: dom overlays module?
cwilso: lighting estimation? we seem to have fairly clear support
cwilso: hand input?
mounir: i think we haven't had two groups +1 hand input?
jrossi, cabanier : we're +1
RafaelCintron: also +1
Manishearth: (also +1)
cwilso: this is a two-year charter, so it's two years' worth of work
... lighting probably would, maybe not hands
... but we don't know what we don't know. would expect the expected completion to be at the end of the two years or later
... real world geometry?
mounir: worth differentiating plane detection vs meshes in charter. may not be the same spec.
... meshes are ignored by mobile AR
cwilso: i think what you're asking is that we make sure the language enables us to have multiple specs or multiple representations of RWG so we don't lock in to exposing both of these in the spec
mounir: more about the deliverable. maybe just more than one
cwilso: anchors? already kinda talked about
ravi: (re: RWG) if we clearly state the deliverable is plane detection it's easier to make a decision
klausw: to clarify, maybe not everyone has the same mental picture about anchors. we basically establish local coordinate spaces that could be a hit test result, etc
... do folks understand this part or should we use different names
cwilso: current description is about specifying poses in the world that need to be updated.
... maybe needs to be "poses in local space"
klausw: "establish coordinate spaces by attaching them to real world things"?
bialpio: have a short presentation prepared so maybe we can postpone a bit
Nick-8thWall: maybe seems to be some issue about the lifetime of anchors (can they be reestablished across sessions?)
... this seems to be the distinction people care about
robko: in talking to some of our partners who are using native AR experience
... we're going to have a hard time moving folks to the web without anchors, imo it's important
mounir: trying to guess why there's so little excitement about anchors. is it actually used in AR headsets?
... since the spec isn't very clear as of now, if we got more clarity on what the scope of what anchors do
mounir: what about microsoft?
Yonet: yes, and it's important for a11y too
ada: ideally need two people to spearhead it
kip: mozilla is also +1
ib: just to add to what nick said, there's a middle ground where it's not cloud but when a bunch of folks are gathered in a room -- the collaborative use case is quite compelling
... multiplayer experiences etc
cwilso: next topic! webxr layers
... any concerns about the deliverables?
Manishearth: mozilla +1 layers
mounir: why do we have a list of hardware in the scope?
cwilso: it's example hardware
mounir: maybe we should just say "headsets, mobile devices , etc" without naming specific ones
... having a list will make some people unhappy, even as examples
jrossi: yeah seems weird in a scoping charter
cwilso: alright, let's jump to bialpio's discussion of anchors
bialpio: i have a small presentation
... will send slides later
... agenda: general approach of api, terminology, talk about the current impl, and then demos
... and then next steps
... terminology: what i mean by anchor: a fixed location in the real world, which roughly translates to establishing a local coordinate system. fixed does not mean poses do not change; as the system's understanding changes, the poses can change across frames
... probably necessary to query in fixed reference space for rendering
... i think this distinction needs to be made
... Approach: roughly following hit test, anchors created asynchronously. returned anchor object is in a zombie state till the first rAF call happens
... specifically to address that anchor creation may not be instantaneous
... from that point on updates happen synchronously
... as we add more entities to webxr, there will be a way to create anchors based off of those entities, simply because those entities may have a bit more info about where in the world that anchor is attached to
... e.g. xrframe may have a way to create free floating anchors , but xrhittestresult may have a better way to do this, e.g. "i've created this based off of the floor plane, so if i think it moved, i move the anchor"
... updates are synchronous, xrframe contains a list of currently tracked anchors and they get updated each frame
... all will be accessible on the frame, including the last update time
... just because the last update time didn't change doesn't mean the pose is the same
... untracked anchors are removed from the list
... so we don't need an event based api
... fairly easy to extend with events but not for now
... lifetime can be controlled by the app or the underlying system (e.g. when tracking is lost)
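A minimal sketch of the per-frame bookkeeping bialpio describes, using plain objects in place of XRAnchor. The `lastChangedTime` field follows his description; `diffAnchors` is a hypothetical app-side helper, not proposed API:

```javascript
// Per-frame anchor bookkeeping as described: anchors the frame no longer
// lists have become untracked; anchors whose lastChangedTime moved forward
// may (but need not) have a new pose. Plain objects stand in for XRAnchor.
function diffAnchors(previouslySeen, trackedAnchors) {
  const tracked = new Set(trackedAnchors);
  // Anchors we saw before but that are absent from this frame's list.
  const lost = [...previouslySeen.keys()].filter(a => !tracked.has(a));
  // Anchors updated since we last consumed them.
  const updated = trackedAnchors.filter(
    a => (previouslySeen.get(a) ?? -1) < a.lastChangedTime
  );
  // Record the change time we have now consumed for each tracked anchor.
  const seen = new Map(trackedAnchors.map(a => [a, a.lastChangedTime]));
  return { lost, updated, seen };
}
```

An app would call this once per rAF with the frame's tracked-anchor list and carry `seen` forward to the next frame; note that an unchanged `lastChangedTime` still doesn't guarantee an unchanged pose, per the discussion above.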
ada: queue time
... if you've got an item, can its position in 3d be related to multiple anchors, so if an anchor is lost you fall back to others
bialpio: IMO up to the application to do this
... thinking about the case where you're e.g. recreating the Castles game and you're firing projectiles between castles and the projectile is an anchor; if stuff shifts maybe you want to design it so that it can still hit when that happens
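bialpio's point that multi-anchor fallback is an application concern could look roughly like this; `tracked` and `offset` are hypothetical application-side fields, not spec API:

```javascript
// Application-level fallback as bialpio suggests: place content relative to
// the first anchor in a priority-ordered list that is still tracked.
// Each entry pairs an anchor with the content's offset from that anchor.
function resolvePlacement(anchorsByPriority) {
  for (const { anchor, offset } of anchorsByPriority) {
    if (anchor.tracked) return { anchor, offset };
  }
  return null; // every anchor lost; content should hide or freeze
}
```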
DaveHill: if device regains tracking does the device automatically recreated deleted anchors?
bialpio: limitation in ARCore: once lost, an anchor is never regained. if you feel that this should be handled in the system we need to come up with some solution to work around ARCore's impl
DaveHill: losing tracking is something that just ... happens so
bialpio: unfortunately for us we'd have to emulate this on ARCore and it would probably not be great
DaveHill: other q: you talked about anchors as a list. do you see the dev needing to look for the anchor in the list, or more in the background?
bialpio: probably not; the only reason you would have to look into the list is to see if an anchor is still there and if it's been updated since the last time you saw it
<Zakim> bajones, you wanted to ask about scope of "no longer tracked"
bajones: i think dave covered most of what i wanted to ask. will point out: my understanding is that most systems have a way to recover tracking
bajones: my understanding with HL is that you can never tell if an anchor is permalost
... maybe what you can do is even if the anchor is present the space isn't always localizable wrt various ref spaces
... we should check the persistence of anchors across platforms, this might be an arcore specific quirk
bialpio: my approach is that if one system is limited in this way we either need to emulate there or do lowest common denominator
bialpio: yeah we embed spaces, e.g. for hit test, etc we embed a space in it
Yonet: what about the anchor's radius ?
bialpio: not at the moment
klausw: to make something explicit: when we're talking about anchors here, can we be very clear that cloud / cross-session anchors are something totally different
DaveHill: building off that: if we imagine a world where we have temporary, device, global, and cloud anchors, it would be weird to have three different APIs
... maybe a crawl walk run thing, but maybe there's a reason to have a cloud anchors api that's different
bialpio: thanks to XRSpace we can expose things commonly across these probably
... weirdness would probably be about how you create anchors
... a bit about progress. i do have a chrome impl for android, behind a flag
... currently covers creation of free floating and plane attached anchors
... have prototype for creating anchors based off hit test results w/o exposing the underlying entities
... demo time!!
<Zakim> avadacatavra, you wanted to talk about ieee vr remote participation
avadacatavra: just wanted to point people at the online vr remote participation at ieee vr, on hubs
... supporting cowatching of videos
Nick-8thWall: one minor q: given that anchors may not update right away, wondering why you need a promise to create them
... maybe just keep the API synchronous?
bialpio: making it a subscription API in general
... the way i see it is that since there's no easy way to express how long the system takes to create an anchor
... i want to guarantee that once the promise resolves the next raf callback will have it
mounir: slightly related. so the promise can be resolved with the anchor not having everything yet
bialpio: yes. the data coming back from the device happens once per frame, wanted a way to have the ordering of steps defined
mounir: why not resolve when everything is set up
bialpio: steals a bit of time from the raf cb
... i guess there wouldn't be that big a difference between the two approaches
RafaelCintron: two questions: how is the web developer to know the device doesn't support anchors
... and [missed] about the last update
bialpio: for (1) we can just throw an exception
... perhaps not via feature requests, since i want all AR devices to support it in some form?
... but they can always throw or reject in some form
mounir: there may be folks who implement the API but may not have devices that support it
bialpio: the only case where this would arise is if the session is created with anchors as an optional feature and the device doesn't support it
RafaelCintron: question 2: what's last updated time for the resolved thing?
bialpio: null before the first raf callback. apart from that, the app can detach the anchor, which doesn't mean it's gone; that's the second case when it's updated
<cabanier> scribenick: cabanier
klausw: we have a feature repo for computer vision
... is Blair on the call?
... I will try to represent his explainer
... there were 5 main topics
- sync/async access to video
- expose native algorithms
- webrtc support
- expose computer vision building blocks
klausw: I think we need to expose these separately
... there's a khronos spec and Blair thinks that we should have a webvx one
... taking a step back, there are some use cases
... for instance image or marker recognition
... either predefined images or markers
... and then monitor their poses
... both arcore and arkit have this
... by providing a database of images
... from the documentation, for arkit you have to supply the size; this is optional for arcore
... does anyone know?
... the API returns a pose and a tracking status, i.e. whether the pose is being estimated
... for arcore you get an estimated size
... if you want to expose this as a web api
... there's a shape detection api proposal, which we could consider looking into
... there are js based solutions for instance 8th wall
... but doing it through native APIs should give more consistent results
... which is an advantage
... a second use case is capturing what is on the screen
... having a better way of doing it (compared to a cable) would be nice
... or it could be saved to a file and shared through some mechanism
... for raw camera access
... for the handheld case, you get a fresh image per render frame and it matches the viewport
... for headset the story is harder
... the viewport is different, there might be different cameras
... they could be grayscale, etc. So there are a lot of difference compared to handheld
... a common use case is computer vision
... you need to know what the data looks like
... also, it would be nice to be able to do this in web workers
... you could do video effects on the phone but not on headsets
... why not use getusermedia? the ar session might already have access to the camera
... one idea is to start out with the simple case
bialpio: arcore has a way to get non-exclusive access
klausw: yes. but that doesn't work everywhere and we need to explain this to developers
... if you want to support async camera access, things get a lot more complicated
... this is similar to mixed reality streaming
... you can do it by doing green screening
... if it is not tied to the frame loop, we need a separate loop for camera frames
... then we need to know what the camera pose is compared to the current pose
... we need timing information, distortion information
... do we want to support multiple cameras?
... this is all I wanted to present here
... I didn't create any APIs yet because it's too early
... is anyone opposed to this?
... did we miss cases?
... where should we go from here?
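As a rough illustration of the image-tracking results klausw describes (a pose plus a tracking status per image), an app might partition results by state. The result shape and state names are assumptions modeled on ARKit/ARCore behavior; no web API has been proposed yet:

```javascript
// Hypothetical consumption of image-tracking results: each result carries a
// tracking state. Only actively tracked images should drive rendered
// content; "emulated" poses (a last-known estimate, as ARKit reports) might
// be rendered with reduced confidence or faded out.
function partitionTrackedImages(results) {
  const active = results.filter(r => r.trackingState === 'tracked');
  const stale = results.filter(r => r.trackingState === 'emulated');
  return { active, stale };
}
```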
DaveHill: I felt that access to cameras didn't sit well because it's too low level
... on the oculus level we don't give access to those cameras
... because it is too low level
... I think that's the case on hololens as well
jrossi: did you think about the permissions model?
... is it similar to getusermedia?
klausw: if we do image recognition based on predefined images, I don't think so
... we can show the user what images are looked for
... image tracking needs less invasive permissions because of that
ib: what is too low level? customers are asking for camera intrinsics
... can you clarify what is too low?
DaveHill: if you're building a device, developers expect something to show up a certain way and if you rev your hardware, you will break the workflow
ib: the use cases I heard, is to have layers on top
... being able to give the quality of service would be helpful
klausw: for the phone AR case, you have everything. position and fov match the XRViewerPose
... for the smartphone case do we need more?
ib: if you want to add a semantic layer, you still need that. (???)
klausw: you already have them implicitly so there might not be a convenient way
... do we want to be more explicit about adding camera intrinsics so things are more future proof
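For context on the intrinsics ib and klausw are discussing: the standard pinhole camera model maps a camera-space point to pixel coordinates via focal lengths (fx, fy) and a principal point (cx, cy). This is textbook math for illustration, not a proposed WebXR API:

```javascript
// Pinhole projection with camera intrinsics: maps a point in camera space
// (x right, y down, z forward) to pixel coordinates. Apps doing their own
// computer vision need these values to relate camera pixels to 3D space.
function projectPoint({ fx, fy, cx, cy }, [x, y, z]) {
  if (z <= 0) return null; // point is behind the camera
  return [cx + (fx * x) / z, cy + (fy * y) / z];
}
```

This is why "you already have them implicitly" holds for phone AR (the camera image matches the viewport, so the projection matrix encodes fx/fy/cx/cy), while headsets with offset or distorted cameras would need the values exposed explicitly.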
Nick-8thWall: I want to give some perspective
... for our customers, they have a wide array of cv technologies
... world and image tracking we support
... cylinder, face and general object recognition are often requested
... there's a wide array of technologies that people are interested in
... it's really important to consider the XR modalities
... for instance, image tracking we solved on smartphone ar
... but it doesn't work on magic leap
... having a way to do client driven computer vision is a problem
... on smartphone, you can capture a frame and do some processing
... which is very different from a headset
... finally from oculus, it doesn't seem to make much sense
... which is something to consider. All this tech is very platform specific
trevorfsmith: Nick already covered it
... AR glasses form factor. Users expect to get results from their front facing cameras
klausw: how do people see the priorities?
... do we need a camera API?
ada: a tracker would be very helpful. ie an image tracker
DaveHill: front-facing RGB camera, access to that seems reasonable
... but tracking cameras seem problematic
klausw: I'm not saying that vendors need to do this
... forward vs tracking cameras distinction makes sense
ib: some access to intrinsics seems helpful
ravi: even if we allow image tracking, would that give an implicit permission to open the camera?
... because you can't always have the camera on
klausw: does the internet site get access to the camera? no
... does the browser? yes
... this is already the case for AR WebXR sessions on mobile
... for images tracking this seems somewhat similar
... for permissions, it seems a prompt shouldn't need to show
mounir: we can use a feature descriptor and then get a different permission
klausw: no, it's for the indicator LED
... for instance google glass shows if there's recording
trevorfsmith: I work with a bunch of different clients
... the number one blocker for them is that the standards process is very slow
... now that we have webassembly, clients can run snippets of code
... so if they have camera access, it would be a huge enabler
jrossi: the challenge with the raw image is the inverse
... yes, this allows experimentation
... but then the implementator has to tell the user what the website is doing
... ie is the website recording people?
... and uploading it to YouTube?
... that's generally the concern. Even on the native side
... our company investigates apps and this is not available on the web
ada: it could be worrying if state actors interject themselves
... look for weapons, or people
... this could be scary to do it at this level
klausw: Blair is the feature lead for this repo
... maybe we can look closer at these proposals
... image recognition could be valuable without needing camera access
... make it extensible for headset camera and see what people think
cwilso: what do people want to get started on?
trevorfsmith: there are things in the proposal repo
... let's see if anything will be picked up soon
... web extensions
... and I want to know why people aren't interested
... right now that's only a single session
... but what if we have a number of long running sessions?
trevorfsmith: what about this topic is something?
RafaelCintron: I don't like web extensions
trevorfsmith: you like the session oriented model
RafaelCintron: yes. we need to be very careful here
Manishearth: this is not just web extension
... this sounds more like a service worker
... since it's a long running background task
trevorfsmith: yes, why is it not a PWA
... are people thinking it's not in scope?
cwilso: (taking off all hats) what is terrifying is that this is a long running background app
... it breaks the mental model of closing a tab and having the website disappear
... for instance greasemonkey could run on any page and could do anything
... script kiddies could do horrifying things
... so, this is a long running agent with a very high risk of abuse
... people are interested but it causes a lot of caution
... the downside of getting it wrong is catastrophic
... for instance on magic leap, immersive and landscape apps
... most apps are immersive because it more self contained
... landscape apps are a new type of user agent
... and teach users so it's less scary
... things that work under the covers are scary
... service workers are designed very carefully to avoid this
<Zakim> ada, you wanted to clarify
cwilso: as we get the infrastructure, we can figure out how to just do one thing
ada: what would it look like if you put multiple tabs on top of each other
trevorfsmith: that is argon. a project from Blair
ada: would this be the half way point?
... let people composite on top of each other
trevorfsmith: yes, there are such points by having a highly constrained version
Manishearth: service workers have a persistent and offline version
... people hate the persistent usage since it's mainly used for notifications
... so firefox allows users to turn them off completely
... in ar mode, you can drag your browser and different tabs can have different inline sessions
... the concept of diorama is better suited to this
... because you don't have to care about the compositing stuff
trevorfsmith: (looking over topics from the CG depot)
... has anyone looked at subtitles?
kip: we intend to
trevorfsmith: at the user agent
cwilso: there's a bunch of discussion during the a11y call
trevorfsmith: then there's declarative xr
kip: mozilla is somewhat interested
trevorfsmith: how can we create scenes?
Nick-8thWall: is it what aframe does?
kip: for instance a gltf element could pop out of the 2d page
Nick-8thWall: why is this a browser element?
kip: what if your picture element supports this
... it would allow things that you can't do with webxr
... accessibility role of the browser will help as well
ada: are you doing some work on css for 3d
ravi: our take for 3d, we just want to show some models
... a model or image tag.
... so you can have a 3d experience without having an explicit session
... for css, we use css transforms to break part of the page and render in 3d space
ada: I'm very excited about that
<ravi> This is the css transform sample we have done on Magicleap: https://twitter.com/GenevieveMak/status/1207342006093004800
trevorfsmith: the virtual keyboard API
... seems interesting
kip: for the webxr emulator extension
<Zakim> kip, you wanted to talk about emulator extension
kip: we would love to see more pluggable into the browser
... this could intersect with the webxr emulator extension
... and the test API
<ravi> This is our declarative implementation: https://magicleaphelio.com/devsamples/gltf-model
ada: there was a discussion on working on a web extension
... a feature list if there's a gltf tag
trevorfsmith: there isn't one yet
... there's google's modelviewer
... we can add it to cg repo if people are interested
ravi: modelviewer has polyfills so it works fine on the magic leap platform
mounir: modelviewer won't be implemented natively in the browser
trevorfsmith: what can we do in a library?
mounir: someone from microsoft said that it would help if you have native support for gltf
robko: that stands to reason that it helps with that
mounir: modelviewer is changing all the time
... we can talk to magic leap
... but khronos (???)
<Zakim> ada, you wanted to ask about repo?
Manishearth: at mozilla we have been looking at the detached transform idea for a long time
... and we are interested on working together
ravi: we have opened an issue since the original proposal had some pushback
ada: we can have people weigh in
ravi: the original one used transform-style but preserve-3d made things more complicated
... which causes a conflict
... should the root support it and render it in 3d space?
ravi: should it be applied to descendants or the specific node?
Manishearth: I noticed that there's a virtual keyboard API
... sometimes you want input and there's no way to do that
... it's surprising that this hasn't come up yet
trevorfsmith: can you elaborate?
Manishearth: the only way to get a keyboard is to click on an element
... right now you have to hack to do it on canvas
trevorfsmith: calling up the keyboard seems something that we should have
ada: can we start this spec and then send it to another group?
jrossi: this has been explored and talk to the web apps people
... there's a concern about spam
... the other one is that the website has no context about what to bring up
... the mental model is what are the hints?
Manishearth: the api should be: what should the ua do when you click on an edit box?
Nick-8thWall: on safari, if you go in fullscreen
... and you tap on the screen, the browser will warn you that someone is trying to steal your password
... so there's an issue with trusted ui
trevorfsmith: yes, we need to consider how much the UA can provide trust
... I will bring it up in the next cg call
jrossi: I forgot that there's an a11y angle
trevorfsmith: are there any other topics?