WebSpatial API for Spatialized HTML/CSS and Spatialized PWAs on spatial and multimodal AI devices
This page contains a video recording of the presentation made during the breakout session, along with a transcript. Video captions and transcript were automatically generated and may not properly translate the speaker's speech. Please use GitHub to suggest corrections.
Video
Transcript
ada: My computer is lagging. By the way, if you're talking in the room, I can't hear you, because your room is currently muted.
Room 406: Read this one. Oh...
Room 406: Can you hear me right now? Good. Sorry about that. We'll just introduce ourselves a little bit, and the agenda. Right. The first problem we're facing is that, right now, AI has become part of everyday computing. By the way, with these slides, we tried to cover as much information as possible, so it's a very long slide set, around 100 slides, and we will not be going into every slide in detail.
Room 406: AI has become part of everyday computing, and it's reshaping how we interact with devices. We no longer just click or type; we speak, gesture, and look, and this shift drives the demand for hands-free, multimodal interaction across all kinds of hardware, from wearable AI pins and AR glasses to high-bandwidth XR headsets. Those devices now need an operating system that can see and hear what you do, and an interface that moves beyond flat screens into the physical world around us. For example, AI glasses are hands-free: they hear what you say, and you interact with voice. AR glasses, on the other hand, integrate information into physical space, moving UI beyond screens. XR devices, however, have infinite space for massive amounts of information and better spatial integration.
Room 406: Under this trend, we think all apps, not just 3D games, need to consider spatial, hands-free, multimodal interaction use cases. Here are some screenshots from Apple Vision Pro apps. In those, we have multiple screens, and we have 3D models we can interact with inside the space. We can split the web page from totally flat 2D into 3D, elevate our HTML elements, and unlock depth. We can also have multiple screens and containers, natively powered, plus true 3D content that can pop out of the window so you can interact with it.
Room 406: In spatial apps, the UI can go beyond 2D restrictions, allowing 3D models to coexist with 2D panels, and there are many extra features we should consider in the scenario of spatialized apps.
Room 406: And what we call the spatial operating system has a far greater need than mobile OS for the advantages of open web technology, such as install-free use (use it and close it), try-before-install web applications, and the ability to access massive web data. Another trend we noticed concerns client-side AI agents: they browse the web, and most of the information or tools an AI agent accesses are on the web. Let's say you want an AI agent to buy a bubble tea on a bubble tea website. This is an infrequent use, and the agent will not know in advance that it's going to use that site, and that's why we need an install-free, use-and-discard model of apps.
Room 406: So, like ChatGPT: maybe in the future, the chat box replaces the address bar, and message feeds replace the tabs. In TikTok and Snapchat, the camera is replacing the address bar as well.
bytedance: Still what binding.
Room 406: And the benefits of the open web have been stolen by super apps in closed ecosystems. If you have used WeChat and the mini apps in it, you will understand how powerful those super apps are, but they're not part of the open web.
Room 406: One interesting trend that we're seeing is with Apple and Google: they're extending their existing 2D UI frameworks to support these spatial applications. So, for example, on VisionOS, you can see that SwiftUI and RealityKit can be extended to build spatial applications, and additionally, iPhone and iPad apps can seamlessly be upgraded to VisionOS apps and become spatial. And then just recently, I believe this month, Android XR did something similar with Jetpack Compose, so you can take existing Android tablet apps and extend them to spatial applications for Android XR headsets. Just like Simon described, we feel the current web technology falls behind in developing spatial applications. That's the first problem we're facing, and maybe we can solve it, or think about some solutions together.
Room 406: And here is what we currently have, and why it's not enough: the native 2D GUI stacks, the hybrid ecosystems like React Native, and the mini apps derived from the mainstream web. They all have good parts, but for the spatialized web, each has something missing. For example, native apps are closed and platform-exclusive; React Native loses core web advantages, like being install-free; and mini apps exist alongside the web ecosystem, but they don't inherit from the existing web ecosystem. I'm going to talk about some of the problems we currently have, specifically with the current web stack, and maybe talk about some opportunities we have to improve it, to build better spatial applications.
Room 406: Fundamentally, there's been a paradigm shift in VR and AR operating systems. Traditionally, we had non-unified rendering, right? A compositor-based rendering system, where the operating system doesn't really understand the content that a specific app is rendering. And some apps, for example games, render to the whole screen; they take over the whole environment as fully immersive applications. There's a trend towards having these spatial apps, with multiple apps sharing the space and being able to interact with them. For example, if you have volumetric models, you can imagine light reflecting off of them, or them having reflections. And applications themselves might have a frosted background, showing through to the environment around them.
Room 406: In order to do this, the operating system has to leverage this unified rendering model, where it's managing the rendering of each application, and the applications themselves are more focused on rendering their content within their container volume. The benefit of this is it allows multiple apps to coexist at the same time, and interact with each other in very powerful ways. Both Vision OS and Android XR are moving to this paradigm. And the wonderful thing with this is you can mix 2D content with 3D content and interact in very unique ways.
Room 406: So we found that Apple established an industry pattern with this, where they have 2D apps, the SwiftUI apps, and then you can add 3D content within them. So there's a hybrid user interface. And as I mentioned earlier, the OS is responsible for managing these spatial containers, and the applications themselves are more focused on the content within them.
Room 406: This is an example of multiple apps interacting with each other. Particularly, spatial apps are not focused on rendering in isolation, so it's not like a traditional 3D game. They're all interacting with each other. And this allows for new interactive models, such as eye-hand-based interaction. And very intuitive, touch-based, direct interaction. You don't necessarily need a controller. And this also elevates us to have spatial gestures, including drag, rotate, and zoom. So considering all of these things, we want to keep this in mind as we think about potential solutions for the web.
Room 406: Currently, what we have is WebXR and our traditional HTML, CSS, and JavaScript technologies, and I'll also touch upon PWAs and how they can provide support for some of the features that we're talking about, and spatial browsing in general. So, WebXR is a great technology, and it provides a lot of powerful features, especially for the immersive web and building some of these high-fidelity 3D games. But we definitely need to do more. A WebXR session generally cannot coexist with others: we can't have multiple WebXR applications running at the same time, because they take over the whole environment. Generally, they use low-level 3D APIs, with WebGL and WebGPU, and it's difficult for existing web developers to expand into this space. For HTML and CSS, we have this very powerful 2D graphics layout framework. We all know how to create great 2D applications with web technologies, particularly CSS and HTML, and I think there's an opportunity to expand this for spatial applications. For example, we have z-index, but that's more about stacking order; it's not really about extending things into the 3D realm. And similarly, we have CSS APIs, some of which do have z-axis components, but they're all ultimately projected down into a 2D plane. I think there's an opportunity to expand this to the 3D space. At the moment, most HTML elements are 2D flat panels with no volume, and Web3D content ultimately projects into this 2D plane.
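For context on the projection limitation described above, this is how today's CSS 3D transforms behave: even when a transform has a z-axis component, the result is flattened onto the page through an ancestor's perspective.

```css
/* Today's behavior: translateZ/rotateY position the element in 3D,
   but the rendered result is still projected onto the flat page. */
.container {
  perspective: 800px; /* without this, translateZ has no visible effect */
}
.card {
  transform: translateZ(50px) rotateY(20deg);
}
```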
Room 406: And then, additionally, for colors, it's difficult to have these complex materials, for example frosted backgrounds or liquid glass. It's difficult to create these dynamic materials right now; the standard just doesn't support it.
Room 406: And then in terms of JavaScript APIs: we have very powerful link tags, but in order to have new spatial containers and be able to open windows around the user, there need to be new APIs that allow for different configurations. Maybe you specify not just the 2D window, but 3D volumes; potentially you want to specify where the window gets opened, or what its size is. There are a lot of opportunities here in terms of the JavaScript APIs. And for the interaction models, there is a need to go past simple pointer events, and have spatial gestures and more complex interaction models.
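As a concrete sketch of the gap described above: today's window.open only understands 2D features, and any spatial variant is purely hypothetical at this point (the option names below are illustrative, not standardized).

```js
// Today: only 2D features are understood by the browser.
window.open('https://example.com/panel', '_blank', 'width=800,height=600');

// Hypothetical spatial sketch (illustrative names only): a scene type,
// a 3D size, and a placement relative to the user or another scene.
// window.open('https://example.com/panel', '_blank',
//             'type=volume,width=800,height=600,depth=400,placement=left');
```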
Room 406: There are some features that have been proposed, for example the model tag, which is the right step towards the spatial web, and I think there's a lot of opportunity to expand it to provide more features, such as being able to have the model not just within the webpage, but maybe coming out of it a little bit, and also having dynamic interaction. Another feature to call out here is that Safari has added this immersive mode, and an interesting thing with it is you can take an existing 2D web application and make it more spatialized. But there are certain limitations with this, right?
Room 406: And like I said, the model tag can render 3D models, so it has this volumetric aspect to it. But there are some limitations with it; you can't programmatically control it, for example.
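The model element as drafted in the Immersive Web Community Group proposal looks roughly like this; the exact shape may change as the proposal evolves.

```html
<model style="width: 400px; height: 300px">
  <!-- The browser picks the first source format it can render. -->
  <source src="teapot.usdz" type="model/vnd.usdz+zip">
  <source src="teapot.glb" type="model/gltf-binary">
</model>
```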
Room 406: So, one thing to call out with the spatial browsing introduced in Safari is, it doesn't necessarily rely on the developer building a 3D website. It approximates the best way to take this 2D web application and make it spatialized. And I think, long-term, maybe some websites we can spatialize using AI in this fashion, but you need to have certain APIs to really allow the developers to express their 3D intent. And give them more powerful features to create a truly spatialized web application.
Room 406: We can do some very intuitive things, like take photos and maybe use AI to spatialize them to make them more 3D or realistic, but there are limitations to this, and ultimately we need dedicated APIs and tools for the developer to spatialize their application and truly make it 3D.
Room 406: Next I want to talk about PWAs and how they can help make the spatial web a little bit easier. So one thing is, with PWAs, once they're installed, they can run in standalone windows, and this would also be beneficial to spatial web applications, because you can imagine they're not constrained to a 2D browser. But the downside is that when the PWA is not installed, you have to run it in the browser again, so it's constrained to a browser tab.
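For reference, it's the standard display member in the web app manifest that gives an installed PWA its own standalone window today:

```json
{
  "name": "Example Spatial App",
  "start_url": "/",
  "display": "standalone"
}
```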
Room 406: Talking about the third problem here. As I alluded to before with WebXR, Web3D is still hard, and it doesn't have as many developers as traditional web development with HTML and CSS. To give some background on traditional 3D development, specifically with games: you usually have this game loop, and the developer controls all aspects of the game through the 3D engine. They can specify all the components, the physics, the animations; they control all the entities. And usually performance is a big issue with these applications, so they want to have ultimate control. Now, compare that to a 2D development model, for example web development: it's more event-based. You have declarative UIs, especially with React, and the developer responds to events that are triggered.
Room 406: So with 3D engines, you have these very powerful tools, and engines like Unity and Unreal help simplify some of the complexity. But you can't just plug 2D UI into these 3D frameworks; it's quite difficult. And most regular web developers find it quite hard to enter this 3D development world. Most apps only need 3D for a few parts, and the rest will probably be better built in a 2D framework. You can imagine, especially with these AI or wearable devices, we need developers to be able to create very powerful applications easily, similar to how people were building web applications. So instead of 3D containing 2D elements, spatial development, I think, will benefit from having 2D applications contain 3D elements.
Room 406: Similar to SwiftUI and Android XR applications, imagine there's a shared space that the operating system manages, and all spatial apps live within their own 2D containers, and the 2D spatial app can add 3D elements to it. Some of these 3D containers can be static models that you load from an asset, like USDZ, or you can have dynamic 3D models, which are built up in the application itself using primitives. We believe that this paradigm of 2D development that can import or use 3D models will be a very powerful model.
Room 406: We've already seen some examples, especially with SwiftUI; I'll go into some more examples further on. As I alluded to, there are these static 3D models, which we can call Model3D views, and then RealityViews, which are more dynamic. One example in Apple's ecosystem is RealityKit, where the developer can use high-level ECS, entities and components, to build up their dynamic 3D models. This is a high-level architecture: you have the spatial app, which is the native layer, the operating system layer, and then you have the 2D UI framework. For example, this could be SwiftUI or Jetpack Compose, and within there you have all your 2D elements on the left. And the 2D elements can contain 3D assets, or, for example with RealityView, 3D engine APIs.
Room 406: Another element that we have for web development is the Canvas element. This is mainly used for 2D drawing, and then, like I said before, we have WebXR, and we have Web3D engines like Three.js and A-Frame to help manage WebXR development. The challenge with using Canvas for the spatial web is that it's mainly focused on dynamic content development: the application can draw to the canvas, but the browser or the operating system doesn't really know what the content inside of it is. It doesn't fit into this unified rendering model. And developing for Canvas, even though it's a web technology, is not really intuitive for existing web developers; you have to adopt a different paradigm to develop with it. There are some 3D engines that help with Canvas development, for example Unity, and there are other libraries to help with this, but it's quite different from traditional web development. Another technology is WebXR, and this is also not quite sufficient. Like I mentioned earlier, it doesn't support this unified rendering model where multiple apps can use WebXR at the same time. And because WebXR is built on top of Canvas, it uses low-level 3D graphics APIs instead of traditional HTML and CSS.
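The exclusivity described above follows from how a WebXR session is requested: an immersive session takes over the display, and only one immersive session can be active at a time. A browser-only sketch:

```js
// Request an immersive WebXR session (must be triggered by a user gesture).
const session = await navigator.xr.requestSession('immersive-vr');
// Rendering then happens through WebGL layers bound to this session;
// while it is active, the page's immersive output replaces the shared space.
```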
Room 406: So, given these problems, what are some of the solutions or APIs that can help with this? These are our ideas; feel free to jump in with your own, or any thoughts that you have. We think there needs to be a new paradigm for enabling web spatial development, and we call this WebSpatial. We think we should give developers the power of the existing 2D web, but extend it to the 3D environment, and give them the full power of both 2D development and 3D development. We believe the existing standard web should be extended to spatial development, so that you can have both the traditional 2D content and 3D content. It means that popular 2D UI frameworks like React can continue to work, leverage new spatial features, and expand the ecosystem to spatial applications.
Room 406: And you can see some parallels with this in terms of SwiftUI and Jetpack Compose, where they did similar things. You have existing applications that they're extending to the spatial environment. And they have all the existing APIs that are backwards compatible, and they have some new APIs that help the developer spatialize their application. We think this would also be helpful with web development. This could include additional parameters in HTML, new features in CSS, and also new APIs in JavaScript that help with some of these web spatial features. But at the same time, preserving existing content development experiences.
Room 406: To fix the issue with the z-axis we talked about earlier, we think it'll be beneficial for CSS to have a new 'back' property, which is similar to left and right but allows elements to be shifted along the z-axis, and to extend the existing CSS APIs, such as transform with translate, rotate, and scale, to actually allow the developer to control their assets in 3D space. When they rotate something in 3D space, it's not just projected back onto a 2D plane, but actually modified in 3D space. You can see some examples.
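A sketch of what such a 'back' property might look like; the property name and its interaction with transforms are hypothetical, not part of any standard.

```css
/* Hypothetical: shift a panel along the z-axis, toward the viewer. */
.panel {
  position: absolute;
  left: 20px;
  top: 20px;
  back: 50px; /* hypothetical z-axis offset, analogous to left/top */
  transform: rotateX(30deg); /* would rotate in real 3D space rather than
                                being projected back onto a 2D plane */
}
```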
Room 406: And then for the background colors and materials, we think it would be beneficial to have new CSS properties that allow for translucency, where you have this frosted glass effect, or transparency, where you can have some content that's floating, or different materials that allow different interactions in a spatial environment.
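A sketch of what such material properties might look like; the property name and its values are illustrative only, not standardized.

```css
/* Hypothetical material properties for spatial backgrounds. */
.glass-panel {
  background-material: translucent; /* frosted-glass effect over the environment */
}
.floating-content {
  background-material: transparent; /* content floats with no visible backing */
}
```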
Room 406: Okay, talking about the interaction models. For spatial web applications, it would be helpful to have some additional spatial events, and these are some examples that we thought of, but obviously we're open to input here. Maybe something like spatial drag or spatial magnify: high-level APIs that allow the developer to easily modify and interact with spatial objects.
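A sketch of how such spatial events might be consumed; the event names come from the talk and are not standardized, and the event payloads are assumptions.

```js
// Hypothetical spatial gesture events (names illustrative only).
element.addEventListener('spatialdrag', (event) => {
  // Might carry a 3D translation delta instead of 2D clientX/clientY.
});
element.addEventListener('spatialmagnify', (event) => {
  // Might carry a scale factor driven by a two-hand pinch gesture.
});
```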
Room 406: And then on the PWA front, I think there are some opportunities here to expand the existing progressive web application model to allow some additional configurations for opening windows and positioning them, without having to sacrifice the existing model of install-free apps. So, for PWAs, these are some potential new configurations that we can add. And then also on the JavaScript side, having new options for opening windows and configuring them. For opening new windows or scenes, there could be some additional configurations for the type, whether it's a window or a volume, the sizing, the scaling, and maybe even alignment; maybe you want to attach one scene to another scene. And then, for containing 3D elements within 2D, I think the model tag that has been proposed can be extended to support multiple asset formats, like USDZ or GLB, and also to be controlled dynamically using JavaScript.
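A sketch of what such manifest extensions for scene configuration might look like; every field below other than name and display is hypothetical.

```json
{
  "name": "Example Spatial App",
  "display": "standalone",
  "scene": {
    "type": "volume",
    "default_size": { "width": 1200, "height": 800, "depth": 600 }
  }
}
```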
Room 406: And then to fix, or rather extend, the Canvas model, because it can't really fix all the issues we currently have, or that we discussed with the spatial web: we think we can have this new reality element, which is similar to the model tag for static 3D models, but extended with primitives that allow the developer to dynamically build up a 3D model or 3D asset. You can have, for example, a box, a sphere, some of these basic elements, and you can imagine developers would build out their scene content using these primitives.
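A sketch of what such a reality element with primitives might look like; the element and attribute names are hypothetical.

```html
<!-- Hypothetical markup: a container whose children are 3D primitives. -->
<reality>
  <box width="0.2" height="0.2" depth="0.2"></box>
  <sphere radius="0.1"></sphere>
</reality>
```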
Room 406: And I think there are a lot more opportunities here; this is just the beginning. We're proposing some ideas. I think this is a long road, so down the road maybe we talk about full space and augmented reality, ornaments, orbiters, and I think there are also some additional opportunities to extend WebXR sessions. Like I said, we're open to discussions on this and seeing what your thoughts are. We are open to questions, and then maybe I will save 5 minutes to show some real-world cases that have already adopted our WebSpatial SDK.