W3C

– DRAFT –
WebML WG Teleconference – 21 September 2023


Attendees

Present
Anssi_Kostiainen, Chai_Chaoweeraprasit, Dwayne_Robinson, Etienne_Noel, Joshua_Bell, Joshua_Lochner, Ningxin_Hu, RachelY, Rafael_Cintron
Regrets
Dominique_Hazael-Massieux
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

anssik: please welcome Etienne Noël from Google to the WG! SW Eng Manager at Google Chrome, has worked on Project Fugu among other things, also experience with Storage APIs, caching all the things, Pacific timezone

Update from TPAC 2023

anssik: As discussed and agreed, the W3C's Web Machine Learning WG did not formally meet at this year's TPAC. However, the WG's work received a lot of interest during the hallway track and in other meetings.
… AI/ML topics were discussed in the Advisory Committee meeting and in multiple breakout sessions:

AI @ W3C (AC meeting)

Ethical AI (breakout)

Generative AI (breakout)
… in the AC meeting, W3C's Dom gave an update on "AI @ W3C".
… WebNN API and Ethical Principles work noted explicitly as key efforts in this space, also our latest work on transformer-based generative models and the broader impact of generative AI was discussed
… generative AI discussion was centered around impacts:
… - transforming content creation
… - interaction with content and services
… - new centralized risks
… - web-scraped content re-use (for training)
… questions for W3C roadmap that came up:
… - marking content as AI generated
… - marking content usable for training
… - exposing browser-provided ML models through Web APIs
… - large asset download/caching/sharing from browsers
… - delegating partial model execution to the network
… operational questions for W3C:
… generative AI role in:
… - spec writing
… - test writing
… - docs writing

anssik: in both the breakout sessions, on Ethical AI and on Generative AI, the discussion was mostly around ethical issues and not so much on technical solutions
… in a WG chair capacity I invited interested folks to join the WG and help contribute to the Ethical Principles guidelines
… I shared that the WG is currently focused on delivering on its WebNN API v2 technical goals and we'd welcome new participants to help drive the ethical work forward.
… that's my summary from TPAC relevant to this WG, questions?

RachelY: it was great to see Anssi IRL
… we had a breakout session on computational intelligence too; Joshua Lochner and university researchers joined

Unleashing the Power of Computational Intelligence on the Web
… I'd like to ask this WG's participants to support the Community Group we're starting for Computational Intelligence

WebNN v2: Proposed model targets

anssik: Over the last few months the WG has identified the following as its proposed v2 model targets:
… - Text-to-image: Stable Diffusion unet/VAE/text encoder
… - Image segmentation: Segment Anything decoder
… - Speech-to-text: Whisper Tiny
… - Text-to-text generation (encoder-decoder): t5 and m2m100
… - Text-generation (decoder-only): llama
… the participants have proposed these targets backed by implementation experience from the WebNN Chromium DirectML prototype and Transformers.js

chai: on the topic of transformer support, I want to solicit feedback from the WG, I think this will take iterations
… we can start with the models now proposed, SD, Segment Anything
… we want to acknowledge adding this work will be incremental
… I think SD and Segment Anything are a good start
… adding these to the spec will help browsers improve other browser infrastructure such as the caching system
… SD 1.5 is a good starting point, not too big
… there is also interest in running SD on NPU; NPU support is separate from transformers support, but they're actually almost intertwined
… good opportunity to tackle NPU support at the same time; start small

ningxin_hu: the initial targets are good; with my colleague Wanming we started the op breakdown work, want to provide that as input to the WG in the coming days
… in this analysis we observed almost all these models in ONNX leverage dynamic shaping for input size
… today WebNN supports only static size tensors
… we looked at how WebNN implementation can handle this, leave dynamic shape to framework
… let WebNN deal with static shapes, doing this investigation in parallel
… we'll also provide that input, Jiawei also contributed to this work

<Joshua_Lochner> @Rachel @Anssi Link to my TPAC presentation (I reused the template from my previous talk, and updated the content to include the other HF JavaScript libraries): https://docs.google.com/presentation/d/1hx-Y1HFw_F88FJiqZnJ3FYlRnj7HKKwuYkJIRL0TcU0/edit?usp=sharing

<gb> @Rachel

<gb> @Anssi

<jsbell> No concerns, just excited!

<RachelY> I submitted a proposal for a business group with several university research groups and companies joining as new members. You are invited to support the creation of this group: http://www.w3.org/community/groups/proposed#computationalibg I would appreciate your input on how this community group can contribute to new use cases and learn from the good work of the Web ML WG.

ningxin_hu: if there are no concerns with this set of model targets I'd ask the WG to initiate work on an op breakdown to better understand what is common across these architectures to inform WebNN API v2 priorities
… this work is similar to what we did for the so-called first-wave models a few years back when "v1" API effort was kicked off

The first-wave models op breakdown
… at that time we looked at the following models:
… - Image classification: SqueezeNet, MobileNet, ResNet
… - Object detection: TinyYOLO
… - Noise suppression: RNNoise, NSNet
… we don't need to copy this approach exactly, the important part is to conduct this work in public to allow the public to review and participate
… issue #375 is the place for discussions, suggestions and feedback
… from that issue you'll also find Dwayne's initial proposal for v2 ops for the proposed text-to-image, image segmentation and speech-to-text models
… comments? it seems we're ready to move forward with this plan?

<gb> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]

Storage APIs for caching large models

anssik: I'd like to have a discussion on issues related to caching large models in the browser, current Storage APIs, specific issues and workarounds for those APIs.
… first, I'd like to set the expectation and then let Joshua share his insights on Storage APIs in general
… to frame this discussion:
… I'm not proposing we make any changes to the WebNN API to support caching, I'd like us to discuss issues related to current Storage APIs that could be submitted as feedback to the respective Storage API specs
… Storage APIs (plural) in browsers mean many things:
… for browsers we have: IndexedDB, localStorage, sessionStorage, Service Worker Cache API, applicationCache (deprecated, superseded by Service Worker Cache API)
… there's also the Storage Standard that defines APIs for persistent storage and quota estimates across all these Storage APIs.

IndexedDB

localStorage

sessionStorage

Service Worker Cache API

applicationCache (deprecated, background info)

Storage Standard
… I'd like Joshua to give an intro to this topic and then we can discuss the pain points with current APIs for caching large models.

jsbell: some background info then handover to Etienne
… first, we need to distinguish imperative from network caching
… the latter is not explicitly controlled, but hinted by HTTP headers
… network caching is confusing to web developers
… x-site resources could be cached and that made loading pages faster
… we've learned that is a privacy concern, so browsers use separate network caches nowadays
… today, for two top-level sites a.com and b.net, even if the same resource is requested from a CDN, they receive separate copies
… there has been research on whether we can maintain privacy properties with transparent caches
… this WG wants to cache GBs, a more pressing need for transparent caching or something that the browser caches and can be shared
… this leads to sites using imperative APIs to ensure data is not downloaded more than once
… a few new Storage APIs added include: File System API
… there's also a similarly named older API, not to be confused with it
… this new File System API aka Origin-Private File System (OPFS) ships in all browsers
… POSIX file semantics, accessible synchronously from workers
… works with Wasm, Emscripten native support
… AppCache is gone as Anssi mentioned, we can forget it
… the SW Cache API is a replacement for AppCache; resources pulled down from any site are immutable resources, the Cache API may be a good fit for those
… the API is fully accessible from Worker and Window contexts, please use it also outside the ServiceWorker context
… IndexedDB adds complexity on top of a KVS
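The Cache API pattern described above (fetch a resource once, serve it from the cache thereafter, usable outside any ServiceWorker) can be sketched as follows; this is not from the minutes: the function name, cache name, and URL are illustrative, and the CacheStorage object is a parameter only so the logic is self-contained.

```javascript
// Hedged sketch: cache-first loading of a large model file with the
// Cache API. `cacheStorage` defaults to the browser's global `caches`;
// the names are hypothetical.
async function getModel(url, cacheName = 'model-cache', cacheStorage = globalThis.caches) {
  const cache = await cacheStorage.open(cacheName);
  let response = await cache.match(url);
  if (!response) {
    response = await fetch(url);
    // Store a clone, because a Response body can only be consumed once;
    // the original is still readable by the caller below.
    await cache.put(url, response.clone());
  }
  return response.arrayBuffer();
}
```

In a page or worker this would be called as e.g. `getModel('https://example.com/model.onnx')` (an illustrative URL); subsequent calls on the same origin are served from the cache without re-downloading.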

etienne: I've researched the memory impact of downloading large files
… tested the Cache API and OPFS
… downloading without hitting memory limits is important; wanted to hear issues and what we can do to improve the existing APIs, do we need to inform developers better, do these APIs not meet the reqs of web developers?
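One way to avoid holding a whole multi-gigabyte download in memory, the concern behind Etienne's measurements above, is to stream the response body chunk by chunk into a storage writer (for example an OPFS writable stream). This is a hedged sketch, not from the minutes; the writer is injected rather than tied to a specific storage API.

```javascript
// Hedged sketch: stream a fetch Response into a writer chunk by chunk,
// so only one chunk is held in memory at a time. `writer` is any object
// with async write(chunk) and close(), e.g. an OPFS writable stream.
async function streamToWriter(response, writer) {
  const reader = response.body.getReader();
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    await writer.write(value);
  }
  await writer.close();
  return total; // bytes written
}
```

In a browser the writer could come from e.g. an OPFS file handle's `createWritable()`; the injection here is only so the control flow is clear and self-contained.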

Joshua_Lochner: great to meet you Etienne! I've created Transformers.js and for the most part caching has been way simpler than people would expect, I have >1GB models cached
… released today is an in-browser clone of the HF Chat UI running an 800M-parameter model, using the SW Cache API; Transformers.js only uses the Cache API
… early versions of Transformers.js requested models from the HF Hub and stored the downloaded model with the Cache API on the client
… top-level requirements: store, share and manage multi-gigabyte models
… key issue for me is x-site sharing of models
… now you have to download model A separately on sites 1, 2 and 3 instead of sharing it across the sites
… workaround is to use a browser extension
… I created a sample browser extension and had issues with IndexedDB; with the activeTab permission set in manifest.json, running IndexedDB in the context of the web page had issues
… I serialized model data with base64 to work around the sharing limitation, which of course increased the model size
… the File System API is also another alternative for sharing across domains; the problem is it requires buy-in from the user
… I want to make it completely invisible to the user where and how the model is stored
… needing to ask the user where the model is stored on their system is a problem

jsbell: thanks for the good news that using the Cache API just worked with large models
… web developers don't always realize how much the Cache API has improved over time
… back in the day only localStorage and sessionStorage with 5MB were available to web developers
… Chrome has improved quota and other browsers have followed
… limited storage wrt native apps was a big gap; in Chrome we piloted an upper limit where sites can use up to 80% of the available disk space, and we clean up that pool
… sites may compete for storage but that is not common
… Chrome, Safari, Firefox, Edge all can store gigabyte models today
… querying quota is also supported, Safari 17 supports the Storage API and its quota features
… JoshuaL talked about browser extensions to mediate between sites, going beyond the web proper, but we want to learn how web developers work around these issues
… good way to prototype these things
… Anssi mentioned at TPAC, what if browser shipped common models
… interesting idea, wonderful exploration
… re IndexedDB issues, each top-level site sees its own storage; an extension uses the extension's own domain and proxies the data from the extension to the individual pages
… using base64'd strings to pass the data shouldn't be needed
… the new File System API has access to local files the user can interact with, enabling e.g. an image editor on the web
… internally, this API's file handles also have access to the OPFS which, similar to IndexedDB, is hidden from the user
… the origin-private file system is hidden from the user and partitioned by top-level site, so it does not help with x-site sharing of cached models
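The quota querying jsbell mentions is the Storage Standard's `navigator.storage.estimate()`; below is a minimal sketch, not from the minutes, of checking headroom before downloading a model. The function name and byte figure are illustrative, and `storage` is a parameter only so the logic is self-contained.

```javascript
// Hedged sketch: would a model of the given size fit within the site's
// remaining storage quota? Uses the Storage Standard's estimate().
async function hasRoomFor(bytes, storage = navigator.storage) {
  const { usage = 0, quota = 0 } = await storage.estimate();
  return quota - usage >= bytes;
}
```

A site could call e.g. `await hasRoomFor(2 * 1024 ** 3)` before fetching a ~2 GB model (an illustrative size); note the estimate is deliberately fuzzy for privacy reasons.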

<Joshua_Lochner> Thanks so much Josh B!

<etienne> Joshua L. I definitely want to follow up with you to understand better why serializing was necessary, we can discuss offline

<Joshua_Lochner> Sounds good!

chai: this reminds me of the earlier days of web browsers when video files were too large to be cached; reminds me of the evolution of media, over time problems are mitigated by the ecosystem, web sites and browsers, e.g. video streaming fixed the video download problem
… the LLM download issue may solve itself in creative ways in the future

RafaelCintron: JoshuaB, what about the domain restrictions? Can JL do something to make users' lives easier?

jsbell: privacy protection is the most important

Update from Intel Innovation

chai: Intel Innovation 23 was this week, I was invited to give a talk on WebNN, I'll share the presentation
… a few important announcements: Gaudi for the datacenter, also an AI chip in the Intel MTL platform aka Intel Core Ultra
… I sense people want to see models running on NPU now; when we talked about WebNN, we have three processor types: Wasm for CPU SIMD, WebGPU for GPU, and WebNN as the standard solution to accelerate on GPU and NPU, this was well received
… implementation work in Chromium on DirectML was also discussed
… the web is catching up, AI on the web can be accelerated similarly to what happens outside the web

anssik: thanks for the update Chai!

<chai> The link to the deck is here: https://static.rainfocus.com/intel/innv2023/sess/1689805601141001Xyjy/supmat/DevelopingWebBasedAIApps_Final_1694887505835001jjpL.pdf

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).
