Meeting minutes
Repository: webmachinelearning/webnn
anssik: Welcome to a new participant, Ilya Rezvov from Google
Ilya: I work for Google on Wasm mostly, recently looking at ML-related efforts, fp16 in Wasm specifically and want to extend my interests to other areas of ML on the web
Call for Consensus: WebNN API CR Snapshot
anssik: On 28 Feb 2024 we issued a Call for Consensus (CfC) to publish the Web Neural Network API as a new Candidate Recommendation Snapshot (CRS).
-> CfC to publish WebNN API Candidate Recommendation Snapshot - review by 7 Mar 2024
https://
anssik: the WG has received explicit support for this CfC and no concerns have been raised
… our CR readiness tracker #240 shows all green except for the WIP TAG delta review
… the TAG delta review in flight may raise some questions at the transition time so we should be prepared for that
… that said, I hope we can address this in flight by noting this publication in fact explicitly addresses earlier TAG review feedback by removing support for synchronous execution
… I will handle CRS transition logistics with Dom and will ask the WG for further information as needed
… we can still merge the currently open PRs before branching for this release, it is important that after each commit the spec remains in a cohesive state
… transition request processing is expected to take a week, so the earliest publication date would be in the week of 18-22 March 2024
<gb> Issue 240 Candidate Recommendation readiness tracker (by anssiko) [process]
Hybrid AI exploration
anssik: As you recall, we have a sister WebML Community Group responsible for incubating new ideas for future work, the CG works closely with this WG, sharing many participants
… the WebML CG has received a new proposal called "Hybrid AI exploration":
webmachinelearning/
<gb> Issue 5 Hybrid AI Exploration (by grgustaf)
anssik: so I wanted to invite WebML CG participants Michael, Geoff, Sudeep to present this proposal to solicit input from this WG and inform the direction this exploration should take
… the timebox is ~20 minutes including Q&A
… any concrete proposals from this exploration that may impact Web APIs are expected to be incubated in an applicable Community Group first
Michael: Proposal is titled "Hybrid AI for the Web", probably a bit mistitled, we're looking at general model management too
… started work to improve the fit of WebNN on the client and want to make sure we look at the right problems, we're not proposing solutions at this stage
Michael: first going through the general status as we understand it, specific issues, goals and requirements we see, prioritization, closing with questions for this group
Michael: looked at WebNN use cases, client AI execution clearly in focus
… we found some problems, e.g. language translation requires large models, long download times, need to figure out client capabilities
… startup time may be significant
… if two different web sites use the same model, we need to download it twice
… clients vary in capabilities, vary in time, clients grow rapidly in performance and want to avoid least common denominator approach
Michael: Specific issues, three broader categories:
… 1) Model Management
… - Large models cannot be reused across origins
… - Model storage and management opaque to the user
… - Cache eviction may not match user preferences
… 2) Elasticity through Hybrid AI
… - Distributing work between client and server
… - Difficult to predict performance on a client
… - Sharing detailed client capabilities a privacy risk
… (noting possible overlap with PWA caching mechanisms)
… 3) User Experience
… - Privacy behaviour unclear, may not match user preferences
… - Managing latency of model downloads
Michael: Goals and Requirements
… Maximize ease of use for the end user
… - minimize load times and meet latency targets
… Portability and elasticity
… - minimize costs, support varying client capabilities, adapt based on resource availability
… Data privacy
… - Personal and business data, support user choice and control
… last but not least, developer ease of use and consistency
Michael: Questions for Discussion
… How to:
… handle model download latency and storage?
… match model reqs to client capabilities?
… choose among model fidelity levels?
… support progressive transmission of models?
… partition single models, support separate models, both?
… Questions to the group:
… - what should be the priorities?
… - Specific use cases for Hybrid AI?
Michael: Proposed Next Steps
… 1) Make sure we solve the right problem
… We welcome your feedback via the GH issue submitted to the proposals repo:
webmachinelearning/
<gb> Issue 5 Hybrid AI Exploration (by grgustaf)
Michael: 2) Build a prototype implementation
… e.g. using the Model Loader API from the CG as a basis, we have some ideas to test
Michael: 3) Bring results back to the group to discuss further
RafaelCintron: is there a solution in mind?
Michael: a caching strategy on the computational graph, and negotiating model requirements against client capabilities, are a few ideas
anssik: we have discussed Storage APIs for caching large models in this group earlier
anssik: I recall Joshua Lochner has prototyped solutions to cross-site sharing of models with a browser extension
Joshua_Lochner: from Transformers.js perspective we're fortunate that the browser caching API is pretty performant, can do 1.5B param model and refresh the page and it loads from the cache
… however, the issue emerges when I go to another web site on a different origin; my extension idea works but it requires the user to download a random extension, extra effort for the user, and it's not a standards-based feature
… for the size issue, I'm focused on smaller models that can perform in Wasm environment, soon WebGPU
… storage and issues related to exceptionally big models have not been the main focus; 50M to 250M parameters has been the focus of Transformers.js, the sweet spot
… due to the cache issues
… the main API is Web Cache API, models loaded as HTTP request-response pair from HuggingFace Hub
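The fetch-or-cache pattern described above can be sketched as follows; this is a hedged illustration, not Transformers.js code, and the model URL and `loadModelBytes` name are hypothetical. In a page, the cache argument would be `await caches.open("models")`.

```javascript
// Sketch of caching a model as an HTTP request/response pair via the
// Cache API, as described for Transformers.js. The cache object is
// passed in so the same logic works with `caches.open(...)` in a browser.
async function loadModelBytes(url, cache) {
  let response = await cache.match(url); // hit on page refresh
  if (!response) {
    response = await fetch(url); // first visit: go to the network
    // Store a clone so the original body can still be read below.
    await cache.put(url, response.clone());
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

On a second load of the same origin the network fetch is skipped entirely; as noted above, the cache does not help a different origin requesting the same model.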
Michael: are you caching serializations of the models, ONNX files?
… any ideas re adapters?
Joshua_Lochner: caching serializations, single .onnx files, in the future separate graph and weight in separate files
… for adapters, haven't thought about it yet
… ONNX RT is rather limited in this sense, if we want to use an adapter we need to export the whole model
… MMS text-to-speech model was one example where an adapter is at the end
… what we have been able to do is split up the models, only works if the backend is identical, e.g. text-gen model, chop off the head
… that's one way to share weights, not exactly the way you're proposing
Michael: the topology is the smaller chunk; we're concerned with weight caching at this stage
… progressive transmission raises questions about the API if it's read-only; the best option now is to create zero nodes and add weights later
… to progressively enhance the model we need to build it from scratch
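The zero-then-fill idea can be sketched as below. This is a hedged stand-in, not WebNN API code: buffers are allocated zero-filled so a graph could be built immediately, then patched in place as weight chunks arrive; with a read-only graph API, the updated buffers would feed a rebuild, as noted above.

```javascript
// Allocate placeholder weights; Float32Array is zero-initialized,
// matching the "create zero nodes" idea.
function makeZeroWeights(elementCount) {
  return new Float32Array(elementCount);
}

// Overwrite a slice of the placeholder weights as a chunk arrives
// progressively over the network.
function applyWeightChunk(weights, offsetElements, chunk) {
  weights.set(chunk, offsetElements);
}
```

A read-only graph API means these patched buffers cannot update a built graph directly; the graph would be rebuilt from them after each enhancement step.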
Michael: the prototype will likely be similar to HuggingFace solution, for good UX may need some standardization work later in the future
Michael: what use cases can make use of different level of fidelity?
… big-little mapping to server-client, any specific use cases for Hybrid AI?
Joshua_Lochner: I guess some form of a personalization model that learns things over time, continuous training, a model where you can update the weights over time, private personalization preference learning model
… e.g. you're on Twitter and want to block certain things
… you're probably referring to much larger things, multiple LoRAs on top of Llama
Michael: fine-tuning on the client, split out the LoRAs so we can select them, a lot of these optimizations are relevant to big models mainly, smaller models are faster to download as is
Joshua_Lochner: another use case, some form of underlying embeddings adapting, speech-to-text, text-to-speech, base model stays the same
webmachinelearning/
<gb> Issue 5 Hybrid AI Exploration (by grgustaf)
Open issues and PRs
anssik: as usual, let's discuss open issues and review PRs based on your feedback
Core operator set
<zkis> webmachinelearning/
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
anssik: issue #573
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
jsbell: we're working on an additional contribution
… in principle, if there are emerging standards from the ecosystem, we should look at them as well to inform our work, describe them separately
Consider using label to allow better error handling for async errors
anssik: issue #585
<gb> Issue 585 Consider using `label` to allow better error handling for async errors. (by philloooo) [feature request]
jsbell: there's a difference between sync and async errors; in the initial implementation a lot of validation happens synchronously in the renderer process because XNNPACK is also running in that process
… sync errors can be moved earlier, async errors are hard for developers and also frameworks to handle
… thinking how to report those errors, with promise rejection, how to know which node is responsible for the error?
… proposed solution is to follow WebGPU's practice: define an MLObjectBase with a label field for MLOperand to extend from
jsbell: when an async error is raised then the developer has useful information about the reason
… more interesting if decomp or fusion is done
… Zoltan proposed we could auto-gen these labels
… we're interested in any feedback
zkis: I just agree with the latest comment from Josh
RafaelCintron: wanted to say I'm in favour of labels and sync
… I was also a proponent of .label in WebGPU
anssik: jsbell Google folks interested in implementing this?
jsbell: yes
… what happens if code modifies the label of an operand while an async build is in progress; does the build step snapshot all the labels?
Dwayne: for debugging this is very helpful
… what is the format of labels, it would be helpful to raise the errors sooner than later, I wonder if you can know all the backend capabilities early, can do pre-validation, overall like the idea
zkis: we could keep generated labels separate from user-provided labels
… to be discussed in the issue
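The proposal discussed above can be sketched as follows; every name here (Operand, buildGraph) is a hypothetical stand-in, not the WebNN IDL. It illustrates a developer-set label with an auto-generated fallback (Zoltan's suggestion), and a build step that snapshots labels so mutation during an async build cannot change the error report (jsbell's question).

```javascript
// Hedged sketch: operands carry a label used in async error reporting.
let nextId = 0;

class Operand {
  constructor(op, label) {
    this.op = op;
    // Developer-provided label, else an auto-generated one.
    this.label = label ?? `${op}_${nextId++}`;
  }
}

async function buildGraph(operands, supportedOps) {
  // Snapshot labels at build time so later mutation can't alter reports.
  const labels = operands.map((o) => o.label);
  operands.forEach((o, i) => {
    if (!supportedOps.has(o.op)) {
      throw new Error(`build failed at operand "${labels[i]}" (op: ${o.op})`);
    }
  });
  return { status: "compiled" };
}
```

With this shape, a promise rejection from the async build names the responsible node instead of leaving the developer to guess which operand in the graph failed.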
Rename inputSize variables as inputRank in algorithms
anssik: issue #588
<gb> Issue 588 Rename inputSize variables as inputRank in algorithms (by inexorabletash) [conventions]
jsbell: this is very simple, comments welcome
Consider alternate styling/linking for method argument definitions
anssik: issue #574
<gb> Issue 574 Consider alternate styling/linking for method argument definitions (by inexorabletash) [question] [editorial] [conventions]
anssik: question regarding styling method arguments with three alternatives:
… Alternative 1: Make args into definitions
… Alternative 2: Style as definition list
… Alternative 3: Auto-generated table