Meeting minutes
Repository: webmachinelearning/webnn
Anssi: welcome to our first meeting of the year 2026, we had a break over the holidays and are now returning to the usual cadence
Anssi: we'll start by acknowledging our latest new participant who joined the WG:
… Liang Zeng from ByteDance
… welcome to the group, Liang!
Anssi: also welcome again Doug Schepers!
Doug: using classical ML for a11y improvements
Anssi: Jonathan Ding from Intel joins us as a guest for this meeting to present a new proposal, discussed next
New proposal: Dynamic AI Offloading Protocol (DAOP)
Repository: webmachinelearning/proposals
Anssi: from time to time we review new proposals submitted for consideration by the WebML community
Anssi: we have received a new proposal called the Dynamic AI Offloading Protocol (DAOP) #15 that could benefit from this group's feedback and suggestions
<gb> Issue 15 Dynamic AI Offloading Protocol (DAOP) (by jonathanding)
Anssi: as you know, our group has received feedback from developers and software vendors from time to time that they'd love to run inference tasks with WebNN, but oftentimes they're unsure if the user's device is capable enough
… the diversity of models and client hardware makes it challenging to determine up front whether a given model can run on the user's device with QoS that meets the developer's requirements
… and we can't expose low-level details through the Web API to avoid fingerprinting; we also believe we shouldn't expose too much complexity through the Web API layer, to remain future-proof
… this easily leads to a situation where web apps either choose to use the least common denominator model, or use cloud-based inference even if the user's device could satisfy the QoS requirements
… I have invited Jonathan Ding to share a new proposal called Dynamic AI Offloading Protocol (DAOP) to address the challenges related to offloading inference tasks from servers to client devices
… Jonathan will introduce the proposal in the abstract, with a few example use cases and a high-level implementation idea -- we won't go into implementation details in this session
… after Jonathan's ~5-min intro we'll brainstorm a bit to get a feel for the room and inform the next steps
… I will ask everyone to focus on the use cases -- do these use cases capture the key requirements?
Jonathan: this is about hybrid AI; the expectation is to be able to offload the inference task to the client, but offloading is not free, you have QoS expectations
… you need to be able to decide if the device is capable of running the given model and satisfying the QoS requirements
Jonathan: Use Case 1: Adaptive Video Conferencing Background Blur
… A cloud-based video conferencing provider wants to offload background blur processing to the user's laptop to save server costs.
… 1. The cloud server sends a lightweight, weightless Model Description (topology and input shapes only, without the heavy weight parameters) of the blur model to the client's laptop
… 2. The laptop's browser runs a "Dry Run" simulation locally using the proposed API to estimate if it can handle the model at 30 FPS.
… 3. The laptop returns a QoS guarantee to the server.
… 4. If the QoS is sufficient, the server pushes the full model to the laptop; otherwise, processing remains on the cloud.
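For illustration only, a minimal sketch of what the client side of steps 2-3 could look like; the `navigator.ml.estimatePerformance()` entry point, its options, and the shape of the returned estimate are hypothetical names invented for this sketch, not part of the proposal or of the WebNN API.

```js
// Hypothetical client-side "Dry Run" for Use Case 1; every API name here
// (estimatePerformance, meetsTarget) is illustrative, not specified.

// Step 1: the server has sent a weightless Model Description
// (topology and input shapes only) as JSON.
const description = await (await fetch("/models/blur-weightless.json")).json();

// Step 2: ask the browser to simulate the topology locally and estimate
// whether it can sustain the requested frame rate, without downloading weights.
const estimate = await navigator.ml.estimatePerformance({
  model: description,
  targetFramesPerSecond: 30,
});

// Step 3: report a coarse QoS answer back to the server, which then decides
// (step 4) whether to push the full model or keep processing in the cloud.
await fetch("/qos-report", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ meetsTarget: estimate.meetsTarget }),
});
```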
Jonathan: Use Case 2: Privacy-Preserving Photo Enhancement for Mobile Web
… A photo editing web app wants to run complex enhancement filters using the user's mobile NPU to reduce latency.
… 1. The application queries the device's capability using the standard performance estimation API, avoiding fingerprinting by returning a broad performance "bucket" rather than exact hardware specs.
… 2. The device calculates its capability based on the memory bandwidth and NPU TOPs required by the filter model.
… 3. Finding the device capable, the app enables the "High Quality" filter locally, ensuring the user's photos never leave the device.
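Again for illustration, a sketch of step 1 as an app might write it; `estimateCapability`, the `bucket` values, `filterModelDescription`, and the helper functions are hypothetical placeholders. The key point is that only a broad bucket is returned rather than raw hardware details.

```js
// Hypothetical capability query for Use Case 2; names are illustrative.
// filterModelDescription is assumed to be a weightless description of the
// enhancement filter, as in Use Case 1.
const capability = await navigator.ml.estimateCapability({
  model: filterModelDescription,
});

// Step 2 happens inside the implementation: it weighs the model's memory
// bandwidth and compute needs against the device, but only a broad bucket
// ("low" | "medium" | "high") is exposed, to limit fingerprinting.
if (capability.bucket === "high") {
  // Step 3: run the "High Quality" filter locally; photos never leave the device.
  enableHighQualityLocalFilter();
} else {
  useBaselineFilter();
}
```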
Jonathan: two sub-proposals, differing in how they assign responsibility between the Caller and the Callee
… Sub-proposal A: Device-Centric (Caller Responsible)
… the Cloud acts as the central intelligence. It collects data from the device and makes the decision.
… Sub-proposal B: Model-Centric (Callee Responsible) - Preferred
… the Device acts as the domain expert. It receives a description of the work and decides if it can handle it.
Rafael: when estimating the capabilities of user hardware, it's not just "can I run the model", you also want to know if you can run the model well; this has challenges in native environments too
… with the same topology, bigger weights will run slower than smaller weights
… need to consider the impact of other applications running on the system at the same time
<RafaelCintron> http://
Rafael: this site shows what information is exposed by WebGPU; WebGPU adapter information does disclose pretty detailed information that allows developers to infer some details about the GPU, and something similar could in the abstract work for WebNN
Rafael: this is certainly something developers are struggling with and it is worth exploring further
… as models get bigger more people will struggle with this problem
Jonathan: thank you for the comments, very good feedback
… estimating without running the entire model aligns with what we observe from ISV discussions
Anssi: I think rustnn/
Tarek: I have also other utils, e.g. for ONNX<->WebNN graph conversion
Doug: can a person opt into and opt out of sharing device capability information?
… the model-centric proposal does not fingerprint, but does the user have agency in making sure the device is not used for compute they don't want it used for?
Rafael: to answer Doug: WebGPU and WebGL have no permission prompts, and those APIs can allocate a lot of memory and compute; the same goes for JS and the Storage APIs, and Chromium has lifted storage restrictions
Anssi: thank you Jonathan for sharing this proposal with the group
… I'm hearing the group agrees these use cases are valuable
… I also hear the group would like to see interested people move forward with this proposal
RESOLUTION: Create an explainer for Dynamic AI Offloading Protocol (DAOP) and initiate prototyping
Candidate Recommendation Snapshot 2026 review
Repository: webmachinelearning/webnn
Anssi: PR #915
<gb> Pull Request 915 Add Candidate Recommendation Snapshot for staging (by anssiko)
WebNN API spec release history
Anssi: we're ready to publish a new Candidate Recommendation Snapshot (CRS)
… this milestone will be communicated widely within the W3C community and externally
… our prior CRS release happened 11 April 2024, and a lot of progress has been made since:
… over 100 significant changes
… third wave of operators for enhanced transformers support
… the MLTensor API for buffer sharing
… a new abstract device selection mechanism
… the API surface has been modernized
… interoperability improvements informed by implementation experience and developer feedback
… improved security and privacy considerations
… fingerprinting mitigations
… new accessibility considerations
… I staged a release in PR #915 that adds an appendix with detailed changes per category, mapped to the W3C Process-defined Classes of Changes:
https://
Anssi: the next step for us is to record the group's decision to request transition
… any questions or concerns, are we ready to publish?
Dom: this release triggers a Call for Exclusions so everything that's in the release scope gets Royalty-Free protection
RESOLUTION: Publish a new Candidate Recommendation Snapshot of the WebNN API as staged in PR #915
<shepazu> (again, sorry for the noise… I'll try to be more respectful of group meeting time)
Implementation experience, from the past to the future
Anssi: in the past the group has also worked on webnn-native, a standalone native implementation as a C/C++ library
Markus: I'm interested in the webnn-native library; I'd like to understand the technical reasons for moving away from this library and into the current WebNN implementation that is more tightly integrated with the Chromium codebase
… webnn-native is similar to Dawn, a WebGPU implementation
Markus: should we revive webnn-native or use rustnn as the native interface for WebNN?
Rafael: I would be personally in favour of reviving webnn-native once we ship OT
… it is a lot of work to integrate a 3rd party library into Chromium, smart pointers, bitsets and all that
… webnn-native came first; there was opposition at the time to hosting the webnn-native library outside the Chromium project
Anssi: before the break Tarek shared news about the Python and Rust implementation of the WebNN API
… this work is now hosted under the newly established RustNN project along with other utils:
Anssi: this GH org hosts a number of repos:
… rustnn, the Rust implementation
… pywebnn, Python bindings for rustnn
… webnn-graph, a WebNN-oriented graph DSL
… webnn-onnx-utils, WebNN <-> ONNX conversion
… trtx-rs, TensorRT-RTX bindings
… and more
Tarek: it is a lot of fun working on RustNN, happy to add all interested collaborators to the repo
… I want to have all the WebNN demos working on Python as well, focusing on LLMs now
Tarek: I have a patch for Firefox to expose rustnn with JS bindings
Accelerated context option implementation feedback
Anssi: issue #911
<gb> Issue 911 accelerated should be prior to powerPreference for device selection (by mingmingtasd) [device selection]
Anssi: we received new implementation feedback from Mingming (thanks!) for the accelerated context option
https://
Anssi: specifically, the feedback asks for clarification on how "accelerated" is supposed to interact with the existing power preference ("default", "high-performance", "low-power")
… currently, as specified, the "accelerated" property has lower priority than "powerPreference"
… per Mingming, this creates difficulty in the following scenarios:
{ powerPreference: "low-power", accelerated: true } if no low-power device is available
{ powerPreference: "low-power", accelerated: false } if the implementation cannot force CPU to low-power state
{ powerPreference: "high-performance", accelerated: false } if the implementation cannot force CPU to high-performance state
… Mingming's proposal is to give "accelerated" a higher priority than "powerPreference"
… Zoltan's proposal is to consider "powerPreference" to set the power envelope limits
… I'd like to discuss how to evolve the spec to clarify this aspect
… first, I'd like to establish whether we agree both "accelerated" and "powerPreference" are hints, i.e. implementers provide best-effort service given this information
… second, I'd like to ask if it would be clearer to present the possible combinations as an informative truth table instead of prose?
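For reference, one possible shape of such an informative truth table; the interpretation column is illustrative only, loosely follows Zoltan's "power envelope" reading, and does not describe specified behavior.

```
powerPreference      accelerated   possible best-effort interpretation (illustrative)
-----------------    -----------   ---------------------------------------------------
"default"            true          prefer an accelerated device (GPU/NPU) if available
"default"            false         prefer a non-accelerated path, typically CPU
"low-power"          true          prefer an accelerated device within a low-power envelope
"low-power"          false         prefer low-power execution, typically CPU
"high-performance"   true          prefer the fastest accelerated device
"high-performance"   false         ambiguous today; see Mingming's scenarios above
```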
Mike: since these are hints, depending on the system the implementation can ignore them and be spec-conformant
… on macOS for example, WebGPU/GL may ignore similar hints
… as for how to interpret these hints, it may not be productive to try to prescribe what implementers should do
Zoltan: to summarize my proposal: if powerPreference is "low-power" it expresses a developer priority for lower power, otherwise "accelerated" would have priority; nevertheless these are all hints, and I would use an informative truth table
… I was considering Apple platform capabilities in this design
… power envelope may be for heat management or other reasons
Rafael: what do people think about items in the powerPreference enum?
… what is available in frameworks today for implementers?
… if the backend is CoreML or LiteRT, what do I do?
… fallback adapter would be one boolean
Rafael: suggestion, powerPreference enum could have a new "no-acceleration" value
<RafaelCintron> https://
Rafael: "no-acceleration" could map as of today to CPU as in current frameworks
… WebGPU has a similar problem and they solved it with powerPreference and fallback adapter
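A sketch of how Rafael's suggestion might look from the caller's side; the value name is a placeholder the group may iterate on, and the CPU mapping reflects what current frameworks would do today rather than a spec requirement.

```js
// Hypothetical: "no-acceleration" added to the existing powerPreference
// values ("default", "high-performance", "low-power"); today most backends
// would map this to CPU execution, but the spec would not mandate CPU.
const context = await navigator.ml.createContext({
  powerPreference: "no-acceleration",
});
```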
Mike: quick comment, the proposal from Rafael sounds reasonable; we may want to iterate on the name, "no-acceleration" should not explicitly mean run on the CPU
Zoltan: we had a use case for "accelerated", need to revisit that