Meeting minutes
Repository: webmachinelearning/webnn
Anssi: welcome to our first meeting of the year 2026, we had a break over the holidays and are now returning to the usual cadence
Anssi: we'll start by acknowledging our latest new participant who joined the WG:
… Liang Zeng from ByteDance
… welcome to the group, Liang!
Anssi: also welcome again Doug Schepers!
Doug: using classical ML for a11y improvements
Anssi: Jonathan Ding from Intel joins us as a guest for this meeting to present a new proposal, discussed next
New proposal: Dynamic AI Offloading Protocol (DAOP)
Repository: webmachinelearning/proposals
Anssi: from time to time we review new proposals submitted for consideration by the WebML community
Anssi: we have received a new proposal called the Dynamic AI Offloading Protocol (DAOP) #15 that could benefit from this group's feedback and suggestions
<gb> Issue 15 Dynamic AI Offloading Protocol (DAOP) (by jonathanding)
Anssi: as you know, our group has received feedback from developers and software vendors from time to time that they'd love to run inference tasks with WebNN, but oftentimes they're unsure if the user's device is capable enough
… the diversity of models and client hardware makes it challenging to determine up front whether a given model can run on the user's device with QoS that meets the developer's requirements
… and we can't expose low-level details through the Web API to avoid fingerprinting; we also believe we shouldn't expose too much complexity through the Web API layer, to remain future-proof
… this easily leads to a situation where web apps either choose to use the least common denominator model, or use cloud-based inference even if the user's device could satisfy the QoS requirements
… I have invited Jonathan Ding to share a new proposal called Dynamic AI Offloading Protocol (DAOP) to address the challenges related to offloading inference tasks from servers to client devices
… Jonathan will introduce the proposal in the abstract, with a few example use cases and a high-level implementation idea -- we won't go into implementation details in this session
… after Jonathan's ~5-min intro we'll brainstorm a bit to get a feel for the room and inform the next steps
… I will ask everyone to focus on the use cases -- do these use cases capture the key requirements?
Jonathan: this is about hybrid AI; the expectation is to be able to offload the inference task to the client, but offloading is not free, you have QoS expectations
… you need to be able to decide if the device is capable of running the given model and satisfying the QoS requirements
Jonathan: Use Case 1: Adaptive Video Conferencing Background Blur
… A cloud-based video conferencing provider wants to offload background blur processing to the user's laptop to save server costs.
… 1. The cloud server sends a lightweight, weightless Model Description (topology and input shapes only, without the heavy weight parameters) of the blur model to the client's laptop
… 2. The laptop's browser runs a "Dry Run" simulation locally using the proposed API to estimate if it can handle the model at 30 FPS.
… 3. The laptop returns a QoS guarantee to the server.
… 4. If the QoS is sufficient, the server pushes the full model to the laptop; otherwise, processing remains on the cloud.
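For illustration only, a minimal sketch of what the client side of steps 2-3 could look like; the `navigator.ml.estimatePerformance()` entry point, its options, and the shape of the returned estimate are hypothetical names invented for this sketch, not part of the proposal or of the WebNN API.

```js
// Hypothetical client-side "Dry Run" for Use Case 1; every API name here
// (estimatePerformance, meetsTarget) is illustrative, not specified.

// Step 1: the server has sent a weightless Model Description
// (topology and input shapes only) as JSON.
const description = await (await fetch("/models/blur-weightless.json")).json();

// Step 2: ask the browser to simulate the topology locally and estimate
// whether it can sustain the requested frame rate, without downloading weights.
const estimate = await navigator.ml.estimatePerformance({
  model: description,
  targetFramesPerSecond: 30,
});

// Step 3: report a coarse QoS answer back to the server, which then decides
// (step 4) whether to push the full model or keep processing in the cloud.
await fetch("/qos-report", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ meetsTarget: estimate.meetsTarget }),
});
```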
Jonathan: Use Case 2: Privacy-Preserving Photo Enhancement for Mobile Web
… A photo editing web app wants to run complex enhancement filters using the user's mobile NPU to reduce latency.
… 1. The application queries the device's capability using the standard performance estimation API, avoiding fingerprinting by returning a broad performance "bucket" rather than exact hardware specs.
… 2. The device calculates its capability based on the memory bandwidth and NPU TOPs required by the filter model.
… 3. Finding the device capable, the app enables the "High Quality" filter locally, ensuring the user's photos never leave the device.
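Again for illustration, a sketch of step 1 as an app might write it; `estimateCapability`, the `bucket` values, `filterModelDescription`, and the helper functions are hypothetical placeholders. The key point is that only a broad bucket is returned rather than raw hardware details.

```js
// Hypothetical capability query for Use Case 2; names are illustrative.
// filterModelDescription is assumed to be a weightless description of the
// enhancement filter, as in Use Case 1.
const capability = await navigator.ml.estimateCapability({
  model: filterModelDescription,
});

// Step 2 happens inside the implementation: it weighs the model's memory
// bandwidth and compute needs against the device, but only a broad bucket
// ("low" | "medium" | "high") is exposed, to limit fingerprinting.
if (capability.bucket === "high") {
  // Step 3: run the "High Quality" filter locally; photos never leave the device.
  enableHighQualityLocalFilter();
} else {
  useBaselineFilter();
}
```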
Jonathan: two sub-proposals, differing in how they assign responsibility between the Caller and the Callee
… Sub-proposal A: Device-Centric (Caller Responsible)
… the Cloud acts as the central intelligence. It collects data from the device and makes the decision.
… Sub-proposal B: Model-Centric (Callee Responsible) - Preferred
… the Device acts as the domain expert. It receives a description of the work and decides if it can handle it.
Rafael: when estimating the capabilities of user hardware, it's not just "can I run the model", you also want to know if you can run the model well; this has challenges in native environments too
… with the same topology, bigger weights will run slower than smaller weights
… need to consider the impact of other applications running on the system at the same time
<RafaelCintron> http://
Rafael: this site shows what information is exposed by WebGPU; WebGPU adapter information does disclose pretty detailed information that allows developers to infer some details about the GPU, and something similar could in the abstract work for WebNN
Rafael: this is certainly something developers are struggling with and it is worth exploring further
… as models get bigger more people will struggle with this problem
Jonathan: thank you for the comments, very good feedback
… estimating without running the entire model aligns with what we observe from ISV discussions
Anssi: I think rustnn/
Tarek: I have also other utils, e.g. for ONNX<->WebNN graph conversion
Doug: can a person opt into and opt out of sharing device capability information?
… the model-centric proposal does not fingerprint, but does the user have agency in making sure the device is not used for compute they don't want it used for?
Rafael: to answer Doug: WebGPU and WebGL have no permission prompts, and those APIs can allocate a lot of memory and compute; the same goes for JS and the Storage APIs, and Chromium has lifted storage restrictions
Anssi: thank you Jonathan for sharing this proposal with the group
… I'm hearing the group agrees these use cases are valuable
… I also hear the group would like to see interested people move forward with this proposal
RESOLUTION: Create an explainer for Dynamic AI Offloading Protocol (DAOP) and initiate prototyping
Candidate Recommendation Snapshot 2026 review
Repository: webmachinelearning/webnn
Anssi: PR #915
<gb> Pull Request 915 Add Candidate Recommendation Snapshot for staging (by anssiko)
WebNN API spec release history
Anssi: we're ready to publish a new Candidate Recommendation Snapshot (CRS)
… this milestone will be communicated widely within the W3C community and externally
… our prior CRS release happened 11 April 2024, and a lot of progress has been made since:
… over 100 significant changes
… third wave of operators for enhanced transformers support
… the MLTensor API for buffer sharing
… a new abstract device selection mechanism
… the API surface has been modernized
… interoperability improvements informed by implementation experience and developer feedback
… improved security and privacy considerations
… fingerprinting mitigations
… new accessibility considerations
… I staged a release in PR #915 that adds an appendix with detailed changes per category, mapped to the W3C Process-defined Classes of Changes:
https://
Anssi: the next step for us is to record the group's decision to request transition
… any questions or concerns, are we ready to publish?
Dom: this release triggers a Call for Exclusions so everything that's in the release scope gets Royalty-Free protection
RESOLUTION: Publish a new Candidate Recommendation Snapshot of the WebNN API as staged in PR #915
<shepazu> (again, sorry for the noise… I'll try to be more respectful of group meeting time)
Implementation experience, from the past to the future
Anssi: in the past the group has also worked on webnn-native, a standalone native implementation as a C/C++ library
Markus: I'm interested in the webnn-native library; I'd like to understand the technical reasons for moving away from this library and into the current WebNN implementation that is more tightly integrated with the Chromium codebase
… webnn-native is similar to Dawn, a WebGPU implementation
Markus: should we revive webnn-native or use rustnn as the native interface for WebNN?
Rafael: I would be personally in favour of reviving webnn-native once we ship OT
… it is a lot of work to integrate a 3rd party library into Chromium, smart pointers, bitsets and all that
… webnn-native came first; there was opposition at the time to hosting the webnn-native library outside the Chromium project
Anssi: before the break Tarek shared news about the Python and Rust implementation of the WebNN API
… this work is now hosted under the newly established RustNN project along with other utils:
Anssi: this GH org hosts a number of repos:
… rustnn, the Rust implementation
… pywebnn, Python bindings for rustnn
… webnn-graph, a WebNN-oriented graph DSL
… webnn-onnx-utils, WebNN <-> ONNX conversion
… trtx-rs, TensorRT-RTX bindings
… and more
Tarek: it is a lot of fun working on RustNN, happy to add all interested collaborators to the repo
… I want to have all the WebNN demos working on Python as well, focusing on LLMs now
Tarek: I have a patch for Firefox to expose rustnn with JS bindings
Accelerated context option implementation feedback
Anssi: issue #911
<gb> Issue 911 accelerated should be prior to powerPreference for device selection (by mingmingtasd) [device selection]
Anssi: we received new implementation feedback from Mingming (thanks!) for the accelerated context option
https://
Anssi: specifically, the feedback asks for clarification on how "accelerated" is supposed to interact with the existing power preference ("default", "high-performance", "low-power")
… currently, as specified, the "accelerated" property has lower priority than "powerPreference"
… per Mingming, this creates difficulty in the following scenarios:
{ powerPreference: "low-power", accelerated: true } if no low-power device is available
{ powerPreference: "low-power", accelerated: false } if the implementation cannot force CPU to low-power state
{ powerPreference: "high-performance", accelerated: false } if the implementation cannot force CPU to high-performance state
… Mingming's proposal is to give "accelerated" a higher priority than "powerPreference"
… Zoltan's proposal is to consider "powerPreference" to set the power envelope limits
… I'd like to discuss how to evolve the spec to clarify this aspect
… first, I'd like to establish whether we agree both "accelerated" and "powerPreference" are hints, i.e. implementers provide best-effort service given this information
… second, I'd like to ask if it would be clearer to present the possible combinations as an informative truth table instead of prose?
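For reference, one possible shape of such an informative truth table; the interpretation column is illustrative only, loosely follows Zoltan's "power envelope" reading, and does not describe specified behavior.

```
powerPreference      accelerated   possible best-effort interpretation (illustrative)
-----------------    -----------   ---------------------------------------------------
"default"            true          prefer an accelerated device (GPU/NPU) if available
"default"            false         prefer a non-accelerated path, typically CPU
"low-power"          true          prefer an accelerated device within a low-power envelope
"low-power"          false         prefer low-power execution, typically CPU
"high-performance"   true          prefer the fastest accelerated device
"high-performance"   false         ambiguous today; see Mingming's scenarios above
```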
Mike: since these are hints, depending on the system the implementation can ignore them and be spec-conformant
… on macOS for example, WebGPU/GL may ignore similar hints
… as for how to interpret these hints, it may not be productive to try to prescribe what implementers should do
Zoltan: to summarize my proposal: if powerPreference is "low-power" it expresses a developer priority for lower power, otherwise "accelerated" would have priority; nevertheless these are all hints, and I would use an informative truth table
… I was considering Apple platform capabilities in this design
… power envelope may be for heat management or other reasons
Rafael: what do people think about items in the powerPreference enum?
… what is available in frameworks today for implementers?
… if the backend is CoreML or LiteRT, what do I do?
… fallback adapter would be one boolean
Rafael: suggestion, powerPreference enum could have a new "no-acceleration" value
<RafaelCintron> https://
Rafael: "no-acceleration" could map as of today to CPU as in current frameworks
… WebGPU has a similar problem and they solved it with powerPreference and fallback adapter
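A sketch of how Rafael's suggestion might look from the caller's side; the value name is a placeholder the group may iterate on, and the CPU mapping reflects what current frameworks would do today rather than a spec requirement.

```js
// Hypothetical: "no-acceleration" added to the existing powerPreference
// values ("default", "high-performance", "low-power"); today most backends
// would map this to CPU execution, but the spec would not mandate CPU.
const context = await navigator.ml.createContext({
  powerPreference: "no-acceleration",
});
```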
Mike: quick comment, the proposal from Rafael sounds reasonable; we may want to iterate on the name, "no-acceleration" should not explicitly mean run on the CPU
Zoltan: we had a use case for "accelerated", need to revisit that