Meeting minutes
Incubations
Dynamic AI Offloading Protocol
Repository: webmachinelearning/daop
Anssi: at our last meeting we resolved to create an explainer for Dynamic AI Offloading Protocol (DAOP) and initiate prototyping
https://
Anssi: thank you to MarkusH for providing feedback from Google Meet product perspective
Anssi: today, Jonathan will briefly present the explainer and prototype that he has put together (thanks!)
… the content Jonathan presents is from PR #1
<gb> Pull Request 1 Add DAOP explainer and estimateQoS() illustration with background blur demo (by jonathanding)
Reference implementation for illustration purposes (PR preview)
Anssi: after the presentation and demo, questions and comments are welcome
… and after the meeting, more detailed review feedback is welcome via PR #1
Anssi: Jonathan, to start, please introduce the explainer
Jonathan: this proposal extends the WebNN API
… the DAOP proposal uses model-centric estimation, which can be done at initialization time or dynamically at runtime
… the proposed estimateQoS() API extends the WebNN API
… this API uses performance tiers: "excellent", "good", "fair", "moderate", "slow", "very-slow", "poor"
… we propose a core API for performance negotiation
… to maximize the benefits of DAOP, the underlying WebNN specification should support a weightless build mode
… we propose that WebNN builders be extended to allow:
… - weightless constants
… - lazy / explicit binding
… instead of a performance tier, the local inference engine can respond with whether it can meet a given performance requirement, using the proposed meetsRequirement() API
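For orientation, here is a minimal sketch of how the proposed negotiation could look from script. estimateQoS() and meetsRequirement() are the names proposed in the explainer; the receiving object, arguments, option names, and return values below are illustrative assumptions, not definitions from the explainer.

// Sketch only: estimateQoS()/meetsRequirement() are proposed DAOP extensions;
// the exact signatures used here are assumptions for illustration.
const context = await navigator.ml.createContext();   // existing WebNN API
const builder = new MLGraphBuilder(context);          // existing WebNN API
const input = builder.input('x', { dataType: 'float32', shape: [1, 4] });
const output = builder.relu(input);                   // stand-in for a real model graph
const graph = await builder.build({ output });

// Option 1: ask for a coarse performance tier for this model.
const tier = await context.estimateQoS(graph);        // hypothetical receiver and argument
if (tier === 'excellent' || tier === 'good') {
  // run inference locally
} else {
  // offload to a remote endpoint
}

// Option 2: ask whether a concrete requirement can be met.
const ok = await context.meetsRequirement(graph, { maxLatencyMs: 33 }); // hypothetical option name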
Jonathan: for prototyping, we polyfilled WebNN and provided a demo
[Jonathan sharing DAOP demo]
Jonathan: the background blur demo uses lazy binding for a performance benefit and mitigates fingerprinting by using performance tiers
… micro-benchmarks can also be re-run in the background
… the explainer welcomes your comments and feedback in PR #1
<gb> Pull Request 1 Add DAOP explainer and estimateQoS() illustration with background blur demo (by jonathanding)
Jonathan: we chose the group's preferred approach, and rejected the alternative design
Rafael: if the implementation runs the benchmark locally, why are browsers better qualified to run the benchmark?
Jonathan: the browser has knowledge of the underlying platform and its capabilities, so it can provide a better offline estimation, and it can also estimate a QoS that holds over a longer period of time
Rafael: thanks, that clarifies the question, the browser indeed sometimes knows better which devices certain ops run on
Jonathan: we can provide a library abstraction on top of the proposed QoS estimation API
… the ultimate goal is to solve this problem for both web and native
Rafael: a QoS guarantee works for games, but the web browser case may be more complex, as it has a number of tabs doing other tasks whose performance profile may change over time
… a lesson from the Prompt API estimation was that we don't want to run the user's machine into the ground when the API is called
… I think estimation is sometimes the best we can do, it can just be challenging to produce and to maintain over time
MarkusT: estimating individual ops may be problematic due to the fusion that can happen
… there may be background operations that impact the estimation
… I'd be concerned if performance metrics measured by the frameworks were shared across websites, to avoid a super cookie-style mechanism
… generally I see this as useful as a library
Jonathan: eventually we need native support to account for fusion, caching, and other optimizations that only the native layer knows about
<Zakim> reillyg, you wanted to mention compatibility detection.
Reilly: I agree with MarkusT that this is probably not dangerous from a fingerprinting perspective
… for compatibility detection of models, developers could try to build models of different shapes, the challenge in that is the requirement to download all the models, but with a weightless graph shape they can do initial testing, I'm interested to see what opportunities that provides
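To illustrate the compatibility-detection idea, here is a sketch of testing a graph shape without weights. A descriptor-only constant() with late weight binding is what the DAOP explainer proposes, not current WebNN, so this overload is an assumption.

// Sketch only: weightless constants and lazy binding are proposed, not in WebNN today.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
// Hypothetical overload: declare weights by dataType/shape only, no buffer attached.
const weights = builder.constant({ dataType: 'float32', shape: [1024, 1024] });
const input = builder.input('x', { dataType: 'float32', shape: [1, 1024] });
const output = builder.matmul(input, weights);
// If build() succeeds, this op/shape combination is supported on the device;
// the real weights can then be downloaded and bound afterwards (lazy binding).
const graph = await builder.build({ output });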
<mtavenrath> One could build a performance database and use WebGL to query the GPU string...
Last Week in Community Group
Repository: webmachinelearning/webmcp
https://
https://
Anssi: the group established the WebMCP spec draft and landed the initial WebIDL in PR #75
<gb> MERGED Pull Request 75 Add WebIDL and descriptions to spec draft (by bwalderman)
Anssi: the group initiated work on WebMCP accessibility considerations, the group believes there's great potential to improve accessibility with this tech, and new a11y experts have joined to contribute to this effort
Repository: webmachinelearning/translation-api
Anssi: for the Translator and Language Detector APIs, there's been an active debate on input formats in Mozilla's standards position repo, see spec issue #60
<gb> Issue 60 Input formats for translation and language detection (by tomayac)
Repository: webmachinelearning/prompt-api
Anssi: for Prompt API, we discussed how to address the TAG review feedback
Web Neural Network API
Repository: webmachinelearning/webnn
Origin Trial launch
Anssi: I want to take a moment to celebrate the WebNN Origin Trial launch in Chrome and Edge 146
… WebNN Origin Trial journey started in Chrome Beta on 11 Feb and hits Stable on 10 Mar
… and for Edge, Beta on 19 Feb and Stable on 12 Mar
… this significant implementation milestone follows hot on the heels of the WebNN API spec CRS milestone of 22 Jan 2026
… congrats to the group for this major achievement!
… in concert with OT, the group also launched WebNN Docs, a vendor-neutral resource that assists early adopters on their OT journey
-> WebNN Docs: Origin Trials Registration
https://
Anssi: huge thanks to Belem for kicking off this docs resource!
… the entire webnn.io docs site is a community resource and you can contribute via GitHub:
Anssi: any other OT resources to bring to the group's attention?
Ningxin: I wanted to share that webnn-samples already include the OT token, so feel free to use the samples without the flag
Rafael: I did the same for Edge
<RafaelCintron> https://
Power preferences and the fallback concept
Anssi: issue #911
<gb> Issue 911 accelerated should be prior to powerPreference for device selection (by mingmingtasd) [device selection]
Anssi: two things:
Anssi: first, I'd like to see if we can agree on the proposed spec change to expand the powerPreference enum with a new value "fallback" (previously proposed as "no-acceleration")
… as discussed last time, "fallback" as a name would be more future-proof and conceptually aligned with WebGPU, thanks MikeW for the suggestion
… I feel this naming consideration is more than just bikeshedding
… second, I'd like to understand if we need any spec language changes to help clarify that these hints are non-normative
… we already avoid using any RFC 2119 MUSTs and SHOULDs in the MLContextOptions spec section
https://
Anssi: perhaps we could add a green Note box to highlight this fact
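For reference, a sketch of what the proposed change would look like at context creation. "fallback" is not yet in the spec; the snippet assumes the enum change lands as discussed.

// Sketch: "fallback" is the proposed addition to MLPowerPreference
// (previously discussed as "no-acceleration"); today's spec defines
// "default", "low-power", and "high-performance".
const context = await navigator.ml.createContext({
  powerPreference: 'fallback'   // a non-normative hint, as in WebGPU/WebGL
});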
Rafael: in both WebGPU and WebGL, power preferences are hints
<RafaelCintron> https://
<reillyg> https://
Anssi: it seems we agree "fallback" is a good name for a new powerPreference and these hints are non-normative, and maybe we want to lift some language from WebGPU/GL for WebNN
Reilly: I think we're at the stage where PR with a specific proposal would be a reasonable next step
Dynamic dimensions
Anssi: issue #883
<gb> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific] [Agenda+]
Anssi: I'd like to review the latest MLDynamicDimension proposal and discuss any new information from prototyping
… Ningxin and Bin verified that the latest Chromium prototype of this feature is able to run TinyLlama 1.1B model with dynamic input size
… TinyLlama 1.1B is optimized for use on devices with restricted memory budget and computational capabilities and enables use cases such as real-time machine translation
https://
Anssi: this bigger model is thus well-suited to demonstrate the MLDynamicDimension feature
Anssi: prototyping experience suggested two additional IDL changes: changing the type of expand.newShape and reshape.newShape from unsigned long to MLDimension
Ningxin: we can implement a full chat demo with this feature, in my comments I gave an example of how the developer can build this model with dynamic input and set concrete dimensions for the input at different stages: at the prefill stage per sequence length, and in the decoding loop by increasing the total sequence length while still using the same model
… at the last TPAC we shared the static shape limitation, where developers have to use two separate models, for prefill and decode, which increases memory footprint and inference time
… now with dynamic shape support we can use just one model
… for reshape and expand we observed the following changes are required:
- MLOperand expand(MLOperand input, sequence<unsigned long> newShape, optional MLOperatorOptions options = {});
+ MLOperand expand(MLOperand input, sequence<MLDimension> newShape, optional MLOperatorOptions options = {});
- MLOperand reshape(MLOperand input, sequence<unsigned long> newShape, optional MLOperatorOptions options = {});
+ MLOperand reshape(MLOperand input, sequence<MLDimension> newShape, optional MLOperatorOptions options = {});
Ningxin: in the prototype we can expand a known dynamic dimension, a concept we introduced; a known dynamic dimension is one defined by builder.input
… you can reshape or expand a shape with dynamic dimensions without introducing a new concept
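A sketch of this usage pattern follows. The concrete MLDimension syntax for naming a dynamic dimension is still under discussion in issue #883, so the string-based form below is an assumption for illustration.

// Sketch only: the MLDimension syntax is not final; a string-named
// dynamic dimension ('seq') is assumed here for illustration.
const builder = new MLGraphBuilder(await navigator.ml.createContext());
// One model serves both prefill and decode: 'seq' is a dynamic dimension.
const tokens = builder.input('tokens', { dataType: 'int32', shape: [1, 'seq'] });
// Per the proposed IDL change, reshape accepts the same dynamic dimension:
const reshaped = builder.reshape(tokens, [1, 1, 'seq']);
// A concrete size is bound at dispatch time: the prompt length during
// prefill, then the growing total sequence length in the decode loop.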
Ningxin: we believe we can also support Qwen and other LLMs with this proposed change
… Tarek's comments in rustnn are very helpful
Ningxin: to conclude, by using one model for both prefill and decode stages, the session creation time and memory footprint are reduced significantly
Anssi: Tarek is implementing and Markus is reviewing the dynamic shape feature in rustnn, you can follow the plan and implementation progress
<gb> Issue 15 Support flexible input sizes (by tarekziade)
<gb> Pull Request 16 Add dynamic Dimension support (flexible input shapes) (by tarekziade)
MarkusT: I will convey Tarek's feedback
… in summary, the changes seemed to work well for rustnn
Reilly: I support the group advancing to a spec PR for this feature
Markus: for the API change, do we support both versions of reshape and expand?
Ningxin: the current proposal is a union type, the IDL changes but it supports both static and dynamic use cases and can run all the static samples
<reillyg> +1
<RafaelCintron> +1
<mtavenrath> +1
RESOLUTION: Create a spec PR for MLDynamicDimension to support models that may require flexible input sizes, supporting both static and dynamic use cases.
Next WG Teleconference 12 March 2026
Anssi: given that many group participants are celebrating the Spring Festival, we will skip the 26 Feb WG teleconference
… the next WG teleconference will be 12 March 2026
… I want to wish everyone who is celebrating the Spring Festival a very relaxing vacation
… for 2026 the zodiac animal is the Horse, representing energy, speed, freedom, and success
… I like this as it reflects this group and its timely launch of WebNN OT in 2026
… we will be back with even more energy and speed after this mini-break!
<ningxin> thanks Anssi!