W3C

– DRAFT –
WebML WG Teleconference – 18 December 2025

18 December 2025

Attendees

Present
Anssi_Kostiainen, Dwayne_Robinson, Ehsan_Toreini, Markus_Handell, Ningxin_Hu, Rafael_Cintron, Reilly_Grant, Tarek_Ziade, Vasilii_Trofimchuk, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

Anssi: we'll start by acknowledging our new participants who joined the WG:
… Victor Huang from Microsoft
… JuGuang Liu from ByteDance
… welcome Victor and JuGuang, we look forward to working with you!

Incubations

Anssi: I want to share two key takeaways from the Community Group meeting last week:

WebML CG Teleconference – 11 December 2025

Anssi: first, Anthropic migrated the MCP development into a newly launched neutral forum, Agentic AI Foundation (AAIF), hosted as a Directed Fund under the Linux Foundation
… I had discussions with Vasilii from Block, a co-founder of AAIF, as well as Dom from W3C team
… to that end, we are in the process of formalizing the W3C Community Group's coordination relationship with the newly established AAIF to enable seamless collaboration
… concrete tasks include aligning our charters and ensuring our joint work mode facilitates building interoperability between MCP and WebMCP wrt common primitives where applicable

Anssi: second, we resolved to transition from the WebMCP explainer to a Community Group spec draft stage using the existing explainer, proposal and other supplementary documentation in the repo as the basis
… we plan to complete this important transition during the first quarter of 2026

New implementation experience and developer feedback

Python and Rust implementation of the WebNN API aimed at Firefox

Anssi: I'm pleased to share with you all an early Xmas present
… I will bring in Tarek to announce the first ever Python and Rust implementation of the WebNN API aimed at Firefox
… this new WebNN implementation further validates, and thereby improves, the already high-quality WebNN API specification
… I want to thank Tarek on behalf of the group for initiating this important implementation effort that broadens the reach of WebNN to non-Chromium browsers and to the Python ecosystem, making it possible to use WebNN outside the browser for the first time ever
… this helps us establish the WebNN API as the lingua franca spoken by both web and Python developers
… I will let Tarek share the exciting story

Tarek: I have a few slides prepared, will share them with the group

Tarek: the work is at an experimental stage
… rustnn is a Rust implementation of WebNN, independent library, follows WebNN spec
… easy to add a Python API similar to JS API, this allows Python ecosystem to play with WebNN
… the project was built with Firefox in mind, we want to integrate WebNN into Firefox
… I looked at Chromium implementation, also webnn-native, eventually chose Rust given it fits well with Gecko, also enables Python bindings easily
… design goals: strict spec interpretation, backend independence, early error detection, testability, not just for browsers
… high-level architecture, three executors: ONNX Runtime, Core ML, TensorRT
… ONNX Runtime executor implements the most operators, almost all of them
… backend converts the WebNN Graph to ONNX Graph and CoreML Graph
… simple and pluggable Rust code base
… Python implementation mirrors WebNN structure, context -> builder -> graph
… MobileNet demo from the WebNN samples repo works with Python bindings, using pywebnn
… Rust example, same conceptual phases as with Python
… for Firefox support, another patch in Bugzilla that implements the same WebIDL API as Chromium and uses cbindgen for bridging rustnn to C++
… I didn't do anything via IPC for POC purposes; the final patch will use an IPC layer and is coming soon
… next I'll share a demo of Firefox with WebNN API
… MobileNet demo loads weights from 106 layers, grabs an image and does the classification in the browser, works exactly the same as the demo that exists in Chromium
… implementation status, currently 85 ops implemented, ~89% of current WebNN API spec
… implemented ops support validation, shape inference, ONNX, CoreML lowering
… some gaps exist: RNN family deferred, CoreML partially implemented, float16 reference issues, Firefox patch is POC quality due to the missing IPC layer

Tarek: WPT and conformance tests made implementation work way easier, 1350 tests passing now for the ONNX backend
… next steps, finish WPT data conversion for remaining ops, implement more demos, finish TensorRT execution support, performance, improve docs
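A minimal sketch of the broadcast shape-inference step mentioned above, for elementwise binary ops following NumPy-style broadcasting rules (an illustration only, not code from rustnn):

```javascript
// Sketch of the kind of shape inference a WebNN implementation performs
// for elementwise binary ops (e.g. add), using standard broadcasting
// rules. Illustration only; not taken from rustnn.
function broadcastShapes(a, b) {
  const rank = Math.max(a.length, b.length);
  const out = new Array(rank);
  for (let i = 0; i < rank; i++) {
    // Walk dimensions from the trailing end; missing dims count as 1.
    const da = a[a.length - 1 - i] ?? 1;
    const db = b[b.length - 1 - i] ?? 1;
    if (da !== db && da !== 1 && db !== 1) {
      throw new TypeError(`Shapes [${a}] and [${b}] are not broadcastable`);
    }
    out[rank - 1 - i] = Math.max(da, db);
  }
  return out;
}
```

For example, `broadcastShapes([2, 3, 4], [3, 1])` yields `[2, 3, 4]`, while incompatible shapes such as `[2]` and `[3]` are rejected at validation time, before any backend work happens.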

rustnn GitHub repo

rustnn docs

Blog post: WebNN aimed at Firefox

pywebnn package (Python bindings for W3C WebNN API)

Firefox Integration Bug

Rafael: thank you so much for this work! Great to see Rust and Python
… have you thought about the more advanced demos for WebNN?

Tarek: I started with the demos hosted under the webmachinelearning GH org

Tarek: I will check those additional demos out

Rafael: you did all this in Rust, I was surprised you went to C++ to integrate with Firefox

Tarek: Gecko is a big C++ app and core is C++ so to integrate a new feature you create a Rust lib and expose it to C++ app
… maybe one day all this is in Rust, but now core is in C++, all the things in WebIDL is C++ code

Ningxin: thank you Tarek, awesome work!
… I will share another link for additional demos we host on Hugging Face so it'll be great to see those run in rustnn
… in your presentation the code uses compute() and passes CPU buffers, do you have plans to support MLTensor and the dispatch interface?

Tarek: I have MLTensor already, I did not surface it in this version yet

Dwayne: cool demo!
… you converted WPT to Python test cases, your experience?

Tarek: see https://github.com/tarekziade/rustnn/tree/main/tests/wpt_data for the approach

New developer feedback from the developer ecosystem

Anssi: we see developer excitement around WebNN building
… in appreciation of the developer community's contributions, we've been curating the experiments and feedback into the awesome-webnn GitHub repo, open to all:

webmachinelearning/awesome-webnn

Anssi: you will find pointers to community-contributed demos, tech talks at developer events, tutorials and more in this repo

Anssi: I'd like to share recent feedback we received from a well-known developer through our spec repo, quoting:
… "Holy s**t, looks like in the right combination WebNN inference on GPU is over 5x faster than WebGPU"

webmachinelearning/webnn#763 (comment)

<gb> Issue 763 Request standards positions from Mozilla and WebKit (by reillyeon) [process]

Anssi: the right combination is the latest Chromium, ONNXRuntime backend, and ONNXRuntime Web
… in this developer's case he was able to tap into CUDA kernels via WebNN instead of generic GPU pipeline through WebGPU
… this provided a significant performance boost in this use case

Anssi: this feedback demonstrates how WebNN as a high level API abstraction is able to accelerate computationally expensive ops that are the building blocks of modern model architectures
… this reinforces the message that the WebNN and WebGPU APIs coexist and complement each other on the web platform, thus we continue to improve efficient interop bridges

External weights, learnings from WebGPU

Anssi: WebNN issue #901

<gb> Issue 901 Proposal: API to Separate Graph Building from Weight Loading to Reduce Peak Memory Usage (by mtavenrath) [feature request]

Anssi: related WebGPU issue gpuweb/gpuweb#4185

<gb> Issue 4185 Image uploading is insufficiently expressive to be optimized (by kainino0x) [api] [api-milestone-2-202502]

Anssi: I'd like to continue discussing external weights for constants, a proposal we initially explored at TPAC
… last week we received new information from WebGPU experts regarding data-streaming APIs and plain buffer uploads, and WebGPU's approach
… Reilly asked Kai from the WebGPU land whether they have considered a design that would allow an HTTP request as the source for a resource like so:

let constant = builder.constant(new Request("model.bin", {headers: {"Range": "bytes=5435435-5484329"}}));

Anssi: the WebGPU group was interested in this approach, but Kai shared they haven't yet done concrete work to integrate that feature into the WebGPU API
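A small sketch of the proposed pattern, using only the standard Request API; the helper name is hypothetical, and the builder.constant(request) overload it targets is the proposal under discussion, not shipped API:

```javascript
// Hypothetical helper illustrating the proposed pattern: build an HTTP
// Range request addressing one weight blob inside a larger model file,
// which a future builder.constant(request) overload could consume
// directly without the page ever materializing the bytes.
function weightRangeRequest(url, byteOffset, byteLength) {
  // The end position in an HTTP Range header is inclusive.
  const end = byteOffset + byteLength - 1;
  return new Request(url, {
    headers: { "Range": `bytes=${byteOffset}-${end}` },
  });
}

// Reproduces the range from the snippet above: 48895 bytes at offset 5435435.
const req = weightRangeRequest("https://example.com/model.bin", 5435435, 48895);
```

The attraction is that the implementation, not the page, decides when (and into which memory) the weight bytes are fetched.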

Anssi: in the WebGPU issue Kai illustrated two paths how an image file is fed into a GPUTexture:
… HTMLImageElement -> createImageBitmap -> copyExternalImageToTexture
… fetch -> blob -> createImageBitmap -> copyExternalImageToTexture
… WebGPU API's copyExternalImageToTexture() has since been upgraded and now accepts any of the following directly as a source:
… ImageData
… HTMLImageElement
… HTMLVideoElement
… VideoFrame
… HTMLCanvasElement
… OffscreenCanvas

https://www.w3.org/TR/webgpu/#gpucopyexternalimagesourceinfo

Anssi: so the following more optimized path should work now:
… HTMLImageElement -> copyExternalImageToTexture
… streaming to texture is still not supported, though

Reilly: I like pulling weights from an HTTP request, the maximal hands-off approach
… by using image we can cut some intermediate steps off, but if we host weights somewhere on the internet we can load them directly
… the challenge is on the framework side, they don't do anything similar, they take a streaming approach
… careful management of how much weight data can be loaded in memory before feeding the GPU
… proposal from TPAC, having a constant that takes a stream might be what we want to do to meet frameworks where they are today
… you can still call all constants at once
… there's no backpressure in the system; to add that would need streams or a promise somewhere
… next step to talk to ONNX folks who did WebNN integration and other framework providers
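The stream-based constant idea can be sketched with a plain ReadableStream consumer; the function below stands in for what a hypothetical builder.constant(stream) could do internally, and none of it is actual WebNN API surface:

```javascript
// Sketch of the TPAC proposal's streaming idea: the implementation pulls
// weight chunks one at a time, so a pull-based ReadableStream source
// naturally provides backpressure. Hypothetical internals, not WebNN API.
async function consumeWeightStream(stream, onChunk) {
  const reader = stream.getReader();
  let total = 0;
  for (;;) {
    // Each read() is awaited before the next pull, limiting how much
    // weight data sits in memory before it is handed to the backend.
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    onChunk(value); // e.g. upload this chunk toward the GPU
  }
  return total;
}
```

The awaited read loop is where backpressure lives: a fetch()-backed stream will only be pulled as fast as the backend consumes chunks, which addresses the "streams or a promise somewhere" point above.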

Anssi: should we coordinate anything wrt this with WebGPU group?

Reilly: we should run our proposal through WebGPU group to get their feedback for conceptual alignment
… there are WebGPU specific things, if they're interested in pulling directly from HTTP requests, we should make sure the way we specify things look similar
… I expect there to be differences
… alignment on the general pattern is the most important

Rafael: I haven't been heavily involved with this particular WebGPU feature, I wouldn't categorize this as streaming, in this case you need to wait for the download to finish before you can use it
… this is more like attaching things together with minimal steps
… go straight from first connection to WebGPU or WebNN, without conversion and memory copies
… we've seen cases in WebGPU where you e.g. don't want the alpha channel multiplied with colors, you just want to get the data

Reilly: I wouldn't view this as streaming either, but as giving the implementation visibility into when the resources are uploaded to the GPU
… existing frameworks expect the weights to be available at model compilation time, this is a limitation
… that means certain optimizations are not possible
… with HTTP request approach, we can give frameworks control over when the resources are loaded
… now the resources are loaded before build()
… we could change the behaviour when frameworks improve their approach

Ningxin: I will talk to Wangming and ONNX Runtime folks

Reilly: this is a forward-looking feature, will talk to LiteRT framework people about this feature

New device selection hints for MLContextOptions

Anssi: issue #902

<gb> Issue 902 Device selection criteria for usecase-driven scenarios (by fdwr) [device selection]

Anssi: I wanted to check the group's latest thoughts on hints to complement MLPowerPreference
… I simply translated Dwayne's table to IDL to tease out feedback:

webmachinelearning/webnn#902 (comment)

<gb> Issue 902 Device selection criteria for usecase-driven scenarios (by fdwr) [device selection]

dictionary MLContextOptions {
  MLPowerPreference powerPreference = "default";
+ MLLatencyPreference latencyPreference = /* default? */
+ MLWorkloadSizePreference workloadSizePreference = /* default? */
+ MLContinuityPreference continuityPreference = /* default? */
  boolean accelerated = true;
};

Anssi: adding hints is cheaper in the sense that implementers can disregard any of them, but we still shouldn't add hints that are not backed by strong use cases, to reduce the conceptual weight of the API
… I see MikeW being supportive of the new hints in general, Zoltan's comments on the proposed opSupportLimitsPerDevice(), and MikeW's confirmation that the per-device op supports could be an addition on top of hints, not conflicting
… any further feedback on the set, do we see more use cases for some of the three dimensions under consideration: latency, workload size, continuity?

Rafael: I'd like to see how to map these hints to the current backends to understand the implementability and hardware capabilities

Zoltan: I fully agree with Rafael, we need to map to the current backends, no need to break down per devices
… I think apps are interested in knowing which model to download and what constraints are to be respected when inference is done; no need to micro-manage the implementation, but provide the best info so the implementation can select the policy
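The mapping Rafael asked to see can be sketched as a toy policy function; the hint names follow the IDL sketch above (only "low-power" is an actual MLPowerPreference value today), while the latency/workload values, the backend names, and the policy itself are hypothetical and not from any implementation:

```javascript
// Illustration only: a toy policy mapping proposed context hints to a
// backend choice. Hint values other than MLPowerPreference's "low-power",
// the backend names, and this policy are all hypothetical.
function selectBackend({ powerPreference = "default",
                         latencyPreference = "default",
                         workloadSizePreference = "default" } = {}) {
  if (powerPreference === "low-power") return "npu";        // sustained, efficient
  if (latencyPreference === "low-latency") return "gpu";    // interactive use
  if (workloadSizePreference === "large") return "gpu";     // big batch work
  return "cpu"; // conservative default when no hint applies
}
```

An exercise like this, repeated per real backend, is one way to check each proposed hint is actually implementable before adding it to the API.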

Add minimum data type set and rank range for input, constant, output

Anssi: issue #896 and PR #910

<gb> Pull Request 910 add minimum data types and rank range for operations (by BruceDai)

<gb> Issue 896 Add minimum data type set and rank range for input, constant, output and each operator into Spec (by BruceDai)

Anssi: related to issue #853

<gb> Issue 853 The minimum data type set (by huningxin) [operator specific]

Anssi: this PR adds "required data types" and "required ranks" columns to the "Tensor limits for ..." tables associated with each op in the spec
… adding this information was a tedious task, thanks Bruce for the PR
… with these enhanced tables, as a bonus, implementations can programmatically extract the minimum data type set from the WebNN API spec; currently it is hard-coded in the implementation
… the PR review is ongoing with a lot of good comments, most resolved
… anything specific to discuss today for this issue or PR?

Anssi: feel free to merge the PR when adequate review has been received

Remove conformance tests with negative scale of DQ/Q operators

Anssi: issue #879 and PR #906

<gb> Pull Request 906 Restrict scale of dequantizeLinear and quantizeLinear to be positive (by BruceDai)

<gb> Issue 879 Propose to remove WPT conformance tests with negative scale of dequantizeLinear and quantizeLinear operator (by BruceDai) [question]

Anssi: per discussion in the issue, negative scales do have utility, but they are not yet supported on all backends
… due to this implementation limitation, the PR adds the following constraint to the dequantizeLinear and quantizeLinear scale argument:
… "Values must be positive and nonzero."
… PR has been approved by Dwayne
… Ningxin you asked how to cover both wrong results and compilation failure cases?

Ningxin: I asked whether we need some text to mention the implementation-dependent behaviour, Phillis was asking about that in an earlier comment

Anssi: good to merge with adequate review
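A minimal sketch of the per-tensor dequantizeLinear math with the PR's scale constraint applied up front (an illustration of the semantics, not any implementation's code):

```javascript
// Sketch of per-tensor dequantizeLinear semantics: output = (q - zeroPoint) * scale,
// with the constraint from the PR enforced: scale must be positive and nonzero.
// Illustration only, not code from any WebNN implementation.
function dequantizeLinear(input, scale, zeroPoint) {
  if (!(scale > 0)) {
    throw new TypeError("scale must be positive and nonzero");
  }
  return input.map((q) => (q - zeroPoint) * scale);
}
```

For example, `dequantizeLinear([0, 128, 255], 0.5, 128)` yields `[-64, 0, 63.5]`; a negative or zero scale is rejected at validation time rather than producing backend-dependent results.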

2025 Reflections

Anssi: Thank You for an exceptional 2025!
… it's been a busy year
… we are ending the year strong, with broader implementation experience as the icing on the cake
… the specification completed the latest wide review during 2025 with kudos
… we had F2F time with horizontal groups to deepen our collaboration
… we established new joint workstream for ethical and sustainability considerations with other groups
… our group grew +30% YOY, diversity increased with more ISVs and other early adopters joining us
… we keep hearing positive signals from the web developer ecosystem: WebNN API allows developers to unleash their creativity in ways not possible before in the browser
… the key message I'm hearing from all fronts is: you are on the right track, keep marching ahead
… I'm looking ahead to an even more exciting 2026, some milestones:
… we want to publish a new WebNN API Candidate Recommendation Snapshot aligned with the implementation that gets in the hands of early adopters
… I anticipate implementers will further improve the WebNN UX in 2026, informed by feedback from real users, and continue work on performance optimizations, faster model compilation, reduced memory usage -- we will carefully craft WebNN API enhancements together that will improve the experience further
… LLM performance is crucially important and running SLMs in browser is becoming a real thing in 2026, a requirement for agentic workloads in the browser
… to that end, we will continue to deliver important WebNN API enhancements for LLMs, such as dynamic shapes and op fusion
… and of course, WebGPU interop continues to be a crucial focus area
… and much more!
… thank you all for your contributions during 2025

<handellm> thank you all!

Anssi: there's so much more to come in 2026 and our path is clear
… our group's open standards and open source based approach continues to provide the users and developers agency and choice
… Thank You Everyone for your focus, dedication, contributions and friendship on this multi-year journey!
… we will be back after the holiday break 15th January 2026

Minutes manually created (not a transcript), formatted by scribe.perl version 248 (Mon Oct 27 20:04:16 2025 UTC).


Maybe present: Anssi, Dwayne, Ningxin, Rafael, Reilly, Tarek, Zoltan

All speakers: Anssi, Dwayne, Ningxin, Rafael, Reilly, Tarek, Zoltan

Active on IRC: anssik, DwayneR, handellm, ningxin, RafaelCintron, reillyg, zkis