Meeting minutes
Repository: webmachinelearning/webnn
anssik: please welcome Khushal Sagar, Hannah Van Opstal, David Bokan from Google to the WebML WG!
… and please welcome Frank Li from Microsoft to the WebML CG
… Frank recently added support for tool/function calling to the Prompt API, a prerequisite as we advance toward the exciting space of agentic workflows
Announcements
Awesome WebNN tools
anssik: Awesome WebNN updates: new WebNN Model-to-Code conversion tools have been published
WebNN Tools
… ONNX2WebNN by Ningxin
Ningxin: it converts an ONNX model into a WebNN JS graph topology plus a weights bin file the JS can load weights from; this enables lightweight use of WebNN without any framework dependencies
anssik: … WebNN Code Generator by Belem
… WebNN Utilities / OnnxConverter by MS Edge team
anssik: see also a tutorial on how to generate WebNN vanilla JS for package-size sensitive deployments
Generating WebNN Vanilla JavaScript
anssik: the team expects to deliver further improvements with new WebNN code-to-code translation tools
… to allow converting existing Python-based ML code (PyTorch/TorchScript) and code from other frameworks to WebNN vanilla JavaScript
… thanks to Ningxin, Belem, and the MS Edge team for these contributions that help developers adopt WebNN in their web apps
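For illustration, a minimal sketch of the framework-free pattern these tools generate code for, using the current WebNN API shape (MLTensor and dispatch); the file name, weight layout, and shapes below are hypothetical stand-ins for what the converter emits:

```js
// Vanilla WebNN: describe the graph topology in plain JS and load the
// weights from a separate bin file, with no framework dependency.
// 'model.weights.bin' and the [1, 4] x [4, 4] gemm are hypothetical.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

// Fetch the weights produced by the converter and view them as float32.
const weightsBuffer = await (await fetch('model.weights.bin')).arrayBuffer();
const fcWeights = new Float32Array(weightsBuffer, /*byteOffset=*/ 0, 16);

// Graph topology: output = input x weights.
const input = builder.input('input', { dataType: 'float32', shape: [1, 4] });
const weights = builder.constant(
    { dataType: 'float32', shape: [4, 4] }, fcWeights);
const output = builder.gemm(input, weights);
const graph = await builder.build({ output });

// Inference via MLTensor and dispatch().
const inputTensor = await context.createTensor(
    { dataType: 'float32', shape: [1, 4], writable: true });
const outputTensor = await context.createTensor(
    { dataType: 'float32', shape: [1, 4], readable: true });
context.writeTensor(inputTensor, new Float32Array([1, 2, 3, 4]));
context.dispatch(graph, { input: inputTensor }, { output: outputTensor });
console.log(new Float32Array(await context.readTensor(outputTensor)));
```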
WebNN Documentation community preview
anssik: I'm pleased to launch a community preview of the new WebNN Documentation
anssik: the webnn-docs effort is very important as we enter this stage of wider developer adoption
… huge thanks to Belem for pulling this off!
… we believe the vendor-neutral WebNN developer docs should ultimately live on MDN, which has the widest reach
… during this preview phase, we use the dedicated site to gather feedback and plan the next steps
… the GH repo is open to contributions
Web Almanac Generative AI 2025 chapter
anssik: Web Almanac is HTTP Archive’s annual state of the web report, and Christian is leading the GenAI chapter
HTTPArchive/
<gb> Issue 4104 Generative AI 2025 🆕 (by nrllh) [2025 chapter]
https://
anssik: Christian gave a great overview of this effort at our CG meeting, please check out:
Christian: we are planning a new chapter for the Web Almanac, an annual publication that identifies web trends; we want to find out how web sites are using WebNN and the Built-in AI APIs
W3C TPAC 2025 group meetings
anssik: TPAC 2025, W3C's annual all-groups conference, will take place 10-14 November 2025 in Kobe, Japan. The venue is Kobe International Conference Center
… my expectation is the WebML WG participants prefer to meet during the TPAC week
… I also expect we will have a joint meeting with the WebML CG
… Group meetings can happen on Monday, Tuesday, Thursday, and Friday
… I have requested Monday (10 Nov) for the WG meeting and Tuesday (11 Nov) for the CG meeting from the TPAC organizers
… I expect the schedule to be confirmed next month and I'll share the details with the group when available
anssik: one consideration is timezones for possible remote participants: Monday in Japan is Sunday evening on the US West Coast
… this may or may not work depending on how flexible you can be with your work hours on an exceptional basis
… feedback is still welcome via: webmachinelearning/
<gb> Issue 32 WebML WG/CG scheduling poll for TPAC 2025 (Kobe, Japan) (by anssiko)
anssik: questions?
Tarek: I'm considering attending and wanted to know if there are specific steps to take
anssik: more information will be shared by early August at the latest
Incubations
anssik: the WebML Community Group met at EU-APAC friendly time on Mon 26 May 2025
https://
https://
anssik: we received an update on HTTP Archive's Web Almanac from Christian, check the minutes if you're interested in contributing
… we reviewed a new proposal for a Fact-checking API; the initial feedback suggests the implementation has risks and that the proposal is better experimented with as a web extension, similar to what Wikimedia has done
… we had a Proofreader API kick off
anssik: it's now in dev trial in Chrome, with an Origin Trial planned for Chrome 139
… feedback welcome
anssik: discussed new features and recent improvements landed to the Prompt API
… structured output improvements to fix bugs found via implementation experience
… assistant prefixes (aka prefills) to allow constraining responses by providing a prefix that will guide the LLM to a specific response format
… support for tool/function calling landed, paving the way for agentic workflows
… received updates from new Prompt(-like) API web extensions (e.g. AiBrow, Mozilla's trial web extension API) that extend the Prompt API baseline with new features
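For context, a sketch of tool/function calling following the shape in the Prompt API explainer at the time of the meeting; the getWeather tool is a made-up example and the API surface may still change during incubation:

```js
// Prompt API tool calling sketch, per the explainer's shape; details
// are subject to change while the API is incubating.
const session = await LanguageModel.create({
  tools: [{
    name: 'getWeather',
    description: 'Get the current weather for a city.',
    inputSchema: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
    // Invoked by the implementation when the model decides to call the tool.
    async execute({ city }) {
      return JSON.stringify({ city, temperatureC: 21 }); // stubbed result
    },
  }],
});
console.log(await session.prompt('Should I pack an umbrella for Tokyo?'));
```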
anssik: we deferred Translation API to a future meeting when we have Mozilla folks on the call
… we wanted to better understand the use cases of Mozilla's Translation API proposal and see if we can converge
Operator specific issues
Drop support for int32/uint32 of zeropoint for quantizeLinear
anssik: issue #856
<gb> Issue 856 Consider drop the support for int32/uint32 of zeropoint for quantizeLinear (by lisa0314) [operator specific]
anssik: Lisa reports "WebNN spec said, quantizeLinear zeroPoint can support uint8/int8, uint32/int32"
anssik: and points out the limitations in current backends:
… ORT quantizeLinear can't support int32/uint32 for zeroPoint
… TFLite quantize can't support int32/uint32 for zeroPoint
… per this data, Lisa's suggestion is to drop int32/uint32 zeroPoint support from quantizeLinear
… comments?
Reilly: skimming this, it doesn't seem valuable to support int32 quantization; it is technically quantization, but does not seem like a very useful feature to me
Dwayne: I don't see a compelling need for int32 in zeropoint
RafaelCintron: +1 to what Dwayne said
ningxin: checking; Core ML does not support int32 for zeroPoint
Reilly: it does not make sense to quantize values to a 32-bit integer type; it's not useful
Dwayne: this is now specced so that zeroPoint is the same type as the input; we would need to split the data types between those
Dwayne: ONNX dequantizeLinear input can be int32, but not the zeroPoint
ningxin: the proposal here is for quantization only
Reilly: ONNX is the outlier in supporting int32 for quantized input as well; I'd expand the issue accordingly
… Dwayne and Ningxin, given the spec binds the zeroPoint type to the input type, do you agree it makes sense to drop int32/uint32 from quantizeLinear?
ningxin: my understanding is quantizeLinear's zeroPoint is bound to the output data type, converting from float to the quantized type?
Reilly: there's a matching question on dequantize
… and whether to also drop support for int32/uint32 both input and output
Dwayne: I'll check ONNX history for reason why it is an outlier
Reilly: we'll do more research on the broader int32/uint32 question
… and will make a comment on the issue
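For reference, quantizeLinear as currently specced, with the int8 zeroPoint that all of the backends surveyed above support; the scale and zeroPoint values below are arbitrary:

```js
// quantizeLinear: the zeroPoint operand's data type determines the
// quantized output type; int8/uint8 is the case ORT, TFLite and
// Core ML all support, per the discussion above.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const input = builder.input('x', { dataType: 'float32', shape: [2, 2] });
const scale = builder.constant(
    { dataType: 'float32', shape: [1, 1] }, new Float32Array([0.5]));
const zeroPoint = builder.constant(
    { dataType: 'int8', shape: [1, 1] }, new Int8Array([10]));
const q = builder.quantizeLinear(input, scale, zeroPoint);
const graph = await builder.build({ q });
```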
Add missing 64-bit integers support for some reduction operators
anssik: issue #694 and PR #695
<gb> Pull Request 695 Bugfix: Add missing 64-bit integers support for some reduction operators (by huningxin) [operator specific]
<gb> Issue 694 Consider adding int64/uint64 data type support for some reduce operators (by lisa0314) [operator specific]
anssik: related issue #853
<gb> Issue 853 The minimum data type set (by huningxin) [operator specific]
ningxin: IIRC MikeW asked if this is optional; I shared that it is optional, not mandatory
MikeW: I just approved the PR
anssik: this PR is good to merge
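For reference, the kind of usage the PR enables; a sketch of reduceSum over an int64 operand (shapes are arbitrary):

```js
// With this PR, reduction operators such as reduceSum also accept
// 64-bit integer inputs, e.g. for index-typed tensors.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const indices = builder.input('indices', { dataType: 'int64', shape: [2, 4] });
const total = builder.reduceSum(indices, { axes: [1], keepDimensions: false });
const graph = await builder.build({ total });
```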
Other issues and PRs
<ningxin> If I remember correctly, int32 input of dequantizeLinear is useful for conv2d's bias
Evaluate sustainability impact
anssik: issue #861
<gb> Issue 861 Evaluate sustainability impact (by anssiko) [tag-needs-resolution]
anssik: I want to bump this issue opened in response to TAG review feedback:
>TAG: We would appreciate if the WG would evaluate the likely impacts on sustainability from introducing this API, perhaps in collaboration with the Sustainable Web IG. There are several competing likely effects, including the comparative energy efficiency of personal devices vs datacenters, the greater efficiency of WebNN over WebGPU for the same workload, increased use of neural nets as they get easier to access, faster device obsolescence if older devices can't effectively run the workloads this API encourages, and likely other considerations. Any sustainability impacts might be balanced by increased utility and privacy, but it would be good to know what we're signing up for.
anssik: we discussed last time that purpose-built ML accelerators, aka NPUs, are generally known to be more power-efficient than GPUs
… I opened an issue to solicit further input, suggestions, corrections and clarifications to inform related explainer and/or specification updates in response to this TAG feedback
RafaelCintron: did you remember to share with the TAG that NPUs are better for sustainability?
anssik: I shared the issue with the TAG
Reilly: a reasonable response would be that what impacts sustainability is the broader adoption of ML techniques as a whole; both client-side and server-side execution take energy, and local execution is only possible if the local device has enough energy and power
… there's a concern that applies across the whole space: local compute reduces the cost to the site developer and pushes it onto the user, a power-privacy trade-off
… I'm a little concerned e.g. crypto miners using local compute for their own benefit
… this is possible via Wasm and WebGPU already, however
RafaelCintron: there's substantial benefit from JS tools that minimize the number of bits transferred over the network
… new machines are bought for new experiences
anssik: model caching helps with sustainability
Caching mechanism for MLGraph
anssik: issue #807 and PR #862
<gb> MERGED Pull Request 862 Add WebNN MLGraph Cache Explainer (by anssiko)
<gb> Issue 807 Caching mechanism for MLGraph (by anssiko) [question] [feature request]
anssik: thanks to Reilly and Ningxin for your review and comments
… the first version of the explainer was merged
anssik: I'd like to discuss what participants think are the reasonable next steps for the spec and implementation
… as you recall, we have a prototype Chromium implementation and have explored how to use this in a real sample app
<gb> Pull Request 227 [DO NOT SUBMIT] Model cache POC (by shiyi9801)
RafaelCintron: I'm a strong proponent of a caching mechanism; the current API could be improved by combining build and buildAndSave
… there are execution providers (EPs) in ONNX Runtime that cannot save at an arbitrary point; you need to decide at build time
… I think it's unfortunate to save things to slow disk, and I know people have said they want to do model inferencing securely
Zoltan: should we include build options?
RafaelCintron: build options sound fine; you need to give it a name
Zoltan: by default it would not save; you have to be explicit and define it in the options
anssik: Rafael please comment on the issue so we remember to update the explainer
Reilly: I think what the Intel and Microsoft folks have been looking at is the design in ONNX Runtime and Chromium, which has a unique feature: it is possible to make the model ready for inferencing without going through a serialization step to prepare the model to be saved
… in TFLite and Core ML, the only option is to produce a serialized model and then load the serialized model into a form ready for inference
… only ORT supports building a model in memory, in deserialized form, ready for inference; this raises the question of whether we force the model to be serialized anyway, i.e. whether saving the graph is an optional or a mandatory step
… the benefit of making saving optional is the latency of the first inference; but the user visits the site multiple times
… I'm leaning towards not optimizing for the first load case so much, which leads me to say maybe saving the model becomes mandatory: the only option is "build and save" where you have to name it
… that said, I'm concerned that if we give developers this capability, then everyone has to deal with the question of how to name it
… the question is, how important is this potential optimization for just one framework?
RafaelCintron: if we have both "build" and "build and save", do we lose performance on TFLite?
… as for having just "build and save", I hadn't thought of that; when would you not want to save?
… some toy web site?
… if the user visits multiple times, it always makes sense to save
Reilly: I feel our advice to developers is to always "build and save"
… only for sample sites "build" would be reasonable
anssik: the common case should be easy, the less common case should be possible
Reilly: my only concern is the implementation complexity of build without saving, if it's not exercised often
RafaelCintron: I'd be OK with build-only taking an option and forcing people to name their model
… the "do not save" use case would be to keep the model secure and not allow developers to inspect it
anssik: is it security by obscurity?
RafaelCintron: would take more effort
anssik: action on Rafael to check the explainer reflects your thinking
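To make the "build" vs. "build and save" alternatives concrete, a purely hypothetical sketch; none of the names below (the save option, buildAndSave, loadGraph) are in the explainer or spec, they only illustrate the two designs discussed, assuming a context, builder, and output operand as in the earlier sketches:

```js
// Design A: saving is opt-in via a build option; developers who opt in
// must name the graph. (Hypothetical 'save' option.)
const graph = await builder.build({ output }, { save: 'my-model-v1' });

// Design B: saving is mandatory; the only way to build is "build and
// save". (Hypothetical buildAndSave method.)
// const graph = await builder.buildAndSave('my-model-v1', { output });

// On a later visit: load the compiled graph by name, falling back to a
// full build on a cache miss. (Hypothetical loadGraph method.)
let cached = await context.loadGraph('my-model-v1');
if (!cached) {
  cached = await builder.build({ output }, { save: 'my-model-v1' });
}
```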
Query supported devices
Before graph compilation
anssik: issue #815
<gb> Issue 815 Query supported devices before graph compilation (by anssiko) [device selection]
anssik: and related PR #860 by Zoltan (thanks!) for explainer updates
<gb> Pull Request 860 Update with an example HW selection guide and new use cases (by zolkis)
anssik: I'd like to check that all the product-driven use case feedback from Google Meet is translated into explainer updates
… Zoltan has updated the key use cases and will talk us through them; you can follow along in the staged explainer doc at:
<gb> Pull Request 860 Update with an example HW selection guide and new use cases (by zolkis)
Zoltan: these are developer scenarios for trying to figure out if a model can run on the target platform
… I tried to avoid solutions and document requirements only
… UC 1. Pre-download capability check
… UC 2. Pre-download or pre-build hints and constraints
… UC 3. Post-compile query of inference details
Zoltan: the Google Meet requirement was to figure out very fast whether they can use WebNN
… PTAL the use cases and requirements section, link shared above
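UC 1 can be approximated today with MLContext.opSupportLimits(); a sketch of a fast pre-download check, where the required op and data type are app-specific examples:

```js
// Pre-download capability check: before fetching large model weights,
// verify the ops and data types the model needs are supported.
const context = await navigator.ml.createContext(
    { powerPreference: 'high-performance' });
const limits = context.opSupportLimits();

// Example requirement: float16 matmul (field names per the spec draft,
// so probe defensively).
const fp16MatmulOk =
    limits.matmul?.a?.dataTypes?.includes('float16') ?? false;

if (fp16MatmulOk) {
  // Download the float16 model variant.
} else {
  // Fall back to float32 weights or skip WebNN entirely.
}
```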
Next meeting 14 August 2025
Anssi: due to the upcoming holiday season in the Northern hemisphere we will skip the July meetings and meet again on 14 August 2025
… thank you for your contributions during the first half of 2025, everyone!
… the community continues to grow and I'm pleased to see new people join, from both big and small companies as well as individuals