Meeting minutes
Repository: webmachinelearning/webnn
anssik: please welcome Khushal Sagar, Hannah Van Opstal, David Bokan from Google to the WebML WG!
… and please welcome Frank Li from Microsoft to the WebML CG
… Frank recently added support for tool/function calling to the Prompt API, a prerequisite as we advance toward the exciting space of agentic workflows
Announcements
Awesome WebNN tools
anssik: Awesome WebNN updates: new WebNN Model-to-Code conversion tools have been published
WebNN Tools
… ONNX2WebNN by Ningxin
Ningxin: it converts an ONNX model into a WebNN JS graph topology plus a weights bin file the JS can load weights from; this enables lightweight use of WebNN without any framework dependencies
anssik: … WebNN Code Generator by Belem
… WebNN Utilities / OnnxConverter by MS Edge team
anssik: see also a tutorial on how to generate WebNN vanilla JS for package-size sensitive deployments
Generating WebNN Vanilla JavaScript
anssik: the team expects to deliver further improvements with new WebNN code-to-code translation tools
… to allow converting existing Python-based ML code (PyTorch/TorchScript) and code from other frameworks to WebNN vanilla JavaScript
… thanks to Ningxin, Belem, and the MS Edge team for these contributions that help developers adopt WebNN in their web apps
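For illustration, a minimal sketch of the framework-free pattern these tools generate code for, using the current WebNN API shape (MLTensor and dispatch); the file name, weight layout, and shapes below are hypothetical stand-ins for what the converter emits:

```js
// Vanilla WebNN: describe the graph topology in plain JS and load the
// weights from a separate bin file, with no framework dependency.
// 'model.weights.bin' and the [1, 4] x [4, 4] gemm are hypothetical.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

// Fetch the weights produced by the converter and view them as float32.
const weightsBuffer = await (await fetch('model.weights.bin')).arrayBuffer();
const fcWeights = new Float32Array(weightsBuffer, /*byteOffset=*/ 0, 16);

// Graph topology: output = input x weights.
const input = builder.input('input', { dataType: 'float32', shape: [1, 4] });
const weights = builder.constant(
    { dataType: 'float32', shape: [4, 4] }, fcWeights);
const output = builder.gemm(input, weights);
const graph = await builder.build({ output });

// Inference via MLTensor and dispatch().
const inputTensor = await context.createTensor(
    { dataType: 'float32', shape: [1, 4], writable: true });
const outputTensor = await context.createTensor(
    { dataType: 'float32', shape: [1, 4], readable: true });
context.writeTensor(inputTensor, new Float32Array([1, 2, 3, 4]));
context.dispatch(graph, { input: inputTensor }, { output: outputTensor });
console.log(new Float32Array(await context.readTensor(outputTensor)));
```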
WebNN Documentation community preview
anssik: I'm pleased to launch a community preview of the new WebNN Documentation
anssik: the webnn-docs effort is very important as we enter this stage of wider developer adoption
… huge thanks to Belem for pulling this off!
… we believe the vendor-neutral WebNN developer docs should ultimately live on MDN, which has the widest reach
… during this preview phase, we use the dedicated site to gather feedback and plan the next steps
… the GH repo is open to contributions
Web Almanac Generative AI 2025 chapter
anssik: Web Almanac is HTTP Archive’s annual state of the web report, and Christian is leading the GenAI chapter
HTTPArchive/
<gb> Issue 4104 Generative AI 2025 🆕 (by nrllh) [2025 chapter]
https://
anssik: Christian gave a great overview of this effort at our CG meeting, please check out:
Christian: we are planning a new chapter for the Web Almanac, an annual publication that identifies web trends; we want to find out how web sites are using WebNN and the Built-in AI APIs
W3C TPAC 2025 group meetings
anssik: TPAC 2025, W3C's annual all-groups conference, will take place 10-14 November 2025 in Kobe, Japan. The venue is Kobe International Conference Center
… my expectation is the WebML WG participants prefer to meet during the TPAC week
… I also expect we will have a joint meeting with the WebML CG
… Group meetings can happen on Monday, Tuesday, Thursday, and Friday
… I have requested Monday (10 Nov) for the WG meeting and Tuesday (11 Nov) for the CG meeting from the TPAC organizers
… I expect the schedule to be confirmed next month and I'll share the details with the group when available
anssik: one consideration is timezones for possible remote participants: Monday in Japan is Sunday evening on the US West Coast
… this may or may not work depending on how flexible you can be with your work hours on an exceptional basis
… feedback is still welcome via: webmachinelearning/
<gb> Issue 32 WebML WG/CG scheduling poll for TPAC 2025 (Kobe, Japan) (by anssiko)
anssik: questions?
Tarek: I'm considering attending and wanted to know if there are specific steps to take
anssik: more information will be shared by early August at the latest
Incubations
anssik: the WebML Community Group met at EU-APAC friendly time on Mon 26 May 2025
https://
https://
anssik: we received an update on HTTP Archive's Web Almanac from Christian, check the minutes if you're interested in contributing
… we reviewed a new proposal for a Fact-checking API; the initial feedback suggests the implementation has risks and that the proposal is better experimented with as a web extension, similar to what Wikimedia has done
… we had a Proofreader API kick off
anssik: it's now in dev trial in Chrome, with an Origin Trial planned for Chrome 139
… feedback welcome
anssik: discussed new features and recent improvements landed to the Prompt API
… structured output improvements to fix bugs found via implementation experience
… assistant prefixes (aka prefills) to allow constraining responses by providing a prefix that will guide the LLM to a specific response format
… support for tool/function calling landed, paving the way for agentic workflows
… received updates from new Prompt(-like) API web extensions (e.g. AiBrow, Mozilla's trial web extension API) that extend the Prompt API baseline with new features
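For context, a sketch of tool/function calling following the shape in the Prompt API explainer at the time of the meeting; the getWeather tool is a made-up example and the API surface may still change during incubation:

```js
// Prompt API tool calling sketch, per the explainer's shape; details
// are subject to change while the API is incubating.
const session = await LanguageModel.create({
  tools: [{
    name: 'getWeather',
    description: 'Get the current weather for a city.',
    inputSchema: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
    // Invoked by the implementation when the model decides to call the tool.
    async execute({ city }) {
      return JSON.stringify({ city, temperatureC: 21 }); // stubbed result
    },
  }],
});
console.log(await session.prompt('Should I pack an umbrella for Tokyo?'));
```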
anssik: we deferred Translation API to a future meeting when we have Mozilla folks on the call
… we wanted to better understand the use cases of Mozilla's Translation API proposal and see if we can converge
Operator specific issues
Drop support for int32/uint32 of zeropoint for quantizeLinear
anssik: issue #856
<gb> Issue 856 Consider drop the support for int32/uint32 of zeropoint for quantizeLinear (by lisa0314) [operator specific]
anssik: Lisa reports "WebNN spec said, quantizeLinear zeroPoint can support uint8/int8, uint32/int32"
anssik: and points out the limitations in current backends:
… ORT quantizeLinear can't support int32/uint32 for zeroPoint
… TFLite quantize can't support int32/uint32 for zeroPoint
… per this data, Lisa's suggestion is to drop int32/uint32 zeroPoint support from quantizeLinear
… comments?
Reilly: skimming this, it doesn't seem valuable to support int32 quantization; it is technically quantization, but does not seem like a very useful feature to me
Dwayne: I don't see a compelling need for int32 in zeropoint
RafaelCintron: +1 to what Dwayne said
ningxin: checking; Core ML does not support int32 for zeroPoint
Reilly: it does not make sense to quantize values to a 32-bit integer type; it's not useful
Dwayne: this is now specced so that zeroPoint is the same type as the input; we would need to split the data types between those
Dwayne: ONNX dequantizeLinear input can be int32, but not the zeroPoint
ningxin: the proposal here is for quantization only
Reilly: ONNX is the outlier in supporting int32 for quantized input as well; I'd expand the issue accordingly
… Dwayne and Ningxin, given the spec binds the zeroPoint type to the input type, do you agree it makes sense to drop int32/uint32 from quantizeLinear?
ningxin: my understanding is quantizeLinear's zeroPoint is bound to the output data type, converting from float to the quantized type?
Reilly: there's a matching question on dequantize
… and whether to also drop support for int32/uint32 both input and output
Dwayne: I'll check ONNX history for reason why it is an outlier
Reilly: we'll do more research on the broader int32/uint32 question
… and will make a comment on the issue
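For reference, quantizeLinear as currently specced, with the int8 zeroPoint that all of the backends surveyed above support; the scale and zeroPoint values below are arbitrary:

```js
// quantizeLinear: the zeroPoint operand's data type determines the
// quantized output type; int8/uint8 is the case ORT, TFLite and
// Core ML all support, per the discussion above.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const input = builder.input('x', { dataType: 'float32', shape: [2, 2] });
const scale = builder.constant(
    { dataType: 'float32', shape: [1, 1] }, new Float32Array([0.5]));
const zeroPoint = builder.constant(
    { dataType: 'int8', shape: [1, 1] }, new Int8Array([10]));
const q = builder.quantizeLinear(input, scale, zeroPoint);
const graph = await builder.build({ q });
```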
Add missing 64-bit integers support for some reduction operators
anssik: issue #694 and PR #695
<gb> Pull Request 695 Bugfix: Add missing 64-bit integers support for some reduction operators (by huningxin) [operator specific]
<gb> Issue 694 Consider adding int64/uint64 data type support for some reduce operators (by lisa0314) [operator specific]
anssik: related issue #853
<gb> Issue 853 The minimum data type set (by huningxin) [operator specific]
ningxin: IIRC MikeW asked if this is optional; I shared that it is optional, not mandatory
MikeW: I just approved the PR
anssik: this PR is good to merge
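For reference, the kind of usage the PR enables; a sketch of reduceSum over an int64 operand (shapes are arbitrary):

```js
// With this PR, reduction operators such as reduceSum also accept
// 64-bit integer inputs, e.g. for index-typed tensors.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const indices = builder.input('indices', { dataType: 'int64', shape: [2, 4] });
const total = builder.reduceSum(indices, { axes: [1], keepDimensions: false });
const graph = await builder.build({ total });
```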
Other issues and PRs
<ningxin> If I remember correctly, int32 input of dequantizeLinear is useful for conv2d's bias
Evaluate sustainability impact
anssik: issue #861
<gb> Issue 861 Evaluate sustainability impact (by anssiko) [tag-needs-resolution]
anssik: I want to bump this issue opened in response to TAG review feedback:
>TAG: We would appreciate if the WG would evaluate the likely impacts on sustainability from introducing this API, perhaps in collaboration with the Sustainable Web IG. There are several competing likely effects, including the comparative energy efficiency of personal devices vs datacenters, the greater efficiency of WebNN over WebGPU for the same workload, increased use of neural nets as they get easier to access, faster device obsolescence if older devices can't effectively run the workloads this API encourages, and likely other considerations. Any sustainability impacts might be balanced by increased utility and privacy, but it would be good to know what we're signing up for.
anssik: we discussed last time that purpose-built ML accelerators, aka NPUs, are generally known to be more power-efficient than GPUs
… I opened an issue to solicit further input, suggestions, corrections and clarifications to inform related explainer and/or specification updates in response to this TAG feedback
RafaelCintron: did you remember to share with the TAG that NPUs are better for sustainability?
anssik: I shared the issue with the TAG
Reilly: a reasonable response would be that what impacts sustainability is the broader adoption of ML techniques as a whole; both client-side and server-side execution take energy, and local execution is only possible if the local device has enough energy and power
… there's a concern that applies across the whole space: local compute reduces the cost to the site developer and pushes it onto the user, a power-privacy trade-off
… I'm a little concerned e.g. crypto miners using local compute for their own benefit
… this is possible via Wasm and WebGPU already, however
RafaelCintron: there's substantial benefit from JS tools that minimize the number of bits transferred over the network
… new machines are bought for new experiences
anssik: model caching helps with sustainability
Caching mechanism for MLGraph
anssik: issue #807 and PR #862
<gb> MERGED Pull Request 862 Add WebNN MLGraph Cache Explainer (by anssiko)
<gb> Issue 807 Caching mechanism for MLGraph (by anssiko) [question] [feature request]
anssik: thanks to Reilly and Ningxin for your review and comments
… the first version of the explainer was merged
anssik: I'd like to discuss what participants think are the reasonable next steps for the spec and implementation
… as you recall, we have a prototype Chromium implementation and have explored how to use this in a real sample app
<gb> Pull Request 227 [DO NOT SUBMIT] Model cache POC (by shiyi9801)
RafaelCintron: I'm a strong proponent of a caching mechanism; the current API could be improved by combining build and buildAndSave
… there are execution providers (EPs) in ONNX Runtime that cannot save at an arbitrary point; you need to decide at build time
… I think it's unfortunate to save things to slow disk, and I know people have said they want to do model inferencing securely
Zoltan: should we include build options?
RafaelCintron: build options sound fine; you need to give it a name
Zoltan: by default it would not save; you have to be explicit and define it in the options
anssik: Rafael please comment on the issue so we remember to update the explainer
Reilly: I think what the Intel and Microsoft folks have been looking at is the design in ONNX Runtime and Chromium, which has a unique feature: it is possible to make the model ready for inferencing without going through a serialization step to prepare the model to be saved
… in TFLite and Core ML, the only option is to produce a serialized model and then load the serialized model into a form ready for inference
… only ORT supports building a model in memory, in deserialized form, ready for inference; this raises the question of whether we force the model to be serialized anyway, i.e. whether saving the graph is an optional or a mandatory step
… the benefit of making saving optional is the latency of the first inference; but the user visits the site multiple times
… I'm leaning towards not optimizing for the first load case so much, which leads me to say maybe saving the model becomes mandatory: the only option is "build and save" where you have to name it
… that said, I'm concerned that if we give developers this capability, then everyone has to deal with the question of how to name it
… the question is, how important is this potential optimization for just one framework?
RafaelCintron: if we have both "build" and "build and save", do we lose performance on TFLite?
… as for having just "build and save", I hadn't thought of that; when would you not want to save?
… some toy web site?
… if the user visits multiple times, it always makes sense to save
Reilly: I feel our advice to developers is to always "build and save"
… only for sample sites "build" would be reasonable
anssik: the common case should be easy, the less common case should be possible
Reilly: my only concern is the implementation complexity of build without saving, if it's not exercised often
RafaelCintron: I'd be OK with build-only taking an option and forcing people to name their model
… the "do not save" use case would be to keep the model secure and not allow developers to inspect it
anssik: is it security by obscurity?
RafaelCintron: would take more effort
anssik: action on Rafael to check the explainer reflects your thinking
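To make the "build" vs. "build and save" alternatives concrete, a purely hypothetical sketch; none of the names below (the save option, buildAndSave, loadGraph) are in the explainer or spec, they only illustrate the two designs discussed, assuming a context, builder, and output operand as in the earlier sketches:

```js
// Design A: saving is opt-in via a build option; developers who opt in
// must name the graph. (Hypothetical 'save' option.)
const graph = await builder.build({ output }, { save: 'my-model-v1' });

// Design B: saving is mandatory; the only way to build is "build and
// save". (Hypothetical buildAndSave method.)
// const graph = await builder.buildAndSave('my-model-v1', { output });

// On a later visit: load the compiled graph by name, falling back to a
// full build on a cache miss. (Hypothetical loadGraph method.)
let cached = await context.loadGraph('my-model-v1');
if (!cached) {
  cached = await builder.build({ output }, { save: 'my-model-v1' });
}
```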
Query supported devices
Before graph compilation
anssik: issue #815
<gb> Issue 815 Query supported devices before graph compilation (by anssiko) [device selection]
anssik: and related PR #860 by Zoltan (thanks!) for explainer updates
<gb> Pull Request 860 Update with an example HW selection guide and new use cases (by zolkis)
anssik: I'd like to check that all the product-driven use case feedback from Google Meet is translated into explainer updates
… Zoltan has updated the key use cases and will talk us through them; you can follow along in the staged explainer doc at:
<gb> Pull Request 860 Update with an example HW selection guide and new use cases (by zolkis)
Zoltan: these are developer scenarios for trying to figure out if a model can run on the target platform
… I tried to avoid solutions and document requirements only
… UC 1. Pre-download capability check
… UC 2. Pre-download or pre-build hints and constraints
… UC 3. Post-compile query of inference details
Zoltan: the Google Meet requirement was to figure out very fast whether they can use WebNN
… PTAL the use cases and requirements section, link shared above
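UC 1 can be approximated today with MLContext.opSupportLimits(); a sketch of a fast pre-download check, where the required op and data type are app-specific examples:

```js
// Pre-download capability check: before fetching large model weights,
// verify the ops and data types the model needs are supported.
const context = await navigator.ml.createContext(
    { powerPreference: 'high-performance' });
const limits = context.opSupportLimits();

// Example requirement: float16 matmul (field names per the spec draft,
// so probe defensively).
const fp16MatmulOk =
    limits.matmul?.a?.dataTypes?.includes('float16') ?? false;

if (fp16MatmulOk) {
  // Download the float16 model variant.
} else {
  // Fall back to float32 weights or skip WebNN entirely.
}
```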
Next meeting 14 August 2025
Anssi: due to the upcoming holiday season in the Northern hemisphere we will skip the July meetings and meet again on 14 August 2025
… thank you for your contributions during the first half of 2025, everyone!
… the community continues to grow and I'm pleased to see new people join, from both big and small companies as well as individuals