Meeting minutes
Anssi: we'll start by welcoming our latest new participants:
… please welcome to the WebML WG:
… Mark Foltz from Google
… Umar Iqbal from Washington University as an Invited Expert
… Aram Zucker-Scharff, Davis Shaver, and Stephen Erickson from The Washington Post
… welcome to all new participants, I look forward to working with you!
Incubations
Anssi: a debrief on the recent WebML Community Group developments
WebML CG Teleconference – 2 October 2025
Repository: webmachinelearning/webmcp
Anssi: we had another WebMCP API brainstorming session and made important resolutions:
… - resolved to make the tools be part of the discovery mechanism
… - resolved to look into higher-level hooks to connect WebMCP with external agents for listing tools
… - resolved that tool execution should be able to start/stop yielding to the user throughout its lifecycle, in the context of elicitation
… - resolved navigator.modelContext is the "root" object name
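Of the resolutions above, only the `navigator.modelContext` root name is settled API surface. As a purely illustrative sketch of the general shape of tool registration and discovery (the `registerTool()` method and the descriptor fields below are hypothetical, invented here for illustration), a page might expose a tool like this:

```javascript
// Hypothetical WebMCP sketch. Only the navigator.modelContext root name is
// resolved; registerTool() and the descriptor shape are illustrative guesses.
function makeToolDescriptor(name, description, inputSchema, execute) {
  // A plain object an external agent could discover and later invoke.
  return { name, description, inputSchema, execute };
}

const addToCartTool = makeToolDescriptor(
  "add-to-cart",
  "Adds a product to the shopping cart",
  { type: "object", properties: { productId: { type: "string" } } },
  async ({ productId }) => ({ ok: true, productId })
);

// Guarded registration: navigator.modelContext does not exist in any
// shipping browser today.
if (typeof navigator !== "undefined" && navigator.modelContext?.registerTool) {
  navigator.modelContext.registerTool(addToCartTool);
}
```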
F2F Agenda brainstorming
Repository: webmachinelearning/meetings
Anssi: F2F Agenda issue #35
<gb> Issue 35 WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) (by anssiko)
Anssi: I want to discuss, review and update draft WebML WG/CG F2F Agenda based on your feedback
… now that we're getting closer to the F2F it makes sense to lower the level of abstraction and look at specific issues of interest
… note on logistics:
… registration open until 3 November
… meeting dates are 10-11 November 2025 (start on 9/10 for remotes in Pacific timezone!)
… please export invites as .ics from:
Anssi: the first day, 10 Nov, is dedicated to the WG / WebNN; the second day, 11 Nov, to WebMCP and the Built-in AI APIs
… we can still do adjustments based on feedback
https://
https://
Anssi: I observe good participation, both familiar names and new faces
… currently 42 in-person participants including observers, excluding remote participants
… to set the expectations for the F2F meeting:
… F2F is an opportunity to get to know people, including folks outside the group and the wider community
… humans usually work better together when they know each other
… we will not do low-level specification PR reviews on a big screen at the F2F, an async GH-driven work mode is better for that
… rather we try to make resolutions and seek consensus on important issues, chart the path forward, and eat Japanese food in great company
Anssi: F2F Agenda issue #35
<gb> Issue 35 WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) (by anssiko)
Anssi: I put up a draft to solicit feedback via comments for both TBA topics and timing to avoid scheduling conflicts as best as we can
… we've made good progress in closing down open issues for the WebNN API, stabilizing the spec
… recently a lot of energy has been put on broadening the implementation experience
… this is a busy time as we race to meet release branch milestones and pass quality gates
… after this major push we get the API in the hands of early adopters to help us iron out remaining kinks
… the timing of this coincides with our TPAC meeting, so we appreciate your contributions during this busy time
Reilly: I think implementation-wise we are waiting on the Windows ML backend, a big missing piece, and it's getting very close
… another thing is interop between backends; take a good look at our wpt coverage and gaps
Anssi: who would be the best to lead the wpt discussion?
Reilly: the group could do a triage pass over wpt results, which could help answer whether there are any implementation differences that warrant spec changes
Ningxin: I will check with our team working on wpt tests
Markus: regarding NVIDIA, our backend is enabled and operator tests are fine; a few have accuracy problems due to the reduced-precision format we use internally, and the DML provider and backend selection had some issues
Rafael: I propose we discuss the system setup separately
Anssi: 10 November 2025 is the Working Group F2F with a WebNN API focus, here's the top-level view:
<ningxin> If it's a Chromium implementation issue, feel free to open an issue at https://
Anssi: - Orientation
… group's charter framing, triage pass over WebNN issues, as a group exercise
Reilly: I think we did a pass over issues at the previous TPAC in the beginning; running through issues live can be productive in a real-time setting
… maybe the editors can put together a report summarizing them
Anssi: - New features
… 2-4 issues, can include supporting presentations
… - Customer feedback and collaborations
… please bring any feedback from frameworks, end-users, ISVs
… - Interop and technical cross-group coordination
… interop is the cornerstone of the web platform, wpt topics and any coordination with other W3C groups goes here
… - Implementation plans and trials
… we discuss upcoming trials, learnings from browsers, backends and frameworks that implement and integrate with the WebNN API
… - Horizontals
… we get to know experts behind horizontal groups: ethics, sustainability, privacy, security, all areas where we've recently recruited more participants to join us
… - Dinner
… we eat Japanese food in great company!
Anssi: feedback welcome via GH comments, on these calls, via email
New features and operator specific issues
Drop support of 8-bit integers input for CumulativeSum
Anssi: issue #892
<gb> Issue 892 not found
Anssi: Ningxin proposes to drop support of 8-bit integers input for CumulativeSum due to lack of backend support
Repository: webmachinelearning/webnn
#892
… issue notes 8-bit integer input for cumulativeSum is not supported by any of the Chromium backends: Core ML, DirectML, ONNX, TFLite
… for symmetry, reduceSum also doesn't support 8-bit integer input
… I think we all agree to drop this, I see Phillis +1
<gb> Issue 892 Drop support of 8-bit integers input for CumulativeSum (by huningxin) [operator specific]
Reilly: SGTM
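For background, cumulativeSum computes running partial sums along an axis (out[i] = in[0] + … + in[i]). The stated rationale for the drop is missing backend support, but a quick JS sketch also shows why an 8-bit accumulator is of limited use: partial sums leave the int8 range almost immediately.

```javascript
// 1-D cumulative (prefix) sum with a wrapping int8 accumulator:
// out[i] = out[i-1] + in[i], stored into the int8 range [-128, 127].
function cumulativeSumInt8(input) {
  const out = new Int8Array(input.length);
  let acc = 0;
  for (let i = 0; i < input.length; i++) {
    out[i] = acc + input[i]; // Int8Array wraps the value on store
    acc = out[i];            // carry the wrapped value forward
  }
  return out;
}

const result = cumulativeSumInt8(Int8Array.from([100, 100, 100]));
// result is [100, -56, 44]: the true sums 200 and 300 both wrap out of range.
```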
Flexible input sizes
Anssi: issue #883
<gb> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific]
Anssi: I put this on the agenda as a reminder to check with Guenther for ORT Web and WebNN EP perspective for the feature
… I guess we're still awaiting Guenther's feedback?
Rafael: I haven't heard feedback yet, he personally thinks this is important
Anssi: do we block on Guenther or can we do some investigation ourselves in the interim to further this?
Reilly: I think the question is how this is getting implemented by backends and what the role of WebNN is in this decision; the framework could build multiple graphs
… I suspect that has all sorts of performance bottlenecks; I want to understand what form the various backends would prefer, something to abstract over
Dwayne: I would need to familiarize myself with TFLite and Core ML; as for the importance of this feature, I'm interested in prototyping to see what's possible
Reilly: I haven't looked at this yet in TFLite and Core ML; the DML EP can execute models with dynamic shapes, and Joshua/HF used the WebGPU EP and it has some support?
Dwayne: right
Markus: it can be expensive for ORT to have multiple graphs
Reilly: my intuition also is that this would require multiple graphs; pushing that deep into the stack lets the particular implementation avoid recreating them, and we should figure out resource sharing, pushing this down to the components that interact with hardware
Markus: in TensorRT we have dynamic shapes and it is handled by our EP
Rafael: how ORT talks with EPs is an implementation detail
Markus: I recall some frameworks allow defining max size with flexible input sizes
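The "multiple graphs" approach Reilly and Markus discuss could be sketched as a framework-side cache keyed by concrete input shape (a hypothetical helper; `compileGraph` stands in for whatever builds a fixed-shape graph). It illustrates both the workaround and its cost, since every previously unseen shape pays a full compile:

```javascript
// Sketch of the "multiple graphs" workaround: a framework caches one compiled
// graph per concrete input shape. compileGraph is a placeholder, not a real
// WebNN or ORT API.
class ShapeSpecializedCache {
  constructor(compileGraph) {
    this.compileGraph = compileGraph; // (shape: number[]) => compiled graph
    this.cache = new Map();           // shape key -> compiled graph
  }
  getGraph(shape) {
    const key = shape.join("x");
    if (!this.cache.has(key)) {
      // Cache miss: pay the (potentially expensive) compile cost once.
      this.cache.set(key, this.compileGraph(shape));
    }
    return this.cache.get(key);
  }
}
```

Pushing dynamic shapes down to a backend that supports them natively, as Markus notes TensorRT does (optionally with a declared max size), avoids these repeated compiles.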
Core operator set
Anssi: issue #573
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
Machine Learning Operator Mapping - All Raw Operators
Anssi: we had a good discussion at our prior meeting with regard to core operator set
Anssi: Fabio wanted to get back to the group after talking with the NVIDIA team
Fabio: we're collecting all the ops that'd benefit from being in the set, one class is various attentions
… also gathers, MoE, TopK
… looking for other ops that'd benefit from not being composed
Reilly: I'm curious about MoE and attentions; my concern with these high-level ops is that they are tied to particular model architectures, so while they give a performance boost they are not necessarily long-lived
… we found this out by looking at e.g. LSTM, where actual implementation details matter and there were compatibility issues between implementations
Fabio: I will look into this
Anssi: do we have any feedback?
Privacy and Security
Anssi: proposed changes to privacy considerations in PR #890
<gb> Pull Request 890 Revise privacy considerations (by anssiko)
Anssi: this PR suggests more changes than the minimal one-liner proposed by Reilly here:
"No information from the underlying platform is exposed directly." needs to be revised
<gb> Issue 886 Revise privacy considerations (by anssiko) [privacy-tracker]
Anssi: if the group would prefer a minimal change, I will update the PR accordingly
Anssi: finally, the security review was completed with positive feedback: "well-written in a narrative form"
<gb> CLOSED Issue 85 Web Neural Network API 2025-03-20 > 2025-06-20 (by anssiko) [REVIEW REQUESTED] [pending] [CR]
Anssi: that means once the privacy revising issue #886 is addressed we've completed the latest wide review round!
<gb> Issue 886 Revise privacy considerations (by anssiko) [privacy-tracker]
<gb> Issue 239 Wide review tracker (by anssiko) [process]
Query supported devices
Before graph compilation
Anssi: spec PR #895 and explainer PR #884
<gb> Pull Request 884 Update explainer with new proposal for simple accelerator mapping (by zolkis)
<gb> Pull Request 895 Add a simple accelerator selection mechanism. (by zolkis)
Anssi: thanks Zoltan for submitting these two PRs, ready for review now
… the spec PR suggests a simplified boolean-returning MLContext.accelerated and MLContext.cpuFallbackActive API
… proposed IDL change:
interface MLContext {
  undefined destroy();
+ readonly attribute boolean accelerated;
+ readonly attribute boolean cpuFallbackActive;
  readonly attribute Promise&lt;MLContextLostInfo&gt; lost;
};
Anssi: this minimal API change is per our discussion
… I'd like to get review from implementers, and if no concerns merge this PR
Zoltan: just mentioning I haven't identified steps that handle power options; we could do that separately
Rafael: I have one question about MLContext: what in practice is the use case where accelerated and cpuFallbackActive are both false?
Zoltan: currently the steps that I added do not allow this case
Rafael: what if the backend accelerates some of the ops?
… when accelerated and cpuFallbackActive are both true?
Zoltan: accelerated refers to massively parallel acceleration
Rafael: I guess there could be a case, if there's a CPU backend doing SIMD it could be considered accelerated with cpuFallbackActive
Zoltan: massively parallel acceleration expects a GPU or NPU
Rafael: why do we need two booleans if, when accelerated is true, there's no CPU fallback?
… want to understand the use cases when both are true or both are false
Zoltan: both being false is redundant; we should specify steps to avoid this combination
Zoltan: good input, but also need to include power options in the picture, based on its setting we could select NPU or GPU
… comments via PR welcome
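The four combinations debated above could be summarized in a small helper (a sketch against the attribute names proposed in PR #895; the interpretation strings are one reading of this discussion, not spec text, and the steps as drafted rule out the false/false case):

```javascript
// Interpret the proposed MLContext.accelerated / cpuFallbackActive pair.
// Attribute names are from PR #895; the returned strings are illustrative only.
function describeAcceleration({ accelerated, cpuFallbackActive }) {
  if (accelerated && cpuFallbackActive)
    return "partially accelerated; some ops fall back to CPU";
  if (accelerated) return "fully accelerated on GPU/NPU";
  if (cpuFallbackActive) return "running on CPU";
  return "invalid combination (ruled out by the spec steps)";
}
```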