Meeting minutes
Anssi: we'll start by welcoming our latest new participants:
… please welcome to the WebML WG:
… Mark Foltz from Google
… Umar Iqbal from Washington University as an Invited Expert
… Aram Zucker-Scharff, Davis Shaver, and Stephen Erickson from The Washington Post
… welcome to all new participants, I look forward to working with you!
Incubations
Anssi: a debrief on the recent WebML Community Group developments
WebML CG Teleconference – 2 October 2025
Repository: webmachinelearning/webmcp
Anssi: we had another WebMCP API brainstorming session and made important resolutions:
… - resolved to make the tools be part of the discovery mechanism
… - resolved to look into higher-level hooks to connect WebMCP with external agents for listing tools
… - resolved that tool execution should be able to start/stop yielding to the user throughout its lifecycle, in the context of elicitation
… - resolved navigator.modelContext is the "root" object name
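Of the resolutions above, only the `navigator.modelContext` root name is settled API surface. As a purely illustrative sketch of the general shape of tool registration and discovery (the `registerTool()` method and the descriptor fields below are hypothetical, invented here for illustration), a page might expose a tool like this:

```javascript
// Hypothetical WebMCP sketch. Only the navigator.modelContext root name is
// resolved; registerTool() and the descriptor shape are illustrative guesses.
function makeToolDescriptor(name, description, inputSchema, execute) {
  // A plain object an external agent could discover and later invoke.
  return { name, description, inputSchema, execute };
}

const addToCartTool = makeToolDescriptor(
  "add-to-cart",
  "Adds a product to the shopping cart",
  { type: "object", properties: { productId: { type: "string" } } },
  async ({ productId }) => ({ ok: true, productId })
);

// Guarded registration: navigator.modelContext does not exist in any
// shipping browser today.
if (typeof navigator !== "undefined" && navigator.modelContext?.registerTool) {
  navigator.modelContext.registerTool(addToCartTool);
}
```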
F2F Agenda brainstorming
Repository: webmachinelearning/meetings
Anssi: F2F Agenda issue #35
<gb> Issue 35 WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) (by anssiko)
Anssi: I want to discuss, review and update draft WebML WG/CG F2F Agenda based on your feedback
… now that we're getting closer to the F2F it makes sense to lower the level of abstraction and look at specific issues of interest
… note on logistics:
… registration open until 3 November
… meeting dates are 10-11 November 2025 (start on 9/10 for remotes in Pacific timezone!)
… please export invites as .ics from:
Anssi: the first day, 10 Nov, is dedicated to the WG / WebNN; the second day, 11 Nov, to WebMCP and the Built-in AI APIs
… we can still do adjustments based on feedback
https://
https://
Anssi: I observe good participation, both familiar names and new faces
… currently 42 in-person participants including observers, excluding remote participants
… to set the expectations for the F2F meeting:
… F2F is an opportunity to get to know people, including folks outside the group and the wider community
… humans usually work better together when they know each other
… we will not do low-level specification PR reviews on a big screen at the F2F, an async GH-driven work mode is better for that
… rather we try to make resolutions and seek consensus on important issues, chart the path forward, and eat Japanese food in great company
Anssi: F2F Agenda issue #35
<gb> Issue 35 WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) (by anssiko)
Anssi: I put up a draft to solicit feedback via comments for both TBA topics and timing to avoid scheduling conflicts as best as we can
… we've made good progress in closing down open issues for the WebNN API, stabilizing the spec
… recently a lot of energy has been put on broadening the implementation experience
… this is a busy time as we race to meet release branch milestones and pass quality gates
… after this major push we get the API in the hands of early adopters to help us iron out remaining kinks
… the timing of this coincides with our TPAC meeting, so we appreciate your contributions during this busy time
Reilly: I think implementation-wise we are waiting on the Windows ML backend, a big missing piece, and it's getting very close
… another thing is interop between backends; take a good look at our wpt coverage and gaps
Anssi: who would be the best to lead the wpt discussion?
Reilly: the group could do a triage pass over wpt results, which could help answer whether there are any implementation differences that warrant spec changes
Ningxin: I will check with our team working on wpt tests
Markus: regarding NVIDIA, our backend is enabled and operator tests are fine; a few have accuracy problems due to the reduced-precision format we use internally, and the DML provider and backend selection had some issues
Rafael: I propose we discuss the system setup separately
Anssi: 10 November 2025 is the Working Group F2F with a WebNN API focus, here's the top-level view:
<ningxin> If it's a Chromium implementation issue, feel free to open an issue at https://
Anssi: - Orientation
… group's charter framing, triage pass over WebNN issues, as a group exercise
Reilly: I think we did a pass over issues at the previous TPAC in the beginning; running through issues live can be productive in a real-time setting
… maybe the editors can put together a report summarizing them
Anssi: - New features
… 2-4 issues, can include supporting presentations
… - Customer feedback and collaborations
… please bring any feedback from frameworks, end-users, ISVs
… - Interop and technical cross-group coordination
… interop is the cornerstone of the web platform, wpt topics and any coordination with other W3C groups goes here
… - Implementation plans and trials
… we discuss upcoming trials, learnings from browsers, backends and frameworks that implement and integrate with the WebNN API
… - Horizontals
… we get to know experts behind horizontal groups: ethics, sustainability, privacy, security, all areas where we've recently recruited more participants to join us
… - Dinner
… we eat Japanese food in great company!
Anssi: feedback welcome via GH comments, on these calls, via email
New features and operator specific issues
Drop support of 8-bit integers input for CumulativeSum
Anssi: issue #892
<gb> Issue 892 not found
Anssi: Ningxin proposes to drop support of 8-bit integers input for CumulativeSum due to lack of backend support
Repository: webmachinelearning/webnn
#892
… issue notes 8-bit integer input for cumulativeSum is not supported by any of the Chromium backends: Core ML, DirectML, ONNX, TFLite
… for symmetry, reduceSum also doesn't support 8-bit integer input
… I think we all agree to drop this, I see Phillis +1
<gb> Issue 892 Drop support of 8-bit integers input for CumulativeSum (by huningxin) [operator specific]
Reilly: SGTM
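For background, cumulativeSum computes running partial sums along an axis (out[i] = in[0] + … + in[i]). The stated rationale for the drop is missing backend support, but a quick JS sketch also shows why an 8-bit accumulator is of limited use: partial sums leave the int8 range almost immediately.

```javascript
// 1-D cumulative (prefix) sum with a wrapping int8 accumulator:
// out[i] = out[i-1] + in[i], stored into the int8 range [-128, 127].
function cumulativeSumInt8(input) {
  const out = new Int8Array(input.length);
  let acc = 0;
  for (let i = 0; i < input.length; i++) {
    out[i] = acc + input[i]; // Int8Array wraps the value on store
    acc = out[i];            // carry the wrapped value forward
  }
  return out;
}

const result = cumulativeSumInt8(Int8Array.from([100, 100, 100]));
// result is [100, -56, 44]: the true sums 200 and 300 both wrap out of range.
```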
Flexible input sizes
Anssi: issue #883
<gb> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific]
Anssi: I put this on the agenda as a reminder to check with Guenther for ORT Web and WebNN EP perspective for the feature
… I guess we're still awaiting Guenther's feedback?
Rafael: I haven't heard feedback yet, he personally thinks this is important
Anssi: do we block on Guenther or can we do some investigation ourselves in the interim to further this?
Reilly: I think the question is how this is getting implemented by backends and what the role of WebNN is in this decision; the framework could build multiple graphs
… I suspect that has all sorts of performance bottlenecks; I want to understand what form the various backends would prefer, something to abstract over
Dwayne: I would need to familiarize myself with TFLite and Core ML; as for the importance of this feature, I'm interested in prototyping to see what's possible
Reilly: I haven't looked at this yet in TFLite and Core ML; the DML EP can execute models with dynamic shapes, and Joshua/HF used the WebGPU EP and it has some support?
Dwayne: right
Markus: it can be expensive for ORT to have multiple graphs
Reilly: my intuition also is that this would require multiple graphs; pushing that deep into the stack lets the particular implementation avoid recreating them, and we should figure out resource sharing, pushing this down to the components that interact with hardware
Markus: in TensorRT we have dynamic shapes and it is handled by our EP
Rafael: how ORT talks with EPs is an implementation detail
Markus: I recall some frameworks allow defining max size with flexible input sizes
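The "multiple graphs" approach Reilly and Markus discuss could be sketched as a framework-side cache keyed by concrete input shape (a hypothetical helper; `compileGraph` stands in for whatever builds a fixed-shape graph). It illustrates both the workaround and its cost, since every previously unseen shape pays a full compile:

```javascript
// Sketch of the "multiple graphs" workaround: a framework caches one compiled
// graph per concrete input shape. compileGraph is a placeholder, not a real
// WebNN or ORT API.
class ShapeSpecializedCache {
  constructor(compileGraph) {
    this.compileGraph = compileGraph; // (shape: number[]) => compiled graph
    this.cache = new Map();           // shape key -> compiled graph
  }
  getGraph(shape) {
    const key = shape.join("x");
    if (!this.cache.has(key)) {
      // Cache miss: pay the (potentially expensive) compile cost once.
      this.cache.set(key, this.compileGraph(shape));
    }
    return this.cache.get(key);
  }
}
```

Pushing dynamic shapes down to a backend that supports them natively, as Markus notes TensorRT does (optionally with a declared max size), avoids these repeated compiles.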
Core operator set
Anssi: issue #573
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
Machine Learning Operator Mapping - All Raw Operators
Anssi: we had a good discussion at our prior meeting with regard to core operator set
Anssi: Fabio wanted to get back to the group after talking with the NVIDIA team
Fabio: we're collecting all the ops that'd benefit from being in the set, one class is various attentions
… also gathers, MoE, TopK
… looking for other ops that'd benefit from not being composed
Reilly: I'm curious about MoE and attentions; my concern with these high-level ops is that they are tied to particular model architectures, so while they give a performance boost they are not necessarily long-lived
… we found this out by looking at e.g. LSTM, where actual implementation details matter and there were compatibility issues between implementations
Fabio: I will look into this
Anssi: do we have any feedback?
Privacy and Security
Anssi: proposed changes to privacy considerations in PR #890
<gb> Pull Request 890 Revise privacy considerations (by anssiko)
Anssi: this PR suggests more changes than the minimal one-liner proposed by Reilly here:
"No information from the underlying platform is exposed directly." needs to be revised
<gb> Issue 886 Revise privacy considerations (by anssiko) [privacy-tracker]
Anssi: if the group would prefer a minimal change, I will update the PR accordingly
Anssi: finally, the security review was completed with positive feedback: "well-written in a narrative form"
<gb> CLOSED Issue 85 Web Neural Network API 2025-03-20 > 2025-06-20 (by anssiko) [REVIEW REQUESTED] [pending] [CR]
Anssi: that means once the privacy revising issue #886 is addressed we've completed the latest wide review round!
<gb> Issue 886 Revise privacy considerations (by anssiko) [privacy-tracker]
<gb> Issue 239 Wide review tracker (by anssiko) [process]
Query supported devices
Before graph compilation
Anssi: spec PR #895 and explainer PR #884
<gb> Pull Request 884 Update explainer with new proposal for simple accelerator mapping (by zolkis)
<gb> Pull Request 895 Add a simple accelerator selection mechanism. (by zolkis)
Anssi: thanks Zoltan for submitting these two PRs, ready for review now
… the spec PR suggests a simplified boolean-returning MLContext.accelerated and MLContext.cpuFallbackActive API
… proposed IDL change:
interface MLContext {
  undefined destroy();
+ readonly attribute boolean accelerated;
+ readonly attribute boolean cpuFallbackActive;
  readonly attribute Promise&lt;MLContextLostInfo&gt; lost;
};
Anssi: this minimal API change is per our discussion
… I'd like to get review from implementers, and if no concerns merge this PR
Zoltan: just mentioning I haven't identified steps that handle power options; we could do that separately
Rafael: I have one question about MLContext: what in practice is the use case where accelerated and cpuFallbackActive are both false?
Zoltan: currently the steps that I added do not allow this case
Rafael: what if the backend accelerates some of the ops?
… when accelerated and cpuFallbackActive are both true?
Zoltan: accelerated refers to massively parallel acceleration
Rafael: I guess there could be a case, if there's a CPU backend doing SIMD it could be considered accelerated with cpuFallbackActive
Zoltan: massively parallel acceleration expects a GPU or NPU
Rafael: why do we need two booleans if, when accelerated is true, there's no CPU fallback?
… want to understand the use cases when both are true or both are false
Zoltan: both being false is redundant; we should specify steps to avoid this combination
Zoltan: good input, but also need to include power options in the picture, based on its setting we could select NPU or GPU
… comments via PR welcome
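The four combinations debated above could be summarized in a small helper (a sketch against the attribute names proposed in PR #895; the interpretation strings are one reading of this discussion, not spec text, and the steps as drafted rule out the false/false case):

```javascript
// Interpret the proposed MLContext.accelerated / cpuFallbackActive pair.
// Attribute names are from PR #895; the returned strings are illustrative only.
function describeAcceleration({ accelerated, cpuFallbackActive }) {
  if (accelerated && cpuFallbackActive)
    return "partially accelerated; some ops fall back to CPU";
  if (accelerated) return "fully accelerated on GPU/NPU";
  if (cpuFallbackActive) return "running on CPU";
  return "invalid combination (ruled out by the spec steps)";
}
```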