Meeting minutes
Repository: webmachinelearning/webnn
Anssi: we'll start by acknowledging our new participants who joined the WG:
… Simon Wijckmans from cside (Client side development Inc)
… Lynne Jiang, Ben Greenstein from Google
… Chris Needham from BBC
… JaEun Jemma Ku from University of Illinois
… Pavan Yanamadala, Siddharth Mangesh, Sharanya Chandrasekaran, Noormina Abuthahir from PayPal
… Dexter Yang from ByteDance
… Zoltan Kis as an Invited Expert
… on behalf of the entire group, welcome to all new participants!
F2F recap
Anssi: I wasn't planning to recap the official agenda, but rather to summarize the progress made outside the official meeting
… some highlights for me were the following:
… - we were able to raise awareness of our groups' inference and agentic work via breakouts and horizontal groups with the broader W3C community
… - we presented WebML WG and CG work at the very popular AI Agents and The Web breakout
WebML WG/CG at AI Agents and The Web breakout
Anssi: - together with Reilly, we presented WebNN at the Security IG F2F meeting on Fri
https://
Anssi: - Mozilla revised its WebNN position to support and Tarek initiated implementation work, see #763
<gb> Issue 763 Request standards positions from Mozilla and WebKit (by reillyeon) [process]
Anssi: - WebKit reopened its WebNN standards position
… - Markus and the NVIDIA team extended their exploration into various WebNN implementation strategies and optimizations, discussed during the week on the hallway track
… given that broader implementer interest is now ramping up fast, I propose we use W3C's Slack #webmachinelearning for synchronous implementation-related discussions across implementers and continue to use IRC for these bi-weekly meetings
… Slack has certain benefits over IRC for this type of long-running discussion, e.g. message persistence, so I think this separation of concerns works here
… Tarek already started discussions about his Rust implementation on Slack and Markus chimed in, thanks!
… - please join the W3C Slack #webmachinelearning to exchange ideas across implementers interested in WebNN
MarkusT: I'm looking forward to Tarek's work
W3C Web & AI Interest Group launched
Anssi: The Web & AI Interest Group is a forum to discuss the ethical, societal, and technical implications of AI-related technologies. The Ethical Principles for Web Machine Learning document has been established as a joint deliverable.
W3C Web & AI Interest Group Charter
… I had an exchange with Fabien Gandon who co-chairs the IG
… unfortunately Fabien couldn't join us today, but I'm conveying his welcome and invite anyone interested in ethical, societal, and technical implications of AI related technologies to join the IG
… we will develop our Ethical Principles document together with this newly formed IG
… to join, please follow the link in the charter document
Dom: thanks for the intro; the way to think about the IG is as a place for the broader picture, while this WebML WG does excellent deep work on technical specifications
… the IG looks primarily at non-technical topics, higher-level considerations of how the AI & Web ecosystems can evolve harmoniously
Anssi: what is the IG's work mode?
Dom: GH-driven, some meetings planned
Anssi: is there flexibility in terms of non-normative deliverables the IG could work on?
Dom: yes, new non-normative deliverables would be welcome, the W3C team contact is working on a roadmap for the IG
Core operator set
Expand the expand operator to support blockwise broadcasting
Anssi: issue #903
<gb> Issue 903 Expand the expand operator to support blockwise broadcasting (by fdwr) [opset]
Anssi: this is one of the sub-issues spun off from the core op set meta issue
Dwayne: A) the preferred proposal is to:
… - move the blockwise broadcasting aspect into expand
… - leave the rest of the decomposition as the respective mul/div/sub/add ops for Q and DQ
Dwayne: B) the alternative considered:
… - extend resample with nearest neighbor to support multiple axes
Anssi: the preferred proposal is motivated by better conceptual alignment
Anssi: any questions or concerns with the preferred proposal?
Rob: need Reilly's input on Google's side
… will ping Reilly to provide feedback on this issue
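[Editor's note: a minimal plain-JS sketch of the blockwise broadcasting the preferred proposal would fold into expand(), shown here on 1-D arrays for illustration only. The function names and the block layout are hypothetical, not WebNN API; the dequantize decomposition (q - zeroPoint) * scale follows the Q/DQ discussion above.]

```javascript
// Illustrative 1-D model of blockwise broadcasting: repeat each
// element of a per-block tensor blockSize times so it lines up
// element-wise with the quantized data. Hypothetical helper names.
function blockwiseExpand(perBlockValues, blockSize) {
  const out = [];
  for (const v of perBlockValues) {
    for (let i = 0; i < blockSize; i++) out.push(v);
  }
  return out;
}

// Dequantize decomposition: DQ(q) = (q - zeroPoint) * scale, where
// scale and zeroPoint are stored once per block of elements.
function dequantizeBlockwise(quantized, scales, zeroPoints, blockSize) {
  const s = blockwiseExpand(scales, blockSize);
  const z = blockwiseExpand(zeroPoints, blockSize);
  return quantized.map((q, i) => (q - z[i]) * s[i]);
}

// Two blocks of two elements, each with its own scale/zero point:
dequantizeBlockwise([10, 12, 20, 24], [0.5, 0.25], [8, 16], 2);
// → [1, 2, 1, 2]
```

In the preferred proposal this blockwise repetition would live inside expand() itself, leaving only the element-wise mul/div/sub/add steps in the Q/DQ decomposition.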
Extend rank support
Anssi: issue #904
<gb> Issue 904 2D, or not 2D, that is the question (by fdwr) [opset]
Anssi: the more catchy name for this issue is "2D, or not 2D, that is the question" :-)
… Dwayne reports: "Multiple WebNN operators still have limited ranks which was historically done for backends that might be more limited"
… current backends have evolved since rank support was specified in WebNN
… limited ranks have caused issues in certain popular models
… Dwayne notes Whisper uses 1D conv and thus requires an extra reshape() step
… the issue contains a survey of the current operator rank support for CoreML, DML, LiteRT, ORT backends
… the proposal from Dwayne is to extend the rank support to match the intersection of the rank support of current backends
… the solution can take various API shapes
… Dwayne came up with the following options by studying other libraries:
… - A) Bake the axis count directly into the operator name
… - B) Use a single operator name, with an implicit axis count based on the input rank
… - C) Pass the reduction axis count separately from the input rank
… - D) Pass the explicit axes
… based on the pros/cons analysis, option C or D is the most preferred
Anssi: we can reflect platform rank differences through MLOpSupportLimits
… any axis count 1-3 would be legal to WebNN if `axis count <= input rank`, see the table in the issue
Anssi: Dwayne suggests to avoid adding a zoo of new function names:
… foo1, foo2, foo3 etc.
… conv2d -> conv
… convTranspose2d -> convTranspose
… averagePool2d -> averagePool
… l2Pool2d -> l2Pool
… maxPool2d -> maxPool
… resample2d -> resample
Dwayne: for each op, I can list IDL proposals to help readers
Anssi: does option C or D still allow AOT feature detection of ranks?
… a simple feature detection mechanism is to check for existence of a method on an object
… can we implement such a feature detection of supported ranks entirely with MLOpSupportLimits?
… an example of a simple feature detection:
const graphBuilder = new MLGraphBuilder(await navigator.ml.createContext());
if ('conv' in graphBuilder) console.log('conv() exists');
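[Editor's note: a sketch of what rank feature detection over MLOpSupportLimits data could look like. The per-operand `rankRange` shape below follows the support-limits discussion but is mocked here as a plain object; real code would call `context.opSupportLimits()` and inspect the returned dictionary.]

```javascript
// Check whether a given input rank falls inside the rank range the
// backend reports for one operand of one op. `rankRange` layout is
// an assumption mirroring the discussion, not confirmed API.
function supportsRank(opLimits, operandName, rank) {
  const range = opLimits?.[operandName]?.rankRange;
  if (!range) return false;
  return rank >= range.min && rank <= range.max;
}

// Mocked subset of what opSupportLimits() might report for conv:
const limits = {
  conv: { input: { rankRange: { min: 3, max: 5 } } },
};

supportsRank(limits.conv, 'input', 3); // 1-D conv (N, C, W): true
supportsRank(limits.conv, 'input', 6); // beyond backend support: false
```

With options C or D, this kind of data-driven check would replace per-rank method-existence probing.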
Anssi: the naming change has a compatibility impact as discussed in context of issue #821
<gb> Issue 821 Operator naming 2D vs 2d (by fdwr) [conventions]
Anssi: given the Origin Trials are imminent, I think this change would land after the initial OT period?
Dwayne: when making a change like this, we give frameworks 4 weeks to update themselves and leave an alias in place
Composite operators / subgraphs
Anssi: issue #907
<gb> Issue 907 Composite operators / subgraphs (by fdwr) [opset]
Anssi: Core operator set was discussed at TPAC 2025 where we resolved to evolve the proposal for aggregate operators via subgraphs
Anssi: this builds upon the earlier exploration by Ningxin et al. on custom ops discussed at TPAC 2024
Anssi: Dwayne opened this topic-specific issue to pursue this proposal further and shared his background research on the topic (thanks!)
… see also the Case Study on WebNN Small Language Model Performance Optimization presented at TPAC 2025 for further motivation:
WebNN SLM Performance Optimization Case Study at TPAC 2025
Anssi: high-level motivation for the proposal has been discussed in context of the core op set meta issue and I think we have a general agreement
… - 100s of potential operators across ML libraries
… - adding all of them into a Web API is not feasible
… - WebNN core op set is designed to enable composability of larger aggregate ops
… - if the backend has a compatible implementation of the subgraph, it can use a more efficient path vs. relying on pattern recognition by the implementation
… a popular concrete example of an aggregate op is multi-head attention, a key component of the transformer architecture introduced in the original 2017 "Attention Is All You Need" paper
… Dwayne has a code snippet in the issue to demonstrate what this could look like in terms of API surface and basic steps (details, names etc. to be discussed)
Dwayne: a web developer defines a composite operator as a JS function using the existing WebNN built-in ops
… buildSubgraph() method returns the built subgraph
… subgraph() method returns the output given the built subgraph and input
Dwayne: this is more of an example, ideas welcome
MarkusT: how to handle different constants?
… do we want to get subgraph names?
Dwayne: would a name be helpful for a backend to recognize?
MarkusT: ML is done by frameworks; when pattern matching the subgraph, a backend can pre-check whether this is a name it expects; the name could be a hash identifying what to pattern match against
Dwayne: looked at various ML libraries, would a list of candidate names be better?
MarkusT: if you dump the subgraph into debugging tool, the name would help with debugging
… subgraphs calling subgraphs?
Dwayne: seems useful for composability?
MarkusT: the input would be dynamic, with the shape of the input determined by whatever output is fed into it; subgraphs are like macros
Dwayne: ONNX has this concept of functions composed of multiple graphs
MarkusT: do we expect macro expansion by every backend?
Dwayne: not sure about that, each backend or layer below, should know the capabilities of the backend
MarkusT: if the backend would support subgraphs, perhaps the WebNN native interface would unroll
<Zakim> zkis, you wanted to ask if we maintain the semantics this was meant to be tanh
Zoltan: question, do we want to maintain semantics?
Dwayne: MarkusT's idea of including names would help with that
Zoltan: we should discuss whether we need to "standardize" those names
… an annotation mechanism
MarkusT: I'd prefer no meta information; for a new operation, how long does it take for us to standardize a name vs. a backend finding a name and implementing it?
Ningxin: to express an operator, ops take optional inputs; some attention ops have optional inputs too; how can the subgraph concept support that?
… secondly, some existing WebNN ops have attributes, per WebNN conventions
Ningxin: static attributes, how to go about them?
Dwayne: will add that as a consideration
MarkusT: what if attributes could override?
Dwayne: I suspect so
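[Editor's note: a toy model of the composite-operator idea discussed above. A composite op is a JS function from a builder plus inputs to an output, composed from built-in ops; a name (per MarkusT's suggestion) lets a backend pattern match a fused kernel instead of macro-expanding. The builder here operates on plain numbers purely for illustration; all names except the buildSubgraph()/subgraph() concepts from the minutes are made up.]

```javascript
// Stand-in builder whose "ops" act on plain numbers, not tensors.
const mockBuilder = {
  add: (a, b) => a + b,
  mul: (a, b) => a * b,
  tanh: (x) => Math.tanh(x),
};

// A developer-defined composite, e.g. a scaled-tanh activation,
// written as an ordinary JS function over built-in ops.
function scaledTanh(builder, x, scale) {
  return builder.mul(builder.tanh(x), scale);
}

// Hypothetical dispatch: a real backend might look up `name` for a
// fused implementation first, and only otherwise macro-expand `fn`
// into the built-in ops (the fallback modeled here).
function runSubgraph(name, fn, builder, ...inputs) {
  return fn(builder, ...inputs);
}

runSubgraph('scaledTanh', scaledTanh, mockBuilder, 0, 2); // → 0
```

Open questions from the discussion (optional inputs, static attributes, nested subgraphs) would all show up as additional parameters or nested `fn` calls in this shape.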
Push v pull architecture for constants
Anssi: issue #901
<gb> Issue 901 Proposal: API to Separate Graph Building from Weight Loading to Reduce Peak Memory Usage (by mtavenrath) [feature request]
Anssi: we discussed this proposal to reduce peak memory usage from Markus and the NVIDIA team at TPAC 2025
… and resolved to explore streaming constructor for constants
https://
Anssi: after TPAC, Markus provided further details on the benefits of the proposed pull-based model for constants in this issue:
… - 1. Latency Hiding via Parallel Compilation
… - 2. Direct-to-Disk Caching & I/O Alignment
… - 3. Persistent Layout Optimization
… - 4. Memory Architecture & UVM Efficiency
… - 5. Dynamic Resource Management
… and, with the broader NVIDIA team, looked into remote execution of neural networks, e.g. on a home server, using external weights
Anssi: Dwayne notes external weights are already achievable via MLTensor when combined with MLGraphBuilder.input() method
… this allows an MLGraph to be built without weights, with the weights written later via writeTensor()
… Dwayne suggests this addresses some of the concerns raised in this issue?
… what functionality do we miss with MLTensor and input()?
… I guess 5. dynamic resource management?
MarkusT: parsing of constants is delayed; not all backends are happy to call writeTensor()
… the 5 points are based on discussion with Reilly; if we have external resources we don't need to do memory copies at all, we get the data at the time we need it
… backends can pull the resources on demand, with the responsibility on the backend implementation
… we were wondering about caching; current ORT likely downloads all content; the backend could be faster than code running in a JS process
… we're currently doing work in another ML framework with similar optimizations
MarkusT: the pass-a-URL-and-offset proposal by Reilly sounded good; pass a GGUF file to the GraphBuilder, give the input tensor names, and we're done
Dwayne: I'll think about this more
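[Editor's note: a sketch contrasting the push and pull models for constants discussed above. All names here are hypothetical stand-ins; only input()/writeTensor() appear in the actual discussion.]

```javascript
// Push model: the app materializes every weight in JS memory first
// and writes each into the graph, so peak memory includes all
// weights at once.
function pushWeights(graph, weights, writeTensor) {
  for (const [name, data] of Object.entries(weights)) {
    writeTensor(graph, name, data);
  }
}

// Pull model: the app registers a loader per constant; the backend
// calls it only when (and if) it needs the bytes, e.g. reading a
// GGUF file at a given offset, enabling lazy, aligned I/O and
// backend-driven resource management.
function registerConstantLoader(graph, name, loader) {
  graph.loaders = graph.loaders || {};
  graph.loaders[name] = loader;
}

const graph = {};
registerConstantLoader(graph, 'fc1.weight',
  (offset, length) => new Uint8Array(length).fill(1)); // stand-in read

// Backend side, during compilation: pull on demand.
const bytes = graph.loaders['fc1.weight'](1024, 4);
```

The pull side is what points 1-5 in the issue build on: because the loader runs when the backend asks, compilation can overlap I/O and the implementation controls copies and residency.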
Device selection
Device selection criteria for usecase-driven scenarios
Anssi: issue #902
<gb> Issue 902 Device selection criteria for usecase-driven scenarios (by fdwr) [device selection]
Anssi: we discussed this at TPAC 2025:
https://
Anssi: - there was consensus that hints are generally the preferred mechanism, but no decision on which hints to pursue, if any; I posted an IDL diff to tease out additional perspectives
… - there was interest in supporting multiple devices of a given type
… - there was agreement that prompt fatigue is an issue; the still-evolving Page Embedded Permission Control (PEPC) might be a solution to that
Anssi: - proposal that hints would help UA schedule real-time vs non-real time workloads running in parallel
MarkusH: if we have explicit (or implicitly detected by UA) Worker QoS, would there remain a use case for specifying the latency requirement? Same goes for the continuity.
… perhaps Worker QoS is implicitly detectable by the UA, could remove low-latency preference in that case
… a hint of real-time activity going on
MarkusH: perhaps Mike has feedback on an exact interface that would work out
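[Editor's note: a sketch of what use-case-driven hints could look like. `powerPreference` exists in MLContextOptions today with `"high-performance"`/`"low-power"` values; the `lowLatency` hint name is hypothetical, echoing the latency/real-time scheduling discussion above.]

```javascript
// Map a use case to a candidate hints dictionary. Only
// powerPreference is real WebNN API; lowLatency is an assumption.
function hintsForUseCase(useCase) {
  switch (useCase) {
    case 'realtime-video':      // e.g. background blur in a video call
      return { powerPreference: 'high-performance', lowLatency: true };
    case 'background-indexing': // e.g. offline embedding of documents
      return { powerPreference: 'low-power', lowLatency: false };
    default:
      return {};
  }
}

// Usage would then look like (hypothetical dictionary member):
//   const context = await navigator.ml.createContext(
//       hintsForUseCase('realtime-video'));
```

If Worker QoS becomes implicitly detectable by the UA, as MarkusH suggests, the explicit `lowLatency` member here might be unnecessary.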