Meeting minutes
Repository: webmachinelearning/webnn
Anssi: we'll start by acknowledging our new participants who joined the WG:
… Simon Wijckmans from cside (Client side development Inc)
… Lynne Jiang, Ben Greenstein from Google
… Chris Needham from BBC
… JaEun Jemma Ku from University of Illinois
… Pavan Yanamadala, Siddharth Mangesh, Sharanya Chandrasekaran, Noormina Abuthahir from PayPal
… Dexter Yang from ByteDance
… Zoltan Kis as an Invited Expert
… on behalf of the entire group, welcome to all new participants!
F2F recap
Anssi: I wasn't planning to recap the official agenda, but rather to summarize the progress made outside the official meeting
… some highlights for me were the following:
… - we were able to raise awareness of our groups' inference and agentic work via breakouts and horizontal groups with the broader W3C community
… - we presented WebML WG and CG work at the very popular AI Agents and The Web breakout
WebML WG/CG at AI Agents and The Web breakout
Anssi: - together with Reilly, we presented WebNN at the Security IG F2F meeting on Fri
https://
Anssi: - Mozilla revised its WebNN position to support and Tarek initiated implementation work, see #763
<gb> Issue 763 Request standards positions from Mozilla and WebKit (by reillyeon) [process]
Anssi: - WebKit reopened its WebNN standards position
… - Markus and the NVIDIA team extended their exploration into various WebNN implementation strategies and optimizations, discussed during the week on the hallway track
… given that broader implementer interest is now ramping up fast, I propose we use W3C's Slack #webmachinelearning for synchronous implementation-related discussions across implementers and continue to use IRC for these bi-weekly meetings
… Slack has certain benefits over IRC for this type of long-running discussion, e.g. message persistence, so I think this separation of concerns works here
… Tarek already started discussions about his Rust implementation on Slack and Markus chimed in, thanks!
… - please join the W3C Slack #webmachinelearning to exchange ideas across implementers interested in WebNN
MarkusT: I'm looking forward to Tarek's work
W3C Web & AI Interest Group launched
Anssi: The Web & AI Interest Group is a forum to discuss the ethical, societal, and technical implications of AI-related technologies. The Ethical Principles for Web Machine Learning document has been established as a joint deliverable.
W3C Web & AI Interest Group Charter
… I had an exchange with Fabien Gandon who co-chairs the IG
… unfortunately Fabien couldn't join us today, but I'm conveying his welcome and invite anyone interested in ethical, societal, and technical implications of AI related technologies to join the IG
… we will develop our Ethical Principles document together with this newly formed IG
… to join, please follow the link in the charter document
Dom: thanks for the intro; the way to think about the IG is as a place for the broader picture, while this WebML WG does excellent deep work on technical specifications
… the IG looks primarily at non-technical topics, higher-level considerations of how the AI & Web ecosystems can evolve harmoniously
Anssi: what is the IG's work mode?
Dom: GH-driven, some meetings planned
Anssi: is there flexibility in terms of non-normative deliverables the IG could work on?
Dom: yes, new non-normative deliverables would be welcome, the W3C team contact is working on a roadmap for the IG
Core operator set
Expand the expand operator to support blockwise broadcasting
Anssi: issue #903
<gb> Issue 903 Expand the expand operator to support blockwise broadcasting (by fdwr) [opset]
Anssi: this is one of the sub-issues spun off from the core op set meta issue
Dwayne: A) the preferred proposal is to:
… - move the blockwise broadcasting aspect into expand
… - leave the rest of the decomposition as the respective mul/div/sub/add ops for Q and DQ
Dwayne: B) the alternative considered:
… - extend resample with nearest neighbor to support multiple axes
Anssi: the preferred proposal is motivated by better conceptual alignment
Anssi: any questions or concerns with the preferred proposal?
Rob: need Reilly's input on Google's side
… will ping Reilly to provide feedback on this issue
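[Editor's note: a minimal plain-JS sketch of the blockwise broadcasting the preferred proposal would fold into expand(), shown here on 1-D arrays for illustration only. The function names and the block layout are hypothetical, not WebNN API; the dequantize decomposition (q - zeroPoint) * scale follows the Q/DQ discussion above.]

```javascript
// Illustrative 1-D model of blockwise broadcasting: repeat each
// element of a per-block tensor blockSize times so it lines up
// element-wise with the quantized data. Hypothetical helper names.
function blockwiseExpand(perBlockValues, blockSize) {
  const out = [];
  for (const v of perBlockValues) {
    for (let i = 0; i < blockSize; i++) out.push(v);
  }
  return out;
}

// Dequantize decomposition: DQ(q) = (q - zeroPoint) * scale, where
// scale and zeroPoint are stored once per block of elements.
function dequantizeBlockwise(quantized, scales, zeroPoints, blockSize) {
  const s = blockwiseExpand(scales, blockSize);
  const z = blockwiseExpand(zeroPoints, blockSize);
  return quantized.map((q, i) => (q - z[i]) * s[i]);
}

// Two blocks of two elements, each with its own scale/zero point:
dequantizeBlockwise([10, 12, 20, 24], [0.5, 0.25], [8, 16], 2);
// → [1, 2, 1, 2]
```

In the preferred proposal this blockwise repetition would live inside expand() itself, leaving only the element-wise mul/div/sub/add steps in the Q/DQ decomposition.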
Extend rank support
Anssi: issue #904
<gb> Issue 904 2D, or not 2D, that is the question (by fdwr) [opset]
Anssi: the more catchy name for this issue is "2D, or not 2D, that is the question" :-)
… Dwayne reports: "Multiple WebNN operators still have limited ranks which was historically done for backends that might be more limited"
… current backends have evolved since rank support was specified in WebNN
… limited ranks have caused issues in certain popular models
… Dwayne notes Whisper uses 1D conv and thus requires an extra reshape() step
… the issue contains a survey of the current operator rank support for CoreML, DML, LiteRT, ORT backends
… the proposal from Dwayne is to extend the rank support to match the intersection of the rank support of current backends
… the solution can take various API shapes
… Dwayne came up with the following options by studying other libraries:
… - A) Bake the axis count directly into the operator name
… - B) Use a single operator name, with an implicit axis count based on the input rank
… - C) Pass the reduction axis count separately from the input rank
… - D) Pass the explicit axes
… based on the pros/cons analysis, option C or D is the most preferred
Anssi: we can reflect platform rank differences through MLOpSupportLimits
… any axis count 1-3 would be legal to WebNN if `axis count <= input rank`, see the table in the issue
Anssi: Dwayne suggests to avoid adding a zoo of new function names:
… foo1, foo2, foo3 etc.
… conv2d -> conv
… convTranspose2d -> convTranspose
… averagePool2d -> averagePool
… l2Pool2d -> l2Pool
… maxPool2d -> maxPool
… resample2d -> resample
Dwayne: for each op, I can list IDL proposals to help readers
Anssi: does option C or D still allow AOT feature detection of ranks?
… a simple feature detection mechanism is to check for existence of a method on an object
… can we implement such a feature detection of supported ranks entirely with MLOpSupportLimits?
… an example of a simple feature detection:
const graphBuilder = new MLGraphBuilder(await navigator.ml.createContext());
if ('conv' in graphBuilder) console.log('conv() exists');
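[Editor's note: a sketch of what rank feature detection over MLOpSupportLimits data could look like. The per-operand `rankRange` shape below follows the support-limits discussion but is mocked here as a plain object; real code would call `context.opSupportLimits()` and inspect the returned dictionary.]

```javascript
// Check whether a given input rank falls inside the rank range the
// backend reports for one operand of one op. `rankRange` layout is
// an assumption mirroring the discussion, not confirmed API.
function supportsRank(opLimits, operandName, rank) {
  const range = opLimits?.[operandName]?.rankRange;
  if (!range) return false;
  return rank >= range.min && rank <= range.max;
}

// Mocked subset of what opSupportLimits() might report for conv:
const limits = {
  conv: { input: { rankRange: { min: 3, max: 5 } } },
};

supportsRank(limits.conv, 'input', 3); // 1-D conv (N, C, W): true
supportsRank(limits.conv, 'input', 6); // beyond backend support: false
```

With options C or D, this kind of data-driven check would replace per-rank method-existence probing.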
Anssi: the naming change has a compatibility impact as discussed in context of issue #821
<gb> Issue 821 Operator naming 2D vs 2d (by fdwr) [conventions]
Anssi: given the Origin Trials are imminent, I think this change would land after the initial OT period?
Dwayne: when making a change like this, we give frameworks 4 weeks to update themselves and leave an alias in place
Composite operators / subgraphs
Anssi: issue #907
<gb> Issue 907 Composite operators / subgraphs (by fdwr) [opset]
Anssi: Core operator set was discussed at TPAC 2025 where we resolved to evolve the proposal for aggregate operators via subgraphs
Anssi: this builds upon the earlier exploration by Ningxin et al. on custom ops discussed at TPAC 2024
Anssi: Dwayne opened this topic-specific issue to pursue this proposal further and shared his background research on the topic (thanks!)
… see also the Case Study on WebNN Small Language Model Performance Optimization presented at TPAC 2025 for further motivation:
WebNN SLM Performance Optimization Case Study at TPAC 2025
Anssi: high-level motivation for the proposal has been discussed in context of the core op set meta issue and I think we have a general agreement
… - 100s of potential operators across ML libraries
… - adding all of them into a Web API is not feasible
… - WebNN core op set is designed to enable composability of larger aggregate ops
… - if the backend has a compatible implementation of the subgraph, it can use a more efficient path vs. relying on pattern recognition by the implementation
… a popular concrete example of an aggregate op is multi-head attention, a key component of the transformer architecture introduced in the original 2017 "Attention Is All You Need" paper
… Dwayne has a code snippet in the issue to demonstrate what this could look like in terms of API surface and basic steps (details, names etc. to be discussed)
Dwayne: a web developer defines a composite operator as a JS function using the existing WebNN built-in ops
… buildSubgraph() method returns the built subgraph
… subgraph() method returns the output given the built subgraph and input
Dwayne: this is more of an example, ideas welcome
MarkusT: how to handle different constants?
… do we want to get subgraph names?
Dwayne: would a name be helpful for a backend to recognize?
MarkusT: ML is done by frameworks; when pattern matching the subgraph, a backend can pre-check whether this is a name it expects; the name could be a hash identifying what to pattern match against
Dwayne: looked at various ML libraries, would a list of candidate names be better?
MarkusT: if you dump the subgraph into debugging tool, the name would help with debugging
… subgraphs calling subgraphs?
Dwayne: seems useful for composability?
MarkusT: the input would be dynamic, with the shape of the input determined by whatever output is fed into it; subgraphs are like macros
Dwayne: ONNX has this concept of functions composed of multiple graphs
MarkusT: do we expect macro expansion by every backend?
Dwayne: not sure about that, each backend or layer below, should know the capabilities of the backend
MarkusT: if the backend would support subgraphs, perhaps the WebNN native interface would unroll
<Zakim> zkis, you wanted to ask if we maintain the semantics this was meant to be tanh
Zoltan: question, do we want to maintain semantics?
Dwayne: MarkusT's idea of including names would help with that
Zoltan: we should discuss whether we need to "standardize" those names
… an annotation mechanism
MarkusT: I'd prefer no meta information; for a new operation, how long does it take for us to standardize a name vs. a backend finding a name and implementing it?
Ningxin: to express an operator, ops take optional inputs; some attention ops have optional inputs too; how can the subgraph concept support that?
… secondly, some existing WebNN ops have attributes, per WebNN conventions
Ningxin: static attributes, how to go about them?
Dwayne: will add that as a consideration
MarkusT: what if attributes could override?
Dwayne: I suspect so
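[Editor's note: a toy model of the composite-operator idea discussed above. A composite op is a JS function from a builder plus inputs to an output, composed from built-in ops; a name (per MarkusT's suggestion) lets a backend pattern match a fused kernel instead of macro-expanding. The builder here operates on plain numbers purely for illustration; all names except the buildSubgraph()/subgraph() concepts from the minutes are made up.]

```javascript
// Stand-in builder whose "ops" act on plain numbers, not tensors.
const mockBuilder = {
  add: (a, b) => a + b,
  mul: (a, b) => a * b,
  tanh: (x) => Math.tanh(x),
};

// A developer-defined composite, e.g. a scaled-tanh activation,
// written as an ordinary JS function over built-in ops.
function scaledTanh(builder, x, scale) {
  return builder.mul(builder.tanh(x), scale);
}

// Hypothetical dispatch: a real backend might look up `name` for a
// fused implementation first, and only otherwise macro-expand `fn`
// into the built-in ops (the fallback modeled here).
function runSubgraph(name, fn, builder, ...inputs) {
  return fn(builder, ...inputs);
}

runSubgraph('scaledTanh', scaledTanh, mockBuilder, 0, 2); // → 0
```

Open questions from the discussion (optional inputs, static attributes, nested subgraphs) would all show up as additional parameters or nested `fn` calls in this shape.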
Push v pull architecture for constants
Anssi: issue #901
<gb> Issue 901 Proposal: API to Separate Graph Building from Weight Loading to Reduce Peak Memory Usage (by mtavenrath) [feature request]
Anssi: we discussed this proposal to reduce peak memory usage from Markus and the NVIDIA team at TPAC 2025
… and resolved to explore streaming constructor for constants
https://
Anssi: after TPAC, Markus provided further details on the benefits of the proposed pull-based model for constants in this issue:
… - 1. Latency Hiding via Parallel Compilation
… - 2. Direct-to-Disk Caching & I/O Alignment
… - 3. Persistent Layout Optimization
… - 4. Memory Architecture & UVM Efficiency
… - 5. Dynamic Resource Management
… and, with the broader NVIDIA team, looked into remote execution of neural networks, e.g. on a home server, using external weights
Anssi: Dwayne notes external weights are already achievable via MLTensor when combined with MLGraphBuilder.input() method
… this allows an MLGraph to be built without weights, with the weights written later via writeTensor()
… Dwayne suggests this addresses some of the concerns raised in this issue?
… what functionality do we miss with MLTensor and input()?
… I guess 5. dynamic resource management?
MarkusT: parsing of constants is delayed; not all backends are happy to call writeTensor()
… the 5 points are based on discussion with Reilly; if we have external resources we don't need to do memory copies at all, we get the data at the time we need it
… backends can pull the resources on demand, with the responsibility on the backend implementation
… we were wondering about caching; current ORT likely downloads all content; the backend could be faster than code running in a JS process
… we're currently doing work in another ML framework with similar optimizations
MarkusT: the pass-a-URL-and-offset proposal by Reilly sounded good; pass a GGUF file to the GraphBuilder, give the input tensor names, and we're done
Dwayne: I'll think about this more
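[Editor's note: a sketch contrasting the push and pull models for constants discussed above. All names here are hypothetical stand-ins; only input()/writeTensor() appear in the actual discussion.]

```javascript
// Push model: the app materializes every weight in JS memory first
// and writes each into the graph, so peak memory includes all
// weights at once.
function pushWeights(graph, weights, writeTensor) {
  for (const [name, data] of Object.entries(weights)) {
    writeTensor(graph, name, data);
  }
}

// Pull model: the app registers a loader per constant; the backend
// calls it only when (and if) it needs the bytes, e.g. reading a
// GGUF file at a given offset, enabling lazy, aligned I/O and
// backend-driven resource management.
function registerConstantLoader(graph, name, loader) {
  graph.loaders = graph.loaders || {};
  graph.loaders[name] = loader;
}

const graph = {};
registerConstantLoader(graph, 'fc1.weight',
  (offset, length) => new Uint8Array(length).fill(1)); // stand-in read

// Backend side, during compilation: pull on demand.
const bytes = graph.loaders['fc1.weight'](1024, 4);
```

The pull side is what points 1-5 in the issue build on: because the loader runs when the backend asks, compilation can overlap I/O and the implementation controls copies and residency.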
Device selection
Device selection criteria for usecase-driven scenarios
Anssi: issue #902
<gb> Issue 902 Device selection criteria for usecase-driven scenarios (by fdwr) [device selection]
Anssi: we discussed this at TPAC 2025:
https://
Anssi: - there was consensus that hints are generally the preferred mechanism, but no decision on which hints to pursue, if any; I posted an IDL diff to tease out additional perspectives
… - there was interest in supporting multiple devices of a given type
… - there was agreement that prompt fatigue is an issue; the still-evolving Page Embedded Permission Control (PEPC) might be a solution to that
Anssi: - proposal that hints would help UA schedule real-time vs non-real time workloads running in parallel
MarkusH: if we have explicit (or implicitly detected by UA) Worker QoS, would there remain a use case for specifying the latency requirement? Same goes for the continuity.
… perhaps Worker QoS is implicitly detectable by the UA, could remove low-latency preference in that case
… a hint of real-time activity going on
MarkusH: perhaps Mike has feedback on an exact interface that would work out
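[Editor's note: a sketch of what use-case-driven hints could look like. `powerPreference` exists in MLContextOptions today with `"high-performance"`/`"low-power"` values; the `lowLatency` hint name is hypothetical, echoing the latency/real-time scheduling discussion above.]

```javascript
// Map a use case to a candidate hints dictionary. Only
// powerPreference is real WebNN API; lowLatency is an assumption.
function hintsForUseCase(useCase) {
  switch (useCase) {
    case 'realtime-video':      // e.g. background blur in a video call
      return { powerPreference: 'high-performance', lowLatency: true };
    case 'background-indexing': // e.g. offline embedding of documents
      return { powerPreference: 'low-power', lowLatency: false };
    default:
      return {};
  }
}

// Usage would then look like (hypothetical dictionary member):
//   const context = await navigator.ml.createContext(
//       hintsForUseCase('realtime-video'));
```

If Worker QoS becomes implicitly detectable by the UA, as MarkusH suggests, the explicit `lowLatency` member here might be unnecessary.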