Meeting minutes
Anssi: we'll start by acknowledging our new participants
… please welcome to the WebML WG:
… Sword Li from Cybozu, a Japanese company developing a web-based workplace collaboration platform
… Haoli Chen from ByteDance, known for its global social media platform TikTok
… welcome on board Sword and Haoli
… we look forward to your contributions and product-driven feedback on WebNN
… while new participants are onboarding, with mixed emotions we will say goodbye to our long-standing participant Zoltan who will be stepping away from this Working Group at the end of the month
… Zoltan has a long track record of contributions as one of the first participants in this group, he plans to continue in the WebML Community Group in his future capacity, so we will get to benefit from his expertise in our incubator in the future
… thank you Zoltan for all your contributions to this Working Group since its inception!
Zoltan: thank you everyone, it has been fun to be part of this effort, especially in the past few years we've gained a lot of traction, and the momentum continues
<ningxin> Thanks much, Zoltan, your contribution is highly appreciated!
Incubations
Anssi: next, a quick recap of recent WebML Community Group developments
WebML CG Teleconference – 16 October 2025
Repository: webmachinelearning/webmcp
Anssi: last week we focused on WebMCP, which is attracting a lot of attention; new participants interested in this agentic web capability are joining, eager to contribute
… here's a brief summary of recent developments:
… - we scheduled WebMCP TPAC F2F discussions on Tuesday Japan morning to allow US West Coast remote participants to join at better hours
… - for WebMCP elicitation #21, we resolved that the proposed API should give the user an option to block abusive sites permanently, but throw an error to developers so legitimate sites can implement fallback behaviour
<gb> Issue 21 Elicitation (by bwalderman)
Anssi: - for interleaving interaction #20, we did not identify a concrete use case for informing sites when users decide to take over in the middle of a tool execution, thus we closed this issue with no action
<gb> CLOSED Issue 20 Interleaving user and Agent interaction with the site (by khushalsagar) [Agenda+]
Anssi: - for prompt injection #11, exploration continues tracking MCP upstream developments and by developing the clipboard mitigation idea through prototyping
<gb> Issue 11 Prompt injection (by bwalderman) [Agenda+]
Anssi: - lastly, declarative API PR #26 was discussed; the group agreed to continue evolving this API together with the imperative API, applying a lesson learned from Web Components
<gb> Pull Request 26 add explainer for the declarative api (by MiguelsPizza)
Anssi: questions, comments?
F2F deep dives
Repository: webmachinelearning/meetings
Anssi: F2F Agenda issue #35
<gb> Issue 35 WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) (by anssiko)
Anssi: I want to expand the plan for a few core sessions on the WebML WG F2F Agenda to make sure we can make these productive and interesting to you
… these are the following three buckets: (1) implementation experience, (2) new features, (3) customer feedback
Implementation experience
Anssi: my proposal is to kick off the "Implementation plans and trials" session with demos
… we'll kick off this session with demos that exercise diverse hardware accelerators
… from the demo session excitement, we'll transition to discuss browser vendors' trial plans, dissect the latest implementation experience across the layers to inform the WebNN spec development
… I'm aware that implementers may not want to share their detailed plans ahead of product launches, so it is OK to abstract out any such details
Rafael: we're doing all Edge work upstream; Edge typically trails upstream by 5-10 days
… any Origin Trials follow Chrome's schedule
Reilly: the question is, what is Chrome's schedule; we're waiting on finishing the integration with the Windows ML API so we have complete support on Windows, expecting this to land in stable in the next month or so, so we'll likely be in a position to run an OT hitting Stable around the start of the year
Anssi: what are the tests and data you are looking for?
… wpt pass rate perhaps?
Reilly: wpt is in good shape; the biggest blocker is looking at various stability metrics and the security risk of launching this
Anssi: Edge has its own OT frontend?
Rafael: correct
New features
Repository: webmachinelearning/webnn
Anssi: in the "New features" session, the following have been proposed:
… (1) Core operator set #573 - discuss a plan to extend with attention, MoE, TopK, MatMulNBits ... or fuse
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
Ningxin: we've investigated ops such as attention and MatMulNBits, comparing decomposed vs. fused performance in LLMs
… if time allows, we can share a 10-min update on what we've learned, performance vs. code complexity
… (2) Support flexible input sizes #883 - understand and drive consensus on feature details such as dynamic shape types, unknown sizes, symbolic sizes, and tensor-derived sizes; Markus provided pre-reading via the TensorRT docs
<gb> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific]
Ningxin: (3) bag of issues in "device selection" to seek consensus on
https://
Anssi: 1-2 topics can still fit in this "new features" session
webmachinelearning/
<gb> Issue 35 WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) (by anssiko)
Customer feedback
Anssi: "Customer feedback and collaborations" is the session to share feedback from real-world users
… I understand not everyone wants to speak to their future product features, but any feedback that is available either directly from customers, or through a proxy, with confidential details abstracted out, is welcome in this session
… to clarify, we consider customers broadly, feedback from end-users, web developers, frameworks, anyone who is interfacing with the WebNN API, either directly or through an abstraction, is welcome
… Belem has created a repo called Awesome WebNN to collect customer and user feedback signals into one place
… this community resource is one place to look for feedback and signals, anyone is welcome to contribute to this repo
New features and operator specific issues
Repository: webmachinelearning/webnn
The decomposition of lstm has issue for batch size 1 input
Anssi: issue #889
<gb> Issue 889 The decomposition of `lstm` has issue for batch size 1 input (by fujunwei) [question] [operator specific]
Anssi: Junwei opened this issue while working to enable the Kokoro TTS model on the TFLite backend
… the implementation patch removed specific size-1 dimensions at axis 0 with the squeeze_dims option in the TFLite GraphBuilder implementation
… are we clear on spec changes required?
Reilly: I wasn't aware there's a spec side issue for this
… I reviewed the Chromium CL
Ningxin: I talked to Junwei offline; it looks like our decomposition sample code has an issue: to handle the size-1 dimension correctly we use squeeze, but that will remove size-1 dimensions unintentionally
… we need to fix our sample code because the TFLite backend implementation follows the sample code; it is not a spec text issue itself
… I can submit a PR to fix the sample code in the spec
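For illustration, the squeeze pitfall Ningxin describes can be sketched as follows; this is a hypothetical sketch over shape arrays, not the spec's actual sample code, and the example shape is made up:

```javascript
// squeeze with no axes removes EVERY size-1 dimension.
function squeezeAll(shape) {
  return shape.filter((d) => d !== 1);
}

// squeeze with explicit axes removes only the requested size-1 dimensions.
function squeezeAxes(shape, axes) {
  return shape.filter((d, i) => !(axes.includes(i) && d === 1));
}

// An lstm decomposition step wants to drop only the leading size-1
// dimension from a shape like [1, batch, hidden]. With batch size 1,
// an axis-unaware squeeze also removes the batch dimension:
const shape = [1, 1, 256];

console.log(squeezeAll(shape));       // [256]    -- rank is now wrong
console.log(squeezeAxes(shape, [0])); // [1, 256] -- intended result
```

This is why the decomposition breaks only for batch size 1: with any other batch size, the axis-unaware squeeze happens to remove just the intended dimension.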
Query supported devices
Before graph compilation
Anssi: spec PR #895 and explainer PR #884
<gb> Pull Request 884 Update explainer with new proposal for simple accelerator mapping (by zolkis) [device selection]
<gb> Pull Request 895 Add a simple accelerator selection mechanism. (by zolkis) [device selection]
Anssi: thanks Zoltan for refining the spec PR since our last discussion
… the PR was updated as follows since last review:
… - add "poll CPU fallback status" algorithm
… - in "create a context" algorithm, cpuFallbackActive initialized to undefined instead of false
… the IDL diff remains the same, two new boolean flags are added to MLContext:
interface MLContext {
    undefined destroy();
+   readonly attribute boolean accelerated;
+   readonly attribute boolean cpuFallbackActive;
    readonly attribute Promise<MLContextLostInfo> lost;
};
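As a usage sketch, a page could consult the two flags above to pick a model variant. The attribute names come from the PR's IDL; the selection policy and model names below are purely illustrative:

```javascript
// Hypothetical policy built on the two flags proposed in PR #895.
function pickModelVariant(context) {
  if (!context.accelerated) {
    // No accelerator available: prefer a small, quantized model.
    return 'small-int8';
  }
  if (context.cpuFallbackActive) {
    // Accelerated, but some ops fall back to CPU: medium model.
    return 'medium-fp16';
  }
  // Fully accelerated: the large model is affordable.
  return 'large-fp16';
}

// In a browser this would be fed by:
//   const context = await navigator.ml.createContext();
// Here the policy is exercised with plain objects:
console.log(pickModelVariant({ accelerated: true, cpuFallbackActive: false }));
```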
Anssi: Markus from Google Meet LGTM'd this PR (thanks!)
… I requested review from Ningxin and Rafael because you had provided feedback earlier, others are welcome to review too
Zoltan: last time it was mentioned we could add a truth table to clarify the combinations, I think that'd be too static, a dynamic relationship is better captured by the algorithm
… I will amend it per feedback
<RafaelCintron> OK, I will take a look.
<ningxin> Yes, I'll take a look
Zoltan: I've updated the explainer and we can merge them together with spec PR
… I will probably clean up the device selection explainer a bit
After graph compilation
Reilly: two pieces here; we added the graph.devices API to give developers visibility into what happened when they built the model, to help with debugging, e.g. why the model is slow
<gb> Pull Request 854 define graph.devices (by philloooo) [device selection]
<gb> Issue 836 Get devices used for a graph after graph compilation (by philloooo) [device selection]
Reilly: we could switch this to the same attributes proposed for MLContext, for CPU fallback, because for some implementations it is per-graph behaviour
… properties that come from interfaces could be logged, or the information could surface via developer tools; it is useful to understand how the system will behave in practice
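As a diagnostic sketch of the after-compile proposal: the exact shape of graph.devices is still under review in PR #854, so this assumes it surfaces a list of device-type strings such as ["npu", "cpu"], which is an assumption, not the spec:

```javascript
// Summarize a hypothetical graph.devices value for logging or devtools.
function describeDispatch(devices) {
  // Mixed-device dispatch including CPU suggests partial CPU fallback.
  const fellBackToCpu = devices.includes('cpu') && devices.length > 1;
  return {
    devices,
    fellBackToCpu,
    note: fellBackToCpu
      ? 'Some ops run on CPU; expect slower inference.'
      : 'No mixed-device CPU fallback detected.',
  };
}

// In a browser this might be: describeDispatch(graph.devices)
console.log(describeDispatch(['npu', 'cpu']).fellBackToCpu); // true
console.log(describeDispatch(['gpu']).fellBackToCpu);        // false
```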
https://
Rafael: with this after compile feature do we need also the before graph feature?
Reilly: I think the after-compile information is more specific, whether the acceleration comes from the NPU or GPU
… there's two reasons for this specificity: models on macOS that include CPU fallback run significantly slower than ones that use only the GPU and NPU, so we want to be able to detect that
… maybe fallback is not specifically CPU only but when we use more than one device?
… about demos, we haven't had a chance to develop a demo yet for the after-compile case; we want to understand the real-world case with multiple ML workloads running at the same time
… want to understand load-balancing experience
Zoltan: wanted to say, we can add a new property that discloses if any fallback is happening, there was a use case for CPU-specific fallback
… I think Mike's proposal is also worth looking into, MLSupportLimits
Markus: CPU fallback was motivated by Google Meet feedback
… I guess, GPU fallback could be also interesting, but there's not enough experience yet to tell whether that is an appropriate solution
Zoltan: is that a consistent behaviour across platforms?
Reilly: given how the platforms work, that is macOS specific; architecturally we only see this on macOS, because it is the only platform that has developers selecting three devices where the model could run, while others ask you to pick CPU plus one other accelerator
Reilly: one or two ops falling back to CPU might be high cost due to context switching
Markus: we have a demo app in the works that might inform this discussion
… one thing is interop, packing tensors differently depending on the details of the device
… waiting for some input on interop issues before opening a new spec issue for this
Fabio: we did discuss with Markus T whether the end user or developers should have control over where workloads execute; think of agents running 4-5 different models, you want to know where they run
… could have iGPU, dGPU, NPU, how to distribute the work so you can understand the performance you get
… also understand the privacy considerations; we don't necessarily want the developer to know the exact details, just the expected performance level
… no specific solution yet, but discussing how to best address this and how the end user could direct where to execute the workloads
Zoltan: this is a multi-faceted issue due to especially NPU device diversity, can we identify a simple solution that can be extended with more fine-grained information
Rafael: I think you have to be a power user to know WebNN is used and to care how the workload is distributed across devices
… developers could be informed enough to do this
… there may be systems where GPU is faster than NPU and the other way around
… as diagnostics information, this would be good to have
Fabio: agree, my preference is to let the end user figure this out, because they know the system; maybe one direction is to look at how to allow the user to make the selection?
… for example, in Adobe Suite online there are different tasks; in those cases you may have light-weight models that run better on the NPU and heavier models that run better on the GPU
… we discussed internally whether we can have an implicit performance level to hang the devices off
… the evolving ecosystem makes this challenging
Rafael: supreme power user could get access to this data, but for normal users the platform should be able to pick the best devices
… most users are non-technical, and do not understand device selection details
Markus: echoing Rafael, maybe we can identify the reasons for wanting to configure this or that, e.g. "I'd like to use a low-latency setup for my use case, which is real-time collaboration"
Rafael: I'm in favour of hints as a mechanism
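A sketch of hints as a mechanism, using the powerPreference hint that already exists in WebNN's MLContextOptions ("low-power" / "high-performance"); the use-case names below are illustrative, not part of any proposal:

```javascript
// Map an application-level use case to MLContextOptions hints.
// The use-case strings are hypothetical; powerPreference is real.
function contextOptionsFor(useCase) {
  switch (useCase) {
    case 'realtime-collaboration': // low-latency, sustained load
      return { powerPreference: 'high-performance' };
    case 'background-indexing':    // battery-friendly batch work
      return { powerPreference: 'low-power' };
    default:
      return {};                   // let the platform pick
  }
}

// In a browser:
//   const context = await navigator.ml.createContext(
//     contextOptionsFor('realtime-collaboration'));
console.log(contextOptionsFor('background-indexing').powerPreference);
```

The appeal of hints over explicit device selection is that the platform keeps the final decision, which sidesteps the fingerprinting and device-diversity concerns raised above.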
Cancel 6 Nov WG telcon due to F2F on 10-11 Nov
Anssi: I will cancel the Thursday 6 November WebML WG Teleconference due to its close proximity to the TPAC F2F meeting week
… as a reminder:
… please check you have the F2F in your calendar, and if not, export 10 Nov invite from:
https://
Anssi: don't be confused that it says "Tentative"; that's a "feature" of the tool and the F2F meeting is Confirmed
… also, please check out the WebML WG agenda and group-specific instructions and share your suggestions:
webmachinelearning/
<gb> Issue 35 WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) (by anssiko)
Anssi: we can use our F2F time together more productively if folks make proposals ahead of the meeting for topics of interest to them
… see you in Kobe in-person or virtually on 10-11 November 2025 (or 9-10 November if you're in the US West Coast!)
… safe travels!