W3C

– DRAFT –
WebML WG Teleconference – 11 September 2025

11 September 2025

Attendees

Present
Alex_Nahas, Anssi_Kostiainen, Brandon_Walderman, Dwayne_Robinson, Ehsan_Toreini, Fabio_Bernardon, Jason_McGhee, Joshua_Lochner, Khushal_Sagar, Leo_Lee, Mike_Wyrzykowski, Ningxin_Hu, Phillis_Tang, Rafael_Cintron, Reilly_Grant, Zoltan_Kis
Regrets
Markus_Handell
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Anssi: we'll start by welcoming our new participants
… please welcome to the WebML WG:
… Rick Viscomi from Google
… Fabio Bernardon and Sandeep Kumar from NVIDIA
… Ilya Grigorik from Shopify
… and please welcome to the WebML CG:
… Uğur Toprakdeviren as an unaffiliated individual
… for new folks joining, this is officially a Working Group call where we focus on the WebNN API, but by recent convention we've provided a quick update on incubations such as WebMCP and Built-in AI APIs at the beginning of the meeting
… for detailed discussion on incubations we have a separate call; we'll make some adjustments to that call schedule, to be discussed in a few minutes

Incubations

Repository: webmachinelearning/webmcp

Anssi: first, an update on the recent WebML Community Group developments

WebMCP

Alex: thanks, I'm super excited to see this standards work starting; while working at Amazon I saw the need for this feature; in terms of what I want to get out of this group, it's fleshing out the spec, I'm excited to contribute and elevate the spec to make the web better

Jason: I saw the friction when MCP came out and started prototyping; being able to have the compute owned by the user and expose the value directly seemed like a great idea, so I put together an early implementation
… we met with Alex a few months ago and have been hashing this space out together, security is very important to get right

Anssi: recent WebMCP feature discussions include:
… - WebMCP for Service Workers explainer #19

<gb> MERGED Pull Request 19 Add new explainer for service workers (by bwalderman)

Brandon: feedback via dedicated issues is welcome for WebMCP for SW

Anssi: Capability discovery #8

<gb> Issue 8 Should tools be a means for capability discovery? (by bokand)

Anssi: Elicitation #21

<gb> Issue 21 Elicitation (by bwalderman)

Anssi: API design #15

<gb> Issue 15 API design (by bwalderman)

Anssi: Interleaving interaction #20

<gb> Issue 20 Interleaving user and Agent interaction with the site (by khushalsagar)

Anssi: Declarative API #22

<gb> Issue 22 Declarative API Equivalent (by EisenbergEffect)

Anssi: Prompt injection #11

<gb> Issue 11 Prompt injection (by bwalderman)

Anssi: API to list registered tools #16

<gb> Issue 16 Add API to list / execute tools? (by bokand)

Anssi: thank you everyone who contributed to these discussions
… this formative stage of the WebMCP proposal is the right time to join the effort

Brandon: I encourage everyone to read the explainer, and I want to highlight prompt injection since security is something we must get right, safety is important for users
… it is an unsolved problem in the MCP ecosystem as well, so all input is welcome

Community Group meeting schedule

Anssi: I'm proposing a change to the Community Group meeting schedule

<RafaelCintron> I would be in favor of interleaving.

Anssi: proposal to reuse this Thursday 15:00 UTC / 08:00 AM Pacific meeting slot for the Community Group call to better support AMER geo during the WebMCP ramp-up phase
… since this Working Group meets every other week, we could interleave the Community Group meeting with WebMCP focus on either even or odd weeks
… I believe this would simplify your calendaring exercise
… the trade-off is the time would not be optimal for APAC participation, especially from Japan
… feedback, comments?

Anssi: I see Alex and Jason +1'd
… Rafael, Brandon, Khushal also +1

Prompt API tool calling

Anssi: Prompt API tool calling issues under consideration

https://github.com/webmachinelearning/prompt-api/labels/tools

New features and operator specific issues

Repository: webmachinelearning/webnn

[operator specific] issues

[feature request] issues

Support dynamic tensor resizing for slice and resample2d

Anssi: issue #885

<gb> Issue 885 Support dynamic tensor resizing for slice and resample2d (by Honry) [feature request] [operator specific]

Anssi: first, noting this issue is related to the flexible input sizes issue we'll discuss after this one
… Wanming reports: "Currently, tensor resizing via WebNN’s slice and resample2d is limited to static parameters: slice must use static starts and sizes; resample2d must use static sizes and axes."
… this causes fallback with performance impact in certain models
… the proposal is to enable dynamic tensor resizing with the following changes (the current WebIDL and a usage sketch follow below):
… - change slice starts and sizes argument types from unsigned long to MLOperand
… - change MLResample2dOptions.sizes and MLResample2dOptions.axes argument types from unsigned long to MLOperand

// Current WebIDL for reference: slice() takes static starts/sizes,
// and resample2d() takes static sizes/axes via MLResample2dOptions.
MLOperand slice(MLOperand input,
                sequence<[EnforceRange] unsigned long> starts,
                sequence<[EnforceRange] unsigned long> sizes,
                optional MLSliceOptions options = {});

dictionary MLResample2dOptions : MLOperatorOptions {
  MLInterpolationMode mode = "nearest-neighbor";
  sequence<float> scales;
  sequence<[EnforceRange] unsigned long> sizes;
  sequence<[EnforceRange] unsigned long> axes;
};

partial interface MLGraphBuilder {
  MLOperand resample2d(MLOperand input, optional MLResample2dOptions options = {});
};

https://www.w3.org/TR/webnn/#api-mlgraphbuilder-slice

https://www.w3.org/TR/webnn/#api-mlgraphbuilder-resample2d-method
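
For illustration, a hedged sketch of how the proposed dynamic variant might be used from JavaScript; the MLOperand-taking slice() overload is the proposal from issue #885, not current API, and all names and shapes are illustrative:

// Proposed direction (illustrative only): starts/sizes arrive as graph
// operands, so they can be computed at inference time rather than baked
// in at build time.
const image = builder.input('image', { dataType: 'float32', shape: [1, 3, 1024, 1024] });
const starts = builder.input('starts', { dataType: 'int32', shape: [4] });
const sizes = builder.input('sizes', { dataType: 'int32', shape: [4] });
const sliced = builder.slice(image, starts, sizes); // hypothetical dynamic overload

As Dwayne notes below, any such operands would likely need to stay CPU-resident, since their values determine output shapes.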

Anssi: Dwayne suggests ORT should resolve patterns of cast/gather/unsqueeze before they reach the dependent operator
… if slice and resample2d took dynamic *GPU* tensors as suggested, it would waste parallel shader resources
… Dwayne further notes: "if the input parameters were moved into an MLTensor, there would need to be a requirement that any such tensors are CPU-restrained and do not execute on a remote device (GPU/NPU)."
… also note the DML EP is able to do this, i.e. ensure such tensors stay on the CPU side
… and he asked whether this can apply to the WebNN EP too
… Wanming asked for more information on how the DML EP implementation works, e.g. does it require graph recompilation?

Dwayne: to answer the question, it must recompile the graph
… because dynamic inputs affect the shape, the values need to be known on the CPU side, and transitively so, not limited to the image input, so the whole traversal must be considered
… these are tiny tensors, so all the overhead of such element tensors must be considered; there is no HW benefit if these can be resolved by the caller
… it is not necessary to change the options knowing the DML EP can do this, and there's a performance cost
… next step, can WebNN EP construction be delayed similarly to the DML EP? I'll chat with Wanming and Ningxin

Reilly: Dwayne's elaboration answered my question, I was confused why only resize must support dynamic shapes, I was expecting a trickle-down effect on the subgraph
… we've discussed dynamic tensor shapes in general, helpful for models with KV caches in particular and LLMs in general

Dwayne: I think in general it is useful to support dynamic sizes, but if shape computation is affected it should be done before it reaches WebNN

Reilly: I think a dynamic sizing capability would require recompilation, or it would have to be an intrinsic component of the API without going through the rebuilding
… frameworks can rebuild the graph without starting compilation from scratch

Ningxin: per Wanming's last comment, the resize target tensor is used to reflect the exact original image size; for this model, if you check the Segment Anything demo, the decoder runs many times on a single image, unless a new image is uploaded
… I'm not sure this model is modelled correctly to use dynamic sizes; in the normal case, even if the model can accept different image sizes, that is represented by flexible dimensions
… for the latter case, if the model is rearchitected, the user can recompile the model when the image size changes, which does not happen often, only with each new encoder inference

Dwayne: +1

Flexible input sizes

Anssi: issue #883

<gb> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific]

Anssi: last time we introduced this proposal, well documented in the issue thanks to Dwayne's research
… given this is a significant change to the implementation, we want to ensure feedback from the main API consumers, i.e. the ML JS frameworks, is considered in this design phase
… the group identified ONNX Runtime and Transformers.js as users of the WebNN API who have had this requirement
… in ONNX Runtime, the dynamic shape tensors are passed to Execution Providers
… and currently, WebNN EP falls back to Wasm EP if dynamic shape tensors are passed to it
… per Dwayne's comments DirectML EP supports dynamic input shapes albeit with some performance penalty due to on demand creation of operators or delayed graph construction until shape information is known
… I did not see direct feedback from Guenther of ORT or Joshua of Transformers.js in the issue

JoshuaL: I'd say dynamic input shapes are important for Transformers.js users' use cases
… recently WebNN has been able to work with some vision models that can fix the input size and use static shapes; our current LLM implementation requires dynamic shapes on the decode side, getting a new token and adding it back (sketched below)
… we have thought about creating another implementation that allows static shapes, but the WebGPU implementation is working
… great to hear there's movement in WebNN to address this

<Joshua_Lochner> sounds good! And yes, a massive +1 from my side :)
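
As an aside, a minimal sketch of why decode-side LLM inference wants dynamic shapes; this is illustrative pseudocode around a hypothetical runDecoder() helper, not a WebNN API:

// The sequence grows by one token per step, so the decoder's input
// shape changes every iteration; with only static shapes the graph
// would have to be rebuilt (or padded) each step.
let tokens = [BOS_TOKEN];                   // hypothetical constant
while (tokens.length < MAX_TOKENS) {        // hypothetical constant
  const logits = await runDecoder(tokens);  // hypothetical helper; input shape [1, tokens.length]
  const next = argmax(logits);              // hypothetical helper
  if (next === EOS_TOKEN) break;            // hypothetical constant
  tokens.push(next);
}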

Reilly: I agree on the value of doing this, there are versions of Whisper models that have a memory trade-off; the question is mostly for the folks familiar with the existing implementation of this in frameworks
… recompilation, how is this implemented in e.g. ORT?
… if there's a light-weight compilation process, what's the signal that we can implement this across multiple platforms?

Phillis: on Core ML, static and dynamic shapes are supported
… dynamic shapes only get executed on the CPU

Anssi: can we get this information via a public API in Core ML?

Phillis: yes, we can test for this

Dwayne: ORT supports dynamic shapes, but it is less efficient
… it must go down to individual nodes and cannot replan memory
… the DML EP must recompile

Reilly: Transformers.js is using ORT Web, so this must be working in the WebGPU EP

Dwayne: I don't know the details about it, I can ask Guenther

Support uint8/int8 input for resample2d

Anssi: issue #872

<gb> Issue 872 Support uint8/int8 input for resample2d (by huningxin) [operator specific]

Anssi: I put this on the agenda as a last call for comments because I felt this was well fleshed out
… the current status is that 2 of 3 backends support this, so it is a candidate for optional data type support
… Rafael signalled support at our last meeting
… this is already implemented in Chromium and we have an agreement to make the corresponding spec change
… given no further comments in the issue, I suggest this can be turned into a PR at the editors' convenience; applications can feature-detect optional support as sketched below

Mike: +1
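
Since uint8/int8 would be optional data type support, a caller would feature-detect rather than assume it; a minimal sketch using MLContext.opSupportLimits() (per-op dictionary shape per the current spec draft):

// Check whether this context's resample2d accepts uint8 input.
const limits = context.opSupportLimits();
const supportsUint8 = limits.resample2d.input.dataTypes.includes('uint8');
if (!supportsUint8) {
  // fall back, e.g. cast to float32 before resample2d
}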

Privacy considerations

Anssi: issue #886

<gb> Issue 886 Revise privacy considerations (by anssiko) [privacy-tracker]

Anssi: to close on the privacy review feedback, there are two remaining tasks:
… - review privacy considerations in the light of new information and spec updates
… - migrate relevant parts of the standalone self-review questionnaire to the in-spec privacy considerations section

https://github.com/webmachinelearning/webnn/blob/main/security-privacy.md

https://www.w3.org/TR/webnn/#privacy

Anssi: I can take a look at this, but I'd appreciate it if more folks reviewed the privacy considerations, in particular those who have not contributed to this section yet, to get fresh eyes on it

MLGraph Cache

Explainer

Anssi: explainer PR #862 was merged some time ago, the PR includes comments that provide context

<gb> MERGED Pull Request 862 Add WebNN MLGraph Cache Explainer (by anssiko)

Anssi: to recap, the proposal is an explicit API for caching compiled graphs, allowing web applications to save and reuse them, thereby reducing the overhead of repeated compilation (a hypothetical flow is sketched at the end of this section)
… this feature awaits experimental implementation experience
… Wanming was planning to do an experiment on ONNX Runtime Web with WebNN model cache support
… Ningxin, do you know the latest status, is this work planned or should we defer this for later?

Ningxin: Wanming's plan depends on the Chromium prototype, which in turn has a dependency on a related ORT API
… for other backends like TFLite and Core ML, there's an opportunity for further implementation experience

Reilly: it is on our roadmap; we're currently focused on some other components of the system, that said, it is definitely on our radar
… my last update on this is that Intel folks looked at solutions for building large models and caching them
… storing weights on disk is step 1 of caching the model

Mike: I'll take a look at this issue and come back

Reilly: in the Chromium implementation we already store Core ML models because that is required, but reusing them later is not yet implemented explicitly
… there is a nice property in Core ML, ORT and TFLite: they work on high-level concepts, so if the context is initially a CPU context, you can build on the CPU, and if the developer later wants high performance it can be reused without rebuilding the model
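
To make the recap concrete, a hedged sketch of what an explicit save/reuse flow could look like; saveGraph() and loadGraph() are hypothetical names, the explainer does not finalize an API:

// Hypothetical cache flow: try to load a previously compiled graph by
// key; compile and save it only on a cache miss.
const key = 'my-model-v1';
let graph = await context.loadGraph(key);   // hypothetical method
if (!graph) {
  graph = await builder.build({ output });  // standard WebNN build; assumes an 'output' operand built earlier
  await context.saveGraph(key, graph);      // hypothetical method
}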

Query supported devices

Anssi: as you recall, we split this problem space in two to be more digestible: before and after graph compilation
… the group needs to decide whether it wants to proceed with "before" case, "after" case, with both, or do neither
… for any new feature we need both real-world use case validation and implementation experience
… for the "before" case we have the use cases, but lack implementation experience
… for the "after" case we're missing use cases, but have some implementation experience
… let's discuss the "before graph compilation" case first

Before graph compilation

Anssi: the PR was updated by Zoltan to factor in feedback from the previous call, thanks!
… I want us to discuss implementability challenges that were brought up at our last meeting
… the updated proposal introduces the following changes to the API surface:
… 1) a new "accelerated" MLPowerPreference hint, this is developer-settable to true or false
… 2) a getter to expose, after context creation, the confidence level that the requested "accelerated" context is available; it returns one of (see the sketch below):
… "probably" -> the context will mostly use GPU/NPU, but CPU fallback may happen
… "best-effort" -> the platform supports NPU/GPU but cannot guarantee it
… "no" -> the platform likely cannot provide NPU or GPU
… I believe this proposal considers all the use cases discussed so far
… we'd need implementation experience, preferably from multiple backends, e.g. OpenVINO, TFLite, CoreML?
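
A minimal sketch of how this surface could look from JavaScript; the option and getter names below are illustrative placeholders from the in-flight PR discussion, not agreed API:

// Request an accelerated context, then read back the confidence level.
const mlContext = await navigator.ml.createContext({
  accelerated: true,                 // proposed hint (illustrative)
});
switch (mlContext.accelerated) {     // proposed getter (illustrative)
  case 'probably':    /* mostly GPU/NPU; CPU fallback may happen */ break;
  case 'best-effort': /* platform has GPU/NPU, but no guarantee  */ break;
  case 'no':          /* likely CPU only                         */ break;
}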

Rafael: I'd like to understand the implementability of this feature
… it is unsettling to have more APIs with less explicit answers

Zoltan: certainly we can do true or false; in that case we have the CPU fallback information to avoid an undefined or fuzzy space
… should CPU fallback be a property or an event?

MikeW: effectively agree with Rafael's concerns

Rafael: I think the main thing we're struggling with is that at context creation time we don't have the graph at all; some implementations have generous opSupportLimits
… but later learn they cannot satisfy those limits

Phillis: same as with after graph compilation query?

Rafael: if there's no GPU, we can say not accelerated

Phillis: the reality is we cannot get any useful information before graph compilation, except whether there's an actual physical GPU or NPU on the system

Zoltan: ops and capabilities we haven't fleshed out yet, MikeW made a proposal for that, we can pick it up if this does not work out

Zoltan: Rafael, do we also need the capability query?

Rafael: do you mean, is it sufficient to specify the op limits?
… WebGPU allows asking "want HW accelerated", and the API responds "no HW acceleration for you, sorry!"
… it is worth having the feature if the developer only cares about WebNN
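
For comparison, the WebGPU behaviour Rafael refers to; requestAdapter() simply returns null when no suitable adapter is available:

// WebGPU analogy: ask for a high-performance adapter; a null result
// means the platform has no suitable adapter to offer.
const adapter = await navigator.gpu.requestAdapter({
  powerPreference: 'high-performance',
});
if (adapter === null) {
  // no HW acceleration available; fall back or bail out
}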

After graph compilation

anssik: issue #836 and PR #854

<gb> Pull Request 854 define graph.devices (by philloooo)

<gb> Issue 836 Get devices used for a graph after graph compilation (by philloooo) [device selection]

anssik: for this "after graph compilation" case we have demonstrated implementability with a Chromium prototype (sketched at the end of this section)
… however, as discussed last time, the use cases are not documented, so we wanted to work on those before progressing this as a spec feature, to ensure the proposed solution targets real-world use cases

Phillis: the use cases are what Markus mentioned before; we have a use case in the abstract, and we also considered an example app with that use case
… it is not high priority currently, but it is on our roadmap
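
For reference, a sketch of the prototyped query per PR #854 ("define graph.devices"); the exact surface is still under review:

// After build(), inspect which devices the compiled graph ended up on
// (graph.devices per PR #854; the value shape shown is illustrative).
const graph = await builder.build({ output });  // assumes an 'output' operand built earlier
console.log(graph.devices);  // e.g. ['gpu'] or ['cpu', 'npu']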

Minutes manually created (not a transcript), formatted by scribe.perl version 244 (Thu Feb 27 01:23:09 2025 UTC).
