W3C

– DRAFT –
WebML WG Teleconference – 28 August 2025

28 August 2025

Attendees

Present
Anssi_Kostiainen, Christian_Liebel, David_Bokan, Dwayne_Robinson, Ehsan_Toreini, Hannah_Van_Opstal, Markus_Handell, Ningxin_Hu, Rafael_Cintron, Reilly_Grant, Tarek_Ziade, Zoltan_Kis
Regrets
Mike_Wyrzykowski
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

anssik: we'll start by welcoming our new participants
… please welcome to the WebML WG:
… Joshua Lochner from Hugging Face! Joshua and the HF crew are upgrading from Invited Expert status to full W3C membership
… Markus Tavenrath from NVIDIA
… and please welcome new participants to the WebML CG:
… Alex Nahas and Jason McGhee as individual contributors, bringing a wealth of implementation experience from OSS projects to the WebMCP effort
… Vincent Scheib from Google
… Leonard Rosenthol from Adobe
… Wei Ding and Michael Zhou from Huawei
… Kenneth Christiansen from Intel
… Joon Park from Target Corporation
… Jd Fiscus and Jax Qian as individual contributors
… with mixed emotions, we say goodbye to Domenic Denicola from Google who announced his retirement earlier this week
… Domenic has made foundational contributions to the Built-in AI APIs and to the core of the web platform over the years

David: long-time Blink engineer, primary focus on WebMCP

Hannah: work with David on Script Tools and WebMCP, looking forward to working with you all

Incubations: Built-in AI APIs & agentic web

anssik: an update on the recent WebML Community Group developments

CG charter update

anssik: a call for review of the WebML CG Charter update has been initiated
… this charter update proposes to make the WebMCP API a new WebML CG deliverable
… the WebMCP proposal is shaping up nicely and I'm pleased to see broad support and excitement around this proposal in the ecosystem
… please review the charter update by 2025-09-18, instructions:

WebML CG Charter update, review by 2025-09-18

WebMCP next steps

Repository: webmachinelearning/webmcp

WebMCP explainer

WebMCP proposal

anssik: explainer landed, next steps:
… agree on the API design approach, issue #15

<gb> Issue 15 API design (by bwalderman)

anssik: another important issue to discuss early is how to mitigate prompt injection, see issue #11

<gb> Issue 11 Prompt injection (by bwalderman)

anssik: see also other open issues:

https://github.com/webmachinelearning/webmcp/issues

David: I filed a few additional issues for WebMCP to be able to list and execute tools, to open the API up to third parties

anssik: this is issue #16

<gb> Issue 16 Add API to list / execute tools? (by bokand)
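
For orientation, a minimal sketch of what registering a tool could look like; the window.agent.provideContext entry point and the tool fields are illustrative assumptions, since the API design is still open (issue #15):

// Hypothetical shape only; see issue #15 (API design) and issue #16 (list/execute tools).
window.agent.provideContext({
  tools: [{
    name: "search-flights",
    description: "Search for flights between two cities on a given date.",
    inputSchema: {
      type: "object",
      properties: {
        from: { type: "string" },
        to:   { type: "string" },
        date: { type: "string", format: "date" }
      },
      required: ["from", "to", "date"]
    },
    // Called when an agent (or, per issue #16, a third party) executes the tool.
    async execute({ from, to, date }) {
      const results = await searchFlights(from, to, date); // page-defined helper (assumed)
      return JSON.stringify(results);
    }
  }]
});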

Query supported devices

Repository: webmachinelearning/webnn

anssik: now switching from incubations to the WebML WG and WebNN API topics

Before graph compilation

anssik: after extensive discussion we're converging on a design proposal, issue comment:

webmachinelearning/webnn#815 (comment)

<gb> Issue 815 Query supported devices before graph compilation (by anssiko) [device selection]

anssik: there's a PR #884 for the explainer update (thanks Zoltan!)

<gb> Pull Request 884 Update explainer with new proposal for simple accelerator mapping (by zolkis)

anssik: the spec update will arrive in a separate PR
… this proposal introduces the following changes to the API surface:
… 1) a getter to expose whether CPU fallback is active for the current context
… 2) a new "accelerated" MLPowerPreference hint
… 3) a getter to expose whether an "accelerated" context is available for the current context
… MikeW gave +1 for the proposal
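
A minimal sketch of how this could look from script; the getter names below are placeholders, not the agreed naming (see PR #884):

// Hypothetical names for illustration only; naming is still being settled.
const context = await navigator.ml.createContext({
  powerPreference: "accelerated"   // (2) new hint: the app wants the workload accelerated
});
if (!context.accelerated) {        // (3) hypothetical getter: is acceleration available?
  console.warn("No acceleration available for this context.");
}
if (context.cpuFallbackActive) {   // (1) hypothetical getter: is CPU fallback active?
  console.warn("Part of the workload may fall back to CPU.");
}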

Zoltan: good summary, MikeW has provided feedback, further feedback from other browser vendors welcome
… we'll change the naming, MikeW prefers "accelerated" as the name, I'll update the PR accordingly

RafaelCintron: are we saying we want to take away MLDeviceType? Or MLPowerPreference?

Zoltan: no, we don't take anything away
… the intent of the hint is to signal the app wants the workload to be accelerated

RafaelCintron: you want this to be the input, rather than the return value?

Zoltan: it can plainly say no acceleration is available

RafaelCintron: "probably", "maybe", "no" is slightly confusing
… would prefer to see examples of real-world platforms and frameworks that return similar values to the proposal

Markus: I didn't understand whether this helps us determine if there's a CPU fallback or not

Zoltan: CPU fallback cannot be avoided on CoreML?

Markus: is the earlier query support idea still being considered?
… with both query support and a CPU-fallback-active signal, we could live with those
… think background blur: it runs well on hardware, and we'd like to avoid a situation where we compile the model and only then find out it falls back

Reilly: wanted to say that the problem with an API that allows checking model compatibility before loading it is that it's not implementable on top of any of the backends we currently use
… CoreML and TFLite don't support that, and presumably ONNX Runtime doesn't either

Markus: I'm considering a specific platform like CoreML, where there are known cases in which execution on GPU or NPU isn't possible

Reilly: that'd require reverse-engineering

Zoltan: we could have a context option that considers this use case

After graph compilation

anssik: issue #836 and PR #854

<gb> Issue 836 Get devices used for a graph after graph compilation (by philloooo) [device selection]

<gb> Pull Request 854 define graph.devices (by philloooo)

anssik: this feature proposes to extend MLGraph, a compiled immutable graph, with a new property that exposes which device(s) will be used to execute the graph
… we agreed to wait for a demo app to identify and validate the use cases
… wanted to check if this is still our plan of record?

Reilly: we haven't had time to do the demo yet, probably best to push this conversation forward with use case documentation

Zoltan: the PR needs to be updated in light of the latest discussion
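
For context, a sketch of the property PR #854 proposes; the property name and example values are not final:

// Hypothetical usage of the proposed MLGraph.devices property (PR #854).
const graph = await builder.build({ output });
console.log(graph.devices);  // e.g. ["npu", "cpu"] if part of the graph falls back to CPU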

Operator specific issues

[operator specific] issues

Flexible input sizes

anssik: issue #883

<gb> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific]

anssik: a new feature request to support models whose input sizes are only known at inference time
… for example, certain vision models, and transformers with varying input lengths and a growing KV cache, e.g. speech recognition and language models
… without this feature app developers need to modify the model to fix the input size, or add padding
… flexible input sizes are supported by native frameworks

anssik: Dwayne shared an extensive list of considerations in his issue comment (thanks!):
… API role
… API impact
… execution speed
… memory sharing
… latency of stages
… graph construction overhead
… internal complexity
… per this, Dwayne's suggestion is to add an intermediate step between build() and dispatch() where full shape computation and memory planning can occur
… expressed as an API shape, this intermediate step is:

const knownShapeGraph = await potentiallyDynamicShapeGraph.computeShapes(inputs, outputs);
// knownShapeGraph is just an MLGraph where all inputs/outputs are resolved concretely.
// If all input shapes were already known, then the original MLGraph is already such a graph that can be
// passed directly to dispatch. This knownShapeGraph can be held onto by the caller for later
// iterations, and multiple known-shape-graphs of differing input shapes can be held.
context.dispatch(knownShapeGraph, inputs, outputs);
// Dispatch requires the MLGraph to have well defined shapes on its outputs, otherwise it's an error.
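
As a usage sketch of the above, a caller could cache one resolved graph per distinct input size and reuse it across iterations; only computeShapes() and dispatch() come from the proposal, the caching logic is illustrative:

// Illustrative cache of known-shape graphs, keyed by sequence length.
const knownShapeGraphs = new Map();

async function runStep(context, dynamicGraph, inputs, outputs, seqLen) {
  if (!knownShapeGraphs.has(seqLen)) {
    // Shape computation and memory planning happen once per distinct size.
    knownShapeGraphs.set(seqLen, await dynamicGraph.computeShapes(inputs, outputs));
  }
  context.dispatch(knownShapeGraphs.get(seqLen), inputs, outputs);
}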

Dwayne: it's an intermediate step because I want dispatch to be light-weight, without shape inference or memory planning dependencies
… from the same original graph, you can create multiple instantiations of different sizes and have control over that
… with an LLM the inputs change on every execution, but you want to reuse that LLM later; naming is subject to change

anssik: is this an intrusive change to the implementation?

ningxin: this proposal makes sense; I haven't seen this step in native frameworks, but I feel it makes sense for the WebNN API abstraction
… MLOperandDescriptor should be able to be defined at creation time
… many algorithms compute shapes, so we'd need to move that from the build stage to the compute stage; not a trivial change, so it perhaps needs more preparation and discussion
… I heard this requirement from the ONNX Runtime team at TPAC 2024, and Transformers.js had this requirement too

Support uint8/int8 input for resample2d

anssik: issue #872

<gb> Issue 872 Support uint8/int8 input for resample2d (by huningxin) [operator specific]

anssik: to recap, this is a proposal from Dwayne to add uint8 to resample2d() allowed input data types
… we wanted to survey native ML API support, and Ningxin provided us the data, thanks!

ONNX's Resize supports int8/uint8 input

CoreML's iOS17 resize supports int32 input

TFLite's resize_bilinear doesn't support integer input

anssik: and Dwayne notes there may be a workaround for TFLite's limitation

anssik: with this data at hand, are we ready to add uint8/int8 input support to resample2d, any concerns?

Dwayne: this could be an optional data type?

<ningxin> sgtm

RafaelCintron: no objections

Reilly: if two backends support this, it is good enough to make it optional
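
If adopted as an optional data type, developers could feature-detect it via opSupportLimits(); a sketch, assuming the existing per-operator dataTypes lists:

// Sketch: detect optional uint8 input support for resample2d.
const limits = context.opSupportLimits();
const supportsUint8 = limits.resample2d.input.dataTypes.includes("uint8");
// Fall back to float32 input if uint8 is not supported on this backend.
const inputDataType = supportsUint8 ? "uint8" : "float32";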

Normalize the behavior when NaN is used for minValue or maxValue of clamp operator

anssik: issue #874

<gb> Issue 874 Normalize the behavior when NaN is used for minValue or maxValue of clamp operator (by wangw-1991) [operator specific]

anssik: this is already implemented in Chromium
… we agreed last time to make the corresponding spec change
… we can skip discussion today and await the PR

Dwayne: correct

Wide review closure

Privacy review: opSupportLimits() privacy considerations

anssik: issue #875 and PR #881

<gb> Pull Request 881 Add opSupportLimits() privacy considerations for fingerprinting (by anssiko)

<gb> Issue 875 Future-proofing privacy considerations of New API feature to identify feature support per operator (by sandandsnow) [privacy-tracker]

anssik: this PR is in response to Privacy group's review feedback regarding opSupportLimits() and its fingerprinting impact
… thanks Phillis and Reilly for your review and comments, PR updated
… the group's proposed response is to add the following informative note to opSupportLimits() spec:

"NOTE: The opSupportLimits() API is not intended to provide additional entropy for browser fingerprinting. In current implementations this feature support information can be inferred from the OS and browser version alone. If the diversity of future implementations warrants it, this API allows future implementations to add new privacy mitigations e.g. to bucket capabilities similar to WebGPU to reduce entropy."

anssik: I'd welcome review from Rafael and/or MikeW for WebGPU perspective

RafaelCintron: will take a look

Security review

w3c/security-request#85

<gb> Issue 85 Web Neural Network API 2025-03-20 > 2025-06-20 (by anssiko) [REVIEW REQUESTED] [pending] [CR]

anssik: this is the only review request with no response from the security reviewers
… I updated the review request to note that if we don't hear from them, we'll consider this review non-blocking for our spec progress
… once we close on these two, I'll initiate the process to publish a new Candidate Recommendation Snapshot so we keep our annual cadence

Open PRs

ningxin: I think PR #882 is ready to merge

<gb> Pull Request 882 Bugfix: Only allow 1 to N rank input for operators that take 1 axis (by huningxin)

anssik: PR #857

<gb> Pull Request 857 Support rankRange for op output tensors in opSupportLimits (by huningxin)

Dwayne: that's pretty mature right now?

ningxin: while prototyping I found the change is bigger than expected; I'm still considering whether to go Phillis' way with a simpler solution
… I'll revisit this PR

Reilly: I think the update on this PR is that w-p-t has been updated with required and optional test cases
… we should start a PR to add a minimum data type set to the spec

Minutes manually created (not a transcript), formatted by scribe.perl version 244 (Thu Feb 27 01:23:09 2025 UTC).
