W3C

– DRAFT –
WebML WG Teleconference – 24 April 2025

24 April 2025

Attendees

Present
Anssi_Kostiainen, Christian_Liebel, Dwayne_Robinson, Etienne_Noel, Eugen_Thaci, Joshua_Bell, Joshua_Lochner, Laszlo_Gombos, Mike_Wyrzykowski, Ningxin_Hu, Winston_Chen, Zoltan_Kis
Regrets
Michael_McCool, Reilly_Grant
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

anssik: please welcome Aditya Chhabra, Drew Morris, Adnaan Nazir, Josi Rosenfeld, Eugen Thaci, Jeffrey Phillips Freeman from CleverThis Inc. to the WebML WG and CG!
… CleverThis is a tech company with interest in open standards and responsible and explainable AI

Eugen: I'm assisting Jeffrey in this WG

anssik: my expectation is CleverThis participants will be interested in contributing to the Ethical Principles work we have in this WG

anssik: also, please welcome Kevin Petit from ARM Limited to the WebML WG
… and finally, please welcome Jonathan Schneerson representing Temporal Series AI, a startup working on AI-driven solutions for financial markets, to the WebML CG!

Incubations

anssik: our next CG meeting is at EU and APAC friendly time on Mon 28 April:

https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-04-28-cg-agenda.md

anssik: the Community Group is adding a new deliverable, the Proofreader API, to its scope

https://lists.w3.org/Archives/Public/public-webmachinelearning/2025Apr/0001.html

anssik: please join on Mon if you're interested in the new local inference web extension experiment proposed by Tarek/Mozilla for consideration as a new CG deliverable
… we'll also discuss Prompt API feature requests: multi-modal real-time capabilities, Model Context Protocol support
… the built-in AI APIs received early wide review, we added shared privacy and security considerations, and i18n expert feedback helped address identified i18n issues with the language detector

WG-CG collaboration

anssik: to encourage broader participation, I pushed this WG meeting forward by 1 hour
… an additional proposal is to reuse certain CG meetings for both WebNN and Built-in AI discussions; available options:
… AMER-APAC friendly meetings ~monthly Tue 17:00 PDT / Wed 08:00 CST
… EU-APAC friendly meeting ~monthly Mon 9:00 CEST / Mon 15:00 CST

anssik: the standards group (WG) and the incubation group (CG) have different IPR policies, but if the active participants are in both we can make this happen, you can join at:

https://webmachinelearning.github.io/community/#join

anssik: finally, the intent is not to create more meetings, but reuse the existing meeting infra more flexibly
… questions, feedback, thoughts?

Christian: +1 for the EU friendly slot

Ningxin: both slots are Shanghai friendly, so good with both

<jsbell> For me: neutral; whether a discussion is useful in a given meeting depends on who will be present, so knowing the agenda and predicted attendance could make it work

MikeW: Tue 17:00 PDT usually works

BlinkOn 20 takeaways

anssik: BlinkOn 20 happened earlier this month
… I'd like to discuss takeaways relevant to this group
… we have representatives from non-Chromium engine projects in this group, so I felt it is helpful to update everyone on discussions relevant to our cross-browser spec efforts
… and thanks to BlinkOn organizers for publicly sharing videos from the sessions
… I found the following two talks particularly relevant to the WebML WG and CG:

Compute Abstraction for AI: Wasm, WebGPU, and WebNN

Expandable Built in AI: Opening the Vision with Shared AI Models
… in the agenda I mapped a few BlinkOn discussion topics to our spec issues
… I likely missed some relevant talks, so please fill me in
… for WebNN:
… - Expose available hardware → device selection
… - Ahead-of-time compilation → MLGraph caching
… - NPU differences & model portability → op support limits future work?
… - Hybrid execution → best-effort buffer-sharing with tensors and constants

<jsbell> 3 more discussions on the "AI track":

<jsbell> Exploring Challenges with Cross Origin Resource & Model Sharing

<jsbell> Built in AI: One Year Later

<jsbell> WebGPU in Blink Discussion

jsbell: three other talks are relevant
… one on experimental cross-origin sharing
… another built-in AI talk
… the third was a WebGPU talk relevant for the interop aspects
… the broader audience was not familiar with the specifics of compute APIs in general, so the talks were at a higher level of abstraction
anssik: for built-in AI APIs:
… - Common model format & ops → (paused) Model Loader API, core operator set
… - Model sharing → built-in models exposed to low-level compute APIs?
… - Built-in model transparency → Model Cards integration?

MikeW: WebKit contributors meeting is in the Fall

Etienne: my key takeaway was that AI is everywhere, including inference on the web; we got a lot of feedback on Kenji's and my talk about built-in AI APIs
… the Prompt API with structured output gathered interest; there are differences between browsers, but we think developers can work around that
… shared AI models have the potential to solve the same-origin cache issue

Operator specific issues

anssik: today we'll review and discuss operator-specific issues whose resolution reduces code complexity and improves maintainability

[operator specific] issues

anssik: the `pad` operator has a few such open issues I wanted to discuss next

Clarify constraints for pad operation

anssik: issue #377

<gb> Issue 377 Need clarify constraint of 'beginningPadding' and 'endingPadding' for pad operation in "reflection" and "symmetric" mode (by BruceDai) [operator specific] [interop]

anssik: to elaborate, the issue is about the need to clarify the constraints on 'beginningPadding' and 'endingPadding' for the pad operation in "reflection" and "symmetric" modes

https://www.w3.org/TR/webnn/#api-mlgraphbuilder-pad

anssik: in the issue discussion the following platform APIs and frameworks have been reviewed:
… - DML_PADDING_OPERATOR_DESC
… - tf.mirrorPad
… - torch.nn.ReflectionPad2d behavior
… and recently Phillis shared Core ML constraints for its "reflect" and "replicate" modes that map to MLPaddingModes "reflection" and "edge" respectively
… and per Ningxin's experiment, without limiting the padding size, TFLite gives different results than DirectML
… it looks like the latest proposal is to limit the padding size
… this proposal was supported by both Dwayne and Ningxin

Dwayne: correct

Ningxin: +1

anssik: rationale is the following, I think:
… - this is a safer option
… - allows for future extension to support extended wrapping if we identify models that require that feature to perform well
… - as a bonus, this behaviour could be emulated
… do we have an agreement to proceed with limiting the padding size?

<jsbell> SGTM!

anssik: no concerns, editors are free to proceed with the proposed solution
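
For illustration, a minimal validation sketch of the proposed limits; the per-dimension limits (input size - 1 for "reflection", input size for "edge") are taken from the Core ML constraints discussed in this and the next issue, and the helper name is hypothetical, not spec text:

// Hedged sketch of the proposed per-dimension padding-size limits
// (an assumption based on the discussion above, not normative).
function validatePadding(mode, inputShape, beginningPadding, endingPadding) {
  for (let i = 0; i < inputShape.length; ++i) {
    // "reflection" can mirror at most inputSize - 1 elements;
    // "edge" can repeat the edge value at most inputSize times.
    const limit = mode === 'reflection' ? inputShape[i] - 1 : inputShape[i];
    if (beginningPadding[i] > limit || endingPadding[i] > limit) {
      throw new TypeError(`padding for dimension ${i} exceeds limit ${limit}`);
    }
  }
}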

<jsbell> For #739, I think the big question is: can we drop "symmetric" ?

<gb> Issue 739 Limited support for pad on CoreML backend (by philloooo) [operator specific] [interop]

Limited support for pad on CoreML backend

anssik: issue #739
… this issue reports findings from Core ML backend implementation, specifically constraints wrt MLPaddingMode equivalent:

https://www.w3.org/TR/webnn/#enumdef-mlpaddingmode

anssik: we touched some of this in the previous issue

anssik: Phillis reports from Core ML:
… 1. `symmetric` mode is not supported
… 2. padding for more than the last two dimensions only supports `constant` mode
… 3. If mode is `reflect` (aka `reflection`) then beginning and ending paddings can be at most input size-1
… 4. If mode is `replicate` (aka `edge`) then beginning and ending paddings can be at most input size
… we discussed 3 and 4 in context of issue #377

<gb> Issue 377 Need clarify constraint of 'beginningPadding' and 'endingPadding' for pad operation in "reflection" and "symmetric" mode (by BruceDai) [operator specific] [interop]

anssik: for 1 and 2 the question is can they be emulated?
… the issue also contains proposals for Core ML pad future work
… MikeW, have there been any updates to Core ML in this regard, and how can we share this feedback with your Core ML team?

MikeW: I can take the suggestions to the Core ML team

jsbell: based on research it looks like only one backend supports symmetric, so I suggest dropping it from the spec and adding it back if it is needed

Dwayne: I'm OK with it, I don't recall a model that used it
… OK dropping symmetric, another possibility is to have opSupportLimits support for modes
… if we can just drop symmetric and emulate the edge cases it is OK

ningxin: if there are no well-known use cases I'm fine dropping it to simplify implementation, if use cases are identified later we can reconsider

<Joshua_Lochner> https://github.com/huggingface/transformers.js-benchmarking/tree/main/data

Joshua_Lochner: I could assist with identifying which models use that; I can provide additional information from the transformers.js benchmarking database on models that use symmetric padding
… currently it only records that a model uses the pad operator; I can expand it to record which mode is used

Dwayne: data on this is welcome in #739

<gb> Issue 739 Limited support for pad on CoreML backend (by philloooo) [operator specific] [interop]
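
For reference, "symmetric" padding along a rank-1 axis can in principle be emulated with slice, reverse, and concat; a hedged sketch (not from the minutes), assuming the builder's reverse() op is available on the target backend:

// Symmetric padding mirrors the input including the edge element,
// e.g. [a, b, c] with begin=2 becomes [b, a, a, b, c].
function symmetricPad1d(builder, input, length, begin, end) {
  const parts = [];
  if (begin > 0) {
    const head = builder.slice(input, [0], [begin]);
    parts.push(builder.reverse(head)); // reversed head, edge included
  }
  parts.push(input);
  if (end > 0) {
    const tail = builder.slice(input, [length - end], [end]);
    parts.push(builder.reverse(tail)); // reversed tail, edge included
  }
  return builder.concat(parts, 0);
}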

Tensors for graph constants

anssik: issue #760 PR #830

<gb> MERGED Pull Request 830 Allow tensors for graph constants. (by bbernhar)

<gb> CLOSED Issue 760 Support building graphs from `MLTensor` containing constants (by bbernhar) [feature request]

anssik: over the course of the last two weeks all the suggestions in the PR were addressed, and the PR was reviewed and merged, thank you everyone
… thanks Bryan for the PR, Bryan and Austin for prototyping, others for reviews and help
… from the IDL perspective, the change is the following:

 interface MLContext {
+  Promise<MLTensor> createConstantTensor(
+    MLOperandDescriptor descriptor, AllowSharedBufferSource inputData);
   // ...
 };

 interface MLTensor {
   readonly attribute FrozenArray<unsigned long> shape;
   readonly attribute boolean readable;
   readonly attribute boolean writable;
+  readonly attribute boolean constant;
   undefined destroy();
 };

 interface MLGraphBuilder {
   // Create an operand for a graph constant.
   MLOperand constant(MLOperandDescriptor descriptor,
                      AllowSharedBufferSource buffer);
   // Create a scalar operand from the specified number of the specified type.
   MLOperand constant(MLOperandDataType type, MLNumber value);
+  // Create an operand from a specified constant tensor.
+  MLOperand constant(MLTensor tensor);
   // Compile the graph up to the specified output operands asynchronously.
   Promise<MLGraph> build(MLNamedOperands outputs);
 };

anssik: anything specific to report about this PR?

Dwayne: constants give the backend an opportunity to optimize the data layout, since the data will not change later

jsbell: question, what are the plans for ORT Web to take advantage of this?

ningxin: we could add that to our plan; we have a use case for a merged model with two subgraphs sharing constants as weights
… will update the group when available
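
To make the flow concrete, a minimal usage sketch of the merged IDL above; the descriptor values, the matmul graph, and the weightData buffer are illustrative assumptions, not from the PR:

// weightData: a Float32Array of 16 values, assumed defined elsewhere.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

// Upload the weights once; marking them constant lets the backend
// repack them into an optimal layout up front.
const weights = await context.createConstantTensor(
    {dataType: 'float32', shape: [4, 4]}, weightData);

// The same constant tensor could be shared by multiple graphs,
// e.g. two subgraphs of a merged model.
const input = builder.input('input', {dataType: 'float32', shape: [1, 4]});
const output = builder.matmul(input, builder.constant(weights));
const graph = await builder.build({output});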

Core operator set

anssik: #573

<gb> Issue 573 Core operator set (by philloooo) [question] [opset]

anssik: recently Arm Ltd folks joined the WG (welcome!) so I wanted to reinvigorate this effort that aims to identify current primitive gaps by mapping compositional fundamentals (e.g. PyTorch prims, TOSA, StableHLO) to WebNN operators
… Dwayne has produced a table to help with this mapping:

Machine Learning Operator Mapping - All Raw Operators

anssik: in particular I wanted to discuss if the TOSA mappings identified as part of this exercise have any open questions attached to them that ARM participants could help address
… for reference, the MLIR TOSA dialect implements the TOSA specification:

MLIR TOSA dialect

TOSA 1.0.0 draft spec

anssik: I note some of the recently added Wave 3 WebNN ops have not been mapped to TOSA yet in this table:

transpose
triangular
where
tile
sign
scatterNd
gatherNd
gatherElements
scatterElements
cumulativeSum
quantizeLinear
dequantizeLinear
logicalAnd
logicalOr
logicalXor
notEqual

anssik: it looks like some of these map quite directly to the TOSA 1.0.0 spec but it'd help if someone working closely with TOSA could take a look and provide feedback in this issue?

Dwayne: I remember talking with NVIDIA folks who said they're planning to simplify TOSA, so I have been waiting for that

ningxin: I want to note issue #817 about rounding op

<gb> Issue 817 Rounding operators (by fdwr) [feature request] [interop]

ningxin: the use case is ONNX Runtime Web decompositions; rounding op support is missing, and rounding should be part of the core op set

<jsbell> +1 to adding rounding (already gave issue thumbs up)

MikeW: with the rounding, the behaviour should be consistent across platforms and backends
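
To illustrate why consistency matters: platforms disagree on ties, e.g. JavaScript's Math.round rounds halves toward +Infinity while many ML backends round half to even; a small comparison sketch (illustrative, not from the minutes):

// Round-half-to-even ("banker's rounding"), common in ML backends.
function roundHalfToEven(x) {
  const floor = Math.floor(x);
  const diff = x - floor;
  if (diff < 0.5) return floor;
  if (diff > 0.5) return floor + 1;
  return floor % 2 === 0 ? floor : floor + 1; // tie: pick the even neighbor
}

console.log(Math.round(2.5), roundHalfToEven(2.5)); // 3 2
console.log(Math.round(0.5), roundHalfToEven(0.5)); // 1 0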

Dwayne: Ningxin tested round on Core ML, how about different compute units?

ningxin: that'd be good, I need to investigate how to do that

jsbell: I talked with Phillis about what you requested regarding compute units

jsbell: Dwayne, one of the things you found in your analysis was expanding conv beyond 2d to make it N-dimensional
… do you see models that require that?

<ningxin> whisper uses conv1d

Dwayne: we see conv1d cases

jsbell: Joshua_Lochner, perhaps you can add conv1d to your script?

Joshua_Lochner: can you send me a message with details and I'll follow up, I'm updating the script so everyone can run it locally
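
For context, conv1d is commonly decomposed onto conv2d by inserting a unit spatial dimension; a hedged sketch of how a framework might lower it (an assumption, not from the minutes), using WebNN's default 'nchw' input and 'oihw' filter layouts with stride 1 and no padding:

function conv1d(builder, input, filter, {n, c, l, outC, k}) {
  // [N, C, L] -> [N, C, 1, L]
  const input2d = builder.reshape(input, [n, c, 1, l]);
  // [outC, C, K] -> [outC, C, 1, K]
  const filter2d = builder.reshape(filter, [outC, c, 1, k]);
  const output2d = builder.conv2d(input2d, filter2d);
  // [N, outC, 1, L - K + 1] -> [N, outC, L - K + 1]
  return builder.reshape(output2d, [n, outC, l - k + 1]);
}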

Caching mechanism for MLGraph

anssik: issue #807

<gb> Issue 807 Caching mechanism for MLGraph (by anssiko) [question] [feature request]

anssik: we made a decision at our last meeting to create a new explainer that aligns with the Chromium implementation direction, Reilly is working on that
… meanwhile, we've received very early Chromium implementation feedback via Shiyi (thanks!)

shiyi9801/chromium#227

<gb> Pull Request 227 [DO NOT SUBMIT] Model cache POC (by shiyi9801)

anssik: there's also an example of how this caching feature integrates into an existing image_classification webnn-sample from Shiyi (thanks again!):

https://github.com/webmachinelearning/webnn-samples/compare/master...shiyi9801:webnn-samples:model_cache

anssik: basically the web developer first stores a key locally and replaces the build() line with caching logic wrapped in a try...catch block
… the logic first tries to loadGraph() and if that fails falls back to build() followed by saveGraph()
… here's the code snippet:

try {
  // Attempt to load a previously compiled graph from the cache.
  console.log("try to load graph...");
  this.graph_ = await this.context_.loadGraph(this.modelCacheKey_);
  console.log("load graph succeeded!");
} catch (e) {
  // Cache miss (or load failure): build the graph and cache it for next time.
  console.log("failed to load graph: ", e.message, " try to build graph...");
  this.graph_ = await this.builder_.build({'output': outputOperand});
  await this.context_.saveGraph(this.modelCacheKey_, this.graph_);
}

anssik: this example looks clear, any feedback or suggestions?
… Ningxin, any feedback from the ORT backend model cache experimentation, e.g. does this WebNN feature fit in with ORT Web?

ningxin: I have discussed this with Wanming; we would keep it simple from the start: provide the key so the WebNN EP does what the sample shows above
… we want to have a prototype for this; earlier we discussed whether we should bring the native ORT EPContext to the web, but that's more complex
… we rethought that and want to go with the simpler solution and will let the group know


Minutes manually created (not a transcript), formatted by scribe.perl version 244 (Thu Feb 27 01:23:09 2025 UTC).
