W3C

– DRAFT –
WebML WG Teleconference – 8 May 2025

08 May 2025

Attendees

Present
Anssi_Kostiainen, Dwayne_Robinson, Etienne_Noel, Joshua_Lochner, Mike_Wyrzykowski, Ningxin_Hu, Rafael_Cintron, Winston_Chen, Zoltan_Kis
Regrets
Christian_Liebel, Joshua_Bell, Reilly_Grant, Tarek_Ziade
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

anssik: please welcome new participants!
… Taylore Givens from Microsoft joining the WebML WG

Rafael: Taylore is working on Edge

anssik: Ehsan Toreini joining the WebML WG and CG affiliated with Samsung, focusing on privacy and trustworthiness of ML
… and Guy Bary joining the WebML CG as an individual contributor

Incubations

Etienne: there's more discussion in the European meetings, so maybe if we have a few topics we can discuss them in the WG meeting

anssik: our previous Community Group meeting was an EU-APAC friendly one on Mon 28 April

anssik: I'll share a few key takeaways from that meeting, if you're interested in details please check the agenda and minutes for references:

https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-04-28-cg-agenda.md

https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-04-28-cg-minutes.md

anssik: the Community Group discussed a new proposal from Tarek/Mozilla called "Local inference web extension", Tarek opened a new issue for gathering input in our community proposals repo (thanks!)

Local inference web extension (proposal)

<gb> Issue 9 Local Inference Web extension (by tarekziade)

anssik: we also discussed the following Prompt API feature requests:
… - an output language support detection option -- this feature itself garnered support and contributions are welcome on naming suggestions
… - multi-modal real-time capabilities -- we agreed this is potential future work, we also discussed the Web Speech API intersection with built-in AI APIs
… - Model Context Protocol support -- a hot topic in the industry, we agreed this is an early exploration topic in the web context for interested folks, Christian has been doing MCP web exploration and I encouraged him to share with the community any progress
… we acknowledged we received early wide review feedback for incubations, from the TAG for privacy and security and from i18n for Language Detector, we're in fact ahead of expectations when it comes to incubations review given we're already engaging with experts from the horizontal groups
… any questions?

Operator specific issues

anssik: today we'll again focus our review and discussion on operator-specific issues that, once resolved, reduce code complexity and improve maintainability

[operator specific] issues

layerNormalization

anssik: issue #748

<gb> Issue 748 Is there a reason to support out of order `axes` for layerNormalization? (by philloooo) [question] [operator specific]

anssik: Phillis reported findings from the Core ML backend: when layerNormalization axes are out of order the result is sometimes wrong

WebNN spec > the layerNormalization algorithm

anssik: Phillis asks why do we support unordered (aka inconsecutive) axes, are there any use cases?
… it is noted we could emulate inconsecutive axes with transpose
… Lisa also recently noted that ORT supports only the last consecutive axes (initially discussed in issue #832)

<gb> CLOSED Issue 832 Consider only support consecutive axes for LayerNormalization (by lisa0314)
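The transpose-based emulation of inconsecutive axes mentioned above could be sketched as follows. The `permutationForTrailingAxes` helper is hypothetical (not spec text): it builds a permutation that moves the requested axes, in ascending order, to the trailing positions, after which layerNormalization can run over consecutive trailing axes.

```javascript
// Hypothetical helper: given a tensor rank and a set of axes, build a
// permutation that moves those axes (sorted ascending) to the end.
function permutationForTrailingAxes(rank, axes) {
  const axisSet = new Set(axes);
  const kept = [];
  for (let i = 0; i < rank; i++) {
    if (!axisSet.has(i)) kept.push(i);
  }
  const sortedAxes = [...axes].sort((a, b) => a - b);
  return kept.concat(sortedAxes);
}

// With WebNN, the emulation would then look roughly like (assuming an
// MLGraphBuilder `builder`, an input operand `x` of the given rank, and
// an `inverse()` helper for the reverse permutation):
//   const perm = permutationForTrailingAxes(rank, axes);
//   const t = builder.transpose(x, { permutation: perm });
//   const trailing = [...Array(axes.length).keys()]
//     .map(i => rank - axes.length + i);
//   const normed = builder.layerNormalization(t, { axes: trailing });
//   const out = builder.transpose(normed, { permutation: inverse(perm) });
```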

anssik: a proposed fix would be to remove support for inconsecutive axes, any concerns with that?

Dwayne: the way I read it, this means inverted order: it's OK to have axes 0 to 3 and skip over 1, but not the case where axes are in decreasing order
… I was looking at the test case

Dwayne: I'll confirm whether this is non-increasing or non-adjacent, I'd be inclined to keep non-adjacent

anssik: this issue was first identified by Austin via WPT test failures while running tests on macOS, see Chromium CL comments:

https://chromium-review.googlesource.com/c/chromium/src/+/5516331/4/third_party/blink/web_tests/platform/mac/virtual/webnn-service-with-gpu/external/wpt/webnn/conformance_tests/layer_normalization.https.any_gpu-expected.txt#3

anssik: currently wpt.fyi tests do not run on macOS, latest passing results:
… - PASS using CPU device on Chrome/Linux and Edge/Windows
… - PASS using GPU device on Chrome/Linux
… - PASS using NPU device on Chrome/Linux

Ningxin: wpt.fyi does not run against macOS Chrome, but Chrome infrastructure does, running for every CL, Chrome folks are more familiar with the CI system
… to get CLs merged, tests need to pass on all platforms

Dwayne: I'll add a comment to clarify expectations

Ningxin: in the Core ML docs there's no in-order axes requirement, is this missing documentation or an implementation bug?

MikeW: I'll check if docs are missing or if this is unspecified

triangular

anssik: issue #768

<gb> Issue 768 Consider removing or redesigning the `triangular` operator (by a-sully) [operator specific]

WebNN spec > triangular

anssik: Austin initially reported in Oct 2024 that all backends (DML, TFLite, Core ML) emulate triangular with a mask
… Austin proposed to consider removing triangular
… triangular was added as part of initial ops for transformers support in 2023
… Dwayne shared a very detailed investigation in the issue (much thanks!)
… and explained the specific triangular behaviour was inspired by numpy.tri*, torch.tri*, onnx.ai.Trilu, tf.linalg.diag APIs
… and how Core ML's band_part missing the shift offset parameter causes an incompatibility

<Joshua_Lochner> Apologies for the late join! Had to take dogs to the vet.

anssik: and shared a use case: Stable Diffusion's text encoder uses triangular
… but noted only a few occurrences within a single model
… Dwayne seems to also suggest the removal of triangular is justified

Dwayne: there's new information on Trilu ops popularity

<Joshua_Lochner> I can get that information with huggingface/transformers.js-benchmarking. Give me a second

webmachinelearning/webnn#768 (comment)

<gb> Issue 768 Consider removing or redesigning the `triangular` operator (by a-sully) [operator specific]

Dwayne: there's a decomposition we could use

<Joshua_Lochner> https://huggingface.co/datasets/onnx-community/model-explorer

Joshua: I published the full dataset ^
… this gives a good idea what's used in the wild

Dwayne: knowing the number of occurrences within a model is also useful

Joshua: Trilu op was of interest, I can check that soon

Dwayne: before removing triangular would like to check with new information at hand

Ningxin: do we know any native optimized implementations for Trilu?
… that'd help optimize the kernels and improve performance of models using this op, I shared the OpenVINO implementation, but it's also decomposed

Dwayne: DML one does

Ningxin: please share the DML details as a comment

Dwayne: will do
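The mask-based emulation of triangular that backends reportedly use could be sketched like this. The `triangularMask` helper is hypothetical: it computes a 0/1 mask on the CPU (mirroring numpy.triu/numpy.tril semantics), which would then be uploaded as a WebNN constant and multiplied with the input.

```javascript
// Hypothetical helper: build a 0/1 triangular mask of shape
// [rows, cols]. upper=true keeps elements with c >= r + diagonal
// (like numpy.triu); upper=false keeps c <= r + diagonal (like
// numpy.tril).
function triangularMask(rows, cols, upper, diagonal) {
  const mask = new Float32Array(rows * cols);
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      const keep = upper ? c >= r + diagonal : c <= r + diagonal;
      mask[r * cols + c] = keep ? 1 : 0;
    }
  }
  return mask;
}

// With an MLGraphBuilder `builder` and a [rows, cols] operand `x`,
// the decomposition would be roughly (illustrative, not spec text):
//   const maskOp = builder.constant(
//     { dataType: 'float32', shape: [rows, cols] },
//     triangularMask(rows, cols, /*upper=*/ true, /*diagonal=*/ 0));
//   const result = builder.mul(x, maskOp);
```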

sign

anssik: issue #845

<gb> Issue 845 The allowed data types of input operand for `sign` operator should also include `int64` type (by BruceDai) [operator specific]

anssik: while updating the WPT tests for the sign operator, Bruce noticed sign does not support the int64 data type
… int64 data type was in Dwayne's original transformer ops proposal (from 2024) for sign:

webmachinelearning/webnn#375 (comment)

<gb> CLOSED Issue 375 Support for transformers (by dontcallmedom) [opset]

anssik: the reason for removing int64 was the lack of Core ML support
… however, since then the WebNN API added opSupportLimits API that allows checking for supported data types
… this seems analogous to how WebGPU allows developers to test for various GPU texture format support:

WebGPU spec > GPUTextureFormat

<dwayner> https://chromium-review.googlesource.com/c/chromium/src/+/6489214

anssik: any concerns with adding int64 data type support considering opSupportLimits API gives an option for developers to detect support?

Dwayne: I see test cases for int64 are already in wpt

MikeW: is this support an optional feature?

Dwayne: correct

MikeW: that'd be fine with me

anssik: it seems we have consensus, editors are free to submit a spec PR
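Feature-detecting int64 support for sign via opSupportLimits could look roughly like this sketch. The per-operator shape (`sign.input.dataTypes`) follows the WebNN spec's MLOpSupportLimits; the fallback strategy in the usage comment is a hypothetical application choice, not spec behavior.

```javascript
// Sketch: check whether sign() accepts int64 inputs, given the object
// returned by MLContext.opSupportLimits().
function signSupportsInt64(limits) {
  return limits.sign?.input?.dataTypes?.includes('int64') ?? false;
}

// Usage in a page (assuming an MLContext `context`):
//   const limits = context.opSupportLimits();
//   if (signSupportsInt64(limits)) {
//     // build the graph with an int64 input
//   } else {
//     // e.g. cast to int32 first, or compute sign another way
//   }
```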

WebNN wide review

anssik: I wanted to have an interim discussion on the wide review feedback we've received so we stay on top of the latest, and can agree on our response and react swiftly
… the wide review tracker has been updated:

webmachinelearning/webnn#239 (comment)

<gb> Issue 239 Wide review tracker (by anssiko) [process]

Accessibility

anssik: for Accessibility review, we have draft feedback

w3c/apa#350

<gb> Issue 350 WebNN Review (APA thread) (by matatk)

anssik: in summary, the feedback suggests a11y considerations for use cases
… as a response to this feedback, I'd propose we add a note similar to what we have for privacy at the top of the use cases section

https://www.w3.org/TR/webnn/#usecases

anssik: this feedback does not directly propose changes to the WebNN API surface, but suggests e.g. image and video elements that are part of the web-based AI experience are accessible

anssik: the proposed a11y-informed text would be non-normative and worded e.g. “authors are encouraged to …” to avoid any normative language (MUST, SHOULD)

anssik: when we get the official a11y review feedback, I will prepare a group's response for your review
… any comments?

Architecture/TAG

anssik: for Architecture review by the TAG, our request is triaged with a focus on API design
… no feedback received yet

Internationalisation

anssik: for Internationalisation review, we received a suggestion recorded in a new issue:

webmachinelearning/webnn#837

<gb> Issue 837 String metadata and localization for operator labels (by xfq) [i18n-needs-resolution]

anssik: the review feedback is about string metadata and localization for operator labels

anssik: Josh crafted a PR with a proposed response (thanks!):

webmachinelearning/webnn#841

<gb> Pull Request 841 Add notes regarding label usage, provided by i18n review (by inexorabletash)

anssik: the proposal is to add informative notes regarding label usage per the i18n reviewers' suggestions
… I'm expecting us to merge this PR after the remaining minor comments are addressed
… any questions or comments?

Privacy

anssik: for privacy review, we received one question: "does [opSupportLimits] only provide information about what the browser provides? When you say "roughly corresponds" what do you mean?"

w3cping/privacy-request#156 (comment)

<gb> Issue 156 Web Neural Network API 2025-03-20 > 2025-06-20 (by anssiko) [CR] [pending] [REVIEW REQUESTED]

anssik: I responded explaining the WebNN API does not directly provide any information about the underlying ML APIs or libraries
… I also noted the feature support per operator can also be inferred from the OS and browser version
… the WebNN API is on the Privacy WG's 2025-05-15 (next Friday) agenda so I expect to get full review feedback soon

Security

anssik: for Security review, no feedback yet

anssik: that's the latest on wide review, questions, comments?

Explainer updates

WebNN explainer

anssik: the WebNN API explainer gives a high-level overview of the API

WebNN explainer

anssik: we received some detailed feedback from an individual outside the group (thanks!) and suggestions for explainer updates
… this feedback is particularly helpful because it comes from a person who is not very familiar with this API, so we can expect other early adopters to have similar questions
… feedback in issue #840

<gb> Issue 840 What are the options when some Native NN API isn't available? (by iampi31415) [question]

anssik: I think it is helpful to address these in an explainer update

anssik: possible WebNN explainer updates:
… - revise the diagram: remove the deprecated Android NN API, replace ONNX.js with ORT Web
… - ML accelerators -> ML accelerators aka NPUs, TPUs, XPUs
… - expand Considered alternatives explaining the complementary nature of WebNN and WebGPU
… e.g. along the lines of Reilly's response in the issue:
… WebGPU shaders can be highly-optimized for a specific model
… OTOH WebNN ops are highly-optimized for the latest hardware generation
… also worth noting these two APIs can now work better together thanks to MLTensor and related improvements

MLTensor explainer

MLTensor explainer

anssik: Bryan submitted a PR #844 to reflect the current implementation (WIP)

<gb> Pull Request 844 MLTensor explainer: replace "import buffer" with "export tensor" (by bbernhar)

anssik: for explainer, the change is basically replacing "import buffer" with "export tensor"
… with two reviews this PR is good to merge
… PR #844

Dwayne: I'll look at this today and merge
… there will be a follow-up PR for corresponding spec changes

Query supported devices

anssik: we've discussed this feature carefully and I see there are diverse use cases for querying supported devices
… we split this issue in two, "before graph compilation" and "after graph compilation"
… before going to those issues, I'd like to discuss what I consider the generic points we seem to be reaching agreement on
… I think we agree we want to document the motivating use cases in the explainer, so we're use case driven

Device Selection explainer

anssik: we've discussed preferences such as "prefer CPU", "prefer NPU", "prefer GPU", "maximum performance", "maximum efficiency", "minimum overall power" that could be mapped to and explained through use cases
… and only then mapped to implementations to assess implementation feasibility and future-proofness
… considering these preferences would be merely hints, they should be implementable on any backend

Zoltan: we should discuss on the issue and distill consensus on the explainer, I'm cautious on "prefer NPU"
… we should go by use cases, prefer issue discussion

anssik: we discussed that hard-coding some device names such as "npu" may not make sense and it was clear we want to use more future-proof names
… "cpu" -> "where JS and Wasm execute", "gpu" -> "where WebGL and WebGPU programs execute", "other" -> "!cpu && !gpu"

MikeW: I was going to say I agree with what Zoltan said, prefer use case driven spec feature development and issue discussion

RafaelCintron: I just want to say, it is important to frame these as hints and preferences, let implementations make the ultimate decision, you may not always get what you want, but what is best for you in the end

Query supported devices after graph compilation

anssik: issue #836

<gb> Issue 836 Get devices used for a graph after graph compilation (by philloooo) [device selection]

anssik: this newly opened issue considers "after graph compilation" situation
… Phillis explains:
… "Eventually which devices are used can only be known after a model is provided. It's a different problem from the context level device query mechanism [before graph compilation]."
… from implementation point of view, Core ML's MLComputePlan gives op level device selections after a model is compiled before calling dispatch

Core ML > MLComputePlan

anssik: also TFLite shares information on which delegate gets used for each op after the graph is compiled
… the proposal for WebNN suggests the following information could be attached to MLGraph that represents a compiled graph:
… - devices (plural) that will be used to execute the graph
… the examples in the issue illustrate this best (consider the device names placeholders):

const graph = await builder.build({'C': C});
console.log(graph.devices) // ['cpu', 'npu']
console.log(graph.deviceForOperations)
{
  "add_1": "cpu",
  "conv2d_2": "npu"
}

anssik: open questions:
… - how to generate identifiers for each op: automatically generate or use label and return only ops with labels attached to them?
… - is graph.devices enough, or does op level device selection have strong use cases?
… in general, I'd like to hear from the group if we should make progress with both "before" and "after" issues together, or should we work in phases one before another and in which order?

Zoltan: I'm interested in how we'd use the information we get for op-level

anssik: looking for use cases for this op-level granularity as well

Minutes manually created (not a transcript), formatted by scribe.perl version 244 (Thu Feb 27 01:23:09 2025 UTC).

Diagnostics

Succeeded: s/... the wide review tracker has been updated://

Succeeded: s/Subtopic: MLTensor/Subtopic: MLTensor explainer

Succeeded: s/and requests/and preferences

Succeeded: s/let implementations to/let implementations

Succeeded: s/made progress/make progress

Maybe present: anssik, Dwayne, Etienne, Joshua, MikeW, Ningxin, Rafael, RafaelCintron, Zoltan

All speakers: anssik, Dwayne, Etienne, Joshua, MikeW, Ningxin, Rafael, RafaelCintron, Zoltan

Active on IRC: anssik, dwayner, Joshua_Lochner, Mike_Wyrzykowski, RafaelCintron