Meeting minutes
Repository: webmachinelearning/webnn
anssik: please welcome our new WebML WG participants:
… Jonathan Schneerson from Temporal Series AI, an AI startup specializing in time-dependent data from financial transactions, sensor streams, etc.
… Peter Tanski and Suraj Bisht from Capital One Financial, a financial services company and an early adopter of forward-looking web capabilities, e.g. the Web NFC API for authentication
anssik: while our group is growing, I'm sharing with mixed emotions that one esteemed participant is taking a break from work
Josh: Hi! I'm departing Google and moving to another country, with no future plans yet; I've truly enjoyed working with this group and will remain reachable through personal email, IRC, etc.
<jsbell> me: inexorabletash AT gmail DOT com
<ningxin> Josh, thanks so much for your tremendous contribution to this WG and WebNN spec!
anssik: thank you Josh for everything!
Rafael: the spec has tremendously benefited from your work, I've learned a lot from you!
ningxin: I will echo Rafael and Anssi, thank you so much for your tremendous work for this group!
… you've been a coach for me as an editor, really appreciate that and thank you and wish you a great next chapter!
<zkis> Thanks Josh! It started as a great run / job together on merging a lot of algorithms, which you have single-handedly improved in many iterations. I learned a lot in that process. Thank you!
Incubations
<jsbell> Thanks Zoltan and all!
anssik: on our upcoming Community Group EU-APAC Mon 26 May agenda we have:
… Prompt API implementation experience from AiBrow
… New proposals: Web AI for Time Series, (recap) Local Inference Web extension
… Proofreader API kick off
… Prompt API security and privacy
… See the CG agenda for more references
https://
Google I/O and MS Build 2025 takeaways
anssik: both events were (unsurprisingly) AI heavy; a few observations I think are of interest to this group:
Built-in AI APIs
anssik: both Edge and Chrome made announcements around Built-in AI APIs being worked on in the WebML CG
"Enabled by default: Prompt API for Chrome Extensions, Summarizer API, Translator API, Language Detector API; Origin trials: Writer API, Rewriter API; Early preview: Proofreader API"
Google I/O built-in AI APIs announcement
Josh: another thing demonstrated was multimodal Prompt API use, processing images and audio as inputs; that's in the early preview stage
"The Prompt API and Writing Assistance APIs — now available as developer previews in Edge Canary and Dev channels"
Microsoft Build built-in AI APIs announcement
anssik: notably, the Prompt API in the Edge developer preview is available to web apps and pages, not just to extensions
Rafael: that's well covered Anssi, nothing to add
Windows ML
anssik: at Build Microsoft announced Windows ML as an evolution of DirectML
anssik: this is relevant to the group from the WebNN implementation perspective, an opportunity to gather further implementation experience
… based on what was announced at Build:
… Windows ML promises to simplify dependency management on Windows
… vendor-specific execution providers are part of Windows ML and updated by the OS
RafaelCintron: that is correct, it will be ONNX Runtime based
Josh: the question to the group is, should we update the WebNN architecture explainer?
anssik: - new device selection mechanism that supports hint-based, explicit, and automatic selection
<DwayneR> "WinML" is a WinRT-based wrapper atop ONNX Runtime (several years old). It's a fairly thin wrapper, plus some additional support for video frames and image conversion to input tensors. It only supported CPU and DirectML EP's.
<DwayneR> "WindowsML" is a Windows-specific fork of ONNX Runtime, directly calling the ORT API (with some slight renamings in the header). It supports multiple EP's.
Operator specific issues
anssik: as usual, we focus our review and discussion on operator specific issues
int64 data type
anssik: I wanted us to take a look at various int64 data type related issues and PRs to check we're all aligned
… issue #283 fixed by PR #646
<gb> MERGED Pull Request 646 Specify the operand data type constraints of operations (by inexorabletash)
<gb> CLOSED Issue 283 Specify the operand data type constraints of operation (by huningxin) [question]
anssik: introduced constraints for input operands (thanks Josh!)
… issue #694 has a draft PR #695
<gb> Issue 694 Consider adding int64/uint64 data type support for some reduce operators (by lisa0314) [operator specific]
<gb> Pull Request 695 Bugfix: Add missing 64-bit integers support for some reduction operators (by huningxin) [operator specific]
anssik: to add int64/uint64 support for reduce ops
… the PR awaits Mike's response to a question: "should we also allow optional 64-bits integers support for these reduction ops?"
Ningxin: before this PR was changed to a draft due to opSupportLimits, we heard in the last meeting Microsoft's feedback on optional int64 support for the sign op
… I'd propose to reopen this PR for review
<reillyg> q+
Reilly: at TPAC we discussed a minimum data type set implementable across all Chromium backends; this might be a sign we should proceed with that
… this should also allow us to clean up many WPT failures; we could hard-fail ops that do not support the minimum set
… and keep some ops optional
[ thumbs up from Dwayne ]
Reilly: some ops have no overlapping data types, that is an issue
… I'm not blocking re-opening PR #695
<gb> Pull Request 695 Bugfix: Add missing 64-bit integers support for some reduction operators (by huningxin) [operator specific]
ningxin: so for Reilly's proposal, should we have a separate issue?
… should we also consider the device? CPU and GPU devices may have different data type support for the same op
Reilly: I guess we have to do the analysis first; I expect if we include data types we'll find more data types that are not supported across all device types
… the proposal is we should consider device type selection optional
… and the intersection should not consider data types; those should be orthogonal
… there's a separate question of how we communicate to developers that a particular device type needs specific data types
… if we try to consider all these things at once, we can't come up with a useful op set
… the question is what ops implementations must support
… this might force all implementations to always support CPU and GPU and rely on feature detection to find out NPU support
… "I prefer to run on NPU, give me the available data types for that"
… we should focus on compatibility first, i.e. models that will surely execute
ningxin: compatibility means native framework compatibility, that is separate from the device?
Reilly: correct
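A minimal sketch of the feature detection flow described above, assuming the MLContext.opSupportLimits() surface in the current spec draft; the createContext() options and exact member names may differ (device selection is under redesign):

    // Sketch: "I prefer to run on NPU, give me the available data types for that".
    // Assumes per-op entries with input/output dataTypes, per the spec draft.
    const context = await navigator.ml.createContext({ deviceType: 'npu' });
    const limits = context.opSupportLimits();
    const reduceSumTypes = limits.reduceSum?.input?.dataTypes ?? [];
    if (!reduceSumTypes.includes('int64')) {
      // Fall back: emulate with int32, or create a context for another device.
    }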
zkis: I think with the Windows ML announcement we can revisit the device selection design, hints-based vs. explicit; it seems the current hints-based approach is a subset of what Windows ML supports
anssik: issue #845 was fixed by PR #848
<gb> MERGED Pull Request 848 Bugfix: Support `int64` for `abs`, `neg`, `sign`, `prelu` and `relu` (by huningxin)
<gb> CLOSED Issue 845 The allowed data types of input operand for `sign` operator should also include `int64` type (by BruceDai) [operator specific]
<ningxin> I propose to open a separate issue for Reilly's proposal of a minimum data type set if there is not an existing one
<ningxin> sg
Reilly: Ningxin feel free to open an issue for this proposal
triangular
anssik: issue #768
<gb> Issue 768 Consider removing or redesigning the `triangular` operator (by a-sully) [operator specific]
anssik: JoshuaL shared new per-model Trilu op count data (thanks!) so I wanted us to discuss this as a group
… Dwayne shared his observations in the issue: "Most of these models contain just one instance of trilu, but that is a substantial percentage of models"
anssik: does this new data support the proposal to remove triangular op from the spec?
Dwayne: the additional data encourages keeping the triangular op
… but we need to consider how many backends have support for this op
… I'm inclined to keep this op for now, unless we get a better understanding of the Core ML decomposition
… decomposition is possible for large triangular matrices without taking a lot of memory
Reilly: when fuzzing the implementation, the fuzzer was able to find huge matrices with masks that don't exist in practical models
… more implementation work is required to do this; we could compute the mask at inference time rather than bake it into the generated model
Dwayne: computable at runtime; Core ML should be able to decompose this on the fly, I have details in the issue
… did you encounter inputs of big size?
Reilly: fuzzers do generate huge inputs
Dwayne: from a security perspective?
Reilly: correct, to identify corner cases
Dwayne: any reservations about supporting this op?
Reilly: no as long as it is secure and efficient to implement
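A hedged sketch of the runtime-mask decomposition discussed above, for the upper-triangular case with float32 input; only the O(rows + cols) index vectors are baked into the graph, and the full mask is computed at inference time, matching the "don't bake it into the generated model" point:

    // Sketch: emulate upper triangular(x, diagonal) by comparing broadcast
    // row/column indices; the helper name and exact signatures are assumptions.
    function upperTriangular(builder, x, rows, cols, diagonal) {
      const iota = (n) => new Float32Array(Array.from({ length: n }, (_, i) => i));
      const rowIdx = builder.constant({ dataType: 'float32', shape: [rows, 1] }, iota(rows));
      const colIdx = builder.constant({ dataType: 'float32', shape: [1, cols] }, iota(cols));
      const diag = builder.constant({ dataType: 'float32', shape: [] }, new Float32Array([diagonal]));
      const zero = builder.constant({ dataType: 'float32', shape: [] }, new Float32Array([0]));
      // Keep element (i, j) when j - i >= diagonal; broadcasting produces the
      // [rows, cols] mask at inference time rather than storing it as a constant.
      const mask = builder.greaterOrEqual(builder.sub(colIdx, rowIdx), diag);
      return builder.where(mask, x, zero);
    }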
opSupportLimits level of detail for output tensor(s)
anssik: issue #835
<gb> Issue 835 opSupportLimits: Level of detail for output tensor(s)? (by inexorabletash) [question]
anssik: Josh explains: "there are a variety of opinions about how much detail opSupportLimits() should include for the output tensors."
Josh: one approach is to just do the bare minimum; the burden is understandability. For all the ops it is possible to determine the output shape and data type from the algorithm; the question is, do we rely on that?
… the most recent comment on whether we should include this data is from Ningxin, suggesting we add it; I think this is waiting for someone to write a PR
anssik: the proposal from Ningxin to have ranks in the output got substantial support
Dwayne: maybe I'm biased by other specs, but I would prefer to have output data types and ranks for symmetry
anssik: Phillis reports the actual constraints from underlying ML frameworks are:
… - global tensor rank constraints
… - op level input rank constraints
… concludes we have two ways to expose this:
… - expose rank for per op output
… - represent global tensor constraint via opSupportLimits
<ningxin> I can write a PR if Josh hasn't started
Josh: let's spec this in the opSupportLimits
<jsbell> I have no open PRs, not starting anything new right now
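To illustrate the proposal (purely hypothetical naming, nothing below is specced), a per-op output entry could carry rank limits alongside data types, mirroring the input entries:

    // Hypothetical illustration of adding output ranks to opSupportLimits();
    // 'rankRange' is an invented name used here for the sketch only.
    const limits = context.opSupportLimits();
    // limits.concat.output might then look like:
    // { dataTypes: ['float32', 'float16', 'int32'], rankRange: { min: 1, max: 8 } }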
Rounding
anssik: issue #817
<gb> Issue 817 Rounding operators (by fdwr) [feature request] [interop]
anssik: this issue has extensive background research by Dwayne, thanks again!
… the TL;DR: add one function that is consistent with the IEEE rounding mode, and express the decomposition for the quantizeLinear operator
… the remaining open question seems to be the rounding behavior on the Core ML NPU/ANE
… per Ningxin's experiment the rounding behavior is inconsistent between ANE/NPU and CPU
… ANE/NPU uses rounding away from zero
… Dwayne asked whether it is round *half* away from zero (RHAZ)
anssik: do we know which it is?
… is the proposal to add the round operator with a note to implementers they should emulate this due to RAZ/RHAZ inconsistency between CPU and NPU?
Dwayne: Ningxin probably meant RHAZ too; this is low-level, and Core ML is the only one with a potential issue
… it's fundamentally possible to emulate this with only 2 ops
anssik: I'm hearing no concerns about adding this op, with a performant emulation path
… any comments?
Dwayne: thanks for your research!
Dwayne: I'll do the PR
<ningxin> Thanks Dwayne
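For context, a hedged sketch of one possible emulation of round-half-away-from-zero from existing WebNN builder ops; this is not the 2-op decomposition referenced in the issue, just an illustration of the emulation path:

    // Sketch: RHAZ(x) = sign(x) * floor(abs(x) + 0.5); builder method names
    // follow the current spec, the helper itself is illustrative.
    function roundHalfAwayFromZero(builder, x) {
      const half = builder.constant({ dataType: 'float32', shape: [] }, new Float32Array([0.5]));
      return builder.mul(builder.sign(x), builder.floor(builder.add(builder.abs(x), half)));
    }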
isNaN op proposal
anssik: issue #811
<gb> Issue 811 Behavior when there are NaNs in argmin/max inputs (by philloooo) [interop]
Dwayne: several backends have this op, I guess this would benefit from a PR
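If added, one possible decomposition for backends lacking a native isNaN relies on NaN being the only value not equal to itself (a sketch, assuming the spec's equal and logicalNot ops):

    // Sketch: the result is 1 exactly where x is NaN, since NaN != NaN.
    const nanMask = builder.logicalNot(builder.equal(x, x));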
Caching mechanism for MLGraph
anssik: issue #807
<gb> Issue 807 Caching mechanism for MLGraph (by anssiko) [question] [feature request]
anssik: with Reilly here, I wanted to revisit prototype implementation findings to reinvigorate work on the explainer
… we have an initial implementation based on Chromium and ORT, and sample code showing how this integrates into an existing sample
… there's also a Chromium Design Doc, but I'm not sure if that has been shared with the group yet?
… I recall Reilly commented he'd take a stab at the explainer based on the implementation
Reilly: I recall offering to write the explainer, I recall Mike already put something out there
… I can take an action to write this explainer
… as for implementation experience, there's the Chromium Design Doc; Ningxin, do you have updates on the prototype?
ningxin: I'll ensure the Chromium + ORT based Design Doc is public; for prototype status, we saved the compiled model using ORT's compiled model support, but we haven't made it work with the GPU process yet
… based on the API sketch proposed by Reilly we experimented with ORT Web integration and would like to get more experience with how AI frameworks can utilize this feature
… I can share an early prototype of that with the group
… even without real model cache storage managed by the browser process, just saving to disk, we got a good performance gain
… we are also discussing with ORT people how to reduce the memory and disk overhead
… Reilly's proposal separates the build from the save operation; the source model must be kept after build to allow saving the graph later, or saved to a temporary place on disk
… this seems not ideal; a new idea from Rafael would help overcome that issue
Rafael: to recap, once you create a session from model building, a key piece of information is no longer present
… later you may want to save the model; Ningxin proposes to keep the information to allow saving later
… or have "build and save" at the same time
… to use memory efficiently
Reilly: the design I made was based on how Core ML and TFLite backends work, the model has to remain on disk
… I guess the question to Rafael is re ORT implementation, is there a change to the design that makes this more efficient?
Rafael: yes, it would help to force developers to decide at build time whether to save at the same time
… if we get more feedback from developers, we can determine whether it is a MUST requirement
Reilly: the cost of saving and deleting is minimal; if the system forces doing both at the same time, deleting the file later would be reasonable
Rafael: cost of keeping the data around that may be needed later is the question
Reilly: I guess the answer is no per Ningxin's work
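To make the trade-off concrete, a purely hypothetical sketch of the two API shapes discussed; none of these names are specced:

    // Hypothetical shape A (separate build and save, per Reilly's sketch):
    // the implementation must retain enough source info to serialize later.
    const graph = await builder.build(outputs);
    await context.saveGraph('my-model-v1', graph);  // invented name

    // Hypothetical shape B (Rafael's alternative): decide at build time, so
    // the implementation can serialize once and release intermediates early.
    const graph2 = await builder.buildAndSave(outputs, 'my-model-v1');  // invented name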
Query supported devices before graph compilation
anssik: issue #815
<gb> Issue 815 Query supported devices before graph compilation (by anssiko) [device selection]
zkis: I will update the explainer, will submit a PR for the group to review