W3C

– DRAFT –
WebML WG Teleconference – 30 May 2024

30 May 2024

Attendees

Present
Anssi_Kostiainen, Austin_Sullivan, Bryan_Bernhart, Dwayne_Robinson, Joshua_Bell, Mike_Wyrzykowski, Rafael_Cintron, Rob_Kochman, Sungpil_Shin, Zoltan_Kis
Regrets
Ningxin_Hu
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

Rob: a PM at Google working with Austin and Josh, joining this WG; interested in the high-level story of how web developers can bring great experiences to the web

Announcements

WebNN Developer Preview announced at Build 2024

anssik: Microsoft's CEO Satya Nadella announced WebNN Developer Preview at Build 2024
… congratulations to the WG for this major achievement and recognition!
… if you watch the keynote recording, you'll note Satya spent hands-on time with WebNN-based web experiences and liked them very much

WebNN Developer Preview

WebNN announced in Satya Nadella's Keynote at Microsoft Build 2024

anssik: I know it's been a busy time for many of the WG's participants leading up to this event
… thank you for your continued contributions to the WG during this busy time

Awesome WebNN

anssik: continuing on this theme, I'm pleased to announce a new community resource called "Awesome WebNN"

Awesome WebNN
… this community-maintained GH repo provides delightful WebNN resources, a curated list of awesome things around the WebNN ecosystem
… a big thank you to our CG participant Belem for starting this effort
… I have asked Belem to lead the Awesome WebNN effort
… contributions are welcome from the entire web community i.e. not restricted to W3C members

TPAC 2024 F2F

Repository: webmachinelearning/meetings

anssik: thank you for responding to the indicative TPAC meeting poll in #23

<gb> Issue 23 WebML WG Hybrid Meeting at TPAC 2024 (by reillyeon)

anssik: I observed adequate support for having a WebML WG meeting at TPAC in Anaheim, CA
… I have conveyed this wish to TPAC planners and they are now working on the schedule
… around 10 June we expect a stable schedule to be announced
… we understand we cannot avoid conflicts with every other meeting at TPAC due to parallel tracks
… TPAC planners have assigned WebML WG Monday in the draft schedule
… given we have requested a full-day meeting, we have flexibility to adjust topic placement during the day, considering the specific needs of participants

NPU support

Repository: webmachinelearning/webnn

anssik: issue #623

<gb> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request]

anssik: as said, it's been a busy time for many of us; my expectation is that we're now able to advance this feature in spec-land

16 May 2024 NPU support discussion

anssik: checking the latest status, we still have solid support for option 1 as outlined by Dwayne in GH:

Option 1

<gb> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request]

anssik: Dwayne you have the group's full support to move forward with a spec PR for this feature at your earliest convenience
… I think that's all for this topic, any comments or questions?

Dwayne: I have a PR for the enum, will push it out today

MLBuffer

anssik: another important topic I pinned to the agenda is MLBuffer
… issue #688 and #542 have received updates recently

<gb> Issue 542 [MLBuffer] Creation and representing MLBuffer on a XPU devices (by bbernhar) [webgpu interop]

<gb> Issue 688 [MLBuffer] Support interop with WebGPU (by bbernhar) [webgpu interop]

anssik: thanks Bryan and Austin for advancing these proposals
… also thanks Rafael, Ningxin, Reilly and Mike for your comments, questions and suggestions
… first, I'd like to introduce and discuss the direct buffer sharing proposal #688
… and secondly, discuss the latest on the device-based storage object #542
… and third, discuss any other MLBuffer business there may be

Direct buffer sharing proposal

anssik: issue #688

anssik: from my high-level perspective, the following is how I see the motivation for this proposal:
… - WebNN lacks an API to import video frames
… - WebNN lacks an API to run custom ML ops
… - Performance benefits when JS copies are avoided
… - WebGPU lacks NPU support
… Bryan, please correct me and feel free to introduce the proposed solution

Bryan: thanks Anssi, sounds correct
… we're getting to the requirements-gathering phase; this is our latest understanding of how we can implement this and give developers the easiest way to interact with this feature
… we're trying to leverage what we have for WebGPU; as next steps we want to hear whether this is reasonable and whether something is missing
… we plan to file a separate issue with the WebGPU WG/CG when we have consensus in this group, and then prototype this
… hopefully this gives a clean view

<Zakim> anssik, you wanted to ask a question

anssik: when do you expect to open discussion in the WebGPU group?

Bryan: in a couple of weeks if no comments in this group

Rafael: I have not yet read the second proposal, but in general support the feature to allow interop with WebGPU, taking video frames, images, tensorize them
… if WebNN does not have your favourite op, you can implement it with WebGPU, with minimum copies to keep interop performant

Device-based storage object

anssik: issue #542

<gb> Issue 542 [MLBuffer] Creation and representing MLBuffer on a XPU devices (by bbernhar) [webgpu interop]

anssik: this is about device-based storage object that may be used by WebNN ops
… I observed movement in this issue so wanted to bring this to the agenda
… Bryan and/or Austin, what do you want to share with the group?

Bryan: this is primarily Austin's investigation
… the biggest change is that MLBuffer was meant to be a linear bucket developers could use piecemeal, the model DML and graphics APIs use, e.g. multiple inputs backed by one MLBuffer
… CoreML cannot instantiate buffers in a similar manner, so we cannot carve out MLBuffers the same way
… MLBuffer will thus behave more like a tensor
… it will take in a descriptor and stay compatible with Metal
… runtimes managing buffer resources is its own bag of worms, but that's not new; it's what WebGPU does, familiar territory

Austin: in summary, MLBuffer will behave more like a tensor, typed with a data type and shape, as opposed to an opaque bag of bits
… you'll need to know the shape and data type; we also talked about usage flags, since if you know you want to rent out a buffer for GPU interop you want to allocate it differently

https://github.com/webmachinelearning/webnn/labels/webgpu%20interop

anssik: I'd appreciate it if Bryan could help look at the open "webgpu interop" issues and see whether some are outdated or can be closed

Austin: if we agree we need to change the function signature to add types and flags, a prerequisite for adding a macOS backend; if there's consensus we can go ahead and get more implementation experience

Austin: there has been discussion, want to hear if there are any concerns

Bryan: the usage flags Austin talks about mean that createMLBuffer must be clear about whether the buffer is for interop or non-interop

<ningxin> +1 to prototype creating buffers with shape, type and flags
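The tensor-shaped MLBuffer discussed above can be sketched as a descriptor carrying a data type, shape and usage flags. This is a hypothetical illustration of the idea; the `dataType`, `dimensions` and `usage` names are assumptions, not the actual WebNN API:

```javascript
// Hypothetical sketch of a tensor-shaped buffer descriptor, as discussed.
// Names are illustrative, not from the WebNN spec.
const BYTES_PER_ELEMENT = {
  float32: 4, float16: 2, int32: 4, uint32: 4, int8: 1, uint8: 1,
};

// Bytes a backend would need to allocate: element size times all dimensions.
function byteLength(descriptor) {
  const elementSize = BYTES_PER_ELEMENT[descriptor.dataType];
  if (elementSize === undefined) {
    throw new TypeError(`unknown dataType: ${descriptor.dataType}`);
  }
  return descriptor.dimensions.reduce((n, d) => n * d, elementSize);
}

// Example: a 1x3x224x224 float32 tensor flagged for WebGPU interop.
const example = {
  dataType: 'float32',
  dimensions: [1, 3, 224, 224],
  usage: ['webgpu-interop'],  // hypothetical flag name
};
```

Knowing the shape and data type up front, as Austin describes, is what lets the backend allocate the buffer appropriately (e.g. differently for interop vs. non-interop use).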

TAG feedback: threading approach on a GPU

anssik: TAG is generally happy with the WebNN API
… one question has arrived regarding threading approach on a GPU
… many thanks to Rafael for proactively providing a response on behalf of Chromium on Windows
… I'd like to review that response with the group and see if other participants have anything to add

TAG question

<gb> Issue 933 Updated review of WebNN API (by dontcallmedom) [Priority: urgent] [Progress: in progress] [Review type: small delta] [Review type: horizontal review] [Venue: WebML CG] [Mode: breakout] [Topic: Machine Learning] [Focus: Web

<gb> … architecture (pending)] [Focus: Security (pending)] [Focus: Privacy (pending)]

Rafael's response

anssik: quoting TAG question:

"@cynthia expressed a concern regarding the threading approach - that it's possible that an intensive model running on the GPU could disrupt normal site/content rendering, and that would manifest as things like delays in requestAnimationFrame(). Is this something you have considered?"

<gb> @cynthia

anssik: quoting Rafael's response:

"@matatk and @cynthia with Chromium on Windows, the WebNN context runs on a separate command queue from the Chromium compositor. Depending on the device, the ML work may run on a separate chip than the one which performs 3D work. Even when it runs on the same chip, the ML work is multi-tasked with other work on the system.

As with other web platform features (WebGL, WebGPU, 2D Canvas, CSS blurs, etc) overtaxing the system will eventually affect requestAnimationFrame. Web developers need to be responsible with the content they build."

<gb> @matatk

Rafael: I told TAG that, depending on the device used, the ML work may run on the same chip that handles rAF(); even if the same chip is used, the work is multi-tasked

Joshua: sounds great, Rafael!

<jsbell> webmachinelearning/webnn#529

<gb> Issue 529 Specify WebNN timelines (by a-sully) [webgpu interop]

Joshua: we probably want to add something equivalent to your comment to the spec
… linking to the definitions of command queues and timelines and explaining how WebNN interfaces with them
… also adding a hint that browsers should do the appropriate things so that the system behaves correctly

ningxin: sounds good to me; in the Chromium prototype we ensure CPU-intensive tasks, e.g. shader compilation, happen on a background thread to avoid blocking the GPU main thread and causing jank; we could also add this consideration to the spec

Open issues and PRs

anssik: as usual, we'll discuss open issues and review PRs based on your feedback and progress:

All open issues

All open pull requests

Debrief on PRs merged recently

anssik: PR #690 was merged to add missing validation for a few ops, thanks Josh for the PR!

<gb> MERGED Pull Request 690 Add missing validation for pad(), slice(), and split() (by inexorabletash)

Josh: a straightforward PR, identified while looking at the implementation

ningxin: I'd like to add that Josh did a great job auditing the Chromium implementation!
… I'm doing more auditing of the Chromium implementation, WIP; I will share that with the group in a couple of days

[interop] workstream

anssik: we have a new interop workstream!
… I'd like to have a discussion on how the group wants to approach the interop issues arising from differences between backends in the most effective manner.
… as discussed, these are often the "hard" issues

All interop issues

anssik: Josh, do you want to share your thoughts?

jsbell: there's been activity to figure out which data types are supported by which backends, etc.
… there's active discussion on whether the proposal satisfies WebNN EP requirements; if we implement that proposal and all frameworks adopt it, so they can detect what is supported where, this could become a non-issue
… referring to issue #463

<gb> Issue 463 Allow checking whether operators/types are supported for a backend before creating a graph (by huningxin) [feature request]

ningxin: we got some positive feedback on the #463 approach from the ONNX RT people; also on tensor layout for conv2d and pooling ops: today ONNX RT guesses the preferred layout from the device type, e.g. "cpu" prefers a specific layout

<jsbell> webmachinelearning/webnn#463 (comment)

<gb> Issue 463 Allow checking whether operators/types are supported for a backend before creating a graph (by huningxin) [feature request]

ningxin: MLContext could also report the backend's preferred operand layout per data type; that'd be very helpful for ONNX RT to adapt to
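A minimal sketch of how a framework such as ONNX RT might consume such a report when choosing a conv2d layout. The `preferredInputLayout` property and the context objects below are hypothetical assumptions for illustration, not part of the WebNN spec or issue #463:

```javascript
// Hypothetical: pick the conv2d input layout from a backend-reported
// preference, falling back to a default when none is reported.
function chooseConv2dLayout(context) {
  // 'nchw' as the fallback mirrors how a framework might guess today.
  return context.preferredInputLayout ?? 'nchw';
}

// Illustrative contexts: a GPU-like backend that prefers NHWC, and a
// backend that reports no preference.
const gpuLikeContext = { preferredInputLayout: 'nhwc' };
const noPreferenceContext = {};
```

The point of the proposal is that the framework would no longer have to guess the layout from the device type; it could query the context directly.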

[bug] Divide-by-zero outcome should be standardized

anssik: issue #691

<gb> Issue 691 Divide-by-zero outcome should be standardized (by huningxin)

anssik: a new issue from Ningxin, arising from Chromium code review by Jiewei and Austin, about:

MLBatchNormalizationOptions.epsilon

anssik: the issue is pretty clear: epsilon is used by batchNormalization in its calculation
… the WebNN spec doesn't put any restriction on epsilon, which means that if both variance and epsilon are 0, there will be a divide-by-zero error
… epsilon is also used by instanceNormalization and layerNormalization
… proposal from Ningxin is to standardize divide-by-zero outcome
… comments?

Dwayne: even between different GPU hardware you can have different NaN behaviour, I'd like to do more research across GPUs to see what the responses are

<ningxin> sgtm

Dwayne: can also look at CPU and NPU behaviour
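The divide-by-zero under discussion falls out of the scalar form of the batchNormalization calculation, (x - mean) / sqrt(variance + epsilon). A minimal sketch showing the IEEE 754 outcomes when epsilon is unrestricted (hardware behaviour may differ, which is what Dwayne proposes to research):

```javascript
// Scalar batch normalization: (x - mean) / sqrt(variance + epsilon).
// With variance === 0 and epsilon === 0 the denominator is 0.
function batchNormalizeScalar(x, mean, variance, epsilon) {
  return (x - mean) / Math.sqrt(variance + epsilon);
}

batchNormalizeScalar(1, 0, 0, 0);     // Infinity (1 / 0 in IEEE 754)
batchNormalizeScalar(0, 0, 0, 0);     // NaN (0 / 0)
batchNormalizeScalar(1, 0, 0, 1e-5);  // finite, thanks to a small epsilon
```

This is why the proposal is to standardize the outcome: different backends may produce Infinity, NaN, or something else entirely for the same graph.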

[operator specific] Consider removing lstm and gru operators

anssik: issue #689 (related to #453)

<gb> Issue 453 Google Chrome Feedback on WebNN: aiming for broad device coverage and maintainability (by vsekhar) [process] [opset] [use case]

<gb> Issue 689 Consider removing `lstm` and `gru` operators (by a-sully) [question] [operator specific]

anssik: the proposal is to remove model-specific operators like lstm and gru
… rationale is TFLite delegates generally do not support the entire TFLite opset
… "For instance, TFLite GPU delegates only support a very basic variant of LSTM which does not support most of the parameters specified by WebNN."
… the issue provides a lot of details, thanks Austin

Austin: implementing these ops on many backends is hard
… the issue provides a table of supported activation functions for LSTM across ONNX, DML, CoreML and TFLite, compared with WebNN
… you can't deploy an LSTM graph across frameworks unless you use only relu or tanh
… high-level ops have more knobs to turn, and some knobs don't work on some platforms, so implementations need to fall back, with a performance impact
… looking at the table, every backend has LSTM, but they're not compatible with each other

anssik: Austin's summary provides three options:
… 1. Remove these operators from WebNN
… 2. Keep these operators, understanding that they will often be decomposed
… 3. Water down these operators into a least-common-denominator variant

Austin: what do we expect the fallback path to be? Everything supported on all backends?
… if something isn't supported, it's hard to ship a model that works with more than one backend
… if our option is to always fall back, are these ops useful in the first place?
… my tentative preference is option 1

anssik: any comments from others?

Dwayne: thank you for the investigation! I can help fill in DML data in the table
… when there's a decomposition path, there's still value in keeping these ops
… LSTM is not used that frequently; I can look into my model archive to provide data for decision-making
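Option 2 above (keep the ops, understanding they will often be decomposed) can be illustrated by expanding one LSTM cell step into primitive operations. This is a scalar sketch of the standard LSTM equations for illustration only, not the actual WebNN decomposition; the weight names are hypothetical:

```javascript
// One LSTM cell step decomposed into primitive ops (scalar for brevity).
// w holds input weights (wi..wo), recurrent weights (ui..uo), biases (bi..bo).
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

function lstmCellStep(x, hPrev, cPrev, w) {
  const i = sigmoid(w.wi * x + w.ui * hPrev + w.bi);   // input gate
  const f = sigmoid(w.wf * x + w.uf * hPrev + w.bf);   // forget gate
  const g = Math.tanh(w.wg * x + w.ug * hPrev + w.bg); // candidate cell value
  const o = sigmoid(w.wo * x + w.uo * hPrev + w.bo);   // output gate
  const c = f * cPrev + i * g;                         // new cell state
  const h = o * Math.tanh(c);                          // new hidden state
  return { h, c };
}
```

Each line maps to ops WebNN already has (matmul, add, sigmoid, tanh, mul in the tensor case), which is why a decomposition path exists even on backends whose native LSTM supports only some activations.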

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Maybe present: anssik, Austin, Bryan, Dwayne, Josh, Joshua, jsbell, ningxin, Rafael, Rob

All speakers: anssik, Austin, Bryan, Dwayne, Josh, Joshua, jsbell, ningxin, Rafael, Rob

Active on IRC: anssik, jsbell, ningxin