W3C

– DRAFT –
WebML WG Teleconference – 25 January 2024

25 January 2024

Attendees

Present
Anssi_Kostiainen, Austin_Sullivan, Bryan_Bernhart, Chai_Chaoweeraprasit, Dwayne_Robinson, Joshua_Bell, Joshua_Lochner, Ningxin_Hu, Phillis_Tang, Rafael_Cintron, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

Delta wide review and a new Candidate Recommendation

anssik: I want to review the proposed plan for delta wide review and a new CR Snapshot expected in Q1'24.
… let me recap the key concepts:
… - wide review: objective is to ensure all web stakeholders are able to review the spec and provide comments; the "delta" prefix means we seek feedback and comments on changes since the Q1'23 CR Snapshot publication
… - CR Snapshot: if substantive changes are made to a CR other than to remove features, the WG should publish a new CR Snapshot. This publication has more weight than CR Draft in that it signals it has gone through closer scrutiny

Delta wide review

anssik: I'm happy to handle on behalf of the WG all the wide review coordination and CR publication tasks, but I will seek the WG's review to ensure we're all aligned
… my proposed plan for the WG is to build upon the work we've done for the initial CR published March 2023
… so we'll focus on changes since our initial CR in Q1'23
… I'll discuss a few important things we need to agree on to qualify for a new CR Snapshot publication

Implementation experience

anssik: I want the WG to highlight in the new CR Snapshot the substantive progress made in implementation experience since the initial CR.
… I propose we use the WebNN implementation status page as evidence (shout out to Belem & co for keeping this important resource up to date!)

Implementation Status of WebNN Operations

anssik: given the WebNN API sits in the middle of the "Web ML stack", we're tracking both web engine and browser implementations as well as frameworks that are the prime consumers of the WebNN API
… for Chromium-based browsers Chrome and Edge, we have XNNPACK/CPU backend and DirectML/GPU backend
… for ChromeOS, we have MLService/CPU backend
… as of today, these are 42%, 90%, and 17% code complete respectively, with the DirectML backend the most advanced
… in addition to browser implementations, we're implementing JS ML framework integrations to WebNN API
… currently focusing on TensorFlow Lite for TF.js External Delegate and ONNX Runtime Web Execution Provider
… (consider these as the glue libraries between the framework and the WebNN API)
… currently, TF integration is being worked on in a fork; ONNX is upstream and is 94% code complete

anssik: the Implementation Status page also links to w-p-t dashboards for details
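The multi-backend setup described above can be feature-detected from script. A minimal, hedged sketch, assuming only the standard `navigator.ml.createContext()` entry point (the fallback label is illustrative, not part of any API):

```javascript
// Sketch: pick a WebNN backend, falling back gracefully when the API is
// unavailable (e.g. in Node.js or a browser without WebNN). In Chromium,
// deviceType 'gpu' maps to the DirectML backend on Windows and 'cpu' to
// the XNNPACK backend.
async function pickBackend() {
  const ml = globalThis.navigator?.ml;
  if (!ml) return 'wasm-fallback'; // no WebNN: a framework would use its Wasm path
  try {
    await ml.createContext({ deviceType: 'gpu' });
    return 'webnn-gpu';
  } catch {
    await ml.createContext({ deviceType: 'cpu' });
    return 'webnn-cpu';
  }
}
```

This mirrors how the glue libraries (the TF.js External Delegate and the ONNX Runtime Web Execution Provider) decide whether a WebNN path is available before falling back.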

RafaelCintron: for Apple platforms there's a Chromium PR that adds basic support for WebNN, basic infrastructure, translating all ops, WIP

Test coverage

anssik: We are also expected to demonstrate how we ensure implementations are and will remain interoperable with new implementations we don't yet know about
… the cross-browser and cross-platform web-platform-tests suite is our tool for that

Web Platform Tests dashboard for WebNN
… currently we have in total 3750 subtests with a pass rate of roughly 40%


<dwayner> CoreML Initial backend standup https://chromium-review.googlesource.com/c/chromium/src/+/5075312

Ningxin_Hu: 40% represents XNNPACK CPU backend tests, it does not test GPU yet
… XNNPACK op coverage is tested, matches 41% op coverage for XNNPACK


anssik: my expectation is we continue to evolve the w-p-t alongside the spec and our pass rate will increase, we will add more test cases for existing ops too e.g. when we unearth new edge cases (thanks Bruce & co for your work on w-p-t!)

Significant new & removed features, conventions, use cases

anssik: We should also note significant new & removed features, conventions update, new use cases since our previous CR Snapshot in Q1'23
… this information usually goes to the Status section of the spec, I'll prepare a PR for the WG to review
… for new features, I propose to highlight the new ops and data types "int64" and "uint64" added to support well-known transformers landed in #478 and discussed in #375

<gb> CLOSED Pull Request 478 Add support for operations needed for well-known transformers e.g. Segment Anything, Stable Diffusion, etc. (by wchao1115)

<gb> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
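For illustration, the new data types allow declaring transformer token-id inputs directly. A hedged sketch; the helper function below is hypothetical, and only the `{dataType, dimensions}` descriptor shape comes from the WebNN `MLOperandDescriptor` dictionary:

```javascript
// Hypothetical helper: describe an int64 token-id input for a transformer
// model, using the int64 data type added to the spec in PR #478.
function tokenIdDescriptor(batchSize, sequenceLength) {
  return { dataType: 'int64', dimensions: [batchSize, sequenceLength] };
}

// In a browser this would feed an MLGraphBuilder, e.g.:
//   const ids = builder.input('input_ids', tokenIdDescriptor(1, 128));
```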

anssik: I also want to note any ops removed based on implementation experience to streamline the API
… I have pushed a PR for updated use cases #507, thanks for your review JoshuaB and Zoltan, I will merge this soon

<gb> Pull Request 507 Revise use cases with transformers (by anssiko)

anssik: Zoltan and JoshuaB have improved the spec conventions significantly, I want to highlight this work, it greatly enhances interop by removing ambiguity and makes future implementers work easier
… to summarize, I'll prepare a PR for the WG to review the changes to the spec to prepare it for CR

Ningxin_Hu: I'd like to give a heads up about the sync API and implementation experience from ONNX RT EP

<Ningxin_Hu> https://bugs.chromium.org/p/chromium/issues/detail?id=1488162

Ningxin_Hu: we did a performance comparison for sync vs async via asyncify
… ask is to check the state of the sync API

<Ningxin_Hu> microsoft/onnxruntime#19145

Ningxin_Hu: we are informed that JSPI (JavaScript Promise Integration) is coming; compared with asyncify, model inference perf is close to sync, especially with WebGPU

Ningxin_Hu: ONNX RT is now using the async API rather than the sync API as before; proposal for the WG to consider dropping sync API support
… this is to remove a feature

anssik: thanks for this, we have an option to mark the sync API as "at risk" if we believe it might take longer to deprecate

Ningxin_Hu: proposed for removal are computeSync(), buildSync() and createContextSync() specifically

anssik: is this spec feature removal an intrusive change?

Ningxin_Hu: I need to look carefully, but I think we can just drop the sync execution paths and remove these three *Sync() methods

<Ningxin_Hu> I'll open an issue

chai: I think it'd be helpful to capture this in an issue and PR for the record, but sounds reasonable to me
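For context, migrating a caller away from the sync variants discussed above is mostly mechanical; a hedged sketch, where `builder` and `context` stand for the usual `MLGraphBuilder` and `MLContext` objects:

```javascript
// Before (sync variants proposed for removal):
//   const graph = builder.buildSync({ output });
//   context.computeSync(graph, inputs, outputs);
//
// After (async counterparts already in the spec):
async function buildAndRun(builder, context, output, inputs, outputs) {
  const graph = await builder.build({ output });   // replaces buildSync()
  return context.compute(graph, inputs, outputs);  // replaces computeSync()
}
```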

Refreshing the current Status section of the spec

WebNN: Status of this document (SOTD)

anssik: SOTD is for busy people to get a high-level view of where we're at
… I want to seek the WG's feedback on how to update this section, currently it reads:

"Further implementation experience and user feedback is being gathered for the MLCommandEncoder interface that proposes to enable more efficient WebGPU integration. A proposal to simplify MLContext creation is being discussed. This document is maintained and updated at any time. Some parts of this document are work in progress and further improvements are expected to be reflected in revised Candidate Recommendation Drafts and Snapshots."

anssik: my questions:
… - how do we want to phrase the MLCommandEncoder status?

chai: I think it depends what we think is the milestone for the next CR
… we want to highlight delta, do we think WebGPU interop is a milestone we want to cross?

anssik: I think we should go to CR in Q1 and not block on WebGPU interop and keep the current status text

chai: MLCommandEncoder is there and has been there since last CR

anssik: - what is the current status re "simplify MLContext creation" #322?

<gb> Pull Request 322 Simplify MLContext creation (by wchao1115)

chai: I think at this point we should remove this sentence

anssik: we'll drop Simplify MLContext creation from the status, not highlight it
… - MLBuffer, should we note this important work in status given it provides an abstraction for CPU, GPU, NPU to interop more efficiently?
… - any other new features, removals of important WIP topics to highlight to busy people in this section, chime in on the upcoming PR

Issue prioritization

anssik: As we're approaching another spec milestone aka CR Snapshot I want to discuss with you:
… - the most urgent and important issues for the group, or buckets of issues
… - practical steps we can take to make those issues more visible and actionable to the broader group
… I'd characterize the WG's current work mode as implementation-driven. That is a great work mode.
… OTOH that means that while the core group has a shared understanding of where our priorities are, the broader group would benefit from hints and guidance on where they should focus their attention and contributions
… I've used these calls to "check the pulse" on issues to build shared understanding, but not everyone can join these calls, and our meeting cadence cannot fit all open issues
… I should acknowledge spec contributions come in many shapes and forms, a typical implementation-driven contribution is a normatively defined feature, this is what I label as "new features" in our agendas
… in addition to "new features" there's a wide range of other contributions:
… - identifying and reporting issues, also help spot stale issues or suggest issues to be closed if addressed
… - patches to keep the spec in a cohesive state e.g. refresh informative parts whenever there's a normative change
… e.g. code examples, use cases, explainer, programming model updates, privacy & security, ethical considerations, notes to implementers (notes are those green boxes)
… - patches that improve normative definitions, fix bugs, align with conventions
… on the agenda these are known as "enhancements"
… contributions outside "new features" are equally if not more important and are great opportunities for new contributors, no contribution is too small
… I'd like to open the floor for discussion and brainstorming on concrete things we could do as a group to help new contributors join with concrete contributions
… maybe we can create a group of volunteers to triage our issues, and we can use the GH facilities at our disposal better (labels etc.)

jsbell: thanks Zoltan, Ningxin, Anssi for reviews of my PRs
… as a new contributor it's been hard to understand where to focus my contributions
… I started with cleanup first, then went over the 100+ issues we have open; labels would help
… a bulk of the issues are about a specific op and not substantial architectural issues
… knowing the status would help, e.g. "needs a PR" label
… "ops that cross different backends" would be great to know
… I know there are some meta issues, prefer smaller issues over big meta issues
… big directional issues, e.g. sync/async, op sets StableHLO, gigantic issues may be hidden, these should be surfaced clearly with a label
… also areas where the spec may be incomplete even if no interop issues
… e.g. details of some ops refer to references behind a paywall, this is an issue, we should unearth these and label them

<chai> +1 on opset query as important issue we should tackle

New features

MLBuffer

anssik: I want to use this call for a synchronous discussion on the proposal for a backend-agnostic storage type for WebNN operations (aka MLBuffer), informed by implementation experience.

anssik: issue #482

<gb> Issue 482 Support for device-based tensor storage objects (by bbernhar)

Chromium implementation: MLBuffer

anssik: thank you for the very professional and in-depth discussion on this MLBuffer issue
… this is a very complex issue and baking it takes time
… perhaps Bryan can take the lead in sharing the latest on this feature and to have a discussion on open questions with Rafael chiming in

Bryan: summary is decisions to be made :)
… how to proceed with buffer transfers?
… MLBuffer is GPU mapped now to simplify it
… how does interop work, concrete story upfront?

RafaelCintron: thanks to everyone who participated in this issue, great feedback from Austin, Reilly, and others
… WebGPU today has mapAsync, implementation experience
… WebGPU interop is important, several scenarios for camera input, anything that resembles real-time, WebGPU interop required for those use cases
… we should stage the work so we don't have a giant PR
… we should ship something we want to change later, it is OK to have the implementation develop gradually

chai: I agree with Rafael that the best way to qualify a complex proposal is to back it with an implementation and simple samples that show WebGPU interop
… I think this is a long discussion and people brought out good perspectives, but running code is a compelling way to demonstrate this can be worked out, I would focus on WebGPU interop
… conceptually it could be anything but WebGPU is on top of people's minds

asully: I reviewed the PR, thanks for the great work!
… I'm excited about the proposal, I agree WebGPU interop is top of mind, MLBuffer to work with WebGPU
… assumption that MLBuffer is in GPU may be a mistake, needs to be device-agnostic
… the buffer could live elsewhere

Ningxin_Hu: thanks for the great discussion, agree WebGPU MLBuffer interop is important, but want to highlight that the non-interop part is also an in-demand feature: the chained inference scenario for LLMs
… ONNX has I/O Binding with WebGPU backend to keep the output from the previous inference and use it as input for the next inference
… if we only consider non-interop parts that is super helpful in itself
… for the chained inference scenario
… an API change in the WebNN spec can fulfill this use case, agree with Austin we need to consider WebGPU interop but also want to note this WebNN-only use case is important
… Rafael's proposal for staging work sounds good
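Ningxin's chained-inference scenario could be sketched as follows. Note this is purely illustrative: the MLBuffer API shape is still being worked out in issue #482, and the `createBuffer`/`dispatch` method names below are hypothetical:

```javascript
// Illustrative sketch only: keep each inference's output resident on the
// device and feed it back as the next step's input, avoiding CPU readback
// between steps (the I/O-binding pattern ONNX Runtime uses with WebGPU).
async function chainedInference(context, graph, firstInput, steps) {
  let input = firstInput; // device-resident buffer (e.g. an MLBuffer)
  for (let i = 0; i < steps; i++) {
    const output = context.createBuffer(/* descriptor elided */);
    await context.dispatch(graph, { input }, { output }); // hypothetical names
    input = output; // chain without copying back to the CPU
  }
  return input;
}
```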

asully: parallel to the discussion on MLBuffer, the interaction between WebNN timelines and WebGPU timelines needs some additional work

Bryan: I can put down the opens and everyone's positions on them

anssik: we test whether "everyone can live with the proposal" to decide if we can move forward, as our test for consensus


Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).


Maybe present: anssik, asully, Bryan, chai, jsbell, RafaelCintron

All speakers: anssik, asully, Bryan, chai, jsbell, Ningxin_Hu, RafaelCintron

Active on IRC: anssik, asully, chai, dwayner, jsbell, Ningxin_Hu, RafaelCintron