W3C

– DRAFT –
WebML WG Teleconference – 8 February 2024

08 February 2024

Attendees

Present
Anssi_Kostiainen, Austin_Sullivan, Bryan_Bernhart, Chai_Chaoweeraprasit, Deepti_Gandluri, Dominique_Hazael-Massieux, Dwayne_Robinson, Geoff_Gustafson, Joshua_Bell, Joshua_Lochner, Michael_McCool, Ningxin_Hu, Rachel_Yager, Reilly_Grant, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Announcements

Editor change

anssik: first, I want to make an announcement regarding an editor change for the WebNN API
… Chai will take on additional responsibilities at Microsoft and will step down from his editor role
… he will continue as a WG participant and as a fan of this WG
… on behalf of the WG, I want to thank Chai for his transformative contributions to this WG as a co-editor of the WebNN API spec, including all the guidance and technical insight shared with the group
… when I asked Chai to join as a co-editor in 2020 we were in the incubation phase
… with Chai's sustained contributions over the last 4 years we evolved into a WG and the WebNN API advanced from an early proposal into a Candidate Recommendation, and today the work of this WG is widely recognized in the industry, quite a journey!
… thanks Chai for all the work you've done for the WG!

chai: thank you Anssi! It has been an interesting journey for me. We started in early 2020 and I was convinced that the web platform needs WebNN to reach native perf
… this work has to happen, and I'm super excited about where we are now; the spec is in capable hands
… WebNN remains a high-priority effort for Microsoft

chai: thank you Anssi and Ningxin for inviting me early on into this effort

<Ningxin_Hu> Thanks so much Chai!

anssik: to fill in the vacant role, I'm pleased to appoint Dwayne Robinson as a new co-editor, joining Ningxin
… Dwayne is fully dedicated to help WebNN API spec advance and brings in a wealth of relevant experience, he's a brilliant engineer
… Dwayne has my full support and I'm committed to ensure this transition is as frictionless as possible
… thank you Dwayne for taking on this important role
… I will push a PR to acknowledge this editor change and configure our repo; this change is effective immediately

<Ningxin_Hu> Thank you Dwayne!

webmachinelearning/webnn#561

<gb> Pull Request 561 Editor change (by anssiko)

Welcome new participants

anssik: I have also another announcement to make
… our WG continues to grow, and I'm pleased to welcome our latest new participants
… Mike Wyrzykowski, Anne van Kesteren, Benjamin Poulain, Theresa O'Connor, Cameron McCormack, Marcos Caceres, from Apple joined the WG this week, welcome all!
… I'm also pleased to welcome Geoff Gustafson from Intel to the WG
… also joining us as a guest today is Michael McCool from Intel
… Geoff and Michael are currently exploring hybrid AI space, related use cases and open issues
… I will invite them to share their hybrid AI exploration findings in a future meeting to allow WG participants to provide guidance on future directions to explore that would benefit this WG

WebNN API delta wide review and CR Snapshot readiness

anssik: In short, we're on track. Thank you all.
… all delta wide review requests are in flight: Architecture/TAG, Privacy, a11y, i18n, Security, no concerns heard
… we plan to publish a new CR Snapshot next month.
… #240 tracks our progress toward CR readiness and a high-level status is staged in #532, we'll merge this PR when we branch for release

<gb> Pull Request 532 Update Status of this document for CR Snapshot (by anssiko)

<gb> Issue 240 Candidate Recommendation readiness tracker (by anssiko) [process]

dom: CR transition approval typically takes a week, couple of days to make sure the document is ready to be published

Wording changes debrief

anssik: jsbell has submitted a number of "Wording change" PRs (aka editorial changes); I've helped merge some to take some workload off the editors
… I asked jsbell to provide a quick overview to make it easier for everyone to catch up with these changes.

Parameter Types

jsbell: - Drop arg type assertions for methods defined by WebIDL
… - Convert internal algorithm type assertions into parameter types
… - Add types to all internal algorithm members

Simplification

jsbell: - Bundle validation (power pref, device type, etc) right into context creation, rather than making it a separate step.
… - The spec had WebIDL-specified enums for device type and power preference as well as internal enums for the same thing, with usage that required these were identical. Dropped the internal-only versions and just use the WebIDL-specified types.
… - For dictionaries, Bikeshed automatically annotates members with types and defaults, so repeating that in the prose is extra work and extra noise. Also simplified the text by removing "Specifies" where it didn't add anything, and refer to lists rather than sequences in prose.

Fixes

jsbell: - There were a few places that linked to "object" which Bikeshed matched to the FileAPI's blob constructor, which is wrong. To fix this, introduced new definitions for "platform operator" and "platform operand" which also reduce the number of "implementation-defined" references in the spec.
… We're still linking to "number" incorrectly, but that's in reshape(), which has some issues.

<Ningxin_Hu> Thanks much Josh, these PRs are very helpful!

<jsbell> yep!

Issue prioritization (cont'd)

anssik: With our WG growing we have new participants who are looking for guidance on how to best focus their contributions
… to help with that, we initiated an issue prioritization exercise, started documenting triage guidance and started labelling issues to test drive this work mode
… much thanks to jsbell for helping drive this effort!
… Josh joined me in the Triage team, others interested please get in touch with me.
… Josh, please introduce the triage guidance proposal and outline the outstanding questions for discussion

<jsbell> webmachinelearning/webnn#533

<gb> Pull Request 533 Process: Add documentation for labels, current and proposed (by inexorabletash)

Triage guidance for WebNN

#533

<gb> Pull Request 533 Process: Add documentation for labels, current and proposed (by inexorabletash)

Current labels

jsbell: Apologies that last telecon I was rushed due to a power outage
… My goal is to make the next steps for the API proposal overall clearer - separate the important high level issues - like bounds on interop, or the minimal operator set, etc - from stylistic issues, or issues about specific operators.
… Some issues contain proposals and are just waiting for a PR, some require more discussion. It's hard to tell how far we are from being complete at a spec level, and where to focus attention.
… PR #533 has documentation for labels and proposals for a few changes. Anssi has implemented several of the proposals, which include:
… Added "feature request"
… Renamed to "opset"
… Renamed to "conventions"
… Dropped unused labels ("dependencies", "tag")
… Consensus on using "bug" for "the spec has a flaw that must be fixed"

Outstanding questions

jsbell: Proposed removals
… - "enhancement" as ambiguous - unclear if this is a feature request or editorial improvement.
… - "help wanted" unclear who can help - is this an outstanding question, PR needed, etc.
… Proposed additions
… - "op specific" for issues scoped to a single op (or closely related ops) - about 37 issues, listed in PR
… - Common from other specs: "has PR", "needs PR", "has WPT", "needs WPT" - signal next steps
… - Do we like "v2" or should we consider Milestones? Probably defer this, wait for more experience/feedback.
… - Do we want markdown files in the top level directory, or a docs/ directory, or …?

RafaelCintron: support milestones

Dwayne: granularity of the milestones?

jsbell: up for discussion
… v2 is usually "future work"
… "before the next CR"

Reilly: should discuss what we'd consider features that should be in the first release in browser engines
… we have folks from multiple browser vendors in the WG now

PR #533

<gb> Pull Request 533 Process: Add documentation for labels, current and proposed (by inexorabletash)

<Ningxin_Hu> docs/ dir sgtm

jsbell: indeed, PTAL PR #533

New features

MLBuffer

anssik: first this is a big and complex topic
… thanks Bryan, Austin and others for all the exploration and documentation, prototyping
… we have documented the following spec explorations:
… MLBuffer proposal #482 broken into:

<gb> Issue 482 Support for device-based tensor storage objects (by bbernhar) [webgpu interop]

anssik: - Creation and representing MLBuffer on XPU devices #542

<gb> Issue 542 [MLBuffer] Creation and representing MLBuffer on a XPU devices (by bbernhar) [webgpu interop]

anssik: - Uploading/downloading tensor data #543

<gb> Issue 543 [MLBuffer] Uploading/downloading tensor data (by bbernhar) [webgpu interop]

anssik: - Support for MLBuffer in graph execution #544

<gb> Issue 544 [MLBuffer] Support for MLBuffer in graph execution (by bbernhar) [webgpu interop]

anssik: MLBuffer exploration #541

<gb> Pull Request 541 Add MLBuffer exploration doc (by a-sully) [webgpu interop]

MLBuffer exploration (HTML preview)

anssik: MLBuffer exploration discusses:
… 1) Goals
… 2) Overarching Questions
… 3) Use Case: Chained Inference
… - MLBuffer creation
… - Writing to an MLBuffer
… - Execute an MLGraph
… - Read back data from an MLBuffer
… 4) Use Case: WebGPU Interop
… - Rent out an MLBuffer to WebGPU
… - Return a rented-out MLBuffer back to WebNN
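For orientation, the chained inference use case above could look roughly like the following sketch. Note that MLBuffer is a proposal under exploration, so every method name here (createBuffer, writeBuffer, dispatch, readBuffer) is a hypothetical shape taken from the proposal discussion, not a specced or shipped API:

```javascript
// Hypothetical MLBuffer chained-inference sketch. All MLBuffer method
// names are proposal-stage assumptions, NOT a specced API.
const context = await navigator.ml.createContext({ deviceType: 'gpu' });
const input = context.createBuffer({ size: inputBytes });
const intermediate = context.createBuffer({ size: midBytes });
const output = context.createBuffer({ size: outputBytes });

context.writeBuffer(input, inputArrayBuffer);             // upload once
context.dispatch(graphA, { x: input }, { y: intermediate });
context.dispatch(graphB, { x: intermediate }, { y: output }); // no CPU readback between graphs
const result = await context.readBuffer(output);          // download once at the end
```

The point of the sketch is that the intermediate tensor stays device-resident between the two dispatches, which is the perf win the chained inference use case is after.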

<Ningxin_Hu> +1 to merge it into explainer

anssik: also the Chromium prototype has a lot of informative discussion

Chromium implementation

anssik: where would you like to focus today Bryan, Austin?

Bryan: overarching questions are important, need to be discussed

#542

<gb> Issue 542 [MLBuffer] Creation and representing MLBuffer on a XPU devices (by bbernhar) [webgpu interop]

Bryan: is the goal of WebNN to make it look like WebGPU but work on more than a GPU backend, e.g. an NPU backend?

Reilly: I would focus on the use cases, we have an understanding there are places where apps using WebNN want to have interop between work they do in WebNN and in WebGPU, e.g. Zoom web client
… semantic segmentation use case running on an NPU, pass result to GPU, render to a display with GPU
… desire for anything with image processing and using features that cannot be implemented with WebNN to pass data back and forth between WebNN and WebGPU
… developers passing data between WebNN and Wasm is another similar use case

Reilly: driven by those use cases, we should have interop for MLBuffer for those APIs
… re whether MLBuffer should be a unifying interface for all these backends, internal to WebNN
… MLBuffer should work with NPU, should work with all the contexts and configs

zkis: I have a question: what is the main thing in MLBuffer, to identify buffers you can map between backends to avoid data copies?
… or enable hybrid execution across XPUs?
… another question, do we design the API so it is a generic design that can adapt to future use cases such as possible hybrid execution

Reilly: the use cases are very important

RafaelCintron: to answer Bryan's question, MLBuffer should work with GPUBuffer; that makes it straightforward to know what to do
… for use cases, key are 1. WebGPU interop both ways to and from WebNN starting
… 2. chained inference

Ningxin_Hu: for CPU I support a device-agnostic API
… CPU has optimized memory layouts, e.g. for SIMD, which impose tensor memory layout requirements
… in the chained inference use case we can take the previous inference's output and pass it as an input to the next inference
… with no memory relayout between the inferences
… for use cases I'm interested in chained inference, because of the perf overhead, especially in transformer models, e.g. T5 and SD models
… many iterations of running one graph are needed to complete one inference; it takes several iterations to go through one sequence
… with WebGPU and GPUBuffer, data transfer has good performance, 2x versus reading back to the CPU, measured with a T5 model through ONNX RT I/O binding
… noticed an open issue #559 to ask for control flow ops, similar use case

<gb> Issue 559 Control flow operations: if, while (by philloooo) [opset] [feature request]

Ningxin_Hu: before adding a while op, if we have MLBuffer we can use JS control flow and fulfill this requirement
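Ningxin's point can be illustrated with a sketch: with a device-resident MLBuffer, a while-style loop can live in JS, re-dispatching the same graph until a condition read back from a small flag buffer is met. As above, dispatch/readBuffer are hypothetical proposal-stage names, not specced API, and whether a buffer may be both input and output of one dispatch is an open design question:

```javascript
// Hypothetical sketch: JS-driven control flow instead of a `while` op.
// `dispatch` and `readBuffer` are proposal-stage names, not specced API.
const state = context.createBuffer({ size: stateBytes });
let done = false;
while (!done) {
  // state stays on-device across iterations; only a small flag is read back
  context.dispatch(stepGraph, { state }, { state, flag: doneFlagBuffer });
  const flag = await context.readBuffer(doneFlagBuffer);
  done = new Uint8Array(flag)[0] === 1;
}
```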

chai: on MLBuffer's role in the GPU interop space, the question is whether it should be explicitly controlled by WebNN or managed by the WebNN implementation
… thinking of the NPU, it is different from the GPU
… an NPU is an accelerator; it does not have the breadth of support of a GPU, which is general purpose
… because of this there needs to be a fallback that happens within the graph; it is not a wholesale change from one processor to the next but similar to how CPU and GPU interaction works
… fallback needs graph resource and workload scheduling done in the OS itself to be super efficient
… my view is that surfacing NPU capability may not be fully in the control of the backend, but rather hybrid execution between NPU and GPU with the backend controlling scheduling
… this is my hunch; we need this design for efficiency

Bryan: what makes sense in WebGPU may not make sense in WebNN; I'm thinking about this
… WebGPU has mappable buffers, i.e. CPU-accessible GPU buffers; do we need similar mapping between WebNN and WebGPU?
… does staging memory make sense for WebNN?
… do we throw the unified I/O path out of the window?
… we take inspiration from ONNX RT for I/O bindings
… big question: does it have to be like WebGPU?

asully: I've been advocating a WebGPU-style approach
… no sync readbacks like in WebGL
… if the MLBuffer is not on the CPU, asynchronicity is a necessity
… what if the buffer is on an NPU, or opaque?
… the most flexible approach, I think, is a WebGPU-like approach with everything being async; nothing can be done synchronously from the content timeline

Reilly: I said this should be like WebGPU and that was mainly in the context of if we expose similar semantics it should be like WebGPU
… if we don't support mapping we can be different from WebGPU

Bryan: in WebGPU you don't map a device buffer directly; you tell the GPU to copy it over. WebNN would have to spec a copy API to be similar
… or overload dispatch, or map a buffer
… I need machinery to do the mapping
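For comparison, the WebGPU readback pattern Bryan refers to looks like this (this part is real, shipped WebGPU API, runnable only in a browser with WebGPU): a device-resident result buffer is never mapped directly; it is copied into a MAP_READ staging buffer which is then mapped asynchronously.

```javascript
// Real WebGPU readback pattern: copy into a staging buffer, then mapAsync.
const staging = device.createBuffer({
  size,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
});
const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(gpuResult, 0, staging, 0, size);
device.queue.submit([encoder.finish()]);

await staging.mapAsync(GPUMapMode.READ);
const data = staging.getMappedRange().slice(0); // copy out before unmap
staging.unmap();
```

Whether WebNN adopts this copy-plus-map model, a one-shot copy API, or overloads dispatch is exactly the open question in #542.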

#542

<gb> Issue 542 [MLBuffer] Creation and representing MLBuffer on a XPU devices (by bbernhar) [webgpu interop]

RafaelCintron: WebGPU and buffer mapping took long to design!
… don't feel bad if it takes long for you too
… vendors with GPUs with unified memory would like to enhance the buffer in WebGPU
… MLBuffer should work with a variety of devices

#541

<gb> Pull Request 541 Add MLBuffer exploration doc (by a-sully) [webgpu interop]

<Ningxin_Hu> Thanks Anssi!

anssik: Happy Chinese New Year Ningxin!

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics


Maybe present: anssik, asully, Bryan, chai, dom, Dwayne, jsbell, RafaelCintron, Reilly, zkis

All speakers: anssik, asully, Bryan, chai, dom, Dwayne, jsbell, Ningxin_Hu, RafaelCintron, Reilly, zkis

Active on IRC: anssik, asully, chai, jsbell, Ningxin_Hu, RafaelCintron, reillyg, zkis