W3C

– DRAFT –
WebML WG Teleconference – 12 January 2023

12 January 2023

Attendees

Present
Anssi_Kostiainen, Bruce_Dai, Chai_Chaoweeraprasit, Dominique_Hazael-Massieux, Ningxin_Hu, Rafael_Cintron, Sungpil_Shin
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik, dom

Meeting minutes

anssik: Welcome to 2023!

WebNN API open PRs and issues

anssik: good progress was made in GH over the holiday period, so I'd like us to review the open and recently landed PRs and discuss the issues filed.
… my expectation is we'll identify and fast-track any priority changes that should get onto the initial CR release train.

Add lstm and lstmCell ops, rename MLOperator to MLActivation

anssik: #321

<ghurlbot> Pull Request 321 [closed] Add LSTM to the operator list (wchao1115)

anssik: this PR adds the lstm and lstmCell ops to the spec.
… this PR received adequate review and I considered it ready to be merged.
… notably, we explicitly mention the LSTM architecture in our current charter, so it was great to get this done. LSTM is a crucial building block that improves on "classic" RNNs.
… thanks Chai!
… any comments from Chai? Any questions from anyone?

Chai: no further comments, happy to get this in

Simplify MLContext creation: remove MLDeviceType, remove "high-performance" from MLPowerPreference

anssik: #322

<ghurlbot> Pull Request 322 Simplify MLContext creation (wchao1115)

anssik: this PR proposes to remove the MLDeviceType enum:

enum MLDeviceType {
  "cpu",
  "gpu"
};

anssik: and to remove "high-performance" from the MLPowerPreference enum:

enum MLPowerPreference {
  "default",
  "high-performance",
  "low-power"
};
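
A minimal sketch, for illustration, of the creation paths this PR removes; the option names are per the pre-change spec draft:

// Before the change: the caller picks the device type explicitly
const cpuContext = await navigator.ml.createContext({ deviceType: "cpu" });
// and may additionally request the "high-performance" power hint
const gpuContext = await navigator.ml.createContext({
  deviceType: "gpu",
  powerPreference: "high-performance"
});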

anssik: To ease the review, I added a diff of the proposed IDL changes to GH, excluding the MLOperator -> MLActivation rename changes

Diff with IDL changes in PR #322

anssik: my summary of the practical impact of these changes for the default context is:
… - device type selection is an implementation detail
… - ML frameworks (key customers) using WebNN API cannot request a CPU, GPU or NPU implementation explicitly
… - a GPU context can be created only from a WebGPU device

chai: thanks for the diff in a comment!
… PR is straightforward, there are a few conversation points to discuss
… before the change, if you want to create a CPU context you set the device type as "cpu" and create a context
… then everything is done on CPU
… with the change you create a context without setting any device type
… the implementation will be a CPU implementation
… for the GPU, before the change, you had two ways to create a context
… first, set device type to "gpu" and you'll get GPU context, second way is to give it a WebGPU device
… so two ways to do it
… after the change, we remove the first way to do it
… that is, you create a WebGPU device and give it to the createContext method
… we remove the first way of creating a GPU implementation, for 1) simplicity, with this change we can simply think of GPU context in terms of WebGPU context so it maps 1:1
… 2) the two ways to create a GPU context are fundamentally different in one way, if you create with a device type "gpu" the implementation will have to own the device that is created internally, so "gpu" device type has to manage its GPU device, while WebGPU device is not owned, but given to you
… WebGPU is passed in terms of implementation, "this is my device, feel free to use it"
… eliminating the so-called internal device type "gpu", we don't need to own the device but we work with the WebGPU device
… creating a WebGPU-backed context is a little more complicated in that you have to create and select an adapter for WebGPU
… advantages: 1) you can have multiple GPU adapters in one system
… ~1/3 of all systems will have more than one adapter
… we design the API so that WebGPU is in the business of selecting the adapter
… if you have the MLDeviceType "gpu" then such a system might pick a different adapter than WebGPU would use, e.g. discrete vs. integrated
… each adapter is its own resource domain, you need to create a shareable resource to work across adapters
… with just the WebGPU way of selecting an adapter, you design what you want; WebGPU has a similar enum, incl. "high-performance" and "low-power"
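
A rough sketch of the paths that remain after the change; the WebGPU calls are existing API, while their pairing with createContext() reflects this proposal:

// After the change: device type is an implementation detail
const defaultContext = await navigator.ml.createContext();
// A GPU context comes only from a WebGPU device the app owns;
// adapter selection (e.g. discrete vs. integrated) happens in WebGPU
const adapter = await navigator.gpu.requestAdapter({
  powerPreference: "high-performance"
});
const device = await adapter.requestDevice();
const gpuContext = await navigator.ml.createContext(device);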

<Zakim> dom, you wanted to ask about extensibility to other *PU

dom: thanks Chai, this makes a lot of sense, supportive of the direction
… at some point I think we were thinking of CPU, GPU, and extensibility to other xPUs such as NPUs
… I wonder how that has been taken into account

Chai: thanks Dom, this PR will not address all about NPU, needs additional change
… thinking around NPU is it'll most likely be more similar how we do CPU currently than with GPU
… because NPU surfaces itself to the system as another adapter, a weird kind of adapter, cannot render anything just does compute
… in the future WebGPU might want to pick it up
… from the point of view of the hardware adapter itself, it is a lot more appropriate for WebNN to enumerate the NPU on behalf of the user
… WebGPU contract is around graphics operations
… if you want NPU, you want WebNN to deal with the adapter, more similar to CPU than GPU

dom: my question was, we're closing the simple path where you say pick a CPU or pick an NPU
… is WebNN the logical place to find logical NPU adapters? and can we figure out how to integrate that into the createContext method?

Chai: I see it not as a new device type but a new power preference

Dom: we can consider that when we get there

zkis: question to Chai, in earlier discussions we had a context type and a device type
… with a script-managed context and a user-agent-managed context
… a script-managed context has valid use cases for ML frameworks that want to select an explicit type of adapter

Chai: your question is how does this work with frameworks?
… the framework will own the WebGPU device, if the app uses both WebGPU and ML frameworks, e.g. WebGPU to render and TF.js to run ML, the app needs to manage the lifetime of devices so this makes it easier to manage with no hidden device
… otherwise we make it WebNN's business to manage the device, not to mention the performance impact given there's no control over the device
… to answer your question, [the device] needs to be owned by either the app or the framework
… the app can decide whether to use one adapter or more, ownership of the adapter will bubble up to the app or the framework
… I agree with Ningxin this will complicate framework code
… today they just set an enum and that's it; with these changes they must own the device themselves and pass it to WebNN

zkis: that solves it for WebGPU, but how do you create a CPU-specific context?

Chai: call createContext without parameters

zkis: but that can be overridden by GPU

Chai: no, you get a CPU if called without parameters

zkis: what if I want CPU+some accelerator?

Chai: I don't have an answer right now

zkis: we don't have an adapter abstraction in WebNN

Chai: NPU adapter will look very different from any WebGPU adapter, it does not render, does not support shaders

zkis: owned by lower layers
… how about adding another createContext method?

Chai: as of today, there is no notion of an NPU adapter; WebGPU could do that if it wanted to
… but we don't know if they want to enumerate NPU adapters in the future

Chai: the focus of the PR is on CPU and GPU, and on simplifying the GPU story

ningxin_hu: thanks for the discussion, to summarize:
… 1) from the framework developer point of view, the device selection is part of WebGPU; the concern is on the compute side, because per our current spec only the default context can execute the computeSync method
… for the framework developer the graph execution part is also affected, the CPU and GPU execution paths would diverge
… that's my concern regarding the framework developer impact
… if you look at the native frameworks, ONNX or others, commonly the framework allows the developer to select a device or an execution provider, a device abstraction, and the execution path stays the same
… we abstract the device differences but allow offloading compute to different devices; with this change a developer needs to adapt to that
… 2) for the backend implementer, e.g. Chrome or another browser engine, this change puts a hard requirement on any backend: it must support WebGPU
… some OSes may not interact with WebGPU, e.g. the MLService of ChromeOS, which abstracts CPU, NPU etc. with no WebGPU integration
… 3) dependency: this adds a hard dependency on the WebGPU API, but WebNN-WebGPU interop is post-v1, now a v2 feature; after this change GPU support would move to "v2" for WebNN, otherwise we lose GPU support in WebNN "v1", and in the extreme case, if WebGPU interop does not materialize, we lose GPU support altogether

<zkis> I have updated #302 with this proposal, Chai please take a look: https://github.com/webmachinelearning/webnn/issues/302#issuecomment-1380586654

<ghurlbot> Issue 302 API simplification: context types, context options, createContext() (zolkis) v2

Chai: what you say is premised on this change making WebGPU a requirement for WebNN v1

ningxin_hu: correct

Chai: that is not entirely true, given that the reason we introduced the command encoder interface is to allow submitting work to WebGPU and to let the WebGPU implementation manage its queue
… you still own submitting to the queue: you populate the queue with a workload via MLCommandEncoder, which allows interop with the WebGPU API and expects the app to submit the work
… otherwise all is behind the scenes: we are going to have our own queue, submit just the ML workload, and get the final result of the execution
… the compute method for WebGPU is a wrapper on top of the work submission mechanism
… it does not make WebGPU interop a requirement, it changes the implementation of submitting work to the GPU
… with an internal GPU device you also own your device, ask for a queue, and you have your output data in an output buffer you hand out to the caller
… ONNX RT as an example, to execute on GPU you give it a GPU queue, because apps using ONNX RT may use graphics API to do graphics work
… for GPU submission, for ORT, you give it a queue
… eventually the owner of the GPU device tends to be the app
… the notion we wrap everything works but is less flexible
… it makes efficient interop much harder, because the app does not see the device
… this change makes that slightly harder to implement but hopefully gives more flexibility to the app
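
A rough sketch of the interop path Chai describes, given a GPUDevice gpuDevice, a compiled MLGraph graph, and GPU-buffer-backed gpuInputs/gpuOutputs; it assumes the MLCommandEncoder interface of the current draft, so treat the exact method names as illustrative:

// The app owns the WebGPU device and its queue; WebNN only records ML work
const context = await navigator.ml.createContext(gpuDevice);
const encoder = context.createCommandEncoder();
encoder.initializeGraph(graph);                  // uploads constant weights once
encoder.dispatch(graph, gpuInputs, gpuOutputs);  // records the ML workload
gpuDevice.queue.submit([encoder.finish()]);      // the app submits to the queue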

ningxin_hu: if MLContext's compute could work with a WebGPU context, that would make me feel better
… I had a concern re throwing an operation error in compute(), as commented in GH

Chai: I'll look into that and try to fix that

zkis: a pointer for Chai that there's a connection to #302, marked for "v2"; check the updated comment there, it's almost the same as what you proposed but makes the context type explicit

<ghurlbot> Issue 302 API simplification: context types, context options, createContext() (zolkis) v2

zkis: please review

<Zakim> zkis, you wanted to point out synergy with #302 (V2 discussion)

<ningxin_hu> Probably we could loop in Ping on this issue for feedback

Transfer the input and output views for asynchronous execution

<ningxin_hu> I'll check with Ping

anssik: issue #318

<ghurlbot> Issue 318 The input and output resources race condition issue of asynchronous execution (huningxin)

anssik: Jiawei spotted this bug as part of the Chromium implementation review.
… The issue summary is that for WebNN graph async execution, the main thread and the worker thread may access the input or output array buffers at the same time, which can cause a race condition.
… Ningxin submitted PR #323 to fix this. Much thanks for that! Also thanks to Domenic for his advice, large parts of which are incorporated into the PR.

<ghurlbot> Pull Request 323 Transfer the input and output views for asynchronous execution (huningxin)

anssik: the PR is currently open for everyone's review; I'd like to see it land very soon given it fixes a critical bug.
… Ningxin, anything you'd like to share regarding this change?

ningxin_hu: good summary; the only thing to add is that after investigation this turned out to be a well-known design consideration, noted as a warning in Web IDL, and we are using the preferred solution, "Bring Your Own Buffer"
… from the Web IDL point of view, frameworks will bring their own buffers
… this is solved by the Streams API and documented in the Web IDL spec, and we follow that design; check those out for more details. Domenic also has a blog post about this, see the issue for these materials
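
A minimal sketch of the behavior PR #323 proposes, assuming a context and compiled graph are in scope and that compute() transfers the passed-in views and returns fresh ones with the result:

const inputs = { x: new Float32Array([1, 2, 3, 4]) };
const outputs = { y: new Float32Array(4) };
const promise = context.compute(graph, inputs, outputs);
// The views are transferred (detached) at call time, so the main thread
// cannot race with the thread that executes the graph
console.log(inputs.x.byteLength);  // 0: the caller's view is detached
// New views over the result buffers come back when the promise resolves
const result = await promise;
console.log(result.outputs.y);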

zkis: this discussion has been extremely helpful for the upcoming algorithm discussions

Improve graph execution steps

anssik: issue #316 documents editorial improvements to the steps for async and sync execution.

<ghurlbot> Issue 316 Review sync vs async compute differences (zolkis)

anssik: Zoltan has a WIP PR #319 out that defines generic graph execution steps and uses them in the sync and async compute() methods.

<ghurlbot> Pull Request 319 WiP: Improve graph execution steps (zolkis)
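
For context, a minimal sketch of the two entry points whose steps PR #319 unifies, with signatures per the current draft:

// Sync variant, exposed to dedicated workers in the current draft
context.computeSync(graph, inputs, outputs);
// Async variant, resolves when execution completes
await context.compute(graph, inputs, outputs);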

zkis: I need to rework these when Ningxin's PR is merged

Simplify the operand layout support of conv2d and pooling 2d operations

anssik: in issue #324 Ningxin explains how, in the existing WebNN spec, conv2d supports two input operand layouts defined by MLInputOperandLayout and four filter operand layouts defined by MLConv2dFilterOperandLayout.

<ghurlbot> Issue 324 Simplify the operand layout support of conv2d and pooling 2d operations (huningxin)

anssik: Ningxin concludes this may make implementations more complicated, especially if a native ML framework or OS API doesn't support some of these layouts
… this feedback also came out of the Chromium review by Jiawei - another demonstration of why working on the spec and implementation in tandem is so effective
… to fix this, Ningxin proposes to reduce the supported operand layouts and keep just the default operand layout
… and let layout adaptation and graph-level optimization be handled by ML frameworks, which usually already support such functionality.
… Ningxin, please feel free to fill me in
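
For illustration, a minimal sketch of the options in question; the enum values are from the current spec, and the proposal would keep only the defaults:

// Today the caller may pick any layout combination:
const output = builder.conv2d(input, filter, {
  inputLayout: "nhwc",  // MLInputOperandLayout: "nchw" (default) or "nhwc"
  filterLayout: "ohwi"  // MLConv2dFilterOperandLayout: "oihw" (default),
                        // "hwio", "ohwi" or "ihwo"
});
// Proposed: keep only "nchw"/"oihw"; frameworks handle layout adaptation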

<ningxin_hu> need more inputs from framework developers

<ningxin_hu> I can also @ping in that issue

WebNN API Candidate Recommendation readiness

anssik: we've reached what I'd describe as the "W3C Process-defined expectation bar for CR readiness": green with some yellow in the CR readiness tracker #240

<ghurlbot> Issue 240 Candidate Recommendation readiness tracker (anssiko)

anssik: - we can show that the specification has met all Working Group requirements
… - we have added no new normative references since FPWD
… - we can document how adequate implementation experience will be demonstrated, thanks to well-advanced implementations across multiple backends and the first version of the test suite that just landed in the WPT repo
… - we have strong evidence that the spec has received wide review
… - we have identified WebGPU interop as a feature at risk for initial CR and we plan to address it in CR updates
… but because we're an ambitious WG, I've set our CR quality bar higher than normal so that we don't just meet the bar but exceed the quality expectations
… thus I'll allow us some time for final polish before we ship the CR during Q1

Meeting scheduling adjustment

anssik: Chinese New Year is Jan 22, 2023
… the 2023 animal sign is the rabbit, the luckiest of all twelve animals in the Chinese zodiac :-)
… Chinese folks will get 7 days off from work from January 21st to January 27th in 2023
… in consideration of this, I will cancel our next scheduled meeting that overlaps with this holiday period.
… Happy New Year to our participants from China!
… I'm proposing we push forward our bi-weekly meeting cycle by one week so that we'd meet:
… 2 Feb, 16 Feb, 2 Mar, 16 Mar, 30 Mar etc. at the usual time 15:00-16:00 UTC

<ningxin_hu> Thanks Anssi!

Minutes manually created (not a transcript), formatted by scribe.perl version 197 (Tue Nov 8 15:42:48 2022 UTC).

Diagnostics

Maybe present: anssik, Chai, dom, zkis

All speakers: anssik, Chai, dom, ningxin_hu, zkis

Active on IRC: anssik, dom, ghurlbot, ningxin_hu, zkis