W3C

– DRAFT –
WebML WG Teleconference – 8 September 2022

08 September 2022

Attendees

Present
Anssi_Kostiainen, Bruce_Dai, Chai_Chaoweeraprasit, Dominique_Hazael-Massieux, Humera_Noor, Ningxin_Hu, Rafael_Cintron
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik, dom

Meeting minutes

ghurlbot, this is webmachinelearning/webnn

<ghurlbot> anssik, OK

WebNN API Candidate Recommendation open issues

Current CR issues

Support asynchronous context creation

anssik: issue #272 has the latest discussion

<ghurlbot> Issue 272 Support asynchronous context creation (huningxin) cr

anssik: two PRs for two design alternatives
… PR #274 is using the x() + xSync() pattern (Sync postfix)

<ghurlbot> Pull Request 274 Support async context creation and use sync postfix (huningxin)

anssik: PR #285 is using the xAsync() + x() pattern (Async postfix)

<ghurlbot> Pull Request 285 Introduce MLContext.createContextAsync (huningxin)

anssik: good discussion, no clear consensus because naming is hard!
… based on feedback from Domenic it appears #274 aligns with the (ideal) web platform convention
… based on our investigation, #285 aligns with the emerging WebGPU API convention
… a new data point from Domenic was that a worker-only sync API is an exception
… I know this issue needs a compromise, so everyone should think: "can I live with the chosen design"
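
For concreteness, a minimal TypeScript sketch of the two shapes under discussion; the interface and option names here are illustrative stand-ins inferred from the PR descriptions, not spec text:

```ts
// Stand-in types; illustrative only.
interface MLContext {}
interface MLContextOptions { deviceType?: "cpu" | "gpu"; }

// Alternative A (PR #274, "Sync" postfix): the short name returns a
// promise everywhere; the sync variant would be exposed in workers only.
interface MLSyncPostfix {
  createContext(options?: MLContextOptions): Promise<MLContext>;
  createContextSync(options?: MLContextOptions): MLContext; // workers only
}

// Alternative B (PR #285, "Async" postfix): the short name stays sync,
// matching the emerging WebGPU convention.
interface MLAsyncPostfix {
  createContext(options?: MLContextOptions): MLContext;
  createContextAsync(options?: MLContextOptions): Promise<MLContext>;
}

// Usage under alternative A on the main thread:
async function initContext(ml: MLSyncPostfix): Promise<MLContext> {
  return ml.createContext({ deviceType: "gpu" });
}
```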

dom: I guess Sync APIs should be exceptions; the Sync version is mostly used by code generators, so postfixing with Sync puts the longer, more complex name on those consumers

anssi: that aligns with Domenic's feedback
… and that sync-only worker APIs are expected to disappear given the evolution of JS-WASM integration

Rafael: my feedback has been that a CPU-blocking op should return a promise; an operation that doesn't block on the CPU (e.g. because it runs on the GPU) should return immediately
… the CPU & GPU timelines are completely different
… our API should reflect where the job will be running and behave sync or async accordingly
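
A sketch of the split Rafael describes; both function names are hypothetical and not from any PR:

```ts
// Work that blocks the calling CPU thread surfaces as a promise...
declare function computeOnCpu(graph: object): Promise<Float32Array>;

// ...while work that is only queued onto the GPU timeline returns
// immediately, the way WebGPU's GPUQueue.submit() does.
declare function dispatchToGpu(graph: object): void;
```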

chai: I agree with Rafael on the pattern, but since we're providing an abstraction over both the GPU & CPU timelines, I don't think we can adjust it that way
… otherwise we would have to expose different APIs for CPU / GPU, which I think would be more confusing
… the issue at hand is about naming here
… since there is no consistency, I'm looking at the WebGPU prior art of using -Async, whereas WebGL has been using -Sync
… in the face of no perfect answer, I would follow the convention of WebGPU given that we no longer support WebGL
… given our interop with WebGPU, consistency with it would be useful

<Zakim> dom, you wanted to suggest we ask TAG's input

dom: looking at the WebGPU API, it does not have the same split: its sync API is available not only in the worker context but also on the main thread
… so the worker-only sync API exception does not apply to the WebGPU API
… I hear the argument with WebGPU consistency, but we're not actually consistent with main/worker thread exposure
… it may be useful to ask advice from the TAG on this specific issue
… not super critical, but that would help us get to a decision; I think the Sync postfix is the cleanest path
… one of my hats in W3C is to make sure we provide a good developer experience, and I think that pattern is a better fit -- not a make-or-break thing

anssik: can we bring this up with TAG so that we could get an answer faster than for usual reviews?

proposed RESOLUTION: Seek TAG recommendation on asynchronous context creation naming issue

Rafael: our abstraction design may create unneeded delays - hopefully our API avoids that risk

RESOLUTION: Seek TAG recommendation on asynchronous context creation naming issue

Define ULP (unit of least precision) tolerances for testing

#265

<ghurlbot> Issue 265 Define ULP (unit of least precision) tolerances for Conformance testing of WebNN API (BruceDai) cr

anssik: w-p-t PR has been updated: https://github.com/web-platform-tests/wpt/pull/34287
… the two most recent commits add the following changes:
… 1) add tests for concat / reshape / slice / split / squeeze / transpose with input tensors
… 2) add tests for clamp and relu operations
… asking Chai to review the proposed ULP tolerances

chai: I've been working with my team for quite a while on this
… we have internal values for DirectML, but they're not fit to use directly in a public way
… we've been looking at how to make them leverageable in the WebNN context
… we're very close - some of the ops won't be able to rely on ULP given the set of devices & the CPU/GPU abstraction
… operators like add can use a 0-ULP tolerance, but pow or sin/cosine are very complex and implementation-dependent
… a ULP test won't be reliable for these
… we've been tabulating these recommendations and we're close to sharing them out
… it won't be a short answer
… one of the senior engineers on our team has been working on this and may provide the input on the issue

anssi: we'll be happy to invite him on a call to discuss this

bruce_dai: thanks chai - looking forward to the input
… the ULP test assertion has landed in WPT

dom: we can land WPT tests through our own internal review when we don't touch files outside of webnn
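
For background, a minimal sketch of what a float32 ULP-distance check can look like; this is illustrative and not the actual WPT assertion:

```ts
// Distance in units of least precision between two numbers after
// rounding both to float32.
function ulpDistance32(a: number, b: number): number {
  const buf = new ArrayBuffer(8);
  const f32 = new Float32Array(buf);
  const u32 = new Uint32Array(buf);
  f32[0] = a;
  f32[1] = b;
  // Map each bit pattern onto a monotonically ordered integer scale so
  // that adjacent representable float32 values differ by exactly 1.
  const ordered = (u: number) => (u < 0x80000000 ? u : 0x80000000 - u);
  return Math.abs(ordered(u32[0]) - ordered(u32[1]));
}

// Per the discussion above, an op like add could assert a 0-ULP
// tolerance, while transcendental ops would need looser, per-op bounds:
console.assert(ulpDistance32(Math.fround(0.1) + Math.fround(0.2), 0.3) === 0);
```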

Add method steps and normative algorithms to operations

anssik: issues #210 and #211

<ghurlbot> Issue 211 Define algorithms for dictionaries with lists as default values (anssiko)

<ghurlbot> Issue 210 Add method steps to operations (anssiko) cr

anssik: Zoltan volunteered to help with these issues, he sent regrets for today, so we'll get back to this on our next meeting

Support for int8 quantized models

#128

<ghurlbot> Issue 128 WebNN should support int8 quantized models (wchao1115) cr

anssik: on our previous call we heard support from Rafael/Msft, and Jonathan/Google was not opposed
… Ningxin shared implementation options:
… - quantize/dequantize operations that frameworks could use within graphs
… - introduce this as an operand descriptor
… thoughts on these options?

Ningxin: also, another approach is to introduce quantized ops, such as quantizedConv etc.
… the WG should choose the quantization schema; we need to survey the existing native APIs
… Chai, I recall the spec PR for this was on your list?
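
To make the options concrete, a hedged TypeScript sketch; every name below (quantizeLinear, scale, zeroPoint, quantizedConv2d) is hypothetical and not in the spec:

```ts
// Stand-in for the real MLOperand type; illustrative only.
type Operand = object;

// Option 1: explicit quantize/dequantize operations that frameworks
// insert into graphs, roughly q = round(x / scale) + zeroPoint.
interface BuilderWithQdqOps {
  quantizeLinear(x: Operand, scale: Operand, zeroPoint: Operand): Operand;
  dequantizeLinear(q: Operand, scale: Operand, zeroPoint: Operand): Operand;
}

// Option 2: carry the quantization parameters on the operand descriptor.
interface QuantizedOperandDescriptor {
  type: "int8";
  dimensions: number[];
  scale: number;
  zeroPoint: number;
}

// Option 3 (Ningxin): dedicated quantized operations.
interface BuilderWithQuantizedOps {
  quantizedConv2d(input: Operand, filter: Operand, options?: object): Operand;
}
```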

chai: supporting quantized int8 is important for NPUs, even if they're not yet in scope

anssi: do we still think we can have it included in the spec by EOY/CR?

ningxin_hu: it's important for NPUs, but int8 quantization is also optimized on CPU & GPU for some hardware configurations
… that would lead to a performance boost there

dom: I'm hearing there's value to get this feature done, but not clear who is going to make a proposal
… it would need to land in the upcoming few months
… we should clarify whether we make this part of CR and who is driving it

chai: is this asking whether to hold our CR transition until we add int8 support?

anssi: essentially

chai: if the timeline is set re shipping, we need to look at what we have
… and figure what we can live without

anssi: our work is driven by use cases from which we've derived our features for performant operations on current or soon-to-be-current hardware
… that said, our CR is not the end

Current CR issues

dom: is int8 quant nice to have or must have for CR?

ningxin_hu: it looks like int8 is a feature addition, whereas algorithm steps are more of a bug fix
… algo steps look like quite a bit of editorial work given the many operations we have to write up

chai: I think it looks quite doable to keep quantized ops, possibly as a new device type
… to make sure we wouldn't break the API by adding it later
… esp with the idea of adding the NPU device type

dom: I think it is a good idea to evaluate whether our API can work with new device types such as NPUs; not sure if NPUs should be in the CR release scope due to the required implementation experience
… quantization for existing devices is fine, if this effort only makes sense in the context of NPU support I'm less sure

anssi: we can identify features as at-risk to allow exiting CR in time if implementation feedback (e.g. for NPU) is not received

anssik: evaluation now is good, dom?

dom: correct

Use cases for future work

Content filtering

anssik: Humera submitted PR #253, which was presented to the group; the group recommended this as a future work item

<ghurlbot> Pull Request 253 Add "Ethical Content Filtering" use case to WebNN specs (humeranoor)

Future work proposal: Content Filtering
… to keep the discussion in one place, the proposal is to close the PR
… to implement this use case, the web community needs to agree on web extensions and improvements to other APIs such as webRequest and declarativeNetRequest
… thanks Humera!

Humera: I don't mind closing the PR now

<Humera_eyeo> https://webmachinelearning.github.io/webnn/#usecases-application

Humera: use cases drive the WebNN API definition and there are many nice applications in this list; we found the content filtering use case was missing, and if this use case is in the list, it will help convince web extension developers to change their APIs

dom: there's a very active CG for web extensions

https://www.w3.org/community/webextensions/

https://github.com/w3c/webextensions

Performance adaptation

#207

<ghurlbot> Pull Request 207 Update "Performance Adaptation" use case (spshin3)

anssik: this PR has been open for over a year now and we have had an extensive discussion on it
… based on review comments, it is suggested this use case should be submitted as a future work proposal https://github.com/webmachinelearning/proposals
… and we should close this PR
… any concerns with that?

[no concerns heard to close PR]

Summary of resolutions

  1. Seek TAG recommendation on asynchronous context creation naming issue

Minutes manually created (not a transcript), formatted by scribe.perl version 192 (Tue Jun 28 16:55:30 2022 UTC).
