W3C

– DRAFT –
WebML WG Teleconference – 18 Nov 2021

18 November 2021

Attendees

Present
Anssi_Kostiainen, Chai_Chaoweeraprasit, Dominique_Hazael-Massieux, Feng_Dai, Jonathan_Bingham, Ningxin_Hu, Rachel, Rafael_Cintron, Raviraj
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik, dom

Meeting minutes

TPAC meeting follow-up

anssi: we had great presentations over the 3 days of our TPAC meeting
… I'm thinking of debriefing these discussions in the order we had them during TPAC
… we could also add Model Loader to our agenda

Rationale/criteria for adding new ops to the WebNN API

ONNX adding new operators presentation

anssi: we had a guest speaker, Michal from the ONNX project
… how many of these learnings resonate with our needs? which of these thoughts should we adapt to our work?
… from my perspective, much of the guidance Michal showed we've already been following implicitly, like decomposition into primitives, or backing operators with use cases
… or tying the addition of an operator to support in popular frameworks

Rafael: Rama is the one most closely involved in ONNX & operators

chai: in general, I would say we have already taken many of the benefits and design considerations from ONNX in WebNN
… I think we're already pretty aligned with what was presented
… one tension point that I'm still unsure about: ONNX was initially focused fully on interoperability, as a portable machine learning format, without being specific about implementations
… ONNX is pretty "relaxed" in that it defines the semantics of an operation without being prescriptive on how it should be implemented
… WebNN is almost the same: we want interop
… but the difference is that WebNN is designed for performance
… to give better results than e.g. WebGL-based framework implementations
… direct access to platform APIs allows this additional performance
… because we want it to be fast *and* cross-platform interoperable, this creates a tension point
… ONNX didn't have to consider hardware acceleration
… for WebNN, we should try to be optimal first while providing cross-framework support
… we need to pay attention to how operators will be implemented

<ningxin_hu> +1 for implementability

anssi: ONNX doesn't include a reference implementation
… why was that decision made?
… we've been discussing implementing a proper test suite for WebNN
… any learning from ONNX?
… Web APIs traditionally don't have a reference implementation

Chai: this decision emerges from ONNX not being implementation specific
… from a standard point of view, I agree we shouldn't care
… but for testing, we do need a reference implementation to serve as a baseline for semantic behavior to compare against
… otherwise there is no clarity on how it should work
… but this shouldn't be part of the standard
… for conformance testing, we have to compare results from the hardware with something
… as I have been discussing with Bruce

anssi: should we define a lightweight process for submitting new operators?
… do we want to add some more formality to our current process?

rachel: +1 to developing some reference implementation
… ONNX may be presenting a simpler path to ease adoption

Ningxin: two examples of lessons I learned: static inputs are preferred over dynamic values

ONNX "Prefer static attributes over dynamic input values"

Ningxin: this relates to our decision to move min / max value from dynamic to static
… with static values being easier to optimize and to map to native APIs
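
For illustration, a minimal sketch of the static approach for clamp (assuming the MLGraphBuilder and clamp() shapes discussed in the spec at the time; exact signatures may differ):

  // Sketch only: context/builder creation shapes are assumptions.
  const context = navigator.ml.createContext();
  const builder = new MLGraphBuilder(context);
  const x = builder.input('x', {type: 'float32', dimensions: [1, 3, 224, 224]});
  // minValue/maxValue are static attributes known at graph-build time, so
  // backends can fold them into the compiled graph and map them directly
  // to native APIs, instead of taking dynamic MLOperand inputs.
  const y = builder.clamp(x, {minValue: 0, maxValue: 6});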

Use tensor type for the padding rather than MLOperand #224

Ningxin: another is issue #224, which proposes changing the pad operator to use a static array
… I would like us to consider this as a guideline for adding new ops (see the sketch below)
… Another lesson is to include the shape inference logic
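
Continuing the sketch above, the direction proposed in issue #224 would look roughly like this (the pad() signature shown is an assumption for illustration, not the final API):

  // Before: padding sizes supplied as a dynamic, tensor-typed MLOperand.
  //   const padding = builder.constant({type: 'int32', dimensions: [4, 2]}, paddingValues);
  //   const padded = builder.pad(x, padding);
  // Proposed: padding sizes passed as static arrays known at build time.
  const padded = builder.pad(x, [0, 0, 1, 1], [0, 0, 1, 1]);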

ONNX "Shape inference logic should be included"

<Chai> +1 on prefer static, dynamism breaks accelerator's pipelining

Ningxin: that's mostly true for WebNN operators already
… but there are gaps, e.g. the recent issue about @@@ being missing for conv2d
… we could explicitly call out the output shape calculation (see the sketch below) - that would help implementations, both the reference implementation and browser implementations
… some native APIs won't calculate the shape by themselves
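
As an illustration of what such a call-out could look like, a hypothetical helper for the conv2d output shape (standard formula, not text from the spec):

  // Hypothetical helper: output size along one spatial dimension of conv2d.
  function conv2dOutputSize(inputSize, filterSize, padBegin, padEnd, stride, dilation) {
    const effectiveFilterSize = (filterSize - 1) * dilation + 1;
    return Math.floor((inputSize + padBegin + padEnd - effectiveFilterSize) / stride) + 1;
  }
  // e.g. 224 input, 3x3 filter, padding 1, stride 2, dilation 1 -> 112
  conv2dOutputSize(224, 3, 1, 1, 2, 1);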

anssi: we could document these principles in a .md file to help us keep track of this
… maybe complemented with issue/pull request templates on GitHub
… any objection to adopting this? starting small

<ningxin_hu> +1

RESOLUTION: Create a light-weight process to guide submitting new operator requests to WebNN

Versioning and web compatibility

<Chai> +1

anssi: we had TAG participants joining us for that one; my takeaway from this session is that we should request incremental TAG reviews as our specs evolve

Web Neural Network API TAG review

anssi: WebNN was reviewed some months ago
… once we evolve the spec, we can ask the TAG for feedback on specific design choices we're making
… we may want to submit the CG's Model Loader API for early TAG review

Privacy and security discussion

Fingerprinting best practices

Fingerprinting severity

anssi: we had members from the Privacy IG (PING) joining us with good discussions
… Nick Doty, author of the fingerprinting guidance, described some of the best practices in this space
… we should document the fingerprinting surface of the WebNN API based on these best practices
… we also discussed with the WebGPU people based on their experience in that space
… the adapter selection in WebGPU exposes information

AI accelerator device selection #169

anssi: similar to our issue #169
… any thoughts on this?

Chai: this is a complicated topic; it's not just adding an enum value
… you don't know where the resources come from; many APIs that run on the accelerators nowadays still take resources from the CPU
… that's a difference between GPU and an arbitrary accelerator device
… Also, different people think about device selection differently
… for WebGPU/WebGL, they'll think of it as picking one device among many
… for people running a model, they want the best device for the model they have
… this can require workload analysis (heavy) vs picking one of many adapters
… the design is not fully settled
… someone has been asking for smart selection logic with an "auto" value
… it needs a lot more thought, since there is no precedent for how to do that on platforms today

Adding a new use case for 'Framework Use Cases' #207

Chai: this may be premature to standardize

dom: on the device adapter aspect, I agree with Chai this is a challenging problem, I think the solution might be to identify what kind of params would help the browser pick the right device
… e.g. the WebXR API is built around selecting the device used to run VR/XR experiences; this can have a significant impact on the API shape

anssi: we need to understand the problem before designing the solution

ningxin: with WebNN, we have another preference developers can set, based on power: low power vs high performance (see the sketch below)
… this would give implementors an opportunity to let the Web app access AI accelerators that fit in this power category
… that would help some AI accelerators be exposed through this setting
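
For illustration, a minimal sketch of such a hint (the option names are assumptions based on the MLContextOptions discussed at the time; they are hints, so the implementation remains free to pick the device):

  // Sketch only: option names and values are assumptions for illustration.
  const context = navigator.ml.createContext({
    devicePreference: 'default',   // or 'gpu' / 'cpu'
    powerPreference: 'low-power'   // or 'default' / 'high-performance'
  });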

chai: even outside the context of the Web today, with a native platform API, e.g. on Windows: if you want the app to use that API, even at that level there is no support for that; we don't know how to do it
… the current accelerators in the market come with their own ways of @@@
… there is no universal way to expose these accelerators today

ML JS framework performance, focus areas for WebNN

WebNN ML JS Framework Performance presentation slides

anssi: we got a nice presentation from Ningxin on benchmark results
… any takeaways in terms of potential improvements to the API?
… there are 2 design-related issues: sync vs async, and main-thread vs worker exposure

Should restrict the sync APIs to only exist in Workers? #229

Should WebNN support async APIs? #230

anssi: they have performance impact

Ningxin: I opened this issue based on feedback from TPAC meetings
… the WebNN sync APIs have proved to work very well for WASM-based frameworks (ONNX, TF-Lite, OpenCV)
… because they're written in C++ using sync primitives, using sync WebNN APIs makes it easy to compile them to WASM
… but sync APIs block the main thread
… the solution would be to move the APIs to the worker thread
… this requires communication between main and worker to exchange the data across threads
… which can have performance impact
… the impact is still unknown
… the results from my prototypes were based on main thread operations
… the other issue is from the JS developer perspective, e.g. TF.js
… it's used mostly in the main thread, where we would need an async API to avoid blocking
… should we also provide an async API for compute? that would help with JS adoption of WebNN
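
For illustration, a rough sketch of the two directions discussed in issues #229 and #230 (the computeAsync() name is hypothetical, the signatures are assumed, and inputBuffer/handleResult are placeholders):

  // (a) Keep the sync compute() but restrict it to workers: WASM-based
  //     frameworks keep their synchronous call pattern, and only the
  //     worker thread blocks; data has to be posted between threads.
  //     main.js
  const worker = new Worker('inference-worker.js');
  worker.postMessage({input: inputBuffer}, [inputBuffer.buffer]);
  worker.onmessage = (e) => handleResult(e.data.output);
  //     inference-worker.js (inside onmessage):
  //       graph.compute({input}, {output});      // blocks only this worker
  //       postMessage({output}, [output.buffer]);

  // (b) Additionally expose a promise-based API usable on the main thread,
  //     e.g. for TF.js-style JS callers (hypothetical name):
  //       await graph.computeAsync({input}, {output});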

RafaelCintron: WebGPU and WebGL are technically async APIs because you submit work on the CPU, which then queues it to the GPU
… when WebNN is used on the GPU, the commands are executed in parallel from a CPU perspective, so essentially async - we should block on the CPU until the inference is done
… for WebNN CPU operations, they shouldn't be blocking - so it might be OK to have a promise callback
… Workers are challenging to use to manage multiple threads; there is ongoing work to make it simpler to split things across workers
… we shouldn't require people to use web workers for now

<ningxin_hu> great inputs, thanks Rafael

dom: related discussion happening in WebRTC WG, whether to expose an API in worker context only, no decision yet
… we want to be careful not to create a situation similar to the XMLHttpRequest API, which initially was both sync and async; the sync API made it easier for developers to adopt, but it had to be deprecated at a high cost due to the performance penalty

anssik: other APIs besides WebCodecs and mediacapture-transform with similar issues?

anssik: let's continue discussion in the issues Ningxin opened

Integrating an open-source cross-platform implementation of the Web Neural Network API into a web engine

Integrate WebNN-native into Chromium presentation slides

Proposal to start documenting implementation status

anssik: I've opened a GitHub issue as a follow-up to document implementation status

https://webmachinelearning.github.io/

<ningxin_hu> The ChromeStatus entry of WebNN was created: https://www.chromestatus.com/feature/5738583487938560

ningxin: the addition of the chromestatus entry is part of the chromium feature launch process
… as would be the prototyping
… with help from others, we've documented the motivation, specification link and design doc
… this includes information about compatibility, sync vs async, ethical considerations
… this is a starting point
… this is under review by the chromium stakeholders
… after that, we will send an intent to prototype to blink-dev and expect to receive broader feedback
… once that's done, I'll share the link to that email here

<Jonathan> great progress, Ningxin!

anssik: great progress, looking forward to seeing this land

<Jonathan> me too

anssik: in the interest of time, we'll defer "Conformance testing of WebNN API" and "Ethical issues in using Machine Learning on the Web" as well as "Model Loader API update" to our next call, which takes place on 2 December 2021
… thanks for joining!

Summary of resolutions

  1. Create a light-weight process to guide submitting new operator requests to WebNN
Minutes manually created (not a transcript), formatted by scribe.perl version 159 (Fri Nov 5 17:37:14 2021 UTC).
