W3C

– DRAFT –
WebML CG Teleconference – 14 May 2020

14 May 2020

Attendees

Present
Andrew_Brown, Anssi_Kostiainen, Chai_Chaoweeraprasit, Ganesan_Ramalingam, Greg_Whitworth, Mingqiu_Sun, Ningxin_Hu, Paul_McDaniel, Ping_Yu, Rafael_Cintron
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

WebNN first wave models and ops ONNX and XLA-HLO intersection

anssik: PR updated with input from Chai and Daniel to add ONNX and XLA-HLO columns

Add first_wave_models.md (PR #52)

anssik: PR pending Daniel's review.

ningxin_hu: I've integrated Daniel's and Chai's input into the table

<ningxin_hu> https://github.com/webmachinelearning/webnn/blob/d97d2dba34b82bc2bb579c55fb5741a82f253f93/op_compatibility/first_wave_models.md

first_wave_models.md (HTML preview)

ningxin_hu: latest update was to add ONNX and XLA-HLO data for ops, for example Daniel commented that some ops can be lowered to more primitive ops in XLA-HLO
… that's incorporated into the table now, so all comments have been addressed
… another update is the gemm op, which was also added to the table to reflect the recent spec changes

anssik: Any blockers that need to be discussed and resolved on the call?

PROPOSED RESOLUTION: Merge First Wave Model PR #52

anssik: any concerns?

[hearing none]

Resolution: Merge First Wave Model PR #52

Element-wise add & mul, concat, reshape, gemm, and transpose ops WebNN API definitions

anssik: to summarize the work, three PRs are open and one has been merged already:

<paul-mcdadniel-msft> hello !

element-wise add & mul (PR #54)

concat (PR #55)

reshape (PR #57)

gemm and transpose (PR #58)

anssik: specifically, PRs #54, #55, and #57 are open, PR #58 merged.
… let's take these op definitions one by one

Element-wise add & mul

anssik: pending Chai's review, merge conflicts now resolved.
… other opens?

ningxin_hu: no other opens, waiting for Chai's review

Concat

anssik: pending Rama's review
… Rama suggested using negative axis values to count backwards from the last axis
… Ningxin shared that some OS APIs do not support negative axis values, so they would need to be handled in the browser implementation; an alternative proposal is to leave the conversion to the JS framework level
… Comments?

ningxin_hu: waiting for Rama's opinion on whether supporting only positive axis values is OK, leaving it to JS frameworks to handle the conversion

rama: that's definitely fine; I guess the only case where that could be a problem is if the rank is not statically known, so you cannot do the conversion at build time and must do it at runtime
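
For illustration, a minimal TypeScript sketch of the axis conversion discussed above, assuming the tensor rank is known at graph-build time; the function name and error handling are illustrative, not taken from the spec:

    // Map a possibly-negative axis (counting back from the last dimension)
    // onto the non-negative form expected by OS APIs that lack negative-axis
    // support. Per Rama's caveat, this only works when the rank is known at
    // build time; otherwise the conversion has to happen at runtime.
    function normalizeAxis(axis: number, rank: number): number {
      if (axis < -rank || axis >= rank) {
        throw new RangeError(`axis ${axis} is out of range for rank ${rank}`);
      }
      return axis < 0 ? axis + rank : axis;
    }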

<Chai> ningxin, i just approved the add PR. Looks good.

Reshape

anssik: Pending Rama's review
… it seems this PR could probably be merged, since Rama's suggestion to overload shape() could be added later, thoughts?

rama: that's fine

Gemm and transpose

anssik: General Matrix Multiply (gemm) and transpose, a lot of good design discussion in this PR
… initially explored an idea to make gemm() a static method on the nn interface instead of a regular interface method, to let the caller choose whether to use the static helper functions or to explicitly call into regular interface operations.
… the static gemm() operation was then dropped in favor of a non-normative note section that explains how the behavior of this operation can be generically emulated
… this convention is borrowed from WebGL spec (thanks Rafael!)

Chai: Gemm goes back to the model table; gemm is a high-level op that can be implemented using other ops, and it is one of the ops that are often handled as a single op at the OS level
… e.g. DirectML handles it as a single op; the question is how to handle high-level ops
… we need to understand the core set of the interface that must be implemented; high-level ops can be implemented with something else, so they could be considered "optional"
… in the beginning we explored the idea and looked at the WebIDL static method approach; based on feedback we figured out it might create more confusion than clarity
… the WebGL spec tackles similar issues using non-normative notes, so we follow the same pattern for the gemm definition (thanks Rafael!)
… this makes it clear in the spec that the op is in fact a high-level op, and the caller can use the pseudo-code-defined decomposition into core low-level ops
… I consider that a fair balance between small and big ops, as debated earlier
… everyone seems to have signed off on this approach
… this applies to all the ops in the table, likely relu and pooling, as Daniel suggested
… for big ops we should be consistent and explain how they can be implemented in terms of other ops
… last note: gemm requires transpose, so that was added in the same PR; for any big op, if it turns out we need a primitive op, we need to add it
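
As a rough, non-normative illustration of the decomposition Chai describes, a TypeScript sketch assuming hypothetical builder methods (matmul, transpose, mul, add, constant) on the nn interface; the actual method names and signatures in the spec may differ:

    // Hypothetical emulation of gemm(a, b, c, alpha, beta, aTranspose, bTranspose)
    // in terms of lower-level ops: out = alpha * op(a) x op(b) + beta * c.
    // `nn` and its methods stand in for the draft builder interface (assumption).
    function emulateGemm(nn: any, a: any, b: any, c: any,
                         alpha = 1.0, beta = 1.0,
                         aTranspose = false, bTranspose = false): any {
      const a2 = aTranspose ? nn.transpose(a) : a;
      const b2 = bTranspose ? nn.transpose(b) : b;
      let ab = nn.matmul(a2, b2);
      if (alpha !== 1.0) ab = nn.mul(ab, nn.constant(alpha));
      const scaledC = beta !== 1.0 ? nn.mul(c, nn.constant(beta)) : c;
      return nn.add(ab, scaledC);
    }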

ningxin_hu: first, I'd like to mention there's an agenda item "Intersection between XLA & ONNX (Paul)"

<paul-mcdadniel-msft> let's push the XLA & ONNX ops table discussion for the next session, thanks !

ningxin_hu: second, a comment on Chai's high- and low-level primitives; my question is about models, and I'd like to propose we add activations, including relu and leaky relu

<ningxin_hu> high level ones: relu, leakyRelu, clip

<ningxin_hu> low level ones: element-wise min and max
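
To make the proposal concrete, a hedged TypeScript sketch of how the proposed high-level activations could decompose into the proposed low-level element-wise ops; the builder methods (max, min, mul, constant) are assumptions, not spec names:

    // relu(x)         = max(x, 0)
    // clip(x, lo, hi) = min(max(x, lo), hi)
    // leakyRelu(x, a) = max(x, a * x), for 0 < a < 1
    const relu = (nn: any, x: any) => nn.max(x, nn.constant(0));
    const clip = (nn: any, x: any, lo: number, hi: number) =>
      nn.min(nn.max(x, nn.constant(lo)), nn.constant(hi));
    const leakyRelu = (nn: any, x: any, alpha: number) =>
      nn.max(x, nn.mul(x, nn.constant(alpha)));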

anssik: any feedback?

Ping: from the TF.js perspective, I see the benefit of high-level ops; is the differentiation between low and high level made from the system perspective, i.e. whether they are likely to be implemented by the OS?

Rama: the non-normative note explains how the high-level op can be implemented in terms of low-level ops; different implementations can implement those differently

Ping: conceptually, does the user need to understand the difference, or is this hidden from the user?

Rama: this is an implementation detail, hidden from the user

Chai: from the user's point of view, both high- and low-level ops are available, and they can use one or the other; the purpose of adding a non-normative section is to say this op will probably be faster if you use it rather than implement it yourself (in JS)
… the WebNN API is a browser API and the op is treated as a single unit; on Windows, DirectML supports gemm as a single unit
… so for the GPU it is a lot more efficient to use the high-level op than low-level ops that require copies between ops, especially with large tensors
… we'd implement both; the goal of the spec should be to focus on ops that we already know are supported in OS APIs as single units
… anything can be a big op, but the differentiation is that we only include the ones we know are implemented as a fused unit, in the interest of keeping the spec smaller
… the point of defining big ops is to allow mapping to an OS layer that implements them as a single unit

Ping: my question comes from our colleagues working on the Android NN API; their design has the system acting more like a compiler, trying to compile the graph into high-level ops against their implementation
… in this case the compilation effort is put on the user; the other way around is to leave this to the OS and let the OS compile the graph instead of the user

Rama: AFAICS, the spec allows both

Ping: for people who compile the model, is that not the best way forward?

Chai: for instance, we know that if you have certain ops you can fuse them; that's happening at the OS level
… OS-level fusion can also be more dynamic
… we want to leave the option open so that if the caller wants to handle it as a single unit, it has an option to do that; that is what the spec represents, allowing both options; if the OS is able to do more complex fusion, we're able to do that with the current spec, and the spec describes what is possible
… provides options for both paths

Ping: OS can do further optimization?

Chai: Correct.

Ping: another question, maybe related, for the compiler, should we provide a different type...?

Chai: I made one attempt to differentiate using a static method; the feedback was that from the browser's point of view it needs to implement both, so there is no benefit; there is a benefit to the caller, who can see that the high-level op is different
… now the spec describes how those big ops can be implemented in terms of small ops

WebAssembly System Interface (WASI) machine learning module

anssik: Then something new, please welcome Mingqiu_Sun and Andrew_Brown to introduce the WASI-nn proposal

Proposal for WASI-nn: a machine learning module (issue #272)

anssik: Note that we identified this as an area to explore in this group last year, so I'm pleased to see this work is now underway

Web Neural Network API as-is in WASI? (PR #32)

anssik: WASI issue #272 is for discussing the addition of a machine learning module to WASI. It contains a very rough draft of what the API could look like, wasi_ephemeral_nn.witx

wasi_ephemeral_nn.witx

anssik: loosely inspired by the WebNN API, hence the name WASI-nn

Mingqiu_Sun: prepared some slides with background, will walk you through the API after
… WASI is the WebAssembly System Interface, driven by a subgroup of the Wasm CG
… this module is defined in terms of the witx interface format defined by Mozilla
… four months ago the Bytecode Alliance was formed, focused on WASI and on promoting Wasm use outside the browser
… we're happily involved with many activities; we have an open source Wasm VM for constrained devices called Wasm Micro Runtime
… we also worked on the Wasmtime SIMD implementation
… and then this WASI-nn proposal
… the motivation for WASI-nn: we think that after you train your model you need to deploy it to various devices with different architectures and OSs, and Wasm provides the benefit of portability
… we want to start simple, scoping to inferencing initially, inspired by the WebNN model loader API
… I talked with Ningxin about this idea half a year ago
… we want to be framework and model format agnostic
… we want to take a simple initial step, and when this group figures out how to do the model loader API we can reuse the concepts

RafaelCintron: what is the plan to adopt this for JS developers?

Mingqiu_Sun: the current focus is on Wasm; WebNN is to cover JS developers

RafaelCintron: WebNN is for graph builders, and the model loader API is another one; we'd like to eventually put these together
… I think making a Wasm-only API might not make sense, so that we don't exclude web developers

Andrew_Brown: not all use cases are shared between Wasm and JS
… there are runtimes where only Wasm is available
… that's the key reason, to serve runtimes that do not understand JS

RafaelCintron: having the JS and Wasm APIs look and feel the same should be our goal
… so we have a first-class Web API for the model loader

[Andrew sharing witx definition of the proposal]

Andrew: witx is like WebIDL but for defining WASI modules

[reviewing the details of the proposal definition]

wasi_ephemeral_nn.witx
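
As a rough, non-authoritative TypeScript sketch of the flow the witx draft outlines (load a graph, create an execution context, bind inputs, compute, read outputs), expressed as if through a JS-style binding; the real interface works on handles and raw buffers, and the function names below are assumptions that may not match the draft:

    // Hypothetical wrapper around a wasi-nn style host API; every identifier here
    // (wasiNn, load, initExecutionContext, setInput, compute, getOutput) is an
    // assumption used for illustration only.
    async function runInference(wasiNn: any, modelBytes: Uint8Array, input: Float32Array) {
      const graph = await wasiNn.load([modelBytes]);        // load an encoded model
      const ctx = await wasiNn.initExecutionContext(graph); // per-inference state
      await wasiNn.setInput(ctx, 0, input);                 // bind input tensor 0
      await wasiNn.compute(ctx);                            // run inference
      return wasiNn.getOutput(ctx, 0);                      // read output tensor 0
    }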

<Ping> question from ping

Chai: is it correct to assume that WASI wants to be consistent with WebNN, which is defined for a JS audience?

Mingqiu_Sun: that is one goal, we want to align where possible, so we can have a uniform API between different languages

<abrown> Here's a more legible version of the API (auto-generated docs so beware): https://github.com/WebAssembly/wasi-nn/blob/master/phases/ephemeral/docs.md#-wasi_ephemeral_nn

Mingqiu_Sun: but it uses Wasm-specific mechanisms, so it cannot be 100% consistent

Andrew_Brown: it is not exactly the same, but the intent is to be as close as possible to the JS API

https://webmachinelearning.github.io/model-loader/

https://github.com/webmachinelearning/model-loader

https://github.com/webmachinelearning/model-loader/blob/master/explainer.md

ningxin_hu: was on the queue before this topic :)
… this topic is good, and having a consistent interface between the two is a great goal

Mingqiu_Sun: if you have any comments, please submit feedback via the GitHub repo

<abrown> https://github.com/WebAssembly/WASI/issues/272

Andrew_Brown: all feedback should go to the tracking issue https://github.com/WebAssembly/WASI/issues/272

<paul-mcdadniel-msft> it would be great to know if WASI and WASI-nn have started exploring how to layer this on top of operating systems, like WinML and DirectML. how would they pass off the model/graph to the OS to run. thanks for the slides today !!

RafaelCintron_: how much support does WASI have across browsers?

Andrew: it is not aimed at browsers, it is a system interface for standalone runtimes
… Node.js and Wasmtime are the two biggest implementations of WASI

RafaelCintron_: can you use WASI in browsers?

Andrew: there are experimental means to do that

RafaelCintron_: how does WASI handle it if someone wants to load, say, video and keep that on the GPU?

Mingqiu_Sun: WASI itself is in an early phase; work is ongoing on a Crypto API

Andrew: File IO like in POSIX is available

<paul-mcdadniel-msft> thanks !

Adjourn

Summary of resolutions

  1. Merge First Wave Model PR #52
Minutes manually created (not a transcript), formatted by scribe.perl version 117 (Tue Apr 28 12:46:31 2020 UTC).
