W3C Workshop on Web and Machine Learning

Introducing WASI-NN - by Mingqiu Sun & Andrew Brown (Intel)

Hello, this is Mingqiu Sun.

I have my colleague, Andrew Brown with me.

Today, we're going to talk about WASI-NN, which stands for the WebAssembly System Interface for Neural Networks.

So let's talk about what WASI is.

WASI stands for WebAssembly System Interface.

It's a subgroup of the WebAssembly Community Group, and it is the official group defining the WASI APIs. Here is the GitHub repo, and we hold a biweekly video meeting.

WASI uses witx to define its interfaces.

So let me give some background information about the Bytecode Alliance.

It is an open-source community dedicated to creating a secure new software foundation, building on top of standards such as WebAssembly and WASI.

Mozilla, Fastly, Intel, and Red Hat are the founding members.

Currently, we have three major engagements with that organization.

We contribute a small-footprint WebAssembly implementation called the WebAssembly Micro Runtime.

We are responsible for the SIMD implementation in Wasmtime, and we are driving the WASI neural network interface definition and its proof of concept (POC).

Those are the three major activities we are engaged in.

And today, we're going to talk about WASI-NN.

So why are we talking about this?

What's the motivation behind a WASI neural network interface?

In a typical machine learning scenario, trained models need to be deployed on a wide variety of devices with different architectures and operating systems.

WebAssembly is a perfect format for this kind of deployment because it is platform-independent.

So why WASI?

Why not do machine learning completely inside WebAssembly?

The main reason is that machine learning typically requires special hardware support in order to achieve high performance.

For example, on CPUs you typically need AVX-512 to get extremely good performance, and similarly you might need a GPU or a TPU for hardware acceleration.

Machine learning is also still evolving rapidly, with new operations and new network topologies emerging continuously.

So it makes sense to have a system interface that connects to specialized implementations of those topologies and operations outside the WebAssembly domain.

So here are a few design considerations.

We are focusing on what is called a model-loader API first, because inference accounts for the vast majority of machine learning use cases.

That is the reason it is our initial focus; we plan to add the training part later on.

Second, a model-loader API is simpler and offers excellent IP protection.

You don't need to expose the internal details of your machine learning model through this API.

It was inspired by the WebNN effort, which has a very similar model-loader API, and we had a joint review with them.

Our intention is to make this API framework-agnostic and model-format-agnostic, and we expect it to be supported on a wide variety of devices, such as CPUs, GPUs, FPGAs, and TPUs.

So, for the next slide, we're going to turn it over to my colleague, Andrew, to cover the actual API definition.

Hi. The proposed WASI-NN interface is available at this link.

If you look at some of the examples from this slide, you'll see that it specifies a way to describe tensors, a way to load models, and a way to execute inference requests using those tensors and loaded models.

The proposal does not yet include a mechanism for training models.

Also notable is that it doesn't specify the encoding of the model.

So the model format is opaque at the WASI-NN interface level.

That means that the implementation of WASI-NN, for example inside a WebAssembly runtime, would have to understand the model format in order to perform the inference.
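To make the shape of the API concrete, here is a minimal sketch of guest code written against the early Rust bindings for the proposal (the `wasi_nn` crate). The exact names and signatures are illustrative and may change as the proposal evolves; the `classify` helper and the 1x3x224x224 input and 1001-score output shapes are hypothetical, chosen to suggest an image-classification model. A load call producing the `graph` handle is sketched after the POC discussion below.

```rust
// Sketch only: based on the early `wasi_nn` Rust bindings for this proposal;
// names may shift as the proposal evolves. `graph` comes from `wasi_nn::load`
// (sketched later) and `input` holds the raw little-endian f32 bytes of a
// hypothetical 1x3x224x224 image tensor.
fn classify(graph: wasi_nn::Graph, input: &[u8]) -> Vec<f32> {
    unsafe {
        // Bind the loaded graph to an execution context.
        let context = wasi_nn::init_execution_context(graph).unwrap();

        // Describe the input tensor: dimensions, element type, raw data.
        let tensor = wasi_nn::Tensor {
            dimensions: &[1, 3, 224, 224],
            r#type: wasi_nn::TENSOR_TYPE_F32,
            data: input,
        };
        wasi_nn::set_input(context, 0, tensor).unwrap();

        // Run inference on whatever hardware the runtime has selected.
        wasi_nn::compute(context).unwrap();

        // Copy the output tensor back out (e.g. 1001 classification scores).
        let mut output = vec![0f32; 1001];
        wasi_nn::get_output(
            context,
            0,
            output.as_mut_ptr() as *mut u8,
            (output.len() * std::mem::size_of::<f32>()) as u32,
        )
        .unwrap();
        output
    }
}
```

The pervasive `unsafe` reflects the raw, generated nature of those early bindings rather than anything specific to the design.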

What you're looking at here is a simple architecture diagram of the POC that we're working on in Wasmtime.

Once it's complete, you'll be able to take user application code and a WASI-NN header and compile them to a WASM file.

If you combine that with a trained model, potentially converted using OpenVINO's model optimizer, you can hand those over to Wasmtime, and Wasmtime will execute the WASM file, using the model file to perform inference.

The path we're taking right now is to implement the WASI-NN interface using OpenVINO, which allows us to perform inference on a variety of different hardware.
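As a rough sketch of that path (again against the early Rust bindings, with placeholder file names), the guest reads the two IR artifacts produced by the model optimizer and hands them to `wasi_nn::load` together with an encoding tag and the desired execution target; the `load_openvino_model` helper here is hypothetical:

```rust
// Illustrative only: load an OpenVINO IR model (the model optimizer's output).
// The file names are placeholders; the files are visible to the module
// through a preopened WASI directory.
fn load_openvino_model() -> wasi_nn::Graph {
    let xml = std::fs::read("fixture/model.xml").unwrap(); // network topology
    let weights = std::fs::read("fixture/model.bin").unwrap(); // trained weights
    unsafe {
        wasi_nn::load(
            // The interface treats these buffers as opaque bytes...
            &[&xml[..], &weights[..]],
            // ...and this tag tells the backend how to decode them.
            wasi_nn::GRAPH_ENCODING_OPENVINO,
            // OpenVINO can retarget the same model to different hardware.
            wasi_nn::EXECUTION_TARGET_CPU,
        )
        .unwrap()
    }
}
```

The resulting module is an ordinary wasm32-wasi binary, so the same WASM file runs unchanged on any Wasmtime build with wasi-nn support enabled.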

Back to you, Mingqiu.

Okay.

So, a call to action.

This is still an early-stage proposal, and we would welcome your input; please engage with us in the WASI community.

As I said, it is at an early stage, so it is easy to change if you see anything you don't like.

Thank you very much.
