W3C Workshop on Web and Machine Learning

ONNX.js - A Javascript library to run ONNX models in browsers and Node.js - by Emma Ning (Microsoft)

Previous: Fast client-side ML with TensorFlow.js All talks Next: Paddle.js - Machine Learning for the Web



Slide 1 of 40

Hi, everyone.

This talk is about ONNX.js, a JavaScript library to run ONNX models in browsers and Node.js.

This is Emma from Microsoft.

JavaScript is one of the most important languages.

According to web technology surveys, JavaScript is used by 95% of websites, and it tops the list of the most popular client-side languages.

Another important scenario using JavaScript is Electron apps.

Electron enables you to create desktop applications with pure JavaScript by providing a runtime with rich native APIs.

If you can build a website, you can build a desktop app with Electron.

There are a lot of well-known apps built with Electron, such as Slack, VS Code, and GitHub Desktop.

All of them are done with Node.js through Electron, and the experience is pretty good.

Same as websites, Electron apps are cross-platform, compatible with Mac, Windows and Linux.

As you know, machine learning has been widely used for improving product experience.

Can we run machine learning with JavaScript in client-side applications?

Originally, people had some concerns, given that JavaScript isn't designed for high-performance computing and machine learning requires significant computation to execute neural network models.

Actually, there are a lot of good techniques that make JavaScript and machine learning work quite well together for developing more engaging and advanced client-side AI capabilities.

Then, there are some well-known benefits of using client-side machine learning, like privacy protection.

Since client-side models work offline, users do not need to worry about their data being sent across the Internet.

Another is realtime analysis: although client-side hardware may be slow, it's almost certainly faster than waiting for results from a server when users need to upload big data over a bad network.

It makes livestream video analysis possible.

Even with no connection to the Internet, the client-side machine learning experience isn't broken.

When client-side AI applications are developed with JavaScript, AI developers can easily enable a consistent AI experience across platforms, accelerate performance by utilizing GPUs, and distribute the experience to users without asking them to install any libraries or drivers.

Similar to TensorFlow.js, ONNX.js is another framework that provides the capability of running machine learning models with JavaScript.

The model format ONNX.js supports is ONNX.

So allow me to give a brief introduction of ONNX first.

ONNX stands for Open Neural Network Exchange; it is an open standard for representing machine learning models.

As a standard, it defines three things, an extensible computation graph, standard data types, and built-in operators.

Here is an example of ONNX model.

The spec supports both DNN and traditional machine learning models.

As an open standard, the beauty of ONNX is framework interoperability.

As long as a model is trained through a framework which supports ONNX, you can convert that model to ONNX format.

Here are some of the popular frameworks that support ONNX conversion.

For some of these, like PyTorch, ONNX format export is built in natively; for others, like TensorFlow and Keras, there are separate installable packages that can handle the conversion.

There is already support for many popular models, including object detection models like Fast R-CNN, speech recognition models, and NLP models including BERT and other transformers.

Since the ONNX community was established in 2017 by Microsoft and Facebook, it has been attracting more and more companies.

Today, the ONNX community is made up of over 40 companies.

Last year, the ONNX project was accepted into the Linux Foundation as a graduated project.

This is a key milestone in establishing ONNX as a vendor-neutral open format standard.

ONNX.js is a pure JavaScript implementation for running ONNX models, which allows users to run them in browsers and Node.js.

ONNX.js optimizes model inference on both CPU and GPU by leveraging several advanced techniques.

I will talk about the details later.

The graph on the left is the high-level architecture of ONNX.js.

The graph engine loads the ONNX model file and interprets it into a model DAG; the execution engine then calls the appropriate backend to execute the model and get the output.

There are three backends enabled: two for CPU, using plain JavaScript and WebAssembly respectively, and one for GPU, using WebGL.

ONNX.js also provides a profiler, logger, and other utilities for easy debugging and analysis.

Except Firefox on Android, ONNX.js supports all browsers on major platforms.

So you can easily build AI applications across platforms with ONNX.js.

For running on CPU, ONNX.js adopts WebAssembly to accelerate the model to near-native speed.

WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms.

It's generally much faster than JavaScript for heavy workloads in a machine learning task.

JavaScript is dynamically typed and garbage-collected, which can cause significant slowdowns at runtime.

Based on our evaluation, compared to plain JavaScript, WebAssembly can improve performance by over 11 times.
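As an illustration of how JavaScript hands heavy work to WebAssembly, here is a minimal sketch: a tiny hand-assembled module exporting an `add` function, loaded through the standard `WebAssembly.instantiate` API. This is a generic example, not one of ONNX.js's actual kernels.

```javascript
// A tiny WebAssembly module, hand-assembled as raw bytes, exporting
// add(a, b) -> a + b. Real kernels are compiled from C/C++/Rust, but the
// JavaScript side of loading and calling them looks the same.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section header
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b                    // local.get 0, local.get 1, i32.add, end
]);

// WebAssembly.instantiate works identically in browsers and Node.js.
const wasmAdd = WebAssembly.instantiate(wasmBytes)
  .then(({ instance }) => instance.exports.add);

wasmAdd.then((add) => console.log(add(2, 3))); // prints 5
```

In practice, a framework ships a compiled `.wasm` binary and invokes its exported kernels in much the same way, passing typed-array views over the module's memory.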

We have enabled WebAssembly as one CPU backend since ONNX.js was open sourced in 2018.

One year later, TensorFlow.js started exploring WebAssembly.

Furthermore, ONNX.js utilizes web workers to provide a multi-threaded environment for operator parallelization.

Originally, web workers were introduced to unblock UI rendering.

They allow you to create additional threads to run other long-running computations separately.

ONNX.js leverages web workers to enable parallelization within heavy operators, which significantly improves performance on machines with multiple cores.

By taking full advantage of WebAssembly and web workers, the final result shows an over 19 times speedup on a CPU with four cores.

WebGL is adopted for GPU acceleration.

WebGL is a popular standard for accessing GPU capabilities.

It's a JavaScript API for rendering interactive 2D and 3D graphics within any compatible web browser.

WebGL is based on OpenGL, which provides direct access to the computer's GPU.

Graphics creation in JavaScript is similar to machine learning, because both require fast processing power to animate and draw detailed vectors.

Based on WebGL, ONNX.js enables many optimizations for reducing data transfer between CPU and GPU, as well as reducing GPU processing cycles, to push performance further toward the maximum.

Here is a chart showing performance improvements along with some major optimizations.

Finally, we were able to reduce the latency of ResNet50 on GPU by more than three times.


If you want to run a model with ONNX.js, here is the end-to-end flow.

You can train a model through any framework supporting ONNX, convert it to ONNX format using public conversion tools, then run inference on the converted model with ONNX.js.

This is an HTML example of using ONNX.js, with mainly three steps: create an ONNX session, load the ONNX model and generate inputs, then run the model with session.run.
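The three steps look roughly like this. The script URL follows the ONNX.js README as commonly published (verify the version), and `./model.onnx` plus the 1×3×224×224 input shape are placeholders for your own model:

```html
<html>
  <body>
    <!-- Load the ONNX.js bundle. -->
    <script src="https://cdn.jsdelivr.net/npm/onnxjs/dist/onnx.min.js"></script>
    <script>
      async function run() {
        // 1. Create an ONNX inference session.
        const session = new onnx.InferenceSession();

        // 2. Load the ONNX model and generate inputs.
        //    './model.onnx' and the 1x3x224x224 shape are placeholders.
        await session.loadModel('./model.onnx');
        const input = new onnx.Tensor(
          new Float32Array(1 * 3 * 224 * 224), 'float32', [1, 3, 224, 224]);

        // 3. Run the model and read the first output tensor.
        const outputMap = await session.run([input]);
        const outputTensor = outputMap.values().next().value;
        console.log(outputTensor.data);
      }
      run();
    </script>
  </body>
</html>
```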

You can also use npm and bundling tools to consume ONNX.js.

To demonstrate Web ML capabilities and help users ramp up with ONNX.js easily, we built the ONNX.js demo website.

Five models are enabled on this website.

Here is an example of running the YOLO model in a browser.

You can choose a different backend: CPU or GPU.
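In code, the choice is made when creating the session; assuming ONNX.js's `backendHint` session option (with values like `'cpu'`, `'wasm'`, and `'webgl'`), it looks like:

```javascript
// Hint the desired backend when creating the session; exact fallback
// behavior when the hinted backend is unavailable depends on the library version.
const session = new onnx.InferenceSession({ backendHint: 'webgl' });
```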

Since YOLO is a realtime neural network for object detection, in addition to image detection we implemented a realtime detection scenario using your local camera.

ONNX.js is evolving and we'd love to embrace your contribution.

Here are three major buckets to make ONNX.js better.

Currently, ONNX.js supports a limited set of ONNX operators; we need to catch up with the evolving ONNX spec.

There are still a lot of opportunities to further optimize ONNX.js performance.

For example, WebNN, the Web Neural Network API, is one promising technology ONNX.js can integrate.

Some experimental results have already shown very good performance gains.

Lastly, more demos can help attract more users by promoting ONNX.js capabilities.


That's the end.

Hope you enjoy this talk.




Thanks to Futurice for sponsoring the workshop!


Video hosted by WebCastor on their StreamFizz platform.