W3C Workshop on Web and Machine Learning

ml5.js: Friendly Machine Learning for the Web - by Yining Shi (New York University, RunwayML)


Hello, everyone.

My name is Yining.

I am an adjunct professor at New York University.

I work on ml5.js, a friendly machine learning JavaScript library.

In this talk, I will give a brief introduction to ml5.js, explain how it's built, and discuss some challenges in the development process.

Ml5.js aims to make machine learning more approachable to a broad audience of artists, designers, creative coders and students.

The library provides access to machine learning algorithms and models in the browser, building on top of TensorFlow.js with no other external dependencies.

Ml5.js is inspired by Processing and p5.js, whose goal is to empower people of all interests and backgrounds to learn how to program and make creative work with code.

However, to get started with machine learning, one needs an advanced understanding of math and programming.

And we'd like to make this process easier so that machine learning can be something that everyone can learn, understand and explore freely.

What does ml5.js do?

It provides immediate access to pre-trained models in the browser and we can also build and train our own neural networks in the browser from scratch.

Here is an example of MobileNet, an image classification model, running in the browser with just a few lines of code.

Here is an example of running PoseNet to detect human poses.

Ml5.js also provides a friendly API that returns more human-readable results, which can then be drawn on the canvas with, for example, p5.js.
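
As a rough illustration of what working with human-readable results can look like, the sketch below filters pose keypoints by confidence before they are drawn. The result shape (a `pose` object with `keypoints`, each carrying `part`, `score` and `position`) is an assumption based on the PoseNet wrapper in ml5.js 0.x, and `confidentKeypoints` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: keep only keypoints the model is reasonably sure about.
// Assumed result shape (ml5.js 0.x PoseNet): [{ pose: { keypoints: [...] } }, ...]
function confidentKeypoints(poses, minScore = 0.2) {
  const points = [];
  for (const { pose } of poses) {
    for (const keypoint of pose.keypoints) {
      if (keypoint.score >= minScore) {
        points.push({
          part: keypoint.part,
          x: keypoint.position.x,
          y: keypoint.position.y,
        });
      }
    }
  }
  return points;
}

// Example with a hand-written result in the assumed shape.
const samplePoses = [{
  pose: {
    keypoints: [
      { part: 'nose', score: 0.98, position: { x: 120, y: 80 } },
      { part: 'leftEar', score: 0.05, position: { x: 140, y: 78 } },
    ],
  },
}];
console.log(confidentKeypoints(samplePoses)); // [ { part: 'nose', x: 120, y: 80 } ]
```

In a p5.js sketch, each returned point could then be drawn with a call such as `ellipse(x, y, 10, 10)`.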

This is an example of running a Style Transfer model with your webcam in real time.

Besides running pre-trained models, we can also create our own neural networks with ml5.js.

This is a demo of how we can create a neural network that classifies RGB values into common color names.

With ml5.js, we can load the data, create the model, train it, and run it.

With the debug mode enabled, ml5.js can also visualize the training progress on the right-hand side.

It helps us to debug and improve our neural network.
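
The load, create, train and run steps described above might look roughly like the following. This is a sketch that assumes the `ml5.neuralNetwork` API of ml5.js 0.x (`addData`, `normalizeData`, `train`, `classify`, with error-first callbacks); the function names, the sample format and the `epochs` value are illustrative. The `ml5` object is passed in as a parameter so the snippet does not depend on a global.

```javascript
// Sketch under assumptions: `ml5` is the object exposed by the ml5.js script,
// and `samples` is a hypothetical array of { r, g, b, label } objects.
function trainColorClassifier(ml5, samples, onTrained) {
  // Create the model; debug: true asks ml5.js to visualize training progress.
  const nn = ml5.neuralNetwork({ task: 'classification', debug: true });

  // Load the data: RGB values as inputs, a color name as the output.
  for (const { r, g, b, label } of samples) {
    nn.addData({ r, g, b }, { label });
  }
  nn.normalizeData();

  // Train, then hand the trained network back to the caller.
  nn.train({ epochs: 32 }, () => onTrained(nn));
}

// Run the model once training has finished.
function classifyColor(nn, rgb, onResult) {
  nn.classify(rgb, (error, results) => {
    if (error) {
      onResult(error, null);
      return;
    }
    // Results are assumed to be sorted, most confident label first.
    onResult(null, results[0].label);
  });
}
```

A call such as `trainColorClassifier(ml5, samples, (nn) => classifyColor(nn, { r: 255, g: 0, b: 0 }, console.log))` would then train on the samples and log the predicted color name.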

Here is a collection of other models and methods that ml5.js provides.

You can learn more about them on the ml5 website.

Ml5 has a wide collection of image, sound and text-based models with a variety of applications, such as detecting objects, human bodies, hand poses and faces, generating text, images and drawings, performing image-to-image translation, classifying audio, detecting pitch and analyzing words and sentences.

Ml5.js also provides NeuralNetwork, FeatureExtractor, KNNClassifier and KMeans as helper functions.

How do I use ml5.js?

We can run a model in the browser with ml5.js in three simple steps.

First, create a model.

Secondly, ask the model to classify or predict something based on an input, such as an image or a piece of text.

And step three, get the results.
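
The three steps can be sketched as follows. This assumes the callback-based API of ml5.js 0.x, in which `ml5.imageClassifier` takes a model name plus a ready callback and `classify` reports through an error-first callback; other releases may differ. `classifyImage` is a hypothetical wrapper, and `ml5` is passed in as a parameter rather than read from the global scope.

```javascript
// Sketch under assumptions: `ml5` is the object exposed by the ml5.js script,
// and `image` is an input the classifier accepts, such as an <img> element.
function classifyImage(ml5, image, onResult) {
  // Step 1: create a model. The callback fires once the model has loaded.
  const classifier = ml5.imageClassifier('MobileNet', () => {
    // Step 2: ask the model to classify an input.
    classifier.classify(image, (error, results) => {
      // Step 3: get the results, assumed to be { label, confidence } objects
      // sorted with the most confident guess first.
      if (error) {
        onResult(error, null);
        return;
      }
      onResult(null, results[0]);
    });
  });
}
```

In a page that loads ml5.js, this could be called as `classifyImage(ml5, document.getElementById('photo'), (err, top) => console.log(top.label))`, where `photo` is an assumed element id.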

It also integrates well with p5.js, a JavaScript library for creating graphics and animations in the browser, which makes it easier to get input from a webcam or microphone and to show output with canvas, image or audio elements.

How is ml5.js built?

Besides the core library, the ml5.js project also includes examples, documentation, guides for training and data collection, and learning materials for workshops and courses.

Ml5.js extends the functionality of tf.js.

It uses the tf.js models, the Data API, the Layers API and the Face API.

Under the hood, it utilizes the CPU, WebGL, or WebAssembly in the browser.

Ml5.js provides a high-level and beginner-friendly API to users.

Web applications are very accessible.

There are a lot of web applications made by the ml5.js community.

Here are a few examples.

A Whac-A-Mole game that you can play with your webcam, a flying game where you control your character with your voice, and an interactive story-reading experiment that uses your voice as input to generate stories and drawings.

There are many more applications built with ml5.js that you can find on its community page.

People appreciate how little effort it takes to use existing browser APIs.

For example, they can use webcams and microphones as input and easily render output to image, canvas, audio or text elements in the DOM.

So web applications are a great fit for creative projects.

Webcam video, audio and mouse interactions are often used as input to models and the conversion between those formats is often a multi-step process.

Therefore, having native support for converting browser I/O streams to model inputs and vice versa would be very helpful.

For example, TensorFlow.js models support HTML video or image elements as model inputs.
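
To make the multi-step point concrete, here is a minimal sketch of one such conversion step written by hand: turning RGBA pixel bytes, the layout of `ImageData.data` returned by a canvas `getImageData()` call, into normalized RGB floats. The function name `rgbaToNormalizedRgb` is hypothetical. In practice a library usually handles this; TensorFlow.js, for instance, offers `tf.browser.fromPixels` for image, video and canvas elements.

```javascript
// Convert RGBA bytes (4 per pixel) into normalized RGB floats in [0, 1].
function rgbaToNormalizedRgb(rgba) {
  if (rgba.length % 4 !== 0) {
    throw new RangeError('Expected 4 bytes (R, G, B, A) per pixel');
  }
  const pixelCount = rgba.length / 4;
  const rgb = new Float32Array(pixelCount * 3);
  for (let i = 0; i < pixelCount; i++) {
    rgb[i * 3] = rgba[i * 4] / 255;         // red
    rgb[i * 3 + 1] = rgba[i * 4 + 1] / 255; // green
    rgb[i * 3 + 2] = rgba[i * 4 + 2] / 255; // blue (alpha is dropped)
  }
  return rgb;
}

// Two pixels: opaque red and half-transparent white.
const pixels = new Uint8ClampedArray([255, 0, 0, 255, 255, 255, 255, 128]);
console.log(Array.from(rgbaToNormalizedRgb(pixels))); // [ 1, 0, 0, 1, 1, 1 ]
```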

In the process of porting models into ml5.js, the first thing to consider is the model size.

It needs to be small enough so we can load it into the browser.

Secondly, to support real-time interaction in the browser, the model needs to have low latency.

The last thing is the model format.

It should be portable to the web.
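
The size and latency constraints can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic only; the parameter count and frame rate are hypothetical, and both helper names are made up for illustration.

```javascript
// Rough download size of the weights, assuming every parameter is stored
// at a fixed width (4 bytes for float32).
function estimateWeightBytes(parameterCount, bytesPerWeight = 4) {
  return parameterCount * bytesPerWeight;
}

// Time available per frame if inference must keep up with a given frame rate.
function frameBudgetMs(framesPerSecond) {
  return 1000 / framesPerSecond;
}

// Hypothetical candidate: 5 million float32 parameters, 30 frames per second.
console.log(estimateWeightBytes(5e6) / 1e6); // 20 (megabytes)
console.log(frameBudgetMs(30).toFixed(1));   // 33.3 (milliseconds)
```

A model that needs far more than the frame budget per prediction will feel laggy in an interactive sketch, even if it loads quickly.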

Here is a common workflow of porting a pre-trained model into ml5.js.

A model from a machine learning research paper might be implemented in other frameworks, like PyTorch.

So first, we need to implement the model in TensorFlow and train it.

Then we convert the model into the tf.js format with the tf.js converter.

And lastly, we port it into ml5.js to provide a high-level API to users.
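
The last step, exposing a high-level API on top of a converted model, often comes down to turning raw model output into readable results. Here is a minimal sketch, assuming the model emits one probability per class. `toReadableResults` and the labels are hypothetical, and this is not the actual ml5.js implementation; the `{ label, confidence }` shape simply mirrors the kind of result the ml5.js classifiers return.

```javascript
// Hypothetical wrapper step: pair raw class probabilities with labels and
// return the top matches, most confident first.
function toReadableResults(probabilities, labels, topK = 3) {
  if (probabilities.length !== labels.length) {
    throw new RangeError('Expected one label per probability');
  }
  return Array.from(probabilities, (confidence, i) => ({ label: labels[i], confidence }))
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, topK);
}

const rawOutput = [0.1, 0.7, 0.2];
console.log(toReadableResults(rawOutput, ['cat', 'dog', 'bird'], 2));
// [ { label: 'dog', confidence: 0.7 }, { label: 'bird', confidence: 0.2 } ]
```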

Here, the first step, implementing the model in TensorFlow and training it, is the most time-consuming, and not all operations are supported across different machine learning frameworks.

Therefore, it would be very helpful to have a standard model format for the web, or a tool that makes this step easier.

The ONNX project is making conversion between different machine learning frameworks easier.

Here are some more links about ml5.js.

That's all from me.

Thank you so much for watching.


Thanks to Futurice for sponsoring the workshop!


Video hosted by WebCastor on their StreamFizz platform.