W3C Workshop on Web and Machine Learning

Opportunities and Challenges for TensorFlow.js and beyond - by Jason Mayes (Google)

Previous: Machine Learning and Web Media All talks Next: Machine Learning in Web Architecture




Hello, everyone.

My name is Jason Mayes; I am the developer advocate for TensorFlow.js here at Google, and today we'd like to talk to you about some of the opportunities and challenges we've seen whilst creating and maintaining TensorFlow.js. We believe these things will be applicable to the wider machine learning and JavaScript community as well.

So let's get started.

Now, for those of you who are not aware of us, essentially TensorFlow.js is an open source machine learning library that is built in JavaScript.

It allows you to do machine learning in the browser, on the client-side, which means you have lower latency, higher privacy, and lower serving cost of course.

And we also support other environments such as Node.js, which means we can execute in a whole bunch of places.

And in fact, if we look at the next slide, you can see all the environments we run in.

And the reason I bring this up, is because when we're defining web standards, often these things trickle into these other environments as well.

So, we've got all the common web browsers there, but also, Node.js on the back end, React Native for mobile native apps.

We've got Electron for desktop native apps, and of course Raspberry Pi for Internet of Things, which we can access via Node.js.

So, I just want to be mindful when we are thinking about ideas today that we are aware that these things could trickle through to these other areas when people try to use machine learning in JavaScript in these environments as well.

So for those of you who are not familiar with our architecture, this is the current stack.

Right now we have a bunch of pre-made models that sit at the very top there, which are super easy-to-use JavaScript classes.

Just below this, we have a Layers API, which is a high level API to allow you to do machine learning more easily, which is very similar to Keras in Python, if you're familiar with that.

Below these we have our Core and Ops API, which is the more mathematical layer, allowing you to do things like linear algebra and so on.

And this can talk to different environments, such as the client-side or the server-side.

Now if we just focus on the client-side for a second, you can see things like the browser, WeChat, and React Native sitting over there, and each one of these environments understands how to talk to different back-ends, such as the CPU, WebGL, or WebAssembly.

Now, of course the CPU is always available, but it's the slowest form of execution.

If a graphics card is available, we can leverage WebGL to get higher performance on the graphics card, and if WebAssembly is available, we can get higher performance on the CPU by utilizing low-level instructions.

Now, it should also be noted people can also convert models from Python into JavaScript using our converters, as you can see on the left-hand side, and this is something to bear in mind because people might try and load larger models or more complex models in the future via this method.

Now we see three key user journeys right now, when people are using TensorFlow.js.

First one is the ability to run models that are pre-trained, that's the easiest route and what people often start with.

People then choose to try and retrain their models by transfer learning as their next step to work with their own custom data, and then of course a third point is to write their own models completely from scratch.

And this might be in the browser entirely.

Or, it could be a combination of Node.js and then running the resulting model in the browser.

And of course this can be used for anything you might dream up, and here's just a few examples of things people have been creating, that we've seen on the internet today.

Things like augmented reality, sound recognition, sentiment analysis, web page optimization, and much much more.

Well, almost anything.

And today, we'd like to talk to you about some of those limitations and roadblocks that we found whilst building and maintaining TensorFlow.js.

And we believe these, will be applicable to any Machine Learning library created going forward.

So the first point we want to talk about is Float32.

Now, this is great for many tasks, and I know Float64 is even supported in JavaScript.

However, when we're doing model quantization, we actually want to support Float16, and this currently does not exist in JavaScript or in Wasm.

And this is really important to us, so that we can execute models faster, and use less memory when doing so too.

And of course, you might get a 10% drop-off in your model accuracy by doing this, but for some environments that might be acceptable, especially on mobile or older devices, where you might not have the speed to begin with.

Right now, on the server-side, we can actually store things in 16-bit.

However, when we load it into JavaScript memory, it then gets converted to Float32, and we end up using the same memory and have the same speed as before, which means no progress there for us.

So, what if we could support model quantization to use less memory and gain faster inference speeds in JavaScript at runtime?

This is the question we'd like to pose to you today.

Now, of course, to address this, we'd need to do it in both JavaScript and WebAssembly, so that it supports all the environments we will execute in for the foreseeable future, as we showed at the beginning of this presentation.
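To make the gap concrete, here is a minimal sketch of how Float16 has to be emulated in userland JavaScript today: encode each Float32 weight into a 16-bit pattern for storage, then decode it back to Float32 before any math can happen. This is an illustrative sketch, not TensorFlow.js code, and it flushes subnormal values to zero for brevity:

```javascript
// Encode a Float32 value into an IEEE 754 half-precision (Float16) bit pattern.
// Subnormal results are flushed to zero to keep the sketch short.
function float32ToFloat16Bits(val) {
  const f32 = new Float32Array(1);
  const u32 = new Uint32Array(f32.buffer);
  f32[0] = val;
  const x = u32[0];
  const sign = (x >>> 16) & 0x8000;
  const exp32 = (x >>> 23) & 0xff;
  const frac = x & 0x7fffff;
  if (exp32 === 0xff) return sign | 0x7c00 | (frac ? 0x200 : 0); // Inf / NaN
  const exp16 = exp32 - 127 + 15;                // re-bias the exponent
  if (exp16 >= 0x1f) return sign | 0x7c00;       // overflow -> Infinity
  if (exp16 <= 0) return sign;                   // underflow -> signed zero
  return sign | (exp16 << 10) | (frac >>> 13);   // truncate mantissa to 10 bits
}

// Decode a Float16 bit pattern back into a regular JavaScript number,
// since no typed array can hold half-precision values natively today.
function float16BitsToFloat32(h) {
  const u32 = new Uint32Array(1);
  const f32 = new Float32Array(u32.buffer);
  const sign = (h & 0x8000) << 16;
  const exp16 = (h >>> 10) & 0x1f;
  const frac = h & 0x3ff;
  if (exp16 === 0) u32[0] = sign;                                      // zero
  else if (exp16 === 0x1f) u32[0] = sign | 0x7f800000 | (frac << 13);  // Inf / NaN
  else u32[0] = sign | ((exp16 - 15 + 127) << 23) | (frac << 13);
  return f32[0];
}

// Quantized weights can then live in a Uint16Array at half the memory,
// but every read pays a decode cost that native Float16 support would avoid.
const weights = Float32Array.from([0.5, -1.25, 3.0]);
const packed = Uint16Array.from(weights, float32ToFloat16Bits);
const restored = Float32Array.from(packed, float16BitsToFloat32);
```

This is exactly the overhead native support would remove: the packed array halves the memory, but the values still have to be widened back to Float32 before the CPU or GPU can touch them.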

Next up, garbage collection for WebGL.

As you know, JavaScript is really great at cleaning up after itself when you're writing vanilla JavaScript code.

However, the same is not so true for WebGL.

And, as you know, TensorFlow.js uses WebGL to get graphics card acceleration for our Machine Learning models in the web browser and beyond.

So, right now we have a function called tf.tidy() that we've created to clean up after ourselves, if the user puts their code within this function.

However, not all users know about this at the very beginning, especially beginners, and for that reason, it'd be really nice to have the same level of clean-up with graphics card memory as we do with regular JavaScript code.
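The tidy pattern itself can be sketched in plain JavaScript. This is a simplified stand-in, not the real TensorFlow.js implementation: `makeResource` here is a placeholder for allocating a WebGL-backed tensor, and `dispose()` stands in for freeing its texture memory:

```javascript
// A simplified stand-in for the tf.tidy() pattern: resources allocated
// inside the callback are tracked, and everything except the returned
// resource is disposed when the callback finishes.
let currentScope = null;

// Stand-in for a WebGL-backed tensor whose memory the GC cannot reclaim.
function makeResource(name) {
  const res = { name, disposed: false, dispose() { this.disposed = true; } };
  if (currentScope) currentScope.push(res); // record allocations in the scope
  return res;
}

function tidy(fn) {
  const parentScope = currentScope;
  currentScope = [];
  const result = fn();
  for (const res of currentScope) {
    if (res !== result) res.dispose(); // free every intermediate resource
  }
  currentScope = parentScope; // restore the enclosing scope, if any
  return result;
}

// Intermediates "a" and "b" are cleaned up automatically; "out" survives.
const out = tidy(() => {
  const a = makeResource('a');
  const b = makeResource('b');
  return makeResource('out');
});
```

The point of the question above is that beginners should not need to know this pattern at all: with vanilla JavaScript objects the garbage collector does this bookkeeping for them.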

So the question here is: how can we clean up WebGL memory as well?

So we know that WebGPU is also coming down the line, but maybe this needs to be addressed in that specification as well.

Can we clean up graphics card memory both in WebGPU and WebGL?

And this might also benefit people working with 3D graphics and other things too, beyond even the machine learning space.

Next up, graphics card acceleration.

Currently, we have WebGL to execute ops in the machine learning model as we previously discussed, but it'd be much more efficient if the browser exposed lower-level APIs to the graphics card so we could more efficiently leverage the hardware.

Now, the question here is: what lower-level support do we need for efficient machine learning when using the graphics card?

And of course, WebGPU is on the way, but what else needs to be added to that spec to ensure we have something that works well specifically for machine learning?

Next up, we've got Model Security.

Now we see a lot of production use cases that require the model to be securely delivered to the client, in a way that it can't be copied and used on other websites.

Large corporate brands especially spend a lot of money and time creating these models, and they won't just give away their IP for free.

So, our question here is: how can we deliver a machine learning model to the JavaScript environment in the web browser without revealing it? Maybe there's a secure way to grab some arbitrary JavaScript code from the server that does the machine learning, along with the model and the assets it needs to execute, and have all of that downloaded by the browser behind the scenes into private memory, which cannot be accessed by the JS developer on the front end.

However, they can do some kind of remote procedure call to that code, so that they can execute it and get results back, without exposing the model itself.

And this is up for discussion of course; that's just one example of how it could be solved, which would require browser-level implementation support to do properly.
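As a userland approximation of that idea, the closest we can get today is hiding the model behind a closure (or a Worker message boundary) so application code only sees a call interface, never the weights. The sketch below is hypothetical and illustrates only the shape of such an RPC surface; it is not real protection, since devtools can still inspect anything in page memory, which is exactly why browser-level support would be needed:

```javascript
// Hypothetical sketch: a model "service" that exposes only a predict()
// entry point. The weights live in the closure, so ordinary application
// code never receives a reference to them. (This is NOT real security:
// only browser-level support could make the memory truly inaccessible.)
function createModelService(weights) {
  // Toy "model": a single dot product, standing in for real inference.
  function run(input) {
    return input.reduce((sum, x, i) => sum + x * weights[i], 0);
  }
  return {
    // RPC-style call surface: inputs in, results out, weights stay hidden,
    // mirroring how a remote procedure call to private memory might look.
    predict(input) {
      return Promise.resolve(run(input));
    },
  };
}

const service = createModelService([0.5, -1.0, 2.0]);
// Callers can get predictions, but service.weights is not exposed.
```

In a browser-supported version, the closure boundary would be replaced by memory the page's JavaScript genuinely cannot reach, with only the asynchronous call surface exposed.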

And currently, this is a big barrier for many people trying to go into production use cases who still want the benefits of running on the client side, such as privacy, lower latency, and cost savings on the server.

And of course, as soon as you put the model on the server-side, those benefits disappear because you have to then send the data from the client to the server.

Next up, model warm-up. It can take a couple of uses before the model can actually run at optimal speed in the browser environment.

Of course, first of all you need to download the model.

Secondly, you need to load it into memory and parse all that stuff, and then thirdly, you need to run some data through it once to get everything else set up.

And this can take a non-trivial amount of time, especially for larger models.

So the question here becomes: what if there were a standardized way to specify that a better model is available and should be prepared and swapped to when ready, kind of like progressive enhancement?

Now taking a very hypothetical example, maybe you've got an object recognition model, and this could be something like COCO SSD that gives you the bounding box data.

This loads really fast in the web browser right now and can be used very quickly and efficiently.

But maybe your end goal is to actually have some kind of image segmentation model which might be heavier to load.

So what if you could take that initial smaller model, load that, get some results coming in straight away, but once the heavier model is actually ready, switch to it automatically?

And this could be very interesting as things progress and we start seeing larger models being used in the web environment in many years to come.
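The fallback-then-upgrade flow described above can be sketched with two hypothetical async loaders, standing in for, say, a COCO-SSD loader and a heavier segmentation model loader:

```javascript
// Sketch of progressive model enhancement: serve predictions from a small,
// fast-loading model immediately, and swap to a heavier, better model once
// it has finished loading in the background.
// `loadFast` and `loadAccurate` are hypothetical async loader functions.
function progressiveModel(loadFast, loadAccurate) {
  let current = null;
  const ready = loadFast().then((model) => {
    if (!current) current = model; // don't downgrade if the big model won the race
    return model;
  });
  loadAccurate()
    .then((model) => { current = model; }) // silent upgrade when ready
    .catch(() => {});                      // keep the fast model if the upgrade fails
  return {
    ready, // resolves once predictions are possible at all
    predict(input) {
      if (!current) throw new Error('no model loaded yet');
      return current.predict(input);
    },
  };
}
```

A real implementation would also want to warm the heavier model up with a dummy inference before swapping, so the first user-visible call to it does not pay the setup cost; a standardized mechanism could handle both the swap and the warm-up behind the scenes.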

And with that, thanks for watching and I encourage people to check out the #MadeWithTFJS hashtag on Twitter or LinkedIn to see what our community has been making.

You can see a whole bunch of awesome stuff that our community has made, which might inspire you with other questions for this discussion as well.

People are really pushing the boundaries by combining TensorFlow.js with things like WebGL, WebRTC, and WebXR, bringing all these other web standards together with machine learning to do many great things.

So, feel free to check that out, and we'd love to talk to you later on for the full discussion about this topic.

Thank you very much for watching.



Thanks to Futurice for sponsoring the workshop!


Video hosted by WebCastor on their StreamFizz platform.