So let's get started.
TensorFlow.js allows you to do machine learning in the browser, on the client side, which means you have lower latency, higher privacy, and lower serving costs of course.
And we also support other environments such as Node.js, which means we can execute in a whole bunch of places.
And in fact, if we look at the next slide, you can see all the environments we run.
And the reason I bring this up, is because when we're defining web standards, often these things trickle into these other environments as well.
So, we've got all the common web browsers there, but also, Node.js on the back end, React Native for mobile native apps.
We've got Electron for desktop native apps, and of course Raspberry Pi for Internet of Things, which we can access via Node.js.
So for those of you who are not familiar with our architecture, this is the current stack.
Just below this, we have a Layers API, a high-level API that allows you to do machine learning more easily; it's very similar to Keras in Python, if you're familiar with that.
Below that, we have our Core and Ops API, which is the more mathematical layer that allows you to do things like linear algebra, and so on and so forth.
And this can talk to different environments, such as the client-side or the server-side.
Now, if you focus on the client side for a second, you can see things like the browser, WeChat, and React Native sitting over there, and each one of these environments understands how to talk to different back ends, such as the CPU, WebGL, or WebAssembly.
Now, of course the CPU is always available, but it's the slowest form of execution.
If a graphics card is available, we can leverage WebGL to get higher performance from it, and if WebAssembly is available, we can get higher performance on the CPU by utilizing low-level instructions.
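That fallback order can be sketched as a simple feature check. This is a hypothetical illustration of the idea, not the real TensorFlow.js backend-selection code; the `env` flags are assumptions standing in for real feature detection.

```javascript
// Hypothetical sketch of the backend fallback order described above.
// `env` is a stand-in for real feature detection, not a TensorFlow.js API.
function pickBackend(env) {
  if (env.hasWebGL) return 'webgl'; // GPU acceleration when a graphics card is available
  if (env.hasWasm) return 'wasm';   // faster CPU execution via low-level instructions
  return 'cpu';                     // always available, but the slowest option
}
```

In practice the detection itself is the hard part; the point is simply that the CPU path is the guaranteed baseline and everything else is an upgrade.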
Now we see three key user journeys right now, when people are using TensorFlow.js.
The first is the ability to run pre-trained models; that's the easiest route and what people often start with. People then often retrain those models via transfer learning as a next step, so they can work with their own custom data. And then, of course, the third is to write their own models completely from scratch.
And this might be in the browser entirely.
Or, it could be a combination of Node.js and then running the resulting model in the browser.
And of course this can be used for anything you might dream up, and here are just a few examples of things people have been creating that we've seen on the internet today.
Things like augmented reality, sound recognition, sentiment analysis, web page optimization, and much much more.
And today, we'd like to talk to you about some of the limitations and roadblocks that we've found while building and maintaining TensorFlow.js. And we believe these will be applicable to any machine learning library created going forward.
So the first point we want to talk about is Float32, and being able to use smaller number formats than that. This is really important to us, so that we can execute models faster and use less memory when doing so too. Of course, you might get a 10% drop-off in your model accuracy by doing this, but for some environments that might be acceptable, especially on older mobile devices, where you might not have the speed to begin with. Right now, on the server side, we can actually store things in 16-bit. So the question we'd like to pose to you today is: how could the browser environment support that kind of 16-bit storage as well?
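To make the storage trade-off concrete, here is a minimal sketch of one common approach, affine quantization: mapping Float32 weights onto 16-bit integers using the tensor's min/max range, which halves storage at the cost of some precision. The function names are hypothetical, and this illustrates the general idea rather than any actual TensorFlow.js implementation.

```javascript
// Hypothetical sketch: affine quantization of Float32 weights into 16-bit
// integers. Storage is halved; values are only recoverable to within `scale`.
function quantize16(weights) {
  const min = Math.min(...weights);
  const max = Math.max(...weights);
  const scale = (max - min) / 65535 || 1; // guard against a constant tensor
  const q = Uint16Array.from(weights, w => Math.round((w - min) / scale));
  return { q, min, scale };
}

function dequantize16({ q, min, scale }) {
  return Float32Array.from(q, v => v * scale + min);
}
```

The reconstruction error is bounded by the scale factor, which is exactly the accuracy-versus-size trade-off being discussed.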
Next up, garbage collection. In JavaScript, regular memory is automatically garbage collected for you. However, the same is not so true for WebGL memory.
And, as you know, TensorFlow.js uses WebGL to get graphics card acceleration for our Machine Learning models in the web browser and beyond.
So right now we have a function called tf.tidy() that we've created to clean up after ourselves, as long as the user puts their code within this function.
So the question here is: how can we clean up WebGL memory as well?
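The idea behind tf.tidy() can be illustrated with a small scope tracker in plain JavaScript. This is a simplified sketch of the pattern, not the real implementation: `alloc` and `dispose` are hypothetical stand-ins for tensor allocation and disposal.

```javascript
// Simplified sketch of the tidy pattern: track every resource allocated
// inside the callback, and dispose anything the callback does not return.
const live = new Set();

function alloc(name) {
  const tensor = { name, disposed: false };
  live.add(tensor);
  return tensor;
}

function dispose(tensor) {
  tensor.disposed = true;
  live.delete(tensor);
}

function tidy(fn) {
  const before = new Set(live); // resources that existed before the scope
  const result = fn();
  for (const tensor of [...live]) {
    // free everything created inside the scope, except the returned value
    if (!before.has(tensor) && tensor !== result) dispose(tensor);
  }
  return result;
}
```

The difficulty the talk is pointing at is that the browser has no built-in hook like this for WebGL textures, so the library has to ask users to opt in.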
So we know that WebGPU is also coming down the line, but maybe this needs to be addressed in that specification as well.
Can we clean up graphics card memory in both WebGPU and WebGL? This might also benefit people working with 3D graphics and other things beyond the machine learning space.
Next up, graphics card acceleration.
Currently, we use WebGL to execute the ops in a machine learning model, as we previously discussed, but it would be much more efficient if the browser exposed lower-level APIs to the graphics card so we could better leverage the hardware.
Now, the question here is: what lower-level support do we need for efficient machine learning when using the graphics card? And of course, WebGPU is on the way, but what else needs to be added to that spec to ensure we have something that works well, specifically for machine learning?
Next up, we've got Model Security.
Now, we see a lot of production use cases that require the model to be securely delivered to the client, in a way that it can't be copied and used on other websites. Large corporate brands especially spend a lot of money and time creating these models, and they won't just give away their IP for free.
However, they could make some kind of remote procedure call to that model code, executing it and getting results back, without exposing the model itself.
And this is up for discussion, of course; that's just one example of how it could be solved, and it would require browser-level implementation support to do properly.
And currently, this is a big barrier for many people who are trying to ship production use cases but still want the benefits of running on the client side, such as privacy, lower latency, and cost savings on the server. And of course, as soon as you put the model on the server side, those benefits disappear, because you then have to send the data from the client to the server.
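From the caller's side, the remote procedure call idea could look something like this, regardless of where the protected model actually lives. Everything here is hypothetical: the `/predict` endpoint and the injected `post` transport are assumptions used to illustrate the shape of the approach, not an existing API.

```javascript
// Hypothetical sketch of inference via RPC: the caller never receives the
// model, only predictions. `post` is an injected transport (e.g. a fetch
// wrapper), which keeps this example self-contained.
async function remotePredict(post, input) {
  const body = JSON.stringify({ input });
  const response = await post('/predict', body); // model runs behind the boundary
  return JSON.parse(response).prediction;
}
```

Note the trade-off the talk describes: if that boundary is a server, the input data has to leave the client, which is exactly what client-side execution was meant to avoid.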
And what about model warm-up? It can take a couple of runs before the model can actually execute at its optimal speed in the browser environment.
Of course, first of all you need to download the model.
Secondly, you need to load it into memory and parse all of that, and then thirdly, you need to run some data through it once to get everything else set up. And this can take a non-trivial amount of time, especially for larger models.
So the question here becomes: what if there were a standardized way to specify that a better model is available, and that it should be prepared and swapped in when ready, kind of like progressive enhancement?
Now, taking a very hypothetical example, maybe you've got an object recognition model, something like COCO-SSD, that gives you bounding box data.
This loads really fast in the web browser right now and can be used very quickly and efficiently.
But maybe your end goal is to actually have some kind of image segmentation model which might be heavier to load.
So what if you could load that initial smaller model, get some results coming in straight away, and then, once the heavier model is actually ready, switch to it automatically?
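That swap could be sketched as a tiny wrapper: serve predictions from whichever model is currently loaded, and upgrade in place once the heavier one arrives. The model objects and their `predict` method here are hypothetical stand-ins, not a real API.

```javascript
// Hypothetical sketch of progressive enhancement for models: start with the
// small, fast model, and silently swap in the heavier one once it has loaded.
function progressiveModel(fastModel, heavyModelPromise) {
  let current = fastModel;
  heavyModelPromise.then(heavyModel => { current = heavyModel; });
  return {
    predict: input => current.predict(input), // always uses the best model so far
  };
}
```

A real standardized mechanism would of course need to handle things like mismatched output formats between the two models, which this sketch ignores.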
And this could be very interesting as things progress and we start seeing larger models being used in the web environment in many years to come.
And with that, thanks for watching and I encourage people to check out the #MadeWithTFJS hashtag on Twitter or LinkedIn to see what our community has been making.
You can see a whole bunch of awesome stuff that our community has made, which might also inspire further questions for this discussion.
People are really pushing the boundaries by combining TensorFlow.js with things like WebGL, WebRTC, and WebXR; all these other web standards are being combined with machine learning to do many great things.
So, feel free to check that out, and we'd love to talk to you later on for the full discussion about this topic.
Thank you very much for watching.