<anssik> anssik has changed the topic to: https://www.w3.org/wiki/TPAC/2018/SessionIdeas#Machine_Learning_for_the_Web
<Kangz> Hi, this is Corentin Wallez
anssik: Welcome to the Machine Learning for the Web breakout session at TPAC! I'll be giving a quick background on this idea.
… We proposed and created a Community Group a few weeks ago. Before that, we incubated the idea in the Web Platform Incubator Community Group (WICG).
… This helped set the scope of the work.
… Basically, the work is scoped to neural networks: a low-level API that can be implemented across platforms to do client-side inference.
… Why do we want to do that?
… Getting closer to the platforms improves performance. For object recognition and immersive computation, that's crucial.
… But there are also many computations that you need to run on the client.
… I would like to start with a demo.
anssik: The demo works in your Web browser right now because we're using a polyfill built on top of WebGL, and later WebGPU.
ningxinhu: [going through demo]
… Model gets downloaded first. In this case, cached through Service Worker.
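The cache-first pattern mentioned here can be sketched as follows. This is a self-contained toy, not the demo's actual Service Worker: the browser's asynchronous Cache Storage and `fetch()` are replaced by a plain `Map` and a synchronous stub so it runs anywhere, and all names (`modelCacheFirst`, etc.) are illustrative.

```javascript
// Toy sketch of the cache-first strategy a Service Worker might use for
// model files. The real Cache Storage and fetch() APIs are asynchronous;
// this toy keeps everything synchronous for clarity. All names are
// illustrative, not part of any proposed API.
function modelCacheFirst(url, cache, fetchFromNetwork) {
  if (cache.has(url)) {
    return { body: cache.get(url), fromCache: true }; // no network trip
  }
  const body = fetchFromNetwork(url); // first load: hit the network...
  cache.set(url, body);               // ...then cache for next time
  return { body, fromCache: false };
}

// Simulated network that counts real downloads.
let downloads = 0;
const network = (url) => { downloads += 1; return `weights-for-${url}`; };

const cache = new Map();
const first = modelCacheFirst('/model/weights.bin', cache, network);
const second = modelCacheFirst('/model/weights.bin', cache, network);
console.log(first.fromCache, second.fromCache, downloads); // false true 1
```

The second request is served from the cache, so a large model file is only downloaded once per client.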
… The demo uses Koko(sp?) that can recognize about 100 objects.
… The WebML prototype uses Chromium. Two polyfill implementations: WebAssembly and WebGL.
… WebAssembly is much slower, because WebAssembly lacks the SIMD features that would allow parallelizing the computation.
… WebGL2 is faster. WebML is the fastest.
anssik: The prototype is quite convincing in terms of performance gains with WebML.
… We have a F2F meeting of the CG this Friday afternoon.
… We'd like to hear about your use cases to start with.
… The primary user of the current API is authors of frameworks, e.g. TensorFlow.js
ningxinhu: The scope is that we want to do Edge AI computing.
… [showing presentation slides]
… Sentiment analysis, etc.
… JS Machine Learning frameworks such as TensorFlow.js, Keras.js, OpenCV.js provide the capabilities to do the inference on the client.
… The models you've trained can be reused in different frameworks.
… Some advantages to edge AI: latency, availability, privacy, and also cost, as processing is delegated to the client. Users contribute their own computation capabilities.
… Performance gains also show up in power consumption. Current JS-based solutions consume a lot of power, especially on mobile devices.
… We prepared some benchmarks to compare performance.
… I will not go into much detail about the data, but the takeaway is that the gap between a Web-based and a native solution is huge.
… Even comparing a WebGPU-based solution with a WebML approach, there's still a 4x gain.
… Looking at existing solutions today, CPU Deep Learning (DL) optimizations are not exposed to applications. That's crucial, in particular on mobile devices.
… Neither are dedicated DL hardware accelerators.
?1: There are use cases where deep learning is not the right approach. Why not try to support other kinds of algorithms?
… Use case around high-res images
ningxinhu: My understanding is that DL typically needs low-res images, because training models on high-res images would not work well.
anssik: We wanted to be practical with the scope: a minimal approach that enables the use cases quickly.
?1: There are use cases where you don't have all of the data to train the models, and Web developers will run into the issue.
ningxinhu: Web developers won't have much data to train models, that's your concern?
ningxinhu: Web developers will be able to reuse other trained models and re-train them with additional data.
… Here, our focus is how to tackle the performance issue for the Web. With that, the Web developer can leverage existing trained models.
anssik: It should be made clear that training is considered out of scope of this work.
ningxinhu: Our proposed Web Neural Network (WebNN) API is meant to leverage existing hardware infrastructure. The OS would ship its own Machine Learning API, abstracted away by WebNN.
… WebNN provides low-level interfaces to these APIs to support different models, e.g. ONNX models, TensorFlow models, etc.
… Positioning the API on the Web: a Usage API uses a built-in model and is easy to integrate. It maps to different platform primitives, such as NLP, Skills, MLKit.
… The Model-level API (WebML) would map to CoreML, WinML, TensorFlow Lite, etc.
… WebNN is the acceleration API, close to hardware optimization. Maps to BNNS, MPS, DirectML, Android NN.
… WebNN is our current focus.
anssik: Right, the low-level WebNN proposal is the goal of the incubation in the Community Group. We're hoping this will lead to the standardization of the other higher-level APIs in the future.
Alex: WebNN is a set of JS functions to manipulate models, is that correct?
ningxinhu: We can review the exact design afterwards.
anssik: You can refer to the following scope statement
ningxinhu: On top of this, WebML (not a good name) would provide interfaces for models, which would ease model reuse. But that API can be built as a polyfill on top of the WebNN API in the near future.
?2: [performance figures question]
ningxinhu: WebNN is within 10-20% of pure native performance.
<anssik> [POC = proof of concept]
ningxinhu: We want to use a proof of concept as a starting point and explore cross-platform capability.
… The polyfill API is modeled on the Android NN API.
… You need to manipulate models, with operations and operands. You also need a compilation stage to initialize the hardware.
… Last, the execution stage, where you process inputs according to the model and run inference to produce outputs.
… We're using a promise-based API, so when your promise is resolved, you have the answer.
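The three stages described above — model building from operations and operands, compilation, and promise-based execution — can be sketched with a toy, pure-JS graph runner. This is not the proposed WebNN or NN API surface (those interfaces were still being designed); every name here is illustrative, and the "operations" are plain functions standing in for real graph nodes.

```javascript
// Toy illustration of the three stages: build a model from operations,
// compile it, then execute it asynchronously with a promise-based API.
// NOT the proposed WebNN API; all names are illustrative.
class ToyModel {
  constructor() { this.ops = []; }
  addOperation(fn) { this.ops.push(fn); return this; } // e.g. an "ADD" node
}

class ToyCompilation {
  constructor(model) { this.model = model; } // a real backend would lower to hardware here
  execute(input) {
    // Execution stage: run the input through the graph, resolve with the output.
    return Promise.resolve(this.model.ops.reduce((acc, op) => op(acc), input));
  }
}

function compile(model) {
  return Promise.resolve(new ToyCompilation(model)); // compilation stage
}

// Build a tiny "graph": multiply each element by 2, then add 1.
const model = new ToyModel()
  .addOperation((xs) => xs.map((x) => x * 2))
  .addOperation((xs) => xs.map((x) => x + 1));

compile(model)
  .then((compiled) => compiled.execute([1, 2, 3]))
  .then((output) => console.log(output)); // [ 3, 5, 7 ]
```

When the execution promise resolves, you have the inference result, matching the flow described in the minutes.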
anssik: If people want to see more demos, they can haz them!
ningxinhu: We started with a polyfill with WebAssembly and WebGL backends. Compute shaders.
… and then developed a Chromium prototype, starting with the MPS/BNNS APIs for macOS and the NN API for Android; it's cross-platform and hardware-agnostic.
… WebNN can leverage CPUs, GPUs and accelerators.
Myles: Wondering about the numbers. What explains the 24x gap between the CoreML (CPU) and WebAssembly implementations?
ningxinhu: Some missing threading and SIMD features explain the difference. The gap should shrink once those features land.
Iank: The number is a bit unfair to WASM, because these features are coming.
Myles: Let's pretend these features are available. How small would that gap become?
ningxinhu: We'd probably go down to 4x.
sangwhan: The comparison should be with TensorFlow. Nested loops to do convolution would be hard to achieve.
Deepti: A lot of feedback about WASM SIMD is that we may not be covering the right set of instructions.
ningxinhu: Multiply-add.
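The SIMD discussion above can be illustrated with a toy dot product — the inner loop of convolution. The "SIMD" version below accumulates 4 lanes per step, mimicking the 128-bit vector multiply-add the speakers mention; real WASM SIMD would execute the 4 lanes in a single instruction rather than a JS loop, so this only shows the data layout, not the speedup.

```javascript
// Toy illustration of why SIMD multiply-add matters for convolution-style
// inner loops. dotSimd4 processes 4 lanes per step, mimicking a 128-bit
// vector multiply-add; real SIMD does those 4 lanes in one instruction.
function dotScalar(a, b) {
  let acc = 0;
  for (let i = 0; i < a.length; i++) acc += a[i] * b[i];
  return acc;
}

function dotSimd4(a, b) {
  const lanes = [0, 0, 0, 0];           // one accumulator per SIMD lane
  let i = 0;
  for (; i + 4 <= a.length; i += 4) {   // 4 multiply-adds per "instruction"
    for (let l = 0; l < 4; l++) lanes[l] += a[i + l] * b[i + l];
  }
  let acc = lanes[0] + lanes[1] + lanes[2] + lanes[3];
  for (; i < a.length; i++) acc += a[i] * b[i]; // scalar tail
  return acc;
}

const a = [1, 2, 3, 4, 5, 6];
const b = [6, 5, 4, 3, 2, 1];
console.log(dotScalar(a, b), dotSimd4(a, b)); // 56 56
```

Both versions produce the same result; the point is that the 4-wide accumulation maps directly onto hardware vector units that WASM could not yet expose at the time.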
sangwhan: Question about the serialization of models, saw that in scope in the draft charter.
[draft charter shown]
sangwhan: To make interoperability less painful, it would be good to include that in the scope.
… Worried about the bytes in the array buffer.
anssik: I think you might want to open an issue on the GitHub repo
Corentin: There's a standard in the Khronos group for that.
sangwhan: Right, I just want to clarify what's in scope. Doesn't matter if it reuses an existing spec.
ningxinhu: WebNN is a very low-level API. Its main target is JS frameworks, which can add their own logic to deal with such issues.
sangwhan: Not entirely convinced, but OK.
anssik: Feel free to influence the scope on GitHub. The draft charter is a starting point. We want to get your input. Not an easy task to land on a scope that seems to work for people.
anssik: First face-to-face on Friday
… A 2-hour meeting. [going through agenda]
… We'd like to start not with the proposed solution, but with use cases and requirements. We got some good contributions from Tomoyuki here, for instance.
… Then we'll go into more details on the proposal. If people have other proposals in mind, you're welcome to submit them for consideration.
… I look forward to having a productive session. See you all on Friday!