WebML WG Virtual Meeting at TPAC 2021 - Day 2

27 October 2021


Anssi_Kostiainen, BelemZhang, BruceDai, BryanBernhart, ChaiChaoweeraprasit, CorentinWallez, DeeptiGandluri, dom, Geun-Hyung, jlbirch, JonathanBingham, JunweiFu, MattWilson, MingMing, MingquiSun, NickDoty, NingxinHu, npdoty, PetrPenzin, RachelYager, RafaelCintron, Sam, SingpilShin, TakioYamaoka, Wanming, WanmingLin, weiler, ZoltanKis
Anssi, anssik, dom

Meeting minutes

Anssi: [reviews agenda]

Slideset: https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0014/WebNN_ML_JS_Framework_Performance.pdf

ML JS framework performance, focus areas for WebNN

[ Slide 1 ]

Ningxin: we'll be reviewing performance of ML frameworks with WASM to investigate the integration of WebNN

[ Slide 2 ]

Ningxin: 3 frameworks we've investigated: ONNX Runtime Web, TF Lite Web, OpenCV.js

[ Slide 3 ]

Ningxin: for each framework, I'll talk about how we integrated WebNN and then talk about the prototype to support WebNN
… and then the tools to collect performance numbers
… and finally review those numbers
… ONNX Runtime has a mechanism called execution provider to assign specific nodes or subgraphs to be executed by a specific library
… they have a CPU backend by default, a GPU execution provider (EP), DirectML EP, and others
… this architecture is compiled to the Web via Emscripten
… with WASM, it only supports CPU

[ Slide 4 ]

Ningxin: the WASM module is compiled from the C++ code base
… they also have WebGL engine
… we have prototyped the addition of a WebNN execution provider
… it's written in C++, coming from the WebNN-native project
… (which maps to the WebIDL definition)
… it supports 14 ONNX ops

webnn-native GitHub repo

Ningxin: when an op is not supported, it falls back to the default CPU EP
… this allows using WebNN + CPU EPs to run a full graph
… for instance, in our test, our WebNN EP didn't support Softmax, so it falls back to WASM in that case
… we compiled this with a customized Emscripten
… everything is available on GitHub
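The op-level fallback described above can be sketched as a simple partitioning pass; the op names and the supported-op set below are illustrative stand-ins, not ONNX Runtime's actual internals (which do this assignment in the C++ core):

```javascript
// Illustrative sketch of execution-provider fallback partitioning:
// ops the WebNN EP supports are assigned to WebNN, everything else
// falls back to the default CPU (WASM) EP. Op names are hypothetical.
const WEBNN_SUPPORTED_OPS = new Set([
  "Conv", "Add", "Relu", "MaxPool", "Gemm", // prototype supports 14 ops; subset shown
]);

function assignProviders(graphNodes) {
  return graphNodes.map((node) => ({
    op: node.op,
    provider: WEBNN_SUPPORTED_OPS.has(node.op) ? "WebNN" : "CPU",
  }));
}

// Example mirroring the minutes: Softmax is not supported by the
// prototype WebNN EP, so it is assigned to the CPU (WASM) fallback.
const plan = assignProviders([
  { op: "Conv" }, { op: "Relu" }, { op: "Softmax" },
]);
console.log(plan.map((n) => `${n.op}:${n.provider}`).join(" "));
// → "Conv:WebNN Relu:WebNN Softmax:CPU"
```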

[ Slide 5 ]

Ningxin: we used the ONNX Runtime Web Demo to collect benchmark data
… we want to evaluate the performance of WebNN by accessing the native ML APIs
… we did that with a Node.js add-on served in an Electron.js app
… here we used DirectML for the GPU device for WebNN, and OpenVINO for the CPU device

[ Slide 6 ]

Ningxin: with that framework, we compare performance across devices

<Zakim> anssik, you wanted to ask a question

Ningxin: the charts show a great speedup compared to the baseline of WASM+SIMD
… we also tested with SharedArrayBuffer enabled which already gives an improvement over the baseline
… but with WebNN native, we get e.g. a 9x speedup with squeezenet
… WebNN is almost on par with the ONNX native execution provider
… we get similar results on the GPU device, with WebGL being the baseline
… with again WebNN being on par with the native EP

[ Slide 7 ]

Ningxin: a similar review of what we did with TensorFlow that has a similar mechanism

[ Slide 8 ]

Ningxin: in TF, this is known as a delegate
… by default, TF Lite Web uses WASM
… there again, we added support for WebNN

[ Slide 9 ]

Ningxin: we collected data on the TF Lite demo based on OpenVINO on a Linux laptop

[ Slide 10 ]

Ningxin: the chart shows the results, with WASM+SIMD as baseline
… we can't do a device-by-device comparison since TF Lite doesn't have a GPU backend at the moment
… without going into details, the WebNN delegate performance there again is pretty close to the native delegate

[ Slide 11 ]

Ningxin: we got similar results for OpenCV.js

[ Slide 14 ]

Ningxin: we did note a significant gap running with GoogleNet between WebNN & OpenVINO
… this is because it needs to fall back to WASM for certain operations

[ Slide 15 ]

Ningxin: we've been working on integrating with frameworks in the past few months
… falling back to the default backend has proved a good way to allow progressive enhancement
… separating build & compute has also worked out well to map to the frameworks
… the sync API has proved quite important - the framework codebases are C++-based and mostly sync
… to mitigate the concerns about blocking the main thread, that sync API might be moved to a worker
… the design to produce results in a standard layout in a preallocated output buffer is also working well to avoid memory copies & conversions
… fused operators have proved to give good performance thanks to the graph optimizers
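The fused-operator point can be illustrated with a toy graph-optimizer pass; this is not any framework's actual optimizer, just the shape of the idea (op names are hypothetical):

```javascript
// Toy graph-optimizer pass: fuse a Conv node immediately followed by a
// Relu node into a single fused op, the way framework graph optimizers
// do before handing subgraphs to a backend. Purely illustrative.
function fuseConvRelu(ops) {
  const fused = [];
  for (let i = 0; i < ops.length; i++) {
    if (ops[i] === "Conv" && ops[i + 1] === "Relu") {
      fused.push("ConvRelu"); // one kernel dispatch instead of two
      i++;                    // skip the Relu consumed by the fusion
    } else {
      fused.push(ops[i]);
    }
  }
  return fused;
}

console.log(fuseConvRelu(["Conv", "Relu", "MaxPool", "Conv", "Relu"]).join(" "));
// → "ConvRelu MaxPool ConvRelu"
```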

[ Slide 16 ]

Ningxin: with this Electron/Node.js implementation, we're getting good results, and it is helping us reach native performance
… and should help a browser implementation reach similar performance
… we would be happy to see the WebNN API implemented in browsers; we can help with adapting the JS frameworks to add WebNN as a backend
… WebNN also needs to be added to Emscripten

<Zakim> anssik, you wanted to ask about Electron.js vs browser security sandbox overhead expectations

Anssi: do you have an estimate of the performance penalty of the browser security sandbox?
… that wouldn't show with electron.js

Ningxin: we had some conversations with the ChromeOS team as they're prototyping the model loader API, including the browser security model
… for the compute/inference part, that should be similar to WebNN
… so I asked them and they indicated a pretty small overhead
… so we should still be getting performance close to native

Jonathan: we're still early in our evaluation of what security overhead will be needed with WebNN
… it will probably vary across hardware and drivers
… hard to evaluate the performance penalty at the moment
… maybe WebGPU can help shed some light on this

Corentin: for WebGPU, a bunch of the performance overhead comes from securing shaders
… but that probably doesn't apply to the context of WebNN computation
… there may be overhead in getting data from JS to the model runner and back to JS

Rafael: How does WebNN CPU backend prototype compare to ONNXRuntime's native CPU backend?

Ningxin: we chose to use the same OpenVINO backend as the native backend in ONNX Runtime
… to help with comparison
… but WebNN has other backends that could bring different results

Rafael: +1 to Corentin on WebGPU performance - the bottleneck is mostly in the CPU crossing the JS barrier
… this may be minimal when the data starts on the GPU, e.g. with camera input

Slideset: https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0015/Integrate_WebNN-native_into_Chromium_TPAC.pdf

Integrating an open-source cross-platform implementation of the Web Neural Network API into a web engine


[ Slide 1 ]

Junwei: presenting on implementing WebNN in chromium

[ Slide 2 ]

WebNN implementation in Chromium Design Doc

Standalone native implementation of the Web Neural Network API

[ Slide 3 ]

Junwei: WebNN allows accessing hardware acceleration from browsers, with a set of basic operations, e.g. conv2d
… browsers need to implement WebNN by plugging in native ML APIs to access hardware acceleration
… e.g. DirectML accesses GPU

[ Slide 4 ]

Junwei: the WebNN execution model is based on an MLContext with a device target (CPU or GPU)
… which gives a way to create an MLGraphBuilder
… to build a graph that is then compiled and can be used to compute named inputs and outputs into buffers (CPU or GPU)
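That execution model can be sketched with a toy stand-in: create a context for a device, build a graph with a builder, compile it, then compute named inputs into preallocated output buffers. The class and method names follow the WebNN spec of the time, but this mock is only a runnable illustration (one elementwise add op), not the real browser API:

```javascript
// Toy stand-in for the WebNN execution model. Not the real API surface:
// the real MLContext/MLGraphBuilder are provided by the browser.
class MLContext {
  constructor(options) { this.device = options.device; } // "cpu" or "gpu"
  compute(graph, inputs, outputs) { graph.run(inputs, outputs); }
}
class MLGraphBuilder {
  constructor(context) { this.context = context; }
  input(name) { return { kind: "input", name }; }
  add(a, b) { return { kind: "add", a, b }; } // single toy op
  build(namedOutputs) { return new MLGraph(namedOutputs); }
}
class MLGraph {
  constructor(namedOutputs) { this.outputs = namedOutputs; }
  run(inputs, outputs) {
    for (const [name, node] of Object.entries(this.outputs)) {
      const x = inputs[node.a.name], y = inputs[node.b.name];
      const out = outputs[name]; // preallocated by the caller
      for (let i = 0; i < out.length; i++) out[i] = x[i] + y[i];
    }
  }
}

// Usage mirroring the slide: context -> builder -> build -> compute.
const context = new MLContext({ device: "cpu" });
const builder = new MLGraphBuilder(context);
const c = builder.add(builder.input("a"), builder.input("b"));
const graph = builder.build({ c });

const outputs = { c: new Float32Array(3) }; // preallocated output buffer
context.compute(graph,
  { a: Float32Array.of(1, 2, 3), b: Float32Array.of(10, 20, 30) },
  outputs);
console.log(Array.from(outputs.c).join(",")); // → "11,22,33"
```

Computing into a caller-preallocated buffer is what lets the framework integrations discussed earlier avoid extra memory copies and layout conversions.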

[ Slide 5 ]

Junwei: the WebNN-native architecture builds on the Dawn project
… WebNN-native uses a C API based on a 1:1 mapping of the WebIDL

[ Slide 6 ]

Petr: Late comment on the previous presentation (ML JS framework performance) - I believe that Wasm baseline should be running with all its security checks enabled, therefore with full Web sandbox overhead we can probably expect WebNN/Wasm ratio to reduce a bit.

[ Slide 7 ]

[ Slide 8 ]

Junwei: the WebNN implementation is aligned with the WebGPU implementation by building on top of Dawn
… this allows sharing buffers with WebGPU, with the same security mechanism
… we're calling for review on the design document
… there are still questions about WebGPU interoperability

[ Slide 9 ]

Junwei: we're planning to implement WebNN in Chromium based on iterations on the design doc
… and then follow the Chromium process, starting with ChromeStatus entry
… not sure if that should be under "new feature incubation" or "implementation of existing standard"
… we're looking for mentors to help us through the process


Corentin: very thorough explanation of integrating webnn-native in chromium - any similar investigation for firefox and webkit?

Ningxin: we have mostly experience with Chromium but we would also welcome mentors and contributions from other engines

Rachel: how does this play with yesterday's presentation on the ONNX framework?

Ningxin: we investigated the Web version of the ONNX runtime which is a WASM compiled version of ONNX runtime

Privacy and security discussion continued

Jonathan: the model loader API is a complementary API to WebNN - WebNN is focused on supporting ML frameworks in JS

Model Loader API explainer

Jonathan: the model loader API allows to pass models from JS directly and the underlying implementation takes care of running it
… Model loader could be layered on top of WebNN or use a different backend
… ChromeOS is exploring the model loader API with a focus on getting the security to work
… they're looking toward an origin trial in 2022, hopefully 1st half of the year
… we're working with Ningxin and others to align the APIs with WebNN (e.g. shared namespace, shared input/output)
… ideally we would end up sharing implementation code
… we would also like to be able to run performance comparisons at some point
… we might be able to eke out some extra performance from the model approach

Anssi: is it easier to secure the model loader API?

Jonathan: in the short term, yes, if you ignore performance
… one path could be to simply limit model loader API to WASM which is already hardened - but then you get no perf benefit
… we're exploring another path but still CPU-only
… to get the performance, we'll need to get into the more challenging spaces with hardware integration

weiler: I don't think I've heard enough details to evaluate anything from a privacy perspective at the moment

ningxin_hu: junwei mentioned it in the design doc - WebNN targets multiple device backends (CPU, GPU, specialized accelerators)

<npdoty> I'm also curious about the DRM proposals (as mentioned in the Zoom chat)

ningxin_hu: with WASM we have the CPU sandbox, with WebGPU, a GPU sandbox
… I was wondering how to do that with WebNN, especially when considering new specialized accelerators

<Zakim> Mingqiu, you wanted to ask what mechanism do you propose to protect ML models?

<npdoty> npdoty: concerned about DRM or protection of models, because users won't have the ability to inspect the code that their machine is running - a loss of transparency and of protection against biases

<npdoty> anssik: ethical issues to be discussed tomorrow

Minutes manually created (not a transcript), formatted by scribe.perl version 149 (Tue Oct 12 21:11:27 2021 UTC).