Meeting minutes
Anssi: [reviews agenda]
ML JS framework performance, focus areas for WebNN
Ningxin: we'll be reviewing performance of ML frameworks with WASM to investigate the integration of WebNN
Ningxin: 3 frameworks we've investigated: ONNX Runtime Web, TF Lite Web, OpenCV.js
Ningxin: for each framework, I'll talk about how we integrated WebNN and then talk about the prototype to support WebNN
… and then the tools to collect performance numbers
… and finally review the said numbers
… ONNX Runtime has a mechanism called execution providers to assign specific nodes or subgraphs for execution by a specific library
… they have a CPU backend by default, a GPU execution provider (EP), DirectML EP, and others
… this architecture is compiled to the Web via emscripten
… with WASM, it only supports CPU
Ningxin: the WASM module is compiled from the C++ code base
… they also have WebGL engine
… we have prototyped the addition of a WebNN execution provider
… it's written in C++, coming from the WebNN-native project
… (which maps to the WebIDL definition)
… it supports 14 ONNX ops
Ningxin: when an op is not supported, it falls back to the default CPU EP
… this allows using the WebNN + CPU EPs together to run a full graph
… for instance, in our test, our WebNN EP didn't support Softmax, so it falls back to WASM in that case
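The fallback mechanism described above can be sketched as a simple graph-partitioning pass: each node goes to the WebNN EP when its op type is supported, otherwise to the default CPU EP. A minimal illustration in plain JavaScript, assuming a hypothetical supported-op set (the real ONNX Runtime partitioner is in C++ and considerably more involved):

```javascript
// Hypothetical set of op types handled by the WebNN execution provider;
// the actual prototype supported 14 ONNX ops.
const WEBNN_SUPPORTED_OPS = new Set(['Conv', 'Relu', 'Add', 'MaxPool']);

// Assign each node to the WebNN EP when its op type is supported,
// otherwise fall back to the default (WASM/CPU) EP.
function partitionGraph(nodes) {
  return nodes.map((node) => ({
    ...node,
    provider: WEBNN_SUPPORTED_OPS.has(node.opType) ? 'webnn' : 'cpu',
  }));
}

// Example: Softmax is not in the supported set, so it falls back to CPU.
const assigned = partitionGraph([
  { name: 'conv1', opType: 'Conv' },
  { name: 'relu1', opType: 'Relu' },
  { name: 'softmax', opType: 'Softmax' },
]);
console.log(assigned.map((n) => `${n.name}:${n.provider}`).join(' '));
// → conv1:webnn relu1:webnn softmax:cpu
```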
… we compiled this with a customized emscripten
… everything is available on github
Ningxin: we used the ONNX Runtime Web Demo to collect benchmark data
… we want to evaluate the performance of WebNN by accessing the native ML APIs
… we did that with a Node.js add-on served in an Electron.js app
… here we used DirectML for the GPU device for WebNN, and OpenVINO for the CPU device
Ningxin: with that framework, we compare performance across devices
<Zakim> anssik, you wanted to ask a question
Ningxin: the charts show a great speedup compared to the baseline of WASM+SIMD
… we also tested with SharedArrayBuffer enabled which already gives an improvement over the baseline
… but with WebNN native, we get e.g. a 9x speedup with squeezenet
… WebNN is almost on par with the ONNX native execution provider
… we get similar results on the GPU device, with WebGL being the baseline
… with again WebNN being on par with the native EP
Ningxin: a similar review of what we did with TensorFlow, which has a similar mechanism
Ningxin: in TF, this is known as a delegate
… by default, TF Lite Web uses WASM
… there again, we added support for WebNN
Ningxin: we collected data on the TF Lite demo based on OpenVINO on a Linux laptop
Ningxin: the chart shows the results, with WASM+SIMD as baseline
… we can't do a device-by-device comparison since TF Lite doesn't have a GPU backend at the moment
… without going into details, the WebNN delegate performance there again is pretty close to Native Delegate
Ningxin: we got similar results for OpenCV.js
Ningxin: we did note a significant gap running with GoogleNet between WebNN & OpenVINO
… this is because it needs to fall back to WASM for certain operations
Ningxin: we've been working on integrating with frameworks over the past few months
… falling back to the default backend has proved a good way to allow progressive enhancement
… Separating build & compute has also worked out well to map to the frameworks
… The sync API has proved quite important - the framework codebases are C++ based and mostly sync
… to mitigate the concerns about blocking the main thread, that sync API might be moved to a worker
… The design to produce results in standard layout in preallocated output buffer is also working well to avoid memory copy & conversion
… Fused operators have proved to give good performance thanks to the graph optimizers
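The fused-operators point above can be illustrated with a toy graph-optimizer pass that merges an activation into the preceding convolution, so a backend can execute both in a single kernel. A simplified sketch in plain JavaScript (the fusion passes in the actual frameworks are more general; this assumes adjacent nodes in the list form a producer/consumer chain):

```javascript
// Fuse a Relu node into an immediately preceding Conv node, producing a
// single fused op that a backend can execute in one pass.
function fuseConvActivation(nodes) {
  const fused = [];
  for (const node of nodes) {
    const prev = fused[fused.length - 1];
    if (node.opType === 'Relu' && prev && prev.opType === 'Conv') {
      prev.opType = 'ConvRelu'; // conv + activation run together
    } else {
      fused.push({ ...node });
    }
  }
  return fused;
}

const graph = [
  { name: 'conv1', opType: 'Conv' },
  { name: 'relu1', opType: 'Relu' },
  { name: 'pool1', opType: 'MaxPool' },
];
console.log(fuseConvActivation(graph).map((n) => n.opType).join(','));
// → ConvRelu,MaxPool
```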
Ningxin: with this Electron/Node.js implementation, we're getting good results, and it is helping us approach native performance
… and should help anticipate the performance of a browser implementation
… We would be happy to see WebNN API implemented in browser, we can help with adapting the JS frameworks to add WebNN as a backend
… WebNN also needs to be added to Emscripten
<Zakim> anssik, you wanted to ask about Electron.js vs browser security sandbox overhead expectations
Anssi: do you have an estimate of the performance penalty of the browser security sandbox?
… that wouldn't show with electron.js
Ningxin: we had some conversations with the ChromeOS team as they're prototyping the model loader API, including with the browser security model
… for the compute/inference part, that should be similar to WebNN
… so I asked them and they indicated a pretty small overhead
… so we should still be getting performance close to native
Jonathan: we're still early in our evaluation of what security overhead will be needed with WebNN
… it will probably vary across hardware and drivers
… hard to evaluate the performance penalty at the moment
… maybe WebGPU can help shed some light on this
Corentin: for WebGPU, a bunch of the performance overhead comes from securing shaders
… but that probably doesn't apply to the context of WebNN computation
… there may be overhead in getting data from JS to the model runner and back to JS
Rafael: How does WebNN CPU backend prototype compare to ONNXRuntime's native CPU backend?
Ningxin: we chose to use the same OpenVINO backend of the native backends in ONNX
… to help with comparison
… but WebNN has other backends that could bring different results
Rafael: +1 to Corentin on WebGPU performance - the bottleneck is mostly in the CPU crossing the JS barrier
… this may be minimal when the data starts on the GPU, e.g. with camera input
Integrating an open-source cross-platform implementation of the Web Neural Network API into a web engine
Junwei: presenting on implementing WebNN in chromium
WebNN implementation in Chromium Design Doc
Standalone native implementation of the Web Neural Network API
Junwei: WebNN allows accessing hardware acceleration from browsers, with a set of base operations, e.g. conv2d
… browsers need to implement WebNN by plugging in native ML APIs to access hardware acceleration
… e.g. DirectML accesses GPU
Junwei: the WebNN Execution Model is based on an MLContext with a device target (cpu or gpu)
… which gives a way to create an MLGraphBuilder
… to build a graph that is then compiled and can be used to compute named inputs and outputs into buffers (CPU or GPU)
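The MLContext → MLGraphBuilder → build → compute flow described above can be sketched in JavaScript. The stub classes below are stand-ins so the example is self-contained and runnable outside a browser; only the usage section at the bottom mirrors the WebNN shape (in a browser the context would come from navigator.ml, and exact names may differ across spec drafts):

```javascript
// --- Stand-in stubs (NOT the real WebNN implementation) ---
class MLGraph {
  constructor(outputFn) { this.outputFn = outputFn; }
}
class MLGraphBuilder {
  constructor(context) { this.context = context; }
  input(name, desc) { return { kind: 'input', name, desc }; }
  add(a, b) {
    return { kind: 'add', eval: (feeds) => feeds[a.name].map((v, i) => v + feeds[b.name][i]) };
  }
  build(outputs) {
    const [name, op] = Object.entries(outputs)[0];
    return new MLGraph((feeds) => ({ [name]: op.eval(feeds) }));
  }
}
class MLContext {
  compute(graph, inputs, outputs) {
    // Results are written into preallocated output buffers in a standard
    // layout, avoiding an extra memory copy (one of the design points above).
    const result = graph.outputFn(inputs);
    for (const name of Object.keys(outputs)) outputs[name].set(result[name]);
  }
}

// --- Usage: mirrors the WebNN build/compute separation ---
const context = new MLContext();              // real API: via navigator.ml
const builder = new MLGraphBuilder(context);
const a = builder.input('a', { dataType: 'float32', dimensions: [3] });
const b = builder.input('b', { dataType: 'float32', dimensions: [3] });
const graph = builder.build({ c: builder.add(a, b) });  // compile once

const outputBuffer = new Float32Array(3);     // preallocated output
context.compute(graph, { a: [1, 2, 3], b: [4, 5, 6] }, { c: outputBuffer });
console.log(Array.from(outputBuffer));        // → [ 5, 7, 9 ]
```

Separating the build step (graph construction and compilation) from the compute step is what lets frameworks compile once and run inference many times.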
Junwei: the WebNN-native architecture builds on the Dawn project
… WebNN-native uses a C API based on a 1:1 mapping of the WebIDL
Petr: Late comment on the previous presentation (ML JS framework performance) - I believe that Wasm baseline should be running with all its security checks enabled, therefore with full Web sandbox overhead we can probably expect WebNN/Wasm ratio to reduce a bit.
Junwei: WebNN is aligned with WebGPU implementation by building on top of Dawn
… this allows sharing buffers with WebGPU, with the same security mechanism
… we're calling for review on the design document
… there are still questions about WebGPU interoperability
Junwei: we're planning to implement WebNN in Chromium based on iterations on the design doc
… and then follow the Chromium process, starting with ChromeStatus entry
… not sure if that should be under "new feature incubation" or "implementation of existing standard"
… we're looking for mentors to help us through the process
<kangz> q: has there been an investigation on how to integrate on Firefox / WebKit as well?
Corentin: very thorough explanation of integrating webnn-native in chromium - any similar investigation for firefox and webkit?
Ningxin: we have mostly experience with Chromium but we would also welcome mentors and contributions from other engines
Rachel: how does this play with yesterday's presentation on the ONNX framework?
Ningxin: we investigated the Web version of the ONNX runtime which is a WASM compiled version of ONNX runtime
Privacy and security discussion continued
Jonathan: the model loader API is a complementary API to WebNN - WebNN is focused on supporting ML frameworks in JS
Jonathan: the model loader API allows to pass models from JS directly and the underlying implementation takes care of running it
… Model loader could be layered on top of WebNN or use a different backend
… ChromeOS is exploring the model loader API with a focus on getting the security to work
… they're looking toward an origin trial in 2022, hopefully 1st half of the year
… we're working with Ningxin and others to align the APIs with WebNN (e.g. shared namespace, shared input/output)
… ideally we would end up sharing implementation code
… we would also like to be able to run performance comparisons at some point
… we might be able to eke out some extra performance from the model approach
Anssi: is it easier to secure the model loader API?
Jonathan: in the short term, yes, if you ignore performance
… one path could be to simply limit model loader API to WASM which is already hardened - but then you get no perf benefit
… we're exploring another path but still CPU-only
… to get the performance, we'll need to get into the more challenging spaces with hardware integration
weiler: I don't think I've heard enough details to evaluate anything from a privacy perspective at the moment
ningxin_hu: junwei mentioned it in the design doc - WebNN targets multiple device backends (CPU, GPU, specialized accelerators)
<npdoty> I'm also curious about the DRM proposals (as mentioned in the Zoom chat)
ningxin_hu: with WASM we have the CPU sandbox, with WebGPU, a GPU sandbox
… I was wondering how to do that with WebNN, esp when considering new specialized accelerators
<Zakim> Mingqiu, you wanted to ask what mechanism do you propose to protect ML models?
<npdoty> npdoty: concerned about DRM or protection of models, because users won't have the ability to inspect the code that the machine that is running. losses of transparency and protection against biases
<npdoty> anssik: ethical issues to be discussed tomorrow