W3C

– DRAFT –
WebML CG Teleconference – 18 March 2021

18 March 2021

Attendees

Present
Anssi_Kostiainen, chai___, Chai_Chaoweeraprasit, Ganesan_Ramalingam, Ningxin_Hu, Ping_Yu, Rafael_Cintron, Wonsuk_Lee
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

webnn-native update

webnn-native initial implementation

anssik: "The initial implementation supports 10 first-wave ops required by LeNet example including add, averagePool2d, conv2d, matmul, maxPool2d, mul, relu, reshape and softmax."

anssik: Ningxin to share an update and plans
… raise any topics the group should weigh in on

ningxin_hu: webnn-native reuses the Dawn project's code generator
… the WebNN C/C++ headers are generated by the Dawn code generator from webnn.json and generator/templates
… webnn.h is the C header (generated as out/Release/gen/src/include/webnn/webnn.h after build)
… a C++ wrapper for webnn.h is also generated (as out/Release/gen/src/include/webnn/webnn_cpp.h after build)
… this design makes it easier to keep up with spec changes
… infrastructure code such as base objects and interface classes is also generated
… the second part is the backend implementations:
… DirectML on Windows 10 (under src/webnn_native/dml/)
… OpenVINO on Windows 10 and Linux (under src/webnn_native/openvino/)
… The unit and end2end tests cover the 10 first-wave ops (under src/tests/)
… lastly, there's a LeNet C++ example equivalent to webmachinelearning/webnn-samples/lenet (under src/examples/LeNet)

ningxin_hu: Apache 2.0 licensed
… 3rd-party dependencies: the code generator and infrastructure code of the Dawn project, and the DirectMLX and device wrapper of the DirectML project
… comments from Chai and Rafael would be welcome

anssik: any blockers to land the PR?

ningxin_hu: for the DML usage, I would like to get Chai's approval for that

Chai: have been looking at this on and off, it is a lot of work
… will look at this a bit more

ningxin_hu: the implementation doesn't reflect the very latest spec, but will catch up
… the webnn-native high-level goals: 1) inform the WebNN API work and group about op compatibility and 2) provide a performance benchmark

RafaelCintron: what is the long-term maintenance story?

ningxin_hu: I will continue working on this project in addition to the spec work; I commit to maintaining the project, though I cannot speak for other contributors of course

wonsuk__: a question about the plan: this is using C++, mostly focused on desktop and embedded systems, right?
… any plan for mobile implementations, Android or iOS?

ningxin_hu: from the API perspective we use C/C++ to make this standalone, so there is no runtime or app framework dependency
… I don't think mobile app usage is our primary goal, but nothing prevents integration in such environments

ningxin_hu: Ping asked an important question about Wasm usage
… if we want to continue the Wasm usage investigation, we need some tooling for Emscripten
… that is missing today, but if we want to continue the Wasm investigation this project could be used to fill that gap

Ping: our thinking is, if this can be made available, it could be used from the Wasm world similarly to the WebGPU bindings (ed. note: the line was breaking up, scribe may have missed part of this)

https://github.com/webmachinelearning/webnn-native/

Operation-specific APIs

anssik: Ping and Jonathan shared use cases and requirements for the Operation-specific APIs proposal

Operation-specific APIs proposal

anssik: I'd like us to unpack this proposal and discuss whether the high-level requirements from the proposal could be translated into WebNN API requirements and whether there's support for that design direction
… In the agenda I enumerated some requirements derived from discussion in PR #149
… Req: Direct access to the convolution op
… Addressed by: https://github.com/webmachinelearning/webnn/pull/149#discussion_r591634607
… correct?

ningxin_hu: my point is, if we use the WebNN graph API, developers can create a single-op graph
… and thanks to Chai's PR, creating a single-op graph reduces to two steps, which could map to Ping's requirement in the Operation-specific APIs proposal
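
A rough JavaScript sketch of the two-step single-op graph flow described above; method names follow the general MLGraphBuilder pattern of the spec and are approximate for the draft under discussion, and shapes and data are placeholders:

  // Step 1: build and compile a graph containing just the conv2d op.
  const context = navigator.ml.createContext();          // assumed context creation entry point
  const builder = new MLGraphBuilder(context);
  const filterData = new Float32Array(32 * 1 * 5 * 5);   // placeholder weights
  const input = builder.input('x', {type: 'float32', dimensions: [1, 1, 28, 28]});
  const filter = builder.constant({type: 'float32', dimensions: [32, 1, 5, 5]}, filterData);
  const graph = await builder.build({y: builder.conv2d(input, filter)});

  // Step 2: execute the compiled single-op graph as often as needed.
  const x = new Float32Array(1 * 1 * 28 * 28);           // placeholder input
  const y = new Float32Array(1 * 32 * 24 * 24);          // placeholder output (5x5 kernel, no padding)
  graph.compute({x}, {y});                               // approximate execution call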

Ping: I think it looks fine to use the graph for this; my one concern is performance
… we can use both the graph API and the op API for fusion
… the only concern is performance, if there's overhead with the graph API

Chai: I would expect the user of the op-specific convolution to go through the process of compilation before using it; it happens in the framework anyway, e.g. conv+relu
… you'd create a subgraph for conv+relu and compile it ahead of time; this is something that should already happen in any framework when it loads the model
… by the time you run it you call into a native convolution API; this happens in WinML and TF, it's a pretty common step
… executing an op immediately is essentially a combination of the prior steps including compilation, but it is not expected to be done every time
… I'm not concerned about performance

<ping_yu> question is whether the caching is done at the framework level or the WebNN level?

Chai: relative to native

Chai: I think it should be done on the framework level, because the graph has no notion of internal caching
… there's a GraphBuilder object that has context for graph building and compilation, which can be managed by the user
… the lifetime of that builder belongs to the framework, only for building, not for executing it

<ping_yu> for example WebGL has shader caching, I think WebNN might want to create a low-level cache for compiled graphs.

Chai: the MLGraph, compiled graph, is something the framework would hang on to
… maybe conv+relu, or just conv; the framework should keep them, and when executing it just invokes them, similar to a native API
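
One possible shape for the framework-level caching being discussed, as a hedged sketch: the cache and key scheme are purely illustrative (WebNN itself keeps no cache), and it shows the builder being used only for building while the compiled MLGraph is what the framework hangs on to:

  // Framework-owned cache: compile the conv+relu subgraph once at model-load
  // time, keep the resulting MLGraph, and just invoke it at execution time.
  const graphCache = new Map();

  async function getConvRelu(context, cacheKey, filterDesc, filterData) {
    if (graphCache.has(cacheKey)) return graphCache.get(cacheKey);
    const builder = new MLGraphBuilder(context);                // lifetime: building only
    const input = builder.input('x', {type: 'float32', dimensions: [1, 32, 28, 28]});
    const filter = builder.constant(filterDesc, filterData);
    const fused = builder.relu(builder.conv2d(input, filter));  // conv+relu subgraph
    const graph = await builder.build({y: fused});              // compiled ahead of time
    graphCache.set(cacheKey, graph);                            // framework keeps the compiled graph
    return graph;
  }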

Ping: thanks for the explanation
… I think WebGL has its own shader caching, we try to hide those caches

Chai: from the implementation standpoint, the compile step goes from the platform to the driver, interrogating what the driver supports
… DML would then create this pathway to reach the hardware block
… not worried about performance, because it is going to be the same as with native, unless your browser implementation does something inefficient
… the spec allows for an efficient implementation, so the WebNN API spec does not constrain performance
… from the API semantics point of view, this is exactly what it is designed for

<ping_yu> sounds good

anssik: Req: Support for native tensor types, GPU buffers
… Addressed by https://github.com/webmachinelearning/webnn/pull/149 for GPU buffers
… what is missing? Is this a reasonable req?

Ping: The question is whether we should keep the device-specific graph, or a device-agnostic graph compiled to a specific device
… Req: Device preference setting when compiling a graph to reduce IO between accelerators

ningxin_hu: I asked this question at our last meeting
… in the PR discussion Chai mentioned a use case
… to create a device specific graph that allows constants to be loaded from GPU or device buffers
… in the PR discussions my question was resolved
… looking at the latest spec, a device specific graph can be created

Chai: the last outstanding issue is around how the caller of this API, using it as an op API, can handle resource upload and download
… it should already be supported the way the PR is written: if the context is created from an existing device and the inputs and outputs are submitted as device resources, by definition the caller of this API is responsible for manually downloading back to CPU memory
… I haven't explained this well in the spec, but I want to improve this and add more text to amend PR #149
… if the caller asks WebNN to create a context for their own device, they're responsible
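
A hypothetical sketch of the device-resource path Chai describes, assuming a context created from the caller's own WebGPU device; the WebNN calls are approximate, while the readback at the end is standard WebGPU buffer mapping:

  const adapter = await navigator.gpu.requestAdapter();
  const device = await adapter.requestDevice();
  const context = navigator.ml.createContext(device);    // assumed overload taking a GPUDevice
  // ... build and compile `graph` with this context as in the earlier sketches ...

  // Inputs and outputs are GPU buffers; WebNN does not copy the result back for us.
  const inputBytes = 1 * 1 * 28 * 28 * 4;                 // placeholder sizes
  const outputBytes = 1 * 32 * 24 * 24 * 4;
  const inputBuffer = device.createBuffer({size: inputBytes, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST});
  const outputBuffer = device.createBuffer({size: outputBytes, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC});
  graph.compute({x: inputBuffer}, {y: outputBuffer});     // approximate execution call

  // The caller is responsible for manually downloading back to CPU memory.
  const staging = device.createBuffer({size: outputBytes, usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ});
  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(outputBuffer, 0, staging, 0, outputBytes);
  device.queue.submit([encoder.finish()]);
  await staging.mapAsync(GPUMapMode.READ);
  const result = new Float32Array(staging.getMappedRange().slice(0));
  staging.unmap();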

ningxin_hu: I have no question about the GPU interop, but a question about the use case for the op-specific API and the Wasm CPU scenario
… in the latest API you can only interact with CPU buffers for Wasm
… if the device is CPU, the implementation will use some optimized memory layout for hardware acceleration
… if the user code uses Wasm and ArrayBuffer, that'll result in every input and output being re-laid out
… not optimal for executing multiple ops without access to intermediate results
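
To illustrate the concern, a sketch of running two ops from Wasm/JS with ArrayBuffer-only I/O: the intermediate tensor round-trips through an ArrayBuffer, so an implementation using an internal optimized layout may convert it at every boundary (the graphs are assumed to be compiled single-op graphs as in the earlier sketches; names approximate):

  const x   = new Float32Array(1 * 32 * 28 * 28);   // input coming from the Wasm heap / ArrayBuffer
  const tmp = new Float32Array(1 * 32 * 28 * 28);   // intermediate result, forced back to CPU layout
  const y   = new Float32Array(1 * 32 * 28 * 28);   // final output

  convGraph.compute({x}, {y: tmp});   // assumed conv2d with "same" padding; possible relayout in and out
  reluGraph.compute({x: tmp}, {y});   // and the conversions happen again for the second op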

<Chai__> maybe we should open an issue to track WASM interop scenario

Ping: my question is, for Wasm, is the CPU supposed to always bring back the value of the op, or is it a handle to the tensor?

<Chai__> i'd like to learn more, maybe there is something we can do to improve it further

anssik: I'd propose to spin the relevant high-level requirements for WebNN API into their own issues to be discussed and resolved one by one

<ningxin_hu> I'll create an issue

RafaelCintron: one thing we need to clarify is at which point the web developer's weights are used in the compilation
… if they give us an ArrayBuffer and change it in between, we need to specify how that works
… we should be more explicit about whether compilation means we copy things

<Chai__> +1 on rafael

anssik: Rafael to open an issue on that
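
A small sketch of the ambiguity Rafael raises, reusing the hypothetical builder and input from the earlier sketches; whether the values are copied at constant() time, at build() time, or merely referenced is exactly what the issue should pin down:

  const weights = new Float32Array(32 * 1 * 5 * 5).fill(0.1);
  const filter = builder.constant({type: 'float32', dimensions: [32, 1, 5, 5]}, weights);

  weights.fill(0.5);   // the developer mutates the ArrayBuffer before compiling

  const graph = await builder.build({y: builder.conv2d(input, filter)});
  // Does the compiled graph see 0.1 or 0.5? The answer depends on whether
  // constant()/build() copies the buffer, which the spec needs to state.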

anssik: Req: Optimization for single-op graph execution

<ping_yu> yes, I agree graph API can address the op level requirement

anssik: Ping, are you fine with a subset of the WebNN API satisfying the reqs of the op-specific APIs?

Ping: SGTM

anssik: I propose we keep https://github.com/webmachinelearning/proposals/issues/2 open until we have addressed all the reqs in the WebNN API

Adjourn

Minutes manually created (not a transcript), formatted by scribe.perl version 127 (Wed Dec 30 17:39:58 2020 UTC).

Diagnostics

Maybe present: anssik, Chai, Ping, RafaelCintron, wonsuk__