14:57:22 RRSAgent has joined #webmachinelearning
14:57:22 logging to https://www.w3.org/2021/03/18-webmachinelearning-irc
14:57:24 Chair: Anssi
14:57:25 RRSAgent, make logs Public
14:57:35 Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2021-03-18-agenda.md
14:57:45 Meeting: WebML CG Teleconference – 18 March 2021
14:57:58 Scribe: Anssi
14:58:03 scribeNick: anssik
14:58:10 Present+ Anssi_Kostiainen
15:00:37 Present+ Ningxin_Hu
15:00:57 Present+ Chai_Chaoweeraprasit
15:03:38 Present+ Rafael_Cintron
15:04:42 Topic: webnn-native update
15:04:52 -> https://github.com/webmachinelearning/webnn-native/pull/1 webnn-native initial implementation
15:05:35 Present+ Ganesan_Ramalingam
15:05:44 anssik: "The initial implementation supports 10 first-wave ops required by LeNet example including add, averagePool2d, conv2d, matmul, maxPool2d, mul, relu, reshape, softmax and reshape."
15:05:48 anssik: Ningxin to share an update and plans
15:05:51 ... and raise any topics the group should weigh in on
15:05:59 Present+ Wonsuk_Lee
15:06:26 ningxin_hu: the Dawn project code generator is reused by webnn-native
15:07:10 ... WebNN C/C++ headers are generated by the Dawn code generator from webnn.json and generator/templates
15:07:23 ... webnn.h: the C header (generated as out/Release/gen/src/include/webnn/webnn.h after build)
15:07:29 ... webnn_cpp.h: a C++ wrapper for webnn.h (generated as out/Release/gen/src/include/webnn/webnn_cpp.h after build)
15:07:40 ... designed to keep up with spec changes more easily
15:07:58 ... infrastructure code such as base objects and interface classes is also generated
15:08:33 ... second part, backend implementations:
15:08:42 ... DirectML on Windows 10 (under src/webnn_native/dml/)
15:08:53 ... OpenVINO on Windows 10 and Linux (under src/webnn_native/openvino/)
15:09:10 ... unit and end2end tests cover the 10 first-wave ops (under src/tests/)
15:09:24 ... lastly, there's a LeNet C++ example that is equivalent to webmachinelearning/webnn-samples/lenet (under src/examples/LeNet)
15:12:19 ningxin_hu: Apache 2.0 licensed
15:12:54 ... 3rd-party dependencies: the code generator and infrastructure code of the Dawn project, and the DirectMLX and device wrapper of the DirectML project
15:13:34 ... it'd be welcome if Chai and Rafael could share their comments
15:13:47 present+
15:13:55 q?
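(ed. note: for orientation, a minimal TypeScript sketch of the kind of graph the LeNet example builds, written against an assumed web-facing WebNN API shape of the time (MLGraphBuilder, conv2d, maxPool2d, reshape, matmul, softmax, build); the stub interfaces, exact signatures, and whether build is sync or async are assumptions and may differ from the spec snapshot and from the generated webnn-native C++ wrapper.)

```ts
// Hypothetical stubs for the assumed API shape; not the spec text.
interface MLOperand {}
interface MLGraph {}
interface MLOperandDescriptor { type: 'float32'; dimensions: number[]; }
interface MLGraphBuilder {
  input(name: string, desc: MLOperandDescriptor): MLOperand;
  constant(desc: MLOperandDescriptor, data: Float32Array): MLOperand;
  conv2d(x: MLOperand, filter: MLOperand): MLOperand;
  add(a: MLOperand, b: MLOperand): MLOperand;
  relu(x: MLOperand): MLOperand;
  maxPool2d(x: MLOperand, opts: { windowDimensions: number[]; strides: number[] }): MLOperand;
  reshape(x: MLOperand, newShape: number[]): MLOperand;
  matmul(a: MLOperand, b: MLOperand): MLOperand;
  softmax(x: MLOperand): MLOperand;
  build(outputs: Record<string, MLOperand>): MLGraph;
}

// A LeNet-flavoured chain exercising several of the first-wave ops.
function buildSmallLeNet(builder: MLGraphBuilder,
                         convWeights: Float32Array,        // 6x1x5x5
                         convBias: Float32Array,           // 1x6x1x1
                         fcWeights: Float32Array): MLGraph { // 864x10
  const x = builder.input('input', { type: 'float32', dimensions: [1, 1, 28, 28] });
  const conv = builder.add(
    builder.conv2d(x, builder.constant({ type: 'float32', dimensions: [6, 1, 5, 5] }, convWeights)),
    builder.constant({ type: 'float32', dimensions: [1, 6, 1, 1] }, convBias));
  const pool = builder.maxPool2d(builder.relu(conv),
    { windowDimensions: [2, 2], strides: [2, 2] });        // -> 1x6x12x12
  const flat = builder.reshape(pool, [1, 864]);            // 6 * 12 * 12 = 864
  const logits = builder.matmul(flat,
    builder.constant({ type: 'float32', dimensions: [864, 10] }, fcWeights));
  return builder.build({ output: builder.softmax(logits) });
}
```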
15:14:59 anssik: any blockers to landing the PR?
15:15:21 ningxin_hu: for DML usage, I would like to get Chai's approval for that
15:15:30 Chai: I have been looking at this on and off, it is a lot of work
15:15:46 ... will look at this a bit more
15:15:55 Present+ Ping_Yu
15:16:10 q+
15:16:52 ningxin_hu: the implementation doesn't reflect the very latest spec, but will catch up
15:17:48 ... the webnn-native high-level goals: 1) inform the WebNN API work and the group about op compatibility and 2) provide a performance benchmark
15:17:57 q?
15:18:00 ack RafaelCintron
15:18:20 RafaelCintron: what is the long-term maintenance story?
15:19:16 q+
15:19:27 ningxin_hu: I will continue work on this project in addition to the spec work; I commit to maintaining the project, cannot speak for other contributors of course
15:19:30 q?
15:19:59 ack wonsuk__
15:20:30 wonsuk__: a question about the plan: this is using C++, mostly focused on desktop and embedded systems, right?
15:20:44 ... any plans for mobile implementations, e.g. Android or iOS?
15:21:08 ningxin_hu: from the API perspective we use C/C++ to make this standalone, so there is no runtime or app framework dependency
15:22:24 ... I don't think mobile app usage is our primary goal, but nothing prevents integration in such environments
15:22:28 q?
15:23:10 ningxin_hu: Ping asked an important question about Wasm usage
15:23:30 ... if we need to continue the Wasm usage investigation, we need some tooling for Emscripten
15:23:53 ... that is missing today, but if we want to continue the Wasm investigation this project could be used to fill in that gap
15:24:44 Ping: our thinking is that, if this can be made available, it could be used in the Wasm world similarly to the WebGPU bindings (ed. note: line was breaking, scribe struggles)
15:25:02 q?
15:25:26 https://github.com/webmachinelearning/webnn-native/
15:25:41 Topic: Operation-specific APIs
15:25:52 anssik: Ping and Jonathan shared the Operation-specific APIs proposal use cases and requirements
15:25:57 -> https://github.com/webmachinelearning/proposals/issues/2 Operation-specific APIs proposal
15:26:18 anssik: I'd like us to unpack this proposal and discuss whether the high-level requirements from the proposal could be translated into WebNN API requirements, and whether there's support for that design direction
15:26:35 ... in the agenda I enumerated some requirements derived from discussion in PR #149
15:26:48 ... Req: Direct access to the convolution op
15:26:55 ... Addressed by: https://github.com/webmachinelearning/webnn/pull/149#discussion_r591634607
15:27:34 ... correct?
15:27:35 q?
15:27:42 q+
15:27:46 ack ningxin_hu
15:28:07 ningxin_hu: my point is, if we use the WebNN graph API, developers can create a single-op graph
15:28:37 ... and thanks to Chai's PR, creating a single-op graph reduces to two steps, which could map to Ping's requirement in the Operation-specific APIs proposal
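(ed. note: a minimal TypeScript sketch of the two-step pattern described above: describe a one-op, or fused conv+relu, graph, then compile it once and let the framework reuse the compiled MLGraph like a native op call; the interface stubs are assumptions about the API shape around PR #149, not the spec text.)

```ts
// Hypothetical stubs for the assumed API shape; not the spec text.
interface MLOperand {}
interface MLGraph {}
interface MLGraphBuilder {
  input(name: string, desc: { type: 'float32'; dimensions: number[] }): MLOperand;
  constant(desc: { type: 'float32'; dimensions: number[] }, data: Float32Array): MLOperand;
  conv2d(x: MLOperand, filter: MLOperand): MLOperand;
  relu(x: MLOperand): MLOperand;
  build(outputs: Record<string, MLOperand>): MLGraph;
}

// A framework-side "convolution op": compiled once at model-load time,
// then cached and invoked like a native convolution call.
function compileConvRelu(builder: MLGraphBuilder, filterData: Float32Array): MLGraph {
  // Step 1: describe the single-op (here conv2d fused with relu) graph.
  const x = builder.input('x', { type: 'float32', dimensions: [1, 3, 224, 224] });
  const w = builder.constant({ type: 'float32', dimensions: [32, 3, 3, 3] }, filterData);
  const y = builder.relu(builder.conv2d(x, w));
  // Step 2: compile ahead of time; the framework hangs on to the returned MLGraph.
  return builder.build({ y });
}
```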
15:30:35 Ping: I think it looks fine to use a graph for this; one concern is performance
15:31:13 ... we can use both the graph API and an op API for fusion
15:31:34 ... the only concern is performance, if there's overhead with the graph API
15:32:03 q+
15:32:08 ack Chai__
15:32:59 Chai: I would expect the user of the op-specific convolution to go through the process of compilation before using it; it happens in the framework anyway, e.g. conv+relu
15:33:24 ... you'd create a subgraph for conv+relu and compile it ahead of time; this is something that should already happen in any framework when it loads the model
15:34:11 ... by the time you run them you call into the native convolution API; this happens in WinML and TF, it's a pretty common step
15:34:41 ... executing an op immediately is essentially a combination of the prior steps, including compilation, but it is not expected to happen every time
15:34:55 ... I'm not concerned about performance
15:35:03 the question is whether the caching is done at the framework level or the WebNN level?
15:35:06 ... relative to native
15:35:40 Chai: I think it should be done at the framework level, because the graph has no notion of internal caching
15:36:00 ... there's a GraphBuilder object that has the context for graph building and compilation, and it can be managed by the user
15:36:14 ... the lifetime of that builder belongs to the framework; it is only for building, not for executing
15:36:22 for example WebGL has shader caching; I think WebNN might want to create low-level caching for compiled graphs
15:36:35 ... the MLGraph, the compiled graph, is something the framework would hang on to
15:37:02 ... maybe conv+relu, or just conv; the framework should keep them, and when executing it just invokes them, similar to a native API
15:37:33 Ping: thanks for the explanation
15:37:52 ... I think WebGL has its own shader caching; we try to hide those caches
15:39:28 Chai: from the implementation standpoint, the compile step goes from the platform to the driver, interrogating what the driver supports
15:39:47 ... DML would then create the pathway to reach the hardware block
15:40:54 ... not worried about performance, because it is going to be the same as with native, unless your browser implementation does something inefficient
15:41:19 ... the spec allows for an efficient implementation, so the WebNN API spec does not constrain performance
15:41:39 ... from the API semantics point of view, this is exactly what it is designed for
15:41:45 sounds good
15:42:33 anssik: Req: Support for native tensor types, GPU buffers
15:42:41 ... Addressed by https://github.com/webmachinelearning/webnn/pull/149 for GPU buffers
15:43:26 ... what is missing? Is this a reasonable req?
15:45:28 Ping: the question is whether we should keep a device-specific graph, or a device-agnostic graph compiled to a specific device
15:45:44 ... Req: Device preference setting when compiling a graph to reduce IO between accelerators
15:45:59 q+
15:46:04 ack ningxin_hu
15:46:16 ningxin_hu: I asked this question at our last meeting
15:46:24 ... in the PR discussion Chai mentioned a use case
15:46:42 ... to create a device-specific graph that allows constants to be loaded from GPU or device buffers
15:46:49 ... in the PR discussions my question was resolved
15:47:05 ... looking at the latest spec, a device-specific graph can be created
15:47:06 q?
15:47:52 Chai: the last outstanding issue is around how the caller of this API, using it as an op API, can do resource upload and download(?)
15:48:36 ... it should already be supported the way the PR is written: if the context is created from an existing device and inputs and outputs are submitted as device resources, then by definition the caller of this API is responsible for manually downloading back to CPU memory
15:49:03 ... I haven't explained this well in the spec, but I want to improve this and add more text to amend PR #149
15:49:24 ... if the caller asks WebNN to create a context for their own device, they're responsible
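(ed. note: a rough TypeScript sketch of the division of responsibility Chai describes: the context wraps the caller's own device, inputs and outputs are device buffers, and the caller does the readback to CPU memory; the stub types, readBackToCPU(), and the compute() signature are illustrative assumptions, not the PR #149 text.)

```ts
// Hypothetical stubs; real WebNN/WebGPU interop types and names may differ.
interface GPUBufferLike { readBackToCPU(): Promise<Float32Array>; } // stand-in readback helper
interface MLGraph {
  // Assumed: when the context wraps the caller's device, compute() accepts device buffers.
  compute(inputs: Record<string, GPUBufferLike>, outputs: Record<string, GPUBufferLike>): void;
}

async function runOnCallerDevice(graph: MLGraph,
                                 input: GPUBufferLike,
                                 output: GPUBufferLike): Promise<Float32Array> {
  // WebNN executes on the caller's device and writes into the caller's output buffer...
  graph.compute({ x: input }, { y: output });
  // ...and the caller, not WebNN, downloads the result back to CPU memory when needed.
  return output.readBackToCPU();
}
```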
15:50:14 q+
15:50:17 q?
15:50:27 ack ningxin_hu
15:51:00 ningxin_hu: I have no question about the GPU interop, but a question about the use case for the op-specific API in the Wasm CPU scenario
15:51:20 ... in the latest API you can only interact with CPU buffers for Wasm
15:51:42 ... that way, if the device is CPU, the implementation will use some optimized memory layout for hardware acceleration
15:52:00 ... if the user code uses Wasm and ArrayBuffer, that'll result in every input and output being re-laid out
15:52:21 ... not optimal when executing multiple ops without access to intermediate results
15:53:42 maybe we should open an issue to track the Wasm interop scenario
15:53:44 Ping: my question is, for Wasm, whether the CPU path is supposed to always bring back the value of the op, or whether it is a handle to the tensor
15:53:59 I'd like to learn more, maybe there is something we can do to improve it further
15:55:00 anssik: I'd propose spinning the relevant high-level requirements for the WebNN API into their own issues to be discussed and resolved one by one
15:55:02 I'll create an issue
15:55:16 q+
15:55:21 ack RafaelCintron
15:55:53 RafaelCintron: wanted to say, one thing we need to clarify is at which point the web developer's weights are used in the compilation
15:56:25 ... if they give us an ArrayBuffer and change it in between, we need to specify when that works
15:56:38 ... we should be more explicit about whether compilation means we copy things
15:56:59 +1 on Rafael
15:57:09 anssik: Rafael to open an issue on that
15:57:20 anssik: Req: Optimization for single-op graph execution
15:58:47 yes, I agree the graph API can address the op-level requirement
15:59:03 anssik: Ping, are you fine with a subset of the WebNN API satisfying the reqs of the op-specific APIs?
15:59:08 Ping: SGTM
15:59:47 anssik: I propose we keep https://github.com/webmachinelearning/proposals/issues/2 open until we have addressed all the reqs in the WebNN API
16:01:48 Topic: Adjourn
16:01:51 RRSAgent, draft minutes v2
16:01:51 I have made the request to generate https://www.w3.org/2021/03/18-webmachinelearning-minutes.html anssik
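(ed. note: a small TypeScript sketch of the ambiguity Rafael's issue is about: if a framework hands weight data to constant() and then mutates the ArrayBuffer before build(), which bytes does the compiled graph see? That depends on whether constant()/build() copy eagerly or keep a reference; the stub interfaces are assumptions, reusing the shape sketched earlier in these notes.)

```ts
// Hypothetical stubs for the assumed API shape; not the spec text.
interface MLOperand {}
interface MLGraph {}
interface MLGraphBuilder {
  constant(desc: { type: 'float32'; dimensions: number[] }, data: Float32Array): MLOperand;
  relu(x: MLOperand): MLOperand;
  build(outputs: Record<string, MLOperand>): MLGraph;
}

// A degenerate graph, just to show the timing question around weight copies.
function buildWithMutatedWeights(builder: MLGraphBuilder): MLGraph {
  const weights = new Float32Array([1, 2, 3, 4]);
  const w = builder.constant({ type: 'float32', dimensions: [2, 2] }, weights);
  weights.fill(0);                              // mutate after handing the buffer to WebNN...
  return builder.build({ y: builder.relu(w) }); // ...does the graph bake in [1,2,3,4] or zeros?
}
```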