14:59:12 RRSAgent has joined #webmachinelearning
14:59:12 logging to https://www.w3.org/2019/12/05-webmachinelearning-irc
14:59:17 Zakim has joined #webmachinelearning
14:59:22 RRSAgent, make logs public
14:59:28 Meeting: WebML CG Teleconference – 5 December 2019
14:59:36 Chair: Anssi
14:59:38 Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2019-12-05-agenda.md
14:59:42 ningxinhu has joined #webmachinelearning
14:59:42 Scribe: Anssi
14:59:47 scribeNick: anssik
15:00:55 Present+ Anssi_Kostiainen, Rafael_Cintron, Ningxin_Hu
15:01:07 Rafael has joined #webmachinelearning
15:01:12 RRSAgent, draft minutes v2
15:01:12 I have made the request to generate https://www.w3.org/2019/12/05-webmachinelearning-minutes.html anssik
15:02:39 Present+ Nikhil_Thorat
15:02:49 Rama has joined #webmachinelearning
15:03:13 Present+ Ganesan_Ramalingam
15:03:41 Present+ Ningxin_Hu
15:05:34 TOPIC: Buffer sharing between GPU and ML accelerator
15:05:40 -> https://github.com/webmachinelearning/webnn/issues/33 [investigation] buffer sharing between GPU and ML accelerator
15:05:59 anssik: on our 3 Oct call the group wanted to understand buffer sharing between GPU and ML accelerators
15:06:17 ... Ningxin took ownership to work with Paul McDaniel to figure out how to approach this problem
15:06:40 ... the expectation is to demonstrate the feasibility of running expensive ops such as conv2d on dedicated accelerators
15:06:46 ... and sharing buffers with a WebGPU compute shader for running custom ops
15:06:59 ... Ningxin, please give us an update on this investigation and the areas where the group could contribute
15:07:09 jdarpinian has joined #webmachinelearning
15:07:41 Present+ James_Darpinian
15:08:14 paul_msft has joined #webmachinelearning
15:08:18 ningxinhu: status as of today: dependency work, a POC on Windows with DirectML and Intel's IA and PC devkit
15:08:28 Present+ Paul_McDaniel
15:08:59 ningxinhu: rebased onto Chromium 80 for WebGPU D3D12 support
15:09:09 ... that caused the TF.js backend to crash on that version :-/
15:09:29 ... my investigation shows a lack of read-only storage buffer support in Chromium WebGPU on Windows
15:09:50 ... the workaround is to remove the readonly declaration in the TF.js preprocessor
15:10:10 ... Rafael, working on Chromium WebGPU on Windows, and Nikhil, working on the TF.js backend, might be able to help
15:10:28 Rafael: lack of readonly is a known issue, so we need to work without it
15:10:41 ningxinhu: want to get your confirmation this is a viable workaround
15:10:59 Rafael: should be fine, things will just get better when we have proper readonly support
15:11:04 ... want to hear what you found
15:12:38 ningxinhu: ported the POC to Windows with a DirectML backend for buffer sharing; at the JS surface, sharing happens through a WebGPU buffer
15:13:09 ... the WebGPU backend on D3D12 and the WebNN backend on DirectML share buffers via ID3D12Resource
15:13:34 ... Test1 - conv2d/add/relu (WebGPU): 37.93 ms
15:13:34 ... Test2 - conv2d (WebNN) -> ArrayBufferView -> add/relu (WebGPU): 27.04 ms
15:13:34 ... Test3 - conv2d (WebNN) -> WebGPUBuffer -> add/relu (WebGPU): 9.18 ms
15:13:34 ... Test4 - conv2d/add/relu (WebNN): 7.58 ms
15:14:28 ... Test4 runs all 3 ops on DirectML
15:15:01 Rafael: can you clarify what the difference is between Test1 and Test3?
15:15:56 ningxinhu: Test1 runs everything in a compute shader; Test3 runs conv2d in WebNN and add/relu in a WebGPU compute shader
15:16:43 ... we want to prove that custom ops can be done on WebGPU and optimized ops can be offloaded to WebNN, with still-efficient interop
15:16:52 ... Test3 is the key result
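[Editor's sketch] For orientation, here is a minimal sketch of the interop pattern Test3 measures: conv2d offloaded to a WebNN-style backend whose output is handed to a WebGPU compute pass as a GPUBuffer, avoiding the ArrayBufferView round trip of Test2. The `navigator.ml`, `createContext`, and `compute` names are hypothetical placeholders, not the POC's actual interface (whether WebGPUBuffer should be a WebNN input/output is exactly the open question raised below); only the WebGPU calls are the standard API.

```ts
// Sketch only: conv2d on a hypothetical ML context, then add/relu as a
// WebGPU compute pass binding the shared GPUBuffer directly (Test3's path).
declare const conv2dGraph: unknown;          // conv2d graph built earlier (hypothetical)
declare const input: Float32Array;           // input tensor data
declare const pipeline: GPUComputePipeline;  // add/relu compute pipeline, created elsewhere
declare const workgroupCount: number;

const adapter = await navigator.gpu.requestAdapter();
const device = await adapter!.requestDevice();

// Hypothetical ML context tied to the WebGPU device; in the POC this maps to
// the DirectML backend sharing ID3D12Resource with the D3D12 WebGPU backend.
const mlContext = await (navigator as any).ml.createContext({ gpuDevice: device });
const convOutput: GPUBuffer = await mlContext.compute(conv2dGraph, { input }); // hypothetical

// Bind the conv2d result with no readback into an ArrayBufferView.
const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [{ binding: 0, resource: { buffer: convOutput } }],
});
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(workgroupCount); // add/relu over the conv2d result
pass.end();
device.queue.submit([encoder.finish()]);
```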
15:17:22 Rafael: when you say "WebNN", that means "DirectML" for the purposes of the POC
15:17:27 ningxinhu: right
15:19:19 ningxinhu: question for the group: do we need to support WebGPUBuffer as input and output for WebNN execution?
15:19:49 Rafael: I think sharing with WebGPU buffers is one route, or we have another notion of a buffer
15:20:12 ... the implementation can use D3D12 or something else under the covers
15:20:51 ningxinhu: I remember an earlier discussion where you mentioned the idea of an umbrella interface like NNBuffer that supports both CPU and GPU buffers, is it worth exploring?
15:21:18 Rafael: you mean a buffer that abstracts out the hardware?
15:22:03 ningxinhu: yes, asking if an abstract interface makes sense and can work efficiently
15:22:22 Rafael: either way we can explore; it comes down to how much we want to tie our destiny to the WebGPU group
15:22:45 webmlbuffer: https://github.com/webmachinelearning/webnn/issues/17#issuecomment-519304938
15:22:46 ... as long as it's under the covers, we can be flexible in the future if these "buffers" end up not doing it on the GPU
15:23:38 anssik: did you get the feedback you wanted?
15:24:26 ningxinhu: yes, this area needs more investigation, e.g. the buffer abstraction level needs to be looked at
15:25:55 ningxinhu: another part of the investigation: leveraged the DXCore API to enumerate adapters, including compute-only devices
15:26:41 ... implemented a polyfill for the "low-power" preference; in the POC we map that to the DirectML backend on a low-power ML accelerator, enumerating devices via DXCore
15:27:32 ... in our experiment we use a small PC with both a GPU and a VPU connected via the M.2 interface
15:29:09 Rafael: in this experiment, you did not try to do custom ops, all ops run on DirectML?
15:29:22 ningxinhu: yes, in this experiment everything runs on DirectML on the VPU
15:29:40 Rafael: how does this differ from Test4?
15:29:52 ningxinhu: Test4 is on the GPU, and this is on the VPU
15:30:06 Rafael: how long did Test4 take on the VPU?
15:31:11 ningxinhu: not able to publish performance numbers for the VPU yet
15:32:22 ... that's the status so far; one open question is whether we can share a buffer between a WebGPU compute shader and work offloaded to the VPU, which is open in terms of multi-adapter support in DirectML
15:33:00 ... another question: the VPU is a DXCore device, not sure if DXCore has multi-adapter support
15:33:33 Rafael: WebGPU lets us ask for an adapter given capabilities
15:33:51 ... e.g. "give me an ML adapter or an adapter that can do ML well"
15:34:57 ... does the VPU let you run a high-level shading language, or just custom ops?
15:35:04 ningxinhu: I need to look into that
15:35:27 Rafael: DXCore has multi-adapter support, can enumerate adapters etc.
15:36:09 ningxinhu: can two WebGPU devices share a buffer between themselves?
15:36:25 Rafael: in WebGPU in general, no you cannot, nor can you share textures
15:37:09 ... you can go and share heaps and such with two WebGPU adapters, but it's very slow and you cannot use proprietary compression schemes etc.
15:37:35 ... it's not recommended to do that; we shouldn't expose buffers and textures that can be shared between two devices
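[Editor's sketch] A sketch of the "low-power" preference polyfill mentioned at 15:26:41, to make the device-selection question concrete. The option and method names below are assumptions for illustration only; the actual mapping happens in native code, where the POC uses DXCore to enumerate adapters (including compute-only devices such as the VPU) and binds the DirectML backend to the chosen one.

```ts
// Hypothetical sketch: a power preference surfaced to JS, polyfilled in the
// POC by picking a DXCore-enumerated compute-only adapter for DirectML.
declare const graph: unknown;  // model graph built earlier (hypothetical)
declare const inputs: Record<string, Float32Array>;

const context = await (navigator as any).ml.createContext({
  powerPreference: 'low-power', // POC: prefer the VPU enumerated via DXCore
});
// With a 'high-performance' (or default) preference the same graph would be
// compiled for the GPU adapter instead (Test4's configuration).
const compiled = await context.compile(graph);   // hypothetical call
const results = await compiled.compute(inputs);  // hypothetical call
```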
15:38:14 ningxinhu: the worry is, if the VPU is the only device but it needs to share a buffer with WebGPU, it will take a performance penalty
15:39:52 Rafael: let's pretend we tie ourselves to WebGPU: we can do custom ops with custom shaders, but if there are ML adapters the web developer is forced to go back to ArrayBuffers
15:41:40 ningxinhu: future investigation item proposal: can we run Test3 on the VPU?
15:42:22 Rafael: good to understand the cost of that
15:42:44 RRSAgent, draft minutes v2
15:42:44 I have made the request to generate https://www.w3.org/2019/12/05-webmachinelearning-minutes.html anssik
15:43:57 ningxinhu: we're using the TF.js WebGPU backend as a test vehicle, so we want the TF.js folks' feedback on whether this is the right way to go
15:44:07 Present+ Daniel_Smilkov
15:44:25 Daniel: good approach, we are now fusing ops more aggressively before inference
15:44:54 ... we would run a single shader on a fused op
15:45:14 ... there's a fixed overhead to running a shader
15:46:00 ... as far as WebGPUBuffers, getting a custom op sounds reasonable; this API should also allow getting a TypedArray back if executed on the CPU, or a GPUBuffer if on the GPU
15:46:19 ningxinhu: thanks, this links back to Rafael's idea; something to explore
15:47:05 TOPIC: Op compatibility
15:47:15 anssik: we started tracking op compatibility a while ago, have two ops under investigation, looking at conv2d first
15:47:23 -> https://github.com/webmachinelearning/webnn/issues/28 [op compatibility] conv2d
15:47:33 anssik: open issues:
15:47:40 ... 1. dilation is not supported by BNNS
15:47:40 ... 2. MPS and BNNS only support the same beginning and ending paddings along each spatial axis
15:47:40 ... 3. strides for the N and C dimensions are not supported by all APIs
15:47:40 ... 4. int to unsigned int conversion for strides and dilations is required by MPS, BNNS and DML
15:48:07 ... Benjamin raised a couple of new opens:
15:48:22 ... 1. Precision
15:48:22 ... 2. Input shape
15:48:22 ... 3. Dimensions as int or a Number -> ToLength()
15:48:33 ... anyone able to help with these?
15:49:09 anssik: missing anything?
15:50:20 Rafael: we haven't yet validated the DML column in the table
15:50:31 Paul: looking at the table briefly, thanks Ningxin!
15:50:50 ... where's the ONNX column?
15:51:34 ningxinhu: I just linked the APIs we're using in the WebNN POC; on Windows we use DirectML, TF would be at another level
15:52:19 Paul: we discussed a schema as an interop mechanism? can we compare the ONNX schema vs. the TF schema?
15:52:57 ningxinhu: maybe Nikhil can comment on that, my contribution is from the POC/implementation perspective
15:54:22 https://github.com/webmachinelearning/webnn/issues/17#issuecomment-519562441
15:54:46 anssik: RESOLVED: The specification will reference a subset of the ONNX operations, starting small, adding more ops when compatibility with major ML JavaScript frameworks has been validated
15:56:05 James: we shouldn't ignore the underlying APIs, because they're the foundation on top of which we'd implement the WebNN API
15:56:46 Paul: could we add a column for ONNX to the table?
15:57:39 anssik: proposing we move the table to e.g. a markdown file so Paul could contribute the ONNX data
15:58:05 James: need to find the restricted subset of ONNX that can be implemented
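[Editor's sketch] To make the conv2d compatibility opens above easier to scan, here is an illustrative option-bag shape with the four numbered issues annotated. The interface and field names are invented for this sketch and are not the WebNN spec's.

```ts
// Illustrative only: invented Conv2dOptions used to annotate the issues above.
interface Conv2dOptions {
  // Issue 1: dilation is not supported by BNNS, so non-default dilations
  // would need emulation or an error on that backend.
  dilations?: [number, number];
  // Issue 2: MPS and BNNS only support equal beginning/ending padding per
  // spatial axis; asymmetric padding may need an explicit pad step there.
  padding?: [number, number, number, number]; // [top, bottom, left, right]
  // Issue 3: strides along the N and C dimensions are not supported by all
  // APIs; limiting strides to the spatial dimensions avoids that gap.
  strides?: [number, number];
  // Issue 4: MPS, BNNS and DML take unsigned ints for strides/dilations, so
  // negative or non-integer values must be rejected (cf. the ToLength() open).
}
```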