14:59:12 RRSAgent has joined #webmachinelearning
14:59:12 logging to https://www.w3.org/2019/12/05-webmachinelearning-irc
14:59:17 Zakim has joined #webmachinelearning
14:59:22 RRSAgent, make logs public
14:59:28 Meeting: WebML CG Teleconference – 5 December 2019
14:59:36 Chair: Anssi
14:59:38 Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2019-12-05-agenda.md
14:59:42 ningxinhu has joined #webmachinelearning
14:59:42 Scribe: Anssi
14:59:47 scribeNick: anssik
15:00:55 Present+ Anssi_Kostiainen, Rafael_Cintron, Ningxin_Hu
15:01:07 Rafael has joined #webmachinelearning
15:01:12 RRSAgent, draft minutes v2
15:01:12 I have made the request to generate https://www.w3.org/2019/12/05-webmachinelearning-minutes.html anssik
15:02:39 Present+ Nikhil_Thorat
15:02:49 Rama has joined #webmachinelearning
15:03:13 Present+ Ganesan_Ramalingam
15:03:41 Present+ Ningxin_Hu
15:05:34 TOPIC: Buffer sharing between GPU and ML accelerator
15:05:40 -> https://github.com/webmachinelearning/webnn/issues/33 [investigation] buffer sharing between GPU and ML accelerator
15:05:59 anssik: on our 3 Oct call the group wanted to understand buffer sharing between GPU and ML accelerators
15:06:17 ... Ningxin took ownership to work with Paul McDaniel to figure out how to approach this problem
15:06:40 ... the expectation is to demonstrate the feasibility of running expensive ops such as conv2d on dedicated accelerators
15:06:46 ... and sharing buffers with a WebGPU compute shader for running custom ops
15:06:59 ... Ningxin, please give us an update on this investigation and the areas where the group could contribute
15:07:09 jdarpinian has joined #webmachinelearning
15:07:41 Present+ James_Darpinian
15:08:14 paul_msft has joined #webmachinelearning
15:08:18 ningxinhu: status as of today: dependency work, a POC on Windows with DirectML and Intel's IA and PC devkit
15:08:28 Present+ Paul_McDaniel
15:08:59 ningxinhu: rebased onto Chromium 80 for WebGPU D3D12 support
15:09:09 ... that caused the TF.js backend to crash on that version :-/
15:09:29 ... my investigation shows a lack of read-only storage buffer support in Chromium WebGPU on Windows
15:09:50 ... the workaround is to remove the readonly declaration in the TF.js preprocessor
15:10:10 ... Rafael, working on Chromium WebGPU on Windows, and Nikhil, working on the TF.js backend, might be able to help
15:10:28 Rafael: lack of readonly is a known issue, so we need to work without it
15:10:41 ningxinhu: want to get your confirmation this is a viable workaround
15:10:59 Rafael: should be fine, things will just get better when we have proper readonly support
15:11:04 ... want to hear what you found
15:12:38 ningxinhu: ported the POC to Windows with a DirectML backend for buffer sharing; at the JS surface, sharing happens through a WebGPU buffer
15:13:09 ... the WebGPU backend on D3D12 and the WebNN backend on DirectML share buffers via ID3D12Resource
15:13:34 ... Test1 - conv2d/add/relu (WebGPU): 37.93 ms
15:13:34 ... Test2 - conv2d (WebNN) -> ArrayBufferView -> add/relu (WebGPU): 27.04 ms
15:13:34 ... Test3 - conv2d (WebNN) -> WebGPUBuffer -> add/relu (WebGPU): 9.18 ms
15:13:34 ... Test4 - conv2d/add/relu (WebNN): 7.58 ms
15:14:28 ... Test4 runs all 3 ops on DirectML
15:15:01 Rafael: can you clarify what the difference is between Test1 and Test3?
15:15:56 ningxinhu: Test1 runs everything in a compute shader; Test3 runs conv2d in WebNN and add/relu in a WebGPU compute shader
15:16:43 ... we want to prove that custom ops can be done on WebGPU and optimized ops can be offloaded to WebNN, with still-efficient interop
15:16:52 ... Test3 is the key result
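[Editor's sketch] For orientation, here is a minimal sketch of the interop pattern Test3 measures: conv2d offloaded to a WebNN-style backend whose output is handed to a WebGPU compute pass as a GPUBuffer, avoiding the ArrayBufferView round trip of Test2. The `navigator.ml`, `createContext`, and `compute` names are hypothetical placeholders, not the POC's actual interface (whether WebGPUBuffer should be a WebNN input/output is exactly the open question raised below); only the WebGPU calls are the standard API.

```ts
// Sketch only: conv2d on a hypothetical ML context, then add/relu as a
// WebGPU compute pass binding the shared GPUBuffer directly (Test3's path).
declare const conv2dGraph: unknown;          // conv2d graph built earlier (hypothetical)
declare const input: Float32Array;           // input tensor data
declare const pipeline: GPUComputePipeline;  // add/relu compute pipeline, created elsewhere
declare const workgroupCount: number;

const adapter = await navigator.gpu.requestAdapter();
const device = await adapter!.requestDevice();

// Hypothetical ML context tied to the WebGPU device; in the POC this maps to
// the DirectML backend sharing ID3D12Resource with the D3D12 WebGPU backend.
const mlContext = await (navigator as any).ml.createContext({ gpuDevice: device });
const convOutput: GPUBuffer = await mlContext.compute(conv2dGraph, { input }); // hypothetical

// Bind the conv2d result with no readback into an ArrayBufferView.
const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [{ binding: 0, resource: { buffer: convOutput } }],
});
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(workgroupCount); // add/relu over the conv2d result
pass.end();
device.queue.submit([encoder.finish()]);
```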
15:17:22 Rafael: when you say "WebNN", that means "DirectML" for the purposes of the POC
15:17:27 ningxinhu: right
15:19:19 ningxinhu: question for the group: do we need to support WebGPUBuffer as input and output for WebNN execution?
15:19:49 Rafael: I think sharing with WebGPU buffers is one route, or we have another notion of a buffer
15:20:12 ... the implementation can use D3D12 or something else under the covers
15:20:51 ningxinhu: I remember an earlier discussion where you mentioned the idea of an umbrella interface like NNBuffer that supports both CPU and GPU buffers, is it worth exploring?
15:21:18 Rafael: you mean a buffer that abstracts out the hardware?
15:22:03 ningxinhu: yes, asking if an abstract interface makes sense and can work efficiently
15:22:22 Rafael: either way we can explore; it comes down to how much we want to tie our destiny to the WebGPU group
15:22:45 webmlbuffer: https://github.com/webmachinelearning/webnn/issues/17#issuecomment-519304938
15:22:46 ... as long as it's under the covers, we can be flexible in the future if these "buffers" end up not doing it on the GPU
15:23:38 anssik: did you get the feedback you wanted?
15:24:26 ningxinhu: yes, this area needs more investigation, e.g. the buffer abstraction level needs to be looked at
15:25:55 ningxinhu: another part of the investigation: leveraged the DXCore API to enumerate adapters, including compute-only devices
15:26:41 ... implemented a polyfill for the "low-power" preference; in the POC we map that to the DirectML backend on a low-power ML accelerator, enumerating devices via DXCore
15:27:32 ... in our experiment we use a small PC with both a GPU and a VPU connected via the M.2 interface
15:29:09 Rafael: in this experiment, you did not try to do custom ops, all ops run on DirectML?
15:29:22 ningxinhu: yes, in this experiment everything runs on DirectML on the VPU
15:29:40 Rafael: how does this differ from Test4?
15:29:52 ningxinhu: Test4 is on the GPU, and this is on the VPU
15:30:06 Rafael: how long did Test4 take on the VPU?
15:31:11 ningxinhu: not able to publish performance numbers for the VPU yet
15:32:22 ... that's the status so far; one open question is whether we can share a buffer between a WebGPU compute shader and work offloaded to the VPU, which is open in terms of multi-adapter support in DirectML
15:33:00 ... another question: the VPU is a DXCore device, not sure if DXCore has multi-adapter support
15:33:33 Rafael: WebGPU lets us ask for an adapter given capabilities
15:33:51 ... e.g. "give me an ML adapter or an adapter that can do ML well"
15:34:57 ... does the VPU let you run a high-level shading language, or just custom ops?
15:35:04 ningxinhu: I need to look into that
15:35:27 Rafael: DXCore has multi-adapter support, can enumerate adapters etc.
15:36:09 ningxinhu: can two WebGPU devices share a buffer between themselves?
15:36:25 Rafael: in WebGPU in general, no you cannot, nor can you share textures
15:37:09 ... you can go and share heaps and such with two WebGPU adapters, but it's very slow and you cannot use proprietary compression schemes etc.
15:37:35 ... it's not recommended to do that; we shouldn't expose buffers and textures that can be shared between two devices
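[Editor's sketch] A sketch of the "low-power" preference polyfill mentioned at 15:26:41, to make the device-selection question concrete. The option and method names below are assumptions for illustration only; the actual mapping happens in native code, where the POC uses DXCore to enumerate adapters (including compute-only devices such as the VPU) and binds the DirectML backend to the chosen one.

```ts
// Hypothetical sketch: a power preference surfaced to JS, polyfilled in the
// POC by picking a DXCore-enumerated compute-only adapter for DirectML.
declare const graph: unknown;  // model graph built earlier (hypothetical)
declare const inputs: Record<string, Float32Array>;

const context = await (navigator as any).ml.createContext({
  powerPreference: 'low-power', // POC: prefer the VPU enumerated via DXCore
});
// With a 'high-performance' (or default) preference the same graph would be
// compiled for the GPU adapter instead (Test4's configuration).
const compiled = await context.compile(graph);   // hypothetical call
const results = await compiled.compute(inputs);  // hypothetical call
```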
15:38:14 ningxinhu: the worry is, if the VPU is the only device but it needs to share a buffer with WebGPU, it will take a performance penalty
15:39:52 Rafael: let's pretend we tie ourselves to WebGPU: we can do custom ops with custom shaders, but if there are ML adapters the web developer is forced to go back to ArrayBuffers
15:41:40 ningxinhu: future investigation item proposal: can we run Test3 on the VPU?
15:42:22 Rafael: good to understand the cost of that
15:42:44 RRSAgent, draft minutes v2
15:42:44 I have made the request to generate https://www.w3.org/2019/12/05-webmachinelearning-minutes.html anssik
15:43:57 ningxinhu: we're using the TF.js WebGPU backend as a test vehicle, so we want the TF.js folks' feedback on whether this is the right way to go
15:44:07 Present+ Daniel_Smilkov
15:44:25 Daniel: good approach, we are now fusing ops more aggressively before inference
15:44:54 ... we would run a single shader on a fused op
15:45:14 ... there's a fixed overhead to running a shader
15:46:00 ... as far as WebGPUBuffers, getting a custom op sounds reasonable; this API should also allow getting a TypedArray back if executed on the CPU, or a GPUBuffer if on the GPU
15:46:19 ningxinhu: thanks, this links back to Rafael's idea; something to explore
15:47:05 TOPIC: Op compatibility
15:47:15 anssik: we started tracking op compatibility a while ago, have two ops under investigation, looking at conv2d first
15:47:23 -> https://github.com/webmachinelearning/webnn/issues/28 [op compatibility] conv2d
15:47:33 anssik: open issues:
15:47:40 ... 1. dilation is not supported by BNNS
15:47:40 ... 2. MPS and BNNS only support the same beginning and ending paddings along each spatial axis
15:47:40 ... 3. strides for the N and C dimensions are not supported by all APIs
15:47:40 ... 4. int to unsigned int conversion for strides and dilations is required by MPS, BNNS and DML
15:48:07 ... Benjamin raised a couple of new opens:
15:48:22 ... 1. Precision
15:48:22 ... 2. Input shape
15:48:22 ... 3. Dimensions as int or a Number -> ToLength()
15:48:33 ... anyone able to help with these?
15:49:09 anssik: missing anything?
15:50:20 Rafael: we haven't yet validated the DML column in the table
15:50:31 Paul: looking at the table briefly, thanks Ningxin!
15:50:50 ... where's the ONNX column?
15:51:34 ningxinhu: I just linked the APIs we're using in the WebNN POC; on Windows we use DirectML, TF would be at another level
15:52:19 Paul: we discussed a schema as an interop mechanism? can we compare the ONNX schema vs. the TF schema?
15:52:57 ningxinhu: maybe Nikhil can comment on that, my contribution is from the POC/implementation perspective
15:54:22 https://github.com/webmachinelearning/webnn/issues/17#issuecomment-519562441
15:54:46 anssik: RESOLVED: The specification will reference a subset of the ONNX operations, starting small, adding more ops when compatibility with major ML JavaScript frameworks has been validated
15:56:05 James: we shouldn't ignore the underlying APIs, because they're the foundation on top of which we'd implement the WebNN API
15:56:46 Paul: could we add a column for ONNX to the table?
15:57:39 anssik: proposing we move the table to e.g. a markdown file so Paul could contribute the ONNX data
15:58:05 James: need to find the restricted subset of ONNX that can be implemented
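[Editor's sketch] To make the conv2d compatibility opens above easier to scan, here is an illustrative option-bag shape with the four numbered issues annotated. The interface and field names are invented for this sketch and are not the WebNN spec's.

```ts
// Illustrative only: invented Conv2dOptions used to annotate the issues above.
interface Conv2dOptions {
  // Issue 1: dilation is not supported by BNNS, so non-default dilations
  // would need emulation or an error on that backend.
  dilations?: [number, number];
  // Issue 2: MPS and BNNS only support equal beginning/ending padding per
  // spatial axis; asymmetric padding may need an explicit pad step there.
  padding?: [number, number, number, number]; // [top, bottom, left, right]
  // Issue 3: strides along the N and C dimensions are not supported by all
  // APIs; limiting strides to the spatial dimensions avoids that gap.
  strides?: [number, number];
  // Issue 4: MPS, BNNS and DML take unsigned ints for strides/dilations, so
  // negative or non-integer values must be rejected (cf. the ToLength() open).
}
```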