W3C

– DRAFT –
WebML CG Teleconference – 3 October 2019

03 October 2019

Attendees

Present
Anssi_Kostiainen, Daniel_Smilkov, Ganesan_Ramalingam, Greg_Whitworth, Kai_Ninomiya, Nikhil_Thorat, Ningxin_Hu, Paul_McDaniel, Rafael_Cintron
Regrets
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

F2F recap

F2F agenda

F2F Day 1 minutes

F2F Day 2 minutes

<jdarpinian> me too, "The Webex Meeting is locked"

<jdarpinian> got in, thanks!

Rafael: F2F minutes were clear, discussed with Apple at WebGPU F2F
… spoke with Myles Maxfield, he told me Apple favors an API that is not a WebGPU extension
… if it were, it would be easy for developers to misuse the API

jdarpinian: also talked to Myles, and I think he did not know if their hw allows sharing buffers between GPU and ML hardware
… WebGPU extension does not mean buffers allocated on GPU necessarily
… would be good to be able to specify "I want to use this buffer for ML"
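
A hypothetical sketch of what such a hint could look like in WebIDL; the `ML` usage flag and its value are invented here for illustration and are not part of the WebGPU spec:

```webidl
// Hypothetical extension of WebGPU's GPUBufferUsage flags.
// The ML constant below is invented for illustration only.
partial namespace GPUBufferUsage {
  // Hint that the buffer may be consumed by ML hardware, letting the
  // implementation place it in memory reachable by both the GPU and
  // the ML device (when the platform supports such sharing).
  const GPUFlagsConstant ML = 0x0400;
};
```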

anssik: are there minutes from WebGPU?

jdarpinian: can look into the minutes

paul: Microsoft also has custom hw for ML offloading that does not share GPU buffers, so we must support the scenario of non-GPU hw that does not share buffers

jdarpinian: about sharing buffers, we'll want an API that does not share buffers and is not WebGPU-based; that does not necessarily mean we shouldn't investigate WebGPU-based APIs, since GPUs are gaining ML-oriented features
… also, it still might be simpler to ship a WebGPU-based API even if it would not perform as well on every platform, e.g. on those that cannot share buffers

Ningxin_Hu: questions re the WebGPU F2F: James mentioned a WebGPU extension, did you also discuss a WebGL extension at the WebGPU F2F?

jdarpinian: WebGL extension not discussed directly
… the VulkanML F2F had discussions on MLIR and TVM
… no meta-command API is going into Vulkan; instead they prefer exposing lower-level primitives that let shaders access the tensor cores of today's GPUs, write their own kernels, and do kernel fusion
… not sure if that direction makes sense for us, just a data point

anssik: could anyone from VulkanML participate in this group?

jdarpinian: more hw vendors, e.g. ARM and Qualcomm, would be nice to get as participants here

https://‌www.w3.org/‌2019/‌Talks/‌dhm-ml-workshop/‌standardization.html

https://‌www.w3.org/‌2019/‌Talks/‌dhm-ml-workshop/

https://‌github.com/‌webmachinelearning/‌webnn/‌blob/‌master/‌explainer.md

https://‌github.com/‌immersive-web/‌webxr/‌blob/‌master/‌explainer.md

<jdarpinian> webgpu face to face meeting minutes, ML mentioned briefly: https://‌docs.google.com/‌document/‌d/‌1CmKo59tjZwmePVrFpHpIG0W5shKR_GOrnNuMStPCEko/‌edit

WebNN interop investigation next steps

https://‌docs.google.com/‌presentation/‌d/‌1KGRc1RnnYt_1JK2Pk6r2xRkD60v4F8jc4beHMv0crng/‌edit#slide=id.g6353211274_0_23

<Ningxin_Hu> https://‌github.com/‌webmachinelearning/‌webnn/‌issues/‌6#issuecomment-536408448

Ningxin_Hu: after the F2F, I added details of the investigations to issue #6 on WebGPU buffer sharing
… we have an Apple MPS POC with a Metal backend; WebNN can compile a subgraph for a WebGPU device

[Ningxin recaps WebNN investigation from F2F]

Ningxin_Hu: need to extend the WebNN API to allow computing subgraphs, to avoid moving data across devices
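
One hypothetical shape for such an extension, sketched in WebIDL; all names below are invented for illustration and do not appear in any spec draft:

```webidl
// Hypothetical WebNN/WebGPU interop surface (illustration only).
partial interface MLModel {
  // Compile a subgraph of the model for execution on the given WebGPU
  // device, so intermediate tensors can stay in device-local buffers
  // instead of round-tripping through the CPU.
  Promise<MLCompiledSubgraph> compileSubgraph(sequence<MLOperand> outputs,
                                              GPUDevice device);
};
```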

anssik: do Ningxin's POC results agree with Apple's concerns re buffer sharing?

Rafael: interested in hearing Ningxin's view on the performance delta in this scenario

Ningxin_Hu: the POC investigations were on a MacBook Pro, which does not have dedicated ML hardware
… tests exercise WebGPU compute shaders and Metal compute shaders

Ningxin_Hu: is this a reasonable requirement: we want WebNN to compile to dedicated ML hardware, test with WebGPU compute shaders exchanging data with it, and profile the performance of buffer sharing

<PaulM_> Can we use Intel ML chips as a test case?

Paul: you're looking for hardware to prove this out?

Ningxin_Hu: re future POC requirements: 1) choose dedicated ML hardware to test with, 2) decide which data point we want

Paul: I like data-driven design as proposed by anssi

Ningxin_Hu: we have Movidius VPU in our POC via OpenVINO on Linux
… we could probably have similar setup on Windows through DirectML

Paul: that sounds awesome, let's follow up off this call

<Ningxin_Hu> POC repo: https://‌github.com/‌otcshare/‌chromium-src

jdarpinian: a comment on using Movidius: these are often connected over USB, which implies bandwidth constraints
… PCI Express would be better

Ningxin_Hu: the previous setup was over USB, but the current hardware is on PCI Express

Explore custom op support by DSL

Kai: sort of interested, but not up to speed with it

Support compiling and executing ops for devices, CPU or GPU

<PaulM_> Need to drop off.

Adjourn

Minutes manually created (not a transcript), formatted by Bert Bos's scribe.perl version Mon Apr 15 13:11:59 2019 UTC, a reimplementation of David Booth's scribe.perl. See history.

Maybe present: anssik, jdarpinian, Kai, paul, Rafael