14:58:43 RRSAgent has joined #webmachinelearning
14:58:47 logging to https://www.w3.org/2023/01/12-webmachinelearning-irc
14:58:47 RRSAgent, make logs Public
14:58:48 please title this meeting ("meeting: ..."), anssik
14:58:48 Meeting: WebML WG Teleconference – 12 January 2023
14:59:29 Chair: Anssi
14:59:34 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2023-01-12-wg-agenda.md
14:59:47 Scribe: Anssi
14:59:47 scribeNick: anssik
14:59:53 Present+ Anssi_Kostiainen
15:00:06 Regrets+ Dominique_Hazael-Massieux
15:00:25 ningxin_hu has joined #webmachinelearning
15:00:37 Present+ Ningxin_Hu
15:01:02 Present+ Sungpil_Shin
15:01:21 Present+ Bruce_Dai
15:01:23 Sungpil_Shin__ETRI_ has joined #webmachinelearning
15:01:28 Present+ Chai_Chaoweeraprasit
15:01:42 RRSAgent, draft minutes
15:01:43 I have made the request to generate https://www.w3.org/2023/01/12-webmachinelearning-minutes.html anssik
15:02:15 bruce_dai has joined #webmachinelearning
15:03:49 chai has joined #webmachinelearning
15:03:49 anssik: Welcome to 2023!
15:03:56 Regrets- Dominique_Hazael-Massieux
15:04:01 Present+ Dominique_Hazael-Massieux
15:05:37 scribe+
15:05:51 Topic: WebNN API open PRs and issues
15:05:56 ghurlbot, this is webmachinelearning/webnn
15:05:56 anssik, OK. But note that I'm currently off. Please use: ghurlbot, on
15:06:06 ghurlbot, on
15:06:07 anssik, OK.
15:06:17 anssik: good progress was made in GH over the holiday period, so I'd like us to review the open PRs and those that landed, and discuss the issues filed.
15:06:31 ... my expectation is we'll identify and fast-track any priority changes that should get into the initial CR release train.
15:06:39 Subtopic: Add lstm and lstmCell ops, rename MLOperator to MLActivation
15:06:41 RafaelCintron has joined #webmachinelearning
15:06:48 anssik: #321
15:06:49 https://github.com/webmachinelearning/webnn/issues/321 -> Pull Request 321 [closed] Add LSTM to the operator list (wchao1115)
15:06:52 Present+ Rafael_Cintron
15:06:57 ... this PR adds lstm and lstmCell ops to the spec.
15:07:03 ... this PR received adequate review and I considered it ready to be merged.
15:07:16 ... notably, we explicitly mention the LSTM architecture in our current charter so it was great to get this done. This is a crucial building block that improves on the "classic" RNNs.
15:07:22 ... thanks Chai!
15:07:24 ... any comments from Chai? Any questions from anyone?
15:07:37 zkis has joined #webmachinelearning
15:08:01 Chai: no further comments, happy to get this in
15:08:44 q?
15:09:00 Subtopic: Simplify MLContext creation: remove MLDeviceType, remove "high-performance" from MLPowerPreference
15:09:22 anssik: #322
15:09:22 https://github.com/webmachinelearning/webnn/issues/322 -> Pull Request 322 Simplify MLContext creation (wchao1115)
15:09:32 anssik: this PR proposes to remove the MLDeviceType enum:
15:09:40 enum MLDeviceType {
15:09:40   "cpu",
15:09:40   "gpu"
15:09:40 };
15:09:57 anssik: and to remove "high-performance" from the MLPowerPreference enum:
15:09:57 enum MLPowerPreference {
15:09:57   "default",
15:09:58   "high-performance",
15:09:58   "low-power"
15:09:58 };
15:10:17 anssik: To ease the review, I added a diff of the proposed IDL changes to GH, excluding the MLOperator -> MLActivation rename changes
15:10:17 -> Diff with IDL changes in PR #322 https://github.com/webmachinelearning/webnn/pull/322#issuecomment-1379983187
15:11:06 anssik: my summary of the practical impact of these changes for the default context is:
15:11:21 ... - device type selection is an implementation detail
15:11:51 ... - ML frameworks (key customers) using the WebNN API cannot request a CPU, GPU or NPU implementation explicitly
15:12:09 ... - a GPU context can be created only from a WebGPU device
15:13:12 chai: thanks for the diff in a comment!
15:13:28 ... the PR is straightforward, there are a few conversation points to discuss
15:13:49 ... before the change, if you want to create a CPU context you set the device type to "cpu" and create a context
15:13:57 ... then everything is done on the CPU
15:14:11 ... with the change you create a context without setting any device type
15:14:29 ... the implementation will be a CPU implementation
15:14:55 ... for the GPU, before the change, you had two ways to create a context
15:15:15 ... first, set the device type to "gpu" and you'll get a GPU context; the second way is to give it a WebGPU device
15:15:20 ... so two ways to do it
15:15:30 ... after the change, we remove the first way to do it
15:15:41 ... say, go create a WebGPU device and give it to the createContext method
15:16:10 ... we remove the first way of creating a GPU implementation, for 1) simplicity, with this change we can simply think of a GPU context in terms of a WebGPU context so it maps 1:1
15:16:23 q?
15:17:09 ... 2) the two ways to create a GPU context are fundamentally different in one way: if you create with device type "gpu" the implementation will have to own the device that is created internally, so the "gpu" device type has to manage its GPU device, while a WebGPU device is not owned, but given to you
15:17:24 ... WebGPU is passed in terms of implementation, "this is my device, feel free to use it"
15:17:42 q+
15:17:53 ... eliminating the so-called internal device type "gpu", we don't need to own the device but we work with the WebGPU device
15:18:23 q+ to ask about extensibility to other *PU
15:19:12 ... creating a WebGPU-backed context is a little more complicated in a way, you have to create and select an adapter for WebGPU
15:19:12 ... advantages: 1) you can have multiple GPU adapters in one system
15:19:42 ... ~1/3 of all systems will have more than one adapter
15:19:57 ... design the API to get WebGPU to be in the biz of selecting the adapter
15:20:31 ... if you have the MLDeviceType "gpu" then such a system might pick a different adapter than WebGPU would use, e.g. discrete and integrated
15:21:16 ... each adapter is its own resource domain, you need to create a shareable resource to work across adapters
15:21:55 ... having just the WebGPU way of selecting an adapter, you just go design what you want, WebGPU has the same enum with three values, incl. "high-performance", "low-power"
15:21:59 q?
15:22:45 ack zkis
15:23:23 ack dom
15:23:24 dom, you wanted to ask about extensibility to other *PU
15:23:43 dom: thanks Chai, this makes a lot of sense, supportive of the direction
15:24:04 ... at some point I think we were thinking of CPU, GPU and extensibility for xPUs such as NPUs and such
15:24:18 ... I wonder how that has been taken into account
15:24:49 Chai: thanks Dom, this PR will not address all about NPU, that needs an additional change
15:25:03 ... my thinking around NPU is it'll most likely be more similar to how we do CPU currently than to GPU
15:25:28 ... because an NPU surfaces itself to the system as another adapter, a weird kind of adapter, it cannot render anything, it just does compute
15:25:38 ... in the future WebGPU might want to pick it up
15:26:23 q+
15:26:31 ... from the point of view of the hardware adapter itself, it is a lot more appropriate for WebNN to enumerate the NPU on behalf of the user
15:26:49 ... the WebGPU contract is around graphics operations
15:27:05 ... if you want the NPU, you want WebNN to deal with the adapter, more similar to CPU than GPU
15:27:26 dom: my question was, we're closing the simple path to say pick a CPU or pick an NPU
15:27:49 ... WebNN is a logical place to find logical NPU adapters? and we can figure out how to integrate that into the createContext method?
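The context-creation rules discussed above for PR #322 can be sketched as follows. This is a minimal illustrative model, not the real WebNN implementation: `ml`, the `MLContext` internals, and the single-argument `createContext` signature here are hypothetical stand-ins for the design being discussed.

```javascript
// Sketch (assumed, not spec text) of the post-PR-#322 rules:
// no MLDeviceType; the default context is an implementation detail
// (modeled here as CPU), and a GPU context exists only when the app
// passes in a WebGPU device that the app itself owns.
class MLContext {
  constructor(deviceKind, gpuDevice = null) {
    this.deviceKind = deviceKind; // resolved by the implementation
    this.gpuDevice = gpuDevice;   // non-null only for WebGPU-backed contexts
  }
}

const ml = {
  createContext(gpuDevice = undefined) {
    if (gpuDevice !== undefined) {
      // WebNN borrows the device; it does not own or create it.
      return new MLContext("gpu", gpuDevice);
    }
    // Default path: no device type option exists anymore.
    return new MLContext("cpu");
  },
};

// Default path: device selection is an implementation detail.
const cpuCtx = ml.createContext();

// GPU path: the app creates/selects the adapter via WebGPU and
// lends the resulting device to WebNN.
const fakeWebGPUDevice = { label: "app-owned GPUDevice" };
const gpuCtx = ml.createContext(fakeWebGPUDevice);

console.log(cpuCtx.deviceKind);                      // "cpu"
console.log(gpuCtx.gpuDevice === fakeWebGPUDevice);  // true
```

The sketch captures the ownership point Chai makes: the "gpu" device type forced the implementation to own a hidden device, whereas a passed-in WebGPU device stays owned by the app or framework.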
15:28:02 Chai: I see it not as a new device type but a new power preference
15:28:14 q+
15:28:16 Dom: we can consider that when we get there
15:28:17 q?
15:28:18 ack me
15:29:00 zkis: question to Chai, in earlier discussions we had a context type and a device type
15:29:19 ... with a script-managed context and a user-agent-managed context
15:29:37 ... a script manager context has valid use cases for ML frameworks that want to select an explicit type of adapter
15:29:43 s/manager/managed
15:29:55 Chai: your question is how does this work with frameworks?
15:30:44 ... the framework will own the WebGPU device; if the app uses both WebGPU and ML frameworks, e.g. WebGPU to render and TF.js to run ML, the app needs to manage the lifetime of the devices, so this makes it easier to manage with no hidden device
15:31:10 ... we make it WebNN's biz to manage the device, not to mention the performance impact given no control over the device
15:31:36 ... to answer your question, [the device] needs to be owned by either the app or the framework
15:32:20 ... the app can decide whether to use one adapter or more, ownership of the adapter will bubble up to the app or the framework
15:32:31 ... I agree with Ningxin this will complicate framework code
15:32:54 ... now they just set an enum and that's it, with these changes they must pass the device and/or own the device themselves and pass it to WebNN
15:32:56 q?
15:33:11 zkis: that solves for WebGPU, but how to create a CPU-specific context?
15:33:20 Chai: call createContext without parameters
15:33:32 zkis: but that can be overridden by GPU
15:33:46 Chai: no, you get a CPU if called without parameters
15:34:01 zkis: what if I want CPU+some accelerator?
15:34:09 Chai: I don't have an answer right now
15:34:21 zkis: we don't have an adapter abstraction in WebNN
15:34:47 Chai: an NPU adapter will look very different from any WebGPU adapter, it does not render, does not support shaders
15:34:52 zkis: owned by lower layers
15:35:13 ... how about adding another createContext method?
15:35:36 Chai: as of today, there is no notion of an NPU adapter, WebGPU could do that if it wanted to
15:35:49 ... but we don't know if they want to enumerate NPU adapters in the future
15:35:49 q?
15:37:16 Chai: the focus of the PR is between CPU and GPU and to simplify the GPU story
15:37:20 ack ningxin_hu
15:37:39 ningxin_hu: thanks for the discussion, to summarize:
15:38:59 ... 1) from the framework developer point of view, the device selection is part of WebGPU, the concern is on the compute side, because per our current spec only the default context can execute the computeSync method
15:39:33 ... for the framework developer, the graph execution part is also affected, the CPU and GPU execution paths would diverge
15:39:59 ... that's my concern regarding the framework developer impact
15:40:44 ... 2) if you look at the native frameworks, ONNX or others, commonly the fw allows the developer to select a device or some execution provider, a device abstraction, and the execution path stays the same
15:41:20 ... we abstract the device difference but allow offloading compute to different devices, but with this change a developer needs to adapt for that
15:41:25 s/2)//
15:42:10 ... 2) from the backend implementer point of view, e.g. Chrome or another browser engine, this change puts a hard requirement on any backend: it must support WebGPU
15:43:05 ... some OSes may not interact with WebGPU, e.g. the MLService of ChromeOS, which abstracts CPU, NPU etc. with no integration with WebGPU
15:44:29 ... 3) dependency: this adds a hard dependency on the WebGPU API, but WebNN interop is post-v1, now a v2 feature; after this change, GPU support would move to a "v2" feature for WebNN, otherwise we lose GPU support for WebNN "v1", and in the extreme case, if WebGPU interop does not materialize, we lose GPU support altogether
15:44:40 I have updated #302 with this proposal, Chai please take a look: https://github.com/webmachinelearning/webnn/issues/302#issuecomment-1380586654
15:44:40 https://github.com/webmachinelearning/webnn/issues/302 -> Issue 302 API simplification: context types, context options, createContext() (zolkis) v2
15:44:51 Chai: what you say is indexed on the idea that this change will make WebGPU a requirement for WebNN v1
15:45:11 q+ to point out synergy with #302 (V2 discussion)
15:45:14 ningxin_hu: correct
15:46:15 Chai: that is not entirely true, given the reason we introduce the command encoder interface is to allow submitting work to WebGPU and to let the WebGPU implementation manage its queue
15:46:56 ... you still own submitting to the queue, you populate the queue with a workload with MLCommandEncoder, which allows interop with the WebGPU API and expects the app to submit the work
15:47:40 ... all is behind the scenes, we are going to have our own queue, and submit just the ML workload and get the final result of the execution
15:47:56 ... the compute method for WebGPU is a wrapper on top of the work submission mechanism
15:48:15 ... it does not make WebGPU interop a requirement, it changes the implementation of submitting work to the GPU
15:49:15 ... with an internal GPU you also own your device, ask for a queue, and you have your output data in an output buffer you hand out to the caller
15:49:15 q?
15:49:40 ... ONNX RT as an example, to execute on the GPU you give it a GPU queue, because apps using ONNX RT may use a graphics API to do graphics work
15:50:00 ... for GPU submission, for ORT, you give it a queue
15:50:12 ... eventually the owner of the GPU device tends to be the app
15:50:49 ... the notion that we wrap everything works but is less flexible
15:50:49 ... it makes efficient interop much harder, because the app does not see the device
15:51:21 ... this change makes that slightly harder to implement but hopefully gives more flexibility to the app
15:52:04 q+
15:52:16 ack ningxin_hu
15:52:40 ningxin_hu: if MLContext compute can work with a WebGPU context it would make me feel better
15:53:46 ... I had a concern re throwing an operation error in compute() as commented in GH
15:54:09 Chai: I'll look into that and try to fix that
15:54:12 q?
15:54:48 zkis: a pointer to Chai that there's a connection to #302 marked for "v2", check the updated comment there, it's almost the same thing as what you proposed but makes the context type explicit
15:54:48 https://github.com/webmachinelearning/webnn/issues/302 -> Issue 302 API simplification: context types, context options, createContext() (zolkis) v2
15:55:06 ... please review
15:55:18 q?
15:55:22 ack zkis
15:55:22 zkis, you wanted to point out synergy with #302 (V2 discussion)
15:55:57 q?
15:56:40 q?
15:56:47 RRSAgent, draft minutes
15:56:48 I have made the request to generate https://www.w3.org/2023/01/12-webmachinelearning-minutes.html anssik
15:57:00 Probably we could loop in Ping on this issue for feedback
15:57:09 Subtopic: Transfer the input and output views for asynchronous execution
15:57:14 I'll check with Ping
15:57:17 anssik: issue #318
15:57:18 https://github.com/webmachinelearning/webnn/issues/318 -> Issue 318 The input and output resources race condition issue of asynchronous execution (huningxin)
15:57:24 ... Jiawei spotted this bug as part of the Chromium implementation review.
15:57:51 ... The issue summary is that for WebNN graph async execution, the main thread and the worker thread may access the input or output array buffers at the same time, which can cause a race condition.
15:58:11 ... Ningxin submitted PR #323 to fix this. Much thanks for that! Also much thanks to Domenic for his advice, large parts of which are incorporated into the PR.
15:58:11 https://github.com/webmachinelearning/webnn/issues/323 -> Pull Request 323 Transfer the input and output views for asynchronous execution (huningxin)
15:58:20 ... currently the PR is welcoming everyone's review, I'd like to see this land very soon given it fixes a critical bug.
15:58:27 ... Ningxin, anything you'd like to share regarding this change?
15:59:15 ningxin_hu: good summary, the only thing to add: after investigation, this is a well-known design consideration, noted as a warning in Web IDL, and we are using the preferred solution, "Bring Your Own Buffer"
15:59:46 ... from the Web IDL point of view, frameworks will bring their own buffers
16:00:40 ... this is solved by the Streams API and documented in the Web IDL spec and we follow that design, check it out for more details, Domenic also has a blog post about this, check the issue for these materials
16:00:40 RRSAgent, draft minutes
16:01:11 I have made the request to generate https://www.w3.org/2023/01/12-webmachinelearning-minutes.html anssik
16:01:19 q?
16:01:27 zkis: this discussion has been extremely helpful for the upcoming algorithm discussions
16:01:27 Subtopic: Improve graph execution steps
16:01:37 anssik: issue #316 documents editorial improvements to improve the steps for async and sync execution.
16:01:38 https://github.com/webmachinelearning/webnn/issues/316 -> Issue 316 Review sync vs async compute differences (zolkis)
16:01:54 ... Zoltan has a WIP PR #319 out that defines generic graph execution steps and uses them in the sync and async compute() methods.
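The race described in issue #318 is avoided by detaching (transferring) the caller's buffers for the duration of execution: once transferred, the calling thread can no longer touch the backing memory. A minimal Node sketch of the transfer semantics PR #323 builds on, not the spec's exact algorithm (`transferBuffer` is an illustrative helper):

```javascript
// Sketch of the "transfer the views" idea: standard structuredClone with a
// transfer list detaches the source ArrayBuffer and moves its memory, the
// same mechanism postMessage() uses to hand buffers to a worker.
function transferBuffer(buffer) {
  return structuredClone(buffer, { transfer: [buffer] });
}

const input = new Float32Array([1, 2, 3, 4]);
const inputBuffer = input.buffer;

// Simulate the start of async compute(): transfer the input so the caller
// cannot mutate it while the (conceptual) worker-side computation runs.
const transferred = transferBuffer(inputBuffer);

console.log(inputBuffer.byteLength);            // 0 -- detached on the caller side
console.log(new Float32Array(transferred)[2]);  // 3 -- data lives on in the clone
```

In the actual PR the implementation would transfer the views back (or hand out fresh views) when compute() resolves, which is what makes this "Bring Your Own Buffer": the framework supplies the buffers and gets them back, rather than the API allocating output storage.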
16:01:55 https://github.com/webmachinelearning/webnn/issues/319 -> Pull Request 319 WiP: Improve graph execution steps (zolkis)
16:02:07 zkis: I need to rework these when Ningxin's PR is merged
16:02:12 RRSAgent, draft minutes
16:02:13 I have made the request to generate https://www.w3.org/2023/01/12-webmachinelearning-minutes.html anssik
16:02:52 Subtopic: Simplify the operand layout support of conv2d and pooling 2d operations
16:02:52 anssik: in issue #324 Ningxin explains how in the existing WebNN spec, conv2d supports two input operand layouts defined by MLInputOperandLayout and four filter operand layouts defined by MLConv2dFilterOperandLayout.
16:02:52 https://github.com/webmachinelearning/webnn/issues/324 -> Issue 324 Simplify the operand layout support of conv2d and pooling 2d operations (huningxin)
16:03:09 ... Ningxin concludes this may make the implementation more complicated, especially if a native ML framework or OS API doesn't support some of these layouts
16:03:17 ... this feedback was also thanks to the Chromium review by Jiawei - another demonstration of why working on the spec and impl in tandem is so effective
16:03:55 ... To fix this, Ningxin proposes to reduce the supported operand layouts and keep just the default operand layout
16:03:55 ... and let the layout adaptation and graph-level optimization be handled by ML frameworks that usually already support such functionalities.
16:03:58 ... Ningxin, please feel free to fill me in
16:04:11 need more inputs from framework developers
16:04:30 q?
16:04:31 I can also @ping in that issue
16:04:31 https://github.com/ping -> @ping
16:04:49 Topic: WebNN API Candidate Recommendation readiness
16:05:38 anssik: I'd reached what I'd describe as a "W3C Process-defined expectation bar for CR readiness" == green with some yellow in the CR readiness tracker #240
16:05:38 https://github.com/webmachinelearning/webnn/issues/240 -> Issue 240 Candidate Recommendation readiness tracker (anssiko)
16:05:57 ... - we can show that the specification has met all Working Group requirements
16:06:08 ... - we have added no new normative references since FPWD
16:06:48 ... - we can document how adequate implementation experience will be demonstrated, thanks to well-advanced implementations across multiple backends and the first version of the test suite that just landed in the WPT repo
16:07:01 ... - we have strong evidence that the spec has received wide review
16:07:22 ... - we have identified WebGPU interop as a feature at risk for the initial CR and we plan to address it in CR updates
16:07:55 ... but because we're an ambitious WG, I've set our CR quality bar higher than normal so that we do not just meet the bar but exceed the quality expectations
16:08:08 ... thus I allow us some time for final polish before we ship the CR during Q1
16:08:42 q?
16:09:08 Topic: Meeting scheduling adjustment
16:09:13 anssik: Chinese New Year is Jan 22, 2023
16:09:25 ... the 2023 animal sign is the rabbit, the luckiest of all the twelve animals in the Chinese zodiac :-)
16:09:31 ... Chinese folks will get 7 days off from work from January 21st to January 27th in 2023
16:09:37 ... in consideration of this, I will cancel our next scheduled meeting that overlaps with this holiday period.
16:09:42 ... Happy New Year to our participants from China!
16:09:55 ... I'm proposing we push our bi-weekly meeting cycle forward by one week so that we'd meet:
16:09:55 ... 2 Feb, 16 Feb, 2 Mar, 16 Mar, 30 Mar etc. at the usual time 15:00-16:00 UTC
16:09:59 RRSAgent, draft minutes
16:10:00 I have made the request to generate https://www.w3.org/2023/01/12-webmachinelearning-minutes.html anssik
16:10:07 Thanks Anssi!
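Returning to the conv2d layout discussion under issue #324 above: if the spec keeps only the default operand layout, the layout adaptation would fall to ML frameworks, as Ningxin proposes. A hypothetical sketch of such a framework-side adaptation step (the function name and shapes are illustrative, not from the spec):

```javascript
// Hypothetical framework-side layout adaptation: repack a flat NHWC tensor
// into NCHW before handing it to a conv2d that only accepts the default
// ("nchw") input layout.
function nhwcToNchw(data, [n, h, w, c]) {
  const out = new Float32Array(data.length);
  for (let i = 0; i < n; i++)
    for (let y = 0; y < h; y++)
      for (let x = 0; x < w; x++)
        for (let k = 0; k < c; k++)
          // NHWC source index -> NCHW destination index
          out[((i * c + k) * h + y) * w + x] = data[((i * h + y) * w + x) * c + k];
  return out;
}

// 1x2x2x2 tensor, values laid out channel-last (NHWC).
const nhwc = Float32Array.from([0, 1, 2, 3, 4, 5, 6, 7]);
const nchw = nhwcToNchw(nhwc, [1, 2, 2, 2]);
console.log(Array.from(nchw)); // [0, 2, 4, 6, 1, 3, 5, 7]
```

Frameworks such as those mentioned in the discussion typically fold this repacking into graph-level optimization so it runs once, not per inference, which is part of the argument for keeping the spec's layout surface minimal.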
16:10:55 RRSAgent, draft minutes
16:10:57 I have made the request to generate https://www.w3.org/2023/01/12-webmachinelearning-minutes.html anssik
16:14:13 s/I'd reached/we've reached
16:14:16 RRSAgent, draft minutes
16:14:17 I have made the request to generate https://www.w3.org/2023/01/12-webmachinelearning-minutes.html anssik