14:54:07 RRSAgent has joined #webmachinelearning
14:54:11 logging to https://www.w3.org/2023/11/16-webmachinelearning-irc
14:54:11 RRSAgent, make logs Public
14:54:12 please title this meeting ("meeting: ..."), anssik
14:54:12 Meeting: WebML WG Teleconference – 16 November 2023
14:54:16 Chair: Anssi
14:54:20 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2023-11-16-wg-agenda.md
14:54:24 Scribe: Anssi
14:54:30 scribeNick: anssik
14:54:38 gb, this is webmachinelearning/webnn
14:54:38 anssik, OK.
14:54:43 Present+ Anssi_Kostiainen
14:54:50 RRSAgent, draft minutes
14:54:51 I have made the request to generate https://www.w3.org/2023/11/16-webmachinelearning-minutes.html anssik
15:00:17 ningxin_hu has joined #webmachinelearning
15:00:34 Joshua_Lochner has joined #webmachinelearning
15:00:45 Present+ Joshua_Lochner
15:01:01 Present+ Ningxin_Hu
15:01:27 Present+ Chai_Chaoweeraprasit
15:01:45 Present+ Austin_Sullivan
15:01:58 chai has joined #webmachinelearning
15:02:00 Present+ Deepti_Gandluri
15:02:22 Present+ Dwayne_Robinson
15:03:07 jsbell has joined #webmachinelearning
15:03:07 Present+ Joshua_Bell
15:03:07 Present+ Zoltan_Kis
15:03:07 Present+ Reilly_Grant
15:03:07 Deepti has joined #webmachinelearning
15:03:17 dwayner has joined #webmachinelearning
15:03:29 RRSAgent, draft minutes
15:03:30 I have made the request to generate https://www.w3.org/2023/11/16-webmachinelearning-minutes.html anssik
15:03:56 Present+ Dominique_Hazael-Massieux
15:04:13 RafaelCintron has joined #webmachinelearning
15:04:20 Topic: Announcements
15:04:23 Subtopic: Web & Networks IG coordination
15:04:42 anssik: the W3C Web & Networks Interest Group is rechartering, with a coordination opportunity for the WebML WG
15:04:57 ... this IG is interested in working with us to explore how to load-balance computing between cloud and client
15:05:04 Present+ Rafael_Cintron
15:05:04 ... an example of this could be an inference workload
15:05:15 ... this IG has network infrastructure experts in it
15:05:19 asully has joined #webmachinelearning
15:06:01 ... expected IG investigations include identifying which network characteristics (bandwidth, latency, radio power consumption, etc.) would be helpful to surface as higher-level hints via Web APIs or protocols, to help web apps using APIs such as WebNN make informed decisions on where to run their workloads
15:06:09 -> Proposed Web and Networks Interest Group Charter https://www.w3.org/2023/11/proposed-web-networks-charter.html
15:06:19 -> Voting instructions (Member-only): https://lists.w3.org/Archives/Member/w3c-ac-members/2023OctDec/0029.html
15:06:27 anssik: every W3C Member company can vote; talk to your AC rep to cast a vote
15:07:00 Subtopic: Implementation status
15:07:14 -> Implementation Status of WebNN Operations https://webmachinelearning.github.io/webnn-status/
15:07:31 anssik: I wanted to check if there are implementation status updates to share with the WG
15:07:36 ... should we check with Belem to update the above status page?
15:07:51 ... I recall Reilly asked a question about the macOS backend implementation on an earlier call?
15:07:56 zolkis has joined #webmachinelearning
15:08:36 Reilly: we have been looking at the question of implementing against CoreML
15:09:02 anssik: we look forward to welcoming Apple on board this WG
15:09:17 ... but while waiting for that to happen, I'd encourage the WG to investigate the implementation story on Apple's platforms based on publicly available information
15:09:50 ... I also noticed Reilly recently made helpful contributions to the WebNN polyfill, thanks!
15:10:11 ... while the polyfill is not considered an implementation per se, it helps web developers bring WebNN-accelerated web experiences, using the same codebase, to platforms that do not yet have a native WebNN implementation
15:11:12 Reilly: I was focused on the WebNN samples and their performance; my polyfill contributions were to upgrade the polyfill's dependencies to get a more up-to-date view of its performance
15:12:20 ... I got stuck on the patch for the TF upgrade, because of the way the polyfill is compiled for the browser while the test suite runs on Node.js
15:12:48 ... I'm probably not going to contribute to this polyfill beyond low-hanging-fruit issues
15:13:54 q?
15:14:08 Subtopic: Web LLM collaboration on hybrid execution use case
15:14:22 deepti has joined #webmachinelearning
15:14:39 Present+ Vivek_Sekhar
15:14:43 anssik: Tianqi Chen from Carnegie Mellon University, OctoML, created Web LLM, a JS library that accelerates select LLMs in browsers with WebGPU
15:14:45 Vivek has joined #webmachinelearning
15:14:47 -> Web LLM repo https://github.com/mlc-ai/web-llm
15:14:58 anssik: Web LLM runs Llama with 70B parameters on some high-end systems
15:15:14 ... Tianqi shared an interesting use case with this WG: "There are great synergies to webnn related projects that possibly enables future hybrid executions of models(e.g. webgpu for customized op and some through webnn)"
15:15:34 -> Proposed hybrid execution use case https://github.com/webmachinelearning/webnn/issues/375#issuecomment-1803950944
15:15:35 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
15:15:58 anssik: if his busy calendar allows, we'll invite Tianqi to a future call to share his experiences with Web LLM
15:16:13 ... meanwhile, I suggest we spin this hybrid execution use case out into its own GH issue, thoughts?
15:17:20 Joshua_Lochner: I've definitely paid attention to this project, 70B params is amazing; I'm interested in running similar models on lower-end systems
15:17:41 q?
15:17:51 q+
15:17:56 ack ningxin_hu
15:18:27 https://bugs.chromium.org/p/chromium/issues/detail?id=1492036#c3
15:18:35 ningxin_hu: not related to Web LLM; to follow up on the benchmark, we got benchmark data regarding the split op and its emulation path via slice
15:19:24 ... decomposing split into multiple slice ops, we see a perf difference in the range of 10-20% on both iGPU and dGPU, please see the crbug for details
15:19:42 q?
15:19:56 Topic: WebNN v2: Review transformer ops spec contributions (continued)
15:20:12 anssik: issue #375 and PR #478
15:20:12 https://github.com/webmachinelearning/webnn/issues/478 -> Pull Request 478 Add support for operations needed for well-known transformers e.g. Segment Anything, Stable Diffusion, etc. (by wchao1115)
15:20:12 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
15:20:40 anssik: Chai submitted PR #478 to add support for the new ops and data types needed by well-known transformers the WG has identified -- thanks Chai and also Dwayne!
15:21:06 ... this PR also removes one op, per our maintenance commitment and in-depth understanding of model targets thanks to careful research by participants
15:21:16 ... I'd encourage the WG to review this PR
15:21:27 ... this is a substantive change, diff stats +1500 -900
15:21:49 ... Chai's PR comment provides a great summary, so I'd like Chai to walk us through this PR and perhaps highlight the areas reviewers should focus on most
15:21:54 q?
15:22:27 Chai: thanks Anssi, thanks all for the early feedback!
15:23:19 ... in the PR description I have the summary; this is not an exhaustive list of ops needed for transformers, it is a starting point
15:23:37 ... it allows us to run some popular models we've identified; there will likely be additions later
15:24:14 ... you also see some removals; we are considering the entire spec "v1" and we are actively validating the spec driven by implementation experience
15:25:04 ... when we solidify the op set we want to be more strict about changes
15:25:05 q+
15:25:30 ... regarding CoreML compatibility, for every spec change we have looked at CoreML to make sure this works there
15:25:55 ... we also look at compatibility with the major ML frameworks, in addition to the major OS ML APIs
15:26:15 ... if folks think there should be additional compatibility targets, please let us know
15:26:57 ... we discussed e.g. clamp with min and max being equal, a minor case, but frameworks did not agree on that minor detail
15:26:58 q?
15:27:48 ack jsbell
15:28:22 jsbell: thank you Chai for confirming we are still comfortable making changes to this spec
15:28:44 ... how is the WPT coverage for the API? when we add ops, do we have WPT tests for those?
15:28:45 q?
15:29:28 (in this case, this would mean adding tests and additions to the baseline implementation)
15:29:41 chai: touching on the breaking-change aspect, we can make changes at this stage, but I wouldn't make big changes; this PR makes one such change by removing one op
15:30:01 ... the red diff is mostly due to bikeshed formatting
15:30:09 ... we are updating WPT in tandem with the spec PR
15:30:10 q?
15:30:26 ningxin_hu: we have a small team working on WPT for these new ops
15:30:34 (for the record, the culprit of the overly red diff is not bikeshed, but the diff script itself)
15:30:47 ... the baseline implementation, a pure JS impl of all the ops, is updated to help the testing effort
15:30:52 https://github.com/webmachinelearning/webnn-baseline/pulls
15:31:18 ningxin_hu: we've merged some of these new ops and are in the process of reviewing the remaining ops being added
15:31:41 https://github.com/web-platform-tests/wpt/pull/43179
15:31:48 ... also WPT PRs are work in progress
15:32:31 ... we want to get the WPT tests right and add them gradually; help wanted to review the WPT PRs
15:32:33 q+
15:32:46 ack jsbell
15:32:58 jsbell: thanks, that's awesome!
15:33:25 sgtm, will do that
15:33:34 ... please include a status update on the WPT tests in PR #478
15:33:35 https://github.com/webmachinelearning/webnn/issues/478 -> Pull Request 478 Add support for operations needed for well-known transformers e.g. Segment Anything, Stable Diffusion, etc. (by wchao1115)
15:35:10 q?
15:35:48 Topic: Enhancements
15:36:01 Subtopic: Simplify matmul op
15:36:05 anssik: issue #470
15:36:05 https://github.com/webmachinelearning/webnn/issues/470 -> Issue 470 Simplify `matmul` op (by huningxin)
15:36:33 anssik: WebNN matmul supports 1-D input tensors, but 1-D input tensors are not widely supported by native ML APIs, specifically:
15:36:48 ... - DirectML's DML_GEMM_OPERATOR_DESC
15:36:48 ... - BNNS's BroadcastMatMul
15:36:48 ... - TensorFlow's BatchMatMulV2
15:36:58 ... the open question is:
15:37:02 ... - should matmul drop support for 1-D input tensors?
15:37:18 ... frameworks could still support 1-D input tensors by reshaping to 2-D, prepending or appending a 1 dimension
15:37:30 ... this reshape incurs no performance penalty, because no memory copy is needed
15:37:45 ... this change would simplify the implementation and WPT tests, and reduce conformance testing cost
15:37:59 q?
15:38:57 ningxin_hu: in the CL review we prototyped without 1-D support
15:39:56 ... as an additional comment, in the issue we also looked at XNNPACK's xnn_define_batch_matrix_multiply
15:40:12 ... none of these native APIs supports 1-D, so this is good to be dropped
15:40:15 q?
15:41:05 anssik: everyone OK to drop support for 1-D input tensors?
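[scribe note: the reshape-based emulation described above can be sketched as follows. This is a minimal NumPy sketch, not WebNN API code; `matmul_2d_only` stands in for a hypothetical backend that has dropped 1-D support per issue #470, and all function names are illustrative.]

```python
import numpy as np

def matmul_2d_only(a, b):
    """Stand-in for a matmul backend that only accepts 2-D+ tensors
    (i.e. WebNN matmul after dropping 1-D support, issue #470)."""
    assert a.ndim >= 2 and b.ndim >= 2
    return np.matmul(a, b)

def matmul_with_1d(a, b):
    """Framework-side emulation of 1-D inputs: prepend a 1 dimension
    to a 1-D lhs, append a 1 dimension to a 1-D rhs, then squeeze the
    inserted dimensions out of the result. The reshapes are views, so
    no memory copy is needed."""
    a2 = a.reshape(1, -1) if a.ndim == 1 else a
    b2 = b.reshape(-1, 1) if b.ndim == 1 else b
    c = matmul_2d_only(a2, b2)
    if a.ndim == 1:
        c = c.squeeze(-2)  # remove the prepended row dimension
    if b.ndim == 1:
        c = c.squeeze(-1)  # remove the appended column dimension
    return c
```

[the sketch matches NumPy's own 1-D matmul semantics, which is one way a framework on top of WebNN could keep its user-facing behavior unchanged]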
15:41:12 [ silence means consent ]
15:41:50 Subtopic: Define the algorithm of calculating the effective padding for "same-upper" and "same-lower" option
15:41:54 anssik: issue #326
15:41:55 https://github.com/webmachinelearning/webnn/issues/326 -> Issue 326 Define the algorithm of calculating the effective padding for "same-upper" and "same-lower" option (by huningxin) [Editorial]
15:42:18 anssik: we discussed this in Q2 2023 and agreed to revisit when the conventions update had landed; now that has happened, so I wanted to revisit this issue
15:42:24 -> https://www.w3.org/2023/04/27-webmachinelearning-minutes.html#t08
15:42:45 anssik: initial issue description: "WebNN conv2d operation allows to set MLConv2dOptions.autoPad option to "same-upper" or "same-lower" of MLAutoPad enum."
15:43:03 ... the proposed fix back then: "The spec should define the algorithm of how the padding values are automatically computed." -- and Ningxin proposed to fix this by reusing the 2D pooling definitions
15:43:18 ... a new proposal emerged recently: "drop the support of MLAutoPad and only support explicit padding"
15:43:25 ... thoughts?
15:43:38 ... if we want to reduce the implementation and testing burden and the spec complexity, maybe the new proposal is better?
15:44:06 q+
15:44:23 Dwayne: I haven't spent more thought on this; I'm not strongly proposing dropping support
15:44:47 ... maybe callers could do the work so WebNN wouldn't have to worry about this
15:44:53 ack chai
15:45:27 chai: not specific to this topic, but just like in any other API design discussion, there are always two sides of the argument and we have to make a decision
15:45:46 ... establishing a principle when considering an API change is helpful; this topic is a great example of this tension
15:46:04 ... when defining a backend API we want it to be tighter, in the sense that we don't want to bloat the API surface too much
15:46:21 ... define the smallest possible exposure so versioning in the future is easier
15:46:39 ... we also need to consider the ease of implementation for the framework developers who sit on top of the WebNN API
15:47:13 ... if we make the WebNN API too low-level, too explicit, it will burden the framework developers more and is more error-prone for them
15:47:50 ... designing this API means looking at these tension points to find the best compromise: you don't want to make the API too explicit, so frameworks can innovate, but OTOH you don't want to bloat the API
15:48:02 q+
15:48:13 ack ningxin_hu
15:48:32 ningxin_hu: thanks Chai; for this specific issue we can investigate and ask framework authors for input
15:49:12 ... we can drop this from the WebNN spec and implementation if framework authors already handle this calculation
15:49:51 ... e.g. ONNXRT and TF
15:50:05 Dwayne: both of these have helpers to massage paddings
15:50:33 q?
15:50:50 Subtopic: API lacks handling for async ML device errors on the context
15:51:01 anssik: issue #477
15:51:01 https://github.com/webmachinelearning/webnn/issues/477 -> Issue 477 API lacks handling for async ML device errors on the context (by bbernhar)
15:51:15 ... Bryan is asking: "What happens if a WebNN operation dispatched through MLContext encounters some internal error which causes the GPU device to get removed?"
15:51:36 ... as you know, Bryan is well informed on all things WebGPU and is also contributing to the WebNN implementation now, so he can help bridge the efforts
15:52:04 ... Bryan continues: "I would expect WebNN to provide a spec into how fatal (device) errors are handled so the WebNN developer could respond appropriately. If we want to do more with MLContext (ex. create buffers), I believe we'll need a more robust error mechanism like WebGPU"
15:52:08 -> WebGPU Errors & Debugging https://www.w3.org/TR/webgpu/#errors-and-debugging
15:52:49 RafaelCintron: I'm supportive of Bryan's proposal
15:53:11 ... you can use the GPU, and then a driver update happens and things may break; this helps with that
15:53:46 q+
15:53:58 q?
15:54:04 ack chai
15:54:22 chai: I think this also has to do with how frameworks will respond to this event
15:54:45 ... if frameworks will bubble this event up to the app and expect the app to clean up, then it makes sense for WebNN to send this up
15:54:46 q+
15:54:58 q+
15:55:10 ... good to understand how frameworks do this nowadays
15:55:25 ack ningxin_hu
15:55:58 ningxin_hu: many Web APIs are backed by the GPU; is there a unified way to get this error information surfaced to applications?
15:56:07 ... this impacts many APIs, not just WebGPU and WebNN
15:56:18 q?
15:56:22 ack RafaelCintron
15:56:43 RafaelCintron: there's no API for error information that spans all the Web APIs using GPUs
15:56:49 q+
15:57:22 ... WebGPU provides a context-lost promise
15:57:44 ... 2D Canvas did have a feature request, but I'm not sure if it was implemented
15:57:59 ... for WebNN I'm inclined to follow the WebGPU promise path
15:58:17 ... re frameworks, BabylonJS handles these errors
15:58:50 ... "the house may burn down", so we need these APIs to help web developers build robust sites
15:58:52 q?
15:58:55 ack chai
15:59:37 chai: responding to ningxin_hu, in my previous PR I said avoiding an internal GPU device is important; if WebNN wants to use a GPU device it has to use a WebGPU device
15:59:43 q+
15:59:45 ... in part motivated by the error handling discussed here
16:00:08 ... if that is the only device WebNN will use, it implies that when a device reset happens, WebNN is part of the app stack for that WebGPU device
16:00:38 ... that gives a way to do uniform error handling, with WebNN, including the framework on top, being an app for WebGPU so to speak, versus the internal "gpu" device we have for WebNN
16:01:05 ... this is a discussion that hasn't finished and is good to revisit
16:01:22 ningxin_hu: I feel we can do unification if we reuse the WebGPU device for "gpu" selection
16:01:41 ... if WebNN supports NPU later we need a similar interface to surface errors
16:02:04 chai: I believe NPU will follow a different route; the NPU adapter is not handled by WebGPU because it is not a graphics adapter
16:02:34 q?
16:02:37 ack ningxin_hu
16:03:46 RRSAgent, draft minutes
16:03:47 I have made the request to generate https://www.w3.org/2023/11/16-webmachinelearning-minutes.html anssik
18:04:31 Zakim has left #webmachinelearning
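[scribe note: as background for the issue #326 discussion above, here is a sketch of one possible effective-padding computation for the "same-upper" / "same-lower" options. This follows my reading of the common ONNX SAME_UPPER/SAME_LOWER convention (output size = ceil(input / stride), with any odd total padding going to the end for "same-upper" and to the beginning for "same-lower"); it is an illustration, not normative WebNN spec text.]

```python
import math

def effective_padding(input_size, filter_size, stride, dilation=1,
                      auto_pad="same-upper"):
    """Compute (pad_begin, pad_end) for one spatial axis, assuming the
    ONNX-style SAME convention: the output size is ceil(input / stride),
    and odd total padding goes to the end ("same-upper") or to the
    beginning ("same-lower")."""
    out_size = math.ceil(input_size / stride)
    # Effective filter extent once dilation is applied.
    dilated_filter = (filter_size - 1) * dilation + 1
    # Total padding needed so the filter covers the whole input.
    total = max(0, (out_size - 1) * stride + dilated_filter - input_size)
    if auto_pad == "same-upper":
        return total // 2, total - total // 2
    if auto_pad == "same-lower":
        return total - total // 2, total // 2
    raise ValueError(f"unknown auto_pad: {auto_pad}")
```

[e.g. for input 4, filter 3, stride 2, the total padding is 1, so "same-upper" pads (0, 1) while "same-lower" pads (1, 0) -- the asymmetric case the two enum values differ on]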