14:54:07 RRSAgent has joined #webmachinelearning
14:54:11 logging to https://www.w3.org/2023/11/16-webmachinelearning-irc
14:54:11 RRSAgent, make logs Public
14:54:12 please title this meeting ("meeting: ..."), anssik
14:54:12 Meeting: WebML WG Teleconference – 16 November 2023
14:54:16 Chair: Anssi
14:54:20 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2023-11-16-wg-agenda.md
14:54:24 Scribe: Anssi
14:54:30 scribeNick: anssik
14:54:38 gb, this is webmachinelearning/webnn
14:54:38 anssik, OK.
14:54:43 Present+ Anssi_Kostiainen
14:54:50 RRSAgent, draft minutes
14:54:51 I have made the request to generate https://www.w3.org/2023/11/16-webmachinelearning-minutes.html anssik
15:00:17 ningxin_hu has joined #webmachinelearning
15:00:34 Joshua_Lochner has joined #webmachinelearning
15:00:45 Present+ Joshua_Lochner
15:01:01 Present+ Ningxin_Hu
15:01:27 Present+ Chai_Chaoweeraprasit
15:01:45 Present+ Austin_Sullivan
15:01:58 chai has joined #webmachinelearning
15:02:00 Present+ Deepti_Gandluri
15:02:22 Present+ Dwayne_Robinson
15:03:07 jsbell has joined #webmachinelearning
15:03:07 Present+ Joshua_Bell
15:03:07 Present+ Zoltan_Kis
15:03:07 Present+ Reilly_Grant
15:03:07 Deepti has joined #webmachinelearning
15:03:17 dwayner has joined #webmachinelearning
15:03:29 RRSAgent, draft minutes
15:03:30 I have made the request to generate https://www.w3.org/2023/11/16-webmachinelearning-minutes.html anssik
15:03:56 Present+ Dominique_Hazael-Massieux
15:04:13 RafaelCintron has joined #webmachinelearning
15:04:20 Topic: Announcements
15:04:23 Subtopic: Web & Networks IG coordination
15:04:42 anssik: the W3C Web & Networks Interest Group is rechartering, with a coordination opportunity for the WebML WG
15:04:57 ... this IG is interested in working with us to explore how to load-balance computing between cloud and client
15:05:04 Present+ Rafael_Cintron
15:05:04 ... an example of this could be an inference workload
15:05:15 ... this IG has network infrastructure experts in it
15:05:19 asully has joined #webmachinelearning
15:06:01 ... expected IG investigations include identifying which network characteristics (bandwidth, latency, radio power consumption, etc.) would be helpful to surface as higher-level hints via Web APIs or protocols, to help web apps using APIs such as WebNN make informed decisions on where to run their workloads
15:06:09 -> Proposed Web and Networks Interest Group Charter https://www.w3.org/2023/11/proposed-web-networks-charter.html
15:06:19 -> Voting instructions (Member-only): https://lists.w3.org/Archives/Member/w3c-ac-members/2023OctDec/0029.html
15:06:27 anssik: every W3C Member company can vote; talk to your AC rep to cast a vote
15:07:00 Subtopic: Implementation status
15:07:14 -> Implementation Status of WebNN Operations https://webmachinelearning.github.io/webnn-status/
15:07:31 anssik: I wanted to check if there are implementation status updates to share with the WG
15:07:36 ... should we check with Belem to update the above status page?
15:07:51 ... I recall Reilly asked a question about the macOS backend implementation on an earlier call?
15:07:56 zolkis has joined #webmachinelearning
15:08:36 Reilly: we have been looking at the question of implementing against CoreML
15:09:02 anssik: we look forward to welcoming Apple on board this WG
15:09:17 ... but while waiting for that to happen, I'd encourage the WG to investigate the implementation story on Apple's platforms based on publicly available information
15:09:50 ... I also noticed Reilly recently made helpful contributions to the WebNN polyfill, thanks!
15:10:11 ... while the polyfill is not considered an implementation per se, it helps web developers bring WebNN-accelerated web experiences, using the same codebase, to platforms that do not yet have a native WebNN implementation
15:11:12 Reilly: I was focused on the WebNN samples and their performance; my polyfill contributions were to upgrade the polyfill's dependencies to get a more up-to-date view of its performance
15:12:20 ... I got stuck on the patch for the TF upgrade, because of the way the polyfill is compiled for the browser while the test suite runs on Node.js
15:12:48 ... I'm probably not going to contribute to this polyfill beyond low-hanging-fruit issues
15:13:54 q?
15:14:08 Subtopic: Web LLM collaboration on hybrid execution use case
15:14:22 deepti has joined #webmachinelearning
15:14:39 Present+ Vivek_Sekhar
15:14:43 anssik: Tianqi Chen from Carnegie Mellon University, OctoML, created Web LLM, a JS library that accelerates select LLMs in browsers with WebGPU
15:14:45 Vivek has joined #webmachinelearning
15:14:47 -> Web LLM repo https://github.com/mlc-ai/web-llm
15:14:58 anssik: Web LLM runs Llama with 70B parameters on some high-end systems
15:15:14 ... Tianqi shared an interesting use case with this WG: "There are great synergies to webnn related projects that possibly enables future hybrid executions of models(e.g. webgpu for customized op and some through webnn)"
15:15:34 -> Proposed hybrid execution use case https://github.com/webmachinelearning/webnn/issues/375#issuecomment-1803950944
15:15:35 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
15:15:58 anssik: if his busy calendar allows, we'll invite Tianqi to a future call to share his experiences with Web LLM
15:16:13 ... meanwhile, I suggest we spin this hybrid execution use case out into its own GH issue, thoughts?
15:17:20 Joshua_Lochner: I've definitely paid attention to this project, 70B params is amazing; I'm interested in running similar models on lower-end systems
15:17:41 q?
15:17:51 q+
15:17:56 ack ningxin_hu
15:18:27 https://bugs.chromium.org/p/chromium/issues/detail?id=1492036#c3
15:18:35 ningxin_hu: not related to Web LLM; to follow up on the benchmark, we got benchmark data regarding the split op and its emulation path via slice
15:19:24 ... decomposing split into multiple slice ops, we see a perf difference in the range of 10-20% on both iGPU and dGPU, please see the crbug for details
15:19:42 q?
15:19:56 Topic: WebNN v2: Review transformer ops spec contributions (continued)
15:20:12 anssik: issue #375 and PR #478
15:20:12 https://github.com/webmachinelearning/webnn/issues/478 -> Pull Request 478 Add support for operations needed for well-known transformers e.g. Segment Anything, Stable Diffusion, etc. (by wchao1115)
15:20:12 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
15:20:40 anssik: Chai submitted PR #478 to add support for the new ops and data types needed by well-known transformers the WG has identified -- thanks Chai and also Dwayne!
15:21:06 ... this PR also removes one op, per our maintenance commitment and in-depth understanding of model targets thanks to careful research by participants
15:21:16 ... I'd encourage the WG to review this PR
15:21:27 ... this is a substantive change, diff stats +1500 -900
15:21:49 ... Chai's PR comment provides a great summary, so I'd like Chai to walk us through this PR and perhaps highlight the areas reviewers should focus on most
15:21:54 q?
15:22:27 Chai: thanks Anssi, thanks all for the early feedback!
15:23:19 ... in the PR description I have the summary; this is not an exhaustive list of ops needed for transformers, it is a starting point
15:23:37 ... it allows us to run some popular models we've identified; there will likely be additions later
15:24:14 ... you also see some removals; we are considering the entire spec "v1" and we are actively validating the spec driven by implementation experience
15:25:04 ... when we solidify the op set we want to be more strict about changes
15:25:05 q+
15:25:30 ... regarding CoreML compatibility, for every spec change we have looked at CoreML to make sure this works there
15:25:55 ... we also look at compatibility with the major ML frameworks, in addition to the major OS ML APIs
15:26:15 ... if folks think there should be additional compatibility targets, please let us know
15:26:57 ... we discussed e.g. clamp with min and max being equal, a minor case, but frameworks did not agree on that minor detail
15:26:58 q?
15:27:48 ack jsbell
15:28:22 jsbell: thank you Chai for confirming we are still comfortable making changes to this spec
15:28:44 ... how is the WPT coverage for the API? when we add ops, do we have WPT tests for those?
15:28:45 q?
15:29:28 (in this case, this would mean adding tests and additions to the baseline implementation)
15:29:41 chai: touching on the breaking-change aspect, we can make changes at this stage, but I wouldn't make big changes; this PR makes one such change by removing one op
15:30:01 ... the red diff is mostly due to bikeshed formatting
15:30:09 ... we are updating WPT in tandem with the spec PR
15:30:10 q?
15:30:26 ningxin_hu: we have a small team working on WPT for these new ops
15:30:34 (for the record, the culprit of the overly red diff is not bikeshed, but the diff script itself)
15:30:47 ... the baseline implementation, a pure JS impl of all the ops, is updated to help the testing effort
15:30:52 https://github.com/webmachinelearning/webnn-baseline/pulls
15:31:18 ningxin_hu: we've merged some of these new ops and are in the process of reviewing the remaining ops being added
15:31:41 https://github.com/web-platform-tests/wpt/pull/43179
15:31:48 ... also WPT PRs are work in progress
15:32:31 ... we want to get the WPT tests right and add them gradually; help wanted to review the WPT PRs
15:32:33 q+
15:32:46 ack jsbell
15:32:58 jsbell: thanks, that's awesome!
15:33:25 sgtm, will do that
15:33:34 ... please include a status update on the WPT tests in PR #478
15:33:35 https://github.com/webmachinelearning/webnn/issues/478 -> Pull Request 478 Add support for operations needed for well-known transformers e.g. Segment Anything, Stable Diffusion, etc. (by wchao1115)
15:35:10 q?
15:35:48 Topic: Enhancements
15:36:01 Subtopic: Simplify matmul op
15:36:05 anssik: issue #470
15:36:05 https://github.com/webmachinelearning/webnn/issues/470 -> Issue 470 Simplify `matmul` op (by huningxin)
15:36:33 anssik: WebNN matmul supports 1-D input tensors, but 1-D input tensors are not widely supported by native ML APIs, specifically:
15:36:48 ... - DirectML's DML_GEMM_OPERATOR_DESC
15:36:48 ... - BNNS's BroadcastMatMul
15:36:48 ... - TensorFlow's BatchMatMulV2
15:36:58 ... the open question is:
15:37:02 ... - should matmul drop support for 1-D input tensors?
15:37:18 ... frameworks could still support 1-D input tensors by reshaping to 2-D, prepending or appending a 1 dimension
15:37:30 ... this reshape incurs no performance penalty, because no memory copy is needed
15:37:45 ... this change would simplify the implementation and WPT tests, and reduce conformance testing cost
15:37:59 q?
15:38:57 ningxin_hu: in the CL review we prototyped without 1-D support
15:39:56 ... as an additional comment, in the issue we also looked at XNNPACK's xnn_define_batch_matrix_multiply
15:40:12 ... none of these native APIs supports 1-D, so this is good to be dropped
15:40:15 q?
15:41:05 anssik: everyone OK to drop support for 1-D input tensors?
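[scribe note: the reshape-based emulation described above can be sketched as follows. This is a minimal NumPy sketch, not WebNN API code; `matmul_2d_only` stands in for a hypothetical backend that has dropped 1-D support per issue #470, and all function names are illustrative.]

```python
import numpy as np

def matmul_2d_only(a, b):
    """Stand-in for a matmul backend that only accepts 2-D+ tensors
    (i.e. WebNN matmul after dropping 1-D support, issue #470)."""
    assert a.ndim >= 2 and b.ndim >= 2
    return np.matmul(a, b)

def matmul_with_1d(a, b):
    """Framework-side emulation of 1-D inputs: prepend a 1 dimension
    to a 1-D lhs, append a 1 dimension to a 1-D rhs, then squeeze the
    inserted dimensions out of the result. The reshapes are views, so
    no memory copy is needed."""
    a2 = a.reshape(1, -1) if a.ndim == 1 else a
    b2 = b.reshape(-1, 1) if b.ndim == 1 else b
    c = matmul_2d_only(a2, b2)
    if a.ndim == 1:
        c = c.squeeze(-2)  # remove the prepended row dimension
    if b.ndim == 1:
        c = c.squeeze(-1)  # remove the appended column dimension
    return c
```

[the sketch matches NumPy's own 1-D matmul semantics, which is one way a framework on top of WebNN could keep its user-facing behavior unchanged]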
15:41:12 [ silence means consent ]
15:41:50 Subtopic: Define the algorithm of calculating the effective padding for "same-upper" and "same-lower" option
15:41:54 anssik: issue #326
15:41:55 https://github.com/webmachinelearning/webnn/issues/326 -> Issue 326 Define the algorithm of calculating the effective padding for "same-upper" and "same-lower" option (by huningxin) [Editorial]
15:42:18 anssik: we discussed this in Q2 2023 and agreed to revisit when the conventions update had landed; now that has happened, so I wanted to revisit this issue
15:42:24 -> https://www.w3.org/2023/04/27-webmachinelearning-minutes.html#t08
15:42:45 anssik: initial issue description: "WebNN conv2d operation allows to set MLConv2dOptions.autoPad option to "same-upper" or "same-lower" of MLAutoPad enum."
15:43:03 ... the proposed fix back then: "The spec should define the algorithm of how the padding values are automatically computed." -- and Ningxin proposed to fix this by reusing the 2D pooling definitions
15:43:18 ... a new proposal emerged recently: "drop the support of MLAutoPad and only support explicit padding"
15:43:25 ... thoughts?
15:43:38 ... if we want to reduce the implementation and testing burden and the spec complexity, maybe the new proposal is better?
15:44:06 q+
15:44:23 Dwayne: I haven't spent more thought on this; I'm not strongly proposing dropping support
15:44:47 ... maybe callers could do the work so WebNN wouldn't have to worry about this
15:44:53 ack chai
15:45:27 chai: not specific to this topic, but just like in any other API design discussion, there are always two sides of the argument and we have to make a decision
15:45:46 ... establishing a principle when considering an API change is helpful; this topic is a great example of this tension
15:46:04 ... when defining a backend API we want it to be tighter, in the sense that we don't want to bloat the API surface too much
15:46:21 ... define the smallest possible exposure so versioning in the future is easier
15:46:39 ... we also need to consider the ease of implementation for the framework developers who sit on top of the WebNN API
15:47:13 ... if we make the WebNN API too low-level, too explicit, it will burden the framework developers more and is more error-prone for them
15:47:50 ... designing this API means looking at these tension points to find the best compromise: you don't want to make the API too explicit, so frameworks can innovate, but OTOH you don't want to bloat the API
15:48:02 q+
15:48:13 ack ningxin_hu
15:48:32 ningxin_hu: thanks Chai; for this specific issue we can investigate and ask framework authors for input
15:49:12 ... we can drop this from the WebNN spec and implementation if framework authors already handle this calculation
15:49:51 ... e.g. ONNXRT and TF
15:50:05 Dwayne: both of these have helpers to massage paddings
15:50:33 q?
15:50:50 Subtopic: API lacks handling for async ML device errors on the context
15:51:01 anssik: issue #477
15:51:01 https://github.com/webmachinelearning/webnn/issues/477 -> Issue 477 API lacks handling for async ML device errors on the context (by bbernhar)
15:51:15 ... Bryan is asking: "What happens if a WebNN operation dispatched through MLContext encounters some internal error which causes the GPU device to get removed?"
15:51:36 ... as you know, Bryan is well informed on all things WebGPU and is also contributing to the WebNN implementation now, so he can help bridge the efforts
15:52:04 ... Bryan continues: "I would expect WebNN to provide a spec into how fatal (device) errors are handled so the WebNN developer could respond appropriately. If we want to do more with MLContext (ex. create buffers), I believe we'll need a more robust error mechanism like WebGPU"
15:52:08 -> WebGPU Errors & Debugging https://www.w3.org/TR/webgpu/#errors-and-debugging
15:52:49 RafaelCintron: I'm supportive of Bryan's proposal
15:53:11 ... you can use the GPU, and then a driver update happens and things may break; this helps with that
15:53:46 q+
15:53:58 q?
15:54:04 ack chai
15:54:22 chai: I think this also has to do with how frameworks will respond to this event
15:54:45 ... if frameworks will bubble this event up to the app and expect the app to clean up, then it makes sense for WebNN to send this up
15:54:46 q+
15:54:58 q+
15:55:10 ... good to understand how frameworks do this nowadays
15:55:25 ack ningxin_hu
15:55:58 ningxin_hu: many Web APIs are backed by the GPU; is there a unified way to get this error information surfaced to applications?
15:56:07 ... this impacts many APIs, not just WebGPU and WebNN
15:56:18 q?
15:56:22 ack RafaelCintron
15:56:43 RafaelCintron: there's no API for error information that spans all the Web APIs using GPUs
15:56:49 q+
15:57:22 ... WebGPU provides a context-lost promise
15:57:44 ... 2D Canvas did have a feature request, but I'm not sure if it was implemented
15:57:59 ... for WebNN I'm inclined to follow the WebGPU promise path
15:58:17 ... re frameworks, BabylonJS handles these errors
15:58:50 ... "the house may burn down", so we need these APIs to help web developers build robust sites
15:58:52 q?
15:58:55 ack chai
15:59:37 chai: responding to ningxin_hu, in my previous PR I said avoiding an internal GPU device is important; if WebNN wants to use a GPU device it has to use a WebGPU device
15:59:43 q+
15:59:45 ... in part motivated by the error handling discussed here
16:00:08 ... if that is the only device WebNN will use, it implies that when a device reset happens, WebNN is part of the app stack for that WebGPU device
16:00:38 ... that gives a way to do uniform error handling, with WebNN, including the framework on top, being an app for WebGPU so to speak, versus the internal "gpu" device we have for WebNN
16:01:05 ... this is a discussion that hasn't finished and is good to revisit
16:01:22 ningxin_hu: I feel we can do unification if we reuse the WebGPU device for "gpu" selection
16:01:41 ... if WebNN supports NPU later we need a similar interface to surface errors
16:02:04 chai: I believe NPU will follow a different route; the NPU adapter is not handled by WebGPU because it is not a graphics adapter
16:02:34 q?
16:02:37 ack ningxin_hu
16:03:46 RRSAgent, draft minutes
16:03:47 I have made the request to generate https://www.w3.org/2023/11/16-webmachinelearning-minutes.html anssik
18:04:31 Zakim has left #webmachinelearning
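[scribe note: as background for the issue #326 discussion above, here is a sketch of one possible effective-padding computation for the "same-upper" / "same-lower" options. This follows my reading of the common ONNX SAME_UPPER/SAME_LOWER convention (output size = ceil(input / stride), with any odd total padding going to the end for "same-upper" and to the beginning for "same-lower"); it is an illustration, not normative WebNN spec text.]

```python
import math

def effective_padding(input_size, filter_size, stride, dilation=1,
                      auto_pad="same-upper"):
    """Compute (pad_begin, pad_end) for one spatial axis, assuming the
    ONNX-style SAME convention: the output size is ceil(input / stride),
    and odd total padding goes to the end ("same-upper") or to the
    beginning ("same-lower")."""
    out_size = math.ceil(input_size / stride)
    # Effective filter extent once dilation is applied.
    dilated_filter = (filter_size - 1) * dilation + 1
    # Total padding needed so the filter covers the whole input.
    total = max(0, (out_size - 1) * stride + dilated_filter - input_size)
    if auto_pad == "same-upper":
        return total // 2, total - total // 2
    if auto_pad == "same-lower":
        return total - total // 2, total // 2
    raise ValueError(f"unknown auto_pad: {auto_pad}")
```

[e.g. for input 4, filter 3, stride 2, the total padding is 1, so "same-upper" pads (0, 1) while "same-lower" pads (1, 0) -- the asymmetric case the two enum values differ on]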