14:58:58 RRSAgent has joined #webmachinelearning
14:59:02 logging to https://www.w3.org/2025/03/13-webmachinelearning-irc
14:59:02 RRSAgent, make logs Public
14:59:03 please title this meeting ("meeting: ..."), anssik
14:59:03 Meeting: WebML WG Teleconference – 13 March 2025
14:59:40 Chair: Anssi
15:00:15 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-03-13-wg-agenda.md
15:00:18 Scribe: Anssi
15:00:24 scribeNick: anssik
15:00:38 gb, this is webmachinelearning/webnn
15:00:38 anssik, OK.
15:00:43 Present+ Anssi_Kostiainen
15:00:59 ningxin has joined #webmachinelearning
15:01:08 Present+ Reilly_Grant
15:01:18 Present+ Dwayne_Robinson
15:01:22 lgombos has joined #webmachinelearning
15:01:31 Present+ Etienne_Noel
15:01:35 Present+ Laszlo_Gombos
15:01:46 Present+ Joshua_Bell
15:01:54 Present+ Michael_McCool
15:02:00 Present+ Winston_Chen
15:02:06 Present+ Mike_Wyrzykowski
15:02:14 Present+ Tarek_Ziade
15:02:29 Present+ Zoltan_Kis
15:02:50 RRSAgent, draft minutes
15:02:51 I have made the request to generate https://www.w3.org/2025/03/13-webmachinelearning-minutes.html anssik
15:03:03 zkis has joined #webmachinelearning
15:03:38 anssik: Please welcome Deepti Gandluri from Google to the WebML CG and Brent Zundel from mesur.io to the WebML WG!
15:03:46 Topic: Announcements
15:03:52 Subtopic: WebML WG Charter 2025-2027 approved
15:03:58 -> Call for Participation: Web Machine Learning Working Group Charter approved; Join the Web Machine Learning WG https://lists.w3.org/Archives/Public/public-webmachinelearning-wg/2025Mar/0002.html
15:04:06 anssik: key takeaways:
15:04:16 ... the group is chartered through April 30, 2027, with the possibility to recharter midterm to add new deliverables
15:04:28 RafaelCintron has joined #webmachinelearning
15:04:33 ... the Working Group's mission remains the same, with a focus on the WebNN API; we'll also work closely with the Community Group to ensure alignment with new incubations and share expertise
15:04:41 Present+ Rafael_Cintron
15:04:54 ... thanks to the positive feedback from W3C members, the new charter is operational ahead of time
15:04:58 ... thank you all for your support!
15:05:12 ... current participants are not required to rejoin, but new participants are welcome; please pass the news to your internal teams who may be interested
15:05:26 Subtopic: WebNN Samples Test Framework
15:05:40 anssik: a new automation test framework for testing W3C WebNN Samples has been contributed to the WebML CG
15:05:43 -> https://github.com/webmachinelearning/webnn-samples-test-framework
15:05:49 anssik: thank you Ning & Belem for the initial contribution!
15:05:56 ... this CLI tool automates running webnn-samples
15:05:59 -> https://github.com/webmachinelearning/webnn-samples
15:06:09 tarek has joined #webmachinelearning
15:06:12 anssik: supported OSes include Windows, Linux and macOS; supported browsers are Chrome (all channels) and Edge (only Canary)
15:06:25 Present+ Tarek_Ziade
15:06:33 ... this test framework is in scope of the CG "Test Suites and Other Software" deliverable
15:06:49 -> Web Machine Learning Community Group Charter - Test Suites and Other Software https://webmachinelearning.github.io/charter/#test-suites
15:07:20 Topic: Incubations summary
15:07:35 anssik: I again asked Etienne to share a summary of the CG's 12 March telcon; thanks Etienne for taking notes
15:07:38 -> https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-03-12-cg-minutes.md
15:08:01 anssik: as a reminder, minutes are drafts and corrections to the minutes are welcome
15:08:24 ... before sharing the summary, I'd like to acknowledge we've received feedback from EMEA participants that a more EMEA-friendly time slot would be preferred
15:08:50 ... we're looking into alternating the time of the CG telcons to better serve our global community
15:09:43 [Etienne shares a summary of the 12 March CG meeting; for details see the meeting minutes]
15:11:15 Present+ Ningxin_Hu
15:11:47 RRSAgent, draft minutes
15:11:48 I have made the request to generate https://www.w3.org/2025/03/13-webmachinelearning-minutes.html anssik
15:12:55 q?
15:13:21 Topic: Operator set Wave 3 and spin-offs
15:13:28 anssik: PR #805 has been merged
15:13:29 https://github.com/webmachinelearning/webnn/pull/805 -> MERGED Pull Request 805 Operator set wave 3 (by fdwr)
15:13:50 ... congrats to the WG and massive thanks to Dwayne for all the updates to the PR, and to Ningxin, Josh and others for your in-depth review comments and suggestions!
15:14:30 ... as shared earlier, I'll initiate TAG review soon and this op set Wave 3 is a key piece of that review scope
15:14:44 ... I closed the big transformers issue #375 that we've referred to for all transformers-related work
15:14:44 https://github.com/webmachinelearning/webnn/issues/375 -> CLOSED Issue 375 Support for transformers (by dontcallmedom) [opset]
15:14:58 ... the group now satisfies its requirements for well-known transformer models as outlined in its charter scope
15:15:13 ... incremental improvements are expected and we will track future improvements in separate smaller issues now that the foundations are in place
15:15:17 Thank you Joshua and Ningxin for the careful eye 🙏 and the 243 comments! :)
15:15:36 ... I want to thank you all again, and convey feedback we received from Julien Chaumond, Hugging Face CTO: "This is insanely impactful work you've been doing, thank you"
15:15:54 Always happy to help, Dwayne!
15:16:01 Thanks to Dwayne, this is a huge contribution!
15:16:04 ... Keep up the great work!
15:16:44 Subtopic: dequantizeLinear emulation improvement
15:16:51 anssik: the first spin-off from PR #805
15:16:57 -> https://github.com/webmachinelearning/webnn/issues/779#issuecomment-2689214272
15:16:58 https://github.com/webmachinelearning/webnn/issues/779 -> CLOSED Issue 779 Support block-wise quantization (by huningxin) [operator specific]
15:17:22 anssik: Dwayne realized dequantizeLinear emulation could be improved significantly if expand() was augmented to accept any from-shape that was an integer multiple of the to-shape
15:17:27 -> https://www.w3.org/TR/webnn/#api-mlgraphbuilder-expand
15:17:41 anssik: given issue #779 was closed by PR #805, I'd support tracking this as a separate issue
15:17:42 https://github.com/webmachinelearning/webnn/issues/779 -> CLOSED Issue 779 Support block-wise quantization (by huningxin) [operator specific]
15:17:42 https://github.com/webmachinelearning/webnn/pull/805 -> MERGED Pull Request 805 Operator set wave 3 (by fdwr)
15:17:53 anssik: Dwayne, feel free to create a new issue for this improvement
15:18:42 Dwayne: good summary, this would enable cleaner composability
15:19:07 ... I will open a new issue for that
15:19:26 Subtopic: Gather multiaxis operator proposal
15:19:29 anssik: a second spin-off from PR #805
15:19:32 -> https://github.com/webmachinelearning/webnn/issues/767#issuecomment-2688528116
15:19:33 https://github.com/webmachinelearning/webnn/issues/767 -> CLOSED Issue 767 Request the decomposition for gatherElements, scatterElements and scatterND (by fujunwei) [operator specific]
15:19:51 anssik: Dwayne has explored a way to more generically express a gathering operation, rather than the 3 distinct operators: gather elements, gather blocks, gather ND blocks
15:19:57 ... problem statement: "Is there a more fundamental expression of a gathering operation that is more generic while also being simpler to document and implement?"
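[As a rough illustration of the problem statement above, here is a toy plain-JS sketch of a single multi-axis gather routine. This is not Dwayne's actual prototype (see his draft for the real IDL and implementation); the function name, signature, and flat-array convention below are this note's own assumptions.]

```javascript
// Toy multi-axis gather over flat arrays with explicit shapes.
// `indices` is a flat list of coordinate tuples; each tuple indexes the
// leading `k` axes of `input` (k = last dimension of indicesShape), and
// the remaining trailing axes are copied as whole blocks.
function gatherMultiaxis(input, inputShape, indices, indicesShape) {
  const k = indicesShape[indicesShape.length - 1];
  // Elements in one copied block: product of the trailing input dims.
  const blockSize = inputShape.slice(k).reduce((a, b) => a * b, 1);
  // Element strides for the leading k input axes.
  const strides = new Array(k);
  let stride = blockSize;
  for (let axis = k - 1; axis >= 0; axis--) {
    strides[axis] = stride;
    stride *= inputShape[axis];
  }
  const numTuples = indices.length / k;
  const out = new Array(numTuples * blockSize);
  for (let t = 0; t < numTuples; t++) {
    let offset = 0;
    for (let axis = 0; axis < k; axis++) {
      offset += indices[t * k + axis] * strides[axis];
    }
    for (let e = 0; e < blockSize; e++) {
      out[t * blockSize + e] = input[offset + e];
    }
  }
  return out;
}

// k = input rank behaves like gatherND picking single elements:
// gatherMultiaxis([1, 2, 3, 4], [2, 2], [0, 1, 1, 0], [2, 2]) → [2, 3]
// k = 1 behaves like gather along axis 0, copying whole rows:
// gatherMultiaxis([1, 2, 3, 4, 5, 6], [3, 2], [2, 0], [2, 1]) → [5, 6, 1, 2]
```

[In principle gatherElements can also be reduced to this form by first materializing full coordinate tuples from its indices tensor, which is consistent with the remark below that the distinct ops can share one shader.]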
15:20:19 anssik: Dwayne's draft Gather multiaxis operator proposal to more generically express a gathering operation has a ton of important details:
15:20:24 -> https://github.com/fdwr/MachineLearningOperators/blob/master/Multigather.md
15:20:31 anssik: I invite folks to review the draft for:
15:20:36 ... - operator equivalence classes
15:20:43 ... - proposed multigather operator IDL
15:20:47 ... - a prototype implementation of gatherMultiaxis() in JS
15:20:50 ... - operator mapping
15:21:02 anssik: this could become another standalone issue? I think the group may need some extra time to look into this
15:21:41 Dwayne: separate issue makes sense, this is still a rough draft, takes a while to digest all the details
15:21:52 ... all three ops use the same shader under the hood
15:23:00 anssik: this is a pioneering effort
15:23:04 q?
15:23:51 q+
15:23:56 ack jsbell
15:24:18 jsbell: wanted to ask about u/int4, rough timeline?
15:24:22 Dwayne: next week
15:24:48 Topic: Query mechanism for supported devices
15:25:08 anssik: issue #815
15:25:08 https://github.com/webmachinelearning/webnn/issues/815 -> Issue 815 Query mechanism for supported devices (by anssiko) [device selection]
15:25:14 anssik: thanks everyone for your contributions in the issue
15:25:21 ... I'm happy to see active discussion, new perspectives and ideation
15:25:45 ... as a reminder, the approach I'd prefer us to use is to extract real-world use cases first, then assess implementability, and only last go deep into the solution space
15:26:03 ... I'll recap the recent feedback received, and then we will open the discussion and folks can fill in the blanks
15:26:28 ... Mike's comment: "a processing chip is best described in terms of capabilities"
15:26:57 ... received a response from Reilly: "this is best in principle however the reality of all the frameworks we have been prototyping WebNN against is that none of them expose which capabilities a processing chip supports"
15:27:18 ... and continued: "the capabilities we end up exposing through the opSupportLimits() method are the capabilities of the framework in general rather than any of the processors it can target."
15:27:32 ... Mike suggested listing opSupportLimits() per processor
15:28:02 ... Zoltan suggested passing requiredCapabilities (dataTypes, maximumRank, operators) at context creation that would automatically select a device
15:28:47 ... Mike shared: "in the WebGPU specification we discussed this a bit as many of the same privacy consideration exist [...] the UA can choose to bucket / report any specific limits / features / operations they wish to mitigate the potential privacy concerns."
15:28:51 -> https://www.w3.org/TR/webgpu/#privacy-machine-limits
15:29:29 anssik: continued: "perhaps a browser wants all models to run on a class of devices with different hardware support, so it reports the lowest common set supported across all devices even on the higher end devices"
15:29:58 ... Reilly suggested defining the processor types in terms of abstractions exposed by the web platform:
15:30:08 ... - The "cpu" is the processor where JavaScript and WebAssembly execute
15:30:21 ... - The "gpu" is the processor where WebGL and WebGPU programs execute
15:30:25 ... - All other processor types are unnamed
15:30:33 ... ("cpu" and "gpu" are placeholder names, could be "foo" and "bar")
15:31:17 ... Reilly's proposal reads: "The control I propose adding is the ability to request an MLContext with the capacity to invoke ML graphs using the capacity of either the "cpu" or "gpu" processor, or that explicitly does not invoke ML graphs using the capacity of either the "cpu" or "gpu" processor. Implementations are free to ignore this request."
15:31:25 present+ Thomas_Steiner
15:31:27 ... "Additionally, I propose adding a property which informs the developer which processors were involved in a previous ML graph dispatch task."
15:31:33 ... capabilities != capacity; capacity answers the question "can this perform adequately?"
15:31:49 ... the two use cases extracted from Reilly's proposals:
15:31:54 ... - UC1: "allow the developer to provide a hint to the platform about the workloads it intends to run (e.g. please don't use the GPU for this ML task, there is WebGPU work I intend to schedule as well)"
15:32:14 ... - UC2: "allow the developer to understand the performance of their application based on how the browser ended up being able to schedule their workloads"
15:32:24 ... Zoltan's refined proposal for includeDevice + excludeDevice as hints to createContext()
15:32:43 ... Josh proposed: "Maybe [MLContext.]dispatch() could return a Promise that resolves to a dictionary with details of the inference, e.g. what compute units it used, and in the future additional diagnostics like timing?"
15:33:22 anssik: to summarize, we received new use cases (e.g. UC1 and UC2), feedback on implementability (capabilities of frameworks vs. chips), and many proposed solutions for the possible API shape (perProcessorSupportedLimits, "cpu-like" & "gpu-like", includeDevice & excludeDevice hints, dispatch() resolving with inference details)
15:33:31 ... I'd suggest Zoltan extract the use cases from this issue and add them to the explainer
15:33:44 anssik: I'd open the floor for discussion, please queue up
15:35:04 Mike: if we want to expose capabilities of each chip, we have more features in WebGPU than in WebNN, so could mitigate the privacy issues
15:36:08 Reilly: I'm supportive of Zoltan's and Mike's proposal on how to expose processor capabilities in a more nuanced way, I'm completely supportive of adding that to the API
15:36:36 ... my only hesitation is that the current frameworks do not support that, we're not able to query Core ML, TensorFlow etc. for processors to use
15:37:07 ... the proposal is mainly focused on, given the constraints, what information is useful for developers
15:37:28 ... not wanting to explicitly name every processor type, the ecosystem will shift and we want future-proofing
15:38:07 ... I can justify calling them "something-cpu" and "something-gpu" as they tie to Wasm and WebGL/WebGPU
15:38:24 ... the capacity of the system to execute JS comes from "something-cpu"
15:38:52 ... that reflects the feedback we get from developers, they have a portfolio of features
15:39:17 ... you want to hint the browser ahead of time if a certain placement of the work is a good idea
15:39:29 q?
15:40:26 Zoltan: I was happy to see Reilly's take to define cpu and gpu in terms of browser implementation, my hunch is they're not mutually exclusive with Mike's proposal, it is also a simple API, easier to implement
15:41:00 ... OTOH, Mike's proposal has been out for a while, might take more work to implement, could do both
15:41:20 ... Mike, any concerns implementing with the cpu-like and gpu-like concepts?
15:41:52 Mike: no direct concerns with the proposal, but it's not immediately clear if cpu-like and gpu-like are useful if the frameworks don't expose this information?
15:42:14 Mike: need to read Reilly's proposal again
15:42:32 q+
15:43:03 jsbell: accumulating inference results on the objects, would prefer to have a more direct way to get to that information
15:43:05 ack reillyg
15:43:30 Reilly: wanted to clarify that the two proposals have two different use cases, or developer needs
15:44:06 ... 1) tailor for capabilities, 2) the capacity question, the amount of compute resources the system has to execute XPU workloads
15:44:17 ... propose to tackle the capacity question first
15:44:45 +1 to Reilly's point, I also see them cater to different developer flows, both seem valid
15:45:05 ... the capability question is also interesting, we'd need to hard-code the capabilities, which is easy if only a small number of chips are targets
15:45:07 q?
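[The includeDevice / excludeDevice idea discussed above can be made concrete with a small mock. To be clear, nothing below is in the WebNN spec: the option names mirror Zoltan's proposal as summarized in these minutes, while the available-processor list and the fallback policy are invented purely for illustration.]

```javascript
// Mock of how a UA might resolve the proposed includeDevice /
// excludeDevice context-creation hints. Hypothetical throughout:
// the "npu" entry stands in for an unnamed accelerator.
const AVAILABLE_PROCESSORS = ["cpu", "gpu", "npu"];

function selectProcessors({ includeDevice = [], excludeDevice = [] } = {}) {
  // These are hints, not requirements: implementations are free to
  // ignore them, which the final fallback below models.
  let candidates = AVAILABLE_PROCESSORS.filter((p) => !excludeDevice.includes(p));
  if (includeDevice.length > 0) {
    const preferred = candidates.filter((p) => includeDevice.includes(p));
    if (preferred.length > 0) candidates = preferred;
  }
  // If the hints ruled out everything, fall back to the CPU rather than fail.
  return candidates.length > 0 ? candidates : ["cpu"];
}

// UC1 ("please don't use the GPU for this ML task, there is WebGPU
// work I intend to schedule as well"):
// selectProcessors({ excludeDevice: ["gpu"] }) → ["cpu", "npu"]
```

[UC2 is the complementary read-back signal: after dispatch, report which of these processors actually ran the graph; that side is not sketched here.]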
15:45:34 q+
15:45:47 Zoltan: we can wait on Mike, it is important to get his feedback; Reilly has defined the use case clearly, the question is whether we can implement that differently
15:45:54 ack RafaelCintron
15:46:18 RafaelCintron: question for Reilly on the capacity proposal, is capacity a moment in time?
15:46:27 ... or something that changes over time
15:47:16 Reilly: dispatch returning some information on where inference actually executed is one part
15:47:39 ... the ahead-of-time hint is: based on what I know, this works on this class of devices, try to run the workload here
15:48:11 ... two signals: 1) where you should put the workload and 2) this is where the system actually ran it
15:48:40 ... I expect developers to test on machines many users have, a representative sample of devices, to understand their characteristics
15:48:58 RafaelCintron: how do they know the browsers run on those systems?
15:49:13 reillyg: for fingerprinting reasons, we don't share excess information
15:49:41 RafaelCintron: WebGPU has a device class
15:51:13 ... what if the web developer learns "CPU and GPU are off limits", and the other available devices do not have the required ops, what to do?
15:51:31 reillyg: the hint back to the developer, "here's where it actually ran", is a useful signal
15:52:20 ... there's potentially room for a device-class-style hint for WebNN that answers the question "do you have a non-GPU-style accelerator?"
15:52:21 q?
15:53:45 Topic: Caching mechanism for MLGraph
15:53:49 anssik: issue #807
15:53:50 https://github.com/webmachinelearning/webnn/issues/807 -> Issue 807 Caching mechanism for MLGraph (by anssiko) [question] [feature request]
15:54:01 ... no new feedback in the issue since we discussed this last time
15:54:07 q+
15:54:27 ... first, I wanted to check if anyone has investigated implementation issues with a model cache within the browser sandbox and possible solutions on how to overcome the sandbox restrictions safely?
15:54:39 ... second, I would like to gather further feedback on implementation-defined vs. explicit caching API, but would like to understand the implementability story first
15:54:42 ack reillyg
15:55:26 Reilly: I can speak to both; we haven't had a chance to look at implementation yet, but we believe it is feasible to implement caching in our current Chromium prototype for the TFLite and Core ML backends
15:55:39 ... sandboxing is not an issue, because this is an origin-scoped cache
15:56:19 ... the reason for an explicit caching API has to do with weights
15:56:49 ... the distribution between WebGPU and WebNN is different; for WebGPU caching shaders makes sense, they're smaller
15:57:15 ... weights in WebNN can be reorganized in the compilation stage, a shader can modify the weights
15:58:12 McCool: I think we have to understand the use cases, saving download time or compile time
15:58:35 ... single-origin vs. cross-origin caches
16:00:17 Reilly: this is totally implementable, but it is work; looking at ORT and TFLite, they don't have a concept of a cached model
16:00:40 ... it requires implementation work on both the browser side and the framework side of things
16:00:59 I need to go. Losing my room.
16:01:38 Reilly: ORT has a Session option to dump the optimized file to an .ort file format which reloads more quickly. We might want to ask ORT to add a way to get it via a memory blob rather than a file path.
16:01:45 ningxin: one comment regarding frameworks, I'm aware of the ORT concept EPContext, which allows embedding a compiled binary blob into an ONNX model, encapsulated as a special operator
16:02:11 ... worth investigating if this could help with model caching
16:02:13 q?
16:02:40 q?
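[McCool's use-case split above (saving download time vs. compile time) can be sketched in a few lines. Everything below is illustrative: WebNN has no caching API today, the in-memory Map stands in for whatever origin-scoped storage a browser would use, and downloadModel/buildGraph are hypothetical caller-supplied helpers for fetching weights and compiling a graph.]

```javascript
// Sketch of an explicit, origin-scoped compiled-graph cache keyed by
// model URL (key choice is an assumption for this sketch).
const graphCache = new Map();

async function loadGraph(modelUrl, downloadModel, buildGraph) {
  // An explicit cache is consulted *before* fetching, so a hit saves
  // both the model download and the compile step; an implicit cache
  // inside the build step could only ever save the compile.
  if (graphCache.has(modelUrl)) {
    return graphCache.get(modelUrl);
  }
  const weights = await downloadModel(modelUrl);
  const graph = await buildGraph(weights);
  graphCache.set(modelUrl, graph);
  return graph;
}
```

[Calling loadGraph twice with the same URL performs one download and one build; the second call is served from the cache.]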
16:03:03 (regarding my comment, my point was an implicit cache cannot save model download time - you need an explicit cache)
16:03:59 RRSAgent, draft minutes
16:04:00 I have made the request to generate https://www.w3.org/2025/03/13-webmachinelearning-minutes.html anssik
16:11:09 s/executure/execute
16:12:28 s/I accumulation/accumulating
16:15:49 s/side of time/side
16:16:58 RRSAgent, draft minutes
16:17:00 I have made the request to generate https://www.w3.org/2025/03/13-webmachinelearning-minutes.html anssik