14:58:58 RRSAgent has joined #webmachinelearning
14:59:02 logging to https://www.w3.org/2025/03/13-webmachinelearning-irc
14:59:02 RRSAgent, make logs Public
14:59:03 please title this meeting ("meeting: ..."), anssik
14:59:03 Meeting: WebML WG Teleconference – 13 March 2025
14:59:40 Chair: Anssi
15:00:15 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-03-13-wg-agenda.md
15:00:18 Scribe: Anssi
15:00:24 scribeNick: anssik
15:00:38 gb, this is webmachinelearning/webnn
15:00:38 anssik, OK.
15:00:43 Present+ Anssi_Kostiainen
15:00:59 ningxin has joined #webmachinelearning
15:01:08 Present+ Reilly_Grant
15:01:18 Present+ Dwayne_Robinson
15:01:22 lgombos has joined #webmachinelearning
15:01:31 Present+ Etienne_Noel
15:01:35 Present+ Laszlo_Gombos
15:01:46 Present+ Joshua_Bell
15:01:54 Present+ Michael_McCool
15:02:00 Present+ Winston_Chen
15:02:06 Present+ Mike_Wyrzykowski
15:02:14 Present+ Tarek_Ziade
15:02:29 Present+ Zoltan_Kis
15:02:50 RRSAgent, draft minutes
15:02:51 I have made the request to generate https://www.w3.org/2025/03/13-webmachinelearning-minutes.html anssik
15:03:03 zkis has joined #webmachinelearning
15:03:38 anssik: Please welcome Deepti Gandluri from Google to the WebML CG and Brent Zundel from mesur.io to the WebML WG!
15:03:46 Topic: Announcements
15:03:52 Subtopic: WebML WG Charter 2025-2027 approved
15:03:58 -> Call for Participation: Web Machine Learning Working Group Charter approved; Join the Web Machine Learning WG https://lists.w3.org/Archives/Public/public-webmachinelearning-wg/2025Mar/0002.html
15:04:06 anssik: key takeaways:
15:04:16 ... the group is chartered through April 30, 2027, with the possibility to recharter midterm to add new deliverables
15:04:28 RafaelCintron has joined #webmachinelearning
15:04:33 ... the Working Group's mission remains the same, with a focus on the WebNN API; we'll also work closely with the Community Group to ensure alignment with new incubations and share expertise
15:04:41 Present+ Rafael_Cintron
15:04:54 ... thanks to the positive feedback from W3C members, the new charter is operational ahead of time
15:04:58 ... thank you all for your support!
15:05:12 ... current participants are not required to rejoin, but new participants are welcome; please pass the news to your internal teams who may be interested
15:05:26 Subtopic: WebNN Samples Test Framework
15:05:40 anssik: a new automation test framework for testing W3C WebNN Samples has been contributed to the WebML CG
15:05:43 -> https://github.com/webmachinelearning/webnn-samples-test-framework
15:05:49 anssik: thank you Ning & Belem for the initial contribution!
15:05:56 ... this CLI tool automates running webnn-samples
15:05:59 -> https://github.com/webmachinelearning/webnn-samples
15:06:09 tarek has joined #webmachinelearning
15:06:12 anssik: supported OSes include Windows, Linux and macOS; supported browsers are Chrome (all channels) and Edge (only Canary)
15:06:25 Present+ Tarek_Ziade
15:06:33 ... this test framework is in scope of the CG "Test Suites and Other Software" deliverable
15:06:49 -> Web Machine Learning Community Group Charter - Test Suites and Other Software https://webmachinelearning.github.io/charter/#test-suites
15:07:20 Topic: Incubations summary
15:07:35 anssik: I again asked Etienne to share a summary of the CG's 12 March telcon; thanks Etienne for taking notes
15:07:38 -> https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-03-12-cg-minutes.md
15:08:01 anssik: as a reminder, minutes are drafts and corrections to the minutes are welcome
15:08:24 ... before sharing the summary, I'd like to acknowledge we've received feedback from EMEA participants that a more EMEA-friendly time slot would be preferred
15:08:50 ... we're looking into alternating the time of the CG telcons to better serve our global community
15:09:43 [Etienne shares a summary of the 12 March CG meeting; for details see the meeting minutes]
15:11:15 Present+ Ningxin_Hu
15:11:47 RRSAgent, draft minutes
15:11:48 I have made the request to generate https://www.w3.org/2025/03/13-webmachinelearning-minutes.html anssik
15:12:55 q?
15:13:21 Topic: Operator set Wave 3 and spin-offs
15:13:28 anssik: PR #805 has been merged
15:13:29 https://github.com/webmachinelearning/webnn/pull/805 -> MERGED Pull Request 805 Operator set wave 3 (by fdwr)
15:13:50 ... congrats to the WG and massive thanks to Dwayne for all the updates to the PR, and to Ningxin, Josh and others for your in-depth review comments and suggestions!
15:14:30 ... as shared earlier, I'll initiate TAG review soon and this op set Wave 3 is a key piece of that review scope
15:14:44 ... I closed the big transformers issue #375 that we've referred to for all transformers-related work
15:14:44 https://github.com/webmachinelearning/webnn/issues/375 -> CLOSED Issue 375 Support for transformers (by dontcallmedom) [opset]
15:14:58 ... the group now satisfies its requirements for well-known transformer models as outlined in its charter scope
15:15:13 ... incremental improvements are expected and we will track future improvements in separate smaller issues now that the foundations are in place
15:15:17 Thank you Joshua and Ningxin for the careful eye 🙏 and the 243 comments! :)
15:15:36 ... I want to thank you all again, and convey feedback we received from Julien Chaumond, Hugging Face CTO: "This is insanely impactful work you've been doing, thank you"
15:15:54 Always happy to help, Dwayne!
15:16:01 Thanks to Dwayne, this is a huge contribution!
15:16:04 ... Keep up the great work!
15:16:44 Subtopic: dequantizeLinear emulation improvement
15:16:51 anssik: the first spin-off from PR #805
15:16:57 -> https://github.com/webmachinelearning/webnn/issues/779#issuecomment-2689214272
15:16:58 https://github.com/webmachinelearning/webnn/issues/779 -> CLOSED Issue 779 Support block-wise quantization (by huningxin) [operator specific]
15:17:22 anssik: Dwayne realized dequantizeLinear emulation could be improved significantly if expand() was augmented to accept any from-shape that was an integer multiple of the to-shape
15:17:27 -> https://www.w3.org/TR/webnn/#api-mlgraphbuilder-expand
15:17:41 anssik: given issue #779 was closed by PR #805, I'd support tracking this as a separate issue
15:17:42 https://github.com/webmachinelearning/webnn/issues/779 -> CLOSED Issue 779 Support block-wise quantization (by huningxin) [operator specific]
15:17:42 https://github.com/webmachinelearning/webnn/pull/805 -> MERGED Pull Request 805 Operator set wave 3 (by fdwr)
15:17:53 anssik: Dwayne, feel free to create a new issue for this improvement
15:18:42 Dwayne: good summary, this would enable cleaner composability
15:19:07 ... I will open a new issue for that
15:19:26 Subtopic: Gather multiaxis operator proposal
15:19:29 anssik: a second spin-off from PR #805
15:19:32 -> https://github.com/webmachinelearning/webnn/issues/767#issuecomment-2688528116
15:19:33 https://github.com/webmachinelearning/webnn/issues/767 -> CLOSED Issue 767 Request the decomposition for gatherElements, scatterElements and scatterND (by fujunwei) [operator specific]
15:19:51 anssik: Dwayne has explored a way to more generically express a gathering operation, rather than the 3 distinct operators: gather elements, gather blocks, gather ND blocks
15:19:57 ... problem statement: "Is there a more fundamental expression of a gathering operation that is more generic while also being simpler to document and implement?"
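[As a rough illustration of the problem statement above, here is a toy plain-JS sketch of a single multi-axis gather routine. This is not Dwayne's actual prototype (see his draft for the real IDL and implementation); the function name, signature, and flat-array convention below are this note's own assumptions.]

```javascript
// Toy multi-axis gather over flat arrays with explicit shapes.
// `indices` is a flat list of coordinate tuples; each tuple indexes the
// leading `k` axes of `input` (k = last dimension of indicesShape), and
// the remaining trailing axes are copied as whole blocks.
function gatherMultiaxis(input, inputShape, indices, indicesShape) {
  const k = indicesShape[indicesShape.length - 1];
  // Elements in one copied block: product of the trailing input dims.
  const blockSize = inputShape.slice(k).reduce((a, b) => a * b, 1);
  // Element strides for the leading k input axes.
  const strides = new Array(k);
  let stride = blockSize;
  for (let axis = k - 1; axis >= 0; axis--) {
    strides[axis] = stride;
    stride *= inputShape[axis];
  }
  const numTuples = indices.length / k;
  const out = new Array(numTuples * blockSize);
  for (let t = 0; t < numTuples; t++) {
    let offset = 0;
    for (let axis = 0; axis < k; axis++) {
      offset += indices[t * k + axis] * strides[axis];
    }
    for (let e = 0; e < blockSize; e++) {
      out[t * blockSize + e] = input[offset + e];
    }
  }
  return out;
}

// k = input rank behaves like gatherND picking single elements:
// gatherMultiaxis([1, 2, 3, 4], [2, 2], [0, 1, 1, 0], [2, 2]) → [2, 3]
// k = 1 behaves like gather along axis 0, copying whole rows:
// gatherMultiaxis([1, 2, 3, 4, 5, 6], [3, 2], [2, 0], [2, 1]) → [5, 6, 1, 2]
```

[In principle gatherElements can also be reduced to this form by first materializing full coordinate tuples from its indices tensor, which is consistent with the remark below that the distinct ops can share one shader.]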
15:20:19 anssik: Dwayne's draft Gather multiaxis operator proposal to more generically express a gathering operation has a ton of important details:
15:20:24 -> https://github.com/fdwr/MachineLearningOperators/blob/master/Multigather.md
15:20:31 anssik: I invite folks to review the draft for:
15:20:36 ... - operator equivalence classes
15:20:43 ... - proposed multigather operator IDL
15:20:47 ... - a prototype implementation of gatherMultiaxis() in JS
15:20:50 ... - operator mapping
15:21:02 anssik: this could become another standalone issue? I think the group may need some extra time to look into this
15:21:41 Dwayne: separate issue makes sense, this is still a rough draft, takes a while to digest all the details
15:21:52 ... all three ops use the same shader under the hood
15:23:00 anssik: this is a pioneering effort
15:23:04 q?
15:23:51 q+
15:23:56 ack jsbell
15:24:18 jsbell: wanted to ask about u/int4, rough timeline?
15:24:22 Dwayne: next week
15:24:48 Topic: Query mechanism for supported devices
15:25:08 anssik: issue #815
15:25:08 https://github.com/webmachinelearning/webnn/issues/815 -> Issue 815 Query mechanism for supported devices (by anssiko) [device selection]
15:25:14 anssik: thanks everyone for your contributions in the issue
15:25:21 ... I'm happy to see active discussion, new perspectives and ideation
15:25:45 ... as a reminder, the approach I'd prefer us to use is to extract real-world use cases first, then assess implementability, and only last go deep into the solution space
15:26:03 ... I'll recap the recent feedback received, and then we will open the discussion and folks can fill in the blanks
15:26:28 ... Mike's comment: "a processing chip is best described in terms of capabilities"
15:26:57 ... received a response from Reilly: "this is best in principle however the reality of all the frameworks we have been prototyping WebNN against is that none of them expose which capabilities a processing chip supports"
15:27:18 ... and continued: "the capabilities we end up exposing through the opSupportLimits() method are the capabilities of the framework in general rather than any of the processors it can target."
15:27:32 ... Mike suggested listing opSupportLimits() per processor
15:28:02 ... Zoltan suggested passing requiredCapabilities (dataTypes, maximumRank, operators) at context creation that would automatically select a device
15:28:47 ... Mike shared: "in the WebGPU specification we discussed this a bit as many of the same privacy consideration exist [...] the UA can choose to bucket / report any specific limits / features / operations they wish to mitigate the potential privacy concerns."
15:28:51 -> https://www.w3.org/TR/webgpu/#privacy-machine-limits
15:29:29 anssik: continued: "perhaps a browser wants all models to run on a class of devices with different hardware support, so it reports the lowest common set supported across all devices even on the higher end devices"
15:29:58 ... Reilly suggested defining the processor types in terms of abstractions exposed by the web platform:
15:30:08 ... - The "cpu" is the processor where JavaScript and WebAssembly execute
15:30:21 ... - The "gpu" is the processor where WebGL and WebGPU programs execute
15:30:25 ... - All other processor types are unnamed
15:30:33 ... ("cpu" and "gpu" are placeholder names, could be "foo" and "bar")
15:31:17 ... Reilly's proposal reads: "The control I propose adding is the ability to request an MLContext with the capacity to invoke ML graphs using the capacity of either the "cpu" or "gpu" processor, or that explicitly does not invoke ML graphs using the capacity of either the "cpu" or "gpu" processor. Implementations are free to ignore this request."
15:31:25 present+ Thomas_Steiner
15:31:27 ... "Additionally, I propose adding a property which informs the developer which processors were involved in a previous ML graph dispatch task."
15:31:33 ... capabilities != capacity; capacity answers the question "can this perform adequately?"
15:31:49 ... the two use cases extracted from Reilly's proposals:
15:31:54 ... - UC1: "allow the developer to provide a hint to the platform about the workloads it intends to run (e.g. please don't use the GPU for this ML task, there is WebGPU work I intend to schedule as well)"
15:32:14 ... - UC2: "allow the developer to understand the performance of their application based on how the browser ended up being able to schedule their workloads"
15:32:24 ... Zoltan's refined proposal for includeDevice + excludeDevice as hints to createContext()
15:32:43 ... Josh proposed: "Maybe [MLContext.]dispatch() could return a Promise that resolves to a dictionary with details of the inference, e.g. what compute units it used, and in the future additional diagnostics like timing?"
15:33:22 anssik: to summarize, we received new use cases (e.g. UC1 and UC2), feedback on implementability (capabilities of frameworks vs. chips), and many proposed solutions for the possible API shape (perProcessorSupportedLimits, "cpu-like" & "gpu-like", includeDevice & excludeDevice hints, dispatch() resolving with inference details)
15:33:31 ... I'd suggest Zoltan extract the use cases from this issue and add them to the explainer
15:33:44 anssik: I'd open the floor for discussion, please queue up
15:35:04 Mike: if we want to expose capabilities of each chip, we have more features in WebGPU than in WebNN, so could mitigate the privacy issues
15:36:08 Reilly: I'm supportive of Zoltan's and Mike's proposal on how to expose processor capabilities in a more nuanced way, I'm completely supportive of adding that to the API
15:36:36 ... my only hesitation is that the current frameworks do not support that, we're not able to query Core ML, TensorFlow etc. for processors to use
15:37:07 ... the proposal is mainly focused on, given the constraints, what information is useful for developers
15:37:28 ... not wanting to explicitly name every processor type, the ecosystem will shift and we want future-proofing
15:38:07 ... I can justify calling them "something-cpu" and "something-gpu" as they tie to Wasm and WebGL/WebGPU
15:38:24 ... the capacity of the system to execute JS comes from "something-cpu"
15:38:52 ... that reflects the feedback we get from developers, they have a portfolio of features
15:39:17 ... you want to hint the browser ahead of time if a certain placement of the work is a good idea
15:39:29 q?
15:40:26 Zoltan: I was happy to see Reilly's take to define cpu and gpu in terms of browser implementation, my hunch is they're not mutually exclusive with Mike's proposal, it is also a simple API, easier to implement
15:41:00 ... OTOH, Mike's proposal has been out for a while, might take more work to implement, could do both
15:41:20 ... Mike, any concerns implementing with the cpu-like and gpu-like concepts?
15:41:52 Mike: no direct concerns with the proposal, but it's not immediately clear if cpu-like and gpu-like are useful if the frameworks don't expose this information?
15:42:14 Mike: need to read Reilly's proposal again
15:42:32 q+
15:43:03 jsbell: accumulating inference results on the objects, would prefer to have a more direct way to get to that information
15:43:05 ack reillyg
15:43:30 Reilly: wanted to clarify that the two proposals have two different use cases, or developer needs
15:44:06 ... 1) tailor for capabilities, 2) the capacity question, the amount of compute resources the system has to execute XPU workloads
15:44:17 ... propose to tackle the capacity question first
15:44:45 +1 to Reilly's point, I also see them cater to different developer flows, both seem valid
15:45:05 ... the capability question is also interesting, we'd need to hard-code the capabilities, which is easy if only a small number of chips are targets
15:45:07 q?
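[The includeDevice / excludeDevice idea discussed above can be made concrete with a small mock. To be clear, nothing below is in the WebNN spec: the option names mirror Zoltan's proposal as summarized in these minutes, while the available-processor list and the fallback policy are invented purely for illustration.]

```javascript
// Mock of how a UA might resolve the proposed includeDevice /
// excludeDevice context-creation hints. Hypothetical throughout:
// the "npu" entry stands in for an unnamed accelerator.
const AVAILABLE_PROCESSORS = ["cpu", "gpu", "npu"];

function selectProcessors({ includeDevice = [], excludeDevice = [] } = {}) {
  // These are hints, not requirements: implementations are free to
  // ignore them, which the final fallback below models.
  let candidates = AVAILABLE_PROCESSORS.filter((p) => !excludeDevice.includes(p));
  if (includeDevice.length > 0) {
    const preferred = candidates.filter((p) => includeDevice.includes(p));
    if (preferred.length > 0) candidates = preferred;
  }
  // If the hints ruled out everything, fall back to the CPU rather than fail.
  return candidates.length > 0 ? candidates : ["cpu"];
}

// UC1 ("please don't use the GPU for this ML task, there is WebGPU
// work I intend to schedule as well"):
// selectProcessors({ excludeDevice: ["gpu"] }) → ["cpu", "npu"]
```

[UC2 is the complementary read-back signal: after dispatch, report which of these processors actually ran the graph; that side is not sketched here.]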
15:45:34 q+
15:45:47 Zoltan: we can wait on Mike, it is important to get his feedback; Reilly has defined the use case clearly, the question is whether we can implement that differently
15:45:54 ack RafaelCintron
15:46:18 RafaelCintron: question for Reilly on the capacity proposal, is capacity a moment in time?
15:46:27 ... or something that changes over time
15:47:16 Reilly: dispatch returning some information on where inference actually executed is one part
15:47:39 ... the ahead-of-time hint is: based on what I know, this works on this class of devices, try to run the workload here
15:48:11 ... two signals: 1) where you should put the workload and 2) this is where the system actually ran it
15:48:40 ... I expect developers to test on machines many users have, a representative sample of devices, to understand their characteristics
15:48:58 RafaelCintron: how do they know the browsers run on those systems?
15:49:13 reillyg: for fingerprinting reasons, we don't share excess information
15:49:41 RafaelCintron: WebGPU has a device class
15:51:13 ... what if the web developer learns "CPU and GPU are off limits", and the other available devices do not have the required ops, what to do?
15:51:31 reillyg: the hint back to the developer, "here's where it actually ran", is a useful signal
15:52:20 ... there's potentially room for a device-class-style hint for WebNN that answers the question "do you have a non-GPU-style accelerator?"
15:52:21 q?
15:53:45 Topic: Caching mechanism for MLGraph
15:53:49 anssik: issue #807
15:53:50 https://github.com/webmachinelearning/webnn/issues/807 -> Issue 807 Caching mechanism for MLGraph (by anssiko) [question] [feature request]
15:54:01 ... no new feedback in the issue since we discussed this last time
15:54:07 q+
15:54:27 ... first, I wanted to check if anyone has investigated implementation issues with a model cache within the browser sandbox and possible solutions on how to overcome the sandbox restrictions safely?
15:54:39 ... second, I would like to gather further feedback on implementation-defined vs. explicit caching API, but would like to understand the implementability story first
15:54:42 ack reillyg
15:55:26 Reilly: I can speak to both; we haven't had a chance to look at implementation yet, but we believe it is feasible to implement caching in our current Chromium prototype for the TFLite and Core ML backends
15:55:39 ... sandboxing is not an issue, because this is an origin-scoped cache
15:56:19 ... the reason for an explicit caching API has to do with weights
15:56:49 ... the distribution between WebGPU and WebNN is different; for WebGPU caching shaders makes sense, they're smaller
15:57:15 ... weights in WebNN can be reorganized in the compilation stage, a shader can modify the weights
15:58:12 McCool: I think we have to understand the use cases, saving download time or compile time
15:58:35 ... single-origin vs. cross-origin caches
16:00:17 Reilly: this is totally implementable, but it is work; looking at ORT and TFLite, they don't have a concept of a cached model
16:00:40 ... it requires implementation work on both the browser side and the framework side of things
16:00:59 I need to go. Losing my room.
16:01:38 Reilly: ORT has a Session option to dump the optimized file to an .ort file format which reloads more quickly. We might want to ask ORT to add a way to get it via a memory blob rather than a file path.
16:01:45 ningxin: one comment regarding frameworks, I'm aware of the ORT concept EPContext, which allows embedding a compiled binary blob into an ONNX model, encapsulated as a special operator
16:02:11 ... worth investigating if this could help with model caching
16:02:13 q?
16:02:40 q?
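[McCool's use-case split above (saving download time vs. compile time) can be sketched in a few lines. Everything below is illustrative: WebNN has no caching API today, the in-memory Map stands in for whatever origin-scoped storage a browser would use, and downloadModel/buildGraph are hypothetical caller-supplied helpers for fetching weights and compiling a graph.]

```javascript
// Sketch of an explicit, origin-scoped compiled-graph cache keyed by
// model URL (key choice is an assumption for this sketch).
const graphCache = new Map();

async function loadGraph(modelUrl, downloadModel, buildGraph) {
  // An explicit cache is consulted *before* fetching, so a hit saves
  // both the model download and the compile step; an implicit cache
  // inside the build step could only ever save the compile.
  if (graphCache.has(modelUrl)) {
    return graphCache.get(modelUrl);
  }
  const weights = await downloadModel(modelUrl);
  const graph = await buildGraph(weights);
  graphCache.set(modelUrl, graph);
  return graph;
}
```

[Calling loadGraph twice with the same URL performs one download and one build; the second call is served from the cache.]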
16:03:03 (regarding my comment, my point was an implicit cache cannot save model download time - you need an explicit cache)
16:03:59 RRSAgent, draft minutes
16:04:00 I have made the request to generate https://www.w3.org/2025/03/13-webmachinelearning-minutes.html anssik
16:11:09 s/executure/execute
16:12:28 s/I accumulation/accumulating
16:15:49 s/side of time/side
16:16:58 RRSAgent, draft minutes
16:17:00 I have made the request to generate https://www.w3.org/2025/03/13-webmachinelearning-minutes.html anssik