14:55:25 RRSAgent has joined #webmachinelearning
14:55:29 logging to https://www.w3.org/2025/02/27-webmachinelearning-irc
14:55:29 RRSAgent, make logs Public
14:55:30 please title this meeting ("meeting: ..."), anssik
14:55:30 Meeting: WebML WG Teleconference – 27 February 2025
14:55:34 Chair: Anssi
14:55:39 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-02-27-wg-agenda.md
14:55:43 Scribe: Anssi
14:55:47 scribeNick: anssik
14:55:55 gb, this is webmachinelearning/webnn
14:55:58 anssik, OK.
14:56:03 Present+ Anssi_Kostiainen
14:56:07 Regrets+ Mike_Wyrzykowski
14:56:46 lgombos has joined #webmachinelearning
14:57:36 Present+ Rafael_Cintron
14:57:45 RafaelCintron has joined #webmachinelearning
14:57:46 Present+ Laszlo_Gombos
14:58:11 RRSAgent, draft minutes
14:58:12 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
14:58:55 jsbell has joined #webmachinelearning
14:59:04 Present+ Joshua_Bell
15:00:00 ningxin has joined #webmachinelearning
15:00:13 Present+ Etienne_Noel
15:00:31 Present+ Ningxin_Hu
15:00:48 Joshua_Lochner has joined #webmachinelearning
15:01:02 Present+ Joshua_Lochner
15:01:13 Present+ Christian_Liebel
15:01:21 Present+ Dwayne_Robinson
15:01:56 zkis has joined #webmachinelearning
15:01:57 RRSAgent, draft minutes
15:01:58 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
15:02:07 Present+ Zoltan_Kis
15:02:09 Present+ Laszlo_Gombos
15:02:28 anssik: please welcome Brad Triebwasser from Google to the WebML WG and CG!
15:02:45 Topic: Announcements
15:02:52 Subtopic: WebML Working Group Charter Advisory Committee review open until 2025-03-05/06
15:03:00 anssik: WebML WG Charter AC review open until 2025-03-05/06, please reach out to your AC rep and ask them to vote
15:03:04 present+
15:03:06 -> Locate your AC rep (W3C Member-only link) https://www.w3.org/Member/ACList
15:03:13 -> Voting instructions for AC reps (W3C Member-only link) https://lists.w3.org/Archives/Member/w3c-ac-members/2025JanMar/0029.html
15:03:28 anssik: I can help establish connections to your AC reps
15:03:37 Subtopic: Authentic Web workshop
15:04:00 anssik: W3C is hosting a mini-workshop series to review proposals to combat misinformation on the web
15:04:11 ... this is a follow-up to two TPAC 2024 breakouts that discussed the Originator Profile and Content Authenticity proposals, these proposals are linked from the invitation:
15:04:20 -> Invitation to Authentic Web mini-workshop series - session 1, 12 March 2025
15:04:20 https://lists.w3.org/Archives/Public/public-webmachinelearning-wg/2025Feb/0004.html
15:04:26 anssik: this 1-hour workshop is open to all, including the public, so feel free to pass the link to your friends who may be interested
15:04:41 McCool has joined #webmachinelearning
15:04:52 Topic: Incubations summary
15:05:07 anssik: I asked Etienne to share a summary of the CG's 26 Feb telcon, thanks Etinne for taking notes
15:05:22 ... since the CG meeting time is not very EU-friendly, many EU-based folks, myself included, will happily catch up with the discussions on this call
15:05:33 ... Tarek from Mozilla shared he was on vacation so couldn't join this time
15:05:39 s/Etinne/Etienne
15:05:46 -> WebML CG Teleconference – 26 February 2025 - 00:00-01:00 UTC https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-02-26-cg-agenda.md
15:06:12 Etienne: a summary of the current API proposals in the CG was discussed for the Prompt API, Translator & Language Detector API, and Writing Assistance APIs
15:06:23 ... Microsoft has contributed to structure output for the Prompt API
15:06:36 ... Chrome believe that could make it feasible for the open web
15:06:44 s/believe/believes
15:06:53 s/structure output/structured output
15:07:32 Etienne: feedback and concerns about the ai.* pattern, conflicts with minimizers
15:08:57 ... the OT saw 5x more sign-ups than usual, overall great feedback
15:09:33 ... the Firefox AI Runtime and model caching garnered interest
15:10:07 ... sharing models across origins, Kenji has received some feedback on this feature
15:11:25 ... will use GH issues for topic tracking in future meetings
15:11:27 q?
15:12:24 Christian: happy to see the minutes and/or summary
15:13:15 q?
15:13:39 Topic: Query mechanism for supported devices
15:14:01 anssik: after our last meeting, we decided to remove the MLDeviceType (in PR #809) as the first phase of our device selection solution
15:14:02 https://github.com/webmachinelearning/webnn/pull/809 -> MERGED Pull Request 809 Remove MLDeviceType (by zolkis) [device selection]
15:14:10 ... now, as the next phase, I'd like to continue discussing the query mechanism
15:14:21 ... thanks everyone for your contributions in the issue!
15:14:56 ... the approach I'd like to try here is to document real-world use cases first, then assess implementability, and only as the last step go deep into the solution space
15:15:16 ... Markus and Fredrik shared a real-time video processing use case, quoting:
15:15:24 ... 1. If the user selects to use functionality like background blur, we want to offer the best quality the device can offer. So the product has a small set of candidate models and technologies (WebNN, WebGPU, WASM) that it has to choose between. Accelerated technologies come with allowance for beefier models.
15:15:24 ... 2. The model/tech chooser algorithm needs to be fast, and we need to avoid spending seconds or even hundreds of milliseconds to figure out if a given model should be able to run accelerated. So for example downloading the entirety (could be large things..), compiling & try-running a model seems infeasible.
15:16:31 -> https://github.com/webmachinelearning/webnn/issues/815#issuecomment-2658627753
15:16:32 https://github.com/webmachinelearning/webnn/issues/815 -> Issue 815 Query mechanism for supported devices (by anssiko) [device selection]
15:16:39 anssik: the explainer has been updated with this use case, more use cases welcome!
15:16:48 ... I derived the following requirements from this use case:
15:17:01 ... - query mechanism should be available before model download
15:17:28 ... - query mechanism should signal explicitly if the context is "accelerated" for the given model to allow developer-defined fallback to other technologies (e.g. WebGPU, Wasm)
15:17:48 ... - query mechanism must signal explicitly if it downgrades the requested "accelerated" context to a "non-accelerated" context
15:18:20 anssik: the position from Mike/WebKit is clear:
15:18:35 ... - query mechanism should be based on capabilities, not on device names such as "gpu", "npu"
15:18:48 anssik: comments from Dwayne on data types:
15:19:09 ... - Model data type (like float16) can make a big perf difference, and so knowing before model download is useful.
15:19:14 ... -- float32: runs on both GPU and CPU, but not on many NPUs
15:19:31 ... -- float16: works on NPU. On GPU it offers a ~2x boost over float32, but it is much slower on CPU because CPUs lack dedicated float16 hardware.
15:19:40 ... -- int8: works well on CPU and NPU, but then GPUs suffer.
15:19:58 fr has joined #webmachinelearning
15:20:11 anssik: known implementation constraints:
15:20:39 ... - Reilly notes frameworks/backends don't expose enough information to determine ahead of time whether a given operation will be optimized or emulated
15:21:07 ... - Ningxin notes that even if an op is emulated by the browser, the underlying framework may still be able to fuse the decomposed small ops into an optimized one, which we can't tell either
15:21:17 ... - Ningxin surveyed backends:
15:21:25 ... -- DirectML is always an "accelerated" context
15:21:41 ... -- TFLite can check if the context is "accelerated" (fully delegated in TFLite terms)
15:21:51 ... -- Core ML always has a CPU fallback
15:22:07 ... -- ONNX Runtime allows disabling fallback to CPU
15:22:48 anssik: Reilly noted that from an interoperability perspective, the CPU fallback is considered positive
15:22:56 -> Resolving tension between interoperability and implementability https://www.w3.org/TR/design-principles/#implementability
15:23:11 anssik: other considerations:
15:23:17 ... what is actually "accelerated"?
15:23:46 ... Reilly said "I might call a CPU inference engine using XNNPACK "accelerated" since it is using hand-optimized assembly routines tuned for a handful of CPU architectures rather than a naive implementation in C++"
15:24:06 anssik: I'd like to open the discussion, use cases and implementability considerations first
15:24:11 ... solutions only second
15:24:22 q?
15:25:46 Zoltan: the summary was good, I'm wondering, since we have conflicting requirements and on some platforms cannot avoid fallback, whether we want to go with introspection
15:26:42 ... start with a generic introspection mechanism and add capabilities that can be implemented?
15:27:05 jsbell: not sure what the next steps are right now
15:27:44 ... an introspection API seems quite hard: how to define "accelerated", and per Ningxin's comments, even if the browser implementation thinks something is accelerated it may not be deep inside, or vice versa
15:28:06 Zoltan: can we try adding an "avoid CPU fallback" option?
15:29:05 jsbell: exploring further hints might work, have discussed with Reilly that if the web app was using a compute unit, e.g. GPU, add a mechanism to be able to say "please don't use this compute unit"
15:29:34 q+
15:29:34 ... this does not satisfy the "do not download the model" requirement
15:30:57 dwayner has joined #webmachinelearning
15:31:45 q?
15:32:48 q?
15:32:50 ack RafaelCintron
15:33:15 RafaelCintron: an interesting point that was brought up is that we should allow the developer to specify "avoid GPU"
15:33:34 ... in the spec we can give a WebGPU context, people wanted that to be a hint "do use GPU" instead
15:34:18 ... as for downloading a model being a requirement or not, I wonder if people are happy if we don't download weight, just the graph topology
15:34:19 q?
15:35:40 RafaelCintron: depending on the ORT EP we could possibly download the graph without weights
15:36:16 ... Reilly's decision tree in the issue is a nice example
15:36:31 ... it is important to tell "accelerated" or "not accelerated"
15:37:13 ... agree with Mike that it is dangerous to make assumptions about compute unit characteristics or performance based on names such as CPU and NPU
15:39:42 RafaelCintron: we should talk with Markus more to understand the exact models that he wants to run and see if we can design an API for the frameworks underneath, Google Meet is a great use case to validate the API design against
15:40:56 q?
15:40:56 q+
15:41:05 ack ningxin
15:41:58 ningxin: my comment is about the "avoid CPU" or "CPU only" option, Anssi mentioned a multi-stage query mechanism, probably we can allow an application to know if the context is CPU-only before download
15:42:09 ... e.g. TFLite is CPU-only currently without delegates
15:42:26 ... in such a case a web app might opt in to use an alternative API such as WebGPU
15:43:36 ... in the second phase, the developer may want to specify a hint to avoid CPU fallback; with TFLite, DML or ORT we can disable CPU fallback, with the model topology known, and if this causes an error the developer can know
15:43:48 q?
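[Editor's illustration] The fallback flow discussed above — check, before any model download, whether an "accelerated" context with the needed data type is available, otherwise fall back to WebGPU or Wasm with a smaller model — can be sketched in plain JavaScript. Note this is a hypothetical sketch: no such query mechanism exists in WebNN today, and the `caps` object (its `techs`, `dataTypes`, and `accelerated` fields) stands in for whatever shape the query API eventually takes.

```javascript
// Candidate models in preference order; accelerated technologies
// "come with allowance for beefier models" per the use case above.
const candidates = [
  { name: "blur-large-fp16", tech: "webnn", dataType: "float16", needsAccel: true },
  { name: "blur-medium-fp16", tech: "webgpu", dataType: "float16", needsAccel: true },
  { name: "blur-small-fp32", tech: "wasm", dataType: "float32", needsAccel: false },
];

// Pick the first candidate the (hypothetical) capability report supports.
// This runs before any weights are downloaded, satisfying the
// "query mechanism should be available before model download" requirement.
function pickModel(caps) {
  for (const c of candidates) {
    const techOk = caps.techs.includes(c.tech);
    const typeOk = caps.dataTypes.includes(c.dataType);
    const accelOk = !c.needsAccel || caps.accelerated;
    if (techOk && typeOk && accelOk) return c.name;
  }
  return null; // no candidate runs; app could disable the feature
}

// Device reporting an accelerated context with float16 support:
console.log(pickModel({ techs: ["webnn", "webgpu", "wasm"], dataTypes: ["float16", "float32"], accelerated: true }));
// CPU-only device: only the small Wasm model qualifies:
console.log(pickModel({ techs: ["wasm"], dataTypes: ["float32"], accelerated: false }));
```

The key design point from the discussion is that the decision must be cheap: a table lookup over a capability report, not compiling and try-running each candidate model.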
15:46:10 Dwayne: in the ONNX format you can split the weight out, but cannot easily download only the topology
15:46:45 ... could download the protobuf, not sure how to do that with existing APIs
15:46:48 I have been able to load the graph topology without downloading the entire model. Unless you're referring to internally?
15:47:50 Joshua_Lochner: I was investigating this recently, download all ONNX weight on Hub and see what ops are supported, for debugging purposes using Netron
15:48:13 ... is this something internal to the runtime, that you don't support external data formats?
15:49:53 Dwayne: you can do that, but using existing libraries such as ORT Web it is not possible to download the model without weight
15:50:11 yes, exactly
15:51:05 Topic: Operator set Wave 3
15:51:14 anssik: PR #809
15:51:14 https://github.com/webmachinelearning/webnn/pull/809 -> MERGED Pull Request 809 Remove MLDeviceType (by zolkis) [device selection]
15:51:18 ... huge thanks to Dwayne for all the updates to the PR!
15:51:25 ... and Ningxin and Josh for your review suggestions
15:51:29 ... the PR is now marked as "ready for review", everyone PTAL!
15:51:45 ... recent updates include:
15:51:56 ... - GatherND example updates
15:52:00 ... - gatherND and scatterND algorithm steps
15:52:04 ... - Update slice algorithm steps for strides
15:52:10 ... - Update blockwise broadcasting to return true/false
15:52:20 ... Dwayne, feel free to share what folks should look at in particular in the final review?
15:52:34 PR #805
15:52:35 https://github.com/webmachinelearning/webnn/pull/805 -> Pull Request 805 Operator set wave 3 (by fdwr)
15:53:02 Dwayne: thanks Ningxin and Joshua for all the feedback!
15:53:17 q?
15:53:44 anssik: I'm planning to initiate the TAG review soon and this op set Wave 3 is one important piece of that review scope
15:53:54 ... I support the ideas of splitting u/int4 into a separate PR, we could initiate the TAG review without blocking on it
15:54:06 q?
15:54:30 Dwayne: could move forward with TAG review without u/int4
15:55:00 Bikeshed is reporting errors on latest changes https://github.com/webmachinelearning/webnn/actions/runs/13563543433/job/37911433004?pr=805 I'll review over next few days
15:55:08 q?
15:55:24 I'll take another look, thanks so much!
15:55:38 Topic: Rounding operators
15:55:43 anssik: issue #817
15:55:44 https://github.com/webmachinelearning/webnn/issues/817 -> Issue 817 Rounding operators (by fdwr) [feature request]
15:55:54 ... this issue opened by Dwayne contains detailed research into known gaps in rounding functions
15:55:59 ... Dwayne explains:
15:56:05 "I lacked rounding functions in WebNN, particularly the default IEEE standard round-tied-halves-to-nearest-even. There's not a nice way to simulate this missing function, and it's a basic primitive that belongs in a library anyway. So I propose filling in some gaps"
15:56:33 anssik: there's a detailed rounding modes table with WebNN, IEEE, C++ equiv, JS equiv
15:56:39 ... library support, examples, emulation
15:56:46 ... based on this research a new roundEven() API is suggested
15:56:52 partial interface MLGraphBuilder {
15:56:52   MLOperand roundEven(MLOperand input, optional MLOperatorOptions options = {});
15:56:52 };
15:56:52
15:56:52 partial dictionary MLOpSupportLimits {
15:56:52   MLSingleInputSupportLimits roundEven;
15:56:52 };
15:57:16 anssik: I'd like to see others review the proposal and if no concerns this could be turned into a PR
15:57:25 ... thank you Dwayne for documenting and educating us on the rounding function details!
15:58:13 Dwayne: TLDR: add one function that is consistent with IEEE rounding mode, it is important to express decomposition for dequantize operator
15:58:13 q?
15:58:40 q?
15:59:11 q+
15:59:12 ack ningxin
15:59:38 ningxin: did you survey three major backends for support?
15:59:43 Dwayne: yes, there's a list of backends in the table
16:00:20 q?
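[Editor's illustration] The gap behind the roundEven() proposal — JavaScript's Math.round rounds ties toward positive infinity, while the IEEE 754 default rounds ties to the nearest even value — can be shown with a small scalar sketch. This is only an illustration of the rounding mode the proposed operator would apply element-wise, not an implementation of the WebNN API.

```javascript
// Scalar round-ties-to-even ("banker's rounding"), the IEEE 754 default
// rounding mode discussed in issue #817. Math.round() is NOT equivalent:
// it rounds halfway cases up (toward +Infinity).
function roundEven(x) {
  const floor = Math.floor(x);
  const diff = x - floor;
  if (diff < 0.5) return floor;
  if (diff > 0.5) return floor + 1;
  // Exactly halfway: pick the even neighbor.
  return floor % 2 === 0 ? floor : floor + 1;
}

console.log(roundEven(2.5), Math.round(2.5));   // 2 3  <- modes disagree
console.log(roundEven(3.5), Math.round(3.5));   // 4 4
console.log(roundEven(-2.5), Math.round(-2.5)); // -2 -2
```

The disagreement on values like 2.5 is why the mode cannot be cleanly emulated with the existing WebNN primitives, per Dwayne's note that "there's not a nice way to simulate this missing function".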
16:01:01 RRSAgent, draft minutes
16:01:02 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
16:11:36 s/download weight/download weights
16:13:05 s/weight out/weights out
16:13:44 s/ONNX weight/ONNX weights
16:14:10 s/without weight/without weights
16:17:18 s/(in PR #809)//
16:17:25 RRSAgent, draft minutes
16:17:26 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
16:18:16 s/PR #809/PR #805
16:18:19 RRSAgent, draft minutes
16:18:20 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
16:19:02 s/ideas/idea
16:21:49 s/dequantize/quantizeLinear/
16:22:02 RRSAgent, draft minutes
16:22:03 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
16:23:22 Present+ Thomas_Steiner
16:23:36 RRSAgent, draft minutes
16:23:38 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik