14:55:25 RRSAgent has joined #webmachinelearning
14:55:29 logging to https://www.w3.org/2025/02/27-webmachinelearning-irc
14:55:29 RRSAgent, make logs Public
14:55:30 please title this meeting ("meeting: ..."), anssik
14:55:30 Meeting: WebML WG Teleconference – 27 February 2025
14:55:34 Chair: Anssi
14:55:39 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-02-27-wg-agenda.md
14:55:43 Scribe: Anssi
14:55:47 scribeNick: anssik
14:55:55 gb, this is webmachinelearning/webnn
14:55:58 anssik, OK.
14:56:03 Present+ Anssi_Kostiainen
14:56:07 Regrets+ Mike_Wyrzykowski
14:56:46 lgombos has joined #webmachinelearning
14:57:36 Present+ Rafael_Cintron
14:57:45 RafaelCintron has joined #webmachinelearning
14:57:46 Present+ Laszlo_Gombos
14:58:11 RRSAgent, draft minutes
14:58:12 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
14:58:55 jsbell has joined #webmachinelearning
14:59:04 Present+ Joshua_Bell
15:00:00 ningxin has joined #webmachinelearning
15:00:13 Present+ Etienne_Noel
15:00:31 Present+ Ningxin_Hu
15:00:48 Joshua_Lochner has joined #webmachinelearning
15:01:02 Present+ Joshua_Lochner
15:01:13 Present+ Christian_Liebel
15:01:21 Present+ Dwayne_Robinson
15:01:56 zkis has joined #webmachinelearning
15:01:57 RRSAgent, draft minutes
15:01:58 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
15:02:07 Present+ Zoltan_Kis
15:02:09 Present+ Laszlo_Gombos
15:02:28 anssik: please welcome Brad Triebwasser from Google to the WebML WG and CG!
15:02:45 Topic: Announcements
15:02:52 Subtopic: WebML Working Group Charter Advisory Committee review open until 2025-03-05/06
15:03:00 anssik: WebML WG Charter AC review open until 2025-03-05/06, please reach out to your AC rep and ask them to vote
15:03:04 present+
15:03:06 -> Locate your AC rep (W3C Member-only link) https://www.w3.org/Member/ACList
15:03:13 -> Voting instructions for AC reps (W3C Member-only link) https://lists.w3.org/Archives/Member/w3c-ac-members/2025JanMar/0029.html
15:03:28 anssik: I can help establish connections to your AC reps
15:03:37 Subtopic: Authentic Web workshop
15:04:00 anssik: W3C is hosting a mini-workshop series to review proposals to combat misinformation on the web
15:04:11 ... this is a follow-up to two TPAC 2024 breakouts that discussed the Originator Profile and Content Authenticity proposals, these proposals are linked from the invitation:
15:04:20 -> Invitation to Authentic Web mini-workshop series - session 1, 12 March 2025
15:04:20 https://lists.w3.org/Archives/Public/public-webmachinelearning-wg/2025Feb/0004.html
15:04:26 anssik: this 1-hour workshop is open to all, including the public, so feel free to pass the link to your friends who may be interested
15:04:41 McCool has joined #webmachinelearning
15:04:52 Topic: Incubations summary
15:05:07 anssik: I asked Etienne to share a summary of the CG's 26 Feb telcon, thanks Etinne for taking notes
15:05:22 ... since the CG meeting time is not very EU-friendly, many EU-based folks, myself included, will happily catch up with the discussions on this call
15:05:33 ... Tarek from Mozilla shared he was on vacation so couldn't join this time
15:05:39 s/Etinne/Etienne
15:05:46 -> WebML CG Teleconference – 26 February 2025 - 00:00-01:00 UTC https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-02-26-cg-agenda.md
15:06:12 Etienne: a summary of the current API proposals in the CG was discussed for the Prompt API, Translator & Language Detector API, and Writing Assistance APIs
15:06:23 ... Microsoft has contributed to structure output for the Prompt API
15:06:36 ... Chrome believe that could make it feasible for the open web
15:06:44 s/believe/believes
15:06:53 s/structure output/structured output
15:07:32 Etienne: feedback and concerns about the ai.* pattern, conflicts with minimizers
15:08:57 ... the OT saw 5x more sign-ups than usual, overall great feedback
15:09:33 ... the Firefox AI Runtime and model caching garnered interest
15:10:07 ... sharing models across origins, Kenji has received some feedback on this feature
15:11:25 ... will use GH issues for topic tracking in future meetings
15:11:27 q?
15:12:24 Christian: happy to see the minutes and/or summary
15:13:15 q?
15:13:39 Topic: Query mechanism for supported devices
15:14:01 anssik: after our last meeting, we decided to remove the MLDeviceType (in PR #809) as the first phase of our device selection solution
15:14:02 https://github.com/webmachinelearning/webnn/pull/809 -> MERGED Pull Request 809 Remove MLDeviceType (by zolkis) [device selection]
15:14:10 ... now, as the next phase, I'd like to continue discussing the query mechanism
15:14:21 ... thanks everyone for your contributions in the issue!
15:14:56 ... the approach I'd like to try here is to document real-world use cases first, then assess implementability, and only as the last step go deep into the solution space
15:15:16 ... Markus and Fredrik shared a real-time video processing use case, quoting:
15:15:24 ... 1. If the user selects to use functionality like background blur, we want to offer the best quality the device can offer. So the product has a small set of candidate models and technologies (WebNN, WebGPU, WASM) that it has to choose between. Accelerated technologies come with allowance for beefier models.
15:15:24 ... 2. The model/tech chooser algorithm needs to be fast, and we need to avoid spending seconds or even hundreds of milliseconds to figure out if a given model should be able to run accelerated. So for example downloading the entirety (could be large things..), compiling & try-running a model seems infeasible.
15:16:31 -> https://github.com/webmachinelearning/webnn/issues/815#issuecomment-2658627753
15:16:32 https://github.com/webmachinelearning/webnn/issues/815 -> Issue 815 Query mechanism for supported devices (by anssiko) [device selection]
15:16:39 anssik: the explainer has been updated with this use case, more use cases welcome!
15:16:48 ... I derived the following requirements from this use case:
15:17:01 ... - query mechanism should be available before model download
15:17:28 ... - query mechanism should signal explicitly if the context is "accelerated" for the given model to allow developer-defined fallback to other technologies (e.g. WebGPU, Wasm)
15:17:48 ... - query mechanism must signal explicitly if it downgrades the requested "accelerated" context to a "non-accelerated" context
15:18:20 anssik: the position from Mike/WebKit is clear:
15:18:35 ... - query mechanism should be based on capabilities, not on device names such as "gpu", "npu"
15:18:48 anssik: comments from Dwayne on data types:
15:19:09 ... - Model data type (like float16) can make a big perf difference, and so knowing before model download is useful.
15:19:14 ... -- float32: runs on both GPU and CPU, but not on many NPUs
15:19:31 ... -- float16: works on NPU. On GPU it offers a ~2x boost over float32, but it is much slower on CPU because CPUs lack dedicated float16 hardware.
15:19:40 ... -- int8: works well on CPU and NPU, but then GPUs suffer.
15:19:58 fr has joined #webmachinelearning
15:20:11 anssik: known implementation constraints:
15:20:39 ... - Reilly notes frameworks/backends don't expose enough information to determine ahead of time whether a given operation will be optimized or emulated
15:21:07 ... - Ningxin notes that even if an op is emulated by the browser, the underlying framework may still be able to fuse the decomposed small ops into an optimized one, which we can't tell either
15:21:17 ... - Ningxin surveyed backends:
15:21:25 ... -- DirectML is always an "accelerated" context
15:21:41 ... -- TFLite can check if the context is "accelerated" (fully delegated in TFLite terms)
15:21:51 ... -- Core ML always has a CPU fallback
15:22:07 ... -- ONNX Runtime allows disabling fallback to CPU
15:22:48 anssik: Reilly noted that from an interoperability perspective, the CPU fallback is considered positive
15:22:56 -> Resolving tension between interoperability and implementability https://www.w3.org/TR/design-principles/#implementability
15:23:11 anssik: other considerations:
15:23:17 ... what is actually "accelerated"?
15:23:46 ... Reilly said "I might call a CPU inference engine using XNNPACK "accelerated" since it is using hand-optimized assembly routines tuned for a handful of CPU architectures rather than a naive implementation in C++"
15:24:06 anssik: I'd like to open the discussion, use cases and implementability considerations first
15:24:11 ... solutions only second
15:24:22 q?
15:25:46 Zoltan: the summary was good, I'm wondering, since we have conflicting requirements and on some platforms cannot avoid fallback, whether we want to go with introspection
15:26:42 ... start with a generic introspection mechanism and add capabilities that can be implemented?
15:27:05 jsbell: not sure what the next steps are right now
15:27:44 ... an introspection API seems quite hard: how to define "accelerated", and per Ningxin's comments, even if the browser implementation thinks something is accelerated it may not be deep inside, or vice versa
15:28:06 Zoltan: can we try adding an "avoid CPU fallback" option?
15:29:05 jsbell: exploring further hints might work, have discussed with Reilly that if the web app was using a compute unit, e.g. GPU, add a mechanism to be able to say "please don't use this compute unit"
15:29:34 q+
15:29:34 ... this does not satisfy the "do not download the model" requirement
15:30:57 dwayner has joined #webmachinelearning
15:31:45 q?
15:32:48 q?
15:32:50 ack RafaelCintron
15:33:15 RafaelCintron: an interesting point that was brought up is that we should allow the developer to specify "avoid GPU"
15:33:34 ... in the spec we can give a WebGPU context, people wanted that to be a hint "do use GPU" instead
15:34:18 ... as for downloading a model being a requirement or not, I wonder if people are happy if we don't download weight, just the graph topology
15:34:19 q?
15:35:40 RafaelCintron: depending on the ORT EP we could possibly download the graph without weights
15:36:16 ... Reilly's decision tree in the issue is a nice example
15:36:31 ... it is important to tell "accelerated" or "not accelerated"
15:37:13 ... agree with Mike that it is dangerous to make assumptions about compute unit characteristics or performance based on names such as CPU and NPU
15:39:42 RafaelCintron: we should talk with Markus more to understand the exact models that he wants to run and see if we can design an API for the frameworks underneath, Google Meet is a great use case to validate the API design against
15:40:56 q?
15:40:56 q+
15:41:05 ack ningxin
15:41:58 ningxin: my comment is about the "avoid CPU" or "CPU only" option, Anssi mentioned a multi-stage query mechanism, probably we can allow an application to know if the context is CPU-only before download
15:42:09 ... e.g. TFLite is CPU-only currently without delegates
15:42:26 ... in such a case a web app might opt in to use an alternative API such as WebGPU
15:43:36 ... in the second phase, the developer may want to specify a hint to avoid CPU fallback; with TFLite, DML or ORT we can disable CPU fallback, with the model topology known, and if this causes an error the developer can know
15:43:48 q?
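[Editor's illustration] The fallback flow discussed above — check, before any model download, whether an "accelerated" context with the needed data type is available, otherwise fall back to WebGPU or Wasm with a smaller model — can be sketched in plain JavaScript. Note this is a hypothetical sketch: no such query mechanism exists in WebNN today, and the `caps` object (its `techs`, `dataTypes`, and `accelerated` fields) stands in for whatever shape the query API eventually takes.

```javascript
// Candidate models in preference order; accelerated technologies
// "come with allowance for beefier models" per the use case above.
const candidates = [
  { name: "blur-large-fp16", tech: "webnn", dataType: "float16", needsAccel: true },
  { name: "blur-medium-fp16", tech: "webgpu", dataType: "float16", needsAccel: true },
  { name: "blur-small-fp32", tech: "wasm", dataType: "float32", needsAccel: false },
];

// Pick the first candidate the (hypothetical) capability report supports.
// This runs before any weights are downloaded, satisfying the
// "query mechanism should be available before model download" requirement.
function pickModel(caps) {
  for (const c of candidates) {
    const techOk = caps.techs.includes(c.tech);
    const typeOk = caps.dataTypes.includes(c.dataType);
    const accelOk = !c.needsAccel || caps.accelerated;
    if (techOk && typeOk && accelOk) return c.name;
  }
  return null; // no candidate runs; app could disable the feature
}

// Device reporting an accelerated context with float16 support:
console.log(pickModel({ techs: ["webnn", "webgpu", "wasm"], dataTypes: ["float16", "float32"], accelerated: true }));
// CPU-only device: only the small Wasm model qualifies:
console.log(pickModel({ techs: ["wasm"], dataTypes: ["float32"], accelerated: false }));
```

The key design point from the discussion is that the decision must be cheap: a table lookup over a capability report, not compiling and try-running each candidate model.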
15:46:10 Dwayne: in the ONNX format you can split the weight out, but cannot easily download only the topology
15:46:45 ... could download the protobuf, not sure how to do that with existing APIs
15:46:48 I have been able to load the graph topology without downloading the entire model. Unless you're referring to internally?
15:47:50 Joshua_Lochner: I was investigating this recently, download all ONNX weight on Hub and see what ops are supported, for debugging purposes using Netron
15:48:13 ... is this something internal to the runtime, that you don't support external data formats?
15:49:53 Dwayne: you can do that, but using existing libraries such as ORT Web it is not possible to download the model without weight
15:50:11 yes, exactly
15:51:05 Topic: Operator set Wave 3
15:51:14 anssik: PR #809
15:51:14 https://github.com/webmachinelearning/webnn/pull/809 -> MERGED Pull Request 809 Remove MLDeviceType (by zolkis) [device selection]
15:51:18 ... huge thanks to Dwayne for all the updates to the PR!
15:51:25 ... and Ningxin and Josh for your review suggestions
15:51:29 ... the PR is now marked as "ready for review", everyone PTAL!
15:51:45 ... recent updates include:
15:51:56 ... - GatherND example updates
15:52:00 ... - gatherND and scatterND algorithm steps
15:52:04 ... - Update slice algorithm steps for strides
15:52:10 ... - Update blockwise broadcasting to return true/false
15:52:20 ... Dwayne, feel free to share what folks should look at in particular in the final review?
15:52:34 PR #805
15:52:35 https://github.com/webmachinelearning/webnn/pull/805 -> Pull Request 805 Operator set wave 3 (by fdwr)
15:53:02 Dwayne: thanks Ningxin and Joshua for all the feedback!
15:53:17 q?
15:53:44 anssik: I'm planning to initiate the TAG review soon and this op set Wave 3 is one important piece of that review scope
15:53:54 ... I support the ideas of splitting u/int4 into a separate PR, we could initiate the TAG review without blocking on it
15:54:06 q?
15:54:30 Dwayne: could move forward with TAG review without u/int4
15:55:00 Bikeshed is reporting errors on latest changes https://github.com/webmachinelearning/webnn/actions/runs/13563543433/job/37911433004?pr=805 I'll review over next few days
15:55:08 q?
15:55:24 I'll take another look, thanks so much!
15:55:38 Topic: Rounding operators
15:55:43 anssik: issue #817
15:55:44 https://github.com/webmachinelearning/webnn/issues/817 -> Issue 817 Rounding operators (by fdwr) [feature request]
15:55:54 ... this issue opened by Dwayne contains detailed research into known gaps in rounding functions
15:55:59 ... Dwayne explains:
15:56:05 "I lacked rounding functions in WebNN, particularly the default IEEE standard round-tied-halves-to-nearest-even. There's not a nice way to simulate this missing function, and it's a basic primitive that belongs in a library anyway. So I propose filling in some gaps"
15:56:33 anssik: there's a detailed rounding modes table with WebNN, IEEE, C++ equiv, JS equiv
15:56:39 ... library support, examples, emulation
15:56:46 ... based on this research a new roundEven() API is suggested
15:56:52 partial interface MLGraphBuilder {
15:56:52   MLOperand roundEven(MLOperand input, optional MLOperatorOptions options = {});
15:56:52 };
15:56:52
15:56:52 partial dictionary MLOpSupportLimits {
15:56:52   MLSingleInputSupportLimits roundEven;
15:56:52 };
15:57:16 anssik: I'd like to see others review the proposal and if no concerns this could be turned into a PR
15:57:25 ... thank you Dwayne for documenting and educating us on the rounding function details!
15:58:13 Dwayne: TLDR: add one function that is consistent with IEEE rounding mode, it is important to express decomposition for dequantize operator
15:58:13 q?
15:58:40 q?
15:59:11 q+
15:59:12 ack ningxin
15:59:38 ningxin: did you survey three major backends for support?
15:59:43 Dwayne: yes, there's a list of backends in the table
16:00:20 q?
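[Editor's illustration] The gap behind the roundEven() proposal — JavaScript's Math.round rounds ties toward positive infinity, while the IEEE 754 default rounds ties to the nearest even value — can be shown with a small scalar sketch. This is only an illustration of the rounding mode the proposed operator would apply element-wise, not an implementation of the WebNN API.

```javascript
// Scalar round-ties-to-even ("banker's rounding"), the IEEE 754 default
// rounding mode discussed in issue #817. Math.round() is NOT equivalent:
// it rounds halfway cases up (toward +Infinity).
function roundEven(x) {
  const floor = Math.floor(x);
  const diff = x - floor;
  if (diff < 0.5) return floor;
  if (diff > 0.5) return floor + 1;
  // Exactly halfway: pick the even neighbor.
  return floor % 2 === 0 ? floor : floor + 1;
}

console.log(roundEven(2.5), Math.round(2.5));   // 2 3  <- modes disagree
console.log(roundEven(3.5), Math.round(3.5));   // 4 4
console.log(roundEven(-2.5), Math.round(-2.5)); // -2 -2
```

The disagreement on values like 2.5 is why the mode cannot be cleanly emulated with the existing WebNN primitives, per Dwayne's note that "there's not a nice way to simulate this missing function".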
16:01:01 RRSAgent, draft minutes
16:01:02 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
16:11:36 s/download weight/download weights
16:13:05 s/weight out/weights out
16:13:44 s/ONNX weight/ONNX weights
16:14:10 s/without weight/without weights
16:17:18 s/(in PR #809)//
16:17:25 RRSAgent, draft minutes
16:17:26 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
16:18:16 s/PR #809/PR #805
16:18:19 RRSAgent, draft minutes
16:18:20 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
16:19:02 s/ideas/idea
16:21:49 s/dequantize/quantizeLinear/
16:22:02 RRSAgent, draft minutes
16:22:03 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik
16:23:22 Present+ Thomas_Steiner
16:23:36 RRSAgent, draft minutes
16:23:38 I have made the request to generate https://www.w3.org/2025/02/27-webmachinelearning-minutes.html anssik