14:53:56 RRSAgent has joined #webmachinelearning
14:54:00 logging to https://www.w3.org/2025/12/04-webmachinelearning-irc
14:54:00 inviting RRSAgent
14:54:00 RRSAgent, make logs Public
14:54:01 please title this meeting ("meeting: ..."), anssik
14:54:02 Zakim, prepare meeting
14:54:02 RRSAgent, make logs Public
14:54:03 please title this meeting ("meeting: ..."), anssik
14:54:05 Meeting: WebML WG Teleconference – 4 December 2025
14:54:10 Chair: Anssi
14:54:17 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-12-04-wg-agenda.md
14:54:21 Scribe: Anssi
14:54:26 scribeNick: anssik
14:54:35 gb, this is webmachinelearning/webnn
14:54:35 anssik, OK.
14:54:41 Present+ Anssi_Kostiainen
14:59:04 RRSAgent, draft minutes
14:59:05 I have made the request to generate https://www.w3.org/2025/12/04-webmachinelearning-minutes.html anssik
14:59:33 Present+ Davis_Shaver
15:00:14 Present+ Zoltan_Kis
15:00:17 davisshaver has joined #webmachinelearning
15:00:28 Present+ Markus_Handell
15:00:36 Joshua_Lochner has joined #webmachinelearning
15:00:54 Present+ Joshua_Lochner
15:01:15 Present+ Markus_Tavenrath
15:01:22 mtavenrath has joined #webmachinelearning
15:01:31 Present+ Dominique_Hazael-Massieux
15:01:44 Present+ Ehsan_Toreini
15:02:14 Present+ Ningxin_Hu
15:02:20 Present+ Rob_Kochman
15:02:32 Present+ Dwayne_Robinson
15:02:53 ningxin has joined #webmachinelearning
15:04:01 dom has joined #webmachinelearning
15:04:18 Anssi: we'll start by acknowledging the new participants who have joined the WG:
15:04:25 zkis has joined #webmachinelearning
15:04:40 ... Simon Wijckmans from cside (Client side development Inc)
15:04:49 ... Lynne Jiang, Ben Greenstein from Google
15:04:58 ... Chris Needham from BBC
15:05:05 ... JaEun Jemma Ku from University of Illinois
15:05:17 ... Pavan Yanamadala, Siddharth Mangesh, Sharanya Chandrasekaran, Noormina Abuthahir from PayPal
15:05:21 ... Dexter Yang from ByteDance
15:05:28 ... Zoltan Kis as an Invited Expert
15:05:36 ... on behalf of the entire group, welcome to all new participants!
15:05:56 DwayneR has joined #webmachinelearning
15:06:18 Topic: F2F recap
15:06:29 -> Archived F2F agenda https://github.com/webmachinelearning/meetings/tree/main/2025-11-10-kobe
15:06:40 -> Working Group minutes https://www.w3.org/2025/11/09-webmachinelearning-minutes.html
15:06:40 -> Community Group minutes https://www.w3.org/2025/11/11-webmachinelearning-minutes.html
15:06:49 Anssi: I was not planning to recap the official agenda, but rather to summarize the progress made outside the official meeting
15:06:59 ... some highlights for me were the following:
15:07:17 ... - we were able to raise awareness of our groups' inference and agentic work via breakouts and horizontal groups with the broader W3C community
15:07:24 ... - we presented WebML WG and CG work at the very popular AI Agents and The Web breakout
15:07:29 -> WebML WG/CG at AI Agents and The Web breakout https://anssiko.github.io/ai-and-web-tpac-2025/
15:07:43 Anssi: - together with Reilly, we presented WebNN at the Security IG F2F meeting on Friday
15:07:47 -> https://github.com/w3c/securityig/blob/main/meetings/2025/2025-11-14_agenda.md
15:08:03 Anssi: - Mozilla revised its WebNN position to "support" and Tarek initiated implementation work, see #763
15:08:04 https://github.com/webmachinelearning/webnn/issues/763 -> Issue 763 Request standards positions from Mozilla and WebKit (by reillyeon) [process]
15:08:17 ... - WebKit reopened its WebNN standards position
15:08:26 Ehsan has joined #webmachinelearning
15:08:37 ... - Markus and the NVIDIA team extended their exploration into various WebNN implementation strategies and optimizations, discussed during the week on the hallway track
15:09:07 ... given that broader implementer interest is now ramping up fast, I propose we use W3C's Slack #webmachinelearning for synchronous implementation-related discussions across implementers and continue to use IRC for these bi-weekly meetings
15:09:14 ... Slack has certain benefits over IRC for this type of long-running discussion, e.g. message persistence, so I think this separation of concerns works here
15:09:24 ... Tarek already started discussions about his Rust implementation on Slack and Markus chimed in, thanks!
15:09:29 ... please join the W3C Slack #webmachinelearning to exchange ideas across implementers interested in WebNN
15:09:32 -> How to join W3C Slack https://www.w3.org/wiki/Slack
15:09:35 RobKochman has joined #webmachinelearning
15:10:46 q?
15:11:16 MarkusT: I'm looking forward to Tarek's work
15:11:33 Topic: W3C Web & AI Interest Group launched
15:11:50 Anssi: the Web & AI Interest Group is a forum to discuss the ethical, societal, and technical implications of AI-related technologies; Ethical Principles for Web Machine Learning has been established as a joint deliverable
15:11:58 -> W3C Web & AI Interest Group Charter https://www.w3.org/2025/10/webai-ig-charter.html
15:12:19 ... I had an exchange with Fabien Gandon who co-chairs the IG
15:12:35 ... unfortunately Fabien couldn't join us today, but I'm conveying his welcome and inviting anyone interested in the ethical, societal, and technical implications of AI-related technologies to join the IG
15:12:39 ... we will develop our Ethical Principles document together with this newly formed IG
15:12:44 ... to join, please follow the link in the charter document
15:13:17 Dom: thanks for the intro; the way to think about the IG is as a place for the broader picture, while this WebML WG does excellent deep work on technical specifications
15:14:34 ... the IG looks primarily at non-technical topics, higher-level considerations of how the AI & Web ecosystems can evolve harmoniously
15:14:35 q?
15:15:03 Anssi: what is the IG's work mode?
15:15:14 Dom: GH-driven, with some meetings planned
15:16:32 Anssi: is there flexibility in terms of non-normative deliverables the IG could work on?
15:17:03 Dom: yes, new non-normative deliverables would be welcome; the W3C team contact is working on a roadmap for the IG
15:17:05 q?
15:17:49 Topic: Core operator set
15:18:09 Subtopic: Expand the expand operator to support blockwise broadcasting
15:18:11 Anssi: issue #903
15:18:12 https://github.com/webmachinelearning/webnn/issues/903 -> Issue 903 Expand the expand operator to support blockwise broadcasting (by fdwr) [opset]
15:18:32 ... this is one of the sub-issues spun off from the core op set meta issue
15:20:35 Dwayne: A) the preferred proposal is to:
15:20:40 ... - move the blockwise broadcasting aspect into expand
15:20:45 ... - leave the rest of the decomposition as the respective mul/div/sub/add for Q and DQ
15:20:50 -> expand() https://webmachinelearning.github.io/webnn/#dom-mlopsupportlimits-expand
15:20:56 Dwayne: B) the alternative considered:
15:21:00 ... - extend resample with nearest neighbor to support multiple axes
15:21:04 -> resample() https://webmachinelearning.github.io/webnn/#api-mlgraphbuilder-resample2d-method
15:21:22 Anssi: the preferred proposal is motivated by better conceptual alignment
15:21:54 Anssi: any questions or concerns with the preferred proposal?
15:22:34 Rob: need Reilly's input on Google's side
15:23:19 ... will ping Reilly to provide feedback on this issue
15:23:50 q?
15:24:15 Subtopic: Extend rank support
15:24:20 Anssi: issue #904
15:24:22 https://github.com/webmachinelearning/webnn/issues/904 -> Issue 904 2D, or not 2D, that is the question (by fdwr) [opset]
15:24:24 ... the catchier name for this issue is "2D, or not 2D, that is the question" :-)
15:24:45 ... Dwayne reports: "Multiple WebNN operators still have limited ranks which was historically done for backends that might be more limited"
15:24:57 ... current backends have evolved since rank support was specified in WebNN
15:25:05 ... limited ranks have caused issues in certain popular models
15:25:09 ... Dwayne notes Whisper uses 1D conv and thus requires an extra reshape() step
15:25:22 ... the issue contains a survey of the current operator rank support for the CoreML, DML, LiteRT, and ORT backends
15:25:40 ... the proposal from Dwayne is to extend the rank support to match the intersection of the rank support of current backends
15:26:26 ... the solution can take various API shapes
15:26:30 ... Dwayne came up with the following options by studying other libraries:
15:26:37 ... - A) Bake the axis count directly into the operator name
15:26:37 ... - B) Use a single operator name, with an implicit axis count based on the input rank
15:26:37 ... - C) Pass the reduction axis count separately from the input rank
15:26:37 ... - D) Pass the explicit axes
15:26:55 ... based on the pros/cons analysis, option C or D is the most preferred
15:27:56 Anssi: we can reflect platform rank differences through MLOpSupportLimits
15:28:07 ... any axis count 1-3 would be legal to WebNN if `axis count <= input rank`, see the table in the issue
15:28:18 Anssi: Dwayne suggests avoiding a zoo of new function names:
15:28:27 ... foo1, foo2, foo3 etc.
15:28:35 ... conv2d -> conv
15:28:35 ... convTranspose2d -> convTranspose
15:28:36 ... averagePool2d -> averagePool
15:28:36 ... l2Pool2d -> l2Pool
15:28:36 ... maxPool2d -> maxPool
15:28:37 ... resample2d -> resample
15:30:13 Dwayne: for each op, I can list IDL proposals to help readers
15:30:25 q?
15:31:31 Anssi: does option C or D still allow AOT (ahead-of-time) feature detection of ranks?
15:31:35 ... a simple feature detection mechanism is to check for the existence of a method on an object
15:31:38 ... can we implement such feature detection of supported ranks entirely with MLOpSupportLimits?
15:31:53 ... an example of simple feature detection:
15:31:53 ```
15:31:53 const graphBuilder = new MLGraphBuilder(await navigator.ml.createContext());
15:31:53 if ('conv' in graphBuilder) console.log('conv() exists');
15:31:53 ```
15:32:32 Anssi: the naming change has a compatibility impact, as discussed in the context of issue #821
15:32:33 https://github.com/webmachinelearning/webnn/issues/821 -> Issue 821 Operator naming 2D vs 2d (by fdwr) [conventions]
15:33:24 ... given the Origin Trials are imminent, I think this change would land after the initial OT period?
15:34:26 Dwayne: when making a new change, we give it 4 weeks for frameworks to update themselves and leave an alias in place
15:34:27 q?
15:35:26 Subtopic: Composite operators / subgraphs
15:35:34 Anssi: issue #907
15:35:35 https://github.com/webmachinelearning/webnn/issues/907 -> Issue 907 Composite operators / subgraphs (by fdwr) [opset]
15:35:50 ... the core operator set was discussed at TPAC 2025, where we resolved to evolve the proposal for aggregate operators via subgraphs
15:35:54 -> RESOLUTION from TPAC 2025 https://www.w3.org/2025/11/09-webmachinelearning-minutes.html#ffff
15:36:05 Anssi: this builds upon the earlier exploration by Ningxin et al. on custom ops discussed at TPAC 2024
15:36:10 -> Custom ops at TPAC 2024 https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#b039
15:36:25 Anssi: Dwayne opened this topic-specific issue to pursue this proposal further and shared his background research on the topic (thanks!)
15:36:34 ... see also the Case Study on WebNN Small Language Model Performance Optimization presented at TPAC 2025 for further motivation:
15:36:40 -> WebNN SLM Performance Optimization Case Study at TPAC 2025 https://lists.w3.org/Archives/Public/www-archive/2025Nov/att-0000/WebNN_SLM_Optimization_-_TPAC.pdf
15:37:06 Anssi: the high-level motivation for the proposal has been discussed in the context of the core op set meta issue and I think we have general agreement:
15:37:13 ... - 100s of potential operators across ML libraries
15:37:26 ... - adding all of them to a Web API is not feasible
15:37:42 ... - the WebNN core op set is designed to enable composability of larger aggregate ops
15:38:12 ... - if the backend has a compatible implementation of the subgraph, it can use a more efficient path vs. relying on pattern recognition by the implementation
15:38:32 ... a popular concrete example of an aggregate op is multi-head attention, a key component of the transformer architecture introduced in the original 2017 paper
15:38:56 ... Dwayne has a code snippet in the issue to demonstrate what this could look like in terms of API surface and basic steps (details, names etc. to be discussed)
15:39:43 Dwayne: a web developer defines a composite operator as a JS function using the existing WebNN built-in ops
15:39:51 ... the buildSubgraph() method returns the built subgraph
15:40:00 ... the subgraph() method returns the output given the built subgraph and input
15:40:06 q+ to ask if we maintain the semantics, e.g. that this was meant to be tanh
15:40:17 Dwayne: this is more of an example, ideas welcome
15:40:18 q?
15:40:31 MarkusT: how to handle different constants?
15:40:36 ... do we want subgraph names?
15:40:47 Dwayne: would a name be helpful for a backend to recognize?
15:41:16 MarkusT: ML is done by frameworks; when pattern matching the subgraph, one can pre-check if this is a name I expect, so the name would be a hash to know what to pattern match against
15:41:39 Dwayne: I looked at various ML libraries; would a list of candidate names be better?
15:42:25 MarkusT: if you dump the subgraph into a debugging tool, the name would help with debugging
15:42:39 ... subgraphs calling subgraphs?
15:42:46 Dwayne: seems useful for composability?
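The composition pattern Dwayne describes, a composite operator defined as a plain JS function over existing builder ops, can be sketched as below. This is illustrative only: the chosen op (Mish, which is not a WebNN built-in) and the idea of handing the function to a future buildSubgraph()-style entry point are assumptions; the builder methods used (constant, exp, add, log, tanh, mul) are existing MLGraphBuilder ops.

```javascript
// A hypothetical composite operator in the style discussed above:
// mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + exp(x)),
// decomposed into existing WebNN builder ops. How such a function gets
// registered as a named subgraph (e.g. via a buildSubgraph() method)
// is exactly what issue #907 is exploring.
function mish(builder, x) {
  const one = builder.constant(
      {dataType: 'float32', shape: []}, new Float32Array([1]));
  const softplus = builder.log(builder.add(builder.exp(x), one));
  return builder.mul(x, builder.tanh(softplus));
}
```

A backend that recognizes the subgraph (by name or by pattern) could then dispatch a fused kernel instead of executing the six-op decomposition.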
15:44:03 MarkusT: the input would be dynamic, the shape of the input determined by whatever output is sent to the input; subgraphs are like macros
15:44:16 Dwayne: ONNX has this concept of functions composed of multiple graphs
15:44:28 MarkusT: do we expect macro expansion by every backend?
15:45:01 Dwayne: not sure about that; each backend, or a layer below, should know the capabilities of the backend
15:45:25 q+
15:45:28 MarkusT: if the backend would support subgraphs, perhaps the WebNN native interface would unroll
15:45:30 q?
15:45:40 ack zkis
15:45:40 zkis, you wanted to ask if we maintain the semantics this was meant to be tanh
15:46:05 handellm has joined #webmachinelearning
15:46:37 Zoltan: question, do we want to maintain semantics?
15:46:49 Dwayne: MarkusT's idea of including names would help with that
15:47:09 Zoltan: we should discuss whether we need to "standardize" those names
15:47:22 ... an annotation mechanism
15:48:06 MarkusT: I'd prefer not to require any meta information; for a new operation, how long does it take for us to standardize it vs. a backend finding a name and implementing it?
15:48:07 q?
15:48:10 ack ningxin
15:48:57 Ningxin: to express an operator, the ops take optional input; some attention ops have optional input too, so how can the subgraph concept support that?
15:49:33 ... secondly, some existing WebNN ops have attributes, per WebNN conventions
15:50:11 Ningxin: static attributes, how to go about them?
15:50:24 Dwayne: will add that as a consideration
15:50:25 q?
15:50:51 MarkusT: what if attributes could override?
15:50:53 Dwayne: I suspect so
15:50:54 q?
15:51:16 q?
15:51:25 Topic: Push vs. pull architecture for constants
15:51:30 Anssi: issue #901
15:51:31 https://github.com/webmachinelearning/webnn/issues/901 -> Issue 901 Proposal: API to Separate Graph Building from Weight Loading to Reduce Peak Memory Usage (by mtavenrath) [feature request]
15:51:44 ... we discussed this proposal to reduce peak memory usage from Markus and the NVIDIA team at TPAC 2025
15:51:53 ... and resolved to explore a streaming constructor for constants
15:51:57 -> https://www.w3.org/2025/11/09-webmachinelearning-minutes.html#e940
15:52:04 Anssi: after TPAC, Markus provided further details on the benefits of the proposed pull-based model for constants in this issue:
15:52:11 ... - 1. Latency Hiding via Parallel Compilation
15:52:11 ... - 2. Direct-to-Disk Caching & I/O Alignment
15:52:11 ... - 3. Persistent Layout Optimization
15:52:11 ... - 4. Memory Architecture & UVM Efficiency
15:52:11 ... - 5. Dynamic Resource Management
15:52:36 ... and with the broader NVIDIA team looked into remote execution of neural networks, e.g. on a home server, using external weights
15:52:48 Anssi: Dwayne notes external weights are already achievable via MLTensor when combined with the MLGraphBuilder.input() method
15:53:06 ... this allows an MLGraph to be built without weights, which are written later via writeTensor()
15:53:11 ... Dwayne suggests this addresses some of the concerns raised in this issue?
15:53:20 ... what functionality do we miss with MLTensor and input()?
15:53:27 ... I guess 5. dynamic resource management?
15:53:43 q?
15:54:22 MarkusT: parsing of constants is delayed; not all backends are happy to call writeTensor()
15:54:58 ... the 5 points are based on discussion with Reilly: if we have external resources we don't need to do memory copies at all, we get the data at the time when we need it
15:55:23 ... backends can pull the resources on demand, with the responsibility on the backend implementation
15:56:13 ... we were wondering about caching; current ORT likely downloads all content, and the backend could be faster than code running in a JS process
15:56:35 ... we're currently doing work in another ML framework with similar optimizations
15:56:37 q?
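The MLTensor-plus-input() approach Dwayne mentions can be sketched as follows. This is a minimal illustration assuming the builder API shape in the current WebNN draft; the layer shapes and the tensor names ('fc_weight', 'fc_bias') are invented for the example.

```javascript
// Sketch: build the graph with weights declared as named inputs rather
// than baked-in constants, so the weight bytes can be supplied later
// (e.g. streamed into MLTensors via context.writeTensor() and passed
// in the dispatch() input map). Names and shapes are illustrative.
function buildWeightlessLayer(builder) {
  const x = builder.input('x', {dataType: 'float32', shape: [1, 256]});
  const w = builder.input('fc_weight', {dataType: 'float32', shape: [256, 256]});
  const b = builder.input('fc_bias', {dataType: 'float32', shape: [256]});
  // y = x @ W + b, with W and b bound at dispatch time, not build time
  return builder.add(builder.matmul(x, w), b);
}
```

Peak memory drops because the graph can be compiled before any weight data is resident; the open question in the issue is whether a pull-based (backend-initiated) model buys more than this push-based writeTensor() flow.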
15:57:13 MarkusT: the pass-a-URL-and-offset proposal by Reilly sounded good: pass a GGUF file to the GraphBuilder, give the input tensor names, and we're done
15:57:14 q?
15:57:30 q?
15:57:48 Dwayne: I'll think about this more
15:58:01 q?
15:58:23 Present+ Mike_Wyrzykowski
15:58:45 q?
15:58:51 Topic: Device selection
15:58:56 Subtopic: Device selection criteria for usecase-driven scenarios
15:59:03 Anssi: issue #902
15:59:04 https://github.com/webmachinelearning/webnn/issues/902 -> Issue 902 Device selection criteria for usecase-driven scenarios (by fdwr) [device selection]
15:59:07 ... we discussed this at TPAC 2025:
15:59:11 -> https://www.w3.org/2025/11/09-webmachinelearning-minutes.html#93d2
16:00:16 ... - there was consensus that hints are generally the preferred mechanism, but no decision on which hints to pursue, if any; I posted an IDL diff to tease out additional perspectives
16:00:36 ... - there was interest in supporting multiple devices of a given type
16:00:56 ... - there was agreement that prompt fatigue is an issue; the still-evolving Page Embedded Permission Control (PEPC) might be a solution to that
16:01:00 -> https://github.com/WICG/PEPC
16:01:39 Anssi: - a proposal that hints would help the UA schedule real-time vs. non-real-time workloads running in parallel
16:01:44 q?
16:03:06 MarkusH: if we have explicit (or implicitly UA-detected) Worker QoS, would there remain a use case for specifying the latency requirement? Same goes for the continuity.
16:03:35 ... perhaps Worker QoS is implicitly detectable by the UA, which could remove the low-latency preference in that case
16:03:52 ... a hint that real-time activity is going on
16:03:53 q?
16:04:14 q?
16:05:12 MarkusH: perhaps Mike has feedback on an exact interface that would work out
16:05:22 q?
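To make the hints direction concrete, here is a hypothetical mapping from a use case to context options. Only powerPreference and its values exist in current WebNN drafts; the lowLatency key is an invented placeholder for the kind of real-time scheduling hint discussed above, not a proposed name.

```javascript
// Hypothetical use-case-to-hints mapping. powerPreference and its
// 'high-performance' / 'low-power' values are in the WebNN spec today;
// the lowLatency key is a made-up stand-in for the hints under
// discussion in issue #902.
function contextOptionsFor(useCase) {
  switch (useCase) {
    case 'realtime-video':        // e.g. background blur during a call
      return {powerPreference: 'high-performance', lowLatency: true};
    case 'background-summarize':  // non-interactive batch workload
      return {powerPreference: 'low-power'};
    default:
      return {};                  // let the UA decide
  }
}
// In a page this would feed context creation, e.g.:
//   navigator.ml.createContext(contextOptionsFor('realtime-video'))
```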
16:05:29 RRSAgent, draft minutes
16:05:30 I have made the request to generate https://www.w3.org/2025/12/04-webmachinelearning-minutes.html anssik
16:17:29 RRSAgent, draft minutes
16:17:31 I have made the request to generate https://www.w3.org/2025/12/04-webmachinelearning-minutes.html anssik
18:30:29 Zakim has left #webmachinelearning