14:54:00 RRSAgent has joined #webmachinelearning
14:54:05 logging to https://www.w3.org/2026/01/15-webmachinelearning-irc
14:54:05 RRSAgent, make logs Public
14:54:06 please title this meeting ("meeting: ..."), anssik
14:54:07 Meeting: WebML WG Teleconference – 15 January 2026
14:54:17 Chair: Anssi
14:54:18 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2026-01-15-wg-agenda.md
14:54:34 Scribe: Anssi
14:54:47 scribeNick: anssik
14:54:58 gb, this is webmachinelearning/webnn
14:55:03 anssik, OK.
14:55:05 Present+ Anssi_Kostiainen
14:59:09 Present+ Dwayne_Robinson
14:59:57 Present+ Doug_Schepers
15:00:25 Present+ Jonathan_Ding
15:00:31 Present+ Dominique_Hazael-Massieux
15:00:38 Joshua_Lochner has joined #webmachinelearning
15:00:42 Present+ Joshua_Lochner
15:00:53 Present+ Ugur_Acar
15:01:13 Present+ Tarek_Ziade
15:01:29 Present+ Mike_Wyrzykowski
15:02:09 Present+ Fabio_Bernardon
15:02:24 RRSAgent, draft minutes
15:02:26 I have made the request to generate https://www.w3.org/2026/01/15-webmachinelearning-minutes.html anssik
15:02:38 Present+ Rafael_Cintron
15:02:48 dwayner has joined #webmachinelearning
15:02:52 Present+ Markus_Tavenrath
15:02:55 RafaelCintron has joined #webmachinelearning
15:03:37 RRSAgent, draft minutes
15:03:38 I have made the request to generate https://www.w3.org/2026/01/15-webmachinelearning-minutes.html anssik
15:04:00 Present+ Ben_Greenstein
15:04:12 Present+ Ehsan_Toreini
15:04:33 Anssi: welcome to our first meeting of the year 2026, we had a break over the holidays and now return to the usual cadence
15:04:39 Anssi: we'll start by acknowledging our latest new participant who joined the WG:
15:04:44 ... Liang Zeng from ByteDance
15:04:49 ... welcome to the group, Liang!
15:04:52 BenGreenstein has joined #webmachinelearning
15:05:05 Ehsan has joined #webmachinelearning
15:05:56 Ugur_Depixen has joined #webmachinelearning
15:05:59 Anssi: also welcome again Doug Schepers!
15:06:15 Mike_Wyrzykowski has joined #webmachinelearning
15:06:39 Doug: using classical ML for a11y improvements
15:07:13 ... Jonathan Ding from Intel joins us as a guest for this meeting to present a new proposal, discussed next
15:07:24 Topic: New proposal: Dynamic AI Offloading Protocol (DAOP)
15:07:32 gb, this is webmachinelearning/proposals
15:07:32 anssik, OK.
15:07:45 Anssi: from time to time we review new proposals submitted for consideration by the WebML community
15:07:53 Anssi: we have received a new proposal called the Dynamic AI Offloading Protocol (DAOP) #15 that could benefit from this group's feedback and suggestions
15:07:54 https://github.com/webmachinelearning/proposals/issues/15 -> Issue 15 Dynamic AI Offloading Protocol (DAOP) (by jonathanding)
15:08:23 ... as you know, our group has received feedback from developers and software vendors from time to time that they'd love to run inference tasks with WebNN, but oftentimes they're unsure if the user's device is capable enough
15:08:49 ... the diversity of models and client hardware makes it challenging to determine up front whether a given model can run on the user's device with QoS that meets the developer's requirements
15:09:18 ... and we can't expose low-level details through the Web API to avoid fingerprinting, and we also believe we shouldn't expose too much complexity through the Web API layer to remain future-proof
15:09:50 ... this easily leads to a situation where web apps either choose to use the least common denominator model, or use cloud-based inference even if the user's device could satisfy the QoS requirements
15:10:09 ningxin has joined #webmachinelearning
15:10:14 ... I have invited Jonathan Ding to share a new proposal called Dynamic AI Offloading Protocol (DAOP) to address the challenges related to offloading inference tasks from servers to client devices
15:10:43 ... Jonathan will introduce the proposal in abstract, a few example use cases, and a high-level implementation idea -- we won't go into implementation details in this session
15:10:54 ... after Jonathan's ~5-min intro we'll brainstorm a bit to get a feel for the room and inform the next steps
15:10:59 ... I will ask everyone to focus on the use cases -- do these use cases capture the key requirements?
15:11:38 Jonathan: this is about hybrid AI, the expectation is to be able to offload the inference task to the client, but offloading is not free and you have QoS expectations
15:11:51 zolkis has joined #webmachinelearning
15:12:01 ... you need to be able to decide if the device is capable of running the given model and satisfying the QoS requirements
15:12:01 present+ Zoltan_Kis
15:12:05 Jonathan: Use Case 1: Adaptive Video Conferencing Background Blur
15:12:13 ... A cloud-based video conferencing provider wants to offload background blur processing to the user's laptop to save server costs.
15:12:32 ... 1. The cloud server sends a lightweight, weightless Model Description (topology and input shape only, without the heavy weight parameters) of the blur model to the client's laptop.
15:12:47 Present+ Zoltan_Kis
15:12:57 ... 2. The laptop's browser runs a "Dry Run" simulation locally using the proposed API to estimate if it can handle the model at 30 FPS.
15:13:03 ... 3. The laptop returns a QoS guarantee to the server.
15:13:07 ... 4. If the QoS is sufficient, the server pushes the full model to the laptop; otherwise, processing remains in the cloud.
15:13:12 Jonathan: Use Case 2: Privacy-Preserving Photo Enhancement for Mobile Web
15:13:19 ... A photo editing web app wants to run complex enhancement filters using the user's mobile NPU to reduce latency.
15:13:45 ... 1. The application queries the device's capability using the standard performance estimation API, which avoids fingerprinting by returning a broad performance "bucket" rather than exact hardware specs.
15:13:52 ... 2. The device calculates its capability based on the memory bandwidth and NPU TOPS required by the filter model.
15:13:57 ... 3. Finding the device capable, the app enables the "High Quality" filter locally, ensuring the user's photos never leave the device.
15:14:03 Jonathan: there are two sub-proposals, differing in how they assign responsibility between the Caller and the Callee
15:14:07 ... Sub-proposal A: Device-Centric (Caller Responsible)
15:14:11 ... the Cloud acts as the central intelligence. It collects data from the device and makes the decision.
15:14:15 ... Sub-proposal B: Model-Centric (Callee Responsible) - Preferred
15:14:19 ... the Device acts as the domain expert. It receives a description of the work and decides if it can handle it.
15:14:48 https://github.com/webmachinelearning/proposals/issues/15 -> Issue 15 Dynamic AI Offloading Protocol (DAOP) (by jonathanding)
15:16:20 q+
15:16:21 q?
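A minimal sketch of the model-centric (sub-proposal B) dry-run handshake described above, for illustration only. The names ModelDescription, estimateQoS, and sustainedFps are hypothetical; no such API exists in WebNN or in the DAOP issue text, and the real shape would be defined during prototyping.

  // Hypothetical sketch only: these names do not exist in WebNN today.
  interface ModelDescription {
    topologyUrl: string;   // weightless topology, no weight parameters
    inputShape: number[];
    targetFps: number;     // QoS requirement, e.g. 30 FPS for background blur
  }

  async function canOffload(desc: ModelDescription): Promise<boolean> {
    // Step 2 of Use Case 1: the browser runs a local "dry run" and returns a
    // coarse QoS estimate rather than exact hardware specs (fingerprinting).
    const estimate = await (navigator as any).ml.estimateQoS(desc); // hypothetical API
    // Steps 3-4: the client reports the result; the server decides whether to
    // push the full model or keep inference in the cloud.
    return estimate.sustainedFps >= desc.targetFps;
  }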
15:16:23 ack RafaelCintron
15:17:02 Rafael: estimating the capabilities of user hardware -- can I run the model -- you also want to know if you can run the model well; this has challenges in native environments too
15:17:34 ... a model with big weights will run slower than one with the same topology and smaller weights
15:17:58 ... need to consider the impact of other applications running on the system at the same time
15:18:07 http://browserleaks.com/webgpu
15:19:02 Rafael: this site shows what information is exposed by WebGPU; WebGPU adapter information does disclose pretty detailed information that allows developers to infer some details about the GPU, something similar could in the abstract work for WebNN
15:19:03 q?
15:19:47 Rafael: this is certainly something developers are struggling with and it is worth exploring further
15:19:58 ... as models get bigger more people will struggle with this problem
15:19:59 q?
15:20:24 Jonathan: thank you for the comments, very good feedback
15:20:54 ... estimating without running the entire model aligns with what we observe from ISV discussions
15:20:55 q?
15:22:37 Anssi: I think https://github.com/rustnn/webnn-graph could be used for prototyping this proposal
15:23:24 Tarek: I also have other utils, e.g. for ONNX<->WebNN graph conversion
15:23:28 q?
15:25:09 Doug: can a person opt into or out of sharing device capability information?
15:26:20 ... the model-centric proposal does not fingerprint, but does the user have agency in making sure the device is not used for compute they don't want it to be used for?
15:27:06 q+
15:27:11 ack RafaelCintron
15:28:10 Rafael: to answer Doug, for WebGPU and WebGL, those APIs have no permission prompts, and the APIs can allocate a lot of memory and compute, the same with JS, also Storage APIs, and Chromium has lifted storage restrictions
15:29:54 Anssi: thank you Jonathan for sharing this proposal with the group
15:29:58 ... I'm hearing the group agrees these use cases are valuable
15:30:06 ... I also hear the group would like to see interested people move forward with this proposal
15:30:34 RESOLUTION: Create an explainer for Dynamic AI Offloading Protocol (DAOP) and initiate prototyping
15:30:49 Topic: Candidate Recommendation Snapshot 2026 review
15:31:01 gb, this is webmachinelearning/webnn
15:31:01 anssik, OK.
15:31:06 Anssi: PR #915
15:31:07 https://github.com/webmachinelearning/webnn/pull/915 -> Pull Request 915 Add Candidate Recommendation Snapshot for staging (by anssiko)
15:31:11 -> WebNN API spec release history https://www.w3.org/standards/history/webnn/
15:31:19 Anssi: we're ready to publish a new Candidate Recommendation Snapshot (CRS)
15:31:37 ... this milestone will be communicated widely within the W3C community and externally
15:31:42 ... our prior CRS release happened 11 April 2024, and a lot of progress has been made since:
15:31:50 ... over 100 significant changes
15:32:00 ... a third wave of operators for enhanced transformers support
15:32:04 ... the MLTensor API for buffer sharing
15:32:13 ... a new abstract device selection mechanism
15:32:17 ... the API surface has been modernized
15:32:27 ... interoperability improvements informed by implementation experience and developer feedback
15:32:31 ... improved security and privacy considerations
15:32:36 ... fingerprinting mitigations
15:32:40 ... new accessibility considerations
15:32:48 ... I staged a release in PR #915 that adds an appendix with detailed changes per category, mapped to the W3C Process-defined Classes of Changes:
15:32:52 -> https://www.w3.org/policies/process/#correction-classes
15:33:09 Anssi: the next step for us is to record the group's decision to request transition
15:33:34 ... any questions or concerns, are we ready to publish?
15:33:42 q?
15:34:22 shepazu has joined #webmachinelearning
15:34:33 Dom: this release triggers a Call for Exclusions so everything that's in the release scope gets Royalty-Free protection
15:34:47 https://github.com/webmachinelearning/webnn/pull/915 -> Pull Request 915 Add Candidate Recommendation Snapshot for staging (by anssiko)
15:35:15 RESOLUTION: Publish a new Candidate Recommendation Snapshot of the WebNN API as staged in PR #915
15:35:27 (again, sorry for the noise… I'll try to be more respectful of group meeting time)
15:35:43 Topic: Implementation experience, from the past to the future
15:36:15 Anssi: in the past the group has also worked on webnn-native, a standalone native implementation as a C/C++ library
15:36:22 -> webnn-native https://github.com/webmachinelearning/webnn-native
15:36:48 Markus: I'm interested in the webnn-native library and in the technical reasons for moving away from this library and into the current WebNN implementation that is more tightly integrated with the Chromium codebase
15:37:15 ... webnn-native is similar to Dawn, a WebGPU implementation
15:39:56 q+
15:40:01 Markus: should we revive webnn-native or use rustnn as a native interface for WebNN?
15:40:04 ack RafaelCintron
15:40:48 Rafael: I would personally be in favour of reviving webnn-native once we ship the OT
15:41:28 ... it is a lot of work to integrate a 3rd-party library into Chromium, smart pointers, bitsets and all that
15:41:54 ... webnn-native came first, and there was opposition at the time to hosting the webnn-native library outside the Chromium project
15:41:55 q?
15:42:20 Present+ Ningxin_Hu
15:43:02 q?
15:43:22 Anssi: before the break Tarek shared news about the Python and Rust implementations of the WebNN API
15:43:31 ... this work is now hosted under the newly established RustNN project along with other utils:
15:43:36 -> RustNN https://github.com/rustnn
15:43:41 Anssi: this GH org hosts a number of repos:
15:43:50 ... rustnn, the Rust implementation
15:43:55 ... pywebnn, Python bindings for rustnn
15:44:01 ... webnn-graph, a WebNN-oriented graph DSL
15:44:09 ... webnn-onnx-utils, WebNN <-> ONNX conversion
15:44:15 ... trtx-rs, TensorRT-RTX bindings
15:44:20 ... and more
15:44:50 Tarek: it is a lot of fun working on RustNN, happy to add all interested collaborators to the repo
15:45:16 ... I want to have all the WebNN demos working in Python as well, focusing on LLMs now
15:45:30 RRSAgent, draft minutes
15:45:31 I have made the request to generate https://www.w3.org/2026/01/15-webmachinelearning-minutes.html anssik
15:46:04 Tarek: I have a patch for Firefox to expose rustnn with JS bindings
15:46:04 q?
15:46:30 Topic: Accelerated context option implementation feedback
15:46:36 Anssi: issue #911
15:46:37 https://github.com/webmachinelearning/webnn/issues/911 -> Issue 911 accelerated should be prior to powerPreference for device selection (by mingmingtasd) [device selection]
15:46:45 Anssi: we received new implementation feedback from Mingming (thanks!) for the accelerated context option
15:46:51 -> https://www.w3.org/TR/webnn/#api-mlcontextoptions
15:47:21 Anssi: specifically, the feedback asks for clarification on how "accelerated" is supposed to interact with the existing power preference ("default", "high-performance", "low-power")
15:47:31 ... currently, as specified, the "accelerated" property has lower priority than "powerPreference"
15:47:39 ... per Mingming, this creates difficulty in the following scenarios:
15:48:15 { powerPreference: "low-power", accelerated: true } if no low-power device is available
15:48:33 { powerPreference: "low-power", accelerated: false } if the implementation cannot force the CPU into a low-power state
15:48:54 { powerPreference: "high-performance", accelerated: false } if the implementation cannot force the CPU into a high-performance state
15:49:07 ... Mingming's proposal is to give "accelerated" a higher priority than "powerPreference"
15:49:21 q+
15:49:22 ... Zoltan's proposal is to have "powerPreference" set the power envelope limits
15:49:38 ... I'd like to discuss how to evolve the spec to clarify this aspect
15:49:57 ... first, I'd like to establish whether we agree both "accelerated" and "powerPreference" are hints, i.e. implementers provide best-effort service given this information
15:50:24 ... second, I'd like to ask if it would be clearer to present the possible combinations as an informative truth table instead of prose
15:50:26 q?
15:50:29 ack Mike_Wyrzykowski
15:50:54 Mike: since these are hints, depending on the system the implementation can ignore them and still be spec-conformant
15:51:10 ... on macOS for example, WebGPU/GL may ignore similar hints
15:51:34 ... as for how to interpret these hints, it may not be successful to try to prescribe what implementers should do
15:51:35 q+
15:52:00 ack zolkis
15:53:07 Zoltan: my summary is that if the power preference is low-power it expresses a developer priority for lower power, otherwise accelerated would have priority; nevertheless these are all hints, and I would use an informative truth table
15:53:10 q+
15:53:37 ... I was considering Apple platform capabilities in this design
15:54:00 ... the power envelope may be set for heat management or other reasons
15:54:02 q?
15:54:06 ack RafaelCintron
15:54:32 q+
15:54:56 Rafael: what do people think about the items in the powerPreference enum?
15:55:08 ... what is available in frameworks today for implementers?
15:55:28 ... if the backend is CoreML or LiteRT, what do I do?
15:55:52 ... a fallback adapter would be one boolean
15:56:48 Rafael: suggestion, the powerPreference enum could have a new "no-acceleration" value
15:57:49 https://gpuweb.github.io/gpuweb/#dictdef-gpurequestadapteroptions
15:57:50 ... "no-acceleration" could map as of today to CPU, as in current frameworks
15:58:27 ... WebGPU has a similar problem and they solved it with powerPreference and a fallback adapter
15:58:29 q?
15:58:36 ack Mike_Wyrzykowski
15:59:19 Mike: quick comment, the proposal from Rafael sounds reasonable, though we may want to iterate on the name; "no-acceleration" should not explicitly mean run on the CPU
15:59:20 q?
16:00:15 q?
16:00:32 RRSAgent, draft minutes
16:00:33 I have made the request to generate https://www.w3.org/2026/01/15-webmachinelearning-minutes.html anssik
16:01:06 Zoltan: we had a use case for "accelerated", we need to revisit that
16:01:09 q?
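For reference, a sketch of the option combinations discussed above, written against the MLContextOptions dictionary in the current spec. powerPreference is in the spec today; "accelerated" is the option under discussion in issue #911, so its behavior here is illustrative only, and per the discussion both are hints an implementation may ignore.

  // Illustrative only: "accelerated" is a proposed option (issue #911), and
  // both options are hints a conforming implementation may ignore.
  async function createContexts() {
    const ml = (navigator as any).ml;
    // Prefer an accelerator within a low power envelope; if no low-power
    // accelerator exists, which hint wins is the ambiguity issue #911 raises.
    const lowPowerAccel = await ml.createContext({ powerPreference: "low-power", accelerated: true });
    // Ask for non-accelerated, high-performance execution; an implementation
    // that cannot force the CPU into a high-performance state does best effort.
    const cpuHighPerf = await ml.createContext({ powerPreference: "high-performance", accelerated: false });
    return { lowPowerAccel, cpuHighPerf };
  }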
16:02:30 RRSAgent, draft minutes
16:02:31 I have made the request to generate https://www.w3.org/2026/01/15-webmachinelearning-minutes.html anssik
16:09:16 shepazu has joined #webmachinelearning
16:09:16 zolkis has joined #webmachinelearning
16:09:16 Mike_Wyrzykowski has joined #webmachinelearning
16:09:16 Joshua_Lochner has joined #webmachinelearning
16:09:16 reillyg has joined #webmachinelearning
16:27:17 RRSAgent, draft minutes
16:27:19 I have made the request to generate https://www.w3.org/2026/01/15-webmachinelearning-minutes.html anssik
18:01:27 Zakim has left #webmachinelearning