14:58:07 RRSAgent has joined #webmachinelearning 14:58:11 logging to https://www.w3.org/2026/01/29-webmachinelearning-irc 14:58:11 RRSAgent, make logs Public 14:58:12 please title this meeting ("meeting: ..."), anssik 14:58:13 Meeting: WebML WG Teleconference – 29 January 2026 14:58:18 Chair: Anssi 14:58:22 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2026-01-29-wg-agenda.md 14:58:28 Scribe: Anssi 14:58:32 scribeNick: anssik 14:59:33 gb, this is webmachinelearning/webnn 14:59:34 anssik, OK. 14:59:38 Present+ Anssi_Kostiainen 14:59:44 Present+ Markus_Tavenrath 14:59:45 Regrets+ Tarek_Ziade 15:00:06 Present+ Ugur_Acar 15:00:46 Present+ Mike_Wyrzykowski 15:00:53 Present+ Rafael_Cintron 15:01:03 Present+ Dwayne_Robinson 15:01:21 Mike_Wyrzykowski has joined #webmachinelearning 15:01:34 Present+ Ben_Greenstein 15:01:44 dwayner has joined #webmachinelearning 15:02:00 Present+ Ningxin_Hu 15:02:19 ningxin has joined #webmachinelearning 15:02:28 RRSAgent, draft minutes 15:02:29 I have made the request to generate https://www.w3.org/2026/01/29-webmachinelearning-minutes.html anssik 15:02:53 Present+ Ehsan_Toreini 15:03:01 BenGreenstein has joined #webmachinelearning 15:03:15 Present+ Markus_Handell 15:03:38 Anssi: please welcome the latest new participants who joined the WG: 15:03:50 ... Benjamin VanderSloot from Mozilla 15:03:57 ... Dominic Farolino from Google 15:04:12 ... welcome to the group, Benjamin and Dominic! 
15:04:18 Topic: Incubations 15:04:27 -> https://github.com/webmachinelearning/meetings/blob/main/telcons/2026-01-22-cg-agenda.md 15:04:30 -> https://www.w3.org/2026/01/22-webmachinelearning-minutes.html 15:04:37 handellm has joined #webmachinelearning 15:04:44 RafaelCintron has joined #webmachinelearning 15:04:49 Anssi: first, the WebML Community Group transitioned from the WebMCP explainer to a Community Group spec draft stage and published the initial draft: 15:04:53 -> https://webmachinelearning.github.io/webmcp/ 15:05:03 Anssi: the group will now port over content from the explainer to formalize the proposal 15:05:28 Anssi: second, an initial proposal for W3C WebML CG and Agentic AI Foundation coordination was reviewed and discussed 15:05:56 ... third, the group resolved to evolve the declarative proposal to expose tools via HTML in parallel with the imperative WebMCP API 15:06:11 ... for Built-in AI APIs, we welcomed new editors on board, Reilly and Ehsan picking up this responsibility for the Translator and Language Detector APIs 15:06:19 ... we also discussed implementers' priorities for 2026 to inform the group 15:06:29 ... Mike shared Chrome's biggest focus for Built-in AI APIs is the Prompt API 15:06:45 ... lastly, we resolved to shift the WebML CG call forward by one hour 15:07:00 ... we will keep this WebML WG call at its current timeslot in appreciation of our PRC, Japan and APAC participants who join already now at very late hours 15:07:18 Topic: HTTP Archive’s annual state of the web report for Generative AI 15:07:38 -> https://almanac.httparchive.org/en/2025/generative-ai 15:07:59 ... 
it is worth noting this report discusses the pros and cons of cloud vs local inference in its technology overview 15:08:05 -> https://almanac.httparchive.org/en/2025/generative-ai#cloud-versus-local 15:08:18 Anssi: thank you to the team who produced this extensive report 15:08:24 Ehsan has joined #webmachinelearning 15:08:40 Topic: Candidate Recommendation Snapshot published 15:08:43 gb, this is webmachinelearning/webnn 15:08:43 anssik, OK. 15:08:49 -> WebNN API spec release history https://www.w3.org/standards/history/webnn/ 15:08:53 Anssi: on 22 January 2026 we published a new Candidate Recommendation Snapshot (CRS) 15:09:04 ... since the previous major WebNN CRS publication (11 April 2024) we have made over 100 significant changes 15:09:39 ... this WebNN "v3" milestone release added new ops and datatypes, improved API abstractions and developer ergonomics, interoperability, added new horizontal considerations and more 15:09:49 ... the group received kudos for its work with horizontal groups and topics: privacy, security, sustainability, ethics 15:10:11 ... the GitHub CI/CD is now configured to publish new Candidate Recommendation Drafts 15:10:11 and the spec editors are welcome to proceed to merge any open PRs that were waiting, as usual 15:10:15 ... huge congratulations to the group for this significant publication milestone! 15:10:29 Present+ Christian_Liebel 15:11:25 Christian: per HTTP Archive’s annual state of the web report, interest in client-side AI is growing massively 15:11:54 ... 
WebLLM and Transformers.js use has grown sharply 15:12:08 Topic: Accelerated context option implementation feedback 15:12:13 Anssi: issue #911 15:12:14 https://github.com/webmachinelearning/webnn/issues/911 -> Issue 911 accelerated should be prior to powerPreference for device selection (by mingmingtasd) [device selection] 15:12:33 Anssi: we will discuss new implementation feedback and seek consensus on the proposed spec change to add "no-acceleration" to powerPreference enum 15:12:40 Anssi: for implementation feedback, we have new information from rustnn and Chromium 15:12:53 -> https://github.com/rustnn/rustnn#backend-selection 15:13:05 -> https://chromium-review.googlesource.com/c/chromium/src/+/7513189 15:13:13 Anssi: for rustnn, Tarek implemented backend selection based on the current spec using two hints passed to createContext: 15:13:20 ... boolean accelerated = true; 15:13:20 ... enum MLPowerPreference { "default", "high-performance", "low-power" } 15:13:37 ... for Chromium, Mingming prototyped the new proposed "no-acceleration" value for MLPowerPreference enum without accelerated boolean: 15:13:48 ... enum MLPowerPreference { "default", "high-performance", "low-power", "no-acceleration" }; 15:13:55 present+ 15:14:08 Anssi: I believe both implementations expose the MLContext.accelerated boolean and we have an emerging consensus on that part 15:14:11 q? 15:14:13 ack christianliebel 15:14:18 ... the hints provided at createContext time are still being discussed 15:14:30 -> https://github.com/webmachinelearning/webnn/issues/911#issuecomment-3798857878 15:14:31 https://github.com/webmachinelearning/webnn/issues/911 -> Issue 911 accelerated should be prior to powerPreference for device selection (by mingmingtasd) [device selection] 15:14:43 Anssi: in issue discussion, Zoltan notes "no-acceleration" would be a good context option but not a power preference per se 15:14:49 Ugur_Depixen has joined #webmachinelearning 15:15:02 ... 
WebGPU/GL use powerPreference and I believe this is what developers expect to see in similar APIs that interface with hardware such as WebNN API 15:15:09 ... per Priority of Constituencies principle, I would suggest we consider developer ergonomics over theoretical purity in this case and do not rename powerPreference even if we choose to include "no-acceleration" in this enum 15:15:14 ... we can explain the naming issue in the spec prose 15:15:26 -> https://github.com/webmachinelearning/webnn/issues/911#issuecomment-3813141583 15:15:40 Anssi: Bryan comments that MLDevicePreference 'high-performance' is too ambiguous for WebGPU interop scenarios 15:15:44 ... Bryan's problem statement: 15:16:00 ... "In a hybrid system, WebNN might resolve to the NPU while WebGPU resolves to the dGPU. Since the adapters won't match, interop won't work." 15:16:05 ... feedback summary is that a powerPreference enum alone is insufficient 15:16:06 q? 15:16:52 Rafael: I think there's unfortunately diversity in the ecosystem, multiple adapters, hybrid adapters, I think it is important that WebNN and WebGPU agree 15:17:08 ... WebGPU "high-performance" and WebNN "high-performance" should pick the same adapter 15:17:36 ... in the past we used to have GPUDevice, when you make a context you specify you want to do WebGPU interop with this device, I think we removed that from the Chromium implementation 15:18:11 ... I'm personally OK to have no-acceleration enum in powerPreferences, also can live with it being its own boolean via fallbackAdapter in which case powerPreferences is ignored 15:18:34 ... WebGPU and WebNN selections should agree so they stay consistent 15:18:47 ... no way to rationalize what happens if WebGPU and WebNN pick a different adapter 15:18:48 q? 15:20:05 Ningxin: Mingming's idea is to explore Rafael's proposal and see how it can simplify the mapping to ONNX Runtime backend, ORT has its device selection policy with its own enum values 15:20:29 Anssi: does the Chromium prototype map directly to ORT policy? 
15:20:35 Ningxin: makes it easier to map to it 15:21:09 Anssi: how about other JS frameworks? 15:22:21 q+ 15:22:26 MarkusH: I'm a bit late to this issue, I think for audio and video real-time inference, it is important to spec options so that you don't get executed on HW that could be shared with other apps on the system 15:22:47 ... there's a case for accelerated video where we accept any device, not just the fastest one 15:23:54 q+ 15:24:00 ... for video case we could not get executed on CPU, the latest comment from Rafael to use the same preferences over WebGPU and WebNN, maybe you can create WebGPU context from WebNN context if the values need to match 15:24:25 q? 15:24:35 ack Mike_Wyrzykowski 15:25:26 MikeW: the option seems reasonable in general, but the name could use some thinking, sometimes the least power-consuming or fastest, or most optimal device is the accelerated device, maybe like "fallback" as a name would be good instead 15:25:44 q? 15:25:47 ack mtavenrath 15:26:35 Markus: maybe independent of power options, ML is a huge pipeline, and looking at all the APIs involved, Web Audio, WebGPU, shouldn't we have an interface that says give me the best end-to-end optimized device? 15:27:17 ... if you want to do audio, maybe WebGPU is the worst thing to do 15:27:29 handellm has joined #webmachinelearning 15:28:03 q+ 15:28:19 Markus: we are thinking is the developer able to defer some hints from where the input comes and where the output goes, e.g. input from WebGPU we never want to do CPU post-processing unless on UMA system 15:28:28 ... mem copies take so much time 15:28:29 q+ 15:28:30 q? 15:28:36 ack handellm 15:29:23 MarkusH: that sounds like our TPAC discussion regarding worker QoS, someone from Apple and Google suggested a way to describe to the entire pipeline, what are the concrete options for that, a new Web API or massage into WebNN? 15:29:51 ... create options such as in MediaStream where you retain the context of the entire system 15:29:52 q? 
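[Scribe's illustrative sketch] The context-option shapes under discussion above can be sketched as follows. This is a minimal sketch only: the "no-acceleration" MLPowerPreference value is still a proposal, and the `createContextPreferAccelerated` helper and `stubML` object are hypothetical names used so the logic can run outside a browser (in a browser, `navigator.ml` would be passed instead).

```javascript
// Sketch of the device-selection fallback discussed in issue #911.
// Assumptions: the proposed "no-acceleration" MLPowerPreference value and the
// MLContext.accelerated boolean, neither of which is settled API yet.
async function createContextPreferAccelerated(ml) {
  const ctx = await ml.createContext({ powerPreference: 'high-performance' });
  if (ctx.accelerated) return ctx; // an accelerated backend was selected
  // Explicitly fall back to the non-accelerated (CPU) context.
  return ml.createContext({ powerPreference: 'no-acceleration' });
}

// Minimal stub standing in for navigator.ml on a machine with no accelerator,
// so the sketch is runnable anywhere.
const stubML = {
  async createContext(options) {
    return { accelerated: false, options };
  },
};
```

Note this sketch sidesteps the interop concern raised above: nothing in it guarantees that WebNN's "high-performance" adapter matches WebGPU's.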
15:30:06 q? 15:30:10 ack RafaelCintron 15:31:00 Rafael: Markus from NVIDIA has an interesting suggestion, we had a similar idea when we passed GPUDevice to context, we could go back to that design, but in that case you pass the GPUDevice and powerPreference and they may all disagree among themselves 15:31:08 ... web developers perhaps should organize themselves 15:31:45 ... when you make the context all tensors are in the same domain, some want to interop with WebGPU, some want to interop with Web Audio, that'd require two contexts with different interop requirements 15:32:15 ... would web developers be fine with those tensors attached to different contexts not being able to access each other? 15:32:16 q? 15:32:44 Markus: there's always a mem copy if you swap between contexts, we give web developers control, do we want to give that control? 15:32:44 q? 15:33:47 q? 15:33:54 Topic: Model Cache API 15:34:00 Anssi: issue #807 15:34:01 https://github.com/webmachinelearning/webnn/issues/807 -> Issue 807 Caching mechanism for MLGraph (by anssiko) [question] [feature request] 15:34:13 -> https://github.com/webmachinelearning/webnn/blob/main/cache-explainer.md 15:34:17 Anssi: the group has done initial investigation and drafted a proposal for an explicit API for caching compiled graphs 15:34:36 ... this ahead-of-time (AOT) compilation in particular benefits slower end-user devices 15:35:00 ... the proposed API is documented in the explainer 15:35:22 ... this reduces the overhead of repeated compilation 15:35:47 ... I want to understand whether 2026 is the right time to reinvigorate this discussion on the Model Cache API 15:35:53 ... also, I want to understand if we have new implementation experience that should be shared with the group and documented in the explainer 15:36:18 ... 
from our past discussion, I recall we discussed how to make this work with Chromium sandboxing and storage partitioning constraints 15:36:24 q? 15:37:27 Ningxin: we have this as a priority for this year, web app developers and vendors are interested in this feature 15:37:46 ... AOT compilation will be an alternative way to solve this issue 15:38:45 ... from platform API perspective, we've worked with Windows ML team, there's an API to write a callback when you compile a model, through that you get a compiled blob, useful for implementing this in Chromium, pass this compiled blob from a sandbox to storage management 15:38:54 ... we plan to experiment with the API 15:39:56 ... for loading, we will investigate if we can let the browser process save the model file and pass a duplicate handle to GPU process and map that model into memory and share Windows ML ORT API, to load a compiled model 15:40:18 ... these two APIs are available and we plan to experiment with a prototype and report back to the group 15:41:08 ... last year we shared initial investigation results from our non-sandboxed implementation, writing to a file directly, to get initial performance data, with the new architecture we expect to see better performance 15:42:11 ... the alternative design from Rafael was discussed earlier, buildAndSave(), to understand memory footprint impact, we will explore this further 15:42:12 q? 15:42:33 q? 15:43:08 Topic: Floating point accuracy for sin and cos with range constraint 15:43:12 Anssi: WebNN issue #914 15:43:13 https://github.com/webmachinelearning/webnn/issues/914 -> Issue 914 Floating point accuracy for sin and cos with range constraint (by philloooo) [operator specific] 15:43:19 ... I want to discuss the proposal to define accuracy for sin and cos 15:43:28 ... Phillis reports the WebNN spec does not define the accuracy of floating point operations 15:43:34 ... she identified this while running WPT tests that caused failures on the TFLite GPU backend 15:43:39 ... 
also notes the WebGPU Shading Language defines accuracy for some built-in ops, including sin and cos, but limits it to the range of [-PI, PI] 15:43:52 -> https://gpuweb.github.io/gpuweb/wgsl/#concrete-float-accuracy 15:44:00 Anssi: Phillis offered two possible options for the group, verbatim: 15:44:01 RafaelCintron has joined #webmachinelearning 15:44:15 ... a) align with the WGSL spec, so no guarantee for inputs outside of [-PI, PI]. And update the WPTs. 15:44:19 ... b) using the periodicity of the cos/sin function, perform range reduction by emulation before calling cos 15:44:23 ... Ningxin SGTM'd option A 15:44:27 ... also Dwayne seems to be fine with A 15:44:42 Anssi: currently WPT contains tests for tolerances but the WebNN spec does not include any language about it 15:44:47 ... Dwayne documented the WPT tolerances in GH issues a few years ago and I believe this is what is currently codified in WPT: 15:44:54 -> https://github.com/webmachinelearning/webnn/issues/265#issuecomment-1256242643 15:44:54 https://github.com/webmachinelearning/webnn/issues/265 -> CLOSED Issue 265 WPT tests tracker (by BruceDai) 15:44:58 -> https://github.com/webmachinelearning/webnn/issues/338#issuecomment-1419652594 15:44:59 https://github.com/webmachinelearning/webnn/issues/338 -> CLOSED Issue 338 WPT tests tracker / 2 for remaining ops (by BruceDai) [testing] 15:45:33 Anssi: Dwayne's suggestion is to first verify the WPT results across all current backends 15:45:42 ... and after this, put them into the spec 15:45:44 q? 15:45:56 Dwayne: that is what I recommend 15:46:52 q? 15:47:02 +1 to Dwayne's proposal 15:47:19 q+ 15:47:23 ack mtavenrath 15:48:09 Markus: option A is preferred for fast execution 15:48:10 q? 15:48:43 Anssi: I hear the group prefers option A and Dwayne's proposed next steps 15:48:44 q? 
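[Scribe's illustrative sketch] Option b above (range reduction via periodicity) can be sketched as follows. This is a minimal sketch only; the function name `reduceToPi` is hypothetical, and a production kernel would use an extended-precision reduction (e.g. Payne-Hanek) since plain `%` loses precision for very large inputs.

```javascript
// Sketch of option b: fold any input into [-PI, PI] using the 2*PI
// periodicity of sin/cos, so the backend only ever sees in-range values,
// where WGSL-style accuracy guarantees apply.
const TWO_PI = 2 * Math.PI;

function reduceToPi(x) {
  let r = x % TWO_PI;            // r is now in (-TWO_PI, TWO_PI)
  if (r > Math.PI) r -= TWO_PI;  // fold down into [-PI, PI]
  if (r < -Math.PI) r += TWO_PI; // fold up into [-PI, PI]
  return r;
}

// sin(x) equals sin(reduceToPi(x)) up to rounding, for moderate |x|.
```

Markus's point above applies: option A avoids paying this per-element reduction cost at execution time.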
15:49:05 Topic: The minimum data type set 15:49:12 Anssi: issue #853 15:49:12 https://github.com/webmachinelearning/webnn/issues/853 -> Issue 853 The minimum data type set (by huningxin) [operator specific] 15:49:34 ... the context of this issue is "the minimum data type set" represents the data types implementable across all Chromium backends 15:50:05 ... in this investigation, we have identified certain inconsistencies (some of which can be possible implementation bugs to be fixed later) and I'd like the group to weigh in on our preferred approach 15:50:18 ... do we want the opSupportLimits API to: 15:50:30 ... a) report things exactly as-is from the underlying backend, even if inconsistent 15:50:44 ... b) report predictable results, even if that sometimes requires us to normalize the outliers 15:51:25 Dwayne: if it's cheap, like with reshape, option B is preferred 15:52:15 Anssi: any objections to proceed with option B? 15:52:31 +1 to option b 15:52:44 [no objections heard] 15:52:59 Topic: Support flexible input sizes 15:53:05 Anssi: issue #883 15:53:06 https://github.com/webmachinelearning/webnn/issues/883 -> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific] [Agenda+] 15:53:24 ... we resolved at the Kobe F2F to do more prototyping before specifying a solution to flexible input sizes 15:53:51 ... Tarek built a webnn-graph tool that converts ONNX dynamic graphs into WebNN static graphs, and then back to ONNX for execution via the ML graph builder 15:53:56 -> https://github.com/rustnn/webnn-graph/ 15:54:21 Anssi: Tarek reports this worked well for models such as MobileNet and a sentence-transformers model all-MiniLM-L6-v2 15:54:34 ... but it did not work for text generation models that rely on dynamic shapes, common in modern LLMs 15:54:49 ... 
core issue is constant folding that bakes in fixed input size for LLMs 15:55:13 Anssi: Ningxin notes that with ORT, operators with dynamic input size currently fall back to the Wasm EP, that causes suboptimal performance 15:55:37 ... a workaround is to prepare static models for prefill and decode stages and set sequence length to max value 15:56:01 ... cons: higher mem usage, 2x compile time, requires padding to max length, complex to deploy multiple static models 15:56:17 ... Tarek proposed to add MLOperandDescriptor.shape for optional symbolic dims to be bound at compute() and updateSlice for KV cache write at decoding time 15:56:23 -> MLDynamicDimension Chromium prototype https://chromium-review.googlesource.com/c/chromium/src/+/7496954 15:56:52 Anssi: Tarek had a conflict with this meeting, but I discussed with him and I'm proxying his feedback: 15:56:56 ... "Ningxin's change on #883 is really promising, my feedback to the group is that I am planning to implement it on rustnn this week to try it out and come back with the results. If it works well I'd be +1 to do this." 15:58:37 Ningxin: open question in the comment is whether we should introduce a runtime shape-related operator, for example a shape operator that can put the input operand's shape into an output operand, there are also e.g. dynamic shape 15:59:02 ... dynamic reshape allows another input operand, can only know its shape at inference time 15:59:37 ... for the second model, dynamic slice, start and end, specific value must be known at inference time, ONNX spec has these ops 16:00:11 ... for WebNN introducing dynamic reshape/slice, shape is unknown at build time, thus cannot do validation at build time 16:00:34 ... open question to the group is whether we consider having those dynamic* operators or have native ML runtime do the shape inference 16:01:57 ... 
another option is to explore whether we could run these LLMs without those ops, embedding position ids as constants, slice the piece of model to use, without dynamic slice we must use JS to do the slice on the CPU side and feed it into the input operand 16:02:09 ... this makes the conversion and tooling more complex, this is a tradeoff 16:02:14 ... group's feedback welcome 16:02:15 q? 16:03:06 q? 16:05:02 RRSAgent, draft minutes 16:05:03 I have made the request to generate https://www.w3.org/2026/01/29-webmachinelearning-minutes.html anssik 16:09:02 s/powerPreference alone/powerPreference enum alone 16:10:16 s/map to ORT/map directly to ORT 16:10:58 s/just the fastest one/not just the fastest one 16:11:33 s/sometimes least/sometimes the least 16:12:17 s/Markus: I'm/MarkusH: I'm 16:13:15 s/per from where/from where 16:13:34 s/OMA/UMA 16:13:48 s/out TPAC/our TPAC 16:14:19 s/an options such as /options such as in 16:16:11 RRSAgent, draft minutes 16:16:12 I have made the request to generate https://www.w3.org/2026/01/29-webmachinelearning-minutes.html anssik 16:17:07 s/the explained/the explainer 16:18:26 s/investigated/investigation 16:18:53 s/explore further/explore this further 16:20:18 s/Tarek build/Tarek built 16:22:19 s/dynamicShape/dynamic shape 16:22:51 s/in unknown/is unknown 16:23:32 RRSAgent, draft minutes 16:23:33 I have made the request to generate https://www.w3.org/2026/01/29-webmachinelearning-minutes.html anssik 16:23:52 s/sometimes least/sometimes the least 16:23:53 RRSAgent, draft minutes 16:23:54 I have made the request to generate https://www.w3.org/2026/01/29-webmachinelearning-minutes.html anssik
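[Scribe's illustrative sketch] The static-shape workaround discussed under "Support flexible input sizes" (build the graph once with sequence length set to a max value, then pad shorter inputs) can be sketched as follows. `MAX_SEQ_LEN` and `padToMax` are hypothetical names for illustration, not part of the WebNN API.

```javascript
// Sketch of the static-shape workaround: the WebNN graph is built once with
// a fixed maximum sequence length, and shorter token sequences are
// zero-padded before being bound to the graph's input operand.
const MAX_SEQ_LEN = 512;

function padToMax(tokenIds) {
  // Int32Array is zero-initialized, which doubles as the padding value.
  const padded = new Int32Array(MAX_SEQ_LEN);
  padded.set(tokenIds.subarray(0, Math.min(tokenIds.length, MAX_SEQ_LEN)));
  return padded;
}
```

The cons recorded in the minutes apply directly to this sketch: memory and compute are sized for the worst case, and every inference pays for MAX_SEQ_LEN regardless of the actual sequence length.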