14:58:07 <RRSAgent> RRSAgent has joined #webmachinelearning
14:58:11 <RRSAgent> logging to https://www.w3.org/2026/01/29-webmachinelearning-irc
14:58:11 <Zakim> RRSAgent, make logs Public
14:58:12 <Zakim> please title this meeting ("meeting: ..."), anssik
14:58:13 <anssik> Meeting: WebML WG Teleconference – 29 January 2026
14:58:18 <anssik> Chair: Anssi
14:58:22 <anssik> Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2026-01-29-wg-agenda.md
14:58:28 <anssik> Scribe: Anssi
14:58:32 <anssik> scribeNick: anssik
14:59:33 <anssik> gb, this is webmachinelearning/webnn
14:59:34 <gb> anssik, OK.
14:59:38 <anssik> Present+ Anssi_Kostiainen
14:59:44 <anssik> Present+ Markus_Tavenrath
14:59:45 <anssik> Regrets+ Tarek_Ziade
15:00:06 <anssik> Present+ Ugur_Acar
15:00:46 <anssik> Present+ Mike_Wyrzykowski
15:00:53 <anssik> Present+ Rafael_Cintron
15:01:03 <anssik> Present+ Dwayne_Robinson
15:01:21 <Mike_Wyrzykowski> Mike_Wyrzykowski has joined #webmachinelearning
15:01:34 <anssik> Present+ Ben_Greenstein
15:01:44 <dwayner> dwayner has joined #webmachinelearning
15:02:00 <anssik> Present+ Ningxin_Hu
15:02:19 <ningxin> ningxin has joined #webmachinelearning
15:02:28 <anssik> RRSAgent, draft minutes
15:02:29 <RRSAgent> I have made the request to generate https://www.w3.org/2026/01/29-webmachinelearning-minutes.html anssik
15:02:53 <anssik> Present+ Ehsan_Toreini
15:03:01 <BenGreenstein> BenGreenstein has joined #webmachinelearning
15:03:15 <anssik> Present+ Markus_Handell
15:03:38 <anssik> Anssi: please welcome the later new participant who joined the WG:
15:03:50 <anssik> ... Benjamin VanderSloot from Mozilla
15:03:57 <anssik> ... Dominic Farolino from Google
15:04:12 <anssik> ... welcome to the group, Benjamin and Dominic!
15:04:18 <anssik> Topic: Incubations
15:04:27 <anssik> -> https://github.com/webmachinelearning/meetings/blob/main/telcons/2026-01-22-cg-agenda.md
15:04:30 <anssik> -> https://www.w3.org/2026/01/22-webmachinelearning-minutes.html
15:04:37 <handellm> handellm has joined #webmachinelearning
15:04:44 <RafaelCintron> RafaelCintron has joined #webmachinelearning
15:04:49 <anssik> Anssi: first, the WebML Community Group transitioned from the WebMCP explainer to a Community Group spec draft stage and published the initial draft:
15:04:53 <anssik> -> https://webmachinelearning.github.io/webmcp/
15:05:03 <anssik> Anssi: the group will now port over content from the explainer to formalize the proposal
15:05:28 <anssik> Anssi: second, an initial proposal for W3C WebML CG and Agentic AI Foundation coordination was reviewed and discussed
15:05:56 <anssik> ... third, the group resolved to evolve the declarative proposal to expose tools via HTML in parallel with the imperative WebMCP API
15:06:11 <anssik> ... for Built-in AI APIs, we welcomed new editors on board, Reilly and Ehsan picking up this responsibility for the Translator and Language Detector APIs
15:06:19 <anssik> ... we also discussed implementers' priorities for 2026 to inform the group
15:06:29 <anssik> ... Mike shared Chrome's biggest focus for Built-in AI APIs is the Prompt API
15:06:45 <anssik> ... lastly, we resolved to shift the WebML CG call forward by one hour
15:07:00 <anssik> ... we will keep this WebML WG call at its current timeslot in appreciation of our PRC, Japan and APAC participants who join already now at very late hours
15:07:18 <anssik> Topic: HTTP Archive’s annual state of the web report for Generative AI
15:07:38 <anssik> -> https://almanac.httparchive.org/en/2025/generative-ai
15:07:59 <anssik> ... it is worth noting this report discusses the pros and cons of cloud vs local inference in its technology overview
15:08:05 <anssik> -> https://almanac.httparchive.org/en/2025/generative-ai#cloud-versus-local
15:08:18 <anssik> Anssi: thank you to the team who produced this extensive report
15:08:24 <Ehsan> Ehsan has joined #webmachinelearning
15:08:40 <anssik> Topic: Candidate Recommendation Snapshot published
15:08:43 <anssik> gb, this is webmachinelearning/webnn
15:08:43 <gb> anssik, OK.
15:08:49 <anssik> -> WebNN API spec release history https://www.w3.org/standards/history/webnn/
15:08:53 <anssik> Anssi: on 22 January 2026 we published a new Candidate Recommendation Snapshot (CRS)
15:09:04 <anssik> ... since previous major WebNN CRS publication (11 April 2024) we have made over 100 significant changes
15:09:39 <anssik> ... this WebNN "v3" milestone release added new ops and datatypes, improved API abstractions and developer ergonomics, interoperability, added new horizontal considerations and more
15:09:49 <anssik> ... the group received kudos for its work with horizontal groups and topics: privacy, security, sustainability, ethics
15:10:11 <anssik> ... the GitHub CI/CD is now configured to publish new Candidate Recommendation Drafts
15:10:11 <anssik> and the spec editors are welcome to proceed to merge any open PRs that were awaiting as usual
15:10:15 <anssik> ... huge congratulations to the group for this significant publication milestone!
15:10:29 <anssik> Present+ Christian_Liebel
15:11:25 <anssik> Christian: per HTTP Archive’s annual state of the web report interest in client-side AI is growing massively
15:11:54 <anssik> ... WebLLM and Transformers.js use has grown sharply
15:12:08 <anssik> Topic: Accelerated context option implementation feedback
15:12:13 <anssik> Anssi: issue #911
15:12:14 <gb> https://github.com/webmachinelearning/webnn/issues/911 -> Issue 911 accelerated should be prior to powerPreference for device selection (by mingmingtasd) [device selection]
15:12:33 <anssik> Anssi: we will discuss new implementation feedback and seek consensus on the proposed spec change to add "no-acceleration" to powerPreference enum
15:12:40 <anssik> Anssi: for implementation feedback, we have new information from rustnn and Chromium
15:12:53 <anssik> -> https://github.com/rustnn/rustnn#backend-selection
15:13:05 <anssik> -> https://chromium-review.googlesource.com/c/chromium/src/+/7513189
15:13:13 <anssik> Anssi: for rustnn, Tarek implemented backend selection based on the current spec using two hints passed to createContext:
15:13:20 <anssik> ... boolean accelerated = true;
15:13:20 <anssik> ... enum MLPowerPreference { "default", "high-performance", "low-power" }
15:13:37 <anssik> ... for Chromium, Mingming prototyped the new proposed "no-acceleration" value for MLPowerPreference enum without accelerated boolean:
15:13:48 <anssik> ... enum MLPowerPreference { "default", "high-performance", "low-power", "no-acceleration" };
15:13:55 <christianliebel> present+
15:14:08 <anssik> Anssi: I believe both implementations expose the MLContext.accelerated boolean and we have an emerging consensus on that part
15:14:11 <anssik> q?
15:14:13 <anssik> ack christianliebel
15:14:18 <anssik> ... the hints provided at createContext time are still being discussed
15:14:30 <anssik> -> https://github.com/webmachinelearning/webnn/issues/911#issuecomment-3798857878
15:14:31 <gb> https://github.com/webmachinelearning/webnn/issues/911 -> Issue 911 accelerated should be prior to powerPreference for device selection (by mingmingtasd) [device selection]
15:14:43 <anssik> Anssi: in issue discussion, Zoltan notes "no-acceleration" would be a good context option but not a power preference per se
15:14:49 <Ugur_Depixen> Ugur_Depixen has joined #webmachinelearning
15:15:02 <anssik> ... WebGPU/GL use powerPreference and I believe is what developers expect to see in similar APIs that interface with hardware such as WebNN API
15:15:09 <anssik> ... per Priority of Constituencies principle, I would suggest we consider developer ergonomics over theoretical purity in this case and do not rename powerPreference even if we choose to include "no-acceleration" in this enum
15:15:14 <anssik> ... we can explain the naming issue in the spec prose
15:15:26 <anssik> -> https://github.com/webmachinelearning/webnn/issues/911#issuecomment-3813141583
15:15:40 <anssik> Anssi: Bryan comments that MLDevicePreference 'high-performance' is too ambiguous for WebGPU interop scenarios
15:15:44 <anssik> ... Bryan's problem statement:
15:16:00 <anssik> ... "In a hybrid system, WebNN might resolve to the NPU while WebGPU resolves to the dGPU. Since the adapters won't match, interop won't work."
15:16:05 <anssik> ... feedback summary is a powerPreference alone is insufficient
15:16:06 <anssik> q?
15:16:52 <anssik> Rafael: I think there's unfortunately diversity in the ecosystem, multiple adapters, hybrid adapters, I think it is important that the WebNN agrees
15:17:08 <anssik> ... WebGPU "high-performance" and WebNN "high-performance" should pick the same adapter
15:17:36 <anssik> ... in the past we used to have GPUDevice, when you make a context you specify you want to do WebGPU interop with this device, I think we removed that from the Chromium implementation
15:18:11 <anssik> ... I'm personally OK to have no-accelation enum in powerPreferences, also can live with it being its own boolean via fallbackAdapter in which case powerPreferences is ignored
15:18:34 <anssik> ... WebGPU and WebNN selections should agree so they stay consistent
15:18:47 <anssik> ... no way to rationalize what happens if WebGPU and WebNN pick a different adapter
15:18:48 <anssik> q?
15:20:05 <anssik> Ningxin: Mingming's idea is to explore Rafael's proposal and see how it can simplify the mapping to ONNX Runtime backend, ORT has its device selection policy with its own enum values
15:20:29 <anssik> Anssi: does the Chromium prototype map to ORT policy?
15:20:35 <anssik> Ningxin: makes it easier to map to it
15:21:09 <anssik> Anssi: how about other JS frameworks?
15:22:21 <Mike_Wyrzykowski> q+
15:22:26 <anssik> Markus: I'm a bit late to this issue, I think for audio and video real-time inference, it is important to spec options so that you don't get executed on HW that could be shared with other apps on the system
15:22:47 <anssik> ... there's a case for accelerated video where we accept any device, just the fastest one
15:23:54 <mtavenrath> q+
15:24:00 <anssik> ... for video case we could not get executed on CPU, the latest comment from Rafael to use the same preferences over WebGPU and WebNN, maybe you can create WebGPU context from WebNN context if the values need to match
15:24:25 <anssik> q?
15:24:35 <anssik> ack Mike_Wyrzykowski
15:25:26 <anssik> MikeW: the option seems reasonable in general, but the name could use some thinking, sometimes  least power-consuming or fastest, or most optimal device is the accelerated device, maybe like "fallback" as a name would be good instead
15:25:44 <anssik> q?
15:25:47 <anssik> ack mtavenrath
15:26:35 <anssik> Markus: maybe independent of power options, ML is a huge pipeline, and looking at all the APIs involved, Web Audio, WebGPU, shouldn't we have an interface that says give me the best end-to-end optimized device?
15:27:17 <anssik> ... if you want to do audio, maybe WebGPU is the worst thing to do
15:27:29 <handellm> handellm has joined #webmachinelearning
15:28:03 <handellm> q+
15:28:19 <anssik> Markus: we are thinking is the developer able to defer some hints per from where the input comes and where the output goes, e.g. input from WebGPU we never want to do CPU post-processing unless on OMA system
15:28:28 <anssik> ... mem copies take so much time
15:28:29 <RafaelCintron> q+
15:28:30 <anssik> q?
15:28:36 <anssik> ack handellm
15:29:23 <anssik> Markus: that sounds like out TPAC discussion regarding worker QoS, someone from Apple and Google suggested a way to describe to the entire pipeline, what are the concrete options for that, a new Web API or massage into WebNN?
15:29:51 <anssik> ... create an options such as MediaStream where you retain the context of the entire system
15:29:52 <anssik> q?
15:30:06 <anssik> q?
15:30:10 <anssik> ack RafaelCintron
15:31:00 <anssik> Rafael: Markus from NVIDIA has an interesting suggestion, we had a similar idea when we passed GPUDevice to context, we could go back to that design, but in that case you pass the GPUDevice and powerPreference and they may all disagree among themselves
15:31:08 <anssik> ... web developers perhaps should organize themselves
15:31:45 <anssik> ... when you make the context all tensors are in the same domain, some want to interop with WebGPU, some want to interop with Web Audio, that'd require two contexts with different interop requirements
15:32:15 <anssik> ... would web developers be fine with those tensors attached to different contexts not being able to access each other?
15:32:16 <anssik> q?
15:32:44 <anssik> Markus: there's always a mem copy if you swap between contexts, we give web developers control, do we want to give that control?
15:32:44 <anssik> q?
15:33:47 <anssik> q?
15:33:54 <anssik> Topic: Model Cache API
15:34:00 <anssik> Anssi: issue #807
15:34:01 <gb> https://github.com/webmachinelearning/webnn/issues/807 -> Issue 807 Caching mechanism for MLGraph (by anssiko) [question] [feature request]
15:34:13 <anssik> -> https://github.com/webmachinelearning/webnn/blob/main/cache-explainer.md
15:34:17 <anssik> Anssi: the group has done initial investigation and drafted a proposal for an explicit API for caching compiled graphs
15:34:36 <anssik> ... this ahead-of-time (AOT) compilation in particular benefits slower end-user devices
15:35:00 <anssik> ... the proposed API is documented in the explained
15:35:09 <anssik> ... the proposed API is documented in the explained
15:35:17 <anssik> s/... the proposed API is documented in the explained/
15:35:22 <anssik> ... this reduces the overhead of repeated compilation
15:35:47 <anssik> ... I want to understand whether 2026 is the right time to reinvigorate this discussion on the Model Cache API
15:35:53 <anssik> ... also, I want to understand if we have new implementation experience that should be shared with the group and documented in the explainer
15:36:18 <anssik> ... from our past discussion, I recall we discussed how to make this work with Chromium sandboxing and storage partitioning constraints
15:36:24 <anssik> q?
15:37:27 <anssik> Ningxin: we have this a priority for this year, web app developers and vendors are interested in this feature
15:37:46 <anssik> ... AOT compilation will be an alternative way to solve this issue
15:38:45 <anssik> ... from platform API perspective, we've worked with Windows ML team, there's an API to write a callback when you compile a model, through that you get a compiled blob, useful for implementing this in Chromium, pass this compiled blob from a sandbox to storage management
15:38:54 <anssik> ... we plan to experiment with the API
15:39:56 <anssik> ... for loading, we will investigate if we can let the browser process save the model file and pass a duplicate handle to GPU process and map that model into memory and share Windows ML ORT API, to load a compiled model
15:40:18 <anssik> ... these two APIs are available and we plan to experiment with a prototype and report back to the group
15:41:08 <anssik> ... last year we shared initial investigated results from our non-sandboxed implementation, writing to a file directly, to get initial performance data, with the new architecture we expect to see better performance
15:42:11 <anssik> ... the alternative design from Rafael was discussed earlier, buildAndSave(), to understand memory footprint impact, we explore further
15:42:12 <anssik> q?
15:42:33 <anssik> q?
15:43:08 <anssik> Topic: Floating point accuracy for sin and cos with range constraint
15:43:12 <anssik> Anssi: WebNN issue #914
15:43:13 <gb> https://github.com/webmachinelearning/webnn/issues/914 -> Issue 914 Floating point accuracy for sin and cos with range constraint (by philloooo) [operator specific]
15:43:19 <anssik> ... I  want to discuss the proposal to define accuracy for sin and cos
15:43:28 <anssik> ... Phillis reports the WebNN spec does not define the accuracy of floating point operations
15:43:34 <anssik> ... she identified this while running WPT that caused failures on TFLite GPU backend
15:43:39 <anssik> ... also notes WebGPU Shading Language accuracy for some built-in ops, including sin and cos but limits to the range of [-PI, PI]
15:43:52 <anssik> -> https://gpuweb.github.io/gpuweb/wgsl/#concrete-float-accuracy
15:44:00 <anssik> Anssi: Phillis' offered two possible options for the group, verbatim:
15:44:01 <RafaelCintron> RafaelCintron has joined #webmachinelearning
15:44:15 <anssik> ... a) align with the WGSL spec, so no guarantee for inputs outside of [-PI, PI]. And update the WPTs.
15:44:19 <anssik> ... b) using the periodicity of the cos/sin function, perform range reduction by emulation before calling cos
15:44:23 <anssik> ... Ningxin SGTM'd option A
15:44:27 <anssik> ... also Dwayne seems to be fine with A
15:44:42 <anssik> Anssi: currently WPT contains tests for tolerances but the WebNN spec does not include any language about it
15:44:47 <anssik> ... Dwayne documented the WPT tolerances in a GH issues a few years ago and I believe this is what is currently codified in WPT:
15:44:54 <anssik> -> https://github.com/webmachinelearning/webnn/issues/265#issuecomment-1256242643
15:44:54 <gb> https://github.com/webmachinelearning/webnn/issues/265 -> CLOSED Issue 265 WPT tests tracker (by BruceDai)
15:44:58 <anssik> -> https://github.com/webmachinelearning/webnn/issues/338#issuecomment-1419652594
15:44:59 <gb> https://github.com/webmachinelearning/webnn/issues/338 -> CLOSED Issue 338 WPT tests tracker / 2 for remaining ops (by BruceDai) [testing]
15:45:33 <anssik> Anssi: Dwayne's suggestion is to first verify the WPT results across all current backends
15:45:42 <anssik> ... and after this, put them into in the spec
15:45:44 <anssik> q?
15:45:56 <anssik> Dwayne: that is what I recommend
15:46:52 <anssik> q?
15:47:02 <ningxin> +1 to Dwayne's proposal
15:47:19 <mtavenrath> q+
15:47:23 <anssik> ack mtavenrath
15:48:09 <anssik> Markus: option A is preferred for fast execution
15:48:10 <anssik> q?
15:48:43 <anssik> Anssi: I hear the group prefers option A and Dwayne's proposed next steps
15:48:44 <anssik> q?
15:49:05 <anssik> Topic: The minimum data type set
15:49:12 <anssik> Anssi: issue #853
15:49:12 <gb> https://github.com/webmachinelearning/webnn/issues/853 -> Issue 853 The minimum data type set (by huningxin) [operator specific]
15:49:34 <anssik> ... the context of this issue is "the minimum data type set" represents the data types implementable across all Chromium backends
15:50:05 <anssik> ... in this investigation, we have identified certain inconsistencies (some of which can be possible implementation bugs to be fixed later) and I'd like the group to weigh in on our preferred approach
15:50:18 <anssik> ... do we want the opSupportLimits API to:
15:50:30 <anssik> ... a) report things exactly as-is from the underlying backend, even if inconsistent
15:50:44 <anssik> ... b) report predictable results, even if that sometimes requires us to normalize the outliers
15:51:25 <anssik> Dwayne: if it's cheap like with reshape option B is preferred
15:52:15 <anssik> Anssi: any objections to proceed with option B?
15:52:31 <ningxin> +1 to option b
15:52:44 <anssik> [no objections heard]
15:52:59 <anssik> Topic: Support flexible input sizes
15:53:05 <anssik> Anssi: issue #883
15:53:06 <gb> https://github.com/webmachinelearning/webnn/issues/883 -> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific] [Agenda+]
15:53:24 <anssik> ... we resolved in Kobe F2F to do more prototyping before specifying a solution to flexible input sizes
15:53:51 <anssik> ... Tarek build a webnn-graph tool that converts ONNX dynamic graphs into WebNN static graphs, and then back to ONNX for execution via the ML graph builder
15:53:56 <anssik> -> https://github.com/rustnn/webnn-graph/
15:54:21 <anssik> Anssi: Tarek reports this worked well for models such as MobileNet and a sentence-transformers model all-MiniLM-L6-v2
15:54:34 <anssik> ... but it did not work for text generations models that rely on dynamic shapes, common in modern LLMs
15:54:49 <anssik> ... core issue is constant folding that bakes in fixed input size for LLMs
15:55:13 <anssik> Anssi: Ningxin notes that with ORT operators with dynamic input size currently fall back to Wasm EP, that causes suboptimal performance
15:55:37 <anssik> ... a workaround is to prepare static models for prefill and decode stages and set sequence length to max value
15:56:01 <anssik> ... cons: higher mem usage, 2x compile time, requires padding to max length, complex to deploy multiple static models
15:56:17 <anssik> ... Tarek proposed to add MLOperandDescriptor.shape for optional symbolic dims to be bound at compute() and updateSlice for KV cache write at decoding time
15:56:23 <anssik> -> MLDynamicDimension Chromium prototype https://chromium-review.googlesource.com/c/chromium/src/+/7496954
15:56:52 <anssik> Anssi: Tarek had a conflict with this meeting, but I discussed with him and I'm proxying his feedback:
15:56:56 <anssik> ... "Ningxin's change on #883 is really promising, my feedback to the group is that I am planning to implement it on rustnn this week to try it out and come back with the results. If it works well I'd be +1 to do this."
15:58:37 <anssik> Ningxin: open question in the comment is whether we should introduce runtime shape related operator, for example shape operator that can put input operand into output operand, there are also e.g. dynamicShape
15:59:02 <anssik> ... dynamic reshape allows another input operand, can only know at inference time its shape
15:59:37 <anssik> ... for the second model, dynamic slice, start and end, specific value must be known at inference time, ONNX spec has these ops
16:00:11 <anssik> ... for WebNN introducing dynamic reshape/slice, shape in unknown at build time, thus cannot do validation at build time
16:00:34 <anssik> ... open questions to the group whether we consider having those dynamic* operators or have native ML runtime do the shape inference
16:01:57 <anssik> ... another option is to explore whether we could run these LLMs without those ops, embedding position ids as constants, slice the piece of model to use, without dynamic slice must use JS to do slice on CPU side and feed it into the input operand
16:02:09 <anssik> ... this makes the conversion and tooling more complex, this is a tradeoff
16:02:14 <anssik> ... group's feedback welcome
16:02:15 <anssik> q?
16:03:06 <anssik> q?
16:05:02 <anssik> RRSAgent, draft minutes
16:05:03 <RRSAgent> I have made the request to generate https://www.w3.org/2026/01/29-webmachinelearning-minutes.html anssik
16:09:02 <anssik> s/powerPreference alone/powerPreference enum alone
16:10:16 <anssik> s/map to ORT/map directly to ORT
16:10:58 <anssik> s/just the fastest one/not just the fastest one
16:11:33 <anssik> s/sometimes least/sometimes the least
16:12:17 <anssik> s/Markus: I'm/MarkusH: I'm
16:13:15 <anssik> s/per from where/from where
16:13:34 <anssik> s/OMA/UMA
16:13:48 <anssik> s/out TPAC/our TPAC
16:14:19 <anssik> s/an options such as /options such as in
16:16:11 <anssik> RRSAgent, draft minutes
16:16:12 <RRSAgent> I have made the request to generate https://www.w3.org/2026/01/29-webmachinelearning-minutes.html anssik
16:17:07 <anssik> s/the explained/the explainer
16:18:26 <anssik> s/investigated/investigation
16:18:53 <anssik> s/explore further/explore this further
16:20:18 <anssik> s/Tarek build/Tarek built
16:22:19 <anssik> s/dynamicShape/dynamic shape
16:22:51 <anssik> s/in unknown/is unknown
16:23:32 <anssik> RRSAgent, draft minutes
16:23:33 <RRSAgent> I have made the request to generate https://www.w3.org/2026/01/29-webmachinelearning-minutes.html anssik
16:23:52 <anssik> s/sometimes least/sometimes the least
16:23:53 <anssik> RRSAgent, draft minutes
16:23:54 <RRSAgent> I have made the request to generate https://www.w3.org/2026/01/29-webmachinelearning-minutes.html anssik