14:49:44 RRSAgent has joined #webmachinelearning
14:49:48 logging to https://www.w3.org/2026/06/04-webmachinelearning-irc
14:49:48 RRSAgent, make logs Public
14:49:49 please title this meeting ("meeting: ..."), anssik
14:49:50 Chair: Anssi
14:50:01 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2026-06-04-wg-agenda.md
14:50:01 Scribe: Anssi
14:50:01 scribeNick: anssik
14:50:03 gb, this is webmachinelearning/webnn
14:50:03 anssik, OK.
14:50:08 Present+ Anssi_Kostiainen
14:50:17 Regrets+ Markus_Handell
14:50:17 RRSAgent, draft minutes
14:50:19 I have made the request to generate https://www.w3.org/2026/06/04-webmachinelearning-minutes.html anssik
14:50:53 Meeting: WebML WG Teleconference – 4 June 2026
14:50:54 RRSAgent, draft minutes
14:50:56 I have made the request to generate https://www.w3.org/2026/06/04-webmachinelearning-minutes.html anssik
14:51:54 mtavenrath has joined #webmachinelearning
14:58:22 DwayneR has joined #webmachinelearning
14:58:40 Present+ Dwayne_Robinson
14:58:45 Present+ Markus_Tavenrath
15:00:36 Mike_Wyrzykowski has joined #webmachinelearning
15:00:45 Present+ Jonathan_Ding
15:00:54 Present+ Mike_Wyrzykowski
15:01:12 RRSAgent, draft minutes
15:01:14 I have made the request to generate https://www.w3.org/2026/06/04-webmachinelearning-minutes.html anssik
15:01:21 Present+ Ningxin_Hu
15:01:31 Present+ Rafael_Cintron
15:01:36 zolkis has joined #webmachinelearning
15:01:39 Present+ Zoltan_Kis
15:01:48 RRSAgent, draft minutes
15:01:50 I have made the request to generate https://www.w3.org/2026/06/04-webmachinelearning-minutes.html anssik
15:02:18 Present+ Ehsan_Toreini
15:03:03 Present+ Reilly_Grant
15:03:31 Topic: Announcements
15:03:39 Subtopic: TPAC 2026
15:03:40 RafaelCintron has joined #webmachinelearning
15:03:53 Anssi: TPAC 2026, an annual W3C conference, is in Dublin, Ireland on 26-30 October 2026
15:04:05 ... during that week, the W3C groups gather to resolve challenging issues and discuss future directions for the web platform
15:04:13 ... TPAC group meetings take place Mon, Tue, Thu and Fri
15:04:17 ... Wed is for breakout sessions and social events
15:04:24 ... my working assumption is this Working Group will meet during the TPAC week
15:04:24 ningxin has joined #webmachinelearning
15:04:34 ... before I request a meeting time from the TPAC planners, I'd like to hear from the group about any potential conflicts with other groups or preferences for meeting times during TPAC week
15:04:39 ... historically, we have met on Monday
15:04:44 ... assuming other groups also tend to stick with their established schedules, that is my default preference
15:04:47 Ehsan has joined #webmachinelearning
15:04:55 ... but I'm open to other suggestions
15:04:59 ... some in the group have shared their schedule preferences already:
15:05:02 -> https://github.com/w3c/tpac2026-meetings/issues
15:05:11 Anssi: the TPAC 2026 website is not yet live; I'll share more information when it becomes available
15:05:19 Anssi: questions, comments?
15:05:29 Topic: Web Neural Network API
15:05:46 Subtopic: Dynamic AI Offloading Protocol (DAOP) update
15:06:06 Anssi: the Dynamic AI Offloading Protocol (DAOP) incubation is ongoing and I'm pleased to welcome Jonathan Ding to share an update on this explorative work
15:06:19 ... learnings from this incubation will inform WebNN feature work wrt QoS estimation, dynamic execution routing and other features
15:06:24 ... Jonathan presents first and we will have a discussion after
15:06:29 ... we'll timebox this to 15 minutes combined
15:06:32 ... relevant presentation materials will be shared after the meeting
15:06:36 Anssi: Jonathan, please take it away
15:06:44 Slideset: daop-slides
15:06:53 [slide 2]
15:06:57 Jonathan: DAOP Exploration for Hybrid LLM / Agent scenarios
15:07:08 ... Dynamically offload to device / fallback to cloud
15:07:15 ... Routing based on (latency, accuracy)
15:07:23 ... - Latency = Estimate(Prompt, Model)
15:07:36 ... - Accuracy = P(Correct | Prompt, Model)
15:08:05 Present+ Sarah_Drasner
15:08:17 Jonathan: - Others (cost, privacy, …) in future
15:08:53 ... PoC WIP – pre-trained offline models for routing
15:09:18 ... - Latency based on static cost models with microbenchmarks / profiling data on operators
15:09:31 ... - Accuracy based on matrix factorization algorithms w/ selected LLM benchmarks – revised on RouteLLM
15:09:39 ... Deployment Considerations
15:09:50 ... - Proposing W3C standards in WebML WG – QoS, Events
15:09:53 ... - Routing Layer on top of Built-in AI of browser / Web runtime
15:09:59 [slide 3]
15:10:04 Jonathan: Latency Estimation
15:10:10 ... Estimation Model
15:10:15 ... - Static Cost Model on Raw / Fused Compute Graph
15:10:48 ... - Piecewise Log-space Interpolation in Parameter Space for key Ops, e.g.
15:11:05 ... -- (M, K, N, dtype) for MUL_MAT
15:11:10 ... -- (seqQ, seqKV, Q-heads, KV-heads) for FLASH_ATTN
15:11:16 ... Data Sources
15:11:45 ... - COLD: One-time microbenchmarks (~90s, ~1000 pts)
15:11:54 ... - WARM: Profiling data from daily inference [future]
15:12:00 [slide 4]
15:12:05 Jonathan: Initial Experiments about Accuracy based offloading
15:12:16 ... [ Jonathan presents the Two-Level Conservative Routing Decision Flow and performance of the PoC routing model ]
15:14:54 q+
15:15:07 Anssi: thank you Jonathan!
15:15:10 ack zolkis
15:15:40 Zoltan: when you run local models, it is in one part, but harness matters so context and memory management to be taken into account, or is it like running a model in the cloud?
15:16:36 Jonathan: you raise a good point that not all text inside your prompt is important, e.g. with a large context perhaps only the beginning, the system prompt and the end are important
15:16:56 ... this is important for the accuracy consideration; you need the rest for prefill, but for accuracy you need to do something smarter
15:16:57 q?
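A minimal sketch of the piecewise log-space interpolation idea from slide 3, reduced to a single parameter dimension. The function name, the sample points and the clamping at the edges of the sampled range are assumptions for illustration; the DAOP PoC interpolates over several parameters such as (M, K, N, dtype) and its actual implementation was not shown in the meeting.

```python
import bisect
import math


def interpolate_latency_log(points, n):
    """Estimate latency at size n by linear interpolation in log-log space.

    points: (size, latency_ms) microbenchmark samples, sorted by size.
    n: the size to estimate for; values outside the sampled range are
       clamped to the nearest sample (an assumption of this sketch).
    """
    sizes = [size for size, _ in points]
    if n <= sizes[0]:
        return points[0][1]
    if n >= sizes[-1]:
        return points[-1][1]
    # Find the two neighbouring samples that bracket n.
    hi = bisect.bisect_right(sizes, n)
    (x0, y0), (x1, y1) = points[hi - 1], points[hi]
    # Interpolate between log(latency) values using log(size) as the axis.
    t = (math.log(n) - math.log(x0)) / (math.log(x1) - math.log(x0))
    return math.exp(math.log(y0) + t * (math.log(y1) - math.log(y0)))
```

With two made-up samples, (256, 1.0 ms) and (1024, 16.0 ms), the estimate at 512 is about 4.0 ms, the geometric mean of the neighbours, which is what makes log-space interpolation a natural fit for costs that follow a power law in the operand sizes.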
15:17:11 Zoltan: thanks for the response
15:19:28 Jonathan: my next step would be to add profiling to the inference engine, to get data from daily runs; for privacy reasons, we don't want to expose any such data directly
15:19:55 ... for a new model you'd ask a questions what type of latency I should expect given a particular context length
15:20:19 s/questions/question
15:20:26 Jonathan: a
15:20:33 s/Jonathan: a//
15:21:22 Jonathan: the latency estimate can be bucketed in a way that provides useful information for the web developer to inform the model offload decision while preserving privacy
15:22:49 Jonathan: for WebNN, we proposed an estimateQoS API, to inform the developer if during execution another workload takes compute resources away
15:24:00 MikeW: WebGPU has a similar timing mechanism
15:24:13 https://www.w3.org/TR/webgpu/#timestamp
15:24:59 Jonathan: if we look at the algoritm, we want to keep the model simple to minimize overhead; in the future, profiling data could be more comprehensible
15:25:28 ... we see some papers suggesting that instead of complex maths, maybe put a simple 1-2 layer network to do the projection
15:25:46 ... you do this either via math interpolation or a linear neural network
15:25:59 ... I'll be exploring this further
15:26:14 q?
15:27:03 Subtopic: Effective MLComputePolicy
15:27:08 Anssi: issue #934
15:27:09 https://github.com/webmachinelearning/webnn/issues/934 -> Issue 934 Effective MLComputePolicy exposure (by anssiko) [policy selection]
15:27:36 ... A number of proposals were shared in this issue, in no particular order:
15:27:49 ... - 1) compilation metrics & runtime estimates by MikeW
15:28:10 mtavenrath has joined #webmachinelearning
15:28:11 ... - 2) low latency v high throughput tradeoff implications by Dwayne
15:28:22 ... - 3) strict hints to fail at build by MarkusH, Reilly to speak to this?
15:28:37 ... - 4) "low-latency" and "precision" hints by Dwayne
15:28:44 ... I propose we discuss each of these separately to stay focused
15:28:51 ... prefer to start with a use case or rationale/motivation for the proposed design, then discuss the proposal itself
15:28:58 -> 1) compilation metrics & runtime estimates https://github.com/webmachinelearning/webnn/issues/934#issuecomment-4515928955
15:28:58 https://github.com/webmachinelearning/webnn/issues/934 -> Issue 934 Effective MLComputePolicy exposure (by anssiko) [policy selection]
15:29:43 MikeW: if the information we're looking for is latency or other factors of that form, we could expose this directly and mitigate fingerprinting concerns by bucketing every 10-20 ms
15:29:57 ... I think sites could make a decision on whether they want to use WebNN based on that information
15:30:28 Anssi: how has the WebGPU experience been with the similar API?
15:30:46 q+
15:30:47 MikeW: we have seen several sites adopt these metrics
15:30:50 q?
15:30:52 ack reillyg
15:31:50 Reilly: my question to MikeW: the WebGPU version of this provides after-execution metrics; to apply this to WebNN, are you implying we'd provide these metrics as the model is being used, or that implementations would benchmark at built time and provide that data?
15:32:02 MikeW: estimated metrics seem to be what is desired?
15:32:14 ... UA could look at what was generated and make an estimate based on that information
15:32:14 q?
15:32:32 q+
15:32:38 ack RafaelCintron
15:33:15 Rafael: in the issue it was said the OS can change depending on where the model runs
15:33:22 ... how reliable is the initial estimate?
15:33:34 MikeW: repeatedly calling this could give different estimates
15:33:42 ... depending on how UA decides to implement this
15:34:20 ... in WebGPU this is part of the render pass
15:34:21 q?
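A minimal sketch of the latency bucketing MikeW and Jonathan describe, where a raw estimate is coarsened before it is exposed so that it carries less fingerprinting signal. The function name, the 20 ms default and the choice to round up rather than to the nearest bucket are assumptions for illustration; no such behaviour is specified in WebNN or WebGPU.

```python
import math


def bucket_latency_ms(latency_ms, bucket_ms=20.0):
    """Round a latency estimate up to the next multiple of bucket_ms.

    Rounding up (an assumption of this sketch) means the exposed value
    never understates the real estimate.
    """
    if latency_ms < 0 or bucket_ms <= 0:
        raise ValueError("latency must be >= 0 and bucket size must be > 0")
    return math.ceil(latency_ms / bucket_ms) * bucket_ms
```

Every raw estimate in the range (20, 40] then reports as 40.0, so a site can still tell "about 40 ms" from "about 200 ms" without learning the fine-grained timing of the device.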
15:34:41 -> 2) low-latency differs from throughput https://github.com/webmachinelearning/webnn/issues/934#issuecomment-4516188947
15:34:42 https://github.com/webmachinelearning/webnn/issues/934 -> Issue 934 Effective MLComputePolicy exposure (by anssiko) [policy selection]
15:35:32 Dwayne: highest throughput may not be always lowest latency due to the need move data back and forth
15:35:51 q?
15:35:53 q+
15:35:57 ack Mike_Wyrzykowski
15:36:10 MikeW: if we did have these metrics, we could have metrics for latency and total runtime duration
15:36:17 q+
15:36:21 ack RafaelCintron
15:36:31 https://www.w3.org/TR/webgpu/#dom-gpuqueue-onsubmittedworkdone
15:37:23 Rafael: wanted to say WebGPU has onsubmittedworkdone that fires when the task completes
15:37:36 ... timer is more precise
15:37:37 q?
15:38:19 -> 3) strict hints to fail at build https://github.com/webmachinelearning/webnn/issues/934#issuecomment-4533601729
15:38:19 https://github.com/webmachinelearning/webnn/issues/934 -> Issue 934 Effective MLComputePolicy exposure (by anssiko) [policy selection]
15:39:20 Reilly: I think the metrics ideas here are good; my main concern in this example of WebNN is that there are so many different paths the UA can take to implement the model, that just looking at latency and throughput may not fully capture the optimization space for the applicaion
15:39:51 ... MarkusH's proposal initial had interop concerns, "build this model with there performance standards, otherwise fail"
15:40:40 ... the proposal here is that in the process of building, what HW the model is compatible with, the UA can trigger a build failure if the model is not compatible to be executed on low-power or high-performance HW
15:41:04 ... proposal to let the UA decide where to execute the model and pick the HW where to execute
15:41:19 ... if not compatible with the HW, signal a failure
15:41:41 ... some frameworks provide some level of fallback, so considering whether strict as an option would all the UA to move things around
15:41:57 ... "string" may not be the best term, better the model is compatible with a particular execution path
15:42:08 s/string/strict
15:42:10 q?
15:42:13 q+
15:42:18 ack ningxin
15:42:47 Ningxin: thanks for the discussion, this reminds me of the performance discussion, data moving
15:43:29 ... audio suppression use case, want to run on CPU because moving data around is overhead and as a small model no need to move data
15:43:46 ... developer may want to hint the developer from where the data comes from and cares about overhead of data moving
15:44:41 ... "I want to do compute close to the data" hint may be what helps in such use cases
15:44:48 ... IPC also introduced overhead
15:44:50 q?
15:45:39 Ningxin: pipeline use case, "I want this WebNN context to interop with WebGPU rendering pipeline"
15:46:19 ... for audio noise suppression, we need something other than just "fallback", I know CPU is not a good name to use
15:46:30 ... want to focus on the existing use cases
15:46:57 ... compute policy now focuses on compute (high TOPS), not data moving
15:46:58 q?
15:47:07 -> 4) "low-latency" and "precision" hints https://github.com/webmachinelearning/webnn/pull/923#issuecomment-4305974696
15:47:08 https://github.com/webmachinelearning/webnn/pull/923 -> Pull Request 923 Refactor device selection: Rename to computePolicy, remove accelerated, and add fallback (by mingmingtasd) [policy selection]
15:48:12 q+
15:48:14 Dwayne: re "low-latency" as Ningxin said, for "precision" I meant another signal for this
15:48:17 ack ningxin
15:49:14 Ningxin: for "precision", in WebNN we expect the inference engine to adhere to what is defined for the graph, do you mean if we introduce "precision" we allow the inference engine to do low-precision compute, e.g. casting?
15:49:38 Dwayne: this does not apply currently, in the future we could have an opt-in to say "I want higher precision to matter"
15:50:08 MarkusT: precision is important for trigonometric functions
15:50:34 ... if there would be a way to say I'm find chopping off some low bits it could be useful for some models
15:50:35 q?
15:51:24 Dwayne: by default I'd lean on conformance and allow hints to loosen that expectation via hints
15:51:24 q?
15:51:48 q?
15:53:10 q+
15:53:15 ack zolkis
15:53:45 Zoltan: wanted to mention accelerators will evolve and I'd prefer to record the use case preference so the implementation can map it to the current and future silicon
15:55:42 Anssi: would like to gauge whether any of these 4 proposals have strong support from the group, and if so, which one(s) we should prioritize for further exploration and potential implementation?
15:55:51 ... feedback welcome via GH on that
15:56:03 Subtopic: Dynamic shapes
15:56:08 Anssi: issue #883
15:56:08 https://github.com/webmachinelearning/webnn/issues/883 -> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific]
15:56:21 Anssi: we have new dispatch-time shape validation implementation experience delivered by Bin Miao, thank you!
15:56:25 -> https://github.com/webmachinelearning/webnn/issues/883#issuecomment-4486385528
15:56:26 https://github.com/webmachinelearning/webnn/issues/883 -> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific]
15:57:57 Ningxin: team implemented a prototype of dispatch-time shape validation
15:58:40 ... this means we verify that the intermediate and output shapes generated throughout the entire graph by actual inputs are valid
15:59:10 ... four models tested with different input sizes
15:59:23 ... O(N) complexity with the number of nodes in the graph
16:00:46 ... question to the group "Can we compose the equivalent of 0 and -1 with the newly proposed shape and dynamicReshape (or reshapeDynamic)?"
16:01:34 ... it would be useful if WebNN allowed callers to compute the final graph's shape when they specify the inputs, before the dispatch
16:01:36 ... question to the group, should we expose the inferred output shapes a caller?
16:01:44 s/shapes/shapes to
16:01:45 q?
16:03:57 Ningxin: the two questions we can resolve via GH issue discussion; we plan to do more model testing and report results to the group
16:04:17 ... we know this is a highly sought after feature so we want to make it real
16:04:36 q?
16:05:35 RRSAgent, draft minutes
16:05:37 I have made the request to generate https://www.w3.org/2026/06/04-webmachinelearning-minutes.html anssik
16:21:50 s|daop-slides|https://raw.githubusercontent.com/webmachinelearning/daop/refs/heads/main/presentations/2026-06-04-daop-update.pdf
16:21:52 RRSAgent, draft minutes
16:21:54 I have made the request to generate https://www.w3.org/2026/06/04-webmachinelearning-minutes.html anssik
16:28:03 s|https://raw.githubusercontent.com/webmachinelearning/daop/refs/heads/main/presentations/2026-06-04-daop-update.pdf|https://github.com/webmachinelearning/daop/blob/main/presentations/2026-06-04-daop-update.pdf
16:28:05 RRSAgent, draft minutes
16:28:06 I have made the request to generate https://www.w3.org/2026/06/04-webmachinelearning-minutes.html anssik
16:31:27 s/is in one part/is one part
16:32:33 s/algoritm/algorithm
16:42:35 s/built time/build time
16:43:16 s/change depending on where/decide to change where
16:43:36 s/need move/need to move
16:44:20 s/applicaion/application
16:44:36 s/initial had/initially had
16:44:48 s/there performance/these performance
16:45:32 s/all the/allow the
16:46:12 s/the developer from/from
16:47:06 s/and cares/and indicate preference
16:47:20 s/introduced/introduces
16:48:13 s/find/fine
16:48:27 s/via hints//
16:49:37 s/sought after/sought-after
16:49:46 RRSAgent, draft minutes
16:49:48 I have made the request to generate https://www.w3.org/2026/06/04-webmachinelearning-minutes.html anssik
17:10:34 anssik has joined #webmachinelearning
17:10:34 chrishtr has joined #webmachinelearning
17:10:34 christianliebel has joined #webmachinelearning
17:10:34 hyojin has joined #webmachinelearning
17:10:34 gregwhitworth has joined #webmachinelearning
17:10:34 vmpstr has joined #webmachinelearning
17:10:34 tomayac27 has joined #webmachinelearning
17:10:34 jyasskin has joined #webmachinelearning
17:10:34 awafaa has joined #webmachinelearning
17:10:34 vasilii has joined #webmachinelearning
18:14:32 anssik has joined #webmachinelearning
18:14:32 chrishtr has joined #webmachinelearning
18:14:32 christianliebel has joined #webmachinelearning
18:14:32 hyojin has joined #webmachinelearning
18:14:32 gregwhitworth has joined #webmachinelearning
18:14:32 vmpstr has joined #webmachinelearning
18:14:32 tomayac27 has joined #webmachinelearning
18:14:32 jyasskin has joined #webmachinelearning
18:14:32 awafaa has joined #webmachinelearning
18:14:32 vasilii has joined #webmachinelearning
18:26:38 Zakim has left #webmachinelearning
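A minimal sketch of the dispatch-time shape validation Ningxin described under "Dynamic shapes": walk the graph once in topological order and infer each intermediate and output shape from the actual input shapes, failing on the first incompatible node. The graph encoding, the function name and the two toy operators (2-D matmul and same-shape add, with no broadcasting) are assumptions for illustration; this is not the WebNN API or the prototype's code.

```python
def infer_shapes(nodes, input_shapes):
    """Infer every node's output shape from concrete graph input shapes.

    nodes: list of (name, op, input_names) in topological order, where op
           is "matmul" or "add" and each node takes exactly two inputs.
    input_shapes: dict mapping graph input names to shape tuples.
    Returns a dict of all known shapes; raises ValueError on a mismatch.
    Each node is visited once, so the cost is O(N) in the node count.
    """
    shapes = dict(input_shapes)
    for name, op, inputs in nodes:
        a, b = (shapes[i] for i in inputs)
        if op == "matmul":
            # Only 2-D operands here: (m, k) x (k, n) -> (m, n).
            if len(a) != 2 or len(b) != 2 or a[1] != b[0]:
                raise ValueError(f"{name}: cannot matmul {a} x {b}")
            shapes[name] = (a[0], b[1])
        elif op == "add":
            # Toy rule: operands must match exactly (no broadcasting).
            if a != b:
                raise ValueError(f"{name}: cannot add {a} + {b}")
            shapes[name] = a
        else:
            raise ValueError(f"{name}: unsupported op {op}")
    return shapes
```

Because the function returns the full shape map, the same pass could also answer the second question raised to the group, namely exposing the inferred output shapes to the caller before dispatch.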