14:55:49 RRSAgent has joined #webmachinelearning
14:55:53 logging to https://www.w3.org/2026/06/18-webmachinelearning-irc
14:55:54 RRSAgent, make logs Public
14:55:55 please title this meeting ("meeting: ..."), anssik
14:56:06 Meeting: WebML WG Teleconference – 18 June 2026
14:56:06 Chair: Anssi
14:56:06 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2026-06-18-wg-agenda.md
14:56:07 Scribe: Anssi
14:56:09 scribeNick: anssik
14:56:23 Present+ Anssi_Kostiainen
14:56:28 RRSAgent, draft minutes
14:56:29 I have made the request to generate https://www.w3.org/2026/06/18-webmachinelearning-minutes.html anssik
14:58:14 dwayner has joined #webmachinelearning
14:58:30 Present+ Dwayne_Robinson
14:59:12 Mike_Wyrzykowski has joined #webmachinelearning
14:59:20 Present+ Mike_Wyrzykowski
15:00:00 ydaniv has joined #webmachinelearning
15:01:02 Present+ Yehonatan_Daniv
15:01:02 Present+ Bryan_Bernhart
15:01:06 Present+ Ningxin_Hu
15:02:09 Present+ Rafael_Cintron
15:02:20 Present+ Reilly_Grant
15:02:29 RRSAgent, draft minutes
15:02:30 I have made the request to generate https://www.w3.org/2026/06/18-webmachinelearning-minutes.html anssik
15:03:06 Anssi: please join me in welcoming the latest new participants to the WG:
15:03:11 ... Severin Ferrand from Google
15:03:19 ... Yehonatan Daniv from Wix.com
15:03:27 RafaelCintron has joined #webmachinelearning
15:05:08 ... Julien Bataille from Rakuten Group, Inc.
15:05:12 ... welcome all!
15:05:19 ningxin has joined #webmachinelearning
15:05:24 Topic: Announcements
15:05:34 Subtopic: TPAC 2026
15:05:41 Anssi: as discussed, TPAC 2026 takes place in Dublin, Ireland on 26-30 October 2026
15:05:58 ... I have requested Monday, 26 October 2026, for this Working Group, to be confirmed by TPAC planners
15:06:06 ... I have also started a F2F agenda issue where I invite the group to share their thoughts on potential agenda topics for the TPAC F2F meeting:
15:06:10 -> https://github.com/webmachinelearning/meetings/issues/39
15:06:11 https://github.com/webmachinelearning/meetings/issues/39 -> Issue 39 WebML WG/CG F2F Agenda - TPAC 2026 (Dublin, Ireland) (by anssiko)
15:06:12 Ehsan has joined #webmachinelearning
15:06:30 Anssi: the expectation is that we use the F2F to discuss both the shorter-term issues and the longer-term horizon, with new features for future directions
15:06:48 ... we also provide space for sharing demos and implementation updates with the broader community
15:07:08 Anssi: I have planned the Community Group meeting for Tuesday, 27 October 2026, so if you attend TPAC you may want to consider attending that as well to connect with the broader community
15:07:27 ... we operate in the same space and there is a lot of overlap between the two groups, TPAC is a prime opportunity for cross-pollination of ideas between the groups
15:07:46 ... I expect the TPAC group schedule to be finalized later this month, I will share more information as soon as it becomes available
15:07:50 Anssi: questions, comments?
15:07:58 q?
15:08:06 q+ to ask a question
15:08:07 ack anssik
15:08:07 anssik, you wanted to ask a question
15:08:20 Subtopic: New charter proposal
15:08:26 Anssi: I want to share an update on the new charter proposal for this Working Group
15:08:42 ... as you may have noticed, the WebMCP proposal being developed in the WebML CG has generated excitement and momentum
15:08:47 -> https://github.com/webmachinelearning/webmcp
15:09:49 Anssi: I added the WebMCP deliverable to the WebML CG last year, and I'm pleased to see the progress that has been made by the WebML CG, and the excitement WebMCP has generated within the broader web community
15:10:07 ... WebMCP is one of the fastest growing incubations in the history of the W3C's Community Group (CG) program
15:10:24 ... the W3C CEO and senior leadership team are supportive of evolving the WebML WG charter to include WebMCP as a deliverable
15:10:41 ... the next step is to finalize the formal charter proposal document that outlines the proposed changes to the charter, including the addition of WebMCP as a deliverable, and submit it for review by the W3C membership
15:10:44 ... the draft charter proposal is available for you to review:
15:10:49 -> https://github.com/w3c/charter-drafts/pull/829/changes
15:10:50 https://github.com/w3c/charter-drafts/pull/829 -> Pull Request 829 [wg/webmachinelearning] Adds WebMCP in scope (by plehegar)
15:11:15 Anssi: as this is an important deliverable, I expect active discussion in the W3C community, a healthy part of the process
15:11:21 ... I welcome your feedback on the draft charter proposal on our next call
15:11:32 ... in the meantime, if you have any immediate thoughts or questions about the draft charter proposal, please feel free to share them now or reach out to me directly
15:11:41 q?
15:11:55 Topic: Web Neural Network API
15:12:00 gb, this is webmachinelearning/webnn
15:12:01 anssik, OK.
15:12:13 RafaelCintron has joined #webmachinelearning
15:12:15 Subtopic: WebGPU interop
15:12:26 Anssi: this topic is to discuss WebGPU interop and expected spec changes
15:12:37 ... I have invited Bryan to the call who has signaled interest in this topic and has been actively involved in the WebGPU interop discussions
15:12:49 ... the MLTensor Explainer discusses WebGPU interop:
15:12:53 -> https://github.com/webmachinelearning/webnn/blob/main/mltensor-explainer.md#webgpu-interop
15:12:58 Anssi: we have a few "webgpu interop" related issues:
15:13:02 -> "webgpu interop" issues https://github.com/webmachinelearning/webnn/labels/webgpu%20interop
15:13:06 Anssi: issue #529 for WebNN timelines
15:13:08 https://github.com/webmachinelearning/webnn/issues/529 -> Issue 529 Specify WebNN timelines (by a-sully) [webgpu interop]
15:13:13 another issue #343 for WebGPU interop sample code
15:13:14 https://github.com/webmachinelearning/webnn/issues/343 -> Issue 343 Add sample code for WebGPU interop (by huningxin) [editorial] [webgpu interop]
15:13:38 Anssi: are there any other spec changes expected either in the WebNN or WebGPU spec, or both, to enable WebGPU interop?
15:13:48 q+
15:13:52 ... have we ported over all the necessary WebGPU interop bits from the MLTensor explainer to the WebNN spec?
15:13:54 ack reillyg
15:15:04 Mike_Wyrzykowski2 has joined #webmachinelearning
15:15:11 Reilly: we have a Chromium prototype, the JS API itself is implemented to some extent, platform-specific bits e.g. HW buffers, most mature on macOS, being implemented on Windows and Linux with LiteRT
15:15:43 ... it is currently in the explainer only, WebGPU interop changes have not been ported over to the WebNN spec
15:16:08 ... Bryan made a change to make the export feature synchronous to improve pipelining
15:16:16 q?
15:16:46 Bryan: I agree with Reilly, the explainer is out of date, a lot of people want to use this feature and would like to see this well specified
15:17:20 +1 to specifying it more formally and updating the explainer.
15:17:24 +1
15:17:27 +1
15:17:30 ... more formal specification would be helpful, does the group support specifying WebGPU interop parts in the spec?
15:17:53 q+
15:18:16 +1
15:18:29 ack ningxin
15:19:07 Ningxin: about the WebGPU interop sample, I shared a demo at TPAC 2025, I recommend moving that sample to the group's webnn-samples repo
15:20:02 +1
15:20:05 +1
15:20:06 +1
15:20:08 RESOLUTION: The group acknowledges the importance of WebGPU interop and supports porting over the necessary features to the WebNN API spec from the MLTensor explainer.
15:20:19 q+
15:20:25 ack reillyg
15:20:49 Reilly: Bryan mentioned some difficulties, was that about undocumented buffering requirements maybe?
15:21:12 Bryan: with today's usage, if you import a buffer, it is implicit what format each side of the APIs should use
15:21:22 Reilly: these could be formally documented in the spec
15:21:57 ... when implementing this on Core ML there are some limitations in terms of data types and shapes for input buffers
15:22:36 ... we partially worked around these, probably an open question, something to be exposed through opSupportLimits
15:23:10 Bryan: there was another issue about device selection that breaks interop, we need to clarify how that will work
15:23:33 Reilly: there's a version of context creation that takes a GPUDevice
15:23:52 ... implicit hint to pick a device that is efficient
15:24:08 ... some platforms may ignore this hint
15:24:41 ... looking at ML inference frameworks, GPUs seem to like tensors in certain shapes
15:24:49 q?
15:25:16 Bryan: we have to come up with guidelines for this
15:25:24 q?
15:26:29 Reilly: can you open a new GH issue for this Bryan?
15:26:43 +1 to open an issue about gpudevice passing
15:26:54 Bryan: will do that, filing separate issues
15:26:55 q?
15:27:02 q?
15:27:09 Subtopic: Effective MLComputePolicy
15:27:14 Anssi: issue #934
15:27:15 https://github.com/webmachinelearning/webnn/issues/934 -> Issue 934 Effective MLComputePolicy exposure (by anssiko) [policy selection] [Agenda+]
15:27:23 Anssi: last time we received an update on the Dynamic AI Offloading Protocol (DAOP) incubation by Jonathan
15:27:33 ... we had a limited timebox for DAOP, so on this call, I wanted to share some additional information about the explainer to inform the effective MLComputePolicy discussion
15:27:42 -> The estimateQoS() API https://github.com/webmachinelearning/daop/tree/main#the-estimateqos-api
15:27:56 ... the DAOP explainer contains an estimateQoS() API for performance negotiation
15:28:25 ... using the estimateQoS() API the web developer can get a performance tier estimate for executing the graph on the local device, which can help them make informed decisions about offloading to the cloud or executing locally
15:28:34 [[
15:28:34 dictionary MLQoSReport {
15:28:34 MLPerformanceTier performanceTier;
15:28:34 };
15:28:36 partial interface MLContext {
15:28:36 Promise estimateQoS(MLGraph graph, optional MLQoSOptions options);
15:28:36 };
15:28:39 dictionary MLQoSOptions {
15:28:39 // Input characteristics
15:28:39 record inputDescriptors;
15:28:42 // Weights characteristics (Optional)
15:28:42 boolean weightsSparsity = false;
15:28:42 };
15:28:43 ]]
15:29:04 -> Performance Tiers https://github.com/webmachinelearning/daop/tree/main#performance-tiers
15:29:27 Anssi: performance tiers are represented as Tier strings to avoid fingerprinting and to allow implementations to evolve the exact tier boundaries based on their specific hardware capabilities and performance characteristics:
15:29:37 Anssi: Tier / Indicative Latency / Interpretation
15:29:39 ... "excellent" / < 16 ms / Real-time (60 fps frame budget)
15:29:39 ... "good" / < 100 ms / Interactive responsiveness
15:29:39 ... "fair" / < 1 s / Responsive for non-real-time tasks
15:29:39 ... "moderate" / < 10 s / Tolerable for batch or one-shot tasks
15:29:40 ... "slow" / < 30 s / Noticeable wait
15:29:40 ... "very-slow" / < 60 s / Long wait
15:29:40 ... "poor" / ≥ 60 s / Likely unacceptable for most use cases
15:30:24 Anssi: the important point to note is that the exact tier boundaries are implementation-defined
15:30:34 ... the explainer also provides an "Adaptive Background Blur" example to make the API more concrete:
15:30:38 -> https://github.com/webmachinelearning/daop/tree/main#example-code-adaptive-background-blur
15:30:57 Anssi: in this example, if the tier is one of "excellent", "good", "fair", or "moderate" the graph is executed locally, otherwise the graph is offloaded to the cloud
15:31:17 -> Boolean Requirement API https://github.com/webmachinelearning/daop/tree/main#1-boolean-requirement-api
15:31:43 Anssi: this meetsRequirement() API returns, asynchronously and via events, a "can meet requirement" boolean response given a tier expectation:
15:31:50 [[
15:31:50 partial interface MLContext {
15:31:50 Promise meetsRequirement(MLGraph graph, MLPerformanceTier requiredTier, optional MLQoSOptions options);
15:31:50 };
15:31:52 interface MLQoSChangeEvent : Event {
15:31:52 readonly attribute boolean meetsRequirement;
15:31:52 };
15:31:53 ]]
15:32:40 Anssi: with these DAOP insights in mind, I want to circle back to the effective MLComputePolicy discussion in issue #934
15:32:41 https://github.com/webmachinelearning/webnn/issues/934 -> Issue 934 Effective MLComputePolicy exposure (by anssiko) [policy selection] [Agenda+]
15:32:44 ... the following proposals were discussed last time:
15:32:50 ... - 1) compilation metrics & runtime estimates by MikeW
15:32:50 ... - 2) low latency v high throughput tradeoff implications by Dwayne
15:32:50 ... - 3) strict hints to fail at build by MarkusH
15:32:50 ... - 4) "low-latency" and "precision" hints by Dwayne
15:33:21 Anssi: looking at the recent comments, I see an exchange between Ningxin and MikeW about runtime drift considerations
15:33:28 ... Ningxin acknowledged MikeW's point that system load, thermal state, and accelerator contention do change over time
15:33:35 ... Ningxin proposed an intendedComputePolicies API, a compile-time signal to address MarkusH's use case "if fallback, route to cloud":
15:33:39 -> https://github.com/webmachinelearning/webnn/issues/934#issuecomment-4626388359
15:33:40 https://github.com/webmachinelearning/webnn/issues/934 -> Issue 934 Effective MLComputePolicy exposure (by anssiko) [policy selection] [Agenda+]
15:34:16 Ningxin: I want to find a middle ground, MikeW makes a good point that runtime characteristics are dynamic, timestamp query would be one approach
15:35:19 ... my point was for an application, for MarkusH's use case, the intended compute policy, the best result the context can deliver against the user's hint, decided at MLGraphBuilder.build() and stable for the MLGraph's lifetime
15:36:05 ... the API could have some signal to let the application know the compile-time intention for the graph execution plan, we make it clear this could change at runtime, if runtime monitoring is needed, other solutions need to be used
15:36:16 Anssi: the proposed API for this is as follows:
15:36:18 [[
15:36:18 interface MLGraph {
15:36:18 // Compile-time assignment, not a runtime guarantee.
15:36:19 readonly attribute FrozenArray intendedComputePolicies;
15:36:19 }
15:36:19 ]]
15:36:24 q?
15:37:08 MikeW: the association of fallback with a guaranteed CPU path seems questionable, because there can be multiple CPU paths
15:38:10 ... hints for web apps to make a decision is a wrong API shape
15:39:34 ... the current proposal seems to be that hints can be translated to performance characteristics, it would be better to provide performance characteristics, for instance, latency, how many milliseconds bucketized, runtime execution, bucketized
15:39:42 q?
15:39:58 Present+ Ehsan_Toreini
15:40:00 q+
15:40:05 ack reillyg
15:40:53 Reilly: MikeW, in the scenario where we are providing bucketed performance characteristics, how does that interact with how the graph is constructed at a later stage?
15:41:16 ... how much of a guarantee is this, and how much could sites use this information?
15:41:39 Q+
15:42:16 MikeW: you can poll the API, or the granularity of the metrics will be made less precise
15:42:18 q?
15:42:23 ack RafaelCintron
15:42:53 Rafael: question to MikeW, is your proposal to have the UA take the model and guess how long it takes to run and provide that information before inference?
15:43:03 ... and during runtime, tell how long it took?
15:43:16 MikeW: I don't have a preference and would be fine with either of these options
15:43:32 q+
15:44:00 Rafael: the second solution seems a better fit, the guessing approach, how would you implement that on Core ML?
15:44:34 MikeW: we can guess when we compile the model, you get back information from Core ML, and can guess based on this, consider it a rough hint
15:44:50 q+
15:44:57 Rafael: if all ops are running on the GPU the guess would be "fast"
15:45:22 ... I wonder how well that would work in practice, how good a guess our "guess" would be
15:45:23 q?
15:45:28 ack reillyg
15:46:10 Reilly: I think that guessing is not required, you can run the model, and in the heat of loading the benchmark would be worse than at runtime
15:46:27 ... possible to implement, yes, but not clear on the details
15:47:18 ... related discussion for CPU Performance API on blink-dev
15:48:05 ... multiple pipelines could submit to hardware, changes are dynamic, if performance is not adequate could switch to another compute unit
15:48:45 ... applications are capable of dynamic scaling, but in ML models, loading an ML model is a significant upfront cost, loading and compiling both are costly
15:49:10 ... the purpose of an API like this is to really allow the developer to pick something to do initially that is close to correct
15:49:46 ... the developer may look at this performance information and see it is "mid-tier" and it is much more than they need, and they could download a bigger model in the background, while the app is running fine using a smaller model
15:50:13 ... the space seems to me like solving the "page load time problem", how to decide, in a reasonable amount of time, what to load initially to allow the page to function
15:50:27 ... all the solutions discussed don't solve that problem
15:50:28 q?
15:50:41 ack ningxin
15:51:16 Ningxin: by revisiting MarkusH's use case I observe that it is not only about model inference performance, but also about wanting to avoid data copies
15:51:41 ... use case about noise suppression, audio comes from CPU and the application does not want to move the work to another compute unit
15:51:50 ... this is something I suppose the app wants to avoid
15:52:24 ... also Markus mentions the need to get the execution to be scheduled to a low-latency enabling compute unit
15:52:52 ... I want to add to MikeW's proposal, if we have performance characteristics, could we also indicate if there is a data copy?
15:53:09 ... e.g. add low-latency signal?
15:53:10 q?
15:54:12 MikeW: what Ningxin and Reilly said sounds good, both ideas are valuable and we should consider real use cases such as MarkusH's
15:54:17 q?
15:55:07 q?
15:57:05 Subtopic: Discrepancies with maxPool2d padding behavior
15:57:08 Anssi: issue #935
15:57:09 https://github.com/webmachinelearning/webnn/issues/935 -> Issue 935 Discrepancies with maxPool2d padding behavior (by philloooo) [operator specific] [Agenda+]
15:57:28 ... Phillis reports that maxPool2d with roundingType "ceil" produces inconsistent results for different backends when pooling windows cover only padding elements
15:57:32 -> https://www.w3.org/TR/webnn/#dom-mlroundingtype-ceil
15:57:41 Anssi: questions posed to the group:
15:57:55 ... - "What is the expected behavior when a pooling window covers only padding elements for maxPool2d?"
15:58:04 ... - "Should the spec clarify whether padding elements are treated as -Infinity (ignored) or if there is a fallback value like 0 when no valid input elements are in the window?"
15:58:18 Anssi: Ningxin proposed "Ignoring padding elements for maxPool2d makes sense to me. A window covering only padding would also be ignored."
15:58:32 ... Dwayne +1'd "ignoring padding elements for maxPool2d"
15:58:41 ... suggests that "a window covering only padding" needs some value for the pool() op, either 0 or -Infinity
15:58:55 q?
15:59:58 Ningxin: for maxPool2d if we ignore the window, the value does not matter, we should make the spec text clear that we also ignore a window covering only padding elements, and should change the output size calculation to adjust the output size with stride and padding if a window only covers padding elements
16:00:42 Dwayne: I mentioned in general, you can't sample outside window and padding size, making this point moot
16:01:06 Ningxin: in my proposal "ignore padding elements" if the window covers partially, this is implementation-defined, spec can say ignore this
16:01:23 ... the ONNX reference implementation returns NaN for this case
16:01:32 q?
16:04:18 RESOLUTION: Ignore padding elements for maxPool2d and adjust output size when a window covers only padding. (issue #935)
16:04:23 RRSAgent, draft minutes
16:04:25 I have made the request to generate https://www.w3.org/2026/06/18-webmachinelearning-minutes.html anssik
16:04:31 +1
16:08:33 s/WIndows/Windows
16:08:51 s/an explainer/the explainer
16:09:16 s/ported over the/ported over to the
16:10:20 s/need to use/should use
16:26:02 RRSAgent, draft minutes
16:26:04 I have made the request to generate https://www.w3.org/2026/06/18-webmachinelearning-minutes.html anssik
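
Editor's note on the maxPool2d resolution (issue #935): the following Python is a minimal sketch of one way to read the resolution, not spec text and not any backend's implementation. The names pooled_size and max_pool_2d are hypothetical. The output size adjustment is an assumption borrowed from the rule PyTorch applies in ceil mode (drop a trailing window that would start past the real input), and the sketch assumes each padding is smaller than the window. Padding positions are skipped rather than given a value, so neither 0 nor -Infinity ever enters the result.

```python
import math


def pooled_size(input_size, window, stride, pad_begin, pad_end, ceil_mode):
    """Output length along one axis of a pooling op."""
    span = input_size + pad_begin + pad_end - window
    steps = math.ceil(span / stride) if ceil_mode else span // stride
    out = steps + 1
    # Assumed adjustment: with "ceil" rounding, drop a trailing window that
    # would start past the last real input element (it would see only padding).
    if ceil_mode and (out - 1) * stride >= input_size + pad_begin:
        out -= 1
    return out


def max_pool_2d(x, window, stride, padding, ceil_mode=False):
    """Max-pool a 2-D list of numbers, skipping padding positions entirely.

    window and stride are (height, width); padding is (top, bottom, left, right).
    """
    in_h, in_w = len(x), len(x[0])
    top, bottom, left, right = padding
    out_h = pooled_size(in_h, window[0], stride[0], top, bottom, ceil_mode)
    out_w = pooled_size(in_w, window[1], stride[1], left, right, ceil_mode)
    result = []
    for oy in range(out_h):
        # Clip the window to the real input so padding never contributes.
        y0 = max(oy * stride[0] - top, 0)
        y1 = min(oy * stride[0] - top + window[0], in_h)
        row = []
        for ox in range(out_w):
            x0 = max(ox * stride[1] - left, 0)
            x1 = min(ox * stride[1] - left + window[1], in_w)
            values = [x[iy][ix] for iy in range(y0, y1) for ix in range(x0, x1)]
            if not values:
                # Not expected once the output size has been adjusted.
                raise ValueError("window covers only padding")
            row.append(max(values))
        result.append(row)
    return result
```

For a 4x4 input with a 2x2 window, stride 2, and one element of bottom and right padding, unadjusted "ceil" rounding would give a 3x3 output whose last row and column see only padding; the sketch returns a 2x2 output instead. With an all-negative input, a partially covered window returns the largest real element rather than 0.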