14:57:36 RRSAgent has joined #webmachinelearning
14:57:40 logging to https://www.w3.org/2026/05/21-webmachinelearning-irc
14:57:40 Meeting: WebML WG Teleconference – 21 May 2026
14:57:40 RRSAgent, make logs Public
14:57:41 please title this meeting ("meeting: ..."), anssik
14:57:44 Chair: Anssi
14:57:55 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2026-05-21-wg-agenda.md
14:58:00 Scribe: Anssi
14:58:06 scribeNick: anssik
15:00:25 Present+ Ningxin_Hu
15:00:28 Present+ Dwayne_Robinson
15:01:11 Present+ Anssi_Kostiainen
15:01:17 RRSAgent, draft minutes
15:01:18 I have made the request to generate https://www.w3.org/2026/05/21-webmachinelearning-minutes.html anssik
15:01:25 ningxin has joined #webmachinelearning
15:01:39 Present+ Bryan_Bernhart
15:01:55 DwayneR has joined #webmachinelearning
15:02:23 Present+ Reilly_Grant
15:03:27 Ehsan has joined #webmachinelearning
15:03:44 Anssi: please note gh aka Ghurlbot aka GitHub URL robot is on vacation today, so the minutes will miss some fancy features such as direct links to GH issues and pulls ;)
15:03:44 RafaelCintron has joined #webmachinelearning
15:03:52 Present+ Ehsan_Toreini
15:04:25 Present+ Rafael_Cintron
15:04:36 RRSAgent, draft minutes
15:04:38 I have made the request to generate https://www.w3.org/2026/05/21-webmachinelearning-minutes.html anssik
15:05:20 very interesting!
15:05:47 Topic: Web Neural Network API
15:05:58 Subtopic: Low-precision floating-point data types
15:06:13 Anssi: issue #930
15:06:18 ... last time we resolved to survey the existing backends' support for low-precision floating-point data types
15:06:24 ... we have survey results contributed by Dwayne and MarkusT, thanks!
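Background for the survey: bfloat16 keeps float32's sign bit and 8-bit exponent and drops the low 16 mantissa bits, leaving 7, so it spans roughly the same range as float32 at reduced precision. Converting between the two is a 16-bit shift plus rounding, which is why a backend without native bfloat16 support could plausibly emulate it with casts at some storage cost. The TypeScript sketch below shows that conversion with round-to-nearest-even; the function names are hypothetical, and nothing here is taken from the WebNN spec or any implementation.

```typescript
// Hypothetical helpers (not part of WebNN): float32 <-> bfloat16 bit conversion.
// bfloat16 = float32's sign bit + 8-bit exponent + top 7 mantissa bits.

// Scratch buffers that reinterpret the same 4 bytes as float32 or uint32.
const scratchF32 = new Float32Array(1);
const scratchU32 = new Uint32Array(scratchF32.buffer);

// float32 -> bfloat16 bits (0..0xffff), rounding to nearest, ties to even.
function float32ToBfloat16Bits(value: number): number {
  scratchF32[0] = value;
  const bits = scratchU32[0];
  const isNaN32 = (bits & 0x7f800000) === 0x7f800000 && (bits & 0x007fffff) !== 0;
  if (isNaN32) {
    // Plain truncation could clear every kept mantissa bit and turn NaN into
    // Infinity, so force the quiet-NaN bit instead.
    return ((bits >>> 16) | 0x0040) & 0xffff;
  }
  // Adding 0x7fff plus the lowest kept bit carries into the kept half when the
  // dropped half is above 0x8000, or exactly 0x8000 with an odd kept half
  // (ties to even).
  const lowestKeptBit = (bits >>> 16) & 1;
  const rounded = (bits + 0x7fff + lowestKeptBit) >>> 0;
  return rounded >>> 16;
}

// bfloat16 bits -> float32 value. Exact: append 16 zero mantissa bits.
function bfloat16BitsToFloat32(bits: number): number {
  scratchU32[0] = (bits & 0xffff) << 16;
  return scratchF32[0];
}

// Usage: round-trip 1/3 through bfloat16 storage.
const stored = float32ToBfloat16Bits(1 / 3); // 0x3eab
const restored = bfloat16BitsToFloat32(stored); // 0.333984375
```

The round trip keeps only about 3 significant decimal digits, which is the precision trade-off the survey is weighing.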
15:07:16 -> https://github.com/webmachinelearning/webnn/issues/930#issuecomment-4410036511
15:08:00 handellm has joined #webmachinelearning
15:08:55 Dwayne: many ML frameworks support bfloat16, but for fp8 there are not many
15:10:32 Reilly: I don't whether my comment is important to have in the spec, it is an observation that in this example constant cast subgraph, input cast and cast output of the subgraph, it is possible the implementation could go beyond the underlying framework support
15:10:43 ... if able to take a bit of a storage hit
15:11:17 Anssi: is this informative content useful for implementers?
15:11:50 Reilly: a list of things where we don't need to worry about compatibility because frameworks can do X, Y Z at minimal cost
15:12:36 Present+ Markus_Handell
15:13:05 Reilly: my suggestion would be to put this in the "how to maintain the spec" document
15:13:55 Anssi: it sounds like this could go to https://github.com/webmachinelearning/webnn/blob/main/CONTRIBUTING.md
15:15:29 Anssi: proposed next steps to use Dwayne's table to come up with a concrete proposal
15:15:43 s/steps/step
15:18:33 +1
15:18:41 RESOLUTION: Draft a concrete proposal based on the survey results documented in the issue and update CONTRIBUTING.md with polyfill guidance. (issue #930)
15:18:59 Subtopic: Allow 0 size dimensions
15:19:03 Anssi: issue #391
15:19:36 ... Chromium DirectML backend that didn't support 0's in dimensions has been removed from Chromium
15:19:53 ... this unblocks the feature, and I suggest we can resolve to allow 0 size dimensions in WebNN
15:20:24 ... this is assuming the original use case is still valid:
15:20:29 ... "a graph where a tensor may be temporarily sliced down to emptiness and then reconcatenated later with other data"
15:20:54 Anssi: is that use case is still valid?
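The use case quoted above (slice a tensor down to emptiness, then concatenate it back with other data) mostly requires shape arithmetic to accept 0 as a dimension. A minimal TypeScript sketch, assuming slice takes per-axis starts and sizes and concat takes an axis, loosely mirroring the parameters of WebNN's slice() and concat(); the helper names are hypothetical, and only shapes are computed, not tensor data.

```typescript
// Hypothetical shape helpers; only shapes are computed, no tensor data.
type Shape = readonly number[];

// Output shape of slicing `input` at per-axis `starts` with per-axis `sizes`.
// A size of 0 on any axis is accepted and yields an empty tensor.
function sliceShape(input: Shape, starts: Shape, sizes: Shape): number[] {
  if (starts.length !== input.length || sizes.length !== input.length) {
    throw new RangeError("starts and sizes must match the input rank");
  }
  return input.map((dim, axis) => {
    const start = starts[axis];
    const size = sizes[axis];
    if (start < 0 || size < 0 || start + size > dim) {
      throw new RangeError(`slice out of bounds on axis ${axis}`);
    }
    return size;
  });
}

// Output shape of concatenating `inputs` along `axis`. Inputs whose extent on
// `axis` is 0 contribute nothing but must still agree on every other axis.
function concatShape(inputs: readonly Shape[], axis: number): number[] {
  if (inputs.length === 0) throw new RangeError("concat needs at least one input");
  const first = inputs[0];
  if (axis < 0 || axis >= first.length) throw new RangeError("axis out of range");
  const output = [...first];
  output[axis] = 0;
  for (const shape of inputs) {
    if (shape.length !== first.length) throw new RangeError("rank mismatch");
    shape.forEach((dim, i) => {
      if (i !== axis && dim !== first[i]) {
        throw new RangeError(`dimension mismatch on axis ${i}`);
      }
    });
    output[axis] += shape[axis];
  }
  return output;
}

// Usage: slice a [4, 3] tensor down to [0, 3], then concatenate it with a
// [2, 3] tensor along axis 0; the empty piece leaves the result at [2, 3].
const emptyPiece = sliceShape([4, 3], [4, 0], [0, 3]); // [0, 3]
const rejoined = concatShape([emptyPiece, [2, 3]], 0); // [2, 3]
```

With 0 disallowed, the first call would have to be rejected, and the graph would need a special case for the "nothing left" step.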
15:21:10 s/is still/still
15:21:48 Bryan: no concerns
15:22:06 RESOLUTION: Allow 0 size dimensions in WebNN (issue #391)
15:22:20 Subtopic: Upper bounds on concat inputs and split outputs
15:22:25 Anssi: issue #931 PR #933
15:22:47 ... this is about a proposed security mitigation to prevent OOM crashes
15:23:04 ... the proposed approach is to set upper bounds on:
15:23:08 ... - the number of inputs to concat
15:23:12 ... - the number of outputs from split
15:23:33 ... our method for picking an appropriate upper bound was "4-10x as large as the largest thing we've ever seen" rounded down to the previous power of 2
15:23:41 ... the issue discussion suggests we're converging on 8192 as the upper bound for both
15:23:52 ... 8192 proposed by Dwayne, and I see thumbs up by Phillis and MikeW on GitHub
15:23:57 ... PR #933 implements the proposed change
15:24:39 Anssi: any last comments before we merge?
15:24:51 [no concerns]
15:24:52 RESOLUTION: Set upper bounds on the number of inputs to concat and outputs from split to 8192 (issue #931, PR #933)
15:25:08 Subtopic: Effective MLComputePolicy exposure
15:25:15 Anssi: issue #924
15:25:31 s/#924/#934
15:25:39 -> https://github.com/webmachinelearning/webnn/issues/934
15:25:57 Anssi: we resolved to open a new issue for Effective MLComputePolicy exposure, this is it
15:26:25 ... the group decided to shift away from thd device-centric graph.devices proposal prototyped by Phillis to a policy-based abstraction
15:26:43 ... the expectation is that this "effective MLComputePolicy" will align with the MLComputePolicy enum concepts passed as hints at context creation time (PR #923 ready to merge):
15:26:49 enum MLComputePolicy {
15:26:49 "default",
15:26:49 "high-performance",
15:26:49 "low-power",
15:26:49 "fallback"
15:26:49 };
15:27:02 Anssi: the web-facing API change is MLContextOptions.powerPreference -> MLContextOptions.computePolicy
15:27:11 ... now "Effective MLComputePolicy" is the other part
15:27:36 ... the effective MLComputePolicy is the one actually used by the implementation, as opposed to the policy hint specified by the user at context creation time
15:27:43 ... this means the effective policy can be reliably exposed only after graph compilation
15:27:56 ... MarkusH will introduce the use cases and concrete proposal for Effective MLComputePolicy, thanks!
15:28:58 MarkusH: it seems natural to use the compute policy of the compiler graph
15:29:11 ... to understand the quality of execution
15:29:32 ... like graph.devices but using a policy set as an abstraction instead
15:30:06 ... if we have a graph we expect to run on CPU with fallback, and we see low-power + high-performance we wouldn't try to execute it
15:30:20 ... we'd record a failure to get the required compute capabilities
15:31:28 ... 1. Real-Time Media Constraints
15:31:49 ... audio models need low and reasonably predictable latency to ensure UX quality
15:32:17 ... 2. Dynamic execution routing
15:32:23 ... if potential hardware contention is detected, the app can rearrange workloads to avoid degrading UX
15:32:31 ... 3. QoS monitoring
15:33:02 ... the app can monitor the effective compute policy to detect if the implementation is falling back to a less efficient execution path, and use this information for debugging and optimization
15:33:24 MarkusH: sometimes we only succeed 10% of the time and want to reconsider the architecture of the model
15:33:25 q?
15:33:37 -> use cases https://github.com/webmachinelearning/webnn/issues/934#issuecomment-4406374578
15:33:59 q+
15:34:05 ack RafaelCintron
15:34:33 Rafael: OK to return the policy, but why is it an array?
15:34:55 ... some things don't go together, e.g. low-power + high-performance, you could use both but it depends?
15:35:09 MarkusH: a future device could be both low-power and high-performance, you could return both
15:35:28 ... some won't be able to execute on hardware, then you'd return fallback + high-performance
15:35:51 Are they in priority order?
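The concat/split limit resolved earlier in this session (issue #931, PR #933) amounts to a count check, and the stated method for choosing it ends with a round-down to a power of 2. The TypeScript sketch below illustrates both; the constant and helper names, the lower bound of 1, and the choice of TypeError are assumptions rather than the spec's wording, and it assumes split() accepts either a number of equal splits or an array of split sizes.

```typescript
// Hypothetical limits mirroring the 8192 upper bound resolved for issue #931 / PR #933.
const MAX_CONCAT_INPUTS = 8192;
const MAX_SPLIT_OUTPUTS = 8192;

// Largest power of 2 that is <= n (n >= 1): the rounding step of the
// "4-10x as large as the largest thing we've ever seen" heuristic.
function floorPowerOfTwo(n: number): number {
  if (!(n >= 1)) throw new RangeError("n must be >= 1");
  let p = 1;
  while (p * 2 <= n) p *= 2;
  return p;
}

// Rejects a concat() call with no inputs or more inputs than the limit.
function validateConcatInputCount(inputCount: number): void {
  if (inputCount < 1 || inputCount > MAX_CONCAT_INPUTS) {
    throw new TypeError(`concat: ${inputCount} inputs, expected 1..${MAX_CONCAT_INPUTS}`);
  }
}

// `splits` is either a count of equal splits or an explicit list of sizes;
// either way it determines how many outputs split() would produce.
function validateSplitOutputCount(splits: number | readonly number[]): number {
  const outputCount = typeof splits === "number" ? splits : splits.length;
  if (outputCount < 1 || outputCount > MAX_SPLIT_OUTPUTS) {
    throw new TypeError(`split: ${outputCount} outputs, expected 1..${MAX_SPLIT_OUTPUTS}`);
  }
  return outputCount;
}

// Usage with a hypothetical candidate bound of 10000: it rounds down to 8192.
const roundedBound = floorPowerOfTwo(10000); // 8192
validateConcatInputCount(roundedBound); // at the limit: accepted
```

Checking the count before building the operator is what keeps an oversized graph from being allocated in the first place, which is the OOM concern behind the issue.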
15:35:53 q+
15:36:27 Rafael: we need to check with MikeW what can be implemented on Apple hardware
15:36:57 Core ML does let you.
15:37:19 Dwayne: Are they in priority order?
15:37:27 Core ML gives you supported devices on a per-operator basis.
15:37:55 MarkusH: haven't thought about the order, I was initially looking at a Set type, but we don't seem to have an IDL equivalent
15:38:23 q+
15:38:41 Dwayne: these could potentially be seemingly exclusive options, will continue in the issue
15:38:43 ack ningxin
15:39:15 Ningxin: my question is about dynamic execution routing, to decide how to run a workload localy or in the load, does the app need to download the workload to observe the effective policy?
15:39:37 ... is that for a new workload or an existing local workload that is running?
15:40:29 MarkusH: if we have an idea what the current workload is, we compile the model and see where it ends up, get these effective compute policies and see if there's a contention risk, and can decide to not run it on the device but in the cloud
15:40:31 q?
15:40:42 Ningxin: the app needs to download the model to know?
15:40:52 MarkusH: right
15:41:33 -> https://github.com/webmachinelearning/daop
15:41:57 q?
15:42:27 Ningxin: QoS monitoring sounds like it would end up in dynamic routing, e.g. degration of QoS you could fall back to the cloud?
15:42:57 s/degration/degradation during runtime is observed
15:43:49 MarkusH: yes, runtime adaptation would be needed in this case
15:44:15 q?
15:44:18 ack reillyg
15:44:49 Reilly: question on CoreML implementability
15:45:12 ... you get a report on a per-op basis where to run after compile
15:45:49 ... quality is not necessarily the right framing, when you partition the graph you want to reduce the number of partitions and minimize passing data between the partitions
15:46:17 ... not easy way to express that to the web developer, rather express as "priority over quality"
15:46:40 I never said "quality". I said "priority".
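The routing and QoS discussion above comes down to comparing the effective policies reported after compilation with what the app needs. The proposal in issue #934 does not yet fix how the effective policy is exposed, so the TypeScript sketch below assumes it arrives as a plain array of MLComputePolicy strings (the enum from PR #923) and converts it to a Set in script, since, as MarkusH noted, there is no direct IDL equivalent. The function, its parameters, the decision rules, and the stats class are hypothetical illustrations of use cases 2 and 3, not a proposed API.

```typescript
// Enum values from PR #923; how the *effective* policy is exposed is still open (issue #934).
type MLComputePolicy = "default" | "high-performance" | "low-power" | "fallback";

interface PlacementDecision {
  runLocally: boolean;
  reason: string;
}

// Hypothetical rule: run locally only if every required policy was actually
// achieved, no unrequested "fallback" occurred, and no contention risk was
// detected; otherwise route the workload elsewhere (e.g. to the cloud).
function decidePlacement(
  effective: readonly MLComputePolicy[],
  required: readonly MLComputePolicy[],
  contentionRisk: boolean,
): PlacementDecision {
  const achieved = new Set(effective); // order-free, duplicate-free view of the array
  const missing = required.filter((policy) => !achieved.has(policy));
  if (missing.length > 0) {
    return { runLocally: false, reason: `missing: ${missing.join(", ")}` };
  }
  if (achieved.has("fallback") && !required.includes("fallback")) {
    return { runLocally: false, reason: "implementation fell back" };
  }
  if (contentionRisk) {
    return { runLocally: false, reason: "hardware contention risk" };
  }
  return { runLocally: true, reason: "requirements met" };
}

// QoS monitoring: track how often compiled graphs met their requirements.
class PlacementStats {
  private attempts = 0;
  private successes = 0;

  record(decision: PlacementDecision): void {
    this.attempts += 1;
    if (decision.runLocally) this.successes += 1;
  }

  successRate(): number {
    return this.attempts === 0 ? 0 : this.successes / this.attempts;
  }
}

// Usage: a real-time audio graph that needs low-power but only got fallback,
// followed by one whose requirements were met.
const stats = new PlacementStats();
const realtimeAudio = decidePlacement(["fallback"], ["low-power"], false);
stats.record(realtimeAudio); // runLocally: false, reason "missing: low-power"
stats.record(decidePlacement(["low-power", "high-performance"], ["low-power"], false));
const rate = stats.successRate(); // 0.5
```

Treating the reported array as a set sidesteps the ordering question raised above; if the group later decides the entries are in priority order, the decision rule would need to read the array positionally instead.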
15:47:04 s/priority over quality/priority
15:47:09 q?
15:47:31 s/not easy way to express that to the web developer, rather express as//
15:47:50 RRSAgent, draft minutes
15:47:51 I have made the request to generate https://www.w3.org/2026/05/21-webmachinelearning-minutes.html anssik
15:48:28 q?
15:48:57 q?
15:49:46 MarkusH: I'm interested in what MikeW has to say about the proposal, also Zoltan's feedback on how to make these checks simpler for web developers
15:50:56 Anssi: are your three use cases in order of importance?
15:51:30 MarkusH: the most critical use cases to address are 1 and 3
15:51:46 ... 2 is becoming more important due to the global compute shortage in data centers
15:53:07 Anssi: does anyone have developers in mind who would like to review this proposal?
15:53:29 q+
15:53:32 ... and how does this fit in with JS ML Framework abstraction on top?
15:53:37 ack RafaelCintron
15:54:08 q+
15:54:24 Rafael: high-performance and low-power closely match with WebGL and WebGPU, so exposure in those APIs, we can ask people for feedback
15:54:33 q-
15:54:43 ... I'm hopeful others will follow good paths paved by pioneers
15:57:14 Reilly: as implementers, we probably don't have the best understanding of what web developers needs exactly, thanks MarkusH for providing this view
15:59:30 RRSAgent, draft minutes
15:59:32 I have made the request to generate https://www.w3.org/2026/05/21-webmachinelearning-minutes.html anssik
16:00:50 s/don't whether/don't know whether
16:01:28 s/constant cast/constant > cast
16:02:03 s/if able/if the implementation is able
16:02:24 s/Y Z/Y or Z
16:04:20 s/thd/the
16:05:14 s/compiler/compiled
16:07:54 s/localy/locally
16:08:22 s/in the load/on the cload
16:08:32 s/cload/cloud
16:09:26 RESOLUTION: Review Dynamic AI Offload Protocol in context of the effective MLComputePolicy proposal.
16:09:46 RRSAgent, draft minutes
16:09:47 I have made the request to generate https://www.w3.org/2026/05/21-webmachinelearning-minutes.html anssik
16:12:26 s/degradation during runtime is observed of QoS/if QoS degradation is observed during runtime
16:13:34 s/Framework abstraction/framework abstractions
16:14:10 s/those APIs/those APIs familiar to web developers
16:14:35 s/needs exactly/need exactly
16:14:55 RRSAgent, draft minutes
16:14:57 I have made the request to generate https://www.w3.org/2026/05/21-webmachinelearning-minutes.html anssik
18:24:28 gb has joined #webmachinelearning