14:53:14 RRSAgent has joined #webmachinelearning
14:53:18 logging to https://www.w3.org/2026/05/07-webmachinelearning-irc
14:53:18 RRSAgent, make logs Public
14:53:19 please title this meeting ("meeting: ..."), anssik
14:53:19 Meeting: WebML WG Teleconference – 7 May 2026
14:53:24 Chair: Anssi
14:53:29 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2026-05-07-wg-agenda.md
14:53:41 Scribe: Anssi
14:53:46 scribeNick: anssik
14:53:54 gb, this is webmachinelearning/webnn
14:53:54 anssik, OK.
14:53:59 Present+ Anssi_Kostiainen
14:54:03 RRSAgent, draft minutes
14:54:04 I have made the request to generate https://www.w3.org/2026/05/07-webmachinelearning-minutes.html anssik
14:57:14 mtavenrath has joined #webmachinelearning
14:57:41 mtavenrath1 has joined #webmachinelearning
15:00:31 Present+ Reilly_Grant
15:00:36 Present+ Markus_Tavenrath
15:02:13 Present+ Bryan_Bernhart
15:02:19 Present+ Dwayne_Robinson
15:02:24 handellm has joined #webmachinelearning
15:02:32 DwayneR has joined #webmachinelearning
15:02:40 Present+ Ningxin_Hu
15:03:29 Present+ Markus_Handell
15:03:59 Present+ Rafael_Cintron
15:04:04 ningxin has joined #webmachinelearning
15:04:18 RafaelCintron has joined #webmachinelearning
15:04:35 Anssi: please join me in welcoming Sarah Drasner from Google as a new participant to the WG
15:04:39 Topic: Web Neural Network API
15:04:53 Subtopic: Proposed new low-precision floating-point data types
15:04:58 Anssi: issue #930
15:04:59 https://github.com/webmachinelearning/webnn/issues/930 -> Issue 930 RFE: Add support for more floating point low-precision ML data types (`bfloat16`, `fp8`, `nvfp4`) (by mtavenrath) [opset] [feature request] [Agenda+]
15:05:21 ... a proposal from MarkusT for new low-precision floating-point data types in WebNN
15:05:26 ... these would help avoid upcasting or quantization
15:05:39 ... help with model porting and quantization accuracy, and reduce memory footprint and bandwidth
15:05:49 ... current WebNN data type support: float32, float16, int64, uint64, int32, uint32, int8, uint8
15:05:53 -> https://www.w3.org/TR/webnn/#appendices-mloperanddatatype-arraybufferview-compatibility
15:06:14 Anssi: Dwayne provided feedback on the proposal, asking about cross-platform implementability
15:06:33 ... proposed three criteria for evaluating the proposed data types:
15:06:37 ... - supported widely by various hardware
15:06:45 ... - available in backends CoreML/TFLite/ORT
15:06:52 ... - likely to stand the test of time
15:07:48 MarkusT: bfloat16 and fp8 are quite popular in models today
15:08:05 ... most current-gen HW supports these data types
15:08:36 ... we might need only the float8e4m3 variant for inference in WebNN
15:09:21 ... bfloat16 for inference with larger values, with bfloat16 input; if HW does not support it, can convert to WebNN-supported types
15:10:06 ... I think fp4 is not yet ready to be standardized
15:10:43 q+
15:10:50 Dwayne: a survey of existing backends' support for these would be a good next step
15:11:40 MarkusT: polyfilling opportunity?
15:11:42 ack reillyg
15:12:07 Reilly: for constant data types that are not supported by the backend, the browser can wait until it builds the graph and cast to supported data types
15:12:34 ... can take a float value, pass it to the cast operator and use that in an operator, and an implementation can cast to more types than the backends allow
15:12:35 q?
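A minimal sketch of the constant-casting idea just discussed, assuming a hypothetical 'float8e4m3' MLOperandDataType (proposed in issue #930, not in the current spec); constant(), cast() and matmul() are existing MLGraphBuilder methods, and fp8Buffer stands in for application-supplied packed fp8 weight bytes:

    const context = await navigator.ml.createContext();
    const builder = new MLGraphBuilder(context);
    const input = builder.input('input', {dataType: 'float16', shape: [1, 64]});
    // Hypothetical fp8 constant; today this descriptor would be rejected.
    const fp8Buffer = new Uint8Array(64 * 64); // placeholder weight bytes
    const weights = builder.constant({dataType: 'float8e4m3', shape: [64, 64]}, fp8Buffer);
    // Upcast once at build time to a type the backend supports, per the idea
    // that the browser can cast constants while building the graph.
    const weightsF16 = builder.cast(weights, 'float16');
    const output = builder.matmul(input, weightsF16);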
15:14:19 MarkusT: casting is mostly a one-time cost, no complications
15:15:33 Reilly: frameworks must understand that WebNN may not support data types passed to it and needs to do casting; this casting needs to be opt-in
15:16:25 Reilly: opSupportLimits tells what ops are possible, the developer can choose how to fit within the limitations of the implementation
15:16:51 ... I'm not implying any additional support unless we can implement constant casting
15:18:04 anssik has joined #webmachinelearning
15:18:04 tomayac27 has joined #webmachinelearning
15:18:04 christianliebel has joined #webmachinelearning
15:18:04 chrishtr has joined #webmachinelearning
15:18:04 gregwhitworth has joined #webmachinelearning
15:18:04 vmpstr has joined #webmachinelearning
15:18:04 vasilii has joined #webmachinelearning
15:18:04 jyasskin has joined #webmachinelearning
15:18:04 awafaa has joined #webmachinelearning
15:18:19 ... it is up to the developer to allow the graph to say fp8 weights, require fp8 math, and detect through opSupportLimits that fp8 is not supported ... (inaudible) ... the app can detect that
15:19:10 q+
15:21:38 Ningxin: I can contribute OpenVINO data to the survey
15:21:57 ... we can also do a survey of fp8 usage in well-known models
15:22:18 ... it can be used to compress weights, also used for a compressed KV cache
15:22:51 q?
15:22:59 ack ningxin
15:23:15 RESOLUTION: Survey the existing backends' support for low-precision floating-point data types
15:23:25 Subtopic: Formulaic mitigation for integer overflows
15:23:29 Anssi: issue #928
15:23:30 https://github.com/webmachinelearning/webnn/issues/928 -> Issue 928 Consider specify lower limit for conv2d/pool2d kernel sizes, dilations, strides (by philloooo) [security-tracker] [Agenda+]
15:23:46 ... continuing from our last meeting, we seem to now have a proposal for a formulaic mitigation for integer overflows in WebNN that would be applicable to all backends
15:23:51 ... Dillon's latest proposal:
15:23:56 ... output_width * output_height * kernel_width * kernel_height * input_channels * sizeof(input_element) to be less than 2GB
15:24:09 Anssi: Dwayne reviewed the proposal, provided feedback, and had questions
15:24:18 ... is this specific to 32-bit systems?
15:24:23 ... does this break Stable Diffusion?
15:25:05 Dwayne: I tested the proposed equation, it seems to be going in a good direction, we can continue to iterate on that idea
15:25:06 q?
15:25:28 zolkis has joined #webmachinelearning
15:25:41 q+
15:25:41 Present+ Zoltan_Kis
15:25:58 ack ningxin
15:26:34 Ningxin: this might be implementation dependent, not sure if this would be a good limit to expose to applications
15:26:40 q+
15:26:47 ack reillyg
15:27:13 Reilly: I concur with Ningxin, this is implementation specific and we want to express this via opSupportLimits
15:27:34 q?
15:28:21 Reilly: we should think of this similarly to tensor ranks
15:31:12 My suggestion is that this is no different than existing operator support limits expressed by opSupportLimits().
15:31:30 This is not a new area of implementation-specific behavior.
15:31:59 The proposed resolution should be to extend opSupportLimits() to be able to express limits on the size of inputs.
15:32:32 +1
15:32:35 +1
15:32:45 RESOLUTION: Extend opSupportLimits() to be able to express limits on the size of inputs. (issue #928)
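A sketch of how this resolution might look from the application side; context.opSupportLimits() exists in WebNN today and lists supported data types per operand, but the maxByteLength field below is a hypothetical name for the proposed input size limits:

    const context = await navigator.ml.createContext();
    const limits = context.opSupportLimits();
    console.log(limits.conv2d.input.dataTypes); // e.g. ['float32', 'float16']
    // Dillon's formula for a float32 conv2d: output_width * output_height *
    // kernel_width * kernel_height * input_channels * sizeof(float32)
    const requiredBytes = 512 * 512 * 3 * 3 * 256 * 4; // ~2.4 GB, over a 2 GB bound
    // Hypothetical extension per today's resolution: expose the bound so the
    // app can fit within the limitation instead of failing at build time.
    if (limits.conv2d.input.maxByteLength !== undefined &&
        requiredBytes > limits.conv2d.input.maxByteLength) {
      // fall back: tile the input or reduce the kernel size
    }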
15:32:59 Subtopic: Bounded dynamic dimension
15:33:03 Anssi: issue #883
15:33:04 https://github.com/webmachinelearning/webnn/issues/883 -> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific] [Agenda+]
15:33:10 ... the group has been reviewing the 9 new ops required to enable this feature: mod, shape, range, dynamic[Reshape | Expand | Slice | Pad | Split | Resample2d]
15:33:14 -> https://github.com/webmachinelearning/webnn/issues/883#issuecomment-4302848951
15:33:15 https://github.com/webmachinelearning/webnn/issues/883 -> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific] [Agenda+]
15:33:24 Anssi: extensive review comments from Dwayne received, thanks!:
15:33:34 -> https://github.com/webmachinelearning/webnn/issues/883#issuecomment-4394072650
15:34:12 Dwayne: I support adding new ops, just want to make sure they're consistent with existing ops
15:34:59 Ningxin: a question about the algorithm for shape calculation, tooling could help eliminate shape calculation
15:35:13 ... CPU-based operations to do shape inference before GPU
15:36:01 ... how that would impact tooling: 1) how much can tools help to reduce the need for shape inference in the graph? more investigation needed; 2) on a CPU or GPU device, sync to avoid any performance issues
15:36:36 ... GPU may need to wait for CPU in some cases
15:36:48 Dwayne: I need to use a concrete example and share that in the issue comment
15:37:23 ... not sure how to express this with tooling; shape computation inside the model, the author could do that, but if we can generically express that in WebNN we don't need to ask the backends to have their own heuristics to read back from devices
15:37:34 ... I'd like to take Stable Diffusion and see how this might be expressed in that model
15:38:04 Ningxin: Stable Diffusion is probably already solved, could we focus on some new model, Z-Image-Turbo?
15:38:05 q?
15:38:46 Dwayne: some ops in this proposal are useful outside dynamic shape computation as well
15:39:33 Ningxin: my plan is, as the next step, to investigate the model, and in parallel work with the team to look at the dispatch-time shape validation mentioned earlier
15:39:40 ... that is needed if we choose this path
15:39:59 ... Chromium implementation prototype work is also planned
15:40:07 q?
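A sketch of how the proposed dynamic-shape ops could compose; shape() and dynamicReshape() come from the issue #883 op list and their signatures here are illustrative guesses, while input(), slice(), constant() and concat() exist today:

    const context = await navigator.ml.createContext();
    const builder = new MLGraphBuilder(context);
    // Stand-in for an input whose first dimension is dynamic up to a bound of 8.
    const x = builder.input('x', {dataType: 'float32', shape: [8, 3, 224, 224]});
    // Hypothetical: shape() returns the runtime shape as a 1-D int operand,
    // so shape arithmetic happens inside the graph instead of on the CPU.
    const dims = builder.shape(x);
    const batch = builder.slice(dims, [0], [1]); // the dynamic batch dimension
    const rest = builder.constant({dataType: 'int32', shape: [1]},
                                  new Int32Array([3 * 224 * 224]));
    const target = builder.concat([batch, rest], 0);
    // Hypothetical: dynamicReshape() takes the target shape as an operand rather
    // than a static array, so it can depend on dispatch-time dimensions.
    const flattened = builder.dynamicReshape(x, target);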
15:40:37 Subtopic: Effective MLComputePolicy exposure
15:40:51 Anssi: PR #923
15:40:51 https://github.com/webmachinelearning/webnn/pull/923 -> Pull Request 923 Refactor device selection: Rename to computePolicy, remove accelerated, and add fallback (by mingmingtasd) [device selection]
15:41:22 ... as a spin-off from PR #923, which is ready to land now, we had a discussion about the effective MLComputePolicy that is actually used by the implementation
15:41:41 ... this effective policy can be exposed only after compilation
15:42:07 ... the assumption here is this policy-centric mechanism will supersede the earlier device-centric MLGraph.devices proposal that was discussed in issue #836 and PR #854 that was closed
15:42:07 https://github.com/webmachinelearning/webnn/pull/854 -> CLOSED Pull Request 854 define graph.devices (by philloooo) [device selection]
15:42:07 https://github.com/webmachinelearning/webnn/issues/836 -> Issue 836 Get devices used for a graph after graph compilation (by philloooo) [device selection]
15:43:06 Reilly: we still expose MLGraph.devices on the graph in Chromium
15:43:31 ... if the updated proposal works for MarkusH we can implement it
15:43:48 ... I think the group believes the policy is a better abstraction than devices
15:44:11 ... but we need to make sure the effective policy is exposed in a way that allows developers to understand it
15:45:47 MarkusH: I have two major things: 1) we do need feedback that a model intended to run on a low-power device such as the NPU actually runs there as expected; 2) an adaptation use case, we can shift execution away if we're overloaded and are getting poor performance from "high-performance"
15:46:55 Anssi: is this as simple as exposing MLGraph.policy?
15:47:38 Reilly: a fallback = false signal would express what the developer is looking for
15:47:52 q?
15:48:31 MarkusH: if we start adding things to MLComputePolicy, the mapping to the effective policy may not be 1:1
15:48:32 q?
15:48:33 q+
15:48:37 ack RafaelCintron
15:49:40 Rafael: one thing, I'm OK with the current "low-power" and "high-performance", but in some generations of devices we should not assume "high-performance" always means GPU
15:49:54 Ack
15:50:16 q+
15:50:23 ack ningxin
15:51:25 Ningxin: re Rafael's point, Intel Panther Lake has a 12Xe GPU configuration where the GPU is faster than the NPU, but on a lower-end SKU the GPU is less capable than the NPU in that device
15:51:46 ... re MarkusH's comment on GPU offloading, is an application hint helpful here?
15:52:01 q+
15:52:03 ... something like offloading
15:52:06 ack handellm
15:52:24 MarkusH: my thought is to specify fallback when I want to avoid accelerated devices
15:52:52 ... I'd specify fallback in terms of MLComputePolicy
15:54:07 Ningxin: that is useful, my point is the GPU might already be busy so you may want to move the workload to another device such as the NPU in such a case
15:54:28 ... should we have something specific to signal that preference, "still use some accelerator, but avoid GPU"
15:54:56 MarkusH: if I specify "low-power" I'd do that because I'm running on a laptop that's not plugged in
15:55:13 ... if there's enough power and performance available that might not be a problem
15:55:51 ... if we have performance issues on inference workloads with a specific device, it makes sense to understand that
15:56:12 ... if the NPU is low power I'd execute on that device
15:56:22 Ningxin: we can prototype and test
15:56:23 q?
15:57:04 Zoltan: we went back and forth on this design earlier, no strong opinion at this stage
15:57:20 q?
15:57:49 Anssi: should we open a fresh issue for this?
15:57:53 MarkusH supports the idea
15:58:08 Anssi: I propose we close issue #836 and open a new issue for Effective MLComputePolicy exposure
15:58:08 https://github.com/webmachinelearning/webnn/issues/836 -> Issue 836 Get devices used for a graph after graph compilation (by philloooo) [device selection]
15:58:45 Reilly: I think the proposal has shifted enough that we should open a fresh issue
15:59:24 +1
15:59:26 SG!
15:59:30 RESOLUTION: Close issue #836 and open a new issue for Effective MLComputePolicy exposure discussion.
16:00:19 RRSAgent, draft minutes
16:00:21 I have made the request to generate https://www.w3.org/2026/05/07-webmachinelearning-minutes.html anssik
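A sketch of the effective-policy exposure discussed above; the computePolicy and fallback option names come from PR #923, and graph.computePolicy is the hypothetical post-compilation attribute proposed in this discussion, none of which is shipped API:

    // Request low power and opt out of fallback, per the fallback = false idea.
    const context = await navigator.ml.createContext(
        {computePolicy: 'low-power', fallback: false});
    const builder = new MLGraphBuilder(context);
    const a = builder.input('a', {dataType: 'float32', shape: [2, 2]});
    const out = builder.relu(a);
    const graph = await builder.build({out});
    // Hypothetical: the effective policy the implementation actually chose,
    // exposed only after compilation; may differ from the requested policy.
    console.log(graph.computePolicy);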