14:53:14 RRSAgent has joined #webmachinelearning
14:53:18 logging to https://www.w3.org/2026/05/07-webmachinelearning-irc
14:53:18 RRSAgent, make logs Public
14:53:19 please title this meeting ("meeting: ..."), anssik
14:53:19 Meeting: WebML WG Teleconference – 7 May 2026
14:53:24 Chair: Anssi
14:53:29 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2026-05-07-wg-agenda.md
14:53:41 Scribe: Anssi
14:53:46 scribeNick: anssik
14:53:54 gb, this is webmachinelearning/webnn
14:53:54 anssik, OK.
14:53:59 Present+ Anssi_Kostiainen
14:54:03 RRSAgent, draft minutes
14:54:04 I have made the request to generate https://www.w3.org/2026/05/07-webmachinelearning-minutes.html anssik
14:57:14 mtavenrath has joined #webmachinelearning
14:57:41 mtavenrath1 has joined #webmachinelearning
15:00:31 Present+ Reilly_Grant
15:00:36 Present+ Markus_Tavenrath
15:02:13 Present+ Bryan_Bernhart
15:02:19 Present+ Dwayne_Robinson
15:02:24 handellm has joined #webmachinelearning
15:02:32 DwayneR has joined #webmachinelearning
15:02:40 Present+ Ningxin_Hu
15:03:29 Present+ Markus_Handell
15:03:59 Present+ Rafael_Cintron
15:04:04 ningxin has joined #webmachinelearning
15:04:18 RafaelCintron has joined #webmachinelearning
15:04:35 Anssi: please join me in welcoming Sarah Drasner from Google as a new participant to the WG
15:04:39 Topic: Web Neural Network API
15:04:53 Subtopic: Proposed new low-precision floating-point data types
15:04:58 Anssi: issue #930
15:04:59 https://github.com/webmachinelearning/webnn/issues/930 -> Issue 930 RFE: Add support for more floating point low-precision ML data types (`bfloat16`, `fp8`, `nvfp4`) (by mtavenrath) [opset] [feature request] [Agenda+]
15:05:21 ... a proposal from MarkusT for new low-precision floating-point data types in WebNN
15:05:26 ... these would help avoid upcasting or quantization
15:05:39 ... help with model porting and quantization accuracy, and reduce memory footprint and bandwidth
15:05:49 ... current WebNN data type support: float32, float16, int64, uint64, int32, uint32, int8, uint8
15:05:53 -> https://www.w3.org/TR/webnn/#appendices-mloperanddatatype-arraybufferview-compatibility
15:06:14 Anssi: Dwayne provided feedback on the proposal, asking about cross-platform implementability
15:06:33 ... proposed three criteria for evaluating the proposed data types:
15:06:37 ... - supported widely by various hardware
15:06:45 ... - available in backends CoreML/TFLite/ORT
15:06:52 ... - likely to stand the test of time
15:07:48 MarkusT: bfloat16 and fp8 are quite popular in models today
15:08:05 ... most current-gen HW supports these data types
15:08:36 ... we might need only the float8e4m3 variant for inference in WebNN
15:09:21 ... bfloat16 for inference with larger values, with bfloat16 input; if HW does not support it, can convert to WebNN-supported types
15:10:06 ... I think fp4 is not yet ready to be standardized
15:10:43 q+
15:10:50 Dwayne: a survey of existing backends' support for these would be a good next step
15:11:40 MarkusT: polyfilling opportunity?
15:11:42 ack reillyg
15:12:07 Reilly: for constant data types that are not supported by the backend, the browser can wait until it builds the graph and cast to supported data types
15:12:34 ... can take a float value, pass it to the cast operator and use that in an operator, and an implementation can cast to more types than the backends allow
15:12:35 q?
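A minimal sketch of the constant-casting idea just discussed, assuming a hypothetical 'float8e4m3' MLOperandDataType (proposed in issue #930, not in the current spec); constant(), cast() and matmul() are existing MLGraphBuilder methods, and fp8Buffer stands in for application-supplied packed fp8 weight bytes:

    const context = await navigator.ml.createContext();
    const builder = new MLGraphBuilder(context);
    const input = builder.input('input', {dataType: 'float16', shape: [1, 64]});
    // Hypothetical fp8 constant; today this descriptor would be rejected.
    const fp8Buffer = new Uint8Array(64 * 64); // placeholder weight bytes
    const weights = builder.constant({dataType: 'float8e4m3', shape: [64, 64]}, fp8Buffer);
    // Upcast once at build time to a type the backend supports, per the idea
    // that the browser can cast constants while building the graph.
    const weightsF16 = builder.cast(weights, 'float16');
    const output = builder.matmul(input, weightsF16);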
15:14:19 MarkusT: casting is mostly a one-time cost, no complications
15:15:33 Reilly: frameworks must understand that WebNN may not support data types passed to it and needs to do casting; this casting needs to be opt-in
15:16:25 Reilly: opSupportLimits tells what ops are possible, the developer can choose how to fit within the limitations of the implementation
15:16:51 ... I'm not implying any additional support unless we can implement constant casting
15:18:04 anssik has joined #webmachinelearning
15:18:04 tomayac27 has joined #webmachinelearning
15:18:04 christianliebel has joined #webmachinelearning
15:18:04 chrishtr has joined #webmachinelearning
15:18:04 gregwhitworth has joined #webmachinelearning
15:18:04 vmpstr has joined #webmachinelearning
15:18:04 vasilii has joined #webmachinelearning
15:18:04 jyasskin has joined #webmachinelearning
15:18:04 awafaa has joined #webmachinelearning
15:18:19 ... it is up to the developer to allow the graph to say fp8 weights, require fp8 math, and detect through opSupportLimits that fp8 is not supported ... (inaudible) ... the app can detect that
15:19:10 q+
15:21:38 Ningxin: I can contribute OpenVINO data to the survey
15:21:57 ... we can also do a survey of fp8 usage in well-known models
15:22:18 ... it can be used to compress weights, also used for a compressed KV cache
15:22:51 q?
15:22:59 ack ningxin
15:23:15 RESOLUTION: Survey the existing backends' support for low-precision floating-point data types
15:23:25 Subtopic: Formulaic mitigation for integer overflows
15:23:29 Anssi: issue #928
15:23:30 https://github.com/webmachinelearning/webnn/issues/928 -> Issue 928 Consider specify lower limit for conv2d/pool2d kernel sizes, dilations, strides (by philloooo) [security-tracker] [Agenda+]
15:23:46 ... continuing from our last meeting, we seem to now have a proposal for a formulaic mitigation for integer overflows in WebNN that would be applicable to all backends
15:23:51 ... Dillon's latest proposal:
15:23:56 ... output_width * output_height * kernel_width * kernel_height * input_channels * sizeof(input_element) to be less than 2GB
15:24:09 Anssi: Dwayne reviewed the proposal, provided feedback, and had questions
15:24:18 ... is this specific to 32-bit systems?
15:24:23 ... does this break Stable Diffusion?
15:25:05 Dwayne: I tested the proposed equation, it seems to be going in a good direction, we can continue to iterate on that idea
15:25:06 q?
15:25:28 zolkis has joined #webmachinelearning
15:25:41 q+
15:25:41 Present+ Zoltan_Kis
15:25:58 ack ningxin
15:26:34 Ningxin: this might be implementation dependent, not sure if this would be a good limit to expose to applications
15:26:40 q+
15:26:47 ack reillyg
15:27:13 Reilly: I concur with Ningxin, this is implementation specific and we want to express this via opSupportLimits
15:27:34 q?
15:28:21 Reilly: we should think of this similarly to tensor ranks
15:31:12 My suggestion is that this is no different than existing operator support limits expressed by opSupportLimits().
15:31:30 This is not a new area of implementation-specific behavior.
15:31:59 The proposed resolution should be to extend opSupportLimits() to be able to express limits on the size of inputs.
15:32:32 +1
15:32:35 +1
15:32:45 RESOLUTION: Extend opSupportLimits() to be able to express limits on the size of inputs. (issue #928)
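A sketch of how this resolution might look from the application side; context.opSupportLimits() exists in WebNN today and lists supported data types per operand, but the maxByteLength field below is a hypothetical name for the proposed input size limits:

    const context = await navigator.ml.createContext();
    const limits = context.opSupportLimits();
    console.log(limits.conv2d.input.dataTypes); // e.g. ['float32', 'float16']
    // Dillon's formula for a float32 conv2d: output_width * output_height *
    // kernel_width * kernel_height * input_channels * sizeof(float32)
    const requiredBytes = 512 * 512 * 3 * 3 * 256 * 4; // ~2.4 GB, over a 2 GB bound
    // Hypothetical extension per today's resolution: expose the bound so the
    // app can fit within the limitation instead of failing at build time.
    if (limits.conv2d.input.maxByteLength !== undefined &&
        requiredBytes > limits.conv2d.input.maxByteLength) {
      // fall back: tile the input or reduce the kernel size
    }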
15:32:59 Subtopic: Bounded dynamic dimension
15:33:03 Anssi: issue #883
15:33:04 https://github.com/webmachinelearning/webnn/issues/883 -> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific] [Agenda+]
15:33:10 ... the group has been reviewing the 9 new ops required to enable this feature: mod, shape, range, dynamic[Reshape | Expand | Slice | Pad | Split | Resample2d]
15:33:14 -> https://github.com/webmachinelearning/webnn/issues/883#issuecomment-4302848951
15:33:15 https://github.com/webmachinelearning/webnn/issues/883 -> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific] [Agenda+]
15:33:24 Anssi: extensive review comments from Dwayne received, thanks!:
15:33:34 -> https://github.com/webmachinelearning/webnn/issues/883#issuecomment-4394072650
15:34:12 Dwayne: I support adding new ops, just want to make sure they're consistent with existing ops
15:34:59 Ningxin: a question about the algorithm for shape calculation, tooling could help eliminate shape calculation
15:35:13 ... CPU-based operations to do shape inference before GPU
15:36:01 ... how that would impact tooling: 1) how much can tools help to reduce the need for shape inference in the graph? more investigation needed; 2) on a CPU or GPU device, sync to avoid any performance issues
15:36:36 ... GPU may need to wait for CPU in some cases
15:36:48 Dwayne: I need to use a concrete example and share that in the issue comment
15:37:23 ... not sure how to express this with tooling; shape computation inside the model, the author could do that, but if we can generically express that in WebNN we don't need to ask the backends to have their own heuristics to read back from devices
15:37:34 ... I'd like to take Stable Diffusion and see how this might be expressed in that model
15:38:04 Ningxin: Stable Diffusion is probably already solved, could we focus on some new model, Z-Image-Turbo?
15:38:05 q?
15:38:46 Dwayne: some ops in this proposal are useful outside dynamic shape computation as well
15:39:33 Ningxin: my plan is, as the next step, to investigate the model, and in parallel work with the team to look at the dispatch-time shape validation mentioned earlier
15:39:40 ... that is needed if we choose this path
15:39:59 ... Chromium implementation prototype work is also planned
15:40:07 q?
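A sketch of how the proposed dynamic-shape ops could compose; shape() and dynamicReshape() come from the issue #883 op list and their signatures here are illustrative guesses, while input(), slice(), constant() and concat() exist today:

    const context = await navigator.ml.createContext();
    const builder = new MLGraphBuilder(context);
    // Stand-in for an input whose first dimension is dynamic up to a bound of 8.
    const x = builder.input('x', {dataType: 'float32', shape: [8, 3, 224, 224]});
    // Hypothetical: shape() returns the runtime shape as a 1-D int operand,
    // so shape arithmetic happens inside the graph instead of on the CPU.
    const dims = builder.shape(x);
    const batch = builder.slice(dims, [0], [1]); // the dynamic batch dimension
    const rest = builder.constant({dataType: 'int32', shape: [1]},
                                  new Int32Array([3 * 224 * 224]));
    const target = builder.concat([batch, rest], 0);
    // Hypothetical: dynamicReshape() takes the target shape as an operand rather
    // than a static array, so it can depend on dispatch-time dimensions.
    const flattened = builder.dynamicReshape(x, target);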
15:40:37 Subtopic: Effective MLComputePolicy exposure
15:40:51 Anssi: PR #923
15:40:51 https://github.com/webmachinelearning/webnn/pull/923 -> Pull Request 923 Refactor device selection: Rename to computePolicy, remove accelerated, and add fallback (by mingmingtasd) [device selection]
15:41:22 ... as a spin-off from PR #923, which is ready to land now, we had a discussion about the effective MLComputePolicy that is actually used by the implementation
15:41:41 ... this effective policy can be exposed only after compilation
15:42:07 ... the assumption here is this policy-centric mechanism will supersede the earlier device-centric MLGraph.devices proposal that was discussed in issue #836 and PR #854 that was closed
15:42:07 https://github.com/webmachinelearning/webnn/pull/854 -> CLOSED Pull Request 854 define graph.devices (by philloooo) [device selection]
15:42:07 https://github.com/webmachinelearning/webnn/issues/836 -> Issue 836 Get devices used for a graph after graph compilation (by philloooo) [device selection]
15:43:06 Reilly: we still expose MLGraph.devices on the graph in Chromium
15:43:31 ... if the updated proposal works for MarkusH we can implement it
15:43:48 ... I think the group believes the policy is a better abstraction than devices
15:44:11 ... but we need to make sure the effective policy is exposed in a way that allows developers to understand it
15:45:47 MarkusH: I have two major things: 1) we do need feedback that a model intended to run on a low-power device such as the NPU actually runs there as expected; 2) an adaptation use case, we can shift execution away if we're overloaded and are getting poor performance from "high-performance"
15:46:55 Anssi: is this as simple as exposing MLGraph.policy?
15:47:38 Reilly: a fallback = false signal would express what the developer is looking for
15:47:52 q?
15:48:31 MarkusH: if we start adding things to MLComputePolicy, the mapping to the effective policy may not be 1:1
15:48:32 q?
15:48:33 q+
15:48:37 ack RafaelCintron
15:49:40 Rafael: one thing, I'm OK with the current "low-power" and "high-performance", but in some generations of devices we should not assume "high-performance" always means GPU
15:49:54 Ack
15:50:16 q+
15:50:23 ack ningxin
15:51:25 Ningxin: re Rafael's point, Intel Panther Lake has a 12Xe GPU configuration where the GPU is faster than the NPU, but on a lower-end SKU the GPU is less capable than the NPU in that device
15:51:46 ... re MarkusH's comment on GPU offloading, is an application hint helpful here?
15:52:01 q+
15:52:03 ... something like offloading
15:52:06 ack handellm
15:52:24 MarkusH: my thought is to specify fallback when I want to avoid accelerated devices
15:52:52 ... I'd specify fallback in terms of MLComputePolicy
15:54:07 Ningxin: that is useful, my point is the GPU might already be busy so you may want to move the workload to another device such as the NPU in such a case
15:54:28 ... should we have something specific to signal that preference, "still use some accelerator, but avoid GPU"
15:54:56 MarkusH: if I specify "low-power" I'd do that because I'm running on a laptop that's not plugged in
15:55:13 ... if there's enough power and performance available that might not be a problem
15:55:51 ... if we have performance issues on inference workloads with a specific device, it makes sense to understand that
15:56:12 ... if the NPU is low power I'd execute on that device
15:56:22 Ningxin: we can prototype and test
15:56:23 q?
15:57:04 Zoltan: we went back and forth on this design earlier, no strong opinion at this stage
15:57:20 q?
15:57:49 Anssi: should we open a fresh issue for this?
15:57:53 MarkusH supports the idea
15:58:08 Anssi: I propose we close issue #836 and open a new issue for Effective MLComputePolicy exposure
15:58:08 https://github.com/webmachinelearning/webnn/issues/836 -> Issue 836 Get devices used for a graph after graph compilation (by philloooo) [device selection]
15:58:45 Reilly: I think the proposal has shifted enough that we should open a fresh issue
15:59:24 +1
15:59:26 SG!
15:59:30 RESOLUTION: Close issue #836 and open a new issue for Effective MLComputePolicy exposure discussion.
16:00:19 RRSAgent, draft minutes
16:00:21 I have made the request to generate https://www.w3.org/2026/05/07-webmachinelearning-minutes.html anssik
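A sketch of the effective-policy exposure discussed above; the computePolicy and fallback option names come from PR #923, and graph.computePolicy is the hypothetical post-compilation attribute proposed in this discussion, none of which is shipped API:

    // Request low power and opt out of fallback, per the fallback = false idea.
    const context = await navigator.ml.createContext(
        {computePolicy: 'low-power', fallback: false});
    const builder = new MLGraphBuilder(context);
    const a = builder.input('a', {dataType: 'float32', shape: [2, 2]});
    const out = builder.relu(a);
    const graph = await builder.build({out});
    // Hypothetical: the effective policy the implementation actually chose,
    // exposed only after compilation; may differ from the requested policy.
    console.log(graph.computePolicy);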