13:58:20 RRSAgent has joined #webmachinelearning
13:58:25 logging to https://www.w3.org/2024/10/17-webmachinelearning-irc
13:58:25 RRSAgent, make logs Public
13:58:26 please title this meeting ("meeting: ..."), anssik
13:58:27 Meeting: WebML WG Teleconference – 17 October 2024
13:58:45 Chair: Anssi
13:58:49 AramZS_ has joined #webmachinelearning
13:58:49 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2024-10-17-wg-agenda.md
13:58:57 Scribe: Anssi
13:59:01 scribeNick: anssik
13:59:11 gb, this is webmachinelearning/webnn
13:59:13 anssik, OK.
13:59:23 Present+ Anssi_Kostiainen
13:59:36 asully has joined #webmachinelearning
13:59:50 Present+ Michael_McCool
13:59:55 Present+ Austin_Sullivan
14:00:49 Present+ Dwayne_Robinson
14:01:00 Present+ Zoltan_Kis
14:01:16 Present+ Ningxin_Hu
14:01:34 dwayner has joined #webmachinelearning
14:01:46 RRSAgent, draft minutes
14:01:47 I have made the request to generate https://www.w3.org/2024/10/17-webmachinelearning-minutes.html anssik
14:01:54 zkis has joined #webmachinelearning
14:02:33 anssik: I hope you had a great time at TPAC!
14:03:01 ... today, we'll resume our bi-weeklies with a refresher on selected resolutions and proposals from the F2F, and discuss our next steps
14:03:11 ... but first, I'd like to welcome our most recent new participants:
14:03:20 ... Talha Gorsi and Robert Simpson from Qualcomm
14:03:27 ... Sohum Chatterjee from Microsoft
14:03:31 ... welcome to the WebML WG!
14:03:55 ningxin has joined #webmachinelearning
14:04:22 Topic: WebNN Operator Update Wave 3
14:04:32 -> Slides https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0014/WebNN_Operator_Update_Wave_3.pdf
14:04:37 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#798f
14:04:56 McCool has joined #webmachinelearning
14:05:20 Resolution: Update spec with Wave 3 operators, initiate int4/uint4 wide review.
14:05:33 anssik: Dwayne's Wave 3 plan as documented in his slides received the group's support
14:05:45 ... the editors can now start formulating the Wave 3 changes as a spec PR
14:06:09 ... either an all-in-one PR or multiple smaller PRs, as long as each is self-contained
14:06:41 Dwayne: prefer one PR initially
14:06:47 sgtm
14:07:17 Dwayne: everyone on the call has seen the slides; one interesting update: all Wave 3 ops are in Chromium now
14:08:03 ... WebNN EP to validate this Chromium work
14:08:10 ... 8 remaining ops out of 12 there
14:08:33 ... no new ops in mind beyond Wave 3 for now
14:08:47 anssik: I think the group should check in with the TAG for the int4/uint4 type
14:09:12 ... the plan is to expose these 4-bit types through Int8Array using a packing approach, since there's no Int4Array in JS
14:09:16 -> https://tc39.es/ecma262/multipage/indexed-collections.html#table-49
14:09:25 anssik: we could land our int4/uint4 proposal into the spec and ask the TAG to review it; no need to block landing the spec PR on the TAG review given we're doing a CR Draft here
14:09:51 McCool: bitwise ops?
14:10:00 Dwayne: need to consider CoreML compat
14:10:22 ... MikeW shared that it might be something they could add in the future, but it could take a while; meanwhile we need an emulation path
14:10:30 ... via BNNS or MPS
14:10:39 McCool: in theory could pack them on the CPU
14:11:00 Present+ Etienne_Noel
14:11:43 q?
14:11:58 Topic: Quantization and dequantization (QDQ)
14:12:03 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#2842
14:12:15 Resolution: Add QDQ operators for int8 and int4 and consolidate #93 #128 #623.
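For illustration, a minimal sketch of the Int8Array packing approach mentioned above, combined with the standard dequantizeLinear formula y = (x - zeroPoint) * scale. The low-nibble-first packing order and the helper names are assumptions for this sketch only; the actual packing convention is exactly what the int4/uint4 wide review is meant to settle.

```typescript
// Hypothetical illustration: pack int4 values (range -8..7) into an
// Int8Array, two values per byte, low nibble first (assumed order).
function packInt4(values: number[]): Int8Array {
  const packed = new Int8Array(Math.ceil(values.length / 2));
  for (let i = 0; i < values.length; i++) {
    const nibble = values[i] & 0x0f; // keep low 4 bits (two's complement)
    packed[i >> 1] |= i % 2 === 0 ? nibble : nibble << 4;
  }
  return packed;
}

function unpackInt4(packed: Int8Array, count: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < count; i++) {
    const byte = packed[i >> 1];
    const nibble = i % 2 === 0 ? byte & 0x0f : (byte >> 4) & 0x0f;
    out.push(nibble < 8 ? nibble : nibble - 16); // sign-extend to int4 range
  }
  return out;
}

// dequantizeLinear per the usual QDQ definition: y = (x - zeroPoint) * scale
function dequantizeLinear(x: number[], scale: number, zeroPoint: number): number[] {
  return x.map((v) => (v - zeroPoint) * scale);
}

// e.g. unpack int4 weights, then dequantize to float
const weights = dequantizeLinear(unpackInt4(packInt4([-8, 7, 3, -1]), 4), 0.05, 0);
console.log(weights); // [-0.4, 0.35, 0.15, -0.05]
```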
14:12:15 https://github.com/webmachinelearning/webnn/issues/623 -> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request] [device selection]
14:12:15 https://github.com/webmachinelearning/webnn/issues/93 -> Issue 93 Add QuantizeLinear and DequantizeLinear for mixed precision (by kpu) [opset] [feature request]
14:12:15 https://github.com/webmachinelearning/webnn/issues/128 -> Issue 128 WebNN should support int8 quantized models (by wchao1115) [v2] [opset] [feature request]
14:12:40 anssik: based on our F2F discussion, the challenge is how to represent int4 on the web and across backends, as discussed in the context of Wave 3
14:12:54 anssik: it seems int4 and uint4 support for quantizeLinear and dequantizeLinear landed very recently in Chromium, any learnings to share?
14:12:58 -> https://chromium-review.googlesource.com/c/chromium/src/+/5922495
14:13:38 Ningxin: we learned, when testing some models, that we need to support int32 for the dequantize operator
14:14:12 ... we exercise some models with bias in int32
14:15:23 Dwayne: the other aspect is the block size consideration: when dequantizing, if the higher level has the concept of block size, you have to expand the tensor to full size before passing it to WebNN, which uses a lot of GPU memory
14:15:58 One example model using the int32 dequantize operator: https://huggingface.co/webml/models/resolve/main/int8/resnet50-v1-12-qdq.onnx
14:16:05 anssik: roll these changes into the Wave 3 PR?
14:16:11 Dwayne: yes, fold into the Wave 3 PR
14:16:58 Austin: the CoreML backend has some limitations, int4 needs to be emulated
14:17:18 ... three variants of quantization are available
14:17:37 Dwayne: the spec as defined may need some updates after TF and CoreML backend implementation experience
14:17:54 Topic: Platform capability detection
14:17:58 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#b5cc
14:18:02 anssik: issue #463 and PR #755
14:18:03 https://github.com/webmachinelearning/webnn/pull/755 -> MERGED Pull Request 755 Define opSupportLimits() (by philloooo)
14:18:03 https://github.com/webmachinelearning/webnn/issues/463 -> CLOSED Issue 463 Allow checking whether operators/types are supported for a backend before creating a graph (by huningxin) [feature request]
14:18:24 anssik: Proposal: Collect real-world feedback from opSupportLimits() usage to inform baseline and max limits.
14:19:06 Dwayne: data type support is already in, max limits are the remaining work
14:19:48 McCool: max dimension limits for tensors?
14:20:07 Dwayne: some hardware does have such limits
14:20:25 McCool: scatter and gather, how big an index is warranted?
14:20:28 https://www.w3.org/TR/webnn/#valid-dimension
14:20:43 Austin: the spec defines a valid dimension
14:21:02 ... CoreML and TF indices are int32 primarily
14:21:20 ... trying a model with int64 indices would not work on TF or CoreML
14:21:52 ... thus int64 indices should not be necessary
14:22:47 ... having a common set that works everywhere, where indices would have to be int32
14:23:07 Dwayne: max dimensions, should they be exposed via opSupportLimits()? Or via data types?
14:23:21 Austin: max rank should be exposed via opSupportLimits()
14:23:43 q+
14:24:16 anssik: does ORT exercise the opSupportLimits() API?
14:25:03 "Use opSupportLimits to dynamically check data type support" https://github.com/microsoft/onnxruntime/pull/22025 14:25:03 https://github.com/microsoft/onnxruntime/pull/22025 -> MERGED Pull Request 22025 [WebNN EP] Use opSupportLimits to dynamically check data type support (by Honry) [ep:WebNN] 14:25:12 Ningxin: we have WebNN EP implemented for data types, some ONNX models require int64, then WebNN EP will fallback argMax/Min to CPU 14:26:00 ... re maximum rank, would like to prioritize that, because SegmentAnything model has high-rank tensor (>5) 14:26:22 ... this model will fail without high-rank tensor 14:26:43 Yes, it's a 6D transpose (pattern found in a few models). 14:27:27 Austin: we discussed a minimal set that should be supported everywhere and use opSupportLimits() as an optional feature on top 14:28:06 q? 14:28:14 Topic: Device selection abstractions 14:28:23 Present+ Natasha_Gaitonde 14:28:30 -> Slides https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0006/MLDeviceType.pdf 14:28:35 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#b5c1 14:29:11 anssik: Proposal: Draft a spec PR to remove MLDeviceType, keep MLPowerPreference. Gauge prototyping interest and impact to frameworks? 14:29:19 ... there was support for removing explicit device type from Reilly and Mike 14:29:37 ... Rafael notes the Windows ecosystem is more heterogeneous, not having a device selection mechanism harder 14:29:55 q+ 14:29:57 q? 14:30:04 ack ningxin 14:31:15 q+ 14:31:16 ningxin: my open regarding removal is, can we still have an opportunity for the developer to select a particular device type, e.g. CPU, if we leave in the power preference 14:31:17 q+ 14:31:37 ack zkis 14:32:14 zkis: the question is, do we have other use cases where the application wants to control that certain models want to run on low power? 14:32:40 ... we need more use cases where application control is relevant 14:32:49 q? 14:33:10 ack dwayner 14:33:34 Dwayne: Apple's concern was implementability as a hard requirement 14:34:01 NatashaGaitonde has joined #webmachinelearning 14:34:06 ... anybody want to remove this, would like to consider scenarios like desktops with two GPUs 14:34:18 ... or NPUs being faster than GPU, does power preference express that? 14:34:40 ... or scenario where GPU is already busy, thus NPU usage preferred for better user experience 14:34:40 q? 14:34:55 q? 14:35:00 ack asully 14:35:44 Austin: I think from my perspective, I'd like to see this be a requirement, but would like to go to OT in Chromium in the next couple of milestones and I don't see us changing the semantics of this option before that OT 14:36:06 ... from Chrome team's perspective, eager to get this in hands of real users to understand what they feel about this option 14:36:07 q? 14:36:50 Topic: Google Chrome feedback revisited 14:36:57 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#899d 14:37:01 Resolution: Close Chrome feedback #453 as completed. 14:37:02 https://github.com/webmachinelearning/webnn/issues/453 -> Issue 453 Google Chrome Feedback on WebNN: aiming for broad device coverage and maintainability (by vsekhar) [process] [opset] [use case] 14:37:26 Austin: I can close soon. 
14:37:38 Topic: Interop issues across different backends
14:37:43 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#7320
14:37:47 -> "interop" issues https://github.com/webmachinelearning/webnn/issues?q=is%3Aissue+is%3Aopen+label%3Ainterop
14:38:05 anssik: Proposal: Revisit interop issues, e.g. remove pooling's rounding direction from MLPool2dOptions to close #324, and decide between the clamp or "ignore the index" approach to close #486.
14:38:05 https://github.com/webmachinelearning/webnn/issues/324 -> Issue 324 Simplify the operand layout support of conv2d and pooling 2d operations (by huningxin) [feature request] [operator specific] [interop]
14:38:05 https://github.com/webmachinelearning/webnn/issues/486 -> Issue 486 Add "implementation consideration" about how out-of-bound indices of Gather/Scatter should be handled (by huningxin) [operator specific] [interop]
14:38:15 ... there seem to be those few issues where we have an agreement on the solution
14:38:25 ... Ningxin, any other interop issues where the group's feedback is required to make progress?
14:39:10 Ningxin: among the interop issues in Wave 3, Jiewei reported that gatherElements has no direct mapping on TFLite; not an interop issue per se, but it needs an emulation path for those backends
14:39:54 ... I'll check with Jiewei; currently documented in the Transformers issue #375
14:39:55 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [opset]
14:40:24 Austin: prefer a separate issue
14:40:30 q?
14:40:38 Topic: Core operator set
14:40:42 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#23a8
14:41:02 anssik: Proposal: Document requirements for adding a "new core op" and "non-core op" (consider e.g. TOSA, MLIR linalg), categorize ops in the spec.
14:41:15 anssik: issue #573
14:41:16 https://github.com/webmachinelearning/webnn/issues/573 -> Issue 573 Core operator set (by philloooo) [question] [opset]
14:41:39 ... per the F2F discussion, it looks like the group had consensus to document what it takes to add an operator to the spec
14:41:39 ... and see if the guidelines for adding "high-level" (decomposable) vs "low-level" ops should be different
14:41:54 -> https://github.com/webmachinelearning/webnn/blob/main/CONTRIBUTING.md#proposing-and-adding-a-new-operation
14:43:03 Dwayne: we wanted to define an op as low-level if it cannot be decomposed further
14:44:03 ningxin: I agree; as discussed in Wave 3, we probably want to compare TOSA and linalg to find gaps and useful primitives the WebNN spec is missing, and also address high-level ops, see if some can be removed when expressible with primitives
14:44:15 ... optimized implementation via fusion
14:44:36 ... custom ops using primitives
14:45:54 anssik: are the existing guidelines still valid?
14:46:00 Dwayne: they seem useful to me
14:46:15 Austin: hoping one day we can remove the guidelines because we have all we need :-)
14:46:26 q?
14:46:37 Topic: MLTensor
14:46:38 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#e5b7
14:46:49 anssik: Proposal: Merge the explainer PR with an agreement on how MLDeviceType changes impact buffer allocation.
14:47:00 ... it's great to see Corentin from the WebGPU group continue to review the MLTensor explainer and provide insights
14:47:06 ... Austin, what are the remaining open questions from Corentin?
14:48:35 Austin: the remaining thing to address: from the WebNN perspective, allocating a tensor is opaque, which is an easy guarantee to give for reading and writing; if you hand the buffer out to WebGPU you cannot give the same guarantee
14:49:00 ... agreed with Corentin we need to expose the layout of the buffer to the developer; will work on incorporating that feedback into the PR
14:49:00 q?
14:49:12 Dwayne: strides between dimensions can have gaps?
14:49:34 Austin: potentially; I think naively, what does exposing a layout mean: one block, then expose it just as one big array
14:49:44 Dwayne: subtiling or blocks
14:49:58 Austin: something WebGPU would need to know
14:50:16 ... IIUC on the Windows side it's always one big block
14:50:22 Dwayne: it's always linear on Windows
14:51:03 Austin: I'm hoping the same assumption holds for CoreML on Mac; if not, we need to do some further design work
14:51:23 ... the challenge is what buffer WebGPU is given and how to read and write to it
14:51:25 q?
14:51:46 anssik: there's a path forward, that's great
14:51:57 Austin: the hope is we can merge the PR soon
14:52:14 q?
14:52:25 Topic: Wide review: TAG
14:52:29 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#cdd5
14:52:41 Resolution: Add resource contention considerations to the spec to address TAG review feedback.
14:52:47 anssik: I pushed PR #765 to add resource contention considerations to the spec
14:52:47 https://github.com/webmachinelearning/webnn/pull/765 -> Pull Request 765 Add resource contention considerations (by anssiko)
14:53:00 anssik: thanks Reilly for your review
14:53:21 Topic: Tensor primitives
14:53:25 -> Slides https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0007/Tensor_Primitive_Ops_Proposal_-_TPAC.pdf
14:53:29 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#b039
14:53:44 anssik: Proposal: Continue to explore authoring high-level ops with tensor primitives.
14:54:00 ... the goal is to demonstrate we can compose custom ops using unified tensor-level primitives
14:54:16 ... Ningxin, would you like to get the group's input on the proof-of-concept direction?
14:54:33 q+
14:54:53 ningxin: this topic is synergistic with the core op set discussion
14:54:55 q-
14:55:25 zkis: the question about composability is, how can we optimize the memory structure
14:55:27 q?
14:55:37 will do
14:55:42 Topic: Translation and Prompt APIs
14:55:46 -> Slides https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0008/TPAC_2024_Built-in_AI_APIs.pdf
14:55:51 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#20fb
14:56:04 anssik: Proposal: Solicit input on whether to adopt these APIs as new deliverables in the WG's Charter 2025->
14:56:15 ... would like to hear early signals for adopting the Translation and Prompt APIs into the WG
14:56:23 ... the official W3C-wide review would happen in Q1'25 when we recharter
14:56:38 in the WG or the CG? what is the preference?
14:57:19 anssik: the editor preference was the WG
14:59:01 I'll be leading this area at Google so would love for this area to be part of this group.
15:00:00 q+ Natasha
15:00:28 Etienne: initially proposing the Translation and Prompt APIs for WG adoption
15:00:37 ack Natasha
15:01:14 Natasha: want to understand customer requirements; in terms of standardization, this forum is important for getting feedback from other vendors
15:01:28 q+
15:01:36 ... another feature would be Storage: how origins could share or not share access to these large models
15:01:38 q?
15:02:09 ack McCool
15:03:11 -> https://developers.google.com/privacy-sandbox/cookies/related-website-sets
15:04:03 q?
15:07:03 anssik: hearing initial support from Google and Microsoft for adopting the Translation and Prompt APIs in the WebML WG; we will solicit more input from other WG participants before we initiate official rechartering and W3C-wide review in Q1'25
15:07:11 RRSAgent, draft minutes
15:07:12 I have made the request to generate https://www.w3.org/2024/10/17-webmachinelearning-minutes.html anssik
15:22:42 RRSAgent, draft minutes
15:22:44 I have made the request to generate https://www.w3.org/2024/10/17-webmachinelearning-minutes.html anssik
17:05:43 Zakim has left #webmachinelearning