13:58:20 RRSAgent has joined #webmachinelearning
13:58:25 logging to https://www.w3.org/2024/10/17-webmachinelearning-irc
13:58:25 RRSAgent, make logs Public
13:58:26 please title this meeting ("meeting: ..."), anssik
13:58:27 Meeting: WebML WG Teleconference – 17 October 2024
13:58:45 Chair: Anssi
13:58:49 AramZS_ has joined #webmachinelearning
13:58:49 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2024-10-17-wg-agenda.md
13:58:57 Scribe: Anssi
13:59:01 scribeNick: anssik
13:59:11 gb, this is webmachinelearning/webnn
13:59:13 anssik, OK.
13:59:23 Present+ Anssi_Kostiainen
13:59:36 asully has joined #webmachinelearning
13:59:50 Present+ Michael_McCool
13:59:55 Present+ Austin_Sullivan
14:00:49 Present+ Dwayne_Robinson
14:01:00 Present+ Zoltan_Kis
14:01:16 Present+ Ningxin_Hu
14:01:34 dwayner has joined #webmachinelearning
14:01:46 RRSAgent, draft minutes
14:01:47 I have made the request to generate https://www.w3.org/2024/10/17-webmachinelearning-minutes.html anssik
14:01:54 zkis has joined #webmachinelearning
14:02:33 anssik: I hope you had a great time at TPAC!
14:03:01 ... today, we'll resume our bi-weeklies with a refresher on selected resolutions and proposals from the F2F, and discuss our next steps
14:03:11 ... but first, I'd like to welcome our most recent new participants:
14:03:20 ... Talha Gorsi and Robert Simpson from Qualcomm
14:03:27 ... Sohum Chatterjee from Microsoft
14:03:31 ... welcome to the WebML WG!
14:03:55 ningxin has joined #webmachinelearning
14:04:22 Topic: WebNN Operator Update Wave 3
14:04:32 -> Slides https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0014/WebNN_Operator_Update_Wave_3.pdf
14:04:37 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#798f
14:04:56 McCool has joined #webmachinelearning
14:05:20 Resolution: Update spec with Wave 3 operators, initiate int4/uint4 wide review.
14:05:33 anssik: Dwayne's Wave 3 plan as documented in his slides received the group's support
14:05:45 ... the editors can now start formulating the Wave 3 changes as a spec PR
14:06:09 ... either an all-in-one PR or multiple smaller PRs, as long as each is self-contained
14:06:41 Dwayne: prefer one PR initially
14:06:47 sgtm
14:07:17 Dwayne: everyone on the call has seen the slides; one interesting update: all Wave 3 ops are in Chromium now
14:08:03 ... WebNN EP to validate this Chromium work
14:08:10 ... 8 remaining ops out of 12 there
14:08:33 ... no new ops in mind beyond Wave 3 for now
14:08:47 anssik: I think the group should check in with the TAG for the int4/uint4 type
14:09:12 ... the plan is to expose these 4-bit types through Int8Array using a packing approach, since there's no Int4Array in JS
14:09:16 -> https://tc39.es/ecma262/multipage/indexed-collections.html#table-49
14:09:25 anssik: we could land our int4/uint4 proposal into the spec and ask the TAG to review it; no need to block landing the spec PR on the TAG review given we're doing a CR Draft here
14:09:51 McCool: bitwise ops?
14:10:00 Dwayne: need to consider CoreML compat
14:10:22 ... MikeW shared that it might be something they could add in the future, but it could take a while; meanwhile we need an emulation path
14:10:30 ... via BNNS or MPS
14:10:39 McCool: in theory could pack them on the CPU
14:11:00 Present+ Etienne_Noel
14:11:43 q?
14:11:58 Topic: Quantization and dequantization (QDQ)
14:12:03 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#2842
14:12:15 Resolution: Add QDQ operators for int8 and int4 and consolidate #93 #128 #623.
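For illustration, a minimal sketch of the Int8Array packing approach mentioned above, combined with the standard dequantizeLinear formula y = (x - zeroPoint) * scale. The low-nibble-first packing order and the helper names are assumptions for this sketch only; the actual packing convention is exactly what the int4/uint4 wide review is meant to settle.

```typescript
// Hypothetical illustration: pack int4 values (range -8..7) into an
// Int8Array, two values per byte, low nibble first (assumed order).
function packInt4(values: number[]): Int8Array {
  const packed = new Int8Array(Math.ceil(values.length / 2));
  for (let i = 0; i < values.length; i++) {
    const nibble = values[i] & 0x0f; // keep low 4 bits (two's complement)
    packed[i >> 1] |= i % 2 === 0 ? nibble : nibble << 4;
  }
  return packed;
}

function unpackInt4(packed: Int8Array, count: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < count; i++) {
    const byte = packed[i >> 1];
    const nibble = i % 2 === 0 ? byte & 0x0f : (byte >> 4) & 0x0f;
    out.push(nibble < 8 ? nibble : nibble - 16); // sign-extend to int4 range
  }
  return out;
}

// dequantizeLinear per the usual QDQ definition: y = (x - zeroPoint) * scale
function dequantizeLinear(x: number[], scale: number, zeroPoint: number): number[] {
  return x.map((v) => (v - zeroPoint) * scale);
}

// e.g. unpack int4 weights, then dequantize to float
const weights = dequantizeLinear(unpackInt4(packInt4([-8, 7, 3, -1]), 4), 0.05, 0);
console.log(weights); // [-0.4, 0.35, 0.15, -0.05]
```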
14:12:15 https://github.com/webmachinelearning/webnn/issues/623 -> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request] [device selection]
14:12:15 https://github.com/webmachinelearning/webnn/issues/93 -> Issue 93 Add QuantizeLinear and DequantizeLinear for mixed precision (by kpu) [opset] [feature request]
14:12:15 https://github.com/webmachinelearning/webnn/issues/128 -> Issue 128 WebNN should support int8 quantized models (by wchao1115) [v2] [opset] [feature request]
14:12:40 anssik: based on our F2F discussion, the challenge is how to represent int4 on the web and across backends, as discussed in the context of Wave 3
14:12:54 anssik: it seems int4 and uint4 support for quantizeLinear and dequantizeLinear landed very recently in Chromium, any learnings to share?
14:12:58 -> https://chromium-review.googlesource.com/c/chromium/src/+/5922495
14:13:38 Ningxin: we learned, when testing some models, that we need to support int32 for the dequantize operator
14:14:12 ... we exercise some models with bias in int32
14:15:23 Dwayne: the other aspect is the block size consideration: when dequantizing, if the higher level has the concept of block size, you have to expand the tensor to full size before passing it to WebNN, which uses a lot of GPU memory
14:15:58 One example model using the int32 dequantize operator: https://huggingface.co/webml/models/resolve/main/int8/resnet50-v1-12-qdq.onnx
14:16:05 anssik: roll these changes into the Wave 3 PR?
14:16:11 Dwayne: yes, fold into the Wave 3 PR
14:16:58 Austin: the CoreML backend has some limitations, int4 needs to be emulated
14:17:18 ... three variants of quantization are available
14:17:37 Dwayne: the spec as defined may need some updates after TF and CoreML backend implementation experience
14:17:54 Topic: Platform capability detection
14:17:58 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#b5cc
14:18:02 anssik: issue #463 and PR #755
14:18:03 https://github.com/webmachinelearning/webnn/pull/755 -> MERGED Pull Request 755 Define opSupportLimits() (by philloooo)
14:18:03 https://github.com/webmachinelearning/webnn/issues/463 -> CLOSED Issue 463 Allow checking whether operators/types are supported for a backend before creating a graph (by huningxin) [feature request]
14:18:24 anssik: Proposal: Collect real-world feedback from opSupportLimits() usage to inform baseline and max limits.
14:19:06 Dwayne: data type support is already in, max limits are the remaining work
14:19:48 McCool: max dimension limits for tensors?
14:20:07 Dwayne: some hardware does have such limits
14:20:25 McCool: scatter and gather, how big an index is warranted?
14:20:28 https://www.w3.org/TR/webnn/#valid-dimension
14:20:43 Austin: the spec defines a valid dimension
14:21:02 ... CoreML and TF indices are int32 primarily
14:21:20 ... trying a model with int64 indices would not work on TF or CoreML
14:21:52 ... thus int64 indices should not be necessary
14:22:47 ... having a common set that works everywhere, where indices would have to be int32
14:23:07 Dwayne: max dimensions, should they be exposed via opSupportLimits()? Or via data types?
14:23:21 Austin: max rank should be exposed via opSupportLimits()
14:23:43 q+
14:24:16 anssik: does ORT exercise the opSupportLimits() API?
14:25:03 "Use opSupportLimits to dynamically check data type support" https://github.com/microsoft/onnxruntime/pull/22025 14:25:03 https://github.com/microsoft/onnxruntime/pull/22025 -> MERGED Pull Request 22025 [WebNN EP] Use opSupportLimits to dynamically check data type support (by Honry) [ep:WebNN] 14:25:12 Ningxin: we have WebNN EP implemented for data types, some ONNX models require int64, then WebNN EP will fallback argMax/Min to CPU 14:26:00 ... re maximum rank, would like to prioritize that, because SegmentAnything model has high-rank tensor (>5) 14:26:22 ... this model will fail without high-rank tensor 14:26:43 Yes, it's a 6D transpose (pattern found in a few models). 14:27:27 Austin: we discussed a minimal set that should be supported everywhere and use opSupportLimits() as an optional feature on top 14:28:06 q? 14:28:14 Topic: Device selection abstractions 14:28:23 Present+ Natasha_Gaitonde 14:28:30 -> Slides https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0006/MLDeviceType.pdf 14:28:35 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#b5c1 14:29:11 anssik: Proposal: Draft a spec PR to remove MLDeviceType, keep MLPowerPreference. Gauge prototyping interest and impact to frameworks? 14:29:19 ... there was support for removing explicit device type from Reilly and Mike 14:29:37 ... Rafael notes the Windows ecosystem is more heterogeneous, not having a device selection mechanism harder 14:29:55 q+ 14:29:57 q? 14:30:04 ack ningxin 14:31:15 q+ 14:31:16 ningxin: my open regarding removal is, can we still have an opportunity for the developer to select a particular device type, e.g. CPU, if we leave in the power preference 14:31:17 q+ 14:31:37 ack zkis 14:32:14 zkis: the question is, do we have other use cases where the application wants to control that certain models want to run on low power? 14:32:40 ... we need more use cases where application control is relevant 14:32:49 q? 14:33:10 ack dwayner 14:33:34 Dwayne: Apple's concern was implementability as a hard requirement 14:34:01 NatashaGaitonde has joined #webmachinelearning 14:34:06 ... anybody want to remove this, would like to consider scenarios like desktops with two GPUs 14:34:18 ... or NPUs being faster than GPU, does power preference express that? 14:34:40 ... or scenario where GPU is already busy, thus NPU usage preferred for better user experience 14:34:40 q? 14:34:55 q? 14:35:00 ack asully 14:35:44 Austin: I think from my perspective, I'd like to see this be a requirement, but would like to go to OT in Chromium in the next couple of milestones and I don't see us changing the semantics of this option before that OT 14:36:06 ... from Chrome team's perspective, eager to get this in hands of real users to understand what they feel about this option 14:36:07 q? 14:36:50 Topic: Google Chrome feedback revisited 14:36:57 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#899d 14:37:01 Resolution: Close Chrome feedback #453 as completed. 14:37:02 https://github.com/webmachinelearning/webnn/issues/453 -> Issue 453 Google Chrome Feedback on WebNN: aiming for broad device coverage and maintainability (by vsekhar) [process] [opset] [use case] 14:37:26 Austin: I can close soon. 
14:37:38 Topic: Interop issues across different backends
14:37:43 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#7320
14:37:47 -> "interop" issues https://github.com/webmachinelearning/webnn/issues?q=is%3Aissue+is%3Aopen+label%3Ainterop
14:38:05 anssik: Proposal: Revisit interop issues, e.g. remove pooling's rounding direction from MLPool2dOptions to close #324, and decide between the clamp or "ignore the index" approach to close #486.
14:38:05 https://github.com/webmachinelearning/webnn/issues/324 -> Issue 324 Simplify the operand layout support of conv2d and pooling 2d operations (by huningxin) [feature request] [operator specific] [interop]
14:38:05 https://github.com/webmachinelearning/webnn/issues/486 -> Issue 486 Add "implementation consideration" about how out-of-bound indices of Gather/Scatter should be handled (by huningxin) [operator specific] [interop]
14:38:15 ... there seem to be those few issues where we have an agreement on the solution
14:38:25 ... Ningxin, any other interop issues where the group's feedback is required to make progress?
14:39:10 Ningxin: among the interop issues in Wave 3, Jiewei reported that gatherElements has no direct mapping on TFLite; not an interop issue per se, but it needs an emulation path for those backends
14:39:54 ... I'll check with Jiewei; currently documented in the Transformers issue #375
14:39:55 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [opset]
14:40:24 Austin: prefer a separate issue
14:40:30 q?
14:40:38 Topic: Core operator set
14:40:42 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#23a8
14:41:02 anssik: Proposal: Document requirements for adding a "new core op" and "non-core op" (consider e.g. TOSA, MLIR linalg), categorize ops in the spec.
14:41:15 anssik: issue #573
14:41:16 https://github.com/webmachinelearning/webnn/issues/573 -> Issue 573 Core operator set (by philloooo) [question] [opset]
14:41:39 ... per the F2F discussion, it looks like the group had consensus to document what it takes to add an operator to the spec
14:41:39 ... and see if the guidelines for adding "high-level" (decomposable) vs "low-level" ops should be different
14:41:54 -> https://github.com/webmachinelearning/webnn/blob/main/CONTRIBUTING.md#proposing-and-adding-a-new-operation
14:43:03 Dwayne: we wanted to define an op as low-level if it cannot be decomposed further
14:44:03 ningxin: I agree; as discussed in Wave 3, we probably want to compare TOSA and linalg to find gaps and useful primitives the WebNN spec is missing, and also address high-level ops, see if some can be removed when expressible with primitives
14:44:15 ... optimized implementation via fusion
14:44:36 ... custom ops using primitives
14:45:54 anssik: are the existing guidelines still valid?
14:46:00 Dwayne: they seem useful to me
14:46:15 Austin: hoping one day we can remove the guidelines because we have all we need :-)
14:46:26 q?
14:46:37 Topic: MLTensor
14:46:38 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#e5b7
14:46:49 anssik: Proposal: Merge the explainer PR with an agreement on how MLDeviceType changes impact buffer allocation.
14:47:00 ... it's great to see Corentin from the WebGPU group continue to review the MLTensor explainer and provide insights
14:47:06 ... Austin, what are the remaining open questions from Corentin?
14:48:35 Austin: the remaining thing to address: from the WebNN perspective, allocating a tensor is opaque, which is an easy guarantee to give for reading and writing; if you hand the buffer out to WebGPU you cannot give the same guarantee
14:49:00 ... agreed with Corentin we need to expose the layout of the buffer to the developer; will work on incorporating that feedback into the PR
14:49:00 q?
14:49:12 Dwayne: strides between dimensions can have gaps?
14:49:34 Austin: potentially; I think naively, what does exposing a layout mean: one block, then expose it just as one big array
14:49:44 Dwayne: subtiling or blocks
14:49:58 Austin: something WebGPU would need to know
14:50:16 ... IIUC on the Windows side it's always one big block
14:50:22 Dwayne: it's always linear on Windows
14:51:03 Austin: I'm hoping the same assumption holds for CoreML on Mac; if not, we need to do some further design work
14:51:23 ... the challenge is what buffer WebGPU is given and how to read and write to it
14:51:25 q?
14:51:46 anssik: there's a path forward, that's great
14:51:57 Austin: the hope is we can merge the PR soon
14:52:14 q?
14:52:25 Topic: Wide review: TAG
14:52:29 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#cdd5
14:52:41 Resolution: Add resource contention considerations to the spec to address TAG review feedback.
14:52:47 anssik: I pushed PR #765 to add resource contention considerations to the spec
14:52:47 https://github.com/webmachinelearning/webnn/pull/765 -> Pull Request 765 Add resource contention considerations (by anssiko)
14:53:00 anssik: thanks Reilly for your review
14:53:21 Topic: Tensor primitives
14:53:25 -> Slides https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0007/Tensor_Primitive_Ops_Proposal_-_TPAC.pdf
14:53:29 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#b039
14:53:44 anssik: Proposal: Continue to explore authoring high-level ops with tensor primitives.
14:54:00 ... the goal is to demonstrate we can compose custom ops using unified tensor-level primitives
14:54:16 ... Ningxin, would you like to get the group's input on the proof-of-concept direction?
14:54:33 q+
14:54:53 ningxin: this topic is synergistic with the core op set discussion
14:54:55 q-
14:55:25 zkis: the question about composability is, how can we optimize the memory structure
14:55:27 q?
14:55:37 will do
14:55:42 Topic: Translation and Prompt APIs
14:55:46 -> Slides https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0008/TPAC_2024_Built-in_AI_APIs.pdf
14:55:51 -> F2F minutes https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#20fb
14:56:04 anssik: Proposal: Solicit input on whether to adopt these APIs as new deliverables in the WG's Charter 2025->
14:56:15 ... would like to hear early signals for adopting the Translation and Prompt APIs into the WG
14:56:23 ... the official W3C-wide review would happen in Q1'25 when we recharter
14:56:38 in the WG or the CG? what is the preference?
14:57:19 anssik: the editor preference was the WG
14:59:01 I'll be leading this area at Google so would love for this area to be part of this group.
15:00:00 q+ Natasha
15:00:28 Etienne: initially proposing the Translation and Prompt APIs for WG adoption
15:00:37 ack Natasha
15:01:14 Natasha: want to understand customer requirements; in terms of standardization, this forum is important for getting feedback from other vendors
15:01:28 q+
15:01:36 ... another feature would be Storage: how origins could share or not share access to these large models
15:01:38 q?
15:02:09 ack McCool
15:03:11 -> https://developers.google.com/privacy-sandbox/cookies/related-website-sets
15:04:03 q?
15:07:03 anssik: hearing initial support from Google and Microsoft for adopting the Translation and Prompt APIs in the WebML WG; we will solicit more input from other WG participants before we initiate official rechartering and W3C-wide review in Q1'25
15:07:11 RRSAgent, draft minutes
15:07:12 I have made the request to generate https://www.w3.org/2024/10/17-webmachinelearning-minutes.html anssik
15:22:42 RRSAgent, draft minutes
15:22:44 I have made the request to generate https://www.w3.org/2024/10/17-webmachinelearning-minutes.html anssik
17:05:43 Zakim has left #webmachinelearning