14:07:48 RRSAgent has joined #webmachinelearning
14:07:52 logging to https://www.w3.org/2023/11/02-webmachinelearning-irc
14:07:52 RRSAgent, make logs Public
14:07:53 please title this meeting ("meeting: ..."), anssik
14:07:53 Meeting: WebML WG Teleconference – 2 November 2023
14:07:57 Chair: Anssi
14:08:01 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2023-11-02-wg-agenda.md
14:08:05 Scribe: Anssi
14:08:11 scribeNick: anssik
14:08:21 gb, this is webmachinelearning/webnn
14:08:21 anssik, OK.
14:08:25 Present+ Anssi_Kostiainen
14:08:33 Regrets+ Dominique_Hazael-Massieux
14:08:43 Present+ Joshua_Lochner
14:08:48 Present+ Ningxin_Hu
14:08:54 Present+ Etienne_Noel
14:08:59 Present+ Rafael_Cintron
14:09:05 Present+ Dwayne_Robinson
14:09:12 Present+ Chai_Chaoweeraprasit
14:09:16 Present+ Vivek_Sekhar
14:09:17 and Joshua Bell :)
14:09:31 Present+ Joshua_Bell
14:09:38 RRSAgent, draft minutes
14:09:39 I have made the request to generate https://www.w3.org/2023/11/02-webmachinelearning-minutes.html anssik
14:10:03 dwayner has joined #webmachinelearning
14:10:17 Topic: WebNN v2: Review transformer ops spec contributions (continued)
14:10:17 anssik: issue #375
14:10:18 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
14:10:43 ... continuing from our previous call, recapping with some resources first
14:10:55 -> https://github.com/webmachinelearning/webnn/blob/main/CONTRIBUTING.md#proposing-and-adding-a-new-operation Guidelines for adding new operations
14:11:07 Etienne has joined #webmachinelearning
14:11:07 ... we have identified use cases and sample models:
14:11:17 -> https://huggingface.co/runwayml/stable-diffusion-v1-5 Text-to-image: stable-diffusion-v1-5
14:11:17 -> https://github.com/facebookresearch/segment-anything Image segmentation: segment-anything
14:11:17 -> https://huggingface.co/openai/whisper-tiny Speech-to-text: whisper-tiny
14:11:17 -> https://huggingface.co/optimum/t5-small Text-to-text generation (encoder-decoder): t5-small
14:11:17 -> https://huggingface.co/facebook/m2m100_418M Text-to-text generation (encoder-decoder): m2m100_418M
14:11:19 -> https://huggingface.co/meta-llama/Llama-2-7b Text generation (decoder-only): Llama-2-7b
14:11:35 ... we have done an op decomposition assessment:
14:11:40 -> https://docs.google.com/spreadsheets/d/1ELfHuv2UqP2LoXWLgqsC0L8T_qqfBx48KxzFighl8d8/ Transformer Models Analysis
14:12:02 ... Chai, you indicated you've been working on a big PR for the new op spec definitions
14:12:09 ... anything you'd like to bring to the group's attention in this meeting, or questions to ask of the group?
14:12:33 ... I'm wondering whether it'd help if the big PR were split into smaller PRs so those pieces could be reviewed in parallel while new ops are still being defined? Or are there dependencies that make this impractical?
14:12:34 q?
14:12:50 q?
14:13:17 Chai: this PR is a huge change, but mostly additions, no breaking changes yet; still working on it, past the midpoint
14:13:55 ... some comments: the proposal makes sense, it is a natural extension of what we have; there was a suggestion to combine a few ops that do nothing to the data itself, only update its shape
14:14:41 ... e.g. squeeze, unsqueeze, flatten2d (see the sketch below)
14:15:37 ... the other part is normalization, a good example of the design principles we have
14:16:40 ... normalization ops are similar
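[A minimal sketch of the point Chai raises, not discussed in this form on the call: squeeze and unsqueeze never touch the data, only its shape, so they can decompose into the existing reshape. The helper names and the explicit inputShape argument are illustrative assumptions, since the sketch does not assume an MLOperand exposes its own shape.]

```js
// Hedged sketch: squeeze/unsqueeze as reshape decompositions.
function squeeze(builder, input, inputShape, axes) {
  // Drop the listed size-1 dimensions; every other dimension is kept.
  const newShape = inputShape.filter((_, i) => !axes.includes(i));
  return builder.reshape(input, newShape);
}

function unsqueeze(builder, input, inputShape, axes) {
  // Insert a size-1 dimension at each listed (output-relative) axis.
  const newShape = [...inputShape];
  for (const axis of [...axes].sort((a, b) => a - b)) {
    newShape.splice(axis, 0, 1);
  }
  return builder.reshape(input, newShape);
}
```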
14:17:32 ... my preference is to keep things in one PR, I want to keep things consistent
14:18:55 anssik: the PR does not need to be perfect, we'll review it in the group and perfect it together
14:19:29 q?
14:19:42 q+
14:20:42 ack Ningxin_Hu
14:21:03 Ningxin_Hu: an update on the Transformer Models Analysis
14:21:29 ... int8 quantized models for whisper-tiny have been added, with encoder and decoder columns for int8
14:21:43 q+
14:21:49 ... feedback from the community was that the quantized model is used in production
14:22:10 ... four new ops are used in the quantized model: ConvInteger, MatMulInteger,
14:23:07 ... DequantizeLinear, DynamicQuantizeLinear (see the sketch at the end of this topic)
14:23:18 q+
14:23:21 q?
14:23:24 ack Joshua_Lochner
14:24:07 Joshua_Lochner: context for 8-bit weights: people wanted to use smaller weights with Transformers.js
14:24:33 https://huggingface.co/spaces/Xenova/whisper-web
14:24:39 https://i.imgur.com/oPCEtXs.png
14:24:39 ... I experimented with this; text-to-speech does not perform well, but the other way around, speech-to-text, performs well
14:25:07 ... as an example, whisper-tiny is 40 MB when 8-bit quantized
14:25:27 ... very little quality degradation, able to run on mobile and low-resource computers, surprising results
14:26:05 https://github.com/huggingface/distil-whisper
14:26:17 ... another thing: distilled versions of whisper-medium and whisper-large are to be released in an hour
14:26:38 ... the paper is also ready
14:28:07 q?
14:28:16 ack Chai
14:28:33 Chai: noting the big PR will not include the quantized ops yet
14:28:57 ... regarding the size of the spec, it is pretty big; thinking of factoring out some sections to a more suitable medium
14:29:22 ... e.g. the implementation sections are about parameter validation, and some parts are covered by WPT tests
14:29:57 ... I feel that as the spec grows, it will be more important to keep only the relevant parts of the ops in the document, and offload some of this validation to WPT in a more complete way
14:29:57 q?
14:30:38 q+
14:33:50 Chai: implementation sections could be offloaded, e.g. param validation
14:33:51 q?
14:33:54 ack Joshua_Lochner
14:34:41 Joshua_Lochner: these new models use the same architecture as the existing ones; in the distilled versions being released, internal decoder layers are removed, no new ops added
14:35:13 ... any op in the whisper-tiny encoder or decoder will be the same for the 8-bit quantized versions of the models
14:35:14 q?
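[For context on the quantized ops Ningxin lists, a hedged sketch not from the call: ONNX DequantizeLinear computes y = (x - zeroPoint) * scale, so in principle it can be emulated with elementwise ops until dedicated quantized ops are specified. Assumes per-tensor int8 quantization and that cast, sub and mul are available on the builder.]

```js
// Hedged sketch: emulate ONNX DequantizeLinear, y = (x - zeroPoint) * scale.
function dequantizeLinear(builder, x, scale, zeroPoint) {
  const xFloat = builder.cast(x, 'float32');            // int8 -> float32
  const zeroFloat = builder.cast(zeroPoint, 'float32'); // int8 -> float32
  // scale is a float operand; mul broadcasts it over the input.
  return builder.mul(builder.sub(xFloat, zeroFloat), scale);
}
```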
14:35:47 Topic: Enhancements
14:35:57 Subtopic: 0D scalars
14:36:00 anssik: #390
14:36:01 https://github.com/webmachinelearning/webnn/issues/390 -> Issue 390 0D scalars (by fdwr)
14:36:09 ... zero-dimension scalars; we discussed this earlier this year and I'd like to drive it to resolution
14:36:27 ... Dwayne identified a need for zero-dimension scalars while prototyping the additional models we're now discussing
14:36:43 ... there's data that suggests this enhancement should be baked into the API spec, consider:
14:36:58 ... - every ML library represents 0D scalars via the shape, e.g. NumPy, TF, PyTorch, ONNX, XNNPACK, and the SafeTensors file format too
14:37:13 ... - the proposed spec enhancement is very small, the deletion of one line in the https://www.w3.org/TR/webnn/#api-mloperand-create check dimensions steps:
14:37:20 ... "2. If dimensions.length is 0, return false."
14:37:40 ... recently two related changes were proposed for consideration by Ningxin if we add this 0D scalar support:
14:37:55 ... - make MLOperandDescriptor.dimensions a required field to allow distinguishing a scalar from a 1D tensor
14:38:08 ... - drop the scalar-specific MLGraphBuilder.constant(value, dataType) variant, discussed in issue #475
14:38:08 https://github.com/webmachinelearning/webnn/issues/475 -> Issue 475 Remove `builder.constant(value, dataType)` variant (by huningxin)
14:38:30 ... it looks like the first (make MLOperandDescriptor.dimensions a required field) should be rolled into the PR that adds support for 0D scalars (deletes the one line from the check dimensions steps), correct?
14:38:34 ... issue #475 to be addressed in a separate PR?
14:38:57 q?
14:39:20 Dwayne: I sent a PR for this; not an enhancement per se, but the wording needs to be fixed
14:39:38 PR #476
14:39:39 https://github.com/webmachinelearning/webnn/issues/476 -> Pull Request 476 Fix dimensions for 0D scalars (by fdwr)
14:39:49 q?
14:40:17 RRSAgent, draft minutes
14:40:18 I have made the request to generate https://www.w3.org/2023/11/02-webmachinelearning-minutes.html anssik
14:40:28 Subtopic: Softmax axis absent
14:40:33 anssik: issue #466
14:40:34 https://github.com/webmachinelearning/webnn/issues/466 -> Issue 466 Softmax axis absent (by fdwr)
14:40:59 ... issue reported by Wanming (thanks!) in the ONNX Runtime WebNN EP review
14:41:16 AramZS has joined #webmachinelearning
14:41:19 ... this is another great example of how our diverse implementation experience informs spec development, including feedback from web engine and framework implementations
14:41:40 ... Dwayne summarized the issue well: TF/PT/ONNX all take an axis parameter, but WebNN's softmax does not
14:41:46 ... platform support is good:
14:41:54 ... - Apple Metal Performance Shaders softMax has an axis
14:42:03 ... - Apple Model Intermediate Language (MIL) activation.softmax supports an axis
14:42:09 ... - DirectML's DML_ACTIVATION_SOFTMAX1_OPERATOR_DESC supports an arbitrary axis list and dimensions
14:42:19 ... to be investigated is XNNPACK, currently limited to 2D input
14:42:42 ... Dwayne proposed two solutions: 1) update XNNPACK to accept an axis, or 2) use the existing XNNPACK operator plus a reshape or transpose (sketched below)
14:42:51 q+
14:42:52 ... Dwayne also proposes the required IDL changes in the issue
14:42:55 q?
14:43:02 ack Chai
14:43:50 q+
14:43:55 Chai: proposing we tackle these issues from implementations and prototypes
14:43:56 q?
14:43:58 ack Ningxin_Hu
14:44:52 Ningxin_Hu: question to Chrome folks, can we send a CL to prototype this and inform the spec effort?
14:44:56 q?
14:45:30 Joshua_Bell: fine with that
14:45:57 sounds great, thanks!
14:46:29 anssik: good to move ahead with a CL to help inform the spec design
14:46:37 q?
14:46:39 RRSAgent, draft minutes
14:46:41 I have made the request to generate https://www.w3.org/2023/11/02-webmachinelearning-minutes.html anssik
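[A hedged sketch of Dwayne's option 2, not from the call: emulate softmax along an arbitrary axis on a backend limited to 2D input by moving the axis last, flattening to 2D, applying the existing softmax, then undoing both steps. The helper name and the explicit inputShape argument are illustrative assumptions; transpose, reshape and softmax are existing builder methods.]

```js
// Hedged sketch: softmax along `axis` via transpose + reshape + 2D softmax.
function softmaxAlongAxis(builder, input, inputShape, axis) {
  const rank = inputShape.length;
  // Permutation that moves `axis` to the last position.
  const perm = [...Array(rank).keys()].filter(i => i !== axis).concat(axis);
  const transposed = builder.transpose(input, { permutation: perm });
  const permShape = perm.map(i => inputShape[i]);
  // Flatten to 2D: rows = all other dimensions, columns = the softmax axis.
  const rows = permShape.slice(0, -1).reduce((a, b) => a * b, 1);
  const flat = builder.reshape(transposed, [rows, inputShape[axis]]);
  const result = builder.softmax(flat); // today's 2D-only softmax
  // Restore the original shape and axis order with the inverse permutation.
  const inverse = perm.map((_, i) => perm.indexOf(i));
  return builder.transpose(builder.reshape(result, permShape),
                           { permutation: inverse });
}
```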
14:46:47 Subtopic: split() into sizes not widely supported
14:46:51 anssik: issue #392
14:46:52 https://github.com/webmachinelearning/webnn/issues/392 -> Issue 392 `split` into sizes are not widely supported (by huningxin)
14:46:59 ... this is an issue raised by Jiawei in implementation review (thanks!)
14:47:09 ... WebNN split supports two variants:
14:47:19 ... - "number of splits" (when the splits argument is an unsigned long)
14:47:27 ... - "split into sizes" (when the splits argument is a sequence)
14:47:37 ... however, some backends such as XNNPACK don't support "split into sizes"
14:47:45 ... possible solutions proposed by Ningxin:
14:47:57 ... - decompose split into multiple slices (per the informative emulation path in the spec, see https://www.w3.org/TR/webnn/#api-mlgraphbuilder-split; a sketch follows at the end of this subtopic)
14:48:07 ... - throw an error and leave it to the framework to handle
14:48:18 ... Dwayne notes many frameworks support variable-length splits: TensorFlow, PyTorch, ONNX
14:48:25 ... and provides the following options expanding upon Ningxin's initial proposal:
14:48:31 ... 1. Keep dedicated split
14:48:37 ... 2. Backend decomposes variable lengths into slice calls
14:48:44 ... 3. Front end decomposes variable lengths into slice calls
14:48:49 ... 4. Throw an error if the backend doesn't support them
14:48:54 RRSAgent, draft minutes
14:48:56 I have made the request to generate https://www.w3.org/2023/11/02-webmachinelearning-minutes.html anssik
14:49:35 Dwayne: Does XNNPACK support variable-size windows for concat, which is split's symmetric pair operator?
14:50:06 q?
14:50:15 q+
14:50:18 ack Ningxin_Hu
14:50:43 https://bugs.chromium.org/p/chromium/issues/detail?id=1492036
14:50:52 Ningxin_Hu: we also opened another Chromium issue to benchmark split vs slice for the DirectML backend; this will inform the spec design decision
14:51:41 ... decomposing split may have an overhead, we want to understand that better and get data from the benchmark
14:52:12 q?
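[A sketch along the lines of the spec's informative emulation path, decomposing "split into sizes" along an axis into slice() calls. The helper name and the explicit inputShape argument are illustrative assumptions.]

```js
// Hedged sketch: "split into sizes" decomposed into slice() calls.
function splitIntoSizes(builder, input, inputShape, sizes, axis) {
  const outputs = [];
  let start = 0;
  for (const size of sizes) {
    const starts = inputShape.map(() => 0);
    const sliceSizes = [...inputShape];
    starts[axis] = start;    // running offset along the split axis
    sliceSizes[axis] = size; // this piece's extent along the split axis
    outputs.push(builder.slice(input, starts, sliceSizes));
    start += size;
  }
  return outputs;
}
```

[Ningxin's benchmark above should show whether this decomposition costs more than a dedicated split on the DirectML backend.]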
14:52:39 Subtopic: Clarify the restriction for minValue and maxValue of MLClampOptions
14:52:45 anssik: issue #396
14:52:46 https://github.com/webmachinelearning/webnn/issues/396 -> Issue 396 Clarify the restriction for `minValue` and `maxValue` of `MLClampOptions` (by huningxin)
14:53:18 ... current behavior: "WebNN clamp limits the input tensor element-wise within a range specified by the minimum and maximum values. The minimum and maximum values could be specified by MLClampOptions"
14:53:33 ... Google Security team's Alex asked "can min & max be == floats and is that ok or not?"
14:53:47 ... the current Chromium implementation requires "min <= max"
14:53:51 ... TF.js implements this restriction too
14:54:04 ... however, XNNPACK is stricter than that and requires "min < max"
14:54:29 ... so the WebNN spec needs to clarify this restriction; Dwayne provided data on how other implementations handle this: TF, PyTorch, ONNX, C++, DirectML
14:54:38 ... and the result is that all of these frameworks accept inverted ranges
14:54:44 ... proposed solutions (a sketch contrasting them follows at the end of these minutes):
14:54:56 ... 1) accept inverted ranges (and the implementation adjusts the min/max values before passing them to the backend)
14:55:08 ... 2) reject min > max, and let the caller adjust the min/max values before calling
14:55:27 q?
14:56:08 Dwayne: this issue implies that if a backend does not support something, we should limit WebNN similarly
14:56:45 ... there are ways WebNN can have one policy and backends can still implement it
14:56:47 q+
14:56:48 q+
14:56:55 q+
14:56:58 q?
14:57:04 ack Ningxin_Hu
14:57:54 Ningxin_Hu: frameworks behave differently, does that mean we should avoid different behavior?
14:58:39 Dwayne: first I want to check whether this issue is about an empty range or an inverted range
14:58:50 Ningxin_Hu: an inverted range, as in the examples
14:59:05 q+
14:59:05 Dwayne: for consistency, the WebNN backend should clamp the min and max values
14:59:18 ... running WebNN on different backends would otherwise give inconsistent results
14:59:19 q?
14:59:32 ack jsbell
15:00:02 jsbell: we need to assume web developers won't test on every platform, so subtle differences will be considered bugs
15:00:42 ... platform differences such as whether a codec is supported, where one can query if a codec is supported, are some level of feature detection we can expect from developers, but not subtle differences
15:01:24 ... WebNN is a bit different, developers are going through a framework, but even there framework authors need to know about these subtle differences
15:01:46 ... except in some cases where a high-level op may be supported or not, then feature detection may be OK
15:02:10 ... we need to provide a lot of documentation to framework authors, and we need to make sure the behaviour is consistent
15:02:10 q?
15:02:28 ack RafaelCintron
15:02:46 RafaelCintron: I agree with jsbell, WebGPU is in a similar mode, a high-level graphics API with multiple backends
15:03:15 ... if backends behave differently, they discuss it in the group and usually the majority wins, the minority backend gets polyfilled
15:03:38 ... emulation may regress performance, so features are detectable
15:03:55 ... feature detection is the last resort in the WebGPU API
15:04:00 q?
15:04:30 q- Chai
15:04:36 Present+ Zoltan_Kis
15:04:50 q?
15:05:37 Dwayne: XNNPACK we can polyfill; the ranges question needs a survey of more backends
15:06:16 anssik: is this blocking the implementation?
15:07:35 Ningxin_Hu: the current implementation behaviour needs to be fixed in either XNNPACK or the Chromium validation logic
15:08:53 RRSAgent, draft minutes
15:08:54 I have made the request to generate https://www.w3.org/2023/11/02-webmachinelearning-minutes.html anssik
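[A hedged sketch, not from the call, contrasting the two proposed resolutions at the validation step; the function name is illustrative. In NumPy/PyTorch-style semantics the lower bound is applied before the upper bound, so an inverted range collapses every element to maxValue.]

```js
// Hedged sketch: the two proposed policies for an inverted clamp range.
function validateClampRange({ minValue = -Infinity, maxValue = Infinity } = {}) {
  // Proposal 2: reject inverted ranges, making the caller adjust them:
  //   if (minValue > maxValue) throw new TypeError('minValue must be <= maxValue');
  // Proposal 1: accept inverted ranges; a backend requiring min <= max can be
  // handed [maxValue, maxValue], which reproduces the "min first, then max" result.
  if (minValue > maxValue) minValue = maxValue;
  return { minValue, maxValue };
}
```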