Meeting minutes
Welcome to our new participant Kaushik Satpathy from Yahoo!
Announcements
anssik: implementation status of WebNN has been updated, thank you all for the great progress!
Implementation Status of WebNN
webmachinelearning/
<gb> MERGED Pull Request 84 November update for Impl Status (by ibelem)
anssik: also Awesome WebNN, a curated list of awesome things related to the WebNN API, has received updates
webmachinelearning/
<gb> MERGED Pull Request 11 November 2024 Update (by ibelem)
anssik: please share this reference to people interested in this topic for the latest articles, demos, presentations, samples, tutorials, videos and more about WebNN and the ecosystem around it
Call for review: WebML Community Group Charter update
Repository: webmachinelearning/charter
anssik: on 2024-11-01 we initiated a call for review of the Web Machine Learning Community Group Charter update, open until 2024-12-02
… folks who are also WebML Community Group participants are encouraged to review the Charter proposal
… to be eligible to vote, you must be a CG participant
WebML CG Charter update, vote by 2024-12-02
How to join the CG:
… Summary of changes: refresh Goals, add Task-specific APIs and Prompt API to Deliverables, note WebNN has graduated to the WG
… for more information about the proposed task-specific APIs, please refer to the TPAC 2024 presentation:
TPAC 2024 slides for task-specific APIs
anssik: and the GH repos for the proposals:
explainers-by-googlers/
anssik: any questions?
Device selection abstractions
Repository: webmachinelearning/webnn
anssik: Zoltan synthesized the group's current thinking and discussions into a device selection explainer, thanks!
… issue #749 and PR #784
<gb> Issue 749 MLContextOptions.deviceType seems unnecessary outside of conformance testing (by mwyrzykowski) [device selection]
<gb> Pull Request 784 Add device selection explainer (WiP) (by zolkis)
anssik: it discusses intro, history, key use cases and requirements, considered alternatives, examples and design
… and open questions
… the doc is written so that we can hand this to folks outside this group for review, e.g. privacy, TAG etc.
zolkis: this is a collection of thoughts from GH issues, it is WIP still
… Ningxin and Chai contributed in the early phase, as did Rafael, Joshua and MikeW
zolkis: MikeW's proposal is in considered alternatives, fingerprinting story the remaining concern
… we can possibly go with Mike's proposal if we give more examples
anssik: considered alternatives:
… 1. Keep the current MLDeviceType as a context option, but improve the device type names
… 2. Follow this proposal MLOpSupportLimits should be opt-in per #759
<gb> Issue 759 MLOpSupportLimits should be opt-in with base functionality (by mwyrzykowski) [device selection]
MikeW: great document, thanks for writing this! Option 1 is simpler, could flesh that out right now
zolkis: listing of opLimits is inside the context, if we move it out then we know what the underlying platform is capable of, can match with the model to run
… we need more concrete examples for Option 2, whether go with full WebGPU adapter approach
… Option 2 could come after Option 1
Dwayne: would Option 1 be relaxing the device type to be a hint?
zolkis: PowerPerformance would provide the hint that the implementation could use to map to underlying processing unit(s), or could rename the MLDeviceType
zolkis: let's solicit more use cases to have a complete view
anssik: for feedback, please use the PR #784
<gb> Pull Request 784 Add device selection explainer (WiP) (by zolkis)
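The "hint, not mandate" direction discussed for Option 1 can be sketched as follows. This is purely illustrative of how an implementation might map a power-preference hint to an ordered list of candidate devices; the device names and orderings are assumptions, not taken from the explainer or the spec:

```python
def candidate_devices(power_preference):
    """Hypothetical mapping from a power-preference hint to an ordered
    list of compute devices an implementation might try, illustrating a
    hint the implementation is free to interpret (names are assumptions,
    not from the WebNN explainer)."""
    order = {
        "low-power":        ["npu", "gpu", "cpu"],
        "high-performance": ["gpu", "npu", "cpu"],
        "default":          ["cpu", "gpu", "npu"],
    }
    # An unknown or absent hint falls back to the default ordering.
    return order.get(power_preference, order["default"])

print(candidate_devices("low-power"))  # ['npu', 'gpu', 'cpu']
```

Because the result is an ordered preference rather than a mandate, the implementation retains freedom to fall back, which is what distinguishes a hint from the current MLDeviceType.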
MLTensor
anssik: The group is gathering implementation experience on the MLTensor design to inform an upcoming specification update.
… The explainer is considered the source of truth in this prototyping phase.
… I'd like to discuss the open questions, currently 5
https://
Austin: not blocking forward progress, but the first bullet has come up in the Chromium implementation; we deprecated compute() in favor of dispatch(), and if there's an error you don't find out about it
How will errors be surfaced?
anssik: issue #477
<gb> CLOSED Issue 477 API lacks handling for async ML device errors on the context (by bbernhar) [question]
Bryan: want to understand how backends can surface errors midway?
Austin: for many backends peak memory usage can exceed what's available, you could OOM while inferencing
… seeing failures on some backends because we have implementation gaps, a few classes of errors
… e.g. trying to allocate too much memory, the model file is deleted from disk, or you compile the model and think everything is good but things change underneath
… or if you do scatter or gather with OOB indices
<jsbell> webmachinelearning/
<gb> Issue 778 Proposal: Report non-fatal errors from the WebNN timeline (by a-sully) [feature request]
Austin: CoreML backend may have higher peak memory usage during inferencing than after compile
Bryan: writeTensor() has the same issues as dispatch()
(this is expanded in issue #778)
<gb> Issue 778 Proposal: Report non-fatal errors from the WebNN timeline (by a-sully) [feature request]
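One of the error classes mentioned above, gather with out-of-bounds indices, can be shown with a minimal sketch. The clamping fallback below is one possible well-defined behavior, chosen for illustration; it is not what the spec mandates:

```python
def gather(data, indices, clamp=False):
    """1-D gather sketch. With clamp=False an out-of-bounds index raises
    at dispatch time, the kind of mid-execution failure discussed above;
    with clamp=True indices are clamped into range, one well-defined
    alternative (illustrative only, not spec behavior)."""
    out = []
    for i in indices:
        if not 0 <= i < len(data):
            if not clamp:
                raise IndexError(f"gather index {i} out of bounds")
            i = min(max(i, 0), len(data) - 1)
        out.append(data[i])
    return out

print(gather([10, 20, 30], [0, 2]))           # [10, 30]
print(gather([10, 20, 30], [5], clamp=True))  # [30]
```

The open question in issue #778 is how such failures, raised on the WebNN timeline rather than at call time, get surfaced back to script.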
Core op set & MLIR Linalg mapping
anssik: issue #573
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
anssik: this topic is to discuss core op set, primitive ops informed by MLIR Linalg, PyTorch Prims IR, TOSA, StableHLO others.
… I propose we look at the MLIR Linalg mapping today
… Dwayne contributed a preliminary analysis of op correspondence (thanks!):
Machine Learning Operator Mapping
anssik: Dwayne notes WebNN demonstrates viability of popular models, but it lacks breadth
… implementing all the 800+ ops is untenable due to interop requirements (multiple browsers, multiple underlying platforms and backends), which is why we are investigating what makes for an appropriate set of primitive ops to allow composition
Dwayne: no firm recommendations, but some categories that are absent
… WebNN backend support 1D to 3D
… modular div, rounding, bitwise
… composite ops, lego blocks for decomposition of other ops, e.g. sumPooling
… sheet legend: yellow = absent; red = not interesting, it's a named variant
Dwayne: yellow = worth adding to WebNN
Joshua: have you done analysis what backends in Chromium support these?
Dwayne: it is future work
Joshua: do we look at the current backends, or look at primitive core ops on top of which everything can be constructed on
anssik: how could the group help?
Dwayne: investigation on CoreML backend and TFLite support for these would be welcome
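The "lego blocks" idea mentioned above, where composite ops decompose into primitives, can be illustrated with sumPooling built from averagePooling. This is a 1-D pure-Python sketch with illustrative names, not WebNN API code:

```python
def average_pool_1d(x, window):
    """1-D average pooling, stride 1, no padding (a stand-in for a
    WebNN averagePool primitive in this sketch)."""
    return [sum(x[i:i + window]) / window
            for i in range(len(x) - window + 1)]

def sum_pool_1d(x, window):
    """sumPooling expressed as a composition: averagePool followed by a
    scalar multiply by the window size, illustrating how a composite op
    can decompose onto existing primitives."""
    return [v * window for v in average_pool_1d(x, window)]

x = [1, 2, 3, 4]
print(sum_pool_1d(x, window=2))  # [3.0, 5.0, 7.0]
```

A backend that lacks a native sum-pool can lower it this way, which is the kind of decomposition path the core op set investigation is mapping out.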
Support reverse operator
anssik: issue #773
<gb> Issue 773 Support `reverse` operator (by huningxin) [feature request] [operator specific]
anssik: Ningxin proposed a reverse op that reverses the order of the input tensor along specified axes
… improves performance of PyTorch models
… framework support: PT, TF, ONNX (via reverse slicing with step -1)
… native APIs: DML (similarly to ONNX), CoreML, TFLite
… also in primitive opsets StableHLO, TOSA, PT Prims
Ningxin: we have Chromium CL to prototype
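The reverse semantics proposed above, including the ONNX emulation via slicing with step -1, can be illustrated in pure Python on nested lists (names are illustrative, not spec text):

```python
def reverse(tensor, axes, axis=0):
    """Reverse a nested-list tensor along the given set of axes,
    mirroring the proposed WebNN `reverse` op; the [::-1] slice is the
    same step -1 trick used to emulate it in ONNX."""
    if not isinstance(tensor, list):
        return tensor  # scalar: nothing to reverse
    rows = [reverse(t, axes, axis + 1) for t in tensor]
    return rows[::-1] if axis in axes else rows

# Reverse a 2x3 tensor along axis 1 only.
x = [[1, 2, 3],
     [4, 5, 6]]
print(reverse(x, axes={1}))  # [[3, 2, 1], [6, 5, 4]]
```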
Support strides option for slice operator
anssik: issue #772
<gb> Issue 772 Support strides option for `slice` operator (by huningxin) [feature request] [operator specific]
anssik: Ningxin reports the strides option for the slice operator is widely supported, but WebNN's flavour only supports a stride of 1
… real-world models with stride > 1 cause WebNN to fall back to another EP, impacting performance
anssik: can we link to some sample models in this issue?
Ningxin: the model is a transformer-based model with some customization, possibly not shareable yet
Dwayne: it is an audio model, I can share that it is for speech recognition usage, consider it a Whisper variant of sorts
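The requested generalization, slice with a stride greater than 1, can be sketched in one dimension (function and parameter names are illustrative, not the WebNN signature):

```python
def strided_slice_1d(data, start, size, stride=1):
    """1-D slice with a stride, sketching the generalization requested
    for WebNN's `slice`. With stride=1 this matches the current WebNN
    behavior; stride>1 picks every stride-th element from `start`,
    yielding at most `size` elements."""
    out = []
    i = start
    while len(out) < size and i < len(data):
        out.append(data[i])
        i += stride
    return out

x = [0, 1, 2, 3, 4, 5, 6, 7]
print(strided_slice_1d(x, start=1, size=3, stride=2))  # [1, 3, 5]
print(strided_slice_1d(x, start=1, size=3, stride=1))  # [1, 2, 3]
```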
Support block-wise quantization
anssik: issue #779
<gb> Issue 779 Support block-wise quantization (by huningxin) [operator specific]
anssik: request to support block-wise quantization
… allows input tensors to be divided into smaller, independently quantized blocks, used by SLMs
… benefits include faster optimization and high precision quantization
… DML and CoreML support, it seems no TFLite/LiteRT?
… Dwayne suggests a decomp path is viable?
… no API signature changes, only changes to the algorithm
Ningxin: we have a prototype in Chromium for DML backend and we successfully used that to enable Phi3-mini with this capability
… this prototype is successful
Joshua: would be great if you could share in the issue performance improvements "X times faster"
Ningxin: for TF we need a composition, can file an issue for TFLite
… I can follow up on that
… for CoreML, Austin can comment
Austin: more constraints than DML, block-wise quant is theoretically supportable
Ningxin: Phi3-mini uses block-wise quantization and will hit the CoreML implementation
<dwayner> The memory savings were huge (I forget the exact numbers before/after, but IIRC 20GB's before o_o).
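The per-block scheme discussed above, each block quantized with its own scale, can be sketched in pure Python. The symmetric int8 scheme and the block size below are illustrative assumptions; as noted in the minutes, the actual proposal changes only the dequantization algorithm, not API signatures:

```python
def blockwise_quantize(values, block_size):
    """Quantize a 1-D list block by block, each block with its own scale
    (symmetric int8 scheme chosen for illustration only)."""
    scales, quantized = [], []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        scale = max(abs(v) for v in block) / 127 or 1.0
        scales.append(scale)
        quantized.extend(round(v / scale) for v in block)
    return quantized, scales

def blockwise_dequantize(quantized, scales, block_size):
    """Reconstruct values using the per-block scales."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]

vals = [0.1, -0.5, 100.0, 2.0]  # outlier confined to the second block
q, s = blockwise_quantize(vals, block_size=2)
restored = blockwise_dequantize(q, s, block_size=2)
```

Because the outlier only inflates the scale of its own block, the small values in the first block keep their precision, which is the higher-precision benefit mentioned above.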
<jsbell> If any TPAC attendees can answer the question in webmachinelearning/
<gb> Issue 470 Simplify `matmul` op (by huningxin) [operator specific]