W3C

– DRAFT –
WebML WG Teleconference – 24 August 2023

24 August 2023

Attendees

Present
Anssi_Kostiainen, Dwayne_Robinson, Joshua_Bell, Joshua_Lochner, Ningxin_Hu, Rachel_Yager, Rafael_Cintron, Vivek_Sekhar, Zoltan_Kis
Regrets
Chai_Chaoweeraprasit
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

anssik: Welcome to our 24 Aug call, we have a busy and exciting agenda!

Google Chrome team's feedback on WebNN API

anssik: I asked Vivek and Joshua to share a high-level summary of the Chrome team's feedback with the WG and document that feedback in a GH issue. Thank you Vivek and Joshua for collecting all this feedback from Google Chrome teams and sharing it with the WG, this is much appreciated.
… you'll find Google Chrome feedback in GH issue #453

<gb> Issue 453 Google Chrome Feedback on WebNN: aiming for broad device coverage and maintainability (by vsekhar)

anssik: Vivek and Joshua, please feel free to use 10-20 minutes incl. discussion to share a high-level summary of your feedback. I expect the WG to continue discussing specific topics in the GH issue and to spin off new issues as appropriate.

Vivek: thank you Anssi and the WG

Vivek: Google strongly supports the work of the WebML WG
… we got together at Google to gather feedback
… feedback solicited from ML research and infrastructure teams at Google
… reach and maintainability are important lenses for us
… Chrome's key observation:
… - for new OS APIs or hardware accelerators, we must assume that most Web users don't have them
… - we have an obligation to ensure a workable experience for other users as well
… Chrome's goal:
… - achieve 80% of a device's theoretical hardware-accelerated ML runtime performance across 80% of devices on the Web, and to do so while imposing a manageable long-term support burden on browser vendors
… Ecosystem issue:
… - the ML ecosystem is still rapidly evolving, making it difficult for any API to keep up
… Proposed steps for the WG (from the issue):
… 1. Request public positions from major browser implementers
… 2. Reduce the long term support burden of WebNN by streamlining the API surface
… 3. Demonstrate WebNN performance for CPU and GPU execution across multiple OS platforms
… 4. Demonstrate WebNN performance gains utilizing OS- and hardware-specific optimizations
… Proposed steps for OS- and hardware-specific optimizations:
… 1. Select 2-5 demonstrative ML models
… 2. Run on a demonstrative set of platforms with accelerator hardware
… 3. Evaluate latency, throughput and power efficiency between lowering to CPU/GPU vs. hardware accelerators

anssik: thanks Vivek and Joshua and the Google Chrome team for this feedback!

jsbell: thanks for the summary! Just wanted to share that this derives from thinking about what we as Google Chrome require for shipping an API, the same criteria we apply to an Intent to Ship for any feature
… we stand behind what we ship for decades, and these considerations are based on that expectation

Vivek: want to note the group has thought about the low-level vs. high-level ops question, appreciate that
… on reducing the long-term support burden: if consensus emerges in the broader ML space, we propose aligning with it on op set abstraction level and scope

Ningxin_Hu: thanks for this concrete feedback, a lot of good observations
… I like that you've shared concrete guidance and recommendations
… re "Demonstrate WebNN performance for CPU and GPU execution across multiple OS platforms"
… the suggestion is to implement WebNN as a polyfill on top of the Wasm and WebGPU APIs
… we have a JS implementation of WebNN using TF.js kernels with Wasm and WebGPU backends

webmachinelearning/webnn-polyfill
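
For context, a minimal hedged sketch of what building and running a small graph looks like through the WebNN API surface the polyfill implements; descriptor keys and setup details have shifted across spec drafts, so treat this as illustrative rather than normative:

    // Illustrative only: API shape per the WebNN spec drafts circa 2023;
    // the polyfill implements this surface on top of TF.js kernels.
    const context = await navigator.ml.createContext({ deviceType: 'gpu' });
    const builder = new MLGraphBuilder(context);
    const desc = { dataType: 'float32', dimensions: [2, 2] };
    const a = builder.input('a', desc);
    const b = builder.input('b', desc);
    const c = builder.add(a, b); // elementwise c = a + b
    const graph = await builder.build({ c });
    const inputs = { a: new Float32Array([1, 2, 3, 4]),
                     b: new Float32Array([5, 6, 7, 8]) };
    const outputs = { c: new Float32Array(4) };
    const result = await context.compute(graph, inputs, outputs);
    // result.outputs.c holds [6, 8, 10, 12]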

Ningxin_Hu: can you elaborate on this proposal?

jsbell: clarifying that we are not proposing that browsers ship WebNN as a polyfill, but that the CG-created polyfill would be excellent
… the launch process asks about adoption, whether developers have taken the API up; we want to avoid a situation where there is no polyfill and web developers code directly to a CPU or GPU backend
… if there's a quality polyfill, framework authors can move to it, and when browser implementations roll out we'd see an immediate performance boost
… this is called ecosystem activation

Vivek: a polyfill helps clarify what is needed from the platform, running workloads on a polyfill will help trace where the performance bottlenecks are
… e.g. if the polyfill is too large, we can see where the web platform can help
… some features may remain in user space because they change so fast; the polyfill helps clarify those aspects and shows what developers should be able to customize

anssik: are we maintaining the WebNN polyfill?

Ningxin_Hu: probably not up to date with the very latest spec version
… an opportunity to improve the WebNN polyfill
… the polyfill builds on TF.js, so thanks to the TF.js team

Joshua_Lochner: wanted to ask about the caching side of things: once you save models you shouldn't need to redownload them; as the Transformers.js author this is an issue close to my heart
… is this a consideration here?

jsbell: I definitely acknowledge your concern, that is part of the broader ecosystem adoption consideration
… we have folks on the Chrome team working on improvements to storage
… will be discussed outside this call, in another issue or a future call
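
(For context on the caching point, a hedged sketch of one way an app can avoid redownloading weights today using the standard Cache API; the URL and cache name are hypothetical, and libraries such as Transformers.js ship their own variants of this:)

    // Serve a model file from the Cache API when available, fetching and
    // caching it on first use. URL and cache name are hypothetical.
    async function fetchModelCached(url) {
      const cache = await caches.open('model-cache-v1');
      let response = await cache.match(url);
      if (!response) {
        response = await fetch(url);
        if (response.ok) await cache.put(url, response.clone());
      }
      return response.arrayBuffer();
    }

    const weights = await fetchModelCached('https://example.com/models/whisper-tiny.onnx');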

WebNN v2: review proposed new ops and data types

anssik: I'd like us to review and discuss the proposed new ops and data types informed by v2 model targets and recent prototyping efforts.
… Dwayne posted a well-formulated list of proposals into GH issue #375 -- thank you!

<gb> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]

anssik: Details in webmachinelearning/webnn#375 (comment)
… let me first summarize what was proposed and then let Dwayne fill me in
… Proposed models:
… - Text-to-image: Stable Diffusion unet/VAE/text encoder
… - Image segmentation: Segment Anything decoder
… - Speech-to-text: Whisper Tiny

anssik: We don't have text-to-text models proposed, so I'd like the WG to discuss if some would be applicable, examples:
… Text-to-text: Summarization, translation, code completion demonstrated by Transformers.js?

Joshua_Lochner: text-to-text, I'll add some additional material

Joshua shared:

Text-to-text:

- Summarization: https://xenova.github.io/transformers.js/?demo=summarization

- Code completion: https://huggingface.co/spaces/Xenova/ai-code-playground

- Translation: https://huggingface.co/spaces/Xenova/react-translator

Joshua_Lochner: these are application-level tasks that are helpful in web apps
… code completion would be helpful e.g. in GH or Codespaces
… this playground uses a 300M-parameter model, no GPU, Wasm backend, reasonable performance already as is
… a privacy-focused browser extension could make use of this, for example
… it uses StarCoder, built in collaboration with Hugging Face
… I think translation is a great idea, but the concern is the model is huge: a 600M-parameter model, 1.3 GB in size

<Joshua_Lochner> will do!

anssik: Proposed new ops (a composition sketch of a few of these follows the background material below):

- Logical elementwise comparison/selection operations: equal, greater, lesser, logicalNot, elementwiseIf/ternary, greaterOrEqual/lesserOrEqual

- More elementwise unary operations: identity, sqrt, erf (Gauss error function), reciprocal

- Reshaping operations: squeeze, unsqueeze, flattenTo2d

- Data rearrangement operations: expand, gather

- Normalization operations: meanVarianceNormalization

- Index seeking operations: argMin/argMax

- Misc: cast, fillSequence, triangularMatrix, shape

- Others?
… Proposed new data types:
… - int64
… - uint64
… Relevant background material:

Transformers.js presentation by Joshua Lochner

Transformer models presentation by Dwayne Robinson
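
For concreteness, a hedged sketch of how a few of the proposed ops could compose in the current MLGraphBuilder style; argMax, cast, gather and the int64 data type are proposals from issue #375, not yet in the spec, and the names may change:

    // Hypothetical: assumes the proposed ops land with these names.
    const builder = new MLGraphBuilder(context); // context created as usual
    const hidden = builder.input('hidden', { dataType: 'float32', dimensions: [1, 128] });
    const table = builder.input('table', { dataType: 'float32', dimensions: [50000, 128] });
    const scores = builder.matmul(hidden, builder.transpose(table)); // shape [1, 50000]
    const best = builder.argMax(scores, { axis: 1 });          // proposed index-seeking op
    const ids = builder.cast(best, 'int64');                   // proposed cast + int64 type
    const embedding = builder.gather(table, ids, { axis: 0 }); // proposed gather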

anssik: Dwayne, thanks for this proposal! Please feel free to share with the WG your questions and areas of focus.

Dwayne: I'd appreciate feedback from anyone else whose target models are missing ops, or feedback on whether we should drop some of these ops because they can be composed from lower-level ops
… or feedback on naming
… these ops enable the models we focus on, but happy to expand to other valuable models, e.g. the text-to-text models Joshua proposed

Vivek: I wanted to understand the motivation for the data types; in our work with WebGPU we have been using fewer data types and plumbing them through
… floating point is usually used in training use cases

dwayner: the larger data types get past the 4 GB barrier, and are used by e.g. ONNX
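
(As a quick check on that barrier: a signed 32-bit index tops out at 2^31 − 1 ≈ 2.1 billion elements, so tensors or flattened index spaces beyond that need a 64-bit type; ONNX, for example, commonly uses int64 for shape and index tensors.)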

Rachel: Joshua_Lochner shared a translation task, is it using the WebNN API?

Joshua_Lochner: it uses the ONNX Runtime Wasm backend currently
… I convert pretrained models to ONNX and use the ORT backend for inference; tokenization and other steps are done in JS, and everything runs in the browser

<Joshua_Lochner> Source code! xenova/transformers.js

anssik: you can validate it is on-device inference by disconnecting from the internet; it still works

<Joshua_Lochner> Yes it uses this model: https://huggingface.co/Xenova/nllb-200-distilled-600M

<Joshua_Lochner> which itself is an ONNX export of https://huggingface.co/facebook/nllb-200-distilled-600M

<Joshua_Lochner> yes exactly, the performance is limited by the model itself (and this model is quite old!)
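
For reference, a hedged sketch of the translation task in Transformers.js (API shape as of its v2 releases; the model name matches the links above, and the NLLB-style language codes may evolve):

    import { pipeline } from '@xenova/transformers';

    // Everything runs in the browser; ONNX Runtime Wasm backend under the hood.
    const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M');
    const output = await translator('Hello, world!', {
      src_lang: 'eng_Latn', // NLLB language codes
      tgt_lang: 'fra_Latn',
    });
    console.log(output); // e.g. [{ translation_text: 'Bonjour le monde !' }]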

WebIDL and Infra standard conventions

anssik: first, thank you Zoltan for keeping this big PR updated in response to comments, and thank you everyone for your reviews; this has been a great team effort across orgs!
… these changes align the entire specification with modern specification conventions and add stylistic improvements on top that make navigating the specification a more delightful experience.
… today I'd like us to decide whether we're ready to merge the zk-conventions-integration branch to main.
… the big PR is #446

<gb> Pull Request 446 Add missing algorithms, add stylistic improvements, update with spec conventions (by zolkis)

anssik: first I'll ask Zoltan to summarize the latest status of the big PR #446, and then Joshua Bell would like to spend a couple of minutes discussing some of the motivations behind modern spec style and the "processing model" (i.e. how JS types pass through Web IDL to become Infra types). Part of Joshua's feedback is captured in issue #450

<gb> Issue 450 Use web spec best practices (by zolkis) [conventions-integration]

anssik: thanks to Joshua's contributions we now have a Bikeshed build with no warnings! :)

zkis: 150 commits in that PR, squashed down from many more
… adding algorithms that were missing, following modern specification best practices
… jsbell helped a lot here, thank you
… a lot of work over the past two weeks
… other changes are waiting for this to land; happy to report we did more than what we promised
… thanks to the extended team for reviews, I included names in the ack section of the spec
… next step is for editors to approve the big PR and merge it
… I can do quick fixes, but planning to start holiday next week

anssik: can you work with Chai to merge this PR?
… any concerns from anyone for merging this PR to main?

Ningxin_Hu: LGTM, some remaining open issues we can keep open in GH
… we'll do a final check and let Zoltan know if last-minute changes are needed

jsbell: I want to acknowledge I joined the process very late, appreciate your support for my contributions

anssik: I'm hearing we are ready to merge after a final check by Ningxin, Ningxin to work with Chai to get his GH approval and then merge

[no concerns with the proposed plan]

anssik: we'll proceed with the merge as noted

anssik: thank you everyone!

zkis: I will handle issues, though I will mostly be away

Ningxin_Hu: we can do this by tomorrow at the latest

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics

Maybe present: anssik, Dwayne, dwayner, jsbell, Rachel, Vivek, zkis

All speakers: anssik, Dwayne, dwayner, Joshua_Lochner, jsbell, Ningxin_Hu, Rachel, Vivek, zkis

Active on IRC: anssik, Joshua_Lochner, jsbell, Rachel, Vivek