W3C

– DRAFT –
WebML WG Teleconference – 27 April 2023

27 April 2023

Attendees

Present
Anssi_Kostiainen, Chai_Chaoweeraprasit, Ningxin_Hu, Rafael_Cintron, Zoltan_Kis
Regrets
Dominique_Hazael-Massieux
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

Announcements

TPAC 2023

anssik: The W3C TPAC 2023 website is now available; you will find the initial details regarding the event there:

https://www.w3.org/2023/09/TPAC/
… 11–15 September 2023, Seville, Spain & online
… WebML WG has traditionally not met at TPAC; is there interest in a meeting this year?
… the TPAC organizers expect us to provide our preference, f2f or online, by early May 2023.
… please let me and Dom know by the end of next week if you're planning to attend.

Contribution guidelines

anssik: We have new improved contribution guidelines, thanks Chai for the PR!

webnn/CONTRIBUTING.md:

anssik: this was initially discussed in issue #231 and fixed by PR #381

<ghurlbot> Pull Request 381 [closed] Additional guidance for contributions (wchao1115)

<ghurlbot> Issue 231 [closed] Create a light-weight process to guide submitting new operator requests to WebNN (anssiko) process

chai: I hope these guidelines help streamline future contributions
… and make it easier to triage and review PRs; there can be exceptions to these guidelines
… we should use this call to resolve any issues with the guidelines

WebIDL and Infra standard conventions

The constant() method steps

anssik: WebIDL and Infra standard conventions meta issue #210

<ghurlbot> Issue 210 Use modern WebIDL and Infra standard conventions (anssiko) enhancement, Editorial

anssik: PR #365

<ghurlbot> Pull Request 365 Add the constant() method steps. (zolkis)

anssik: I believe Zoltan has followed Chai's guidelines for this PR.
… this is pending Chai's feedback

zkis_: this is one of the open PRs; it addresses all comments from Ningxin and Chai
… it is the most well-baked of the PRs

chai: I looked at the recent changes; I will follow up on this PR this week

zkis_: a style change PR will be coming up after this

zkis_: PR #322 would be the next PR to be reviewed

<ghurlbot> Pull Request 322 Simplify MLContext creation (wchao1115)

chai: I was planning to rebase PR #322 once the style changes have landed

zkis_: the style change won't affect PR #322

chai: the style change should not overlap with other PRs
… my PR #322 can wait until the style change comes in
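
For illustration, a minimal sketch of the constant() method from the caller's side, assuming the MLGraphBuilder API shape and descriptor keys (type, dimensions) in the spec at the time of this meeting; usage is hypothetical, not spec text:

    // Hypothetical usage of the constant() method whose algorithmic
    // steps PR #365 adds to the spec.
    const context = await navigator.ml.createContext();
    const builder = new MLGraphBuilder(context);

    // A 2x2 float32 constant operand; the method steps validate that
    // the buffer's length matches the descriptor's byte length.
    const desc = {type: 'float32', dimensions: [2, 2]};
    const weights = builder.constant(desc, new Float32Array([1, 2, 3, 4]));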

Enhancements, editorials, questions

TC39 proposal to add Float16Array to JavaScript

anssik: issue #373

<ghurlbot> Issue 373 heads up re: proposal to add Float16Array to JavaScript (bakkot)

TC39 proposal

anssik: an ECMA TC39 rep let us know there's now a proposal for Float16Array in JavaScript, which would hold IEEE binary16 floats
… we have flagged this as an issue in the WebNN API spec:

WebNN: clarify the usage of ArrayBufferView for float16

anssik: we should track the progress of this JS feature, current status: "The proposal is currently at stage 2, which means TC39 has not yet committed to add it to JS"

chai: does this proposal plan to cover float8?
… or bfloat16, even if not universally supported?

anssik: we should probably ask the ECMA folks about their position

tc39/proposal-float16array
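
To illustrate the gap the issue describes, a sketch continuing the hypothetical builder above: without Float16Array, callers have to hand-pack IEEE binary16 bit patterns into a Uint16Array.

    // Today a float16 constant must be supplied as raw binary16 bit
    // patterns in a Uint16Array.
    const f16Desc = {type: 'float16', dimensions: [2]};
    const packed = new Uint16Array([0x3C00, 0x4000]); // 1.0 and 2.0 in binary16
    const c16 = builder.constant(f16Desc, packed);

    // With the stage-2 TC39 proposal the same data could be written
    // directly (hypothetical, pending standardization):
    // const c16 = builder.constant(f16Desc, new Float16Array([1.0, 2.0]));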

Define the algorithm of calculating the effective padding for "same-upper" and "same-lower" option

anssik: issue #326

<ghurlbot> Issue 326 Define the algorithm of calculating the effective padding for "same-upper" and "same-lower" option (huningxin) Editorial

anssik: "WebNN conv2d operation allows to set MLConv2dOptions.autoPad option to "same-upper" or "same-lower" of MLAutoPad enum."
… proposed fix: "The spec should define the algorithm of how the padding values are automatically computed."

ningxin_hu: we were allowed to continue the implementation and left a TODO in the Chromium code, to be resolved when the specification clarifies the autoPad calculation
… if no concerns are heard, I can take this issue and propose a PR
… it could reuse the existing 2d pooling definitions
… with the current spec conventions this is one formula describing the calculation; if we apply the new conventions with algorithmic steps it would be ~10 lines of spec
… that is, if we prefer to follow the new algorithmic style for this PR

anssik: this issue will wait for algorithm conventions updates to land
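
For reference, a sketch of the per-dimension padding computation such a PR might specify, assuming semantics matching the existing 2d pooling definitions Ningxin mentions (the helper name and formula are illustrative, not spec text):

    // Candidate effective-padding computation for one spatial dimension.
    function computeAutoPad(inputSize, filterSize, stride, dilation, autoPad) {
      const outputSize = Math.ceil(inputSize / stride);
      const effectiveFilterSize = (filterSize - 1) * dilation + 1;
      const total = Math.max(
          0, (outputSize - 1) * stride + effectiveFilterSize - inputSize);
      const half = Math.floor(total / 2);
      // "same-upper" puts the extra padding at the end,
      // "same-lower" at the beginning.
      return autoPad === 'same-upper' ? [half, total - half]
                                      : [total - half, half];
    }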

Clarify the usage of 32 bit floating point type and consider using double

anssik: issue #325

<ghurlbot> Issue 325 Clarify the usage of 32 bit floating point type and consider using double (huningxin) enhancement

anssik: "The WebIDL 32 bit floating point type float is widely used by WebNN spec, such as for setting min and max value of clamp operator ... However, WebIDL spec has a warning of using float and the recommendation is to use double."
… proposed fix: "the spec should mention reason of using float"

chai: for property types such as attributes it should be straightforward to replace float with double; I would not adjust operand tensor types, double tensor types would be too big

ningxin_hu: you are fine with changing the attribute types to double?

chai: but float64 tensors are problematic for all the hardware

ningxin_hu: would this impact the double-precision baseline implementation?

chai: attributes do not affect that; conformance testing is about tensors

chai: to clarify, example attributes would be min and max values

ningxin_hu: for min and max we can define these as double, but tensors won't support double; they can be as small as float16
… should we put a note there to mention how these double attributes are applied to float32 or float16 tensors?

chai: I think it is up to you if you want to add a note to clarify this

chai: my only strong feeling is against float64 tensors

zkis_: I'm wondering if a symbolic definition would be better?
… keeping type-related prose in one place

chai: I think it is maybe obvious if we look at the operand types
… maybe there is no need for explicit text for that
… I prefer a concise spec; if we look at the operand types it is clear we don't support float64 tensors

ningxin_hu: this issue is from Jiawei; he mentioned fp64, I think he means the tensor type, but Chai does not recommend that
… we need to confirm with Jiawei; maybe Chai can chime in on the GH issue

chai: I'll comment on this issue #325

<ghurlbot> Issue 325 Clarify the usage of 32 bit floating point type and consider using double (huningxin) enhancement
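
A sketch of the distinction Chai draws, continuing the builder sketch above and using clamp's min/max as the example attributes (the note wording is hypothetical):

    // minValue/maxValue are scalar attributes: making them WebIDL double
    // does not require float64 tensors, since the implementation can
    // cast them to the operand's data type when building the graph.
    const x = builder.input('x', {type: 'float16', dimensions: [1, 3]});
    const y = builder.clamp(x, {minValue: 0, maxValue: 6});
    // A spec note could state that the double values are converted to
    // x's float16 type here.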

Subclass MLGraph based on the context that creates it

anssik: issue #344

<ghurlbot> Issue 344 Subclass MLGraph based on the context that creates it (huningxin) question

anssik: is API ergonomics improvement the key reason for this proposed change?

anssik: we'll defer this issue from today's call for later

Support for transformers

anssik: we received feedback during the AC review of our charter that the WG should initiate discussion on transformer models applicable for accelerated inference via the WebNN API
… this feedback is publicly captured in issue #375

<ghurlbot> Issue 375 Mention transformer in use cases (dontcallmedom) v2

anssik: as we know, the transformer architecture was introduced in June 2017, initially focusing on translation tasks
… more models followed, such as GPT, BERT, GPT-2, BART, GPT-3 etc., roughly in order of appearance.
… there are roughly three top-level categories of transformer models AFAICT:
… - auto-regressive models ("GPT-like")
… - auto-encoding models ("BERT-like")
… - sequence-to-sequence models ("BART-like")
… there have been some early experiments in using these models in the web context, e.g. Transformers.js, which supports a number of transformer models from Hugging Face.

Transformers.js

Transformers.js supported tasks and models
… I believe the use cases WebNN might want to target would be a subset of what Transformers.js supports today. Let me drop a list here to initiate discussion:
… - text classification
… - token classification
… - zero-shot classification
… - question answering
… - language modelling
… - summarization
… - translation
… - text generation
… - automatic speech recognition
… - image-to-text
… - image classification
… - zero-shot image classification
… - image segmentation
… - object detection
… - embeddings
… I think it is fair to say many transformers are big models, so I'm eager to hear your thoughts on which models you feel are the best first candidates in the web context for WebNN
… we know training transformer models is a HUGE task; luckily training is out of scope for WebNN, so we don't need to worry about that.
… let's discuss what inference tasks on a trained transformer model would be good targets
… and let's try to use our improved contribution guidelines, which suggest we should start by looking at use cases (a Transformers.js sketch follows the guidelines link below).

Guidelines > Proposing and adding a new operation
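
For concreteness, a minimal Transformers.js probe of one candidate task from the list above (package name and default model as of this meeting; the example is illustrative):

    // Text classification via the Transformers.js pipeline API; today
    // this runs on Wasm (onnxruntime-web), and the question is which
    // such tasks WebNN should accelerate.
    import { pipeline } from '@xenova/transformers';

    const classify = await pipeline('sentiment-analysis');
    const result = await classify('WebNN makes in-browser inference fast.');
    // e.g. [{label: 'POSITIVE', score: 0.99}]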

chai: I agree, transformers are becoming very hot and WebNN needs to support them
… most transformers coming up recently, the ones popular right now, are not a single model but a pipeline
… e.g. Stable Diffusion is ~6 models; it starts with an embedding stage, and the core part is the attention network, which is generally speaking the part that needs processing
… in fact, depending on the pipeline implementation, Python or C#, some of those languages have a library for tokenization and embedding
… that part is in most cases done on the CPU; ~99% of the runtime is in the auto-encoding part, which Stable Diffusion runs in a loop
… I totally support the idea of adding support for transformers; we should look at which transformers to support
… for embeddings and tokenization there is not much to be done, they are outside the DNN, but the auto-encoder is super important, the crux of the whole thing

zkis_: I have some questions to clarify: Transformers.js uses a pipeline in its API, and it seems to use ONNX models, so you have to convert your models to ONNX; what is the core WebNN use case?

ningxin_hu: I would also support transformer networks in WebNN
… from my perspective, WebNN intends to be the backend for frameworks
… not a single model but multiple models need to be supported, plus the interaction between the models and how to run them in a loop
… we need to understand how to partition the network, with WebNN as a backend
… we need to be careful how to partition the task: which model parts can be delegated or offloaded to WebNN for eventual hardware acceleration, and which tasks are left to the framework or user code to handle
… we may want to talk to web application developers to understand their usage
… Transformers.js and ONNX would be good projects for the WG to investigate

chai: my thoughts align with Ningxin's; if we look at Transformers.js, it is all implemented in JS and Wasm, the whole pipeline can be written in JS and compiled into Wasm
… when you run this natively, each model is instantiated; the heavyweight one is the auto-encoder
… WebNN is the backend that can be used by these frontends; most of the time is spent in the auto-encoder
… the transformer is probably the best network to highlight WebNN's benefit; without proper acceleration it will be very slow and use a lot of memory
… it benefits greatly from hardware acceleration
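
A conceptual sketch of the partitioning Chai and Ningxin describe, with tokenization kept on the CPU and the attention-heavy core delegated to WebNN; tokenize() and encoderGraph are hypothetical placeholders, and the async compute() signature is assumed per the spec at the time of this meeting:

    // Framework-side code: cheap stages stay in JS/Wasm.
    const seqLen = 77, hiddenSize = 768, numSteps = 50;
    const ids = tokenize('an astronaut riding a horse'); // hypothetical helper
    const inputs = {tokens: new Int32Array(ids)};
    const outputs = {hidden: new Float32Array(seqLen * hiddenSize)};

    // The heavyweight auto-encoder runs through WebNN in a loop,
    // e.g. once per diffusion step.
    for (let step = 0; step < numSteps; ++step) {
      await context.compute(encoderGraph, inputs, outputs); // hypothetical graph
    }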

anssik: would you like to work on this in a GH issue?
… sounds like we want to brainstorm use cases in issue #375

<ghurlbot> Issue 375 Mention transformer in use cases (dontcallmedom) v2

Minutes manually created (not a transcript), formatted by scribe.perl version 210 (Wed Jan 11 19:21:32 2023 UTC).
