W3C

– DRAFT –
WebML WG Teleconference – 18 April 2024

18 April 2024

Attendees

Present
Anssi_Kostiainen, Austin_Sullivan, Deepti_Gandluri, Dwayne_Robinson, Joshua_Bell, Joshua_Lochner, Michael_McCool, Ningxin_Hu, Rafael_Cintron, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

anssik: please join me in welcoming our latest WG participant!
… - Kenji Baheux from Google has contributed to the emerging Hybrid AI effort and joins in an official capacity

Hybrid AI exploration update

anssik: the initial Hybrid AI exploration proposal webmachinelearning/proposals#5 received interest from multiple folks (thanks!)

<gb> Issue 5 Hybrid AI Exploration (by grgustaf)

anssik: to allow for a more structured discussion, I'd propose we create a "webmachinelearning/hybrid-ai" repo owned by the WebML Community Group
… this allows us to split the discussion into multiple issues, so interested folks can discuss important topics, such as the privacy implications of shared caches, in dedicated issues
… The WebML CG (!= WG) is chartered to produce "Non-Normative Reports [...] that are not Specifications, for instance use cases, requirements, or white papers", so the Hybrid AI discussion would fall within that scope, which includes explainer-style docs
… participation across the WG and CG overlaps substantially; active contributors are encouraged to join both groups
… possible new spec incubations informed by the Hybrid AI discussion will require us to recharter the CG
… we will get to that if/when timely

Web Machine Learning Community Group Charter

Instructions for joining the CG (and/or the WG)

anssik: any questions or comments?

Michael: splitting discussion into multiple issues sounds good to me

WebNN W3C Candidate Recommendation Snapshot published

anssik: A new WebNN Candidate Recommendation Snapshot was published 11 April 2024.

Web Neural Network API W3C Candidate Recommendation Snapshot, 11 April 2024

anssik: Congrats to the WG for this milestone!
… this Candidate Recommendation of WebNN API was signal-boosted by W3C's news and social channels and was picked up by some established newsletters such as JavaScript Weekly. Feel free to spread the word.

W3C News: Updated Candidate Recommendation: Web Neural Network API

anssik: Since the initial Candidate Recommendation Snapshot, published 30 March 2023, the group has gathered further implementation experience and added new ops and data types needed by well-known transformers to support generative AI use cases
… I'd like to thank the entire group for their contributions
… this publication improved the specification significantly from the initial 2023 CR and benefited from an expanded contributor base and advanced implementation efforts
… our work continues toward the next milestone, and we're currently addressing open issues faster than we open new issues i.e. our bug count is trending downwards, yay!

NPU support discussion (cont'd)

anssik: issue #623

<gb> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request]

anssik: I'd like to follow up on the NPU support discussion from our last meeting
… three things: 1) NPU device type, 2) fallback device concept, 3) ops for quantized models

Michael: quantization representation may be a separate issue; it can be used outside NPUs too

anssik: quantization was first brought up in #128

<gb> Issue 128 WebNN should support int8 quantized models (by wchao1115) [v2] [opset] [feature request]

Dwayne: agree with the 3 major topics, and also that quantization support applies to other device types too
… the NPU device type is in Chromium and we can test it; it doesn't have a fallback yet

anssik: what is the failure path?

Dwayne: at context creation time

anssik: is the Chromium implementation blocked on some spec discussions?

Dwayne: not blocked on the spec at the moment, but insights from other platforms would help, e.g. CoreML

Austin: on CoreML there's always a fallback, CPU is always included no matter what
… the WebNN API fails if there's no NPU present; if an NPU is present it tries to create the context and will fail at build
… without a fallback or polyfill, pushing the failure forward would be helpful
… another relevant thing re CoreML is adding conv2dint and integer-focused ops; on CoreML everything is fp16
… which data types an accelerator supports varies across systems; conv2dint on Mac needs to be emulated on the CPU in userspace in the Chromium implementation
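
A minimal sketch of the two failure paths discussed, assuming the proposed "npu" device type from issue #623 (the device type and its fallback behavior are exactly what is unresolved, so this is illustrative only):

// Hypothetical: 'npu' is a proposed MLDeviceType (issue #623), not yet in the spec.
let context;
try {
  context = await navigator.ml.createContext({ deviceType: 'npu' });
} catch (e) {
  // Failure path 1: context creation fails when no NPU is present.
  context = await navigator.ml.createContext({ deviceType: 'cpu' });
}
const builder = new MLGraphBuilder(context);
const x = builder.input('x', { dataType: 'float32', dimensions: [1, 3, 224, 224] });
// Failure path 2: with an NPU present but an unsupported op, the error
// surfaces at build() time, since there is no fallback device yet.
const graph = await builder.build({ output: builder.relu(x) });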

anssik: anything around ops for quantized models?

Michael: should consider e.g. 4-bit adapters in this context

Dwayne: is this a block-based compression for weights?

Michael: it's a moving target what to support, noting the overlap with adapters
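
For background, a sketch of the quantize/dequantize (QDQ) relationship behind issue #623, using the common per-tensor int8 affine formulation (generic math, not WebNN API):

// real value ≈ (q - zeroPoint) * scale
function quantize(x, scale, zeroPoint) {
  return Math.max(-128, Math.min(127, Math.round(x / scale) + zeroPoint));
}
function dequantize(q, scale, zeroPoint) {
  return (q - zeroPoint) * scale;
}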

Open issues and PRs

anssik: As usual, we'll discuss open issues and review PRs based on your feedback and progress:

All open issues

All open pull requests

Debrief on PRs merged recently

anssik: first, a large number of issues I put on the agenda a week ago were addressed between then and now. I'm impressed.
… massive Thank You to Josh, Ningxin, Dwayne, Austin, others who worked on these!
… the following were addressed:
… - issue #634 fixed by PR #639

<gb> CLOSED Issue 634 `MLContext.compute()` explicitly rejects promises where the sub-steps may currently throw (by huningxin) [bug]

<gb> MERGED Pull Request 639 BugFix: `compute()` explicitly rejects a promise if buffer transferring fails (by huningxin)

anssik: - issue #610 fixed by PR #641

<gb> MERGED Pull Request 641 Introduce "valid dimension", used as needed when calculating operand shapes (by inexorabletash)

<gb> CLOSED Issue 610 Need clarify scale factor for resample2d (by BruceDai) [bug]

anssik: - issue #602 fixed by PR #622

<gb> MERGED Pull Request 622 Revise graph resource validation (by inexorabletash)

<gb> CLOSED Issue 602 Is "validate graph resources" backwards? (by inexorabletash) [question]

anssik: - issues #484 #486 fixed by PR #642

<gb> MERGED Pull Request 642 gather(): Address indices validation and other algorithm nits (by inexorabletash)

<gb> CLOSED Issue 486 Add "implementation consideration" about how out-of-bound indices of Gather should be handled (by huningxin) [operator specific]

<gb> CLOSED Issue 484 Should Gather's indices data type be integers and support negative value? (by huningxin) [operator specific]

anssik: - issue #209 fixed by PR #637

<gb> MERGED Pull Request 637 Decompositions for reduceLogSum, reduceLogSumExp, and reduceSumSquare (by inexorabletash)

<gb> CLOSED Issue 209 reduceLogSum, reduceLogSumExp, reduceSumSquare are not supported on OV/NNAPI (by mingmingtasd) [editorial] [operator specific]

anssik: - issue #630 fixed by PR #631

<gb> MERGED Pull Request 631 Validate restriction of output padding in convTranspose2d (by inexorabletash)

<gb> CLOSED Issue 630 Need to add the value restriction of output padding in convTranspose2d (by mei1127) [operator specific]

anssik: an agenda diff for these: https://github.com/webmachinelearning/meetings/commit/904edb388f2c091ee1f80fd8cbac6af54ea0a1eb

anssik: a few more issues and PRs that were not on the agenda were also addressed after our last meeting:
… - issue #283 partially fixed by PR #643 (PR is pre-work)
… - issue #615 fixed by PR #632
… - issue #187 fixed by PR #638
… - PR #640 merged to align with the newly added Web IDL Float16Array

<gb> MERGED Pull Request 643 Fix/simplify some validation steps (by inexorabletash)

<gb> Issue 283 Specify the operand data type constraints of operation (by huningxin) [question]

<gb> MERGED Pull Request 632 add a note for empty input (by philloooo)

<gb> CLOSED Issue 615 Graph with no input (by philloooo) [question]

<gb> MERGED Pull Request 638 Editorial: Make "generically emulated" text a macro, update wording (by inexorabletash)

<gb> CLOSED Issue 187 BatchNormalization should be an optional operation (by pyu10055) [opset]

<gb> MERGED Pull Request 640 Float16Array has landed in Web IDL - remove workarounds (by inexorabletash)

Joshua: PRs to highlight: #637 adds decompositions, #638 adds a macro for the emulation text, #642 makes changes to gather
… common to these is the text associated with decompositions, worded as hints to implementers: if a backend does not support e.g. clamping, then these decompositions can be used as a hint
… one thing that struck me going through all these PRs: eventually we may run into cases where these can't be handled in userspace
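
As an illustration of the decomposition hints added in PR #637, reduceLogSumExp can be expressed with more primitive ops (a sketch of the identity the hint relies on; the spec's exact steps may differ):

// reduceLogSumExp(x) == log(reduceSum(exp(x)))
function reduceLogSumExpDecomposed(builder, input, options) {
  return builder.log(builder.reduceSum(builder.exp(input), options));
}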

<Ningxin_Hu> gelu op has also been added: webmachinelearning/webnn#628

<gb> MERGED Pull Request 628 Define Gelu operation (by mingmingtasd)

[bug] Synchronously validate input operands/activations

anssik: issue #572

<gb> Issue 572 Synchronously validate input operands/activations (by inexorabletash) [bug] [question]

anssik: gentle nudge, I believe feedback and PRs are still welcome for the proposed remaining work
… Josh, anything you want to add?

jsbell: no change since last time

[bug] WebIDL definition for constant(start, end, step, type) is missing

anssik: issue #571 (related to #492 discussed later)

<gb> Issue 571 WebIDL definition for constant(start, end, step, type) is missing (by inexorabletash) [bug] [question] [operator specific]

<gb> Issue 492 Constant sequential filling operation needs output shape parameter (by fdwr) [feature request] [operator specific]

anssik: a proposal for adding the missing Web IDL definition for constant
… Ningxin suggested we consider this together with int64 (bigint), see issue webmachinelearning/webnn#492 (comment)

<gb> Issue 492 Constant sequential filling operation needs output shape parameter (by fdwr) [feature request] [operator specific]

anssik: so let us visit #492 for a second:
… Josh comments that bigint use in Chromium is limited and the binding codegen is not yet handling it
… he also notes the Web IDL spec has a warning about a union of bigint and a numeric type

Web IDL warning: a union of bigint and a numeric type
… Dwayne opened a new Web IDL issue whatwg/webidl#1388 and the Web IDL spec editor was convinced the proposal to use a union of bigint and a numeric type "seems reasonable"
… are we going to miss something if we merge this issue #571 into issue #492?

<gb> Issue 1388 Intent to use BigInt/numeric union in WebNN (by fdwr)

<jsbell> Agreed, just merge them. I'll make it a dupe

Dwayne: I don't think we'll miss anything if we close this

<Ningxin_Hu> +1 to merge them

anssik: OK to close this when useful information is transferred from this issue to #492
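
For reference, the sequential-filling semantics under discussion for constant(start, end, step, type), expressed as a plain JavaScript sketch (hypothetical helper; the lack of an explicit output shape is the concern raised in #492):

function fillSequence(start, end, step) {
  const out = [];
  for (let v = start; v < end; v += step) out.push(v);
  return out;
}
// fillSequence(0, 5, 1) -> [0, 1, 2, 3, 4]; the output shape [5] is only
// implied by (end - start) / step rather than stated explicitly.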

[question] Allow no-op graphs?

anssik: issue #614

<gb> Issue 614 Allow no-op graphs? (by inexorabletash) [question]

anssik: on our last call Ningxin suggested we track this as part of the MLBuffer proposal

Discussion from our last call

anssik: any new information to add?
… Austin, you commented on this recently in the context of fillSequence webmachinelearning/webnn#492 (comment)

<gb> Issue 492 Constant sequential filling operation needs output shape parameter (by fdwr) [feature request] [operator specific]

anssik: do you want to talk to your question related to fillSequence rationale?

Austin: the context is I was looking at fillSequence and realized we can't infer the end; an MLBuffer could be passed as a constant, so this op may not be required, and my understanding of MLBuffer is that we can get the same behavior with it
… this op was proposed before MLBuffer existed

Dwayne: thanks for the detailed feedback Austin
… I'd be inclined to remove it
… I want WebNN to be explicit with its output shape
… I will get back to the thread today

Ningxin_Hu: want to clarify that the use case for no-op graphs is to share constants between graphs, and MLBuffer satisfies that use case; not aware of other use cases for this feature

[question] Can an MLGraphBuilder be reused?

anssik: issue #567

<gb> Issue 567 Can an MLGraphBuilder be reused? (by reillyeon) [question]

anssik: an issue opened by Reilly, discussed last time, but he was not able to participate then but relayed this message:

"I'm satisfied that there are good use cases for constructing multiple MLGraphs from an MLGraphBuilder but we need example code and implementation experience before we can decide specifically how it will work"

anssik: I'm assuming the group's position is the same i.e. we want to explore more with example code and/or implementations?

asully: I can try to speak for Reilly a bit, he requested that multiple graphs can be built at the same time
… when do we know we can free up the memory we've copied? every build call needs to be passed the constant weights
… there's no time when you can release the memory, because you don't know when build will be called again
… if you can ensure build is called once, or several times at once, i.e. if you can expire the MLGraphBuilder, you can free the extra memory copies

Dwayne: should MLGraphBuilder be stateful so it owns the things it creates?

Austin: what does the constant method do?
… it implies the builder is stateful; if not, then a copy has to be made at some point, at the build call?
… if you change the content of that buffer you change what's passed to build; if you want to reuse an ArrayBuffer, calling constant with a buffer and then calling it again with the same buffer, you'd think you're passing different buffers, which is unintuitive for developers
… arguing MLGraphBuilder should be stateful
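
A sketch of the ambiguity Austin describes, using the current constant(descriptor, bufferView) signature; when the copy happens is precisely what is unresolved:

// context obtained earlier from navigator.ml.createContext()
const builder = new MLGraphBuilder(context);
const buf = new Float32Array([1, 2, 3, 4]);
const a = builder.constant({ dataType: 'float32', dimensions: [4] }, buf);
buf.fill(0); // mutate the source buffer after the first constant() call
const b = builder.constant({ dataType: 'float32', dimensions: [4] }, buf);
// If constant() copies eagerly, a and b wrap different data; if the copy
// is deferred to build(), both end up seeing the zeroed buffer.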

Dwayne: changing buffer later before build, agree that shouldn't happen

Austin: if we copy the data and MLGraphBuilder is stateful, when do we release all its data?

Ningxin_Hu: I think Austin made a good point, it's not intuitive for a developer if the data in the buffer can be changed
… regarding copying data, I see a use case for two graphs sharing a constant when building the graphs
… it's valuable if the buffer is copied at the time the constant is created
… for GPU it is used at build time, and the same copy of a buffer can be reused
… if using MLBuffer for a constant we can keep one copy of the data and share it with different graphs
… one constant can be shared with different graphs

Austin: MLBuffer.destroy() is explicit; with a big buffer you want to clean up memory that's expensive
… with one copy we can pass the same copy everywhere; that's not how we implement it today, but the implementation can be optimized
… the answer is whenever the operands associated with the GraphBuilder are GC'd, and the answer now is never

RafaelCintron: thanks Austin, if we hook up MLBuffer in the future to be a constant, that's another when-to-copy problem
… say I use MLBuffers here and there and change them, call build and change them again; when are the changes reflected in graphs?
… I think once you call build things should be locked in place
… I can see it being more intuitive that way, not feeling strongly about that
… the builder could grow too big and consume too much memory; we could introduce destroy on it, and on the context too, to release GPU memory
… if associated with a high-power device that could dial down power usage; we should shy away from doing that

Austin: regarding MLBuffer we haven't specified the timeline of how things happen; a good question is when should the data become static?
… when build is called?
… would that create a copy? whatever we decide, I expect it to be well defined

RafaelCintron: constant could say "I own it" or "you own it", and "I own it" has performance implications
… we can gain implementation experience on which way to go

Austin: makes sense

[question] Specify the operand data type constraints of operation

anssik: issue #283

<gb> Issue 283 Specify the operand data type constraints of operation (by huningxin) [question]

<jsbell> Pull Request 646 Specify the operand data type constraints of operations

<gb> Pull Request 646 Specify the operand data type constraints of operations (by inexorabletash)

anssik: Ningxin reported initially (in 2022): "The current spec doesn't specify the operand type constraints of an operation. However, some operations, e.g., softmax should only support float32 operand type according to the survey of frameworks and native ML APIs"
… fast forward to 2023/24, Ningxin provided an extensive summary of the operand data type constraints for current WebNN ops:

webmachinelearning/webnn#283 (comment)

<gb> Issue 283 Specify the operand data type constraints of operation (by huningxin) [question]

jsbell: PR #646 follows what we do in the rest of the spec: if the data type is not an allowed type, throw an error

<gb> Pull Request 646 Specify the operand data type constraints of operations (by inexorabletash)

jsbell: do we want to be table-driven with this information?

<Ningxin_Hu> +1 to inline

anssik: Yajing provided comparison with CoreML: webmachinelearning/webnn#283 (comment)

anssik: a non-blocking question from Jiewei: "Should mixed precision be allowed when the op involves accumulation?"
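
A sketch of what the validation pattern in PR #646 means for callers, assuming softmax is constrained to float data types per Ningxin's summary (illustrative; the exact error follows the spec's validation conventions):

// builder is an MLGraphBuilder as in earlier examples
const fx = builder.input('x', { dataType: 'float32', dimensions: [2, 4] });
builder.softmax(fx); // OK: float32 is an allowed data type

const ix = builder.input('y', { dataType: 'int32', dimensions: [2, 4] });
builder.softmax(ix); // throws: int32 is not an allowed data type for softmax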

[operator specific] Type of some parameters should match the input data type

anssik: issue #442

<gb> Issue 442 Type of some parameters should match the input data type (by Honry) [operator specific]

anssik: Wanming reports that option values for pad (MLPadOptions) and clamp should match the input data type
… a proposal from Josh in the issue is to add a new typedef:

typedef (bigint or unrestricted double) MLNumber;

anssik: this typedef to be used for:
… - constant(value, type)
… - constant(start, end, step, type) (see #571 and #492)
… - MLClampOptions
… - MLPadOptions

<gb> Issue 492 Constant sequential filling operation needs output shape parameter (by fdwr) [feature request] [operator specific]

<gb> CLOSED Issue 571 WebIDL definition for constant(start, end, step, type) is missing (by inexorabletash) [bug] [duplicate] [question] [operator specific]

<Ningxin_Hu> +1 to have a PR for MLNumber
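
A sketch of where the proposed MLNumber typedef would matter, assuming it is adopted for MLClampOptions: int64 bounds above Number.MAX_SAFE_INTEGER can only be expressed exactly as bigint.

// builder is an MLGraphBuilder; the bigint option values assume MLNumber
const x = builder.input('x', { dataType: 'int64', dimensions: [1024] });
const y = builder.clamp(x, {
  minValue: 0n,                // bigint per the proposed MLNumber typedef
  maxValue: 9007199254740993n  // exceeds Number.MAX_SAFE_INTEGER
});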

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics

Succeeded: s/structure/structured

Succeeded: s/both the group/both the groups

Succeeded: s/requires/require

Succeeded: s/an will/and will

Succeeded: s/needs to/needs to be

Maybe present: anssik, asully, Austin, Dwayne, Joshua, jsbell, Michael, RafaelCintron

All speakers: anssik, asully, Austin, Dwayne, Joshua, jsbell, Michael, Ningxin_Hu, RafaelCintron

Active on IRC: anssik, asully, jsbell, McCool, Ningxin_Hu, RafaelCintron