W3C

– DRAFT –
WebML WG Teleconference – 31 October 2024

31 October 2024

Attendees

Present
Anssi_Kostiainen, Austin_Sullivan, Bryan_Bernhart, Dominique_Hazael-Massieux, Dwayne_Robinson, Etienne_Noel, Michael_McCool, Mike_Wyrzykowski, Ningxin_Hu, Rafael_Cintron, Talha_Gorsi, Zoltan_Kis
Regrets
Ningxin_Hu
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

anssik: Happy Halloween!
… To kick off the meeting, let me welcome our newest participants:
… Domenic Denicola and Kenji Baheux from Google
… Islam El-Ashi from Dfinity Stiftung
… Christian Liebel from Thinktecture
… Domenic and Kenji are working on the task-specific APIs using built-in models proposed to the WebML Community Group, and are also contributing elsewhere
… Christian has built solutions with these task-specific APIs and also wrote TypeScript definitions for them https://www.npmjs.com/package/@types/dom-chromium-ai
… welcome all, we look forward to working with you!

WebML Community Group Charter update

Repository: webmachinelearning/charter

anssik: I'd like to review and discuss the proposed WebML Community Group charter update to add the new task-specific APIs (Writing Assistance APIs, Translator and Language Detector APIs, Prompt API) introduced at TPAC 2024 to its scope

TPAC 2024 slides for task-specific APIs

anssik: I've prepared a PR #9 for the CG charter update

<gb> Pull Request 9 Update Community Group Charter for 2024-> (by anssiko)

Charter Preview

anssik: review comments are welcome in the PR, on this call, or via email
… I'd like to note that the updated scope proposed for the Community Group extends beyond the current scope of the Web Machine Learning Working Group
… in a nutshell, WebML CG incubates new web spec proposals and WebML WG standardizes them when/if they gain traction and support
… WebNN API is an example that was incubated in the CG and graduated to this WG
… both the WebML CG and WG share many of the active participants and use the same GH infrastructure
… this work mode enables smooth collaboration
… one notable difference is the IPR policy: in a CG a participant first makes commitments on their own contributions, while in a WG commitments are made based on the WG charter scope
… other differences come from the W3C Process that WGs comply with, e.g. wide review for WG specs when hitting certain spec milestones

Changes overview

anssik: to give an overview of the proposed changes:
… - Goals were updated to better explain the relationship of low-level APIs and higher-level task-specific APIs:

"Following the precepts of the Extensible Web Manifesto, higher-level task-specific APIs can be implemented in terms of the low-level APIs that use custom models downloaded over the network. To complement this approach, the Community Group will incubate selected task-specific APIs to enable reuse of the built-in models that are distributed as part of the browser or the underlying software platform."

anssik: - Task-specific APIs and Prompt API added to Deliverables:

Translator and Language Detector APIs

Writing Assistance APIs

Prompt API
… - Noted WebNN has graduated to the WG

anssik: the next step is to conduct a 30-day vote among the CG participants on the proposed new charter, I'd like to kick off that review soon
… any questions or comments?

Etienne: Engineering manager at Google for built-in AI, working with Kenji and Domenic, and also with Natasha from Microsoft

dom: during TPAC there was discussion whether some of the APIs should be considered for the next WG charter
… wanted to hear whether we are indeed considering adding them as potential deliverables in that new WG charter
… IPR considerations differ between CG and WG charters

dom: if adding them to the CG is the first step for getting them into the WG, it would be good to know that soon

anssik: WG charter draft should be ready around January

MikeW: I haven't yet looked at the proposals, can take a look

Device selection abstractions

Repository: webmachinelearning/webnn

anssik: issue #749

<gb> Issue 749 MLContextOptions.deviceType seems unnecessary outside of conformance testing (by mwyrzykowski) [device selection]

anssik: MikeW proposed a more generalized, abstract concept on how to address the issue of device selection
… navigator.ml.opSupportLimitsPerAdapter() returns an MLOpSupportLimits dictionary for each compute device on a system (CPU, GPU, NPU)
… web developer/framework then passes the most appropriate MLOpSupportLimits to navigator.ml.createContext() to indicate to the implementation which limits will be required during the lifetime of the MLContext
… implementation-specific constraints this concept attempts to address:
… - DirectML needs to know which device the tensor will run on
… - CoreML does not allow requiring the NPU
… other considerations:
… - implementation to decide how much information to disclose via opSupportLimitsPerAdapter
… - this would expand the fingerprintable surface on top of what opSupportLimits() (PR #755) already exposes

<gb> MERGED Pull Request 755 Define opSupportLimits() (by philloooo)

MikeW: excellent summary Anssi
… not a concrete proposal, but an idea for abstracting the concept a bit so that DirectML gets what it needs
… also cater to the fact that CoreML cannot require NPU
… fingerprinting concern needs to be addressed
… open to any feedback, this was a rough idea that I informally proposed

Zoltan: once we enumerate constraints and bind to a compute device you can identify which device you have
… the main problem with the current design is we have CPU, GPU, NPU, and some combinations are not available on all platforms
… why not piggyback on constraints query
… I was wondering how to do this from the security point of view, without exposing too many details from the system
… we can experiment with this and certainly it adds more code to the application, but it depends on the use case
… we can do more work to map these to the application use cases
… thinking programmatically, I like the proposal

<zkis> The question is what the USVString would be in record<USVString, MLOpSupportLimits> opSupportLimitsPerAdapter();
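
For illustration only, a rough sketch of how the proposed (hypothetical, not yet specified) API could be used from script; the conv2d/float16 probing, the selection logic and the way the chosen limits are handed to createContext() are assumptions, not part of the proposal or the spec:

// Hypothetical API from issue #749, not part of the WebNN spec: one
// MLOpSupportLimits record per compute device, keyed by an adapter string.
const limitsPerAdapter = navigator.ml.opSupportLimitsPerAdapter();
// The developer/framework picks the limits that best fit its model, e.g.
// the first adapter whose conv2d input supports float16, else the first one.
const limits =
    Object.values(limitsPerAdapter).find(
        (l) => l.conv2d?.input?.dataTypes?.includes('float16')) ??
    Object.values(limitsPerAdapter)[0];
// Passing the chosen limits back indicates which constraints must hold for
// the lifetime of the MLContext; how exactly they are passed is not defined.
const context = await navigator.ml.createContext(limits);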

dom: this is a topic to engage the Privacy WG (formerly PING) on; in terms of when, it does not need to be a fully fleshed out proposal, more likely what they're interested in is what additional fingerprinting surface this would add and what the ways to mitigate it are
… doing our own research ahead of time on what mitigations would be realistic to deploy, e.g. against drive-by fingerprinting, and what tradeoffs to make during the design

dom: maybe writing an explainer for the new proposal is the best approach to bring it in front of the Privacy WG, and also to document privacy considerations in that explainer

Zoltan: I think the problem now is that the NPU is not supported everywhere; we could look for an algorithm that says choose the NPU when available, or the closest mapping
… if for other reasons we want to expose this proposed API that returns the USVString and MLOpSupportLimits
… we should standardize the USVString, since that is the fingerprintable surface
… privacy people will not like device enumeration-style API

MikeW: the existing API is an option; the concern from WebKit people was the concept of CPU, GPU, NPU being too hardware-specific for the web, so if we'd abstract those terms it would help

MLTensor

anssik: MLTensor explainer has landed!

MLTensor Explainer
… Thank you all, this was a major effort; PR #754 received 175(!) review comments

<gb> MERGED Pull Request 754 Add MLTensor explainer (by a-sully) [webgpu interop]

anssik: special thanks to Austin for carefully iterating this explainer with review and contributions from Corentin, Domenic, Phillis, Rafael, Brandon, Bryan, Ningxin, Dwayne, others
… our next step would be to prepare a spec patch to specify the feature normatively in spec language
… another task would be to check that we've closed all "webgpu-interop" issues that this explainer (or its spec counterpart) addresses:

webgpu-interop issues

anssik: Austin, what are your thoughts on the spec update? I think you want to coordinate with the editors on that.

Austin: I'm very excited that this PR merged, the work has just begun in terms of specifying the feature and prototyping implementations in browsers
… we're discussing whether we prototype first and then spec based on that implementation experience
… leaning toward the explainer as the source of truth for now, and advancing to the spec update especially once the WebGPU interop implementation is better fleshed out

Austin: a draft PR might be a good idea for the spec patches, I will figure out how to go about it

Tensor primitives

TPAC 2024 slides

anssik: this topic is on the agenda to check if we're ready to discuss some of the requirements
… I listed some possible subtopics in the agenda:
… - Additional primitive ops: evolve core op set w/ MLIR linalg, PT Edge, TOSA
… - Graph expressiveness: subgraphs, control flow
… - Native runtime support: fusion, pattern matcher
… Ningxin, would any of these topics benefit from group discussion?

ningxin: after TPAC we established two workstreams: one is how to define additional primitive ops by cross-referencing other IRs beyond linalg, PT Edge and TOSA, and incorporating the feedback
… the other workstream is about subgraph definition via the WebNN builder API, combined with the underlying implementation's fusion and pattern matcher
… a subgraph is most useful for the graph optimizer's pattern matching
… we've started a proof of concept based on Chromium to see how we can define such a subgraph in WebNN and help the implementation do pattern matching, starting with multi-headed attention and looking for other patterns such as LSTM and GRU
… once we have data from this POC it will inform the WG on the design; at that stage we'll open a spec issue for more discussion
… we think control flow is out of scope for the first stage, because on some devices it is not supported natively, so we want to focus on the subgraph
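
To make the subgraph idea concrete, a minimal sketch of the kind of op sequence a framework emits today for scaled dot-product attention using existing MLGraphBuilder ops, and which an implementation's pattern matcher could fuse into a native multi-headed attention kernel; the operand shapes, the axis values and the softmax(input, axis) form are assumptions for illustration only:

// Sketch only: scaled dot-product attention from existing WebNN primitives.
// q, k, v and scale are MLOperands built elsewhere; shapes are assumed to be
// [batch, heads, seqLen, headDim], with scale a scalar constant sqrt(headDim).
function scaledDotProductAttention(builder, q, k, v, scale) {
  // scores = (Q · Kᵀ) / sqrt(headDim), contracting over the last two dims.
  const kT = builder.transpose(k, { permutation: [0, 1, 3, 2] });
  const scores = builder.div(builder.matmul(q, kT), scale);
  // Attention weights: softmax over the key axis (assumed to be axis 3 here).
  const weights = builder.softmax(scores, 3);
  return builder.matmul(weights, v);
}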

Dwayne: for comparison to others, I compiled a huge spreadsheet
… next step would be to identify what is missing
anssik: should we use the core op set issue #573 for discussing these tensor primitives?

<gb> Issue 573 Core operator set (by philloooo) [question] [opset]

<ningxin> 573 is a good one

anssik: also, after cross-referencing the custom ops issue #6 (from 2019), I think we should close it?

<gb> Issue 6 Custom operations (by dsmilkov) [v2] [device selection]

anssik: it has great historical context but is not actionable

ningxin: re issue #6, we're exposing composed primitives; another way, with WebGPU interop, is for the framework to write a custom op, so issue #6 expands beyond composable custom ops

Open issues and PRs

anssik: as usual, we'll discuss open issues and review PRs based on your feedback and progress:

All open issues

All open pull requests

Recently merged PRs

Debrief on PRs merged recently

anssik: thanks to Austin and Bruce for PRs and everyone for reviews, since last call:
… issue #666 fixed by PR #774

<gb> MERGED Pull Request 774 Convert MLOperand methods into readonly attributes (by a-sully)

<gb> CLOSED Issue 666 Reconsider `MLOperand` methods (by a-sully) [question]

anssik: issue #775 fixed by PR #776

<gb> MERGED Pull Request 776 Bugfix: Fix the error of opSupportLimits for split op (by BruceDai)

<gb> CLOSED Issue 775 The split's opSupportLimits should be of `MLSplitSupportLimits` (by BruceDai)

anssik: various issues fixed by PR #754 - MLTensor explainer

<gb> MERGED Pull Request 754 Add MLTensor explainer (by a-sully) [webgpu interop]

anssik: w3ctag/design-reviews#933 fixed by PRs #765 and #769

<gb> MERGED Pull Request 769 Add Implementation Status to metadata (by anssiko)

<gb> MERGED Pull Request 765 Add resource contention considerations (by anssiko)

<gb> CLOSED Issue 933 Updated review of WebNN API (by dontcallmedom) [Priority: urgent] [Progress: review complete] [Review type: small delta] [Review type: horizontal review] [Venue: WebML CG] [Resolution: satisfied with concerns] [Mode: breakout]

<gb> … [Topic: Machine Learning] [Focus: Web architecture (pending)] [Focus: Security (pending)] [Focus: Privacy (pending)]

[feature request] LocalResponseNormalization

anssik: issue #228

<gb> CLOSED Issue 228 Support for LocalResponseNormalization (LRN) operation (by MarkGHX) [opset] [feature request]

anssik: the group analyzed the proposal for a new op and recommends a decomposition path for LocalResponseNormalization
… Dwayne's last comment documents the rationale:

"Using decomposition in higher layers (e.g. ORT's WebNN EP) for localResponseNormalization rather than a dedicated WebNN operator due to the rarity of the operator in models and the awkward backend differences."

<ningxin> +1
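
For reference, a rough sketch of what such a decomposition could look like with existing MLGraphBuilder ops, assuming the common across-channel LRN formula out = x / (bias + alpha * windowSum(x²))^beta on NCHW input; the shapes, the scalar operands and the parameter conventions are assumptions, and this is not necessarily the exact decomposition used in ORT's WebNN EP:

// Sketch only: across-channel LRN decomposed onto existing WebNN ops.
// x has shape [n, c, h, w]; biasOp, alphaOp, betaOp are scalar MLOperands.
function decomposedLRN(builder, x, [n, c, h, w], size, biasOp, alphaOp, betaOp) {
  const squared = builder.mul(x, x);
  // Zero-pad the channel axis so every window of `size` channels is in bounds.
  const before = Math.floor((size - 1) / 2);
  const padded = builder.pad(squared, [0, before, 0, 0], [0, size - 1 - before, 0, 0]);
  // Sum `size` shifted channel slices to get the windowed sum of squares.
  let windowSum = builder.slice(padded, [0, 0, 0, 0], [n, c, h, w]);
  for (let i = 1; i < size; ++i) {
    windowSum = builder.add(windowSum, builder.slice(padded, [0, i, 0, 0], [n, c, h, w]));
  }
  // out = x / (bias + alpha * windowSum)^beta
  const denom = builder.pow(builder.add(biasOp, builder.mul(alphaOp, windowSum)), betaOp);
  return builder.div(x, denom);
}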

[operator specific] Consider adding int64/uint64 data type support for some reduce operators

anssik: issue #694 and PR #695

<gb> Pull Request 695 Bugfix: Add missing 64-bit integers support for some reduction operators (by huningxin) [operator specific]

<gb> Issue 694 Consider adding int64/uint64 data type support for some reduce operators (by lisa0314) [operator specific]

anssik: I wanted to clarify the PR status and possible blockers for this PR
… MikeW suggests "we should start with the set of operations which is the intersection of which all native frameworks support today", Austin +1
… Ningxin was asking a question re Apple platforms and 64-bit integers:

"if an implementation could map "cpu" MLContext to BNNS, "gpu" MLContext to MPS and "npu" MLContext to CoreML, would that mean only the "npu" MLContext not supporting 64bit integers? That device type specific difference can be detected through MLContext.opSupportLimits() interface."

ningxin: my question is related to device selection, so the question is whether this data type support can be detected through opSupportLimits in the current design?

MikeW: I think it's technically feasible for the backends to map these, but it does make implementations diverge substantially, not sure if anyone has feedback on that for CoreML
… it might be confusing for frameworks or users that when they use 64-bit integers it would not be possible to run on the NPU

ningxin: thanks Mike, we need to think about this together with device selection to find the right abstraction to give to developers
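
As a concrete illustration of detection through the existing interface, a short sketch of how a framework could probe 64-bit integer support for a reduction before building the graph; the member names of the returned dictionary follow the current spec draft and the fallback strategy is only an assumption:

// Sketch only: probe int64 support for reduceSum on a given context.
const context = await navigator.ml.createContext({ deviceType: 'npu' });
const limits = context.opSupportLimits();
if (!limits.reduceSum?.input?.dataTypes?.includes('int64')) {
  // Fall back, e.g. cast int64 inputs to int32 before the reduction,
  // or create a context for a different device type.
}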


Minutes manually created (not a transcript), formatted by scribe.perl version 238 (Fri Oct 18 20:51:13 2024 UTC).

Diagnostics

Maybe present: anssik, Austin, dom, Dwayne, Etienne, MikeW, ningxin, Zoltan

All speakers: anssik, Austin, dom, Dwayne, Etienne, MikeW, ningxin, Zoltan

Active on IRC: anssik, dom, MikeW9, ningxin, zkis