Meeting minutes
Repository: webmachinelearning/webnn
MikeW: I represent Apple; my background is in the WebGPU implementation on Apple platforms, and I'm leading the WebNN/ML effort on the Apple side as well
<MikeW> mwyrzykowski
Announcements
anssik: W3C Breakouts Day 2024 welcomes proposals by 29 February
… breakout concept is familiar from TPAC, but now there's a separate day
… in fact, an early WebNN API proposal was introduced in a breakout session at TPAC some years ago
… breakouts are a good opportunity to share any new explorations with the broader community to gather feedback
… if anyone has a breakout proposal in mind, simply open a GH issue in the breakouts-day-2024 repo https://
dom: in general breakouts are an opportunity to raise topics that go beyond one group's scope
… typically prepare for 1 hour session with a small presentation and discussion
WebNN API Candidate Recommendation Snapshot transition
anssik: Proposal is to initiate the WebNN API CR Snapshot transition request on 7 March, then publish by the end of March
… let's review our CR readiness #240
<gb> Issue 240 Candidate Recommendation readiness tracker (by anssiko) [process]
anssik: we're soon ready to turn everything green; I still have a few areas I want to confirm with the group
Test coverage
anssik: my expectation is the current test coverage is considered adequate for the CR transition, 100% coverage is not expected at CR time
dom: we are way beyond expectations in terms of test coverage
anssik: happy to hear that
anssik: I suggest we note in the CR transition request that wpt.fyi results reflect XNNPACK backend implementation status
dom: good idea to clarify that in transition request
Ningxin_Hu: for the DirectML implementation, we have a software adapter for WPT; there are some gaps before the bots can run WPT on real GPUs
dom: is this something that is being worked on and is there a timeline?
Ningxin_Hu: Austin informed us that we want to enable GPU tests in the Chromium infrastructure; not sure if anyone is working on wpt.fyi currently
dom: wpt.fyi is used in some circles to gauge momentum, so it is useful to figure out what needs to be done to improve the CI setup for WPT for DirectML
Ningxin_Hu: currently wpt.fyi runs Edge on Windows 10, while the DML backend would require Windows 11; please point us to a contact for wpt.fyi to work with
dom: the WPT owners group would be the responsible people; being clear on what is needed would be a good first step
anssik: we can have a separate call about this
Delta wide review
anssik: Delta wide review tracked in #239
<gb> Issue 239 Wide review tracker (by anssiko) [process]
anssik: no concerns raised
… not expecting any major concerns given this is a delta review, the earlier full review passed, and changes since have been either to address review feedback, adjust the opset scope, or improve overall spec quality
High-level status (aka Status of this document)
anssik: I think we're good with this status text, merged to main, any proposals welcome
… to recap, this is the section busy people read; it is not inclusive of everything
Implementation status
anssik: Belem & co have maintained the implementation status page; it is fit for purpose for the CR
<MikeW> thank you
anssik: all good to initiate the transition request on 7 March?
dom: the only dangling bit is the TAG review
anssik: can you help bring this to their attention?
dom: I can try
Triage Guidance and Milestones
anssik: Next, I'd like to introduce the newly minted triage guidance and review the initial triage results. Thanks Josh for working with me on this. I hope the group sees this effort as a net positive
… for this call, I'd like to hear if any of the issues identified as "bug", "testing", or "untriaged" (later "big issues") should be addressed by the imminent CR Snapshot
… for CR Snapshot purposes, we obviously are not expected to reach zero issues
jsbell: a few weeks ago we published triage guidance
… since then have tried to follow the guidance
… a big new label was "operator specific" with 41 issues
… even if that is a lot of issues, the problems are scoped and do not affect the shape of the API overall
jsbell: the important ones are issues that do not fit into the workstreams
… aka "unknown unknowns"
jsbell: some additional issue clusters include:
… - Graph construction and build steps - covers about 5 issues; we've got some active discussion from several participants narrowing in on what, where, and how to make things more precise.
… - Data types and number handling, including casting, small and big ints, input validation, and so on
… Dwayne has kicked off discussions with the WebIDL maintainers about the path to supporting both float64 (double) and int64 (bigint) as inputs to the same method (sketched below)
… we closed 15-20 issues as part of this initial triage
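To make the double/bigint question concrete, here is a minimal sketch of how one method could accept both kinds of values; `MLOperandStub` and `scalarConstant` are hypothetical names for illustration, not the spec's actual WebIDL:

```ts
// Hedged sketch: one method accepting both double (number) and int64 (bigint)
// values. "MLOperandStub" and "scalarConstant" are hypothetical, not spec IDL.
interface MLOperandStub {
  dataType: "float32" | "int64";
  value: number | bigint;
}

function scalarConstant(dataType: "float32" | "int64",
                        value: number | bigint): MLOperandStub {
  // int64 needs bigint to stay exact beyond Number.MAX_SAFE_INTEGER (2^53 - 1);
  // float tensors keep plain JS numbers.
  if (dataType === "int64" && typeof value !== "bigint") {
    throw new TypeError("int64 constants require a bigint value");
  }
  if (dataType === "float32" && typeof value !== "number") {
    throw new TypeError("float32 constants require a number value");
  }
  return { dataType, value };
}

const a = scalarConstant("float32", 0.5);      // double path
const b = scalarConstant("int64", 2n ** 60n);  // bigint path, exact beyond 2^53
```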
zkis: thanks Josh!
dom: I think some groups could borrow best practices from this group for triage guidance
anssik: triage guidance is welcoming PRs
… also new contributors to the Triage team welcome
Milestones
anssik: how do we want to concretely make the best use of the GH milestones feature?
… there was support on our last call to adopt milestones
… is a CR Snapshot a good spec milestone, with a scope that is feasible for roughly a quarter's worth of work?
dom: CR Snapshot every 3 months would raise a question how we do wide reviews for that cadence
… we want another CR Snapshot beyond the next planned one; one milestone might be the next CR Snapshot, another obvious one would be "Proposed Rec", not anticipating any timelines
… we should discuss how to integrate backends into the Proposed Rec implementation experience
… what should not be part of the first Rec
… declaring the first victory is beneficial
RafaelCintron: in the WebGPU group there's a concept of a milestone, Mike can confirm
<MikeW> That's right; the milestones for WebGPU are quite fluid, however
RafaelCintron: criteria there is different from ours
MikeW: the WebGPU group basically just categorizes issues into milestones based on complexity, flexibly moving them from one milestone to another
New features
MLBuffer
anssik: Let's continue discussion on the proposal for a backend-agnostic storage type for WebNN operations informed by implementation experience.
… I'd ask the group to pay attention to the open questions in the sub-issues and the exploration doc
-> MLBuffer proposal #482
<gb> Issue 482 Support for device-based tensor storage objects (by bbernhar) [webgpu interop]
-> Creating and representing MLBuffer on XPU devices #542
<gb> Issue 542 [MLBuffer] Creation and representing MLBuffer on a XPU devices (by bbernhar) [webgpu interop]
-> Uploading/downloading tensor data #543
<gb> Issue 543 [MLBuffer] Uploading/downloading tensor data (by bbernhar) [webgpu interop]
-> Support for MLBuffer in graph execution #544
<gb> Issue 544 [MLBuffer] Support for MLBuffer in graph execution (by bbernhar) [webgpu interop]
-> MLBuffer exploration doc #541
<gb> Pull Request 541 Add MLBuffer exploration doc (by a-sully) [webgpu interop]
anssik: I'm seeing good discussion in the exploration doc
… I'd like to bring for discussion Austin's proposal for refocusing MLBuffer on the following goals:
… - Prove out that the MLBuffer concept is feasible to implement on all platforms,
… - Prove out that MLBuffer provides meaningful performance wins for the two use cases we've identified, and
… - Avoid baking in any assumptions which would preclude adopting further optimizations in the future
… tentative suggestions:
… - Start with the initially-proposed readBuffer() and writeBuffer() APIs as the only way to read/write data to an MLBuffer from script (sketched after this list)
… - Take a phased approach to supporting WebGPU <-> WebNN interop
… - Punt on the following features: buffer mapping to JS, and minimizing buffer copies for UMA systems
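A minimal sketch of the upload/execute/download flow under the initially-proposed read/write APIs; the interfaces below are stand-ins inferred from issues #542-#544, not a shipped or final API:

```ts
// Stand-in declarations for the proposed (not final) MLBuffer surface.
interface MLBuffer {}
interface MLGraph {}
interface MLContext {
  createBuffer(desc: { size: number }): MLBuffer;
  writeBuffer(dst: MLBuffer, src: ArrayBufferView): void;
  readBuffer(src: MLBuffer): Promise<ArrayBuffer>;
  dispatch(graph: MLGraph,
           inputs: Record<string, MLBuffer>,
           outputs: Record<string, MLBuffer>): void;
}
declare const context: MLContext;
declare const graph: MLGraph;

// Upload once, execute, then read back; tensor data stays on the device
// between dispatches, which is where the expected performance win comes from.
const input = context.createBuffer({ size: 1024 * Float32Array.BYTES_PER_ELEMENT });
const output = context.createBuffer({ size: 1024 * Float32Array.BYTES_PER_ELEMENT });
context.writeBuffer(input, new Float32Array(1024));
context.dispatch(graph, { input }, { output });
const result = await context.readBuffer(output);
```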
asully: thanks for taking a look at this!
… I think the purpose is to make sure we get performance wins; JS buffer mapping is not so helpful because it may in fact introduce overhead
… most of the discussion can happen async, but would love to get Apple's feedback on this call
MikeW: I'm reading the issue now, I need to do a little bit of research first
Bryan: more or less we're back to where we started
Pull Requests and open issues
anssik: we've worked through our PR queue, so we can focus on discussing open issues based on your feedback
<jsbell> webmachinelearning/
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
Core operator set
phillis: feedback from our platform teams: they want to ensure we have good coverage and that the op set is decomposable
… works consistently so frameworks on top can rely on it
jsbell: this has come up with StableHLO and PyTorch, which have tried to move to very well-defined baseline ops
jsbell: if a higher-level op is missing, they want to be able to lower it to core ops
RafaelCintron: I'm willing to explore what a core op set would mean
… we need web developers to be able to use expanded ops on platforms that support them
… being able to do higher-level things easily seems very useful; everything should be in the spec, both the core and "expanded" op sets
phillis: the expanded op set should be in the spec; the question is whether we section the ops into core and "extended"
<jsbell> https://
jsbell: agree we don't want to go down to a minimal set; PyTorch has settled on a core op set
asully: one of the key things is that when we say "core op set", these ops are defined precisely with constraints and would behave the same across platforms
… the higher-level the op, the more variation in implementations, e.g. LSTM
<jsbell> Dwayne has hand up in Zoom?
Dwayne: this is not a new concept, we haven't gone deep into this; there are primitive, aggregate, and optional ops -- what does it mean to be compliant with this spec then?
… I feel every complex op should be decomposable (see the sketch at the end of this section), and the core set should behave the same across platforms
… there's wiggle room around the edges, casting, truncating to zero etc.
… fuzzier areas to iron out
… required and optional ops, logically organized with a label next to them
asully: to respond to Dwayne, agree that ideally every op behaves the same on all platforms and we have no distinction between core and other ops
… in reality we have different backends, e.g. TF and PyTorch, with differences
… for many web platform APIs you expect them to run everywhere; if we require every op to be supported everywhere, then ops like LSTM, which is not implemented everywhere, would require a CPU fallback
… there's room to establish clarity around this
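To illustrate the decomposability point, a minimal sketch of lowering a higher-level op to core ops, using softplus(x) = ln(1 + exp(x)) as an example; the builder interface below mimics MLGraphBuilder but is a stand-in for illustration, not the spec's definition:

```ts
// Hedged sketch: lowering a higher-level op to core ops when a backend
// lacks it natively. The interfaces are stand-ins, not the spec's IDL.
interface MLOperand {}
interface MLGraphBuilder {
  constant(desc: { dataType: string; dimensions: number[] },
           value: ArrayBufferView): MLOperand;
  exp(x: MLOperand): MLOperand;
  add(a: MLOperand, b: MLOperand): MLOperand;
  log(x: MLOperand): MLOperand;
}

// softplus(x) = ln(1 + exp(x)), decomposed into the core ops exp, add, log.
function softplus(builder: MLGraphBuilder, x: MLOperand): MLOperand {
  const one = builder.constant(
    { dataType: "float32", dimensions: [1] }, new Float32Array([1]));
  return builder.log(builder.add(builder.exp(x), one));
}
```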