Meeting minutes
Repository: webmachinelearning/webnn
MikeW: I represent Apple; my background is in the WebGPU implementation on Apple platforms, and I'm leading the WebNN/ML effort on the Apple side as well
<MikeW> mwyrzykowski
Announcements
anssik: W3C Breakouts Day 2024 welcomes proposals by 29 February
… breakout concept is familiar from TPAC, but now there's a separate day
… in fact, an early WebNN API proposal was introduced in a breakout session at TPAC some years ago
… breakouts are a good opportunity to share any new explorations with the broader community to gather feedback
… if anyone has a breakout proposal in mind, simply open a GH issue in the breakouts-day-2024 repo https://
dom: in general breakouts are an opportunity to raise topics that go beyond one group's scope
… typically prepare for 1 hour session with a small presentation and discussion
WebNN API Candidate Recommendation Snapshot transition
anssik: Proposal is to initiate the WebNN API CR Snapshot transition request on 7 March, then publish by the end of March
… let's review our CR readiness #240
<gb> Issue 240 Candidate Recommendation readiness tracker (by anssiko) [process]
anssik: we're soon ready to turn everything green; I still have a few areas I want to confirm with the group
Test coverage
anssik: my expectation is the current test coverage is considered adequate for the CR transition, 100% coverage is not expected at CR time
dom: we are way beyond expectations in terms of test coverage
anssik: happy to hear that
anssik: I suggest we note in the CR transition request that wpt.fyi results reflect XNNPACK backend implementation status
dom: good idea to clarify that in transition request
Ningxin_Hu: for the DirectML implementation, we have a software adapter for WPT; there are some gaps before the bots can run WPT on real GPUs
dom: is this something that is being worked on and is there a timeline?
Ningxin_Hu: Austin informed us that we want to enable GPU tests in the Chromium infrastructure; not sure if anyone is working on wpt.fyi currently
dom: wpt.fyi is used in some circles to gauge momentum, so it is useful to figure out what needs to be done to improve the CI setup for WPT for DirectML
Ningxin_Hu: currently wpt.fyi runs Edge on Windows 10, while the DML backend would require Windows 11; please point us to a contact for wpt.fyi to work with
dom: the WPT owners group would be the responsible people; being clear on what is needed would be a good first step
anssik: we can have a separate call about this
Delta wide review
anssik: Delta wide review tracked in #239
<gb> Issue 239 Wide review tracker (by anssiko) [process]
anssik: no concerns raised
… not expecting any major concerns given this is a delta review, the earlier full review passed, and changes since have been either to address review feedback, adjust the opset scope, or improve overall spec quality
High-level status (aka Status of this document)
anssik: I think we're good with this status text, merged to main, any proposals welcome
… to recap, this is the section busy people read; it is not inclusive of everything
Implementation status
anssik: Belem & co have maintained the implementation status page; it is fit for purpose for the CR
<MikeW> thank you
anssik: all good to initiate the transition request on 7 March?
dom: the only dangling bit is the TAG review
anssik: can you help bring this to their attention?
dom: I can try
Triage Guidance and Milestones
anssik: Next, I'd like to introduce the newly minted triage guidance and review the initial triage results. Thanks Josh for working with me on this. I hope the group sees this effort as a net positive
… for this call, I'd like to hear if any of the issues identified as "bug", "testing", or "untriaged" (later "big issues") should be addressed by the imminent CR Snapshot
… for CR Snapshot purposes, we obviously are not expected to reach zero issues
jsbell: a few weeks ago we published triage guidance
… since then have tried to follow the guidance
… a big new label was "operator specific" with 41 issues
… even if that is a lot of issues, the problems are scoped and do not affect the shape of the API overall
jsbell: the important ones are issues that do not fit into the workstreams
… aka "unknown unknowns"
jsbell: some additional issue clusters include:
… - Graph construction and build steps - covers about 5 issues; we've got some active discussion from several participants narrowing in on what, where, and how to make things more precise.
… - Data types and number handling, including casting, small and big ints, input validation, and so on
… Dwayne has kicked off discussions with the WebIDL maintainers about the path to supporting both float64 (double) and int64 (bigint) as inputs to the same method (sketched below)
… we closed 15-20 issues as part of this initial triage
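To make the double/bigint question concrete, here is a minimal sketch of how one method could accept both kinds of values; `MLOperandStub` and `scalarConstant` are hypothetical names for illustration, not the spec's actual WebIDL:

```ts
// Hedged sketch: one method accepting both double (number) and int64 (bigint)
// values. "MLOperandStub" and "scalarConstant" are hypothetical, not spec IDL.
interface MLOperandStub {
  dataType: "float32" | "int64";
  value: number | bigint;
}

function scalarConstant(dataType: "float32" | "int64",
                        value: number | bigint): MLOperandStub {
  // int64 needs bigint to stay exact beyond Number.MAX_SAFE_INTEGER (2^53 - 1);
  // float tensors keep plain JS numbers.
  if (dataType === "int64" && typeof value !== "bigint") {
    throw new TypeError("int64 constants require a bigint value");
  }
  if (dataType === "float32" && typeof value !== "number") {
    throw new TypeError("float32 constants require a number value");
  }
  return { dataType, value };
}

const a = scalarConstant("float32", 0.5);      // double path
const b = scalarConstant("int64", 2n ** 60n);  // bigint path, exact beyond 2^53
```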
zkis: thanks Josh!
dom: I think some groups could borrow best practices from this group for triage guidance
anssik: triage guidance is welcoming PRs
… also new contributors to the Triage team welcome
Milestones
anssik: how do we want to concretely make the best use of the GH milestones feature?
… there was support on our last call to adopt milestones
… is a CR Snapshot a good spec milestone, with a scope that is feasible for roughly a quarter's worth of work?
dom: CR Snapshot every 3 months would raise a question how we do wide reviews for that cadence
… we want another CR Snapshot beyond the next planned one; one milestone might be the next CR Snapshot, another obvious one would be "Proposed Rec", not anticipating any timelines
… we should discuss how to integrate backends into the Proposed Rec implementation experience
… what should not be part of the first Rec
… declaring the first victory is beneficial
RafaelCintron: in the WebGPU group there's a concept of a milestone, Mike can confirm
<MikeW> That's right; the milestones for WebGPU are quite fluid, however
RafaelCintron: criteria there is different from ours
MikeW: the WebGPU group basically just categorizes issues into milestones based on complexity, flexibly moving them from one milestone to another
New features
MLBuffer
anssik: Let's continue discussion on the proposal for a backend-agnostic storage type for WebNN operations informed by implementation experience.
… I'd ask the group to pay attention to the open questions in the sub-issues and the exploration doc
-> MLBuffer proposal #482
<gb> Issue 482 Support for device-based tensor storage objects (by bbernhar) [webgpu interop]
-> Creating and representing MLBuffer on XPU devices #542
<gb> Issue 542 [MLBuffer] Creation and representing MLBuffer on a XPU devices (by bbernhar) [webgpu interop]
-> Uploading/downloading tensor data #543
<gb> Issue 543 [MLBuffer] Uploading/downloading tensor data (by bbernhar) [webgpu interop]
-> Support for MLBuffer in graph execution #544
<gb> Issue 544 [MLBuffer] Support for MLBuffer in graph execution (by bbernhar) [webgpu interop]
-> MLBuffer exploration doc #541
<gb> Pull Request 541 Add MLBuffer exploration doc (by a-sully) [webgpu interop]
anssik: I'm seeing good discussion in the exploration doc
… I'd like to bring for discussion Austin's proposal for refocusing MLBuffer on the following goals:
… - Prove out that the MLBuffer concept is feasible to implement on all platforms,
… - Prove out that MLBuffer provides meaningful performance wins for the two use cases we've identified, and
… - Avoid baking in any assumptions which would preclude adopting further optimizations in the future
… tentative suggestions:
… - Start with the initially-proposed readBuffer() and writeBuffer() APIs as the only way to read/write data to an MLBuffer from script (sketched after this list)
… - Take a phased approach to supporting WebGPU <-> WebNN interop
… - Punt on the following features: buffer mapping to JS, and minimizing buffer copies for UMA systems
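A minimal sketch of the upload/execute/download flow under the initially-proposed read/write APIs; the interfaces below are stand-ins inferred from issues #542-#544, not a shipped or final API:

```ts
// Stand-in declarations for the proposed (not final) MLBuffer surface.
interface MLBuffer {}
interface MLGraph {}
interface MLContext {
  createBuffer(desc: { size: number }): MLBuffer;
  writeBuffer(dst: MLBuffer, src: ArrayBufferView): void;
  readBuffer(src: MLBuffer): Promise<ArrayBuffer>;
  dispatch(graph: MLGraph,
           inputs: Record<string, MLBuffer>,
           outputs: Record<string, MLBuffer>): void;
}
declare const context: MLContext;
declare const graph: MLGraph;

// Upload once, execute, then read back; tensor data stays on the device
// between dispatches, which is where the expected performance win comes from.
const input = context.createBuffer({ size: 1024 * Float32Array.BYTES_PER_ELEMENT });
const output = context.createBuffer({ size: 1024 * Float32Array.BYTES_PER_ELEMENT });
context.writeBuffer(input, new Float32Array(1024));
context.dispatch(graph, { input }, { output });
const result = await context.readBuffer(output);
```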
asully: thanks for taking a look at this!
… I think the purpose is to make sure we get performance wins; JS buffer mapping is not so helpful because it may in fact introduce overhead
… most of the discussion can happen async, but would love to get Apple's feedback on this call
MikeW: I'm reading the issue now, I need to do a little bit of research first
Bryan: more or less we're back to where we started
Pull Requests and open issues
anssik: we've worked through our PR queue, so we can focus on discussing open issues based on your feedback
<jsbell> webmachinelearning/
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
Core operator set
phillis: feedback from our platform teams: they want to ensure we have good coverage and that the op set is decomposable
… works consistently so frameworks on top can rely on it
jsbell: this has come up with StableHLO and PyTorch, which have tried to move to very well-defined baseline ops
jsbell: if a higher-level op is missing, they want to be able to lower it to core ops
RafaelCintron: I'm willing to explore what a core op set would mean
… we need web developers to be able to use expanded ops on platforms that support them
… being able to do higher-level things easily seems very useful; everything should be in the spec, both the core and "expanded" op sets
phillis: the expanded op set should be in the spec; the question is whether we section the ops into core and "extended"
<jsbell> https://
jsbell: agree we don't want to go down to a minimal set; PyTorch has settled on a core op set
asully: one of the key things is that when we say "core op set", these ops are defined precisely with constraints and would behave the same across platforms
… the higher-level the op, the more variation in implementations, e.g. LSTM
<jsbell> Dwayne has hand up in Zoom?
Dwayne: this is not a new concept, we haven't gone deep into this; there are primitive, aggregate, and optional ops -- what does it mean to be compliant with this spec then?
… I feel every complex op should be decomposable (see the sketch at the end of this section), and the core set should behave the same across platforms
… there's wiggle room around the edges, casting, truncating to zero etc.
… fuzzier areas to iron out
… required and optional ops, logically organized with a label next to them
asully: to respond to Dwayne, agree that ideally every op behaves the same on all platforms and we have no distinction between core and other ops
… in reality we have different backends, e.g. TF and PyTorch, with differences
… for many web platform APIs you expect them to run everywhere; if we require every op to be supported everywhere, then ops like LSTM, which is not implemented everywhere, would require a CPU fallback
… there's room to establish clarity around this
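To illustrate the decomposability point, a minimal sketch of lowering a higher-level op to core ops, using softplus(x) = ln(1 + exp(x)) as an example; the builder interface below mimics MLGraphBuilder but is a stand-in for illustration, not the spec's definition:

```ts
// Hedged sketch: lowering a higher-level op to core ops when a backend
// lacks it natively. The interfaces are stand-ins, not the spec's IDL.
interface MLOperand {}
interface MLGraphBuilder {
  constant(desc: { dataType: string; dimensions: number[] },
           value: ArrayBufferView): MLOperand;
  exp(x: MLOperand): MLOperand;
  add(a: MLOperand, b: MLOperand): MLOperand;
  log(x: MLOperand): MLOperand;
}

// softplus(x) = ln(1 + exp(x)), decomposed into the core ops exp, add, log.
function softplus(builder: MLGraphBuilder, x: MLOperand): MLOperand {
  const one = builder.constant(
    { dataType: "float32", dimensions: [1] }, new Float32Array([1]));
  return builder.log(builder.add(builder.exp(x), one));
}
```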