W3C

– DRAFT –
WebML WG Teleconference – 16 November 2023

16 November 2023

Attendees

Present
Anssi_Kostiainen, Austin_Sullivan, Chai_Chaoweeraprasit, Deepti_Gandluri, Dominique_Hazael-Massieux, Dwayne_Robinson, Joshua_Bell, Joshua_Lochner, Ningxin_Hu, Rafael_Cintron, Reilly_Grant, Vivek_Sekhar, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

Announcements

Web & Networks IG coordination

anssik: W3C Web & Networks Interest Group is rechartering with a coordination opportunity with WebML WG
… this IG is interested in working with us to explore how to load-balance computing between the cloud and the client
… an example of this could be an inference workload
… this IG has network infrastructure experts in it
… expected IG investigations include identifying what network characteristics (bandwidth, latency, radio power consumption etc.) would be helpful to surface as higher-level hints via Web APIs or protocols to help web apps using APIs such as WebNN make informed decisions on where to run their workloads

Proposed Web and Networks Interest Group Charter

Voting instructions (Member-only):

anssik: every W3C Member company can vote, talk to your AC rep to cast a vote

Implementation status

Implementation Status of WebNN Operations

anssik: I wanted to check if there are implementation status updates to share with the WG
… should we check with Belem to update the above status page?
… I recall Reilly asked a question about the macOS backend implementation on an earlier call?

Reilly: we have been looking at the question of implementation against CoreML

anssik: we look forward to welcoming Apple on board this WG
… but while waiting for that to happen, I'd encourage the WG to investigate, based on publicly available information, the implementation story on Apple's platforms
… I also noticed Reilly recently made helpful contributions to the WebNN polyfill, thanks!
… while the polyfill is not considered an implementation per se, it helps web developers bring WebNN-accelerated web experiences, with the same codebase, to platforms that do not yet have a native WebNN implementation

Reilly: I was focused on the WebNN samples and their performance; my polyfill contributions were to upgrade the polyfill's dependencies to get a more up-to-date view of its performance
… I got stuck on the TF upgrade patch, due to the way the polyfill compiles for the browser while the test suite runs on Node.js
… I'm probably not going to contribute to this polyfill beyond low-hanging fruit issues

Web LLM collaboration on hybrid execution use case

anssik: Tianqi Chen from Carnegie Mellon University, OctoML, created Web LLM, a JS library that accelerates select LLMs in browsers with WebGPU

Web LLM repo

anssik: Web LLM runs Llama with 70B parameters on some high-end systems
… Tianqi shared an interesting use case with this WG: "There are great synergies to webnn related projects that possibly enables future hybrid executions of models(e.g. webgpu for customized op and some through webnn)"

Proposed hybrid execution use case

<gb> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]

anssik: if we can find time on his busy calendar, we'll invite Tianqi to a future call to share his experiences with Web LLM
… meanwhile, I suggest we spin this hybrid execution use case out into its own GH issue, thoughts?

Joshua_Lochner: I've definitely paid attention to this project, 70B params is amazing, I'm interested in running similar models on lower-end systems

<ningxin_hu> https://bugs.chromium.org/p/chromium/issues/detail?id=1492036#c3

ningxin_hu: not related to Web LLM, but to follow up on the benchmark: we got benchmark data regarding split and the emulation path via slice
… decomposing split into multiple slice ops, we see a perf difference in the range of 10-20% on both iGPU and dGPU, please see the crbug for details
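
For context, a minimal sketch of the decomposition being benchmarked: emulating split with repeated slice() calls on an MLGraphBuilder. This assumes the slice(input, starts, sizes) signature from the WebNN spec; emulateSplit and the even-division assumption are illustrative, not part of the spec.

    // Emulate split by taking `count` equal windows along `axis`.
    // Assumes inputShape[axis] divides evenly by `count`.
    function emulateSplit(builder, input, inputShape, count, axis) {
      const outputs = [];
      const size = inputShape[axis] / count;
      for (let i = 0; i < count; ++i) {
        const starts = inputShape.map(() => 0); // zero start in every dimension
        const sizes = [...inputShape];
        starts[axis] = i * size;                // shift the window along `axis`
        sizes[axis] = size;
        outputs.push(builder.slice(input, starts, sizes));
      }
      return outputs;
    }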

WebNN v2: Review transformer ops spec contributions (continued)

anssik: issue #375 and PR #478

<gb> Pull Request 478 Add support for operations needed for well-known transformers e.g. Segment Anything, Stable Diffusion, etc. (by wchao1115)

<gb> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]

anssik: Chai submitted PR #478 to add support for new ops and data types needed by well-known transformers the WG has identified -- thanks Chai and also Dwayne!
… this PR also removes one op, per our maintenance commitment and our in-depth understanding of model targets, thanks to careful research by participants
… I'd encourage the WG to review this PR
… this is a substantive change, diff stats +1500 -900
… Chai's PR comment provides a great summary, so I'd like Chai to walk us through this PR and perhaps highlight the areas reviewers should focus on most

Chai: thanks Anssi, thanks all for the early feedback!
… in the PR description I have the summary; this is not an exhaustive list of the ops needed for transformers, this is a starting point
… it allows us to run some popular models we've identified, there will likely be additions later
… you see some removals; we are considering the entire spec "v1" and we are actively validating the spec, driven by implementation experience
… when we solidify the op set we want to be more strict about changes
… regarding CoreML compatibility, for every spec change we have looked at CoreML to make sure this works there
… we also look at compatibility with the major ML frameworks, in addition to the major OS ML APIs
… if folks think there should be additional compatibility targets, please let us know
… we discussed e.g. clamp with min and max being equal, a minor case, but frameworks did not agree on that detail

jsbell: thank you Chai for confirming we are still comfortable making changes to this spec
… what is the WPT coverage for the API? when we add ops, do we have WPT tests for those?

<dom> (in this case, this would be adding tests and additions to the baseline implementation)

chai: touching on the breaking-change aspect, we can make changes at this stage, but I wouldn't make big changes; this PR makes one such change by removing one op
… the red diff is mostly due to bikeshed formatting
… we are updating WPT in tandem with the spec PR

ningxin_hu: we have a small team working on WPT for these new ops

<dom> (for the record, the culprit of the overly red diff is not bikeshed, but the diff script itself)

ningxin_hu: the baseline implementation, a pure JS impl of all the ops, is updated to help the testing effort

<ningxin_hu> https://github.com/webmachinelearning/webnn-baseline/pulls

ningxin_hu: we've merged some of these new ops and are in the process of reviewing the remaining ops being added

<ningxin_hu> web-platform-tests/wpt#43179

ningxin_hu: WPT PRs are also work in progress
… we want to get the WPT tests right and add them gradually; help wanted to review the WPT PRs

jsbell: thanks, that's awesome!

<ningxin_hu> sgtm, will do that

jsbell: please include a status update on the WPT tests in PR #478

<gb> Pull Request 478 Add support for operations needed for well-known transformers e.g. Segment Anything, Stable Diffusion, etc. (by wchao1115)

Enhancements

Simplify matmul op

anssik: issue #470

<gb> Issue 470 Simplify `matmul` op (by huningxin)

anssik: WebNN matmul supports 1-D input tensors but 1-D input tensors are not widely supported by native ML APIs, specifically:
… - DirectML's DML_GEMM_OPERATOR_DESC
… - BNNS's BroadcastMatMul
… - TensorFlow's BatchMatMulV2
… the open question is:
… - should matmul drop support for 1-D input tensors?
… frameworks could still support 1-D input tensors by reshaping them to 2-D, prepending or appending a dimension of size 1 (see the sketch below)
… this reshape incurs no performance penalty, because it involves no memory copy
… this change would simplify the implementation and the WPT tests, and reduce conformance testing cost
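
A minimal sketch of the reshape approach described above, assuming the MLGraphBuilder reshape() and matmul() methods; matmulWith1D is an illustrative helper, and for simplicity it assumes the other operand is rank 2 when one operand is 1-D.

    // Per the usual matmul convention, a 1-D `a` gets a leading 1 prepended
    // and a 1-D `b` gets a trailing 1 appended; reshape changes only shape
    // metadata, so no memory copy is involved.
    function matmulWith1D(builder, a, aShape, b, bShape) {
      const aIs1D = aShape.length === 1;
      const bIs1D = bShape.length === 1;
      const a2d = aIs1D ? builder.reshape(a, [1, aShape[0]]) : a;
      const b2d = bIs1D ? builder.reshape(b, [bShape[0], 1]) : b;
      let out = builder.matmul(a2d, b2d);
      // Squeeze the inserted dimensions back out of the result.
      if (aIs1D && bIs1D) {
        out = builder.reshape(out, [1]);          // [1, 1] -> [1]
      } else if (aIs1D) {
        out = builder.reshape(out, [bShape[1]]);  // [1, N] -> [N]
      } else if (bIs1D) {
        out = builder.reshape(out, [aShape[0]]);  // [M, 1] -> [M]
      }
      return out;
    }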

ningxin_hu: in the CL review we prototyped without 1-D support
… an additional comment: in the issue we also looked at XNNPACK's xnn_define_batch_matrix_multiply
… none of these native APIs supports 1-D, so this is good to drop

anssik: everyone OK to drop support for 1-D input tensors?

[ silence means consent ]

Define the algorithm of calculating the effective padding for "same-upper" and "same-lower" option

anssik: issue #326

<gb> Issue 326 Define the algorithm of calculating the effective padding for "same-upper" and "same-lower" option (by huningxin) [Editorial]

anssik: we discussed this in Q2 2023 and agreed to revisit once the conventions update had landed; that has now happened, so I wanted to revisit this issue

https://www.w3.org/2023/04/27-webmachinelearning-minutes.html#t08

anssik: initial issue description: "WebNN conv2d operation allows to set MLConv2dOptions.autoPad option to "same-upper" or "same-lower" of MLAutoPad enum."
… proposed fix back then: "The spec should define the algorithm of how the padding values are automatically computed." and Ningxin proposed to fix this by reusing 2d pooling definitions
… a new proposal emerged recently: "drop the support of MLAutoPad and only support explicit padding"
… thoughts?
… if we want to reduce the implementation and testing burden and the spec complexity, maybe the new proposal is better?
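
For reference, a minimal sketch of the effective-padding computation under discussion, following the common "SAME" convention (as in e.g. ONNX SAME_UPPER/SAME_LOWER); computeSamePadding is an illustrative name, not spec text.

    // Compute begin/end padding for one spatial dimension so that
    // outputSize == ceil(inputSize / stride).
    function computeSamePadding(inputSize, filterSize, stride, dilation, autoPad) {
      const effectiveFilterSize = (filterSize - 1) * dilation + 1;
      const outputSize = Math.ceil(inputSize / stride);
      const totalPadding = Math.max(
          0, (outputSize - 1) * stride + effectiveFilterSize - inputSize);
      const smaller = Math.floor(totalPadding / 2);
      const larger = totalPadding - smaller;
      // "same-upper" puts the extra padding at the end; "same-lower" at the start.
      return autoPad === 'same-upper'
          ? { begin: smaller, end: larger }
          : { begin: larger, end: smaller };
    }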

Dwayne: I haven't spent more thought on this; I'm not strongly proposing dropping support
… maybe callers could do the work so WebNN wouldn't have to worry about this

chai: not specific to this topic, but as with any other API design discussion, there are always two sides of the argument and we have to make a decision
… establishing a principle when considering an API change is helpful, and this topic is a great example of this tension
… when defining a backend API we want it to be tighter, in the sense that we don't want to bloat the API surface too much
… define the smallest possible exposure so future versioning is easier
… we also need to consider the ease of implementation for the framework developers who sit on top of the WebNN API
… if we make the WebNN API too low-level, too explicit, it will burden the framework developers more and be more error-prone for them
… looking at this API means looking at the tension points to find the best compromise: you don't want to make the API too explicit, so frameworks can innovate; OTOH you don't want to bloat the API

ningxin_hu: thanks Chai; for this specific issue we can investigate and ask framework authors for input
… we can drop this from the WebNN spec and implementation if framework authors already handle this calculation
… e.g. ONNX Runtime and TF

Dwayne: both of these have helpers to massage paddings

API lacks handling for async ML device errors on the context

anssik: issue #477

<gb> Issue 477 API lacks handling for async ML device errors on the context (by bbernhar)

anssik: Bryan is asking: "What happens if a WebNN operation dispatched through MLContext encounters some internal error which causes the GPU device to get removed?"
… as you know, Bryan is well informed on all things WebGPU and is now also contributing to the WebNN implementation, so he can help bridge the efforts
… Bryan continues: "I would expect WebNN to provide a spec into how fatal (device) errors are handled so the WebNN developer could respond appropriately. If we want to do more with MLContext (ex. create buffers), I believe we'll need a more robust error mechanism like WebGPU"

WebGPU Errors & Debugging

RafaelCintron: I'm supportive of Bryan's proposal
… you can be using the GPU and then a driver update happens and things may break; this helps with that

chai: I think this also has to do with how frameworks will respond to this event
… if frameworks will bubble this event up to the app and expect the app to clean up, then it makes sense for WebNN to surface it
… it would be good to understand how frameworks do this nowadays

ningxin_hu: many Web APIs are backed by the GPU; is there a unified way to get this error information surfaced to applications?
… this impacts many APIs, not just WebGPU and WebNN

RafaelCintron: there's no API for error information that spans all the Web APIs using GPUs
… WebGPU provides a device-lost promise
… 2D Canvas had a feature request for this, but I'm not sure it was implemented
… for WebNN I'm inclined to follow the WebGPU promise path
… re frameworks, BabylonJS handles these errors
… "house may burn down", so we need these APIs to help web developers build robust sites

chai: responding to ningxin_hu: in my previous PR, I said avoiding an internal GPU device is important; if WebNN wants to use a GPU device, it has to use a WebGPU device
… this was in part motivated by the error handling discussed here
… if that's the only device WebNN will use, it implies that when a device reset happens, WebNN is part of the app stack for that WebGPU device
… that is a way to do uniform error handling: WebNN, including the framework on top, is an app for WebGPU so to speak; currently we have an internal "gpu" device for WebNN
… this is a discussion that hasn't finished and would be good to revisit

ningxin_hu: I feel we can do unification if we reuse the WebGPU device for the "gpu" selection
… if WebNN supports NPU later we need a similar interface to surface errors

chai: I believe NPU will follow a different route; the NPU adapter is not handled by WebGPU because it is not a graphics adapter

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics

Maybe present: anssik, Chai, Dwayne, jsbell, RafaelCintron, Reilly

All speakers: anssik, Chai, Dwayne, Joshua_Lochner, jsbell, ningxin_hu, RafaelCintron, Reilly

Active on IRC: anssik, chai, dom, jsbell, ningxin_hu, RafaelCintron, reillyg