Meeting minutes
Repository: webmachinelearning/webnn
Proposed Charter 2023-2025 under W3C Advisory Committee review
anssik: Please make sure your Advisory Committee representative brings input (and ideally support) on the review.
… You'll find your AC rep contact details at https://
… The deadline for responses is 03:59 UTC 5 April 2023
… or 23:59, Boston time on 4 April 2023
dom: nothing to add; it is important to get support from as many members as possible
WebNN - WebIDL and Infra standard conventions
anssik: we have aligned the core parts of the API with WebIDL conventions to satisfy CR criteria. These are documented as "Done" in webmachinelearning/
… we have editorial enhancements that have been waiting for the CR publication to happen
… because we want to keep the spec in a consistent and cohesive state for the CR publication before we start landing these
… we will discuss 2 of those enhancements today, Zoltan has worked on these, much thanks, Ningxin reviewed
Sync and async algorithms
anssik: issue #316
<ghurlbot> Issue 316 Review sync vs async compute differences (zolkis) Editorial
anssik: PR #329
<ghurlbot> Pull Request 329 Rework the sync async algorithms based on #323 (zolkis)
zoltan: a 2nd take on #323
<ghurlbot> Pull Request 323 [closed] Transfer the input and output views for asynchronous execution (huningxin)
zoltan: updated based on ningxin's changes, factoring out the common part of sync & async executions
… e.g. now we have validate and execute graph steps, referenced from both sync & async methods
… it helps avoid repetition - mostly editorial
… but would like clarity on whether to merge it now to avoid having too many branches in parallel
… to me, it can be merged for our CR release
… would like chai to take a look so that we can merge it soon
<ningxin_hu> webmachinelearning/
ningxin_hu: +1 to merging it for CR - it's an editorial improvement that also fixes #341
<ghurlbot> Issue 341 Should validate MLGraph.[[context]] in MLContext.compute() and MLContext.computeSync() steps (huningxin) question
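A minimal sketch of the factoring described in PR #329, with the [[context]] check from issue #341 shown as one of the shared validation steps. The step names validateGraphResources and executeGraph are illustrative assumptions, not the spec's actual algorithm names:

```js
// Illustrative only, not the spec's algorithm text.
function validateGraphResources(context, graph, inputs, outputs) {
  // Issue #341: the graph's [[context]] must be the context being invoked.
  if (graph.context !== context) throw new TypeError();
  // ... validate input/output names, shapes and types against the graph ...
}

function executeGraph(graph, inputs, outputs) {
  // ... dispatch the compiled graph to the platform backend ...
}

// Both entry points now reference the same shared steps:
function computeSync(context, graph, inputs, outputs) {
  validateGraphResources(context, graph, inputs, outputs);
  executeGraph(graph, inputs, outputs);
}

async function compute(context, graph, inputs, outputs) {
  validateGraphResources(context, graph, inputs, outputs);
  // transfer the views (PR #323), run executeGraph in parallel,
  // then resolve the promise with fresh views
}
```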
chai: I'll take a look at it, this week or next
RafaelCintron: what happens if the Web developer calls the async method passing ArrayBuffers, and changes the values of an ArrayBuffer before the compute promise resolves?
zoltan: that's handled by the transfer-input and transfer-output steps which ningxin fixed
ningxin: right - we've fixed that race condition while avoiding copy overhead in #323
<ghurlbot> Pull Request 323 [closed] Transfer the input and output views for asynchronous execution (huningxin)
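To illustrate the race Rafael asked about: a rough sketch of the post-#323 behaviour, assuming an existing MLContext context and a built MLGraph graph, and the compute() shape of the spec at the time (named ArrayBuffer views in, a promise resolving with fresh views out):

```js
const input = new Float32Array([1, 2, 3, 4]);
const output = new Float32Array(4);

const promise = context.compute(graph, { x: input }, { y: output });

// compute() transferred (detached) the views' backing buffers, so this write
// cannot race with the execution in progress; it simply has no effect:
input[0] = 42;

// The promise resolves with new views over the transferred buffers:
const { outputs } = await promise;
console.log(outputs.y);
```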
MLOperand and MLActivation internal slots
anssik: issue #336
<ghurlbot> Issue 336 Add internal slots to MLOperand, MLActivation and basic algorithms (zolkis) Editorial
anssik: PR #337
<ghurlbot> Pull Request 337 Add internal slots to MLOperand and MLActivation (zolkis)
zoltan: this is to help fix the lack of clarity around MLActivation (née MLOperator)
… #337 uses internal slots for the usual attributes, but also to link to the operator
… I was also experimenting with describing constructors, but removed it based on Ningxin's feedback
… Ningxin reviewed the PR, other feedback welcome
… MLActivation also has an implementation internal slot that will be needed for other algorithm improvements to come in other PRs
… there may be additional changes needed as we improve the algorithms of other functions
… but we can do this iteratively
… this PR is a prerequisite to start these iterations
ningxin: could we have separate PRs for internal slots for MLOperand and MLActivation?
zoltan: we could - right now they're separate commits
… (but further amendments aren't)
… I think they belong to the same PR because they're needed as a combination to iterate
Ningxin: I need to take another look now that you've incorporated my feedback
zoltan: no matter what, I'll need an MLActivation internal slot; but I think the current PR should be consistent with your feedback
… let's see what Ningxin says - if there are still open issues on MLActivation, I'll split the PR
<ningxin_hu> sgtm
zoltan: but if not, then we could land it as one
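To make the internal-slot discussion concrete, here is a sketch of how an implementation might mirror the slots PR #337 adds; the field names are approximations drawn from the discussion, not the PR's exact slot list:

```js
// Hypothetical mirror of the spec's internal slots; names are illustrative.
class MLOperandImpl {
  constructor(builder, descriptor) {
    this.builder = builder;       // the MLGraphBuilder that created it
    this.descriptor = descriptor; // "[[descriptor]]": dataType and dimensions
    this.operator = null;         // "[[operator]]": the operator producing it
  }
}

class MLActivationImpl {
  constructor(builder, name) {
    this.builder = builder;
    this.name = name;             // e.g. "relu"
    this.implementation = null;   // "[[implementation]]": to be filled in by
                                  // the algorithm improvements in follow-up PRs
  }
}
```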
WebNN - enhancements, editorials, questions
Simplify the operand layout support of conv2d and pooling 2d operations
anssik: issue #324
<ghurlbot> Issue 324 Simplify the operand layout support of conv2d and pooling 2d operations (huningxin) enhancement
anssik: "In the existing WebNN spec, conv2d supports two input operand layouts defined by MLInputOperandLayout and four filter operand layouts defined by MLConv2dFilterOperandLayout."
ningxin_hu: this emerged from implementation feedback
… not all OSes support all layouts
… which forces implementations to emulate the unsupported layouts
… it's mostly a matter of inserting a transpose operation in the underlying implementation
… but that adds overhead in the graph compute
… when the transpose is applied to a constant, it is done in the build phase, but for the input tensor, it would have to happen at compute time
… limiting the supported input and filter layouts would let that overhead be handled by the frameworks
… we need the WG's input on which layouts to support
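For reference, the two layout knobs issue #324 proposes to simplify, as a developer sees them; a sketch assuming an MLGraphBuilder builder and existing input and filter operands, with the enum values quoted from the spec above:

```js
const conv = builder.conv2d(input, filter, {
  inputLayout: 'nhwc',   // MLInputOperandLayout: "nchw" | "nhwc"
  filterLayout: 'ohwi',  // MLConv2dFilterOperandLayout: "oihw" | "hwio" | "ohwi" | "ihwo"
});
```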
dom: is there any layout that is supported across all platforms?
ningxin: XNNPACK only supports nhwc, DirectML supports both
… other platforms need more investigation
anssi: is transpose implemented yet? how complete is it?
ningxin_hu: not implemented yet, but native APIs already have it and we could investigate it
anssi: great example of implementation feedback we're seeking
… it probably needs to stay open to gather input
chai: the layout only matters when the hardware prefers something, transposing to match the hardware layout loses performance
… ONNX took the position to stick with one layout and leaves the rest to the backend
… but here WebNN is the backend and has to pick a layout
Anssi: is this blocking implementation work, or can this be left open for a while?
ningxin: it isn't blocking - right now we're throwing an exception if the layout isn't supported
… there probably should be a better way of exposing that to developers
dom: is the throwing behaviour defined in the spec?
ningxin_hu: it probably needs to be defined in the algorithm steps
dom: I think it is more than an enhancement; some of Zoltan's changes are very specific, whereas here the layout is tied to hardware support, something we can address
ningxin: right, based on current spec, an implementation would have to support all layouts
… throwing is an implementation decision that doesn't match the spec at this point
chai: layout is not something you need to query the hardware about to understand what it supports
… it's at the framework level they would define the type of input layout they want
… once that input layout is translated in the backend, the backend has to adapt to the "historical format preference"
… when WebNN compiles the graph (in the build step), that's the time it needs to query the driver for its preference (or anti-preference)
… in some cases, it's only a preference, in some cases a requirement
… the backend can resolve this by inserting a transpose, or manipulating the stride using jump reads instead of block reads
… it depends on the performance characteristics
… a transpose may waste time compared to a stride since it forces a copy
… sometimes it's worth it if there are 50 layers of convolutions with a layout preference
… WebNN as a backend has to support both layouts
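A sketch of the transpose-based emulation described above, assuming an MLGraphBuilder builder, an nchw input operand nchwInput, and a filter already in a supported layout; the permutations are the standard nchw/nhwc axis reorders:

```js
// Emulate inputLayout "nchw" on an nhwc-only backend:
const nhwcInput = builder.transpose(nchwInput, { permutation: [0, 2, 3, 1] });
const nhwcOutput = builder.conv2d(nhwcInput, filter, { inputLayout: 'nhwc' });
// Transpose back so the caller still observes nchw:
const nchwOutput = builder.transpose(nhwcOutput, { permutation: [0, 3, 1, 2] });
```

For a constant this folds away at build time; for a graph input it runs on every compute, which is the overhead under discussion.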
anssi: so I'm hearing WebNN needs to be liberal in what it accepts
Chai: yes, WebNN needs to be flexible in the formats it can handle
… the implementation will make it happen with the info from the driver
… the implementation cannot fail on unsupported formats, it needs to carry on
… if it can't be done, it should fail at the compile stage