W3C

– DRAFT –
WebML WG Teleconference – 2 March 2023

02 March 2023

Attendees

Present
Anssi_Kostiainen, Chai_Chaoweeraprasit, Ningxin_Hu, Rafael_Cintron, Zoltan_Kis
Regrets
Dominique_Hazael-Massieux
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

Call for Consensus: WebNN API Candidate Recommendation

anssik: Call for Consensus to publish the Web Neural Network API as a Candidate Recommendation (CR) was issued 23 Feb 2023
… this is a W3C process step to check the WG is fine with the plan.

CfC to publish WebNN API Candidate Recommendation - review by 2 Mar 2023

anssik: no concerns were raised for the proposal, meaning we can move forward with this publication. We will integrate #340, which adds the Status of this Document note for the Candidate Recommendation, prior to publication.

<ghurlbot> Pull Request 340 Add Status of this document note for Candidate Recommendation (anssiko)

anssik: I'm aware we have many enhancements in flight, and thus I will work with Dom to ensure our spec CI/CD system is ready to allow us to continue making timely publications post CR.
… As a W3C process detail, post initial CR we can do the following types of publication:
… (Return to) Working Draft
… (A revised) Candidate Recommendation Snapshot
… (A revised) Candidate Recommendation Draft
… (Advance to) Proposed Recommendation
… as WG participants you don't need to worry too much about these process details
… I'll take care of them with Dom and keep you informed so you can focus on the important technical work on our plate
… the key diff between CR Snapshot and CR Draft is that the Snapshot adds patent protection.
… these CR Snapshots should not be published more often than once every 6 months
… so I expect we'll publish multiple WDs for smaller changes (e.g. editorial enhancements) post initial CR
… and when we make significant changes (e.g. add new features) we publish one or more Candidate Recommendation Drafts.
… And when we gather additional implementation experience we ultimately advance to Proposed Recommendation. That milestone is, however, further out.
… The requirement for Proposed Recommendation is to "have at least two independent implementations of every feature defined in the specification".
… with that, I'm happy to resolve the CfC to publish WebNN API Candidate Recommendation.

RafaelCintron: what is the process for changing the spec after the CR lands?

anssik: we can do WD, revised CR

RafaelCintron: if we later decide that WebGPU interop needs to change I hope we can change it

anssik: we can change the WebGPU interop parts post CR

<zkis> For what CR means, see https://www.w3.org/2004/02/Process-20040205/tr.html

ningxin_hu: how do the CR and the implementation feedback and experience play with each other? Some open issues depend on implementation experience.
… how do this implementation feedback and the CR link together?

anssik: CR is "call for implementations"

zkis: CR may mean use cases have been stabilized

<ningxin_hu> your explanation of CR is very helpful, thanks Anssi!

chai: thanks for clarifying CR status, "call for implementations" is a very clear explanation
… in the SOTD PR we augment the note, once we enter CR state we should focus on resolving the issues described in that PR
… we may add a few more ops per impl feedback, it sounds like those are fully allowed

anssik: adding new ops is allowed after the initial CR

https://www.w3.org/TR/webnn/

https://webmachinelearning.github.io/webnn/

<chai> +1

<Zakim> zkis, you wanted to ask about spec versioning, like 1.0, 2.0 etc. See also https://www.w3.org/2005/05/tr-versions

<zkis> For versioning specs, see https://www.w3.org/2005/05/tr-versions

zkis: I pasted the link to the versioning discussion; Anssi mentioned we'll make CR Drafts with breaking changes
… do we want to stick with the TR version being referenced in implementations?

anssik: let's try to avoid versioning if possible
… "Living Standards"

https://www.w3.org/TR/2023/WD-webnn-20230124/

https://www.w3.org/standards/history/webnn

<chai> +1

<ningxin_hu> +1

RESOLUTION: CfC to publish WebNN API Candidate Recommendation passes.

WebNN API feedback at the GPU Web F2F

anssik: WebNN API was on the agenda at the GPU Web F2F (2023-02-16/17).
… thanks to Rafael, Chai, Ningxin for presenting WebNN API to the WebGPU audience.
… there's some feedback recorded in the Member-only minutes. It'd be great if the minutes could be made public.

GPU Web F2F minutes (2023-02-16/17) [Member-only]
… I think what I can say now in the public space is that the feedback is encouraging considering multi-implementer interest signals.
… On behalf of the WebML WG I can say we're committed to working with the WebGPU folks, and I look forward to closer collaboration when the WebGPU folks have available bandwidth.
… Rafael, Chai, Ningxin, do you want to share your high level takeaways?

chai: Ningxin and I joined for 20 minutes at the end of the day; I think the feedback for the WebNN API was positive
… I actually put some of that feedback in issue #350

<ghurlbot> #350

chai: the belief is the WebNN API as specced is implementable on Core ML
… they like the fact we didn't insist on fighting the format wars; they also like that the API is focused on what matters the most, connecting apps to the underlying platform, with the WebNN API as a backend interface
… in terms of implementation technicalities, much focus on Neural Engine, the nature of their platform is such that they could partition the network graph to run on different processors, a single graph with part of it run on CPU or GPU and part on NPU
… the current spec, which has an explicit device type, is not the way they think it should work
… they'd like to be able to choose which part of the graph runs where; in the pending PR the core premise is to remove the GPU option from the device type and let all GPU processing be done through the WebGPU device path
… scoping this to one code path leaves the default context to whatever the implementer wants to implement
… on Windows, it is important the implementation is clear on whether it is on CPU or on something the system needs to take care of
… if we define the default to be CPU we cannot swing back and forth on Windows; on Apple platforms this may be different, in that they may change the default to actually use some other processing unit
… the core feedback is the explicit device type may not work well with Apple's architecture, secondly, Core ML works on textures, WebNN on textures and buffers
… DML only works on buffers, and you need to tensorize it
… this is the diff between Core ML and DML
… this does not prevent an implementation from working, but the difference in implementation is interesting

RafaelCintron: I think Chai covered the high points
… the textures and buffers thing is on the interop side of things; what interops between WebNN and WebGPU needs to be a texture
… good to see validation with the choice to not do ML model formats
… when you import a texture to WebGPU you need to make sure any writes to the texture are seen by WebGPU when it reads it; that means it is difficult to have an API that freely passes textures and buffers between WebNN and WebGPU
… that's pretty much it
… Mozilla person seemed to be happy with the WebNN API

ningxin_hu: two cents from me: from a use case perspective, I shared the real-time video processing GPU pipeline use case from the WebRTC group with the WebGPU folks there
… this was well received, also interest in Neural Engine co-work with ML, texture sharing
… questions on how WebNN can handle multiple device timelines, captured in issue #350 I believe

<ghurlbot> Issue 350 Need to understand how WebNN supports implementation that involves multiple devices and timelines (wchao1115) question

ningxin_hu: we have an open issue to investigate GPU and NN accelerator co-work; we can work on this together. This use case was interesting, as was the AI and CPU co-work scenario
… Apple discussed a chopping concept for running a graph across multiple devices
… excluding a GPU use case was mentioned by Apple; did you, Chai, consider that in your open PR?

chai: that open PR just consolidates the GPU code path; it does not tackle that specific feedback explicitly
… in Core ML they allow running the whole graph on the GPU; on older systems there was no Neural Engine

Chai: a successful hybrid execution on Large Language Models is yet to be seen
… they can move things around quite a bit when running the model on Core ML; they can check which macOS version is running and apply heuristics per that information
… comments were made from the interface point of view, their impl would allow them to run WebNN graph on CPU or GPU entirely

in PR #322 we still allow a full GPU implementation and also allow mix and match in the case of the default context

<ghurlbot> Pull Request 322 Simplify MLContext creation (wchao1115)

WebNN API open PRs and issues

anssik: As usual, we'll do a review of open PRs and discuss issues, and identify and fast-track any priority changes that should get onto the initial CR release train.

Make axis definition of concat and split consistent

anssik: issue #345

<ghurlbot> Issue 345 Use unsigned long for axis of concat operation (miaobin)

anssik: PR #352

<ghurlbot> Pull Request 352 Make axis definition of concat and split consistent (huningxin)

anssik: issue summary: The current axis definitions of concat and split are inconsistent.
… PR summary: fixes this inconsistency by aligning the valid value range [-N, N) and negative value interpretation.
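
An illustrative sketch (not the spec's algorithm) of what the aligned axis rule means in practice: a valid axis lies in [-N, N) for a rank-N input, and a negative axis is interpreted as axis + N. The function name is hypothetical.

```python
def normalize_axis(axis: int, rank: int) -> int:
    """Map a possibly-negative axis into [0, rank), per the [-N, N) rule."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} out of range for rank {rank}")
    return axis + rank if axis < 0 else axis

# For a rank-3 tensor, axis -1 refers to the last dimension.
print(normalize_axis(-1, 3))  # 2
print(normalize_axis(0, 3))   # 0
```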

ningxin_hu: Jiawei would like to propose an unsigned int axis type
… that makes sense I think, would like to hear the WG's view on that
… also raised a point the spec should make all axis definitions consistent across the spec

<chai> +1 on axis consistency

ningxin_hu: I put a table to help review axis usage across the spec
… please review the PR keeping in mind the open questions in the issue; check the issue too
… I'll update the PR per your feedback

Use static padding values for pad operation

anssik: issue #354

<ghurlbot> Issue 354 Use static padding values for `pad` operation (huningxin)

anssik: PR #355

<ghurlbot> Pull Request 355 Use static padding values for `pad` operation (huningxin)

anssik: issue summary: The current WebNN pad operation declares the padding parameter as an MLOperand whose values may be dynamic at runtime
… dynamic padding values are not widely supported by native ML APIs, which may lead to complex implementations if a native ML API only supports static padding values
… PR summary: Use static padding values for pad operation

ningxin_hu: I incorporated Zoltan's naming suggestion into this PR, PR ready for review

Propose to add Parametric ReLU into the operation list

anssik: issue #356

<ghurlbot> Issue 356 Propose to add Parametric ReLU into the operation list (huningxin)

anssik: PR #357

<ghurlbot> Pull Request 357 Add prelu (Parametric ReLU) into the operation list (huningxin)

anssik: issue summary: PRelu is a widely used activation function with mainstream use cases such as facial landmark detection, and has an efficient implementation path with native ML APIs
… PR summary: adds a new prelu() method
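
A sketch of the elementwise PRelu formula (output = x when x >= 0, else slope * x); broadcasting of the slope operand, which the spec supports, is omitted here for brevity.

```python
def prelu(x, slope):
    """Elementwise Parametric ReLU over equal-length sequences."""
    return [xi if xi >= 0 else s * xi for xi, s in zip(x, slope)]

print(prelu([-2.0, 0.0, 3.0], [0.1, 0.1, 0.1]))  # [-0.2, 0.0, 3.0]
```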

ningxin_hu: all good, no further comments on that PR

<chai> 🎉

zkis: I'd like to advance the WebNN editorial enhancements into the spec

<ningxin_hu> thanks zoltan!

Summary of resolutions

  1. CfC to publish WebNN API Candidate Recommendation passes.
Minutes manually created (not a transcript), formatted by scribe.perl version 210 (Wed Jan 11 19:21:32 2023 UTC).

Diagnostics

Succeeded: s/a CR drafts/CR drafts

Succeeded: s/WebGPU on/WebNN on

Succeeded: s/not/no

Succeeded: s/#332/#322

Maybe present: anssik, chai, RafaelCintron, zkis

All speakers: anssik, chai, ningxin_hu, RafaelCintron, zkis

Active on IRC: anssik, chai, ningxin_hu, RafaelCintron, zkis