W3C

– DRAFT –
WebML WG Teleconference – 16 February 2023

16 February 2023

Attendees

Present
Anssi_Kostiainen, Chai_Chaoweeraprasit, Ningxin_Hu, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik, dom

Meeting minutes

Repository: webmachinelearning/webnn

WebNN API open PRs and issues

Simplify MLContext creation

anssik: PR #322 was discussed extensively on our 2 February 2023 call

<ghurlbot> Pull Request 322 Simplify MLContext creation (wchao1115)

[minutes] WebML WG Teleconference – 2 February 2023

anssik: following the call we reached an agreement that this PR should be highlighted
… in the upcoming CR "status of this document" section
… in other words, we agreed not to block the initial CR with this PR
… this SOTD update is in PR #340, approved and ready to merge when we commence with the CR publication

<ghurlbot> Pull Request 340 Add Status of this document note for Candidate Recommendation (anssiko)

anssik: I'd like us to use this call to discuss the other topics we deferred from our last call

Rework the sync and async algorithms

anssik: issue #316 and PR #329

<ghurlbot> Issue 316 Review sync vs async compute differences (zolkis)

<ghurlbot> Pull Request 329 Rework the sync async algorithms based on #323 (zolkis)

anssik: issue description:
… - Inputs and their validation are the same in both, so they can be factored out
… - The async steps currently are not really asynchronous.
… - There is only one difference between the sync and async steps (excluding promise-related): the async version compares the byte length of |outputTensor| with the length of the output descriptor corresponding to |key|, whereas the sync version compares the same with the byte length of |value|.

anssik: the proposed PR changes:
… - Factor out graph input/output validation
… - Factor out graph execution algorithm
… - Use them in the sync and async algorithm steps
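
A rough TypeScript-flavored sketch of the factoring proposed above; the type and helper names are illustrative, not the spec's:

  type NamedBuffers = Record<string, ArrayBufferView>;
  type NamedDescriptors = Record<string, { byteLength: number }>;

  // Shared validation: every named buffer must correspond to a graph descriptor
  // and its byte length must equal the descriptor's byte length.
  function validateResources(buffers: NamedBuffers, descriptors: NamedDescriptors): void {
    for (const [name, view] of Object.entries(buffers)) {
      const descriptor = descriptors[name];
      if (!descriptor) throw new TypeError(`unknown resource "${name}"`);
      if (view.byteLength !== descriptor.byteLength)
        throw new TypeError(`byte length mismatch for "${name}"`);
    }
  }

  // Shared execution steps, used by both entry points.
  function executeGraph(inputs: NamedBuffers, outputs: NamedBuffers): void {
    // dispatch to the underlying platform and fill `outputs`
  }

  // Sync steps: validate, then execute on the calling thread.
  function computeSync(inputs: NamedBuffers, outputs: NamedBuffers,
                       inputDescs: NamedDescriptors, outputDescs: NamedDescriptors): void {
    validateResources(inputs, inputDescs);
    validateResources(outputs, outputDescs);
    executeGraph(inputs, outputs);
  }

  // Async steps: the same validation, with execution happening "in parallel".
  async function compute(inputs: NamedBuffers, outputs: NamedBuffers,
                         inputDescs: NamedDescriptors, outputDescs: NamedDescriptors): Promise<void> {
    validateResources(inputs, inputDescs);
    validateResources(outputs, outputDescs);
    await Promise.resolve().then(() => executeGraph(inputs, outputs));
  }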

zkis: good summary, adapted per Ningxin's changes, would like to get a re-review from Ningxin
… after the changes it was more complicated to specify; even if the diff looks complex, it is not so different from the previous version
… we cannot review it on this call, it needs to be read slowly

<ningxin_hu> i'll take a look at this PR, thanks Zoltan!

zkis: no hurry with this PR actually, it can live in parallel with other higher-priority PRs
… this is pretty similar to what Ningxin did, validation steps are just not repeated, the minimal specification principle is applied in this PR
… asserts in the text have no implication on implementations, consider them notes in the spec per Infra spec conventions

anssik: we want to get this in the initial CR?

zkis: yes, prefer that

<zkis> https://infra.spec.whatwg.org/#assertions

Add internal slots to MLOperand and MLActivation

anssik: issue #336 and PR #337

<ghurlbot> Issue 336 Add internal slots to MLOperand, MLActivation and basic algorithms (zolkis)

<ghurlbot> Pull Request 337 Add internal slots to MLOperand and MLActivation (zolkis)

zkis: this is WIP, need to decide a few things
… algos that are polymorphic: in those we need to internally construct the operand, which needs internal or explicit constructor steps; this affects all algos, e.g. clamp(), which has its own issue and PR
… missing references were mentioned
… all the PRs depend on the MLOperand and MLActivation PR
… do we need a constructor for MLOperand, or is it only created by a builder?
… for MLActivation there is no explanation of how to use this construct, or what it is exactly

chai: MLActivation and MLOperand are trivial, these interfaces can be trivially constructed; MLActivation is a placeholder, we don't want to define enums for all of them
… we decided to have a separate interface for that
… we don't want a separate interface for each activation; it is a placeholder, the implementation can keep a name, and it exists to be able to defer construction
… the caller will create new activations; similar feedback applies to MLOperand
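
A brief usage sketch of the deferred construction Chai describes, assuming the createContext()/MLGraphBuilder API of this draft (descriptor details are approximate):

  // assumes a browser exposing the WebNN API on navigator.ml
  const ml = (navigator as any).ml;
  const context = await ml.createContext();
  const builder = new MLGraphBuilder(context);

  const input = builder.input('x', { type: 'float32', dimensions: [1, 3, 224, 224] });
  const weights = new Float32Array(32 * 3 * 3 * 3);
  const filter = builder.constant({ type: 'float32', dimensions: [32, 3, 3, 3] }, weights);

  // relu() with no operand returns an MLActivation: a named placeholder, not a graph node.
  const relu = builder.relu();

  // The activation is only realized when it is fused into an op that accepts it.
  const conv = builder.conv2d(input, filter, { activation: relu });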

ningxin_hu: for MLOperand and MLActivation we need an internal concept, not a public interface, for a node inside the computational graph
… the API returns an operand; in our programming model this is data flowing through the graph
… nodes consume operands; in my implementation I repurposed MLOperand for a node inside a graph, and an activation can be fused into it
… MLActivation is connected through MLOperand
… with Chai's change we introduce MLActivation, used for fused activation from the user's POV
… for the spec, we need a node or operator concept to describe the algorithm steps, e.g. in clamp(), how are the input operand and output operand connected together?

zkis: if we can define those meanings at the spec level it'd be nice
… currently there is only one internal slot, which is its own name
… should we have input and output internal slots?

ningxin_hu: no, we don't need that I think
… the underlying implementation connects MLOperands; let's see how you write the algorithm steps, e.g. in clamp(), and from those we can work out what internal slots are required to satisfy the needs of the algorithm steps

zkis: we'll see in that PR how to do this properly
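
Purely for illustration, one way the internal concept ningxin_hu describes could be modeled; none of these names are in the spec:

  // An operand records the operator that produced it; an operator records its
  // input operands and any fused activation. Hypothetical names throughout.
  interface OperatorNode {
    kind: string;                  // e.g. 'clamp', 'conv2d'
    inputs: OperandNode[];
    activation?: OperatorNode;     // fused MLActivation, if any
  }

  interface OperandNode {
    producer: OperatorNode | null; // null for builder.input() and constants
  }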

The clamp() algorithm

anssik: issue #347 and PR #348

<ghurlbot> Pull Request 348 [WiP] Add the clamp() algorithm (zolkis)

<ghurlbot> Issue 347 Add the clamp() algorithm (zolkis)

<zkis> https://github.com/webmachinelearning/webnn/pull/348/files

zkis: please take a look at this PR and the steps there
… I tried two different polymorphic versions depending on the first argument

ningxin_hu: my question: you mention it is a polymorphic version, and you try to handle both in this one set of algorithm steps
… my understanding is, we need to have two algorithms: 1) clamp takes an input operand (no activation involved), 2) clamp returns an activation
… probably you need an internal slot, or say "underlying implementation", that realizes the min and max values
… there are two different algorithms to write for clamp
… in the Chromium implementation we have two implementations for this

<chai> +1

zkis: in the JS bindings we need to have a single set of algorithmic steps
… I can change these steps, will do the same thing as with batchNorm
… will clarify that the implementation owns the operation
… I'm not aware of how I can make two algorithms here, you can do that in C++ but not in JS bindings
… it would help to see examples of how this is done in other specs
… usually we switch on the type of an object passed as an argument
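
A sketch of the single-entry-point view zkis describes, using stand-in types rather than the real interfaces: one set of steps that branches on whether an input operand was passed.

  type ClampOptions = { minValue?: number; maxValue?: number };
  class Operand { }                           // stand-in for MLOperand
  class Activation {
    constructor(readonly name: string, readonly options: ClampOptions) {}  // stand-in for MLActivation
  }

  function clamp(first?: Operand | ClampOptions, options: ClampOptions = {}): Operand | Activation {
    if (first instanceof Operand) {
      // 1) clamp(input, options): create a clamp node consuming `input`,
      //    return a new output operand with the same shape as the input.
      return new Operand();
    }
    // 2) clamp(options): defer construction; just record min/max for later fusion.
    return new Activation('clamp', first ?? options);
  }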

chai: process question, do we need to add all these implementation notes on all the ops?
… I hear we've discussed clamp(), which is a trivial op; we have more complex ops, and if we need to explain implementation notes for all of these it takes forever

zkis: only steps that you can see in this PR, param handling, return values, what we request the platform to do
… conv is more complex

chai: imagine convolution and gemm and friends, very complex to define at this level

zkis: input and output handling need to be clarified, how do we want the implementation to be called on this
… a lot of libraries could do the underlying work

chai: I understand validation is important
… the convolution input validation layer will be much more complex
… need to consider alignments and all that
… need to show you the code we do in DML to explain this
… it is very very tricky, in practice implementations of these ops are not going to do all this, they defer to the platform APIs
… e.g. CoreML may fail with improper arguments; it is unlikely that the browser implementation will do all this itself
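
As an example of the kind of arithmetic those validation and shape rules involve, here is the usual convolution output-size computation for one spatial dimension (a sketch, not spec text):

  function convOutputSize(inputSize: number, filterSize: number, stride: number,
                          dilation: number, padBegin: number, padEnd: number): number {
    const effectiveFilterSize = (filterSize - 1) * dilation + 1;
    const outputSize =
      Math.floor((inputSize - effectiveFilterSize + padBegin + padEnd) / stride) + 1;
    if (!(outputSize > 0)) throw new TypeError('invalid conv2d argument combination');
    return outputSize;
  }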

zkis: we'll still need to add a lot of formal text
… Domenic mentioned we should delineate normative from informative text clearly
… MVP: exceptions, success path, handling input/output
… we could also clarify what we mean by axis in this spec, do we use the TF definition?

chai: for convolution etc. we try to define the overlap between them, the popular ones
… these rules, how the inputs need to line up is complicated
… we don't just go with TF, a lot of how different FWs do it is copy paste from others that came before them

zkis: I try to lift the boilerplate work from you editors and try to do it in a minimal way
… I think the dependency PRs should be merged soon

referring to the "Add internal slots to MLOperand and MLActivation" PR #337

<ghurlbot> Pull Request 337 Add internal slots to MLOperand and MLActivation (zolkis)

ningxin_hu: a question: should we put shape inference steps in the spec?
… for clamp(), when we make an MLOperand and return it to the user code, you define MLOperand with internal slots; how to set the dimensions for the output operand would require shape inference
… there is some output shape calculation formula written by Chai
… do we need to translate that into algorithmic steps?

zkis: for clamp() I use a trivial constructor for the operand
… we can factor out the shape formula

ningxin_hu: in the MLOperand PR there is no internal slot for that

zkis: I came upon that need in the clamp() PR; will add that, thanks!
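
A minimal sketch of the shape-inference step being discussed: for an element-wise op like clamp() the output descriptor is simply a copy of the input's (illustrative names; conv-like ops need a real formula).

  type Descriptor = { type: string; dimensions: number[] };

  function inferClampOutputShape(input: Descriptor): Descriptor {
    return { type: input.type, dimensions: [...input.dimensions] };
  }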

Improve batchNorm

anssik: issue #334 and PR #339

<ghurlbot> Pull Request 339 [WiP] Fix #334: Improve the batch norm algorithm (zolkis)

<ghurlbot> Issue 334 Improve batchNorm (zolkis)

anssik: issue description:
… clarify batchNorm description, in particular clarify how "axis" is used
… validate the inputs (dimensions etc.)
… add an algorithm which "outsources" running the actual op, but informatively describes what's expected
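
A hedged sketch of the input validation the second point asks for, using illustrative names: mean/variance (and scale/bias, when present) should be 1-D with size equal to the input's dimension at axis.

  type OperandDesc = { dimensions: number[] };

  function validateBatchNormInputs(input: OperandDesc, mean: OperandDesc, variance: OperandDesc,
                                   axis = 1, scale?: OperandDesc, bias?: OperandDesc): void {
    if (axis < 0 || axis >= input.dimensions.length)
      throw new TypeError('axis is out of range for the input');
    const expected = input.dimensions[axis];
    for (const operand of [mean, variance, scale, bias]) {
      if (!operand) continue;
      if (operand.dimensions.length !== 1 || operand.dimensions[0] !== expected)
        throw new TypeError('expected a 1-D operand of size input.dimensions[axis]');
    }
  }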

anssik: the proposed PR is WIP but welcomes review for the parts defined
… this depends on PR #337

<ghurlbot> Pull Request 337 Add internal slots to MLOperand and MLActivation (zolkis)

<zkis> https://github.com/webmachinelearning/webnn/pull/339/files

ningxin_hu: I recall we discussed this before, want to hear from Chai
… in this batchNorm PR we say "issue ... for the underlying platform"
… I'd like to get clarification: should we do that, or skip it in the builder methods, for any ops?

Chai: the builder should be cheap; what goes into the builder should be only input validation
… I had some conversation with another browser vendor to see how they feel about implementing some of this
… it aligned with our belief that when implementers look at the spec they see the compilation step as being the big step
… compilation has to have all the information
… the build method is the time when they process everything
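
An illustrative sketch (not spec text) of the division of labour Chai describes: builder methods only validate arguments and record nodes, while build() is where the whole graph is handed to the platform and compiled.

  // hypothetical stand-in for the expensive platform compilation step
  async function compileOnPlatform(nodes: object[], outputs: Record<string, object>): Promise<object> {
    return { nodes, outputs };
  }

  class SketchBuilder {
    private nodes: object[] = [];

    clamp(input: object, options: object = {}): object {
      // cheap: validate arguments, record the node, hand back an operand handle
      this.nodes.push({ op: 'clamp', input, options });
      return {};
    }

    async build(outputs: Record<string, object>): Promise<object> {
      // expensive: full graph validation, then compilation by the platform
      return compileOnPlatform(this.nodes, outputs);
    }
  }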

zkis: do we need to split what we have in the PR right now, to just record the structure?
… no exec-time steps for batchNorm and others?
… exec steps just name ops as text, such as an enum, with no description of how to use axes and such

Chai: no different from other ops, just construct the graph, I don't think you can provide impl guidance for the build method, it is different
… how e.g. Apple might implement the method could be different

zkis: what are the interop elements we can provide here for the API users?
… if everything is deferred to the implementation, interop is challenging

Chai: the Web API is an interface that provides guidance to implementations, but cannot guarantee that is actually how it is implemented in detail
… documenting input validation at graph validation time we can do
… even that is not going to be thorough

zkis: please help me get the steps right for this op and I'll make the others following the blueprint we formulate for this op

<zkis> https://github.com/webmachinelearning/webnn/pull/339/files

zkis: I got the feedback; the essence of it is that we need to be very light

Chai: blueprint for these ops is some input validation, that'd be helpful
… that can be used when we explain what happens when one builds a graph
… you cannot be fully confident until you compile the graph
… I have no idea how to write these steps for the build method

zkis: I'll try to figure it out, I got your guidance

Chai: I don't think you can accurately describe the implementation steps
… for build you have to pick one way to do it, but that one way may not be correct for other ways of doing it, it is an implementation detail
… it would help if you can explain compile and compute in a very simple way

zkis: we can explain where impl specific optimization can take place

ningxin_hu: another piece of feedback, for input validation, e.g. batchNorm, why not translate the declarative text we currently have in place?
… e.g. the second input param is the mean; you can check whether it is 1D, otherwise throw an exception
… if we cannot validate like this, we push validation to the build method, for graph-build-time validation by the implementation
… batchNorm method could do input validation translating the current declarative text

zkis: we need to be clear on what happens at build phase, compute phase

ningxin_hu: in the build phase, we need to make sure the graph constructed with the builder methods, its architecture/topology and the attributes of nodes, are validated before feeding it into the native implementation
… other things are done by the native implementations

zkis: feel free to comment on anything in these PRs
… if I can make you both happy we're probably on the right track

Minutes manually created (not a transcript), formatted by scribe.perl version 210 (Wed Jan 11 19:21:32 2023 UTC).
