W3C

– DRAFT –
WebML WG Teleconference – 10 March 2022

10 March 2022

Attendees

Present
Anssi_Kostiainen, Chai_Chaoweeraprasit, Dominique_Hazael-Massieux, Ganesan_Ramalingam, James_Fletcher, Jonathan_Bingham, Ningxin_Hu, Rafael_Cintron, Wan_Xiaojian
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik, dom

Meeting minutes

Security considerations - last call for review

issue #241: General Security Questions

PR #251: Update Security Considerations per review feedback

All security-tracker issues

Op metadata that helps avoid implementation mistakes (issue #243)

Anssi: PR #251 addresses most questions of #241, but doesn't address #243
… propose we leave that for later

dom: happy to review PR #251

dom: +1 to leave #243 for later

Ethical considerations update

UPDATE Ethical Web Machine Learning: https://lists.w3.org/Archives/Public/www-archive/2022Mar/att-0002/0310_W3C_Ethical_Web_ML_Update.pdf

Slideset: https://lists.w3.org/Archives/Public/www-archive/2022Mar/att-0001/0310_W3C_Ethical_Web_ML_Update.pdf

James: we conducted an internal / external literature review, which led to the writing of a draft consultation document
… not complete, but enough for people to engage and react with
… please take a look at the document and bring comments
… it contains material that may or may not end up in the final WG note
… including the thinking process, background on ethics and ML
… still incomplete and work in progress
… A summary of the process: I looked at existing principles rather than developing our own
… we want these principles to be universal given the reach of the Web
… align with W3C values & principles
… which led to recommending the use of the UNESCO values & principles - with more justification in the doc
… UNESCO has a set of 4 values and 10 principles, developed through very wide review and approval, globally
… confirmed their fitness through meta-analysis about completeness and focus
… We're looking for feedback on the process, and where it led to in terms of proposed principles
… this is leading to the next phase where we want to hear from experts and stakeholders
… The principles are very high level - one challenge is how to turn them into practices, which the document wants to tackle
… We're running group sessions early April to kickstart that process
… between principles and risks/mitigations, there may be guidance that elaborates on the principles with more context and detail, and is more specific to the W3C context
… e.g. mapping to the W3C TAG ethical principles, which include "autonomy" or "decentralization" principles that don't emerge in UNESCO
… We'll look to synthesize this into a single guidance per principle, shorter than what we've extracted and specific to the W3C context
… that guidance would then lead to risks & mitigations
… The document also presents case studies that illustrate typical issues in ML ethics
… Re risks & mitigations, the doc will contain high level considerations, not yet to the level of individual specs
… the document has an example of a possible risk & mitigation to illustrate this
… We'll update this version of the document by March 21st, including guidance, feedback received (incl. on issues & case studies), and high-level risks & mitigations
… The week of April 4 we will run group review & brainstorm sessions to feed risks & mitigations
… and we're targeting April 21st as the time to approve this as a WG Note

Draft consultation document

dom: thanks for turning our plans into this document

Dom: THANKS!

anssi: we'll want participants from this group to be involved in the live session; got lots of interest from other W3C groups, e.g. "horizontal" groups

Graph execution methods used in different threading models: immediate, async, queued

issue #230: Should WebNN support async APIs?

anssi: we started discussing this in issue #230
… Chai produced 2 alternative designs up for review

PR #255: Define graph execution methods used in different threading models

PR #257: Context-based graph execution methods for different threading models

Anssi: this is a substantial change to the API - I want us to get it right

Chai: the two pull requests are both trying to do the same thing; I recommend starting with #255 - #257 builds on top of it
… the core change in #255 is trying to add several execution methods that the user can use to execute the compiled graph
… and based on the requirements we collected over the past few months, there are 3 ways people want to use WebNN
… the summary in #255 describes what we want to do: we want to enable:
… 1- immediate execution from the calling thread, where you wait until the result is available in the output buffer - a blocking call
… this is the simplest way to execute a graph
… this is needed in scenarios where it runs on the CPU
… 2- the second method allows async execution with a promise, so as not to block the UI thread
… 3- the 3rd method is specific to WebGPU; with WebGPU you can buffer commands before they get executed in order
… sync or async doesn't help here, you wouldn't get a deterministic execution
… the key difference is that it doesn't run the graph, it records the commands in the command buffer, and leaves it to the caller to execute the buffered commands
… in terms of API shape, in #255, I tried to not change too much of the existing API - 90% of the API is GraphBuilder
… because the execution methods have a strong dependency on the kind of context they're created for, I tried to separate the various modes of execution into execution interfaces
… MLExecution, MLAwaitedExecution, and MLCommandEncoder
… the latter name is directly inspired from the WebGPU spec
… on #255, Dom & Ningxin pointed out that having the MLContext be the thing that calls the execution makes more sense
… a separate Execution interface makes it harder to see the dependency
… #257 addresses that
… it builds on #255 - it no longer has a separate Execution interface, but instead they become methods in MLContext
… with a compute method and a computeAsync method
… the runtime dependency is still there - if you try to use compute to execute on the GPU context, it's not allowed
… The caller that tries to execute the graph should know a lot about the context - we're not supporting a mode where the context is created independently from the execution
… when it comes to WebGPU, I chose to use a createCommandEncoder method that creates an MLCommandEncoder with an interface consistent with the WebGPU command queue
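
To make the proposed shape concrete, here is a rough JavaScript sketch of the three execution paths as described for PR #257 in these minutes; `mlContext`, `graph`, `inputs` and `outputs` are assumed to already exist, and the exact method signatures and dictionary shapes may differ from what eventually lands.

```javascript
// Sketch only: follows the shape described for PR #257 in this discussion.
// Assumes `mlContext` is an MLContext created for the desired device,
// `graph` is a compiled MLGraph, and `inputs`/`outputs` map operand names
// to pre-allocated buffers; all names here are illustrative, not normative.

// 1. Immediate execution: blocks the calling thread until the results are
//    written into `outputs` (the simplest path, aimed at CPU contexts).
mlContext.compute(graph, inputs, outputs);

// 2. Async execution: returns a promise so the UI thread is not blocked.
await mlContext.computeAsync(graph, inputs, outputs);

// 3. WebGPU interop: nothing executes here; this only creates a command
//    encoder that records work for the caller to submit later (see the
//    MLCommandEncoder sketch further below).
const mlEncoder = mlContext.createCommandEncoder();
```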

<Zakim> anssik, you wanted to ask if this interop method suggests any changes to the WebGPU API, whether we should seek explicit WebGPU WG review

anssik: re the MLCommandEncoder command buffer, does it require any change to WebGPU?
… should we seek explicit WebGPU WG review on the proposal?

chai: it doesn't require any change to WebGPU


RafaelCintron: re the command encoder - initialize and dispatch take a Graph; initialize should be called only once
… could initialize be a constructor for an MLCommandEncoder?
… what happens if someone calls dispatch with a different graph than the one used to initialize

chai: this is similar to what ningxin asked on #255
… initializeGraph records the commands we need to initialize the graph; it's not initializing the encoder
… if you put in the constructor, it would be misleading
… in many systems that we know, before you want to process the model, you want to pre-process the weights
… e.g. on GPUs and on some NPUs
… at the driver level, when you have the weights, they want the opportunity to process them and cache them in the driver in their layout format
… passing the weights in initializeGraph, the command encoder will record a copy into the GPU buffer
… that will send this down to the GPU driver with a flag that some systems would use to indicate the opportunity to initialize them at least once
… the actual commands get dispatched when the inference happens

RafaelCintron: what happens if you initialize multiple times?

Chai: wouldn't be efficient, but wouldn't fail

RafaelCintron: what about multiple dispatch?

Chai: the encoder is reusable, it doesn't carry state
… compared to compute, it's a lower level API, matching the WebGPU approach
… compute could be implemented on top of it

RafaelCintron: what if you initialize a graph with some input, and then dispatch it with other input?

Chai: they're different inputs - only constant weights for initialize
… this preprocessing step matches the approach taken by several low-level model APIs (incl. DirectML)
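
As an illustration of the recording pattern Chai describes, a hedged sketch of the WebGPU path follows: initializeGraph and dispatch come from the PR discussion above, while finish() and the queue.submit() call are assumptions borrowed from the usual WebGPU command-encoder flow.

```javascript
// Sketch only: assumes `mlContext` was created from a GPUDevice `gpuDevice`,
// and that `graph`, `inputs` and `outputs` already exist. Everything beyond
// initializeGraph/dispatch is an assumption modeled on WebGPU conventions.
const encoder = mlContext.createCommandEncoder();

// Record the one-time pre-processing of the graph's constant weights, so the
// driver gets a chance to convert and cache them in its preferred layout.
encoder.initializeGraph(graph);

// Record the inference itself; per the discussion, these are the regular
// (non-constant) inputs, distinct from the weights handled above.
encoder.dispatch(graph, inputs, outputs);

// Nothing has run yet: the caller decides when the buffered commands execute
// by submitting them on the WebGPU queue, interleaved with other GPU work.
gpuDevice.queue.submit([encoder.finish()]);
```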

anssi: does the spec talk about these 2 inputs being different?

chai: feedback welcomed in the PR, which could use more explanation in places

anssi: maybe name them differently to help improve the ergonomics

chai: would also like a section with a sample with WebGPU usage
… but probably done in a separate PR

dom: thanks for this piece of work!
… I prefer #257 over #255, the API shape is explained better there
… still not sure about the CPU-only compute() method, but will comment on the PR
… Anssi raised a question about the WebGPU intersection, we will need the WebGPU WG to chime in
… we need WebGPU WG review for the intersection; there was a GH thread that pointed out some gaps, and this PR might start to address those

https://github.com/gpuweb/gpuweb/issues/2500

anssi: not requiring changes to WebGPU is definitely a big +

dom: we have Rafael as a bridge between the WebML and WebGPU WGs
… if someone from WebGPU can give us a reliable review on this PR, let's check with them

Anssi: would Bryan be able to give a WebGPU-angled review of this PR?

Ningxin: +1 to ask Bryan, in addition to Rafael's review
… Also thanks again to Chai - very significant contribution
… the PR brings both sync/async, and integration with WebGPU
… If we were to interact with the WebGPU people, we would want to highlight the latter - MLCommandEncoder
… the discussion on constant weights reminds me of an open comment I made on the first PR
… we have 2 surfaces to upload constants / weights for a context built on a WebGPU device
… the MLContext via GPUBuffer; with MLCommandEncoder, this gives another path to provide the weights
… Do we need to remove the constant method for the GPUBuffer binding, and move that to the initialize graph method?
… if we do so, the graph building code gives 2 different paths for graph building based on different contexts, builder vs initialize graph
… this isn't ideal; can we find a way to combine them?
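
To illustrate the overlap Ningxin describes, here is a purely hypothetical comparison of the two surfaces for handing over constant weights on a WebGPU-backed context; both the GPUBuffer-binding form of the builder's constant method and a weights argument on initializeGraph are assumptions drawn from this discussion, not settled API.

```javascript
// Hypothetical only: two possible places to provide constant weights, as
// discussed above; neither signature is normative.

// Path A: bind a GPUBuffer as a constant while building the graph.
const filter = builder.constant(filterDesc, filterGpuBuffer);

// Path B: hand the weights over when recording graph initialization, so the
// command encoder records the copy into GPU buffers at that point instead.
encoder.initializeGraph(graph, namedConstantBuffers);
```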

chai: I understand that feedback; let's iterate on the PR
… re integration with WebGPU, a lot of these ideas came from Bryan

chai: my original idea was to have the ML path in WebGPU - but it creates a hard dependency on the WebGPU spec & implementation

anssi: I'm hearing #257 as the PR to continue with
… thanks for the good progress
… summarizing: reviews expected on #257, including looping in people from WebGPU (Bryan, RafaelCintron)
… and maybe later seek review from the broader WebGPU WG (possibly after the PR has landed)


Minutes manually created (not a transcript), formatted by scribe.perl version 185 (Thu Dec 2 18:51:55 2021 UTC).

Diagnostics

Succeeded: s/Topic: Graph execution methods used in different threading models: immediate, async, queued//

Succeeded: s/consunltation/consultation/

Failed: i|James: we:|Slideset: https://lists.w3.org/Archives/Public/www-archive/2022Mar/att-0001/0310_W3C_Ethical_Web_ML_Update.pdf

Succeeded: s/Brian/Bryan

Succeeded: s/Chair/Chai

Failed: i|James: we conducted an internal / external literature review, which led to the writing of a draft consunltation document|->UPDATE Ethical Web Machine Learning https://lists.w3.org/Archives/Public/www-archive/2022Mar/att-0002/0310_W3C_Ethical_Web_ML_Update.pdf

Succeeded: i|James: we conducted an internal / external literature review, which led to the writing of a draft consultation document|->UPDATE Ethical Web Machine Learning https://lists.w3.org/Archives/Public/www-archive/2022Mar/att-0002/0310_W3C_Ethical_Web_ML_Update.pdf

Maybe present: Anssi, anssik, Chai, dom, James, Ningxin, RafaelCintron