W3C

– DRAFT –
WebML CG Teleconference – 15 April 2021


Attendees

Present
Anssi_Kostiainen, Chai_Chaoweeraprasit, Ganesan_Ramalingam, Jonathan_Bingham, Ningxin_Hu, Ping_Yu, Rafael_Cintron, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Web Machine Learning WG

anssik: I wanted to give you an update on where we're at with the Web Machine Learning Working Group and its charter review

Changes to the proposed Web Machine Learning Working Group charter

Updated WG charter

WG charter changes diff

anssik: first would like to thank everyone who reviewed the charter and provided feedback and support signals.

anssik: let me summarize the key events over the course of the last 1.5 months:
… W3C Advisory Committee has now reviewed the proposed WG charter
… based on that review, we incorporated changes in the charter before its final approval by the W3C Director TimBL, launch expected next week
… summary of changes:
… 1) aligned the language around the scope of the WebNN API with the CG
… 2) Model Loader API has been clarified as depending on the availability of a well-defined standard format for Machine Learning models
… 3) coordination with WebRTC WG and Ecma TC39 added
… 4) coordination with the W3C TAG on Ethical Web Principles added
… 5) new deliverable "Ethical Impact of Machine Learning on the Web" added, a W3C Note
… Please contact dom at w3.org if any of these changes feel problematic, before April 16.

anssik: any questions?

Privacy review feedback normative changes

[privacy-tracker] Make WebNN API a policy-controlled feature (issue #145)

Add Permissions Policy Integration and initial createContext() hooks (PR #159)

anssik: we discussed this on our previous call and there was unanimous support
… so I crafted PR #159 that adds Permissions Policy integration
… I also added initial steps for the createContext() method to allow integration of the "allowed to use" check
… I believe Chai may want to base some of his work on top of this?

Resolution: Merge PR #159 to add Permissions Policy Integration and initial createContext() hooks

anssik: any comments?
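As background for readers unfamiliar with policy-controlled features, a minimal sketch of what such a gate could look like; the feature name "webnn" and the featurePolicy-based check are illustrative assumptions, not text from PR #159:

```javascript
// Illustrative only: a policy-controlled feature is denied when the
// document's permissions policy does not allow it; createContext() would
// then fail its "allowed to use" check. "webnn" as the feature name is an
// assumption for this sketch.
function createContextAllowed(doc) {
  return doc.featurePolicy ? doc.featurePolicy.allowsFeature('webnn') : true;
}

// Mock document objects to exercise both branches.
const allowedDoc = { featurePolicy: { allowsFeature: (f) => f === 'webnn' } };
const deniedDoc  = { featurePolicy: { allowsFeature: () => false } };

console.log(createContextAllowed(allowedDoc)); // true
console.log(createContextAllowed(deniedDoc));  // false
```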

Operation-specific APIs proposal

operation-specific APIs proposal by Ping and Jonathan

anssik: we've been refining the work-in-progress features that satisfy the requirements of the operation-specific APIs proposal

WebAssembly scenario of the op level execution use case

Support CPU - WebAssembly scenario of the op level execution use case (issue #156)

anssik: as for the WebAssembly scenario of the op level execution use case described in issue #156 by Ningxin
… there's PR #162 that in part enables efficient interop with WebAssembly, among other aspects

Add support for device selection and asynchronous data downloads for GPU readbacks (PR #162)

anssik: Chai, Ningxin you want to summarize your investigation and progress?

Chai: PR #162 should address the original issue of interop with Wasm, while allowing the result of the compute() method, now sync, to be downloaded on demand based on the TF.js use case described by Ningxin
… from a GPU point of view it should work as well, since it interops with the WebGPU context
… this PR is a "solution" for using WebNN as an op-level API

Ping: LGTM! Thanks Chai and Ningxin

Chai: thanks Ping and Ningxin!

<ningxin_hu> Thanks Chai for making this PR

Clarify at which point weights are used in the compilation

anssik: quoting Rafael: "wanted to say, one thing we need to clarify is at which point the web developer's weights are used in the compilation; if they give us an ArrayBuffer and change it in between, we need to specify when that work happens, and be more explicit about whether compilation means we copy things"
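The ambiguity Rafael raises can be illustrated with a plain-JS sketch, assuming copy-at-build semantics; this is one possible answer for illustration, not a decision of the group:

```javascript
// Illustrative only: if weights are passed as an ArrayBuffer view and the
// caller mutates it after graph construction, copy-at-build semantics make
// the outcome deterministic: the build sees a snapshot of the data.
function buildWithCopy(weights) {
  // Snapshot the caller's buffer at build time.
  return Float32Array.from(weights);
}

const w = new Float32Array([1, 2, 3]);
const snapshot = buildWithCopy(w);
w[0] = 99;                // caller mutates after build
console.log(snapshot[0]); // 1: the build saw the original data
```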

anssik: anything to discuss about this issue?
… do editors see aspects that need to be agreed with the group prior to advancing with this issue?

Ningxin: I'll work on a PR for this issue; there's no issue with ArrayBuffer and weights, since that was clarified at the previous meeting
… weights can also be provided via a GPU or GL buffer, so the open question is how weights provided via GPU resources are handled: are they copied in the compile and build steps?

Rafael: I haven't read through the new PR that Chai submitted
… re GPU buffers, as long as you don't get the computation as a promise, we're OK
… as long as there are promises there are data races
… or if you don't get the model compilation as a promise, we're OK too
… this is currently undefined; we need to define what happens while the promises are unresolved

<ningxin_hu> sounds good

Chai: I did not address that in the existing PRs, because this needs its own PR
… Rafael, addressing your issue is still in my queue

How the caller using an op-specific API can do resource upload and download

anssik: I believe PR #162 explains, in an informative example, how MLGraph.compute() invokes tasks on different timelines in parallel, and how JS code can do asynchronous data downloads
… I believe we still need to add normative algorithmic prose to specify the MLGraph.compute() steps in detail
… we also need to specify the steps for MLTensor.data(), i.e. the normative steps that happen before the promise returned by MLTensor.data() is resolved
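The pattern under discussion, a synchronous compute() with on-demand asynchronous readback, can be sketched with stand-in classes; MLTensor here is a mock for illustration, not the real WebNN interface:

```javascript
// Illustrative mock: compute() returns immediately while the backend work
// proceeds on its own timeline; data() resolves with the CPU copy later.
class MLTensor {
  constructor(work) { this._work = work; }
  async data() { return await this._work; } // asynchronous CPU download
}

function compute() {
  // Stand-in for backend execution on a separate timeline.
  const work = new Promise((resolve) =>
    setTimeout(() => resolve(new Float32Array([1, 2, 3])), 0));
  return new MLTensor(work);
}

const out = compute();     // synchronous call returns a tensor handle
out.data().then((buf) => {
  console.log(buf.length); // 3
});
```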

<Ping_Yu> one question is about WebGL inputs: texture sharing is not possible across contexts, are there APIs guarding against that?

Chai: about the upload and download, I made modifications in PR #162 on what happens when the call completes, in particular for uploads
… downloads should be added to the PR; I will add that to PR #162

Ningxin: what if MLTensor is created for a GPU buffer or texture?
… it has its own data download method; do we need MLTensor to handle that or delegate to the WebGPU API?

Chai: IIUC, question is how MLTensor is constructed? It can be constructed of any resource types we define

Ningxin: GPUBuffer has its own download/upload methods; do we use MLTensor.data to handle the download or leave that to the WebGPU API?

Chai: my assumption is this is handled by the implementer; these tasks happen in the backend in the browser engine
… the assumption is that it is good enough to defer this to the browser implementation
… the MLTensor interface encapsulates many resources, GPU and system memory
… it should work across resource types, otherwise you have to deal with different types of resources
… I'll take a look at the feedback in the PR, a good topic to look deeper into
… the concern is it gets too complicated to handle with an ever-expanding number of resource types

Ningxin: proposal to separate system resource type from GPU resource type

Ping: about data download, is this downloading data to the CPU?


Ping: you may want to get it from the CPU; there are users that want to access the data on the CPU and others that want a handle to the texture or buffer
… another question: WebGPU textures are not shared across contexts

Chai: the design of this PR is that the data method is CPU download only
… there's a separation between the output resource and the system memory resource that gets copied from
… the data method is meant to be a CPU download because CPU readback is expensive
… and there might be layout manipulation
… it targets CPU memory; for a GPU resource there's another method, currently "resource" for lack of a better word
… as for the second question, how the app deals with the edge between the WebNN and WebGPU APIs
… web developers do that themselves; if an additional download/upload needs to happen it is the responsibility of the JS framework
… related to how to deal with WebGPU download/upload, this is better left to the callers of WebNN
… making too many assumptions about how other APIs deal with these aspects would become a problem

<chai> (back now)

<Ping_Yu> sounds good

Chai: thanks Ping for your help with this work!

<Ping_Yu> will comment on the PR

TAG review feedback - open issues without associated PRs

anssik: Let's look at the remaining open TAG review issues and proposed resolutions, if any

Define a common term for logical tensor changes?

[tag-tracker] issue #150

anssik: no proposed resolution, yet?

Rafael: what does "tensor changes" mean in this context?

anssik: please ask in that issue :-)

Isomorphic JS story, worker scope exposure

[tag-tracker] issue #142

anssik: Ningxin shared that the webnn-polyfill CPU backend (which depends on the TF.js CPU backend) already works in Node.js for testing
… navigator can be emulated in Node, but a non-browser implementation could bind the ML object to whatever is appropriate for a given JS execution environment, as Ningxin suggested?

anssik: I wanted to point out that Deno (a JS and TypeScript runtime using V8) already has experimental support for the WebGPU API https://deno.land/posts/v1.8 and its GPU object hangs off the navigator object, similarly to the ML object for the WebNN API
… worker scope exposure would enable inference without blocking the main thread
… communication between worker(s) and the main thread is via the postMessage() API
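A scope-agnostic feature-detection sketch, assuming the ML entry point hangs off navigator in both Window and Worker scopes the way navigator.gpu does for WebGPU; this is illustrative, not spec text:

```javascript
// Illustrative only: globalThis.navigator resolves in both Window and
// Worker scopes; the null fallback covers runtimes without a navigator
// or without the ML entry point.
function getML() {
  const nav = globalThis.navigator;
  return (nav && nav.ml) || null;
}

const ml = getML();
console.log(ml === null
  ? 'WebNN not available in this runtime'
  : 'WebNN available');
```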

proposed RESOLUTION: Expose ML interface to Worker in addition to its default Window context

Rafael: I'd prefer to expose ML object elsewhere than in navigator

navigator.gpu

Rafael: maybe expose WebNN API in the same contexts as WebGPU is exposed

<RafaelCintron> https://gpuweb.github.io/gpuweb/#navigator-gpu

proposed RESOLUTION: Expose ML interface to Worker scope similarly to WebGPU API

Resolution: Expose ML interface to Worker scope similarly to WebGPU API

Ergonomics of the JS examples

[tag-tracker] issue #139

anssik: I've understood TAG is not fully satisfied with the positioning of the WebNN API as an API whose primary consumer is frameworks
… so I'm wondering whether we could explain in the WebNN API spec how the Model Loader API is expected to provide a more approachable API for web developers to use directly
… I think we have text like that in the explainer; maybe making that connection in the spec intro would help?

proposed RESOLUTION: Explain in the spec intro the rationale why the primary API consumer is a JS framework, note Model Loader API as a higher-level abstraction targeting web developers

Chai: there's nothing wrong with a web dev calling into the WebNN API, but we want the frameworks to take advantage of WebNN
… "API not for web devs" is not fully accurate, but the Model Loader API will be easier for web developers for sure

Resolution: Explain in the spec intro the rationale why the primary API consumer is a JS framework, note Model Loader API as a higher-level abstraction targeting web developers

String enum for activations

[tag-tracker] issue #138

anssik: I noted there's an active discussion going on in this issue, I suggest we let this discussion settle before we propose a resolution
… quoting Sangwhan/TAG: "The fact that hardware-accelerated implementations of common blocks can have limitations is an implementation detail that should not be ossified into the design of the API. Instead of baking the silicon limitations into the API, shouldn't it throw based on the limitations of the context?"
… all, please review the issue and provide your perspective

Chai: I'm concerned with the proposal from Sangwhan because the list of activation functions is ever growing
… there are 20+ but only a subset are relevant
… we might require implementers to support all the activations, or to succeed in some and fail in others
… either way is bad from the API implementer's point of view
… you want to strive for an API you really know can be implemented successfully
… otherwise it might be ignored
… we can already compose a network, see the gru section
… I see both sides of the discussion, but prefer not to leave the API open ended; knowing these are combinations that can fail, the issue is the user might not know when the API fails
… if the user cannot be confident the API works, the user won't use it
… I feel quite strongly about my position
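One point in the discussion, that activations can often be composed from existing primitives rather than enumerated, can be illustrated in plain JS; arrays stand in for tensors here, and this is not WebNN graph-builder code:

```javascript
// Illustrative only: a "relu" activation expressed as an elementwise max
// with zero, the kind of composition a graph builder could express with
// primitive ops instead of a dedicated enum value.
const relu = (xs) => xs.map((x) => Math.max(x, 0));

console.log(relu([-1, 0, 2])); // [ 0, 0, 2 ]
```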

anssik: clarifying the extensibility story is one possible approach

Adjourn

Summary of resolutions

  1. Merge PR #159 to add Permissions Policy Integration and initial createContext() hooks
  2. Expose ML interface to Worker scope similarly to WebGPU API
  3. Explain in the spec intro the rationale why the primary API consumer is a JS framework, note Model Loader API as a higher-level abstraction targeting web developers
Minutes manually created (not a transcript), formatted by scribe.perl version 127 (Wed Dec 30 17:39:58 2020 UTC).

Diagnostics

Succeeded: s/Ningixn/Ningxin

Succeeded: s/proposed RESOLUTION: Merge/RESOLUTION: Merge

Maybe present: anssik, Chai, Ningxin, Ping, Rafael