W3C

– DRAFT –
WebML CG Teleconference – 29 April 2021

29 April 2021

Attendees

Present
Anssi_Kostiainen, Chai_Chaoweeraprasit, Dominique_Hazael-Massieux, Ganesan_Ramalingam, Geun-Hyung, Jonathan_Bingham, Ping_Yu, Rafael_Cintron, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik, dom

Meeting minutes

Web Machine Learning WG launched

anssik: thanks to your contributions over the past two-plus years, and a supportive review from the W3C Advisory Committee, W3C launched the Web Machine Learning Working Group last week -- congrats y'all!

Call for Participation: Web Machine Learning Working Group

Announcement blog post

anssik: your actions: click the "Instructions for joining the Web Machine Learning Working Group" link and follow the instructions to join the WG:

Instructions for joining the Web Machine Learning Working Group

anssik: some of you have already joined.
… your AC rep will be notified of your request, will initiate any internal reviews, and will then complete your joining
… internal reviews might take some time, so we'll wait for a critical mass of key contributors to join before we hand over the WebNN API to the WG; meanwhile we'll keep advancing the API in this CG

Dom: thanks to the Community Group for getting to the Working Group! Nice job!
… I put a few proposed topics to discuss on the agenda:

anssik: 1) Discuss how this Community Group and the new Working Group will work together
… 2) Discuss how we plan to take the Web Neural Network API to First Public Working Draft and beyond
… 3) Discuss the new deliverable with the working title "Ethical Impact of Machine Learning on the Web"

anssik: starting with (1), the short answer is that Dom and I will take care of the transition aspects so you can focus on the technical work. In short, the transition should be almost transparent to you. The spec will adopt the official W3C spec format, which changes some visual aspects.

Dom: as we transition to a WG, there's more attention to the work from the broader community
… we may need to adjust some logistics, e.g. rescheduling the meeting time

anssik: as for (2), we're on a path to publish the WebNN API First Public Working Draft (aka FPWD) during Q2 2021
… given this is the first formal publication, no "previous maturity level" exists and many requirements do not apply. Later stages add more requirements, especially when transitioning to Candidate or Proposed Recommendation.
… given we've been proactive and initiated both TAG and PING reviews, addressing those issues by the time we publish the FPWD would be a great demonstration of early wide review
… questions about the WebNN API First Public Working Draft?

Dom: before the WG takes over the CG spec, the CG should publish a formal CG report and make the final licensing request?

anssik: about (3), per AC review comments, the WG charter added a new deliverable with the working title "Ethical Impact of Machine Learning on the Web"
… the charter says: "The Working Group will develop a Working Group Note documenting ethical issues associated with using Machine Learning on the Web, to help identify what mitigations its normative specifications should take into account."

anssik: I propose we create a repo for this deliverable. I'm volunteering to bootstrap the work on it and welcome your contributions. I was planning to structure the doc around the following high-level topics, each documenting the potential concerns:
… 1) Bias - how AI systems may be unfair or have harmful impacts on certain groups of people due to social differences
… 2) Privacy - an obvious example is face recognition
… 3) Explainability and Transparency - how data is gathered and how algorithms are trained and tuned; web tech can be part of the solution by helping visualize models, but deep learning networks are not very explainable due to their complexity
… repo name proposal: "ethical-webmachinelearning"?

Dom: thanks for the early proposal; structurally it would be useful to identify whether there are well-known taxonomies of ethical risks we could build upon
… or communities we could ask for input as we work on this document
… similar to accessibility, internationalization, etc.
… as a way to anchor existing work into this framework

<Geun-Hyung> +1 anssik

Operation-specific APIs proposal

Operation-specific APIs proposal

anssik: continuing to refine work-in-progress features; A LOT of good discussion is going on in PR #162

WebAssembly op level execution use case and resource upload and download

Issue #156

PR #162

anssik: PR #162 is a substantial one, with 50+ comments and deep design discussions going on in it. Thanks to Chai, Ningxin, Ping, Rama and others for their work on this.

Chai: there are too many topics in this PR, it should maybe be split
… two parts: (1) device selection and (2) data download/upload and layout conversion
… the first part is almost at consensus, except for naming

Device preference naming

Chai: device preference naming suggestions include "software"/"hardware", or "cpu"/"gpu"/"accelerator"/"custom", etc.

Chai: two sub-issues: naming, and whether or not we want to name the accelerator
… in my PR the enum is "default", "cpu", "gpu", "accelerator"

… accelerator is the hardest to define; CPU and GPU are well-defined
… if the device is defined as CPU you know to use ArrayBuffer; GPU maps to WebGL and WebGPU
… accelerator has not been defined; e.g. DirectX may decide to map its API to an accelerator, then you could use the WebGL contract to use that accelerator underneath, say an Intel VPU
… the WebGL term is a bit weird in this context
… we already have an enum to control the power profile; one can still create a selection policy that maps to an accelerator
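
[For illustration, a minimal TypeScript sketch of the device preference Chai describes; the option and type names here are assumptions for illustration, and the enum values are exactly the ones still under debate in PR #162.]

```typescript
// Hypothetical sketch only: the names below mirror the PR #162 discussion
// and are not final; "accelerator" is the value whose meaning is still open.
type MLDevicePreference = "default" | "cpu" | "gpu" | "accelerator";
type MLPowerPreference = "default" | "low-power" | "high-performance";

interface MLContextOptions {
  devicePreference?: MLDevicePreference; // assumed option name
  powerPreference?: MLPowerPreference;   // the existing power-profile enum Chai mentions
}

// Illustrative use: prefer the GPU, otherwise fall back to the implementation default.
const options: MLContextOptions = { devicePreference: "gpu" };
```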

Zoltan: wanted to note Kubernetes struggles with the same problem
… used labels and selectors instead
… a vendor-independent way to map to particular device types

<ping_yu> from usability point of view, it would be nice for the developer to query the availability of devices

[querying available devices tends to get serious frowns from a privacy perspective]

<zkis> https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
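
[To make Zoltan's Kubernetes analogy concrete, a purely hypothetical label/selector-style sketch; none of these names exist in WebNN.]

```typescript
// Purely hypothetical: a vendor-independent label/selector instead of a fixed enum,
// in the spirit of Kubernetes labels; the keys and values are illustrative only.
interface MLDeviceSelector {
  [label: string]: string; // e.g. "compute-class": "npu"
}

const selector: MLDeviceSelector = {
  "compute-class": "accelerator", // abstract device class, not a vendor name
  "power-profile": "low-power",
};
```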

Web Platform Design Principles > Naming principles

anssik: could we spin naming into its own issue?

Chai: we could do that, and remove the accelerator from the PR for now

<ping_yu> need to step away a bit

Ningxin: agree with Chai that naming is still open; for this issue, device selection, the major open issue to close is to make sure resource upload/download is effective
… if we select GPU then developers should know they're using a GPU resource

RafaelCintron: the WebGPU CG has been tackling this as recently as its last meeting
… they have requestAdapter, which accepts filters; one of them is powerPreference
… recently they've been struggling with: if I'm building an app that I know will run badly on a software adapter, how do I filter that out?
… adapters can be plural, there can be multiple, and there's a proposed attribute that tells whether an adapter is software
… generally supportive of this; Apple has some concerns about this design in the WebGPU discussion, citing privacy reasons
… enumerating and filtering is needed for compute-only devices in WebGPU in particular

Dom: privacy challenge noted by Rafael is an important and recurrent theme with device selection and enumeration

<RafaelCintron> WebGPU PR #1660: Pluralize requestAdapters() and add GPUAdapter.isSoftware (https://github.com/gpuweb/gpuweb/pull/1660)

<RafaelCintron> WebGPU Issue: Add GPURequestAdapterOptions.allowSoftware (https://github.com/gpuweb/gpuweb/pull/1634)
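
[For reference, a short sketch of the adapter selection Rafael describes, using the single requestAdapter() call with a powerPreference hint as shipped at the time; the pluralized requestAdapters() and the isSoftware attribute were still proposals in the PRs linked above, so they are not shown.]

```typescript
// Single-adapter request as shipped at the time; powerPreference is an existing
// GPURequestAdapterOptions member. Type declarations assume @webgpu/types.
async function pickHighPerformanceAdapter(): Promise<GPUAdapter | null> {
  if (!("gpu" in navigator)) return null; // WebGPU not available
  return navigator.gpu.requestAdapter({ powerPreference: "high-performance" });
}
```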

Dom: privacy and fingerprinting also impacted the WebRTC API, which had to refactor its enumeration API
… a word of caution: it is not just about developer experience, so any decision in this space needs to be careful and balanced against DX

Zoltan: I agree we should have a separate discussion; when we speak about privacy we should keep the threat mitigation discussion concrete
… having a "CPU" is not a privacy issue

+1 to zoltan

Behavior of MLTensor for the pre-allocated outputs scenario

ningxin_hu: I think this discussion moves to a later one with Chai, Rama, Anssi

<ningxin_hu> https://github.com/webmachinelearning/webnn/pull/162#discussion_r613729314

ningxin_hu: the major open question for this comment: before this PR we had resource binding for inputs and outputs
… compute() would layout-convert the output data into an ArrayBuffer; with this PR Chai introduced MLTensor with a sync data() method to help developers
… this created some open questions with the resource binding mechanism
… i.e. if you have ArrayBuffer binding and another API with async binding, it seems complex
… I made a proposal to separate MLTensor from GPU resource binding
… MLTensor seems like an upload source and download target

<zkis> Another link on kubernetes (to the previous topic): https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

ningxin_hu: MLTensor would then not have any binding to ArrayBuffer as before; it'd satisfy the different device type requirements from the table in this PR
… it would make the spec self-contained
… a GPU resource binding mechanism was also proposed; I tend to propose using the previous mechanism, like we do for MLInput and MLOutput binding, which would work well for GPU resources
… GPU resources also have their own WebGL and WebGPU APIs for upload/download, so it looks like we can use those APIs instead of introducing a new, similar mechanism in the WebNN API
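
[A rough TypeScript sketch of the two binding shapes being compared; both interfaces are approximations of the PR #162 discussion, not actual spec text.]

```typescript
// Approximation of the pre-PR binding style: the caller pre-allocates the output
// buffer and compute() writes (and layout-converts) into it.
interface MLOutputLike {
  buffer?: ArrayBufferView; // CPU binding; a GPU-resource variant was also discussed
}

// Approximation of the MLTensor idea from the PR: execution returns an opaque
// tensor and the caller pulls the data out only when it needs it.
interface MLTensorLike {
  data(): ArrayBufferView; // sync download + layout conversion on demand
}
```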

RafaelCintron: question about MLTensor proposal: at what point is the ArrayBuffer copied into MLTensor?

<chai> (need to step out for a few mins)

<chai> (back now)

ningxin_hu: not specified yet(?), discussion focused on the download part now

Chai: this seems to be the same issue; Ningxin's explanation was a great example of the complexities of data upload/download with different resource types
… my opinion is that much of the discussion in this PR is around this complexity
… the caller should be able to manage data download outside the WebNN API; leave that to the framework
… the more we try to abstract this, the more edge cases you'll find
… the caller may not be the one we assume; that's a major discussion point in this PR
… the main goal of WebNN should be: I'm binding this resource to the WebNN API
… once the output resource is bound to the graph and it is executed, it is up to the caller
… the caller knows which type of resource it deals with and can do overlapping upload and download
… you may want to upload the next frame in parallel and let the caller manage overlapping download/upload, e.g. on different threads
… there's still confusion around it, I'm sure
… I hear Rafael asking about data upload; it is the same problem, synchronization of resources and timing
… the more we abstract, the more challenging it gets
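
[A hypothetical sketch of the caller-managed overlap Chai describes: downloading the previous frame's output while the next frame's upload is in flight. The helper names are placeholders, not WebNN API.]

```typescript
// Hypothetical helpers; only the overlap pattern matters here.
declare function uploadInput(frame: Float32Array): Promise<void>;
declare function runGraph(): Promise<void>;
declare function downloadOutput(): Promise<Float32Array>;

async function processTwoFrames(a: Float32Array, b: Float32Array): Promise<Float32Array[]> {
  await uploadInput(a);
  await runGraph();
  // Overlap: start downloading a's output while b's upload is in flight.
  const [outputA] = await Promise.all([downloadOutput(), uploadInput(b)]);
  await runGraph();
  const outputB = await downloadOutput();
  return [outputA, outputB];
}
```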

Rafael: leave it to the user? How does that change the API?

Chai: we may not need MLTensor as it is in the PR now?
… the async call can also be eliminated; we'd go back to the earlier design before this PR

Rafael: how about weights?

Chai: weights are operands

Rafael: values of the weights?

Chai: whatever is given to the model, DirectML running in the backend needs to negotiate with the driver; in 10 out of 10 cases the driver will need to convert the format into a native layout
… that cannot happen in userspace
… bind the weights to the model; when compiling the graph, all the weights are compiled to a native format

RafaelCintron: what is the data type for weights web devs give?

Chai: initially an ArrayBuffer, mapped to CPU
… the caller can interact with WebGPU; if you take the model and upload everything to the GPU, then the weights are already in GPU memory
… WebNN should just offer the means to bind the resources
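
[A minimal sketch of what "weights are operands" means in practice, assuming a constant()-style builder method as discussed in the WebNN explainer at the time; the exact names and shapes may differ.]

```typescript
// Assumed shapes, for illustration only; the real builder method may differ.
interface MLOperand {}
interface MLGraphBuilderLike {
  constant(desc: { type: "float32"; dimensions: number[] },
           value: Float32Array): MLOperand;
}

// Weights arrive as a plain Float32Array (i.e. CPU memory backed by an
// ArrayBuffer); the implementation converts them to the native layout once,
// at graph compile time, rather than on every execution.
function bindWeights(builder: MLGraphBuilderLike,
                     dimensions: number[],
                     weights: Float32Array): MLOperand {
  return builder.constant({ type: "float32", dimensions }, weights);
}
```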

ningxin_hu: I'm hearing Chai suggest going back to the previous MLInput and MLOutput for resource binding
… I agree this is good for GPU resources, but for CPU we lack the means to do sync data conversion from an internal layout to a standard layout
… with ArrayBuffer we need to download the data every time; the major reason we want to introduce MLTensor is to have a call that lets the framework manage upload/download
… this would mean the use case of Wasm with multiple graphs would be inefficient, with multiple data layout conversions happening
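
[To illustrate Ningxin's concern with hypothetical helpers: when several small graphs are chained from Wasm, forcing every intermediate result through an ArrayBuffer implies extra layout conversions per hop, whereas an opaque tensor handoff would avoid them.]

```typescript
// Hypothetical helpers; only the number of layout conversions matters here.
declare function computeToArrayBuffer(graph: string, input: ArrayBufferView): Float32Array;
declare function computeToTensor(graph: string, input: unknown): unknown;

// ArrayBuffer-only binding: the intermediate result is converted out of the
// native layout by graphA and back into it again for graphB.
function chainWithArrayBuffers(input: Float32Array): Float32Array {
  const mid = computeToArrayBuffer("graphA", input);
  return computeToArrayBuffer("graphB", mid);
}

// Opaque-tensor handoff: the intermediate never leaves the native layout.
function chainWithTensors(input: Float32Array): unknown {
  const mid = computeToTensor("graphA", input);
  return computeToTensor("graphB", mid);
}
```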

Ping: I hear what Chai says, and Ningxin's point
… the API should not be too complicated
… the underlying format differs from what users can provide
… without giving the user a promise of high-performance execution, it defeats the purpose of this API
… even for GPU, our execution could use different formats, vastly different from what users provide
… the formats vary; having users provide the binding is a performance issue, even if it is cleaner from an API perspective
… usually from a graph execution point of view it seems good to have an input and an output, but now people chain a lot of small graphs and want to detect the boundary; ML is not one single graph anymore, but multiple interconnected models

Chai: just quickly, to move forward we need to tease apart a path for these two processes
… 1) data upload/download, independent of the layout format
… 2) layout formats
… native HW layouts exist and we understand the perf implications; the requirement is not about when data is copied but when it is converted
… by teasing these processes apart we can reach flexibility

anssik: Chai, are you proposing to finish the device selection part?

<ningxin_hu> +1

Chai: separate issue for data UL/DL and another for layout conversion

ningxin_hu: +1 Chai's resolution

proposed RESOLUTION: Open new issues for 1) data UL/DL 2) layout format conversions discussions

proposed RESOLUTION: Open new issues for 1) data UL/DL 2) layout format conversions discussions, and finish PR #162 addressing device selection

<chai> +1

<ningxin_hu> +1

<RafaelCintron> +1

Resolution: Open new issues for 1) data UL/DL 2) layout format conversion discussions, and finish PR #162 addressing device selection

Adjourn

[TAG guidance on device capabilities discovery & selection https://w3ctag.github.io/design-principles/#device-enumeration ]

Summary of resolutions

  1. Open new issues for 1) data UL/DL 2) layout format conversion discussions, and finish PR #162 addressing device selection
Minutes manually created (not a transcript), formatted by scribe.perl version 131 (Sat Apr 24 15:23:43 2021 UTC).
