W3C

– DRAFT –
WebML WG Teleconference – 5 October 2023

05 October 2023

Attendees

Present
Anssi_Kostiainen, Deepti_Gandluri, Joshua_Bell, Joshua_Lochner, Ningxin_Hu, Rachel, Rachel_Yager, Rafael_Cintron, Vivek_Sekhar, Wanming_Lin
Regrets
Dominique_Hazael-Massieux
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

anssik: please welcome Deepti Gandluri from Google to the WG! She has worked on the WebAssembly implementation in V8.

Deepti: I also co-chair W3C Wasm CG

<Deepti> Thank you!

anssik: please also welcome Phillis Tang, also from Google, to the WG! She has shipped PWA desktop capabilities in Chrome and is one of Google's W3C WebApps WG reps

WebNN v2: Review op breakdown for proposed model targets

anssik: The WG has identified the following as its v2 model targets:
… Text-to-image: Stable Diffusion unet/VAE/text encoder
… Image segmentation: Segment Anything decoder
… Speech-to-text: Whisper Tiny
… Text-to-text generation (encoder-decoder): t5 and m2m100
… Text-generation (decoder-only): llama
… as discussed on our last call, we want to do an op breakdown to better understand what is common across these architectures to inform WebNN API v2 op priorities.
… Ningxin indicated he was working with Wanming on such an op breakdown. Wanming, would you like to share an update on this investigation?

anssik: issue #375

<gb> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]

webmachinelearning/webnn#375 (comment)

https://user-images.githubusercontent.com/3271201/270237911-3a204653-c8d4-4243-b2cd-6e4443240bf3.jpg

Wanming: transformer models contain dynamic input shapes, so ONNX RT Web enabled freeDimensionOverrides
… users are able to run dynamic-shape models in the WebNN EP
… ONNX RT supports constant folding, fusion, node elimination etc., in addition to graph optimization
… the optimized model is a static model; these optimizations are applied when the inference session is initialized, and the WebNN EP runs the optimized model
… comparing the optimized model with the dynamic-shape model, a number of ops are eliminated or fused
… the table provides a summary of this process
… in this table you see the op list and counts before and after optimization
… for each model it indicates how many ops there are; if 0, the ops are entirely eliminated
… e.g. Constant and Shape could be 100% eliminated
… in "op usage count of optimized models", zero means the op is not used at all
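
The tally behind such a table can be sketched mechanically. This is a minimal illustration only; the `{ opType }` node shape is an assumption for the example, not the ONNX Runtime graph representation:

```javascript
// Count operator usage in a flat node list, as in the before/after
// optimization columns of the table. Node shape { opType } is assumed
// for illustration; real ONNX graphs carry much more per node.
function opUsageCount(nodes) {
  const counts = new Map();
  for (const { opType } of nodes) {
    counts.set(opType, (counts.get(opType) ?? 0) + 1);
  }
  return counts;
}

// Before optimization: Shape and Constant nodes are present.
const before = opUsageCount([
  { opType: "Shape" },
  { opType: "Constant" },
  { opType: "MatMul" },
  { opType: "MatMul" },
]);

// After constant folding and shape inference, those nodes are eliminated.
const after = opUsageCount([{ opType: "MatMul" }, { opType: "MatMul" }]);
// before: Shape→1, Constant→1, MatMul→2; after: MatMul→2
```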

Vivek: thanks for this investigation, super useful, we've discussed this internally too
… one of the earlier comments in the issue was regarding an op comparison to TOSA and StableHLO
… could we look into how we align with those two?

anssik: maybe Googlers could contribute that data if we make the spreadsheet public?

Deepti: are there any docs on what v1 and v2 mean in this context?

<Deepti> thanks

anssik: just internal constructs for ourselves to reason about our work

ningxin_hu: have you mapped ONNX ops to what Jiawei proposed in the thread for transformer support?
… three transformer models noted there, did you map to those?

Wanming: not yet, some of those ops are implemented in terms of other ops

ningxin_hu: this could be a nice next step, have breakdown first, then see how they map into those proposals from Jiawei

<jsbell> at bottom of webmachinelearning/webnn#375 (comment)

<ningxin_hu> webmachinelearning/webnn#375 (comment)

ningxin_hu: for the TOSA and StableHLO mapping, there is a table by Jiawei that maps elements to these

<jsbell> at bottom of a very long comment

<Joshua_Lochner> https://github.com/webmachinelearning/webnn/issues/375#:~:text=TOSA/StableHLO%20Mappings

anssik: we could integrate that data from Dwayne to the table by Wanming
… and make the table collaboratively editable
… I propose those as our next steps, agreed?

[silence means agreement]

Security considerations

Computation control-flow attack based on weights / constants change

anssik: issue #443

<gb> Issue 443 Add security consideration for computation control-flow attack based on weights / constants change (by huningxin)

anssik: a question raised whether a computation control-flow attack would be possible using weights and constants change
… from the comments:
… - WebNN currently only accepts static graphs
… - WebNN does not support control flow operations (a difference from Wasm/WGSL)
… implementation-specific concerns how memory region is shared between:
… - the more privileged process calling the graph processing functions
… - the untrusted renderer process exposing the WebNN API
… suggests compromised renderer process is the concern, not abuse of the WebNN API via JS
… a reasonable course of action would be to expand the Security Considerations:
… - document operations susceptible to out-of-bounds access as guidance to implementers
… Ningxin, what is your latest thinking?

ningxin_hu: Alex asked this question during the Chromium code review, linked from #443
… this is a valid question related to implementation: any multi-process implementation will use shared memory between the privileged and unprivileged processes for transferring weights
… the argument is that an implementation of WebNN could allow a compromised renderer process to change weights during computation, affecting control-flow behaviour
… how weights are shared between processes, and the relationship between graph compilation and build, are not defined in the WebNN API spec and are currently considered implementation details
… normative language would help define how to mitigate this
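
The copy-based mitigation that comes up in the discussion can be modeled in isolation. This is an illustrative sketch only, not the Chromium implementation: snapshot the weight buffers before compilation so a later mutation of renderer-shared memory cannot alter the constants the graph was compiled with.

```javascript
// Illustrative model of the copy mitigation (not the Chromium code):
// take an independent copy of each weight buffer before compilation,
// so a compromised renderer flipping bytes afterwards has no effect.
function snapshotWeights(weights) {
  const copies = {};
  for (const [name, view] of Object.entries(weights)) {
    copies[name] = view.slice(); // slice() allocates an independent backing buffer
  }
  return copies;
}

const shared = { w0: new Float32Array([1, 2, 3]) }; // renderer-visible memory
const compiledWith = snapshotWeights(shared);       // privileged-side copy
shared.w0[0] = 999; // attacker mutates shared memory after the copy
// compiledWith.w0[0] is still 1: the compiled graph's constants are unaffected
```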

RafaelCintron: adding to Ningxin, all CLs that go into Chromium and Mojo get the Security team's review
… in the Chromium CL it was asked whether we can reduce the number of copies, i.e. the GPU operating on the same memory as the renderer process, and what the security implications of that are; WebGPU is also exploring this problem space

anssik: how far along are the WebGPU folks on this exploration?

RafaelCintron: will check and report back

anssik: are these CLs landed?

RafaelCintron: landed, the CLs make a copy so no problem

RafaelCintron: with larger models copying memory might become an issue

jsbell: we shouldn't let iteration on the implementation hold us back from adding security considerations to the spec
… you describe the problem and list mitigations as you identify them, e.g. sandboxing

Deepti: I understand memory copies are in general not ideal; when copying between these processes, is there a measurable perf impact?
… regarding isolation between the renderer and the process touching shared memory, is this just a performance issue?

RafaelCintron: the main reason for getting rid of copies is performance

Deepti: how transferable is this across ML apps? E.g. when we got rid of copies in Wasm, we found that, relative to other things web apps do, removing copies did not improve performance that much

RafaelCintron: with enough ops and large models we can validate performance implications

New features

Allow checking whether operators/types are supported for a backend before creating a graph

anssik: issue #463

<gb> Issue 463 Allow checking whether operators/types are supported for a backend before creating a graph (by huningxin)

anssik: currently WebNN API throws errors when calling MLGraphBuilder.build() with unsupported operators in the graph
… Rafael notes that by build time the weights may already have been downloaded, so failing at build time is too late to avoid downloading a large model if the backend does not support some operators
… the proposal is to add APIs to help determine whether build would fail given a set of ops BEFORE actually creating the graph
… Jiawei proposed a solution with a separate weight-loading step, inspired by popular models, e.g. Stable Diffusion
… Jiawei also shared pseudocode for the API changes showing how to match constant nodes with weight buffers
… Ningxin modified it to use MLNamedArrayBufferViews
… this is proposed as a v2 feature, i.e. to be addressed together with our v2 ops work
… thoughts?
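
A capability check of the kind proposed would let callers fail fast before downloading weights. As a hedged sketch, the query surface here is hypothetical; WebNN defines no such `supportedOps` API today (that is exactly what issue #463 asks for):

```javascript
// Hypothetical sketch for issue #463: decide whether a model can run
// before downloading its weights and calling MLGraphBuilder.build().
// `supportedOps` stands in for a capability-query API that WebNN does
// not currently define.
function checkModelSupport(requiredOps, supportedOps) {
  const missing = requiredOps.filter((op) => !supportedOps.has(op));
  return { supported: missing.length === 0, missing };
}

// A backend advertising a small op set (illustrative values only).
const backendOps = new Set(["conv2d", "matmul", "relu", "softmax"]);
const report = checkModelSupport(["matmul", "layerNormalization"], backendOps);
// report.supported === false; report.missing === ["layerNormalization"],
// so the framework can skip the large weight download entirely.
```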

RafaelCintron: this came up in Chromium CL reviews
… for example, some op takes only a certain data type, which is suboptimal for web developers
… or can do int, but not float
… two approaches to solve this:
… - the number of nodes makes the graph big to download; a solution proposed by Jiawei was to build the graph separately as a second step
… - an API similar to WebGPU's that tells you what is supported by the implementation

ningxin_hu: I have a question for users of frameworks:
… can frameworks separate the topology from the weights into two different files or resources to download?
… Wanming, when you looked at transformer models, do they combine these two resources together or are they served in separate files?

<jsbell> +1 to that - want feedback from frameworks that would use WebNN backends

Wanming: not sure, but I'll take a look later

jsbell: this seems like an ergonomics issue; providing capabilities similar to WebGPU has fingerprinting concerns
… for developers this is hidden by the frameworks they'll primarily be interfacing with

Joshua_Lochner: from a Transformers.js perspective, a .onnx file contains both the graph topology and the weights
… ONNX RT Node supports the external data format; ONNX RT Web has a feature request to separate the graph and weights into separate files

RafaelCintron: I think more investigation is required into which API shape is the better solution for this

<Joshua_Lochner> microsoft/onnxruntime#17151

<Joshua_Lochner> and the PR to fix: microsoft/onnxruntime#17155

Enhancements

Type of parameters should match the input data type

anssik: issue: #442

<gb> Issue 442 Type of some parameters should match the input data type (by Honry)

anssik: Wanming notes MLPadOptions.value, MLClampOptions.minValue, and MLClampOptions.maxValue should use a union type
… a few proposed v2 ops are affected too
… Dwayne seems to agree
… I believe we could address this with a union type; it needs to be checked whether that's applicable to dictionary members

https://webidl.spec.whatwg.org/#idl-dictionaries

<jsbell> looking... :)

<jsbell> I'll add to my TODO list

<jsbell> (same for https://github.com/webmachinelearning/webnn/pull/464)

Wanming: no additional information to add at this time

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics

Succeeded: s/download/download, a solution proposed by Jiawei was to build the graph separately as a second step

Maybe present: anssik, Deepti, jsbell, RafaelCintron, Vivek, Wanming

All speakers: anssik, Deepti, Joshua_Lochner, jsbell, ningxin_hu, RafaelCintron, Vivek, Wanming

Active on IRC: anssik, Deepti, Joshua_Lochner, jsbell, ningxin_hu, Rachel, RafaelCintron, Vivek