WebML CG Teleconference – 27 May 2021



Anssi_Kostiainen, Chai_Chaoweeraprasit, Jonathan_Bingham, Ningxin_Hu, Ping_Yu, Rafael_Cintron, Zoltan_Kis
Anssi, anssik

Meeting minutes

Security and Privacy

Security and Privacy Considerations

anssik: First let's review and discuss initial Security and Privacy Considerations.

anssik: I submitted PR #170 to address issue #122. Note that we expect to evolve this initial version based on additional feedback; this is a starting point.

Security and privacy considerations (issue #122)

Add initial Security and Privacy Considerations sections (PR #170)

anssik: Chai LGTM'd the PR #170, pending Ningxin's LGTM. This content meets the bar for First Public Working Draft purposes.
… In the Security section we should discuss security mechanisms that protect confidentiality, preserve information integrity, or promote availability of data -- we already added Permissions Policy integration per PING feedback
… In the Privacy section, we discuss measures taken to protect the rights of individuals with respect to personal information, or known privacy concerns
… fingerprinting is probably the most substantial privacy concern in Web API design

Mitigating Browser Fingerprinting in Web Specifications

anssik: PING has written a doc about it that also proposes mitigations, we all should read it

<ningxin_hu> sure, I'll review

We have two related [privacy-tracker] labelled issues

[privacy-tracker] Self-Review Questionnaire (issue #119)

anssik: this documents our questionnaire response and serves as a record that the Privacy Interest Group has acknowledged and is happy with our initial response. We continue to work with PING to expand the privacy considerations.

[privacy-tracker] Fingerprinting via matmul (issue #85)

anssik: issue #85 is about a possible fingerprinting vector we discussed earlier
… in PR #170 I incorporated the following statement to inform implementers about this possibility:
… "An execution time analysis may reveal indirectly the performance of the underlying platform's neural network hardware acceleration capabilities relative to another underlying platform."

anssik: Ningxin provided comments (thanks!) from Wasm people on how they're handling these concerns, documented in issue #85
… Ningxin, want to brief us on what you learned?

Ningxin: input from Jonathan and Jing involved with Wasm
… 1. Saturation and rounding (round-to-nearest ties-to-even) are standardized in Wasm SIMD. So JS developers should see the same saturation behavior on different architectures.
… 2. There is an early proposal in Wasm SIMD called Relaxed SIMD, which wants to relax some strict determinism requirements of instructions to unlock near native performance on different platforms. Fingerprinting would also be considered there.
… it'd be useful to monitor how the Wasm CG addresses these issues
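
To make point 1 concrete, here is a minimal scalar sketch of what saturating arithmetic and round-to-nearest ties-to-even mean. The function names `addSatI8` and `roundTiesToEven` are illustrative only and are not part of Wasm SIMD or WebNN; the sketch models a single signed 8-bit lane rather than a full vector.

```typescript
// Saturating addition of two signed 8-bit values: the result is clamped to
// [-128, 127] instead of wrapping around on overflow.
function addSatI8(a: number, b: number): number {
  const sum = a + b;
  return Math.max(-128, Math.min(127, sum));
}

// Round to the nearest integer; exact halfway cases go to the even neighbour.
function roundTiesToEven(x: number): number {
  const lower = Math.floor(x);
  const diff = x - lower;
  if (diff < 0.5) return lower;
  if (diff > 0.5) return lower + 1;
  // Exactly halfway between two integers: pick the even one.
  return lower % 2 === 0 ? lower : lower + 1;
}
```

Because both behaviours are fully specified, the same inputs produce the same outputs on every architecture, which is what removes this particular source of cross-platform variation.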

RafaelCintron: question re Relaxed SIMD, is this turned on by default?

ningxin_hu: need to check with Wasm people

anssik: if any of these mitigations apply to WebNN API we should reuse those

<ningxin_hu> https://github.com/WebAssembly/design/issues/1401

<Chai> [need to step away briefly. be right back]

<ningxin_hu> https://github.com/WebAssembly/relaxed-simd

anssik: Let's keep issue #85 open to solicit further feedback

WebGPU/GL Security and Privacy Considerations

<Chai> [back]

anssik: wanted the group to discuss and review WebGPU/GL Security and Privacy Considerations to understand whether some of them could be repurposed in this context

WebGPU Security

WebGPU Privacy

WebGL Security

anssik: WebGPU identifies the following security considerations:
… - CPU-based undefined behavior
… - GPU-based undefined behavior
… - Uninitialized data
… - Out-of-bounds access in shaders
… - Invalid data
… - Driver bugs
… - Timing attacks
… - Row hammer attacks
… - Denial of service
… - Workload identification
… - Memory resources
… - Computation resources

anssik: and these WebGPU Privacy considerations:
… - Machine-specific limits
… - Machine-specific artifacts
… - Machine-specific performance

anssik: WebGL Security considerations:
… - Resource Restrictions
… - Origin Restrictions
… - Supported GLSL Constructs
… - Defense Against Denial of Service
… - Out-of-Range Array Accesses

anssik: Maybe Rafael has some comments from the WebGL side?

RafaelCintron: both groups take this very seriously
… in WebGL you can only use same-origin or CORS textures, to prevent people from using timing attacks
… this has evolved over many years based on security research feedback
… timer queries were made less precise to avoid fingerprinting
… WebGL tried to make many undefined behaviours defined
… so behaviour is consistent across browsers
… this helps not just security, but also debuggability
… it has been a group effort for WebGPU/GL to come up with the Security and Privacy Considerations
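
As an illustration of the "less precise timer" mitigation, a minimal sketch follows. The helper `coarsenTimestamp` and the resolution value are hypothetical; they are not taken from the WebGL or WebGPU specifications, which define their own rules.

```typescript
// Hypothetical helper: rounds a timestamp (in milliseconds) down to a
// multiple of `resolutionMs`, so a caller cannot observe finer-grained
// timing differences between two measurements.
function coarsenTimestamp(timestampMs: number, resolutionMs: number): number {
  if (resolutionMs <= 0) {
    throw new RangeError("resolutionMs must be positive");
  }
  return Math.floor(timestampMs / resolutionMs) * resolutionMs;
}
```

With a resolution of 100 ms, two operations finishing at 1201.3 ms and 1299.9 ms become indistinguishable, which limits how much an execution time analysis can reveal about the underlying hardware.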

anssik: questions, comments?

Operation-specific APIs proposal

anssik: Let's continue our favourite topic, and discuss design considerations and review proposed solutions to enable both efficient graph execution and imperative eager execution with a cohesive WebNN API.

https://github.com/webmachinelearning/webnn/pull/166 Support download data asynchronously (PR #166)

anssik: I'll let Chai and Ningxin update us on the status of this PR

<ningxin_hu> https://github.com/webmachinelearning/webnn/issues/156#issuecomment-846828170

ningxin_hu: I have a prototype to better understand this issue
… this follows up on Jonathan's and Ping's requirement to allow a conv2d impl for the TF.js Wasm backend
… The implementation is in conv2d_impl.cc and the WebNN calls are guarded by USE_WEBNN_OP. With the prototype, I observed a good performance speedup (3X to 5X) in a tf.conv2d benchmark when offloading the compute to a native library (such as XNNPACK or oneDNN).
… the built graph is kept in an op cache for later use; when the cache hits, the graph is fetched and compute is run on inputs and outputs allocated by TF.js
… two backends used, XNNPACK and oneDNN
… using webnn-native project for this prototyping
… observations:
… 1) TF.js Wasm backend expects the input and output data of an op execution to be in standard layout.
… 2) TF.js Wasm backend pre-allocates input and output buffers for an op execution.
… 3) TF.js Wasm backend executes an op synchronously.
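
The op cache and the three observations can be sketched roughly as below. This is not the prototype's code (which lives in conv2d_impl.cc): `Conv2dParams`, `CompiledOp` and `OpCache` are hypothetical names, and `CompiledOp` is a stand-in rather than the WebNN graph interface.

```typescript
// Hypothetical description of one conv2d invocation; field names are illustrative.
interface Conv2dParams {
  inputShape: number[];
  filterShape: number[];
  strides: number[];
  padding: number[];
}

// Stand-in for a built single-op graph. Not the WebNN interface.
interface CompiledOp {
  // Runs synchronously and writes into a caller-allocated output buffer.
  compute(input: Float32Array, output: Float32Array): void;
}

class OpCache {
  private cache = new Map<string, CompiledOp>();
  private buildOp: (params: Conv2dParams) => CompiledOp;
  builds = 0; // number of cache misses, kept for illustration

  constructor(buildOp: (params: Conv2dParams) => CompiledOp) {
    this.buildOp = buildOp;
  }

  get(params: Conv2dParams): CompiledOp {
    // The JSON form of the parameters serves as the key; this assumes the
    // fields are always written in the same order.
    const key = JSON.stringify(params);
    let op = this.cache.get(key);
    if (op === undefined) {
      op = this.buildOp(params); // cache miss: build the graph once
      this.builds += 1;
      this.cache.set(key, op);
    }
    return op; // on a hit, the previously built graph is reused
  }
}
```

The caller owns both buffers and `compute` returns nothing, which mirrors observations 2) and 3): the framework pre-allocates inputs and outputs and expects the op to have finished when the call returns.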

ningxin_hu: GraphBuilder API implemented as a sync API to satisfy the TF.js backend requirement

RafaelCintron: biggest finding is that using real native code is faster than using the best Wasm you can have today, this gives a lot of hope WebNN is faster than alternatives

Ping_Yu: thanks Ningxin for making WebNN work with the TF.js Wasm backend
… want to understand where the performance comes from, my understanding is it is due to wider SIMD and oneDNN optimizations?
… historically Wasm backend is in sync mode, for other backends there's no sync download

[line breaking, typos expected in scribing]

ningxin_hu: according to my investigation, the performance gain comes from wider SIMD: XNNPACK uses 256-bit wide AVX instructions on my dev machine, while Wasm SIMD is only 128 bits wide today
… oneDNN uses an even more aggressive optimization strategy: beyond the wider vector width, it also reorders input and output into a layout optimized for vector instructions; this is platform specific, and different architectures may use different layouts for better performance
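
To illustrate what such a layout reorder involves, here is a minimal sketch that converts a plain NCHW tensor into a channel-blocked layout, in the style of the blocked formats oneDNN documents (for example nChw8c). The function name and the `block` parameter are assumptions for illustration; the minutes do not say which layout the prototype actually used.

```typescript
// Reorders a tensor from plain NCHW into a channel-blocked layout
// [N][ceil(C/block)][H][W][block]. Channels are zero-padded up to a
// multiple of `block`, so `block` consecutive channels of one pixel sit
// next to each other and can be loaded with a single vector instruction.
function reorderNchwToBlocked(
  src: Float32Array,
  n: number,
  c: number,
  h: number,
  w: number,
  block: number
): Float32Array {
  if (src.length !== n * c * h * w) {
    throw new RangeError("src length does not match the given shape");
  }
  const cBlocks = Math.ceil(c / block);
  const dst = new Float32Array(n * cBlocks * h * w * block); // zero-filled
  src.forEach((value, srcIdx) => {
    // Decompose the flat NCHW index into coordinates.
    const wi = srcIdx % w;
    const hi = Math.floor(srcIdx / w) % h;
    const ci = Math.floor(srcIdx / (w * h)) % c;
    const ni = Math.floor(srcIdx / (w * h * c));
    // Split the channel into (block index, position inside the block).
    const cb = Math.floor(ci / block);
    const cInner = ci % block;
    const dstIdx = (((ni * cBlocks + cb) * h + hi) * w + wi) * block + cInner;
    dst[dstIdx] = value;
  });
  return dst;
}
```

For example, with two channels `[1, 2]` and `[3, 4]` of a 1x2 image and `block = 2`, the result is `[1, 3, 2, 4]`. The reordered buffer depends on the block size chosen for the platform, which is why exposing such native layouts to script raises the fingerprinting concern discussed in these minutes.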

RafaelCintron: there's difference between API being sync and returning objects right away

<Jonathan> Is WASM SIMD going to get wider instructions eventually (256 vs 128)?

RafaelCintron: if they're backed by GPU that's possible, so OK for me to have WebNN compute return objects right away, and if you want to read back you have to do that via async promise-returning object
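
The distinction Rafael draws can be sketched as follows: the compute call returns an object right away, and only the readback is asynchronous. `DeviceTensor` and `computeDouble` are hypothetical names used for illustration, and a promise stands in for work that would really be queued on a device.

```typescript
// Hypothetical handle for a result that may still live on the device.
class DeviceTensor {
  private pending: Promise<Float32Array>;

  constructor(pending: Promise<Float32Array>) {
    this.pending = pending;
  }

  // Reading the data back to the CPU is the only asynchronous step.
  readback(): Promise<Float32Array> {
    return this.pending;
  }
}

// Hypothetical compute call: returns a handle immediately, without waiting
// for the (simulated) device work to finish.
function computeDouble(input: Float32Array): DeviceTensor {
  const snapshot = input.slice(); // later caller mutations do not affect the result
  const work = Promise.resolve().then(() => snapshot.map((v) => v * 2));
  return new DeviceTensor(work);
}
```

A caller that chains several operations never needs to await between them; it only awaits `readback()` when the values are actually needed on the CPU.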

Chai: specifically on the PR itself, I've summarized it in my more recent comment
… want to double-check what Ningxin responded lately
… are you saying that based on your prototype it doesn't matter if the API returns native format?
… if we're going to support native format, there's a fingerprinting concern, so it would be good for the group to clarify its position when it comes to native format; this overlaps a bit with readback, which can happen in standard format
… sync vs. async is a question, we should talk through the pros/cons of both and settle on one design; if we support both it'll be harder for implementers
… if we do both it'll be more confusing for the caller

ningxin_hu: in this PR we discuss native format support; my latest response is about the investigation on the TF.js backend, which I think is a major use case
… it turns out TF.js expects input and output in standard layout, based on that we can leave native format in a separate issue
… to be handled in a future version

<Chai> +1 on standard format only in V1

ningxin_hu: I can revert the MLTensor proposal that supports format conversion, because that is not needed for this V1 spec
… to close the op-specific use case, I propose we align with what Chai proposed: make the compute API only support pre-allocated output, which is good for GPU resources
… this is documented in my comment

<ningxin_hu> https://github.com/webmachinelearning/webnn/pull/166#discussion_r637792027

ningxin_hu: secondly, we'll leave native format in a separate issue to be addressed in future
… third is to add support for sync version of build and compute API, keep async versions too
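
A rough sketch of the shape being proposed, with compute writing into a pre-allocated output and offered in both sync and async variants, might look like this. `ToyGraph` and its method names are hypothetical and are not the WebNN API; the graph here is a single element-wise ReLU, chosen only to keep the example short.

```typescript
// Hypothetical graph containing a single element-wise ReLU.
class ToyGraph {
  // Sync variant: writes into the caller's pre-allocated output and
  // returns nothing.
  computeSync(input: Float32Array, output: Float32Array): void {
    if (output.length !== input.length) {
      throw new RangeError("output size does not match input size");
    }
    input.forEach((v, i) => {
      output[i] = Math.max(0, v);
    });
  }

  // Async variant: same contract, but completion is signalled by a promise
  // rather than by the call returning.
  compute(input: Float32Array, output: Float32Array): Promise<void> {
    return Promise.resolve().then(() => this.computeSync(input, output));
  }
}
```

In both variants the implementation never allocates or returns the output, so it does not have to manage output buffer lifetimes on the caller's behalf.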

anssik: anyone have concerns with Ningxin's/Chai's proposal?

RafaelCintron: is there agreement that we're going to have both sync and async?
… is async meant for CPU ArrayBuffer usage?

ningxin_hu: I think the agreement is for the compute API to not return output
… previously we only had compute accept input and preallocated output
… this is one proposal, agreement between Chai and me

Chai: the previous API implies the implementer needs to manage the output buffer
… separate issue is sync vs async
… my preference is for async, making it simpler for implementer and caller
… WebNN context can be created off of explicit device or context, in the latter case the implementation can create a GPU device under the hood hiding platform details for the caller fully
… this gives additional flexibility to the caller, but if the resource is given as an ArrayBuffer, that implies the caller wants the buffer to be on the CPU
… if we have both async and sync API we have to document carefully both

anssik: can you live with both sync and async API?

<ningxin_hu> sgtm

RafaelCintron: if we must have an async version, perhaps we can have that be strongly typed to only take ArrayBuffers and return them

<Chai> +1 on async only for CPU buffer output

Model Loader API update

anssik: Next, we'll hear a Model Loader API update from Jonathan. Two topics:
… 1. Chrome OS Origin Trial plans
… 2. Wasm runner for TF Lite and how it could possibly be used for WebNN benchmarking

Jonathan: ChromeOS has staffing for doing work on Model Loader API
… starting with the spec in this CG
… getting to Origin Trial in 1-2 quarters
… plan to be able to run TF Lite models, understanding that TF Lite models are not suitable for a web standard
… but want to use that as a starting point to be able to benchmark with WebNN and Wasm
… at Google I/O we announced work by Ping et al.: a Wasm runner for TF Lite models
… similar to Model Loader, takes TF Lite model and runs that with Wasm
… there's potential for this work to become a polyfill for a graph API or Model Loader API, if we make that more generic, possible future generation

ningxin_hu: for the second one, you mentioned TF Lite Wasm version, it is interesting
… for WebNN benchmarking, do you mean we have WebNN backend for TF Lite Wasm similarly to TF.js backend?

Jonathan: that's potential, you need to talk to Ping about that
… from our side, TF team wants to do some benchmarking, one way is TF Lite Wasm runner, one is WebNN and parse the model and construct the graph
… it could be useful for this group

ningxin_hu: regarding our explainer, we target framework usage, TF Lite Wasm runner fits into that, I'm interested in investigation into this and will chat with you offline

<Jonathan> https://www.youtube.com/watch?v=5q8BzYN4rqA

Minutes manually created (not a transcript), formatted by scribe.perl version 136 (Thu May 27 13:50:24 2021 UTC).