13:55:24 RRSAgent has joined #webmachinelearning
13:55:24 logging to https://www.w3.org/2021/05/27-webmachinelearning-irc
13:55:27 RRSAgent, make logs Public
13:55:27 please title this meeting ("meeting: ..."), anssik
13:55:29 Meeting: WebML CG Teleconference – 27 May 2021
13:55:34 Chair: Anssi
13:55:39 Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2021-05-27-agenda.md
13:55:45 Scribe: Anssi
13:55:54 scribeNick: anssik
13:56:02 Present+ Anssi_Kostiainen
14:00:09 Ping_Yu has joined #webmachinelearning
14:00:15 Present+ Ningxin_Hu
14:00:33 ningxin_hu has joined #webmachinelearning
14:00:39 Present+ Ping_Yu
14:02:06 Present+ Chai_Chaoweeraprasit
14:02:40 Present+ Jonathan_Bingham
14:02:59 RRSAgent, draft minutes
14:02:59 I have made the request to generate https://www.w3.org/2021/05/27-webmachinelearning-minutes.html anssik
14:03:25 Jonathan has joined #webmachinelearning
14:03:26 Chai has joined #webmachinelearning
14:03:35 RafaelCintron has joined #webmachinelearning
14:03:52 Present+ Rafael_Cintron
14:04:00 Topic: Security and Privacy
14:04:12 Subtopic: Security and Privacy Considerations
14:04:55 anssik: First, let's review and discuss the initial Security and Privacy Considerations.
14:05:00 present+ Zoltan_Kis
14:05:10 anssik: I submitted PR #170 to address issue #122. It should be noted we expect to evolve this initial version based on additional feedback; this is a starting point.
14:05:15 -> https://github.com/webmachinelearning/webnn/issues/122 Security and privacy considerations (issue #122)
14:05:21 -> https://github.com/webmachinelearning/webnn/pull/170 Add initial Security and Privacy Considerations sections (PR #170)
14:05:26 Present+ Zoltan_Kis
14:05:56 ... Chai LGTM'd PR #170, pending Ningxin's LGTM. This content meets the bar for First Public Working Draft purposes.
14:06:52 ... In the Security section we should discuss security mechanisms that protect confidentiality, preserve information integrity, or promote availability of data -- we already added Permissions Policy integration per PING feedback.
14:07:27 ... In Privacy, we discuss measures taken to protect the rights of individuals with respect to personal information, or known privacy concerns.
14:07:32 ... Fingerprinting is probably the most substantial privacy concern in Web API design.
14:07:40 -> https://w3c.github.io/fingerprinting-guidance/ Mitigating Browser Fingerprinting in Web Specifications
14:07:58 anssik: PING has written a doc about it that also proposes mitigations; we all should read it.
14:08:14 sure, I'll review
14:08:39 We have two related [privacy-tracker] labelled issues.
14:08:45 -> https://github.com/webmachinelearning/webnn/issues/119 [privacy-tracker] Self-Review Questionnaire (issue #119)
14:09:14 anssik: This documents our questionnaire response and serves as a record that the Privacy Interest Group has acknowledged and is happy with our initial response. We continue to work with PING to expand the privacy considerations.
14:09:25 -> https://github.com/webmachinelearning/webnn/issues/85 [privacy-tracker] Fingerprinting via matmul (issue #85)
14:09:37 anssik: Issue #85 is about a possible fingerprinting vector we discussed earlier.
14:09:47 ... In PR #170 I incorporated the following statement to inform implementers about this possibility:
14:09:54 ... "An execution time analysis may reveal indirectly the performance of the underlying platform's neural network hardware acceleration capabilities relative to another underlying platform."
14:10:55 anssik: Ningxin provided comments (thanks!) from Wasm people on how they're handling these concerns, documented in issue #85.
14:11:00 ... Ningxin, want to brief us on what you learned?
14:11:14 Ningxin: Input from Jonathan and Jing, who are involved with Wasm.
14:11:58 ... 1. Saturation and rounding (round-to-nearest ties-to-even) are standardized in Wasm SIMD, so JS developers should see the same saturation behavior on different architectures.
14:13:01 ... 2. There is an early proposal in Wasm SIMD called Relaxed SIMD, which wants to relax some strict determinism requirements of instructions to unlock near-native performance on different platforms. Fingerprinting would also be considered there.
14:14:03 q+
14:14:03 ... It'd be useful to monitor how the Wasm CG addresses these issues.
14:14:18 ack RafaelCintron
14:14:52 RafaelCintron: Question re Relaxed SIMD: is this turned on by default?
14:15:00 ningxin_hu: Need to check with Wasm people.
14:15:36 anssik: If any of these mitigations apply to the WebNN API, we should reuse them.
14:16:13 https://github.com/WebAssembly/design/issues/1401
14:16:15 [need to step away briefly. be right back]
14:16:46 https://github.com/WebAssembly/relaxed-simd
14:16:47 ... Let's keep issue #85 open to solicit further feedback.
14:17:15 Subtopic: WebGPU/GL Security and Privacy Considerations
14:17:28 [back]
14:17:35 anssik: I wanted the group to discuss and review the WebGPU/GL Security and Privacy Considerations to understand whether some of them could be repurposed in this context.
14:17:40 -> https://gpuweb.github.io/gpuweb/#security WebGPU Security
14:17:45 -> https://gpuweb.github.io/gpuweb/#security-privacy WebGPU Privacy
14:17:49 -> https://www.khronos.org/registry/webgl/specs/latest/1.0/#4 WebGL Security
14:18:03 anssik: WebGPU identifies the following security considerations:
14:18:12 ... - CPU-based undefined behavior
14:18:12 ... - GPU-based undefined behavior
14:18:12 ... - Uninitialized data
14:18:12 ... - Out-of-bounds access in shaders
14:18:12 ... - Invalid data
14:18:12 ... - Driver bugs
14:18:13 ... - Timing attacks
14:18:13 ... - Row hammer attacks
14:18:13 ... - Denial of service
14:18:14 ... - Workload identification
14:18:14 ... - Memory resources
14:18:14 ... - Computation resources
14:18:59 anssik: and these WebGPU Privacy considerations:
14:19:06 ... - Machine-specific limits
14:19:06 ... - Machine-specific artifacts
14:19:06 ... - Machine-specific performance
14:19:14 anssik: WebGL Security considerations:
14:19:23 ... - Resource Restrictions
14:19:23 ... - Origin Restrictions
14:19:23 ... - Supported GLSL Constructs
14:19:23 ... - Defense Against Denial of Service
14:19:24 ... - Out-of-Range Array Accesses
14:19:39 anssik: Maybe Rafael has some comments from the WebGL side?
14:19:54 RafaelCintron: Both groups take this very seriously.
14:20:11 ... In WebGL you can only use same-origin or CORS textures, to prevent people using timing attacks.
14:20:22 ... Many, many years of work based on security research feedback.
14:20:34 ... Timer queries are made less precise to avoid fingerprinting.
14:20:43 ... WebGL tried to make many undefined things defined,
14:21:08 ... so behaviour is consistent across browsers.
14:21:39 ... This helps not just security, but also debuggability.
14:22:54 ... It has been a group effort for WebGPU/GL to come up with their Security and Privacy Considerations.
14:23:59 anssik: Questions, comments?
14:24:08 Topic: Operation-specific APIs proposal
14:24:36 anssik: Let's continue our favourite topic, and discuss design considerations and review proposed solutions to enable both efficient graph execution and imperative eager execution with a cohesive WebNN API.
14:24:50 -> https://github.com/webmachinelearning/webnn/pull/166 Support download data asynchronously (PR #166)
14:25:04 anssik: I'll let Chai and Ningxin update us on the status of this PR.
14:25:38 https://github.com/webmachinelearning/webnn/issues/156#issuecomment-846828170
14:25:40 ningxin_hu: I have a prototype to better understand this issue.
14:26:07 ... This follows up on Jonathan's and Ping's requirement to allow a conv2d implementation for the TF.js Wasm backend.
14:27:37 q+
14:27:44 ... The implementation is in conv2d_impl.cc and the WebNN calls are guarded by USE_WEBNN_OP. With the prototype, I observed a good performance speedup (3X to 5X) in a tf.conv2d benchmark when offloading the compute to a native library (such as XNNPACK or oneDNN).
14:28:55 ... It uses the same op cache for later use: when the cache hits, a graph is fetched and compute runs on inputs and outputs allocated by TF.js.
14:29:41 ... Two backends were used, XNNPACK and oneDNN.
14:30:05 ... The webnn-native project was used for this prototyping.
14:30:28 ... Observations:
14:30:45 ... 1) TF.js Wasm backend expects the input and output data of an op execution to be in standard layout.
14:31:07 ... 2) TF.js Wasm backend pre-allocates input and output buffers for an op execution.
14:31:20 ... 3) TF.js Wasm backend executes an op synchronously.
14:31:24 q?
14:31:58 ningxin_hu: The GraphBuilder API is implemented as a sync API to satisfy the TF.js backend requirement.
14:32:26 ack RafaelCintron
14:33:14 RafaelCintron: The biggest finding is that using real native code is faster than the best Wasm you can have today; this gives a lot of hope that WebNN is faster than the alternatives.
14:33:45 q+
14:34:00 ack Ping_Yu
14:34:27 Ping_Yu: Thanks Ningxin for making WebNN work with the TF.js Wasm backend.
14:34:57 ... I want to understand where the performance comes from; my understanding is it is due to wider SIMD and oneDNN optimizations?
14:35:26 ... Historically the Wasm backend is in sync mode; for other backends there's no sync download.
14:35:36 [line breaking, typos expected in scribing]
14:37:13 ningxin_hu: According to my investigations, the performance gain comes from wider SIMD via XNNPACK -- 256-bit wide AVX instructions on my dev machine, while Wasm SIMD is only 128-bit wide today.
14:37:48 q+
14:37:56 ... oneDNN uses an even more aggressive optimization strategy: it reorders not only the width but also the input and output into a layout optimized for vector instructions. This is platform specific; different architectures may use different layouts for better performance.
14:38:04 q?
14:38:07 ack RafaelCintron
14:38:32 RafaelCintron: There's a difference between an API being sync and returning objects right away.
14:38:37 q+
14:39:16 Is Wasm SIMD going to get wider instructions eventually (256 vs 128)?
14:39:25 ... If they're backed by GPU that's possible, so it's OK for me to have WebNN compute return objects right away, and if you want to read back you have to do that via an async promise-returning object.
14:40:10 q?
14:40:28 ack Chai
14:40:50 Chai: Specifically on the PR itself, I've summarized it in my more recent comment.
14:41:04 ... I want to double-check what Ningxin responded lately.
14:41:33 ... Are you saying that based on your prototype it doesn't matter if the API returns native format?
14:42:31 ... If we're going to support native format, there's a fingerprinting concern, so it would be good for the group to clarify its position on native format; this overlaps a bit with readback, which can happen in standard format.
14:43:13 ... Sync vs. async is a question; we should talk through the pros/cons of both and settle on one design. If we support both it'll be harder for implementers,
14:43:24 ... and if we do both it'll be more confusing for the caller.
14:43:28 q+
14:44:02 ack ningxin_hu
14:44:35 ningxin_hu: In this PR we discuss native format support; my latest response covers the investigation into the TF.js backend, which is a major use case I think.
14:45:00 ... It turns out TF.js expects input and output in standard layout; based on that we can leave native format to a separate issue,
14:45:15 ... to be handled in a future version.
14:45:20 +1 on standard format only in V1
14:46:10 ningxin_hu: I can revert the MLTensor proposal that supports format conversion, because that is not needed for this V1 spec.
14:46:48 ... To close the op-specific use case, I propose we align with what Chai proposed: make the compute API only support pre-allocated output, which is good for GPU resources.
14:46:59 ... This is documented in my comment.
14:47:00 https://github.com/webmachinelearning/webnn/pull/166#discussion_r637792027
14:47:15 ... Secondly, we'll leave native format to a separate issue to be addressed in the future.
14:47:45 ... Third is to add support for sync versions of the build and compute APIs, keeping the async versions too.
14:48:39 q+
14:48:46 anssik: Anyone have concerns with Ningxin's/Chai's proposal?
14:48:51 ack RafaelCintron
14:49:06 RafaelCintron: Is there agreement that we're going to have both sync and async?
14:49:30 ... Is async meant for CPU ArrayBuffer usage?
14:49:42 ningxin_hu: I think the agreement is for the compute API to not return output.
14:49:57 ... Previously we only had compute accept input and pre-allocated output.
14:50:09 ... This is one proposal, an agreement between Chai and me.
14:50:55 Chai: The previous API implies the implementer needs to manage the output buffer.
14:51:01 ... A separate issue is sync vs. async.
14:51:21 ... My preference is for async, making it simpler for the implementer and the caller.
14:52:01 ... A WebNN context can be created off an explicit device or a context; in the latter case the implementation can create a GPU device under the hood, hiding platform details from the caller fully.
14:53:35 ... This gives additional flexibility to the caller, but if the resource is given as an ArrayBuffer, it implies the caller wants the buffer to be on the CPU.
14:54:45 ... If we have both async and sync APIs we have to document both carefully.
14:55:35 anssik: Can you live with both sync and async APIs?
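[Scribe note: the compute shape under discussion -- caller pre-allocates output buffers, compute fills them and returns no new objects, with sync and async variants -- can be sketched as a toy model. This is NOT the WebNN API; all names below (ToyGraph, computeSync, computeAsync) are illustrative only.]

```javascript
// Toy model of the proposed contract: the caller owns the input and output
// buffers (as the TF.js Wasm backend does), compute writes into them in
// place, and nothing is returned. An async variant keeps the same contract
// behind a Promise, so a GPU-backed implementation could resolve after
// readback completes.
class ToyGraph {
  constructor(fn) {
    this.fn = fn; // the "compiled" computation
  }

  // Sync variant: runs in place on caller-allocated typed arrays.
  computeSync(inputs, outputs) {
    this.fn(inputs, outputs); // writes into outputs; returns nothing
  }

  // Async variant: same pre-allocated-output contract, Promise-based.
  async computeAsync(inputs, outputs) {
    this.fn(inputs, outputs);
  }
}

// Example "graph": c = a + b, element-wise.
const addGraph = new ToyGraph((inputs, outputs) => {
  const { a, b } = inputs;
  const { c } = outputs;
  for (let i = 0; i < c.length; ++i) c[i] = a[i] + b[i];
});

// The caller pre-allocates every buffer, including the output.
const a = Float32Array.from([1, 2, 3]);
const b = Float32Array.from([10, 20, 30]);
const c = new Float32Array(3);
addGraph.computeSync({ a, b }, { c });
```

[The design point is that because the implementation never allocates or returns an output object, the same shape works whether the buffers live on CPU or GPU.]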
14:55:38 q+
14:56:29 sgtm
14:56:35 ack RafaelCintron
14:57:43 RafaelCintron: If we must have an async version, perhaps we can have it be strongly typed to only take ArrayBuffers and return them.
14:57:54 +1 on async only for CPU buffer output
14:58:11 Topic: Model Loader API update
14:58:16 anssik: Next, we'll hear a Model Loader API update from Jonathan. Two topics:
14:58:21 ... 1. Chrome OS Origin Trial plans
14:58:28 ... 2. Wasm runner for TF Lite and how it could possibly be used for WebNN benchmarking
14:59:00 Jonathan: ChromeOS has staffing for doing work on the Model Loader API,
14:59:10 ... starting with the spec in this CG.
14:59:25 ... Getting to Origin Trial in 1-2 quarters.
14:59:44 ... The plan is to be able to run TF Lite models, understanding that TF Lite models are not suitable for a web standard,
15:00:09 ... but we want to use that as a starting point to be able to benchmark against WebNN and Wasm.
15:00:26 ... At Google I/O we announced work by Ping et al., a Wasm runner for TF Lite models.
15:00:52 ... Similar to Model Loader, it takes a TF Lite model and runs it with Wasm.
15:01:16 ... There's potential for this work to become a polyfill for a graph API or the Model Loader API, if we make it more generic; a possible future generation.
15:01:37 q?
15:01:48 RRSAgent, draft minutes
15:01:48 I have made the request to generate https://www.w3.org/2021/05/27-webmachinelearning-minutes.html anssik
15:01:49 q+
15:01:57 ack ningxin_hu
15:02:13 ningxin_hu: On the second one, you mentioned the TF Lite Wasm version; it is interesting.
15:02:43 ... For WebNN benchmarking, do you mean we'd have a WebNN backend for TF Lite Wasm, similarly to the TF.js backend?
15:02:54 Jonathan: That's a potential direction; you need to talk to Ping about that.
15:03:21 ... From our side, the TF team wants to do some benchmarking. One way is the TF Lite Wasm runner; another is WebNN, parsing the model and constructing the graph.
15:03:36 ... It could be useful for this group.
15:04:19 ningxin_hu: Regarding our explainer, we target framework usage, and the TF Lite Wasm runner fits into that. I'm interested in investigating this and will chat with you offline.
15:04:56 https://www.youtube.com/watch?v=5q8BzYN4rqA
15:05:05 q?
15:05:24 RRSAgent, draft minutes
15:05:24 I have made the request to generate https://www.w3.org/2021/05/27-webmachinelearning-minutes.html anssik
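
[Scribe note: item 1 of Ningxin's Wasm SIMD summary above -- standardized saturation and round-to-nearest ties-to-even -- can be illustrated with a minimal sketch. This is plain JS written for these minutes, not Wasm or browser code; the function names are ours.]

```javascript
// Round-to-nearest with ties-to-even: halfway cases go to the even
// neighbor (0.5 -> 0, 1.5 -> 2, 2.5 -> 2), instead of always rounding up.
function roundTiesToEven(x) {
  const lower = Math.floor(x);
  const frac = x - lower;
  if (frac < 0.5) return lower;
  if (frac > 0.5) return lower + 1;
  // Exactly halfway: pick whichever neighbor is even.
  return lower % 2 === 0 ? lower : lower + 1;
}

// Saturating conversion to int8: out-of-range values clamp to the type's
// limits rather than wrapping around, as Wasm SIMD specifies for its
// saturating narrowing/conversion instructions.
function saturateToInt8(x) {
  return Math.min(127, Math.max(-128, roundTiesToEven(x)));
}
```

[Because both behaviors are fully specified, every architecture must produce identical results, which is why this part of Wasm SIMD does not add a fingerprinting surface -- the concern discussed above applies to the Relaxed SIMD proposal, where such determinism may be relaxed.]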