13:55:24 RRSAgent has joined #webmachinelearning
13:55:24 logging to https://www.w3.org/2021/05/27-webmachinelearning-irc
13:55:27 RRSAgent, make logs Public
13:55:27 please title this meeting ("meeting: ..."), anssik
13:55:29 Meeting: WebML CG Teleconference – 27 May 2021
13:55:34 Chair: Anssi
13:55:39 Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2021-05-27-agenda.md
13:55:45 Scribe: Anssi
13:55:54 scribeNick: anssik
13:56:02 Present+ Anssi_Kostiainen
14:00:09 Ping_Yu has joined #webmachinelearning
14:00:15 Present+ Ningxin_Hu
14:00:33 ningxin_hu has joined #webmachinelearning
14:00:39 Present+ Ping_Yu
14:02:06 Present+ Chai_Chaoweeraprasit
14:02:40 Present+ Jonathan_Bingham
14:02:59 RRSAgent, draft minutes
14:02:59 I have made the request to generate https://www.w3.org/2021/05/27-webmachinelearning-minutes.html anssik
14:03:25 Jonathan has joined #webmachinelearning
14:03:26 Chai has joined #webmachinelearning
14:03:35 RafaelCintron has joined #webmachinelearning
14:03:52 Present+ Rafael_Cintron
14:04:00 Topic: Security and Privacy
14:04:12 Subtopic: Security and Privacy Considerations
14:04:55 anssik: First, let's review and discuss the initial Security and Privacy Considerations.
14:05:00 present+ Zoltan_Kis
14:05:10 anssik: I submitted PR #170 to address issue #122. It should be noted we expect to evolve this initial version based on additional feedback; this is a starting point.
14:05:15 -> https://github.com/webmachinelearning/webnn/issues/122 Security and privacy considerations (issue #122)
14:05:21 -> https://github.com/webmachinelearning/webnn/pull/170 Add initial Security and Privacy Considerations sections (PR #170)
14:05:26 Present+ Zoltan_Kis
14:05:56 ... Chai LGTM'd PR #170, pending Ningxin's LGTM. This content meets the bar for First Public Working Draft purposes.
14:06:52 ... In the Security section we should discuss security mechanisms that protect confidentiality, preserve information integrity, or promote availability of data -- we already added Permissions Policy integration per PING feedback.
14:07:27 ... In Privacy, we discuss measures taken to protect the rights of individuals with respect to personal information, or known privacy concerns.
14:07:32 ... Fingerprinting is probably the most substantial privacy concern in Web API design.
14:07:40 -> https://w3c.github.io/fingerprinting-guidance/ Mitigating Browser Fingerprinting in Web Specifications
14:07:58 anssik: PING has written a doc about it that also proposes mitigations; we all should read it.
14:08:14 sure, I'll review
14:08:39 We have two related [privacy-tracker] labelled issues.
14:08:45 -> https://github.com/webmachinelearning/webnn/issues/119 [privacy-tracker] Self-Review Questionnaire (issue #119)
14:09:14 anssik: This documents our questionnaire response and serves as a record that the Privacy Interest Group has acknowledged and is happy with our initial response. We continue to work with PING to expand the privacy considerations.
14:09:25 -> https://github.com/webmachinelearning/webnn/issues/85 [privacy-tracker] Fingerprinting via matmul (issue #85)
14:09:37 anssik: Issue #85 is about a possible fingerprinting vector we discussed earlier.
14:09:47 ... In PR #170 I incorporated the following statement to inform implementers about this possibility:
14:09:54 ... "An execution time analysis may reveal indirectly the performance of the underlying platform's neural network hardware acceleration capabilities relative to another underlying platform."
14:10:55 anssik: Ningxin provided comments (thanks!) from Wasm people on how they're handling these concerns, documented in issue #85.
14:11:00 ... Ningxin, want to brief us on what you learned?
14:11:14 Ningxin: Input from Jonathan and Jing, who are involved with Wasm.
14:11:58 ... 1. Saturation and rounding (round-to-nearest ties-to-even) are standardized in Wasm SIMD, so JS developers should see the same saturation behavior on different architectures.
14:13:01 ... 2. There is an early proposal in Wasm SIMD called Relaxed SIMD, which wants to relax some strict determinism requirements of instructions to unlock near-native performance on different platforms. Fingerprinting would also be considered there.
14:14:03 q+
14:14:03 ... It'd be useful to monitor how the Wasm CG addresses these issues.
14:14:18 ack RafaelCintron
14:14:52 RafaelCintron: Question re Relaxed SIMD: is this turned on by default?
14:15:00 ningxin_hu: Need to check with Wasm people.
14:15:36 anssik: If any of these mitigations apply to the WebNN API, we should reuse them.
14:16:13 https://github.com/WebAssembly/design/issues/1401
14:16:15 [need to step away briefly. be right back]
14:16:46 https://github.com/WebAssembly/relaxed-simd
14:16:47 ... Let's keep issue #85 open to solicit further feedback.
14:17:15 Subtopic: WebGPU/GL Security and Privacy Considerations
14:17:28 [back]
14:17:35 anssik: I wanted the group to discuss and review the WebGPU/GL Security and Privacy Considerations to understand whether some of them could be repurposed in this context.
14:17:40 -> https://gpuweb.github.io/gpuweb/#security WebGPU Security
14:17:45 -> https://gpuweb.github.io/gpuweb/#security-privacy WebGPU Privacy
14:17:49 -> https://www.khronos.org/registry/webgl/specs/latest/1.0/#4 WebGL Security
14:18:03 anssik: WebGPU identifies the following security considerations:
14:18:12 ... - CPU-based undefined behavior
14:18:12 ... - GPU-based undefined behavior
14:18:12 ... - Uninitialized data
14:18:12 ... - Out-of-bounds access in shaders
14:18:12 ... - Invalid data
14:18:12 ... - Driver bugs
14:18:13 ... - Timing attacks
14:18:13 ... - Row hammer attacks
14:18:13 ... - Denial of service
14:18:14 ... - Workload identification
14:18:14 ... - Memory resources
14:18:14 ... - Computation resources
14:18:59 anssik: and these WebGPU Privacy considerations:
14:19:06 ... - Machine-specific limits
14:19:06 ... - Machine-specific artifacts
14:19:06 ... - Machine-specific performance
14:19:14 anssik: WebGL Security considerations:
14:19:23 ... - Resource Restrictions
14:19:23 ... - Origin Restrictions
14:19:23 ... - Supported GLSL Constructs
14:19:23 ... - Defense Against Denial of Service
14:19:24 ... - Out-of-Range Array Accesses
14:19:39 anssik: Maybe Rafael has some comments from the WebGL side?
14:19:54 RafaelCintron: Both groups take this very seriously.
14:20:11 ... In WebGL you can only use same-origin or CORS textures, to prevent people using timing attacks.
14:20:22 ... Many, many years of work based on security research feedback.
14:20:34 ... Timer queries are made less precise to avoid fingerprinting.
14:20:43 ... WebGL tried to make many undefined things defined,
14:21:08 ... so behaviour is consistent across browsers.
14:21:39 ... This helps not just security, but also debuggability.
14:22:54 ... It has been a group effort for WebGPU/GL to come up with their Security and Privacy Considerations.
14:23:59 anssik: Questions, comments?
14:24:08 Topic: Operation-specific APIs proposal
14:24:36 anssik: Let's continue our favourite topic, and discuss design considerations and review proposed solutions to enable both efficient graph execution and imperative eager execution with a cohesive WebNN API.
14:24:50 -> https://github.com/webmachinelearning/webnn/pull/166 Support download data asynchronously (PR #166)
14:25:04 anssik: I'll let Chai and Ningxin update us on the status of this PR.
14:25:38 https://github.com/webmachinelearning/webnn/issues/156#issuecomment-846828170
14:25:40 ningxin_hu: I have a prototype to better understand this issue.
14:26:07 ... This follows up on Jonathan's and Ping's requirement to allow a conv2d implementation for the TF.js Wasm backend.
14:27:37 q+
14:27:44 ... The implementation is in conv2d_impl.cc and the WebNN calls are guarded by USE_WEBNN_OP. With the prototype, I observed a good performance speedup (3X to 5X) in a tf.conv2d benchmark when offloading the compute to a native library (such as XNNPACK or oneDNN).
14:28:55 ... It uses the same op cache for later use: when the cache hits, a graph is fetched and compute runs on inputs and outputs allocated by TF.js.
14:29:41 ... Two backends were used, XNNPACK and oneDNN.
14:30:05 ... The webnn-native project was used for this prototyping.
14:30:28 ... Observations:
14:30:45 ... 1) TF.js Wasm backend expects the input and output data of an op execution to be in standard layout.
14:31:07 ... 2) TF.js Wasm backend pre-allocates input and output buffers for an op execution.
14:31:20 ... 3) TF.js Wasm backend executes an op synchronously.
14:31:24 q?
14:31:58 ningxin_hu: The GraphBuilder API is implemented as a sync API to satisfy the TF.js backend requirement.
14:32:26 ack RafaelCintron
14:33:14 RafaelCintron: The biggest finding is that using real native code is faster than the best Wasm you can have today; this gives a lot of hope that WebNN is faster than the alternatives.
14:33:45 q+
14:34:00 ack Ping_Yu
14:34:27 Ping_Yu: Thanks Ningxin for making WebNN work with the TF.js Wasm backend.
14:34:57 ... I want to understand where the performance comes from; my understanding is it is due to wider SIMD and oneDNN optimizations?
14:35:26 ... Historically the Wasm backend is in sync mode; for other backends there's no sync download.
14:35:36 [line breaking, typos expected in scribing]
14:37:13 ningxin_hu: According to my investigations, the performance gain comes from wider SIMD via XNNPACK -- 256-bit wide AVX instructions on my dev machine, while Wasm SIMD is only 128-bit wide today.
14:37:48 q+
14:37:56 ... oneDNN uses an even more aggressive optimization strategy: it reorders not only the width but also the input and output into a layout optimized for vector instructions. This is platform specific; different architectures may use different layouts for better performance.
14:38:04 q?
14:38:07 ack RafaelCintron
14:38:32 RafaelCintron: There's a difference between an API being sync and returning objects right away.
14:38:37 q+
14:39:16 Is Wasm SIMD going to get wider instructions eventually (256 vs 128)?
14:39:25 ... If they're backed by GPU that's possible, so it's OK for me to have WebNN compute return objects right away, and if you want to read back you have to do that via an async promise-returning object.
14:40:10 q?
14:40:28 ack Chai
14:40:50 Chai: Specifically on the PR itself, I've summarized it in my more recent comment.
14:41:04 ... I want to double-check what Ningxin responded lately.
14:41:33 ... Are you saying that based on your prototype it doesn't matter if the API returns native format?
14:42:31 ... If we're going to support native format, there's a fingerprinting concern, so it would be good for the group to clarify its position on native format; this overlaps a bit with readback, which can happen in standard format.
14:43:13 ... Sync vs. async is a question; we should talk through the pros/cons of both and settle on one design. If we support both it'll be harder for implementers,
14:43:24 ... and if we do both it'll be more confusing for the caller.
14:43:28 q+
14:44:02 ack ningxin_hu
14:44:35 ningxin_hu: In this PR we discuss native format support; my latest response covers the investigation into the TF.js backend, which is a major use case I think.
14:45:00 ... It turns out TF.js expects input and output in standard layout; based on that we can leave native format to a separate issue,
14:45:15 ... to be handled in a future version.
14:45:20 +1 on standard format only in V1
14:46:10 ningxin_hu: I can revert the MLTensor proposal that supports format conversion, because that is not needed for this V1 spec.
14:46:48 ... To close the op-specific use case, I propose we align with what Chai proposed: make the compute API only support pre-allocated output, which is good for GPU resources.
14:46:59 ... This is documented in my comment.
14:47:00 https://github.com/webmachinelearning/webnn/pull/166#discussion_r637792027
14:47:15 ... Secondly, we'll leave native format to a separate issue to be addressed in the future.
14:47:45 ... Third is to add support for sync versions of the build and compute APIs, keeping the async versions too.
14:48:39 q+
14:48:46 anssik: Anyone have concerns with Ningxin's/Chai's proposal?
14:48:51 ack RafaelCintron
14:49:06 RafaelCintron: Is there agreement that we're going to have both sync and async?
14:49:30 ... Is async meant for CPU ArrayBuffer usage?
14:49:42 ningxin_hu: I think the agreement is for the compute API to not return output.
14:49:57 ... Previously we only had compute accept input and pre-allocated output.
14:50:09 ... This is one proposal, an agreement between Chai and me.
14:50:55 Chai: The previous API implies the implementer needs to manage the output buffer.
14:51:01 ... A separate issue is sync vs. async.
14:51:21 ... My preference is for async, making it simpler for the implementer and the caller.
14:52:01 ... A WebNN context can be created off an explicit device or a context; in the latter case the implementation can create a GPU device under the hood, hiding platform details from the caller fully.
14:53:35 ... This gives additional flexibility to the caller, but if the resource is given as an ArrayBuffer, it implies the caller wants the buffer to be on the CPU.
14:54:45 ... If we have both async and sync APIs we have to document both carefully.
14:55:35 anssik: Can you live with both sync and async APIs?
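[Scribe note: the compute shape under discussion -- caller pre-allocates output buffers, compute fills them and returns no new objects, with sync and async variants -- can be sketched as a toy model. This is NOT the WebNN API; all names below (ToyGraph, computeSync, computeAsync) are illustrative only.]

```javascript
// Toy model of the proposed contract: the caller owns the input and output
// buffers (as the TF.js Wasm backend does), compute writes into them in
// place, and nothing is returned. An async variant keeps the same contract
// behind a Promise, so a GPU-backed implementation could resolve after
// readback completes.
class ToyGraph {
  constructor(fn) {
    this.fn = fn; // the "compiled" computation
  }

  // Sync variant: runs in place on caller-allocated typed arrays.
  computeSync(inputs, outputs) {
    this.fn(inputs, outputs); // writes into outputs; returns nothing
  }

  // Async variant: same pre-allocated-output contract, Promise-based.
  async computeAsync(inputs, outputs) {
    this.fn(inputs, outputs);
  }
}

// Example "graph": c = a + b, element-wise.
const addGraph = new ToyGraph((inputs, outputs) => {
  const { a, b } = inputs;
  const { c } = outputs;
  for (let i = 0; i < c.length; ++i) c[i] = a[i] + b[i];
});

// The caller pre-allocates every buffer, including the output.
const a = Float32Array.from([1, 2, 3]);
const b = Float32Array.from([10, 20, 30]);
const c = new Float32Array(3);
addGraph.computeSync({ a, b }, { c });
```

[The design point is that because the implementation never allocates or returns an output object, the same shape works whether the buffers live on CPU or GPU.]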
14:55:38 q+
14:56:29 sgtm
14:56:35 ack RafaelCintron
14:57:43 RafaelCintron: If we must have an async version, perhaps we can have it be strongly typed to only take ArrayBuffers and return them.
14:57:54 +1 on async only for CPU buffer output
14:58:11 Topic: Model Loader API update
14:58:16 anssik: Next, we'll hear a Model Loader API update from Jonathan. Two topics:
14:58:21 ... 1. Chrome OS Origin Trial plans
14:58:28 ... 2. Wasm runner for TF Lite and how it could possibly be used for WebNN benchmarking
14:59:00 Jonathan: ChromeOS has staffing for doing work on the Model Loader API,
14:59:10 ... starting with the spec in this CG.
14:59:25 ... Getting to Origin Trial in 1-2 quarters.
14:59:44 ... The plan is to be able to run TF Lite models, understanding that TF Lite models are not suitable for a web standard,
15:00:09 ... but we want to use that as a starting point to be able to benchmark against WebNN and Wasm.
15:00:26 ... At Google I/O we announced work by Ping et al., a Wasm runner for TF Lite models.
15:00:52 ... Similar to Model Loader, it takes a TF Lite model and runs it with Wasm.
15:01:16 ... There's potential for this work to become a polyfill for a graph API or the Model Loader API, if we make it more generic; a possible future generation.
15:01:37 q?
15:01:48 RRSAgent, draft minutes
15:01:48 I have made the request to generate https://www.w3.org/2021/05/27-webmachinelearning-minutes.html anssik
15:01:49 q+
15:01:57 ack ningxin_hu
15:02:13 ningxin_hu: On the second one, you mentioned the TF Lite Wasm version; it is interesting.
15:02:43 ... For WebNN benchmarking, do you mean we'd have a WebNN backend for TF Lite Wasm, similarly to the TF.js backend?
15:02:54 Jonathan: That's a potential direction; you need to talk to Ping about that.
15:03:21 ... From our side, the TF team wants to do some benchmarking. One way is the TF Lite Wasm runner; another is WebNN, parsing the model and constructing the graph.
15:03:36 ... It could be useful for this group.
15:04:19 ningxin_hu: Regarding our explainer, we target framework usage, and the TF Lite Wasm runner fits into that. I'm interested in investigating this and will chat with you offline.
15:04:56 https://www.youtube.com/watch?v=5q8BzYN4rqA
15:05:05 q?
15:05:24 RRSAgent, draft minutes
15:05:24 I have made the request to generate https://www.w3.org/2021/05/27-webmachinelearning-minutes.html anssik
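
[Scribe note: item 1 of Ningxin's Wasm SIMD summary above -- standardized saturation and round-to-nearest ties-to-even -- can be illustrated with a minimal sketch. This is plain JS written for these minutes, not Wasm or browser code; the function names are ours.]

```javascript
// Round-to-nearest with ties-to-even: halfway cases go to the even
// neighbor (0.5 -> 0, 1.5 -> 2, 2.5 -> 2), instead of always rounding up.
function roundTiesToEven(x) {
  const lower = Math.floor(x);
  const frac = x - lower;
  if (frac < 0.5) return lower;
  if (frac > 0.5) return lower + 1;
  // Exactly halfway: pick whichever neighbor is even.
  return lower % 2 === 0 ? lower : lower + 1;
}

// Saturating conversion to int8: out-of-range values clamp to the type's
// limits rather than wrapping around, as Wasm SIMD specifies for its
// saturating narrowing/conversion instructions.
function saturateToInt8(x) {
  return Math.min(127, Math.max(-128, roundTiesToEven(x)));
}
```

[Because both behaviors are fully specified, every architecture must produce identical results, which is why this part of Wasm SIMD does not add a fingerprinting surface -- the concern discussed above applies to the Relaxed SIMD proposal, where such determinism may be relaxed.]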