13:58:11 RRSAgent has joined #webmachinelearning
13:58:11 logging to https://www.w3.org/2021/05/13-webmachinelearning-irc
13:58:12 inviting RRSAgent
13:58:14 RRSAgent, make logs Public
13:58:14 please title this meeting ("meeting: ..."), anssik
13:58:36 Meeting: WebML CG Teleconference – 13 May 2021
13:58:40 Chair: Anssi
13:58:59 Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2021-05-13-agenda.md
13:59:03 Scribe: Anssi
13:59:07 scribeNick: anssik
13:59:29 Present+ Anssi_Kostiainen
13:59:41 Present+ Rafael_Cintron
14:00:02 Present+ Ganesan_Ramalingam
14:00:20 Present+ Sandeep_Gupta
14:00:25 Present+ Ningxin_Hu
14:00:37 Present+ Chai_Chaoweeraprasit
14:00:40 ningxin_hu has joined #webmachinelearning
14:00:49 RRSAgent, draft minutes
14:00:49 I have made the request to generate https://www.w3.org/2021/05/13-webmachinelearning-minutes.html anssik
14:01:10 rama has joined #webmachinelearning
14:01:30 Topic: TAG review
14:01:42 chai has joined #webmachinelearning
14:02:00 Sandeep_Gupta has joined #webmachinelearning
14:02:01 anssik: let's review the remaining open TAG review issues for new information and thoughts
14:02:16 Geun-Hyung has joined #webmachinelearning
14:02:19 Present+ Geun-Hyung_Kim
14:02:40 Subtopic: [tag-review] Define a common term for logical tensor changes?
14:02:47 -> https://github.com/webmachinelearning/webnn/issues/150 issue #150
14:02:54 RafaelCintron has joined #webmachinelearning
14:03:03 anssik: since we last looked at this, the issue has received a TAG clarification
14:03:17 Geun-Hyung_ has joined #webmachinelearning
14:03:18 ... TAG says: "Looking at this PR, wouldn't it make sense to define a common term for logical tensor changes (e.g. views?) somewhere early in the document so that the concept can be re-used?"
14:03:24 ... and TAG clarified that "tensor changes" means cases where it still refers to the same tensor, but has "changed" from the caller's point of view. One example would be a non-copying reshape, another would be a transpose.
14:03:24 present
14:03:31 present+
14:03:51 anssik: any reactions or suggestions on how we'd like to respond?
14:04:33 q?
14:04:39 q+
14:04:43 ack rama
14:05:00 Rama: it seems this should be considered mostly an implementation detail
14:05:23 ... even if transpose changes the data, reshape gives an option ...
14:05:36 ... it is the implementation's responsibility to eliminate unnecessary copies
14:05:57 ... they're distinct values from the caller's point of view
14:06:14 +1, it sounds like an implementation detail
14:06:24 +1
14:06:55 anssik: I suggest we add a note to the spec to clarify our design direction
14:08:06 can we simply clarify this on the issue?
14:09:48 anssik: Rama, can you work with Chai to propose a resolution in issue #150?
14:09:56 Subtopic: [tag-review] Isomorphic JS story, worker scope exposure?
14:10:03 -> https://github.com/webmachinelearning/webnn/issues/142 issue #142
14:10:09 -> https://github.com/webmachinelearning/webnn/pull/163 PR #163
14:10:14 anssik: Ningxin submitted PR #163 to address this issue, some reviews pending from Chai and Ping
14:11:33 ningxin_hu: this PR exposes MLContext, MLOperand etc. to DedicatedWorker in addition to Window
14:12:03 ... this is good because, from a use cases perspective, some Wasm-based ML frameworks would like to run their Wasm module in a Web Worker
14:12:20 ... if we can expose this in the worker, it'll help the lib to access hardware acceleration
14:12:42 ... this avoids blocking the main thread, given a sync API implementation running in the worker with message-passing communication
14:12:58 ... this change makes a sync Wasm lib implementation feasible
14:13:28 ... after this change we can also think about how to address that use case, i.e. allow a Wasm JS lib in a worker to use this API
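[Scribe's note: a minimal sketch of the worker scenario, assuming the draft API names of the day (navigator.ml.createContext(), MLGraphBuilder, async build()/compute()); the worker file name and the one-op graph are illustrative only.]

    // main.js: the page offloads ML work to a dedicated worker
    const worker = new Worker('ml-worker.js');
    worker.postMessage(new Float32Array([1, -2, 3, -4]));
    worker.onmessage = (e) => console.log('result:', e.data);

    // ml-worker.js: with PR #163, navigator.ml and the ML* interfaces are
    // exposed in DedicatedWorkerGlobalScope, so a (e.g. Wasm-based) framework
    // can build and run graphs here without touching the main thread
    self.onmessage = async (e) => {
      const context = navigator.ml.createContext();
      const builder = new MLGraphBuilder(context);
      const x = builder.input('x', {type: 'float32', dimensions: [2, 2]});
      const graph = await builder.build({y: builder.relu(x)});
      const outputs = await graph.compute({x: {data: e.data}});
      self.postMessage(outputs.y.data);
    };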
14:14:04 anssik: PR #163 makes it feasible to introduce sync variants of compile()/build() and compute() in the worker context without blocking the main thread?
14:14:49 [step away from the keyboard]
14:16:03 ningxin_hu: a subsequent change would be a sync API for DedicatedWorker; per my investigation the Wasm libs use a sync programming model, so the WebNN API needs to cater for that, and in a worker context we could add such a sync API without blocking
14:16:24 q?
14:17:06 RafaelCintron: PR #163 LGTM
14:17:33 [back now]
14:17:56 propose to add Rafael into reviewers
14:19:17 q?
14:19:24 q+
14:19:28 ack ningxin_hu
14:19:39 jonathan_ has joined #webmachinelearning
14:19:57 ningxin_hu: given Rafael has given a lot of great feedback, I propose Rafael be added to Collaborators so we can request his review explicitly
14:20:21 Subtopic: [tag-review] Ergonomics of the JS examples
14:20:26 -> https://github.com/webmachinelearning/webnn/issues/139 issue #139
14:21:04 anssik: I recall Sangwhan wanted to provide some concrete suggestions for this issue. I believe he's been busy, so I'll ping him again and resolve this by our next call at the latest, OK?
14:21:44 q?
14:21:47 Subtopic: [tag-review] String enum for activations
14:21:54 -> https://github.com/webmachinelearning/webnn/issues/138 issue #138
14:22:07 anssik: last time around we discussed the pros and cons of the "failure is an option" pattern
14:22:31 ... TAG suggests future-proofing via raising errors when the underlying hardware does not support a particular activation
14:22:38 ... TAG says: "The reason why I think this pattern might be better is because it discourages preemptively implementing different code branches (e.g. if accelerator is A at the time of implementation, ossify model to the capabilities of A at the time being) like how user agent based branching is abused as of today."
14:23:21 q+
14:23:36 ... Chai notes error handling complicates the API caller's code, and makes perf unpredictable
14:23:38 ack chai
14:24:01 Chai: I think the topic is being discussed in the issue, I summarized my thoughts there
14:24:09 ... failure is indeed an option, I do not object to that
14:24:26 ... my concern is more about the API usability
14:24:57 ... when we designed the op API we said we want to define the smallest possible ops so bigger networks can be composed of smaller ones, if that makes sense from the OS side
14:25:55 ... also, some networks like RNNs etc. are real networks, not just ops; those are already supported by existing OSes and we want to optimize for those in this API, so to us the GRU is a perf shortcut
14:26:51 ... it is hard to find hardware that supports RNN differently; the concern with overloading an API with every activation, 20+ of them, is that you risk defining an API where, over time, the caller does not know whether a call will succeed, and the caller will stop using the API
14:28:02 +1 to Chai's points
14:28:11 ... we want all the known cases where support is universal, e.g. GRU with sigmoid, to work; this will be done by the framework: when it loads a model it'll convert it into WebNN, which is considered a backend; by overloading with every possible activation the API will fail randomly and we risk people not using it because it won't be reliable
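[Scribe's note: an illustrative sketch of the trade-off under discussion, assuming the gru() signature, shapes, and activations option from the spec draft; the defensive try/catch branch is hypothetical, it illustrates the pattern Chai argues against rather than a resolved design.]

    const context = navigator.ml.createContext();
    const builder = new MLGraphBuilder(context);
    const steps = 2, batch = 1, inputSize = 3, hiddenSize = 4, dirs = 1;
    const x = builder.input('x',
        {type: 'float32', dimensions: [steps, batch, inputSize]});
    const w = builder.constant(
        {type: 'float32', dimensions: [dirs, 3 * hiddenSize, inputSize]},
        new Float32Array(dirs * 3 * hiddenSize * inputSize));
    const r = builder.constant(
        {type: 'float32', dimensions: [dirs, 3 * hiddenSize, hiddenSize]},
        new Float32Array(dirs * 3 * hiddenSize * hiddenSize));
    let out;
    try {
      // with a 20+ member activation enum, an exotic name could fail at
      // build time on hardware that lacks a fused implementation...
      out = builder.gru(x, w, r, steps, hiddenSize,
                        {activations: ['sigmoid', 'tanh']});
    } catch (e) {
      // ...forcing callers into fallback branches, the unpredictability
      // Chai warns about (e.g. compose the GRU cell from primitive ops)
      out = null;
    }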
14:28:40 +1
14:28:47 +1
14:30:18 anssik: suggesting this design principle be added to the spec as a note
14:30:25 chai: +1 I'll take care of that
14:31:26 Chai: is it unusual to have a FAQ section in the spec?
14:32:23 anssik: off the top of my head, not many specs have a FAQ
14:34:41 q?
14:34:44 Topic: Privacy review
14:34:50 Subtopic: [privacy-tracker] Self-Review Questionnaire: Security and Privacy
14:35:02 -> https://github.com/webmachinelearning/webnn/issues/119 Issue #119
14:35:07 anssik: issue #119 tracks our responses to the Self-Review Questionnaire for Security and Privacy
14:35:17 ... we should close this issue from our end once we've addressed all actionable feedback.
14:35:28 ... we did already merge PR #159 that addressed issue #145 and made the WebNN API a policy-controlled feature
14:36:01 ... I have opened issue #122 to add a Security and privacy considerations section to the spec and incorporate into this section the PING feedback that does not suggest normative changes
14:36:15 -> https://github.com/webmachinelearning/webnn/issues/122 Issue #122
14:36:39 q?
14:36:54 Subtopic: [privacy-tracker] Fingerprinting via matmul
14:37:08 anssik: another specific issue flagged by PING is about fingerprinting via matmul
14:37:15 -> https://github.com/webmachinelearning/webnn/issues/85 Issue #85
14:37:33 anssik: the author of this issue, Kenneth Heafield from the University of Edinburgh, works on Natural Language Processing; he also presented at our workshop on privacy-focused machine translation in Firefox
14:37:37 -> https://www.w3.org/2020/06/machine-learning-workshop/talks/privacy_focused_machine_translation_in_firefox.html Kenneth's workshop presentation
14:37:56 anssik: as part of his implementation work, he reported that an efficient matmul implementation can be fingerprinted to determine hardware capabilities.
14:38:11 ... as you know, fingerprinting is a substantial concern on the web platform, and PING has produced an entire document discussing this issue
14:38:16 -> https://www.w3.org/TR/fingerprinting-guidance/ Mitigating Browser Fingerprinting in Web Specifications
14:39:23 anssik: the proposed course of action for the group is to review this issue, study the PING fingerprinting mitigation doc, and document the issue and its proposed mitigations in the Privacy and Security section
14:39:30 ... does this sound reasonable?
14:39:58 q?
14:41:26 ningxin_hu: I haven't looked at this so far; I think this is related to CPU instructions, perhaps relevant to the Wasm group as well?
14:41:51 Present+ Jonathan_Bingham
14:42:11 ningxin_hu: I can talk to some Intel folks working on Wasm and report back to the group
14:42:23 q?
14:42:51 Topic: Operation-specific APIs proposal
14:43:00 anssik: We keep on making progress on speccing features that satisfy the requirements of the operation-specific APIs proposal
14:43:04 Subtopic: Add support for device selection
14:43:09 -> https://github.com/webmachinelearning/webnn/pull/162 PR #162 (merged)
14:43:18 anssik: Chai, great work rescoping this PR and getting it landed
14:43:26 ... Chai, want to give a brief on what new features were introduced in this PR?
14:43:37 anssik: From the PR log:
14:43:41 ... - Add a device preference in the context options in addition to the power preference. This change allows the app to have more control over what type of execution device should be used. It enables an effective interop with WASM (#156).
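[Scribe's note: a minimal sketch of the merged context option, assuming the option and enum names in PR #162; only devicePreference is new.]

    // devicePreference joins powerPreference in the context options,
    // e.g. pinning execution to the CPU for the Wasm interop scenario (#156)
    const context = navigator.ml.createContext({
      devicePreference: 'cpu',       // assumed values: 'default' | 'gpu' | 'cpu'
      powerPreference: 'low-power'   // pre-existing option
    });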
14:43:51 Chai: PR rescoped to just device selection
14:44:09 ... need a new separate issue for accelerator
14:45:14 +1
14:46:36 https://github.com/webmachinelearning/webnn/issues/169
14:47:05 Subtopic: Asynchronous data download
14:47:14 anssik: no separate issue for this, discussed in PR #162
14:47:19 -> https://github.com/webmachinelearning/webnn/pull/166 PR #166
14:47:26 anssik: Ningxin submitted a PR, in review
14:47:39 ... Ningxin, please introduce these suggested changes and key opens
14:48:10 Ningxin: changes in this PR (follow-up for #162):
14:48:22 ... - Introduce an MLTensor interface that supports downloading data asynchronously.
14:48:25 ... - Add an overloaded version of MLGraph.compute that returns MLTensor synchronously.
14:48:30 ... - Add a new example that computes multiple graphs without downloading the intermediate results.
14:49:58 ningxin_hu: if you look at today's compute API, one usage is to compute with preallocated outputs: provide MLOutputs with bound buffers, i.e. supply both input and output; another usage is to get a newly allocated buffer from the browser: the caller provides only inputs and gets the outputs in a promise resolution; that's the current API
14:50:17 ... in this PR the preallocated usage is not changed, still via MLInput and MLOutput, you provide both when calling compute()
14:50:37 ... the newly allocated buffer case uses MLTensor to replace MLOutput, which is a dictionary
14:50:53 ... MLTensor is an interface; it can download to a buffer and return it async
14:51:14 ... for this usage compute() is a sync API, the user gets an MLTensor immediately upon compute()
14:51:35 ... also added a use case sample for multiple ops executing in sequence
14:51:45 ... new example 5 illustrates this use case
14:52:39 ... the code creates three different single-op conv graphs; the JS code executes these three in sequence and uses the output MLTensor as input to the next one, issuing three computes in sequence and only accessing the final output at the last step in an async way
14:53:03 ... this can satisfy the usage required; the preallocated buffer usage we already support
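[Scribe's note: a sketch of the two calling patterns in PR #166, assuming MLInput/MLOutput bind ArrayBufferViews via a data member and that MLTensor exposes an async data() download; exact names are still under review, and the single-output shape is illustrative.]

    // given an MLGraph `graph` with input 'x' and output 'y'
    const input = new Float32Array([1, 2, 3, 4]);

    // usage 1 (unchanged): caller preallocates and binds both buffers
    const out = new Float32Array(4);
    await graph.compute({x: {data: input}}, {y: {data: out}});

    // usage 2 (new): compute() returns an MLTensor synchronously;
    // the actual download happens only when the caller asks for the data
    const tensor = graph.compute({x: {data: input}});
    const result = await tensor.data();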
14:53:08 ... as for opens:
14:53:29 ... it looks like this PR makes the caller decide which API to use based on the underlying implementation?
14:53:56 ... I suppose that's not the case; the caller chooses based on usage, MLTensor for newly returned buffers
14:54:25 q+
14:54:35 ... as for the implementation, if this feature is merged, the implementation should support both
14:54:47 ... I think the DML backend should be able to support these two usages
14:54:53 ack chai
14:55:21 Chai: thanks ningxin_hu! This is an issue moved from the device selection PR to this PR
14:55:50 ... when there's disagreement on how the API is exposed, we should reexamine the assumptions
14:56:03 ... e.g. the caller having to know beforehand what the implementation is
14:56:39 ... we should look at what is the exact problem we want to solve
14:57:06 ... the caller wants to use WebNN in a way where they can stream multiple compiled graphs in a sequence
14:57:55 ... the caller can compile graphs of one op each, and then stream them in sequence
14:59:03 ... the special case is to not only stream but to take control in between the two graphs and pass intermediate data; the implementer of WebNN needs to allow a case where an output can be produced that no one can understand, because the contract allows such data to be passed between the graphs
14:59:42 ... even if the desire is noble, if we can hide the implementation details from the caller, we let the caller move data on their own without looking at it, until the very end when the data is revealed
15:00:12 ... not super familiar with Wasm, but I believe this is how you can make a Wasm backend run faster; it's an additional design requirement inserted here
15:00:42 ... this is fundamentally changing the initial assumptions
15:02:00 ... OS or browser layout does not matter, the output will be in this layout -- that is fundamental to the current design; even after you compile and execute, the layout is impl-specific, but only at the last point do you convert it to something that can be understood -- this is hard for implementations, since they need to defer until the caller wants to download the data
15:02:13 ... that is the core of this discussion in this PR
15:02:30 ... this is a design request
15:02:31 q?
15:03:37 Chai: are we OK to change the API fundamentals this way? if we accept that, then whatever you read in the op spec you should assume is not true, because conv can produce whatever it wants to produce, and the only method that matters is the final method that says download this data
15:03:46 ... we need to discuss this point more than the specifics of the PR
15:04:33 q+
15:04:36 q?
15:04:38 ack ningxin_hu
15:05:08 ningxin_hu: I understand what Chai says; I think this PR does not change the layout of op outputs
15:05:30 ... when the user accesses any graph output data, the user will always get the data in the layout defined by the spec
15:06:17 ... if the impl uses the standard layout that's OK; the user code does not know whether an internal layout is used, the surface of the API is always in the layout defined by the spec
15:06:31 ... I don't understand why this breaks the fundamental assumptions?
15:06:50 Chai: the reason is that we are in this PR proposing overloads, two ways to call compute
15:07:01 ... depending on which compute overload you call
15:08:22 Chai: for the GPU case one overload would be inefficient; here the two overloads are different things
15:09:07 ... the specific point re DML is that for the API to work well, you want to give it a resource to produce the output with
15:09:23 ... for any GPU calls you provide the output buffer ahead of time and reuse it again and again
15:09:44 ... the first calling pattern does not allow that to happen, thus the second pattern is introduced
15:10:21 ... do we want to allow the first calling pattern to exist? just raising this issue, we have to be OK that an op that has been compiled can produce something that's internal
15:10:39 q?
15:11:21 ningxin_hu: Chai made two points, 1) the overloaded compute()
15:11:45 ... I agree the return is different; in this PR we return MLTensor and access data via .data
15:12:03 ... for the same two usages, even the current spec has optional MLOutputs as the 2nd param
15:12:25 ... and we return MLOutput in a promise, so we have two usages for the preallocated buffer
15:12:37 ... these two ways to allocate buffers exist today
15:13:14 ... 2) in the PR, the second usage with an internally allocated buffer would also be helpful for the GPU case: if the device is a GPU but MLInput is bound to an ArrayBufferView
15:14:29 ... with MLTensor as an output, we can just use the MLTensor from the first op as input to the second; the data can stay in a GPUBuffer without downloading back to the CPU
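[Scribe's note: a sketch of the example-5 style chaining under discussion, assuming an MLTensor can be bound directly as a graph input and downloaded via an async data() method; the graph names are illustrative.]

    // three single-op conv graphs run back to back; the intermediate
    // MLTensors are handed from one graph to the next without being
    // downloaded, so the data can stay on the device (e.g. in a GPUBuffer)
    const t1 = conv1Graph.compute({x: {data: inputArray}});
    const t2 = conv2Graph.compute({x: t1});
    const t3 = conv3Graph.compute({x: t2});
    const final = await t3.data();  // only the last result crosses to the CPU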
15:14:35 q?
15:15:09 -> https://github.com/webmachinelearning/webnn/issues/156 Support CPU - WebAssembly scenario of the op level execution use case #156
15:15:57 Chai: let's continue in https://github.com/webmachinelearning/webnn/pull/166
15:18:02 RRSAgent, draft minutes
15:18:02 I have made the request to generate https://www.w3.org/2021/05/13-webmachinelearning-minutes.html anssik