13:58:11 RRSAgent has joined #webmachinelearning
13:58:11 logging to https://www.w3.org/2021/05/13-webmachinelearning-irc
13:58:12 inviting RRSAgent
13:58:14 RRSAgent, make logs Public
13:58:14 please title this meeting ("meeting: ..."), anssik
13:58:36 Meeting: WebML CG Teleconference – 13 May 2021
13:58:40 Chair: Anssi
13:58:59 Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2021-05-13-agenda.md
13:59:03 Scribe: Anssi
13:59:07 scribeNick: anssik
13:59:29 Present+ Anssi_Kostiainen
13:59:41 Present+ Rafael_Cintron
14:00:02 Present+ Ganesan_Ramalingam
14:00:20 Present+ Sandeep_Gupta
14:00:25 Present+ Ningxin_Hu
14:00:37 Present+ Chai_Chaoweeraprasit
14:00:40 ningxin_hu has joined #webmachinelearning
14:00:49 RRSAgent, draft minutes
14:00:49 I have made the request to generate https://www.w3.org/2021/05/13-webmachinelearning-minutes.html anssik
14:01:10 rama has joined #webmachinelearning
14:01:30 Topic: TAG review
14:01:42 chai has joined #webmachinelearning
14:02:00 Sandeep_Gupta has joined #webmachinelearning
14:02:01 anssik: let's review the remaining open TAG review issues for new information and thoughts
14:02:16 Geun-Hyung has joined #webmachinelearning
14:02:19 Present+ Geun-Hyung_Kim
14:02:40 Subtopic: [tag-review] Define a common term for logical tensor changes?
14:02:47 -> https://github.com/webmachinelearning/webnn/issues/150 issue #150
14:02:54 RafaelCintron has joined #webmachinelearning
14:03:03 anssik: since we last looked at this, the issue has received a TAG clarification
14:03:17 Geun-Hyung_ has joined #webmachinelearning
14:03:18 ... TAG says: "Looking at this PR, wouldn't it make sense to define a common term for logical tensor changes (e.g. views?) somewhere early in the document so that the concept can be re-used?"
14:03:24 ... and TAG clarified that "tensor changes" means cases where it still refers to the same tensor, but has "changed" from the caller's point of view. One example would be a non-copying reshape, another would be a transpose.
14:03:24 present
14:03:31 present+
14:03:51 anssik: any reactions or suggestions on how we'd like to respond?
14:04:33 q?
14:04:39 q+
14:04:43 ack rama
14:05:00 Rama: it seems this should be considered mostly an implementation detail
14:05:23 ... even if transpose changes the data, reshape gives an option ...
14:05:36 ... it is the implementation's responsibility to eliminate unnecessary copies
14:05:57 ... they're distinct values from the caller's point of view
14:06:14 +1, it sounds like an implementation detail
14:06:24 +1
14:06:55 anssik: I suggest we add a note to the spec to clarify our design direction
14:08:06 can we simply clarify this on the issue?
14:09:48 anssik: Rama, can you work with Chai to propose a resolution in issue #150?
14:09:56 Subtopic: [tag-review] Isomorphic JS story, worker scope exposure?
14:10:03 -> https://github.com/webmachinelearning/webnn/issues/142 issue #142
14:10:09 -> https://github.com/webmachinelearning/webnn/pull/163 PR #163
14:10:14 anssik: Ningxin submitted PR #163 to address this issue, some reviews pending from Chai and Ping
14:11:33 ningxin_hu: this PR exposes MLContext, MLOperand etc. to DedicatedWorker in addition to Window
14:12:03 ... this is good because, from a use cases perspective, some Wasm-based ML frameworks would like to run their Wasm module in a Web Worker
14:12:20 ... if we can expose this in the worker, it'll help the lib to access hardware acceleration
14:12:42 ... this avoids blocking the main thread, given a sync API implementation running in the worker with message-passing communication
14:12:58 ... this change makes a sync Wasm lib implementation feasible
14:13:28 ... after this change we can also think about how to address that use case, i.e. allow a Wasm JS lib in a worker to use this API
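[Scribe's note: a minimal sketch of the worker scenario, assuming the draft API names of the day (navigator.ml.createContext(), MLGraphBuilder, async build()/compute()); the worker file name and the one-op graph are illustrative only.]

    // main.js: the page offloads ML work to a dedicated worker
    const worker = new Worker('ml-worker.js');
    worker.postMessage(new Float32Array([1, -2, 3, -4]));
    worker.onmessage = (e) => console.log('result:', e.data);

    // ml-worker.js: with PR #163, navigator.ml and the ML* interfaces are
    // exposed in DedicatedWorkerGlobalScope, so a (e.g. Wasm-based) framework
    // can build and run graphs here without touching the main thread
    self.onmessage = async (e) => {
      const context = navigator.ml.createContext();
      const builder = new MLGraphBuilder(context);
      const x = builder.input('x', {type: 'float32', dimensions: [2, 2]});
      const graph = await builder.build({y: builder.relu(x)});
      const outputs = await graph.compute({x: {data: e.data}});
      self.postMessage(outputs.y.data);
    };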
14:14:04 anssik: PR #163 makes it feasible to introduce sync variants of compile()/build() and compute() in the worker context without blocking the main thread?
14:14:49 [step away from the keyboard]
14:16:03 ningxin_hu: a subsequent change would be a sync API for DedicatedWorker; per my investigation the Wasm libs use a sync programming model, so the WebNN API needs to cater for that, and in a worker context we could add such a sync API without blocking
14:16:24 q?
14:17:06 RafaelCintron: PR #163 LGTM
14:17:33 [back now]
14:17:56 propose to add Rafael into reviewers
14:19:17 q?
14:19:24 q+
14:19:28 ack ningxin_hu
14:19:39 jonathan_ has joined #webmachinelearning
14:19:57 ningxin_hu: given Rafael has given a lot of great feedback, I propose Rafael be added to Collaborators so we can request his review explicitly
14:20:21 Subtopic: [tag-review] Ergonomics of the JS examples
14:20:26 -> https://github.com/webmachinelearning/webnn/issues/139 issue #139
14:21:04 anssik: I recall Sangwhan wanted to provide some concrete suggestions for this issue. I believe he's been busy, so I'll ping him again and resolve this by our next call at the latest, OK?
14:21:44 q?
14:21:47 Subtopic: [tag-review] String enum for activations
14:21:54 -> https://github.com/webmachinelearning/webnn/issues/138 issue #138
14:22:07 anssik: last time around we discussed the pros and cons of the "failure is an option" pattern
14:22:31 ... TAG suggests future-proofing via raising errors when the underlying hardware does not support a particular activation
14:22:38 ... TAG says: "The reason why I think this pattern might be better is because it discourages preemptively implementing different code branches (e.g. if accelerator is A at the time of implementation, ossify model to the capabilities of A at the time being) like how user agent based branching is abused as of today."
14:23:21 q+
14:23:36 ... Chai notes error handling complicates the API caller's code, and makes perf unpredictable
14:23:38 ack chai
14:24:01 Chai: I think the topic is being discussed in the issue, I summarized my thoughts there
14:24:09 ... failure is indeed an option, I do not object to that
14:24:26 ... my concern is more about the API usability
14:24:57 ... when we designed the op API we said we want to define the smallest possible ops so bigger networks can be composed of smaller ones, if that makes sense from the OS side
14:25:55 ... also, some networks like RNNs etc. are real networks, not just ops; those are already supported by existing OSes and we want to optimize for those in this API, so to us the GRU is a perf shortcut
14:26:51 ... it is hard to find hardware that supports RNN differently; the concern with overloading an API with every activation, 20+ of them, is that you risk defining an API where, over time, the caller does not know whether a call will succeed, and the caller will stop using the API
14:28:02 +1 to Chai's points
14:28:11 ... we want all the known cases where support is universal, e.g. GRU with sigmoid, to work; this will be done by the framework: when it loads a model it'll convert it into WebNN, which is considered a backend; by overloading with every possible activation the API will fail randomly and we risk people not using it because it won't be reliable
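[Scribe's note: an illustrative sketch of the trade-off under discussion, assuming the gru() signature, shapes, and activations option from the spec draft; the defensive try/catch branch is hypothetical, it illustrates the pattern Chai argues against rather than a resolved design.]

    const context = navigator.ml.createContext();
    const builder = new MLGraphBuilder(context);
    const steps = 2, batch = 1, inputSize = 3, hiddenSize = 4, dirs = 1;
    const x = builder.input('x',
        {type: 'float32', dimensions: [steps, batch, inputSize]});
    const w = builder.constant(
        {type: 'float32', dimensions: [dirs, 3 * hiddenSize, inputSize]},
        new Float32Array(dirs * 3 * hiddenSize * inputSize));
    const r = builder.constant(
        {type: 'float32', dimensions: [dirs, 3 * hiddenSize, hiddenSize]},
        new Float32Array(dirs * 3 * hiddenSize * hiddenSize));
    let out;
    try {
      // with a 20+ member activation enum, an exotic name could fail at
      // build time on hardware that lacks a fused implementation...
      out = builder.gru(x, w, r, steps, hiddenSize,
                        {activations: ['sigmoid', 'tanh']});
    } catch (e) {
      // ...forcing callers into fallback branches, the unpredictability
      // Chai warns about (e.g. compose the GRU cell from primitive ops)
      out = null;
    }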
14:28:40 +1
14:28:47 +1
14:30:18 anssik: suggesting this design principle be added to the spec as a note
14:30:25 chai: +1 I'll take care of that
14:31:26 Chai: is it unusual to have a FAQ section in the spec?
14:32:23 anssik: off the top of my head, not many specs have a FAQ
14:34:41 q?
14:34:44 Topic: Privacy review
14:34:50 Subtopic: [privacy-tracker] Self-Review Questionnaire: Security and Privacy
14:35:02 -> https://github.com/webmachinelearning/webnn/issues/119 Issue #119
14:35:07 anssik: issue #119 tracks our responses to the Self-Review Questionnaire for Security and Privacy
14:35:17 ... we should close this issue from our end once we've addressed all actionable feedback.
14:35:28 ... we did already merge PR #159 that addressed issue #145 and made the WebNN API a policy-controlled feature
14:36:01 ... I have opened issue #122 to add a Security and privacy considerations section to the spec and incorporate into this section the PING feedback that does not suggest normative changes
14:36:15 -> https://github.com/webmachinelearning/webnn/issues/122 Issue #122
14:36:39 q?
14:36:54 Subtopic: [privacy-tracker] Fingerprinting via matmul
14:37:08 anssik: another specific issue flagged by PING is about fingerprinting via matmul
14:37:15 -> https://github.com/webmachinelearning/webnn/issues/85 Issue #85
14:37:33 anssik: the author of this issue, Kenneth Heafield from the University of Edinburgh, works on Natural Language Processing; he also presented at our workshop on privacy-focused machine translation in Firefox
14:37:37 -> https://www.w3.org/2020/06/machine-learning-workshop/talks/privacy_focused_machine_translation_in_firefox.html Kenneth's workshop presentation
14:37:56 anssik: as part of his implementation work, he reported that an efficient matmul implementation can be fingerprinted to determine hardware capabilities.
14:38:11 ... as you know, fingerprinting is a substantial concern on the web platform, and PING has produced an entire document discussing this issue
14:38:16 -> https://www.w3.org/TR/fingerprinting-guidance/ Mitigating Browser Fingerprinting in Web Specifications
14:39:23 anssik: the proposed course of action for the group is to review this issue, study the PING fingerprinting mitigation doc, and document the issue and its proposed mitigations in the Privacy and Security section
14:39:30 ... does this sound reasonable?
14:39:58 q?
14:41:26 ningxin_hu: I haven't looked at this so far; I think this is related to CPU instructions, perhaps relevant to the Wasm group as well?
14:41:51 Present+ Jonathan_Bingham
14:42:11 ningxin_hu: I can talk to some Intel folks working on Wasm and report back to the group
14:42:23 q?
14:42:51 Topic: Operation-specific APIs proposal
14:43:00 anssik: We keep on making progress on speccing features that satisfy the requirements of the operation-specific APIs proposal
14:43:04 Subtopic: Add support for device selection
14:43:09 -> https://github.com/webmachinelearning/webnn/pull/162 PR #162 (merged)
14:43:18 anssik: Chai, great work rescoping this PR and getting it landed
14:43:26 ... Chai, want to give a brief on what new features were introduced in this PR?
14:43:37 anssik: From the PR log:
14:43:41 ... - Add a device preference in the context options in addition to the power preference. This change allows the app to have more control over what type of execution device should be used. It enables an effective interop with WASM (#156).
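[Scribe's note: a minimal sketch of the merged context option, assuming the option and enum names in PR #162; only devicePreference is new.]

    // devicePreference joins powerPreference in the context options,
    // e.g. pinning execution to the CPU for the Wasm interop scenario (#156)
    const context = navigator.ml.createContext({
      devicePreference: 'cpu',       // assumed values: 'default' | 'gpu' | 'cpu'
      powerPreference: 'low-power'   // pre-existing option
    });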
14:43:51 Chai: PR rescoped to just device selection
14:44:09 ... need a new separate issue for accelerator
14:45:14 +1
14:46:36 https://github.com/webmachinelearning/webnn/issues/169
14:47:05 Subtopic: Asynchronous data download
14:47:14 anssik: no separate issue for this, discussed in PR #162
14:47:19 -> https://github.com/webmachinelearning/webnn/pull/166 PR #166
14:47:26 anssik: Ningxin submitted a PR, in review
14:47:39 ... Ningxin, please introduce these suggested changes and key opens
14:48:10 Ningxin: changes in this PR (follow-up for #162):
14:48:22 ... - Introduce an MLTensor interface that supports downloading data asynchronously.
14:48:25 ... - Add an overloaded version of MLGraph.compute that returns MLTensor synchronously.
14:48:30 ... - Add a new example that computes multiple graphs without downloading the intermediate results.
14:49:58 ningxin_hu: if you look at today's compute API, one usage is to compute with preallocated outputs: provide MLOutputs with bound buffers, i.e. supply both input and output; another usage is to get a newly allocated buffer from the browser: the caller provides only inputs and gets the outputs in a promise resolution; that's the current API
14:50:17 ... in this PR the preallocated usage is not changed, still via MLInput and MLOutput, you provide both when calling compute()
14:50:37 ... the newly allocated buffer case uses MLTensor to replace MLOutput, which is a dictionary
14:50:53 ... MLTensor is an interface; it can download to a buffer and return it async
14:51:14 ... for this usage compute() is a sync API, the user gets an MLTensor immediately upon compute()
14:51:35 ... also added a use case sample for multiple ops executing in sequence
14:51:45 ... new example 5 illustrates this use case
14:52:39 ... the code creates three different single-op conv graphs; the JS code executes these three in sequence and uses the output MLTensor as input to the next one, issuing three computes in sequence and only accessing the final output at the last step in an async way
14:53:03 ... this can satisfy the usage required; the preallocated buffer usage we already support
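[Scribe's note: a sketch of the two calling patterns in PR #166, assuming MLInput/MLOutput bind ArrayBufferViews via a data member and that MLTensor exposes an async data() download; exact names are still under review, and the single-output shape is illustrative.]

    // given an MLGraph `graph` with input 'x' and output 'y'
    const input = new Float32Array([1, 2, 3, 4]);

    // usage 1 (unchanged): caller preallocates and binds both buffers
    const out = new Float32Array(4);
    await graph.compute({x: {data: input}}, {y: {data: out}});

    // usage 2 (new): compute() returns an MLTensor synchronously;
    // the actual download happens only when the caller asks for the data
    const tensor = graph.compute({x: {data: input}});
    const result = await tensor.data();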
14:53:08 ... as for opens:
14:53:29 ... it looks like this PR makes the caller decide which API to use based on the underlying implementation?
14:53:56 ... I suppose that's not the case; the caller chooses based on usage, MLTensor for newly returned buffers
14:54:25 q+
14:54:35 ... as for the implementation, if this feature is merged, the implementation should support both
14:54:47 ... I think the DML backend should be able to support these two usages
14:54:53 ack chai
14:55:21 Chai: thanks ningxin_hu! This is an issue moved from the device selection PR to this PR
14:55:50 ... when there's disagreement on how the API is exposed, we should reexamine the assumptions
14:56:03 ... e.g. the caller having to know beforehand what the implementation is
14:56:39 ... we should look at what is the exact problem we want to solve
14:57:06 ... the caller wants to use WebNN in a way where they can stream multiple compiled graphs in a sequence
14:57:55 ... the caller can compile graphs of one op each, and then stream them in sequence
14:59:03 ... the special case is to not only stream but to take control in between the two graphs and pass intermediate data; the implementer of WebNN needs to allow a case where an output can be produced that no one can understand, because the contract allows such data to be passed between the graphs
14:59:42 ... even if the desire is noble, if we can hide the implementation details from the caller, we let the caller move data on their own without looking at it, until the very end when the data is revealed
15:00:12 ... not super familiar with Wasm, but I believe this is how you can make a Wasm backend run faster; it's an additional design requirement inserted here
15:00:42 ... this is fundamentally changing the initial assumptions
15:02:00 ... OS or browser layout does not matter, the output will be in this layout -- that is fundamental to the current design; even after you compile and execute, the layout is impl-specific, but only at the last point do you convert it to something that can be understood -- this is hard for implementations, since they need to defer until the caller wants to download the data
15:02:13 ... that is the core of this discussion in this PR
15:02:30 ... this is a design request
15:02:31 q?
15:03:37 Chai: are we OK to change the API fundamentals this way? if we accept that, then whatever you read in the op spec you should assume is not true, because conv can produce whatever it wants to produce, and the only method that matters is the final method that says download this data
15:03:46 ... we need to discuss this point more than the specifics of the PR
15:04:33 q+
15:04:36 q?
15:04:38 ack ningxin_hu
15:05:08 ningxin_hu: I understand what Chai says; I think this PR does not change the layout of op outputs
15:05:30 ... when the user accesses any graph output data, the user will always get the data in the layout defined by the spec
15:06:17 ... if the impl uses the standard layout that's OK; the user code does not know whether an internal layout is used, the surface of the API is always in the layout defined by the spec
15:06:31 ... I don't understand why this breaks the fundamental assumptions?
15:06:50 Chai: the reason is that we are in this PR proposing overloads, two ways to call compute
15:07:01 ... depending on which compute overload you call
15:08:22 Chai: for the GPU case one overload would be inefficient; here the two overloads are different things
15:09:07 ... the specific point re DML is that for the API to work well, you want to give it a resource to produce the output with
15:09:23 ... for any GPU calls you provide the output buffer ahead of time and reuse it again and again
15:09:44 ... the first calling pattern does not allow that to happen, thus the second pattern is introduced
15:10:21 ... do we want to allow the first calling pattern to exist? just raising this issue, we have to be OK that an op that has been compiled can produce something that's internal
15:10:39 q?
15:11:21 ningxin_hu: Chai made two points, 1) the overloaded compute()
15:11:45 ... I agree the return is different; in this PR we return MLTensor and access data via .data
15:12:03 ... for the same two usages, even the current spec has optional MLOutputs as the 2nd param
15:12:25 ... and we return MLOutput in a promise, so we have two usages for the preallocated buffer
15:12:37 ... these two ways to allocate buffers exist today
15:13:14 ... 2) in the PR, the second usage with an internally allocated buffer would also be helpful for the GPU case: if the device is a GPU but MLInput is bound to an ArrayBufferView
15:14:29 ... with MLTensor as an output, we can just use the MLTensor from the first op as input to the second; the data can stay in a GPUBuffer without downloading back to the CPU
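[Scribe's note: a sketch of the example-5 style chaining under discussion, assuming an MLTensor can be bound directly as a graph input and downloaded via an async data() method; the graph names are illustrative.]

    // three single-op conv graphs run back to back; the intermediate
    // MLTensors are handed from one graph to the next without being
    // downloaded, so the data can stay on the device (e.g. in a GPUBuffer)
    const t1 = conv1Graph.compute({x: {data: inputArray}});
    const t2 = conv2Graph.compute({x: t1});
    const t3 = conv3Graph.compute({x: t2});
    const final = await t3.data();  // only the last result crosses to the CPU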
15:14:35 q?
15:15:09 -> https://github.com/webmachinelearning/webnn/issues/156 Support CPU - WebAssembly scenario of the op level execution use case #156
15:15:57 Chai: let's continue in https://github.com/webmachinelearning/webnn/pull/166
15:18:02 RRSAgent, draft minutes
15:18:02 I have made the request to generate https://www.w3.org/2021/05/13-webmachinelearning-minutes.html anssik