14:49:05 RRSAgent has joined #webmachinelearning
14:49:05 logging to https://www.w3.org/2021/12/02-webmachinelearning-irc
14:49:10 Meeting: WebML WG Teleconference – 2 Dec 2021
14:49:10 RRSAgent, make logs Public
14:49:12 please title this meeting ("meeting: ..."), anssik
14:49:15 Chair: Anssi
14:49:21 Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2021-12-02-agenda.md
14:49:28 Scribe: Anssi
14:49:37 scribeNick: anssik
14:49:38 Present+ Anssi_Kostiainen
15:00:54 Present+ Bruce
15:01:10 Present+ Ningxin_Hu
15:01:20 Present+ Dom
15:01:41 ningxin_hu has joined #webmachinelearning
15:02:15 Present+ Ganesan_Ramalingam
15:02:38 RRSAgent, draft minutes
15:02:38 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:03:08 Present+ Chai_Chaoweeraprasit
15:03:32 rama has joined #webmachinelearning
15:03:50 chai has joined #webmachinelearning
15:04:09 Bruce has joined #webmachinelearning
15:04:39 RafaelCintron has joined #webmachinelearning
15:04:47 Present+ Rafael_Cintron
15:05:35 Topic: Announcements
15:05:50 -> https://groups.google.com/a/chromium.org/g/blink-dev/c/PD6TDMDS9mg Intent to Prototype in Chromium: Web Neural Network API
15:06:19 Anssi: congratulations to the group on the intent to prototype announcement!
15:06:42 ... I've received positive signals from web developers
15:07:18 Topic: TPAC meeting follow-up (cont'd)
15:07:19 Topic: TPAC meeting follow-up (cont'd)
15:07:31 jonathan has joined #webmachinelearning
15:07:48 s/Topic: TPAC meeting follow-up (cont'd)//
15:07:52 Subtopic: Conformance testing of WebNN API
15:08:33 anssi: I would like to discuss the practicalities of a reference implementation of WebNN as a baseline for a conformance test
15:08:42 ... and review the status of Web Platform Tests
15:08:52 -> https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf Conformance Testing of Machine Learning API (slides)
15:09:02 ... I also want to make sure Chai's learnings from DirectML testing are incorporated in our work
15:09:19 ... First, any thoughts about establishing a reference implementation?
15:10:04 dom: I don't think there has been a reference impl in the context of w-p-t itself; there have been cases where we've seen a single lib used across all implementations
15:10:13 ... but that lib has not been a reference impl per se
15:10:46 ... e.g. libwebrtc, woff, png lib, maybe Dawn for WebGPU
15:10:55 ... possibly in future webnn-native?
15:11:27 ... given the specificity of the field, I'd say the reference implementation question is probably do we have the resources to pull it off
15:11:46 anssi: so no fundamental issue beyond resources
15:11:54 q?
15:11:55 +1
15:12:19 s/probably do/probably a good thing to have if/
15:12:29 anssi: webnn-native is our de-facto reference implementation
15:12:29 q?
15:12:41 q+
15:13:12 chai: this makes sense; when we built DirectML, this was one of the first things we did
15:13:19 Present+ Jonathan_Bingham
15:13:23 ... without it, it's very hard to determine if you've made any mistake
15:13:25 Present+ Rachel_Yager
15:13:33 RRSAgent, draft minutes
15:13:33 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:13:47 ... in a discussion with Bruce and Ningxin, they have a reference implementation in mind based on tfjs with the CPU backend
15:13:50 ... that should be OK
15:14:00 ... the thing about reference implementations is that they provide a good baseline
15:14:29 ... based on Ningxin's feedback, the CPU backend uses double precision, which is good - we should double-check it is so
15:14:39 ... it's not so much about data types, it's more about computation
15:14:55 ... you can compute in double precision and truncate into single precision
15:15:06 ... what matters is not the destination data type, but how the calculation is made
15:15:17 q+
15:15:44 ... for our DirectML ref implementation, we make sure to use CPU double precision
15:16:01 ... when it comes to comparison, if the comparison is made on single-precision data
15:16:19 ... you want to compare that with a truncated version of the baseline
15:16:28 ... which will most likely produce a result that is 1 ULP off
15:17:14 Scribe+ dom
15:17:17 RRSAgent, draft minutes
15:17:17 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:17:24 ... I may be able to publish the ULP tolerance that we use for DirectML conformance
15:17:29 ... the range is within 1 to 5
15:17:50 ... for DirectML, there are many ways to calculate as well, incl. half precision and half precision truncated from single precision
15:17:59 RRSAgent, draft minutes
15:17:59 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:18:01 q+
15:18:08 ... you need at least two tables, for float32 and float16
15:18:10 q?
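[Illustrative sketch, not from the call: one possible way to implement the comparison Chai describes, in TypeScript - compute the baseline in double precision, convert it to float32 only at comparison time, and take the ULP distance by reinterpreting the float32 bits. The helper names and the 1 ULP tolerance in the example are assumptions for illustration, not webnn-polyfill or WPT code.]

  // Map a float32 bit pattern to an ordered integer so that adjacent
  // representable float32 values differ by exactly 1.
  function float32ToOrderedBits(value: number): number {
    const f32 = new Float32Array(1);
    const u32 = new Uint32Array(f32.buffer);
    f32[0] = Math.fround(value); // round the double-precision value to the nearest float32
    const bits = u32[0];
    // Re-map negative values so the ordering is monotonic across zero.
    return bits & 0x80000000 ? 0xffffffff - bits : bits + 0x80000000;
  }

  // ULP distance between an implementation's float32 result and the
  // double-precision baseline (cast to float32 only here, at the very end).
  function ulpDistance(actualF32: number, baselineF64: number): number {
    return Math.abs(float32ToOrderedBits(actualF32) - float32ToOrderedBits(baselineF64));
  }

  // Example: accept the result if it is within a 1 ULP tolerance.
  const pass = ulpDistance(0.30000001192092896, 0.1 + 0.2) <= 1;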
15:18:11 ack ningxin_hu
15:18:41 Ningxin: Bruce and I investigated the baseline; there are options: WebNN-native, WebNN-polyfill
15:18:59 ... the concern with -native is that it depends on native ML APIs
15:19:31 ... and those are ultimately among the implementations we want to test
15:19:54 ... also, the ref needs double precision on CPU, which isn't supported in WebNN-native
15:20:25 ... So that's why Bruce and I have been looking at the WebNN polyfill, built on TensorFlow.js
15:20:33 ... JS uses double precision
15:21:06 q?
15:21:50 ... Bruce started to use this as a baseline to measure the ULP distance between the baseline and others (incl. other backends)
15:22:35 https://github.com/webmachinelearning/webnn-polyfill/issues/144
15:22:39 ack dom
15:23:04 dom: first, in terms of naming, maybe no need to call it "reference implementation" - "baseline implementation" instead
15:23:48 ... given what we want to accomplish in setting the baseline, one key aspect is that this baseline needs to give confidence it fulfills the requirements for conformance
15:24:01 ... the codebase should be easy to review, not too many layers of abstraction
15:24:22 ... the polyfill might be problematic in terms of layering, or maybe not?
15:24:41 ... also need to consider whether the baseline could be a deliverable of this WG
15:25:06 q?
15:25:10 ack rama
15:25:19 q+
15:25:56 q+
15:25:57 rama: in general it is hard to say what the numeric precision will be
15:26:07 q?
15:26:15 ack chai
15:26:36 chai: I wanted to say exactly that; there are two components - if you implement the ref, all computation is done in double precision
15:26:53 ... not producing loss; you want the baseline to be "ideal", we don't want any intermediate casting
15:27:11 ... looking at your calculation, if you move into single precision and truncate, you accumulate loss
15:27:23 ... double-precision compute must be it, not truncation
15:27:37 ... conv, there are so many ways to implement it - what should the ref do?
15:27:50 ... should the ref implement convolution as the semantics say?
15:28:35 q+
15:28:38 ... you don't have a known result; flatten to gemm, add an algorithm, you make shortcuts; there are two components when building the ref, both are true - each of the convolutions, the dot products within the kernel, is what we do
15:28:51 ... if we are to use the tfjs CPU backend we need to know what we compare against
15:29:04 ... otherwise we compare two impls together, not knowing which one is correct
15:29:16 ... the ref is an interesting topic, because you really want to know what is done
15:29:17 q?
15:29:33 anssik: did that answer your question?
15:29:46 rama: yes, it is hard to establish bounds on what the delta between impls is
15:30:02 chai: my opinion is the best ref is an open source ref - anyone can look at the code and be confident
15:30:18 ... for GPU conv it is sometimes done on hw
15:30:39 ... if it is optimized to be very fast but against the ref you see a big ULP diff, should we accept the result?
15:30:41 q?
15:31:19 ack Bruce
15:31:43 Bruce: I have slides to report on conformance follow-up
15:31:47 q?
15:31:55 ack ningxin_hu
15:32:15 Rachel_ has joined #webmachinelearning
15:32:38 ningxin_hu: chai, you mentioned that for the ref implementation, the intermediate results should be in double precision without truncation
15:32:53 ... this is for op tests, but do you expect the same for model tests?
15:33:01 chai: no
15:33:18 ... e.g. the ref impl should not truncate before you do relu
15:33:37 ... keep the double-precision result as the ideal
15:33:54 ... you have to keep everything w/o truncation all the way until the end
15:34:00 ... that's the best method at the op level
15:34:14 ... if you try to do this on a graph level, you cannot guarantee that execution does not yield loss
15:34:21 q?
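[Illustrative sketch, not from the call: what a "no intermediate truncation" baseline op could look like in TypeScript, following Chai's description - every dot product is accumulated in double precision (plain JS numbers) and nothing is cast to float32 until the final comparison. The single-channel, no-padding, stride-1 layout is a simplifying assumption, not the WebNN conv2d definition.]

  // Baseline 2D convolution: accumulate each output value as a
  // double-precision dot product over the kernel window.
  function referenceConv2d(
    input: Float64Array, inH: number, inW: number,
    filter: Float64Array, fH: number, fW: number
  ): Float64Array {
    const outH = inH - fH + 1;
    const outW = inW - fW + 1;
    const out = new Float64Array(outH * outW);
    for (let oy = 0; oy < outH; oy++) {
      for (let ox = 0; ox < outW; ox++) {
        let acc = 0; // double-precision accumulator, never truncated mid-way
        for (let ky = 0; ky < fH; ky++) {
          for (let kx = 0; kx < fW; kx++) {
            acc += input[(oy + ky) * inW + (ox + kx)] * filter[ky * fW + kx];
          }
        }
        out[oy * outW + ox] = acc; // stays float64; cast to float32 only when comparing
      }
    }
    return out;
  }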
15:34:37 ningxin_hu: I believe we need to look closely at the TF.js CPU backend
15:34:51 ... even the arithmetic is double precision
15:34:59 ... the data is single precision, float32
15:35:50 ... for fused ops like relu, the current impl in the polyfill uses conv+relu as two ops because TF.js does not have a single fused op
15:36:08 ... a reason we may need to take a closer look at the implementation
15:36:15 ... dom mentioned layers and an open source impl
15:36:28 ... webnn-polyfill is based on TF.js, which adds layers
15:36:34 ... not so easy to review
15:36:36 q?
15:37:41 anssik: can you, ningxin and Bruce, maybe with chai and ping, review and form a plan for the baseline implementation
15:37:46 ningxin_hu: yes, will do
15:38:38 Slideset: bruceslides
15:38:46 [slide 1]
15:39:01 [slide 3]
15:39:22 Bruce: a first PR has been submitted for WebNN to WPT
15:39:38 ... a separate pull request has been submitted with the polyfill as suggested by Dom
15:40:19 ... we've collected ULP distances as a PR to webnn-polyfill, based on the TF.js CPU backend as baseline
15:40:23 [slide 4]
15:40:34 RRSAgent, draft minutes
15:40:34 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:40:57 ... the data has been collected with the WebNN polyfill with the CPU backend as baseline
15:41:21 ... the table shows the ULP distance with WASM (CPU) and WebGL (GPU)
15:42:04 ... this is the "max" distance - it varies across different parameters with some ops
15:42:30 ... 5 observations:
15:42:49 ... the WASM backend has the same max distance across devices
15:43:18 ... on the other hand, WebGL distances vary significantly across devices
15:43:48 ... some operations have the same distance on both WASM and WebGL (e.g. concat is zero everywhere, gemm is 1 everywhere, instanceNorm is 128)
15:45:19 q?
15:46:19 q?
15:46:58 anssik: thank you, very important work! in the interest of time, let's follow up via github issues
15:47:08 [slide 5]
15:47:30 bruce: two open questions - how do we determine the acceptable distance for ops?
15:47:38 q+
15:47:48 ack chai
15:48:08 chai: the answer is - it depends, op by op
15:48:20 ... normally, it shouldn't be more than 10; for simpler cases, it should be 1 or 0
15:48:32 ... let's continue in the github thread
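[Illustrative sketch, not from the call: how a per-op tolerance table could be applied when comparing an implementation's float32 output against the double-precision baseline, reusing the ulpDistance helper sketched above. The tolerance values are placeholders chosen for illustration - per Chai they would be decided op by op, and a separate table would be needed for float16.]

  // Placeholder per-op ULP tolerances for float32 outputs (not agreed numbers).
  const float32UlpTolerance: Record<string, number> = {
    concat: 0,
    gemm: 1,
    conv2d: 5,
  };

  // Compare every element of the implementation's output against the baseline.
  function checkOp(op: string, actual: Float32Array, baseline: Float64Array): boolean {
    const tolerance = float32UlpTolerance[op] ?? 0;
    for (let i = 0; i < actual.length; i++) {
      if (ulpDistance(actual[i], baseline[i]) > tolerance) {
        return false;
      }
    }
    return true;
  }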
15:48:42 Subtopic: Model Loader API update
15:49:34 Jonathan: the ChromeOS team is working on implementing the Model Loader API in collaboration with the Chrome team
15:49:40 ... to get us to the point of benchmarking
15:49:49 -> https://github.com/webmachinelearning/model-loader/blob/main/explainer.md Updated Model Loader API explainer
15:50:03 ... we're doing something similar to what Ningxin has done with WebNN, e.g. comparing with WASM results
15:50:41 ... it shows some sources of performance loss - specific ops, but also security impact, hardware quirks
15:50:49 ... we hope to send an intent to prototype relatively soon
15:50:53 ... what about model formats?
15:51:19 ... we're still in agreement that the correct solution is to have a single format that developers can trust to be available on all platforms
15:51:31 ... given that we don't have that yet, we're using TFLite in our prototype
15:51:43 ... for the path from that prototype to a standard, there are multiple ways:
15:51:50 s/TFLite/TF Lite flat buffer format
15:52:00 ... * we pick an existing standard (ONNX may be the sole contender at this stage)
15:52:16 ... * the TF team pushes TFLite toward such a status - still a long shot
15:52:42 ... * we develop a new format - in this group or elsewhere; the work done with WebNN is exactly the kind of work we would need for a Web standard format
15:52:47 q?
15:52:55 ... We haven't changed anything on this at this point
15:53:35 ... FWIW, the people developing this in ChromeOS are based in Australia - they can't join the meeting at this time (2am for them)
15:53:40 q+
15:53:50 ... we would need a different timeslot to hear directly from them
15:53:50 q?
15:53:55 ack RafaelCintron
15:54:02 anssik: let's figure out a solution to that logistical issue offline
15:54:10 rafael: thanks for the summary, very exciting
15:54:19 ... can you say a bit more on the cost of security to performance?
15:54:26 jonathan: I'm not well positioned to get into the details
15:54:36 ... but securing hardware execution has a perf impact
15:54:56 ... WASM is effectively a VM technology running in the browser, which imposes a cost
15:55:03 RRSAgent, draft minutes
15:55:03 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:55:04 q?
15:55:10 ... this is the kind of topic where getting the Australians on the call would help
15:56:10 anssik: thanks for the exciting update!
15:57:37 ... the Model Loader API is a CG deliverable at the moment - solving the model format issue is a prerequisite to adopting this in the WG
15:57:41 dom: a key point for adoption is the model format; it's non-trivial to get that done; some visibility into when that might be would help in deciding when the WG could start adopting the Model Loader API
15:58:18 s/ when that might be/ when we might be in a position to pick one of the 3 mentioned paths/
15:58:47 anssik: the model loader repo in the CG would be a good place to have the format discussion
15:59:29 anssik: we'll defer the ethical discussions to be the first topic of our next call
16:00:07 ... we can also look at other issues in the webnn repo, incl. our discussion with WebRTC to prototype usage of WebNN in the context of WebRTC, e.g. for background blurring
16:00:12 RRSAgent, draft minutes
16:00:12 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html dom
16:00:41 anssik: next call on Dec 16 - stay safe
16:00:42 RRSAgent, draft minutes
16:00:42 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html dom
16:29:13 s|bruceslides|https://lists.w3.org/Archives/Public/www-archive/2021Dec/att-0001/Conformance_Testing_Follow-up.pdf
16:29:19 RRSAgent, draft minutes v-slide
16:29:19 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html dom
18:01:31 Zakim has left #webmachinelearning