14:49:05 RRSAgent has joined #webmachinelearning
14:49:05 logging to https://www.w3.org/2021/12/02-webmachinelearning-irc
14:49:10 Meeting: WebML WG Teleconference – 2 Dec 2021
14:49:10 RRSAgent, make logs Public
14:49:12 please title this meeting ("meeting: ..."), anssik
14:49:15 Chair: Anssi
14:49:21 Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2021-12-02-agenda.md
14:49:28 Scribe: Anssi
14:49:37 scribeNick: anssik
14:49:38 Present+ Anssi_Kostiainen
15:00:54 Present+ Bruce
15:01:10 Present+ Ningxin_Hu
15:01:20 Present+ Dom
15:01:41 ningxin_hu has joined #webmachinelearning
15:02:15 Present+ Ganesan_Ramalingam
15:02:38 RRSAgent, draft minutes
15:02:38 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:03:08 Present+ Chai_Chaoweeraprasit
15:03:32 rama has joined #webmachinelearning
15:03:50 chai has joined #webmachinelearning
15:04:09 Bruce has joined #webmachinelearning
15:04:39 RafaelCintron has joined #webmachinelearning
15:04:47 Present+ Rafael_Cintron
15:05:35 Topic: Announcements
15:05:50 -> https://groups.google.com/a/chromium.org/g/blink-dev/c/PD6TDMDS9mg Intent to Prototype in Chromium: Web Neural Network API
15:06:19 Anssi: congratulations to the group on the intent to prototype announcement!
15:06:42 ... I've received positive signals from web developers
15:07:18 Topic: TPAC meeting follow-up (cont'd)
15:07:19 Topic: TPAC meeting follow-up (cont'd)
15:07:31 jonathan has joined #webmachinelearning
15:07:48 s/Topic: TPAC meeting follow-up (cont'd)//
15:07:52 Subtopic: Conformance testing of WebNN API
15:08:33 anssi: I would like to discuss the practicalities of a reference implementation of WebNN as a baseline for a conformance test
15:08:42 ... and review the status of Web Platform Tests
15:08:52 -> https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0017/Conformance_Testing_of_Machine_Learning_API.pdf Conformance Testing of Machine Learning API (slides)
15:09:02 ... I also want to make sure Chai's learnings from DirectML testing are incorporated in our work
15:09:19 ... First, any thoughts about establishing a reference implementation?
15:10:04 dom: I don't think there has been a reference impl in the context of w-p-t itself; there have been cases where we've seen a single lib used across all implementations
15:10:13 ... but that lib has not been a reference impl per se
15:10:46 ... e.g. libwebrtc, woff, png lib, maybe Dawn for WebGPU
15:10:55 ... possibly in future webnn-native?
15:11:27 ... given the specificity of the field, I'd say the reference implementation question is probably do we have the resources to pull it off
15:11:46 anssi: so no fundamental issue beyond resources
15:11:54 q?
15:11:55 +1
15:12:19 s/probably do/probably a good thing to have if/
15:12:29 anssi: webnn-native is our de-facto reference implementation
15:12:29 q?
15:12:41 q+
15:13:12 chai: this makes sense; when we built DirectML, this was one of the first things we did
15:13:19 Present+ Jonathan_Bingham
15:13:23 ... without it, it's very hard to determine if you've made any mistake
15:13:25 Present+ Rachel_Yager
15:13:33 RRSAgent, draft minutes
15:13:33 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:13:47 ... in a discussion with Bruce and Ningxin, they have a reference implementation in mind based on tfjs with the CPU backend
15:13:50 ... that should be OK
15:14:00 ... the thing about reference implementations is that they provide a good baseline
15:14:29 ... based on Ningxin's feedback, the CPU backend uses double precision, which is good - we should double-check it is so
15:14:39 ... it's not so much about data types, it's more about computation
15:14:55 ... you can compute in double precision and truncate into single precision
15:15:06 ... what matters is not the destination data type, but how the calculation is made
15:15:17 q+
15:15:44 ... for our DirectML ref implementation, we make sure to use CPU double precision
15:16:01 ... when it comes to comparison, if the comparison is made on single-precision data
15:16:19 ... you want to compare that with a truncated version of the baseline
15:16:28 ... which will most likely produce a result that is 1 ULP off
15:17:14 Scribe+ dom
15:17:17 RRSAgent, draft minutes
15:17:17 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:17:24 ... I may be able to publish the ULP tolerance that we use for DirectML conformance
15:17:29 ... the range is within 1 to 5
15:17:50 ... for DirectML, there are many ways to calculate as well, incl. half precision and half precision truncated from single precision
15:17:59 RRSAgent, draft minutes
15:17:59 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:18:01 q+
15:18:08 ... you need at least two tables, for float32 and float16
15:18:10 q?
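[Illustrative sketch, not from the call: one possible way to implement the comparison Chai describes, in TypeScript - compute the baseline in double precision, convert it to float32 only at comparison time, and take the ULP distance by reinterpreting the float32 bits. The helper names and the 1 ULP tolerance in the example are assumptions for illustration, not webnn-polyfill or WPT code.]

  // Map a float32 bit pattern to an ordered integer so that adjacent
  // representable float32 values differ by exactly 1.
  function float32ToOrderedBits(value: number): number {
    const f32 = new Float32Array(1);
    const u32 = new Uint32Array(f32.buffer);
    f32[0] = Math.fround(value); // round the double-precision value to the nearest float32
    const bits = u32[0];
    // Re-map negative values so the ordering is monotonic across zero.
    return bits & 0x80000000 ? 0xffffffff - bits : bits + 0x80000000;
  }

  // ULP distance between an implementation's float32 result and the
  // double-precision baseline (cast to float32 only here, at the very end).
  function ulpDistance(actualF32: number, baselineF64: number): number {
    return Math.abs(float32ToOrderedBits(actualF32) - float32ToOrderedBits(baselineF64));
  }

  // Example: accept the result if it is within a 1 ULP tolerance.
  const pass = ulpDistance(0.30000001192092896, 0.1 + 0.2) <= 1;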
15:18:11 ack ningxin_hu
15:18:41 Ningxin: Bruce and I investigated the baseline; there are options: WebNN-native, WebNN-polyfill
15:18:59 ... the concern with -native is that it depends on native ML APIs
15:19:31 ... and those are ultimately among the implementations we want to test
15:19:54 ... also, the ref needs double precision on CPU, which isn't supported in WebNN-native
15:20:25 ... So that's why Bruce and I have been looking at the WebNN polyfill, built on TensorFlow.js
15:20:33 ... JS uses double precision
15:21:06 q?
15:21:50 ... Bruce started to use this as a baseline to measure the ULP distance between the baseline and others (incl. other backends)
15:22:35 https://github.com/webmachinelearning/webnn-polyfill/issues/144
15:22:39 ack dom
15:23:04 dom: first, in terms of naming, maybe no need to call it "reference implementation" - "baseline implementation" instead
15:23:48 ... given what we want to accomplish in setting the baseline, one key aspect is that this baseline needs to give confidence it fulfills the requirements for conformance
15:24:01 ... the codebase should be easy to review, not too many layers of abstraction
15:24:22 ... the polyfill might be problematic in terms of layering, or maybe not?
15:24:41 ... also need to consider whether the baseline could be a deliverable of this WG
15:25:06 q?
15:25:10 ack rama
15:25:19 q+
15:25:56 q+
15:25:57 rama: in general it is hard to say what the numeric precision will be
15:26:07 q?
15:26:15 ack chai
15:26:36 chai: I wanted to say exactly that; there are two components - if you implement the ref, all computation is done in double precision
15:26:53 ... not producing loss; you want the baseline to be "ideal", we don't want any intermediate casting
15:27:11 ... looking at your calculation, if you move into single precision and truncate, you accumulate loss
15:27:23 ... double-precision compute must be it, not truncation
15:27:37 ... conv, there are so many ways to implement it - what should the ref do?
15:27:50 ... should the ref implement convolution as the semantics say?
15:28:35 q+
15:28:38 ... you don't have a known result; flatten to gemm, add an algorithm, you make shortcuts; there are two components when building the ref, both are true - each of the convolutions, the dot products within the kernel, is what we do
15:28:51 ... if we are to use the tfjs CPU backend we need to know what we compare against
15:29:04 ... otherwise we compare two impls together, not knowing which one is correct
15:29:16 ... the ref is an interesting topic, because you really want to know what is done
15:29:17 q?
15:29:33 anssik: did that answer your question?
15:29:46 rama: yes, it is hard to establish bounds on what the delta between impls is
15:30:02 chai: my opinion is the best ref is an open source ref - anyone can look at the code and be confident
15:30:18 ... for GPU conv it is sometimes done on hw
15:30:39 ... if it is optimized to be very fast but against the ref you see a big ULP diff, should we accept the result?
15:30:41 q?
15:31:19 ack Bruce
15:31:43 Bruce: I have slides to report on conformance follow-up
15:31:47 q?
15:31:55 ack ningxin_hu
15:32:15 Rachel_ has joined #webmachinelearning
15:32:38 ningxin_hu: chai, you mentioned that for the ref implementation, the intermediate results should be in double precision without truncation
15:32:53 ... this is for op tests, but do you expect the same for model tests?
15:33:01 chai: no
15:33:18 ... e.g. the ref impl should not truncate before you do relu
15:33:37 ... keep the double-precision result as the ideal
15:33:54 ... you have to keep everything w/o truncation all the way until the end
15:34:00 ... that's the best method at the op level
15:34:14 ... if you try to do this on a graph level, you cannot guarantee that execution does not yield loss
15:34:21 q?
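[Illustrative sketch, not from the call: what a "no intermediate truncation" baseline op could look like in TypeScript, following Chai's description - every dot product is accumulated in double precision (plain JS numbers) and nothing is cast to float32 until the final comparison. The single-channel, no-padding, stride-1 layout is a simplifying assumption, not the WebNN conv2d definition.]

  // Baseline 2D convolution: accumulate each output value as a
  // double-precision dot product over the kernel window.
  function referenceConv2d(
    input: Float64Array, inH: number, inW: number,
    filter: Float64Array, fH: number, fW: number
  ): Float64Array {
    const outH = inH - fH + 1;
    const outW = inW - fW + 1;
    const out = new Float64Array(outH * outW);
    for (let oy = 0; oy < outH; oy++) {
      for (let ox = 0; ox < outW; ox++) {
        let acc = 0; // double-precision accumulator, never truncated mid-way
        for (let ky = 0; ky < fH; ky++) {
          for (let kx = 0; kx < fW; kx++) {
            acc += input[(oy + ky) * inW + (ox + kx)] * filter[ky * fW + kx];
          }
        }
        out[oy * outW + ox] = acc; // stays float64; cast to float32 only when comparing
      }
    }
    return out;
  }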
15:34:37 ningxin_hu: I believe we need to look closely at the TF.js CPU backend
15:34:51 ... even the arithmetic is double precision
15:34:59 ... the data is single precision, float32
15:35:50 ... for fused ops like relu, the current impl in the polyfill uses conv+relu as two ops because TF.js does not have a single fused op
15:36:08 ... a reason we may need to take a closer look at the implementation
15:36:15 ... dom mentioned layers and an open source impl
15:36:28 ... webnn-polyfill is based on TF.js, which adds layers
15:36:34 ... not so easy to review
15:36:36 q?
15:37:41 anssik: can you, ningxin and Bruce, maybe with chai and ping, review and form a plan for the baseline implementation
15:37:46 ningxin_hu: yes, will do
15:38:38 Slideset: bruceslides
15:38:46 [slide 1]
15:39:01 [slide 3]
15:39:22 Bruce: a first PR has been submitted for WebNN to WPT
15:39:38 ... a separate pull request has been submitted with the polyfill as suggested by Dom
15:40:19 ... we've collected ULP distances as a PR to webnn-polyfill, based on the TF.js CPU backend as baseline
15:40:23 [slide 4]
15:40:34 RRSAgent, draft minutes
15:40:34 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:40:57 ... the data has been collected with the WebNN polyfill with the CPU backend as baseline
15:41:21 ... the table shows the ULP distance with WASM (CPU) and WebGL (GPU)
15:42:04 ... this is the "max" distance - it varies across different parameters with some ops
15:42:30 ... 5 observations:
15:42:49 ... the WASM backend has the same max distance across devices
15:43:18 ... on the other hand, WebGL distances vary significantly across devices
15:43:48 ... some operations have the same distance on both WASM and WebGL (e.g. concat is zero everywhere, gemm is 1 everywhere, instanceNorm is 128)
15:45:19 q?
15:46:19 q?
15:46:58 anssik: thank you, very important work! in the interest of time, let's follow up via github issues
15:47:08 [slide 5]
15:47:30 bruce: two open questions - how do we determine the acceptable distance for ops?
15:47:38 q+
15:47:48 ack chai
15:48:08 chai: the answer is - it depends, op by op
15:48:20 ... normally, it shouldn't be more than 10; for simpler cases, it should be 1 or 0
15:48:32 ... let's continue in the github thread
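[Illustrative sketch, not from the call: how a per-op tolerance table could be applied when comparing an implementation's float32 output against the double-precision baseline, reusing the ulpDistance helper sketched above. The tolerance values are placeholders chosen for illustration - per Chai they would be decided op by op, and a separate table would be needed for float16.]

  // Placeholder per-op ULP tolerances for float32 outputs (not agreed numbers).
  const float32UlpTolerance: Record<string, number> = {
    concat: 0,
    gemm: 1,
    conv2d: 5,
  };

  // Compare every element of the implementation's output against the baseline.
  function checkOp(op: string, actual: Float32Array, baseline: Float64Array): boolean {
    const tolerance = float32UlpTolerance[op] ?? 0;
    for (let i = 0; i < actual.length; i++) {
      if (ulpDistance(actual[i], baseline[i]) > tolerance) {
        return false;
      }
    }
    return true;
  }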
15:48:42 Subtopic: Model Loader API update
15:49:34 Jonathan: the ChromeOS team is working on implementing the Model Loader API in collaboration with the Chrome team
15:49:40 ... to get us to the point of benchmarking
15:49:49 -> https://github.com/webmachinelearning/model-loader/blob/main/explainer.md Updated Model Loader API explainer
15:50:03 ... we're doing something similar to what Ningxin has done with WebNN, e.g. comparing with WASM results
15:50:41 ... it shows some sources of performance loss - specific ops, but also security impact, hardware quirks
15:50:49 ... we hope to send an intent to prototype relatively soon
15:50:53 ... what about model formats?
15:51:19 ... we're still in agreement that the correct solution is to have a single format that developers can trust to be available on all platforms
15:51:31 ... given that we don't have that yet, we're using TFLite in our prototype
15:51:43 ... for the path from that prototype to a standard, there are multiple ways:
15:51:50 s/TFLite/TF Lite flat buffer format
15:52:00 ... * we pick an existing standard (ONNX may be the sole contender at this stage)
15:52:16 ... * the TF team pushes TFLite toward such a status - still a long shot
15:52:42 ... * we develop a new format - in this group or elsewhere; the work done with WebNN is exactly the kind of work we would need for a Web standard format
15:52:47 q?
15:52:55 ... We haven't changed anything on this at this point
15:53:35 ... FWIW, the people developing this in ChromeOS are based in Australia - they can't join the meeting at this time (2am for them)
15:53:40 q+
15:53:50 ... we would need a different timeslot to hear directly from them
15:53:50 q?
15:53:55 ack RafaelCintron
15:54:02 anssik: let's figure out a solution to that logistical issue offline
15:54:10 rafael: thanks for the summary, very exciting
15:54:19 ... can you say a bit more on the cost of security to performance?
15:54:26 jonathan: I'm not well positioned to get into the details
15:54:36 ... but securing hardware execution has a perf impact
15:54:56 ... WASM is effectively a VM technology running in the browser, which imposes a cost
15:55:03 RRSAgent, draft minutes
15:55:03 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html anssik
15:55:04 q?
15:55:10 ... this is the kind of topic where getting the Australians on the call would help
15:56:10 anssik: thanks for the exciting update!
15:57:37 ... the Model Loader API is a CG deliverable at the moment - solving the model format issue is a prerequisite to adopting this in the WG
15:57:41 dom: a key point for adoption is the model format; it's non-trivial to get that done; some visibility into when that might be would help in deciding when the WG could start adopting the Model Loader API
15:58:18 s/ when that might be/ when we might be in a position to pick one of the 3 mentioned paths/
15:58:47 anssik: the model loader repo in the CG would be a good place to have the format discussion
15:59:29 anssik: we'll defer the ethical discussions to be the first topic of our next call
16:00:07 ... we can also look at other issues in the webnn repo, incl. our discussion with WebRTC to prototype usage of WebNN in the context of WebRTC, e.g. for background blurring
16:00:12 RRSAgent, draft minutes
16:00:12 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html dom
16:00:41 anssik: next call on Dec 16 - stay safe
16:00:42 RRSAgent, draft minutes
16:00:42 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html dom
16:29:13 s|bruceslides|https://lists.w3.org/Archives/Public/www-archive/2021Dec/att-0001/Conformance_Testing_Follow-up.pdf
16:29:19 RRSAgent, draft minutes v-slide
16:29:19 I have made the request to generate https://www.w3.org/2021/12/02-webmachinelearning-minutes.html dom
18:01:31 Zakim has left #webmachinelearning