13:56:14 RRSAgent has joined #webmachinelearning
13:56:18 logging to https://www.w3.org/2023/06/08-webmachinelearning-irc
13:56:18 RRSAgent, make logs Public
13:56:19 please title this meeting ("meeting: ..."), anssik
13:56:19 Chair: Anssi
13:56:28 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2023-06-08-wg-agenda.md
13:56:32 Scribe: Anssi
13:56:35 scribeNick: anssik
13:57:07 ghurlbot, this is webmachinelearning/webnn
13:57:07 anssik, OK.
13:58:24 Present+ Anssi_Kostiainen
13:58:34 Regrets+ Dominique_Hazael-Massieux
14:01:49 RafaelCintron has joined #webmachinelearning
14:01:53 ningxin_hu has joined #webmachinelearning
14:01:55 zkis has joined #webmachinelearning
14:02:50 chai has joined #webmachinelearning
14:05:15 Joshua has joined #webmachinelearning
14:05:55 Vivek has joined #webmachinelearning
14:06:55 RRSAgent, draft minutes
14:06:57 I have made the request to generate https://www.w3.org/2023/06/08-webmachinelearning-minutes.html anssik
14:07:55 Meeting: WebML WG Teleconference – 8 June 2023
14:08:01 Zakim, prepare meeting
14:08:01 RRSAgent, make logs Public
14:08:03 please title this meeting ("meeting: ..."), anssik
14:08:09 Chair: Anssi
14:08:21 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2023-06-08-wg-agenda.md
14:08:26 Scribe: Anssi
14:08:32 scribeNick: anssik
14:08:49 RRSAgent, draft minutes
14:08:50 I have made the request to generate https://www.w3.org/2023/06/08-webmachinelearning-minutes.html anssik
14:09:18 Present+ Joshua_Lochner
14:09:26 Present+ Zoltan_Kis
14:09:30 Present+ Ningxin_Hu
14:09:35 Present+ Rafael_Cintron
14:09:42 Present+ Chai_Chaoweeraprasit
14:09:46 Present+ Vivek_Sekhar
14:09:56 RRSAgent, draft minutes
14:09:58 I have made the request to generate https://www.w3.org/2023/06/08-webmachinelearning-minutes.html anssik
14:10:07 Topic: Introductions
14:10:13 anssik: please welcome Joshua to today's call!
14:10:23 ...
Joshua Lochner (@xenova) created Transformers.js and joined HuggingFace recently
14:10:23 https://github.com/xenova -> @xenova
14:10:29 ... after a discussion with Nikhil I thought I must invite Joshua to share his findings from Transformers.js with this WG, and here he is!
14:10:39 ... we can do a super quick 15 sec intro round:
14:11:30 ... Joshua, 23-year-old SW developer, created Transformers.js and now at HuggingFace, flattered to be called an invited expert! Loves open source.
14:12:17 anssik: Anssi, chair of this WG, working at Intel, long-term web standards contributor, excited to see our WG grow and get new participants as we advance to v2
14:12:56 ... Ningxin, WebNN API spec co-editor, implementing WebNN in Chromium, working at Intel, welcome Joshua!
14:13:29 ... Chai, running a team at Msft working on the ML platform for the core OS, WebNN API spec co-editor
14:14:22 ... Rafael, developer on the Msft Edge team, low-level graphics, contributes to browsers, WG participant, DirectML focus
14:15:00 ... Zoltan, Intel, AI research background, part of the web team helping specs advance, in this WG helps with the spec algorithms
14:15:29 ... Vivek, Google, Chrome team, WebGPU, Wasm, recently joined the ML effort within Chrome
14:15:40 RRSAgent, draft minutes
14:15:42 I have made the request to generate https://www.w3.org/2023/06/08-webmachinelearning-minutes.html anssik
14:16:03 Topic: Transformers.js
14:16:15 anssik: Joshua Lochner (@xenova) will introduce Transformers.js
14:16:24 ... and share his learnings from this project
14:16:35 ... including practical use cases to help inform the WebNN v2 feature work.
14:16:55 ... We want to make WebNN the most performant and robust backend for a future version of Transformers.js.
14:17:07 ...
Joshua provided background material for this meeting in a GH comment where we discuss support for transformers:
14:17:12 https://github.com/webmachinelearning/webnn/issues/375#issuecomment-1560785639
14:17:31 anssik: in our follow-up discussion I told him that the WG's key interest is to hear, based on his Transformers.js experience:
14:17:45 ... 1) what are the feasible tasks or use cases now or in the short term in the browser
14:18:08 ... 2) what is "coming up" but not yet ready for the browser, and what is missing to make it feasible
14:18:26 ... this feedback will be great input into v2 feature discussions to help prioritize our work.
14:18:41 ... I also shared with Joshua the WG's high-level approach to adding new features into the WebNN API:
14:18:46 ... 1) identify use cases
14:18:49 brb
14:19:01 ... 2) "research" models, framework support, cross-platform support
14:19:32 ... 3) derive requirements (ops, other functional requirements, non-functional reqs such as perf, a11y, privacy, i18n, usability, responsibility & transparency)
14:19:46 ... 4) spec new features in a close feedback loop with the implementations
14:19:46 back
14:20:05 ... the WG's work mode is also captured in our contribution guidelines for new ops
14:20:12 -> https://github.com/webmachinelearning/webnn/blob/main/CONTRIBUTING.md#proposing-and-adding-a-new-operation Proposing and adding a new operation
14:20:22 ... on our preexisting "v1" use cases, the spec documents two categories:
14:20:27 -> application use cases https://www.w3.org/TR/webnn/#usecases
14:20:39 -> framework use cases https://www.w3.org/TR/webnn/#usecases-framework
14:21:05 anssik: application use cases are a mix of tasks across multiple modalities, derived from well-known classic models such as SqueezeNet, MobileNet, ResNet, TinyYOLO, RNNoise, NSNet etc.
14:21:25 ... Computer Vision: semantic segmentation, person detection, skeleton detection etc.
14:21:25 ... Text-to-text: summarization and translation
14:21:25 ...
Video-to-text: video summarization
14:21:25 ... Audio: noise suppression
14:21:25 ... etc.
14:21:50 ... framework "use cases" include requirements received from JS framework vendors, e.g. custom layer, network concatenation, perf adaptation, op-level execution, integration with real-time video processing (WebRTC) etc.
14:22:11 ... with that as an intro on behalf of the WG, I'll let Joshua share with us his feedback from Transformers.js, including an introduction to this library that uses ONNX Runtime Web (currently with the Wasm backend) under the hood!
14:22:46 [Joshua presents slides]
14:23:02 Joshua: Transformers.js runs HF Transformers in the browser
14:23:30 ... run pre-trained models in the browser, GH community growing
14:23:46 ... "What can it do?" Text, Vision, Audio, Multimodal
14:24:21 "How does it work?" 1) Convert your model to ONNX with HF Optimum, 2) Write JS code, 3) Run in the browser
14:27:33 ... Why was it created? Origins: remove spam YT comments; Current plan: support all Transformers models, tokenizers, processors, pipelines, and tasks; Ultimate goal: help bridge the gap between web dev and ML
14:27:58 RRSAgent, draft minutes
14:27:59 I have made the request to generate https://www.w3.org/2023/06/08-webmachinelearning-minutes.html anssik
14:30:02 Joshua: Applications
14:30:53 ... WebML environments: websites and PWAs; browser extensions; server-side / Electron apps
14:32:00 ... WebML's ability to use the same model across websites is a massive positive; privacy benefits of on-device inference
14:32:12 ... Feasible tasks:
14:33:08 ... Text classification (sentiment analysis, NER), Code completion (constrained text-generation problems), Text-to-text (translation, summarization)
14:33:58 s/Feasible tasks:/Feasible tasks in Text / Vision / Audio / Multimodal
14:34:50 ... Image classification (label images), Object detection (bounding boxes for objects), Segmentation
14:36:32 ... Speech-to-Text (ASR), Text-to-Speech
14:37:30 ...
Multimodal: Embeddings (semantic search, clustering, data analysis), Image-to-text (captions for images)
14:38:31 ... Limitations
14:39:31 ... Speed (CPU only now), Memory (Wasm can't address >4GB), Models (standards, distribution, interop), Browsers (unified model caching, Tensor API)
14:41:48 RRSAgent, draft minutes
14:41:49 I have made the request to generate https://www.w3.org/2023/06/08-webmachinelearning-minutes.html anssik
14:43:50 q+
14:43:57 ack chai
14:44:10 q+
14:44:24 Chai: Thanks Joshua! When using ONNX you're using ONNX Runtime Web?
14:44:28 Joshua: correct
14:44:40 Chai: using Optimum?
14:45:08 Joshua: defaults to FP32, quantized to 8-bit
14:45:20 Chai: Segment Anything running as quantized models?
14:45:32 Joshua: correct, working on some enhancements
14:45:59 Chai: thanks!
14:46:39 Joshua: would love to connect with the Msft ONNX Runtime folks
14:46:44 Chai: would love to make that connection
14:47:47 Joshua: I've debugged the WebGPU issues with ONNX Runtime Web and would love to connect with people working on that feature
14:47:49 q?
14:48:41 ack Vivek
14:49:15 Vivek: thanks, this is fantastic! Scientific computing and linear algebra and a tensor API? Can you mention a bit more what the requirements would be?
14:49:29 Joshua: I want to be able to do pre- and post-processing
14:50:11 ... so if you go through the code of utils, maths, audio, tensor in JS, it is annoying that I had to implement these ops myself in JS
14:50:42 ... I think these should have Web APIs, maybe similar to NumPy
14:51:18 ... image resizing, many ways to do this; in the browser it is done with the canvas API, but it does not allow you to select the interpolation algorithm
14:51:30 ... this has performance implications
14:52:49 q+
14:52:53 ... other lib developers might also find these scientific computing and linear algebra helpers useful, see maths.js and tensor.js in utils
14:52:54 q?
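[Editorial note: the NumPy-style helpers Joshua mentions reimplementing in plain JS (cf. the maths.js and tensor.js utilities) are of the following kind. This is a minimal illustrative sketch, not Transformers.js code; the function names are ours.]

```javascript
// Typical post-processing helpers a WebML library ends up writing itself.
// Illustrative only; names and shapes are assumptions, not a library API.

function softmax(logits) {
  // Subtract the max before exponentiating for numerical stability.
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((x) => x / sum);
}

function argmax(values) {
  // Index of the largest element, e.g. to pick the top class label.
  return values.reduce((best, x, i) => (x > values[best] ? i : best), 0);
}
```

A classifier's raw logits would go through `softmax` and then `argmax` to select a label; the point of the discussion is that every JS ML library currently reimplements such primitives.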
14:52:56 ack ningxin_hu
14:53:03 q+
14:53:27 ningxin_hu: questions related to the memory limitation, the Wasm heap limitation; there are some workarounds using WebGPU
14:54:34 ... what is the ideal way for big model weights to be downloaded to the client side?
14:55:02 ... Wasm provides streaming compilation, compile it while streaming it
14:55:20 Joshua: in my mind the download and the size don't exceed 4GB when using the Wasm implementation
14:55:57 ... some translation models are ~1 GB, I wouldn't worry about how it is loaded as long as it's cached
14:56:13 ... now you load the model, it is saved so you use the cached version if you use it again
14:56:53 ... I wouldn't care if it is big as long as the browser handles caching and you can share it across websites
14:56:59 q?
14:57:33 ... saving to local storage is an ideal solution
14:58:36 ... as for loading weights as you process, I probably wouldn't advise running such models; if a 4 GB model is needed, it is maybe not realistic in the browser today, maybe in the future
14:59:28 anssik: are you using a CDN for models?
15:00:17 Joshua: serving models from huggingface.co currently
15:00:45 s/huggingface.co/huggingface.co model hub
15:01:24 q?
15:01:37 ack chai
15:02:08 chai: One specific question: you mentioned the popular models are quantized, and also that you look for WebGPU support
15:02:31 ... quantized 8-bit is not great in WebGPU, the more optimized data type is FP16
15:02:43 ... I'm wondering what your thoughts are here?
15:03:26 Joshua: I spoke about the desire for quantized models here
15:04:19 ... FP16 support with ops and running that on GPU, I haven't got to that point yet, I'm a one-man show currently :-)
15:05:10 Chai: my day job is ML platforms in the core Windows OS, happy to connect with you to help out here, now focusing on GPU
15:05:31 q?
15:06:20 Chai: we're dealing with the GPU issues typical developers face writing shaders
15:06:49 Joshua: WebGPU support is the next big thing on the todo list
15:06:53 q?
15:07:28 q?
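[Editorial note: on Joshua's earlier point that canvas-based image resizing does not let you choose the interpolation algorithm, a library must hand-roll resizing when a specific algorithm is required. Below is a hypothetical nearest-neighbour resize over a flat grayscale buffer; it is illustrative only and not Transformers.js code.]

```javascript
// Nearest-neighbour resize of a flat row-major grayscale buffer.
// Sketch only: real preprocessing pipelines also handle multi-channel
// images and other interpolation modes (bilinear, bicubic, ...).
function resizeNearest(src, srcW, srcH, dstW, dstH) {
  const dst = new Float32Array(dstW * dstH);
  for (let y = 0; y < dstH; y++) {
    // Map each destination row/column back to its nearest source pixel.
    const sy = Math.min(srcH - 1, Math.floor((y * srcH) / dstH));
    for (let x = 0; x < dstW; x++) {
      const sx = Math.min(srcW - 1, Math.floor((x * srcW) / dstW));
      dst[y * dstW + x] = src[sy * srcW + sx];
    }
  }
  return dst;
}
```

The choice of interpolation matters for model accuracy, which is why an uncontrollable canvas resize is a limitation for preprocessing.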
15:09:03 Topic: WebIDL and Infra standard conventions
15:09:17 i am good
15:09:27 anssik: I discussed with Chai how to make progress with our open PRs for the WebIDL and Infra standard conventions changes
15:09:43 ... we came to the conclusion it would help if there were a fork that tracks the official spec and integrates all these changes into it
15:09:52 ... I proposed to Zoltan that he could host the rendered version of such a fork at https://zolkis.github.io/webnn/
15:10:07 ... we could then use https://services.w3.org/htmldiff to visually compare the delta between the built versions of https://www.w3.org/TR/webnn/ and https://zolkis.github.io/webnn/, which is often faster and easier than source-level diffs
15:10:33 ... the WG would review all the changes in the fork together and merge wholesale once reviewed and ready.
15:10:49 ... I also discussed this plan with Zoltan and I think we agreed on the big picture, but wanted to have this discussion to sync all of us.
15:11:28 Chai: thanks Anssi! that's a good description of what I think would work better.
15:12:00 Zoltan: this will solve the review problem of how to deal with merging thousands of LOC
15:12:15 Chai: we can agree on the types of changes, stylistic vs. normative
15:12:30 ... those changes can be staged; when we look at a PR for the entire fork it is a lot of work
15:12:31 q?
15:12:48 ... Bikeshed is not ideal for diffing
15:13:18 ... a PR may become outdated, there is no magic bullet for how to ingest big changes, we must spend the time reviewing them; I'm convinced staging this as a fork reduces work
15:13:23 Zoltan: I agree
15:13:36 ... privately I have set up such a fork
15:13:58 ... I can make a GH Action that builds the spec in an integration branch and deploys the built spec
15:14:27 ... the changes are simple, adding algorithmic steps; I have separate branches for all the methods and an integration branch that unifies everything
15:15:16 ...
moving descriptions for arguments; if there are dictionaries in the IDL I move them into their own subsections, and argument sections become main text
15:15:30 ... we will have separate sections for polymorphic functions
15:16:22 ... with a polymorphic function we have generic text and use autolinking; the last change is internal slots for algorithms, those are merged already
15:16:54 Chai: my key point is atomicity of changes
15:17:24 ... from the PoV of the editors, we want to make sure that, compared to the baseline, by the time the PR is merged it does not leave any undefined state in the spec
15:18:03 ... we should do all stylistic changes in one change to make it an atomic change
15:18:27 Zoltan: I make all the changes and then we slice them into pieces for merging, would that work?
15:18:53 Chai: style changes and content changes should be separated
15:19:17 ... that will help regulate the proposed changes going into the mainline; otherwise it will be harder to review the entire fork
15:19:35 ... atomicity is important, I hope that makes sense
15:19:52 ... no need to review incrementally, bring the fork forward in one go
15:20:10 ... I will be done with my fork next week, will notify you when it is ready to review
15:20:29 Chai: I'd stop the in-flight PRs, go over to that fork, and port over the changes when the fork is ready
15:21:25 Zoltan: I can merge into the integration branch, I will share a proposal next week?
15:21:54 Chai: maintaining it as a branch or a fork, either way should work
15:21:59 ... personally I prefer a fork, so we can pull it forward
15:22:11 ... the mechanics of this are up to you, you must just stage it somewhere
15:22:22 Zoltan: some of the PRs have been merged
15:22:39 Chai: I'm aware, I'd prefer to stop that and bring all the rest of the changes in when you are ready
15:23:04 ... async and sync changes are content changes
15:23:22 Zoltan: batchNorm, clamp and concat you want closed and moved to the fork?
15:23:29 Chai: correct
15:23:40 i need to drop
15:26:06 RRSAgent, draft minutes
15:26:07 I have made the request to generate https://www.w3.org/2023/06/08-webmachinelearning-minutes.html anssik
15:26:27 ningxin_hu: an integration branch will be fine I think
15:27:50 RRSAgent, draft minutes
15:27:51 I have made the request to generate https://www.w3.org/2023/06/08-webmachinelearning-minutes.html anssik
15:28:52 @Anssi can I email you the slides?
15:28:52 https://github.com/Anssi -> @Anssi
16:53:40 s/[Joshua presents slides]/-> Transformers.js presentation slides https://lists.w3.org/Archives/Public/www-archive/2023Jun/att-0000/Transformers_js.pdf
16:53:50 RRSAgent, draft minutes
16:53:51 I have made the request to generate https://www.w3.org/2023/06/08-webmachinelearning-minutes.html anssik
17:24:31 Zakim has left #webmachinelearning