14:55:17 RRSAgent has joined #webmachinelearning
14:55:21 logging to https://www.w3.org/2025/02/13-webmachinelearning-irc
14:55:21 RRSAgent, make logs Public
14:55:22 please title this meeting ("meeting: ..."), anssik
14:55:22 Meeting: WebML WG Teleconference – 13 February 2025
14:55:27 Chair: Anssi
14:55:34 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-02-13-wg-agenda.md
14:55:39 Scribe: Anssi
14:55:46 scribeNick: anssik
14:55:54 gb, this is webmachinelearning/webnn
14:55:54 anssik, OK.
14:56:02 Present+ Anssi_Kostiainen
14:56:16 RRSAgent, draft minutes
14:56:17 I have made the request to generate https://www.w3.org/2025/02/13-webmachinelearning-minutes.html anssik
14:56:51 McCool has joined #webmachinelearning
14:57:55 Present+ Joshua_Bell
14:58:04 Present+ Michael_McCool
14:59:41 Present+ Zoltan_Kis
14:59:52 Mike_W has joined #webmachinelearning
15:00:08 Present+ Mike_Wyrzykowski
15:00:14 zkis has joined #webmachinelearning
15:00:20 Present+ Christian_Liebel
15:00:40 Present+ Tarek_Ziade
15:01:05 ningxin has joined #webmachinelearning
15:01:20 Present+ Ningxin_Hu
15:01:28 Present+ Joshua_Lochner
15:01:45 Joshua_Lochner has joined #webmachinelearning
15:01:53 Present+ Thomas_Steiner
15:02:31 RafaelCintron has joined #webmachinelearning
15:02:54 Present+ Markus
15:03:06 Tarek has joined #webmachinelearning
15:03:06 Present+ Dwayne_Robinson
15:03:10 If we have ~5 mins near the end of the call, I'd love to share some work I have done with benchmarking in Transformers.js! (largest-scale web benchmarking effort?)
15:03:41 jsbell has joined #webmachinelearning
15:03:53 Topic: Announcements
15:03:59 Yes this benchmarking suite is made in collaboration with a bunch of different teams :)
15:04:05 Subtopic: WebML Working Group Charter Advisory Committee review started
15:04:18 anssik: on 5 Feb 2025 W3C Advisory Committee representatives received a proposal to review the draft WebML WG charter we prepared together
15:04:24 -> Announcement https://lists.w3.org/Archives/Public/public-webmachinelearning-wg/2025Feb/0000.html
15:04:37 anssik: W3C invites comments on the proposed charter from the AC through 2025-03-06
15:04:49 ... relaying Dom's message, please make sure your Advisory Committee representative remembers to vote
15:04:55 ... please ping Dom or me if you need help identifying your AC rep
15:05:21 Subtopic: WebML Community Group meetings to kick off
15:06:00 anssik: based on the scheduling poll results, we'll kick off the bi-weekly Community Group meetings on 25/26 Feb 2025
15:06:05 ... on Tue 4-5 pm PST / Wed 00-01 am UTC / Wed 8-9 am CST / Wed 9-10 am JST
15:06:20 ... continuing on a twice-a-month cadence, 11/12 Mar, 1/2 Apr, ... adjusting for holidays and breaks as appropriate
15:06:37 ... I acknowledge this time is not great for EU-based participants
15:06:51 ... so I'm establishing a work mode where we recap the most recent CG discussion in these WG meetings
15:07:16 ... I've asked Etienne Noel to assume the role of scribe for the CG meetings; he will produce a summary to keep the entire group abreast of discussions and will help facilitate the CG meetings
15:07:38 ... I will work with Etienne on the CG agendas with input from the group
15:07:49 Etienne can't make *this* meeting today, apologies. But yes, he's shared an early draft of the agenda with me, looks good!
15:07:57 ... as the group's composition evolves, we may adjust the meeting time
15:08:21 ... I'm excited to kick off these new incubations and I already see broader community engagement with these proposals
15:08:31 ... any questions?
15:08:36 q?
15:08:41 q+
15:08:46 ack RafaelCintron
15:09:47 DwayneR has joined #webmachinelearning
15:10:05 Topic: ML in Web Extensions and Transformers.js Benchmarking
15:10:22 anssik: I'm pleased to welcome Tarek Ziade and Tomislav Jovanovic from Mozilla to talk about Firefox AI Runtime
15:10:35 Present+ Tomislav_Jovanovic
15:10:43 ... a new experimental API in Firefox Nightly for running offline ML tasks in web extensions
15:10:49 -> Firefox AI Runtime https://blog.mozilla.org/en/products/firefox/firefox-ai/running-inference-in-web-extensions/
15:10:52 my FOSDEM slides https://docs.google.com/presentation/d/1M38WbRtb9dHfKFPlg7K0MTJ2mOYEorX0uWTKDFjLrqE/edit#slide=id.g82761e80df_0_1948
15:11:10 anssik: the thrust of this discussion is to understand the emerging usage of ML in web extensions
15:11:45 ... given the different security model compared to the open web, this API can provide features that would not be feasible as-is on the open web today, for privacy or security reasons
15:12:05 ... I believe we can learn from this early experiment, and with further refinement some of these ideas could find their way to the open web and into the API we're defining here
15:12:25 ... one such feature being experimented with in the web extensions context is model sharing across origins, which would also greatly benefit WebNN
15:12:38 ... now I'll let Tarek and Tomislav share their story, followed by a brief Q&A, timeboxed to 10-15 mins
15:13:03 Tarek: we worked with Tomislav on this experiment
15:13:12 ... thanks Anssi for a great summary
15:13:24 ... I shared a link to the FOSDEM presentation on the topic
15:13:55 ... I will summarize the key takeaways from the FOSDEM presentation
15:14:13 ... our project was to provide an inference API for offline usage, surfaced to web extension developers
15:14:38 ... allow developers to experiment with the capability and understand what such an API could be used for
15:15:10 ... we added the first inference-based feature to Firefox in 2019: translation using Bergamot and Marian NMT via Wasm, an RNN model
15:15:40 ... a dedicated inference process; Wasm can run the inference task every time the user asks for the web page to be translated
15:16:25 ... a specific piece of this is that for some kernel operations that are slow in Wasm, we use Wasm built-ins compiled directly into Firefox using gemmology, which is faster without relying on the Wasm SIMD implementation
15:16:59 ... beyond translation, we provide more features running in the browser, e.g. image-to-text and other Transformers.js tasks
15:17:31 ... we picked Transformers.js because it is easy to use, close to Python Transformers, all the tasks are provided, and integration and experimentation are simple
15:17:41 ... Joshua did fantastic work on model coverage
15:18:14 ... the Transformers.js task-based API is easy to use
15:18:28 ... Transformers.js is in Firefox 133+
15:18:50 ... we implemented a custom model cache that stores models in IndexedDB in a cross-origin manner
15:18:58 ... we run our own inference process, so it is easier to secure
15:19:09 ... we have an allow-deny filter to check that a model matches the allowlist
15:19:23 ... currently it allows Mozilla's and Joshua's curated models
15:20:01 ... in pdf.js we get alt text for an image, using an image-to-text model that is downloaded with pdf.js
15:20:49 ... the web extensions AI API is behind a trial namespace, browser.trial.ml; the API can change or disappear at any time, to highlight that this is highly experimental
15:21:05 ... documentation is available on our website that demonstrates how to wrap the calls
15:21:06 tomayac8 has joined #webmachinelearning
15:21:30 ... the source tree has an example of a web extension that implements getting the alt text of an image on right click
15:21:39 ... the code is pretty simple, two phases:
15:21:50 ... create an engine and then run the inference
15:22:04 ... a bunch of events can be triggered
15:22:15 ... cached in IndexedDB
15:22:16 q?
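[For readers following along, a minimal sketch of the two-phase flow Tarek describes, based on the API shape shown in Mozilla's blog post linked above. Since browser.trial.ml is explicitly experimental, the names, the "trialML" manifest permission, and the argument shapes are assumptions that may have changed.]

```js
// Assumes a web extension with the "trialML" permission (per Mozilla's docs).
// Phase 1: create an engine for a Transformers.js task.
await browser.trial.ml.createEngine({
  taskName: "image-to-text",
});

// Optional: observe download/runtime events while the model is fetched
// and cached (in IndexedDB, as described above).
browser.trial.ml.onProgress.addListener((progressData) => {
  console.log(progressData);
});

// Phase 2: run the inference. The argument shape here is illustrative.
const result = await browser.trial.ml.runEngine({
  args: ["https://example.com/image.jpg"],
});
console.log(result); // e.g. [{ generated_text: "..." }]
```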
15:22:29 q+
15:22:52 ack RafaelCintron
15:23:16 RafaelCintron: do you plan on making this API an official web standard, or will it stay relegated to extensions?
15:23:34 tomayac1 has joined #webmachinelearning
15:23:55 Tarek: we don't yet know if this API would be useful; we want to use this experiment to first understand how people will use it and solicit feedback from folks, and if it is useful we can propose it to browsers
15:23:57 Shameless plug: https://blog.tomayac.com/2025/02/07/playing-with-ai-inference-in-firefox-web-extensions/
15:24:45 RafaelCintron: I'm asking because the API takes raw models from the internet; this group in the past experimented with Model Loader, which takes a model as input, but did not progress with it due to the lack of an interoperable model format
15:25:34 Tarek: we abstract this away, we use a task-based abstraction that is model agnostic
15:26:12 RafaelCintron: which operators are available and what they do is another consideration
15:26:18 q?
15:28:17 Tarek: there's a way to use a generic task, e.g. image-to-text, or if preferred, to specify the model by its id; what is unspecified is the schema for how to interface with models
15:28:18 q?
15:28:19 q+
15:28:23 q+
15:28:43 ack Joshua_Lochner
15:28:55 ack ningxin
15:29:30 ningxin: you can run a model on CPU using Wasm, or on GPU using WebGPU; how does the runtime decide which device the inference runs on?
15:29:42 Tarek: right now the default is to run everything on CPU
15:30:09 ... GPU has certain limitations with quantized models, e.g. Firefox does not support fp16 on WebGPU, so int8 is used as the quantization level
15:30:50 ... we don't restrict developers from setting the option: if the extension developer wants to use a specific GPU model, they can pass a device option, and a dtype to define the quantization
15:31:15 ... as long as such a model is available on the hub
15:31:19 anssik: is there a fallback?
15:31:31 Tarek: you get an error if you explicitly asked for a specific model
15:31:33 q?
15:31:36 We're working on an "auto" device and dtype setting btw :)
15:32:10 https://github.com/huggingface/transformers.js-benchmarking
15:32:16 Joshua_Lochner: I've been working on Transformers.js benchmarking
15:32:43 ... a benchmarking suite for running these models across browsers, a collaboration with a team at Google, the ONNX Runtime team, and Tarek
15:33:19 ... ideally the largest benchmarking suite for ONNX models; I've collected every single Transformers.js-compatible ONNX model on the HF Hub
15:33:24 ... around 5100 models in total
15:33:49 ... in the benchmarking suite, we define utility functions to stream models
15:34:24 ... we identify the ops used and the number of downloads per month
15:34:54 ... you see how many ops are used, e.g. the Kokoro model uses the most ops currently
15:35:30 ... it has some issues in Firefox and on Android; we hope this helps browser developers understand what models are popular
15:36:22 ... you can run tasks using the benchmarking tool and select the device used: Wasm, WebGPU, and we're adding WebNN
15:36:27 q?
15:36:57 anssik: thanks Joshua for the world reveal!
15:37:04 really nice work!
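[As a concrete illustration of the task-based API and the device/dtype options discussed in the Q&A above, a minimal Transformers.js v3 sketch. The model id is an example; which device/dtype combinations actually work depends on what is published on the Hub.]

```js
import { pipeline } from "@huggingface/transformers";

// Task-based API: pick a task, and optionally a specific model by its id.
const captioner = await pipeline(
  "image-to-text",
  "Xenova/vit-gpt2-image-captioning", // example model id
  {
    device: "webgpu", // otherwise Wasm/CPU is typically used
    dtype: "q8",      // quantization level; fp16 may not be available everywhere
  }
);

// Run inference on an image URL.
const output = await captioner("https://example.com/image.jpg");
console.log(output); // e.g. [{ generated_text: "..." }]
```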
15:38:16 q+
15:38:16 Do you have an info column for which models are pure ONNX domain? I saw some that use unofficial contrib ops like SkipLayerNormalization, MatMulNBits.
15:39:58 Joshua_Lochner: not in the CSV file, but in packages/core/src/operators.js in the repo we have the list defined
15:40:00 Thanks - answered.
15:40:10 The transformers.js ops would be super useful, thanks Joshua for sharing!
15:40:14 ack Tarek
15:40:24 Tarek: great work, thanks Joshua!
15:40:41 ... for some models, ONNX is optimizing the graph on the fly and uses ops other than those in the graph
15:41:00 ... do you read the graph statically, or can you identify graph ops that are executed after the optimization step?
15:41:15 Joshua_Lochner: currently static, reading the header of the ONNX file until it reaches the boundary
15:41:40 ... we're not doing optimization off the bat; many of the optimized Llama models do optimization ahead of time
15:42:49 anssik: I'll invite you to a future meeting for an update on this benchmarking topic
15:43:11 Topic: Device selection
15:43:15 Subtopic: Remove MLDeviceType
15:43:25 anssik: PR #809
15:43:26 https://github.com/webmachinelearning/webnn/pull/809 -> Pull Request 809 Remove MLDeviceType (by zolkis) [device selection]
15:43:32 -> Device Selection Explainer https://github.com/webmachinelearning/webnn/blob/main/device-selection-explainer.md
15:43:45 anssik: I asked the group to conduct a final review of the spec PR to remove MLDeviceType as outlined in the explainer
15:43:54 ... thank you Zoltan for the PR and reviewers for your feedback and suggestions!
15:43:58 ... summary of changes, with a short sketch of the resulting API below:
15:44:06 ... - remove MLDeviceType and related prose
15:44:17 ... - update the "create a context" algorithm, removing any MLDeviceType references
15:44:32 ... - be explicit that MLContextOptions is a hint
15:44:40 ... PR #809 closes issues #302 #350 #749
15:44:41 https://github.com/webmachinelearning/webnn/issues/749 -> Issue 749 MLContextOptions.deviceType seems unnecessary outside of conformance testing (by mwyrzykowski) [device selection]
15:44:41 https://github.com/webmachinelearning/webnn/issues/350 -> Issue 350 Need to understand how WebNN supports implementation that involves multiple devices and timelines (by wchao1115) [question] [webgpu interop]
15:44:41 https://github.com/webmachinelearning/webnn/issues/302 -> CLOSED Issue 302 API simplification: context types, context options, createContext() (by zolkis) [device selection]
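[A minimal sketch of WebNN context creation after this change, assuming only the existing powerPreference option remains; the implementation, not the developer, picks the actual device.]

```js
// MLContextOptions is now purely a hint; there is no deviceType to request.
const context = await navigator.ml.createContext({
  powerPreference: "high-performance", // or "low-power"; the implementation decides
});

// Graph building proceeds as before, against whatever device(s) the
// implementation selected for this context.
const builder = new MLGraphBuilder(context);
```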
15:45:14 ... the query mechanism design was spun off into a separate issue and will be discussed next
15:45:27 ... this split allows us to address the device selection problem space in a piecemeal fashion
15:45:39 ... the PR has been reviewed and approved, we are ready to merge it, thanks all!
15:45:50 ... any questions or comments?
15:46:21 Zoltan: this is the first step; the next step is to work on Apple's proposal for opLimits, and in between came the new use case for a query mechanism
15:46:42 ... it is good to merge this PR now and work on the query mechanism and opLimits on top of this PR
15:47:22 anssik: any concerns about merging the PR?
15:47:25 q+
15:47:31 ack Mike_W
15:47:51 Mike_W: OK to merge
15:48:22 anssik: we can proceed to merge PR #809
15:48:23 https://github.com/webmachinelearning/webnn/pull/809 -> Pull Request 809 Remove MLDeviceType (by zolkis) [device selection]
15:48:34 Subtopic: Query mechanism
15:48:41 anssik: issue #815
15:48:41 https://github.com/webmachinelearning/webnn/issues/815 -> Issue 815 Post-context creation query mechanism for supported devices (by anssiko) [device selection]
15:48:44 handellm has joined #webmachinelearning
15:48:45 ... I spun off this issue from PR #809
15:49:15 ... the premise of this issue is to enable querying the context for what kind of devices it actually supports, rather than prescribing a preferred device type
15:49:25 ... I put forward a simple API proposal as a discussion starter
15:49:43 ... first, I think it is important to understand the use cases for such a feature
15:50:13 ... then, second, we can compress the possible solution space by what can be implemented in an interoperable and future-proof manner
15:51:00 ... third, the more specific the information we expose through the API, the bigger the fingerprint, and the greater the conceptual weight affecting developer ergonomics
15:51:03 tomayac7 has joined #webmachinelearning
15:51:28 ... it is also good to remember that sometimes the best solution is no new API surface at all, i.e. let the implementation decide
15:51:49 ... to balance that extreme, I also hear we cannot expect an implementation to handle everything, and some sort of query mechanism would be warranted
15:51:54 q+
15:51:57 ... now, it is up for discussion what the right level of abstraction is
15:52:04 q+
15:52:10 ... I'd ask us to think about use cases first, implementability second, and the API shape third
15:52:23 q-
15:52:27 ... the initial use cases:
15:52:39 ... - allow the web app to select a model optimized for a device type ahead of compile time
15:52:48 ... - allow the web app to fall back to WebGPU shaders if "high performance" WebNN is not available
15:52:59 q?
15:53:05 q+
15:53:21 Present+ Markus_Handell
15:54:06 tomayac7 has joined #webmachinelearning
15:54:17 Markus: one use case: imagine real-time processing in an app, e.g. background blur, and we have an advanced model that runs with acceptable performance on Apple Silicon or Intel, but there's something in the model that inhibits execution on the NPU and we fall back to CPU
15:55:01 ... experimenting locally with WebNN, there's a 10x difference between CPU and GPU execution in some test cases; in such cases we'd fall back to a lower-resolution model or use Wasm
15:55:22 ... it is also important for us that the query mechanism is not slow, as it will have an impact on UX
15:55:26 q?
15:56:02 anssik: should the query mechanism be available before the model is downloaded or compiled?
15:56:17 q+
15:56:38 q+
15:57:06 Markus: before downloading the model it would be good to know if e.g. only CPU is supported
15:57:53 q?
15:58:04 anssik: I see this as a three-stage process (sketched below):
15:58:08 ... - 1) hints provided at context-creation time (I want a "high-performance" context)
15:58:12 ... - 2) device availability after context creation ("does the context have a GPU?")
15:58:16 ... - 3) device capabilities after compile ("can you actually run this model?")
15:58:17 q?
15:58:27 ack handellm
15:58:27 q-
15:58:31 ack zkis
15:58:51 Zoltan: there's also 0) a pre-context-creation query
15:58:53 q?
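[To make the stages concrete, a sketch under clearly labeled assumptions: stage 1 uses the existing powerPreference hint, and the spec's opSupportLimits() method approximates capability queries; the stage-2 "devices" attribute is purely hypothetical, invented here to illustrate the discussion-starter shape in issue #815, and stages 0 and 3 have no API today.]

```js
// Stage 0 (pre-context-creation query): no counterpart in the spec today.

// Stage 1: hint at context creation (exists today).
const context = await navigator.ml.createContext({
  powerPreference: "high-performance",
});

// Stage 2: HYPOTHETICAL post-creation availability query (NOT in the spec):
// if (context.devices?.includes("gpu")) { /* fetch the GPU-optimized model */ }

// Per-op data type limits are queryable after context creation via
// opSupportLimits(); a true post-compile stage-3 query does not exist yet.
const limits = context.opSupportLimits();
console.log(limits);
```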
15:58:57 ack RafaelCintron
15:59:53 RafaelCintron: question to Markus: you said you experimented with WebNN with HW acceleration, and the fallback to CPU had a performance impact
16:00:48 Markus: there was a bug in the Core ML backend that caused a fallback to CPU
16:00:56 anssik: sounds like an implementation bug
16:01:25 RafaelCintron: if WebNN is available it should always do as good a job as WebGPU
16:01:25 q?
16:01:34 Model data type (like float16) can make a big perf difference, so knowing before model download is useful.
16:01:34 float32: runs on both GPU and CPU, but not on many NPUs
16:01:34 float16: works on NPUs. On GPUs it offers a ~2x boost over float32, but it is much slower on CPUs because CPUs lack dedicated float16 hardware.
16:01:34 int8: works well on CPUs and NPUs, but GPUs suffer.
16:01:52 RafaelCintron: we should continue to improve the WebNN implementation to ensure that is the case
16:01:55 q?
16:02:04 Zoltan: is the use case to limit fallbacks?
16:02:11 Markus: yes, failing with an error would be better
16:02:12 q?
16:02:16 ack Mike_W
16:02:55 Mike: I want to say that from Apple's perspective there's support for a query mechanism to decide what models to download, but we think the names "gpu" and "npu" don't add helpful information
16:03:22 q+
16:04:02 ... a query mechanism should be based on capabilities, not device names
16:04:05 q?
16:04:07 ack zkis
16:04:24 Zoltan: it is possible to specify a generic mechanism
16:05:03 anssik: it feels like a future-proof solution might be to provide a high-level abstraction that stands the test of time
16:05:19 ... can we come up with an expressive enough set of high-level future-proof hints that allow meaningful mapping to concrete devices?
16:06:07 ... instead of hard-coding the hints-to-device mapping, I'd envision an implementation would create a "symbolic link" from (a set of) these hints to concrete device(s), and the mapping could evolve over time
16:06:45 ... hints such as "high-performance" and an imaginary "full-precision" together might map to a GPU
16:06:45 ... in a future version of the same implementation, or on another system, the same hints could map to a different device
16:07:45 q?
16:08:23 RRSAgent, draft minutes
16:08:24 I have made the request to generate https://www.w3.org/2025/02/13-webmachinelearning-minutes.html anssik
16:26:45 RRSAgent, draft minutes
16:26:46 I have made the request to generate https://www.w3.org/2025/02/13-webmachinelearning-minutes.html anssik
16:28:03 RRSAgent, draft minutes
16:28:05 I have made the request to generate https://www.w3.org/2025/02/13-webmachinelearning-minutes.html anssik
18:28:29 Zakim has left #webmachinelearning