14:55:17 RRSAgent has joined #webmachinelearning
14:55:21 logging to https://www.w3.org/2025/02/13-webmachinelearning-irc
14:55:21 RRSAgent, make logs Public
14:55:22 please title this meeting ("meeting: ..."), anssik
14:55:22 Meeting: WebML WG Teleconference – 13 February 2025
14:55:27 Chair: Anssi
14:55:34 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-02-13-wg-agenda.md
14:55:39 Scribe: Anssi
14:55:46 scribeNick: anssik
14:55:54 gb, this is webmachinelearning/webnn
14:55:54 anssik, OK.
14:56:02 Present+ Anssi_Kostiainen
14:56:16 RRSAgent, draft minutes
14:56:17 I have made the request to generate https://www.w3.org/2025/02/13-webmachinelearning-minutes.html anssik
14:56:51 McCool has joined #webmachinelearning
14:57:55 Present+ Joshua_Bell
14:58:04 Present+ Michael_McCool
14:59:41 Present+ Zoltan_Kis
14:59:52 Mike_W has joined #webmachinelearning
15:00:08 Present+ Mike_Wyrzykowski
15:00:14 zkis has joined #webmachinelearning
15:00:20 Present+ Christian_Liebel
15:00:40 Present+ Tarek_Ziade
15:01:05 ningxin has joined #webmachinelearning
15:01:20 Present+ Ningxin_Hu
15:01:28 Present+ Joshua_Lochner
15:01:45 Joshua_Lochner has joined #webmachinelearning
15:01:53 Present+ Thomas_Steiner
15:02:31 RafaelCintron has joined #webmachinelearning
15:02:54 Present+ Markus
15:03:06 Tarek has joined #webmachinelearning
15:03:06 Present+ Dwayne_Robinson
15:03:10 If we have ~5 mins near the end of the call, I'd love to share some work I have done with benchmarking in Transformers.js! (largest-scale web benchmarking effort?)
15:03:41 jsbell has joined #webmachinelearning
15:03:53 Topic: Announcements
15:03:59 Yes this benchmarking suite is made in collaboration with a bunch of different teams :)
15:04:05 Subtopic: WebML Working Group Charter Advisory Committee review started
15:04:18 anssik: on 5 Feb 2025 W3C Advisory Committee representatives received a proposal to review the draft WebML WG charter we prepared together
15:04:24 -> Announcement https://lists.w3.org/Archives/Public/public-webmachinelearning-wg/2025Feb/0000.html
15:04:37 anssik: W3C invites comments on the proposed charter from the AC through 2025-03-06
15:04:49 ... relaying Dom's message, please make sure your Advisory Committee representative remembers to vote
15:04:55 ... please ping Dom or me if you need help identifying your AC rep
15:05:21 Subtopic: WebML Community Group meetings to kick off
15:06:00 anssik: based on the scheduling poll results, we'll kick off the bi-weekly Community Group meetings on 25/26 Feb 2025
15:06:05 ... on Tue 4-5 pm PST / Wed 00-01 am UTC / Wed 8-9 am CST / Wed 9-10 am JST
15:06:20 ... continuing on a twice-a-month cadence, 11/12 Mar, 1/2 Apr, ... adjusting for holidays and breaks as appropriate
15:06:37 ... I acknowledge this time is not great for EU-based participants
15:06:51 ... so I'm establishing a work mode where we recap the most recent CG discussion in these WG meetings
15:07:16 ... I've asked Etienne Noel to assume the role of scribe for the CG meetings; he will produce a summary to keep the entire group abreast of discussions and will help facilitate the CG meetings
15:07:38 ... I will work with Etienne on the CG agendas with input from the group
15:07:49 Etienne can't make *this* meeting today, apologies. But yes, he's shared an early draft of the agenda with me, looks good!
15:07:57 ... as the group's composition evolves, we may adjust the meeting time
15:08:21 ... I'm excited to kick off these new incubations and I already see broader community engagement with these proposals
15:08:31 ... any questions?
15:08:36 q?
15:08:41 q+
15:08:46 ack RafaelCintron
15:09:47 DwayneR has joined #webmachinelearning
15:10:05 Topic: ML in Web Extensions and Transformers.js Benchmarking
15:10:22 anssik: I'm pleased to welcome Tarek Ziade and Tomislav Jovanovic from Mozilla to talk about Firefox AI Runtime
15:10:35 Present+ Tomislav_Jovanovic
15:10:43 ... a new experimental API in Firefox Nightly for running offline ML tasks in web extensions
15:10:49 -> Firefox AI Runtime https://blog.mozilla.org/en/products/firefox/firefox-ai/running-inference-in-web-extensions/
15:10:52 my FOSDEM slides https://docs.google.com/presentation/d/1M38WbRtb9dHfKFPlg7K0MTJ2mOYEorX0uWTKDFjLrqE/edit#slide=id.g82761e80df_0_1948
15:11:10 anssik: the thrust of this discussion is to understand the emerging usage of ML in web extensions
15:11:45 ... given the different security model compared to the open web, this API can provide features that would not be feasible as-is on the open web today, for privacy or security reasons
15:12:05 ... I believe we can learn from this early experiment, and with further refinement some of these ideas could find their way to the open web and into the API we're defining here
15:12:25 ... one such feature being experimented with in the web extensions context is model sharing across origins, which would also greatly benefit WebNN
15:12:38 ... now I'll let Tarek and Tomislav share their story, followed by a brief Q&A, timeboxed to 10-15 mins
15:13:03 Tarek: we worked with Tomislav on this experiment
15:13:12 ... thanks Anssi for a great summary
15:13:24 ... I shared a link to the FOSDEM presentation on the topic
15:13:55 ... I will summarize the key takeaways from the FOSDEM presentation
15:14:13 ... our project was to provide an inference API for offline usage, surfaced to web extension developers
15:14:38 ... allow developers to experiment with the capability and understand what such an API could be used for
15:15:10 ... we added the first inference-based feature to Firefox in 2019: translation using Bergamot and Marian NMT via Wasm, an RNN model
15:15:40 ... a dedicated inference process; Wasm can run the inference task every time the user asks for the web page to be translated
15:16:25 ... a specific piece of this is that for some kernel operations that are slow in Wasm, we use Wasm built-ins compiled directly into Firefox using gemmology, which is faster without relying on the Wasm SIMD implementation
15:16:59 ... beyond translation, we provide more features running in the browser, e.g. image-to-text and other Transformers.js tasks
15:17:31 ... we picked Transformers.js because it is easy to use, close to Python Transformers, all the tasks are provided, and integration and experimentation are simple
15:17:41 ... Joshua did fantastic work on model coverage
15:18:14 ... the Transformers.js task-based API is easy to use
15:18:28 ... Transformers.js is in Firefox 133+
15:18:50 ... we implemented a custom model cache that stores models in IndexedDB in a cross-origin manner
15:18:58 ... we run our own inference process, so it is easier to secure
15:19:09 ... we have an allow-deny filter to check that a model matches the allowlist
15:19:23 ... currently it allows Mozilla's and Joshua's curated models
15:20:01 ... in pdf.js we get alt text for an image, using an image-to-text model that is downloaded with pdf.js
15:20:49 ... the web extensions AI API is behind a trial namespace, browser.trial.ml; the API can change or disappear at any time, to highlight that this is highly experimental
15:21:05 ... documentation is available on our website that demonstrates how to wrap the calls
15:21:06 tomayac8 has joined #webmachinelearning
15:21:30 ... the source tree has an example of a web extension that implements getting the alt text of an image on right click
15:21:39 ... the code is pretty simple, two phases:
15:21:50 ... create an engine and then run the inference
15:22:04 ... a bunch of events can be triggered
15:22:15 ... cached in IndexedDB
15:22:16 q?
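[For readers following along, a minimal sketch of the two-phase flow Tarek describes, based on the API shape shown in Mozilla's blog post linked above. Since browser.trial.ml is explicitly experimental, the names, the "trialML" manifest permission, and the argument shapes are assumptions that may have changed.]

```js
// Assumes a web extension with the "trialML" permission (per Mozilla's docs).
// Phase 1: create an engine for a Transformers.js task.
await browser.trial.ml.createEngine({
  taskName: "image-to-text",
});

// Optional: observe download/runtime events while the model is fetched
// and cached (in IndexedDB, as described above).
browser.trial.ml.onProgress.addListener((progressData) => {
  console.log(progressData);
});

// Phase 2: run the inference. The argument shape here is illustrative.
const result = await browser.trial.ml.runEngine({
  args: ["https://example.com/image.jpg"],
});
console.log(result); // e.g. [{ generated_text: "..." }]
```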
15:22:29 q+
15:22:52 ack RafaelCintron
15:23:16 RafaelCintron: do you plan on making this API an official web standard, or will it stay relegated to extensions?
15:23:34 tomayac1 has joined #webmachinelearning
15:23:55 Tarek: we don't yet know if this API would be useful; we want to use this experiment to first understand how people will use it and solicit feedback from folks, and if it is useful we can propose it to browsers
15:23:57 Shameless plug: https://blog.tomayac.com/2025/02/07/playing-with-ai-inference-in-firefox-web-extensions/
15:24:45 RafaelCintron: I'm asking because the API takes raw models from the internet; this group in the past experimented with Model Loader, which takes a model as input, but did not progress with it due to the lack of an interoperable model format
15:25:34 Tarek: we abstract this away, we use a task-based abstraction that is model agnostic
15:26:12 RafaelCintron: which operators are available and what they do is another consideration
15:26:18 q?
15:28:17 Tarek: there's a way to use a generic task, e.g. image-to-text, or if preferred, to specify the model by its id; what is unspecified is the schema for how to interface with models
15:28:18 q?
15:28:19 q+
15:28:23 q+
15:28:43 ack Joshua_Lochner
15:28:55 ack ningxin
15:29:30 ningxin: you can run a model on CPU using Wasm, or on GPU using WebGPU; how does the runtime decide which device the inference runs on?
15:29:42 Tarek: right now the default is to run everything on CPU
15:30:09 ... GPU has certain limitations with quantized models, e.g. Firefox does not support fp16 on WebGPU, so int8 is used as the quantization level
15:30:50 ... we don't restrict developers from setting the option: if the extension developer wants to use a specific GPU model, they can pass a device option, and a dtype to define the quantization
15:31:15 ... as long as such a model is available on the hub
15:31:19 anssik: is there a fallback?
15:31:31 Tarek: you get an error if you explicitly asked for a specific model
15:31:33 q?
15:31:36 We're working on an "auto" device and dtype setting btw :)
15:32:10 https://github.com/huggingface/transformers.js-benchmarking
15:32:16 Joshua_Lochner: I've been working on Transformers.js benchmarking
15:32:43 ... a benchmarking suite for running these models across browsers, a collaboration with a team at Google, the ONNX Runtime team, and Tarek
15:33:19 ... ideally the largest benchmarking suite for ONNX models; I've collected every single Transformers.js-compatible ONNX model on the HF Hub
15:33:24 ... around 5100 models in total
15:33:49 ... in the benchmarking suite, we define utility functions to stream models
15:34:24 ... we identify the ops used and the number of downloads per month
15:34:54 ... you see how many ops are used, e.g. the Kokoro model uses the most ops currently
15:35:30 ... it has some issues in Firefox and on Android; we hope this helps browser developers understand what models are popular
15:36:22 ... you can run tasks using the benchmarking tool and select the device used: Wasm, WebGPU, and we're adding WebNN
15:36:27 q?
15:36:57 anssik: thanks Joshua for the world reveal!
15:37:04 really nice work!
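[As a concrete illustration of the task-based API and the device/dtype options discussed in the Q&A above, a minimal Transformers.js v3 sketch. The model id is an example; which device/dtype combinations actually work depends on what is published on the Hub.]

```js
import { pipeline } from "@huggingface/transformers";

// Task-based API: pick a task, and optionally a specific model by its id.
const captioner = await pipeline(
  "image-to-text",
  "Xenova/vit-gpt2-image-captioning", // example model id
  {
    device: "webgpu", // otherwise Wasm/CPU is typically used
    dtype: "q8",      // quantization level; fp16 may not be available everywhere
  }
);

// Run inference on an image URL.
const output = await captioner("https://example.com/image.jpg");
console.log(output); // e.g. [{ generated_text: "..." }]
```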
15:38:16 q+
15:38:16 Do you have an info column for which models are pure ONNX domain? I saw some that use unofficial contrib ops like SkipLayerNormalization, MatMulNBits.
15:39:58 Joshua_Lochner: not in the CSV file, but in packages/core/src/operators.js in the repo we have the list defined
15:40:00 Thanks - answered.
15:40:10 The transformers.js ops would be super useful, thanks Joshua for sharing!
15:40:14 ack Tarek
15:40:24 Tarek: great work, thanks Joshua!
15:40:41 ... for some models, ONNX is optimizing the graph on the fly and uses ops other than those in the graph
15:41:00 ... do you read the graph statically, or can you identify graph ops that are executed after the optimization step?
15:41:15 Joshua_Lochner: currently static, reading the header of the ONNX file until it reaches the boundary
15:41:40 ... we're not doing optimization off the bat; many of the optimized Llama models do optimization ahead of time
15:42:49 anssik: I'll invite you to a future meeting for an update on this benchmarking topic
15:43:11 Topic: Device selection
15:43:15 Subtopic: Remove MLDeviceType
15:43:25 anssik: PR #809
15:43:26 https://github.com/webmachinelearning/webnn/pull/809 -> Pull Request 809 Remove MLDeviceType (by zolkis) [device selection]
15:43:32 -> Device Selection Explainer https://github.com/webmachinelearning/webnn/blob/main/device-selection-explainer.md
15:43:45 anssik: I asked the group to conduct a final review of the spec PR to remove MLDeviceType as outlined in the explainer
15:43:54 ... thank you Zoltan for the PR and reviewers for your feedback and suggestions!
15:43:58 ... summary of changes, with a short sketch of the resulting API below:
15:44:06 ... - remove MLDeviceType and related prose
15:44:17 ... - update the "create a context" algorithm, removing any MLDeviceType references
15:44:32 ... - be explicit that MLContextOptions is a hint
15:44:40 ... PR #809 closes issues #302 #350 #749
15:44:41 https://github.com/webmachinelearning/webnn/issues/749 -> Issue 749 MLContextOptions.deviceType seems unnecessary outside of conformance testing (by mwyrzykowski) [device selection]
15:44:41 https://github.com/webmachinelearning/webnn/issues/350 -> Issue 350 Need to understand how WebNN supports implementation that involves multiple devices and timelines (by wchao1115) [question] [webgpu interop]
15:44:41 https://github.com/webmachinelearning/webnn/issues/302 -> CLOSED Issue 302 API simplification: context types, context options, createContext() (by zolkis) [device selection]
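[A minimal sketch of WebNN context creation after this change, assuming only the existing powerPreference option remains; the implementation, not the developer, picks the actual device.]

```js
// MLContextOptions is now purely a hint; there is no deviceType to request.
const context = await navigator.ml.createContext({
  powerPreference: "high-performance", // or "low-power"; the implementation decides
});

// Graph building proceeds as before, against whatever device(s) the
// implementation selected for this context.
const builder = new MLGraphBuilder(context);
```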
15:45:14 ... the query mechanism design was spun off into a separate issue and will be discussed next
15:45:27 ... this split allows us to address the device selection problem space in a piecemeal fashion
15:45:39 ... the PR has been reviewed and approved, we are ready to merge it, thanks all!
15:45:50 ... any questions or comments?
15:46:21 Zoltan: this is the first step; the next step is to work on Apple's proposal for opLimits, and in between came the new use case for a query mechanism
15:46:42 ... it is good to merge this PR now and work on the query mechanism and opLimits on top of this PR
15:47:22 anssik: any concerns about merging the PR?
15:47:25 q+
15:47:31 ack Mike_W
15:47:51 Mike_W: OK to merge
15:48:22 anssik: we can proceed to merge PR #809
15:48:23 https://github.com/webmachinelearning/webnn/pull/809 -> Pull Request 809 Remove MLDeviceType (by zolkis) [device selection]
15:48:34 Subtopic: Query mechanism
15:48:41 anssik: issue #815
15:48:41 https://github.com/webmachinelearning/webnn/issues/815 -> Issue 815 Post-context creation query mechanism for supported devices (by anssiko) [device selection]
15:48:44 handellm has joined #webmachinelearning
15:48:45 ... I spun off this issue from PR #809
15:49:15 ... the premise of this issue is to enable querying the context for what kind of devices it actually supports, rather than prescribing a preferred device type
15:49:25 ... I put forward a simple API proposal as a discussion starter
15:49:43 ... first, I think it is important to understand the use cases for such a feature
15:50:13 ... then, second, we can compress the possible solution space by what can be implemented in an interoperable and future-proof manner
15:51:00 ... third, the more specific the information we expose through the API, the bigger the fingerprint, and the greater the conceptual weight affecting developer ergonomics
15:51:03 tomayac7 has joined #webmachinelearning
15:51:28 ... it is also good to remember that sometimes the best solution is no new API surface at all, i.e. let the implementation decide
15:51:49 ... to balance that extreme, I also hear we cannot expect an implementation to handle everything, and some sort of query mechanism would be warranted
15:51:54 q+
15:51:57 ... now, it is up for discussion what the right level of abstraction is
15:52:04 q+
15:52:10 ... I'd ask us to think about use cases first, implementability second, and the API shape third
15:52:23 q-
15:52:27 ... the initial use cases:
15:52:39 ... - allow the web app to select a model optimized for a device type ahead of compile time
15:52:48 ... - allow the web app to fall back to WebGPU shaders if "high performance" WebNN is not available
15:52:59 q?
15:53:05 q+
15:53:21 Present+ Markus_Handell
15:54:06 tomayac7 has joined #webmachinelearning
15:54:17 Markus: one use case: imagine real-time processing in an app, e.g. background blur, and we have an advanced model that runs with acceptable performance on Apple Silicon or Intel, but there's something in the model that inhibits execution on the NPU and we fall back to CPU
15:55:01 ... experimenting locally with WebNN, there's a 10x difference between CPU and GPU execution in some test cases; in such cases we'd fall back to a lower-resolution model or use Wasm
15:55:22 ... it is also important for us that the query mechanism is not slow, as it will have an impact on UX
15:55:26 q?
15:56:02 anssik: should the query mechanism be available before the model is downloaded or compiled?
15:56:17 q+
15:56:38 q+
15:57:06 Markus: before downloading the model it would be good to know if e.g. only CPU is supported
15:57:53 q?
15:58:04 anssik: I see this as a three-stage process (sketched below):
15:58:08 ... - 1) hints provided at context-creation time (I want a "high-performance" context)
15:58:12 ... - 2) device availability after context creation ("does the context have a GPU?")
15:58:16 ... - 3) device capabilities after compile ("can you actually run this model?")
15:58:17 q?
15:58:27 ack handellm
15:58:27 q-
15:58:31 ack zkis
15:58:51 Zoltan: there's also 0) a pre-context-creation query
15:58:53 q?
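[To make the stages concrete, a sketch under clearly labeled assumptions: stage 1 uses the existing powerPreference hint, and the spec's opSupportLimits() method approximates capability queries; the stage-2 "devices" attribute is purely hypothetical, invented here to illustrate the discussion-starter shape in issue #815, and stages 0 and 3 have no API today.]

```js
// Stage 0 (pre-context-creation query): no counterpart in the spec today.

// Stage 1: hint at context creation (exists today).
const context = await navigator.ml.createContext({
  powerPreference: "high-performance",
});

// Stage 2: HYPOTHETICAL post-creation availability query (NOT in the spec):
// if (context.devices?.includes("gpu")) { /* fetch the GPU-optimized model */ }

// Per-op data type limits are queryable after context creation via
// opSupportLimits(); a true post-compile stage-3 query does not exist yet.
const limits = context.opSupportLimits();
console.log(limits);
```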
15:58:57 ack RafaelCintron
15:59:53 RafaelCintron: question to Markus: you said you experimented with WebNN with HW acceleration, and the fallback to CPU had a performance impact
16:00:48 Markus: there was a bug in the Core ML backend that caused a fallback to CPU
16:00:56 anssik: sounds like an implementation bug
16:01:25 RafaelCintron: if WebNN is available it should always do as good a job as WebGPU
16:01:25 q?
16:01:34 Model data type (like float16) can make a big perf difference, so knowing before model download is useful.
16:01:34 float32: runs on both GPU and CPU, but not on many NPUs
16:01:34 float16: works on NPUs. On GPUs it offers a ~2x boost over float32, but it is much slower on CPUs because CPUs lack dedicated float16 hardware.
16:01:34 int8: works well on CPUs and NPUs, but GPUs suffer.
16:01:52 RafaelCintron: we should continue to improve the WebNN implementation to ensure that is the case
16:01:55 q?
16:02:04 Zoltan: is the use case to limit fallbacks?
16:02:11 Markus: yes, failing with an error would be better
16:02:12 q?
16:02:16 ack Mike_W
16:02:55 Mike: I want to say that from Apple's perspective there's support for a query mechanism to decide what models to download, but we think the names "gpu" and "npu" don't add helpful information
16:03:22 q+
16:04:02 ... a query mechanism should be based on capabilities, not device names
16:04:05 q?
16:04:07 ack zkis
16:04:24 Zoltan: it is possible to specify a generic mechanism
16:05:03 anssik: it feels like a future-proof solution might be to provide a high-level abstraction that stands the test of time
16:05:19 ... can we come up with an expressive enough set of high-level future-proof hints that allow meaningful mapping to concrete devices?
16:06:07 ... instead of hard-coding the hints-to-device mapping, I'd envision an implementation would create a "symbolic link" from (a set of) these hints to concrete device(s), and the mapping could evolve over time
16:06:45 ... hints such as "high-performance" and an imaginary "full-precision" together might map to a GPU
16:06:45 ... in a future version of the same implementation, or on another system, the same hints could map to a different device
16:07:45 q?
16:08:23 RRSAgent, draft minutes
16:08:24 I have made the request to generate https://www.w3.org/2025/02/13-webmachinelearning-minutes.html anssik
16:26:45 RRSAgent, draft minutes
16:26:46 I have made the request to generate https://www.w3.org/2025/02/13-webmachinelearning-minutes.html anssik
16:28:03 RRSAgent, draft minutes
16:28:05 I have made the request to generate https://www.w3.org/2025/02/13-webmachinelearning-minutes.html anssik
18:28:29 Zakim has left #webmachinelearning