14:54:35 RRSAgent has joined #webmachinelearning
14:54:39 logging to https://www.w3.org/2024/12/05-webmachinelearning-irc
14:54:39 RRSAgent, make logs Public
14:54:40 please title this meeting ("meeting: ..."), anssik
14:54:40 Meeting: WebML WG Teleconference – 5 December 2024
14:54:59 Chair: Anssi
14:55:03 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2024-12-05-wg-agenda.md
14:55:07 Scribe: Anssi
14:55:11 scribeNick: anssik
14:55:19 gb, this is webmachinelearning/webnn
14:55:19 anssik, OK.
14:55:24 Present+ Anssi_Kostiainen
14:55:32 RRSAgent, draft minutes
14:55:33 I have made the request to generate https://www.w3.org/2024/12/05-webmachinelearning-minutes.html anssik
14:57:09 Present+ Zoltan_Kis
14:59:38 Present+ Etienne_Noel
14:59:58 Present+ Joshua_Lochner
15:00:41 Present+ Mike_Wyrzykowski
15:00:54 MikeW has joined #webmachinelearning
15:00:59 Regrets+ Rafael_Cintron
15:01:06 Present+ Dwayne_Robinson
15:01:20 RafaelCintron has joined #webmachinelearning
15:01:25 Regrets+ Ningxin_Hu
15:02:03 DwayneRobinson has joined #webmachinelearning
15:02:04 ningxin has joined #webmachinelearning
15:02:51 Present+ Austin_Sullivan
15:03:10 Present+ Christian_Liebel
15:03:26 RRSAgent, draft minutes
15:03:27 I have made the request to generate https://www.w3.org/2024/12/05-webmachinelearning-minutes.html anssik
15:04:35 Topic: WebNN Operator Update Wave 3
15:04:55 anssik: at TPAC 2024 the group resolved to update the spec with the Wave 3 operators
15:05:02 ... the details are in Dwayne's excellent presentation:
15:05:04 christianliebel1 has joined #webmachinelearning
15:05:08 -> WebNN Operator Update Wave 3 (slides) https://lists.w3.org/Archives/Public/www-archive/2024Sep/att-0014/WebNN_Operator_Update_Wave_3.pdf
15:05:15 zkis has joined #webmachinelearning
15:05:17 -> TPAC 2024 resolution and discussion https://www.w3.org/2024/10/17-webmachinelearning-minutes.html#1152
15:05:35 anssik: the implementation status tracker for these Wave 3 ops has been updated:
15:05:40 -> Implementation Status incl. Wave 3 ops https://webmachinelearning.github.io/webnn-status/
15:05:41 Joshua_Lochner has joined #webmachinelearning
15:06:20 anssik: per the implementation status, all Wave 3 ops are implemented across >=2 backends and used by at least one framework
15:06:23 asully has joined #webmachinelearning
15:06:30 ... a few observations:
15:06:53 ... - gatherElements and scatterElements use emulation paths on the LiteRT backend (denoted with "Emulated with")
15:06:57 ... - the notEqual op was not implemented
15:07:33 ... overall the Wave 3 ops map nicely to backends; does anyone want to share implementation experience across WebNN backends and frameworks?
15:07:34 q?
15:07:59 q+
15:08:23 Dwayne: everything is implemented except notEqual
15:08:46 ... the WebNN EP in ORT has complete coverage except notEqual
15:09:17 ack ningxin
15:09:56 ningxin: reverse in ORT Web is still work in progress; this op was introduced later, not in the initial proposal, as a requirement for a customer model that we felt was important to incorporate
15:10:36 ... development of that op has some delay, but we see no issues in implementing it in ORT; it can be implemented with ONNX Slice with a negative step
15:11:01 https://github.com/webmachinelearning/webnn/issues/767
15:11:02 https://github.com/webmachinelearning/webnn/issues/767 -> Issue 767 Request the decomposition for gatherElements, scatterElements and scatterND (by fujunwei) [operator specific]
15:11:19 ... for the Chromium TFLite implementation, scatterElements and gatherElements have no corresponding ops, so we use emulation paths
15:11:47 ... specifically, for the constant indices variants we have emulation paths
15:12:04 ... this is partial support; we need some further work to have full support for these two emulations
15:12:21 ... an issue was opened against TFLite to query interest in implementing these
15:12:23 q?
15:13:10 Austin: on CoreML, most of the Wave 3 ops are implemented
15:13:23 ... some are entirely decompositions, e.g. LSTM is all decomp
15:13:44 ... the remaining Wave 3 ops are the quantization ops, de/quantizeLinear, and reverse
15:14:10 ... there are some constraints in dequantizeLinear, e.g. the scale needs to be positive
15:15:14 ... for the most part, in WebNN now, any input is valid as long as it is of the correct shape and data type
15:15:20 ... we don't inspect the content of the input
15:16:04 ... if garbage is passed, it still validates; if negative values are passed in the scales tensor we need to decide what to do
15:16:25 ... all these tensors are required to be constant by the backend
15:16:52 ... I can file an issue about this
15:18:06 q?
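[Scribe note: for context on the emulation discussion above, a rough JavaScript sketch of the gatherElements semantics a backend without a native op must reproduce; the function and the 2-D restriction are illustrative, not spec text.]
  // 2-D gatherElements (per the ONNX GatherElements definition):
  //   axis 0: output[i][j] = input[indices[i][j]][j]
  //   axis 1: output[i][j] = input[i][indices[i][j]]
  function gatherElements2D(input, indices, axis) {
    return indices.map((row, i) =>
      row.map((idx, j) => (axis === 0 ? input[idx][j] : input[i][idx])));
  }
  // gatherElements2D([[1, 2], [3, 4]], [[0, 1], [1, 0]], 0) -> [[1, 4], [3, 2]]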
15:18:20 Topic: MLTensor specification updates
15:18:38 anssik: The MLTensor design in the explainer form is now being converted to specification prose, thanks Austin!
15:18:42 -> MLTensor explainer https://github.com/webmachinelearning/webnn/blob/main/mltensor-explainer.md
15:19:08 anssik: PR #787 landed that specified key methods related to MLTensor, while "hand-waving over much of the juicy details" (quoting Austin) :-)
15:19:08 https://github.com/webmachinelearning/webnn/pull/787 -> MERGED Pull Request 787 Specify MLTensor (by a-sully)
15:19:28 ... PR #786 (linked in the agenda) was superseded by PR #795 that removed both the compute() method and outdated MLContext requirements in the same PR
15:19:28 https://github.com/webmachinelearning/webnn/pull/786 -> CLOSED Pull Request 786 Remove descriptions of outdated MLContext requirements (by a-sully)
15:19:28 https://github.com/webmachinelearning/webnn/pull/795 -> MERGED Pull Request 795 Remove the MLContext.compute() method (by a-sully)
15:20:11 Austin: the juicy details are a more detailed discussion of what the timelines are; the PR defines behaviour, but we don't specify what the timeline is
15:20:42 ... there are some constraints on how operations can run when multiple graphs are at play
15:21:01 ... planning how to best specify these; we could pull in some logic from the IndexedDB spec
15:21:08 ... the methods should be good to go
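[Scribe note: a minimal sketch of the MLTensor flow PR #787 specifies, following the explainer's API shape at the time; the descriptor fields and the one-op graph are illustrative and may differ from the final spec.]
  // Build a trivial graph, then run it via MLTensors:
  // dispatch() queues work, only readTensor() waits for a result.
  const context = await navigator.ml.createContext();
  const builder = new MLGraphBuilder(context);
  const desc = {dataType: 'float32', shape: [2, 2]};
  const input = builder.input('input', desc);
  const graph = await builder.build({output: builder.relu(input)});
  const inputTensor = await context.createTensor({...desc, writable: true});
  const outputTensor = await context.createTensor({...desc, readable: true});
  context.writeTensor(inputTensor, new Float32Array([-1, 2, -3, 4]));
  context.dispatch(graph, {input: inputTensor}, {output: outputTensor});
  const result = new Float32Array(await context.readTensor(outputTensor)); // [0, 2, 0, 4]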
15:21:22 Subtopic: Timelines
15:21:30 anssik: issue #529 is where the "juicy details" of MLContext's timeline are discussed
15:21:31 https://github.com/webmachinelearning/webnn/issues/529 -> Issue 529 Specify WebNN timelines (by a-sully) [webgpu interop]
15:21:50 ... Josh and Zoltan noted the spec needs tightening in how it explains cross-thread, cross-process work
15:21:56 ... the HTML spec provides abstractions for:
15:22:07 -> event loop ("main thread") https://html.spec.whatwg.org/multipage/webappapis.html#event-loop
15:22:13 -> in parallel ("on a background thread") https://html.spec.whatwg.org/multipage/infrastructure.html#in-parallel
15:22:29 -> task source ("separate logically-different types of tasks") https://html.spec.whatwg.org/multipage/webappapis.html#task-source
15:22:33 -> task queue ("coalesce task sources within a given event loop") https://html.spec.whatwg.org/multipage/webappapis.html#task-queue
15:22:51 anssik: the WebGPU spec defines timelines that "clearly define both the order of operations, and which state is available to which operations":
15:22:56 -> Content timeline ("Associated with the execution of the Web script") https://www.w3.org/TR/webgpu/#content-timeline
15:23:01 -> Device timeline ("Associated with the GPU device operations") https://www.w3.org/TR/webgpu/#device-timeline
15:23:05 -> Queue timeline ("Associated with the execution of operations") https://www.w3.org/TR/webgpu/#queue-timeline
15:23:22 ... can these WebGPU timelines be used (only?) for the GPU device type?
15:23:49 ... possible work items from this issue:
15:23:55 ... - decide and define what new timelines are required
15:24:14 ... - define the behavior of the new timelines on implementations that involve multiple devices and timelines?
15:24:21 ... - define the interaction of the new timelines with WebGPU's timelines?
15:24:51 Austin: no need to define timelines in terms of WebGPU for this spec; we can probably go with "content" and "context" timelines for WebNN purposes
15:25:17 ... the big question in my mind is how handwavy we should/can be; we have these concepts such as task queues and task sources
15:25:41 ... we want to interleave our timeline with WebGPU's; some of these primitives are associated with web things, e.g. execution contexts
15:25:43 q?
15:26:04 ... open question: can we use existing primitives, or be ultra-hand-wavy when we talk to WebGPU?
15:26:20 q?
15:27:29 Rafael: I'm on board with Austin, no need to specify as in WebGPU, but we need a "device timeline" that runs separately from the CPU; that's true for 2D Canvas and WebGL/WebGPU too, e.g. draw is async
15:27:38 q+
15:27:52 ack MikeW
15:28:47 MikeW: the content timeline is on the web side; the queue timeline is the timeline of the actual chip, which in the WebNN case would be CPU, GPU, or NPU, separate execution paths
15:29:22 Austin: WebNN would not need these three distinctions because it is not tied to the GPU so closely
15:30:00 MikeW: starting with two timelines for WebNN would be reasonable; in some cases we could see that not specifying this could introduce data races
15:30:49 q+
15:30:53 ack ningxin
15:31:02 https://github.com/microsoft/webnn-developer-preview/pull/67
15:31:03 https://github.com/microsoft/webnn-developer-preview/pull/67 -> Pull Request 67 Support iobinding for Whisper Base demo on WebNN GPU (by miaobin)
15:31:12 ningxin: I want to share a use case for MLTensor in the WebNN Whisper demo
15:31:43 ... MLTensor was used to optimize the Whisper demo with a good speedup on GPU, up to 50% inference perf improvement
15:32:26 q?
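[Scribe note: a sketch of the ordering guarantee a "content"/"context" timeline split would give, reusing the illustrative API above; graphA, graphB and the tensors are hypothetical.]
  // Two dispatches queued back-to-back from script (the content timeline);
  // the second consumes the first's output. No await is needed between them
  // because work on the context timeline runs in submission order.
  context.dispatch(graphA, {x: xTensor}, {y: tmpTensor});
  context.dispatch(graphB, {y: tmpTensor}, {z: zTensor});
  // readTensor() is the synchronization point back to the content timeline:
  // it resolves only after both queued dispatches have completed.
  const z = new Float32Array(await context.readTensor(zTensor));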
15:33:07 Subtopic: Non-fatal errors
15:33:12 anssik: issue #778
15:33:13 https://github.com/webmachinelearning/webnn/issues/778 -> Issue 778 Proposal: Report non-fatal errors from the WebNN timeline (by a-sully) [feature request]
15:33:22 ... a proposal to report non-fatal errors from the WebNN timeline
15:33:37 ... it includes a problem statement, current state, observations, and a proposal with tentative IDL provided, thanks Austin!
15:33:43 ... thumbs up from Ningxin and Reilly
15:34:06 ... good insights from Bryan and Rafael on what WebGPU does to resources involved in non-fatal errors
15:34:29 ... also a suggestion from Rafael to introduce an "errorScope" a la WebGPU, or an error queue with errors referring to labeled objects
15:34:35 q?
15:35:03 Austin: thank you for the feedback Rafael and Ningxin!
15:36:03 ... given how WebGPU handles errors, I'm wondering whether, instead of cascading errors on tensors as proposed, creating a promise that resolves when a graph is destroyed gets us far enough
15:36:14 ... so that we would not need to do cascading errors on MLTensor?
15:36:26 ... are we fine starting with this similar approach?
15:36:48 Rafael: a simple solution would be a promise on the graph, and if invalidated the graph could not be used for anything?
15:36:59 ... and we could do that for various backends?
15:37:28 Austin: there is potential for false positives there; if you fail to dispatch, usually that's the right thing to do, for consistency for developers
15:37:58 Rafael: that should be fine for now; what advice do we give when that happens? Try another model?
15:38:24 Austin: no good solution, in such a case the graph is invalid
15:38:52 ... unfortunately we can just tell developers "try something else"
15:39:01 Rafael: which platforms see this?
15:39:17 Austin: TFLite and CoreML; the TFLite errors are implementation gaps
15:39:24 ... if you don't clamp, it is a runtime exception
15:39:45 ... CoreML has other instances, e.g. negative scales throw a runtime exception
15:40:01 ... trying to allocate some buffers while dispatching, we lose the context
15:40:06 ... we should lose the graph instead
15:40:21 Rafael: do we have bugs filed against CoreML?
15:40:31 Austin: we can file more of them
15:41:02 q+
15:41:06 ack ningxin
15:42:00 ningxin: a question to Austin on non-fatal errors for values in tensors: can we introduce a clamp value for indices?
15:42:41 Austin: I agree there are some cases in the CoreML and TFLite backends where we don't clamp indices and that causes a runtime error; that's an implementation issue that'll be fixed
15:43:02 ... there is a class of errors in the "State of the world" section, class 2 errors: you're trying to allocate resources and that fails
15:43:24 ... or you compile the model, and clearing site data may blow away the representation of the model stored on disk
15:44:01 ... so we have both those errors and implementation issues
15:44:03 q?
15:44:32 ningxin: thanks, that makes sense
15:44:32 q?
15:45:11 Austin: next steps are to respond to Rafael's latest comment and update the proposal to note we start by adding a promise to the graph and rejecting the promise if the MLGraph becomes invalidated
15:45:13 q?
15:45:25 q?
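[Scribe note: an illustrative shape for the direction discussed, a promise on the graph that rejects on invalidation, loosely analogous to WebGPU's GPUDevice.lost; the invalidated attribute name is hypothetical, not proposed IDL.]
  // Hypothetical: the graph exposes a promise that rejects if the graph
  // becomes invalid (e.g. a runtime error on the WebNN timeline), so errors
  // need not cascade through every MLTensor.
  graph.invalidated.catch((error) => {
    // The graph can no longer be dispatched; all a site can do is fall back,
    // e.g. to another model or another context.
    console.warn('MLGraph invalidated:', error);
  });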
15:45:40 Topic: WebML Community Group new deliverables kick off
15:45:47 gb, this is webmachinelearning/charter
15:45:47 anssik, OK.
15:45:49 ningxin has joined #webmachinelearning
15:46:00 anssik: The WebML Community Group charter review ended with explicit support and no objections. Minor editorial changes were suggested, all addressed.
15:46:04 -> Results announcement https://lists.w3.org/Archives/Public/public-webmachinelearning/2024Dec/0000.html
15:46:08 -> Updated Charter https://webmachinelearning.github.io/charter/
15:46:26 anssik: Summary of changes:
15:46:26 ... - Refresh Goals
15:46:26 ... - Add Task-specific APIs (Translator and Language Detector APIs, Writing Assistance APIs) and Prompt API to Deliverables
15:46:26 ... - Note WebNN has graduated to the WG
15:46:27 ... - Stylistic tweaks
15:47:06 ... since the new Task-specific APIs and the Prompt API are the most significant changes, I've asked Etienne Noel from Google to present the latest on these new APIs and to give us a refresher on the topic
15:47:07 ... and to hear additional updates on new ideas, the launch status of specific APIs currently in Origin Trial, and progress made since the last update at TPAC
15:47:11 ... and have some time for questions at the end
15:47:17 -> https://bit.ly/tpac2024-builtinai
15:47:30 -> https://github.com/WICG/translation-api
15:47:30 -> https://github.com/WICG/writing-assistance-apis
15:47:30 -> https://github.com/explainers-by-googlers/prompt-api
15:47:43 Slideset: built-in-apis-slides
15:47:58 [slide 1]
15:48:36 [slide 2]
15:49:01 [slide 3]
15:49:22 Etienne: the built-in APIs are complementary to WebNN
15:49:26 [slide 4]
15:49:51 [slide 5]
15:50:36 Etienne: strong interest from developers
15:50:41 [slide 6]
15:51:10 Etienne: some developers want high-level APIs for ease of use
15:51:24 ... sharing models across origins is a hard problem
15:51:28 [slide 7]
15:51:48 Etienne: the Prompt API is complicated, hard to standardize
15:51:55 q+
15:52:04 ... the OT is only for extensions for the Prompt API
15:52:27 q+
15:52:48 Etienne: why not the Prompt API? Standardizing it is hard
15:52:53 ... task-specific APIs are easier to standardize
15:53:05 ... there are some use cases we cannot have a specific API for
15:53:39 ... we want to provide cohesive, unified APIs
15:53:40 RafaelCintron has joined #webmachinelearning
15:53:52 ... open questions: how to support all the devices on the web?
15:54:01 ... in the future this will change, but for now it's limited
15:54:21 ... the proposed solution can be hybrid, client-side or cloud-based
15:54:56 ... privacy and fingerprinting: mitigations using permission prompts so that users stay in control
15:55:43 ... interoperability, quality, i18n considerations
15:55:46 ... next steps
15:56:05 ... we want to discuss these proposals with group participants and other browser vendors to shape this area
15:56:06 q?
15:56:16 q+
15:56:26 ack christianliebel1
15:56:36 Joshua_Lochner has joined #webmachinelearning
15:56:38 q+
15:56:45 christianliebel: do language detection and translation use different models?
15:57:04 Etienne: correct, on Android we're using the same models you'd use when offline; we will consider an LLM later
15:57:16 christianliebel1: what about Writer and Rewriter?
15:57:26 Etienne: we're using Gemini Nano for those at this time
15:57:33 q?
15:57:36 ack christianliebel1
15:57:38 ack christianliebel
15:57:57 ack Joshua_Lochner
15:58:59 Joshua_Lochner: I wanted to ask about using custom models, or more up-to-date models: e.g. the latest LLaMA is released and gets versioned in the API while the previous one is outdated; users want to use the latest, some users don't want to upgrade, or want to fine-tune their own
15:59:13 ... any thoughts on how to version the models?
15:59:39 Etienne: that is definitely something we should discuss in this Community Group and see what it takes to support that in the model ecosystem for browsers
15:59:40 q?
15:59:55 Joshua_Lochner: adapters, LoRAs, use cases for those would be amazing
16:00:28 ... I'm working on adding support for these features too
16:00:43 https://github.com/explainers-by-googlers/prompt-api
16:00:53 q?
16:00:56 ack Zakim
16:00:59 ack zkis
16:01:38 zkis: you need to adapt to device capabilities in model selection; how could you control which model to use in the Chrome version you use? there could be multiple versions, do you want to open up model management?
16:01:50 ... we have discussed this in the CG, so I wonder if that should be work for the CG?
16:02:05 Etienne: not opposed to the idea, we should look at the use cases
16:02:06 q?
16:02:27 Etienne: we need to think about the privacy and fingerprinting impact
16:02:39 zkis: how to adapt to local devices
16:02:39 q?
16:02:39 ack RafaelCintron
16:03:12 RafaelCintron: the presentation says, "Prompt API is good, hard to standardize"; do you want to have it for browsers?
16:03:27 Etienne: we want to standardize the other task-specific APIs first
16:03:40 ... if there is a strong need for the Prompt API, we can adjust priorities
16:04:08 RafaelCintron: one thing we see with the Prompt API is wild variations in what various implementations produce
16:04:52 Etienne: structured output for the Prompt API is being experimented with as a solution to that problem
16:04:53 q?
16:05:26 Summarizer API: OT M131 to M136
16:05:26 https://chromestatus.com/feature/5193953788559360
16:05:26 Rewriter API and Writer API: Hoping for OT soon.
16:05:26 Translate API: OT M131 to M136
16:05:27 https://chromestatus.com/feature/5172811302961152
16:05:27 Language Detector: OT M130 to M135
16:05:28 https://chromestatus.com/feature/6494349985841152
16:05:28 Prompt API: Origin Trial for Extensions
16:05:29 https://developer.chrome.com/docs/extensions/ai/prompt-api
16:05:59 Topic: Device selection abstractions
16:06:07 anssik: PR #784, the device selection explainer
16:06:07 Issue 784 not found
16:06:28 ... I would like to check the group's readiness to converge on the proposal to allow progress toward a prototype early next year
16:06:34 ... the explainer documents 3 considered alternatives:
16:06:38 -> Considered alternatives https://github.com/webmachinelearning/webnn/blob/88a858d6afdf2f5ff549e6e1f49dee52b443d460/device-selection-explainer.md#considered-alternatives
16:06:47 anssik: I heard the group prefers a minimal solution and does not want to over-engineer
16:06:54 anssik: it looks like Option 1 is "the most MVP":
16:06:59 "Keep the current MLDeviceType as a context option, but improve the device type names and specify an algorithm for mapping these names to various real adaptors (with their given characteristics). However, this would be more limited than being able to specify device specific limits to context creation."
16:07:23 anssik: in addition, for all paths in the spec where an error is returned or thrown in response to MLDeviceType, we'd change to a fallback path instead
16:07:28 RRSAgent, draft minutes
16:07:29 I have made the request to generate https://www.w3.org/2024/12/05-webmachinelearning-minutes.html anssik
16:07:35 q?
16:07:40 q?
16:09:16 q?
16:10:23 gb, this is webmachinelearning/webnn
16:10:23 anssik, OK.
16:10:57 feedback welcome via PR #784 for the device selection explainer
16:10:58 https://github.com/webmachinelearning/webnn/pull/784 -> Pull Request 784 Add device selection explainer (WiP) (by zolkis)
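[Scribe note: for illustration, what Option 1 plus the fallback behavior could look like at context creation; treating deviceType as a hint that falls back rather than throws is the explainer's direction, not settled API.]
  // Option 1 sketch: keep a device type as a context option, but define it as
  // a preference with a specified fallback path instead of a hard error.
  const context = await navigator.ml.createContext({deviceType: 'npu'});
  // Under the fallback proposal this would resolve even with no NPU present,
  // mapping the request to another available device rather than rejecting.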
16:11:07 RRSAgent, draft minutes
16:11:08 I have made the request to generate https://www.w3.org/2024/12/05-webmachinelearning-minutes.html anssik
16:22:15 NatashaGaitonde has joined #webmachinelearning
16:28:35 RRSAgent, draft minutes
16:28:36 I have made the request to generate https://www.w3.org/2024/12/05-webmachinelearning-minutes.html anssik
18:34:21 Zakim has left #webmachinelearning