W3C

– DRAFT –
WebML WG Teleconference – 10 April 2025

10 April 2025

Attendees

Present
Anssi_Kostiainen, Christian_Liebel, Elena_Zhelezina, Joshua_Bell, Joshua_Lochner, Michael_McCool, Mike_Wyrzykowski, Ningxin_Hu, Rafael_Cintron, Tarek_Ziade, Winston_Chen, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

Incubations summary

anssik: we had an EU and APAC timezone-friendly WebML CG Teleconference last week

https://github.com/webmachinelearning/meetings/blob/main/telcons/2025-03-31-cg-minutes.md

anssik: key takeaways:
… Proofreader API was discussed, positive sentiment from the group
… this API will be proposed for CG adoption: webmachinelearning/charter#11

<gb> Pull Request 11 Add Proofreader API to Deliverables (by anssiko)

anssik: I will send a call for review to the group's list soon
… note that proofreading was in scope of the Prompt API, now we want to add an explicit task-specific API for it
… we also discussed Writing Assistance APIs review feedback, noted the spec is in good shape, the most advanced in terms of spec maturity of all task-based APIs
… also Prompt API feature requests were discussed
… exposing max image / audio limits, preference to leave this feature out for now
… multimodal real-time capabilities, we saw a demo from Christian using cloud-based APIs, noted a gap of around one year between cloud-based APIs and task-based APIs in browsers
… reviewed DOM integration proposal, the group wanted to see motivating use cases for the feature
… our upcoming WebML CG meeting schedule is as follows, note we agreed to skip next week's AMER call:
… - 28 April EU
… - 13/14 May AMER
… - 26 May EU
… - 10/11 June AMER
… - 23 June EU (tentative due to vacation period in the Northern hemisphere)
… - 8/9 July AMER

AI Agents

anssik: Dom hosted an AI Agents W3C Breakouts session a few weeks ago

How would AI Agents change the Web platform? - Mar 26

anssik: most recently, AI Agents were discussed at the W3C Advisory Committee meeting, Apr 8, in the context of the AI Impact on the Web discussion
… topics:
… - AI Browsers such as OpenAI Operator
… - Model Context Protocol (MCP)
… - Web Automators
… - Assistive technology
… risks:
… - security, hallucinations, breaking out of the sandbox with prompt injection
… - privacy, with another party in the mix
… ecosystem:
… - user intent dilution
… - monetization with attention

anssik: I'd like to share a few active W3C workstreams connected with AI Agents
… the WebML CG discussed a Prompt API feature request to add tool/function calling
… this function calling proposal from Jul 2024 is basically a predecessor of MCP, introduced in Nov 2024

webmachinelearning/prompt-api#7

<gb> Issue 7 Support for tool/function calling (by christianliebel) [enhancement]

anssik: the proposal is to allow the browser (extensions?) to provide standard functions to be called to augment the capabilities of an LLM model
… a simple example would be a calculator function provided by the browser
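
As a rough illustration of the shape being discussed, a tool could be declared with a well-defined JSON schema and a browser-provided implementation. All names below (the tools option, execute, the calculator itself) are hypothetical, sketched against the Prompt API's LanguageModel.create() entry point, not the actual proposal:

const session = await LanguageModel.create({
    tools: [{
        name: 'calculator',
        description: 'Evaluate a basic arithmetic expression.',
        inputSchema: {
            type: 'object',
            properties: { expression: { type: 'string' } },
            required: ['expression'],
        },
        // A browser- or extension-provided implementation would go here;
        // this stand-in just evaluates the expression directly.
        async execute({ expression }) {
            return String(Function(`'use strict'; return (${expression})`)());
        },
    }],
});
const result = await session.prompt('What is 12 * 34?');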

Christian: this is still relevant, you can do function calling without AI Agents and vice versa, i.e. calling a JS function with a well-defined schema, use cases include form filling, where you want to make sure the data is well formed and can be used in non-AI contexts
… re Agentic AI, this is very much WIP, so we're not behind in terms of web capabilities, now is a good time to explore this

anssik: there's also a more recent feature request to add explicit MCP support to Prompt API

webmachinelearning/prompt-api#100

<gb> Issue 100 [FR] Add MCP Support (by christianliebel)

Christian: you could add MCP support to your browser and expose certain functionality via tools, this is why tool calling is important, I think it should be implemented
… the MCP story is early, it would be great if you could interact with the web site, an extension could talk to the tab, this could be one functionality, Playwright MCP Server is a good example
… this type of use case would be nice, thinking about whether the Prompt API is the right place to extend

<zkis> https://modelcontextprotocol.io/introduction

anssik: MCP, Model Context Protocol, is like function calling with superpowers
… it resembles the traditional client-server model:
… - MCP Server -- where the tools live, e.g. local calculator or remote weather lookup, web search etc.
… - MCP Client -- connector usually part of the AI Agent, finds available tools, formats requests, communicates with the MCP Server
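
For illustration, this division of labor maps onto a small JSON-RPC 2.0 exchange; a simplified sketch based on the protocol introduction linked above, with transport and handshake details elided:

// The client asks the server what tools are available...
const listRequest = { jsonrpc: '2.0', id: 1, method: 'tools/list' };
const listResult = {
    tools: [{
        name: 'get_weather',   // e.g. a remote weather lookup
        description: 'Look up the current weather for a city.',
        inputSchema: { type: 'object', properties: { city: { type: 'string' } } },
    }],
};
// ...then invokes one on the model's behalf.
const callRequest = {
    jsonrpc: '2.0', id: 2, method: 'tools/call',
    params: { name: 'get_weather', arguments: { city: 'Helsinki' } },
};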

Christian: MCP is still in flux, it is not too late to think about how to integrate this into browsers
… you talk to external systems, maybe on your local system, maybe remotely

<zkis> MCP in agentic orchestration and other topics: https://huggingface.co/blog/Kseniase/mcp

A collection of MCP Servers

Tensors for graph constants

anssik: issue #760, PR #830

<gb> Pull Request 830 Allow tensors for graph constants. (by bbernhar)

<gb> Issue 760 Support building graphs from `MLTensor` containing constants (by bbernhar) [feature request]

anssik: this issue was opened in Sep 2024 and I felt now is the right time to discuss it again on this call
… thanks Bryan for iterating on the PR and prototyping, and Austin for all the Chromium work!

jsbell: there's agreement on the idea, bikeshedding on whether this is a new IDL type or a new property on an existing interface
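
To make the discussion concrete, the direction is roughly the following; the method and type names here are placeholders for exactly the bikeshed jsbell mentions, not settled API:

// Upload constant data once into a device-resident tensor (name hypothetical),
// then reference it while building the graph instead of copying it again.
// Assumed inputs: weightData is a Float32Array, input is an earlier MLOperand.
const weights = await context.createConstantTensor(
    { dataType: 'float32', shape: [1024, 1024] }, weightData);
const builder = new MLGraphBuilder(context);
const output = builder.matmul(input, builder.constant(weights));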

anssik: I see 3 open conversations in the PR, would like to see if we have agreement on them

Ningxin: I think we could defer this until Bryan is on the call

Caching mechanism for MLGraph

anssik: issue #807

<gb> Issue 807 Caching mechanism for MLGraph (by anssiko) [question] [feature request]

Explainer updates & migration

anssik: since our last discussion, we agreed cross-origin model sharing use cases are out of scope
… we agreed to focus on the same origin caching of MLGraphs
… this tighter scope aligns with the implementation intent and avoids privacy risks
… we will use Reilly's explicit API as a starting point for the explainer that will document Chromium implementation experience, using the successful MLTensor explainer-Chromium prototyping feedback loop as a blueprint
… given Reilly's Chromium experience, I'd ask Reilly to submit a PR for the explainer skeleton focusing on the same-origin case that we can iteratively advance
… all exploratory work (cross-origin, adapters etc.) will happen in a separate hybrid-ai repo to keep this WG focused on what is being implemented in browser engines
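
A rough sketch of what the same-origin flow could look like; the method names below are illustrative placeholders, the explainer PR will define the actual shape:

// Hypothetical: compile once, persist under a same-origin cache key.
const graph = await builder.build({ output });
await context.saveGraph('text-model-v1', graph);

// On a later visit from the same origin, skip recompilation if possible.
const cached = await context.loadGraph('text-model-v1');
const model = cached ?? (await buildFromScratch());  // hypothetical fallback helper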

<McCool> webmachinelearning/hybrid-ai#16

<gb> Pull Request 16 Create localcache.md (by mmccool)

McCool: I created a local cache explainer, has labels for discussion items

RafaelCintron: I like the concept of having a PR to gather comments; how should we provide feedback on it?

<ningxin> +1 to send the localcache.md PR to WebNN repo

<jsbell> +1

anssik: there will be a new PR

<zkis> +1 to discuss the explainer in WebNN WG, as cross-origin was marked out of scope in the PR, so it is in line with the WG

Zoltan: I checked this discussion and it looks pretty good, I like that cross-origin is out of scope, Reilly taking the first stab at the PR and explainer SGTM

Requirements from web frameworks

anssik: Ningxin proposed we should look at what WebNN's key customers, i.e. web frameworks, need from the caching mechanism and design toward those requirements
… one such customer is ORT native, which has an Execution Provider (EP) context cache feature
… EP context cache attempts to solve exactly this problem of compilation cost; its documentation notes most backend SDKs provide a feature to dump the pre-compiled model into a binary file that can be directly executed on the target device, which improves session creation time

OnnxRuntime EP context cache

anssik: what can we learn from the OnnxRuntime EP context cache design?

ningxin: wanted to clarify this design is only for native

ningxin: this means we need to coordinate with ORT folks, EP context is a possible way to move forward, Execution Providers are based on vendors' SDKs and can provide compiled blobs to native apps to use those exported models with this EP context mode
… this is the native use case, I think there's an opportunity to have a similar thing on the web, EP context is a single ONNX op in the ONNX opset
… we cannot import an ONNX model with a native binary on the web, but we can think about a saved MLGraph being used for EP context, I think there's an opportunity to explore with Reilly's proposal
… we experimented with this feature on native to gather data we could project onto the web
… on an Intel platform with GPU and NPU, the EP context feature gives a 7x speedup in session creation time with SD Turbo on the GPU
… for the NPU the speedup is even greater, 25x in session creation time
… this is very promising data

ningxin: prototype in Chromium is our next step, will share Chromium CLs in the spec issue

Related API proposals

anssik: there are other explorations in this space

Cross-Origin Storage API:

anssik: I believe discussion on COS API use cases and user research might be useful due to some overlap

jsbell: at a high level, this is very, very experimental, no implementation commitment yet

Christian: AFAICT, this is in an early feedback gathering phase, user research is going on
… example.org and example.com cannot share the same model file today, so the goal is to only require it to be downloaded once; Joshua L provided positive feedback, as did the WebLLM project, but challenges remain

jsbell: there's a parallel exploration about whether this can be done at a higher level, e.g. "I want a model good at translation"
… this exploration was discussed at BlinkOn and an interesting tidbit was that the file-based mechanism bleeds into the compiled-model mechanism
… it would be difficult to see how that could be done for ML models similarly to what we can do for Wasm modules

anssik: another proposal is the Cross-Origin Model cache exploration by Michael looking at cross-origin reuse and adapters

Cross-Origin Model Cache

McCool: one comment comparing my proposal with the COS API: a cache is non-deterministic, so they are not equivalent; the advantage of hashing is that you can't change the data without changing the hash

<jsbell> +1 to Michael... I think these approaches are complementary, not in competition.

McCool: we need to figure out local caching first and how that impacts other things, I think we need to explore file store and caching, and what is unique to caching

jsbell: the caching and cross-origin storage proposals are not in conflict, that said, there's no implementation commitment for the latter at the moment

McCool: the adapters feature is a separate issue, step 3

Query mechanism for supported devices

anssik: issue #815

<gb> Issue 815 Query mechanism for supported devices (by anssiko) [device selection]

<McCool> (adapters might be step 2 if we build them on local caches and/or model storage)

anssik: I wanted to discuss any new use case feedback and Apple's device privacy considerations, address questions on the capacity concept
… and review the latest iteration of the API proposal
… we discussed Markus' feedback last time, is there any new information, or questions for Markus?

webmachinelearning/webnn#815 (comment)

<gb> Issue 815 Query mechanism for supported devices (by anssiko) [device selection]

anssik: Mike responded to Markus and shared an example from the Apple device ecosystem, explaining that e.g. an 8-core and an 80-core Apple GPU are seen as identical devices for privacy reasons

Mike: for WebGPU and other Web APIs we limit exposure, so these look the same

anssik: this implies GPU capabilities may differ significantly, and on a lower core-count system the Apple framework might prefer to use the NPU for a better user experience, even if a developer requested a GPU
… it is suggested the MLPowerPreference hint allows for this flexibility
… another key piece of feedback is that excluding the CPU is not possible with Core ML APIs today
… suggestion is to allow Core ML to choose the device it thinks is the most suitable
… lastly, there was a request for a sample that could reproduce a case where the existing MLPowerPreference is not sufficient

anssik: Zoltan put together the latest iteration of the API proposal considering what Core ML supports:
… - enumerate available compute devices (cpu, gpu, npu)
… - limit the used compute devices (to cpu-only, cpu+gpu, cpu+npu, or auto).

https://developer.apple.com/documentation/coreml/mlcomputedevice/allcomputedevices

https://developer.apple.com/documentation/coreml/mlcomputeunits#Processing-Unit-Configurations
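
For the first bullet, a web-flavored counterpart to Core ML's MLComputeDevice.allComputeDevices might look as follows; the method and property names are hypothetical, for illustration only:

// Hypothetical enumeration mirroring Core ML's allComputeDevices.
const devices = await navigator.ml.getComputeDevices();
// e.g. [{ type: 'cpu' }, { type: 'gpu' }, { type: 'npu' }]
const hasNPU = devices.some((d) => d.type === 'npu');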

zkis: the latest proposal adds a devicePreference hint passed at context creation time:

const context = await navigator.ml.createContext({
    powerPreference: 'high-performance',
    devicePreference: 'gpu-like',  // or 'cpu-only'; defaults to 'auto'
});

zkis: there are two directions, Mike's proposal, or an even simpler version of what Reilly requested

zkis: the original request is to be able to ask whether the context has a GPU in any combination, I tried to satisfy that use case
… it would be good to know if this is implementable on Apple platforms

Mike: the need for an additional hint does not seem completely justified, adding it would mean we can't ever remove it, we need stronger motivation to add it

Minutes manually created (not a transcript), formatted by scribe.perl version 244 (Thu Feb 27 01:23:09 2025 UTC).
