WebML CG Teleconference – 2 October 2025

Meeting minutes

Anssi: first, please welcome
… Mari, Jason Mayes and Mark Foltz from Google,
… Henrik Edstrom from Autodesk,
… Ranjith Raj, Luca Del Puppo, as individual contributors
… to the WebML Community Group!
… also Rick Viscomi from Google

Rick: I joined recently to look into WebMCP, work on Chrome DevRel, interested in this space and happy to follow along

Mari: part of the Chrome technical team, working in the agentic space, want to understand the scope of the work and future roadmap

Updated WebML CG Charter operational

Repository: webmachinelearning/charter

Charter

Anssi: new charter is now operational as of 2025-09-25
… Changelog is simply "Add WebMCP API as a new deliverable"
… thank you everyone for your support!
… any questions?

https://webmachinelearning.github.io/incubations/

WebMCP API

Repository: webmachinelearning/webmcp

Anssi: next, we will continue our brainstorming to build shared understanding of the key issues, solutions, and solicit new ideas
… we touch on the issues we deferred from our last call, and also revisit global name bikeshedding, time allowing
… as a reminder, please apply the Agenda+ GH label to issues and/or PRs you'd propose we discuss in our meetings, also feel free to remove the label as appropriate

Agenda+ label

Anssi: I will consider all proposals and also update announced agendas when needed

Capability discovery

Anssi: issue #8

<gb> Issue 8 Should tools be a means for capability discovery? (by bokand) [Agenda+]

Anssi: David asks should tools be a means for capability discovery for an agent?
… it looks like the declarative API that would complement the imperative API would address this issue?
… are we ready yet to make a resolution?

Khushal: declarative makes it easy to index and crawl the site
… wanted to have a separate issue to establish that goal, I had some comments on declarative API that adds attributes to existing markup, it helps but is not enough, does not help with JS specific functionality, Microsoft folks had a proposal to address that, a manifest-based proposal
… I don't have an opinion on the API shape, but can take a decision that tools are also indexable

David: I have a slight reservation about tools being context-dependent, no concerns otherwise

<kush> +1

<brwalder> +1

<dbokan> yup, lgtm

<Leo> +1

<AlexN> +1

RESOLUTION: The group wants to make the tools be part of the discovery mechanism and continues to explore and prototype API shapes that satisfy this requirement. This includes the declarative API proposal that complements the imperative API, as well as the JSON manifest, with pros/cons documented.

API to list registered tools

Anssi: issue #16

<gb> Issue 16 Add API to list / execute tools? (by bokand) [Agenda+]

Anssi: David proposes "an API to list out the registered tools and be able to execute them by name and argument dictionary would be useful for external agents (e.g. provided via extensions or third-party libraries)"
… Khushal points out WebMCP for Service Workers session management intersects here
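
A minimal sketch of the kind of list/execute surface David describes, assuming a simple in-memory registry. The names `SketchToolRegistry`, `SketchTool`, `listTools`, and `executeTool` are hypothetical and only illustrate "list out the registered tools and execute them by name and argument dictionary"; the group has not settled on an API shape.

```typescript
// Hypothetical sketch: none of these names are defined by WebMCP.
type ToolArgs = Record<string, unknown>;

interface SketchTool {
  name: string;
  description: string;
  execute: (args: ToolArgs) => unknown;
}

class SketchToolRegistry {
  private tools = new Map<string, SketchTool>();

  // Adds a tool, replacing any earlier tool with the same name.
  registerTool(tool: SketchTool): void {
    this.tools.set(tool.name, tool);
  }

  // Returns true if a tool with that name existed and was removed.
  unregisterTool(name: string): boolean {
    return this.tools.delete(name);
  }

  // Lists the names of all currently registered tools.
  listTools(): string[] {
    return Array.from(this.tools.keys());
  }

  // Executes a tool by name with an argument dictionary.
  executeTool(name: string, args: ToolArgs): unknown {
    const tool = this.tools.get(name);
    if (tool === undefined) {
      throw new Error(`Unknown tool: ${name}`);
    }
    return tool.execute(args);
  }
}
```

An external caller in this sketch would call `listTools()` to discover names and then `executeTool(name, args)`; how such a caller reaches the registry (extension API, CDP, or something else) is exactly the open question in this discussion.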

Session management

Anssi: there's a concern multiple agents could stomp on each other
… Brandon proposes a lock mechanism similar to Pointer Lock that only one user or agent can hold at a time
… Ilya suggests we need an API / listener that an agent can subscribe to for updates to the list of tools, similar to MCP's "notifications/tools/list_changed" method; this avoids the need to poll for changes in a loop
… Jason proposes an alternative where (un)registerTool communicate tools changes, noting Ilya's proposal aligns with MCP and makes more sense
… Khushal points out we haven't explored integration with non-browser agents that could interact with web pages via extension APIs, or Chrome DevTools Protocol (CDP) for automation use cases
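
The two session-management ideas above, Brandon's exclusive lock and Ilya's change listener, could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed API: `AgentSessionLock`, `ToolListFeed`, and all method names are hypothetical, and for brevity the listener receives the current tool names directly.

```typescript
// Hypothetical sketch: class and method names are illustrative, not part of WebMCP.

// Exclusive lock, loosely modeled on Pointer Lock: one holder at a time.
class AgentSessionLock {
  private holder: string | null = null;

  // Grants the lock if it is free or already held by the same agent.
  acquire(agentId: string): boolean {
    if (this.holder === null || this.holder === agentId) {
      this.holder = agentId;
      return true;
    }
    return false;
  }

  // Only the current holder can release the lock.
  release(agentId: string): boolean {
    if (this.holder !== agentId) {
      return false;
    }
    this.holder = null;
    return true;
  }
}

// Push-style change feed for the tool list, so agents need not poll.
type ToolListListener = (toolNames: string[]) => void;

class ToolListFeed {
  private listeners: ToolListListener[] = [];

  // Registers a listener and returns a function that unsubscribes it.
  subscribe(listener: ToolListListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Called by whatever owns the tool list when a tool is added or removed.
  publish(toolNames: string[]): void {
    this.listeners.forEach((listener) => listener(toolNames.slice()));
  }
}
```

In this sketch a second agent calling `acquire` while the lock is held simply gets `false`; whether a real mechanism would queue, prompt the user, or reject is left open.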

Khushal: when I filed this issue, I had an implicit assumption about how agents would use this API, my thinking has evolved since
… two options exist:
… "1. The Agent executes script on the web page and discovers tools using the same Web APIs through which the site is declaring them to the browser."
… "2. The API surface the Agent is using (an extension API or chrome devtools protocol) provides higher level hooks to connect WebMCP with the Agent."
… when this API is web-ified further, iframe-embedding might have specific constraints or requirements
… these policies will be implemented by the engine, expecting user-land code to replicate this properly for security policy is risky
… we haven't explored a standard API surface for how the browser exposes WebMCP to 3P

<Zakim> anssik, you wanted to ask a question

Alex: from my perspective, we inject JS, and SOP violations exist with that approach; what would the API for those tools look like, would listing tools be a compilation of all of them?
… can open a new issue

Brandon: +1 to Khushal's point that having a well-defined API to expose to 3P to manage the security boundary is a great idea, rather than injecting JS into the page
… to Alex's point which tools the agent would get from the list, which Service Worker, I guess this hasn't been yet well refined
… inspired by what VSCode's agent does, the user chooses what context the agent has, currently open file or currently open file + other relevant files
… translating that to the browser world, agents could choose which tabs they interact with and only get tools from those tabs
… also consider Service Worker-based approach
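
Brandon's tab-scoping idea can be illustrated with a small filter. This is a sketch only: the `TabTool` record, numeric tab ids, and the notion of a "granted" tab list are assumptions made for the example and do not come from any proposal.

```typescript
// Hypothetical sketch: tab ids, tool records, and the grant model are assumptions.
interface TabTool {
  tabId: number;
  name: string;
}

// Returns only the tools registered by tabs the agent has been granted access to.
function toolsVisibleToAgent(allTools: TabTool[], grantedTabIds: number[]): TabTool[] {
  return allTools.filter((tool) => grantedTabIds.indexOf(tool.tabId) !== -1);
}
```

With tools registered from tabs 1 and 2 and only tab 1 granted, the agent in this sketch would see tab 1's tools and nothing else.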

Jason: I was thinking it could be request-based, if the agent requested access to the current tab vs. all the tabs; think collaborative apps across tabs, each with separate contexts, do you need three separate contexts or one that has access to all and asks the user
… not necessarily specific tabs, more packaged versions we might expose for the same domain

Reilly: I'm not sure we need to do anything specific here besides providing agent tools, extension authors are familiar with crossing extension boundaries
… I'm not sure this is any different

Reilly: browsers include UI granting extensions access to the page on user activation

Alex: only thing I'd say is there's a risk when multiple agents inject the same JS, things can get messy with multiple agents interacting at the same time
… see MetaMask extension for similar incidents
… there are race conditions and such

<jason> A simpler example might be multiple password managers

Khushal: since we're discussing how well extensions interact with WebMCP, or purpose-built extensions or CDP APIs, is there adequate understanding to make a call on that?
… Reilly thinks injecting JS is OK?

Reilly: having extensions specific APIs is fine, wanted to point out the SOP violation is not any different from what browsers already allow for extensions in general

Khushal: could we add this to the JS API is the question

Reilly: not going to add APIs for injecting script
… enumerate tools without inserting script, we'd rather use the non-script injection path, not add additional capabilities to make the script injection path easier

David: with extension API web-based libraries could add to your page and interact with your page via WebMCP, is that important use case?

Brandon: for web frameworks like that there's a workaround that the framework can manage the tool set and maintain references, maybe the web framework problem is solved?

David: I suppose so

<jason> Maybe I'm just missing something here - but IIUC reillyg (please correct me if i'm not understanding) isn't suggesting we use the extensions api for webmcp

<brwalder> +1 to extension and CDP APIs

<dbokan> +1

<AlexN> +1 for both

Reilly: to answer Jason, many APIs to access all the sites, built-in agent can see the tools, if there's a built-in agent, you can have extension that can enumerate tools, maybe CDP API, or the browser itself provides an MCP server for listing tools

Jason: makes sense, thanks

Brandon: suggest "connecting WebMCP with external agents"

David: is the resolution saying we as part of WebMCP will be doing that, it seems a bit external to the API itself?

<dbokan> +1

<kush> "Javascript injection by external Agents to interact with WebMCP is not supported."

Reilly: I think that it's within our scope to define a WebDriver API or web extensions API to link these things
… probably outside the WebMCP spec, closer to MCP spec
… WebMCP server connecting with MCP, so agents don't have to act differently across browsers

proposed RESOLUTION: The group looks into higher-level hooks to connect WebMCP with external agents for listing tools. This reduces coupling with MCP and subsequently browser implementation complexity. Javascript injection by external Agents to interact with WebMCP is not supported.

<brwalder> +1

<dbokan> +1

<Leo> +1

RESOLUTION: The group looks into higher-level hooks to connect WebMCP with external agents for listing tools. This reduces coupling with MCP and subsequently browser implementation complexity. Javascript injection by external Agents to interact with WebMCP is not supported.

<kush> +1

<AlexN> +1

Elicitation

Anssi: issue #21

<gb> Issue 21 Elicitation (by bwalderman) [Agenda+]

Anssi: Brandon is "Gathering thoughts on supporting MCP elicitation since this would be a good way to bring the user's attention to a tab if the agent determines that their input is needed."

MCP Elicitation

Anssi: missing feature is how to inform the agent elicitation is happening on-page
… Khushal's initial idea was to use "needsUserInput" tool annotation, but this does not account for conditionality
… how to mitigate abuse case where a site grabs user's attention too much a la popups
… elicitation control flows differ between WebMCP (client) and MCP (remote server); client as an arbitrator knows where we're at with user input, while in MCP server case the remote server manages user input

Anssi: Alex proposes to align with the MCP spec and defer elicitation to call resolution to the client
… deferred elicitation is better for non-human-in-the-loop use cases e.g. automation
… we resolved earlier to focus on human in the loop use cases, nevertheless deferred elicitation would enable forwards compatibility when we get to those automation use cases

2025-09-18 RESOLUTION: WebMCP focuses on human in the loop use cases initially.

Anssi: Ilya from Shopify provides important e-commerce related feedback:
… WebMCP must enable user input and allow review for e.g. terms, disclosures, liability & compliance requirements
… in the most recent comment, Khushal summarized the latest high-level design, rephrased:
… 1) Site <- WebMCP API -> Browser
… 2) Browser <- another API -> Agent
… 3) WebMCP mirrors MCP concepts, if they map well to Web API friendly abstractions
… elicitation is needed for (1) for sure

Khushal: we recognize there are cases where the user needs to interact with the site in the middle of tool execution
… only way to do that is when the user interacts with the site, then annotation is enough
… but we realized that dynamically during execution we may find out the same
… which entity needs to know user attention is required?
… built-in agent can background a tab, how to handle that case here?
… Ilya noted we're seeing usage patterns where there's browser usage in a VM, how to manage that, then the discussion went into what the browser and agent connection looks like
… whatever API is powering the interaction, it needs to be able to communicate if user interaction is happening
… what is the connection between user and agent to use WebMCP
… to minimize the problem space, can we avoid the second part, and decide how the WebMCP talks to browser when it needs elicitation

Brandon: parallels to popup issue, if an agent needs to elicit input from the user, and user's tab is backgrounded, that is attention-grabbing behavior we want to avoid
… browser UI does not need to foreground the tab to do so, perhaps flashing the tab would work as a mechanism to let the tool call yield control to the user
… maybe during the tool call an API on the WebMCP object, JS can tell the browser to give mouse and keyboard control back
… a lock mechanism that signals to the agent that the human is giving input is one possible approach, not sure what that'd look like for the declarative API

Alex: I have it in the declarative explainer PR, mark certain inputs as requiring human input
… that pops up an alert, rough proposal, feedback welcome
… PR #26

<gb> Pull Request 26 add explainer for the declarative api (by MiguelsPizza)

Khushal: not sure what the API shape to lock looks like, perhaps resolve with "Tool execution should be able to yield to the user."

Brandon: there should be some way for the agents to yield control to user and agent should know when that happens so they can update their UI accordingly

Khushal: the user needs to take over and there's some browser UI "I'm taking over"?

Brandon: I can add more detail with a comment in this issue, thinking about an imperative approach, some sort of JS API that pauses the agent
… when the user has clicked "submit" it resumes the agent

Khushal: I thought exec would be an async function

Brandon: tool functions are async, but they'd need multiple user interactions, dedicated API to pause and resume the tool call function and defer promise resolution might be necessary

Khushal: how about resolving on the high-level idea: "should be able to pause and resume during tool execution"

<kush> "Tool execution should be able to start/stop yielding to the user throughout it's lifecycle."

Reilly: I'd add, the high-level need as you said is to have a way for the Agent to say I'd like to interact here, and to mitigate abuse, including an option for the agent to tell the browser, when executing the tool, not to let the tool ask for user input; it's up to the agent to mitigate abuse

Khushal: for popups you have an option "do not let this site create popups", the user makes the judgment call

<kush> +1

<dbokan> +1

<brwalder> +1

RESOLUTION: Tool execution should be able to start/stop yielding to the user throughout its lifecycle.
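
As an illustration of this resolution, a tool execution that can start and stop yielding to the user can be modeled as a small state machine. This is a sketch under assumptions: the `ToolExecution` class, its state names, and the `allowUserInput` flag (standing in for the agent-side opt-out Reilly mentions) are hypothetical, and no API shape was agreed on the call.

```typescript
// Hypothetical sketch: states, names, and the opt-out flag are assumptions.
type ExecutionState = "running" | "awaiting-user" | "completed";
type StateObserver = (state: ExecutionState) => void;

class ToolExecution {
  private state: ExecutionState = "running";
  private observers: StateObserver[] = [];
  private allowUserInput: boolean;

  // allowUserInput = false models an agent telling the browser
  // not to let the tool ask for user input.
  constructor(allowUserInput: boolean) {
    this.allowUserInput = allowUserInput;
  }

  // Lets the agent (or browser UI) observe transitions and update its UI.
  onStateChange(observer: StateObserver): void {
    this.observers.push(observer);
  }

  getState(): ExecutionState {
    return this.state;
  }

  // The tool hands control to the user; refused if input is disallowed.
  yieldToUser(): boolean {
    if (!this.allowUserInput || this.state !== "running") {
      return false;
    }
    this.transition("awaiting-user");
    return true;
  }

  // The user is done (for example, clicked "submit"); the tool continues.
  resume(): boolean {
    if (this.state !== "awaiting-user") {
      return false;
    }
    this.transition("running");
    return true;
  }

  // The tool finishes; only valid while running.
  complete(): boolean {
    if (this.state !== "running") {
      return false;
    }
    this.transition("completed");
    return true;
  }

  private transition(next: ExecutionState): void {
    this.state = next;
    this.observers.forEach((observer) => observer(next));
  }
}
```

Because `yieldToUser` and `resume` can be called repeatedly before `complete`, the sketch covers tool calls that need multiple user interactions, as Brandon notes.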

Bikeshedding the global name

Anssi: issue #24

<gb> Issue 24 Bikeshedding the global name (by bwalderman) [Agenda+]

Earlier discussion from 2025-09-18 telcon

Brandon: it seems we're converging on navigator as the home, should have a name .modelContext or .agentContext
… not just "tools" or "resources" but others too
… which way to go, modelContext or agentContext
… another issue #31 for agent-to-agent interaction

<gb> Issue 31 Support agent to agent interaction (by khushalsagar)

<kush> +1 to modelContext

Brandon: modelContext might be better considering A2A future

<dbokan> +1

<AlexN> +1

<jason> +1

<tomayac> +1

Reilly: model as a term is overloaded

<brwalder> I have a coin if we need to flip for it

Khushal: the implementation in Chrome needs some name we can give to developers as a source of truth
… I've been writing the word "agent" and confuse it with "user agent"

<dbokan> agentModelContext :P

Reilly: I can live with navigator.modelContext
… I feel like the user agent containing the model is what the API is about

<AlexN> +1

<dbokan> +1

<brwalder> +1

<kush> +100

<jason> +1

<tomayac> +1

RESOLUTION: navigator.modelContext is the "root" object name
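
For illustration, a page or library could feature-detect the resolved root object before using it. Only the name `navigator.modelContext` comes from the resolution; the helper `getModelContext` is hypothetical, and the members of the object are deliberately left untyped because they are still under discussion.

```typescript
// Hypothetical helper: only the property name "modelContext" comes from the resolution.
function getModelContext(nav: object | undefined): object | undefined {
  if (nav === undefined || !("modelContext" in nav)) {
    return undefined;
  }
  const candidate = (nav as { modelContext?: unknown }).modelContext;
  // Treat anything that is not a non-null object as "not available".
  return typeof candidate === "object" && candidate !== null ? candidate : undefined;
}
```

In a page, the helper would be called with the global `navigator`; passing the object in as a parameter keeps the sketch usable outside a browser.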

<reillyg> Thanks all!

– DRAFT –