00:47:17 RRSAgent has joined #webmachinelearning 00:47:21 logging to https://www.w3.org/2025/11/11-webmachinelearning-irc 00:47:21 RRSAgent, make logs Public 00:47:22 please title this meeting ("meeting: ..."), anssik 00:47:22 Meeting: Web Machine Learning CG F2F – 11 November 2025 00:47:25 Chair: Anssi 00:47:35 Agenda: https://github.com/webmachinelearning/meetings/issues/35 00:47:35 https://github.com/webmachinelearning/meetings/issues/35 -> https://github.com/webmachinelearning/meetings/issues/35 00:47:36 present+ Sushanth_Rajasankar 00:47:39 Scribe: Anssi 00:47:42 scribeNick: anssik 00:47:46 scribe+ dom 00:47:54 gb, this is webmachinelearning/webmcp 00:47:54 anssik, OK. 00:48:00 Present+ Anssi_Kostiainen 00:48:04 Present+ Dominique_Hazael-Massieux 00:48:13 RRSAgent, draft minutes 00:48:15 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik 00:49:20 Jingyun has joined #webmachinelearning 00:49:31 Sun has joined #webmachinelearning 00:49:35 cpn has joined #webmachinelearning 00:49:37 lei_zhao has joined #webmachinelearning 00:49:37 kush has joined #webmachinelearning 00:49:42 present+ 00:49:50 present+ Vincent_Scheib 00:50:43 iahouma has joined #webmachinelearning 00:51:41 present+ 00:51:42 present+ Sun_Shin 00:51:45 phillis has joined #webmachinelearning 00:52:23 present+ 00:53:50 Topic: Welcome 00:53:54 kbx has joined #webmachinelearning 00:53:58 MasaoG has joined #webmachinelearning 00:53:58 Anssi: welcome to the W3C Web Machine Learning CG F2F at TPAC 2025 00:54:02 ... I'm Anssi Kostiainen, Intel, the chair of the CG; I also chair the WG that works closely with this CG, which serves as an incubator 00:54:30 ... as a recap from yesterday: 00:54:30 present+ Reilly_Grant 00:54:52 Dingwei has joined #webmachinelearning 00:55:04 acomminos has joined #webmachinelearning 00:55:08 present+ Christian_Liebel 00:55:11 present+ 00:55:12 markafoltz has joined #webmachinelearning 00:55:15 ningxin has joined #webmachinelearning 00:55:25 present+ Kenji_Baheux 00:55:34 present+ Mark_Foltz 00:55:58 Haili has joined #webmachinelearning 00:56:20 kush has joined #webmachinelearning 00:56:22 ... this CG is a group where new ideas are discussed, explored and incubated before formal standardization 00:56:24 present+ 00:56:39 present+ 00:56:44 present+ 00:56:45 ... past CG spec incubations include e.g. the WebNN API and the Model Loader API 00:56:53 present+ Masao_Goho 00:57:06 ... since last year, we've expanded the scope of the CG 00:57:14 -> WebML CG Charter https://webmachinelearning.github.io/charter/ 00:57:30 Anssi: the CG has added in scope and delivered a number of new incubations 00:57:34 -> WebML CG Incubations https://webmachinelearning.github.io/incubations/ 00:57:59 BenGreenstein has joined #webmachinelearning 00:58:11 johannhof has joined #webmachinelearning 00:58:30 Anssi: we have delivered first versions of new built-in AI APIs: 00:58:34 ... Prompt API 00:58:39 ... Writing Assistance APIs 00:58:46 ... Translator and Language Detector APIs 00:58:52 ... and at the explainer stage we have: 00:58:54 mtavenrath has joined #webmachinelearning 00:58:59 ... WebMCP API 00:59:06 ... Proofreader API 00:59:35 reillyg: can we add a short presentation on the built-in APIs and developer feedback before we dive into individual APIs? 00:59:45 anssik: sure, remind me at the start of that session 01:01:25 ... this CG has grown significantly over the last year, similarly to its sister WG 01:01:33 ...
obviously the intersection of the web and AI is exciting, and this group is the place where future Web AI experiences are incubated ahead of broad market adoption 01:01:49 ... the year-over-year growth rate of this group is around +30% for both organizations and participants, so the diversity is growing too 01:02:08 -> https://www.w3.org/groups/cg/webmachinelearning/participants/ 01:02:27 Anssi: many businesses looking to adopt the technologies developed in this group have joined; this is important as it allows us to capture real-world feedback early on 01:03:16 ... the nature of these "task-specific APIs" is that they're easy to adopt if your requirements match 01:03:26 ... when more control over the experience is required, the WebNN API provides the lower-level primitives to build your own 01:03:39 ... if you registered as a CG participant, please join us at the table 01:04:08 ... observers are welcome to join the table too, subject to available space 01:04:13 Anssi: we use Zoom for a hybrid meeting experience, please join using the link in the meeting invite 01:04:30 Anssi: we use IRC for official meeting minutes and for managing the speaker queue 01:04:32 q+ 01:04:35 ack anssik 01:04:39 ... please join the #webmachinelearning IRC channel, link in the meeting invite and agenda: 01:04:45 -> https://irc.w3.org/?channels=#webmachinelearning 01:04:45 -> https://github.com/webmachinelearning/meetings/issues/35 01:04:46 https://github.com/webmachinelearning/meetings/issues/35 -> Issue 35 WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) (by anssiko) 01:04:46 hagio has joined #webmachinelearning 01:04:58 Anssi: to put yourself on the queue, type "q+" in IRC 01:04:58 ... during the introductions round, we'll try to record everyone's participation on IRC with: 01:05:01 ... Present+ Firstname_Lastname 01:05:02 Present+ Thomas_Steiner 01:05:12 ... please check that your participation is recorded on IRC 01:05:12 estade has joined #webmachinelearning 01:05:21 Present+ Victor_Huang 01:05:43 Present+ Evan_Stade 01:05:45 Present+ Mike_Wyrzykowski 01:05:46 Present+ Yuta_Hagio 01:05:46 Present+ Victor_Huang 01:05:47 Present+ Markus_Tavenrath 01:05:48 alispivak has joined #webmachinelearning 01:05:53 Present+ Rob_Kochman 01:05:57 RRSAgent, draft minutes 01:05:59 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html dom 01:06:05 Present+ Victor_Huang 01:06:05 Present+ Sushanth_Rajasankar 01:06:13 Tarek7 has joined #webmachinelearning 01:06:13 Em has joined #webmachinelearning 01:06:13 Present+ Ali_Spivak 01:06:15 Present+ Hyojin_Song 01:06:15 RafaelCintron has joined #webmachinelearning 01:06:15 present+ Khushal_Sagar 01:06:15 Present+ Ben_Greenstein 01:06:15 cfredric has joined #webmachinelearning 01:06:15 Present+ Isaac_Ahouma 01:06:16 Present+ Emily_Lauber 01:06:16 Present+ Tarek_Ziade 01:06:17 Present+ Rafael_Cintron 01:06:22 Present+ Haili_Bai 01:06:37 nournabil has joined #webmachinelearning 01:06:52 npm has joined #webmachinelearning 01:06:56 Present+ Chris_Fredrickson 01:07:19 Em has joined #webmachinelearning 01:07:24 rviscomi has joined #webmachinelearning 01:07:31 Subtopic: Agenda bashing 01:07:34 Anssi: the F2F agenda was built collaboratively with CG participants 01:07:45 -> https://github.com/webmachinelearning/meetings/issues/35 01:07:45 https://github.com/webmachinelearning/meetings/issues/35 -> Issue 35 WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) (by anssiko) 01:08:07 Anssi: any last-minute proposals or updates?
01:08:13 Topic: WebMCP 01:08:15 chi has joined #webmachinelearning 01:08:17 gb, this is webmachinelearning/webmcp 01:08:20 anssik, OK. 01:08:24 Subtopic: Intro & demo 01:08:57 Anssi: the WebMCP abstract reads: 01:09:02 ... "Enabling web apps to provide JavaScript-based tools that can be accessed by AI agents and assistive technologies to create collaborative, human-in-the-loop workflows." 01:09:44 ... TL;DR unpacking: 01:09:55 Amirsh has joined #webmachinelearning 01:10:06 ... - the WebMCP API is a new JS API that allows a web developer to expose their web app functionality as "tools" 01:10:21 ... - the web developer is in control of what tools the web page exposes and how they function 01:10:42 ... - the web page acts as an MCP server equivalent, but implements the tools in client-side script, not in the backend 01:11:10 ... - tools are simply JS functions with associated natural language descriptions and structured schemas 01:11:16 ... - the natural language descriptions allow AI agents to invoke the programmatic "tools" API 01:11:44 ... - there's also a complementary declarative WebMCP API being worked on, reusing ARIA role-* attributes 01:11:45 present+ Amir 01:13:16 Anssi: the WebMCP proposal is a synthesis of two proposals from Microsoft and Google 01:13:21 ... the group has now converged on the initial proposal documented in the explainer and a supplementary proposal document that contains the API shape and code examples 01:13:26 -> WebMCP explainer https://github.com/webmachinelearning/webmcp/blob/main/README.md 01:13:31 -> WebMCP proposal https://github.com/webmachinelearning/webmcp/blob/main/docs/proposal.md 01:13:45 Anssi: the group has received early implementation experience and feedback from the OSS community contributors Alex Nahas and Jason McGhee that has been very valuable 01:13:58 -> https://github.com/MiguelsPizza/WebMCP 01:14:03 -> https://github.com/jasonjmcghee/WebMCP 01:14:24 Anssi: both these OSS projects have explored the problem space ahead of browser implementations and without some of the constraints of a browser implementation 01:14:27 ... both Alex and Jason are active participants in the group 01:15:04 Kush: it was a good surprise that the Google and MS teams had very similar APIs in mind, a good signal for future convergence 01:15:43 ... we've started with a very simple API shape, but as we're advancing, it looks like browsers will have an important role to play as a trusted mediator 01:15:59 ... we want to make sure developers have control over how their site can be used by agents 01:16:18 ... also nice to see interest from the community in this space 01:17:03 sushraja: this started from seeing the enthusiasm of the community around MCP, and we were pleasantly surprised to see Google with a similar proposal 01:17:18 ... the explainer is still very open to changes 01:17:33 ... lots still needs to be figured out in terms of API shapes, capabilities, security, ... 01:17:47 ... very open to feedback, in particular from potential users of the API 01:19:10 -> https://screen.studio/share/hbGudbFm WebMCP video demo 01:19:13 scribe+ 01:23:53 Anssi: in this demo Alex demonstrates a user interacting with an AI agent built into the browser 01:23:57 ... Alex has a conversation with an agent using a voice interface and asks the agent to search for specific information (the WebMCP explainer) and send a short summary of this information via email to a specific email address 01:24:01 ... in another demo, Alex uses the Prompt API integration 01:24:05 ... the demo uses deterministic API access to tools provided by the website and does not do any DOM parsing or computer vision-based image processing of screenshots
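[ For reference, a minimal sketch of what declaring such a tool looks like; the provideContext name and tool fields follow the current proposal document and may still change, and addTodoItem is an assumed page-defined helper. ]
```
// Sketch only: follows the shape in docs/proposal.md, subject to change.
// A tool is a plain JS function plus a natural language description and
// a JSON Schema describing its input.
navigator.modelContext.provideContext({
  tools: [{
    name: "add-todo",
    description: "Add a new item to the user's todo list on this page.",
    inputSchema: {
      type: "object",
      properties: { text: { type: "string", description: "The todo item text" } },
      required: ["text"]
    },
    async execute({ text }) {
      addTodoItem(text); // assumed page-defined helper; reuses client-side logic
      return { content: [{ type: "text", text: `Added "${text}"` }] };
    }
  }]
});
```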
01:24:10 ... that means the interaction is secure and auditable 01:24:14 ... in the spirit of the human-in-the-loop workflow, the end user can actually choose which tools exposed by the website are accessible to the agent 01:24:18 ... all compute happens on the client side, the model powering the agent runs on the client 01:24:22 ... this is a more lightweight and privacy-preserving approach compared to cloud-based agentic usage 01:25:09 Anssi: in the second part of the demo, google.com has defined tools such as get_page_title, extract_search_query, run_search, get_search_results 01:25:14 ... remember these tools are just JS functions declared by the website, injected into the model's context so it is aware of them 01:25:18 ... WebMCP tools make it easy to progressively enhance your existing websites and make them agentic 01:25:21 ... Prompt API input "run_search for webmcp and tell me what it is" 01:25:25 ... calls the same tools as in the voice model demo 01:25:28 ... the get_page_title tool call returns "Google" to confirm the page is Google's homepage 01:25:32 ... the run_search tool is invoked to navigate to Google search results for "webmcp" 01:25:44 ... all interaction happens inside the browser: model execution, tool calls, events 01:25:48 ... the agent can be stopped at any time by the user 01:25:54 -> Prompt API tool use https://github.com/webmachinelearning/prompt-api#tool-use 01:26:02 -> WebMCP demo https://screen.studio/share/hbGudbFm 01:27:21 RRSAgent, draft minutes 01:27:22 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik 01:29:49 anssik: For this demo the extension injects the tools. In production the website would inject these tools. 01:30:15 mfoltz: How close was the polyfill to the explainer? 01:30:36 anssik: Unsure. You can check out the repo. It's conceptually aligned. 01:30:45 markafoltz has joined #webmachinelearning 01:30:52 kush: I have another demo I can show which matches the explainer. 01:31:00 https://drive.google.com/file/d/1awZA2bsVNO-uUqo9NpVdnHS2Xh26Pf4u/view?usp=sharing 01:31:12 -> https://drive.google.com/file/d/1awZA2bsVNO-uUqo9NpVdnHS2Xh26Pf4u/view?usp=sharing 01:32:02 kush: This demo shows integration with MCP UI. 01:32:13 ... Allows embedding UI within the conversation with the agent. 01:32:23 ... That UI declares its tools with WebMCP. 01:32:47 ... The agent doesn't need to hit the UI's backend. It's all a client-side app. 01:33:07 q? 01:33:12 q+ 01:33:19 ack kbx 01:33:20 ... In this case it's not a full website but a small UI embedded in a chat, like we're seeing in the ecosystem. 01:33:56 q? 01:34:03 Dingwei has joined #webmachinelearning 01:35:12 Subtopic: Built-in agent ideation 01:35:54 ningxin: (Referencing slides) This is very similar to what we just saw in the demo. 01:36:09 ... Web application registers capabilities through WebMCP. 01:36:15 ... Browser has a built-in small language model. 01:36:50 Em has joined #webmachinelearning 01:37:23 ... Investigating whether a site can call into this agent to perform a task using tools provided by the web app itself. 01:38:51 ... Investigating whether this can be run entirely locally. 01:39:05 q? 01:39:07 q+ 01:39:22 q+ 01:39:27 ack kbx 01:39:28 anssik: We can share this idea in an issue and foster further discussion. 01:39:35 kbx: +1 to this idea. 01:39:35 Tarek has joined #webmachinelearning 01:39:42 ... the browser also has capabilities that we might want to expose.
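[ As background for this ideation, the Prompt API tool-use surface that the demo and this idea build on looks roughly like the sketch below, following the prompt-api explainer; the weather endpoint is a hypothetical placeholder. ]
```
// Sketch following the Prompt API explainer's tool-use section.
const session = await LanguageModel.create({
  tools: [{
    name: "getWeather",
    description: "Get the current weather for a city.",
    inputSchema: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"]
    },
    async execute({ city }) {
      // In the current closed-loop design, the browser calls this during prompt().
      const res = await fetch(`https://weather.example/api?city=${encodeURIComponent(city)}`); // hypothetical endpoint
      return JSON.stringify(await res.json());
    }
  }]
});
const answer = await session.prompt("What's the weather like in Kobe?");
```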
01:39:45 ack kush 01:39:50 AmirSh has joined #webmachinelearning 01:40:02 kush: There are two issues related to this on WebMCP. 01:40:02 q? 01:40:26 ... 1. Exposing built-in tools to the agent. 01:40:49 ... 2. Are you hoping for a tool to expose the built-in agent to an external agent? 01:40:52 q+ 01:41:04 ningxin: No. The prompt comes from the user, e.g. a chatbox on the site. 01:41:13 LeoLee has joined #webmachinelearning 01:41:21 kush: Could the in-browser and site agent work together? 01:41:30 ningxin: The web app can customize this. 01:41:42 ... We're considering use by the app itself as well as extensions. 01:41:49 present+ 01:41:51 ... Extensions can use this API to enhance the browser agent. 01:42:40 Subtopic: Communication with the TAG 01:42:47 Anssi: issue #35 01:42:48 https://github.com/webmachinelearning/webmcp/issues/35 -> Issue 35 Communication with the TAG (by xiaochengh) [Agenda+] 01:42:56 Anssi: the TAG provided the following discussion points: 01:43:00 -> https://github.com/webmachinelearning/webmcp/issues/35#issuecomment-3424766197 01:43:00 https://github.com/webmachinelearning/webmcp/issues/35 -> Issue 35 Communication with the TAG (by xiaochengh) [Agenda+] 01:43:04 q- 01:43:17 Anssi: I'd like to prime this session by unpacking the TAG discussion points, followed by a summary of the group's current posture wrt these points 01:43:37 Anssi: - Motivation: "Frontend agent integration is useful and should be supported on the Web" 01:43:57 ... the group agrees with the TAG; the WebMCP effort's focus is to establish interoperable interfaces to enable frontend agent integration, and the motivation section of the explainer expands on this 01:44:04 -> WebMCP Explainer > Motivation https://github.com/webmachinelearning/webmcp#background-and-motivation 01:44:25 ... - Generality: "Whatever the WebMCP efforts introduce to the web standards should be general enough to support different protocols" 01:44:59 ... the explainer states "the WebMCP API [is] as agnostic as possible" and explains the API only reuses the "tools" base primitive from MCP, similarly to how the Prompt API reuses "tools" 01:45:28 hagio has joined #webmachinelearning 01:45:31 ... the group's charter codifies coordination expectations with the AI Agent Protocol CG, and I expect this group's participants to explore implementations atop other emerging protocols that may gain traction 01:45:40 -> https://github.com/webmachinelearning/webmcp#model-context-protocol-mcp-without-webmcp 01:45:44 -> https://webmachinelearning.github.io/charter/#coordination 01:46:00 ... - Privacy and Security: "The P&S aspects can be challenging, and we would also like to see more explorations." 01:46:20 ... the group has initiated a dedicated workstream for privacy and security, documented in issue #45, to be discussed next 01:46:21 https://github.com/webmachinelearning/webmcp/issues/45 -> Issue 45 Privacy & security considerations for WebMCP (by victorhuangwq) [Agenda+] 01:46:35 ... - Declarative API: "A declarative API can be quite useful and cover use cases that an imperative API can't cover" 01:46:56 ...
the group agrees with the TAG and has developed and prototyped a declarative API alongside the imperative API; the latest proposal is PR #26 with related discussion in issue #22 01:46:58 https://github.com/webmachinelearning/webmcp/issues/22 -> Issue 22 Declarative API Equivalent (by EisenbergEffect) 01:46:58 https://github.com/webmachinelearning/webmcp/pull/26 -> Pull Request 26 add explainer for the declarative api (by MiguelsPizza) [Agenda+] 01:47:18 q? 01:47:28 Anssi: "[TAG] would like to encourage further development of the high-level API that the WebML [CG] is currently working on" suggests this group is on the right track 01:47:39 cpn: https://github.com/webmachinelearning/webmcp/issues/35 01:47:40 https://github.com/webmachinelearning/webmcp/issues/35 -> Issue 35 Communication with the TAG (by xiaochengh) [Agenda+] 01:47:44 ... per the issue discussion, Xiaocheng's personal view (and not TAG consensus) is "WebMCP should be built bottom-up on lower-level primitives" 01:48:04 ... Khushal explains in his comment how current web sites don't have a way to expose the same "how I use this site" knowledge as semantic actions, and that such an API must be standardized for the browser to know how to use such actions programmatically 01:48:09 ... this is the rationale for why "tools" was chosen as the abstraction for WebMCP, to allow interoperability between websites and browsers 01:48:12 ... the "tools" abstraction enables websites and web browsers to talk to each other programmatically without the browsers needing to scrape the content or run computer vision models on page screenshots to understand where to click 01:48:22 -> https://github.com/webmachinelearning/webmcp/issues/35#issuecomment-3444229149 01:48:26 q? 01:48:47 jyasskin: That was a good summary. 01:48:55 ... A lot of this feedback was not full TAG consensus. 01:49:20 ... The loudest concern is that MCP is not going to last. We want to avoid stuff being stuck in the platform that won't match a future protocol. 01:49:36 ... There was some uncertainty about whether the current design satisfies that request. 01:49:42 ... I think it might. 01:49:48 q? 01:49:59 ... Also a concern about JSON Schema being production ready here. I think that it is good enough. 01:50:46 q? 01:51:55 anssik: My takeaway from the feedback is that high-level is the right direction. 01:52:11 sushraja: There was a suggestion to try something lower level. 01:52:31 ... Building a JS API which would allow multiple LLMs to understand the tools that are available. 01:52:40 ... This would be challenging across multiple library versions. 01:52:43 q? 01:53:06 RESOLUTION: Continue development of WebMCP as the high-level API as per the TAG guidance. Coordinate with the AI Agent Protocol CG on new protocols. 01:54:46 Subtopic: Privacy & security considerations 01:54:50 Anssi: issue #45 01:54:52 https://github.com/webmachinelearning/webmcp/issues/45 -> Issue 45 Privacy & security considerations for WebMCP (by victorhuangwq) [Agenda+] 01:54:55 ... the group has kicked off a workstream to look into privacy and security considerations as requested by the TAG in #35 01:54:55 https://github.com/webmachinelearning/webmcp/issues/35 -> Issue 35 Communication with the TAG (by xiaochengh) [Agenda+] 01:55:05 nournabil has joined #webmachinelearning 01:55:06 ...
please note also the breakout session "Agentic Browsing and the Web's Security Model" by Johann tomorrow 01:55:13 -> https://github.com/w3c/tpac2025-breakouts/issues/25 01:55:14 https://github.com/w3c/tpac2025-breakouts/issues/25 -> Issue 25 Agentic Browsing and the Web's Security Model (by johannhof) [session] 01:56:21 anssik I have prepared a slide show for the privacy and security considerations that I can share after the coffee break perhaps? 01:57:30 hagio has left #webmachinelearning 02:33:24 Tarek has joined #webmachinelearning 02:34:11 sushraja has joined #webmachinelearning 02:34:14 kbx has joined #webmachinelearning 02:34:22 RobKochman has joined #webmachinelearning 02:34:24 hagio has joined #webmachinelearning 02:34:28 LeoLee has joined #webmachinelearning 02:34:30 present+ Kenji_Baheux 02:34:40 BatuHoang has joined #webmachinelearning 02:34:43 Ehsan has joined #webmachinelearning 02:34:49 present+ Ehsan_Toreini 02:34:56 Em has joined #webmachinelearning 02:35:02 mgifford2 has joined #webmachinelearning 02:35:07 Fazio has joined #webmachinelearning 02:35:27 cpn has joined #webmachinelearning 02:35:43 present+ 02:35:46 RRSAgent, draft minutes 02:35:47 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik 02:36:09 janina0 has joined #webmachinelearning 02:36:20 Present+ 02:36:21 Neha has joined #webmachinelearning 02:36:27 present+ 02:36:40 Anssi: the issue opened by Victor surfaces three key areas for deeper discussion 02:36:52 ... 1. Prompt injection attacks, which are already mentioned in issue #11 02:36:52 https://github.com/webmachinelearning/webmcp/issues/11 -> #11 02:37:01 matatk has joined #webmachinelearning 02:37:04 ... 2. Misrepresentation of intent in the WebMCP tool 02:37:15 ... 3. Personalization scraping / fingerprinting through over-parametrization 02:37:16 present+ Matthew_Atkinson 02:37:19 ... I will break this into separate subtopics 02:37:30 ... (the permissions part of the solution space is discussed in issue #44, we'll discuss that separately later today) 02:37:31 https://github.com/webmachinelearning/webmcp/issues/44 -> Issue 44 Managing action specific permissions (by khushalsagar) [Agenda+] 02:37:35 kush has joined #webmachinelearning 02:37:52 present+ Khushal_Sagar 02:38:01 Dingwei has joined #webmachinelearning 02:38:12 MasaoG has joined #webmachinelearning 02:39:16 Slideset: victor_slides 02:39:22 [slide 2] 02:39:59 Fazio has joined #webmachinelearning 02:40:22 RRSAgent, draft minutes 02:40:23 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html dom 02:40:30 [slide 3] 02:40:56 BenGreenstein has joined #webmachinelearning 02:41:07 [slide 4] 02:42:36 [slide 5] 02:43:32 [slide 6] 02:44:01 markafoltz has joined #webmachinelearning 02:44:07 [slide 7] 02:45:07 q+ to note that over-parametrization is much more invasive than just fingerprinting 02:47:44 [slide 8] 02:47:44 [slide 9] 02:54:52 q+ 02:54:52 [slide 10] 02:54:52 q? 02:54:52 ack dom 02:54:52 dom, you wanted to note that over-parametrization is much more invasive than just fingerprinting 02:54:52 johannhof has joined #webmachinelearning 02:54:52 security means a few different things to people with disabilities.
It could mean: am I entering the information in the right place, is this what I think it is 02:54:52 kbx has joined #webmachinelearning 02:54:52 Amirsh has joined #webmachinelearning 02:54:52 dom: thank you for the presentation, I'd like to remind us of the fingerprinting issue 02:54:52 Aung has joined #webmachinelearning 02:54:52 q+ 02:54:52 ... I don't want a site to know a user is pregnant or is visiting Japan at the moment, for example 02:54:52 ... the privacy issue is quite substantial 02:54:52 Victor: fingerprinting is not the only thing this feature could be abused for; there are other malicious usage scenarios too 02:54:52 q? 02:54:52 ack Fazio 02:54:52 ohmata has joined #webmachinelearning 02:54:52 (not audible to remote participants) 02:54:52 Fazio: we did user research for an American fast-food chain; users had not made any purchases online because of security concerns, they weren't sure that the information was inserted in the right fields; confidence that the user is doing the right thing is a consideration 02:54:52 phillis has joined #webmachinelearning 02:54:52 nournabil has joined #webmachinelearning 02:54:52 anssik: we can make a comparison with keeping a hand on the steering wheel in a self-driving car 02:54:58 q+ 02:54:58 Em has joined #webmachinelearning 02:54:58 q+ 02:54:58 victor: conversely, people get more and more comfortable with a self-driving car over time 02:54:58 Data annotation is crucial 02:54:58 q? 02:54:58 ... but we're at the start of the journey and so it's important to make sure the user stays involved 03:21:48 RRSAgent has joined #webmachinelearning 03:21:49 logging to https://www.w3.org/2025/11/11-webmachinelearning-irc 03:22:11 Subtopic: Managing action specific permissions 03:22:51 Anssi: issue #44 03:22:51 https://github.com/webmachinelearning/webmcp/issues/44 -> Issue 44 Managing action specific permissions (by khushalsagar) [Agenda+] 03:23:01 ... we opened this separate issue to discuss the permission model, informed by the TAG feedback in #35 03:23:02 https://github.com/webmachinelearning/webmcp/issues/35 -> Issue 35 Communication with the TAG (by xiaochengh) [Agenda+] 03:23:17 ... Khushal's initial proposal: 03:23:37 ... - The browser manages a global "can you be agentic on this site" permission. 03:24:01 ... - Action-specific permissions. Say an action is destructive (deletes some files). The site would likely want user consent before that action is taken. 03:24:17 ... the question is whether and how to persist the action-specific permissions per site 03:24:18 q+ 03:24:32 ... tools are ephemeral from the browser's perspective, but NOT ephemeral from the site's perspective 03:24:47 ... consider that these 2 states are indistinguishable from the browser's (or MCP client's) perspective: 03:24:55 ... the site (or MCP server) no longer provides this tool 03:25:01 ... the tool is currently disabled 03:25:56 kush: the main issue I landed on: for any kind of sensitive action, outlining clearly who is responsible for gaining user consent (browser/agent vs site) 03:25:57 npm9 has joined #webmachinelearning 03:26:01 dom I have sent the document to you 03:26:16 ... if the agent/browser is doing it, it's going to be non-deterministic since it will depend on interpretation by the agent 03:26:26 ... conversely, the site knows deterministically which tool is sensitive 03:27:08 ... but we also want to avoid double-prompting, so making the browser the default consent dialog source 03:27:17 q? 03:27:18 ... and have tools identify if they require user consent
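[ No API for this exists yet; purely as an illustration of the idea, a tool could flag its own sensitivity so the browser knows when to gather consent. The annotations field below is hypothetical, loosely modeled on MCP's tool annotations, and deleteCurrentProject is an assumed page-defined helper. ]
```
// Hypothetical sketch: the annotations field is not part of any current proposal.
navigator.modelContext.provideContext({
  tools: [{
    name: "delete-project",
    description: "Permanently delete the currently open project.",
    inputSchema: { type: "object", properties: {} },
    // Hypothetical per-tool hints letting the browser show its own
    // consent UI before executing the tool.
    annotations: { destructive: true, requiresUserConsent: true },
    async execute() {
      deleteCurrentProject(); // assumed page-defined helper
      return { content: [{ type: "text", text: "Project deleted." }] };
    }
  }]
});
```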
03:27:22 ack kush 03:28:19 ... what string should show up on the consent dialog? coming deterministically from the site or generated by the agent? 03:28:24 matatk has joined #webmachinelearning 03:28:26 q+ 03:28:28 ... can this be declarative or would this depend on tool execution? 03:29:21 ... elicitation is for cases where a complex UI is needed to continue the task (e.g. a payment transaction) 03:29:23 q? 03:29:35 ... we don't have a sketch of an API yet 03:29:50 q+ 03:29:56 ack johannhof 03:30:00 hta has joined #webmachinelearning 03:30:43 kush: a browser could refuse to execute WebMCP on abusive web sites 03:30:55 johannhof: this should be discussed explicitly in the threat model 03:31:04 ... some discussion of preventing elicitation from happening 03:31:24 q? 03:31:29 ack sushraja 03:31:53 sushraja: if you're thinking of persistence, we need an identity for the tool, which we don't have today - maybe a hash of the description? 03:32:01 ... the user may want to allow now or always 03:32:27 kush: if this is delegated to the browser, the tool may not need to care 03:32:50 q? 03:33:53 sushraja: part of the question is what guarantees would be tied to that hash/id 03:34:03 q? 03:34:50 WebMCP needs an API for the site to request browser-provided consent flows for each tool execution 03:35:02 shisama has joined #webmachinelearning 03:36:13 q+ 03:36:32 dom: providing browser prompts on the site might complicate the UX; an example is the Geolocation API, where the web site should explain when it needs access 03:36:33 q? 03:37:21 ... maybe a boolean API works; any information under the control of the website blurs the boundary 03:37:40 johannhof: website-generated text is not allowed in prompts, a Chrome policy 03:37:48 q? 03:38:21 ... there's a difference between how you give out this information and click a button 03:38:44 dom: a consent dialog involving PII is scarier 03:38:45 q? 03:39:13 kush: if you share PII, that is browser-driven; this is about the site exposing a tool and the user agreeing with the agent's action 03:39:28 ... the site gets information it otherwise would not have access to 03:39:59 reillyg: I think I lost track of what we try to protect against 03:40:24 ... the site decides what it requires and builds UX for it; "sure you want to pay for this" needs no browser UX 03:40:35 ... what is the threat model for which we want to push this to browser UX? 03:40:54 kush: UX reasons, one was minimizing context switches, no need to foreground the tab 03:41:02 ... another is establishing a clear responsibility 03:41:16 reillyg: we have 3 regions in the UI: content area, browser chrome and agent UX 03:41:53 ... the question is, if we ask for consent, is a consent question in the agent UX area different from browser UX 03:42:19 ... is asking a question at the bottom of the conversation the same as if it'd be an alert dialog in the omnibox 03:42:35 johannhof: personally I think all of this should be consistent agent UX 03:42:36 q? 03:42:54 q? 03:43:22 dom: clear threat models would help here 03:43:33 johannhof: a lot of this is delegated to the user agent already 03:43:35 q? 03:44:00 ... there's no strong definition of how these UI elements are shown exactly 03:44:02 ack Tarek 03:44:13 Tarek: confused about prompt granularity 03:44:23 ... we're not going to show 10 different permission prompts 03:44:37 ... consider the full scenario for using an agent on a web page, e.g. shopping on a webpage 03:44:45 Em has joined #webmachinelearning 03:45:19 ... is there a protocol so that, if I'm on a website, these can be called 100s of times 03:45:21 q?
03:45:28 q+ 03:46:17 kush: the analogue is "cookie prompts" 03:47:01 q+ 03:47:10 johannhof: all the agent makers must be good at these probabilistic protections to ship products 03:47:19 Tarek: you trust the origin? 03:47:20 markafoltz has joined #webmachinelearning 03:47:20 q? 03:47:48 q+ 03:48:04 Fazio: a journey map visualization to help illustrate the flow? 03:48:05 q? 03:48:29 Zakim, close the queue 03:48:29 ok, dom, the speaker queue is closed 03:48:29 q? 03:48:38 q? 03:49:00 q? 03:49:10 q? 03:49:13 ack kush 03:49:59 q? 03:50:38 q? 03:51:19 q? 03:52:20 https://github.com/webmachinelearning/webmcp/issues/44 03:52:20 https://github.com/webmachinelearning/webmcp/issues/44 -> https://github.com/webmachinelearning/webmcp/issues/44 03:52:30 PROPOSED RESOLUTION: WebMCP needs a threat model to evaluate the role of browser mediated consent for tool execution 03:52:35 Please leave open issues and continue the discussion here https://github.com/webmachinelearning/webmcp/issues/44 03:52:46 these types of conversations about user consent for agent actions are already happening in many other standards organizations. WebMCP can leverage the OAuth discussions from the IETF, where the larger MCP community is working through these same concerns of what the agent is allowed to do on behalf of a user 03:52:49 RESOLUTION: WebMCP needs a threat model to evaluate the role of browser mediated consent for tool execution. 03:53:08 cfredric has joined #webmachinelearning 03:53:25 Subtopic: WebMCP accessibility, ARIA mapping via Declarative API 03:53:41 matatk has joined #webmachinelearning 03:53:43 Anssi: PR #26 03:53:45 https://github.com/webmachinelearning/webmcp/pull/26 -> Pull Request 26 add explainer for the declarative api (by MiguelsPizza) [Agenda+] 03:53:47 Em: agreed, also wondering about the duplication with the AI agents working group 03:54:04 ... Brandon noted "aria-label and aria-description and others for labelling and describing elements and so it might be helpful to define WebMCP mappings/behaviors for these instead of introducing entirely new HTML attributes." 03:54:23 q+ 03:54:31 Zakim, reopen the queue 03:54:31 ok, dom, the speaker queue is open 03:54:33 q+ matatk 03:54:40 queue=matatk 03:54:43 ack sushraja 03:54:45 ack Em 03:54:47 q? 03:55:13 alex: WebMCP describes the tools in JS 03:55:23 ... the declarative API is a proposal for a pure HTML version of MCP 03:55:50 ... rather than declare them in JS, we declare them in HTML, and the different form fields are all comprised in the markup 03:56:17 ... e.g. with a tool-name attribute and a tool-description, and the rest of the schema for the tool is inferred from the children elements, including ARIA attributes 03:56:35 ... still lots of questions about what the execution of the tool would do 03:56:52 q? 03:57:12 ... the current proposal is to execute the form, with the expectation that the server would respond with JSON based on the presence of an HTTP header indicating execution in a tool context 03:57:33 ... with some challenges in how to handle HTTP and redirects in that context 03:57:51 ack matatk 03:57:51 ... but it's interesting to repurpose accessibility annotations in that context
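[ For concreteness, a sketch of the form-based declarative direction alex describes; the tool-name and tool-description attribute names are illustrative, per the PR #26 discussion, and are not settled. ]
```
<!-- Illustrative only: attribute names are under discussion in PR #26.
     The tool's input schema is inferred from the form fields, including
     their labels and ARIA attributes. -->
<form action="/todos" method="post"
      tool-name="add-todo"
      tool-description="Add a new item to the user's todo list">
  <label for="text">Todo text</label>
  <input id="text" name="text" type="text" required
         aria-description="The text of the todo item to add">
  <button type="submit">Add</button>
</form>
```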
03:58:21 matatk: APA is happy to look at this pull request - we haven't done that yet 03:58:50 ... what's really important is to ensure that the semantics that are added are useful to people 03:58:51 RobKochman_ has joined #webmachinelearning 03:59:00 ... some of it targets screen readers, but not just them 03:59:02 q+ 03:59:10 ... we had a breakout about making some of these annotations for machines 03:59:23 ... which can then help e.g. for cognitive disabilities 03:59:30 ... e.g. to help with navigation destinations 04:00:13 ... there are other things where we see overlap between agents & accessibility: in many cases, accessibility needs to increase machine interpretability for assistive technologies 04:00:21 q? 04:00:42 ... but it's important to make sure the ARIA attributes used in the agent context don't end up overloading people with information 04:00:53 ack LeoLee 04:01:20 q+ 04:01:34 LeoLee: there might be web sites that abuse ARIA attributes to "escape" from agents 04:01:38 q? 04:01:45 ... and thus deteriorate the accessibility experience 04:01:54 q+ 04:02:08 alex: this should be opt-in to avoid overloading models 04:02:18 ack reillyg 04:02:23 ... e.g. only expose tools with MCP annotations 04:02:36 reillyg: I like the idea of trying to describe things better to make assistive agents more useful 04:03:03 ... in terms of declarative vs imperative, the former should be focused on explaining to the agent how to interact with the web site through the normal human interface 04:03:40 ... what makes me nervous is the assumption that the server should respond with a specialized machine-readable output; it should be the regular result of the underlying normal interaction 04:03:52 q? 04:03:55 ack kush 04:04:06 ... it should be grounded in the UI and human workflow of the site; for a JSON-based approach, the imperative approach feels appropriate 04:04:14 q? 04:04:25 kush: +1 to avoid that divergence between UI and response 04:05:15 q? 04:05:32 ... also, if this is an opt-in approach, we should provide ways to specify MCP descriptions independently of ARIA to avoid polluting the screen reader experience 04:05:47 RRSAgent, draft minutes 04:05:49 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html dom 04:06:17 kush: open question on whether they should ship together 04:07:03 hagio has left #webmachinelearning 04:43:46 acomminos has joined #webmachinelearning 05:05:37 LeoLee has joined #webmachinelearning 05:05:44 Aung has joined #webmachinelearning 05:05:55 sushraja has joined #webmachinelearning 05:08:17 sa-takagi has joined #webmachinelearning 05:13:13 ErikAnderson has joined #webmachinelearning 05:13:22 present+ 05:13:39 Mike_Wyrzykowski has joined #webmachinelearning 05:14:24 kbx has joined #webmachinelearning 05:16:59 Tarek has joined #webmachinelearning 05:18:05 thelounge has joined #webmachinelearning 05:18:48 thelounge has left #webmachinelearning 05:19:19 nehasapre has joined #webmachinelearning 05:20:08 Em has joined #webmachinelearning 05:20:14 Subtopic: Define the API for in-page Agents to use a site's declared tools 05:20:24 Anssi: issue #51 05:20:24 https://github.com/webmachinelearning/webmcp/issues/51 -> #51 05:20:30 Present+ Sushanth_Rajasankar 05:20:32 kush has joined #webmachinelearning 05:20:38 ... currently we have an API to declare tools to the site, "the registry": 05:20:40 present+ Khushal_Sagar 05:20:42 Present+ Sushanth_Rajasankar 05:20:45 -> modelContext API https://github.com/webmachinelearning/webmcp/blob/main/docs/proposal.md#modelcontext 05:21:01 phillis has joined #webmachinelearning 05:21:02 Anssi: we also need an API for an agent _embedded on the site_ to use these tools, and we think the browser can help mediate which agent is accessing the site's functionality at a time 05:21:09 ... notably, here the agent is running within the same origin as the site 05:21:25 ...
this issue was informed by issue #43; rephrasing Alex's thinking there: 05:21:26 https://github.com/webmachinelearning/webmcp/issues/43 -> Issue 43 Clarifying the scope of the proposal (by 43081j) 05:21:46 ... - the "WebMCP server" is the `navigator.modelContext` object that acts as the registry on which tools are declared 05:22:05 ... - the "WebMCP client" is the API for listing, calling, and listening for tool changes, callable by an agent _embedded on the site_, e.g.:
```
navigator.modelContext.listTools
navigator.modelContext.executeTool
navigator.modelContext.onToolListChanged
```
05:22:24 -> https://github.com/webmachinelearning/webmcp/issues/43#issuecomment-3478492067 05:22:25 https://github.com/webmachinelearning/webmcp/issues/43 -> Issue 43 Clarifying the scope of the proposal (by 43081j) 05:22:38 q?
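[ A sketch of how an in-page agent might consume this client surface; the method names above come from the issue discussion and are not yet specified, and refreshAgentToolList is an assumed helper. ]
```
// Sketch only: client-side names are from the issue #43/#51 discussion.
const tools = await navigator.modelContext.listTools();
// Feed the tool list to the in-page agent's model, then run its chosen call.
const result = await navigator.modelContext.executeTool("add-todo", { text: "Buy milk" });
// React when the page registers or removes tools.
navigator.modelContext.onToolListChanged = () => refreshAgentToolList(); // assumed helper
```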
05:22:57 MasaoG has joined #webmachinelearning 05:23:06 kush: this is a derivative discussion from #43 about the scope of the API 05:23:28 ... when people embed a 3rd-party library to embed agents on their site (à la Zendesk) 05:23:37 ... it would be convenient to make that integration easier 05:23:53 johannhof has joined #webmachinelearning 05:24:20 ... initially, there was no support to integrate this, for lack of a good use case 05:24:21 q+ 05:24:24 q? 05:25:11 nournabil has joined #webmachinelearning 05:25:30 q? 05:25:40 ack dom 05:25:54 dom: this feels like a convenience function? 05:26:20 kush: exactly, to avoid browsers jumping at each other 05:26:56 ... say "I want to start executing tools, and if another agent is interacting with the site we can avoid stomping on each other" 05:27:21 dom: how about: if you provide WebMCP, we recommend you record your server in a global function 05:27:40 kbx has joined #webmachinelearning 05:27:46 ... some other groups are recommending exposing a standard library as an SDK 05:28:00 johannhof: this seems like a pretty big tangent, a separate proposal for the group? 05:28:29 reillyg: if we decide there's an explicit way to expose tools, then we'd include a mitigation in the browser feature 05:28:46 johannhof: a separate proposal for integration with the Prompt API warrants its own issue 05:28:47 q? 05:28:57 acomminos has joined #webmachinelearning 05:29:05 kush: there's probably a lighter-weight way to deal with this problem? 05:29:39 ... for anything mediated by the browser, an API as lightweight as "an external agent wants to connect to you", where the site can terminate that connection 05:30:00 ... someone declares the tools and by doing this we ensure there's only one task using the tool at a time 05:30:01 q? 05:30:24 dom: the only change to core WebMCP would be whether the tool can or cannot run concurrently 05:31:12 reillyg: web developers can proceed when the initial tool call via a button press is completed, and resume with a new call then 05:31:32 q? 05:32:04 anssik: we should define the various flavours of agents (incl. e.g. in-site agents) 05:32:38 reillyg: the types of agents I can imagine are browser agents and same-site agents 05:33:17 dom: how about an agent connected to the site? 05:33:39 reillyg: browser-integrated agent, OS-provided agent, in-page agent 05:33:40 q? 05:33:54 dom: an agent specialized on one origin only 05:34:07 reillyg: extension developers could build origin-expert agents 05:34:40 q+ 05:34:46 dom: security considerations are different for the different flavors 05:35:52 kush has joined #webmachinelearning 05:35:55 Steven has joined #webmachinelearning 05:36:00 johannhof: feels beyond the scope of what an MVP for WebMCP needs to address, except to the extent it affects WebMCP itself (e.g. a connect function to help manage multiple connections) 05:36:20 Em has joined #webmachinelearning 05:36:48 q? 05:36:52 ack johannhof 05:37:28 PROPOSED RESOLUTION: revisit that issue (likely in a different proposal) if there is a need for browser mediated interactions between browser agents and in-site agents - no clear need to address this in WebMCP atm 05:38:10 fyi created an issue about identity within WebMCP. I think many of the current conversations around consent prompting to users and what agents have access to can be resolved by the agent not acting *as* the user, but using OAuth for the agent to act *on behalf of* the user. https://github.com/webmachinelearning/webmcp/issues/54 05:38:10 https://github.com/webmachinelearning/webmcp/issues/54 -> Issue 54 Challenging assumptions of Identity within WebMCP (by EmLauber) 05:38:15 RESOLUTION: revisit that issue (likely in a different proposal) if there is a need for browser mediated interactions between browser agents and in-site agents - no clear need to address this in WebMCP atm 05:38:20 Subtopic: Should we support cross-origin Agents across frame boundaries 05:38:29 Anssi: issue #52 05:38:30 https://github.com/webmachinelearning/webmcp/issues/52 -> Issue 52 Should we support cross-origin Agents across frame boundaries (by khushalsagar) [Agenda+] 05:38:42 ningxin has joined #webmachinelearning 05:38:42 ... the current thinking is tools can only exist on the top-level browsing context: 05:38:53 ... "Only a top-level browsing context, such as a browser tab can be a model context provider." 05:38:59 -> https://github.com/webmachinelearning/webmcp/blob/main/docs/proposal.md#understanding-webmcp 05:39:19 Anssi: this design does not support the following use cases: 05:39:31 ... - a webpage which embeds an Agent 05:39:37 ... - an Agent which embeds a cross-origin webpage and wants to access its WebMCP functionality 05:39:41 ... proposal to consider a permission policy allowlist for WebMCP 05:39:45 -> https://www.w3.org/TR/permissions-policy/#allowlist 05:39:53 mtavenrath has joined #webmachinelearning 05:40:11 Anssi: question on granularity: "this origin is allowed to see all tools", "this origin is allowed to see a subset of tools: X, Y and Z" 05:40:11 q+ 05:40:38 ack kush 05:41:42 kush: this is another derivative of #43; this is about an agent embedded in a cross-origin iframe 05:41:42 https://github.com/webmachinelearning/webmcp/issues/43 -> Issue 43 Clarifying the scope of the proposal (by 43081j) 05:41:46 q+ 05:42:00 ack Mark_Foltz 05:42:29 ... the question is whether the browser needs to provide a shared interface instead of relying e.g. on ad-hoc postMessage communication 05:42:35 q+ 05:43:18 kush: should we allow an embedded iframe to expose tools to the browser agent? separately, should the embedder be able to access embedded tools? 05:43:36 q+ 05:43:41 johannhof: for the former, part of the question is whether the user has the mental model to understand what happens 05:43:54 q+ 05:44:11 ... that's an assumption we've made for permissions 05:44:41 ...
imagine a locateStore tool - this should be available to the agent seamlessly 05:44:43 q+ 05:44:48 q+ 05:44:55 kush: the challenge is to enforce origin isolation for data provided to the tool 05:45:50 ... this could be something the embedder could allow or disallow 05:46:01 johannhof: if so, I think it should be disallowed by default 05:46:03 ack kbx 05:46:36 kbx: +1 on disallow by default 05:46:56 ... #43 was talking about an in-site agent, which feels out of scope 05:46:56 https://github.com/webmachinelearning/webmcp/issues/43 -> Issue 43 Clarifying the scope of the proposal (by 43081j) 05:47:34 ack tomayac 05:48:19 tomayac: re iframes, some pages (e.g. shopping pages) advertise for themselves via an iframe with a different origin 05:49:27 ... it would be problematic if these types of embeddings were perceived as separate and not accessible by the agent 05:50:23 ... e.g. if the agent couldn't invoke a tool provided by the embedded frame 05:50:28 ack Em 05:50:30 ack em 05:51:00 emily: I work at MS Identity - one part missing is authentication of the pages 05:51:24 q? 05:51:39 ... we should ensure the agent only has access to pages authenticated for that user 05:52:13 ... incl. the scenario where federated authentication is used 05:52:42 ... to avoid leakage of sensitive data 05:53:04 kush: from the WebMCP API surface, there is no distinction on the state of authentication 05:53:28 ... in terms of data leakage, origin isolation needs to be applied in any case, even without embedding 05:55:29 em: if WebMCP calls a tool that interacts with an authenticated backend, then ensuring limits around the authentication context matters 05:56:01 reillyg: tools are always calling JS functions in the page (and thus inherit the authentication status of the page) 05:57:08 ... there are separate proposals to be able to shift the caller of the tool from the agent in the browser to a server-side agent with its own context, incl. identity (e.g. to allow ChatGPT to continue an operation started in a browser agent context) 05:57:15 ack ErikAnderson 05:57:15 ... but that's outside the scope of MCP 05:57:22 q 05:57:26 q+ 05:57:37 acomminos has joined #webmachinelearning 05:57:37 Em has joined #webmachinelearning 05:58:09 ErikAnderson: MS Teams has a complex architecture for their own tabbing infrastructure, incl. via iframes 05:58:50 ack sushraja 05:58:53 ... if we use a permission policy mechanism, it would be interesting to see if this could or should be attached to actual visibility of the content 05:59:24 johannhof: or allowing to toggle the permission on and off dynamically 05:59:46 sushraja: the problem with iframes is the risk of tool naming collisions 06:00:12 ... the top-level frame could delegate tool "focus" to a given iframe dynamically 06:00:31 q? 06:01:11 dom: this points out the need for the top-level frame to be the coordinator 06:02:09 kush: the problem of name collisions exists in multi-tab scenarios 06:02:26 sushraja: I've assumed we were working only with the tab in focus 06:03:08 johannhof: I'm not too worried about the multi-tab situation, where the agent should be able to make the distinction 06:04:09 kush: should we punt on embedded iframes for now? or make this UA dependent? 06:05:24 johannhof: are there ways to detect name collisions in the current spec? 06:05:50 kush: name collisions across multiple services are bound to happen, with or without the Web 06:06:37 ErikAnderson: is there a concrete use case to help drive this conversation? 06:06:39 q? 06:07:42 Mark_Foltz: maybe polyfill with postMessage to get a better sense of the need and potential API shape 06:07:56 dom: if so, would we treat same-origin and cross-origin differently? 06:08:06 johannhof: I don't see why we wouldn't allow same-site 06:08:22 sushraja: we would be facing name collisions 06:09:08 johannhof: I think we should actually go with a permission policy that would work in a x-origin context 06:09:48 kush: again, the need for agents to disambiguate name collisions is something that agents will need to fix 06:10:20 leo: even the same-origin case is not necessarily anchored in a use case 06:11:35 johannhof: ultimately, whether first-party sites do the tools themselves or delegate to same- or x-origin iframes shouldn't be relevant 06:12:04 kush: I think the main question is one of implementation cost; let's give it a bit more time to figure out implementation challenges before proceeding 06:12:14 anssik: we should also make sure to get more of johannhof's time in our calls
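[ For concreteness, the permission policy allowlist floated above would follow the standard Permissions Policy pattern; the "webmcp" feature name is hypothetical. ]
```
<!-- Hypothetical: a "webmcp" policy-controlled feature does not exist yet. -->
<!-- Embedder opts a cross-origin iframe's tools in: -->
<iframe src="https://store.example/locator" allow="webmcp"></iframe>

<!-- Or via a response header, restricting the feature to listed origins:
     Permissions-Policy: webmcp=(self "https://store.example") -->
```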
06:12:46 Topic: Built-in AI APIs Overview 06:12:49 Subtopic: Prompt API 06:12:53 Slideset: https://docs.google.com/presentation/d/1cpPcxK25UB9Zf6_5USLKetje57jLtc6ZwRwC5ScnCdY/edit?slide=id.g3a175b287ca_3_0#slide=id.g3a175b287ca_3_0 06:13:20 Jingyun has joined #webmachinelearning 06:13:21 [slide 3] 06:13:30 tako has joined #webmachinelearning 06:13:35 Tarek has joined #webmachinelearning 06:14:06 [slide 4] 06:14:09 acomminos has joined #webmachinelearning 06:14:43 [slide 5] 06:15:31 [slide 6] 06:18:34 Em has joined #webmachinelearning 06:19:11 MikeWasserman: the rest of the slides are a more in-depth look at technical topics - we might need to agenda-bash which topics to focus on today 06:19:49 Anssi: reillyg and I have identified a few issues for the Prompt and Writing Assistance APIs 06:20:14 sa-takagi has joined #webmachinelearning 06:22:29 Topic: Prompt API 06:22:35 Subtopic: TAG design review 06:22:38 -> TAG design review: Prompt API https://github.com/w3ctag/design-reviews/issues/1093 06:22:38 https://github.com/w3ctag/design-reviews/issues/1093 -> CLOSED Issue 1093 Prompt API (by domenic) [Progress: in progress] [Venue: WebML CG] [Resolution: lack of consensus] [Topic: Machine Learning] [Focus: API design (pending)] [Focus: Web architecture (pending)] [Focus: Internationalization … (pending)] 06:22:43 Anssi: we requested a TAG design review in May and received a review response in August 06:22:46 ... however, the initial TAG review response was withdrawn due to inaccuracies 06:22:54 reillyg: in terms of the brand new TAG review, there are concerns about the locality of the models - feedback from developers seems to indicate they're good enough 06:23:11 Anssi: we received new review feedback 40 minutes ago 06:23:31 ... with the caveat that the APIs are only provided on devices with sufficient performance characteristics - a question of whether we can provide a fallback for lower-end devices 06:23:57 ... a question on the cost of computing - concerns about sites abusing the user's system resources 06:24:07 ... as the crypto craze demonstrated in the past 06:24:33 ... this may be something that browser implementors should work towards: detecting abuse of computing power 06:24:53 ... a really good question on whether or not we should assume the model executes locally 06:25:32 ... the current API mentions hybrid options, but we should probably clarify we've only done the local option in existing implementations, and review the security and privacy implications of a cloud-based approach 06:25:38 q+ 06:26:21 ...
in a cloud-based approach, there may be concerns about resource consumption (network, AI subscription) and possible additional user profiling (e.g. the level of subscription the user has access to can be a proxy for their means) 06:27:05 ... discussion about downloadprogress with all its complexity - the TAG suggesting we should make it simpler 06:27:23 ... and developers also pushing back on having to make the decision to download or not 06:28:24 ... on model versions and updates - a concern about update frequency (possibly a suggestion to limit those from a browser perspective), but also, more importantly, the issues around interop between sessions (and interop between browsers across different types of models) 06:28:41 ... some positive experience in that regard from early experimentation, but this still needs to be confirmed 06:29:44 ... on input/output languages, risks of fingerprinting, with a possible solution to restrict the languages to those that fit the user context 06:30:19 ... memory management: destroy() being redundant, a good question 06:30:31 ... JSON Schema standardization status 06:31:11 q? 06:31:12 ... on tool use - more examples needed to help the TAG analyse the spec 06:31:17 ... likewise for structured output 06:31:23 sushraja has joined #webmachinelearning 06:31:40 present+ Sushanth_Rajasankar 06:32:36 anssik: this is great feedback 06:32:49 present+ Sushanth_Rajasankar 06:32:54 q+ 06:32:59 ack kbx 06:33:21 kbx: re hybrid model support, are there lessons from WebSpeech? 06:33:47 reillyg: I started reading the TAG review of the WebSpeech API to get a sense of that 06:33:59 ... it looks like it's still an open question how to deal with that aspect 06:34:29 dom: we should make sure to work with WebSpeech on this 06:35:09 Tarek: we're looking at making some of our APIs support cloud-based execution 06:35:25 ... which raises questions about cost and subscription (hence authentication) 06:35:47 ... if it needs authentication, it should be applicable across different features 06:36:28 reillyg: developers really want to be able to enforce on-device execution, e.g. for alignment with their privacy/security policy 06:36:34 q+ 06:36:39 q+ 06:36:52 ... (with a fallback of having themselves determine which cloud service to use) 06:37:34 Tarek: Apple Intelligence has a local vs private cloud compute distinction 06:38:04 q? 06:38:10 reillyg: we would need to validate whether this trusted environment in the cloud would match their needs from a privacy/security perspective 06:38:59 tarek: access to a trusted environment needs a key exchange, which the browser doesn't necessarily have access to 06:40:01 flower.ai 06:40:05 ... https://flower.ai/intelligence/ has a JS library with a hybrid approach 06:40:52 q? 06:40:57 ack kush 06:41:04 ack RobKochman 06:41:04 kush: what's the motivation for the hybrid fallback to happen in the browser vs in a JS lib? 06:41:45 Rob: to make it easier for developers - make it work on all browsers without consideration of their performance status
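[ For reference, the library-level fallback pattern under discussion can already be sketched on top of the explainer's availability() and downloadprogress surfaces; the cloud endpoint below is a hypothetical placeholder. ]
```
// Sketch: feature-detect the built-in model, fall back to a developer-chosen
// cloud service otherwise. availability() and downloadprogress are from the
// Prompt API explainer; the cloud endpoint is hypothetical.
async function promptAnywhere(text) {
  if ("LanguageModel" in self && await LanguageModel.availability() !== "unavailable") {
    const session = await LanguageModel.create({
      monitor(m) {
        m.addEventListener("downloadprogress", e =>
          console.log(`Model download: ${Math.round(e.loaded * 100)}%`));
      }
    });
    return session.prompt(text);
  }
  const res = await fetch("https://llm.example/v1/complete", { // hypothetical endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: text })
  });
  return (await res.json()).text;
}
```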
06:43:04 q- 06:43:05 dom: if developers need to integrate two different APIs, it makes the DX worse for the Prompt API 06:43:30 q- 06:43:38 rob: the explainer says we want to enable hybrid approaches; nothing we're doing precludes it atm 06:43:39 ack sushraja 06:44:03 kbx: some implementors might choose a hybrid approach, but it's not clear the current API allows for it 06:44:37 sushraja: re structured output, we throw an exception if the browser doesn't understand a given JSON Schema 06:44:52 reillyg: we need to be specific about what needs to be supported for interop
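[ For reference, structured output in the current explainer is requested via a JSON Schema response constraint; a sketch, noting that, per the point above, implementations may throw on schemas they can't enforce. ]
```
// Sketch following the Prompt API explainer's structured output section.
const session = await LanguageModel.create();
const schema = {
  type: "object",
  properties: {
    rating: { type: "number", minimum: 1, maximum: 5 },
    summary: { type: "string" }
  },
  required: ["rating", "summary"]
};
// Throws if the implementation doesn't understand part of the schema.
const json = await session.prompt("Rate this review and summarize it: ...", {
  responseConstraint: schema
});
const { rating, summary } = JSON.parse(json);
```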
07:16:05 acomminos has joined #webmachinelearning
07:24:49 acomminos has joined #webmachinelearning
07:30:07 sushraja has joined #webmachinelearning
07:32:30 hagio has joined #webmachinelearning
07:35:25 Tarek has joined #webmachinelearning
07:35:57 acomminos has joined #webmachinelearning
07:38:03 ErikAnderson has joined #webmachinelearning
07:39:01 kbx has joined #webmachinelearning
07:39:21 Subtopic: Tool Use: decouple execution and formalize function calls and responses
07:39:22 kush has joined #webmachinelearning
07:39:26 Anssi: issue #159
07:39:26 https://github.com/webmachinelearning/webmcp/issues/159 -> #159
07:39:48 ... Mike reports the initial Prompt API design integrated JS tool execution within the prompt() call itself
07:39:56 ... this design prioritized API simplicity
07:40:09 ... the trade-off of this design is that it reduces granular control and direct LLM interaction
07:40:21 dom has left #webmachinelearning
07:40:28 ... to address this, Mike suggests realigning initial tool use integrations with Prompt API objectives, in three points:
07:40:43 ... - (1) "Provide essential Language Model tool types: i.e. Function Declarations (FD), Function Calls (FC), and Function Responses (FR)"
07:40:46 Actual 159 issue is here: https://github.com/webmachinelearning/prompt-api/issues/159
07:40:46 https://github.com/webmachinelearning/prompt-api/issues/159 -> Issue 159 Tool Use: decouple execution and formalize function calls and responses (by michaelwasserman) [tools] [Agenda+]
07:40:52 ... - (2) "Offer fine-grained client control over tool execution loops used for agentic integrations."
07:41:05 ... - (3) "Align with patterns established by major LLM APIs (OpenAI, Gemini, Claude)."
07:41:09 ... the motivation for the three:
07:41:18 ... (1) is used by API clients to inspect, reconstruct, and test session history
07:41:26 ... (2) enables clients to define the looping patterns, error handling, limits, etc.
07:41:33 ... (3) empowers clients to use the Prompt API more interchangeably with server-based APIs
07:41:56 ... about the proposal
07:42:00 ... currently the Prompt API uses a closed-loop model
07:42:05 ... where the browser process operates as a hidden agent, looping on (model prediction → tool execution → response observation) until a final text response is ready
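[ For reference, a minimal sketch of the current closed-loop design, following the tool shape sketched in the Prompt API explainer; the weather tool and its endpoint are invented for illustration: ]

```js
const session = await LanguageModel.create({
  tools: [
    {
      name: "getWeather",
      description: "Get the current weather for a city.",
      inputSchema: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
      // In the closed-loop model the browser calls execute() internally
      // and feeds the return value back to the model as the tool result.
      async execute({ city }) {
        const res = await fetch(`https://example.com/weather?q=${encodeURIComponent(city)}`);
        return res.text();
      },
    },
  ],
});

// A single prompt() call may loop (model prediction → tool execution →
// response observation) inside the browser before resolving with text.
const answer = await session.prompt("What's the weather in Kobe?");
```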
07:42:10 ... the proposal is to move to an API-centric, open-loop model for tool execution where:
07:42:18 ... - the prompt() method returns a structured Function Call (FC) object to the client
07:42:22 ... - the Function Call (FC) object manages the execution and Function Response (FR) feedback loop
07:42:28 ... this maximizes developer control and observability
07:42:32 ... Function Declarations (FD) also need not provide execute functions for now
07:42:39 ... there's a comparison table between Closed-Loop (Original) and Open-Loop (Proposed)
07:42:43 ... Open-Loop is better in the Developer Control, Debuggability and Industry Alignment dimensions
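[ For reference, a hypothetical sketch of the proposed open-loop pattern from issue #159; the functionCall, args and respond() shapes are invented here to illustrate the flow and are not a settled API: ]

```js
const session = await LanguageModel.create({
  tools: [
    {
      name: "getWeather",
      description: "Get the current weather for a city.",
      inputSchema: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
      // No execute(): Function Declarations need not provide one.
    },
  ],
});

// The client, not the browser, drives the loop:
// model prediction → tool execution → Function Response → next prediction.
let result = await session.prompt("What's the weather in Kobe?");
while (result.functionCall) {
  const { args, respond } = result.functionCall;
  const res = await fetch(`https://example.com/weather?q=${encodeURIComponent(args.city)}`);
  // respond() appends a structured Function Response to the session and
  // returns the model's next turn; the client owns looping, error
  // handling, and limits.
  result = await respond(await res.text());
}
console.log(result.text); // final text answer
```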
07:42:52 Mike: working with Microsoft on this proposal
07:43:08 ... function calls and responses as first-class APIs
07:43:43 ... our concerns were around encapsulating API clients' need to capture calls to execution functions and turn them into representations of calls that can be replayed later
07:43:57 ... and what constitutes a response, added to the initial prompt for the session
07:43:58 q?
07:44:30 Mike: something else to look at is other LMs - letting web app developers target the Prompt API or cloud-based APIs more easily
07:44:50 ... if they're developing an agent, we want to make the Prompt API aligned with cloud-based solutions
07:44:58 ... some of the intro slides animate this
07:45:32 ... also invite Sushanth and Frank to chime in, we've worked together on this
07:45:51 ... feedback from this group would be helpful
07:46:03 ... we want to structure the API so that it allows a good production API around tool use
07:46:04 q?
07:46:04 MasaoG has joined #webmachinelearning
07:47:11 [ Mike revisits the Built-in AI APIs Overview slides shared earlier. ]
07:48:10 [ Slides shown are the "Tool Use: JavaScript Example" and "Tool Use: Design alternatives" for a weather tool. ]
07:49:01 [ Demo polyfill by Nathan ]
07:51:33 Video link https://drive.google.com/file/d/12o1tmtfhvdF1f0JUMgb8ZGeSe-HTg-WR/view?resourcekey
07:51:57 Ugur has joined #webmachinelearning
07:55:56 q?
07:56:15 kush has joined #webmachinelearning
07:56:22 present+ Khushal_Sagar
07:56:30 q?
07:56:30 RRSAgent, draft minutes
07:56:31 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik
07:57:05 nournabil has joined #webmachinelearning
07:57:11 reillyg: one of the reasons this makes sense is that the tool semantics are provided by the developer, and they get complicated if you ask the developer to re-prompt the tool
07:57:21 q+
07:57:27 q+
07:57:34 ... we want to avoid designing a complicated API; this one is deceptively simple, but the semantics hide a lot of detail
07:57:54 sushraja: each model has a different way to represent the tool call, and we want to avoid model dependency
07:58:05 q+
07:58:38 ... open-loop allows the model to be replaced with a cloud-based one
07:58:52 ... the only negative is it increases implementation complexity
07:58:54 ack kush
07:59:25 kush: interestingly, this is fundamentally different: the responsibility is on the developer to manage everything that goes into the context window
07:59:40 ... once they resolve the tool call, they may want to modify the call
08:00:04 ... there is room to align the syntax for LM tool declaration across both APIs even if the pattern is not aligned
08:00:04 q?
08:00:28 q+
08:00:44 reillyg: there was some discussion that if you provide no tools then the model will not use them
08:01:12 ... because there's a question for each call to prompt(): are you expecting a tool call at that very moment?
08:01:32 ... that affects what string encoding the underlying implementation will do
08:01:50 sushraja: enabling and disabling tool calls between prompts?
08:02:11 q+
08:02:34 reillyg: a tool call needs to be added to expectedOutput
08:02:38 q?
08:02:58 sushraja: the presence of the tools can be used, so developers don't need to provide the list of tools
08:03:07 reillyg: the availability API assumes the same options are available
08:03:08 q?
08:03:14 Ugur has joined #webmachinelearning
08:03:14 ack kbx
08:03:58 kbx: we got feedback from authors that in practice there are dynamic conditions where the open-loop model helps: if something changes mid-journey, they can adapt
08:04:18 ... if everything happens without issues, closed-loop works as well
08:04:42 sushraja: the state changes we need to handle act differently between the two
08:04:43 q?
08:04:46 ack Tarek
08:05:11 Tarek: on our side we had issues depending on the model - some models did not work, bad accuracy - especially true for small models
08:05:40 ... if we used this API on different underlying models on, say, Edge and Chrome, which API is in use so that the tool call works?
08:05:58 ... the tools are going to make the interop issues harder
08:06:24 sushraja: without this API, developers will craft a special system prompt instead
08:06:50 q?
08:07:29 reillyg: the proposed API abstracts out some differences; we are working with model development teams to ensure the models can work with tools
08:07:44 ... this is a possible interop issue for some models
08:08:12 sushraja: do we assume developers are building a workflow that expects the LLM to make the tool call?
08:08:30 q?
08:08:32 ack kush
08:08:51 Ugur has joined #webmachinelearning
08:08:51 kush: I want to understand what the model's context looks like when a tool call is invoked
08:09:43 ... if there's a failure and the developer does not handle the tool calls, do the tools get ignored in the next prompt?
08:10:15 reillyg: in the two versions of the polyfills: in closed-loop, failures surface on the input and output streams and the browser catches the errors
08:10:49 ... in open-loop you get a chunk that, instead of text, is the error signal
08:11:15 q?
08:11:36 kush: ImageBufferSource as input to a tool call?
08:12:04 Tarek: you can use base64 images
08:12:28 sushraja: if the model wants to do multiple tool calls, then you need two responses and a prompt
08:12:44 reillyg: in this open-loop model you can respond in a way the model is not trained for
08:13:08 ... the model is trained a certain way, so it might not do the right thing
08:13:17 ... do we track whether the model expects a tool call?
08:13:30 q+
08:14:25 reillyg: in the future models might support async tool calling
08:14:28 ack tomayac
08:14:43 tomayac: footgun-wise, the open-loop syntax makes me nervous
08:15:11 ... it involves recursion and streams; JS developers are not so familiar with these
08:16:17 ... the non-ergonomics of open-loop models is challenging for developers
08:16:44 q?
08:17:06 q?
08:18:24 reillyg: a chunk type that is a tool call is only a property of the open-loop design
08:19:27 ... there are different flavors of tool calling; in the closed-loop model you don't get any chunks
08:19:53 sushraja: if you have a closed-loop model, the implementation will silently eat tokens
08:20:23 ... to be able to restore the session, you need to see all the tokens
08:20:43 q?
08:21:00 ack kbx
08:22:11 q+
08:22:47 q-
08:23:12 Ugur has joined #webmachinelearning
08:23:26 sushraja: we also need to think of the security implications
08:23:39 RESOLUTION: Further solicit feedback on both closed-loop and open-loop models to understand developer ergonomics issues.
08:23:51 Topic: Writing Assistance APIs
08:23:56 gb, this is webmachinelearning/writing-assistance-apis
08:23:56 anssik, OK.
08:24:09 Subtopic: User Activation requirements for Session Creation cause undue friction
08:24:14 Anssi: issue #83 and PR #86 (merged)
08:24:14 https://github.com/webmachinelearning/writing-assistance-apis/issues/86 -> #86
08:24:14 https://github.com/webmachinelearning/writing-assistance-apis/issues/83 -> #83
08:24:20 ... an issue from Isaac
08:24:31 ... Built-in AI APIs currently consume transient user activation when their availability is "downloadable"
08:24:44 ... this adds friction when a site wants to create a Summarizer once the content to be summarized comes into view
08:25:03 ... now the site must create the session earlier (think warm-up), consume the transient activation, then get the user to act on the page again to regain transient activation
08:25:19 ... transient activation indicates a user has recently pressed a button or performed some other user interaction, and when a transient activation is consumed, it is "deactivated"
08:25:29 ... sticky activation by contrast persists until the end of the session
08:25:38 ... the proposal is to relax the user activation requirements to require sticky activation, rather than consuming transient activation
08:25:48 Anssi: Reilly asks whether we could relax this to only require sticky activation for the downloadable state
08:25:53 ... Isaac agrees
08:25:58 ... this suggests a similar change to Language Detector
08:26:02 ... the PR was reviewed and merged, any comments from the group?
08:26:09 q+
08:26:35 reillyg: the Translator API is excluded because this behaviour is tied up with the anti-fingerprinting mechanism regarding which model is downloaded
08:26:43 ... consider two models for download controls
08:26:53 ... one model for everything and one for developer-specified
08:27:02 ... the proposal is for APIs where there is only one model
08:27:04 q?
08:27:19 ack sushraja
08:27:40 sushraja: I'd like to see more transparency on which developer scenarios really need to use the Prompt API in the background; this competes with the TAG feedback
08:27:44 q+
08:28:00 ... do you get developer feedback that this is preferred?
08:28:14 ... background tabs playing audio?
08:28:17 ack kbx
08:28:34 kbx: developers are competing to capture the gesture; some get it, some don't
08:28:41 q?
08:28:47 q?
08:29:14 Erik: audio players have restrictions: if you open a new tab but don't focus it, they can't play
08:29:31 ... if we open 10 new tabs and all call the Prompt API at the same time, do you need to wait until the tab is visible?
08:29:33 q?
08:29:51 q?
08:30:05 q?
08:30:40 reillyg: this change only impacts the behaviour when the model is downloadable, not when the model has already been downloaded
08:31:02 Markus: does the user know how much needs to be downloaded?
08:31:23 ... what if I accidentally download a lot of data while roaming and the data costs a lot?
08:31:42 reillyg: I can imagine a more complex download policy, e.g. not downloading if you're on a metered network
08:31:45 q?
08:32:10 ... we refuse to do anything if you're not in a situation where you can download
08:32:21 q?
08:32:51 reillyg: if folks have concerns please chime in on the issue
08:32:53 q?
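[ For reference, a minimal sketch of the pattern the merged change enables: creating a Summarizer lazily when content scrolls into view, relying on sticky rather than transient activation. Summarizer.availability() and Summarizer.create() are from the Writing Assistance APIs; the IntersectionObserver wiring and #article selector are illustrative: ]

```js
let summarizerPromise = null;

const observer = new IntersectionObserver(async (entries) => {
  if (!entries.some((entry) => entry.isIntersecting) || summarizerPromise) {
    return;
  }
  const availability = await Summarizer.availability();
  if (availability === "unavailable") return;
  // With the merged change, the "downloadable" state only requires sticky
  // activation: any prior interaction with the page suffices, so creation
  // no longer has to happen inside a click handler.
  summarizerPromise = Summarizer.create();
});

observer.observe(document.querySelector("#article"));
```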
08:33:04 q+
08:33:21 Subtopic: Discuss the privacy implications of using a paid cloud option
08:33:24 Anssi: issue #84
08:33:25 https://github.com/webmachinelearning/writing-assistance-apis/issues/84 -> Issue 84 Discuss the privacy implications of using a paid cloud option (by jyasskin) [Agenda+]
08:33:27 Ugur has joined #webmachinelearning
08:33:27 ... this issue was filed on the writing-assistance-apis repo, but it refers to the Prompt API explainer, which makes a general statement that I believe applies across all built-in AI APIs:
08:33:30 "Allowing hybrid approaches, e.g. free users of a website use on-device AI whereas paid users use a more powerful API-based model."
08:33:34 Anssi: Jeffrey points out there's a risk that this API could reveal to a website whether its user is wealthy (cloud available) or not (local-only)
08:33:37 ... Tom suggests this may not be as strong a signal of wealth as e.g. device manufacturer information that is already disclosed
08:33:41 ... any comments from the group?
08:34:38 reillyg: we haven't discussed the cloud option too much
08:34:50 Markus: if someone uses the cloud, do they care about privacy?
08:35:06 ... what if someone sends a request to your model and it returns "I'm a cloud version"?
08:35:29 reillyg: the TAG noted that even if we don't provide information on client vs. cloud, it can be inferred from the response received
08:35:30 q?
08:35:55 RRSAgent, draft minutes
08:35:57 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik
08:36:31 q+
08:36:39 reillyg: if we as a browser vendor want to ensure that users of any means have access to these models as a service, that is an incentive to find a solution
08:36:51 ... the alternative is to delegate it all to the developer
08:37:08 sushraja: WebGL/WebGPU do not have this consideration?
08:37:28 kbx: some implementers would do this differently, maybe on a server if it is free
08:37:50 q?
08:37:56 ack kbx
08:38:16 Tarek: in the Chrome implementation, do you fall back to the model already available on the system?
08:38:26 reillyg: Chrome on Android uses the model that ships with Android
08:38:42 sushraja: similarly on Windows, we have a plan to contribute the OS-provided model
08:39:05 Tarek: for the cloud option, what if we can use Bring Your Own Model?
08:39:10 q?
08:39:13 ack Tarek
08:39:48 Erik: a can of worms; punting may make it harder to respond later
08:39:59 reillyg: that's going in the opposite direction to the Web Speech API
08:40:13 q?
08:40:27 q?
08:40:43 Topic: Wrap up
08:41:16 Anssi: thank you for your active participation and exciting discussions on our incubations, the agentic web and built-in AI capabilities!
08:41:22 ... similarly to yesterday, interested folks are welcome to join us for dinner
08:41:26 ... the plan would be to again meet in the Portopia Hotel (adjacent to the Kobe International Conference Center) lobby at 18:15 to coordinate on transport and restaurants; more restaurants should be open today than yesterday, so more options!
08:41:29 ... restaurant options:
08:41:37 -> https://www.w3.org/wiki/TPAC/2025/Restaurants
08:41:37 -> https://mgifford.github.io/Food-W3C-Kobe/
08:41:44 RRSAgent, draft minutes
08:41:45 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik
08:41:48 hagio has left #webmachinelearning
10:30:50 lei_zhao has joined #webmachinelearning
12:02:08 sa-takagi has joined #webmachinelearning
13:36:36 sa-takagi has joined #webmachinelearning