00:47:17 RRSAgent has joined #webmachinelearning 00:47:21 logging to https://www.w3.org/2025/11/11-webmachinelearning-irc 00:47:21 RRSAgent, make logs Public 00:47:22 please title this meeting ("meeting: ..."), anssik 00:47:22 Meeting: Web Machine Learning CG F2F – 11 November 2025 00:47:25 Chair: Anssi 00:47:35 Agenda: https://github.com/webmachinelearning/meetings/issues/35 00:47:35 https://github.com/webmachinelearning/meetings/issues/35 -> https://github.com/webmachinelearning/meetings/issues/35 00:47:36 present+ Sushanth_Rajasankar 00:47:39 Scribe: Anssi 00:47:42 scribeNick: anssik 00:47:46 scribe+ dom 00:47:54 gb, this is webmachinelearning/webmcp 00:47:54 anssik, OK. 00:48:00 Present+ Anssi_Kostiainen 00:48:04 Present+ Dominique_Hazael-Massieux 00:48:13 RRSAgent, draft minutes 00:48:15 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik 00:49:20 Jingyun has joined #webmachinelearning 00:49:31 Sun has joined #webmachinelearning 00:49:35 cpn has joined #webmachinelearning 00:49:37 lei_zhao has joined #webmachinelearning 00:49:37 kush has joined #webmachinelearning 00:49:42 present+ 00:49:50 present+ Vincent_Scheib 00:50:43 iahouma has joined #webmachinelearning 00:51:41 present+ 00:51:42 present+ Sun_Shin 00:51:45 phillis has joined #webmachinelearning 00:52:23 present+ 00:53:50 Topic: Welcome 00:53:54 kbx has joined #webmachinelearning 00:53:58 MasaoG has joined #webmachinelearning 00:53:58 Anssi: welcome to the W3C Web Machine Learning CG F2F at TPAC 2025 00:54:02 ... I'm Anssi Kostiainen, Intel, the chair of the CG; I also chair the WG that works closely with this CG, which serves as an incubator 00:54:30 ... as a recap from yesterday: 00:54:30 present+ Reilly_Grant 00:54:52 Dingwei has joined #webmachinelearning 00:55:04 acomminos has joined #webmachinelearning 00:55:08 present+ Christian_Liebel 00:55:11 present+ 00:55:12 markafoltz has joined #webmachinelearning 00:55:15 ningxin has joined #webmachinelearning 00:55:25 present+ Kenji_Baheux 00:55:34 present+ Mark_Foltz 00:55:58 Haili has joined #webmachinelearning 00:56:20 kush has joined #webmachinelearning 00:56:22 ... this CG is a group where new ideas are discussed, explored and incubated before formal standardization 00:56:24 present+ 00:56:39 present+ 00:56:44 present+ 00:56:45 ... past CG spec incubations include e.g. the WebNN API and the Model Loader API 00:56:53 present+ Masao_Goho 00:57:06 ... since last year, we've expanded the scope of the CG 00:57:14 -> WebML CG Charter https://webmachinelearning.github.io/charter/ 00:57:30 Anssi: the CG has added in scope and delivered a number of new incubations 00:57:34 -> WebML CG Incubations https://webmachinelearning.github.io/incubations/ 00:57:59 BenGreenstein has joined #webmachinelearning 00:58:11 johannhof has joined #webmachinelearning 00:58:30 Anssi: we have delivered first versions of new built-in AI APIs: 00:58:34 ... Prompt API 00:58:39 ... Writing Assistance APIs 00:58:46 ... Translator and Language Detector APIs 00:58:52 ... and at the explainer stage we have: 00:58:54 mtavenrath has joined #webmachinelearning 00:58:59 ... WebMCP API 00:59:06 ... Proofreader API 00:59:35 reillyg: can we add a short presentation on the built-in APIs and developer feedback before we dive into individual APIs? 00:59:45 anssik: sure, remind me at the start of that session 01:01:25 ... this CG has grown significantly over the last year, similarly to its sister WG 01:01:33 ...
obviously the intersection of the web and AI is exciting, and this group is the place where future Web AI experiences are incubated ahead of broad market adoption 01:01:49 ... the year-over-year growth rate of this group is around +30% for both organizations and participants, so the diversity is growing too 01:02:08 -> https://www.w3.org/groups/cg/webmachinelearning/participants/ 01:02:27 Anssi: many businesses looking to adopt the technologies developed in this group have joined; this is important as it allows us to capture real-world feedback early on 01:03:16 ... the nature of these "task-specific APIs" is that they're easy to adopt if your requirements match 01:03:26 ... when more control over the experience is required, the WebNN API provides the lower-level primitives to build your own 01:03:39 ... if you registered as a CG participant, please join us at the table 01:04:08 ... observers are welcome to join the table too, subject to available space 01:04:13 Anssi: we use Zoom for a hybrid meeting experience, please join using the link in the meeting invite 01:04:30 Anssi: we use IRC for official meeting minutes and for managing the speaker queue 01:04:32 q+ 01:04:35 ack anssik 01:04:39 ... please join the #webmachinelearning IRC channel, link in the meeting invite and agenda: 01:04:45 -> https://irc.w3.org/?channels=#webmachinelearning 01:04:45 -> https://github.com/webmachinelearning/meetings/issues/35 01:04:46 https://github.com/webmachinelearning/meetings/issues/35 -> Issue 35 WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) (by anssiko) 01:04:46 hagio has joined #webmachinelearning 01:04:58 Anssi: to put yourself on the queue, type "q+" in IRC 01:04:58 ... during the introductions round, we'll try to record everyone's participation on IRC with: 01:05:01 ... Present+ Firstname_Lastname 01:05:02 Present+ Thomas_Steiner 01:05:12 ... please check that your participation is recorded on IRC 01:05:12 estade has joined #webmachinelearning 01:05:21 Present+ Victor_Huang 01:05:43 Present+ Evan_Stade 01:05:45 Present+ Mike_Wyrzykowski 01:05:46 Present+ Yuta_Hagio 01:05:46 Present+ Victor_Huang 01:05:47 Present+ Markus_Tavenrath 01:05:48 alispivak has joined #webmachinelearning 01:05:53 Present+ Rob_Kochman 01:05:57 RRSAgent, draft minutes 01:05:59 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html dom 01:06:05 Present+ Victor_Huang 01:06:05 Present+ Sushanth_Rajasankar 01:06:13 Tarek7 has joined #webmachinelearning 01:06:13 Em has joined #webmachinelearning 01:06:13 Present+ Ali_Spivak 01:06:15 Present+ Hyojin_Song 01:06:15 RafaelCintron has joined #webmachinelearning 01:06:15 present+ Khushal_Sagar 01:06:15 Present+ Ben_Greenstein 01:06:15 cfredric has joined #webmachinelearning 01:06:15 Present+ Isaac_Ahouma 01:06:16 Present+ Emily_Lauber 01:06:16 Present+ Tarek_Ziade 01:06:17 Present+ Rafael_Cintron 01:06:22 Present+ Haili_Bai 01:06:37 nournabil has joined #webmachinelearning 01:06:52 npm has joined #webmachinelearning 01:06:56 Present+ Chris_Fredrickson 01:07:19 Em has joined #webmachinelearning 01:07:24 rviscomi has joined #webmachinelearning 01:07:31 Subtopic: Agenda bashing 01:07:34 Anssi: the F2F agenda was built collaboratively with CG participants 01:07:45 -> https://github.com/webmachinelearning/meetings/issues/35 01:07:45 https://github.com/webmachinelearning/meetings/issues/35 -> Issue 35 WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) (by anssiko) 01:08:07 Anssi: any last-minute proposals or updates?
01:08:13 Topic: WebMCP 01:08:15 chi has joined #webmachinelearning 01:08:17 gb, this is webmachinelearning/webmcp 01:08:20 anssik, OK. 01:08:24 Subtopic: Intro & demo 01:08:57 Anssi: the WebMCP abstract reads: 01:09:02 ... "Enabling web apps to provide JavaScript-based tools that can be accessed by AI agents and assistive technologies to create collaborative, human-in-the-loop workflows." 01:09:44 ... TL;DR unpacking: 01:09:55 Amirsh has joined #webmachinelearning 01:10:06 ... - the WebMCP API is a new JS API that allows a web developer to expose their web app functionality as "tools" 01:10:21 ... - the web developer is in control of what tools the web page exposes and how they function 01:10:42 ... - the web page acts as an MCP server equivalent, but implements the tools in client-side script, not in the backend 01:11:10 ... - tools are simply JS functions with associated natural language descriptions and structured schemas 01:11:16 ... - the natural language descriptions allow AI agents to invoke the programmatic "tools" API 01:11:44 ... - there's also a complementary declarative WebMCP API being worked on, reusing ARIA role-* attributes 01:11:45 present+ Amir 01:13:16 Anssi: the WebMCP proposal is a synthesis of two proposals from Microsoft and Google 01:13:21 ... the group has now converged on the initial proposal documented in the explainer and a supplementary proposal document that contains the API shape and code examples 01:13:26 -> WebMCP explainer https://github.com/webmachinelearning/webmcp/blob/main/README.md 01:13:31 -> WebMCP proposal https://github.com/webmachinelearning/webmcp/blob/main/docs/proposal.md 01:13:45 Anssi: the group has received early implementation experience and feedback from the OSS community contributors Alex Nahas and Jason McGhee that has been very valuable 01:13:58 -> https://github.com/MiguelsPizza/WebMCP 01:14:03 -> https://github.com/jasonjmcghee/WebMCP 01:14:24 Anssi: both these OSS projects have explored the problem space ahead of browser implementations and without some of the constraints of a browser implementation 01:14:27 ... both Alex and Jason are active participants in the group 01:15:04 Kush: it was a good surprise that the Google and MS teams had very similar APIs in mind, a good signal for future convergence 01:15:43 ... we've started with a very simple API shape, but as we're advancing, it looks like browsers will have an important role to play as a trusted mediator 01:15:59 ... we want to make sure developers have control over how their site can be used by agents 01:16:18 ... also nice to see interest from the community in this space 01:17:03 sushraja: this started from seeing the enthusiasm of the community around MCP, and we were pleasantly surprised to see Google with a similar proposal 01:17:18 ... the explainer is still very open to changes 01:17:33 ... lots still needs to be figured out in terms of API shapes, capabilities, security, ... 01:17:47 ... very open to feedback, in particular from potential users of the API 01:19:10 -> https://screen.studio/share/hbGudbFm WebMCP video demo 01:19:13 scribe+ 01:23:53 Anssi: in this demo Alex demonstrates a user interacting with an AI agent built into the browser 01:23:57 ... Alex has a conversation with an agent using a voice interface and asks the agent to search for specific information (the WebMCP explainer) and send a short summary of this information via email to a specific email address 01:24:01 ... in another demo, Alex uses the Prompt API integration 01:24:05 ... the demo uses deterministic API access to tools provided by the website and does not do any DOM parsing or computer vision-based image processing of screenshots
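[ For reference, a minimal sketch of what declaring such a tool looks like; the provideContext name and tool fields follow the current proposal document and may still change, and addTodoItem is an assumed page-defined helper. ]
```
// Sketch only: follows the shape in docs/proposal.md, subject to change.
// A tool is a plain JS function plus a natural language description and
// a JSON Schema describing its input.
navigator.modelContext.provideContext({
  tools: [{
    name: "add-todo",
    description: "Add a new item to the user's todo list on this page.",
    inputSchema: {
      type: "object",
      properties: { text: { type: "string", description: "The todo item text" } },
      required: ["text"]
    },
    async execute({ text }) {
      addTodoItem(text); // assumed page-defined helper; reuses client-side logic
      return { content: [{ type: "text", text: `Added "${text}"` }] };
    }
  }]
});
```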
01:24:10 ... that means the interaction is secure and auditable 01:24:14 ... in the spirit of the human-in-the-loop workflow, the end user can actually choose which tools exposed by the website are accessible to the agent 01:24:18 ... all compute happens on the client side, the model powering the agent runs on the client 01:24:22 ... this is a more lightweight and privacy-preserving approach compared to cloud-based agentic usage 01:25:09 Anssi: in the second part of the demo, google.com has defined tools such as get_page_title, extract_search_query, run_search, get_search_results 01:25:14 ... remember these tools are just JS functions declared by the website, injected into the model's context so it is aware of them 01:25:18 ... WebMCP tools make it easy to progressively enhance your existing websites and make them agentic 01:25:21 ... Prompt API input "run_search for webmcp and tell me what it is" 01:25:25 ... calls the same tools as in the voice model demo 01:25:28 ... the get_page_title tool call returns "Google" to confirm the page is Google's homepage 01:25:32 ... the run_search tool is invoked to navigate to Google search results for "webmcp" 01:25:44 ... all interaction happens inside the browser: model execution, tool calls, events 01:25:48 ... the agent can be stopped at any time by the user 01:25:54 -> Prompt API tool use https://github.com/webmachinelearning/prompt-api#tool-use 01:26:02 -> WebMCP demo https://screen.studio/share/hbGudbFm 01:27:21 RRSAgent, draft minutes 01:27:22 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik 01:29:49 anssik: For this demo the extension injects the tools. In production the website would inject these tools. 01:30:15 mfoltz: How close was the polyfill to the explainer? 01:30:36 anssik: Unsure. You can check out the repo. It's conceptually aligned. 01:30:45 markafoltz has joined #webmachinelearning 01:30:52 kush: I have another demo I can show which matches the explainer. 01:31:00 https://drive.google.com/file/d/1awZA2bsVNO-uUqo9NpVdnHS2Xh26Pf4u/view?usp=sharing 01:31:12 -> https://drive.google.com/file/d/1awZA2bsVNO-uUqo9NpVdnHS2Xh26Pf4u/view?usp=sharing 01:32:02 kush: This demo shows integration with MCP UI. 01:32:13 ... Allows embedding UI within the conversation with the agent. 01:32:23 ... That UI declares its tools with WebMCP. 01:32:47 ... The agent doesn't need to hit the UI's backend. It's all a client-side app. 01:33:07 q? 01:33:12 q+ 01:33:19 ack kbx 01:33:20 ... In this case it's not a full website but a small UI embedded in a chat, like we're seeing in the ecosystem. 01:33:56 q? 01:34:03 Dingwei has joined #webmachinelearning 01:35:12 Subtopic: Built-in agent ideation 01:35:54 ningxin: (Referencing slides) This is very similar to what we just saw in the demo. 01:36:09 ... Web application registers capabilities through WebMCP. 01:36:15 ... Browser has a built-in small language model. 01:36:50 Em has joined #webmachinelearning 01:37:23 ... Investigating whether a site can call into this agent to perform a task using tools provided by the web app itself. 01:38:51 ... Investigating whether this can be run entirely locally. 01:39:05 q? 01:39:07 q+ 01:39:22 q+ 01:39:27 ack kbx 01:39:28 anssik: We can share this idea in an issue and foster further discussion. 01:39:35 kbx: +1 to this idea. 01:39:35 Tarek has joined #webmachinelearning 01:39:42 ... the browser also has capabilities that we might want to expose.
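[ As background for this ideation, the Prompt API tool-use surface that the demo and this idea build on looks roughly like the sketch below, following the prompt-api explainer; the weather endpoint is a hypothetical placeholder. ]
```
// Sketch following the Prompt API explainer's tool-use section.
const session = await LanguageModel.create({
  tools: [{
    name: "getWeather",
    description: "Get the current weather for a city.",
    inputSchema: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"]
    },
    async execute({ city }) {
      // In the current closed-loop design, the browser calls this during prompt().
      const res = await fetch(`https://weather.example/api?city=${encodeURIComponent(city)}`); // hypothetical endpoint
      return JSON.stringify(await res.json());
    }
  }]
});
const answer = await session.prompt("What's the weather like in Kobe?");
```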
01:39:45 ack kush 01:39:50 AmirSh has joined #webmachinelearning 01:40:02 kush: There are two issues related to this on WebMCP. 01:40:02 q? 01:40:26 ... 1. Exposing built-in tools to the agent. 01:40:49 ... 2. Are you hoping for a tool to expose the built-in agent to an external agent? 01:40:52 q+ 01:41:04 ningxin: No. The prompt comes from the user, e.g. a chatbox on the site. 01:41:13 LeoLee has joined #webmachinelearning 01:41:21 kush: Could the in-browser and site agent work together? 01:41:30 ningxin: The web app can customize this. 01:41:42 ... We're considering use by the app itself as well as extensions. 01:41:49 present+ 01:41:51 ... Extensions can use this API to enhance the browser agent. 01:42:40 Subtopic: Communication with the TAG 01:42:47 Anssi: issue #35 01:42:48 https://github.com/webmachinelearning/webmcp/issues/35 -> Issue 35 Communication with the TAG (by xiaochengh) [Agenda+] 01:42:56 Anssi: the TAG provided the following discussion points: 01:43:00 -> https://github.com/webmachinelearning/webmcp/issues/35#issuecomment-3424766197 01:43:00 https://github.com/webmachinelearning/webmcp/issues/35 -> Issue 35 Communication with the TAG (by xiaochengh) [Agenda+] 01:43:04 q- 01:43:17 Anssi: I'd like to prime this session by unpacking the TAG discussion points, followed by a summary of the group's current posture wrt these points 01:43:37 Anssi: - Motivation: "Frontend agent integration is useful and should be supported on the Web" 01:43:57 ... the group agrees with the TAG; the WebMCP effort's focus is to establish interoperable interfaces to enable frontend agent integration, and the motivation section of the explainer expands on this 01:44:04 -> WebMCP Explainer > Motivation https://github.com/webmachinelearning/webmcp#background-and-motivation 01:44:25 ... - Generality: "Whatever the WebMCP efforts introduce to the web standards should be general enough to support different protocols" 01:44:59 ... the explainer states "the WebMCP API [is] as agnostic as possible" and explains the API only reuses the "tools" base primitive from MCP, similarly to how the Prompt API reuses "tools" 01:45:28 hagio has joined #webmachinelearning 01:45:31 ... the group's charter codifies coordination expectations with the AI Agent Protocol CG, and I expect this group's participants to explore implementations atop other emerging protocols that may gain traction 01:45:40 -> https://github.com/webmachinelearning/webmcp#model-context-protocol-mcp-without-webmcp 01:45:44 -> https://webmachinelearning.github.io/charter/#coordination 01:46:00 ... - Privacy and Security: "The P&S aspects can be challenging, and we would also like to see more explorations." 01:46:20 ... the group has initiated a dedicated workstream for privacy and security, documented in issue #45, to be discussed next 01:46:21 https://github.com/webmachinelearning/webmcp/issues/45 -> Issue 45 Privacy & security considerations for WebMCP (by victorhuangwq) [Agenda+] 01:46:35 ... - Declarative API: "A declarative API can be quite useful and cover use cases that an imperative API can't cover" 01:46:56 ...
the group agrees with the TAG and has developed and prototyped a declarative API alongside the imperative API; the latest proposal is PR #26 with related discussion in issue #22 01:46:58 https://github.com/webmachinelearning/webmcp/issues/22 -> Issue 22 Declarative API Equivalent (by EisenbergEffect) 01:46:58 https://github.com/webmachinelearning/webmcp/pull/26 -> Pull Request 26 add explainer for the declarative api (by MiguelsPizza) [Agenda+] 01:47:18 q? 01:47:28 Anssi: "[TAG] would like to encourage further development of the high-level API that the WebML [CG] is currently working on" suggests this group is on the right track 01:47:39 cpn: https://github.com/webmachinelearning/webmcp/issues/35 01:47:40 https://github.com/webmachinelearning/webmcp/issues/35 -> Issue 35 Communication with the TAG (by xiaochengh) [Agenda+] 01:47:44 ... per the issue discussion, Xiaocheng's personal view (and not TAG consensus) is "WebMCP should be built bottom-up on lower-level primitives" 01:48:04 ... Khushal explains in his comment how current web sites don't have a way to expose the same "how I use this site" knowledge as semantic actions, and that such an API must be standardized for the browser to know how to use such actions programmatically 01:48:09 ... this is the rationale for why "tools" was chosen as the abstraction for WebMCP, to allow interoperability between websites and browsers 01:48:12 ... the "tools" abstraction enables websites and web browsers to talk to each other programmatically without the browsers needing to scrape the content or run computer vision models on page screenshots to understand where to click 01:48:22 -> https://github.com/webmachinelearning/webmcp/issues/35#issuecomment-3444229149 01:48:26 q? 01:48:47 jyasskin: That was a good summary. 01:48:55 ... A lot of this feedback was not full TAG consensus. 01:49:20 ... The loudest concern is that MCP is not going to last. We want to avoid stuff being stuck in the platform that won't match a future protocol. 01:49:36 ... There was some uncertainty about whether the current design satisfies that request. 01:49:42 ... I think it might. 01:49:48 q? 01:49:59 ... Also a concern about JSON Schema being production ready here. I think that it is good enough. 01:50:46 q? 01:51:55 anssik: My takeaway from the feedback is that high-level is the right direction. 01:52:11 sushraja: There was a suggestion to try something lower level. 01:52:31 ... Building a JS API which would allow multiple LLMs to understand the tools that are available. 01:52:40 ... This would be challenging across multiple library versions. 01:52:43 q? 01:53:06 RESOLUTION: Continue development of WebMCP as the high-level API as per the TAG guidance. Coordinate with the AI Agent Protocol CG on new protocols. 01:54:46 Subtopic: Privacy & security considerations 01:54:50 Anssi: issue #45 01:54:52 https://github.com/webmachinelearning/webmcp/issues/45 -> Issue 45 Privacy & security considerations for WebMCP (by victorhuangwq) [Agenda+] 01:54:55 ... the group has kicked off a workstream to look into privacy and security considerations as requested by the TAG in #35 01:54:55 https://github.com/webmachinelearning/webmcp/issues/35 -> Issue 35 Communication with the TAG (by xiaochengh) [Agenda+] 01:55:05 nournabil has joined #webmachinelearning 01:55:06 ...
please note also the breakout session "Agentic Browsing and the Web's Security Model" by Johann tomorrow 01:55:13 -> https://github.com/w3c/tpac2025-breakouts/issues/25 01:55:14 https://github.com/w3c/tpac2025-breakouts/issues/25 -> Issue 25 Agentic Browsing and the Web's Security Model (by johannhof) [session] 01:56:21 anssik I have prepared a slide show for the privacy and security considerations that I can share after the coffee break perhaps? 01:57:30 hagio has left #webmachinelearning 02:33:24 Tarek has joined #webmachinelearning 02:34:11 sushraja has joined #webmachinelearning 02:34:14 kbx has joined #webmachinelearning 02:34:22 RobKochman has joined #webmachinelearning 02:34:24 hagio has joined #webmachinelearning 02:34:28 LeoLee has joined #webmachinelearning 02:34:30 present+ Kenji_Baheux 02:34:40 BatuHoang has joined #webmachinelearning 02:34:43 Ehsan has joined #webmachinelearning 02:34:49 present+ Ehsan_Toreini 02:34:56 Em has joined #webmachinelearning 02:35:02 mgifford2 has joined #webmachinelearning 02:35:07 Fazio has joined #webmachinelearning 02:35:27 cpn has joined #webmachinelearning 02:35:43 present+ 02:35:46 RRSAgent, draft minutes 02:35:47 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik 02:36:09 janina0 has joined #webmachinelearning 02:36:20 Present+ 02:36:21 Neha has joined #webmachinelearning 02:36:27 present+ 02:36:40 Anssi: the issue opened by Victor surfaces three key areas for deeper discussion 02:36:52 ... 1. Prompt injection attacks, which are already mentioned in issue #11 02:36:52 https://github.com/webmachinelearning/webmcp/issues/11 -> #11 02:37:01 matatk has joined #webmachinelearning 02:37:04 ... 2. Misrepresentation of intent in the WebMCP tool 02:37:15 ... 3. Personalization scraping / fingerprinting through over-parametrization 02:37:16 present+ Matthew_Atkinson 02:37:19 ... I will break this into separate subtopics 02:37:30 ... (the permissions part of the solution space is discussed in issue #44, we'll discuss that separately later today) 02:37:31 https://github.com/webmachinelearning/webmcp/issues/44 -> Issue 44 Managing action specific permissions (by khushalsagar) [Agenda+] 02:37:35 kush has joined #webmachinelearning 02:37:52 present+ Khushal_Sagar 02:38:01 Dingwei has joined #webmachinelearning 02:38:12 MasaoG has joined #webmachinelearning 02:39:16 Slideset: victor_slides 02:39:22 [slide 2] 02:39:59 Fazio has joined #webmachinelearning 02:40:22 RRSAgent, draft minutes 02:40:23 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html dom 02:40:30 [slide 3] 02:40:56 BenGreenstein has joined #webmachinelearning 02:41:07 [slide 4] 02:42:36 [slide 5] 02:43:32 [slide 6] 02:44:01 markafoltz has joined #webmachinelearning 02:44:07 [slide 7] 02:45:07 q+ to note that over-parametrization is much more invasive than just fingerprinting 02:47:44 [slide 8] 02:47:44 [slide 9] 02:54:52 q+ 02:54:52 [slide 10] 02:54:52 q? 02:54:52 ack dom 02:54:52 dom, you wanted to note that over-parametrization is much more invasive than just fingerprinting 02:54:52 johannhof has joined #webmachinelearning 02:54:52 security means a few different things to people with disabilities.
It could mean: am I entering the information in the right place, is this what I think it is 02:54:52 kbx has joined #webmachinelearning 02:54:52 Amirsh has joined #webmachinelearning 02:54:52 dom: thank you for the presentation, I'd like to remind us of the fingerprinting issue 02:54:52 Aung has joined #webmachinelearning 02:54:52 q+ 02:54:52 ... I don't want a site to know a user is pregnant or is visiting Japan at the moment, for example 02:54:52 ... the privacy issue is quite substantial 02:54:52 Victor: fingerprinting is not the only thing this feature could be abused for; there are other malicious usage scenarios too 02:54:52 q? 02:54:52 ack Fazio 02:54:52 ohmata has joined #webmachinelearning 02:54:52 (not audible to remote participants) 02:54:52 Fazio: we did user research for an American fast-food chain; users had not made any purchases online because of security concerns, they weren't sure that the information was inserted in the right fields; confidence that the user is doing the right thing is a consideration 02:54:52 phillis has joined #webmachinelearning 02:54:52 nournabil has joined #webmachinelearning 02:54:52 anssik: we can make a comparison with keeping a hand on the steering wheel in a self-driving car 02:54:58 q+ 02:54:58 Em has joined #webmachinelearning 02:54:58 q+ 02:54:58 victor: conversely, people get more and more comfortable with a self-driving car over time 02:54:58 Data annotation is crucial 02:54:58 q? 02:54:58 ... but we're at the start of the journey and so it's important to make sure the user stays involved 03:21:48 RRSAgent has joined #webmachinelearning 03:21:49 logging to https://www.w3.org/2025/11/11-webmachinelearning-irc 03:22:11 Subtopic: Managing action specific permissions 03:22:51 Anssi: issue #44 03:22:51 https://github.com/webmachinelearning/webmcp/issues/44 -> Issue 44 Managing action specific permissions (by khushalsagar) [Agenda+] 03:23:01 ... we opened this separate issue to discuss the permission model, informed by the TAG feedback in #35 03:23:02 https://github.com/webmachinelearning/webmcp/issues/35 -> Issue 35 Communication with the TAG (by xiaochengh) [Agenda+] 03:23:17 ... Khushal's initial proposal: 03:23:37 ... - The browser manages a global "can you be agentic on this site" permission. 03:24:01 ... - Action-specific permissions. Say an action is destructive (deletes some files). The site would likely want user consent before that action is taken. 03:24:17 ... the question is whether and how to persist the action-specific permissions per site 03:24:18 q+ 03:24:32 ... tools are ephemeral from the browser's perspective, but NOT ephemeral from the site's perspective 03:24:47 ... consider that these 2 states are indistinguishable from the browser's (or MCP client's) perspective: 03:24:55 ... the site (or MCP server) no longer provides this tool 03:25:01 ... the tool is currently disabled 03:25:56 kush: the main issue I landed on: for any kind of sensitive action, outlining clearly who is responsible for gaining user consent (browser/agent vs site) 03:25:57 npm9 has joined #webmachinelearning 03:26:01 dom I have sent the document to you 03:26:16 ... if the agent/browser is doing it, it's going to be non-deterministic since it will depend on interpretation by the agent 03:26:26 ... conversely, the site knows deterministically which tool is sensitive 03:27:08 ... but we also want to avoid double-prompting, so making the browser the default consent dialog source 03:27:17 q? 03:27:18 ... and have tools identify if they require user consent
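[ No API for this exists yet; purely as an illustration of the idea, a tool could flag its own sensitivity so the browser knows when to gather consent. The annotations field below is hypothetical, loosely modeled on MCP's tool annotations, and deleteCurrentProject is an assumed page-defined helper. ]
```
// Hypothetical sketch: the annotations field is not part of any current proposal.
navigator.modelContext.provideContext({
  tools: [{
    name: "delete-project",
    description: "Permanently delete the currently open project.",
    inputSchema: { type: "object", properties: {} },
    // Hypothetical per-tool hints letting the browser show its own
    // consent UI before executing the tool.
    annotations: { destructive: true, requiresUserConsent: true },
    async execute() {
      deleteCurrentProject(); // assumed page-defined helper
      return { content: [{ type: "text", text: "Project deleted." }] };
    }
  }]
});
```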
03:27:22 ack kush 03:28:19 ... what string should show up on the consent dialog? coming deterministically from the site or generated by the agent? 03:28:24 matatk has joined #webmachinelearning 03:28:26 q+ 03:28:28 ... can this be declarative or would this depend on tool execution? 03:29:21 ... elicitation is for cases where a complex UI is needed to continue the task (e.g. a payment transaction) 03:29:23 q? 03:29:35 ... we don't have a sketch of an API yet 03:29:50 q+ 03:29:56 ack johannhof 03:30:00 hta has joined #webmachinelearning 03:30:43 kush: a browser could refuse to execute WebMCP on abusive web sites 03:30:55 johannhof: this should be discussed explicitly in the threat model 03:31:04 ... some discussion of preventing elicitation from happening 03:31:24 q? 03:31:29 ack sushraja 03:31:53 sushraja: if you're thinking of persistence, we need an identity for the tool, which we don't have today - maybe a hash of the description? 03:32:01 ... the user may want to allow now or always 03:32:27 kush: if this is delegated to the browser, the tool may not need to care 03:32:50 q? 03:33:53 sushraja: part of the question is what guarantees would be tied to that hash/id 03:34:03 q? 03:34:50 WebMCP needs an API for the site to request browser-provided consent flows for each tool execution 03:35:02 shisama has joined #webmachinelearning 03:36:13 q+ 03:36:32 dom: providing browser prompts on the site might complicate the UX; an example is the Geolocation API, where the web site should explain when it needs access 03:36:33 q? 03:37:21 ... maybe a boolean API works; any information under the control of the website blurs the boundary 03:37:40 johannhof: website-generated text is not allowed in prompts, a Chrome policy 03:37:48 q? 03:38:21 ... there's a difference between how you give out this information and click a button 03:38:44 dom: a consent dialog involving PII is scarier 03:38:45 q? 03:39:13 kush: if you share PII, that is browser-driven; this is about the site exposing a tool and the user agreeing with the agent's action 03:39:28 ... the site gets information it otherwise would not have access to 03:39:59 reillyg: I think I lost track of what we try to protect against 03:40:24 ... the site decides what it requires and builds UX for it; "sure you want to pay for this" needs no browser UX 03:40:35 ... what is the threat model for which we want to push this to browser UX? 03:40:54 kush: UX reasons, one was minimizing context switches, no need to foreground the tab 03:41:02 ... another is establishing a clear responsibility 03:41:16 reillyg: we have 3 regions in the UI: content area, browser chrome and agent UX 03:41:53 ... the question is, if we ask for consent, is a consent question in the agent UX area different from browser UX 03:42:19 ... is asking a question at the bottom of the conversation the same as if it'd be an alert dialog in the omnibox 03:42:35 johannhof: personally I think all of this should be consistent agent UX 03:42:36 q? 03:42:54 q? 03:43:22 dom: clear threat models would help here 03:43:33 johannhof: a lot of this is delegated to the user agent already 03:43:35 q? 03:44:00 ... there's no strong definition of how these UI elements are shown exactly 03:44:02 ack Tarek 03:44:13 Tarek: confused about prompt granularity 03:44:23 ... we're not going to show 10 different permission prompts 03:44:37 ... consider the full scenario for using an agent on a web page, e.g. shopping on a webpage 03:44:45 Em has joined #webmachinelearning 03:45:19 ... is there a protocol so that, if I'm on a website, these can be called 100s of times 03:45:21 q?
03:45:28 q+ 03:46:17 kush: the analogue is "cookie prompts" 03:47:01 q+ 03:47:10 johannhof: all the agent makers must be good at these probabilistic protections to ship products 03:47:19 Tarek: you trust the origin? 03:47:20 markafoltz has joined #webmachinelearning 03:47:20 q? 03:47:48 q+ 03:48:04 Fazio: a journey map visualization to help illustrate the flow? 03:48:05 q? 03:48:29 Zakim, close the queue 03:48:29 ok, dom, the speaker queue is closed 03:48:29 q? 03:48:38 q? 03:49:00 q? 03:49:10 q? 03:49:13 ack kush 03:49:59 q? 03:50:38 q? 03:51:19 q? 03:52:20 https://github.com/webmachinelearning/webmcp/issues/44 03:52:20 https://github.com/webmachinelearning/webmcp/issues/44 -> https://github.com/webmachinelearning/webmcp/issues/44 03:52:30 PROPOSED RESOLUTION: WebMCP needs a threat model to evaluate the role of browser mediated consent for tool execution 03:52:35 Please leave open issues and continue the discussion here https://github.com/webmachinelearning/webmcp/issues/44 03:52:46 these types of conversations about user consent for agent actions are already happening in many other standards organizations. WebMCP can leverage the OAuth discussions from the IETF, where the larger MCP community is working through these same concerns of what the agent is allowed to do on behalf of a user 03:52:49 RESOLUTION: WebMCP needs a threat model to evaluate the role of browser mediated consent for tool execution. 03:53:08 cfredric has joined #webmachinelearning 03:53:25 Subtopic: WebMCP accessibility, ARIA mapping via Declarative API 03:53:41 matatk has joined #webmachinelearning 03:53:43 Anssi: PR #26 03:53:45 https://github.com/webmachinelearning/webmcp/pull/26 -> Pull Request 26 add explainer for the declarative api (by MiguelsPizza) [Agenda+] 03:53:47 Em: agreed, also wondering about the duplication with the AI agents working group 03:54:04 ... Brandon noted "aria-label and aria-description and others for labelling and describing elements and so it might be helpful to define WebMCP mappings/behaviors for these instead of introducing entirely new HTML attributes." 03:54:23 q+ 03:54:31 Zakim, reopen the queue 03:54:31 ok, dom, the speaker queue is open 03:54:33 q+ matatk 03:54:40 queue=matatk 03:54:43 ack sushraja 03:54:45 ack Em 03:54:47 q? 03:55:13 alex: WebMCP describes the tools in JS 03:55:23 ... the declarative API is a proposal for a pure HTML version of MCP 03:55:50 ... rather than declare them in JS, we declare them in HTML, and the different form fields are all comprised in the markup 03:56:17 ... e.g. with a tool-name attribute and a tool-description, and the rest of the schema for the tool is inferred from the children elements, including ARIA attributes 03:56:35 ... still lots of questions about what the execution of the tool would do 03:56:52 q? 03:57:12 ... the current proposal is to execute the form, with the expectation that the server would respond with JSON based on the presence of an HTTP header indicating execution in a tool context 03:57:33 ... with some challenges in how to handle HTTP and redirects in that context 03:57:51 ack matatk 03:57:51 ... but it's interesting to repurpose accessibility annotations in that context
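[ For concreteness, a sketch of the form-based declarative direction alex describes; the tool-name and tool-description attribute names are illustrative, per the PR #26 discussion, and are not settled. ]
```
<!-- Illustrative only: attribute names are under discussion in PR #26.
     The tool's input schema is inferred from the form fields, including
     their labels and ARIA attributes. -->
<form action="/todos" method="post"
      tool-name="add-todo"
      tool-description="Add a new item to the user's todo list">
  <label for="text">Todo text</label>
  <input id="text" name="text" type="text" required
         aria-description="The text of the todo item to add">
  <button type="submit">Add</button>
</form>
```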
03:58:21 matatk: APA is happy to look at this pull request - we haven't done that yet 03:58:50 ... what's really important is to ensure that the semantics that are added are useful to people 03:58:51 RobKochman_ has joined #webmachinelearning 03:59:00 ... some of it targets screen readers, but not just them 03:59:02 q+ 03:59:10 ... we had a breakout about making some of these annotations for machines 03:59:23 ... which can then help e.g. for cognitive disabilities 03:59:30 ... e.g. to help with navigation destinations 04:00:13 ... there are other things where we see overlap between agents & accessibility: in many cases, accessibility needs to increase machine interpretability for assistive technologies 04:00:21 q? 04:00:42 ... but it's important to make sure the ARIA attributes used in the agent context don't end up overloading people with information 04:00:53 ack LeoLee 04:01:20 q+ 04:01:34 LeoLee: there might be web sites that abuse ARIA attributes to "escape" from agents 04:01:38 q? 04:01:45 ... and thus deteriorate the accessibility experience 04:01:54 q+ 04:02:08 alex: this should be opt-in to avoid overloading models 04:02:18 ack reillyg 04:02:23 ... e.g. only expose tools with MCP annotations 04:02:36 reillyg: I like the idea of trying to describe things better to make assistive agents more useful 04:03:03 ... in terms of declarative vs imperative, the former should be focused on explaining to the agent how to interact with the web site through the normal human interface 04:03:40 ... what makes me nervous is the assumption that the server should respond with a specialized machine-readable output; it should be the regular result of the underlying normal interaction 04:03:52 q? 04:03:55 ack kush 04:04:06 ... it should be grounded in the UI and human workflow of the site; for a JSON-based approach, the imperative approach feels appropriate 04:04:14 q? 04:04:25 kush: +1 to avoid that divergence between UI and response 04:05:15 q? 04:05:32 ... also, if this is an opt-in approach, we should provide ways to specify MCP descriptions independently of ARIA to avoid polluting the screen reader experience 04:05:47 RRSAgent, draft minutes 04:05:49 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html dom 04:06:17 kush: open question on whether they should ship together 04:07:03 hagio has left #webmachinelearning 04:43:46 acomminos has joined #webmachinelearning 05:05:37 LeoLee has joined #webmachinelearning 05:05:44 Aung has joined #webmachinelearning 05:05:55 sushraja has joined #webmachinelearning 05:08:17 sa-takagi has joined #webmachinelearning 05:13:13 ErikAnderson has joined #webmachinelearning 05:13:22 present+ 05:13:39 Mike_Wyrzykowski has joined #webmachinelearning 05:14:24 kbx has joined #webmachinelearning 05:16:59 Tarek has joined #webmachinelearning 05:18:05 thelounge has joined #webmachinelearning 05:18:48 thelounge has left #webmachinelearning 05:19:19 nehasapre has joined #webmachinelearning 05:20:08 Em has joined #webmachinelearning 05:20:14 Subtopic: Define the API for in-page Agents to use a site's declared tools 05:20:24 Anssi: issue #51 05:20:24 https://github.com/webmachinelearning/webmcp/issues/51 -> #51 05:20:30 Present+ Sushanth_Rajasankar 05:20:32 kush has joined #webmachinelearning 05:20:38 ... currently we have an API to declare tools to the site, "the registry": 05:20:40 present+ Khushal_Sagar 05:20:42 Present+ Sushanth_Rajasankar 05:20:45 -> modelContext API https://github.com/webmachinelearning/webmcp/blob/main/docs/proposal.md#modelcontext 05:21:01 phillis has joined #webmachinelearning 05:21:02 Anssi: we also need an API for an agent _embedded on the site_ to use these tools, and we think the browser can help mediate which agent is accessing the site's functionality at a time 05:21:09 ... notably, here the agent is running within the same origin as the site 05:21:25 ...
this issue was informed by issue #43; rephrasing Alex's thinking there: 05:21:26 https://github.com/webmachinelearning/webmcp/issues/43 -> Issue 43 Clarifying the scope of the proposal (by 43081j) 05:21:46 ... - the "WebMCP server" is the `navigator.modelContext` object that acts as the registry on which tools are declared 05:22:05 ... - the "WebMCP client" is the API for listing, calling, and listening for tool changes, callable by an agent _embedded on the site_, e.g.:
```
navigator.modelContext.listTools
navigator.modelContext.executeTool
navigator.modelContext.onToolListChanged
```
05:22:24 -> https://github.com/webmachinelearning/webmcp/issues/43#issuecomment-3478492067 05:22:25 https://github.com/webmachinelearning/webmcp/issues/43 -> Issue 43 Clarifying the scope of the proposal (by 43081j) 05:22:38 q?
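[ A sketch of how an in-page agent might consume this client surface; the method names above come from the issue discussion and are not yet specified, and refreshAgentToolList is an assumed helper. ]
```
// Sketch only: client-side names are from the issue #43/#51 discussion.
const tools = await navigator.modelContext.listTools();
// Feed the tool list to the in-page agent's model, then run its chosen call.
const result = await navigator.modelContext.executeTool("add-todo", { text: "Buy milk" });
// React when the page registers or removes tools.
navigator.modelContext.onToolListChanged = () => refreshAgentToolList(); // assumed helper
```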
05:22:57 MasaoG has joined #webmachinelearning 05:23:06 kush: this is a derivative discussion from #43 about the scope of the API 05:23:28 ... when people embed a 3rd-party library to embed agents on their site (à la Zendesk) 05:23:37 ... it would be convenient to make that integration easier 05:23:53 johannhof has joined #webmachinelearning 05:24:20 ... initially, there was no support to integrate this, for lack of a good use case 05:24:21 q+ 05:24:24 q? 05:25:11 nournabil has joined #webmachinelearning 05:25:30 q? 05:25:40 ack dom 05:25:54 dom: this feels like a convenience function? 05:26:20 kush: exactly, to avoid browsers jumping at each other 05:26:56 ... say "I want to start executing tools, and if another agent is interacting with the site we can avoid stomping on each other" 05:27:21 dom: how about: if you provide WebMCP, we recommend you record your server in a global function 05:27:40 kbx has joined #webmachinelearning 05:27:46 ... some other groups are recommending exposing a standard library as an SDK 05:28:00 johannhof: this seems like a pretty big tangent, a separate proposal for the group? 05:28:29 reillyg: if we decide there's an explicit way to expose tools, then we'd include a mitigation in the browser feature 05:28:46 johannhof: a separate proposal for integration with the Prompt API warrants its own issue 05:28:47 q? 05:28:57 acomminos has joined #webmachinelearning 05:29:05 kush: there's probably a lighter-weight way to deal with this problem? 05:29:39 ... for anything mediated by the browser, an API as lightweight as "an external agent wants to connect to you", where the site can terminate that connection 05:30:00 ... someone declares the tools and by doing this we ensure there's only one task using the tool at a time 05:30:01 q? 05:30:24 dom: the only change to core WebMCP would be whether the tool can or cannot run concurrently 05:31:12 reillyg: web developers can proceed when the initial tool call via a button press is completed, and resume with a new call then 05:31:32 q? 05:32:04 anssik: we should define the various flavours of agents (incl. e.g. in-site agents) 05:32:38 reillyg: the types of agents I can imagine are browser agents and same-site agents 05:33:17 dom: how about an agent connected to the site? 05:33:39 reillyg: browser-integrated agent, OS-provided agent, in-page agent 05:33:40 q? 05:33:54 dom: an agent specialized on one origin only 05:34:07 reillyg: extension developers could build origin-expert agents 05:34:40 q+ 05:34:46 dom: security considerations are different for the different flavors 05:35:52 kush has joined #webmachinelearning 05:35:55 Steven has joined #webmachinelearning 05:36:00 johannhof: feels beyond the scope of what an MVP for WebMCP needs to address, except to the extent it affects WebMCP itself (e.g. a connect function to help manage multiple connections) 05:36:20 Em has joined #webmachinelearning 05:36:48 q? 05:36:52 ack johannhof 05:37:28 PROPOSED RESOLUTION: revisit that issue (likely in a different proposal) if there is a need for browser mediated interactions between browser agents and in-site agents - no clear need to address this in WebMCP atm 05:38:10 fyi created an issue about identity within WebMCP. I think many of the current conversations around consent prompting to users and what agents have access to can be resolved by the agent not acting *as* the user, but using OAuth for the agent to act *on behalf of* the user. https://github.com/webmachinelearning/webmcp/issues/54 05:38:10 https://github.com/webmachinelearning/webmcp/issues/54 -> Issue 54 Challenging assumptions of Identity within WebMCP (by EmLauber) 05:38:15 RESOLUTION: revisit that issue (likely in a different proposal) if there is a need for browser mediated interactions between browser agents and in-site agents - no clear need to address this in WebMCP atm 05:38:20 Subtopic: Should we support cross-origin Agents across frame boundaries 05:38:29 Anssi: issue #52 05:38:30 https://github.com/webmachinelearning/webmcp/issues/52 -> Issue 52 Should we support cross-origin Agents across frame boundaries (by khushalsagar) [Agenda+] 05:38:42 ningxin has joined #webmachinelearning 05:38:42 ... the current thinking is tools can only exist on the top-level browsing context: 05:38:53 ... "Only a top-level browsing context, such as a browser tab can be a model context provider." 05:38:59 -> https://github.com/webmachinelearning/webmcp/blob/main/docs/proposal.md#understanding-webmcp 05:39:19 Anssi: this design does not support the following use cases: 05:39:31 ... - a webpage which embeds an Agent 05:39:37 ... - an Agent which embeds a cross-origin webpage and wants to access its WebMCP functionality 05:39:41 ... proposal to consider a permission policy allowlist for WebMCP 05:39:45 -> https://www.w3.org/TR/permissions-policy/#allowlist 05:39:53 mtavenrath has joined #webmachinelearning 05:40:11 Anssi: question on granularity: "this origin is allowed to see all tools", "this origin is allowed to see a subset of tools: X, Y and Z" 05:40:11 q+ 05:40:38 ack kush 05:41:42 kush: this is another derivative of #43; this is about an agent embedded in a cross-origin iframe 05:41:42 https://github.com/webmachinelearning/webmcp/issues/43 -> Issue 43 Clarifying the scope of the proposal (by 43081j) 05:41:46 q+ 05:42:00 ack Mark_Foltz 05:42:29 ... the question is whether the browser needs to provide a shared interface instead of relying e.g. on ad-hoc postMessage communication 05:42:35 q+ 05:43:18 kush: should we allow an embedded iframe to expose tools to the browser agent? separately, should the embedder be able to access embedded tools? 05:43:36 q+ 05:43:41 johannhof: for the former, part of the question is whether the user has the mental model to understand what happens 05:43:54 q+ 05:44:11 ... that's an assumption we've made for permissions 05:44:41 ...
imagine a locateStore tool - this should be available to the agent seamlessly 05:44:43 q+ 05:44:48 q+ 05:44:55 kush: the challenge is to enforce origin isolation for data provided to the tool 05:45:50 ... this could be something the embedder could allow or disallow 05:46:01 johannhof: if so, I think it should be disallowed by default 05:46:03 ack kbx 05:46:36 kbx: +1 on disallow by default 05:46:56 ... #43 was talking about an in-site agent, which feels out of scope 05:46:56 https://github.com/webmachinelearning/webmcp/issues/43 -> Issue 43 Clarifying the scope of the proposal (by 43081j) 05:47:34 ack tomayac 05:48:19 tomayac: re iframes, some pages (e.g. shopping pages) advertise for themselves via an iframe with a different origin 05:49:27 ... it would be problematic if these types of embeddings were perceived as separate and not accessible by the agent 05:50:23 ... e.g. if the agent couldn't invoke a tool provided by the embedded frame 05:50:28 ack Em 05:50:30 ack em 05:51:00 emily: I work at MS Identity - one part missing is authentication of the pages 05:51:24 q? 05:51:39 ... we should ensure the agent only has access to pages authenticated for that user 05:52:13 ... incl. the scenario where federated authentication is used 05:52:42 ... to avoid leakage of sensitive data 05:53:04 kush: from the WebMCP API surface, there is no distinction on the state of authentication 05:53:28 ... in terms of data leakage, origin isolation needs to be applied in any case, even without embedding 05:55:29 em: if WebMCP calls a tool that interacts with an authenticated backend, then ensuring limits around the authentication context matters 05:56:01 reillyg: tools are always calling JS functions in the page (and thus inherit the authentication status of the page) 05:57:08 ... there are separate proposals to be able to shift the caller of the tool from the agent in the browser to a server-side agent with its own context, incl. identity (e.g. to allow ChatGPT to continue an operation started in a browser agent context) 05:57:15 ack ErikAnderson 05:57:15 ... but that's outside the scope of MCP 05:57:22 q 05:57:26 q+ 05:57:37 acomminos has joined #webmachinelearning 05:57:37 Em has joined #webmachinelearning 05:58:09 ErikAnderson: MS Teams has a complex architecture for their own tabbing infrastructure, incl. via iframes 05:58:50 ack sushraja 05:58:53 ... if we use a permission policy mechanism, it would be interesting to see if this could or should be attached to actual visibility of the content 05:59:24 johannhof: or allowing to toggle the permission on and off dynamically 05:59:46 sushraja: the problem with iframes is the risk of tool naming collisions 06:00:12 ... the top-level frame could delegate tool "focus" to a given iframe dynamically 06:00:31 q? 06:01:11 dom: this points out the need for the top-level frame to be the coordinator 06:02:09 kush: the problem of name collisions exists in multi-tab scenarios 06:02:26 sushraja: I've assumed we were working only with the tab in focus 06:03:08 johannhof: I'm not too worried about the multi-tab situation, where the agent should be able to make the distinction 06:04:09 kush: should we punt on embedded iframes for now? or make this UA dependent? 06:05:24 johannhof: are there ways to detect name collisions in the current spec? 06:05:50 kush: name collisions across multiple services are bound to happen, with or without the Web 06:06:37 ErikAnderson: is there a concrete use case to help drive this conversation? 06:06:39 q? 06:07:42 Mark_Foltz: maybe polyfill with postMessage to get a better sense of the need and potential API shape 06:07:56 dom: if so, would we treat same-origin and cross-origin differently? 06:08:06 johannhof: I don't see why we wouldn't allow same-site 06:08:22 sushraja: we would be facing name collisions 06:09:08 johannhof: I think we should actually go with a permission policy that would work in a x-origin context 06:09:48 kush: again, the need for agents to disambiguate name collisions is something that agents will need to fix 06:10:20 leo: even the same-origin case is not necessarily anchored in a use case 06:11:35 johannhof: ultimately, whether first-party sites do the tools themselves or delegate to same- or x-origin iframes shouldn't be relevant 06:12:04 kush: I think the main question is one of implementation cost; let's give it a bit more time to figure out implementation challenges before proceeding 06:12:14 anssik: we should also make sure to get more of johannhof's time in our calls
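[ For concreteness, the permission policy allowlist floated above would follow the standard Permissions Policy pattern; the "webmcp" feature name is hypothetical. ]
```
<!-- Hypothetical: a "webmcp" policy-controlled feature does not exist yet. -->
<!-- Embedder opts a cross-origin iframe's tools in: -->
<iframe src="https://store.example/locator" allow="webmcp"></iframe>

<!-- Or via a response header, restricting the feature to listed origins:
     Permissions-Policy: webmcp=(self "https://store.example") -->
```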
06:12:46 Topic: Built-in AI APIs Overview 06:12:49 Subtopic: Prompt API 06:12:53 Slideset: https://docs.google.com/presentation/d/1cpPcxK25UB9Zf6_5USLKetje57jLtc6ZwRwC5ScnCdY/edit?slide=id.g3a175b287ca_3_0#slide=id.g3a175b287ca_3_0 06:13:20 Jingyun has joined #webmachinelearning 06:13:21 [slide 3] 06:13:30 tako has joined #webmachinelearning 06:13:35 Tarek has joined #webmachinelearning 06:14:06 [slide 4] 06:14:09 acomminos has joined #webmachinelearning 06:14:43 [slide 5] 06:15:31 [slide 6] 06:18:34 Em has joined #webmachinelearning 06:19:11 MikeWasserman: the rest of the slides are a more in-depth look at technical topics - we might need to agenda-bash which topics to focus on today 06:19:49 Anssi: reillyg and I have identified a few issues for the Prompt and Writing Assistance APIs 06:20:14 sa-takagi has joined #webmachinelearning 06:22:29 Topic: Prompt API 06:22:35 Subtopic: TAG design review 06:22:38 -> TAG design review: Prompt API https://github.com/w3ctag/design-reviews/issues/1093 06:22:38 https://github.com/w3ctag/design-reviews/issues/1093 -> CLOSED Issue 1093 Prompt API (by domenic) [Progress: in progress] [Venue: WebML CG] [Resolution: lack of consensus] [Topic: Machine Learning] [Focus: API design (pending)] [Focus: Web architecture (pending)] [Focus: Internationalization … (pending)] 06:22:43 Anssi: we requested a TAG design review in May and received a review response in August 06:22:46 ... however, the initial TAG review response was withdrawn due to inaccuracies 06:22:54 reillyg: in terms of the brand new TAG review, there are concerns about the locality of the models - feedback from developers seems to indicate they're good enough 06:23:11 Anssi: we received new review feedback 40 minutes ago 06:23:31 ... with the caveat that the APIs are only provided on devices with sufficient performance characteristics - a question of whether we can provide a fallback for lower-end devices 06:23:57 ... a question on the cost of computing - concerns about sites abusing the user's system resources 06:24:07 ... as the crypto craze demonstrated in the past 06:24:33 ... this may be something that browser implementors should work towards: detecting abuse of computing power 06:24:53 ... a really good question on whether or not we should assume the model executes locally 06:25:32 ... the current API mentions hybrid options, but we should probably clarify we've only done the local option in existing implementations, and review the security and privacy implications of a cloud-based approach 06:25:38 q+ 06:26:21 ...
in a cloud-based approach, there may be concerns about resource consumption (network, AI subscription) and possible additional user profiling (e.g. the level of subscription the user has access to can be a proxy for their means) 06:27:05 ... discussion about downloadprogress with all its complexity - the TAG suggesting we should make it simpler 06:27:23 ... and developers also pushing back on having to make the decision to download or not 06:28:24 ... on model versions and updates - a concern about update frequency (possibly a suggestion to limit those from a browser perspective), but also, more importantly, the issues around interop between sessions (and interop between browsers across different types of models) 06:28:41 ... some positive experience in that regard from early experimentation, but this still needs to be confirmed 06:29:44 ... on input/output languages, risks of fingerprinting, with a possible solution to restrict the languages to those that fit the user context 06:30:19 ... memory management: destroy() being redundant, a good question 06:30:31 ... JSON Schema standardization status 06:31:11 q? 06:31:12 ... on tool use - more examples needed to help the TAG analyse the spec 06:31:17 ... likewise for structured output 06:31:23 sushraja has joined #webmachinelearning 06:31:40 present+ Sushanth_Rajasankar 06:32:36 anssik: this is great feedback 06:32:49 present+ Sushanth_Rajasankar 06:32:54 q+ 06:32:59 ack kbx 06:33:21 kbx: re hybrid model support, are there lessons from WebSpeech? 06:33:47 reillyg: I started reading the TAG review of the WebSpeech API to get a sense of that 06:33:59 ... it looks like it's still an open question how to deal with that aspect 06:34:29 dom: we should make sure to work with WebSpeech on this 06:35:09 Tarek: we're looking at making some of our APIs support cloud-based execution 06:35:25 ... which raises questions about cost and subscription (hence authentication) 06:35:47 ... if it needs authentication, it should be applicable across different features 06:36:28 reillyg: developers really want to be able to enforce on-device execution, e.g. for alignment with their privacy/security policy 06:36:34 q+ 06:36:39 q+ 06:36:52 ... (with a fallback of having themselves determine which cloud service to use) 06:37:34 Tarek: Apple Intelligence has a local vs private cloud compute distinction 06:38:04 q? 06:38:10 reillyg: we would need to validate whether this trusted environment in the cloud would match their needs from a privacy/security perspective 06:38:59 tarek: access to a trusted environment needs a key exchange, which the browser doesn't necessarily have access to 06:40:01 flower.ai 06:40:05 ... https://flower.ai/intelligence/ has a JS library with a hybrid approach 06:40:52 q? 06:40:57 ack kush 06:41:04 ack RobKochman 06:41:04 kush: what's the motivation for the hybrid fallback to happen in the browser vs in a JS lib? 06:41:45 Rob: to make it easier for developers - make it work on all browsers without consideration of their performance status
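[ For reference, the library-level fallback pattern under discussion can already be sketched on top of the explainer's availability() and downloadprogress surfaces; the cloud endpoint below is a hypothetical placeholder. ]
```
// Sketch: feature-detect the built-in model, fall back to a developer-chosen
// cloud service otherwise. availability() and downloadprogress are from the
// Prompt API explainer; the cloud endpoint is hypothetical.
async function promptAnywhere(text) {
  if ("LanguageModel" in self && await LanguageModel.availability() !== "unavailable") {
    const session = await LanguageModel.create({
      monitor(m) {
        m.addEventListener("downloadprogress", e =>
          console.log(`Model download: ${Math.round(e.loaded * 100)}%`));
      }
    });
    return session.prompt(text);
  }
  const res = await fetch("https://llm.example/v1/complete", { // hypothetical endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: text })
  });
  return (await res.json()).text;
}
```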
06:43:04 q- 06:43:05 dom: if developers need to integrate two different APIs, it makes the DX worse for the Prompt API 06:43:30 q- 06:43:38 rob: the explainer says we want to enable hybrid approaches; nothing we're doing precludes it atm 06:43:39 ack sushraja 06:44:03 kbx: some implementors might choose a hybrid approach, but it's not clear the current API allows for it 06:44:37 sushraja: re structured output, we throw an exception if the browser doesn't understand a given JSON Schema 06:44:52 reillyg: we need to be specific about what needs to be supported for interop
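[ For reference, structured output in the current explainer is requested via a JSON Schema response constraint; a sketch, noting that, per the point above, implementations may throw on schemas they can't enforce. ]
```
// Sketch following the Prompt API explainer's structured output section.
const session = await LanguageModel.create();
const schema = {
  type: "object",
  properties: {
    rating: { type: "number", minimum: 1, maximum: 5 },
    summary: { type: "string" }
  },
  required: ["rating", "summary"]
};
// Throws if the implementation doesn't understand part of the schema.
const json = await session.prompt("Rate this review and summarize it: ...", {
  responseConstraint: schema
});
const { rating, summary } = JSON.parse(json);
```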
07:16:05 acomminos has joined #webmachinelearning
07:24:49 acomminos has joined #webmachinelearning
07:30:07 sushraja has joined #webmachinelearning
07:32:30 hagio has joined #webmachinelearning
07:35:25 Tarek has joined #webmachinelearning
07:35:57 acomminos has joined #webmachinelearning
07:38:03 ErikAnderson has joined #webmachinelearning
07:39:01 kbx has joined #webmachinelearning
07:39:21 Subtopic: Tool Use: decouple execution and formalize function calls and responses
07:39:22 kush has joined #webmachinelearning
07:39:26 Anssi: issue #159
07:39:26 https://github.com/webmachinelearning/webmcp/issues/159 -> #159
07:39:48 ... Mike reports the initial Prompt API design integrated JS tool execution within the prompt() call itself
07:39:56 ... this design prioritized API simplicity
07:40:09 ... the trade-off of this design is that it reduces granular control and direct LLM interaction
07:40:21 dom has left #webmachinelearning
07:40:28 ... to address this, Mike suggests realigning initial tool use integrations with Prompt API objectives, in three points:
07:40:43 ... - (1) "Provide essential Language Model tool types: i.e. Function Declarations (FD), Function Calls (FC), and Function Responses (FR)"
07:40:46 Actual 159 issue is here: https://github.com/webmachinelearning/prompt-api/issues/159
07:40:46 https://github.com/webmachinelearning/prompt-api/issues/159 -> Issue 159 Tool Use: decouple execution and formalize function calls and responses (by michaelwasserman) [tools] [Agenda+]
07:40:52 ... - (2) "Offer fine-grained client control over tool execution loops used for agentic integrations."
07:41:05 ... - (3) "Align with patterns established by major LLM APIs (OpenAI, Gemini, Claude)."
07:41:09 ... the motivation for the three:
07:41:18 ... (1) is used by API clients to inspect, reconstruct, and test session history
07:41:26 ... (2) enables clients to define the looping patterns, error handling, limits, etc.
07:41:33 ... (3) empowers clients to use the Prompt API more interchangeably with server-based APIs
07:41:56 ... about the proposal
07:42:00 ... currently the Prompt API uses a closed-loop model
07:42:05 ... where the browser process operates as a hidden agent, looping on (model prediction → tool execution → response observation) until a final text response is ready
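[ For reference, a minimal sketch of the current closed-loop design, following the tool shape sketched in the Prompt API explainer; the weather tool and its endpoint are invented for illustration: ]

```js
const session = await LanguageModel.create({
  tools: [
    {
      name: "getWeather",
      description: "Get the current weather for a city.",
      inputSchema: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
      // In the closed-loop model the browser calls execute() internally
      // and feeds the return value back to the model as the tool result.
      async execute({ city }) {
        const res = await fetch(`https://example.com/weather?q=${encodeURIComponent(city)}`);
        return res.text();
      },
    },
  ],
});

// A single prompt() call may loop (model prediction → tool execution →
// response observation) inside the browser before resolving with text.
const answer = await session.prompt("What's the weather in Kobe?");
```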
07:42:10 ... the proposal is to move to an API-centric, open-loop model for tool execution where:
07:42:18 ... - the prompt() method returns a structured Function Call (FC) object to the client
07:42:22 ... - the Function Call (FC) object manages the execution and Function Response (FR) feedback loop
07:42:28 ... this maximizes developer control and observability
07:42:32 ... Function Declarations (FD) also need not provide execute functions for now
07:42:39 ... there's a comparison table between Closed-Loop (Original) and Open-Loop (Proposed)
07:42:43 ... Open-Loop is better in the Developer Control, Debuggability and Industry Alignment dimensions
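[ For reference, a hypothetical sketch of the proposed open-loop pattern from issue #159; the functionCall, args and respond() shapes are invented here to illustrate the flow and are not a settled API: ]

```js
const session = await LanguageModel.create({
  tools: [
    {
      name: "getWeather",
      description: "Get the current weather for a city.",
      inputSchema: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
      // No execute(): Function Declarations need not provide one.
    },
  ],
});

// The client, not the browser, drives the loop:
// model prediction → tool execution → Function Response → next prediction.
let result = await session.prompt("What's the weather in Kobe?");
while (result.functionCall) {
  const { args, respond } = result.functionCall;
  const res = await fetch(`https://example.com/weather?q=${encodeURIComponent(args.city)}`);
  // respond() appends a structured Function Response to the session and
  // returns the model's next turn; the client owns looping, error
  // handling, and limits.
  result = await respond(await res.text());
}
console.log(result.text); // final text answer
```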
07:42:52 Mike: working with Microsoft on this proposal
07:43:08 ... function calls and responses as first-class APIs
07:43:43 ... our concerns were around encapsulating API clients' need to capture calls to execution functions and turn them into representations of calls that can be replayed later
07:43:57 ... and what constitutes a response, added to the initial prompt for the session
07:43:58 q?
07:44:30 Mike: something else to look at is other LMs - letting web app developers target the Prompt API or cloud-based APIs more easily
07:44:50 ... if they're developing an agent, we want to make the Prompt API aligned with cloud-based solutions
07:44:58 ... some of the intro slides animate this
07:45:32 ... also invite Sushanth and Frank to chime in, we've worked together on this
07:45:51 ... feedback from this group would be helpful
07:46:03 ... we want to structure the API so that it allows a good production API around tool use
07:46:04 q?
07:46:04 MasaoG has joined #webmachinelearning
07:47:11 [ Mike revisits the Built-in AI APIs Overview slides shared earlier. ]
07:48:10 [ Slides shown are the "Tool Use: JavaScript Example" and "Tool Use: Design alternatives" for a weather tool. ]
07:49:01 [ Demo polyfill by Nathan ]
07:51:33 Video link https://drive.google.com/file/d/12o1tmtfhvdF1f0JUMgb8ZGeSe-HTg-WR/view?resourcekey
07:51:57 Ugur has joined #webmachinelearning
07:55:56 q?
07:56:15 kush has joined #webmachinelearning
07:56:22 present+ Khushal_Sagar
07:56:30 q?
07:56:30 RRSAgent, draft minutes
07:56:31 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik
07:57:05 nournabil has joined #webmachinelearning
07:57:11 reillyg: one of the reasons this makes sense is that the tool semantics are provided by the developer, and they get complicated if you ask the developer to re-prompt the tool
07:57:21 q+
07:57:27 q+
07:57:34 ... we want to avoid designing a complicated API; this one is deceptively simple, but the semantics hide a lot of detail
07:57:54 sushraja: each model has a different way to represent the tool call, and we want to avoid model dependency
07:58:05 q+
07:58:38 ... open-loop allows the model to be replaced with a cloud-based one
07:58:52 ... the only negative is it increases implementation complexity
07:58:54 ack kush
07:59:25 kush: interestingly, this is fundamentally different: the responsibility is on the developer to manage everything that goes into the context window
07:59:40 ... once they resolve the tool call, they may want to modify the call
08:00:04 ... there is room to align the syntax for LM tool declaration across both APIs even if the pattern is not aligned
08:00:04 q?
08:00:28 q+
08:00:44 reillyg: there was some discussion that if you provide no tools then the model will not use them
08:01:12 ... because there's a question for each call to prompt(): are you expecting a tool call at that very moment?
08:01:32 ... that affects what string encoding the underlying implementation will do
08:01:50 sushraja: enabling and disabling tool calls between prompts?
08:02:11 q+
08:02:34 reillyg: a tool call needs to be added to expectedOutput
08:02:38 q?
08:02:58 sushraja: the presence of the tools can be used, so developers don't need to provide the list of tools
08:03:07 reillyg: the availability API assumes the same options are available
08:03:08 q?
08:03:14 Ugur has joined #webmachinelearning
08:03:14 ack kbx
08:03:58 kbx: we got feedback from authors that in practice there are dynamic conditions where the open-loop model helps: if something changes mid-journey, they can adapt
08:04:18 ... if everything happens without issues, closed-loop works as well
08:04:42 sushraja: the state changes we need to handle act differently between the two
08:04:43 q?
08:04:46 ack Tarek
08:05:11 Tarek: on our side we had issues depending on the model - some models did not work, bad accuracy - especially true for small models
08:05:40 ... if we used this API on different underlying models on, say, Edge and Chrome, which API is in use so that the tool call works?
08:05:58 ... the tools are going to make the interop issues harder
08:06:24 sushraja: without this API, developers will craft a special system prompt instead
08:06:50 q?
08:07:29 reillyg: the proposed API abstracts out some differences; we are working with model development teams to ensure the models can work with tools
08:07:44 ... this is a possible interop issue for some models
08:08:12 sushraja: do we assume developers are building a workflow that expects the LLM to make the tool call?
08:08:30 q?
08:08:32 ack kush
08:08:51 Ugur has joined #webmachinelearning
08:08:51 kush: I want to understand what the model's context looks like when a tool call is invoked
08:09:43 ... if there's a failure and the developer does not handle the tool calls, do the tools get ignored in the next prompt?
08:10:15 reillyg: in the two versions of the polyfills: in closed-loop, failures surface on the input and output streams and the browser catches the errors
08:10:49 ... in open-loop you get a chunk that, instead of text, is the error signal
08:11:15 q?
08:11:36 kush: ImageBufferSource as input to a tool call?
08:12:04 Tarek: you can use base64 images
08:12:28 sushraja: if the model wants to do multiple tool calls, then you need two responses and a prompt
08:12:44 reillyg: in this open-loop model you can respond in a way the model is not trained for
08:13:08 ... the model is trained a certain way, so it might not do the right thing
08:13:17 ... do we track whether the model expects a tool call?
08:13:30 q+
08:14:25 reillyg: in the future models might support async tool calling
08:14:28 ack tomayac
08:14:43 tomayac: footgun-wise, the open-loop syntax makes me nervous
08:15:11 ... it involves recursion and streams; JS developers are not so familiar with these
08:16:17 ... the non-ergonomics of open-loop models is challenging for developers
08:16:44 q?
08:17:06 q?
08:18:24 reillyg: a chunk type that is a tool call is only a property of the open-loop design
08:19:27 ... there are different flavors of tool calling; in the closed-loop model you don't get any chunks
08:19:53 sushraja: if you have a closed-loop model, the implementation will silently eat tokens
08:20:23 ... to be able to restore the session, you need to see all the tokens
08:20:43 q?
08:21:00 ack kbx
08:22:11 q+
08:22:47 q-
08:23:12 Ugur has joined #webmachinelearning
08:23:26 sushraja: we also need to think of the security implications
08:23:39 RESOLUTION: Further solicit feedback on both closed-loop and open-loop models to understand developer ergonomics issues.
08:23:51 Topic: Writing Assistance APIs
08:23:56 gb, this is webmachinelearning/writing-assistance-apis
08:23:56 anssik, OK.
08:24:09 Subtopic: User Activation requirements for Session Creation cause undue friction
08:24:14 Anssi: issue #83 and PR #86 (merged)
08:24:14 https://github.com/webmachinelearning/writing-assistance-apis/issues/86 -> #86
08:24:14 https://github.com/webmachinelearning/writing-assistance-apis/issues/83 -> #83
08:24:20 ... an issue from Isaac
08:24:31 ... Built-in AI APIs currently consume transient user activation when their availability is "downloadable"
08:24:44 ... this adds friction when a site wants to create a Summarizer once the content to be summarized comes into view
08:25:03 ... now the site must create the session earlier (think warm-up), consume the transient activation, then get the user to act on the page again to regain transient activation
08:25:19 ... transient activation indicates a user has recently pressed a button or performed some other user interaction, and when a transient activation is consumed, it is "deactivated"
08:25:29 ... sticky activation by contrast persists until the end of the session
08:25:38 ... the proposal is to relax the user activation requirements to require sticky activation, rather than consuming transient activation
08:25:48 Anssi: Reilly asks whether we could relax this to only require sticky activation for the downloadable state
08:25:53 ... Isaac agrees
08:25:58 ... this suggests a similar change to Language Detector
08:26:02 ... the PR was reviewed and merged, any comments from the group?
08:26:09 q+
08:26:35 reillyg: the Translator API is excluded because this behaviour is tied up with the anti-fingerprinting mechanism regarding which model is downloaded
08:26:43 ... consider two models for download controls
08:26:53 ... one model for everything and one for developer-specified
08:27:02 ... the proposal is for APIs where there is only one model
08:27:04 q?
08:27:19 ack sushraja
08:27:40 sushraja: I'd like to see more transparency on which developer scenarios really need to use the Prompt API in the background; this competes with the TAG feedback
08:27:44 q+
08:28:00 ... do you get developer feedback that this is preferred?
08:28:14 ... background tabs playing audio?
08:28:17 ack kbx
08:28:34 kbx: developers are competing to capture the gesture; some get it, some don't
08:28:41 q?
08:28:47 q?
08:29:14 Erik: audio players have restrictions: if you open a new tab but don't focus it, they can't play
08:29:31 ... if we open 10 new tabs and all call the Prompt API at the same time, do you need to wait until the tab is visible?
08:29:33 q?
08:29:51 q?
08:30:05 q?
08:30:40 reillyg: this change only impacts the behaviour when the model is downloadable, not when the model has already been downloaded
08:31:02 Markus: does the user know how much needs to be downloaded?
08:31:23 ... what if I accidentally download a lot of data while roaming and the data costs a lot?
08:31:42 reillyg: I can imagine a more complex download policy, e.g. not downloading if you're on a metered network
08:31:45 q?
08:32:10 ... we refuse to do anything if you're not in a situation where you can download
08:32:21 q?
08:32:51 reillyg: if folks have concerns please chime in on the issue
08:32:53 q?
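[ For reference, a minimal sketch of the pattern the merged change enables: creating a Summarizer lazily when content scrolls into view, relying on sticky rather than transient activation. Summarizer.availability() and Summarizer.create() are from the Writing Assistance APIs; the IntersectionObserver wiring and #article selector are illustrative: ]

```js
let summarizerPromise = null;

const observer = new IntersectionObserver(async (entries) => {
  if (!entries.some((entry) => entry.isIntersecting) || summarizerPromise) {
    return;
  }
  const availability = await Summarizer.availability();
  if (availability === "unavailable") return;
  // With the merged change, the "downloadable" state only requires sticky
  // activation: any prior interaction with the page suffices, so creation
  // no longer has to happen inside a click handler.
  summarizerPromise = Summarizer.create();
});

observer.observe(document.querySelector("#article"));
```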
08:33:04 q+
08:33:21 Subtopic: Discuss the privacy implications of using a paid cloud option
08:33:24 Anssi: issue #84
08:33:25 https://github.com/webmachinelearning/writing-assistance-apis/issues/84 -> Issue 84 Discuss the privacy implications of using a paid cloud option (by jyasskin) [Agenda+]
08:33:27 Ugur has joined #webmachinelearning
08:33:27 ... this issue was filed on the writing-assistance-apis repo, but it refers to the Prompt API explainer, which makes a general statement that I believe applies across all built-in AI APIs:
08:33:30 "Allowing hybrid approaches, e.g. free users of a website use on-device AI whereas paid users use a more powerful API-based model."
08:33:34 Anssi: Jeffrey points out there's a risk that this API could reveal to a website whether its user is wealthy (cloud available) or not (local-only)
08:33:37 ... Tom suggests this may not be as strong a signal of wealth as e.g. device manufacturer information that is already disclosed
08:33:41 ... any comments from the group?
08:34:38 reillyg: we haven't discussed the cloud option too much
08:34:50 Markus: if someone uses the cloud, do they care about privacy?
08:35:06 ... what if someone sends a request to your model and it returns "I'm a cloud version"?
08:35:29 reillyg: the TAG noted that even if we don't provide information on client vs. cloud, it can be inferred from the response received
08:35:30 q?
08:35:55 RRSAgent, draft minutes
08:35:57 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik
08:36:31 q+
08:36:39 reillyg: if we as a browser vendor want to ensure that users of any means have access to these models as a service, that is an incentive to find a solution
08:36:51 ... the alternative is to delegate it all to the developer
08:37:08 sushraja: WebGL/WebGPU do not have this consideration?
08:37:28 kbx: some implementers would do this differently, maybe on a server if it is free
08:37:50 q?
08:37:56 ack kbx
08:38:16 Tarek: in the Chrome implementation, do you fall back to the model already available on the system?
08:38:26 reillyg: Chrome on Android uses the model that ships with Android
08:38:42 sushraja: similarly on Windows, we have a plan to contribute the OS-provided model
08:39:05 Tarek: for the cloud option, what if we can use Bring Your Own Model?
08:39:10 q?
08:39:13 ack Tarek
08:39:48 Erik: a can of worms; punting may make it harder to respond later
08:39:59 reillyg: that's going in the opposite direction to the Web Speech API
08:40:13 q?
08:40:27 q?
08:40:43 Topic: Wrap up
08:41:16 Anssi: thank you for your active participation and exciting discussions on our incubations, the agentic web and built-in AI capabilities!
08:41:22 ... similarly to yesterday, interested folks are welcome to join us for dinner
08:41:26 ... the plan would be to again meet in the Portopia Hotel (adjacent to the Kobe International Conference Center) lobby at 18:15 to coordinate on transport and restaurants; more restaurants should be open today than yesterday, so more options!
08:41:29 ... restaurant options:
08:41:37 -> https://www.w3.org/wiki/TPAC/2025/Restaurants
08:41:37 -> https://mgifford.github.io/Food-W3C-Kobe/
08:41:44 RRSAgent, draft minutes
08:41:45 I have made the request to generate https://www.w3.org/2025/11/11-webmachinelearning-minutes.html anssik
08:41:48 hagio has left #webmachinelearning
10:30:50 lei_zhao has joined #webmachinelearning
12:02:08 sa-takagi has joined #webmachinelearning
13:36:36 sa-takagi has joined #webmachinelearning