17:00:18 RRSAgent has joined #smartagents-main
17:00:22 logging to https://www.w3.org/2026/02/25-smartagents-main-irc
17:01:02 SarahWood has joined #smartagents-main
17:02:31 present+ SarahWood, EmmetCoin, GerardChollet, Zohar, Molly, RJBurnham, UlrikeStiefelhagen
17:03:28 present+ DeborahDahl, plh, PrabhuSingh, FrankieJames, JimLarson, LisaMichaud, DirkSchnelle-Walka, KazuyukiAshimaru
17:03:31 https://www.w3.org/policies/code-of-conduct/
17:03:50 bkardell has joined #smartagents-main
17:04:04 present+ MiNov, GinaSmith, BhikshaRamakrishnan
17:04:09 present+
17:04:29 present+ BenjaminWeiss
17:05:56 thelounge has joined #smartagents-main
17:05:57 present+Sarah Wood
17:06:07 zakim, who is here?
17:06:07 Present: SarahWood, EmmetCoin, GerardChollet, Zohar, Molly, RJBurnham, UlrikeStiefelhagen, DeborahDahl, plh, PrabhuSingh, FrankieJames, JimLarson, LisaMichaud, DirkSchnelle-Walka,
17:06:10 ... KazuyukiAshimaru, MiNov, GinaSmith, BhikshaRamakrishnan, bkardell, BenjaminWeiss, Wood
17:06:10 On IRC I see thelounge, bkardell, SarahWood, RRSAgent, Zakim, kaz, plh, dirk
17:06:23 Frankie has joined #smartagents-main
17:06:24 LisaNMichaud has joined #smartagents-main
17:08:20 s/Ashimaru/Ashimura/
17:08:31 agenda: https://www.w3.org/2025/10/smartagents-workshop/agenda.html
17:09:52 topic: 3. Solving Lead vs. Lead: Consistent Pronunciation for Web Content - Sarah Wood
17:12:03 Re-sent instructions via email to Patricia Lee in case she did not receive them
17:14:54 should we submit questions here?
17:15:15 sarah: (describes the importance of a standardized way to specify pronunciation in Web content for assistive purposes)
17:15:59 Frankie, raise your hand on zoom
17:16:07 bkardell has joined #smartagents-main
17:16:18 I'm not finding the button for that - sorry, more of a Google Meet user
17:16:38 okay got it
17:16:52 br: cost for that purpose?
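The "lead vs. lead" problem in the talk title is typically handled by attaching an explicit phoneme string to the ambiguous word, for example with SSML's <phoneme> element. A minimal sketch, assuming a toy sense-keyed dictionary (the sense labels and helper function are illustrative, not from the talk):

```python
# Minimal sketch: disambiguating "lead" with SSML <phoneme> markup.
# The IPA strings are standard; the sense keys and helper name are
# illustrative assumptions, not part of any talk or spec text here.

PRONUNCIATIONS = {
    ("lead", "metal"): "lɛd",   # the metal, rhymes with "bed"
    ("lead", "guide"): "liːd",  # to guide, rhymes with "bead"
}

def ssml_phoneme(word: str, sense: str) -> str:
    """Wrap a word in an SSML <phoneme> element with an IPA pronunciation."""
    ipa = PRONUNCIATIONS[(word, sense)]
    return f'<phoneme alphabet="ipa" ph="{ipa}">{word}</phoneme>'

print(ssml_phoneme("lead", "metal"))
# <phoneme alphabet="ipa" ph="lɛd">lead</phoneme>
```

The author (or an application-level dictionary, as discussed later in the session) supplies the sense; the TTS engine only sees the resolved IPA string.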
17:17:44 s/Zohar/ZoharGan/
17:17:54 present+ LisaMichaud
17:18:09 sw: can provide examples by email
17:19:10 fj: question about localization
17:19:22 ... e.g., navigation systems in automobiles
17:19:29 ... how to handle that?
17:19:48 ... several possible pronunciations
17:20:05 sw: could see some algorithm
17:20:13 ... based on geographical areas
17:20:41 ... local pronunciation like this
17:21:21 present+ KimPatch, FaresAbawi
17:21:47 jl: simple solution using dictionaries
17:22:05 ... for applicaitons
17:22:15 s/applicaitons/applications/
17:22:45 ... each application can resolve the pronunciation
17:22:57 sw: sounds like a reasonable solution
17:23:17 jl: don't think one single solution would fit all the possible cases
17:23:37 sw: need a mechanism for author control
17:24:30 rj: long-term problem there
17:24:42 ... how to represent it using phonemes
17:25:05 ... how modern speech synthesis engines handle it
17:25:29 ... this is the real major problem
17:25:30 zohar has joined #smartagents-main
17:25:36 ... the problem is specifications to represent it
17:25:40 ddahl has joined #smartagents-main
17:26:17 ... my point is not so much specifications, but we need to find a way to specify that
17:26:36 ... to train models
17:26:44 sw: completely agree
17:27:20 ... if you have any suggestions, happy to hear them
17:27:31 present+MattShomphe
17:28:08 dd: question from @@@
17:28:20 sw: not sure about ideal answers
17:28:24 Prabhu: What about derived languages like Hinglish? e.g., suffer / safar
17:29:12 kaz: we used to have several workshops on i18n for SSML
17:29:54 ... i.e., SSML for various languages. my impression is that we should think about culture, location, background, and context for proper pronunciation.
17:30:38 Brian (on the chat): I will just note that the w3c web audio group has been taking up Web Speech again and it's currently only dealing with STT, which it is achieving via a biasing dictionary.
It will get around to TTS again, and that would be a good place to align some of these comments and feedback, to make sure that, if possible, solutions in one area work well in another.
17:30:59 topic: Hallucination in Automatic Speech Recognition Systems - Bhiksha Raj
17:32:06 br: (describes problems around hallucination in ASR systems)
17:34:00 bkardell has joined #smartagents-main
17:34:48 ... (then summarizes the history of ASR technologies)
17:37:39 Ulrike has joined #smartagents-main
17:43:32 ... (then lists current work on mitigations for hallucinations in ASR)
17:44:34 gc: related to noise in the speech?
17:45:14 br: rather related to mis-learned patterns
17:45:31 ... could show details later
17:45:46 ... specific patterns in the training data
17:46:05 ... there are "noises" for training. that's right
17:46:57 kaz: from a tech/research viewpoint, it's interesting. but what is expected from the standard? do we need a standardized way to identify those kinds of errors?
17:48:15 br: yes, we need some benchmarking. if I give the same output to 10 individuals, maybe 8 will say it's hallucinated. we need to formalize that.
17:49:26 ec: this is great. when using an llm, we could limit its scope, i.e., giving it a model. prompts augment the llm; a prompt grammar can help.
17:49:39 br: those tend to become very large
17:50:00 ... having to feed that to an llm would be beyond the capacity of modern llms
17:50:13 ec: giving a sketch/output might be enough
17:50:20 ... would limit mistakes
17:51:02 Matt: whisper has multilingual capabilities, did you find issues with multilingual?
17:51:14 br: yes
17:52:24 dirk: can we detect it in realtime?
17:53:02 br: can take a second to process. realtime tends to bring issues because you're dealing with chunks.
17:53:13 ... and chunks might not have complete words or phrases.
17:53:56 fares: on hallucination errors, you used another language and you assessed the output.
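The benchmarking idea above ("if I give the same output to 10 individuals, maybe 8 will say it's hallucinated") amounts to majority-vote labeling plus an inter-annotator agreement measure. A minimal sketch, with illustrative label names (not from the talk):

```python
from collections import Counter
from itertools import combinations

# Sketch of the benchmarking idea: several annotators label each ASR output,
# the majority vote becomes the reference label, and pairwise agreement
# indicates how well-defined that label is. Label strings are illustrative.

def majority_label(labels: list[str]) -> str:
    """Most common annotator label for one ASR output."""
    return Counter(labels).most_common(1)[0][0]

def pairwise_agreement(labels: list[str]) -> float:
    """Fraction of annotator pairs that assigned the same label."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

votes = ["hallucinated"] * 8 + ["faithful"] * 2  # the "8 of 10" example
print(majority_label(votes))  # hallucinated
print(pairwise_agreement(votes))
```

Real benchmarks would use a chance-corrected statistic such as Fleiss' kappa, but the raw agreement rate is enough to show the formalization step.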
17:54:15 br: we tested again human annotated data
17:54:22 s/again/against/
17:55:03 br: in the 3-way classification, 2 humans agree most of the time.
17:55:15 fares: ok, makes sense, like a hello score.
17:56:07 topic: Multi-Agent Conversational Methodology
17:56:51 This was interesting to me as I've had a lot of experience lately with TTS hallucination, and here we have ASR hallucination. When you think about it - an E2E system has so many opportunities for things to take an interesting direction in a single conversational turn!
18:00:06 s/Methodology/Methodology - Emmett Coin/
18:00:11 rrsagent, make log public
18:00:19 rrsagent, draft minutes
18:00:20 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
18:05:06 ec: (describes mechanisms around a common protocol for multi-agent conversational systems)
18:06:02 LisaNMichaud has joined #smartagents-main
18:10:16 ... (then shows a demo with multiple agents and a participant)
18:13:59 JimLarson: How does Open Floor Protocol relate to google's A2A protocol? Are they complementary or competitive?
18:14:17 ec: they are different.
18:15:17 ... this is a live, interactive, communal, time-saving approach.
18:15:47 jl: if I use the google system, does it prohibit me from using the other system?
18:16:03 ec: if it's an agent, you can connect it to anything you want.
18:16:27 YashGelana: The 'floor' can essentially live at the operating system level, where each OS is "infused" with many intelligent agents (each specializing in certain actions and equipped with the required tools) and the user is interacting with agents (either one at a time, or commanding multiple agents at a time if a task requires multi-agent co-ordination)
18:16:41 present+ PaoloDiMaio
18:16:51 present+ BevCorwin
18:17:06 ec: the floor is also a server that lives on the internet
18:17:12 ... you can invite agents
18:17:31 ... it would be like a hosting service
18:18:02 ...
if it's in the OS, it becomes similar to the google thing. this is open, like web pages.
18:18:48 kaz (in the chat): (can wait) very exciting ideas/approach. wondering which part should be standardized for Web technology: the protocol, the api, or maybe the dialog management model?
18:19:51 mn: security/privacy of the data, between me and the agent, or between agents
18:20:00 ec: using https as a start
18:20:15 ... there is a way in the protocol to send a private msg
18:20:35 ... we have the idea of obfuscating things.
18:20:52 ... so that the agents can still understand some of the context
18:21:22 rj: have you looked at the work within the ietf, vtime
18:21:25 ec: nope
18:21:39 rj: vtime is meant to capture the entire context of multiple conversations
18:21:58 ... they solve different problems, but it may be worth looking at
18:22:13 ... privacy, redaction, verification of data, compliance, etc.
18:22:35 ec: interesting indeed. summarizing the context
18:22:44 s/vtime/vcon/g
18:23:30 topic: Reimagining Standards for Voice AI: Interoperability Without Sacrificing Innovation - RJ Burnham
18:24:23 zakim, who's here?
18:24:23 Present: SarahWood, EmmetCoin, GerardChollet, Zohar, Molly, RJBurnham, UlrikeStiefelhagen, DeborahDahl, plh, PrabhuSingh, FrankieJames, JimLarson, LisaMichaud, DirkSchnelle-Walka,
18:24:27 ...
KazuyukiAshimaru, MiNov, GinaSmith, BhikshaRamakrishnan, bkardell, BenjaminWeiss, Wood, KimPatch, FaresAbawi, MattShomphe, PaoloDiMaio, BevCorwin
18:24:27 On IRC I see LisaNMichaud, Ulrike, bkardell, ddahl, zohar, Frankie, thelounge, RRSAgent, Zakim, kaz, plh, dirk
18:24:49 present+ YashGhelani
18:25:05 present+ MollyLewis
18:26:15 i|meant to|https://datatracker.ietf.org/group/vcon/about/ Virtualized Conversations (vcon)|
18:26:30 rrsagent, draft minutes
18:26:31 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
18:30:05 rj: (describes the recent paradigm shift from the directed-dialog world to LLM-driven agents)
18:32:57 ... (then summarizes current practices like MCP, A2A, and AGENTS.md, but they're proprietary technologies)
18:33:56 ... (missing the flow layer, which covers a portable interchange representation for voice agents)
18:36:06 ... (what standards could help with that?)
18:36:38 ... (shows examples)
18:37:19 ... (need to evangelize the industry)
18:39:14 dd: question around knowledge graphs
18:39:28 gc: some experience with dialog systems
18:39:59 dd: questions from Yash on the chat
18:40:17 ... I find that thinking of LLM applications as POMDPs is a better mental model when designing conversational agents (voice or otherwise)
18:40:29 A mental model I like to adopt: Tell voice agents what needs to be achieved, not how to go about achieving it
18:40:53 s/A m/... A m/
18:41:33 rj: what the cost would be
18:41:47 ... humans do that all the time
18:42:09 ... we have to handle that carefully
18:42:26 ...
should be prescriptive
18:42:49 dd: 10-min break now
18:43:06 [10 minutes break]
18:43:13 rrsagent, draft minutes
18:43:14 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
18:45:24 topic: Governance and Greenlights: Leveraging the '3 Ps' to Standardize Trust, Scale, and Usability in Voice Agent Web Integration - Patricia Lee
18:45:28 (recording)
18:45:52 s/[10 minutes break]//
18:46:34 s/dd: 10-min break now//
18:51:55 Laura has joined #smartagents-main
18:53:48 dd: unfortunately, we don't have the speaker here, but we can take questions in the chat
18:54:06 ... then a 10-minute break
18:54:07 [10 minutes break]
18:54:13 rrsagent, draft minutes
18:54:14 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
19:13:09 topic: Breakout room assignment
19:13:59 Solving Lead vs. Lead: Consistent Pronunciation for Web Content: Zoom Room 2
19:14:20 Hallucination in Automatic Speech Recognition Systems: Zoom Room 3
19:14:45 Hallucination in Automatic Speech Recognition Systems: Zoom Room 3
19:15:42 s/Hallucination in Automatic Speech Recognition Systems: Zoom Room 3//
19:15:52 Multi-Agent Conversational Methodology: Zoom Room 4
19:16:08 Reimagining Standards for Voice AI: Interoperability Without Sacrificing Innovation: Zoom Room 5
19:16:31 matt_shomphe has joined #smartagents-main
20:00:16 zohar has joined #smartagents-main
20:00:58 topic: Breakout discussions
20:01:07 topic: Plenary discussion again
20:02:41 dd: we can give 15 mins for each breakout group to report
20:03:47 subtopic: Solving Lead vs. Lead: Consistent Pronunciation for Web Content - Sarah Wood
20:04:14 sw: didn't generate an official slide deck but had a discussion
20:04:49 ... we're not solving the same problem for all the languages
20:05:29 fj: some workarounds like a pronunciation dictionary
20:05:47 sw: issues with misspelling
20:06:21 ec: demonstration of a paragraph
20:06:49 ...
you can read it if some of the characters are exchanged with each other
20:07:49 sw: W3C had a pronunciation tf before
20:08:06 ... we don't have a clear view at the moment
20:08:16 dd: it was a good summary by the tf
20:08:21 ... what to do next
20:08:31 ... maybe there should be at least a CG
20:08:55 sw: AI has changed across devices
20:09:12 ec: notation for pronunciation
20:09:27 ... TTS used to just "read" the text
20:09:51 ... wondering if we could think about a simple solution
20:10:27 sw: AI is pretty good at handling phonetics, so that's one possible solution
20:10:54 dirk: a forum within W3C to continue the discussion?
20:11:06 bk: there is a Web Speech CG
20:11:27 ... completely aligned with universal SSML support
20:11:39 --> https://webaudio.github.io/web-speech-api/ Web Speech API
20:11:44 ... also support from multiple parties: MS, Google, Igalia
20:12:17 ... will take up TTS once it gets chartered, probably next year
20:12:40 ... different areas of support, like HTML, ARIA, etc.
20:12:50 dd: how long do we have?
20:12:59 dirk: 10 more minutes?
20:13:11 dd: wondering how big the pronunciation issues are
20:13:33 ... a lot of languages have hints
20:13:52 ... so wondering whether different languages have different issues
20:14:13 sw: also issues with proper names and abbreviations
20:14:26 dd: right
20:14:39 ... and Spanish has a lot of dialects
20:15:19 sw: local languages even within dialects
20:15:35 jl: how words are pronounced in France, etc.
20:15:59 ... the lexicon should have information about that
20:16:29 sw: right
20:17:08 jl: different pronunciations by different people
20:17:17 sw: right
20:17:29 ...
teachers restrict the variations
20:17:40 jl: customization setting
20:18:17 samantha_estoesta has joined #smartagents-main
20:18:30 kim has joined #smartagents-main
20:18:37 kaz: so we need to think about who from which area is speaking, as well as the language itself
20:19:07 subtopic: Reimagining Standards for Voice AI: Interoperability Without Sacrificing Innovation - RJ Burnham
20:19:32 rj: how people support pronunciation and SSML
20:19:51 ... 80% of the interfaces of AI agents are the same as each other
20:20:11 ... there is no architectural mismatch
20:20:27 ... but we would remove the remaining pain
20:20:42 ... the bigger question is the value
20:20:52 ... and the complexity of dialogues
20:21:21 ... (reflects on the earlier issues from the VoiceXML era)
20:21:36 ... what is the input and output
20:21:52 ... a lot of context there
20:22:07 ... this is one of the pain points
20:22:18 ... there is no consensus yet
20:22:35 ... probably more exploratory work is needed
20:23:03 ... it would be challenging to manage the complexity
20:23:22 rrsagent, draft minutes
20:23:23 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
20:23:42 sw: exactly, we want different browsers
20:23:47 ... any suggestions?
20:23:55 rj: good question
20:24:11 ... this type of problem is hard to solve in a standardized manner
20:25:12 ... engineers from key stakeholders need to discuss with each other in a common way
20:25:30 ... very hard to get consensus
20:25:43 ... need a champion
20:25:59 ec: agree we need the next step after VoiceXML
20:26:18 ... how to define the situation for generalization?
20:26:27 rj: very much the case
20:26:45 ec: what is the simple way to specify this?
20:26:54 dd: there is the mechanism of a CG
20:27:00 ... also breakout sessions during TPAC
20:27:37 plh: yeah
20:28:07 ... fyi, there is a breakout session in March too
20:28:15 ...
don't have to wait until TPAC
20:28:40 https://github.com/w3c/breakouts-day-2026/issues
20:29:25 s/a breakout session/the breakouts day/
20:29:49 subtopic: Multi-Agent Conversational Methodology - Emmett Coin
20:30:11 -> https://www.w3.org/2026/02/25-smartagents-4-minutes.html breakout minutes
20:30:20 ec: nice discussion about cultural differences
20:30:30 ... automatically differentiating speakers
20:30:47 ... standardizing languages like the ESL level
20:31:08 ... wide range of age, culture, etc.
20:31:21 ... also ideas about an interaction layer for behavior
20:31:55 se: conversation patterns
20:32:13 Zakim has left #smartagents-main
20:32:56 ... would get expertise on legality, etc., from agents
20:33:04 ec: excellent
20:33:22 ... in a simple way, only speak to one person
20:33:44 ... but someone talks with one agent, and another agent can join the conversation
20:34:06 ... a simple, rule-based approach is possible
20:34:10 ... simple interaction
20:34:38 ... we also talked about a layer of mentality
20:34:59 ... for vaious generations
20:35:04 s/vaious/various/
20:35:27 ... we could add those points to make the conversation smoother
20:35:56 dd: very different kinds of agents?
20:36:19 ec: do you want every agent to be differentiated?
20:38:07 kaz: for that purpose, we might want to think about some dialogue management model
20:38:43 ... also we could explicitly characterize each agent one by one, e.g., a funny agent and a diligent agent
20:38:50 ec: agree
20:39:03 ... maybe we could have some manifest and persona for that purpose
20:39:27 ... in some cases, we need a serious guy without any jokes
20:39:53 dd: we used to have a persona designers
20:40:07 s/ a//
20:40:29 ec: advertise it and other people can see that
20:40:45 ... could have a prototype design in some way
20:41:07 present+ Ulrike_Stiefelhagen
20:41:38 us: would rather see one for some specific viewpoint
20:42:05 present+ SamanthaEstoesta
20:42:46 ...
but can't see the usefulness of having 5 different personalities one by one
20:43:07 ec: there could be a personality for each agent
20:44:11 fj: probably good to have one from a psychological viewpoint
20:44:48 ec: different services could be provided by different agents one by one: train, hotel, etc.
20:45:04 us: it's about "trust", I think
20:45:12 ... the aspect of the task
20:45:21 ... if you go back to the user
20:45:36 ... I trust one from some specific service
20:45:50 ec: could think about various possibilities
20:46:10 dd: one use case that could be considered is
20:46:38 ... would it be strange to ask one specific agent to handle different tasks?
20:48:13 ec: from an implementation viewpoint, it's much easier to let one agent handle one occasion
20:49:19 topic: Tomorrow
20:49:50 dd: will have the next session tomorrow
20:50:22 ... 10 mins for each presentation
20:50:52 ec: the content in the Zoom chat is useful
20:51:06 dd: it could be recorded, I think
20:51:13 [Session 1 adjourned]
20:51:17 rrsagent, draft minutes
20:51:18 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
20:51:47 i|will have|-> https://www.w3.org/2025/10/smartagents-workshop/agenda.html#session2 Session 2|
20:51:49 rrsagent, draft minutes
20:51:50 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
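The "manifest and persona" idea floated in the multi-agent breakout could be as simple as a small declarative descriptor that a floor server reads before admitting an agent. A hedged sketch; every field name and value below is a hypothetical illustration, not part of the Open Floor Protocol or any spec:

```python
# Hypothetical agent manifest with a persona block, as floated in the
# breakout ("maybe we could have some manifest and persona").
# All field names and values are illustrative assumptions.

manifest = {
    "name": "hotel-booking-agent",
    "capabilities": ["search_hotels", "book_room"],
    "persona": {
        "tone": "serious",    # e.g. "no jokes" for a trust-sensitive task
        "formality": "high",
    },
}

def persona_tone(manifest: dict) -> str:
    """Read the advertised tone so a floor/host can pick a matching agent."""
    return manifest.get("persona", {}).get("tone", "neutral")

print(persona_tone(manifest))  # serious
```

Advertising the persona in the manifest matches the discussion: other participants can see it, and a user (or the floor) can route a task to the agent whose advertised personality they trust for that service.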