17:00:18 RRSAgent has joined #smartagents-main
17:00:22 logging to https://www.w3.org/2026/02/25-smartagents-main-irc
17:01:02 SarahWood has joined #smartagents-main
17:02:31 present+ SarahWood, EmmetCoin, GerardChollet, Zohar, Molly, RJBurnham, UlrikeStiefelhagen
17:03:28 present+ DeborahDahl, plh, PrabhuSingh, FrankieJames, JimLarson, LisaMichaud, DirkSchnelle-Walka, KazuyukiAshimaru
17:03:31 https://www.w3.org/policies/code-of-conduct/
17:03:50 bkardell has joined #smartagents-main
17:04:04 present+ MiNov, GinaSmith, BhikshaRamakrishnan
17:04:09 present+
17:04:29 present+ BenjaminWeiss
17:05:56 thelounge has joined #smartagents-main
17:05:57 present+Sarah Wood
17:06:07 zakim, who is here?
17:06:07 Present: SarahWood, EmmetCoin, GerardChollet, Zohar, Molly, RJBurnham, UlrikeStiefelhagen, DeborahDahl, plh, PrabhuSingh, FrankieJames, JimLarson, LisaMichaud, DirkSchnelle-Walka,
17:06:10 ... KazuyukiAshimaru, MiNov, GinaSmith, BhikshaRamakrishnan, bkardell, BenjaminWeiss, Wood
17:06:10 On IRC I see thelounge, bkardell, SarahWood, RRSAgent, Zakim, kaz, plh, dirk
17:06:23 Frankie has joined #smartagents-main
17:06:24 LisaNMichaud has joined #smartagents-main
17:08:20 s/Ashimaru/Ashimura/
17:08:31 agenda: https://www.w3.org/2025/10/smartagents-workshop/agenda.html
17:09:52 topic: 3. Solving Lead vs. Lead: Consistent Pronunciation for Web Content - Sarah Wood
17:12:03 Re-sent instructions via email to Patricia Lee in case she did not receive them
17:14:54 should we submit questions here?
17:15:15 sarah: (describes the importance of a standardized way to specify pronunciation in Web content for assistive purposes)
17:15:59 Frankie, raise your hand on zoom
17:16:07 bkardell has joined #smartagents-main
17:16:18 I'm not finding the button for that - sorry, more of a Google Meet user
17:16:38 okay got it
17:16:52 br: cost for that purpose?
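The "lead vs. lead" problem in the talk title is typically handled by attaching an explicit phoneme string to the ambiguous word, for example with SSML's <phoneme> element. A minimal sketch, assuming a toy sense-keyed dictionary (the sense labels and helper function are illustrative, not from the talk):

```python
# Minimal sketch: disambiguating "lead" with SSML <phoneme> markup.
# The IPA strings are standard; the sense keys and helper name are
# illustrative assumptions, not part of any talk or spec text here.

PRONUNCIATIONS = {
    ("lead", "metal"): "lɛd",   # the metal, rhymes with "bed"
    ("lead", "guide"): "liːd",  # to guide, rhymes with "bead"
}

def ssml_phoneme(word: str, sense: str) -> str:
    """Wrap a word in an SSML <phoneme> element with an IPA pronunciation."""
    ipa = PRONUNCIATIONS[(word, sense)]
    return f'<phoneme alphabet="ipa" ph="{ipa}">{word}</phoneme>'

print(ssml_phoneme("lead", "metal"))
# <phoneme alphabet="ipa" ph="lɛd">lead</phoneme>
```

The author (or an application-level dictionary, as discussed later in the session) supplies the sense; the TTS engine only sees the resolved IPA string.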
17:17:44 s/Zohar/ZoharGan/
17:17:54 present+ LisaMichaud
17:18:09 sw: can provide examples by email
17:19:10 fj: question about localization
17:19:22 ... e.g., navigation systems in automobiles
17:19:29 ... how to handle that?
17:19:48 ... several possible pronunciations
17:20:05 sw: could see some algorithm
17:20:13 ... based on geographical areas
17:20:41 ... local pronunciation like this
17:21:21 present+ KimPatch, FaresAbawi
17:21:47 jl: simple solution using dictionaries
17:22:05 ... for applicaitons
17:22:15 s/applicaitons/applications/
17:22:45 ... each application can resolve the pronunciation
17:22:57 sw: sounds like a reasonable solution
17:23:17 jl: don't think one single solution would fit all the possible cases
17:23:37 sw: need a mechanism for author control
17:24:30 rj: long-term problem there
17:24:42 ... how to represent it using phonemes
17:25:05 ... how modern speech synthesis engines handle it
17:25:29 ... this is the real major problem
17:25:30 zohar has joined #smartagents-main
17:25:36 ... the problem is specifications to represent it
17:25:40 ddahl has joined #smartagents-main
17:26:17 ... my point is not so much specifications, but we need to find a way to specify that
17:26:36 ... to train models
17:26:44 sw: completely agree
17:27:20 ... if you have any suggestions, happy to hear them
17:27:31 present+MattShomphe
17:28:08 dd: question from @@@
17:28:20 sw: not sure about ideal answers
17:28:24 Prabhu: What about derived languages like Hinglish? e.g., suffer / safar
17:29:12 kaz: we used to have several workshops on i18n for SSML
17:29:54 ... i.e., SSML for various languages. my impression is that we should think about culture, location, background, and context for proper pronunciation.
17:30:38 Brian (on the chat): I will just note that the w3c web audio group has been taking up Web Speech again and it's currently only dealing with STT, which it is achieving via a biasing dictionary.
It will get around to TTS again, and that would be a good place to align some of these comments and feedback, to make sure that, if possible, solutions in one area work well in another.
17:30:59 topic: Hallucination in Automatic Speech Recognition Systems - Bhiksha Raj
17:32:06 br: (describes problems around hallucination in ASR systems)
17:34:00 bkardell has joined #smartagents-main
17:34:48 ... (then summarizes the history of ASR technologies)
17:37:39 Ulrike has joined #smartagents-main
17:43:32 ... (then lists current work on mitigations for hallucinations in ASR)
17:44:34 gc: related to noise in the speech?
17:45:14 br: rather related to mis-learned patterns
17:45:31 ... could show details later
17:45:46 ... specific patterns in the training data
17:46:05 ... there are "noises" for training. that's right
17:46:57 kaz: from a tech/research viewpoint, it's interesting. but what is expected from the standard? do we need a standardized way to identify those kinds of errors?
17:48:15 br: yes, we need some benchmarking. if I give the same output to 10 individuals, maybe 8 will say it's hallucinated. we need to formalize that.
17:49:26 ec: this is great. when using an llm, we could limit its scope, i.e., giving it a model. prompts augment the llm; a prompt grammar can help.
17:49:39 br: those tend to become very large
17:50:00 ... having to feed that to an llm would be beyond the capacity of modern llms
17:50:13 ec: giving a sketch/output might be enough
17:50:20 ... would limit mistakes
17:51:02 Matt: whisper has multilingual capabilities, did you find issues with multilingual?
17:51:14 br: yes
17:52:24 dirk: can we detect it in realtime?
17:53:02 br: can take a second to process. realtime tends to bring issues because you're dealing with chunks.
17:53:13 ... and chunks might not have complete words or phrases.
17:53:56 fares: on hallucination errors, you used another language and you assessed the output.
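The benchmarking idea above ("if I give the same output to 10 individuals, maybe 8 will say it's hallucinated") amounts to majority-vote labeling plus an inter-annotator agreement measure. A minimal sketch, with illustrative label names (not from the talk):

```python
from collections import Counter
from itertools import combinations

# Sketch of the benchmarking idea: several annotators label each ASR output,
# the majority vote becomes the reference label, and pairwise agreement
# indicates how well-defined that label is. Label strings are illustrative.

def majority_label(labels: list[str]) -> str:
    """Most common annotator label for one ASR output."""
    return Counter(labels).most_common(1)[0][0]

def pairwise_agreement(labels: list[str]) -> float:
    """Fraction of annotator pairs that assigned the same label."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

votes = ["hallucinated"] * 8 + ["faithful"] * 2  # the "8 of 10" example
print(majority_label(votes))  # hallucinated
print(pairwise_agreement(votes))
```

Real benchmarks would use a chance-corrected statistic such as Fleiss' kappa, but the raw agreement rate is enough to show the formalization step.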
17:54:15 br: we tested again human annotated data
17:54:22 s/again/against/
17:55:03 br: in the 3-way classification, 2 humans agree most of the time.
17:55:15 fares: ok, makes sense, like a hello score.
17:56:07 topic: Multi-Agent Conversational Methodology
17:56:51 This was interesting to me as I've had a lot of experience lately with TTS hallucination, and here we have ASR hallucination. When you think about it - an E2E system has so many opportunities for things to take an interesting direction in a single conversational turn!
18:00:06 s/Methodology/Methodology - Emmett Coin/
18:00:11 rrsagent, make log public
18:00:19 rrsagent, draft minutes
18:00:20 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
18:05:06 ec: (describes mechanisms around a common protocol for multi-agent conversational systems)
18:06:02 LisaNMichaud has joined #smartagents-main
18:10:16 ... (then shows a demo with multiple agents and a participant)
18:13:59 JimLarson: How does Open Floor Protocol relate to google's A2A protocol? Are they complementary or competitive?
18:14:17 ec: they are different.
18:15:17 ... this is a live, interactive, communal, time-saving approach.
18:15:47 jl: if I use the google system, does it prohibit me from using the other system?
18:16:03 ec: if it's an agent, you can connect it to anything you want.
18:16:27 YashGelana: The 'floor' can essentially live at the operating system level, where each OS is "infused" with many intelligent agents (each specializing in certain actions and equipped with the required tools) and the user is interacting with agents (either one at a time, or commanding multiple agents at a time if a task requires multi-agent co-ordination)
18:16:41 present+ PaoloDiMaio
18:16:51 present+ BevCorwin
18:17:06 ec: the floor is also a server that lives on the internet
18:17:12 ... you can invite agents
18:17:31 ... it would be like a hosting service
18:18:02 ...
if it's in the OS, it becomes similar to the google thing. this is open, like web pages.
18:18:48 kaz (in the chat): (can wait) very exciting ideas/approach. wondering which part should be standardized for Web technology: the protocol, the api, or maybe the dialog management model?
18:19:51 mn: security/privacy of the data, between me and the agent, or between agents
18:20:00 ec: using https as a start
18:20:15 ... there is a way in the protocol to send a private msg
18:20:35 ... we have the idea of obfuscating things.
18:20:52 ... so that the agents can still understand some of the context
18:21:22 rj: have you looked at the work within the ietf, vtime
18:21:25 ec: nope
18:21:39 rj: vtime is meant to capture the entire context of multiple conversations
18:21:58 ... they solve different problems, but it may be worth looking at
18:22:13 ... privacy, redaction, verification of data, compliance, etc.
18:22:35 ec: interesting indeed. summarizing the context
18:22:44 s/vtime/vcon/g
18:23:30 topic: Reimagining Standards for Voice AI: Interoperability Without Sacrificing Innovation - RJ Burnham
18:24:23 zakim, who's here?
18:24:23 Present: SarahWood, EmmetCoin, GerardChollet, Zohar, Molly, RJBurnham, UlrikeStiefelhagen, DeborahDahl, plh, PrabhuSingh, FrankieJames, JimLarson, LisaMichaud, DirkSchnelle-Walka,
18:24:27 ...
KazuyukiAshimaru, MiNov, GinaSmith, BhikshaRamakrishnan, bkardell, BenjaminWeiss, Wood, KimPatch, FaresAbawi, MattShomphe, PaoloDiMaio, BevCorwin
18:24:27 On IRC I see LisaNMichaud, Ulrike, bkardell, ddahl, zohar, Frankie, thelounge, RRSAgent, Zakim, kaz, plh, dirk
18:24:49 present+ YashGhelani
18:25:05 present+ MollyLewis
18:26:15 i|meant to|https://datatracker.ietf.org/group/vcon/about/ Virtualized Conversations (vcon)|
18:26:30 rrsagent, draft minutes
18:26:31 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
18:30:05 rj: (describes the recent paradigm shift from the directed-dialog world to LLM-driven agents)
18:32:57 ... (then summarizes current practices like MCP, A2A, and AGENTS.md, but they're proprietary technologies)
18:33:56 ... (missing the flow layer, which covers a portable interchange representation for voice agents)
18:36:06 ... (what standards could help with that?)
18:36:38 ... (shows examples)
18:37:19 ... (need to evangelize the industry)
18:39:14 dd: question around knowledge graphs
18:39:28 gc: some experience with dialog systems
18:39:59 dd: questions from Yash on the chat
18:40:17 ... I find that thinking of LLM applications as POMDPs is a better mental model when designing conversational agents (voice or otherwise)
18:40:29 A mental model I like to adopt: Tell voice agents what needs to be achieved, not how to go about achieving it
18:40:53 s/A m/... A m/
18:41:33 rj: what the cost would be
18:41:47 ... humans do that all the time
18:42:09 ... we have to handle that carefully
18:42:26 ...
should be prescriptive
18:42:49 dd: 10-min break now
18:43:06 [10 minutes break]
18:43:13 rrsagent, draft minutes
18:43:14 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
18:45:24 topic: Governance and Greenlights: Leveraging the '3 Ps' to Standardize Trust, Scale, and Usability in Voice Agent Web Integration - Patricia Lee
18:45:28 (recording)
18:45:52 s/[10 minutes break]//
18:46:34 s/dd: 10-min break now//
18:51:55 Laura has joined #smartagents-main
18:53:48 dd: unfortunately, we don't have the speaker here, but we can take questions in the chat
18:54:06 ... then a 10-minute break
18:54:07 [10 minutes break]
18:54:13 rrsagent, draft minutes
18:54:14 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
19:13:09 topic: Breakout room assignment
19:13:59 Solving Lead vs. Lead: Consistent Pronunciation for Web Content: Zoom Room 2
19:14:20 Hallucination in Automatic Speech Recognition Systems: Zoom Room 3
19:14:45 Hallucination in Automatic Speech Recognition Systems: Zoom Room 3
19:15:42 s/Hallucination in Automatic Speech Recognition Systems: Zoom Room 3//
19:15:52 Multi-Agent Conversational Methodology: Zoom Room 4
19:16:08 Reimagining Standards for Voice AI: Interoperability Without Sacrificing Innovation: Zoom Room 5
19:16:31 matt_shomphe has joined #smartagents-main
20:00:16 zohar has joined #smartagents-main
20:00:58 topic: Breakout discussions
20:01:07 topic: Plenary discussion again
20:02:41 dd: we can give 15 mins for each breakout group to report
20:03:47 subtopic: Solving Lead vs. Lead: Consistent Pronunciation for Web Content - Sarah Wood
20:04:14 sw: didn't generate an official slide deck but had a discussion
20:04:49 ... we're not solving the same problem for all the languages
20:05:29 fj: some workarounds like a pronunciation dictionary
20:05:47 sw: issues with misspelling
20:06:21 ec: demonstration of a paragraph
20:06:49 ...
you can read it if some of the characters are exchanged with each other
20:07:49 sw: W3C had a pronunciation tf before
20:08:06 ... we don't have a clear view at the moment
20:08:16 dd: it was a good summary by the tf
20:08:21 ... what to do next
20:08:31 ... maybe there should be at least a CG
20:08:55 sw: AI has changed across devices
20:09:12 ec: notation for pronunciation
20:09:27 ... TTS used to just "read" the text
20:09:51 ... wondering if we could think about a simple solution
20:10:27 sw: AI is pretty good at handling phonetics, so that's one possible solution
20:10:54 dirk: a forum within W3C to continue the discussion?
20:11:06 bk: there is a Web Speech CG
20:11:27 ... completely aligned with universal SSML support
20:11:39 --> https://webaudio.github.io/web-speech-api/ Web Speech API
20:11:44 ... also support from multiple parties: MS, Google, Igalia
20:12:17 ... will take up TTS once it gets chartered, probably next year
20:12:40 ... different areas of support, like HTML, ARIA, etc.
20:12:50 dd: how long do we have?
20:12:59 dirk: 10 more minutes?
20:13:11 dd: wondering how big the pronunciation issues are
20:13:33 ... a lot of languages have hints
20:13:52 ... so wondering whether different languages have different issues
20:14:13 sw: also issues with proper names and abbreviations
20:14:26 dd: right
20:14:39 ... and Spanish has a lot of dialects
20:15:19 sw: local languages even within dialects
20:15:35 jl: how words are pronounced in France, etc.
20:15:59 ... the lexicon should have information about that
20:16:29 sw: right
20:17:08 jl: different pronunciations by different people
20:17:17 sw: right
20:17:29 ...
teachers restrict the variations
20:17:40 jl: customization setting
20:18:17 samantha_estoesta has joined #smartagents-main
20:18:30 kim has joined #smartagents-main
20:18:37 kaz: so we need to think about who from which area is speaking, as well as the language itself
20:19:07 subtopic: Reimagining Standards for Voice AI: Interoperability Without Sacrificing Innovation - RJ Burnham
20:19:32 rj: how people support pronunciation and SSML
20:19:51 ... 80% of the interfaces of AI agents are the same as each other
20:20:11 ... there is no architectural mismatch
20:20:27 ... but we would remove the remaining pain
20:20:42 ... the bigger question is the value
20:20:52 ... and the complexity of dialogues
20:21:21 ... (reflects on the earlier issues from the VoiceXML era)
20:21:36 ... what is the input and output
20:21:52 ... a lot of context there
20:22:07 ... this is one of the pain points
20:22:18 ... there is no consensus yet
20:22:35 ... probably more exploratory work is needed
20:23:03 ... it would be challenging to manage the complexity
20:23:22 rrsagent, draft minutes
20:23:23 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
20:23:42 sw: exactly, we want different browsers
20:23:47 ... any suggestions?
20:23:55 rj: good question
20:24:11 ... this type of problem is hard to solve in a standardized manner
20:25:12 ... engineers from key stakeholders need to discuss with each other in a common way
20:25:30 ... very hard to get consensus
20:25:43 ... need a champion
20:25:59 ec: agree we need the next step after VoiceXML
20:26:18 ... how to define the situation for generalization?
20:26:27 rj: very much the case
20:26:45 ec: what is the simple way to specify this?
20:26:54 dd: there is the mechanism of a CG
20:27:00 ... also breakout sessions during TPAC
20:27:37 plh: yeah
20:28:07 ... fyi, there is a breakout session in March too
20:28:15 ...
don't have to wait until TPAC
20:28:40 https://github.com/w3c/breakouts-day-2026/issues
20:29:25 s/a breakout session/the breakouts day/
20:29:49 subtopic: Multi-Agent Conversational Methodology - Emmett Coin
20:30:11 -> https://www.w3.org/2026/02/25-smartagents-4-minutes.html breakout minutes
20:30:20 ec: nice discussion about cultural differences
20:30:30 ... automatically differentiating speakers
20:30:47 ... standardizing languages like the ESL level
20:31:08 ... wide range of age, culture, etc.
20:31:21 ... also ideas about an interaction layer for behavior
20:31:55 se: conversation patterns
20:32:13 Zakim has left #smartagents-main
20:32:56 ... would get expertise on legality, etc., from agents
20:33:04 ec: excellent
20:33:22 ... in a simple way, only speak to one person
20:33:44 ... but someone talks with one agent, and another agent can join the conversation
20:34:06 ... a simple, rule-based approach is possible
20:34:10 ... simple interaction
20:34:38 ... we also talked about a layer of mentality
20:34:59 ... for vaious generations
20:35:04 s/vaious/various/
20:35:27 ... we could add those points to make the conversation smoother
20:35:56 dd: very different kinds of agents?
20:36:19 ec: do you want every agent to be differentiated?
20:38:07 kaz: for that purpose, we might want to think about some dialogue management model
20:38:43 ... also we could explicitly characterize each agent one by one, e.g., a funny agent and a diligent agent
20:38:50 ec: agree
20:39:03 ... maybe we could have some manifest and persona for that purpose
20:39:27 ... in some cases, we need a serious guy without any jokes
20:39:53 dd: we used to have a persona designers
20:40:07 s/ a//
20:40:29 ec: advertise it and other people can see that
20:40:45 ... could have a prototype design in some way
20:41:07 present+ Ulrike_Stiefelhagen
20:41:38 us: would rather see one for some specific viewpoint
20:42:05 present+ SamanthaEstoesta
20:42:46 ...
but can't see the usefulness of having 5 different personalities one by one
20:43:07 ec: there could be a personality for each agent
20:44:11 fj: probably good to have one from a psychological viewpoint
20:44:48 ec: different services could be provided by different agents one by one: train, hotel, etc.
20:45:04 us: it's about "trust", I think
20:45:12 ... the aspect of the task
20:45:21 ... if you go back to the user
20:45:36 ... I trust one from some specific service
20:45:50 ec: could think about various possibilities
20:46:10 dd: one use case that could be considered is
20:46:38 ... would it be strange to ask one specific agent to handle different tasks?
20:48:13 ec: from an implementation viewpoint, it's much easier to let one agent handle one occasion
20:49:19 topic: Tomorrow
20:49:50 dd: will have the next session tomorrow
20:50:22 ... 10 mins for each presentation
20:50:52 ec: the content in the Zoom chat is useful
20:51:06 dd: it could be recorded, I think
20:51:13 [Session 1 adjourned]
20:51:17 rrsagent, draft minutes
20:51:18 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
20:51:47 i|will have|-> https://www.w3.org/2025/10/smartagents-workshop/agenda.html#session2 Session 2|
20:51:49 rrsagent, draft minutes
20:51:50 I have made the request to generate https://www.w3.org/2026/02/25-smartagents-main-minutes.html kaz
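The "manifest and persona" idea floated in the multi-agent breakout could be as simple as a small declarative descriptor that a floor server reads before admitting an agent. A hedged sketch; every field name and value below is a hypothetical illustration, not part of the Open Floor Protocol or any spec:

```python
# Hypothetical agent manifest with a persona block, as floated in the
# breakout ("maybe we could have some manifest and persona").
# All field names and values are illustrative assumptions.

manifest = {
    "name": "hotel-booking-agent",
    "capabilities": ["search_hotels", "book_room"],
    "persona": {
        "tone": "serious",    # e.g. "no jokes" for a trust-sensitive task
        "formality": "high",
    },
}

def persona_tone(manifest: dict) -> str:
    """Read the advertised tone so a floor/host can pick a matching agent."""
    return manifest.get("persona", {}).get("tone", "neutral")

print(persona_tone(manifest))  # serious
```

Advertising the persona in the manifest matches the discussion: other participants can see it, and a user (or the floor) can route a task to the agent whose advertised personality they trust for that service.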