09:00:09 RRSAgent has joined #smartagents-main
09:00:13 logging to https://www.w3.org/2026/02/26-smartagents-main-irc
09:00:16 zohar has joined #smartagents-main
09:00:53 Meeting: Smart Voice Agents - Day 2
09:02:43 chiaradm has joined #smartagents-main
09:03:00 Present+ IK Ngoo, GinaSmith, DeborahDahl, ZoharGan, PaolaDiMaio, Emmet Coin, Ulrike Stiefelhagen, ChiaraDeMartin, GerardChollet, KazuyukiAshimura, plh
09:04:54 kaz has joined #smartagents-main
09:05:21 Zakim has joined #smartagents-main
09:05:23 present+ BenjaminWeiss, DirkSchnelle-Walka
09:05:45 present+ ShadiAbou-Zahra
09:05:45 meeting: W3C Workshop on
09:05:46 Smart Voice Agents - Session 2
09:05:53 dirk has joined #smartagents-main
09:05:55 zakim, who is on the call?
09:05:55 Present: BenjaminWeiss, DirkSchnelle-Walka, ShadiAbou-Zahra
09:06:27 present+ IK Ngoo, GinaSmith, DeborahDahl, ZoharGan, PaolaDiMaio, Emmet Coin, Ulrike Stiefelhagen, ChiaraDeMartin, GerardChollet, KazuyukiAshimura, plh
09:06:34 zakim, who's here?
09:06:34 Present: BenjaminWeiss, DirkSchnelle-Walka, ShadiAbou-Zahra, IK, Ngoo, GinaSmith, DeborahDahl, ZoharGan, PaolaDiMaio, Emmet, Coin, Ulrike, Stiefelhagen, ChiaraDeMartin,
09:06:38 ... GerardChollet, KazuyukiAshimura, plh
09:06:38 On IRC I see dirk, Zakim, kaz, chiaradm, zohar, RRSAgent, bkardell, ddahl, plh
09:07:56 present+ EmmetCoin
09:08:12 present+ EmmettCoin
09:08:17 present- EmmetCoin
09:08:24 present- Emmet
09:08:30 present- Coin
09:09:26 topic: Scene setting
09:09:50 dd: (gives explanations on the goals, logistics and expectations)
09:10:43 topic: Accessibility of 3D and Immersive Content via Voice Interaction = Zohar Gan
09:10:50 s/=/-/
09:10:53 rrsagent, make log public
09:10:58 rrsagent, draft minutes
09:10:59 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
09:13:58 zg: (describes difficulties and challenges around accessibility for 3D/Immersive Content)
09:14:54 present+ KristiinaJokinen
09:15:56 chiaradm has joined #smartagents-main
09:17:17 chiaradm has joined #smartagents-main
09:18:03 present+
09:18:08 chiaradm has left #smartagents-main
09:28:19 kaz: great input for existing W3C groups around media handling and web-based digital twins
09:28:27 ... suggest you work with those groups
09:28:40 ... can send details about those groups to you later
09:28:50 zg: great
09:29:03 ec: this is great and beneficial
09:29:07 zg: tx!
09:30:50 pdm: quick question
09:31:00 ... tx for your presentation
09:32:14 ... a consideration about natural voice
09:32:40 ... whether the developers are thinking of making a more natural voice for the system
09:33:08 present+ YashGhelani
09:33:27 ... naturalness of the voice
09:33:36 zg: so we would make the voice more natural?
09:33:39 pdm: yes
09:33:50 zg: doing a PoC
09:33:58 ... good idea to explore
09:34:07 present+ YashGhelani
09:34:32 ... previous solution was using recordings of people's voices
09:34:51 ... another approach is using a Web API for more natural voice and customization
09:34:55 ... maybe we can use that
09:35:08 ... from the practical viewpoint, it can be fast enough
09:35:15 pdm: tx
09:35:27 ... advances of natural speech
09:35:33 ... so just want to suggest
09:35:49 ... UX would be better with more natural speech
09:36:26 topic: Transition of Use Cases for Voice to LLM-based RAG or Agent setups in difficult scenarios - Ulrike Stiefelhagen
09:37:24 us: (presents about hallucinations in Voice agents)
09:38:10 ... (describes several use cases)
09:38:49 ... (existing difficulties with voice interface)
09:40:08 ... (mentions patients' Voice Assistant "Juki")
09:41:36 ... (pros and cons of voice interface)
09:46:35 ... (opportunity of GenAI)
09:47:08 ... (another use case of Workers' Voice Assistant "Helping Harry")
09:47:23 ... (within noisy environment)
09:49:23 s/GenAI/GenAI and Voice/
09:49:29 rrsgent, draft minutes
09:51:02 ... (opportunity for GenAI & Voice again)
09:51:34 ... (deal with different pronunciations)
09:52:03 s/rrsgent, draft minutes//
09:52:09 rrsagent, draft minutes
09:52:10 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
09:53:09 ec: years ago I was working on a similar system for a warehouse
09:53:20 ... the question was demanding timing
09:53:34 ... can a GenAI model handle that?
09:53:40 us: didn't use picking
09:53:52 ... we chose workstations to build up things
09:54:12 ... regarding the speed/timing, we also see difficulty
09:54:25 ... don't generate timing
09:54:35 ... it's basically done ahead
09:54:44 ... don't need variation at the moment
09:55:37 ... estimate the timing for our work
09:55:50 ... traffic light says time needed
09:56:17 ... each work step depends on a fraction of time
09:56:36 ... 150 steps around the clock
09:56:50 ... there is some flexibility
09:57:55 kaz: worked on realtime OS based speech timing for my Ph.D. thesis 10 years ago
09:58:28 ... it depends on use cases but strict timing management would be useful for more natural dialogue-based communication with GenAI
09:58:40 ... would suggest we work on that kind of advanced use cases as well
09:59:21 topic: Towards Smarter Voice Interfaces: Using Grounding and Knowledge - Kristiina Jokinen
09:59:32 dd: we can play your video for you
09:59:52 kj: sorry about my bad connection today
10:00:10 dd: (starts Kristiina's recorded video)
10:00:14 present+ Jazmin
10:00:23 rrsagent, draft minutes
10:00:24 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
10:01:06 kj: (gives her talk about smarter voice interfaces)
10:01:51 ... (challenges for voice interaction)
10:05:11 ... (Errors in voice interaction)
10:06:33 ... (Grounding: Collaborative mechanism)
10:10:24 ... (Knowledge graphs for grounding)
10:11:56 ... (Agentic Architecture)
10:15:40 dd: tx!
10:15:50 ... do you think you can take questions?
10:15:53 kj: let's try!
10:16:06 ec: 2 questions
10:16:14 ... formalism for grounding structure?
10:16:24 ... knowledge graphs and DB?
10:16:36 ... passed from one AI agent to another?
10:16:49 kj: very important point
10:16:58 ... so far we have basically worked on different projects
10:17:13 ... the expertise from those projects could be applied
10:17:23 ... also explore what we can do
10:17:39 ... it's very important to try to work on some formalization
10:17:59 ... so that others can easily build their systems
10:18:28 ... should dive into a bit more detail
10:18:37 ... about how to build the systems
10:18:55 ... so far we've had kind of a dialog modelling aspect
10:19:15 ... but then making it computable needs more discussion
10:19:33 ec: another simple question
10:20:07 ... do you address the interaction, e.g., about dates expressed in different styles?
10:20:12 kj: good question
10:20:27 ... same entity with different expressions within the knowledge graph
10:21:07 ... if you actually can refer to the entity which is already grounded, the description could be shortened
10:21:16 ec: tx, interesting
10:21:22 rrsagent, draft minutes
10:21:23 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
10:21:54 topic: Towards Web Standards for Configurable Naturally Responsive Voice Interaction for AI Agents - Paola Di Maio
10:22:03 dd: Paola has an issue with her connection
10:22:18 ... maybe Gerard could show her recorded video
10:22:23 gc: let me try
10:22:43 ... (shows Paola's recorded video)
10:22:54 ... wondering about the sound
10:22:58 dd: try again
10:26:13 pdm: (traditional pipeline for speech-to-speech: STT -> LLM -> TTS)
10:27:20 ... (great tech but missing UX)
10:30:12 ... (7 critical usability failures)
10:32:50 ... (proposed UX requirements)
10:38:50 ... (then shows a demo)
10:39:34 rrsagent, draft minutes
10:39:36 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
10:43:15 ec: any work around speech recognition
10:43:37 ... to detect whether the speech is finished or not
10:43:59 pdm: very technical
10:44:23 ... there is some work done
10:46:10 kaz: when I was an ASR researcher 10 years ago, I worked on a kind of big speech corpus to detect the readiness of utterances for interactive dialog
10:46:40 ... maybe we might think about an even bigger dialog corpus for today's use cases from a research viewpoint
10:47:02 pdm: yeah, many things to be done for research
10:47:28 kj: did you just use the information for end-to-end interaction?
10:47:49 pdm: there is a draft paper as the basis of the demo
10:47:56 ... 5 use cases there
10:48:12 ... each use case uses a specific script
10:48:25 ... note that models are changing every moment
10:48:40 ... variables related to the users also change
10:48:53 ... I have an outline
10:50:00 dd: some more discussion would be great during the breakout sessions
10:51:25 topic: Gaze-Aware Dialog Systems - Fares Abawi
10:51:50 gc: (shows recorded video for Fares)
10:52:04 ... (Underutilized modality)
10:52:18 ... (where gaze matters)
10:54:23 ... (multimodal pipeline)
10:54:35 ... (multimodal fusion)
10:55:57 ... (neural integration)
10:56:37 PaolaDM has joined #smartagents-main
10:57:02 Thanks for setting up the IRC channel
10:59:49 ... (standardization needs)
11:03:13 pdm: wondering what kind/level of role "gaze" would play
11:03:36 dd: a use case is trained users
11:03:58 pdm: how much to train the users for the gaze modality is a question
11:04:38 kaz: reminded us of the W3C MMI WG's work
11:04:49 ... maybe we can talk with him about that too
11:04:53 dd: yeah
11:05:25 us: in the factory context, we can add context using the gaze modality
11:05:43 kj: very interesting work
11:05:53 ... a lot of research around gaze and turn taking
11:06:03 ... it's kind of a pointing device
11:06:19 ... wondering if there is some way to distinguish things from each other
11:06:30 ... when we want to take turns
11:06:39 ... implying our expectations
11:06:50 rrsagent, draft minutes
11:06:52 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
11:08:22 dirk: useful for the automotive environment
11:08:53 dd: time for wrapping up the presentation part
11:09:06 ... and then moving to the breakout part
11:09:16 [10-min break]
11:09:21 rrsagent, draft minutes
11:09:22 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
11:22:07 topic: Breakouts
11:23:21 zakim, who is here?
11:23:21 Present: BenjaminWeiss, DirkSchnelle-Walka, ShadiAbou-Zahra, IK, Ngoo, GinaSmith, DeborahDahl, ZoharGan, PaolaDiMaio, Ulrike, Stiefelhagen, ChiaraDeMartin, GerardChollet,
11:23:24 ... KazuyukiAshimura, plh, EmmettCoin, KristiinaJokinen, chiaradm, YashGhelani, Jazmin
11:23:24 On IRC I see PaolaDM, dirk, Zakim, kaz, RRSAgent, bkardell, ddahl, plh
11:23:40 dirk: (gives instructions for breakouts)
11:24:32 ... suggestion from Debbie was combining some of the talks
11:25:24 [[
11:25:26 Room 1 - Kristiina
11:25:26 Room 2 - Zohar
11:25:26 Room 4 - Ulrike
11:25:26 Room 5 - Paola/Fares
11:25:27 ]]
11:26:12 dirk: slides on guidance and notes
11:26:15 [[
11:26:39 Group A (Kristiina Jokinen, Zoom Room 1): https://docs.google.com/presentation/d/1nKTrSX0VmyC1dNyF5e9JegX5Ggwwst4UnlQVR8eL3E8/edit?usp=sharing
11:26:40 Group B (Zohar Gan, Zoom Room 2): https://docs.google.com/presentation/d/1wYd6U3OCj2fFhdKobsRJkWzRPY5knLajxi8o0Csg-8M/edit?usp=sharing
11:26:40 Group C (Fares Abawi, Paola Di Maio, Zoom Room 3): https://docs.google.com/presentation/d/1paQE60SVoe5xmFAHA1cvVmJuVXmdGuMYd0iOGeHwWAI/edit?usp=sharing
11:26:40 Group D (Ulrike Stiefelhagen, Zoom Room 4): https://docs.google.com/presentation/d/1X7Nb5s5fWzd21SX5x5m8-erFFu9FAWNTzGXdDFUCCU4/edit?usp=sharing
11:26:42 ]]
11:27:31 dd: dirk has sent an email about how to join the breakout zoom
11:27:36 dirk: did that
11:28:35 plh: btw, this main room will become Room 1
11:28:50 rrsagent, draft minutes
11:28:51 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
11:30:35 zohar has joined #smartagents-main
11:31:58 I see a message in room 2 saying the host has another meeting in progress and I can't join
11:32:32 @plh zoom says the host has another meeting in progress when I try to join room 2
11:32:42 hu...
11:32:50 I'm in room 2
11:37:27 only rooms 1 and 2 are now running. not enough participants in rooms 3 and 4.
11:57:25 topic: People coming back to the main room
11:58:39 s/topic: People coming back to the main room/[People coming back to the main room]/
11:58:44 rrsagent, draft minutes
11:58:45 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
12:05:00 present+ FaresAbawi
12:05:24 (discussion about gaze, and also turn taking)
12:07:40 bw: not only gaze but various signals could be used
12:09:02 ... that's kind of like playing a boardgame
12:09:21 fa: detecting attention directed to tasks is needed
12:09:45 ... the model should detect that
12:10:00 ... e.g., my gaze indicating something or sometimes not
12:10:21 ... we can have multiple different stages
12:10:38 ... and the model can ignore some of them depending on the situation
12:10:57 ... you can't expect the model to see everything
12:11:06 ... the model has no idea about our intention
12:11:39 ... there could be extensions based on our intention like smart glasses
12:11:46 ... there are many open questions
12:11:58 ... need formalization
12:12:17 ... I'm more concerned about operation by users
12:12:54 ... if it's very functional, there still could be problems with requiring a specific Web application
12:13:07 kj: we need to define scenarios
12:13:25 ... and need to know if the gaze means something around the user's interaction
12:13:59 ... if the gaze helps the system to identify the intention, it can be a kind of monitor
12:14:10 ... but there is another aspect
12:14:23 ... what your gaze tells people
12:14:36 ... providing some intention to people and the system
12:14:50 ... so that they can manage the situation better
12:15:57 ... reminded of some work on standardization
12:16:13 ... when we built in some technology or system
12:16:34 ... some kind of standardization can be achieved in a sense of technology
12:16:58 ... useful and helpful but how to measure the usefulness?
12:18:06 dd: time to wrap up. the other group was talking about accessibility of 3D and Immersive content
12:18:29 zg: media semantic metadata
12:18:52 ... important part of the data
12:19:14 ... helpful to improve privacy and latency using a hybrid approach
12:19:52 ... producing semantic metadata for accessibility
12:20:03 ... collaboration within W3C would be useful
12:20:22 ... WebVTT, WebVMT, etc.
12:20:51 ... someone's gaze to identify the intention
12:20:55 dd: tx
12:21:16 ... we can continue the plenary discussion here at the main channel
12:21:38 topic: Closing
12:21:48 dd: one comment on gaze
12:22:07 ... gaze in the real world is difficult to handle
12:22:17 ... much bigger than the screen
12:22:34 ec: what you look at is a question
12:22:53 fa: as long as you know about the screen
12:23:11 ... but completely different in a real-world situation
12:23:24 ... but quite a good way to segment objects
12:23:31 ... useful when you talk about entities
12:23:51 ... gazing upon something is already a question to be detected
12:24:04 ... in a 3D world, segmenting objects is useful
12:24:49 ... maybe someone else can talk about it more
12:24:53 rrsagent, draft minutes
12:24:54 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
12:25:03 kj: segmentation is related to time information
12:25:43 ... probably want to know what kind of gaze pattern is being used
12:25:52 ... how difficult to detect it
12:26:08 ... time-based segmentation is important
12:26:14 fa: definitely
12:26:33 ... just wanted to mention the importance within the real world
12:28:26 kaz: probably we need to think about possible use cases for synchronization of ...
12:28:36 ec: yes, there are synchronization problems
12:29:03 ... how do we synchronize multiple data?
12:29:10 ... not just humans vs AI
12:29:18 ... need to think about different time scales
12:29:36 kj: need to think about some storage
12:29:50 ... helpful memory
12:30:08 ... maybe sometimes we go back to some point
12:30:34 ec: maybe we need to force the AI to wait
12:30:57 ... would be helpful for more natural conversations
12:31:46 kj: if there are many people, the conversation tends to split into several sub groups
12:32:16 dd: we have to be aware that whatever we do depends on the culture
12:32:21 ... one culture and another
12:32:42 ... get used to different cultures
12:32:53 ... dangerous to assume one specific culture
12:33:00 ec: yeah
12:33:20 dd: people have different immigration histories
12:33:25 ... even in the US
12:34:18 s/of .../of multiple data streams/
12:34:41 kj: coaching for elderly adults
12:34:56 ... PoC experiment to evaluate usefulness
12:35:06 ... what topics people wanted to talk about
12:35:18 ... and see how the system should react
12:35:40 ... it's community-oriented
12:36:23 ... in the JP context, the system can know the usage
12:36:40 ... in the sort of western context, systems are kind of tools for information
12:36:57 ... emotional understanding of intelligence
12:37:13 ... it was interesting when you want to decide on the system
12:37:18 ... it can be one for all
12:37:39 ... but personalization and adaptation would be useful
12:38:17 ... working in different cultures would require different viewpoints
12:38:50 pdm: not only culture but there are individual differences as well
12:38:56 ... a lot of circumstances thre
12:38:59 s/thre/there/
12:39:13 dd: cultural difference based on cultures
12:39:37 fa: collective cultures
12:40:03 ... usually would look at a small cluster depending on some specific culture
12:40:18 ... or a wide glance at the whole group
12:40:38 ec: sometimes we use "gaze" to focus on something
12:40:50 ... it carries several meanings
12:41:11 ... we can reliably detect the standard usage of "gaze"
12:41:26 us: the difference between "gaze" and "gesture"
12:41:41 ... how we look
12:41:52 ... how to track gaze?
12:41:59 fa: we use a web cam
12:42:13 ... something called web gaze for browsers
12:42:21 ... first step is calibration
12:42:42 ... you don't force the users to purchase special devices
12:42:54 s/web gaze/web gazer/
12:43:09 ec: web gazer?
12:43:18 plh: https://webgazer.cs.brown.edu/
12:43:30 rrsagent, draft minutes
12:43:31 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
12:43:53 fa: this is more like a PoC
12:44:14 us: could use gaze tracking
12:44:27 ... and the target of the gaze is a square
12:44:34 fa: exactly
12:44:50 pdm: interact with voice?
12:45:09 ... if people have difficulty with speaking
12:45:20 ... how to use gaze for voice?
12:45:45 fa: first focus on 3 things
12:45:50 ... turn-taking
12:45:57 ... divergence
12:46:06 ... and pointing
12:46:12 rrsagent, draft mintues
12:46:12 I'm logging. I don't understand 'draft mintues', kaz. Try /msg RRSAgent help
12:46:19 s/rrsagent, draft mintues//
12:46:23 rrsagent, draft minutes
12:46:24 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
12:46:36 ... that's my proposal essentially
12:46:59 topic: Wrap-up
12:47:08 dd: a lot of interest in "gaze"
12:47:23 ... for example, for turn-taking and divergence
12:47:41 ... combination with voice for interaction
12:47:51 ... sounds like a recommendation from this group
12:47:58 ... anything else?
12:49:25 kaz: integration of multiple modalities/resources including "gaze" is important
12:49:54 ... also as Emmett mentioned, time synchronization among those resources would be important for advanced use cases
12:49:57 dd: right
12:50:05 ... regarding tomorrow's session
12:50:30 -> https://www.w3.org/2025/10/smartagents-workshop/agenda.html#session3 Session 3
12:50:43 dd: 4 topics
12:50:52 ... real-time processing
12:51:01 ... in-vehicle interaction
12:51:08 ... trust/empathy
12:51:20 ... beyond screen readers
12:51:33 dirk: tx a lot for your presentations!
12:51:53 kj: thank you very much from me too
12:52:00 rrsagent, draft minutes
12:52:01 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
12:52:50 dd: if you won't be able to join tomorrow's session, all the presentations will be published later
12:52:57 kj: appreciated
12:53:38 [Session 2 adjourned]
12:53:42 rrsagent, draft minutes
12:53:43 I have made the request to generate https://www.w3.org/2026/02/26-smartagents-main-minutes.html kaz
13:25:36 kim has joined #smartagents-main
14:12:30 Zakim has left #smartagents-main
15:41:49 kim has joined #smartagents-main
15:42:58 kim has left #smartagents-main