18:55:10 RRSAgent has joined #smartagents-1
18:55:15 logging to https://www.w3.org/2026/02/27-smartagents-1-irc
18:55:47 meeting: Breakout - Do we need real-time processing capabilities on voice agents?
18:56:22 ck: (shows a research paper)
18:56:46 ... (about speech recognizer)
18:57:02 aclanthology.org/E09-1081.pdf
18:57:20 ck: there are 3 operations
18:58:11 ... example of hypothesis revision
18:59:20 present+ KazAshimura, plh, Casey_Kennington, EmmettCoin, RajTumuluri, GerardChollet, PatriciaLee, RlrikeStiefelhagen, YashGhelani
19:00:41 kaz has joined #smartagents-1
19:00:51 rt: (leaves)
19:01:17 ec: backtrack some sort of semantics?
19:01:23 ... vector of meaning
19:01:46 ck: the recognizer doesn't detect meaning itself
19:02:02 ... there is no need for revoking or updating there
19:02:12 ... the next step is understanding
19:02:16 ... some mapping for a specific task
19:02:34 ... models for tasks
19:03:01 ... then a semantics model
19:03:16 ... hypothesis applied
19:03:45 ... the problem with recurrent networks
19:04:17 ... with larger context
19:04:34 ... the other problem is that training takes a lot of time
19:05:15 ... memory errors, etc.
19:06:04 ec: in some way biased
19:06:13 ck: bias is there
19:06:20 ... but understanding is not
19:06:30 ... generalization is a bias
19:07:19 ... none of the language models here is incremental
19:07:37 ... output is incremental
19:07:41 ... and input can be
19:08:20 ... kind of at the R&D stage now
19:08:36 ec: an agent listening to me
19:08:58 ck: fairly well-researched question. Frankie was talking about the car environment.
19:09:42 ec: we did studies about a similar situation (about the car environment)
19:10:22 ck: people are chatting with AI agents
19:10:29 ... worth speaking with humans
19:10:53 ec: have zero recollection of what was done while driving
19:11:16 ck: drivers are paying attention to driving
19:11:26 ... that's kind of another direction
19:11:39 ...
... the model should interrupt you depending on the situation
19:11:50 ... the work by Koji Inoue
19:11:57 ... doing turn-taking
19:12:24 ... 4 duplex microphones
19:12:42 ... feedback model has signal
19:12:59 ... good starting point for an interruption model
19:13:12 ec: who is speaking is a key
19:13:53 ck: we have two-speaker diarization
19:14:29 ... (describes a model)
19:15:12 ec: Amazon, etc., handle different speakers at home, for example
19:15:29 ck: diarization is a question at the moment
19:15:40 ec: talking to agents is interesting
19:15:57 ... learning conversation style might be a bit different
19:16:14 ck: think that's a feature or a bug
19:16:23 ... it's not having state information
19:16:54 ... maybe the language model should handle this
19:17:45 ... (problem with in-car systems)
19:17:59 ... what's on my calendar today?
19:18:21 ... need a safe place to listen to the response
19:18:30 ... want some sort of introduction
19:19:03 ... the hard thing is difficulty with lane changes, etc.
19:19:17 ec: agree that's important
19:20:13 kaz: which part should be standardized within W3C?
19:20:23 ck: not at the moment
19:20:27 ... but streaming?
19:20:37 ... for agentic architecture
19:21:47 ... streaming language model
19:22:00 kaz: there are many related standards already
19:22:05 ... as possible pieces
19:22:19 ... we should dive into actual use cases and see what is still missing
19:22:31 ck: robots should be a promising use case
19:22:47 ... humans' expectations are much higher
19:22:55 ... maybe an LLM can do something
19:23:13 ... interactive quality of understanding feedback
19:23:25 ... I utter something, then the robot responds
19:23:55 ... but that's frustrating
19:24:27 ... if robots could not just hook up APIs but could work for natural communication, that would be great
19:24:49 kaz: tx
19:25:36 ec: @@@
19:26:01 ck: people treat robots as having some emotion/human quality
19:26:59 ... using a robot like Wally
19:28:43 ...
... should use appropriate vocabulary for the user's age
19:28:58 ec: what is your future?
19:29:06 ck: very different topic
19:29:13 ... emotion
19:29:25 ec: read Minsky's book?
19:29:36 ck: not that, but interesting research papers
19:29:57 ... some funding for emotion for robots
19:31:06 ec: note actors act emotionally
19:31:31 ck: there are a lot of culture-specific emotions
19:32:14 ec: think "disgust" is one of the universal emotions
19:34:25 kaz: (mentions a generic format to handle emotion information, EmotionML)
19:34:26 https://www.w3.org/TR/emotionml/
19:34:35 rrsagent, make log public
19:34:51 ec: @@@
19:35:02 ck: there are researchers working on that
19:35:12 ... check ACL
19:35:16 ... Microsoft Research, etc.
19:35:22 ... multi-user attention engagement
19:35:28 rrsagent, draft minutes
19:35:29 I have made the request to generate https://www.w3.org/2026/02/27-smartagents-1-minutes.html kaz
19:36:06 [breakout 1 ends]
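The "3 operations" ck mentioned at 18:57, alongside the E09-1081 paper, are commonly formalized in the incremental-processing literature as add, revoke, and commit edits to a growing recognition hypothesis. A minimal sketch of hypothesis revision under those operations; the class and method names here are illustrative, not taken from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class IncrementalHypothesis:
    """Toy incremental ASR hypothesis supporting add/revoke/commit edits."""
    committed: list = field(default_factory=list)  # words that can no longer change
    tentative: list = field(default_factory=list)  # words that may still be revised

    def add(self, word: str) -> None:
        """Append a new tentative word to the hypothesis."""
        self.tentative.append(word)

    def revoke(self):
        """Retract the most recent tentative word (hypothesis revision)."""
        return self.tentative.pop() if self.tentative else None

    def commit(self, n: int = 1) -> None:
        """Freeze the n oldest tentative words; they become immutable."""
        self.committed.extend(self.tentative[:n])
        del self.tentative[:n]

    def current(self) -> list:
        """The full current hypothesis, committed prefix first."""
        return self.committed + self.tentative

# Example of hypothesis revision: "for" is retracted and replaced by "four".
h = IncrementalHypothesis()
h.add("for")        # recognizer first hears "for"
h.revoke()          # new audio evidence contradicts it
h.add("four")       # revised hypothesis
h.add("apples")
h.commit(1)         # "four" is now stable downstream
print(h.current())  # ['four', 'apples']
```

Downstream modules (understanding, as ck notes) can safely consume the committed prefix, while the tentative suffix may still be revoked.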
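The interruption/turn-taking discussion (19:11-19:13) can be illustrated with a deliberately crude baseline: the agent takes the turn only after the user has been silent for a fixed number of voice-activity frames. This silence-threshold heuristic is an assumption for illustration only; the learned feedback models referenced (e.g. Koji Inoue's duplex work) use richer prosodic and semantic signals:

```python
def agent_may_speak(user_vad, silence_frames=3):
    """Toy turn-taking baseline over a per-frame voice-activity sequence.

    user_vad: iterable of booleans, True = user is speaking in that frame.
    Returns one boolean per frame: may the agent take the turn here?
    """
    decisions = []
    silence = 0
    for speaking in user_vad:
        # Count consecutive silent frames; reset whenever the user speaks.
        silence = 0 if speaking else silence + 1
        decisions.append(silence >= silence_frames)
    return decisions

frames = [True, True, False, False, False, False, True]
print(agent_may_speak(frames))  # [False, False, False, False, True, True, False]
```

The weakness ck points at is visible even here: a fixed threshold cannot decide *whether* an interruption is appropriate (e.g. during a lane change), only *when* a gap occurs.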
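The EmotionML format kaz mentioned (https://www.w3.org/TR/emotionml/) is an XML vocabulary for annotating emotion, e.g. the "disgust" category ec brought up. A minimal sketch building such an annotation with Python's standard library; the namespace, the `big6` category-set URI, and the `<emotion>`/`<category>` elements come from the spec, while the surrounding code is illustrative:

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2009/10/emotionml"
ET.register_namespace("", NS)  # serialize with EmotionML as the default namespace

# Root element declares which emotion vocabulary the categories come from;
# "big6" is the six-category vocabulary that includes "disgust".
root = ET.Element(f"{{{NS}}}emotionml", {
    "category-set": "http://www.w3.org/TR/emotion-voc/xml#big6",
})
emotion = ET.SubElement(root, f"{{{NS}}}emotion")
ET.SubElement(emotion, f"{{{NS}}}category", {"name": "disgust", "confidence": "0.8"})

print(ET.tostring(root, encoding="unicode"))
```

A voice agent could emit annotations like this as a standards-based way to pass detected emotion between components, which is the kind of existing piece kaz suggests checking against actual use cases.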