18:55:10 RRSAgent has joined #smartagents-1
18:55:15 logging to https://www.w3.org/2026/02/27-smartagents-1-irc
18:55:47 meeting: Breakout - Do we need real-time processing capabilities on voice agents?
18:56:22 ck: (shows a research paper)
18:56:46 ... (about speech recognizer)
18:57:02 aclanthology.org/E09-1081.pdf
18:57:20 ck: there are 3 operations
18:58:11 ... example of hypothesis revision
18:59:20 present+ KazAshimura, plh, Casey_Kennington, EmmettCoin, RajTumuluri, GerardChollet, PatriciaLee, RlrikeStiefelhagen, YashGhelani
19:00:41 kaz has joined #smartagents-1
19:00:51 rt: (leaves)
19:01:17 ec: backtrack some sort of semantics?
19:01:23 ... vector of meaning
19:01:46 ck: the recognizer doesn't detect meaning itself
19:02:02 ... there is no need for revoking or updating there
19:02:12 ... the next step is understanding
19:02:16 ... some mapping for a specific task
19:02:34 ... models for tasks
19:03:01 ... then a semantics model
19:03:16 ... hypothesis applied
19:03:45 ... the problem with recurrent networks
19:04:17 ... with larger context
19:04:34 ... the other problem is that training takes a lot of time
19:05:15 ... memory errors, etc.
19:06:04 ec: in some way biased
19:06:13 ck: bias is there
19:06:20 ... but understanding is not
19:06:30 ... generalization is a bias
19:07:19 ... none of the language models here is incremental
19:07:37 ... output is incremental
19:07:41 ... and input can be
19:08:20 ... kind of at the R&D stage now
19:08:36 ec: an agent listening to me
19:08:58 ck: fairly well-researched question. Frankie was talking about the car environment.
19:09:42 ec: we did studies about a similar situation (about the car environment)
19:10:22 ck: people are chatting with AI agents
19:10:29 ... worth speaking with humans
19:10:53 ec: have zero recollection of what was done while driving
19:11:16 ck: drivers are paying attention to driving
19:11:26 ... that's kind of another direction
19:11:39 ...
... the model should interrupt you depending on the situation
19:11:50 ... the work by Koji Inoue
19:11:57 ... doing turn-taking
19:12:24 ... 4 duplex microphones
19:12:42 ... feedback model has signal
19:12:59 ... good starting point for an interruption model
19:13:12 ec: who is speaking is a key
19:13:53 ck: we have two-speaker diarization
19:14:29 ... (describes a model)
19:15:12 ec: Amazon, etc., handle different speakers at home, for example
19:15:29 ck: diarization is a question at the moment
19:15:40 ec: talking to agents is interesting
19:15:57 ... learning conversation style might be a bit different
19:16:14 ck: think that's a feature or a bug
19:16:23 ... it's not having state information
19:16:54 ... maybe the language model should handle this
19:17:45 ... (problem with in-car systems)
19:17:59 ... what's on my calendar today?
19:18:21 ... need a safe place to listen to the response
19:18:30 ... want some sort of introduction
19:19:03 ... the hard thing is difficulty with lane changes, etc.
19:19:17 ec: agree that's important
19:20:13 kaz: which part should be standardized within W3C?
19:20:23 ck: not at the moment
19:20:27 ... but streaming?
19:20:37 ... for agentic architecture
19:21:47 ... streaming language model
19:22:00 kaz: there are many related standards already
19:22:05 ... as possible pieces
19:22:19 ... we should dive into actual use cases and see what is still missing
19:22:31 ck: robots should be a promising use case
19:22:47 ... humans' expectations are much higher
19:22:55 ... maybe an LLM can do something
19:23:13 ... interactive quality of understanding feedback
19:23:25 ... I utter something, then the robot responds
19:23:55 ... but that's frustrating
19:24:27 ... if robots could not just hook up APIs but could work for natural communication, that would be great
19:24:49 kaz: tx
19:25:36 ec: @@@
19:26:01 ck: people treat robots as having some emotion/human quality
19:26:59 ... using a robot like Wally
19:28:43 ...
... should use appropriate vocabulary for the user's age
19:28:58 ec: what is your future?
19:29:06 ck: very different topic
19:29:13 ... emotion
19:29:25 ec: read Minsky's book?
19:29:36 ck: not that, but interesting research papers
19:29:57 ... some funding for emotion for robots
19:31:06 ec: note actors act emotionally
19:31:31 ck: there are a lot of culture-specific emotions
19:32:14 ec: think "disgust" is one of the universal emotions
19:34:25 kaz: (mentions a generic format to handle emotion information, EmotionML)
19:34:26 https://www.w3.org/TR/emotionml/
19:34:35 rrsagent, make log public
19:34:51 ec: @@@
19:35:02 ck: there are researchers working on that
19:35:12 ... check ACL
19:35:16 ... Microsoft Research, etc.
19:35:22 ... multi-user attention engagement
19:35:28 rrsagent, draft minutes
19:35:29 I have made the request to generate https://www.w3.org/2026/02/27-smartagents-1-minutes.html kaz
19:36:06 [breakout 1 ends]
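The "3 operations" ck mentioned at 18:57, alongside the E09-1081 paper, are commonly formalized in the incremental-processing literature as add, revoke, and commit edits to a growing recognition hypothesis. A minimal sketch of hypothesis revision under those operations; the class and method names here are illustrative, not taken from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class IncrementalHypothesis:
    """Toy incremental ASR hypothesis supporting add/revoke/commit edits."""
    committed: list = field(default_factory=list)  # words that can no longer change
    tentative: list = field(default_factory=list)  # words that may still be revised

    def add(self, word: str) -> None:
        """Append a new tentative word to the hypothesis."""
        self.tentative.append(word)

    def revoke(self):
        """Retract the most recent tentative word (hypothesis revision)."""
        return self.tentative.pop() if self.tentative else None

    def commit(self, n: int = 1) -> None:
        """Freeze the n oldest tentative words; they become immutable."""
        self.committed.extend(self.tentative[:n])
        del self.tentative[:n]

    def current(self) -> list:
        """The full current hypothesis, committed prefix first."""
        return self.committed + self.tentative

# Example of hypothesis revision: "for" is retracted and replaced by "four".
h = IncrementalHypothesis()
h.add("for")        # recognizer first hears "for"
h.revoke()          # new audio evidence contradicts it
h.add("four")       # revised hypothesis
h.add("apples")
h.commit(1)         # "four" is now stable downstream
print(h.current())  # ['four', 'apples']
```

Downstream modules (understanding, as ck notes) can safely consume the committed prefix, while the tentative suffix may still be revoked.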
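The interruption/turn-taking discussion (19:11-19:13) can be illustrated with a deliberately crude baseline: the agent takes the turn only after the user has been silent for a fixed number of voice-activity frames. This silence-threshold heuristic is an assumption for illustration only; the learned feedback models referenced (e.g. Koji Inoue's duplex work) use richer prosodic and semantic signals:

```python
def agent_may_speak(user_vad, silence_frames=3):
    """Toy turn-taking baseline over a per-frame voice-activity sequence.

    user_vad: iterable of booleans, True = user is speaking in that frame.
    Returns one boolean per frame: may the agent take the turn here?
    """
    decisions = []
    silence = 0
    for speaking in user_vad:
        # Count consecutive silent frames; reset whenever the user speaks.
        silence = 0 if speaking else silence + 1
        decisions.append(silence >= silence_frames)
    return decisions

frames = [True, True, False, False, False, False, True]
print(agent_may_speak(frames))  # [False, False, False, False, True, True, False]
```

The weakness ck points at is visible even here: a fixed threshold cannot decide *whether* an interruption is appropriate (e.g. during a lane change), only *when* a gap occurs.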
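The EmotionML format kaz mentioned (https://www.w3.org/TR/emotionml/) is an XML vocabulary for annotating emotion, e.g. the "disgust" category ec brought up. A minimal sketch building such an annotation with Python's standard library; the namespace, the `big6` category-set URI, and the `<emotion>`/`<category>` elements come from the spec, while the surrounding code is illustrative:

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2009/10/emotionml"
ET.register_namespace("", NS)  # serialize with EmotionML as the default namespace

# Root element declares which emotion vocabulary the categories come from;
# "big6" is the six-category vocabulary that includes "disgust".
root = ET.Element(f"{{{NS}}}emotionml", {
    "category-set": "http://www.w3.org/TR/emotion-voc/xml#big6",
})
emotion = ET.SubElement(root, f"{{{NS}}}emotion")
ET.SubElement(emotion, f"{{{NS}}}category", {"name": "disgust", "confidence": "0.8"})

print(ET.tostring(root, encoding="unicode"))
```

A voice agent could emit annotations like this as a standards-based way to pass detected emotion between components, which is the kind of existing piece kaz suggests checking against actual use cases.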