Meeting minutes
ck: (shows a research paper)
… (about speech recognizer)
aclanthology.org/E09-1081.pdf
ck: there are 3 operations
… example of hypothesis revision
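The three operations mentioned above can be sketched as a diff between two successive recognizer hypotheses. This is a minimal illustration, assuming the operations are add, revoke, and commit as in the incremental-processing literature; the function and token names below are ours, not the paper's API, and "commit" (freezing the stable prefix) is omitted for brevity.

```python
def edits(prev, curr):
    """Diff two successive ASR hypotheses (token lists) into
    revoke/add operations relative to their common prefix."""
    i = 0
    while i < min(len(prev), len(curr)) and prev[i] == curr[i]:
        i += 1
    ops = [("revoke", tok) for tok in reversed(prev[i:])]  # retract newest first
    ops += [("add", tok) for tok in curr[i:]]
    return ops

# Example: the recognizer first hears "forty", then revises to "four tea".
print(edits(["forty"], ["four", "tea"]))
# [('revoke', 'forty'), ('add', 'four'), ('add', 'tea')]
```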
rt: (leaves)
ec: does backtracking involve some sort of semantics?
… vector of meaning
ck: recognizer doesn't detect meaning itself
… there is no need for revoking or updating
… the next step is understanding
… some map for specific task
… models for tasks
… then semantics model
… hypothesis applied
… the problem with recurrent networks
… with larger context
… the other problem is that training takes a lot of time
… memory errors, etc.
ec: in some way biased
ck: bias is there
… but understanding is not
… generalization is a bias
… none of the language models here is incremental
… output is incremental
… and input can be
… kind of R&D stage now
ec: an agent listening to me
ck: fairly well-researched question. Frankie was talking about a car environment.
ec: we did studies about a similar situation (a car environment)
ck: people are chatting with AI agents
… worth speaking with humans
ec: (people) have zero recollection of what was done while driving
ck: drivers are paying attention to driving
… that's kind of another direction
… the model should interrupt you depending on the situation
… the work by Koji Inoue
… doing turn-taking
… 4 duplex microphones
… feedback model has signal
… good starting point for interruption model
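The interruption model above can be sketched as a simple situation-aware policy: the agent takes the turn only when the feedback signals allow it. All signals and thresholds below are invented for illustration; they are not from Inoue's work or any system discussed here.

```python
def may_interrupt(silence_ms, driver_load):
    """Toy policy: allow the agent to take the turn only after enough
    silence (a crude turn-taking cue) and only when the driving
    situation is calm (load below 0.5, an invented threshold)."""
    return silence_ms >= 600 and driver_load < 0.5

print(may_interrupt(800, 0.2))   # long pause, calm driving -> True
print(may_interrupt(800, 0.9))   # lane change in progress -> False
```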
ec: who is speaking is key
ck: we have two-speaker diarization
… (describes a model)
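The two-speaker diarization just described was not captured in detail, so here is only a toy illustration of the idea, assuming per-segment speaker embeddings are already available; the embeddings, threshold, and greedy assignment are invented for the sketch, not the model discussed.

```python
import math

def cos(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def diarize_two(embeddings, threshold=0.5):
    """Greedy two-speaker assignment: the first segment founds speaker 0;
    later segments join speaker 0 if similar enough, else speaker 1."""
    labels, ref = [], None
    for e in embeddings:
        if ref is None:
            ref = e          # embedding representing speaker 0
            labels.append(0)
        else:
            labels.append(0 if cos(e, ref) >= threshold else 1)
    return labels

segments = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.95, 0.05)]
print(diarize_two(segments))  # [0, 0, 1, 0]
```

A real system would instead cluster neural speaker embeddings over time; this only shows the segment-to-speaker assignment step.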
ec: Amazon, etc., handle different speakers at home, for example
ck: diarization is a question at the moment
ec: talking to Agents is interesting
… learning conversation style might be a bit different
ck: think that's a feature or a bug
… it's not having state information
… maybe language model should handle this
… (problem with in-car system)
… what's on my calendar today?
… need a safe place to listen to the response
… want some sort of introduction
… the hard thing is the difficulty with lane changes, etc.
ec: agree that's important
kaz: which part to be standardized within W3C?
ck: not at the moment
… but streaming?
… for agentic architecture
… streaming language model
kaz: there are many related standards already
… as possible pieces
… should dive into actual use cases and see what is still missing
ck: robots would be a promising use case
… humans' expectations are much higher
… maybe LLM can do something
… interactive quality of understanding feedback
… I utter something, then robot responds
… but that's frustrating
… if robots could not just hook up APIs but also support natural communication, that would be great
kaz: thanks
ec: @@@
ck: people treat robot as having some emotion/human quality
… using a robot like Wally
… should use appropriate vocabulary for users' age
ec: what is your future?
ck: very different topic
… emotion
ec: read Minsky's book?
ck: not that but interesting research papers
… some funding for emotion for robots
ec: note that actors act emotionally
ck: there are a lot of culture-specific emotions
ec: think "disgust" is one of the universal emotions
kaz: (mentions generic format to handle emotion information, EmotionML)
https://
ec: @@@
ck: there are researchers on that
… check ACL
… Microsoft research, etc.
… multi-user attention/engagement