Voice Interaction – 24 April 2024

Meeting minutes

reference implementation

https://github.com/w3c/voiceinteraction/tree/master/source/w3cipa

dirk: review reference implementation
… ChatGPT and Mistral
… framework is mainly headers
… component that accesses GPT, also demo program

dirk reviews SOURCE.md

input listener for modality inputs
… in this case just selects the first one
… goes to ModalityManager
… can add modality components as you like
… startInput and handleOutput
… this is part of the framework, so Royalty Free
… modality type is a free string so it can be extensible
… only text is implemented in the reference implementation
… some modality components could be both input and output
… one instance that knows all listeners and that all modality components would know
… looking at one example of a modality component, textModality
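
A minimal sketch of the input/output flow described above, for illustration only. Only the names ModalityManager, startInput, handleOutput and the "take first" behaviour come from the discussion; the Python class layout, method signatures and the TakeFirstListener name are assumptions and are not taken from the reference implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical listener signature: (modality_type, data). Not the real API.
InputListener = Callable[[str, str], None]


class TextModality:
    """Toy text modality component acting as both input and output."""

    modality_type = "text"  # free-form string, so further types can be added

    def __init__(self, pending_inputs: List[str]) -> None:
        self.pending_inputs = pending_inputs
        self.outputs: List[str] = []

    def start_input(self, listener: InputListener) -> None:
        # Hand every pending input to the listener.
        for data in self.pending_inputs:
            listener(self.modality_type, data)

    def handle_output(self, data: str) -> None:
        self.outputs.append(data)


class TakeFirstListener:
    """Input listener that keeps only the first input it is notified about."""

    def __init__(self) -> None:
        self.selected: Optional[Tuple[str, str]] = None

    def __call__(self, modality_type: str, data: str) -> None:
        if self.selected is None:
            self.selected = (modality_type, data)


class ModalityManager:
    """Single instance that knows the listener and all modality components."""

    def __init__(self, listener: InputListener) -> None:
        self.listener = listener
        self.components: Dict[str, List[TextModality]] = {}

    def add_component(self, component: TextModality) -> None:
        # Components are grouped by their free-form modality type string.
        self.components.setdefault(component.modality_type, []).append(component)

    def start_input(self) -> None:
        for group in self.components.values():
            for component in group:
                component.start_input(self.listener)

    def handle_output(self, modality_type: str, data: str) -> None:
        # Route output only to components registered for this modality type.
        for component in self.components.get(modality_type, []):
            component.handle_output(data)


listener = TakeFirstListener()
manager = ModalityManager(listener)
text = TextModality(["hello", "second input"])
manager.add_component(text)
manager.start_input()
manager.handle_output("text", "hi there")
```

In this sketch the listener ends up holding only the first text input, and output addressed to the "text" modality reaches the text component.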

debbie: can there be more than one InputModalityComponent?

dirk: in theory, yes
… we might have scaling issues with multiple text inputs, for example

debbie: take "first" out of the name "TakeFirstInputModalityComponent" to make it more general

dirk: moving on to DialogLayer, IPA Service
… IPA for both local and anything else we have
… ReferenceIPAService consumes data from Client
… could serve multiple clients or if we have local and other IPA services
… no DialogManager in place
… if there were one, the IPA service would send the input to it, and would then forward the output back to the client
… the ExternalIPA/Provider Selection Service
… the Provider Selection Service for now only knows about ChatGPT
… IPA provider supports input from different modalities
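
A rough sketch of the routing described above, under stated assumptions. The class and method names are hypothetical, the provider is a local stub that makes no network call, and the fallback of sending input straight to the provider selection service when no DialogManager is present is an assumption, not something confirmed in the discussion.

```python
from typing import Callable, Dict, Optional

# Hypothetical types: a provider or dialog manager maps input text to output text.
Provider = Callable[[str], str]
DialogManager = Callable[[str], str]


class ProviderSelectionService:
    """Knows the registered IPA providers and picks one to answer."""

    def __init__(self, providers: Dict[str, Provider]) -> None:
        self.providers = providers

    def process(self, text: str) -> str:
        if not self.providers:
            raise LookupError("no IPA provider registered")
        # With a single registered provider there is nothing to choose between,
        # so this sketch simply takes the first one.
        provider = next(iter(self.providers.values()))
        return provider(text)


class IPAService:
    """Consumes input from a client and returns the output to it."""

    def __init__(
        self,
        selection: ProviderSelectionService,
        dialog_manager: Optional[DialogManager] = None,
    ) -> None:
        self.selection = selection
        self.dialog_manager = dialog_manager

    def handle_client_input(self, text: str) -> str:
        if self.dialog_manager is not None:
            # Send the input to the dialog manager, then forward its output.
            return self.dialog_manager(text)
        # Assumed fallback: no dialog manager, go directly to provider selection.
        return self.selection.process(text)


def stub_provider(text: str) -> str:
    # Stand-in for an external provider; no real service is contacted.
    return "[stub provider] " + text


selection = ProviderSelectionService({"stub": stub_provider})
service = IPAService(selection)
reply = service.handle_client_input("hello")

service_with_dm = IPAService(selection, dialog_manager=lambda t: "dm saw: " + t)
dm_reply = service_with_dm.handle_client_input("hello")
```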

debbie: should we standardize on defined modality types, e.g. "voice" vs "speech"?

dirk: would like to talk about ProviderSelectionStrategy and how components are glued together

debbie: we can talk more in the next call
… could we list the parts of the architecture that aren't implemented yet?

dirk: that might make sense

debbie: could there be a UML diagram?

dirk: there could be more diagrams
… could link from code to specification

dirk: next time talk about the provider selection strategy and how to chain everything together

debbie: will try running

dirk: demo running with ChatGPT

gerard: which version of Mixtral do you use?
… open source version

hugues: the next version will not be open source

gerard: the approach is mixture of experts

dirk: what happens if we ask both at the same time?
… would receive them both

gerard: could use an LLM to summarize
… that's what Mixtral is using with the Mixture of Experts
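
One way to picture asking both providers at the same time and receiving both answers, as a sketch only. The function names are hypothetical, the two providers are local stubs rather than real services, and the combine step is a plain join standing in for the LLM-based summary that was suggested.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Provider = Callable[[str], str]  # hypothetical: prompt in, answer out


def ask_all(providers: Dict[str, Provider], prompt: str) -> Dict[str, str]:
    """Send the same prompt to every provider concurrently and collect all answers."""
    if not providers:
        return {}
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in providers.items()}
        # result() blocks until each provider has answered.
        return {name: future.result() for name, future in futures.items()}


def combine(answers: Dict[str, str]) -> str:
    # Placeholder for a summarization step; here the answers are only joined.
    return "\n".join(f"{name}: {answer}" for name, answer in answers.items())


# Stub providers for illustration; no real service is contacted.
stub_providers: Dict[str, Provider] = {
    "chatgpt": lambda prompt: "stub-chatgpt: " + prompt,
    "mistral": lambda prompt: "stub-mistral: " + prompt,
}

answers = ask_all(stub_providers, "hi")
summary = combine(answers)
```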
