W3C

– DRAFT –
Voice Interaction

04 October 2023

Attendees

Present
debbie, dirk, gerard, jon, noreen
Regrets
bev
Chair
debbie
Scribe
ddahl

Meeting minutes

gerard: interested in embedding conversational AI in mobile devices

dirk: interested in standardizing voice interaction
… curious to learn from Gerard about security

github issues

dirk: close issue #5
… in the architecture document

dirk: description of Russian doll principle (#40)

noreen: looking at Russian doll in Wikipedia

https://en.m.wikipedia.org/wiki/Matryoshka_doll

noreen: it would be interesting to find a stable reference to that metaphor

noreen: will look for a reference

debbie: we agree to include a reference if noreen can find something appropriate

dirk: will add noreen to github (nwhysel)

dirk: roles and responsibilities (issue #36)

dirk: (reviews roles and responsibilities)

debbie: what about the provider of the IPA?

noreen: could be integrator

jon: this participant has multiple roles, e.g. designer and integrator

noreen: should disambiguate owner and user

dirk: user owns speaker, but someone in the house might be using it

noreen: two potential owners, bank and user

dirk: replace owner with platform provider?
… should not mix up the hardware device with something that the provider provides

jon: platform, enterprise owner, user
… Amazon has multiple roles in this scheme

jon: if we envision this architecture as a guide for independent IPAs we have three roles
… if it's a consumer-facing IPA (like an app) there would be two

debbie: should we add examples?

dirk: that would help

dirk: will revise list with examples

jon: will add examples from enterprise provider (3 roles)

debbie: revisit this next time

compare OVON and voice interaction work

debbie: looks at OVON clusters and focus items https://lists.w3.org/Archives/Public/public-voiceinteraction/2023Jul/att-0001/overlapOvonClusters.pdf

debbie: the most mature OVON specs are dialog events and interagent protocols
… let's compare dialog events and interfaces
… there is a spec for dialog events, but an example is easier to look at: https://github.com/open-voice-network/lib-interop/blob/main/python/sample-json/example-ovon-user-input-minimal.json
… that is for OVON; compare with the interfaces document https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paInterfaces/paInterfaces.htm (section 4.1)
… OVON has speaker id for either user or system

dirk: should add that to VI

sending audio data

dirk: two cases: the message announces that the user started speaking and the utterance is sent once finished (endpointed), or the audio is streamed; the audio itself is sent by some other means
… either sender or receiver could endpoint
… message says "user has started speaking, look here for the audio"
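
A minimal sketch of the two cases dirk describes, for illustration only. The class and field names (`AudioNotice`, `event`, `speaker_id`, `audio_ref`, `endpointed_by`) and the example.org URLs are assumptions; they come from neither the OVON dialog event spec nor the interfaces document. The message carries only a pointer to the audio, which travels by some other means.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AudioNotice:
    """Hypothetical message: says where the audio is, never carries it."""
    event: str                           # "utterance_finished" or "speech_started"
    speaker_id: str                      # who is speaking (user or system)
    audio_ref: str                       # where the receiver can find the audio
    endpointed_by: Optional[str] = None  # "sender", "receiver", or None if open


def endpointed_notice(speaker_id: str, audio_ref: str) -> AudioNotice:
    # Case 1: the sender has already endpointed the utterance.
    return AudioNotice("utterance_finished", speaker_id, audio_ref, "sender")


def streaming_notice(speaker_id: str, audio_ref: str) -> AudioNotice:
    # Case 2: the user has started speaking; audio is streamed elsewhere
    # and the receiver may do the endpointing later.
    return AudioNotice("speech_started", speaker_id, audio_ref)


def to_json(notice: AudioNotice) -> str:
    return json.dumps(asdict(notice))


if __name__ == "__main__":
    print(to_json(endpointed_notice("user-1", "https://example.org/audio/utt-1.wav")))
    print(to_json(streaming_notice("user-1", "https://example.org/audio/stream-1")))
```

Leaving `endpointed_by` unset in the streaming case reflects the point that either sender or receiver could endpoint.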

debbie: will compare and contrast dialog events and interfaces

dirk: will review

dirk: suggest putting use case task force on the agenda

debbie: agrees

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).
