Voice Interaction Community Group

Meeting minutes

dirk's walkthrough updates

https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paArchitecture-1-1.htm#walkthrough

dirk: does walkthrough match description?
… this is just the subset that matches part of the text

Action: debbie to update text description in Section 4

dirk: reviews Figure 3

dirk: combination of local stuff and remote
… knowledge graph can be used to find location given GPS

dirk: the ASR result is sent in parallel to both NLUs (local and remote) (step 6)
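
[Editor's sketch, not part of the discussion: one way to picture the parallel dispatch in step 6. The functions local_nlu and remote_nlu, the result fields, and the simulated latency are all hypothetical placeholders; the architecture draft does not define them.]

```python
import asyncio

# Hypothetical stand-ins for the local and remote NLU components.
async def local_nlu(text: str) -> dict:
    return {"source": "local", "text": text}

async def remote_nlu(text: str) -> dict:
    await asyncio.sleep(0.01)  # simulated network latency (assumption)
    return {"source": "remote", "text": text}

async def dispatch_asr_result(text: str) -> list:
    # Send the same ASR result to both NLUs concurrently and wait for both.
    # asyncio.gather returns results in the order the awaitables were passed.
    return list(await asyncio.gather(local_nlu(text), remote_nlu(text)))

results = asyncio.run(dispatch_asr_result("what is the weather in Berlin"))
```

[Both NLUs receive identical input here; how their results are later merged is left open, as it was in the meeting.]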

jim: what information is given to provider registry?
… how does the provider registry work?

dirk: doesn't know anything about utterance semantics
… it just tries to determine the number of registered IPA providers
… it can use context
… or user preferences

jim: what do input and output from provider registry look like?

dirk: provider selection service gets the ASR result

debbie: what information is in step 9?

dirk: it's always the same, just a query for all providers
… the provider registry could be more or less smart

jim: after provider selection service gets an ASR result it queries the Provider Registry for all providers

debbie: what about steps 10 and 11?

dirk: for each provider returned from the registry, authentication information is returned
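
[Editor's sketch, not part of the discussion: the minimal flow described for steps 9 to 11, in which the provider selection service receives an ASR result, queries the registry for all providers, and gets authentication information back for each. All class names, method names, and the auth_token field are hypothetical.]

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    auth_token: str  # placeholder for the authentication information (assumption)

class ProviderRegistry:
    def __init__(self, providers):
        self._providers = list(providers)

    def query_all(self):
        # The registry knows nothing about utterance semantics:
        # the query is always the same and returns every registered provider.
        return list(self._providers)

class ProviderSelectionService:
    def __init__(self, registry):
        self._registry = registry

    def select(self, asr_result: str):
        # Minimal strategy: pair the ASR result with every provider's credentials.
        return [(p.name, p.auth_token, asr_result) for p in self._registry.query_all()]

registry = ProviderRegistry([Provider("ipa-1", "token-1"), Provider("ipa-2", "token-2")])
targets = ProviderSelectionService(registry).select("book a flight")
```

[A smarter registry or selection strategy, for example one that uses context or user preferences, would filter inside query_all or select; this sketch shows only the minimum.]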

jim: suggest sending NLU to provider selection service instead of ASR result

dirk: or could just send the raw audio

jim: right now the Provider Registry sends all the providers; if we use NLU we could be more selective about which providers we return

dirk: this is the provider selection strategy

dirk: this is the minimum and the overall system could be smarter

dirk: then arrow 6 could be after the NLU

dirk: providers could use their own NLU

jim: what's the difference between the orange (Dialog) NLU and the specific NLUs in the IPA providers?

jon: what if IPA provider 1 is American? Then it could have a full complement of components and functionality.
… we now have text coming from the Dialog box

dirk: could come up with different walkthroughs describing different alternatives

jim: is one of the goals to enable IPA providers to support any input?
… are the inputs in step 13 in a standardized format?

dirk: 1 -- raw audio, raw ASR or NLU

jim: suggest restricting to one alternative

debbie: I agree

debbie: maybe rule out NLU as a format

dirk: raw audio would be the most common
… or text from a chatbot or ASR

jim: future enhancement would be to standardize NLU, but not covered yet

jim: what is the purpose of the NLU in the Dialog box?

dirk: that's the local NLU that processes some of the queries

Action: dirk to change arrows to reflect the primary use case

output side of walkthrough

dirk: arrow 1 is the various NLU results

dirk: asks dialog registry for the best dialog

dirk: may need to query for missing slots
… gets next dialog move
… NLG comes up with text, doesn't repeat information that's already known to the user

jim: we have distributed semantic processing, some done in orange and some in blue

dirk: yes

jim: this might be too confusing

dirk: there are some implementations, but they are not standardized

debbie: how do you find the Dialog that goes with the NLU?
… if the intent is "book flight" then there could be lots of dialogs that could handle that intent
… how do you associate that intent with the appropriate dialog?
… so actually intents and entities come back
… and that's mapped to dialogs

dirk: that's correct
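
[Editor's sketch, not part of the discussion: a simplified picture of mapping returned intents and entities to a dialog and then querying for missing slots. The registry contents, slot names, and move labels are hypothetical, and the sketch assumes exactly one dialog per intent, which sidesteps debbie's question about several dialogs handling the same intent.]

```python
# Hypothetical dialog registry: intent name -> (dialog id, required slots).
DIALOG_REGISTRY = {
    "book_flight": ("flight_booking_dialog", ["origin", "destination", "date"]),
}

def next_dialog_move(nlu_result: dict) -> dict:
    # Look up the dialog registered for the intent (one-to-one by assumption).
    dialog_id, required = DIALOG_REGISTRY[nlu_result["intent"]]
    # Slots not yet filled by the returned entities must still be asked for.
    missing = [slot for slot in required if slot not in nlu_result["entities"]]
    if missing:
        return {"dialog": dialog_id, "move": "ask", "slot": missing[0]}
    return {"dialog": dialog_id, "move": "confirm", "slot": None}

move = next_dialog_move({"intent": "book_flight", "entities": {"destination": "Berlin"}})
```

[Here the next move asks for "origin", the first unfilled slot; "destination" is already known and is not asked for again.]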

debbie: when do you actually get your information from the database?

dirk: we haven't gotten to that yet, I could add that

dirk: things need to be clearer

Action: debbie to pull together list of outstanding comments

dirk: also collected some suggestions from a presentation
… what should we do to prove that we did what you describe?

jim: eventually there will be conformance tests

– DRAFT –
24 March 2021