Existing W3C voice interaction standards such as VoiceXML are based on use cases centered around telephony-based voice systems. The typical interaction style that these standards support is system-initiated directed dialog using grammars to constrain the speech recognizer. In recent years, interaction with voice applications has become much more flexible, with a user-initiated dialog style and significantly fewer constraints on spoken input.
Many of these new applications take the form of "virtual assistants". These include general-purpose assistants (for example, Siri, Cortana, Google Now and Alexa) as well as virtual assistants with specialized domain expertise. The proposed Community Group will collect new use cases for voice interaction, develop requirements for applications such as virtual assistants, and explore areas for possible standardization, producing specifications if appropriate. Depending on interest, this exploration could include topics such as (1) discovery of virtual assistants with specific expertise, for example a way to find a virtual assistant that can supply weather information; (2) standard formats for statistical language models for speech recognizers; (3) standard representations for references to common concepts such as time; (4) interoperability for conversational interfaces; and (5) dialogue management or "workflow" languages. New functionality for existing voice standards can also be a topic of discussion.
Speech application developers and voice user interface designers should be particularly interested in this group.
I was talking to Debbie the other day about one of her upcoming sessions, and it got me thinking about how standards could be applied (at a high level, for now) to the IVA (intelligent virtual assistant) specialists that are being created around the world as I write this post. I'll put my initial thoughts here and maybe stir up a conversation from there.
…As of now, I see folks writing new skills for Amazon because they have one in their house, and others writing API.ai extensions for Google Home because they have one of those. When the conversation shifts toward a person, company, or service provider wanting to expose their services to anyone who wants them, this [a standard communication mechanism] will become a very important topic.
Standards will have to cover things like interface contracts: what to send and what to expect in return. A standard like an "IVAXML", for example, would help folks expose their "experts" so they can be consumed by anyone who wants to interact with them.
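To make the idea of an interface contract concrete, here is a minimal Python sketch of what a request/response contract for an expert might look like. It does not reflect any existing "IVAXML" specification. The class names, the fields (`intent`, `utterance`, `slots`, `locale`, `status`, `speech`), and the `WeatherExpert` example are all hypothetical. JSON is used only as an illustrative wire format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Protocol


@dataclass
class ExpertRequest:
    """What a caller sends to an expert (hypothetical fields, not a real standard)."""
    intent: str                                          # e.g. "get_forecast"
    utterance: str                                       # the user's original words
    slots: dict[str, str] = field(default_factory=dict)  # extracted parameters
    locale: str = "en-US"


@dataclass
class ExpertResponse:
    """What the caller can expect back (hypothetical fields, not a real standard)."""
    status: str   # e.g. "ok" or "unsupported_intent"
    speech: str   # text for the assistant to speak


class Expert(Protocol):
    """Any object with these members satisfies the hypothetical contract."""
    name: str
    intents: frozenset[str]

    def handle(self, request: ExpertRequest) -> ExpertResponse: ...


class WeatherExpert:
    """Toy expert used only to illustrate the contract."""
    name = "weather"
    intents = frozenset({"get_forecast"})

    def handle(self, request: ExpertRequest) -> ExpertResponse:
        # Reject intents this expert did not advertise.
        if request.intent not in self.intents:
            return ExpertResponse("unsupported_intent", "Sorry, I can't help with that.")
        city = request.slots.get("city", "your area")
        return ExpertResponse("ok", f"Here is the forecast for {city}.")


def to_wire(request: ExpertRequest) -> str:
    """Serialize a request to JSON, one possible wire format among many."""
    return json.dumps(asdict(request))
```

Any assistant that can produce an `ExpertRequest` and read an `ExpertResponse` could call any expert implementing `handle`. That independence from the host assistant is the property a shared standard would need to guarantee.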
I also think the ability to search for an "expert" by service criteria will be a much-needed standard, almost like an intent-handler search. For example, you may want to ask, "What's the balance in my checking account?" There could be hundreds of banking experts, so you'd want to search for your own bank within the banking domain. Imagine the power of a system where you can drag and drop "experts" into Siri and Alexa, so that whether you're on your phone or at home, you can ask the same question and get the same response.
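A search by service criteria could behave like a directory query: filter published listings by domain, then narrow by the intent the user needs and, optionally, the provider. The Python sketch below uses an in-memory list as a stand-in for a discovery service. The `ExpertListing` schema, the `ExpertDirectory` class, and the provider names in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExpertListing:
    """A directory entry an expert might publish (hypothetical schema)."""
    name: str
    provider: str
    domain: str               # e.g. "banking", "weather"
    intents: frozenset[str]   # intents the expert can handle


class ExpertDirectory:
    """In-memory stand-in for a discovery service."""

    def __init__(self) -> None:
        self._listings: list[ExpertListing] = []

    def register(self, listing: ExpertListing) -> None:
        self._listings.append(listing)

    def search(self, domain: str, intent: Optional[str] = None,
               provider: Optional[str] = None) -> list[ExpertListing]:
        """Return listings matching every criterion that was supplied."""
        return [
            listing for listing in self._listings
            if listing.domain == domain
            and (intent is None or intent in listing.intents)
            and (provider is None or listing.provider == provider)
        ]
```

For the checking-account question, a client might call `directory.search("banking", intent="check_balance", provider="Example Bank")` (hypothetical names) to narrow hundreds of banking experts down to the user's own bank.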
This highlights the importance of a gateway IVA that can route your question or command, via intent inference, to the correct handler, or expert. You can choose your own group of IVSs (specialists). This also creates the need for an authentication standard for IVA/IVS systems, so they know who they are talking to in a secure, extensible way.
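The gateway idea can be sketched as two steps: infer an intent from the utterance, then hand the request to whichever of the user's chosen experts declares that intent. In the Python sketch below, a hypothetical keyword table stands in for real intent inference, which in practice would use an NLU model. The `Expert` and `GatewayIVA` names are also hypothetical, and authentication is deliberately left out.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical keyword table standing in for a real intent-inference model.
KEYWORD_INTENTS = {
    "balance": "check_balance",
    "forecast": "get_forecast",
    "weather": "get_forecast",
}


def infer_intent(utterance: str) -> Optional[str]:
    """Crude intent inference: the first known keyword in the utterance wins."""
    for word in re.findall(r"[a-z']+", utterance.lower()):
        if word in KEYWORD_INTENTS:
            return KEYWORD_INTENTS[word]
    return None


@dataclass
class Expert:
    name: str
    intents: frozenset[str]
    handler: Callable[[str], str]  # takes the utterance, returns a spoken reply


@dataclass
class GatewayIVA:
    experts: list[Expert] = field(default_factory=list)  # the user's chosen specialists

    def add_expert(self, expert: Expert) -> None:
        self.experts.append(expert)

    def route(self, utterance: str) -> str:
        """Infer the intent, then delegate to the first expert that handles it."""
        intent = infer_intent(utterance)
        if intent is None:
            return "Sorry, I didn't understand that."
        for expert in self.experts:
            if intent in expert.intents:
                return expert.handler(utterance)
        return f"None of your experts can handle '{intent}'."
```

Because the set of experts belongs to the user rather than to a particular device, the same `GatewayIVA` configuration could in principle answer the same question on a phone or a smart speaker.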