Inter-IVA Communication Standard
I was talking to Debbie the other day about one of her upcoming sessions, and it got me thinking about how standards could be applied (at a high level, at this point) to the IVA specialists of the world that are being created as I write this post. I’ll put the initial thoughts here and maybe stir up a conversation from there.
As of now, I see folks writing new skills for Amazon because they have one in their house, and others writing API.ai extensions for Google Home…because they have one. When the conversation changes to the person, company, or service provider wanting to expose their services to anyone who wants them, this [standard communication mechanism] will be a very important topic.
Standards will have to cover things like interface contracts: what to send and what to expect in return. A standard like an “IVAXML”, for example, would be helpful so folks can expose their “experts” and have them consumed by anyone who wants to interact.
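To make the contract idea concrete, here’s a minimal sketch in Python rather than XML, just to show the shape such a contract might take. Everything here – the `IVARequest`/`IVAResponse` names and their fields – is my own invention for illustration, not part of any existing spec:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of an "IVAXML"-style interface contract,
# expressed as dataclasses for brevity. Field names are illustrative.
@dataclass
class IVARequest:
    utterance: str               # raw text of the user's request
    intent: Optional[str] = None # pre-inferred intent, if the caller has one
    locale: str = "en-US"
    context: dict = field(default_factory=dict)  # session state, auth, etc.

@dataclass
class IVAResponse:
    speech: str                  # text the device should speak back
    data: dict = field(default_factory=dict)     # structured payload for rich clients
    follow_up: bool = False      # whether the expert expects another turn

req = IVARequest(utterance="What's the balance in my checking account?")
resp = IVAResponse(speech="Your checking balance is $1,024.00")
```

The point is that any “expert” exposing these two shapes could be consumed by any front end that speaks the same contract.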
I also think the ability to search for an “expert” by service criteria will be a much-needed standard, almost like an intent-handler search. For example, you may want to ask, “What’s the balance in my checking account?” There could be hundreds of banking experts, and you’d want to search for your bank within a specific domain. Imagine the power of such a system, where you could drag and drop “experts” into Siri and Alexa so that whether you’re on your phone or at home, you can ask the same question and get the same response.
This gives rise to the importance of a Gateway IVA that can route your question or command, via intent inference, to the correct handler, or expert. You could choose your own group of IVSs (IVA specialists) specific to you. This also brings about the need for an authentication standard around IVA/IVS systems, so they know who they are talking to in a secure, extensible way.
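Here’s a toy sketch of what that gateway routing could look like. The keyword matching below is just a stand-in for a real trained NLP model, and the expert names and intents are made up:

```python
# Hypothetical Gateway IVA sketch: a registry of "experts" plus a naive
# keyword-based intent inferencer standing in for a trained NLP model.
experts = {
    "banking.balance": lambda utt: "Your checking balance is $1,024.00",
    "home.lights": lambda utt: "Turning on the lights.",
}

# Keyword rules are a placeholder; a real gateway would use a classifier.
intent_keywords = {
    "banking.balance": ["balance", "account"],
    "home.lights": ["lights", "lamp"],
}

def infer_intent(utterance):
    """Return the first intent whose keywords appear in the utterance."""
    words = utterance.lower()
    for intent, keywords in intent_keywords.items():
        if any(k in words for k in keywords):
            return intent
    return None

def route(utterance):
    """Infer the intent and dispatch to the matching expert, if any."""
    intent = infer_intent(utterance)
    if intent is None:
        return "Sorry, I don't have an expert for that."
    return experts[intent](utterance)

print(route("What's the balance in my checking account?"))
```

Swapping the keyword table for a trained model, and the lambda experts for remote IVS endpoints, gives the router/engine hybrid discussed below.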
Good read and glad to see new activity on this site. I’m new to this group and was wondering what the process is to get the ball rolling with proposed specifications such as Gateway IVA. How do you get buy-in or start with the spec?
Furthermore, if I understand your post correctly, the Gateway is more of a router than an engine? Does the communication between Alexa and Google Home take place through REST using a common UI? Forgive me if I’m flubbing the vocabulary here – sounds like a much-needed capability for users…
Sorry for the very late response! I wanted to comment on how to get the ball rolling. You can start a discussion like this, or put together a draft like the one we just published and circulate it for comments in the group. When we agree that it’s ready we can publish it. You can do this either as one person or as a team, if a couple of people are interested.
So the Gateway IVA could be an engine of sorts as well as the router. It could own the NLP intent routing: infer the intent via a trained NLP model and route to the handler/specialist of its choice per configuration. Then the question becomes: how does the training of the intent engine happen?
The way I see it, when an IVS is created, it would perhaps have attached “intent training” content that could be used to train your Gateway IVA on how to find it using NLP – perhaps a list of questions/sentences that a user would say to reach this exposed service. Or is this left up to the owner of the Gateway IVA? Or both?
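As a sketch of that first option, an IVS could ship a small manifest of sample utterances that the gateway merges into its own training data at registration time. The manifest format and field names here are hypothetical:

```python
# Hypothetical sketch: an IVS publishes sample utterances alongside its
# service definition, and the Gateway IVA folds them into its own
# intent-training data when the specialist is registered.
ivs_manifest = {
    "service": "acme-bank-expert",        # made-up specialist name
    "intent": "banking.balance",
    "training_utterances": [
        "what's the balance in my checking account",
        "how much money do I have",
        "check my account balance",
    ],
}

gateway_training_data = {}  # maps intent -> list of example utterances

def register_ivs(manifest, training_data):
    """Merge the specialist's attached training content into the gateway's set."""
    intent = manifest["intent"]
    training_data.setdefault(intent, []).extend(manifest["training_utterances"])
    return training_data

register_ivs(ivs_manifest, gateway_training_data)
```

A gateway owner could still layer their own examples on top, which would cover the “or both?” case.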
I see REST with a common contract and interface as at least the starting point for this conversation. Each physical/virtual interface – Home, Alexa, etc. – would send the communication in a RESTful way to the IVS service, and each would know how to handle the request and the returned response. The IVS wouldn’t care who sent it the request, nor where the reply is being sent, as both sides would understand the response format and know how to decipher the exchange.
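A minimal sketch of that exchange, with the HTTP transport omitted and the envelope fields invented for illustration:

```python
import json

# Hypothetical sketch of the common RESTful exchange: any front end
# (Alexa, Google Home, ...) POSTs the same JSON envelope to the IVS
# and gets the same response shape back, so the IVS never needs to
# know which device originated the request.
def make_request(utterance, session_id):
    """What any front end would POST to the IVS endpoint."""
    return json.dumps({
        "version": "1.0",
        "utterance": utterance,
        "session": session_id,
    })

def ivs_handle(raw_request):
    """The IVS parses the standard envelope and replies in the standard shape."""
    req = json.loads(raw_request)
    return json.dumps({
        "version": "1.0",
        "session": req["session"],
        "speech": f"You said: {req['utterance']}",
    })

# Alexa and Google Home would both produce the same envelope:
raw = make_request("what's my balance", "abc-123")
reply = json.loads(ivs_handle(raw))
```

Because both sides only depend on the envelope, the device on either end is interchangeable.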
Does it make sense to start with how to define intents that all services can agree upon – i.e. a “turn on the lights” object that reasonably models the real world and can be updated to accommodate new properties?
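To put a stake in the ground, such a shared “turn on the lights” object might look like the following – the intent name, slot names, and the `extras` extension point are all hypothetical:

```python
# Hypothetical shared intent object for "turn on the lights".
# New properties land in "slots"; unknown future ones ride in "extras"
# so older handlers aren't broken by newer vocabularies.
turn_on_lights = {
    "intent": "home.lights.on",
    "slots": {
        "room": "living room",  # optional refinement of the target
        "brightness": 80,       # percent; a property added after v1
    },
    "extras": {},               # forward-compatible extension point
}
```

Handlers that only understand `intent` can ignore the rest, which is what lets the object be updated over time.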
Some systems have pre-trained intents, for example Houndify has out-of-the-box “domains” (like many others). This could be a starting point to define well-known domains and pre-training recommendations or training sets that will allow the gateway to know where to send the request.
Standard training sets like the Penn Treebank and TIGER are already used to train for POS tagging, parsing, tokenization, etc. There could be a standard set of training data to certify that the Gateway IVA is up to speed on – for lack of a better term – “IVA-Training-Set-2.4.4”, with that spec supporting a set of pre-trained intents or domains, so you can then tell the Gateway IVA which service you’d like to send your request to based on the user’s intention.
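A certification like that could be as simple as a versioned list of utterances paired with their expected intents, which any gateway must classify correctly. Everything below – the set’s contents and the toy classifier – is invented to show the shape:

```python
# Hypothetical sketch of certifying a gateway against a versioned,
# standard evaluation set ("IVA-Training-Set-2.4.4" is the made-up
# name from the discussion above).
certification_set = {
    "name": "IVA-Training-Set-2.4.4",
    "cases": [
        ("turn on the lights", "home.lights.on"),
        ("what's my checking balance", "banking.balance"),
    ],
}

def certify(gateway_classify, cert_set):
    """Return True if the gateway's classifier matches every expected intent."""
    return all(gateway_classify(utt) == intent
               for utt, intent in cert_set["cases"])

# A toy classifier standing in for the gateway's trained model:
def toy_classify(utterance):
    return "home.lights.on" if "lights" in utterance else "banking.balance"

print(certify(toy_classify, certification_set))  # True
```

A real spec would need far more cases and probably an accuracy threshold rather than all-or-nothing, but the mechanism is the same.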
I think I understand – on a practical level, however, what would the specification be, and how would it be categorized? I think a number of people on this list were involved with VoiceXML, an XML specification and precursor to the next wave. What you’re describing is an open-source engine of sorts, more like Apache, correct? Or is it a hybrid of vocabulary and engine…
Don’t laugh me off the stage, but I was wondering if anyone had any thoughts about the following: a low-tech approach to interoperability – human language itself. Assuming that all systems are independently capable in terms of basic voice recognition and understanding, what is the English-language schema, or “Strunk and White”, for them to work from?
To frame the idea in concrete terms, I asked Google Home: “OK, Google, say hi to Alexa”. And sure enough, Alexa, who was in the same room, responded to that intent by offering to be friends (there are plenty of YouTube vids of these conversations). On another round, however, this command did not work: “OK, Google, say HELLO to Alexa” – a very subtle difference to us, but evidently not to them. Is there a common vocabulary of English words that lends itself better to understanding by all systems?