Using XML for Voice Applications
Philipp Hoschka
Interaction Domain Leader
W3C/INRIA
Why the Interest in Voice?
- "Web approach" lowers telephone service development cost
- On small mobile devices (3G)
  - voice input instead of keyboard
  - voice output instead of screen
"Thumbs are the new fingers ..."
Mercury News, 23 March 2002:
Thumbs are the new fingers for GameBoy youth
The use of gadgets such as mobile phones and GameBoys has caused
a physical mutation in young people's hands, according to
a British Sunday newspaper.
New research carried out in nine cities around the world shows that
the thumbs of people under the age of 25 have taken over as the
hand's most dextrous digit, said The Observer.
...
``Discovering that the younger generation has taken to using
thumbs in a completely different way and are instinctively using
thumbs where the rest of us are using our index fingers is
particularly interesting.''
W3C Voice Activities
- Voice Browser
  - Telephone-based services
  - VoiceXML
  - Purely voice-based
- Multimodal Interaction
  - Focussed on 3G mobile devices
  - Just launched
  - Mix voice with web page
What is a Voice Browser?
- Browser: "What is your departure airport?"
- Caller: "Nice"
- Browser: "Where do you want to fly to?"
- Caller: "Paris"
- Browser: "At what time?"
- ...
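The exchange above is a system-directed dialogue: the browser asks one question at a time and stores each answer. A minimal sketch in Python, assuming a list of caller replies stands in for speech recognition results. The slot names and the third reply ("9 am") are hypothetical and do not come from the slides.

```python
# Hypothetical slot names paired with the prompts from the dialogue above.
PROMPTS = [
    ("departure", "What is your departure airport?"),
    ("destination", "Where do you want to fly to?"),
    ("time", "At what time?"),
]


def run_dialogue(answers):
    """Ask each question in order and record the caller's reply.

    `answers` is an iterable of caller utterances; a real voice browser
    would obtain them from a speech recognizer instead.
    """
    replies = iter(answers)
    slots = {}
    transcript = []
    for name, prompt in PROMPTS:
        transcript.append(f"Browser: {prompt}")
        reply = next(replies)
        transcript.append(f"Caller: {reply}")
        slots[name] = reply
    return slots, transcript


# "9 am" is an invented reply used only to complete the example.
slots, transcript = run_dialogue(["Nice", "Paris", "9 am"])
print("\n".join(transcript))
```

Once every slot is filled, a real service would typically submit the collected values to a web server, in the same way an HTML form is submitted.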
Example VoiceXML Users
Companies Involved
- Editors of Working Drafts
  - AT&T
  - Dynamicsoft
  - IBM
  - Lucent
  - Motorola
  - Nuance Communications
  - PipeBeach
  - SpeechWorks International
  - Tellme
  - ...
Why XML for Voice?
- Leverage Web success factors for voice
  - Entry-level development with text-based editor
  - Learn by reading other people's documents
  - Dynamically generated content possible
  - Content stored on any web server
  - Web developers familiar with angle-brackets
System Architecture
Voice Browser Architecture
W3C XML Markup Languages for Voice
- Dialogue Control: VoiceXML
- Speech Recognition: Speech Recognition Grammar Syntax
- Speech Synthesis: Speech Synthesis Markup Language
VoiceXML
- XML-based programming language for voice applications
- Need to describe
  - System prompt
  - Expected user response
  - Action on expected response
  - Action on unexpected response
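The four items a dialogue must describe map onto the parts of a single input field. A minimal sketch in Python that reads a VoiceXML 2.0-style field and pulls out each part. The markup is illustrative rather than taken from the slides, and the grammar file name `city.grxml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative VoiceXML 2.0-style field; "city.grxml" is a hypothetical file.
FIELD = """
<field name="city">
  <prompt>What city do you want to fly from?</prompt>
  <grammar src="city.grxml"/>
  <filled>Thank you.</filled>
  <nomatch>Sorry, I did not understand.</nomatch>
</field>
"""


def describe_field(xml_text):
    """Return the four parts of a field that the dialogue needs to describe."""
    field = ET.fromstring(xml_text)
    return {
        "system prompt": field.findtext("prompt").strip(),
        "expected user response": field.find("grammar").get("src"),
        "action on expected response": field.findtext("filled").strip(),
        "action on unexpected response": field.findtext("nomatch").strip(),
    }


for part, value in describe_field(FIELD).items():
    print(f"{part}: {value}")
```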
Example: Prompt
<prompt>
  <audio>
    Welcome to the <say-as type="acronym">W3C</say-as>
    Voice <say-as type="acronym">XML</say-as> server.
    Would you like to have more information about the
    architecture domain, the document formats domain, the
    interaction domain, the technology and society domain
    or the Web Accessibility Initiative ?
  </audio>
</prompt>
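To see what the `say-as` markup changes, the following Python sketch flattens a shortened version of the prompt into the text a synthesizer might speak. It assumes that `type="acronym"` means the content is spelled out letter by letter; that reading is an assumption, not something the slides state, and the `<audio>` wrapper is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Shortened version of the prompt above, without the <audio> wrapper.
PROMPT = """
<prompt>
  Welcome to the <say-as type="acronym">W3C</say-as>
  Voice <say-as type="acronym">XML</say-as> server.
</prompt>
"""


def to_spoken_text(xml_text):
    """Flatten a prompt to plain text, spelling out acronyms letter by letter."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        text = child.text or ""
        if child.tag == "say-as" and child.get("type") == "acronym":
            # Assumption: an acronym is read as separate letters.
            text = " ".join(text)
        parts.append(text)
        parts.append(child.tail or "")
    # Collapse line breaks and indentation into single spaces.
    return " ".join("".join(parts).split())


print(to_spoken_text(PROMPT))
```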
Techniques to Improve User Interface
- "Barge-In": User can interrupt prompt with answer
- Alternate prompts: Vary prompt for input
- "Mixed Dialogue": User can give response that does not answer question
  Browser: When will you arrive at the hotel?
  User: I need to rent a car
  Browser: Which company do you prefer?
  ...
- See VoiceXML 2.0 Working Draft
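The "alternate prompts" technique can be sketched as a lookup on the number of attempts: each repetition of the question uses more explicit wording. The Python below is a minimal illustration in the spirit of the `count` attribute on VoiceXML prompts. The second and third wordings are hypothetical, and the selection rule shown is an assumption about how such a counter could be used.

```python
# Prompt variants keyed by the attempt number at which they start to apply.
# Only the first wording comes from the slides; the others are hypothetical.
PROMPTS = {
    1: "When will you arrive at the hotel?",
    2: "Please say your arrival day, for example 'Monday'.",
    3: "Sorry, I still did not understand. Please say the day you arrive.",
}


def select_prompt(prompts, attempt):
    """Return the variant with the highest threshold not exceeding `attempt`.

    `attempt` is expected to start at 1 for the first time the question is asked.
    """
    eligible = [count for count in prompts if count <= attempt]
    return prompts[max(eligible)]


for attempt in range(1, 5):
    print(attempt, select_prompt(PROMPTS, attempt))
```

From the fourth attempt onward the last variant keeps being used, so the caller never hears a prompt that is less helpful than the previous one.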
Example Rule
<rule id="city">
  <one-of>
    <item>Rio de Janeiro</item>
    <item>Rio</item>
    <item>Paris</item>
    ...
  </one-of>
</rule>
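A rule like the one above lists the utterances the recognizer should accept. The Python sketch below loads the three alternatives shown and checks a recognized string against them. Exact, case-insensitive string comparison is a simplifying assumption; a real recognizer matches audio against the grammar rather than comparing text.

```python
import xml.etree.ElementTree as ET

# The rule from the slide, limited to the three items that are shown.
RULE = """
<rule id="city">
  <one-of>
    <item>Rio de Janeiro</item>
    <item>Rio</item>
    <item>Paris</item>
  </one-of>
</rule>
"""


def load_alternatives(xml_text):
    """Return the rule id and the list of alternatives inside <one-of>."""
    rule = ET.fromstring(xml_text)
    items = [item.text.strip() for item in rule.findall("./one-of/item")]
    return rule.get("id"), items


def matches(utterance, alternatives):
    """Check whether the utterance equals one alternative, ignoring case and extra spaces."""
    spoken = " ".join(utterance.lower().split())
    return any(spoken == alt.lower() for alt in alternatives)


rule_id, cities = load_alternatives(RULE)
print(rule_id, cities)
print(matches("rio de janeiro", cities))
print(matches("Nice", cities))
```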
Dealing with Pronunciations
<item lang-list="en, fr">Rio de Janeiro</item>
<voice gender="female" category="child">
Mary had a little lamb
</voice>
Markup for Prosody
<audio src="prompt.au">
What city do you want to fly from ?
</audio>
Example Implementations
- Tellme Studio allows anyone to create a voice application and access it via phone
  - http://studio.tellme.com
  - +1-800-555-TELL
  - +1-408-678-446 (International)
- VXI interpreter (Open source, SourceForge)
- BeVocal
- HeyAnita
- IBM Voice Server SDK
- Motorola
- Nuance
- PipeBeach
- Telera
- VoiceGenie
- ...
Status of Drafts at W3C
- VoiceXML 2.0: Working Draft
- Speech Grammar: Last Call Ended
- Speech Synthesis: Last Call Ended
Summary
- Lots of development around VoiceXML
- Bringing development of Voice applications to the masses
- Fun to play with - try it out!
More Information