Using XML for Voice Applications

Philipp Hoschka
Interaction Domain Leader
W3C/INRIA

Why the Interest in Voice?

"Thumbs are the new fingers ..."

Mercury News, 23 March 2002:

Thumbs are the new fingers for GameBoy youth

The use of gadgets such as mobile phones and GameBoys has caused 
a physical mutation in young people's hands, according to 
a British Sunday newspaper.  
New research carried out in nine cities around the world shows that 
the thumbs of people under the age of 25 have taken over as the 
hand's most dextrous digit, said The Observer.  
...
"Discovering that the younger generation has taken to using
thumbs in a completely different way and are instinctively using
thumbs where the rest of us are using our index fingers is
particularly interesting."

W3C Voice Activities

What is a Voice Browser?

Example VoiceXML Users

Companies Involved

Why XML for Voice?

System Architecture

[Figure: voice browser architecture]

W3C XML Markup Languages for Voice

VoiceXML

Example: Prompt

<prompt>
  <!-- say-as type="acronym": the letters of W3C and XML are read out individually -->
  Welcome to the <say-as type="acronym">W3C</say-as>
  Voice <say-as type="acronym">XML</say-as> server.
  Would you like to have more information about the
  architecture domain, the document formats domain, the
  interaction domain, the technology and society domain
  or the Web Accessibility Initiative?
</prompt>
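To show what the say-as markup asks of the synthesizer, here is a toy Python sketch that flattens a cut-down version of the prompt into the word string to be spoken. It assumes that type="acronym" means the letters are read individually and approximates this by separating the characters with spaces. `spell_out` and `normalise` are hypothetical helpers, not part of any VoiceXML implementation.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the prompt: plain text plus say-as markup.
PROMPT = """<prompt>
Welcome to the <say-as type="acronym">W3C</say-as>
Voice <say-as type="acronym">XML</say-as> server.
</prompt>"""


def spell_out(token: str) -> str:
    """Approximate reading an acronym letter by letter: 'W3C' -> 'W 3 C'."""
    return " ".join(token)


def normalise(prompt_xml: str) -> str:
    """Flatten a prompt into the words a synthesizer would speak.

    Only say-as type="acronym" is rewritten; the text of any other direct
    child element is passed through unchanged. A toy, not an SSML processor.
    """
    root = ET.fromstring(prompt_xml)
    parts = [root.text or ""]
    for child in root:
        content = child.text or ""
        if child.tag == "say-as" and child.get("type") == "acronym":
            content = spell_out(content)
        parts.append(content)
        parts.append(child.tail or "")
    # Collapse the line breaks and indentation inherited from the markup.
    return " ".join("".join(parts).split())


print(normalise(PROMPT))  # Welcome to the W 3 C Voice X M L server.
```

A real text normalisation stage would map "3" to a spoken word and handle many more say-as types; the point here is only that the markup, not guesswork, decides how the token is read.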

Techniques to Improve the User Interface

Speech Recognition Grammar Syntax

[Figure: speech recognizer]

Example Rule

<rule id="city">
  <one-of>
     <item>Rio de Janeiro</item>
     <item>Rio</item>
     <item>Paris</item>
     ...
  </one-of>
</rule>
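A grammar rule such as "city" restricts what the recognizer will accept to the listed alternatives. The Python sketch below models only that constraint: it reads the three items shown (the elided ones are left out) and accepts an utterance if it equals one of them, ignoring case and extra spaces. Real recognizers score acoustic input against the grammar rather than comparing text; `alternatives` and `match` are illustrative names.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# The "city" rule, restricted to the three items shown on the slide.
CITY_RULE = """<rule id="city">
  <one-of>
    <item>Rio de Janeiro</item>
    <item>Rio</item>
    <item>Paris</item>
  </one-of>
</rule>"""


def alternatives(rule_xml: str) -> List[str]:
    """Return the phrases accepted by a rule whose body is a flat <one-of>."""
    one_of = ET.fromstring(rule_xml).find("one-of")
    if one_of is None:
        return []
    # Normalise internal whitespace in each item's text.
    return [" ".join((item.text or "").split()) for item in one_of.findall("item")]


def match(rule_xml: str, utterance: str) -> Optional[str]:
    """Return the alternative an utterance matches, or None if the rule rejects it.

    Whole-phrase, case-insensitive string comparison stands in for recognition.
    """
    wanted = " ".join(utterance.lower().split())
    for phrase in alternatives(rule_xml):
        if phrase.lower() == wanted:
            return phrase
    return None


print(match(CITY_RULE, "rio de janeiro"))  # Rio de Janeiro
print(match(CITY_RULE, "London"))          # None
```

Because the grammar lists both "Rio" and "Rio de Janeiro", callers can accept either spoken form while the application still receives a known phrase.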

Dealing with Pronunciations

<item lang-list="en, fr">Rio de Janeiro</item> 
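One way to read the lang-list attribute is that the recognizer should accept the phrase under the pronunciation rules of each listed language, so "Rio de Janeiro" is recognized whether said the English or the French way. The Python sketch below assumes the attribute holds comma-separated language codes and that an item without it uses the grammar's default language (assumed here to be "en"); it turns an item into one (language, phrase) pair per pronunciation variant. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ITEM = '<item lang-list="en, fr">Rio de Janeiro</item>'


def pronunciation_variants(item_xml: str, default_lang: str = "en") -> List[Tuple[str, str]]:
    """Return one (language, phrase) pair per language listed for an <item>.

    Assumptions: lang-list holds comma-separated language codes, and an item
    without it is pronounced according to default_lang.
    """
    item = ET.fromstring(item_xml)
    phrase = " ".join((item.text or "").split())
    raw = item.get("lang-list", "")
    langs = [code.strip() for code in raw.split(",") if code.strip()] or [default_lang]
    return [(lang, phrase) for lang in langs]


print(pronunciation_variants(ITEM))
# [('en', 'Rio de Janeiro'), ('fr', 'Rio de Janeiro')]
print(pronunciation_variants("<item>Paris</item>"))
# [('en', 'Paris')]
```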

Speech Synthesis Markup Language

Markup for Text Normalisation

Changing Voices

<voice gender="female" category="child">
  Mary had a little lamb
</voice>
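The voice element describes the wanted voice rather than naming one, so the synthesizer has to choose among the voices it actually has. A plausible selection strategy, sketched here in Python as an assumption rather than taken from the draft, is to pick the installed voice that matches the most requested attributes. The installed voices and the `Voice` record are invented for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Voice:
    name: str
    gender: str
    category: str


# Hypothetical voices installed on a synthesizer; the names are invented.
INSTALLED = [
    Voice("anna", "female", "adult"),
    Voice("tom", "male", "child"),
    Voice("lily", "female", "child"),
]

MARY = '<voice gender="female" category="child">Mary had a little lamb</voice>'


def select_voice(voice_xml: str, installed: List[Voice]) -> Optional[Voice]:
    """Pick the installed voice matching the most requested attributes.

    Attributes missing from the element impose no constraint, ties go to the
    earlier voice, and None is returned only when nothing is installed.
    """
    wanted = ET.fromstring(voice_xml).attrib
    best, best_score = None, -1
    for voice in installed:
        score = sum(
            1 for key in ("gender", "category")
            if key in wanted and wanted[key] == getattr(voice, key)
        )
        if score > best_score:
            best, best_score = voice, score
    return best


print(select_voice(MARY, INSTALLED).name)  # lily
```

Scoring partial matches, instead of demanding an exact one, means the document still gets spoken when no installed voice fits every attribute.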

Markup for Prosody

Including Recorded Audio

<audio src="prompt.au">
  <!-- spoken by the synthesizer only if prompt.au cannot be played -->
  What city do you want to fly from?
</audio>
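The text inside the audio element is fallback content: the browser plays prompt.au if it can and otherwise synthesizes the text. The Python sketch below captures that decision. The availability check is passed in as a function, defaulting to a local file-existence test; that default is an assumption standing in for fetching and decoding the URI, and it lets the logic be exercised without real audio files.

```python
import os
import xml.etree.ElementTree as ET
from typing import Callable, Tuple

AUDIO = """<audio src="prompt.au">
  What city do you want to fly from?
</audio>"""


def render_audio(
    audio_xml: str, is_available: Callable[[str], bool] = os.path.isfile
) -> Tuple[str, str]:
    """Decide how an <audio> element is rendered.

    Returns ("play", src) if the recording is available, otherwise
    ("speak", text) so the synthesizer reads the fallback content.
    Only text before any child element is used as the fallback here.
    """
    audio = ET.fromstring(audio_xml)
    src = audio.get("src")
    if src and is_available(src):
        return ("play", src)
    return ("speak", " ".join((audio.text or "").split()))


# Simulate a missing recording: the fallback text is spoken instead.
print(render_audio(AUDIO, lambda src: False))
# ('speak', 'What city do you want to fly from?')
```

Pairing every recording with equivalent text keeps the dialog usable when the audio server is down.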

Example Implementations

  1. Tellme Studio: lets anyone create a voice application and access it by phone
  2. VXI interpreter (open source, hosted on SourceForge)
  3. BeVocal
  4. HeyAnita
  5. IBM Voice Server SDK
  6. Motorola
  7. Nuance
  8. PIPEBEACH
  9. Telera
  10. VoiceGenie
  11. ...

Status of Drafts at W3C

Summary

More Information