W3C User Interface Domain

W3C work on Voice Interaction

Dave Raggett, W3C/HP
Marianne Hickey, HP Labs

A presentation in the Mobile track of the WWW9 Developer's Day, held in Amsterdam on 19th May 2000.

Why Voice Interaction Is Valuable

Speech Technology

W3C Voice Browser working group

Anatomy of a Voice Interface

Speech Grammars

Context-free grammars describe what the user may say; each rule is associated with a semantic effect.

diagram of how grammar rule binds to semantics

“I want to fly to London” gives destination = “London”

[I want to fly to] $City { destination = $City }
$City = London | Paris | Amsterdam | Milan
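As a rough illustration of how the rule above binds an utterance to a semantic result, the following Python sketch hard-codes the $City alternatives and matches on text. It assumes the bracketed phrase is optional, as in JSGF; the function name `interpret` and the use of a regular expression are inventions of this sketch. A real recognizer works from audio and a compiled grammar rather than from a typed string.

```python
import re

# Alternatives for the $City rule, taken from the grammar above.
CITIES = ["London", "Paris", "Amsterdam", "Milan"]

# [I want to fly to] $City : the bracketed phrase is assumed to be optional.
PATTERN = re.compile(
    r"^(?:I want to fly to\s+)?(?P<city>" + "|".join(CITIES) + r")$",
    re.IGNORECASE,
)

def interpret(utterance):
    """Return the semantic result for an utterance, or None if no rule matches."""
    match = PATTERN.match(utterance.strip())
    if match is None:
        return None
    # Normalise the spoken city to the spelling used in the grammar.
    city = next(c for c in CITIES if c.lower() == match.group("city").lower())
    # Semantic effect of the rule: { destination = $City }
    return {"destination": city}

print(interpret("I want to fly to London"))
print(interpret("Paris"))
print(interpret("I want to fly to Tokyo"))
```

An utterance outside the grammar, such as the last one, produces no semantic result, which is what a dialog would treat as "not understood".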

Speech Synthesis

Voice Dialog Example

C (computer): Welcome to the weather information service.
   What state?

H (human): Help

C: Please speak the state for which you want the weather.

H: Georgia

C: What city?

H: Tblisi

C: I did not understand what you said. What city?

H: Macon

C: The conditions in Macon Georgia are sunny and clear at 11 AM … 

Voice Dialog Markup

<form id="weather_info">       
  <block>Welcome to the weather information service.</block>       
  <field name="state">       
    <prompt>What state?</prompt>       
    <grammar src="state.gram" type="application/x-jsgf"/>       
    <catch event="help">       
      Please speak the state for which you
      want the weather.       
    </catch>       
  </field>       
  <field name="city">       
    <prompt>What city?</prompt>       
    <grammar src="city.gram" type="application/x-jsgf"/>       
    <catch event="help">       
      Please speak the city for which you
      want the weather.       
    </catch>       
  </field>       
  <block>       
    <submit next="/servlet/weather" namelist="city state"/>       
  </block>       
</form> 
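The way an interpreter might step through this form can be sketched in Python. This is a simplified model, not the VoiceXML form interpretation algorithm: the grammar contents standing in for state.gram and city.gram are hypothetical, the user's speech is supplied as a list of strings, and the "I did not understand" reply is assumed to be the platform's default response to an utterance that matches no grammar.

```python
# Hypothetical stand-ins for the contents of state.gram and city.gram.
GRAMMARS = {
    "state": {"georgia"},
    "city": {"macon", "atlanta"},
}
PROMPTS = {"state": "What state?", "city": "What city?"}
HELP = {
    "state": "Please speak the state for which you want the weather.",
    "city": "Please speak the city for which you want the weather.",
}

def run_form(user_turns):
    """Fill each field in order; return (transcript, filled field values)."""
    turns = iter(user_turns)
    transcript = ["C: Welcome to the weather information service."]
    values = {}
    for field in ("state", "city"):
        transcript.append("C: " + PROMPTS[field])
        while field not in values:
            heard = next(turns)
            transcript.append("H: " + heard)
            if heard.lower() == "help":
                # Corresponds to <catch event="help"> inside the field.
                transcript.append("C: " + HELP[field])
            elif heard.lower() in GRAMMARS[field]:
                values[field] = heard
            else:
                # No grammar match: apologise and prompt again.
                transcript.append(
                    "C: I did not understand what you said. " + PROMPTS[field]
                )
    return transcript, values

transcript, values = run_form(["Help", "Georgia", "Tblisi", "Macon"])
print("\n".join(transcript))
print(values)
```

Once both fields are filled, the values correspond to what the `<submit>` element would send to the server as `city` and `state`.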

Richer Dialogs

Richer dialogs become possible with the addition of mechanisms for handling dialog history, thereby allowing statements to be made in reference to what was spoken a few turns back. Some indication of the kinds of architecture needed to support this can be seen in the following diagram provided by Philips Research:
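One very small piece of such a mechanism can be sketched in Python: a history of values from earlier turns, consulted when the current turn leaves something unsaid. The class and function names here are hypothetical and the sketch does not reflect the Philips architecture; it only illustrates the idea of referring back a few turns.

```python
class DialogHistory:
    """Remembers slot values from earlier turns, most recent last."""

    def __init__(self):
        self.turns = []

    def record(self, slots):
        self.turns.append(dict(slots))

    def lookup(self, slot):
        """Return the most recent value recorded for a slot, or None."""
        for slots in reversed(self.turns):
            if slot in slots:
                return slots[slot]
        return None

def complete(slots, history, required):
    """Fill slots missing from the current turn with values from earlier turns."""
    filled = dict(slots)
    for name in required:
        if name not in filled:
            earlier = history.lookup(name)
            if earlier is not None:
                filled[name] = earlier
    return filled

history = DialogHistory()
history.record({"state": "Georgia", "city": "Macon"})
# A follow-up such as "And in Atlanta?" names only the city.
print(complete({"city": "Atlanta"}, history, ["state", "city"]))
```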

diagram showing possible architecture for richer dialogs

Multi-modal Interaction

Pure interactive voice response systems are restricted to voice input and output. The simplest extension lets users make choices by pressing keys on a telephone keypad. Moving beyond this, adding a display, a pointing device and richer keyboards opens up the possibilities of multi-modal interaction.

diagram showing example modalities

Multi-modal Dialogs

Multi-modal systems combine modalities such as display, keypad, pointing device, speech recognition, and speech synthesis. The following diagram from Philips Research gives an indication of how this can affect the architecture:

diagram showing possible architecture for multimodal dialogs

Content Delivery

diagram showing content delivery framework

The key thing is to separate presentation from content, so that the same content can be reused for each delivery channel. The starting point is the application's database, which dynamically generates XML, images, audio and other data. This content is then poured into templates chosen to match the device capabilities and user preferences, which can be described using CC/PP (Composite Capability/Preference Profiles). The end result is XHTML for desktop browsers, VoiceXML for telephony-based IVR systems, and WML for cell phones.
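The flow described here can be sketched in Python under some loud assumptions: the content fields, the three one-line templates, and the `audio_only` and `small_display` profile keys are all hypothetical, and the profile dictionary is only a stand-in for a real CC/PP description. The markup in each template is reduced to a fragment for brevity.

```python
from xml.sax.saxutils import escape

# Channel-neutral content, as the application database might supply it.
# The field names and values are hypothetical.
content = {"city": "Macon", "state": "Georgia", "conditions": "sunny and clear"}

# One presentation template per delivery channel; markup is deliberately minimal.
TEMPLATES = {
    "xhtml": "<p>The conditions in {city}, {state} are {conditions}.</p>",
    "voicexml": "<block>The conditions in {city} {state} are {conditions}.</block>",
    "wml": '<card id="weather"><p>{city}: {conditions}</p></card>',
}

def choose_channel(profile):
    """Pick a channel from a simplified device profile (a stand-in for CC/PP)."""
    if profile.get("audio_only"):
        return "voicexml"
    if profile.get("small_display"):
        return "wml"
    return "xhtml"

def render(content, profile):
    """Pour the content into the template that suits the device profile."""
    safe = {key: escape(value) for key, value in content.items()}
    return TEMPLATES[choose_channel(profile)].format(**safe)

print(render(content, {"audio_only": True}))
print(render(content, {"small_display": True}))
print(render(content, {}))
```

The content dictionary never changes between calls; only the template selected by the profile does, which is the separation the paragraph above argues for.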

Convergence Opportunities

diagram showing convergence of VoiceXML and WML

A number of current specifications overlap significantly, which creates an opportunity to "renormalize" them into a suite of modular parts. Can we learn from VoiceXML and WML? Can we combine them into a new dialog markup language, so that a single document can target both small displays and voice interaction? W3C's work on XForms also has a part to play, as a way to separate presentation from application data and logic.

Copyright © 2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.


Dave Raggett <dsr@w3.org>, Marianne Hickey <marianne_hickey@hpl.hp.com>