Email: dsr@w3.org
W3C Activity Lead for Voice Browsers, XHTML and XForms
Director – Tim Berners-Lee, inventor of the Web
Formed in 1994, W3C is an international consortium of over 460 companies and institutions
Develops specifications, guidelines, software, and tools, e.g.
HTML, CSS, XML, XSL, SMIL, SVG
Current staff of 60, budget of $7M/year
Working Groups, Interest Groups and Coordination Groups
Working Group formed in March 1999, following workshop in October 1998, and previous work in W3C’s Web Accessibility Initiative
Public working drafts on requirements for:
Now working on drafting specifications, with public drafts available for:
New drafts expected soon for:
VoiceXML Forum
ETSI
ECTF (Enterprise Computer Telephony Forum)
DARPA Communicator
Jim Larson (Intel) — Working Group Chair
Alcatel, Ask Jeeves, AT&T, Avaya, BeVocal, BT, Canon, Cisco, Conversa, EDF, Enuncia, France Telecom, General Magic, Hitachi, HP, IBM, Intel, isSound, L&H, Locus Dialogue, Lucent, Microsoft, Milo, Mitre, Motorola, Nokia, Nortel Networks, Nuance, Philips, Phone.com, PipeBeach, SpeechWorks, Sun Microsystems, TellMe, Telecom Italia, Unisys, VoxSurf, Yahoo
Access via any telephone
Hands and eyes free operation
Devices too small for displays and keyboards
WAP-phones
Palm-top organizers
Universal messaging
Application and user take turns to speak
Form filling metaphor
Prompt user for each field in turn using synthetic speech and prerecorded audio
Use speech grammars to interpret what user says
Offer help as needed
Submit completed form to back-end server
Links to other “pages”
Break out to scripting as needed
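To illustrate how these pieces fit together, here is a minimal, hypothetical sketch of a VoiceXML 1.0 document with one field, a document-level link, and an ECMAScript break-out; the dialog, grammar and servlet names are made up for illustration:

<vxml version="1.0">
  <!-- Document-level link: matching this grammar from any dialog
       in the document jumps back to the main_menu form -->
  <link next="#main_menu">
    <grammar type="application/x-jsgf"> main menu </grammar>
  </link>

  <form id="main_menu">
    <field name="service">
      <prompt>Say weather or news.</prompt>
      <grammar type="application/x-jsgf"> weather | news </grammar>
      <catch event="help"> Please say weather or news. </catch>
      <filled>
        <!-- Break out to ECMAScript when markup alone is not enough -->
        <var name="confirmation"/>
        <script> confirmation = "You chose " + service + "."; </script>
        <prompt><value expr="confirmation"/></prompt>
        <submit next="/servlet/menu" namelist="service"/>
      </filled>
    </field>
  </form>
</vxml>

Saying "help" triggers the catch handler, and once the field is filled the result is submitted to a back-end URL, just as in the weather example later.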
When to use prerecorded human speech
Speech Synthesis engines are smart
Basic properties: volume, rate, pitch
Speech font selection by name, gender, age
Control over how things are pronounced
Prerecorded audio effects
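A hypothetical prompt, sketched in the style of the speech synthesis markup W3C was drafting at the time (later standardized as SSML); element and attribute names varied between drafts, and the voice name and audio file here are invented:

<prompt>
  <voice name="Kate" gender="female" age="30">
    <prosody rate="slow" volume="loud">
      Welcome to the weather service.
    </prosody>
    <break time="500ms"/>
    Brought to you by the
    <sub alias="World Wide Web Consortium">W3C</sub>.
  </voice>
  <!-- Prerecorded audio, with synthesized text as a fallback
       if the clip cannot be fetched -->
  <audio src="jingle.wav">Thank you for calling.</audio>
</prompt>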
Context-free grammars describe what the user may say; each rule is associated with a semantic effect.
For example, the utterance "I want to fly to London" produces the result destination = "London" with the following rules:
[I want to fly to] $City { destination = $City }
$City = London | Paris | Amsterdam | Milan
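The grammar files referenced in the weather example below (country.gram, city.gram) use the Java Speech Grammar Format (application/x-jsgf). As a rough, hypothetical sketch, the flight rules above might be written in JSGF like this, with the interpretation of the tag text left to the platform:

#JSGF V1.0;
grammar flight;

<city> = London    { destination = "London" }
       | Paris     { destination = "Paris" }
       | Amsterdam { destination = "Amsterdam" }
       | Milan     { destination = "Milan" };

public <request> = [ I want to fly to ] <city>;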
This example dialog proceeds sequentially:
C (computer): Welcome to the international weather service. What country?
H (human): Help
C: Please say the country for which you want the weather.
H: France
C: What city?
H: Antibes
C: I did not understand what you said. What city?
H: Cannes
C: The conditions in Cannes France are sunny and clear at 11 AM …
<form id="weather_info">
  <block>Welcome to the international weather service.</block>
  <field name="country">
    <prompt>What country?</prompt>
    <grammar src="country.gram" type="application/x-jsgf"/>
    <catch event="help">
      Please say the country for which you want the weather.
    </catch>
  </field>
  <field name="city">
    <prompt>What city?</prompt>
    <grammar src="city.gram" type="application/x-jsgf"/>
    <catch event="help">
      Please say the city for which you want the weather.
    </catch>
  </field>
  <block>
    <submit next="/servlet/weather" namelist="city country"/>
  </block>
</form>
Keep it simple!
Avoid deeply nested task contexts
Carefully consider when a human touch is needed, and hand over to a human operator
There’s no substitute for user testing
Plan for continuous improvement based on user feedback
Improvements to VoiceXML
Multi-modal applications
Voice + Display + Key pad etc.
User is free to switch between voice interaction and use of display/key pad
Imagine a combination of WML + VoiceXML
W3C plans to set up Multi-modal Dialog Working Group in 2001
Copyright © 1994-2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements.