Email: dsr@w3.org
W3C Activity Lead for Voice Browsers, XHTML and XForms
Director – Tim Berners-Lee, inventor of the Web
Formed in 1994, W3C is an international consortium of over 460 companies and institutions
Develops specifications, guidelines, software, and tools, e.g.
HTML, CSS, XML, XSL, SMIL, SVG
Current staff of 60, budget of $7M/year
Working Groups, Interest Groups and Coordination Groups
Working Group formed in March 1999, following workshop in October 1998, and previous work in W3C’s Web Accessibility Initiative
Public working drafts on requirements for:
Now working on drafting specifications, with public drafts available for:
New drafts expected soon for:
VoiceXML Forum
ETSI
ECTF (Enterprise Computer Telephony Forum)
DARPA Communicator
Jim Larson (Intel) — Working Group Chair
Alcatel, Ask Jeeves, AT&T, Avaya, BeVocal, BT, Canon, Cisco, Conversa, EDF, Enuncia, France Telecom, General Magic, Hitachi, HP, IBM, Intel, isSound, L&H, Locus Dialogue, Lucent, Microsoft, Milo, Mitre, Motorola, Nokia, Nortel Networks, Nuance, Philips, Phone.com, PipeBeach, SpeechWorks, Sun Microsystems, TellMe, Telecom Italia, Unisys, VoxSurf, Yahoo
Access via any telephone
Hands and eyes free operation
Devices too small for displays and keyboards
WAP-phones
Palm-top organizers
Universal messaging
Application and user take turns to speak
Form filling metaphor
Prompt user for each field in turn using synthetic speech and prerecorded audio
Use speech grammars to interpret what user says
Offer help as needed
Submit completed form to back-end server
Links to other “pages”
Break out to scripting as needed
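To illustrate how these pieces fit together, here is a minimal, hypothetical sketch of a VoiceXML 1.0 document with one field, a document-level link, and an ECMAScript break-out; the dialog, grammar and servlet names are made up for illustration:

<vxml version="1.0">
  <!-- Document-level link: matching this grammar from any dialog
       in the document jumps back to the main_menu form -->
  <link next="#main_menu">
    <grammar type="application/x-jsgf"> main menu </grammar>
  </link>

  <form id="main_menu">
    <field name="service">
      <prompt>Say weather or news.</prompt>
      <grammar type="application/x-jsgf"> weather | news </grammar>
      <catch event="help"> Please say weather or news. </catch>
      <filled>
        <!-- Break out to ECMAScript when markup alone is not enough -->
        <var name="confirmation"/>
        <script> confirmation = "You chose " + service + "."; </script>
        <prompt><value expr="confirmation"/></prompt>
        <submit next="/servlet/menu" namelist="service"/>
      </filled>
    </field>
  </form>
</vxml>

Saying "help" triggers the catch handler, and once the field is filled the result is submitted to a back-end URL, just as in the weather example later.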
When to use prerecorded human speech
Speech Synthesis engines are smart
Basic properties: volume, rate, pitch
Speech font selection by name, gender, age
Control over how things are pronounced
Prerecorded audio effects
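A hypothetical prompt, sketched in the style of the speech synthesis markup W3C was drafting at the time (later standardized as SSML); element and attribute names varied between drafts, and the voice name and audio file here are invented:

<prompt>
  <voice name="Kate" gender="female" age="30">
    <prosody rate="slow" volume="loud">
      Welcome to the weather service.
    </prosody>
    <break time="500ms"/>
    Brought to you by the
    <sub alias="World Wide Web Consortium">W3C</sub>.
  </voice>
  <!-- Prerecorded audio, with synthesized text as a fallback
       if the clip cannot be fetched -->
  <audio src="jingle.wav">Thank you for calling.</audio>
</prompt>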
Context-free grammars describe what the user may say; each rule is associated with a semantic effect.
For example, the utterance "I want to fly to London" produces the result destination = "London" with the following rules:
[I want to fly to] $City { destination = $City }
$City = London | Paris | Amsterdam | Milan
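The grammar files referenced in the weather example below (country.gram, city.gram) use the Java Speech Grammar Format (application/x-jsgf). As a rough, hypothetical sketch, the flight rules above might be written in JSGF like this, with the interpretation of the tag text left to the platform:

#JSGF V1.0;
grammar flight;

<city> = London    { destination = "London" }
       | Paris     { destination = "Paris" }
       | Amsterdam { destination = "Amsterdam" }
       | Milan     { destination = "Milan" };

public <request> = [ I want to fly to ] <city>;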
This example dialog proceeds sequentially:
C (computer): Welcome to the international weather service. What country?
H (human): Help
C: Please say the country for which you want the weather.
H: France
C: What city?
H: Antibes
C: I did not understand what you said. What city?
H: Cannes
C: The conditions in Cannes France are sunny and clear at 11 AM …
<form id="weather_info">
  <block>Welcome to the international weather service.</block>
  <field name="country">
    <prompt>What country?</prompt>
    <grammar src="country.gram" type="application/x-jsgf"/>
    <catch event="help">
      Please say the country for which you want the weather.
    </catch>
  </field>
  <field name="city">
    <prompt>What city?</prompt>
    <grammar src="city.gram" type="application/x-jsgf"/>
    <catch event="help">
      Please say the city for which you want the weather.
    </catch>
  </field>
  <block>
    <submit next="/servlet/weather" namelist="city country"/>
  </block>
</form>
Keep it simple!
Avoid deeply nested task contexts
Carefully consider when a human touch is needed, and hand over to a human operator
There’s no substitute for user testing
Plan for continuous improvement based on user feedback
Improvements to VoiceXML
Multi-modal applications
Voice + Display + Key pad etc.
User is free to switch between voice interaction and use of display/key pad
Imagine a combination of WML + VoiceXML
W3C plans to set up Multi-modal Dialog Working Group in 2001
Copyright © 1994-2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements.