Voice Browsing & Multimodal Interaction at W3C

Max Froumentin, <mf@w3.org>

Outline

New devices

PDA Sony Phone

New devices (2)

Car Navigation old telephone Braille Notebook

In more detail

So what?

Each new device manufacturer builds its own browser to suit existing Web content.

Complex example

This works so far, but let's look at something more complex: "Google Maps in my car."

Google Maps screenshot

...

Google Maps in my car: I want my car navigation system to use Google Maps. Requirements:

Multimodal Interaction on the Web

Evolution of the browser architecture

Now:

basic architecture

The Multimodal Browser

arch2

What to standardise?

But why standardise things that are happening inside the browser?
Why change the Web?

The MMI framework

Work items

Input Modality Interfaces

For input we need: Grammars, Integration, Interfaces

Interfaces defined as IDL or WSDL APIs can be used directly from JavaScript or through Web Services, respectively

Generic: register, setGrammar, setModel, getData, prompt, pause, events

Specialised: sendVoiceXML
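
As a rough illustration of the generic interface listed above, the sketch below models a modality component in JavaScript. Only the method names register, setGrammar, getData and pause come from the slide. The class name, the argument shapes, the resume and simulateInput helpers and the grammar URI are assumptions made for the example, not part of any W3C API.

```javascript
// Hypothetical modality component exposing part of the generic interface.
// Method names follow the slide; signatures and behaviour are assumptions.
class ModalityComponent {
  constructor(name) {
    this.name = name;
    this.grammar = null;
    this.paused = false;
    this.listeners = [];
    this.lastData = null;
  }

  // register: subscribe a callback to input events from this modality.
  register(listener) {
    this.listeners.push(listener);
  }

  // setGrammar: tell the recogniser which grammar constrains the input.
  setGrammar(uri) {
    this.grammar = uri;
  }

  pause() {
    this.paused = true;
  }

  // resume is not on the slide; it is an assumed counterpart to pause.
  resume() {
    this.paused = false;
  }

  // getData: return the most recent recognised input, or null if none.
  getData() {
    return this.lastData;
  }

  // Stand-in for a real recognition engine producing a result.
  simulateInput(data) {
    if (this.paused) return false;
    this.lastData = data;
    this.listeners.forEach((fn) => fn({ source: this.name, data: data }));
    return true;
  }
}

const speech = new ModalityComponent("speech");
const received = [];
speech.register((event) => received.push(event));
speech.setGrammar("names.grxml"); // hypothetical grammar URI
speech.simulateInput("Yuriko");
speech.pause();
speech.simulateInput("Duke"); // ignored while paused
```

The same object shape could sit behind a Web Service instead of a script binding; only the transport would change.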

Leading to two types of architectures:

DOM WS

What to transmit through the pipes

Grammars: Speech Recognition

Speech Recognition Grammar Specification / Semantic Interpretation for Speech Recognition

<one-of>
  <item>Michael</item>
  <item>Yuriko</item>
  <item>Mary</item>
  <item>Duke</item>
  <item><ruleref uri="#otherNames"/></item>
</one-of>

<one-of><item>1</item> <item>2</item> <item>3</item></one-of>

<one-of>
  <item weight="10">small</item>
  <item weight="2">medium</item>
  <item>large</item>
</one-of>

<one-of>
  <item weight="3.1415">pie</item>
  <item weight="1.414">root beer</item>
  <item weight=".25">cola</item>
</one-of>
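
In SRGS a weight is nominally a multiplying factor on an alternative's likelihood, and an item without a weight counts as 1. A minimal sketch of how a processor might turn the weights of the small/medium/large example into relative shares follows. Normalising the weights so that they sum to 1 is an assumption made for illustration; the exact interpretation is left to the grammar processor.

```javascript
// Alternatives from the size example above; "large" has no weight attribute.
const alternatives = [
  { text: "small", weight: 10 },
  { text: "medium", weight: 2 },
  { text: "large" },
];

// Assumption: normalise weights to sum to 1; a missing weight counts as 1.
function relativeLikelihoods(items) {
  const weights = items.map((item) =>
    item.weight === undefined ? 1 : item.weight
  );
  const total = weights.reduce((sum, w) => sum + w, 0);
  return items.map((item, i) => ({
    text: item.text,
    share: weights[i] / total,
  }));
}

const shares = relativeLikelihoods(alternatives);
```

Under this reading "small" gets 10/13 of the total, "medium" 2/13 and "large" 1/13.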

Defining handwritten gestures and grammar: InkML

<ink>
   <trace>
     10 0 9 14 8 28 7 42 6 56 6 70 8 84 8 98 8 112 9 126 10 140
     13 154 14 168 17 182 18 188 23 174 30 160 38 147 49 135
     58 124 72 121 77 135 80 149 82 163 84 177 87 191 93 205
   </trace>
   <trace>
     130 155 144 159 158 160 170 154 179 143 179 129 166 125
     152 128 140 136 131 149 126 163 124 177 128 190 137 200
     150 208 163 210 178 208 192 201 205 192 214 180
   </trace>
   <trace>
     227 50 226 64 225 78 227 92 228 106 228 120 229 134
     230 148 234 162 235 176 238 190 241 204
   </trace>
   <trace>
     282 45 281 59 284 73 285 87 287 101 288 115 290 129
     291 143 294 157 294 171 294 185 296 199 300 213
   </trace>
   <trace>
     366 130 359 143 354 157 349 171 352 185 359 197
     371 204 385 205 398 202 408 191 413 177 413 163
     405 150 392 143 378 141 365 150
   </trace>
</ink>
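
A consumer of such traces first has to turn the text into points. The sketch below does that for the start of the third trace, assuming the values alternate X and Y and are separated by whitespace, as they appear in the example above. The function names are made up for the illustration.

```javascript
// First four points of the third trace in the example above.
const traceText = "227 50 226 64 225 78 227 92";

// Assumption: values alternate X, Y and are separated by whitespace.
function parseTrace(text) {
  const numbers = text.trim().split(/\s+/).map(Number);
  if (numbers.length % 2 !== 0) {
    throw new Error("odd number of coordinates");
  }
  const points = [];
  for (let i = 0; i < numbers.length; i += 2) {
    points.push({ x: numbers[i], y: numbers[i + 1] });
  }
  return points;
}

// Smallest rectangle containing all points, e.g. to scale a stroke for display.
function boundingBox(points) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  return {
    minX: Math.min(...xs),
    maxX: Math.max(...xs),
    minY: Math.min(...ys),
    maxY: Math.max(...ys),
  };
}

const points = parseTrace(traceText);
const box = boundingBox(points);
```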

EMMA: representing and annotating input

Goals:

EMMA: example

<emma:emma emma:version="1.0"
 xmlns:emma="http://www.w3.org/2003/04/emma#"> 
  <emma:one-of emma:id="r1" 
      emma:start="2003-03-26T00:00:00.15"
      emma:end="2003-03-26T00:00:00.2">
    <emma:interpretation emma:id="int1" emma:confidence="0.75" > 
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date> 
         <emma:absolute-timestamp
          emma:start="2003-03-26T00:00:00.15"
          emma:end="2003-03-26T00:00:00.2"/> 
         03112003 
      </date>  
    </emma:interpretation>
    <emma:interpretation emma:id="int2" emma:confidence="0.68" >
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
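
One thing a consumer can do with an emma:one-of is pick the interpretation with the highest confidence. The sketch below does this on plain objects that mirror the two interpretations above. The object layout, the function name and the 0.7 threshold are assumptions for illustration, not something EMMA prescribes.

```javascript
// The two interpretations from the example, as plain objects.
const interpretations = [
  { id: "int1", confidence: 0.75, origin: "Boston", destination: "Denver" },
  { id: "int2", confidence: 0.68, origin: "Austin", destination: "Denver" },
];

// Return the most confident interpretation, or null if the list is empty
// or nothing reaches the (hypothetical) threshold.
function bestInterpretation(list, threshold) {
  if (list.length === 0) return null;
  const best = list.reduce((a, b) => (b.confidence > a.confidence ? b : a));
  return best.confidence >= threshold ? best : null;
}

const chosen = bestInterpretation(interpretations, 0.7);
const tooStrict = bestInterpretation(interpretations, 0.8);
```

When nothing reaches the threshold, an application would typically ask the user to repeat or confirm the input.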

Compositing

Making one EMMA document out of two

The Dynamic Properties Framework

The DPF specification defines an API to access system properties. E.g.

DPF: example

<html>
  <head>
    <title>GPS location example</title>
    <script type="text/javascript">
    <![CDATA[
      SystemEnvironment.location.format="zip code";
      SystemEnvironment.location.updateFrequency="20s";
    ]]>
    </script>
    <script defer="defer" type="text/javascript" 
      ev:event="se:locationUpdate">
      <![CDATA[
        var field = document.getElementById("location");
        var zipcode = SystemEnvironment.location;
        field.childNodes[0].nodeValue = zipcode;
      ]]>
    </script>
  </head>
  <body>
    <h1>Track your location as you walk</h1>
    <p>Your current zip code is: <span id="location">(please
    wait)</span></p>
  </body>
</html>

Interaction Manager

The manager...

...

Interaction Manager (2)

...and shapes the interaction accordingly:

Could be code (JavaScript using the APIs mentioned above) or declarative interaction markup such as VoiceXML, with a mapping to API calls (with XBL)

Writing multimodal web content

Existing web pages and applications will still work but won't provide:

So extensions will be useful.

Summary 1

basic architecture

Summary 2

architecture 2

The Voice Browser Activity

Historically precedes the MMI Framework

A specific framework

Now integrates into MMI

VoiceXML

VoiceXML2: one of W3C's most successful specifications

Simple form-filling applications on the phone

voicexml
  <form>
    <field name="adjustment_amount">
      <grammar type="application/srgs+xml" src="/grammars/currency.grxml"/>
      <prompt>
        What is the value of your account adjustment?
      </prompt>
      <filled>
        <submit next="/cgi-bin/updateaccount"/>
      </filled>
    </field>
  </form>

CCXML

a standard for telephony platforms

handles events (e.g. incoming calls)

makes outgoing calls, sets up conference calls, starts (VoiceXML) dialogues

CCXML architecture

SCXML

Harel State Tables: a general interaction management paradigm

SCXML provides markup for HST

Can plug in to CCXML, or drive VoiceXML dialogs or MMI interaction
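
To give a feel for state-based interaction management, here is a minimal JavaScript sketch of a transition table driving a dialogue. It is not SCXML syntax, and it is a flat machine without the nested or parallel states the formalism also allows. All state and event names are hypothetical.

```javascript
// Hypothetical dialogue states and the events that move between them.
const transitions = {
  idle: { callReceived: "greeting" },
  greeting: { promptDone: "listening" },
  listening: { inputRecognised: "confirming", noInput: "greeting" },
  confirming: { confirmed: "idle", rejected: "listening" },
};

// Apply one event; events with no matching transition leave the state unchanged.
function step(state, event) {
  const table = transitions[state];
  const known =
    table !== undefined && Object.prototype.hasOwnProperty.call(table, event);
  return known ? table[event] : state;
}

// Feed a sequence of events through the machine and return the final state.
function run(initial, events) {
  return events.reduce((state, event) => step(state, event), initial);
}

const finalState = run("idle", [
  "callReceived",
  "promptDone",
  "inputRecognised",
  "rejected",
]);
```

Here the caller rejects the recognised input, so the machine ends up back in the listening state.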

scxml example

Voice+MMI

How do the Voice Interface Framework specifications fit into the MMI architecture?

vandmmi

Summary

New multimodal devices can make a better Web experience

The MMI Framework generalises the standard visual browser model

The MMI Framework generalises the standard voice browser model

New specifications needed, for:

Conclusion

MMI page: w3.org/2002/mmi

VBWG page: w3.org/voice

This presentation: w3.org/2005/Talks/1111-maxf-delhi