Or a network of Web services

III.2. Input Modalities

Recognisers need grammars and input formats: SRGS (speech) and InkML (digital ink)

SRGS

<one-of>
  <item>Michael</item>
  <item>Yuriko</item>
  <item>Mary</item>
  <item>Duke</item>
  <item><ruleref uri="#otherNames"/></item>
</one-of>

<one-of><item>1</item> <item>2</item> <item>3</item></one-of>

<one-of>
  <item weight="10">small</item>
  <item weight="2">medium</item>
  <item>large</item>
</one-of>

<one-of>
  <item weight="3.1415">pie</item>
  <item weight="1.414">root beer</item>
  <item weight=".25">cola</item>
</one-of>
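A recogniser has to turn these weights into some preference between alternatives. The sketch below is one illustrative reading, not behaviour mandated by SRGS: it parses the small/medium/large example, treats a missing weight as 1.0, and reports each alternative's share of the total weight. The function name `relative_weights` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The weighted small/medium/large alternatives from the slide above.
GRAMMAR_FRAGMENT = """
<one-of>
  <item weight="10">small</item>
  <item weight="2">medium</item>
  <item>large</item>
</one-of>
"""

def relative_weights(one_of_xml):
    """Map each alternative's text to its share of the total weight."""
    root = ET.fromstring(one_of_xml)
    # Assumption: an item without a weight attribute counts as 1.0.
    weights = {
        item.text.strip(): float(item.get("weight", "1.0"))
        for item in root.findall("item")
    }
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}

shares = relative_weights(GRAMMAR_FRAGMENT)
for word, share in shares.items():
    print(f"{word}: {share:.3f}")
```

How a real recogniser uses weights is platform-specific; the normalisation here only makes the relative preference visible.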

Defining handwritten gestures and grammar: InkML

<ink>
   <trace>
     10 0 9 14 8 28 7 42 6 56 6 70 8 84 8 98 8 112 9 126 10 140
     13 154 14 168 17 182 18 188 23 174 30 160 38 147 49 135
     58 124 72 121 77 135 80 149 82 163 84 177 87 191 93 205
   </trace>
   <trace>
     130 155 144 159 158 160 170 154 179 143 179 129 166 125
     152 128 140 136 131 149 126 163 124 177 128 190 137 200
     150 208 163 210 178 208 192 201 205 192 214 180
   </trace>
   <trace>
     227 50 226 64 225 78 227 92 228 106 228 120 229 134
     230 148 234 162 235 176 238 190 241 204
   </trace>
   <trace>
     282 45 281 59 284 73 285 87 287 101 288 115 290 129
     291 143 294 157 294 171 294 185 296 199 300 213
   </trace>
   <trace>
     366 130 359 143 354 157 349 171 352 185 359 197
     371 204 385 205 398 202 408 191 413 177 413 163
     405 150 392 143 378 141 365 150
   </trace>
</ink>
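Each trace above is one pen stroke recorded as a run of numbers. A minimal sketch of reading them back as points follows, assuming the numbers alternate X and Y as in the slide and that points may be separated by whitespace or commas. The function name `parse_traces` and the shortened sample are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened sample: the first few points of two traces from the slide above.
INK = """
<ink>
  <trace>10 0 9 14 8 28 7 42</trace>
  <trace>130 155 144 159 158 160</trace>
</ink>
"""

def parse_traces(ink_xml):
    """Return one list of (x, y) tuples per <trace> element."""
    root = ET.fromstring(ink_xml)
    strokes = []
    for trace in root.iter("trace"):
        # Assumption: values alternate X, Y; commas are treated as whitespace.
        numbers = [float(tok) for tok in trace.text.replace(",", " ").split()]
        if len(numbers) % 2 != 0:
            raise ValueError("trace does not contain complete (x, y) pairs")
        strokes.append(list(zip(numbers[0::2], numbers[1::2])))
    return strokes

strokes = parse_traces(INK)
for index, stroke in enumerate(strokes):
    print(f"stroke {index}: {len(stroke)} points, starts at {stroke[0]}")
```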

III.3. Annotating and combining input

EMMA: annotating input

Goals:

define a markup language that carries user input, with its annotations, from the recognisers to the interaction manager
let intermediaries add metadata to recogniser output

EMMA: example

<emma:emma emma:version="1.0"
 xmlns:emma="http://www.w3.org/2003/04/emma#"> 
  <emma:one-of emma:id="r1" 
      emma:start="2003-03-26T0:00:00.15"
      emma:end="2003-03-26T0:00:00.2">
    <emma:interpretation emma:id="int1" emma:confidence="0.75" > 
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date> 
         <emma:absolute-timestamp
          emma:start="2003-03-26T0:00:00.15"
          emma:end="2003-03-26T0:00:00.2"/> 
         03112003 
      </date>  
    </emma:interpretation>
    <emma:interpretation emma:id="int2" emma:confidence="0.68" >
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
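A consumer of this document, such as an interaction manager, typically wants the most likely interpretation. The sketch below is one possible way to pick it, using a trimmed copy of the example with the same namespace URI. The helper name `best_interpretation` is hypothetical, and treating a missing confidence as 0 is an assumption.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"  # namespace URI as written in the example above

# Trimmed copy of the example: timestamps and the date field are left out.
EMMA_DOC = f"""
<emma:emma emma:version="1.0" xmlns:emma="{EMMA_NS}">
  <emma:one-of emma:id="r1">
    <emma:interpretation emma:id="int1" emma:confidence="0.75">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:id="int2" emma:confidence="0.68">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
"""

def best_interpretation(emma_xml):
    """Return the emma:interpretation element with the highest confidence."""
    root = ET.fromstring(emma_xml)
    candidates = root.iter(f"{{{EMMA_NS}}}interpretation")
    # Assumption: an interpretation without a confidence is ranked as 0.
    return max(
        candidates,
        key=lambda el: float(el.get(f"{{{EMMA_NS}}}confidence", "0")),
    )

best = best_interpretation(EMMA_DOC)
best_id = best.get(f"{{{EMMA_NS}}}id")
best_origin = best.findtext("origin")
print(best_id, best_origin)
```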

III.4. Compositing

Making one EMMA document out of two

programming language (e.g. Java+DOM)

XSLT

<xsl:if test="@emma:confidence &gt; 0.4">
  <xsl:copy-of select="."/>
</xsl:if>
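The programming-language route can be sketched in the same spirit as the XSLT filter: read two EMMA documents, keep the interpretations above a confidence threshold, and collect them in one new document. Everything below is illustrative: the two input documents, the `merge` helper, the threshold value, and the choice of `emma:group` as the container are assumptions, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"
ET.register_namespace("emma", EMMA_NS)  # keep the emma: prefix when serialising

def q(name):
    """Qualified name in the EMMA namespace, in ElementTree's {uri}name form."""
    return f"{{{EMMA_NS}}}{name}"

# Hypothetical recogniser outputs, one per modality.
SPEECH = f"""<emma:emma xmlns:emma="{EMMA_NS}">
  <emma:interpretation emma:id="speech1" emma:confidence="0.8">
    <command>print</command>
  </emma:interpretation>
</emma:emma>"""

INK = f"""<emma:emma xmlns:emma="{EMMA_NS}">
  <emma:interpretation emma:id="ink1" emma:confidence="0.9">
    <target>file3</target>
  </emma:interpretation>
  <emma:interpretation emma:id="ink2" emma:confidence="0.2">
    <target>file8</target>
  </emma:interpretation>
</emma:emma>"""

def merge(documents, min_confidence):
    """Copy every interpretation above min_confidence into one new document."""
    merged = ET.Element(q("emma"))
    group = ET.SubElement(merged, q("group"))
    for doc in documents:
        for interp in ET.fromstring(doc).iter(q("interpretation")):
            if float(interp.get(q("confidence"), "0")) > min_confidence:
                group.append(interp)
    return merged

merged = merge([SPEECH, INK], min_confidence=0.5)
merged_ids = [el.get(q("id")) for el in merged.iter(q("interpretation"))]
print(merged_ids)
print(ET.tostring(merged, encoding="unicode"))
```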

Declarative: SMIL

<par>
    <par>
        <ref id="input1" mode="ink" grammar="select.ink" begin="activateEvent"/>
        <ref id="timeout1" dur="2s" begin="input1.activateEvent"/>
    </par>
    <excl end="timeout2.end">
        <priorityClass peers="pause">
            <ref id="timeout2" end="timeout1.end"/>
            <ref id="speech1" mode="speech" grammar="print.grm" begin="activateEvent"/>
        </priorityClass>
    </excl>
</par>
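One reading of the SMIL fragment is that an ink selection opens a two-second window during which a spoken command can be combined with it. The sketch below expresses only that timing rule in procedural form; it does not implement SMIL, and the function name, the event times and the pairing policy (first speech input inside the window) are assumptions.

```python
def pair_within_window(ink_times, speech_times, window=2.0):
    """Pair each ink activation with the first speech input that arrives
    no later than `window` seconds after it, or with None if there is none."""
    ordered_speech = sorted(speech_times)
    pairs = []
    for ink_time in ink_times:
        match = next(
            (t for t in ordered_speech if ink_time <= t <= ink_time + window),
            None,
        )
        pairs.append((ink_time, match))
    return pairs

# Times in seconds: speech at 2.5 falls inside the window opened at 1.0,
# speech at 14.0 arrives too late for the ink input at 10.0.
pairs = pair_within_window([1.0, 10.0], [2.5, 14.0])
print(pairs)
```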

III.5. The Dynamic Properties Framework

The System and Environment (S+E) specification defines an API for accessing system properties, e.g.:

battery level
signal strength
latitude/longitude from GPS
ambient noise
user preferences

DPF: example

<html>
  <head>
    <title>GPS location example</title>
    <script type="text/javascript">
    <![CDATA[
      SystemEnvironment.location.format="zip code";
      SystemEnvironment.location.updateFrequency="20s";
    ]]>
    </script>
    <script defer="defer" type="text/javascript" 
      ev:event="se:locationUpdate">
      <![CDATA[
        var field = document.getElementById("location");
        var zipcode = SystemEnvironment.location;
        field.childNodes[0].nodeValue = zipcode;
      ]]>
    </script>
  </head>
  <body>
    <h1>Track your location as you walk</h1>
    <p>Your current zip code is: <span id="location">(please
    wait)</span></p>
  </body>
</html>
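The page above relies on a property whose value changes over time and on a handler that runs at each update. That pattern can be sketched outside the browser as a small observer class. `DynamicProperty` and its methods are hypothetical names chosen for this illustration and are not part of the DPF API; the zip codes are sample data.

```python
class DynamicProperty:
    """A value that notifies registered listeners whenever it changes."""

    def __init__(self, value=None):
        self._value = value
        self._listeners = []

    @property
    def value(self):
        return self._value

    def add_listener(self, callback):
        """Register a callable that receives each new value."""
        self._listeners.append(callback)

    def update(self, new_value):
        """Store new_value and notify listeners, but only on a real change."""
        if new_value == self._value:
            return
        self._value = new_value
        for callback in self._listeners:
            callback(new_value)

location = DynamicProperty()
seen = []
location.add_listener(seen.append)  # stands in for the page's update handler

location.update("02139")
location.update("02139")  # unchanged value: listeners are not called again
location.update("02142")
print(seen)
```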

III.6. Interaction Manager

The manager...

receives the page or application from the Web
knows which modalities are available
gets information from the S+E and Session component

...

Interaction Manager (2)

...and shapes the interaction accordingly:

with visual media: shows the page on the screen
with audio media: presents an application as a dialogue (à la VoiceXML)

sometimes with a little help from the application author...
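The decision described on these two slides can be sketched as a function from the available modalities to a presentation plan. This is a toy illustration, not an interaction manager: the modality names and the plan strings are assumptions.

```python
def plan_presentation(available_modalities):
    """Choose how to present the application for the modalities at hand."""
    plan = []
    if "visual" in available_modalities:
        plan.append("show the page on the screen")
    if "audio" in available_modalities:
        plan.append("present the application as a spoken dialogue")
    if not plan:
        raise ValueError("no supported output modality is available")
    return plan

phone_plan = plan_presentation({"audio"})
kiosk_plan = plan_presentation({"visual", "audio"})
print(phone_plan)
print(kiosk_plan)
```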

Part IV: Writing multimodal web content

Existing web pages and applications will still work but won't provide:

modality-dependent text
modality-dependent interaction

So extensions will be useful.

You can already do it in HTML+JavaScript+MID+DPF (see above)
Declarative markup (better) is appearing: XHTML+Voice, SALT, CSS-MMI

SALT

Speech Application Language Tags

Note: the markup on this slide is a VoiceXML 2.0 form (root element vxml), not SALT's own tags.

<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
    <form>
        <field name="stock">
            <grammar src="./g_stock.grxml"/>
            <help> Please just say stock name. </help>
            Please say the stock name.
        </field>
        <field name="op">
            <grammar src="./g_op.grxml"/>
            <help> Please just say buy or sell. </help>
            Do you want to buy or sell?
        </field>
        <field name="quantity"> 
            <grammar src="./g_quant.grxml"/>
            <help> Please just say number of shares. </help>
            How many shares?
        </field>
        <field name="price">
            <grammar src="./g_price.grxml"/>
            <help> Please just say price. </help>
            What's the price?
        </field>
    </form>
</vxml>
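The form above is interpreted field by field: the platform prompts, listens, and moves on once the field's grammar accepts the answer. The loop below is a much simplified sketch of that idea, not the actual form interpretation algorithm. The accepted word sets stand in for the external grammar files, whose contents the slide does not show, and all names are hypothetical.

```python
def fill_form(fields, utterances):
    """Prompt for each field in order until it receives an accepted answer.

    fields: list of (name, prompt, accepted_words) tuples.
    utterances: iterable of simulated user answers, consumed in order.
    Raises StopIteration if the answers run out before the form is filled.
    """
    answers = iter(utterances)
    filled = {}
    transcript = []
    for name, prompt, accepted_words in fields:
        while name not in filled:
            transcript.append(prompt)      # the system speaks the prompt
            utterance = next(answers)      # the user replies
            if utterance in accepted_words:
                filled[name] = utterance   # matched: the field is filled
            # otherwise the loop repeats and the prompt is spoken again
    return filled, transcript

FIELDS = [
    ("op", "Do you want to buy or sell?", {"buy", "sell"}),
    ("quantity", "How many shares?", {"ten", "twenty"}),
]

filled, transcript = fill_form(FIELDS, ["maybe", "buy", "ten"])
print(filled)
print(transcript)
```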

XHTML+Voice

<?xml version="1.0"?>
<html 
xmlns="http://www.w3.org/1999/xhtml" 
xmlns:vxml="http://www.w3.org/2001/vxml"
xmlns:ev="http://www.w3.org/2001/xml-events"
xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
>
  <head>
    <title>XHTML+Voice Example</title>
    <!-- voice handler -->
    <vxml:form id="sayHello">
      <vxml:block><vxml:prompt xv:src="#hello"/>
      </vxml:block>
    </vxml:form>
  </head>
  <body>
    <h1>XHTML+Voice Example</h1>
    <p id="hello" ev:event="click" ev:handler="#sayHello">
      Hello World!
    </p>
  </body>
</html>

Focus on CSS-MMI

Extensions to CSS for multimodal interaction

Designing an application's interaction can be viewed as styling it

CSS-MMI: a simple HTML file

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Daily Horoscope</title>
  </head>
  <body>
    <form action="http://example.com/horoscope">
    Your star sign?
       <input id="sign" type="text" name="sign" />
    </form>
  </body>
</html>

CSS-MMI: example stylesheet

#sign:focus {
     prompt: "What is your star sign?";
     grammar: Aries | Taurus | Gemini | Cancer;
     reprompt: 1.5s;
}

or per-modality:

@media speech {
     prompt: "Do you confirm?";
     grammar: yes | yeah {yes} | sure {yes} | no | nah {no};
}
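The grammar value in the per-modality rule appears to list alternatives separated by `|`, with an optional `{value}` naming the result returned for that alternative. Under that assumed reading, a parser for the inline syntax can be sketched as follows; `parse_alternatives` is a hypothetical helper, not part of any CSS-MMI implementation.

```python
import re

# A spoken phrase, optionally followed by the value to return in braces.
ALTERNATIVE = re.compile(r"(.+?)\s*(?:\{\s*(.+?)\s*\})?")

def parse_alternatives(grammar):
    """Map each spoken phrase to the value it should produce.

    A phrase without a {value} suffix maps to itself (assumed behaviour).
    """
    mapping = {}
    for alternative in grammar.split("|"):
        alternative = alternative.strip()
        match = ALTERNATIVE.fullmatch(alternative)
        if match is None:
            raise ValueError(f"empty or malformed alternative: {alternative!r}")
        phrase, value = match.group(1), match.group(2)
        mapping[phrase] = value if value is not None else phrase
    return mapping

confirm = parse_alternatives("yes | yeah {yes} | sure {yes} | no | nah {no}")
print(confirm)
```

With such a mapping, "yeah" and "sure" both reach the application as the single value yes, so the form logic only has to handle two outcomes.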

Conclusion

The framework is big!
But pieces work individually, and can be used elsewhere
Open standards will make it possible to bring many different disciplines together

Multimodal Interaction on The Web