Multimodal Interaction

Dave Raggett, dsr@w3.org

Traditional View of the Web

But ...

The Architecture of the Web

The World Wide Web (WWW, or simply Web) is an information space in which the items of interest, referred to as resources, are identified by global identifiers called Uniform Resource Identifiers (URI).

URIs identify resources; data formats represent them

See http://www.w3.org/TR/webarch/

Web pages, browsers and Web sites

Web Pages

  • Typically marked up in HTML
    • But also XHTML, SVG, SMIL, MathML, XForms and combinations
  • Interpreted by Web browsers
  • Retrieved from Web sites via HTTP (Hypertext Transfer Protocol)
  • Behaviour is event driven and customizable via scripting
  • Presentation styled via CSS (Cascading Style Sheets)

Web Servers

  • Host Web sites
  • Are driven by client requests
    • HTTP GET, PUT, POST
  • Map URIs to static files or scripts (CGI, Servlets)
  • Backed by databases, payment systems, etc.
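
The request handling outlined above can be sketched in a few lines of script. This is only an illustration under stated assumptions: the routing table, the `htdocs` directory, the `/cgi-bin/` prefix and the function name `resolveRequest` are all hypothetical and do not come from the slides or from any particular server.

```javascript
// Hypothetical routing table: the first matching URI prefix decides
// whether a request is served by a script or from a static file.
const ROUTES = [
  { prefix: "/cgi-bin/", kind: "script" },
  { prefix: "/", kind: "static" },
];

// The three HTTP methods named on the slide.
const METHODS = ["GET", "PUT", "POST"];

// Describe what a server might do with a request; nothing is executed or read.
function resolveRequest(method, path) {
  if (!METHODS.includes(method)) {
    return { kind: "rejected", reason: "unsupported method" };
  }
  const route = ROUTES.find((r) => path.startsWith(r.prefix));
  if (!route) {
    return { kind: "rejected", reason: "no matching route" };
  }
  if (route.kind === "script") {
    // Assumed convention: the rest of the path names the script to run.
    return { kind: "script", target: path.slice(route.prefix.length) };
  }
  // Assumed convention: static files live under an "htdocs" directory.
  return { kind: "static", target: "htdocs" + path };
}
```

For example, `resolveRequest("POST", "/cgi-bin/order")` selects the script branch, while `resolveRequest("GET", "/index.html")` maps to a static file.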

Freedom of Choice

New opportunities to let end users select the modes of interaction best suited to their current needs, while enabling application developers to provide an effective user experience whichever modes users choose.

Multimodal Interaction

Giving users a choice of modes:

Differences between Speech and GUI

Speech brings new requirements for user interfaces

Small vs Large displays

Dialogues

Interpreting user input

Extensible Multi-Modal Annotations (EMMA)

Authoring Multimodal Applications

Speech Application Language Tags (SALT)

SALT Example

<body onload="intro.start();">

<salt:prompt id="intro" onComplete="askTravel.start();">
      Welcome to Ajax Travel 
</salt:prompt>

<salt:prompt id="askTravel" onComplete="lsnTravel.start();">
      Do you want to travel by air, rail, or boat? 
</salt:prompt>

<salt:listen id="lsnTravel" onreco="threeWayBranch();">
   <salt:grammar id="gram1" name="gram1">
      <grammar version="1.0" xml:lang="en-US"
      xmlns="http://www.w3.org/2001/06/grammar" root="travel">
         <rule id="travel" scope="public">
            <one-of>
               <item>air</item>
               <item>rail</item>
               <item>boat</item>
            </one-of>
         </rule>
      </grammar>
   </salt:grammar>
   <salt:bind targetelement ="saltdebug" value="/" />
   <salt:bind targetelement ="spokenValue" value="//" />
</salt:listen>
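
The `threeWayBranch()` handler named in `onreco` is not shown on the slide. A minimal sketch of what such a handler might do follows, written so that it can run outside a browser. Everything in it is an assumption: the follow-up prompt names (`askFlight`, `askTrain`, `askFerry`), the helper `chooseBranch`, and the idea that the recognized word is passed in as a plain string.

```javascript
// Hypothetical mapping from the recognized word to the next prompt's id.
// These prompt names are illustrative and do not appear in the slides.
const NEXT_STEP = { air: "askFlight", rail: "askTrain", boat: "askFerry" };

// Pick the next dialogue step, re-asking the question on anything unexpected.
function chooseBranch(spokenText) {
  const word = String(spokenText).trim().toLowerCase();
  const known = Object.prototype.hasOwnProperty.call(NEXT_STEP, word);
  return known ? NEXT_STEP[word] : "askTravel";
}

// Assumed shape of the handler: `prompts` maps ids to objects with a start()
// method, mirroring the intro.start() style used in the example above.
function threeWayBranch(spokenText, prompts) {
  const next = chooseBranch(spokenText);
  prompts[next].start();
  return next;
}
```

In a real page the handler would take no arguments and would read the recognized text from the page itself, for instance from the `spokenValue` element that the `salt:bind` lines target.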

XHTML + Voice (X+V)

X+V Example

<?xml version="1.0"?>
<html
 xmlns="http://www.w3.org/1999/xhtml"
 xmlns:vxml="http://www.w3.org/2001/vxml"
 xmlns:ev="http://www.w3.org/2001/xml-events"
 xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
>
  <head>
    <title>XHTML+Voice Example</title>
    <!-- voice handler -->
    <vxml:form id="sayHello">
      <vxml:block>
        <vxml:prompt xv:src="#hello"/>
      </vxml:block>
    </vxml:form>
  </head>
  <body>
    <h1>XHTML+Voice Example</h1>

    <p id="hello" ev:event="click" ev:handler="#sayHello">
      Hello World!
    </p>
  </body>
</html>

XHTML + Scripting

XHTML + State Transition Networks

Rule and Plan Based Approaches

Embedded vs Distributed solutions

Where Next?