W3C Voice Browsing WG - Multimodal Dialog Requirements and Specification

W3C Voice Browsing Activity, Multimodal subgroup

Goal: language spec for multimodal dialog interaction

Status

Language specification – possible approaches

  1. Include elements for synchronisation within the documents

  2. XML data structure specifies relationship between documents

Example dialog:

Computer, C:  says "Would you like coffee, tea, milk or nothing?"
and displays pictures of the choices

User, U:   "coffee" or clicks a picture

C:  "Would you like a cookie, a cake, a sandwich, or nothing?"
and displays pictures of the choices

U:  clicks on "sandwich" or says "sandwich"

C:  "Thank you for using the food and drink service!" 

Proposed elements for multimodality

Element             Attribute                         Description
DialogML elements: <filled>, <noinput>, <menu>, ...   Applies to input from any modality
SMIL elements: <par>, <seq>, <excl>
<show>              src: URI to show                  e.g. displays the content of the URL in a browser window
                    id: identifier
<update>            id                                Update with key-value pairs,
                    expr                              expr: key=val(;key=val)*
<listen>            id                                Wait for input
                    wait_before
<close>             id                                Close the window
<input-sync-excl>                                     Take the first input
<input-synch-all>   start: start of sync window       Co-ordinate inputs from different modalities
                    end: end of sync window
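
Several of the proposed elements and attributes listed above (<input-synch-all>, and the expr attribute of <update>) are not exercised in the examples that follow. The fragment below is a minimal sketch of how they might be combined, not markup taken from the proposal. It assumes that start and end take SMIL-style clock values such as "5s" (the source does not define their format), and the key name "size" is hypothetical.

```xml
<field name="drink">
  <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
  <!-- Assumption: start/end bound the synchronization window and use
       SMIL-style clock values; the proposal does not specify this. -->
  <input-synch-all start="0s" end="5s">
    <grammar src="drink.gram" type="application/x-jsgf"/>
    <listen id="drinkfood"/>
  </input-synch-all>
  <filled>
    <!-- expr follows the key=val(;key=val)* pattern;
         "size" is a hypothetical key used only for illustration. -->
    <update id="drinkfood" expr="drink=coffee;size=large"/>
  </filled>
</field>
```

Under these assumptions, speech and click input arriving within the five-second window would be co-ordinated rather than the first one simply winning, which is the behaviour described for <input-sync-excl>.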

Example with embedded synchronization

<form>
  <field name="drink">
    <show src="http://www.drinkfood.ex/drinkfood.html"
      id="drinkfood"/>
    <prompt>Would you like coffee, tea, milk,
      or nothing?</prompt>
    <input-sync-excl>
         <grammar src="drink.gram" type="application/x-jsgf"/>
         <listen id="drinkfood"/>
    </input-sync-excl>
    <filled>
       <update namelist="drink" id="drinkfood"/> 
       <par> 
           <show src="advert.html" id="advert"/> 
           <audio src="advert.au" begin="5s" dur="10s"/>
       </par> 
    </filled>
  </field>
  <field name="food">
    <prompt>Would you like a cookie, a cake, a sandwich,
    or nothing?</prompt>
    <input-sync-excl>
         <grammar src="food.gram" type="application/x-jsgf"/>
         <listen id="drinkfood"/>
    </input-sync-excl>    
    <filled>
       <update namelist="food" id="drinkfood"/>
    </filled>
  </field>
  <block>
     <prompt>Thank you for using the food and drink
       service!</prompt>
     <close id="drinkfood"/>
     <submit next="http://www.drinkfood.ex/drinkfood.asp" 
       namelist="drink food"/>
  </block> 
</form>

Example with separate synchronization

Assume there are two voice dialogs, one for drink and one for food, each with a matching HTML file. For brevity, only the voice dialog files are listed:

drink.vxml:
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <field name="drink">
      <prompt>Would you like coffee, tea, milk,
        or nothing?</prompt>
      <grammar src="drink.gram" type="application/x-jsgf"/>
    </field>
    <block>
       <submit next="http://www.drinkfood.example/drink.asp"
         namelist="drink"/>
    </block>
  </form>
</vxml>
 
food.vxml:
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <field name="food">
      <prompt>Would you like a sandwich, cake, cookie
        or nothing?</prompt>
      <grammar src="food.gram" type="application/x-jsgf"/>
    </field>
    <block>
       <submit next="http://www.drinkfood.example/food.asp"
          namelist="food"/>
    </block>
  </form>
</vxml>

Here is some markup for synchronizing the two:

<multimodal>
  <input-sync-excl>
    <show src="drink.vxml"/>
    <show src="drink.html"/>
  </input-sync-excl>
  <input-sync-excl>
    <show src="food.vxml"/>
    <show src="food.html"/>
  </input-sync-excl>
</multimodal>
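
The markup above leaves the ordering of the drink and food steps implicit. As a sketch only, and assuming that the SMIL <seq> element from the proposed element list may appear inside <multimodal> (the source does not say so), the ordering could be stated explicitly:

```xml
<multimodal>
  <!-- Assumption: <seq> is permitted here and runs its children in order -->
  <seq>
    <input-sync-excl>
      <show src="drink.vxml"/>
      <show src="drink.html"/>
    </input-sync-excl>
    <input-sync-excl>
      <show src="food.vxml"/>
      <show src="food.html"/>
    </input-sync-excl>
  </seq>
</multimodal>
```

Either way, the voice and HTML documents stay free of synchronization markup, which is the point of the second approach.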