Marianne Hickey, HP Labs
Goal: language spec for multimodal dialog interaction
Speech + other interaction modes
Focus: speech + small screen, pointing device, buttons
Reuse other mark-up
Multiple input/output modalities available simultaneously (co-ordinating them is a lower priority)
Status
May ’00: Paris face to face – show and tell multimodal demos
July ’00: requirements for multimodal dialog interaction http://www.w3.org/TR/multimodal-reqs
Sept ’00: Review draft Language Specification, W3C f2f
1: Include elements for synchronisation within the documents
2: XML data structure specifies relationship between documents
Example dialog:
Computer (C): says "Would you like coffee, tea, milk or nothing?" and displays pictures of the choices
User (U): says "coffee" or clicks a picture
C: says "Would you like a cookie, a cake, a sandwich, or nothing?" and displays pictures of the choices
U: clicks on "sandwich" or says "sandwich"
C: says "Thank you for using the food and drink service!"
| Element | Attribute | Description |
|---|---|---|
| DialogML elements: <filled>, <noinput>, <menu>, ... | | Applies to input from any modality |
| SMIL elements: <par>, <seq>, <excl> | | |
| <show> | src: URI to show; id: identifier | e.g. displays content of URL in browser window |
| <update> | id; expr | Update with key-value pairs; expr: key=val(;key=val)* |
| <listen> | id; wait_before | Wait for input |
| <close> | id | Close the window |
| <input-sync-excl> | | Take the first input |
| <input-synch-all> | start: start of sync window; end: end of sync window | Co-ordinate inputs from different modalities |
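The expr attribute of <update> carries key-value pairs in the form key=val(;key=val)*. As a rough illustration of how a receiving window might decode such a string, here is a minimal Python sketch. The function name parse_update_expr and the choice to reject malformed pairs are assumptions made for illustration, not part of the draft language.

```python
def parse_update_expr(expr):
    """Parse a 'key=val(;key=val)*' string into a dict (hypothetical helper)."""
    pairs = {}
    for part in expr.split(";"):
        key, sep, val = part.partition("=")
        key = key.strip()
        # Assumption: a pair without '=' or with an empty key is an error.
        if not sep or not key:
            raise ValueError("malformed pair: %r" % part)
        pairs[key] = val.strip()
    return pairs


print(parse_update_expr("drink=coffee;food=sandwich"))
# prints {'drink': 'coffee', 'food': 'sandwich'}
```

A string such as "drink=coffee" would let the visual page reflect a choice made by voice.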
<form>
<field name="drink">
<show src="http://www.drinkfood.ex/drinkfood.html"
id="drinkfood"/>
<prompt>Would you like coffee, tea, milk,
or nothing?</prompt>
<input-sync-excl>
<grammar src="drink.gram" type="application/x-jsgf"/>
<listen id="drinkfood"/>
</input-sync-excl>
<filled>
<update namelist="drink" id="drinkfood"/>
<par>
<show src="advert.html" id="advert"/>
<audio src="advert.au" begin="5s" dur="10s"/>
</par>
</filled>
</field>
<field name="food">
<prompt>Would you like a cookie, a cake, a sandwich,
or nothing?</prompt>
<input-sync-excl>
<grammar src="food.gram" type="application/x-jsgf"/>
<listen id="drinkfood"/>
</input-sync-excl>
<filled>
<update namelist="food" id="drinkfood"/>
</filled>
</field>
<block>
<prompt>Thank you for using the food and drink
service!</prompt>
<close id="drinkfood"/>
<submit next="http://www.drinkfood.ex/drinkfood.asp"
namelist="drink food"/>
</block>
</form>
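The two synchronisation elements are described only briefly: <input-sync-excl> takes the first input, and <input-synch-all> co-ordinates inputs that fall inside a window bounded by start and end. The Python sketch below shows one possible reading of those rules. It assumes every input carries a timestamp; the names InputEvent, sync_excl and sync_all, the modality labels, and the inclusive window bounds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    modality: str  # illustrative labels such as "speech" or "click"
    value: str
    time: float    # assumed: seconds since the prompt started


def sync_excl(events):
    """Take the first input: return the earliest event, or None if empty."""
    return min(events, key=lambda e: e.time, default=None)


def sync_all(events, start, end):
    """Collect inputs with start <= time <= end (assumed inclusive), by time."""
    inside = [e for e in events if start <= e.time <= end]
    return sorted(inside, key=lambda e: e.time)


events = [
    InputEvent("speech", "coffee", 2.4),
    InputEvent("click", "tea", 1.9),
]
print(sync_excl(events).value)                          # prints tea
print([e.modality for e in sync_all(events, 2.0, 3.0)])  # prints ['speech']
```

Under this reading, the drink field above would be filled by whichever of the spoken answer or the click arrives first.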
Assume you have two voice dialogs, one for drink and one for food, each with a matching HTML file. For brevity, only the voice dialog files are listed here:
drink.vxml:
<?xml version="1.0"?>
<vxml version="1.0">
<form>
<field name="drink">
<prompt>Would you like coffee, tea, milk,
or nothing?</prompt>
<grammar src="drink.gram" type="application/x-jsgf"/>
</field>
<block>
<submit next="http://www.drinkfood.example/drink.asp"
namelist="drink"/>
</block>
</form>
</vxml>
food.vxml:
<?xml version="1.0"?>
<vxml version="1.0">
<form>
<field name="food">
<prompt>Would you like a sandwich, cake, cookie
or nothing?</prompt>
<grammar src="food.gram" type="application/x-jsgf"/>
</field>
<block>
<submit next="http://www.drinkfood.example/food.asp"
namelist="food"/>
</block>
</form>
</vxml>
Here is some markup for synchronizing the two:
<multimodal>
<input-sync-excl>
<show src="drink.vxml"/>
<show src="drink.html"/>
</input-sync-excl>
<input-sync-excl>
<show src="food.vxml"/>
<show src="food.html"/>
</input-sync-excl>
</multimodal>