Marianne Hickey, HP Labs
Goal: language spec for multimodal dialog interaction
Speech + other interaction modes
Focus: speech + small screen, pointing device, buttons
Reuse other mark-up
Multiple input/output modalities available simultaneously (co-ordinating them is a lower priority)
Status
May ’00: Paris face to face – show and tell multimodal demos
July ’00: requirements for multimodal dialog interaction http://www.w3.org/TR/multimodal-reqs
Sept ’00: Review draft Language Specification, W3C f2f
1: Include elements for synchronisation within the documents
2: XML data structure specifies relationship between documents
Example dialog:
Computer, C: says "Would you like coffee, tea, milk or nothing?" and displays pictures of the choices
User, U: "coffee" or clicks a picture
C: "Would you like a cookie, a cake, a sandwich, or nothing?" and displays pictures of the choices
U: clicks on "sandwich" or says "sandwich"
C: "Thank you for using the food and drink service!"
Element | Attribute | Description |
---|---|---|
DialogML elements: <filled>, <noinput>, <menu>, ... | | Applies to input from any modality |
SMIL elements: <par>, <seq>, <excl> | | |
<show> | src: URI to show; id: identifier | e.g. displays the content of the URL in a browser window |
<update> | id; expr | Update with key-value pairs; expr: key=val(;key=val)* |
<listen> | id; wait_before | Wait for input |
<close> | id | Close the window |
<input-sync-excl> | | Take the first input |
<input-synch-all> | start: start of sync window; end: end of sync window | Co-ordinate inputs from different modalities |
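The `expr` attribute of `<update>` follows the pattern `key=val(;key=val)*`. As a rough illustration of how a browser might read such a string, here is a minimal Python sketch. The function name `parse_update_expr` and the choice to reject malformed pairs with an error are assumptions made for this example; the draft itself does not define error handling.

```python
def parse_update_expr(expr: str) -> dict[str, str]:
    """Parse a string of the form key=val(;key=val)* into a dict.

    Hypothetical helper: the name and the strict error handling are
    assumptions, not part of the draft language specification.
    """
    pairs: dict[str, str] = {}
    for item in expr.split(";"):
        key, sep, value = item.partition("=")
        # Every item needs an '=' and a non-empty key.
        if not sep or not key.strip():
            raise ValueError(f"malformed key=val pair: {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


print(parse_update_expr("drink=coffee;food=sandwich"))
# → {'drink': 'coffee', 'food': 'sandwich'}
```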
<form>
  <field name="drink">
    <show src="http://www.drinkfood.ex/drinkfood.html" id="drinkfood"/>
    <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
    <input-sync-excl>
      <grammar src="drink.gram" type="application/x-jsgf"/>
      <listen id="drinkfood"/>
    </input-sync-excl>
    <filled>
      <update namelist="drink" id="drinkfood"/>
      <par>
        <show src="advert.html" id="advert"/>
        <audio src="advert.au" begin="5s" dur="10s"/>
      </par>
    </filled>
  </field>
  <field name="food">
    <prompt>Would you like a cookie, a cake, a sandwich, or nothing?</prompt>
    <input-sync-excl>
      <grammar src="food.gram" type="application/x-jsgf"/>
      <listen id="drinkfood"/>
    </input-sync-excl>
    <filled>
      <update namelist="food" id="drinkfood"/>
    </filled>
  </field>
  <block>
    <prompt>Thank you for using the food and drink service!</prompt>
    <close id="drinkfood"/>
    <submit next="http://www.drinkfood.ex/drinkfood.asp" namelist="drink food"/>
  </block>
</form>
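The element table describes `<input-sync-excl>` as "take the first input" and `<input-synch-all>` as co-ordinating inputs from different modalities inside a sync window. The Python sketch below is one possible reading of those two descriptions, not the specified behaviour. The `InputEvent` record, the use of seconds as the time unit, and the "first value per modality" policy in `sync_all` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InputEvent:
    time: float    # seconds since the prompt started (assumed unit)
    modality: str  # e.g. "speech" or "click"
    value: str     # recognised word or clicked item


def sync_excl(events: list[InputEvent]) -> Optional[InputEvent]:
    """Model of <input-sync-excl>: keep only the earliest input."""
    return min(events, key=lambda e: e.time, default=None)


def sync_all(events: list[InputEvent], start: float, end: float) -> dict[str, str]:
    """Model of <input-synch-all>: gather inputs that fall inside the
    sync window [start, end], keeping the first value per modality
    (an assumed policy)."""
    merged: dict[str, str] = {}
    for event in sorted(events, key=lambda e: e.time):
        if start <= event.time <= end and event.modality not in merged:
            merged[event.modality] = event.value
    return merged


events = [
    InputEvent(2.4, "speech", "coffee"),
    InputEvent(1.9, "click", "tea"),
]
print(sync_excl(events))           # the click arrived first
print(sync_all(events, 0.0, 3.0))  # both inputs fall inside the window
```

In the drink field of the example form, the `sync_excl` reading means that whichever of the spoken answer or the click arrives first fills the field, and the other is ignored.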
Assume you have two voice dialogs, one for drink and one for food, each with a matching HTML file. Only the voice dialog files are listed here, for brevity:
drink.vxml:

<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <field name="drink">
      <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
      <grammar src="drink.gram" type="application/x-jsgf"/>
    </field>
    <block>
      <submit next="http://www.drinkfood.example/drink.asp" namelist="drink"/>
    </block>
  </form>
</vxml>

food.vxml:

<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <field name="food">
      <prompt>Would you like a sandwich, cake, cookie or nothing?</prompt>
      <grammar src="food.gram" type="application/x-jsgf"/>
    </field>
    <block>
      <submit next="http://www.drinkfood.example/food.asp" namelist="food"/>
    </block>
  </form>
</vxml>
Here is some markup for synchronizing the two:
<multimodal>
  <input-sync-excl>
    <show src="drink.vxml"/>
    <show src="drink.html"/>
  </input-sync-excl>
  <input-sync-excl>
    <show src="food.vxml"/>
    <show src="food.html"/>
  </input-sync-excl>
</multimodal>
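To make the second approach concrete, the following Python sketch reads a `<multimodal>` document like the one above and lists which documents are presented together in each step. It uses only the standard `xml.etree.ElementTree` module. The function name `document_groups` is hypothetical, and treating each `<input-sync-excl>` child as one dialog step is an assumption about how a browser would interpret the markup.

```python
import xml.etree.ElementTree as ET

# Same structure as the synchronisation markup above, with plain quotes.
MULTIMODAL_DOC = """
<multimodal>
  <input-sync-excl>
    <show src="drink.vxml"/>
    <show src="drink.html"/>
  </input-sync-excl>
  <input-sync-excl>
    <show src="food.vxml"/>
    <show src="food.html"/>
  </input-sync-excl>
</multimodal>
"""


def document_groups(xml_text: str) -> list[list[str]]:
    """Return, for each <input-sync-excl> step, the src of every <show>."""
    root = ET.fromstring(xml_text.strip())
    groups = []
    for step in root:
        if step.tag != "input-sync-excl":
            continue  # ignore anything this sketch does not model
        groups.append([show.get("src") for show in step if show.tag == "show"])
    return groups


print(document_groups(MULTIMODAL_DOC))
# → [['drink.vxml', 'drink.html'], ['food.vxml', 'food.html']]
```

Each inner list pairs a voice dialog with its matching HTML page, so the existing single-mode documents stay unchanged and only this small wrapper describes their relationship.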