W3C

MMI interoperability test report

W3C Working Group Note 24 January 2012

This version:
http://www.w3.org/TR/2012/NOTE-mmi-interop-20120124/
Latest published version:
http://www.w3.org/TR/mmi-interop/
Previous version:
none
Editor:
Ingmar Kliche, Deutsche Telekom AG
Authors:
Nagesh Kharidi, Openstream
Piotr Wiechno, France Telecom

Abstract

This document describes an interoperability test, executed by several members of the Multimodal Interaction Working Group, to demonstrate the interoperability of multimodal components that implement the "Multimodal Architecture and Interfaces" [MMI-ARCH] specification.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the 24 January 2012 W3C Working Group Note of "MMI interoperability test report". This W3C Working Group Note has been developed by the Multimodal Interaction Working Group of the W3C Multimodal Interaction Activity.

Comments for this note are welcomed and should have a subject starting with the prefix '[INTEROP]'. Please send them to www-multimodal@w3.org, the public email list for issues related to Multimodal. This list is archived and acceptance of this archiving policy is requested automatically upon first post. To subscribe to this list send an email to www-multimodal-request@w3.org with the word "subscribe" in the subject line.

This document was published by the Multimodal Interaction Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to www-multimodal@w3.org (subscribe, archives). All comments are welcomed and should have a subject starting with the prefix '[INTEROP]'.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

The W3C MMI Working Group developed the “Multimodal Architecture and Interfaces” specification [MMI-ARCH], which has been a W3C Candidate Recommendation since January 2011. To prove implementability, several working group members (Openstream, France Telecom and Deutsche Telekom) initiated an interoperability test activity. The goal of this activity was to build a multimodal system from components supplied by the different participants and to demonstrate the interoperability of those components using a simple application.

The key principles of the Multimodal Architecture are to treat Modality Components as 'black-boxes', making no assumptions about their internal implementation and allowing them to communicate solely through the life-cycle events as described. The architecture supports extension of functionality through extension-events, but requires that modality components do not communicate directly with each other, but only through the Interaction Manager.

The current exercise verifies that MCs developed by different vendors can actually communicate through life-cycle events, and thereby verifies the architectural principles outlined in the Multimodal Architecture and Interfaces specification [MMI-ARCH].

2. Test setup

The multimodal system has been built using 3 architectural components: one Interaction Manager and two Modality Components. All 3 components have been provided by different participants. The following figure gives an overview of the structure of the system.

interoperability test architecture

As shown in the figure the system consisted of a voice modality component and a graphical modality component (GUI). The system has been set up as a distributed environment and HTTP has been used for the lifecycle event transport. A description of the details of the lifecycle event transport can be found in the "Multimodal Architecture and Interfaces" [MMI-ARCH] specification.

A more detailed description of implementation options (some of which were used during this test) can be found in the "Authoring Applications for the Multimodal Architecture" note [MMI-AUTH].

3. Components of the multimodal system

Interaction Manager

The Interaction Manager is an HTTP server that runs SCXML applications and processes MMI life-cycle events. The server manages multiple SCXML interpreter instances simultaneously (one per application context). Life-cycle events are transported via HTTP as described in Appendix E of the Multimodal Architecture and Interfaces specification. The server is Java-based and uses Commons-SCXML to interpret SCXML. HTTP transport and internal event routing are handled by Mule ESB.

An additional instanceId parameter has been added to initial HTTP requests sent by distributed Modality Components to enable instance (context) sharing. The instanceId parameter needs to be configured on each Modality Component that wants to share a context with another Modality Component. If the IM receives a NewContextRequest message with an attached instanceId that matches a previously received value, it will reuse the previously generated context ID in the NewContextResponse message. If no match is found for the provided instanceId or if the parameter is missing, the IM will generate a new context (state machine instance).
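
The matching rule described above can be illustrated with a minimal sketch. It is written in Python for brevity (the Interaction Manager used in the test is Java-based), and the class name, method names and the "ctx-NN" ID format are assumptions made for illustration only, not the actual implementation.

```python
import itertools


class ContextRegistry:
    """Hypothetical registry mapping a shared instanceId to a context ID."""

    def __init__(self):
        self._by_instance = {}              # instanceId -> context ID
        self._counter = itertools.count(1)

    def _new_context(self):
        # Assumed ID format; the real IM may generate IDs differently.
        return "ctx-%02d" % next(self._counter)

    def context_for(self, instance_id=None):
        """Return the context ID to place in the NewContextResponse."""
        if instance_id is None:
            # Parameter missing: always generate a new context.
            return self._new_context()
        if instance_id not in self._by_instance:
            # No match found: generate a new context and remember it.
            self._by_instance[instance_id] = self._new_context()
        # Match found (or just created): reuse the stored context ID.
        return self._by_instance[instance_id]


registry = ContextRegistry()
gui_ctx = registry.context_for("quiz-session-1")    # first component
voice_ctx = registry.context_for("quiz-session-1")  # second component, same instanceId
other_ctx = registry.context_for()                  # no instanceId supplied
```

In this sketch the two components configured with the same instanceId receive the same context ID, while a request without an instanceId gets a context of its own.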

Graphical modality component

The graphical modality component has been implemented using HTML [HTML401], ECMAScript [ECMA-262] and XMLHttpRequests [XMLHTTPREQUEST]. The graphical modality component acts as a wrapper around the application-specific HTML (see below) and handles DOM events (such as click or focus) generated by the application-specific HTML. The graphical modality component uses XMLHttpRequests [XMLHTTPREQUEST] to send these events (wrapped in MMI lifecycle events) to the server-side interaction manager. Lifecycle events are also received from the interaction manager using XMLHttpRequests (see also "Authoring Applications for the Multimodal Architecture" [MMI-AUTH] for a more detailed description).
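
To make the wrapping step concrete, the following sketch builds an ExtensionNotification that carries a DOM event inside an EMMA document. The actual component is written in ECMAScript; this Python version, the function name wrap_dom_event and its parameters are illustrative assumptions. The sketch uses unprefixed attributes on the lifecycle event element, which is one of the two attribute styles that appear in the examples of this note.

```python
import xml.etree.ElementTree as ET

MMI_NS = "http://www.w3.org/2008/04/mmi-arch"
EMMA_NS = "http://www.w3.org/2003/04/emma"
# Ask ElementTree to serialize with the prefixes used in this note.
ET.register_namespace("mmi", MMI_NS)
ET.register_namespace("emma", EMMA_NS)


def wrap_dom_event(name, element_id, value, request_id, context):
    """Wrap a DOM event (hypothetical helper) in an MMI ExtensionNotification."""
    root = ET.Element("{%s}mmi" % MMI_NS, {"version": "1.0"})
    note = ET.SubElement(root, "{%s}ExtensionNotification" % MMI_NS, {
        "name": name,
        "source": "GUI",
        "requestID": request_id,
        "context": context,
    })
    data = ET.SubElement(note, "{%s}data" % MMI_NS)
    emma = ET.SubElement(data, "{%s}emma" % EMMA_NS, {"version": "1.0"})
    interp = ET.SubElement(emma, "{%s}interpretation" % EMMA_NS, {
        "id": "int-01",
        "{%s}medium" % EMMA_NS: "tactile",
        "{%s}mode" % EMMA_NS: "ink",
    })
    # The payload element is named after the DOM event and is not namespaced.
    ET.SubElement(interp, name, {"id": element_id, "value": value})
    return ET.tostring(root, encoding="unicode")


message = wrap_dom_event("change", "field1", "16", "r2", "c1")
```

The resulting string would then be sent to the interaction manager as the body of an HTTP request.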

The following table describes the functionality and the API of the graphical modality component (see also [MMI-MCBP] for more details):

Life Cycle Event Component Implementation
NewContextRequest (Standard) The component requests a new context from the IM.
Example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
	<mmi:NewContextRequest source="GUI" requestID="r1" target="MathQuiz" context="-1" />
</mmi:mmi>
NewContextResponse (Standard) The component starts a new context and assigns the new context id to it.
PrepareRequest The component does not take any action on a PrepareRequest.
PrepareResponse The component does not send a PrepareResponse.
StartRequest The component starts processing of the document referenced by contentURL
Example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:StartRequest source="MathQuiz" requestID="r2" target="GUI" context="c1">
      <mmi:contentURL mmi:href="http://localhost/gui.html"/>
   </mmi:StartRequest>
</mmi:mmi>
StartResponse (Standard)
DoneNotification Not used by the modality component.
CancelRequest This component cannot cancel.
CancelResponse The component does not send a CancelResponse.
PauseRequest This component cannot pause.
PauseResponse The component does not send a PauseResponse.
ResumeRequest This component cannot resume.
ResumeResponse The component does not send a ResumeResponse.
ExtensionNotification (focus) The component sends an “ExtensionNotification” lifecycle event to the IM when an input field gets focus. The attribute “id” contains the element's ID within the HTML document.
Example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:ExtensionNotification mmi:name="focus" mmi:source="GUI" mmi:requestID="r2" mmi:context="c1">
    <mmi:data>
      <emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <focus id="field1" value="123" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:ExtensionNotification>
</mmi:mmi>
ExtensionNotification (click) The component sends an “ExtensionNotification” lifecycle event to the IM when a button is clicked. The attribute “id” contains the button's ID within the HTML document.
Example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:ExtensionNotification mmi:name="click" mmi:source="GUI" mmi:requestID="r2" mmi:context="c1">
    <mmi:data>
      <emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <click id="button1" value="" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:ExtensionNotification>
</mmi:mmi>
ExtensionNotification (change) The component sends an “ExtensionNotification” lifecycle event to the IM when the content of an input field has changed. This event is not sent until the input field loses focus. The attribute “id” contains the input field's ID within the HTML document.
Example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:ExtensionNotification mmi:name="change" mmi:source="GUI" mmi:requestID="r2" mmi:context="c1">
    <mmi:data>
      <emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <change id="field1" value="16" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:ExtensionNotification>
</mmi:mmi>
ExtensionNotification (setValue) The IM can send an “ExtensionNotification” lifecycle event to the modality component to set the value of an input field. The attribute “id” must contain the ID of the HTML element.
Example:
<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:ExtensionNotification mmi:name="setValue" mmi:target="GUI" mmi:requestID="r3" mmi:context="c1">
    <mmi:data>
      <function name="setValue" id="field1" value="20" /> 
    </mmi:data>
  </mmi:ExtensionNotification>
</mmi:mmi>
ExtensionNotification (setFocus) The IM can send an “ExtensionNotification” lifecycle event to the modality component to set the focus to an input field. The attribute “id” must contain the ID of the HTML element.
Example:
<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:ExtensionNotification mmi:name="setFocus" mmi:target="GUI" mmi:requestID="r3" mmi:context="c1">
    <mmi:data>
      <function name="setFocus" id="field1" value="20" /> 
    </mmi:data>
  </mmi:ExtensionNotification>
</mmi:mmi>
ClearContextRequest (Standard)
ClearContextResponse (Standard)
StatusRequest The component does not take any action on a StatusRequest.
StatusResponse The component does not send a StatusResponse.

Voice modality component

The Openstream Voice modality component (VoiceMC) is a fully conformant MMI Modality Component providing both speech-recognition (ASR) and text-to-speech (TTS) functions. It is available on various mobile and desktop platforms including iOS, Android, Blackberry, Windows Phone, Windows XP and Windows 7. Both embedded speech engines and MRCP v1 and v2 compliant remote speech engines are supported.

The iOS implementation of the VoiceMC running on an iPhone device running iOS 4 was used for the interoperability test. The iOS instance of the VoiceMC has been implemented in Objective-C and it uses embedded ASR and TTS engines for speech-recognition and text-to-speech.

VoiceMC communicates with an Interaction Manager (IM) using the standard MMI [MMI-ARCH] lifecycle events. HTTP is used as the transport protocol: the component acts as a client and always initiates the connection to the IM, which acts as the HTTP server. Lifecycle events that are initiated by the component are sent to the IM using HTTP POST requests. HTTP GET requests are used to receive responses from the Interaction Manager, as well as to receive new lifecycle events initiated by the IM.
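
The division of labour between POST and GET can be sketched as follows. This is not the VoiceMC implementation (which is written in Objective-C). The endpoint URL, the Content-Type header and the "context" query parameter are assumptions chosen for illustration, and the sketch only constructs the requests without sending them.

```python
import urllib.parse
import urllib.request

IM_URL = "http://im.example.com/mmi"  # hypothetical IM endpoint


def build_post(event_xml):
    """Build the request that delivers a component-initiated lifecycle event."""
    return urllib.request.Request(
        IM_URL,
        data=event_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed media type
        method="POST",
    )


def build_poll(context):
    """Build the request that fetches responses and IM-initiated events.

    The 'context' query parameter is an assumption; the note does not
    specify how the component identifies itself when polling.
    """
    query = urllib.parse.urlencode({"context": context})
    return urllib.request.Request(IM_URL + "?" + query, method="GET")


post = build_post("<mmi:mmi/>")
poll = build_poll("ctx-01")
```

A real client would pass these objects to urllib.request.urlopen and repeat the GET request to keep receiving events, since the IM never opens a connection to the component.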

On startup, the VoiceMC sends the NewContextRequest MMI lifecycle event to establish a context with the IM. Once the context is established, the IM can add or remove speech recognition grammars and initiate ASR/TTS operations using the standard MMI lifecycle events. Speech recognition grammars are added or removed using the ExtensionNotification lifecycle event. Speech-recognition and text-to-speech operations are initiated by sending the StartRequest lifecycle event. The speech recognition result is formatted as an EMMA document and sent to the IM in the DoneNotification lifecycle event.
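
As an illustration of how a receiver might read such a result, the following Python sketch extracts the status, the recognized answer and the EMMA confidence from a DoneNotification. The message mirrors the ASR example in this note; the function name read_asr_result and the choice of Python are assumptions, not part of the tested system.

```python
import xml.etree.ElementTree as ET

NS = {
    "mmi": "http://www.w3.org/2008/04/mmi-arch",
    "emma": "http://www.w3.org/2003/04/emma",
}

DONE = """<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <mmi:DoneNotification source="Voice" context="MathQuiz-2be0bbb7"
      status="success" requestID="vui-req-4">
    <mmi:data>
      <emma:emma version="1.0">
        <emma:interpretation id="int1" emma:medium="acoustic"
            emma:confidence=".75" emma:mode="voice">
          <answer>18</answer>
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:DoneNotification>
</mmi:mmi>"""


def read_asr_result(xml_text):
    """Return (status, answer, confidence) from a DoneNotification."""
    root = ET.fromstring(xml_text)
    done = root.find("mmi:DoneNotification", NS)
    interp = done.find("mmi:data/emma:emma/emma:interpretation", NS)
    # Namespaced attributes are addressed with the {uri}name form.
    confidence = float(interp.get("{%s}confidence" % NS["emma"]))
    # The application payload (<answer>) is not in any namespace.
    answer = interp.findtext("answer")
    return done.get("status"), answer, confidence


status, answer, confidence = read_asr_result(DONE)
```

An interaction manager could use the confidence value to decide whether to accept the answer or to ask the user to repeat it; the note does not state whether the test application did so.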

The following table describes the functionality and the API of the voice modality component (see also [MMI-MCBP] for more details):

Life Cycle Event Component Implementation
NewContextRequest (Standard) The component requests a new context from the IM. Example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:NewContextRequest source="Voice" requestID="vui-req-1" target="MathQuiz" context="-1">
   </mmi:NewContextRequest>
</mmi:mmi>
NewContextResponse (Standard) The component starts a new context and assigns the new context ID to it.
PrepareRequest The component requires a PrepareRequest event to initiate a session with a remote Voice server.
Example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:PrepareRequest requestID="im-req-2" target="Voice" context="MathQuiz-2be0bbb7">
	 <mmi:contentURL href="http://example.com/VUI/VUIProxy" max-age="" fetchtimeout="1s"/>
   </mmi:PrepareRequest>
</mmi:mmi>
PrepareResponse (Standard) The component establishes a remote Voice server session.
StartRequest The component will start an ASR or TTS operation. Start TTS example:
<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:StartRequest target="Voice" requestID="im-req-03" context="MathQuiz-2be0bbb7">
    <mmi:content>play</mmi:content>
    <mmi:data>You are correct.  Please press next for the next question.</mmi:data>
  </mmi:StartRequest>
</mmi:mmi>
Start ASR example:
<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:StartRequest target="Voice" requestID="im-req-04" context="MathQuiz-2be0bbb7">
    <mmi:content>recognize</mmi:content>
  </mmi:StartRequest>
</mmi:mmi>
StartResponse (Standard)
DoneNotification The Modality component returns the result of processing the ASR or TTS request. The result of matching an active grammar with a user utterance will be contained within an EMMA document. TTS Example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:DoneNotification source="Voice"  context="MathQuiz-2be0bbb7" status="success" requestID="vui-req-3">
   </mmi:DoneNotification>
</mmi:mmi>
ASR Example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma">
   <mmi:DoneNotification source="Voice"  context="MathQuiz-2be0bbb7" status="success" requestID="vui-req-4" confidential="true">
       <mmi:data>
           <emma:emma version="1.0">
               <emma:interpretation id="int1" emma:medium="acoustic" emma:confidence=".75" 
                                 emma:mode="voice">
                   <answer>18</answer>
               </emma:interpretation>
           </emma:emma>
       </mmi:data>
   </mmi:DoneNotification>
</mmi:mmi>
CancelRequest This component cannot cancel.
CancelResponse StatusInfo field is "cannot cancel".
PauseRequest This component cannot pause.
PauseResponse StatusInfo field is "cannot pause".
ResumeRequest This component cannot resume.
ResumeResponse StatusInfo field is "cannot resume".
ExtensionNotification This component processes “ExtensionNotification” lifecycle events from the IM when instructed to perform one of the following tasks:
  1. Add grammar with provided URL
  2. Remove grammar with provided URL
Add grammar example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:ExtensionNotification name="addGrammar" requestID="im-req-10" target="Voice" context="MathQuiz-2be0bbb7">
      <mmi:data>
         <grammar href="http://example.com/app/gram/ex.gram" />
      </mmi:data>
   </mmi:ExtensionNotification>
</mmi:mmi>
Remove grammar example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:ExtensionNotification name="removeGrammar" requestID="im-req-11" target="Voice" context="MathQuiz-2be0bbb7">
      <mmi:data>
         <grammar href="http://example.com/app/gram/ex.gram"/>
      </mmi:data>
   </mmi:ExtensionNotification>
</mmi:mmi>
This component sends the following “ExtensionNotification” lifecycle events to the IM:
  1. Add grammar result
  2. Remove grammar result
  3. Recording started
  4. Recording ended
Add grammar result example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:ExtensionNotification name="addGrammarResult" requestID="vui-req-12" source="Voice" context="MathQuiz-2be0bbb7">
      <mmi:data>
         <grammar href="http://example.com/app/gram/ex.gram" status="success"/>
      </mmi:data>
   </mmi:ExtensionNotification>
</mmi:mmi>
Remove grammar result example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:ExtensionNotification name="removeGrammarResult" requestID="vui-req-13" source="Voice" context="MathQuiz-2be0bbb7">
      <mmi:data>
         <grammar href="http://example.com/app/gram/ex.gram"  status="success"/>
      </mmi:data>
   </mmi:ExtensionNotification>
</mmi:mmi>
Recording started example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:ExtensionNotification name="recordStarted" requestID="vui-req-14" source="Voice" context="MathQuiz-2be0bbb7">
   </mmi:ExtensionNotification>
</mmi:mmi>
Recording ended example:
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:ExtensionNotification name="recordEnded" requestID="vui-req-15" source="Voice" context="MathQuiz-2be0bbb7">
   </mmi:ExtensionNotification>
</mmi:mmi>
ClearContextRequest (Standard)
ClearContextResponse (Standard)
StatusRequest (Standard)
StatusResponse The component returns a standard life cycle response. The "automaticUpdate" attribute is "false", because this component does not supply automatic updates.

4. A MathQuiz

The application

To show interoperability, a simple math-quiz application has been developed. The quiz consists of a series of questions presented one by one until the player chooses to stop. Each question is presented by both voice and visual output. The user can answer the question either by voice or by GUI input. Feedback on whether the answer is correct is given immediately after the player answers.

After starting the application a screen is presented at the GUI containing the following elements:

  1. a “question label” containing the question (e.g. “2+2”)
  2. an input element giving the user the ability to enter the solution
  3. an “OK” Button to confirm the input
  4. a “voice input activation” button for voice input activation (push to activate)
  5. a “validity label” indicating the validity of the user's solution (this label might be gray at the beginning and will be switched to red or green depending on the validity of the user's response)
  6. a “stop button” to finish the application
  7. a “next button” to go to the next question (the “next button” is disabled by default and will be enabled only once the user answered the question correctly)

At the same time (while starting the GUI component) the VUI component plays a prompt with the question “what is two plus two?”.

To answer the question via the GUI, the user types the solution into the input element and clicks the OK button. The system responds by:

  1. changing the “validity label” either to red or green
  2. enabling the “next button” in case of a right answer
  3. playing a prompt, either “this is correct” or “this is not correct, please try again”

To answer the question by voice, the user presses the microphone button on the GUI to activate speech recognition and then speaks into the microphone. The system responds by:

  1. displaying the recognition result at the GUI
  2. changing the “validity label” either to red or green
  3. enabling the “next button” in case of a right answer
  4. playing a prompt, either “this is correct” or “this is not correct, please try again”
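
The two response lists above can be condensed into a small decision function. The sketch below is in Python and is only an illustration: in the test the logic lived in the application's SCXML, and the function name, the action tuples and the string comparison used to check the answer are assumptions.

```python
def respond_to_answer(answer, expected, via_voice=False):
    """Return the (hypothetical) list of actions the system performs."""
    correct = answer.strip() == str(expected)
    actions = []
    if via_voice:
        # Voice input only: show what was recognized in the GUI.
        actions.append(("gui", "show_recognition_result", answer))
    actions.append(("gui", "set_validity_label", "green" if correct else "red"))
    if correct:
        # The "next button" is enabled only after a right answer.
        actions.append(("gui", "enable_next_button", None))
    prompt = "this is correct" if correct else "this is not correct, please try again"
    actions.append(("voice", "play_prompt", prompt))
    return actions


gui_actions = respond_to_answer("16", 16)
voice_actions = respond_to_answer("17", 18, via_voice=True)
```

Each action would be delivered to the named modality component as a lifecycle event, for example a setValue ExtensionNotification for the GUI or a StartRequest for the voice component.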

Life cycle event flow

The following section describes the detailed event flow during application execution and shows the details of each lifecycle event.

Figure 1 shows the initialization sequence. The dotted rectangle on the left-hand side shows the visual representation of the graphical modality component. During initialization the graphical component has not yet loaded any visual content.

sequence diagram 01

Lifecycle Event 1:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:newContextRequest source="GUI" target="MathQuiz" requestID="gui-req-01" context="-1" />
</mmi:mmi>

Lifecycle Event 2:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:newContextResponse target="GUI" requestID="gui-req-01" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 3:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:newContextRequest source="Voice" target="MathQuiz" requestID="voice-req-01" context="-1" />
</mmi:mmi>

Lifecycle Event 4:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:newContextResponse target="Voice" requestID="voice-req-01" context="ctx-01" status="success" />
</mmi:mmi>

After all modality components have connected to the interaction manager, the preparation sequence is started by the interaction manager. In this configuration the voice modality component is instructed to load application specific grammars. The graphical component still has not loaded any visual content.

sequence diagram 02

Lifecycle Event 5:

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:extensionNotification name="addGrammar" requestID="im-req-v2" target="Voice" context="ctx-01">
    <mmi:data><grammar href="http://example.com/mathQuiz/gram/answer_options.gram"/></mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

Lifecycle Event 6:

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:extensionNotification name="addGrammarResult" requestID="im-req-v2" source="Voice" context="ctx-01">
    <mmi:data><grammar href="http://example.com/mathQuiz/gram/answer_options.gram" status="success"/></mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

Now, as the voice modality component has been prepared, the interaction manager starts the actual application. It sends StartRequests to both modality components. The graphical modality component loads and displays the "welcome.html" page, which is shown on the left hand side. The voice modality component plays the following prompt to the user: "Welcome to the Math-Quiz. Please press 'Start' to continue."

sequence diagram 03

Lifecycle Event 7:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="GUI" requestID="im-req-01" context="ctx-01">
    <mmi:contentURL href="welcome.html" />
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 8:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="GUI" requestID="im-req-01" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 9:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="Voice" requestID="im-req-v3" context="ctx-01">
    <mmi:data><voice cmd="play"><text>Welcome to the math quiz. Please press Start to continue.</text></voice></mmi:data>
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 10:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="Voice" requestID="im-req-v3" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 11:

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:doneNotification source="Voice" context="ctx-01" status="success" requestID="im-req-v3" >
   </mmi:doneNotification>
</mmi:mmi>
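
Events 9 to 11 share the requestID "im-req-v3", which is how the interaction manager can relate the StartResponse and the later DoneNotification to the StartRequest it sent. The Python sketch below illustrates such bookkeeping. The class StartTracker, its state names and the abbreviated messages are assumptions for illustration; the note does not describe how the IM tracks requests internally.

```python
import xml.etree.ElementTree as ET

MMI = "{http://www.w3.org/2008/04/mmi-arch}"
WRAP = ('<mmi:mmi version="1.0" '
        'xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">%s</mmi:mmi>')


def summarize(xml_text):
    """Return (lower-cased event name, requestID) of a lifecycle message."""
    event = ET.fromstring(xml_text)[0]
    return event.tag.replace(MMI, "").lower(), event.get("requestID")


class StartTracker:
    """Hypothetical tracker for StartRequests sent by the IM."""

    def __init__(self):
        self.state = {}  # requestID -> "sent" | "running" | "done"

    def sent(self, xml_text):
        name, request_id = summarize(xml_text)
        if name == "startrequest":
            self.state[request_id] = "sent"

    def received(self, xml_text):
        name, request_id = summarize(xml_text)
        if request_id not in self.state:
            return  # not a reply to a tracked StartRequest
        if name == "startresponse":
            self.state[request_id] = "running"
        elif name == "donenotification":
            self.state[request_id] = "done"


tracker = StartTracker()
tracker.sent(WRAP % '<mmi:startRequest target="Voice" '
             'requestID="im-req-v3" context="ctx-01"/>')
tracker.received(WRAP % '<mmi:startResponse source="Voice" '
                 'requestID="im-req-v3" context="ctx-01" status="success"/>')
state_after_response = tracker.state["im-req-v3"]
tracker.received(WRAP % '<mmi:doneNotification source="Voice" '
                 'requestID="im-req-v3" context="ctx-01" status="success"/>')
state_after_done = tracker.state["im-req-v3"]
```

Element names are compared case-insensitively because this note uses both capitalized and lower-case element names in its examples.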

To start the application, the user presses the "Start" button on the graphical modality component. This leads to an ExtensionNotification lifecycle event, which contains the click event represented within an EMMA [EMMA] document.

sequence diagram 04

Lifecycle Event 12:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="click" source="GUI" requestID="gui-req-02" context="ctx-01">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <click id="start_button" value="" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

The interaction manager script sends StartRequests to both modality components. The graphical modality component loads the visual representation of the first math quiz question whereas the voice modality is instructed to play the question "What is 7 plus 9?"

sequence diagram 05

Lifecycle Event 13:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="GUI" requestID="im-req-03" context="ctx-01">
    <mmi:contentURL href="question1.html" />
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 14:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="GUI" requestID="im-req-03" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 15:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="Voice" requestID="im-req-v4" context="ctx-01">
    <mmi:data><voice cmd="play"><text>What is 7 plus 9?</text></voice></mmi:data>
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 16:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="Voice" requestID="im-req-v4" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 17:

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:doneNotification source="Voice" context="ctx-01" status="success" requestID="im-req-v4" >
   </mmi:doneNotification>
</mmi:mmi>

The user now uses the keyboard to enter the answer (types "1" and "6"). The application-dependent HTML has been written so that the change event is used to send the value of the input element to the interaction manager. The change event is fired when the input element loses focus, not when the actual input occurs; this is why no lifecycle event is generated as a direct result of the keyboard input of "1" and "6". The input element loses focus when the user clicks the "check" button. Thus two lifecycle events are sent to the interaction manager in immediate succession.

The user hits the "1" on the keyboard:

sequence diagram 06

The user hits the "6" on the keyboard:

sequence diagram 07

The user clicks the "check" button, which triggers the "change" and "click" HTML DOM events. These are handled by the modality component's scripts and wrapped into two lifecycle events:

sequence diagram 08

Lifecycle Event 18:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="change" source="GUI" requestID="gui-req-03" context="ctx-01">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <change id="answer_field" value="16" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

Lifecycle Event 19:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="click" source="GUI" requestID="gui-req-04" context="ctx-01">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <click id="check_button" value="" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

The interaction manager script (application-dependent SCXML, see below) checks the user's input. Since the answer is correct, it sends an ExtensionNotification lifecycle event to the graphical modality component to change the result indicator to green. A StartRequest is used to instruct the voice modality component to play "This is correct. Please press 'Next' for another question or press 'Stop' to finish the application."

sequence diagram 09

Lifecycle Event 20:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="setValue" target="GUI" requestID="im-req-05" context="ctx-01">
    <mmi:data>
      <function name="setValue" id="result_identifier" value="on" />
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

Lifecycle Event 21:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="Voice" requestID="im-req-v5" context="ctx-01">
    <mmi:data><voice cmd="play"><text>This is correct. Please press Next for another question or press Stop to finish the application.</text></voice></mmi:data>
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 22:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="Voice" requestID="im-req-v5" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 23:

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:doneNotification source="Voice" context="ctx-01" status="success" requestID="im-req-v5" >
   </mmi:doneNotification>
</mmi:mmi>

Since the user clicks "Next", the graphical modality component again sends an ExtensionNotification lifecycle event to the interaction manager.

sequence diagram 10

Lifecycle Event 24:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="click" source="GUI" requestID="gui-req-05" context="ctx-01">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <click id="next_button" value="" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

As above, the interaction manager script instructs both modality components to load new markup and play a prompt to present the next math quiz question to the user.

sequence diagram 11

Lifecycle Event 25:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="GUI" requestID="im-req-07" context="ctx-01">
    <mmi:contentURL href="question2.html" />
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 26:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="GUI" requestID="im-req-07" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 27:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="Voice" requestID="im-req-v6" context="ctx-01">
    <mmi:data><voice cmd="play"><text>What is 22 minus 4?</text></voice></mmi:data>
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 28:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="Voice" requestID="im-req-v6" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 29:

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:doneNotification source="Voice" context="ctx-01" status="success" requestID="im-req-v6" >
   </mmi:doneNotification>
</mmi:mmi>

This time the user decides to answer the question using the voice modality. To activate speech recognition, the user clicks the microphone button in the GUI. Again, an ExtensionNotification lifecycle event representing the click on the microphone HTML element is sent to the interaction manager. The interaction manager script then sends a StartRequest lifecycle event to the voice modality component, instructing it to open the microphone and start the speech recognition process.

sequence diagram 12

Lifecycle Event 30:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="click" source="GUI" requestID="gui-req-06" context="ctx-01">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <click id="voice_input_activation_button" value="" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

Lifecycle Event 31:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="Voice" requestID="im-req-v7" context="ctx-01">
    <mmi:data><voice cmd="recognize"/></mmi:data>
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 32:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="Voice" requestID="im-req-v7" context="ctx-01" status="success" />
</mmi:mmi>

The user speaks "twenty" into the microphone. After automatic endpointing, the voice modality component uses a DoneNotification lifecycle event to send the user's input to the interaction manager. Note that this information is represented using an EMMA document.

sequence diagram 13

Lifecycle Event 33:

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:doneNotification source="Voice" context="ctx-01" status="success" requestID="im-req-v7" >
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int1" emma:medium="acoustic" emma:confidence=".75"  emma:mode="voice">
          <answer>20</answer>
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:doneNotification>
</mmi:mmi>
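
As an illustration only, the following Python sketch shows how a receiver might pull the recognized answer and its confidence out of a DoneNotification such as Lifecycle Event 33. It is not part of the tested implementations; the function name `extract_answer` and the shape of the returned dictionary are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

NS = {
    "mmi": "http://www.w3.org/2008/04/mmi-arch",
    "emma": "http://www.w3.org/2003/04/emma",
}

# Copy of Lifecycle Event 33, used here as sample input.
DONE_NOTIFICATION = """\
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:doneNotification source="Voice" context="ctx-01" status="success" requestID="im-req-v7">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int1" emma:medium="acoustic" emma:confidence=".75" emma:mode="voice">
          <answer>20</answer>
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:doneNotification>
</mmi:mmi>
"""


def extract_answer(xml_text):
    """Hypothetical helper: return the EMMA result of a DoneNotification, or None."""
    root = ET.fromstring(xml_text)
    done = root.find("mmi:doneNotification", NS)
    if done is None or done.get("status") != "success":
        return None
    interp = done.find("mmi:data/emma:emma/emma:interpretation", NS)
    if interp is None:
        return None
    # <answer> is application-specific markup and lives in no namespace.
    answer = interp.findtext("answer")
    if answer is None:
        return None
    confidence = interp.get("{%s}confidence" % NS["emma"])
    return {
        "requestID": done.get("requestID"),
        "answer": answer.strip(),
        "confidence": float(confidence) if confidence is not None else None,
        "mode": interp.get("{%s}mode" % NS["emma"]),
    }
```

A DoneNotification without an `mmi:data` payload, such as the one that follows a played prompt, yields `None` in this sketch, so the same handler could be used for both cases.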

To display the result of the voice input in the GUI, the interaction manager script sends an ExtensionNotification lifecycle event that instructs the graphical modality component to change the value of the HTML input field accordingly.

sequence diagram 14

Lifecycle Event 34:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="setValue" target="GUI" requestID="im-req-10" context="ctx-01">
    <mmi:data>
      <function name="setValue" id="answer_field" value="20" />
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>
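
The message above can be assembled programmatically. The following Python sketch is one possible way to do so with the standard library; the helper name `build_set_value` is hypothetical and does not correspond to any function of the tested interaction manager.

```python
import xml.etree.ElementTree as ET

MMI_NS = "http://www.w3.org/2008/04/mmi-arch"
ET.register_namespace("mmi", MMI_NS)  # serialize with the mmi: prefix


def build_set_value(target, request_id, context, element_id, value):
    """Hypothetical helper: build a 'setValue' ExtensionNotification as a string."""
    root = ET.Element("{%s}mmi" % MMI_NS, {"version": "1.0"})
    notification = ET.SubElement(
        root,
        "{%s}extensionNotification" % MMI_NS,
        {"name": "setValue", "target": target,
         "requestID": request_id, "context": context},
    )
    data = ET.SubElement(notification, "{%s}data" % MMI_NS)
    # <function> is application-specific payload without a namespace.
    ET.SubElement(data, "function",
                  {"name": "setValue", "id": element_id, "value": value})
    return ET.tostring(root, encoding="unicode")


message = build_set_value("GUI", "im-req-10", "ctx-01", "answer_field", "20")
```

The same helper would cover the visual indicator update by passing `result_identifier` and `on` or `off` instead of `answer_field` and the recognized value.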

Again the user clicks the "Check" button at the GUI to check the input for correctness.

sequence diagram 15

Lifecycle Event 35:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="click" source="GUI" requestID="gui-req-07" context="ctx-01">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <click id="check_button" value="" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

Since this answer is not correct, the interaction manager instructs the voice modality component to play "This is not correct. Please try again." Note that the voice modality component sends two lifecycle events in response to the StartRequest: a StartResponse immediately after it has started to play the prompt, and a DoneNotification once it has finished playing the prompt.

sequence diagram 16

Lifecycle Event 36:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="Voice" requestID="im-req-v8" context="ctx-01">
    <mmi:data><voice cmd="play"><text>This is not correct. Please try again.</text></voice></mmi:data>
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 37:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="Voice" requestID="im-req-v8" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 38:

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:doneNotification source="Voice" context="ctx-01" status="success" requestID="im-req-v8" >
   </mmi:doneNotification>
</mmi:mmi>
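
The request/response pattern of Lifecycle Events 36 to 38 can be sketched as follows. This is a simplified illustration, not the code of the tested voice modality component: the names `handle_start_request` and `play` are assumptions, and playback is modelled as a blocking call, so the StartResponse is emitted just before `play` is invoked rather than right after audio output begins.

```python
import xml.etree.ElementTree as ET

MMI_NS = "http://www.w3.org/2008/04/mmi-arch"
ET.register_namespace("mmi", MMI_NS)


def _reply(kind, request):
    """Build a startResponse or doneNotification that echoes the request's IDs."""
    root = ET.Element(f"{{{MMI_NS}}}mmi", {"version": "1.0"})
    ET.SubElement(root, f"{{{MMI_NS}}}{kind}", {
        "source": request.get("target"),
        "requestID": request.get("requestID"),
        "context": request.get("context"),
        "status": "success",
    })
    return ET.tostring(root, encoding="unicode")


def handle_start_request(xml_text, play):
    """Hypothetical handler: yield the two lifecycle events for one 'play' request.

    `play` is an assumed callable that blocks until the prompt has been played.
    """
    request = ET.fromstring(xml_text).find(f"{{{MMI_NS}}}startRequest")
    prompt = request.findtext(f"{{{MMI_NS}}}data/voice/text")
    yield _reply("startResponse", request)      # first event: request accepted
    play(prompt)                                # blocks for the duration of the prompt
    yield _reply("doneNotification", request)   # second event: prompt finished
```

Both replies carry the `requestID` and `context` of the StartRequest, which is what allows the interaction manager to correlate them with the request it sent.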

The user tries to answer the question again, this time using the keyboard, and types "1" and "8".

The user hits the "1" on the keyboard:

sequence diagram 17

The user hits the "8" on the keyboard:

sequence diagram 18

The user clicks the "Check" button, which leads to the "change" and "click" HTML DOM events. These are handled by the modality component's scripts and wrapped into two lifecycle events:

sequence diagram 19

Again, since this is a generic behavior of the graphical modality component, two lifecycle events are sent to the interaction manager immediately after each other.

Lifecycle Event 39:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="change" source="GUI" requestID="gui-req-08" context="ctx-01">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <change id="answer_field" value="18" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

Lifecycle Event 40:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="click" source="GUI" requestID="gui-req-09" context="ctx-01">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <click id="check_button" value="" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>
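
The wrapping of DOM events into ExtensionNotification lifecycle events, as seen in Lifecycle Events 39 and 40, might look like the following Python sketch. The actual graphical modality component runs as script in the browser and is not reproduced in this document; `make_wrapper` and its parameters are hypothetical names chosen for illustration.

```python
import itertools
import xml.etree.ElementTree as ET

MMI_NS = "http://www.w3.org/2008/04/mmi-arch"
EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("mmi", MMI_NS)
ET.register_namespace("emma", EMMA_NS)


def make_wrapper(context, first_request=1):
    """Hypothetical factory: return a function that wraps one DOM event per call."""
    counter = itertools.count(first_request)

    def wrap(event_type, element_id, value=""):
        root = ET.Element(f"{{{MMI_NS}}}mmi", {"version": "1.0"})
        notification = ET.SubElement(root, f"{{{MMI_NS}}}extensionNotification", {
            "name": event_type,
            "source": "GUI",
            "requestID": f"gui-req-{next(counter):02d}",
            "context": context,
        })
        data = ET.SubElement(notification, f"{{{MMI_NS}}}data")
        emma = ET.SubElement(data, f"{{{EMMA_NS}}}emma", {"version": "1.0"})
        interpretation = ET.SubElement(emma, f"{{{EMMA_NS}}}interpretation", {
            "id": "int-01",
            f"{{{EMMA_NS}}}medium": "tactile",
            f"{{{EMMA_NS}}}mode": "ink",
        })
        # The payload element is named after the DOM event type.
        ET.SubElement(interpretation, event_type, {"id": element_id, "value": value})
        return ET.tostring(root, encoding="unicode")

    return wrap


wrap = make_wrapper("ctx-01", first_request=8)
change_message = wrap("change", "answer_field", "18")
click_message = wrap("click", "check_button")
```

Each call consumes one request ID, so the "change" and "click" events produced by a single button press result in two consecutive messages.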

This input is checked within the interaction manager script. Since it is correct, the interaction manager sends lifecycle events to both modality components to change the visual indicator to green and to play "This is correct. Please press 'Next' for another question or press 'Stop' to finish the application."

sequence diagram 20

Lifecycle Event 41:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="setValue" target="GUI" requestID="im-req-13" context="ctx-01">
    <mmi:data>
      <function name="setValue" id="result_identifier" value="on" />
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

Lifecycle Event 42:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="Voice" requestID="im-req-v9" context="ctx-01">
    <mmi:data><voice cmd="play"><text>This is correct. Please press Next for another question or press Stop to finish the application.</text></voice></mmi:data>
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 43:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="Voice" requestID="im-req-v9" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 44:

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:doneNotification source="Voice" context="ctx-01" status="success" requestID="im-req-v9" >
   </mmi:doneNotification>
</mmi:mmi>

The user decides to stop the quiz at this point and clicks the "Stop" button at the GUI.

sequence diagram 21

Lifecycle Event 45:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extensionNotification name="click" source="GUI" requestID="gui-req-10" context="ctx-01">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int-01" emma:medium="tactile" emma:mode="ink">
          <click id="stop_button" value="" />
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:extensionNotification>
</mmi:mmi>

The interaction manager now sends StartRequests to both modality components to display a "Goodbye" page and play "Thank you and goodbye."

sequence diagram 22

Lifecycle Event 46:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="GUI" requestID="im-req-15" context="ctx-01">
    <mmi:contentURL href="goodbye.html" />
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 47:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="GUI" requestID="im-req-15" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 48:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startRequest target="Voice" requestID="im-req-v10" context="ctx-01">
    <mmi:data><voice cmd="play"><text>Thank you and goodbye.</text></voice></mmi:data>
  </mmi:startRequest>
</mmi:mmi>

Lifecycle Event 49:

<mmi:mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:startResponse source="Voice" requestID="im-req-v10" context="ctx-01" status="success" />
</mmi:mmi>

Lifecycle Event 50:

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:doneNotification source="Voice" context="ctx-01" status="success" requestID="im-req-v10" >
   </mmi:doneNotification>
</mmi:mmi>

A. Application scripts

The application consists of two components which use markup languages. The interaction manager uses the state chart description language SCXML [SCXML] to implement the application-specific functionality, whereas the GUI modality component uses HTML [HTML401] for the application-specific part.

SCXML

The following figure shows a visual representation of the interaction manager state machine.

state machine
  1. Send 'welcome' StartRequests to both Modality Components. Provide HTML page URL in the GUI-MC StartRequest, and text to synthesize in the Voice-MC StartRequest.
  2. Process ExtensionNotification events from Modality Components. At this stage, we collect and register GUI events. Once a condition has been satisfied (a "clicked" event occurred on the "start_button" object), proceed to the next state.
  3. Load a question. In a more complex setup, this could mean fetching data from a question database; here, we simply switch between two hardcoded question/answer pairs.
  4. Send 'question' StartRequests to both Modality Components.
  5. Process ExtensionNotification/DoneNotification events from Modality Components. Once a condition has been satisfied, perform an appropriate action:
    • if the GUI "voice input activation button" has been clicked, send a "recognize" command to the Voice MC
    • if an answer has been provided by either of the Modality Components, evaluate the answer and send the result to both MCs
    • if the GUI "next" button has been clicked, transition to next question
    • if the GUI "stop" button has been clicked, transition to the "goodbye" state
  6. Send 'goodbye' StartRequests to both Modality Components.
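
The six steps above can be sketched in Python as a small state machine. This is a deliberately simplified illustration and not the interaction manager used in the test: the class `QuizManager`, the event names passed to `handle`, and the tuple-based `outbox` (standing in for real lifecycle events) are assumptions of this example. A recognition result from the voice modality component is evaluated immediately, which is one reading of the second bullet of step 5.

```python
QUESTIONS = [
    {"text": "What is 7 plus 9?", "answer": "16", "url": "question1.html"},
    {"text": "What is 22 minus 4?", "answer": "18", "url": "question2.html"},
]
CORRECT = ("This is correct. Please press Next for another question "
           "or press Stop to finish the application.")
INCORRECT = "This is not correct. Please try again."


class QuizManager:
    """Hypothetical, simplified stand-in for the interaction manager."""

    def __init__(self):
        self.state = "welcome"
        self.problem_id = 1
        self.problem = None
        self.value = ""
        self.outbox = []  # (target, command, payload) tuples, not real lifecycle events
        # Step 1: 'welcome' StartRequests to both modality components.
        self._start("welcome.html",
                    "Welcome to the math quiz. Please press start to continue.")

    def _start(self, url, prompt):
        self.outbox.append(("GUI", "load", url))
        self.outbox.append(("Voice", "play", prompt))

    def _next_question(self):
        # Step 3: alternate between the two hardcoded question/answer pairs.
        self.problem = QUESTIONS[0] if self.problem_id % 2 == 1 else QUESTIONS[1]
        self.problem_id += 1
        self.state = "question"
        # Step 4: 'question' StartRequests to both modality components.
        self._start(self.problem["url"], self.problem["text"])

    def _evaluate(self):
        correct = self.value == self.problem["answer"]
        self.outbox.append(
            ("GUI", "setValue", ("result_identifier", "on" if correct else "off")))
        self.outbox.append(("Voice", "play", CORRECT if correct else INCORRECT))

    def handle(self, kind, element_id=None, value=None):
        if self.state == "welcome":
            # Step 2: wait for a click on the start button.
            if kind == "click" and element_id == "start_button":
                self._next_question()
        elif self.state == "question":
            # Step 5: react to events from either modality component.
            if kind == "change":
                self.value = value
            elif kind == "voice_answer":
                self.value = value
                self.outbox.append(("GUI", "setValue", ("answer_field", value)))
                self._evaluate()
            elif kind == "click" and element_id == "voice_input_activation_button":
                self.outbox.append(("Voice", "recognize", None))
            elif kind == "click" and element_id == "check_button":
                self._evaluate()
            elif kind == "click" and element_id == "next_button":
                self._next_question()
            elif kind == "click" and element_id == "stop_button":
                # Step 6: 'goodbye' StartRequests to both modality components.
                self.state = "goodbye"
                self._start("goodbye.html", "Thank you and goodbye.")
```

Driving the sketch with a click on `start_button`, a change to "16" and a click on `check_button` leaves a `setValue` command for `result_identifier` with value `on` in the outbox, followed by the "This is correct" prompt.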

The following markup code shows the SCXML which describes the application logic. The code is loaded by the interaction manager at application start.

<?xml version="1.0" encoding="UTF-8"?>
<scxml xmlns="http://www.w3.org/2005/07/scxml" 
	xmlns:commons="http://commons.apache.org/scxml"
	xmlns:emma="http://www.w3.org/2003/04/emma" 
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
	xmlns:mmi="http://www.w3.org/2008/04/mmi-arch"
	xsi:schemaLocation="http://www.w3.org/2005/07/scxml scxml.xsd http://commons.apache.org/scxml commons.xsd http://www.w3.org/2008/04/mmi-arch mmi.xsd http://www.w3.org/2003/04/emma emma.xsd" version="1.0"
	initialstate="init">

	<!-- data model definition -->
	<datamodel>

		<!-- top-level variables -->
		<data name="contextId" />
		<data name="requestId" />
		<data name="sourceMC" />
		<data name="source" expr="IM" />

		<data name="gui_events">
			<clicked xmlns="" />
			<changed xmlns="" />
			<value xmlns="" />
		</data>

		<data name="ttsWelcome">
			<voice xmlns="" cmd="play"><text><![CDATA[Welcome to the math quiz. Please press start to continue.]]></text></voice>
		</data>
		
		<data name="ttsQuestion">
			<voice xmlns="" cmd="play"><text><![CDATA[Question placeholder.]]></text></voice>
		</data>
		
		<data name="ttsCorrect">
			<voice xmlns="" cmd="play"><text><![CDATA[This is correct. Please press Next for another question or press Stop to finish the application.]]></text></voice>
		</data>
		
		<data name="ttsIncorrect">
			<voice xmlns="" cmd="play"><text><![CDATA[This is not correct. Please try again.]]></text></voice>
		</data>
		<data name="ttsGoodbye">
			<voice xmlns="" cmd="play"><text><![CDATA[Thank you and goodbye.]]></text></voice>
		</data>
		
		<data name="cmdRecognize">
			<voice xmlns="" cmd="recognize"></voice>
		</data>
		
		<data name="initGrammar">
			<grammar xmlns="" href="http://64.9.100.106/mmi/mathQuiz/gram/answer_options.gram"/>
		</data>
		
		<data name="url" />
		
		<data name="problem">
			<text xmlns="" />
			<answer xmlns="" />
			<id xmlns="" />
			<url xmlns="" />
		</data>

		<data name="setCorrect">
			<function xmlns="" name="setValue" id="result_identifier" value="on" />
		</data>
		
		<data name="setIncorrect">
			<function xmlns="" name="setValue" id="result_identifier" value="off" />
		</data>
		<data name="setAnswer">
			<function xmlns="" name="setValue" id="answer_field" value="" />
		</data>

	</datamodel>

	<state id="init">
		<onentry>
			<log label="[init]" expr="Waiting for new context request..." />
		</onentry>
		<transition event="mmi:newContextRequest" target="welcome">
			<assign name="requestId" expr="${Data(_eventdata, '//mmi:newContextRequest/@requestID')}" />
			<assign name="sourceMC" expr="${Data(_eventdata, '//mmi:newContextRequest/@source')}" />
			<assign name="contextId" expr="${mmi:newContextId()}" />
			<commons:var name="newContextResponse" expr="${mmi:newContextResponse(contextId, source, sourceMC, requestId)}" />
			<send event="mmi:newContextResponse" target="${sourceMC}" targettype="MC" namelist="newContextResponse" />
		</transition>
	</state>

	<state id="welcome">
		<onentry>
			<assign location="${Data(problem, 'id')}" expr="1" />
			<commons:var name="extensionNotification" expr="${mmi:newExtensionNotification(contextId, source, 'Voice', mmi:newRequestId(contextId), 'addGrammar', initGrammar)}" />
			<send event="mmi:extensionNotification" target="Voice" targettype="MC" namelist="extensionNotification" />
		</onentry>
		<initial>
			<transition target="welcome.send"/>
		</initial>
		<state id="welcome.send">
			<onentry>
				<commons:var name="startRequest" expr="${mmi:newStartRequest(contextId, source, 'GUI', mmi:newRequestId(contextId), 'welcome.html', null)}" />
				<send event="mmi:startRequest" target="GUI" targettype="MC" namelist="startRequest" />
				<commons:var name="startRequest" expr="${mmi:newStartRequest(contextId, source, 'Voice', mmi:newRequestId(contextId), 'welcome.vxml', ttsWelcome)}" />
				<send event="mmi:startRequest" target="Voice" targettype="MC" namelist="startRequest" />
			</onentry>
			<transition event="mmi:startResponse" target="welcome.wait" />
		</state>
		<state id="welcome.wait">
			<transition event="mmi:extensionNotification" target="welcome.wait">
				<if cond="${Data(_eventdata, '//click/@id') ne ''}">
					<assign location="${Data(gui_events, 'clicked')}" expr="${Data(_eventdata, '//click/@id')}" />
				</if>
			</transition>
			<transition cond="${Data(gui_events, 'clicked') eq 'start_button'}" target="question" />
		</state>
	</state>

	<state id="question">
		<initial>
			<transition target="question.load"/>
		</initial>
		<state id="question.send">
			<onentry>
				<assign name="url" expr="${Data(problem, 'url')}" />
				<commons:var name="startRequestGUI" expr="${mmi:newStartRequest(contextId, source, 'GUI', mmi:newRequestId(contextId), url, null)}" />
				<send event="mmi:startRequest" target="GUI" targettype="MC" namelist="startRequestGUI" />
				<assign location="${Data(ttsQuestion, 'voice/text')}" expr="${Data(problem, 'text')}" />
				<commons:var name="startRequestVoice" expr="${mmi:newStartRequest(contextId, source, 'Voice', mmi:newRequestId(contextId), 'question.vxml', ttsQuestion)}" />
				<send event="mmi:startRequest" target="Voice" targettype="MC" namelist="startRequestVoice" />				
			</onentry>
			<transition event="mmi:startResponse" target="question.wait"/>
		</state>
		<state id="question.wait">
			<transition event="mmi:extensionNotification" target="question.wait">
				<if cond="${Data(_eventdata, '//click/@id') ne ''}">
					<assign location="${Data(gui_events, 'clicked')}" expr="${Data(_eventdata, '//click/@id')}" />
				</if>
				<if cond="${Data(_eventdata, '//change/@id') ne ''}">
					<assign location="${Data(gui_events, 'changed')}" expr="${Data(_eventdata, '//change/@id')}" />
					<assign location="${Data(gui_events, 'value')}" expr="${Data(_eventdata, '//change/@value')}" />
				</if>
			</transition>
			<transition event="mmi:doneNotification" target="question.wait">
				<if cond="${Data(_eventdata, '//answer') ne ''}">
					<assign location="${Data(gui_events, 'value')}" expr="${Data(_eventdata, '//answer')}" />
					<assign location="${Data(gui_events, 'clicked')}" expr="check_button" />
					<assign location="${Data(setAnswer, 'function/@value')}" expr="${Data(gui_events, 'value')}" />
					<commons:var name="extensionNotification" expr="${mmi:newExtensionNotification(contextId, source, sourceMC, mmi:newRequestId(contextId), 'setValue', setAnswer)}" />
					<send event="mmi:extensionNotification" target="GUI" targettype="MC" namelist="extensionNotification" />
				</if>
			</transition>			
			<transition cond="${Data(gui_events, 'clicked') eq 'voice_input_activation_button'}" target="question.wait">
				<assign location="${Data(gui_events, 'clicked')}" expr="" />
				<commons:var name="startRequestVoice" expr="${mmi:newStartRequest(contextId, source, 'Voice', mmi:newRequestId(contextId), 'question.vxml', cmdRecognize)}" />
				<send event="mmi:startRequest" target="Voice" targettype="MC" namelist="startRequestVoice" />					
			</transition>
			<transition cond="${Data(gui_events, 'clicked') eq 'check_button'}" target="question.wait">
				<assign location="${Data(gui_events, 'clicked')}" expr="" />
				<if cond="${Data(gui_events, 'value') eq Data(problem, 'answer')}">
					<commons:var name="extensionNotification" expr="${mmi:newExtensionNotification(contextId, source, sourceMC, mmi:newRequestId(contextId), 'setValue', setCorrect)}" />
					<send event="mmi:extensionNotification" target="GUI" targettype="MC" namelist="extensionNotification" />
					<commons:var name="startRequestVoice" expr="${mmi:newStartRequest(contextId, source, 'Voice', mmi:newRequestId(contextId), 'question.vxml', ttsCorrect)}" />
					<send event="mmi:startRequest" target="Voice" targettype="MC" namelist="startRequestVoice" />	
					<else />
					<commons:var name="extensionNotification" expr="${mmi:newExtensionNotification(contextId, source, sourceMC, mmi:newRequestId(contextId), 'setValue', setIncorrect)}" />
					<send event="mmi:extensionNotification" target="GUI" targettype="MC" namelist="extensionNotification" />
					<commons:var name="startRequestVoice" expr="${mmi:newStartRequest(contextId, source, 'Voice', mmi:newRequestId(contextId), 'question.vxml', ttsIncorrect)}" />
					<send event="mmi:startRequest" target="Voice" targettype="MC" namelist="startRequestVoice" />				
					<assign location="${Data(ttsQuestion, 'voice/text')}" expr="${Data(problem, 'text')}" />
					<commons:var name="startRequestVoice" expr="${mmi:newStartRequest(contextId, source, 'Voice', mmi:newRequestId(contextId), 'question.vxml', ttsQuestion)}" />
					<send event="mmi:startRequest" target="Voice" targettype="MC" namelist="startRequestVoice" />								
					<commons:var name="startRequestVoice" expr="${mmi:newStartRequest(contextId, source, 'Voice', mmi:newRequestId(contextId), 'question.vxml', cmdRecognize)}" />
					<send event="mmi:startRequest" target="Voice" targettype="MC" namelist="startRequestVoice" />						
				</if>
			</transition>
			<transition cond="${Data(gui_events, 'clicked') eq 'stop_button'}" target="goodbye">
				<assign location="${Data(gui_events, 'clicked')}" expr="" />
			</transition>
			<transition cond="${Data(gui_events, 'clicked') eq 'next_button'}" target="question">
				<assign location="${Data(gui_events, 'clicked')}" expr="" />
			</transition>
		</state>

		<state id="question.load">
			<onentry>
				<if cond="${(Data(problem, 'id') mod 2) eq 1}">
					<assign location="${Data(problem, 'text')}" expr="What is 7 plus 9?" />
					<assign location="${Data(problem, 'answer')}" expr="16" />
					<assign location="${Data(problem, 'url')}" expr="question1.html" />
				</if>
				<if cond="${(Data(problem, 'id') mod 2) eq 0}">
					<assign location="${Data(problem, 'text')}" expr="What is 22 minus 4?" />
					<assign location="${Data(problem, 'answer')}" expr="18" />
					<assign location="${Data(problem, 'url')}" expr="question2.html" />
				</if>
				<assign location="${Data(problem, 'id')}" expr="${Data(problem, 'id') + 1}" />
			</onentry>
			<transition target="question.send"/>
		</state>

	</state>

	<state id="goodbye">
		<initial>
			<transition target="goodbye.send"/>
		</initial>
		<state id="goodbye.send">
			<onentry>
				<commons:var name="startRequest" expr="${mmi:newStartRequest(contextId, source, sourceMC, mmi:newRequestId(contextId), 'goodbye.html')}" />
				<send event="mmi:startRequest" target="${sourceMC}" targettype="MC" namelist="startRequest" />
				<commons:var name="startRequestVoice" expr="${mmi:newStartRequest(contextId, source, 'Voice', mmi:newRequestId(contextId), 'question.vxml', ttsGoodbye)}" />
				<send event="mmi:startRequest" target="Voice" targettype="MC" namelist="startRequestVoice" />
			</onentry>
			<transition event="mmi:startResponse" target="quit" />
		</state>
	</state>

	<state id="quit">
	</state>

</scxml>

HTML

The following HTML code shows the welcome.html page, which contains the application-specific markup and is loaded by the graphical modality component in reaction to an mmi:StartRequest lifecycle event.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<head>
<title>Welcome - MathQuiz</title>
<link href="css/layout.css" rel="stylesheet" type="text/css" />
<script type="text/javascript">
    var EVENTTYPES = new Array();
    var HTMLELEMENTS = new Array();
    var FORMELEMENTS = new Array();
    EVENTTYPES[0] = new Array("click");
    HTMLELEMENTS[0] = new Array("input");
    FORMELEMENTS[0] = new Array("button");
</script>
</head>

<body class="col3_only">
<div class="page_margins">
    <div id="header">
        Math-Quiz
    </div>
    <div id="main">
        <div id="col1">
        </div>
        <div id="col2">
        </div>
        <div id="col3">
            <div id="col3_content">
                <p class="center">Welcome</p>
            </div>
        </div>
    </div>
    <div id="footer">
	   <input id="start_button" type="button" value="Start" />
    </div>
</div>
</body>
</html>

The application code above contains some ECMAScript variables (arrays such as EVENTTYPES). These variables give the application author control over which DOM events, and on which elements, are handled by the graphical modality component wrapper and forwarded to the interaction manager. This functionality is used to optimize performance and to avoid flooding the interaction manager with events like mouse movements or hover.
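
The wrapper script that evaluates these arrays is not reproduced in this document. The following Python sketch shows one plausible reading of the filtering logic, under the assumption that entries with the same index in EVENTTYPES, HTMLELEMENTS and FORMELEMENTS together form one rule; both this interpretation and the function `should_forward` are hypothetical.

```python
# Mirror of the arrays declared in welcome.html.
EVENTTYPES = [["click"]]
HTMLELEMENTS = [["input"]]
FORMELEMENTS = [["button"]]


def should_forward(event_type, tag_name, input_type=None):
    """Hypothetical filter: True if some rule matches the event.

    Assumption: rule i matches when the event type is listed in EVENTTYPES[i],
    the element's tag name in HTMLELEMENTS[i], and the element's 'type'
    attribute in FORMELEMENTS[i].
    """
    for events, tags, types in zip(EVENTTYPES, HTMLELEMENTS, FORMELEMENTS):
        if event_type in events and tag_name.lower() in tags and input_type in types:
            return True
    return False
```

Under this reading, a click on the "Start" button of welcome.html would be forwarded to the interaction manager, whereas mouse movements over the same button, or clicks on a plain `div`, would be dropped by the wrapper.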

The following figure shows a graphical representation of the application dependent 'welcome' page:

welcome page

There are similar HTML documents for other stages of the application which will be loaded in reaction to other mmi:StartRequest lifecycle events.

Grammar

The voice modality component uses the following ABNF speech grammar [SPEECH-GRAMMAR] to recognize the answers to the math quiz questions:

#ABNF 1.0 iso-8859-1;
language en-US;

mode voice;
root $mathquizanswer;
tag-format <semantics/1.0>;

public $mathquizanswer =
	1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20;

The modality component loads this grammar in response to an ExtensionNotification lifecycle event (see description above). Recognition results are compiled into an EMMA [EMMA] document and sent to the interaction manager using a DoneNotification lifecycle event.
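
As a quick plausibility check, the following Python sketch extracts the alternatives of the `$mathquizanswer` rule and confirms that they cover both quiz answers. It relies on simple string splitting that works for this one flat rule only and is not a general ABNF grammar parser; the function name `rule_alternatives` is an assumption of this example.

```python
GRAMMAR = """\
#ABNF 1.0 iso-8859-1;
language en-US;

mode voice;
root $mathquizanswer;
tag-format <semantics/1.0>;

public $mathquizanswer =
    1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20;
"""


def rule_alternatives(grammar_text, rule_name):
    """Hypothetical helper: list the '|'-separated alternatives of a flat rule."""
    marker = f"${rule_name} ="           # start of the rule definition
    start = grammar_text.index(marker) + len(marker)
    end = grammar_text.index(";", start)  # the rule ends at the next semicolon
    return [alt.strip() for alt in grammar_text[start:end].split("|")]


alternatives = rule_alternatives(GRAMMAR, "mathquizanswer")
covers_quiz = all(answer in alternatives for answer in ("16", "18"))
```

The incorrect spoken answer "twenty" from the test scenario is also within the grammar, which is why it is recognized and reported as 20 rather than rejected by the recognizer.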

B. Acknowledgements

The authors would like to recognize the contributions of the members of the W3C Multimodal Interaction Group. Special thanks to Thomas Ziem (T-Systems) and Jakob Sachse for supporting the test activity.

C. References

C.1 Normative references

[ECMA-262]
ECMAScript Language Specification, Third Edition. December 1999. URL: http://www.ecma-international.org/publications/standards/Ecma-262.htm
[EMMA]
Michael Johnston. EMMA: Extensible MultiModal Annotation markup language. 10 February 2009. W3C Recommendation. URL: http://www.w3.org/TR/2009/REC-emma-20090210
[HTML401]
David Raggett; Ian Jacobs; Arnaud Le Hors. HTML 4.01 Specification. 24 December 1999. W3C Recommendation. URL: http://www.w3.org/TR/1999/REC-html401-19991224
[MMI-ARCH]
Jim Barnett. Multimodal Architecture and Interfaces. 12 January 2012. W3C Candidate Recommendation. URL: http://www.w3.org/TR/2012/CR-mmi-arch-20120112
[SCXML]
Jim Barnett; et al. State Chart XML (SCXML): State Machine Notation for Control Abstraction. 26 April 2011. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2011/WD-scxml-20110426
[SPEECH-GRAMMAR]
Andrew Hunt; Scott McGlashan. Speech Recognition Grammar Specification Version 1.0. 16 March 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-speech-grammar-20040316
[XMLHTTPREQUEST]
Anne van Kesteren. The XMLHttpRequest Object. 03 August 2010. W3C Candidate Recommendation. (Work in progress.) URL: http://www.w3.org/TR/2010/CR-XMLHttpRequest-20100803

C.2 Informative references

[MMI-AUTH]
Ingmar Kliche. Authoring Applications for the Multimodal Architecture. 2 July 2008. W3C Note. URL: http://www.w3.org/TR/2008/NOTE-mmi-auth-20080702
[MMI-MCBP]
Ingmar Kliche. Best practices for creating MMI Modality Components. 1 March 2011. W3C Note. URL: http://www.w3.org/TR/2011/NOTE-mmi-mcbp-20110301