Copyright © 2010 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document describes a loosely coupled architecture for multimodal user interfaces, which allows for co-resident and distributed implementations, and focuses on the role of markup and scripting, and the use of well defined interfaces between its constituents.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index.
This document is the seventh Public Working Draft of "Multimodal Architecture and Interfaces" published on 21 September 2010 for review by W3C Members and other interested parties, and has been developed by the Multimodal Interaction Working Group as part of the W3C Multimodal Interaction Activity. The main normative changes from the previous draft are:
A diff-marked version of this document is also available for comparison purposes. Please note that many sections have been modified because of the above changes. The editors ask readers to read the whole document carefully and to send comments.
Comments on this specification are welcome and should have a subject starting with the prefix '[ARCH]'. Please send them to the public email list for issues related to Multimodal. This list is archived, and acceptance of this archiving policy is requested automatically upon first post. To subscribe to this list, send an email with the word "subscribe" in the subject line.
For more information about the Multimodal Interaction Activity, please see the Multimodal Interaction Activity statement.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
1 Summary
2 Overview
3 Design versus Run-Time considerations
3.1 Markup and The Design-Time View
3.2 Software Constituents and The Run-Time View
3.3 Differences from Compound Document Formats
3.4 Relationship to EMMA
4 Overview of
4.1 Run-Time Architecture Diagram
4.2 The
4.2.1 The Interaction Manager
4.2.2 The Data Component
4.2.3 The Modality Components
4.2.4 The Runtime Framework
The Event Transport Layer
Event and Information Security
4.2.5 System and OS Security
4.2.6 Media stream handling
4.2.7 Examples
5 Interface between the Interaction Manager and the Modality Components
5.1 Common Event Fields
5.1.1 Context
5.1.2 Source
5.1.3 Target
5.1.4 RequestID
5.1.5 Status
5.1.6 StatusInfo
5.1.7 Data
5.1.8 Confidential
5.2 Standard Life Cycle
5.2.1 NewContextRequest/NewContextResponse
NewContextRequest Properties
NewContextResponse Properties
5.2.2 PrepareRequest/PrepareResponse
PrepareRequest Properties
PrepareResponse Properties
5.2.3 StartRequest/StartResponse
StartRequest Properties
StartResponse Properties
5.2.4 DoneNotification
DoneNotification Properties
5.2.5 CancelRequest/CancelResponse
CancelRequest Properties
CancelResponse Properties
5.2.6 PauseRequest/PauseResponse
PauseRequest Properties
PauseResponse Properties
5.2.7 ResumeRequest/ResumeResponse
ResumeRequest Properties
ResumeResponse Properties
5.2.8 ExtensionNotification
ExtensionNotification Properties
5.2.9 ClearContextRequest/ClearContextResponse
ClearContextRequest Properties
ClearContextResponse Properties
5.2.10 StatusRequest/StatusResponse
StatusRequest Properties
StatusResponse Properties
5.3 Modality Component
A Examples of Life-Cycle
A.1 newContextRequest (from MC to IM)
A.2 newContextResponse (from IM to MC)
A.3 prepareRequest (from IM to MC, with external markup)
A.4 prepareRequest (from IM to MC, inline VoiceXML markup)
A.5 prepareResponse (from MC to IM, success)
A.6 prepareResponse (from MC to IM, failure)
A.7 startRequest (from IM to MC)
A.8 startResponse (from MC to IM)
A.9 doneNotification (from MC to IM, with EMMA result)
A.10 doneNotification (from MC to IM, with EMMA "no-input" result)
A.11 cancelRequest (from IM to MC)
A.12 cancelResponse (from IM to MC)
A.13 pauseRequest (from IM to MC)
A.14 pauseResponse (from MC to IM)
A.15 resumeRequest (from IM to MC)
A.16 resumeResponse (from MC to IM)
A.17 extensionNotification (formerly the data event, sent in both directions)
A.18 clearContextRequest (from the IM to the MC)
A.19 statusRequest (from the IM to the MC)
A.20 statusResponse (from the MC to the IM)
B Event Schemas
B.1 mmi.xsd
B.2 mmi-datatypes.xsd
B.3 mmi-attribs.xsd
B.4 mmi-elements.xsd
B.5 NewContextRequest.xsd
B.6 NewContextResponse.xsd
B.7 PrepareRequest.xsd
B.8 PrepareResponse.xsd
B.9 StartRequest.xsd
B.10 StartResponse.xsd
B.11 DoneNotification.xsd
B.12 CancelRequest.xsd
B.13 CancelResponse.xsd
B.14 PauseRequest.xsd
B.15 PauseResponse.xsd
B.16 ResumeRequest.xsd
B.17 ResumeResponse.xsd
B.18 ExtensionNotification.xsd
B.19 ClearContextRequest.xsd
B.20 ClearContextResponse.xsd
B.21 StatusRequest.xsd
B.22 StatusResponse.xsd
C Ladder Diagrams
C.1 Creating a
C.2 Processing User
C.3 Ending a
D Localization and
E HTTP transport of MMI lifecycle
E.1 Lifecycle event transport from modality components to Interaction Manager
E.2 Lifecycle event transport from IM to modality components (HTTP clients
E.3 Lifecycle event transport from Interaction Manager to modality components (HTTP
E.4 Error
F Glossary
G Types of Modality Components
G.1 Simple modality
G.2 Complex modality
G.3 Nested modality
H References
This document describes a loosely coupled architecture for multimodal user interfaces, which allows for co-resident and distributed implementations, and focuses on the role of markup and scripting, and the use of well defined interfaces between its constituents.
This document describes the architecture of the Multimodal Interaction (MMI) framework [MMIF] and the interfaces between its constituents. The MMI Working Group is aware that multimodal interfaces are an area of active research and that commercial implementations are only beginning to emerge. Therefore we do not view our goal as standardizing a hypothetical existing common practice, but rather providing a platform to facilitate innovation and technical development. Thus the aim of this design is to provide a general and flexible framework providing interoperability among modality-specific components from different vendors - for example, speech recognition from one vendor and handwriting recognition from another. This framework places very few restrictions on the individual components or on their interactions with each other, but instead focuses on providing a general means for allowing them to communicate with each other, plus basic infrastructure for application control and platform services.
Our framework is motivated by several basic design goals:
Even though multimodal interfaces are not yet common, the software industry as a whole has considerable experience with architectures that can accomplish these goals. Since the 1980s, for example, distributed message-based systems have been common. They have been used for a wide range of tasks, including in particular high-end telephony systems. In this paradigm, the overall system is divided up into individual components which communicate by sending messages over the network. Since the messages are the only means of communication, the internals of components are hidden and the system may be deployed in a variety of topologies, either distributed or co-located. One specific instance of this type of system is the DARPA Hub Architecture, also known as the Galaxy Communicator Software Infrastructure [Galaxy]. This is a distributed, message-based, hub-and-spoke infrastructure designed for constructing spoken dialogue systems. It was developed in the late 1990s and early 2000s under funding from DARPA. This infrastructure includes a program called the Hub, together with servers which provide functions such as speech recognition, natural language processing, and dialogue management. The servers communicate with the Hub and with each other using key-value structures called frames.
Another recent architecture that is relevant to our concerns is the model-view-controller (MVC) paradigm. This is a well known design pattern for user interfaces in object oriented programming languages, and has been widely used with languages such as Java, Smalltalk, C, and C++. The design pattern proposes three main parts: a Data Model that represents the underlying logical structure of the data and associated integrity constraints, one or more Views which correspond to the objects that the user directly interacts with, and a Controller which sits between the data model and the views. The separation between data and user interface provides considerable flexibility in how the data is presented and how the user interacts with that data. While the MVC paradigm has been traditionally applied to graphical user interfaces, it lends itself to the broader context of multimodal interaction where the user is able to use a combination of visual, aural and tactile modalities.
In discussing the design of MMI systems, it is important to keep in mind the distinction between the design-time view (i.e., the markup) and the run-time view (the software that executes the markup). At the design level, we assume that multimodal applications will take the form of multiple documents from different namespaces. In many cases, the different namespaces and markup languages will correspond to different modalities, but we do not require this. A single language may cover multiple modalities and there may be multiple languages for a single modality.
At runtime, the MMI architecture features loosely coupled software constituents that may be either co-resident on a device or distributed across a network. In keeping with the loosely-coupled nature of the architecture, the constituents do not share context and communicate only by exchanging events. The nature of these constituents and the APIs between them is discussed in more detail in Sections 3-5, below. Though nothing in the MMI architecture requires that there be any particular correspondence between the design-time and run-time views, in many cases there will be a specific software component responsible for each different markup language (namespace).
At the markup level, an application consists of multiple documents. A single document may contain markup from different namespaces if the interaction of those namespaces has been defined (e.g., as part of the Compound Document Formats Activity [CDF].) By the principle of encapsulation, however, the internal structure of documents is invisible at the MMI level, which defines only how the different documents communicate. One document has a special status, namely the Root or Controller Document, which contains markup defining the interaction between the other documents. Such markup is called Interaction Manager markup. The other documents are called Presentation Documents, since they contain markup to interact directly with the user. The Controller Document may consist solely of Interaction Manager markup (for example a state machine defined in CCXML [CCXML] or SCXML [SCXML]) or it may contain Interaction Manager markup combined with presentation or other markup. As an example of the latter design, consider a multimodal application in which a CCXML document provides call control functionality as well as the flow control for the various Presentation documents. Similarly, an SCXML flow control document could contain embedded presentation markup in addition to its native Interaction Management markup.
These relationships are recursive, so that any Presentation Document may serve as the Controller Document for another set of documents. This nested structure is similar to the 'Russian Doll' model of Modality Components, described below in 3.2 Software Constituents and The Run-Time View.
The different documents are loosely coupled and co-exist without interacting directly. Note in particular that there are no shared variables that could be used to pass information between them. Instead, all runtime communication is handled by events, as described below in 5.1 Common Event Fields.
Furthermore, it is important to note that the asynchronicity of the underlying communication mechanism does not impose the requirement that the markup languages present a purely asynchronous programming model to the developer. Given the principle of encapsulation, markup languages are not required to reflect directly the architecture and APIs defined here. As an example, consider an implementation containing a Modality Component providing Text-to-Speech (TTS) functionality. This Component must communicate with the Interaction Manager via asynchronous events (see 3.2 Software Constituents and The Run-Time View). In a typical implementation, there would likely be events to start a TTS play and to report the end of the play, etc. However, the markup and scripts that were used to author this system might well offer only a synchronous "play TTS" call, it being the job of the underlying implementation to convert that synchronous call into the appropriate sequence of asynchronous events. In fact, there is no requirement that the TTS resource be individually accessible at all. It would be quite possible for the markup to present only a single "play TTS and do speech recognition" call, which the underlying implementation would realize as a series of asynchronous events involving multiple Components.
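The TTS example above can be sketched in a few lines. This is only an illustration under stated assumptions: the event names `startPlay` and `playDone`, the `TtsComponent` class, and the use of in-process queues as the event channel are all hypothetical and are not defined by this specification.

```python
import queue
import threading


class TtsComponent:
    """Stand-in Modality Component that communicates only through event queues."""

    def __init__(self, to_im: queue.Queue) -> None:
        self.inbox: queue.Queue = queue.Queue()
        self.to_im = to_im
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:
            event = self.inbox.get()
            if event["name"] == "startPlay":
                # A real component would render audio here before reporting back.
                self.to_im.put({"name": "playDone", "text": event["text"]})


def play_tts(component: TtsComponent, from_component: queue.Queue, text: str) -> str:
    """Synchronous call that the implementation realizes as two asynchronous events."""
    component.inbox.put({"name": "startPlay", "text": text})
    done = from_component.get(timeout=5)  # block until the end-of-play event arrives
    return done["name"]


events_from_tts: queue.Queue = queue.Queue()
tts = TtsComponent(events_from_tts)
result = play_tts(tts, events_from_tts, "Hello")
```

The author-facing call `play_tts` blocks, while everything underneath it remains event-based, which is the separation the paragraph above describes.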
Existing languages such as XHTML may be used either as Controller Documents or as Presentation Documents. Further examples of potential markup components are given in 4.2.7 Examples.
At the core of the MMI runtime architecture is the distinction between the Interaction Manager (IM) and the Modality Components, which is similar to the distinction between the Controller Document and the Presentation Documents. The Interaction Manager interprets the Controller Document while the individual Modality Components are responsible for specific tasks, particularly handling input and output in the various modalities, such as speech, pen, video, etc.
The Interaction Manager receives all the events that the various Modality Components generate. Those events may be commands or replies to commands, and it is up to the Interaction Manager to decide what to do with them, i.e., what events to generate in response to them. In general, the MMI architecture follows a 'targetless' event model. That is, the Component that raises an event does not specify its destination. Rather, it passes it up to the Runtime Framework, which will pass it to the Interaction Manager. The IM, in turn, decides whether to forward the event to other Components, or to generate a different event, etc.
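A minimal sketch of this 'targetless' flow follows. The class names, the event names (`speechStarted`, `showListeningIcon`, `volumeChanged`), and the forwarding policy are assumptions made for illustration; the specification does not define them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    name: str    # hypothetical event name
    source: str  # identifier of the Component that raised the event
    data: dict = field(default_factory=dict)


class InteractionManager:
    """Receives every event and decides what, if anything, to send on."""

    def __init__(self) -> None:
        # (destination, event) pairs chosen by the IM, not by the raiser
        self.outbox: List[Tuple[str, Event]] = []

    def on_event(self, event: Event) -> None:
        # Assumed policy: tell the GUI component when speech starts.
        if event.name == "speechStarted":
            self.outbox.append(("gui", Event("showListeningIcon", source="im")))
        # Any other event is simply not forwarded.


class RuntimeFramework:
    """Passes every raised event up to the IM; raisers never name a destination."""

    def __init__(self, im: InteractionManager) -> None:
        self.im = im

    def raise_event(self, event: Event) -> None:
        self.im.on_event(event)


im = InteractionManager()
framework = RuntimeFramework(im)
framework.raise_event(Event("speechStarted", source="voice"))
framework.raise_event(Event("volumeChanged", source="voice"))
routed = [(dest, ev.name) for dest, ev in im.outbox]
```

Note that `raise_event` takes no destination argument: all routing decisions live in the Interaction Manager.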
Modality Components are black boxes, required only to implement the Modality Component Interface API which is described below. This API allows the Modality Components to communicate with the IM and hence with each other, since the IM is responsible for delivering events/messages among the Components. Since the internals of a Component are hidden, it is possible for an Interaction Manager and a set of Components to present themselves as a Component to a higher-level Interaction Manager. All that is required is that the IM implement the Component API. The result is a "Russian Doll" model in which Components may be nested inside other Components to an arbitrary depth. Nesting components in this manner is one way to produce a 'complex' Modality Component, namely one that handles multiple modalities simultaneously. However, it is also possible to produce complex Modality Components without nesting, as discussed in 4.2.3 The Modality Components.
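The "Russian Doll" nesting can be sketched as follows. The single-method interface `handle_event` and the inner delivery policy are hypothetical simplifications; the actual Modality Component Interface is the set of life-cycle events described later in this document.

```python
from typing import Iterable, List, Protocol


class ModalityComponent(Protocol):
    """Hypothetical one-method stand-in for the Component API."""

    def handle_event(self, event: str) -> List[str]: ...


class LeafComponent:
    def __init__(self, name: str) -> None:
        self.name = name

    def handle_event(self, event: str) -> List[str]:
        return [f"{self.name}:{event}"]


class NestedComponent:
    """An inner Interaction Manager plus its Components, exposed as one Component."""

    def __init__(self, children: Iterable[ModalityComponent]) -> None:
        self.children = list(children)

    def handle_event(self, event: str) -> List[str]:
        results: List[str] = []
        for child in self.children:  # assumed inner policy: deliver to every child
            results.extend(child.handle_event(event))
        return results


inner = NestedComponent([LeafComponent("speech"), LeafComponent("pen")])
outer = NestedComponent([inner, LeafComponent("gui")])
trace = outer.handle_event("start")
```

Because `NestedComponent` exposes the same interface as `LeafComponent`, the outer level cannot tell whether a child is a single modality or a whole nested system.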
In addition to the Interaction Manager and the modality components, there is a Runtime Framework that provides infrastructure support, in particular a transport layer which delivers events among the components.
Because we are using the term 'Component' to refer to a specific set of entities in our architecture, we will use the term 'Constituent' as a cover term for all the elements in our architecture which might normally be called 'software components'.
The W3C Compound Document Formats Activity [CDF] is also concerned with the execution of user interfaces written in multiple languages. However, the CDF group focuses on defining the interactions of specific sets of languages within a single document, which may be defined by inclusion or by reference. The MMI architecture, on the other hand, defines the interaction of arbitrary sets of languages in multiple documents. From the MMI point of view, mixed markup documents defined by CDF specifications are treated like any other documents, and may be either Controller or Presentation Documents. Finally, note that the tightly coupled languages handled by CDF will usually share data and scripting contexts, while the MMI architecture focuses on a looser coupling, without shared context. The lack of shared context makes it easier to distribute applications across a network and also places minimal constraints on the languages in the various documents. As a result, authors will have the option of building multimodal applications in a wide variety of languages for a wide variety of deployment scenarios. We believe that this flexibility is important for the further development of the industry.
The Extensible MultiModal Annotation markup language [EMMA] is one of a set of specifications for multimodal systems, and provides details of an XML markup language for containing and annotating the interpretation of user input. For example, a user of a multimodal application might use both speech to express a command, and keystroke gesture to select or draw command parameters. The Speech Recognition Modality would express the user command using EMMA to indicate the input source (speech). The Pen Gesture Modality would express the command parameters using EMMA to indicate the input source (pen gestures). Both modalities may include timing information in the EMMA notation. Using the timing information, a fusion module combines the speech and pen gesture information into a single EMMA notation representing both the command and its parameters. The use of EMMA enables the separation of the recognition process from the information fusion process, and thus enables reusable recognition modalities and general purpose information fusion algorithms.
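The timing-based fusion step can be illustrated with a small sketch. The dictionaries below are a hypothetical in-memory stand-in for EMMA annotations, not EMMA syntax, and the rule "fuse when the two inputs overlap or lie within `max_gap_ms` of each other" is an assumed policy rather than anything this document prescribes.

```python
from typing import Optional


def fuse(speech: dict, pen: dict, max_gap_ms: int = 1000) -> Optional[dict]:
    """Combine a spoken command with pen parameters if they are close in time."""
    # A negative gap means the two time intervals overlap.
    gap = max(speech["start"], pen["start"]) - min(speech["end"], pen["end"])
    if gap > max_gap_ms:
        return None  # too far apart to belong to the same user action
    return {
        "sources": [speech["source"], pen["source"]],
        "start": min(speech["start"], pen["start"]),
        "end": max(speech["end"], pen["end"]),
        "command": speech["command"],
        "parameters": pen["parameters"],
    }


speech_input = {"source": "speech", "start": 1000, "end": 1800, "command": "zoom"}
pen_input = {"source": "pen", "start": 1500, "end": 2100,
             "parameters": {"x": 40, "y": 12}}
late_pen_input = dict(pen_input, start=5000, end=5600)

fused = fuse(speech_input, pen_input)
not_fused = fuse(speech_input, late_pen_input)
```

Because the recognizers only annotate their own results, the fusion function needs no knowledge of how speech or pen recognition works.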
Here is a list of the Constituents of the MMI architecture. They are discussed in more detail below.
The Interaction Manager (IM) is responsible for handling all events that the other Components generate. Normally there will be specific markup associated with the IM instructing it how to respond to events. This markup will thus contain much of the most basic interaction logic of an application. Existing languages such as SMIL, CCXML, SCXML, or ECMAScript can be used for IM markup as an alternative to defining special-purpose languages aimed specifically at multimodal applications. The IM fulfills multiple functions. For example, it is responsible for synchronization of data and focus, etc., across different Modality Components as well as the higher-level application flow that is independent of Modality Components. It also maintains the high-level application data model and may handle communication with external entities and back-end systems. In the future we may split these functions apart and define different components for each of them. However, for the moment, we leave them rolled up in a single monolithic Interaction Manager component. We note that state machine languages such as SCXML are a good choice for authoring such a multi-function component, since state machines can be composed. Thus it is possible to define a high-level state machine representing the overall application flow, with lower-level state machines nested inside it handling the cross-modality synchronization at each phase of the higher-level flow.
Due to the Russian Doll model, Components may contain their own Interaction Managers to handle their internal events. However these Interaction Managers are not visible to the top level Runtime Framework or Interaction Manager.
If the Interaction Manager does not contain an explicit handler for an event, any default behavior that has been established for the event will be respected. If there is no default behavior, the event will be ignored. (In effect, the Interaction Manager's default handler for all events is to ignore them.)
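The fallback order described above (explicit handler, then any established default behavior, then ignore) can be sketched as a small dispatch function. The handler tables and event names here are hypothetical.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[dict], str]


def dispatch(event_name: str, payload: dict,
             explicit: Dict[str, Handler],
             defaults: Dict[str, Handler]) -> Optional[str]:
    """Run the explicit handler, else the default behavior, else ignore the event."""
    handler = explicit.get(event_name) or defaults.get(event_name)
    if handler is None:
        return None  # no handler and no default: the event is ignored
    return handler(payload)


explicit_handlers = {"recoResult": lambda p: f"handled {p['text']}"}
default_behaviors = {"timeout": lambda p: "default timeout behavior"}

r1 = dispatch("recoResult", {"text": "hello"}, explicit_handlers, default_behaviors)
r2 = dispatch("timeout", {}, explicit_handlers, default_behaviors)
r3 = dispatch("unknownEvent", {}, explicit_handlers, default_behaviors)
```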
The Data Component is responsible for storing application-level data. The Interaction Manager is a client of the Data Component and must be able to access and update it as part of its control flow logic, but Modality Components do not have direct access to it. Since Modality Components are black boxes, they may have their own internal Data Components and may interact directly with backend servers. However, the only way that Modality Components can share data among themselves and maintain consistency is via the Interaction Manager. It is therefore good application design practice to divide data into two logical classes: private data, which is of interest only to a given modality component, and public data, which is of interest to the Interaction Manager or to more than one Modality Component. Private data may be managed as the Modality Component sees fit, but all modification of public data, including submission to back end servers, should be entrusted to the Interaction Manager.
For the initial version of this specification, we do not define an interface between the Data Component and the Interaction Manager. This amounts to treating the Data Component as part of the Interaction Manager. (Note that this means that the data access language will be whatever one the IM provides.) The Data Component is shown with a dotted outline in the diagram above, because it is only logically distinct. However, at some point in the future, we may define the interface between the Data Component and the Interaction Manager and require support for a specific data access language, independent of the Interaction Manager.
Modality Components, as their name would indicate, are responsible for controlling the various input and output modalities on the device. They are therefore responsible for handling all interaction with the user(s). Their only responsibility is to implement the interface defined in 5 Interface between the Interaction Manager and the Modality Components. Any further definition of their responsibilities must be highly domain- and application-specific. In particular we do not define a set of standard modalities or the events that they should generate or handle. Platform providers are allowed to define new Modality Components and are allowed to place into a single Component functionality that might logically seem to belong to two different modalities. Thus a platform could provide a handwriting-and-speech Modality Component that would accept simultaneous voice and pen input. Such combined Components permit a much tighter coupling between the two modalities than the loose interface defined here. Furthermore, modality components may be used to perform general processing functions not directly associated with any specific interface modality, for example, dialog flow control or natural language processing.
In most cases, there will be specific markup in the application corresponding to a given modality, specifying how the interaction with the user should be carried out. However, we do not require this and specifically allow for a markup-free modality component whose behavior is hard-coded into its software.
The Runtime Framework is a cover term for all the infrastructure services that are necessary for successful execution of a multimodal application. This includes starting the components, handling communication, and logging, etc. For the most part, this version of the specification leaves these functions to be defined in a platform-specific way, but we do specifically define a Transport Layer which handles communications between the components.
The Event Transport Layer is responsible for delivering events among the IM and the Modality Components. Clearly, there are multiple transport mechanisms (protocols) that can be used to implement a Transport Layer and different mechanisms may be used to communicate with different modality components. Thus the Event Transport Layer consists of one or more transport mechanisms linking the IM to the various Modality Components.
We place the following requirements on all transport mechanisms:
For a sample definition of a Transport Layer relying on HTTP, see E HTTP transport of MMI lifecycle events. In the current draft, this definition is provided as an example only, but in future drafts we may require support for this and possibly other Transport Layer definitions.
Events will often carry sensitive information, such as bank account numbers or health care information. In addition, events must be reliable to both sides of a transaction: for example, if an event carries an assent to a financial transaction, both sides of the transaction must be able to rely on that assent.
We do not currently specify delivery mechanisms or internal security safeguards used by the Modality Components and the Interaction Manager. However, we believe that any secure system will have to meet the following requirements at a minimum:
The following two optional requirements can be met by using the W3C's XML-Signature Syntax and Processing specification [XMLSig].
The remaining optional requirements for event delivery and information security can be met by following other industry-standard procedures.
Multiple protocols may be necessary to implement these requirements. For example, TCP/IP and HTTP provide reliable event delivery, but additional protocols such as TLS or HTTPS could be required to meet security requirements.
This architecture does not and will not specify the internal security requirements of a Modality Component or Runtime Framework.
Media streams typically do not flow through the Interaction Manager. This specification does not specify how media connections are established, as the main focus of this specification is the flow of control data. However, all control data logically sent between modality components MUST flow through the Interaction Manager.
For the sake of concreteness, here are some examples of components that could be implemented using existing languages. Note that we are mixing the design-time and run-time views here, since it is the implementation of the language (the browser) that serves as the run-time component.
The most important interface in this architecture is the one between the Modality Components and the Interaction Manager. Modality Components communicate with the IM via asynchronous events. Components must be able to raise events and to handle events that are delivered to them asynchronously. It is not required that components use these events internally, since the implementation of a given Component is a black box to the rest of the system. In general, it is expected that Components will raise events both automatically (i.e., as part of their implementation) and under mark-up control.
The majority of the events defined here come in request/response pairs. That is, one party (either the IM or an MC) sends a request and the other returns a response. (The exception is the ExtensionNotification event, which can be sent by either party.) In each case it is specified which party sends the request and which party returns the response. If the wrong party sends a request or response, the receiving party MUST ignore it. In the descriptions below, we say that the originating party "MAY" send the request, because it is up to the internal logic of the originating party to decide if it wants to invoke the behavior that the request would trigger. On the other hand, we say that the receiving party "MUST" send the response, because it is mandatory to send the response if and when the request is received.
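The pairing rules above can be sketched as a small check on the receiving side. The sender table below lists only the three requests whose originating party is stated in this part of the document; the role labels `"IM"` and `"MC"` and the dictionary layout of events are assumptions made for illustration.

```python
from typing import Optional

# Which party may originate each request (subset, for illustration only).
ALLOWED_REQUEST_SENDER = {
    "NewContextRequest": "MC",
    "PrepareRequest": "IM",
    "StartRequest": "IM",
}


def response_name(request_name: str) -> str:
    """Map e.g. 'PrepareRequest' to 'PrepareResponse'."""
    return request_name[: -len("Request")] + "Response"


def handle_request(name: str, sender_role: str, request_id: str) -> Optional[dict]:
    """Return the mandatory response, or None when the request must be ignored."""
    if ALLOWED_REQUEST_SENDER.get(name) != sender_role:
        return None  # request from the wrong party: the receiver MUST ignore it
    # A correctly sent request MUST be answered with the matching response.
    return {"name": response_name(name), "requestID": request_id}


ok = handle_request("PrepareRequest", "IM", "req-1")
ignored = handle_request("PrepareRequest", "MC", "req-2")
```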
The concept of 'context' is basic to the events described below. A context represents a single extended interaction with one (or possibly more) users. In a simple unimodal case, a context can be as simple as a phone call or SSL session. Multimodal cases are more complex, however, since the various modalities may not all be used at the same time. For example, in a voice-plus-web interaction, e.g., web sharing with an associated VoIP call, it would be possible to terminate the web sharing and continue the voice call, or to drop the voice call and continue via web chat. In these cases, a single context persists across various modality configurations. In general, we intend for 'context' to cover the longest period of interaction over which it would make sense for components to store state or information.
For examples of the concrete XML syntax for all these events, see A Examples of Life-Cycle Events.
The following common fields are shared by multiple life-cycle events:
5.1.1 Context
A URI that is unique across the system and is used to identify this interaction. All events relating to a given interaction will use the same context URI. Events containing a different context URI will be part of other, unrelated, interactions.
5.1.4 RequestID
A unique identifier for a Request/Response pair. Most life-cycle events come in Request/Response pairs that share a common RequestID. For each such pair, this id must be unique within the given context.
5.1.5 Status
An enumeration of 'success' and 'failure'. The Response event of a Request/Response pair will use this field to report whether it succeeded in carrying out the request.
5.1.6 StatusInfo
An arbitrary value providing further error information in cases where the status is 'failure'.
5.1.7 Data
An optional field containing arbitrary data. The format and meaning of this data is application-specific.
5.1.8 Confidential
An optional field indicating whether the contents of this event should be treated as confidential. The default value is 'false'. If the value is 'true', the Interaction Manager and Modality Component implementations MUST NOT log the information or make it available in any way to third parties unless explicitly instructed to do so by the author of the application.
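As a rough illustration, the fields listed above could be gathered into a single record. The Python names, the `Status` enum, and the `loggable` helper are hypothetical; the normative representation is the XML shown in the appendices.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class CommonFields:
    context: str                      # URI identifying the interaction
    request_id: str                   # unique within the context per Request/Response pair
    status: Optional[Status] = None   # set only on Response events
    status_info: Any = None           # further error information when status is failure
    data: Any = None                  # optional, application-specific
    confidential: bool = False        # default value is 'false'


def loggable(event: CommonFields) -> Optional[str]:
    """Return a log line, or None when the event must not be logged."""
    if event.confidential:
        return None
    return f"[{event.context}] request {event.request_id}: data={event.data!r}"


plain = CommonFields("urn:example:ctx-1", "req-1", data="hello")
secret = CommonFields("urn:example:ctx-1", "req-2", data="1234", confidential=True)
plain_line = loggable(plain)
secret_line = loggable(secret)
```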
The Multimodal Architecture defines the following basic life-cycle events which must be supported by all modality components. These events allow the Interaction Manager to invoke modality components and receive results from them. They thus form the basic interface between the IM and the Modality Components. Note that the 'Extension' event offers extensibility, since it contains arbitrary XML content and can be raised by either the IM or the Modality Components at any time once the context has been established. For example, an application relying on speech recognition could use the 'Extension' event to communicate recognition results or the fact that speech had started, etc.
A Modality Component MAY send a NewContextRequest to the IM to request that a new context be created. If this event is sent, the IM MUST respond with the NewContextResponse event. Note that the IM MAY create a new context/interaction without a previous NewContextRequest. In such a case, the IM will send a PrepareRequest or StartRequest to the modality components containing a new context ID.
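A sketch of how an Interaction Manager might answer a NewContextRequest follows. The dictionary layout, the `urn:uuid:` form of the context identifier, and the `accept` flag are assumptions; only the matching RequestID, the Status value, and the empty context on failure come from the event descriptions in this section.

```python
import uuid
from typing import Set


def handle_new_context_request(request: dict, contexts: Set[str],
                               accept: bool = True) -> dict:
    """Build the NewContextResponse that the IM MUST send."""
    response = {
        "name": "NewContextResponse",
        "requestID": request["requestID"],  # MUST match the request
    }
    if accept:
        context = f"urn:uuid:{uuid.uuid4()}"  # assumed way to mint a unique URI
        contexts.add(context)
        response.update(status="success", context=context)
    else:
        # On failure the context field is empty and StatusInfo carries details.
        response.update(status="failure", context="", statusInfo="context refused")
    return response


known_contexts: Set[str] = set()
accepted = handle_new_context_request({"requestID": "req-7"}, known_contexts)
refused = handle_new_context_request({"requestID": "req-8"}, known_contexts,
                                     accept=False)
```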
NewContextRequest Properties
RequestID. See 5.1.4 RequestID. A newly generated identifier used to identify this request.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.
NewContextResponse Properties
RequestID. See 5.1.4 RequestID. MUST match the RequestID in the NewContextRequest event.
Status. See 5.1.5 Status. If the value is Success, the NewContextRequest has been accepted and a new context identifier will be included. (See below). If the value is Failure, no context identifier will be included and further information will be included in the StatusInfo field.
Context. See 5.1.1 Context. A newly created context identifier. This field MUST be empty if status is Failure.
StatusInfo. See 5.1.6 StatusInfo.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.
The IM MAY send a PrepareRequest to allow the Modality Components to pre-load markup and prepare to run. Modality Components are not required to take any particular action in response to this event, but they MUST return a PrepareResponse event. Modality Components that return a PrepareResponse event with Status of 'Success' SHOULD be ready to run with close to 0 delay upon receipt of the StartRequest.
A given component can only execute a single StartRequest at one time (see 5.2.3 StartRequest/StartResponse ). However, the Interaction Manager MAY send multiple PrepareRequest events to a Modality Component for the same Context, each referencing a different ContentURL or containing different in-line Content, before sending a StartRequest. In this case, the Modality Component SHOULD prepare to run any of the specified content. The subsequent StartRequest event will determine which specific content the Modality Component should execute.
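The handling of several PrepareRequest events for one Context can be illustrated with a small Python sketch. Everything named here (`PreparingComponent`, `on_prepare`, `on_start`, and the `loader` callback that stands in for fetching or compiling content) is hypothetical. The sketch assumes that a StartRequest names the content to run; the case where it leaves both fields empty is not covered.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (context, 'url' or 'inline', value)


class PreparingComponent:
    """Hypothetical Modality Component that caches prepared content per context."""

    def __init__(self, loader: Callable[[str, str], str]):
        self._loader = loader               # assumed fetch/compile step
        self._prepared: Dict[Key, str] = {}

    @staticmethod
    def _key(context: str, content_url: Optional[str],
             content: Optional[str]) -> Optional[Key]:
        if content_url is not None:
            return (context, "url", content_url)
        if content is not None:
            return (context, "inline", content)
        return None

    def on_prepare(self, context: str, content_url: Optional[str] = None,
                   content: Optional[str] = None) -> str:
        if content_url is not None and content is not None:
            return "failure"  # contentURL and content are incompatible
        key = self._key(context, content_url, content)
        if key is not None and key not in self._prepared:
            self._prepared[key] = self._loader(key[1], key[2])
        return "success"

    def on_start(self, context: str, content_url: Optional[str] = None,
                 content: Optional[str] = None) -> Optional[str]:
        """Return the prepared content selected by the StartRequest, if any."""
        if content_url is not None and content is not None:
            return None
        key = self._key(context, content_url, content)
        if key is None:
            return None
        if key not in self._prepared:
            # Not prepared beforehand: load on demand.
            self._prepared[key] = self._loader(key[1], key[2])
        return self._prepared[key]
```

Keeping every prepared item until the StartRequest arrives trades memory for start-up latency; an implementation with tight resources might choose to keep only the most recent item instead.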
RequestID. See 5.1.4 RequestID. A newly generated identifier used to identify this request.
Context. See 5.1.1 Context. Note that the IM may re-use the same context value in successive calls to Prepare if they are all within the same session/call.
ContentURL. Optional URL of the content that the Modality Component should execute. Includes standard HTTP fetch parameters such as max-age, max-stale, fetchtimeout, etc. Incompatible with content.
Content. Optional Inline markup for the Modality Component to execute. Incompatible with contentURL. Note that it is legal for both contentURL and content to be empty. In such a case, the Modality Component will revert to its default hard-coded behavior, which could consist of returning an error event or of running a preconfigured or hard-coded script.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

RequestID. See 5.1.4 RequestID. MUST match the RequestID in the PrepareRequest event.
Context. See 5.1.1 Context. MUST match the value in the Prepare event.
Status. See 5.1.5 Status.
StatusInfo. See 5.1.6 StatusInfo.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

The IM sends a StartRequest to invoke a Modality Component. The Modality Component MUST return a StartResponse event in response. If the Runtime Framework has sent a previous Prepare event, it MAY leave the contentURL and content fields empty, and the Modality Component MUST use the values from the Prepare event. If the IM includes new values for these fields, the values in the Start event override those in the Prepare event.
If the Interaction Manager sends multiple StartRequests to a given Modality Component before it receives a DoneNotification, each such request overrides the earlier ones. Thus if a Modality Component receives a new StartRequest while it is executing a previous one, it MUST either cease execution of the previous StartRequest and begin executing the content specified in the most recent StartRequest, or reject the new StartRequest, returning a StartResponse with status equal to 'failure'.
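A minimal Python sketch of the two rules above (Start values override Prepare values, and a new StartRequest either replaces or is rejected by a running component) might look as follows. The class and method names, the `restart_on_new_start` flag, and the `Busy` StatusInfo value are assumptions made for the example; for brevity a single `content` value stands in for both the contentURL and content fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartResponse:
    request_id: str
    status: str                      # 'success' or 'failure'
    status_info: Optional[str] = None


class StartableComponent:
    """Hypothetical component illustrating StartRequest handling."""

    def __init__(self, restart_on_new_start: bool = True):
        # True: cease the earlier execution and run the new content.
        # False: reject a StartRequest that arrives while running.
        self.restart_on_new_start = restart_on_new_start
        self.prepared: Optional[str] = None
        self.running: Optional[str] = None

    def on_prepare(self, content: str) -> None:
        self.prepared = content

    def on_start(self, request_id: str, content: Optional[str] = None) -> StartResponse:
        if self.running is not None and not self.restart_on_new_start:
            return StartResponse(request_id, "failure", "Busy")
        # Values in the Start event override those from the Prepare event.
        chosen = content if content is not None else self.prepared
        if chosen is None:
            return StartResponse(request_id, "failure", "NoContent")
        self.running = chosen  # replaces any earlier execution
        return StartResponse(request_id, "success")
```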
RequestID. See 5.1.4 RequestID. A newly generated identifier used to identify this request.
Context. See 5.1.1 Context. Note that the IM may re-use the same context value in successive calls to Start if they are all within the same session/call.
ContentURL. Optional URL of the content that the Modality Component should execute. Includes standard HTTP fetch parameters such as max-age, max-stale, fetchtimeout, etc. Incompatible with content.
Content. Optional Inline markup for the Modality Component to execute. Incompatible with contentURL. Note that it is legal for both contentURL and content to be empty. In such a case, the Modality Component will either use the values provided in the most recent PrepareRequest, if one was sent, or revert to its default hard-coded behavior, which could consist of returning an error event or of running a preconfigured or hard-coded script.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

RequestID. See 5.1.4 RequestID. MUST match the RequestID in the StartRequest event.
Context. See 5.1.1 Context. MUST match the value in the Start event.
Status. See 5.1.5 Status.
StatusInfo. See 5.1.6 StatusInfo.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

The Modality Component MAY return a DoneNotification to the IM to indicate that it has reached the end of its processing. The DoneNotification event is intended to indicate the completion of the processing that has been initiated by the Interaction Manager with a StartRequest. As an example a voice modality component might use the DoneNotification event to indicate the completion of a recognition task. In this case the DoneNotification event might carry the recognition result expressed using EMMA. However, there may be tasks which do not have a specific end. For example the Interaction Manager might send a StartRequest to a graphical modality component requesting it to display certain information. Such a task does not necessarily have a specific end and thus the graphical modality component might never send a DoneNotification event to the Interaction Manager. Thus the graphical modality component would display the screen until it received another StartRequest (or some other lifecycle event) from the Interaction Manager.
RequestID. See 5.1.4 RequestID. MUST match the RequestID of the StartRequest event.
Context. See 5.1.1 Context. MUST match the value in the Start event.
Status. See 5.1.5 Status.
StatusInfo. See 5.1.6 StatusInfo.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

The IM MAY send a CancelRequest to stop processing in the Modality Component. In this case, the Modality Component MUST return a CancelResponse.
RequestID. See 5.1.4 RequestID. A newly generated identifier used to identify this request.
Context. See 5.1.1 Context. MUST match the value in the Start event.
Immediate. Boolean value indicating whether a hard stop is requested.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

RequestID. See 5.1.4 RequestID. MUST match the RequestID in the CancelRequest event.
Context. See 5.1.1 Context. MUST match the value in the Start event.
Status. See 5.1.5 Status.
StatusInfo. See 5.1.6 StatusInfo.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

The IM MAY send a PauseRequest to suspend processing by the Modality Component. Modality Components may ignore this command if they are unable to pause, but they MUST return a PauseResponse.
RequestID. See 5.1.4 RequestID. A newly generated identifier used to identify this request.
Context. See 5.1.1 Context. MUST match the value in the Start event.
Immediate. Boolean value indicating whether a hard pause is requested.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

RequestID. See 5.1.4 RequestID. MUST match the RequestID in the PauseRequest event.
Context. See 5.1.1 Context. MUST match the value in the Start event.
Status. See 5.1.5 Status.
StatusInfo. See 5.1.6 StatusInfo.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

The IM MAY send the ResumeRequest to resume processing that was paused by a previous PauseRequest. Implementations may ignore this command if they are unable to pause, but they MUST return a ResumeResponse.
RequestID. See 5.1.4 RequestID. A newly generated identifier used to identify this request.
Context. See 5.1.1 Context. MUST match the value in the Start event.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

RequestID. See 5.1.4 RequestID. MUST match the RequestID in the ResumeRequest event.
Context. See 5.1.1 Context. MUST match the value in the Start event.
Status. See 5.1.5 Status.
StatusInfo. See 5.1.6 StatusInfo.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

This event MAY be generated by either the IM or the Modality Component. It is used to encapsulate application-specific events that are extensions to the framework defined here. For example, if an application containing a voice modality wanted that modality component to notify the Interaction Manager when speech was detected, it would cause the voice modality to generate an Extension event (with a 'name' of something like 'speechDetected') at the appropriate time.
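As a rough illustration of the 'speechDetected' case, the Python sketch below serializes an Extension event using the standard library's ElementTree module. The namespace URI `urn:example:mmi` is a placeholder chosen for the example (the real namespace is not reproduced here), and the function name and attribute layout are assumptions modeled on the XML examples elsewhere in this document.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration only; substitute the real MMI namespace.
MMI_NS = "urn:example:mmi"
ET.register_namespace("mmi", MMI_NS)


def build_extension_notification(name: str, source: str, target: str,
                                 context: str, request_id: str) -> str:
    """Serialize a hypothetical extensionNotification wrapped in an mmi element."""
    root = ET.Element(f"{{{MMI_NS}}}mmi", {"version": "1.0"})
    ET.SubElement(root, f"{{{MMI_NS}}}extensionNotification", {
        "name": name,            # application-specific event name
        "source": source,
        "target": target,
        "context": context,      # context established earlier
        "requestID": request_id,
    })
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_extension_notification(
        "speechDetected", "voiceMC", "IM", "URI-1", "request-7"))
```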
RequestID. See 5.1.4 RequestID. A newly generated identifier used to identify this request.
Name. The name of this event. This is an application-specific value.
Context. See 5.1.1 Context. MUST match the value in the Start event.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

The IM MAY send a ClearContextRequest to indicate that the specified context is no longer active and that any resources associated with it may be freed. Modality Components are not required to take any particular action in response to this command, but MUST return a ClearContextResponse.
RequestID. See 5.1.4 RequestID. A newly generated identifier used to identify this request.
Context. See 5.1.1 Context. MUST match the value in the Start event.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

RequestID. See 5.1.4 RequestID. MUST match the RequestID in the ClearContextRequest event.
Context. See 5.1.1 Context. MUST match the value in the Start event.
Status. See 5.1.5 Status.
StatusInfo. See 5.1.6 StatusInfo.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

The StatusRequest message and the corresponding StatusResponse are intended to provide keep-alive functionality, informing the IM about the presence of the various modality components. Note that both these messages may be either linked to a specific context or sent to the underlying server independent of any user interaction. In the former case, the IM is inquiring about the status of the specific interaction (i.e. context). In the latter case, it is in effect asking the underlying server whether it could start a new Context if requested to do so.
The StatusRequest message is sent from the IM to a Modality Component. By waiting for an implementation dependent period of time for a StatusResponse message, the IM may determine if the Modality Component is active. If automatic updates are enabled, the Modality Component SHOULD send multiple StatusResponse messages in response to a single StatusRequest message.
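The keep-alive check described above can be sketched in Python as a request followed by a bounded wait. The function name `probe_component`, the `send_status_request` callback, the use of a `queue.Queue` as the inbound message channel, and the dictionary shape of a response are all assumptions for illustration; the timeout value is implementation dependent.

```python
import queue
import time
from typing import Callable


def probe_component(send_status_request: Callable[[str], None],
                    responses: "queue.Queue[dict]",
                    request_id: str,
                    timeout_s: float = 0.5) -> bool:
    """Return True if a matching StatusResponse with status 'alive' arrives in time."""
    send_status_request(request_id)
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # no answer within the waiting period
        try:
            response = responses.get(timeout=remaining)
        except queue.Empty:
            return False
        # Skip responses that belong to other requests.
        if response.get("requestID") == request_id:
            return response.get("status") == "alive"
```

With automatic updates enabled, later StatusResponse messages would carry the same RequestID; a long-running IM would keep reading from the channel rather than returning after the first match as this sketch does.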
RequestID. See 5.1.4 RequestID. A newly generated identifier used to identify this request.
Context. See 5.1.1 Context. Optional specification of the context for which the status is requested. If it is not present, the request is directed to the underlying server, namely the software that would host a new context if one were created.
RequestAutomaticUpdate. A boolean value indicating whether the Modality Component should send ongoing StatusResponse messages without waiting for additional StatusRequest messages from the Runtime Framework.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

RequestID. See 5.1.4 RequestID. MUST match the RequestID in the StatusRequest event.
AutomaticUpdate. A boolean indicating whether the Modality Component will keep sending StatusResponse messages in the future without waiting for another StatusRequest message.
Context. See 5.1.1 Context. An optional specification of the context for which the status is being returned. If not present, the response represents the status of the underlying server.
Status. An enumeration of 'Alive' or 'Dead'. The meaning of these values depends on whether the 'context' parameter is present. If it is, a status of 'Alive' means that the specified session is still active and capable of handling new life cycle events. A status of 'Dead' means that the context has terminated and no further interaction with the user is available using it. If the 'context' parameter is not provided, the status refers to the underlying server. A value of 'Alive' indicates that the Modality Component is able to handle subsequent Prepare and Start messages. If status is 'Dead', it is not able to handle such requests. Thus the status of 'Dead' indicates that the modality component is going off-line. If the IM receives a StatusResponse message with status of 'Dead', it may continue to send StatusRequest messages, but it may not receive a response to them until the Modality Component comes back online.
Source. See 5.1.2 Source.
Target. See 5.1.3 Target.
Data. See 5.1.7 Data.
Confidential. See 5.1.8 Confidential.

Within an established context, a Modality Component functions in one of three states: Idle, Running or Paused. Request lifecycle events received from the Interaction Manager imply specific actions and transitions between states. The table below defines MC actions, state transitions and response contents for each possible Request event sent by the IM to a MC in a particular state.
A Failure: ErrorMessage annotation indicates that the specified Request event is either invalid or redundant in the specified state. In this case, the Modality Component must respond by sending a matching Response event with Status=Failure and StatusInfo=ErrorMessage. In all other cases, the Modality Component should perform the requested action, possibly transitioning to another state as indicated.
event / state | Idle | Running | Paused |
PrepareRequest | preload or update content | preload or update content | preload or update content |
StartRequest | Transition: Running; use new content if provided, otherwise use last available content | stop processing current content, restart as in Idle | Transition: Running; stop processing current content, restart as in Idle |
StartRequest (all states) | Failure: NoContent if MC requires content to run and none has been provided | same | same |
CancelRequest | Failure: NotRunning | Transition: Idle | Transition: Idle |
PauseRequest | Failure: NotRunning | Transition: Paused | Failure: AlreadyPaused |
PauseRequest (all states) | Failure: CantPause if MC is unable to pause | same | same |
ResumeRequest | Failure: NotRunning | Failure: AlreadyRunning | Transition: Running |
StatusRequest | send status | send status | send status |
ClearContextRequest | close session | close session | close session |
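The transitions in the table can be expressed as a small state machine. The Python sketch below is one possible reading, covering PrepareRequest, StartRequest, CancelRequest, PauseRequest and ResumeRequest only. The class name, the `can_pause` and `requires_content` flags, and the tuple return value (Status, StatusInfo) are assumptions. The order in which PauseRequest failures are checked (state first, then the ability to pause) is also an assumption, since the table does not rank them.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    IDLE = "Idle"
    RUNNING = "Running"
    PAUSED = "Paused"


class ModalityComponentFSM:
    """Hypothetical per-context state machine for a Modality Component."""

    def __init__(self, can_pause: bool = True, requires_content: bool = True):
        self.state = State.IDLE
        self.content: Optional[str] = None
        self.can_pause = can_pause
        self.requires_content = requires_content

    def handle(self, event: str, content: Optional[str] = None) -> Tuple[str, Optional[str]]:
        """Apply one Request event and return (Status, StatusInfo)."""
        if event == "PrepareRequest":
            if content is not None:
                self.content = content       # preload or update content
            return ("success", None)
        if event == "StartRequest":
            if content is not None:
                self.content = content       # new content wins
            if self.content is None and self.requires_content:
                return ("failure", "NoContent")
            self.state = State.RUNNING       # (re)start from any state
            return ("success", None)
        if event == "CancelRequest":
            if self.state is State.IDLE:
                return ("failure", "NotRunning")
            self.state = State.IDLE
            return ("success", None)
        if event == "PauseRequest":
            if self.state is State.IDLE:
                return ("failure", "NotRunning")
            if self.state is State.PAUSED:
                return ("failure", "AlreadyPaused")
            if not self.can_pause:
                return ("failure", "CantPause")
            self.state = State.PAUSED
            return ("success", None)
        if event == "ResumeRequest":
            if self.state is State.IDLE:
                return ("failure", "NotRunning")
            if self.state is State.RUNNING:
                return ("failure", "AlreadyRunning")
            self.state = State.RUNNING
            return ("success", None)
        raise ValueError(f"event not covered by this sketch: {event}")
```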
Here is a state chart representation of these transitions:
<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:newContextRequest source="someURI" target="someOtherURI" requestID="request-1">
  </mmi:newContextRequest>
</mmi:mmi>

<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:newContextResponse source="someURI" target="someOtherURI" requestID="request-1" status="success" context="URI-1">
  </mmi:newContextResponse>
</mmi:mmi>
<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:prepareRequest source="someURI" target="someOtherURI" context="URI-1" requestID="request-1">
    <mmi:contentURL href="someContentURI" max-age="" fetchtimeout="1s"/>
  </mmi:prepareRequest>
</mmi:mmi>

<mmi:mmi xmlns:mmi="" version="1.0" xmlns:vxml="">
  <mmi:prepareRequest source="someURI" target="someOtherURI" context="URI-1" requestID="request-1">
    <mmi:content>
      <vxml:vxml version="2.0">
        <vxml:form>
          <vxml:block>Hello World!</vxml:block>
        </vxml:form>
      </vxml:vxml>
    </mmi:content>
  </mmi:prepareRequest>
</mmi:mmi>
<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:prepareResponse source="someURI" target="someOtherURI" context="someURI" requestID="request-1" status="success"/>
</mmi:mmi>

<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:prepareResponse source="someURI" target="someOtherURI" context="someURI" requestID="request-1" status="failure">
    <mmi:statusInfo> NotAuthorized </mmi:statusInfo>
  </mmi:prepareResponse>
</mmi:mmi>
<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:startRequest source="someURI" target="someOtherURI" context="URI-1" requestID="request-1">
    <mmi:contentURL href="someContentURI" max-age="" fetchtimeout="1s"/>
  </mmi:startRequest>
</mmi:mmi>

<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:startResponse source="someURI" target="someOtherURI" context="someURI" requestID="request-1" status="failure">
    <mmi:statusInfo> NotAuthorized </mmi:statusInfo>
  </mmi:startResponse>
</mmi:mmi>
This requestID corresponds to the requestID of the "startRequest" event that started it.
<mmi:mmi xmlns:mmi="" version="1.0" xmlns:emma="">
  <mmi:doneNotification source="someURI" target="someOtherURI" context="someURI" status="success" requestID="request-1" confidential="true">
    <mmi:data>
      <emma:emma version="1.0">
        <emma:interpretation id="int1" emma:medium="acoustic" emma:confidence=".75" emma:mode="voice" emma:tokens="flights from boston to denver">
          <origin>Boston</origin>
          <destination>Denver</destination>
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:doneNotification>
</mmi:mmi>

<mmi:mmi xmlns:mmi="" version="1.0" xmlns:emma="">
  <mmi:doneNotification source="someURI" target="someOtherURI" context="someURI" status="success" requestID="request-1">
    <mmi:data>
      <emma:emma version="1.0">
        <emma:interpretation id="int1" emma:no-input="true"/>
      </emma:emma>
    </mmi:data>
  </mmi:doneNotification>
</mmi:mmi>
<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:cancelRequest source="someURI" target="someOtherURI" context="someURI" requestID="request-1"/>
</mmi:mmi>

<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:cancelResponse source="someURI" target="someOtherURI" context="someURI" requestID="request-1" status="success"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:pauseRequest context="someURI" source="someURI" target="someOtherURI" immediate="true" requestID="request-1"/>
</mmi:mmi>

<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:pauseResponse source="someURI" target="someOtherURI" context="someURI" requestID="request-1" status="success"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:resumeRequest context="someURI" source="someURI" target="someOtherURI" requestID="request-1"/>
</mmi:mmi>

<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:resumeResponse source="someURI" target="someOtherURI" context="someURI" requestID="request-2" status="success"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:extensionNotification name="appEvent" source="someURI" target="someOtherURI" context="someURI" requestID="request-1">
    <applicationdata/>
  </mmi:extensionNotification>
</mmi:mmi>

<mmi:mmi xmlns:mmi="" version="1.0">
  <mmi:clearContextRequest source="someURI" target="someOtherURI" context="someURI" requestID="request-2"/>
</mmi:mmi>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace=""> <xs:annotation> <xs:documentation xml:lang="en"> Schema definition for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="NewContextRequest.xsd"/> <xs:include schemaLocation="NewContextResponse.xsd"/> <xs:include schemaLocation="ClearContextRequest.xsd"/> <xs:include schemaLocation="ClearContextResponse.xsd"/> <xs:include schemaLocation="CancelRequest.xsd"/> <xs:include schemaLocation="CancelResponse.xsd"/> <xs:include schemaLocation="DoneNotification.xsd"/> <xs:include schemaLocation="ExtensionNotification.xsd"/> <xs:include schemaLocation="PauseRequest.xsd"/> <xs:include schemaLocation="PauseResponse.xsd"/> <xs:include schemaLocation="PrepareRequest.xsd"/> <xs:include schemaLocation="PrepareResponse.xsd"/> <xs:include schemaLocation="ResumeRequest.xsd"/> <xs:include schemaLocation="ResumeResponse.xsd"/> <xs:include schemaLocation="StartRequest.xsd"/> <xs:include schemaLocation="StartResponse.xsd"/> <xs:include schemaLocation="StatusRequest.xsd"/> <xs:include schemaLocation="StatusResponse.xsd"/> <xs:element name="mmi"> <xs:complexType> <xs:choice> <xs:sequence> <xs:element ref="mmi:newContextRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:newContextResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:clearContextRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:clearContextResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:cancelRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:cancelResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:doneNotification"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:extensionNotification"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:pauseRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:pauseResponse"/> </xs:sequence> <xs:sequence> <xs:element 
ref="mmi:prepareRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:prepareResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:resumeRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:resumeResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:startRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:startResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:statusRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:statusResponse"/> </xs:sequence> </xs:choice> <xs:attributeGroup ref="mmi:mmi.version.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="" xmlns:mmi="" targetNamespace=""> <xs:annotation> <xs:documentation xml:lang="en"> general Type definition schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:simpleType name="versionType"> <xs:restriction base="xs:decimal"> <xs:enumeration value="1.0"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="sourceType"> <xs:restriction base="xs:string"/> </xs:simpleType> <xs:simpleType name="targetType"> <xs:restriction base="xs:string"/> </xs:simpleType> <xs:simpleType name="requestIDType"> <xs:restriction base="xs:string"/> </xs:simpleType> <xs:simpleType name="contextType"> <xs:restriction base="xs:string"/> </xs:simpleType> <xs:simpleType name="statusType"> <xs:restriction base="xs:string"> <xs:enumeration value="success"/> <xs:enumeration value="failure"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="statusResponseType"> <xs:restriction base="xs:string"> <xs:enumeration value="alive"/> <xs:enumeration value="dead"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="immediateType"> <xs:restriction base="xs:boolean"/> </xs:simpleType> <xs:complexType name="contentURLType"> <xs:attribute name="href" type="xs:anyURI" use="required"/> <xs:attribute name="max-age" type="xs:string" use="optional"/> <xs:attribute name="fetchtimeout" type="xs:string" use="optional"/> </xs:complexType> <xs:complexType name="contentType"> <xs:sequence> <xs:any namespace="" processContents="skip" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:complexType name="emmaType"> <xs:sequence> <xs:any namespace="" processContents="skip" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:complexType name="anyComplexType" mixed="true"> <xs:complexContent mixed="true"> <xs:restriction base="xs:anyType"> <xs:sequence> <xs:any processContents="skip" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:restriction> 
</xs:complexContent> </xs:complexType> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="" xmlns:mmi="" targetNamespace="" attributeFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> general Type definition schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> </xs:attributeGroup> <xs:attributeGroup name="mmi.version.attrib"> <xs:attribute name="version" type="mmi:versionType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="source.attrib"> <xs:attribute name="source" type="mmi:sourceType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="target.attrib"> <xs:attribute name="target" type="mmi:targetType" use="optional"/> </xs:attributeGroup> <xs:attributeGroup name="requestID.attrib"> <xs:attribute name="requestID" type="mmi:requestIDType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="context.attrib"> <xs:attribute name="context" type="mmi:contextType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="confidential.attrib"> <xs:attribute name="confidential" type="xs:boolean" use="optional"/> </xs:attributeGroup> <xs:attributeGroup name="context.optional.attrib"> <xs:attribute name="context" type="mmi:contextType" use="optional"/> </xs:attributeGroup> <xs:attributeGroup name="immediate.attrib"> <xs:attribute name="immediate" type="mmi:immediateType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="status.attrib"> <xs:attribute name="status" type="mmi:statusType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="statusResponse.attrib"> <xs:attribute name="status" type="mmi:statusResponseType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name=""> <xs:attribute name="name" type="xs:string" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="requestAutomaticUpdate.attrib"> <xs:attribute name="requestAutomaticUpdate" type="xs:boolean" use="required"/> </xs:attributeGroup> <xs:attributeGroup 
name="automaticUpdate.attrib"> <xs:attribute name="automaticUpdate" type="xs:boolean" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="group.allEvents.attrib"> <xs:attributeGroup ref="mmi:source.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> <xs:attributeGroup ref="mmi:requestID.attrib"/> <xs:attributeGroup ref="mmi:context.attrib"/> <xs:attributeGroup ref="mmi:confidential.attrib"/> </xs:attributeGroup> <xs:attributeGroup name="group.allResponseEvents.attrib"> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> <xs:attributeGroup ref="mmi:status.attrib"/> </xs:attributeGroup> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="" xmlns:mmi="" targetNamespace="" attributeFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> general elements definition schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <!-- ELEMENTS --> <xs:element name="statusInfo" type="mmi:anyComplexType"/> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> NewContextRequest schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="newContextRequest"> <xs:complexType> <xs:sequence> <xs:element name="data" type="mmi:anyComplexType"/> </xs:sequence> <xs:attributeGroup ref="mmi:source.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> <xs:attributeGroup ref="mmi:requestID.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> NewContextResponse schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="newContextResponse"> <xs:complexType> <xs:sequence> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> PrepareRequest schema for MMI Life cycle events version 1.0. The optional PrepareRequest event is an event that the Runtime Framework may send to allow the Modality Components to pre-load markup and prepare to run (e.g. in case of VXML VUI-MC). Modality Components are not required to take any particular action in response to this event, but they must return a PrepareResponse event. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:element name="prepareRequest"> <xs:complexType> <xs:choice> <xs:sequence> <xs:element name="contentURL" type="mmi:contentURLType"/> </xs:sequence> <xs:sequence> <xs:element name="content" type="mmi:anyComplexType"/> <!-- only vxml permitted ?? --> </xs:sequence> <!-- data really needed ?? --> <xs:sequence> <xs:element name="data" type="mmi:anyComplexType"/> </xs:sequence> </xs:choice> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> PrepareResponse schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="prepareResponse"> <xs:complexType> <xs:sequence> <xs:element name="data" minOccurs="0" type="mmi:anyComplexType"/> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> StartRequest schema for MMI Life cycle events version 1.0. The Runtime Framework sends the event StartRequest to invoke a Modality Component (to start loading a new GUI resource or to start the ASR or TTS). The Modality Component must return a StartResponse event in response. If the Runtime Framework has sent a previous PrepareRequest event, it may leave the contentURL and content fields empty, and the Modality Component will use the values from the PrepareRequest event. If the Runtime Framework includes new values for these fields, the values in the StartRequest event override those in the PrepareRequest event. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:element name="startRequest"> <xs:complexType> <xs:choice> <xs:sequence> <xs:element name="contentURL" type="mmi:contentURLType"/> </xs:sequence> <xs:sequence> <xs:element name="content" type="mmi:anyComplexType"/> <!-- only vxml permitted ?? --> </xs:sequence> <!-- data really needed ?? --> <xs:sequence> <xs:element name="data" type="mmi:anyComplexType"/> </xs:sequence> </xs:choice> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> StartResponse schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="startResponse"> <xs:complexType> <xs:sequence> <xs:element name="data" minOccurs="0" type="mmi:anyComplexType"/> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> DoneNotification schema for MMI Life cycle events version 1.0. The DoneNotification event is intended to be used by the Modality Component to indicate that it has reached the end of its processing. For the VUI-MC it can be used to return the ASR recognition result (or the status info: noinput/nomatch) and TTS/Player done notification. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="doneNotification"> <xs:complexType> <xs:sequence> <xs:element name="data" type="mmi:anyComplexType"/> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      CancelRequest schema for MMI Life cycle events version 1.0. The CancelRequest event is sent by the Runtime Framework to stop processing in the Modality Component (e.g. to cancel ASR or TTS/Playing). The Modality Component must return with a CancelResponse message.
    </xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="mmi-datatypes.xsd"/>
  <xs:include schemaLocation="mmi-attribs.xsd"/>
  <xs:element name="cancelRequest">
    <xs:complexType>
      <xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
      <xs:attributeGroup ref="mmi:immediate.attrib"/>
      <!-- no elements -->
    </xs:complexType>
  </xs:element>
</xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      CancelResponse schema for MMI Life cycle events version 1.0. The CancelRequest event is sent by the Runtime Framework to stop processing in the Modality Component (e.g. to cancel ASR or TTS/Playing). The Modality Component must return with a CancelResponse message.
    </xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="mmi-datatypes.xsd"/>
  <xs:include schemaLocation="mmi-attribs.xsd"/>
  <xs:include schemaLocation="mmi-elements.xsd"/>
  <xs:element name="cancelResponse">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="mmi:statusInfo" minOccurs="0"/>
      </xs:sequence>
      <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      PauseRequest schema for MMI Life cycle events version 1.0. The PauseRequest event is sent by the Runtime Framework to pause processing of a Modality Component (e.g. to pause ASR or TTS/Playing). The Modality Component must return with a PauseResponse message.
    </xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="mmi-datatypes.xsd"/>
  <xs:include schemaLocation="mmi-attribs.xsd"/>
  <xs:element name="pauseRequest">
    <xs:complexType>
      <xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
      <xs:attributeGroup ref="mmi:immediate.attrib"/>
      <!-- no elements -->
    </xs:complexType>
  </xs:element>
</xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      PauseResponse schema for MMI Life cycle events version 1.0. The PauseRequest event is sent by the Runtime Framework to pause the processing of the Modality Component (e.g. to pause ASR or TTS/Playing). The Modality Component must return with a PauseResponse message.
    </xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="mmi-datatypes.xsd"/>
  <xs:include schemaLocation="mmi-attribs.xsd"/>
  <xs:include schemaLocation="mmi-elements.xsd"/>
  <xs:element name="pauseResponse">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="mmi:statusInfo" minOccurs="0"/>
      </xs:sequence>
      <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      ResumeRequest schema for MMI Life cycle events version 1.0. The ResumeRequest event is sent by the Runtime Framework to resume a previously suspended processing task of a Modality Component. The Modality Component must return with a ResumeResponse message.
    </xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="mmi-datatypes.xsd"/>
  <xs:include schemaLocation="mmi-attribs.xsd"/>
  <xs:element name="resumeRequest">
    <xs:complexType>
      <xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
      <xs:attributeGroup ref="mmi:immediate.attrib"/>
      <!-- no elements -->
    </xs:complexType>
  </xs:element>
</xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      ResumeResponse schema for MMI Life cycle events version 1.0. The ResumeRequest event is sent by the Runtime Framework to resume a previously suspended processing task of a Modality Component. The Modality Component must return with a ResumeResponse message.
    </xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="mmi-datatypes.xsd"/>
  <xs:include schemaLocation="mmi-attribs.xsd"/>
  <xs:include schemaLocation="mmi-elements.xsd"/>
  <xs:element name="resumeResponse">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="mmi:statusInfo" minOccurs="0"/>
      </xs:sequence>
      <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      ExtensionNotification schema for MMI Life cycle events version 1.0. The extensionNotification event may be generated by either the Runtime Framework or the Modality Component and is used to communicate (presumably changed) data values to the other component. E.g. the VUI-MC has signaled a recognition result for any field displayed on the GUI, the event will be used by the Runtime Framework to send a command to the GUI-MC to update the GUI with the recognized value.
    </xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="mmi-datatypes.xsd"/>
  <xs:include schemaLocation="mmi-attribs.xsd"/>
  <xs:element name="extensionNotification">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="data" type="mmi:anyComplexType"/>
      </xs:sequence>
      <xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
      <xs:attributeGroup ref=""/>
    </xs:complexType>
  </xs:element>
</xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      ClearContextRequest schema for MMI Life cycle events version 1.0
    </xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="mmi-datatypes.xsd"/>
  <xs:include schemaLocation="mmi-attribs.xsd"/>
  <xs:include schemaLocation="mmi-elements.xsd"/>
  <xs:element name="clearContextRequest">
    <xs:complexType>
      <xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      ClearContextResponse schema for MMI Life cycle events version 1.0
    </xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="mmi-datatypes.xsd"/>
  <xs:include schemaLocation="mmi-attribs.xsd"/>
  <xs:include schemaLocation="mmi-elements.xsd"/>
  <xs:element name="clearContextResponse">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="mmi:statusInfo" minOccurs="0"/>
      </xs:sequence>
      <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="" xmlns:xs="" targetNamespace="" attributeFormDefault="qualified" elementFormDefault="qualified">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      StatusRequest schema for MMI Life cycle events version 1.0. The StatusRequest message and the corresponding StatusResponse are intended to provide keep-alive functionality, informing the Runtime Framework about the presence of the various modality components. Note that both messages are not tied to any context and may thus be sent independent of any user interaction.
    </xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="mmi-datatypes.xsd"/>
  <xs:include schemaLocation="mmi-attribs.xsd"/>
  <xs:element name="statusRequest">
    <xs:complexType>
      <xs:attributeGroup ref="mmi:context.optional.attrib"/>
      <xs:attributeGroup ref="mmi:source.attrib"/>
      <xs:attributeGroup ref="mmi:target.attrib"/>
      <xs:attributeGroup ref="mmi:requestID.attrib"/>
      <xs:attributeGroup ref="mmi:requestAutomaticUpdate.attrib"/>
      <!-- no elements -->
    </xs:complexType>
  </xs:element>
</xs:schema>
The following ladder diagram shows a possible message sequence upon session creation. We assume that an Interaction Manager session is already up and running. The user starts a multimodal session, for example by starting a web browser and fetching a given URL.
The initial document contains scripts which provide the modality component functionality (e.g. understanding XML formatted life-cycle events) and message transport capabilities (e.g. AJAX, although this depends on the exact system implementation).
After loading the initial documents (and scripts), the modality component implementation issues an mmi:newContextRequest message to the IM. The IM may load a corresponding markup document, if necessary, and initializes and starts a new session.
In this scenario the Interaction Manager logic issues a number of mmi:startRequest messages to the various modality components. One message is sent to the graphical modality component (GUI) to instruct it to load an HTML document. Another message is sent to a voice modality component (VUI) to play a welcome message.
The voice modality component has (in this example) to create a VoiceXML session. As VoiceXML 2.1 does not provide an external event interface, a CCXML session will be used for external asynchronous communication. Therefore the voice modality component uses the session creation interface of CCXML 1.0 to create a session and start a corresponding script. This script will then make a call to a phone at the user device (which could be a regular phone or a SIP soft phone on the user's device). This scenario illustrates the use of a SIP phone, which may reside on the user's mobile handset.
After successful setup of a CCXML session and the voice connection, the voice modality component instructs the CCXML browser to start a VoiceXML dialog and passes it a corresponding VoiceXML script. The VoiceXML interpreter will execute the script and play out the welcome message. After the execution of the VoiceXML script has finished, the voice modality component notifies the Interaction Manager using the mmi:done event.
The next diagram gives an example of a possible message flow during the processing of user input. In the given scenario the user wants to enter information using the voice modality component. To start the voice input the user has to use the "push-to-talk" button. The "push-to-talk" button (which might be a hardware button or a soft button on the screen) generates a corresponding event when pushed. This event is issued as an mmi:extension event towards the Interaction Manager. The Interaction Manager logic sends an mmi:startRequest to the voice modality component. This mmi:startRequest message contains a URL which points to a corresponding VoiceXML script. The voice modality component again starts a VoiceXML interpreter using the given URL. The VoiceXML interpreter loads the document and executes it. Now the system is ready for the user input. To notify the user about the availability of the voice input functionality, the Interaction Manager might send an event to the GUI upon receiving the mmi:startResponse event (which indicates that the voice modality component has started to execute the document). Note that this is not shown in the picture.
The VoiceXML interpreter captures the user's voice input and uses a speech recognition engine to recognize the utterance. The speech recognition result will be represented as an EMMA document and sent to the Interaction Manager using the mmi:done message. The Interaction Manager logic sends an mmi:extension message to the GUI modality component to instruct it to display the recognition result.
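The input flow described above can be sketched as a small routing function inside the Interaction Manager. This is only an illustration under stated assumptions: the event names mirror the ones used in the text (mmi:extension, mmi:startRequest, mmi:done), while the dictionary representation of events, the `route` function, the "push-to-talk" payload value and the VoiceXML URL are hypothetical and not defined by this specification.

```python
# Minimal sketch of the Interaction Manager logic for the voice-input flow.
# Events are modelled as plain dicts; this shape is an assumption made for
# illustration only.

VOICE_SCRIPT_URL = "http://example.com/input.vxml"  # hypothetical URL


def route(event):
    """Return the list of events the IM sends in reaction to `event`."""
    name, source = event["name"], event["source"]

    # Push-to-talk arrives as an extension event from the GUI:
    # start the voice modality component with a VoiceXML script.
    if name == "extension" and source == "GUI" and event.get("data") == "push-to-talk":
        return [{"name": "startRequest", "target": "VUI",
                 "contentURL": VOICE_SCRIPT_URL}]

    # The voice component reports the recognition result (an EMMA document
    # in the text) with a done event: forward it to the GUI for display.
    if name == "done" and source == "VUI":
        return [{"name": "extension", "target": "GUI",
                 "data": event.get("data")}]

    # Everything else is ignored in this sketch.
    return []
```

A real Interaction Manager would typically express this logic in a state machine or markup such as SCXML; the function above only makes the two reactions in the scenario explicit.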
In the following scenario a modality component instance will be destroyed as a reaction to a user input, e.g. because the user chose to switch to the GUI-only mode. In this case an mmi:clearContextRequest will be issued to the voice modality component. The voice modality component wrapper will then destroy the CCXML (and VoiceXML) session.
The application logic (i.e. the IM) may also decide to indicate the removed voice functionality and disable an icon on the screen which indicates the availability of the voice modality.
The MMI architecture specification describes a set of lifecycle events which define the basic interface between the interaction management and the modality components. The startRequest lifecycle event defines the "content" and "contentURL" elements which may contain markup code (or references to markup code). The markup has to be executed by the modality component. Using the "content" or "contentURL" elements introduces a dependency of the lifecycle event on a specific modality component implementation. In other words, the interaction manager has to issue different startRequests, depending on which markup a GUI modality component may be able to process.
But multimodal applications may want to support different modality component implementations, such as HTML or Flash, for the same application. In this case the interaction manager should be independent of the modality component implementation and hence not generate a markup-specific lifecycle event (e.g. containing a link to HTML or even HTML content), but a further abstracted description of the command.
Furthermore, localization needs to be taken into account. If the interaction manager sends markup code to the modality component (or references to it), this markup code should not contain any dependencies on the user's language. Instead the interaction manager needs to send the locale information to the modality component and let it select the appropriate strings.
Here is an example showing how these two issues could be addressed within the lifecycle events. This example uses a generic data structure to carry the locale information (within the xml:lang attribute) and the data to be visualized at a GUI.
<mmi:mmi xmlns:mmi="" xmlns:xml="" version="1.0">
  <mmi:startRequest mmi:requestID="1.237204761416E12" mmi:context="IM_dcc3c320-9e88-44fe-b91d-02bd02fba1e3" mmi:target="GUI">
    <mmi:contentURL>login</mmi:contentURL>
    <mmi:data>
      <gui resourceid="login" xml:lang="de-DE">
        <data id="back" enabled="false"/>
        <data id="next" enabled="false"/>
      </gui>
    </mmi:data>
  </mmi:startRequest>
</mmi:mmi>
This startRequest carries a generic <gui> structure as its payload, which contains a "resourceid" and the xml:lang information. The "resourceid" has to be interpreted by the modality component (either to load an HTML document or a corresponding dialog, e.g. if it is a Flash application), whereas "xml:lang" is used by the modality component to select the appropriate string tables.
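How a modality component might interpret the <gui> payload described above can be sketched as follows. This is a hedged illustration, not part of the specification: the `STRINGS` tables, the `interpret_gui` function and the fallback locale are hypothetical, and the only facts taken from the text are the "resourceid" and xml:lang attributes and the <data> children with "id" and "enabled".

```python
import xml.etree.ElementTree as ET

# Hypothetical string tables keyed by locale; a real component would load
# these from its own resources.
STRINGS = {
    "de-DE": {"back": "Zurück", "next": "Weiter"},
    "en-US": {"back": "Back", "next": "Next"},
}

# ElementTree exposes xml:lang under the standard XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def interpret_gui(gui_xml, fallback="en-US"):
    """Return (resourceid, [(label, enabled), ...]) for a <gui> payload."""
    gui = ET.fromstring(gui_xml)
    locale = gui.get(XML_LANG, fallback)
    table = STRINGS.get(locale, STRINGS[fallback])
    controls = []
    for data in gui.findall("data"):
        # Fall back to the raw id when no localized string is known.
        label = table.get(data.get("id"), data.get("id"))
        controls.append((label, data.get("enabled") == "true"))
    return gui.get("resourceid"), controls
```

The interaction manager stays free of language-specific strings in this arrangement; only the component-side tables need to change when a locale is added.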
The content of the <gui> structure is an application specific (but generic) description of data to be used by the modality component. This could contain a description of the status of GUI elements (such as "enabled" or "disabled") or a list of items to be displayed. The following example shows a startRequest to display a list of songs. The list of songs is dynamic and will be loaded from a backend system. The representation of the song list is agnostic to the modality component implementation. It is the responsibility of the modality component to interpret the structure and to display its content appropriately.
<mmi:mmi xmlns:mmi="" xmlns:xml="" version="1.0">
  <mmi:startRequest mmi:requestID="1.23720967758E12" mmi:context="IM_dcc3c320-9e88-44fe-b91d-02bd02fba1e3" mmi:target="GUI">
    <mmi:contentURL>songSelection</mmi:contentURL>
    <mmi:data>
      <gui resourceid="songSelection" xml:lang="de-DE">
        <data id="back" enabled="true"/>
        <data id="next" enabled="false"/>
        <data id="titleList" selected="" enabled="true">
          <items>
            <item id="10">
              <arg name="artist"><![CDATA[One artist]]></arg>
              <arg name="title"><![CDATA[This is the title]]></arg>
              <arg name="displayName"><![CDATA[Title]]></arg>
              <arg name="price"><![CDATA[0.90]]></arg>
            </item>
            <item id="11">
              <arg name="artist"><![CDATA[Another artist]]></arg>
              <arg name="title"><![CDATA[Yet another title]]></arg>
              <arg name="displayName"><![CDATA[2nd title]]></arg>
              <arg name="price"><![CDATA[0.90]]></arg>
            </item>
          </items>
        </data>
      </gui>
    </mmi:data>
  </mmi:startRequest>
</mmi:mmi>
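Because the song list is dynamic, an interaction manager would normally generate such a startRequest rather than store it. The following sketch shows one way to do that with Python's standard library. It rests on assumptions: the namespace URN is a made-up placeholder (the real MMI namespace URI is not shown in the examples above), and the `build_song_request` function and the shape of the `songs` records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace, NOT the real MMI namespace URI.
MMI_NS = "urn:example:mmi"
ET.register_namespace("mmi", MMI_NS)
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(name):
    """Qualify a local name with the (placeholder) MMI namespace."""
    return "{%s}%s" % (MMI_NS, name)


def build_song_request(request_id, context, songs, locale):
    """Serialize a startRequest that asks the GUI to show `songs`."""
    root = ET.Element(q("mmi"), {"version": "1.0"})
    start = ET.SubElement(root, q("startRequest"), {
        q("requestID"): request_id,
        q("context"): context,
        q("target"): "GUI",
    })
    ET.SubElement(start, q("contentURL")).text = "songSelection"
    data = ET.SubElement(start, q("data"))
    gui = ET.SubElement(data, "gui",
                        {"resourceid": "songSelection", XML_LANG: locale})
    title_list = ET.SubElement(
        gui, "data", {"id": "titleList", "selected": "", "enabled": "true"})
    items = ET.SubElement(title_list, "items")
    for song in songs:
        item = ET.SubElement(items, "item", {"id": str(song["id"])})
        for key in ("artist", "title", "displayName", "price"):
            # Text is escaped by the serializer instead of using CDATA.
            ET.SubElement(item, "arg", {"name": key}).text = str(song[key])
    return ET.tostring(root, encoding="unicode")
```

ElementTree escapes special characters in text rather than emitting CDATA sections; both forms carry the same character data once parsed.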
The "Multimodal Architecture and Interfaces" specification supports deployments in a variety of topologies, either distributed or co-located. In case of a distributed deployment, a protocol for the lifecycle event transport needs to be defined. HTTP is the major protocol of the web. It is widely adopted, supported by many programming languages, and used in particular by web browsers. Technologies like AJAX provide asynchronous transmission of messages for web browsers and make it possible to build modality components on top of them in distributed environments.
This chapter describes how the HTTP protocol should be used for MMI lifecycle event transport in distributed deployments. Modality components and the Interaction Manager need an HTTP processor to send and receive MMI lifecycle events. The following picture illustrates a possible modularization of the Runtime Framework, the Interaction Manager and the Modality Components. It shows internal lifecycle event interfaces (which abstract from the transport layer) and the HTTP processors. The HTTP processors are responsible for assembling and disassembling HTTP requests, which carry MMI lifecycle event representations as payloads.
The following chapters describe how the HTTP protocol should be used to transport MMI lifecycle events.
HTTP defines the concept of client and server [RFC2616]. One possible deployment of the multimodal architecture is shown in the following figure:
In this deployment scenario the Interaction Manager acts as an HTTP server, whereas modality components are HTTP clients, sending HTTP requests to the Interaction Manager. But other configurations are possible.
The multimodal architecture specification requires an asynchronous bi-directional event transmission. To achieve this (in the given scenario, where modality components are HTTP clients and the Interaction Manager acts as an HTTP server), separate (parallel) HTTP requests (referred to as send and receive channels in the picture) are used to send and receive lifecycle events.
Modality components use HTTP/POST requests to send MMI lifecycle events to the IM. The request contains the following URL request parameters: context (or token), source and target.
The lifecycle event itself is contained in the body of the HTTP/POST request. The Content-Type header field of the HTTP/POST request has to be set according to the lifecycle event format, e.g. “text/xml”.
The URL request parameters context, source and target are equivalent to the respective MMI lifecycle event attributes. The context MUST be used whenever available. The context is unknown to the modality component only during startup of a multimodal session, as the context will be returned from the Interaction Manager to the modality component with the newContextResponse lifecycle event. Hence, when sending a newContextRequest, the context is unknown. Therefore a token is used to associate the newContextRequest with the corresponding newContextResponse.
The token is a unique id (preexisting knowledge, e.g. generated by the modality component during registration) to identify the channel between a modality component and the Interaction Manager.
Once the context is exchanged, the context MUST be used with subsequent requests and the token MUST NOT be used anymore.
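The rule for choosing between context and token can be sketched as a small helper that builds the URL of the send channel. This is an illustrative sketch under assumptions: the base URL and the `post_url` function are hypothetical, and the parameter names source and target are assumed to match the lifecycle event attributes of the same name.

```python
from urllib.parse import urlencode


def post_url(base_url, source, target, context=None, token=None):
    """Build the URL for posting a lifecycle event to the IM."""
    if context is not None:
        # Once the context is known it is used and the token is dropped.
        params = {"context": context}
    elif token is not None:
        # Only before the newContextResponse has delivered the context.
        params = {"token": token}
    else:
        raise ValueError("either context or token is required")
    params["source"] = source
    params["target"] = target
    return base_url + "?" + urlencode(params)
```

A modality component would call this with only a token for its newContextRequest and with the returned context for every later request; the lifecycle event itself travels in the request body.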
The response (to an HTTP/POST request, which carries a lifecycle event from a Modality Component to the Interaction Manager) MUST NOT contain any content and the HTTP response code MUST be “204 No Content”.
The HTTP processor of the Interaction Manager is expected to handle POST requests (which contain lifecycle events sent from the modality component to the Interaction Manager) by using the context (or token) parameter to identify the corresponding Interaction Manager session.
Modality components use HTTP/GET requests to receive lifecycle events from the Interaction Manager; such a request waits for an event for at most a given timeout (in milliseconds). The request contains the following URL request parameters: context (or token), source and timeout (optional).
See the discussion of the parameter context in the previous chapter. The parameter source describes the source of the request, i.e. the modality component's id. The parameter timeout is optional and describes the maximum delay in milliseconds. Only non-negative integer values are allowed for the parameter timeout. A request with timeout set to “0” returns immediately. The Interaction Manager may limit the timeout to a (platform specific) maximum value. If the parameter timeout is absent, the Interaction Manager uses a platform specific default.
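The treatment of the timeout parameter can be illustrated with a short helper. It is a sketch only: the default and maximum values are made-up platform settings, the `effective_timeout` function is hypothetical, and rejecting negative values is an assumption about how an implementation might react to an invalid parameter.

```python
DEFAULT_TIMEOUT_MS = 30000  # hypothetical platform specific default
MAX_TIMEOUT_MS = 60000      # hypothetical platform specific maximum


def effective_timeout(raw):
    """Map the optional `timeout` URL parameter to a delay in milliseconds."""
    if raw is None:
        # Parameter absent: use the platform default.
        return DEFAULT_TIMEOUT_MS
    value = int(raw)  # raises ValueError for non-integer input
    if value < 0:
        raise ValueError("timeout must not be negative")
    # The Interaction Manager may cap the requested delay.
    return min(value, MAX_TIMEOUT_MS)
```

A value of 0 passes through unchanged, which corresponds to a request that returns immediately.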
The HTTP response body contains the lifecycle event as a string. The HTTP response header MUST contain the Content-Type header field, which describes the format of the lifecycle event string (e.g. “text/xml”).
The HTTP processor of the Interaction Manager is expected to handle HTTP/GET requests (which are used by the Modality Component to receive lifecycle events) as follows: use the context (or token) parameter to identify the corresponding Interaction Manager session; use the source parameter to identify the modality component id; return the lifecycle event in the response (setting the Content-Type header field appropriately). Use the "200 OK" HTTP status code in case an event is contained in the response, “204 No Content” in case of timeout, or 4XX/5XX codes in case of failure (see error handling section below).
The following figure shows a sequence of HTTP requests:
If the IM receives an HTTP/GET request containing an invalid token or context, it MUST return a 409 (Conflict) response code.
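The status codes for the receive channel can be put together in a framework-independent sketch. Everything structural here is an assumption made for illustration: the `handle_get` function, the `sessions` mapping of context or token to per-component queues, and the choice of 400 for an unknown component are not defined by the text, which only fixes 200, 204 and 409.

```python
import queue


def handle_get(sessions, params, timeout_s):
    """Return (status, headers, body) for a receive-channel GET request.

    `sessions` maps a context or token to a dict of per-component
    queue.Queue objects holding serialized lifecycle events.
    """
    key = params.get("context") or params.get("token")
    session = sessions.get(key)
    if session is None:
        return 409, {}, ""              # invalid token or context
    events = session.get(params.get("source"))
    if events is None:
        return 400, {}, ""              # unknown component (assumed code)
    try:
        if timeout_s > 0:
            body = events.get(timeout=timeout_s)  # wait for an event
        else:
            body = events.get_nowait()            # timeout 0: no waiting
    except queue.Empty:
        return 204, {}, ""              # timeout, nothing to deliver
    return 200, {"Content-Type": "text/xml"}, body
```

The modality component would typically reissue the GET request as soon as a response arrives, so that a receive channel is always open.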
For modality components which are HTTP servers themselves, the Interaction Manager needs to send a lifecycle event through an HTTP/POST request. The request contains request parameters, among them target, which is equivalent to the corresponding MMI lifecycle event attribute and describes the receiver of the event. Hence, the receiver of the HTTP request uses this parameter to identify the corresponding modality component.
Various MMI lifecycle events (especially response events) contain Status and StatusInfo fields. These fields should be used for error indication whenever possible. However, a failure during delivery of a lifecycle event needs to be indicated using HTTP response codes.
The HTTP processor of the Interaction Manager has to use HTTP response codes to indicate success or errors during request handling. In case of successful processing of a request (successful in terms of transport, i.e. an event has been successfully delivered), a 2XX status code (e.g. "204 No Content") has to be returned. Transport related errors, which lead to failure in delivery of a lifecycle event, are indicated using 4XX or 5XX response codes. 4XX error codes refer to "client errors" (wrong parameters etc.), whereas 5XX error codes indicate server errors (see also HTTP response codes in [RFC2616]).
The treatment of transport errors is up to the implementation, but the implementation should make errors visible to author code (e.g. raise an event within the Interaction Manager when a lifecycle event has not been successfully delivered to a Modality Component).
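One possible way to make transport errors visible to author code is sketched below. The function names, the outcome labels and the callback signature are hypothetical; only the split into 2XX, 4XX and 5XX ranges is taken from the text.

```python
def classify_transport_status(status):
    """Classify the HTTP status code returned for a lifecycle event delivery."""
    if 200 <= status < 300:
        return "delivered"
    if 400 <= status < 500:
        return "client-error"   # e.g. wrong parameters
    if 500 <= status < 600:
        return "server-error"
    return "unexpected"


def deliver(send, event, on_error):
    """Send `event` via `send` (which returns a status code) and report failures."""
    outcome = classify_transport_status(send(event))
    if outcome != "delivered":
        # Surface the failed delivery to author code.
        on_error(event, outcome)
    return outcome
```

Here `send` stands for whatever HTTP client call the implementation uses, and `on_error` for the mechanism that raises an event inside the Interaction Manager.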
Modality components can be classified into one of three categories: simple, complex or nested.
A simple modality component presents information to a user or captures information from a user as directed by an interaction manager. A simple modality component is atomic in that it cannot be partitioned into two or more simple modality components that send events among themselves. A simple modality component is like a black box in that the interaction manager cannot directly access any function inside of the black box other than by using life-cycle events.
A simple modality component might contain functionality to present one of the following types of information to the user or user agent. For example:
A simple modality component might contain functionality to capture one of the following types of information from the user or user agent as directed by a complex modality or interaction manager:
Figure 1 illustrates two simple modality components—ASR modality for capturing input from the user and TTS for presenting output to the user. Note that all information exchanged between the two modality components must be sent as life-cycle events to the interaction manager which forwards them to the other modality component.
A complex modality component may contain functionality of two or more simple modality components, for example:
Figure 2 illustrates a complex modality component containing two functions, ASR and TTS. The ASR and TTS functions within the complex modality component may communicate directly with each other, in addition to sending and receiving life-cycle events with the interaction manager.
A nested modality component is a set of modality components and a script (possibly written in SCXML) that manages them. The script communicates with the child modality components using life-cycle events. The script communicates with the interaction manager using only life-cycle events. The child modality components may not communicate directly with each other.
Figure 3 illustrates a nested modality component with two child modality components, ASR and TTS.
In effect, the script within a nested modality component can be thought of as an interaction manager that manages the child modality components, so a nested modality component is a nested interaction manager. This is the so-called "Russian Doll" model of nested interaction managers.
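The "Russian Doll" idea can be made concrete with a small object model. This is a sketch under assumptions: the `Leaf` and `Nested` classes, the dictionary event shape and the `send_to_im` callback are hypothetical, and the managing script is reduced to a simple relay.

```python
class Leaf:
    """A child modality component that records the events it receives."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def on_event(self, event):
        self.received.append(event)


class Nested:
    """A nested modality component: a script managing child components."""

    def __init__(self, name, children, send_to_im):
        self.name = name
        self.children = {c.name: c for c in children}
        self.send_to_im = send_to_im

    def on_event(self, event):
        # Event from the outer interaction manager: hand it to a child.
        self.children[event["target"]].on_event(event)

    def from_child(self, child_name, event):
        # Children never talk to each other directly; the script relays.
        target = event.get("target")
        if target in self.children:
            self.children[target].on_event(dict(event, source=child_name))
        else:
            # Anything else goes up to the outer interaction manager.
            self.send_to_im(dict(event, source=self.name))
```

Because `Nested` exposes the same `on_event` entry point as `Leaf`, a nested component can itself be a child of another nested component, which is the nesting the model describes.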