Copyright © 2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document describes a loosely coupled architecture for multimodal user interfaces, which allows for co-resident and distributed implementations, and focuses on the role of markup and scripting, and the use of well defined interfaces between its constituents.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the 16 October 2008 Working Draft of "Multimodal Architecture and Interfaces". The main difference from the previous draft is the addition of rules and guidelines that will allow modality experts to describe the features, capabilities and APIs of specific modality components in sufficient detail for the components to be interoperable in implementations of the Multimodal Architecture. The modality components themselves will be defined by modality experts according to the guidelines. For example, voice modality components might be defined by the W3C Voice Browser Working Group. Those rules and guidelines are available in Appendix F Rules and Best Practices for Creating a MMI Modality Component. The event schemas defined in Appendix B have also been updated because two undefined life-cycle events, "createRequest" and "createResponse", were wrongly referenced from mmi.xsd, PrepareResponse.xsd and StartResponse.xsd. Such events were once considered, but they turned out not to be needed.
This document is the fifth Public Working Draft for review by W3C Members and other interested parties, and has been developed by the Multimodal Interaction Working Group of the W3C Multimodal Interaction Activity.
Comments on this specification are welcome and should have a subject starting with the prefix '[ARCH]'. Please send them to www-multimodal@w3.org, the public email list for issues related to Multimodal. This list is archived and acceptance of this archiving policy is requested automatically upon first post. To subscribe to this list, send an email to www-multimodal-request@w3.org with the word subscribe in the subject line.
For more information about the Multimodal Interaction Activity, please see the Multimodal Interaction Activity statement.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
1 Abstract
2 Overview
3 Design versus Run-Time considerations
3.1 Markup and The Design-Time View
3.2 Software Constituents and The Run-Time View
3.3 Relationship to Compound Document Formats
4 Overview of Constituents
4.1 Run-Time Architecture Diagram
5 The Constituents
5.1 The Runtime Framework
5.1.1 The Interaction Manager
5.1.2 The Delivery Context Component
5.1.3 The Data Component
5.2 Modality Components
5.3 Examples
6 Interface between the Runtime Framework and the Modality Components
6.1 Event Delivery Mechanism
6.1.1 Event and Information Security
6.1.2 Multiple Protocols
6.1.3 System and OS Security
6.2 Standard Life Cycle Events
6.2.1 NewContextRequest
6.2.1.1 NewContextRequest Properties
6.2.2 NewContextResponse
6.2.2.1 NewContextResponse Properties
6.2.3 PrepareRequest
6.2.3.1 PrepareRequest Properties
6.2.4 PrepareResponse
6.2.4.1 PrepareResponse Properties
6.2.5 StartRequest
6.2.5.1 StartRequest Properties
6.2.6 StartResponse
6.2.6.1 StartResponse Properties
6.2.7 DoneNotification
6.2.7.1 DoneNotification Properties
6.2.8 CancelRequest
6.2.8.1 CancelRequest Properties
6.2.9 CancelResponse
6.2.9.1 CancelResponse Properties
6.2.10 PauseRequest
6.2.10.1 PauseRequest Properties
6.2.11 PauseResponse
6.2.11.1 PauseResponse Properties
6.2.12 ResumeRequest
6.2.12.1 ResumeRequest Properties
6.2.13 ResumeResponse
6.2.13.1 ResumeResponse Properties
6.2.14 ExtensionNotification
6.2.14.1 ExtensionNotification Properties
6.2.15 ClearContextRequest
6.2.15.1 ClearContextRequest Properties
6.2.16 ClearContextResponse
6.2.16.1 ClearContextResponse Properties
6.2.17 StatusRequest
6.2.17.1 StatusRequest Properties
6.2.18 StatusResponse
6.2.18.1 StatusResponse Properties
7 Open Issues
A Examples of Life-Cycle Events
B Event Schemas
C Ladder Diagrams
C.1 Creating a Session
C.2 Processing User Input
C.3 Ending a Session
D Glossary
E Use Case Discussion
F Rules and Best Practices for Creating a MMI Modality Component
G Acknowledgements
H References
This document describes a loosely coupled architecture for multimodal user interfaces, which allows for co-resident and distributed implementations, and focuses on the role of markup and scripting, and the use of well defined interfaces between its constituents.
This document describes the architecture of the Multimodal Interaction (MMI) framework [MMIF] and the interfaces between its constituents. The MMI Working Group is aware that multimodal interfaces are an area of active research and that commercial implementations are only beginning to emerge. Therefore we do not view our goal as standardizing a hypothetical existing common practice, but rather providing a platform to facilitate innovation and technical development. Thus the aim of this design is to provide a general and flexible framework providing interoperability among modality-specific components from different vendors - for example, speech recognition from one vendor and handwriting recognition from another. This framework places very few restrictions on the individual components or on their interactions with each other, but instead focuses on providing a general means for allowing them to communicate with each other, plus basic infrastructure for application control and platform services.
Our framework is motivated by several basic design goals:
Even though multimodal interfaces are not yet common, the software industry as a whole has considerable experience with architectures that can accomplish these goals. Since the 1980s, for example, distributed message-based systems have been common. They have been used for a wide range of tasks, including in particular high-end telephony systems. In this paradigm, the overall system is divided up into individual components which communicate by sending messages over the network. Since the messages are the only means of communication, the internals of components are hidden and the system may be deployed in a variety of topologies, either distributed or co-located. One specific instance of this type of system is the DARPA Hub Architecture, also known as the Galaxy Communicator Software Infrastructure [Galaxy]. This is a distributed, message-based, hub-and-spoke infrastructure designed for constructing spoken dialogue systems. It was developed in the late 1990's and early 2000's under funding from DARPA. This infrastructure includes a program called the Hub, together with servers which provide functions such as speech recognition, natural language processing, and dialogue management. The servers communicate with the Hub and with each other using key-value structures called frames.
Another recent architecture that is relevant to our concerns is the model-view-controller (MVC) paradigm. This is a well known design pattern for user interfaces in object oriented programming languages, and has been widely used with languages such as Java, Smalltalk, C, and C++. The design pattern proposes three main parts: a Data Model that represents the underlying logical structure of the data and associated integrity constraints, one or more Views which correspond to the objects that the user directly interacts with, and a Controller which sits between the data model and the views. The separation between data and user interface provides considerable flexibility in how the data is presented and how the user interacts with that data. While the MVC paradigm has been traditionally applied to graphical user interfaces, it lends itself to the broader context of multimodal interaction where the user is able to use a combination of visual, aural and tactile modalities.
In discussing the design of MMI systems, it is important to keep in mind the distinction between the design-time view (i.e., the markup) and the run-time view (the software that executes the markup). At the design level, we assume that multimodal applications will take the form of multiple documents from different namespaces. In many cases, the different namespaces and markup languages will correspond to different modalities, but we do not require this. A single language may cover multiple modalities and there may be multiple languages for a single modality.
At runtime, the MMI architecture features loosely coupled software constituents that may be either co-resident on a device or distributed across a network. In keeping with the loosely-coupled nature of the architecture, the constituents do not share context and communicate only by exchanging events. The nature of these constituents and the APIs between them is discussed in more detail in Sections 3-5, below. Though nothing in the MMI architecture requires that there be any particular correspondence between the design-time and run-time views, in many cases there will be a specific software component responsible for each different markup language (namespace).
At the markup level, an application consists of multiple documents. A single document may contain markup from different namespaces if the interaction of those namespaces has been defined (e.g., as part of the Compound Document Formats Activity [CDF]). By the principle of encapsulation, however, the internal structure of documents is invisible at the MMI level, which defines only how the different documents communicate. One document has a special status, namely the Root or Controller Document, which contains markup defining the interaction between the other documents. Such markup is called Interaction Manager markup. The other documents are called Presentation Documents, since they contain markup to interact directly with the user. The Controller Document may consist solely of Interaction Manager markup (for example a state machine defined in CCXML [CCXML] or SCXML [SCXML]) or it may contain Interaction Manager markup combined with presentation or other markup. As an example of the latter design, consider a multimodal application in which a CCXML document provides call control functionality as well as the flow control for the various Presentation documents. Similarly, an SCXML flow control document could contain embedded presentation markup in addition to its native Interaction Management markup.
These relationships are recursive, so that any Presentation Document may serve as the Controller Document for another set of documents. This nested structure is similar to the 'Russian Doll' model of Modality Components, described below in 3.2 Software Constituents and The Run-Time View.
The different documents are loosely coupled and co-exist without interacting directly. Note in particular that there are no shared variables that could be used to pass information between them. Instead, all runtime communication is handled by events, as described below in 6.2 Standard Life Cycle Events.
Furthermore, it is important to note that the asynchronicity of the underlying communication mechanism does not impose the requirement that the markup languages present a purely asynchronous programming model to the developer. Given the principle of encapsulation, markup languages are not required to reflect directly the architecture and APIs defined here. As an example, consider an implementation containing a Modality Component providing Text-to-Speech (TTS) functionality. This Component must communicate with the Runtime Framework via asynchronous events (see 3.2 Software Constituents and The Run-Time View). In a typical implementation, there would likely be events to start a TTS play and to report the end of the play, etc. However, the markup and scripts that were used to author this system might well offer only a synchronous "play TTS" call, it being the job of the underlying implementation to convert that synchronous call into the appropriate sequence of asynchronous events. In fact, there is no requirement that the TTS resource be individually accessible at all. It would be quite possible for the markup to present only a single "play TTS and do speech recognition" call, which the underlying implementation would realize as a series of asynchronous events involving multiple Components.
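As an illustration only, the following Python sketch shows how an implementation might hide asynchronous events behind a synchronous "play TTS" call. The class name, the event names ("start-play", "play-done") and the queue-based delivery are all assumptions made for this example; this specification does not define them.

```python
import queue
import threading


class TtsComponent:
    """Hypothetical Modality Component that only exchanges asynchronous events."""

    def __init__(self):
        self.outbox = queue.Queue()  # events raised toward the Runtime Framework

    def handle(self, event):
        # A real component would render audio; here playback finishes at once.
        if event["name"] == "start-play":
            worker = threading.Thread(target=self._play, args=(event["text"],))
            worker.start()

    def _play(self, text):
        # Report the end of the play as a separate, asynchronous event.
        self.outbox.put({"name": "play-done", "text": text})


def play_tts(component, text, timeout=5.0):
    """Synchronous facade that an authoring language might expose."""
    component.handle({"name": "start-play", "text": text})
    done = component.outbox.get(timeout=timeout)  # block until the end event
    return done["name"] == "play-done" and done["text"] == text


finished = play_tts(TtsComponent(), "Welcome")
```

The author sees a single blocking call, while the component itself still sees only a start event in and a completion event out.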
Existing languages such as XHTML may be used either as Controller Documents or as Presentation Documents. Further examples of potential markup components are given in 5.3 Examples.
At the core of the MMI runtime architecture is the distinction between the Runtime Framework and the Modality Components, which is similar to the distinction between the Controller Document and the Presentation Documents. The Runtime Framework interprets the Controller Document and provides the basic infrastructure which the various Modality Components plug into. Individual Modality Components are responsible for specific tasks, particularly handling input and output in the various modalities, such as speech, pen, video, etc. Modality Components are black boxes, required only to implement the Modality Component Interface API which is described below. This API allows the Modality Components to communicate with the Framework and hence with each other, since the Framework is responsible for delivering events/messages among the Components.
Since the internals of a Component are hidden, it is possible for a Runtime Framework and a set of Components to present themselves as a Component to a higher-level Framework. All that is required is that the Framework implement the Component API. The result is a "Russian Doll" model in which Components may be nested inside other Components to an arbitrary depth. Nesting components in this manner is one way to produce a 'complex' Modality Component, namely one that handles multiple modalities simultaneously. However, it is also possible to produce complex Modality Components without nesting, as discussed in 5.2 Modality Components.
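The nesting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the single-method `handle` interface stands in for the Modality Component API, and the forwarding policy inside the nested framework is invented for the example.

```python
class Component:
    """Hypothetical stand-in for the Modality Component API: accept an event."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def handle(self, event):
        self.received.append(event)


class Framework(Component):
    """A framework that implements the Component API, so it can be nested."""

    def __init__(self, name, children):
        super().__init__(name)
        self.children = children

    def handle(self, event):
        super().handle(event)
        # How events are delivered internally is hidden from the outer
        # framework; this sketch simply forwards to every child.
        for child in self.children:
            child.handle(event)


speech = Component("speech")
pen = Component("pen")
inner = Framework("speech-and-pen", [speech, pen])  # looks like one Component
outer = Framework("top", [inner, Component("gui")])
outer.handle("start")
```

The outer framework treats `inner` exactly like any other Component and never addresses `speech` or `pen` directly.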
The Runtime Framework is itself divided up into sub-components. One important sub-component is the Interaction Manager (IM), which executes the Interaction Manager markup. The IM receives all the events that the various Modality Components generate. Those events may be commands or replies to commands, and it is up to the Interaction Manager to decide what to do with them, i.e., what events to generate in response to them. In general, the MMI architecture follows a 'targetless' event model. That is, the Component that raises an event does not specify its destination. Rather, it passes it up to the Runtime Framework, which will pass it to the Interaction Manager. The IM, in turn, decides whether to forward the event to other Components, or to generate a different event, etc. The other sub-components of the Runtime Framework are the Delivery Context Component, which provides information about device capabilities and user preferences, and the Data Component, which stores the Data Model for the application. We do not currently specify the interfaces for the IM and the Data Component, so they represent only the logical structure of the functionality that the Runtime Framework provides. The interface to the Delivery Context Component is specified in [DCCI].
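A minimal Python sketch of the 'targetless' event model follows. The routing table, the event names and the method names are hypothetical; in a real system the Interaction Manager markup would express this logic.

```python
class RuntimeFramework:
    """Sketch of targetless delivery: the raiser never names a destination."""

    def __init__(self, routes):
        # Stand-in for Interaction Manager logic: raised event name mapped to
        # a list of (target component, event to send) pairs.
        self.routes = routes
        self.inboxes = {}

    def register(self, name):
        self.inboxes[name] = []

    def raise_event(self, source, event):
        # Only the IM logic decides who receives anything, and what they get.
        for target, out_event in self.routes.get(event, []):
            self.inboxes[target].append(out_event)


framework = RuntimeFramework({"speech-result": [("gui", "update-field")]})
framework.register("speech")
framework.register("gui")
framework.raise_event("speech", "speech-result")
```

Note that the speech component supplies only the event; the decision to turn it into an "update-field" event for the GUI component is made centrally.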
Because we are using the term 'Component' to refer to a specific set of entities in our architecture, we will use the term 'Constituent' as a cover term for all the elements in our architecture which might normally be called 'software components'.
The W3C Compound Document Formats Activity [CDF] is also concerned with the execution of user interfaces written in multiple languages. However, the CDF group focuses on defining the interactions of specific sets of languages within a single document, which may be defined by inclusion or by reference. The MMI architecture, on the other hand, defines the interaction of arbitrary sets of languages in multiple documents. From the MMI point of view, mixed markup documents defined by CDF specifications are treated like any other documents, and may be either Controller or Presentation Documents. Finally, note that the tightly coupled languages handled by CDF will usually share data and scripting contexts, while the MMI architecture focuses on a looser coupling, without shared context. The lack of shared context makes it easier to distribute applications across a network and also places minimal constraints on the languages in the various documents. As a result, authors will have the option of building multimodal applications in a wide variety of languages for a wide variety of deployment scenarios. We believe that this flexibility is important for the further development of the industry.
Here is a list of the Constituents of the MMI architecture. They are discussed in more detail in the next section.
The Runtime Framework is responsible for starting the application and interpreting the Controller Document. More specifically, the Runtime Framework must:
The need for mapping between synchronous and asynchronous APIs can be seen by considering the case where a Modality Component wants to query the Delivery Context Interface [DCCI]. The DCCI API provides synchronous access to property values whereas the Modality Component API, presented below in 6.2 Standard Life Cycle Events, is purely asynchronous and event-based. The Modality Component will therefore generate an event requesting the value of a certain property. The DCCI cannot handle this event directly, so the Runtime Framework must catch the event, make the corresponding function call into the DCCI API, and then generate a response event back to the Modality Component. Note that even though it is globally the Runtime Framework's responsibility to do this mapping, most of the Runtime Framework's behavior is asynchronous. It may therefore make sense to factor out the mapping into a separate Adapter, allowing the Runtime Framework proper to have a fully asynchronous architecture. For the moment, we will leave this as an implementation decision, but we may make the Adapter a formal part of the architecture at a later date.
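The Adapter idea can be sketched in Python as follows. This is an assumption-laden illustration: `SyncPropertyStore` is a stand-in for a synchronous property API and is not the real DCCI interface, and the event and field names ("property-request", "property-response", "requestID") are invented for the example.

```python
class SyncPropertyStore:
    """Stand-in for a synchronous property API (not the real DCCI interface)."""

    def __init__(self, values):
        self._values = values

    def get_property(self, name):
        return self._values[name]


class Adapter:
    """Maps property-request events to synchronous calls and response events."""

    def __init__(self, store):
        self.store = store

    def on_event(self, event):
        if event["name"] != "property-request":
            return None  # not a property query; left to the rest of the framework
        try:
            value = self.store.get_property(event["property"])
            status = "success"
        except KeyError:
            value, status = None, "failure"
        # The result travels back to the Modality Component as another event.
        return {
            "name": "property-response",
            "requestID": event["requestID"],
            "status": status,
            "value": value,
        }


adapter = Adapter(SyncPropertyStore({"screen-brightness": 70}))
response = adapter.on_event(
    {"name": "property-request", "requestID": "r1", "property": "screen-brightness"}
)
```

Keeping the synchronous call inside the Adapter leaves the rest of the framework free to remain purely event-driven.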
The Runtime Framework's main purpose is to provide the infrastructure, rather than to interact with the user. Thus it implements the basic event loop, which the Components use to communicate with one another, but is not expected to handle by itself any events other than life-cycle events. However, if the Controller Document markup section of the application provides presentation markup as well as Interaction Management, the Runtime Framework will execute it just as the Modality Components do. Note, however, that the execution of such presentation markup is internal to the Runtime Framework and need not rely on the Modality Component API.
The Interaction Manager (IM) is the sub-component of the Runtime Framework that is responsible for handling all events that the other Components generate. Normally there will be specific markup associated with the IM instructing it how to respond to events. This markup will thus contain a lot of the most basic interaction logic of an application. Existing languages such as SMIL, CCXML, SCXML, or ECMAScript can be used for IM markup as an alternative to defining special-purpose languages aimed specifically at multimodal applications. In a future draft of this specification, we may define the interface between the IM and the Runtime Framework, with the goal of making it easy to plug in different IM languages into a given Framework. However, the current draft does not specify such an API so that the Runtime Framework and IM appear as a single unit to the Modality Components.
The IM fulfills multiple functions. For example, it is responsible for synchronization of data and focus, etc., across different Modality Components as well as the higher-level application flow that is independent of Modality Components. It also maintains the high-level application data model and may handle communication with external entities and back-end systems. In the future we may split these functions apart and define different components for each of them. However, for the moment, we leave them rolled up in a single monolithic Interaction Manager component. We note that state machine languages such as SCXML are a good choice for authoring such a multi-function component, since state machines can be composed. Thus it is possible to define a high-level state machine representing the overall application flow, with lower-level state machines nested inside it handling the cross-modality synchronization at each phase of the higher-level flow.
Due to the Russian Doll model, Components may contain their own Interaction Managers to handle their internal events. However, these Interaction Managers are not visible to the top-level Runtime Framework or Interaction Manager.
If the Interaction Manager does not contain an explicit handler for an event, any default behavior that has been established for the event will be respected. If there is no default behavior, the event will be ignored. (In effect, the Interaction Manager's default handler for all events is to ignore them.)
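The fallback order described above (explicit handler, then default behavior, then ignore) can be sketched in Python. The function and the handler tables are hypothetical and serve only to illustrate the rule.

```python
def dispatch(event, explicit_handlers, default_handlers):
    """Return the handler's result, or None when the event is ignored."""
    if event in explicit_handlers:
        return explicit_handlers[event](event)
    if event in default_handlers:
        return default_handlers[event](event)
    return None  # no explicit handler and no default behavior: ignore


# Hypothetical handler tables for one application.
explicit = {"speech-result": lambda event: "fill form"}
defaults = {"error": lambda event: "log error"}

handled = dispatch("speech-result", explicit, defaults)
fallback = dispatch("error", explicit, defaults)
ignored = dispatch("background-noise", explicit, defaults)
```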
The Delivery Context [DCCI] is intended to provide a platform-abstraction layer enabling dynamic adaptation to user preferences, environmental conditions, device configuration and capabilities. It allows Constituents and applications to:
Note that some device properties, such as screen brightness, are run-time settable, while others, such as whether there is a screen, are not. The term 'property' is also used for characteristics that may be more properly thought of as user preferences, such as preferred output modality or default speaking volume.
The Data Component is a sub-component of the Runtime Framework which is responsible for storing application-level data. The Interaction Manager must be able to access and update the Data Component as part of its control flow logic, but Modality Components do not have direct access to it. Since Modality Components are black boxes, they may have their own internal Data Components and may interact directly with backend servers. However, the only way that Modality Components can share data among themselves and maintain consistency is via the Interaction Manager. It is therefore good application design practice to divide data into two logical classes: private data, which is of interest only to a given Modality Component, and public data, which is of interest to the Interaction Manager or to more than one Modality Component. Private data may be managed as the Modality Component sees fit, but all modification of public data, including submission to back end servers, should be entrusted to the Interaction Manager.
For the initial version of this specification, we will not specify a data access language, but will assume that the Interaction Manager language provides sufficient data access capabilities, including submission to back end servers. However, at some point in the future, we may require support for a specific data access language, independent of the Interaction Manager.
Modality Components, as their name would indicate, are responsible for controlling the various input and output modalities on the device. They are therefore responsible for handling all interaction with the user(s). Their only responsibility is to implement the interface defined in 6 Interface between the Runtime Framework and the Modality Components. Any further definition of their responsibilities must be highly domain- and application-specific. In particular we do not define a set of standard modalities or the events that they should generate or handle. Platform providers are allowed to define new Modality Components and are allowed to place into a single Component functionality that might logically seem to belong to two different modalities. Thus a platform could provide a handwriting-and-speech Modality Component that would accept simultaneous voice and pen input. Such combined Components permit a much tighter coupling between the two modalities than the loose interface defined here. Furthermore, modality components may be used to perform general processing functions not directly associated with any specific interface modality, for example, dialog flow control or natural language processing.
In most cases, there will be specific markup in the application corresponding to a given modality, specifying how the interaction with the user should be carried out. However, we do not require this and specifically allow for a markup-free modality component whose behavior is hard-coded into its software.
For the sake of concreteness, here are some examples of components that could be implemented using existing languages. Note that we are mixing the design-time and run-time views here, since it is the implementation of the language (the browser) that serves as the run-time component.
The most important interface in this architecture is the one between the Modality Components and the Runtime Framework. Modality Components communicate with the Framework via asynchronous events. Components must be able to raise events and to handle events that are delivered to them asynchronously. It is not required that components use these events internally since the implementation of a given Component is a black box to the rest of the system. In general, it is expected that Components will raise events both automatically (i.e., as part of their implementation) and under mark-up control. The disposition of events is the responsibility of the Runtime Framework layer. That is, the Component that raises an event does not specify which Component it should be delivered to or even whether it should be delivered to any Component at all. Rather, that determination is left up to the Framework and Interaction Manager.
We do not currently specify the mechanism used to deliver events between the Modality Components and the Runtime Framework, but we may do so in the future. We do place the following requirements on it:
Events will often carry sensitive information, such as bank account numbers or health care information. In addition, events must be reliable to both sides of a transaction: for example, if an event carries an assent to a financial transaction, both sides of the transaction must be able to rely on that assent.
We do not currently specify delivery mechanisms or internal security safeguards used by the Modality Components and the Runtime Framework. However, we believe that any secure system will have to meet the following requirements at a minimum:
The following two optional requirements can be met by using the W3C's XML-Signature Syntax and Processing specification [XMLSig].
The remaining optional requirements for event delivery and information security can be met by following other industry-standard procedures.
The Multimodal Architecture defines the following basic life-cycle events which must be supported by all modality components. These events allow the Runtime Framework to invoke modality components and receive results from them. They thus form the basic interface between the Runtime Framework and the Modality components. Note that the 'Extension' event offers extensibility since it contains arbitrary XML content and may be raised by either the Runtime Framework or the Modality Components at any time once the context has been established. For example, an application relying on speech recognition could use the 'Extension' event to communicate recognition results or the fact that speech had started, etc.
The concept of 'context' is basic to the events described below. A context represents a single extended interaction with one (or possibly more) users. In a simple unimodal case, a context can be as simple as a phone call or SSL session. Multimodal cases are more complex, however, since the various modalities may not all be used at the same time. For example, in a voice-plus-web interaction, e.g., web sharing with an associated VoIP call, it would be possible to terminate the web sharing and continue the voice call, or to drop the voice call and continue via web chat. In these cases, a single context persists across various modality configurations. In general, we intend for 'context' to cover the longest period of interaction over which it would make sense for components to store state or information.
For examples of the concrete XML syntax for all these events, see A Examples of Life-Cycle Events.
Optional event that a Modality Component may send to the Runtime Framework to request that a new context be created. If this event is sent, the Runtime Framework must respond with the NewContextResponse event.
Sent by the Runtime Framework in response to the NewContextRequest message.
RequestID. Matches the RequestID in the NewContextRequest event.
Status. An enumeration of Success or Failure. If the value is Success, the NewContextRequest has been accepted and a new context identifier will be included (see below). If the value is Failure, no context identifier will be included and further information will be included in the StatusInfo field.
Context. A URI identifying the new context. Empty if status is Failure.
Media. One or more valid media types indicating the media to be associated with the context. Note that these do not have to be identical to the ones contained in the NewContextRequest.
StatusInfo. If status equals Failure, this field holds further information.
Data. Optional additional data.
An optional event that the Runtime Framework may send to allow the Modality Components to pre-load markup and prepare to run. Modality Components are not required to take any particular action in response to this event, but they must return a PrepareResponse event.
Context. A unique URI designating this context. Note that the Runtime Framework may re-use the same context value in successive calls to Start if they are all within the same session/call.
ContentURL. Optional URL of the content that the Modality Component should execute. Includes standard HTTP fetch parameters such as max-age, max-stale, fetchtimeout, etc. Incompatible with content.
Content. Optional inline markup for the Modality Component to execute. Incompatible with contentURL. Note that it is legal for both contentURL and content to be empty. In such a case, the Modality Component will revert to its default hard-coded behavior, which could consist of returning an error event or of running a preconfigured or hard-coded script.
Data. Optional additional data.
A given component may only execute a single StartRequest at one time (see 6.2.5 StartRequest). However, the Interaction Manager may send multiple PrepareRequest events to a Modality Component for the same Context, each referencing a different ContentURL or containing different in-line Content, before sending a StartRequest. In this case, the Modality Component should prepare to run any of the specified content. The subsequent StartRequest event will determine which specific content the Modality Component should execute.
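The following Python sketch illustrates one way a Modality Component might handle several PrepareRequest events followed by a StartRequest. It is not a normative implementation: the class, its method names, the dictionary shape of the response events and the example URLs are assumptions, and the decision to answer a request carrying both ContentURL and Content with Failure is one possible reading of "incompatible".

```python
class ModalityComponent:
    """Hypothetical component that remembers everything it was asked to prepare."""

    def __init__(self):
        self.prepared = set()  # (context, content reference) pairs ready to run
        self.running = None

    def on_prepare_request(self, context, content_url=None, content=None):
        if content_url and content:
            # ContentURL and Content are incompatible; rejecting is an assumption.
            return {"event": "PrepareResponse", "status": "Failure"}
        if content_url or content:
            self.prepared.add((context, content_url or content))
        return {"event": "PrepareResponse", "status": "Success"}

    def on_start_request(self, context, content_url=None, content=None):
        # The StartRequest selects which content is actually executed.
        self.running = (context, content_url or content)
        return {"event": "StartResponse", "status": "Success"}


mc = ModalityComponent()
mc.on_prepare_request("ctx-1", content_url="http://example.com/a.vxml")
mc.on_prepare_request("ctx-1", content_url="http://example.com/b.vxml")
start = mc.on_start_request("ctx-1", content_url="http://example.com/b.vxml")
```

Both documents remain prepared, so a later StartRequest for the other URL could also begin with little delay.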
Sent by the Modality Component in response to the PrepareRequest event. Modality Components that return a PrepareResponse event with a Status of 'Success' should be ready to run with close to zero delay upon receipt of the StartRequest event.
The Runtime Framework sends this event to invoke a Modality Component. The Modality Component must return a StartResponse event in response. If the Runtime Framework has sent a previous PrepareRequest event, it may leave the contentURL and content fields empty, and the Modality Component will use the values from the PrepareRequest event. If the Runtime Framework includes new values for these fields, the values in the StartRequest event override those in the PrepareRequest event.
Context. A unique URI designating this context. Note that the Runtime Framework may re-use the same context value in successive calls to Start if they are all within the same session/call.
ContentURL. Optional URL of the content that the Modality Component should execute. Includes standard HTTP fetch parameters such as max-age, max-stale, fetchtimeout, etc. Incompatible with content.
Content. Optional inline markup for the Modality Component to execute. Incompatible with contentURL. Note that it is legal for both contentURL and content to be empty. In such a case, the Modality Component will either use the values provided in the preceding PrepareRequest event, if one was sent, or revert to its default hard-coded behavior, which could consist of returning an error event or of running a preconfigured or hard-coded script.
Data. Optional additional data.
If the Interaction Manager sends multiple StartRequests to a given Modality Component before it receives a DoneNotification, each such request overrides the earlier ones. Thus if a Modality Component receives a new StartRequest while it is executing a previous one, it should cancel the execution of the previous StartRequest, producing a suitable DoneNotification, and begin executing the content specified in the most recent StartRequest. If it is unable to cancel the execution of the previous StartRequest, the Modality Component should reject the new StartRequest, returning a suitable failure code in the StartResponse.
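As a rough illustration of the override behavior, the Modality Component's side of such an exchange might look like the two messages below, sent after a second StartRequest (requestID "request-2") arrives while "request-1" is still executing. This is a sketch under stated assumptions: the specification only asks for a "suitable" DoneNotification, so the status value and the statusInfo text shown for the cancelled request are hypothetical, as are the requestID values.

```xml
<!-- DoneNotification ending the execution of the earlier StartRequest.
     status="failure" and the statusInfo text are assumptions, not defined values. -->
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:doneNotification source="someURI" context="URI-1" status="failure" requestID="request-1">
    <mmi:data/>
    <mmi:statusInfo> SupersededByNewStartRequest </mmi:statusInfo>
  </mmi:doneNotification>
</mmi:mmi>

<!-- StartResponse accepting the most recent StartRequest -->
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:startResponse source="someURI" context="URI-1" requestID="request-2" status="success"/>
</mmi:mmi>
```

If the component could not cancel the earlier execution, it would instead answer "request-2" with a StartResponse whose status is "failure" and leave "request-1" running.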
Returned by the Modality Component to indicate that it has reached the end of its processing.
Sent by the Runtime Framework to stop processing in the Modality Component. The Modality Component must return CancelResponse.
Sent by the Runtime Framework to suspend processing by the Modality Component. Implementations may ignore this command if they are unable to pause, but they must return PauseResponse.
Sent by the Runtime Framework to resume paused processing by the Modality Component. Implementations may ignore this command if they are unable to pause, but they must return ResumeResponse.
This event may be generated by either the Runtime Framework or the Modality Component. It is used to encapsulate application-specific events that are extensions to the framework defined here. For example, if an application containing a voice modality wanted that modality component to notify the Interaction Manager when speech was detected, it would cause the voice modality to generate an Extension event (with a 'name' of something like 'speechDetected') at the appropriate time.
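The speech-detection case mentioned above could be expressed roughly as follows. This is a minimal sketch: the name 'speechDetected' comes from the text, while the source, context and requestID values are placeholders and the empty data element stands in for whatever application-specific payload the components agree on.

```xml
<!-- Extension event sent by a voice Modality Component when speech is detected.
     Attribute values are placeholders; the payload format is application-defined. -->
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:extensionNotification name="speechDetected" source="someURI" context="someURI" requestID="request-1">
    <mmi:data/>
  </mmi:extensionNotification>
</mmi:mmi>
```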
Sent by the Runtime Framework to indicate that the specified context is no longer active and that any resources associated with it may be freed. (More specifically, the next time that the Runtime Framework uses the specified context ID, it should be understood as referring to a new context.)
Returned by the Modality Component in response to the ClearContext command.
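A response of this kind might look like the sketch below. It assumes that the ClearContextResponse carries the same attributes as the other response events in this document (source, context, requestID and status) and that its requestID echoes that of the corresponding request; all values are placeholders.

```xml
<!-- Hypothetical ClearContextResponse, modeled on the other response events -->
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:clearContextResponse source="someURI" context="someURI" requestID="request-2" status="success"/>
</mmi:mmi>
```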
The StatusRequest message and the corresponding StatusResponse are intended to provide keep-alive functionality, informing the Runtime Framework about the presence of the various modality components. Note that neither of these messages is tied to any Context, so both may be sent independently of any user interaction.
The StatusRequest message is sent from the Runtime Framework to a Modality Component. By waiting for an implementation-dependent period of time for a StatusResponse message, the Runtime Framework may determine if the Modality Component is active.
Sent by the Modality Component to the Runtime Framework. If automatic updates are enabled, the Modality Component may send multiple StatusResponse messages in response to a single StatusRequest message.
AutomaticUpdate. A boolean indicating whether the Modality Component will keep sending StatusResponse messages in the future without waiting for another StatusRequest message.
Status. An enumeration of 'Alive' or 'Dead'. If the status is 'Alive', the Modality Component is able to handle subsequent Prepare and Start messages. If the status is 'Dead', it is not able to handle such requests. Thus a status of 'Dead' indicates that the Modality Component is going off-line. If the Runtime Framework receives a StatusResponse message with a status of 'Dead', it may continue to send StatusRequest messages, but it may not receive a response to them until the Modality Component comes back online.
Data. Optional additional data.
Issue (confidential event data):
We are considering adding a field to life-cycle events indicating that the event contains confidential data (such as bank account numbers or PINs) which should not be implicitly logged by the platform or made potentially available to third parties in any way. Note that this requirement is separate from the security requirements placed on the event transport protocol in 6.1 Event Delivery Mechanism. We would like feedback from potential implementers and users of this standard as to whether such a feature would be useful and how it should be defined.
Resolution:
None recorded.
In this specification we use elements from a fictional "dcont" namespace in some examples. The W3C Ubiquitous Web Application Working Group (UWA-WG) is developing such an ontology and expects to define a "dcont" namespace. The examples below are informative only and may, unintentionally, be incompatible with the work of the UWA-WG. For authoritative information on a (future) "dcont" namespace, please consult the Delivery Context Ontology specification.
(The definition of "media" and the details of the media element will be discussed in the next draft.)
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:newContextRequest source="someURI" requestID="request-1">
    <mmi:media id="mediaID1">media1</mmi:media>
    <mmi:media id="mediaID2">media2</mmi:media>
    <mmi:data xmlns:dcont="http://www.w3.org/2008/04/dcont">
      <dcont:DeliveryContext> ... </dcont:DeliveryContext>
    </mmi:data>
  </mmi:newContextRequest>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:newContextResponse source="someURI" requestID="request-1" status="success" context="URI-1">
    <mmi:media>media1</mmi:media>
    <mmi:media>media2</mmi:media>
  </mmi:newContextResponse>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:prepareRequest source="someURI" context="URI-1" requestID="request-1">
    <mmi:contentURL href="someContentURI" max-age="" fetchtimeout="1s"/>
  </mmi:prepareRequest>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:prepareRequest source="someURI" context="URI-1" requestID="request-1">
    <mmi:content>
      <vxml:vxml xmlns:vxml="http://www.w3.org/2001/vxml" version="2.0">
        <vxml:form>
          <vxml:block>Hello World!</vxml:block>
        </vxml:form>
      </vxml:vxml>
    </mmi:content>
  </mmi:prepareRequest>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:prepareResponse source="someURI" context="someURI" requestID="request-1" status="success"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:prepareResponse source="someURI" context="someURI" requestID="request-1" status="failure">
    <mmi:statusInfo> NotAuthorized </mmi:statusInfo>
  </mmi:prepareResponse>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:startRequest source="someURI" context="URI-1" requestID="request-1">
    <mmi:contentURL href="someContentURI" max-age="" fetchtimeout="1s"/>
  </mmi:startRequest>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:startResponse source="someURI" context="someURI" requestID="request-1" status="failure">
    <mmi:statusInfo> NotAuthorized </mmi:statusInfo>
  </mmi:startResponse>
</mmi:mmi>
The requestID of a "doneNotification" event corresponds to the requestID of the "startRequest" event that started the processing.
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:doneNotification source="someURI" context="someURI" status="success" requestID="request-1">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int1" emma:medium="acoustic" emma:confidence=".75" emma:mode="voice" emma:tokens="flights from boston to denver">
          <origin>Boston</origin>
          <destination>Denver</destination>
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:doneNotification>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:doneNotification source="someURI" context="someURI" status="success" requestID="request-1">
    <mmi:data>
      <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int1" emma:no-input="true"/>
      </emma:emma>
    </mmi:data>
  </mmi:doneNotification>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:cancelRequest context="someURI" source="someURI" immediate="true" requestID="request-1"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:cancelResponse source="someURI" context="someURI" requestID="request-1" status="success"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:pauseRequest context="someURI" source="someURI" immediate="true" requestID="request-1"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:pauseResponse source="someURI" context="someURI" requestID="request-1" status="success"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:resumeRequest context="someURI" source="someURI" requestID="request-1"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:resumeResponse source="someURI" context="someURI" requestID="request-1" status="success"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:extensionNotification name="appEvent" source="someURI" context="someURI" requestID="request-1">
    <applicationdata/>
  </mmi:extensionNotification>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:clearContextRequest source="someURI" context="someURI" requestID="request-2"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:statusRequest requestAutomaticUpdate="true" source="someURI" requestID="request-3"/>
</mmi:mmi>
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:statusResponse automaticUpdate="true" status="alive" source="someURI" requestID="request-3"/>
</mmi:mmi>
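For comparison, a Modality Component that is going off-line might report the other value of the status enumeration, as sketched below. The values "dead" and "alive" are the ones enumerated by statusResponseType in mmi-datatypes.xsd; the choice of automaticUpdate="false" and the source and requestID values are illustrative assumptions.

```xml
<!-- StatusResponse from a Modality Component that can no longer handle
     Prepare and Start messages; attribute values are placeholders -->
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:statusResponse automaticUpdate="false" status="dead" source="someURI" requestID="request-3"/>
</mmi:mmi>
```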
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch"> <xs:annotation> <xs:documentation xml:lang="en"> Main schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="NewContextRequest.xsd"/> <xs:include schemaLocation="NewContextResponse.xsd"/> <xs:include schemaLocation="ClearContextRequest.xsd"/> <xs:include schemaLocation="ClearContextResponse.xsd"/> <xs:include schemaLocation="CancelRequest.xsd"/> <xs:include schemaLocation="CancelResponse.xsd"/> <xs:include schemaLocation="DoneNotification.xsd"/> <xs:include schemaLocation="ExtensionNotification.xsd"/> <xs:include schemaLocation="PauseRequest.xsd"/> <xs:include schemaLocation="PauseResponse.xsd"/> <xs:include schemaLocation="PrepareRequest.xsd"/> <xs:include schemaLocation="PrepareResponse.xsd"/> <xs:include schemaLocation="ResumeRequest.xsd"/> <xs:include schemaLocation="ResumeResponse.xsd"/> <xs:include schemaLocation="StartRequest.xsd"/> <xs:include schemaLocation="StartResponse.xsd"/> <xs:include schemaLocation="StatusRequest.xsd"/> <xs:include schemaLocation="StatusResponse.xsd"/> <xs:element name="mmi"> <xs:complexType> <xs:choice> <xs:sequence> <xs:element ref="mmi:newContextRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:newContextResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:clearContextRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:clearContextResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:cancelRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:cancelResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:doneNotification"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:extensionNotification"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:pauseRequest"/> </xs:sequence> <xs:sequence> <xs:element
ref="mmi:pauseResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:prepareRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:prepareResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:resumeRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:resumeResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:startRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:startResponse"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:statusRequest"/> </xs:sequence> <xs:sequence> <xs:element ref="mmi:statusResponse"/> </xs:sequence> </xs:choice> <xs:attributeGroup ref="mmi:mmi.version.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" targetNamespace="http://www.w3.org/2008/04/mmi-arch"> <xs:annotation> <xs:documentation xml:lang="en"> general Type definition schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:simpleType name="versionType"> <xs:restriction base="xs:decimal"> <xs:enumeration value="1.0"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="mediaContentTypes"> <xs:restriction base="xs:string"> <xs:enumeration value="media1"/> <xs:enumeration value="media2"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="mediaAttributeTypes"> <xs:restriction base="xs:string"> <xs:enumeration value="mediaID1"/> <xs:enumeration value="mediaID2"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="sourceType"> <xs:restriction base="xs:string"/> </xs:simpleType> <xs:simpleType name="targetType"> <xs:restriction base="xs:string"/> </xs:simpleType> <xs:simpleType name="requestIDType"> <xs:restriction base="xs:string"/> </xs:simpleType> <xs:simpleType name="contextType"> <xs:restriction base="xs:string"/> </xs:simpleType> <xs:simpleType name="statusType"> <xs:restriction base="xs:string"> <xs:enumeration value="success"/> <xs:enumeration value="failure"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="statusResponseType"> <xs:restriction base="xs:string"> <xs:enumeration value="alive"/> <xs:enumeration value="dead"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="immediateType"> <xs:restriction base="xs:boolean"/> </xs:simpleType> <xs:complexType name="contentURLType"> <xs:attribute name="href" type="xs:anyURI" use="required"/> <xs:attribute name="max-age" type="xs:string" use="optional"/> <xs:attribute name="fetchtimeout" type="xs:string" use="optional"/> </xs:complexType> <xs:complexType name="contentType"> <xs:sequence> <xs:any 
namespace="http://www.w3.org/2001/vxml" processContents="skip" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:complexType name="emmaType"> <xs:sequence> <xs:any namespace="http://www.w3.org/2003/04/emma" processContents="skip" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:complexType name="anyComplexType" mixed="true"> <xs:complexContent mixed="true"> <xs:restriction base="xs:anyType"> <xs:sequence> <xs:any processContents="skip" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:restriction> </xs:complexContent> </xs:complexType> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> general Type definition schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:attributeGroup name="media.id.attrib"> <xs:attribute name="id" type="mmi:mediaAttributeTypes" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="mmi.version.attrib"> <xs:attribute name="version" type="mmi:versionType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="source.attrib"> <xs:attribute name="source" type="mmi:sourceType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="target.attrib"> <xs:attribute name="target" type="mmi:targetType" use="optional"/> </xs:attributeGroup> <xs:attributeGroup name="requestID.attrib"> <xs:attribute name="requestID" type="mmi:requestIDType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="context.attrib"> <xs:attribute name="context" type="mmi:contextType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="immediate.attrib"> <xs:attribute name="immediate" type="mmi:immediateType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="status.attrib"> <xs:attribute name="status" type="mmi:statusType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="statusResponse.attrib"> <xs:attribute name="status" type="mmi:statusResponseType" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="extension.name.attrib"> <xs:attribute name="name" type="xs:string" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="requestAutomaticUpdate.attrib"> <xs:attribute name="requestAutomaticUpdate" type="xs:boolean" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="automaticUpdate.attrib"> <xs:attribute 
name="automaticUpdate" type="xs:boolean" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="group.allEvents.attrib"> <xs:attributeGroup ref="mmi:source.attrib"/> <xs:attributeGroup ref="mmi:requestID.attrib"/> <xs:attributeGroup ref="mmi:context.attrib"/> </xs:attributeGroup> <xs:attributeGroup name="group.allResponseEvents.attrib"> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> <xs:attributeGroup ref="mmi:status.attrib"/> </xs:attributeGroup> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> general elements definition schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <!-- ELEMENTS --> <xs:element name="statusInfo" type="mmi:anyComplexType"/> <xs:element name="media"> <xs:complexType> <xs:simpleContent> <xs:extension base="mmi:mediaContentTypes"> <xs:attributeGroup ref="mmi:media.id.attrib"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> NewContextRequest schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:import namespace="http://www.w3.org/2008/04/dcont" schemaLocation="dcont.xsd"/> <xs:element name="newContextRequest"> <xs:complexType> <xs:sequence> <xs:element ref="mmi:media" maxOccurs="unbounded"/> <xs:element name="data"> <xs:complexType> <xs:sequence> <xs:element ref="dcont:DeliveryContext"/> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> NewContextResponse schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="newContextResponse"> <xs:complexType> <xs:sequence> <xs:element ref="mmi:media" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> PrepareRequest schema for MMI Life cycle events version 1.0. The optional PrepareRequest event is an event that the Runtime Framework may send to allow the Modality Components to pre-load markup and prepare to run (e.g. in case of VXML VUI-MC). Modality Components are not required to take any particular action in response to this event, but they must return a PrepareResponse event. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:element name="prepareRequest"> <xs:complexType> <xs:choice> <xs:sequence> <xs:element name="contentURL" type="mmi:contentURLType"/> </xs:sequence> <xs:sequence> <xs:element name="content" type="mmi:anyComplexType"/> <!-- only vxml permitted ?? --> </xs:sequence> <!-- data really needed ?? --> <xs:sequence> <xs:element name="data" type="mmi:anyComplexType"/> </xs:sequence> </xs:choice> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> PrepareResponse schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="prepareResponse"> <xs:complexType> <xs:sequence> <xs:element name="data" minOccurs="0" type="mmi:anyComplexType"/> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> StartRequest schema for MMI Life cycle events version 1.0. The Runtime Framework sends the event StartRequest to invoke a Modality Component (to start loading a new GUI resource or to start the ASR or TTS). The Modality Component must return a StartResponse event in response. If the Runtime Framework has sent a previous PrepareRequest event, it may leave the contentURL and content fields empty, and the Modality Component will use the values from the PrepareRequest event. If the Runtime Framework includes new values for these fields, the values in the StartRequest event override those in the PrepareRequest event. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:element name="startRequest"> <xs:complexType> <xs:choice> <xs:sequence> <xs:element name="contentURL" type="mmi:contentURLType"/> </xs:sequence> <xs:sequence> <xs:element name="content" type="mmi:anyComplexType"/> <!-- only vxml permitted ?? --> </xs:sequence> <!-- data really needed ?? --> <xs:sequence> <xs:element name="data" type="mmi:anyComplexType"/> </xs:sequence> </xs:choice> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> StartResponse schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="startResponse"> <xs:complexType> <xs:sequence> <xs:element name="data" minOccurs="0" type="mmi:anyComplexType"/> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> DoneNotification schema for MMI Life cycle events version 1.0. The DoneNotification event is intended to be used by the Modality Component to indicate that it has reached the end of its processing. For the VUI-MC it can be used to return the ASR recognition result (or the status info: noinput/nomatch) and TTS/Player done notification. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="doneNotification"> <xs:complexType> <xs:sequence> <xs:element name="data" type="mmi:anyComplexType"/> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> CancelRequest schema for MMI Life cycle events version 1.0. The CancelRequest event is sent by the Runtime Framework to stop processing in the Modality Component (e.g. to cancel ASR or TTS/Playing). The Modality Component must return with a CancelResponse message. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:element name="cancelRequest"> <xs:complexType> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> <xs:attributeGroup ref="mmi:immediate.attrib"/> <!-- no elements --> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> CancelResponse schema for MMI Life cycle events version 1.0. The CancelRequest event is sent by the Runtime Framework to stop processing in the Modality Component (e.g. to cancel ASR or TTS/Playing). The Modality Component must return with a CancelResponse message. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="cancelResponse"> <xs:complexType> <xs:sequence> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> PauseRequest schema for MMI Life cycle events version 1.0. The PauseRequest event is sent by the Runtime Framework to pause processing of a Modality Component (e.g. to pause ASR or TTS/Playing). The Modality Component must return with a PauseResponse message. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:element name="pauseRequest"> <xs:complexType> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> <xs:attributeGroup ref="mmi:immediate.attrib"/> <!-- no elements --> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> PauseResponse schema for MMI Life cycle events version 1.0. The PauseRequest event is sent by the Runtime Framework to pause the processing of the Modality Component (e.g. to pause ASR or TTS/Playing). The Modality Component must return with a PauseResponse message. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="pauseResponse"> <xs:complexType> <xs:sequence> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> ResumeRequest schema for MMI Life cycle events version 1.0. The ResumeRequest event is sent by the Runtime Framework to resume a previously suspended processing task of a Modality Component. The Modality Component must return with a ResumeResponse message. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:element name="resumeRequest"> <xs:complexType> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> <xs:attributeGroup ref="mmi:immediate.attrib"/> <!-- no elements --> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> ResumeResponse schema for MMI Life cycle events version 1.0. The ResumeRequest event is sent by the Runtime Framework to resume a previously suspended processing task of a Modality Component. The Modality Component must return with a ResumeResponse message. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="resumeResponse"> <xs:complexType> <xs:sequence> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> ExtensionNotification schema for MMI Life cycle events version 1.0. The extensionNotification event may be generated by either the Runtime Framework or the Modality Component and is used to communicate (presumably changed) data values to the other component. E.g. the VUI-MC has signaled a recognition result for any field displayed on the GUI, the event will be used by the Runtime Framework to send a command to the GUI-MC to update the GUI with the recognized value. </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:element name="extensionNotification"> <xs:complexType> <xs:sequence> <xs:element name="data" type="mmi:anyComplexType"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> <xs:attributeGroup ref="mmi:extension.name.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> ClearContextRequest schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="clearContextRequest"> <xs:complexType> <xs:sequence> <xs:element ref="mmi:media" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:annotation> <xs:documentation xml:lang="en"> ClearContextResponse schema for MMI Life cycle events version 1.0 </xs:documentation> </xs:annotation> <xs:include schemaLocation="mmi-datatypes.xsd"/> <xs:include schemaLocation="mmi-attribs.xsd"/> <xs:include schemaLocation="mmi-elements.xsd"/> <xs:element name="clearContextResponse"> <xs:complexType> <xs:sequence> <xs:element ref="mmi:media" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="mmi:statusInfo" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/> <xs:attributeGroup ref="mmi:target.attrib"/> </xs:complexType> </xs:element> </xs:schema>
The following ladder diagram shows a possible message sequence upon session creation. We assume that the Runtime Framework and an Interaction Manager session are already up and running. The user starts a multimodal session, for example by starting a web browser and fetching a given URL.
The initial document contains scripts which provide the modality component functionality (e.g. understanding XML-formatted life cycle events) and message transport capabilities (e.g. AJAX, depending on the exact system implementation).
After loading the initial documents (and scripts), the modality component implementation issues an mmi:newContextRequest message to the Runtime Framework. The Runtime Framework may load a corresponding markup document (which could be SCXML), if necessary, and initializes and starts the Interaction Manager.
In this scenario the Interaction Manager logic issues a number of mmi:startRequest messages to the various modality components. One message is sent to the graphical modality component (GUI) to instruct it to load an HTML document. Another message is sent to a voice modality component (VUI) to play a welcome message.
In this example, the voice modality component has to create a VoiceXML session. As VoiceXML 2.1 does not provide an external event interface, a CCXML session will be used for external asynchronous communication. Therefore the voice modality component uses the session creation interface of CCXML 1.0 to create a session and start a corresponding script. This script will then make a call to a phone at the user's device (which could be a regular phone or a SIP soft phone). This scenario illustrates the use of a SIP phone, which may reside on the user's mobile handset.
After successful setup of a CCXML session and the voice connection, the voice modality component instructs the CCXML browser to start a VoiceXML dialog and passes it a corresponding VoiceXML script. The VoiceXML interpreter will execute the script and play out the welcome message. After the execution of the VoiceXML script has finished, the voice modality component notifies the Interaction Manager using the mmi:done event.
The next diagram gives an example of a possible message flow while processing user input. In the given scenario the user wants to enter information using the voice modality component. To start the voice input the user has to use the "push-to-talk" button. The "push-to-talk" button (which might be a hardware button or a soft button on the screen) generates a corresponding event when pushed. This event is issued as an mmi:extension event towards the Interaction Manager. The Interaction Manager logic sends an mmi:startRequest to the voice modality component. This mmi:startRequest message contains a URL which points to a corresponding VoiceXML script. The voice modality component again starts a VoiceXML interpreter using the given URL. The VoiceXML interpreter loads the document and executes it. Now the system is ready for the user input. To notify the user that voice input is available, the Interaction Manager might send an event to the GUI upon receiving the mmi:startResponse event (which indicates that the voice modality component has started to execute the document). Note that this is not shown in the picture.
The VoiceXML interpreter captures the user's voice input and uses a speech recognition engine to recognize the utterance. The speech recognition result will be represented as an EMMA document and sent to the Interaction Manager using the mmi:done message. The Interaction Manager logic sends an mmi:extension message to the GUI modality component to instruct it to display the recognition result.
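The reactions of the Interaction Manager in this input scenario can be sketched as a small routing function. The Python below is only an illustration: the dictionary message format, the component names "GUI" and "VUI", the event name "push-to-talk" and the URL are assumptions made for this example and are not defined by the architecture.

```python
# Hypothetical URL of the VoiceXML script; not taken from the specification.
VOICE_DIALOG_URL = "http://example.com/dialogs/input.vxml"


def im_react(event):
    """Return the messages the Interaction Manager sends in reaction to one event."""
    if event["type"] == "extensionNotification" and event.get("name") == "push-to-talk":
        # The button was pushed: ask the voice component to run the dialog.
        return [{"type": "startRequest", "target": "VUI", "url": VOICE_DIALOG_URL}]
    if event["type"] == "doneNotification" and event["source"] == "VUI":
        # Recognition finished: forward the EMMA result to the GUI for display.
        return [{"type": "extensionNotification", "target": "GUI", "data": event["data"]}]
    # startResponse and any other events need no reaction in this scenario.
    return []
```

A real Interaction Manager would more likely express this logic in a state machine language such as SCXML; the function above only makes the order of the messages explicit.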
In the following scenario a modality component instance will be destroyed as a reaction to a user input, e.g. because the user chose to change to the GUI-only mode. In this case an mmi:clearContextRequest will be issued to the voice modality component. The voice modality component wrapper will then destroy the CCXML (and VoiceXML) session.
The application logic (i.e. the IM) may also decide to indicate the removed voice functionality and disable an icon on the screen which indicates the availability of the voice modality.
This section presents a detailed example of an implementation of this architecture. For the sake of concreteness, it specifies a number of details that are not included in this document. It is based on the MMI use case document [MMIUse], specifically the second use case, which presents a multimodal in-car application for giving driving directions. Three languages are involved in the design view:
The remainder of the discussion involves the run-time view. The numbered items are taken from the "User Action/External Input" field of the event table. The appended comments are based on the working group's discussion of the use case.
Recognition can be done locally, remotely (on the server) or distributed between the device and the server. By default, the location of event handling is determined by the markup. If there is a local handler for an event specified in the document, the event is handled locally. If not, the event is forwarded to the server. Thus if the markup specifies a speech-started event handler, that event will be consumed locally. Otherwise it will be forwarded to the server. However, remote ASR requires more than simply forwarding the speech-started event to the server because the audio channel must be established. This level of configuration is handled by the device profile, but can be overridden by the markup. Note that the remote server might contain a full VoiceXML interpreter as well as ASR capabilities. In that case, the relevant markup would be sent to the server along with the audio. The protocol used to control the remote recognizer and ship it audio is not part of the MMI specification (but may well be MRCP.)
Open Issue: The previous paragraph about local vs remote event handling is retained from an earlier draft. Since the Modality Component is a black box to the Runtime Framework, the local vs remote distinction should be internal to it. Therefore the event handlers would have to be specified in the VoiceXML markup. But no such possibility exists in VoiceXML 2.0. One option would be to make the local vs remote distinction vendor-specific, so that each Modality Component provider would decide whether to support remote operations and, if so, how to configure them. Alternatively, we could define the DCCI properties for remote recognition, but make it optional for vendors to support them. In either case, it would be up to the VoiceXML Modality Component to communicate with the remote server, etc. Newer languages, such as VoiceXML 3.0, could be designed to allow explicit markup control of local vs remote operations. Note that in the most complex case, there could be multiple simultaneous recognitions, some of which were local and some remote. This level of control is most easily achieved via markup, by attaching properties to individual grammars. DCCI properties are more suitable for setting global defaults.
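The default routing rule retained from the earlier draft (handle an event locally when the markup declares a handler for it, otherwise forward it to the server) can be sketched as follows. This Python fragment is illustrative only; the function name, the handler table and the forwarding callback are hypothetical, and establishing the audio channel for remote ASR is deliberately left out.

```python
def route_event(event_name, local_handlers, forward_to_server):
    """Handle the event locally if a handler was declared, else forward it.

    local_handlers maps event names to callables (the handlers found in
    the markup); forward_to_server is a callable used for everything else.
    """
    handler = local_handlers.get(event_name)
    if handler is not None:
        # A local handler exists, so the event is consumed on the device.
        return handler(event_name)
    # No local handler: the event goes to the server.
    return forward_to_server(event_name)
```

For example, with a handler table that contains only "speech-started", that event is consumed locally while any other event is passed to the forwarding callback.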
When the IM receives the recognition result event, it parses it and retrieves the user's preferences from the DCCI component, which it then dispatches to the Modality Components, which adjust their displays, output, default grammars, etc. accordingly. In VoiceXML 2.0, each of the multiple voice Modality Components will receive the corresponding event.
This particular step in the use case shows the usefulness of the Interaction Manager. One can imagine an architecture lacking an IM in which the Modality Components communicate with each other directly. In this case, all Modality Components would have to handle the location update events separately. This would mean considerable duplication of markup and calculation. Consider in particular the case of a VoiceXML 2.0 Form which is supposed to warn the driver when he goes off course. If there is an IM, this Form will simply contain the off-course dialog and will be triggered by an appropriate event from the IM. In the absence of the IM, however, the Form will have to be invoked on each location update event. The Form itself will have to calculate whether the user is off course, exiting without saying anything if he is not. In parallel, the HTML Modality Component will be performing a similar calculation to determine whether to update its display. The overall application is simpler and more modular if the location calculation and other application logic are placed in the IM, which will then invoke the individual Modality Components only when it is time to interact with the user.
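A minimal sketch of this division of labor is given below: the IM performs the off-course calculation once per location update and invokes the voice dialog only when needed. The planar coordinates, the tolerance parameter and the callback name are assumptions made for the example; a real navigation system would compare the position against the whole planned route.

```python
import math


def on_location_update(position, nearest_route_point, tolerance, start_off_course_dialog):
    """IM-side check run on every location update.

    position and nearest_route_point are (x, y) pairs in the same planar
    units as tolerance (an assumption made for this sketch).
    Returns True if the off-course dialog was triggered.
    """
    deviation = math.dist(position, nearest_route_point)
    if deviation > tolerance:
        # Only now is a Modality Component invoked.
        start_off_course_dialog()
        return True
    return False
```

With this arrangement the VoiceXML Form contains only the off-course dialog, and neither it nor the HTML component repeats the calculation.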
Note on the GPS. We assume that the GPS raises four types of events: On-Course Updates, Off-Course Alerts, Loss-of-Signal Alerts, and Recovery of Signal Notifications. The Off-Course Alert is covered below. The Loss-of-Signal Alert is important since the system must know if its position and course information is reliable. At the very least, we would assume that the graphical display would be modified when the signal was lost. An audio earcon would also be appropriate. Similarly, the Recovery of Signal Notification would cause a change in the display and possibly an audio notification. This event would also contain an indication of the number of satellites detected, since this determines the accuracy of the signal: three satellites are necessary to provide x and y coordinates, while a fourth satellite allows the determination of height as well. Finally, note that the GPS can assume that the car's location does not change while the engine is off. Thus when it starts up it will assume that it is at its last recorded location. This should make the initialization process quicker.
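The four event types and the satellite rule described in this note could be modeled as shown below. The identifiers and string values are hypothetical; only the four event categories and the three-versus-four satellite distinction come from the text.

```python
from enum import Enum


class GpsEvent(Enum):
    """The four GPS event types assumed in this use case."""
    ON_COURSE_UPDATE = "on-course-update"
    OFF_COURSE_ALERT = "off-course-alert"
    LOSS_OF_SIGNAL_ALERT = "loss-of-signal-alert"
    RECOVERY_OF_SIGNAL_NOTIFICATION = "recovery-of-signal-notification"


def available_coordinates(satellites):
    """Return the coordinates that can be determined from the satellite count."""
    if satellites >= 4:
        # A fourth satellite allows height to be determined as well.
        return ("x", "y", "height")
    if satellites == 3:
        return ("x", "y")
    # Fewer than three satellites: no position according to the rule above.
    return ()
```

An IM receiving a Recovery of Signal Notification could use a function like available_coordinates to decide how much of the display to restore.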
When the IM is satisfied with the confidence levels, it ships the n-best list off to a remote server, which adds graphical information for at least the first choice. The server may also need to modify the n-best list, since items that are linguistically unambiguous may turn out to be ambiguous in the database (e.g., "Starbucks"). Now the IM instructs the HTML component to display the hypothesized destination (first item on n-best list) on the screen and instructs the speech component to start a confirmation dialog. Note that the submission to the remote server should be similar to the <data> tag in VoiceXML 2.1 in that it does not require a document transition. (That is, the remote server should not have to generate a new IM document/state machine just to add graphical information to the n-best list.)
Modality components can be classified into one of three categories: simple, complex or nested.
A simple modality component presents information to a user or captures information from a user as directed by an interaction manager. A simple modality component is atomic in that it cannot be partitioned into two or more simple modality components that send events among themselves. A simple modality component is like a black box in that the interaction manager cannot directly access any function inside the black box other than by using life cycle events.
A simple modality component might contain functionality to present one of the following types of information to the user or user agent. For example:
TTS—generates synthetic speech from a text string
Audio replay—replays an audio file to a user
GUI presentation—presents HTML on a display device.
Ink replay—replays one or more ink strokes
Video replay—replays one or more video clips
A simple modality component might contain functionality to capture one of the following types of information from the user or user agent as directed by a complex modality or interaction manager:
Audio capture—records user utterances
ASR—captures text from the user by using a grammar to convert spoken voice into text
DTMF—captures integers from a user by using a grammar to interpret the digits represented by the sounds created by the touch-tone keypad on a phone
Ink capture—captures one or more ink strokes
Ink recognition—captures one or more ink strokes and interprets them as text by using a grammar.
Speaker verification—determines if a user is who the user claims to be by comparing spoken voice characteristics with the voice characteristics known to be associated with the user
Speaker identification—determines who a speaker is by comparing spoken voice characteristics with a set of preexisting voice characteristics of several individuals.
Face verification—determines if a user is who the user claims to be by comparing face patterns with the face patterns known to be associated with the user
Face identification—determines who a user is by comparing face pattern characteristics with a set of preexisting face patterns of several individuals
GPS—captures the current GPS location of a device.
Keyboard or mouse—captures information entered by the user using a keyboard or mouse.
Figure 1 illustrates two simple modality components—ASR modality for capturing input from the user and TTS for presenting output to the user. Note that all information exchanged between the two modality components must be sent as life cycle events to the interaction manager which forwards them to the other modality component.
A complex modality component may contain functionality of two or more simple modality components, for example:
GUI—presents information to the user, and captures keystrokes and mouse movements
VXML—presents a VoiceXML dialog to the user that both presents speech to the user and captures the user's speech
GUI/VUI—enables user to both speak and listen, and read and type.
Figure 2 illustrates a complex modality component containing two functions, ASR and TTS. The ASR and TTS functions within the complex modality component may communicate directly with each other, in addition to sending and receiving life cycle events with the interaction manager.
A nested modality component is a set of modality components and a script (possibly written in SCXML) that manages them. The script communicates with the child modality components using life cycle events. The script communicates with the interaction manager using only life cycle events. The children modality components may not communicate directly with each other.
Figure 3 illustrates a nested modality component with two child modality components, ASR and TTS.
In effect, the script within a nested modality component can be thought of as an interaction manager that manages the child modality components. A nested modality component is thus a nested interaction manager. This is the so-called "Russian Doll" model of nested interaction managers.
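The "Russian Doll" idea can be sketched in a few lines of Python. The class below is a hypothetical illustration, not an interface defined by this document: events are plain dictionaries, children are callables, and a request is considered successful only if every child reports success.

```python
class NestedModalityComponent:
    """A script that manages child components and looks like one component to the IM."""

    def __init__(self, children):
        # children maps a child name to a callable taking a life cycle
        # event (a dict) and returning a response event (a dict).
        self.children = children

    def handle(self, request):
        # Act as an interaction manager towards the children: forward the
        # request to each child and collect the responses.
        responses = [child(request) for child in self.children.values()]
        succeeded = all(r["status"] == "success" for r in responses)
        # Act as a modality component towards the real IM: one response.
        return {
            "type": request["type"].replace("Request", "Response"),
            "requestID": request["requestID"],
            "status": "success" if succeeded else "failure",
        }
```

Because the children never call each other, all coordination stays in the handle method, which mirrors the rule that child modality components may not communicate directly.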
The following rules guarantee that modalities are portable from interaction manager to interaction manager.
The MMI life cycle events are the mechanism through which a modality component communicates with the interaction manager. The MC author must define how the modality component will respond to each life cycle event. A modality component must respond to every life cycle event it receives from the interaction manager in the cases where a response is required, as defined by the MMI Architecture. For example, if a modality component presents a static display, it must respond to a <pauseRequest> event with a <pauseResponse> event even if the static display modality component does nothing else in response to the <pauseRequest> event.
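As an illustration of the static display case, the following Python sketch builds a pauseResponse for an incoming pause request (pauseRequest in the event schemas) using the standard xml.etree.ElementTree module. It assumes that the events are wrapped in an mmi:mmi element and carry unqualified source, context and requestID attributes, as in the examples of this document; the function name and the source URI are hypothetical.

```python
import xml.etree.ElementTree as ET

MMI_NS = "http://www.w3.org/2008/04/mmi-arch"
# Serialize the MMI namespace with the conventional "mmi" prefix.
ET.register_namespace("mmi", MMI_NS)


def pause_response_for(pause_request_xml, source):
    """Build a pauseResponse for a pauseRequest wrapped in an mmi:mmi element."""
    root = ET.fromstring(pause_request_xml)
    request = root.find(f"{{{MMI_NS}}}pauseRequest")
    response_root = ET.Element(f"{{{MMI_NS}}}mmi", {"version": "1.0"})
    # Echo context and requestID so the IM can match the response.
    ET.SubElement(response_root, f"{{{MMI_NS}}}pauseResponse", {
        "source": source,
        "context": request.get("context"),
        "requestID": request.get("requestID"),
        "status": "success",
    })
    return ET.tostring(response_root, encoding="unicode")
```

The component does nothing else with the request; producing the response is the whole of its obligation in this case.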
For each life cycle event, define the parameters and syntax of the "data" element of the corresponding life cycle event that will be used in performing that function. For example, the <startRequest> event for a speech recognition modality component might include parameters like timeout, confidence threshold, max n-best, and grammar.
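One way for a modality component author to pin down such parameters is to write them as a typed record with validation. The Python sketch below is purely illustrative: the field names, units, default values and validation rules are assumptions, and only the four parameter kinds (timeout, confidence threshold, max n-best, grammar) come from the text.

```python
from dataclasses import dataclass


@dataclass
class AsrStartParameters:
    """Hypothetical "data" parameters of a startRequest for a speech recognizer."""
    grammar: str                       # URI of the grammar to activate
    timeout_ms: int = 5000             # assumed unit: milliseconds
    confidence_threshold: float = 0.5  # assumed range: 0.0 to 1.0
    max_nbest: int = 1

    def __post_init__(self):
        # Reject values that the component could not act on.
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0 and 1")
        if self.max_nbest < 1:
            raise ValueError("max_nbest must be at least 1")
        if self.timeout_ms < 0:
            raise ValueError("timeout_ms must not be negative")
```

Documenting the parameters at this level of detail lets an interaction manager author know exactly what a given component accepts.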
Define an <extensionNotification> event to communicate these functions to and from the interaction manager.
For example:
SSML for a speech synthesis simple modality component
SRGS and SISR for a speech recognition simple modality component
VoiceXML 2.1, SSML, SRGS, and SISR for a speech complex modality component
If a modality component captures or generates information, then it should format the information using the EMMA format and use an extension event to send that information to the interaction manager.
The MC developer must specify all error codes that are specific to the component. If the MC is based on another technology, the developer can provide a reference to that technology specification. For instance, if the MC is based on VoiceXML, a reference to the VoiceXML spec for VoiceXML errors can be included instead of listing each VoiceXML error.
Errors such as XML errors and MMI protocol errors must be handled in accordance with the rules laid out in the MMI architecture. These errors do not need to be documented.
The following guidelines should be helpful for modality authors to make modalities portable from interaction manager to interaction manager.
For example, if the ASR fails to recognize a user's utterance, the TTS function may present a prompt asking the user to try again. As another example, if the ASR fails to recognize a user's utterance, a GUI function might display the n-best list on a screen so the user can select the desired word. Efficiency concerns may indicate that two modality components be combined into a single complex modality component.
For example, a TTS function must be synchronized with a visual talking head so that the lip movements are synchronized with the words. As another example, a TTS function presents information about each graphical item that the user places "in focus." Again, efficiency concerns may indicate that the TTS and talking head modality components be combined into a single complex modality component.
Writing an application using a nested modality component may be easier than writing the same application using multiple modality components if the nested modality component hides much of the complexity of managing the children modality components.
Consider a theoretical face identification modality component that takes an image or images of a face and returns the set of possible matches and the confidence of the face identification software in each match. An API to that modality component would include events for starting the component, providing data, and for receiving results back from the component.
This particular example includes the information needed to run this component in the "startRequest" and "doneNotification" events; that is, in this example no "extensionNotification" events are used, although extensionNotification events could be part of another modality component's API. This example assumes that an image has already been acquired from some source; however, another possibility would be to also include image acquisition in the operation of the component.
Depending on the capabilities of the modality component, other possible information that might be included would be the algorithm to be used or the image format to expect. We emphasize that this is just an example to indicate the kinds of information that might be used by a multimodal application that includes face recognition. The actual interface used in real applications should be defined by experts in the field.
The use case is a face identification component that identifies one of a set of employees on the basis of face images.
The MMI Runtime Framework could use the following events to communicate with such a component.
Rule | Component Information |
---|---|
Rule 1: Each modality component must implement all of the MMI life cycle events | See Table 2 for the details of the implementation of the life cycle events. |
Rule 2: Identify other functions of the modality component that are relevant to the interaction manager. | All the functions of the component are covered in the life cycle events, no other functions are needed. |
Rule 3: If the component uses media, specify the media format. | The component uses the jpeg format for images to be identified and for its image database. |
Rule 4: Specify protocols supported by the component for transmitting media (e.g. SIP). | The component uses HTTP for transmitting media. |
Rule 5: Specify supported human languages | This component does not support any human languages. |
Rule 6: Specify supporting languages required by the component | This component does not require any markup languages. |
Rule 7: Modality components sending data to the interaction manager must use the EMMA format. | This component uses EMMA. |
Life Cycle Event | Component Implementation |
---|---|
newContextRequest | (Standard) The component requests a new context from the IM. |
newContextResponse | (Standard) The component starts a new context and assigns the new context id to it. |
prepareRequest | The component prepares resources to be used in identification, specifically, the image database. |
prepareResponse | (Standard) If the database of known users is not found, the error message "known users not found" is returned in the <statusInfo> element. |
startRequest | The component starts processing if possible, using a specified image, image database, threshold, and limit on the size of nbest results to be returned. |
startResponse | (Standard) If the database of known users is not found, the error message "known users not found" is returned in the <statusInfo> element. |
doneNotification | Identification results in EMMA format are reported in the "data" field. The mode is "photograph", the medium is "visual", the function is "identification", and verbal is "false". |
cancelRequest | This component stops processing when it receives a "cancelRequest". It always performs a hard stop whether or not the IM requests a hard stop. |
cancelResponse | (Standard) |
pauseRequest | This component cannot pause. |
pauseResponse | <statusInfo> field is "cannot pause". |
resumeRequest | This component cannot resume. |
resumeResponse | <statusInfo> field is "cannot resume". |
extensionNotification | This component does not use "extensionNotification". It ignores any "extensionNotification" events that are sent to it by the IM. |
clearContextRequest | (Standard) |
clearContextResponse | (Standard) |
statusRequest | (Standard) |
statusResponse | The component returns a standard life cycle response. The "automaticUpdate" attribute is "false", because this component does not supply automatic updates. |
To start the component, a startRequest event from the RTF to the face identification component is sent, asking it to start an identification. It assumes that images found at a certain URI are to be identified by comparing them against a known set of employees found at another URI. The confidence threshold of the component is set to .5 and the RTF requests a maximum of five possible matches.
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0"> <mmi:startRequest source="uri:RTFURI" context="URI-1" requestID="request-1"> <mmi:data> <face-identification-parameters threshold=".5" unknown="someURI" known="uri:employees" max-nbest="5"/> </mmi:data> </mmi:startRequest> </mmi:mmi>
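For readers who generate such events programmatically, the following Python sketch builds an equivalent startRequest with the standard xml.etree.ElementTree module. The function name and its defaults are hypothetical, and the sketch assumes that the mmi prefix is bound to the MMI namespace and that the face-identification-parameters element is in no namespace.

```python
import xml.etree.ElementTree as ET

MMI_NS = "http://www.w3.org/2008/04/mmi-arch"
# Serialize the MMI namespace with the conventional "mmi" prefix.
ET.register_namespace("mmi", MMI_NS)


def build_face_start_request(unknown_uri, known_uri, threshold=0.5, max_nbest=5):
    """Return a startRequest for the example face identification component."""
    root = ET.Element(f"{{{MMI_NS}}}mmi", {"version": "1.0"})
    request = ET.SubElement(root, f"{{{MMI_NS}}}startRequest", {
        "source": "uri:RTFURI",
        "context": "URI-1",
        "requestID": "request-1",
    })
    data = ET.SubElement(request, f"{{{MMI_NS}}}data")
    # Component-specific parameters travel inside the data element.
    ET.SubElement(data, "face-identification-parameters", {
        "threshold": str(threshold),
        "unknown": unknown_uri,
        "known": known_uri,
        "max-nbest": str(max_nbest),
    })
    return ET.tostring(root, encoding="unicode")
```

Calling build_face_start_request("someURI", "uri:employees") yields a document with the same elements and attribute names as the example above, although the threshold is serialized as "0.5" rather than ".5".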
As part of support for the life cycle events, a modality component is required to respond to a startRequest event with a startResponse event. Here's an example of a startResponse from the face identification component to the RTF informing the RTF that the face identification component has successfully started.
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0"> <mmi:startResponse source="uri:faceURI" context="URI-1" requestID="request-1" status="success"/> </mmi:mmi>
Here's an example of a startResponse event from the face identification component to the RTF in the case of failure, with an example failure message. In this case the failure message indicates that the known images cannot be found.
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0"> <mmi:startResponse source="uri:faceURI" context="URI-1" requestID="request-1" status="failure"> <mmi:statusInfo> known users not found </mmi:statusInfo> </mmi:startResponse> </mmi:mmi>
Here's an example of an output event, sent from the face identification component to the RTF, using EMMA to represent the identification results. Two results with different confidences are returned.
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0"> <mmi:doneNotification source="uri:faceURI" context="URI-1" status="success" requestID="request-1"> <mmi:data> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"> <emma:one-of emma:medium="visual" emma:verbal="false" emma:mode="photograph" emma:function="identification"> <emma:interpretation id="int1" emma:confidence=".75"> <person>12345</person> <name>Mary Smith</name> </emma:interpretation> <emma:interpretation id="int2" emma:confidence=".6"> <person>67890</person> <name>Jim Jones</name> </emma:interpretation> </emma:one-of> </emma:emma> </mmi:data> </mmi:doneNotification> </mmi:mmi>
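On the receiving side, an application might extract the ranked matches from such a result as sketched below. This Python fragment is an assumption-laden illustration: it expects the emma prefix to be bound to the EMMA 1.0 namespace, expects the person and name elements to be in no namespace, and uses a hypothetical function name.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"


def ranked_matches(emma_xml):
    """Return (person, name, confidence) tuples sorted by descending confidence."""
    root = ET.fromstring(emma_xml)
    matches = []
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        # emma:confidence is a namespace-qualified attribute.
        confidence = float(interp.get(f"{{{EMMA_NS}}}confidence"))
        matches.append((interp.findtext("person"), interp.findtext("name"), confidence))
    return sorted(matches, key=lambda match: match[2], reverse=True)
```

The function accepts either the EMMA document alone or a whole event that contains it, because it searches all descendants of the root element.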
This is an example of EMMA output in the case where the face image doesn't match any of the employees.
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0"> <mmi:doneNotification source="uri:faceURI" context="URI-1" status="success" requestID="request-1"> <mmi:data> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"> <emma:interpretation id="int1" emma:confidence="0.0" emma:uninterpreted="true" emma:medium="visual" emma:mode="photograph" emma:function="identification"/> </emma:emma> </mmi:data> </mmi:doneNotification> </mmi:mmi>
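A consumer of these results also needs to recognize the no-match case. The sketch below assumes that the flag is written as the namespace-qualified attribute emma:uninterpreted, as in EMMA 1.0, and that the emma prefix is bound to the EMMA namespace; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"


def is_no_match(emma_xml):
    """Return True if every interpretation in the result is marked uninterpreted."""
    root = ET.fromstring(emma_xml)
    interpretations = list(root.iter(f"{{{EMMA_NS}}}interpretation"))
    flag = f"{{{EMMA_NS}}}uninterpreted"
    # A result without any interpretation element is not treated as a no-match.
    return bool(interpretations) and all(
        interp.get(flag) == "true" for interp in interpretations
    )
```

An interaction manager could use such a check to decide whether to ask for another image instead of presenting a list of candidates.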