W3C

Registration & Discovery of Multimodal Modality Components in Multimodal Systems: Use Cases and Requirements

W3C Working Group Note 5 July 2012

This version:
http://www.w3.org/TR/2012/NOTE-mmi-discovery-20120705/
Latest published version:
http://www.w3.org/TR/mmi-discovery/
Previous version:
none
Editor:
B. Helena Rodriguez, Institut Telecom
Authors:
Piotr Wiechno, France Telecom
Deborah Dahl, W3C Invited Expert
Kazuyuki Ashimura, W3C
Raj Tumuluri, Openstream, Inc.

Abstract

This document addresses people who want to develop Modality Components for Applications that communicate with the user through different modalities such as voice, gesture, or handwriting, and/or to distribute them through a multimodal system using multi-biometric elements, multimodal interfaces or multi-sensor recognizers over a local network or "in the cloud". With this goal, this document collects a number of use cases together with their goals and requirements for describing, publishing, discovering, registering and subscribing to Modality Components in a system implemented according to the Multimodal Architecture Specification. In this way, Modality Components can be used by automated tools to power advanced Services such as: more accurate searches based on modality, behavior recognition for better interaction with intelligent software agents, and enhanced knowledge management achieved by capturing and producing emotional data.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the 5 July 2012 W3C Working Group Note of "Registration & Discovery of Multimodal Modality Components in Multimodal Systems: Use Cases and Requirements". This W3C Working Group Note has been developed by the Multimodal Interaction Working Group of the W3C Multimodal Interaction Activity.

This document was published by the Multimodal Interaction Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to www-multimodal@w3.org (subscribe, archives). All comments are welcomed and should have a subject starting with the prefix '[dis]'.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


1. Introduction

User interaction with Internet Applications on mobile phones, personal computers, tablets or other electronic Devices is moving towards a multi-mode environment in which important parts of the interaction are supported in multiple ways. This heterogeneity is driven by Applications that compete to enrich the user experience in accessing all kinds of services. More and more, Applications need interaction variety, which has been proven to provide numerous concurrent advantages. At the same time, it brings new challenges in multimodal integration, which is often quite difficult to handle in a context with multiple networks and input/output resources.

Today, users, vendors, operators and broadcasters can produce and use all kinds of different Media and Devices that are capable of supporting multiple modes of input or output. In this context, tools for authoring, editing or distributing Media are well documented for Application developers. Mature proprietary and open source tools and Services that handle, capture, present, play, annotate or recognize Media in multiple modes are available. Nevertheless, there is a lack of powerful tools or practices for a richer integration and semantic synchronization of all these Media.

To the best of our knowledge, there is no standardized way to build a web Application that can dynamically combine and control discovered modalities by querying a registry based on user-experience data and modality states. This document describes design requirements that the Multimodal Architecture and Interfaces specification needs to cover in order to address this problem.



2. Domain Vocabulary

Interaction Context
For the purposes of this document an Interaction Context represents a single exchange between a system and one or multiple users across one or multiple interaction modes, and covers the longest period of communication over which it would make sense for components to keep the information available. It can be as simple as a single period of audiovisual content (e.g. a program), a phone call or a web session. But it can also be a richer interaction combining, for example, voice, gesture and a direct interaction with a light pointer, or a shared whiteboard with an associated VoIP call that evolves into a text chat during the interaction. In these cases, a single context persists across various modality configurations. For more details see the multimodal context in the MMI architecture.
Multimodal System
For the purposes of this document a Multimodal System is any system communicating with the user through different modalities such as voice, gesture, or handwriting in the same interaction cycle identified by a unique context. In a Multimodal System the Application or the final user can dynamically switch modalities within the same context of information exchange. This is a bi-directional system with combined inputs and outputs in multiple sensorial modes (e.g. visual, acoustic, haptic, olfactory, gustatory) and modalities (e.g. voice, gesture, handwriting, biometrics capture, temperature sensing, etc.). This is also a system in which input and output data can be integrated [See Fusion] or dissociated [See Fission] in order to identify the meaning of the user's behavior, or in order to compose a more suitable, relevant and pertinent return message using multiple media, modes and modalities. [See a Multimodal System Example]

A Multimodal System can be any Infrastructure (or any IaaS), any Platform (or any PaaS) or any Software (or any SaaS) implementing human-centered multimodal communication. For more information: [See the basic component Description for a Multimodal Interaction Framework]
Modality Component
For the purpose of this document, a modality is a term that covers the way an idea is communicated or the manner in which an action is performed. In some mobile multimodal systems, for example, the primary modality is speech and an additional modality can typically be gesture, gaze, sketch, or any combination thereof. These are forms of representing information in a known and recognizable logical structure. For example, acoustic data can be expressed as a musical sound modality (e.g. a human singing) or as a speech modality (e.g. a human talking). Following this idea, in this document a Modality Component is a logical entity that handles the input and output of different hardware Devices (e.g. microphone, graphic tablet, keyboard) or software Services (e.g. motion detection, biometric changes) associated with the Multimodal System. Modality Components are responsible for specific tasks, including handling inputs and outputs in various ways, such as speech, writing, video, etc. Modality Components are also loosely coupled software modules that may be either co-resident on a device or distributed across a network.

For example, a Modality Component can be in charge, at the same time, of speech recognition and sound input management (i.e. some advanced signal-processing task like source separation). Another Modality Component can manage the complementary command inputs on two different devices: a graphics tablet and a microphone. Two Modality Components can separately manage two complementary inputs given by a single device: a camcorder. Or, finally, a Modality Component can use an external recognition web service and be responsible only for the control of the communication exchanges needed for the recognition task. [See this Modality Component Example]

In all four cases the system has a generic Modality Component for the detection of a voice command input, despite the differences in implementation. Any Modality Component can potentially wrap multiple features provided by multiple physical Devices, and more than one Modality Component can be included in a single device. To this extent the Modality Component is an abstraction of the same kind of input handled and implemented differently in each case. For more information: [See the Multimodal Interaction Framework input/output Description]
Interaction Manager
For the purposes of this document, the Interaction Manager is also a logical component, handling the multimodal integration and composition. It is responsible for all message exchanges between the components of the Multimodal System and the hosting runtime framework [See: Architecture Components]. It is a communication bus and also an event handler. Each Application can configure at least one Interaction Manager to define the required interaction logic. This is a controller at the core of all the multimodal interaction. For more information: [See the Multimodal Interaction Framework Description of an Interaction Manager]
Data Component
For the purposes of this document a Data Component is a logical entity that stores the public and private data of any module in a Multimodal System. The Data Component's primary role is to save the public data that may be required by one or several Modality Components or by other modules (e.g. a session component in the hosting framework). The Data Component can be an internal module or an external module [See: Example]; this depends on the implementation chosen by each application. However, the Interaction Manager is the only module that has direct access to the Data Component, and only the Interaction Manager can view and edit the data and communicate with external servers if necessary. As a result, the Modality Components must use the Interaction Manager as a mediator to access any private or public data in the case of an implementation following the principle of nested dolls given by the MMI Architecture Specification [See: MMI Recursion]. For the storage of private data, each Modality Component can implement its own Data Component. This private Data Component accesses external servers and keeps the data that the Modality Component may require, for example, in the speech recognition task. For more information: [See the Multimodal Interaction Framework Description of a Session Component]
Application
For the purposes of this document, the term Application refers to a collection of events, components and resources which use server-side or client-side processing and the Multimodal Architecture Specification to provide sensorial, cognitive and emotional information [See: Emotion Use Cases] through a rich multimodal user experience. For example, a multimodal Application can be implemented to use, in an integrated way, mobile Devices and cell phones, home appliances, Internet of Things objects, television and home networks, enterprise applications, web applications, "smart" cars or medical devices.
Service
For the purposes of this document, a Service is a set of functionalities associated with a process or system that performs a task and is wrapped in a Modality Component abstraction. While most industrial approaches to service discovery use hardware Devices as networked Services, and the majority of web service discovery approaches consider a Service as a software component performing a specific functionality, our definition covers both views and focuses on the Multimodal Component abstraction. Thus, a multimodal Service is any functionality wrapped in a Multimodal Component (e.g. a Modality Component or the Interaction Manager) and using one or multiple devices, device services, Device APIs or web services. For example, in the case of a Modality Component, a Service is a functionality provided to handle input or output interaction in one or more devices, sensors, effectors, players (e.g. for virtual reality Media display), on-demand SaaS Recognizers (e.g. for natural language recognition) and on-demand SaaS User Interface Widgets (e.g. for geolocation display).

For the purposes of this document, we will use the term Service Description for a set of attributes (metadata) describing a particular Service. The term Service Advertisement refers to the publication of the metadata and the Service itself, by indexing the Service in some registry and making the Service available to clients' requests.
Device
For the purposes of this document, a Device is any hardware material resource wrapped in a Modality Component. The MMI Architecture does not describe how Modality Components are allocated to hardware devices; this is dependent on the implementation. A Device can act as an input sensor. In this case, for example, cameras, haptic devices, microphones, biometric devices, keyboards, mice and writing tablets are Devices that provide input Services wrapped in one or many Modality Components. A Device can also act as an output effector. In this case displays, speakers, pressure applicators, projectors, Media players, vibrating Devices and even some sensor Devices (galvanic skin sensors, switches, motion platforms) can provide effector Services wrapped in one or many Modality Components implemented in the Multimodal System.
Media
For the purposes of this document, the multimodal Media are resources that depend on the semantics of the message, the user/author goals and the capabilities of the support itself. The multimodal Media adhere to a certain content mode (e.g. visual) and modality (e.g. animated image) and can be a document, a stream, a set of data or any other conventional logical entity used to represent the information that is communicated through devices, services and networks. In a Multimodal System, Media can be played, displayed, recognized, touched, scented, heard, shaken, pushed, shared, adapted, fused, composed, etc., and annotated to enhance integration, recognition, composition and interpretation [See the use cases for MultiModal Media Annotation with EMMA].
Fusion
For the purposes of this document, the multimodal Fusion mechanism is the integration of data from multiple Modality Components to produce specific and comprehensive unified information about the interaction. The goal of the multimodal Fusion is to integrate the data coming from a set of input Modality Components or to integrate data in the actions to be executed by a set of output Modality Components.
Fission
For the purposes of this document, the multimodal Fission mechanism refers to the media composition phenomenon: this is the process of realizing a single message on some combination of the available modalities for more than one sensorial mode. This process occurs before the processes that are dedicated to the information rendering or to the Media restitution. The goal is to generate an adequate message according to the space (in the car, home, conference room), the current activity (course, conference, brainstorm) or the preferences and profile (chair of the conference, blind user, elderly). A Fission component determines which modalities are the most relevant, selects which Media have the best content to return with the Modality Components available in the given conditions, and coordinates the final result [See the MMI Architecture Standard Life-Cycle Events]. In this way, the multimodal Fission mechanism handles the repartition of information among several Modality Components and resolves which part of the content will be generated by each Modality Component once the global multimodal content has been defined.

 

3. Use Cases

This section is a non-exhaustive list of use cases that would be enabled by implementing the Discovery & Registration of Modality Components. Each use case is written according to a template inspired by the Requirements for Home Networking Scenarios Use Cases template. This makes it possible to situate our proposal in relation to the cases covered by the Web and TV Interest Group. Each use case is structured as a list of:

[Code]
In the form UC X.X (ucSet.ucNumber)
[Description]
A high-level description of the use case
[Motivation]
Explanation of the need that the use case addresses.
[Requirements]
List of requirements implied by this use case, in two columns: a low-level requirement and a high-level requirement, both related to the MMI Architecture specification.

3.1 Use Cases SET 1 : Smart Homes

Multimodality as an assistive support in intimate, personal and social spaces. Home appliances, entertainment equipment and intelligent home Devices are available as modality components for applications. These appliances can also provide their own Interaction Managers that connect to the multimodal interface on a smartphone to enable control of intelligent home features through the user's mobile device.

3.1.1 Audiovisual Devices Acting as Smartphone Extensions

A home entertainment system providing user interface components to command and extend a mobile application.

Code
[ UC 1.1 ]
Description
A home entertainment system is used by a mobile Device as a set of user interface components.

In addition to Media rendering and playback, these Devices also act as input modalities for the smartphone's applications. The mobile Device does not have to be manipulated directly at all. A wall-mounted touch-sensitive TV can be used to navigate applications, and a wide-range microphone can handle speech input. Spatial (Kinect-style) gestures may also be used to control Application behavior.

The smartphone discovers available modalities and arranges them to best serve the user's purpose. One display can be used to show photos and movies, another for navigation. As the user walks into another room, this configuration is adapted dynamically to the new location. User intervention may be sometimes required to decide on the most convenient modality configuration. The state of the interaction is maintained while switching between modality sets. For example, if the user was navigating a GUI menu in the living room, it is carried over to another screen when she switches rooms, or replaced with a different modality such as voice if there are no displays in the new location.
Motivation
Many of today's home Devices can provide similar functionality (e.g. audio/video playback), differing only in certain aspects of the user interface. This allows continuous interaction with a specific Application as the user moves from room to room, with the user interface switched automatically to the set of Devices available in the user's present location.

On the other hand, some Devices can have specific capabilities and user interfaces that can be used to add information to a larger context that can be reused by other Applications and devices. This drives the need to spread an Application across different Devices to achieve a more user-adapted and meaningful interaction according to the context of use. Both aspects provide arguments for exploring use cases where Applications use distributed multimodal interfaces.

Requirements | Low-Level | High-Level
Distribution | LAN | Any
Advertisement | Device Description | Modality Component's Description
Discovery | The Application must handle MMI requests/responses | Fixed, Mediated, Active or Passive
Registration | The Application must use the Status Event to provide the Modality Component's Description and the register lifetime information. | Hard-State
Querying | The Application must send MMI requests/responses | Queries searching for attributes in the Description of the Modality Component (a predefined MC Data Model is needed)

3.1.2 Intelligent Home Apparatus

Mobile Devices as command mediators to control intelligent home apparatus.

Code
[ UC 1.2 ]
Description
Smart home functionality (window blinds/lights/air conditioning etc.) is controlled through a multimodal interface, composed from modalities built into the house itself (e.g. speech and gesture recognition) and those available on the user's personal Devices (e.g. smartphone touchscreen). The system may automatically adapt to the preferences of a specific user, or enter a more complex interaction if multiple people are present.

Sensors built into various Devices around the house can act as input modalities that feed information to the home and affect its behavior. For example, lights and temperature in the gym room can be adapted dynamically as workout intensity recorded by the fitness equipment increases. The same data can also increase or decrease volume and tempo of music tracks played by the user's mobile Device or the home's Media system. In addition, the intelligent home in tandem with the user's personal Devices can monitor user behavior for emotional patterns such as 'tired' or 'busy' and adapt further.
Motivation
The increase in the number of controllable Devices in an intelligent home creates a problem with controlling all the available services in a coherent and useful manner. Having a shared context, built from information collected through sensors and direct user input, would improve recognition of user intent and thus simplify interactions.

In addition, multiple input mechanisms could be selected by the user based on Device type, level of trust and the type of interaction required for a particular task.
Requirements | Low-Level | High-Level
Distribution | LAN, WAN | Any
Advertisement | Application Manifests, Device Description | Modality Component's Description
Discovery | The Application must handle MMI requests/responses | Fixed
Registration | The Modality Component's Description and the register lifetime information may be predefined in the home server. | Hard-State
Querying | The Application must send MMI requests/responses | MMI Lifecycle Events using known descriptions


3.2 Use Cases SET 2 : Personal Externalized Interfaces

Multimodality as a way to simplify and personalize interaction with complex Devices that are shared by family or close friends, for example cars. The user's personal mobile Device communicates with the external shared equipment in a way that enables the functionality of one to be available through the interface of the other.

3.2.1 Smart Cars

Multimodality as a nomadic tool to configure user interfaces provided in Smart Cars.

Code
[ UC 2.1 ]
Description
Basic in-car functionality is standardized to be managed by other devices. A user can control seat, radio or AC settings through a personalized multimodal interface shared by the car and her personal mobile device. User preferences are stored on the mobile Device (or in the cloud), and can be transferred across different car models handling a specific functionality (e.g. all cars with touchscreens should be able to adapt to a "high contrast" preference).

The car can make itself available as a complex modality component that wraps around all functionality and supported modalities, or as a collection of modality components such as touchscreen / speech recognition / audio player. In the latter case, certain user preferences may be shared with other environments.

For example, a user may opt to select the "high contrast" scheme at night on all of her displays, in the car or at home. A car that provides a set of modalities can also be used by the mobile Device to compose an interface for its functionality, for example to manage playback of music tracks through the car's voice control system. Sensor data provided by the phone can be mixed with data recorded by the car's own sensors to profile user behavior, which can be used as context in the multimodal interaction.
Motivation
User interface personalization is a task that most often needs to be repeated for every Device a user wishes to interact with regularly. With complex devices, this task can also be very time-consuming, which is problematic if the user regularly accesses similar, but not identical, Devices, as in the case of several cars rented over a month.

A standardized set of personal information and preferences that could be used to configure personalizable Devices automatically would be very helpful for all these cases in which the interaction becomes a customary practice.

Requirements | Low-Level | High-Level
Distribution | LAN, WAN, MAN | Distributed
Advertisement | Constructor-Dependent | Modality Component's Description
Discovery | The Application must handle MMI requests/responses | Fixed, Mediated
Registration | Multi-node Registry | Soft-State
Querying | The Application must send MMI requests/responses | Requests limited to a federated group of MC


3.3 Use Cases SET 3 : Public Spaces

Multimodality as a way to access public Devices to perform tasks or for fun. Functionality of a user's personal Device is projected to modality components available in the immediate proximity. Public Devices present a multimodal interface tailored to the user's preferences. The user's mobile Device can also advertise itself as a modality component to allow for direct targeted messaging without privacy abuse.

3.3.1 Interactive Spaces

Multimodality as a support for advertising and accessing information on public user interfaces.

Code
[ UC 3.1 ]
Description
Interactive installations such as touch-sensitive or gesture-tracking billboards are set up in public places. Objects that present public information (e.g. a map of a shopping mall) can use a multimodal interface (built-in or in tandem with the user's mobile devices) to simplify user interaction and provide faster access. Other setups can stimulate social activities, allowing multiple people to enter an interaction simultaneously to work together towards a certain goal (for a prize) or just for fun (e.g. play a musical instrument or control a lighting exhibition). In a context where privacy is an issue (for example, with targeted/personalized alerts or advertisements), the user's mobile Device acts as a complex modality component or a set of modality components for an Interaction Manager running on the public network. This allows the user to receive relevant information in the way she sees fit. These alerts can serve as triggers for interaction with public Devices if the user chooses to do so.
Motivation
Public spaces provide many opportunities for engaging, social and fun interaction. At the same time, preserving privacy while sharing tasks and activities with other people is a major issue in ambient systems. These systems may also deliver personalized information in tandem with more general services presented publicly.

Trustworthy discovery of the Modality Components available in such environments is necessary to guarantee personalization and privacy in public-space applications.
Requirements | Low-Level | High-Level
Distribution | LAN, WAN, MAN | Any
Advertisement | Application Manifests, Device Description, User Profile | Modality Component's Description
Discovery | The Application must handle MMI requests/responses | Fixed, Mediated, Passive, Active
Registration | The Modality Component's Description and the register lifetime information may be predefined in the space server. | Soft-State
Querying | The Application must send MMI requests/responses | Requests limited to a federated group of MC

3.3.2 In-Office Events Assistance

Automatic in-office presentation hardware (projectors, speakers, ASR, gesture tracking) controlled through a mobile device.

Code
[ UC 3.2 ]
Description
A conference room where a series of meetings will take place. People can go in and out of the room before, after and during the meeting. The door is "touched" with a badge. The system activates any available display in the room, as well as the room's and the event's default Applications: the outline Modality Component, the notification Modality Component and the guiding Modality Component. The chair of the meeting is notified, by a dynamically composed graphic animation, an audio notification or a mobile phone notification, about the Modality Components' availability, with a shortcut to the default Application instance endpoint.

The chair of the meeting selects a setup procedure by text among the discovered options. These options could be, for example: photo step-by-step instructions (smartphone, HDTV display, Web site), audio instructions (MP3 audio guide, room speakers reproduction, HDTV audio) or RFID-enhanced instructions (mobile SmartTag reader, RFID reader for smartphone). The chair of the meeting chooses the room speakers reproduction, the guiding Service is activated and he starts to set up the video projector. Then some attendees arrive. The chair of the meeting changes to the slide-show option and continues to follow the instructions at the same step where they were paused, but with another, more private modality, for example a smartphone slideshow.
Motivation
Meeting room environments are more and more complex. During meeting-room setup, users spend a lot of time trying to put up presentations and make remote connections: a task that usually affects their emotional state, producing anxiety and increasing their stress level. Assistance Applications using modality discovery can reduce the task load and the cognitive load for the user with reassuring, multimodal, context-aware interfaces.
Requirements | Low-Level | High-Level
Distribution | LAN, WAN | Any
Advertisement | Application Manifests, Device Description, Situation Description, User Profile | Modality Component's Description
Discovery | The Application must handle MMI requests/responses | Fixed, Passive
Registration | One part of the Modality Component's Description and the register lifetime information may be predefined in the space server. | Soft-State, Hard-State
Querying | The Application must send MMI requests/responses | Requests limited to a federated group of Modality Components


3.4 Use Cases SET 4 : Health Sensing

Multimodality as sensor and control support for medical reporting.

3.4.1 Health Notifiers

Multimodal Applications to alert and report medical data.

Code
[ UC 4.1 ]
Description
In medical facilities, a complex Modality Component provides multiple options to control sensor operations by voice or gesture (for example, "start reading my blood pressure now"). The complex Modality Component is attached to a smartphone. The Application integrates information from multiple sensors (for example, blood pressure and heart rate), reports medical sensor readings periodically (for example, to a remote medical facility) and sends alerts when unusual readings/events are detected.
Motivation
In critical health situations, like medical emergencies, multimodality is the most effective way to communicate alerts. With this goal, a dynamic and effective Modality Component discovery is needed. This is also the case when the goal is to monitor the health evolution of a person in all its complexity with rich media.
Requirements | Low-Level | High-Level
Distribution | LAN | Any
Advertisement | Application Manifests, Device Description, User Profile | Modality Component's Description
Discovery | The Application must handle MMI requests/responses | Fixed, Passive
Registration | The Modality Component's Description and the register lifetime information may be predefined in the space server. | Hard-State
Querying | The Application must send MMI requests/responses | Queries to a Centralized Registry


3.5 Use Cases SET 5 : Synergistic Interaction and Recognition

Automatic discovery of Modality Components' resources providing high-level recognition and analysis features to adapt application Services and user interfaces, track user behavior and assist the user in temporarily difficult tasks.

3.5.1 Multimodal support for recognition

Code
[ UC 5.1 ]
Description
One Modality Component is an audio recognizer trained with the most common sounds in the house, to raise alerts in case of emergency. In the same house, a security Application uses a video recognizer to identify people at the front door. These two Modality Components advertise some of their features in order to cooperate with a remote home-management Application that uses a discovery mechanism for its Fission feature. The Application validates and completes both recognition results while using them.
Motivation
Recognizer development has reached a point of maturity where, if we want to dramatically enhance recognition, complementary multimodal results are needed. To achieve this, an image recognizer can use results coming from other kinds of recognizers (e.g. an audio recognizer) in the network that are engaged in the same interaction cycle.
Requirements | Low-Level | High-Level
Distribution | LAN, WAN, MAN | Any
Advertisement | Application Manifests | Modality Component's Description
Discovery | The Application must handle MMI requests/responses | Fixed, Passive
Registration | The Modality Component's Description and the register lifetime information may be predefined in the server. | Hard-State
Querying | The Application must send MMI requests/responses | Queries to a Centralized Registry

3.5.2 Discovery to enhance synergic interaction

Multimodal discovery as a tool to avoid difficult interaction.

Code
[ UC 5.2 ]
Description
A person who works mostly with a PC has a problem with his right arm and hands. He is unable to use a mouse or a keyboard for a few months. He can point at things, sketch, clap and make gestures, but he cannot make any precise movements. An Application proposes a generic interface to give this person access to his most important tasks on his personal devices: calling someone, opening a mailbox, accessing his agenda or navigating some Web pages. It proposes child-oriented, intuitive interfaces like a clapping-based Modality Component, a highly articulated TTS component or reduced gesture-input widgets. Other Modality Components, like phone Devices with very big numbers, very simple remote controls, screens displaying text at high resolution, or voice command Devices based on a reduced number of orders, can also be used.
Motivation
One of the main indicators of the usability of an Application is the level of accessibility it provides. The opportunity for all users to receive and to deliver all kinds of information, regardless of the information format or the type of user profile, state or impairment, is a recurrent need in web applications. One of the means to achieve accessibility is the design of a more synergic interaction based on the discovery of multimodal Modality Components.

Synergy is two or more entities functioning together to produce a result that is not obtainable independently; it means "working together". For example, in nomadic Applications (which are always affected by a changing context), avoiding disruptive interaction is an important issue. In these applications, user interaction is difficult, distracted and less precise. Multimodal discovery can increase synergic interaction by offering new possibilities better adapted to the current disruptive conditions. A well-founded discovery mechanism can enhance the Fusion process for target groups of users experiencing permanent or temporary learning difficulties or with sensorial, emotional or social impairments.
Requirements | Low-Level | High-Level
Distribution | LAN, WAN, MAN | Any
Advertisement | Application Manifests, Device Description, User Profile | Modality Component's Description
Discovery | The Application must handle MMI requests/responses | Fixed, Mediated, Passive, Active
Registration | Multi-node Registry | Soft-State
Querying | The Application must send MMI requests/responses | Distributed Registry, handled by pattern matching

 

4. Requirements

4.1 Distribution

Modality Components can be distributed in a centralized way, a hybrid way or a fully decentralized way. For Discovery & Registration purposes, the distribution of the Modality Components influences how many requests the Multimodal System can handle in a given time interval, and how efficiently it can execute these requests. The MMI Architecture Specification is distribution-agnostic: Modality Components can be located anywhere, on a local network or on the Web. The decision about how to distribute the Multimodal Constituents depends on the implementation.

4.2 Advertisement

Advertisement of Modality Components is one of the most influential criteria for evaluating how well the system can discover Modality Components, starting from the description mechanism. A pertinent and expressive description allows the Multimodal System to achieve correctness in Modality Component retrieval, because it enables the result to match the application's or user's request more closely. On the other hand, Advertisement also affects completeness in Modality Component retrieval: to return all registered matching instances corresponding to the user's request, the request criteria must match some basic attributes defined in the Modality Component's Description. A Modality Component's Description that lists some multimodal attributes and actions is therefore required.

For this reason, conforming specifications should provide a means for applications to advertise Modality Components in any of the distributions enumerated in section 4.1.

This Modality Component's Description can be expressed as a data structure (e.g. in ASN.1), as a simple list of attribute-value pairs, as a list of attributes with hierarchical tree relationships (e.g. the intentional name schema in INS), as a WSDL document or as an XML manifest document. The language and form of the document are Application dependent.

The kind of data described by this Description also depends on the implementation; some examples of the information that could be advertised by the Modality Component's Description data can be found in the Best practices for creating MMI Modality Components.
A data model with some examples of attributes specific to the multimodal domain can be suggested by the MMI Architecture Specification to guarantee interoperability, expressiveness and relevance of the description's content from a Multimodal Interaction point of view. In this way, this data model can also be used as information support for annotation, for modality selection and for the Fusion and Fission mechanisms in the analyzers or synthesizers implemented in the Multimodal System.

As an example, Figure 1 shows a manifest that can be created at Design Time or at Load Time, and that can be accompanied by a list of Media handled by the Modality Component. This can be an annotated list of Media (e.g. in EMMA) and their URIs.


Figure 1: Manifest that can be created either at Design Time or at Load Time
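
Since the manifest format itself is left to the Application, the fragment below is only a sketch of what such a manifest could look like as an XML document. All element and attribute names (modalityComponent, mode, modality, endpoint, lifetime, mediaList) are hypothetical and are not defined by the MMI Architecture or by this document; they merely illustrate the kind of attribute-value information that a Modality Component's Description could advertise, together with an annotated Media list and its URIs.

    <!-- Hypothetical manifest for a Modality Component; names and structure are illustrative only -->
    <manifest xmlns="http://example.org/2012/mc-manifest">
      <modalityComponent id="mc-voice-01" version="1.0">
        <description>Wide-range microphone with speech recognition</description>
        <mode>acoustic</mode>                              <!-- sensorial mode -->
        <modality>speech</modality>                        <!-- modality handled -->
        <endpoint uri="http://192.168.1.20:3000/mmi"/>     <!-- where MMI life-cycle events are accepted -->
        <lifetime seconds="3600"/>                         <!-- soft-state lifetime; omit for hard-state -->
        <mediaList>
          <!-- annotated Media handled by the component, e.g. EMMA recognition results -->
          <media type="application/emma+xml" uri="http://192.168.1.20:3000/media/emma"/>
          <media type="audio/x-wav" uri="http://192.168.1.20:3000/media/raw"/>
        </mediaList>
      </modalityComponent>
    </manifest>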


To store these descriptions, the MMI Data Component must be used as a directory support in a centralized or a distributed manner.

A centralized registry hosted by a Data Component should provide a register of Modality Component states and their descriptions, if the problems associated with having a centralized registry, such as a single point of failure, bottlenecks, the scalability of data replication, the notification of subscribers when performing system upgrades, or the handling of Service versioning, are not an issue. This decision will depend on the kind of Application being implemented.

A distributed registry hosted by multiple Data Components should provide multiple public or private registries. Data Components can be federated in groups following a mechanism similar to P2P or METEOR-S registries, if time-consuming inquiries across the federated environment of Data Components, or the risk of some inconsistencies, are not an issue. This decision will also depend on the kind of Application to implement.

Depending on the parameters given by the Application logic, the distribution of the Modality Components or the context of interaction, the advertisement may or may not persist. It can be a hard-state advertisement with an infinite Modality Component lifetime, or a soft-state advertisement in which a lifetime is associated with the Modality Component (e.g. the timestamps assigned in OGSA); if the advertisement is not renewed before expiration, the Modality Component is assumed to be no longer available.

The communication model used by the registries in the Data Component must be specified by the interaction life-cycle events of the MMI Architecture. The Data Component delivers the descriptions of the Modality Components through an exchange of requests and responses with MMI events.

4.3 Discovery

The MMI Architecture specification does not focus on the discovery implementation and is not language-oriented. The discovery of Modality Components can be implemented in various ways, e.g. in ECMAScript or in any other language, and any infrastructure can be used as well.

Nevertheless, conforming specifications must use the protocol proposed by the interaction life-cycle events to discover Modality Components. Bootstrapping properties can be specified in a generic way, based on Modality Component types defined by a domain taxonomy. These properties play an important role in the announcement to the discovery mechanism.

For example, at Load Time, on bootstrapping, when access to the registry is not preconfigured, the Interaction Manager can try to discover Modality Components using one of these four mechanisms: fixed discovery, mediated discovery, active discovery or passive discovery.

In the fixed discovery case, the Interaction Manager and the Modality Components are assumed to know each other's addresses and the port number to listen on. In this way, the Interaction Manager can send a request via the StatusRequest/StatusResponse pair to the Modality Component, asking for its availability, manifest address and Media list, as shown in Figure 2. In other implementations, the Modality Component can also announce its availability and description directly to the Interaction Manager using the same event.


Figure 2: Interaction Manager sending a request via the StatusRequest/StatusResponse pair to the Modality Components
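
Since the original figure is not reproduced here, the exchange can be sketched with the XML syntax of the MMI life-cycle events. The URIs, the request identifiers and the content of the mmi:data payload below are assumptions of this sketch (the MMI Architecture leaves the data payload application-specific); see [MMI-ARCH] for the normative event syntax.

    <!-- Interaction Manager asks a known Modality Component for its availability (sketch) -->
    <mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
      <mmi:statusRequest source="http://im.example.org/mmi"
                         target="http://mc.example.org:3000/mmi"
                         requestID="req-1"
                         requestAutomaticUpdate="true"/>
    </mmi:mmi>

    <!-- Possible response: the Modality Component reports that it is alive and returns,
         in an application-specific payload, its manifest address and Media list address -->
    <mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
      <mmi:statusResponse source="http://mc.example.org:3000/mmi"
                          target="http://im.example.org/mmi"
                          requestID="req-1"
                          status="alive"
                          automaticUpdate="true">
        <mmi:data>
          <manifestURL>http://mc.example.org:3000/manifest.xml</manifestURL>
          <mediaListURL>http://mc.example.org:3000/media.xml</mediaListURL>
        </mmi:data>
      </mmi:statusResponse>
    </mmi:mmi>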


In mediated discovery, the Interaction Manager can use the scanning features provided by the underlying network (e.g. DHCP) to look for Modality Components tagged in their descriptions with a specific group label (e.g. in federated distributions). If the Modality Component is not tagged, it can use some generic mechanism provided only to subscribe to a generic 'welcome' group (e.g. in JXTA implementations). In this case, the Modality Component should send a request via the Status event to the Interaction Manager to subscribe to the register. After the bootstrapping mechanism, the Interaction Manager can ask for the manifest address and the Media list address needed for advertisement. In other implementations the Modality Component can send them directly to the register after bootstrapping.

In active discovery, the Interaction Manager can initiate a multicast, anycast or broadcast request or any other push mechanism depending on the underlying networking mechanism implemented. This should be done via the Status Event or the Extension Notification defined by the MMI Architecture.

Finally, in passive discovery, the Interaction Manager can listen for advertisement messages coming from Modality Components over the network on a known port. In this case, the Interaction Manager may provide some directory Service feature (e.g. acting like a Jini Lookup Service or a UPnP control point), looking for the Modality Components' announcements, which should be published with the Status Event or the Extension Notification. More precise details of the discovery protocols are out of scope for this document and are implementation dependent.

4.4 Registration

The Interaction Manager is currently used as support for the Modality Components' registration through the use of context requests. We propose to extend the Interaction Manager to cover registration and discovery, in order to avoid the complexity of an approach that would add a new Component.

Figure 3 shows a possible example of registration in which a Modality Component uses a StatusRequest/StatusResponse pair to register its description on a known port. This is a case of fixed discovery in which a Modality Component announces its presence to the Interaction Manager and the Interaction Manager then inserts this information into some registry according to the implemented data model.


Figure 3: Modality Components using a StatusRequest/StatusResponse pair to register its description
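
In the absence of the original figure, the announcement can be sketched as follows: the Modality Component sends a statusRequest on the known port, carrying a pointer to its description, and the Interaction Manager answers with a statusResponse once the description has been inserted into the registry. The registration payload inside mmi:data (manifest address and lifetime) is an assumption of this sketch, since the MMI Architecture leaves this data application-specific.

    <!-- A Modality Component announces its presence and description on a known port (sketch) -->
    <mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
      <mmi:statusRequest source="http://mc.example.org:3000/mmi"
                         target="http://im.example.org/mmi"
                         requestID="reg-42">
        <mmi:data>
          <!-- hypothetical registration payload -->
          <manifestURL>http://mc.example.org:3000/manifest.xml</manifestURL>
          <lifetime seconds="3600"/>   <!-- soft-state lifetime; omit for hard-state registration -->
        </mmi:data>
      </mmi:statusRequest>
    </mmi:mmi>

    <!-- The Interaction Manager confirms that the Modality Component is known and registered -->
    <mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
      <mmi:statusResponse source="http://im.example.org/mmi"
                          target="http://mc.example.org:3000/mmi"
                          requestID="reg-42"
                          status="alive"/>
    </mmi:mmi>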


Soft-state and hard-state registration depend on the Modality Component lifetime information. In soft-state Advertisement, a registration renewal is needed. This renewal mechanism should be implemented with attributes proposed in the MMI Architecture, like RequestAutomaticUpdate, or with the ExtensionNotification. On the other hand, explicit de-registration should be implemented by the use of the Status Event.
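
A soft-state renewal could, for instance, be expressed as the ExtensionNotification sketched below, sent by the Modality Component before its advertised lifetime expires. The event name "registrationRenewal", the context URI and the payload are assumptions of this example; the MMI Architecture only defines the ExtensionNotification envelope and leaves its name and data application-specific.

    <!-- Soft-state renewal (sketch): the Modality Component refreshes its registration -->
    <mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
      <mmi:extensionNotification name="registrationRenewal"
                                 context="http://im.example.org/context/registry"
                                 source="http://mc.example.org:3000/mmi"
                                 target="http://im.example.org/mmi"
                                 requestID="renew-7">
        <mmi:data>
          <lifetime seconds="3600"/>   <!-- hypothetical element: requested new lifetime -->
        </mmi:data>
      </mmi:extensionNotification>
    </mmi:mmi>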

As a consequence of the structure of the MMI Architecture, which follows the 'Russian doll' model, the distribution of Data Components in nested Modality Components facilitates the implementation of a multi-node registry, a mechanism also available in UDDI, Jini and SLP. In this way, the balancing of the registry load can be resolved using the nested Data Component structure. This will also depend on the implementation.

4.5 Querying

In the MMI Architecture the query lifecycle must be handled with the Interaction Life-Cycle Events protocol. The monitoring of the state of a Modality Component is a shared responsibility between the Modality Component and the Interaction Manager. These components should use StatusRequest/StatusResponse or ExtensionNotification events to query updates, as shown in Figure 4.

The expressiveness of the discovery query and the format of the query result can be defined by the Interaction Life-cycle Events since the routing model depends on the implementation and is out of the scope of this document.


Figure 4: Modality Components and Interaction Manager using StatusRequest/StatusResponse or ExtensionNotification events to query updates
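
One possible shape for this exchange is sketched below: the Interaction Manager subscribes to automatic updates via a statusRequest, and the Modality Component later pushes a state change through an extensionNotification. As in the previous sketches, the URIs, the event name "stateUpdate" and the payload inside mmi:data are assumptions, not normative syntax.

    <!-- The Interaction Manager subscribes to state updates from a Modality Component (sketch) -->
    <mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
      <mmi:statusRequest source="http://im.example.org/mmi"
                         target="http://mc.example.org:3000/mmi"
                         requestID="query-3"
                         requestAutomaticUpdate="true"/>
    </mmi:mmi>

    <!-- Later, the Modality Component pushes a state change -->
    <mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
      <mmi:extensionNotification name="stateUpdate"
                                 context="http://im.example.org/context/registry"
                                 source="http://mc.example.org:3000/mmi"
                                 target="http://im.example.org/mmi"
                                 requestID="query-3-update-1">
        <mmi:data>
          <state>busy</state>   <!-- hypothetical Modality Component state -->
        </mmi:data>
      </mmi:extensionNotification>
    </mmi:mmi>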


The correctness of the query result can be ensured by a pattern-matching mechanism during the filtering and selection of Modality Components based, in the case of a centralized registry implementation, on the information stored as Modality Component Descriptions in the Data Component registry.

In the case of a distributed implementation in federated groups (e.g. by modality type, capability, scope, zones, DNS subgroups or intent categories), the correctness of the query result is limited to the administrative boundary defined by the underlying networking mechanism (e.g. UPnP or SLP), which is also out of the scope of this document.

4.6 Open Issues

During this work, the MMI Discovery and Registration Subgroup did not have time to explore topics like fault tolerance in Modality Component nodes or in Modality Components' communication during discovery, registration or state updates. Consistency maintenance is a subject of great interest, and the MMI Discovery and Registration Subgroup suggests that the MMI Working Group further investigate this topic from the point of view of the MMI Architecture, possibly with the use of specified attributes like RequestAutomaticUpdate or the Extension Notification and the use of Modality Component states.

5.1 Web and TV Interest Group

The Web and TV Interest Group's work covers audio-visual content, e.g., broadcasting programs and Web pages, and related services delivered by satellite and terrestrial broadcasting, or via cable services, as well as delivery through IP. This group is currently working on services discovery for home-networking scenarios from the perspective of content providing.

5.2 Web Intents Task Force

The Web Intents Task Force works on a joint deliverable of the WebApps and Device APIs Working Groups that aims to produce a way for web applications to register themselves as handlers for services, and for other web applications to interact with such services (which may have been registered through Intents or in other implementation-specific manners) through simple requests brokered by the user agent (a system commonly known as Intents, or Activities). This Task Force is currently working on registration for a limited set of modality types, from a unimodal perspective oriented towards process announcement and description.

5.3 Protocols and Formats Working Group

This working group covers the domain of web accessibility. Its objective is to explore how to make web content usable by persons with disabilities. The Accessible Rich Internet Applications Candidate Recommendation is primarily focused on developers of Web browsers, assistive technologies, and other user agents; developers of Web technologies (technical specifications); and developers of accessibility evaluation tools. WAI-ARIA provides a framework for adding attributes to identify features for user interaction, how they relate to each other, and their current state. WAI-ARIA describes new navigation techniques to mark regions and common Web structures such as menus, primary content, secondary content, banner information, and other types of Web structures. This work is mostly oriented towards web content, web UI components and web event annotation.

A. References

A.1 Key References

[ECMA-262]
ECMA International. ECMAScript Language Specification, Third Edition. December 1999. URL: http://www.ecma-international.org/publications/standards/Ecma-262.htm
[EMMA]
Michael Johnston. EMMA: Extensible MultiModal Annotation markup language. 10 February 2009. W3C Recommendation. URL: http://www.w3.org/TR/2009/REC-emma-20090210/
[MMI-ARCH]
Jim Barnett. Multimodal Architecture and Interfaces. 12 January 2012. W3C Candidate Recommendation. URL: http://www.w3.org/TR/2012/CR-mmi-arch-20120112/
[SLP]
E. Guttman et al. RFC 2608: Service Location Protocol, Version 2. June 1999. URL: http://www.ietf.org/rfc/rfc2608.txt
[WSDL]
Roberto Chinnici, Jean-Jacques Moreau, Arthur Ryman, Sanjiva Weerawarana. Web Services Description Language (WSDL) Version 2.0. 26 June 2007. URL: http://www.w3.org/TR/wsdl20/
[XML]
Tim Bray, Jean Paoli, C.M. Sperberg-McQueen, Eve Maler, John Cowan, François Yergeau. XML Extensible Markup Language, XML 1.1, 4 February 2004. URL: http://www.w3.org/TR/xml11/




A.2 Other References

[A-INTENT]
Google. Android Intent Class Documentation Available at URL: http://developer.android.com/reference/android/content/Intent.html
[ARIA]
James Craig et al. Accessible Rich Internet Applications WAI-ARIA version 1.0. W3C Working Draft, 16 September 2010. Available at URL: http://www.w3.org/TR/wai-aria/
[ASN.1]
John Larmouth, Douglas Steedman, James E. White. ASN.1 (Abstract Syntax Notation One). International Telecommunication Union, ISO/IEC 8824-1 to 8824-4. Available at URL: http://www.itu.int/ITU-T/asn1/introduction/index.htm
[ECHONET]
Echonet Consortium. Detailed stipulation for Echonet Device Objects Available at URL: http://www.echonet.gr.jp/english/spec/pdf/spec_v1e_appendix.pdf
[EMOTION-USE]
Marc Schröder, Enrico Zovato, Hannes Pirker, Christian Peter, Felix Burkhardt. Emotion Incubator Group Use Cases Available at URL: http://www.w3.org/2005/Incubator/emotion/XGR-emotion/#AppendixUseCases
[IANA]
IANA. IANA Domain Name Services. ICANN - Internet Assigned Numbers Authority (IANA). Available at URL: http://www.iana.org/
[INS]
William Adjie-Winoto, Elliot Schwartz, Hari Balakrishnan, Jeremy Lilley. INS: Intentional Naming System Available at URL: http://nms.csail.mit.edu/projects/ins/
[JINI]
Sun Microsystems / Apache Software Foundation. Apache River (formerly Jini). Available at URL: http://river.apache.org/about.html
[INS/TWINE]
Magdalena Balazinska, Hari Balakrishnan, David Karger. INS/Twine. Available at URL: http://nms.csail.mit.edu/projects/twine/
[JXTA]
Sun Microsystems. JXTA (Juxtapose) Protocols Specification v2.0. 16 October 2007. Available at URL: http://jxta.kenai.com/Specifications/JXTAProtocols2_0.pdf
[METEOR-S]
Kunal Verma, Karthik Gomadam, Amit P. Sheth, John A. Miller, Zixin Wu. METEOR-S: Semantic Web Services and Processes Technical Report, 06/24/2005. Available at URL: http://lsdis.cs.uga.edu/projects/meteor-s/
[OGSA]
I. Foster (Argonne National Laboratory) et al. The Open Grid Services Architecture (OGSA), Version 1.5. 24 July 2006. Available at URL: http://www.ogf.org/documents/GFD.80.pdf
[RES-DIS]
Reaz Ahmed et al. Resource and Service Discovery in Large-Scale Multi-Domain Networks. IEEE Communications Surveys & Tutorials, 4th Quarter 2007, Vol. 9. Available at URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5473882&isnumber=5764312
[UDDI]
Luc Clement, Andrew Hately, Claus von Riegen, Tony Rogers. UDDI Spec Technical Committee Draft, Version 3.0.2. Dated 2004/10/19. Available at URL: http://uddi.org/pubs/uddi_v3.htm
[UPNP-DEVICE]
UPnP Forum. UPnP Device Architecture 1.0 Version 1.0, 15 October 2008. PDF document. URL: http://www.upnp.org/specs/arch/UPnP-arch-DeviceArchitecture-v1.0-20081015.pdf
[UPNP-CONTENT]
UPnP Forum. ContentDirectory:4 Service Standardized DCP. Version 1.0, 31 December 2010. PDF document.URL: http://www.upnp.org/specs/av/UPnP-av-ContentDirectory-v4-Service-20101231.pdf
[WEB-INTENTS]
Paul Kinlan. Web Intents. Available at URL: http://webintents.org/