Multimodal Interaction Activity

Extending the Web to support multiple modes of interaction.

News

Introduction

The Mission

The Multimodal Interaction Activity seeks to extend the Web to allow users to dynamically select the most appropriate mode of interaction for their current needs, including any disabilities, whilst enabling developers to provide an effective user interface for whichever modes the user selects. Depending upon the device, users will be able to provide input via speech, handwriting, and keystrokes, with output presented via displays, pre-recorded and synthetic speech, audio, and tactile mechanisms such as mobile phone vibrators and Braille strips.

Multimodal interaction offers significant ease-of-use benefits over uni-modal interaction, for instance, when hands-free operation is needed, for mobile devices with limited keypads, and for controlling other devices when a traditional desktop computer is unavailable to host the application user interface. Interest in multimodal interaction is being driven by advances in embedded and network-based speech processing, which are creating opportunities for integrated multimodal Web browsers and for solutions that separate the handling of visual and aural modalities, for example, by coupling a local XHTML user agent with a remote VoiceXML user agent.

Target Audience

The target audience of the Multimodal Interaction Working Group (member only link) is vendors and service providers of multimodal applications. The group should include a range of organizations in different industry sectors, such as:

Mobile and hand-held devices
As a result of increasingly capable networks, devices, and speech recognition technology, the number of multimodal applications, especially mobile applications, is growing rapidly. Multimodal Voice Search in particular is a relatively new and compelling use case, and has been implemented in applications by a number of companies, including Google, Microsoft, Yahoo, Vlingo, SpeechCycle, Novauris, AT&T, Openstream, Vocalia, Metaphor Solutions and Sound Hound. Speech offers a welcome means to interact with smaller devices, allowing one-handed and hands-free operation. Users benefit from being able to choose which modalities they find convenient in any situation. The Working Group should be of interest to companies that develop smart phones and personal digital assistants, or that are interested in providing tools and technology to support the delivery of multimodal services to such devices.
Please note that a related effort has recently been initiated in the W3C by the HTML Speech Incubator Group (HTML Speech XG). The focus of the XG is developing proposals for accessing speech recognition and speech synthesis from HTML5 browsers, and Voice Search and Speech Command Interfaces are possible use cases for these technologies in the browser. However, the XG does not attempt to address modalities other than speech, such as handwriting, emotion, or the wide variety of present and future input modalities. Similarly, it doesn't attempt to address non-browser contexts. In contrast, the Multimodal Architecture provides a generic framework for modality integration and control. Speech in the browser can be seen as a special case of the kind of modality integration covered by the MMI Architecture. The Multimodal Interaction Working Group has been collaboratively working with the XG, and will continue to liaise with them on topics of common interest. For example, the XG has adopted EMMA as a speech recognition result format.
Home appliances, e.g., TV, and home networks
Multimodal interfaces are expected to add value to remote control of home entertainment systems, and to find a role in other systems around the home. Companies involved in developing embedded systems and consumer electronics should be interested in W3C's work on multimodal interaction.
Enterprise office applications and devices
Multimodal interaction has benefits for desktops, wall-mounted interactive displays, multi-function copiers and other office equipment, offering a richer user experience and the chance to add modalities like speech and pens to existing modalities like keyboards and mice. W3C's standardization work in this area should be of interest to companies that develop client software and application authoring technologies, and that wish to ensure that the resulting standards live up to their needs.
Intelligent IT-ready cars
With the emergence of dashboard-integrated, high-resolution color displays for navigation, communication and entertainment services, W3C's work on open standards for multimodal interaction should be of interest to companies developing the next generation of in-car systems.
Medical applications
Mobile healthcare professionals and practitioners of telemedicine will benefit from multimodal standards for interactions with remote patients as well as for collaboration with distant colleagues.

Current Situation

The Multimodal Interaction Working Group was launched in 2002 following a joint workshop between the W3C and the WAP Forum. The Working Group's initial focus was on use cases and requirements. This led to the publication of the W3C Multimodal Interaction Framework, and in turn to work on extensible multi-modal annotations (EMMA), and InkML, an XML language for ink traces. The Working Group has also worked on integration of composite multimodal input; dynamic adaptation to device configurations, user preferences and environmental conditions (now transferred to the Device Independence Activity); modality component interfaces; and a study of current approaches to interaction management. The Working Group has now been re-chartered through 31 July 2013 under the terms of the W3C Patent Policy (5 February 2004 Version). To promote the widest adoption of Web standards, W3C seeks to issue Recommendations that can be implemented, according to this policy, on a Royalty-Free basis. The Working Group is chaired by Deborah Dahl. The W3C Team Contact is Kazuyuki Ashimura.

We want to hear from you!

We are very interested in your comments and suggestions. If you have implemented multimodal interfaces, please share your experiences with us, as we are particularly interested in reports on implementations and their usability for both end-users and application developers. We welcome comments on any of our published documents. If you have a proposal for a multimodal authoring language, please let us know. To subscribe to the discussion list, send an email to www-multimodal-request@w3.org with the word subscribe in the subject header. Previous discussion can be found in the public archive. To unsubscribe, send an email to www-multimodal-request@w3.org with the word unsubscribe in the subject header.

How to join the Working Group

If your organization is already a member of W3C, ask your W3C Advisory Committee Representative (member only link) to fill out the online registration form to confirm that your organization is prepared to commit the time and expense involved in participating in the group. You will be expected to attend all Working Group meetings (about 3 or 4 times a year) and to respond in a timely fashion to email requests. Further details about joining are available on the Working Group (member only link) page. Requirements for patent disclosures, as well as terms and conditions for licensing essential IPR, are given in the W3C Patent Policy.

More information about the W3C is available, as is information about joining W3C.

Patent Disclosures

W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent.

Revised publication target dates

Specification                           FPWD                                         LC                                    CR         PR          Rec
Multimodal Architecture and Interfaces  Completed (2nd, 3rd, 4th, 5th, 6th, 7th WD)  Completed                             TBD        TBD         TBD
EMMA 2.0                                4Q 2009                                      January 2011                          TBD        TBD         TBD
EMMA                                    Completed                                    Completed                             Completed  Completed   Completed (10 Feb. 2009)
InkML                                   Completed                                    1st LC: Completed; 2nd LC: Completed  Completed  April 2010  June 2010
EmotionML                               Completed (2nd)                              Completed                             June 2011  TBD         TBD
Ink Modality Component Definition       Completed (as a WG Note)                     -                                     -          -           -
Voice Modality Component Definition     December 2009 (as a WG Note)                 -                                     -          -           -

Work in Progress

This section gives a brief summary of each of the major work items under development by the Multimodal Interaction Working Group. The suite of specifications is known as the W3C Multimodal Interaction Framework.

Current Work

The following indicates current work items. Additional work is expected on topics described in the Scope section of the charter.

Multimodal Architecture

Main Architecture draft

A loosely coupled architecture for the Multimodal Interaction Framework that focuses on providing a general means for components to communicate with each other, plus basic infrastructure for application control and platform services. Work is continuing on how the architecture can be realized in terms of well-defined component interfaces and eventing models.

MMI Authoring
MMI Best Practices

Extensible Multi-Modal Annotations (EMMA)

- EMMA 1.0

EMMA has been developed as a data exchange format for the interface between input processors and interaction management systems. It defines the means for recognizers to annotate application-specific data with information such as confidence scores, time stamps, input mode (e.g. key strokes, speech or pen), alternative recognition hypotheses, and partial recognition results. EMMA is a target data format for the semantic interpretation specification being developed in the Voice Browser Activity, which describes annotations to speech grammars for extracting application-specific data as a result of speech recognition. EMMA supersedes earlier work on the natural language semantics markup language in the Voice Browser Activity.
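
To make these annotations concrete, the following Python sketch parses a small EMMA-style document and ranks the alternative recognition hypotheses by confidence. It is an illustration under stated assumptions, not a reference implementation: the element and attribute names follow our reading of EMMA 1.0, while the flight-query payload (origin, destination), the identifiers, and the helper names q and parse_hypotheses are invented for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by EMMA 1.0 documents.
EMMA_NS = "http://www.w3.org/2003/04/emma"

# Illustrative sample: two alternative hypotheses for one spoken utterance.
# The <origin>/<destination> payload is hypothetical application data.
SAMPLE = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice"
               emma:start="1087995961542" emma:end="1087995963542">
    <emma:interpretation id="int1" emma:confidence="0.75"
                         emma:tokens="flights from boston to denver">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68"
                         emma:tokens="flights from austin to denver">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
"""


def q(name):
    """Return the namespace-qualified form of an EMMA element or attribute name."""
    return f"{{{EMMA_NS}}}{name}"


def parse_hypotheses(xml_text):
    """Collect every interpretation inside emma:one-of, best confidence first."""
    root = ET.fromstring(xml_text)
    results = []
    for one_of in root.iter(q("one-of")):
        # Annotations on the container apply to all contained hypotheses.
        mode = one_of.get(q("mode"))
        duration_ms = int(one_of.get(q("end"))) - int(one_of.get(q("start")))
        for interp in one_of.findall(q("interpretation")):
            results.append({
                "id": interp.get("id"),
                "mode": mode,
                "duration_ms": duration_ms,
                "confidence": float(interp.get(q("confidence"))),
                "tokens": interp.get(q("tokens")),
                # Application-specific data: child elements of the interpretation.
                "data": {child.tag: child.text for child in interp},
            })
    return sorted(results, key=lambda h: h["confidence"], reverse=True)


if __name__ == "__main__":
    for hyp in parse_hypotheses(SAMPLE):
        print(hyp["id"], hyp["confidence"], hyp["data"])
```

An interaction manager built along these lines could accept the top hypothesis when its confidence is high and fall back to a confirmation prompt otherwise; the threshold would be an application decision.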

- EMMA 2.0

Since EMMA 1.0 became a W3C Recommendation, a number of new possible use cases for the EMMA language have emerged. These include the use of EMMA to represent multimodal output, biometrics, emotion, sensor data, multi-stage dialogs, and interactions with multiple users. The Working Group has therefore decided to work on a document capturing use cases and issues for a series of possible extensions to EMMA, and has published a Working Group Note to seek feedback on the different use cases.

InkML - an XML language for digital ink traces

This work item sets out to define an XML data exchange format for ink entered with an electronic pen or stylus as part of a multimodal system. This will enable the capture and server-side processing of handwriting, gestures, drawings, and specific notations for mathematics, music, chemistry and other fields, as well as supporting further research on this processing. The Ink subgroup maintains a separate public page devoted to W3C's work on pen and stylus input.
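
As a rough illustration of the server-side processing mentioned above, the Python sketch below reads the pen traces from a minimal InkML-style document and computes their bounding box. It assumes the simplest trace encoding only: points separated by commas, each point given as explicit whitespace-separated coordinate values. The sample data and the helper names parse_traces and bounding_box are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by InkML documents.
INKML_NS = "http://www.w3.org/2003/InkML"

# Illustrative sample: two pen strokes, each a list of "x y" points.
SAMPLE = """\
<ink xmlns="http://www.w3.org/2003/InkML">
  <trace>10 0, 9 14, 8 28, 7 42, 6 56</trace>
  <trace>25 13, 26 27, 27 41</trace>
</ink>
"""


def parse_traces(xml_text):
    """Return one list of coordinate tuples per <trace> element."""
    root = ET.fromstring(xml_text)
    traces = []
    for trace in root.iter(f"{{{INKML_NS}}}trace"):
        points = []
        # Points are comma-separated; values within a point are space-separated.
        for chunk in (trace.text or "").split(","):
            fields = chunk.split()
            if fields:
                points.append(tuple(float(value) for value in fields))
        traces.append(points)
    return traces


def bounding_box(traces):
    """Return (min_x, min_y, max_x, max_y) over all points in all traces."""
    xs = [point[0] for trace in traces for point in trace]
    ys = [point[1] for trace in traces for point in trace]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    strokes = parse_traces(SAMPLE)
    print(len(strokes), "traces, bounding box:", bounding_box(strokes))
```

A handwriting or gesture recognizer would typically normalize the traces against such a bounding box before further processing; richer trace encodings would need a fuller parser than this sketch provides.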

Emotion Markup Language (EmotionML) 1.0

EmotionML will provide representations of emotions and related states for technological applications. As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The language is conceived as a "plug-in" language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.

Related Materials

Workshops

MMI related presentations

For more details on other organizations see the Multimodal Interaction Charter.

W3C Team Contact

Kazuyuki Ashimura <ashimura@w3.org> - Multimodal Interaction Activity Lead

Copyright © 2002-2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements. This page was last updated on 2012/01/25 01:54:29.