Multimodal Interaction Working Group - Publications

Recommendations

As the Web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. EmotionML provides mechanisms to represent emotions in terms of scientifically valid descriptors: categories, dimensions, appraisals, and action tendencies. It is conceived as a "plug-in" language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.
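
As a rough illustration (a sketch, not an excerpt from the specification), an EmotionML annotation combining a category with dimensional values might look like the following; the vocabulary URIs and numeric values are placeholders.

```xml
<!-- Hypothetical EmotionML sketch; vocabulary URIs and values are illustrative. -->
<emotionml version="1.0"
           xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <!-- A categorical description of an emotional state -->
  <emotion>
    <category name="happiness" value="0.8"/>
  </emotion>
  <!-- A dimensional description of the same kind of state -->
  <emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions">
    <dimension name="pleasure" value="0.9"/>
    <dimension name="arousal" value="0.3"/>
  </emotion>
</emotionml>
```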

This document describes a loosely coupled architecture for multimodal user interfaces that allows for both co-resident and distributed implementations. It focuses on the role of markup and scripting, and on the use of well-defined interfaces between its constituents.
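
To give a flavour of those interfaces, here is a hedged sketch of life-cycle events as they might appear on the wire between an Interaction Manager and a modality component; the Source, Target, Context, RequestID, and content values are illustrative placeholders.

```xml
<!-- Hypothetical request: the Interaction Manager asks a modality component
     to start rendering some content. All identifiers are placeholders. -->
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:StartRequest Source="IM-1" Target="voiceMC-1"
                    Context="ctx-1" RequestID="req-42">
    <mmi:ContentURL href="flight-prompt.vxml"/>
  </mmi:StartRequest>
</mmi:mmi>

<!-- Hypothetical notification: the modality component later reports completion. -->
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:DoneNotification Source="voiceMC-1" Target="IM-1"
                        Context="ctx-1" RequestID="req-42"
                        Status="success"/>
</mmi:mmi>
```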

This document describes the syntax and semantics for the Ink Markup Language for use in the W3C Multimodal Interaction Framework as proposed by the W3C Multimodal Interaction Activity. The Ink Markup Language serves as the data format for representing ink entered with an electronic pen or stylus. The markup allows for the input and processing of handwriting, gestures, sketches, music and other notational languages in applications. It provides a common format for the exchange of ink data between components such as handwriting and gesture recognizers, signature verifiers, and other ink-aware modules.
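
As an informal illustration of the data format (not an excerpt from the specification), a minimal ink document might look like the following; the coordinate values are placeholders, and real applications would normally also declare a trace format and capture device.

```xml
<!-- Minimal InkML sketch: each trace is a sequence of sampled pen points,
     written here as plain "X Y" pairs separated by commas. -->
<ink xmlns="http://www.w3.org/2003/InkML">
  <trace>
    10 0, 9 14, 8 28, 7 42, 6 56, 6 70
  </trace>
  <trace>
    25 13, 25 27, 26 41, 27 55, 28 69
  </trace>
</ink>
```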

The W3C Multimodal Interaction Working Group aims to develop specifications that enable access to the Web using multimodal interaction. This document is part of a set of specifications for multimodal systems, and provides details of an XML markup language for containing and annotating the interpretation of user input. Examples of such interpretations are a transcription into words of a raw signal (for instance derived from speech, pen, or keystroke input), a set of attribute/value pairs describing its meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors, for use by components that act on the user's inputs, such as interaction managers.
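
As an informal sketch (not taken from the specification), a single interpretation of a spoken utterance might be represented along these lines; the application namespace, payload element names, and values are illustrative.

```xml
<!-- Hypothetical EMMA sketch: one interpretation of a spoken travel request.
     The "http://example.com/travel" namespace and its elements are placeholders. -->
<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma"
           xmlns="http://example.com/travel">
  <emma:interpretation id="int1"
                       emma:medium="acoustic" emma:mode="voice"
                       emma:confidence="0.75"
                       emma:tokens="flights from boston to denver">
    <origin>Boston</origin>
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>
```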

Notes

This document represents a public collection of emotion vocabularies that can be used with EmotionML. It was originally part of an earlier draft of the EmotionML specification, but was moved out of it so that we can easily update, extend and correct the list of vocabularies as required.
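
For illustration only, a category vocabulary can be defined roughly as follows, so that EmotionML documents can reference it by URI fragment; this snippet is a sketch based on the EmotionML vocabulary mechanism rather than an excerpt from the Note.

```xml
<!-- Sketch of a category vocabulary ("big six") that an EmotionML document
     could reference via a category-set URI ending in "#big6". -->
<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml">
  <vocabulary type="category" id="big6">
    <item name="anger"/>
    <item name="disgust"/>
    <item name="fear"/>
    <item name="happiness"/>
    <item name="sadness"/>
    <item name="surprise"/>
  </vocabulary>
</emotionml>
```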

This document addresses people who want to develop Modality Components for applications that communicate with the user through different modalities such as voice, gesture, or handwriting, and/or to distribute them through a multimodal system using multi-biometric elements, multimodal interfaces, or multi-sensor recognizers over a local network or "in the cloud". To this end, the document collects a number of use cases, together with their goals and requirements, for describing, publishing, discovering, registering, and subscribing to Modality Components in a system implemented according to the Multimodal Architecture specification. In this way, Modality Components can be used by automated tools to power advanced services such as more accurate modality-based searches, behavior recognition for better interaction with intelligent software agents, and enhanced knowledge management achieved by capturing and producing emotional data.

This document describes an interoperability test, executed by various members of the Multimodal Interaction Working Group, to demonstrate the interoperability of multimodal components that implement the "Multimodal Architecture and Interfaces" (MMI-ARCH) specification.

This document provides guidelines and suggestions for designing Modality Components in the MMI Architecture, which are responsible for controlling the various input and output modalities on various devices. It also presents several possible examples of Modality Components: (1) face identification, (2) form filling using handwriting recognition, and (3) video display.

The EMMA: Extensible MultiModal Annotation specification defines an XML markup language for capturing and providing metadata on the interpretation of inputs to multimodal systems. Throughout the implementation report process and discussion since EMMA 1.0 became a W3C Recommendation, a number of new possible use cases for the EMMA language have emerged. These include the use of EMMA to represent multimodal output, biometrics, emotion, sensor data, multi-stage dialogs, and interactions with multiple users. In this document, we describe these use cases and illustrate how the EMMA language could be extended to support them.

This document describes a multimodal system which implements the W3C Multimodal Architecture and gives an example of a simple multimodal application authored using various W3C markup languages, including SCXML, CCXML, VoiceXML 2.1 and HTML.
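
For orientation, the following is a hypothetical sketch (not taken from the document) of how an SCXML interaction manager might start a voice modality component and wait for it to finish; the event names and send target are placeholders.

```xml
<!-- Hypothetical SCXML interaction-manager fragment; "#voiceMC" and the
     mmi.* event names are illustrative placeholders. -->
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="prompting">
  <state id="prompting">
    <onentry>
      <!-- Ask the voice modality component to start (e.g. an MMI StartRequest) -->
      <send event="mmi.StartRequest" target="#voiceMC"/>
    </onentry>
    <!-- Move on once the component reports that it is done -->
    <transition event="mmi.DoneNotification" target="finished"/>
  </state>
  <final id="finished"/>
</scxml>
```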

This document is based on the accumulated experience of several years of developing multimodal applications. It provides a collection of common sense advice for developers of multimodal user interfaces.

Several years of multimodal application development in various business areas and on various device platforms have given developers enough experience to provide detailed feedback about what they like, what they dislike, and what they want to see improved and continued. This experience is provided here as input to the specifications under development in the W3C Multimodal Interaction and Voice Browser Activities.

This document describes the DOM capabilities needed to support a heterogeneous multimodal environment and the current state of DOM interfaces supporting those capabilities. These DOM interfaces are used between modality components and their host environment in the W3C Multimodal Interaction Framework as proposed by the W3C Multimodal Interaction Activity.

The Multimodal Interaction Framework separates multimodal systems into a set of functional units, including Input and Output components, an Interaction Manager, Session Components, System and Environment, and Application Functions. In order for those functional components to interact with each other to form an application interpreter, the browser implementation must allow for communication and coordination between those components. This document identifies the DOM APIs used to communicate and coordinate at the browser implementation level. Multimodal browsers can be stand-alone or distributed systems.

This document introduces the W3C Multimodal Interaction Framework, and identifies the major components for multimodal systems. Each component represents a set of related functions. The framework identifies the markup languages used to describe information required by components and for data flowing among components. The W3C Multimodal Interaction Framework describes input and output modes widely used today and can be extended to include additional modes of user input and output as they become available.

This document describes requirements for the Ink Markup Language that will be used in the multimodal interaction framework as proposed by the W3C Multimodal Interaction Working Group. The Ink Markup Language will serve as the data format for representing ink entered with an electronic pen or stylus in a multimodal system. The markup will allow for the input and processing of handwriting, gestures, sketches, music and other notational languages in web-based multimodal applications. In the context of the W3C Multimodal Interaction Framework, the markup provides a common format for the exchange of ink data between components such as handwriting and gesture recognizers, signature verifiers, and other ink-aware modules.

This document describes requirements for the Extensible MultiModal Annotation language (EMMA) specification under development in the W3C Multimodal Interaction Activity. EMMA is intended as a data format for the interface between input processors and interaction management systems. It will define the means for recognizers to annotate application-specific data with information such as confidence scores, time stamps, input mode (e.g. keystrokes, speech, or pen), alternative recognition hypotheses, and partial recognition results. EMMA is a target data format for the semantic interpretation specification being developed in the Voice Browser Activity, which describes annotations to speech grammars for extracting application-specific data as a result of speech recognition. EMMA supersedes earlier work on the natural language semantics markup language in the Voice Browser Activity.

This document describes fundamental requirements for the specifications under development in the W3C Multimodal Interaction Activity. These requirements were derived from use case studies as discussed in Appendix A. They have been developed for use by the Multimodal Interaction Working Group (W3C Members only), but may also be relevant to other W3C working groups and related external standard activities.

The requirements cover general issues, inputs, outputs, architecture, integration, synchronization points, runtimes and deployments, but this document does not address application or deployment conformance rules.

The W3C Multimodal Interaction Activity is developing specifications as a basis for a new breed of Web applications in which users can interact through multiple modes, for instance speech, handwriting, and key presses for input, and spoken prompts, audio, and visual displays for output. This document describes several use cases for multimodal interaction and presents them in terms of varying device capabilities and the events needed by each use case to couple different components of a multimodal application.

Working Drafts

EMMA is an XML markup language for containing and annotating the interpretation of user input, such as a transcription into words of a raw signal, for instance derived from speech, pen, or keystroke input. EMMA 1.0 was published as a W3C Recommendation in February 2009. Since then there have been numerous implementations of the standard, and extensive feedback has been received regarding desired new features and needed clarifications of existing features. The Multimodal Interaction Working Group examined a range of different use cases for extensions and published a W3C Note on Use Cases for Possible Future EMMA Features. This Version 1.1 document describes a set of new features based on feedback from implementers.

Retired specifications