W3C

Multimodal Interaction Specifications and Notes

This page provides a brief summary of each of the Multimodal Interaction Working Group's major work items.

Background

This suite of specifications is known as the W3C Multimodal Interaction Framework.

Specifications

The following lists current and completed specifications. Additional work is expected on topics described in the Scope section of the charter.

Multimodal Architecture

Main Architecture specification

The MMI Architecture provides a loosely coupled architecture for multimodal user interfaces, which allows for co-resident and distributed implementations, and focuses on the use of well-defined interfaces between its constituents. The framework is motivated by several basic design goals including (1) Encapsulation, (2) Distribution, (3) Extensibility, (4) Recursiveness and (5) Modularity. The MMI Architecture includes Modality Components, which process specific modalities such as speech or handwriting, an Interaction Manager, which coordinates processing among the Modality Components, and the Life Cycle events, which support communication between the Interaction Manager and the Modality Components.
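The Life Cycle events are XML messages exchanged between the Interaction Manager and the Modality Components. As an illustrative sketch only, a StartRequest event asking a Modality Component to begin processing might look like the following; the Source, Target, Context, and RequestID values and the content URL are hypothetical:

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <!-- Interaction Manager asks a voice Modality Component to start -->
  <mmi:StartRequest Source="IM-1" Target="VoiceMC-1"
                    Context="c-42" RequestID="r-1">
    <!-- Content the component should run, referenced by URL -->
    <mmi:ContentURL href="prompt.vxml"/>
  </mmi:StartRequest>
</mmi:mmi>
```

The Modality Component would answer with a corresponding StartResponse event carrying the same Context and RequestID, which is how the loosely coupled constituents correlate requests and responses.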

MMI Authoring
MMI Best Practices

Extensible MultiModal Annotation (EMMA)

EMMA 1.0

EMMA is a data exchange format for the interface between different levels of input processing and interaction management in multimodal and voice-enabled systems. It provides the means for input processing components, such as speech recognizers, to annotate application-specific data with information such as confidence scores, time stamps, and input mode classification (e.g. key strokes, touch, speech, or pen). EMMA also provides mechanisms for representing alternative recognition hypotheses, including lattices, and groups and sequences of inputs. EMMA 1.0 has been completed. The group will publish a new EMMA 1.1 version of the specification which incorporates new features addressing issues raised through EMMA implementations.
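To illustrate, the following sketch shows how a speech recognizer might report two competing hypotheses for a spoken travel request; the application-specific elements (origin, destination) and all values are hypothetical:

```xml
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- emma:one-of groups alternative recognition hypotheses -->
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75"
                         emma:tokens="flights from boston to denver">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68"
                         emma:tokens="flights from austin to denver">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
```

The EMMA annotations (confidence, tokens, medium, mode) wrap the application payload without constraining it, which is what lets the same format serve speech, pen, and other input processors.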

EMMA 1.1

EMMA 1.1 includes a set of new features based on feedback from implementers, as well as clarification text added in a number of places throughout the specification. The new features include:

Support for adding human annotations (emma:annotation, emma:annotated-tokens)
Support for inline specification of process parameters (emma:parameters, emma:parameter, emma:parameter-ref)
Support for specification of models used in processing beyond grammars (emma:process-model, emma:process-model-ref)
Extensions to emma:grammar to enable inline specification of grammars
A new mechanism for indicating which grammars are active (emma:grammar-active, emma:active)
Support for non-XML semantic payloads (emma:result-format)
Support for multiple emma:info elements and reference to the emma:info relevant to an interpretation (emma:info-ref)
A new attribute, complementing emma:medium and emma:mode, that enables specification of the modality used to express an input (emma:expressed-through)
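As a hedged sketch of one of these additions: emma:expressed-through records the modality through which an input was expressed, alongside the existing emma:medium and emma:mode annotations. EMMA 1.1 was still in progress at the time of this summary, so the exact syntax may differ; the values below are hypothetical:

```xml
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- A touch input expressed through a graphical user interface -->
  <emma:interpretation id="int1"
                       emma:medium="tactile" emma:mode="touch"
                       emma:expressed-through="gui">
    <command>zoom-in</command>
  </emma:interpretation>
</emma:emma>
```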

InkML - an XML language for digital ink traces

InkML provides a range of features to support real-time ink streaming, multi-party interactions and richly annotated ink archival. Applications may make use of as much or as little information as required, from minimalist applications using only simple traces to more complex problems, such as signature verification or calligraphic animation, requiring full dynamic information. As a platform-neutral format for digital ink, InkML can support collaborative or distributed applications in heterogeneous environments, such as courier signature verification and distance education. This work is complete as InkML has reached the Recommendation stage. However, the Multimodal Interaction Working Group welcomes feedback on the InkML standard.
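A minimal sketch of the format: an InkML document is a set of traces, each a sequence of sampled pen coordinates. The coordinate values below are invented for illustration:

```xml
<ink xmlns="http://www.w3.org/2003/InkML">
  <!-- Each trace is one pen stroke: comma-separated x y sample points -->
  <trace>
    10 0, 9 14, 8 28, 7 42, 6 56
  </trace>
  <trace>
    25 0, 25 14, 25 28, 25 42
  </trace>
</ink>
```

A minimalist application might consume only these trace points, while richer applications can add timing and channel information (such as pressure) for tasks like signature verification.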

Emotion Markup Language (EmotionML) 1.0

EmotionML provides representations of emotions and related states for technological applications. As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The language is conceived as a "plug-in" language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.
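As an illustrative sketch of the "plug-in" idea, an EmotionML fragment annotating an emotion using the "big six" category vocabulary might look like the following; the category and intensity values are hypothetical:

```xml
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <emotion>
    <!-- Category drawn from the vocabulary declared in category-set -->
    <category name="happiness"/>
    <intensity value="0.8"/>
  </emotion>
</emotionml>
```

Because the emotion vocabulary is referenced rather than fixed, the same markup can serve manual annotation, automatic recognition, and generation of emotion-related system behavior.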

Working Group Notes

Working Group Notes are non-standards-track documents that support, clarify, or otherwise provide additional information about the specifications.

Architecture

The group has published several Notes that provide additional information about the Multimodal Architecture and Interfaces specification.

EmotionML

EMMA

Since EMMA 1.0 became a W3C Recommendation, a number of possible new use cases for the EMMA language have emerged. These include the use of EMMA to represent multimodal output, biometrics, emotion, sensor data, multi-stage dialogs, and interactions with multiple users. The Working Group has therefore decided to work on a document capturing use cases and issues for a series of possible extensions to EMMA, and has published a Working Group Note to seek feedback on these use cases.