This page summarizes the relationships among specifications, whether they are finished standards or drafts. Below, each title links to the most recent version of a document. For related introductory information, see: Multimodal Access.
W3C Recommendations have been reviewed by W3C Members, by software developers, and by other W3C groups and interested parties, and are endorsed by the Director as Web Standards. Learn more about the W3C Recommendation Track.
Group Notes are not standards and do not have the same level of W3C endorsement.
This document describes a loosely coupled architecture for multimodal user interfaces, which allows for co-resident and distributed implementations, and focuses on the role of markup and scripting, and the use of well defined interfaces between its constituents.
The W3C Multimodal Interaction Working Group aims to develop specifications to enable access to the Web using multimodal interaction. This document is part of a set of specifications for multimodal systems, and provides details of an XML markup language for containing and annotating the interpretation of user input. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from speech, pen or keystroke input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers.
Registration & Discovery of Multimodal Modality Components in Multimodal Systems: Use Cases and Requirements
This document collects a number of use cases together with their goals and requirements for describing, publishing, discovering, registering and subscribing to Modality Components in a system implemented according the Multimodal Architecture Specification.
This document represents a public collection of emotion vocabularies that can be used with EmotionML. It was originally part of an earlier draft of the EmotionML specification, but was moved out of it so that we can easily update, extend and correct the list of vocabularies as required.
This document describes an interoperability test, executed by various members of the Multimodal Interaction Working Group, to demonstrate interoperability of multimodal components which are implementing the "Multimodal Architecture and Interfaces" specification.
This document describes Modality Components in the MMI Architecture which are responsible for controlling the various input and output modalities on various devices by providing guidelines and suggestions for designing Modality Components. Also this document shows several possible examples of Modality Components, (1) face identification, (2) form-filling using handwriting recognition and (3) video display.
This is a sample short description for this specification; over time we will replace this description with a real one.
This document describes a multimodal system which implements the W3C Multimodal Architecture and gives an example of a simple multimodal application authored using various W3C markup languages, including SCXML, CCXML, VoiceXML 2.1 and HTML.
This document is based on the accumulated experience of several years of developing multimodal applications. It provides a collection of common sense advice for developers of multimodal user interfaces.
Several years of multimodal application development in various business areas and on various device platforms has provided developers enough experience to provide detailed feedback about what they like, dislike, and want to see improve and continue. This experience is provided here as an input to the specifications under development in the W3C Multimodal Interaction and Voice Browser Activities.
This document describes the DOM capabilities needed to support a heterogeneous multimodal environment and the current state of DOM interfaces supporting those capabilities. These DOM interfaces are used between modality components and their host environment in the W3C Multimodal Interaction Framework as proposed by the W3C Multimodal Interaction Activity.
The Multimodal Interaction Framework separates multimodal systems into a set of functional units, including Input and Output components, an Interaction Mananger, Session Components, System and Environment, and Application Functions. In order for those functional components to interact with each other to form an application interpreter, the browser implementation must allow for communication and coordination between those components. This DOM interface identifies the DOM APIs used to communicate and coordinate at the browser implemention level. Multimodal browsers can be stand-alone or distributed systems.
This document introduces the W3C Multimodal Interaction Framework, and identifies the major components for multimodal systems. Each component represents a set of related functions. The framework identifies the markup languages used to describe information required by components and for data flowing among components. The W3C Multimodal Interaction Framework describes input and output modes widely used today and can be extended to include additional modes of user input and output as they become available.
This document describes requirements for the Extensible MultiModal Annotation language (EMMA) specification under development in the W3C Multimodal Interaction Activity. EMMA is intended as a data format for the interface between input processors and interaction management systems. It will define the means for recognizers to annotate application specific data with information such as confidence scores, time stamps, input mode (e.g. key strokes, speech or pen), alternative recognition hypotheses, and partial recognition results, etc. EMMA is a target data format for the semantic interpretation specification being developed in the Voice Browser Activity, and which describes annotations to speech grammars for extracting application specific data as a result of speech recognition. EMMA supercedes earlier work on the natural language semantics markup language in the Voice Browser Activity.
This document describes fundamental requirements for the specifications under development in the W3C Multimodal Interaction Activity. These requirements were derived from use case studies as discussed in Appendix A. They have been developed for use by the Multimodal Interaction Working Group (W3C Members only), but may also be relevant to other W3C working groups and related external standard activities.
The requirements cover general issues, inputs, outputs, architecture, integration, synchronization points, runtimes and deployments, but this document does not address application or deployment conformance rules.
The W3C Multimodal Interaction Activity is developing specifications as a basis for a new breed of Web applications in which you can interact using multiple modes of interaction, for instance, using speech, hand writing, and key presses for input, and spoken prompts, audio and visual displays for output. This document describes several use cases for multimodal interaction and presents them in terms of varying device capabilities and the events needed by each use case to couple different components of a multimodal application.
Below are draft documents: Proposed Recommendations, other Working Drafts. Some of these may become Web Standards through the W3C Recommendation Track process. Others may be published as Group Notes or become obsolete specifications.
As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The present draft specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and basis in science. The language is conceived as a "plug-in" language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.
EMMA is an XML markup language for containing and annotating the interpretation of user input like a transcription into words of a raw signal, for instance derived from speech, pen or keystroke input. EMMA 1.0 was published as a W3C Recommendation in February 2009. Since then there have been numerous implementations of the standard and extensive feedback has come in regarding desired new features and clarifications needed of existing features. The Multimodal Interaction Working Group examined a range of different use cases for extensions and published a W3C Note on Use Cases for Possible Future EMMA Features. This Version 1.1 document describes a set of new features based on feedback from implementers.