EMMA: Extensible MultiModal Annotation markup language

W3C Working Draft 14 December 2004

Wu Chou, Avaya
Deborah A. Dahl, Independent Consultant
Michael Johnston, AT&T
Roberto Pieraccini, IBM
Dave Raggett, W3C/Canon


The W3C Multimodal Interaction Working Group aims to develop specifications that enable access to the Web using multimodal interaction. This document is part of a set of specifications for multimodal systems and provides details of an XML markup language for describing the interpretation of user input. Examples of interpretations of user input are: a transcription into words of a raw signal (for instance, one derived from speech, pen or keystroke input); a set of attribute/value pairs describing the meaning of those words; or a set of attribute/value pairs describing a gesture. Interpretations of the user's input are expected to be generated by signal interpretation processes, such as speech and ink recognition, by semantic interpreters, and by other types of processors, for use by components that act on the user's inputs, such as interaction managers.

This specification describes markup for representing interpretations of user input (speech, keystrokes, pen input, etc.) together with annotations for confidence scores, timestamps, input medium, etc., and forms part of the proposals for the W3C Multimodal Interaction Framework. This version of EMMA is the first to include the associated XML schema (see section 7.1).
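
To make the kind of information described above concrete, the following is a minimal Python sketch of an interpretation record and of how a consuming component, such as an interaction manager, might choose among competing interpretations. It is not EMMA markup. Every name in it (`Interpretation`, `tokens`, `meaning`, `confidence`, `start_ms`, `end_ms`, `medium`, `best_interpretation`) is hypothetical and chosen only for illustration, and the confidence range and timestamp units are assumptions rather than anything defined by this specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Interpretation:
    """One candidate reading of a user input (hypothetical model, not EMMA syntax)."""
    tokens: str               # transcription of the raw signal into words
    meaning: Dict[str, str]   # attribute/value pairs describing the meaning
    confidence: float         # confidence score; a 0.0-1.0 range is assumed here
    start_ms: int             # timestamp of input start; milliseconds assumed
    end_ms: int               # timestamp of input end; milliseconds assumed
    medium: str               # input medium label; the values used are illustrative


def best_interpretation(candidates: List[Interpretation]) -> Optional[Interpretation]:
    """Return the candidate with the highest confidence, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.confidence)


# Two competing readings of the same spoken input, as a recognizer might produce.
candidates = [
    Interpretation("flights to boston", {"destination": "Boston"}, 0.75, 0, 1200, "acoustic"),
    Interpretation("flights to austin", {"destination": "Austin"}, 0.5, 0, 1200, "acoustic"),
]

best = best_interpretation(candidates)
print(best.tokens, best.meaning)
```

The sketch keeps the transcription, the attribute/value meaning, and the annotations in a single record so that a downstream component can act on them together. Picking the highest-confidence candidate is only one possible policy.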

