EMMA: Extensible MultiModal Annotation markup language Version 2.0

W3C Working Group Note 2 February 2017

This version:: https://www.w3.org/TR/2017/NOTE-emma20-20170202/
Latest published version:: https://www.w3.org/TR/emma20/
Previous version:: https://www.w3.org/TR/2015/WD-emma20-20150908/
Latest editor's draft:: https://w3c.github.io/emma/emma2_0/emma_2_0_editor_draft.html

Editor:: Michael Johnston (W3C Invited Expert)
Authors:: Deborah A. Dahl (W3C Invited Expert); Tim Denney (The Boeing Company; until 2015); Nagesh Kharidi (Openstream)

Abstract

The W3C Multimodal Interaction Working Group aims to develop specifications to enable access to the Web using multimodal interaction. This document is part of a set of specifications for multimodal systems, and provides details of an XML markup language for containing and annotating the interpretation of user input and production of system output. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from speech, pen or keystroke input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers. Examples of stages in the production of a system output, are creation of a semantic representation, an assignment of that representation to a particular modality or modalities, and a surface string for realization by, for example, a text-to-speech engine. The production of the system's output is expected to be generated by output production processes, such as a dialog manager, multimodal presentation planner, content planner, and other types of processors such as surface generation.

Status of this Document

Beware. This specification is no longer in active maintenance and the Multimodal Interaction Working Group does not intend to maintain it further.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This specification describes markup for representing interpretations of user input (speech, keystrokes, pen input etc.) and productions of system output together with annotations for confidence scores, timestamps, medium etc., and forms part of the proposals for the W3C Multimodal Interaction Framework.

The EMMA: Extensible Multimodal Annotation 1.0 specification was published as a W3C Recommendation in February 2009. Since then there have been numerous implementations of the standard and extensive feedback has come in regarding desired new features and clarifications requested for existing features. The W3C Multimodal Interaction Working Group examined a range of different use cases for extensions of the EMMA specification and published a W3C Note on Use Cases for Possible Future EMMA Features [EMMA Use Cases]. In this working draft of EMMA 2.0, we have developed a set of new features based on feedback from implementers and have also added clarification text in a number of places throughout the specification. The new features include: support for adding human annotations (emma:annotation, emma:annotated-tokens), support for inline specification of process parameters (emma:parameters, emma:parameter, emma:parameter-ref), support for specification of models used in processing beyond grammars (emma:process-model, emma:process-model-ref), extensions to emma:grammar to enable inline specification of grammars, a new mechanism for indicating which grammars are active (emma:grammar-active, emma:active), support for non-XML semantic payloads (emma:result-format), support for multiple emma:info elements and reference to the emma:info relevant to an interpretation (emma:info-ref), and a new attribute to complement the emma:medium and emma:mode attributes that enables specification of the modality used to express an input (emma:expressed-through). In addition we have extended the specification to handle the production of system output, by adding the new element, emma:output and added a series of annotations enabling the use of EMMA for incremental results (Section 4.2.24).

Not addressed in this draft, but planned for a later Working Draft of EMMA 2.0, is a JSON serialization of EMMA documents for use in contexts were JSON is better suited than XML for representing user inputs and system outputs.