W3C Multimodal Standard Brings Web to More People, More Ways


EMMA Facilitates Interaction Through Keyboard, Mouse, Voice, Speech, Touch, Gesture



http://www.w3.org/ -- 10 February 2009 -- As part of ensuring the Web is available to all people on any device, W3C published a new standard today to enable interactions beyond the familiar keyboard and mouse. EMMA, the Extensible MultiModal Annotation specification, promotes the development of rich Web applications that can be adapted to more input modes (such as handwriting, natural language, and gestures) and output modes (such as synthesized speech) at lower cost.

Eyes-Busy, Hands-Free, and More

As more people begin to use the Web in more situations, opportunities for multimodal interactions have multiplied. Early handheld devices allowed input through stylus or voice. Touch screens and devices that detect motion and orientation are increasingly commonplace in some markets. EMMA allows developers to separate the "logic" layer of an application from the "interaction" layer, making it easier to adapt applications to new scenarios.

In addition, because some input modalities are more prone to "noise" than others (for example due to variations in spoken language or handwriting, or simply background noise), EMMA helps developers manage the varying degrees of confidence one might have in input information. EMMA allows developers to account for ambiguity in user input so that during later stages of processing, it is possible to select from among competing hypotheses and overcome errors. EMMA also makes supplementary information about interactions (such as the interaction date) available to developers.
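The handling of competing hypotheses described above can be illustrated with a minimal sketch. It assumes a speech recognizer has produced an EMMA document with two candidate interpretations inside an emma:one-of container, each carrying an emma:confidence score. The element and attribute names follow EMMA 1.0; the sample utterances, the destination element, the confidence values, and the helper name best_interpretation are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of EMMA 1.0 elements and attributes.
EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical recognizer output: two competing readings of one utterance.
# The <destination> payload and all values are made up for this example.
SAMPLE = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75"
                         emma:tokens="flights to boston">
      <destination>Boston</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.20"
                         emma:tokens="flights to austin">
      <destination>Austin</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
"""


def best_interpretation(emma_xml):
    """Return the interpretation element with the highest confidence.

    Interpretations without an emma:confidence attribute are treated as
    having confidence 0. Returns None if the document has no interpretations.
    """
    root = ET.fromstring(emma_xml)
    candidates = root.findall(f".//{{{EMMA_NS}}}interpretation")
    if not candidates:
        return None

    def confidence(element):
        return float(element.get(f"{{{EMMA_NS}}}confidence", "0"))

    return max(candidates, key=confidence)


best = best_interpretation(SAMPLE)
print(best.get("id"),
      best.get(f"{{{EMMA_NS}}}tokens"),
      best.findtext("destination"))
```

A real application would more likely keep all hypotheses and defer the choice until other context is available, for example input from a second modality, rather than always taking the top score.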

Multimodal Benefits Mobile Access and Accessibility

The EMMA standard will be particularly important in the mobile industry. Applications that follow the EMMA standard and multimodal design principles are more likely to be adaptable to the mobile context. For instance, most cell phones are capable of receiving both voice and text input. With EMMA, it will be easier to create applications that can take advantage of text, voice, or both.

Applications designed to be multimodal are also more likely to benefit people with disabilities. Multimodal input systems provide alternate methods for Web interaction and access for people with visual, auditory, physical, cognitive and neurological disabilities. Those unable to operate a keyboard can rely on speech recognition; those who can use touch commands but lack precise control can rely on EMMA mechanisms for interpreting uncertain input.

EMMA was developed by the Multimodal Interaction Working Group, which included the following W3C Members: Aspect Communications, AT&T, Cisco Systems, Department of Information and Communication Technology - University of Trento, Deutsche Telekom AG, France Telecom, Genesys Telecommunications Laboratories, German Research Center for Artificial Intelligence (DFKI) GmbH, Hewlett Packard Company, Institut National de Recherche en Informatique et en Automatique, International Webmasters Association / HTML Writers Guild (IWA-HWG), Korea Association of Information & Telecommunication, Korea Institute of Science & Technology (KIST), Kyoto Institute of Technology, Loquendo, S.p.A., Microsoft Corp., Nuance Communications, Inc., Openstream, Inc., Siemens AG, Université catholique de Louvain, V-Enable, Inc., Voxeo, and Waterloo Maple.

About the World Wide Web Consortium (W3C)

The World Wide Web Consortium (W3C) is an international consortium where Member organizations, a full-time staff, and the public work together to develop Web standards and guidelines designed to ensure long-term growth for the Web. Over 400 organizations are Members of the Consortium. W3C is jointly run by the MIT Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) in the USA, the European Research Consortium for Informatics and Mathematics (ERCIM) headquartered in France and Keio University in Japan, and has seventeen outreach offices worldwide. For more information see http://www.w3.org

Testimonials for EMMA 1.0 Recommendation


Avaya

As a common language for representing multimodal input, EMMA lays a cornerstone upon which more advanced architectures and technologies can be developed to enable natural multimodal interactions. We are glad that EMMA has become a W3C Recommendation and pleased with the capabilities that EMMA brings to multimodal interactions over the Web.

— Wu Chou, Director, Avaya Labs Research, Avaya

Conversational Technologies

Conversational Technologies strongly supports the W3C Extensible MultiModal Annotation 1.0 (EMMA) standard. By providing a standardized yet extensible and flexible basis for representing user input, we believe EMMA has tremendous potential for making possible a wide variety of innovative multimodal applications and research directions. Conversational Technologies has also found EMMA to be very helpful in helping students understand the principles of natural language processing through its open source EMMA implementation.

— Deborah Dahl, Principal, Conversational Technologies


DFKI

DFKI appreciates that the Extensible MultiModal Annotation markup language has become a W3C Recommendation.

The definition of EMMA represents a significant step towards the realization of a multimodal interaction infrastructure in a wide range of ICT applications. DFKI found EMMA a very useful instrument for the realization of multimodal dialog systems and has adopted it for the representation of user input in the context of several large consortia projects like SMARTWEB and THESEUS together with its industrial shareholders including SAP, Bertelsmann, Deutsche Telekom, BMW and Daimler.

DFKI is pleased to have contributed to the realization of EMMA and will support future work on new EMMA features such as the representation of multimodal output and support for emotion detection and representation.

— Professor Wolfgang Wahlster, Chief Executive Officer and Scientific Director of DFKI GmbH, The German Research Centre for AI

Kyoto Institute of Technology

Kyoto Institute of Technology (KIT) strongly supports the Extensible MultiModal Annotation 1.0 (EMMA) specification. We have been using EMMA within our multimodal human-robot interaction system. EMMA documents are dynamically generated by (1) the Automatic Speech Recognition (ASR) component and (2) the Face Detection/Behavior Recognition component in our implementation.

In addition, the Information Technology Standards Commission of Japan (ITSCJ), which includes KIT as a member, also plans to use EMMA as a data format for its own multimodal interaction architecture specification. ITSCJ believes EMMA is very useful for both uni-modal recognition components, e.g., ASR, and multimodal integration components, e.g., speech with pointing gesture.

— Associate Professor Masahiro Araki, Interactive Intelligence lab., Department of Information Science, Graduate School of Science and Technology, Kyoto Institute of Technology


Loquendo

Extensible MultiModal Annotation (EMMA) 1.0 provides a rich language for representing a variety of input modes within speech-enabled and multimodal applications - such as speech, handwriting and gesture recognition. Loquendo welcomes the EMMA 1.0 W3C Recommendation because it will facilitate the creation of multimodal applications as well as more powerful speech applications, and, the company believes, will facilitate innovation, advance the Web and give businesses a decisive competitive edge.

Loquendo is a longstanding, participating member of the W3C Multimodal Interaction and Voice Browser working groups, as well as the IETF and the Voice XML Forum, and has already implemented EMMA 1.0 into the Loquendo MRCP Server.

— Daniele Sereno, Vice President Product Engineering, Loquendo

University of Trento

We believe that EMMA covers a wide variety of innovative multimodal applications. We expect that EMMA 1.0 will play a key role in the development of interoperable communication technologies as well as enable innovative research platforms.

— Prof. Dr. Ing. Giuseppe Riccardi, Director of the Adaptive Multimodal Information and Interfaces Lab, Department of Information Engineering and Computer Science, University of Trento
