The mission of the Multimodal Interaction Working Group, part of the Multimodal Interaction Activity, is to develop open standards that enable the following vision:
End date | 31 March 2011
---|---
Confidentiality | Proceedings are Member-only, but the group sends regular summaries of ongoing work to the public mailing list.
Initial Chairs | Deborah Dahl
Initial Team Contacts (FTE %: 30) | Kazuyuki Ashimura
Usual Meeting Schedule | Teleconferences: weekly (one main group call and three task force calls). Face-to-face: as required, up to three per year.
The primary goal of this group is to develop W3C Recommendations that enable multimodal interaction with various devices including desktop PCs, mobile phones and less traditional platforms such as cars and intelligent home environments. For rapid adoption on a global scale, it should be possible to add simple multimodal capabilities to existing markup languages in a way that is backwards compatible with widely deployed devices, and which builds upon widespread familiarity with existing Web technologies. The standards should be scalable to enable richer capabilities for subsequent generations of multimodal devices.
Users will be able to provide input via speech, handwriting, motion or keystrokes, with output presented via displays, pre-recorded and synthetic speech, audio, and tactile mechanisms such as mobile phone vibrators and Braille strips. Application developers will be able to provide an effective user interface for whichever modes the user selects. To encourage rapid adoption, the same content can be designed for use on both old and new devices. The range of multimodal access available depends on the device: users of multimodal devices that include not only keypads but also a touch panel, microphone and motion sensor can take advantage of all the available modalities, while users of devices with restricted capabilities may prefer simpler, lighter modalities such as keypads and voice.
The target audience of the Multimodal Interaction Working Group is vendors and service providers of multimodal applications, and should include a range of organizations in different industry sectors, such as:
Under previous charters, dating back to 2002, the Multimodal Interaction Working Group has created the following W3C Technical Reports.
The following suite of specifications published by the group is known as the W3C Multimodal Interaction Framework.
In addition to the above, here is a list of documents published by the group.
To assist with realizing this goal, the Multimodal Interaction Working Group is tasked with providing a loosely coupled architecture for multimodal user interfaces, which allows for co-resident and distributed implementations, and focuses on the role of markup and scripting, and the use of well defined interfaces between its constituents. The framework is motivated by several basic design goals including (1) Encapsulation, (2) Distribution, (3) Extensibility, (4) Recursiveness and (5) Modularity.
Where practical, this work should leverage existing W3C work. See the list of dependencies for information on related groups.
The Working Group may expand the Multimodal Architecture and Interfaces document to explore a language definition that would make it easier to adapt existing I/O devices and software, e.g., an MRCP-enabled ASR engine, to interact with the framework of life cycle events defined in the architecture.
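As a concrete illustration, a life cycle event by which an Interaction Manager asks a voice Modality Component to begin processing might look roughly like the sketch below. The element and attribute names follow the working drafts of the Multimodal Architecture and Interfaces document and may change before Recommendation; all URIs and identifiers here are hypothetical.

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <!-- The Interaction Manager (Source) asks a voice Modality Component
       (Target) to start running content within an existing context. -->
  <mmi:StartRequest Context="http://example.com/context-1"
                    RequestID="request-1"
                    Source="http://example.com/im-1"
                    Target="http://example.com/voice-mc-1">
    <!-- Content may be passed by reference; the URL is hypothetical. -->
    <mmi:ContentURL URL="http://example.com/dialog.vxml"/>
  </mmi:StartRequest>
</mmi:mmi>
```

The Modality Component would answer with a corresponding StartResponse event, and later report completion with a DoneNotification, using the same Context and RequestID values to correlate the exchange.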
The Working Group will investigate and recommend how various W3C languages can be extended for use in a multimodal environment using the multimodal life cycle events. The Working Group may prepare W3C Notes on how the following languages can participate in multimodal specifications by incorporating the life cycle events from the multimodal architecture: XHTML, VoiceXML, MathML, SMIL, SVG, InkML and other languages that can be used in a multimodal environment. The Working Group is also interested in investigating how CSS and Delivery Context: Client Interfaces (DCCI) can be used to support multimodal interaction applications and, if appropriate, may write a W3C Note.
Define Modality Components in the MMI Architecture that are responsible for controlling the various input and output modalities on various devices. Possible examples of Modality Components include ink capture and playback, biometrics, media capture and playback (audio, video, images, sensor data), speech recognition, text to speech, SVG, geolocation, voice dialog and emotion.
The group will generate a document that describes (1) the detailed definition of Modality Components and (2) how to build concrete Modality Components. This is not expected to be a Recommendation-track document, but it should be folded into an informative appendix of the Multimodal Architecture and Interfaces specification. Although the group will provide several sample Modality Component implementations as examples for the document, its main purpose is not to provide code for every possible Modality Component but to define clearly what Modality Components are and how to build them.
The Working Group will also create separate documents for selected Ink and Voice modalities. Those documents will first be published as WG Notes and then incorporated into the Multimodal Architecture and Interfaces specification. The first public WG Notes are expected in 4Q 2009.
EMMA is a data exchange format for the interface between different levels of input processing and interaction management in multimodal and voice-enabled systems. It provides the means for input processing components, such as speech recognizers, to annotate application-specific data with information such as confidence scores, time stamps, and input mode classification (e.g. keystrokes, touch, speech, or pen). EMMA also provides mechanisms for representing alternative recognition hypotheses as well as groups and sequences of inputs. EMMA is a target data format for the Semantic Interpretation specification developed in the W3C Voice Browser Activity, which defines augmentations to speech grammars that allow extraction of application-specific data as a result of speech recognition. EMMA supersedes earlier work on the Natural Language Semantics Markup Language (NLSML) in the Voice Browser Activity.
EMMA became a W3C Recommendation in February 2009 and is the first specification from the W3C Multimodal Interaction Working Group to reach that status. The EMMA specification has been implemented by more than 10 different companies and institutions.
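For example, a speech recognizer handling a travel query might return two competing hypotheses in an EMMA document along the following lines; the application-specific payload elements (`origin`, `destination`) are hypothetical, since EMMA deliberately leaves the application data vocabulary to the application.

```xml
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- Alternative recognition hypotheses, most confident first -->
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75"
                         emma:tokens="flights from boston to denver">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68"
                         emma:tokens="flights from austin to denver">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
```

An interaction manager can then either act on the top hypothesis or use the annotations (confidence, tokens, mode) to drive a confirmation dialog.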
In the period defined by the new charter, the group will actively maintain and address any issues that arise with the existing EMMA specification, possibly resulting in the publication of an interim draft. The group will also investigate potential extensions of the EMMA language to support new features such as:
The group may publish an updated version of the EMMA specification with the above capabilities.
Bringing InkML to Recommendation
InkML provides a range of features to support real-time ink streaming, multi-party interactions and richly annotated ink archival. Applications may make use of as much or as little information as required, from minimalist applications using only simple traces to more complex problems, such as signature verification or calligraphic animation, requiring full dynamic information. As a platform-neutral format for digital ink, InkML can support collaborative or distributed applications in heterogeneous environments, such as courier signature verification and distance education. The specification is the product of several years of work by a cross-sector working group with input from Apple, Corel, HP, IBM, Maplesoft, Microsoft and Motorola as well as invited experts from academia and other sources.
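At its simplest, digital ink in InkML is a sequence of traces, each recording the sampled pen coordinates of one stroke; richer applications add channel definitions for pressure, tilt and timing. A minimal sketch (coordinate values are illustrative):

```xml
<ink xmlns="http://www.w3.org/2003/InkML">
  <!-- Each trace is one pen-down/pen-up stroke;
       points are comma-separated "x y" samples. -->
  <trace>10 0, 9 14, 8 28, 7 42, 6 56, 6 70, 8 84, 8 98</trace>
  <trace>25 17, 25 32, 25 44, 26 60, 27 75, 28 90</trace>
</ink>
```

A minimalist renderer can simply connect the points of each trace, while a handwriting recognizer or signature verifier would also consume the dynamic channels when present.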
This work includes defining Modality Components for various possible ink applications, e.g., ink capture, ink playback, handwriting recognition and gesture recognition. The work on defining Modality Components will be carried out in collaboration with the "Modality Components Definition" work described above.
Bringing Emotion Markup Language (EmotionML) to Candidate Recommendation
EmotionML will provide representations of emotions and related states for technological applications. The possible use cases include:
Some of these applications already exist on the market, while others exist only as research prototypes. However, development in this area is very rapid.
A standardized notation for emotions is needed: emotions are conceptually well understood in the scientific literature, but engineers tend to get them wrong when building actual applications. W3C can help avoid fragmentation of emotion-related technology by providing a scientifically well-founded format that can be used generally.
Naturalistic, interactive multimodal applications need to account for emotions and related human factors. EmotionML will serve as a "plug-in" language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.
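Based on the incubator group drafts, an emotion annotation might look roughly as follows. This is purely illustrative: the element names, attribute names, vocabulary identifiers and namespace URI were all still under discussion at the time of chartering.

```xml
<!-- Illustrative sketch only; syntax was not yet finalized. -->
<emotion xmlns="http://www.w3.org/2009/10/emotionml">
  <!-- A category label drawn from a declared emotion vocabulary,
       with a scalar intensity -->
  <category name="satisfaction"/>
  <intensity value="0.6"/>
</emotion>
```

The same representation is intended to serve all three use areas: attached to a corpus item as manual annotation, emitted by a recognizer as its analysis of user behavior, or consumed by a generation component to shape system output.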
The specification of EmotionML can build on previous work of the Emotion Incubator Group and the Emotion Markup Language Incubator Group. These groups have identified use cases and requirements, and have drafted elements of a specification for a core set of requirements. The work of those groups has shown that there is already a high degree of consensus on how to represent emotions, so the Multimodal Interaction Working Group considers standardization feasible.
The following design goals motivate the specification:
The EmotionML task in the Multimodal Interaction Working Group will continue the work of the Emotion Markup Language Incubator Group on the Recommendation track, and aims to produce a W3C Recommendation. Building on the previous work, we expect to publish a First Public Working Draft within the first three months of this charter and to make further rapid progress after that.
Following the publication of EMMA and InkML as Recommendations, the Working Group will maintain those specifications; that is, it will respond to questions and requests on the public mailing list and issue errata as needed. The Working Group will also consider publishing additional versions of the specifications, depending on factors such as feedback from the user community and any requirements generated for EMMA and InkML by the Multimodal Architecture and Interfaces work and the Multimodal Authoring work.
For each document to advance to Proposed Recommendation, the group will produce a technical report demonstrating at least two independent, interoperable implementations of each required feature. The Working Group anticipates two interoperable implementations of each optional feature as well, but may relax this criterion for specific optional features.
The following documents are expected to become W3C Recommendations:
The following documents are either notes or are not expected to advance toward Recommendation:
This Working Group is chartered to last until 31 January 2011. The first face-to-face meeting after re-chartering is planned in association with the Technical Plenary.
Here is a list of milestones identified at the time of re-chartering. Others may be added later at the discretion of the Working Group. The dates are for guidance only and subject to change.
Note: The group will document significant changes from this initial schedule on the group home page.

Specification | FPWD | LC | CR | PR | Rec
---|---|---|---|---|---
Multimodal Architecture and Interfaces | Completed | March 2010 | June 2010 | 4Q 2010 | 1Q 2011
EMMA 2.0 | 4Q 2009 | January 2011 | TBD | TBD | TBD
InkML | Completed | 1st LC: Completed; 2nd LC: September 2009 | November 2009 | April 2010 | June 2010
EmotionML | October 2009 | July 2010 | March 2011 | TBD | TBD
Ink Modality Component Definition | December 2009 (as a WG Note) | - | - | - | -
Voice Modality Component Definition | December 2009 (as a WG Note) | - | - | - | -
These are W3C activities that may be asked to review documents produced by the Multimodal Interaction Working Group, or which may be involved in closer collaboration as appropriate to achieving the goals of the Charter.
synchronized text and video
The MMI Architecture may include a video content server as a Modality Component, so collaboration on how to handle a Modality Component for video service would be beneficial to both groups.
The Working Group may cooperate with two other Working Groups (Media Fragments, Media Annotations) in the Video in the Web Activity as well.
Furthermore, Multimodal Interaction Working Group expects to follow these W3C Recommendations:
The following external groups have goals complementary to those of the Multimodal Interaction Activity. W3C has formal liaison agreements with some of them, e.g., the VoiceXML Forum.
To be successful, the Multimodal Interaction Working Group is expected to have 10 or more active participants for its duration. Effective participation in the Multimodal Interaction Working Group is expected to consume one work day per week for each participant; two days per week for editors. The Multimodal Interaction Working Group will also allocate the necessary resources for building Test Suites for each specification.
To make rapid progress, the Multimodal Interaction Working Group is organized into several subgroups, each working on a separate document. Working Group members may participate in one or more subgroups.
Participants are reminded of the Good Standing requirements of the W3C Process.
Experts from appropriate communities may also be invited to join the working group, following the provisions for this in the W3C Process.
Working Group participants are not obligated to participate in every work item; however, the Working Group as a whole is responsible for reviewing and accepting all work items.
For budgeting purposes, we may hold up to three full-group face-to-face meetings per year if we believe them to be beneficial. The Working Group anticipates holding a face-to-face meeting in association with the Technical Plenary. The Chair will make Working Group meeting dates and locations available to the group in a timely manner according to the W3C Process. The Chair is also responsible for providing publicly accessible summaries of Working Group face-to-face meetings, which will be announced on www-multimodal@w3.org.
This group primarily conducts its work on the Member-only mailing list w3c-mmi-wg@w3.org (archive). Certain topics need coordination with external groups. The Chair and the Working Group can agree to discuss these topics on a public mailing list. The archived mailing list www-multimodal@w3.org is used for public discussion of W3C proposals for Multimodal Interaction Working Group and for public feedback on the group's deliverables.
Information about the group (deliverables, participants, face-to-face meetings, teleconferences, etc.) is available from the Multimodal Interaction Working Group home page.
All proceedings of the Working Group (mail archives, teleconference minutes, face-to-face minutes) will be available to W3C Members. Summaries of face-to-face meetings will be sent to the public list.
As explained in the Process Document (section 3.3), this group will seek to make decisions when there is consensus. When the Chair puts a question and observes dissent, after due consideration of different opinions, the Chair should record a decision (possibly after a formal vote) and any objections, and move on.
This charter is written in accordance with Section 3.4, Votes of the W3C Process Document and includes no voting procedures beyond what the Process Document requires.
This Working Group operates under the W3C Patent Policy (5 February 2004 Version). To promote the widest adoption of Web standards, W3C seeks to issue Recommendations that can be implemented, according to this policy, on a Royalty-Free basis.
For more information about disclosure obligations for this group, please see the W3C Patent Policy Implementation.
This charter for the Multimodal Interaction Working Group has been created according to section 6.2 of the Process Document. In the event of a conflict between this document or the provisions of any charter and the W3C Process, the W3C Process shall take precedence.
The most important changes from the previous charter are:
Copyright © 2009 W3C® (MIT, ERCIM, Keio), All Rights Reserved.
$Date: 2009/08/11 14:43:22 $