Group Photo from the Workshop




  • Nagesh Kharidi, Openstream
  • Jim Barnett, Genesys
  • Debbie Dahl, Conversational Technologies
  • Michael Liguori, What Are Minds For, Inc
  • Holger Banski, Bosch
  • Craig Campbell, iSpeech
  • Yoshiaki Kozaki, NTT-AT
  • Antonio Teixeira, DETI/IEETA
  • Noreen Whysel, IA Institute
  • Bev Corwin, IA Institute
  • Sheau Ng, NBCUniversal
  • Wei-Yun Yau, Institute for Infocomm Research
  • Jens Bachmann, Panasonic
  • Ram Bojanki, Panasonic
  • Ryosuke Aoki, NTT
  • Masaki Umejima, JSCA and Keio University
  • Masao Isshiki, W3C/Keio
  • Kaz Ashimura, W3C/Keio
  • Michael Johnston, AT&T
  • Raj Tumuluri, Openstream
  • Hari Saravanan, Openstream
  • Amy Neustein, Linguistic Technology Systems
  • Myra Einstein, NBCUniversal
  • Suresh Ganesan, Cognizant
  • Phil Sheehy, Openstream
  • Peter Rosenberg, NBCUniversal

Executive Summary

Ease of user interaction (user experience) with applications has become a prime focus worldwide, thanks to the proliferation of new devices and platforms including mobile phones, tablet devices, eBook readers, and gaming platforms. In addition, traditional platforms such as TVs, audio systems, and automobiles are rapidly becoming capable of much more intelligent interaction than in the past.

User interaction through speech, touch, gesture and swipe has become the key differentiator in the success of popular applications today. One of the key advantages of the W3C Multimodal Architecture (MMI) is its suitability for applications ranging from simple to sophisticated, across devices, for creating compelling user experiences, leveraging advances in I/O methodologies, and supporting interoperability among multiple vendors' products.

This workshop aimed to highlight the merits of HTML5 and the W3C Multimodal Architecture, and to raise awareness of the maturity of the MMI Architecture and its suitability for developing innovative and compelling user experiences across applications and devices.

Seventeen position papers were submitted, and there were 26 registered participants. There were 18 presentations spread over the two days of the workshop.

Workshop Discussions

Day 1 July 22, 2013

The workshop was opened by Raj Tumuluri, CEO of the workshop host, Openstream. During the first session, three demos using the W3C Multimodal Architecture specification were presented, illustrating applications in the areas of health care, sentiment analysis, and enhanced interaction for ambient assisted living.

The second session provided an overview of current standards related to multimodal applications: the MMI Architecture, SCXML, EMMA, and other related standards (HTML5, Ajax and Web Intents).

The afternoon sessions began with a presentation by Openstream on its platform-independent MMI Architecture compliant authoring framework (Cue-me).

This was followed by a panel of presentations on multimodal use cases. The use case presentations came from a wide range of industries, including automotive (Bosch), speech technology (iSpeech), publishing (iVVi Media), disaster information (NTT), entertainment (NBCUniversal and Institute for Infocomm Research), and website design (Information Architecture Institute).

In addition, this session included a presentation on the ECHONET standard for consumer electronics by Masaki Umejima (JSCA and Keio University) and Masao Isshiki (ECHONET Consortium and W3C/Keio).

Day 2 July 23, 2013

The second day of the workshop began with presentations on new directions in multimodal standards, including discovery and registration, future versions of EMMA, MMI over WebSocket and TV Anywhere.

Following these presentations, the attendees reviewed the discussion topics and prioritized them according to their interests. Service/device discovery and HTML5 integration were the highest priority topics. These were followed by using EMMA for output, timing and time zones, streaming inputs, and additional work on use cases, and finally by related standards such as FIDO, ECHONET Lite and biometric standards.

The attendees selected Device/Service Discovery for detailed discussion and exploration.

Within the topic of Device/Service Discovery, we noted several use cases of interest:

  1. devices dynamically become part of a group in the workplace
  2. second screen scenarios
  3. sensor input
  4. integrating medical devices

Continuing the discussion of Device/Service Discovery, the attendees brainstormed a number of requirements and issues from the industries represented in the workshop. These included:

  1. dealing with very transient services
  2. publishing the capabilities of a service and the publisher/subscriber model
  3. how to match the semantics of a service with an application's requirements in the areas of capability, availability, and privilege
  4. service discovery in the cloud as well as discovery of nearby services
  5. should there be a service discovery module in the MMI Architecture?
  6. the relationship between service discovery and device discovery
  7. the role of Web Service Description Language (WSDL)
  8. clarification of an API to a service vs a service's capability description
  9. brokering/privilege
  10. user-initiated vs app-initiated discovery
  11. Semantic Web services

Another set of issues centered around related groups and standards, including DLNA, ECHONET, Web Intents, Device APIs, and Web and TV Interest Group activities. We agreed that it is important to understand the relationships between MMI and these activities.

The afternoon session started with further discussion of selected new directions, focusing on the second high-priority topic, HTML5 integration. Use cases and issues discussed included:

  1. second screen
  2. voice-enabled personal assistant
  3. connected TV
  4. browsers that take multiple inputs
  5. input from other modalities, such as cameras
  6. the general problem of getting HTML5 to be a modality component
  7. issues around including the interaction manager within the browser
  8. synchronizing HTML browsers across different devices
  9. synchronizing with non-HTML displays
  10. compatibility between HTML5 and ECHONET
  11. shared browsers in contact centers
  12. timing, including millisecond coordination and processing order
  13. fast communication in gaming applications (HTML5 is still not fast enough)

Possible next steps for attendees include joining the MMI Working Group. The formation of a Business Group or Community Group that would gather requirements on topics such as service discovery, industry use cases, related standards, and timing issues was also discussed. Another next step would be forming a joint task force between the MMI WG and the Web&TV Interest Group.

The MMI Working Group will provide links to resources such as open source JavaScript libraries that can be used for MMI Architecture-based applications in conventional browsers. It also plans to organize follow-on webinars on high-priority topics. We expect Service Discovery to be the next topic.

The final session of the workshop was a hands-on session with the Openstream Cue-me Studio, which allowed the workshop attendees to install Cue-me Studio and develop a simple multimodal mobile application.

The workshop concluded with thanks to everyone and encouragement to join the public MMI mailing list (just send a message to with the subject line "subscribe" (the message body can be empty)).