Host

Summary

Attendees

Nagesh Kharidi, Openstream
Jim Barnett, Genesys
Debbie Dahl, Conversational Technologies
Michael Liguori, What Are Minds For, Inc
Holger Banski, Bosch
Craig Campbell, iSpeech
Yoshiaki Kozaki, NTT-AT
Antonio Teixeira, DETI/IEETA
Noreen Whysel, IA Institute
Bev Corwin, IA Institute
Sheau Ng, NBCUniversal
Wei-Yun Yau, Institute for Infocomm Research
Jens Bachmann, Panasonic
Ram Bojanki, Panasonic
Ryosuke Aoki, NTT
Masaki Umejima, JSCA and Keio University
Masao Isshiki, W3C/Keio
Kaz Ashimura, W3C/Keio
Michael Johnston, AT&T
Raj Tumuluri, Openstream
Hari Saravanan, Openstream
Amy Neustein, Linguistic Technology Systems
Myra Einstein, NBCUniversal
Suresh Ganesan, Cognizant
Phil Sheehy, Openstream
Peter Rosenberg, NBCUniversal

Executive Summary

Ease of user interaction (user experience) with applications has become a prime focus worldwide, thanks to the proliferation of new devices and platforms including mobile phones, tablet devices, eBook readers, and gaming platforms. In addition, traditional platforms such as TVs, audio systems, and automobiles are rapidly becoming capable of much more intelligent interaction than in the past.

User interaction through speech, touch, gesture and swipe has become the key differentiator in the success of popular applications today. One of the key advantages of the W3C Multimodal Architecture (MMI) is its suitability for applications ranging from simple to sophisticated across devices: it helps create compelling user experiences, leverages advances in I/O methodologies, and supports interoperability among multiple vendors' products.

This workshop aimed to highlight the merits of HTML5 and the W3C Multimodal Architecture, raising awareness of the maturity of the MMI Architecture and its suitability for developing innovative and compelling user experiences across applications and devices.

Seventeen position papers were submitted, and there were 26 registered participants. There were 18 presentations spread over the two days of the workshop.

Workshop Discussions

Day 1 July 22, 2013

The workshop was opened by Raj Tumuluri, CEO of the workshop host, Openstream. During the first session, three demos using the W3C Multimodal Architecture specification were presented, illustrating applications in the areas of health care, sentiment analysis, and enhanced interaction for ambient assisted living.

The second session provided an overview of current standards related to multimodal applications: the MMI Architecture, SCXML, EMMA, and other related standards (HTML5, Ajax and Web Intents).

The afternoon sessions began with a presentation by Openstream on Cue-me, its platform-independent authoring framework that complies with the MMI Architecture.

This was followed by a panel of presentations on multimodal use cases. The use case presentations came from a wide range of industries, including automotive (Bosch), speech technology (iSpeech), publishing (iVVi Media), disaster information (NTT), entertainment (NBCUniversal and Institute for Infocomm Research), and website design (Information Architecture Institute).

In addition, this session included a presentation on the ECHONET standard for consumer electronics by Masaki Umejima (JSCA and Keio University) and Masao Isshiki (ECHONET Consortium and W3C/Keio).

Day 2 July 23, 2013

The second day of the workshop began with presentations on new directions in multimodal standards, including discovery and registration, future versions of EMMA, MMI over WebSocket and TV Anywhere.

Following these presentations, the attendees reviewed the discussion topics and prioritized them according to their interests. Service/device discovery and HTML5 integration were the highest priority topics. These were followed by using EMMA for output; timing and time zones; streaming inputs; and additional work on use cases. Last came related standards, for example FIDO, ECHONET Lite and biometric standards.

The attendees selected Device/Service Discovery for detailed discussion and exploration.

Within the topic of Device/Service Discovery, we noted several use cases of interest:

devices dynamically become part of a group in the workplace
second screen scenarios
sensor input
integrating medical devices

Continuing the discussion of Device/Service Discovery, the attendees brainstormed a number of requirements and issues from the industries represented in the workshop. These included:

dealing with very transient services
publishing the capabilities of a service and the publisher/subscriber model
how to match the semantics of a service with an application's requirements in the areas of capability, availability, and privilege
service discovery in the cloud as well as discovery of nearby services
should there be a service discovery module in the MMI Architecture?
the relationship between service discovery and device discovery
the role of Web Service Description Language (WSDL)
clarification of an API to a service vs a service's capability description
brokering/privilege
user-initiated vs app-initiated discovery
Semantic Web services

Another set of issues centered around related groups and standards, including DLNA, ECHONET, Web Intents, Device APIs, and Web and TV Interest Group activities. We agreed that it is important to understand the relationships between MMI and these activities.

The afternoon session began with further discussion of selected new directions, focusing on the second high-priority topic, HTML5 integration. Use cases and issues discussed included:

second screen
voice-enabled personal assistant
connected TV
browsers that take multiple inputs
input from other modalities, such as cameras
the general problem of getting HTML5 to be a modality component
issues around including the interaction manager within the browser
synchronizing HTML browsers across different devices
synchronizing with non-HTML displays
compatibility between HTML5 and ECHONET
shared browsers in contact centers
timing, including millisecond coordination and processing order
fast communication in gaming applications (HTML5 is still not fast enough)

Possible next steps include attendees joining the MMI Working Group. The formation of a Business Group or Community Group that would gather requirements on topics such as service discovery, industry use cases, related standards, and timing issues was also discussed. Another next step would be forming a joint task force between the MMI WG and the Web and TV Interest Group.

The MMI Working Group will provide links to resources such as open source JavaScript libraries that can be used for MMI Architecture-based applications in conventional browsers. It also plans to organize follow-on webinars on high-priority topics. We expect Service Discovery to be the next topic.

The final session of the workshop was a hands-on session with the Openstream Cue-me Studio, which allowed the workshop attendees to install Cue-me Studio and develop a simple multimodal mobile application.

The workshop concluded with thanks to everyone and encouragement to join the public MMI mailing list. To subscribe, send a message to www-multimodal-request@w3.org with the subject line "subscribe"; the message body can be empty.