Open, Interoperable Means For Integrated Multimodal Interaction

T. V. Raman



At the recently held W3C workshop on compound documents and Web applications, mixed-namespace XML documents composed of W3C technologies that each address a specific design point on the Web were identified as the means for building next-generation Web applications. Since the inception of XML in the second half of the 1990s, the W3C has defined a set of reusable XML technologies that lend themselves well to this end. In this position paper, we examine how existing W3C specifications come together to address the goals of multimodal interaction, i.e., Web applications that let the user interact through multiple synchronized modes of interaction to achieve rapid task completion.

1. Multimodal Web Applications

Multimodal interaction, the means by which multiple interaction modalities such as GUI, pen and speech interaction come together to provide a seamless, synchronized end-user experience, has been the focus of the W3C MMI activity since early 2002. Multimodal interaction is best achieved by combining W3C technologies that were designed to address individual interaction modalities (e.g., SVG and XHTML for visual interaction and VoiceXML for voice interaction) into mixed-namespace XML documents. Eventing can form the common glue when bringing together these different technologies to enable seamless user interaction; XML Events provides a consistent means of authoring event bindings in such mixed-namespace XML documents.
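As a rough illustration of the paragraph above, the following Python sketch parses a small mixed-namespace document and lists the XML Events bindings it contains. The sample markup is only loosely modelled on X+V and is an assumption made for illustration, not an excerpt from any specification; the ids (`city`, `sayCity`) and the helper names are hypothetical. Only the three namespace URIs are the published W3C ones.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Published W3C namespace URIs.
XHTML_NS = "http://www.w3.org/1999/xhtml"
VXML_NS = "http://www.w3.org/2001/vxml"
EV_NS = "http://www.w3.org/2001/xml-events"

# Hypothetical sample: an XHTML page with an embedded VoiceXML form.
# Focusing the text field is bound to the voice form through XML Events.
DOC = """
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>City lookup</title>
    <vxml:form id="sayCity">
      <vxml:field name="city">
        <vxml:prompt>Which city?</vxml:prompt>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <input type="text" id="city" ev:event="focus" ev:handler="#sayCity"/>
  </body>
</html>
"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def namespace_counts(root):
    """Count the elements contributed by each namespace."""
    return Counter(split_tag(element.tag)[0] for element in root.iter())


def event_bindings(root):
    """Return (observer id, event type, handler reference) for each binding."""
    bindings = []
    for element in root.iter():
        event = element.get(f"{{{EV_NS}}}event")
        handler = element.get(f"{{{EV_NS}}}handler")
        if event and handler:
            bindings.append((element.get("id"), event, handler))
    return bindings


root = ET.fromstring(DOC)
counts = namespace_counts(root)
bindings = event_bindings(root)
print(counts)
print(bindings)
```

The point of the sketch is that the binding itself lives in a third namespace: the visual and voice vocabularies do not need to know about each other, only about the shared event vocabulary.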

More recently, the W3C workshop on Web applications and compound documents identified such mixed-namespace documents as the next step in building Web applications. Interestingly, the approach of combining different W3C specifications into mixed-namespace documents and using standard DOM2 eventing to handle user interaction across such documents has been adopted independently by different groups, as enumerated below; this suggests that the approach is both viable and robust.


XHTML+SMIL: Combines the Synchronized Multimedia Integration Language (SMIL) with XHTML to create multimedia presentations. This is of particular interest to mobile vendors wishing to deliver multimedia services.


XHTML+SVG: At the Web Applications Workshop, participants from the mobile industry identified the combination of XHTML and SVG Mobile as a key need for delivering rich visual interaction to mobile devices.


XHTML+Voice (X+V): Starting in the fall of 2002, major industry players including IBM have delivered a variety of multimodal solutions using a combination of XHTML and VoiceXML as documented in the X+V specification. This design enables several different forms of multimodal deployment, ranging from thick clients to distributed multimodal applications that run on the network.
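All of the combinations listed above rely on the DOM2 event flow to carry user interaction across namespaces. The following Python sketch is a simplified model of that flow (capture phase, target phase, bubble phase) and not a DOM implementation; the `Node` class, the `dispatch` function and the listener labels are hypothetical names chosen for illustration, and features such as `stopPropagation` are left out.

```python
class Node:
    """A minimal stand-in for a DOM node that can hold event listeners."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.listeners = []  # entries are (event_type, callback, use_capture)

    def add_event_listener(self, event_type, callback, use_capture=False):
        self.listeners.append((event_type, callback, use_capture))

    def fire(self, event_type, capture, phase, log):
        """Run the listeners registered for this event type and phase kind."""
        for registered_type, callback, use_capture in self.listeners:
            if registered_type == event_type and use_capture == capture:
                callback(self, phase, log)


def dispatch(target, event_type, bubbles=True):
    """Deliver an event to `target` and return the trace of listeners run."""
    log = []
    # Collect ancestors, nearest parent first.
    ancestors = []
    node = target.parent
    while node is not None:
        ancestors.append(node)
        node = node.parent
    # Capture phase: from the root down to the target's parent.
    for node in reversed(ancestors):
        node.fire(event_type, True, "capture", log)
    # Target phase: in DOM2, capturing listeners do not run on the target.
    target.fire(event_type, False, "target", log)
    # Bubble phase: from the target's parent back up to the root.
    if bubbles:
        for node in ancestors:
            node.fire(event_type, False, "bubble", log)
    return log


def record(label):
    """Build a listener that appends (phase, node name, label) to the trace."""
    def callback(node, phase, log):
        log.append((phase, node.name, label))
    return callback


# A tiny tree: html > body > button.
html = Node("html")
body = Node("body", parent=html)
button = Node("button", parent=body)

html.add_event_listener("click", record("audit"), use_capture=True)
button.add_event_listener("click", record("visual-feedback"))
body.add_event_listener("click", record("voice-prompt"))

trace = dispatch(button, "click")
for entry in trace:
    print(entry)
```

In this model a voice handler registered on an ancestor and a visual handler registered on the target both observe the same user action, which is the sense in which eventing acts as the glue between modalities.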

In all such compound (mixed-namespace) documents, XML modularization defines syntax, Cascading Style Sheets (CSS) defines presentation semantics, and XML Events defines user interaction semantics. Going forward, we believe that multimodal interaction using a combination of open standards and interoperable Web Services will lead to a rich suite of end-user applications. Key end-user applications include:

Multimodal interaction in the car (telematics).
Multimodal interaction on mobile devices, e.g., PDAs and smart phones.
Information kiosks in noisy environments.

Key challenges in this area include deploying such multimodal services across the network as distributed applications; for instance, it may be necessary to off-load complex processing to the network when working with mobile devices. Additionally, the global network remains the richest source of information services for the mobile client; this makes the integration of Web Services, mobile devices and multimodal interaction a strong candidate for the next killer application.


[w3c-mmi] W3C MMI. Multimodal Interaction Activity.

[w3c-webapps] W3C WebApps. W3C Workshop On Web Applications And Compound Documents.

[ibm-webapps-paper] IBM WebApps Position Paper. Authoring, Deploying And Consuming Dynamic Web Applications Using Mixed-Namespace XML Documents.

[ibm-webapps-slides] IBM WebApps Slides. Authoring, Deploying And Consuming Dynamic Web Applications Using Mixed-Namespace XML Documents.

[w3c-xforms] XForms. W3C XForms 1.0.

[w3c-voicexml] VoiceXML 2.0. W3C VoiceXML 2.0.

[w3c-xevents] XML Events. XML Events 1.0.

[xhtml-voice] X+V. W3C Note: Combining XHTML And VoiceXML.

[w3c-svg] SVG Mobile, SVG 1.1. Scalable Vector Graphics (SVG).

[w3c-smil] SMIL 2.0. Synchronized Multimedia Integration Language.

[xhtml-smil] XHTML+SMIL. Combining XHTML And SMIL.

[xv-deploy] Versatile Multimodal Solutions. X+V: Authoring, Deploying And Consuming Multimodal Services.