Position paper for the W3C/WAP Workshop on the Multi-modal Web

Stéphane H. Maes and T. V. Raman
IBM Research
Human Language Technologies Group


This position paper outlines an approach to authoring once for the future mobile Internet, where information is expected to be accessible anytime, anywhere, through any device, and where users can at any moment choose the device and modality best suited to their abilities and capabilities at that moment.

We first define some terminology:

Channel: a particular device or a particular modality.

Multi-channel applications: applications designed for ubiquitous access through different channels, one channel at a time.

Multi-modal applications: multi-channel applications in which multiple channels are simultaneously available and synchronized.

Multiple authoring

With multiple authoring, content is created separately for each channel.

In addition, for multi-modal applications, the developer must also specify the synchronization of the different channels.

Single Authoring For the Mobile Internet

We feel there is a strong need for a language that supports single authoring across a large variety of devices and modalities. As a first step, we would like to focus the W3C/WAP workshop on collecting requirements for such a language.

Motivation For Single Authoring

Single authoring is motivated by the need to author, maintain, and revise content for delivery to an ever-increasing variety of end-user devices. Hand authoring of the target pages leads to the “M times N problem”: an application composed of M “pages” to be accessed via N devices requires M x N authoring steps and results in M x N presentation pages to maintain. Generic separation of content from presentation results in non-reusable style sheets and a similar M x N problem with the style sheets.
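To make the M x N arithmetic concrete, the following Python sketch counts the artifacts an author must maintain. It assumes, as an illustration rather than a claim made in this paper, that single authoring with reusable style sheets costs one device-independent page per page plus one style sheet per device, i.e., M + N; the function names are hypothetical.

```python
# Hypothetical counting model, for illustration only. Multiple authoring:
# one presentation page per (page, device) pair. Single authoring: one
# device-independent page per page plus one reusable style sheet per device.


def multiple_authoring_artifacts(m_pages: int, n_devices: int) -> int:
    """Hand authoring: every page is written once per target device."""
    return m_pages * n_devices


def single_authoring_artifacts(m_pages: int, n_devices: int) -> int:
    """Single authoring: M device-independent pages plus N shared style sheets."""
    return m_pages + n_devices


if __name__ == "__main__":
    for m, n in [(20, 3), (20, 10), (100, 10)]:
        print(
            f"M={m:3d} N={n:2d}  "
            f"multiple={multiple_authoring_artifacts(m, n):5d}  "
            f"single={single_authoring_artifacts(m, n):4d}"
        )
```

Under these assumptions, 100 pages delivered to 10 devices means 1000 hand-authored pages versus 110 artifacts, and the gap widens as either M or N grows.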

Multiple authoring is an even more complex problem when synchronization is needed across channels.

Single Document, Multiple Views

We advocate a programming approach that separates content from presentation, enabling reusable style sheets that produce a default final-form presentation for each channel. Specialization can then be performed in-line or via channel-specific style sheets.

The underlying principle of single authoring is the Model-View-Controller (MVC) paradigm.

Separating content from presentation in order to achieve content reuse is now the accepted way of deploying information on the World Wide Web. In the current W3C architecture, such separation is achieved by representing content in XML that is then transformed into appropriate final-form presentations via XSL transformations.
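As a minimal sketch of a single document with multiple views, the Python code below keeps one XML content document, applies a reusable default "style sheet" per channel, and lets one channel be specialized with an override. It assumes plain Python functions as stand-ins for XSL transformations, since the Python standard library ships no XSLT processor; the channel names, `render`, and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional
from xml.sax.saxutils import escape

# Device-independent content: authored once.
CONTENT = """
<page>
  <title>Flight status</title>
  <item>Flight 12 departs at 10:05</item>
  <item>Flight 34 is delayed</item>
</page>
"""

# Stand-in for an XSL transformation: content tree in, final form out.
StyleSheet = Callable[[ET.Element], str]


def html_view(page: ET.Element) -> str:
    """Default visual presentation, reusable for every page."""
    items = "".join(f"<li>{escape(i.text or '')}</li>" for i in page.iter("item"))
    return f"<h1>{escape(page.findtext('title', ''))}</h1><ul>{items}</ul>"


def voice_view(page: ET.Element) -> str:
    """Default auditory presentation: title and items read as sentences."""
    parts = [page.findtext("title", "")] + [i.text or "" for i in page.iter("item")]
    return ". ".join(parts) + "."


DEFAULT_STYLES: Dict[str, StyleSheet] = {"html": html_view, "voice": voice_view}


def render(page: ET.Element, channel: str,
           overrides: Optional[Dict[str, StyleSheet]] = None) -> str:
    """Apply a channel-specific override if present, else the default style sheet."""
    overrides = overrides or {}
    if channel in overrides:
        return overrides[channel](page)
    return DEFAULT_STYLES[channel](page)


if __name__ == "__main__":
    page = ET.fromstring(CONTENT.strip())
    print(render(page, "html"))
    print(render(page, "voice"))
    # Specialize only the voice channel: read just the first item.
    short_voice = {"voice": lambda p: p.findtext("item", "") + "."}
    print(render(page, "voice", short_voice))
```

The default style sheets are shared by every page, so adding a page costs one content document, while an override touches only the channel it names.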

Appropriate Factorization

We believe that single authoring can be achieved by recognizing that, in addition to form and content, there is a third component, the interaction, which lies at the heart of turning static presentations of information into interactive ones.
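The following sketch illustrates interaction as a separate component. It assumes that an interaction can be described abstractly by a hypothetical `SelectOne` object (a prompt, a set of options, and a rule for accepting an answer) that each channel renders in its own way; none of these names come from an existing specification.

```python
from dataclasses import dataclass
from typing import List, Optional
from xml.sax.saxutils import escape


@dataclass
class SelectOne:
    """Hypothetical channel-independent interaction: choose one option.

    It captures what the user must do and what counts as a valid answer,
    but says nothing about how the choice is presented.
    """
    prompt: str
    options: List[str]

    def accept(self, answer: str) -> Optional[str]:
        """Map raw input from any channel onto a canonical option, or None."""
        wanted = answer.strip().lower()
        for option in self.options:
            if option.lower() == wanted:
                return option
        return None


def to_html(s: SelectOne) -> str:
    """Visual rendering of the interaction as an HTML form control."""
    opts = "".join(f"<option>{escape(o)}</option>" for o in s.options)
    return f"<label>{escape(s.prompt)}</label><select>{opts}</select>"


def to_voice(s: SelectOne) -> str:
    """Auditory rendering of the same interaction as a spoken prompt."""
    return f"{s.prompt} Say one of: {', '.join(s.options)}."


if __name__ == "__main__":
    city = SelectOne("Which city?", ["Boston", "New York"])
    print(to_html(city))
    print(to_voice(city))
    print(city.accept("  new york "))  # typed or recognized input, same rule
```

The validation rule lives with the interaction rather than with either rendering, so the visual and spoken forms cannot drift apart in what they accept.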

Separation Of Concerns: Ease of Application Development and Maintenance

Single authoring for a multiplicity of interfaces and deployment environments necessarily involves addressing presentation issues specific to each channel, e.g., designing the look and feel of the visual presentation and the sound and feel of the auditory presentation. We believe that a single authoring framework should allow these concerns to be cleanly separated, easing application development and maintenance.

Synchronized Multi-modal Views

During multi-modal or multi-device interactions, the MVC principle becomes especially relevant. The user interacts through the controller of a given view. Instead of modifying that view directly, the user's actions update the state of the model, which in turn updates all registered views and keeps them synchronized. Details can be found in [1].
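A minimal Python sketch of this synchronization, assuming a simple observer-style registration: the controller writes only to the model, and the model pushes its new state to every registered view. The class names and the dictionary-based state are illustrative assumptions, not the design described in [1].

```python
from typing import Callable, Dict, List

State = Dict[str, str]


class View:
    """One channel's presentation; it only ever reads the model state."""

    def __init__(self, name: str, render: Callable[[State], str]) -> None:
        self.name = name
        self._render = render
        self.output = ""

    def refresh(self, state: State) -> None:
        self.output = self._render(state)


class Model:
    """Single shared interaction state with a list of registered views."""

    def __init__(self) -> None:
        self._state: State = {}
        self._views: List[View] = []

    def register(self, view: View) -> None:
        self._views.append(view)
        view.refresh(dict(self._state))

    def update(self, field: str, value: str) -> None:
        self._state[field] = value
        # Every registered view is refreshed, not just the one the user used.
        for view in self._views:
            view.refresh(dict(self._state))


class Controller:
    """Turns a user action in some channel into a model update."""

    def __init__(self, model: Model) -> None:
        self._model = model

    def on_input(self, field: str, value: str) -> None:
        self._model.update(field, value)


if __name__ == "__main__":
    model = Model()
    gui = View("gui", lambda s: f"City: {s.get('city', '')}")
    voice = View("voice", lambda s: f"You chose {s.get('city', 'nothing')}.")
    model.register(gui)
    model.register(voice)

    # The user speaks "Boston"; the controller updates the model only.
    Controller(model).on_input("city", "Boston")
    print(gui.output)  # the GUI view reflects the spoken input
    print(voice.output)
```

Because no view is ever edited directly, adding a third channel only requires registering another view with the same model.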

Single authoring for delivery to a multiplicity of synchronized target devices and environments has one final crucial advantage. As we evolve towards devices that deliver multi-modal user interaction, single authoring enables the generation of tightly synchronized presentations across different channels without requiring re-authoring of the multi-channel applications. The MVC principle guarantees that these applications are also ready for synchronization across channels.

Such synchronization allows user intent expressed in a given channel to be propagated to all the interaction components of a multi-modal system. We speak of tightly coupled multi-modal interactions, as opposed to loosely coupled multi-modal interactions in which each channel has its own model that periodically synchronizes with the models associated with the other channels. A tightly coupled solution can support a wide range of synchronization granularities. It also allows the interaction to be optimized: a given interaction can take place in the channel best suited to it, and can revert to another channel when that channel is unavailable or not capable enough.
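The contrast between the two coupling styles, and the channel fallback just described, can be sketched in Python as follows. This assumes a deliberately naive merge for the loosely coupled case and models tight coupling simply as both channels sharing one state object; `ChannelModel`, `periodic_sync`, and `choose_channel` are hypothetical names.

```python
from typing import Dict, List, Optional

State = Dict[str, str]


class ChannelModel:
    """Loose coupling: each channel keeps its own copy of the state."""

    def __init__(self) -> None:
        self.state: State = {}


def periodic_sync(models: List[ChannelModel]) -> None:
    """Merge per-channel states at a sync point.

    Conflicts are resolved naively: for a given field, the value from the
    model that appears later in the list wins.
    """
    merged: State = {}
    for m in models:
        merged.update(m.state)
    for m in models:
        m.state = dict(merged)


def choose_channel(preferred: List[str], available: Dict[str, bool]) -> Optional[str]:
    """Pick the best-suited channel that is currently available, else fall back."""
    for channel in preferred:
        if available.get(channel, False):
            return channel
    return None


if __name__ == "__main__":
    # Loosely coupled: the GUI does not see spoken input until the next sync.
    voice, gui = ChannelModel(), ChannelModel()
    voice.state["city"] = "Boston"
    print(gui.state.get("city"))  # stale between sync points
    periodic_sync([voice, gui])
    print(gui.state.get("city"))

    # Tightly coupled: both channels share one model, so there is no stale window.
    shared: State = {}
    voice_state, gui_state = shared, shared
    voice_state["city"] = "Boston"
    print(gui_state.get("city"))

    # Voice is preferred for this interaction but is currently unavailable.
    print(choose_channel(["voice", "gui"], {"voice": False, "gui": True}))
```

In the loosely coupled case the synchronization granularity is fixed by how often `periodic_sync` runs; with a shared model, every update is visible to all channels immediately.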


We recommend forming a Working Group within one of the leading standards organizations to address single authoring languages and frameworks for multi-channel interactions as well as tightly coupled multi-modal interactions.


This section contains our requirements for authoring the multi-modal web. Since we consider single authoring to be the key requirement of the multi-modal web, these requirements are also the requirements for single authoring of multi-channel and multi-modal applications.

In addition, for multi-modal rendering, we recommend leveraging the forthcoming DOM Level 2 specifications to enable implementation of the MVC paradigm with legacy browsers [1]. This implies that each supported channel-specific language must have a standardized DOM Level 2 specification.


[1] S. H. Maes and T. V. Raman, Multi-modal interaction in the Age of Information Appliance, in Proceedings ICME 2000, July 2000, New York, USA.