Stéphane H. Maes and T. V. Raman
smaes@us.ibm.com
tvraman@us.ibm.com
IBM Research
Human Language Technologies Group
This position paper outlines an approach to authoring once for the future mobile Internet, where information is expected to be accessible anytime, anywhere, through any device, and where the user can at any given time select the device and modality best suited to his or her abilities and capabilities at that moment.
We first define some terminology:
Channel: a particular device or a particular modality.
Multi-channel applications: applications designed for ubiquitous access through different channels, one channel at a time.
Multi-modal applications: multi-channel applications in which multiple channels are simultaneously available and synchronized.
With multiple authoring, content can be created in one of two ways:
Hand authoring of the application in each target channel.
Authoring of style sheet transformations from a common (device-independent) representation into the different target presentation languages (final form).
For multi-modal applications, the developer must additionally specify how the different channels are synchronized.
We feel there is a strong need for a language that supports single authoring across a large variety of devices and modalities. As a first step, we would like to focus the W3C/WAP workshop on collecting requirements for such a language.
Single authoring is motivated by the need to author, maintain, and revise content for delivery to an ever-increasing plethora of end-user devices. Hand authoring of the target pages leads to the “M times N problem”: an application composed of M “pages” to be accessed via N devices requires M x N authoring steps and results in M x N presentation pages to maintain. Generic separation of content from presentation results in non-reusable style sheets and a similar M x N problem with the style sheets.
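To make the counting concrete, the following sketch compares the number of artifacts to maintain under hand authoring with a single-authoring scheme. It assumes, for illustration only, that single authoring yields one device-independent page per application page plus one reusable default style sheet per channel (M + N artifacts), and it ignores in-line or channel-specific specializations; the function names and the figures in the usage lines are hypothetical.

```python
def hand_authoring_artifacts(pages: int, channels: int) -> int:
    """Hand authoring: every page is re-authored for every channel (M x N)."""
    return pages * channels


def single_authoring_artifacts(pages: int, channels: int) -> int:
    """Single authoring, assuming one reusable default style sheet per channel:
    M device-independent pages plus N style sheets (M + N)."""
    return pages + channels


if __name__ == "__main__":
    # Hypothetical application: 50 pages, 4 channels (e.g. HTML, WML, CHTML, VoiceXML).
    m, n = 50, 4
    print(hand_authoring_artifacts(m, n))    # 200
    print(single_authoring_artifacts(m, n))  # 54
```

The gap widens as channels are added: under these assumptions, each new device adds M pages to maintain with hand authoring, but only one style sheet with single authoring.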
Multiple authoring is an even more complex problem when synchronization is needed across channels.
We advocate a programming approach that separates the application-specific content from its presentation, so that reusable style sheets can generate the default presentation in the final form. Specialization for a given channel can then be performed in-line or via channel-specific style sheets.
The underlying principle of single authoring is the Model-View-Controller (MVC) paradigm.
Separating content from presentation in order to achieve content reuse is now the accepted approach for deploying information on the World Wide Web. In the current W3C architecture, such separation is achieved by representing content in XML that is then transformed into appropriate final-form presentations via XSL transformations.
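The transformation step can be sketched as follows. In the W3C architecture these mappings would be XSL style sheets; Python's standard `xml.etree.ElementTree` is swapped in here only to keep the example self-contained and runnable. The device-independent element names (`page`, `prompt`, `choice`, `option`) are hypothetical, and the voice output is merely VoiceXML-like (a `field` with `prompt` and `option` children), not a validated VoiceXML document.

```python
import xml.etree.ElementTree as ET

# Hypothetical device-independent markup: it says *what* the interaction is
# (a prompt and a choice), not how any particular channel should render it.
SOURCE = """
<page>
  <prompt>Select a city</prompt>
  <choice name="city">
    <option>Paris</option>
    <option>New York</option>
  </choice>
</page>
"""


def to_html(page: ET.Element) -> ET.Element:
    """Default visual mapping: prompt -> <label>, choice -> <select>."""
    form = ET.Element("form")
    for child in page:
        if child.tag == "prompt":
            ET.SubElement(form, "label").text = child.text
        elif child.tag == "choice":
            select = ET.SubElement(form, "select", name=child.get("name"))
            for opt in child.findall("option"):
                ET.SubElement(select, "option").text = opt.text
    return form


def to_voicexml(page: ET.Element) -> ET.Element:
    """Default auditory mapping: choice -> VoiceXML-style <field> with options."""
    vxml = ET.Element("vxml", version="2.0")
    form = ET.SubElement(vxml, "form")
    prompt_text = None
    for child in page:
        if child.tag == "prompt":
            # Remember the prompt; it is spoken inside the next field.
            prompt_text = child.text
        elif child.tag == "choice":
            field = ET.SubElement(form, "field", name=child.get("name"))
            if prompt_text is not None:
                ET.SubElement(field, "prompt").text = prompt_text
            for opt in child.findall("option"):
                ET.SubElement(field, "option").text = opt.text
    return vxml


if __name__ == "__main__":
    page = ET.fromstring(SOURCE.strip())
    print(ET.tostring(to_html(page), encoding="unicode"))
    print(ET.tostring(to_voicexml(page), encoding="unicode"))
```

Both mappings are written once and reused for every page that uses the same device-independent elements, which is what makes them default presentation rules rather than per-page authoring.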
We believe that single authoring can be achieved by recognizing that, in addition to form and content, there is a third component, the interaction, which lies at the heart of turning static information presentations into interactive information.
Single authoring for a multiplicity of interfaces and deployment environments necessarily involves addressing presentation issues specific to each channel, e.g., designing the look and feel of the visual presentation and the sound and feel of the auditory presentation. We believe that a single authoring framework should allow these concerns to be cleanly separated so that:
Content can be created and maintained without presentation concerns.
Presentation rules, including content transformations and style sheets, can be maintained for specific channels and deployment environments without adversely affecting other aspects of the system.
Content and style can be independently maintained and revised.
The result can be specialized for a specific device or channel.
During multi-modal or multi-device interactions, the MVC principle becomes especially relevant. The user interacts through the controller on a given view. Instead of modifying that view directly, his or her actions update the state of the model, which in turn triggers an update of all registered views so that they remain synchronized. Details can be found in [1].
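The update flow described above can be sketched in a few lines. This is a minimal illustration of the MVC pattern under simplifying assumptions (views merely record the values they display, and the model holds flat name/value pairs); the class and method names are hypothetical and not taken from [1].

```python
from typing import Dict, List


class View:
    """A channel-specific rendering of the model (e.g. an HTML page or a voice dialog)."""

    def __init__(self, channel: str) -> None:
        self.channel = channel
        self.rendered: Dict[str, str] = {}

    def refresh(self, field: str, value: str) -> None:
        # A real view would re-render its markup; here we just record the value shown.
        self.rendered[field] = value


class Model:
    """Single shared interaction state; the only thing user actions modify."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self._views: List[View] = []

    def register(self, view: View) -> None:
        self._views.append(view)

    def update(self, field: str, value: str) -> None:
        self.state[field] = value
        # Propagate the change to every registered view, including the one
        # the user acted on, so all channels stay synchronized.
        for view in self._views:
            view.refresh(field, value)


class Controller:
    """Handles input arriving on one view by updating the model, never the view."""

    def __init__(self, model: Model, view: View) -> None:
        self.model = model
        self.view = view

    def on_user_input(self, field: str, value: str) -> None:
        self.model.update(field, value)


if __name__ == "__main__":
    model = Model()
    html_view, voice_view = View("html"), View("voicexml")
    model.register(html_view)
    model.register(voice_view)

    # The user answers by voice; the HTML view is updated as a side effect.
    Controller(model, voice_view).on_user_input("city", "Paris")
    print(html_view.rendered, voice_view.rendered)  # {'city': 'Paris'} {'city': 'Paris'}
```

Because the controller never writes to a view, adding a third channel only requires registering one more view with the model.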
Single authoring for delivery to a multiplicity of synchronized target devices and environments has one final crucial advantage. As we evolve towards devices that deliver multi-modal user interaction, single authoring enables the generation of tightly synchronized presentations across different channels without requiring the multi-channel applications to be re-authored. The MVC principle guarantees that these applications are also ready for synchronization across channels.
Such synchronization allows user intent expressed in a given channel to be propagated to all the interaction components of a multi-modal system. We speak of tightly coupled multi-modal interactions, as opposed to loosely coupled multi-modal interactions, in which each channel has its own model that periodically synchronizes with the models associated with the other channels. A tightly coupled solution can support a wide range of synchronization granularities. It also allows the interaction to be optimized, by letting each interaction take place in the channel best suited to it and by reverting to another channel when that channel is unavailable or not capable enough.
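The channel-selection idea can be sketched as a ranked fallback. The interaction types, channel names, and rankings below are hypothetical; a real system would derive them from the application and device capabilities. The sketch only chooses a channel: it is the shared model of a tightly coupled system that lets the dialog continue in the fallback channel without losing state.

```python
from typing import Dict, List, Optional, Set

# Hypothetical ranking of channels per interaction type, most suitable first.
PREFERRED_CHANNELS: Dict[str, List[str]] = {
    "browse_long_list": ["visual", "voice"],
    "dictate_address": ["voice", "keypad"],
}


def pick_channel(interaction: str, available: Set[str]) -> Optional[str]:
    """Return the most preferred channel that is currently available, or None."""
    for channel in PREFERRED_CHANNELS.get(interaction, []):
        if channel in available:
            return channel
    return None


if __name__ == "__main__":
    print(pick_channel("browse_long_list", {"visual", "voice"}))  # visual
    print(pick_channel("browse_long_list", {"voice"}))            # voice (fallback)
    print(pick_channel("dictate_address", {"visual"}))            # None
```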
We recommend the formation of a Working Group within one of the leading standards organizations to address single authoring languages and frameworks for multi-channel interactions as well as tightly coupled multi-modal interactions.
This section contains our requirements for authoring the multi-modal web. Because we consider single authoring to be the key requirement of the multi-modal web, these requirements are also the requirements for single authoring of multi-channel and multi-modal applications.
XML compliant
Vendor neutral
Any tool developer can target it or use it as an input representation
Can be used not only to express data within an application, but also to pass it to a network services provider, portal, or directly to an end-user device
A single language should handle both multi-channel applications and multi-modal applications
Can be mapped using style sheets to an open-ended set of device-specific markups, including VoiceXML, WML, CHTML, HTML and others.
Can accommodate channel- or device-specific specialization either in-line, as annotations, or using style sheets
Supports a developer-definable hierarchy of channels and devices
Supports specification of data models in XForms / XML Schema to model the data that can be manipulated by the end user
Enables fine-grain synchronization of multi-modal interaction
Can accommodate both synchronous and asynchronous data exchange, and connected as well as disconnected operation
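Two of the requirements above, a developer-definable hierarchy of channels and devices and channel-specific specialization via style sheets, combine naturally: a channel without its own style sheet inherits the nearest ancestor's. The sketch below assumes a simple child-to-parent table; the channel names and style sheet file names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical developer-defined hierarchy: child channel -> parent channel.
PARENT: Dict[str, Optional[str]] = {
    "default": None,
    "visual": "default",
    "voice": "default",
    "html": "visual",
    "wml": "visual",
    "chtml": "visual",
    "voicexml": "voice",
}

# Hypothetical style sheets, registered only at some levels of the hierarchy.
STYLESHEETS: Dict[str, str] = {
    "default": "default.xsl",
    "visual": "visual.xsl",
    "wml": "wml-small-screen.xsl",
}


def resolve_stylesheet(channel: str) -> str:
    """Walk up from the given channel to the first ancestor that has a style sheet."""
    node: Optional[str] = channel
    while node is not None:
        if node in STYLESHEETS:
            return STYLESHEETS[node]
        node = PARENT[node]  # raises KeyError for channels missing from the hierarchy
    raise KeyError(f"no style sheet found for {channel!r}")


if __name__ == "__main__":
    print(resolve_stylesheet("wml"))       # wml-small-screen.xsl (specialized)
    print(resolve_stylesheet("html"))      # visual.xsl (inherited from "visual")
    print(resolve_stylesheet("voicexml"))  # default.xsl (inherited from "default")
```

With this arrangement, supporting a new device only requires adding it to the hierarchy; it receives a sensible default presentation and can be specialized later by registering its own style sheet.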
In addition, for multi-modal rendering, we recommend leveraging the forthcoming DOM Level 2 specifications to enable implementation of the MVC paradigm with legacy browsers [1]. This implies that each supported channel-specific language must have a standardized DOM Level 2 specification.
[1] S. H. Maes and T. V. Raman, Multi-modal interaction in the Age of Information Appliance, in Proceedings ICME 2000, July 2000, New York, USA.