Digital Channel Partners
Position Paper on Multi-Modal Web Issues

Daniel K. Appelquist
W3C Advisory Committee Representative

London, UK
22 August 2000


How does their ownership of a "brand" and proprietary content mesh with the new paradigms for content distribution born in an "Internet Everywhere" world? What new software development and deployment methods must be explored in order to meet these changing needs? Our view is that the multi-tier application architecture will continue to deepen, bringing more tiers of intelligence between application/content provider and the end user. In this context, "convergence" of a multiplicity of modalities implies a need for a diverse set of inter-related languages, some adapted for specific clients, some adapted for middle-tier data and object exchange.

Our Interest

Digital Channel Partners is an E-Business consultancy that provides consulting services across the spectrum of business strategy, design and engineering. Our client base of ".coms" and ".corps" in the publishing, financial and entertainment sectors needs to be exposed to the latest thinking about where the Web is heading. Our software development process also needs to be informed by this thinking so that we can build around the conceptual framework both before and as standards evolve and are recommended.


The strength of the Web, and its hyper-growth as a medium, has been based on the opportunities that exist there for independent content and application providers. The openness of the Web has spurred competition and created a marketplace of ideas. It is vital that this openness continues as the Web evolves beyond its current form. DCP sees the types of services currently being offered on the Web evolving into sets of services and meta-services that are deployed across multiple touch-points. A meta-service could be defined as one that takes the content or functionality of a service and repurposes it for another delivery channel.

For example, a content provider, such as a financial news source, can concentrate on the production of news and all the technology associated with this. The product is a feed of news articles in an XML-derived format. Meta-service providers take this feed and repurpose it for specific output channels or devices. What implications does this have for formatting of content both at the service and meta-service level?
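To make the service and meta-service split concrete, the following is a minimal sketch of a meta-service that takes one feed item and repurposes it for two output channels. The feed element names (article, headline, body) and the function names are our own assumptions for illustration, not a real news schema, and the WML output is only a bare card rather than a complete, validated deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed item; element names are illustrative, not a real news schema.
FEED_ITEM = """
<article id="a1">
  <headline>Markets close higher</headline>
  <body>Shares rose in London on Tuesday. Analysts cited strong earnings.</body>
</article>
"""


def parse_article(xml_text):
    """Service side: read one article from the XML feed into a plain dict."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "headline": root.findtext("headline"),
        "body": root.findtext("body"),
    }


def to_wml_card(article):
    """Meta-service for small-screen devices: wrap the article in a WML-style card."""
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", id=article["id"], title=article["headline"])
    paragraph = ET.SubElement(card, "p")
    paragraph.text = article["body"]
    return ET.tostring(wml, encoding="unicode")


def to_speech_text(article):
    """Meta-service for a voice channel: plain text to hand to a speech synthesizer."""
    return f"{article['headline']}. {article['body']}"


article = parse_article(FEED_ITEM)
print(to_wml_card(article))
print(to_speech_text(article))
```

The content provider only ever produces the feed; each channel-specific function could live with a different meta-service provider.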

Any discussion about multi-modal content or service deployment must address the burgeoning market in DTV as well as the current "traditional" Web market.

How are wearable computers and mobile devices going to converge? What kinds of languages need to evolve to support possible feature-sets of wearable computers? What about new user experiences such as touch-based or smell-based user interfaces? What new user experience paradigms are on the horizon that we aren’t currently envisioning? Every such experience requires modality-specific data and cues.

Will mobile agents become more important as users increasingly interact with systems through voice commands? If so, frameworks to support the interchange of data between mobile agents will become a necessity, in essence creating a meta-marketplace where agents, acting on behalf of users or organizations, will exchange information and transact business.

Frameworks for manipulating pre-downloaded images or animation via a low-bandwidth stream may become more important in mobile device environments, for instance to synchronize speech with the facial movement of animated characters. Is SMIL up to this task or do we need a new framework?

"Fatter" devices, on which applications can be downloaded "over the air" and run, provoke even more questions. Should there be a standard and secure way of delivering these applications? Should application communication with back-end services be encapsulated in some way, such that there is a standard framework for message passing from device to land-based relay that takes into account the potential unreliability of the connection? Unreliable connections are a particular problem with current WAP devices and applications.
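One way to picture message passing that tolerates an unreliable connection is a store-and-forward outbox on the device: a message stays queued until the relay acknowledges it. The sketch below is only an illustration under our own assumptions. The Outbox class, the message layout and the send callable are hypothetical, and the sketch is not drawn from WAP or any other existing standard.

```python
from collections import deque


class Outbox:
    """Store-and-forward queue: a message stays queued until the relay acknowledges it."""

    def __init__(self, send):
        # send is a hypothetical callable(message) -> bool; True means acknowledged.
        self._send = send
        self._pending = deque()
        self._next_id = 1

    def enqueue(self, payload):
        """Queue a payload and return the id assigned to its message."""
        message = {"id": self._next_id, "payload": payload}
        self._next_id += 1
        self._pending.append(message)
        return message["id"]

    def flush(self):
        """Try to deliver pending messages in order; stop at the first failure."""
        delivered = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # connection dropped: keep the message and retry later
            self._pending.popleft()
            delivered += 1
        return delivered

    def pending_count(self):
        return len(self._pending)
```

An application would call flush() whenever the connection appears to be available. Because an unacknowledged message may in fact have reached the relay, the relay would need to use the message id to discard duplicates.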

The deployment of content across a multiplicity of devices also raises questions in content authoring. Text-to-speech systems will often need mark-up cues in order to correctly render a written sentence into a compelling spoken sentence. Likewise, a written document intended to be read on a page or in a Web browser will most likely be too long to be presented as spoken word or on a small-screen device. Authoring techniques and technologies that take multi-modal content deployment into account need to be developed. Should sentences be ranked in order of importance during the authoring process so that less important sentences can be filtered out depending on the modality through which the end user is reading or listening to the content?
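The sentence-ranking idea can be sketched as follows. This is a minimal illustration under our own assumptions: the rank scale (1 is most important), the modality names and the per-modality cut-offs are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    rank: int  # 1 = most important; assigned by the author (hypothetical scale)


# Hypothetical cut-offs: the lowest-priority rank each modality will still present.
MAX_RANK = {"web": 3, "small_screen": 2, "voice": 1}


def filter_for_modality(sentences, modality):
    """Keep, in original order, only sentences important enough for the modality."""
    limit = MAX_RANK[modality]
    return [s.text for s in sentences if s.rank <= limit]


story = [
    Sentence("Shares rose in London on Tuesday.", 1),
    Sentence("Analysts cited strong earnings.", 2),
    Sentence("Trading volume was close to the monthly average.", 3),
]

print(filter_for_modality(story, "voice"))
print(filter_for_modality(story, "small_screen"))
```

Filtering preserves the authored order, so the shortened version still reads as a coherent subset of the full text.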

Content should be authored in a mark-up language specific to the modality of the content. The separation of content from style in this way is by now a time-tested and widely practiced tenet. However, the recent boom of B2B e-commerce initiatives has shown that this principle can be extended to interactive services as well. DCP supports a framework for thinking about these multi-modal issues that takes its cue from these developments.

Most information that eventually makes it down to a device or is translated into speech will have gone through several levels of translation and will most probably be in the context of a multi-tier application where one of the tiers may be on the device itself. However, most standards efforts in this area seem to stress human-readability of mark-up. Does this make sense, especially when dealing with low-bandwidth connections?

It is the position of DCP that the predominant modality of application development and information delivery on the Internet will continue to be a multi-tier architecture that differentiates at the server level between client types and delivers content to clients based on that type. If anything, we see the market trending towards an elongation of this content pipeline with more intelligence moving into the intermediate layers (currently made up of content caching architectures).
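Server-level differentiation between client types can be sketched as a small dispatch step. The example below is a simplified illustration under our own assumptions: it classifies a client by the media types listed in its HTTP Accept header (text/vnd.wap.wml is the WML media type), ignores quality values, and uses hypothetical function names and deliberately bare renderers.

```python
from html import escape


def classify_client(accept_header):
    """Pick a client type from the media types in an HTTP Accept header."""
    accepted = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    if "text/vnd.wap.wml" in accepted:
        return "wml"
    if "text/html" in accepted:
        return "html"
    return "text"  # fallback when nothing more specific is recognised


def render_wml(title, body):
    return f'<wml><card title="{escape(title)}"><p>{escape(body)}</p></card></wml>'


def render_html(title, body):
    return (
        f"<html><head><title>{escape(title)}</title></head>"
        f"<body><p>{escape(body)}</p></body></html>"
    )


def render_text(title, body):
    return f"{title}\n\n{body}"


RENDERERS = {"wml": render_wml, "html": render_html, "text": render_text}


def deliver(accept_header, title, body):
    """Return the detected client type and the content rendered for it."""
    client_type = classify_client(accept_header)
    return client_type, RENDERERS[client_type](title, body)
```

In a longer content pipeline the same decision could equally be taken by an intermediate tier rather than by the origin server.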

In this context, we are suspicious of efforts to "converge" languages (e.g. WML and VoiceXML). Rather, what may be needed are highly specific languages that address the capabilities of individual delivery platforms (e.g. WML, CHTML, Speech Synthesis ML) and intermediate languages that model content and user interactions outside of specific modalities (e.g. DialogML). Furthermore, the specification of these application delivery languages must be highly coordinated, producing a "family" of languages.


We hope that everyone attending this workshop will become more aware of the efforts in all of these fields. We also hope to explore the issues above and to see the formation of a working group to examine them at greater length.