Position paper: Compound documents

The W3C Workshop on Web Applications and Compound Documents

Olli Pettay, Olli.Pettay@nokia.com

The evolution of XML-based markup languages has led to a situation where it may be necessary to mix several of them in a single document. The concept of modularization using separate XML namespaces has been introduced recently, but it is currently seldom used on the WWW, either because browsers (and other programs) support only one type of content (e.g. (X)HTML) or because the way these languages are combined is very implementation-dependent. This creates a need for a standardized method for authoring such mixed-language documents.

Two cases: standardized technologies and application-specific languages

There are (at least) two different kinds of situations in which the need for multiple languages within the same document arises. The first relates to the W3C standards: how to combine existing standards, for example XHTML, VoiceXML, XForms and XML Events, so that the specifications of the individual languages need not be altered significantly, and so that implementations of each language can handle the new combination reasonably easily.

To combine standardized languages, there should be precisely defined profiles that specify how the languages are mixed. This is not only necessary to achieve interoperability between implementations; it also makes it easier to add support for new languages to existing language engines (browsers), which would in turn help speed up the adoption of the new languages. What these profiles should define depends on the languages they mix: syntax, processing model, layout model, etc.
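As a rough illustration of what such a profile could pin down at the syntax level, the following Python sketch checks a document against a profile that names one host language and the languages that may be embedded in it. The profile structure (PROFILE with its "host" and "embeddable" keys) and the function names are assumptions made for this example and are not taken from any W3C specification; only the namespace URIs are the real ones. A real profile would also have to cover the processing and layout models, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
XFORMS = "http://www.w3.org/2002/xforms"
SVG = "http://www.w3.org/2000/svg"

# Hypothetical profile: one host language plus the languages it may embed.
PROFILE = {"host": XHTML, "embeddable": {XFORMS}}


def namespace_of(tag):
    # ElementTree writes qualified names as "{namespace-uri}local-name".
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def profile_violations(xml_text, profile):
    """Return a list of human-readable problems; empty if the document fits."""
    root = ET.fromstring(xml_text)
    problems = []
    if namespace_of(root.tag) != profile["host"]:
        problems.append("root element is not in the host language")
    allowed = {profile["host"]} | profile["embeddable"]
    # Collect every namespace used by an element and keep the ones not allowed.
    foreign = {namespace_of(el.tag) for el in root.iter()} - allowed
    for ns in sorted(foreign):
        problems.append("namespace not allowed by profile: " + ns)
    return problems


# XHTML hosting an XForms control: fits the profile.
GOOD = f'<html xmlns="{XHTML}" xmlns:xf="{XFORMS}"><body><xf:input ref="qty"/></body></html>'
# XHTML hosting SVG: SVG is not listed in this particular profile.
BAD = f'<html xmlns="{XHTML}" xmlns:svg="{SVG}"><body><svg:rect/></body></html>'

print(profile_violations(GOOD, PROFILE))
print(profile_violations(BAD, PROFILE))
```

Keeping the profile as plain data means a browser could, in principle, ship several profiles and pick one per document, but that design choice is only a suggestion here.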

The second case is support for application-specific languages. There might be cases where the content author wants to create new application-specific behavior. Here it could be possible to use XBL-like approaches, where the binding language allows the content author to hide part of the implementation of the application logic or behavior. This also allows content authors to reuse common data and makes the application easier to maintain.

Currently, browsers use their own binding languages. Even though there are some wrappers from one language to another, interoperability can best be achieved through standardization. One potential problem is whether it is possible to create a binding language that could be used with all, or at least most, markup languages.

Problems

What happens if the browser does not support some specific language in a document? Should it simply reject the page, should it make a best effort to do whatever it can, or should there be a common way to handle this kind of error situation via a fallback system?
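The three options can be made concrete with a small Python sketch. It extracts the text a user agent would show from a document, given the set of namespaces it supports and a policy name. The policy names ("reject", "best-effort", "fallback"), the placeholder text and all function names are hypothetical and chosen only for this illustration; no existing browser or specification is claimed to behave this way.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SUPPORTED = {XHTML}  # assumption: this user agent only understands XHTML

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>Sales chart:</p>
    <svg:svg><svg:rect/></svg:svg>
  </body>
</html>"""


class UnsupportedLanguage(Exception):
    """Raised under the "reject" policy; carries the unsupported namespace."""


def namespace_of(tag):
    # ElementTree writes qualified names as "{namespace-uri}local-name".
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def _walk(el, supported, policy, out):
    ns = namespace_of(el.tag)
    if ns not in supported:
        if policy == "reject":
            raise UnsupportedLanguage(ns)
        if policy == "fallback":
            out.append("[unsupported content: " + ns + "]")
        # "best-effort" skips the whole unsupported subtree silently.
        return
    if el.text and el.text.strip():
        out.append(el.text.strip())
    for child in el:
        _walk(child, supported, policy, out)
        # Text following a child belongs to the supported parent element.
        if child.tail and child.tail.strip():
            out.append(child.tail.strip())


def render(xml_text, supported, policy):
    out = []
    _walk(ET.fromstring(xml_text), supported, policy, out)
    return out


print(render(DOC, SUPPORTED, "best-effort"))
print(render(DOC, SUPPORTED, "fallback"))
```

The sketch makes the trade-off visible: "best-effort" loses the chart without telling the user, "fallback" at least signals that something is missing, and "reject" refuses the whole page even though most of it could have been shown.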