Web Applications and Compound Documents

Vincent Hardy
Sun Microsystems, Inc.

This paper discusses why combining markup is an important endeavor, why we feel the effort should strive to define something simple based on simple architectural rules, and what we feel are the challenges.

What functionality do Web applications need?

Why is combining markup important?

Open-standard XML markup [XML] has become increasingly important for multiple reasons. First, open standards enable interoperability across vendors, which avoids vendor lock-in and gives end users a choice. Companies can compete on implementations. But this is not specific to XML markup; it is specific to open standards.

XML markup brings a unified data model that can be applied to different application domains, from graphics to business processes to multimedia integration. The nature of XML makes it easy to generate data content, easy to process and re-purpose data content and easy to exchange data content. For example, XML content can be easily searched and indexed and it can be easily transcoded, leveraging technologies such as the Document Object Model API [DOM] or XSL transformations ([XSLT]). Additionally, XML's very nature makes it possible for authors to create accessible content; it supports the adding of meta-data and attributes for accessibility purposes to enable User Agents, such as browsers and screen readers for blind users, to access the content's structure, visual attributes, and to change the way information is presented ([WAI]).
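As a minimal sketch of the searching and re-purposing described above, the following Python fragment uses the standard library's DOM implementation (xml.dom.minidom). The catalog document, its element names, and the two helper functions are hypothetical and were invented for this illustration; they are not taken from any specification.

```python
from xml.dom.minidom import parseString

# Hypothetical catalog document, invented for this illustration.
SOURCE = """<catalog>
  <item id="a1"><title>Scalable icons</title><price>12</price></item>
  <item id="b2"><title>Form widgets</title><price>30</price></item>
</catalog>"""


def index_titles(xml_text):
    """Search: map each item id to its title using generic DOM calls."""
    doc = parseString(xml_text)
    index = {}
    for item in doc.getElementsByTagName("item"):
        title = item.getElementsByTagName("title")[0]
        index[item.getAttribute("id")] = title.firstChild.data
    return index


def to_list_markup(xml_text):
    """Re-purpose: transcode the catalog into a plain list for presentation."""
    source = parseString(xml_text)
    target = parseString("<ul/>")
    for item in source.getElementsByTagName("item"):
        title = item.getElementsByTagName("title")[0].firstChild.data
        li = target.createElement("li")
        li.appendChild(target.createTextNode(title))
        target.documentElement.appendChild(li)
    return target.documentElement.toxml()


if __name__ == "__main__":
    print(index_titles(SOURCE))
    print(to_list_markup(SOURCE))
```

The same generic calls work regardless of the vocabulary being processed, which is the property the paragraph above relies on. An XSLT stylesheet could express the second function declaratively.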

Building on the above qualities, XML data is exchanged between all kinds of applications, from authoring tools to web services. In particular, XML markup is at the heart of web services, which generate and transform XML content. They can naturally generate and transform content for the purpose of presenting information (in addition to exchanging information), in the same way HTML is dynamically generated by services. The same concepts and processing logic apply to all presentation markup such as SMIL, SVG, XHTML, and XForms.

In the area of Web interfaces, XHTML, SVG, SMIL, and XForms, to name a few, are important technologies, but they are not enough. They are vertical technology silos, each addressing a particular problem domain adequately. However, not a single one of them addresses the full range of features needed in a rich web interface. For example, SVG lacks a layout facility. XHTML does not have scalable graphics. Neither SVG nor XHTML provides the rich multi-media integration features of SMIL. A rich client needs a combination of the features found in various existing markup languages more than any particular single markup language. We need a more horizontal approach to the problem, one that builds a platform rich enough to capture the user interface data an application needs for presentation. We need to define the building blocks for a rich client platform for the Web, re-using and integrating the work that has already been done.

For example, client terminals have widely different screen resolutions. This makes graphical scalability a central issue for web clients, and SVG addresses this issue to a large extent. Another example: client terminals have varying screen sizes, which is a different issue from resolution. Depending on the screen size, layout may need to be modified or adapted. The CSS box model provides solutions for layout that XHTML leverages for presentation. XSL FO also addresses pagination and layout issues. How can these be leveraged to address the issue?

These examples show that there are multiple specifications that address different issues separately. Combining the solutions to address all the key issues would yield a greater value than the sum of the individual, separate solutions.

Key Web Client Features        Existing solutions
Document and/or page layout    SMIL Layout, CSS Box Model, XSL FO
Resolution independence        CSS units, SVG
Multi-media integration        SMIL Animation (generic), SVG Animation (host language for SMIL Animation)
UI Components                  XForms, RCC
Binding with code              XML Events

In this paper, we do not suggest a particular markup combination. Rather, we feel that in order to foster an ecosystem of interoperable authoring tools, applications and services, it is important to limit the number of profiles. Ideally, we should work on a single profile that addresses the top requirements for web applications.


Despite their many advantages, markup syntaxes have, to a large extent, been developed as silo technologies, and integration issues have not been addressed. This is not completely true: some efforts have addressed mixing markup (XML namespaces [XMLNS]) and some have been designed for integration (SMIL Animation [SMIL-Anim]). However, these solutions in particular technology areas are not sufficient to address the problem overall.

Define semantics and architecture

Granted, XML namespaces [XMLNS] allow mixing of various markup languages. It is possible to mix different syntaxes, for example RDF [RDF] and XHTML [XHTML]. XML namespaces define how different markup languages can be mixed in a document and allow them to co-exist. This is important but not sufficient, because what we need, at least in the area of rich client markup, is integration more than mere co-existence. And integration raises many questions. For example, what does it mean to mix SVG and XHTML? Are there restrictions or can any SVG element be mixed within any XHTML element? How do events propagate in that context? Do SVG elements take part in the page's box layout? How is the XHTML model reconciled with the SVG model? Do SVG transforms have an impact on XHTML elements? Are XHTML elements subject to the SVG painter's model? There are multiple possible approaches and there has been work in this domain already (for example, in [SVG], the <foreignObject> element), but questions of this type need to be raised and answered so that the rich client building blocks can be combined and built upon. Simply put, we need to define not only the syntax but also the semantics of combining markup, and define the architectural foundations for the combination.
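To illustrate the difference between co-existence and integration, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The compound document and the count_by_namespace helper are hypothetical and were written for this illustration. The namespace URIs are the ones defined for XHTML and SVG.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical compound document: an SVG fragment placed inside an XHTML paragraph.
COMPOUND = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Status:
      <svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">
        <circle cx="5" cy="5" r="4"/>
      </svg>
    </p>
  </body>
</html>"""


def count_by_namespace(xml_text):
    """Count elements per namespace URI in a namespace-aware parse."""
    root = ET.fromstring(xml_text)
    counts = {}
    for element in root.iter():
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        if element.tag.startswith("{"):
            namespace = element.tag[1:].split("}")[0]
        else:
            namespace = ""
        counts[namespace] = counts.get(namespace, 0) + 1
    return counts


if __name__ == "__main__":
    for namespace, count in count_by_namespace(COMPOUND).items():
        print(namespace, count)
```

The parser can tell which vocabulary each element belongs to, and that is all it can tell. Nothing in the parse result says whether the svg element takes part in the paragraph's layout, how an event on the circle propagates to the paragraph, or which painting model applies. Those are the semantic questions raised above.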


One of the roles of open standards is to foster interoperability. Interoperability implies that multiple implementations can coexist, collaborate and interoperate. This means that the industry must be able to deliver multiple implementations. Therefore the implementation barrier and complexity must be reasonable. This has been an issue for some specifications, like SVG, for which the implementation barrier is high. This limits the number of implementations and slows the adoption curve. Overall, complexity is detrimental to the success of Web standards. When combining markup, the complexity is, to some extent, a combination of the complexity of each of the combined markup languages. As a consequence, an effort to combine markup will be challenged to reduce complexity, which may require simplifying the building blocks (for example, using SVG Tiny instead of SVG Basic or SVG Full, and XHTML Basic instead of full XHTML) and simplifying the integration strategy (for example, limiting the ways in which markup can be combined).

Static and Dynamic Behavior

Another challenge to address when looking at combining markup is the transition from static specification to dynamic behavior. In XML markup like SVG or XHTML, the document is statically defined but, once used in a user agent, becomes a dynamic entity which can mutate, is sensitive to events and may be animated. This is a challenge for the Document Object Model API, which needs to further specify the dynamic behavior of the API in a user agent. The API needs to carefully consider what can and cannot be considered live after the system's initialization. An example of this problem is SVG Fonts. The DOM API allows mutation of an SVG Font's characteristics, like the default glyph's advance. What should happen if the default glyph's advance is modified? Should all text that references the glyph be updated? It seems important to carefully consider the dynamic behavior of the combined markup so that implementations remain as simple as possible.
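The two possible answers to the glyph advance question can be contrasted with a toy model. The sketch below is written in Python and is not the SVG DOM: the Font, LiveText and SnapshotText classes are hypothetical names invented for this illustration, and the width computation is deliberately simplistic (one advance value for every character).

```python
class Font:
    """Hypothetical stand-in for a font whose default advance can be mutated."""

    def __init__(self, default_advance):
        self.default_advance = default_advance


class LiveText:
    """Treats the font as live: the width is recomputed on every query."""

    def __init__(self, font, text):
        self.font = font
        self.text = text

    def width(self):
        return self.font.default_advance * len(self.text)


class SnapshotText:
    """Captures the advance at construction; later mutations are ignored."""

    def __init__(self, font, text):
        self._width = font.default_advance * len(text)

    def width(self):
        return self._width


if __name__ == "__main__":
    font = Font(default_advance=10)
    live, snapshot = LiveText(font, "abc"), SnapshotText(font, "abc")
    font.default_advance = 12  # mutation after initialization
    print(live.width(), snapshot.width())
```

The live variant must either recompute on every query or track every dependent piece of text and invalidate it on mutation. The snapshot variant is simpler to implement but requires the specification to state that the value is not live. Either answer is workable; leaving the behavior unspecified is what hurts interoperability.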

Generic APIs vs Specialized APIs

An important reason for the success of HTML and XML is the Document Object Model API. This generic API provides a simple, systematic and yet powerful way to modify a document and interact with a user. The generic DOM API addresses several important aspects, but some specialized APIs are needed (as in SVG, for example), so that applications can truly leverage the data expressed in markup. For example, the SVG specialized DOM exposes notions such as bounding boxes, affine transforms or text advance, which are critical to make the API useful. By the same token, the ElementTimeControl interface in SMIL Animation provides a needed and powerful way to integrate with time-controlled resources such as audio or video streams, through the API.

While the specialized DOM APIs are important, the effort should focus on an API that allows rich and meaningful interaction with the content without making it overly complex. In particular, it should not be a goal to allow the API to expose all the syntax variations allowed by the markup, as long as the underlying concept is properly available. SVG transforms are an example. The syntax for specifying affine transformations in SVG is sophisticated. However, any such transformation can be represented by a 2x3 matrix. It is reasonable for the specialized API to provide access to the 2x3 matrix (read/write) and not provide access to the various ways the matrix may have been specified in markup.
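The following Python sketch shows why a single 2x3 matrix is sufficient. It stores a transform as the six values (a, b, c, d, e, f) used by the SVG matrix(a b c d e f) form, where a point maps to x' = a*x + c*y + e and y' = b*x + d*y + f. The function names are assumptions made for this illustration and do not correspond to any specified API.

```python
import math

# A transform is the tuple (a, b, c, d, e, f), read as the matrix
#   | a c e |
#   | b d f |
# with an implicit third row (0 0 1).


def translate(tx, ty):
    return (1.0, 0.0, 0.0, 1.0, tx, ty)


def scale(sx, sy=None):
    sy = sx if sy is None else sy
    return (sx, 0.0, 0.0, sy, 0.0, 0.0)


def rotate(degrees):
    angle = math.radians(degrees)
    return (math.cos(angle), math.sin(angle),
            -math.sin(angle), math.cos(angle), 0.0, 0.0)


def multiply(m1, m2):
    """Matrix product m1 * m2: m2 is applied to a point first, then m1."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + c1 * b2, b1 * a2 + d1 * b2,
            a1 * c2 + c1 * d2, b1 * c2 + d1 * d2,
            a1 * e2 + c1 * f2 + e1, b1 * e2 + d1 * f2 + f1)


def apply(m, x, y):
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


if __name__ == "__main__":
    # The markup 'translate(10,20) scale(2)' collapses to one matrix.
    combined = multiply(translate(10, 20), scale(2))
    print(combined)
    print(apply(combined, 1, 1))
```

However long the list of translate, scale and rotate operations in the markup, the product is still six numbers. An API that reads and writes those six numbers exposes the full concept without having to mirror each syntactic form.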

Conclusion: Re-use existing solutions wherever possible. Strive to simplify.

Web standards take a long time to create and mature. There are a number of existing web standards which solve difficult problem domains adequately, but Web applications need a solution that addresses their needs across domains (layout, resolution independence, multimedia integration) rather than one that works perfectly in one particular area (e.g., SVG is a great resolution-independent solution) but not at all in other areas (e.g., SVG does not provide page layout).

However, specifying markup languages and implementing them is complex. Specifying a combination of markup languages can be even more challenging because the integration issues are difficult, and we detailed a few of these issues in this paper. Furthermore, if we multiply the number of combinations, the chances of interoperability (the main motivation for working on open standards) are minimal. In order to achieve interoperability, we feel that the industry should define a single profile that combines the features needed by web applications. This probably requires making some hard decisions. This profile should re-use the smallest possible profile of existing specifications and make the integration points few, clearly identified and clearly specified.


[XML] http://www.w3.org/TR/REC-xml
[XMLNS] http://www.w3.org/TR/REC-xml-names/
[SVG] http://www.w3.org/TR/SVG11/
[SVG2] http://www.w3.org/TR/SVG12/
[SMIL2] http://www.w3.org/TR/smil20/
[SMIL-Anim] http://www.w3.org/TR/smil-animation
[XML-Events] http://www.w3.org/TR/xml-events/
[XHTML] http://www.w3.org/TR/xhtml1/
[XForms] http://www.w3.org/TR/xforms/
[CSS] http://www.w3.org/Style/CSS/
[XSLT] http://www.w3.org/TR/xslt
[XSL-FO] http://www.w3.org/TR/xsl/
[DOM] http://www.w3.org/TR/DOM-Level-2
[WAI] http://www.w3.org/WAI/
[RDF] http://www.w3.org/RDF/