The Convergence of Applications and Documents -
Requirements and Directions

Position Paper for The W3C Workshop on
Web Applications and Compound Documents

Patrick Schmitz
Nervana, Inc.
Bellevue, WA

A trend towards convergence

For some time now, we have seen the line between an application and a document blurring, especially as regards web content. This has been widely reported, and is evident in a variety of forms, including:

There is no reason to believe that this trend will abate, and yet most solutions to date have been built upon an ad hoc framework, often working around or even fighting the existing DOM and document tools rather than leveraging them. Solutions - especially more sophisticated ones - are often closely tied to a given vendor platform, which brings a host of problems.

Implications of annotation

An annotation to a document refers to (a fragment of) the document. Annotation ranges from trivial comments on a document to derivative works that use part or all of a document. Temporal segmentation and visual cropping are also forms of annotation, although they are not always recognized as such. In any case, although it is common today to simply copy and paste, smart representations for annotations will use pointers/references, and not embedded copies. A good general solution employs transclusion for document composition. It is worth noting that in this model, an annotation is itself a document, with a reference to one or more other documents.

As the model of a semantic web matures in the marketplace, we can expect to see more and more annotation of existing documents. In fact, we suspect that this will become the dominant content form. Derivative work has long been the norm, but when tools support collaborative authoring, document reference and embedding, versioning etc., most content will actually represent annotations to existing documents.

The transclusion model provides a good high-level model for compound documents, but raises a number of questions about the details. W3C started down the road by producing XML Fragment Interchange [XFrag], but a great deal of work remains to specify appropriate fragment contexts for content types, and frameworks for interpreting context in compound document formats (CDFs). The next section summarizes the main issues and associated requirements we have identified in this space.

Issues and requirements

Four general areas emerge that require attention in any CDF model: specification of fragment and context, binding and interaction issues, authoring and rights issues, and issues with structural dynamism. These same issues apply (albeit with different weights) to many web applications. Discussion and direction on these will benefit both domains.

Issues for Fragment specification and context

In [MMM2001] we explored the possibilities for expressing and implementing fragmented media integration with existing and emerging standards. We described the requirements associated with several functional areas for fragmented media integration. The first set of requirements addresses fragment specification, or how we delimit the extent of the contained document:

spatial fragmenting
There should be a spatial means of specifying what portion of the contained document/media-object makes up the desired fragment.
temporal fragmenting
There should be a temporal means of specifying what portion of the contained document/media-object makes up the desired fragment.
structural and nominal fragmenting
It should be possible for integrating formats to refer to fragments with a name (representing some spatial, temporal or structural extent) or with a reference to document structure (e.g., in the manner of XPath).

It should also be possible to combine these, so that for example, a spatial fragment of a given structural section (e.g., a portion of a layer) can be specified.
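As an illustration only, the composition of fragment specifiers might be modeled as follows. This is a minimal sketch, not a proposed syntax: the `FragmentSpec` type, its field names, the `combine` method and the sample values are all hypothetical and do not come from [MMM2001] or [XFrag].

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FragmentSpec:
    """Hypothetical fragment specifier; each dimension is optional."""
    # Structural or nominal reference, e.g. an XPath-like expression or a name.
    structural: Optional[str] = None
    # Spatial extent as (x, y, width, height) in the contained document's units.
    spatial: Optional[Tuple[int, int, int, int]] = None
    # Temporal extent as (begin, end) in seconds.
    temporal: Optional[Tuple[float, float]] = None

    def combine(self, other: "FragmentSpec") -> "FragmentSpec":
        """Compose two specs; a dimension set on both sides is treated as a conflict."""
        merged = {}
        for name in ("structural", "spatial", "temporal"):
            mine, theirs = getattr(self, name), getattr(other, name)
            if mine is not None and theirs is not None:
                raise ValueError(f"both specs constrain the {name} dimension")
            merged[name] = mine if mine is not None else theirs
        return FragmentSpec(**merged)


# A spatial portion of a given structural section (a portion of a layer).
layer = FragmentSpec(structural="//layer[@id='map']")
crop = FragmentSpec(spatial=(10, 20, 200, 100))
portion_of_layer = layer.combine(crop)
```

Treating a doubly constrained dimension as an error is one possible design choice; an integrating format could equally define intersection semantics for two spatial or two temporal extents.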

The remaining requirements address the specification of, or constraints upon, the context of the fragment. Note that depending on the manner of fragment specification, context may be spatial (outside the fragment), temporal (before and after the fragment) or structural (the rest of the document excluding the selected node-set). When specifications are composed, the resulting context is adjusted accordingly.
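For the temporal case, the context is simply whatever lies before and after the fragment. The following sketch makes that concrete under stated assumptions: the function name `temporal_context` and the representation of intervals as (begin, end) pairs in seconds are ours, chosen only for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (begin, end) in seconds; an assumed representation


def temporal_context(duration: float, fragment: Interval) -> List[Interval]:
    """Return the parts of a media object's timeline that lie outside the fragment."""
    begin, end = fragment
    if not (0 <= begin <= end <= duration):
        raise ValueError("fragment must lie within the media object's duration")
    context: List[Interval] = []
    if begin > 0:
        context.append((0.0, begin))     # context before the fragment
    if end < duration:
        context.append((end, duration))  # context after the fragment
    return context


# A 15-second fragment of a 60-second clip leaves context on both sides.
print(temporal_context(60.0, (10.0, 25.0)))
```

Spatial and structural context could be derived analogously, as the region outside a crop rectangle or the nodes outside the selected node-set.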

context removal
It must be possible to remove or preclude the display of the fragment context (the rest of the fragment document) without affecting the fragment’s appearance.
fragment distinction
It should be possible to specify a presentation (appearance) for the fragment that is distinct from the presentation specified for (or within) the associated context. See also the discussion of authoring issues, below.
initial navigation to fragment
The fragment should be made readily apparent upon initial access.

A complete discussion of these points appears in [MMM2001], and is omitted here for brevity.

Binding and interaction issues

A significant problem with compound documents relates to IDs and ID-REFs. A number of issues arise, which in some cases may be unintended conflicts and in others desired functionality:

To some extent, this class of problem has been recognized in the context of [XInclude], although the goal of building a simple Info Set has constrained the solution proposed there.

A related issue arises with event binding and interaction control in general. Specifically, how should authors (and implementers) control the flow of events across document fragment boundaries? In many useful cases, it will be desirable to "reach across" the boundaries to define interaction binding, animation targets, timing sync-arcs/event-arcs as well as XHTML hyperlink targets. In many cases, authors may wish to expressly target an element (by ID-REF) in a particular fragment. Conversely, authors may wish to specifically bind to an element in a host document (in the case of predefined CDF templates that transclude one or more of many possible fragments). 

We see many cases like this that will require a general mechanism for modeling the CDF as a compound document in the DOM. In this model each fragment defines a separate, local ID-space within which simple Info Set rules apply. There is some precedent for this in the frames model (although aspects of that model are problematic). We suggest that a hierarchical link reference be defined, allowing relative links to, from and even among contained fragments. The target fragment would be indicated using the ID of the transclusion element in the host document (possibly preceded by a '#' as with hashref syntax). This would be followed by a slash separator and the ID-REF specific to the fragment ID-space (e.g., href="#frag1/targetEl"). This solution extends to a deeply nested scenario. For a reference into the parent hosting document, some reserved word may be used (e.g., href="#[parent]/targetEl") or perhaps the traditional dot-dot syntax qualified with a hash (e.g., href="#../targetEl"). Neither of these seems very clean, and further discussion is invited.
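To make the suggested reference syntax concrete, the following sketch resolves hierarchical references against nested, local ID-spaces. It is an illustration under stated assumptions rather than a proposed API: the `Element` and `IdSpace` classes and the `resolve` function are hypothetical, and both candidate parent notations ("[parent]" and "..") are accepted only because the text leaves that choice open.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Element:
    id: str
    # Set when this element transcludes a fragment with its own ID-space.
    fragment: Optional["IdSpace"] = None


@dataclass(eq=False)
class IdSpace:
    """One local ID-space: the host document or a transcluded fragment."""
    elements: Dict[str, Element] = field(default_factory=dict)
    parent: Optional["IdSpace"] = None

    def add(self, element: Element) -> Element:
        self.elements[element.id] = element
        if element.fragment is not None:
            element.fragment.parent = self  # remember the hosting ID-space
        return element


def resolve(space: IdSpace, href: str) -> Element:
    """Resolve e.g. '#frag1/targetEl' or '#../targetEl' relative to `space`."""
    if not href.startswith("#"):
        raise ValueError("only hash references are handled in this sketch")
    *path, target = href[1:].split("/")
    for step in path:
        if step in ("..", "[parent]"):
            if space.parent is None:
                raise LookupError("the top-level document has no parent")
            space = space.parent
        else:
            holder = space.elements.get(step)
            if holder is None or holder.fragment is None:
                raise LookupError(f"'{step}' is not a transclusion element")
            space = holder.fragment
    if target not in space.elements:
        raise LookupError(f"no element with ID '{target}' in this ID-space")
    return space.elements[target]


# The host and the fragment may both use the ID 'title' without conflict.
host, frag, inner = IdSpace(), IdSpace(), IdSpace()
host.add(Element("title"))
inner.add(Element("deep"))
frag.add(Element("title"))
frag.add(Element("targetEl"))
frag.add(Element("inner", fragment=inner))
host.add(Element("frag1", fragment=frag))

into_fragment = resolve(host, "#frag1/targetEl")
nested = resolve(host, "#frag1/inner/deep")
into_parent = resolve(frag, "#../title")
```

Because each path step either descends through a transclusion element or ascends to the parent, the same rules cover references to, from and among fragments at any nesting depth.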

Authoring issues

In any CDF model, there are several authors involved. The author of the original content (i.e., the transcluded or embedded fragment or document) may wish to impose constraints on both the fragment specification and the fragment context. For example:

In any reasonable commercial model, there must be a means to protect the rights of authors in this respect. On the other side of the equation, the integrating author will often wish to present an integrated whole, and so requires controls to override context for fragments (subject to the above first-author constraints). For example:

These may be conflicting goals and constraints, but will be essential for the broader application of CDF models. We expect that some integration with authentication and rights models will be required, beyond the obvious requirements to manage end-user rights to transcluded material (e.g., when transcluding subscription-based or otherwise restricted content).

Issues with structural dynamism

It is common to define document models, Info Set models, etc. with a static view of the document. However, especially in the convergence of application and document where CDF content may vary significantly over time and in response to user interaction, we believe it is essential to consider structural dynamism from the start. In our experience, trying to retrofit dynamism into a model defined with a static mindset is difficult at best.

In some cases this will have little or no impact on designs, while in others it may favor certain design alternatives over others.

Proposed directions

Fragment context specification

We support an effort to define fragment context specifications for W3C languages, based upon XML Fragment Interchange [XFrag]. Each working group that defines a language should define the associated fragment context constraints, including authoring controls and optional aspects of context for a transcluded document.

Structured, compound DOM

As described above, we advocate support for a unified DOM for compound documents. The model must allow - without mandating - unified models for the following aspects:

We have not yet explored the XPath/XPointer implications for referencing compound documents, although it is certainly an issue; it may be related to the hyperlink reference issue.

On a related theme, we would like to see this support extended to cover syntactic references to associated DOM properties. This would have real benefit for declarative animation and declarative value-binding. We believe that this would greatly increase the integrative value of the CDF, facilitating powerful declarative solutions for calculated expressions [FunctAnim] and value binding [XForms]. Calculated expressions can combine DOM values which arise from different fragments; this provides a powerful yet declarative semantic integration which cuts across the semantics of the individual fragments.

Extension behaviors

At the implementation level, we feel that traditional plug-in models are too primitive, especially because of the limited, black-box rendering model. We believe there is significant merit to the model for binary behavior extensions supported in MS Internet Explorer (version 5.5 and later), although we advocate some changes to the model. We think the binding should be defined via namespaces rather than a CSS property as in MSIE, although it must be possible to support attribute-only namespaces that extend the behavior of existing elements (as for SMIL timing attributes associated with XHTML or SVG elements). As with MSIE "rendering behaviors", it should be possible for an extension to participate in the rendering chain at a number of levels (e.g., replacing background only, foreground only, overlay, or some combination). 
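The difference between element binding and attribute-only binding by namespace can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an illustration of the idea: the two `example.org` namespace URIs, the behavior names and the `bind_behaviors` function are hypothetical, and nothing here reflects the actual MSIE behavior API.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: namespace URI -> name of the extension behavior.
REGISTRY = {
    "http://example.org/timing": "timing",  # attribute-only namespace
    "http://example.org/chart": "chart",    # namespace that defines elements
}

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:t="http://example.org/timing"
      xmlns:c="http://example.org/chart">
  <body>
    <p t:begin="2s">Fades in after two seconds.</p>
    <c:pie/>
  </body>
</html>"""


def split_qname(qname):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if qname.startswith("{"):
        namespace, local = qname[1:].split("}", 1)
        return namespace, local
    return None, qname


def bind_behaviors(root, registry):
    """Return (element local name, behavior, kind) for every binding found."""
    bindings = []
    for el in root.iter():
        el_ns, local = split_qname(el.tag)
        if el_ns in registry:
            # The element itself belongs to an extension namespace.
            bindings.append((local, registry[el_ns], "element"))
        # Attributes from a registered namespace extend an existing element.
        attr_namespaces = {split_qname(name)[0] for name in el.attrib}
        for ns in sorted(n for n in attr_namespaces if n in registry and n != el_ns):
            bindings.append((local, registry[ns], "attribute"))
    return bindings


bindings = bind_behaviors(ET.fromstring(DOC), REGISTRY)
```

Here the XHTML `p` element keeps its native rendering and gains a timing behavior through the attribute namespace alone, while `c:pie` is bound as a whole element.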

While MSIE defines specific API mechanisms, we think it should be possible to define syntax and/or IDL for binding mechanisms, for rendering participation, etc., in a platform independent manner. We realize this is non-trivial, but we believe it is worth the effort. We would note that the primary goal is portable semantics; portability of extension implementations is secondary, at best.

Practical matters

While we believe in an active W3C defining standards ahead of common implementation, we also recognize the current quandary with respect to browser development. At this point, implementation on MSIE is essentially dead, with a successor product still hazy at best. In addition, the current MSIE feature set varies considerably across OS platforms. Opera and Firefox are maturing somewhat but are still lacking in many areas (especially multimedia), and neither is widely deployed. Without implementers (prompted by real competition), W3C Recommendations may be moot. We are concerned that with Microsoft putting basic document presentation services into the OS core, we may - for the foreseeable future at least - be stuck with whatever CDF model Avalon/Longhorn supports on Windows, and a smattering of uncoordinated support on also-ran browsers.

Ranged against traditional browsers are several proprietary and largely black-box presentation clients including Acrobat and Flash. With mature authoring tools and client support that, while significantly constrained, is nevertheless cross-platform and widely deployed, these products represent a significant threat to open, standards-based document processing and presentation models.

The W3C must act decisively to reenergize the document presentation space - including initiatives for Web Apps and CDF infrastructure - or cede its influence in the browser space to a few corporate players. 


References

[FunctAnim] P. Schmitz, S. Thompson, P. King. Presentation Dynamism in XML: Functional Programming meets SMIL Animation. U. Kent Technical Note, November 2002.
[MMM2001] L. Rutledge, P. Schmitz. 8th International Conference on Multimedia Modeling, November 2001.
[XFrag] P. Grosso and D. Veillard (eds). XML Fragment Interchange. W3C Candidate Recommendation, 12 February 2001.
[XForms] M. Dubinko, et al. (eds). XForms. W3C Recommendation, 14 October 2003.
[XInclude] J. Marsh and D. Orchard (eds). XML Inclusions (XInclude). W3C Candidate Recommendation, 13 April 2004.