WebApps and CDF Position Paper

The Convergence of Applications and Documents -
Requirements and Directions

Position Paper for The W3C Workshop on
Web Applications and Compound Documents

Patrick Schmitz
Nervana, Inc.
Bellevue, WA

A trend towards convergence

For some time now, we have seen the line between an application and a document blurring, especially as regards web content. This has been widely reported, and is evident in a variety of forms, including:

URLs that reference abstract content rather than a specific document instance. These include news site front pages, financial market summaries and portfolio tracking displays, dynamic and interactive maps for weather, traffic, etc.
So-called Portal web sites that aggregate disparate content, merging or even transcluding content from various sources. The composition is often configurable to reflect user preferences, interests, etc.
The common use of PHP, ASP, SSI and related technologies to generate or modify content on the fly, in response to end-user, system or external inputs.
Browser-based applications, especially within the corporate intranet. These commonly include Human Resource applications (replacing the traditional benefits brochure and associated forms), multimedia presentations, solutions for distributed learning and communications, and, more recently, a host of Knowledge Management and Knowledge Integration services. These commonly make use of enhanced web clients to support document search, presentation, annotation, collaboration and related services. More and more, they orchestrate a set of Web Services to fetch pieces of the final content stream, and so constitute both a Web Application and a Compound Document.

There is no reason to believe that this trend will abate, and yet up to now, most solutions are built upon an ad hoc framework, often working around or even fighting the existing DOM and document tools, rather than leveraging them. Solutions - especially more sophisticated ones - are often closely tied to a given vendor-platform, which brings a host of problems.

Implications of annotation

An annotation to a document refers to (a fragment of) the document. Annotation ranges from trivial comments on a document to derivative works using part or all of a document. Temporal segmentation and visual cropping are also forms of annotation, although they are not always recognized as such. In any case, although it is common today simply to copy and paste, smart representations for annotations will use pointers/references rather than embedded copies. A good general solution employs transclusion for document composition. It is worth noting that in this model, an annotation is a document, with a reference to one or more other documents.

As the model of a semantic web matures in the marketplace, we can expect to see more and more annotation of existing documents. In fact, we suspect that this will become the dominant content form. Derivative work has long been the norm, but when tools support collaborative authoring, document reference and embedding, versioning etc., most content will actually represent annotations to existing documents.

The transclusion model provides a good high-level model for compound documents, but raises a number of questions about the details. W3C started down this road by producing XML Fragment Interchange [XFrag], but a great deal of work remains to specify appropriate fragment contexts for content types, and frameworks for interpreting context in compound document formats (CDFs). The next section summarizes the main issues and associated requirements we have identified in this space.

Issues and requirements

Four general areas emerge that require attention in any CDF model: specification of fragment and context, binding and interaction issues, authoring and rights issues, and issues with structural dynamism. These same issues apply (albeit with different weights) to many web applications. Discussion and direction on these will benefit both domains.

Issues for Fragment specification and context

In [MMM2001] we explored the possibilities for expressing and implementing fragmented media integration with existing and emerging standards. We described the requirements associated with several functional areas for fragmented media integration. The first set of requirements addresses fragment specification, or how we delimit the extent of the contained document:

spatial fragmenting: There should be a spatial means of specifying what portion of the contained document/media-object makes up the desired fragment.
temporal fragmenting: There should be a temporal means of specifying what portion of the contained document/media-object makes up the desired fragment.
structural and nominal fragmenting: It should be possible for integrating formats to refer to fragments with a name (representing some spatial, temporal or structural extent) or with a reference to document structure (e.g., in the manner of XPath).

It should also be possible to combine these, so that for example, a spatial fragment of a given structural section (e.g., a portion of a layer) can be specified.
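To make the idea of combining fragment specifications concrete, here is a minimal sketch in which a fragment specification has optional structural, spatial and temporal parts, and a second specification narrows the first. The names (FragmentSpec, refine) are hypothetical, and the sketch assumes that both spatial rectangles share one coordinate space and that an inner structural path is relative to the outer one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical types for illustration only; not taken from any W3C specification.
Rect = Tuple[float, float, float, float]   # (x, y, width, height)
Interval = Tuple[float, float]             # (begin, end), e.g. in seconds


@dataclass(frozen=True)
class FragmentSpec:
    structural: Optional[str] = None   # a name or an XPath-like path
    spatial: Optional[Rect] = None
    temporal: Optional[Interval] = None


def _intersect_rect(a: Rect, b: Rect) -> Rect:
    # Assumes both rectangles are expressed in the same coordinate space.
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1 = min(a[0] + a[2], b[0] + b[2])
    y1 = min(a[1] + a[3], b[1] + b[3])
    return (x0, y0, max(0.0, x1 - x0), max(0.0, y1 - y0))


def _intersect_interval(a: Interval, b: Interval) -> Interval:
    begin, end = max(a[0], b[0]), min(a[1], b[1])
    return (begin, max(begin, end))   # an empty overlap collapses to zero length


def refine(outer: FragmentSpec, inner: FragmentSpec) -> FragmentSpec:
    """Narrow `outer` by `inner`: on each axis keep whichever constraint is
    present, or the intersection when both are present."""
    structural = outer.structural
    if inner.structural:
        # Assumption: an inner structural path is relative to the outer node.
        structural = (f"{outer.structural}/{inner.structural}"
                      if outer.structural else inner.structural)
    spatial = outer.spatial or inner.spatial
    if outer.spatial and inner.spatial:
        spatial = _intersect_rect(outer.spatial, inner.spatial)
    temporal = outer.temporal or inner.temporal
    if outer.temporal and inner.temporal:
        temporal = _intersect_interval(outer.temporal, inner.temporal)
    return FragmentSpec(structural, spatial, temporal)
```

For the example in the text, refining a structural selection of one SVG layer (e.g. "//g[@id='layer2']") with a spatial crop yields a spec that carries both constraints, which is one way to express "a portion of a layer".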

The remaining requirements address the specification of, or constraints upon, the context of the fragment. Note that depending on the manner of fragment specification, context may be spatial (outside the fragment), temporal (before and after the fragment) or structural (the rest of the document excluding the selected node-set). When specifications are composed, the resulting context is adjusted accordingly.

context removal: It must be possible to remove or preclude the display of the fragment context (the rest of the fragment document) without affecting the fragment’s appearance.
fragment distinction: It should be possible to specify fragment presentation (appearance) distinct from the presentation specified within the associated context. See also the discussion of authoring issues, below.
initial navigation to fragment: The fragment should be made readily apparent upon initial access.

A complete discussion of these points appears in [MMM2001]; it is omitted here for brevity.

Binding and interaction issues

A significant problem with compound documents relates to IDs and ID-REFs. A number of issues arise, which in some cases may be unintended conflicts and in others desired functionality:

there may be overlap among the sets of ID values (i.e., XML attributes of type ID) defined in each fragment and in the host/parent document, whereas XML traditionally requires IDs to be unique within a document.
by the same token, there may be references to ID values with multiple definitions. Resolving these references must be clear and unequivocal (and ideally, follow author intent!).
fragments may include ID-REFs referring to elements outside the fragment. While it should be possible to specify that the referenced elements be included as part of the fragment context, it should also be possible to specify that the references be resolved in the context of the compound document. An example of the former case is an <svg:use> reference to an <svg:symbol> element outside the fragment scope, in which the author(s) wish to maintain the symbol definition from the fragment document context. An example of the latter case involves an author defining (sub-)documents with the intent of transclusion, and so leaving hyperlink, timing or animation links dangling in the fragment definition, with the intent that they will be resolved in the compound document context.
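Both kinds of problem can be detected mechanically before composition. The following is a minimal sketch using Python's xml.etree.ElementTree; it assumes that IDs are carried by a plain `id` attribute (ElementTree does not expose DTD attribute types) and that any attribute value of the form "#name" is an ID reference. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set

# Assumptions: IDs live in a plain `id` attribute, and any attribute value
# written as "#name" (e.g. href or xlink:href) is treated as an ID reference.


def ids_in(xml_text: str) -> Set[str]:
    root = ET.fromstring(xml_text)
    return {el.get("id") for el in root.iter() if el.get("id") is not None}


def refs_in(xml_text: str) -> Set[str]:
    root = ET.fromstring(xml_text)
    return {
        value[1:]
        for el in root.iter()
        for value in el.attrib.values()
        if value.startswith("#") and len(value) > 1
    }


def id_conflicts(parts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each ID defined in more than one part to the names of those parts."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for name, text in parts.items():
        for ident in ids_in(text):
            owners[ident].add(name)
    return {ident: names for ident, names in owners.items() if len(names) > 1}


def dangling_refs(fragment: str) -> Set[str]:
    """References a fragment makes that it cannot resolve on its own."""
    return refs_in(fragment) - ids_in(fragment)
```

Each dangling reference is a point where the author must choose between the two cases above: pull the referenced element into the fragment context (as for the <svg:symbol> example), or leave it to be resolved in the compound document.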

To some extent, this class of problem has been recognized in the context of [XInclude], although the goal of building a simple Info Set has constrained the proposed solution.

A related issue arises with event binding and interaction control in general. Specifically, how should authors (and implementers) control the flow of events across document fragment boundaries? In many useful cases, it will be desirable to "reach across" the boundaries to define interaction binding, animation targets, timing sync-arcs/event-arcs as well as XHTML hyperlink targets. In many cases, authors may wish to expressly target an element (by ID-REF) in a particular fragment. Conversely, authors may wish to specifically bind to an element in a host document (in the case of predefined CDF templates that transclude one or more of many possible fragments).

We see many cases like this that will require a general mechanism for modeling the CDF as a compound document in the DOM. In this model each fragment defines a separate, local ID-space within which simple Info Set rules apply. There is some precedent for this in the frames model (although aspects of that model are problematic). We suggest that a hierarchical link reference be defined, allowing relative links to, from and even among contained fragments. The target fragment would be indicated using the ID of the transclusion element in the host document (possibly preceded by a '#' as with hashref syntax). This would be followed by a slash separator and the ID-REF specific to the fragment ID-space (e.g., href="#frag1/targetEl"). This solution extends to deeply nested scenarios. For a reference into the parent hosting document, some reserved word may be used (e.g., href="#[parent]/targetEl"), or perhaps the traditional dot-dot syntax qualified with a hash (e.g., href="#../targetEl"). Neither of these seems very clean, and further discussion is invited.
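The hierarchical reference proposal can be sketched as a simple resolver over nested ID-spaces. In this illustration (class and function names are hypothetical), each ID-space maps local IDs to elements and maps transclusion-element IDs to child ID-spaces; since the parent syntax is still open, the sketch accepts both ".." and "[parent]".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical sketch of the proposed "#frag1/targetEl" lookup. Elements are
# represented by plain strings; both candidate parent tokens are accepted.
PARENT_TOKENS = {"..", "[parent]"}


@dataclass
class IdSpace:
    name: str
    elements: Dict[str, str] = field(default_factory=dict)          # local ID -> element
    fragments: Dict[str, "IdSpace"] = field(default_factory=dict)   # transclusion ID -> child
    parent: Optional["IdSpace"] = field(default=None, repr=False, compare=False)

    def transclude(self, transclusion_id: str, child: "IdSpace") -> "IdSpace":
        """Attach `child` under the transclusion element `transclusion_id`."""
        child.parent = self
        self.fragments[transclusion_id] = child
        return child


def resolve(ref: str, space: IdSpace) -> str:
    """Resolve '#step/.../id' relative to `space`; the last step is a local ID."""
    if not ref.startswith("#"):
        raise ValueError(f"not a hash reference: {ref!r}")
    *path, target = ref[1:].split("/")
    current = space
    for step in path:
        if step in PARENT_TOKENS:
            if current.parent is None:
                raise KeyError(f"{current.name} has no parent document")
            current = current.parent
        else:
            current = current.fragments[step]   # KeyError if no such transclusion
    return current.elements[target]             # simple Info Set rules apply locally
```

Because each fragment keeps its own ID-space, a host and a fragment may both define "targetEl" without conflict: resolving "#targetEl" from the host finds the host's element, while "#frag1/targetEl" finds the fragment's, and "#../targetEl" or "#[parent]/targetEl" from inside the fragment reaches back into the host.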

Authoring issues

In any CDF model, several authors are involved. The author of the original content (i.e., the transcluded or embedded fragment or document) may wish to impose constraints on both the fragment specification and the context. For example:

Media authors may (onerously) require that a lead-in advertisement not be excluded when viewing a temporal fragment, or may constrain fragments to include both audio and video channels.
An image publisher may require a special watermark context for spatial fragments.
A designer may constrain the presentation changes allowed when presenting a fragment out of context.
An author not wishing to be quoted out of context may preclude spatial fragments and further require that a minimal structural context (e.g. the containing paragraph) be included for any fragment.

In any reasonable commercial model, there must be a means to protect the rights of authors in this respect. On the other side of the equation, the integrating author will often wish to present an integrated whole, and so requires controls to override context for fragments (subject to the above first-author constraints). For example:

When simply quoting text from another document, an integrating author will often wish to specify the presentation attributes such as font family, size, color, etc.
The integrating author may wish to scale or otherwise transform a spatial fragment - e.g., to zoom in on a raster image or vector graphics fragment.
The integrating author may wish to override the speed of, or remove temporal repeat, autoReverse, et al. behaviors from a segment of a timed document.

These may be conflicting goals and constraints, but will be essential for the broader application of CDF models. We expect that some integration with authentication and rights models will be required, beyond the obvious requirements to manage end-user rights to transcluded material (e.g., when transcluding subscription-based or otherwise restricted content).
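One simple way to reconcile the two sides is to let the original author publish default presentation values, a set of locked properties and a minimal required context, and to apply the integrating author's overrides only where nothing is locked. The policy model and names below are hypothetical and deliberately ignore authentication and rights management.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical policy model: locked properties and required context always
# win over the integrating author's requests.


@dataclass(frozen=True)
class SourcePolicy:
    defaults: Dict[str, str]                          # original presentation values
    locked: FrozenSet[str] = frozenset()              # properties that may not change
    required_context: FrozenSet[str] = frozenset()    # e.g. {"containing-paragraph"}


def apply_overrides(policy: SourcePolicy,
                    overrides: Dict[str, str],
                    requested_context: Set[str]) -> Tuple[Dict[str, str], Set[str], Set[str]]:
    """Return (effective presentation, rejected override names, effective context)."""
    effective = dict(policy.defaults)
    rejected: Set[str] = set()
    for prop, value in overrides.items():
        if prop in policy.locked:
            rejected.add(prop)        # first-author constraint takes precedence
        else:
            effective[prop] = value   # integrating author controls the rest
    # The integrating author may request extra context, but never less than required.
    context = set(requested_context) | set(policy.required_context)
    return effective, rejected, context
```

Under this policy an integrating author quoting text can still change the font family, while a locked color or a required containing paragraph (as in the "quoted out of context" example) is preserved.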

Issues with structural dynamism

It is common to define document models, Info Set models, etc. with a static view of the document. However, especially in the convergence of application and document where CDF content may vary significantly over time and in response to user interaction, we believe it is essential to consider structural dynamism from the start. In our experience, trying to retrofit dynamism into a model defined with a static mindset is difficult at best.

In some cases this will have little or no impact on designs, while in others it may make certain design alternatives preferable to others.

Proposed directions

Fragment context specification

We support an effort to define fragment context specifications for W3C languages, based upon XML Fragment interchange [XFrag]. Each working group that defines a language should define the associated fragment context constraints, including authoring controls and optional aspects of context for a transcluded document.

Structured, compound DOM

As described above, we advocate support for a unified DOM for compound documents. The model must allow - without mandating - unified models for the following aspects:

Hyperlink targeting - it should be possible to target local hyperlinks across fragment boundaries. It should also be possible to target external hyperlinks (in other documents) to anchors nested within transcluded fragments.
Style - it must be possible to inherit style context from the containing document into the fragment, or to preclude such inheritance.
Events - it must be possible to allow (or suppress) event flow across fragment/sub-doc boundaries. This generalizes to support for user interaction control.
Timing - it must be possible to support a single timegraph, while also allowing for independent local timing of fragments (SMIL already has mechanisms for this). It should also be possible to define sync-arcs, event-arcs, etc. that span fragment boundaries; this depends upon some equivalent to our compound ID-space proposal.
Animation - it must be possible to define an animation that targets element-attributes in another ID-space (e.g., allowing a host document to animate aspects of a contained fragment). This also requires ID-binding.
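For the Events item, one possible model treats each transclusion element as a fragment boundary carrying a flag that allows or suppresses event flow into the host. The sketch below (hypothetical names, bubbling phase only) computes the path an event would take from its target toward the document root.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch: a transclusion element marks a fragment boundary and
# says whether events raised inside the fragment may bubble into the host.


@dataclass
class Node:
    name: str
    is_boundary: bool = False     # True for a transclusion element
    events_cross: bool = True     # only consulted when is_boundary is True
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        return child


def propagation_path(target: Node) -> List[str]:
    """Names of the nodes an event bubbling from `target` would reach."""
    reached: List[str] = []
    node: Optional[Node] = target
    while node is not None:
        reached.append(node.name)
        if node.is_boundary and not node.events_cross:
            break   # suppressed: nothing above this transclusion element sees it
        node = node.parent
    return reached
```

The same boundary flag could be consulted for the other aspects in this list (style inheritance, hyperlink targeting), which is one reason to model all of them in a single compound DOM rather than per feature.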

We have not yet explored the XPath/XPointer implications for referencing compound documents, although it is certainly an issue; it may be related to the hyperlink reference issue.

On a related theme, we would like to see this support extended to include syntactic references to associated DOM properties. This would have real benefit for declarative animation and declarative value-binding. We believe that this would greatly increase the integrative value of the CDF, facilitating powerful declarative solutions for calculated expressions [FunctAnim] and value binding [XForms]. Calculated expressions can combine DOM values which arise from different fragments; this provides a powerful yet declarative semantic integration which cuts across the semantics of the individual fragments.
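As a rough illustration of a calculated expression that spans fragments, the sketch below addresses each value by a (fragment, element ID, property) triple, loosely echoing the hierarchical reference form proposed earlier, and recomputes bound targets whenever a source changes. The ValueStore class is hypothetical; it is not the [FunctAnim] or [XForms] mechanism, and it ignores cycles and evaluation order.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: a value is addressed as (fragment, element ID, property).
Key = Tuple[str, str, str]


class ValueStore:
    def __init__(self) -> None:
        self.values: Dict[Key, float] = {}
        self.bindings: List[Tuple[Key, List[Key], Callable[..., float]]] = []

    def bind(self, target: Key, sources: List[Key], fn: Callable[..., float]) -> None:
        """Declare that `target` is always fn(*sources)."""
        self.bindings.append((target, sources, fn))
        self._recompute()

    def set_value(self, key: Key, value: float) -> None:
        self.values[key] = value
        self._recompute()

    def _recompute(self) -> None:
        # Naive: re-evaluate every binding whose sources are all defined, in
        # declaration order. Cycles and dependency ordering are not handled.
        for target, sources, fn in self.bindings:
            if all(s in self.values for s in sources):
                self.values[target] = fn(*(self.values[s] for s in sources))
```

For instance, a width in a summary fragment can be bound to the product of a bar height in a chart fragment and a quantity in a form fragment; changing the quantity updates the width without either source fragment knowing about the binding.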

Extension behaviors

At the implementation level, we feel that traditional plug-in models are too primitive, especially because of the limited, black-box rendering model. We believe there is significant merit to the model for binary behavior extensions supported in MS Internet Explorer (version 5.5 and later), although we advocate some changes to the model. We think the binding should be defined via namespaces rather than a CSS property as in MSIE, although it must be possible to support attribute-only namespaces that extend the behavior of existing elements (as for SMIL timing attributes associated with XHTML or SVG elements). As with MSIE "rendering behaviors", it should be possible for an extension to participate in the rendering chain at a number of levels (e.g., replacing background only, foreground only, overlay, or some combination).
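The idea of an extension participating in the rendering chain at several levels can be sketched as an ordered set of layers, each with a painter that an extension may replace or add to. Drawing is simulated by appending strings to a display list; the layer names and the RenderChain class are hypothetical and are not the MSIE API.

```python
from enum import IntEnum
from typing import Callable, Dict, List

# Hypothetical sketch: an element's rendering is split into ordered layers,
# and an extension may replace a layer's painter or add to it.


class Layer(IntEnum):
    BACKGROUND = 0
    FOREGROUND = 1
    OVERLAY = 2


Painter = Callable[[List[str]], None]   # appends drawing operations to a display list


class RenderChain:
    def __init__(self, default_painters: Dict[Layer, Painter]) -> None:
        self.painters: Dict[Layer, Painter] = dict(default_painters)

    def attach(self, layer: Layer, painter: Painter, replace: bool = True) -> None:
        """Replace the painter at `layer`, or (replace=False) draw after it."""
        previous = self.painters.get(layer)
        if replace or previous is None:
            self.painters[layer] = painter
            return

        def chained(out: List[str]) -> None:
            previous(out)   # keep the existing output for this layer
            painter(out)    # then let the extension draw on top of it

        self.painters[layer] = chained

    def render(self) -> List[str]:
        out: List[str] = []
        for layer in sorted(self.painters):   # background first, overlay last
            self.painters[layer](out)
        return out
```

An extension that replaces only the background and adds an overlay (for example, a watermark) leaves the element's own foreground rendering untouched, which a black-box plug-in cannot do.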

While MSIE defines specific API mechanisms, we think it should be possible to define syntax and/or IDL for binding mechanisms, for rendering participation, etc., in a platform independent manner. We realize this is non-trivial, but we believe it is worth the effort. We would note that the primary goal is portable semantics; portability of extension implementations is secondary, at best.

Practical matters

While we believe in an active W3C defining standards ahead of common implementation, we also recognize the current quandary with respect to browser development. At this point, new feature development on MSIE is essentially dead, and a successor product is still hazy at best. In addition, the current MSIE feature set varies considerably across OS platforms. Opera and Firefox are maturing somewhat but are still lacking in many areas (especially multimedia), and neither is widely deployed. Without implementers (prompted by real competition), W3C Recommendations may be moot. We are concerned that with MSFT putting basic document presentation services into the OS core, we may - for the foreseeable future at least - be stuck with whatever CDF model Avalon/Longhorn supports on Windows, and a smattering of uncoordinated support on also-ran browsers.

Ranged against traditional browsers are several proprietary and largely black-box presentation clients, including Acrobat and Flash. With mature authoring tools and client support that, while significantly constrained, is nevertheless cross-platform and widely deployed, these products represent a significant threat to open, standards-based document processing and presentation models.

The W3C must act decisively to reenergize the document presentation space - including initiatives for Web Apps and CDF infrastructure - or cede its influence in the browser space to a few corporate players.

References

[FunctAnim]: P. Schmitz, S. Thompson and P. King. Presentation Dynamism in XML: Functional Programming meets SMIL Animation. University of Kent Technical Note, November 2002. http://www.cs.kent.ac.uk/people/staff/sjt/PDXML/PDXML.pdf.
[MMM2001]: L. Rutledge and P. Schmitz. 8th International Conference on Multimedia Modeling, November 2001. http://homepages.cwi.nl/~media/publications/mmm01b.pdf.
[XFrag]: P. Grosso and D. Veillard (eds). XML Fragment Interchange. W3C Candidate Recommendation, 12 February 2001. http://www.w3.org/TR/xml-fragment.
[XForms]: M. Dubinko et al. (eds). XForms. W3C Recommendation, 14 October 2003. http://www.w3.org/TR/xforms/.
[XInclude]: J. Marsh and D. Orchard (eds). XML Inclusions (XInclude). W3C Candidate Recommendation, 13 April 2004. http://www.w3.org/TR/xinclude/.
