XML Processing Model Workshop

Sun Position Paper 22 Jun 2001

This version: Workshop Submission, 22 Jun 2001

Editors: Norman Walsh <Norman.Walsh@Sun.COM>, Eve Maler <Eve.Maler@Sun.COM>, Christopher Ferris <Chris.Ferris@Sun.COM>

Abstract

We observe that the lack of an underlying processing model allowing applications to describe the semantics of a mixed environment of XML processes poses a serious threat to the development of interoperable web applications.

Motivated by a selection of use cases, we describe our notion of an XML process and the consequences that processing order has for the success of an application.

We conclude with our thoughts on how the W3C should proceed.

1 Introduction

2 What Is a Process?

2.1 Infoset Composition

3 Use Cases

3.1 Simple Business Transactions
3.2 Transforming to a New Schema
3.3 Document Publishing
3.4 Business Transaction Hub
3.5 Web Service Implementation

4 Processing Order

5 Conclusion

Appendixes

References

1. Introduction

There is a large, and growing, set of specifications that describe processes operating on XML documents. Considering how these specifications interact raises a large number of issues. Our principal concern in this position paper is interoperability.

There are several broad classes of processes that need to be considered:

Constructive processes. These are processes that build new XML documents. An [XML 1.0 (Second Edition)] parser (with required, or perhaps optional, support for [Namespaces] and [XML Base]) is one example of this class. [XSL Transformations] and [XQuery] are others. In general, any process that produces a new or substantially different XML document from some input source can be described as constructive.
Augmenting processes. Processes, like [W3C XML Schema] validation or [CSS] styling, that add new information to an existing document are performing some sort of augmentation. Verifying a digital signature might leave the document unchanged but indicate, somehow, whether it passes or fails the signature check. [XInclude] might also be viewed as an augmentation process.
"Peephole" processes. Processes may sometimes operate on only a small fraction of an existing document, perhaps to validate a particular namespace or to decrypt a region of encrypted content. It seems possible that these processes will have different characteristics from more global operations, especially with respect to implementation efficiency.
Extraction processes. Some processes reach into an existing document and copy (or link or remove) parts of it. Processes such as [XPointer] and [XLink] (and applications which are built on top of them) fall into this category. So do [XInclude] and [XQuery].
Packaging processes. Distributed or federated web applications will need to package a collection of resources to transmit to another location or service. This packaging may be performed using XML Protocol or [SOAP], for example, and those efforts may take steps in this direction. Nevertheless, packaging remains a meta-issue that has not been clearly addressed: if you need to send a set of documents, together with processing model information, to another system, how do you do it? In particular, the whole issue of packaging resources and providing a useful manifest is not adequately addressed.

This is by no means an exhaustive list of processes, nor, as you can see, do processes always fit cleanly into these categories. What appears to be augmentation in one application might be a peephole process, or even a constructive process, in another. This is compounded by the fact that processes can be hierarchical: a transformational process might perform a bit of peephole validation or extraction, for example.
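To make the idea of a hierarchical process concrete, here is a minimal Python sketch, using only the standard library's xml.etree.ElementTree, of a constructive process (building an invoice from an order) that performs a bit of peephole validation as an inner step. The namespace urn:example:price, the amount element, and the rule that every element in that namespace must contain a decimal number are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical namespace marking the "peephole" region of the document.
PRICE_NS = "urn:example:price"

SAMPLE_ORDER = f"""<order xmlns:p="{PRICE_NS}">
  <item sku="A1"><p:amount>10.50</p:amount></item>
  <item sku="B2"><p:amount>2.25</p:amount></item>
</order>"""


def peephole_validate(root, ns):
    """Inspect only elements in namespace `ns`; all other elements are skipped.

    Hypothetical rule: every element in `ns` must contain a decimal number.
    """
    prefix = "{" + ns + "}"
    errors = []
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith(prefix):
            try:
                Decimal((el.text or "").strip())
            except InvalidOperation:
                errors.append(f"{el.tag}: {el.text!r} is not a number")
    return errors


def to_invoice(order):
    """Constructive process that runs the peephole check as an inner step."""
    errors = peephole_validate(order, PRICE_NS)
    if errors:
        raise ValueError("; ".join(errors))
    amounts = order.iter(f"{{{PRICE_NS}}}amount")
    total = sum(Decimal(el.text.strip()) for el in amounts)
    invoice = ET.Element("invoice")
    ET.SubElement(invoice, "total").text = str(total)
    return invoice


print(ET.tostring(to_invoice(ET.fromstring(SAMPLE_ORDER))).decode())
# <invoice><total>12.75</total></invoice>
```

Whether the inner check counts as "validation", "augmentation", or part of the transformation depends on where one draws the process boundary, which is exactly the classification problem described above.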

The fact that documents may be augmented or transformed raises another set of issues with respect to addressing into augmented or transformed results. [XML Linking and Style] discusses this at some length, but many issues remain unresolved.

The use cases described in Section 3 demonstrate that no fixed order of processing can be imposed, but the order of processing actually used is an important aspect of the semantics of the application. It follows that the ability to identify an underlying processing model and describe how processing is to proceed is an imperative precondition for developing successful, interoperable web applications.
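As a small illustration of why order is part of the semantics, the following Python sketch runs the same two steps, XInclude processing and a validation check, in both orders. It assumes the standard library's partial XInclude support in xml.etree.ElementInclude, driven by a custom in-memory loader; the items.xml resource and the rule that an order must contain an item element are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory resources standing in for documents fetched by href.
RESOURCES = {"items.xml": '<item sku="A1" qty="2"/>'}

ORDER = """<order xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="items.xml"/>
</order>"""


def loader(href, parse, encoding=None):
    """Resolve hrefs from RESOURCES instead of the file system."""
    text = RESOURCES[href]
    return ET.fromstring(text) if parse == "xml" else text


def xinclude(root):
    """Replace xi:include elements in place, then hand the tree on."""
    ElementInclude.include(root, loader=loader)
    return root


def validate(root):
    """Hypothetical rule: an order must contain at least one <item> child."""
    if root.find("item") is None:
        raise ValueError("order has no <item>")
    return root


def run(*steps):
    """Apply the given processes, left to right, to a fresh parse of ORDER."""
    root = ET.fromstring(ORDER)
    for step in steps:
        root = step(root)
    return root


print(run(xinclude, validate).find("item").get("sku"))  # A1
try:
    run(validate, xinclude)
except ValueError as err:
    print("rejected:", err)  # rejected: order has no <item>
```

The same document, the same two processes, and the same rule yield acceptance in one order and rejection in the other; nothing in the document itself says which order the author intended.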

2. What Is a Process?

Before we can investigate what it means to have a processing model, we need some common understanding of what constitutes a process.

For the purposes of this document, we consider any application that constructs, transforms, or augments an [Infoset] to be a processor. A process begins with zero or more Infosets and produces zero or more Infosets (it may also produce ancillary information, such as whether it succeeded or failed). Some processes (such as [Schematron] and [RELAX NG] validation) don't, strictly speaking, construct, transform, or augment an Infoset, but we can finesse that point for the moment by imagining that they perform the identity transformation.
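One way to read this definition is as a function type. The Python sketch below is an illustration under stated assumptions rather than a proposed API: ElementTree root elements stand in for Infosets, a hypothetical ProcessResult carries the ancillary success information, and validation is modelled as the identity transformation, as suggested above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List

# Stand-in for an Infoset: the root element of a parsed document.
Infoset = ET.Element


@dataclass
class ProcessResult:
    """Zero or more output Infosets plus ancillary success information."""
    infosets: List[Infoset] = field(default_factory=list)
    succeeded: bool = True
    messages: List[str] = field(default_factory=list)


# A process maps zero or more input Infosets to a ProcessResult.
Process = Callable[[List[Infoset]], ProcessResult]


def make_parser(text: str) -> Process:
    """Constructive process: starts from zero Infosets and yields one."""
    def parse(inputs: List[Infoset]) -> ProcessResult:
        try:
            return ProcessResult([ET.fromstring(text)])
        except ET.ParseError as err:
            return ProcessResult([], False, [str(err)])
    return parse


def make_validator(required_child: str) -> Process:
    """Validation modelled as the identity transformation plus a success flag.

    The rule (each input root needs a `required_child` child) is a
    hypothetical placeholder for a real Schematron or RELAX NG check.
    """
    def validate(inputs: List[Infoset]) -> ProcessResult:
        missing = [root.tag for root in inputs if root.find(required_child) is None]
        messages = [f"<{tag}> lacks <{required_child}>" for tag in missing]
        return ProcessResult(list(inputs), not missing, messages)
    return validate


parsed = make_parser("<order><item/></order>")([])
checked = make_validator("item")(parsed.infosets)
print(checked.succeeded, checked.infosets[0] is parsed.infosets[0])  # True True
```

The validator hands back the very same element objects it received, which is what "imagining the identity transformation" amounts to in this sketch.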

Actually, processes that augment an existing infoset but don't clearly produce a new one represent a special class of process because they may change the fundamental model on which some other process expects to operate.
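The following Python sketch, under toy assumptions, shows how an in-place augmentation can change what another process observes. A SHA-256 digest of the serialized tree stands in for a signature check (real XML Signature processing involves canonicalization, which is not modelled here), and the currency-defaulting step is hypothetical.

```python
import copy
import hashlib
import xml.etree.ElementTree as ET


def digest(root):
    """Toy stand-in for a signature check: SHA-256 of the serialized tree."""
    return hashlib.sha256(ET.tostring(root)).hexdigest()


def default_currency(root):
    """Augmenting process: adds a defaulted attribute in place, returns nothing."""
    for item in root.iter("item"):
        item.attrib.setdefault("currency", "USD")


def default_currency_copy(root):
    """The same augmentation applied to a copy, leaving the caller's tree intact."""
    augmented = copy.deepcopy(root)
    default_currency(augmented)
    return augmented


doc = ET.fromstring('<order><item price="10"/></order>')
signed = digest(doc)              # computed over the document as received
default_currency(doc)             # mutates the very tree the digest was taken from
print(signed == digest(doc))      # False
```

The copying variant avoids the surprise, but only by turning the augmentation into something that does produce a new Infoset, which is the distinction drawn above.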

Although some, perhaps many, applications work with concrete object models that are not identical to the Infoset, it's still useful to describe the processing model in terms of the standard Infoset.

2.1. Infoset Composition

On the surface, the model we describe here, where processes consume and produce infosets, seems to imply that infoset composition is well-defined. In fact, that is not the case. Some infoset properties introduce dependencies between information items:

Schema validity properties introduce dependencies on descendants.
In-scope namespaces may introduce dependencies (at least implicitly) on ancestors.
Markers, such as the start and end markers that appeared in earlier drafts of the Infoset, introduce dependencies among siblings.
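The dependency on descendants can be shown with a toy example. In this Python sketch a hypothetical valid attribute plays the role of a schema validity property computed for a parent element; grafting a child taken from another Infoset leaves the recorded property stale until it is recomputed. The validity rule (every item needs a qty attribute) is invented for illustration.

```python
import xml.etree.ElementTree as ET


def item_ok(item):
    """Hypothetical rule standing in for a schema: an <item> needs a qty attribute."""
    return "qty" in item.attrib


def annotate(order):
    """Record a validity property on <order> that depends on its descendants."""
    valid = all(item_ok(item) for item in order.iter("item"))
    order.set("valid", "true" if valid else "false")  # hypothetical PSVI-like property
    return order


order = annotate(ET.fromstring('<order><item qty="1"/></order>'))
other = ET.fromstring("<batch><item/></batch>")

# Compose: graft an <item> from another Infoset into the annotated one.
order.append(other[0])

print(order.get("valid"))             # true  (stale: computed before the graft)
print(annotate(order).get("valid"))   # false (recomputed over the new descendants)
```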

These dependencies make it difficult to perform arbitrary composition. We believe that this issue deserves immediate consideration.

3. Use Cases

We motivate the rest of our discussion by considering a few simple use cases.

3.1. Simple Business Transactions

Two parties agree to conduct business electronically. They will exchange purchase orders, invoices, and other business documents using some appropriate transport protocol. Before responding to a request, each party wishes to validate the request against a known schema so that errors do not result in mismanaged funds.
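A minimal sketch of the validate-before-respond step, in Python with ElementTree. Because the standard library has no W3C XML Schema validator, the hand-written validate_po function is only a stand-in for validation against the agreed schema; the element names (purchaseOrder, buyer, item), the qty rule, and the invoice and fault responses are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the agreed schema: required children of the root.
REQUIRED = ("buyer", "item")


def validate_po(root):
    """Return a list of problems; an empty list means the request is acceptable."""
    errors = []
    if root.tag != "purchaseOrder":
        errors.append(f"unexpected root <{root.tag}>")
    for name in REQUIRED:
        if root.find(name) is None:
            errors.append(f"missing <{name}>")
    for item in root.findall("item"):
        qty = item.get("qty", "")
        if not qty.isdecimal() or int(qty) == 0:
            errors.append(f"bad qty {qty!r}")
    return errors


def respond(request_text):
    """Parse and validate a request; only a valid request yields an invoice."""
    try:
        root = ET.fromstring(request_text)
    except ET.ParseError as err:
        errors = [f"not well-formed: {err}"]
    else:
        errors = validate_po(root)
    if errors:
        fault = ET.Element("fault")
        for message in errors:
            ET.SubElement(fault, "reason").text = message
        return fault
    invoice = ET.Element("invoice", buyer=root.findtext("buyer"))
    for item in root.findall("item"):
        ET.SubElement(invoice, "line", sku=item.get("sku", ""), qty=item.get("qty"))
    return invoice


bad = respond('<purchaseOrder><item sku="A1" qty="0"/></purchaseOrder>')
print([reason.text for reason in bad])  # ['missing <buyer>', "bad qty '0'"]
```

The design choice that matters for the use case is that no business logic runs on a request until well-formedness and validity have both been established.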