XML Processing Model Requirements and Use Cases

1 Introduction

A large and growing set of specifications describe processes operating on XML documents. Many applications will depend on the use of more than one of these specifications. Considering how implementations of these specifications might interact raises many issues related to interoperability. This specification contains requirements on an XML Pipeline Language for the description of XML process interactions in order to address these issues. This specification is concerned with the conceptual model of XML process interactions, the language for the description of these interactions, and the inputs and outputs of the overall process. This specification is not generally concerned with the implementations of actual XML processes participating in these interactions.

2 Terminology

[Definition: XML Information Set or "Infoset"]: An XML Information Set or "Infoset" is the name we give to any implementation of a data model for XML which supports the vocabulary as defined by the XML Information Set recommendation [xml-infoset-rec].
[Definition: XML Pipeline]: An XML Pipeline is a conceptualization of a flow of a configuration of steps and their parameters. The XML Pipeline defines a process in terms of order, dependencies, or iteration of steps over XML information sets.
[Definition: XML Pipeline Specification Document]: A pipeline specification document is an XML document that describes an XML pipeline.
[Definition: Step]: A step is a specification of how a component is used in a pipeline that includes inputs, outputs, and parameters.
[Definition: Component]: A component is a particular XML technology (e.g. XInclude, XML Schema Validity Assessment, XSLT, XQuery, etc.).
[Definition: Input Document]: An XML infoset that is an input to an XML Pipeline or Step.
[Definition: Output Document]: The result of processing by an XML Pipeline or Step.
[Definition: Parameter]: A parameter is input to a Step or an XML Pipeline in addition to the Input and Output Document(s) that it may access. Parameters are most often simple, scalar values such as integers, booleans, and URIs, and they are most often named, but neither of these conditions is mandatory. That is, we do not (at this time) constrain the range of values a parameter may hold, nor do we (at this time) forbid a Step from accepting anonymous parameters.
[Definition: XML Pipeline Environment]: The technology or platform environment in which the XML Pipeline is used (e.g. command-line, web servers, editors, browsers, embedded applications, etc.).
[Definition: Streaming]: The ability to parse an XML document and pass infoitems between components without building a full document information set.
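
The relationship among the terms above can be illustrated with a small, non-normative sketch in Python. All names in it (Component, Step, Pipeline, rename_root) are illustrative assumptions and are not part of any pipeline language; a component is modeled simply as a callable over a list of documents and a dictionary of parameters.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List
import xml.etree.ElementTree as ET

# Assumption: a component is any callable mapping input documents plus
# parameters to output documents. This signature is hypothetical.
Component = Callable[[List[ET.Element], Dict[str, Any]], List[ET.Element]]


@dataclass
class Step:
    """How a component is used in a pipeline: the component plus its parameters."""
    name: str
    component: Component
    parameters: Dict[str, Any] = field(default_factory=dict)

    def run(self, inputs: List[ET.Element]) -> List[ET.Element]:
        return self.component(inputs, self.parameters)


@dataclass
class Pipeline:
    """An ordered configuration of steps; each step consumes the previous output."""
    steps: List[Step] = field(default_factory=list)

    def run(self, inputs: List[ET.Element]) -> List[ET.Element]:
        documents = inputs
        for step in self.steps:
            documents = step.run(documents)
        return documents


def rename_root(docs, params):
    """Toy component: return copies of the documents with the root renamed."""
    renamed = []
    for doc in docs:
        copy = ET.fromstring(ET.tostring(doc))  # copy, leaving the input untouched
        copy.tag = params["name"]
        renamed.append(copy)
    return renamed


pipeline = Pipeline([Step("rename", rename_root, {"name": "renamed"})])
result = pipeline.run([ET.fromstring("<doc><p>text</p></doc>")])
```

In this sketch the pipeline's input documents, the step's named parameter, and the pipeline's output documents are kept distinct, mirroring the definitions of Input Document, Parameter, and Output Document.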

3 Design Principles

The design principles described in this document are requirements with which compliance is an overall goal for the specification. It is not necessarily the case that a specific feature meets the requirement. Instead, the whole set of specifications related to this requirements document should be viewed as meeting the overall goal specified in the design principle.

Technology Neutral: Applications should be free to implement XML processing using appropriate technologies such as SAX, DOM, or other infoset representations.
Platform Neutral: Application computing platforms should not be limited to any particular class of platforms such as clients, servers, distributed computing infrastructures, etc. In addition, the resulting specifications should not be swayed by the specifics of use on those platforms.
Small and Simple: The language should be as small and simple as practical. It should be "small" in the sense that simple processing should be able to be stated in a compact way and "simple" in the sense that the specification of more complex processing steps does not require arduous specification steps in the XML Pipeline Specification Document.
Infoset Processing: At a minimum, an XML document is represented and manipulated as an XML Information Set. The use of supersets, augmented information sets, or data models that can be represented or conceptualized as information sets should be allowed, and in some instances, encouraged (e.g. for the XPath 2.0 Data Model).
Straightforward Core Implementation: It should be relatively easy to build a conforming implementation of the language, but it should also be possible to build a sophisticated implementation that implements its own optimizations and integrates with other technologies.
Address Practical Interoperability: An XML Pipeline must be able to be exchanged between different software systems with a minimum expectation of the same result for the pipeline given that the XML Pipeline Environment is the same. A reasonable resolution to platform differences for binding or serialization of resulting infosets should be expected to be addressed by this specification or by re-use of existing specifications.
Validation of XML Pipeline Documents by a Schema: The XML Pipeline Specification Document should be able to be validated by both W3C XML Schema and RelaxNG.
Reuse and Support for Existing Specifications: XML Pipelines need to support existing XML specifications and reuse common design patterns from within them. In addition, there must be support for the use of future specifications as much as possible.
Arbitrary Components: The specification should allow the use of any component technology that can consume or produce XML Information Sets.
Control of Inputs and Outputs: An XML Pipeline must allow control over specifying both the inputs and outputs of any process within the pipeline. This applies to the inputs and outputs of both the XML Pipeline and the steps it contains. It should also allow for the case where there might be multiple inputs and outputs.
Control of Flow and Errors: An XML Pipeline must allow control over the explicit and implicit handling of the flow of documents between steps. When errors occur, these must be able to be handled explicitly to allow alternate courses of action within the XML Pipeline.

4 Requirements

4.1 Standard Names for Component Inventory [req-standard-names]

The XML Pipeline Specification Document must have standard names for components that correspond to, but are not limited to, the following specifications [xml-core-wg]:

XML Base
XInclude
XSLT 1.0/2.0
XSL FO
XML Schema
XQuery
RelaxNG

4.2 Allow Defining New Components and Steps [req-new-components-steps]

An XML Pipeline must allow applications to define and share new steps that use new or existing components. [xml-core-wg]

4.3 Minimal Component Support for Interoperability [req-minimal-components]

There must be a minimal inventory of components defined by the specification that are required to be supported to facilitate interoperability of XML Pipelines.

4.4 Allow Pipeline Composition [req-allow-composition]

Mechanisms for XML Pipeline composition for re-use or re-purposing must be provided within the XML Pipeline Specification Document.

4.5 Iteration of Documents and Elements [req-iteration]

XML Pipelines should allow iteration of a specific set of steps over a collection of documents and/or elements within a document.
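
As a non-normative illustration, the two kinds of iteration named in this requirement might be sketched in Python as follows. The helper names for_each_document and for_each_element are assumptions made for this example, and the element selection uses the limited path syntax of Python's ElementTree rather than full XPath.

```python
import xml.etree.ElementTree as ET


def for_each_document(docs, steps):
    """Run the same sequence of steps over every document in a collection."""
    results = []
    for doc in docs:
        for step in steps:
            doc = step(doc)
        results.append(doc)
    return results


def for_each_element(doc, path, step):
    """Apply a step in place to each element of doc matched by an ElementTree path."""
    for element in doc.findall(path):
        step(element)
    return doc


def upper_title(element):
    """Toy step: upper-case the text of an element."""
    element.text = (element.text or "").upper()


docs = [
    ET.fromstring("<book><title>a</title></book>"),
    ET.fromstring("<book><title>b</title></book>"),
]
results = for_each_document(
    docs, [lambda d: for_each_element(d, ".//title", upper_title)]
)
titles = [d.findtext("title") for d in results]
```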

4.6 Conditional Processing of Inputs [req-conditional-processing]

To allow run-time selection of steps, XML Pipelines should provide mechanisms for conditional processing of documents or elements within documents based on expression evaluation. [xml-core-wg]
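
A minimal, non-normative sketch of such run-time selection is given below in Python. The function choose and the routing steps are hypothetical; the test expressions are ElementTree path expressions, which are only a small subset of XPath, and a branch is taken when its expression selects at least one element.

```python
import xml.etree.ElementTree as ET


def choose(doc, branches, otherwise):
    """Run the step of the first branch whose test expression matches doc.

    branches is a list of (path_expression, step) pairs; otherwise is the
    step used when no expression selects anything.
    """
    for test, step in branches:
        if doc.find(test) is not None:
            return step(doc)
    return otherwise(doc)


def set_route(value):
    """Build a toy step that records the chosen route on the root element."""
    def step(doc):
        doc.set("route", value)
        return doc
    return step


branches = [("item[@urgent='true']", set_route("express"))]
urgent = choose(
    ET.fromstring("<order><item urgent='true'/></order>"),
    branches,
    set_route("standard"),
)
normal = choose(
    ET.fromstring("<order><item/></order>"),
    branches,
    set_route("standard"),
)
```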

4.7 Error Handling and Fall-back [req-error-handling-fallback]

XML Pipelines must provide mechanisms for addressing error handling and fall-back behaviors. [xml-core-wg]
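
One possible shape for such a mechanism is sketched below in Python, purely as an illustration. The names StepError and with_fallback, and the two toy processors, are assumptions of this example. Alternatives are tried in order; if every alternative fails, the error is surfaced rather than suppressed.

```python
import xml.etree.ElementTree as ET


class StepError(Exception):
    """Raised by a step that cannot produce a result."""


def with_fallback(*alternatives):
    """Build a step that tries each alternative in order until one succeeds."""
    def step(doc):
        failures = []
        for alternative in alternatives:
            try:
                return alternative(doc)
            except StepError as exc:
                failures.append(str(exc))
        # No alternative succeeded: report every failure to the caller.
        raise StepError("no alternative succeeded: " + "; ".join(failures))
    return step


def unavailable_processor(doc):
    """Toy step standing in for a processor that is not available."""
    raise StepError("preferred processor is not installed")


def fallback_processor(doc):
    """Toy step standing in for a second-choice processor."""
    doc.set("processed-by", "fallback")
    return doc


robust = with_fallback(unavailable_processor, fallback_processor)
handled = robust(ET.fromstring("<doc/>"))

strict = with_fallback(unavailable_processor)
try:
    strict(ET.fromstring("<doc/>"))
    failed = False
except StepError:
    failed = True
```

The first call corresponds to a pipeline that falls back to another processor; the second corresponds to a pipeline with no fall-back, where the failure becomes an error of the pipeline as a whole.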

4.8 Support for the XPath 2.0 Data Model [req-xdm]

XML Pipelines must support the XPath 2.0 Data Model to allow support for XPath 2.0, XSLT 2.0, and XQuery as steps.

Note:

At this point, there is no consensus in the working group that minimal conforming implementations are required to support the XPath 2.0 Data Model.

4.9 Allow Optimization [req-allow-optimization]

An XML Pipeline should not inhibit a sophisticated implementation from performing parallel operations, lazy or greedy processing, and other optimizations. [xml-core-wg]

4.10 Streaming XML Pipelines [req-streaming-pipes]

An XML Pipeline should allow for the existence of streaming pipelines in certain instances as an optional optimization. [xml-core-wg]
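
To illustrate what streaming can mean in practice, the following non-normative Python sketch processes a document record by record using the standard library's incremental parser. The helper stream_records and the sample vocabulary (log, entry, amount) are assumptions of this example.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, tag):
    """Yield each completed element named tag as soon as its end tag is parsed."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            yield element
            # Once the consumer has finished with the record, drop its
            # children, text and attributes. The emptied element remains
            # attached to its parent, so memory use is reduced, not bounded.
            element.clear()


data = (
    b"<log>"
    b"<entry><amount>2</amount></entry>"
    b"<entry><amount>40</amount></entry>"
    b"</log>"
)

# Each record is consumed before the parser moves on to the next one.
total = sum(
    int(entry.findtext("amount"))
    for entry in stream_records(io.BytesIO(data), "entry")
)
```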

5 Use cases

This section contains a set of use cases that support our requirements and will inform our design. While we want to address all the use cases listed in this document, in the end, the first version of those specifications may not solve all the following use cases. Those unsolved use cases may be addressed in future versions of those specifications.

To aid navigation, the requirements can be mapped to the use cases of this section as follows:

Requirement	Use Cases
4.9 Allow Optimization	5.29 Large-Document Subtree Iteration, 5.30 Adding Navigation to an Arbitrarily Large Document
4.10 Streaming XML Pipelines	5.29 Large-Document Subtree Iteration, 5.30 Adding Navigation to an Arbitrarily Large Document
4.2 Allow Defining New Components and Steps	5.9 Run a Custom Program, 5.27 Integrate Computation Components (MathML)
4.7 Error Handling and Fall-back	5.32 No Fallback for XQuery Causes Error, 5.31 Fallback to Choice of XSLT Processor
4.6 Conditional Processing of Inputs	5.21 Content-Dependent Transformations, 5.30 Adding Navigation to an Arbitrarily Large Document
4.1 Standard Names for Component Inventory	5.3 Parse/Validate/Transform, 5.2 XInclude Processing
4.3 Minimal Component Support for Interoperability	5.3 Parse/Validate/Transform, 5.2 XInclude Processing
4.4 Allow Pipeline Composition	5.23 Response to XML-RPC Request, 5.24 Database Import/Ingestion, 5.15 Parse and/or Serialize RSS descriptions
4.5 Iteration of Documents and Elements	5.15 Parse and/or Serialize RSS descriptions, 5.6 Multiple-file Command-line Document Generation, 5.11 Make Absolute URLs, 5.24 Database Import/Ingestion
4.8 Support for the XPath 2.0 Data Model	5.16 XQuery and XSLT 2.0 Collections

Note:

The above table is known to be incomplete and will be completed in a later draft.

5.6 Multiple-file Command-line Document Generation [use-case-multiple-command-line]

Read a list of source documents.
For each document in the list:
Alternatively, aggregate the resulting documents and serialize a single result.

(source: Erik Bruchez)

Process an XML document through XInclude.
Transform the result with XSLT using a fixed transformation.
Digitally sign the result with XML Signatures.

(source: Henry Thompson)

5.11 Make Absolute URLs [use-case-make-absolute-urls]

Process an XML document through XInclude.
Remove any xml:base attributes anywhere in the resulting document.
Schema validate the document with a fixed schema.
For all elements or attributes whose type is xs:anyURI, resolve the value against the base URI to create an absolute URI. Replace the value in the document with the resulting absolute URI.

This example assumes preservation of infoset ([base URI]) and PSVI ([type definition]) properties from step to step. Also, there is no way to reorder these steps as the schema doesn't accept xml:base attributes but the expansion requires xs:anyURI typed values.

(source: Henry Thompson)
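
The ordering constraint in this use case can be illustrated with a non-normative Python sketch. It assumes that XInclude processing has already happened, and it stands in for schema validation with a hand-written set of attribute names (uri_attributes) that plays the role of the PSVI type information; the function names and the example.org URIs are likewise assumptions. The base URI of each element is recorded before the xml:base attributes are removed, so that it is still available when the values are resolved.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def record_base_uris(root, document_uri):
    """Map each element to its base URI, honoring the scoping of xml:base."""
    bases = {}

    def walk(element, inherited):
        # An absent xml:base leaves the inherited base URI unchanged.
        base = urljoin(inherited, element.get(XML_BASE, ""))
        bases[element] = base
        for child in element:
            walk(child, base)

    walk(root, document_uri)
    return bases


def strip_xml_base(root):
    """Remove every xml:base attribute from the document."""
    for element in root.iter():
        element.attrib.pop(XML_BASE, None)


def absolutize(root, bases, uri_attributes):
    """Resolve the named attributes against each element's recorded base URI."""
    for element in root.iter():
        for name in uri_attributes:
            if name in element.attrib:
                element.set(name, urljoin(bases[element], element.get(name)))


root = ET.fromstring(
    '<doc>'
    '<section xml:base="http://example.org/a/"><link href="b.xml"/></section>'
    '<link href="c.xml"/>'
    '</doc>'
)
bases = record_base_uris(root, "http://example.org/root/doc.xml")
strip_xml_base(root)
absolutize(root, bases, uri_attributes={"href"})
hrefs = [link.get("href") for link in root.iter("link")]
```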

5.12 A Simple Transformation Service [use-case-simple-transform-service]

Extract XML document (XForms instance) from an HTTP request body.
Execute XSLT transformation on that document.
Call a persistence service with resulting document.
Return the XML document from persistence service (new XForms instance) as the HTTP response body.

(source: Erik Bruchez)

5.13 Service Request/Response Handling on a Handheld [use-case-handheld-service]

Allow an application on a handheld device to construct a pipeline, send the pipeline and some data to the server, allow the server to process the pipeline and send the result back.

(source: [xml-core-wg])

5.14 Interact with Web Service (Tide Information) [use-case-web-service]

Parse the incoming XML request.
Construct a URL to a REST-style web service at the NOAA (see website).
Parse the resulting invalid HTML document by translating and fixing the HTML to make it XHTML (e.g. use TagSoup or tidy).
Extract the tide information from a plain-text table of data from the document by applying a regular expression and creating markup from the matches.
Use XQuery to select the high and low tides.
Formulate an XML response from that tide information.

(source: Alex Milowski)
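
The extraction and selection steps of this use case can be sketched, non-normatively, in Python. The row layout matched by the regular expression ("HH:MM", whitespace, a height, "ft") is a hypothetical format invented for this example and is not the format of the actual service; the element and attribute names are also assumptions. The selection of high and low tides, done with XQuery in the use case, is done here with ordinary Python.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical row layout; the real service's plain-text table may differ.
ROW = re.compile(
    r"^(?P<time>\d{2}:\d{2})\s+(?P<height>-?\d+(?:\.\d+)?)\s+ft$",
    re.MULTILINE,
)


def table_to_markup(text):
    """Create one <tide> element per table row matched by the expression."""
    tides = ET.Element("tides")
    for match in ROW.finditer(text):
        ET.SubElement(tides, "tide", time=match["time"], height=match["height"])
    return tides


def high_and_low(tides):
    """Return the readings with the greatest and the least height."""
    readings = tides.findall("tide")

    def height(reading):
        return float(reading.get("height"))

    return max(readings, key=height), min(readings, key=height)


sample = "03:12  5.8 ft\n09:40  -0.3 ft\n15:55  4.9 ft\n"
tides = table_to_markup(sample)
high, low = high_and_low(tides)
```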