W3C

XProc: An XML Pipeline Language

W3C Editor's Draft 28 February 2007

This Version:
http://www.w3.org/XML/XProc/docs/ED-xproc-20070228/
Latest Version:
http://www.w3.org/XML/XProc/docs/langspec.html
Previous versions:
http://www.w3.org/TR/2006/WD-xproc-20061117/ http://www.w3.org/TR/2006/WD-xproc-20060928/
Editors:
Norman Walsh, Sun Microsystems, Inc.
Alex Milowski, Invited expert

This document is also available in these non-normative formats: XML


Abstract

This specification describes the syntax and semantics of XProc: An XML Pipeline Language, a language for describing operations to be performed on XML documents.

An XML Pipeline specifies a sequence of operations to be performed on one or more XML documents, producing one or more XML documents as output.

Status of this Document

This document is an editor's draft that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was produced by the XML Processing Model Working Group which is part of the XML Activity. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This is a public Working Draft. While some details of the design remain incomplete, the Working Group has chosen to publish a new draft in order to show the direction we are heading and to encourage feedback from potential users.

Please send comments about this document to public-xml-processing-model-comments@w3.org (public archives are available).

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


Table of Contents

1 Introduction
2 Pipeline Concepts
2.1 Steps
2.2 Inputs and Outputs
2.3 Parameters
2.4 Connections
2.5 Environment
3 Steps
3.1 Pipeline
3.2 For-Each
3.3 Viewport
3.4 Choose
3.5 Group
3.6 Try/Catch
3.7 Other Steps
4 Syntax Overview
4.1 Scoping of Names
4.2 Global Attributes
4.3 Associating Documents with Ports
4.4 Ignored namespaces
4.5 Extension attributes
4.6 Extension elements
5 Step Vocabulary
5.1 p:pipeline Element
5.1.1 Examples
5.2 p:for-each Element
5.2.1 Examples
5.3 p:viewport Element
5.3.1 Examples
5.4 p:choose/p:when/p:otherwise Elements
5.4.1 Examples
5.5 p:group Element
5.5.1 Examples
5.6 p:try/p:catch Elements
5.6.1 Examples
5.7 Other Steps
6 Other pipeline elements
6.1 p:input Element
6.2 p:iteration-source Element
6.3 p:viewport-source Element
6.4 p:xpath-context Element
6.5 p:output Element
6.6 p:parameter Element
6.6.1 Declaring Parameters
6.6.2 Using Parameters
6.6.3 Assigning Values to Parameters
6.7 p:import-parameter Element
6.8 p:declare-step Element
6.9 p:pipeline-library Element
6.10 p:import Element
6.11 p:pipe Element
6.12 p:inline Element
6.13 p:document Element
7 Errors
7.1 Static Errors
7.2 Dynamic Errors

Appendices

Standard Component Library
Required Components
1.1 Identity
1.2 XSLT
1.3 XInclude
1.4 Serialize
1.5 Parse
1.6 Load
1.7 Store
Optional Components
Micro-Operations Components
3.1 Rename
3.2 Wrap
3.3 Insert
3.4 Set-attributes
Component Declarations
References
Glossary
Schemas
The Error Vocabulary
Examples
Pipeline for Figure 1
Pipeline for Figure 2

1 Introduction

An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output.

A pipeline consists of steps. Like pipelines, steps take zero or more XML documents as their input and produce zero or more XML documents as their output. The inputs to a step come from the web, from the pipeline document, from the inputs to the pipeline itself, or from the outputs of other steps in the pipeline. The outputs from a step are consumed by other steps, are outputs of the pipeline as a whole, or are discarded.

There are two kinds of steps: atomic steps and compound steps. Atomic steps carry out single operations and have no substructure as far as the pipeline is concerned, whereas compound steps include steps within themselves.

This specification defines a standard library of steps in Appendix A, Standard Component Library. Pipeline implementations may support additional types of steps as well.

Figure 1, “A simple, linear XInclude/Validate pipeline” is a graphical representation of a simple pipeline that performs XInclude processing and validation on a document.

Figure 1. A simple, linear XInclude/Validate pipeline

This is a pipeline that consists of two atomic steps, XInclude and Validate. The pipeline itself has two inputs, “Document” and “Schema”. How these inputs are connected to XML documents outside the pipeline is implementation-defined. The XInclude step reads the pipeline input “Document” and produces a result document. The Validate step reads the pipeline input “Schema” and the output from the XInclude step and produces a result document. The result of the validation, “Result Document”, is the result of the pipeline. How pipeline outputs are connected to XML documents outside the pipeline is implementation-defined.

The pipeline document for this pipeline is shown in the Examples appendix, “Pipeline for Figure 1”.

Figure 2, “A validate and transform pipeline” is a more complex example: it performs schema validation with an appropriate schema and then styles the validated document.

Figure 2. A validate and transform pipeline

The heart of this example is the conditional. The “choose” step evaluates an XPath expression over a test document. Based on the result of that expression, one or another branch is evaluated. In this example, each branch consists of a single validate step.

The pipeline document for this pipeline is shown in the Examples appendix, “Pipeline for Figure 2”.

2 Pipeline Concepts

[Definition: A pipeline is a set of steps connected together, with outputs flowing into inputs, without any loops (no step can read its own output, directly or indirectly).] A pipeline is itself a step and must satisfy the constraints on steps.

The result of evaluating a pipeline is the result of evaluating the steps that it contains, in the order determined by the connections between them. A pipeline must behave as if it evaluated each step each time it occurs. Unless otherwise indicated, implementations must not assume that steps are functional (that is, that their outputs depend only on their explicit inputs and parameters) or side-effect free.

2.1 Steps

[Definition: A step is the basic computational unit of a pipeline. Steps are either atomic or compound.] [Definition: An atomic step is a step that performs a unit of XML processing, such as XInclude or transformation.] Atomic steps can perform arbitrary amounts of computation but they are indivisible. Atomic steps carry out fundamental XML operations. An XSLT step, for example, performs XSLT processing; an XML Schema Validation step validates one input with respect to some set of XML Schemas, etc.

Compound steps, on the other hand, control and organize the flow of documents through a pipeline, reconstructing familiar programming language functionality such as conditionals, iterators and exception handling. They contain other steps, whose evaluation they control.

[Definition: A compound step is a step that contains additional steps. That is, a compound step differs from an atomic step in that its semantics are at least partially determined by the steps that it contains.]

Every compound step contains zero or more steps. [Definition: The steps that occur directly inside a compound step are called contained steps.] [Definition: A compound step which immediately contains another step is called its container.]

[Definition: The steps (and the connections between them) within a container form a subpipeline.] A compound step can contain one or more subpipelines and it determines how, and which, if any, of its subpipelines are evaluated.

Steps have “ports” to which inputs and outputs are connected. Each step has a number of input ports and a number of output ports, all with unique names. A step can have zero input ports and/or zero output ports. (All steps also have an implicit standard output port for reporting errors; this port must not be declared.)

Steps have any number of parameters, all with unique names. A step can have zero parameters.

2.2 Inputs and Outputs

Although some steps can read and write non-XML resources, what flows between steps through input ports and output ports are exclusively XML documents or sequences of XML documents. Each XML document (or document in a sequence) must conceptually be an [Infoset] with a Document Information Item at its root. The inputs and outputs can be implemented as sequences of characters, events, or object models, or any other representation the implementation chooses.

It is a dynamic error if a non-XML resource is produced on a step output or arrives on a step input.

Editorial Note

What about the cases where it's impractical to test for this error?

An implementation may make it possible for a step to produce non-XML output (through channels other than a named output port)—for example, writing a PDF document—but that output cannot flow through the pipeline. Similarly, one can imagine a step that takes no pipeline inputs, reads a non-XML file from a URI, and produces an XML output. But the non-XML file cannot arrive on an input port to a step or pipeline.

Each step declares its input and output ports. [Definition: The input ports declared on a step are its declared inputs.] [Definition: The output ports declared on a step are its declared outputs.]

All of the declared inputs of a step must be connected. Inputs may be connected to:

  • The output port of some other step.

  • A fixed, inline document or sequence of documents.

  • A document read from a URI.

  • The inputs declared on the top-level pipeline step, and only on that step, may be connected to documents outside the pipeline by an implementation-dependent mechanism.

If the connection for an input port is not specified then it is connected to the default input port in its environment. It is a static error if a step has an input port which is not connected and the environment does not have a defined default input port.
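
As a minimal sketch of this defaulting rule, the following pipeline binds the first step's input explicitly and leaves the second step's input unspecified. It assumes that the default input port of a step is the single output port of its preceding sibling, and that the XInclude step has a single output port named “result”; the step names “main”, “expand”, and “copy” are illustrative only.

<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Explicit binding: read the pipeline's own "source" input. -->
  <p:xinclude name="expand">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:xinclude>

  <!-- No binding for "source": it is connected to the default input
       port, assumed here to be the "result" output of "expand". -->
  <p:identity name="copy"/>
</p:pipeline>

If the first step had no explicit binding and no default input port were defined in its environment, the pipeline would be in error statically, before any document is processed.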

The declared outputs of a step can be connected to:

  • The output port of some other contained step.

  • A fixed, inline document or sequence of documents.

  • A document read from a URI.

Unconnected output ports are allowed; any documents produced on those ports are simply discarded.

[Definition: The signature of a step is the set of inputs, outputs, and parameters that it is declared to accept.] Each atomic step (e.g. XSLT or XInclude) has a fixed signature, declared globally or built-in, which all its instances share, whereas each compound step has its own signature declared locally.

[Definition: A step matches its signature if and only if it specifies an input for each declared input and it specifies no inputs that are not declared; it specifies a parameter for each parameter that is declared to be required; and it specifies no parameters that are not declared.] In other words, every input and required parameter must be specified and only inputs, outputs, and parameters that are declared may be specified. Outputs and optional parameters do not have to be specified.

Each input and output is declared to accept or produce either a single document or a sequence of documents. It is not a static error to connect a port that is declared to produce a sequence of documents to a port that is declared to accept only a single document. It is, however, a dynamic error if the former step actually produces more than a single document at run time.

Steps may also produce error, warning, and informative messages. These messages appear on a special “error output” port defined (only) in the catch clause of a try/catch.

2.3 Parameters

[Definition: A parameter is a name/value pair.] The name of a parameter must be an expanded name. The value of a parameter must be a string. If a document, node, or other value is given, its [XPath 1.0] string value is computed and that string is used.

[Definition: The parameters declared on a step are its declared parameters.]

2.4 Connections

Steps are connected together by their input ports and output ports. It is a static error if there are any loops in the connections between steps: no step can be connected to itself nor can there be any sequence of connections through other steps that leads back to itself.
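
For illustration, the following fragment is a sketch of a pipeline that violates this constraint: each of the two Identity steps reads the output of the other, so a processor would be expected to reject it as a static error. The step names “first” and “second” are hypothetical.

<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc">
  <!-- Invalid: "first" reads from "second"... -->
  <p:identity name="first">
    <p:input port="source">
      <p:pipe step="second" port="result"/>
    </p:input>
  </p:identity>

  <!-- ...and "second" reads from "first", which closes the loop. -->
  <p:identity name="second">
    <p:input port="source">
      <p:pipe step="first" port="result"/>
    </p:input>
  </p:identity>
</p:pipeline>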

2.5 Environment

[Definition: The environment of a step is the static information available to that step.]

The environment consists of:

  1. A set of readable ports. [Definition: The readable ports are the step name/output port name pairs that are visible to the step.] Inputs and outputs can only be connected to readable ports.

  2. A set of in-scope parameters. [Definition: The in-scope parameters are the set of parameters that can be read by a step.]

  3. A default input port. [Definition: The default input port, which may be undefined, is a specific step name/port name pair from the set of readable ports.]

  4. A default output port. [Definition: The default output port, which may be undefined, is a specific step name/port name pair from the set of readable ports.]

  5. A set of ignored namespaces. [Definition: The set of ignored namespaces is the set of namespaces which do not identify steps.]

[Definition: The empty environment contains no readable ports, no in-scope parameters, an undefined default input port, and a set of ignored namespaces that consists only of the XHTML namespace, http://www.w3.org/1999/xhtml.]

3 Steps

This section describes the core steps of XProc.

Every compound step in a pipeline has five parts: a set of inputs, a set of outputs, a set of parameters, a set of contained steps, and an environment.

Except where otherwise noted, a compound step can have an arbitrary number of inputs, outputs, parameters, and contained steps.

3.1 Pipeline

A pipeline encapsulates the behavior of a subpipeline.

Viewed from the outside, a pipeline is a black box which performs some calculation on its inputs and produces its outputs. From the pipeline author's perspective, the computation performed by the pipeline is described in terms of contained steps which read the pipeline's inputs and produce the pipeline's outputs.

For example, a pipeline might accept a document and a stylesheet as input; perform XInclude, validation, and transformation; and produce a sequence of formatted documents as its output.

There is one additional constraint imposed on pipelines: a pipeline must not itself be a contained step.

3.2 For-Each

A for-each step processes a sequence of documents, applying its subpipeline to each document in turn.

A for-each has a single input port through which it receives the document or sequence of documents over which iteration is performed.

When a pipeline needs to process a sequence of documents using a step that only accepts a single document, the for-each construct can be used as a wrapper around the step that accepts only a single document. The for-each will apply that step to each document in the sequence in turn.

The result of the for-each is a sequence of documents produced by processing each individual document in the input sequence. If the subpipeline is connected to one or more output ports on the for-each, what appears on each of those ports is the sequence of documents produced by each iteration of the loop.

For example, a for-each might accept a sequence of DocBook chapters as its input, process each chapter in turn with XSLT, a step that accepts only a single input document, and produce a sequence of formatted chapters as its output.

3.3 Viewport

A viewport step processes a single document, applying its subpipeline to one or more subsections of the document.

A viewport has a single input port over which it receives a single document.

The result of the viewport is a copy of the original document with the selected subsections replaced by the results of applying the subpipeline to them.

For example, a viewport might accept an XHTML document as its input, apply encryption to selected div elements within that document, and return an XHTML document that is the same as the original except that each selected div has been replaced by its encrypted result.

3.4 Choose

A choose step selects exactly one of a list of alternative subpipelines based on the evaluation of [XPath 1.0] expressions.

A choose has no inputs. It contains an arbitrary number of alternative subpipelines, exactly one of which will be evaluated.

The list of alternative subpipelines consists of zero or more subpipelines, each guarded by an XPath expression (with an associated context), followed optionally by a single default subpipeline.

The choose considers each subpipeline in turn and selects the first (and only the first) subpipeline for which the guard expression evaluates to true in its context. If there are no subpipelines for which the expression evaluates to true, the default subpipeline, if it was specified, is selected.

After a subpipeline is selected, it is evaluated as if only it had been present.

The result of the choose is the result of the selected subpipeline.

For example, a choose might test a schema and apply XML Schema validation to an input document if the schema is an XML Schema document, apply RELAX NG validation if the schema is a RELAX NG grammar, or perform no validation otherwise.

In order to ensure that the result of the choose is consistent irrespective of the subpipeline chosen, each subpipeline must declare the same number of outputs with the same names. It is a static error if two subpipelines in a choose declare different outputs.

It is a dynamic error if no subpipeline is selected by the choose and no default is provided.
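
The following fragment is a rough sketch of how such a choose might be written. The element names come from this specification, but the test attribute on p:when, the placement of p:xpath-context and p:output inside each branch, the “result” output port of the XSLT step, and the step names “expand”, “pick”, “upgrade”, and “keep” are all assumptions made for illustration. Both branches declare a single output named “result”, and the p:otherwise branch guarantees that some subpipeline is always selected.

<p:choose name="pick" xmlns:p="http://www.w3.org/2007/03/xproc">
  <!-- Guarded branch: selected if the guard is true in its context. -->
  <p:when test="/doc/@version = '1'">
    <p:xpath-context>
      <p:pipe step="expand" port="result"/>
    </p:xpath-context>
    <p:output port="result">
      <p:pipe step="upgrade" port="result"/>
    </p:output>
    <p:xslt name="upgrade">
      <p:input port="source">
        <p:pipe step="expand" port="result"/>
      </p:input>
      <p:input port="stylesheet">
        <p:document href="http://example.com/upgrade.xsl"/>
      </p:input>
    </p:xslt>
  </p:when>

  <!-- Default branch: declares the same output as the guarded branch. -->
  <p:otherwise>
    <p:output port="result">
      <p:pipe step="keep" port="result"/>
    </p:output>
    <p:identity name="keep">
      <p:input port="source">
        <p:pipe step="expand" port="result"/>
      </p:input>
    </p:identity>
  </p:otherwise>
</p:choose>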

3.5 Group

A group step encapsulates the behavior of its subpipeline.

A group has no inputs.

A group is a convenience wrapper for a collection of steps. The result of a group is the result of its subpipeline.

3.6 Try/Catch

A try step isolates a subpipeline, preventing any errors that arise within it from being exposed to the rest of the pipeline.

A try has no inputs. It contains exactly two subpipelines: an initial subpipeline and a recovery (or “catch”) subpipeline.

The try step evaluates the initial subpipeline and, if no errors occur, the results of that pipeline are the results of the step. However, if any errors occur, it abandons the first subpipeline, discarding any output that it might have generated, and evaluates the recovery subpipeline.

Editorial Note

In the context of try/catch, “errors” refers to step failure which is not the same as a static or dynamic error in the pipeline itself. (Though perhaps it will be possible to recover from some dynamic errors.) The notion of step failure as a distinct class of error needs to be described.

If the recovery subpipeline is evaluated, the results of the recovery subpipeline are the results of the try step. If the recovery subpipeline is evaluated and a step within that subpipeline fails, the try fails.

For example, a pipeline might attempt to process a document by dispatching it to some web service. If the web service succeeds, then those results are passed to the rest of the pipeline. However, if the web service cannot be contacted or reports an error, the catch step can provide some sort of default for the rest of the pipeline.

In order to ensure that the result of the try is consistent irrespective of whether the initial subpipeline provides its output or the recovery subpipeline does, both subpipelines must declare the same number of outputs with the same names. It is a static error if the two subpipelines declare different outputs.

3.7 Other Steps

A pipeline document may contain additional types of steps. These can be implementation-defined or can be defined through some implementation-dependent extension mechanism. It is a static error if a pipeline contains a step that is not recognized by the processor.

The number of inputs, outputs, parameters, and contained steps that are possible on such a step are implementation-defined.

4 Syntax Overview

This section describes the normative XML syntax of XProc. This syntax is sufficient to represent all the aspects of a pipeline, as set out in the preceding sections.

The namespace of the XProc XML vocabulary described by this specification is http://www.w3.org/2007/03/xproc.

Elements in a pipeline document represent the pipeline, the steps it contains, the connections between those steps, the steps and connections contained within them, and so on. Each step is represented by an element; a combination of elements and attributes specify how the inputs and outputs of each step are connected and how parameters are passed.

Conceptually, we can speak of steps as objects that have inputs and outputs, that are connected together and which may contain additional steps. Syntactically, we need a mechanism for specifying these relationships.

Containment is represented naturally using nesting of XML elements. If a particular element identifies a compound step then the step elements that are its immediate children form its subpipeline.

The connections between steps are expressed using names and references to those names.

Five kinds of things are named in XProc:

  1. Step types,

  2. Steps,

  3. Input ports,

  4. Output ports, and

  5. Parameters

4.1 Scoping of Names

The scope of the names of step types is the pipeline. Each pipeline processor has some number of built in step types and may declare (directly, or by reference to an external library) additional step types.

The scope of the names of the steps themselves is determined by the environment of each step. In general, the name of a step, the names of its sibling steps, the names of any steps that it contains directly, the names of its ancestors, and the names of its ancestors' siblings are all in the same scope. All in-scope steps must have unique names: it is a static error if two steps with the same name appear in the same scope.

The scope of an input or output port name is the step on which it is defined. The names of all the ports on any step must be unique.

Taken together, these uniqueness constraints guarantee that the combination of a step name and a port name uniquely identifies exactly one port on exactly one in-scope step.

The scope of parameter names is essentially the same as the scope of step names, with the following caveat. Whereas step names must be unique, parameter names may be repeated. The declaration of a parameter on a step shadows any declaration that may already be in-scope.
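
The scoping rules can be sketched as follows. This fragment assumes that p:group accepts a name attribute and p:parameter children in the same way as the other compound steps, and that the inputs of the pipeline remain readable inside the group; the names “main”, “inner”, “copy”, and “mode” are hypothetical.

<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source"/>
  <p:parameter name="mode"/>

  <p:group name="inner">
    <!-- Repeating a parameter name is allowed: this declaration
         shadows the "mode" declared on "main" for steps in "inner". -->
    <p:parameter name="mode"/>

    <!-- Step names must be unique in scope, so the pair
         ("main", "source") identifies exactly one port. -->
    <p:identity name="copy">
      <p:input port="source">
        <p:pipe step="main" port="source"/>
      </p:input>
    </p:identity>
  </p:group>
</p:pipeline>

Naming the Identity step “inner” or “main” instead of “copy” would put two steps with the same name in one scope, which is a static error.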

4.2 Global Attributes

The following attributes may appear on any element in a pipeline:

  • The attribute xml:id with the semantics outlined in [xml:id].

  • The attribute xml:base with the semantics outlined in [XML Base].

The following attributes may appear on any step element:

4.3 Associating Documents with Ports

[Definition: A binding associates an input or output port with some data source.] A document or a sequence of documents can be bound to a port in three ways: by source, by URI, or by providing it inline. Each of these mechanisms is supported on the p:input, p:output, p:xpath-context, p:iteration-source, and p:viewport-source elements.

Specified by URI

[Definition: A document is specified by URI if it is referenced with a URI.] The href attribute on the p:document element is used to refer to documents by URI.

In this example, the input to the Identity step named “otherstep” comes from “http://example.com/input.xml”.

<p:identity name="otherstep">
  <p:input port="source">
    <p:document href="http://example.com/input.xml"/>
  </p:input>
</p:identity>

It is a dynamic error if the processor attempts to retrieve the specified URI and fails. (For example, if the resource does not exist or is not accessible with the user's authentication credentials.)

Specified by source

[Definition: A document is specified by source if it references a specific port on another step.] The step and port attributes on the p:pipe element are used for this purpose. (The step attribute may refer to any kind of step, either an atomic step or a compound step.)

In this example, the “source” input to the XInclude step named “expand” comes from the “result” port of the step named “otherstep”.

<p:xinclude name="expand">
  <p:input port="source">
    <p:pipe step="otherstep" port="result"/>
  </p:input>
</p:xinclude>

When a pipe is used to bind an input, the specified port must be in the outputs of the current environment. When a pipe is used to bind an output, the specified port must be in the inputs of the environment. It is a static error if the specified port is not available in the necessary environment.

Specified inline

[Definition: An inline document is specified directly in the body of the element that binds it.] The content of the p:inline element is used for this purpose.

In this example, the “stylesheet” input to the XSLT step named “xform” comes from the content of the p:inline element within its p:input element.

<p:xslt name="xform">
  <p:input port="source">
    <p:pipe step="expand" port="result"/>
  </p:input>
  <p:input port="stylesheet">
    <p:inline>
      <xsl:stylesheet version="1.0">
        ...
      </xsl:stylesheet>
    </p:inline>
  </p:input>
</p:xslt>

Inline documents are considered “quoted”: they are not interpolated or available to the pipeline processor in any way except as documents flowing through the pipeline.

Note that a p:input or p:output element may contain more than one p:pipe, p:document, or p:inline element. If more than one binding is provided, then the specified sequence of documents is made available on that port.
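
As a sketch of multiple bindings, the following step combines all three mechanisms on one port. It assumes that the “source” port of the Identity step accepts a sequence and that the documents arrive in the order in which the bindings are written; the URI, the step names, and the inline appendix element are illustrative only.

<p:identity name="combined" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source">
    <!-- Three bindings on one port: the port receives a sequence
         of three documents rather than a single document. -->
    <p:document href="http://example.com/chapter1.xml"/>
    <p:pipe step="expand" port="result"/>
    <p:inline>
      <appendix>...</appendix>
    </p:inline>
  </p:input>
</p:identity>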

4.4 Ignored namespaces

The element children of a compound step fall into four classes: elements that provide bindings for input and output ports, p:parameter elements that specify parameters, other elements that identify steps that are part of its subpipeline, and extension elements. Extension elements may be used for documentation or to provide additional information for a specific processor.

To determine which elements are extension elements and which are expected to identify steps, a set of ignored namespaces is maintained in the environment of each step.

The ignored namespaces are a set of namespaces which do not identify steps. They are ignored by the processor unless the processor happens to recognize one or more of them as extension elements. The initial set of ignored namespaces contains only a single namespace, the XHTML namespace, http://www.w3.org/1999/xhtml.

Syntactically, a pipeline author can add namespaces to the set of ignored namespaces with the p:ignore-prefixes attribute. This attribute can appear on any element in the pipeline namespace which identifies a step. It is a static error if the attribute appears on any element which does not identify a step.

The value of the p:ignore-prefixes attribute is a sequence of tokens, each of which must be the prefix of an in-scope namespace. It is a static error if any token specified in the p:ignore-prefixes attribute is not the prefix of an in-scope namespace.

Each of the namespaces identified by the specified prefixes is added to the set of ignored namespaces in the environment of the step on which the attribute occurs.
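
For example, a pipeline author might ignore a documentation namespace as sketched below. The “doc” prefix, its namespace URI, and the doc:note element are hypothetical and are not defined by this specification.

<p:pipeline name="main"
            xmlns:p="http://www.w3.org/2007/03/xproc"
            xmlns:doc="http://example.com/ns/doc"
            p:ignore-prefixes="doc">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- The "doc" namespace is in the set of ignored namespaces, so
       this element is an extension element and not a step. -->
  <doc:note>Copies the input unchanged.</doc:note>

  <p:identity name="copy">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:identity>
</p:pipeline>

Without the p:ignore-prefixes attribute, doc:note would be expected to identify a step, and a processor that did not recognize it would report a static error.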

4.5 Extension attributes

[Definition: An element from the XProc namespace may have any attribute not from the XProc namespace, provided that the expanded-QName of the attribute has a non-null namespace URI. These attributes are called extension attributes.] The presence of an extension attribute must not cause the connections between steps to differ from the connections that any other conformant XProc processor would produce. It must not cause the processor to fail to signal an error that a conformant processor is required to signal. This means that an extension attribute must not change the effect of any XProc element except to the extent that the effect is implementation-defined or implementation-dependent.

A processor which encounters an extension attribute that it does not recognize must behave as if the attribute were not present.
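
A small sketch of an extension attribute follows. The “ext” prefix, its namespace URI, and the ext:trace attribute are hypothetical; a processor that does not recognize the attribute evaluates the step exactly as if it were absent.

<p:identity name="copy"
            xmlns:p="http://www.w3.org/2007/03/xproc"
            xmlns:ext="http://example.com/ns/ext"
            ext:trace="true">
  <!-- ext:trace has a non-null namespace URI and is not in the XProc
       namespace, so it qualifies as an extension attribute. -->
  <p:input port="source">
    <p:document href="http://example.com/input.xml"/>
  </p:input>
</p:identity>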

4.6 Extension elements

The presence of an extension element must not cause the connections between steps to differ from the connections that any other conformant XProc processor would produce. It must not cause the processor to fail to signal an error that a conformant processor is required to signal. This means that an extension element must not change the effect of any XProc element except to the extent that the effect is implementation-defined or implementation-dependent.

There are three contexts in which an extension element might occur:

  1. In an inline document. All elements in an inline document are considered quoted; no extension element can occur.

  2. In a subpipeline. In a subpipeline, any element in a namespace that is in the set of ignored namespaces is an extension element. Every other element identifies a step.

  3. In any other context, any element that is not in the pipeline namespace is an extension element.

5 Step Vocabulary

This section describes in detail the XML vocabulary that represents a pipeline.

5.1 p:pipeline Element

A p:pipeline represents a pipeline. Its children declare the inputs, outputs, and parameters that the pipeline exposes and identify the steps in its subpipeline.

<p:pipeline
  name? = QName
  p:ignore-prefixes? = prefix list>
   (p:input*,
    p:output*,
    p:parameter*,
    p:import*,
    p:declare-step*,
    subpipeline)
</p:pipeline>

If specified, the name must be unique across all available pipelines. If a p:pipeline occurs as the child of a p:pipeline-library element, it must be named.

A pipeline can declare additional steps (e.g., ones that are provided by a particular implementation or in some implementation-defined way) and import steps from libraries.

The environment of a pipeline step is the empty environment modified as follows:

  • All of the declared inputs of the step are added to the readable ports in the environment.

  • The union of all the declared outputs of the contained steps is added to the readable ports in the environment.

  • All of the declared parameters of the step are added to the in-scope parameters in the environment.

  • If any ignored namespaces are specified, those namespaces are added to the set of ignored namespaces in the environment.

  • If the last step element in the subpipeline has exactly one output port, or an output port designated as the default, then that output port becomes the default output port for the container step, otherwise the default output port is undefined.

The environment inherited by its contained steps is the environment of the pipeline.

If there is no binding for any of the declared outputs of the pipeline, then those outputs are bound to the default output port. It is a static error if an output is bound to the default output port and the default output port is undefined.

5.1.1 Examples

Example 1. A Sample Pipeline Document

<p:pipeline name="buildspec">
  <p:input port="source"/>
  <p:input port="stylesheet"/>
  <p:output port="result"/>
  <p:parameter name="validate"/>
  …
</p:pipeline>
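
The default output rule described above can be illustrated with a fuller, non-normative sketch. The pipeline name expand-and-style and the step names expand and style are hypothetical, and the p prefix is assumed to be bound to the pipeline namespace, as in the other examples in this section.

```xml
<p:pipeline name="expand-and-style">
  <p:input port="source"/>
  <p:input port="stylesheet"/>
  <!-- No binding is given, so this output is bound to the default output port -->
  <p:output port="result" sequence="yes"/>

  <p:xinclude name="expand">
    <p:input port="source">
      <p:pipe step="expand-and-style" port="source"/>
    </p:input>
  </p:xinclude>

  <!-- Last step in the subpipeline; it has exactly one output port (result),
       which therefore becomes the default output port of the pipeline -->
  <p:xslt name="style">
    <p:input port="source">
      <p:pipe step="expand" port="result"/>
    </p:input>
    <p:input port="stylesheet">
      <p:pipe step="expand-and-style" port="stylesheet"/>
    </p:input>
  </p:xslt>
</p:pipeline>
```

Under this reading, the documents produced by the style step appear on the result port of the pipeline without an explicit p:pipe in the p:output.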

5.2 p:for-each Element

A p:for-each represents a for-each.

<p:for-each
  name = NCName
  select? = xpath expression
  p:ignore-prefixes? = prefix list>
   (p:iteration-source?,
    p:output*,
    p:parameter*,
    subpipeline)
</p:for-each>

The iteration-source is an anonymous input: its binding provides a sequence of documents to the for-each step. If no iteration source is explicitly provided, then the iteration source is read from the default input port.

A portion of each input document can be selected using the select attribute. If no selection is specified, the document node of each document is selected.

Each subtree selected by the p:for-each from each of the inputs that appear on the iteration source is wrapped in a document node and provided to the subpipeline.

The processor provides each document, one at a time, to the subpipeline represented by the children of the p:for-each on a port named current.

For each declared output, the processor collects all the documents that are produced for that output from all the iterations, in order, into a sequence. The result of the p:for-each on that output is that sequence of documents.

The environment of a for-each is its inherited environment modified as follows:

  • The union of all the declared outputs of the contained steps is added to the readable ports in the environment.

  • All of the declared parameters of the step are added to the in-scope parameters in the environment.

  • If any ignored namespaces are specified, those namespaces are added to the set of ignored namespaces in the environment.

  • If there is a preceding sibling step element and that preceding sibling has exactly one output port, or an output port designated as the default, then that output port becomes the default input port.

  • If the last step element in the subpipeline has exactly one output port, or an output port designated as the default, then that output port becomes the default output port for the container step, otherwise the default output port is undefined.

The environment inherited by its contained steps is the environment of the for-each modified as follows: the port named “current” is the default input port.

If there is no binding for any of the declared outputs of the for-each, then those outputs are bound to the default output port. It is a static error if an output is bound to the default output port and the default output port is undefined.

5.2.1 Examples

Example 2, “A Sample For-Each” shows an example of a p:for-each in action.

Example 2. A Sample For-Each
<p:for-each name="chapters" select="//chapter">
  <p:iteration-source>
    <p:document href="http://example.org/docbook.xml"/>
  </p:iteration-source>
  <p:output port="html">
    <p:pipe step="xform-to-html" port="result"/>
  </p:output>
  <p:output port="fo">
    <p:pipe step="xform-to-fo" port="result"/>
  </p:output>
  <p:xslt name="xform-to-fo">
    <p:input port="source">
      <p:pipe step="chapters" port="current"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="fo/docbook.xsl"/>
    </p:input>
  </p:xslt>
  <p:xslt name="xform-to-html">
    <p:input port="source">
      <p:pipe step="chapters" port="current"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="html/docbook.xsl"/>
    </p:input>
  </p:xslt>
</p:for-each>

The //chapter elements of the DocBook document are selected. Each chapter is transformed into HTML and XSL Formatting Objects using an XSLT step. The resulting HTML and FO documents are aggregated together and appear on the html and fo ports, respectively, of the chapters step itself.

It is a static error if any declared output does not specify a binding.

5.3 p:viewport Element

A p:viewport represents a viewport.

<p:viewport
  name = NCName
  match = xpath expression
  p:ignore-prefixes? = prefix list>
   (p:viewport-source?,
    p:output,
    p:parameter*,
    subpipeline)
</p:viewport>

The viewport-source is an anonymous input: its binding provides a single document to the viewport step. If no document is explicitly provided, then the viewport source is read from the default input port.

The match attribute specifies an [XPath 1.0] expression that is a Pattern in [XSLT 1.0]. Each matching node in the source document is wrapped in a document node and provided to the viewport's subpipeline. After a node has been matched, its descendants are not considered for further matching. In other words, a node is passed at most once to the subpipeline.

The processor provides each document, one at a time, to the subpipeline represented by the children of the p:viewport on a port named current.

What appears on the output from the p:viewport will be a copy of the input document where each matching node is replaced by the result of applying the subpipeline to the subtree rooted at that node.

It is a dynamic error if the viewport source is a sequence of more than one document or if the output from any iteration is a sequence of more than one document.

The environment of a viewport is its inherited environment modified as follows:

  • The union of all the declared outputs of the contained steps is added to the readable ports in the environment.

  • All of the declared parameters of the step are added to the in-scope parameters in the environment.

  • If any ignored namespaces are specified, those namespaces are added to the set of ignored namespaces in the environment.

  • If there is a preceding sibling step element and that preceding sibling has exactly one output port, or an output port designated as the default, then that output port becomes the default input port.

  • If the last step element in the subpipeline has exactly one output port, or an output port designated as the default, then that output port becomes the default output port for the container step, otherwise the default output port is undefined.

The environment inherited by its contained steps is the environment of the viewport modified as follows: the port named “current” is the default input port.

If there is no binding for any of the declared outputs of the viewport, then those outputs are bound to the default output port. It is a static error if an output is bound to the default output port and the default output port is undefined.

5.3.1 Examples

Example 3, “A Sample Viewport” shows an example of a p:viewport in action.

Example 3. A Sample Viewport
<p:viewport name="encdivs" match="h:div[@class='enc']">
  <p:viewport-source>
    <p:pipe step="step" port="port"/>
  </p:viewport-source>
  <p:output port="result">
    <p:pipe step="encrypt" port="result"/>
  </p:output>
  <p:encrypt-document name="encrypt">
    <p:input port="source">
      <p:pipe step="encdivs" port="current"/>
    </p:input>
  </p:encrypt-document>
</p:viewport>

The nodes which match h:div[@class='enc'] (according to the rules of [XSLT 1.0]) in the input document are selected. Each selected h:div is encrypted and the resulting encrypted version replaces the original h:div. The result of the whole step is a copy of the input document with each selected h:div encrypted.

It is a static error if either the source or result ports do not specify a binding.

5.4 p:choose/p:when/p:otherwise Elements

A p:choose represents a choose.

<p:choose
  name = NCName
  p:ignore-prefixes? = prefix list>
   (p:xpath-context?,
    p:when*,
    p:otherwise?)
</p:choose>

The p:choose can specify the context node against which the [XPath 1.0] expressions that occur on each branch are evaluated. The context node is specified as a binding for the xpath-context. If no binding is provided, the default xpath-context is the document on the default input port.

It is a dynamic error if the xpath-context is bound to a sequence of documents.

The environment of a choose is its inherited environment modified as follows:

  • All of the declared parameters of the step are added to the in-scope parameters in the environment.

  • If any ignored namespaces are specified, those namespaces are added to the set of ignored namespaces in the environment.

  • If there is a preceding sibling step element and that preceding sibling has exactly one output port, or an output port designated as the default, then that output port becomes the default input port.

  • If the last step element in the subpipeline has exactly one output port, or an output port designated as the default, then that output port becomes the default output port for the container step, otherwise the default output port is undefined.

Each conditional subpipeline is represented by a p:when element.

<p:when
  test = expression
  p:ignore-prefixes? = prefix list>
   (p:xpath-context?,
    p:output*,
    p:parameter*,
    subpipeline)
</p:when>

Each p:when branch of the p:choose has a test attribute which must contain an [XPath 1.0] expression. That XPath expression's effective boolean value is the guard expression for the subpipeline contained within that p:when.

The p:when can specify a context node against which its test expression is to be evaluated. That context node is specified as a binding for the xpath-context. If no context is specified on the p:when, the context of the p:choose is used. It is a static error if no context is specified in either place and the default input port is undefined.

The default branch is represented by a p:otherwise element.

<p:otherwise>
   (p:output*,
    p:parameter*,
    subpipeline)
</p:otherwise>

All of the p:when branches and the p:otherwise must declare the same number of output ports with the same names. It is a static error if they do not.

The environment of the selected subpipeline is the environment of the choose modified as follows:

If there is no binding for any of the declared outputs of the selected subpipeline, then those outputs are bound to the default output port of the subpipeline. It is a static error if an output is bound to the default output port and the default output port is undefined.

5.4.1 Examples

Example 4, “A Sample Choose” shows an example of a p:choose in action.

Example 4. A Sample Choose
<p:choose name="version">
  <p:xpath-context>
    <p:pipe step="prevstep" port="result"/>
  </p:xpath-context>

  <p:when test="/*[@version = 2]">
    <p:output port="result">
      <p:pipe step="v2valid" port="result"/>
    </p:output>

    <p:validate name="v2valid">
      <p:input port="source">
        <p:pipe step="prevstep" port="result"/>
      </p:input>
      <p:input port="schema">
        <p:document href="v2schema.xsd"/>
      </p:input>
    </p:validate>
  </p:when>

  <p:when test="/*[@version = 1]">
    <p:output port="result">
      <p:pipe step="v1valid" port="result"/>
    </p:output>

    <p:validate name="v1valid">
      <p:input port="source">
        <p:pipe step="prevstep" port="result"/>
      </p:input>
      <p:input port="schema">
        <p:document href="v1schema.xsd"/>
      </p:input>
    </p:validate>
  </p:when>

  <p:otherwise>
    <p:output port="result">
      <p:pipe step="ident" port="result"/>
    </p:output>

    <p:identity name="ident">
      <p:input port="source">
        <p:pipe step="prevstep" port="result"/>
      </p:input>
    </p:identity>
  </p:otherwise>
</p:choose>

5.5 p:group Element

A p:group is a wrapper for a subpipeline.

<p:group
  name = NCName
  p:ignore-prefixes? = prefix list>
   (p:output*,
    p:parameter*,
    subpipeline)
</p:group>

The environment of a group is its inherited environment modified as follows:

  • The union of all the declared outputs of the contained steps is added to the readable ports in the environment.

  • All of the declared parameters of the step are added to the in-scope parameters in the environment.

  • If any ignored namespaces are specified, those namespaces are added to the set of ignored namespaces in the environment.

  • If there is a preceding sibling step element and that preceding sibling has exactly one output port, or an output port designated as the default, then that output port becomes the default input port.

  • If the last step element in the subpipeline has exactly one output port, or an output port designated as the default, then that output port becomes the default output port for the container step, otherwise the default output port is undefined.

The environment inherited by its contained steps is the environment of the group.

If there is no binding for any of the declared outputs of the group, then those outputs are bound to the default output port of the subpipeline. It is a static error if an output is bound to the default output port and the default output port is undefined.

5.5.1 Examples

TBD.
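
Pending a worked example from the Working Group, the following non-normative sketch suggests how a p:group might wrap two steps and expose a single output. The step names publish, expand, style, and prevstep are hypothetical, and the stylesheet path is borrowed from Example 2.

```xml
<p:group name="publish">
  <!-- The group exposes the output of its last contained step -->
  <p:output port="result" sequence="yes">
    <p:pipe step="style" port="result"/>
  </p:output>

  <p:xinclude name="expand">
    <p:input port="source">
      <!-- prevstep is assumed to be a preceding step outside the group -->
      <p:pipe step="prevstep" port="result"/>
    </p:input>
  </p:xinclude>

  <p:xslt name="style">
    <p:input port="source">
      <p:pipe step="expand" port="result"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="html/docbook.xsl"/>
    </p:input>
  </p:xslt>
</p:group>
```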

5.6 p:try/p:catch Elements

A p:try represents a try/catch.

<p:try
  name = NCName
  p:ignore-prefixes? = prefix list>
   (p:group,
    p:catch)
</p:try>

Where the p:group represents the initial subpipeline and the recovery (or “catch”) pipeline is identified with a p:catch element.

The environment of the try is its inherited environment modified as follows:

  • All of the declared parameters of the step are added to the in-scope parameters in the environment.

  • If any ignored namespaces are specified, those namespaces are added to the set of ignored namespaces in the environment.

  • If there is a preceding sibling step element and that preceding sibling has exactly one output port, or an output port designated as the default, then that output port becomes the default input port.

  • If the last step element in the subpipeline has exactly one output port, or an output port designated as the default, then that output port becomes the default output port for the container step, otherwise the default output port is undefined.

The environment inherited by its initial subpipeline is the environment of the try modified as follows:

The recovery subpipeline of a try is identified with a p:catch:

<p:catch
  name = NCName
  p:ignore-prefixes? = prefix list>
   (p:output*,
    p:parameter*,
    subpipeline)
</p:catch>

Both the p:group and the p:catch must declare the same number of output ports with the same names. It is a static error if they do not.

The environment of the recovery subpipeline is the environment of the try modified as follows:

  • The union of all the declared outputs of the contained steps is added to the readable ports in the environment.

  • All of the declared parameters of the step are added to the in-scope parameters in the environment.

  • If any ignored namespaces are specified, those namespaces are added to the set of ignored namespaces in the environment.

  • The port named “#error” is added to the readable ports. All of the error output of the steps that were in the initial subpipeline is exposed on this port.

    Note

    In evaluating the initial subpipeline, failure of one step can cause other steps to fail. In addition, some steps that fail might not produce output on their error ports and some steps that succeeded might produce such output. This pipeline language places no constraints on the order of error messages provided to the recovery subpipeline, nor does it attempt to guarantee that such output will be available in all cases.

    The error documents that appear should conform to Appendix E, The Error Vocabulary.

5.6.1 Examples

TBD.
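
Pending a worked example from the Working Group, the following non-normative sketch suggests how a p:try might recover from a failed load. The step names and the URI are hypothetical. The sketch does not read the “#error” port, because this draft does not yet show how that port is referenced from a binding.

```xml
<p:try name="safe-load">
  <!-- Initial subpipeline: may fail if the document cannot be read -->
  <p:group name="attempt">
    <p:output port="result">
      <p:pipe step="fetch" port="result"/>
    </p:output>
    <p:load name="fetch">
      <p:parameter name="href" value="http://example.org/maybe-missing.xml"/>
    </p:load>
  </p:group>

  <!-- Recovery subpipeline: declares the same output port name as the group -->
  <p:catch name="recover">
    <p:output port="result">
      <p:pipe step="fallback" port="result"/>
    </p:output>
    <p:identity name="fallback">
      <p:input port="source">
        <p:inline>
          <placeholder/>
        </p:inline>
      </p:input>
    </p:identity>
  </p:catch>
</p:try>
```

Both branches declare one output named result, which satisfies the requirement that the p:group and the p:catch declare the same output ports.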

5.7 Other Steps

Any element in a subpipeline that is not in an ignored namespace identifies a step.

<pfx:otherComponent
  name = NCName
  p:ignore-prefixes? = prefix list>
   (p:input*,
    p:output*,
    p:import-parameter*,
    p:parameter*,
    subpipeline?)
</pfx:otherComponent>

The qualified name of the element identifies the type of step to be instantiated. It is a static error if no step of that type has been imported or declared.

It is a static error if the name is not unique in the current scope or if the specified inputs, outputs, and parameters do not match the signature for steps of that type.

The environment of a step is its inherited environment modified as follows:

  • The union of all the declared outputs of the contained steps is added to the readable ports in the environment. Atomic steps have no contained steps.

  • All of the declared parameters of the step are added to the in-scope parameters in the environment.

  • If any ignored namespaces are specified, those namespaces are added to the set of ignored namespaces in the environment.

  • If there is a preceding sibling step element and that preceding sibling has exactly one output port, or an output port designated as the default, then that output port becomes the default input port.

  • If the last step element in the subpipeline has exactly one output port, or an output port designated as the default, then that output port becomes the default output port for the container step, otherwise the default output port is undefined.

6 Other pipeline elements

6.1 p:input Element

A p:input identifies input for a step, optionally declaring it, if necessary.

<p:input
  port = QName
  sequence? = yes|no />

The port attribute defines the name of the port. It is a static error to identify two ports with the same name on the same step. It is a static error if the port given does not match the name of an input port specified in the step's declaration.

On compound steps and p:declare-step, an input declaration can indicate if a sequence of documents is allowed to appear on the port. If sequence is specified with the value “yes”, then a sequence is allowed. If sequence is not specified, or has the value “no”, then it is a dynamic error for a sequence of more than one document to appear on the declared port.

The declaration may be accompanied by a binding (or default binding) for the input:

<p:input
  port = QName
  sequence? = yes|no
  select? = xpath expression>
   (p:pipe |
    p:document |
    p:inline)*
</p:input>

If a binding is provided, a select expression may also be provided. If specified, the [XPath 1.0] select expression is applied to the document(s) that are read. Each node that matches is wrapped in a document node and provided to the input port. After a node has been matched, its descendants are not considered for further matching; a node is passed at most once as input. In other words,

<p:input port="source">
  <p:document href="http://example.org/input.html"/>
</p:input>

provides a single document, but

<p:input port="source" select="//html:div">
  <p:document href="http://example.org/input.html"/>
</p:input>

provides a sequence of zero or more documents, one for each matching html:div (that is not itself a descendant of an html:div) in http://example.org/input.html.

A select expression can equally be applied to input read from another step. This input:

<p:input port="source" select="//html:div">
  <p:pipe step="origin" port="result"/>
</p:input>

provides a sequence of zero or more documents, one for each matching html:div in the document (or each of the documents) that is read from the result port of the step named origin.

On a compound step, a p:pipe in a p:input cannot access the readable ports of the step's contained steps.

6.2 p:iteration-source Element

A p:iteration-source identifies input to a for-each.

<p:iteration-source
  select? = xpath expression>
   (p:pipe |
    p:document |
    p:inline)*
</p:iteration-source>

The select attribute and binding elements of a p:iteration-source work the same way that they do in a p:input.

6.3 p:viewport-source Element

A p:viewport-source identifies input to a viewport.

<p:viewport-source>
   (p:pipe |
    p:document |
    p:inline)
</p:viewport-source>

Exactly one binding element is allowed and it works the same way that binding elements work in a p:input. No select expression is allowed.

6.4 p:xpath-context Element

A p:xpath-context identifies a context against which an [XPath 1.0] expression will be evaluated for a p:when.

<p:xpath-context>
   (p:pipe |
    p:document |
    p:inline)
</p:xpath-context>

Exactly one binding element is allowed and it works the same way that binding elements work in a p:input. No select expression is allowed.

6.5 p:output Element

A p:output identifies an output port, optionally declaring it, if necessary.

<p:output
  port = QName
  sequence? = yes|no
  default? = yes|no />

The port attribute defines the name of the port. It is a static error to identify two ports with the same name on the same step. It is a static error if the port given does not match the name of an output port specified in the step's declaration.

An output declaration can indicate if a sequence of documents is allowed to appear on the declared port. If sequence is specified with the value “yes”, then a sequence is allowed. If sequence is not specified, or has the value “no”, then it is a dynamic error if the step produces a sequence of more than one document on the declared port.

An output declaration can indicate if it is to be considered the default output for the step. If default is specified with the value “yes”, then the named port will be treated as the default output port. It is a static error to identify two different ports as the default. It is also a static error if the step on which this declaration appears has exactly one output and that output is marked as not being the default. In other words, if any step has exactly one output, that output is always the default output.

The declaration may be accompanied by a binding for the output.

<p:output
  port = QName
  sequence? = yes|no
  default? = yes|no>
   (p:pipe |
    p:document |
    p:inline)*
</p:output>

On a compound step, a p:pipe in a p:output can only access the readable ports of the step's contained steps.
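
As a non-normative sketch of the sequence and default attributes, a step with two outputs might be declared as follows. The step type my:split and its port names are hypothetical.

```xml
<p:declare-step type="my:split">
  <p:input port="source"/>
  <!-- With two outputs, one may be designated as the default -->
  <p:output port="matched" sequence="yes" default="yes"/>
  <!-- A sequence of documents is allowed on this port as well -->
  <p:output port="unmatched" sequence="yes"/>
</p:declare-step>
```

Marking both ports with default="yes" would be a static error under the rules above.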

6.6 p:parameter Element

The p:parameter element is used both to declare parameters and to establish values for them. When used on a p:declare-step or compound step, p:parameter declares the parameter and may associate a default value with it. Used elsewhere, p:parameter associates a value with the parameter.

6.6.1 Declaring Parameters

Parameters are declared on p:declare-step and compound steps with p:parameter:

<p:parameter
  name = token
  required? = yes | no />

The name attribute must be a QName, a single asterisk (*), or a string of the form *:NCName or NCName:*.

If the name is a QName, the parameter may be declared as required or it may be given a default value. It is a static error to specify that the parameter is required or that it has a default value if the name given is not a QName. It is also a static error to specify that the parameter is both required and has a default value.

If a parameter is required, it is a static error to invoke the step without specifying a value for that parameter.
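
The declaration rules above might be exercised as in the following non-normative sketch. The step type my:report and the parameter names are hypothetical. The default is written with a value attribute, on the assumption that defaults use the same attributes described under Assigning Values to Parameters.

```xml
<p:declare-step type="my:report">
  <p:input port="source"/>
  <p:output port="result"/>
  <!-- A QName name may be marked as required... -->
  <p:parameter name="title" required="yes"/>
  <!-- ...or given a default value, but not both -->
  <p:parameter name="lang" value="en"/>
  <!-- A wildcard of the form NCName:* can be neither required nor defaulted -->
  <p:parameter name="my:*"/>
</p:declare-step>
```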

6.6.2 Using Parameters

Parameters are used on a step with p:parameter:

<p:parameter
  name = token />

The parameter must be given a value when it is used.

6.6.3 Assigning Values to Parameters

When a parameter is declared, it may be given a default value. When it is used, it must be given a value.

The value can be specified in two ways: with a select or value attribute.

If a select expression is given, it is evaluated against the document specified in the binding and the [XPath 1.0] string value of the expression becomes the value of the parameter. If no select expression is given, the XPath string value of the document specified in the binding becomes the default value of the parameter. It is a dynamic error if a document sequence is specified.

<p:parameter
  name = QName
  select? = XPath expression>
   (p:pipe |
    p:document |
    p:inline)
</p:parameter>

The select expression may refer to the values of other in-scope parameters by variable reference. It is a static error if the variable reference uses a QName that is not the name of an in-scope parameter or if the reference is circular, either directly or indirectly.

If a value attribute is specified, its content becomes the value of the parameter.

<p:parameter
  name = QName
  value = string />
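
The two ways of assigning a value can be sketched together on a single step. This is a non-normative illustration: the step names style and prevstep, the parameter names, and the XPath expression are hypothetical.

```xml
<p:xslt name="style">
  <p:input port="source">
    <p:pipe step="prevstep" port="result"/>
  </p:input>
  <p:input port="stylesheet">
    <p:document href="html/docbook.xsl"/>
  </p:input>
  <!-- Literal value taken from the value attribute -->
  <p:parameter name="draft" value="yes"/>
  <!-- Value computed by evaluating the select expression against the
       bound document and taking its XPath 1.0 string value -->
  <p:parameter name="title" select="/book/title">
    <p:pipe step="prevstep" port="result"/>
  </p:parameter>
</p:xslt>
```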

6.7 p:import-parameter Element

A p:import-parameter provides a set of in-scope parameters to a step.

<p:import-parameter
  name = token />

All in-scope parameters which match the name are made available to the step as if they had been specified with individual p:parameter elements.

The name attribute must be a single asterisk (*), a QName, or a string of the form *:NCName or NCName:*.
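
As a non-normative sketch, a step might pass every in-scope parameter on to a stylesheet as follows. The step names style and prevstep are hypothetical.

```xml
<p:xslt name="style">
  <p:input port="source">
    <p:pipe step="prevstep" port="result"/>
  </p:input>
  <p:input port="stylesheet">
    <p:document href="html/docbook.xsl"/>
  </p:input>
  <!-- Makes all in-scope parameters available, as if each had been
       specified with its own p:parameter element -->
  <p:import-parameter name="*"/>
</p:xslt>
```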

6.8 p:declare-step Element

A p:declare-step provides the type and signature of an implementation-dependent type of step. It declares the inputs, outputs, and parameters for all steps of that type.

<p:declare-step
  type = QName>
   (p:input*,
    p:output*,
    p:parameter*)
</p:declare-step>

Editorial Note

We need to make some provision for identifying the implementation of a declared step, even if it's no more than implementation-defined extension attributes. We'll need some sort of mechanism for declaring multiple implementations too.

It is a static error to identify an unrecognized step in a subpipeline. It is not an error to declare such a step, only to use it.

Exactly one input declaration of a p:declare-step may use the name “*” to indicate that the step accepts an arbitrary number of inputs.

Exactly one output declaration of a p:declare-step may use the name “*” to indicate that the step can produce an arbitrary number of outputs.
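
A wildcard input declaration might look like the following non-normative sketch. The step type my:merge is hypothetical, and the sketch assumes that the wildcard name is written in the port attribute, which this draft does not state explicitly.

```xml
<p:declare-step type="my:merge">
  <!-- Assumed syntax: "*" in the port attribute stands for any number of inputs -->
  <p:input port="*"/>
  <p:output port="result"/>
</p:declare-step>
```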

6.9 p:pipeline-library Element

A p:pipeline-library is a collection of step declarations and/or pipeline definitions.

<p:pipeline-library>
   (p:import*,
    p:declare-step*,
    p:pipeline*)
</p:pipeline-library>

It is a static error if the import references in a pipeline or pipeline library are circular.

Example 5. A Sample Pipeline Library
<p:pipeline-library>
  <p:declare-step type="my:extension-component">…</p:declare-step>
  <p:pipeline name="xinclude-and-validate">…</p:pipeline>
  <p:pipeline name="validate-and-transform">…</p:pipeline>
  …
</p:pipeline-library>

6.10 p:import Element

A p:import loads a pipeline or pipeline library, making it available in the current environment.

<p:import
  href = URI />

An import statement loads the specified URI and makes any pipelines declared within it available to the current pipeline.

It is a dynamic error if the URI cannot be retrieved or if, once retrieved, it does not point to a p:pipeline-library or p:pipeline. If it points to a p:pipeline, it is a dynamic error if the pipeline does not have a name.
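
The following non-normative sketch shows where a p:import sits within a pipeline. The library URI and the pipeline name main are hypothetical. The sketch also assumes that the library declares a step type my:extension-component with a source input and a result output; that signature is an assumption made for illustration.

```xml
<p:pipeline name="main">
  <p:input port="source"/>
  <p:output port="result"/>
  <!-- p:import follows the input, output, and parameter declarations -->
  <p:import href="http://example.org/library.xml"/>

  <!-- Assumed to be declared in the imported library -->
  <my:extension-component name="ext">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </my:extension-component>
</p:pipeline>
```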

6.11 p:pipe Element

A p:pipe reads from the output port of another step.

<p:pipe
  step = step-name
  port = port-name />

The p:pipe element identifies the output port of another step with the name of the step in the step attribute and the name of the port on that step in the port attribute. It is a static error if the port identified is not in the readable ports of the environment.

6.12 p:inline Element

A p:inline provides a document or a sequence of documents inline.

<p:inline>
   anyElement?
</p:inline>

The content of the p:inline element is wrapped in a document node and passed as input. The base URI of the document is the base URI of the p:inline element.

It is a static error if the content of the p:inline element is not a well-formed XML document.
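
As a non-normative sketch, an inline document might be bound to an input as follows. The step name greeting and the element names doc and para are hypothetical.

```xml
<p:identity name="greeting">
  <p:input port="source">
    <!-- The single child element becomes the document element of the input -->
    <p:inline>
      <doc>
        <para>Hello, world.</para>
      </doc>
    </p:inline>
  </p:input>
</p:identity>
```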

6.13 p:document Element

A p:document reads an XML document from a URI.

<p:document
  href = URI />

The document identified by the URI in the href attribute is loaded and returned.

It is a dynamic error if the document does not exist, cannot be accessed, or is not a well-formed XML document.

The parser which the p:document element employs must be conformant to Namespaces in XML. It must not perform validation. It must not perform any other processing, such as expanding XIncludes.

Use the load step if you need to perform DTD-based validation or if you wish to load documents that are not namespace well-formed.

7 Errors

Errors in a pipeline can be divided into two classes: static errors and dynamic errors.

7.1 Static Errors

[Definition: A static error is one which can be detected before pipeline evaluation is even attempted.] Examples of static errors include cycles, incorrect specification of inputs and outputs, and reference to unknown steps.

Static errors are fatal and must be detected before any steps are evaluated.

7.2 Dynamic Errors

[Definition: A dynamic error is one which occurs while a pipeline is being evaluated.] Examples of dynamic errors include references to URIs that cannot be resolved, steps which fail, and pipelines that exhaust the capacity of an implementation (such as memory or disk space).

If a step fails due to a dynamic error, failure propagates upwards until either a try is encountered or the entire pipeline fails. In other words, outside of a try, step failure causes the entire pipeline to fail.

A Standard Component Library

This appendix describes the standard XProc components.

Note

The components described in this draft are intended mainly as a starting point for discussion and to present a flavor for the sorts of components envisioned. The WG has not yet discussed them in detail.

1 Required Components

This section describes standard components that must be supported by any conforming processor.

To be described…

1.1 Identity

The identity step makes a verbatim copy of its input available on its output.

<p:declare-step type="p:identity">
     <p:input port="source" sequence="yes"/>
     <p:output port="result" sequence="yes"/>
</p:declare-step>

1.2 XSLT

The xslt step applies the XSLT 1.0 stylesheet supplied on the 'stylesheet' port to the document provided on the 'source' port. It produces a sequence of documents on its 'result' port.

<p:declare-step type="p:xslt">
     <p:input port="source" sequence="no"/>
     <p:input port="stylesheet" sequence="no"/>
     <p:output port="result" sequence="yes"/>
     <p:parameter name="*"/>
</p:declare-step>

All of the specified parameters are made available to the XSLT processor. If the XSLT processor signals a fatal error, the step fails, otherwise the result of the transformation is produced on the result port.

Note that an XSLT 1.0 processor without any extensions can only produce a single XML document as its result. However, many XSLT 1.0 processors provide extensions which allow the processor to produce more than one result. In such cases, more than one document may appear on the result port. The principal result document will always appear last.

1.3 XInclude

The XInclude step applies XInclude processing semantics to the document. The referenced documents are resolved against the base URI and are not provided as input to the step.

<p:declare-step type="p:xinclude">
     <p:input port="source" sequence="no"/>
     <p:output port="result" sequence="no"/>
</p:declare-step>

1.4 Serialize

The serialize step applies XML serialization to the children of the document element and replaces those children with their serialization. The outcome is a single element with text content that represents the "escaped" syntax of the children if they were serialized.

<p:declare-step type="p:serialize">
     <p:input port="source" sequence="no"/>
     <p:output port="result" sequence="no"/>
</p:declare-step>
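
The effect of the serialize step might be pictured as follows. This is a non-normative illustration with a hypothetical description element; the draft does not fix details of the serialization such as whitespace or the handling of namespace declarations, so the second document is only one plausible result.

```xml
<!-- Document arriving on the source port -->
<description><p>Some <em>marked-up</em> text.</p></description>

<!-- One plausible document on the result port: the children of the
     document element are replaced by text holding their serialization -->
<description>&lt;p&gt;Some &lt;em&gt;marked-up&lt;/em&gt; text.&lt;/p&gt;</description>
```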

1.5 Parse

The parse step takes the text value of the document element and parses the content as if it were a Unicode character stream containing XML. The outcome is a single element with children from the parsing of the XML content. This is the reverse of the serialize step.

When the text value is parsed, a document element wrapper should be assumed so that element siblings can be parsed back into XML. Further, if the 'namespace' parameter is specified, the default namespace is declared on that wrapper element. If a wrapper element name is specified, it is not returned in the result.

If the 'content-type' parameter is specified, an implementation can use a different parser to produce XML content. Such behavior is implementation-defined. For example, for the MIME type 'text/html', an implementation might provide an HTML to XHTML parser (e.g. Tidy).

<p:declare-step type="p:parse">
     <p:input port="source" sequence="no"/>
     <p:output port="result" sequence="no"/>
     <p:parameter name="namespace" required="no"/>
     <p:parameter name="content-type" required="no"/>
</p:declare-step>
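As a non-normative illustration, the wrapper-based parsing described above might be sketched in Python as follows. The function name parse_text and the wrapper element name "wrapper" are assumptions made for this example; the 'content-type' parameter is not modeled.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from xml.sax.saxutils import quoteattr


def parse_text(root: ET.Element, namespace: Optional[str] = None) -> ET.Element:
    """Parse the text value of the document element back into XML children
    (hypothetical helper; raises ET.ParseError if the text is not well-formed)."""
    text = "".join(root.itertext())
    # A temporary wrapper lets sibling elements parse as one document.
    # If a namespace is given, it is declared as the default on the wrapper.
    ns_decl = f" xmlns={quoteattr(namespace)}" if namespace else ""
    wrapper = ET.fromstring(f"<wrapper{ns_decl}>{text}</wrapper>")
    # The wrapper itself is discarded; only its content is kept.
    result = ET.Element(root.tag, dict(root.attrib))
    result.text = wrapper.text
    result.extend(list(wrapper))
    return result


doc = ET.fromstring("<doc>&lt;a&gt;1&lt;/a&gt;&lt;b/&gt;</doc>")
print([child.tag for child in parse_text(doc)])
```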

1.6 Load

The load step has no inputs but takes a parameter that specifies a URI of an XML resource that should be loaded and provided as the result.

<p:declare-step type="p:load">
     <p:output port="result"/>
     <p:parameter name="href" required="yes"/>
</p:declare-step>

Load attempts to read an XML document from the specified URI. If the document does not exist, or is not well-formed, the step fails. Otherwise, the document read is produced on the result port.
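A minimal, non-normative Python sketch of this behavior is shown below. The StepFailed exception and the load function are hypothetical names; the sketch assumes that any retrieval or well-formedness problem is reported as a step failure.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


class StepFailed(Exception):
    """Hypothetical signal that a step has failed."""


def load(href: str) -> ET.ElementTree:
    """Read an XML document from the URI given by the 'href' parameter."""
    try:
        with urllib.request.urlopen(href) as stream:
            return ET.parse(stream)
    except (urllib.error.URLError, OSError, ValueError, ET.ParseError) as exc:
        # Missing resource, unusable URI, or content that is not well-formed.
        raise StepFailed(f"cannot load {href!r}") from exc
```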

Note

Should this step allow href to be a list of URIs and return a sequence of documents?

1.7 Store

The store step stores a serialized version of its input to a URI. The URI is either specified explicitly by the 'href' parameter or implicitly by the base URI of the document. This step has no output.

Note

Should this step allow sequences on its input?

<p:declare-step type="p:store">
     <p:input port="source"/>
     <p:parameter name="href" required="no"/>
</p:declare-step>

Note

A more direct “serialize-to-octet-stream” component may also be required; for example, one that supports the XSLT 2.0/XQuery 1.0 Serialization specification.

2 Optional Components

T.B.D.

3 Micro-Operations Components

Note

No decisions have been made about whether these components will be optional or required.

3.1 Rename

The rename component renames elements, attributes, or processing-instruction targets in a document based on parameter values.

<p:declare-step type="p:rename">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:parameter name="select" required="no"/>
     <p:parameter name="name" required="yes"/>
</p:declare-step>

Each element, attribute, or processing instruction identified by the XPath 1.0 expression specified in the 'select' parameter is renamed. The names of elements and attributes, and the targets of processing instructions, are changed to the value of the 'name' parameter.

The component fails if the specified name is not a valid name or if the renaming would introduce a syntactic error into the document (for example, if it would create two attributes with the same name on the same element).
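The renaming and failure conditions above can be sketched, non-normatively, in Python. This sketch makes several assumptions: ElementTree supports only a subset of XPath 1.0 and cannot select attribute nodes, so attribute renaming is expressed as an element path plus an attribute name; the name check is an ASCII-only approximation of an XML name; and StepFailed, rename_elements, and rename_attribute are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of an XML name (an assumption of this sketch).
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


class StepFailed(Exception):
    """Hypothetical signal that the component has failed."""


def check_name(name: str) -> None:
    if not NAME_RE.fullmatch(name):
        raise StepFailed(f"{name!r} is not a valid name")


def rename_elements(root: ET.Element, select: str, name: str) -> None:
    """Rename every element matched by the (limited) path expression."""
    check_name(name)
    for element in root.findall(select):
        element.tag = name


def rename_attribute(root: ET.Element, select: str, old: str, new: str) -> None:
    """Rename attribute 'old' to 'new' on every matched element."""
    check_name(new)
    for element in root.findall(select):
        if old not in element.attrib:
            continue
        if new != old and new in element.attrib:
            # The rename would create two attributes with the same name.
            raise StepFailed(f"duplicate attribute {new!r}")
        element.set(new, element.attrib.pop(old))


doc = ET.fromstring('<doc><item id="1" key="k"/><item id="2"/></doc>')
rename_elements(doc, "./item", "entry")
print([child.tag for child in doc])
```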

3.2 Wrap

The wrap component wraps the document element with a new document element.

<p:declare-step type="p:wrap">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:parameter name="name" required="yes"/>
</p:declare-step>
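A non-normative Python sketch of the wrap semantics follows. The function name wrap is hypothetical, and the sketch assumes that the 'name' parameter is a plain element name with no namespace.

```python
import copy
import xml.etree.ElementTree as ET


def wrap(root: ET.Element, name: str) -> ET.Element:
    """Return a new document element called 'name' whose only child is a
    copy of the original document element."""
    wrapper = ET.Element(name)
    wrapper.append(copy.deepcopy(root))
    return wrapper


doc = ET.fromstring("<a><b/></a>")
wrapped = wrap(doc, "outer")
print(wrapped.tag, [child.tag for child in wrapped])
```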

3.3 Insert

The insert component inserts the document specified on the 'insertion' port as a child of the document element of the document provided on the 'source' port. The position of the insertion is governed by the parameters.

<p:declare-step type="p:insert">
     <p:input port="source"/>
     <p:input port="insertion"/>
     <p:output port="result"/>
     <p:parameter name="at-start" required="no"/>
</p:declare-step>

If the 'at-start' parameter is true, the insertion document will be inserted as the first child(ren) of the document element; otherwise, it will be inserted as the last child(ren). If the parameter is not specified, a value of true is assumed.
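The positioning rule can be illustrated with the following non-normative Python sketch. The function name insert and its at_start argument mirror the component and its parameter but are otherwise assumptions; the sketch inserts only the document element of the insertion document.

```python
import copy
import xml.etree.ElementTree as ET


def insert(source: ET.Element, insertion: ET.Element,
           at_start: bool = True) -> ET.Element:
    """Return a copy of 'source' with 'insertion' added as its first or
    last child. The default of True mirrors the assumed parameter default."""
    result = copy.deepcopy(source)
    new_child = copy.deepcopy(insertion)
    if at_start:
        # Move any leading text after the new child so that the inserted
        # element really is the first child node.
        new_child.tail = result.text
        result.text = None
        result.insert(0, new_child)
    else:
        new_child.tail = None
        result.append(new_child)
    return result


source = ET.fromstring("<doc>text<a/></doc>")
insertion = ET.fromstring("<new/>")
print([child.tag for child in insert(source, insertion)])
print([child.tag for child in insert(source, insertion, at_start=False)])
```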

3.4 Set-attributes

The set-attributes component sets attribute values on the document element of the 'source' port's document, using the attribute values found on the document element of the 'attributes' port's document.

<p:declare-step type="p:set-attributes">
     <p:input port="source"/>
     <p:input port="attributes"/>
     <p:output port="result"/>
</p:declare-step>
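A short, non-normative Python sketch of this behavior is given below. The function name set_attributes is hypothetical, and the sketch assumes that an attribute already present on the source document element is overwritten by a same-named attribute from the 'attributes' document, which the text above does not state explicitly.

```python
import copy
import xml.etree.ElementTree as ET


def set_attributes(source: ET.Element, attributes: ET.Element) -> ET.Element:
    """Return a copy of 'source' carrying every attribute found on the
    document element of the 'attributes' document."""
    result = copy.deepcopy(source)
    for name, value in attributes.attrib.items():
        # Assumption: same-named attributes are overwritten.
        result.set(name, value)
    return result


source = ET.fromstring('<doc id="old"><child/></doc>')
attributes = ET.fromstring('<attrs id="new" lang="en"/>')
print(set_attributes(source, attributes).attrib)
```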

4 Component Declarations

T.B.D.

<p:pipeline-library name="standard">

  <p:declare-step type="p:validate">
    <p:input port="source" sequence="no"/>
    <p:input port="schema" sequence="yes"/>
    <p:output port="result" sequence="no"/>
  </p:declare-step>

  <p:declare-step type="p:xinclude">
    <p:input port="source" sequence="no"/>
    <p:output port="result" sequence="no"/>
  </p:declare-step>

  <p:declare-step type="p:xslt">
    <p:input port="source" sequence="no"/>
    <p:input port="stylesheet" sequence="no"/>
    <p:output port="result" sequence="yes"/>
  </p:declare-step>

</p:pipeline-library>

B References

[XML Core Req] XML Processing Model Requirements. Dmitry Lenkov, Norman Walsh, editors. W3C Working Group Note 05 April 2004.

[Infoset] XML Information Set (Second Edition). John Cowan, Richard Tobin, editors. W3C Recommendation 04 February 2004.

[XML 1.0] Extensible Markup Language (XML) 1.0 (Fourth Edition). Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, et al., editors. W3C Recommendation 16 August 2006.

[XML 1.1] Extensible Markup Language (XML) 1.1 (Second Edition). Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, et al., editors. W3C Recommendation 16 August 2006.

[XPath 1.0] XML Path Language (XPath) Version 1.0. James Clark and Steve DeRose, editors. W3C Recommendation 16 November 1999.

[XSLT 1.0] XSL Transformations (XSLT) Version 1.0. James Clark, editor. W3C Recommendation 16 November 1999.

[xml:id] xml:id Version 1.0. Jonathan Marsh, Daniel Veillard, and Norman Walsh, editors. W3C Recommendation 9 September 2005.

[XML Base] XML Base. Jonathan Marsh, editor. W3C Recommendation 27 June 2001.

C Glossary

atomic step

An atomic step is a step that performs a unit of XML processing, such as XInclude or transformation.

Note: defined but never referenced.

binding

A binding associates an input or output port with some data source.

by source

A document is specified by source if it references a specific port on another step.

Note: defined but never referenced.

by URI

A document is specified by URI if it is referenced with a URI.

Note: defined but never referenced.

compound step

A compound step is a step that contains additional steps. That is, a compound step differs from an atomic step in that its semantics are at least partially determined by the steps that it contains.

contained steps

The steps that occur directly inside a compound step are called contained steps.

container

A compound step which immediately contains another step is called its container.

declared inputs

The input ports declared on a step are its declared inputs.

declared outputs

The output ports declared on a step are its declared outputs.

declared parameters

The parameters declared on a step are its declared parameters.

default input port

The default input port, which may be undefined, is a specific step name/port name pair from the set of readable ports.

default output port

The default output port, which may be undefined, is a specific step name/port name pair from the set of readable ports.

dynamic error

A dynamic error is one which occurs while a pipeline is being evaluated.

empty environment

The empty environment contains no readable ports, no in-scope parameters, an undefined default input port, and a set of ignored namespaces that consists only of the XHTML namespace, http://www.w3.org/1999/xhtml.

environment

The environment of a step is the static information available to that step.

extension attributes

An element from the XProc namespace may have any attribute not from the XProc namespace, provided that the expanded-QName of the attribute has a non-null namespace URI. These attributes are called extension attributes.

Note: defined but never referenced.

ignored namespaces

The set of ignored namespaces is the set of namespaces which do not identify steps.

inline document

An inline document is specified directly in the body of the element that binds it.

in-scope parameters

The in-scope parameters are the set of parameters that can be read by a step.

matches

A step matches its signature if and only if it specifies an input for each declared input and it specifies no inputs that are not declared; it specifies a parameter for each parameter that is declared to be required; and it specifies no parameters that are not declared.
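The conditions in this definition can be written out as a non-normative Python sketch. The Signature class and the matches function are hypothetical names; a signature is reduced here to its declared input ports and its required and optional parameter names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical, reduced model of a step signature."""
    inputs: frozenset
    required_params: frozenset = frozenset()
    optional_params: frozenset = frozenset()


def matches(signature: Signature, inputs, params) -> bool:
    """Apply the three conditions of the definition to a step's
    specified input ports and parameter names."""
    inputs, params = set(inputs), set(params)
    declared_params = signature.required_params | signature.optional_params
    return (
        inputs == signature.inputs                # every declared input, no others
        and signature.required_params <= params   # every required parameter
        and params <= declared_params             # no undeclared parameters
    )


# Signatures modeled on the p:parse and p:load declarations in this appendix.
parse_sig = Signature(inputs=frozenset({"source"}),
                      optional_params=frozenset({"namespace", "content-type"}))
load_sig = Signature(inputs=frozenset(),
                     required_params=frozenset({"href"}))

print(matches(parse_sig, {"source"}, {"namespace"}))
print(matches(load_sig, set(), set()))
```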

parameter

A parameter is a name/value pair.

Note: defined but never referenced.

pipeline

A pipeline is a set of steps connected together, with outputs flowing into inputs, without any loops (no step can read its own output, directly or indirectly).

Note: defined but never referenced.
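The "without any loops" condition in the definition of pipeline amounts to requiring that the graph of step connections be acyclic. The following non-normative Python sketch checks this with the standard library's graphlib module (Python 3.9 or later). The function name is_loop_free and the dictionary representation of connections are assumptions of this example.

```python
from graphlib import CycleError, TopologicalSorter


def is_loop_free(reads_from: dict) -> bool:
    """'reads_from' maps each step name to the set of step names whose
    output it reads. Return True if no step reads its own output,
    directly or indirectly."""
    try:
        # Consuming the ordering forces the cycle check to run.
        tuple(TopologicalSorter(reads_from).static_order())
    except CycleError:
        return False
    return True


# Step names taken from the second example pipeline in Appendix F.
print(is_loop_free({"vcheck": set(), "xform": {"vcheck"}}))
print(is_loop_free({"a": {"b"}, "b": {"a"}}))
```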

readable ports

The readable ports are the step name/output port name pairs that are visible to the step.

signature

The signature of a step is the set of inputs, outputs, and parameters that it is declared to accept.

static error

A static error is one which can be detected before pipeline evaluation is even attempted.

step

A step is the basic computational unit of a pipeline. Steps are either atomic or compound.

subpipeline

The steps (and the connections between them) within a container form a subpipeline.

D Schemas

This appendix points to some XProc schemas…

Example D.1. RELAX NG Compact Syntax Schema for XProc

See schemas/pipeline.rnc.

Note: This schema contains a large number of similar patterns in an effort to enforce co-constraints at the grammar level. In the long run, this may turn out to be more confusing than useful given the number of additional constraints that can't practically be enforced at the grammar level.

E The Error Vocabulary

This appendix describes the XML vocabulary that components are expected to use to identify messages on their error ports.

To be described…

F Examples

This appendix contains some examples…

Consult Section 4, “Component Declarations” for a description of the signatures of the standard components.

1 Pipeline for Figure 1

<p:pipeline name="fig1" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source" sequence="no"/>
  <p:input port="schemaDoc" sequence="yes"/>
  <p:output port="result" sequence="no"/>

  <p:xinclude name="s1">
    <p:input port="source">
      <p:pipe step="fig1" port="source"/>
    </p:input>
  </p:xinclude>

  <p:validate name="s2">
    <p:input port="schema">
      <p:pipe step="fig1" port="schemaDoc"/>
    </p:input>
  </p:validate>
</p:pipeline>

2 Pipeline for Figure 2

<p:pipeline xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source" sequence="no"/>
  <p:output port="result" sequence="no"/>

  <div xmlns="http://www.w3.org/1999/xhtml">
    <p>This is documentation</p>
  </div>

  <p:choose name="vcheck">
    <p:when test="/*[@version &lt; 2.0]">
      <p:output port="valid"/>
      <p:validate name="val1">
        <p:input port="schema">
          <p:document href="v1schema.xsd"/>
        </p:input>
      </p:validate>
    </p:when>

    <p:otherwise>
      <p:output port="valid"/>
      <p:validate name="val2">
        <p:input port="schema">
          <p:document href="v2schema.xsd"/>
        </p:input>
      </p:validate>
    </p:otherwise>
  </p:choose>

  <p:xslt name="xform">
    <p:input port="stylesheet">
      <p:document href="stylesheet.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>