XProc: An XML Pipeline Language

W3C Working Draft 5 April 2007

This Version: http://www.w3.org/TR/2007/WD-xproc-20070405/
Latest Version: http://www.w3.org/TR/xproc/
Previous versions: http://www.w3.org/TR/2006/WD-xproc-20061117/ http://www.w3.org/TR/2006/WD-xproc-20060928/
Editors: Norman Walsh, Sun Microsystems, Inc. <Norman.Walsh@Sun.COM>; Alex Milowski, Invited expert <alex@milowski.org>

This document is also available in these non-normative formats: XML

Abstract

This specification describes the syntax and semantics of XProc: An XML Pipeline Language, a language for describing operations to be performed on XML documents.

An XML Pipeline specifies a sequence of operations to be performed on one or more XML documents, producing one or more XML documents as output.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was produced by the XML Processing Model Working Group which is part of the XML Activity. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This is a public Working Draft. This draft addresses many, but not all, of the design questions that were incomplete in previous drafts. The library of standard steps, both required and optional, is still being reviewed and considered. The Working Group continues to encourage feedback from potential users. No useful revision marks from the previous Working Draft are available due to significant editorial reorganization of the material.

Please send comments about this document to public-xml-processing-model-comments@w3.org (public archives are available).

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1 Introduction

2 Pipeline Concepts

2.1 Steps
2.2 Inputs and Outputs
2.3 Options and parameters
2.4 Connections
2.5 Environment

3 Syntax Overview

3.1 Scoping of Names
3.2 Global Attributes
3.3 Associating Documents with Ports
3.4 Ignored namespaces
3.5 Documentation
3.6 Extension attributes
3.7 Extension elements
3.8 Syntax Summaries

4 Steps

4.1 Pipeline
4.2 For-Each
4.3 Viewport
4.4 Choose
4.5 Group
4.6 Try/Catch
4.7 Other Steps

5 Other pipeline elements

5.1 p:input Element

5.2 p:iteration-source Element

5.3 p:viewport-source Element

5.4 p:xpath-context Element

5.5 p:output Element

5.6 p:option Element

5.7 p:parameter Element

5.7.1 Declaring Parameters
5.7.2 Using Parameters
5.7.3 Assigning Values to Parameters

5.8 p:import-parameter Element

5.9 p:declare-step Element

5.10 p:pipeline-library Element

5.11 p:import Element

5.12 p:pipe Element

5.13 p:inline Element

5.14 p:document Element

5.15 p:doc Element

6 Errors

6.1 Static Errors
6.2 Dynamic Errors

Appendices

A Standard Step Library

1 Required Components

1.1 Identity
1.2 Join Sequences
1.3 Load
1.4 Store
1.5 Subsequence
1.6 XInclude
1.7 XSLT
1.8 Serialize
1.9 Parse

2 Optional Components

2.1 Http Request
2.2 Relax NG Validate
2.3 XML Schema Validate
2.4 XSLT 2.0
2.5 XSL Formatter
2.6 XQuery 1.0

3 Micro-Operations Components

3.1 Delete
3.2 Insert
3.3 Label Elements
3.4 Namespace Rename
3.5 Rename
3.6 Replace
3.7 Set-attributes
3.8 Unwrap
3.9 Wrap

B References

C Glossary

D Pipeline Language Summary

E The Error Vocabulary

1 Introduction

An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output.

A pipeline consists of steps. Like pipelines, steps take zero or more XML documents as their input and produce zero or more XML documents as their output. The inputs to a step come from the web, from the pipeline document, from the inputs to the pipeline itself, or from the outputs of other steps in the pipeline. The outputs from a step are consumed by other steps, are outputs of the pipeline as a whole, or are discarded.

There are two kinds of steps: atomic steps and compound steps. Atomic steps carry out single operations and have no substructure as far as the pipeline is concerned, whereas compound steps include steps within themselves.

This specification defines a standard library of steps in Appendix A, Standard Step Library. Pipeline implementations may support additional types of steps as well.

Figure 1, “A simple, linear XInclude/Validate pipeline” is a graphical representation of a simple pipeline that performs XInclude processing and validation on a document.

Figure 1. A simple, linear XInclude/Validate pipeline

This is a pipeline that consists of two atomic steps, XInclude and Validate. The pipeline itself has two inputs, “Document” and “Schema”. How these inputs are connected to XML documents outside the pipeline is implementation-defined. The XInclude step reads the pipeline input “Document” and produces a result document. The Validate step reads the pipeline input “Schema” and the output from the XInclude step and produces a result document. The result of the validation, “Result Document”, is the result of the pipeline. How pipeline outputs are connected to XML documents outside the pipeline is implementation-defined.

The pipeline document for this pipeline is shown in Example 1, “A simple, linear XInclude/Validate pipeline”.

Example 1. A simple, linear XInclude/Validate pipeline

<p:pipeline name="fig1" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source" sequence="no"/>
  <p:input port="schemaDoc" sequence="yes"/>
  <p:output port="result" sequence="no"/>

  <p:xinclude name="s1">
    <p:input port="source">
      <p:pipe step="fig1" port="source"/>
    </p:input>
  </p:xinclude>

  <p:validate-xml-schema name="s2">
    <p:input port="schema">
      <p:pipe step="fig1" port="schemaDoc"/>
    </p:input>
  </p:validate-xml-schema>
</p:pipeline>

Figure 2, “A validate and transform pipeline” is a more complex example: it performs schema validation with an appropriate schema and then styles the validated document.

Figure 2. A validate and transform pipeline

The heart of this example is the conditional. The “choose” step evaluates an XPath expression over a test document. Based on the result of that expression, one or another branch is evaluated. In this example, each branch consists of a single validate step.

Example 2. A validate and transform pipeline

<p:pipeline xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source" sequence="no"/>
  <p:output port="result" sequence="no"/>

  <div xmlns="http://www.w3.org/1999/xhtml">
    <p>This is documentation</p>
  </div>

  <p:choose name="vcheck">
    <p:when test="/*[@version &lt; 2.0]">
      <p:output port="valid"/>
      <p:validate-xml-schema name="val1">
        <p:input port="schema">
          <p:document href="v1schema.xsd"/>
        </p:input>
      </p:validate-xml-schema>
    </p:when>

    <p:otherwise>
      <p:output port="valid"/>
      <p:validate-xml-schema name="val2">
        <p:input port="schema">
          <p:document href="v2schema.xsd"/>
        </p:input>
      </p:validate-xml-schema>
    </p:otherwise>
  </p:choose>

  <p:xslt name="xform">
    <p:input port="stylesheet">
      <p:document href="stylesheet.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>

2 Pipeline Concepts

[Definition: A pipeline is a set of steps connected together, with outputs flowing into inputs, without any loops (no step can read its own output, directly or indirectly).] A pipeline is itself a step and must satisfy the constraints on steps.

The result of evaluating a pipeline is the result of evaluating the steps that it contains, in the order determined by the connections between them. A pipeline must behave as if it evaluated each step each time it occurs. Unless otherwise indicated, implementations must not assume that steps are functional (that is, that their outputs depend only on their explicit inputs, options, and parameters) or side-effect free.

2.1 Steps

[Definition: A step is the basic computational unit of a pipeline. Steps are either atomic or compound.] [Definition: An atomic step is a step that performs a unit of XML processing, such as XInclude or transformation.] Atomic steps can perform arbitrary amounts of computation but they are indivisible. Atomic steps carry out fundamental XML operations. An XSLT step, for example, performs XSLT processing; an XML Schema Validation step validates one input with respect to some set of XML Schemas, etc.

Compound steps, on the other hand, control and organize the flow of documents through a pipeline, reconstructing familiar programming language functionality such as conditionals, iterators and exception handling. They contain other steps, whose evaluation they control.

[Definition: A compound step is a step that contains additional steps. That is, a compound step differs from an atomic step in that its semantics are at least partially determined by the steps that it contains.]

Every compound step contains zero or more steps. [Definition: The steps that occur directly inside a compound step are called contained steps.] [Definition: A compound step which immediately contains another step is called its container.]

[Definition: The steps (and the connections between them) within a compound step form a subpipeline.] [Definition: The last step in a subpipeline is the last step in document order within its container. ]

A compound step can contain one or more subpipelines and it determines how, and which, if any, of its subpipelines are evaluated.

Steps have “ports” into which inputs and outputs are connected. Each step has a number of input ports and a number of output ports, all with unique names. A step can have zero input ports and/or zero output ports. (All steps also have an implicit standard output port for reporting errors; this port must not be declared.)

Steps have any number of options, all with unique names. A step can have zero options.

Steps have any number of parameters, all with unique names. A step can have zero parameters.

2.2 Inputs and Outputs

Although some steps can read and write non-XML resources, what flows between steps through input ports and output ports are exclusively XML documents or sequences of XML documents. Each XML document (or document in a sequence) must conceptually be an [Infoset] with a Document Information Item at its root. The inputs and outputs can be implemented as sequences of characters, events, or object models, or any other representation the implementation chooses.

It is a dynamic error if a non-XML resource is produced on a step output or arrives on a step input.

Editorial Note

What about the cases where it's impractical to test for this error?

An implementation may make it possible for a step to produce non-XML output (through channels other than a named output port)—for example, writing a PDF document to disk—but that output cannot flow through the pipeline. Similarly, one can imagine a step that takes no pipeline inputs, reads a non-XML file from a URI, and produces an XML output. But the non-XML file cannot arrive on an input port to a step or pipeline.

The common case is that each step has one or more inputs and one or more outputs. Figure 3, “An atomic step” illustrates symbolically an atomic step with two inputs and one output.

Figure 3. An atomic step

Each step declares its input and output ports. [Definition: The input ports declared on a step are its declared inputs.] [Definition: The output ports declared on a step are its declared outputs.]

All of the declared inputs of a step (atomic or compound) must be connected. Inputs may be connected to:

The output port of some other step.
A fixed, inline document or sequence of documents.
A document read from a URI.
The inputs declared on the top-level pipeline step, and only on that step, may be connected to documents outside the pipeline by an implementation-dependent mechanism.

Unconnected output ports are allowed; any documents produced on those ports are simply discarded.

For atomic steps, the inputs and outputs are declared once (with p:declare-step) for each type of atomic step and every instance of that type has the same inputs and outputs. For example, every XSLT step has exactly the same inputs and outputs with the same names.

The situation is slightly more complicated for compound steps. Compound steps don't typically have declared inputs, but they do have declared outputs. Unlike atomic steps, on compound steps, the number and names of the outputs can be different on each instance of the step.

Figure 4, “A compound step” illustrates symbolically a compound step with one output. As you can see from the diagram, the output from the compound step comes from one of the outputs of the subpipeline within the step.

Figure 4. A compound step

Output ports on compound steps have a dual nature: from the perspective of the compound step's siblings, its outputs are just outputs and they can either be connected up or not. From the perspective of the compound step itself, they are inputs into which something must be connected.

Within a compound step, the declared outputs of a compound step can be connected to:

The output port of some other contained step.
A fixed, inline document or sequence of documents.
A document read from a URI.

Each step may have a default output port. [Definition: If a step has exactly one output port, or if one of its output ports is explicitly designated as the default, then that output port is the default output port of the step.] If a step has more than one output port and none is explicitly designated the default, then the default output port of that step is undefined.

Each input and output is declared to accept or produce either a single document or a sequence of documents. It is not a static error to connect a port that is declared to produce a sequence of documents to a port that is declared to accept only a single document. It is, however, a dynamic error if the former step actually produces more than a single document at run time.
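As an illustration of this rule, the following sketch connects a port declared to produce a sequence to one declared to accept a single document. The step and port names follow the examples elsewhere in this document. That the Identity step copies every document from its “source” port to its “result” port is an assumption made for the sake of the example; it is not established in this section.

```xml
<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc">
  <!-- The pipeline may receive any number of documents on "source" -->
  <p:input port="source" sequence="yes"/>
  <!-- The pipeline promises exactly one document on "result" -->
  <p:output port="result" sequence="no"/>

  <!-- Assumption: Identity copies whatever arrives on "source" to "result" -->
  <p:identity name="copy">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:identity>
</p:pipeline>
```

Under that assumption, this pipeline raises no static error. If a single document arrives on “source” at run time, evaluation can proceed normally; if more than one document reaches the “result” port, a dynamic error results.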

[Definition: The signature of a step is the set of inputs, outputs, options, and parameters that it is declared to accept.] Each atomic step (e.g. XSLT or XInclude) has a fixed signature, declared globally or built-in, which all its instances share, whereas each compound step has its own signature declared locally.

[Definition: A step matches its signature if and only if it specifies an input for each declared input and it specifies no inputs that are not declared; it specifies no options that are not declared; it specifies a parameter for each parameter that is declared to be required; and it specifies no parameters that are not declared.] In other words, every input and required parameter must be specified and only inputs, outputs, options, and parameters that are declared may be specified. Outputs and optional parameters do not have to be specified.

Steps may also produce error, warning, and informative messages. These messages appear on a special “error output” port defined (only) in the catch clause of a try/catch.

2.3 Options and parameters

Some steps accept options and/or parameters. Both are name/value pairs. The distinction between options and parameters is simply that options are used by the XProc processor to configure the step; the step never sees them. Parameters are passed to the step for its use; the XProc processor does not use them.

[Definition: An option is a name/value pair.] The name of an option must be an expanded name. The value of an option must be a string. If a document, node, or other value is given, its [XPath 1.0] string value is computed and that string is used.

[Definition: The options declared on a step are its declared options.]

[Definition: A parameter is a name/value pair.] The name of a parameter must be an expanded name. The value of a parameter must be a string. If a document, node, or other value is given, its [XPath 1.0] string value is computed and that string is used.

[Definition: The parameters declared on a step are its declared parameters.]

2.4 Connections

Steps are connected together by their input ports and output ports. It is a static error if there are any loops in the connections between steps: no step can be connected to itself nor can there be any sequence of connections through other steps that leads back to itself.
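A minimal sketch of a pipeline that violates this constraint is shown below. The step names “a” and “b” are illustrative only. Each step reads the other's “result” port, so each step indirectly reads its own output.

```xml
<!-- NOT a valid pipeline: the connections form a loop -->
<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:identity name="a">
    <p:input port="source">
      <!-- "a" reads from "b" ... -->
      <p:pipe step="b" port="result"/>
    </p:input>
  </p:identity>

  <p:identity name="b">
    <p:input port="source">
      <!-- ... and "b" reads from "a", closing the loop -->
      <p:pipe step="a" port="result"/>
    </p:input>
  </p:identity>
</p:pipeline>
```

Because the loop is visible in the pipeline document itself, a processor can detect it without evaluating any step, which is why the error is static rather than dynamic.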

2.5 Environment

[Definition: The environment of a step is the static information available to that step.]

The environment consists of:

A set of readable ports. [Definition: The readable ports are the step name/output port name pairs that are visible to the step.] Inputs and outputs can only be connected to readable ports.
A set of in-scope parameters. [Definition: The in-scope parameters are the set of parameters that can be read by a step.] All of the in-scope parameters are available to the processor for computing actual parameters. The actual parameters passed to a step are those that are explicitly identified with p:parameter or p:import-parameter tags on the actual step.
A default readable port. [Definition: The default readable port, which may be undefined, is a specific step name/port name pair from the set of readable ports.]
A set of ignored namespaces. [Definition: The set of ignored namespaces is the set of namespaces which do not identify steps.]

[Definition: The empty environment contains no readable ports, no in-scope parameters, an undefined default readable port, and an empty set of ignored namespaces.]

Unless otherwise specified, the environment of each step is its inherited environment, the environment of its parent, with the following standard modifications:

All of the declared parameters of the step are added to the in-scope parameters in the environment.
If any ignored namespaces are specified, those namespaces are added to the set of ignored namespaces in the environment.
If there is a preceding sibling step element:
- If that preceding sibling has exactly one output port, or an output port designated as the default, then that output port becomes the default readable port.
- Otherwise, the default readable port is undefined.
If there is not a preceding sibling step element, the default readable port is unchanged from its inherited environment.

A step with no parent inherits the empty environment.

Unless otherwise specified, the inherited environment for the contained steps of a compound step is the standard inheritance, which is the environment of that compound step, with the following modification:

The union of all the declared outputs of the contained steps is added to the readable ports in the environment.

In other words, sibling steps can see each other's outputs in addition to the outputs of their ancestors.
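The following annotated sketch traces how the default readable port changes from step to step under these rules. It makes two assumptions that are not established in this section: that the XInclude step has exactly one output port, named “result”, and that an input with no explicit binding is read from the default readable port, which is how Example 3 appears to connect its steps.

```xml
<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc">
  <!-- Exactly one declared input, so the contained steps inherit
       main/source as their default readable port -->
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- No preceding sibling step: the default readable port is still
       main/source -->
  <p:xinclude name="expand"/>

  <!-- Preceding sibling "expand" is assumed to have exactly one output
       port, so the default readable port is now expand/result -->
  <p:xslt name="style">
    <p:input port="stylesheet">
      <p:document href="stylesheet.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
```

The readable ports of “style” also include main/source, because the outputs of ancestors remain visible; only the default changes.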

3 Syntax Overview

This section describes the normative XML syntax of XProc. This syntax is sufficient to represent all the aspects of a pipeline, as set out in the preceding sections.

The namespace of the XProc XML vocabulary described by this specification is http://www.w3.org/2007/03/xproc.

Elements in a pipeline document represent the pipeline, the steps it contains, the connections between those steps, the steps and connections contained within them, and so on. Each step is represented by an element; a combination of elements and attributes specify how the inputs and outputs of each step are connected and how options and parameters are passed.

Conceptually, we can speak of steps as objects that have inputs and outputs, that are connected together and which may contain additional steps. Syntactically, we need a mechanism for specifying these relationships.

Containment is represented naturally using nesting of XML elements. If a particular element identifies a compound step then the step elements that are its immediate children form its subpipeline.

The connections between steps are expressed using names and references to those names.

Six kinds of things are named in XProc:

Step types,
Steps,
Input ports,
Output ports,
Options, and
Parameters

3.1 Scoping of Names

The scope of the names of step types is the pipeline. Each pipeline processor has some number of built in step types and may declare (directly, or by reference to an external library) additional step types.

The scope of the names of the steps themselves is determined by the environment of each step. In general, the name of a step, the names of its sibling steps, the names of any steps that it contains directly, the names of its ancestors, and the names of its ancestors' siblings are all in the same scope. All in-scope steps must have unique names: it is a static error if two steps with the same name appear in the same scope.

The scope of an input or output port name is the step on which it is defined. The names of all the ports on any step must be unique.

Taken together, these uniqueness constraints guarantee that the combination of a step name and a port name uniquely identifies exactly one port on exactly one in-scope step.

The scope of option names is the step on which they appear.

The scope of parameter names is essentially the same as the scope of step names, with the following caveat. Whereas step names must be unique, parameter names may be repeated. The declaration of a parameter on a step shadows any declaration that may already be in-scope.

3.2 Global Attributes

The following attributes may appear on any element in a pipeline:

The attribute xml:id with the semantics outlined in [xml:id].
The attribute xml:base with the semantics outlined in [XML Base].

The following attributes may appear on any step element:

The attribute p:ignore-prefixes with the semantics outlined in Section 3.4, “Ignored namespaces”.

3.3 Associating Documents with Ports

[Definition: A binding associates an input or output port with some data source.] A document or a sequence of documents can be bound to a port in three ways: by source, by URI, or by providing it inline. Each of these mechanisms is supported on the p:input, p:output, p:xpath-context, p:iteration-source, and p:viewport-source elements.

Specified by URI

[Definition: A document is specified by URI if it is referenced with a URI.] The href attribute on the p:document element is used to refer to documents by URI.

In this example, the input to the Identity step named “otherstep” comes from “http://example.com/input.xml”.

<p:identity name="otherstep">
  <p:input port="source">
    <p:document href="http://example.com/input.xml"/>
  </p:input>
</p:identity>

</p:pipeline>

It is a dynamic error if the processor attempts to retrieve the URI specified on a p:document and fails. (For example, if the resource does not exist or is not accessible with the user's authentication credentials.)

Specified by source

[Definition: A document is specified by source if it references a specific port on another step.] The step and port attributes on the p:pipe element are used for this purpose. (The step attribute may refer to any kind of step, either an atomic step or a compound step, its name notwithstanding.)

In this example, the “source” input to the XInclude step named “expand” comes from the “result” port of the step named “otherstep”.

<p:xinclude name="expand">
  <p:input port="source">
    <p:pipe step="otherstep" port="result"/>
  </p:input>
</p:xinclude>

</p:pipeline>

When a pipe is used, the specified port must be in the readable ports of the current environment. It is a static error if the port specified by a p:pipe is not in the readable ports of the environment.

Specified inline

[Definition: An inline document is specified directly in the body of the element that binds it.] The content of the p:inline element is used for this purpose.

In this example, the “stylesheet” input to the XSLT step named “xform” comes from the content of the p:inline element inside the corresponding p:input element.

<p:xslt name="xform">
  <p:input port="source">
    <p:pipe step="expand" port="result"/>
  </p:input>
  <p:input port="stylesheet">
    <p:inline>
      <xsl:stylesheet version="1.0">
        ...
      </xsl:stylesheet>
    </p:inline>
  </p:input>
</p:xslt>

Inline documents are considered “quoted”: they are not interpolated or available to the pipeline processor in any way except as documents flowing through the pipeline.

Note that a p:input or p:output element may contain more than one p:pipe, p:document, or p:inline element. If more than one binding is provided, then the specified sequence of documents is made available on that port.
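A sketch of a port with more than one binding follows. The pipeline output “result” is bound both by source and by URI, so a sequence of two documents is made available on it. The step name “expanded” and the URI are illustrative only.

```xml
<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source"/>

  <!-- Declared as a sequence because two bindings are provided -->
  <p:output port="result" sequence="yes">
    <p:pipe step="expanded" port="result"/>
    <p:document href="http://example.com/input.xml"/>
  </p:output>

  <p:xinclude name="expanded">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:xinclude>
</p:pipeline>
```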

3.4 Ignored namespaces

The element children of a compound step fall into four classes: elements that provide bindings for input and output ports, elements that provide bindings for options and parameters, other elements that identify steps that are part of its subpipeline, and extension elements. Extension elements may be used for documentation or to provide additional information for a specific processor.

To determine which elements are extension elements and which are expected to identify steps, a set of ignored namespaces is maintained in the environment of each step.

The ignored namespaces are a set of namespaces which do not identify steps. They are ignored by the processor unless the processor happens to recognize one or more of them as extension elements. The initial set of ignored namespaces is empty.

Syntactically, a pipeline author can add namespaces to the set of ignored namespaces with the p:ignore-prefixes attribute. This attribute can appear on any element in the pipeline namespace which identifies a step. It is a static error if the p:ignore-prefixes attribute appears on any element which does not identify a step.

The value of the p:ignore-prefixes attribute is a sequence of tokens, each of which must be the prefix of an in-scope namespace. It is a static error if any token specified in the p:ignore-prefixes attribute is not the prefix of an in-scope namespace.

Each of the namespaces identified by the specified prefix is added to the set of ignored namespaces in the environment of the step on which the attribute occurs.
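The sketch below shows one way the attribute might be used. The “ex” prefix, its namespace URI, and the ex:note element are hypothetical and stand for any processor-specific or documentation vocabulary.

```xml
<p:pipeline name="main"
            xmlns:p="http://www.w3.org/2007/03/xproc"
            xmlns:ex="http://example.com/ns/extensions"
            p:ignore-prefixes="ex">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- In an ignored namespace: an extension element, not a step -->
  <ex:note>Reviewed by the build team.</ex:note>

  <p:xinclude name="expand">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:xinclude>
</p:pipeline>
```

If the xmlns:ex declaration were removed, the token “ex” would no longer be the prefix of an in-scope namespace and the p:ignore-prefixes attribute would be a static error.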

3.5 Documentation

Pipeline authors may add documentation to their pipeline documents with the p:doc element. Except when it appears as a descendant of p:inline, the p:doc element is completely ignored by pipeline processors; it exists simply for documentation purposes. (If a p:doc is provided as a descendant of p:inline, it has no special semantics; it is treated literally as part of the document to be provided on that port.)

Pipeline processors that inspect the contents of p:doc elements and behave differently on the basis of what they find are not conformant. Processor extensions must be specified with extension elements.

3.6 Extension attributes

[Definition: An element from the XProc namespace may have any attribute not from the XProc namespace, provided that the expanded-QName of the attribute has a non-null namespace URI. Such an attribute is called an extension attribute.] The presence of an extension attribute must not cause the connections between steps to differ from the connections that any other conformant XProc processor would produce. They must not cause the processor to fail to signal an error that a conformant processor is required to signal. This means that an extension attribute must not change the effect of any XProc element except to the extent that the effect is implementation-defined or implementation-dependent.

A processor which encounters an extension attribute that it does not recognize must behave as if the attribute was not present.
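For illustration, the step below carries a hypothetical extension attribute, ex:trace, in a hypothetical namespace. Neither is defined by this specification.

```xml
<p:xslt name="xform"
        xmlns:p="http://www.w3.org/2007/03/xproc"
        xmlns:ex="http://example.com/ns/extensions"
        ex:trace="yes">
  <p:input port="stylesheet">
    <p:document href="stylesheet.xsl"/>
  </p:input>
</p:xslt>
```

A processor that recognizes ex:trace might, for example, log additional information while evaluating the step. A processor that does not recognize it evaluates the step exactly as if the attribute were absent, and in both cases the connections between steps are the same.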

3.7 Extension elements

The presence of an extension element must not cause the connections between steps to differ from the connections that any other conformant XProc processor would produce. They must not cause the processor to fail to signal an error that a conformant processor is required to signal. This means that an extension element must not change the effect of any XProc element except to the extent that the effect is implementation-defined or implementation-dependent.

There are three contexts in which an extension element might occur:

In an inline document. All elements in an inline document are considered quoted; no extension element can occur.
In a subpipeline. In a subpipeline, any element in a namespace that is in the set of ignored namespaces is an extension element. Every other element identifies a step.
In any other context, any element that is not in the pipeline namespace is an extension element.

3.8 Syntax Summaries

The description of each element in the pipeline namespace is accompanied by a syntactic summary that provides a quick overview of the element's syntax:

<p:some-element some-attribute? = some-type> ((some | elements | allowed)*, other-elements?) </p:some-element>

For clarity of exposition, some attributes and elements are elided from the summaries:

An xml:id attribute is allowed on any element. It has the semantics of [xml:id].
An xml:base attribute is allowed on any element. It has the semantics of [XML Base].
The p:doc element is not shown, but it is allowed anywhere.

4 Steps

This section describes the core steps of XProc.

Every compound step in a pipeline has six parts: a set of inputs, a set of outputs, a set of options, a set of parameters, a set of contained steps, and an environment.

Editorial Note

In previous drafts, inputs, outputs, options, and parameters occurred in a fixed order. In this draft, they may appear in any order (but before the contained steps). Is that an improvement?

Except where otherwise noted, a compound step can have an arbitrary number of inputs, outputs, options, parameters, and contained steps.

4.1 Pipeline

A Pipeline is specified by the p:pipeline element. It encapsulates the behavior of a subpipeline. Its children declare the inputs, outputs, and parameters that the pipeline exposes and identify the steps in its subpipeline.

A pipeline can declare additional steps (e.g., ones that are provided by a particular implementation or in some implementation-defined way) and import other pipelines.

<p:pipeline name? = NCName p:ignore-prefixes? = prefix list> ((p:input | p:output | p:parameter | p:import* | p:declare-step*)*, subpipeline) </p:pipeline>

Viewed from the outside, a pipeline is a black box which performs some calculation on its inputs and produces its outputs. From the pipeline author's perspective, the computation performed by the pipeline is described in terms of contained steps which read the pipeline's inputs and produce the pipeline's outputs.

The environment of a pipeline is its inherited environment with the standard modifications.

The environment inherited by its contained steps is the empty environment with these modifications:

All of the declared inputs of the pipeline are added to the readable ports in the environment.
If the pipeline has exactly one input, that input is the default readable port, otherwise the default readable port is undefined.
All of the declared parameters of the pipeline are added to the in-scope parameters in the environment.
If any ignored namespaces are specified, those namespaces are added to the set of ignored namespaces in the environment.

If there is no binding for any of the declared outputs of the pipeline, then those outputs are bound to the default output port of the last step in the subpipeline. It is a static error if an output is bound to the default output port and the default output port is undefined.
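The sketch below spells out the binding that this rule would otherwise supply. It assumes that the XInclude step has a single output port named “result”, which is therefore its default output port; that assumption is not established in this section.

```xml
<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source"/>

  <!-- Explicit binding. If the p:pipe were omitted, "result" would be
       bound to the default output port of the last step, "expand" -->
  <p:output port="result">
    <p:pipe step="expand" port="result"/>
  </p:output>

  <p:xinclude name="expand">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:xinclude>
</p:pipeline>
```

If the last step instead had several output ports and none designated as the default, leaving the pipeline output unbound would be a static error, and an explicit binding like the one shown would be required.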

There are two additional constraints on pipelines:

A p:pipeline must not itself be a contained step.
If a p:pipeline is part of a p:pipeline-library or if it is imported directly with p:import, then it must have a name.

4.1.1 Examples

A pipeline might accept a document and a stylesheet as input; perform XInclude, validation, and transformation; and produce a sequence of formatted documents as its output.

Example 3. A Sample Pipeline Document

<p:pipeline name="pipeline" xmlns:p="http://www.w3.org/2007/03/xproc">
<p:input port="document"/>
<p:input port="stylesheet"/>
<p:output port="result"/>

<p:xinclude>
  <p:input port="source">
    <p:pipe step="pipeline" port="document"/>
  </p:input>
</p:xinclude>

<p:validate-xml-schema>
  <p:input port="schema">
    <p:document href="http://example.com/path/to/schema.xsd"/>
  </p:input>
</p:validate-xml-schema>

<p:xslt>
  <p:input port="stylesheet">
    <p:pipe step="pipeline" port="stylesheet"/>
  </p:input>
</p:xslt>

</p:pipeline>

4.2 For-Each

A For-Each is specified by the p:for-each element. It processes a sequence of documents, applying its subpipeline to each document in turn.

<p:for-each name? = NCName select? = xpath expression p:ignore-prefixes? = prefix list> (p:iteration-source?, (p:output | p:parameter)*, subpipeline) </p:for-each>

When a pipeline needs to process a sequence of documents using a step that accepts only a single document, the for-each construct can be used as a wrapper around that step. The for-each will apply the step to each document in the sequence in turn.

The result of the for-each is a sequence of documents produced by processing each individual document in the input sequence. If the subpipeline is connected to one or more output ports on the for-each, what appears on each of those ports is the sequence of documents produced by each iteration of the loop.

The p:iteration-source is an anonymous input: its binding provides a sequence of documents to the for-each step. If no iteration sequence is explicitly provided, then the iteration source is read from the default readable port.

A portion of each input document can be selected using the select attribute. If no selection is specified, the document node of each document is selected.

Each subtree selected by the p:for-each from each of the inputs that appear on the iteration source is wrapped in a document node and provided to the subpipeline.

The processor provides each document, one at a time, to the subpipeline represented by the children of the p:for-each on a port named current.

For each declared output, the processor collects all the documents that are produced for that output from all the iterations, in order, into a sequence. The result of the p:for-each on that output is that sequence of documents.

The environment of a for-each is its inherited environment with the standard modifications.

The environment inherited by its contained steps is the standard inheritance with these modifications:

The port named “current” on the p:for-each is added to the readable ports.
The port named “current” on the p:for-each is made the default readable port.

If there is no binding for any of the declared outputs of the for-each, then those outputs are bound to the default output port of the last step in the subpipeline. It is a static error if an output is bound to the default output port and the default output port is undefined.

4.2.1 Examples

A for-each might accept a sequence of chapters as its input, process each chapter in turn with XSLT, a step that accepts only a single input document, and produce a sequence of formatted chapters as its output.

Example 4. A Sample For-Each

<p:for-each name="chapters" select="//chapter">
  <p:output port="html-results">
    <p:pipe step="make-html" port="result"/>
  </p:output>
  <p:output port="fo-results">
    <p:pipe step="make-fo" port="result"/>
  </p:output>

  <p:xslt name="make-html">
    <p:input port="stylesheet">
      <p:document href="http://example.com/xsl/html.xsl"/>
    </p:input>
  </p:xslt>

  <p:xslt name="make-fo">
    <p:input port="stylesheet">
      <p:document href="http://example.com/xsl/fo.xsl"/>
    </p:input>
  </p:xslt>
</p:for-each>

The //chapter elements of the document are selected. Each chapter is transformed into HTML and XSL Formatting Objects using an XSLT step. The resulting HTML and FO documents are aggregated together and appear on the html-results and fo-results ports, respectively, of the chapters step itself.
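
As a further, non-normative sketch, a for-each can also read from an explicit iteration source instead of the default readable port. The fragment below assumes an enclosing pipeline named “pipeline” with an input port named “document” (as in Example 3); the step name “sections” and the stylesheet URI are hypothetical. The p:xslt step leaves its source unbound, so it reads from the port named current, and the unbound result output is bound to the default output port of the last step.

```xml
<p:for-each name="sections">
  <!-- Explicit iteration source: one document per selected section -->
  <p:iteration-source select="//section">
    <p:pipe step="pipeline" port="document"/>
  </p:iteration-source>

  <!-- sequence="yes" because one result is collected per iteration -->
  <p:output port="result" sequence="yes"/>

  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="http://example.com/xsl/section.xsl"/>
    </p:input>
  </p:xslt>
</p:for-each>
```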

4.3 Viewport

A Viewport is specified by the p:viewport element. It processes a single document, applying its subpipeline to one or more subsections of the document.

<p:viewport name? = NCName match = xpath expression p:ignore-prefixes? = prefix list> (p:viewport-source?, p:output, p:parameter*, subpipeline) </p:viewport>

The result of the viewport is a copy of the original document with the selected subsections replaced by the results of applying the subpipeline to them.

The p:viewport-source is an anonymous input: its binding provides a single document to the viewport step. If no document is explicitly provided, then the viewport source is read from the default readable port.

The match attribute specifies an [XPath 1.0] expression that is a Pattern in [XSLT 1.0]. Each matching node in the source document is wrapped in a document node and provided to the viewport's subpipeline. After a node has been matched, its descendants are not considered for further matching. In other words, a node is passed at most once to the subpipeline.

The processor provides each document, one at a time, to the subpipeline represented by the children of the p:viewport on a port named current.

What appears on the output from the p:viewport will be a copy of the input document where each matching node is replaced by the result of applying the subpipeline to the subtree rooted at that node.

It is a dynamic error if the viewport source is a sequence of more than one document or if the output from any iteration is a sequence of more than one document.

The environment of a viewport is its inherited environment with the standard modifications.

The environment inherited by its contained steps is the standard inheritance with these modifications:

The port named “current” on the p:viewport is added to the readable ports.
The port named “current” on the p:viewport is made the default readable port.

If there is no binding for any of the declared outputs of the viewport, then those outputs are bound to the default output port of the last step in the subpipeline. It is a static error if an output is bound to the default output port and the default output port is undefined.

4.3.1 Examples

A viewport might accept an XHTML document as its input, add an hr element at the beginning of all div elements that have the class value “chapter”, and return an XHTML document that is the same as the original except for that change.

Example 5. A Sample Viewport

<p:viewport match="h:div[@class='chapter']">
  <p:output port="result"/>
  <p:insert>
    <p:input port="insertion">
      <p:inline>
        <hr xmlns="http://www.w3.org/1999/xhtml"/>
      </p:inline>
    </p:input>
    <p:option name="at-start" value="true"/>
  </p:insert>
</p:viewport>

The nodes which match h:div[@class='chapter'] (according to the rules of [XSLT 1.0]) in the input document are selected. An hr is inserted as the first child of each h:div and the resulting version replaces the original h:div. The result of the whole step is a copy of the input document with a horizontal rule at the start of each selected h:div.
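
The rule that the descendants of a matched node are not considered for further matching can be illustrated with a small, hypothetical input. This is a non-normative sketch; it assumes the h prefix is bound to the XHTML namespace and that the viewport of Example 5 is applied to the following document:

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="chapter">
      <h1>One</h1>
      <div class="chapter">
        <p>Nested</p>
      </div>
    </div>
    <div class="note">
      <p>Aside</p>
    </div>
  </body>
</html>
```

Only the outer chapter div is passed to the subpipeline. The nested chapter div is a descendant of a node that has already been matched, so it is not processed separately, and the note div does not match at all. Under those assumptions the result would be (indentation adjusted for readability):

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="chapter">
      <hr/>
      <h1>One</h1>
      <div class="chapter">
        <p>Nested</p>
      </div>
    </div>
    <div class="note">
      <p>Aside</p>
    </div>
  </body>
</html>
```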

4.4 Choose

A Choose is specified by the p:choose element. It selects exactly one of a list of alternative subpipelines based on the evaluation of [XPath 1.0] expressions.

<p:choose name? = NCName p:ignore-prefixes? = prefix list> (p:xpath-context?, p:when*, p:otherwise?) </p:choose>

A choose has no inputs. It contains an arbitrary number of alternative subpipelines, exactly one of which will be evaluated.

The list of alternative subpipelines consists of zero or more subpipelines, each guarded by an XPath expression (with an associated context), followed optionally by a single default subpipeline.

The choose considers each subpipeline in turn and selects the first (and only the first) subpipeline for which the guard expression evaluates to true in its context. If there are no subpipelines for which the expression evaluates to true, the default subpipeline, if it was specified, is selected.

After a subpipeline is selected, it is evaluated as if only it had been present.

The result of the choose is the result of the selected subpipeline.

In order to ensure that the result of the choose is consistent irrespective of the subpipeline chosen, each subpipeline must declare the same number of outputs with the same names. It is a static error if two subpipelines in a choose declare different outputs.

It is a dynamic error if no subpipeline is selected by the choose and no default is provided.

The p:choose can specify the context node against which the [XPath 1.0] expressions that occur on each branch are evaluated. The context node is specified as a binding for the xpath-context. If no binding is provided, the default xpath-context is the document on the default readable port.

It is a dynamic error if the xpath-context is bound to a sequence of documents.

The environment of a choose is its inherited environment with the standard modifications.

Each conditional subpipeline is represented by a p:when element.

<p:when test = expression p:ignore-prefixes? = prefix list> (p:xpath-context?, (p:output | p:parameter)*, subpipeline) </p:when>

Each p:when branch of the p:choose has a test attribute which must contain an [XPath 1.0] expression. That XPath expression's effective boolean value is the guard expression for the subpipeline contained within that p:when.

The p:when can specify a context node against which its test expression is to be evaluated. That context node is specified as a binding for the xpath-context. If no context is specified on the p:when, the context of the p:choose is used. It is a static error if no context is specified in either the p:choose or the p:when and the default readable port is undefined.

The default branch is represented by a p:otherwise element.

<p:otherwise> ((p:output | p:parameter)*, subpipeline) </p:otherwise>

The environment of the selected subpipeline is the inherited environment with the standard modifications.

The environment inherited by its contained steps is the standard inheritance.

If there is no binding for any of the declared outputs of the selected subpipeline, then those outputs are bound to the default output port of the last step in the selected subpipeline. It is a static error if an output is bound to the default output port and the default output port is undefined.

4.4.1 Examples

A choose might test the version attribute of the document element of a document and validate with an appropriate schema.

Example 6. A Sample Choose

<p:choose name="version">
  <p:when test="/*[@version = 2]">
    <p:output port="result"/>
    <p:validate-xml-schema>
      <p:input port="schema">
        <p:document href="v2schema.xsd"/>
      </p:input>
    </p:validate-xml-schema>
  </p:when>

  <p:when test="/*[@version = 1]">
    <p:output port="result"/>
    <p:validate-xml-schema>
      <p:input port="schema">
        <p:document href="v1schema.xsd"/>
      </p:input>
    </p:validate-xml-schema>
  </p:when>

  <p:otherwise>
    <p:output port="result"/>
    <p:identity/>
  </p:otherwise>
</p:choose>
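
The context for the guard expressions can also be given once, on the p:choose itself, rather than relying on the default readable port. The following non-normative sketch assumes an enclosing pipeline named “pipeline” with input ports named “document” and “config”; the step name “by-output” and the content of the config document are hypothetical. Because such a pipeline has two inputs, its default readable port is undefined, so each step binds its source explicitly.

```xml
<p:choose name="by-output">
  <!-- Both test expressions are evaluated against the config document -->
  <p:xpath-context>
    <p:pipe step="pipeline" port="config"/>
  </p:xpath-context>

  <p:when test="/config/output = 'fo'">
    <p:output port="result"/>
    <p:xslt>
      <p:input port="source">
        <p:pipe step="pipeline" port="document"/>
      </p:input>
      <p:input port="stylesheet">
        <p:document href="http://example.com/style/fo.xsl"/>
      </p:input>
    </p:xslt>
  </p:when>

  <p:when test="/config/output = 'xhtml'">
    <p:output port="result"/>
    <p:xslt>
      <p:input port="source">
        <p:pipe step="pipeline" port="document"/>
      </p:input>
      <p:input port="stylesheet">
        <p:document href="http://example.com/style/xhtml.xsl"/>
      </p:input>
    </p:xslt>
  </p:when>

  <!-- Default branch: pass the document through unchanged -->
  <p:otherwise>
    <p:output port="result"/>
    <p:identity>
      <p:input port="source">
        <p:pipe step="pipeline" port="document"/>
      </p:input>
    </p:identity>
  </p:otherwise>
</p:choose>
```

Every branch declares the same single output, named result, as the rule on consistent outputs requires.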

4.5 Group

A Group is specified by the p:group element. It encapsulates the behavior of its subpipeline.

<p:group name? = NCName p:ignore-prefixes? = prefix list> ((p:output | p:parameter)*, subpipeline) </p:group>

A group is a convenience wrapper for a collection of steps. The result of a group is the result of its subpipeline.

The environment of a group is its inherited environment with the standard modifications.

The environment inherited by its contained steps is the standard inheritance.

If there is no binding for any of the declared outputs of the group, then those outputs are bound to the default output port of the last step in the subpipeline. It is a static error if an output is bound to the default output port and the default output port is undefined.

4.5.1 Examples

This group simplifies specification of the “profile” parameter to the XSLT step.

Example 7. An Example Group

<p:pipeline name="pipeline" xmlns:p="http://www.w3.org/2007/03/xproc">
<p:input port="document"/>
<p:input port="config"/>
<p:output port="result"/>

<p:group>
  <p:parameter name="profile" select="/config/profile">
    <p:pipe step="pipeline" port="config"/>
  </p:parameter>
  <p:output port="result"/>

  <p:choose>
    <p:when test="/config/output = 'fo'">
      <p:xpath-context>
        <p:pipe step="pipeline" port="config"/>
      </p:xpath-context>
      <p:output port="result"/>
      <p:xslt>
        <p:input port="source">
          <p:pipe step="pipeline" port="document"/>
        </p:input>
        <p:input port="stylesheet">
          <p:document href="http://example.com/style/fo.xsl"/>
        </p:input>
        <p:parameter name="profile" select="$profile"/>
      </p:xslt>
    </p:when>
    <p:otherwise>
      <p:output port="result"/>
      <p:xslt>
        <p:input port="source">
          <p:pipe step="pipeline" port="document"/>
        </p:input>
        <p:input port="stylesheet">
          <p:document href="http://example.com/style/xhtml.xsl"/>
        </p:input>
        <p:parameter name="profile" select="$profile"/>
      </p:xslt>
    </p:otherwise>
  </p:choose>
</p:group>

</p:pipeline>

4.6 Try/Catch

A Try/Catch is specified by the p:try element. It isolates a subpipeline, preventing any errors that arise within it from being exposed to the rest of the pipeline.

<p:try name? = NCName p:ignore-prefixes? = prefix list> (p:group, p:catch) </p:try>

The p:group represents the initial subpipeline and the recovery (or “catch”) pipeline is identified with a p:catch element.

The try step evaluates the initial subpipeline and, if no errors occur, the results of that pipeline are the results of the step. However, if any errors occur, it abandons the first subpipeline, discarding any output that it might have generated, and evaluates the recovery subpipeline.

Editorial Note

In the context of try/catch, “errors” refers to step failure which is not the same as a static or dynamic error in the pipeline itself. (Though perhaps it will be possible to recover from some dynamic errors.) The notion of step failure as a distinct class of error needs to be described.

If the recovery subpipeline is evaluated, the results of the recovery subpipeline are the results of the try step. If the recovery subpipeline is evaluated and a step within that subpipeline fails, the try fails.

In order to ensure that the result of the try is consistent irrespective of whether the initial subpipeline provides its output or the recovery subpipeline does, both subpipelines must declare the same number of outputs with the same names. It is a static error if the p:group and p:catch subpipelines declare different outputs.

The environment of a try is its inherited environment with the standard modifications.

The environment inherited by its initial subpipeline, the p:group, is the environment of the try.

The recovery subpipeline of a try is identified with a p:catch:

<p:catch p:ignore-prefixes? = prefix list> ((p:output | p:parameter)*, subpipeline) </p:catch>

The environment of a p:catch is the environment of its containing p:try.

The environment inherited by the contained steps of the p:catch is the standard inheritance with this modification:

The port named “error” on the p:catch is added to the readable ports.

Editorial Note

Should the error port be made the default readable port?

If there is no binding for any of the declared outputs of the catch, then those outputs are bound to the default output port of the last step in the subpipeline. It is a static error if an output is bound to the default output port and the default output port is undefined.

4.6.1 Examples

A pipeline might attempt to process a document by dispatching it to some web service. If the web service succeeds, then those results are passed to the rest of the pipeline. However, if the web service cannot be contacted or reports an error, the catch step can provide some sort of default for the rest of the pipeline.

Example 8. An Example Try/Catch

<p:try>
  <p:group>
    <p:output port="result"/>
    <p:http-request>
      <p:input port="source">
        <p:inline>
          <c:http-request method="post" href="http://example.com/form-action">
            <c:entity-body content-type="application/x-www-form-urlencoded">
              <c:body>name=W3C&amp;spec=XProc</c:body>
            </c:entity-body>
          </c:http-request>
        </p:inline>
      </p:input>
    </p:http-request>
  </p:group>
  <p:catch>
    <p:output port="result"/>
    <p:identity>
      <p:input port="source">
        <p:inline>
          <c:error>HTTP Request Failed</c:error>
        </p:inline>
      </p:input>
    </p:identity>
  </p:catch>
</p:try>

4.7 Other Steps

Other steps are specified by elements that occur as contained steps and are not in any of the ignored namespaces.

<pfx:other-step name? = NCName p:ignore-prefixes? = prefix list> ((p:input | p:output | p:option | p:import-parameter | p:parameter)*, subpipeline?) </pfx:other-step>

Each of these steps must have been declared with p:declare-step or it must be the name of an imported p:pipeline (or a p:pipeline in the same library).

It is a static error if a pipeline contains a step that is not declared or imported or if the specified inputs, outputs, and parameters do not match the signature for steps of that type.

The environment of such a step is its inherited environment with the standard modifications.

If the step element is the same as the type of a step declared with p:declare-step, then that step invokes the declared step.

If the step element is the name of a p:pipeline, then that step runs the named pipeline.

5 Other pipeline elements

5.1 p:input Element

A p:input identifies input for a step, optionally declaring it, if necessary.

<p:input port = NCName sequence? = yes|no />

The port attribute defines the name of the port. It is a static error to identify two ports with the same name on the same step. It is a static error if the port given does not match the name of an input port specified in the step's declaration.

On compound steps and p:declare-step, an input declaration can indicate if a sequence of documents is allowed to appear on the port. If sequence is specified with the value “yes”, then a sequence is allowed. If sequence is not specified on p:input, or has the value “no”, then it is a dynamic error for a sequence of more than one document to appear on the declared port.

The declaration may be accompanied by a binding for the input:

<p:input port = NCName sequence? = yes|no select? = xpath expression> (p:pipe | p:document | p:inline)* </p:input>

If a binding is provided, a select expression may also be provided. The select expression, if specified, applies the specified [XPath 1.0] select expression to the document(s) that are read. Each node that matches is wrapped in a document and provided to the input port. After a node has been matched, its descendants are not considered for further matching; a node is passed at most once as input. In other words,

<p:input port="source">
  <p:document href="http://example.org/input.html"/>
</p:input>

provides a single document, but

<p:input port="source" select="//html:div">
  <p:document href="http://example.org/input.html"/>
</p:input>

provides a sequence of zero or more documents, one for each matching html:div (that is not itself a descendant of an html:div) in http://example.org/input.html.

A select expression can equally be applied to input read from another step. This input:

<p:input port="source" select="//html:div">
  <p:pipe step="origin" port="result"/>
</p:input>

provides a sequence of zero or more documents, one for each matching html:div in the document (or each of the documents) that is read from the result port of the step named origin.

In contexts where a binding is required, an empty p:input is bound to an empty sequence of documents.

5.2 p:iteration-source Element

A p:iteration-source identifies input to a for-each.

<p:iteration-source select? = xpath expression> (p:pipe | p:document | p:inline)* </p:iteration-source>

The select attribute and binding elements of a p:iteration-source work the same way that they do in a p:input.

5.3 p:viewport-source Element

A p:viewport-source identifies input to a viewport.

<p:viewport-source> (p:pipe | p:document | p:inline) </p:viewport-source>

Exactly one binding element is allowed and it works the same way that binding elements work in a p:input. No select expression is allowed.

5.4 p:xpath-context Element

A p:xpath-context identifies a context against which an [XPath 1.0] expression will be evaluated for a p:when.

<p:xpath-context> (p:pipe | p:document | p:inline) </p:xpath-context>

Exactly one binding element is allowed and it works the same way that binding elements work in a p:input. No select expression is allowed.

5.5 p:output Element

A p:output identifies an output port, optionally declaring it, if necessary.

<p:output port = NCName sequence? = yes|no default? = yes|no />

An output declaration can indicate if a sequence of documents is allowed to appear on the declared port. If sequence is specified with the value “yes”, then a sequence is allowed. If sequence is not specified on p:output, or has the value “no”, then it is a dynamic error if the step produces a sequence of more than one document on the declared port.

An output declaration can indicate if it is to be considered the default output for the step. If default is specified with the value “yes”, then the named port will be treated as the default output port. It is a static error to identify two different output ports as the default.

It is a static error to specify default="no" on the p:output of a step which has exactly one output. In other words, if a step has exactly one output, that output is always the default output.

The declaration may be accompanied by a binding for the output.

<p:output port = NCName sequence? = yes|no default? = yes|no> (p:pipe | p:document | p:inline)* </p:output>
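
As a non-normative sketch of these attributes, a step with two outputs might mark one of them as the default. The step type ex:split and its port names are hypothetical, and the ex prefix is assumed to be bound to some namespace.

```xml
<p:declare-step type="ex:split">
  <p:input port="source"/>
  <!-- With two outputs, the default must be identified explicitly -->
  <p:output port="matched" default="yes" sequence="yes"/>
  <p:output port="unmatched" sequence="yes"/>
</p:declare-step>
```

Both ports allow sequences; without sequence="yes" it would be a dynamic error for the step to produce more than one document on either port.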

5.6 p:option Element

The p:option element is used both to declare options and to establish values for them. When used on a p:declare-step or compound step, p:option declares the option and may associate a default value with it. Used elsewhere, p:option associates a value with the option.

Options are declared, used, and have values assigned in a manner exactly analogous to the p:parameter element.

5.7 p:parameter Element

The p:parameter element is used both to declare parameters and to establish values for them. When used on a p:declare-step or compound step, p:parameter declares the parameter and may associate a default value with it. Used elsewhere, p:parameter associates a value with the parameter.

5.7.1 Declaring Parameters

Parameters are declared on p:declare-step and compound steps with p:parameter:

<p:parameter name = token required? = yes | no />

The name attribute must be a QName, a single asterisk (*), or a string of the form *:NCName or NCName:*.

If the name is a QName, the parameter may be declared as required or it may be given a default value. It is a static error to specify that the parameter is required or that it has a default value if the name given is not a QName. It is a static error to specify that the parameter is both required and has a default value.

If a parameter is required, it is a static error to invoke the step without specifying a value for that parameter.
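
The following declaration is a non-normative sketch of the three cases. The step type ex:report and the parameter names are hypothetical, and the ex prefix is assumed to be bound to some namespace.

```xml
<p:declare-step type="ex:report">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- QName, required: every invocation must supply a value -->
  <p:parameter name="title" required="yes"/>

  <!-- QName with a default value; it cannot also be required -->
  <p:parameter name="lang" value="en"/>

  <!-- Not a QName: neither required nor a default value is permitted -->
  <p:parameter name="ex:*"/>
</p:declare-step>
```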

5.7.2 Using Parameters

Parameters are used on a step with p:parameter:

<p:parameter name = token />

The parameter must be given a value when it is used.

5.7.3 Assigning Values to Parameters

When a parameter is declared, it may be given a default value. When it is used, it must be given a value.

The value can be specified in two ways: with a select or value attribute.

If a select expression is given, it is evaluated against the document specified in the binding and the [XPath 1.0] string value of the expression becomes the value of the parameter. If no select expression is given, the XPath string value of the document specified in the binding becomes the default value of the parameter. It is a dynamic error if a document sequence is specified in the binding for a p:parameter.

<p:parameter name = QName select? = XPath expression> (p:pipe | p:document | p:inline) </p:parameter>

The select expression may refer to the values of other in-scope parameters by variable reference. It is a static error if the variable reference uses a QName that is not the name of an in-scope parameter or if the reference is circular, either directly or indirectly.

If a value attribute is specified, its content becomes the value of the parameter.

<p:parameter name = QName value = string />
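
Both forms might be used together on a single step, as in this non-normative sketch. It assumes an enclosing pipeline named “pipeline” with an input port named “config” (as in Example 7); the parameter names and the structure of the config document are hypothetical.

```xml
<p:xslt>
  <p:input port="stylesheet">
    <p:document href="http://example.com/style/xhtml.xsl"/>
  </p:input>

  <!-- Literal value taken from the value attribute -->
  <p:parameter name="draft" value="yes"/>

  <!-- Value computed as the string value of an expression
       evaluated against the bound document -->
  <p:parameter name="title" select="/config/title">
    <p:pipe step="pipeline" port="config"/>
  </p:parameter>
</p:xslt>
```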

5.8 p:import-parameter Element

A p:import-parameter provides a set of in-scope parameters to a step.

<p:import-parameter name = token />

All in-scope parameters which match the name are made available to the step as if they had been specified with individual p:parameter elements.

The name attribute must be a single asterisk (*), a QName, or a string of the form *:NCName or NCName:*.
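
For instance, a step could receive every in-scope parameter without naming each one. This is a non-normative sketch; the stylesheet URI is hypothetical.

```xml
<p:xslt>
  <p:input port="stylesheet">
    <p:document href="http://example.com/style/xhtml.xsl"/>
  </p:input>
  <!-- Pass all in-scope parameters to the step -->
  <p:import-parameter name="*"/>
</p:xslt>
```

A name of the form NCName:* would instead restrict the import to in-scope parameters in the namespace bound to that prefix.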

5.9 p:declare-step Element

A p:declare-step provides the type and signature of an implementation-dependent type of step. It declares the inputs, outputs, options, and parameters for all steps of that type.

<p:declare-step type = QName p:ignore-prefixes? = prefix list> (p:input*, p:output*, p:option*, p:parameter*) </p:declare-step>

Editorial Note

We need to make some provision for identifying the implementation of a declared step, even if it's no more than implementation-defined extension attributes. We'll need some sort of mechanism for declaring multiple implementations too.

It is a static error to identify an unrecognized step in a subpipeline. It is not an error to declare such a step, only to use it.

At most one input declaration of a p:declare-step may use the name “*” to indicate that the step accepts an arbitrary number of inputs.

At most one output declaration of a p:declare-step may use the name “*” to indicate that the step can produce an arbitrary number of outputs.
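
A declaration and a matching invocation might look like the following non-normative sketch. The step type ex:tidy, its option, and the namespace bound to the ex prefix are all hypothetical; how the processor locates an implementation of the step is left open by this draft.

```xml
<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc"
            xmlns:ex="http://example.com/ns/steps">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Signature for all steps of type ex:tidy -->
  <p:declare-step type="ex:tidy">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:option name="indent" value="no"/>
  </p:declare-step>

  <!-- Invocation: source is read from the default readable port,
       and the default value of the option is overridden -->
  <ex:tidy>
    <p:option name="indent" value="yes"/>
  </ex:tidy>
</p:pipeline>
```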

5.10 p:pipeline-library Element

A p:pipeline-library is a collection of step declarations and/or pipeline definitions.

<p:pipeline-library p:ignore-prefixes? = prefix list namespace? = URI> (p:import*, p:declare-step*, p:pipeline*) </p:pipeline-library>

Libraries can import pipelines and/or other libraries. It is a static error if the import references in a pipeline or pipeline library are circular.

If the p:pipeline-library specifies a namespace with the namespace attribute, then all of the pipelines that occur in the library are in that namespace.

For example, given the following pipeline library:

<p:pipeline-library xmlns:p="http://www.w3.org/2007/03/xproc"
                    namespace="http://example.com/ns/pipelines">

<p:import href="ancillary-library.xml"/>
<p:import href="other-pipeline.xml"/>

<p:pipeline name="validate">
  …
</p:pipeline>

<p:pipeline name="format">
  …
</p:pipeline>

</p:pipeline-library>

The pipelines named “validate” and “format” are in the namespace http://example.com/ns/pipelines. That means that those pipelines must be invoked from the importing pipeline with qualified names:

<ex:validate>
  …
</ex:validate>

(Assuming that the “ex” prefix is bound to http://example.com/ns/pipelines.)

The pipeline library namespace applies only to pipelines that are defined directly in the library; it does not apply to pipeline libraries that are imported or pipelines that are directly imported.

5.11 p:import Element

A p:import loads a pipeline or pipeline library, making it available in the current environment.

<p:import href = URI />

An import statement loads the specified URI and makes any pipelines declared within it available to the current pipeline. An imported pipeline has an implicit signature that consists of the inputs, outputs, options, and parameters declared on it.

It is a dynamic error if the URI of a p:import cannot be retrieved or if, once retrieved, it does not point to a p:pipeline-library or p:pipeline. It is a dynamic error to import a single pipeline if that pipeline does not have a name.
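
An importing pipeline might then look like the following non-normative sketch. It assumes a hypothetical library at library.xml that specifies the namespace http://example.com/ns/pipelines and contains a pipeline named “validate” with one input port named “source” and a single output; none of these details are fixed by this specification.

```xml
<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc"
            xmlns:ex="http://example.com/ns/pipelines">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Makes the pipelines in the library available as steps -->
  <p:import href="library.xml"/>

  <!-- Runs the imported pipeline by its qualified name -->
  <ex:validate>
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </ex:validate>
</p:pipeline>
```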

5.12 p:pipe Element

A p:pipe reads from the output port of another step.

<p:pipe step = NCName port = NCName />

The p:pipe element connects to the output port of another step. It identifies the output port to which it connects with the name of the step in the step attribute and the name of the port on that step in the port attribute.

In all cases except the p:output of a compound step, it is a static error if the port identified by a p:pipe is not in the readable ports of the environment of the step that contains the p:pipe.

It is a static error if the port identified by a p:pipe in the p:output of a compound step is not in the readable ports of the environment inherited by the contained steps of the compound step.

In other words, the output of a compound step must be bound to the output of one of its contained steps. All other bindings must be to ports that are already readable in the current environment.

5.13 p:inline Element

A p:inline provides a document or a sequence of documents inline.

<p:inline> anyElement </p:inline>

The content of the p:inline element is wrapped in a document node and passed as input. The base URI of the document is the base URI of the p:inline element.

Note

The nodes inside a p:inline element naturally inherit the namespaces that are in-scope at the point where they occur in the pipeline document. Implementations must ensure that those namespaces remain in-scope in the resulting document.

It is a static error if the content of the p:inline element is not a well-formed XML document.

5.14 p:document Element

A p:document reads an XML document from a URI.

<p:document href = URI />

The document identified by the URI in the href attribute is loaded and returned.

It is a dynamic error if the document referenced by a p:document element does not exist, cannot be accessed, or is not a well-formed XML document.

The parser which the p:document element employs must conform to Namespaces in XML. It must not perform validation. It must not perform any other processing, such as expanding XIncludes.

Use the load step if you need to perform DTD-based validation or if you wish to load documents that are not namespace well-formed.

5.15 p:doc Element

A p:doc contains human-readable documentation.

<p:doc> any-well-formed-content </p:doc>

There are no constraints on the content of the p:doc element. Documentation is ignored by pipeline processors.

6 Errors

Errors in a pipeline can be divided into two classes: static errors and dynamic errors.

6.1 Static Errors

[Definition: A static error is one which can be detected before pipeline evaluation is even attempted.] Examples of static errors include cycles, incorrect specification of inputs and outputs, and reference to unknown steps.

Static errors are fatal and must be detected before any steps are evaluated.

Static Errors

It is a static error if there are any loops in the connections between steps: no step can be connected to itself nor can there be any sequence of connections through other steps that leads back to itself.
Connections
All in-scope steps must have unique names: it is a static error if two steps with the same name appear in the same scope.
Scoping of Names
It is a static error if the port specified by a p:pipe is not in the readable ports of the environment.
Associating Documents with Ports
It is a static error if the p:ignore-prefixes attribute appears on any element which does not identify a step.
Ignored namespaces
It is a static error if any token specified in the p:ignore-prefixes attribute is not the prefix of an in-scope namespace.
Ignored namespaces
It is a static error if an output is bound to the default output port and the default output port is undefined.
Pipeline, For-Each, Viewport, Choose, Group, Try/Catch
It is a static error if two subpipelines in a choose declare different outputs.
Choose
It is a static error if no context is specified in either the p:choose or the p:when and the default readable port is undefined.
Choose
It is a static error if the p:group and p:catch subpipelines declare different outputs.
Try/Catch
It is a static error if a pipeline contains a step that is not declared or imported or if the specified inputs, outputs, and parameters do not match the signature for steps of that type.
Other Steps
It is a static error to identify two ports with the same name on the same step.
p:input Element, p:output Element
It is a static error if the port given does not match the name of an input port specified in the step's declaration.
p:input Element
It is a static error if the port given does not match the name of an output port specified in the step's declaration.
p:output Element
It is a static error to identify two different output ports as the default.
p:output Element
It is a static error to specify default="no" on the p:output of a step which has exactly one output.
p:output Element
It is a static error to specify that the parameter is required or that it has a default value if the name given is not a QName.
Declaring Parameters
It is a static error to specify that the parameter is both required and has a default value.
Declaring Parameters
If a parameter is required, it is a static error to invoke the step without specifying a value for that parameter.
Declaring Parameters
It is a static error if the variable reference uses a QName that is not the name of an in-scope parameter or if the reference is circular, either directly or indirectly.
Assigning Values to Parameters
It is a static error to identify an unrecognized step in a subpipeline.
p:declare-step Element
It is a static error if the import references in a pipeline or pipeline library are circular.
p:pipeline-library Element
It is a static error if the port identified by a p:pipe in the p:output of a compound step is not in the readable ports of the environment inherited by the contained steps of the compound step.
p:pipe Element
It is a static error if the content of the p:inline element is not a well-formed XML document.
p:inline Element
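
The first static error in the list above, a loop in the connections between steps, can be detected before evaluation with a depth-first search. The following is a minimal sketch in Python, not part of XProc. The dictionary representation of connections and the name has_cycle are assumptions made for illustration only.

```python
def has_cycle(connections):
    """Return True if any step can reach itself through its connections.

    `connections` maps a step name to the names of the steps that read
    its output (a hypothetical representation, not an XProc data model).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(step):
        state[step] = GREY
        for nxt in connections.get(step, ()):
            colour = state.get(nxt, WHITE)
            if colour == GREY:  # back edge: nxt is already on the current path
                return True
            if colour == WHITE and visit(nxt):
                return True
        state[step] = BLACK
        return False

    return any(state.get(s, WHITE) == WHITE and visit(s) for s in list(connections))
```

A processor built along these lines would report the static error whenever has_cycle returns True for the connection graph of a pipeline.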

6.2 Dynamic Errors

[Definition: A dynamic error is one which occurs while a pipeline is being evaluated.] Examples of dynamic errors include references to URIs that cannot be resolved, steps which fail, and pipelines that exhaust the capacity of an implementation (such as memory or disk space).

If a step fails due to a dynamic error, failure propagates upwards until either a try is encountered or the entire pipeline fails. In other words, outside of a try, step failure causes the entire pipeline to fail.
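
The propagation rule resembles exception handling in general-purpose languages. The sketch below models it in Python under stated assumptions: steps are plain functions, DynamicError, run_steps and run_try are hypothetical names, and passing the original document to the catch subpipeline is a simplification rather than something this draft specifies.

```python
class DynamicError(Exception):
    """Stands in for a dynamic error raised while a step is evaluated."""


def run_steps(steps, document):
    """Evaluate steps in order; a failing step aborts the whole sequence."""
    for step in steps:
        document = step(document)
    return document


def run_try(group, catch, document):
    """Evaluate `group`; if any step in it fails, evaluate `catch` instead."""
    try:
        return run_steps(group, document)
    except DynamicError:
        # Assumption of this sketch: the catch branch sees the original input.
        return run_steps(catch, document)


def failing_step(document):
    """A step that always fails, used to exercise the propagation."""
    raise DynamicError("step failed")
```

Calling run_steps with failing_step lets the error escape to the caller, which corresponds to the entire pipeline failing; wrapping the same step in run_try confines the failure to the try.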

Dynamic Errors

It is a dynamic error if a non-XML resource is produced on a step output or arrives on a step input.
Inputs and Outputs
It is a dynamic error if the processor attempts to retrieve the URI specified on a p:document and fails.
Associating Documents with Ports
It is a dynamic error if the viewport source is a sequence of more than one document or if the output from any iteration is a sequence of more than one document.
Viewport
It is a dynamic error if no subpipeline is selected by the choose and no default is provided.
Choose
It is a dynamic error if the xpath-context is bound to a sequence of documents.
Choose
If sequence is not specified on p:input, or has the value “no”, then it is a dynamic error for a sequence of more than one document to appear on the declared port.
p:input Element
If sequence is not specified on p:output, or has the value “no”, then it is a dynamic error if the step produces a sequence of more than one document on the declared port.
p:output Element
It is a dynamic error if a document sequence is specified in the binding for a p:parameter.
Assigning Values to Parameters
It is a dynamic error if the URI of a p:import cannot be retrieved or if, once retrieved, it does not point to a p:pipeline-library or p:pipeline.
p:import Element
It is a dynamic error to import a single pipeline if that pipeline does not have a name.
p:import Element
It is a dynamic error if the document referenced by a p:document element does not exist, cannot be accessed, or is not a well-formed XML document.
p:document Element

A Standard Step Library

This appendix describes the standard XProc components.

Some components in this appendix consume or produce an XML vocabulary defined in this section. In all cases, the namespace for that vocabulary is http://www.w3.org/2007/03/xproc-component and is represented by the prefix 'c:' in this appendix.

Note

The components described in this draft are intended mainly as a starting point for discussion and to present a flavor for the sorts of components envisioned. The WG has not yet discussed them in detail.

1 Required Components

This section describes standard components that must be supported by any conforming processor.

1.1 Identity

The identity component makes a verbatim copy of its input available on its output.

<p:declare-step type="p:identity">
<p:input port="source" sequence="yes"/>
<p:output port="result" sequence="yes"/>
</p:declare-step>

1.2 Join Sequences

The Join Sequences component accepts an arbitrary number of input documents via an arbitrary number of input ports and aggregates them into one sequence of documents.

<p:declare-step type="p:join-sequences">
<p:input port="*" sequence="yes"/>
<p:output port="result" sequence="yes"/>
</p:declare-step>

1.3 Load

The load component has no inputs but takes an 'href' option that specifies the URI of an XML resource that should be loaded and provided as the result.

<p:declare-step type="p:load">
<p:output port="result"/>
<p:option name="href" required="yes"/>
</p:declare-step>

Load attempts to read an XML document from the specified URI. If the document does not exist, or is not well-formed, the component fails. Otherwise, the document read is produced on the result port.

1.4 Store

The store component stores a serialized version of its input to a URI. The URI is either specified explicitly by the 'href' parameter or implicitly by the base URI of the document. This component outputs a reference to the location of the stored document.

Note

Should this component allow sequences on its input?

<p:declare-step type="p:store">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:option name="href" required="no"/>
     <p:option name="encoding" required="no" value="UTF-8"/>
</p:declare-step>

The component attempts to store the XML document to the specified URI. If that URI scheme is not supported or such storage is not allowed, the component fails.

The output of this component is a document containing a single element of the form:

<c:result href = anyURI />

Note

A more direct “serialize-to-octet-stream” component may also be required. One, for example, that supports the XSLT 2.0/XQuery 1.0 Serialization specification.

1.5 Subsequence

The Subsequence component accepts a sequence of documents and produces a subsequence of that input. It applies an XPath expression to each document in the sequence to determine whether the document is in the output. If the expression evaluates to true, the document is copied to the output sequence.

<p:declare-step type="p:subsequence">
     <p:input port="source" sequence="yes"/>
     <p:output port="result" sequence="yes"/>
     <p:option name="test" required="yes"/>
</p:declare-step>
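
The filtering behaviour can be sketched in Python as follows. This is an illustration only, not an implementation of the component: ElementTree supports a subset of XPath and cannot evaluate an expression to a boolean, so the sketch assumes that "true" means "the path selects at least one element". The function name subsequence and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def subsequence(documents, test):
    """Keep, in order, the documents for which `test` selects an element."""
    return [doc for doc in documents if doc.find(test) is not None]


docs = [ET.fromstring(s) for s in (
    "<doc><draft/></doc>",
    "<doc><final/></doc>",
    "<doc><draft/><final/></doc>",
)]
kept = subsequence(docs, "draft")  # the first and third documents remain
```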

1.6 XInclude

The XInclude component applies XInclude processing semantics to the document. References to included documents are resolved against the base URI; the referenced documents are not provided as input to the component.

<p:declare-step type="p:xinclude">
<p:input port="source" sequence="no"/>
<p:output port="result" sequence="no"/>
</p:declare-step>

1.7 XSLT

The xslt component applies an XSLT 1.0 transformation to a document. The transformation is supplied by a single document on the input port named 'transform'. That transformation is applied to the primary source document supplied on the input port named 'source'. The result of the transformation is a sequence of documents on its 'result' port.

<p:declare-step type="p:xslt">
     <p:input port="source" sequence="no"/>
     <p:input port="transform" sequence="no"/>
     <p:output port="result" sequence="yes"/>
     <p:parameter name="*"/>
</p:declare-step>

All of the specified parameters are made available to the XSLT processor. If the XSLT processor signals a fatal error, the component fails, otherwise the result of the transformation is produced on the result port.

It should be noted that an XSLT 1.0 processor without any extensions can only produce a single XML document as its result. However, many XSLT 1.0 processors provide extensions which allow the processor to produce more than one result. In such cases, more than one document may appear on the result port. The principal result document will always appear last.

1.8 Serialize

The serialize component applies XML serialization to the children of the document element and replaces those children with their serialization. The outcome is a single element with text content that represents the "escaped" syntax of the children if they were serialized.

<p:declare-step type="p:serialize">
<p:input port="source" sequence="no"/>
<p:output port="result" sequence="no"/>
</p:declare-step>

For example, the input:

<description>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>This is a chunk of XHTML.</p>
</div>
</description>

produces:

<description>
&lt;div xmlns="http://www.w3.org/1999/xhtml">
&lt;p>This is a chunk of XHTML.&lt;/p>
&lt;/div>
</description>
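
A minimal Python sketch of the same idea follows, using the standard ElementTree module. It is an illustration under assumptions, not the component itself: the function name serialize_children is hypothetical, and the sample content is in no namespace because ElementTree may rewrite namespace prefixes when it serializes.

```python
import xml.etree.ElementTree as ET


def serialize_children(root):
    """Replace the children of `root` with their serialization as text."""
    parts = [root.text or ""]
    for child in list(root):
        # ET.tostring also emits the child's tail text after its end tag.
        parts.append(ET.tostring(child, encoding="unicode"))
        root.remove(child)
    root.text = "".join(parts)
    return root


doc = ET.fromstring(
    "<description><p>This is a <em>chunk</em>.</p> Done.</description>"
)
result = serialize_children(doc)
```

After the call, result has no element children and its text holds the markup; serializing result again escapes that markup, which gives the "escaped" syntax described above.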

1.9 Parse

The parse component takes the text value of the document element and parses the content as if it were a Unicode character stream containing XML. The outcome is a single element with children from the parsing of the XML content. This is the reverse of the serialize component.

When the text value is parsed, a document element wrapper must be assumed so that element siblings can be parsed back into XML. Further, if the 'namespace' parameter is specified, the default namespace is declared on that wrapper element.

If the 'content-type' parameter is specified, an implementation can use a different parser to produce XML content. Such a behavior is implementation defined. For example, for the mime type 'text/html', an implementation might provide an HTML to XHTML parser (e.g. Tidy).

<p:declare-step type="p:parse">
     <p:input port="source" sequence="no"/>
     <p:output port="result" sequence="no"/>
     <p:option name="namespace" required="no"/>
     <p:option name="content-type" required="no"/>
</p:declare-step>

For example, with the 'namespace' parameter set to the XHTML namespace, the following input:

<description>
&lt;p>This is a chunk.&lt;/p>
&lt;p>This is another chunk.&lt;/p>
</description>

would produce:

<description>
<p xmlns="http://www.w3.org/1999/xhtml">This is a chunk.</p>
<p xmlns="http://www.w3.org/1999/xhtml">This is another chunk.</p>
</description>
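
The wrapper technique described above can be sketched in Python. The function name parse_text and the wrapper element name are assumptions for illustration; the sketch covers only the 'namespace' parameter and ignores 'content-type'.

```python
import xml.etree.ElementTree as ET


def parse_text(root, namespace=None):
    """Parse the text of `root` as XML and make the result its children."""
    ns_decl = f' xmlns="{namespace}"' if namespace else ""
    # A wrapper element lets sibling elements parse as one well-formed document.
    wrapper = ET.fromstring(f"<wrapper{ns_decl}>{root.text or ''}</wrapper>")
    root.text = wrapper.text
    root.extend(list(wrapper))
    return root


XHTML = "http://www.w3.org/1999/xhtml"
doc = ET.fromstring(
    "<description>&lt;p>This is a chunk.&lt;/p>"
    "&lt;p>This is another chunk.&lt;/p></description>"
)
result = parse_text(doc, namespace=XHTML)
```

Because the default namespace is declared on the wrapper, both parsed p elements end up in the XHTML namespace, as in the example above.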

2 Optional Components

2.1 Http Request

The Http Request component provides interactions with resources identified by URIs over HTTP. The input document specifies the resource, options as to how the HTTP request is made, and possibly the content of the request.

<p:declare-step type="p:http-request">
     <p:input port="source" sequence="no"/>
     <p:output port="result" sequence="no"/>
     <p:option name="status-only" required="no" value="false"/>
     <p:option name="override-mimetype" required="no"/>
</p:declare-step>

The input XML is structured as:

Editorial Note

The formatting of the XML input and output is crude and incomplete in this draft.

<c:http-request
  method = NCName
  href = anyURI
  status-only? = boolean
  override-mimetype? = string>
   (c:header*,
    c:entity?)
</c:http-request>

<c:header
  name = string
  value = string />

<c:entity
  content-type? = string
  set-content-length? = boolean>
   (c:body+)
</c:entity>

<c:body
  content-type? = string>
   (anyElement?)
</c:body>

The component responds with an XML element structured as:

<c:http-response
  status = integer>
   (c:header*,
    c:entity?)
</c:http-response>

Any content returned that has an XML mime type is returned as a child of the 'body' element. If the response is a text mime type and can be encoded in Unicode, the content is encoded as the text children of the body element. If the content is none of these, the response is encoded as base64 data textually represented as the text content of the body element.
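
The three-way choice can be sketched as a small Python function. The mime-type tests and the use of UTF-8 decoding as the check for "can be encoded in Unicode" are assumptions of this sketch; the draft does not define either, and the name encode_body is hypothetical.

```python
import base64


def encode_body(content_type, payload):
    """Classify a response body (bytes) as 'xml', 'text', or 'base64'."""
    mime = content_type.split(";")[0].strip().lower()
    if mime in ("text/xml", "application/xml") or mime.endswith("+xml"):
        # XML content: a caller would parse this into child elements.
        return "xml", payload.decode("utf-8")
    if mime.startswith("text/"):
        try:
            return "text", payload.decode("utf-8")
        except UnicodeDecodeError:
            pass  # not representable as text: fall through to base64
    return "base64", base64.b64encode(payload).decode("ascii")
```

For example, encode_body("text/plain; charset=utf-8", b"OK") yields the pair ("text", "OK"), while an image payload is returned as base64 text.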

The component may override the returned content type by the 'override-mimetype' option and the user may do the same via the 'override-mimetype' attribute. If this value is specified, the returned content type header is ignored and the mime type specified is used to determine what to do with the output. The original mime type is still provided in the response XML.

For example, a form post would be formulated as:

<c:http-request method="post" href="http://www.example.com/form-action" xmlns:c="http://www.w3.org/2007/03/xproc-component">
<c:entity content-type="application/x-www-form-urlencoded">
<c:body>
name=W3C&amp;spec=XProc
</c:body>
</c:entity>
</c:http-request>

and if the response was an XHTML document, the response would be:

<c:http-response status="200" xmlns:c="http://www.w3.org/2007/03/xproc-component">
<c:entity content-type="application/xhtml+xml">
<c:body>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>OK</title></head>
<body><p>OK!</p></body>
</html>
</c:body>
</c:entity>
</c:http-response>

2.2 Relax NG Validate

The Relax NG Validate component applies RELAX NG validation to an XML document input.

<p:declare-step type="p:validate-relax-ng">
     <p:input port="source" sequence="no"/>
     <p:input port="schema" sequence="no"/>
     <p:output port="result" sequence="no"/>
</p:declare-step>

2.3 XML Schema Validate

The XML Schema Validate component applies XML Schema's validity assessment to an XML document input.

<p:declare-step type="p:validate-xml-schema">
     <p:input port="source" sequence="no"/>
     <p:input port="schema" sequence="yes"/>
     <p:output port="result" sequence="no"/>
     <p:option name="assert-valid" value="true"/>
     <p:option name="mode" value="strict"/>
</p:declare-step>

2.4 XSLT 2.0

The XSLT 2.0 component applies an XSLT 2.0 transformation to a document. The transformation is supplied on the port named 'transform' and the primary input document is supplied on the port named 'source'. The application of the transformation produces a sequence of documents on the output port named 'result'.

<p:declare-step type="p:xslt2">
     <p:input port="source" sequence="yes"/>
     <p:input port="transform" sequence="no"/>
     <p:output port="result" sequence="yes"/>
     <p:option name="initial-mode"/>
     <p:option name="template-name"/>
     <p:option name="allow-version-mismatch" value="true"/>
     <p:option name="output-base-uri"/>
     <p:option name="allow-collections" value="true"/>
     <p:parameter name="*"/>
</p:declare-step>

If a sequence of documents is provided on the input port named 'source', the first document is assumed to be the primary input document. By default, this sequence is also the default collection unless the 'allow-collections' parameter is set to 'false'.

The invocation of the transformation is controlled by the 'initial-mode' and 'template-name' parameters, which set the initial mode and/or named template in the XSLT transformation that should initiate processing. If these values do not match the transformation specified, a dynamic error must be thrown.

The 'allow-version-mismatch' parameter indicates whether an XSLT 1.0 transformation should be allowed to be run through the XSLT 2.0 processor. A value of 'true' means that it should be allowed.

The 'output-base-uri' parameter sets the context's output base URI per the XSLT 2.0 specification.

More than one document can be produced on the output port 'result'. In such cases, the principal result document will always appear last.

2.5 XSL Formatter

The XSL Formatter component receives an XSL FO document and renders the content. The result of rendering is stored to the URI provided via the 'uri' option. A reference to that result is produced on the output port.

The output content type is controlled by the 'output' option which contains the mime type of the output format. A formatter may take any number of optional rendering parameters via the step's parameters. Such parameters are defined by the XSL implementation used and are implementation defined.

<p:declare-step type="p:xsl-formatter">
     <p:input port="source" sequence="no"/>
     <p:output port="result" sequence="no"/>
     <p:option name="uri" required="yes"/>
     <p:option name="output"/>
     <p:parameter name="*"/>
</p:declare-step>

The output of this component is a document containing a single element of the form:

<c:result href = anyURI type = string/>

2.6 XQuery 1.0

The XQuery 1.0 component applies an XQuery to a sequence of documents treated as the default collection. The 'source' input port allows a sequence of documents and specifies each document that should be in the default collection. The result of the query is a sequence of documents constructed from an XPath 2.0 sequence of elements that are assumed to be the document elements of separate documents.

<p:declare-step type="p:xquery">
     <p:input port="source" sequence="yes"/>
     <p:input port="query" sequence="no"/>
     <p:output port="result" sequence="yes"/>
     <p:parameter name="*"/>
</p:declare-step>

Since some queries do not start with an XML document element, a wrapper element named 'query' in no namespace is allowed and will be ignored by the component. The serialization of the children of this element is the query text. Any other element is assumed to be the start of a query with no prolog and the serialization of the document element is the query text.

3 Micro-Operations Components

Note

No decisions have been made about whether these components will be optional or required.

3.1 Delete

The Delete component deletes the matching items from the document on input port 'source' and outputs the results to the output port 'result'. The matching items are specified via an XPath in the parameter named 'target'.

<p:declare-step type="p:delete">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:option name="target" required="yes"/>
</p:declare-step>

3.2 Insert

The insert component inserts a document specified on the 'insertion' port as a child of the target element in the document specified on the 'source' port. The target and position of this insert is governed by the parameters.

<p:declare-step type="p:insert">
     <p:input port="source"/>
     <p:input port="insertion"/>
     <p:output port="result"/>
     <p:option name="target"/>
     <p:option name="at-start" required="yes"/>
</p:declare-step>

The target of the insertion is specified via an XPath expression in the 'target' option. If no expression is supplied, the document element is the target.

If the at-start option is true, the insertion document will be inserted as the first child(ren) of the element, otherwise it will be inserted as the last child. If the option is not specified, a value of true is assumed.
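
The placement rule can be sketched in Python with ElementTree. The function name insert, the use of ElementTree paths in place of full XPath, and raising ValueError when the target selects nothing are all assumptions of this sketch, since the draft does not say what happens in that case.

```python
import xml.etree.ElementTree as ET


def insert(source, insertion, target=None, at_start=True):
    """Insert `insertion` as the first or last child of the target element."""
    # With no target expression, the document element is the target.
    element = source if target is None else source.find(target)
    if element is None:
        raise ValueError("target did not select an element")
    if at_start:
        element.insert(0, insertion)
    else:
        element.append(insertion)
    return source


doc = ET.fromstring("<a><b/><c/></a>")
insert(doc, ET.Element("x"))                              # first child of <a>
insert(doc, ET.Element("y"), target="c", at_start=False)  # last child of <c>
```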

3.3 Label Elements

The Label Elements component labels each element with a unique xml:id value. If the element already has an xml:id value, that value is preserved. A user may specify the 'prefix' parameter for prefixing the value of the xml:id value. This prefix does not affect existing xml:id values.

If an existing xml:id value conflicts with a previously generated value, the component fails.

<p:declare-step type="p:label-elements">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:option name="prefix" required="no"/>
</p:declare-step>
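
One way to realise this behaviour is sketched below in Python. The counter-based naming scheme, the default prefix "id", and the name label_elements are assumptions made for illustration; the draft does not define how unique values are generated.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def label_elements(root, prefix="id"):
    """Give every element without an xml:id a generated one, in document order."""
    used = set()
    counter = 0
    for element in root.iter():
        existing = element.get(XML_ID)
        if existing is not None:
            if existing in used:
                # An existing value collides with one already seen or generated.
                raise ValueError(f"conflicting xml:id: {existing}")
            used.add(existing)
            continue
        counter += 1
        while f"{prefix}{counter}" in used:  # skip values already taken
            counter += 1
        element.set(XML_ID, f"{prefix}{counter}")
        used.add(f"{prefix}{counter}")
    return root


doc = ET.fromstring('<a><b xml:id="keep"/><c/></a>')
label_elements(doc, prefix="n")
```

In this sketch the existing value "keep" is preserved, and the prefix is applied only to generated values.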

3.4 Namespace Rename

The Namespace Rename component renames any namespace declaration or use of a namespace in a document to a new URI value. The source namespace is identified by the 'from' parameter and the target namespace is identified by the 'to' parameter. The 'from' parameter value may be a space-separated list of URI values.

<p:declare-step type="p:ns-rename">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:option name="from" required="yes"/>
     <p:option name="to" required="yes"/>
</p:declare-step>
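
The renaming can be sketched in Python over ElementTree, which stores qualified names as "{uri}local". The function name ns_rename is hypothetical, and the sketch covers element and attribute names only; namespace URIs that occur in content (for example in QName-valued attributes) are left untouched.

```python
import xml.etree.ElementTree as ET


def ns_rename(root, from_uris, to_uri):
    """Move element and attribute names from any URI in `from_uris` to `to_uri`."""
    sources = set(from_uris.split())  # 'from' may be a space-separated list

    def rename(name):
        if name.startswith("{"):
            uri, local = name[1:].split("}", 1)
            if uri in sources:
                return f"{{{to_uri}}}{local}"
        return name

    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # comments and processing instructions have no QName
        element.tag = rename(element.tag)
        for key in list(element.attrib):
            new_key = rename(key)
            if new_key != key:
                element.set(new_key, element.attrib.pop(key))
    return root


doc = ET.fromstring('<a xmlns="urn:old" xmlns:o="urn:older"><o:b o:x="1"/></a>')
ns_rename(doc, "urn:old urn:older", "urn:new")
```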

3.5 Rename

The rename component renames elements or attributes in a document based on parameter values.

<p:declare-step type="p:rename">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:option name="target" required="no"/>
     <p:option name="name" required="yes"/>
</p:declare-step>

Each element, attribute, or processing-instruction identified by the XPath 1.0 expression specified by the 'target' parameter is renamed. The name of elements and attributes, and the target of processing-instructions, are changed to the value of the 'name' parameter.

The component fails if the specified name is not a valid name or if the renaming would introduce a syntactic error into the document (for example, if it would create two attributes with the same name on the same element).

3.6 Replace

The replace component replaces a target element in the input document specified on the port named 'source' with the document element of the document specified on the port named 'replacement'. The result is produced as a single document on the port named 'result'.

The target of the replace is specified via an XPath in the 'target' option that must identify an element. If the target identifies multiple elements, all the elements are replaced.

<p:declare-step type="p:replace">
     <p:input port="source"/>
     <p:input port="replacement"/>
     <p:output port="result"/>
     <p:option name="target" required="no"/>
</p:declare-step>

3.7 Set-attributes

The set-attributes component sets attribute values on the document element using the attribute values provided on the document element of the 'attributes' port's document. That is, it copies the attributes on the document element from the 'attributes' input port to the document element of the 'source' input port.

<p:declare-step type="p:set-attributes">
     <p:input port="source"/>
     <p:input port="attributes"/>
     <p:output port="result"/>
</p:declare-step>
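
A short Python sketch of the copy follows. It assumes that an attribute already present on the source document element is overwritten by a same-named attribute from the 'attributes' document, which the draft implies but does not state; the function name set_attributes is hypothetical.

```python
import xml.etree.ElementTree as ET


def set_attributes(source, attributes):
    """Copy every attribute of `attributes` onto the `source` document element."""
    for name, value in attributes.attrib.items():
        source.set(name, value)  # assumption: same-named attributes are replaced
    return source


src = ET.fromstring('<doc a="1" b="2"/>')
attrs = ET.fromstring('<x b="3" c="4"/>')
set_attributes(src, attrs)
```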

3.8 Unwrap

The Unwrap component replaces the target, found within the input specified on the 'source' port, with its children. The target is identified by an XPath parameter named 'target'. A single document is produced on the output port named 'result'.

<p:declare-step type="p:unwrap">
     <p:input port="source" sequence="no"/>
     <p:output port="result"/>
     <p:option name="target" required="yes"/>
</p:declare-step>
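
The operation can be sketched in Python as follows. ElementTree has no parent pointers, so the sketch builds a parent map first. It handles element children only: text directly inside or after an unwrapped element is dropped, which a real implementation would need to preserve. The function name unwrap is hypothetical.

```python
import xml.etree.ElementTree as ET


def unwrap(root, target):
    """Replace each element selected by `target` with its own children."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for element in root.findall(target):
        parent = parents[element]
        index = list(parent).index(element)
        parent.remove(element)
        for offset, child in enumerate(list(element)):
            parent.insert(index + offset, child)
            parents[child] = parent  # keep the map valid for nested targets
    return root


doc = ET.fromstring("<a><w><b/><c/></w><d/></a>")
unwrap(doc, "w")
```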

3.9 Wrap

The Wrap component wraps the target found within the input specified on the 'source' port with a new element. The target is identified by an XPath parameter named 'target'. The new element named via the 'name' parameter replaces the target in the input document where the target becomes the single child of the new element. A single document is produced on the output port named 'result'.

<p:declare-step type="p:wrap">
     <p:input port="source" sequence="no"/>
     <p:output port="result"/>
     <p:option name="name" required="yes"/>
     <p:option name="target" required="yes"/>
</p:declare-step>
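
A corresponding Python sketch is shown below. As with any ElementTree-based illustration, the target expression is limited to the XPath subset that ElementTree supports, the document element itself cannot be wrapped, and the function name wrap is hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap(root, name, target):
    """Replace each element selected by `target` with a new element holding it."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for element in root.findall(target):
        parent = parents[element]
        index = list(parent).index(element)
        wrapper = ET.Element(name)
        # Text that followed the target now follows the wrapper instead.
        wrapper.tail, element.tail = element.tail, None
        parent.remove(element)
        wrapper.append(element)
        parent.insert(index, wrapper)
    return root


doc = ET.fromstring("<a><b/><c/></a>")
wrap(doc, "w", "c")
```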

B References

[XML Core Req] XML Processing Model Requirements. Dmitry Lenkov, Norman Walsh, editors. W3C Working Group Note 05 April 2004.

[Infoset] XML Information Set (Second Edition). John Cowan, Richard Tobin, editors. W3C Working Group Note 04 February 2004.

[XML 1.0] Extensible Markup Language (XML) 1.0 (Fourth Edition). Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, et al., editors. W3C Recommendation 16 August 2006.

[XML 1.1] Extensible Markup Language (XML) 1.1 (Second Edition). Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, et al., editors. W3C Recommendation 16 August 2006.

[XPath 1.0] XML Path Language (XPath) Version 1.0. James Clark and Steve DeRose, editors. W3C Recommendation. 16 November 1999.

[XSLT 1.0] XSL Transformations (XSLT) Version 1.0. James Clark, editor. W3C Recommendation. 16 November 1999.

[xml:id] xml:id Version 1.0. Jonathan Marsh, Daniel Veillard, and Norman Walsh, editors. W3C Recommendation. 9 September 2005.

[XML Base] XML Base. Jonathan Marsh, editor. W3C Recommendation. 27 June 2001.

C Glossary

pipeline

A pipeline is a set of steps connected together, with outputs flowing into inputs, without any loops (no step can read its own output, directly or indirectly).