This document is also available in these non-normative formats: XML, Revision markup
Copyright © 2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This specification describes the syntax and semantics of XProc: An XML Pipeline Language, a language for describing operations to be performed on XML documents.
An XML Pipeline specifies a sequence of operations to be performed on one or more XML documents. Pipelines generally accept one or more XML documents as input and produce one or more XML documents as output, though they are not required to do so. Some pipelines are entirely self-contained, starting with input derived inside the pipeline and producing no XML output.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document was produced by the XML Processing Model Working Group which is part of the XML Activity. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This is a public Working Draft. This draft addresses many, but not all, of the design questions that were incomplete in previous drafts. The library of standard steps, both required and optional, is still being reviewed and considered. The Working Group continues to encourage feedback from potential users. A revision marks draft, with respect to the 5 April 2007 specification, has been provided, though it is not obviously of great value due to editorial reorganization of the material.
The most significant changes in this draft are: a new mechanism for dealing with parameters, new defaulting rules for primary input and output ports, and revisions to the standard step library.
Please send comments about this document to public-xml-processing-model-comments@w3.org (public archives are available).
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output.
A pipeline consists of steps. Like pipelines, steps take zero or more XML documents as their input and produce zero or more XML documents as their output. The inputs to a step come from the web, from the pipeline document, from the inputs to the pipeline itself, or from the outputs of other steps in the pipeline. The outputs from a step are consumed by other steps, are outputs of the pipeline as a whole, or are discarded.
There are two kinds of steps: atomic steps and compound steps. Atomic steps carry out single operations and have no substructure as far as the pipeline is concerned, whereas compound steps include a subpipeline of steps within themselves.
This specification defines a standard library of steps in Appendix A, Standard Step Library. Pipeline implementations may support additional types of steps as well.
Figure 1, “A simple, linear XInclude/Validate pipeline” is a graphical representation of a simple pipeline that performs XInclude processing and validation on a document.

This is a pipeline that consists of two atomic steps, XInclude and Validate. The pipeline itself has two inputs, “Document” and “Schema”. How these inputs are connected to XML documents outside the pipeline is implementation-defined. The XInclude step reads the pipeline input “Document” and produces a result document. The Validate step reads the pipeline input “Schema” and the output from the XInclude step and produces a result document. The result of the validation, “Result Document”, is the result of the pipeline. How pipeline outputs are connected to XML documents outside the pipeline is implementation-defined.
The pipeline document for this pipeline is shown in Example 1, “A simple, linear XInclude/Validate pipeline”.
<p:pipeline name="fig1" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source" primary="yes"/>
  <p:input port="schemaDoc" sequence="yes" primary="no"/>
  <p:output port="result"/>
  <p:xinclude/>
  <p:validate-xml-schema>
    <p:input port="schema">
      <p:pipe step="fig1" port="schemaDoc"/>
    </p:input>
  </p:validate-xml-schema>
</p:pipeline>
Figure 2, “A validate and transform pipeline” is a more complex example: it performs schema validation with an appropriate schema and then styles the validated document.

The heart of this example is the conditional. The “choose” step evaluates an XPath expression over a test document. Based on the result of that expression, one or another branch is run. In this example, each branch consists of a single validate step.
<p:pipeline xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:documentation xmlns="http://www.w3.org/1999/xhtml">
    <div>
      <p>This is documentation</p>
    </div>
  </p:documentation>
  <p:choose>
    <p:when test="/*[@version &lt; 2.0]">
      <p:validate-xml-schema name="val1">
        <p:input port="schema">
          <p:document href="v1schema.xsd"/>
        </p:input>
      </p:validate-xml-schema>
    </p:when>
    <p:otherwise>
      <p:validate-xml-schema name="val2">
        <p:input port="schema">
          <p:document href="v2schema.xsd"/>
        </p:input>
      </p:validate-xml-schema>
    </p:otherwise>
  </p:choose>
  <p:xslt name="xform">
    <p:input port="stylesheet">
      <p:document href="stylesheet.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
[Definition: A pipeline is a set of connected steps, outputs flowing into inputs, without any loops (no step can read its own output, directly or indirectly).] A pipeline is itself a step and must satisfy the constraints on steps.
The result of evaluating a pipeline is the result of evaluating the steps that it contains, in the order determined by the connections between them. A pipeline must behave as if it evaluated each step each time it occurs. Unless otherwise indicated, implementations must not assume that steps are functional (that is, that their outputs depend only on their explicit inputs, options, and parameters) or side-effect free.
[Definition: A step is the basic computational unit of a pipeline. Steps are either atomic or compound.] [Definition: An atomic step is a step that performs a unit of XML processing, such as XInclude or transformation, and has no internal subpipeline.] Atomic steps carry out fundamental XML operations and can perform arbitrary amounts of computation, but they are indivisible. An XSLT step, for example, performs XSLT processing; an XML Schema Validation step validates one input with respect to some set of XML Schemas, etc.
There are many types of atomic steps. The standard library of atomic steps is described in Appendix A, Standard Step Library, but implementations may provide others as well. Each use, or instance, of an atomic step invokes the processing defined by that type of step. A pipeline may contain instances of many types of steps and many instances of the same type of step.
Compound steps, on the other hand, control and organize the flow of documents through a pipeline, reconstructing familiar programming language functionality such as conditionals, iterators and exception handling. They contain other steps, whose evaluation they control.
[Definition: A compound step is a step that contains additional steps. That is, a compound step differs from an atomic step in that its semantics are at least partially determined by the steps that it contains.]
Every compound step contains one or more steps. [Definition: The steps that occur directly inside a compound step are called contained steps.] [Definition: A compound step which immediately contains another step is called its container.]
[Definition: The steps (and the connections between them) within a compound step form a subpipeline.] [Definition: The last step in a subpipeline is the last step in document order within its container. ]
A compound step can contain one or more subpipelines and it determines how and which of its subpipelines are evaluated.
Steps have “ports” into which inputs and outputs are connected or “bound”. Each step has a number of input ports and a number of output ports, all with unique names. A step can have zero input ports and/or zero output ports. (All steps have an implicit output port for reporting errors that must not be declared.)
Steps have any number of options, all with unique names. A step can have zero options.
Steps may have access to any number of parameters, all with unique names. A step can have zero parameters.
Although some steps can read and write non-XML resources, what flows between steps through input ports and output ports are exclusively XML documents or sequences of XML documents. Each XML document (or document in a sequence) must conceptually be an [Infoset] with a Document Information Item at its root. The inputs and outputs can be implemented as sequences of characters, events, or object models, or any other representation the implementation chooses.
It is a dynamic error (err:XD0001) if a non-XML resource is produced on a step output or arrives on a step input.
An implementation may make it possible for a step to produce non-XML output (through channels other than a named output port)—for example, writing a PDF document to a URI—but that output cannot flow through the pipeline. Similarly, one can imagine a step that takes no pipeline inputs, reads a non-XML file from a URI, and produces an XML output. But the non-XML file cannot arrive on an input port to a step.
The common case is that each step has one or more inputs and one or more outputs. Figure 3, “An atomic step” illustrates symbolically an atomic step with two inputs and one output.

All atomic steps are defined by a p:declare-step. The declaration of an atomic step defines the input ports, output ports, and options of all steps of that type. For example, every p:xslt step has two inputs, named “source” and “stylesheet”, and one output named “result” and the same set of options.
The situation is slightly more complicated for compound steps because they don't have separate declarations; each instance of a compound step serves as its own declaration. Compound steps don't have declared inputs, but they do have declared outputs, and unlike atomic steps, on compound steps, the number and names of the outputs can be different on each instance of the step.
Figure 4, “A compound step” illustrates symbolically a compound step with one output. As you can see from the diagram, the output from the compound step comes from one of the outputs of the subpipeline within the step.

[Definition: The input ports declared on a step are its declared inputs.] [Definition: The output ports declared on a step are its declared outputs.] When a step is used in a pipeline, connections are made to all of its inputs and outputs.
When a step is used, all of the declared inputs of the step must be connected. Each input may be connected to:
The output port of some other step.
A fixed, inline document or sequence of documents.
A document read from a URI.
One of the inputs declared on the top-level p:pipeline step.
When an input accepts a sequence of documents, it may have one or more bindings to any of those locations.
All of the declared outputs of a step must be connected. Outputs may be connected to:
The input port of some other step.
One of the outputs declared on the top-level p:pipeline step.
Output ports on compound steps have a dual nature: from the perspective of the compound step's siblings, its outputs are just ordinary outputs and must be connected as described above. From the perspective of the compound step itself, they are inputs into which something must be connected.
Within a compound step, the declared outputs of the step can be connected to:
The output port of some contained step.
A fixed, inline document or sequence of documents.
A document read from a URI.
Each input and output is declared to accept or produce either a single document or a sequence of documents. It is not an error to connect a port that is declared to produce a sequence of documents to a port that is declared to accept only a single document. It is, however, an error if the former step actually produces more than one document at run time.
[Definition: The signature of a step is the set of inputs, outputs, and options that it is declared to accept.] Each atomic step (e.g. XSLT or XInclude) has a fixed signature, declared globally or built-in, which all its instances share, whereas each compound step has its own implicit signature.
[Definition: A step matches its signature if and only if it specifies an input for each declared input, it specifies no inputs that are not declared, it specifies an option for each option that is declared to be required, and it specifies no options that are not declared.] In other words, every input and required option must be specified and only inputs and options that are declared may be specified. Options that aren't required do not have to be specified.
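As an illustrative sketch of these rules (not one of the specification's own examples), consider two instances of p:xslt, which is described above as declaring the inputs “source” and “stylesheet”. The step names and file names are hypothetical, and the sketch assumes that “source” is the primary input of p:xslt, so that leaving it unbound still counts as specifying it through the implicit binding.

```xml
<!-- Matches its signature: "stylesheet" is bound explicitly and "source"
     is assumed to be supplied by the implicit primary-input binding. -->
<p:xslt name="ok" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="stylesheet">
    <p:document href="docbook.xsl"/>
  </p:input>
</p:xslt>

<!-- Does not match its signature: "schema" is not a declared input of
     p:xslt, and no input is specified for the declared "stylesheet" port. -->
<p:xslt name="bad" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="schema">
    <p:document href="v1schema.xsd"/>
  </p:input>
</p:xslt>
```

The second instance fails on both counts named in the definition: it specifies an input that is not declared, and it omits a declared input.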
Steps may also produce error, warning, and informative messages. These messages appear on a special “error output” port defined (only) in the catch clause of a try/catch. Outside of a try/catch, the disposition of error messages is implementation-dependent.
As a convenience for pipeline authors, each step may have one input port designated as the primary input port and one output port designated as the primary output port.
[Definition: If a step has exactly one input port, or if one of its input ports is explicitly designated as the primary, then that input port is the primary input port of the step.] If a step has a single input port and that port is explicitly designated as not being the primary input port, or if a step has more than one input port and none is explicitly designated the primary, then the primary input port of that step is undefined.
[Definition: If a step has exactly one output port, or if one of its output ports is explicitly designated as the primary, then that output port is the primary output port of the step.] If a step has a single output port and that port is explicitly designated as not being the primary, or if a step has more than one output port and none is explicitly designated the primary, then the primary output port of that step is undefined.
The special significance of primary input and output ports is that they are connected automatically by the processor if no explicit binding is given. Generally speaking, if two steps appear sequentially in a subpipeline, then the primary output of the first step will automatically be connected to the primary input of the second.
Additionally, if a compound step has no declared inputs and the first step in its subpipeline has an unbound primary input, then an implicit (and unnamed) primary input port will be added to the compound step. If a compound step has no declared outputs and the last step in its subpipeline has an unbound primary output, then an implicit (and also unnamed) primary output port will be added to the compound step.
The practical consequence of these rules is that straightforward, linear pipelines are much simpler to read, write, and understand. The following pipeline has a single input which is transformed by the XSLT step; the result of that XSLT step is the result of the pipeline:
<p:pipeline xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="docbook.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
It is semantically equivalent to this pipeline:
<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source"/>
  <p:output port="result">
    <p:pipe step="transform" port="result"/>
  </p:output>
  <p:xslt name="transform">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="docbook.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
Some steps accept options. Options are name/value pairs.
[Definition: An option is a name/value pair where the name is an expanded name and the value must be a string.] If a document, node, or other value is given, its [XPath 1.0] string value is computed and that string is used.
[Definition: The options declared on a step are its declared options.] All of the options specified on an atomic step must have been declared. Option names are always expressed as literal values; pipelines cannot construct option names dynamically.
[Definition: The options on a step which have specified values, either because a p:option element specifies a value or because the declaration included a default value, are its specified options.]
Some steps accept parameters. Parameters are name/value pairs.
[Definition: A parameter is a name/value pair where the name is an expanded name and the value must be a string.] If a document, node, or other value is given, its [XPath 1.0] string value is computed and that string is used.
Unlike options, which have names known in advance to the pipeline, parameters are not declared and their names may be unknown to the pipeline author. Pipelines can dynamically construct sets of parameters. Steps can read dynamically constructed sets with parameter inputs.
Steps are connected together by their input ports and output ports. It is a static error (err:XS0001) if there are any loops in the connections between steps: no step can be connected to itself, nor can there be any sequence of connections through other steps that leads back to itself.
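The following sketch shows the kind of loop that this rule prohibits. It is a hypothetical pipeline written for illustration only; it assumes that p:identity has a “source” input and a “result” output, as the other p:identity examples in this document suggest.

```xml
<p:pipeline name="looped" xmlns:p="http://www.w3.org/2007/03/xproc">
  <!-- "first" reads the output of "second" ... -->
  <p:identity name="first">
    <p:input port="source">
      <p:pipe step="second" port="result"/>
    </p:input>
  </p:identity>
  <!-- ... and "second" reads the output of "first", closing the loop. -->
  <p:identity name="second">
    <p:input port="source">
      <p:pipe step="first" port="result"/>
    </p:input>
  </p:identity>
</p:pipeline>
```

Each binding is acceptable on its own, since sibling outputs are readable ports; it is the pair taken together that leads back to its starting point, so a processor would be expected to report err:XS0001 for this pipeline.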
[Definition: The environment of a step is the static information available to each instance of a step in a pipeline.]
The environment consists of:
A set of readable ports. [Definition: The readable ports are the step name/output port name pairs that are visible to the step.] Inputs and outputs can only be connected to readable ports.
A set of in-scope options. [Definition: The in-scope options are the set of options that are visible to a step.] All of the in-scope options are available to the processor for computing option and parameter values. The actual options passed to a step are those that are declared for a step of its type and that have values either provided explicitly with p:option elements on the step or as defaults in the declaration of the step.
A default readable port. [Definition: The default readable port, which may be undefined, is a specific step name/port name pair from the set of readable ports.]
[Definition: The empty environment contains no readable ports, no in-scope options, and an undefined default readable port. ]
Unless otherwise specified, the environment of a contained step is its inherited environment. [Definition: The inherited environment of a contained step is an environment that is the same as the environment of its container with the standard modifications. ]
The standard modifications made to an inherited environment are:
All of the specified options of the container are added to the in-scope options. The value of any option in the environment with the same name as one of the options specified on the container is shadowed by the new value.
In other words, steps can access the most recently specified value of all of the options specified on any ancestor step.
The union of all the declared outputs of all of the step's sibling contained steps is added to the readable ports.
In other words, sibling steps can see each other's outputs in addition to the outputs visible to their container.
If there is a preceding sibling step element:
If that preceding sibling has a primary output port, then that output port becomes the default readable port.
Otherwise, the default readable port is undefined.
If there is not a preceding sibling step element, the default readable port is unchanged.
A step with no parent inherits the empty environment.
The XProc processor must support a few additional functions in XPath expressions evaluated by the processor.
XPath expressions within a pipeline document can interrogate the processor for information about the current state of the pipeline. Several aspects of the processor are exposed through the p:system-property function in the pipeline namespace:
Function: String p:system-property(String property)
The property string must have the form of a QName; the QName is expanded into a name using the namespace declarations in scope for the expression. The p:system-property function returns the string representing the value of the system property identified by the QName. If there is no such property, the empty string must be returned.
Implementations must provide the following system properties, which are all in the XProc namespace:
Returns a string which should be unique for each invocation of the pipeline processor.
The unique identifier must consist of ASCII alphanumeric characters and must start with an alphabetic character. Thus, the string is syntactically an XML name.
Returns a string containing the name of the implementation, as defined by the implementer. This should normally remain constant from one release of the product to the next. It should also be constant across platforms in cases where the same source code is used to produce compatible products for multiple execution platforms.
Returns a string identifying the version of the implementation, as defined by the implementer. This should normally vary from one release of the product to the next, and at the discretion of the implementer it may also vary across different execution platforms.
Returns a string which identifies the vendor of the processor.
Returns a URI which identifies the vendor of the processor. Often, this is the URI of the vendor's web site.
Returns the version of XProc implemented by the processor; for processors implementing the version of XProc specified by this document, the number is “1.0”.
The value will always be a string in the lexical space of the decimal data type defined in [W3C XML Schema: Part 2]. This allows the value to be converted to a number for the purpose of magnitude comparisons.
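As a hedged sketch of how such a property might be used, the pipeline below branches on the XProc version reported by the processor. The property name p:version and the stylesheet file names are assumptions made for illustration; they are not given in the text above.

```xml
<p:pipeline xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:choose>
    <!-- 'p:version' is an assumed property name. The string result is
         converted to a number by the XPath 1.0 comparison. -->
    <p:when test="p:system-property('p:version') &gt;= 1.0">
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="current.xsl"/>
        </p:input>
      </p:xslt>
    </p:when>
    <p:otherwise>
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="fallback.xsl"/>
        </p:input>
      </p:xslt>
    </p:otherwise>
  </p:choose>
</p:pipeline>
```

Because an unknown property yields the empty string, which converts to NaN in XPath 1.0, the comparison would be false in that case and the p:otherwise branch would be selected.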
The p:step-available function reports whether or not a particular type of step is understood by the processor.
Function: Boolean p:step-available(String step-type)
The step-type string must have the form of a QName; the QName is expanded into a name using the namespace declarations in scope for the expression. The p:step-available function returns true if and only if the processor knows how to evaluate steps of the specified type.
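One plausible use of p:step-available, sketched here under stated assumptions, is to fall back to a more widely supported step when an optional one is missing. The stylesheet file names are hypothetical, and the sketch assumes that both p:xslt2 and p:xslt take their “source” input from the default readable port.

```xml
<p:pipeline xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:choose>
    <!-- The prefix in 'p:xslt2' is resolved against the namespace
         declarations in scope for this expression. -->
    <p:when test="p:step-available('p:xslt2')">
      <p:xslt2>
        <p:input port="stylesheet">
          <p:document href="style-2.0.xsl"/>
        </p:input>
      </p:xslt2>
    </p:when>
    <p:otherwise>
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="style-1.0.xsl"/>
        </p:input>
      </p:xslt>
    </p:otherwise>
  </p:choose>
</p:pipeline>
```

Whether a processor accepts a pipeline that mentions an unavailable step type in a branch that is never taken is not settled by the text above, so this pattern should be read as a sketch rather than as guaranteed behavior.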
In the context of a p:for-each or a p:viewport, the p:iteration-count function reports the number of iterations that have occurred. In the context of other standard XProc compound steps, it returns 1.
Function: Integer p:iteration-count()
In the context of an extension compound step, the value is implementation-defined.
This section describes the normative XML syntax of XProc. This syntax is sufficient to represent all the aspects of a pipeline, as set out in the preceding sections.
Elements in a pipeline document represent the pipeline, the steps it contains, the connections between those steps, the steps and connections contained within them, and so on. Each step is represented by an element; a combination of elements and attributes specify how the inputs and outputs of each step are connected and how options and parameters are passed.
Conceptually, we can speak of steps as objects that have inputs and outputs, that are connected together and which may contain additional steps. Syntactically, we need a mechanism for specifying these relationships.
Containment is represented naturally using nesting of XML elements. If a particular element identifies a compound step then the step elements that are its immediate children form its subpipeline.
The connections between steps are expressed using names and references to those names.
Six kinds of things are named in XProc: step types, steps, input ports, output ports, options, and parameters.
The XML syntax for XProc uses three namespaces:
The namespace of the XProc XML vocabulary described by this specification; by convention, the namespace prefix “p:” is used for this namespace.
The namespace used for documents that are inputs to and outputs from several standard and optional steps described in this specification. Some steps, such as p:http-request and p:store, have defined input or output vocabularies. We use this namespace for all of those documents. The conventional prefix “c:” is used for this namespace.
The namespace used for error reporting. When a step fails inside a p:try, it may produce error messages that can be inspected in the p:catch. The error namespace is used for those messages. The conventional prefix “err:” is used for this namespace.
The scope of the names of step types is the pipeline. Each pipeline processor has some number of built in step types and may declare (directly, or by reference to an external library) additional step types.
The scope of the names of the steps themselves is determined by the environment of each step. In general, the name of a step, the names of its sibling steps, the names of any steps that it contains directly, the names of its ancestors, and the names of its ancestors' siblings are all in the same scope. All in-scope steps must have unique names: it is a static error (err:XS0002) if two steps with the same name appear in the same scope.
The scope of an input or output port name is the step on which it is defined. The names of all the ports on any step must be unique.
Taken together, these uniqueness constraints guarantee that the combination of a step name and a port name uniquely identifies exactly one port on exactly one in-scope step.
The scope of option names is essentially the same as the scope of step names, with the following caveat: whereas step names must be unique, option names may be repeated. An option specified on a step shadows any specification that may already be in-scope.
Parameter names are not scoped; they are distinct on each step.
[Definition: A binding associates an input or output port with some data source.] A document or a sequence of documents can be bound to a port in four ways: by source, by URI, by providing it inline, or by making it explicitly empty. Each of these mechanisms is allowed on the p:input, p:output, p:xpath-context, p:iteration-source, and p:viewport-source elements.
[Definition: A document is specified by URI if it is referenced with a URI.] The href attribute on the p:document element is used to refer to documents by URI.
In this example, the input to the p:identity step named “otherstep” comes from “http://example.com/input.xml”.
<p:identity name="otherstep">
  <p:input port="source">
    <p:document href="http://example.com/input.xml"/>
  </p:input>
</p:identity>
It is a dynamic error (err:XD0002) if the processor attempts to retrieve the URI specified on a p:document and fails (for example, if the resource does not exist or is not accessible with the user's authentication credentials).
[Definition: A document is specified by source if it references a specific port on another step.] The step and port attributes on the p:pipe element are used for this purpose.
In this example, the “source” input to the p:xinclude step named “expand” comes from the “result” port of the step named “otherstep”.
<p:xinclude name="expand">
  <p:input port="source">
    <p:pipe step="otherstep" port="result"/>
  </p:input>
</p:xinclude>
When a p:pipe is used, the specified port must be in the readable ports of the current environment. It is a static error (err:XS0003) if the port specified by a p:pipe is not in the readable ports of the environment.
[Definition: An inline document is specified directly in the body of the element that binds it.] The content of the p:inline element is used for this purpose.
In this example, the “stylesheet” input to the XSLT step named “xform” comes from the content of the p:inline element inside the p:input element.
<p:xslt name="xform">
  <p:input port="stylesheet">
    <p:inline>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                      version="1.0">
        ...
      </xsl:stylesheet>
    </p:inline>
  </p:input>
</p:xslt>
Inline documents are considered “quoted”: they are not interpolated or available to the pipeline processor in any way except as documents flowing through the pipeline.
[Definition: An empty sequence of documents is specified with the p:empty element.]
In this example, the “source” input to the XSLT 2.0 step named “generate” is explicitly empty:
<p:xslt2 name="generate">
  <p:input port="source">
    <p:empty/>
  </p:input>
  <p:input port="stylesheet">
    <p:inline>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                      version="2.0">
        ...
      </xsl:stylesheet>
    </p:inline>
  </p:input>
  <p:option name="template-name" value="someName"/>
</p:xslt2>
If you omit the binding on a primary input port, a binding to the default readable port will be assumed. Making the binding explicitly empty guarantees that the binding will be to an empty sequence of documents.
Note that a p:input or p:output element may contain more than one p:pipe, p:document, or p:inline element. If more than one binding is provided, then the specified sequence of documents is made available on that port in the same order as the bindings.
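A minimal sketch of several bindings on one port follows. It assumes an enclosing pipeline named “main” that declares a “schemaDoc” input, and it assumes that the “schema” port of p:validate-xml-schema accepts a sequence; the step name and file name are hypothetical.

```xml
<p:validate-xml-schema name="check" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="schema">
    <!-- 1st: a schema read from a URI -->
    <p:document href="base.xsd"/>
    <!-- 2nd: whatever arrives on the pipeline's "schemaDoc" input -->
    <p:pipe step="main" port="schemaDoc"/>
    <!-- 3rd: a schema supplied inline -->
    <p:inline>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <!-- schema content would go here -->
      </xs:schema>
    </p:inline>
  </p:input>
</p:validate-xml-schema>
```

Under the rule above, the “schema” port would see the document from base.xsd first, then the documents from “schemaDoc”, then the inline schema.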
Pipeline authors may add documentation to their pipeline documents with the p:documentation element. Except when it appears as a descendant of p:inline, the p:documentation element is completely ignored by pipeline processors; it exists simply for documentation purposes. (If a p:documentation is provided as a descendant of p:inline, it has no special semantics; it is treated literally as part of the document to be provided on that port.)
Pipeline processors that inspect the contents of p:documentation elements and behave differently on the basis of what they find are not conformant. Processor extensions must be specified with extension elements.
[Definition: An element from the XProc namespace may have any attribute not from the XProc namespace, provided that the expanded-QName of the attribute has a non-null namespace URI. Such an attribute is called an extension attribute.] The presence of an extension attribute must not cause the connections between steps to differ from the connections that any other conformant XProc processor would produce. It must not cause the processor to fail to signal an error that a conformant processor is required to signal. This means that an extension attribute must not change the effect of any XProc element except to the extent that the effect is implementation-defined or implementation-dependent.
A processor which encounters an extension attribute that it does not recognize must behave as if the attribute were not present.
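For illustration, an extension attribute might look like the sketch below. The “ext” namespace URI and the ext:cache-stylesheet attribute are hypothetical and are not defined by this specification or, as far as is known, by any implementation.

```xml
<p:pipeline xmlns:p="http://www.w3.org/2007/03/xproc"
            xmlns:ext="http://example.com/ns/extensions">
  <!-- ext:cache-stylesheet is a made-up hint in a non-XProc namespace.
       A processor that does not recognize it behaves as if it were absent. -->
  <p:xslt ext:cache-stylesheet="yes">
    <p:input port="stylesheet">
      <p:document href="docbook.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
```

A hint of this kind stays within the rules above because it leaves the connections between steps and the errors that must be signaled unchanged.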
The presence of an extension element must not cause the connections between steps to differ from the connections that any other conformant XProc processor would produce. It must not cause the processor to fail to signal an error that a conformant processor is required to signal. This means that an extension element must not change the effect of any XProc element except to the extent that the effect is implementation-defined or implementation-dependent.
There are three contexts in which an extension element might occur:
In an inline document. All elements in an inline document are considered quoted; no extension element can occur.
In a subpipeline. In a subpipeline, any element in a namespace that is in the set of ignored namespaces is an extension element. Every other element identifies a step.
In any other context, any element that is not in the pipeline namespace is an error.
The element children of a p:pipeline can come from many different namespaces. Some of the children identify steps in the subpipeline; others may be extension elements. In order to determine which elements are extension elements and which are expected to identify steps, the pipeline may specify a set of “ignored namespaces”.
The ignored namespaces are a set of namespaces which do not identify steps. They are ignored by the processor unless the processor happens to recognize one or more of them as extension elements.
Syntactically, a pipeline author can specify the set of ignored namespaces with the ignore-prefixes attribute. This attribute can appear on the p:pipeline and p:pipeline-library elements. It is a static error (err:XS0004) if the ignore-prefixes attribute appears on any other element in the pipeline namespace.
The value of the ignore-prefixes attribute is a sequence of tokens, each of which must be the prefix of an in-scope namespace. It is a static error (err:XS0005) if any token specified in the ignore-prefixes attribute is not the prefix of an in-scope namespace.
Elements in an ignored namespace are only ignored when they appear as the direct children of the p:pipeline or p:pipeline-library which specifies the ignored namespaces.
Any ignored namespaces that are specified in a pipeline library are not inherited by pipelines either within that library or that import that library; they only apply to the elements that appear as children of the p:pipeline-library element on which they are specified.
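The following sketch shows how these rules might combine in practice. The ex prefix, its namespace URI, and the ex:profile element are hypothetical:
<p:pipeline name="main" ignore-prefixes="ex"
    xmlns:p="http://www.w3.org/2007/03/xproc"
    xmlns:ex="http://example.com/ns/xproc-extensions">
  <p:input port="source" primary="yes"/>
  <p:output port="result" primary="yes"/>
  <ex:profile level="verbose"/>
  <p:xinclude/>
</p:pipeline>
Because ex is listed in ignore-prefixes, ex:profile is treated as an extension element and not as a step. Without the attribute, the processor would expect ex:profile to identify a step.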
The description of each element in the pipeline namespace is accompanied by a syntactic summary that provides a quick overview of the element's syntax:
<p:some-element
some-attribute? = some-type>
(some |
elements |
allowed)*,
other-elements?
</p:some-element>
For clarity of exposition, some attributes and elements are elided from the summaries:
An xml:id attribute is allowed on any element. It has the semantics of [xml:id].
An xml:base attribute is allowed on any element. It has the semantics of [XML Base].
The p:documentation element is not shown, but it is allowed anywhere.
Attributes that are syntactic shortcuts for option values are not shown.
This section describes the core steps of XProc.
Every compound step in a pipeline has several parts: a set of inputs, a set of outputs, a set of options, a set of contained steps, and an environment.
In previous drafts, inputs, outputs, and options occurred in a fixed order. In this draft, they may appear in any order (but before the contained steps). Is that problematic?
Except where otherwise noted, a compound step can have an arbitrary number of outputs, options, and contained steps.
It is a static error (err:XS0027) if a compound step has no contained steps.
A pipeline is specified by the p:pipeline element. It encapsulates the behavior of a subpipeline. Its children declare the inputs, outputs, and options that the pipeline exposes and identify the steps in its subpipeline.
A pipeline can declare additional steps (e.g., ones that are provided by a particular implementation or in some implementation-defined way) and import other pipelines. If a pipeline has been imported, it may be invoked as a step within the pipeline that imported it.
<p:pipeline
  name? = NCName
  type? = QName
  ignore-prefixes? = prefix list>
    (p:input |
     p:output |
     p:option |
     p:import |
     p:declare-step |
     p:log)*,
    subpipeline
</p:pipeline>
Viewed from the outside, a p:pipeline is a black box which performs some calculation on its inputs and produces its outputs. From the pipeline author's perspective, the computation performed by the pipeline is described in terms of contained steps which read the pipeline's inputs and produce the pipeline's outputs.
The environment inherited by the contained steps of a p:pipeline is the empty environment with these modifications:
All of the declared inputs of the pipeline are added to the readable ports in the environment.
If the pipeline has a primary input port, that input is the default readable port, otherwise the default readable port is undefined.
All of the declared options of the pipeline are added to the in-scope options in the environment.
If the p:pipeline has a primary output port and that port has no binding, then it is bound to the primary output port of the last step in the subpipeline.
It is a static error (err:XS0006) if the primary output port has no binding and the last step in the subpipeline does not have a primary output port.
There are two additional constraints on pipelines:
A p:pipeline must not itself be a contained step.
If a p:pipeline is part of a p:pipeline-library or if it is imported directly with p:import, then it must have a name or a type or both.
If the pipeline initially invoked by the processor has inputs or outputs, those ports are bound to documents outside of the pipeline in an implementation-defined manner.
If a pipeline has a type, then that type may be used as the name of a step to invoke the pipeline. This most often occurs when it has been imported into another pipeline, but pipelines may also invoke themselves recursively. If it does not have a type, then its name is used to invoke it as a step.
For pipelines that are part of a p:pipeline-library, see Section 5.9, “p:pipeline-library Element” for more details on how p:pipeline names are used to compute step names.
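As a sketch of invocation by type, suppose a pipeline is declared as follows. The ex prefix, its namespace URI, and the names used here are hypothetical, and the import that makes the pipeline available to other pipelines is not shown:
<p:pipeline name="format" type="ex:format"
    xmlns:p="http://www.w3.org/2007/03/xproc"
    xmlns:ex="http://example.com/ns/pipelines">
  <p:input port="source" primary="yes"/>
  <p:output port="result" primary="yes"/>
  <p:xinclude/>
</p:pipeline>
A pipeline that has imported it could then invoke it as a step by using the type as the element name:
<ex:format/>
Had the type attribute been omitted, the pipeline's name would be used to invoke it instead, as described above.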
A pipeline might accept a document and a stylesheet as input; perform XInclude, validation, and transformation; and produce the formatted document as its output.
<p:pipeline name="pipeline" xmlns:p="http://www.w3.org/2007/03/xproc">
<p:input port="document" primary="yes"/>
<p:input port="stylesheet"/>
<p:output port="result" primary="yes"/>
<p:xinclude/>
<p:validate-xml-schema>
<p:input port="schema">
<p:document href="http://example.com/path/to/schema.xsd"/>
</p:input>
</p:validate-xml-schema>
<p:xslt>
<p:input port="stylesheet">
<p:pipe step="pipeline" port="stylesheet"/>
</p:input>
</p:xslt>
</p:pipeline>
A for-each is specified by the p:for-each element. It processes a sequence of documents, applying its subpipeline to each document in turn.
<p:for-each
  name? = NCName>
    (p:iteration-source?,
     (p:output |
      p:option |
      p:log)*,
     subpipeline)
</p:for-each>
When a pipeline needs to process a sequence of documents using a step that only accepts a single document, the p:for-each construct can be used as a wrapper around the step that accepts only a single document. The p:for-each will apply that step to each document in the sequence in turn.
The result of the p:for-each is a sequence of documents produced by processing each individual document in the input sequence. If the subpipeline is connected to one or more output ports on the p:for-each, what appears on each of those ports is the sequence of documents that is the concatenation of the sequence produced by each iteration of the loop.
The p:iteration-source is an anonymous input: its binding provides a sequence of documents to the p:for-each step. If no iteration sequence is explicitly provided, then the iteration source is read from the default readable port.
A portion of each input document can be selected using the select attribute. If no selection is specified, the document node of each document is selected.
Each subtree selected by the p:for-each from each of the inputs that appear on the iteration source is wrapped in a document node and provided to the subpipeline.
The processor provides each document, one at a time, to the subpipeline represented by the children of the p:for-each on a port named current.
For each declared output, the processor collects all the documents that are produced for that output from all the iterations, in order, into a sequence. The result of the p:for-each on that output is that sequence of documents.
The environment inherited by the contained steps of a p:for-each is the inherited environment with these modifications:
The port named “current” on the p:for-each is added to the readable ports.
The port named “current” on the p:for-each is made the default readable port.
If the p:for-each has a primary output port and that port has no binding, then it is bound to the primary output port of the last step in the subpipeline.
It is a static error (err:XS0006) if the primary output port has no binding and the last step in the subpipeline does not have a primary output port.
A p:for-each might accept a sequence of chapters as its input, process each chapter in turn with XSLT, a step that accepts only a single input document, and produce a sequence of formatted chapters as its output.
<p:for-each name="chapters">
<p:iteration-source select="//chapter"/>
<p:output port="html-results">
<p:pipe step="make-html" port="result"/>
</p:output>
<p:output port="fo-results">
<p:pipe step="make-fo" port="result"/>
</p:output>
<p:xslt name="make-html">
<p:input port="stylesheet">
<p:document href="http://example.com/xsl/html.xsl"/>
</p:input>
</p:xslt>
<p:xslt name="make-fo">
<p:input port="source">
<p:pipe step="chapters" port="current"/>
</p:input>
<p:input port="stylesheet">
<p:document href="http://example.com/xsl/fo.xsl"/>
</p:input>
</p:xslt>
</p:for-each>
The //chapter elements of the document are selected. Each chapter is transformed into HTML and XSL Formatting Objects using an XSLT step. The resulting HTML and FO documents are aggregated together and appear on the html-results and fo-results ports, respectively, of the chapters step itself.
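A shorter variant, given here only as a sketch, relies on the defaults described above: with no p:iteration-source, the sequence is read from the default readable port, and the unbound primary output is bound to the primary output port of the last step in the subpipeline. The step name, the output port name, and the stylesheet URI are placeholders, and the sketch assumes that the result port of p:xslt is its primary output port:
<p:for-each name="each-doc">
  <p:output port="result" primary="yes" sequence="yes"/>
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="http://example.com/xsl/html.xsl"/>
    </p:input>
  </p:xslt>
</p:for-each>
The output is declared with sequence="yes" because the p:for-each may produce more than one document on that port.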
A viewport is specified by the p:viewport element. It processes a single document, applying its subpipeline to one or more subsections of the document.
<p:viewport
  name? = NCName
  match = XPath expression>
    ((p:viewport-source?,
      p:output?,
      p:log?,
      p:option*),
     subpipeline)
</p:viewport>
The result of the p:viewport is a copy of the original document with the selected subsections replaced by the results of applying the subpipeline to them.
The p:viewport-source is an anonymous input: its binding provides a single document to the p:viewport step. If no document is explicitly provided, then the viewport source is read from the default readable port.
The match attribute specifies an [XPath 1.0] expression that is a Pattern in [XSLT 1.0]. Each matching node in the source document is wrapped in a document node and provided to the viewport's subpipeline.
The processor provides each document, one at a time, to the subpipeline represented by the children of the p:viewport on a port named current.
What appears on the output from the p:viewport will be a copy of the input document where each matching node is replaced by the result of applying the subpipeline to the subtree rooted at that node.
It is a dynamic error (err:XD0003) if the viewport source does not provide exactly one document.
The environment inherited by the contained steps of a p:viewport is the inherited environment with these modifications:
The port named “current” on the p:viewport is added to the readable ports.
The port named “current” on the p:viewport is made the default readable port.
If the p:viewport has a primary output port and that port has no binding, then it is bound to the primary output port of the last step in the subpipeline.
It is a static error (err:XS0006) if the primary output port has no binding and the last step in the subpipeline does not have a primary output port.
A p:viewport might accept an XHTML document as its input, add an hr element at the beginning of each div element that has the class value “chapter”, and return an XHTML document that is the same as the original except for that change.
<p:viewport match="h:div[@class='chapter']"
xmlns:h="http://www.w3.org/1999/xhtml">
<p:insert at-start="true">
<p:input port="insertion">
<p:inline>
<hr xmlns="http://www.w3.org/1999/xhtml"/>
</p:inline>
</p:input>
</p:insert>
</p:viewport>
The nodes which match h:div[@class='chapter'] (according to the rules of [XSLT 1.0]) in the input document are selected. An hr is inserted as the first child of each h:div and the resulting version replaces the original h:div. The result of the whole step is a copy of the input document with a horizontal rule as the first child of each selected h:div.
A choose is specified by the p:choose element. It selects exactly one of a list of alternative subpipelines based on the evaluation of [XPath 1.0] expressions.
<p:choose
name? = NCName>
(p:xpath-context?,
p:when*,
p:otherwise?)
</p:choose>
A p:choose has no inputs. It contains an arbitrary number of alternative subpipelines, exactly one of which will be evaluated.
The list of alternative subpipelines consists of zero or more subpipelines guarded by an XPath expression, followed optionally by a single default subpipeline.
The p:choose considers each subpipeline in turn and selects the first (and only the first) subpipeline for which the guard expression evaluates to true in its context. If there are no subpipelines for which the expression evaluates to true, the default subpipeline, if it was specified, is selected.
After a subpipeline is selected, it is evaluated as if only it had been present.
The result of the p:choose is the result of the selected subpipeline.
In order to ensure that the result of the p:choose is consistent irrespective of the subpipeline chosen, each subpipeline must declare the same number of outputs with the same names. If any of the subpipelines specifies a primary output port, each subpipeline must specify exactly the same output as primary.
It is a static error (err:XS0007) if two subpipelines in a p:choose declare different outputs.
It is a dynamic error (err:XD0004) if no subpipeline is selected by the p:choose and no default is provided.
The p:choose can specify the context node against which the [XPath 1.0] expressions that occur on each branch are evaluated. The context node is specified as a binding for the xpath-context. If no binding is provided, the default xpath-context is the document on the default readable port.
It is a static error (err:XS0032) if no binding is provided and the default readable port is undefined.
It is a dynamic error (err:XD0005) if the xpath-context is bound to a sequence of documents.
Each conditional subpipeline is represented by a p:when element.
<p:when
  test = XPath expression>
    (p:xpath-context?,
     (p:output |
      p:option |
      p:log)*,
     subpipeline)
</p:when>
Each p:when branch of the p:choose has a test attribute which must contain an [XPath 1.0] expression. That XPath expression's effective boolean value is the guard expression for the subpipeline contained within that p:when.
The p:when can specify a context node against which its test expression is to be evaluated. That context node is specified as a binding for the xpath-context. If no context is specified on the p:when, the context of the p:choose is used.
It is a static error (err:XS0008) if no context is specified in either the p:choose or the p:when and the default readable port is undefined.
The default branch is represented by a p:otherwise element.
<p:otherwise>
((p:output |
p:option |
p:log)*,
subpipeline)
</p:otherwise>
A p:choose might test the version attribute of the document element and validate with an appropriate schema.
<p:choose name="version">
<p:when test="/*[@version = 2]">
<p:validate-xml-schema>
<p:input port="schema">
<p:document href="v2schema.xsd"/>
</p:input>
</p:validate-xml-schema>
</p:when>
<p:when test="/*[@version = 1]">
<p:validate-xml-schema>
<p:input port="schema">
<p:document href="v1schema.xsd"/>
</p:input>
</p:validate-xml-schema>
</p:when>
<p:when test="/*[@version]">
<p:identity/>
</p:when>
<p:otherwise>
<p:error code="NOVERSION"
description="Required version attribute missing."/>
</p:otherwise>
</p:choose>
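As a further sketch, a p:when can test a document other than the one on the default readable port by supplying its own p:xpath-context. The step name load-config, its result port, and the stylesheet file names are hypothetical:
<p:choose>
  <p:when test="/config/output = 'fo'">
    <p:xpath-context>
      <p:pipe step="load-config" port="result"/>
    </p:xpath-context>
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="fo.xsl"/>
      </p:input>
    </p:xslt>
  </p:when>
  <p:otherwise>
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="html.xsl"/>
      </p:input>
    </p:xslt>
  </p:otherwise>
</p:choose>
The test expression is evaluated against the document read from load-config. The p:xslt step in the selected branch has no explicit binding for its source, so, as in the example above, it is expected to read from the default readable port.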
A group is specified by the p:group element. It encapsulates the behavior of its subpipeline.
<p:group
name? = NCName>
((p:output |
p:option |
p:log)*,
subpipeline)
</p:group>
A p:group is a convenience wrapper for a collection of steps. The result of a p:group is the result of its subpipeline.
<p:group>
<p:option name="db-key" value="some-long-string-of-nearly-random-characters"/>
<p:choose>
<p:when test="/config/output = 'fo'">
<p:xslt>
<p:parameter name="key" select="$db-key"/>
<p:input port="stylesheet">
<p:document href="fo.xsl"/>
</p:input>
</p:xslt>
</p:when>
<p:when test="/config/output = 'svg'">
<p:xslt>
<p:parameter name="key" select="$db-key"/>
<p:input port="stylesheet">
<p:document href="svg.xsl"/>
</p:input>
</p:xslt>
</p:when>
<p:otherwise>
<p:xslt>
<p:parameter name="key" select="$db-key"/>
<p:input port="stylesheet">
<p:document href="html.xsl"/>
</p:input>
</p:xslt>
</p:otherwise>
</p:choose>
</p:group>
A try/catch is specified by the p:try element. It isolates a subpipeline, preventing any errors that arise within it from being exposed to the rest of the pipeline.
<p:try
name? = NCName>
(p:group,
p:catch)
</p:try>
The p:group represents the initial subpipeline and the recovery (or “catch”) pipeline is identified with a p:catch element.
The p:try step evaluates the initial subpipeline and, if no errors occur, the results of that pipeline are the results of the step. However, if any errors occur, it abandons the first subpipeline, discarding any output that it might have generated, and evaluates the recovery subpipeline.
If the recovery subpipeline is evaluated, the results of the recovery subpipeline are the results of the p:try step. If the recovery subpipeline is evaluated and a step within that subpipeline fails, the p:try fails.
In order to ensure that the result of the p:try is consistent irrespective of whether the initial subpipeline provides its output or the recovery subpipeline does, both subpipelines must declare the same number of outputs with the same names. If either of the subpipelines specifies a primary output port, both subpipelines must specify exactly the same output as primary. It is a static error (err:XS0009) if the p:group and p:catch subpipelines declare different outputs.
The recovery subpipeline of a p:try is identified with a p:catch:
<p:catch
name? = NCName>
((p:output |
p:option |
p:log)*,
subpipeline)
</p:catch>
The environment inherited by the contained steps of the p:catch is the inherited environment with this modification:
The port named “error” on the p:catch is added to the readable ports.
Should the error port be made the default readable port?
A pipeline might attempt to process a document by dispatching it to some web service. If the web service succeeds, then those results are passed to the rest of the pipeline. However, if the web service cannot be contacted or reports an error, the p:catch step can provide some sort of default for the rest of the pipeline.
<p:try>
<p:group>
<p:http-request>
<p:input port="source">
<p:inline>
<c:http-request method="post" href="http://example.com/form-action">
<c:entity-body content-type="application/x-www-form-urlencoded">
<c:body>name=W3C&amp;spec=XProc</c:body>
</c:entity-body>
</c:http-request>
</p:inline>
</p:input>
</p:http-request>
</p:group>
<p:catch>
<p:identity>
<p:input port="source">
<p:inline>
<c:error>HTTP Request Failed</c:error>
</p:inline>
</p:input>
</p:identity>
</p:catch>
</p:try>
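The requirement that both subpipelines declare the same outputs can be sketched as follows. The step name and the error text are invented, the c:error element is used as in the example above, and the sketch assumes that p:xinclude and p:identity each have a primary output port:
<p:try name="expand">
  <p:group>
    <p:output port="result" primary="yes"/>
    <p:xinclude/>
  </p:group>
  <p:catch>
    <p:output port="result" primary="yes"/>
    <p:identity>
      <p:input port="source">
        <p:inline>
          <c:error>XInclude processing failed</c:error>
        </p:inline>
      </p:input>
    </p:identity>
  </p:catch>
</p:try>
Both the p:group and the p:catch declare a single primary output named result, so the p:try presents the same ports whichever subpipeline runs. Neither output has an explicit binding, so each is bound to the primary output port of the last step in its subpipeline.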
Other steps are specified by elements that occur as contained steps and are not in any of the ignored namespaces.
Other steps can be atomic:
<pfx:other-atomic-step
  name? = NCName>
    (p:input |
     p:option |
     p:parameter |
     p:log)*
</pfx:other-atomic-step>
Or compound:
<pfx:other-compound-step
name? = NCName>
((p:input |
p:output |
p:option |
p:log)*,
subpipeline)
</pfx:other-compound-step>
Each atomic step must be the name of a p:pipeline type or must have been declared with a p:declare-step that appears in the pipeline, or an imported library, before it is used. Pipelines can refer to themselves (recursion is allowed), to pipelines defined in imported libraries, and to other pipelines in the same library if they are in a library.
If the step element name is the same as the type of a step declared with p:declare-step, then that step invokes the declared step.
If the step element name is the same as the type or name of a p:pipeline, then that step runs the pipeline identified by that type or name.
It is a static error (err:XS0010) if a pipeline contains a step whose specified inputs, outputs, and options do not match the signature for steps of that type.
It is a dynamic error (err:XD0017) if the running pipeline attempts to invoke a step which the processor does not know how to perform.
A pipeline author can make the set of parameters passed to a step explicit with a parameter input. If the step does not make an explicit binding for a parameter input, the default could be either to pass no parameters to the step or to behave as if the parameter input was bound to the pipeline parameters.
The working group is divided on this issue and this draft does not provide an answer to that question. Reader feedback is encouraged.
Namespace qualified attributes on a step are extension attributes. Attributes, other than name, that are not namespace qualified are treated as a syntactic shortcut for specifying the value of an option. In other words, the following two steps are equivalent:
The first step uses the standard p:option syntax:
<ex:stepType> <p:option name="option-name" value="5"/> </ex:stepType>
The second step uses the syntactic shortcut:
<ex:stepType option-name="5"/>
Note that there are significant limitations to this shortcut syntax:
It only applies to option names that are not in a namespace.
It only applies to option names that are not otherwise used on the step, such as “name”.
It can only be used to specify a constant value. Options that are computed with a select expression must be written using the longer form.
It is a static error (err:XS0027) if an option is specified with both the shortcut form and the long form. It is a static error (err:XS0031) to use an option on an atomic step that is not declared on steps of that type.
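To illustrate the last limitation, an option whose value is computed has to be written in the long form. This sketch reuses the ex:stepType step and option-name option from the examples above; it assumes that p:option accepts a select attribute and that an option named db-key is in scope:
<ex:stepType>
  <p:option name="option-name" select="$db-key"/>
</ex:stepType>
Since the shortcut form can only specify a constant value, writing <ex:stepType option-name="$db-key"/> would presumably set the option to the literal string $db-key and not to the value of the db-key option.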
A p:input identifies an input port for a step, declaring it if necessary. There are two kinds of inputs that may be declared, ordinary “document” inputs and “parameter” inputs. It is a static error (err:XS0033) to specify any kind of input other than “document” or “parameter”.
<p:input
port = NCName
sequence? = yes|no
primary? = yes|no
kind? = "document" />
The port attribute defines the name of the port. It is a static error (err:XS0011) to identify two ports with the same name on the same step. It is a static error (err:XS0012) if the port given does not match the name of an input port specified in the step's declaration.
On compound steps and p:declare-step, an input declaration can indicate if a sequence of documents is allowed to appear on the port and if the port is a primary input port.
If sequence is specified with the value “yes”, then a sequence is allowed. If sequence is not specified, or has the value “no”, then it is a dynamic error (err:XD0006) for a sequence of more than one document to appear on the declared port.
An input port is a primary input port if primary is specified with the value “yes” or if the step has only a single input port and primary is not specified. It is a static error (err:XS0030) to specify that more than one input port is the primary.
On p:declare-step, the p:input simply declares the input port. In all other contexts, the declaration may be accompanied by a binding for the input:
<p:input
port = NCName
sequence? = yes|no
primary? = yes|no
select? = XPath expression>
(p:empty |
(p:pipe |
p:document |
p:inline)+)?
</p:input>
If no binding is provided, the input will be bound to the default readable port. It is a static error (err:XS0032) if no binding is provided and the default readable port is undefined. A select expression may also be provided. If specified, the [XPath 1.0] select expression is applied to the document(s) that are read. Each node that matches is wrapped in a document and provided to the input port. In other words,
<p:input port="source"> <p:document href="http://example.org/input.html"/> </p:input>
provides a single document, but
<p:input port="source" select="//html:div"> <p:document href="http://example.org/input.html"/> </p:input>
provides a sequence of zero or more documents, one for each matching html:div in http://example.org/input.html.
A select expression can equally be applied to input read from another step. This input:
<p:input port="source" select="//html:div"> <p:pipe step="origin" port="result"/> </p:input>
provides a sequence of zero or more documents, one for each matching html:div in the document (or each of the documents) that is read from the result port of the step named origin.
It is a dynamic error (err:XD0016) if the select expression on a p:input returns anything other than a possibly empty set of nodes.
<p:input
  port = NCName
  sequence? = "yes"
  kind = "parameter" />
A parameter input port is a distinguished kind of input port. It exists only to receive computed parameters; if a step does not have a parameter input port then it cannot receive computed parameters. A parameter input port must satisfy all the constraints of a normal, document input.
The port attribute defines the name of the port. It is a static error (err:XS0011) to identify two ports with the same name on the same step. It is a static error (err:XS0012) if the port given does not match the name of an input port specified in the step's declaration.
When used on a step, parameter input ports always accept a sequence of documents. If no binding is provided, the parameter input will be bound to @@TBD.
All of the documents that appear on a parameter input must either be c:parameter documents or c:parameter-list documents.
A step which accepts a parameter input reads all of the documents presented on that port, using each c:parameter (either at the root or inside the c:parameter-list) to establish the value of the named parameter. If the same name appears more than once, the last value specified is used. If the step also has literal p:parameter elements, they are also considered in document order. In other words, p:parameter elements that appear before the parameter input may be overridden by the computed parameters; p:parameter elements that appear after may override the computed values.
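The ordering rule can be sketched as follows. The parameter input port name parameters, the step name compute-params, and the parameter names are hypothetical, and the sketch assumes that p:parameter accepts a value attribute in the same way that p:option does:
<p:xslt>
  <p:parameter name="mode" value="draft"/>
  <p:input port="parameters">
    <p:pipe step="compute-params" port="result"/>
  </p:input>
  <p:parameter name="lang" value="en"/>
  <p:input port="stylesheet">
    <p:document href="http://example.com/xsl/html.xsl"/>
  </p:input>
</p:xslt>
If the documents read from compute-params set both mode and lang, the step uses the computed value of mode, which overrides the earlier literal, and the literal value “en” for lang, which overrides the computed value.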
A c:parameter represents a parameter on a parameter input.
<c:parameter
  name = string
  namespace? = anyURI
  value = string />
The name attribute of the c:parameter must have the lexical form of a QName. If it contains a colon, then its expanded name is constructed using the namespace declarations in-scope on the c:parameter element. If it does not contain a colon and the namespace attribute is specified, then it is an expanded name in the specified namespace. If the namespace attribute is not specified, its expanded name has no namespace. It is a dynamic error (err:XD0013) if the name attribute of a c:parameter element contains a colon and a namespace attribute is specified.
Any extension attributes that appear on the c:parameter element are ignored.
A c:parameter-list represents a list of parameters on a parameter input.
<c:parameter-list>
c:parameter*
</c:parameter-list>
The c:parameter-list contains zero or more c:parameter elements. It is a dynamic error (err:XD0018) if the parameter list contains any elements other than c:parameter.
Any extension attributes that appear on the c:parameter-list element are ignored.
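A parameter document might look like the following sketch. The parameter names, values, and the example.com namespace are invented for illustration; the namespace declaration for the c prefix is omitted, as in the other fragments in this section:
<c:parameter-list>
  <c:parameter name="output-format" value="html"/>
  <c:parameter name="title" namespace="http://example.com/ns/doc" value="Draft"/>
  <c:parameter name="ex:depth" value="2" xmlns:ex="http://example.com/ns/doc"/>
</c:parameter-list>
The first parameter has no namespace. The second and third are both in the http://example.com/ns/doc namespace: one through the namespace attribute, the other through the in-scope declaration for the ex prefix. Combining a colon in name with a namespace attribute would be a dynamic error (err:XD0013).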
A p:iteration-source identifies input to a p:for-each.
<p:iteration-source
select? = XPath expression>
(p:empty |
(p:pipe |
p:document |
p:inline)+)?
</p:iteration-source>
The select attribute and binding of a p:iteration-source work the same way that they do in a p:input.
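For example, the following sketch reads a document directly and iterates over selected parts of it. The document URI and the output port name are placeholders, and the sketch assumes that p:identity has a primary output port:
<p:for-each>
  <p:iteration-source select="//chapter">
    <p:document href="http://example.com/book.xml"/>
  </p:iteration-source>
  <p:output port="result" primary="yes" sequence="yes"/>
  <p:identity/>
</p:for-each>
Each chapter element in the document is wrapped in its own document node and passed to the subpipeline in turn.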
A p:viewport-source identifies input to a p:viewport.
<p:viewport-source>
    (p:pipe |
     p:document |
     p:inline)?
</p:viewport-source>
Only one binding is allowed and it works the same way that bindings work on a p:input. It is a dynamic error (err:XD0006) for a sequence of more than one document to appear on the p:viewport-source. No select expression is allowed.
A p:xpath-context identifies a context against which an [XPath 1.0] expression will be evaluated for a p:when.
<p:xpath-context>
    (p:empty |
     p:pipe |
     p:document |
     p:inline)?
</p:xpath-context>
Only one binding is allowed and it works the same way that bindings work on a p:input. It is a dynamic error (err:XD0006) for a sequence of more than one document to appear on the p:xpath-context. No select expression is allowed.
It is a dynamic error (err:XD0019) if the context is bound to p:empty and the test expression refers to the context node.
A p:output identifies an output port, declaring it if necessary.
<p:output
port = NCName
sequence? = yes|no
primary? = yes|no />
The port attribute defines the name of the port. It is a static error (err:XS0011) to identify two ports with the same name on the same step. It is a static error (err:XS0013) if the port given does not match the name of an output port specified in the step's declaration.
An output declaration can indicate if a sequence of documents is allowed to appear on the declared port. If sequence is specified with the value “yes”, then a sequence is allowed. If sequence is not specified on p:output, or has the value “no”, then it is a dynamic error (err:XD0007) if the step produces a sequence of more than one document on the declared port.
An output declaration can indicate if it is to be considered the primary output for the step. If primary is specified with the value “yes”, then the named port will be treated as the primary output port.
It is a static error (err:XS0014) to identify more than one output port as primary.
On compound steps, the declaration may be accompanied by a binding for the output.
<p:output
port = NCName
sequence? = yes|no
primary? = yes|no>
(p:empty |
(p:pipe |
p:document |
p:inline)+)?
</p:output>
It is a static error (err:XS0029) to specify a binding for a p:output inside a p:declare-step.
If a binding is provided for a p:output, documents are read from that binding and those documents form the output that is written to the output port. In other words, placing a p:document inside a p:output causes the processor to read that document and provide it on the output port. It does not cause the processor to write the output to that document.
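The distinction can be sketched as follows. The port names, the step name transform, and the URIs are placeholders:
<p:group>
  <p:output port="result" primary="yes">
    <p:pipe step="transform" port="result"/>
  </p:output>
  <p:output port="license">
    <p:document href="http://example.com/license.xml"/>
  </p:output>
  <p:xslt name="transform">
    <p:input port="stylesheet">
      <p:document href="http://example.com/xsl/html.xsl"/>
    </p:input>
  </p:xslt>
</p:group>
The license port delivers the document read from http://example.com/license.xml to whatever reads that port; nothing is written to that URI.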