This document is also available in these non-normative formats: XML, Revision markup
Copyright © 2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This specification describes the syntax and semantics of XProc: An XML Pipeline Language, a language for describing operations to be performed on XML documents.
An XML Pipeline specifies a sequence of operations to be performed on zero or more XML documents. Pipelines generally accept zero or more XML documents as input and produce zero or more XML documents as output. Pipelines are made up of simple steps which perform atomic operations on XML documents and constructs similar to conditionals, iteration, and exception handlers which control which steps are executed.
This document is an editor's draft that has no official standing.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a fragment produced to illustrate a proposed reworking of sections 2.5 (Environment) and 5.7 (Variables etc.)
An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output.
...
...
subpipeline = p:variable*, (p:for-each | p:viewport | p:choose | p:group | p:try | p:standard-step | pfx:user-pipeline | p:documentation | p:pipeinfo)*
[Definition: The environment is a context-dependent collection of information available within subpipelines.] Most of the information in the environment is static and can be computed for each subpipeline before evaluation of the pipeline as a whole begins. The in-scope bindings, by contrast, have to be calculated as the pipeline is being evaluated.
The environment consists of:
A set of readable ports. [Definition: The readable ports are a set of step name/port name pairs.] Inputs and outputs can only be connected to readable ports.
A default readable port. [Definition: The default readable port, which may be undefined, is a specific step name/port name pair from the set of readable ports.]
A set of in-scope bindings. [Definition: The in-scope bindings are a set of name-value pairs, based on option and variable bindings.]
[Definition: The empty environment contains no readable ports, an undefined default readable port and no in-scope bindings.]
Unless otherwise specified, the environment of a contained step is its inherited environment. [Definition: The inherited environment of a contained step is an environment that is the same as the environment of its container with the standard modifications.]
The standard modifications made to an inherited environment are:
The declared inputs of the container are added to the readable ports.
In other words, contained steps can see the inputs to their container.
The union of all the declared outputs of all of the container's contained steps is added to the readable ports.
In other words, sibling steps can see each other's outputs in addition to the outputs visible to their container.
If there is a preceding sibling step element:
If that preceding sibling has a primary output port, then that output port becomes the default readable port.
Otherwise, the default readable port is undefined.
If there is not a preceding sibling step element, the default readable port is the primary input port of the container, if it has one, otherwise the default readable port is unchanged.
The names and values from each p:variable present at the beginning of the container are added, in document order, to the in-scope bindings. A new binding replaces an old binding with the same name. See Section 5.7.1, “p:variable” for the specification of variable evaluation.
A step with no parent inherits the empty environment.
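The following is a minimal sketch of how these modifications play out, assuming the standard p:identity step (primary input source, primary output result) and the implicit source/result ports of p:pipeline; the step names main, first, second and third are hypothetical.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="main">
  <!-- "first" has no preceding sibling, so its default readable port
       is the container's primary input: main/source -->
  <p:identity name="first"/>
  <!-- The preceding sibling "first" has a primary output, so
       first/result is the default readable port here -->
  <p:identity name="second"/>
  <!-- The container's inputs are also readable, so this step can
       bypass its default and read main/source explicitly -->
  <p:identity name="third">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:identity>
</p:pipeline>
```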
FIXME: this section needs an introduction. Perhaps discuss variable-binding elements and scope? Perhaps define "actual value"?
A p:variable declares a variable and associates a value with it.
The name of the variable must be a QName. If it does not contain a prefix then it is in no namespace.
It is a static error (err:XS0028) to declare an option or variable in the XProc namespace.
The variable's value can be specified in two ways: with a value or select attribute. It is a static error (err:XS0016) if the value is not specified with either select or value, or if both are specified.
If a value attribute is specified, its literal content becomes the value of the variable.
<p:variable
name = QName
value = string>
</p:variable>
If a select attribute is specified, its content is an XPath expression which will be evaluated to provide the value of the variable.
<p:variable
name = QName
select = XPathExpression>
((p:empty | p:pipe | p:document | p:inline)? &
p:namespaces*)
</p:variable>
If a select expression is given, it is evaluated as an XPath expression using the context defined in Section 2.6.1, “Processor XPath Context”, for the enclosing container, with the addition of bindings for all p:variable elements which precede this p:variable within its surrounding container. When XPath 1.0 is being used, the string value of the expression becomes the value of the variable; when XPath 2.0 is being used, the value is an untypedAtomic.
Since all in-scope bindings are present in the Processor XPath Context as variable bindings, select expressions may refer to the value of in-scope bindings by variable reference. If a variable reference uses a QName that is not the name of an in-scope binding, an XPath evaluation error will occur.
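As an illustrative sketch (the variable names status and label are hypothetical), a literal value and a computed value that refers to a preceding p:variable might look like this. Under XPath 1.0, label would presumably hold the string "status: draft".

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <!-- value: the literal string "draft" is used as-is -->
  <p:variable name="status" value="draft"/>
  <!-- select: an XPath expression that may refer to $status,
       because that p:variable precedes this one -->
  <p:variable name="label" select="concat('status: ', $status)"/>
  <p:identity/>
</p:pipeline>
```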
If a select expression is given, the readable ports available for document binding are the readable ports in the environment inherited by the first step in the surrounding container's contained steps. However, in order to avoid ordering paradoxes, it is a static error (err:XS0019) for a variable's document binding to refer to the output port of any step in the surrounding container's contained steps.
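A sketch of the permitted and forbidden cases, assuming a pipeline named main with its implicit source input; the names main, root-name and tidy are hypothetical.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="main">
  <!-- Allowed: main/source is a declared input of the container -->
  <p:variable name="root-name" select="local-name(/*)">
    <p:pipe step="main" port="source"/>
  </p:variable>
  <p:identity name="tidy"/>
  <!-- Not allowed (err:XS0019): a p:variable in this container may not
       bind to <p:pipe step="tidy" port="result"/>, because "tidy" is
       one of the container's contained steps -->
</p:pipeline>
```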
If a select expression is given but no document binding is provided, the implicit binding is to the default readable port in the environment inherited by the first step in the surrounding container's contained steps. It is a static error (err:XS0032) if no document binding is provided and the default readable port is undefined. It is a dynamic error (err:XD0008) if a document sequence is specified in the document binding for a p:variable. If p:empty is given as the document binding, an empty document node is used as the context node.
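A minimal sketch contrasting an explicit p:empty binding with the implicit binding; the variable names stamp and root are hypothetical. Because the first contained step has no preceding sibling, the implicit binding here would be the pipeline's primary input, so err:XS0032 should not arise.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <!-- p:empty: the context node is an empty document node,
       so no input document is consulted -->
  <p:variable name="stamp" select="concat('build-', 42)">
    <p:empty/>
  </p:variable>
  <!-- No binding: implicitly reads the default readable port,
       here the pipeline's primary input "source" -->
  <p:variable name="root" select="name(/*)"/>
  <p:identity/>
</p:pipeline>
```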
A p:option declares an option and may associate a default value with it. The p:option tag can only be used in a p:declare-step or a p:pipeline (which is a syntactic abbreviation for a step declaration).
The name of the option must be a QName. If it does not contain a prefix then it is in no namespace.
It is a static error (err:XS0028) to declare an option or variable in the XProc namespace.
It is a static error (err:XS0004) to declare two or more options on the same step with the same name.
<p:option
name = QName
required? = boolean>
</p:option>
An option may be declared as required. If an option is required, it is a static error (err:XS0018) to invoke the step without specifying a value for that option.
If an option is not declared to be required, it may be given a default value. The value can be specified in two ways: with a value or select attribute. It is a static error (err:XS0016) if both select and value are specified.
It is a static error (err:XS0017) to specify that an option is both required and has a default value.
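A sketch of these declaration rules, assuming a hypothetical step type ex:report in a hypothetical example namespace; the commented-out declaration shows the combination that err:XS0017 forbids.

```xml
<p:pipeline type="ex:report"
            xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.org/ns/ex">
  <!-- Required: invoking ex:report without it is err:XS0018 -->
  <p:option name="title" required="true"/>
  <!-- Optional, with a literal default value -->
  <p:option name="format" value="html"/>
  <!-- Would be err:XS0017 (required and also defaulted):
       <p:option name="bad" required="true" value="x"/> -->
  <p:identity/>
</p:pipeline>
```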
If a value attribute is specified, its literal content becomes the value of the option.
<p:option
name = QName
value = string>
</p:option>
If a select attribute is specified, its content is an XPath expression which will be evaluated to provide the default value of the option; that value may differ from one instance of the step type to another.
<p:option
name = QName
select = XPathExpression>
</p:option>
The select expression is evaluated only when an instance of the step type being declared actually needs the option's default value. In that case, it is evaluated as described in Section 5.7.3, “p:with-option” except that
the context node is an empty document node;
the variable bindings consist only of bindings for options whose declaration precedes the p:option itself in the surrounding step signature;
the in-scope namespaces are the in-scope namespaces of the p:option itself.
It follows that if the select expression contains a variable reference that uses a QName that is not the name of a preceding sibling p:option declaration, an XPath evaluation error will occur.
When XPath 1.0 is being used, the string value of the expression becomes the value of the option; when XPath 2.0 is being used, the value is an untypedAtomic.
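A sketch of a computed default, assuming a hypothetical step type ex:archive; the option names base and filename are also hypothetical. The default for filename can refer to $base only because base is declared first, and the context node is an empty document node, so the expression should not depend on any input.

```xml
<p:pipeline type="ex:archive"
            xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.org/ns/ex">
  <p:option name="base" value="report"/>
  <!-- Computed default: may refer only to preceding options;
       evaluated only when an instance needs this default -->
  <p:option name="filename" select="concat($base, '.xml')"/>
  <p:identity/>
</p:pipeline>
```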
A p:with-option provides an actual value for an option when a step is invoked.
The name of the option must be a QName. If it does not contain a prefix then it is in no namespace.
It is a static error (err:XS0031) to use an option name in p:with-option if the step type being invoked has not declared an option with that name.
It is a static error (err:XS0004) to include more than one p:with-option with the same option name as part of the same step invocation.
The actual value can be specified in two ways: with a value or select attribute. It is a static error (err:XS0016) if the value is not specified with either select or value, or if both are specified.
If a value attribute is specified, its literal content becomes the value of the option.
<p:with-option
name = QName
value = string>
</p:with-option>
If a select attribute is specified, its content is an XPath expression which will be evaluated to provide the value of the option.
<p:with-option
name = QName
select = XPathExpression>
((p:empty | p:pipe | p:document | p:inline)? &
p:namespaces*)
</p:with-option>
The values of options for a step must be computed in the order determined by the step's signature. If a select expression is given for an option, it is evaluated as an XPath expression using the context defined in Section 2.6.1, “Processor XPath Context”, for the surrounding step, with the addition of variable bindings for all options whose declarations precede its declaration in the surrounding step's signature.
When XPath 1.0 is being used, the string value of the expression becomes the value of the option; when XPath 2.0 is being used, the value is an untypedAtomic.
All in-scope bindings for the step instance itself are present in the Processor XPath Context as variable bindings, so select expressions may refer to any option or variable bound in those in-scope bindings, as well as to any option whose declaration precedes their own in the step signature, by variable reference. If a variable reference uses a QName that is not the name of an in-scope binding or preceding sibling option, an XPath evaluation error will occur.
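A sketch of an invocation, assuming a step type ex:report, declared elsewhere with options title and format, is in scope; both the step type and the variable name year are hypothetical. The select expression for title refers to an in-scope binding of the enclosing pipeline.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.org/ns/ex">
  <p:variable name="year" value="2008"/>
  <ex:report>
    <!-- $year is an in-scope binding of the enclosing pipeline -->
    <p:with-option name="title" select="concat('Report for ', $year)"/>
    <!-- Literal actual value, overriding any declared default -->
    <p:with-option name="format" value="pdf"/>
  </ex:report>
</p:pipeline>
```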
If a select expression is used but no document binding is provided, the implicit binding is to the default readable port. It is a static error (err:XS0032) if no document binding is provided and the default readable port is undefined. It is a dynamic error (err:XD0008) if a document sequence is specified in the binding for a p:with-option. If p:empty is given as the document binding, an empty document node is used as the context node.
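A sketch of an explicit document binding, assuming the same hypothetical ex:report step and a hypothetical config.xml whose root element config has a title child. Because p:document supplies a single document, err:XD0008 should not apply.

```xml
<ex:report xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:ex="http://example.org/ns/ex">
  <!-- Explicit binding: /config/title is read from config.xml
       rather than from the default readable port -->
  <p:with-option name="title" select="/config/title">
    <p:document href="config.xml"/>
  </p:with-option>
</ex:report>
```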
The p:with-param element is used to establish the value of a parameter. The parameter must be given a value when it is used. (Parameter names aren't known in advance; there's no provision for declaring them.)
The name of the parameter must be a QName. If it does not contain a prefix then it is in no namespace.
It is a static error (err:XS0028) to use the XProc namespace in the name of a parameter.
The value can be specified in two ways: with a value or select attribute. It is a static error (err:XS0016) if the value is not specified with either select or value, or if both are specified.
If a value attribute is specified, its literal content becomes the value of the parameter.
<p:with-param
name = QName
value = string
port? = NCName>
</p:with-param>
If a select attribute is specified, its content is an XPath expression which will be evaluated to provide the value of the parameter.
<p:with-param
name = QName
select = XPathExpression
port? = NCName>
((p:empty | p:pipe | p:document | p:inline)? &
p:namespaces*)
</p:with-param>
The values of parameters for a step must be computed after all the options in the step's signature have had their values computed. If a select expression is given on a p:with-param, it is evaluated as an XPath expression using the context defined in Section 2.6.1, “Processor XPath Context”, for the surrounding step, with the addition of variable bindings for all options declared in the surrounding step's signature.
When XPath 1.0 is being used, the string value of the expression becomes the value of the parameter; when XPath 2.0 is being used, the value is an untypedAtomic.
All in-scope bindings for the step instance itself are present in the Processor XPath Context as variable bindings, so select expressions may refer to any option or variable bound in those in-scope bindings, as well as to any option declared in the step signature, by variable reference. If a variable reference uses a QName that is not the name of an in-scope binding or declared option, an XPath evaluation error will occur.
If a select expression is used but no document binding is provided, the implicit binding is to the default readable port. It is a static error (err:XS0032) if no document binding is provided and the default readable port is undefined. It is a dynamic error (err:XD0008) if a document sequence is specified in the binding for a p:with-param. If p:empty is given as the document binding, an empty document node is used as the context node.
If the optional port attribute is specified, then the parameter appears on the named port, otherwise the parameter appears on the step's primary parameter input port. It is a static error (err:XS0034) if the specified port is not a parameter input port or if no port is specified and the step does not have a primary parameter input port.
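A sketch using p:xslt, assuming its parameter input port is named parameters and that it appears inside a pipeline whose default readable port supplies the source document; style.xsl and the parameter names mode and lang are hypothetical.

```xml
<p:xslt xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="stylesheet">
    <p:document href="style.xsl"/>
  </p:input>
  <!-- No port attribute: appears on the primary parameter input port -->
  <p:with-param name="mode" value="print"/>
  <!-- Explicit port; the value is computed against the default readable port -->
  <p:with-param name="lang" select="string(/*/@xml:lang)" port="parameters"/>
</p:xslt>
```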
Variable, option and parameter values carry with them not only their literal or computed string value but also a set of namespaces. To see why this is necessary, consider the following step:
<p:delete xmlns:p="http://www.w3.org/ns/xproc">
<p:with-option name="match" value="html:div"
xmlns:html="http://www.w3.org/1999/xhtml"/>
</p:delete>
The p:delete step will delete elements that match the expression “html:div”, but that expression can only be interpreted correctly if there is a namespace binding for the prefix “html”, so that binding has to travel with the option value.
By default, the namespace bindings associated with a variable, option or parameter value are computed as follows:
If the select attribute was used to specify the value and it consisted of a single VariableReference (per [XPath 1.0] or [XPath 2.0], as appropriate), then the namespace bindings from the referenced option or variable are used.
If the select attribute was used to specify the value and it evaluated to a node-set, then the in-scope namespaces from the first node in the selected node-set (or, if it's not an element, its parent) are used.
The expression is evaluated in the appropriate context; see Section 2.6.1, “Processor XPath Context”.
Otherwise, the in-scope namespaces from the element providing the value are used.
The default namespace is never included in the namespace bindings for a variable, option or parameter value. Unqualified names are always in no-namespace.
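A sketch of the first and last rules working together; the variable name pattern is hypothetical. The p:variable takes its bindings from its own in-scope namespaces, and the p:with-option, whose select is a single variable reference, should then inherit them.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <!-- value attribute: bindings come from this p:variable element,
       so "h" travels bound to the XHTML namespace -->
  <p:variable name="pattern" value="h:div"
              xmlns:h="http://www.w3.org/1999/xhtml"/>
  <p:delete>
    <!-- A single variable reference: the bindings of $pattern are used,
         even though "h" is not declared on this element -->
    <p:with-option name="match" select="$pattern"/>
  </p:delete>
</p:pipeline>
```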
Unfortunately, in more complex situations, there may be no single variable, option or parameter that can reliably be expected to have the correct set of namespace bindings. Consider this pipeline:
<p:pipeline type="ex:delete-in-div"
xmlns:p="http://www.w3.org/ns/xproc"
xmlns:ex="http://example.org/ns/ex"
xmlns:h="http://www.w3.org/1999/xhtml">
<p:option name="divchild" required="true"/>
<p:delete>
<p:with-option name="match" select="concat('h:div/',$divchild)"/>
</p:delete>
</p:pipeline>
It defines an atomic step (“ex:delete-in-div”) that deletes elements that occur inside of XHTML div elements. It might be used as follows:
<ex:delete-in-div xmlns:p="http://www.w3.org/ns/xproc" xmlns:ex="http://example.org/ns/ex">
<p:with-option name="divchild" value="html:p[@class='delete']"
xmlns:html="http://www.w3.org/1999/xhtml"/>
</ex:delete-in-div>
In this case, the match option passed to the p:delete step needs both the namespace binding of “h” specified in the ex:delete-in-div pipeline definition and the namespace binding of “html” specified in the divchild option on the call of that pipeline. It's not sufficient to provide just one of the sets of bindings.
The p:namespaces element can be used as a child of p:variable, p:with-option or p:with-param to provide explicit bindings.
<p:namespaces
binding? = QName
element? = XPathExpression
except-prefixes? = prefix list>
</p:namespaces>
The namespace bindings specified by a p:namespaces element are determined as follows:
If the binding attribute is specified, it must contain the name of a single in-scope binding. The namespace bindings associated with that binding are used. It is a static error (err:XS0020) if the binding attribute on p:namespaces is specified and its value is not the name of an in-scope binding.
If the element attribute is specified, it must contain an XPath expression which identifies a single element node (the input binding for this expression is the same as the binding for the p:variable, p:with-option or p:with-param which contains it). The in-scope namespaces of that node are used.
The expression is evaluated in the appropriate context; see Section 2.6.1, “Processor XPath Context”.
It is a dynamic error (err:XD0009) if the element attribute on p:namespaces is specified and it does not identify a single element node.
If neither binding nor element is specified, the in-scope namespaces on the p:namespaces element itself are used.
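A sketch combining the element form with the in-scope form, assuming a hypothetical config.xml whose root element config declares the prefixes used in its match child. The element expression should be evaluated against the same config.xml binding as the select expression.

```xml
<p:delete xmlns:p="http://www.w3.org/ns/xproc"
          xmlns:h="http://www.w3.org/1999/xhtml">
  <p:with-option name="match" select="concat('h:div/', /config/match)">
    <p:document href="config.xml"/>
    <!-- Prefixes in scope on the config document's root element -->
    <p:namespaces element="/config"/>
    <!-- Plus the in-scope bindings of this element, including "h" -->
    <p:namespaces xmlns:h="http://www.w3.org/1999/xhtml"/>
  </p:with-option>
</p:delete>
```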
Irrespective of how the set of namespaces is determined, the except-prefixes attribute can be used to exclude one or more namespaces. The value of the except-prefixes attribute must be a sequence of tokens, each of which must be a prefix bound to a namespace in the in-scope namespaces of the p:namespaces element. All bindings of prefixes to each of the namespaces thus identified are excluded.
It is a static error (err:XS0051) if the except-prefixes attribute on p:namespaces does not contain a list of tokens or if any of those tokens is not a prefix bound to a namespace in the in-scope namespaces of the p:namespaces element.
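A sketch of except-prefixes used to keep the XProc namespace out of an option's bindings. Since "p" is bound in the in-scope namespaces of the p:namespaces element, the token is legal; every binding to the XProc namespace is dropped, leaving (under these assumptions) only "h".

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:h="http://www.w3.org/1999/xhtml">
  <p:delete>
    <p:with-option name="match" select="'h:div'">
      <!-- In-scope namespaces of this element, minus all bindings
           to the namespace that "p" is bound to (the XProc namespace) -->
      <p:namespaces except-prefixes="p"/>
    </p:with-option>
  </p:delete>
</p:pipeline>
```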
It is a static error (err:XS0041) to specify both binding and element on the same p:namespaces element.
If a p:variable, p:with-option or p:with-param includes one or more p:namespaces elements, then the union of all the namespaces specified on those elements is used as the bindings for the variable, option or parameter value. In this case, the in-scope namespaces on the p:variable, p:with-option or p:with-param are ignored. It is a dynamic error (err:XD0013) if the specified namespace bindings are inconsistent; that is, if the same prefix is bound to two different namespace names.
For example, this would allow the preceding example to work:
<p:pipeline type="ex:delete-in-div"
xmlns:p="http://www.w3.org/ns/xproc"
xmlns:ex="http://example.org/ns/ex"
xmlns:h="http://www.w3.org/1999/xhtml">
<p:option name="divchild" required="true"/>
<p:delete>
<p:with-option name="match" select="concat('h:div/',$divchild)">
<p:namespaces xmlns:h="http://www.w3.org/1999/xhtml"
xmlns:html="http://www.w3.org/1999/xhtml"/>
</p:with-option>
</p:delete>
</p:pipeline>
The p:namespaces element provides namespace bindings for both of the prefixes necessary to correctly interpret the expression ultimately passed to the p:delete step.
This solution has the weakness that it depends on knowing the bindings that will be used by the caller. A more flexible solution would use the binding attribute to copy the bindings from the caller's option value.
<?xml version='1.0'?>
<p:pipeline type="ex:delete-in-div" xmlns:p="http://www.w3.org/ns/xproc" xmlns:ex="http://example.org/ns/ex" xmlns:h="http://www.w3.org/1999/xhtml">
<p:option name="divchild" required="true"/>
<p:delete>
<p:with-option name="match" select="concat('h:div/',$divchild)">
<p:namespaces binding="divchild"/>
<p:namespaces xmlns:h="http://www.w3.org/1999/xhtml"/>
</p:with-option>
</p:delete>
</p:pipeline>
This example will succeed as long as the caller-specified option does not bind the “h” prefix to something other than the XHTML namespace.