This specification describes the standard step vocabulary of
XProc 2.0: An XML Pipeline Language.
Status of this Document
This section describes the status of this document at
the time of its publication. Other documents may supersede this
document. A list of current W3C publications and the latest revision
of this technical report can be found in the
W3C technical reports index at
http://www.w3.org/TR/.
Publication as a First Public Working Draft does not imply
endorsement by the W3C Membership. This is a draft document and may be
updated, replaced or obsoleted by other documents at any time. It is
inappropriate to cite this document as other than work in
progress.
This section describes the standard XProc steps. A machine-readable
description of these steps may be found in
xproc-1.0.xpl.
Some aspects of documents are generally unchanged by steps:
When a step in this library produces an output document,
the base URI of the output is the base URI of the step's primary
input document unless the step's process explicitly sets an
xml:base attribute or the step's
description explicitly states how the base URI is constructed.
Unless otherwise specified, steps in this library do not
modify the document propertiesXP of the documents
that flow through them.
Also, in this section, several steps use the c:result
element, whose content is a string, for result information.
When a step specifies a particular version of a technology,
implementations must implement that
version or a subsequent version that is backwards compatible with that
version. At user-option, they may implement other non-backwards
compatible versions.
2.1 Required Steps
This section describes standard steps that must be supported
by any conforming processor.
2.1.1 p:add-attribute
The p:add-attribute step adds a single attribute to
a set of matching elements. The input document specified on the
source is processed for matches specified by the match
pattern in the match option. For each of these
matches, the attribute whose name is specified by the
attribute-name option is set to the attribute value
specified by the attribute-value option.
The resulting document is produced on the result
output port and consists of an exact copy of the input with the
exception of the matched elements. Each of the matched elements is
copied to the output with the addition of the specified attribute
with the specified value.
The value of the match option
must be an XSLTMatchPattern. It
is a dynamic error (err:XC0023) if the match pattern does
not match an element.
The value of the attribute-name option
must be a QName.
If the lexical value does not contain a colon, then the attribute-namespace may be used to specify the
namespace of the attribute. In that case, the attribute-prefix may be specified to suggest a
prefix for the attribute name. It is a
dynamic error (err:XD0034) to specify a new namespace or
prefix if the lexical value of the specified name contains a
colon.
The corresponding expanded name is used to construct the attribute.
The value of the attribute-value option
must be a legal attribute value according to XML.
If an attribute with the same name as the expanded name
from the attribute-name option exists on the matched
element, the value specified in
the attribute-value option is used to set the
value of that existing attribute. That is, the value of the
existing attribute is changed to the attribute-value
value.
Note
If multiple attributes need to be set on the same
element(s), the p:set-attributes step can be used to set them
all at once.
This step cannot be used to add namespace declarations. It is a dynamic error (err:XC0059) if the QName
value in the attribute-name option uses the prefix
“xmlns”
or any other prefix that resolves to the namespace name
http://www.w3.org/2000/xmlns/.
Note, however, that while namespace declarations cannot be
added explicitly by this step, adding an attribute whose name is in a
namespace for which there is no namespace declaration in scope on the
matched element may result in a namespace binding being added by
Section 2.5.1, “Namespace Fixup on XML Outputs”XP.
If an attribute named
xml:base is added or changed, the base URI
of the element must also be amended accordingly.
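As an illustration, a minimal invocation might look like the following sketch. It uses the syntactic shortcut form for the options; the element name para and the attribute role with the value note are hypothetical and chosen only for the example.

```xml
<!-- Sketch: set role="note" on every para element of the source document.
     "para", "role" and "note" are made-up names for illustration. -->
<p:add-attribute xmlns:p="http://www.w3.org/ns/xproc"
                 match="para"
                 attribute-name="role"
                 attribute-value="note"/>
<!-- Input:  <doc><para>One</para><para role="tip">Two</para></doc>
     Result: <doc><para role="note">One</para><para role="note">Two</para></doc>
     The existing role attribute on the second para is overwritten. -->
```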
2.1.2 p:add-xml-base
The p:add-xml-base step exposes the base URI via
explicit xml:base attributes. The input document from the
source port is replicated to the result port
with xml:base attributes added to or corrected on each element as specified
by the options on this step.
The p:add-xml-base step modifies its input as follows:
For the document element: force the element to have an xml:base
attribute with the document's [base URI] property's value as its value.
For other elements:
If the all option has the value
true, force the element to have an xml:base attribute with the element's [base
URI] value as its value.
If the element's [base URI] is different from its parent's
[base URI], force the element to have an xml:base attribute with the following
value: if the value of the relative option is
true, a string which, when resolved against the
parent's [base URI], will give the element's [base URI], otherwise the
element's [base URI].
Otherwise, if there is an xml:base attribute present, remove it.
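The rules above can be sketched with a small example. The URIs and element names are assumptions made for illustration, and the relative value shown is only one string that satisfies the resolution requirement; an implementation might compute a different but equivalent one.

```xml
<!-- Sketch: expose base URIs, using relative values where they differ
     from the parent's base URI. -->
<p:add-xml-base xmlns:p="http://www.w3.org/ns/xproc"
                all="false"
                relative="true"/>
<!-- Assume the document's base URI is http://example.com/doc/book.xml and
     one chapter element has the base URI
     http://example.com/doc/chapters/ch1.xml.
     Expected shape of the result:
     <book xml:base="http://example.com/doc/book.xml">
       <chapter xml:base="chapters/ch1.xml">...</chapter>
       <appendix>...</appendix>
     </book>
     The appendix shares its parent's base URI, so it carries no xml:base. -->
```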
2.1.3 p:cast-content-type
The p:cast-content-type step changes the media type
of its input.
The input document is transformed from one media type to another.
It is a dynamic
error (err:XC1002) if the supplied content-type is not
a valid media type of the form
“type/subtype+ext”.
Casting from one XML media type to another simply changes the
“content-type” document
propertyXP.
Casting from a non-XML media type to an XML media type produces an
XML document with a c:data document element. The original
media type will be preserved in the
content-type attribute on the
c:data element.
The content of the c:data element is the base64 encoded
representation of the non-XML content.
Casting from an XML media type to a non-XML media type
must support the case where the input document is
a c:data document. The resulting document will
have the specified media type and a representationXP that
is the content of the c:data element after decoding the base64
encoded content.
It is a dynamic
error (err:XC1006) if the content-type is supplied and is
not the same as the content-type specified on
the c:data element.
Casting from an XML media type to a non-XML media type when
the input document is not a c:data document is
implementation-definedXP.
What happens when one non-XML media type is cast to another
non-XML media type is implementation-definedXP.
It is a dynamic
error (err:XC1003) if the p:cast-content-type step
cannot perform the requested cast.
In all cases except when the input document
is a c:data element, it is a dynamic
error (err:XC1007) if the content-type is not supplied.
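For instance, casting a plain text document to an XML media type could look like the sketch below. The input text is hypothetical, and the example assumes the input bytes are the ASCII characters of the word Hello.

```xml
<!-- Sketch: cast a text/plain document to application/xml. -->
<p:cast-content-type xmlns:p="http://www.w3.org/ns/xproc"
                     content-type="application/xml"/>
<!-- Input:  a text/plain document containing the five characters Hello
     Result: <c:data xmlns:c="http://www.w3.org/ns/xproc-step"
                     content-type="text/plain">SGVsbG8=</c:data>
     SGVsbG8= is the base64 encoding of Hello; the original media type is
     kept in the content-type attribute. -->
```

Casting that c:data document back with content-type="text/plain" would be expected to yield the original text.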
2.1.4 p:compare
The p:compare step compares two documents for
equality.
The value of the fail-if-not-equal option must be a boolean.
This step takes single documents on each of two ports and compares them
using the fn:deep-equal function (as defined in
[XPath 2.0 Functions and Operators]). It is a
dynamic error (err:XC0019) if the documents are not equal, and the value
of the fail-if-not-equal option is
true. If the documents are equal, or if the value
of the fail-if-not-equal option is
false, a c:result
document is produced with contents true if the documents
are equal, otherwise false.
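A comparison that reports its outcome instead of failing might be written as in the following sketch. The text above does not name the two input ports; source and alternate are the names used in XProc 1.0 and are assumed here.

```xml
<!-- Sketch: compare two inline documents without raising err:XC0019.
     The port names "source" and "alternate" are assumptions. -->
<p:compare xmlns:p="http://www.w3.org/ns/xproc"
           fail-if-not-equal="false">
  <p:input port="source">
    <p:inline><doc><item>1</item></doc></p:inline>
  </p:input>
  <p:input port="alternate">
    <p:inline><doc><item>2</item></doc></p:inline>
  </p:input>
</p:compare>
<!-- The item values differ, so the step produces
     <c:result xmlns:c="http://www.w3.org/ns/xproc-step">false</c:result> -->
```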
2.1.5 p:count
The p:count step counts the number of documents in
the source input sequence and returns a single document
on result containing that number. The generated document
contains a single c:result element whose content is the
string representation of the number of documents in the
sequence.
If the limit option is specified
and is greater than zero, the p:count step will count at most
that many documents. This provides a convenient mechanism to discover,
for example, if a sequence consists of more than 1 document, without
requiring every single document to be buffered before processing can
continue.
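The "more than one document" test mentioned above can be sketched as follows; the five-document input sequence is hypothetical.

```xml
<!-- Sketch: stop counting after two documents. -->
<p:count xmlns:p="http://www.w3.org/ns/xproc" limit="2"/>
<!-- With a sequence of five documents on the source port, the result is
     <c:result xmlns:c="http://www.w3.org/ns/xproc-step">2</c:result>
     With a single document it would be 1, so a value of 2 signals that
     the sequence holds more than one document. -->
```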
2.1.6 p:delete
The p:delete step deletes items specified by a match
pattern from the
source input document and produces the resulting document,
with the deleted items removed, on the result port.
The value of the match option must be an
XSLTMatchPattern. A match pattern may match multiple items to be
deleted.
If an element is selected by the match option, the
entire subtree rooted at that element is deleted.
This step cannot be used to remove namespaces. It is a dynamic error (err:XC0062) if the
match option matches a namespace node.
Also, note that deleting an attribute named
xml:base does not change the base URI
of the element on which it occurred.
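As a small illustration, the following sketch removes draft notes from a document. The element name note and the attribute status are hypothetical.

```xml
<!-- Sketch: delete every note element marked as a draft. -->
<p:delete xmlns:p="http://www.w3.org/ns/xproc"
          match="note[@status='draft']"/>
<!-- Input:  <doc><note status="draft"><p>x</p></note><note>keep</note></doc>
     Result: <doc><note>keep</note></doc>
     The whole subtree of the matched note, including its p child, is removed. -->
```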
2.1.7 p:directory-list
The p:directory-list step produces a list of the
contents of a specified
directory.
The value of the path option
must be an anyURI. It is interpreted
as an IRI reference. If it is relative, it is made absolute against
the base URI of the element on which it is specified
(p:with-optionXP or p:directory-list in the case of a
syntactic shortcutXP value).
It is a
dynamic error (err:XC0017) if the absolute path does not
identify a directory. It is a
dynamic error (err:XC0012) if the contents of the directory
path are not available to the step due to access restrictions in the
environment in which the pipeline is run.
Conformant processors must support directory paths whose
scheme is file. It is
implementation-definedXP what other schemes are
supported by p:directory-list, and what the interpretation
of 'directory', 'file' and 'contents' is for those schemes.
If present, the value of the include-filter
or exclude-filter
option must be a regular expression as specified in [XPath 2.0 Functions and Operators], section 7.6.1 “Regular Expression
Syntax”.
If the include-filter pattern matches a
directory entry's name, the entry is included in the output. If the
exclude-filter pattern matches a directory entry's name,
the entry is excluded from the output. If both options are provided, the
include filter is processed first, then the exclude filter.
The result document produced for
the specified directory path has a c:directory document
element whose base URI is the directory path and whose
name attribute is the last segment
of the directory path (that is, the directory's (local) name).
Its contents are determined as follows, based on the entries in
the directory identified by the directory path. For each entry in the
directory, if either no filter was specified, or the
(local) name of the entry matches the filter pattern, a
c:file, a c:directory, or a c:other
element is produced, as follows:
A c:directory is produced for each subdirectory not
determined to be special.
A c:file is produced for each file
not determined to be special.
<c:file name = string />
Any file or directory determined to be
special by the p:directory-list step may be output using a
c:other element but the criteria for marking a file as
special are implementation-definedXP.
<c:other name = string />
When a directory entry is a subdirectory, that directory's entries are not
output as part of that entry's c:directory. A user must apply this step
again to the subdirectory to list subdirectory contents.
Each of the elements c:file, c:directory,
and c:other has a name attribute when it
appears within the top-level c:directory element, whose
value is a relative IRI reference, giving the (local) file or
directory name.
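The filter processing and the shape of the result can be sketched with an example. The directory path and the file names are hypothetical, and the result shown is only one plausible outcome under those assumptions.

```xml
<!-- Sketch: list XML files in a directory, leaving out drafts. -->
<p:directory-list xmlns:p="http://www.w3.org/ns/xproc"
                  path="file:///home/user/docs/"
                  include-filter=".*\.xml"
                  exclude-filter="draft.*"/>
<!-- Assume the directory contains the files index.xml, draft-1.xml and
     notes.txt, and no subdirectories.
     A possible result:
     <c:directory xmlns:c="http://www.w3.org/ns/xproc-step" name="docs">
       <c:file name="index.xml"/>
     </c:directory>
     draft-1.xml passes the include filter but is then removed by the
     exclude filter; notes.txt never matches the include filter. -->
```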
2.1.8 p:error
The value of the code option
must be a QName.
If the lexical value does not contain a colon, then the code-namespace may be used to specify the
namespace of the code. In that case, the code-prefix may be specified to suggest a
prefix for the code. It is a
dynamic error (err:XD0034) to specify a new namespace or
prefix if the lexical value of the specified name contains a
colon.
This step uses the document provided on its input as the content
of the error raised. An instance of the
c:errorsXP element will be produced on the error output port, as is
always the case for dynamic errors.
The error generated can be caught by a p:tryXP just like any
other dynamic error.
For authoring convenience, the p:error step is
declared with a single, primary output port. With respect to
connectionsXP, this port behaves like
any other output port even though nothing can ever
appear on it since the step always fails.
For example, given the following invocation:
<p:error xmlns:my="http://www.example.org/error"
         name="bad-document" code="my:unk12">
  <p:input port="source">
    <p:inline>
      <message>The document element is unknown.</message>
    </p:inline>
  </p:input>
</p:error>
The error vocabulary element (and document) generated on the
error output port would be:
<c:errors xmlns:c="http://www.w3.org/ns/xproc-step"
          xmlns:p="http://www.w3.org/ns/xproc"
          xmlns:my="http://www.example.org/error">
  <c:error name="bad-document" type="p:error"
           code="my:unk12"><message>The document element is unknown.</message>
  </c:error>
</c:errors>
The href,
line and column,
or offset attributes might also be present on the
c:error to identify the location of the p:error
element in the pipeline.
2.1.9 p:escape-markup
The p:escape-markup step applies XML serialization to the
children of the document element and replaces those children with their
serialization. The outcome is a single element with text content that
represents the "escaped" syntax of the children as they were
serialized.
This step supports the standard serialization options as specified in Section 2.3, “Serialization Options”. These
options control how the output markup is produced before it is escaped.
For example, the input:
<description>
  <div xmlns="http://www.w3.org/1999/xhtml">
    <p>This is a chunk of XHTML.</p>
  </div>
</description>
produces:
<description>
  &lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;
    &lt;p&gt;This is a chunk of XHTML.&lt;/p&gt;
  &lt;/div&gt;
</description>
Note
The result of this step is an XML document that contains the
Unicode characters produced by escaping the input. It is not a
sequence of encoded characters in a serialized octet stream;
therefore, the serialization options related to encoding characters
(byte-order-mark, encoding, and
normalization-form) do not apply. They are omitted
from the standard serialization options on this step.
By default, this step must not generate an
XML declaration in the escaped result.
2.1.10 p:filter
The p:filter step selects portions of the source document
based on a (possibly dynamically constructed) XPath select expression.
This step behaves just like a p:inputXP with
a select expression except that the select
expression is computed dynamically.
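A minimal sketch of the step follows. It assumes the option is named select, as the description suggests, and uses a fixed expression for brevity; in practice the value would typically be computed at run time. The element and attribute names are hypothetical.

```xml
<!-- Sketch: select the appendix sections of the source document.
     The option name "select" is an assumption based on the text above. -->
<p:filter xmlns:p="http://www.w3.org/ns/xproc"
          select="/doc/section[@role='appendix']"/>
<!-- Input:  <doc><section role="body"/><section role="appendix"/></doc>
     Result: the section element whose role attribute is "appendix". -->
```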
2.1.11 p:http-request
The p:http-request step provides for interaction
with resources over HTTP or related protocols.
The input
document provided on the source port specifies a request
by a single c:request element. This element specifies
the method, resource, and other request properties as well as possibly
including an entity body (content) for the request.
The standard serialization options are provided to control the
serialization of any XML content which is sent as part of the request.
The effect of these options is as specified in Section 2.3, “Serialization Options”. See Section 2.1.11.2, “Request Entity body conversion” for a discussion of when serialization
occurs in constructing a request.
It is a dynamic error (err:XC0004) if the
status-only attribute has the value true and
the detailed attribute does not have the value true.
The method attribute specifies the method to be
used against the IRI specified by the href attribute,
e.g. GET or POST (the value is not case-sensitive).
If the href
attribute is not absolute, it will be resolved against the base URI of
the element on which it occurs.
Note
In the case of simple “GET” requests, implementors are encouraged
to support as many protocols as practical. In particular, pipeline authors may
attempt to use p:http-request to load documents with computed
URIs using the file: scheme.
If the username attribute is specified, the
username, password,
auth-method, and send-authorization
attributes are used to handle authentication according to the selected
authentication method.
For the purposes of avoiding an authentication challenge, if the
send-authorization attribute has the value
true and the authentication method specified by the
auth-method supports generation of an
Authorization header without a challenge, then an
Authorization header is generated and sent on the first
request. If the send-authorization attribute is absent or
has the value
false, then the first request is sent without an
Authorization header.
If the initial response to the request is an
authentication challenge, the auth-method,
username, password and any relevant data from
the challenge are used to generate an
Authorization header and the request is sent again. If
that authorization fails, the request is not retried.
Appropriate values for the auth-method attribute
are “Basic” or “Digest” but other values are allowed.
If the authentication method is “Basic” or “Digest”, authentication
is handled as per [RFC 2617].
The
interpretation of auth-method values on
c:request other than “Basic” or “Digest” is
implementation-definedXP.
It
is a dynamic error (err:XC0003) if a username
or password is specified without specifying an
auth-method, if
the requested
auth-method isn't supported, or the authentication
challenge contains an authentication method that isn't
supported. All implementations are required to support "Basic"
and "Digest" authentication per [RFC 2617].
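The authentication attributes described above might be combined as in the following sketch. The URI and the credentials are hypothetical.

```xml
<!-- Sketch: a GET request that sends Basic credentials on the first
     request instead of waiting for an authentication challenge. -->
<c:request xmlns:c="http://www.w3.org/ns/xproc-step"
           method="GET"
           href="http://example.com/private/report.xml"
           username="alice"
           password="secret"
           auth-method="Basic"
           send-authorization="true"/>
<!-- Omitting send-authorization, or setting it to false, would cause the
     first request to be sent without an Authorization header. -->
```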
The c:header element specifies a
header name and value, either for inclusion in a request, or as received in a response.
<c:header name = string value = string />
The request is formulated from the attribute values on the
c:request element and its
c:header and c:multipart or c:body children,
if present, and transmitted to the host (and port, if present) specified by the
href attribute. The details of how the request entity body, if any, is
constructed are given in Section 2.1.11.2, “Request Entity body conversion”.
When the request is formulated, the step and/or protocol
implementation may add headers as necessary to either complete the
request or as appropriate for the content specified (e.g. transfer
encodings). A user of this step is guaranteed that their requested
headers and content will be sent with the exception of any conflicts
with protocol-related headers.
The p:http-request step allows users to specify
independently values that are not always independent. For example,
some combinations of c:header values
(e.g., Content-Type)
may be inconsistent
with values that the step and/or protocol implementation must set. In
a few cases, the step provides more than one mechanism to specify what
is actually a single value (e.g., the boundary string in multipart
messages).
It is a
dynamic error (err:XC0020) if the user specifies a value
or values that are inconsistent with each other or with the requirements
of the step or protocol.
2.1.11.2 Request Entity body conversion
The c:multipart element specifies a multi-part
body, per [RFC 1521], either for inclusion in a
request or as received in a response.
In the context of a request, the media type of the c:multipart must be a multipart media type (i.e. have a main type of 'multipart'). If the content-type attribute is not specified, a value of "multipart/mixed" will be assumed.
The boundary attribute is required and is used to
provide a multipart boundary marker. The implementation must use this
boundary marker and must prefix the value with the string
“--” when formulating the multipart message. It is a dynamic error (err:XC0002) if the value
starts with the string “--”.
If the boundary is also specified as a parameter in the
content-type attribute, then the parameter value and the value of the
boundary attribute must be the same.
The c:body element holds the body or body part of the message. Each of its attributes controls some aspect of encoding the request body or decoding the body element's content when the request is formulated. These are specified as follows:
The content-type attribute specifies the media type
of the body or body part, that is, the value of its
Content-Type header. If the media type is not an XML type
or text, the content must already be base64-encoded.
The encoding attribute controls the decoding of the
element content for formulating the body. A value of
base64 indicates the element's content is a base64
encoded string whose byte stream should be sent as the message body.
An implementation may support encodings other than
base64 but these encodings and their names are
implementation-definedXP.
It is a dynamic
error (err:XC0052) if the encoding specified is not supported by the
implementation.
Note
The p:http-request step provides only a single set of
serialization options for XML media types. There's no direct support
for sending a multipart message with two XML parts encoded
differently.
For each body or body part, the id attribute
specifies the value of the Content-ID header;
the description attribute specifies the
value of the Content-Description header;
and the disposition attribute specifies the value
of the Content-Disposition header.
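A multipart request that uses these attributes could be sketched as follows. The URI, boundary string, identifiers and content are all hypothetical; SGVsbG8= is the base64 encoding of the ASCII text Hello.

```xml
<!-- Sketch: a multipart POST with one XML part and one base64 part.
     The boundary value does not start with two hyphens; the
     implementation adds that prefix itself. -->
<c:request xmlns:c="http://www.w3.org/ns/xproc-step"
           method="POST"
           href="http://example.com/upload">
  <c:multipart content-type="multipart/mixed" boundary="part-boundary-001">
    <c:body content-type="application/xml"
            id="metadata"
            description="Upload metadata">
      <meta><title>Example</title></meta>
    </c:body>
    <c:body content-type="application/octet-stream"
            encoding="base64"
            id="payload"
            disposition="attachment">SGVsbG8=</c:body>
  </c:multipart>
</c:request>
```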
If an entity body is to be sent as part of a request (e.g. a
POST), either a c:body element, specifying the
request entity body, or a c:multipart element, specifying
multiple entity body parts, may be used. When c:multipart
is used it may contain multiple c:body children. A
c:body specifies the construction of a body or body part as
follows:
If the content-type attribute does not specify an
XML media type, or the encoding attribute is
“base64”, then it is a
dynamic error (err:XC0028) if the content of the
c:body element does not consist entirely of
characters, and the entity body or body part will consist of
exactly those characters.
Otherwise (the content-type attribute
does specify an XML media type and the
encoding attribute is not 'base64'),
it is a dynamic error (err:XC0022) if
the content of the c:body element does not consist of
exactly one element, optionally preceded and/or followed by any number
of processing instructions, comments or whitespace characters,
and the entity body or body part will consist of the serialization of
a document node containing that content. The serialization of that
document is controlled by the serialization options on the
p:http-request step itself.
For example, the following input to a
p:http-request step will POST a small XML
document:
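A minimal sketch of such a request is shown below. The endpoint http://example.com/post and the content of the posted document are assumptions made for illustration.

```xml
<!-- Sketch: POST a small XML document as the request entity body. -->
<c:request xmlns:c="http://www.w3.org/ns/xproc-step"
           method="POST"
           href="http://example.com/post">
  <c:body content-type="application/xml">
    <doc>
      <title>My document</title>
    </doc>
  </c:body>
</c:request>
```

Because the content type is an XML media type and no encoding is given, the c:body must contain exactly one element, and its serialization is governed by the serialization options on the step.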
The handling of the response to the request and the generation
of the step's result document is controlled by the
status-only, override-content-type and
detailed attributes on the c:request
input.
The override-content-type attribute controls
interpretation of the response's Content-Type header. If
this attribute is present, the response will be treated as if it
returned the Content-Type given by its value. This
original Content-Type header will however be reflected
unchanged as a c:header in the result document. It is a dynamic error (err:XC0030) if the
override-content-type value cannot be used (e.g.
text/plain to override
image/png).
If the status-only attribute has the value
true, the result document will contain only header
information. The entity of the response will not be processed to
produce a c:body or c:multipart element.
The c:response element represents an HTTP response.
The response's status code is encoded in the status
attribute and the headers and entity body are processed into
c:header and c:multipart or c:body
content.
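As a sketch, a request for header information only and a possible c:response result might look like this. The URI is hypothetical, and the status code and headers depend entirely on the server.

```xml
<!-- Sketch: ask for status and headers only. -->
<c:request xmlns:c="http://www.w3.org/ns/xproc-step"
           method="GET"
           href="http://example.com/data.xml"
           detailed="true"
           status-only="true"/>
<!-- A possible result document:
     <c:response xmlns:c="http://www.w3.org/ns/xproc-step" status="200">
       <c:header name="Content-Type" value="application/xml"/>
       <c:header name="Content-Length" value="1024"/>
     </c:response>
     No c:body or c:multipart is produced because status-only is true. -->
```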
Otherwise (the detailed attribute is not specified
or its value is false), the response to the request
is handled as follows:
If the media type (as determined by the
override-content-type attribute or the Content-Type
response header) is an XML media type, the entity is decoded if necessary, then parsed as an XML document
and produced on the result output port as the entire output of the step.
In either case the base URI of the output document is the resolved value
of the href attribute from the input c:request.
2.1.11.3.1 Redirects
One possible response from an HTTP request is a redirect,
indicated by a status code in the three-hundred range. The precise
semantics of the 3xx return codes are laid out by section
10.3 Redirection 3xx in [RFC 2616].
The p:http-request step should follow
redirect requests (in a manner consistent with [RFC 2616])
if they are returned by the server.
2.1.11.3.2 Cookies
With one exception, in version 1.0 of XProc, the
p:http-request step does not provide any standard
mechanisms for managing cookies. Pipeline authors that need to
preserve cookies across several p:http-request calls in the
same pipeline or across multiple invocations of the same or different
pipelines will have to rely on
implementation-definedXP mechanisms.
The exception arises in the case of redirection. If a redirect
response includes cookies, those cookies should be
forwarded as appropriate to the redirected location when the
redirection is followed.
This behavior will allow the p:http-request step to
interoperate with web services that use cookies as part of an
authentication protocol.
2.1.11.4 Converting Response Entity Bodies
The entity of a response may be multipart per [RFC 1521]. In those situations, the result document will be
a c:multipart element that contains multiple
c:body elements inside.
Note
Although it is technically possible for any of the individual
parts of a multipart message to also be
multipart, XProc does not provide a standard representation for such
messages. The interpretation of a multipart message inside
another multipart message is
implementation-dependentXP.
The result of the p:http-request step is an XML
document. For media types (images, binaries, etc.) that can't be
represented as a sequence of Unicode characters, the response is
encoded as base64
and then returned as text children of the c:body element.
If the content is base64-encoded, the encoding attribute on c:body must
be set to “base64”.
If the media type of the response
is a text type with a
charset parameter that is a Unicode character encoding
(per [Unicode TR#17]) or
is recognized as a non-XML media type whose contents are encoded as a
sequence of Unicode characters (e.g. it has a charset parameter or
the definition of the media type is such that it requires Unicode),
the content of the constructed c:body element is the translation
of the text into a sequence of Unicode characters.
If the response is an XML media type, the content of the
constructed c:body element is the result of decoding the
body as necessary, then parsing it with an XML parser. If the content
is not well-formed, the step fails.
In a c:body in a response, the
content-type attribute must
be an exact copy of the value returned in the
Content-Type header. That is, it must reflect the
content type actually returned, not any override value that may have been
specified, and it must include any parameters returned by the server.
In the case of a multipart response, the same rules apply when
constructing a c:body element for each body part
encountered.
Note
Given the above description, any content identified as
text/html will be encoded as (escaped) text or
base64-encoded in the c:body element, as HTML isn't always
well-formed XML. A user can attempt to convert such content into XML
using the p:unescape-markup step.
2.1.12 p:identity
If the implementation supports passing PSVI annotations between
steps, the p:identity step must preserve
any annotations that appear in the input.
2.1.13 p:insert
The p:insert step inserts the
insertion port's document into the source
port's document relative to the matching elements in the
source port's document.
The value of the match option
must be an XSLTMatchPattern. It
is a dynamic error (err:XC0023) if that pattern matches
anything other than element, text, processing-instruction, or comment nodes.
Multiple matches are
allowed, in which case multiple copies of the insertion
documents will occur. If no elements match, then the document is
unchanged.
The value of the position option must be an NMTOKEN in
the following list:
“first-child” - the insertion is made as the first child of the match;
“last-child” - the insertion is made as the last child of the match;
“before” - the insertion is made as the immediate preceding sibling of the match;
“after” - the insertion is made as the immediate following sibling of the match.
It is a dynamic error (err:XC0025)
if the match pattern matches anything other than an element node and
the value of the position option is
“first-child” or
“last-child”.
As the inserted elements are part of the output of the step, they
are not considered in determining matching elements. If an empty sequence
appears on the insertion port, the result will be the same
as the source.
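The following sketch inserts a notice at the start of every chapter. The element names are hypothetical, and the source document is assumed to arrive from the default readable port.

```xml
<!-- Sketch: insert a notice element as the first child of each chapter. -->
<p:insert xmlns:p="http://www.w3.org/ns/xproc"
          match="chapter"
          position="first-child">
  <p:input port="insertion">
    <p:inline><notice>Draft</notice></p:inline>
  </p:input>
</p:insert>
<!-- Input:  <book><chapter><title>A</title></chapter>
                   <chapter><title>B</title></chapter></book>
     Result: <book><chapter><notice>Draft</notice><title>A</title></chapter>
                   <chapter><notice>Draft</notice><title>B</title></chapter></book>
     Two chapters match, so two copies of the insertion document appear. -->
```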
2.1.14 p:label-elements
The p:label-elements step generates a label for each matched
element and stores that label in the specified attribute.
The value of the attribute option
must be a QName.
If the lexical value does not contain a colon, then the attribute-namespace may be used to specify the
namespace of the attribute name. In that case, the attribute-prefix may be specified to suggest a
prefix for the attribute name. It is a
dynamic error (err:XD0034) to specify a new namespace or
prefix if the lexical value of the specified name contains a
colon.
The value of the label option is an XPath
expression used to generate the value of the attribute label.
The value of the match option
must be an XSLTMatchPattern. It
is a dynamic error (err:XC0023) if that expression matches
anything other than element nodes.
The value of the replace option must be a boolean value and is used to indicate
whether existing attribute values are replaced.
This step operates by generating attribute labels for each
element matched. For every matched element, the expression is
evaluated with the context node set to the matched element. An
attribute is added to the matched element using the attribute name
specified by the attribute option and the string value
of the result of evaluating the expression. If the attribute already
exists on the matched element, the value is replaced with the string
value only if the replace option has the value of
true.
If this step is used to add or change the value
of an attribute named “xml:base”, the base URI
of the element must also be amended accordingly.
An implementation must bind the variable
“p:index” in the static context of each evaluation
of the XPath expression to the position of the element in the sequence
of matched elements. In other words, the first element (in document
order) matched gets the value “1”, the second gets
the value “2”, the third, “3”,
etc.
The result of the p:label-elements step is the input document with the
attribute labels associated with matched elements. All other non-matching content
remains the same.
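The interaction of the label expression, the p:index variable and the replace option can be sketched as follows. The element name section and the attribute name id are hypothetical.

```xml
<!-- Sketch: number sections, keeping any id that is already present. -->
<p:label-elements xmlns:p="http://www.w3.org/ns/xproc"
                  match="section"
                  attribute="id"
                  label="concat('sec-', $p:index)"
                  replace="false"/>
<!-- Input:  <doc><section/><section id="intro"/><section/></doc>
     Result: <doc><section id="sec-1"/><section id="intro"/><section id="sec-3"/></doc>
     The second section is still counted, so the third receives index 3,
     but its existing id is kept because replace is false. -->
```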
2.1.15 p:load
The p:load step has no inputs but produces as its
result an XML resource specified by an IRI.
The value of the href option
must be an anyURI. It is interpreted
as an IRI reference. If it is relative, it is made absolute against
the base URI of the element on which it is specified
(p:with-optionXP or p:load in the case of a
syntactic shortcutXP value).
The value of the dtd-validate option
must be a boolean.
The p:load step is the same as p:documentXP
with two additions:
The URI to be accessed can be constructed dynamically by the
pipeline.
The p:load step has an option to invoke DTD validation.
When dtd-validate is false,
p:load processing is the same as p:documentXP
processing on the computed href value.
When dtd-validate is true,
p:load processing is the same as p:documentXP
processing on the computed href value but
must use a validating parser. It is a dynamic error (err:XC0027) if the
document is not valid or the step doesn't support DTD
validation.
The retrieved document is produced on the result
port. The base URI of the result is the (absolute) IRI used to
retrieve it.
2.1.16 p:make-absolute-uris
The p:make-absolute-uris step makes an element or
attribute's value in the source document an absolute IRI value in the
result document.
The value of the match option must be an
XSLTMatchPattern.
It is a dynamic error (err:XC0023) if
the pattern matches anything other than element or attribute
nodes.
The value of the base-uri option
must be an anyURI. It is interpreted
as an IRI reference. If it is relative, it is made absolute against
the base URI of the element on which it is specified
(p:with-optionXP or p:make-absolute-uris in the case of
a syntactic shortcutXP
value).
For every element or attribute in the input document which
matches the specified pattern, its XPath string-value is resolved
against the specified base URI and the resulting absolute IRI is used
as the matched node's entire contents in the output.
The base URI used for resolution defaults to the matched
attribute's element or the matched element's base URI unless the
base-uri option is specified. When the
base-uri option is specified, the option value is
used as the base URI regardless of any contextual base URI value in
the document. This option value is resolved against the base URI of
the p:optionXP element used to set the option.
If the IRI reference specified by the base-uri option
on p:make-absolute-uris is
not valid, or if it is absent and the input document has no base URI,
the results are implementation-dependentXP.
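As a rough illustration of the resolution described above, the following sketch resolves the string value of each matched element against an explicitly supplied base URI. It is a simplification under stated assumptions: the helper name make_absolute_uris is hypothetical, a plain element name replaces the match pattern, only elements (not attributes) are handled, and the base-uri option is always given because ElementTree does not track per-element base URIs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

def make_absolute_uris(root, tag, base_uri):
    """Replace the content of every `tag` element with an absolute URI."""
    for element in list(root.iter(tag)):
        # The XPath string-value is the concatenation of all descendant text.
        value = "".join(element.itertext())
        # The resolved URI becomes the element's entire content.
        for child in list(element):
            element.remove(child)
        element.text = urljoin(base_uri, value)
    return root

doc = ET.fromstring("<links><href>a/b.xml</href><href>../c.xml</href></links>")
make_absolute_uris(doc, "href", "http://example.com/docs/index.xml")
resolved = [el.text for el in doc.iter("href")]
```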
2.1.17 p:namespace-rename
The p:namespace-rename step renames any namespace declaration or
use of a namespace in a document to a new IRI value.
The value of the from option
must be an anyURI. It
should be either empty or absolute, but will not be
resolved in any case.
The value of the to option
must be an anyURI. It
should be empty or absolute, but will not be
resolved in any case.
The value of the apply-to option
must be one of “all”,
“elements”, or “attributes”.
If the value is “elements”, only elements will be
renamed; if the value is “attributes”, only attributes
will be renamed; if the value is “all”, both elements
and attributes will be renamed.
It is a dynamic error (err:XC0014)
if the XML namespace (http://www.w3.org/XML/1998/namespace)
or the XMLNS namespace (http://www.w3.org/2000/xmlns/) is
the value of either the from option or the
to option.
If the value of the from option is the same as
the value of the to option, the input is reproduced
unchanged on the output. Otherwise, namespace bindings, namespace
attributes and element and attribute names are changed as
follows:
Namespace bindings: If the from option is present
and its value is not the empty string,
then every binding of a prefix (or the default namespace) in the input
document whose value is the same as the value of the from
option is
replaced in the output with a binding to the value of the to
option, provided it is present and not the empty string;
otherwise (the to option is
not specified or has an empty string as its value) the binding is absent from the output.
If the from option is absent, or its value is the empty string,
then no bindings are changed or removed.
Elements and attributes: If the from option is present
and its value is not the empty string, for every element and attribute,
as appropriate, in the input whose namespace name is the same as the value of the
from option, in the output its namespace name is
replaced with the value of the to
option, provided it is present and not the empty string;
otherwise (the to option is
not specified or has an empty string as its value) changed to have no value.
If the from option is absent, or its value
is the empty string, then for every element and attribute, as appropriate,
whose namespace name has no value, in the
output its namespace name is set to the value of the
to option.
Namespace attributes: If the from option is present
and its value is not the empty string, for every namespace attribute in the
input whose value is the same as the value of the from option, in the output
the namespace attribute's value is replaced with the value of the to
option, provided it is present and not the empty string;
otherwise (the to option is
not specified or has an empty string as its value) the namespace attribute is absent.
Note
The apply-to option is primarily intended to make
it possible to avoid renaming attributes when the from option
specifies no namespace, since many attributes are in no namespace.
Care should be taken when specifying no namespace with the
to option. Prefixed names in content, for example QNames and
XPath expressions, may end up with no appropriate namespace binding.
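The renaming rules for element and attribute names can be sketched as below. This is a hedged approximation, not a normative algorithm: the function names are hypothetical, names are modelled in ElementTree's "{namespace}local" notation, and namespace bindings and namespace attributes are not modelled at all because ElementTree regenerates them on serialization. An empty string stands for an absent option.

```python
import xml.etree.ElementTree as ET

RESERVED = {
    "http://www.w3.org/XML/1998/namespace",
    "http://www.w3.org/2000/xmlns/",
}

def rename(name, from_ns, to_ns):
    """Rename one '{ns}local' name if its namespace equals from_ns."""
    if name.startswith("{"):
        ns, local = name[1:].split("}", 1)
    else:
        ns, local = "", name
    if ns != from_ns:
        return name
    return f"{{{to_ns}}}{local}" if to_ns else local

def namespace_rename(root, from_ns="", to_ns="", apply_to="all"):
    if from_ns in RESERVED or to_ns in RESERVED:
        raise ValueError("err:XC0014: XML and XMLNS namespaces cannot be renamed")
    if from_ns == to_ns:
        return root  # the input is reproduced unchanged
    for element in root.iter():
        if apply_to in ("all", "elements"):
            element.tag = rename(element.tag, from_ns, to_ns)
        if apply_to in ("all", "attributes"):
            items = [(rename(k, from_ns, to_ns), v) for k, v in element.attrib.items()]
            element.attrib.clear()
            element.attrib.update(items)
    return root

doc = ET.fromstring('<a:r xmlns:a="urn:old" a:x="1"><a:c/><d/></a:r>')
namespace_rename(doc, "urn:old", "urn:new")
```

With apply_to set to “elements” and an empty from value, unprefixed attributes stay in no namespace, which is the case the Note above is concerned with.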
2.1.18 p:pack
The p:pack step merges two document sequences in a pair-wise
fashion.
The value of the wrapper option
must be a QName. If the lexical value
does not contain a colon, then the wrapper-namespace
may be used to specify the namespace of the wrapper. In that case, the
wrapper-prefix may be specified to suggest a
prefix for the wrapper element.
It is a dynamic error (err:XD0034)
to specify a new namespace or prefix if the lexical value of the specified
name contains a colon.
The step takes each pair of documents, in order, one from the
source port and one from the alternate port,
wraps them with a new element node whose QName is the value specified
in the wrapper option, and writes that element to the
result port as a document.
If the step reaches the end of one input sequence before the
other, then it simply wraps each of the remaining documents in the
longer sequence.
Note
In the common case, where the document element of a document in
the result sequence has two element children, any
comments, processing instructions, or white space text nodes that
occur between them may have come from either of the input documents;
this step does not attempt to distinguish which one.
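The pair-wise merging can be pictured with the following sketch. It is illustrative only: the function name pack is hypothetical, each document is represented by its document element, and the wrapper is given as a plain name with no namespace handling.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

def pack(source, alternate, wrapper):
    """Wrap documents from two sequences pair-wise in `wrapper` elements."""
    results = []
    # zip_longest pads the shorter sequence with None, so the remaining
    # documents of the longer sequence are each wrapped on their own.
    for left, right in zip_longest(source, alternate):
        wrapped = ET.Element(wrapper)
        for document in (left, right):
            if document is not None:
                wrapped.append(document)
        results.append(wrapped)
    return results

source = [ET.Element("a"), ET.Element("b"), ET.Element("c")]
alternate = [ET.Element("x")]
packed = pack(source, alternate, "pair")
shapes = [[child.tag for child in wrapped] for wrapped in packed]
```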
2.1.19 p:parameters
The p:parameters step exposes a set of parameters
as a c:param-set document.
Each parameter in the parameters map is converted into a
c:param element.
The resulting c:param elements are wrapped in a
c:param-set and the parameter set document is written
to the result port.
The
order in which c:param elements occur in the c:param-set is
implementation-dependentXP.
For consistency and user convenience, if any of the parameters
have names that are in a namespace, the
namespace attribute on the
c:param element must be used. Each
name must be an NCName.
The base URI of the output document is the URI of the pipeline document
that contains the step.
2.1.19.1 The c:param element
A c:param represents a parameter on a parameter
input.
<c:param name = QName namespace? = anyURI value = string />
The name attribute of the
c:param must have the lexical form of a QName.
If the namespace attribute is
specified, then the expanded name of the parameter is constructed from
the specified namespace and the name
value. It is a dynamic
error (err:XD0025) if the namespace
attribute is specified, the name contains
a colon, and the specified namespace is not the same as the in-scope
namespace binding for the specified prefix.
If the namespace attribute is not
specified, and the name contains a colon,
then the expanded name of the parameter is constructed using the name value and the namespace declarations
in-scope on the c:param element.
If the namespace attribute is not
specified, and the name does not contain
a colon, then the expanded name of the parameter is in no
namespace.
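The three cases for constructing the expanded name of a parameter can be sketched as follows. The function name expanded_name and the dictionary of in-scope prefix bindings are hypothetical, and two choices are assumptions of this sketch rather than statements of the specification: when both a namespace and a prefixed name are given, the local part of the name is used, and an unbound prefix is reported with a generic error.

```python
def expanded_name(name, namespace=None, in_scope=None):
    """Return (namespace or None, local name) for a c:param."""
    in_scope = in_scope or {}
    prefix, colon, local = name.partition(":")
    if namespace is not None:
        if colon and in_scope.get(prefix) != namespace:
            raise ValueError("err:XD0025: namespace conflicts with prefix binding")
        return (namespace, local if colon else name)
    if colon:
        if prefix not in in_scope:
            # Not covered by the text above; an assumption of this sketch.
            raise ValueError(f"no in-scope binding for prefix {prefix!r}")
        return (in_scope[prefix], local)
    # No namespace attribute and no colon: the name is in no namespace.
    return (None, name)
```

For example, expanded_name("x:limit", in_scope={"x": "urn:x"}) yields the pair ("urn:x", "limit").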
Any namespace-qualified attribute names that appear on the
c:param element are ignored. It is a
dynamic error (err:XD0014) for any unqualified attribute
names other than “name”,
“namespace”, or “value” to
appear on a c:param element.
2.1.19.2 The c:param-set element
A c:param-set represents a set of parameters on a
parameter input.
The c:param-set contains zero or more
c:param elements. It is a
dynamic error (err:XD0018) if the parameter list contains
any elements other than c:param.
Any namespace-qualified attribute names that appear on the
c:param-set element are ignored. It is
a dynamic error (err:XD0014) for any unqualified attribute
names to appear on a c:param-set
element.
2.1.20 p:rename
The p:rename step renames elements, attributes, or
processing-instruction targets in a document.
The value of the match option must be an
XSLTMatchPattern. It is a dynamic
error (err:XC0023) if the pattern matches anything other than element,
attribute or processing instruction nodes.
The value of the new-name option must be a
QName.
If the lexical value does not contain a colon, then the new-namespace may be used to specify the
namespace of the new name. In that case, the new-prefix may be specified to suggest a
prefix for the new name. It is a
dynamic error (err:XD0034) to specify a new namespace or
prefix if the lexical value of the specified name contains a
colon.
Each element, attribute, or processing-instruction in the input
matched by the match pattern specified in the match
option is renamed in the output to the name specified by the
new-name option.
If the match option matches an attribute and if
the element on which it occurs already has an attribute whose expanded
name is the same as the expanded name of the specified
new-name, then the result is as if the current
attribute named “new-name” was deleted before
renaming the matched attribute.
With respect to attributes named “xml:base”, the
following semantics apply: renaming an attribute from
“xml:base” to something else has
no effect on the underlying base URI of the element; however,
if an attribute is renamed from something else
to “xml:base”, the base URI
of the element must also be amended accordingly.
If the pattern matches processing instructions, then it is the
processing instruction target that is renamed. It
is a dynamic error (err:XC0013) if the pattern matches
a processing instruction and the new name has a non-null namespace.
2.1.21 p:replace
The p:replace step replaces matching nodes in
its primary input with the document element of the
replacement port's document.
The value of the match option
must be an XSLTMatchPattern. It
is a dynamic error (err:XC0023) if that pattern matches
anything other than element, text, processing-instruction, or comment
nodes. Multiple matches are allowed, in which case multiple
copies of the replacement document will occur.
Every node in the primary input matching the specified
pattern is replaced in the output by the document element
of the replacement document. Only non-nested matches are
replaced. That is, once a node is replaced, its descendants cannot
be matched.
2.1.22 p:set-attributes
The p:set-attributes step sets attributes on
matching elements.
The value of the match option must be an
XSLTMatchPattern. It is a dynamic
error (err:XC0023) if that pattern matches anything other than element
nodes.
Each attribute on the document element of the document that
appears on the attributes port is copied to each element
that matches the match expression.
If an attribute with the same name as one of the attributes to
be copied already exists, the value specified on the
attributes port's document is used. The result port of
this step produces a copy of the source port's document
with the matching elements' attributes modified.
The matching elements are specified by the match pattern in the
match option. All matching elements are processed.
If no elements match, the step will not change any elements.
This step must not copy namespace declarations. If the attributes
copied from the attributes port use namespaces, prefixes, or
prefixes bound to different namespaces, the document produced on the
result output port will require
Section 2.5.1, “Namespace Fixup on XML Outputs”XP.
If an attribute named
xml:base is added or changed, the base URI
of the element must also be amended accordingly.
The document propertiesXP of the document
on the source port are augmented with the values specified
in the properties option. The document produced on
the result port has the same representation but the
adjusted property values.
It is a dynamic
error (err:XC1001) if the properties map contains
a key equal to the string “content-type”.
2.1.24 p:sink
The p:sink step accepts a sequence of documents and
discards them. It has no output.
2.1.25 p:split-sequence
The value of the test option must be an XPathExpression.
The XPath expression in the test option is
applied to each document in the input sequence. If the effective
boolean value of the expression is true, the document is copied to the
matched port; otherwise it is copied to the
not-matched port.
If the initial-only option is true, then when
the first document that does not satisfy the test expression is
encountered, it and all the documents that follow
it are written to the not-matched port.
In other words, it only writes the initial series of matched
documents (which may be empty) to the matched port.
All other documents are written to the not-matched port,
irrespective of whether or not they match.
The XPath contextXP for the
test option changes over time. For each document that
appears on the source port, the expression is evaluated
with that document as the context document. The context position
(position()) is the position of that document within the
sequence and the context size (last()) is the total
number of documents in the sequence.
Note
In principle, this component cannot stream because it must
buffer all of the input sequence in order to find the context size. In
practice, if the test expression does not use the
last() function, the implementation can stream
and ignore the context size.
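The routing of documents to the matched and not-matched ports can be sketched as below. This is an informal model, not a normative algorithm: the function name split_sequence is hypothetical, documents are arbitrary Python values, and the test expression is a callable that receives the document, its context position, and the context size. The sketch buffers the whole sequence, which mirrors the observation in the Note that the context size is only known once all input has been seen.

```python
def split_sequence(documents, test, initial_only=False):
    """Return (matched, not_matched) lists for a document sequence."""
    docs = list(documents)          # buffered so that last() is known
    matched, not_matched = [], []
    stopped = False
    for position, doc in enumerate(docs, start=1):
        if not stopped and test(doc, position, len(docs)):
            matched.append(doc)
        else:
            not_matched.append(doc)
            # With initial-only, the first failure diverts everything after it.
            if initial_only:
                stopped = True
    return matched, not_matched

docs = ["a1", "b", "a2", "a3"]
starts_with_a = lambda doc, position, last: doc.startswith("a")
every_match = split_sequence(docs, starts_with_a)
initial_run = split_sequence(docs, starts_with_a, initial_only=True)
```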
If the implementation supports passing PSVI annotations between
steps, the p:split-sequence step must preserve
any annotations that appear in the input.
2.1.26 p:store
The p:store step stores a serialized version of its
input to a URI. This step outputs a reference to the location of the
stored document.
The value of the href option
must be an anyURI. If it is relative,
it is made absolute against the base URI of the element on which it is
specified (p:with-optionXP or p:store in the case
of a syntactic shortcutXP
value).
The step attempts to store the XML document to the specified
URI. It is a dynamic error (err:XC0050)
if the URI scheme is not supported or the step cannot store to the
specified location.
The output of this step is a document containing a single
c:result element whose content is the absolute URI of the
document stored by the step.
The standard serialization options are provided to control the
serialization of XML content when it is stored. These options are
as specified in Section 2.3, “Serialization Options”.
2.1.27 p:string-replace
The p:string-replace step matches nodes in the
document provided on the source port and replaces them
with the string result of evaluating an XPath expression.
The value of the match option must be an
XSLTMatchPattern.
The value of the replace option must be an
XPathExpression.
The matched nodes are specified with the match pattern in the
match option.
For each matching node, the XPath
expression provided by the replace option is
evaluated with the matching node as the XPath context node.
The string value of the result is used in the output.
Nodes that do not match are copied without change.
If the expression given in the match option
matches an attribute, the string value of the
replace
expression is used as the new value of the attribute in the output.
If the attribute is named “xml:base”, the base URI
of the element must also be amended accordingly.
If the expression matches any other kind of node, the entire
node (and not just its contents) is replaced by
the string value of the replace expression.
2.1.28 p:unescape-markup
The p:unescape-markup step takes the string value of
the document element and parses the content as if it was a Unicode
character stream containing serialized XML. The output consists of the
same document element with children that result from the parse. This
is the reverse of the p:escape-markup step.
The value of the namespace option
must be an anyURI. It
should be absolute, but will not be
resolved.
When the string value is parsed, the original document element
is preserved so that the result will be well-formed XML even if the
content consists of multiple, sibling elements.
The namespace option specifies a default
namespace. Elements that are in no namespace in the unescaped content
will be placed into this namespace unless there is an in-scope namespace
declaration that specifies a different namespace (or explicitly undeclares
the default namespace).
The content-type option may
be used to specify an alternate content type for the string value. An
implementation may use a different parser to
produce XML content depending on the specified content-type. For
example, an implementation might provide an HTML to XHTML parser (e.g.
[HTML Tidy] or [TagSoup]) for the
content type 'text/html'.
All implementations must support the content
type application/xml, and must use a standard XML
parser for it. It is a dynamic
error (err:XC0051) if the content-type specified is not supported by
the implementation.
Behavior of
p:unescape-markup for content-types other
than application/xml is
implementation-definedXP.
The encoding option specifies how the data is
encoded. All implementations must support the
base64 encoding (and the absence of an encoding
option, which implies that the content is plain Unicode text).
It is a dynamic error (err:XC0052) if the
encoding specified is not supported by the
implementation.
If an encoding is specified, a
charset may also be specified.
The character set may be specified as a parameter on the
content-type or via the separate
charset option. If it is specified in both places,
the value of the charset option
must be used.
If the specified
encoding is base64,
then the character set
must be specified.
It is a dynamic error (err:XC0010)
if an encoding of base64 is specified and
the character set is not specified or if the specified
character set is not supported by the implementation.
The octet-stream that results from decoding the
text must be interpreted using the character encoding named by
the value of the charset option
to produce a sequence of Unicode characters to parse.
If no encoding is specified, the character set
is ignored, irrespective of where it was specified.
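The interaction of the encoding and charset options can be sketched as follows. The function name decode_content is hypothetical, only the base64 encoding is handled, and the charset is assumed to have been determined already (from the charset option or the content-type parameter, with the option taking precedence).

```python
import base64

def decode_content(text, encoding=None, charset=None):
    """Turn the string value into the Unicode characters to be parsed."""
    if encoding is None:
        # No encoding: the content is plain Unicode text and any
        # character set that was specified is ignored.
        return text
    if encoding != "base64":
        raise ValueError("err:XC0052: unsupported encoding")
    if charset is None:
        raise ValueError("err:XC0010: base64 requires a character set")
    octets = base64.b64decode(text)
    try:
        return octets.decode(charset)
    except LookupError:
        # Python raises LookupError for a codec name it does not know.
        raise ValueError("err:XC0010: unsupported character set") from None

payload = base64.b64encode("<p>café</p>".encode("utf-8")).decode("ascii")
markup = decode_content(payload, encoding="base64", charset="utf-8")
```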
For example, with the 'namespace' option set to the XHTML
namespace, the following input:
<description>
&lt;p&gt;This is a chunk.&lt;/p&gt;
&lt;p&gt;This is a another chunk.&lt;/p&gt;
</description>
would produce:
<description>
<p xmlns="http://www.w3.org/1999/xhtml">This is a chunk.</p>
<p xmlns="http://www.w3.org/1999/xhtml">This is a another chunk.</p>
</description>
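One way to picture this example is the sketch below, which parses the string value of the document element inside a temporary wrapper that declares the default namespace. It is an approximation only: the function name unescape_markup is hypothetical, only the application/xml content type with no encoding is covered, and the namespace value is assumed to contain no characters that need escaping in an attribute.

```python
import xml.etree.ElementTree as ET

def unescape_markup(document_element, namespace=None):
    """Re-parse the string value of an element as its new children."""
    text = "".join(document_element.itertext())
    declaration = f' xmlns="{namespace}"' if namespace else ""
    # The wrapper keeps the content well-formed even when it consists of
    # several sibling elements, and supplies the default namespace.
    parsed = ET.fromstring(f"<wrapper{declaration}>{text}</wrapper>")
    result = ET.Element(document_element.tag, document_element.attrib)
    result.text = parsed.text
    result.extend(list(parsed))
    return result

source = ET.fromstring(
    "<description>&lt;p&gt;This is a chunk.&lt;/p&gt;\n"
    "&lt;p&gt;This is a another chunk.&lt;/p&gt;</description>"
)
result = unescape_markup(source, "http://www.w3.org/1999/xhtml")
tags = [child.tag for child in result]
```

Content that declares its own default namespace keeps it, because the inner declaration overrides the one on the wrapper.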
2.1.29 p:unwrap
The p:unwrap step replaces matched elements with their
children.
The value of the match option must be an
XSLTMatchPattern. It is a dynamic
error (err:XC0023) if that pattern matches anything other than element
nodes.
Every element in the source document that matches
the specified match pattern is replaced by its children,
effectively “unwrapping” the children from their parent. Non-element nodes
and unmatched elements are passed through unchanged.
Note
The matching applies to the entire document, not just the “top-most”
matches. A pattern of the form h:div will replace
all h:div elements, not just the top-most
ones.
This step produces a single document; if the document element is
unwrapped, the result might not be well-formed XML.
2.1.30 p:wrap
The p:wrap step wraps matching nodes in the
source document with a new parent element.
The value of the wrapper option
must be a QName. If the lexical value
does not contain a colon, then the wrapper-namespace
may be used to specify the namespace of the wrapper. In that case, the
wrapper-prefix may be specified to suggest a
prefix for the wrapper element.
It is a dynamic error (err:XD0034)
to specify a new namespace or prefix if the lexical value of the specified
name contains a colon.
The value of the match option
must be an XSLTMatchPattern. It
is a dynamic error (err:XC0023) if the pattern matches
anything other than document, element, text, processing instruction, or comment
nodes.
The value of the group-adjacent option
must be an XPathExpression.
If the node matched is the document node (match="/"),
the result is a new document where the document element is a new
element node whose QName is the value specified in the
wrapper option. That new element contains copies of
all of the children of the original document node.
When the match pattern does not match the document node,
every node that matches the specified match
pattern is replaced with a new element node whose QName is the value
specified in the wrapper option.
The content of that new element is a copy of the original,
matching node. The p:wrap step performs a "deep" wrapping: the children
of the matching node and their descendants are processed and wrappers
are added to all matching nodes.
The group-adjacent option can be used to group
adjacent matching nodes in a single wrapper element. The specified
XPath expression is evaluated for each matching node with that node
as the XPath context node. Whenever two or more adjacent matching nodes
have the same “group adjacent” value, they are wrapped together in
a single wrapper element.
Two matching nodes are considered adjacent if and only if they
are siblings and either there are no nodes between them or all
intervening, non-matching nodes are whitespace text, comment, or processing
instruction nodes.
2.1.31 p:wrap-sequence
The p:wrap-sequence step accepts a sequence of
documents and produces either a single document or a new sequence of
documents.
The value of the wrapper option
must be a QName. If the lexical value
does not contain a colon, then the wrapper-namespace
may be used to specify the namespace of the wrapper. In that case, the
wrapper-prefix may be specified to suggest a
prefix for the wrapper element.
It is a dynamic error (err:XD0034)
to specify a new namespace or prefix if the lexical value of the specified
name contains a colon.
The value of the group-adjacent option
must be an XPathExpression.
In its simplest form, p:wrap-sequence takes a
sequence of documents and produces a single, new document by placing
each document in the source sequence inside a new
document element as sequential siblings. The name of the document
element is the value specified in the wrapper
option.
The group-adjacent option can be used to group
adjacent documents.
The
XPath
contextXP for the
group-adjacent option changes over time. For each document that
appears on the source port, the expression is evaluated
with that document as the context document. The context position
(position()) is the position of that document within the
sequence and the context size (last()) is the total
number of documents in the sequence.
Whenever
two or more sequentially adjacent documents have the same “group
adjacent” value, they are wrapped together in a single wrapper
element.
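Grouping by adjacent value can be sketched as below. The function name wrap_sequence is hypothetical, documents are represented by their document elements, and the group-adjacent expression is modelled as a callable on the document alone (context position and size are omitted). Wrapping a document whose value differs from both neighbours in a wrapper of its own is this sketch's reading of the text above.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

def wrap_sequence(documents, wrapper, group_adjacent=None):
    """Wrap a document sequence, optionally grouping adjacent documents."""
    docs = list(documents)
    if group_adjacent is None:
        groups = [docs]  # simplest form: a single wrapper for everything
    else:
        # groupby only merges runs of consecutive items with equal keys,
        # which is the "sequentially adjacent" behavior.
        groups = [list(run) for _, run in groupby(docs, key=group_adjacent)]
    results = []
    for group in groups:
        wrapped = ET.Element(wrapper)
        wrapped.extend(group)
        results.append(wrapped)
    return results

docs = [ET.fromstring(s) for s in
        ('<item k="a"/>', '<item k="a"/>', '<item k="b"/>', '<item k="a"/>')]
grouped = wrap_sequence(docs, "group", lambda doc: doc.get("k"))
sizes = [len(wrapped) for wrapped in grouped]
```

The final document is not merged with the first two even though the values are equal, because the intervening document breaks the run.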
2.1.32 p:xinclude
The p:xinclude step applies [XInclude] processing to the source document.
2.1.33 p:xslt
If present, the value of the initial-mode
option must be a QName.
If present, the value of the template-name
option must be a QName.
If present, the value of the output-base-uri
option must be an anyURI. If it is
relative, it is made absolute against the base URI of the element on
which it is specified (p:with-optionXP or p:xslt in the
case of a syntactic shortcutXP
value).
If the step specifies a version, then that version
of XSLT must be used to process the transformation.
It is a
dynamic error (err:XC0038) if the specified version
is not available. If the step does not specify a version, the
implementation may use any version it has available and may use any means
to determine what version to use, including, but not limited to,
examining the version of the stylesheet.
The XSLT stylesheet provided on the stylesheet port
is applied to the document on the source port. Any
parameters passed in the parameters option are used
to define top-level stylesheet parameters. The primary result document
of the transformation, if there is one, appears on the
result port. At most one document can appear on the
result port. All other result documents appear on the
secondary port. The order in which result documents
appear on the secondary port is
implementation-dependentXP. If XSLT 1.0 is
used, an empty sequence of documents must appear on
the secondary port.
If a sequence of documents is provided on the
source port, the first document is used as the
primary input document. The whole sequence is also the default
collection.
If no documents are provided on the source port,
the primary input document is undefined and the default collection
is empty.
It is a
dynamic error (err:XC0039) if a sequence of documents (including
an empty sequence) is provided
to an XSLT 1.0 step.
A dynamic error occurs if the XSLT processor signals a fatal
error. This includes the case where the transformation terminates due
to an xsl:message instruction with a terminate attribute value of
“yes”. How XSLT message termination
errors are reported to the XProc processor is
implementation-dependentXP.
The invocation of the transformation is controlled by the
initial-mode and template-name
options that set the initial mode and/or named template in the XSLT
transformation where processing begins. It is a
dynamic error (err:XC0056) if the specified initial mode
or named template cannot be applied to the specified stylesheet.
The output-base-uri option sets the context's
output base URI per the XSLT 2.0 specification; otherwise the base URI
of the result document is the base URI of the first
document in the source port's sequence. If the value of
the output-base-uri option is not absolute, it will
be resolved using the base URI of its p:optionXP
element. An XSLT 1.0 step should use the value of the
output-base-uri as the base URI of its output, if the
option is specified.
If XSLT 2.0 is used, the outputs of this step
may include PSVI annotations.
The static and initial dynamic contexts of the XSLT processor
are the contexts defined in
Section 2.7.2, “Step XPath Context”XP
with the following adjustments.
The dynamic context is augmented as follows:
Context item
The first document that appears on the source port.
Variable values
Any parameters
passed in the parameters option are available as variable bindings
to the XSLT processor.
Function implementations
The function implementations provided by the XSLT processor.
Default collection
The sequence of documents provided on the source port.
2.2 Optional Steps
The following steps are optional. If they are supported by a
processor, they must conform to the semantics outlined here, but a
conformant processor is not required to support all (or any) of these
steps.
2.2.1 p:exec
The p:exec step runs an external command passing the
input that arrives on its source port as standard input,
reading result from standard output, and errors
from standard error.
The values of the command, args,
cwd, path-separator, and
arg-separator options must be strings.
The values of the source-is-xml,
result-is-xml, errors-is-xml,
and fix-slashes options must be
boolean.
The p:exec step executes the command passed on
command with the arguments passed on
args. The processor does not interpolate the values
of the command or args (for example,
expanding references to environment variables).
It is a dynamic
error (err:XC0033) if the command cannot be run.
If cwd is specified, then the current working
directory is changed to the value of that option before execution
begins. It is a dynamic
error (err:XC0034) if the current working directory cannot be changed
to the value of the cwd option.
If cwd is not
specified, the current working directory is
implementation-definedXP.
If the path-separator option is specified,
every occurrence of the character identified as the
path-separator character that occurs in the
command, args, or
cwd will be replaced by the platform-specific path
separator character. It is a dynamic
error (err:XC0063) if the path-separator option is
specified and is not exactly one character long.
The value of the args option is a string. In
order to support passing more than one argument to a command, the
args string is broken into a sequence of values.
The arg-separator option specifies the character
that is used to separate values; by default it is a single space.
It is a dynamic error (err:XC0066) if
the arg-separator option is specified and is not
exactly one character long.
The following examples of p:exec are equivalent. The
first uses the default arg-separator:
If one of the arguments contains a space (e.g., a filename that
contains a space), then you must specify an alternate separator.
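The handling of the args, arg-separator, and path-separator options can be sketched as follows. The function names split_args and fix_path are hypothetical, and the treatment of consecutive separators (each one produces an empty argument here) is an assumption of this sketch rather than something the text above specifies.

```python
import os

def split_args(args, arg_separator=" "):
    """Break the args string into a sequence of argument values."""
    if len(arg_separator) != 1:
        raise ValueError("err:XC0066: arg-separator must be one character")
    return args.split(arg_separator) if args else []

def fix_path(value, path_separator=None, platform_separator=os.sep):
    """Replace path_separator with the platform's path separator."""
    if path_separator is None:
        return value
    if len(path_separator) != 1:
        raise ValueError("err:XC0063: path-separator must be one character")
    return value.replace(path_separator, platform_separator)

default_split = split_args("-l -a /tmp")
comma_split = split_args("-f,my file.txt", arg_separator=",")
```

The second call shows why an alternate separator is needed: with the default space separator, the filename would be split into two arguments.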
The source port is declared to accept a sequence so that
it can be empty. If no document appears on the source port, then the
command receives nothing on standard input. If a document does arrive on the source port,
it will
be passed to the command as its standard input. It is a dynamic error (err:XD0006) if
more than one document appears on the source port of the p:exec step.
If
source-is-xml is true, the serialization options are
used to convert the input into serialized XML which is passed to
the command, otherwise the XPath string-value
of the document is passed.
The standard output of the command is read and returned on
result; the standard error output is read and returned on
errors. In order to assure that the result will be an
XML document, each of the results will be wrapped in a c:result
element.
If result-is-xml is true, the standard output of
the program is assumed to be XML and will be parsed as a single document.
If it is false, the output is assumed not to be XML
and will be returned as escaped text.
If wrap-result-lines is
true, a c:line element will be wrapped around each line of output.
The same rules apply to the
standard error output of the program, with the errors-is-xml
and wrap-error-lines options, respectively.
If either of the results is XML, it must be
parsed with namespaces enabled and validation turned off, just like
p:documentXP.
The exit-status port always returns a single
c:result element which contains the system exit status that
the process returned. The specific exit status values returned by
a process invoked with p:exec are
implementation-dependentXP.
If a failure-threshold value is supplied, and the
exit status is greater than that threshold, then the p:exec
step must fail.
It is a dynamic
error (err:XC0064) if the exit code from the command is greater than
the specified failure-threshold value.
This failure, like any step failure,
can be captured with a p:tryXP.
2.2.2 p:hash
The p:hash step generates a hash, or digital “fingerprint”,
for some value and injects it into the source document.
The value of the algorithm option must be a QName.
If it does not have a prefix, then it must be one of the following values:
“crc”, “md”, or “sha”.
If a version is not specified, the
default version is algorithm-defined. For “crc” it
is 32, for “md” it is 5, for “sha”
it is 1.
A hash is constructed from the string specified in the
value option using the specified algorithm and version.
Implementations must support
[CRC32],
[MD5], and [SHA1]
hashes. It is
implementation-definedXP what other algorithms are
supported.
The resulting hash should be returned as a string of
hexadecimal characters.
The value of the match option must be an
XSLTMatchPattern.
The hash of the specified value is computed using the algorithm and
parameters specified. It is a
dynamic error (err:XC0036) if the requested hash algorithm is not
one that the processor understands or if the value or parameters are
not appropriate for that algorithm.
The matched nodes are specified with the match pattern in the
match option. For each matching node, the string
value of the computed hash is used in the output (if more than one node
matches, the same hash value is used in each match).
Nodes that do not
match are copied without change.
If the expression given in the match option
matches an attribute, the hash is used as the new
value of the attribute in the output.
If the attribute is named “xml:base”, the base URI
of the element must also be amended accordingly.
If the expression matches any
other kind of node, the entire node (and not just
its contents) is replaced by the hash.
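The three required algorithms and their default versions can be sketched with the Python standard library. This is an illustration only: the function name compute_hash is hypothetical, and hashing the UTF-8 encoding of the value is an assumption of this sketch, not something this section states.

```python
import hashlib
import zlib

# Default version for each unprefixed algorithm name, as listed above.
DEFAULT_VERSIONS = {"crc": 32, "md": 5, "sha": 1}


def compute_hash(value, algorithm="sha", version=None):
    """Return the hash of a string as lowercase hexadecimal characters."""
    if algorithm not in DEFAULT_VERSIONS:
        raise ValueError(f"err:XC0036: unknown algorithm {algorithm!r}")
    if version is None:
        version = DEFAULT_VERSIONS[algorithm]
    data = value.encode("utf-8")  # assumption: the value is hashed as UTF-8
    if (algorithm, version) == ("crc", 32):
        return format(zlib.crc32(data), "08x")
    if (algorithm, version) == ("md", 5):
        return hashlib.md5(data).hexdigest()
    if (algorithm, version) == ("sha", 1):
        return hashlib.sha1(data).hexdigest()
    # Other versions are implementation-defined; this sketch rejects them.
    raise ValueError(f"err:XC0036: unsupported version {version!r}")
```

The same string is then written to every node that the match pattern selects.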
2.2.3 p:in-scope-names
The p:in-scope-names step exposes all of the
in-scope variables and options as a set of parameters in a
c:param-set document.
Each in-scope variable and option is converted into a
c:param element.
The resulting c:param elements are wrapped in a
c:param-set and the parameter set document is written
to the result port.
The
order in which c:param elements occur in the c:param-set is
implementation-dependentXP.
For consistency and user convenience, if any of the variables or options
have names that are in a namespace, the
namespace attribute on the
c:param element must be used. Each
name must be an NCName.
The base URI of the output document is the URI of the pipeline document
that contains the step.
For consistency with the p:parameters step, the
result port is not primary.
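The shape of the resulting document can be sketched as follows. This is a hedged illustration in Python: the function name param_set and the bindings dictionary are hypothetical, and the sketch assumes the usual binding of the c: prefix to http://www.w3.org/ns/xproc-step.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace conventionally bound to the c: prefix in XProc.
C_NS = "http://www.w3.org/ns/xproc-step"


def param_set(bindings):
    """Build a c:param-set from {(namespace or None, local name): value}."""
    ET.register_namespace("c", C_NS)
    root = ET.Element(f"{{{C_NS}}}param-set")
    for (namespace, local_name), value in bindings.items():
        attrs = {"name": local_name, "value": value}
        if namespace:
            # Names in a namespace carry it in the namespace attribute,
            # so the name attribute is always an NCName.
            attrs["namespace"] = namespace
        ET.SubElement(root, f"{{{C_NS}}}param", attrs)
    return root
```

The sketch emits parameters in dictionary order; a conforming processor may use any order.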
2.2.3.1 Example
This unlikely pipeline demonstrates the behavior of p:in-scope-names:
2.2.4 p:template
While evaluating each expression, the names of any parameters passed to the
step are available as variable values in the XPath dynamic context.
The step searches for XPath expressions in attribute values,
text content (adjacent text nodes, if they occur in the data model,
must be coalesced; this step always processes maximal length text
nodes), processing instruction data, and comments. XPath expressions
are identified by curly braces, similar to attribute value templates
in XSLT or enclosed expressions in XQuery.
In order to allow curly braces to appear literally in content, they can be escaped
by doubling them. In other words, where “{” would start an XPath expression,
“{{” is simply a single, literal opening curly brace.
The same applies for closing curly braces.
Inside an XPath expression, strings quoted by single (') or
double (") quotes are treated literally. Outside of quoted text, it
is an error for an opening curly brace to occur. A closing curly brace ends the
XPath expression (whether or not it is followed immediately by another closing
curly brace).
These parsing rules can be described by the following algorithm, though implementations
are by no means required to implement the parsing in exactly this way, provided that they
achieve the same results.
The parser begins in regular-mode at the start of
each unit of content where expansion may occur. In regular-mode:
“{{” is replaced by a single “{”.
“}}” is replaced by a single “}”.
Note:
It is a dynamic error (err:XC0067) to
encounter a single closing curly brace “}” that is not immediately
followed by another closing curly brace.
A single opening curly brace “{” (not
immediately followed by another opening curly brace) is discarded and
the parser moves into xpath-mode. The initial expression is empty.
In xpath-mode:
A closing curly brace “}” is discarded and ends the
expression. The expression is evaluated and the result of that
evaluation is copied to the output. The parser returns to
regular-mode.
Note: Braces cannot be escaped by doubling them
in xpath-mode.
A single quote (') is added to the current expression and
the parser moves to single-quote-mode.
A double quote (") is added to the current expression and
the parser moves to double-quote-mode.
All other characters are appended to the current expression.
In single-quote-mode:
A single quote (') is added to the current expression and
the parser moves to xpath-mode.
All other characters are appended to the current expression.
In double-quote-mode:
A double quote (") is added to the current expression and
the parser moves to xpath-mode.
All other characters are appended to the current expression.
It is a dynamic error (err:XC0067) if the parser reaches
the end of the unit of content and it is not in regular-mode.
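The algorithm above can be sketched as a small state machine. The following Python is a minimal illustration under stated assumptions, not a normative implementation: the names expand and TemplateError are hypothetical, XPath evaluation is delegated to a caller-supplied evaluate function, and the sketch reports an unquoted opening curly brace inside an expression with the same exception because the prose above calls it an error without naming a code.

```python
class TemplateError(Exception):
    """Stands in for the dynamic error err:XC0067 (hypothetical class name)."""


def expand(text, evaluate):
    """Expand {expr} in one unit of content; evaluate(expr) returns a string."""
    out, expr = [], []
    mode = "regular"
    i = 0
    while i < len(text):
        ch = text[i]
        if mode == "regular":
            if ch in "{}":
                if text[i + 1:i + 2] == ch:
                    out.append(ch)  # "{{" or "}}" yields one literal brace
                    i += 2
                    continue
                if ch == "}":
                    raise TemplateError("single '}' in regular-mode")
                mode, expr = "xpath", []  # a single "{" is discarded
            else:
                out.append(ch)
        elif mode == "xpath":
            if ch == "{":
                raise TemplateError("'{' outside quoted text in an expression")
            if ch == "}":
                out.append(evaluate("".join(expr)))  # braces cannot be doubled
                mode = "regular"
            else:
                expr.append(ch)
                if ch == "'":
                    mode = "single-quote"
                elif ch == '"':
                    mode = "double-quote"
        else:
            # In either quote mode every character is appended; the matching
            # quote character returns the parser to xpath-mode.
            expr.append(ch)
            if (mode, ch) in (("single-quote", "'"), ("double-quote", '"')):
                mode = "xpath"
        i += 1
    if mode != "regular":
        raise TemplateError("content ended inside an expression")
    return "".join(out)
```

For example, with an evaluate function that wraps each expression in angle brackets, the input {x}-{'}'} yields two expressions, x and '}', because the closing brace inside the quoted string does not end the second expression.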
The context node used for each expression is the document passed on the
source port.
It is a dynamic error (err:XC0068)
if more than one document appears on the source port.
In an XPath 1.0 implementation, if
p:emptyXP is given or implied on the source port, an
empty document node is used as
the context node. In an XPath 2.0 implementation, the context item is
undefined.
It is a dynamic error (err:XC0026) if
any XPath expression makes reference to the context node, size, or
position when the context item is undefined.
In an attribute value, processing instruction, or comment, the
string value of the XPath expression is used. In text content, an
expression that selects nodes will cause those nodes to be copied into
the template document.
Note
Depending on which version of XPath an implementation supports,
and possibly on the xpath-version setting on
the p:template, some implementations may report errors, or
different results, than other implementations in those cases where the
interpretation of an XPath expression differs between the versions of
XPath.
2.2.4.1 Example
It's quite common to construct documents using values computed
by the pipeline. This is particularly (but not exclusively) the case
when the pipeline uses the p:http-request step. The input
to p:http-request is a c:request document;
attributes on the c:request element control most of the
request parameters; the body of the document forms the body of
the request.
If we assume that the href value and the computed
content come from an input document, and the username and password are options, then a
typical pipeline to compute the request becomes quite complex.
There's nothing wrong with this pipeline, but it requires
several steps to accomplish what the pipeline author probably
considers a single operation. What's more, the result of these steps
is not immediately obvious on casual inspection.
In order to make this simple construction case both literally
and conceptually simpler, this note introduces two new XProc steps in
the XProc namespace. Support for these steps is optional, but we
strongly encourage implementors to provide them.
The p:in-scope-names step provides all of the in-scope options and variables
in a c:param-set (this operation is exactly analogous to what the
p:parameters step does, except that it operates on the options and variables instead
of on parameters).
The p:template step searches for XPath
expressions, delimited by curly braces, in a template document and replaces each with the
result of evaluating the expression. All of the parameters passed to the
p:template step are available as in-scope variable names when evaluating
each XPath expression.
Where the expressions occur in attribute values, their string value is used. Where
they appear in text content, their node values are used.
2.2.5 p:uuid
The p:uuid step generates a
[UUID] and injects it into
the source document.
The value of the match option must be an
XSLTMatchPattern. The value of the version option
must be an integer.
If the version is specified, that version of
UUID must be computed. It is a dynamic
error (err:XC0060) if the processor does not support the specified
version of the UUID algorithm. If the
version is not specified, the version of UUID
computed is
implementation-definedXP.
Implementations must support version 4 UUIDs.
Support for other versions of UUID, and the mechanism by which
the necessary inputs are made available for computing other versions,
is implementation-definedXP.
The matched nodes are specified with the match pattern in the
match option. For each matching node, the generated
UUID is used in the output (if more than one node matches, the
same UUID is used in each match). Nodes that do not
match are copied without change.
If the expression given in the match option
matches an attribute, the UUID is used as the new
value of the attribute in the output. If the attribute is named “xml:base”, the base URI of the element
must also be amended accordingly.
If the expression matches any
other kind of node, the entire node (and not just
its contents) is replaced by the UUID.
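The behavior for matched attributes can be sketched as follows. This Python illustration is deliberately simplified: the function name inject_uuid is hypothetical, and selecting attributes by name stands in for a real XSLTMatchPattern such as @id.

```python
import uuid
import xml.etree.ElementTree as ET


def inject_uuid(document, attribute_name, version=4):
    """Replace every attribute with the given name by one generated UUID."""
    if version != 4:
        # This sketch supports only the one version that is required.
        raise ValueError(f"err:XC0060: UUID version {version} is not supported")
    value = str(uuid.uuid4())  # generated once, reused for every match
    root = ET.fromstring(document)
    for element in root.iter():
        if attribute_name in element.attrib:
            element.set(attribute_name, value)
    return root
```

Generating the UUID once, before walking the document, is what guarantees that all matches receive the same value.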
2.2.6 p:validate-with-relax-ng
The p:validate-with-relax-ng step applies
[RELAX NG]
validation to the source document.
The values of the dtd-attribute-values and
dtd-id-idref-warnings options
must be booleans.
If the schema document has an XML media type, then
it must be interpreted as a RELAX NG Grammar. If
the media type has a “text” type, then it
must be interpreted as a [RELAX NG
Compact Syntax] document for validation.
If the dtd-attribute-values option is
true, then the attribute value defaulting conventions of
[RELAX NG DTD Compatibility] are also applied.
If the dtd-id-idref-warnings option is
true, then the validator should
treat a schema that is incompatible with the ID/IDREF/IDREFs feature
of [RELAX NG DTD Compatibility] as if the document
were invalid.
It is a dynamic error (err:XC0053)
if the assert-valid option is true
and the input document is not valid.
The output from this step is a copy of the input, possibly
augmented by application of the
[RELAX NG DTD Compatibility]. The output of this step
may include PSVI annotations.
2.2.8 p:validate-with-xml-schema
The values of the use-location-hints,
try-namespaces, and
assert-valid
options
must be booleans.
The value of the mode option
must be an NMTOKEN whose value is either
“strict” or “lax”.
Validation is performed against the set of schemas represented
by the documents on the schema port. These schemas must
be used in preference to any schema locations provided by schema
location hints encountered during schema validation, that is, schema
locations supplied for xs:import or
xsi:schema-location, or determined by
schema-processor-defined namespace-based strategies, for the
namespaces covered by the documents available on the schemas port.
If xs:include elements occur within the supplied
schema documents, they are treated like any other
external
documentsXP. It is
implementation-definedXP if the documents supplied
on the schemas port are considered when resolving
xs:include elements in the schema documents provided.
The use-location-hints and
try-namespaces options allow the pipeline author to
control how the schema processor should attempt to locate schema
documents necessary but not provided on the schema
port. Any schema documents provided on the schema port
must be used in preference to schema documents
located by other means.
If the use-location-hints option is
“true”, the processor should
make use of schema location hints to locate schema documents. If the
option is “false”, the processor
should ignore any such hints.
If the try-namespaces option is
“true”, the processor should
attempt to dereference the namespace URI to locate schema documents.
If the
option is “false”, the processor
should not dereference namespace URIs.
The mode option allows the pipeline author to
control how schema validation begins. The “strict”
mode means that the document element must be declared and
schema-valid, otherwise it will be treated as invalid. The
“lax” mode means that the
absence of a declaration for the document element does not itself
count as an unsuccessful outcome of validation.
If the step specifies a version, then that version
of XML Schema must be used to process the validation.
It is a
dynamic error (err:XC0038) if the specified version
is not available. If the step does not specify a version, the
implementation may use any version it has available and may use any means
to determine what version to use, including, but not limited to,
examining the version of the schema(s).
It is a dynamic error (err:XC0053)
if the assert-valid option is true
and the input document is not valid. If the assert-valid
option is false, it is not an error for the document
to be invalid. In this case, if the implementation does not
support the PSVI, p:validate-with-xml-schema is essentially
just an “identity” step, but if the implementation does
support the PSVI, then the resulting document will have additional type
information (at least for the subtrees that are valid).
When XML Schema validation assessment
is performed, the processor is invoked in the mode specified by the
mode option.
It is a dynamic error (err:XC0055)
if the implementation does not support the specified mode.
The result of the assessment is a document with the
Post-Schema-Validation-Infoset (PSVI) ([W3C XML Schema: Part 1]) annotations, if the pipeline implementation
supports such annotations. If not, the input document is reproduced
with any defaulting of attributes and elements performed as specified
by the XML Schema recommendation.
2.2.9 p:www-form-urldecode
The p:www-form-urldecode step decodes an
x-www-form-urlencoded string into an XML representation.
The value option is interpreted as a string of
parameter values encoded using the
x-www-form-urlencoded algorithm. Each name/value
pair is written in a c:param element.
The entire set of parameters
is written (as a c:param-set) on the result
output port.
It is a
dynamic error (err:XC0037) if the value provided
is not a properly
x-www-form-urlencoded value.
It is a
dynamic error (err:XC0061) if the name of any encoded parameter
is not a valid xs:NCName. In other words, this
step can only decode simple name/value pairs where the names do not contain
colons or any characters that cannot be used in XML names.
The order of the c:param elements in the result is the same
as the order of the encoded parameters, reading from left to right.
If any parameter name occurs more than once in the encoded string,
the resulting parameter set will contain a c:param for
each instance.
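The decoding rules can be sketched with the Python standard library. This is an illustration only: the function name decode_form is hypothetical, and the NCNAME pattern is a simplified, ASCII-only approximation of xs:NCName adopted for brevity.

```python
import re
from urllib.parse import parse_qsl

# Assumption: an ASCII-only approximation of xs:NCName, for this sketch.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def decode_form(value):
    """Decode a form-urlencoded string into ordered (name, value) pairs."""
    try:
        pairs = parse_qsl(value, keep_blank_values=True, strict_parsing=True)
    except ValueError as exc:
        raise ValueError("err:XC0037: value is not properly encoded") from exc
    for name, _ in pairs:
        if not NCNAME.match(name):
            raise ValueError(f"err:XC0061: {name!r} is not a valid NCName")
    # Order is preserved and repeated names are kept, one pair per instance.
    return pairs
```

Each returned pair corresponds to one c:param element in the result, in the same left-to-right order.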
2.2.10 p:www-form-urlencode
The p:www-form-urlencode step encodes a set of parameter
values as an x-www-form-urlencoded string and
injects it into the source document.
The value of the match option must be an
XSLTMatchPattern.
The set of parameters is encoded as a single
x-www-form-urlencoded string of name/value pairs.
When parameters are encoded into name/value pairs,
only the local name of each parameter is used.
The namespace name is ignored and no prefix or colon appears in the name.
The matched nodes are specified with the match pattern in the
match option. For each matching node, the encoded
string is used in the output. Nodes that do not
match are copied without change.
If the expression given in the match option
matches an attribute, the encoded
string is used as the new value of the attribute in the output.
If the expression matches any other kind of node, the entire
node (and not just its contents) is replaced by
the encoded string.
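The encoding rule, including the use of local names only, can be sketched as follows. The function name encode_params and the tuple layout are hypothetical choices made for this Python illustration.

```python
from urllib.parse import urlencode


def encode_params(params):
    """Encode (namespace or None, local name, value) tuples as one string."""
    # Only the local name is used; the namespace name is ignored.
    return urlencode([(local_name, value) for _ns, local_name, value in params])
```

Given a parameter named name in some namespace with the value "a b" and a parameter q with the value "x&y", this sketch produces name=a+b&q=x%26y.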
2.2.11 p:xquery
The p:xquery step applies an
[XQuery 1.0] query to the sequence of documents
provided on the source port.
If a sequence of documents is provided on the
source port, the first document is used as the
initial context item. The whole sequence is also the default
collection. If no documents are provided on the source port,
the initial context item is undefined and the default collection
is empty.
The query port must receive a single document:
If the document root element is c:query, the text
descendants of this element are considered the query.
<c:query> string </c:query>
If the document root element is in the XQueryX namespace, the
document is treated as an XQueryX-encoded query. Support for
XQueryX is implementation-definedXP.
If the query document has an XML media type, then
the string value of the document must be treated as
the query. If the media type has a “text” type,
then it must be interpreted as the query.
If the step specifies a version, then that version
of XQuery must be used to process the transformation.
It is a
dynamic error (err:XC0038) if the specified version
is not available. If the step does not specify a version, the
implementation may use any version it has available and may use any means
to determine what version to use, including, but not limited to,
examining the version of the query.
The result of the p:xquery step must be a sequence of
documents. It is a dynamic
error (err:XC0057) if the sequence that results from evaluating the XQuery contains
items other than documents and elements. Any elements that appear
in the result sequence will be treated as documents with the element as their
document element.
The sequence of documents provided on the source port.
2.2.11.1 Example
The following pipeline applies XInclude processing and schema
validation before using XQuery:
Where countp.xq might contain:
<count>{count(.//p)}</count>
2.2.12 p:xsl-formatter
The p:xsl-formatter step receives an [XSL 1.1] document and renders the content. The result of
rendering is stored to the URI provided via the href
option. A reference to that result is produced on the output
port.
The value of the href option
must be an anyURI. If it is relative,
it is made absolute against the base URI of the element on which it is
specified (p:with-optionXP or p:xsl-formatter in the
case of a syntactic shortcutXP
value).
The content-type of the output is controlled by the
content-type option. This option specifies a media
type as defined by [IANA Media Types]. The option may
include media type parameters as well (e.g.
"application/someformat; charset=UTF-8"). The use of media type
parameters on the content-type option is
implementation-definedXP.
If the content-type option is not specified,
the output type is implementation-definedXP. The default should be
PDF.
A formatter may take any number of optional rendering
parameters via the step's parameters; such parameters
are defined by the XSL implementation used and are
implementation-definedXP.
The output of this step is a document containing a single
c:result element whose content is the absolute URI of the
document stored by the step.
2.3 Serialization Options
Several steps in this step library require serialization options
to control the serialization of XML. These options are used to control
serialization as in the [Serialization]
specification.
The following options may be present on steps that perform
serialization:
byte-order-mark
The value of this option must be a boolean.
If it's not specified, the default varies by encoding: for UTF-16 it's
true, for all others, it's false.
cdata-section-elements
The value of this option must be a list of
QNames. They are interpreted as element names.
doctype-public
The value of this option must be a string.
The public identifier of the doctype.
doctype-system
The value of this option must be an
anyURI. The system identifier of the doctype. It need not
be absolute, and is not resolved.
encoding
A character set name. If no encoding is
specified, the encoding used is implementation
definedXP. If the method is
“xml” or “xhtml”, the
implementation defined encoding must be either
UTF-8 or UTF-16.
escape-uri-attributes
The value of this option must be a
boolean. It is ignored unless the specified method is
“xhtml” or “html”.
include-content-type
The value of this option must be a boolean.
It is ignored unless the specified method is
“xhtml” or “html”.
indent
The value of this option must be a
boolean.
media-type
The value of this option must be a string. It
specifies the media type (MIME content type). If not specified, the
default varies according to the method:
method
The value of this option must be a
QName. It specifies the serialization method.
normalization-form
The value of this option must be an NMTOKEN,
one of the enumerated values NFC, NFD,
NFKC, NFKD, fully-normalized,
none or an implementation-defined value.
omit-xml-declaration
The value of this option must be a
boolean.
standalone
The value of this option must be an NMTOKEN,
one of the enumerated values true, false, or
omit.
undeclare-prefixes
The value of this option must be a
boolean.
version
The value of this option must be a
string.
In order to be consistent with the rest of this specification,
boolean values for the serialization parameters must use one of the
XML Schema lexical forms for boolean: "true", "false", "1", or "0".
This is different from the [Serialization]
specification which uses “yes” and “no”. No change in
semantics is implied by this different spelling.
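The correspondence between the two spellings can be sketched as a simple lookup. The function name to_serialization_flag is hypothetical, and for brevity this Python illustration does not perform the whitespace normalization that XML Schema applies to boolean values.

```python
# XML Schema lexical forms for boolean, mapped to the "yes"/"no"
# spelling used by the Serialization specification.
BOOLEAN_FORMS = {"true": "yes", "1": "yes", "false": "no", "0": "no"}


def to_serialization_flag(lexical):
    """Translate an XProc boolean option value to a serialization flag."""
    try:
        return BOOLEAN_FORMS[lexical]
    except KeyError:
        raise ValueError(f"not an XML Schema boolean: {lexical!r}") from None
```

Values such as "yes" or "TRUE" are rejected by this sketch because they are not among the four lexical forms listed above.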
The method option controls the serialization
method used by this component with standard values of 'html', 'xml',
'xhtml', and 'text' but only the 'xml' value is required to be
supported. The interpretation of the remaining options is as
specified in [Serialization].
A minimally conforming implementation must support the
xml output method with the following option
values:
The version must support the value 1.0.
The encoding must support the value UTF-8.
The omit-xml-declaration must be supported. If the value is not specified or has the value false, an XML declaration must be produced.
All other option values may be ignored for the xml
output method.
If a processor chooses to implement an option for serialization,
it must conform to the semantics defined in the [Serialization] specification.
Note
The use-character-maps parameter in the [Serialization] specification has not been provided in
the standard serialization options provided by this
specification.
3 Errors
Errors in a pipeline can be divided into two classes: static errors and dynamic
errors.
3.1 Static Errors
[Definition: A static error is one which can
be detected before pipeline evaluation is even attempted.] Examples of static
errors include cycles and incorrect specification of inputs and outputs.
Static errors are fatal and must be detected before any steps are evaluated.
3.2 Dynamic Errors
[Definition: A dynamic error is one which
occurs while a pipeline is being evaluated.] Examples of dynamic errors include
references to URIs that cannot be resolved, steps which fail, and pipelines that exhaust the
capacity of an implementation (such as memory or disk space).
If a step fails due to a dynamic error, failure propagates upwards until either a
p:tryXP is encountered or the entire pipeline fails. In other words, outside of a
p:tryXP, step failure causes the entire pipeline to fail.
It
is a dynamic error if a username
or password is specified without specifying an
auth-method, if
the requested
auth-method isn't supported, or the authentication
challenge contains an authentication method that isn't
supported.
It is a dynamic error
if an encoding of base64 is specified and
the character set is not specified or if the specified
character set is not supported by the implementation.
It is a
dynamic error if the contents of the directory
path are not available to the step due to access restrictions in the
environment in which the pipeline is run.
It is a dynamic error
if the XML namespace (http://www.w3.org/XML/1998/namespace)
or the XMLNS namespace (http://www.w3.org/2000/xmlns/) is
the value of either the from option or the
to option.
It is a
dynamic error if the user specifies a value
or values that are inconsistent with each other or with the requirements
of the step or protocol.
It is a dynamic error if
the content of the c:body element does not consist of
exactly one element, optionally preceded and/or followed by any number
of processing instructions, comments or whitespace characters.
It is a dynamic error
if the match pattern matches anything other than an element node and
the value of the position option is
“first-child” or
“last-child”.
It is a
dynamic error if the requested hash algorithm is not
one that the processor understands or if the value or parameters are
not appropriate for that algorithm.
It is a dynamic error if the QName
value in the attribute-name option uses the prefix
“xmlns”
or any other prefix that resolves to the namespace name
http://www.w3.org/2000/xmlns/.
[RELAX NG
Compact Syntax] ISO/IEC JTC 1/SC 34.
ISO/IEC 19757-2:2003/Amd 1:2006 Document Schema Definition
Languages (DSDL) — Part 2: Grammar-based validation — RELAX NG AMENDMENT 1
Compact Syntax
2006.
[RELAX NG DTD Compatibility]
RELAX NG DTD Compatibility.
OASIS Committee Specification.
3 December 2001.
[Schematron] ISO/IEC JTC 1/SC 34.
ISO/IEC 19757-3:2006(E) Document Schema Definition
Languages (DSDL) — Part 3: Rule-based validation — Schematron
2006.
[W3C XML Schema: Part 1]
XML Schema Part 1:
Structures Second Edition.
Henry S. Thompson, David Beech, Murray Maloney, et al., editors.
World Wide Web Consortium, 28 October 2004.
[Serialization]
XSLT 2.0 and XQuery 1.0 Serialization.
Scott Boag, Michael Kay, Joanne Tong, Norman Walsh, and Henry Zongaro, editors. W3C Recommendation. 23 January 2007.
[CRC32]
“32-Bit Cyclic Redundancy Codes for Internet Applications”,
The International Conference on Dependable Systems and Networks:
459. 10.1109/DSN.2002.1028931.
P. Koopman. June 2002.