XProc: An XML Pipeline Language

Editor's Working Draft 19 Aug 2006

This Version: http://www.w3.org/TR/2006/ED-xproc-20060819/
Latest Version: http://www.w3.org/TR/xproc/
Editor: Norman Walsh, Sun Microsystems, Inc. <Norman.Walsh@Sun.COM>

This document is also available in these non-normative formats: XML

Abstract

This document is a shell where ideas, points of consensus, and early draft text are being collected. It does not necessarily represent the consensus of the Working Group.

Status of this Document

This document is an editor's draft that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

More boilerplate goes here…

1 Introduction

2 Pipeline Concepts

2.1 Components
2.2 Inputs and Outputs
2.3 Parameters
2.4 Component Graph

3 Language Constructs

3.1 Pipeline
3.2 For-Each
3.3 Viewport
3.4 Choose
3.5 Try/Catch
3.6 Other Components

4 Syntax

4.1 Overview

4.1.1 Associating Documents with Ports
4.1.2 Scoping of Names
4.1.3 Syntactic Shortcuts

4.2 Pipeline Vocabulary

4.2.1 p:pipeline Element
4.2.2 p:declare-input Element
4.2.3 p:declare-output Element
4.2.4 p:step Element
4.2.5 p:input Element
4.2.6 p:param Element
4.2.7 p:import-param Element
4.2.8 p:for-each Element
4.2.9 p:viewport Element
4.2.10 p:choose/p:when/p:otherwise Elements
4.2.11 p:group Element
4.2.12 p:try/p:catch Elements
4.2.13 p:declare-component Element
4.2.14 p:pipeline-library Element
4.2.15 p:import Element

5 Errors

5.1 Static Errors
5.2 Dynamic Errors

1 Introduction

An XML Pipeline describes a sequence of operations to be performed on a collection of input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output. Components in the pipeline may read or write non-XML resources as well.

Each operation in a pipeline is performed by a component. Like pipelines, components take zero or more XML documents as their input and produce zero or more XML documents as their output. The inputs to a component come from the web, from the pipeline document, from the inputs to the pipeline itself, or from the outputs of other components in the pipeline. The outputs from a component are either consumed by other components or are outputs of the pipeline as a whole. (Outputs may also be ignored.)

This specification defines a standard component library in Appendix D, Standard Component Library. Pipeline implementations may support additional components as well.

Figure 1, “A simple, linear XInclude/Validate pipeline” is a graphical representation of a simple pipeline that performs XInclude processing and validation on a document.

Figure 1. A simple, linear XInclude/Validate pipeline

This pipeline consists of two components, XInclude and Validate. The pipeline itself has two inputs, “Document” and “Schema”. How these inputs are connected to XML documents is implementation-defined. The XInclude component reads the pipeline input “Document” and produces a result document. The Validate component reads the pipeline input “Schema” and the output from the XInclude component and produces a result document. The result of the validation is the result of the pipeline, “Result Document”. How pipeline outputs are connected to XML documents is also implementation-defined.
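As a rough sketch, the pipeline in Figure 1 might be written as follows in the syntax described in Section 4. The pipeline name, the component name p:validate, and the port names on the Validate step are assumptions made for illustration; they are not taken from the standard component library.

```xml
<p:pipeline xmlns:p="http://www.w3.org/2006/08/pipeline" name="xinclude-and-validate">
  <!-- The two pipeline inputs shown in Figure 1 -->
  <p:declare-input port="document"/>
  <p:declare-input port="schema"/>
  <!-- The pipeline result is bound to the output of the Validate step -->
  <p:declare-output port="result" source="validate!result"/>

  <p:step name="expand" component="p:xinclude">
    <p:input port="document" source="!document"/>
  </p:step>

  <!-- p:validate and its port names are hypothetical -->
  <p:step name="validate" component="p:validate">
    <p:input port="document" source="expand!result"/>
    <p:input port="schema" source="!schema"/>
  </p:step>
</p:pipeline>
```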

Figure 2, “A transform and serialize pipeline” is a more complex example.

Figure 2. A transform and serialize pipeline

The heart of this example is the conditional. The standard “choose” component evaluates an XPath expression over a test document. If the effective boolean value of the expression is true, then that branch of the pipeline is evaluated. If no expressions are true, then the “otherwise” branch is evaluated.

Note that the “Serialize” and “FO to PDF” components produce no output; consequently, this pipeline produces no output. This pipeline transforms the input document with the input stylesheet and, if the result is an XSL-FO document, generates PDF. If the result is not XSL-FO, the pipeline assumes that it is XHTML and runs Tidy over it before serializing it.

2 Pipeline Concepts

[Definition: A pipeline is an acyclic, directed graph of components connected together by inputs and outputs.] A pipeline is itself a component and must satisfy the constraints on components.

The result of evaluating a pipeline is the result of evaluating the components that it contains. A pipeline must behave as if it evaluated each component each time it occurs. Unless otherwise indicated, implementations must not assume that components are functional (that is, that their output depends only on their explicit inputs and parameters).

[Definition: A subpipeline is any collection of connected components.] The distinction between pipelines and subpipelines is simply that a pipeline is a component and can stand alone. “Subpipeline” is just a convenient name for a user-specified collection of components that occurs inside another component and has no independent identity.

2.1 Components

Components are the basic computational units of a pipeline. [Definition: A component is a unit of XML processing, such as XInclude or transformation.] Components may perform arbitrary amounts of computation but they are indivisible from the point of view of the pipeline that instantiates them.

Components have “ports” into which inputs and outputs are connected. Each component has a number of input ports and a number of output ports, all with unique names. A component may have zero input ports and/or zero output ports. (All components have a standard port for reporting errors that does not have to be, and cannot be, declared.) Components may have an arbitrary number of parameters.

All of the input ports of a component must be connected to inputs. It is a static error if a component has an input port which is not connected. Unconnected output ports are allowed; any documents produced on those ports are simply discarded.

[Definition: The signature of a component is the set of inputs, outputs, and parameters that it is declared to accept.] [Definition: The instantiation of a component matches its signature if and only if it specifies an input for each declared input and it specifies no inputs that are not declared, it specifies no outputs that are not declared, it specifies a parameter for each parameter that is declared to be required, and it specifies no parameters that are not declared.] In other words, every input and required parameter must be specified and only inputs, outputs, and parameters that are declared may be specified. Outputs and optional parameters do not have to be specified.
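As an illustration, assume (as the examples in Section 4 do) that the p:xslt component declares input ports named “document” and “stylesheet” and an output port named “result”. Under that assumption, the first step below matches the signature and the second does not. The step names and URIs are hypothetical.

```xml
<!-- Matches: every declared input is bound; the "result" output is simply left unconnected. -->
<p:step name="ok" component="p:xslt">
  <p:input port="document" href="http://example.com/input.xml"/>
  <p:input port="stylesheet" href="http://example.com/style.xsl"/>
</p:step>

<!-- Does not match: "stylesheet" is unbound and "schema" is not a declared input. -->
<p:step name="bad" component="p:xslt">
  <p:input port="document" href="http://example.com/input.xml"/>
  <p:input port="schema" href="http://example.com/schema.xsd"/>
</p:step>
```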

2.2 Inputs and Outputs

Although components are free to read and write non-XML resources, what flows between components as inputs and outputs are exclusively XML documents or sequences of XML documents. Each XML document (or document in a sequence) must be a well-formed [XML 1.0] or [XML 1.1] document. The inputs and outputs may be implemented as sequences of characters, events, or object models, or any other representation the implementation chooses.

Editorial Note

Is support for XML 1.1 optional?

It is a dynamic error if a non-XML resource is produced on a component output or arrives on a component input.

Editorial Note

What about the cases where it's impractical to test for this error?

An implementation may make it possible for a component to produce non-XML output, as the final components in Figure 2, “A transform and serialize pipeline” demonstrate, but those results cannot flow through the pipeline. Similarly, one can imagine a component that takes no inputs, reads a non-XML file from a URI, and produces an XML output. But the non-XML file cannot be an input to the component or pipeline.

2.3 Parameters

[Definition: A parameter is a QName/value pair.] The value of a parameter must be a string. If a document, node, or other value is given, its string value is computed and that string is used.

2.4 Component Graph

[Definition: The components of a pipeline are the nodes of a component graph. The inputs and outputs of the components are the arcs of that graph.] Consider two components in such a graph, “component A” and “component B”. [Definition: Components A and B are connected if any output from one is connected to any input of the other, either directly or indirectly.]

With respect to connected components, we can speak of one component being either before or after another. [Definition: Component A is before component B if component B is a subpipeline of component A, either directly or indirectly, or if any output from component A is connected to any input of component B, either directly or indirectly.] [Definition: after is the converse of before.]

It is a static error if a component is either before or after itself. In other words, the component graph must be acyclic.
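For instance, the following fragment, written in the syntax of Section 4, would be rejected. The step names are hypothetical, and the port names assume that p:identity reads from “document” and writes to “result”, as in the examples of Section 4.1.1.

```xml
<!-- Static error: "a" is before "b" and "b" is before "a", so each step is before itself. -->
<p:step name="a" component="p:identity">
  <p:input port="document" source="b!result"/>
</p:step>
<p:step name="b" component="p:identity">
  <p:input port="document" source="a!result"/>
</p:step>
```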

3 Language Constructs

This section describes the core language constructs of XProc.

3.1 Pipeline

A pipeline is a user-defined pipeline component. It has a number of declared input and output ports and parameters. Viewed from the outside, it is a black box which performs some calculation on the inputs and produces the outputs. From the pipeline author's perspective, the computation performed by the pipeline is described in terms of a subpipeline which reads the pipeline's inputs and produces the pipeline's outputs.

For example, a pipeline might accept a document and a stylesheet as input; perform XInclude, validation, and transformation over its inputs; and produce a sequence of formatted documents as its output.

3.2 For-Each

A for-each component processes a sequence of documents, applying a subpipeline to each individual document in turn. The result of the for-each is the aggregation of the results produced by processing each individual document. If the for-each subpipeline declares multiple outputs, each output is the aggregation of the results produced on that output by each iteration.

For example, a for-each might accept a sequence of DocBook chapters as its input, process each chapter in turn with XSLT, and produce a sequence of formatted chapters as its output.

3.3 Viewport

A viewport component processes a single document, applying a subpipeline to one or more subsections of the document. The result of the viewport is a copy of the original document with the selected subsections replaced by the results of applying the subpipeline to them.

For example, a viewport might accept an XHTML document as its input, apply encryption to selected div elements within that document, and return an XHTML document that is the same as the original except that each selected div has been replaced by its encrypted result.

3.4 Choose

A choose component selects exactly one of a set of possible subpipelines based on the evaluation of XPath expressions. If no expressions evaluate to “true”, a default subpipeline is selected. After a subpipeline is selected, it is evaluated as if only it had been present. The result of the choose is the result of the selected subpipeline.

For example, a choose might test a schema and apply XML Schema validation to an input document if the schema is an XML Schema document, apply RELAX NG validation if the schema is a RELAX NG grammar, or perform no validation otherwise.

Each subpipeline is associated with a separate XPath expression that is evaluated in the context of a document. The context document can be different for different XPath expressions.

In order to ensure that the result of the choose is consistent irrespective of the subpipeline chosen, each subpipeline must declare the same number of outputs with the same names. It is a static error if two subpipelines in a choose declare different outputs.

It is a dynamic error if no subpipeline is selected by the choose.

3.5 Try/Catch

A try component isolates a subpipeline, preventing any errors that arise within it from being exposed to the rest of the pipeline. A try component begins with two subpipelines: an initial subpipeline and a catch subpipeline. It evaluates the initial subpipeline and, if no errors occur, the results of that pipeline are the results of the component. However, if any errors occur, it abandons the first subpipeline, discarding any output that it may have generated, and evaluates the catch subpipeline. In this case, the results of the catch subpipeline are the results of the try component.

For example, a pipeline might attempt to process a document by dispatching it to some web service. If the web service succeeds, then those results are passed to the rest of the pipeline. However, if the web service cannot be contacted or reports an error, the catch component can provide some sort of default for the rest of the pipeline.

In order to ensure that the result of the try is consistent irrespective of whether the try subpipeline provides its output or the catch subpipeline does, both the try and catch subpipelines must declare the same number of outputs with the same names. It is a static error if the two subpipelines declare different outputs.

In order to support corrective action in the catch subpipeline, components inside the catch have access to the (aggregation of the) error output of the components that were in the try subpipeline.

Note

In evaluating the try subpipeline, failure of one component may cause other components to fail. In addition, some components that fail may not produce output on their error ports and some components that succeed may produce such output. This pipeline language places no constraints on the order of error messages provided to the catch subpipeline, nor does it attempt to guarantee that such output will be available in all cases.

The error documents that appear should conform to Appendix C, The Error Vocabulary.

3.6 Other Components

A pipeline document may declare additional components. These may be implementation-defined components or may be defined through some implementation-dependent extension mechanism. Each declared component must have a name and a signature. It is a static error if a pipeline refers to a component that is not recognized by the processor.

4 Syntax

This section describes the syntactic elements necessary to instantiate a pipeline.

4.1 Overview

At the highest level, a pipeline is a collection of steps. Each step has a unique name and instantiates a particular component. The component has a signature and that signature has to be satisfied by the inputs and parameters specified on the step. Recall that a pipeline is not required to consume all the outputs of a component, but it must identify all the inputs and required parameters.

Components that contain subpipelines are represented naturally using XML hierarchy by placing steps inside other steps.

4.1.1 Associating Documents with Ports

A step can bind a document or a sequence of documents to a port in three ways: by source, by URI, or by providing it “here”, as the content of the element establishing the binding. A document must be specified in exactly one of these ways.

Specified by URI

[Definition: A document is specified by URI if the binding refers to it with a URI.] The href attribute is used for this purpose.

In this example, the input to the Identity step named “otherstep” comes from “http://example.com/input.xml”.

<p:step name="otherstep" component="p:identity">
  <p:input port="document" href="http://example.com/input.xml"/>
</p:step>

It is a dynamic error if the processor attempts to retrieve the specified URI and fails (for example, if the resource does not exist or is not accessible with the user's authentication credentials).

Specified by source

[Definition: A document is specified by source if the binding refers to a specific port on another step.] The source attribute is used for this purpose. The specified port must either be declared on some ancestor or it must be an output port of some other step. In some contexts there are additional constraints on the step that can be selected, for example, that it must be (or must not be) a descendant of the step on which the binding occurs.

[Definition: A source specification identifies a specific port with the name of the step on which that port occurs and the name of the port exposed by the component that that step instantiates.] Syntactically, this is achieved with a compound name of the form “stepname!portname”.

In this example, the input to the XInclude step named “expand” comes from the “result” port of the step named “otherstep”.

<p:step name="expand" component="p:xinclude">
  <p:input port="document" source="otherstep!result"/>
</p:step>

The output port of the XInclude component is named “result”, so if another step wishes to take the output of the “expand” step as one of its inputs, it can refer to it with the compound name “expand!result”.

As a special case, if the “stepname” is absent, the declared ports of any ancestor steps are considered. The corresponding port of the first such ancestor that declares a port with the specified name is selected.
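The following sketch shows two such references. The pipeline, step, and port names are hypothetical, and the p:for-each element used here is described in Section 4.2.8.

```xml
<p:pipeline xmlns:p="http://www.w3.org/2006/08/pipeline" name="outer">
  <p:declare-input port="document"/>
  <p:for-each name="loop">
    <!-- "!document" has no step name; it resolves to the port declared on the enclosing p:pipeline -->
    <p:declare-input port="chunk" source="!document" select="//section"/>
    <p:step name="copy" component="p:identity">
      <!-- "!chunk" resolves to the p:for-each, the only ancestor that declares a port named "chunk" -->
      <p:input port="document" source="!chunk"/>
    </p:step>
  </p:for-each>
</p:pipeline>
```

Inside the p:for-each, the same port could also be named explicitly as “loop!chunk”.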

[Definition: A source specification identifies a source if the specified port is either a declared input on some ancestor or an output of some other step.] [Definition: A source specification identifies a sink if the specified port is a declared output on some ancestor or an input of some other step.]

It is a static error if the specified port does not exist.

Specified by here document

[Definition: A document is specified by here document if it is contained in the body of the element that binds it.]

In this example, the stylesheet input to the XSLT step named “xform” comes from the content of the input element.

<p:step name="xform" component="p:xslt">
  <p:input port="document" source="expand!result"/>
  <p:input port="stylesheet">
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">
      ...
    </xsl:stylesheet>
  </p:input>
</p:step>

Here documents are considered “quoted”; they are not interpolated or available to the pipeline processor in any way except as a document flowing through the pipeline.

4.1.2 Scoping of Names

The scope of a port name is the component on which it is defined. The names of all input and output ports on any component must be unique.

The scope of step names is the component graph.

Editorial Note

Should we say that components have names and remove steps from the equation altogether?

4.1.3 Syntactic Shortcuts

In order for user-defined components (such as p:for-each and p:choose) to function as “black boxes” the way other components do, it is necessary to make sure that they are wholly self-contained:

<p:pipeline xmlns:p="http://www.w3.org/2006/08/pipeline">
<p:declare-input port="document"/>
<p:declare-parameter name="makeHTML" required="yes"/>

<!-- for the sake of convenience, we assume these steps take no
     inputs and produce a single output on a port named "result" -->
<p:step name="gen-fo" component="ex:generate-fo-stylesheet"/>
<p:step name="gen-html" component="ex:generate-html-stylesheet"/>

<p:choose name="choose-result">
  <p:declare-input port="document" source="!document"/>
  <p:declare-input port="fo-style" source="gen-fo!result"/>
  <p:declare-input port="html-style" source="gen-html!result"/>
  <p:declare-parameter name="makeHTML" select="$makeHTML"/>

  <p:when test="$makeHTML = '1'">
    <p:step name="makeHTML" component="p:xslt">
      <p:input port="document" source="choose-result!document"/>
      <p:input port="stylesheet" source="choose-result!html-style"/>
    </p:step>
    <p:step name="writeHTML" component="p:serialize">
      <p:input port="document" source="makeHTML!result"/>
    </p:step>
  </p:when>

  <p:otherwise>
    <p:step name="makeFO" component="p:xslt">
      <p:input port="document" source="choose-result!document"/>
      <p:input port="stylesheet" source="choose-result!fo-style"/>
    </p:step>
    <p:step name="writePDF" component="p:fo-to-pdf">
      <p:input port="document" source="makeFO!result"/>
    </p:step>
  </p:otherwise>
</p:choose>

</p:pipeline>

However, in practice, forcing authors to declare all of the inputs and parameters to each component is tedious and error-prone. What's more, the pipeline processor can actually determine what additional declarations are necessary simply by examining the source declarations on the steps contained within the step.

Therefore, we allow the syntactic shortcut of simply referring to in-scope sources and parameters directly.

<p:pipeline xmlns:p="http://www.w3.org/2006/08/pipeline">
<p:declare-input port="document"/>
<p:declare-parameter name="makeHTML" required="yes"/>

<!-- for the sake of convenience, we assume these steps take no
     inputs and produce a single output on a port named "result" -->
<p:step name="gen-fo" component="ex:generate-fo-stylesheet"/>
<p:step name="gen-html" component="ex:generate-html-stylesheet"/>

<p:choose name="choose-result">
  <p:when test="$makeHTML = '1'">
    <p:step name="makeHTML" component="p:xslt">
      <p:input port="document" source="!document"/>
      <p:input port="stylesheet" source="gen-html!result"/>
    </p:step>
    <p:step name="writeHTML" component="p:serialize">
      <p:input port="document" source="makeHTML!result"/>
    </p:step>
  </p:when>

  <p:otherwise>
    <p:step name="makeFO" component="p:xslt">
      <p:input port="document" source="!document"/>
      <p:input port="stylesheet" source="gen-fo!result"/>
    </p:step>
    <p:step name="writePDF" component="p:fo-to-pdf">
      <p:input port="document" source="makeFO!result"/>
    </p:step>
  </p:otherwise>
</p:choose>

</p:pipeline>

These shortcuts are purely syntactic. The processor is required to behave as if appropriate, unique declarations with the corresponding remapping of names had been present.

4.2 Pipeline Vocabulary

This section describes the XML vocabulary that instantiates a pipeline.

4.2.1 p:pipeline Element

A p:pipeline instantiates a pipeline. It declares the inputs, outputs, and parameters that the pipeline exposes and contains the subpipeline that constitutes its definition.

<p:pipeline name? = QName> (p:declare-input*, p:declare-output*, p:declare-parameter*, p:import*, p:declare-component*, subpipeline) </p:pipeline>

If specified, the name must be unique across all available pipelines. If a p:pipeline occurs as the child of a p:pipeline-library element, it must be named.

Example 1. A Sample Pipeline Document

<p:pipeline name="buildspec">
  <p:declare-input port="document"/>
  <p:declare-input port="stylesheet"/>
  <p:declare-output port="result"/>
  <p:declare-parameter name="validate"/>
  …
</p:pipeline>

4.2.2 p:declare-input Element

A p:declare-input declares an input port.

<p:declare-input port = QName select? = xpath expression sequence? = yes|no />

Editorial Note

The preceding syntax summary doesn't show any of the allowed binding attributes.

The port attribute defines the name of the port. It is a static error to define two ports with the same name.

The declaration may be accompanied by a binding (or default binding) for the input. This binding can be accomplished by source, by URI, or by here document. If a by source binding is used, the port selected must be an output port on a step which is not a descendant of the step on which the p:declare-input appears or it must be a port declared with p:declare-output on some ancestor of the step.

The select expression, if specified, applies the specified XPath select expression to the document(s) that are read. Each matching node or set of nodes is wrapped in a document and provided to the input port. See Section 4.2.5, “p:input Element”.

An input declaration can indicate whether a sequence of documents is allowed to appear on the declared port. If the sequence attribute is specified with the value “yes”, then a sequence is allowed. If the sequence attribute is not specified, or has the value “no”, then it is a dynamic error for a sequence of more than one document to appear on the declared port.
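A short sketch of the two cases follows; the pipeline and port names are hypothetical and the subpipeline is omitted.

```xml
<p:pipeline xmlns:p="http://www.w3.org/2006/08/pipeline" name="format-chapters">
  <!-- Any number of documents may arrive on this port -->
  <p:declare-input port="chapters" sequence="yes"/>
  <!-- At most one document is allowed; a longer sequence is a dynamic error -->
  <p:declare-input port="stylesheet" sequence="no"/>
  <!-- subpipeline omitted -->
</p:pipeline>
```

Omitting the sequence attribute on the “stylesheet” port would have the same effect as writing sequence="no".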

4.2.3 p:declare-output Element

A p:declare-output identifies an output port.

<p:declare-output port = QName sequence? = yes|no />

Editorial Note

The preceding syntax summary doesn't show any of the allowed binding attributes.

The port attribute defines the name of the port. It is a static error to declare two ports with the same name.

The declaration must be accompanied by a binding for the output. This binding can be accomplished by source, by URI, or by here document. If a by source binding is used, the port selected must be an output port on a step which is a descendant of the step on which the p:declare-output appears.

An output declaration can indicate whether a sequence of documents is allowed to appear on the declared port. If the sequence attribute is specified with the value “yes”, then a sequence is allowed. If the sequence attribute is not specified, or has the value “no”, then it is a dynamic error if the component produces a sequence of more than one document on the declared port.

4.2.4 p:step Element

A p:step instantiates a component in a pipeline.

<p:step component = QName name = QName> (p:input*, p:import-param*, p:param*) </p:step>

The component attribute identifies the component to be instantiated. It is a static error if the name is not unique in the current scope, if the specified component is not known to the processor, or if the specified inputs, outputs, and parameters do not match the component's signature.

4.2.5 p:input Element

A p:input identifies input for a component.

Inputs identify their source in exactly one of three ways: by source:

<p:input port = NCName source = source specification select? = xpath expression />

by URI:

<p:input port = NCName href = URI select? = xpath expression />

or by here document:

<p:input port = NCName select? = xpath expression> heredocument </p:input>

It is a static error if more than one of source, href, or here document is specified.

The port attribute identifies the input port of the component that will read from the specified source. It is a static error if the name given does not match the name of an input port for the component or if more than one input is specified for any given port.

<p:input port="document" href="http://example.org/input.html"/>

provides a single document, but

<p:input port="document" href="http://example.org/input.html" select="//html:div"/>

provides a sequence of zero or more documents, one for each matching html:div in http://example.org/input.html.

<p:input port="document" source="stepname!portname" select="//html:div"/>

provides a sequence of zero or more documents, one for each matching html:div in the document (or each of the documents) that is read from the portname port of the step named stepname.

4.2.6 p:param Element

A p:param associates a particular value with a parameter.

<p:param name = QName select? = xpath expression />

The specified XPath expression is evaluated and its string value becomes the value of the parameter.

The value of the parameter may also be specified as the content of the p:param element.

<p:param name = QName> any content </p:param>

The string value of the content becomes the value of the parameter.
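Both forms are sketched below on a single step. The step name, parameter names, values, and URIs are hypothetical.

```xml
<p:step name="xform" component="p:xslt">
  <p:input port="document" href="http://example.com/input.xml"/>
  <p:input port="stylesheet" href="http://example.com/style.xsl"/>
  <!-- Value computed from an XPath expression; its string value is "2" -->
  <p:param name="chunk-depth" select="1 + 1"/>
  <!-- Value given as element content; its string value is "build/html" -->
  <p:param name="output-dir">build/html</p:param>
</p:step>
```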

4.2.7 p:import-param Element

A p:import-param provides a set of in-scope parameters to a component.

<p:import-param name = tokens />

All in-scope parameters which match the name are made available to the component as if they had been specified with individual p:param elements.

The name attribute must be a single asterisk (*), a QName, or a string of the form *:NCName or NCName:*.
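The sketch below shows two of these forms. It reads NCName:* as a wildcard over a namespace prefix, by analogy with XPath name tests; that reading is an assumption, as are the “ex” prefix, the parameter name “validate”, the step name, and the URIs.

```xml
<p:step name="xform" component="p:xslt">
  <p:input port="document" href="http://example.com/input.xml"/>
  <p:input port="stylesheet" href="http://example.com/style.xsl"/>
  <!-- Pass along every in-scope parameter whose name uses the (hypothetical) "ex" prefix -->
  <p:import-param name="ex:*"/>
  <!-- Pass along the single in-scope parameter named "validate" -->
  <p:import-param name="validate"/>
</p:step>
```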

4.2.8 p:for-each Element

A p:for-each instantiates a for-each.

<p:for-each name = NCName> (p:declare-input, p:declare-output*, subpipeline) </p:for-each>

Exactly one input must be declared and it must include a binding for the port it declares. If outputs are declared, they must also include a binding. The processor will provide each document read through that binding to the subpipeline that the p:for-each contains, one at a time. For each iteration, the processor will collect any outputs that appear on the declared outputs and aggregate them together. The result of the p:for-each is that set of aggregated outputs.

Example 2, “A Sample For-Each” shows an example of a p:for-each in action.

Example 2. A Sample For-Each

<p:for-each name="chapters">
  <p:declare-input port="chap" href="http://example.org/docbook.xml" select="//chapter"/>
  <p:declare-output port="html" source="xform-to-html!result"/>
  <p:declare-output port="fo" source="xform-to-fo!result"/>
  <p:step name="xform-to-fo" component="p:xslt">
    <p:input port="document" source="chapters!chap"/>
    <p:input port="stylesheet" href="fo/docbook.xsl"/>
  </p:step>
  <p:step name="xform-to-html" component="p:xslt">
    <p:input port="document" source="chapters!chap"/>
    <p:input port="stylesheet" href="html/docbook.xsl"/>
  </p:step>
</p:for-each>

The chapter elements of the DocBook document are selected with the expression //chapter. Each chapter is transformed into HTML and XSL FO using an XSLT step. The resulting HTML and FO documents are aggregated together and appear on the html and fo ports, respectively, of the chapters step.

It is a static error if there is not exactly one p:declare-input child of p:for-each, if the declared input does not specify a binding, or if it specifies a binding to a step inside the p:for-each.

It is a static error if any declared output does not specify a binding.

4.2.9 p:viewport Element

A p:viewport instantiates a viewport.

<p:viewport name = NCName> (p:declare-input, p:declare-output, subpipeline) </p:viewport>

Exactly one input must be declared and it must include both a binding and a select expression. Exactly one output must be declared and it must include a binding. The processor will provide a document that contains each set of nodes that matches the specified select expression through the input binding to the subpipeline that the p:viewport contains, one at a time. What appears on the output from the p:viewport will be a copy of the input document except that where each matching node or set of nodes appears, the result of applying the subpipeline to those nodes will be output.

It is a dynamic error if the input source is a sequence of more than one document or if the output from any iteration is a sequence of more than one document.

Example 3, “A Sample Viewport” shows an example of a p:viewport in action.

Example 3. A Sample Viewport

<p:viewport name="encdivs">
  <p:declare-input port="div" source="step!port" select="//h:div[@class='enc']"/>
  <p:declare-output port="html" source="encrypt!result"/>
  <p:step name="encrypt" component="p:encrypt-document">
    <p:input port="document" source="encdivs!div"/>
  </p:step>
</p:viewport>

The div elements of the document that match //h:div[@class='enc'] are selected. Each selected div is encrypted and the resulting encrypted version replaces the original div. The result of the step is a copy of the input document with each selected div encrypted.

It is a static error if there is not exactly one p:declare-input child and exactly one p:declare-output child of p:viewport, if the declared ports do not specify a binding, or if the input port specifies a binding to a step inside the p:viewport.

4.2.10 p:choose/p:when/p:otherwise Elements

A p:choose instantiates a choose.

<p:choose name = NCName source? = source specification> (p:when*, p:otherwise?) </p:choose>

Where p:when specifies a conditional branch.

<p:when test = expression source? = source specification> (p:declare-output*, subpipeline) </p:when>

And p:otherwise specifies the default branch.

<p:otherwise> (p:declare-output*, subpipeline) </p:otherwise>

The test expression is evaluated for each p:when in turn. The first p:when for which the expression evaluates to “true” is selected; all other p:when elements (and the p:otherwise) are ignored. If no p:when has a test expression which evaluates to “true”, then the p:otherwise is selected and all p:when elements are ignored.

The context for each test expression is the document specified by the source attribute on the corresponding p:when. If no source attribute is specified on the p:when, then the context is the document specified by the source attribute on the p:choose. It is a static error if no source is specified in either place.

All of the p:when branches and the p:otherwise must declare the same number of output ports with the same names. It is a static error if they do not.

The result of the p:choose is the result of the selected branch. It is a dynamic error if no p:when is selected and no p:otherwise is specified.
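
As a non-normative sketch of these rules, the fragment below chooses between two branches based on the root element of a context document. The step name "parse", the "ex:" component names, the "db:" prefix, and all port names are hypothetical, and namespace declarations are omitted as in the other examples in this document.

```xml
<!-- Hypothetical sketch: step, port, and component names are illustrative only. -->
<p:choose name="render" source="parse!result">
  <!-- No source on this p:when, so the test is evaluated against the
       document named by the source attribute on the p:choose. -->
  <p:when test="/h:html">
    <p:declare-output port="result" source="tidy!result"/>
    <p:step name="tidy" component="ex:tidy-html">
      <p:input name="document" source="parse!result"/>
    </p:step>
  </p:when>
  <p:when test="/db:book">
    <p:declare-output port="result" source="format!result"/>
    <p:step name="format" component="ex:format-book">
      <p:input name="document" source="parse!result"/>
    </p:step>
  </p:when>
  <!-- Selected only if neither test evaluates to true. -->
  <p:otherwise>
    <p:declare-output port="result" source="copy!result"/>
    <p:step name="copy" component="ex:identity">
      <p:input name="document" source="parse!result"/>
    </p:step>
  </p:otherwise>
</p:choose>
```

Every branch declares a single output port named "result", which satisfies the requirement that all branches declare the same number of output ports with the same names. Because a p:otherwise is present, the dynamic error for an unselected p:choose cannot arise here.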

4.2.11 p:group Element

A p:group is a wrapper for a subpipeline.

<p:group name = NCName> (p:declare-output*, subpipeline) </p:group>

The result of a p:group is its declared outputs.

4.2.12 p:try/p:catch Elements

A p:try instantiates a try/catch.

<p:try name = NCName> (p:group, p:catch) </p:try>

Where p:group is any pipeline group and p:catch surrounds the error recovery behavior:

<p:catch> (p:declare-output*, subpipeline) </p:catch>

The p:try component evaluates the p:group. If the group evaluates without signaling an error, that is the result of the p:try component.

However, if any component in that group signals an error, then the group is abandoned (any accumulated output is discarded) and the p:catch subpipeline is evaluated. In that case, the result of the p:catch is the result of the p:try component.

If any component in the p:catch subpipeline signals an error, the p:try fails.

Within the p:catch block, the special input port !#error is defined. The document(s) on that port constitute the error messages received from the component which failed. Note that the order of the messages on that port is undefined. Note also that the failure of one component may cause others to fail and the component which signaled the error may not be the only or even the first component that failed.

Both the p:group and the p:catch must declare the same number of output ports with the same names. It is a static error if they do not.
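
A non-normative sketch of a p:try follows. It assumes that the special error port is referenced with the source value "!#error", which this section does not state explicitly. The step name "load", the "ex:" component names, and all port names are hypothetical, and namespace declarations are omitted as in the other examples in this document.

```xml
<!-- Hypothetical sketch: step, port, and component names are illustrative only. -->
<p:try name="safe-validate">
  <!-- Evaluated first; if no component signals an error,
       its outputs are the result of the p:try. -->
  <p:group name="attempt">
    <p:declare-output port="result" source="validate!result"/>
    <p:step name="validate" component="ex:validate">
      <p:input name="document" source="load!result"/>
    </p:step>
  </p:group>
  <!-- Evaluated only if a component in the group signals an error;
       any output accumulated by the group is discarded. -->
  <p:catch>
    <p:declare-output port="result" source="report!result"/>
    <p:step name="report" component="ex:format-errors">
      <!-- Assumption: the error port is addressed as "!#error". -->
      <p:input name="errors" source="!#error"/>
    </p:step>
  </p:catch>
</p:try>
```

The p:group and the p:catch each declare one output port named "result", so whichever one supplies the result, the p:try exposes the same ports to the rest of the pipeline.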

4.2.13 p:declare-component Element

A p:declare-component provides the name and signature of an implementation-dependent component. It declares the inputs, outputs, and parameters of the component.

<p:declare-component name = QName> (p:declare-input*, p:declare-output*, p:declare-parameter*) </p:declare-component>

Editorial Note

We need to make some provision for identifying an external binding, even if it's implementation defined. We'll need some sort of mechanism for declaring multiple implementations too.

It is a static error if a pipeline step refers to a component that is not recognized by the processor. It is not an error to declare such a component, only to use it.

The input and parameter declarations of a p:declare-component may use the name “*” to indicate that the component accepts an arbitrary number of inputs, outputs, or parameters.
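
A declaration that uses the “*” name might look like the following non-normative sketch. The component name "ex:merge-documents" and the port name "result" are hypothetical, and the name attribute on p:declare-parameter is an assumption because this section does not define that element's attributes.

```xml
<!-- Hypothetical sketch: the component and its signature are illustrative only. -->
<p:declare-component name="ex:merge-documents">
  <!-- "*" indicates that the component accepts an arbitrary number of inputs. -->
  <p:declare-input port="*"/>
  <p:declare-output port="result"/>
  <!-- Assumption: parameters are named with a name attribute. -->
  <p:declare-parameter name="*"/>
</p:declare-component>
```

Declaring this component is not an error even if the processor does not recognize it; the static error arises only when a pipeline step refers to it.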

4.2.14 p:pipeline-library Element

A p:pipeline-library contains one or more component declarations and/or pipelines. It declares components that pipelines can import.

<p:pipeline-library> (p:import*, p:declare-component*, p:pipeline*) </p:pipeline-library>

Example 4. A Sample Pipeline Library

<p:pipeline-library>
  <p:declare-component name="extension-component">…</p:declare-component>
  <p:pipeline name="xinclude-and-validate">…</p:pipeline>
  <p:pipeline name="validate-and-transform">…</p:pipeline>
  …
</p:pipeline-library>

4.2.15 p:import Element

A p:import loads a pipeline or pipeline library, making it available in the current context.

<p:import href = URI />

An import statement loads the specified URI and makes any pipelines declared within it available to the current pipeline.

It is a dynamic error if the URI cannot be retrieved or if, once retrieved, it does not point to a p:pipeline-library or p:pipeline. If it points to a p:pipeline, it is a dynamic error if the pipeline does not have a name.
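
The following non-normative sketch imports a library and uses one of its pipelines as a component. It assumes that an imported pipeline is referenced by its name in the component attribute of a p:step and that p:import may appear as the first child of p:pipeline; neither is stated in this section. The URI "library.xml", the pipeline name "publish", and the port names are hypothetical, and the pipeline name "xinclude-and-validate" is taken from Example 4.

```xml
<!-- Hypothetical sketch: the URI, names, and ports are illustrative only. -->
<p:pipeline name="publish">
  <!-- Loads the library and makes the pipelines declared in it available here. -->
  <p:import href="library.xml"/>
  <p:declare-input port="source"/>
  <p:declare-output port="result" source="check!result"/>
  <!-- Assumption: the imported pipeline is used by name, like any other component. -->
  <p:step name="check" component="xinclude-and-validate">
    <p:input name="document" source="publish!source"/>
  </p:step>
</p:pipeline>
```

If "library.xml" cannot be retrieved, or does not contain a p:pipeline-library or a named p:pipeline, the import fails with a dynamic error.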

5 Errors

Errors in a pipeline can be divided into two classes: static errors and dynamic errors.

5.1 Static Errors

[Definition: A static error is one which can be detected before pipeline evaluation is even attempted.] Examples of static errors include cycles, incorrect specification of inputs and outputs, and reference to unknown components.

Static errors are fatal and must be detected before any components are evaluated.

5.2 Dynamic Errors

[Definition: A dynamic error is one which occurs while a pipeline is being evaluated.] Examples of dynamic errors include references to URIs that cannot be resolved, components which fail, and pipelines that exhaust the capacity of an implementation (such as memory or disk space).

If a component fails due to a dynamic error, failure propagates upwards until either a try is encountered or the entire pipeline fails. In other words, outside of a try, component failure causes the entire pipeline to fail.

A References

[XML Core Req] XML Processing Model Requirements. Dmitry Lenkov, Norman Walsh, editors. W3C Working Group Note 05 April 2004.

[Infoset] XML Information Set (Second Edition). John Cowan, Richard Tobin, editors. W3C Recommendation 04 February 2004.

[XML 1.0] Extensible Markup Language (XML) 1.0 (Fourth Edition). Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, et al., editors. W3C Recommendation 16 August 2006.

[XML 1.1] Extensible Markup Language (XML) 1.1 (Second Edition). Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, et al., editors. W3C Recommendation 16 August 2006.

B Glossary

pipeline

A pipeline is an acyclic, directed graph of components connected together by inputs and outputs.

Note: defined but never referenced.

subpipeline

A subpipeline is any collection of connected components.

component

A component is a unit of XML processing, such as XInclude or transformation.

signature

The signature of a component is the set of inputs, outputs, and parameters that it is declared to accept.

matches

The instantiation of a component matches its signature if and only if it specifies an input for each declared input and it specifies no inputs that are not declared, it specifies no outputs that are not declared, it specifies a parameter for each parameter that is declared to be required, and it specifies no parameters that are not declared.

parameter

A parameter is a QName/value pair.

component graph

The components of a pipeline are the nodes of a component graph. The inputs and outputs of the components are the arcs of that graph.

connected

Components A and B are connected if any output from one is connected to any input of the other, either directly or indirectly.

before

Component A is before component B if component B is a subpipeline of component A, either directly or indirectly, or if any output from component A is connected to any input of component B, either directly or indirectly.

after

after is the converse of before.

Note: defined but never referenced.

by URI

A document is specified by URI if the element that binds it refers to it with a URI.

by source

A document is specified by source if the element that binds it refers to a specific port on another step.

source specification

A source specification identifies a specific port with the name of the step on which that port occurs and the name of the port exposed by the component that that step instantiates.

Note: defined but never referenced.

source

A source specification identifies a source if the specified port is either a declared input on some ancestor or an output of some other step.

Note: defined but never referenced.

sink

A source specification identifies a sink if the specified port is a declared output on some ancestor or an input of some other step.

Note: defined but never referenced.

by here document

A document is specified by here document if it is contained in the body of the element that binds it.

static error

A static error is one which can be detected before pipeline evaluation is even attempted.

dynamic error

A dynamic error is one which occurs while a pipeline is being evaluated.

C The Error Vocabulary

This appendix describes the XML vocabulary that components are expected to use to identify messages on their error ports.

T.B.D.

D Standard Component Library

This appendix describes the standard components that must be supported by any conforming processor.

T.B.D.
