This document is also available in these non-normative formats: XML
This is the requirements document for XProc V2.0.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
User and implementor experience with [XProc] has exposed a number of ways in which the XProc language could be improved. The Working Group's focus for V2.0 is on usability improvements.
The requirements in this document are divided into two groups, a set of “must” requirements and a set of “should” requirements. The Working Group feels that the requirements in the “must” category are absolutely essential. The requirements listed in the “should” category are viewed as either less critical or more speculative in nature.
The following requirements are considered “must” requirements for XProc V2.0.
Experience with parameters in XProc 1.0 reveals that they are too complicated. They often cause user confusion and introduce syntactic complexity not justified by their function. XProc v2.0 must dramatically simplify parameters, perhaps simply removing parameter ports altogether without replacing them with a new mechanism of equivalent power (and complexity).
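As a rough illustration of the syntactic overhead, the following minimal XProc 1.0 sketch passes a single parameter to p:xslt. The stylesheet name style.xsl and the parameter name mode are hypothetical, chosen only for this example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <!-- A separate kind of input port exists only to carry parameters -->
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result"/>

  <p:xslt>
    <p:input port="stylesheet">
      <!-- style.xsl is a hypothetical stylesheet -->
      <p:document href="style.xsl"/>
    </p:input>
    <!-- Adds one parameter to those arriving on the parameter port -->
    <p:with-param name="mode" select="'draft'"/>
  </p:xslt>
</p:declare-step>
```

Even in this small case the author has to know about parameter ports, their implicit connection to the step, and p:with-param, in addition to ordinary options and variables.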
Experience has shown that real-world pipelines often involve non-XML documents. Several workarounds have been invented for special cases. The limitation that V1.0 can only pass XML between steps makes some pipelines difficult, if not impossible, to write.
Providing the ability to allow non-XML documents to flow between steps opens up the possibility of writing simple pipelines to work with images, JSON, Turtle, EPUB, etc.
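For comparison, XProc 1.0 can bring a non-XML resource into a pipeline only by wrapping it in XML. A minimal sketch using p:data follows; the file name logo.png is hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <p:identity>
    <p:input port="source">
      <!-- The image is not passed as-is: it arrives wrapped in a
           c:data element, with binary content base64-encoded -->
      <p:data href="logo.png" content-type="image/png"/>
    </p:input>
  </p:identity>
</p:declare-step>
```

Every later step sees an XML wrapper rather than the image itself, which is the limitation this requirement aims to remove.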
Alignment with [XQuery 3.0]/[XSLT 3.0] will keep features of XProc consistent with modern XML technologies: error handling, serialization options, [XDM] features, etc. In addition, support for XPath 1.0 no longer seems relevant; it adds complexity to the specification and is unlikely to be implemented today. XPath 1.0 support will be removed from XProc.
There are many pipelines for which the flow analysis does not provide a convenient or predictable ordering of steps. Because some steps have side effects not manifest in the pipeline, it may be necessary to ensure a particular order. This facility is not supported by XProc 1.0, but is available in implementation-defined extensions. XProc 2.0 will standardize this facility.
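The sketch below shows the kind of pipeline where flow analysis alone does not fix the order: one step writes a file and another reads it, with no connection between them. It assumes the cx:depends-on extension attribute of one XProc 1.0 implementation (XML Calabash); other processors may offer a different mechanism or none, and the file path is hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Side effect: writes the source document to disk -->
  <p:store name="save" href="out/intermediate.xml"/>

  <!-- No port connects this step to "save", so the dependency is
       stated explicitly with an implementation-defined attribute
       (assumed here; not part of XProc 1.0 itself) -->
  <p:load cx:depends-on="save" href="out/intermediate.xml"/>
</p:declare-step>
```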
XProc 1.0 restricts the values of variables, options, and parameters to be only strings. This has proven to be an inconvenient limitation. XProc 2.0 will allow variables, options, and parameters to have any [XDM] value insofar as possible. XProc 2.0 will also allow the required types of variables, options, and parameters to be specified.
The syntactic sugar that allows step options to be expressed concisely as attribute values on a step is foiled whenever the value of the option must be computed by the pipeline. Allowing those options to contain XSLT-style attribute value templates (AVTs) would simplify many pipelines. Additionally, allowing AVTs in other places, such as the href attribute on p:document, will be considered.
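A minimal sketch of the problem in XProc 1.0: as soon as the href option of p:store depends on a value computed by the pipeline, the attribute shortcut is unavailable and p:with-option is required. The option name basename and the output path are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:option name="basename" required="true"/>

  <p:store>
    <!-- The computed value forces the long form -->
    <p:with-option name="href"
                   select="concat('out/', $basename, '.xml')"/>
  </p:store>
</p:declare-step>
```

With attribute value templates, the same step might be written as follows. This is an assumed rendering of the proposal, not settled syntax.

```xml
<p:store xmlns:p="http://www.w3.org/ns/xproc"
         href="out/{$basename}.xml"/>
```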
[XSLT 3.0] introduces a feature which allows expressions in curly braces to be evaluated in element content. This feature is similar to the facility provided by the p:template step. Extending XProc to support curly braces in a manner consistent with [XSLT 3.0] will be considered.
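For reference, a sketch of how p:template is typically used today. It assumes a processor that provides the step (it is not part of the XProc 1.0 required step library) and a source document with a hypothetical doc/title/section structure.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:template>
    <p:input port="template">
      <p:inline>
        <!-- Expressions in curly braces are evaluated against the
             document on the source port -->
        <summary title="{/doc/title}">Sections: {count(//section)}</summary>
      </p:inline>
    </p:input>
    <!-- No parameters are supplied in this sketch -->
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:template>
</p:declare-step>
```

Supporting curly braces directly in inline content would make the surrounding p:template step unnecessary in cases like this one.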
XProc 1.0 offers relatively few default behaviors, requiring instead that pipelines specify every construct fully. User experience has demonstrated that this leads to very verbose pipelines and has been a constant source of complaint. XProc 2.0 will introduce a variety of syntactic simplifications as an aid to readability and usability, including but not limited to:
<p:pipe step="name"/> binds to the primary output port of the step named “name”.
<p:pipe port="secondary"/> binds to the “secondary” port of the step on which the default readable port occurs.
<p:input port="portname" href="..."/> is a shortcut for a nested p:document.
<p:input port="portname"/> is a shortcut for a nested p:empty.
Allow p:inline to be optional.
Allow curly brace expansion in p:inline (with an attribute to control whether or not that behavior is enabled)
Provide a select attribute on p:for-each/p:viewport
Change all steps with a single non-primary output to have a single primary output
Consider harmonizing p:viewport-source and p:iteration-source
Add an AVT value attribute to options, parameters, and variables (to be used instead of select)
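To make the first few simplifications concrete, here is a small XProc 1.0 pipeline written out in full. The stylesheet name style.xsl and the wrapper element name pair are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:identity name="original"/>

  <p:xslt name="transform">
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- Wraps the original and the transformed document together -->
  <p:wrap-sequence wrapper="pair">
    <p:input port="source">
      <p:pipe step="original" port="result"/>
      <p:pipe step="transform" port="result"/>
    </p:input>
  </p:wrap-sequence>
</p:declare-step>
```

Under the shortcuts listed above, the same pipeline might shrink to something like the following. This is only a sketch of the proposed forms; the version value and the final syntax are assumptions.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="2.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:identity name="original"/>

  <p:xslt name="transform">
    <!-- href shortcut for a nested p:document -->
    <p:input port="stylesheet" href="style.xsl"/>
    <!-- empty p:input as a shortcut for a nested p:empty -->
    <p:input port="parameters"/>
  </p:xslt>

  <p:wrap-sequence wrapper="pair">
    <p:input port="source">
      <!-- port omitted: binds to the primary output of each step -->
      <p:pipe step="original"/>
      <p:pipe step="transform"/>
    </p:input>
  </p:wrap-sequence>
</p:declare-step>
```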
Implementation experience has demonstrated that there are areas of the specification that didn't get the balance right between precision for implementors and clarity for users, for example “non-step wrappers”. The XProc 2.0 specification should attempt to resolve these problems without introducing inordinate complexity.
The 1.0 specification also defines the p:pipeline element as a syntactic shortcut for a particular form of p:declare-step. While convenient in some circumstances, it has proven to be a source of some confusion especially among new users. XProc 2.0 may remove the p:pipeline element.
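The shortcut in question can be illustrated with a trivial pipeline. In XProc 1.0 the following p:pipeline:

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:identity/>
</p:pipeline>
```

stands for this p:declare-step, in which the three ports are declared explicitly:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source" primary="true"/>
  <p:input port="parameters" kind="parameter" primary="true"/>
  <p:output port="result" primary="true"/>
  <p:identity/>
</p:declare-step>
```

The implicit ports are a likely source of the confusion mentioned above: a pipeline that needs no source document or no parameters still declares them when written with p:pipeline.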
Adding metadata to documents is a natural thing for pipelines to do, either for subsequent use by the pipeline or for eventual output. For example, the serialization options provided in an XSLT stylesheet could be carried forward to the eventual serialization of the result document by the pipeline. In XProc 1.0, there's no way to maintain that association. XProc 2.0 should support the ability to associate processor and user-defined metadata with documents.
While most steps have a predetermined and static number of inputs and outputs, this is not universally the case. In XProc 1.0, a putative p:eval step which could run a dynamically constructed pipeline, for example, suffers from the limitation that the signature of the p:eval step usually differs from the signature of the evaluated pipeline.
XProc 2.0 should provide a facility for supporting steps with a variable number of inputs and outputs.
XProc 1.0 provides scant support for reporting the status of a pipeline and for helping users who are attempting to debug pipelines. Implementation-defined extensions have demonstrated that some additional facilities, such as a p:message step, would be an aid to users.
XProc 2.0 will add some mechanism for reporting status messages and will consider adding additional steps and/or language features to aid in analysing the behavior of a running pipeline.
Experience with user-defined functions in XQuery and XSLT reveals that they can be a powerful addition to the language. Providing some feature that allowed users to extend the vocabulary of functions available in, for example, the test expressions on p:when elements would greatly simplify some pipelines.
Such a mechanism might take the form of the ability to load extension functions defined in, for example, XQuery, or it might include adding the ability to define functions in XProc.
Support for catching errors in XProc 1.0 is limited to a simple p:try/p:catch pair, which catches and handles all errors uniformly. To align XProc with modern languages, the try/catch mechanism will be extended to catch specific errors, possibly with the addition of a “finally” construct.
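A minimal sketch of the XProc 1.0 mechanism follows; the file name maybe-missing.xml is hypothetical. Whatever goes wrong inside the p:group, the single p:catch handles it.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <p:try>
    <p:group>
      <p:load href="maybe-missing.xml"/>
    </p:group>
    <p:catch name="recover">
      <!-- Every error ends up here; the error port carries the
           description of what failed -->
      <p:identity>
        <p:input port="source">
          <p:pipe step="recover" port="error"/>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```

One possible shape for the extended mechanism is sketched below. The code attribute, the p:finally element, and the ex: error name are all assumptions made for illustration; this document does not fix any syntax.

```xml
<p:try xmlns:p="http://www.w3.org/ns/xproc"
       xmlns:ex="http://example.com/ns/errors">
  <p:group>
    <p:load href="maybe-missing.xml"/>
  </p:group>
  <!-- Hypothetical: handles only the named error -->
  <p:catch code="ex:missing-input">
    <p:identity>
      <p:input port="source">
        <p:inline><fallback/></p:inline>
      </p:input>
    </p:identity>
  </p:catch>
  <!-- Hypothetical: handles any other error -->
  <p:catch>
    <p:identity>
      <p:input port="source">
        <p:inline><failed/></p:inline>
      </p:input>
    </p:identity>
  </p:catch>
  <!-- Hypothetical: cleanup steps that run whether or not an error occurred -->
  <p:finally/>
</p:try>
```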
In addition to supporting [XDM] values in variables, options, and parameters, XProc 2.0 might allow [XDM] values in more places, such as allowing p:for-each to iterate over a sequence of strings or integers.
XProc 1.0 is a specification that consists of both the language definition and the inventory of required and optional steps. Release management might be simplified by separating the language core from the vocabulary of steps and providing some sort of versioning strategy that allowed the vocabulary of steps to be revised more frequently. XProc 2.0 may be defined in more than one Rec-track specification document.
The vocabulary of steps available in XProc is extensible. Users and implementors have developed additional steps, for example to support pipelines that produce EPUB documents or manipulate files on disk. It is worth considering which, if any, new steps should be elevated to the XProc namespace. The candidates include, but are not limited to:
p:zip and p:unzip
p:template and p:in-scope-names
Semantic web steps (p:sparql, p:rdfa, ...)?
Operating system steps (p:env, ...)?
File system steps (p:mkdir, p:copy, ...)?
The use of an optional, single p:output binding in p:viewport creates confusion for users. The binding is used both to connect the inner workings of the viewport and as the name of the output port as seen from the outside.
In addition, the fact that viewport can produce only a single result means that for some tasks, multiple passes are required, using a combination of p:viewport and p:for-each. Consider the task of changing image references in an XHTML document from .svg to .png and generating the sequence of .png images. In XProc 1.0, this requires a p:viewport and a p:for-each.
Adding an explicit p:viewport-result allows us to remove the confusion between the input and the name of the output. Allowing multiple outputs allows us to collapse the p:viewport and p:for-each logic into a single step.
<p:viewport
  name? = NCName
  match = XSLTMatchPattern>
    ((p:viewport-source? &
      p:viewport-result? &
      p:output* &
      p:log?),
     subpipeline)
</p:viewport>
The p:viewport-result connects the transformation inside the viewport back into the source document over which the viewport is operating. The transformed document always appears on a port named 'result'. Any other outputs are simply sequences, analogous to the outputs of p:for-each. It is a static error to name one of those outputs 'result'.
The base URI of a document created by the p:inline element is the base URI of the p:inline element. Specifying an xml:base attribute on the root element of the document does not help, as that only applies to that element and its descendants.
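The image-reference task described above might then be sketched as follows. This is an assumed rendering of the proposed syntax: the version value, the content of p:viewport-result (taken here to hold a binding, like p:output), and the port name images are not fixed by this document. The steps inside the viewport are ordinary XProc 1.0 steps.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:h="http://www.w3.org/1999/xhtml"
                version="2.0">
  <p:input port="source"/>
  <p:output port="result" primary="true"/>
  <p:output port="images" sequence="true">
    <p:pipe step="each-image" port="images"/>
  </p:output>

  <p:viewport name="each-image" match="h:img[contains(@src, '.svg')]">
    <!-- Proposed: names what is stitched back into the source document -->
    <p:viewport-result>
      <p:pipe step="rewrite" port="result"/>
    </p:viewport-result>
    <!-- Proposed: an additional output, collected as a sequence -->
    <p:output port="images" sequence="true">
      <p:pipe step="original" port="result"/>
    </p:output>

    <!-- Keeps each matched img element as it was before rewriting -->
    <p:identity name="original"/>

    <!-- Changes the reference from .svg to .png -->
    <p:string-replace name="rewrite" match="@src"
                      replace="concat(substring-before(., '.svg'), '.png')"/>
  </p:viewport>
</p:declare-step>
```

A later step could read the images port to generate the .png files, so the rewriting and the collection of image references happen in a single pass over the document.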
Additionally, in some pipelines, it is desirable to be able to change the base URI of documents produced by other steps. No convenient mechanism exists in XProc V1.0 to satisfy these requirements.
[RFC 2119] Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. Network Working Group, IETF, Mar 1997.
[XProc] XProc: An XML Pipeline Language. Norman Walsh, Alex Milowski, and Henry S. Thompson, editors. W3C Recommendation 11 May 2010.
[XDM] XQuery and XPath Data Model (XDM) 3.0, Norman Walsh, John Snelson, Editors. World Wide Web Consortium, 08 January 2013.
[XQuery 3.0] XQuery 3.0: An XML Query Language, Jonathan Robie, Don Chamberlin, Michael Dyck, John Snelson, Editors. World Wide Web Consortium, 08 January 2013.
[XSLT 3.0] XSL Transformations (XSLT) Version 3.0, Michael Kay, Editor. World Wide Web Consortium, 10 July 2012.