W3C

XProc V2.0 Requirements

W3C First Public Working Draft 5 November 2013

This Version:
http://www.w3.org/TR/2013/WD-xproc-v2-req-20131105/
Latest Version:
http://www.w3.org/TR/xproc-v2-req/
Editors:
Alex Milowski, Invited expert
James Fuller, Invited expert
Norman Walsh, MarkLogic Corporation

This document is also available in these non-normative formats: XML


Abstract

This is the requirements document for XProc V2.0.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a First Public Working Draft produced by the XML Processing Model Working Group which is part of the XML Activity.

Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Please send comments about this document to public-xml-processing-model-comments@w3.org (public archives are available).

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


1 Introduction

User and implementor experience with [XProc] has exposed a number of ways in which the XProc language could be improved. The Working Group's focus for V2.0 is on usability improvements.

The requirements in this document are divided into two groups, a set of “must” requirements and a set of “should” requirements. The Working Group feels that the requirements in the “must” category are absolutely essential. The requirements listed in the “should” category are viewed as either less critical or more speculative in nature.

2 MUST Requirements

The following requirements are considered “must” requirements for XProc V2.0.

2.1 Simplify parameters

Experience with parameters in XProc 1.0 reveals that they are too complicated. They often cause user confusion and introduce syntactic complexity not justified by their function. XProc V2.0 must dramatically simplify parameters, perhaps simply removing parameter ports altogether without replacing them with a new mechanism of equivalent power (and complexity).
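
As a non-normative illustration of the complexity in question, the following sketch shows one way an XProc 1.0 pipeline might pass a single parameter to p:xslt. The stylesheet name style.xsl and the parameter name title are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <!-- A parameter input port on the pipeline itself; the parameter
       port of p:xslt binds to it by default. -->
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result"/>

  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <!-- One stylesheet parameter, set explicitly. -->
    <p:with-param name="title" select="'Draft'"/>
  </p:xslt>
</p:declare-step>
```

A dedicated kind of port, its default binding rules, and a separate p:with-param element are all involved in setting one value.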

2.2 Integrate non-XML documents into pipelines

Experience has shown that real-world pipelines often involve non-XML documents. Several workarounds have been invented for special cases. The limitation that V1.0 can only pass XML between steps makes some pipelines difficult, if not impossible, to write.

Providing the ability to allow non-XML documents to flow between steps opens up the possibility of writing simple pipelines to work with images, JSON, Turtle, EPUB, etc.
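
One V1.0 workaround, shown here as a non-normative sketch, is p:data, which wraps non-XML content in a c:data element (base64-encoded when the content is not text) so that it can flow between steps as XML. The file name logo.png is hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <p:identity>
    <p:input port="source">
      <!-- The image is not passed as an image; it arrives as an XML
           document whose root is a c:data wrapper element. -->
      <p:data href="logo.png" content-type="image/png"/>
    </p:input>
  </p:identity>
</p:declare-step>
```

Any step that needs the original bytes must decode the wrapper again, which is the kind of special-case handling this requirement aims to remove.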

2.3 Align with XPath 3.0 technologies

Alignment with [XQuery 3.0]/[XSLT 3.0] will keep features of XProc consistent with modern XML technologies: error handling, serialization options, [XDM] features, etc. In addition, support for XPath 1.0 no longer seems relevant; it adds complexity to the specification and is unlikely to be implemented today. XPath 1.0 support will be removed from XProc.

2.4 Add explicit flow handling

There are many pipelines for which the flow analysis does not provide a convenient or predictable ordering of steps. Because some steps have side effects not manifest in the pipeline, it may be necessary to ensure a particular order. This facility is not supported by XProc 1.0, but is available in implementation-defined extensions. XProc 2.0 will standardize this facility.
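
The following non-normative sketch shows the kind of ordering problem involved. It assumes an extension attribute along the lines of the cx:depends-on attribute offered by XML Calabash; the attribute, its namespace URI, and the file path are assumptions and are not part of XProc 1.0.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Side effect: writes a file that no port connection reveals. -->
  <p:store name="save" href="out/saved.xml"/>

  <!-- No port connects p:load to p:store, so flow analysis alone
       does not order them. The extension attribute (an assumption)
       states the dependency explicitly. -->
  <p:load href="out/saved.xml" cx:depends-on="save"/>
</p:declare-step>
```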

2.5 Allow arbitrary XDM values in variables

XProc 1.0 restricts the values of variables, options, and parameters to strings. This has proven to be an inconvenient limitation. XProc 2.0 will allow variables, options, and parameters to have any [XDM] value insofar as possible. XProc 2.0 will also allow the required types of variables, options, and parameters to be specified.
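
A non-normative sketch of the limitation in XProc 1.0: the variable below is computed from a number, but its value is held as a string, so the pipeline converts it back before comparing. The element name item and the attribute name size are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- count() returns a number, but the variable stores its
       string value. -->
  <p:variable name="count" select="count(//item)"/>

  <p:choose>
    <!-- Explicit conversion back to a number before comparing. -->
    <p:when test="number($count) &gt; 10">
      <p:add-attribute match="/*" attribute-name="size"
                       attribute-value="large"/>
    </p:when>
    <p:otherwise>
      <p:add-attribute match="/*" attribute-name="size"
                       attribute-value="small"/>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```

With typed [XDM] values, the variable could presumably hold the integer itself and the conversion would be unnecessary.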

2.6 Allow attribute value templates

The syntactic sugar that allows step options to be expressed concisely as attribute values on a step is foiled whenever the value of the option must be computed by the pipeline. Allowing those options to contain XSLT-style attribute value templates (AVTs) would simplify many pipelines. Additionally, allowing AVTs in other places, such as the href attribute on p:document, will be considered.

[XSLT 3.0] introduces a feature which allows expressions in curly braces to be evaluated in element content. This feature is similar to the facility provided by the p:template step. Extending XProc to support curly braces in a manner consistent with [XSLT 3.0] will be considered.
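
The difference can be sketched, non-normatively, with an XProc 1.0 pipeline that sets one option from a literal and one from a computed value. The option name status and the attribute names are hypothetical, and the AVT form shown in the comment is an assumption about possible V2.0 syntax, not a decided design.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="status" select="'draft'"/>

  <!-- Literal value: the attribute shortcut works. -->
  <p:add-attribute match="/*" attribute-name="kind"
                   attribute-value="report"/>

  <!-- Computed value: the shortcut is unavailable in 1.0, so a
       nested p:with-option is required. With AVTs this might
       become attribute-value="{$status}" (hypothetical). -->
  <p:add-attribute match="/*" attribute-name="status">
    <p:with-option name="attribute-value" select="$status"/>
  </p:add-attribute>
</p:declare-step>
```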

2.7 Support a variety of syntactic simplifications

XProc 1.0 offers relatively few default behaviors, requiring instead that pipelines specify every construct fully. User experience has demonstrated that this leads to very verbose pipelines and has been a constant source of complaint. XProc 2.0 will introduce a variety of syntactic simplifications as an aid to readability and usability, including but not limited to:

  • <p:pipe step="name"/> binds to the primary output port of the step named “name”.

  • <p:pipe port="secondary"/> binds to the “secondary” port of the step on which the default readable port occurs.

  • <p:input port="portname" href="..."/> is a shortcut for a nested p:document.

  • <p:input port="portname"/> is a shortcut for a nested p:empty.

  • Allow p:inline to be optional.

  • Allow curly brace expansion in p:inline (with an attribute to control whether or not that behavior is enabled)

  • Provide a select attribute on p:for-each/p:viewport

  • Change all steps with a single non-primary output to have a single primary output

  • Consider harmonizing p:viewport-source and p:iteration-source

  • Add an AVT value attribute to options, parameters, and variables (to be used instead of select)
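
Some of the shortcuts listed above can be illustrated, non-normatively, against the fully specified XProc 1.0 forms they would replace. The pipeline below is XProc 1.0; each comment shows the corresponding proposed shortcut. The file names doc.xml and style.xsl are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <p:xslt name="transform">
    <p:input port="source">
      <!-- Proposed: <p:input port="source" href="doc.xml"/> -->
      <p:document href="doc.xml"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <p:input port="parameters">
      <!-- Proposed: <p:input port="parameters"/> -->
      <p:empty/>
    </p:input>
  </p:xslt>

  <p:identity>
    <p:input port="source">
      <!-- Proposed: <p:pipe step="transform"/> -->
      <p:pipe step="transform" port="result"/>
    </p:input>
  </p:identity>
</p:declare-step>
```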

2.8 Document backwards-incompatibilities

Backwards incompatibility is painful for users and will be avoided wherever possible. However, XProc 2.0 will introduce language features that are not backwards compatible with 1.0. The specification must document these incompatibilities.

3 SHOULD Requirements

3.1 Editorial improvements

Implementation experience has demonstrated that there are areas of the specification that didn't get the balance right between precision for implementors and clarity for users, for example “non-step wrappers”. The XProc 2.0 specification should attempt to resolve these problems without introducing inordinate complexity.

The 1.0 specification also defines the p:pipeline element as a syntactic shortcut for a particular form of p:declare-step. While convenient in some circumstances, it has proven to be a source of some confusion especially among new users. XProc 2.0 may remove the p:pipeline element.
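
The shortcut can be sketched, non-normatively, as follows. A minimal p:pipeline such as

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:identity/>
</p:pipeline>
```

stands for a p:declare-step with three implicitly declared ports:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Ports that p:pipeline declares implicitly. -->
  <p:input port="source" primary="true"/>
  <p:input port="parameters" kind="parameter" primary="true"/>
  <p:output port="result" primary="true"/>

  <p:identity/>
</p:declare-step>
```

The ports in the first form are invisible in the markup, which is one plausible source of the confusion noted above.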

3.2 Associate arbitrary metadata with documents

Adding metadata to documents is a natural thing for pipelines to do, either for subsequent use by the pipeline or for eventual output. For example, the serialization options provided in an XSLT stylesheet could be carried forward to the eventual serialization of the result document by the pipeline. In XProc 1.0, there's no way to maintain that association. XProc 2.0 should support the ability to associate processor and user-defined metadata with documents.

3.3 Support steps with a dynamic number of ports

While most steps have a predetermined and static number of inputs and outputs, this is not universally the case. In XProc 1.0, a putative p:eval step which could run a dynamically constructed pipeline, for example, suffers from the limitation that the signature of the p:eval step usually differs from the signature of the evaluated pipeline.

XProc 2.0 should provide a facility for supporting steps with a variable number of inputs and outputs.

3.4 Provide improved status information

XProc 1.0 provides scant support for reporting the status of a pipeline and for helping users who are attempting to debug pipelines. Implementation-defined extensions have demonstrated that some additional facilities, such as a p:message step, would be an aid to users.

XProc 2.0 will add some mechanism for reporting status messages and will consider adding additional steps and/or language features to aid in analysing the behavior of a running pipeline.

3.5 Provide a mechanism for importing user-defined functions

Experience with user-defined functions in XQuery and XSLT reveals that they can be a powerful addition to the language. Providing some feature that allowed users to extend the vocabulary of functions available in, for example, the test expressions on p:when elements would greatly simplify some pipelines.

Such a mechanism might take the form of the ability to load extension functions defined in, for example, XQuery, or it might include adding the ability to define functions in XProc.

3.6 Enhance try/catch

Support for catching errors in XProc 1.0 is limited to a simple p:try/p:catch pair, which catches and handles all errors uniformly. To align XProc with modern languages, the try/catch mechanism will be extended to catch specific errors, possibly with the addition of a “finally” construct.
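
A non-normative sketch of the XProc 1.0 mechanism follows. The file name maybe-missing.xml is hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <p:try>
    <p:group>
      <!-- Fails with a dynamic error if the file cannot be loaded. -->
      <p:load href="maybe-missing.xml"/>
    </p:group>
    <p:catch name="recover">
      <!-- Every error, whatever its code, arrives here. The only
           way to distinguish errors is to inspect the document
           on the error port. -->
      <p:identity>
        <p:input port="source">
          <p:pipe step="recover" port="error"/>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```

Because there is exactly one p:catch, handling different errors differently means adding conditional logic inside it.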

3.7 Write a primer

A new user introduction to XProc would aid adoption.

3.8 Consider using XDM everywhere

In addition to supporting [XDM] values in variables, options, and parameters, XProc 2.0 might allow [XDM] values in more places, such as allowing p:for-each to iterate over a sequence of strings or integers.

3.9 Consider dividing the specification

XProc 1.0 is a specification that consists of both the language definition and the inventory of required and optional steps. Release management might be simplified by separating the language core from the vocabulary of steps and providing some sort of versioning strategy that allowed the vocabulary of steps to be revised more frequently. XProc 2.0 may be defined in more than one Rec-track specification document.

3.10 Consider additional steps and enhancements

The vocabulary of steps available in XProc is extensible. Users and implementors have developed additional steps, for example to support pipelines that produce EPUB documents or manipulate files on disk. It is worth considering which, if any, new steps should be elevated to the XProc namespace. The candidates include, but are not limited to:

  • p:zip and p:unzip

  • p:template and p:in-scope-names

  • p:eval

  • Semantic web steps (p:sparql, p:rdfa, ...)?

  • Operating system steps (p:env, ...)?

  • File system steps (p:mkdir, p:copy, ...)?

3.11 Simplify p:viewport and allow it to have multiple outputs

The use of an optional, single p:output binding in p:viewport creates confusion for users. The binding is used both to connect the inner workings of the viewport and as the name of the output port as seen from the outside.

In addition, the fact that viewport can produce only a single result means that for some tasks, multiple passes are required, using a combination of p:viewport and p:for-each. Consider the task of changing image references in an XHTML document from .svg to .png and generating the sequence of .png images. In XProc 1.0, this requires a p:viewport and a p:for-each.
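
The first of those two passes might look like the following non-normative XProc 1.0 sketch, which rewrites the image references in place. It assumes that the file extension appears only at the end of each src value. The second pass, a p:for-each over the same images to produce the .png files, is omitted because it depends on a conversion step that XProc 1.0 does not define.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:h="http://www.w3.org/1999/xhtml"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Each matching h:img is processed as a separate document and
       the result is spliced back into the XHTML source. -->
  <p:viewport match="h:img[contains(@src, '.svg')]">
    <p:string-replace match="h:img/@src"
        replace="concat(substring-before(., '.svg'), '.png')"/>
  </p:viewport>
</p:declare-step>
```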

Adding an explicit p:viewport-result allows us to remove the confusion between the input and the name of the output. Allowing multiple outputs allows us to collapse the p:viewport and p:for-each logic into a single step.

<p:viewport
  name? = NCName
  match = XSLTMatchPattern>
    ((p:viewport-source? &
      p:viewport-result? &
      p:output* &
      p:log?),
     subpipeline)
</p:viewport>

The viewport-result connects the transformation inside the viewport back into the source document over which viewport is operating. The transformed document always appears on a port named 'result'. Any other outputs are simply sequences analogous to those of p:for-each. It's a static error to name one of those outputs 'result'.

3.12 Provide a way to specify the base URI of a document

The base URI of a document created by the p:inline element is the base URI of the p:inline element. Specifying an xml:base attribute on the root element of the document does not help as that only applies to that element and its descendants.

Additionally, in some pipelines, it is desirable to be able to change the base URI of documents produced by other steps. No convenient mechanism exists in XProc V1.0 to satisfy these requirements.

A References

[RFC 2119] Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. Network Working Group, IETF, Mar 1997.

[XProc] XProc: An XML Pipeline Language. Norman Walsh, Alex Milowski, and Henry S. Thompson, editors. W3C Recommendation 11 May 2010.

[XDM] XQuery and XPath Data Model (XDM) 3.0, Norman Walsh, John Snelson, Editors. World Wide Web Consortium, 08 January 2013.

[XQuery 3.0] XQuery 3.0: An XML Query Language, Jonathan Robie, Don Chamberlin, Michael Dyck, John Snelson, Editors. World Wide Web Consortium, 08 January 2013.

[XSLT 3.0] XSL Transformations (XSLT) Version 3.0, Michael Kay, Editor. World Wide Web Consortium, 10 July 2012.