XProc Language V2 Requirements and Use Cases (DRAFT)

W3C Working Draft 27 August 2013

This version:
Latest version:
Editor:
James Fuller, Invited Expert <jim.fuller@webcomposite.com>

This document is also available in these non-normative formats: XML.


Abstract

This document contains requirements for the development of version 2.0 of the XML Processing Model and Language (XProc).

Status of this Document

This document is an editors' copy that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document has been produced by the W3C XML Processing Model Working Group as part of the XML Activity, following the procedures set out for the W3C Process. The goals of the XML Processing Model Working Group are discussed in its charter.

Comments on this document should be sent to the W3C mailing list public-xml-processing-model-comments@w3.org (archive).

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. This document is informative only. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
2 Terminology
3 Design Principles
4 Requirements
    4.1 Simplify parameters
    4.2 Non XML document processing
    4.3 Abandon support for XPath 1.0
    4.4 Explicit Flow Handling
    4.5 Fully general XDM values
    4.6 Remove the concept of "non-step wrapper"
    4.7 Allow AVT
    4.8 Document metadata
    4.9 Steps with varying numbers of inputs and outputs
    4.10 Improved status/debugging information
    4.11 Extension Libraries
    4.12 Enhance Try / Catch step
    4.13 Syntactic simplifications
5 Use cases
    5.1 Making parameters easier
    5.2 Working with JSON
    5.3 Working with Turtle
    5.4 Working with JSON-LD
    5.5 Website Publishing; Working with Web Assets
    5.6 EPUB
6 Logged Issues

Appendices

A References
B Contributors


1 Introduction

This document is heavily influenced by the 'XProc 1.0 Solutions Note', which analyzed how XProc v1.0 satisfied the requirements and use cases outlined in the "XML Processing Model Requirements and Use Cases" document.
Editorial note 
WE HAVE NOT PUBLISHED THIS NOTE YET

2 Terminology

[Definition: XML Pipeline]

An XML Pipeline is a conceptualization of the flow of a configuration of steps and their parameters. The XML Pipeline defines a process in terms of order, dependencies, or iteration of steps over XML information sets.

3 Design Principles

The design principles described in this document are requirements whose fulfillment is an overall goal for the specification. A specific feature need not meet every principle. Instead, the whole set of specifications related to this requirements document should, taken together, meet the overall goals stated in the design principles.

Improving ease of use

Provide syntactic changes that improve the usability and comprehensibility of XML Pipelines, and the ease with which they can be created and developed.

Increase the scope for working with non XML content

Provide facilities for allowing both XML and non-XML data to flow through a pipeline.

Address known deficiencies and shortcomings in the language

Review the existing Bugzilla issue list and address the catastrophic, critical, and major bugs that require fixing or amendment.

4 Requirements

4.1 Simplify parameters

Parameters as defined in v1.0 proved to be too complicated. XProc v2.0 must dramatically simplify parameters.

Change parameters to be more like options. Adopt the extensions that XSLT 3.0 makes to the XPath 3.0 data model and functions and operators to support maps.

Norm Walsh's proposal and thread
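The difference can be illustrated with a minimal sketch. In v1.0, a stylesheet parameter travels on a dedicated parameter input port rather than as an ordinary option. The second form shows one way a map-valued option could replace that port. The sketch assumes the p prefix is bound to http://www.w3.org/ns/xproc, and style.xsl and the title parameter are illustrative. The parameters option is hypothetical syntax, not a decided design, and its map constructor is written in XPath 3.1 style.

```xml
<!-- XProc 1.0: p:with-param feeds the primary parameter input port of p:xslt -->
<p:xslt>
  <p:input port="stylesheet">
    <p:document href="style.xsl"/>
  </p:input>
  <p:with-param name="title" select="'Annual report'"/>
</p:xslt>

<!-- Hypothetical v2.0 form: parameters passed as one map-valued option,
     set the same way as any other option -->
<p:xslt>
  <p:input port="stylesheet">
    <p:document href="style.xsl"/>
  </p:input>
  <p:with-option name="parameters" select="map { 'title': 'Annual report' }"/>
</p:xslt>
```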

4.2 Non XML document processing

Experience has shown that real-world pipelines often involve non-XML documents. The limitation that V1.0 can only pass XML between steps has proved to be inconvenient. Several workarounds have been invented for special cases.

Providing native processing of non-XML content, within a constrained scope, enables working with mixed document distributions (EPUB, JSON, JSON-LD, Turtle, etc.).

XProc v2.0 must allow non-XML documents to pass through a pipeline.

Vojtech Toman proposal

Alex Milowski proposal
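One common v1.0 workaround is p:data. It reads a non-XML resource and wraps it in an XML element, c:data by default. Depending on the content type, the data is included either as text or base64-encoded. As a result, the next step receives an XML wrapper rather than the JSON document itself. A sketch follows, with config.json as an illustrative file name.

```xml
<!-- XProc 1.0 workaround: the JSON file arrives wrapped in an XML element,
     so downstream steps must unwrap (and possibly decode) it themselves -->
<p:identity>
  <p:input port="source">
    <p:data href="config.json"/>
  </p:input>
</p:identity>
```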

4.3 Abandon support for XPath 1.0

Supporting both XPath 1.0 and XPath 2.0 complicates the specification. In the V1.0 timeframe, it was necessary to consider implementations that might be based on XPath 1.0. That is no longer the case.

XProc v2.0 must be based on the XQuery 1.0 and XPath 2.0 Data Model or its successors.

Remove any requirement that implementations must support XPath 1.0.

4.4 Explicit Flow Handling

Sometimes the flow of control in a pipeline is not manifest from the data flow analysis and sometimes arranging for the data flow analysis to manage every dependency would require great complexity.

There must be a simple mechanism for asserting that step A must run before step B, even if B has no data flow dependency on A.

Editorial note 
see Calabash's cx:depends-on
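The sketch below shows the classic case, using XML Calabash's cx:depends-on extension attribute (namespace http://xmlcalabash.com/ns/extensions). As we understand it, the attribute names the step or steps that must finish first. Here p:load reads a file written by p:store but has no input connected to it, so nothing in the data flow orders the two steps. The file name report.xml is illustrative.

```xml
<p:declare-step version="1.0" name="main"
    xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:cx="http://xmlcalabash.com/ns/extensions">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Writes the source document to disk; no step reads its output -->
  <p:store name="save" href="report.xml"/>

  <!-- Reads the file back; without cx:depends-on the processor
       could run this step before p:store has written the file -->
  <p:load href="report.xml" cx:depends-on="save"/>
</p:declare-step>
```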

4.5 Fully general XDM values

Variables, options, and parameters must be able to hold arbitrary XDM values, including sequences and nodes.

Romain Deltour proposal

4.6 Remove the concept of "non-step wrapper"

XProc v2.0 must remove the concept of 'non-step wrappers' by making p:when and p:otherwise (in p:choose) and p:group and p:catch (in p:try) compound steps.

Vojtech Toman proposal

4.7 Allow AVT

Attribute value templates must be supported within option values (scope TBA).
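In v1.0, an option whose value is computed must be set with p:with-option and an XPath expression. An option attribute such as href="{$out-dir}/index.html" is taken literally. A sketch of the contrast follows, where $out-dir is a hypothetical option declared elsewhere in the pipeline. The AVT form is proposed syntax, not v1.0 behavior.

```xml
<!-- XProc 1.0: computed value via p:with-option -->
<p:store>
  <p:with-option name="href" select="concat($out-dir, '/index.html')"/>
</p:store>

<!-- Hypothetical v2.0 form with an attribute value template:
     the expression in braces is evaluated, the rest is literal text -->
<p:store href="{$out-dir}/index.html"/>
```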

4.8 Document metadata

Some documents have associated metadata. For example, documents have a content type. XProc v2.0 should provide a mechanism for associating arbitrary metadata with documents.

4.9 Steps with varying numbers of inputs and outputs

Some pipeline steps (split, join, nvdl, eval) don't naturally have a fixed number of inputs and outputs. It should be possible to write pipelines such that the number of inputs and outputs varies.

4.10 Improved status/debugging information

Pipelines should be provided with a simple mechanism for writing status and debug messages.

4.11 Extension Libraries

Pipelines should be able to import external function libraries and invoke their functions from XPath expressions (scope TBA).

Alex Milowski thread

Norm Walsh related

4.12 Enhance Try / Catch step

p:catch should be able to trap specific error codes.
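In v1.0, p:catch traps every error, so its subpipeline has to inspect the c:errors document on its error port to react to one particular code. A sketch follows, assuming the p and c prefixes are bound to the XProc and XProc step namespaces and the err prefix to the XProc error namespace. The code err:XC0000 is a placeholder for whatever code the pipeline wants to trap, and comparing the code as a string is a simplification. The code attribute in the second form is hypothetical syntax.

```xml
<!-- XProc 1.0: one catch for everything, dispatch inside it -->
<p:try>
  <p:group>
    <p:load href="input.xml"/>
  </p:group>
  <p:catch name="recover">
    <p:choose>
      <p:xpath-context>
        <p:pipe step="recover" port="error"/>
      </p:xpath-context>
      <p:when test="//c:error[@code = 'err:XC0000']">
        <!-- the trapped error: substitute a fallback document -->
        <p:identity>
          <p:input port="source">
            <p:inline><fallback/></p:inline>
          </p:input>
        </p:identity>
      </p:when>
      <p:otherwise>
        <!-- any other error: pass the error report on -->
        <p:identity>
          <p:input port="source">
            <p:pipe step="recover" port="error"/>
          </p:input>
        </p:identity>
      </p:otherwise>
    </p:choose>
  </p:catch>
</p:try>

<!-- Hypothetical v2.0 form: each p:catch names the codes it traps -->
<p:try>
  <p:group><!-- protected subpipeline --></p:group>
  <p:catch code="err:XC0000"><!-- fallback subpipeline --></p:catch>
  <p:catch><!-- any other error --></p:catch>
</p:try>
```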

4.13 Syntactic simplifications

Editorial note 
Review F.4.3 Verbosity

The following list of enhancements should be possible.

  1. <p:pipe step="name"/> should bind to the primary output port of the step named 'name'. It is an error if there is no such primary output port.

  2. <p:pipe port="secondary"/> should bind to the 'secondary' port of the step whose output is the default readable port. It is an error if there is no such step.

  3. <p:input port="portname"/> should be a shortcut for an empty binding.

  4. <p:input port="portname" href="..."/> should be a shortcut for a document binding to the URI specified in href.

  5. No steps without a default output: all standard steps should have a primary output port (in v1.0, p:store, for example, has only a non-primary output).

  6. Allow data types: it should be possible to specify the data type of variables, options, and parameters.

  7. Allow the p:inline wrapper to be optional.

  8. Provide a select attribute on p:for-each.

  9. <p:input port="parameters"/> as a shorthand for <p:input port="parameters"><p:empty/></p:input>
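For comparison, here is a sketch of the v1.0 long forms alongside the shorthands proposed in items 1, 3, 4, and 8. The shorthand syntax is hypothetical. The step name "name", doc.xml, and //chapter are illustrative values, and the long form of item 1 assumes that the step's primary output port is named result.

```xml
<!-- XProc 1.0 long forms -->
<p:input port="source">
  <p:pipe step="name" port="result"/>
</p:input>
<p:input port="source">
  <p:empty/>
</p:input>
<p:input port="source">
  <p:document href="doc.xml"/>
</p:input>
<p:for-each>
  <p:iteration-source select="//chapter"/>
  <p:identity/>
</p:for-each>

<!-- Proposed shorthands (hypothetical syntax) -->
<p:input port="source">
  <p:pipe step="name"/>                       <!-- item 1 -->
</p:input>
<p:input port="source"/>                      <!-- item 3 -->
<p:input port="source" href="doc.xml"/>       <!-- item 4 -->
<p:for-each select="//chapter">               <!-- item 8 -->
  <p:identity/>
</p:for-each>
```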

5 Use cases

This section contains a set of use cases that should be enabled through the fulfillment of v2.0 requirements. They are provided so that we may trace requirements to real world usage, as well as inform v2.0 design decisions.

To aid navigation, the requirements can be mapped to the use cases of this section as follows:

Requirement | Use Cases

5.1 Making parameters easier

(source: )

5.2 Working with JSON

(source: )

5.3 Working with Turtle

(source: )

5.4 Working with JSON-LD

(source: )

5.5 Website Publishing; Working with Web Assets

(source: )

5.6 EPUB

(source: )

Editorial note 
Need to decide what unsatisfied use cases are to be included

6 Logged Issues

Editorial note 
Need to review bugzilla list of issues for inclusion

A References

xml-core-wg
XML Processing Model Requirements. Dmitry Lenkov, Norman Walsh, editors. W3C Working Group Note 05 April 2004 (See http://www.w3.org/TR/proc-model-req/.)
xml-infoset-rec
XML Information Set (Second Edition). John Cowan, Richard Tobin, editors. W3C Recommendation 04 February 2004 (See http://www.w3.org/TR/xml-infoset/.)

B Contributors

The following members of the XXXX Working Group contributed to this specification as part of their requirements document effort within that working group: