XProc Architecture

From W3C Wiki

What flows between Steps

Should we open up the pipeline architecture to allow non-XML documents to flow through it?

For other media types (see below for some candidates), there are several ways to allow them:

  1. Allow statically
    1. At the step level (i.e. step signatures include media-types for all inputs/outputs)
    2. At whole-pipeline margins
      1. Require pipeline output media-type to match the media-type of its connected input
        1. Any non-XML output must immediately be converted to XML
        2. Any foo–foo connections are allowed
        3. Auto-shim for every possible pair
        4. Auto-shim only for other–XML and XML–other (so other1→other2 requires two shims)
  2. Allow dynamically (e.g. from p:http-request)
    1. With a static declaration of expected alternatives; anything else is an error
    2. With a pipeline fallback if all else fails, producing <c:data media-type=...>...</c:data>
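The two dynamic options differ only in what happens when a run-time media type was not declared. One treats it as an error; the other wraps it in c:data.

Below is a minimal Python sketch. It assumes results arrive as raw bytes tagged with a media type. The function and exception names are hypothetical. The `media-type` attribute follows the proposal above, and the base64 `encoding` attribute is an assumption.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace of the XProc step vocabulary (the "c:" prefix).
C_NS = "http://www.w3.org/ns/xproc-step"
ET.register_namespace("c", C_NS)

class UnexpectedMediaType(Exception):
    """Raised when a result matches no declared alternative and there is no fallback."""

def accept_result(media_type, payload, expected, fallback=False):
    """Decide what flows downstream for a dynamically typed result.

    media_type -- media type reported at run time (e.g. by p:http-request)
    payload    -- raw bytes of the result
    expected   -- statically declared media types the consumer accepts
    fallback   -- False: anything undeclared is an error;
                  True: wrap undeclared results in c:data
    """
    if media_type in expected:
        # A declared alternative: pass it through unchanged.
        return media_type, payload
    if not fallback:
        raise UnexpectedMediaType(media_type)
    # Fallback: wrap the bytes in c:data. Base64 encoding is an assumption.
    data = ET.Element(f"{{{C_NS}}}data",
                      {"media-type": media_type, "encoding": "base64"})
    data.text = base64.b64encode(payload).decode("ascii")
    return "application/xml", ET.tostring(data)
```

Both options need the static list of expected alternatives. The fallback only changes whether an undeclared type stops the pipeline or becomes XML.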

Any shim to XML could (?) be configurable with respect to the target vocabulary

  • How?
    • We could identify shim tactics with QNames (similar to how serialization methods work in XProc already)
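Identifying shim tactics by QName could look roughly like a registry keyed by QName, with per-tactic options choosing the target vocabulary.

This is a Python sketch under assumptions:
- QNames are written in Clark notation (`{uri}local`).
- The `http://example.com/ns/shim` namespace is hypothetical.
- The `text-lines` tactic and its `root`/`line`/`charset` options are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: shim tactics keyed by QName in Clark notation
# ("{namespace-uri}local-name"), much as XProc names serialization methods.
SHIM_TACTICS = {}

def shim_tactic(qname):
    """Register a function as the shim tactic identified by qname."""
    def register(func):
        SHIM_TACTICS[qname] = func
        return func
    return register

@shim_tactic("{http://example.com/ns/shim}text-lines")
def text_lines(payload, options):
    """Convert plain text to XML, one element per line.

    The 'root' and 'line' options pick the target vocabulary; both option
    names are assumptions made for this sketch.
    """
    root = ET.Element(options.get("root", "lines"))
    for line in payload.decode(options.get("charset", "utf-8")).splitlines():
        ET.SubElement(root, options.get("line", "line")).text = line
    return root

def shim_to_xml(tactic, payload, options=None):
    """Look up a tactic by QName and apply it; unknown QNames are an error."""
    try:
        func = SHIM_TACTICS[tactic]
    except KeyError:
        raise ValueError(f"no shim tactic named {tactic}") from None
    return func(payload, options or {})
```

For example, calling the `text-lines` tactic on `b"a\nb"` with options `root="para-list"` and `line="para"` yields `<para-list><para>a</para><para>b</para></para-list>`.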


Allow variables to hold the same values that XQuery/XSLT variables can

Sets of Documents

Allow unbounded number of outputs from some steps?

  • MZ says we need this for the NVDL use case [cross-reference needed]. Markup pipeline allowed this; subsequent steps then need to access the outputs by name, with integers as the default names...
  • p:pack could have more than two inputs, so you could do column-major packing
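An n-ary p:pack could generalize the two-input pairing. Wrapper i would hold document i from every input that still has one, so packing proceeds column by column across the inputs.

This is a Python sketch, assuming each input is a list of ElementTree elements. The `pack` function is hypothetical, not the XProc step's actual signature.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

def pack(inputs, wrapper):
    """Generalize p:pack from two inputs to any number.

    inputs  -- list of document sequences, one per input port
    wrapper -- name of the wrapper element
    The i-th wrapper holds the i-th document of every input that has one;
    shorter inputs simply stop contributing (zip_longest pads with None).
    """
    packed = []
    for column in zip_longest(*inputs):
        w = ET.Element(wrapper)
        w.extend([doc for doc in column if doc is not None])
        packed.append(w)
    return packed
```

With inputs `[a1, a2]`, `[b1]` and `[c1, c2]`, the result is two wrappers: one holding a1, b1, c1, and one holding a2, c2.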


Plain text



In other words, automatically converting between media-types as required by output–input connections
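The two auto-shim options in the list above can be compared with a small planning function:
- a shim for every possible pair;
- shims only between XML and other types.

This is a Python sketch. It assumes media types are plain strings and treats `application/xml` as the only XML type, which is a simplification. `plan_shims` is a hypothetical name.

```python
XML = "application/xml"  # simplification: the only media type treated as XML

def plan_shims(out_type, in_type, every_pair=False):
    """Return the conversions needed to connect an output to an input.

    every_pair=True  -- a direct shim exists for any pair of media types
    every_pair=False -- shims only convert other->XML or XML->other
    Each conversion is a (from, to) media-type pair.
    """
    if out_type == in_type:
        return []  # foo-foo connection: nothing to convert
    if every_pair or XML in (out_type, in_type):
        return [(out_type, in_type)]
    # XML-only shims: other1 -> other2 must go through XML, i.e. two shims.
    return [(out_type, XML), (XML, in_type)]
```

The XML-only variant needs about 2n shims for n non-XML types, versus about n² for every pair. The cost is an intermediate XML form, which may lose information for some pairs.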

Other Architecture Changes


Can we suspend a pipeline waiting for something to happen?

Some examples:

  • wait for HTTP POST from GitHub (notifications)
  • JMS queue listener
  • TCP socket listener


Related-but-different (with pipeline-internal events, as it were):

Pipelines frozen in time

Can we dump a partially-evaluated pipeline instance for subsequent resumption?

In other words, can we implement the ability to pause/resume pipelines?
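Pausing and resuming requires that a partially evaluated instance be serializable. For a linear pipeline that is just the pipeline itself, the index of the next step, and the current document.

This is a deliberately simplified Python sketch. Real pipelines have sequences, branches, loops and external resources. It assumes steps are referenced by name in a hypothetical `STEPS` table, so only names need to be saved.

```python
import json

# Hypothetical step library: the pipeline refers to steps by name, so a
# frozen instance stores names, not code.
STEPS = {
    "strip": str.strip,
    "upper": str.upper,
    "exclaim": lambda s: s + "!",
}

def run(pipeline, doc, start=0, pause_after=None):
    """Run steps from index `start`; optionally pause after step `pause_after`.

    Returns (finished, result). When paused, result is a JSON snapshot
    recording the pipeline, the index of the next step and the current document.
    """
    for i in range(start, len(pipeline)):
        doc = STEPS[pipeline[i]](doc)
        if i == pause_after:
            snapshot = {"pipeline": pipeline, "next": i + 1, "doc": doc}
            return False, json.dumps(snapshot)
    return True, doc

def resume(snapshot):
    """Thaw a frozen pipeline instance and run it to completion."""
    state = json.loads(snapshot)
    return run(state["pipeline"], state["doc"], start=state["next"])
```

Pausing `["strip", "upper", "exclaim"]` on `"  hi "` after the first step yields a snapshot whose document is `"hi"`. Resuming that snapshot returns `(True, "HI!")`.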

Expose pipelines as XPath functions

For easy reuse of pipelines.

  • Or, allow XQuery/XSLT to import pipelines

Media type declarations

  • allow declaring the media type expected on each input/output port (as I think is being discussed in support of non-XML documents flowing through pipelines)

and extend the implicit connections based on the media-type declarations. For instance, it's not uncommon in our use of XProc to have several "sets" of documents flowing through a sequence of steps, e.g. a file set XML description and sequences of in-memory XML documents. Primary inputs/outputs are implicitly connected, but we have to connect the others explicitly. It would be great if there were a rule like "an input port expecting documents of type X is automatically connected to the readable port providing documents of matching type X". Explicit connections would then be needed only if there are several readable port candidates or if no type information is available.
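The proposed rule can be stated as a small resolution function. Bind an unconnected input to the single readable port whose declared media type matches. Otherwise require an explicit connection: when there is no type information, several candidates, or none.

This is a Python sketch. The `Port` record, the function names and the `application/vnd.example.fileset+xml` media type are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Port:
    step: str
    name: str
    media_type: Optional[str]  # None means no type information is declared

class NeedsExplicitConnection(Exception):
    """Raised when the media-type rule cannot pick a single source."""

def implicit_source(input_port, readable_ports):
    """Pick the readable port that an unconnected input binds to.

    Connects to the one readable port whose declared media type matches the
    input's; otherwise demands an explicit connection.
    """
    if input_port.media_type is None:
        raise NeedsExplicitConnection(f"{input_port.name}: no type information")
    candidates = [p for p in readable_ports
                  if p.media_type == input_port.media_type]
    if len(candidates) != 1:
        raise NeedsExplicitConnection(
            f"{input_port.name}: {len(candidates)} candidate ports")
    return candidates[0]
```

Consider a step that reads a file set description and also sees two XML `result` ports. Its file-set input binds implicitly, while an `application/xml` input must still be wired by hand.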
