<!DOCTYPE TEI.2 PUBLIC '-//C. M. Sperberg-McQueen//DTD
          TEI Lite 1.0 plus SWeb (XML)//EN'
          '../../../../../People/cmsmcq/lib/swebxml.dtd' [

<!ENTITY date.last.revised "23 January 2007">
<!ENTITY date.last.revised "24 November 2006 - 30 December 2006">

<!ENTITY iexcl  "&#xA1;" ><!--=inverted exclamation mark-->
<!ENTITY iquest "&#xBF;" ><!--=inverted question mark-->
<!ENTITY ldquo  "&#x201C;" ><!--=double quotation mark, left-->
<!ENTITY lsquo  "&#x2018;" ><!--=single quotation mark, left-->
<!ENTITY mdash  "&#x2014;" ><!--=em dash-->
<!ENTITY ne     "&#x2260;" ><!--/ne /neq R: =not equal-->
<!ENTITY neq    "&#x2260;" ><!--/ne /neq R: =not equal-->
<!ENTITY rarr   "&#x2192;" ><!--/rightarrow /to A: =rightward arrow-->
<!ENTITY rdquo  "&#x201D;" ><!--=double quotation mark, right-->
<!ENTITY rsquo  "&#x2019;" ><!--=single quotation mark, right-->

<!ENTITY d '<ident>d</ident>' >
<!ENTITY E '<ident>E</ident>' >
<!ENTITY n '<ident>n</ident>' >
<!ENTITY R '<ident>R</ident>' >
<!ENTITY T '<ident>T</ident>' >

<!ATTLIST bibl id ID #IMPLIED>
<!ATTLIST div id ID #IMPLIED>
<!ATTLIST item id ID #IMPLIED>
<!ATTLIST scrap id ID #IMPLIED>

<!NOTATION PDF SYSTEM "application/pdf" >
<!NOTATION PNG SYSTEM "image/png" >
<!ENTITY xproc01.metamodel SYSTEM "images/xproc01.metamodel.png" NDATA PNG >
<!ENTITY fig1.orig 
  SYSTEM "http://www.w3.org/TR/xproc/graphics/sch-xinclude-validate-pipeline.png"
  NDATA PNG >
<!ENTITY sch-xinclude-validate-pipeline 
  SYSTEM "http://www.w3.org/TR/xproc/graphics/sch-xinclude-validate-pipeline.png"
  NDATA PNG >
<!ENTITY fig01.v1  SYSTEM "images/fig01.v1.png" NDATA PNG >
<!ENTITY fig01.v2e SYSTEM "images/fig01.v2e.png" NDATA PNG >
<!ENTITY fig01.v2f SYSTEM "images/fig01.v2f.png" NDATA PNG >
<!ENTITY fig2.orig 
  SYSTEM "http://www.w3.org/TR/xproc/graphics/sch-transform.png"
  NDATA PNG >
<!ENTITY sch-transform
  SYSTEM "http://www.w3.org/TR/xproc/graphics/sch-transform.png"
  NDATA PNG >
]>
<?xml-stylesheet type="text/xsl" href="xproc.model.xsl"?> 
<!---* 
<?xml-stylesheet type="text/xsl" href="../../../../../People/cmsmcq/lib/swebtohtml.xsl"?> 
*-->
<TEI.2 rend="w3c-public">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Some Alloy models for XProc</title>
<author>C. M. Sperberg-McQueen</author>
</titleStmt>
<publicationStmt>
</publicationStmt>
<sourceDesc>
<p>Created in electronic form.</p>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<front>
<titlePage>
<docTitle>
<titlePart>Some Alloy models for XProc</titlePart>
</docTitle>
<titlePart>A working paper for the W3C XProc Working Group</titlePart>
<docAuthor>C. M. Sperberg-McQueen</docAuthor>
<docDate>November 2006 - January 2007</docDate>
<docDate>Last revised <date>&date.last.revised;</date></docDate>

<titlePart>This paper is unfinished.  If you are not collaborating with
the author on it and have run across it by accident, please ignore it.</titlePart>
</titlePage>
</front>

<body>

<!--* <div>
<head>Introduction</head>  *-->
<p>
This paper presents several formal models for some aspects of XProc <ptr
target="XProc" type="bibref"/>, an XML
pipeline language being defined by the W3C XML Processing Model
Working Group; these models are formulated using the notation of Alloy
<ptr target="Jackson" type="bibref"/>.
The goal is threefold: to help clarify design issues,
to use Alloy to perform simple sanity checks on the design, and to give
the author some practice using Alloy.</p>

<p>The current design of XProc postulates a layer of abstractions
(pipelines, pipeline steps, etc.) which are independent of and
unrelated to the XML vocabulary used to describe specific instances
of the abstractions.  The goal is to have a complete description in
abstract terms, which owes nothing to the properties of the XML
documents, and which thus is not unduly influenced by accidents
of syntax.  
The current version of this paper concentrates on providing a
formal version of the abstract model, as described in sections
1-3 of the specification, and on identifying places in which the
abstraction layer is not fully described in the current draft and
cannot be understood independently of the XML representation.
</p>	

<p>Several kinds of comment which address rather different
readerships are intermixed here.  The first three are what the 
reader of a paper on formal models for XProc might most
reasonably expect:<list type="bullets">
<item>descriptions of what's in the XProc spec</item>
<item>descriptions of the Alloy model(s) of XProc</item>
<item>remarks (currently few and far between)
on Alloy syntax, for the use of readers who don't know
Alloy</item>
</list>
Part of the goal, however, is to use Alloy modeling to help
find aspects of XProc which could be improved, which leads to:
<list type="bullets">
<item><p>meta-discussion about the design of XProc and what the
XProc spec ought to be doing (as opposed to what it does);
at the limit, these bleed into straightforward editorial
comments which should just be sent to the editor (I tried
initially to keep purely editorial issues out of this working 
paper, but in the end I just gave up on the attempt)</p>
<p rend="meta-xproc">As an experiment, some such meta-discussions on
XProc design issues are here rendered with a distinctive background
color and font-style, like this paragraph.  Readers
without an interest in the details of the XProc Working Group's
design process may skim these paragraphs.</p></item>
</list>
And part of the goal is to improve the author's facility with
Alloy, which gives rise to:
<list type="bullets">
<item><p>meta-discussion on alternative methods of modeling XProc in 
Alloy and speculation on which is preferable</p>
<p rend="meta-alloy">As an experiment, some such meta-discussions on
Alloy choices are here rendered with a distinctive background
color and font-style, like this paragraph.  Readers not concerned
with questions of Alloy usage may skip these paragraphs without
losing any essential exposition.</p></item>
</list>
In the ideal case, I expect that future versions of this document will
limit themselves to the first three of these (descriptions of XProc
and its Alloy models); for the foreseeable future, however,
it seems likely that meta-commentary is essential.  
<!--* Future
versions of this document may style the different forms of
meta-discussion in distinct ways. *-->
</p>

<p>This paper provides no systematic introduction to Alloy notation.
Those not familiar with Alloy should be able to follow at least the
broad outlines of the discussion: Alloy is well designed to feel
accessible to anyone familiar with common object-oriented programming
languages, and I have made an effort to present everything important
both formally, in Alloy notation, and informally, in prose.<note
place="foot">Some authorities recommend eliminating redundancy from
technical specifications as far as possible, but this is, I think,
almost always a large mistake. The redundancy of presenting material
both formally and informally makes it easier to follow dense or
unfamiliar material and thus easier to detect errors.  Some
specification languages, such as Z, are explicitly designed to be used
in a mixture of prose and formal notation, and the usual stylistic
advice is to exploit the duality of formal and informal presentation
to provide a controlled but fully intentional redundancy.
See e.g. <ptr type="bibref" target="McMorran-Powell"/>, <ptr
type="bibref" target="Potter.et.al"/>, and <ptr type="bibref"
target="Wordsworth"/>.  Since Alloy is not specified as an
intermixture of formal notation and prose, this paper uses
a literate programming notation to make the intermixture possible
<ptr target="SWeb" type="bibref"/>.</note> To follow the models in more detail, any
reader not familiar with Alloy will find it useful to read through the
Alloy tutorial. Good online documentation can be found at
<xref>http://alloy.mit.edu</xref> as well as in the book <ptr
type="bibref" target="Jackson"/>.</p>

<p>Sections <ptr target="xproc-intro" type="secnum"/>,
<ptr target="concepts" type="secnum"/>, and 
<ptr target="constructs" type="secnum"/> of this paper present aspects
of the model which emerge from the <title>Introduction</title>,
<title>Pipeline Concepts</title>, and <title>Language
constructs</title> sections, respectively, of <ptr target="XProc"
type="bibref"/>. For each
section, first parts of the spec are quoted, and the salient parts of
the model (as I understand it) are identified in a sequence of
numbered propositions. The <soCalled>salient</soCalled> parts, for
purposes of this paper, are those which (appear to) lend themselves to
modeling using a tool like Alloy.  No claim is made that paragraphs
not transcribed here are not essential to the spec or of lesser value,
only that they contain no (new) propositions I can imagine modeling
usefully using a system like Alloy.</p>
<p>
Items in brackets reflect propositions not stated
explicitly in the spec and may go slightly beyond what is intended,
but seem useful to capture as possible rules. Some double-bracketed items
([[ ... ]])
just relate (what I take to be) different ways of saying the same
thing.  Items marked with <q>&iquest;...?</q> may or may not capture what
is meant by the spec; the paragraphs in question may need editorial
attention to make them clearer. <!--* Items marked <q>?!</q> seem to
follow clearly from the wording of the spec, but seem to the author
not to make sense, either because the formulation needs improvement
or because it seems to reflect an unbelievable design choice. *-->
Items marked <q>Q:  ...</q> are
questions about pipelines which may occur to a reader but which the
spec appears not to have answered at the time they occurred to this
reader.</p>
<p>The Alloy models provided in the course of the document formalize
parts of the design as captured in the numbered propositions.</p>


<note type="display">
<p>In their current form, both this paper and the models it presents are
incomplete.</p>

<p>The paper provides no introduction to XProc other than that
implicit in the quotations from the spec and the paraphrase in the
form of numbered propositions.  In particular, there is no motivation
of the design features of the language.  At times I have considered
embedding this material in the text of the spec, but for the moment I
have chosen merely to quote from the spec extensively.  I keep having
second thoughts about this, and returning to the XML source of the
spec to see about inserting the models there.  Each time I get a little
closer to being able to add material conveniently. But so far I am not
in a position to edit the XProc spec conveniently.</p>

<p>For the moment, the reader not already familiar with XProc may do
well to read this document together with the XProc spec <ptr
target="XProc" type="bibref"/>. The draft of XProc used is that of 17
November 2006.</p>


</note>
<!--* </div> *-->

<div id="xproc-intro">
<head>Concepts in the introduction</head>

<div id="xproc01">
<head>Basic classes of objects</head>
<p>Section 1, paragraph 1 of <ptr target="XProc" type="bibref"/> 
reads:
<q type="block">
<p>An XML Pipeline specifies a sequence of operations to be performed
on a collection of input documents. Pipelines take zero or more XML
documents as their input and produce zero or more XML documents as
their output. Steps in the pipeline may read or write non-XML
resources as well.
</p>
</q></p>
<p>From this, the following propositions appear to follow.
<list type="propositions">
<item id="ex.pipelines">Pipelines exist. (1p1; 
cf. prop. <ptr target="def.pipeline"/>)<note place="foot">The
notation <q>(1p1)</q> is a reference to the paragraph from
which this proposition was derived.  Such paragraph
references are used in the propositions and elsewhere; they
indicate the section number (here 1) and, after a 
<q>p</q>, the paragraph number within the section (here also 1).
Propositions restated later sometimes have the later location
also noted, but this is not always so.  In some cases, as
here, reference is made to other propositions with related
content.</note></item>
<item id="ex.operations">Operations exist. (1p1)</item>
<item>Pipelines take zero or more XML documents as input. (1p1)</item>
<item>Pipelines produce zero or more XML documents as output. (1p1)</item>
<item>Pipeline inputs are XML documents. (1p1)</item>
<item>Pipeline outputs are XML documents. (1p1)</item>
<item id="ex.steps">Steps exist. (1p1, 1p3)</item></list>
</p>
<p rend="meta-xproc">There is a forward reference to steps here; remove?</p>
<p>
<list type="propositions">
<item>Steps may be in pipelines. (1p1)</item>
<item id="read..step.nonxml">Steps may read non-XML resources. (1p1)</item>
<item id="write..step.nonxml">Steps may write non-XML resources. (1p1)</item>
</list></p>

<p>Section 1 paragraph 2 reads in part:
<q type="block">
<p>A pipeline consists of components. ...</p>
</q>
</p>
<p>From this, we can infer:
<list type="propositions">
<item id="ex.components">Components exist. (1p2)</item>
<item>Pipelines consist of components. (1p2)</item>
</list>
</p>

<p>Section 1 paragraph 3 elaborates on components:
<q type="block">
<p>
There are two kinds of components: steps and (language)
constructs. Steps carry out single operations and have no substructure
as far as the pipeline is concerned, whereas constructs can include
components within themselves.</p></q>
</p>

<list type="propositions">
<item id="ex.constructs">Constructs exist. (1p3)</item>
<item id="component..step">Every step is a component. (1p3)</item>
<item id="component..construct">Every construct is a component. (1p3)</item>
<item>Every component is a step or a construct. (1p3)</item>
<item id="atomic..step">Steps are treated as atomic (no internal structure 
to model). (1p3, 2.1p1)</item>
<item id="include..construct.components">Constructs 
include components within themselves. (1p3, 2.1p2, 2.1p3)</item>
<item>[X contains Y iff X includes Y within itself.] (1p3)</item>
<item id="contain..construct.components">Constructs contain components. (1p3)</item>
<!--* 
<item>Q: if X contains Y and Z contains Y, must X = Z? (1p3)</item>
<item>Q: if X directly contains Y and Z directly contains Y, must X = Z? (1p3)</item>
*-->
</list>

<p>If we begin by declaring pipelines, XML documents, etc. as
signatures in Alloy, and make clear the relation between components
(as the general class) and steps and constructs (as its subtypes), we
have something like the following.  (This model is available
as a stand-alone Alloy file as <xref>xproc01.als</xref> in the
same directory as this document.)
<scrap id="intro1" name="First cut at signatures" file="xproc01.als">
module xproc01

// Pipelines 
sig Pipeline {
  inputs: set XMLDoc,
  outputs: set XMLDoc,
  components: set Component
}

// Pipelines read and write mostly XML documents, 
// but also nonXML
sig XMLDoc {}
sig nonXML {}

// Components are of two kinds:  Steps and Constructs
abstract sig Component {}
abstract sig Step extends Component {}
abstract sig Construct extends Component {
  components: set Component
}
</scrap>
</p>
<p>
This model reflects most of propositions <ptr target="ex.pipelines"/>
through <ptr target="contain..construct.components"/>; it makes no attempt
to model the relation between pipelines and operations, or the
reading and writing of non-XML resources, 
so
propositions <ptr target="ex.operations"/>, 
<ptr target="read..step.nonxml"/>, and
<ptr target="write..step.nonxml"/>
are excluded.
</p>
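<p rend="meta-alloy">As a hedged illustration of the kind of sanity
check these declarations make possible, the sketch below repeats the
signatures of <ident>xproc01</ident> in a separate module and asks the
Alloy Analyzer for sample instances.  The module name
<ident>xproc01_demo</ident>, the concrete signatures
<ident>DemoStep</ident> and <ident>DemoConstruct</ident>, and both
predicates are assumptions introduced only for this illustration; they
are not part of the XProc spec or of the model above.
<eg>
module xproc01_demo

// Signatures repeated from xproc01 (nonXML omitted for brevity)
sig XMLDoc {}
abstract sig Component {}
abstract sig Step extends Component {}
abstract sig Construct extends Component {
  components: set Component
}
sig Pipeline {
  inputs: set XMLDoc,
  outputs: set XMLDoc,
  components: set Component
}

// Hypothetical concrete signatures, present only so that the
// abstract signatures Step and Construct can have instances.
sig DemoStep extends Step {}
sig DemoConstruct extends Construct {}

// A pipeline contains a construct, which in turn contains a step.
pred nested_example {
  some p: Pipeline, c: DemoConstruct, s: DemoStep |
    c in p.components and s in c.components
}
run nested_example for 3

// A construct that directly contains itself.
pred self_containing {
  some c: DemoConstruct | c in c.components
}
run self_containing for 3
</eg>
Nothing in the declarations forbids the second situation, so the
analyzer should be expected to find an instance for it as well as for
the first; whether XProc means to allow such a construct is a question
the model so far leaves open.</p>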
<p>From these declarations, Alloy can generate a metamodel showing
the relations of the classes:
<figure entity="xproc01.metamodel" rend="100%">
<head>Class relations in model xproc01</head>
</figure>
</p>
<p>The built-in set <ident>univ</ident> is extended by
<ident>Pipeline</ident>, <ident>Component</ident>,
<ident>XMLDoc</ident>, and <ident>nonXML</ident>.
<ident>Component</ident>, in turn, is extended by
<ident>Step</ident> and <ident>Construct</ident>.
The relations <ident>inputs</ident> and <ident>outputs</ident> 
connect <ident>Pipeline</ident> instances to
<ident>XMLDoc</ident> instances, while
<ident>components</ident> connects pipelines
to components, and components to components.
</p>
<p>Of course, this set of types is not yet complete, and their
interrelations are incorrectly specified:  Pipelines, it will 
become clear later in the spec, are a subclass of Constructs.
And each component can have inputs and outputs.</p>
<p rend="meta-alloy">
I define Component as abstract, to force the conclusion that every
component is either a Step or a Construct.  To prevent
other models from importing this one and adding further subtypes
of component, one could add:
<scrap name="Prohibiting further subtypes of Component">
fact component_completeness { 
  Component = Step + Construct 
}</scrap>
Defining Step and Construct as abstract I'm less certain about;
my idea is that the types of steps supported can be declared as
extensions of Step, thus:
<scrap name="Predefined step types (sketch)" id="predefstep-sketch">
sig Identity extends Step {}
sig XSLT extends Step {}
sig XInclude extends Step {}
sig Serialize extends Step {}
sig Parse extends Step {}
sig Load extends Step {}
sig Store extends Step {}
sig ExtensionStep extends Step {
  type : Extension
}
</scrap>
Making the Step signature be abstract has as a consequence the
invariant that every step is of a known kind; if it's not of a
predefined kind then it's an ExtensionStep.  Implementations that
provide other step types can be modeled by adding signature
declarations for the new step types.  Alternatively, we could forbid
such additional declarations by imposing the explicit constraint that
every Step must be a member of one of the sub-signatures just named:
<scrap prev="predefstep-sketch">
fact step_completeness {
  Step = Identity + XSLT + XInclude + Serialize
         + Parse + Load + Store + ExtensionStep
}
</scrap>
This may or may not be the right way to model the relation among
required step types and others.</p>
<p rend="meta-alloy">Declaring Construct as abstract is intended to work the same
way; I'll declare concrete sets which extend Construct and can
be instantiated.</p>
<p rend="meta-xproc">The reader of the XProc spec may be uncertain whether the correct
declarations of Identity, XSLT, and so on will be as shown above
(which allows the world to contain multiple Identity Steps and
multiple XSLT Steps, etc.), or with a specification that there is
really only one Identity step, only one XSLT step, etc.: 
<scrap name="Predefined step types (alternate sketch)">
one sig Identity extends Step {}
one sig XSLT extends Step {}
...
</scrap>
It becomes clearer at the end of the introduction that what the standard library
contains are types of steps, rather than steps.  See also 
proposition <ptr target="cict"/> below.
</p>
</div>
<div id="data-flows">
<head>Data flows, first cut</head>


<p>Section 1 paragraph 2 reads:
<q type="block">
<p>
A pipeline consists of components. Like pipelines, components take
zero or more XML documents as their input and produce zero or more XML
documents as their output. The inputs to a component come from the
web, from the pipeline document, from the inputs to the pipeline
itself, or from the outputs of other components in the pipeline. The
outputs from a component are consumed by other components, are outputs
of the pipeline as a whole, or are discarded.
</p>
</q>
</p>
<list type="propositions">
<item>Components take zero or more XML documents as input. (1p2)</item>
<item>Components produce zero or more XML documents as output. (1p2)</item>
<item>Component inputs and outputs are XML documents. (1p2)</item>
<item>Component inputs may come from the Web. (1p2)</item>
<item>Component inputs may come from the pipeline document. (1p2)</item>
<item>Component inputs may come from pipeline inputs. (1p2)</item>
<item>Component inputs may come from outputs of other components. (1p2)</item>
<item>&iquest;Component inputs must come from one of: the Web, pipeline document,
pipeline inputs, component outputs?  (Not certain whether implication
intended or not.) (1p2)</item>
<item>[No single XML document comes from more than one of: the Web, pipeline document,
pipeline inputs, component outputs.] (1p2)</item>
<item id="consume..component.xml">[Components may consume XML documents.] (1p2)</item>
<item>[[If a component <term>consumes</term> an XML document, then it
takes that XML document as an input, and vice versa.]] (1p2)</item>
<item>Components may consume component outputs. (1p2)</item>
<item>Pipeline outputs may consume component outputs. (1p2)</item>
<item>[[If a pipeline output <term>consumes</term> a component output, then
the component output flows into the pipeline output. And conversely,
if component output flows into a pipeline output, then the pipeline
output consumes the component output.]] (1p2)</item>
<item>Component outputs may be discarded. (1p2)</item>
<item id="multiple-consume">[An 
XML document can both be consumed by a component and consumed by a 
pipeline output.] (1p2)</item>
<item id="consume-or-discard">[No XML 
document can both (1) be consumed by a component or by a 
pipeline output, and (2) be discarded.] (1p2)</item>
</list>
<p rend="meta-xproc">Proposition <ptr target="multiple-consume"/>
interprets the <q>or</q> of the spec as an inclusive or
as it relates to data being consumed by a pipeline output 
or by other components; proposition <ptr target="consume-or-discard"/> 
interprets the same <q>or</q> as exclusive when it
relates to data being consumed or discarded.  This only 
became puzzling to me when I thought about it carefully.</p>
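<p rend="meta-alloy">One way to make the two readings of
<q>or</q> concrete is sketched below.  Everything in it is an
assumption made for the sake of illustration: the module name, the
signatures <ident>Sink</ident>, <ident>ComponentInput</ident>,
<ident>PipelineOutput</ident>, and <ident>ComponentOutput</ident>, and
in particular the decision to define <q>discarded</q> as <q>having no
consumers</q>.  The spec does not define the term this way.
<eg>
module consume_demo

sig XMLDoc {}

// Hypothetical: the places into which a component output can flow
abstract sig Sink {}
sig ComponentInput extends Sink {}
sig PipelineOutput extends Sink {}

// Hypothetical: a component output carries one document and
// is consumed by zero or more sinks
sig ComponentOutput {
  doc: one XMLDoc,
  consumers: set Sink
}

pred consumed [o: ComponentOutput] { some o.consumers }
pred discarded [o: ComponentOutput] { no o.consumers }

// Inclusive reading: one output is consumed both by a component
// input and by a pipeline output.
pred both_kinds_of_consumer {
  some o: ComponentOutput |
    some (o.consumers &amp; ComponentInput)
    and some (o.consumers &amp; PipelineOutput)
}
run both_kinds_of_consumer for 3

// Exclusive reading: nothing is both consumed and discarded.
assert consume_or_discard {
  no o: ComponentOutput | consumed[o] and discarded[o]
}
check consume_or_discard for 4
</eg>
Under these assumptions the exclusive reading holds by definition,
since <ident>consumed</ident> and <ident>discarded</ident> are
complementary, while the inclusive reading is merely satisfiable.  A
model that treated discarding as an explicit action would have to
state the exclusion as a fact instead.</p>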

<p>Components can read and write both XML documents and non-XML
resources, so the reader may wonder why the inputs and outputs of
components are described as being XML documents only.  Section 2.2 of
the spec makes clear (it might perhaps be clearer in section 1, too)
that non-XML resources never flow from one component to another, and
thus never flow <emph>through the pipeline itself</emph>.  The
terms <mentioned>input</mentioned> and <mentioned>output</mentioned>
are used only for the named ports of a component, not for other
data the process might read or write.</p>
<p rend="meta-xproc">
To the extent that the <term>inputs</term> and <term>outputs</term>
of a component are restricted to XML documents, the terms
<mentioned>input</mentioned> and <mentioned>output</mentioned> are
technical terms with a specialized meaning.  <note>The discussion which
follows assumes in some places that the terms are <emph>not</emph> 
consistently restricted in this way; it may need revision.</note></p>
<p>Pipelines take their inputs from the outside world, and their
outputs flow into the outside world; how a processor manages that is
implementation-dependent.</p>
<p>Note that the Introduction's description is asymmetric; as it
describes things, a component can deal directly with the outside world
only when it comes to inputs, not for outputs.  For now, we'll
focus on modeling the data flows within a pipeline, and will
ignore any direct access to the outside world by steps within
a pipeline.</p>
<p>The data flows of a pipeline will clearly be of great interest
for a formal model.</p>
<p rend="meta-xproc">At the moment, however, sections 1-3 of the spec
don't say enough about data flows and the interconnection of 
pipeline components to allow a satisfactory model to be built.
The remarks that follow rely on knowledge acquired through other
channels, and on speculation about how XProc ought to work.</p>
<p>Several approaches to modeling data flows
can be imagined.  Based on what is said in the introduction, 
we can simply say that components have inputs and outputs, 
which can be XML documents or non-XML data:
<scrap name="Unnamed data flows (sketch)" id="flows-00">
...
abstract sig Component {
  ins: set (XMLDoc + nonXML),
  outs: set (XMLDoc + nonXML)
}  
...
</scrap>
</p>
<p>
Connections between the components of a pipeline can then
be modeled as a mapping from pipeline inputs and component outputs 
to component inputs and pipeline outputs.
<scrap name="Unnamed data flows (sketch)" prev="flows-00">
...
sig Pipeline {
  ins: set XMLDoc,
  outs: set XMLDoc,
  components: set Component,
  descendants: set Component,
  flows: (ins + descendants.@outs) ->
    (descendants.@ins + outs)
}{
  // The 'descendants' of a pipeline are all of the
  // components in its 'components' set, or in the
  // 'components' set of any descendant.
  descendants = this.^@components

  // Flows only involve XML documents.
  // I.e. the domain of flows is XMLDoc
  flows.univ in XMLDoc

  // And so is the range
  univ.flows in XMLDoc
}
...
</scrap>
</p>
<p>
Some notes on Alloy notation may be in order here.
<list type="bullets">
<item><p>The operator <code>-></code> denotes a mapping from
its left-hand argument to its right-hand argument.  Here,
<q><code>flows: (ins + descendants.@outs) ->
(descendants.@ins + outs)</code></q> says that for any 
given instance of the <ident>Pipeline</ident> signature, 
<ident>flows</ident>
is a mapping from <code>(ins + descendants.@outs)</code>
to <code>(descendants.@ins + outs)</code>.</p>
<p>Viewed in isolation, of course, any field in a signature is
already a relation from members of the signature to values of
the field; when a field is itself of type relation of arity &n;, 
its name denotes a relation of arity &n; + 1.  Here, <ident>flows</ident>
is a set of triples, with each triple consisting of
a Pipeline instance, an XML document, and an XML document.</p></item>
<item><p>The operator <code>+</code> denotes set union.</p>
<p>Here, the expression <q><code>ins + descendants.@outs</code></q>
denotes the union of the set <ident>ins</ident> defined
earlier in the signature and the set <code>descendants.@outs</code>.</p></item>
<item><p>The operator <code>@</code> acts as an escape
character; it allows the names <ident>ins</ident> and
<ident>outs</ident> to be used to refer not only to the
<ident>ins</ident> and <ident>outs</ident> of a particular
pipeline but to those of other components.  Here, the
expression <q><code>descendants.@ins</code></q> denotes the
union of the values of <ident>ins</ident> for all descendants.</p>
</item>
<item><p>For any relation &R;, the expression 
<code>^R</code> denotes the transitive closure of &R;;
here <q><code>^components</code></q> denotes
the transitive closure of the relation <ident>components</ident>.
(Not shown here is the related operator *: <code>*R</code> denotes
the reflexive transitive closure of &R;.)</p>
</item>
<item>
<p>The expression <q><code>flows.univ</code></q> denotes the
domain of the relation <ident>flows</ident>; <q><code>univ.flows</code></q>
denotes its range.  The utility functions <ident>dom</ident>
and <ident>ran</ident> could have been used instead, but
the use of <ident>univ</ident> in this way is a common
Alloy idiom.  For any binary relation &R;, 
<code>R.univ</code> and <code>univ.R</code> work in this way.
For relations of arity <ident>n</ident> > 2, they denote the
relations of arity <ident>n</ident> - 1 which result when the first, 
or the last, atom in each tuple of the original relation is
dropped.<note place="foot"><p>
The . operator performs an equi-join on two sets of tuples using the last
column of its left-hand argument and the first column of its
right-hand argument, and then
projects every column of the resulting tuples except the one
on which the join was performed.  The name <ident>univ</ident>
denotes the universal set.  So <code>univ.flows</code>
performs an equi-join on the universal set and the first column 
of the set of flows (resulting in a set containing all the
flows), and then discards the column used for the join,
leaving only the range of <ident>flows</ident>.
Fuller information on the dot operator is in the Alloy documentation.
</p></note></p>
</item>
<item><p>The second set of braces in the declaration of <ident>Pipeline</ident>
contains facts which must hold of every instance of the
signature:  writing a proposition <ident>P</ident> there
is equivalent to writing <q>For all x in Pipeline, <ident>P</ident> holds.</q>
Or in Alloy notation,<scrap name="Alternate formulation">
sig Pipeline {
  ins: set XMLDoc,
  outs: set XMLDoc,
  components: set Component,
  descendants: set Component,
  flows: (ins + descendants.@outs) ->
    (descendants.@ins + outs)
}

// The 'descendants' of a pipeline are all of the
// components in its 'components' set, or in the
// 'components' set of any descendant.
fact descendants {
  all p: Pipeline | p.descendants = p.^components
}

// Flows only involve XML documents.
// I.e. the domain of flows is XMLDoc
fact flow_domain {
  all p: Pipeline | p.flows.univ in XMLDoc
}

// And so is the range
fact flow_range {
  all p: Pipeline | univ.(p.flows) in XMLDoc
}
</scrap>
</p>
</item>
</list>
</p>
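<p rend="meta-alloy">The notational points above can be tried out in
isolation.  The following is a minimal sketch, not part of the XProc
models: the module name <ident>notation_demo</ident>, the signature
<ident>Node</ident>, and its field <ident>next</ident> are
hypothetical names chosen for the illustration.  The functions
<ident>dom</ident> and <ident>ran</ident> come from the library module
<ident>util/relation</ident>.
<eg>
module notation_demo
open util/relation

// Hypothetical signature: a node with zero or more successors
sig Node {
  next: set Node
}

// R.univ and univ.R give the domain and range of a binary relation,
// just as the library functions dom and ran do.
assert univ_idiom {
  dom[next] = next.univ
  ran[next] = univ.next
}
check univ_idiom for 4

// *R is the reflexive transitive closure: starting from n, it
// reaches n itself plus everything that ^R reaches.
assert closures {
  all n: Node | n.*next = n + n.^next
}
check closures for 4
</eg>
Both assertions restate definitions, so the analyzer should be
expected to report no counterexample for either.</p>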
<p>
This first formulation of data flows begins to get the point across, but it's 
not satisfactory for several reasons:
<list type="bullets">
<item><p>It suggests that <ident>flows</ident> is a mapping
from data sources to data sinks; in reality, what we want
is not for the output of one component to be mapped to the
input of another, but to be <emph>identical</emph> to it.</p></item>
<item><p>This first cut doesn't enunciate any of the obvious
sanity checks.  To name just one:  it doesn't forbid
mapping the output of a component to an input of the
same component.  See proposition <ptr target="acyclic..flow"/>
below.</p>
</item>
<item><p>It treats the inputs and outputs of a component as
simple undifferentiated sets of XML documents, which is
implausible:  in reality, if a component takes two input
documents they will typically have quite distinct roles
(such as: Document and Schema, as shown in Figure 1 of
the spec) and be bound to named ports.</p></item>
</list>
</p>
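<p rend="meta-alloy">The second of these weaknesses can be
demonstrated by stating the missing sanity check as an assertion.  The
sketch below is a simplified, self-contained variant of the first cut,
made under several assumptions: the pipeline fields are renamed
<ident>pins</ident> and <ident>pouts</ident> to avoid overloading
<ident>ins</ident> and <ident>outs</ident>, nesting of components is
ignored (so there is no <ident>descendants</ident> field), and the
concrete signature <ident>DemoStep</ident> is hypothetical.
<eg>
module flows_demo

sig XMLDoc {}
sig nonXML {}

abstract sig Component {
  ins: set (XMLDoc + nonXML),
  outs: set (XMLDoc + nonXML)
}

// Hypothetical concrete component type
sig DemoStep extends Component {}

sig Pipeline {
  pins: set XMLDoc,
  pouts: set XMLDoc,
  components: set Component,
  flows: (pins + components.outs) -> (components.ins + pouts)
}{
  // Flows only involve XML documents.
  flows in XMLDoc -> XMLDoc
}

// The sanity check the first cut fails to enunciate: no flow leads
// from an output of a component to an input of the same component.
assert no_self_flow {
  all p: Pipeline, c: p.components |
    no ((c.outs -> c.ins) &amp; p.flows)
}
check no_self_flow for 3
</eg>
Since nothing in the declarations rules such flows out, the analyzer
should be expected to find a counterexample; turning the body of the
assertion into a fact would be one way to exclude them.</p>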
</div>
<div>
<head>Data flows, second cut</head>
<p>The propositions seen so far (<ptr target="ex.pipelines"/> through
<ptr target="consume-or-discard"/>) do not provide
enough information to model the named ports usefully, but
if we jump ahead a little and assume some information
which does not actually occur until later, we can
provide a slightly more plausible model.  The input
and output ports of a pipeline are, in this model, 
mappings from names to XML documents.</p>
<div>
<head>Overall structure</head>
<p>The overall structure of this model is shown
in the code snippet below:
<scrap id="xproc02"
file="xproc02.als"
name="Pipelines and components with named ports">

<ref target="xp2.module">Module declaration and imports</ref>
<ref target="xp2.opaque">Primitives / unanalysed signatures</ref>
<ref target="xp2.components">Components</ref>
<ref target="xp2.pipelines">Pipelines</ref>
<ref target="xp2.specifics">Specific component types</ref>
<ref target="xp2.commands">Sanity checks</ref>

</scrap></p>
</div>
<div>
<head>Module declaration</head>
<p>For reference, let's call this model <ident>xproc02</ident>.
We import the library file <ident>util/relation</ident> in
order to make the <ident>dom</ident> and <ident>ran</ident>
functions available.
<scrap id="xp2.module" name="Module declaration and imports">
module xproc02
open util/relation as R
</scrap>
</p>
</div>
<div>
<head>Primitive unanalysed types</head>
<p>As before, we treat XML documents and non-XML resources as
primitive, unanalysed types; names, which are new in this model,
are treated the same way:
<scrap id="xp2.opaque" name="Primitives / unanalysed signatures">
// Some primitive, unanalysed signatures: 
// names, documents, resources
sig Name {}
sig XMLDoc {}
sig nonXML {}
</scrap>
</p>
</div>
<div>
<head>Components and their structure</head>
<p>Components all have inputs and outputs, which are functional
mappings from names to XML documents:
<scrap id="xp2.components" name="Components">
// Any pipeline component has inputs and outputs.
abstract sig Component {
  ins: Name -> lone XMLDoc,
  outs: Name -> lone XMLDoc
}{
  // The names of input and output ports are disjoint.
  // (At least I think they are; is this true?  See 2.1p6.)
  no dom[ins] &amp; dom[outs]
  
  // No document is simultaneously an input and an output
  // for the same component.  See 2p1.
  // (Actually this is weaker than 2p1.  In fact,
  // data flows are acyclic.)
  no ran[ins] &amp; ran[outs]
}
</scrap>
The keyword <kw>lone</kw> in the expression
<q><code>ins: Name -> lone XMLDoc</code></q> means that any Name maps to
at most one XMLDoc.
</p>
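<p rend="meta-alloy">The effect of <kw>lone</kw> can be confirmed
mechanically.  What follows is only a sketch, assuming a scratch
file which can open <ident>xproc02</ident>; the assertion name
<ident>ports_are_functional</ident> is invented for the
illustration and is not part of the model.
<eg>
open xproc02 as XP

// Hypothetical check: under any one port name, a component
// holds at most one input document and at most one output document.
assert ports_are_functional {
  all c: Component, n: Name |
    lone c.ins[n] and lone c.outs[n]
}
check ports_are_functional for 3
</eg>
If the declarations read <code>Name -> XMLDoc</code>, with no
multiplicity keyword, a name could map to several documents, and
one would expect the same check to turn up a counterexample.</p>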
<p>
In the scrap above, signature facts are provided (in the second pair of braces)
to specify relevant invariants.
First, no name can be used both for an input and
for an output port.  (I don't know if this is actually true
in the current design of XProc, but it seemed useful in
avoiding confusion.)  The expression <q><code>no dom[ins] &amp; dom[outs]</code></q>
could also be expressed in a style more like ordinary
predicate calculus:
<eg>all n: Name | (n in dom[ins] => n not in dom[outs])
  and (n in dom[outs] => n not in dom[ins])</eg>
or 
<eg>all n: Name | not (n in dom[ins] and n in dom[outs])</eg>
The expression <q><code>no dom[ins] &amp; dom[outs]</code></q>
has the general form <q><code>no</code> <ident>Expr</ident></q>,
which means <gloss>the set described by 
expression <ident>Expr</ident> has cardinality 0</gloss>.
Here, that set is the intersection of <code>dom[ins]</code>
and <code>dom[outs]</code>.</p>
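<p rend="meta-alloy">Whether the relational formulation and the
predicate-calculus formulation really say the same thing is a
question the analyzer can be asked directly.  The following sketch
uses a stand-in signature <ident>Box</ident> (a made-up name) which has the
same two fields as <ident>Component</ident> but none of its invariants,
so that neither side of the equivalence is true merely because
a signature fact already imposes it.
<eg>
open util/relation as R

sig Name {}
sig XMLDoc {}

// Hypothetical stand-in for Component, with no signature facts.
sig Box {
  ins: Name -> lone XMLDoc,
  outs: Name -> lone XMLDoc
}

// The relational style and the quantified style should agree
// for every Box.
assert same_constraint {
  all b: Box |
    (no dom[b.ins] &amp; dom[b.outs])
    iff
    (all n: Name | not (n in dom[b.ins] and n in dom[b.outs]))
}
check same_constraint for 3
</eg>
Outside a signature fact there is no implicit <code>this</code>,
which is why the fields are written <code>b.ins</code> and
<code>b.outs</code> here.</p>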
<p>Second, since no component can consume its own output,
no document is both an input and an output for the component.
(If we want this really to be true, we will have an
interesting time explaining what the Identity step which all
XProc implementations must support is actually doing.  For
now, I'll suppose that the so-called <soCalled>Identity</soCalled>
step actually produces an XML document equivalent and isomorphic to, but not
technically identical to, its input.
This is not the place to work out notions of XML document
identity.  To the extent that we wish to use the 
<ident>XMLDoc</ident> signature to represent data flows
from one component to another, it has more stringent identity
conditions than do XML documents.)
</p>
</div>
<div>
<head>Steps</head>
<p>There are two (disjoint) kinds of component, in the current
spec called <soCalled>steps</soCalled> and 
<soCalled>constructs</soCalled>.
</p>
<p rend="meta-xproc">It is about as much as I can do to force
myself to use this terminology, instead of using the terms
<mentioned>step</mentioned> (for any <soCalled>component</soCalled>),
<mentioned>atomic step</mentioned>, and <mentioned>compound
step</mentioned>.  This would free the term <mentioned>construct</mentioned>
to refer to syntactic objects, which is its natural
interpretation, instead of whatever we conceive components
to be.  In some cases, I am not sure I have succeeded in
forcing myself to use the standard terms.
A future revision of this paper may abandon this
attempt to align with the current terminology and adopt
a Superior and Cleaner Terminology throughout.</p>
<p>Steps have no interesting substructure (although
different kinds of steps will have different input/output
signatures; see below).  The following scrap models
propositions <ptr target="ex.steps"/>, 
<ptr target="component..step"/>, and 
<ptr target="atomic..step"/>.
<scrap prev="xp2.components">
// Steps (atomic components) have no further internal 
// structure, just inputs and outputs.
abstract sig Step extends Component {}
</scrap>
</p>
</div>
<div>
<head>Constructs</head>
<p>Compound steps, however (i.e. <soCalled>constructs</soCalled>),
contain other components.  We define the relation
<ident>components</ident> for the components a construct
directly contains, and the auxiliary relation
<ident>descendants</ident> (you would think it would
not be hard to find a better name than that) for
all components contained directly or indirectly
within a particular construct.
The following scrap models propositions
<ptr target="ex.constructs"/>,
<ptr target="component..construct"/>, and
<ptr target="include..construct.components"/> through
<ptr target="contain..construct.components"/>.

<scrap prev="xp2.components">
// Constructs (compound components), however, have
// nested components
abstract sig Construct extends Component {
  components: set Component,
  descendants: set Component
}{
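  // Everything reachable from this construct by one or more
  // containment steps.  (@components is the whole relation,
  // not this.components.)
  descendants = this.^@components
}
</scrap>
It is the expression <q><code>descendants = this.^@components</code></q>
which gives meaning to the <ident>descendants</ident> relation:
the transitive closure of the whole <ident>components</ident> relation
(the <code>@</code> suppresses the implicit <code>this</code>),
applied to the construct at hand.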
</p>
<p rend="meta-alloy">If there is a way to <soCalled>initialize</soCalled>
the <ident>descendants</ident> relation as part of its declaration,
instead of in a separate signature fact, I don't currently see it.</p>
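<p rend="meta-alloy">One possibility, offered only as a sketch, is
to compute the set with a function instead of storing it in a field.
The names <ident>descendantsOf</ident> and
<ident>descendants_as_described</ident> below are invented for the
illustration; the scratch file is assumed to be able to open
<ident>xproc02</ident>.
<eg>
open xproc02 as XP

// Hypothetical alternative to the descendants field: a function
// returning everything contained, directly or indirectly, in c.
fun descendantsOf (c: Construct) : set Component {
  c.^components
}

// The field and the function should describe the same set.
assert descendants_as_described {
  all c: Construct | c.descendants = descendantsOf[c]
}
check descendants_as_described for 3
</eg>
A counterexample to this assertion would indicate that the signature
fact and the prose definition of <ident>descendants</ident> have
drifted apart.</p>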
<!--*
<p>At the abstract level, we need to specify that no
component can be its own descendant.  (This follows
naturally and necessarily from other invariants, if
we regard components as being XML elements, rather than just being
represented by XML elements.  But the current draft of
XProc postulates not just XML elements, but also components
which are not the same as XML elements, although it does not
make any serious attempt to clarify their nature.  At the
component level, we need to rule out cycles explicitly.)
<scrap name="Containment is acyclic" prev="xp2.components">
fact no_self_containment {
  no c: Component | c in c.descendants
}
</scrap>
</p>
<p rend="meta-alloy">It would seem more natural to express this
invariant as a signature fact.  When I tried, though, I was
unsuccessful in getting the Alloy Analyzer to accept the model;
I did not record and no longer recall the error messages.
If there <emph>is</emph> a way to put this invariant into
the declaration for the <ident>Construct</ident> signature,
I would like to use it, but I don't currently see it.</p>
*-->
</div>
<div>
<head>Pipelines</head>
<p>Finally, we distinguish a Pipeline as a special kind of
construct.
<scrap id="xp2.pipelines" name="Pipelines">
sig Pipeline extends Construct {}
</scrap>
</p>
</div>
<div>
<head>Specific kinds of step</head>
<p>As in the first model, we have declared Step as
abstract, in order to specify individual types of
steps separately in more detail.</p>
<p>An XInclude step, for example, takes one input
and produces one output:
<scrap id="xp2.xinclude" name="The XInclude step type">
sig XInclude extends Step {}{
  some document : Name
      | some X : XMLDoc
         | ins = ( document -> X )

  some result : Name
      | some X : XMLDoc
         | outs = ( result -> X ) 
}
</scrap>
</p>
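<p rend="meta-alloy">A quick way to confirm that the signature fact
says what the prose says is to assert the port counts and let the
analyzer look for exceptions.  This is a sketch only; the assertion
name is made up, and the scratch file is assumed to be able to open
<ident>xproc02</ident>.
<eg>
open xproc02 as XP

// Hypothetical check: every XInclude step has exactly one
// (name, document) pair on each side.
assert xinclude_one_in_one_out {
  all x: XInclude | one x.ins and one x.outs
}
check xinclude_one_in_one_out for 3
</eg>
Applied to a binary relation such as <code>x.ins</code>, the keyword
<kw>one</kw> requires that the relation contain exactly one pair.</p>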
<p>A Validate step, by contrast, takes two inputs,
and produces one output. 
<scrap id="xp2.validate" name="The Validate step type">
sig Validate extends Step {}{
  some document, schema : Name 
      | some X1, X2 : XMLDoc
      | disj[document, schema] 
         // N.B. the Names are disjoint, not necessarily the documents
         and ins = ( document -> X1 + schema -> X2 )

  some result : Name
      | some X : XMLDoc
      | outs = ( result -> X ) 
}
</scrap>
</p>
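<p rend="meta-alloy">The corresponding sanity check for Validate can
be written without integer arithmetic by asking for two distinct
port names.  As before, this is a sketch under the assumption that
<ident>xproc02</ident> can be opened from a scratch file; the
assertion name is invented.
<eg>
open util/relation as R
open xproc02 as XP

// Hypothetical check: every Validate step has exactly two
// distinct input port names and exactly one output pair.
assert validate_two_in_one_out {
  all v: Validate |
    (some disj a, b: Name | dom[v.ins] = a + b)
    and one v.outs
}
check validate_two_in_one_out for 3
</eg>
Because <ident>ins</ident> is declared with <kw>lone</kw>, two
distinct names in its domain mean exactly two input pairs, although
(as the comment in the scrap notes) the two documents themselves
need not be distinct.</p>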
<p>The other step types are left for later elaboration; for
now, we use the following place-holders.
<scrap id="xp2.specifics" name="Specific component types">
sig Identity extends Step {}
sig XSLT extends Step {}
<ptr target="xp2.validate"/>
<ptr target="xp2.xinclude"/>
sig Serialize extends Step {}
sig Parse extends Step {}
sig Load extends Step {}
sig Store extends Step {}
sig ExtensionStep extends Step {
  type : Name
}
</scrap>
</p>
</div>
<div>
<head>The <ident>show</ident> predicate</head>
<p>Finally, we define a <ident>show</ident> predicate which can be
used to find instances of the model described.  (Inability to find an
instance may indicate a contradiction in the model.)<note
place="foot">In some cases, however, inability to find an instance
turns out only to indicate that the default scope on the <kw>run</kw>
command is too small to meet the constraints imposed by a predicate.
This is particularly true when a predicate asks for four disjoint
instances of the same signature; there will be no instances of such a
predicate in the default scope of three.</note> In an attempt to make
the instance more interesting, we specify that the inputs and outputs
of the pipeline should not be empty.
<scrap id="xp2.commands" name="Sanity checks">
pred show (p: Pipeline) {
  some p.ins
  some p.outs
  Component = p + p.^components
}
run show for 3 
</scrap>
</p>
<p>The predicate <ident>cyclic</ident> generates an instance
with a cyclic containment relation (a component which
contains itself directly or indirectly).
<scrap prev="xp2.commands" name="Sanity checks, cont'd">
pred cyclic (p: Pipeline) {
  some p.ins
  some p.outs
  some c: Component | c in c.^components
}
run cyclic for 3 
</scrap>
The predicate <ident>acyclic</ident> generates an instance
in which no component
contains itself directly or indirectly.
<scrap prev="xp2.commands" name="Sanity checks, cont'd">
pred acyclic (p: Pipeline) {
  some p.ins
  some p.outs
  no c: Component | c in c.^components
}
run acyclic for 3 
</scrap>
</p>
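<p rend="meta-alloy">The two predicates ask whether cyclic and acyclic
instances exist.  The complementary question, whether the model
<emph>forbids</emph> cyclic containment, is put with an assertion
and a <kw>check</kw> command.  The sketch below assumes a scratch
file which opens <ident>xproc02</ident>; the assertion name is
invented.
<eg>
open xproc02 as XP

// Hypothetical check: no component contains itself,
// directly or indirectly.
assert containment_is_acyclic {
  no c: Component | c in c.^components
}
check containment_is_acyclic for 3
</eg>
Since the <ident>cyclic</ident> predicate finds an instance, one
would expect this check to report a counterexample for as long as
the model carries no fact ruling out self-containment.</p>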
<note type="block">To do: Define further predicates to generate 
particular kinds of instances of this model:
empty pipelines, pipelines with only simple components (steps, no constructs),
pipelines with some inputs and outputs,
universes in which every component is in some pipeline,
universes in which some components exist outside pipelines,
pipelines with three children,
pipelines with six children.</note>
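<p rend="meta-alloy">A few of these predicates might look something
like the following.  This is a sketch, not part of the model as
tangled: the predicate names are invented, and the scratch file is
assumed to be able to open <ident>xproc02</ident>.
<eg>
open xproc02 as XP

// An empty pipeline: no contained components at all.
pred empty_pipeline (p: Pipeline) {
  no p.components
}
run empty_pipeline for 3

// A pipeline with children, all of them steps (no constructs).
pred steps_only (p: Pipeline) {
  some p.components
  p.components in Step
}
run steps_only for 3

// A universe in which every component other than a pipeline
// is directly contained in some construct.
pred all_contained {
  all c: Component - Pipeline |
    some k: Construct | c in k.components
}
run all_contained for 3
</eg>
The predicates asking for a fixed number of children would need a
larger scope than three, for the reason given in the footnote above.</p>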
</div>
</div>
<div>
<head>Example 1</head>
<p>Figure 1 of the spec shows a simple pipeline with
two steps:  an XInclude step whose output goes to
a Validate step.  Each is represented by a box, as is
the entire pipeline.  The boxes representing the individual steps
are decorated with male and female connectors indicating
input (female) and output (male) ports.
Inputs labeled <q>Document</q>
and <q>Schema</q> flow from outside the outer box
to input connectors on the two steps; the output of
the XInclude step flows to the other input of the Validate
step.  The output of the Validate step flows out of the
pipeline.
<figure entity="sch-xinclude-validate-pipeline" rend="50%">
<head>Figure 1 of the spec.</head>
</figure>
</p>
<p rend="meta-xproc">The diagram does not show
connectors (ports) on the pipeline, so it may be read
as indicating that the XInclude step takes its input
direct from the Web, rather than from a named pipeline
input, and similarly for the schema input to the 
Validate step.  The text accompanying the figure
makes clear, however, as does the XML version of the pipeline
in appendix E.1 of the spec, that Schema and Document are
pipeline inputs and that the result document of
the Validate step is the result document of the
pipeline.</p>
<p rend="meta-xproc">The diagram should perhaps be redrawn to
make the ports on the pipeline explicit.</p>
<p>
It would be useful to ensure that the figure as shown
is consistent with the model we have defined.
We can do this by defining an Alloy predicate
which describes an instance of the model, and
by imposing constraints on that instance which
correspond to the properties of the pipeline shown
in the figure.</p>
<p rend="meta-alloy">Describing a specific instance in
this way is not a focus of any Alloy documentation I have
seen.  It may count as software abuse.</p>
<p rend="meta-alloy">
In the usual mode of operation, the modeler just
describes the model, not particular instances, and
lets Alloy handle the business of generating instances.
The generation of instances is indeed one of Alloy's 
particular strengths compared with other modeling 
tools.  But it's not uncommon to require, in an Alloy predicate,
some particular properties that an instance of the model may
possess, so as to be able to focus for a while on instances with
those properties.  As a side effect, of course, the 
generation of model instances which satisfy the constraints of
a given predicate also establishes that the constraints are
consistent with the model; otherwise, no such instance could
exist.  In attempting to induce Alloy to re-draw the
pipeline of Figure 1, we are simply pushing a little
harder in that direction than is usual in exploratory
modeling:  we describe constraints which should
result in a pipeline isomorphic to that of Figure 1,
and we ask Alloy to show us some examples.
If they don't look like Figure 1, then something is
wrong, and probably some constraints are missing either
from our general model or from the statement of the
salient properties of the pipeline in the figure.
</p>
<p>We begin by declaring the module and importing
the <ident>relation</ident> utility.
<scrap name="Checking figure 1 against the model" id="xp2.fig1" 
file="xproc02.fig1.als">
module xproc_figure01
open util/relation as R
</scrap>
Then we import the <ident>xproc02</ident> model.
There's no particular reason we need to have that
model and this work on figure 1 in separate files, but
it feels cleaner this way.  This way the xproc02
file just focuses on the model proper, and not on
the attempt to replicate Figure 1.
<scrap name="Import xproc02 model" prev="xp2.fig1">
open xproc02 as XP
</scrap>
</p>
<p>As an experiment, we'll define some subtypes of
Name.  This isn't strictly necessary, but since
the Alloy visualizer uses type names as labels in 
various places, this will help make the mapping
from names to documents clearer.  (Individual
mappings will be labeled <q>ins[Document]</q>
and <q>ins[Schema]</q>, for instance, instead of <q>ins[Name.0]</q>
and <q>ins[Name.1]</q>.)
<scrap name="Define subtypes of Name" prev="xp2.fig1">
// First, define some specific names (these aren't 
// essential, but they are convenient)
one sig Document, Schema, Input, Result extends Name {}
</scrap>
As the images below show, the experiment seems to have
worked; the diagrams are somewhat easier to read with
these labels.<note place="foot"><p rend="meta-alloy">It
may be possible, I now realize, to get the same effect by tweaking
the visualizer parameters.  That is probably cleaner.</p></note></p>
<p>Next, we define an Alloy predicate named
<ident>Figure_1</ident>.  Within this predicate, 
we will need to refer to the pipeline itself,
to the two steps in the pipeline, and to various
port names.  After the definition of the predicate,
we include the command <code>run Figure_1 for 3 but
4 XMLDoc</code>, which stipulates that the
instance generated by Alloy may have at most
three atoms for any signature<note place="foot">In
the case of the subtypes of Name, the instances of
Document, Schema, Input, and Result appear not to
count against the budget for Name.  How the
scope rules count atoms which can be members of overlapping
signatures I do not know.</note>, except for XMLDoc,
which may have four.  (We want the two pipeline inputs and
the outputs of the two steps to be distinct documents.)
<scrap name="Define Figure_1 predicate" prev="xp2.fig1">
// Now define the figure itself.
// It requires a pipeline, two components, and four names
// for specifying the data flows.
pred Figure_1 (p: Pipeline, 
     x: XInclude, 
     v: Validate,
     document: Document,
     schema: Schema, 
     input: Input, 
     result: Result
) {

<ref target="xp2f1.compstruc">Constrain component structure</ref>
<!--* <ref target="xp2f1.names">Ensure that port names are disjoint</ref> *-->
<ref target="xp2f1.ports">Describe the port wiring</ref>
}
run Figure_1 for 3 but 4 XMLDoc
</scrap></p>
<p>Within the body of the predicate, we specify first that
the pipeline should contain exactly the two components
<ident>x</ident> and <ident>v</ident>:
<scrap id="xp2f1.compstruc" name="Constrain component structure">
   // The XInclude and validation components are 
   // in the pipeline. 
   // Nothing else is in the pipeline
   p.components = (x + v)
</scrap>
</p>
<!--* <p>
<scrap id="xp2f1.names" name="Ensure that port names are disjoint">

   // The names are disjoint; that follows from the
   // declarations, and so does not need to be checked.
   // (Just as well; the following line raises an index
   // out of bounds error in Java.)
   // disj[document, schema, input, result]


</scrap>
</p>*-->
<p>Next we describe how the data flows.  We
declare four variables of type XMLDoc, and specify
that the input and output ports of the components must
map to those documents in a particular way.
The input ports of the pipeline, for example, are 
named Document and Schema<note place="foot">That's
a slight oversimplification:  the names are not actually 
specified here.  What is specified is actually that one
port has a name of type Document, and one of type Schema.
Since Schema and Document are declared as singleton sets,
each has just one member; it does no harm here if we imagine
that the one member of the set named Document is the
name <q>Document</q>, or speak as if we imagined it.</note>
and they map to two XML documents which we here call
<ident>Doc</ident> and <ident>Sch</ident>.  The <ident>outs</ident>
field of the pipeline contains a single mapping,
from the name Result to the XML document Res.  And so on.
The flow of data from one component (or port) to another
is expressed only by identity of the XML document 
involved.</p>
<p rend="meta-alloy">As noted in the comment, it's not clear
to me at this writing whether it's better to declare
the four XML documents here, or in the list of declarations
for the predicate.  As far as I can tell, the only 
semantic difference is the scope of the declarations.
<!--* (not true, this was a misconception)
The visualizer does treat them differently, though:  
variable names declared as arguments of the predicate
can be used to label the atoms which instantiate them,
while variables scoped to a single expression, like the
XML documents here, are not available for use as labels
in the same way. *-->
</p>
<p>We also specify that the four variables of type XMLDoc
must be disjoint &mdash; in other words, we want four
documents, not just four variables.
<scrap id="xp2f1.ports" name="Describe the port wiring">
   // There are four documents needed.  (I don't currently
   // see whether it's better to declare these above, or here.)
   // This is one way to define the data flows.  
   // There are others.
 
   some Doc, Sch, Tmp, Res: XMLDoc {

      // For simplicity, we specify that the four documents 
      // are pairwise disjoint.  That's not actually required
      // by the figure or by the nature of the case.  But
      // reasoning about it would require a better definition
      // of XML document identity than I want to bother
      // with now.

      disj[Doc, Sch, Tmp, Res]

      // How the connections work

      p.ins = ( document -> Doc + schema -> Sch )
         and p.outs = ( result -> Res)

         and x.ins = ( document -> Doc )
         and x.outs = ( result -> Tmp )

         and v.ins = ( document -> Tmp + schema -> Sch)
         and v.outs = ( result -> Res ) 

   }
</scrap>
</p>
<p>The same effect can be had, without specifying names for the 
four documents, in the following slightly more compact 
formulation:
<scrap id="xp2f1.ports.bis" name="Describe the port wiring">
  // Ports on the pipeline
  // N.B. ports on the components are described in xproc02
  dom[p.ins] = document + schema
  dom[p.outs] = result

  // Data flows.  Here is another way to define the flows.

  x.ins[document] = p.ins[document] 
  v.ins[document] = x.outs[result]
  v.ins[schema] = p.ins[schema]
  p.outs[result] = v.outs[result]

  disj[p.ins[document], p.ins[schema], 
       p.outs[result], x.outs[result]]
</scrap>
The first two lines specify the input and output ports on the
pipeline.  
The next four lines show the flow from data sources to sinks,
using the convention that sinks are on the left and sources
on the right.  The final constraint requires that there be four
distinct XML documents.</p>
<p>Alloy generates a diagram to help visualize the instance
it finds for the <ident>Figure_1</ident> predicate.  
If a little effort is put into customizing the display,
the result can be made fairly legible:
<figure entity="fig01.v2e" rend="100%">
<head>The pipeline of figure 1, as an instantiation of the xproc02 model.</head>
</figure>
Some manual repositioning of the document boxes and process
ovals in an image editor can make it slightly easier to see
the flow of the data.<note place="foot">The figure shown here 
was created by saving the Graphviz input generated by Alloy 
in a <soCalled>dot file</soCalled>, and then cleaning the image 
in an image editing program (OmniGraffle 3.2).  It would
appear that there may be some interoperability issues
concerning colors and shape filling in the Graphviz
notation.  Fortunately, for our purposes the variation can
be ignored.</note>  (If the type is too
narrow to read, a wider browser window will allow it to
become larger.)
<figure entity="fig01.v2f" rend="100%">
<head>The pipeline of figure 1, as an instantiation of the xproc02 model.</head>
</figure>
<!--*
A modified
form of that diagram is this:
<figure entity="fig01.v1" rend="100%">
<head>The pipeline of figure 1, as an instantiation of the xproc02 model.</head>
</figure>
*--></p>
<p>The graphic notation used here is, of course, different from
that used in the XProc spec.  But the information content is,
I believe, much the same; we have succeeded in showing that
the pipeline shown in Figure 1 is a legal instance of the pipeline
model <ident>xproc02</ident>.</p>

</div>
<div>
<head>Example 2</head>
<p>Having succeeded with Figure 1, it is tempting to attempt a 
similar effort with Figure 2 of the XProc spec.  A glance at the
image makes clear, however, that the models developed
so far are not nearly complete enough to generate any instance
remotely like this one.  
<figure entity="sch-transform" rend="50%">
<head>Figure 2 of the spec.</head>
</figure>
The <ident>Document</ident> input splits into two data streams;
the fork has no box in the diagram and is not a component. We
currently have no words or concepts to describe what is happening
here.  (Another apparent difference is only illusory: the
outputs of the two Validate steps flow together, and the point at
which they join is not a component.  But this can be described
using the concepts already introduced: the two data flows go
into the same data sink.)</p>
<p>We will come back to Figure 2 later.</p>
<p rend="meta-xproc">The constructs of Figure 2 go not only beyond
the Alloy models developed and sketched above; they also go
beyond anything section 1 of the spec has prepared us for.
In particular, the conditional direction of flow either
into V1 validation or V2 validation, depending on a property
of the input document, appears to involve entities of kinds
not yet mentioned in the exposition.  There is no reason they
would need to be mentioned in section 1 of the spec, but
nothing said in the other sections devoted to the abstract model
touches on them, either.  In the current draft of the
spec, the abstract model of XProc does not seem to be defined 
in enough detail to make it possible to describe what is happening
in Figure 2.</p>
<p rend="meta-xproc">This is one reason the author believes
that the XProc spec would do better to take a different approach
to the definition of pipelines.</p>
</div>

</div>

<div id="concepts">
<head>Pipeline concepts</head>

<div>
<head>Pipeline connections and evaluation</head>
<p>Section 2 paragraph 1:
<q type="block">

<p>[Definition: A pipeline is a set of components connected together,
with outputs flowing into inputs, without any loops (no component can
read its own output, directly or indirectly).] A pipeline is itself a
construct and must satisfy the constraints on constructs.
</p>
</q>
</p>

<list type="propositions">
<item>Connections [or: data flows] exist. (2p1)</item>
<item>Components may connect to components. (2p1)</item>
<item id="def.pipeline">A pipeline is a set of components connected together. 
[Or: A pipeline is a pair (C, F), where
C is a set of components and 
F is a relation C &rarr; C.] (2p1;
cf. prop. <ptr target="ex.pipelines"/>)</item>
<item>[If a pipeline consists of components X, Y, ... Z, then 
that pipeline is the set {X, Y, ... Z} connected together, and vice versa.] (2p1;
cf. prop. <ptr target="def.pipeline"/>)</item>
<item>[A connection between components is a data flow.] (2p1)</item>
<item id="out.to.in">?! In a data flow, an output flows to an input. 
(Or perhaps: &iquest;Data flows connect data sources to data sinks?)
(2p1)</item>
</list>
<p rend="meta-xproc">Note that as reflected in proposition <ptr
target="out.to.in"/>, the spec as written does not cover the case of
[pipeline] input flowing to [component] input or [component] output to
[pipeline] output.  The spec might be clearer if it consistently
reserved the words <mentioned>input</mentioned> and
<mentioned>output</mentioned> for named ports and used terms
like <mentioned>(data) source</mentioned> for possible
sources of data (pipeline inputs and component outputs)
and <mentioned>(data) sink</mentioned> for possible consumers of
data (pipeline outputs and component inputs).</p>
<list type="propositions">
<item id="acyclic..flow">Data flow, viewed as a relation on
components, is acyclic.  That is, no component ever
reads or can read its own output, directly or indirectly. (2p1)</item>
<item>Every pipeline is a construct. (2p1)</item>
</list>
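<p rend="meta-alloy">In terms of the <ident>xproc02</ident> model,
proposition <ptr target="acyclic..flow"/> could be approximated by
deriving a relation on components from shared documents and asking
whether it is acyclic.  The sketch below makes several assumptions:
the names <ident>flow</ident> and <ident>flow_is_acyclic</ident> are
invented, the scratch file can open <ident>xproc02</ident>, and a
data flow is taken to exist wherever an output document of one
component is an input document of another.
<eg>
open util/relation as R
open xproc02 as XP

// Hypothetical derived relation: a flows to b when some document
// is both an output of a and an input of b.
fun flow : Component -> Component {
  { a, b: Component | some ran[a.outs] &amp; ran[b.ins] }
}

// 2p1: no component reads its own output, directly or indirectly.
assert flow_is_acyclic {
  no c: Component | c in c.^flow
}
check flow_is_acyclic for 3
</eg>
Because <ident>xproc02</ident> forbids only the direct case (no
document is both input and output of the same component), this
check may well find a counterexample involving two or more
components.  Note also that the relation as defined here sees only
output-to-input flows; pipeline input passed to a component input
shares a document between two <ident>ins</ident> fields and is not
captured.</p>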

<p>Section 2 paragraph 2:
<q type="block">
<p>
The result of evaluating a pipeline is the result of evaluating the
components that it contains, in the order determined by the
connections between them. A pipeline must behave as if it evaluated
each component each time it occurs. Unless otherwise indicated,
implementations must not assume that components are functional (that
is, that their outputs depend only on their explicit inputs and
parameters) or side-effect free.</p></q>
</p>
<p rend="meta-xproc">What does <q>each time it occurs</q> mean?
How could it be made precise?</p>

<list type="propositions">
<item>A pipeline may be <term>evaluated</term>. (2p2)</item>
<item>[[The result of evaluating something is its <soCalled>value</soCalled>.]] (2p2)</item>
<item>The value of a pipeline is the <q>result of evaluating 
[its] components, in the order determined by the connections
between them</q>. (2p2)</item>
<item>The connections between components determine an order. (2p2)</item>
<item>Q: What does <q>the order determined by the connections
between [components]</q> mean?  Is it a total or partial
order? (2p2)</item>
<item>Components may produce output that depends on things other
than their inputs and parameters (i.e., components need
not be mathematical functions). 
(2p2)</item>
<item>Components may produce side effects. (2p2)</item>
<item>[[If X and Y produce side effects when evaluated, the order 
in which they are evaluated is detectable.  Components may
produce side effects.  It follows that the order in which components
are evaluated may be detectable.]] (2p2)</item>
</list>
</div>
<div>
<head>Steps, constructs, and subpipelines</head>

<!--*
<p>Section 2.1 paragraph 1:
<q type="block"> 
<p>Steps are the basic computational units of a pipeline. [Definition:
A step is an atomic component that performs a unit of XML processing,
such as XInclude or transformation.] Steps can perform arbitrary
amounts of computation but they are indivisible from the point of view
of the construct that contains them. Steps carry out fundamental XML
operations. An XSLT step, for example, performs XSLT processing; a
validation step validates one input with respect to some schema, etc.
</p>
</q>
</p>

<p>Section 2.1 paragraph 2:
<q type="block">
<p>
Language constructs, on the other hand, control and organize the flow
of documents through a pipeline, reconstructing familiar programming
language functionality such as conditionals, iterators and exception
handling. As such, they typically contain components, whose evaluation
they control.
</p>
</q>
</p>
*-->
<!--*
<p>Section 2.1 paragraph 3:
<q type="block">
<p>
[Definition: A construct is a component that contains additional
components. That is, a construct differs from a step in that its
semantics are at least partially determined by the components that it
contains.]
</p>
</q>
</p>
*-->
<p>Section 2.1 paragraph 4 reads:
<q type="block">
<p>
Every construct contains zero or more components. [Definition: The
components that occur directly inside a construct are called contained
components.] [Definition: A construct which immediately contains a
component is called its container.]
</p>
</q>
</p>
<p><list type="propositions">
<item>Each construct contains zero or more components. (2.1p4)</item>
<item>[X can contain Y directly or indirectly.] (2.1p4)</item>
<item>[X contains Y indirectly if there is some construct Z
such that X directly contains Z and Z contains Y directly or indirectly.] (2.1p4)</item>
<item>Containers exist. (2.1p4)</item>
<item>Construct X is the <term>container</term> of component Y if and only 
if X directly contains component Y. (2.1p4)</item>
<item>[Every container is a construct; every construct that contains at least
one component is a container.] (2.1p4)</item>
<item>Contained components exist. (2.1p4)</item>
<item>A component X is a <term>contained component</term> if there is some 
component Y which directly contains X. (2.1p4)</item>
<item>&iquest;Every component in a pipeline, except possibly the pipeline
itself, is a contained component? (2.1p4)</item>
<item>Q: Do components exist outside of pipelines?  Is every component
that is not a pipeline a contained component? (2.1p4)</item>
</list>
</p>
<p>Section 2.1 paragraph 5:
<q type="block">
<p>
[Definition: The components (and the connections between them) within
a container form a subpipeline.] Each construct determines how and
which, if any, of its subpiplines is evaluated.
</p>
</q>
</p>
<list type="propositions">
<item>Subpipelines exist.</item>
<item>Q: Is a subpipeline a pipeline?</item>
</list>
<p>Note that section 3.1 of the spec makes clear that subpipelines
are not pipelines:  <q>a pipeline <hi rend="bold">must not</hi> itself be a contained
component.</q></p>
<list type="propositions">
<item>The set of components contained &iquest;directly or indirectly? by
a construct, together with their connections, constitutes a <term>subpipeline</term>.
(2.1p5)</item>
<item id="ct..construct.subpipeline.eq.1">[It follows that 
each construct contains one subpipeline.]
(2.1p5)</item>
<item>Each construct determines which of its subpipelines to evaluate.
(2.1p5)</item>
<item id="ct..construct.subpipeline.eq.n">[It follows that 
each construct may contain more than one subpipeline.]
(2.1p5)</item>
<item>When a construct contains more than one subpipeline, one or 
more of the subpipelines may remain unevaluated; one or more may be
evaluated. (2.1p5)</item>
</list>
<p rend="meta-xproc">Note that propositions <ptr target="ct..construct.subpipeline.eq.1"/>
and <ptr target="ct..construct.subpipeline.eq.n"/> contradict each
other.  This leads to a question:</p>
<list type="propositions">
<item>Q: Can constructs contain just one subpipeline, or several? (2.1p5)</item>
</list>
<p>Section 2.1 paragraph 6:
<q type="block">
<p>
Steps and constructs have &ldquo;ports&rdquo; into which inputs and outputs are
connected. Each component has a number of input ports and a number of
output ports, all with unique names. A component can have zero input
ports and/or zero output ports. (All components have an implicit
standard output port for reporting errors that must not be declared.)
</p>
</q>
</p>
<list type="propositions">
<item>Ports exist. (2.1p6)</item>
<item>Names exist. (2.1p6)</item>
<item>Every port has a name. (2.1p6)</item>
<item id="uniq.portname">?! All ports have unique names. (What does this mean?)  (2.1p6)</item>
<item>&iquest;No two names are identical.
Therefore, every name is unique.
Therefore, every port which has a name, has a unique name? (2.1p6)
</item>
<item>&iquest;No two ports in the universe have the same name? 
(2.1p6)</item>
<item>&iquest;No two ports on the same component have the same name? 
(Is this what <q>all with unique names</q> means?)  (2.1p6)</item>
<item>&iquest;No two ports in the same pipeline have the same name? 
(2.1p6)</item>
<item>Some ports are input ports. (2.1p6)</item>
<item>Some ports are output ports. (2.1p6)</item>
<item>&iquest;Every port is an input port or an output port? (2.1p6)</item>
<item>&iquest;No port is both an input port and an output port? (2.1p6)</item>
<item>Each component has zero or more input ports. (2.1p6)</item>
<item id="c.zm.output">Each component has zero or more output ports. (2.1p6)</item>
<item id="ex.stderr">Each component has a predefined output port for error
reporting. (Section 3.6 of the spec supplies the name <q><code>#error</code></q>
for this predefined port.) (2.1p6)</item>
<item id="c.om.output">[It follows that each component has one or more output ports.] (2.1p6)</item>
</list>
<p>Note that propositions <ptr target="c.zm.output"/> and <ptr target="c.om.output"/>
put different lower bounds on the number of outputs for a component.  This
leads to the question:</p>
<list type="propositions">
<item>Q:  is the <q>implicit
standard output port for reporting errors</q> an <term>output port</term>
as that term is intended to be used? (2.1p6)</item>
</list>
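<p>The competing readings of <q>all with unique names</q> can be made
concrete in a small Alloy sketch.  It is a standalone illustration, not
part of the model developed later in this paper; the signature
<ident>Name</ident>, the field <ident>name</ident>, and the predicate
and assertion names are assumptions introduced here only for the purpose.
<scrap name="Sketch: two readings of port-name uniqueness">
sig Name {}
sig Port { name : one Name }
sig Component { ins, outs : set Port }

-- Reading A: no two ports on the same component share a name.
pred uniquePerComponent {
  all c : Component | all disj p, q : c.ins + c.outs | p.name != q.name
}

-- Reading B: no two ports anywhere in the universe share a name.
pred uniqueInUniverse {
  all disj p, q : Port | p.name != q.name
}

-- Reading B is the stronger one: it entails reading A.
assert UniverseImpliesComponent {
  uniqueInUniverse implies uniquePerComponent
}
check UniverseImpliesComponent for 4
</scrap>
The assertion records only that the universe-wide reading entails the
per-component reading; which of the readings the spec intends remains
an open question.</p>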
<p>Section 2.1 paragraph 7:
<q type="block">
<p>
Components have any number of parameters, all with unique names. A
component can have zero parameters.
</p>
</q>
</p>
<list type="propositions">
<item id="ex.p">Parameters exist. (2.1p7)</item>
<item>Parameters have names. (2.1p7)</item>
<item>?! All parameters have unique names. (What does this mean?)  (2.1p7)</item>
<item>&iquest;No two names are identical.
Therefore, every name is unique.
Therefore, every parameter which has a name, has a unique name? (2.1p7)
</item>
<item>&iquest;No two parameters in the universe have the same name? 
(2.1p7)</item>
<item>&iquest;No two parameters on the same component have the same name? 
(Is this what <q>all with unique names</q> means?)  (2.1p7)</item>
<item>&iquest;No two parameters in the same pipeline have the same name? 
(2.1p7)</item>
<item id="c.zm.param">Every component has zero or more parameters. (2.1p7)</item>
</list>
</div>
<div>
<head>Inputs and outputs</head>
<!--*
<p>Let's see if we can point to item <ptr target="c.zm.param" type="propnum"/></p>
*-->
<!--*
<p>Section 2.2 paragraph 1 reads:
<q type="block">
<p>Although some kinds of components can read and write non-XML
resources, what flows between components as inputs and outputs are
exclusively XML documents or sequences of XML documents. Each XML
document (or document in a sequence) must be an [Infoset] with a
Document Information Item at its root. The inputs and outputs can be
implemented as sequences of characters, events, or object models, or
any other representation the implementation chooses.
</p>
</q>
</p>
*-->
<p>Section 2.2 paragraph 2 reads:
<q type="block">
<p>
It is a dynamic error if a non-XML resource is produced on a component
output or arrives on a component input.
</p>
</q>
</p>
<list type="propositions">
<item id="ex.errors">Errors exist. (2.2p2, cf. prop. <ptr target="ex.msgs"/>)</item>
<item>Some errors are dynamic. (2.2p2)</item>
<item>&iquest;An error may occur when a pipeline is evaluated? (2.2p2)</item>
<item>A dynamic error occurs when non-XML input flows through / arrives at an
input or output during pipeline evaluation. (2.2p2)</item>
<item>Q: Who is responsible for raising errors:  the component or
the pipeline framework or ... ? (2.2p2)</item>
</list>
<!--*
<p>Section 2.2 paragraph 3 reads:
<q type="block">
<p>
An implementation may make it possible for a component to produce
non-XML output&mdash;for example, writing a PDF document&mdash;but that output
cannot flow through the pipeline. Similarly, one can imagine a
component that takes no pipeline inputs, reads a non-XML file from a
URI, and produces an XML output. But the non-XML file cannot be an
input to a component or pipeline.
</p>
</q>
</p>*-->
<p>Section 2.2 paragraph 4 reads:
<q type="block">
<p>
Each component declares its input and output ports. [Definition: The
input ports declared on a component are its declared inputs.] 
[Definition: The output ports declared on a component are its declared
outputs.]
</p>
</q>
</p>
<p rend="meta-xproc">Each component? or each component type?
Or both?  2.2p6 suggests that steps <emph>cannot</emph> declare
their own ports, but must inherit their signature from their step type.</p>
<p>It's not clear from this whether the <term>inputs</term> and
<term>outputs</term> of a component, as referred to from time to
time in the spec, are only the declared inputs and outputs,
or whether the terms are sometimes used both of the declared
and of possibly additional undeclared data read or written.</p>
<p>The difference is salient only if we wish to model interactions
of a pipeline with the outside world beyond those identified by
the declared inputs and outputs of the pipeline or its
components. If we do, then the following propositions may
be relevant:</p>
<list type="propositions">
<item>A component (? or only a step?) may [or: must not] read XML resources other than
its inputs. (2.2p4)</item>
<item>A component (? or only a step?) may [or: must not] write XML resources other than
its outputs. (2.2p4)</item>
</list>
<p>Section 2.2 paragraph 5 reads:
<q type="block">
<p>
All of the declared inputs of a component must be connected to outputs
in the pipeline. It is a static error if a component has an input port
which is not connected. Unconnected output ports are allowed; any
documents produced on those ports are simply discarded.
</p>
</q>
</p>
<list type="propositions">
<item id="conn..input.flow">Every component input 
is connected (by a data flow) to a data source. (2.2p5)</item>
<item>A static error occurs when [a pipeline is so constructed that]
a component input is not connected by a data flow to a
data source. (2.2p5)</item>
<item>Every component output is a data source and may be connected through a data flow
to a data sink. (2.2p5)</item>
</list>
<p>Section 2.2 paragraph 6 reads:
<q type="block">
<p>
[Definition: The signature of a component is the set of inputs,
outputs, and parameters that it is declared to accept.] Each type of
step (e.g. XSLT or XInclude) has a fixed signature, declared globally
or built-in, which all its instances share, whereas each instance of a
construct has its own signature declared locally.
</p>
</q>
</p>
<list type="propositions">
<item>Signatures exist. (2.2p6)</item>
<item id="ex.declarations">&iquest;Declarations exist? (2.2p6)</item>
</list>
<p rend="meta-alloy">Proposition <ptr target="ex.declarations"/> does
not mean that any Alloy specification that models declarations will
necessarily provide an Alloy signature for declarations; the effect of
declarations may be implicit rather than explicit in the model. The
proposition is here only to allow us to speak as needed about the
properties of declarations without being guilty of Meinongianism and
ascribing properties to things that do not exist.</p>
<list type="propositions">
<item>[Component types exist.] (2.2p6)</item>
<item id="cict">[Components are instances of component types.] (2.2p6)</item>
</list>
<p rend="meta-xproc">Section 1 of the spec is a bit vague
about whether the term <mentioned>component</mentioned> denotes a
particular instance of a component type (e.g. one use of XInclude), or
a multiply usable object which appears in an infinite number of
pipelines.  Here it becomes clear that XInclude, XSLT, etc., are to be
taken as types of components, not as individual components.</p>
<p rend="meta-xproc">Note, however, that the difference between 
components and component types is not simply that there is
one of the latter, and one of the former per use of the type
in any pipeline.  The model as described in the
spec does not require that a particular component use (or:
instance) be part of only a
single pipeline.  Pipelines are sets of components (or more precisely,
pipelines are (C,F) pairs with C a set of components).  Any object can
be a member of more than one set; it's rather difficult to prevent it.
It's true that allowing components to appear in more than one
unrelated pipeline seems likely to confuse some users, if only because
there doesn't seem to be any principled way to tell when two pipelines
should contain the <emph>same</emph> XInclude component, and when they
should contain different ones.  Some may regard this as an argument in
favor of revising the design and abandoning the attempt to postulate 
an abstraction layer independent of and unrelated to the XML transfer 
syntax.</p>
<p rend="meta-xproc">If pipelines and components are XML elements,
then it follows trivially for the usual interpretation of XML element
identity that every component has at most one (direct) container.  If
there is a separate abstraction layer, then either we must mirror that 
relation at the abstraction layer, although there doesn't seem to be
any motivation for it, or we must explain when different XML elements
in the transfer syntax correspond to the same component, and when they
correspond to distinct components.  Or else we must explain that the
mapping from XML to components is indeterminate.</p>
<!--* There is no
more need for a given component to be a part only of a single pipeline
than there is for any given object to be a member only of a single set
of objects. *-->

<list type="propositions">
<item>Every component has a signature. (2.2p6)</item>
<item>Every step type has a signature. (2.2p6)</item>
<item id="def0.component-type-sig">&iquest;The signature of a component type 
is a triple (I,O,P) where <list>
<item>I is a set of names of input ports.</item>
<item>O is a set of names of output ports.</item>
<item>P is a set of names of parameters.</item>
</list>? (2.2p6, cf. <ptr target="def1.component-type-sig"/>)</item>
<item id="def0.component-sig">&iquest;The signature of a particular component (an instance
of a component type) is a triple (F,T,V), where<list>
<item>F is a set of declarations which connect named input ports to data flows 
[or: to other ports].</item>
<item>T is a set of declarations which connect named output ports to data flows 
[or: to other ports].</item>
<item>V is a set of bindings which associate values with (parameter) names.</item>
</list>? (2.2p6)</item>
</list>
<p rend="meta-xproc">These two formulations (<ptr target="def0.component-type-sig"/> and
<ptr target="def0.component-sig"/>) assume that when an input,
output, or parameter is declared, what happens is that a name is
reserved for the port or parameter.  That seems a plausible
assumption, but it appears to be too simple; paragraph 8 of section
2.2 (see below) indicates that each port is declared as accepting (or
producing) a single document or a sequence of documents. There appears
to be nothing in the spec that describes declarations in more detail
at the abstract level.  
</p>
<!--* To do: see if the syntax description helps at all. *-->
<list type="propositions">
<item id="eq.step-sig.steptype-sig">The signature of a step instance
is the same as the signature of its step type. (2.2p6)</item>
<item>[It follows that for any steps S1 and S2, if
S1 and S2 have the same step type, then S1 and S2 have the same
signature.] (2.2p6)</item>
</list>
<p>Note that proposition <ptr target="eq.step-sig.steptype-sig"/>
is incompatible with proposition <ptr target="def0.component-sig"/>.
Paragraph 2.2p7 seems to make clear that the problem lies with proposition
<ptr target="def0.component-sig"/>.</p>
<list type="propositions">
<item id="local.construct-sig">The signature of a construct
is declared locally. (2.2p6)</item>
<item>[It follows that for any constructs C1 and C2, if
C1 and C2 have the same component type (or if they do not), 
their signatures may be
identical or different.] (2.2p6)</item></list>
<p>Section 2.2 paragraph 7 reads:
<q type="block">
<p>
[Definition: A component matches its signature if and only if it
specifies an input for each declared input and it specifies no inputs
that are not declared; it specifies a parameter for each parameter
that is declared to be required; and it specifies no parameters that
are not declared.] In other words, every input and required parameter
must be specified and only inputs, outputs, and parameters that are
declared may be specified. Outputs and optional parameters do not have
to be specified.
</p>
</q>
</p>
<p rend="meta-xproc">This paragraph raises a few questions. It appears
incompatible with proposition
<ptr target="def0.component-sig"/> (or to be more precise, if
proposition <ptr target="def0.component-sig"/> is accepted, this
paragraph appears to be incomprehensible).  That suggests that
proposition <ptr target="def0.component-sig"/> reflects a possibly
serious misunderstanding of the text of the spec and will need
reformulation or deletion.</p>
<list type="propositions">
<item>A component may specify an input for a declared input [port].
(2.2p7)</item>
<item>A component must not specify any inputs except for declared
input [ports]. (2.2p7)</item>
<item>Parameters may be required [or optional]. (2.2p7)</item>
<item>?! A component may specify a parameter for a declared parameter.
(2.2p7) [[&iquest;Perhaps this means: A component may specify a value
for / bind a value to a declared parameter.]]</item>
</list>
<p rend="meta-xproc">Terminology problem:  the phrase <q>specifies a parameter
for each parameter</q> sounds nonsensical; although some sensible meanings
for the phrase can be imagined, they make unreasonable demands on the
reader's mental agility.  Ideally, the term <mentioned>parameter</mentioned>
would be used for one but not both of the senses used here; if that's
not feasible, we should avoid using the term twice in different and
complementary senses in a four-word span.</p>
<list type="propositions">
<item id="spec..component.output">&iquest;A component may specify an
output for a declared output [port]? (2.2p7)</item>
<item  id="subset..component-output.signature-output">&iquest;A
component must not specify any outputs except for declared output
[ports]? (2.2p7)</item>
<item>A component may match a signature.</item>
<item>A component matches a signature iff:<list type="bullets">
<item>The set of inputs specified by the component is a subset of the
inputs declared in the signature, and a superset of the inputs
declared in the signature as required.</item>
<item>&iquest;The set of outputs specified by the component is a subset of the
outputs declared in the signature?</item>
<item>The set of parameters specified by the component is a subset of
the parameters declared in the signature, and a superset of the
parameters declared in the signature as required.</item>
</list>
</item>
</list>
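<p>One way to pin down the matching rule is a standalone Alloy sketch
that follows the wording of the definition in 2.2p7, in which every
declared input must be specified.  The signature and field names
(<ident>Signature</ident>, <ident>declIns</ident>,
<ident>specIns</ident>, <ident>reqParms</ident>, and so on) are
assumptions made for illustration, and the constraint on outputs follows
the <q>In other words</q> sentence rather than the definition proper.
<scrap name="Sketch: a component matching a signature">
sig Name {}

-- What a signature declares (hypothetical field names).
sig Signature {
  declIns, declOuts, declParms : set Name,
  reqParms : set Name            -- parameters declared as required
}{
  reqParms in declParms
}

-- What a component specifies (hypothetical field names).
sig Component {
  specIns, specOuts, specParms : set Name
}

pred matches [c : Component, s : Signature] {
  c.specIns = s.declIns          -- every declared input, and no others
  c.specOuts in s.declOuts       -- only declared outputs
  s.reqParms in c.specParms      -- every required parameter
  c.specParms in s.declParms     -- only declared parameters
}
run matches for 3
</scrap>
</p>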

<p rend="meta-xproc">Note that the text explicitly entails the notion
that a component not only can be (must be) connected 
(as specified in prop. <ptr
target="conn..input.flow"/>) but can <q>specify an input</q>.  This
appears to be either (1) a failure to enforce the boundary between the
abstraction layer, in which input ports either are or are not
connected to data flows, and the XML syntax layer, in which directions
for how to connect things are given, or else (2) an attempt to define
an abstraction layer not only for sets of connected components but
also for the descriptions of connections.  In the latter case, the
abstraction layer will also need to include a definition of the
process of constructing connections in accord with the descriptions
(or not in accord with them, in the case of error conditions). Neither
of these seems plausible.</p>
<!--*<p rend="meta-xproc">If we define pipelines and components as XML
elements, this problem appears to me to be simpler:  the pipeline
document specifies connections, and as part of evaluation a pipeline
processor actually makes the connections, either successfully or (in
the case of error) unsuccessfully.</p>
*-->
<p rend="meta-xproc">Note also that the text provides two formulations
linked by <q>In other words,</q> but the two formulations do not
say the same thing.  The second entails propositions
not included in the first, e.g. propositions
<ptr target="spec..component.output"/> and
<ptr target="subset..component-output.signature-output"/>.
</p>
<p>Section 2.2 paragraph 8 reads:
<q type="block">
<p>
Each input and output is declared to accept or produce either a single
document or a sequence of documents. It is not a static error to
connect a port that is declared to produce a sequence of documents to
a port that is declared to accept only a single document. It is,
however, a dynamic error if the former component actually produces
more than a single document at run time.
</p>
</q>
</p>
<list type="propositions">
<item>A declaration may identify an input or output port as a
sequence, or as a single document. (2.2p8)</item>
<item>Every input or output port is declared either as a sequence, or
as a single document, not both. (2.2p8)</item>
<item>[A single document is indistinguishable from a sequence of
documents with length 1.] (2.2p8)</item>
<item>[During evaluation, a port declared for a sequence of documents
may produce or receive a single document.] (2.2p8)</item>
<item>A dynamic error occurs when during evaluation of a pipeline a
port declared for a single document <!--* emits or *-->
receives a sequence of documents with length greater than one. (2.2p8)</item>
<item>Q:  Does a dynamic error occur when a port declared for a single
document receives an empty sequence of documents (i.e. no document)?
(2.2p8)</item>
</list>
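<p>The dynamic error of 2.2p8, and the open question about empty
sequences, can be sketched in Alloy as follows.  This is a standalone
illustration under simplifying assumptions: the documents arriving at a
port are modeled as a set (ignoring order), and the names
<ident>Single</ident>, <ident>Sequence</ident>,
<ident>declared</ident>, and <ident>received</ident> are invented here.
<scrap name="Sketch: single-document ports and sequences">
sig Document {}
abstract sig Cardinality {}
one sig Single, Sequence extends Cardinality {}

sig Port {
  declared : one Cardinality,
  received : set Document      -- documents arriving during one evaluation
}

-- 2.2p8: more than one document arrives on a single-document port.
pred dynamicError [p : Port] {
  p.declared = Single and #(p.received) > 1
}

-- The open question: nothing at all arrives on a single-document port.
pred emptyOnSingle [p : Port] {
  p.declared = Single and no p.received
}
run emptyOnSingle for 3
</scrap>
The sketch deliberately leaves <ident>emptyOnSingle</ident> outside the
definition of <ident>dynamicError</ident>, since the spec text does not
settle the matter.</p>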
<p>The information in 2.2p8 allows us to reformulate proposition
<ptr target="def0.component-type-sig"/> more correctly:</p>
<list type="propositions">
<item>The keywords <kw>one</kw> and <kw>sequence</kw> exist.</item>
<item>The keywords <kw>required</kw> and <kw>optional</kw> exist.</item>
<item id="def1.component-type-sig">&iquest;The signature of a component type 
is a triple (I,O,P) where <list>
<item>I is a functional mapping from names (of input ports) to the
set {<kw>one</kw>, <kw>sequence</kw>}; the declared input
ports are those with names in the domain of I.</item>
<item>O is a functional mapping from names (of output ports) to the
set {<kw>one</kw>, <kw>sequence</kw>}; the declared output
ports are those with names in the domain of O.</item>
<item>P is a functional mapping from names (of parameters) to the
set {<kw>required</kw>, <kw>optional</kw>}; the declared parameters
are those with names in the domain of P.</item>
</list>? (2.2p6 + 2.2p8, cf. prop. <ptr target="def0.component-type-sig"/>)</item>
</list>
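<p>The reformulated signature can be written down almost directly in
Alloy.  The following standalone sketch is one possible rendering, not
the model adopted below; <ident>Single</ident> stands in for the keyword
<kw>one</kw> (which is reserved in Alloy), and the function names are
assumptions introduced for illustration.
<scrap name="Sketch: component-type signatures as partial functions">
sig Name {}
abstract sig Cardinality {}
one sig Single, Sequence extends Cardinality {}
abstract sig Requirement {}
one sig Required, Optional extends Requirement {}

sig ComponentType {
  I : Name -> lone Cardinality,   -- input-port names
  O : Name -> lone Cardinality,   -- output-port names
  P : Name -> lone Requirement    -- parameter names
}

-- The declared names are the domains of the three mappings.
fun declaredInputs [t : ComponentType] : set Name { (t.I).Cardinality }
fun declaredOutputs [t : ComponentType] : set Name { (t.O).Cardinality }
fun declaredParms [t : ComponentType] : set Name { (t.P).Requirement }

run { some t : ComponentType | some declaredInputs[t] } for 3
</scrap>
The multiplicity <code>lone</code> makes each mapping functional: a
name is associated with at most one keyword.</p>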
<p>Section 2.2 paragraph 9 reads: 
<q type="block"> 
<p>Steps may also produce error, warning, and informative
messages. These messages appear on a special &ldquo;error
output&rdquo; that is available in the catch clause of a
try/catch. </p> </q>
</p>
<list type="propositions">
<item id="ex.msgs">Messages exist. (2.2p9, 
cf. prop. <ptr target="ex.errors"/>)</item>
<item id="ex.emsgs">Some messages are error messages. (2.2p9)</item>
<item id="ex.wmsgs">Some messages are warning messages. (2.2p9)</item>
<item id="ex.imsgs">Some messages are informative messages. (2.2p9)</item>
<item>[Every message is an error message, a warning message, or
an informative message.] (2.2p9)</item>
<item>[No message is more than one of: an error message, a warning message, 
an informative message.] (2.2p9)</item>
<item>Messages flow through the <q>implicit standard output port
for reporting errors</q>. (2.2p9, cf. prop. <ptr target="ex.stderr"/>)</item>
<item>[It follows that messages are XML documents.] (2.2p9)</item>
<item>[Every try/catch construct has an input port on its
catch clause &iquest;which is connected to the <q>implicit standard output port
for reporting errors</q> of the components (directly?) contained
in the &iquest;try clause of the try/catch clause?] 
(2.2p9, cf. prop. <ptr target="ex.stderr"/>)</item>
</list>
</div>
<div>
<head>Parameters</head>
<p>Section 2.3 paragraph 1 reads:
<q type="block">
<p>[Definition: A parameter is a QName/value pair.] The value of a
parameter must be a string. If a document, node, or other value is
given, its (XPath 1.0) string value is computed and that string is
used.</p> </q></p>
<p rend="meta-xproc">Here once more, as in 2.2p7, the spec is either
violating the abstraction boundary by bringing representation-level
issues (the type of value denoted by an XPath expression before type
coercion) into a discussion of the abstraction layer, or else
postulating an abstraction layer which models not only correct
pipelines but the process of mapping from the XML representation to
the abstraction layer.  For now, I assume that there is no need
for any representation at the abstraction level of the coercion 
of XML documents or nodes into Unicode strings.</p>
<list type="propositions">
<item>A parameter is a pair (Q,V), where Q is a QName and V (the
<soCalled>value</soCalled>) is a Unicode string.<note place="foot">The
reader unfamiliar with XPath 1.0 may be puzzled by the formulation
<q>a Unicode string</q>; the spec describes the value of a parameter
as an <q>(XPath 1.0) string value</q>, and that entails that it is a
Unicode string.  It also entails that some Unicode characters cannot
appear, namely those not legal in an XML document, but it seems 
unnecessary to model that fact or to mention it
in the paraphrase of the spec.</note> (2.3p1)</item>
</list>
</div>
<div>
<head>Connections</head>
<p>Section 2.4 paragraph 1 reads:
<q type="block">
<p>Components are connected together by their inputs and
outputs. [Definition: Components A and B are connected if they are
either directly or indirectly connected. Component A is directly
connected to B if an output of A is associated with an input port of
B. Component A is indirectly connected to B if there is a chain of
directly connected components that allows traversal from A to B.]
</p>
</q>
</p>
<p>Cf. paragraph 1p2 and 
propositions <ptr target="consume..component.xml"/> through
<ptr target="consume-or-discard"/>.</p>

<p>Note that the relation <ident>directly-connected</ident> is not
symmetric: A and B are connected if data flows from A to B, but not if
data flows from B to A.  And since data flow is acyclic (prop. <ptr
target="acyclic..flow"/>), the relation is not only not symmetric, but
asymmetric:  if A &rarr; B is in the relation, then B &rarr; A is
not.  The relation <ident>connected</ident> corresponds
to reachability via a directed path in a directed graph, not
to connectedness in the corresponding undirected graph.</p>
<list type="propositions">
<item>A component A is connected to a component B iff
A is directly connected to B or A is indirectly connected to B.
(2.4p1)</item>
<item>Two components A and B are directly connected iff
an output of A is connected to an input of B. (2.4p1)</item>
<item>Two components A and B are indirectly connected iff
either A is directly connected to B,
or A is directly connected to some third component C which is 
indirectly connected to B. (2.4p1)</item>
</list>

<p rend="meta-xproc">Since most treatments of graph theory define
graph traversal in such a way that for any vertex V, V is reachable
from V (by a path of length 0), the definition of indirect connection
offered in 2.4p1 may be taken either as reflexive (if A and B are the
same component, then there is a chain of length 0 connecting them) or
as irreflexive (as in the preceding paragraph).  Hence question
<ptr target="connected..a.a.ka"/>.</p>

<list type="propositions">
<item id="connected..a.a.ka">Q: is A connected to A?</item>
</list>

<p rend="meta-xproc">I take the intent to be that the relation should
be irreflexive; the definition should probably be rephrased to say
<q><add>Two distinct</add> components A and B are connected if
<add>and only if</add> ...</q> or <q>traversal from A to
B along a path with length > 0</q>
or <q>[Definition: Components A and B
are connected if <add>and only if</add> they are either directly or
indirectly connected. Component A is directly connected to B if
<add>and only if</add> an output of A is associated with an input port
of B <add>and A &neq; B</add>. Component A is indirectly connected to
B if <add>and only if A &neq; B and</add> there is a chain of directly
connected components that allows traversal from A to B.]</q>  I have
also turned the conditionals into biconditionals on the theory that
normally definitions should be biconditionals.</p>

<p>Note also that <mentioned>directly connected</mentioned> and
<mentioned>indirectly connected</mentioned> are not complementary; the
definition of <mentioned>indirectly connected</mentioned> makes it
synonymous with <mentioned>connected</mentioned>, since
if any components A and B are directly connected, then there is 
a chain of components (consisting of A and B in sequence)
which allows traversal from A to B, and therefore A and B
are indirectly connected.</p>
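<p>These observations can be checked mechanically.  The following
standalone Alloy sketch reads <q>indirectly connected</q> as a chain of
one or more direct connections and assumes acyclic data flow; the field
<ident>direct</ident> and the predicate and assertion names are
assumptions introduced here, not names used elsewhere in this paper.
<scrap name="Sketch: direct and indirect connection">
sig Component {
  direct : set Component     -- A -> B: an output of A feeds an input of B
}

-- Data flow is acyclic.
fact AcyclicFlow { no c : Component | c in c.^direct }

-- "Indirectly connected": a chain of one or more direct connections.
pred indirectlyConnected [a, b : Component] { b in a.^direct }

-- Every direct connection is also an indirect one under this reading.
assert DirectImpliesIndirect {
  all a, b : Component | b in a.direct implies indirectlyConnected[a, b]
}

-- With acyclic flow, the relation is asymmetric.
assert Asymmetric {
  all a, b : Component |
    indirectlyConnected[a, b] implies not indirectlyConnected[b, a]
}
check DirectImpliesIndirect for 5
check Asymmetric for 5
</scrap>
Using the transitive closure <code>^direct</code> rather than the
reflexive-transitive closure <code>*direct</code> corresponds to the
irreflexive reading of the definition.</p>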
<p>Section 2.4 paragraph 2 reads:
<q type="block">
<p>
With respect to connected components, we can speak of one component
being either before or after another. [Definition: Component A is
before component B if component B is a contained component of
component A, either directly or indirectly, or if any output from
component A is connected to any input of component B, either directly
or indirectly.] [Definition: Component A is after component B if
component B is the container for component A (or an ancestor of such a
container) or if any output from component B is connected to any input
of component A, either directly or indirectly.]
</p>
</q>
</p>
<p rend="meta-xproc">Several editorial problems here.
<list type="bullets">
<item>The first sentence of the paragraph suggests that the terms
<mentioned>before</mentioned> and <mentioned>after</mentioned> are
relations on pairs of connected components. But the terms are defined
to apply also to some unconnected pairs.  The first sentence could
probably be deleted without tears.</item>
<item rend="meta-xproc">It's not clear why the definition of
<mentioned>before</mentioned> and <mentioned>after</mentioned> speaks
of inputs and outputs being connected (which is not, as far as I can
tell, a defined term), instead of using the definition of connected
components presented in the preceding paragraph.  For purposes of this
working paper I am going to assume that this is a simple editorial
slip rather than a subtle trap, and that <q>if any output from
component A is connected to any input of component B, either directly
or indirectly</q> is intended to have the same meaning as
<q>if A is connected to B</q>.</item>
<item>I believe that it would be shorter, and come to the same
thing, if the second definition were: <q>Given two components
A and B, A is after B if and only if B is before A.</q>  If
the relation <ident>after</ident> is anything but the inverse
of the relation <ident>before</ident>, then the text needs to
highlight the difference.</item>
<item>The term <mentioned>ancestor</mentioned> is not defined for
abstract components; it appears to be an irruption from the XML
level.</item>
<item>The phrase <q>a contained component of component A,
either directly or indirectly</q> would be unnecessary if
at the appropriate point in 2.1 the spec defined containment
as transitive (which would be natural).  The definition of
<mentioned>container</mentioned> could either be left unchanged
or generalized; in the latter case those uses which need to
be changed could be changed to say <q>immediate container</q>
and the others could be simplified by deleting phrases
like <q>(or an ancestor of such a container)</q>.  The
relevant part of paragraph 2.4p2 could then be recast as
<q>Component A is <term>before</term> a distinct
component B if and only if either A contains B or
A is connected to B.  Component A is <term>after</term>
B if and only if B is before A.</q></item>
<item>Would it kill us to say that <mentioned>before</mentioned>
defines a partial order? Or (equivalently) that given any two
components A and B, either A is before B or B is before A or neither
is before the other?</item>
</list>
</p>
<list type="propositions">
<item>For any two components A and B, 
A precedes (is before) B iff either A contains B or
A is connected to B.</item>
<item>For any two components A and B, 
A follows (is after) B iff B precedes A.</item>
<item>Q.  Is a pipeline guaranteed to be connected by the <ident>before</ident>
relation?</item>
</list>

<p>Section 2.4 paragraph 3 reads:
<q type="block">
<p>
It is a static error if a component is either before or after itself.
</p>
</q>
</p>
<list type="propositions">
<item>A static error occurs when [a pipeline is so constructed
that] some component is before itself. (2.4p3)</item>
</list>
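<p>As a standalone Alloy sketch, the ordering relations and the static
error of 2.4p3 might look as follows.  The fields
<ident>components</ident> (direct containment) and
<ident>direct</ident> (direct connection) and the predicate names are
assumptions made for illustration; the names
<ident>precedes</ident> and <ident>follows</ident> are used in place of
the spec's <mentioned>before</mentioned> and
<mentioned>after</mentioned>, which are reserved words in recent
versions of Alloy.
<scrap name="Sketch: precedence and the static error of 2.4p3">
sig Component {
  components : set Component,   -- direct containment
  direct : set Component        -- direct connection, output to input
}

-- A precedes B iff A contains B (at any depth) or A is connected to B.
pred precedes [a, b : Component] {
  b in a.^components or b in a.^direct
}

-- A follows B iff B precedes A.
pred follows [a, b : Component] { precedes[b, a] }

-- 2.4p3: it is an error if some component precedes itself.
pred staticError { some c : Component | precedes[c, c] }

-- Ask the Analyzer for an instance free of the error.
run { not staticError } for 4
</scrap>
</p>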
<p rend="meta-xproc">If <ident>after</ident> is the inverse of
<ident>before</ident>, then <q>either before or after</q> is redundant
(and uses seven syllables when two will do &mdash; have you been
reading the HyTime spec again? Stop that!), and makes this reader
stop to wonder whether he misread paragraph 2.4p2:  <q>If 2.4p3 says
<q>either before or after</q>, then perhaps the two relations are not
inverses after all?  Better go back and reread 2.4p2, to make
sure.</q>  Other readers, of course, may stumble if 2.4p3 just says
<q>before</q>.  In choosing the formulation, I lean toward 
serving the readers who take the definitions seriously.  Also
toward two syllables, not seven.</p>
</div>
</div>

<div id="constructs">
<head>Language constructs</head>
<p>Paragraph 2 of section 3 of the spec reads:<q type="block">
<p>Every component in a pipeline has five parts: a set of inputs, a
set of outputs, a set of parameters, a set of contained components,
and a context inherited from its container. [Definition: The context
is the the set of input ports, output ports, and parameter names that
are visible.]</p></q></p>
<list type="propositions">
<item>Every component has [at least?] five properties: 
a set of inputs,
a set of outputs, 
a set of parameters, 
a set of contained components, and
a context.</item>
</list>
<p>This can be captured conveniently in an Alloy signature,
representing the properties of components as fields of the
signature.
If we model inputs and outputs as objects of signature Port
(left undescribed for now), we can
write:
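<scrap id="component.structure.5" name="Components with five properties">
abstract sig Component {
  ins : set Port,
  outs : set Port,
  parms : set Parameter,
  components : set Component,
  context : Context
}
-- Steps are components which contain no other components.
abstract sig Step extends Component {}{
  no components
}
-- Constructs are components which may contain other components.
abstract sig Construct extends Component {}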
</scrap></p>
<list type="propositions">
<item>Contexts exist.</item>
<item>Every context has three properties:
a set of input ports, a set of output ports, and
a set of parameter names.</item>
</list>
<p>Here, too, the Alloy expression for the structure of contexts is
simple.  In order to reduce confusion, I have named the first
two fields of the context <ident>sources</ident> (ports
which provide data to be consumed) and <ident>sinks</ident>
(ports which consume data); the latter of these corresponds to the
<ident>outputs</ident> of the spec, while the former
might correspond to the <ident>inputs</ident> of the spec:
<scrap id="s.context" name="Contexts">
sig Context {
  sources : set Port,
  sinks : set Port,
  cparms : set Parameter
}
</scrap>
</p>
<p>These declarations may become more complex, depending on how
ports and parameters are modeled.</p>
<p>The subsections of part 3 of the spec describe the
properties of each construct.  Several generalizations are
possible:<list type="bullets">
<item><p>First of all, the inputs of each type of construct
are <q>as declared</q>:</p>
<list type="propositions">
<item>For every construct, the set of inputs is the set of declared
inputs.</item>
</list>
<p rend="meta-xproc">The spec does not describe how inputs are
declared, and it is not clear to this reader whether declarations
are objects which occur in the abstraction layer, or only
in the XML layer.</p>
</item>
<item>
<p>The outputs of each type of construct
are <q>as declared</q>:</p>
<list type="propositions">
<item>For every construct, the set of outputs is the set of declared
outputs.</item>
</list>
<p>For some construct types, the cardinality of inputs or outputs
is fixed at one.</p>
</item>
<item>
<p>The parameters of each type of construct
are <q>as declared</q>:</p>
<list type="propositions">
<item>For every construct, the set of parameters is the set of declared
parameters.</item>
</list>
<p>The spec does not describe parameter declarations in detail, and
once more it's unclear whether they are visible in the abstraction
layer or not.</p>
</item>
<item>
<!--* continue with cc and context *-->
<p>The set of components directly contained by any construct
is <q>as declared</q>.</p>
<list type="propositions">
<item>For every construct, the set of contained components is
the set of declared contained components.</item>
</list>
<p>In some cases (choose, try/catch), the number of subpipelines is
fixed or otherwise described specially for the individual construct 
type.  The spec does not describe how contained components are
declared.</p>
</item>
<item><p>The context of each construct is given by a set of descriptions
relating it to the context of its container.  There are four algorithms,
which I will call <ident>inherited_context</ident> (for the normal, unmarked
case), <ident>choose_context</ident> (for use in Choose constructs),
<ident>try_context</ident> (for use in Try clauses), and
<ident>catch_context</ident> (for use in Catch clauses).
They are described below.</p>
</item>
</list>
</p>

<p>The normal case of inherited context is to add the declared inputs,
declared outputs, and parameters of the construct to the sets
inherited from the container.  In the words of section 3.1
(substituting <q>[component]</q> for <q>pipeline</q>):
<q type="block"><p>The context of a [component] is its 
inherited context modified as follows:
<list type="bullets">
<item>All of the <term>declared inputs</term> of the [component] 
are added to the <term>outputs</term> in the context.</item>
<item>The union of all the declared outputs of the <term>contained components</term> 
are added to the <term>outputs</term> in the context.</item>
<item>All of the declared parameters of the [component] 
are added to the <term>parameters</term> in the context.</item>
</list>
This is the context used by the pipeline and inher[i]ted by its 
<term>contained components</term>.</p></q>
</p>
<p>
<scrap id="f.inherited_context"
name="Normal inherited context">
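fun inherited_context(c0: Context, 
    i: set Port, 
    o: set Port, 
    p: set Parameter)
  : set Context {
  // the contexts whose sinks are those of c0 and whose sources
  // and parameters are those of c0 plus the additions
  { c : Context | c.sinks = c0.sinks
    and c.sources = c0.sources + i + o
    and c.cparms = c0.cparms + p }
}</scrap>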
</p>
<p rend="meta-alloy">Note that the definition just given assumes
that ports from distinct components are distinct, so that when we
add the input ports of this construct, they are all distinct from
any other ports already in the context.  Other parts of the
model are simpler if we model ports just as names, in which case
this definition needs to become more complex:  what need to be
recorded in the context are (component, port-name) pairs,
not just port names.</p>
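<p rend="meta-alloy">A minimal sketch of that alternative, assuming
ports are modeled purely as names, might look as follows.  It is a
standalone toy, not part of the model developed here: the signatures
<ident>PortName</ident>, <ident>Comp</ident>, and
<ident>PairContext</ident>, their fields, and the function
<ident>extended_sources</ident> are all hypothetical names introduced
only for illustration.
<scrap id="sketch.pair-context" name="Contexts over (component, port-name) pairs (sketch)">
// Hypothetical toy signatures, not those of the main model.
sig PortName {}
sig Comp {
  inNames : set PortName,
  outNames : set PortName
}
sig PairContext {
  pairSources : Comp -> PortName,
  pairSinks : Comp -> PortName
}
// The sources visible inside component c with containees kids:
// those of the inherited context c0, plus one pair for each input
// name of c and one for each output name of each containee.
fun extended_sources(c0 : PairContext, c : Comp, kids : set Comp)
  : Comp -> PortName {
  c0.pairSources + (c -> c.inNames) + (kids &lt;: outNames)
}
</scrap>
Because each name is paired with the component that owns it, two
components may use the same port name without their entries
colliding in the context.</p>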
<p>This algorithm is used by the Pipeline, For-each, Viewport, and
Group constructs.
A construct that uses this algorithm for determining the
context to be inherited by its containees will have a signature
fact reading something like 
<code>context = inherited_context(container.@context,
ins, components.@outs, parms)</code>,<note place="foot">The
<code>@</code>-signs are required to prevent Alloy from interpreting
the identifiers <ident>context</ident> and <ident>outs</ident>
as short-hand for <code>this.context</code> and <code>this.outs</code>,
i.e. as references to the <ident>context</ident> and <ident>outs</ident>
fields of the particular atom whose conformance to the 
signature facts is being considered. </note>
where <ident>container</ident> denotes the construct
which directly contains the construct whose context is being
described.  Without an inherited context (i.e. without a container), 
the algorithm does not determine a context.</p>
<p>
Note that both the construct's declared inputs and the
outputs of its containees are added to the <ident>outputs</ident> of
the context.  The outputs of the construct itself do not appear here;
they will have been among the containee outputs added to the
context by the container.
</p>
<p>The <ident>choose_context</ident> algorithm is used only by the
Choose construct, but will be described here for easier comparison
with the other algorithms.  Section 3.4 of <ptr type="bibref"
target="XProc"/> reads in part:<q type="block"><p>The context of 
a choose is its inherited context modified as follows:
<list type="bullets">
<item>All of the declared inputs of the choose are added to the outputs in the context.</item>
<item>The declared outputs of (any one of) the subpipelines are added to the outputs in the context.</item>
<item>All of the declared parameters of the choose are added to the parameters in the context.</item>
</list>
This is the context used by the choose and inher[i]ted by its subpipelines.</p></q>
</p>
<p rend="meta-xproc">The second bullet item here presumably does not mean 
that a pipeline processor chooses a subpipeline at random,
calculates a context for it, and assigns that context to the components
in each of its subpipelines as their inherited context.  But that is
what it says.</p>
<p>Here I will take the second bullet as requiring that each
distinct subpipeline within the choose have its own inherited
context, which adds the parameters and inputs of the choose and the
outputs of the containees <emph>in that subpipeline</emph> to the
context.</p>
<p>It's not quite clear how to reconcile this with the other
things said in the specification.
One way for this definition to be effective would be for a
subpipeline construct to be regarded as the container for each
subpipeline within the choose.  The problem with this is that the spec
does not define a subpipeline construct to attach the different contexts
to.
Another way would be to assume that the
inherited context for the components within the different subpipelines
is not simply the <ident>context</ident> field of the container.
The major problem with this is that the spec doesn't say what the
inherited context is, in that case; a secondary problem is that in
that case, the context property of a Choose object does not seem to
have much purpose or point.</p>
<p>In the absence of further information (either a clarification of
the spec, or speculation on the part of the model builder), it's
not clear how to model this algorithm as an Alloy function.</p>
<p>The <ident>try_context</ident> and <ident>catch_context</ident> algorithms
similarly require clarification of the subpipeline concept before
they can be defined in Alloy.</p>

<p>A few difficulties with these definitions of the context
may be worth noting:<list type="bullets">
<item>The inputs of the context never change, because
nothing in any of the function definitions changes them.</item>
<item>Each definition requires having an inherited context to start
from, but the spec does not say how the outermost context is
initialized.</item>
<item>As noted, the treatment of inheritance in Choose and Try/Catch
is unclear; it seems to rely both on having an explicit subpipeline
construct in the abstraction layer and on not having such a
concept.</item>
</list>
</p>
<div>
<head>Pipeline</head>
<p>As described in section 3.1 of <ptr target="XProc" type="bibref"/>,
pipelines have inputs, outputs, parameters, and containees
as declared.  They use the <ident>inherited_context</ident> algorithm
for calculating their context.  In Alloy terms:
<scrap id="a.pipeline" name="Pipelines">
sig Pipeline extends Construct {
}{
  context = inherited_context(container.@context, 
               ins, components.@outs, parms)
}
</scrap>
</p>

<p>As noted above, the <ident>inherited_context</ident> algorithm
requires that the pipeline have a container.  This will not be true
for the outermost pipeline, but it will be true for pipelines
contained in other constructs, if section 3.1 of the spec is intended
to describe nested subpipelines. The spec seems to contradict itself
on this point.  The first paragraph of 3.1 reads <q type="block">A
pipeline encapsulates the behavior of a subpipeline.</q> while the
final paragraph reads <q type="block">There is one additional
constraint imposed on pipelines: a pipeline <hi rend="bold">must
not</hi> itself be a <term>contained component</term>.</q>
It may be possible to reconcile these two statements in some
way, but I cannot see a way to do so.</p>
<p>The consequence is that any formal model of constructs will require
that we go somewhat beyond the design as expounded in the spec.  If
we assume that the Pipeline abstraction is indeed to model
both the outermost pipeline and also subpipelines within other
constructs, then we might represent it in Alloy with 
definitions something like the following.</p>
<p>
First, we need to revise the definition in scrap
<ptr target="component.structure.5"/>, to provide a convenient
way to refer to a component's container; we declare
the field <code>container : lone Component</code> to indicate
that some components (outermost pipelines) lack containers,
while others (everything else) have exactly one.
<scrap id="component.structure.6" name="Components with six properties">
abstract sig Component {
  ins : set Port,
  outs : set Port,
  parms : set Parameter,
  container : lone Component,
  components : set Component,
  context : Context
}
</scrap></p>
<p>We can now define Pipeline so as to describe both outermost pipelines
and subpipelines; for convenience, we define them as two distinct
subtypes of Construct.  For outermost pipelines, I conjecture
a context whose inputs (data sinks) are the empty set<note place="foot">Note
that this conjecture has the consequence that for any 
context within the pipeline, the <ident>inputs</ident>
are the empty set.</note>,
whose outputs (data sources) are just the inputs (<emph>not</emph> the
outputs) of the pipeline,
and whose parameters are those of the pipeline:
<scrap id="sig.OuterPipeline" name="Outer pipelines">
sig OuterPipeline extends Pipeline {
}{
  context.sources = ins
  context.sinks = none    // inputs (data sinks) conjectured empty
  context.cparms = parms
  container = none
}
</scrap>
</p>
<p>Subpipelines have the same structure but different signature facts:
<scrap id="sig.subpipeline" name="Subpipelines">
sig Subpipeline extends Pipeline {
}{
  context = inherited_context(container.@context,
              ins, components.@outs, parms)
  #container = 1  // or: some container
}
</scrap></p>
<p>Whether outermost or nested, all pipelines share one
invariant, captured in their common superclass:
the pipeline serves as the container of all 
the contained components.
<scrap id="sig.Pipelines.v2" name="Pipelines (v2)">
abstract sig Pipeline extends Construct {
}{
  all c : components | c.@container = this
}
</scrap></p>
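<p rend="meta-alloy">The interplay between <ident>container</ident>
and <ident>components</ident> can also be explored in isolation.
The following standalone sketch uses a hypothetical signature
<ident>Node</ident>, with fields <ident>parent</ident> and
<ident>kids</ident> standing in for <ident>container</ident> and
<ident>components</ident>.  The acyclicity fact is an assumption made
for the sketch; the spec does not state it in these terms.
<scrap id="sketch.containment" name="Containment as an inverse relation (sketch)">
// Hypothetical toy signature, not that of the main model.
sig Node {
  parent : lone Node,
  kids : set Node
}
// Each node is the parent of exactly its kids.
fact kids_mirror_parent { kids = ~parent }
// Assumption: no node contains itself, directly or indirectly.
fact no_self_containment { no n : Node | n in n.^parent }
// If there are any nodes at all, at least one has no parent.
assert some_outermost {
  some Node implies (some n : Node | no n.parent)
}
check some_outermost for 5
</scrap>
In any finite instance satisfying both facts, following
<ident>parent</ident> links must terminate, so the assertion is
expected to hold; the corresponding property of the main model is
that every non-empty pipeline structure has an outermost
component.</p>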
</div>
<div>
<head>For-Each</head>
<p>A For-each construct processes a series of documents, applying
its subpipeline to each in turn.  It is required to have 
exactly one input port, and contains exactly one sub-pipeline.</p>
<p>In Alloy terms, we can model this either with an explicit
subpipeline construct, or without.  First, with an explicit
subpipeline:
<scrap id="sig.for-each" name="For-each (with explicit subpipeline)">
sig ForEach extends Construct {
}{
  context = inherited_context(container.@context,
              ins, components.@outs, parms)
  #container = 1  // or: some container
  <ptr target="sig.for-each-more-with"/>
}
</scrap>
Modeled in this way, a For-each must contain exactly one other component,
namely a subpipeline:
<scrap id="sig.for-each-more-with" name="Signature constraints for For-each with subpipeline">
  #components = 1
  components in Subpipeline
</scrap>
At the XML level, there is no nested <ident>pipeline</ident> element,
and so the subpipeline cannot have any declared ports or parameters,
in any pipeline created from a valid pipeline document.  This constraint has
no particular motivation at the abstract level, but if we wish to
match the XML constraint, we can:
<scrap prev="sig.for-each-more-with" name="Signature constraints for For-each with subpipeline">
  components.@ins = none
  components.@outs = none
  components.@parms = none
  <ptr target="sig.for-each-more"/>
</scrap>
</p>
<p>Finally, For-each constructs are required to have exactly one input port:
<scrap id="sig.for-each-more" name="Additional signature constraints for For-each">
  #ins = 1        // or: one ins
</scrap>
</p>
<p>It is perhaps more convenient to do without the nested subpipeline 
construct; it seems to serve little purpose.  We can model For-each
without an explicit subpipeline component, by defining the signature
as an extension of Subpipeline.  The declaration is then slightly more
concise:
<scrap id="sig.for-each-bis" name="For-each (without subpipeline)">
sig ForEach extends Subpipeline {
}{
  <ptr target="sig.for-each-more"/>
}
</scrap></p>
</div>
<div>
<head>Viewport</head>
<p>A Viewport resembles a For-each, in that it must have one
input port, contains one subpipeline, and typically runs several times;
but instead of running once for each document in an input sequence,
a Viewport runs once on each set of nodes in the (or: in each) input
document which match a selection expression.  
The Alloy descriptions of Viewport
are virtually identical to those for For-each; the differences
are (1)
the selection, which is modeled here without any internal
structure as an object of type XPathExpression, and
(2) a constraint requiring the set of output ports to be
a singleton.</p>
<p>We can model Viewport with an explicit contained subpipeline:
<scrap id="sig.Viewport.with" name="Viewport with subpipeline">
sig XPathExpression{}
sig Viewport extends Construct {
  selection : XPathExpression
}{
  context = inherited_context(container.@context,
              ins, components.@outs, parms)
  #container = 1  // or: some container
  #components = 1
  components in Subpipeline
  components.@ins = none
  components.@outs = none
  components.@parms = none
  #ins = 1        // or: one ins
  #outs = 1       // or: one outs
}
</scrap>
</p>
<p>Or we can model Viewport without any explicit contained subpipeline:
<scrap id="sig.Viewport.without" name="Viewport without subpipeline">
sig Viewport extends Subpipeline {
  selection : XPathExpression
}{
  #ins = 1
  #outs = 1
}</scrap>
</p>
<p rend="meta-xproc">The <ident>select</ident> attribute on 
an input to a 
<ident>p:viewport</ident> seems to have a rather different
meaning from the same attribute on an input to a different
construct, e.g. <ident>p:pipeline</ident>.  The description
of the input and output of viewports seems to contradict flatly
the description of the semantics of the <ident>select</ident>
attribute in section 4.2.2 of the XProc spec.  Section 4.2.2 
suggests that nodes not matching the <ident>select</ident>
are discarded; the description of Viewport says they are not.
Is it wise to 
represent such different semantics with the same syntax?</p>
</div>
<div>
<head>Choose</head>
<div>
<head>Structure and input of the Choose</head>
<p>The section of the XProc specification devoted to the 
abstraction layer is not very informative about Choose. If,
however, we assume that the abstraction should have the same properties
as are described for the XML transfer syntax, we can 
describe it thus:  a Choose contains a set<note place="foot"><p>The
spec actually requires that the subpipelines be ordered, but
it will be more convenient to model them as a set.  The
sequence of subpipelines P1, P2, P3 with tests T1, T2, T3 
can be transformed, in the translation from XML to the
component layer, into a set of subpipelines {P1, P2, P3}
with tests T1, T2 and not T1, T3 and not T1 and not T2.</p>
<p rend="meta-xproc">The appeal to a sequence over the
subpipelines (<q>The choose considers each subpipeline 
in turn ...</q>) contradicts the statement in section
3.1 of the spec that each construct has a <emph>set</emph>,
not a <emph>list</emph>, of subcomponents.</p>
</note>
of When subpipelines and optionally an Otherwise subpipeline.  Each When 
subpipeline has a <ident>test</ident> predicate containing
a guard expression. A When subpipeline is evaluated only if
its guard is true; the Otherwise subpipeline is evaluated
if and only if none of the guards of the When constructs
are true.</p>
<p>In addition, a Choose may specify a <soCalled>context</soCalled>
for the evaluation of the guard expressions; this has
nothing to do with the <term>context</term> described
in section 3.1 of the spec.  It appears, actually, to be
a specialized input port (and is so modeled here).  
A Choose, When, or Otherwise construct
may declare a context, and if the Choose does not,
then the When and Otherwise constructs contained by it 
must do so.</p>
<p>In Alloy terms:
<scrap id="sig.choose" name="Choose">
sig Choose extends Construct {
  XPath_context : lone Port
}{
  <ptr target="Choose.constraints"/>
}
</scrap></p>
<p>The so-called <soCalled>context</soCalled> input is simply
an input port; a Choose must have at most one.  In the
following Alloy transcription, it's named <ident>XPath_context</ident>
to avoid a name collision with the <ident>context</ident>
inherited from Component.
<scrap id="Choose.constraints" name="Signature constraints on Choose">
  XPath_context = ins
</scrap></p>
<p rend="meta-alloy">Alternatively, one could suppress the
declaration of the field <ident>XPath_context</ident> and
simply impose the constraint <code>lone ins</code>.  I've
included the declaration, for now, just to have something in
the Alloy model that connects with the term <mentioned>context</mentioned>
in the spec.</p>
<p rend="meta-xproc">Using the term <mentioned>context</mentioned>
both for the set of visible ports and parameters and also for
the single input for the subpipelines of the Choose is an
editorial oversight that should probably be fixed, because it's
certain to lead to serious confusion.</p>
<p>The components contained by a Choose must be zero or more When
subpipelines and optionally an Otherwise:
<scrap prev="Choose.constraints" name="Signature constraints on Choose">
  components in When + Otherwise
  lone (Otherwise &amp; components)
</scrap></p>

</div>
<div>
<head>Outputs of Choose and its subpipelines</head>
<p>Every subpipeline contained by a Choose must declare the same
outputs.  Or, more precisely, as section 4.2.9 of the spec says:
<q type="block">All of the p:when branches and the p:otherwise must 
declare the same number of output ports with the same names. It is a 
static error if they do not.</q>
If we model Ports just as names, so that different
subpipelines can declare the same ports, this constraint can
be modeled simply:
<scrap prev="Choose.constraints" name="Subpipeline outputs constraint on Choose">
  all c1, c2 : Component |
    if c1 + c2 in components 
       then c1.@outs = c2.@outs
</scrap>
Most port references will then need to be references to a particular
port name on a particular component.  If, on the other hand,
ports are modeled as having, but not being, names, and if the ports
of any two distinct components are necessarily distinct, then
a more complex statement of the constraint will be necessary.</p>
<p rend="meta-alloy">The details of the constraint will depend on
the details of how ports are modeled, which will involve some
experimentation I have not yet done.  If we assume that each Port
has a Name, as in:
<scrap id="sig.Port.named" name="Named ports (sketch)">
sig Name {}
sig Port {
  name : Name
}
</scrap>
then the constraint might be expressed as:
<scrap id="Choose.outs.constraints.v2" name="Subpipeline outputs constraint on Choose (v2)">
  all c1, c2 : Component |
    if c1 + c2 in components 
       then c1.@outs.name = c2.@outs.name
</scrap>
A constraint will then be needed on Component, to ensure that the names 
of the ports on any given component
are unique.  But as noted elsewhere (see proposition
<ptr target="uniq.portname"/> and following), the
precise constraint is not clear to me from the spec.
</p>
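<p rend="meta-alloy">One possible reading of such a uniqueness
constraint is sketched below as a standalone toy.  The signatures
<ident>PName</ident>, <ident>P</ident>, and <ident>C</ident> and
their fields are hypothetical stand-ins for Name, Port, and
Component, and the sketch assumes that the input and output ports of
a component share a single namespace, which the spec may or may not
intend.
<scrap id="sketch.unique-port-names" name="Unique port names per component (sketch)">
// Hypothetical toy signatures, not those of the main model.
sig PName {}
sig P { pname : PName }
sig C {
  cins : set P,
  couts : set P
}
// Assumption: inputs and outputs of one component share a
// namespace, so no two distinct ports of c have the same name.
fact unique_port_names {
  all c : C | all disj p1, p2 : c.cins + c.couts |
    p1.pname != p2.pname
}
</scrap>
If inputs and outputs are instead meant to have separate namespaces,
the quantification would range over <code>c.cins</code> and
<code>c.couts</code> separately.</p>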
<p>Logically speaking, the outputs of the Choose construct
ought to be those of its containees, but the spec appears not
to say this anywhere.  But I'll model it anyway.  If ports
are just names, then:
<scrap prev="Choose.constraints" name="Output constraint on Choose (first cut)">
 outs =  components.@outs
</scrap>
Since every subpipeline declares
the same set of names, the set <code>components.@outs</code>
is the same set as the <ident>outs</ident> field of any
subcomponent, and the constraint <code>outs =  components.@outs</code>
requires that the Choose itself have exactly the same set of outputs.</p>
<p rend="meta-alloy">If ports have identity beyond that of
their name, then the Alloy constraint will have to require
that the set of names on the output ports of the Choose
and that on the output ports of each subpipeline are the
same:
<scrap id="Choose.outs.v2" name="Output constraint on Choose (v2)">
 outs.name =  components.@outs.name
</scrap>
</p>
</div>
<div>
<head>Parameters of Choose</head>
<p>The XML element <ident>p:choose</ident> is not allowed to
contain <ident>p:parameter</ident> elements; if we wish to
carry this constraint over to the abstraction layer, we
can express the constraint in Alloy thus:
<scrap prev="Choose.constraints" 
       id="c.Choose.parms"
       name="Signature constraints on Choose">
 parms = none
</scrap>
</p>
</div>
<div>
<head>Calculating the context for a Choose</head>
<p>The calculation of the context for a Choose is described
thus in the spec:
<q type="block">The context of a choose is its inherited context modified as follows:
<list type="bullets">
<item>All of the <term>declared inputs</term> of the choose 
are added to the <term>outputs</term> in the context.</item>
<item>The declared outputs of (any one of) the subpipelines 
are added to the <term>outputs</term> in the context.</item>
<item>All of the declared parameters of the choose 
are added to the <term>parameters</term> in the context.</item>
</list>
</q>
I have assumed above that the set of declared inputs of a Choose
is the singleton set containing its declared <soCalled>context</soCalled>,
or else the empty set, but the spec does not say this explicitly, as
far as I can tell.  And the set of declared parameters is
guaranteed to be the empty set, if we adopt the invariant 
in code fragment <ptr target="c.Choose.parms"/>.  If we
model ports as names, then the context rule for Choose is:
<scrap prev="Choose.constraints" name="Context calculation for Choose">
  context = inherited_context(container.@context,
              ins, components.@outs, parms)
</scrap>
or equivalently:
<scrap id="Choose.context.v2" name="Context calculation for Choose, v2">
  context = inherited_context(container.@context,
              ins, outs, none)
</scrap>
</p>
<p>If ports are not just names, this will need to be more complex.
The spec says that <q>The declared outputs of (any one of) the
subpipelines are added to the <ident>outputs</ident> in
the context.</q>  But if a port is unique to a particular
component, then this could lead to unacceptable results:
if the subpipeline whose outputs are chosen is not the one
evaluated, then the data flow will not be arranged
correctly.  The static pipeline checker appears to need an
oracle of some kind to predict accurately which subpipeline
will run.</p>
<p>It would seem simpler (at this point I am leaving the text
of the spec behind and just thinking about the problem in 
general) to say that the context inherited by the subpipelines
is just the same as the context inherited by the Choose,
with the addition of the optional declared input to the
set of data sources:
<scrap id="Choose.context.v3" name="Context calculation for Choose, v3">
  context = inherited_context(container.@context,
              ins, none, none)
</scrap>
And the context transmitted from the subpipelines down to
their containees should contain the outputs declared for
each subpipeline.  (This description will work whether
ports are names or non-name objects which have names.)</p>
<p rend="meta-xproc">The wording in the spec seems to
suggest that the design currently suffers from contradictory
impulses:  some things seem to imply that at the abstract
level there are When and Otherwise subpipelines (and 
nothing else) directly contained by the Choose; other things
(like the description of context for the Choose) seem
to imply that no subpipelines are visible within the Choose,
and that it's important, as a consequence, that the Choose
(rather than the subpipeline) ensure that the output ports
of the chosen subpipeline are visible among the data
sources of that subpipeline.</p>
</div>
<div>
<head>When and Otherwise</head>
<p>The When subpipelines are normal subpipelines, except for
having a guard expression and an XPath_context to evaluate it in
(and no other inputs).  Section 4.2.9 reads in part:
<q type="block">If no context is specified on the p:when, 
the context specified on the p:choose is used. It is a static 
error if no context is specified in either place.</q>
It's simplest for now to model the static error by 
simply making it impossible.  We constrain the field
XPath_context to have a single Port as its value;
if there is no input, the XPath_context is that of the
container.
<scrap id="sig.When" name="When subpipelines">
sig When extends Subpipeline {
  XPath_context : one Port,
  test : XPathExpression
}{
  lone ins
  if #ins = 0
     then XPath_context = container.@XPath_context
     else XPath_context = ins
}
</scrap></p>
<p>The Otherwise subpipeline is simpler.  Section 4.2.9 says
that its XML representation must declare no inputs; we
reproduce that constraint here.
<scrap id="sig.Otherwise" name="Otherwise subpipelines">
sig Otherwise extends Subpipeline {
}{
  ins = none
}
</scrap></p>



</div>
<div>
<head>Guard expressions</head>
<p>For the moment, I'll leave the Boolean evaluation of the
guard expressions out of the model. The model thus has no
representation of the constraints that:</p>
<list type="propositions">
<item>In a Choose, at most one guard expression among the
contained When subpipelines may evaluate to true.</item>
<item>In a Choose, if no guard expression on any
contained When subpipeline evaluates to true, and
no Otherwise subpipeline is present, then it is a dynamic
error.</item>
</list>
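<p rend="meta-alloy">Should guard evaluation be added later, the two
propositions above could be approached along the following lines.
This is a standalone sketch under stated assumptions, not part of the
model: the signatures <ident>Branch</ident> and
<ident>Evaluation</ident>, their fields, and the predicate
<ident>well_behaved</ident> are hypothetical, and the outcome of
evaluating the guards is represented simply as the set of When
branches whose guard came out true.
<scrap id="sketch.guards" name="Guard evaluation in a Choose (sketch)">
// Hypothetical toy signatures, not those of the main model.
sig Branch {}
sig Evaluation {
  whens : set Branch,       // the When subpipelines of one Choose
  fallback : lone Branch,   // the optional Otherwise subpipeline
  trueGuards : set Branch   // Whens whose guard evaluated to true
}{
  trueGuards in whens
  no (fallback &amp; whens)
}
// An evaluation that raises no dynamic error: at most one guard
// is true, and if none is true an Otherwise is present.
pred well_behaved(e : Evaluation) {
  lone e.trueGuards
  (no e.trueGuards) implies (some e.fallback)
}
run well_behaved for 3
</scrap>
An evaluation for which <ident>well_behaved</ident> fails would
correspond to one of the error conditions named in the
propositions.</p>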
</div>
</div>
<div>
<head>Group</head>
<p>The 'group' construct <q>encapsulates the behavior of its
<term>subpipeline</term></q>, according to section 3.5 of
the XProc specification.  It's not clear that there is any
need for a separate Group construct at the abstraction level,
distinct from the Subpipeline abstraction.</p>
<p rend="meta-xproc">Er, why <emph>is</emph> the group
construct visible in the abstraction layer?</p>
<p>The inputs, outputs, parameters, and containees are
all <q>as declared</q>, and the context is determined
using the <ident>inherited_context</ident> algorithm.</p>
<p>In Alloy terms:
<scrap id="sig.Group" name="Groups">
</scrap></p>
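<p rend="meta-alloy">A minimal sketch, assuming that a Group needs
nothing beyond what the Subpipeline signature declared earlier
already provides (inputs, outputs, parameters, and containees as
declared, and a context calculated with
<ident>inherited_context</ident>):
<scrap id="sketch.Group" name="Groups (sketch)">
// Assumes the Subpipeline signature declared earlier in this
// document; a Group adds no fields and no constraints of its own.
sig Group extends Subpipeline {
}
</scrap>
On this reading Group is distinguished from other subpipelines only
by its name.</p>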
</div>
<div>
<head>Try/Catch</head>
<p>...</p>
<p>In Alloy terms:
<scrap id="sig.try-catch" name="Try/catch">
</scrap></p>
</div>
<div>
<head>Other steps</head>
<p>...</p>
<p>In Alloy terms:
<scrap id="sig.othersteps" name="Other steps">
</scrap></p>
</div>
</div>
<trailer>$Id: models.xml,v 1.16 2007/01/23 20:03:02 cmsmcq Exp $</trailer>
</body>
<back>
<div>
<head>Sweb Notation</head>
<note type="display"><p>Section on Sweb notation needed here.</p>
<p>Or better yet, external reference to a simple document on
how to read Sweb notation.</p>
<p>For now, the reader puzzled by the presentation of the Alloy
fragments in this document is probably best off reading
any of the various introductions to literate programming
easily found on the network; the system used here is
SWeb, described in <ptr type="bibref" target="SWeb"/>.</p></note>
</div>
<div>
<head>References</head>
<listBibl>
<bibl id="Jackson" n="Jackson 2006">
Jackson, Daniel.  
<title level="m">Software abstractions: Logic, language, and
analysis</title>.  Cambridge: MIT Press, 2006.
</bibl>
<bibl id="McMorran-Powell" n="McMorran/Powell 1993">
McMorran, Mike, and Steve Powell.
<title level="m">Z guide for beginners</title>.  
Oxford: Blackwell Scientific, 1993.
</bibl>
<bibl id="Potter.et.al" n="Potter/Sinclair/Till 1991">
Potter, Ben, Jane Sinclair, and David Till.
<title level="m">An introduction to formal specification 
and Z</title>.  New York: Prentice Hall, 1991.
</bibl>
<bibl id="SWeb" n="Sperberg-McQueen 1993">
Sperberg-McQueen, C. M. 
<title level="u">SWEB: an SGML Tag Set for Literate Programming</title>.
Unpublished paper.  Available on the Web at
<xref>http://www.w3.org/People/cmsmcq/1993/sweb.html</xref>.
</bibl>
<bibl id="XProc" n="Walsh / Milowski 2006">
Walsh, Norman, and Alex Milowski, ed.
<title>XProc: an XML Pipeline Language</title>.  
W3C Working Draft 17 November 2006.
[Cambridge, Sophia-Antipolis, Tokyo]: World Wide Web Consortium, 2006.
On the Web at:
<xref>http://www.w3.org/TR/2006/WD-xproc-20061117/</xref>.
Latest version:
<xref>http://www.w3.org/TR/xproc/</xref>.
</bibl>
<bibl id="Wordsworth" n="Wordsworth 1992">
Wordsworth, J. B.
<title level="m">Software development with Z: A practical
approach to formal methods in software engineering</title>.  
Wokingham: Addison-Wesley, 1992.
</bibl>

</listBibl>
</div>
<div>
<head>To do</head>
<p>N.B. This list needs to be cleaned up; some items 
appear to be ghosts which have wandered in from another
document.</p>
<list type="bullets">
<item>Formulate propositions from the rest of section 2,
and from section 3 (trying, though, to minimize repetition).</item>
<item>Rework exposition of xproc01 and xproc02 to note
which propositions it does and doesn't model.</item>
<item>Formulate more Alloy models and generate more sample
instances.  Illustrate variant ways of solving design
problems (like component connections:  restrict to siblings?
or allow arbitrary linkages?)</item>
<item>Model data flows in various ways:<list type="bullets">
<item>Component -> Component</item>
<item>ins : Name -> XMLDoc</item>
<item>ins: set Name, ... sig Flow { from: Source, to: Sink }</item>
</list>
</item>
<item>Model context construction and inheritance.</item>
<item>Model correct construction and behavior.</item>
<item>Model signature checking.</item>
<item>Model error handling: static pipeline errors, 
dynamic pipeline errors, component failures.</item>
<item>Model component identity several ways:<list type="bullets">
<item>Type/instance story for components, with unconstrained
instances (as specified in current draft)</item>
<item>1:1 with XML of the pipeline document (no component
ever is in more than one pipeline)<note place="foot">
No component is ever in more than one pipeline, that is, 
unless its XML element is in more than one XML document.
That's a common view, but elements in shared external
entities and material included via XInclude might be
understood as an exception.</note></item>
<item>No type/instance distinction, only a component/use (or 
component / occurrence) distinction.</item>
<item>No component/use (or 
component / occurrence) distinction, either: put everything
that has to be distinct into the pipeline, not into an
occurrence.</item>
</list>
</item>
<item>Can we assert that the flow graph of a legal pipeline is
always connected?  Or is it legal to have a component which
has no input and produces no output that is used?  If such a
pipeline is legal, how is the ordering of the steps done?</item>
<item>Model step uniqueness all three ways:  step as token
(how to formulate this?), step as type (only one XSLT),
step as indeterminate (may or may not be distinct).</item>
<item>Model composite structures (requires making a subpipeline
construct, to make sense of the spec)</item>
<item>Test the definition of ordering; I think it's correct
but a naive translation rules out useful pipelines (see
etudes/xproc04 comments)</item>
<item>What ways are there to outlaw 'senseless' data flows
in the model?  E.g. step output = pipeline input,
or step input = pipeline output (and nothing else).</item>
<!--* <item>Model with and without self-containment.</item> *-->
<item>Model the input and output ports of steps as names,
flows as mapping from name to name.</item>
<item>Model ports as objects, and their names as 
a property on each step, mapping from ports to names.</item>
<item>
reduce number of uses of 'formal'.
</item>
<item>
some comments on grapheme, type, token, ... Introduce t/t more
formally.
</item>
<item>
'abstract object' is a problem.  If abstract, then perhaps no
object?
</item>
<item>
article on type/token distinction
</item>
<item>
go easier on Tanselle and Vander Meulen
</item>
<item>
systematic ambiguity ... ?
</item>
</list>
</div>
</back>

</text>
</TEI.2>
<!-- Keep this comment at the end of the file
Local variables:
mode: xml
sgml-default-dtd-file:"/Library/SGML/Public/Emacs/sweb.ced"
sgml-omittag:t
sgml-shorttag:t
End:
-->

