XML Processing Model WG -- 27 Feb 2006

At a couple of points, my
hands became jittery from the discomfort of typing on 
my laptop keyboard, and I took notes on paper.  When I
searched for those paper notes this morning, I failed
to locate them; the lacunae are marked explicitly. My
apologies.

-CMSMcQ

[Tuesday 28 February 2006, Mardi Gras, afternoon session.]

[2:00 p.m.]

Henry Thompson spoke about the (unreleased) Markup Technology Pipeline
Language.  It has a simple GUI for building a pipeline of processes;
built-in processes include processes to absolutize URIs, eliminate
elements, filter elements, call arbitrary programs which map XML to
XML, wrap subtrees in particular elements, check URIs, send and
receive SOAP messages (synchronously), perform XInclude processing, and
execute an XSLT 1.0 transformation sheet.  One XML representation of
the pipeline uses a language very similar to that of Sun's pipeline
language.  (It does add some built-in semantics, namely stdin and
stdout, which are not present in the Sun pipeline language.)  But
those pipelines are not runnable; there is a compiled form which is
much more highly articulated and somewhat less readable.

The compiler itself is an eight-stage pipeline (not a separate set of
Java classes).

Two interfaces are provided: event streams and full documents.  The
objects provided are the same where possible: a start-tag event looks
like an element with no children.  "Viewports" are an important tool:
select elements, pass them through part of the pipeline, and
reassemble the original document, replacing the original elements by
their images.

Examples:  validate twice with surgery.

http://www.markup.co.uk/showcase/

HT identified as key points: (1) separation between UI, the language
users see, and the language the pipeline processor actually executes;
(2) a resource manager which is largely an optimization issue but does
mean you can write straight-through pipelines with appeals to the
resource manager which would otherwise require multiple inputs; (3) the
two-level story allows a clean way to say component design is
independent of push vs. pull or tree vs event -- the compilation
process and the runtime take care of mismatches.

Q: If you didn't have both push and pull, would you not need segments?
A: The end-viewport component has two inputs, so I need both.

[Some discussion lost here; apologies from scribe.]

Topic: iteration

NDW: does it suffice to say for v1 that you can iterate a pipeline
over the sequence of documents, but cannot iterate to a fixed point,
and cannot iterate for some fixed number of repetitions?

RT: how about an implicit iterator?  A component that takes a single
document, when presented with multiple docs, runs on each document and
produces a sequence of docs.  (At this point, someone says something
under their breath about mapcar.)
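
As an illustration only (nothing like this was shown at the meeting),
RT's implicit iterator can be sketched in Python.  The name lift and
the use of plain strings as stand-ins for documents are assumptions
made for the sketch; the point is simply that a single-document
component is mapped over a sequence, much as mapcar would do.

```python
def lift(component):
    """Wrap a single-document component so that it also accepts a sequence.

    Hypothetical helper: 'component' is any function from one document
    to one document. Documents are modelled as plain strings here.
    """
    def run(docs):
        # A sequence of documents: apply the component to each in turn
        # and return the sequence of results (the mapcar behaviour).
        if isinstance(docs, (list, tuple)):
            return [component(d) for d in docs]
        # A single document: behave exactly like the original component.
        return component(docs)
    return run


upper = lift(str.upper)
single = upper("<a/>")
many = upper(["<a/>", "<b/>"])
```

EB's objection applies directly to a sketch like this: the caller
cannot tell from the call site whether iteration will happen.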

MM: what is a "viewport"?  HT: it's a pipeline stage that allows you
to identify a set of nodes, apply a process to each of them, and
produce as a result document the original 'matrix document', with the
results of the process substituted for the original selected elements.
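
A minimal sketch of the idea in Python, for illustration only.  It is
not the MT Pipeline implementation; the function names viewport and
shout, the selection by tag name, and the use of ElementTree are all
assumptions.  Nested matches and a matching root element are not
handled.

```python
import xml.etree.ElementTree as ET


def viewport(doc, match, process):
    """Replace each child element whose tag equals 'match' by process(child).

    The rest of the 'matrix document' is left untouched.
    """
    # Collect the targets first so the tree is not modified while walking it.
    targets = []
    for parent in doc.iter():
        for index, child in enumerate(list(parent)):
            if child.tag == match:
                targets.append((parent, index, child))
    # Substitute each selected element by its image, one for one.
    for parent, index, child in targets:
        image = process(child)
        image.tail = child.tail  # keep the text that followed the original
        parent[index] = image
    return doc


def shout(elem):
    """Hypothetical sub-pipeline: upper-case the element's text."""
    out = ET.Element(elem.tag, elem.attrib)
    out.text = (elem.text or "").upper()
    return out


doc = ET.fromstring("<doc><p>keep</p><note>fix me</note><p>keep too</p></doc>")
result = ET.tostring(viewport(doc, "note", shout), encoding="unicode")
```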

We discussed alternate names for this kind of construct: peephole,
subtree, bypass.

EB (responding to RT): I'm wary of putting too much emphasis on
implicit aspects.  If you want to iterate, use an explicit iteration
construct.

Iteration can be handled either by a specific component, or by
built-in language-level constructs.  We discussed.

We digressed into a discussion of the analysability of ad hoc
constructs vs built-ins; macros vs special forms (fexprs).

NDW reiterated his proposal: in v1, iteration would be limited to
iteration over a sequence of documents.  (This is intended without
prejudice to the presence or absence of
viewport/subtree/peephole/bypass.)

HT: yes.

MSM: I thought we had requirements for iteration to a fixed point?
Didn't we discuss them just the other day? A: yes, we discussed
iteration to a fixed point the other day.  But no, it has not been
accepted as a requirement, at least not as one we are committed to
achieving.

There followed a discussion of what 'requirements' means.  We have a
set of requirements, some of which we will and some of which we won't
meet.  Or we have a set of candidate requirements, some of which we
will accept as actual requirements and others of which we won't.

There was some concern over NDW's proposed restriction: AM doesn't
want to lose viewports through inattention.

Non-use-case use case: the Atom feed which just gives you a bit at a
time, with a link to Next bit, will be hard (impossible?) to handle
without iteration to a fixed point or iteration under control of some
Boolean condition.
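
To make the difficulty concrete, here is a hedged Python sketch of
the loop such a feed requires.  It assumes the link to the next bit
is expressed as an Atom link element with rel="next"; the names
collect_titles, fetch and PAGES are hypothetical, and the pages are
held in memory rather than retrieved over the network.  The number of
repetitions is not known in advance: the loop runs until a page
carries no such link.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def collect_titles(start, fetch):
    """Follow rel="next" links from 'start', gathering entry titles."""
    titles = []
    seen = set()
    url = start
    # Boolean-controlled iteration: stop when there is no next link
    # (or when a link points back to a page already visited).
    while url is not None and url not in seen:
        seen.add(url)
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(ATOM + "entry"):
            titles.append(entry.findtext(ATOM + "title"))
        url = next(
            (link.get("href") for link in feed.findall(ATOM + "link")
             if link.get("rel") == "next"),
            None,
        )
    return titles


# In-memory stand-in for a paged feed (assumed data, for illustration).
PAGES = {
    "page1": '<feed xmlns="http://www.w3.org/2005/Atom">'
             '<entry><title>a</title></entry>'
             '<link rel="next" href="page2"/></feed>',
    "page2": '<feed xmlns="http://www.w3.org/2005/Atom">'
             '<entry><title>b</title></entry></feed>',
}

titles = collect_titles("page1", PAGES.__getitem__)
```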

[Coffee break here]

[When the scribe returned from the break, discussion of parameters was
underway.  One point to record: if we use any one of the schemes
proposed for dynamic parameters, it doesn't preclude the use of simple
ways of specifying static parameters.]

NDW noted: We've discussed conditionals, iteration, resource managers
vs pipes; what else is there near the top of people's lists?

AM: viewports? 

EB: sub-pipelines?  There are issues, maybe just issues of detail.
But we haven't agreed on how they connect up.

And we haven't agreed on a processing model.  Backward, forward,
other?

HT hypothesized that if we can come up with a way of describing the
semantics of the pipeline language that didn't require us to take a
stand on the question of data flow model vs dependency model, it would
be a win.

AM (another topic to discuss): XDM data model vs infosets.

Topic: infosets vs data models vs ...

The only problems AM has with XDM are (a) its weird treatment of
invalid material and (b) its incomplete access to the PSVI.

HT agreed.  Also, the XML Schema WG did its work by adding infoset
properties; others may do the same.

MSM asked: how does that distinguish the infoset from XDM?  There
followed a discussion (inconclusive) about whether XDM is closed.

[Discussion lost by scribe here, apologies ...]

Would it cause problems if we said that what gets passed around are
XDMs?  Technical problems?  Political problems?

Some voices say Richard can't do what he wants in that case (i.e. the
language cannot be implemented by using pipeline stages which each
read and write XML serial syntax and run in Unix pipelines).  Richard
was not convinced: if all you have are pre-defined components, and you
have some restrictions on what is required, then you won't necessarily
be able to tell the difference.  

HT: what would the conformance clause say, if we wanted that?  I can't
see what people want.

[There followed an excursion into the formulation of conformance
clauses.]

At 5:30, Norm provided a concluding summary and evaluation of the
meeting and we adjourned.
XML Processing Model WG

28 Feb 2006 P.M.

Attendees