XProc Minutes 9 Feb 2006

The XML Processing Model (XProc) WG met on Thursday, 9 Feb 2006 at
11:00a EST (08:00a PST, 16:00GMT, 17:00CET, 01:00JST+, 09:30p India) for
one hour on the W3C Zakim Bridge.

See the XProc WG[1] page for pointers to current documents and 
other information. 

Norm gave regrets; Michael (MSM) chaired (and scribed), and apologizes
for the late arrival of these minutes.

Attendance:

Present
  Erik Bruchez, Orbeon
  Vikas Deolaliker, Sonoa Systems
  Andrew Fang, PTC-Arbortext
  Murray Maloney, invited expert
  Alex Milowski, invited expert
  C. M. Sperberg-McQueen, W3C
  Henry Thompson, W3C
  Richard Tobin, Univ. of Edinburgh
  Alessandro Vernet, Orbeon
  Paul Grosso, PTC-Arbortext

Regrets
  Norm Walsh, Sun Microsystems
  Jeni Tennison, invited expert
  Rui Lopes

 1. Administrivia
      1. Accept this agenda.

Accepted without change.

      2. Accept minutes[3] from the previous teleconference

Accepted without change.

      3. Next meeting: 16 Feb 2006.  

Noted.

      4. Tech Plenary[4] registration is now open[5].

Noted.

 2. Technical
      1. XProc Requirements and Use Cases[6]

We continued our discussion of the document.  
http://www.w3.org/XML/XProc/docs/langreq.html

Alex Milowski asked whether anyone had any general comments on the
design goals section, before resuming our walk through the specific
requirements.

Alessandro said he was a bit concerned about the terminology used,
particularly the specific mention of the infoset.  We had recently
discussed what the input and output of pipeline stages should be (in
particular, infosets vs. XDM instances), but AV did not think we had
reached consensus one way or the other.  He would prefer to avoid
talking specifically about infosets.

Alex said he felt strongly that we had to set a minimum bar, and that
the infoset was that minimum.

Henry thought we had actually reached agreement that pipeline stages
and implementations of the pipeline language are not constrained to a
particular data model, but that they are constrained to support the
infoset.

There followed a long discussion of whether we support arbitrary data
streams including non-XML data, or streams limited to particular
vocabularies or subsets of the infoset (e.g. XML, but not with
attributes).  There was some strong sentiment in favor of saying no,
we do not support arbitrary data streams, but there were also some
concerns about that restriction.  There seemed to the scribe to be
something like consensus that it needs to be possible to build
special-purpose pipeline stages that only support specific
vocabularies, and that if such a vocabulary has (for example) no
attributes, it might be a challenge to formulate a requirement that
attributes (to continue the example) must be supported.

Henry suggested that we should probably elevate to the status of a
general rule the basic principle that pipeline stages should pass the
input infoset through to their output without change, except for the
changes which are part of the processing and which are documented.  If
(for example) a component is advertised as accepting an XPath which
denotes a set of nodes which contain URIs, and a base URI, and
producing output in which the URIs have all been absolutized, then if
all the namespace bindings are missing from the output, Henry wanted
to have a legitimate grievance against the component maker.
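Henry's example could be sketched as a minimal, hypothetical pipeline
stage (Python purely for illustration; the function name, the use of
ElementTree, and the choice of an href attribute are assumptions, not
anything the WG specified): it rewrites the selected URIs and passes
every other info item through untouched.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

def absolutize_hrefs(doc_text, base_uri, attr="href"):
    """Hypothetical pipeline stage: make every `attr` value absolute
    against base_uri, and pass all other info items through unchanged."""
    root = ET.fromstring(doc_text)
    for el in root.iter():
        if attr in el.attrib:
            el.set(attr, urljoin(base_uri, el.get(attr)))
    return ET.tostring(root, encoding="unicode")

out = absolutize_hrefs('<doc><a href="x.xml"/><b keep="yes"/></doc>',
                       "http://example.org/base/")
# the href is absolutized; the unrelated attribute on <b> survives intact
```

A stage that additionally dropped the keep attribute (or, in Henry's
example, the namespace bindings) would violate the proposed
pass-through rule unless its documentation said so.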

MSM sympathized with this view, but wondered whether such a rule was
inherently toothless, in the sense that the maker of the component
described by Henry would be able to make the component work
'correctly' by changing the documentation to say "this component
absolutizes all URIs and suppresses all namespace bindings".  Such a
description might make clearer to the user that that component is not
really useful in practice, but it could be a correct, conforming
component nonetheless.  Henry agreed that it would be hard to make the
definition of conformance entail usefulness, but thought the notion of
component signatures might be worth exploring even so.

Alessandro suggested that the discussion showed clearly that we don't
have a clean consensus on the details; there seemed to be agreement on
this conclusion.  It is probably enough, said Alessandro, if the
requirements document says it's a requirement to be clearer about this
problem.

Alex asked whether we should perhaps require that the language define
a minimum set of infoset items and properties.  That would leave open
for us to define either a subset of the infoset, or require XDM, or do
something else.  But it does say clearly that doing that is a work
item.

Richard expressed concern about possible over-specificity.  It's
(still) a possibility, he said, to define a system that does not allow
addition of components, so everything would be a black box.  In such a
system, with black boxes for components, the user isn't actually able
to *tell* what info items are flowing across the component boundaries.
[Possible exception: in the right circumstances, a change in the input
which fails to elicit a corresponding change in the output would
indicate that a particular piece of information is not crossing some
boundary somewhere.]
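Richard's bracketed observation amounts to a differential probe:
perturb one info item in the input and see whether the output changes.
A hypothetical sketch (the `skeleton` black-box component is invented
here for illustration, and identical outputs only suggest, not prove,
that the item never crossed the boundary):

```python
import xml.etree.ElementTree as ET

def probe_flows_through(component, base_input, variant_input):
    """True if the perturbation between the two inputs is observable
    in the component's output."""
    return component(base_input) != component(variant_input)

def skeleton(doc_text):
    """Invented black box: keeps element structure, drops attributes
    and character data."""
    root = ET.fromstring(doc_text)
    for el in root.iter():
        el.attrib.clear()
        el.text = None
    return ET.tostring(root, encoding="unicode")

probe_flows_through(skeleton, '<d a="1"/>', '<d a="2"/>')  # False: the
# attribute change is invisible, so attributes evidently do not cross
probe_flows_through(skeleton, '<d/>', '<e/>')  # True: element names do
```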

Alex replied that we do have a requirement for adding components.
Should that requirement be labeled optional?  (If it's not optional,
then it's not actually still a possibility to define a system that
does not allow addition of components, so everything would be a black
box.)

Murray said it seemed to him that we can't say what happens inside
of each component: it might use the infoset, or one data model, or
another -- that's not up to us, it's up to the component.  He supposed
we could limit the inputs and outputs.  But at least for the terminal
component, it's useful to be able to produce non-XML output (text
files, Postscript, ...), so it seems we are likely to be on thin ice
if we seek to eliminate all non-XML data streams from our purview.
Even agreeing that some minimum bar needs to be set, Murray said, some
of the possible places we've talked about putting the bar seem
(unnecessarily) restrictive.

Richard said he was more interested in restricting the WG in our
deliberations than in restricting implementations.  He didn't want
this to expand to become a wholly general language for processing
arbitrary data.  But he saw only one way to avoid that, namely to say
that what's passing through the pipeline is XML -- though not
necessarily in textual form (hence the reference to the infoset).

Alex said he had registered an issue on requirement 4.3, so it should
be trackable now.

Perhaps, he continued, this topic (infosets and nature of what passes
through the pipeline) should be on the ftf agenda, and then we can
table it (i.e. suppress it) on the calls between now and then.

He encouraged the WG to look at the use cases, at least briefly (with
the side warning that the HTML is currently sub-optimal).

In terms of the requirements document, Alex reminded the WG, we have
made it through to 4.6.  He said he had stuck in the strawman we
talked about last time, refactoring the old requirement into two
pieces: 4.6 is the idea of having standard names for standard steps,
and 4.7 is a specific proposal for a minimal set of standard steps.

At this point, the allotted time expired and we adjourned.

 3. Any other business

None.

[1] http://www.w3.org/XML/Processing/
[3] http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Feb/0004
[4] http://www.w3.org/2005/12/allgroupoverview.html
[5] http://www.w3.org/2002/09/wbs/35195/TP2006/
[6] http://www.w3.org/XML/XProc/docs/langreq.html