XML Processing Model WG -- 5 Oct 2006

1. Administrivia

1.1. Roll call.

Present:

Andrew Fang, PTC-Arbortext
Paul Grosso, PTC-Arbortext
Alex Milowski, Invited Expert
Michael Sperberg-McQueen, W3C/MIT
Richard Tobin, University of Edinburgh
Alessandro Vernet, Orbeon, Inc.
Mohamed ZERGAOUI, INNOVIMAX

Absent / regrets:

Erik Bruchez, Orbeon, Inc.
Vikas Deolaliker, Sonoa Systems, Inc.
Rui Lopes, Invited Expert
Murray Maloney, Invited Expert
Jeni Tennison, Invited Expert
Henry Thompson, W3C/ERCIM, University of Edinburgh
Norman Walsh, Sun Microsystems, Inc.

1.2. Accept this agenda.

Accepted without change.

1.3. Accept the minutes of 28 Sep 2006.

http://www.w3.org/XML/XProc/2006/09/28-minutes.html

Accepted as a true record.

1.4. Next meeting: 12 Oct 2006.

No regrets.

2. Technical

2.1. Scope of step names

RT said that he hadn't followed the discussion of this topic in detail and asked AM to summarize.

AM said his main goal in

http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Oct/0003.html

was just to clarify the spec. We spent most of last week's call saying we don't want to talk about graphs anymore, but the current draft just has this cryptic sentence saying

The scope of component names is the flow graph of their container and the flow graphs of the constructs therein, recursively.

which is going to be hard to make clear, if we aren't talking about graphs any more. We need a better story.

One specific question, he said, is this: can you point to the ports of your siblings? Or more generally, what can you point to?

If we don't have a notion of a flow graph, then you have to talk about questions like this in terms of the XML.

RT said he'd be very troubled if we did that. The XML, he argued, is a representation of a more abstract language. There are aspects of it we do not want to have to talk about.

AM responded that he'd be happy to work / describe scope rules on a different level -- but then we have to say what that level is, and avoid the kind of problems that soured us on graphs.

MSM suggested that we distinguish two questions:

(1) What level should we work at, in describing the scope rules and so on?
(2) What should the scope rules be?

W.r.t. (1), MSM proposed the straw-man position that we should talk solely in terms of the XML. We do not need a distinct layer of abstraction: if we add one, and it's not really very different, we have only added confusion; if we add one that's really very different structurally from the XML, then our XML syntax stinks and should be redone. He noted that while programs viewed as sequences of characters are clearly representations of more abstract objects, still Kernighan and Ritchie describe C programs not as sets of conditionals and branches and so on, but as character sequences that obey certain rules and have certain meanings. Wirth's description of Pascal similarly stays very close to the surface. Scoping rules are formulated in terms of the parts of the program text in which a particular identifier and binding are visible, and where they're not visible.

W.r.t. (2), MSM asked the devil's advocate question: why have rules at all? Why not just say port references have to be in a form that allows unambiguous identification of the port, and decline to have rules about forward reference and so on?

RT responded that he thought the XML syntax should indeed be fairly close to the underlying abstraction, but he suggested that there are some features of the XML surface syntax that we want to be able to ignore. (We did not go into details, or if we did, the scribe missed them.)

A few days ago, RT had posted a sample of the style of description he has in mind:

http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Oct/0032.html

Taking up the C example, RT noted that the C spec (not K&R, but the standard) seems to him to identify an abstract level of thing in its organization: it has a segment on the conditional, in which it talks about its semantics, and syntax, and so on, and another on functions, and so on, abstraction by abstraction. It doesn't focus exclusively on the syntax.

Take (RT continued) the example of our choice construct. RT would not like the scope rules to have to change, if we change our minds about this or that thing being a child element or an attribute.

RT answered MSM's second question by noting that there are some connections that make no sense. For example, a port in branch 1 of a choose should almost certainly not be written to expect output from a port in a different branch of the same choose -- if one of them runs, the other won't. So MSM's straw man proposal of "no rules" won't really work.
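RT's point about choose branches can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not something the WG agreed on: the function name, the representation of a step's position as a path of (construct, branch) pairs, and the convention that only a choose has a non-None branch index.

```python
def in_exclusive_branches(path_a, path_b):
    """Return True if two steps sit in different branches of the same choose.

    Each path is a tuple of (construct_name, branch) pairs from the pipeline
    root down to the step's container. By assumption, branch is None for
    containers with a single body and an integer for a choose.
    """
    for (name_a, branch_a), (name_b, branch_b) in zip(path_a, path_b):
        if name_a != name_b:
            # The paths diverge into different constructs: not a choose conflict.
            return False
        if branch_a != branch_b:
            # Same construct, different branches: at most one of them runs.
            return True
    return False


# Hypothetical positions of a reading port and the port it wants to read from.
reader = (("pipeline", None), ("choose1", 0))
source = (("pipeline", None), ("choose1", 1))

print(in_exclusive_branches(reader, source))  # True: connection makes no sense
print(in_exclusive_branches(reader, reader))  # False: same branch
```

A scope rule along these lines would reject the first connection, since the two branches can never both run.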

AM was worried that we would end up doing part of a conceptual model, but not a complete one. Doing a complete conceptual model would be really complicated and a lot of work. But doing just part of one is probably not very much help. We'll do better, he said, to stick with something that is close to the tree model.

RT said his level of abstraction was closer to the XML tree than anyone just listening to the discussion (or reading the minutes) might suspect. MSM suggested we walk through it together. RT agreed, although he said he feared we were straying a bit far from the topic on the agenda.

We discussed RT's sample at

http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Oct/0032.html

RT noted that he is distinguishing

component classes ('XSLT 1')
component instances (a particular XSLT step in a particular pipeline)
activations (one invocation / run / activation of a particular XSLT step in a particular pipeline)

MSM asked about the apparent discrepancy between identifying ports and so on in the description of the class, but saying at the top of the message that the static environment is a property of the component instance. RT reflected that methods, too, are described at the class level but also spoken of as belonging to instances. He clarified, however, that the static environment is not the set of ports a component instance has -- it's the ports of all the other components in the pipeline which are visible to the component instance, the set of ports in scope, the ports it can name.
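The three levels RT distinguishes, and the place of the static environment among them, might be pictured roughly as follows. This is only a sketch in Python under stated assumptions: the type names, the field names, the step names and the port names ("source", "stylesheet", "result") are all hypothetical and are not taken from RT's message or from the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentClass:
    """A kind of step, such as 'XSLT 1'. Ports are described at this level."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass(frozen=True)
class ComponentInstance:
    """A particular step in a particular pipeline."""
    step_name: str
    cls: ComponentClass
    # Ports of OTHER components that this instance may name (the ports in
    # scope), not the ports the instance itself has.
    static_env: frozenset = frozenset()


@dataclass(frozen=True)
class Activation:
    """One invocation / run of a particular instance."""
    instance: ComponentInstance
    run_id: int


# Hypothetical example data.
xslt1 = ComponentClass("XSLT 1", inputs=("source", "stylesheet"), outputs=("result",))
step = ComponentInstance("style-chapter", xslt1, frozenset({("load", "result")}))
runs = [Activation(step, i) for i in range(2)]
```

Here the instance's own ports come from its class, while `static_env` records what the instance can see; two activations share the same instance and therefore the same static environment.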

After discussion of the example, MSM said the descriptions do look slightly abstracted from the syntax, but they are still very treelike. He wondered whether AM would find it treelike enough.

AM said that when RT started talking about activations and flows, AM did get concerned. There's a boundary we have to be careful not to cross, to avoid overconstraining the implementation. People shouldn't (be tempted to) take this as a literal description of how you write the implementation, just a description of how you connect things up.

RT agreed. He didn't think it actually does suggest that it's an implementation recipe. It's a bit like programming-language specs: using a stack and representing procedure activations as stack frames is one implementation strategy, but not the only implementation strategy, and the specs accordingly avoid talking about stack frames (or even stacks).

AM said he'd like to register a general concern that there's a slippery slope here, and we could end up building implementation assumptions into the spec. He didn't have any specific examples.

RT shared the concern, and said that if AM or anyone else ever finds something that assumes a particular implementation strategy, he expected he would agree that it needs to be changed.

RT suggested that the sample descriptions in his email really should be augmented with descriptions of the XML syntax. There was a brief moment when it looked as if we would digress permanently onto the question of how to document the syntax (DTD notation? XSD notation? RNG? ad hoc?), but the Working Group beat it back.

MSM proposed a deal: he would accept RT's desire to think of the description as describing an abstract component layer, as long as the description is also readable as describing just the surface form / the XML.

RT suggested that the biggest complication there is that some things will inherit part of their description from other things.

MSM noted that in some cases that kind of thing can lead to sub-optimal design of the surface syntax.

AM suggested we could get a lot of mileage by starting with just two fundamental constructs / ideas: atomic components, and containers. RT eventually persuaded AM that the body (or bodies) of containers (which he called 'flows') are not quite either of those things and need their own description, just as compound statements do in a programming language description.

AM asked: so how does this help us talk about the scope of ports?

RT said that it didn't answer the scope questions, but suggested that it gives us a language for describing the answer: the ports you can refer to are those in your static environment. For a given construct X, the documentation describes what is visible to the flow bodies it contains; typically that will be what is visible to X, plus some stuff which can be described in terms of the structure of X.

MSM said that what RT and AM were describing sounded like an attribute grammar. After some hesitation, RT agreed that it could be viewed that way.

AM asked how, to take a concrete example, an atomic component in a nested flow gains the ability to refer to an input port of the pipeline itself.

RT answered that each construct describes the visible ports for its children, normally in terms of what's available in its own static environment, plus (or minus) other specific things. An example is given in the sample description of for-each, in the email. If a construct didn't just expose new things, but hid things, that could be described, too.

There is not any need to describe interactions between nested things and the containers of the containers of their containers: what happens is that each layer / each link in the chain of containment, describes what is visible to the things it contains; the visibility of distant ancestors gets passed along one layer at a time. In just the same way, an attribute grammar describing an Algol-like language will describe the set of variable bindings visible in a block as the set of bindings visible outside the block, overlaid by the set of variable bindings found in the block itself. In deeply nested blocks, every layer of nesting has its own story about why a top-level binding is visible.
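The layer-by-layer passing of visibility described here can be sketched as a single recursive walk, in the manner of an inherited attribute. The Python below is a minimal illustration under stated assumptions: the `Construct` type, its `exposes` and `hides` fields, and the port names are hypothetical, and the sketch deliberately ignores the still-open question of whether siblings can see each other's ports.

```python
from dataclasses import dataclass, field


@dataclass
class Construct:
    name: str
    exposes: frozenset = frozenset()  # ports this construct adds for its body
    hides: frozenset = frozenset()    # ports this construct removes for its body
    children: list = field(default_factory=list)


def static_envs(node, inherited=frozenset(), out=None):
    """Map each construct name to the set of ports visible to it.

    Each layer only describes what its own children see: whatever it
    inherited, minus what it hides, plus what it exposes.
    """
    if out is None:
        out = {}
    out[node.name] = inherited
    body_env = (inherited - node.hides) | node.exposes
    for child in node.children:
        static_envs(child, body_env, out)
    return out


# Hypothetical pipeline: an atomic step nested inside a for-each.
xslt = Construct("xslt")
loop = Construct("for-each", exposes=frozenset({"for-each/current"}), children=[xslt])
pipe = Construct("pipeline", exposes=frozenset({"pipeline/source"}), children=[loop])

envs = static_envs(pipe)
print(sorted(envs["xslt"]))  # ['for-each/current', 'pipeline/source']
```

The nested step sees the pipeline's input port only because the for-each passed its own environment along; no rule mentions the pipeline and the nested step together.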

At about this point, we recognized that we had gone overtime.

MSM noted that those who had been speaking seemed to have converged on the idea that something like the description suggested in RT's email might be about right (viewed either as a description of a set of abstractions structurally very similar to the XML, or viewed as a description of the XML). He asked those who had not spoken whether they shared this view, or were skeptical. Several people then said they were still uncertain; a desire for further concrete examples was voiced.

2.2. Initial discussions of parameter handling

We didn't reach this topic.

3. Any other business

None.

We adjourned at 7 past the hour.

XML Processing Model WG

Meeting 38, 5 October 2006