XML Processing Model WG -- 28 Sep 2006

Norm is late to the call, Henry begins the meeting (thanks, Henry!)

<ht> http://www.w3.org/XML/XProc/2006/09/28-agenda.html

<ht> Agenda agreed

<ht> Accept minutes of previous meeting as a valid record

<ht> Apologies for next week from HST, Norm

<ht> Tentative agreement from MSM to chair next week

Michael: Suggest we do 2 minutes around the table, then discuss

Alex: Pop the whole discussion up several levels
... The level of detail shown in these diagrams is too great to be helpful to readers
... I use that level in my implementation, but I wouldn't expose it to users
... Maybe we should even just drop all mention of graphs from the spec
... and not constrain implementations

<alexmilowski> Clarification: I mean delete the computer science terminology of a "directed acyclic graph"

Murray: Like what Alex said. Move to a higher level. Don't need to talk about graphs.
... Can move this discussion elsewhere.

MoZ: Need to have a formal semantics in the near future.
... Agree we don't need to talk about this now.
... Go further with semantics and models in parallel.
... Don't want to lose focus on semantics.

Richard: I think graphs are great. They're easy to understand and we should have lots of them in the spec.
... But they aren't the semantics. We should have a straightforward semantics written in English.
... They should work in a natural way describing each of the components.
... Don't need to describe the semantics of the whole flow; do it in a modular way for each component.
... We don't need to decide how the diagrams look because they aren't in the semantics.
... They don't even have to be right; they can give the users an idea without mapping onto the semantics.

<alexmilowski> +1 to that !

Richard: It's irrelevant whether or not the pictures are accurate as long as they're helpful to the reader.

Erik: I like what Richard said.

Michael: I'm made nervous by some of what I've heard.
... I think we need two things, or one thing with two aspects.
... 1. We need to understand what we think pipelines are; and 2. we need a story.
... I'm not committed to the graph description as the best or only story
... But at some level it's clearly appealing.
... I'm made nervous about the proposals that say let's not go there, unless they mean let's find another way to reach clarity.
... I worked on a spec 10 years ago that has worked pretty well, and I remember Dan Connolly pushing often to answer the question "what is an XML document".
... I didn't think we needed that kind of clarity; everyone knows what we mean.
... But many of the hardest problems have come from the fact that we didn't answer the question that Dan asked.
... I now think that it would have been useful
... I'm not sure defining a pipeline as a graph is useful, but it may tell a good story.
... My goal here is clarity.
... If we don't say a pipeline is a graph, I want something similarly concise and suggestive.

Rui: I tend to be pragmatic; I agree with Alex and almost everyone else.

Norm: I'm happy to move on to something else for a bit. I'll assume to the extent that people agree or disagree with what the spec says, they'll comment on the words in the spec.
... Sorry I missed what Alex said.

Henry: My experience producing these diagrams: http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Sep/0071.html/
... I started trying to produce diagrams that were faithful to my understanding.
... Convinced me of two things: a lot of detail is necessary and when we state the semantics of the components, it will not be appropriate to try to do so in terms of the diagrams
... We'll want to use words like the spec currently does perhaps with a more explicit abstract model.

<MSM> Henry, can you say a bit more about what kind of detail you found necessary and why it proved essential?

Henry: It did help me understand

Henry: If we look at figure 1a, a nested graph model of a for-each
... The distinction between thick gray lines and dotted black lines is that the gray lines are the pipes and the dotted lines are the lines of the appropriate type for the component.
... Documents flow down thick gray lines.
... What a dotted line means is context dependent, in 1a the top dotted line means send a single document down this repeatedly.
... The bottom dotted line is the concatenation of the output of all the iterations.
... Now compare with figure 1b
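
[Editorial sketch, not part of the call: the reading Henry gives for figure 1a (send one document at a time down the top dotted line, concatenate the outputs of all iterations on the bottom one) can be written in a few lines of Python. The names `for_each`, `subpipeline` and `wrap`, and the use of strings as stand-in documents, are assumptions made for illustration; none of them comes from the spec.]

```python
from typing import Callable, Iterable, List

Document = str  # stand-in for an XML document (assumption)


def for_each(inputs: Iterable[Document],
             subpipeline: Callable[[Document], List[Document]]) -> List[Document]:
    """Feed one document at a time to the subpipeline and
    concatenate the outputs of all iterations, in order."""
    result: List[Document] = []
    for doc in inputs:
        result.extend(subpipeline(doc))
    return result


def wrap(doc: Document) -> List[Document]:
    # Hypothetical subpipeline: produces one output per input document.
    return [f"<wrapped>{doc}</wrapped>"]


outputs = for_each(["<a/>", "<b/>"], wrap)
```

[In this sketch an empty input sequence simply yields an empty output sequence; whether the spec should say the same is a separate question.]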

<MSM> and why is it essential to distinguish single-document data flows from multi-document data flows / loops?

Henry: In the computed schema case, I've put the computation outside the for-each
... There is an auxiliary input to the for-each because I need a new kind of connection.
... The dotted line from the XSLT result to the XInclude port. This time the dotted line means repeatedly include the same component.

Alex: This is where we need to pop up a level.
... There are some constraints here, but the schema is not an input to the for-each. It's used inside, so your implementation has to do something about it.

<MSM> At the point where we start saying it's essential to distinguish three kinds of edges, I think we have left any story about graphs behind.

Alex: I think there needs to be a line that crosses the red box. It's not formally an input to the for-each.
... I don't do it that way and it works just fine.

<ht> http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Sep/0073.html

Alex: There's a constraint here that you can describe to the user, that the schema has to be the same each time. You don't have to say that it's an input.

Murray: In fact, isn't the computed schema a pull? It's not a computed input, it's a pull from the validate.

Norm: No, it's not a pull, it's not a URI, it's computed by a previous step.

Alex: That's the kind of clarity we need. It's not something you're iterating over, it's a value that goes along for the ride.

Richard: I don't think we want any diagrams like this in the spec. A diagram with three kinds of lines just isn't useful to a new user.

<alexmilowski> +1 to that

<MSM> +1 to opposition to three kinds of dotted lines

Richard: They're useful to us to argue about what we actually mean.
... OTOH, I think diagrams like the ones currently in the spec, which are much more informal, are what we need.
... I'd like to compare the XML spec to the Schema spec. I don't want this spec to be like the Schema spec where you have to analyze every sentence for meaning.
... There should be a place where the semantics are spelled out clearly. If there are errors there, we should fix them. If there are other descriptions that seem to say something slightly different, that shouldn't matter.
... If in the course of producing the semantics, it turns out to be useful to produce a specialized diagram, we can do so.

<MSM> [I agree with Richard that we don't want readers to have to be language lawyers. But I believe one reason the schema spec is unclear is that we did not insist on clear common understanding of what schemas and schema components are. That means the prose has to avoid saying things clearly, it has to say things obscurely so different WG members can each think it's saying what they mean.]

Richard: (replying to MSM's comment above) I don't think that has anything to do with what's wrong with the Schema spec. It simply tries too hard to be declarative. It lays things out as discrete declarations instead of providing a place to read from to understand procedurally what validation does.

Alex: I'm looking at things that say, if you used an input inside a for-each that isn't something you're iterating, then it's constant during iteration.

<MSM> [I think part of the reason the validation rules are hard to read is that they aren't in fact declarative, they are an algorithm pretending not to be an algorithm. What the spec needs is more declarativity not less.]

Richard: I think the easiest way to describe this is in terms of the documents produced by ports.
... The fact that there's a line going across a box simply means that this document is used in each iteration.

Alex: It turns out that implementations are going to have to do something to make that work.

Richard: We can simply describe what happens (the component gets that document) without describing how the component achieves that result.
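
[Editorial sketch, not part of the call: one way to picture the constraint Alex and Richard describe, that a document used inside a for-each but not iterated over is the same document in every iteration. The step names `compute_schema` and `validate`, the call counter, and the string documents are hypothetical; the sketch says nothing about how an implementation must achieve the result.]

```python
from typing import Callable, Iterable, List

Document = str  # stand-in for an XML document (assumption)

schema_computations = 0  # counts how often the schema step runs


def compute_schema() -> Document:
    # Hypothetical earlier step, e.g. a transformation that yields a schema.
    global schema_computations
    schema_computations += 1
    return "<schema version='1'/>"


def validate(doc: Document, schema: Document) -> Document:
    # Hypothetical validate step: records which schema it was given.
    return f"{doc} checked against {schema}"


def for_each(inputs: Iterable[Document],
             body: Callable[[Document], Document]) -> List[Document]:
    return [body(doc) for doc in inputs]


schema = compute_schema()  # produced once, outside the for-each
validated = for_each(["<a/>", "<b/>"], lambda doc: validate(doc, schema))
```

[Here the schema crosses the for-each boundary as an ordinary captured value, which is close to how a regular programming language would handle it.]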

Alex: I wonder if we can try to restate what was going on in the deleted section 4.1.3.
... I think there are a handful of rules that we could say concretely without going into particular implementation semantics.

Henry: The only reason that I'm unsure of our ability to do that is that I'm worried about the cross-product phenomenon.
... Stating things in terms of the auxiliary-input story has the advantage that there are no cross-product problems.
... It worries me that it's not going to be the case that if you refer to a port across a choose boundary and say "thus and such"
... Instead, you're going to have to say "if you refer to one across this boundary and that boundary"...
... I can't prove that, but I'm worried that it won't be possible to say things once.

Alex: I think we have two kinds of things we have to worry about, output of regular steps and these "dangling" inputs.

Norm: I'm also worried about the choose component and dealing with branches that don't execute.

Henry: I thought about that too and I'm less worried. Outputs are always produced (crucially). The fact that an output is plugged into the branch of a choose that doesn't execute can't matter, because the producer of the unused input might have side effects, and they must always happen.
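
[Editorial sketch, not part of the call: Henry's point about choose, under the assumption that a producing step runs before the choose and that only the selected branch runs. The names `produce_report`, `choose` and `run`, and the `log` list standing in for a side effect, are hypothetical.]

```python
from typing import Callable, List

log: List[str] = []  # stands in for an observable side effect


def produce_report() -> str:
    # Hypothetical step whose side effect must happen regardless of
    # whether anything ends up reading its output.
    log.append("report produced")
    return "<report/>"


def choose(condition: bool,
           when_true: Callable[[], str],
           otherwise: Callable[[], str]) -> str:
    # Only the selected branch is executed.
    return when_true() if condition else otherwise()


def run(condition: bool) -> str:
    report = produce_report()  # always produced, whichever branch runs
    return choose(condition,
                  when_true=lambda: f"used {report}",
                  otherwise=lambda: "<empty/>")


result = run(False)
```

[With `condition` false the report is never read, yet its side effect is still recorded in `log`.]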

Richard: The flow and pipe metaphor is unuseful in this case. It works much better to explain it in terms of a document being produced and saying that it will be available if it is executed.

Alex: I think for-each and viewport have a similar but somewhat simpler story. So if we get it right it should work everywhere.
... As a concrete exercise, can we think about those constraints.

Richard: Michael said earlier that conceptualizing this as a graph is very attractive, but it seems to me that it has not so far proved to be a useful way of describing the details.
... In a regular programming language, this sort of problem never arises and that's an indication that perhaps it's not useful here.

Alex: I agree.

Norm: Perhaps that's the highest level take away from three weeks of discussion.

Richard: If we use graphs but they don't map to the semantics, Michael will tell us we're being misleading, right?

Michael: It depends. I notice that lots of people draw pictures and can't actually tell you what the lines and symbols mean.
... If we have a coherent explanation of what the symbols mean, that might be sufficient.
... If they don't depict graphs, then we shouldn't say they are. We should say they "show the data flow" or something like that.
... There's a distinction between drawing something that is false and something that is a simplification.

Richard: Maybe we should instead say that even if we use graphs as diagrams, we shouldn't use formal graph-theory language in any kind of prose if we aren't intending to be precise.
... Perhaps we shouldn't say that a pipeline is a directed acyclic graph...I'd rather keep the lines and boxes as informal descriptions than to formalize them so that they are consistent with graph theoretic terms.

Michael: It's not entirely clear to me that the complications Henry found it necessary to introduce are in fact necessary.
... If you just say that for any pipeline we can construct a graph with nodes and arcs, that's much simpler. That doesn't draw you a picture of parameter passing or flow of control, but as long as you don't say it does, you're not telling a lie.
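
[Editorial sketch, not part of the call: the simpler claim Michael describes, that a graph with nodes and arcs can be built for a pipeline, illustrated with Python's standard `graphlib` module. The step names and the dictionary layout (each step mapped to the steps whose output it reads) are assumptions made for illustration. As Michael notes, such a graph says nothing about parameter passing or flow of control.]

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each step maps to the steps it reads from.
pipeline: Dict[str, Set[str]] = {
    "xinclude": {"load"},
    "xslt": {"load"},
    "validate": {"xinclude", "xslt"},
}


def execution_order(steps: Dict[str, Set[str]]) -> List[str]:
    """Return one ordering that respects every arc.

    graphlib raises CycleError if the arcs form a cycle,
    i.e. if the graph is not acyclic.
    """
    return list(TopologicalSorter(steps).static_order())


order = execution_order(pipeline)
```

[Any ordering returned places `load` before the steps that read from it and `validate` last; the relative order of `xinclude` and `xslt` is not determined by the arcs.]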

Norm describes next steps: one change to the document to remove graph-theoretic terms we apparently aren't comfortable with, and another to propose areas where the spec needs more prose.

Henry: I'm going to take a stab at writing some class definitions for a few of the constructs.
... To see if the lessons I learned can be expressed in one way or the other.

Alex: I'm going to try to recodify some of my constraints.

Norm: (Planning for next week.) I'll see what develops in email and publish an agenda next Wednesday either cancelling the call or proposing an agenda.

Adjourned

- DRAFT -

Meeting 37, 28 Sep 2006
