XML Processing Model WG -- 23 Feb 2006

Administrivia

Alex points to the latest requirements document: http://www.w3.org/XML/XProc/docs/langreq.html

Accept this agenda?

-> http://www.w3.org/XML/XProc/2006/02/23-agenda.html

Accepted

Accept minutes from the previous teleconference?

-> http://www.w3.org/XML/XProc/2006/02/16-minutes.html

Accepted.

Next meeting: 27/28 Feb at the technical plenary.

Room 157 at the Royal Casino Hotel

Henry reminds us that there's a wiki for ride sharing

<ht> http://esw.w3.org/topic/MeetingTaxis

Agenda planning for the face-to-face

-> http://www.w3.org/XML/XProc/2006/02/27-28-agenda.html

Who's willing to speak about existing tools?

Norm, Henry, Richard, Alex, Erik, and Rui volunteer.

Andrew will send a summary of the Arbortext pipeline

Norm reviews the rest of the planned agenda

<ebruchez> all good

Andrew would like the presentations to be in the afternoon because he'll be calling in

Norm: I'll move it to the afternoon

Monday morning: administrivia and use cases

Monday afternoon: presentations and infoset input/output discussion

Norm will update the agenda

Norm to add note about asking in IRC for phone connectivity

Technical

Requirements and use cases.

Alex: Changes from last week: make validation a design principle; removed naming of pipelines as a requirement
... No more editorial changes.
... We were working our way through issues on the list.

Norm asks for explanation of the table at the beginning of section 4

Alex: It's supposed to map requirements to use cases. The presentation doesn't work but it's saying that for each requirement, here's the use case that supports that requirement.
... I wanted to get rid of having the links in the requirements list so that they would be easier to read.
... The presentation needs to be fixed.
... I'm not really concerned about the presentation right now, just as long as we get the content in place.

Norm: +1
... One issue that I thought we could talk about is string parameters or simple datatype parameters as opposed to infoset parameters.

Norm observes that the last word on this thread was from Erik.

Erik: We didn't have many use cases that required parameters so we didn't mind using a little trick for XSLT

Henry: I think we have a terminology problem, the example is full of parameters!

Erik: I think the distinction is between infosets and datatype parameters.
... The question is: do we need two kinds of parameters in the language to be able to do this?
... If we decide we can only pass XML infosets between components, then how do you pass a numeric parameter to a stylesheet?

Richard: Ok, so this is a small subset of the parameter problem. You're talking about parameters that come from other components, rather than parameters that are specified when you write the pipeline?

Erik: Why should parameters only be static?

Richard: I can see that that's a good generalization, but in my pipeline virtually every step has some parameters, but none of them are derived from previous steps.

Erik: Maybe it depends what you call parameter.

Richard: The sort I'm talking about are XPaths to identify bits of a document

Alex: My pipelines run in a J2EE environment, so I'm passing all sorts of stuff to the pipeline.

<Zakim> ht, you wanted to distinguish at least three cases

Henry: It's clear that we're talking a little bit at cross-purposes
... I can see at least three cases: there are things that I think of as parameters that are static, pipeline-design time XML resources: stylesheets for XSLT or schema documents for validation.
... The second class are design-time controls for components: XPaths, etc. I agree with Richard that the 99% case is that that's known at pipeline-design time. It's a static parameter.
... Another case is command line switches to command line invocations: switches with values and booleans, switches that are either present or absent. An XInclude impl could have a command-line switch that indicated whether or not base fixup is applied.
... That's another example of a design-time choice.
... The third case is runtime parameterization that gets accessed by various components at run time.
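
Illustration of the three cases Henry lists, as a minimal Python sketch. The class `Step`, its field names, and the example values (`style.xsl`, `/book/chapter`, `user`) are hypothetical and were not proposed at the meeting; the sketch only shows how static resources, design-time controls, and run-time parameters could be kept apart.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One pipeline step, with the three kinds of parameterization kept apart."""
    name: str
    # Case 1: static, design-time XML resources (stylesheets, schema documents),
    # recorded here as URIs. Hypothetical representation.
    static_resources: dict = field(default_factory=dict)
    # Case 2: design-time controls (XPaths, switches with values, booleans).
    controls: dict = field(default_factory=dict)
    # Case 3: names of values that are only supplied when the pipeline runs.
    runtime_slots: tuple = ()

    def resolve(self, runtime_values):
        """Merge design-time controls with the run-time values this step declared."""
        missing = [n for n in self.runtime_slots if n not in runtime_values]
        if missing:
            raise KeyError(f"unbound run-time parameters: {missing}")
        resolved = dict(self.controls)
        resolved.update({n: runtime_values[n] for n in self.runtime_slots})
        return resolved


# An XInclude step with a boolean switch chosen at design time.
xinclude = Step(name="xinclude", controls={"base-fixup": True})

# An XSLT step: a static stylesheet, a design-time XPath, one run-time slot.
transform = Step(
    name="transform",
    static_resources={"stylesheet": "style.xsl"},
    controls={"select": "/book/chapter"},
    runtime_slots=("user",),
)
```

Calling `transform.resolve({"user": "alex"})` fills the declared slot; values the step did not declare are ignored, and a missing slot raises `KeyError`.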

Some discussion of what Alex meant.

Henry: I think the point about the third class is that they often go hand-in-hand with the second case.

<richard> If your pipeline is compiled, then Alex's examples are parameters whose values are not known at compile time

Henry: Often you have a slot in the pipeline at design time and a run-time parameter that fills that slot.

Norm: design-time parameters in the pipeline, design-time controls for components, run-time parameters passed to the pipeline.
... those are the three cases?

<ht> HST distinguished between static resources (stylesheets, schema documents)

Henry: I think it's useful to distinguish between those and others.

Norm: I think a third case is parameters that come out of one component and flow into another.
... In the full generality, those could be any kind of parameter, but I've been thinking of those only in terms of infosets.

Richard: One way to deal with it would be to have an XML document that contains the parameters and then that could be generated by a stage in the pipeline.
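
Illustration of Richard's suggestion, as a minimal Python sketch. The `<params>`/`<param name="...">` vocabulary and the function names are assumptions made for the example; no document format was agreed at the meeting.

```python
import xml.etree.ElementTree as ET


def counting_stage(document_text):
    """A stage whose output is a parameter document (hypothetical format)."""
    doc = ET.fromstring(document_text)
    params = ET.Element("params")
    param = ET.SubElement(params, "param", name="chapter-count")
    param.text = str(len(doc.findall("chapter")))
    return ET.tostring(params, encoding="unicode")


def read_parameters(params_text):
    """A later stage reads the parameter document back into name/value pairs."""
    root = ET.fromstring(params_text)
    return {p.get("name"): (p.text or "") for p in root.findall("param")}


params_xml = counting_stage("<book><chapter/><chapter/></book>")
parameters = read_parameters(params_xml)
```

Because the parameters travel as an ordinary XML document, the engine needs no second mechanism to carry them between stages; the cost is that every value arrives as a string.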

<Zakim> ebruchez, you wanted to discuss using XML infosets to do that or the XDM

Erik: I just wanted to point out that there's some conceptual simplifications that could be made.
... For example, when I hear of a stylesheet or schema as a parameter, I know that in many cases that's static, but you can also simplify it by saying that they're both XML documents and you can combine them.
... You can just consider that an XSLT stylesheet or a schema is just an infoset.
... In XPL we've been trying to maximize this simplification.

<ht> HST likes the idea that follows from merging Richard's suggestion with Norm's resource pool idea: Provide a name:value store as part of the pipeline engine, which can be set a) at pipeline invocation; b) via a pseudo-output URI and a standardised XML document; c) via the engine API as it faces the components
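
Illustration of the name:value store HST describes, as a minimal Python sketch. The class `ParameterStore`, its method names, and the `<params>` document format are hypothetical; the sketch only shows the three ways of setting a value side by side.

```python
import xml.etree.ElementTree as ET


class ParameterStore:
    """A name:value store owned by the pipeline engine (hypothetical API)."""

    def __init__(self, invocation_values=None):
        # (a) values supplied when the pipeline is invoked
        self._values = dict(invocation_values or {})

    def load_document(self, xml_text):
        # (b) a component writes a standardised XML document to a
        # pseudo-output; the engine folds its contents into the store
        for param in ET.fromstring(xml_text).findall("param"):
            self._values[param.get("name")] = param.text or ""

    def set(self, name, value):
        # (c) a component sets a value directly through the engine API
        self._values[name] = value

    def get(self, name, default=None):
        return self._values.get(name, default)


store = ParameterStore({"lang": "en"})
store.load_document('<params><param name="pages">12</param></params>')
store.set("draft", "yes")
```

All three routes end up in the same store, so a component that reads `store.get("pages")` does not need to know how the value was set.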

Erik: This way you don't need to switch between concepts. We should try to keep that simplification in mind.
... If we use the XDM data model then the whole question becomes simpler because we can just pass around XDM simple types.
... In XPL since we only have infosets, when we need to pass the user principal, we encapsulate it all in an XML infoset. So most components take an XML infoset as a configuration.

I think this is what Henry was suggesting a few moments ago

Henry: no, actually not.

Richard: I agree that Erik's simplification is good, I just don't want it to make the simple case where you have a static stylesheet or schema more complex or inefficient.
... So if we can keep the generality without precluding optimization, I'm all for it.

Erik: There are ways to avoid the optimization problems.

Alex: I feel really strongly that simple things should be simple.
... If I have a simple string and I need to assign it to a name so that some component can access it, turning it into an XML resource seems really hard.

<ht> HST strongly endorses this, even if all it means is that the XML _syntax_ for pipeline authoring makes it transparent

Alex: We need a simple way to bind simple values to names

Erik: Alex, I think that's fine when you just write them statically. It's where you want to generate them that it becomes problematic.
... I think you're going to have to have a way to allow a component to generate a parameter for some other component.

Alex: I think if we could come to agreement that there are parameters and resources, that would be good. Being able to formally declare a dependency on a resource is a good thing.
... That means that you have the use case of generating something in the middle of a pipeline that is a parameter. But that's a separate problem and we can decide if that's possible separately.

<Zakim> ht, you wanted to remind ourselves about using the infoset for this . . .

Henry: Just to add to the dimensionalities we're thinking about, I think there's an important distinction between parameterizations of components on the one hand and out-of-band computed information on the other.
... For out-of-band computed information, the infoset is your friend. For example, what do you do with an XSLT step in the middle of a pipeline which sets the output encoding?
... You put an annotation in the infoset so that the information is available when you need to serialize it.

<ebruchez> You may also want to completely separate serialization in pipelines.

Henry: What's interesting is that it's not information for the next step, it's for someone else later.
... Infoset annotations are the way to go here.
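
Illustration of the output-encoding example, as a minimal Python sketch. It simplifies by keeping annotations in a dictionary beside the tree rather than on infoset items; `AnnotatedDocument`, the step functions, and the `output-encoding` key are all hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class AnnotatedDocument:
    """A document plus out-of-band annotations that travel with it."""
    root: ET.Element
    annotations: dict = field(default_factory=dict)


def transform_step(doc):
    # Stands in for an XSLT step that requests an output encoding.
    doc.annotations["output-encoding"] = "iso-8859-1"
    return doc


def rename_step(doc):
    # An intermediate step that neither reads nor disturbs the annotation.
    doc.root.tag = "article"
    return doc


def serialize_step(doc):
    # The annotation is finally consumed here, several steps later.
    encoding = doc.annotations.get("output-encoding", "utf-8")
    return ET.tostring(doc.root, encoding=encoding)


source = AnnotatedDocument(ET.fromstring("<doc>caf\u00e9</doc>"))
output = serialize_step(rename_step(transform_step(source)))
```

The intermediate step never looks at the annotation, which is the point Henry makes: the information is not for the next step but for whoever serializes later.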

Murray: I liked Henry's characterization of the three different kinds of parameters.
... I'm a pipeline processor, I have a blank mind. Once things get going, I start to become aware of stuff.
... I want to be able to store that away so that I can use it later.
... Some of it is in files that I can assign URIs to, and some of it is in memory (which might also have URIs). I can build this little environment.
... I might be operating many components and there might be an arbitrary number of steps.
... Along the way, I'm going to have to calculate things. For example, processing a book might require multiple passes to get all the page numbers correct.
... I might store the infoset for the ToC somewhere, then later when I know the page numbers, I might want to edit it.
... Then later, I might grab that and actually use it to build a PostScript rendering of the ToC.

<ht> HST observes this connects up with Norm's pool idea, understood as a little local filesystem

Murray: Then later still, I might build an online version of that ToC.

Norm agrees with HST, but is frightened of mutable infosets.

<ht> each instance of the pipeline starts with an empty disk, as it were

Murray: When the job ends, I go back to having a blank mind. Maybe some of my stuff is stored, maybe it isn't.
... All of these things are resources that are created as we go (or before we start).

<alexmilowski> mid-pipeline binding of parameters is something I do all the time...

<ebruchez> It's called a variable ;-)

<ht> MT Pipeline does the same thing as XPL here, using no-URI fragments to identify such local resources, e.g. #tempDoc
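
Illustration of the local resource pool, using Murray's table-of-contents example, as a minimal Python sketch. `ResourcePool`, `run_book_job`, and the `#toc` name are hypothetical. To sidestep the worry about mutable infosets, the sketch stores immutable tuples and rebinds the name instead of editing in place.

```python
class ResourcePool:
    """Per-run store of local resources named by fragment identifiers."""

    def __init__(self):
        # Each pipeline instance starts with an empty pool.
        self._resources = {}

    def put(self, ref, value):
        if not ref.startswith("#"):
            raise ValueError("local resources are named like '#tempDoc'")
        # Rebinding replaces the stored value; nothing is edited in place.
        self._resources[ref] = value

    def get(self, ref):
        return self._resources[ref]


def run_book_job(chapter_titles, pages_per_chapter):
    pool = ResourcePool()
    # First pass: the ToC is known, the page numbers are not.
    pool.put("#toc", tuple((title, None) for title in chapter_titles))
    # Second pass: page numbers are now known, so store a new ToC.
    page = 1
    numbered = []
    for (title, _), length in zip(pool.get("#toc"), pages_per_chapter):
        numbered.append((title, page))
        page += length
    pool.put("#toc", tuple(numbered))
    # Later steps (print rendering, online version) would read "#toc" from here.
    return pool.get("#toc")


toc = run_book_job(["Intro", "Usage"], [3, 5])
```

The pool is created inside `run_book_job`, so when the job ends nothing carries over to the next run unless a step explicitly stores it elsewhere.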

Alex: I'm with you Murray, I think the problem we're having is with the distinction between parameters and infosets.

Murray: But aren't they all resources?
... If I say "-j Alex", if that value needs to be assigned, somehow I have to be able to reference that value.

Alex: I think it's useful to treat the infosets differently, but maybe there's room for debate on that. I think there should be simple parameter values too.
... They're all resources philosophically, but lots of processors have a distinction between "the input" and parameters that are sent to them.
... Look at the Java components that wrap up XSLT for example.

Norm: The Java/XSLT case might be useful to consider.

<ebruchez> It's a push vs. pull, in a way.

<ebruchez> XSLT's source is pulled by the transformer, and the other parameters are set in advance.

<ebruchez> Not sure if that has to hold though.

<alexmilowski> the parameters do not have to be set in advance

Adjourned

Attendees