See also: IRC log
Alex points to the latest requirements document: http://www.w3.org/XML/XProc/docs/langreq.html
Room 157 at the Royal Casino Hotel
Henry reminds us that there's a wiki for ride sharing
Who's willing to speak about existing tools?
Norm, Henry, Richard, Alex, Erik, and Rui volunteer.
Andrew will send a summary of the Arbortext pipeline
Norm reviews the rest of the planned agenda
<ebruchez> all good
Andrew would like the presentations to be in the afternoon because he'll be calling in
Norm: I'll move it to the afternoon
Monday morning: administrivia and use cases
Monday afternoon: presentations and infoset input/output discussion
Norm will update the agenda
Norm to add note about asking in IRC for phone connectivity
Requirements and use cases.
Alex: Changes from last week: made validation a design principle; removed naming of pipelines as a requirement
... No more editorial changes.
... We were working our way through issues on the list.
Norm asks for explanation of the table at the beginning of section 4
Alex: It's supposed to map requirements to use cases. The presentation doesn't work but it's saying that for each requirement, here's the use case that supports that requirement.
... I wanted to get rid of having the links in the requirements list so that they would be easier to read.
... The presentation needs to be fixed.
... I'm not really concerned about the presentation right now, just as long as we get the content in place.
... One issue that I thought we could talk about is the issue of string parameters or simple datatype parameters as opposed to infoset parameters.
Norm observes that the last word on this thread was from Erik.
Erik: We didn't have many use cases that required parameters so we didn't mind using a little trick for XSLT
Henry: I think we have a terminology problem, the example is full of parameters!
Erik: I think the distinction is between infosets and datatype parameters.
... The question is do we need two kinds of parameters in the language to be able to do this?
... If we decide we can only pass XML infosets between components, then how do you pass a numeric parameter to a stylesheet?
Richard: Ok, so this is a small subset of the parameter problem. You're talking about parameters that come from other components, rather than parameters that are specified when you write the pipeline?
Erik: Why should parameters only be static?
Richard: I can see that that's a good generalization, but in my pipeline virtually every step has some parameters, but none of them are derived from previous steps.
Erik: Maybe it depends what you call parameter.
Richard: The sort I'm talking about are XPaths to identify bits of a document
Alex: My pipelines run in a J2EE environment, so I'm passing all sorts of stuff to the pipeline.
<Zakim> ht, you wanted to distinguish at least three cases
Henry: It's clear that we're talking a little bit at cross-purposes
... I can see at least three cases: there are things that I think of as parameters that are static, pipeline-design time XML resources: stylesheets for XSLT or schema documents for validation.
... The second class are design-time controls for components: XPaths, etc. I agree with Richard that the 99% case is that that's known at pipeline-design time. It's a static parameter.
... Another case is command line switches to command line invocations: switches with values and booleans, switches that are either present or absent. An XInclude impl could have a command-line switch that indicated whether or not base fixup is applied.
... That's another example of a design-time choice.
... The third case is runtime parameterization that gets accessed by various components at run time.
Some discussion of what Alex meant.
Henry: I think the point about the third class is that they often go hand-in-hand with the second case.
<richard> If your pipeline is compiled, then Alex's examples are parameters whose values are not known at compile time
Henry: Often you have a slot in the pipeline at design time and a run-time parameter that fills that slot.
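A minimal sketch of the "slot filled at run time" idea, assuming a step configuration held as a plain dictionary. The names `STEP_CONFIG`, `bind`, `style.xsl`, and the `user` parameter are hypothetical and do not come from any proposal discussed here.

```python
from string import Template

# Design time: the pipeline author fixes a static resource and an XPath
# control, and leaves one slot open for a value known only at run time.
STEP_CONFIG = {
    "stylesheet": "style.xsl",      # static resource (hypothetical file name)
    "select": "/book/chapter",      # design-time control for the component
    "user": Template("$user"),      # slot to be filled at run time
}


def bind(config, runtime_params):
    """Return a copy of config with every Template slot filled in."""
    bound = {}
    for name, value in config.items():
        if isinstance(value, Template):
            # substitute() raises KeyError if no run-time value was supplied
            bound[name] = value.substitute(runtime_params)
        else:
            bound[name] = value
    return bound


# Run time: the invoking environment supplies the missing value.
bound = bind(STEP_CONFIG, {"user": "alex"})
print(bound)
```

The design-time values pass through untouched, so the common static case stays simple while the run-time slot is still explicit in the pipeline description.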
Norm: design-time parameters in the pipeline, design-time controls for components, run-time parameters passed to the pipeline.
... those are the three cases?
<ht> HST distinguished between static resources (stylesheets, schema documents)
Henry: I think it's useful to distinguish between those and others.
Norm: I think a third case is parameters that come out of one component and flow into another.
... In the full generality, those could be any kind of parameter, but I've been thinking of those only in terms of infosets.
Richard: One way to deal with it would be to have an XML document that contains the parameters and then that could be generated by a stage in the pipeline.
<Zakim> ebruchez, you wanted to discuss using XML infosets to do that or the XDM
Erik: I just wanted to point out that there's some conceptual simplifications that could be
... For example, when I hear of a stylesheet or schema as a parameter, I know that in many cases that's static, but you can also simplify it by saying that they're both XML documents and you can combine them.
... You can just consider that an XSLT stylesheet or a schema is just an infoset.
... In XPL we've been trying to maximize this simplification.
<ht> HST likes the idea that follows from merging Richard's suggestion with Norm's resource pool idea: Provide a name:value store as part of the pipeline engine, which can be set a) at pipeline invocation; b) via a pseudo-output URI and a standardised XML document; c) via the engine API as it faces the components
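A rough sketch of such a name:value store, assuming a small Python class. The class name `ParameterStore`, its methods, and the `<params>/<param name="...">` document shape are all hypothetical; no standardised format had been agreed at this point.

```python
import xml.etree.ElementTree as ET


class ParameterStore:
    """Hypothetical name:value store owned by the pipeline engine."""

    def __init__(self, initial=None):
        # Values supplied when the pipeline is invoked.
        self._values = dict(initial or {})

    def load_document(self, xml_text):
        # Values written by a step as an XML document (assumed format:
        # <params><param name="...">value</param>...</params>).
        root = ET.fromstring(xml_text)
        for param in root.findall("param"):
            self._values[param.get("name")] = param.text or ""

    def set(self, name, value):
        # Values set directly by a component through the engine API.
        self._values[name] = value

    def get(self, name, default=None):
        return self._values.get(name, default)


store = ParameterStore({"mode": "draft"})
store.load_document('<params><param name="page-count">212</param></params>')
store.set("user", "alex")
print(store.get("mode"), store.get("page-count"), store.get("user"))
```

All three routes end up in the same flat namespace, so a component reading a value does not need to know how it was set.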
Erik: This way you don't need to switch between concepts. We should try to keep that simplification in mind.
... If we use the XDM data model then the whole question becomes simpler because we can just pass around XDM simple types.
... In XPL, since we only have infosets, when we need to pass the user principal, we encapsulate it all in an XML infoset. So most components take an XML infoset as a configuration.
I think this is what Henry was suggesting a few moments ago
Henry: no, actually not.
Richard: I agree that Erik's simplification is good, I just don't want it to make the simple case where you have a static stylesheet or schema more complex
... So if we can keep the generality without precluding optimization, I'm all for it.
Erik: There are ways to avoid the optimization problems.
Alex: I feel really strongly that simple things should be simple.
... If I have a simple string and I need to assign it to a name so that some component can access it, turning it into an XML resource seems really hard.
<ht> HST strongly endorses this, even if all it means is that the XML _syntax_ for pipeline authoring makes it transparent
Alex: We need a simple way to bind simple values to names
Erik: Alex, I think that's fine when you just write them statically. It's where you want to generate them that it becomes problematic.
... I think you're going to have to have a way to allow a component to generate a parameter for some other component.
Alex: I think if we could come to agreement that there are parameters and resources, that would be good. Being able to formally declare a dependency on a resource is a good thing.
... That means that you have the use case of generating something in the middle of a pipeline that is a parameter. But that's a separate problem and we can decide if that's possible separately.
<Zakim> ht, you wanted to remind ourselves about using the infoset for this . . .
Henry: Just to add to the dimensionalities we're thinking about, I think there's an important distinction between parameterizations of components on the one hand and out-of-band computed information on the other.
... For out-of-band computed information, the infoset is your friend. For example, what do you do with an XSLT step in the middle of a pipeline which sets the output encoding?
... You put an annotation in the infoset so that the information is available when you need to serialize it.
<ebruchez> You may also want to completely separate serialization in pipelines.
Henry: What's interesting is that it's not information for the next step, it's for someone else
... Infoset annotations are the way to go here.
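A minimal sketch of the output-encoding example, assuming the annotation is carried in a dictionary alongside the document rather than on individual infoset items. The `Document` class, the step functions, and the `output-encoding` key are hypothetical; `transform_step` only stands in for a real XSLT step.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document plus out-of-band annotations that travel with it."""
    root: ET.Element
    annotations: dict = field(default_factory=dict)


def transform_step(doc):
    """Stand-in for an XSLT step that requests an output encoding."""
    doc.annotations["output-encoding"] = "ISO-8859-1"
    return doc


def rename_step(doc):
    """An intermediate step: it ignores the annotation but preserves it."""
    doc.root.tag = "report"
    return doc


def serialize(doc):
    """The final consumer reads the annotation when it writes bytes."""
    encoding = doc.annotations.get("output-encoding", "UTF-8")
    return ET.tostring(doc.root, encoding=encoding)


doc = Document(ET.fromstring("<doc>caf\u00e9</doc>"))
out = serialize(rename_step(transform_step(doc)))
print(out)
```

The step between the transformation and the serializer never looks at the encoding, which is the point being made: the information is not for the next step but for whoever eventually serializes.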
Murray: I liked Henry's characterization of the three different kinds of
... I'm a pipeline processor, I have a blank mind. Once things get going, I start to become aware of stuff.
... I want to be able to store that away so that I can use it later.
... Some of it is in files that I can assign URIs to, and some of it is in memory (which might also have URIs). I can build this little environment.
... I might be operating many components and there might be an arbitrary number of steps.
... Along the way, I'm going to have to calculate things. For example, processing a book might require multiple passes to get all the page numbers correct.
... I might store the infoset for the ToC somewhere, then later when I know the page numbers, I might want to edit it.
... Then later, I might grab that and actually use it to build a PostScript rendering of the ToC.
<ht> HST observes this connects up with Norm's pool idea, understood as a little local filesystem
Murray: Then later still, I might build an online version of that ToC.
Norm agrees with HST, but is frightened of mutable infosets.
<ht> each instance of the pipeline starts with an empty disk, as it were
Murray: When the job ends, I go back to having a blank mind. Maybe some of my stuff is stored, maybe it isn't.
... All of these things are resources that are created as we go (or before we start).
<alexmilowski> mid-pipeline binding of parameters is something I do all the time...
<ebruchez> It's called a variable ;-)
<ht> MT Pipeline does the same thing as XPL here, using no-URI fragments to identify such local resources, e.g. #tempDoc
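A small sketch of the pool-as-local-filesystem idea, using Murray's ToC example. The `ResourcePool` class and the `#toc` name are hypothetical (the name only imitates the `#tempDoc` style mentioned above). As one possible answer to the worry about mutable infosets, this sketch assumes stored documents are copied on the way in and out, so an update replaces the stored entry instead of editing it in place.

```python
import copy
import xml.etree.ElementTree as ET


class ResourcePool:
    """Hypothetical per-run pool; each pipeline instance starts empty."""

    def __init__(self):
        self._docs = {}

    def put(self, name, root):
        # Store a copy so later changes by the caller do not leak in.
        self._docs[name] = copy.deepcopy(root)

    def get(self, name):
        # Hand out a copy so readers cannot mutate the stored document.
        return copy.deepcopy(self._docs[name])


pool = ResourcePool()

# First pass: the ToC is known, the page numbers are not.
toc = ET.fromstring('<toc><entry title="Intro"/><entry title="Usage"/></toc>')
pool.put("#toc", toc)

# Later pass: page numbers are known, so a revised ToC replaces the old one.
page_numbers = {"Intro": "1", "Usage": "17"}
updated = pool.get("#toc")
for entry in updated.findall("entry"):
    entry.set("page", page_numbers[entry.get("title")])
pool.put("#toc", updated)

# Later still: any renderer can fetch the finished ToC by name.
final = pool.get("#toc")
pages = [e.get("page") for e in final.findall("entry")]
print(pages)
```

When the run ends the pool is simply discarded, which matches the "blank mind" picture; persisting anything would be a separate, explicit step.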
Alex: I'm with you Murray, I think the problem we're having is with the distinction between parameters and infosets.
Murray: But aren't they all
... If I say "-j Alex", if that value needs to be assigned, somehow I have to be able to reference that value.
Alex: I think it's useful to treat the infosets differently, but maybe there's room for debate on that. I think there should be simple parameter values
... They're all resources philosophically, but lots of processors have a distinction between "the input" and parameters that are sent to them.
... Look at the Java components that wrap up XSLT for example.
Norm: The Java/XSLT case might be useful to consider.
<ebruchez> It's a push vs. pull, in a way.
<ebruchez> XSLT's source is pulled by the transformer, and the other parameters are set in advance.
<ebruchez> Not sure if that has to hold though.
<alexmilowski> the parameters do not have to be set in advance