05 Nov 2010

See also: IRC log




Investigating methods for making pipelines easier to build

Vojtech: We're reimplementing the DITA toolkit in XProc (instead of Ant+Java extensions)
... It's about 9000-10000 lines of XProc
... Top-level pipeline consists of about 20 steps
... If you didn't want to do, for example, conref processing (one of the steps in the middle)
... how would you turn that off or replace that with something else?
... Suppose you want to do it just slightly differently
... Currently the only option is to cut-and-paste the entire pipeline and change the one part I want to change.
... It would be nice if you could pass a step in dynamically or do some sort of replacement

Alex: Or pass in a subpipeline
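As a rough illustration of the replacement idea raised here, the following Python sketch models a pipeline as an ordered list of named steps and lets a caller turn one off or swap it without copying the rest. It is an analogy only, not XProc syntax and not a mechanism proposed at the meeting; the step functions, `DEFAULT_STEPS`, and `run_pipeline` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Callable[[str], str]

# Hypothetical stand-ins for pipeline steps; each just tags the document.
def preprocess(doc: str) -> str:
    return doc + " |preprocess"

def conref(doc: str) -> str:
    return doc + " |conref"

def render(doc: str) -> str:
    return doc + " |render"

# The "top-level pipeline": an ordered list of (name, step) pairs.
DEFAULT_STEPS: List[Tuple[str, Step]] = [
    ("preprocess", preprocess),
    ("conref", conref),
    ("render", render),
]

def run_pipeline(
    doc: str,
    overrides: Optional[Dict[str, Optional[Step]]] = None,
) -> str:
    """Run the default steps, applying per-name overrides.

    An override of None turns the step off; a callable replaces it.
    """
    overrides = overrides or {}
    for name, default_step in DEFAULT_STEPS:
        step = overrides.get(name, default_step)
        if step is None:
            continue  # step switched off by the caller
        doc = step(doc)
    return doc
```

Under these assumptions, `run_pipeline("doc", {"conref": None})` skips the middle step, and passing a callable for `"conref"` replaces it, while the rest of the pipeline stays untouched.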

Norm: The problem is that there's no obvious, single step that you want to override; you need to access them all.

Vojtech: Even if they copy-and-paste to change the definition of a pipeline, the original pipeline may be called from somewhere else where they have no control.
... It would also be nice to have library-level variables or constants.

Norm: I can imagine we might do that...

Mohamed: But they're not really variables; you want to override them, so they're more like parameters or options.

Norm: That could get messy.

More discussion

Vojtech: I really want to replace all steps of a given type, not just a single instance.

Florent: It seems that there are two aspects here: the ability to override an entire step type (like XSLT import precedence) and the ability to modify the internals of one type.

Norm: Maybe you could make it work with just a mechanism to override whole step types. If you want to override only some, you can conditionalize the internals.

Mohamed: But how would you know what instance it was?

Norm: Yes, I think we'd have to provide a function to get the step name.
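One way to picture this exchange: override a whole step type, and let the overriding implementation branch on the name of the calling instance. The Python sketch below is an analogy under stated assumptions, not XProc; the type name `ex:conref`, the instance names, and the `step_name` argument (standing in for the function Norm mentions) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A step implementation receives the document and the calling instance's name.
StepImpl = Callable[[str, str], str]

# Registry of step types (hypothetical type name).
STEP_TYPES: Dict[str, StepImpl] = {
    "ex:conref": lambda doc, step_name: doc + f" |conref@{step_name}",
}

# The pipeline uses the same step type in two named instances.
PIPELINE: List[Tuple[str, str]] = [
    ("first-pass", "ex:conref"),
    ("second-pass", "ex:conref"),
]

def run(doc: str, step_types: Dict[str, StepImpl]) -> str:
    """Run every instance, looking up its implementation by step type."""
    for step_name, step_type in PIPELINE:
        doc = step_types[step_type](doc, step_name)
    return doc

def make_override(original: StepImpl) -> StepImpl:
    """Override the whole type, but change behaviour for one instance only."""
    def override(doc: str, step_name: str) -> str:
        if step_name == "second-pass":
            return doc + " |custom@second-pass"
        return original(doc, step_name)  # fall back to the original type
    return override
```

Replacing the `ex:conref` entry with `make_override(STEP_TYPES["ex:conref"])` changes every use of the type, yet only the `second-pass` instance behaves differently.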

Alex: Sometimes the pipeline author knows where the extension points are, where the author is making choices, and that's maybe slightly different.

Norm: That's where p:eval would be useful. You could let users pass in a pipeline.
... You can kind of do that today by putting in calls to step types that are undefined; then the user has to import both your pipeline and declarations for those types.
... (I wonder if that's actually true)...

Alex: It might be nice if we could make that easier; some mechanism for declaring steps "abstract" and then some mechanism for defining them when you import the pipeline.
... I wonder if users will be able to find ways to define where their extension points need to be or if we'll need a "redefine" mechanism.
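The "abstract step" idea can be sketched as a pipeline that names its extension points and refuses to be imported until each one has a definition. This Python sketch is only an analogy; `ABSTRACT_STEPS`, `import_pipeline`, `UnboundStepError`, and the step name `ex:extension-point` are hypothetical and do not correspond to any XProc feature.

```python
from typing import Callable, Dict

Step = Callable[[str], str]

class UnboundStepError(Exception):
    """Raised when an abstract step has no definition at import time."""

# Extension points the pipeline author chose to expose (hypothetical name).
ABSTRACT_STEPS = {"ex:extension-point"}

def import_pipeline(definitions: Dict[str, Step]) -> Step:
    """Bind the abstract steps and return a runnable pipeline."""
    missing = ABSTRACT_STEPS - set(definitions)
    if missing:
        raise UnboundStepError(f"no definition for: {sorted(missing)}")
    extension_point = definitions["ex:extension-point"]

    def pipeline(doc: str) -> str:
        doc = doc + " |before"      # fixed part written by the pipeline author
        doc = extension_point(doc)  # part supplied by the importing user
        return doc + " |after"      # fixed part written by the pipeline author

    return pipeline
```

Failing at import time, rather than when the step is reached, is a design choice of this sketch: a missing definition is reported before any document is processed.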

Mohamed: I don't think I want to see the ability to change a single instance of a type, but being able to change a whole type seems like something I might want to see.

Norm: Isn't this just like what we were talking about in XSLT WG about overriding templates?

Mohamed: Yes.

Norm: Ok. I'm not dreaming.

"Streaming" XProc

Mohamed: The idea is the same as it was at the beginning of XProc: being able to say, whatever we call it, how you can make your pipeline more streamable.
... We already say that if you use last(), you've probably impacted streaming in the pipeline.
... It would be nice if we had a document that described a profile of XProc that would improve streamability.
... Innovimax is working on an implementation of XProc using multi-threading and streaming.
... We are planning to release the project at the end of March. It is based on XML Calabash. We already have some interesting results.
... We're working at the same time on a streamable subset of XPath with a different research agency.
... We have customers with large volumes of data that can't use DOMs and they can't rewrite all their tools.
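The remark about last() can be illustrated with a small Python sketch, assuming a sequence of documents modelled as an iterator. Reporting each item's position needs no lookahead, whereas reporting the value of last() alongside the first item forces the whole sequence to be read first. The function names are hypothetical and this is not how any XProc processor is known to work.

```python
from typing import Iterable, Iterator, Tuple

def with_position(items: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Streamable: each item is emitted as soon as it arrives."""
    for position, item in enumerate(items, start=1):
        yield position, item

def with_position_and_last(
    items: Iterable[str],
) -> Iterator[Tuple[int, int, str]]:
    """Not streamable: the total count is needed before the first output."""
    buffered = list(items)  # must consume the entire input up front
    last = len(buffered)
    for position, item in enumerate(buffered, start=1):
        yield position, last, item
```

With a lazily produced input, asking `with_position` for its first result pulls one item from the source, while `with_position_and_last` pulls all of them before yielding anything.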

Norm: That could be interesting. If you had a defined streamable subset of XPath, you could have a switch to analyze a pipeline for streamability.

Mohamed: Like the work we're doing in XSLT 3.0 now for streaming, we should be looking at XProc with streaming in mind.

Alex: It always depends on what you mean by streaming. What's your bound?

Some discussion of the value (or not) in defining a streamable subset of XPath (whatever that means)

Florent recounts the example in XSLT of two XPath expressions interacting in ways that prevent streaming even though the expressions are simple.

Discussion leads to general agreement that a study of use cases is necessary and would be valuable.

Mohamed: Having a spec is only the first step. The second step is to have a document that describes usage patterns that will improve performance.

Mohamed suggests that we could have a workshop on pipeline performance.

Norm: So do we want to ask Liam to put something in the charter about investigating streaming or performance?

General nods of agreement.

More discussion of possible extension steps

<scribe> ACTION: Alex to review common concurrency patterns to see what might work best for us (e.g., countdown latches) [recorded in http://www.w3.org/2010/11/05-xproc-minutes.html#action01]
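For readers unfamiliar with the pattern named in the action, a countdown latch lets one thread wait until a fixed number of other tasks have signalled completion. The following is a minimal Python sketch built on `threading.Condition`; the class `CountDownLatch` and the helper `run_branches` are hypothetical names, and nothing here reflects a design decided at the meeting.

```python
import threading
from typing import List, Optional

class CountDownLatch:
    """Blocks waiters until count_down() has been called `count` times."""

    def __init__(self, count: int) -> None:
        if count < 0:
            raise ValueError("count must be non-negative")
        self._count = count
        self._cond = threading.Condition()

    def count_down(self) -> None:
        with self._cond:
            if self._count > 0:
                self._count -= 1
                if self._count == 0:
                    self._cond.notify_all()  # release every waiting thread

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Return True once the count is zero, False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)

def run_branches(n: int) -> List[int]:
    """Run n hypothetical pipeline branches in parallel and wait for all."""
    results: List[int] = []
    lock = threading.Lock()
    latch = CountDownLatch(n)

    def branch(i: int) -> None:
        with lock:
            results.append(i)  # stand-in for the branch's real work
        latch.count_down()     # signal that this branch has finished

    threads = [threading.Thread(target=branch, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    latch.wait(timeout=5)      # proceed only after every branch signalled
    for t in threads:
        t.join()
    return sorted(results)
```

In this sketch the waiting thread plays the role of a step that cannot start until all of its parallel inputs are ready.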

Summary of Action Items

[NEW] ACTION: Alex to review common concurrency patterns to see what might work best for us (e.g., countdown latches) [recorded in http://www.w3.org/2010/11/05-xproc-minutes.html#action01]
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.135 (CVS log)
$Date: 2010/11/05 11:56:30 $
