XML Processing Model WG -- 20 Apr 2006

Accept this agenda?

-> http://www.w3.org/XML/XProc/2006/04/20-agenda.html

Alessandro suggests taking item 2.5 before 2.3

Accepted

Accept minutes from the previous teleconference?

-> http://www.w3.org/XML/XProc/2006/04/13-minutes.html

Accepted

Next meeting: 27 Apr telcon

Any regrets?

None given

<scribe> ACTION: Henry to provide registration page for August f2f [recorded in http://www.w3.org/2006/04/20-xproc-minutes.html#action01]

<ht> http://www.w3.org/2002/09/wbs/38398/XProcFTF2/ is now listed as an open questionnaire for our group [this completes HT's action --scribe]

<scribe> ACTION: Murray to provide local arrangements info for August (ETA: two weeks) [recorded in http://www.w3.org/2006/04/20-xproc-minutes.html#action02]

MSM: One prominent way to get to the meeting will be to drive. Can we add some questions about car pooling to the registration form?

Murray: I'm thinking about that, I'll see what makes the most sense.

Henry: Let's use a wiki for that instead

Issue 3117: Should parallel execution of step be allowed by the language?

-> http://www.w3.org/Bugs/Public/show_bug.cgi?id=3117

Alessandro: This was raised in a call a few weeks ago.
... I don't know if we need to spend a whole lot of time on it. We probably don't want to add constructs to the language to control this if we can avoid it.

Richard: I hope most of this falls out naturally. If we don't specify the order of execution where it isn't inevitable, that implicitly allows parallel execution. We don't initially have to say much about it.
... If you have two things that could be executed in parallel, maybe they will be. If you want to synchronize them, you have to provide some mechanism, such as reading a document that one is writing.

Alex: I think we shouldn't disallow parallel execution.

Richard: We shouldn't put anything in the language to accidentally prevent it.

Norm: It sounds like we view parallel exec. just as an optimization.

Richard: I take the normal unix pipeline as a model. If you have two processes running, nothing expresses the order except that if one is reading and one is writing, you can be sure the reader will block waiting for the writer.
... Another aspect is that any kind of streaming implies a certain kind of parallelism.
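
Editor's sketch (not part of the discussion): the Unix-pipeline model Richard describes can be approximated with two threads joined by a bounded queue. The names `writer`, `reader`, and `run_pipeline` are hypothetical and chosen only for illustration. Nothing in the sketch states an order of execution; the only ordering comes from the reader blocking until the writer has produced an item.

```python
import queue
import threading

_DONE = object()  # sentinel marking end of stream


def writer(out_q, items):
    """Upstream step: emit items one at a time, then signal end of stream."""
    for item in items:
        out_q.put(item)  # blocks while the one-slot buffer is full
    out_q.put(_DONE)


def reader(in_q, results):
    """Downstream step: block on each get() until the writer has produced something."""
    while True:
        item = in_q.get()  # blocks while the queue is empty
        if item is _DONE:
            break
        results.append(item.upper())


def run_pipeline(items):
    """Run both steps concurrently; no explicit ordering is declared."""
    q = queue.Queue(maxsize=1)  # small buffer, so the two steps interleave
    results = []
    threads = [
        threading.Thread(target=reader, args=(q, results)),
        threading.Thread(target=writer, args=(q, items)),
    ]
    for t in threads:
        t.start()  # the reader is started first and simply waits
    for t in threads:
        t.join()
    return results
```

Because the reader consumes each item as soon as it arrives, the same sketch also shows the streaming form of parallelism mentioned above.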

Norm repeats summary.

Richard: Not just features of the language, but also the way we describe the language. A processing model might have unintended consequences that prevent parallelism; we want to avoid that.
... An example: we might say that the processing language behaves as if it executed the components in top-to-bottom, left-to-right order, which would be bad because it would imply that side-effects (if there are any) occur in a particular order.

Issue 3118: Should an implementation of the language be allowed to perform caching?

-> http://www.w3.org/Bugs/Public/show_bug.cgi?id=3118

Alessandro: This is a specific question about a particular example.
... The stylesheet executed by the second step is produced by the first step.
... Should the pipeline engine be allowed to cache the stylesheet produced by the first step across invocations?
... Can the engine be smart enough to determine that the output will be the same and reuse a cached value?

Norm: I think that what an engine does is not our problem.

Richard: The answer, in some sense, is obviously yes. If the engine can determine that the same results will be produced, then it can use the cached copy.

Richard: What does it mean for it to be exactly the same? Vanilla XSLT 1.0 stylesheets can't produce any side effects.
... but care must still be taken to ensure that side effects don't happen
... We may need a way to allow authors to express that some components are side-effect free

Alex: It would be interesting to consider annotating the steps
... You may be able to say "never cache" but maybe a smart impl could cache or not as it saw fit otherwise.

Richard: There are some even simpler cases of caching. In the MT pipeline, we compile schemas and cache them. That means the same schema used in two places can reuse the cached copy.

Alex: The more interesting case is where it's produced by the pipeline.

<MoZ> alexmilowski, cache hints like expires in Cocoon ?

Alex: The concept of a dynamically generated schema isn't far-fetched, but URIs that change every time you read them could be problematic.

HT: The http expires case isn't good enough. The MT engine checks using the http refresh if stale every time anyone touches a cached resource because there's no way to count on pipeline time and internet time being similar.
... The actual time between two uses of a cached object may be wildly different from what you think it is. The only safe thing to do is ask the server each time.
... That works on a filesystem too
... I'm not sure how that works in the context of documents generated by the pipeline
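
Editor's sketch (not part of the discussion): the "ask each time" policy HT describes, shown for the filesystem case only. This is not the MT engine's code. It assumes that a file's modification time plus its size is an adequate validator, and the class name `RevalidatingCache` and the `load` callback are hypothetical.

```python
import os


class RevalidatingCache:
    """Cache loaded files, but re-check the file's metadata on every access."""

    def __init__(self, load):
        self._load = load    # hypothetical loader: path -> loaded object
        self._entries = {}   # path -> (validator, value)

    @staticmethod
    def _validator(path):
        # Assumption: (mtime, size) changes whenever the content changes.
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size)

    def get(self, path):
        current = self._validator(path)  # always ask the source, never a timer
        cached = self._entries.get(path)
        if cached is not None and cached[0] == current:
            return cached[1]             # still valid: reuse the cached value
        value = self._load(path)         # stale or missing: reload
        self._entries[path] = (current, value)
        return value
```

The sketch never trusts elapsed time, which matches the point that pipeline time and internet time cannot be assumed to agree. It says nothing about documents generated inside the pipeline, which have no external source to ask.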

<Zakim> ht, you wanted to endorse the idea of annotation

HT: I think that for practical reasons, I'd be very unhappy to see any requirement of no side-effects imposed on components.
... I think "escape to program execution" and "synchronous SOAP exchange" are examples of components that cannot have intrinsic guarantees of no side-effects.
... There are also cases of components that do database updates. Those components have a side-effect.

Alex: Those aren't (necessarily) examples of pipeline steps communicating through side-effects
... If you're going to have that synchronization problem, you'd set up a dependency for that.

HT: I'm in favor of an approach which has a default and allows the component to assert the opposite.
... 1. Not side-effect free; even though my inputs are the same, you can't be sure I'll produce the same output and

<MSM> [this seems to be an "i am not a function" declaration?]

HT: 2. An expression of out-of-band dependencies.

Norm observes that we've wandered into the issue of side-effects

Norm: I think everyone will agree that if the pipeline knows the output will be the same, it can cache the result

Alessandro: I'm not sure if this is too strong a statement. Consider the case of reading a stylesheet from a URI.

MSM: Side-effects and caching are not separable questions

Alex: Caching is a feature of the implementation not the language

Richard: Just a big switch will probably be too coarse-grained.
... I imagine descriptions for each component type and the XSLT component might, for example, say that it has no side-effects by default. But then in a particular case, you could override it
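
Editor's sketch (not part of the discussion): one possible reading of a per-component-type default with a per-use override, as Richard and HT outline. All names here (`ComponentType`, `Step`, `may_cache`, `execute`) are hypothetical; the WG had not defined any such vocabulary at this point.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ComponentType:
    name: str
    side_effect_free: bool  # default declared by the component type


@dataclass
class Step:
    component: ComponentType
    run: Callable[[str], str]
    side_effect_free: Optional[bool] = None  # None means "use the type default"

    def may_cache(self) -> bool:
        """A per-step declaration, if present, overrides the type default."""
        if self.side_effect_free is not None:
            return self.side_effect_free
        return self.component.side_effect_free


def execute(step, doc, cache):
    """Run a step, reusing a cached result only when the step permits it."""
    key = (step.run, doc)  # keyed on the step's function and its input document
    if step.may_cache() and key in cache:
        return cache[key]
    result = step.run(doc)
    if step.may_cache():
        cache[key] = result
    return result
```

In this reading, an XSLT component type could declare `side_effect_free=True`, and a single step that uses an extension function could set `side_effect_free=False` to opt out of caching.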

Alex: The thing that concerns me about being able to say a component has side-effects is that it isn't clear what that means.
... Does it really affect the pipeline running?
... Unless there's some dependency in the flow-graph, what can the processor do?
... Unless we do something like what XSLT does with the document() function, I'm not sure there's a great answer here.

Norm: I wouldn't stream past a component that had side-effects

Norm describes the case of a SOAP service

Alex: You have a pipeline with five steps, each has an auxinput that calls this SOAP service.
... If you cache, you'll get one answer. If you don't cache, you'll get five answers.
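
Editor's sketch (not part of the discussion): Alex's five-step example, with a counter standing in for the SOAP service. The functions `make_service` and `run_five_steps` are hypothetical, and the service is assumed to return a different reply on every call.

```python
import itertools


def make_service():
    """Hypothetical stand-in for a SOAP service whose reply changes per call."""
    counter = itertools.count(1)
    return lambda: f"reply-{next(counter)}"


def run_five_steps(service, use_cache):
    """Each of five steps reads the service as an auxiliary input."""
    cached = None
    seen = []
    for _ in range(5):
        if use_cache:
            if cached is None:
                cached = service()  # only the first step really calls out
            seen.append(cached)
        else:
            seen.append(service())  # every step calls out
    return seen
```

With `use_cache=True` all five steps see the same reply; with `use_cache=False` they see five different replies, so the caching decision is visible in the pipeline's output.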

Norm will take the question to email

Any other business?

None.

Adjourned.
