XML Processing Model WG -- 27 Apr 2006

Accept this agenda?

-> http://www.w3.org/XML/XProc/2006/04/27-agenda.html

Accepted

Accept minutes from the previous teleconference?

-> http://www.w3.org/XML/XProc/2006/04/20-minutes.html

Accepted

Next meeting: 4 May telcon

Any regrets?

<MoZ> yes

Face-to-face meeting?

Registration page: http://www.w3.org/2002/09/wbs/38398/XProcFTF2/

Issue 3096: Are components side-effect free?

-> http://www.w3.org/Bugs/Public/show_bug.cgi?id=3096

Norm proposes:

I propose that we say that all components are non-functional. That is,
a pipeline implementation must behave as if it evaluated a component
every time it occurs. "Must behave as if" is spec-ease for
"implementations that are clever enough to determine with certainty
that a component is, in fact, functional are free to cache the
intermediate results because by golly if it is, no one will be able to

Richard: This doesn't preclude adding a mechanism later to allow authors to assert that a step or component is functional

Norm: Yes.

Richard: Does this address the converse case? Producing output side-effects and behaving the same way for given inputs

Norm: This is the "functional" aspect, not the side-effect aspect

Richard: Side-effects are like hidden outputs, functionality is like hidden inputs

Alex: This is a good place to start. Should we register a new issue about functional components?

<scribe> ACTION: Alex to create an issue about the possibility of functional components [recorded in http://www.w3.org/2006/04/27-xproc-minutes.html#action01]

Proposal accepted.
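
A minimal sketch of how the accepted proposal could be read, assuming a toy engine in Python. The names `Engine`, `run`, `stamp`, `upper`, and the `functional` flag are hypothetical and appear nowhere in the WG's drafts. The flag stands in for the possible later mechanism Richard mentions, by which an author could assert that a component is functional. By default every occurrence is evaluated, so a component with a hidden input can return different results for the same document.

```python
import itertools


class Engine:
    """Toy engine: evaluates a component at every occurrence unless the
    caller asserts it is functional (hypothetical flag, not from the spec)."""

    def __init__(self):
        self._cache = {}

    def run(self, component, document, functional=False):
        if not functional:
            # Default: behave as if the component is evaluated every time.
            return component(document)
        key = (component, document)
        if key not in self._cache:
            self._cache[key] = component(document)
        return self._cache[key]


_serial = itertools.count(1)


def stamp(document):
    """Non-functional: the result depends on a hidden input (a counter)."""
    return f"{document}#{next(_serial)}"


upper_calls = []


def upper(document):
    """Functional: the result depends only on the visible input."""
    upper_calls.append(document)
    return document.upper()


engine = Engine()
first = engine.run(stamp, "doc")
second = engine.run(stamp, "doc")
cached_a = engine.run(upper, "doc", functional=True)
cached_b = engine.run(upper, "doc", functional=True)
print(first, second, cached_a, len(upper_calls))
```

Here `stamp` runs twice and yields two different results, while `upper` runs once because the caller asserted that caching it is safe.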

Issue 3113: Does the pipeline engine act as a resource manager?

-> http://www.w3.org/Bugs/Public/show_bug.cgi?id=3113

Norm: One aspect of this question is, does the pipeline engine provide the sort of URI-stability that XSLT, for example, gives the document function?

Richard: I strongly disagree with this as a requirement; it requires a degree of intimacy between the engine and the components that may not always be available

Alex: Is this something that might be "at user option"?

Norm: I'd like to avoid that if at all possible

<Zakim> ht, you wanted to push back

Henry: I need some information; in my current state of knowledge I think it's a bad idea for pipeline engines
... Especially when you are running a pipeline engine as a server, you do not want to flush the cache every time you run a pipe because it's useful to keep things around.
... In their parsed and ready-to-go state (provided they haven't changed)
... I'm happier saying, "no, you should expect your pipeline to behave in the way of any other web application does"
... Yes, things can change.

Alex: If we step back and look at the web browser case, consider an image embedded 10 times on the same page. The browser reuses the image.
... That the resolution of URI-to-resource is stable for the duration of a page is one reasonable expectation

<MSM> [I think the fact that browsers do or do not re-fetch is an optimization they make, not part of the specification of correct browser behavior - am I wrong?]

Richard: consider other things like XML pipelines, like shell scripts, where "cat foo" twice might not return the same file.

<MSM> [If ten <img> elements in the same HTML document refer to "my_image.jpg", and that image is served with a lifetime of 0, are correct browsers guaranteed to fetch it only once? What spec says that?]

Some discussion of whether or not browsers actually behave that way

<Zakim> MSM, you wanted to say that as an empirical statement, it's not a very strong argument for making the behavior part of our spec

MSM: Implementors will do that for performance reasons regardless of whether a spec requires it or not

Richard: Is there a spec for how you display things in a web page?

Alex: No

MSM: In that sense, it's not clear to me that the browser analogy bears on our decision

Alex: There's a user expectation of some aspect of stability

Richard: I don't think the browser analogy is a good one. The engine is running a collection of potentially independently implemented components.

Murray: I'm relying on my memory, but in HTTP there's a mechanism for specifying time-to-live. So if there's a nano-second TTL, then maybe it would go get the resource again.
... Similarly, if I was getting the time of day from a URI then it might change
... So if you're worried about that, maybe you need a "caching" component.

Norm: I think consensus is coming towards the answer "no"

Alex: I don't agree, I think it's important that URIs are stable for the duration of an execution
... If you need to identify unique resources, you can generate unique URIs with query parameters
... We haven't decided if the resources flowing through the pipeline have URIs or not

Richard: I notice that the bug is actually talking about something produced by the pipeline

Norm: I think those are the same case

Richard: You could provide components that fetch and store URIs stably.
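
A minimal sketch of the kind of component Richard describes, assuming Python and a caller-supplied fetch function. `StableResolver`, `get`, and `changing_fetch` are hypothetical names, and `changing_fetch` is only a stand-in for a resource that changes between requests. The resolver remembers the first representation fetched for each URI, so repeated requests during one execution see the same bytes.

```python
from typing import Callable, Dict


class StableResolver:
    """Returns the same bytes for a URI for as long as this object lives.
    Intended to be created once per pipeline execution (an assumption)."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch
        self._seen: Dict[str, bytes] = {}

    def get(self, uri: str) -> bytes:
        # Fetch on first use only; later requests reuse the stored copy.
        if uri not in self._seen:
            self._seen[uri] = self._fetch(uri)
        return self._seen[uri]


_ticks = iter(range(1000))


def changing_fetch(uri: str) -> bytes:
    """Stand-in for a resource whose content differs on every request."""
    return f"{uri}@{next(_ticks)}".encode()


resolver = StableResolver(changing_fetch)
a = resolver.get("http://example.org/time")
b = resolver.get("http://example.org/time")
fresh = changing_fetch("http://example.org/time")
print(a == b, fresh == a)
```

Stability here holds only for components that go through the resolver. A component that opens the URI itself bypasses it, which is the intimacy concern Richard raises.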

Norm describes the situation where an XSLT needs to get an ancillary resource by URI

Alex: I really want some URIs to be stable throughout the duration of a pipeline

Murray: I'm not convinced that we don't need a resource manager
... I'd like to posit the existence of a component that is a proxy server or something of that ilk
... That component knows if requests should always send things back from the cache

<MSM> [I agree fervently that as users we need resource managers, and that implementations of our language will be more usable if they use good resource managers. But we also need character sets. We don't specify a character set as part of our spec to meet that need, and the same should probably hold for resource management. Separate problem, separate spec.]

Murray: I think it's the case that sometimes you're going to want the documents to remain stable and sometimes you're going to want to get current results

<alexmilowski> yes!

Richard: But I may be using components that don't know how to use a proxy server

Murray: I thought once you set up a proxy, then all requests went through that proxy.

That's implementation and operating-system dependent

Richard: Proxies do give a degree of generality that seems nice

MSM: I'm not sure I'm understanding everything going on here. I agree that being able to cache and being able to guarantee up-to-date resources are good things.
... But lots of these things seem to be not terribly closely related to pipelines any more than we need a character set.
... We just rely on getting character sets from lower layers.
... Building it into the pipeline engine strikes me as a breach of orthogonality.
... At least for the components that we require an implementation provide, we can say what the answers are or say that they're implementation defined

Murray: I think you're thinking of it in terms of the pipeline language and not the overall processing model. If you're processing large volumes of XML, you may want a proxy server that has access to pipeline descriptions so that all your documents can be passed through.

<richard> Beware of assuming that everything comes through HTTP. What if they're just files?

Indeed. The proxy has to handle file: URIs as well.

MSM: It should be orthogonal. If I've got a caching proxy installed, I want my pipeline engine to use that one, not one that it felt it needed to build in.

Alex: The document function in XSLT gets the resource through the local environment that might use a local cache

MSM: The only thing the XSLT language says is that if you call the document function with the same URI, you'll get the same document

Alex: You want to be able to compare the objects you get back from the document function.
... Do we really have the requirement that things behave this way across components?

Richard: I think that Alex has drawn attention to an important point. XSLT can do this because it only says the document function behaves this way.
... Are we really going to say that if the stylesheet is a file: URI then it can't just open it?

Murray: In a shell script, you'd handle this by copying it and then referring to the copy.

Richard: Yes, and if you were using a program that had the name hard coded, then you couldn't make it use the copy
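
A minimal sketch of the copy-then-refer approach Murray describes, assuming Python and local files. The `snapshot` helper and the file names are hypothetical. The run works from a private copy, so later changes to the original do not affect it.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(source: Path, workdir: Path) -> Path:
    """Copy `source` into `workdir` and return the path of the copy."""
    copy = workdir / source.name
    shutil.copy2(source, copy)
    return copy


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    original = base / "style.xsl"
    original.write_text("v1")

    run_dir = base / "run"
    run_dir.mkdir()
    frozen = snapshot(original, run_dir)

    # The original changes while the run is in progress.
    original.write_text("v2")

    frozen_text = frozen.read_text()
    original_text = original.read_text()

print(frozen_text, original_text)
```

As Richard points out, this helps only when each component can be told to read the copy. A program with the original name hard-coded will still read the changed file.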

Norm attempts to summarize the consensus which remains "no"

HT: The discussion we've had has been drawn somewhat more narrowly than the first sentence of the actual issue.

<MSM> [I wonder if there is consensus on the proposition that in cases like the example given by Norm in raising the issue, it *is* our responsibility to say whether the data stream written to uri Foo is or is not guaranteed the same as the data stream (later) read from uri Foo]

HT: We've discussed in the past the use of pipeline engines as resource managers.
... Consider output="#foo" somewhere and input="#foo" somewhere else in a pipeline.
... One way to think about that is that the engine is managing those resources.
... I don't believe that issue is off the table because of this discussion

Norm: I agree

<MSM> I'm a little puzzled / troubled here. If I interpret output="#foo" and input="#foo" as references to resources to be managed by the pipeline, then I suddenly have an ambiguity I didn't use to have:

<MSM> does the input stream read the output stream?

<MSM> or is this a pipeline which reads resource #foo, does something with it, and writes it back?

Scribe lost the thread

<MSM> ht, I wonder if you can expound on how you would propose avoiding this ambiguity

<ht> So I think Richard just expressed the dichotomy in an interesting way -- do we name ports, or infosets

<MSM> +1: Richard's formulation of the question is an acute one

ADJOURNED

XML Processing Model WG

Meeting 18, 27 Apr 2006
