See also: IRC log
Registration page: http://www.w3.org/2002/09/wbs/38398/XProcFTF2/
I propose that we say that all components are non-functional. That is,
a pipeline implementation must behave as if it evaluated a component
every time it occurs. "Must behave as if" is spec-ese for
"implementations that are clever enough to determine with certainty
that a component is, in fact, functional are free to cache the
intermediate results because by golly if it is, no one will be able to
Richard: This doesn't preclude adding a mechanism later to allow authors to assert that a step or component is functional
Richard: Does this address the converse case? Producing output side-effects and behaving the same way for given inputs
Norm: This is the "functional" aspect, not the side-effect aspect
Richard: Side-effects are like hidden outputs, functionality is like hidden inputs
Alex: This is a good place to start. Register a new issue about functional components?
<scribe> ACTION: Alex to create an issue about the possibility of functional components [recorded in http://www.w3.org/2006/04/27-xproc-minutes.html#action01]
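A minimal sketch of the "must behave as if" rule discussed above, written in Python purely for illustration. All names (`upper`, `stamp`, `run_every_time`, `make_cached`) are hypothetical and are not part of any XProc draft. It assumes a component is simply a function from one document to another, and shows why caching is unobservable for a functional component but observable for one with a hidden input.

```python
import itertools
from typing import Callable, Dict

# Assumption: a component maps one document (a string here) to another.
Component = Callable[[str], str]


def upper(doc: str) -> str:
    """Functional: the output depends only on the visible input."""
    return doc.upper()


_counter = itertools.count(1)


def stamp(doc: str) -> str:
    """Not functional: a hidden input (a counter) changes the output."""
    return f"{doc}#{next(_counter)}"


def run_every_time(component: Component, doc: str) -> str:
    """Baseline behaviour: evaluate the component at every occurrence."""
    return component(doc)


def make_cached(component: Component) -> Component:
    """An optimization that is only safe when the component is functional."""
    cache: Dict[str, str] = {}

    def cached(doc: str) -> str:
        if doc not in cache:
            cache[doc] = component(doc)
        return cache[doc]

    return cached


# Functional component: cached and uncached runs are indistinguishable.
upper_plain = [run_every_time(upper, "a"), run_every_time(upper, "a")]
cached_upper = make_cached(upper)
upper_cached = [cached_upper("a"), cached_upper("a")]

# Non-functional component: caching changes what the pipeline observes.
stamp_plain = [run_every_time(stamp, "a"), run_every_time(stamp, "a")]
cached_stamp = make_cached(stamp)
stamp_cached = [cached_stamp("a"), cached_stamp("a")]
```

In this sketch an engine could cache `upper` without anyone noticing, whereas caching `stamp` would violate the "evaluated every time" behaviour. A later mechanism for authors to assert that a step is functional, as Richard mentions, would amount to opting in to something like `make_cached`.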
Norm: One aspect of this question is, does the pipeline engine provide the sort of URI-stability that XSLT, for example, gives the document function?
Richard: I strongly disagree with this as a requirement; it requires a degree of intimacy between the engine and the components that may not always be available
Alex: Is this something that might be "at user option"?
Norm: I'd like to avoid that if at all possible
<Zakim> ht, you wanted to push back
Henry: I need some information; in my current state of knowledge I think it's a bad idea for
... Especially when you are running a pipeline engine as a server, you do not want to flush the cache every time you run a pipe because it's useful to keep things around.
... In their parsed and ready-to-go state (provided they haven't changed)
... I'm happier saying, "no, you should expect your pipeline to behave in the way any other web application does"
... Yes, things can change.
Alex: If we step back and look at the web browser case, consider an image embedded 10 times on the same page. The browser reuses the image.
... That the resolution of URI-to-resource is stable for the duration of a page is one reasonable expectation
<MSM> [I think the fact that browsers do or do not re-fetch is an optimization they make, not part of the specification of correct browser behavior - am I wrong?]
Richard: Consider other things that are like XML pipelines, such as shell scripts, where running "cat foo" twice might not return the same file.
<MSM> [If ten <img> elements in the same HTML document refer to "my_image.jpg", and that image is served with a lifetime of 0, are correct browsers guaranteed to fetch it only once? What spec says that?]
Some discussion of whether or not browsers actually behave that way
<Zakim> MSM, you wanted to say that as an empirical statement, it's not a very strong argument for making the behavior part of our spec
MSM: Implementors will do that for performance reasons regardless of whether a spec requires it or not
Richard: Is there a spec for how you display things in a web page?
MSM: In that sense, it's not clear to me that the browser analogy bears on our decision
Alex: There's a user expectation of some aspect of stability
Richard: I don't think the browser analogy is a good one. The engine is running a collection of potentially independently implemented components.
Murray: I'm relying on my memory, but in HTTP there's a mechanism for specifying time-to-live. So if there's a nanosecond TTL, then maybe it would go get the
... Similarly, if I was getting the time of day from a URI then it might change
... So if you're worried about that, maybe you need a "caching" component.
Norm: I think consensus is coming towards the answer "no"
Alex: I don't agree, I think it's important that URIs are stable for the duration of an
... If you need to identify unique resources, you can generate unique URIs with query parameters
... We haven't decided if the resources flowing through the pipeline have URIs or not
Richard: I notice that the bug is actually talking about something produced by the pipeline
Norm: I think those are the same case
Richard: You could provide components that fetch and store URIs stably.
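A minimal sketch of Richard's suggestion, assuming a hypothetical `StableStore` component that fetches each URI at most once per pipeline run and then always returns the stored copy. The fetcher `changing_fetch` is a stand-in invented for this example: it simulates a resource that changes on every retrieval. Neither name comes from the XProc drafts.

```python
from typing import Callable, Dict


class StableStore:
    """Hypothetical component: fetches a URI once, then serves the snapshot."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._snapshots: Dict[str, bytes] = {}

    def get(self, uri: str) -> bytes:
        # Only the first request for a URI reaches the underlying fetcher.
        if uri not in self._snapshots:
            self._snapshots[uri] = self._fetch(uri)
        return self._snapshots[uri]


# Stand-in fetcher (an assumption for this sketch): the resource behind
# the URI is different each time it is retrieved.
_calls = {"n": 0}


def changing_fetch(uri: str) -> bytes:
    _calls["n"] += 1
    return f"{uri} v{_calls['n']}".encode()


store = StableStore(changing_fetch)
first = store.get("file:///tmp/foo.xml")
second = store.get("file:///tmp/foo.xml")

# A component that bypasses the store sees the current state instead.
fresh = changing_fetch("file:///tmp/foo.xml")
```

Here `first` and `second` are the same snapshot, while `fresh` differs. Stability of this kind holds only for components that go through the store; the engine itself guarantees nothing in this sketch.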
Norm describes the situation where an XSLT needs to get an ancillary resource by URI
Alex: I really want some URIs to be stable throughout the duration of a pipeline
Murray: I'm not convinced that we don't need a resource manager
... I'd like to posit the existence of a component that is a proxy server or something of that ilk
... That component knows if requests should always send things back from the cache
<MSM> [I agree fervently that as users we need resource managers, and that implementations of our language will be more usable if they use good resource managers. But we also need character sets. We don't specify a character set as part of our spec to meet that need, and the same should probably hold for resource management. Separate problem, separate spec.]
Murray: I think it's the case that sometimes you're going to want the documents to remain stable and sometimes you're going to want to get current results
Richard: But I may be using components that don't know how to use a proxy server
Murray: I thought once you set up a proxy, then all requests went through that proxy.
That's implementation and operating-system dependent
Richard: Proxies do give a degree of generality that seems nice
MSM: I'm not sure I'm understanding everything going on here. I agree that being able to cache and being able to guarantee up-to-date resources are
... But lots of these things seem to be not terribly closely related to pipelines any more than we need a character set.
... We just rely on getting character sets from lower layers.
... Building it into the pipeline engine strikes me as a breach of orthogonality.
... At least for the components that we require an implementation to provide, we can say what the answers are or say that they're implementation defined
Murray: I think you're thinking of it in terms of the pipeline language and not the overall processing model. If you're processing large volumes of XML, you may want a proxy server that has access to pipeline descriptions so that all your documents can be passed through.
<richard> Beware of assuming that everything comes through HTTP. What if they're just files?
Indeed. The proxy has to handle file: URIs as well.
MSM: It should be orthogonal. If I've got a caching proxy installed, I want my pipeline engine to use that one, not one that it felt it needed to build in.
Alex: The document function in XSLT gets the resource through the local environment that might use a local cache
MSM: The only thing the XSLT language says is that if you call the document function with the same URI, you'll get the same document
Alex: You want to be able to compare the objects you get back from the document
... Do we really have the requirement that things behave this way across components?
Richard: I think that Alex has drawn attention to an important point. XSLT can do this because it only says the document function behaves this way.
... Are we really going to say that if the stylesheet is a file: URI then it can't just open it?
Murray: In a shell script, you'd handle this by copying it and then referring to the copy.
Richard: Yes, and if you were using a program that had the name hard coded, then you couldn't make it use the copy
Norm attempts to summarize the consensus which remains "no"
HT: The discussion we've had has been drawn somewhat more narrowly than the first sentence of the actual issue.
<MSM> [I wonder if there is consensus on the proposition that in cases like the example given by Norm in raising the issue, it *is* our responsibility to say whether the data stream written to uri Foo is or is not guaranteed the same as the data stream (later) read from uri Foo]
HT: We've discussed in the past the use of pipeline engines as resource managers.
... Consider output="#foo" somewhere and input="#foo" somewhere else in a pipeline.
... One way to think about that is that the engine is managing those resources.
... I don't believe that issue is off the table because of this discussion
Norm: I agree
<MSM> I'm a little puzzled / troubled here. If I interpret output="#foo" and input="#foo" as references to resources to be managed by the pipeline, then I suddenly have an ambiguity I didn't use to have:
<MSM> does the input stream read the output stream?
<MSM> or is this a pipeline which reads resource #foo, does something with it, and writes it back?
Scribe lost the thread
<MSM> ht, I wonder if you can expound on how you would propose avoiding this ambiguity
<ht> So I think Richard just expressed the dichotomy in an interesting way -- do we name ports, or infosets
<MSM> +1: Richard's formulation of the question is an acute one