This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3113 - Does the pipeline engine act as a resource manager?
Summary: Does the pipeline engine act as a resource manager?
Status: CLOSED WONTFIX
Alias: None
Product: XML Processing Model
Classification: Unclassified
Component: Pipeline language
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Norman Walsh
Reported: 2006-04-12 12:14 UTC by Norman Walsh
Modified: 2011-02-23 20:27 UTC

Description Norman Walsh 2006-04-12 12:14:30 UTC
I think it makes sense for the pipeline engine to be a resource manager for the components. In other words, if some stage of a pipeline has produced a document, http://example.org/doc, and a later stage attempts to reference that document (through an external parsed entity, XInclude statement, import statement, XLink, or what-have-you), the pipeline processor should guarantee that the document retrieved will be the document previously produced.

Note that this does not necessarily imply that the document has ever been serialized at the location of the document's base URI.
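
A minimal sketch of the behaviour described above, in Python for illustration only. The class name ResourceManager, its methods, and the fetch callback are hypothetical; nothing in this report specifies an API. It shows a produced document being returned for its base URI without ever being serialized there, with ordinary retrieval as the fallback.

```python
from typing import Callable, Dict


class ResourceManager:
    """Hypothetical sketch: maps base URIs to documents produced by earlier steps."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        # `fetch` stands in for ordinary retrieval (HTTP, file system, ...).
        self._fetch = fetch
        self._produced: Dict[str, str] = {}

    def register(self, base_uri: str, document: str) -> None:
        # Record a document produced by a step; nothing is written to disk
        # or published at base_uri.
        self._produced[base_uri] = document

    def resolve(self, uri: str) -> str:
        # A later reference to the URI gets the produced document, if any;
        # otherwise fall back to ordinary retrieval.
        if uri in self._produced:
            return self._produced[uri]
        return self._fetch(uri)


def fake_fetch(uri: str) -> str:
    # Stand-in for a network fetch so the sketch runs offline.
    return f"<fetched from='{uri}'/>"


manager = ResourceManager(fake_fetch)
manager.register("http://example.org/doc", "<doc>produced by stage 1</doc>")
produced = manager.resolve("http://example.org/doc")
fetched = manager.resolve("http://example.org/other")
```

Here `produced` is the document registered by the earlier stage, while `fetched` comes from the fallback because no stage produced that URI.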
Comment 1 Alessandro Vernet 2006-04-13 08:05:39 UTC
When the input of a pipeline makes a reference to a document available at a given URI, say with <p:input name="data" href="http://example.com/my-file.xml">, then it seems reasonable that the pipeline engine goes out and fetches the document at the URI. We can call the part of the pipeline engine that performs this job the "resource manager".

However, I see the case of a step that produces a document which is then used by another step as something different. The output of the first step is given a label, and the input of the second step makes a reference to that label. I imagine that in a given implementation the data could flow between the steps through SAX events. In this second case, there isn't really a piece that we can naturally call a "resource manager".

Alex
Comment 2 Innovimax 2006-04-13 09:21:39 UTC
I agree that we need to know what is used in a pipeline in order to check for side effects and dependencies.
In that case we should use an "external resource manager" to manage what comes from outside the pipeline, which should be XML conformant.
We should even require implementors not to access resources which haven't been declared in the external resource manager.
The external resource manager could then cache some resources in case of reuse.

As Alessandro says, what happens inside the pipeline should, AFAIK, be managed differently to avoid problems with streamable implementations (using events: SAX or StAX).

But we can also imagine that the output of a pipe might need to be "published" as an external resource (in what precise cases?).
So we should consider all these cases.
Comment 3 Norman Walsh 2006-04-13 11:03:19 UTC
I agree that the case of a document flowing between two components on a pipe is different, but I think there are still two cases to consider:

1. What if that document is tee'd off to the serializer component? Does a subsequent reference to the URI that was serialized guarantee access to the same document? (i.e. in the case of an XInclude in some other document fed through the XInclude component)

2. What if the document flowing through the pipe has a base URI? In this case, it isn't explicitly serialized, but the pipeline engine could still "know" about it.
Comment 4 Norman Walsh 2006-04-13 16:12:07 UTC
Discussed without resolution at http://www.w3.org/XML/XProc/2006/04/13-minutes.html
Comment 5 Norman Walsh 2006-04-13 16:13:36 UTC
Use case from today's telcon:

Consider a component that produces stylesheet fragment frag.xsl and serializes it.

Some subsequent XSLT component takes as input base.xsl which, unbeknownst to the pipeline, uses xsl:include to refer to frag.xsl.

How can that dependency be supported? Does the resource manager just do it automatically, or does the pipeline author have to make it explicit?
Comment 6 Alessandro Vernet 2006-04-19 23:49:38 UTC
I gave some more thought to this question, and I will get back to the example that Norm gave during our previous call. The intent of the author is to have a first stylesheet produce an XSLT fragment to be included in a second stylesheet that we want to run.

We could do this in 3 steps:
1) First XSLT runs, produces the fragment.
2) Use XSLT to produce an XSLT that includes the fragment. The standard input is the output of step 1.
3) Run the XSLT produced in step 2.

One could write this:

<p:pipeline xmlns:p="...">

    <!-- Produce fragment in "first stylesheet" -->
    <p:step name="xslt">
        <p:input href="..."/>
        <p:input name="stylesheet" href="..."/>
        <p:output label="fragment"/>
    </p:step>

    <!-- Include fragment in "second stylesheet" -->
    <p:step name="xslt">
        <p:input labelref="fragment"/>
        <p:input name="stylesheet" href="..."/>
        <p:output label="stylesheet-2"/>
    </p:step>

    <!-- Run "second stylesheet" -->
    <p:step name="xslt">
        <p:input href="..."/>
        <p:input name="stylesheet" labelref="stylesheet-2"/>
        <p:output label="..."/>
    </p:step>

</p:pipeline>

All the connections are explicit in this case. I like that. However, we need 3 steps, which I don't like; Norm's solution of using XInclude is more elegant, and I would like to be able to use it.

Let's look at a solution that:
a) Keeps all the connections explicit in the pipeline
b) Allows for the XInclude solution
c) Implies the existence of a resource manager

It involves 2 steps:

1) The first step produces the fragment, as before.
2) The second step declares another input, with a name we choose, say "fragment". The XInclude in the stylesheet looks like <xi:include href="input:fragment"/>.

One could write this:

<p:pipeline xmlns:p="...">

    <!-- Produce fragment in "first stylesheet" -->

    <p:step name="xslt">
        <p:input href="..."/>
        <p:input name="stylesheet" href="..."/>
        <p:output label="step-1-output"/>
    </p:step>

    <!-- Run "second stylesheet" -->
    <p:step name="xslt">
        <p:input href="..."/>
        <p:input name="fragment" labelref="step-1-output"/>
        <p:input name="stylesheet" href="stylesheet-2"/>
        <p:output label="..."/>
    </p:step>

</p:pipeline>

The pipeline engine provides a resource manager. The implementation of a step is responsible for delegating the resolution of URLs to the resource manager. In the context of the second stylesheet, when the XInclude asks for "input:fragment", the pipeline engine sees that there is an input called "fragment" on that step, that it is connected to the output of the first step, and therefore that the output of the first step is the requested document.
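
The lookup just described could be sketched roughly as follows, in Python for illustration. The function name, the two dictionaries, and the sample fragment are assumptions made for this sketch; only the "input:" prefix and the names "fragment" and "step-1-output" come from the example above.

```python
from typing import Dict, Optional

INPUT_SCHEME = "input:"


def resolve_input_uri(
    uri: str,
    step_inputs: Dict[str, str],
    labelled_outputs: Dict[str, str],
) -> Optional[str]:
    """Return the document for an `input:NAME` URI, or None if it is not one.

    step_inputs maps an input name on the current step to the label it
    references; labelled_outputs maps a label to the document produced
    by an earlier step. Both structures are assumptions of this sketch.
    """
    if not uri.startswith(INPUT_SCHEME):
        return None  # not ours; the caller should resolve it normally
    name = uri[len(INPUT_SCHEME):]
    label = step_inputs[name]        # KeyError: no such input on this step
    return labelled_outputs[label]   # KeyError: label was never produced


# Output of the first step, stored under its label (hypothetical content).
labelled_outputs = {"step-1-output": "<xsl:template name='frag'/>"}
# The second step declares an input "fragment" wired to that label.
step_inputs = {"fragment": "step-1-output"}

fragment = resolve_input_uri("input:fragment", step_inputs, labelled_outputs)
other = resolve_input_uri("http://example.org/x.xsl", step_inputs, labelled_outputs)
```

Any URI that does not use the "input:" prefix yields None here, leaving the step free to resolve it in the usual way.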

Alex
Comment 7 Innovimax 2006-04-20 08:10:27 UTC
Alessandro, I think you forgot one other way to do this:
xsl:include (http://www.w3.org/TR/xslt#include) or xsl:import (http://www.w3.org/TR/xslt#import),
which is the way people usually do it now (without XProc).
Are we going to tell everybody that, to use XProc, they should rewrite all their stylesheets to use XInclude?
I agree that we absolutely need a resource manager, but let's point out the problems we could have if the user doesn't provide one: could the processor still work and recover with fewer optimizations, or is this a critical need such that an XProc instance must provide a resource manager?

If the latter, what problems are raised by wrapping existing components (is this just overloading URIResolver or something like it)?
I think both ways cause a lot of problems: one for the implementor and the other for users.
Comment 8 Alessandro Vernet 2006-04-26 18:42:17 UTC
Mohamed,

You could do the same thing with xsl:include or xsl:import. You would call the stylesheet from the pipeline with:

    <p:step name="xslt">
        <p:input href="..."/>
        <p:input name="fragment" labelref="step-1-output"/>
        <p:input name="stylesheet" href="stylesheet-2"/>
        <p:output label="..."/>
    </p:step>

And then in XSLT use:

<xsl:include href="input:fragment"/>
  or
<xsl:import href="input:fragment"/>

I agree that in most cases people will use <xsl:import>/<xsl:include> over <xi:include>; I only considered the case of <xi:include> because that was the example given earlier by Norm.

Alex
Comment 9 Norman Walsh 2006-04-26 18:55:27 UTC
I don't think the solution outlined in comment #6 works.

The XSLT component presumably takes two inputs: a document and a stylesheet. I don't actually expect it to have any provision for accepting other inputs. Even in the specific case given, there's nothing in the call to the xslt component that gives it the slightest clue about what the anonymous "fragment" input is, is for, or where it fits into the process.

The XSLT processor is only going to be able to get at the component bits it needs through URIs.
Comment 10 Alessandro Vernet 2006-04-26 19:21:21 UTC
(In reply to comment #9)

Norm,

Would you feel more comfortable using another element name than <p:input>, or somehow making it clear that this is an "auxiliary input" to XSLT which won't be defined in the interface exposed by the XSLT component? We could have something that looks like:

    <p:step name="xslt">
        <p:aux-input name="fragment" labelref="step-1-output"/>
        <p:input href="..."/>
        <p:input name="stylesheet" href="stylesheet-2"/>
        <p:output label="..."/>
    </p:step>

Alex
Comment 11 Norman Walsh 2006-04-26 19:49:28 UTC
It wasn't the use of p:input that bothered me. What the syntax has to do is indicate the URI of the document:

  <p:step name="xslt">
    <p:input href="generated-fragment.xsl"/>
    <p:input name="document" href="document.xml"/>
    <p:input name="stylesheet" href="base.xsl"/>
    <p:output label="..."/>
  </p:step>

I don't see a convenient way to specify that the fragment can arrive at the component through an input pipe, though I suppose we could say:

  <p:step name="xslt">
    <p:input href="generated-fragment.xsl" ref="somelabel"/>
    <p:input name="document" href="document.xml"/>
    <p:input name="stylesheet" href="base.xsl"/>
    <p:output label="..."/>
  </p:step>

But I'm not sure that helps very much. In either case, I think the pipeline engine has to intercept the processor's attempt to load "generated-fragment.xsl" and make sure that it supplies the correct document.

Comment 12 Alessandro Vernet 2006-04-26 21:42:37 UTC
(In reply to comment #11)

Norm,

I am fine with having the resource manager intercept attempts made by components to load documents. But I am not comfortable with making those URLs look like file names. If in XSLT you have:

<xsl:copy-of select="doc('name.xsl')"/>

Is name.xsl a file loaded relative to the current base URI, or a reference to the output of a processor? I suggest we solve this problem by using a special scheme in the URI that will be recognized by the resource manager:

<xsl:copy-of select="doc('input:name')"/>

Then, if a step B uses the output of a step A, I would like the connection to be visible in the pipeline. I suggested that we make that connection obvious by assigning a label to the output of step A:

<p:output name="..." label="my-label"/>

And that we make a reference to that label in step B:

<p:input name="name" ref="my-label"/>

Would this make sense?

Alex
Comment 13 Norman Walsh 2006-04-27 09:58:23 UTC
I don't think a special URI scheme is a good idea. Beyond the web
architecture issues associated with creating new ones[1], I think
it's unlikely that they're necessary. I'm assuming the pipeline
engine acts as a resource manager at a level below the actual
processor (in Java/SAX terms, by acting as an EntityResolver or
URIResolver).

If your stylesheet contains

  <xsl:copy-of select="doc('name.xsl')"/>

what the resolver actually sees is the absolute URI that that resolves
to (or maybe the base URI and the relative URI, from which the
absolute URI can be constructed). As long as your pipeline generates
the correct absolute URI, everything will just work. Note that there's
no reason your pipeline can't generate
http://example.org/style/name.xsl as easily as it generates anything
else.
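
To illustrate the point about absolute URIs, here is a small Python sketch. The stylesheet base URI and the `produced` table are invented for the example; `urljoin` is the standard-library function for resolving a relative reference against a base.

```python
from urllib.parse import urljoin

# Documents the pipeline has generated so far, keyed by absolute URI
# (hypothetical contents).
produced = {"http://example.org/style/name.xsl": "<xsl:stylesheet/>"}

# Assumed base URI of the stylesheet that calls doc('name.xsl').
stylesheet_base = "http://example.org/style/base.xsl"

# What a resolver would see after the relative reference is made absolute.
absolute = urljoin(stylesheet_base, "name.xsl")

# The lookup succeeds only because the pipeline generated that exact URI.
document = produced.get(absolute)
```

If the pipeline had registered the document under a different absolute URI, `document` would be None and the resolver would have to fall back to normal retrieval.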

Making the connections visible is absolutely required, but I'm not
sure that it requires labels or sending the documents through the
"input/output" pipe.

I think the key thing is being able to say that some steps produce
documents with specific URIs.

Consider:

  <p:step name="p:xslt">
    <p:input name="stylesheet" href="style.xsl"/>
    <p:input name="document" href="doc.xml"/>
    <p:output name="result" label="module-res"/>
    <p:output href="http://example.org/style/module.xsl"/>
  </p:step>

That might be one way of saying that this stylesheet produces a result
document (or sequence of documents) and that one of those results is
expected to have the URI http://example.org/style/module.xsl. 

A later step might be written this way:

  <p:step name="p:xslt">
    <p:input name="stylesheet" href="base-style.xsl"/>
    <p:input name="document" href="other-doc.xml"/>
    <p:input href="http://example.org/style/module.xsl"/>
    <p:output name="result" label="doc-out"/>
  </p:step>

That might be one way of saying that this transformation requires both
a stylesheet and a document but also another input, that isn't going
to arrive through either of the input pipes directly, named
http://example.org/style/module.xsl. That seems to me like enough
information for the pipeline engine to see that the first step has to
run before the second.

What actually happens to the document (or sequence of documents) produced
on the "module-res" pipe is irrelevant.
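
A rough sketch of how an engine might derive that ordering from the declared URIs alone, in Python for illustration. The step names and the dictionary layout are assumptions of this sketch, not proposed syntax; `graphlib.TopologicalSorter` is from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each step lists the URIs it declares as produced and as required.
# Step names and the dict layout are assumptions of this sketch.
steps = {
    "run-base-style": {
        "produces": [],
        "requires": ["http://example.org/style/module.xsl"],
    },
    "make-module": {
        "produces": ["http://example.org/style/module.xsl"],
        "requires": [],
    },
}

# Map each produced URI to the step that produces it.
producer_of = {
    uri: name for name, step in steps.items() for uri in step["produces"]
}

# A step depends on whichever step produces a URI it requires; URIs with
# no producer are treated as ordinary external resources.
graph = {
    name: {producer_of[uri] for uri in step["requires"] if uri in producer_of}
    for name, step in steps.items()
}

# Steps in an order where every producer runs before its consumers.
order = list(TopologicalSorter(graph).static_order())
```

Even though "run-base-style" is listed first, `order` places "make-module" ahead of it, because the only link between the two steps is the shared URI.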

[1] http://www.w3.org/TR/webarch/#pr-reuse-uri-schemes
Comment 14 Alessandro Vernet 2006-04-27 17:58:20 UTC
(In reply to comment #13)

Norm,

We agree on the need to make connections visible at the pipeline level, so there is no need to elaborate on that. I also agree that one should try to reuse URI schemes, but we should not push the reuse so far that we change the semantics of a scheme. If you write:

 <p:step name="p:xslt">
   <p:input href="http://example.org/style/module.xsl"/>
   <p:input name="stylesheet" href="base-style.xsl"/>
   <p:input name="document" href="other-doc.xml"/>
   <p:output name="result" label="doc-out"/>
 </p:step>

In the first input we read a reference to "http://example.org/style/module.xsl". In general when we see a reference to an http://... URI, it means that the document will be retrieved using the HTTP protocol. It means that we can enter this URI in a browser and see the document. However here we are saying: well, this is true in general, *unless* earlier in the pipeline there is a <p:output href="http://example.org/style/module.xsl"/>. In that case this is not a reference to an HTTP URI but to the document produced by this output. I think this is confusing and that this confusion can and should be avoided.

The "reuse URI schemes" practice you quote says:

"A specification SHOULD reuse an existing URI scheme (rather than create a new one) when it provides the desired properties of identifiers and their relation to resources."

Using an http:// URI to make a reference to the output of another step in my mind violates the "provides the desired properties of identifiers and their relation to resources" condition.

I think using another scheme to make references to the input of a step is a better option. You could then write:

 <p:step name="p:xslt">
   <p:input name="foo" ref="label-of-another-output"/>
   <p:input name="stylesheet" href="base-style.xsl"/>
   <p:input name="document" href="other-doc.xml"/>
   <p:output name="result" label="doc-out"/>
 </p:step>

And use doc('input:foo') in your stylesheet. The connection is explicit in the pipeline (<p:input name="foo" ref="label-of-another-output"/>) and is done just like other connections (ref to a label). You could also write:

 <p:step name="p:xslt">
   <p:input name="foo" href="http://example.org/style/module.xsl"/>
   <p:input name="stylesheet" href="base-style.xsl"/>
   <p:input name="document" href="other-doc.xml"/>
   <p:output name="result" label="doc-out"/>
 </p:step>

And use doc('input:foo') in your stylesheet, but that would be equivalent to just using doc('http://example.org/style/module.xsl') in the stylesheet. And this would in both cases load the resource from an HTTP server, per the usual semantics of an HTTP URI.

Alex
Comment 15 Norman Walsh 2006-05-04 16:16:21 UTC
No: http://www.w3.org/XML/XProc/2006/05/04-minutes.html