Hiding dependencies in XPaths (Was: XProc Minutes 25 May 2006) from Jeni Tennison on 2006-05-26 (public-xml-processing-model-wg@w3.org from May 2006)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Fri, 26 May 2006 09:41:37 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <4476BF41.2070707@jenitennison.com>
Hi,

The minutes at http://www.w3.org/XML/XProc/2006/05/25-minutes.html say:
>    Richard: If all the inputs are available as documents that you can refer
>    to by name in XPath expressions, this results in a hidden dependency
>    within XPaths.
>    ... In order to determine which components have to have been evaluated,
>    you have to peek into the XPath to see what inputs it relies on.
>    ... That seems to be a minor implementation annoyance but a good way of
>    hiding dependencies.
>    ... which is a bad thing.
>    ... It means that two things in apparently unrelated branches of the
>    pipeline may have to wait for each other because of the XPath expression
>    one uses.
>    ... It really is just a syntax issue on one level in that you could draw
>    all the lines in explicitly. It's just that it's buried deep down in the
>    syntax.

I think I understand the point Richard's making here: if we have a 
conditional like the following (in my preferred syntax):

   <p:choose>
     <p:input name="input" />
     <p:variable name="xsl-stylesheet-pi"
       select="$input/processing-instruction('xsl-stylesheet')" />
     <p:when test="$xsl-stylesheet-pi and
                   contains($xsl-stylesheet-pi,
                            'type=&quot;text/xsl&quot;')">
       ...
     </p:when>
     <p:otherwise>
       ...
     </p:otherwise>
     <p:output name="output" select="$output" />
   </p:choose>

then to understand that the condition:

   $xsl-stylesheet-pi and
   contains($xsl-stylesheet-pi,
            'type=&quot;text/xsl&quot;')

relies on the input to the <p:choose>, the implementation has to look at 
the condition XPath and see that it refers to $xsl-stylesheet-pi, then 
look at the definition of the variable $xsl-stylesheet-pi and see that 
the XPath that supplies its value refers to the variable $input, and 
thus work out that the condition relies on the input to the <p:choose>.
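
The lookup described above can be sketched roughly as follows. This is only 
an illustration, not how any real implementation works: the dictionaries are 
a hypothetical model of the <p:choose>, and the regular expression is a crude 
stand-in for a proper XPath parser.

```python
import re

# Hypothetical model of the first <p:choose>: the names bound to input
# documents, and the variables defined by XPath expressions.
INPUTS = {"input"}
VARIABLES = {
    "xsl-stylesheet-pi": "$input/processing-instruction('xsl-stylesheet')",
}

# Crude stand-in for an XPath parser: matches $name variable references.
# Assumption: no '$' appears inside string literals in the expressions.
VAR_REF = re.compile(r"\$([A-Za-z_][\w.-]*)")


def input_dependencies(xpath, variables, inputs):
    """Return the inputs an XPath relies on, following variable definitions."""
    found = set()
    seen = set()
    pending = VAR_REF.findall(xpath)
    while pending:
        name = pending.pop()
        if name in seen:
            continue
        seen.add(name)
        if name in inputs:
            # The reference resolves directly to an input document.
            found.add(name)
        elif name in variables:
            # Look inside the variable's own XPath for further references.
            pending.extend(VAR_REF.findall(variables[name]))
    return found


condition = ("$xsl-stylesheet-pi and "
             "contains($xsl-stylesheet-pi, 'type=\"text/xsl\"')")
deps = input_dependencies(condition, VARIABLES, INPUTS)
print(deps)
```

The condition never mentions $input itself; the dependency only shows up 
once the variable definition has been followed.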

What I don't understand is how things are actually all that much better when
the input is referenced by setting the context node instead. The above 
would look like (|s indicate changed lines):

   <p:choose>
     <p:input name="input" />
|   <p:output name="output" ref="output" />
|   <p:variable name="xsl-stylesheet-pi" context="input"
|     select="processing-instruction('xsl-stylesheet')" />
     <p:when test="$xsl-stylesheet-pi and
                   contains($xsl-stylesheet-pi,
                            'type=&quot;text/xsl&quot;')">
       ...
     </p:when>
     <p:otherwise>
       ...
     </p:otherwise>
   </p:choose>

The implementation still has to look inside the condition XPath to work 
out that $xsl-stylesheet-pi has been referenced. It's then easier to 
work out that this variable references the input to the <p:choose>, but 
the implementation still has to look for variable references within the 
XPath expression used to set the $xsl-stylesheet-pi variable in case 
other variables (which might rely on other documents) have been referenced.
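
A rough sketch of the same analysis for the context-based form, under the 
same caveats: the data structure is a hypothetical model of the markup, and 
the regular expression stands in for a real XPath parser. The declared 
context is read straight off the variable, but the XPaths are still scanned.

```python
import re

# Crude stand-in for an XPath parser: matches $name variable references.
VAR_REF = re.compile(r"\$([A-Za-z_][\w.-]*)")

# Hypothetical model: each variable records its declared context input
# (or None) and its select expression.
VARIABLES = {
    "xsl-stylesheet-pi": {
        "context": "input",
        "select": "processing-instruction('xsl-stylesheet')",
    },
}


def dependencies(xpath, context, variables):
    """Inputs needed by an XPath evaluated against an optional context."""
    needed = set() if context is None else {context}
    seen = set()
    pending = VAR_REF.findall(xpath)  # the XPath still has to be scanned
    while pending:
        name = pending.pop()
        if name in seen or name not in variables:
            continue
        seen.add(name)
        definition = variables[name]
        if definition["context"] is not None:
            # The input is explicit here, no need to parse the select for it.
            needed.add(definition["context"])
        # The select may still reference other variables.
        pending.extend(VAR_REF.findall(definition["select"]))
    return needed


test = ("$xsl-stylesheet-pi and "
        "contains($xsl-stylesheet-pi, 'type=\"text/xsl\"')")
when_deps = dependencies(test, None, VARIABLES)
print(when_deps)
```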

As far as I can tell, so long as we have variables that can be set based 
on inputs then we have to look in XPaths for dependencies. Of course we 
could ban variables altogether, or only allow them to be used to 
manipulate parameter values, in which case the above would have to be 
written:

   <p:choose>
     <p:input name="input" />
     <p:output name="output" ref="output" />
|   <p:when context="input"
|           test="processing-instruction('xsl-stylesheet') and
|                 contains(processing-instruction('xsl-stylesheet'),
|                          'type=&quot;text/xsl&quot;')">
       ...
     </p:when>
     <p:otherwise>
       ...
     </p:otherwise>
   </p:choose>

Although I don't find the above very usable (because of the required 
repetition of location paths), it has the virtue of being absolutely 
clear where the dependencies lie.
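
To illustrate, with no variables the dependency can be read from the markup 
alone, without opening the XPath at all. A minimal sketch, assuming a 
made-up namespace URI for the p: prefix (the draft syntax above does not 
define one):

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/pipeline"  # hypothetical namespace for p:

PIPELINE = """
<p:choose xmlns:p="http://example.org/pipeline">
  <p:input name="input" />
  <p:output name="output" ref="output" />
  <p:when context="input"
          test="processing-instruction('xsl-stylesheet') and
                contains(processing-instruction('xsl-stylesheet'),
                         'type=&quot;text/xsl&quot;')">
  </p:when>
  <p:otherwise>
  </p:otherwise>
</p:choose>
"""

root = ET.fromstring(PIPELINE.strip())

# Each condition's dependency is just its context attribute; the test
# expression is never inspected.
when_contexts = [w.get("context") for w in root.iter("{%s}when" % NS)]
print(when_contexts)
```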

>    Richard: I think the issue of strings is a red herring. Though I agree
>    that we should restrict them to strings now, that doesn't mean we can't
>    make them more complex in the future.
>    ... If the functionality that's needed is the ability to refer to multiple
>    documents, it could be done more explicitly. There could be a syntax that
>    bound variables to the names of outputs of other steps. That at least
>    would make it explicit which ones were being used.

From a user's standpoint, I find something like:

   <p:choose>
     <p:input name="doc1" />
     <p:input name="doc2" />
     ...
     <p:variable name="doc1" context="doc1" select="." />
     <p:variable name="doc2" context="doc2" select="." />
     <p:when test="name($doc1/*) = name($doc2/*)">
       ...
     </p:when>
     ...
   </p:choose>

more obscure than:

   <p:choose>
     <p:input name="doc1" />
     <p:input name="doc2" />
     ...
     <p:when test="name($doc1/*) = name($doc2/*)">
       ...
     </p:when>
     ...
   </p:choose>

I don't see how the first option is more explicit. Under the first 
option, to work out what the condition is actually doing I have to work 
back through the variable definitions to the inputs. In the second, I 
have one less level of indirection to worry about, which makes things easier for me.

If we have the functionality of queries over multiple documents *at 
all*, then I really don't see how one method is any simpler than the 
other for the implementation, and I definitely think that assigning 
inputs to variables is easier for the user.

[snip]
>    Proposal: XPath expressions will be evaluated over exactly one input,
>    syntactic details unresolved.

I feel pretty strongly that this is the wrong way to go, but if I 
haven't managed to convince anyone of that by next week then I don't 
want to hold up progress on the draft.

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Friday, 26 May 2006 08:41:57 UTC