Bug 20995 - [XPROC10] document-uri
Status: ASSIGNED
Product: XML Processing Model
Classification: Unclassified
Component: Pipeline language
unspecified
PC Windows NT
: P2 normal
Assigned To: Norman Walsh
Reported: 2013-02-14 12:45 UTC by Tim Mills
Modified: 2014-03-12 14:15 UTC (History)

Description Tim Mills 2013-02-14 12:45:39 UTC
The XProc 1.0 specification fails to mention the value of the document-uri property for the results of any step, or for pipeline elements such as p:document.

I presume that for the pipeline element p:document, it should be the URI of the accessed document.

However, for other steps, and in particular the p:load step, I presume the document-uri should be absent.  Otherwise, it would appear that the XPath rule that "given a document node $N, the result of fn:doc(fn:document-uri($N)) is $N will always be True" could be violated, e.g. by performing two invocations of p:load that specify the same URI but with different values for dtd-validate.
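
A minimal sketch of that scenario, assuming a hypothetical file doc.xml whose DTD supplies defaulted attributes, so that the validated and non-validated results differ. Whether a processor reports the same document-uri for both results is exactly what the specification leaves unstated:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result" sequence="true"/>

  <!-- Same href, loaded without DTD validation. -->
  <p:load name="plain" href="doc.xml" dtd-validate="false"/>

  <!-- Same href again, this time with DTD validation. -->
  <p:load name="validated" href="doc.xml" dtd-validate="true"/>

  <!-- Both documents arrive as inputs to a single step. -->
  <p:identity>
    <p:input port="source">
      <p:pipe step="plain" port="result"/>
      <p:pipe step="validated" port="result"/>
    </p:input>
  </p:identity>
</p:declare-step>
```

If both results carried the URI of doc.xml as their document-uri, fn:doc(fn:document-uri($N)) could be identical to at most one of them.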

It might be worth noting that the 'p:identity' operation changes the document-uri and base-uri properties.  It's also not clear whether for a document $d and its 'identical' copy $i, ($d is $i) must return false.
Comment 1 Norman Walsh 2013-03-13 15:06:01 UTC
Hi Tim,

We're starting to take up the bugs :-)

Can you clarify what you mean in the last paragraph about p:identity changing the URIs. The WG doesn't believe that p:identity changes anything about the document.

And we also don't make the XQuery/XSLT guarantee about consistency of documents, because pipelines explicitly change them.
Comment 2 Tim Mills 2013-03-14 10:32:29 UTC
> We're starting to take up the bugs :-)

Great - thanks.  You may be relieved to know I'm near the end of implementation, and so there shouldn't be too many more queries to come.

> Can you clarify what you mean in the last paragraph about p:identity
> changing the URIs. The WG doesn't believe that p:identity changes anything
> about the document.

That came from a misunderstanding of the comment in test Test base-uri #002.

      <!-- This p:identity step makes sure that we grab the root element -->
      <!-- where the xml:base exists. Otherwise, we get the base uri -->
      <!-- of the input document itself, and that varies by test env. -->
      <p:identity>
        <p:input port="source" select="/doc"/>
      </p:identity>

Of course, it's not the p:identity which is having an effect on the base URI, but rather the implicit creation of new document nodes resulting from the select="/doc".

> And we also don't make the XQuery/XSLT guarantee about consistency of
> documents, because pipelines explicitly change them.

Since the only changes visible from execution of an XProc pipeline result from a p:store or a result document from p:xslt (which set a potentially new document URI), it makes sense for new (intermediate) documents created by XProc steps (explicitly or implicitly through use of select) to have absent document URIs.

Since XProc uses XPath, it needs to guarantee that

"... given a document node $N, the result of fn:doc(fn:document-uri($N)) is $N will always be True, unless fn:document-uri($N) is an empty sequence."

although XProc is at liberty to relax the guarantee of stability for document access.

The example in 

http://www.w3.org/TR/xproc/#parallelism

shows that constructing the pipeline carefully so that the consequences of side-effects are evident to the processor can avoid much of the unpleasantness of side-effects on document stability.
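
The idea can be sketched as follows, assuming a hypothetical target out.xml. This is an illustration in the spirit of the linked example, not a copy of it. Piping the c:result document from p:store into the href option of p:load gives the processor an explicit dependency, so the load cannot be evaluated before the store has produced its result:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Write the input document to disk (the side effect). -->
  <p:store name="save" href="out.xml"/>

  <!-- Read it back, taking the URI from the c:result document
       that p:store emits on its "result" port. -->
  <p:load>
    <p:with-option name="href" select="/c:result">
      <p:pipe step="save" port="result"/>
    </p:with-option>
  </p:load>
</p:declare-step>
```

Had the p:load used a literal href="out.xml" instead, nothing in the pipeline would tell the processor that the load depends on the store.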
Comment 3 Norman Walsh 2014-01-07 15:38:39 UTC
Hi Tim,

Apologies for not driving the process of resolving 1.0 errata with more vigor. I'll try to do better in 2014.

In discussing this issue (http://www.w3.org/XML/XProc/2013/03/20-minutes#action02), we considered the possibility that we could look at the evaluation context for an XPath expression as being scoped to an individual step. That would seem to satisfy the XPath constraint, at least if some care is taken to cache documents for the duration of evaluating the expressions in a step invocation.

I wonder how that sits with you.
Comment 4 Tim Mills 2014-01-10 10:53:23 UTC
(In reply to Norman Walsh from comment #3)
> Hi Tim,
> 
> Apologies for not driving the process of resolving 1.0 errata with more
> vigor. I'll try to do better in 2014.

Thanks.  I know you're busy.

> ...
> I wonder how that sits with you.

To ensure that the XPath requirement is met, I think:

1.  fn:doc needs to be stable within the entire execution of a pipeline (not just a step), AND

2.  For a document sourced by means other than calls to fn:doc (or fn:collection), either
  a) the document node should have an empty document-uri, OR
  b) if the document node has a non-empty document-uri 'SOME-URI', then it MUST be stable, i.e. as if it had been accessed via a call to fn:doc('SOME-URI').

Otherwise, it is possible for two documents A and B to arrive as inputs to a step and have the same non-empty document-uri but be different, violating the XPath requirement.
Comment 5 Norman Walsh 2014-02-19 10:15:07 UTC
Minutes from 19 Feb:

We're going to document this as an erratum in V1 but not try to fix it there.

In V2, we'll make doc() stable.
Comment 6 Tim Mills 2014-02-19 10:23:48 UTC
Thanks.

And will point (2)

2.  For a document sourced by means other than calls to fn:doc (or fn:collection), either
  a) the document node should have an empty document-uri, OR
  b) if the document node has a non-empty document-uri 'SOME-URI', then it MUST be stable, i.e. as if it had been accessed via a call to fn:doc('SOME-URI').

also hold in V2?
Comment 7 Norman Walsh 2014-03-12 14:15:11 UTC
Yes, I think it's incumbent on us to get the XPath semantics right in V2.