This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 3173 - [F+O, DM] A difficulty with document-uri()
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 1.0
Version: Candidate Recommendation
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Ashok Malhotra
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2006-05-01 20:39 UTC by Michael Kay
Modified: 2007-02-25 23:30 UTC

Description Michael Kay 2006-05-01 20:39:14 UTC
The data model says that document-uri is a property of a document node.

F+O says that the document-uri() function returns the value of this property, and guarantees that if it is not null, then doc(document-uri($A)) is $A.

These two statements seem contradictory. Does document-uri() look for a property of the document node, or does it look in available-documents in the dynamic context? Or is there some unstated constraint that the two are consistent? 

It seems that to achieve the equivalence doc(document-uri($A)) is $A, we need to implement document-uri() by doing a reverse lookup on the available-documents mapping in the dynamic context, rather than by looking for a property of the document node.

First of all, I think this means that the document-uri() function is misspecified. It's not obvious from the specification that the result of the function is context-dependent. And if we change the specification so the function does a reverse lookup on available-documents, then the document-uri property in the data model becomes redundant.

Secondly, I think it means that some of the XQTS test cases are making an unwarranted assumption. A number of tests (document-uri-12, 15, 16, 17, 18, 19) apply the document-uri() function to a document that was not retrieved using the doc() function, but was built outside the scope of a particular query. At the time the document is built, it's not known what the available-documents mapping will be, which makes it difficult to give the document a document-uri for which the guarantee doc(document-uri($X)) is $X will in fact hold. I think I can find ways of passing these tests by some rather artificial manipulation of the dynamic context in the test driver, but it doesn't feel right.

I propose the following way forward:

(a) abolish the document-uri property and accessor

(b) describe fn:document-uri() as follows: if available-documents in the dynamic context contains a mapping from one or more URIs to the specified document node, the function returns one of those URIs (the choice is implementation-dependent but stable). If it contains no such mapping, the function returns an empty sequence.

(c) add a Note to say that if a document node $N was constructed using a call on the doc() function, then it is guaranteed that document-uri($N) will return a non-empty result; otherwise, it is implementation-dependent whether it will return a non-empty result

(d) delete test cases that apply document-uri() to nodes that were not constructed by calling the doc() function (for example, nodes passed into the query as parameters).
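
Proposal (b) can be illustrated with a minimal Python sketch. It assumes that available-documents is modelled as a plain dictionary from URI strings to node objects, and that node identity is object identity; the names DocumentNode, document_uri and doc are hypothetical stand-ins, not part of any specification or product.

```python
from typing import Dict, Optional


class DocumentNode:
    """Hypothetical stand-in for a document node; identity is object identity."""

    def __init__(self, label: str) -> None:
        self.label = label


def document_uri(node: DocumentNode,
                 available_documents: Dict[str, DocumentNode]) -> Optional[str]:
    """Reverse lookup: return one URI mapped to `node`, or None (empty sequence)."""
    # Sorting is one way (an assumption of this sketch) to make the choice
    # among several candidate URIs stable; the proposal only asks for a
    # choice that is "implementation-dependent but stable".
    candidates = sorted(uri for uri, doc_node in available_documents.items()
                        if doc_node is node)
    return candidates[0] if candidates else None


def doc(uri: str, available_documents: Dict[str, DocumentNode]) -> DocumentNode:
    """Forward lookup over the same mapping, modelling the doc() function."""
    return available_documents[uri]
```

With this model, doc(document_uri(n, m), m) is n holds for every node that appears in the mapping, and a node that arrived by some other route (for example as a parameter) simply yields None.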
Comment 1 Colin Adams 2006-05-02 05:37:07 UTC
Why not simply require that all document nodes must be mapped in available-documents?
Comment 2 Michael Kay 2006-05-02 08:16:52 UTC
>Why not simply require that all document nodes must be mapped in available-documents?

Because it runs against existing practice. (A) Existing APIs such as JAXP and System.Xml.Xsl allow you to supply a DOM Document as the value of a parameter or as the principal input to a transformation, and there is not necessarily any URI that will return that document. (B) The "mapping" (available-documents) is traditionally implemented by a user hook (called a URIResolver or XmlResolver) that takes a URI as input and returns a node as output; the mapping performed by such a user hook is not intrinsically reversible.
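
Point (B) can be made concrete with a small Python sketch. The resolver below is a hypothetical stand-in for a user hook of the URIResolver or XmlResolver kind; it is not the signature of either real interface, and the prefix rule is invented purely for illustration.

```python
from typing import Callable, Optional


class DocumentNode:
    """Hypothetical stand-in for a document node."""

    def __init__(self, label: str) -> None:
        self.label = label


# A resolver is modelled as an opaque function from URI to node (or None).
Resolver = Callable[[str], Optional[DocumentNode]]


def make_prefix_resolver(prefix: str) -> Resolver:
    """Build a resolver that maps every URI under `prefix` to one shared node."""
    shared = DocumentNode("shared")

    def resolve(uri: str) -> Optional[DocumentNode]:
        # Unboundedly many URIs lead to the same node, so the mapping
        # cannot be enumerated or inverted by the caller.
        return shared if uri.startswith(prefix) else None

    return resolve
```

Given only the returned function and a node, a processor has no general way to ask "which URI would produce this node?", which is why a reverse lookup needs a table that the processor itself maintains.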
Comment 3 Michael Kay 2006-05-09 17:55:12 UTC
We discussed this today. I'll try and explain in a little more detail the implementation problems I was having, and see if this has any resonances.

I'm currently passing the XQTS tests on document-uri(): to achieve this I adopted the strategy of scanning the parameters to the query or stylesheet, and if any of them is a document node, looking at the "system ID" of the document node, and if it is an absolute URI, adding it to the URI->document mapping table maintained by the doc() function. The document-uri() function looks in this mapping table, and returns null if the document node isn't in there. The "system ID" is derived from the systemId property of the JAXP source object. One of the problems is that this is overloading the semantics of the JAXP interface (this property is also used to control the base URI of the document node, which isn't necessarily the same thing as the document URI). Another problem is in deciding what to do if two documents have the same system ID (it could break existing applications to make this an error). 

This approach avoids the problem of a document having multiple URIs. You just get one of them back, chosen arbitrarily.

I think the main feedback on the specification is that to make this work, I'm getting the value of document-uri() from available-documents in the dynamic context, and not from any property of the document node itself (except in the case of documents that arrive other than through the doc() function).
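
The strategy described in this comment can be sketched roughly as follows. This is an illustrative Python model, not the actual implementation: the class and function names are hypothetical, the absolute-URI test is a simplification (any string with a scheme counts as absolute), and keeping the first document when two share a system ID is an assumption of the sketch, since the comment leaves that case open.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import urlparse


class DocumentNode:
    """Hypothetical document node carrying the system ID it was built with."""

    def __init__(self, system_id: Optional[str] = None) -> None:
        self.system_id = system_id


def is_absolute_uri(value: str) -> bool:
    """Simplified check: treat any string with a URI scheme as absolute."""
    return bool(urlparse(value).scheme)


def register_parameters(params: Iterable[object],
                        table: Dict[str, DocumentNode]) -> None:
    """Add document-node parameters with an absolute system ID to the table."""
    for value in params:
        if not isinstance(value, DocumentNode):
            continue  # non-document parameters are ignored
        system_id = value.system_id
        if system_id and is_absolute_uri(system_id):
            # Assumption of this sketch: the first document registered
            # under a given system ID wins; later ones are left unmapped.
            table.setdefault(system_id, value)


def document_uri(node: DocumentNode,
                 table: Dict[str, DocumentNode]) -> Optional[str]:
    """Look the node up in the URI->document table; None if it is absent."""
    for uri, candidate in table.items():
        if candidate is node:
            return uri
    return None
```

In this model a document with a relative system ID, or one that loses a system ID clash, reports no document URI at all, which is the behaviour the comment describes for nodes missing from the table.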

Perhaps the change we need to the spec is:

(a) in the data model description of the document-uri property, some acknowledgement that the property must be unique across all the documents used in a query or transformation

(b) in the description of available-documents in the dynamic context, a recognition that there is a consistency constraint: if a document D used in the query or transformation has a document-uri of U, then available-documents must include a mapping from U to D.

Comment 4 Mary Holstege 2006-05-10 18:08:17 UTC
Your analysis certainly points out that for a particular class of
implementations, those which perform real-time uncached external resolution of
URIs with no stability guarantees, some work is necessary to ensure that the
existing consistency constraints are met. True. And confusing things can
happen if you don't have fn:doc stability. Yes, also true.

What is less clear to me is whether additional consistency constraints are
required in the spec. 

Your change (a) in comment 3 looks to me like a resurrection of stability 
for fn:doc, because the only testable implication of this requirement is
precisely that fn:doc($U) is fn:doc($U). While personally I thought relaxation
of the stability requirement on fn:doc was a mistake, this just sneaks it in 
through the back door. Proposal (b) is just a restatement of the existing
consistency constraint between fn:doc and fn:document-uri, and not one I find
clearer. I'm not sure how I would define "a document D used in a query or
transformation".  I am sure how to define fn:doc(fn:document-uri($A)) is $A
where fn:document-uri($A) is not empty.

I suppose one take on this is, well, given we have permitted implementations to 
provide unstable fn:doc, then we should allow them to provide unstable
fn:document-uri as well. And fn:document-available, I reckon. So, for unstable
implementations, no promises that fn:doc(fn:document-uri($A)) is $A and no
promises that fn:document-available($U) is consistent with fn:doc($U). I think
that is a logical conclusion of fn:doc instability.  

In looking at the original source of all of this -- some tests in the testsuite
-- my conclusion is that the testsuite is making some unstated assumptions
about document URIs, and that ought to be fixed: it is assuming that the file
names given to input documents should bear some relation to the document URIs
used by the driver/implementation in constructing data model instances.
Proposal: testsuite instructions to make this clearer. 
Comment 5 Michael Kay 2006-06-16 11:06:09 UTC
I was actioned to propose a way forward on this bug (Action A-301-01). My proposal is as follows. 

PROPOSAL

1. In F+O section 2.6, replace

<old>
If fn:document-uri($arg) does not return the empty sequence, then the following expression always holds:

fn:doc(fn:document-uri($arg)) is $arg
</old>

by:

<new>
In the case of a document node $doc that was obtained as the result of calling the fn:doc function, provided that the user has not chosen to relax the requirements for _stable_ evaluation, then the following expression always holds:

fn:doc(fn:document-uri($doc)) is $doc

In the case of document nodes obtained by other means, for example documents passed as parameters to a query or stylesheet, or documents returned by the fn:collection function, there is no such guarantee. For example, there is nothing to prevent two query or stylesheet parameters having as their values two distinct documents that share the same document URI (they could perhaps represent two different states of the same resource at different points in time).
</new>

2. In XPath/XQuery 2.1.2, under the definition of "available documents" in the dynamic context, add the Note:

<note>If available documents maps one or more strings to a document D, then the _document-uri_ accessor applied to D must return one of those strings.</note>

RATIONALE

The changes are as follows:

(a) we remove the constraint that doc(document-uri($doc)) returns $doc except in the case where $doc was originally obtained as the result of calling doc(). It's not generally possible to enforce this constraint in the case of documents supplied as external parameters to a query, without defining additional difficult-to-specify rules, for example that document-uris must be unique within the scope of a query/transformation.

(b) we link the constraint to the rules on stable execution, so that the constraint is not enforced if stability has been relaxed

(c) the residual rule for document-uri() in F+O still implies a constraint on the "available documents" mapping in the dynamic context, and this constraint is now documented where it belongs. It is phrased in such a way as to avoid imposing any further constraints, for example it is still possible to have two URIs that map to the same document, and it is still possible to have a document with a document URI that is not present in "available documents".

(d) We could extend the constraint to apply also to documents loaded using the collection() function. I chose not to do so, on balance, in order to allow maximum freedom to implementors of collection(). For example, I can envisage an implementation of collection() in which the sequence of documents is constructed algorithmically, in which case it might be difficult to make the same documents individually accessible using doc().
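
The Note proposed for "available documents" can be read as a checkable condition. The Python sketch below is only an illustration under stated assumptions: available documents is a dictionary from strings to node objects, the document-uri accessor is modelled as an attribute, and the names DocumentNode and note_violations are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


class DocumentNode:
    """Hypothetical document node carrying a document-uri property."""

    def __init__(self, document_uri: Optional[str]) -> None:
        self.document_uri = document_uri


def note_violations(available_documents: Dict[str, DocumentNode]) -> List[DocumentNode]:
    """Return mapped documents whose document-uri is not one of their mapped strings."""
    grouped: Dict[int, Tuple[DocumentNode, Set[str]]] = {}
    for uri, node in available_documents.items():
        # Group by identity so that two distinct nodes are never merged.
        grouped.setdefault(id(node), (node, set()))[1].add(uri)
    return [node for node, uris in grouped.values()
            if node.document_uri not in uris]
```

Only documents that actually appear in the mapping are examined, which matches point (c) of the rationale: two URIs may map to the same document, and a document whose document URI is absent from "available documents" is never inspected and so never reported.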

Michael Kay 

Comment 6 Mary Holstege 2006-06-16 22:14:17 UTC
I like the linkage of stability into the mix, but I still don't understand
why the notion of "documents obtained as a result of calling fn:doc" and
documents obtained by other means is necessary or helpful.  If an implementation
can handle stability, it can handle stability. If it can't, it can't. I think
adding the linkage to stability is all we need to do, and we can drop all the
additional caveats and exceptions. Adding in the clarification under "available
documents" is good too.

For example, why should fn:collection be included in the list of
the non-guaranteed? Implementations that have it will most likely use it
simply as a convenient way of getting at a named set of document nodes
instead of having to concatenate a sequence of fn:doc calls, so ensuring
the guarantee costs nothing more than it does for fn:doc, but by taking away
the guarantee you have also made it so one can't write applications that can
use the generic label in place of specific document URIs. It also means you
are dropping guarantees for any vendor-specific extension functions that fetch
or construct document nodes, which, given the nature of this area of the spec,
I would consider highly likely to exist.  But even when you look at things like
parameters to queries (by which I assume you mean external variables), I really
don't see why, given I have a document node in my hand, it is any kind of
problem -- if it has a document-uri at all -- to make sure that fn:doc
returns that guy if you have a stable implementation. That's the work a stable
implementation needs to do.
Comment 7 Michael Kay 2006-06-17 13:43:27 UTC
Let's concentrate on query and stylesheet parameters / external variables first. If we require

doc(document-uri($param)) is $param

for such cases, the consequences are:

(a) documents supplied as parameters must have unique document URIs (assuming they have document URIs at all): we need to define a new error condition that is raised if this condition is violated.

(b) if a document D with document URI U is supplied as a parameter, then available-documents in the dynamic context must include a mapping from U to D. This means it is an error if it contains a mapping from U to anything else, and we must define this error condition. This error condition gives me problems, because in many real products, the "mapping" is implemented by a user-defined function (variously called a URIResolver or an XmlResolver) that can map URIs to document nodes in any way that it likes. The error condition is one that I therefore find it difficult to enforce.

I think it is simpler not to define these two error conditions, and instead to relax the rule that doc(document-uri($param)) is $param.
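
To illustrate why the two error conditions arise, here is a small Python sketch. It assumes a dictionary model of available-documents and an attribute for the document URI; the names DocumentNode, shared_document_uris and identity_holds are hypothetical and exist only for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


class DocumentNode:
    """Hypothetical document node carrying a document-uri property."""

    def __init__(self, document_uri: Optional[str] = None) -> None:
        self.document_uri = document_uri


def shared_document_uris(params: Iterable[DocumentNode]) -> List[str]:
    """Condition (a): URIs carried by more than one distinct parameter document."""
    owners: Dict[str, List[DocumentNode]] = defaultdict(list)
    for node in params:
        uri = node.document_uri
        if uri is None:
            continue
        # Count each distinct node once, even if it is passed twice.
        if all(node is not seen for seen in owners[uri]):
            owners[uri].append(node)
    return sorted(uri for uri, nodes in owners.items() if len(nodes) > 1)


def identity_holds(node: DocumentNode,
                   available_documents: Dict[str, DocumentNode]) -> bool:
    """Model of: doc(document-uri($param)) is $param, for a node with a URI."""
    if node.document_uri is None:
        return False
    return available_documents.get(node.document_uri) is node
```

If two distinct parameter documents share one URI, the mapping can point at only one of them, so identity_holds is true for at most one of the pair. That is the situation which would need either a new error or the relaxed rule.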

Concerning collection(), we would again need to define some additional error conditions, and some of these are hard to enforce. The obvious constraint again is that the document-uris of the returned documents are unique; but it goes beyond this. For example, if the user has called doc-available(U) and received the answer false(), then collection() must not be allowed to return a document whose document-uri is U. Rather than do the work on the specification to define all such constraints, I think it would be easier to relax the rules.
Comment 8 Michael Kay 2006-06-23 11:12:23 UTC
During discussion, another use case in support of the proposed change was cited, which I capture here for the record.

Suppose you call collection('http://some.collection.uri/docs?password=abc123') and this returns a collection of documents one of which has a document URI property of "http://some.collection.uri/doc864". The doc-available() function might have already been called with this argument and might have returned false, perhaps because the document is inaccessible without a password. What is the implementation supposed to do:

(a) treat this response from the collection function as an error (if so, what error?)

(b) treat this document returned by the collection function as if it had no document-uri

(c) allow this document to have a document-uri, but one which cannot successfully be passed to the doc() function

There was a feeling in the meeting that (c) was the most appropriate. This is consistent with my proposal in comment #5.

Michael Kay
Comment 9 Michael Kay 2006-08-25 17:21:27 UTC
The WGs today discussed the proposal at

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3173

(member-only) and agreed to accept it with one minor change. The proposal as amended is:

(A) add to the definition of "Available documents" in XP/XQ 2.1.2:

"If there are one or more URIs in Available Documents that map to a document
node D, then the document-uri property of D must either be absent, or must
be one of these URIs.

Note: this means that given a document node $N, the result of
{fn:doc(fn:document-uri($N)) is $N} will always be true, unless
fn:document-uri($N) is an empty sequence."

(B) add to the definition of "Available collections":

"For every document node D that is present in one or more Available
Collections, or that is the root of a tree containing a node that is so
present, the document-uri property of D must either be absent, or must be a
URI U such that Available Documents contains a mapping from U to D.

Note: this means that for any document node $N retrieved using the
fn:collection function, either directly or by navigating to the root of a
node that was returned, the result of {fn:doc(fn:document-uri($N)) is $N}
will always be true, unless fn:document-uri($N) is an empty sequence. This
implies a requirement for the fn:doc and fn:collection functions to be
consistent in their effect. If the implementation uses catalogs or
user-supplied URI resolvers to dereference URIs supplied to the fn:doc
function, the implementation of the fn:collection function must take these
mechanisms into account. For example, an implementation might achieve this
by mapping the collection URI to a set of document URIs, which are then
resolved using the same catalog or URI resolver that the fn:doc function
uses."

(C) change the text in F+O section from:

"If fn:document-uri($arg) does not return the empty sequence, then the
following expression always holds: fn:doc(fn:document-uri($arg)) is $arg"

to:

"In the case of a document node $D returned by the fn:doc function, or a
document node at the root of a tree containing a node returned by the
fn:collection function, it will always be true that either
fn:document-uri($D) returns the empty sequence, or that the following
expression is true: fn:doc(fn:document-uri($D)) is $D. It is implementation-defined whether this guarantee also holds for document nodes obtained by other means, for example a document node passed as the initial context node of a query or transformation."
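
Rules (A) and (B) reduce to the same per-node condition applied to two different sets of nodes, which the following Python sketch tries to show. It is an illustration under stated assumptions: Available Documents and Available Collections are modelled as dictionaries, collections are assumed to contain document nodes only (the rule also covers roots of trees containing returned nodes, which is not modelled here), and all names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


class DocumentNode:
    """Hypothetical document node; document_uri of None means 'absent'."""

    def __init__(self, document_uri: Optional[str] = None) -> None:
        self.document_uri = document_uri


def guarantee_holds(node: DocumentNode,
                    available_documents: Dict[str, DocumentNode]) -> bool:
    """True if document-uri is absent, or Available Documents maps it to `node`."""
    uri = node.document_uri
    if uri is None:
        return True
    return available_documents.get(uri) is node


def check_context(available_documents: Dict[str, DocumentNode],
                  available_collections: Dict[str, Sequence[DocumentNode]]
                  ) -> List[DocumentNode]:
    """Return each node covered by rule (A) or (B) that breaks the condition."""
    candidates: List[DocumentNode] = list(available_documents.values())
    for members in available_collections.values():
        candidates.extend(members)
    seen = set()
    failures: List[DocumentNode] = []
    for node in candidates:
        if id(node) in seen:
            continue  # report each node at most once
        seen.add(id(node))
        if not guarantee_holds(node, available_documents):
            failures.append(node)
    return failures
```

A document passed in as the initial context node is in neither dictionary, so this check says nothing about it; that corresponds to the implementation-defined case at the end of (C).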
Comment 10 Carmelo Montanez 2006-09-08 18:35:27 UTC
All affected tests have been changed to reflect the new decisions.

Carmelo
Comment 11 Jim Melton 2007-02-25 23:30:02 UTC
Closing bug because commenter has not objected to the resolution posted and more than two weeks have passed.