This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 14971 - [FO30] uri-collection() - confusion over purpose
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.0
Version: Member-only Editors Drafts
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2011-11-28 22:54 UTC by Michael Kay
Modified: 2012-07-23 12:25 UTC

Description Michael Kay 2011-11-28 22:54:11 UTC
Somehow the specification of uri-collection() has become confused. The specification includes the stated intent that it should be able to return a set of URIs without dereferencing them, yet the semantics of the function as stated appear to preclude this option, because it is required that the URIs returned should all resolve to document nodes when supplied as arguments to the doc() function.

I discovered this today when using uri-collection() to solve a problem where the URIs in question actually referred to resources that I then wanted to retrieve using unparsed-text().

One of the use cases is to allow each URI to be processed individually using doc() under the scope of a try/catch, so that the application can recover from individual failures to process a particular URI.
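
A minimal sketch of this use case in XQuery 3.0 follows. The collection URI is hypothetical, and which errors are raised (and therefore caught) depends on the implementation; this illustrates the pattern and is not wording from the specification.

```xquery
xquery version "3.0";

(: The err prefix is declared explicitly; some processors predeclare it. :)
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Hypothetical collection URI, for illustration only. :)
declare variable $collection-uri as xs:string := "http://example.com/invoices";

for $uri in fn:uri-collection($collection-uri)
return
  try {
    (: Dereference one URI at a time, so a failure affects only this item. :)
    <loaded uri="{$uri}" elements="{fn:count(fn:doc($uri)//*)}"/>
  } catch * {
    (: Report the failure and carry on with the remaining URIs. :)
    <failed uri="{$uri}" code="{$err:code}">{$err:description}</failed>
  }
```

By contrast, a try/catch wrapped around fn:collection($collection-uri) could only recover from a failure of the collection as a whole.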
Comment 1 Michael Kay 2012-02-15 00:10:56 UTC
After discussion in the WG on a couple of occasions, the preference is to remove any strong linkage between the results required for uri-collection() and collection(). For example, given a collection URI C, uri-collection(C) might include URIs that cannot be dereferenced or are otherwise unsuited to inclusion in the results of collection(C), while collection(C) might include resources that have no individual URIs, such as documents representing the rows in a relational table or view. However, the set of collection URIs recognized by collection() should be the same as the set of URIs recognized by uri-collection().

The note under the description of "available collections" nevertheless describes a constraint that we must respect, namely that if a document node $N is returned by collection(U), then either document-uri($N) must be empty, or doc(document-uri($N)) must be $N.
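
As an illustration only, the constraint can be written as an XQuery 3.0 expression that a conforming implementation would be expected to evaluate to true. The collection URI is hypothetical, and the filter on document-node() is an assumption made here so that only document nodes are tested.

```xquery
xquery version "3.0";

(: Hypothetical collection URI, for illustration only. :)
declare variable $collection-uri as xs:string := "http://example.com/invoices";

(: For every document node $n in the collection, either it has no
   document URI, or dereferencing that URI yields the very same node. :)
every $n in fn:collection($collection-uri)[. instance of document-node()]
satisfies (
  fn:empty(fn:document-uri($n))
  or fn:doc(fn:document-uri($n)) is $n
)
```

The check uses node identity ("is"), not deep equality, because the constraint concerns the identity of the node returned.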

To implement this, rewrite the rules for uri-collection as follows:

In the dynamic context, *available collections* provides a mapping from URIs to collections of nodes. In some cases these nodes may have URIs, and it is useful to be able to retrieve the URIs without accessing the nodes. The fn:uri-collection function returns a sequence of URIs.

The set of URIs accepted by the fn:collection function SHOULD be the same as the set of URIs accepted by the fn:uri-collection function.
 
The zero-argument form of the function returns the document URIs of the document nodes in the default collection.

The single-argument form returns the document URIs of the document nodes in the collection with a given collection URI. If the value of the argument is an empty sequence, the action is as for the zero-argument form of the function. If the argument is a relative URI reference, it is resolved against the Dynamic Base URI property from the dynamic context.

There is no requirement that the URIs in the result of this function should all be distinct.

Notes

One purpose in providing this function is to allow the URIs of the documents in a collection to be retrieved without incurring the cost (which might be significant in some implementations) of dereferencing the URIs to obtain the actual nodes; the application might choose to dereference some of the URIs and ignore others. In addition, an application that dereferences the URIs individually can use try/catch to recover from failures that occur while dereferencing an individual URI, whereas when fn:collection is used, try/catch can only recover from a failure to retrieve the collection as a whole.

It is true for every URI U, and therefore for every URI in the result of fn:uri-collection, that in the absence of errors, when doc(U) returns a document node D, then document-uri(D) should be U. However, the fn:uri-collection function MAY return URIs that cannot be dereferenced, or that refer to resources other than document nodes; and equally the fn:collection function MAY return resources that have no document URI. There is therefore no necessary one-to-one correspondence between the results of fn:collection(C) and fn:uri-collection(C).

It is also true for every URI U, and therefore for every URI in the result of fn:uri-collection, that there is no necessary relationship between the results of doc(U) and unparsed-text(U). It is possible that either, neither, or both of these calls will succeed, and if both succeed, there is no necessary relationship between the results returned by the two calls. While the architecture of the web demands that at some semantic level both must be representations of the same resource, this assertion is in practice unenforceable and untestable. In particular, there is no guarantee that fn:parse-xml(fn:unparsed-text(U)) returns the same result as fn:doc(U), since fn:doc and fn:unparsed-text may retrieve different resource representations as a result of content negotiation.

[This last note belongs in the general introduction to this family of functions and not specifically to fn:uri-collection].
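
A rough XQuery 3.0 sketch of what this means for an application: each URI is probed independently with fn:doc and fn:unparsed-text, and any combination of outcomes is possible. The collection URI is hypothetical, and the element and attribute names in the report are invented for the example.

```xquery
xquery version "3.0";

(: Hypothetical collection URI, for illustration only. :)
declare variable $collection-uri as xs:string := "http://example.com/invoices";

for $uri in fn:uri-collection($collection-uri)
(: Each probe is wrapped separately: either, neither, or both may succeed. :)
let $as-xml  := try { fn:exists(fn:doc($uri)) } catch * { fn:false() }
let $as-text := try { fn:exists(fn:unparsed-text($uri)) } catch * { fn:false() }
return
  <resource uri="{$uri}" doc="{$as-xml}" unparsed-text="{$as-text}"/>
```

Even when both attributes are true, nothing here implies that the two calls retrieved the same representation of the resource.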
Comment 2 Michael Kay 2012-02-15 08:52:52 UTC
Discussed today; still awaiting further review. One comment made and accepted was that the URIs returned by uri-collection() will not exclusively be "document URIs"; they may also include URIs of other resources such as unparsed text resources or even collections (or perhaps even collations!).
Comment 3 Michael Kay 2012-07-23 12:25:07 UTC
An update on the status of this bug.

In https://lists.w3.org/Archives/Member/w3c-xsl-query/2012Jun/0129.html I proposed a way forward.

In the minutes of meeting #514 it is recorded that this proposal was adopted, with modification.

There was also subsequent email that effectively modified the proposal, recognizing the need to introduce a "default resource collection".

Some editorial work is needed to check that the proposal has been fully implemented in the affected specifications.