This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 1314 - fn:distinct-values should not accept incomparable types
Status: CLOSED WONTFIX
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 1.0
Version: Last Call drafts
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Ashok Malhotra
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2005-05-09 22:01 UTC by Don Chamberlin
Modified: 2005-09-29 10:46 UTC

Description Don Chamberlin 2005-05-09 22:01:55 UTC
The fn:distinct-values function (Section 15.1.6) eliminates duplicates from an 
atomized sequence, based on comparing values by the "eq" operator. However, it 
says "Values that cannot be compared, i.e. the eq operator is not defined for 
their types, are considered to be distinct." This is problematic for the 
following reasons:

(1) If incomparable values were actually compared by the "eq" operator, an 
error would result (for example, 7 eq "7" raises error XPTY0004.)

(2) An "order by" clause also raises an error (XPTY0004) if it encounters 
incomparable sort keys.

(3) The aggregation functions fn:avg, fn:min, fn:max, and fn:sum also raise an 
error (FORG0006) if they encounter incomparable input values.

(4) Implementations of fn:distinct-values based on sorting or hashing are not 
possible under the current definition because those techniques do not accept 
heterogeneous input sequences.

In summary, the current specification of fn:distinct-values is inconsistent 
with the rest of the language and difficult to implement efficiently. The 
definition of fn:distinct-values should be made consistent with other functions 
and operators by raising an error if incomparable values are encountered. This 
will allow "order by" and fn:distinct-values to share a common efficient 
implementation.

Proposal: In the definition of fn:distinct-values, replace the second sentence 
with the following: "If the input sequence contains any two values for which 
the eq operator is not defined, a type error is raised [err:FORG0006]." Also 
add an example: fn:distinct-values((1, 2.3, "Hello")) raises err:FORG0006.
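
The proposed behavior can be sketched outside XQuery. The following Python is only an illustration under simplifying assumptions: it models three comparison families (numeric, string, boolean) rather than the full XPath type system, and the names comparison_family, strict_distinct_values, and IncomparableValuesError are hypothetical, with the exception class standing in for err:FORG0006.

```python
from typing import Any, Iterable, List, Optional, Set


class IncomparableValuesError(TypeError):
    """Hypothetical stand-in for the proposed err:FORG0006."""


def comparison_family(value: Any) -> str:
    """Assumed grouping: eq is defined only between values of one family."""
    # bool is tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported atomic value: {value!r}")


def strict_distinct_values(items: Iterable[Any]) -> List[Any]:
    """Remove duplicates, raising an error if two values are incomparable."""
    family: Optional[str] = None
    seen: Set[Any] = set()
    result: List[Any] = []
    for item in items:
        item_family = comparison_family(item)
        if family is None:
            family = item_family
        elif item_family != family:
            # Mixed families: the proposal asks for a type error here.
            raise IncomparableValuesError(
                f"cannot compare {family} with {item_family}"
            )
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

Under these assumptions, the sequence (1, 2.3, "Hello") from the proposed example is rejected, while an all-numeric sequence such as (1, 2.3, 1.0) is reduced to its distinct values.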
Comment 1 Michael Kay 2005-05-09 22:32:07 UTC
I actually attempted to implement the previous specification of distinct-values,
when non-comparable values were considered an error, and I found it very
difficult to achieve; I found the current specification much easier to implement
(my implementation is based on hashing using a simple hash function based on
both the value and the type label). So let's base the argument on what's right
for users, not on implementation factors, which are likely to vary from one
implementor to another.
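
To illustrate the kind of hashing strategy described above, here is a minimal Python sketch. It is not the actual implementation referred to in this comment, which is not shown in the report. It assumes a simplified model with three comparison families (numeric, string, boolean), and the names comparison_family and distinct_values are hypothetical.

```python
from typing import Any, Hashable, Iterable, List, Set, Tuple


def comparison_family(value: Any) -> str:
    """Assumed type label: values can only be equal within one family."""
    # bool is tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported atomic value: {value!r}")


def distinct_values(items: Iterable[Any]) -> List[Any]:
    """Keep the first occurrence of each value.

    The hash key combines the type label with the value, so values from
    different families never collide and are treated as distinct instead
    of raising an error.
    """
    seen: Set[Tuple[str, Hashable]] = set()
    result: List[Any] = []
    for item in items:
        key = (comparison_family(item), item)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result
```

In this sketch the integer 1 and the double 1.0 share the "numeric" label and collapse to one value, while the string "1" and the boolean true remain distinct from both, which mirrors the "incomparable values are distinct" rule of the current specification.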

From a usability point of view, XML Schema supports union types, and the typed
value of a collection of nodes can therefore contain a mixture of different
atomic types. It seems to me a most unfriendly and unnecessary restriction to
tell users that they can't invoke distinct-values() on a collection whose schema
definition is a union type. 

Note also that although sorting in XQuery disallows mixed types, sorting and
grouping in XSLT do not, so the consistency argument works both ways.

Michael Kay
Comment 2 Ashok Malhotra 2005-05-18 21:26:26 UTC
This was discussed during the joint WG meeting on 5/17/2005 and there was no
consensus to make this change.

Ashok Malhotra