This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 9757 - Group By Clause: Equivalence: "atomic" is incorrect
Summary: Group By Clause: Equivalence: "atomic" is incorrect
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 3.0
Version: Working drafts
Hardware: All All
Importance: P2 normal
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2010-05-18 07:00 UTC by Michael Dyck
Modified: 2010-06-22 18:04 UTC

Description Michael Dyck 2010-05-18 07:00:24 UTC
[Reiterates point #1 from http://lists.w3.org/Archives/Member/w3c-xsl-query/2009Nov/0075.html (Members only)]

The definition of "equivalence" begins:
     "Equivalence of two atomic values V1 and V2 is defined..."
However, the word "atomic" is incorrect, because either value might be an empty sequence, which is not an atomic value.

(I don't think we have a term for "an atomic value or empty sequence".)

(a)
We could simply delete the word "atomic" from the definition. The procedure would then appear to apply to any two values. (So, e.g. it would say that (1,2) is not equivalent to itself.) Of course, it would still only be "invoked" with atomic-or-empty values.

(b)
In addition to deleting "atomic", we could add:
     "where each value is an atomic value or an empty sequence"
to the beginning of the definition.

(c)
Another possibility would be to expand the definition to actually handle any two values. Note that in both uses of the procedure (in 'group by' clauses and switch statements), the two values are the result of atomization. So we could say something like:

     Equivalence of two values V1 and V2 is defined by the following process:
     1. Atomize each value.
     2. If either resulting value contains more than one item, raise an error.
     3. [proceed with existing rules]

This would then lead to somewhat briefer wording at the "call points".
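
As a rough illustration of option (c), the three steps could be sketched in Python as below. This is only a toy model: a list or tuple stands in for a sequence, `None` stands in for the empty sequence after atomization, and `existing_rules` is a hypothetical callable representing whatever the current atomic-or-empty rules are (it is not defined by the spec text quoted here).

```python
from typing import Any, Callable, Optional


def atomize(value: Any) -> tuple:
    # Toy atomization (assumption): a list or tuple models a sequence,
    # anything else is treated as a single atomic value.
    if isinstance(value, (list, tuple)):
        return tuple(value)
    return (value,)


def equivalent(
    v1: Any,
    v2: Any,
    existing_rules: Callable[[Optional[Any], Optional[Any]], bool],
) -> bool:
    # Step 1: atomize each value.
    a1, a2 = atomize(v1), atomize(v2)
    # Step 2: more than one item in either result is an error.
    if len(a1) > 1 or len(a2) > 1:
        raise TypeError("atomized value has more than one item")
    # Step 3: proceed with the existing rules, which now only ever see
    # an atomic value or the empty sequence (modelled here as None).
    return existing_rules(a1[0] if a1 else None, a2[0] if a2 else None)
```

With this shape, a value such as (1,2) is rejected in step 2 instead of being reported as "not equivalent to itself", and the call points no longer need to spell out the atomization themselves.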

(d)
One other possibility (mutually exclusive with (c), I think) occurred to me when I noticed that the rules for equivalence have a similar shape to those for 'greater-than' in [Order By Clause]. We could possibly merge the two into one procedure that, given two atomic-or-empty values V1 and V2, returns one of four findings:
     V1 is-greater-than V2
     V1 is-less-than V2
     V1 is-equivalent-to V2
     V1 is-not-comparable-to V2
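
To make the shape of option (d) concrete, here is a minimal Python sketch of a single procedure that returns one of the four findings. The comparison rules inside `compare` are placeholders chosen for the illustration (they are an assumption, not the rules of [Order By Clause] or of 'equivalence'); in particular, NaN and the "empty greatest/least" ordering are deliberately not modelled. `None` stands in for the empty sequence.

```python
from enum import Enum
from typing import Any, Optional


class Finding(Enum):
    GREATER = "V1 is-greater-than V2"
    LESS = "V1 is-less-than V2"
    EQUIVALENT = "V1 is-equivalent-to V2"
    NOT_COMPARABLE = "V1 is-not-comparable-to V2"


def compare(v1: Optional[Any], v2: Optional[Any]) -> Finding:
    # Placeholder rules (assumption): two empties are equivalent,
    # one empty and one non-empty are treated as not comparable.
    if v1 is None and v2 is None:
        return Finding.EQUIVALENT
    if v1 is None or v2 is None:
        return Finding.NOT_COMPARABLE
    try:
        if v1 < v2:
            return Finding.LESS
        if v1 > v2:
            return Finding.GREATER
        if v1 == v2:
            return Finding.EQUIVALENT
    except TypeError:
        # Python refuses to order e.g. an int and a str; treat that
        # as the "not comparable" finding.
        pass
    return Finding.NOT_COMPARABLE


def is_equivalent(v1: Optional[Any], v2: Optional[Any]) -> bool:
    # 'group by' and switch would only ask whether the finding is EQUIVALENT.
    return compare(v1, v2) is Finding.EQUIVALENT


def is_greater_than(v1: Optional[Any], v2: Optional[Any]) -> bool:
    # 'order by' would only ask whether the finding is GREATER.
    return compare(v1, v2) is Finding.GREATER
```

The point of the sketch is only that 'equivalence' and 'greater-than' become two views of one result, rather than two separately specified procedures.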

It would be nice if XQuery 1.1 didn't have to add yet another way to compare two values:
     Value Comparison (involving fn:compare)
     General Comparison (involving "magnitude relationship")
     fn:distinct-values
     fn:deep-equal
     "greater-than" (for order by)
     and now "equivalence" (for group by and switch)
Did I miss any?

[This last bit is similar to Bug 8222, but that suggests merging 'equivalence' with distinct-values(), whereas this suggests merging 'equivalence' with 'greater-than'.]
Comment 1 Michael Kay 2010-06-02 22:28:54 UTC
My proposal for this is as follows:

1. Delete the definition of "equivalence of two atomic values".

2. In 3.8.7, change list item 2 of the "process of group formation" to read as follows:

   <new>

2. The input tuple stream is partitioned into groups of tuples whose grouping keys are /equivalent/. Two tuples T1 and T2 have /equivalent/ grouping keys if and only if, for each grouping variable GV, the atomized value of GV in T1 is deep-equal to the atomized value of GV in T2, as defined by applying the function fn:deep-equal using the appropriate collation.

Note: the atomized grouping key will always be either an empty sequence or a single atomic value. Defining equivalence by reference to the fn:deep-equal function ensures that the empty sequence is equivalent only to the empty sequence, that NaN is equivalent to NaN, that untypedAtomic values are compared as strings, and that values for which the eq operator is not defined are considered non-equivalent.

3. The appropriate collation for comparing two grouping keys is the collation specified in the pertinent _GroupingSpec_ if present, or the default collation from the static context otherwise. If the collation is specified by a relative URI, that relative URI is resolved to an absolute URI using the base URI in the static context. If the specified collation is not found in statically known collations, a static error is raised [err:XQST0076]. 

  </new>
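
The proposed group-formation rule can be approximated with the following Python sketch. It is a toy model under stated assumptions: tuples are dicts from grouping variable name to an already atomized key, `None` stands in for the empty sequence, `keys_equal` is a hand-written stand-in for fn:deep-equal on atomic-or-empty values (not the real function), and a collation is modelled as a hypothetical string-normalising function supplied per grouping variable.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Sequence

Collation = Callable[[str], str]


def keys_equal(a: Optional[Any], b: Optional[Any], collation: Collation) -> bool:
    # The empty sequence is equivalent only to the empty sequence.
    if a is None or b is None:
        return a is None and b is None
    # Python treats True == 1; keep booleans and numbers apart (assumption
    # made to mimic "eq is not defined" between those types).
    if isinstance(a, bool) != isinstance(b, bool):
        return False
    # NaN is equivalent to NaN.
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    # Strings are compared under the collation for this grouping variable.
    if isinstance(a, str) and isinstance(b, str):
        return collation(a) == collation(b)
    # Values of unrelated types simply compare unequal instead of raising.
    return a == b


def group_tuples(
    tuples: Sequence[Dict[str, Any]],
    grouping_vars: Sequence[str],
    collations: Optional[Dict[str, Collation]] = None,
) -> List[List[Dict[str, Any]]]:
    collations = collations or {}
    groups: List[List[Dict[str, Any]]] = []
    for t in tuples:
        for members in groups:
            rep = members[0]  # compare against the group's first tuple
            if all(
                keys_equal(t[gv], rep[gv], collations.get(gv, lambda s: s))
                for gv in grouping_vars
            ):
                members.append(t)
                break
        else:
            groups.append([t])
    return groups
```

For example, grouping tuples whose key `k` takes the values "a", "A", empty, NaN, NaN, empty, 1 and 1.0 yields five groups with the identity collation, and four if a case-folding function such as `str.casefold` is supplied as the collation for `k`.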

3. In 3.11 Switch, change the two paragraphs starting "The first step..." and "[Definition: The effective case..." to:

  <new>

The first step in evaluating a switch expression is to apply atomization to the value of the switch operand expression. If the result is a sequence of length greater than one, a type error is raised [err:XPTY0004].

The resulting value is matched against each _SwitchCaseOperand_ in turn until a match is found or the list is exhausted. The matching is performed as follows:

1. The _SwitchCaseOperand_ is evaluated.

2. The resulting value is atomized.

3. If the atomized sequence has length greater than one, a type error is raised [err:XPTY0004].

4. The atomized value of the switch operand expression is compared with the atomized value of the _SwitchCaseOperand_ using the fn:deep-equal function, with the default collation from the static context.

[Definition: The effective case in a switch expression is the first case clause that matches, using the rules given above.] The value of the switch expression is the value of the return expression in the effective case.

   </new>
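
The matching procedure in the proposed switch wording can be sketched in Python as follows. Again this is a toy model, not the spec: sequences are lists or tuples, `None` is the empty sequence after atomization, [err:XPTY0004] is modelled as a Python `TypeError`, `atomic_deep_equal` is a simplified stand-in for fn:deep-equal (collations are ignored), and case operands and return expressions are passed as zero-argument callables so that they are evaluated only when reached. The `default` argument follows the point that the default clause can also be the effective case.

```python
import math
from typing import Any, Callable, List, Optional, Sequence, Tuple

Thunk = Callable[[], Any]


def atomize_one(value: Any) -> Optional[Any]:
    # Toy atomization (assumption): lists/tuples are sequences.
    items = list(value) if isinstance(value, (list, tuple)) else [value]
    if len(items) > 1:
        raise TypeError("err:XPTY0004: more than one item after atomization")
    return items[0] if items else None


def atomic_deep_equal(a: Optional[Any], b: Optional[Any]) -> bool:
    if a is None or b is None:
        return a is None and b is None
    if isinstance(a, bool) != isinstance(b, bool):
        return False
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b  # unrelated types compare unequal, no error


def evaluate_switch(
    operand: Any,
    cases: Sequence[Tuple[List[Thunk], Thunk]],
    default: Thunk,
) -> Any:
    # First step: atomize the switch operand; more than one item is an error.
    target = atomize_one(operand)
    for case_operands, result in cases:
        for case_operand in case_operands:
            # Steps 1-3: evaluate, atomize, and length-check the case operand.
            candidate = atomize_one(case_operand())
            # Step 4: compare the two atomized values.
            if atomic_deep_equal(target, candidate):
                return result()  # this clause is the effective case
    return default()
```

Note that, as in the proposed wording, a switch operand whose type cannot be compared with any case operand falls through to the default rather than raising an error, and a case operand after the effective case is never evaluated.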

OBSERVATION: This reflects the current rules. However, I can't really see a good case here for not raising a type error if the switch operand and switch case operand have non-comparable types. This would seem to be a programming mistake that we should report to the user. Personally, I think it would be better to do the comparison using "eq" rather than fn:deep-equal(), and to disallow () for both the switch and case operands. This situation is quite different from grouping, where grouping a heterogeneous collection of values makes sense.
Comment 2 Michael Dyck 2010-06-22 07:23:39 UTC
The suggested wording for 'Switch Expression' does not carry forward the change by which "effective case" can also refer to the default clause, but that would be easy to fix.
Comment 3 Jonathan Robie 2010-06-22 16:45:11 UTC
In today's call, the WG agreed to accept Comment #1 as our resolution, as modified by Comment #2. 

We did not decide to change the semantics of switch.
Comment 4 Michael Dyck 2010-06-22 18:04:39 UTC
That's okay with me.