This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5295 - [XSLT 2.0] xsl:for-each-group: transitivity
Summary: [XSLT 2.0] xsl:for-each-group: transitivity
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 2.0 (show other bugs)
Version: Recommendation
Hardware: PC Windows XP
: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-11-28 18:32 UTC by Michael Kay
Modified: 2008-07-11 15:15 UTC (History)
0 users

See Also:


Attachments

Description Michael Kay 2007-11-28 18:32:17 UTC
Bug #5183 raised against distinct-values() applies equally to xsl:for-each-group.

In 14.3 after the paragraph "Grouping keys are compared using the rules..." I propose adding:

If the population contains values of different numeric types that differ
from each other by small amounts, then the eq operator is not transitive,
because of rounding effects occurring during type promotion. In the situation
where there are three values A, B, and C in the population such that A eq B, B eq C, but A ne C, then the number of groups is implementation dependent, subject only to the constraints that (a) every item in a non-singleton group is equal to at least one other item in that group, (b) for any two distinct groups, there is at least one pair of values (one from each group) such that the two values are not equal to each other. 

For example, this arises when computing 

      <xsl:for-each-group select="
         xs:float('1.0'),
          xs:decimal('1.0000000000100000000001',
          xs:double( '1.00000000001')">

because the values of type xs:float and xs:double both compare equal to the
value of type xs:decimal but not equal to each other.
Comment 1 Michael Kay 2007-12-18 18:45:48 UTC
The spec states that current-grouping-key() returns the grouping key of the initial item in a group (that is, the item that is first in population order). I suggest therefore that we define that grouping is done as if the items are processed in population order and each successive item is processed as follows: if its grouping key is equal to that of the initial item in some group, then the item is added to that group; otherwise the item becomes the initial item in a new group.

This ensures that (a) each group will have a distinct value for current-grouping-key(), and (b) every item in a group will have a grouping key that is equal to the current-grouping-key() for that group. It does not ensure that X and Y are in the same group if and only if their grouping keys are equal.
Comment 2 Michael Dyck 2007-12-18 18:57:53 UTC
(In reply to comment #1)
> 
> if its grouping key is equal to that of the initial item in some group, then
> the item is added to that group;

This wording suggests that if it's equal to the initial item in more than one group, then it is added to each such group, which I suspect is not your intent. (E.g., consider if the population order is A, C, B: A and C each start a group, and then B is equal to both A and C.)
Comment 3 Henry Zongaro 2007-12-19 20:52:05 UTC
I suspect you intended to include a group-by attribute in your example in comment #0 - probably as in the following:

      <xsl:for-each-group select="
         xs:float('1.0'),
          xs:decimal('1.0000000000100000000001',
          xs:double( '1.00000000001')" group-by=".">

I'm still not convinced that any change is needed in xsl:for-each-group.  The description of the group-by attribute in the first item in the bulleted list in section 14.3 says that "the items in the population are examined in population order" and that "if there is already a group created to hold items having that grouping key value, J is added to that group; otherwise a new group is created for items with that grouping key value, and J becomes its first member."

Assuming that "having that grouping key value" was intended to imply comparison using "eq" - that what the fourth paragraph of 14.2 seems to say - that would mean the first value (1.0 of type xs:float) would be placed in the first group, the second value (the decimal value 1.0000000000100000000001) would also be in the first group (because it compares equal to the xs:float value 1.0, and the third value (the double value 1.00000000001) would be placed in a second group, because it does not compare equal to the grouping key of the first group - the float value.

My argument hinges on the assumption that the grouping key for a group is the grouping key for the first item in the group.  That's based on the fact that the current-grouping-key function is defined that way.  That seems to me like a reasonable assumption, but perhaps others would argue it's not justified.

Now, having said all that, it occurs to that if an item has two grouping keys that are not the same value but equal according to the eq operator, we still might have implementation-dependent behaviour.  Consider this example:

      <xsl:for-each-group select="
         xs:float('1.0'),
          xs:decimal('1.00000000001'"
         group-by="if (. instanceof xs:float) then (xs:float(.),xs:decimal(.))
                      else .">

Now the first value has grouping keys float 1.0 and decimal 1.0, and the second value has the grouping key 1.00000000001 of type decimal.  The definition of group-by says that "if two or more grouping keys for the same item are equal, then the duplicates are ignored."  So if the implementation chooses the float key value for the first item, there will be one group; if it choose the decimal value, there will be two groups.
Comment 4 Henry Zongaro 2007-12-19 21:24:37 UTC
Sorry - I see that the first half of my comment #3 essentially restates what Michael Kay proposed in comment #1.  But Michael Dyck's comment #2 is a very interesting one - XSLT 2.0 currently permits an item to be in more than one group, but I don't think the WG had considered that an item could end up in more groups than the number of grouping keys it has.  It might not have been Michael's intent to permit it, but it wouldn't be unreasonable.

To sum up, I have to admit that I'm finally convinced that xsl:for-each-group does have a problem.  My apologies that my thought processes ended up taking up so much bandwidth.
Comment 5 Henry Zongaro 2008-03-13 14:40:11 UTC
In order to eliminate the possibility (raised by Michael Dyck in comment #2) that an item with one grouping key could be added to more than one group, I propose the additional refinement to Michael Kay's proposal of comment #1 that the sequence of grouping keys for an item should be considered in order.  For each item in the population in population order, for each of the grouping keys for that item in sequence, the extant groups should be processed in some implementation dependent order.  Processing for that grouping key of the items stops with the first such group that has a grouping key that is eq to the current grouping key for the item.  The item is added to the group, unless it was already added to it.  If there is no such group, the item becomes the first member of a new group, and the current grouping key for the item becomes the grouping key for the group.

I think that would ensure that an item cannot be added to more groups than it has grouping keys.  It also would specify which of the multiplicity of grouping keys for an item would be the grouping key for a group.
Comment 6 Michael Kay 2008-03-20 13:14:20 UTC
Revised proposal based on comment #1 as modified by comment #5:

In 14.3 after the paragraph "Grouping keys are compared using the rules..." add the following text:

If the population contains values of different numeric types that differ
from each other by small amounts, then the eq operator is not transitive,
because of rounding effects occurring during type promotion. It is thus possible to have three values A, B, and C among the grouping keys of the population such that A eq B, B eq C, but A ne C.

For example, this arises when computing 

      <xsl:for-each-group group-by="." select="
         xs:float('1.0'),
          xs:decimal('1.0000000000100000000001',
          xs:double( '1.00000000001')">

because the values of type xs:float and xs:double both compare equal to the
value of type xs:decimal but not equal to each other.

In this situation the results must be equivalent to the results obtained by the following algorithm:

* For each item I in the population in population order, for each of the grouping keys K for that item in sequence, the processor identifies those existing groups G such that the grouping key of the initial item of G is equal to K. 

* If there is exactly one group G, then I is added to this group, unless I is already a member of this group.

* If there is no group G, then a new group is created with I as its first item.

* If there is more than one group G (which can only happen in exceptional circumstances involving non-transitivity), then one of these groups is selected in an implementation-dependent way, and I is added to this group, unless I is already a member of this group.

The effect of these rules is that (a) every item in a non-singleton group has a grouping key that is equal to that of at least one other item in that group, (b) for any two distinct groups, there is at least one pair of items (one from each group) whose grouping keys are not equal to each other. 

Comment 7 Michael Kay 2008-07-10 16:14:25 UTC
The proposal in comment #6 was approved at the WG telcon on 10 July 2008
Comment 8 Michael Kay 2008-07-11 15:15:17 UTC
This proposal has been implemented (with slight editorial variation) as erratum E25.