This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5324 - [XSLT 2.0] case-order is not clearly described
Summary: [XSLT 2.0] case-order is not clearly described
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 2.0
Version: Recommendation
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2007-12-17 00:12 UTC by Michael Kay
Modified: 2008-07-29 08:10 UTC

Description Michael Kay 2007-12-17 00:12:16 UTC
Bug #791 (member-only) against the XSLT test suite points out that case-order on xsl:sort is not clearly described.

The sum total of the description is: "The case-order attribute indicates whether the desired collation should sort upper-case letters before lower-case or vice versa. The effective value of the attribute must be either lower-first (indicating that lower-case letters precede upper-case letters in the collating sequence) or upper-first (indicating that upper-case letters precede lower-case)."

As the bug report against the test suite points out, this could be read as indicating that case-order="lower-first" is supposed to mean that lower-case "z" precedes upper-case "A" in the collating sequence. I do not think this is the intent. The intent was stated (not especially well) by example in the XSLT 1.0 specification:

For example, if lang="en", then A a B b are sorted with case-order="upper-first" and a A b B are sorted with case-order="lower-first".
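
The two readings can be contrasted with a small sketch. It is written in Python rather than XSLT, and both key functions are hypothetical toy orderings for unaccented ASCII letters, not real collations: block_reading_key models the literal reading in which every lower-case letter precedes every upper-case letter, and tiebreak_reading_key models what appears to be the intended reading, in which case only breaks ties.

```python
def block_reading_key(s):
    # Literal (mis)reading of lower-first: every lower-case letter sorts
    # before every upper-case letter, so "z" precedes "A".
    return [(0 if ch.islower() else 1, ch.lower()) for ch in s]


def tiebreak_reading_key(s, case_order="lower-first"):
    # Intended reading: compare the letters with case ignored first, and
    # use case only to order strings that are otherwise equal.
    lower_rank = 0 if case_order == "lower-first" else 1
    case_flags = [lower_rank if ch.islower() else 1 - lower_rank for ch in s]
    return (s.lower(), case_flags)


letters = ["B", "a", "z", "A", "b", "Z"]

print(sorted(letters, key=block_reading_key))
# ['a', 'b', 'z', 'A', 'B', 'Z']
print(sorted(letters, key=lambda s: tiebreak_reading_key(s, "lower-first")))
# ['a', 'A', 'b', 'B', 'z', 'Z']
print(sorted(letters, key=lambda s: tiebreak_reading_key(s, "upper-first")))
# ['A', 'a', 'B', 'b', 'Z', 'z']
```

Only the second and third results agree with the XSLT 1.0 example quoted above.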
Comment 1 Colin Adams 2007-12-17 13:53:17 UTC
Note that there are additional considerations that need to be taken into account when formulating a clear specification for the behaviour, such as:

1) Which case mappings are to be used? My guess is that the default Unicode legacy mappings would be appropriate (strings don't change length). In which case German S SHARP (I think that's the Unicode name) won't have any cased version.
2) What happens to title-cased letters?
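
Both points can be illustrated with a short Python sketch. It relies only on the standard unicodedata module and the built-in str methods. Note the assumption: Python's str.upper() and str.lower() apply the full Unicode case mappings, not the length-preserving mappings suggested in point 1, so the sketch shows why the choice of mapping matters but does not demonstrate the length-preserving ones.

```python
import unicodedata

sharp_s = "\u00df"
print(unicodedata.name(sharp_s))
# LATIN SMALL LETTER SHARP S

# Under the full case mapping, sharp s upper-cases to two letters,
# so the string changes length.
print(sharp_s.upper(), len(sharp_s.upper()))
# SS 2

# U+01C5 is a title-case letter: general category "Lt", distinct from
# its lower-case (U+01C6, "Ll") and upper-case (U+01C4, "Lu") forms.
dz_title = "\u01c5"
print(unicodedata.category(dz_title))
# Lt
print(unicodedata.category(dz_title.lower()), unicodedata.category(dz_title.upper()))
# Ll Lu
```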
Comment 2 Michael Kay 2008-01-31 18:11:13 UTC
In response to comment #1, I don't think we need to go into that level of detail. We describe case-order (like lang) as requesting use of a collation with certain characteristics, and we can describe this in terms of a property, for example that

for every string S, compare(lower-case(S), upper-case(S), $coll) < 1

without prescribing every detail of the collation's behaviour. An example of a collation that has this property is one that sorts 

pole, Pole, polish, Polish 
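
As an illustration, the property can be checked against a toy comparison function. This is a Python sketch, not XPath: compare below is a hypothetical stand-in for compare($a, $b, $coll) with a lower-first collation, and it assumes unaccented ASCII input so that lower-casing never changes the length of a string.

```python
from functools import cmp_to_key


def compare(a, b, case_order="lower-first"):
    """Toy stand-in for compare($a, $b, $coll); returns -1, 0 or 1."""
    ka, kb = a.lower(), b.lower()
    if ka != kb:
        # The strings differ in more than case: order by the letters alone.
        return -1 if ka < kb else 1
    for ca, cb in zip(a, b):
        if ca != cb:
            # The first position where only the case differs decides.
            a_first = ca.islower() if case_order == "lower-first" else ca.isupper()
            return -1 if a_first else 1
    return 0


words = ["Polish", "pole", "Pole", "polish"]
print(sorted(words, key=cmp_to_key(compare)))
# ['pole', 'Pole', 'polish', 'Polish']

# The stated property: compare(lower-case(S), upper-case(S)) < 1.
# A string with no cased letters, such as "123", compares equal to
# itself, which is why the bound is "< 1" rather than "< 0".
print(all(compare(s.lower(), s.upper()) < 1 for s in words + ["123"]))
# True
```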
Comment 3 Michael Kay 2008-03-13 16:01:18 UTC
In action A-2008-02-07-005 I was asked to produce text to fix this.

The current text is:

The case-order attribute indicates whether the desired collation should sort upper-case letters before lower-case or vice versa. The effective value of the attribute must be either lower-first (indicating that lower-case letters precede upper-case letters in the collating sequence) or upper-first (indicating that upper-case letters precede lower-case).

Proposal: Add after the existing text. "When lower-first is requested, the returned collation SHOULD have the property that for any string S, lower-case(S) collates before upper-case(S); when upper-first is requested, the returned collation SHOULD have the property that for any string S, upper-case(S) collates before lower-case(S). When case of letters is a tertiary characteristic, as in the Unicode Collation Algorithm, choosing upper-first will have the effect that, for example, StAndrew collates after Stand but before Standrew." 
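
The StAndrew example can be reproduced with a two-level sort key. This is a Python sketch under stated assumptions: upper_first_key is a hypothetical, much simplified imitation of a collation in which case is a lower-priority level than the letters themselves. It is not the Unicode Collation Algorithm, and it handles unaccented ASCII letters only.

```python
def upper_first_key(s):
    # First level: the letters with case ignored.
    # Later level: 0 for upper-case, 1 for anything else, consulted only
    # when the first level is equal, so that upper-case wins ties.
    return (s.lower(), [0 if ch.isupper() else 1 for ch in s])


names = ["Standrew", "StAndrew", "Stand"]

print(sorted(names, key=upper_first_key))
# ['Stand', 'StAndrew', 'Standrew']

# For contrast, plain code-point order treats case as a first-level
# difference, which places StAndrew before Stand.
print(sorted(names))
# ['StAndrew', 'Stand', 'Standrew']
```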
Comment 4 Colin Adams 2008-03-14 07:34:56 UTC
The proposed text still seems to suggest (to me) that, for upper first, Z should collate before a and, for lower first, z should collate before A.
Comment 5 Colin Adams 2008-03-14 07:52:38 UTC
I propose to change the text:

"The case-order attribute indicates whether the desired collation should sort
upper-case letters before lower-case or vice versa. The effective value of the
attribute must be either lower-first (indicating that lower-case letters
precede upper-case letters in the collating sequence) or upper-first
(indicating that upper-case letters precede lower-case)."

to:

"The case-order attribute indicates whether the desired collation should sort
upper-case variants of a letter before their lower-case variants or vice versa.
The effective value of the attribute must be either lower-first (indicating
that the lower-case variant of a letter precedes the upper-case variant of the
same letter in the collating sequence) or upper-first (indicating that the
upper-case variant of a letter precedes the lower-case variant). If the letter
has an additional title-case variant, then that should be treated as if it were
an upper-case variant with respect to the lower-case variant."
Comment 6 Michael Kay 2008-03-20 11:49:10 UTC
In action A2008-03-13-003 I was asked to try again.

Proposal: Add after the existing text. "When lower-first is requested, the returned collation SHOULD have the property that when two strings differ only in the case of one or more characters, then a string in which the first differing character is lower-case should precede a string in which the corresponding character is title-case, which should in turn precede a string in which the corresponding character is upper-case. When upper-first is requested, the returned collation SHOULD have the property that when two strings differ only in the case of one or more characters, then a string in which the first differing character is upper-case should precede a string in which the corresponding character is title-case, which should in turn precede a string in which the corresponding character is lower-case."

For example, if lower-first is requested, then a sorted sequence might be "MacAndrew, macintosh, macIntosh, Macintosh, MacIntosh, macintoshes, Macintoshes, McIntosh". If upper-first is requested, the same sequence would sort as "MacAndrew, MacIntosh, Macintosh, macIntosh, macintosh, Macintoshes, macintoshes, McIntosh".
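
The first-differing-character rule can be sketched as a comparison function. This is a Python illustration under stated assumptions, not a conforming collation: case_rank and make_compare are hypothetical names, the case of a character is taken from its Unicode general category (Ll, Lt, Lu), and strings that differ in more than case are ordered simply by their lower-cased code points. The input is the eight strings of the lower-first sequence in the example.

```python
import unicodedata
from functools import cmp_to_key


def case_rank(ch, case_order):
    # lower < title < upper for lower-first; the reverse for upper-first.
    rank = {"Ll": 0, "Lt": 1, "Lu": 2}.get(unicodedata.category(ch), 0)
    return rank if case_order == "lower-first" else 2 - rank


def make_compare(case_order):
    def compare(a, b):
        ka, kb = a.lower(), b.lower()
        if ka != kb:
            # The strings differ in more than case.
            return -1 if ka < kb else 1
        for ca, cb in zip(a, b):
            ra, rb = case_rank(ca, case_order), case_rank(cb, case_order)
            if ra != rb:
                # The first character that differs in case decides.
                return -1 if ra < rb else 1
        return 0
    return compare


names = ["McIntosh", "macintoshes", "MacIntosh", "Macintoshes",
         "macIntosh", "MacAndrew", "Macintosh", "macintosh"]

print(sorted(names, key=cmp_to_key(make_compare("lower-first"))))
# ['MacAndrew', 'macintosh', 'macIntosh', 'Macintosh', 'MacIntosh',
#  'macintoshes', 'Macintoshes', 'McIntosh']
print(sorted(names, key=cmp_to_key(make_compare("upper-first"))))
# ['MacAndrew', 'MacIntosh', 'Macintosh', 'macIntosh', 'macintosh',
#  'Macintoshes', 'macintoshes', 'McIntosh']

# Title-case sits between lower-case and upper-case (U+01C6, U+01C5, U+01C4).
dz = ["\u01c4", "\u01c5", "\u01c6"]
print(sorted(dz, key=cmp_to_key(make_compare("lower-first"))) == ["\u01c6", "\u01c5", "\u01c4"])
# True
```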
Comment 7 Michael Kay 2008-07-10 16:43:17 UTC
It was agreed on 10 Jul 2008 to use the text in comment #6 but reinstating the XSLT 1.0 example (A a B b) for additional clarity.
Comment 8 Michael Kay 2008-07-17 14:12:52 UTC
Erratum E26 has been drafted.

Colin, as the person who effectively raised this problem (as a bug against the test suite), I would be grateful if you would close it if you are satisfied.
Comment 9 Colin Adams 2008-07-29 08:10:59 UTC
Closed as requested.