<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>7630</bug_id>
          
          <creation_ts>2009-09-15 14:08:51 +0000</creation_ts>
          <short_desc>[FO] There is no formal definition of the Unicode codepoint collation</short_desc>
          <delta_ts>2012-03-27 23:30:54 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Functions and Operators 1.0</component>
          <version>Recommendation</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Michael Kay">mike</reporter>
          <assigned_to name="Michael Kay">mike</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>27179</commentid>
    <comment_count>0</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2009-09-15 14:08:51 +0000</bug_when>
    <thetext>The specification contains no formal definition of the Unicode codepoint collation http://www.w3.org/2005/xpath-functions/collation/codepoint

A suitable definition might be:

declare function local:compare-seq($x as xs:integer*, $y as xs:integer*) as xs:integer {
   (: at least one sequence is exhausted: the empty one sorts first :)
   if (count($x) eq 0 or count($y) eq 0) 
   then if (count($x) eq 0 and count($y) eq 0)
        then 0
        else if (count($x) eq 0) then -1 else +1
   (: leading codepoints are equal: recurse on the remainders :)
   else if ($x[1] eq $y[1])
        then local:compare-seq(remove($x, 1), remove($y, 1))
        else if ($x[1] lt $y[1]) then -1 else +1
};

and then compare($X as xs:string, $Y as xs:string) under the Unicode codepoint collation is defined to have the result local:compare-seq(string-to-codepoints($X), string-to-codepoints($Y)).
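
For anyone who wants to try the definition outside an XQuery processor, the recursive function above can be transliterated into Python roughly as follows. This is only a sketch: the names compare_seq and codepoint_compare are illustrative and do not come from any specification, and ord() is assumed to stand in for string-to-codepoints, which should hold for Python 3 strings.

```python
def compare_seq(x, y):
    """Compare two lists of integer codepoints; return -1, 0 or 1."""
    # At least one sequence is exhausted: the empty one sorts first.
    if not x or not y:
        if not x and not y:
            return 0
        return -1 if not x else 1
    # Leading codepoints are equal: recurse on the remainders.
    if x[0] == y[0]:
        return compare_seq(x[1:], y[1:])
    return 1 if x[0] > y[0] else -1


def codepoint_compare(a, b):
    """Hypothetical helper: compare two strings by their codepoints."""
    return compare_seq([ord(c) for c in a], [ord(c) for c in b])
```

Because the sketch recurses once per matching character, long strings with a long common prefix would exceed Python's default recursion limit; it is meant to mirror the definition, not to be efficient.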

Problem raised by Patrick Durusau (patrick at durusau dot net) on public-qt-comments, 2 Sept 2009.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>27869</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2009-09-29 15:41:00 +0000</bug_when>
    <thetext>The following proposal was accepted by the WG on 2009-09-29

ACTION A-411-02: MK will produce a textual proposal for resolving Bugzilla
#7630 (definition of the Unicode codepoint collation).

For the 1.0/2.0 specification:

Add a new paragraph after the current fourth paragraph of F+O section 7.3.1:

The Unicode codepoint collation does not perform any normalization on the
supplied strings. It is defined as follows. Each of the two strings is
converted to a sequence of integers using the fn:string-to-codepoints
function. These two sequences $A and $B are then compared as follows: 

* If both sequences are empty, the strings are equal.

* If one sequence is empty and the other is not, then the string
corresponding to the empty sequence is less than the other string.

* If the first integer in $A is less than the first integer in $B, then the
string corresponding to $A is less than the string corresponding to $B.

* If the first integer in $A is greater than the first integer in $B, then
the string corresponding to $A is greater than the string corresponding to
$B.

* Otherwise (the first pair of integers are equal), the result is obtained
by applying the same rules recursively to fn:subsequence($A, 2) and
fn:subsequence($B, 2).
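
As an informal illustration of the rules above, and of the statement that no normalization is performed, here is a small Python sketch that applies them iteratively rather than recursively. The function name codepoint_collation_compare is hypothetical, and ord() is assumed to correspond to fn:string-to-codepoints; neither is part of the proposed specification text.

```python
import unicodedata
from itertools import zip_longest


def codepoint_collation_compare(s, t):
    """Apply the rules pairwise from the left; return -1, 0 or 1."""
    a = [ord(c) for c in s]
    b = [ord(c) for c in t]
    for ca, cb in zip_longest(a, b):
        if ca is None:
            return -1  # $A ran out first, so it is the lesser string
        if cb is None:
            return 1   # $B ran out first, so $A is the greater string
        if ca != cb:
            return 1 if ca > cb else -1
    return 0  # both sequences ended together with no difference


precomposed = "\u00e9"   # e with acute accent as a single codepoint
decomposed = "e\u0301"   # letter e followed by a combining acute accent

# No normalization is applied, so the two spellings compare as different.
print(codepoint_collation_compare(precomposed, decomposed))
# Normalizing first (a separate step, outside the collation) makes them equal.
print(codepoint_collation_compare(
    precomposed, unicodedata.normalize("NFC", decomposed)))
```

The first call reports the precomposed form as greater because its only codepoint (233) exceeds that of the letter e (101); the second reports equality only because the caller normalized the input beforehand.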

For the 1.1/2.1 specification: Use the same rules, but create a new section
containing the definition of the Unicode codepoint collation and refer to
this section from the appropriate places; and make &quot;Unicode codepoint
collation&quot; a defined term, hyperlinking all references to it.


</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66156</commentid>
    <comment_count>2</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2012-03-27 23:30:54 +0000</bug_when>
    <thetext>I note that the agreed change has been made to the 3.0 draft, but the change for the 1.0/2.0 specification does not appear in the published second edition. I have therefore added a reference to this bug to the list of candidate errata (in the xsl-query-specs CVS area), and am herewith closing the bug.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>