<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>5054</bug_id>
          
          <creation_ts>2007-09-17 14:13:58 +0000</creation_ts>
          <short_desc>Unicode character in K2-StringLT-1</short_desc>
          <delta_ts>2007-09-18 17:29:34 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XML Query Test Suite</product>
          <component>XML Query Test Suite</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows XP</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>INVALID</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Andrew Eisenberg">andrew.eisenberg</reporter>
          <assigned_to name="Frans Englich">frans.englich</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>16610</commentid>
    <comment_count>0</comment_count>
    <who name="Andrew Eisenberg">andrew.eisenberg</who>
    <bug_when>2007-09-17 14:13:58 +0000</bug_when>
    <thetext>Test case K2-StringLT-1 compares two strings containing large codepoints.

I generate the following XQueryX for this test case:

&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;xqx:module xmlns:xqx=&quot;http://www.w3.org/2005/XQueryX&quot;
            xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;
            xsi:schemaLocation=&quot;http://www.w3.org/2005/XQueryX
                                http://www.w3.org/2005/XQueryX/xqueryx.xsd&quot;&gt;
  &lt;xqx:mainModule&gt;
    &lt;xqx:queryBody&gt;
      &lt;xqx:ltOp&gt;
        &lt;xqx:firstOperand&gt;
          &lt;xqx:stringConstantExpr&gt;
            &lt;xqx:value&gt;&amp;#60000;&lt;/xqx:value&gt;
          &lt;/xqx:stringConstantExpr&gt;
        &lt;/xqx:firstOperand&gt;
        &lt;xqx:secondOperand&gt;
          &lt;xqx:stringConstantExpr&gt;
            &lt;xqx:value&gt;&amp;#55300;&lt;/xqx:value&gt;
          &lt;/xqx:stringConstantExpr&gt;
        &lt;/xqx:secondOperand&gt;
      &lt;/xqx:ltOp&gt;
    &lt;/xqx:queryBody&gt;
  &lt;/xqx:mainModule&gt;
&lt;/xqx:module&gt;
 

When I attempt to validate this XQueryX, I see this error:

   Character reference &quot;&amp;#55300&quot; is an invalid XML character.


I&apos;m weak on the details of Unicode. I believe that character &amp;#55300 is &amp;#xD804. I see the following in http://www.unicode.org/Public/UNIDATA/UnicodeData.txt:

D800;&lt;Non Private Use High Surrogate, First&gt;;Cs;0;L;;;;;N;;;;;
DB7F;&lt;Non Private Use High Surrogate, Last&gt;;Cs;0;L;;;;;N;;;;;

Perhaps you could change &amp;#xD804 to some other character. I&apos;ve experimented a bit, and &amp;#xD700; validates just fine.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>16667</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2007-09-18 12:57:54 +0000</bug_when>
    <thetext>I think the translation of the query into XQueryX was done incorrectly. Looking at the file at the octet level, the first operand is the octet sequence ee a9 a0 and the second is f0 91 85 b0. These are the UTF-8 representations of the characters with decimal codepoints 60000 and 70000 respectively. Codepoint 70000 lies outside the Basic Multilingual Plane, so UTF-16 represents it as a surrogate pair (D804 DD70), and it looks as if your translation has taken the first 16-bit code unit of that pair (D804, decimal 55300) as the entire character.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>16671</commentid>
    <comment_count>2</comment_count>
    <who name="Andrew Eisenberg">andrew.eisenberg</who>
    <bug_when>2007-09-18 17:29:27 +0000</bug_when>
    <thetext>Mike, your comment helped me pinpoint the bug in the XQueryX generation. I agree that the test case is correct as it is.
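
For the record, here is a minimal sketch in Python of the arithmetic involved. It is not the actual generator code, and the helper name utf16_units is made up for illustration. It shows how codepoint 70000 splits into the UTF-16 surrogate pair D804 DD70, whose first unit is the 55300 that appeared in the generated XQueryX.

```python
def utf16_units(codepoint):
    """Return the UTF-16 code units for one Unicode codepoint.

    Hypothetical helper for illustration only.
    """
    # Surrogate values and anything past U+10FFFF are not characters.
    if codepoint in range(0xD800, 0xE000) or codepoint not in range(0x110000):
        raise ValueError("not a Unicode scalar value")
    # Codepoints in the Basic Multilingual Plane fit in one 16-bit unit.
    if codepoint in range(0x10000):
        return [codepoint]
    # Supplementary codepoints: split the 20-bit offset into two 10-bit halves.
    high, low = divmod(codepoint - 0x10000, 0x400)
    return [0xD800 + high, 0xDC00 + low]


units = utf16_units(70000)
print([format(u, "04X") for u in units])     # ['D804', 'DD70']
print(units[0])                              # 55300, the invalid reference
print(chr(70000).encode("utf-8").hex(" "))   # f0 91 85 b0
print(chr(60000).encode("utf-8").hex(" "))   # ee a9 a0
```

A character reference has to carry the whole codepoint (70000 here), never an individual UTF-16 code unit.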
 </thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>