<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>2653</bug_id>
          
          <creation_ts>2006-01-05 16:54:36 +0000</creation_ts>
          <short_desc>fn-escape-html-uri-20 is misencoded</short_desc>
          <delta_ts>2006-06-22 12:59:04 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XML Query Test Suite</product>
          <component>XML Query Test Suite</component>
          <version>0.8.4</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows XP</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Michael Kay">mike</reporter>
          <assigned_to name="Carmelo Montanez">carmelo</assigned_to>
          
          
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>7634</commentid>
    <comment_count>0</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2006-01-05 16:54:36 +0000</bug_when>
    <thetext>It seems that the source file for query fn-escape-html-uri-20 is encoded in
iso-8859-1 rather than utf-8.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>7635</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2006-01-05 17:03:50 +0000</bug_when>
    <thetext>Also applies to fn-escape-html-uri-21, except that in this case the Euro symbol
is encoded as x80, which is some MS-Windows codepage encoding, I think.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>7679</commentid>
    <comment_count>2</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2006-01-06 15:46:57 +0000</bug_when>
    <thetext>Note that in fn-escape-html-uri-21, the results are wrong as well. The correct
%HH escaping of the Euro symbol (Unicode x20AC) is %E2%82%AC.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>7993</commentid>
    <comment_count>3</comment_count>
    <who name="Carmelo Montanez">carmelo</who>
    <bug_when>2006-01-26 18:55:16 +0000</bug_when>
    <thetext>Mike:

Really having a hard time with this.  Do you have a proposed solution?

Thanks,
Carmelo</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>8650</commentid>
    <comment_count>4</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2006-03-09 17:02:15 +0000</bug_when>
    <thetext>This is still incorrect in 0.8.6.

The simplest solution is to replace the &quot;e-acute&quot; character in the first query with the character reference &amp;_#xe9; and the Euro symbol with &amp;_#x20AC; (both written without the underscore). That way, the encoding of the file containing the query won&apos;t matter any more.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>8651</commentid>
    <comment_count>5</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2006-03-09 17:11:25 +0000</bug_when>
    <thetext>Moreover, the expected results for fn-escape-html-uri-21 are wrong. The escape sequence %C2%80 is the UTF-8 representation of the codepoint hex 80. But in Unicode that codepoint is a C1 control code, not the Euro symbol. The mistake seems to have arisen because hex 80 is the representation of the Euro symbol in some proprietary Windows character set.
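
As a rough illustration of the mix-up, here is a small Python sketch (not part of the test suite). It assumes the Windows character set in question is Windows-1252, and the helper name percent is purely illustrative:

```python
def percent(data: bytes) -> str:
    """Render each byte as an upper-case %HH escape (illustrative helper)."""
    return "".join("%{:02X}".format(b) for b in data)

# Byte 0x80 read as Windows-1252 (an assumption about the source file) is the Euro sign.
euro_from_cp1252 = bytes([0x80]).decode("cp1252")

# Codepoint U+0080 encoded as UTF-8: this is where %C2%80 comes from.
wrong_escape = percent("\u0080".encode("utf-8"))

# The Euro sign proper, U+20AC, encoded as UTF-8.
right_escape = percent("\u20ac".encode("utf-8"))

print(euro_from_cp1252 == "\u20ac", wrong_escape, right_escape)
```

If the assumption holds, the byte x80 was meant as the Euro sign but was carried over as codepoint hex 80 instead of being transcoded to U+20AC.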

The correct results are 

example%E2%82%ACexample</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>8762</commentid>
    <comment_count>6</comment_count>
    <who name="Carmelo Montanez">carmelo</who>
    <bug_when>2006-03-16 19:05:43 +0000</bug_when>
    <thetext>Mike:

Thanks for the suggestion.  I made the changes as suggested.  Hopefully that will do the trick.  Please close the bug once you are able to verify the fix and if you agree with it.

Carmelo</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>9470</commentid>
    <comment_count>7</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2006-04-26 18:55:05 +0000</bug_when>
    <thetext>These two tests are still wrong in 0.90.

In fn-escape-html-uri-20, the test is

fn:escape-html-uri(&quot;example&amp;#xe9;&amp;#x20AC;example&quot;)

and the correct result should be

example%C3%A9%E2%82%ACexample
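
A minimal Python sketch for checking that expected result (not taken from the test suite). The function name escape_html_uri is only illustrative, and the sketch assumes that fn:escape-html-uri leaves printable ASCII (codepoints 32 to 126) untouched and replaces every other character with the %HH escapes of its UTF-8 bytes:

```python
def escape_html_uri(text: str) -> str:
    """Percent-escape every character outside printable ASCII (illustrative sketch)."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isprintable():
            # Printable ASCII, i.e. codepoints 32 to 126, is copied unchanged.
            out.append(ch)
        else:
            # Everything else becomes one %HH escape per UTF-8 byte.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)

# The query string with the two character references already expanded.
print(escape_html_uri("example\u00e9\u20acexample"))
```

Writing the characters as escapes in the sketch, like the character references in the query, keeps the check independent of the encoding of the file that contains it.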

In fn-escape-html-uri-21, the query still contains a character encoded as x80, which is an incorrect encoding of the Euro character. To eliminate encoding problems, I suggest writing the character as &amp;_#x20AC; (without the underscore).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>9560</commentid>
    <comment_count>8</comment_count>
    <who name="Carmelo Montanez">carmelo</who>
    <bug_when>2006-05-02 19:43:23 +0000</bug_when>
    <thetext>Michael:

At some point I did fix these two devils, but somehow the fixes got lost.
I have resubmitted new results for fn-escape-html-uri-20, and new results
and a new query for fn-escape-html-uri-21.

Thanks,
Carmelo</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>