This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 2653 - fn-escape-html-uri-20 is misencoded
Summary: fn-escape-html-uri-20 is misencoded
Status: CLOSED FIXED
Alias: None
Product: XML Query Test Suite
Classification: Unclassified
Component: XML Query Test Suite
Version: 0.8.4
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Carmelo Montanez
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-05 16:54 UTC by Michael Kay
Modified: 2006-06-22 12:59 UTC

See Also:


Attachments

Description Michael Kay 2006-01-05 16:54:36 UTC
It seems that the source file for query fn-escape-html-uri-20 is encoded in
iso-8859-1 rather than utf-8.
Comment 1 Michael Kay 2006-01-05 17:03:50 UTC
Also applies to fn-escape-html-uri-21, except that in this case the Euro symbol
is encoded as x80, which is some MS-Windows codepage encoding, I think.
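To make the two encoding problems concrete, here is a minimal Python sketch of how the same bytes decode under different encodings. The byte values are taken from the comments above; the constant names and the try_decode helper are hypothetical, introduced only for illustration.

```python
# Raw bytes as described in the report (names are illustrative only).
E_ACUTE_LATIN1 = b"\xe9"  # e-acute as stored in an ISO-8859-1 file
EURO_CP1252 = b"\x80"     # Euro sign as stored in a Windows-1252 file


def try_decode(data: bytes, encoding: str) -> str:
    """Decode strictly; return a marker string instead of raising."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return "<invalid>"


for label, data in (("e-acute", E_ACUTE_LATIN1), ("euro", EURO_CP1252)):
    for encoding in ("utf-8", "iso-8859-1", "cp1252"):
        print(label, encoding, repr(try_decode(data, encoding)))
```

Neither byte is valid UTF-8 on its own, so a processor that reads the query files as UTF-8 cannot recover the intended characters.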
Comment 2 Michael Kay 2006-01-06 15:46:57 UTC
Note that in fn-escape-html-uri-21, the results are wrong as well. The correct
%HH escaping of the Euro symbol (Unicode x20AC) is %E2%82%AC.
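The %E2%82%AC value can be derived by hand from the three-byte UTF-8 bit pattern. The sketch below assumes nothing beyond the UTF-8 definition; utf8_three_byte and percent_escape are hypothetical helper names, not part of any test suite tooling.

```python
def utf8_three_byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF with the 3-byte UTF-8 pattern
    1110xxxx 10xxxxxx 10xxxxxx (surrogates are rejected)."""
    if not (0x0800 <= cp <= 0xFFFF) or (0xD800 <= cp <= 0xDFFF):
        raise ValueError("code point outside the 3-byte UTF-8 range")
    return bytes([
        0xE0 | (cp >> 12),          # top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # middle 6 bits
        0x80 | (cp & 0x3F),         # low 6 bits
    ])


def percent_escape(data: bytes) -> str:
    """Write each byte as %HH with upper-case hex digits."""
    return "".join(f"%{b:02X}" for b in data)


euro_bytes = utf8_three_byte(0x20AC)
print(percent_escape(euro_bytes))  # %E2%82%AC
```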
Comment 3 Carmelo Montanez 2006-01-26 18:55:16 UTC
Mike:

Really having a hard time with this.  Do you have a proposed solution?

Thanks,
Carmelo
Comment 4 Michael Kay 2006-03-09 17:02:15 UTC
This is still incorrect in 0.8.6.

The simplest solution is to replace the "e-acute" character in the first query with a character reference &_#xe9; (no underscore), and the Euro symbol with &_#x20AC; (again, no underscore). That way, the encoding of the file containing the query won't matter any more.
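A rough Python illustration of why character references sidestep the problem: a query written with references is pure ASCII, so its bytes are identical in all the encodings discussed here. The QUERY string is an assumed rewrite of the test, and html.unescape is used only as a stand-in for the reference resolution an XQuery parser would perform.

```python
import html

# Assumed form of the query after the suggested change (ASCII only).
QUERY = 'fn:escape-html-uri("example&#xe9;&#x20AC;example")'

# The byte sequence is the same whichever encoding the file is saved in.
encodings = ("ascii", "iso-8859-1", "cp1252", "utf-8")
encoded = {enc: QUERY.encode(enc) for enc in encodings}
same_bytes = len(set(encoded.values())) == 1

# Stand-in for the parser's resolution of the character references.
resolved = html.unescape(QUERY)

print(same_bytes)
print(resolved)
```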
Comment 5 Michael Kay 2006-03-09 17:11:25 UTC
Moreover, the expected results for fn-escape-html-uri-21 are wrong. The escape sequence %C2%80 is the UTF-8 representation of the codepoint hex 80. But Unicode assigns no printable character to this codepoint (it is a C1 control code). The mistake seems to have arisen because hex 80 is the representation of the Euro symbol in Windows code pages such as Windows-1252.

The correct result is:

example%E2%82%ACexample
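One plausible path to the wrong expected result can be reproduced in Python. This is a sketch of the suspected mistake, not a record of how the test was actually generated; the variable names are illustrative, and urllib.parse.quote stands in for the %HH escaping step.

```python
import unicodedata
from urllib.parse import quote

# The query text as raw bytes, with the Euro sign stored as hex 80.
raw = b"example\x80example"

# Reading hex 80 as a codepoint (as ISO-8859-1 decoding does) gives U+0080.
as_codepoint_80 = raw.decode("iso-8859-1")
# Reading hex 80 as Windows-1252 gives the intended Euro sign, U+20AC.
as_euro = raw.decode("cp1252")

wrong_result = quote(as_codepoint_80, safe="")
right_result = quote(as_euro, safe="")

print(wrong_result)                   # example%C2%80example
print(right_result)                   # example%E2%82%ACexample
print(unicodedata.category("\x80"))   # Cc, i.e. a control code
```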
Comment 6 Carmelo Montanez 2006-03-16 19:05:43 UTC
Mike:

Thanks for the suggestion.  I made the changes as suggested.  Hopefully that will do the trick.  Please close the bug once you are able to verify, if you are in agreement.

Carmelo
Comment 7 Michael Kay 2006-04-26 18:55:05 UTC
These two tests are still wrong in 0.90.

In fn-escape-html-uri-20, the test is

fn:escape-html-uri("exampleé€example")

and the correct result should be

example%C3%A9%E2%82%ACexample

In fn-escape-html-uri-21, the query still contains a character encoded as x80, which is an incorrect encoding of the Euro character. To eliminate encoding problems, I suggest writing the character as &_#x20AC; (no underscore).
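The expected result for fn-escape-html-uri-20 can be checked with a small Python approximation of the function. It assumes the rule is "leave printable US-ASCII (codepoints 32 to 126) alone and write every other character as the %HH form of its UTF-8 bytes"; escape_html_uri is a hypothetical stand-in, not an XQuery implementation.

```python
def escape_html_uri(uri: str) -> str:
    """Approximate fn:escape-html-uri: keep printable US-ASCII as is,
    percent-escape the UTF-8 bytes of everything else."""
    parts = []
    for ch in uri:
        if 32 <= ord(ch) <= 126:
            parts.append(ch)
        else:
            parts.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(parts)


# Escapes are used in the source so the file's own encoding cannot interfere.
test_20 = escape_html_uri("example\u00e9\u20acexample")
test_21 = escape_html_uri("example\u20acexample")

print(test_20)  # example%C3%A9%E2%82%ACexample
print(test_21)  # example%E2%82%ACexample
```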
Comment 8 Carmelo Montanez 2006-05-02 19:43:23 UTC
Michael:

At some point, I did fix these two devils and somehow the fixes got lost.
I have resubmitted new results for fn-escape-html-uri-20, and new results
and a new query for fn-escape-html-uri-21.

Thanks,
Carmelo