This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29528 - Character encoding corruptions in the test set 'number'
Status: RESOLVED INVALID
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0 Test Suite
Version: Working drafts
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Abel Braaksma
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2016-03-11 16:17 UTC by O'Neil Delpratt
Modified: 2016-10-20 14:35 UTC

Description O'Neil Delpratt 2016-03-11 16:17:46 UTC
We are hitting character encoding issues with the test suite in Mercurial, specifically in the test set 'number'. For example, see test case number-5075.

In some tools and environments the special characters are corrupted.

It may be better to generate the characters using entity references to avoid these character encoding corruptions.
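As a rough illustration of the suggestion above, the following sketch rewrites every non-ASCII character as a hexadecimal numeric character reference, so that the file content itself stays pure ASCII. The class and method names are hypothetical, and the sample characters are arbitrary; they are not taken from test case number-5075.

```java
public final class CharRefEscaper {
    private CharRefEscaper() {}

    /**
     * Replaces every code point above U+007F with a hexadecimal numeric
     * character reference. Iterating over code points (not chars) keeps
     * non-BMP characters as a single reference instead of two surrogates.
     */
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Arbitrary sample: one BMP digit (U+0663) and one non-BMP digit (U+1D7CF).
        System.out.println(escapeNonAscii("<n>\u0663\uD835\uDFCF</n>"));
    }
}
```

Note that character references are only usable in text and attribute values, so this approach would not help for non-ASCII characters in element or attribute names.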
Comment 1 Abel Braaksma 2016-03-11 23:47:40 UTC
I am reluctant: there are many test sets that use non-ASCII Unicode characters, or even non-BMP Unicode characters, and they don't seem to cause problems. So I'd rather try to find out why these particular tests are now causing trouble. Perhaps a simple BOM is missing at the beginning of the file, and adding it would prevent Mercurial from trying to treat the file as ASCII and corrupting it? Or could we encode as UTF-16?
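To test the BOM hypothesis, one could check whether a given test file starts with the UTF-8 byte order mark (the bytes EF BB BF). This is a minimal sketch with hypothetical class and method names; the file path is supplied by the caller and nothing here is specific to the test suite layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class BomCheck {
    private BomCheck() {}

    /** Returns true if the byte array begins with the UTF-8 BOM (EF BB BF). */
    public static boolean startsWithUtf8Bom(byte[] head) {
        return head.length >= 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;
    }

    /** Reads the whole file (acceptable for small test files) and checks its first bytes. */
    public static boolean fileStartsWithUtf8Bom(Path file) throws IOException {
        return startsWithUtf8Bom(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BomCheck path/to/file.xsl
        Path file = Paths.get(args[0]);
        System.out.println(file + " has UTF-8 BOM: " + fileStartsWithUtf8Bom(file));
    }
}
```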
Comment 2 Abel Braaksma 2016-07-07 16:23:58 UTC
O'Neil, do you still have issues with this? Or has it been resolved in the meantime?
Comment 3 Michael Kay 2016-10-19 19:13:09 UTC
Just to confirm, we are still having problems with this test set: it works in some environments and not in others. We have yet to pin the issue down.
Comment 4 Michael Kay 2016-10-20 10:58:15 UTC
We have drilled more deeply into this problem and discovered that the difference between environments that work and those that don't is the choice of XML parser: Apache Xerces gets it right, the JDK parser (even in JDK 8) gets it wrong. It's unbelievable that the JDK parser is still buggy after so many years, but that seems to be the case.

We will once again raise a JDK bug on this, and once again remind ourselves and our users always to use the Apache parser in preference.
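Since the outcome depends on which parser JAXP picks up, a small diagnostic along these lines can show which implementation is active in a given environment and whether a sample string survives a parse unchanged. This is only a sketch with hypothetical names: the sample characters are arbitrary, and a short in-memory document like this is not claimed to reproduce the JDK bug.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public final class ParserCheck {
    private ParserCheck() {}

    /** Name of the DocumentBuilderFactory implementation that JAXP selected. */
    public static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Parses the XML string and returns the text content of the root element. */
    public static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers get a single failure type.
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Arbitrary sample: one BMP digit (U+0663) and one non-BMP digit (U+1D7CF).
        String sample = "\u0663\uD835\uDFCF";
        System.out.println("JAXP factory: " + factoryClassName());
        System.out.println("Round trip preserved text: "
            + sample.equals(parseText("<n>" + sample + "</n>")));
    }
}
```

With Apache Xerces on the classpath, the factory can typically be selected through the standard JAXP lookup, for example by setting the system property `javax.xml.parsers.DocumentBuilderFactory` to `org.apache.xerces.jaxp.DocumentBuilderFactoryImpl`.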

Meanwhile, closing this bug.
Comment 5 Michael Kay 2016-10-20 14:35:08 UTC
For my own future reference, we raised essentially the same problem as a JDK bug here:

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8145969

and it has been reported by others here:

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8058175

and they claim it is fixed in JDK 9.