Bug 29528: Character encoding corruptions in the test set 'number'
Created: 2016-03-11 16:17:46 +0000
Last modified: 2016-10-20 14:35:08 +0000
Classification: Unclassified
Product: XPath / XQuery / XSLT
Component: XSLT 3.0 Test Suite
Version: Working drafts
Platform: PC
OS: All
Status: RESOLVED INVALID
Priority: P2
Severity: normal
Reporter: oneil
Assigned to: abel.online
CC: abel.braaksma, mike
QA contact: public-qt-comments

Comment 0 (125445), oneil, 2016-03-11 16:17:46 +0000
We are hitting character encoding issues within the test suite in Mercurial, specifically in the test set 'number'. For an example, see the test case number-5075. In some tools and environments the special characters are corrupted. It may be better to generate the characters using entity references to avoid these character encoding corruptions.

Comment 1 (125453), abel.braaksma, 2016-03-11 23:47:40 +0000
I am reluctant: there are many test sets that use non-ASCII Unicode characters or even non-BMP Unicode characters, and they don't seem to cause problems. So I'd rather try to find out why these particular tests are now causing trouble. Perhaps a simple BOM at the beginning of the file is missing, and adding it will prevent Mercurial from trying to treat the file as ASCII and corrupting it? Or we encode as UTF-16?

Comment 2 (126921), abel.braaksma, 2016-07-07 16:23:58 +0000
O'Neil, do you still have issues with this? Or has it been resolved in the meantime?

Comment 3 (127869), mike, 2016-10-19 19:13:09 +0000
Just to confirm, we are still having problems with this test set: it is working in some environments and not others. We have yet to pin the issue down.

Comment 4 (127877), mike, 2016-10-20 10:58:15 +0000
We have drilled more deeply into this problem and discovered that the difference between environments that work and those that don't is the choice of XML parser: Apache Xerces gets it right, the JDK parser (even in JDK 8) gets it wrong. It's unbelievable that the JDK parser is still buggy after so many years, but that seems to be the case. We will once again raise a JDK bug on this, and once again remind ourselves and our users always to use the Apache parser in preference.
Meanwhile, closing this bug.

Comment 5 (127881), mike, 2016-10-20 14:35:08 +0000
For my own future reference, we raised essentially the same problem as a JDK bug here http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8145969 and it has been reported by others here: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8058175 and they claim it is fixed in JDK 9.
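
For anyone who still wants the workaround proposed in the first comment (generating the special characters as character references rather than literal characters), the following is a minimal sketch of how that could be done. It is not part of the test suite: the class name CharRefEscaper and the method escapeNonAscii are hypothetical, and the sample digits are illustrative rather than taken from number-5075. It handles only non-ASCII code points (including non-BMP ones stored as surrogate pairs) and does not escape markup characters such as & or <.

```java
import java.util.Locale;

// Hypothetical helper, not part of the XSLT 3.0 test suite.
public class CharRefEscaper {

    /**
     * Replaces every code point above U+007F with a hexadecimal
     * character reference such as &#x661;. Iterating by code point
     * (not by char) keeps non-BMP characters as a single reference.
     */
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative input: ASCII digit, U+0661 (Arabic-Indic digit one),
        // and U+1D7CF (mathematical bold digit one, a non-BMP character).
        String sample = "1 \u0661 \uD835\uDFCF";
        System.out.println(escapeNonAscii(sample));
    }
}
```

Because the output is pure ASCII, it should survive tools that guess the wrong encoding, though as the later comments show, the root cause in this case turned out to be the XML parser rather than the file encoding.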