<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>29528</bug_id>
          
          <creation_ts>2016-03-11 16:17:46 +0000</creation_ts>
          <short_desc>Character encoding corruptions in the test set &apos;number&apos;</short_desc>
          <delta_ts>2016-10-20 14:35:08 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>XSLT 3.0 Test Suite</component>
          <version>Working drafts</version>
          <rep_platform>PC</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>INVALID</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="O&apos;Neil Delpratt">oneil</reporter>
          <assigned_to name="Abel Braaksma">abel.online</assigned_to>
          <cc>abel.braaksma</cc>
    
    <cc>mike</cc>
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>125445</commentid>
    <comment_count>0</comment_count>
    <who name="O&apos;Neil Delpratt">oneil</who>
    <bug_when>2016-03-11 16:17:46 +0000</bug_when>
    <thetext>We are hitting character encoding issues in the test suite in Mercurial, specifically in the test set &apos;number&apos;. For example, see the test case number-5075.

In some tools and environments the special characters are corrupted.

It may be better to generate the characters using entity references to avoid this character encoding corruption.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125453</commentid>
    <comment_count>1</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-03-11 23:47:40 +0000</bug_when>
    <thetext>I am reluctant: many test sets use non-ASCII Unicode characters, or even non-BMP Unicode characters, and they don&apos;t seem to cause problems. So I&apos;d rather try to find out why these particular tests are now causing trouble. Perhaps a BOM is missing at the beginning of the file, and adding it would prevent Mercurial from treating the file as ASCII and corrupting it? Or should we encode as UTF-16?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>126921</commentid>
    <comment_count>2</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-07-07 16:23:58 +0000</bug_when>
    <thetext>O&apos;Neil, do you still have issues with this? Or has it been resolved in the meantime?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>127869</commentid>
    <comment_count>3</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-10-19 19:13:09 +0000</bug_when>
    <thetext>Just to confirm, we are still having problems with this test set: it works in some environments and not in others. We have yet to pin the issue down.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>127877</commentid>
    <comment_count>4</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-10-20 10:58:15 +0000</bug_when>
    <thetext>We have drilled more deeply into this problem and discovered that the difference between environments that work and those that don&apos;t is the choice of XML parser - Apache Xerces gets it right, the JDK parser (even in JDK 8) gets it wrong. It&apos;s unbelievable that the JDK parser is still buggy after so many years, but that seems to be the case.

We will once again raise a JDK bug on this, and once again remind ourselves and our users always to use the Apache parser in preference.

Meanwhile, closing this bug.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>127881</commentid>
    <comment_count>5</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-10-20 14:35:08 +0000</bug_when>
    <thetext>For my own future reference, we raised essentially the same problem as a JDK bug here:

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8145969

and it has been reported by others here:

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8058175

and they claim it is fixed in JDK 9.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>