<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>25266</bug_id>
          
          <creation_ts>2014-04-04 20:44:02 +0000</creation_ts>
          <short_desc>Consider adding 34 code points to the EUC-JP decoder present in Blink/ICU&apos;s euc-jp-2007.</short_desc>
          <delta_ts>2014-04-28 21:35:45 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>Encoding</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WONTFIX</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Jungshik Shin">jshin</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>adrianba</cc>
    
    <cc>hsivonen</cc>
    
    <cc>mike</cc>
    
    <cc>pub-w3</cc>
    
    <cc>travil</cc>
    
    <cc>VYV03354</cc>
    
    <cc>www-international</cc>
          
          <qa_contact>sideshowbarker+encodingspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>103434</commentid>
    <comment_count>0</comment_count>
    <who name="Jungshik Shin">jshin</who>
    <bug_when>2014-04-04 20:44:02 +0000</bug_when>
<thetext>While updating the EUC-JP mapping table per the encoding standard (mainly dropping most of the JIS X 0212 characters, among other things) for Blink/Chromium, I found 34 code points that are missing from the EUC-JP decoder.

They&apos;re listed below: 

# 1. 0x8E 0xE0 to 0x8E 0xE2
#   00A2 00A3 00AC
# 2. JIS X 0212 extra (0x8F 0xF3 0xhh)
#   2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171
#   2172 2173 2174 2175 2176 2177 2178 2179 221A 2220 2229 222A 222B 2235 2252
#   2261 22A5 3231
# 3. JIS X 0208 extra : 0xFC 0xFB =&gt; FFE2</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103745</commentid>
    <comment_count>1</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-04-11 13:16:09 +0000</bug_when>
    <thetext>This seems like a subset of bug 16941.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103748</commentid>
    <comment_count>2</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-04-11 13:23:26 +0000</bug_when>
    <thetext>I do not mind adding these. Are other vendors likely to add them?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103749</commentid>
    <comment_count>3</comment_count>
      <attachid>1470</attachid>
    <who name="Masatoshi Kimura">VYV03354</who>
    <bug_when>2014-04-11 13:45:54 +0000</bug_when>
    <thetext>Created attachment 1470
Testcase

I&apos;m for adding 0xFC 0xFB =&gt; FFE2. Gecko and Trident already support this.
But I&apos;m adding other code points for two reasons:
1. No official specs have those mappings.
2. They are not interoperable between browsers. Gecko will convert all of them to U+FFFD. Trident will convert 0x8E 0xE0 .. 0x8E 0xE2 to U+73EE U+7AE2 U+9D5D. Also Trident does not support triple-byte sequences at all (0x8F 0xF3 0xhh will be converted to U+5834 U+xxxx).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103751</commentid>
    <comment_count>4</comment_count>
    <who name="Masatoshi Kimura">VYV03354</who>
    <bug_when>2014-04-11 13:46:57 +0000</bug_when>
    <thetext>(In reply to Masatoshi Kimura from comment #3)
&gt; But I&apos;m adding other code points for two reasons:

But I&apos;m against adding...</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103880</commentid>
    <comment_count>5</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-04-15 16:07:17 +0000</bug_when>
    <thetext>I don&apos;t understand the requested 0xFC 0xFB mapping. Per the euc-jp decoder algorithm that becomes 8644 as pointer, which maps to U+FFE2 in index jis0208.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>104564</commentid>
    <comment_count>6</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-04-28 15:15:03 +0000</bug_when>
    <thetext>*** Bug 16941 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>104582</commentid>
    <comment_count>7</comment_count>
    <who name="Jungshik Shin">jshin</who>
    <bug_when>2014-04-28 21:35:45 +0000</bug_when>
<thetext>From bug 16941:
The ones that are still missing can be found in an IBM extension to JIS X 0212, Rows 83--84.
Tables:
    &lt;http://coq.no/character-tables/ibmjapan2.pdf&gt;
    &lt;http://coq.no/character-tables/ibmjapan2.js&gt;


However, if Firefox has never supported them [1], I&apos;m not wed to them either. I&apos;ll just remove them from Chromium&apos;s new euc-jp table, which will then be exactly the same as specified in the encoding spec.

&gt; I don&apos;t understand the requested 0xFC 0xFB mapping. Per the euc-jp decoder 
&gt; algorithm that becomes 8644 as pointer, which maps to U+FFE2 in index jis0208.

You&apos;re right. It&apos;s there. Sorry about the noise on this code point. 
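
For reference, the pointer arithmetic quoted above can be sketched in Python. This is only an illustration under stated assumptions: the function name jis0208_pointer is made up here, and the lookup in index jis0208 itself (where pointer 8644 maps to U+FFE2, as noted in the quoted comment) is not reproduced.

```python
def jis0208_pointer(lead, trail):
    """Pointer into index jis0208 for a two-byte euc-jp sequence.

    Hypothetical helper for illustration; both bytes are assumed to
    lie in the range 0xA1 to 0xFE inclusive.
    """
    if lead not in range(0xA1, 0xFF) or trail not in range(0xA1, 0xFF):
        raise ValueError("both bytes must be in the range 0xA1 to 0xFE")
    # Each lead byte selects a row of 94 cells; the trail byte selects the cell.
    return (lead - 0xA1) * 94 + (trail - 0xA1)


# 0xFC 0xFB: (0xFC - 0xA1) * 94 + (0xFB - 0xA1) = 91 * 94 + 90
print(jis0208_pointer(0xFC, 0xFB))  # 8644
```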

I&apos;m closing this as wontfix.  

[1] Chrome&apos;s current table (not the one I&apos;m adding based on the encoding spec, but the one currently used in released versions) was made by comparing the IE, Firefox and ICU tables, but I no longer have a record of exactly how I &apos;curated&apos; and &apos;merged&apos; them.</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>1470</attachid>
            <date>2014-04-11 13:45:54 +0000</date>
            <delta_ts>2014-04-11 13:45:54 +0000</delta_ts>
            <desc>Testcase</desc>
            <filename>euctest.html</filename>
            <type>text/html</type>
            <size>423</size>
            <attacher name="Masatoshi Kimura">VYV03354</attacher>
            
              <data encoding="base64">PCFET0NUWVBFIGh0bWw+CjxzdHlsZT4KYm9keSB7IHdvcmQtd3JhcDogYnJlYWstd29yZCB9Cjwv
c3R5bGU+CjxzY3JpcHQ+CnZhciBmciA9IG5ldyBGaWxlUmVhZGVyKCkKdmFyIGlucHV0ID0gWzB4
OEUsMHhFMCwweDhFLDB4RTEsMHg4RSwweEUyXTsKZm9yICh2YXIgYyA9IDB4QTE7IGMgPD0gMHhC
NzsgKytjKSB7CiAgaW5wdXQgPSBpbnB1dC5jb25jYXQoWzB4OEYsMHhGMyxjXSk7Cn0KaW5wdXQg
PSBpbnB1dC5jb25jYXQoWzB4RkMsMHhGQl0pOwpmci5yZWFkQXNUZXh0KG5ldyBCbG9iKFtuZXcg
VWludDhBcnJheShpbnB1dCldKSwgImV1Yy1qcCIpOwpmci5vbmxvYWQgPSBmdW5jdGlvbigpIHsK
ICBkb2N1bWVudC5ib2R5Lmluc2VydEFkamFjZW50SFRNTCgiQmVmb3JlRW5kIiwgZXNjYXBlKGZy
LnJlc3VsdCkpOwp9Owo8L3NjcmlwdD4K
</data>

          </attachment>
      

    </bug>

</bugzilla>