Bugzilla – Bug 25266
Consider adding 34 code points to the EUC-JP decoder present in Blink/ICU's euc-jp-2007.
Last modified: 2014-04-28 21:35:45 UTC
While updating the EUC-JP mapping table per the Encoding Standard for Blink/Chromium (mainly dropping most JIS X 0212 characters, among other things), I found 34 code points that our current table decodes but that are missing from the Encoding Standard's EUC-JP decoder.
They're listed below:
# 1. 0x8E 0xE0 to 0x8E 0xE2
# 00A2 00A3 00AC
# 2. JIS X 0212 extra (0x8F 0xF3 0xhh)
# 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171
# 2172 2173 2174 2175 2176 2177 2178 2179 221A 2220 2229 222A 222B 2235 2252
# 2261 22A5 3231
# 3. JIS X 0208 extra: 0xFC 0xFB => FFE2
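As a quick cross-check of the count in the title, the three groups above can be tallied. This is a minimal Python sketch: the group labels are paraphrased from the list, and the code points are copied verbatim from it.

```python
# Code points from the report, grouped as listed (hex strings, copied verbatim).
GROUPS = {
    "1. 0x8E 0xE0 to 0x8E 0xE2": ["00A2", "00A3", "00AC"],
    "2. JIS X 0212 extra (0x8F 0xF3 0xhh)": [
        "2160", "2161", "2162", "2163", "2164", "2165", "2166", "2167", "2168", "2169",
        "2170", "2171", "2172", "2173", "2174", "2175", "2176", "2177", "2178", "2179",
        "221A", "2220", "2229", "222A", "222B", "2235", "2252", "2261", "22A5", "3231",
    ],
    "3. JIS X 0208 extra (0xFC 0xFB)": ["FFE2"],
}

# Per-group sizes, overall total, and the number of distinct code points
# (to confirm no code point is listed twice).
counts = {name: len(cps) for name, cps in GROUPS.items()}
total = sum(counts.values())
distinct = {cp for cps in GROUPS.values() for cp in cps}
print(counts, total, len(distinct))
```

The groups contribute 3 + 30 + 1 code points, for 34 distinct entries.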
This seems like a subset of bug 16941.
I do not mind adding these. Are other vendors likely to add them?
Created attachment 1470
I'm for adding 0xFC 0xFB => FFE2. Gecko and Trident already support this.
But I'm adding other code points for two reasons:
1. No official specs have those mappings.
2. They are not interoperable between browsers. Gecko will convert all of them to U+FFFD. Trident will convert 0x8E 0xE0 .. 0x8E 0xE2 to U+73EE U+7AE2 U+9D5D. Also Trident does not support triple-byte sequences at all (0x8F 0xF3 0xhh will be converted to U+5834 U+xxxx).
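To illustrate why Gecko yields U+FFFD for 0x8E 0xE0 .. 0x8E 0xE2, here is a minimal sketch of only the 0x8E branch of the EUC-JP decoder, as I understand the Encoding Standard to define it. After a 0x8E lead, only trail bytes 0xA1..0xDF are valid (half-width katakana, U+FF61..U+FF9F). The function name `decode_8e_pair` is hypothetical, and the sketch deliberately ignores the rule that an ASCII trail byte is pushed back into the stream.

```python
REPLACEMENT = 0xFFFD

def decode_8e_pair(trail: int) -> int:
    """Decode the byte pair 0x8E <trail> (simplified: only the 0x8E lead is handled).

    Hypothetical helper; ASCII trail bytes are not re-queued here as the
    full decoder would do.
    """
    if 0xA1 <= trail <= 0xDF:
        # Half-width katakana: 0x8E 0xA1 -> U+FF61 ... 0x8E 0xDF -> U+FF9F.
        return 0xFF61 - 0xA1 + trail
    # 0x8E itself is outside 0xA1..0xFE, so the generic two-byte index lookup
    # cannot apply either; the pair is an error (U+FFFD in replacement mode).
    return REPLACEMENT

print([hex(decode_8e_pair(b)) for b in (0xE0, 0xE1, 0xE2)])
```

All three sequences from group 1 fall outside the katakana range and therefore decode to U+FFFD, matching the Gecko behavior described above.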
(In reply to Masatoshi Kimura from comment #3)
> But I'm adding other code points for two reasons:
But I'm against adding...
I don't understand the requested 0xFC 0xFB mapping. Per the euc-jp decoder algorithm that becomes 8644 as pointer, which maps to U+FFE2 in index jis0208.
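The pointer arithmetic in this comment can be checked with a small sketch. It assumes the Encoding Standard's formula for two bytes in 0xA1..0xFE: pointer = (lead − 0xA1) × 94 + trail − 0xA1. `euc_jp_pointer` and the one-entry `JIS0208_EXCERPT` table are hypothetical. The table holds only the mapping cited in this thread, not the real index jis0208.

```python
from typing import Optional

# Hypothetical one-entry excerpt of index jis0208: only the mapping cited
# in this thread, not the real index.
JIS0208_EXCERPT = {8644: 0xFFE2}

def euc_jp_pointer(lead: int, trail: int) -> Optional[int]:
    """Pointer for a two-byte EUC-JP sequence, or None if either byte is outside 0xA1..0xFE."""
    if 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:
        return (lead - 0xA1) * 94 + trail - 0xA1
    return None

p = euc_jp_pointer(0xFC, 0xFB)
print(p, hex(JIS0208_EXCERPT[p]))  # 8644 0xffe2
```

With lead 0xFC (row offset 91) and trail 0xFB (cell offset 90), the result is 91 × 94 + 90 = 8644.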
*** Bug 16941 has been marked as a duplicate of this bug. ***
From bug 16941:
The ones that are still missing can be found in an IBM extension to JIS X 0212, rows 83 and 84.
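In EUC-JP, a row number maps to a byte by adding 0xA0, so rows 83 and 84 correspond to the bytes 0xF3 and 0xF4. This is consistent with the JIS X 0212 extras in the original report beginning with 0x8F 0xF3. Here is a minimal sketch, assuming the same 94-cells-per-row pointer layout used by the decoder's index lookup. The helper names `jis_row` and `row_pointer_range` are hypothetical.

```python
def jis_row(byte: int) -> int:
    """Row number (1..94) encoded by an EUC-JP byte in 0xA1..0xFE."""
    if not 0xA1 <= byte <= 0xFE:
        raise ValueError(f"not a row byte: {byte:#04x}")
    return byte - 0xA0

def row_pointer_range(row: int) -> range:
    """Pointers covered by a row: 94 cells per row, pointer 0 at row 1, cell 1."""
    return range((row - 1) * 94, row * 94)

for b in (0xF3, 0xF4):
    r = row_pointer_range(jis_row(b))
    print(f"0x{b:02X} -> row {jis_row(b)}, pointers {r.start}..{r.stop - 1}")
```

Under these assumptions, row 83 covers pointers 7708..7801 and row 84 covers 7802..7895. These are the positions such an extension would occupy if it were added to a jis0212-style index.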
However, if Firefox has never supported them, I'm not wedded to them either. I'll just remove them from Chromium's new euc-jp table; it will then match the encoding spec exactly.
> I don't understand the requested 0xFC 0xFB mapping. Per the euc-jp decoder
> algorithm that becomes 8644 as pointer, which maps to U+FFE2 in index jis0208.
You're right. It's there. Sorry about the noise on this code point.
I'm closing this as wontfix.
Chrome's current table (not the one I'm adding based on the encoding spec, but the one used in currently released versions) was made by comparing the IE, Firefox and ICU tables, but exactly how I 'curated' and 'merged' them has been lost.