25266 – Consider adding 34 code points to the EUC-JP decoder present in Blink/ICU's euc-jp-2007.

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	Encoding
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	sideshowbarker+encodingspec

Duplicates (1):	16941

Reported:	2014-04-04 20:44 UTC by Jungshik Shin
Modified:	2014-04-28 21:35 UTC
CC List:	7 users

Attachments
Testcase (423 bytes, text/html) 2014-04-11 13:45 UTC, Masatoshi Kimura

Description Jungshik Shin 2014-04-04 20:44:02 UTC

While updating the EUC-JP mapping table for Blink/Chromium per the Encoding Standard (mainly dropping most JIS X 0212 characters, among other things), I found 34 code points that Blink/ICU's euc-jp-2007 supports but that are missing from the standard's EUC-JP decoder.

They're listed below: 

# 1. 0x8E 0xE0 to 0x8E 0xE2
#   00A2 00A3 00AC
# 2. JIS X 0212 extra (0x8F 0xF3 0xhh)
#   2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171
#   2172 2173 2174 2175 2176 2177 2178 2179 221A 2220 2229 222A 222B 2235 2252
#   2261 22A5 3231
# 3. JIS X 0208 extra : 0xFC 0xFB => FFE2
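
As a quick sanity check on the size of the list, the three groups can be counted with a few lines of Python. This is only a sketch: the hex values are transcribed from the listing above, and the variable names are made up for illustration.

```python
# Code points transcribed from the three groups in the listing above.
group1 = "00A2 00A3 00AC".split()            # 0x8E 0xE0 to 0x8E 0xE2
group2 = (
    "2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 "
    "2172 2173 2174 2175 2176 2177 2178 2179 221A 2220 2229 222A 222B 2235 2252 "
    "2261 22A5 3231"
).split()                                     # 0x8F 0xF3 0xhh
group3 = ["FFE2"]                             # 0xFC 0xFB

all_points = [int(h, 16) for h in group1 + group2 + group3]

# Per-group sizes and the overall total.
print(len(group1), len(group2), len(group3), len(all_points))  # 3 30 1 34

# Show each code point next to the character it denotes.
for cp in all_points:
    print(f"U+{cp:04X} {chr(cp)}")
```

The groups come to 3 + 30 + 1 = 34 entries, matching the number in the bug title.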

Comment 1 Anne 2014-04-11 13:16:09 UTC

This seems like a subset of bug 16941.

Comment 2 Anne 2014-04-11 13:23:26 UTC

I do not mind adding these. Are other vendors likely to add them?

Comment 3 Masatoshi Kimura 2014-04-11 13:45:54 UTC

Created attachment 1470
Testcase

I'm for adding 0xFC 0xFB => FFE2. Gecko and Trident already support this.
But I'm adding other code points for two reasons:
1. No official specs have those mappings.
2. They are not interoperable between browsers. Gecko will convert all of them to U+FFFD. Trident will convert 0x8E 0xE0 .. 0x8E 0xE2 to U+73EE U+7AE2 U+9D5D. Also Trident does not support triple-byte sequences at all (0x8F 0xF3 0xhh will be converted to U+5834 U+xxxx).
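
For context on the 0x8E case: as I read the Encoding Standard's euc-jp decoder, a 0x8E lead byte only produces a character when the next byte is in 0xA1 to 0xDF (half-width katakana); anything else is an error, which ends up as U+FFFD. A minimal sketch of that one rule follows. The function name is hypothetical, and the rest of the decoder (ASCII, 0x8F, JIS X 0208 pairs) is deliberately left out.

```python
from typing import Optional


def decode_after_0x8e(trail: int) -> Optional[int]:
    """Code point for the two-byte sequence 0x8E <trail>, or None on error.

    Assumption: this mirrors the half-width katakana rule of the Encoding
    Standard's euc-jp decoder; it is not a full decoder.
    """
    if 0xA1 <= trail <= 0xDF:
        return 0xFF61 - 0xA1 + trail
    return None  # decoder error; replaced by U+FFFD in normal decoding


for trail in (0xA1, 0xDF, 0xE0, 0xE1, 0xE2):
    cp = decode_after_0x8e(trail)
    result = f"U+{cp:04X}" if cp is not None else "error (U+FFFD)"
    print(f"0x8E 0x{trail:02X} -> {result}")
```

Under that rule 0x8E 0xE0 to 0x8E 0xE2 fall just past the mapped range, which is consistent with the U+FFFD result described for Gecko above.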

Comment 4 Masatoshi Kimura 2014-04-11 13:46:57 UTC

(In reply to Masatoshi Kimura from comment #3)
> But I'm adding other code points for two reasons:

But I'm against adding...

Comment 5 Anne 2014-04-15 16:07:17 UTC

I don't understand the requested 0xFC 0xFB mapping. Per the euc-jp decoder algorithm that becomes 8644 as pointer, which maps to U+FFE2 in index jis0208.
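
The pointer arithmetic can be reproduced as follows. This is a sketch of the two-byte pointer computation only, using the formula (lead - 0xA1) * 94 + (byte - 0xA1) from the euc-jp decoder; the function name is made up for illustration, and the lookup in index jis0208 itself is not included.

```python
from typing import Optional


def euc_jp_pointer(lead: int, trail: int) -> Optional[int]:
    """Pointer into index jis0208 for a two-byte EUC-JP sequence.

    Returns None when either byte is outside 0xA1..0xFE, in which case
    the decoder has no pointer to look up.
    """
    if 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:
        return (lead - 0xA1) * 94 + (trail - 0xA1)
    return None


# (0xFC - 0xA1) * 94 + (0xFB - 0xA1) = 91 * 94 + 90
print(euc_jp_pointer(0xFC, 0xFB))  # 8644
```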

Comment 6 Anne 2014-04-28 15:15:03 UTC

*** Bug 16941 has been marked as a duplicate of this bug. ***

Comment 7 Jungshik Shin 2014-04-28 21:35:45 UTC

From bug 16941:
The ones that are still missing can be found in an IBM extension to JIS X 0212, Rows 83--84.
Tables:
    <http://coq.no/character-tables/ibmjapan2.pdf>
    <http://coq.no/character-tables/ibmjapan2.js>


However, if Firefox has never supported them [1], I'm not wed to them either, and I'll just drop them from Chromium's new euc-jp table. It will then be exactly the same as the table specified in the encoding spec.

> I don't understand the requested 0xFC 0xFB mapping. Per the euc-jp decoder 
> algorithm that becomes 8644 as pointer, which maps to U+FFE2 in index jis0208.

You're right. It's there. Sorry about the noise on this code point. 

I'm closing this as wontfix.  

[1] Chrome's current table (the one used in released versions, not the one I'm adding based on the encoding spec) was made by comparing the IE, Firefox and ICU tables, but I no longer know exactly how I 'curated' and 'merged' them.