This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
When I compared the mapping of EUC-KR in the encoding spec with ICU's Windows-949 [1] (which was obtained by scraping *one of Windows' converters*), I found the following differences:

1. ICU's Windows-949 mapping has 395 'decoding only' (from Unicode to windows-949) entries for characters like the cent and pound currency signs (U+00A2, U+00A3), regular Latin/Greek/Cyrillic letters, Hangul conjoining jamos (U+11xx), Hangul half-width jamos (U+FFxx), enclosed CJK characters (e.g. U+32xx), etc.

2. ICU's Windows-949 has 190 additional round-trip mapping entries. Most of them (188) are for the two user-defined blocks in KS X 1001 (in EUC-KR, "C9 [A1-FE]" and "FE [A1-FE]") that are mapped to PUA code points (U+E000 - U+E0BB). The remaining two are U+0080 and U+F8F7, mapped to 0x80 and 0xFF.

I don't think that we want to support the two user-defined blocks in KS X 1001. I'm not sure about U+0080 and U+F8F7. However, I believe that quite a few (NOT all) of the 'decoding only' entries should be supported.

[1] https://code.google.com/p/chromium/codesearch#chromium/src/third_party/icu/source/data/mappings/windows-949-2000.ucm&q=windows-949-2000.ucm&sq=package:chromium&type=cs
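For reference, the arithmetic behind the 188 user-defined entries can be sketched as follows. This is only an illustration: the function name `user_defined_to_pua` is made up here, and the sequential ordering (lead byte C9 first, then FE) is an assumption that is consistent with the U+E000 - U+E0BB range quoted above but has not been checked against the .ucm file.

```python
# Sketch: map the two KS X 1001 user-defined rows (as EUC-KR byte pairs)
# to the PUA range U+E000 - U+E0BB.
# ASSUMPTION: entries are assigned sequentially, row C9 first, then row FE.

USER_DEFINED_LEADS = (0xC9, 0xFE)   # the two user-defined lead bytes
TRAIL_MIN, TRAIL_MAX = 0xA1, 0xFE   # 94 trail bytes per row
ROW_SIZE = TRAIL_MAX - TRAIL_MIN + 1
PUA_BASE = 0xE000


def user_defined_to_pua(lead: int, trail: int) -> int:
    """Return the PUA code point for a user-defined EUC-KR byte pair."""
    if lead not in USER_DEFINED_LEADS or not TRAIL_MIN <= trail <= TRAIL_MAX:
        raise ValueError("not in a user-defined block")
    row = USER_DEFINED_LEADS.index(lead)
    return PUA_BASE + row * ROW_SIZE + (trail - TRAIL_MIN)


# Total number of code points covered by the two rows.
total_entries = len(USER_DEFINED_LEADS) * ROW_SIZE
first = user_defined_to_pua(0xC9, 0xA1)
last = user_defined_to_pua(0xFE, 0xFE)
print(total_entries, f"U+{first:04X}", f"U+{last:04X}")
```

Two rows of 94 trail bytes each give 188 entries, which matches the count in point 2.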
Created attachment 1565 [details] ICU's windows-949 : decoding only entries
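A list like the one in this attachment can be pulled out of a .ucm file with a few lines of script. The sketch below assumes the usual ICU .ucm line layout (`<Uxxxx> \xNN... |n`), where the precision indicator `|0` marks a round-trip mapping and `|1` marks a fallback used only when converting from Unicode to the code page. The `SAMPLE` lines are illustrative and were not copied from windows-949-2000.ucm; `parse_ucm` is a hypothetical helper, not part of ICU.

```python
import re

# One mapping line: <U00A2> \xA1\xCB |1
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)
HEX_BYTE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def parse_ucm(text):
    """Return (code_point, bytes, precision) tuples for each mapping line."""
    entries = []
    for line in text.splitlines():
        m = UCM_LINE.match(line.strip())
        if not m:
            continue  # skip headers, comments, blank lines
        code_point = int(m.group(1), 16)
        raw = bytes(int(h, 16) for h in HEX_BYTE.findall(m.group(2)))
        entries.append((code_point, raw, int(m.group(3))))
    return entries


# Illustrative sample only (ASSUMPTION: not verified against the real file).
SAMPLE = r"""
<U00A2> \xA1\xCB |1
<UFFE0> \xA1\xCB |0
<UE000> \xC9\xA1 |0
"""

entries = parse_ucm(SAMPLE)
# Precision 1: used only in the Unicode -> windows-949 direction.
one_way = [e for e in entries if e[2] == 1]
round_trip = [e for e in entries if e[2] == 0]
print(len(one_way), len(round_trip))
```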
If you go from Unicode to euc-kr, it is called encoding, not decoding. E.g. the stuff you need for <form> and URL.
You're absolutely right! I must have had more 'coffee' ;-)
The attachment title should be changed to 'encoding only entries'.

(In reply to Jungshik Shin from comment #1)
> Created attachment 1565 [details]
> ICU's windows-949 : decoding only entries

This should be 'ICU's windows-949 : encoding only entries'.
So you attached 394 "encoding only" entries. How should I know which ones we want to add to the standard and which we want to ignore?
I tested your attached code points.

Chrome and Firefox encode them as "HTML entities". The default error handling mode.
Safari has these 394 mappings.
Internet Explorer outputs "HTML entities" too; however, they're not always numeric, but are sometimes named. This is truly bizarre.

Anyway, given these results, I don't think any changes are warranted here, as only Safari does what you suggest, but legacy content is far more likely to rely on what Internet Explorer does, which is pretty close to what Chrome, Firefox, and the Standard do (and often matches).

https://dump.testsuite.org/encoding/form-encoding-special-euc-kr.html
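To illustrate what the "HTML entities" output looks like, here is a minimal sketch using Python's built-in `euc_kr` codec with the `xmlcharrefreplace` error handler, which emits a decimal numeric character reference for each unencodable character. Python's table is not the Encoding Standard's index and is used here only as a stand-in; `encode_for_form` is a name made up for this example.

```python
def encode_for_form(text: str) -> bytes:
    """Encode text as EUC-KR, replacing unencodable characters
    with decimal numeric character references (&#NNN;)."""
    # ASSUMPTION: Python's euc_kr codec approximates the encoder under test;
    # it is not the Encoding Standard's euc-kr index.
    return text.encode("euc_kr", errors="xmlcharrefreplace")


hangul = encode_for_form("\uac00")   # HANGUL SYLLABLE GA, has a mapping
cent = encode_for_form("\u00a2")     # CENT SIGN, one of the attached code points
jamo = encode_for_form("\u1100")     # HANGUL CHOSEONG KIYEOK (conjoining jamo)
print(hangul, cent, jamo)
```

An encoder that carried the extra one-way mappings, as Safari does, would produce a two-byte sequence for the cent sign instead of the character reference.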
Chrome used to behave like Safari until I changed its EUC-KR to use the current encoding spec. So, the following is a bit circular.

> Chrome and Firefox encode them as "HTML entities". The default error handling mode.

Anyway, it's not terribly important.