This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 7444 - EUC-JP and ISO-2022-JP also need replacement encodings: CP51932 (or eucJP-ms) and CP50221.
Summary: EUC-JP and ISO-2022-JP also need replacement encodings: CP51932 (or eucJP-ms)...
Status: CLOSED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: All All
Importance: P3 normal
Target Milestone: LC
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://whatwg.org/specs/web-apps/curr...
Whiteboard: charset
Keywords: NE
Depends on:
Blocks:
 
Reported: 2009-08-27 18:08 UTC by contributor
Modified: 2011-11-09 07:45 UTC
7 users

See Also:


Attachments
EUC-JP on WinIE (43.83 KB, image/png)
2010-03-18 12:55 UTC, NARUSE, Yui
EUC-JP on MacFx3.6 (91.04 KB, image/png)
2010-03-18 12:56 UTC, NARUSE, Yui
EUC-JP on Safari4 (69.45 KB, image/png)
2010-03-18 12:57 UTC, NARUSE, Yui
EUC-JP on MacChrome5 (125.73 KB, image/png)
2010-03-18 12:58 UTC, NARUSE, Yui
EUC-JP on WinOpera10 (58.51 KB, image/png)
2010-03-18 12:59 UTC, NARUSE, Yui

Description contributor 2009-08-27 18:08:09 UTC
Section: http://whatwg.org/specs/web-apps/current-work/#character-encodings-0

Comment:
EUC-JP and ISO-2022-JP also need replacement encodings: CP51932 (or eucJP-ms) and CP50221.

Posted from: 210.138.109.139
Comment 1 Ian 'Hixie' Hickson 2009-09-21 23:15:45 UTC
Waiting for Anne to do this.
Comment 2 Maciej Stachowiak 2010-03-14 14:50:16 UTC
This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
  http://dev.w3.org/html5/decision-policy/decision-policy.html

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.
Comment 3 NARUSE, Yui 2010-03-18 12:55:59 UTC
Created attachment 832
EUC-JP on WinIE
Comment 4 NARUSE, Yui 2010-03-18 12:56:42 UTC
Created attachment 833
EUC-JP on MacFx3.6
Comment 5 NARUSE, Yui 2010-03-18 12:57:30 UTC
Created attachment 834
EUC-JP on Safari4
Comment 6 NARUSE, Yui 2010-03-18 12:58:24 UTC
Created attachment 835
EUC-JP on MacChrome5
Comment 7 NARUSE, Yui 2010-03-18 12:59:10 UTC
Created attachment 836
EUC-JP on WinOpera10
Comment 8 NARUSE, Yui 2010-03-18 14:01:32 UTC
First, I will describe EUC-JP.

See the attached images whose names begin with EUC-JP.
They show http://coq.no/X/charset5/test-EUC-JP.php?EUC-JP rendered in
* Internet Explorer 6 on Windows XP
* Firefox 3.6 on Mac OS X 10.5
* Safari 4.0.5 on Mac OS X 10.5
* Google Chrome 5 on Mac OS X 10.5
* Opera 10.0 on Windows Vista

All of them can show
(0) ASCII (yen sign/back solidus is beyond the scope of this ticket)
(1) JIS X 0208 before 1990
(2) Half-width katakana
* NEC selected IBM extended characters
  (the 1st and 2nd characters of the group labeled `IBM')

IE, Firefox, Chrome and Opera can show
* NEC special characters (labeled as `KanjiTalk 6/7, NEC' and `NEC')

Firefox, Safari, Chrome and Opera can show
* JIS X 0212 derived from IBM extended character (3rd-6th of `IBM')

Firefox, Chrome and Opera can show
(3) JIS X 0212-1990

Safari and Chrome can show
* IBM extended character (last one of `IBM')

None of them can show
(1) JIS X 0208 after 1990
* DEC Kanji and KanjiTalk

IANA defines EUC-JP as follows, but real implementations behave as described above.

Name: Extended_UNIX_Code_Packed_Format_for_Japanese
MIBenum: 18
Source: Standardized by OSF, UNIX International, and UNIX Systems
        Laboratories Pacific.  Uses ISO 2022 rules to select
               code set 0: US-ASCII (a single 7-bit byte set)
               code set 1: JIS X0208-1990 (a double 8-bit byte set)
                           restricted to A0-FF in both bytes
               code set 2: Half Width Katakana (a single 7-bit byte set)
                           requiring SS2 as the character prefix
               code set 3: JIS X0212-1990 (a double 7-bit byte set)
                           restricted to A0-FF in both bytes
                           requiring SS3 as the character prefix
Alias: csEUCPkdFmtJapanese
Alias: EUC-JP  (preferred MIME name)
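
As an illustration of the code-set layout in the IANA record above, here is a minimal Python sketch that splits an EUC-JP byte string into its code sets by lead byte. The function name split_euc_jp is hypothetical. The sketch assumes the usual single-shift values (SS2 = 0x8E, SS3 = 0x8F) and the 94x94 range A1-FE rather than the A0-FF range quoted in the record. It only classifies bytes and does not map them to characters.

```python
SS2 = 0x8E  # single shift 2: prefixes one half-width katakana byte (code set 2)
SS3 = 0x8F  # single shift 3: prefixes a two-byte JIS X 0212 character (code set 3)


def split_euc_jp(data: bytes):
    """Split EUC-JP bytes into (code_set, raw_bytes) tuples (hypothetical helper)."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            size, code_set = 1, 0      # code set 0: ASCII
        elif lead == SS2:
            size, code_set = 2, 2      # code set 2: SS2 + one byte
        elif lead == SS3:
            size, code_set = 3, 3      # code set 3: SS3 + two bytes
        elif 0xA1 <= lead <= 0xFE:
            size, code_set = 2, 1      # code set 1: two high bytes
        else:
            raise ValueError(f"unexpected lead byte 0x{lead:02X} at offset {i}")
        chunk = data[i:i + size]
        # Every byte after the lead must be present and in the assumed A1-FE range.
        if len(chunk) != size or any(not 0xA1 <= b <= 0xFE for b in chunk[1:]):
            raise ValueError(f"malformed sequence at offset {i}")
        out.append((code_set, chunk))
        i += size
    return out
```

A decoder built on this split would then need one mapping table per code set, which is where the implementations described above differ.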

CP51932 is:
(0) ASCII (yen sign/back solidus is beyond the scope of this ticket)
(1) JIS X 0208-1983
    NEC special characters
    NEC selected IBM extended characters
(2) Half-width katakana
http://nkf.sourceforge.jp/ucm/cp51932.ucm

All browsers except Safari can show this character set.

Safari cannot show NEC special characters,
but Chrome, which uses the same engine as Safari (WebKit), can show them,
so I think this is a bug in Safari.
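
To make the CP51932 repertoire above concrete, here is a rough Python sketch that decodes such bytes by re-packing each two-byte EUC code as Shift_JIS and handing the result to Python's cp932 codec. This rests on an assumption not stated in this bug: that CP51932 can be approximated as an EUC packing of the rows CP932 carries. Python ships no cp51932 codec, so the names euc_pair_to_sjis and decode_cp51932_sketch are hypothetical, and the result is an approximation rather than the registered mapping table.

```python
def euc_pair_to_sjis(b1: int, b2: int) -> bytes:
    """Re-pack one two-byte EUC code as Shift_JIS (standard row/cell arithmetic)."""
    j1, j2 = b1 - 0x80, b2 - 0x80          # EUC -> 7-bit JIS row and cell
    s1 = ((j1 - 0x21) >> 1) + 0x81
    if s1 > 0x9F:
        s1 += 0x40                         # skip the single-byte katakana range
    if j1 & 1:                             # odd rows use the lower trail-byte half
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1                        # 0x7F is never a Shift_JIS trail byte
    else:
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def decode_cp51932_sketch(data: bytes) -> str:
    """Approximate CP51932 decoding via cp932 (assumption, see the note above)."""
    sjis = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                       # (0) ASCII
            sjis.append(b)
            i += 1
        elif b == 0x8E and i + 1 < len(data):          # (2) half-width katakana
            sjis.append(data[i + 1])
            i += 2
        elif 0xA1 <= b <= 0xFE and i + 1 < len(data):  # (1) two-byte rows
            sjis += euc_pair_to_sjis(b, data[i + 1])
            i += 2
        else:                              # includes SS3 (0x8F): no JIS X 0212 here
            raise ValueError(f"unsupported byte 0x{b:02X} at offset {i}")
    return bytes(sjis).decode("cp932")
```

For example, the NEC special character at EUC bytes AD A1 becomes Shift_JIS 87 40 under this arithmetic, which cp932 decodes as a circled digit one.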
Comment 9 Ian 'Hixie' Hickson 2010-03-31 20:11:37 UTC
Forgive me, for I am not well-versed in these encodings.

What should I put in the spec in the "Character encoding overrides" table?
Comment 10 NARUSE, Yui 2010-04-02 23:11:59 UTC
(In reply to comment #9)
> Forgive me, for I am not well-versed in these encodings.
> 
> What should I put in the spec in the "Character encoding overrides" table?

I think what you mean is that saying "'EUC-JP' is actually Windows Codepage 51932" is not helpful for readers of HTML5.
That is reasonable, so I'm trying to register CP51932:
http://mail.apps.ietf.org/ietf/charsets/msg01877.html
Comment 11 Ian 'Hixie' Hickson 2010-04-12 22:11:08 UTC
Thank you for starting the registration process. Much appreciated. I'll update the spec once the registry is updated.
Comment 12 Ian 'Hixie' Hickson 2010-04-13 09:36:23 UTC
Marking this REMIND for now for tracking purposes; please feel free to reopen whenever the encoding is registered. I'll check this periodically.
Comment 13 Masatoshi Kimura 2010-09-17 12:29:38 UTC
CP51932 has been registered now.
http://www.iana.org/assignments/character-sets
http://www.iana.org/assignments/charset-reg/CP51932
You can use it as a replacement encoding for EUC-JP.
Comment 14 Ian 'Hixie' Hickson 2010-09-29 19:12:33 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: Concurred with reporter's comments.

I've added the EUC-JP to CP51932 mapping.

Should there also be a mapping for ISO-2022-JP? This was mentioned in the first comment but wasn't mentioned afterwards.

Also, do you know what I should use as the EUC-JP reference?
Comment 15 contributor 2010-09-29 19:15:10 UTC
Checked in as WHATWG revision r5560.
Check-in comment: Canonical mapping for EUC-JP for compat reasons.
http://html5.org/tools/web-apps-tracker?from=5559&to=5560
Comment 16 NARUSE, Yui 2010-09-29 22:33:54 UTC
(In reply to comment #14)
> EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
> satisfied with this response, please change the state of this bug to CLOSED. If
> you have additional information and would like the editor to reconsider, please
> reopen this bug. If you would like to escalate the issue to the full HTML
> Working Group, please add the TrackerRequest keyword to this bug, and suggest
> title and text for the tracker issue; or you may create a tracker issue
> yourself, if you are able to do so. For more details, see this document:
>    http://dev.w3.org/html5/decision-policy/decision-policy.html
> 
> Status: Accepted
> Change Description: see diff given below
> Rationale: Concurred with reporter's comments.
> 
> I've added the EUC-JP to CP51932 mapping.

Thank you!

> Should there also be a mapping for ISO-2022-JP? This was mentioned in the first
> comment but wasn't mentioned afterwards.

I posted a registration of CP50220 on 2010-09-17.
When it is registered, it should be used as ISO-2022-JP.

> Also, do you know what I should use as the EUC-JP reference?

EUC-JP, which includes US-ASCII, JIS X 0201 Katakana, JIS X 0208, and JIS X 0212,
is defined in "UI-OSF 日本語環境実装規約 Version 1.1".
http://home.m05.itscom.net/numa/uocjle-a4.pdf (voluntarily uploaded)
http://home.m05.itscom.net/numa/uocjleE.pdf (voluntarily uploaded)

It is referenced by at least Japanese-Locale-Policy and Solaris 10's Japanese Manual.
http://www.linux.or.jp/JF/JFdocs/Japanese-Locale-Policy.txt
http://docs.sun.com/app/docs/doc/819-0364/ja.locale-10002?a=view
Comment 17 Henri Sivonen 2010-09-30 06:34:40 UTC
FWIW, Gecko indeed uses Microsoft-style data tables instead of the de jure tables for all three Japanese encodings. (Except on OS/2 where IBM-style tables are used instead.)
Comment 18 Ian 'Hixie' Hickson 2010-10-05 22:04:55 UTC
For the reference, I use a single document title, a list of names of editors, if any, and the name of the standards organisation that published the document, if any. Could you let me know what I should use for EUC-JP based on your comments above? Ideally using just ASCII and English; I'm afraid my understanding of Japanese is rather limited. :-(
Comment 19 NARUSE, Yui 2010-10-05 23:12:28 UTC
(In reply to comment #18)
> For the reference, I use a single document title, a list of names of editors,
> if any, and the name of the standards organisation that published the document,
> if any. Could you let me know what I should use of EUC-JP based on your
> comments above? Ideally using just ASCII and English, I'm afraid my
> understanding of Japanese is rather limited. :-(

It should be "Definition and Notes of Japanese EUC".
It is written by UI-OSF-USLP.
(the Open Software Foundation, Inc., UNIX International, Inc., and UNIX System Laboratories Pacific, Ltd.) see C.1.1
It is included in Annex C of http://home.m05.itscom.net/numa/uocjleE.pdf

P.S. I'm ok about "Y. Naruse" in http://html5.org/tools/web-apps-tracker?from=5559&to=5560
Comment 20 Ian 'Hixie' Hickson 2010-10-12 08:03:23 UTC
> It should be "Definition and Notes of Japanese EUC".
> It is written by UI-OSF-USLP.
> (the Open Software Foundation, Inc., UNIX International, Inc., and UNIX System
> Laboratories Pacific, Ltd.) see C.1.1
> It is included in Annex C of http://home.m05.itscom.net/numa/uocjleE.pdf

Awesome, thanks. I've updated the spec (diff below).

> P.S. I'm ok about "Y. Naruse" in
> http://html5.org/tools/web-apps-tracker?from=5559&to=5560

Thanks, that makes my life easier. :-)


I'll mark this bug REMIND again while we wait for IANA to register CP50220. Please don't hesitate to reopen the bug once it's registered so that I can update the spec accordingly.

Thank you so much for your patience and help with this bug. It is much appreciated.
Comment 21 contributor 2010-10-12 08:03:47 UTC
Checked in as WHATWG revision r5607.
Check-in comment: EUC-JP reference.
http://html5.org/tools/web-apps-tracker?from=5606&to=5607
Comment 22 NARUSE, Yui 2011-10-01 13:50:23 UTC
(In reply to comment #20)
> I'll mark this bug REMIND again while we wait for IANA to register CP50220.
> Please don't hesitate to reopen the bug once it's registered so that I can
> update the spec accordingly.

CP50220 was recently registered as MIBenum: 2260.
http://www.iana.org/assignments/character-sets
Comment 23 Ian 'Hixie' Hickson 2011-10-03 22:26:46 UTC
Awesome, thanks.
Comment 24 Ian 'Hixie' Hickson 2011-10-06 06:35:43 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: I've added the ISO-2022-JP mapping as requested. Please check the diff below and the spec as it now stands, and let me know if there's anything further that needs doing (reopen the bug if so).

Thanks again for your help, much appreciated!
Comment 25 contributor 2011-10-06 06:40:15 UTC
Checked in as WHATWG revision r6646.
Check-in comment: Define compatibility mapping for ISO-2022-JP.
http://html5.org/tools/web-apps-tracker?from=6645&to=6646
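
The two compatibility mappings discussed in this bug can be pictured as a small lookup applied to a declared encoding label. The Python sketch below is illustrative only. The names ENCODING_OVERRIDES and effective_encoding are hypothetical, the EUC-JP entry follows comment 14, and the ISO-2022-JP entry assumes the mapping added in r6646 points to CP50220 as requested in comment 16. It does not reproduce the spec's actual "Character encoding overrides" table.

```python
# Hypothetical override table: declared label -> encoding actually used.
ENCODING_OVERRIDES = {
    "euc-jp": "CP51932",        # per comment 14
    "iso-2022-jp": "CP50220",   # assumed from comments 16 and 24
}


def effective_encoding(label: str) -> str:
    """Return the override for a declared label, or the label itself if none applies."""
    cleaned = label.strip()
    return ENCODING_OVERRIDES.get(cleaned.lower(), cleaned)
```

Labels are compared case-insensitively here, so "EUC-JP" and "euc-jp" resolve to the same entry, while labels without an entry pass through unchanged.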
Comment 26 NARUSE, Yui 2011-11-09 07:45:51 UTC
I'm ok, thanks!