W3C is pleased to receive the XML Japanese Profile from Xerox, Panasonic, Toshiba, GLOCOM, Academia Sinica, Alis Technologies, and Sun Microsystems.
The submission provides advice on how to encode and transfer Japanese data in XML. The main concern is to avoid conversion problems between legacy encodings such as shift_jis, euc-jp, and iso-2022-jp and Unicode/ISO 10646-based encodings such as UTF-8 and UTF-16. The problems appear in three areas. The first is the 7-bit code points 0x5C (backslash or yen) and 0x7E (tilde or overline). The second is a series of about ten symbols that are mapped differently by different converters on different systems. The third is the presence or absence of vendor-specific extensions. The bulk of the characters, in particular all standard kanji and all letters, are fortunately not affected by these problems.
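The three problem areas can be illustrated with a minimal sketch. It assumes that the codecs named shift_jis and cp932 in a standard CPython build stand in for two "different converters"; the helper names and the sample byte sequences are chosen here for illustration and are not taken from the submission.

```python
# Illustrative sketch only: CPython's built-in "shift_jis" and "cp932" codecs
# play the role of two different converters. Helper names are hypothetical.

def round_trips(text: str, encoding: str) -> bool:
    """Return True if text survives encode -> decode unchanged."""
    return text.encode(encoding).decode(encoding) == text


def decode_or_none(data: bytes, encoding: str):
    """Decode data, or return None if the codec rejects the byte sequence."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Area 1: the 7-bit code point 0x5C (backslash or yen).
YEN = "\u00a5"                            # YEN SIGN
yen_bytes = YEN.encode("shift_jis")       # the codec folds it onto byte 0x5C
yen_back = yen_bytes.decode("shift_jis")  # decoding yields a backslash instead

# Area 2: a symbol that the two converters map to different code points.
WAVE = b"\x81\x60"
wave_sjis = WAVE.decode("shift_jis")      # WAVE DASH
wave_cp932 = WAVE.decode("cp932")         # FULLWIDTH TILDE

# Area 3: a vendor-specific extension that only one converter knows.
CIRCLED_ONE = b"\x87\x40"
ext_cp932 = decode_or_none(CIRCLED_ONE, "cp932")
ext_sjis = decode_or_none(CIRCLED_ONE, "shift_jis")

if __name__ == "__main__":
    print("yen round-trips:", round_trips(YEN, "shift_jis"))
    print("0x8160 ->", f"U+{ord(wave_sjis):04X}", "vs", f"U+{ord(wave_cp932):04X}")
    print("0x8740 ->", repr(ext_cp932), "vs", repr(ext_sjis))
    print("kanji round-trips:", round_trips("漢字", "shift_jis"))
```

Standard kanji pass through both codecs unchanged, which matches the observation that the bulk of the characters are not affected.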
The submission provides advice on how to avoid conversion problems, in order of priority:
While such conversion problems are not very important for HTML pages viewed directly in a browser, they become very serious in the context of automatic data exchange and digital signatures. Ideally, the vendors and consortia that defined the various incompatible conversion tables would agree to converge on a single conversion table.
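To see why digital signatures are sensitive to this, consider the following sketch. It assumes two parties convert the same legacy bytes to UTF-8 with different conversion tables (again approximated by CPython's shift_jis and cp932 codecs) and then hash the result with SHA-256. The function name and the sample message are hypothetical.

```python
import hashlib


def utf8_digest(legacy_bytes: bytes, legacy_codec: str) -> str:
    """Convert legacy bytes to UTF-8 via the given codec, then hash the result."""
    utf8_bytes = legacy_bytes.decode(legacy_codec).encode("utf-8")
    return hashlib.sha256(utf8_bytes).hexdigest()


# Hypothetical message: two standard kanji followed by the byte pair 0x81 0x60,
# which the two codecs map to different Unicode code points.
kanji_only = "価格".encode("shift_jis")
message = kanji_only + b"\x81\x60"

digest_a = utf8_digest(message, "shift_jis")
digest_b = utf8_digest(message, "cp932")

if __name__ == "__main__":
    # The digests differ although both parties started from identical bytes.
    print("digests match:", digest_a == digest_b)
    # With kanji alone, both conversions agree and so do the digests.
    print("kanji-only digests match:",
          utf8_digest(kanji_only, "shift_jis") == utf8_digest(kanji_only, "cp932"))
```

A signature computed over one party's UTF-8 output would therefore fail to verify against the other party's output, even though no data was altered in the legacy encoding.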
The submission will be brought to the attention of the Internationalization Working Group and Interest Group as well as the XML Signature Working Group.
Disclaimer: Placing a Submission on a Working Group agenda does not imply endorsement by either the W3C Staff or the participants of the Working Group, nor does it guarantee that the Working Group will agree to take any specific action on a Submission.
This Member Submission was updated on 24 March 2005. The updates were editorial, or were made to reflect updates to base standards. The above team comment still applies. The Member Submission has been useful as a reference in several W3C Recommendations.