Last update 1/9/98 by christw@microsoft.com
The IANA charset registry is at ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets.
Display Name | Preferred Charset ID (1) | Additional Aliases (1) | MLANG code page (4) | Supported by versions (2) | Notes |
---|---|---|---|---|---|
Arabic ASMO-708 | ASMO-708 | | 708 | 4CS | logical order |
Arabic Alphabet (DOS) | DOS-720 | | 720 | 4CS | logical order |
Arabic Alphabet (ISO) | iso-8859-6 | ISO_8859-6:1987, iso-ir-127, ISO_8859-6, ECMA-114, arabic, csISOLatinArabic | 28596 | 4CS | logical order |
Arabic Alphabet (Windows) | windows-1256 | | 1256 | 4CS | logical order |
Baltic Alphabet (ISO) | iso-8859-4 | csISOLatin4, iso-ir-110, ISO_8859-4, ISO_8859-4:1988, l4, latin4 | 28594 | 4 | |
Baltic Alphabet (Windows) | windows-1257 | | 1257 | 3,4 | |
Central European (DOS) | ibm852 | cp852 | 852 | 4 | |
Central European Alphabet (ISO) | iso-8859-2 | csISOLatin2, iso-ir-101, iso8859-2, iso_8859-2, iso_8859-2:1987, l2, latin2 | 28592 | 3,4 | |
Central European Alphabet (Windows) | windows-1250 | x-cp1250 | 1250 | 3,4 | |
Chinese Simplified (GB2312) | gb2312 | chinese, csGB2312, csISO58GB231280, GB2312, GBK, GB_2312-80, iso-ir-58 | 936 | 3,4 | BUG: mislabeled under Win95 and NT: gb2312 may contain GBK-only characters. BUG: GB_2312-80 is an ISO 2022 7-bit charset. |
Chinese Simplified (HZ) | hz-gb-2312 | | 52936 | 4 | |
Chinese Traditional | big5 | csbig5, x-x-big5 | 950 | 3,4 | |
Cyrillic Alphabet (DOS) | cp866 | ibm866 | 866 | 4 | |
Cyrillic Alphabet (ISO) | iso-8859-5 | csISOLatinCyrillic, cyrillic, iso-ir-144, ISO_8859-5, ISO_8859-5:1988 | 28595 | 4 | Bug: csISOLatin5, l5, and latin5 are also accepted as aliases but should not be. |
Cyrillic Alphabet (KOI8-R) | koi8-r | csKOI8R, koi | 20866 | 3,4 | |
Cyrillic Alphabet (Windows) | windows-1251 | x-cp1251 | 1251 | 3,4 | |
Greek Alphabet (ISO) | iso-8859-7 | csISOLatinGreek, ECMA-118, ELOT_928, greek, greek8, iso-ir-126, ISO_8859-7, ISO_8859-7:1987 | 28597 | 3,4 | |
Greek Alphabet (Windows) | iso-8859-7 | windows-1253 | 1253 | 3,4 | BUG: Mislabeled: capital alpha with tonos differs from iso-8859-7 |
Hebrew Alphabet (DOS) | DOS-862 | | 862 | 4CS | logical order |
Hebrew Alphabet (ISO) | iso-8859-8 | csISOLatinHebrew, hebrew, iso-ir-138, ISO_8859-8, visual, ISO-8859-8 Visual | 28598 | 4CS | visual order |
Hebrew Alphabet (Windows) | windows-1255 | logical, ISO_8859-8:1988, iso-ir-138 | 1255 | 3CS,4CS | logical order |
Japanese (JIS) | iso-2022-jp | csISO2022JP | 50220 | 4 | Converts half-width (HW) Kana to full-width (FW) Kana on outgoing documents. On incoming documents it supports all JIS variants. |
Japanese (JIS - allow 1-byte Kana) | csISO2022JP | iso-2022-jp | 50221 | 4 | Sends HW Kana with ESC |
Japanese (JIS - allow 1-byte Kana) | iso-2022-jp | csISO2022JP | 50222 | 3,4 | Sends HW Kana with SI/SO |
Japanese (EUC) | euc-jp | csEUCPkdFmtJapanese, Extended_UNIX_Code_Packed_Format_for_Japanese, x-euc, x-euc-jp | 51932 | 3,4 | |
Japanese (Shift-JIS) | shift_jis | csShiftJIS, csWindows31J, ms_Kanji, shift-jis, x-ms-cp932, x-sjis | 932 | 3,4 | |
Korean | ks_c_5601-1987 | csKSC56011987, euc-kr, korean, ks_c_5601 | 949 | 3,4 | For labeling mail messages the charset ID "euc-kr" is used. Bug: for IE5 make "KSC5601" preferred |
Korean (ISO) | iso-2022-kr | csISO2022KR | 50225 | 3,4 | |
Latin 3 Alphabet (ISO) | iso-8859-3 | | 28593 | 4 | not used in the real world |
Thai (Windows) | windows-874 | | 874 | 3,4 | |
Turkish Alphabet (Windows) | iso-8859-9 | Windows-1254 | 1254 | 3,4 | Bug: entries for Latin5 are missing. Note: no provisions taken to eliminate reserved 0x80-0x9F |
Ukrainian Alphabet (KOI8-RU) | koi8-ru | | 21866 | 4 | BUG: proper name is "Cyrillic (Ukrainian)" |
Universal Alphabet (UTF-7) | utf-7 | csUnicode11UTF7, unicode-1-1-utf-7, x-unicode-2-0-utf-7 | 65000 | 4 | |
Universal Alphabet (UTF-8) | utf-8 | unicode-1-1-utf-8, unicode-2-0-utf-8, x-unicode-2-0-utf-8 | 65001 | 4 | |
Vietnamese Alphabet (Windows) | windows-1258 | | 1258 | 3,4 | |
Western Alphabet | iso-8859-1 | ANSI_X3.4-1968, ANSI_X3.4-1986, ascii, cp367, cp819, csASCII, IBM367, ibm819, iso-ir-100, iso-ir-6, ISO646-US, iso8859-1, ISO_646.irv:1991, iso_8859-1, iso_8859-1:1987, latin1, us, us-ascii, windows-1252, x-ansi | 1252 | 3,4 | BUG: Mislabeled for us-ascii-only documents. Note: no provisions taken to eliminate reserved 0x80-0x9F |
Nonstandard charsets with special meaning inside Internet Explorer and MLANG (not to be used for labeling documents) | | | | | |
Universal Alphabet | unicode | | 1200 | 4 | Charset name is not relevant: BOM (3) is used to denote UCS-2 |
Universal Alphabet (Big-Endian) | unicodeFEFF | | 1201 | 4 | Charset name is not relevant: BOM is used to denote UCS-2 |
User Defined | x-user-defined | | 50000 | 4 | Uses windows-1252 decoder allowing any font. Designed to support "webfonts" that replace Latin1 with user defined characters. |
Korean (Auto Select) | _autodetect_kr | | 50949 | 4 | ONLY for user selection on incoming documents, not for labeling. |
Japanese (Auto Select) | _autodetect | | 50932 | 3,4 | ONLY for user selection on incoming documents, not for labeling. |
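The table can be read as a map from any charset label (preferred ID or alias) to an MLANG code page. The following is a minimal sketch of such a lookup, not MLANG's actual implementation: the names `ROWS`, `build_alias_map`, and `lookup_charset` are hypothetical, only a handful of rows are copied in, and case-insensitive matching is an assumption borrowed from the usual handling of IANA charset names.

```python
# Illustrative subset of the table: (preferred charset ID, aliases, MLANG code page).
# These names are hypothetical helpers, not part of MLANG or Internet Explorer.
ROWS = [
    ("iso-8859-2",
     ["csISOLatin2", "iso-ir-101", "iso8859-2", "iso_8859-2",
      "iso_8859-2:1987", "l2", "latin2"],
     28592),
    ("windows-1250", ["x-cp1250"], 1250),
    ("koi8-r", ["csKOI8R", "koi"], 20866),
    ("shift_jis",
     ["csShiftJIS", "csWindows31J", "ms_Kanji", "shift-jis",
      "x-ms-cp932", "x-sjis"],
     932),
    ("utf-8",
     ["unicode-1-1-utf-8", "unicode-2-0-utf-8", "x-unicode-2-0-utf-8"],
     65001),
]


def build_alias_map(rows):
    """Index every preferred ID and alias (lowercased) by its table row."""
    alias_map = {}
    for preferred, aliases, code_page in rows:
        for name in [preferred, *aliases]:
            alias_map[name.lower()] = (preferred, code_page)
    return alias_map


ALIAS_MAP = build_alias_map(ROWS)


def lookup_charset(label):
    """Return (preferred charset ID, code page), or None for an unknown label.

    Assumption: labels are compared case-insensitively after trimming spaces.
    """
    return ALIAS_MAP.get(label.strip().lower())
```

For instance, `lookup_charset("Latin2")` resolves to the iso-8859-2 row, while a label that is absent from the map yields `None` so the caller can fall back to a default.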
(1) Aliases not supported by version 3 are colored blue. Refer to the "Supported by versions" column to find out whether the whole charset, including its aliases, is unsupported by IE3.
(2) "CS" after the version number signals that the support is contained only in the "Complex Script" version of Internet Explorer. The Complex Script versions are labeled as "Hebrew", "Arabic", "Middle East" or "Thai".
(3) BOM stands for Byte Order Mark; it is the first 2 bytes of a UCS-2 document.
(4) MLANG.DLL is the charset translation library inside IE. MLANG uses the same code page IDs as Windows, but not all MLANG IDs are available on all versions of Windows. In other words, a code page ID listed here can be passed to MultiByteToWideChar(), but not every ID is available on every version of Windows, even with the appropriate language pack. For example, you cannot use MultiByteToWideChar() with code page 65001 on Windows 95.
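Footnote (3) and the two "Universal Alphabet" rows can be illustrated with a small BOM check. This is a sketch under stated assumptions rather than MLANG's own logic: the function name `ucs2_code_page_from_bom` is hypothetical, and it assumes that code page 1200 is the little-endian form (BOM bytes FF FE) and 1201 the big-endian form (BOM bytes FE FF).

```python
def ucs2_code_page_from_bom(data: bytes):
    """Guess the UCS-2 code page from the first two bytes of a document.

    Hypothetical helper. Assumes 1200 = little-endian and 1201 = big-endian,
    matching the "Universal Alphabet" rows of the table.
    Returns None when the document does not start with a BOM.
    """
    bom = data[:2]
    if bom == b"\xff\xfe":   # U+FEFF serialized low byte first
        return 1200
    if bom == b"\xfe\xff":   # U+FEFF serialized high byte first
        return 1201
    return None
```

A document without a BOM returns `None`, in which case the charset label (or user selection) has to decide the encoding instead.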