Charsets in Microsoft Internet Explorer 4

Last updated 1/9/98 by christw@microsoft.com

The IANA charset registry is at ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets.

Display Name | Preferred Charset ID (1) | Additional Aliases (1) | MLANG code page (4) | Supported by versions (2) | Notes
Arabic ASMO-708 | ASMO-708 | | 708 | 4CS | logical order
Arabic Alphabet (DOS) | DOS-720 | | 720 | 4CS | logical order
Arabic Alphabet (ISO) | iso-8859-6 | ISO_8859-6:1987, iso-ir-127, ISO_8859-6, ECMA-114, arabic, csISOLatinArabic | 28596 | 4CS | logical order
Arabic Alphabet (Windows) | windows-1256 | | 1256 | 4CS | logical order
Baltic Alphabet (ISO) | iso-8859-4 | csISOLatin4, iso-ir-110, ISO_8859-4, ISO_8859-4:1988, l4, latin4 | 28594 | 4 |
Baltic Alphabet (Windows) | windows-1257 | | 1257 | 3,4 |
Central European (DOS) | ibm852 | cp852 | 852 | 4 |
Central European Alphabet (ISO) | iso-8859-2 | csISOLatin2, iso-ir-101, iso8859-2, iso_8859-2, iso_8859-2:1987, l2, latin2 | 28592 | 3,4 |
Central European Alphabet (Windows) | windows-1250 | x-cp1250 | 1250 | 3,4 |
Chinese Simplified (GB2312) | gb2312 | chinese, csGB2312, csISO58GB231280, GB2312, GBK, GB_2312-80, iso-ir-58 | 936 | 3,4 | BUG: mislabeled under Win95 and NT: gb2312 may contain GBK-only characters. BUG: GB_2312-80 is an ISO 2022 7-bit charset.
Chinese Simplified (HZ) | hz-gb-2312 | | 52936 | 4 |
Chinese Traditional | big5 | csbig5, x-x-big5 | 950 | 3,4 |
Cyrillic Alphabet (DOS) | cp866 | ibm866 | 866 | 4 |
Cyrillic Alphabet (ISO) | iso-8859-5 | csISOLatinCyrillic, cyrillic, iso-ir-144, ISO_8859-5, ISO_8859-5:1988 | 28595 | 4 | BUG: csISOLatin5, l5, and latin5 are also accepted as aliases and should not be.
Cyrillic Alphabet (KOI8-R) | koi8-r | csKOI8R, koi | 20866 | 3,4 |
Cyrillic Alphabet (Windows) | windows-1251 | x-cp1251 | 1251 | 3,4 |
Greek Alphabet (ISO) | iso-8859-7 | csISOLatinGreek, ECMA-118, ELOT_928, greek, greek8, iso-ir-126, ISO_8859-7, ISO_8859-7:1987 | 28597 | 3,4 |
Greek Alphabet (Windows) | iso-8859-7 | windows-1253 | 1253 | 3,4 | BUG: Mislabeled: capital alpha with tonos differs from iso-8859-7
Hebrew Alphabet (DOS) | DOS-862 | | 862 | 4CS | logical order
Hebrew Alphabet (ISO) | iso-8859-8 | csISOLatinHebrew, hebrew, iso-ir-138, ISO_8859-8, visual, ISO-8859-8 Visual | 28598 | 4CS | visual order
Hebrew Alphabet (Windows) | windows-1255 | logical, ISO_8859-8:1988, iso-ir-138 | 1255 | 3CS,4CS | logical order
Japanese (JIS) | iso-2022-jp | csISO2022JP | 50220 | 4 | Converts half-width (HW) to full-width (FW) Kana on outgoing text. On incoming text it supports all JIS variants.
Japanese (JIS - allow 1-byte Kana) | csISO2022JP | iso-2022-jp | 50221 | 4 | Sends HW Kana with ESC
Japanese (JIS - allow 1-byte Kana) | iso-2022-jp | csISO2022JP | 50222 | 3,4 | Sends HW Kana with SI/SO
Japanese (EUC) | euc-jp | csEUCPkdFmtJapanese, Extended_UNIX_Code_Packed_Format_for_Japanese, x-euc, x-euc-jp | 51932 | 3,4 |
Japanese (Shift-JIS) | shift_jis | csShiftJIS, csWindows31J, ms_Kanji, shift-jis, x-ms-cp932, x-sjis | 932 | 3,4 |
Korean | ks_c_5601-1987 | csKSC56011987, euc-kr, korean, ks_c_5601 | 949 | 3,4 | For labeling mail messages the charset ID "euc-kr" is used. BUG: for IE5 make "KSC5601" preferred.
Korean (ISO) | iso-2022-kr | csISO2022KR | 50225 | 3,4 |
Latin 3 Alphabet (ISO) | iso-8859-3 | | 28593 | 4 | not used in the real world
Thai (Windows) | windows-874 | | 874 | 3,4 |
Turkish Alphabet (Windows) | iso-8859-9 | Windows-1254 | 1254 | 3,4 | BUG: entries for Latin5 are missing. Note: no provisions taken to eliminate reserved 0x80-0x9F.
Ukrainian Alphabet (KOI8-RU) | koi8-ru | | 21866 | 4 | BUG: proper name is "Cyrillic (Ukrainian)"
Universal Alphabet (UTF-7) | utf-7 | csUnicode11UTF7, unicode-1-1-utf-7, x-unicode-2-0-utf-7 | 65000 | 4 |
Universal Alphabet (UTF-8) | utf-8 | unicode-1-1-utf-8, unicode-2-0-utf-8, x-unicode-2-0-utf-8 | 65001 | 4 |
Vietnamese Alphabet (Windows) | windows-1258 | | 1258 | 3,4 |
Western Alphabet | iso-8859-1 | ANSI_X3.4-1968, ANSI_X3.4-1986, ascii, cp367, cp819, csASCII, IBM367, ibm819, iso-ir-100, iso-ir-6, ISO646-US, iso8859-1, ISO_646.irv:1991, iso_8859-1, iso_8859-1:1987, latin1, us, us-ascii, windows-1252, x-ansi | 1252 | 3,4 | BUG: Mislabeled for us-ascii-only documents. Note: no provisions taken to eliminate reserved 0x80-0x9F.

Nonstandard charsets with special meaning inside Internet Explorer and MLANG
- not to be used for labeling documents -

Universal Alphabet | unicode | | 1200 | 4 | Charset name is not relevant: BOM (3) is used to denote UCS-2
Universal Alphabet (Big-Endian) | unicodeFEFF | | 1201 | 4 | Charset name is not relevant: BOM is used to denote UCS-2
User Defined | x-user-defined | | 50000 | 4 | Uses windows-1252 decoder allowing any font. Designed to support "webfonts" that replace Latin1 with user defined characters.
Korean (Auto Select) | _autodetect_kr | | 50949 | 4 | ONLY for user selection on incoming documents, not for labeling.
Japanese (Auto Select) | _autodetect | | 50932 | 3,4 | ONLY for user selection on incoming documents, not for labeling.
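
The table above amounts to a mapping from charset names and aliases to MLANG code page IDs. The following is a minimal Python sketch of such a lookup, built from a handful of rows copied from the table. The names CHARSETS, ALIAS_TO_CODE_PAGE and code_page_for are hypothetical and not part of MLANG, and the case-insensitive comparison is an assumption: the table does not say how IE compares charset names.

```python
# Illustrative subset of the table:
# (preferred charset ID, some of its aliases, MLANG code page ID).
CHARSETS = [
    ("iso-8859-2", ["csISOLatin2", "iso-ir-101", "l2", "latin2"], 28592),
    ("windows-1250", ["x-cp1250"], 1250),
    ("koi8-r", ["csKOI8R", "koi"], 20866),
    ("shift_jis", ["csShiftJIS", "ms_Kanji", "x-sjis"], 932),
    ("euc-jp", ["x-euc", "x-euc-jp"], 51932),
    ("utf-8", ["unicode-1-1-utf-8", "unicode-2-0-utf-8"], 65001),
]

# Flatten into one dictionary keyed by lower-cased name.
# Assumption: names are matched case-insensitively.
ALIAS_TO_CODE_PAGE = {}
for preferred, aliases, code_page in CHARSETS:
    for name in [preferred, *aliases]:
        ALIAS_TO_CODE_PAGE[name.lower()] = code_page


def code_page_for(charset_name):
    """Return the code page ID for a charset name, or None if it is unknown."""
    return ALIAS_TO_CODE_PAGE.get(charset_name.strip().lower())
```

A lookup keyed only by name cannot represent the three Japanese (JIS) rows, which share the names iso-2022-jp and csISO2022JP across code pages 50220, 50221 and 50222, so a real implementation would need some additional rule for those.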

(1) Aliases not supported by version 3 are colored blue. Refer to the "Supported by versions" column to find out whether the whole charset, including its aliases, is unsupported by IE3.

(2) "CS" after the version number indicates that support is available only in the "Complex Script" version of Internet Explorer. The Complex Script versions are labeled as "Hebrew", "Arabic", "Middle East" or "Thai".
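
To illustrate this notation, here is a small Python sketch that splits a "Supported by versions" value such as 3CS,4CS into version numbers and a Complex Script flag. The function name parse_versions and the shape of its return value are hypothetical, chosen only for this example.

```python
def parse_versions(field):
    """Parse a value like "3,4" or "3CS,4CS".

    Returns a list of (version, complex_script_only) pairs.
    """
    result = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries
        cs_only = item.upper().endswith("CS")
        number = item[:-2] if cs_only else item
        result.append((int(number), cs_only))
    return result
```

For example, the Hebrew Alphabet (Windows) value 3CS,4CS yields two entries, both flagged as Complex Script only.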

(3) BOM stands for Byte Order Mark: the first 2 bytes of a UCS-2 document.
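
A minimal Python sketch of BOM-based detection, assuming the BOM is the character U+FEFF stored in the document's byte order, so that the bytes FF FE indicate little-endian and FE FF indicate big-endian. The function name is hypothetical, and this is not a description of how MLANG itself performs the check.

```python
def detect_ucs2_byte_order(data):
    """Return "little", "big", or None based on the first two bytes of data."""
    bom = data[:2]
    if bom == b"\xff\xfe":
        return "little"  # U+FEFF stored least significant byte first
    if bom == b"\xfe\xff":
        return "big"  # U+FEFF stored most significant byte first
    return None  # no BOM found; the byte order cannot be inferred this way
```

Python has no codec named UCS-2. For text limited to the Basic Multilingual Plane, the utf-16-le and utf-16-be codecs can stand in when decoding the bytes that follow the BOM.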

(4) MLANG.DLL is the charset translation library inside IE. MLANG uses the same code page IDs as Windows, but not all MLANG IDs are available on all versions of Windows. In other words, you can pass the same code page ID to MultiByteToWideChar(), but not every code page ID listed here is available on every version of Windows, even with the appropriate language pack. For example, you cannot use MultiByteToWideChar() with code page 65001 on Windows 95.
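
The practical consequence is that code should check whether a converter for a given code page ID exists before relying on it. The sketch below shows that pattern in Python rather than with the Win32 call itself. The mapping CODE_PAGE_TO_CODEC and the function decode_with_code_page are hypothetical, and Python's codecs only approximate the MLANG and Windows converters.

```python
import codecs

# Hypothetical mapping from a few code page IDs in the table to Python codec
# names. These codecs approximate, but are not identical to, the MLANG ones.
CODE_PAGE_TO_CODEC = {
    932: "cp932",
    1250: "cp1250",
    1251: "cp1251",
    1252: "cp1252",
    20866: "koi8_r",
    28592: "iso8859_2",
    65001: "utf_8",
}


def decode_with_code_page(data, code_page):
    """Decode bytes using the codec mapped to code_page.

    Raises LookupError when the code page is not mapped here or when the
    codec is missing from this Python installation.
    """
    codec_name = CODE_PAGE_TO_CODEC.get(code_page)
    if codec_name is None:
        raise LookupError(f"no codec mapped for code page {code_page}")
    codecs.lookup(codec_name)  # raises LookupError if the codec is unavailable
    return data.decode(codec_name)
```

A caller can catch LookupError and fall back to another converter, which mirrors the situation described above where a code page ID is valid for MLANG but not available to MultiByteToWideChar() on a given Windows version.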