This page is no longer maintained and may be inaccurate. For more up-to-date information, see the Internationalization Activity home page.
Here is a partial list of the charsets most typically used for Web pages in various languages:
Language | charset |
---|---|
Afrikaans (af) | iso-8859-1, windows-1252 |
Albanian (sq) | iso-8859-1, windows-1252 |
Arabic (ar) | iso-8859-6 |
Basque (eu) | iso-8859-1, windows-1252 |
Bulgarian (bg) | iso-8859-5 |
Byelorussian (be) | iso-8859-5 |
Catalan (ca) | iso-8859-1, windows-1252 |
Croatian (hr) | iso-8859-2, windows-1250 |
Czech (cs) | iso-8859-2 |
Danish (da) | iso-8859-1, windows-1252 |
Dutch (nl) | iso-8859-1, windows-1252 |
English (en) | iso-8859-1, windows-1252 |
Esperanto (eo) | iso-8859-3* |
Estonian (et) | iso-8859-15 |
Faroese (fo) | iso-8859-1, windows-1252 |
Finnish (fi) | iso-8859-1, windows-1252 |
French (fr) | iso-8859-1, windows-1252 |
Galician (gl) | iso-8859-1, windows-1252 |
German (de) | iso-8859-1, windows-1252 |
Greek (el) | iso-8859-7 |
Hebrew (he; formerly iw) | iso-8859-8 |
Hungarian (hu) | iso-8859-2 |
Icelandic (is) | iso-8859-1, windows-1252 |
Inuit (Eskimo) languages | iso-8859-10* |
Irish (ga) | iso-8859-1, windows-1252 |
Italian (it) | iso-8859-1, windows-1252 |
Japanese (ja) | shift_jis, iso-2022-jp, euc-jp |
Korean (ko) | euc-kr |
Lapp | iso-8859-10* ** |
Latvian (lv) | iso-8859-13, windows-1257 |
Lithuanian (lt) | iso-8859-13, windows-1257 |
Macedonian (mk) | iso-8859-5, windows-1251 |
Maltese (mt) | iso-8859-3* |
Norwegian (no) | iso-8859-1, windows-1252 |
Polish (pl) | iso-8859-2 |
Portuguese (pt) | iso-8859-1, windows-1252 |
Romanian (ro) | iso-8859-2 |
Russian (ru) | koi8-r, iso-8859-5 |
Scottish Gaelic (gd) | iso-8859-1, windows-1252 |
Serbian (sr) cyrillic | windows-1251, iso-8859-5*** |
Serbian (sr) latin | iso-8859-2, windows-1250 |
Slovak (sk) | iso-8859-2 |
Slovenian (sl) | iso-8859-2, windows-1250 |
Spanish (es) | iso-8859-1, windows-1252 |
Swedish (sv) | iso-8859-1, windows-1252 |
Turkish (tr) | iso-8859-9, windows-1254 |
Ukrainian (uk) | iso-8859-5 |
* = scarce support in browsers
** = Lapp doesn't have a 2-letter code; a three-letter code (lap) is proposed in NISO Z39.53.
*** = Serbian can be written in Latin (most commonly used) and Cyrillic (mostly windows-1251).
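As an illustration of how the table above might be used, the following Python sketch copies a few of its rows into a dictionary and checks which of the listed charsets can actually encode a given piece of text. The names `CHARSETS_BY_LANGUAGE` and `encodable_charsets` are hypothetical; only a handful of rows are included, using charset labels that Python's standard codecs accept.

```python
# A small, hypothetical excerpt of the language-to-charset table above,
# keyed by ISO 639 language code. Only a few rows are copied for illustration.
CHARSETS_BY_LANGUAGE = {
    "de": ["iso-8859-1", "windows-1252"],
    "cs": ["iso-8859-2"],
    "el": ["iso-8859-7"],
    "ru": ["koi8-r", "iso-8859-5"],
    "ja": ["shift_jis", "iso-2022-jp", "euc-jp"],
}


def encodable_charsets(language: str, text: str) -> list[str]:
    """Return the table's charsets for `language` that can encode `text`."""
    usable = []
    for charset in CHARSETS_BY_LANGUAGE.get(language, []):
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset lacks at least one character of `text`
        usable.append(charset)
    return usable


if __name__ == "__main__":
    print(encodable_charsets("cs", "Příliš žluťoučký kůň"))  # ['iso-8859-2']
    print(encodable_charsets("ja", "日本語"))
    # Czech letters such as "ř" are missing from Latin-1, hence iso-8859-2.
    print("ř".encode("iso-8859-1", errors="replace"))  # b'?'
```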
Note that UTF-8 can be used for all languages and is the recommended charset on the Internet. Support for it is rapidly increasing.
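To see why a single charset for all languages matters, here is a minimal Python sketch assuming a hypothetical page that mixes Greek, Hebrew and Japanese text. None of the single-language charsets from the table can encode all three scripts, while UTF-8 round-trips the text unchanged. The names `LEGACY_CHARSETS`, `MIXED_TEXT` and `charsets_that_fit` are illustrative only.

```python
# Charsets taken from the table above; each covers only one or a few scripts.
LEGACY_CHARSETS = [
    "iso-8859-1", "windows-1252", "iso-8859-2", "iso-8859-5", "iso-8859-6",
    "iso-8859-7", "iso-8859-8", "koi8-r", "shift_jis", "iso-2022-jp",
    "euc-jp", "euc-kr",
]

# Hypothetical page text mixing Greek, Hebrew and Japanese.
MIXED_TEXT = "Ελλάδα / שלום / 日本語"


def charsets_that_fit(text: str, charsets: list[str]) -> list[str]:
    """Return the charsets in `charsets` that can encode every character of `text`."""
    fit = []
    for charset in charsets:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this charset
        fit.append(charset)
    return fit


if __name__ == "__main__":
    print("legacy charsets that fit:", charsets_that_fit(MIXED_TEXT, LEGACY_CHARSETS))
    encoded = MIXED_TEXT.encode("utf-8")
    assert encoded.decode("utf-8") == MIXED_TEXT  # UTF-8 round-trips every script
    print("UTF-8 bytes:", len(encoded))
```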
For Hebrew in HTML, iso-8859-8 is the same as iso-8859-8-i ('implicit directionality'). This is unlike e-mail, where they are different.
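In practice, a decoder for HTML pages could simply treat the two labels alike. The sketch below assumes a hypothetical helper, `decode_html_bytes`, with a small alias table that maps the `-i` label onto the plain iso-8859-8 codec. This is safe for decoding because both labels share the same byte-to-character mapping and differ only in how directionality is interpreted, which the sketch does not attempt to handle.

```python
# Hypothetical alias table: for decoding purposes the "-i" (implicit
# directionality) label uses the same byte-to-character mapping as iso-8859-8.
HTML_CHARSET_ALIASES = {"iso-8859-8-i": "iso-8859-8"}


def decode_html_bytes(data: bytes, charset_label: str) -> str:
    """Decode page bytes, normalising the charset label first (hypothetical helper)."""
    label = charset_label.strip().lower()
    codec = HTML_CHARSET_ALIASES.get(label, label)
    return data.decode(codec)


if __name__ == "__main__":
    shalom = "שלום".encode("iso-8859-8")
    # Both labels yield identical text; directionality is not modelled here.
    assert decode_html_bytes(shalom, "ISO-8859-8-I") == decode_html_bytes(shalom, "iso-8859-8")
    print(decode_html_bytes(shalom, "ISO-8859-8-I"))
```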
For more 2-letter language codes, see ISO 639.
Thanks to J.N. Zonjee for providing additional data on the Serbo-Croatian family of languages.
A study during the first half of 1997 by the Babel team gave these counts on a sample of 3239 home pages:
charset | count | percentage | languages |
---|---|---|---|
iso-8859-2 | 1 | 0.031% | Czech |
iso-8859-5 | 2 | 0.062% | Russian |
macintosh | 3 | 0.093% | 1 German, 1 French, 1 Italian |
windows-850 | 4 | 0.12% | 1 French, 2 German |
windows-1251 | 6 | 0.19% | Russian |
windows-1250 | 10 | 0.31% | Czech |
euc-jp | 12 | 0.37% | Japanese |
iso-2022-jp | 38 | 1.2% | Japanese |
shift_jis | 51 | 1.6% | Japanese |
windows-1252 (includes iso-8859-1) | 3112 | 96% | 4 Malay, 9 Danish, 14 Finnish, 19 Norwegian, 20 Dutch, 21 Portuguese, 30 Italian, 35 Swedish, 38 Spanish, 57 French, 143 German, 2722 English |
(Counts based on automatic analysis of page contents, not on actual 'charset' labels, since they are still too often missing or incorrect.)
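The percentage column follows directly from the counts. This short Python sketch recomputes it, assuming only the counts copied from the table above; the names `COUNTS` and `percentages` are illustrative.

```python
# Page counts per charset, copied from the 1997 Babel sample above.
COUNTS = {
    "iso-8859-2": 1,
    "iso-8859-5": 2,
    "macintosh": 3,
    "windows-850": 4,
    "windows-1251": 6,
    "windows-1250": 10,
    "euc-jp": 12,
    "iso-2022-jp": 38,
    "shift_jis": 51,
    "windows-1252": 3112,
}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of the sample for each charset, as a percentage of all pages."""
    total = sum(counts.values())
    return {charset: 100 * n / total for charset, n in counts.items()}


if __name__ == "__main__":
    print("total pages:", sum(COUNTS.values()))  # total pages: 3239
    for charset, pct in sorted(percentages(COUNTS).items(), key=lambda kv: kv[1]):
        print(f"{charset:>14}  {pct:6.3f}%")
```

Summing the three Japanese rows the same way gives 101 pages, roughly 3.1% of the sample.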