W3C Internationalization

Warning: This page is no longer maintained; its material is very old and may be inaccurate. For more up-to-date information, see the Internationalization Activity home page.

We recommend the use of UTF-8 wherever possible.

Languages, countries, and the charsets typically used for them

Here is a partial list of the charsets most typically used for Web pages in various languages:

Language                    Typical charsets
Afrikaans (af)              iso-8859-1, windows-1252
Albanian (sq)               iso-8859-1, windows-1252
Arabic (ar)                 iso-8859-6
Basque (eu)                 iso-8859-1, windows-1252
Bulgarian (bg)              iso-8859-5
Byelorussian (be)           iso-8859-5
Catalan (ca)                iso-8859-1, windows-1252
Croatian (hr)               iso-8859-2, windows-1250
Czech (cs)                  iso-8859-2
Danish (da)                 iso-8859-1, windows-1252
Dutch (nl)                  iso-8859-1, windows-1252
English (en)                iso-8859-1, windows-1252
Esperanto (eo)              iso-8859-3*
Estonian (et)               iso-8859-15
Faroese (fo)                iso-8859-1, windows-1252
Finnish (fi)                iso-8859-1, windows-1252
French (fr)                 iso-8859-1, windows-1252
Galician (gl)               iso-8859-1, windows-1252
German (de)                 iso-8859-1, windows-1252
Greek (el)                  iso-8859-7
Hebrew (he, formerly iw)    iso-8859-8
Hungarian (hu)              iso-8859-2
Icelandic (is)              iso-8859-1, windows-1252
Inuit (Eskimo) languages    iso-8859-10*
Irish (ga)                  iso-8859-1, windows-1252
Italian (it)                iso-8859-1, windows-1252
Japanese (ja)               shift_jis, iso-2022-jp, euc-jp
Korean (ko)                 euc-kr
Lapp (Sami)**               iso-8859-10*
Latvian (lv)                iso-8859-13, windows-1257
Lithuanian (lt)             iso-8859-13, windows-1257
Macedonian (mk)             iso-8859-5, windows-1251
Maltese (mt)                iso-8859-3*
Norwegian (no)              iso-8859-1, windows-1252
Polish (pl)                 iso-8859-2
Portuguese (pt)             iso-8859-1, windows-1252
Romanian (ro)               iso-8859-2
Russian (ru)                koi8-r, iso-8859-5
Scottish Gaelic (gd)        iso-8859-1, windows-1252
Serbian (sr), Cyrillic      windows-1251, iso-8859-5***
Serbian (sr), Latin         iso-8859-2, windows-1250
Slovak (sk)                 iso-8859-2
Slovenian (sl)              iso-8859-2, windows-1250
Spanish (es)                iso-8859-1, windows-1252
Swedish (sv)                iso-8859-1, windows-1252
Turkish (tr)                iso-8859-9, windows-1254
Ukrainian (uk)              iso-8859-5

* = scarce support in browsers
** = Lapp has no 2-letter code; a three-letter code (lap) is proposed in NISO Z39.53.
*** = Serbian can be written in Latin script (the most commonly used) or in Cyrillic (mostly windows-1251).

Note that UTF-8 can be used for all languages and is the recommended charset on the Internet. Support for it is rapidly increasing.
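To illustrate why each language is paired with particular charsets, and why UTF-8 removes the need to choose, here is a minimal Python sketch. It checks whether a sample text can be encoded in a language's typical charsets and in UTF-8. The TYPICAL_CHARSETS dictionary (a small subset of the table above) and the helper names encodable_in and check_sample are hypothetical, chosen only for this example; the charset names are the labels from the table, which Python's standard codecs accept.

```python
# Hypothetical subset of the table above: language code -> typical charsets.
TYPICAL_CHARSETS = {
    "cs": ["iso-8859-2"],
    "de": ["iso-8859-1", "windows-1252"],
    "el": ["iso-8859-7"],
    "ru": ["koi8-r", "iso-8859-5"],
}


def encodable_in(text: str, charset: str) -> bool:
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def check_sample(text: str, lang: str) -> dict:
    """Map each typical charset for `lang`, plus UTF-8, to whether it can hold `text`."""
    charsets = TYPICAL_CHARSETS[lang] + ["utf-8"]
    return {name: encodable_in(text, name) for name in charsets}


if __name__ == "__main__":
    czech = "Příliš žluťoučký kůň"
    print(check_sample(czech, "cs"))          # {'iso-8859-2': True, 'utf-8': True}
    print(encodable_in(czech, "iso-8859-1"))  # False: no ř, ť, ů, ... in Latin-1
    mixed = czech + " Привет"
    print(encodable_in(mixed, "iso-8859-2"))  # False: no Cyrillic in Latin-2
    print(encodable_in(mixed, "utf-8"))       # True
```

The Czech sample fits iso-8859-2 but not iso-8859-1, and the mixed Czech and Russian string fits none of the single-language charsets, only UTF-8.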

For Hebrew in HTML, iso-8859-8 is the same as iso-8859-8-i ('implicit directionality'). This differs from e-mail, where the two labels are distinct: there, plain iso-8859-8 implies visual ordering.

For more 2-letter language codes, see ISO 639.

Thanks to J.N. Zonjee for providing additional data on the Serbo-Croatian family of languages.

Most popular charsets

A study during the first half of 1997 by the Babel team gave these counts on a sample of 3239 home pages:

Charset                             Count   Percentage   Languages
iso-8859-2                          1       0.031%       Czech
iso-8859-5                          2       0.062%       Russian
macintosh                           3       0.093%       1 German, 1 French, 1 Italian
windows-850                         4       0.12%        1 French, 2 German
windows-1251                        6       0.19%        Russian
windows-1250                        10      0.31%        Czech
euc-jp                              12      0.37%        Japanese
iso-2022-jp                         38      1.2%         Japanese
shift_jis                           51      1.6%         Japanese
windows-1252 (includes iso-8859-1)  3112    96%          4 Malay, 9 Danish, 14 Finnish, 19 Norwegian, 20 Dutch, 21 Portuguese, 30 Italian, 35 Swedish, 38 Spanish, 57 French, 143 German, 2722 English

(Counts based on automatic analysis of page contents, not on actual 'charset' labels, since they are still too often missing or incorrect.)
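The study's method of analysing page contents is not described here, so the following Python sketch is only a hypothetical heuristic in the same spirit, not the Babel team's approach: try candidate charsets in order with strict decoding and report the first one that succeeds. The function name guess_charset and the candidate lists are assumptions made for this illustration.

```python
from typing import Optional, Sequence


def guess_charset(
    data: bytes,
    candidates: Sequence[str] = ("utf-8", "windows-1252"),
) -> Optional[str]:
    """Return the first charset in `candidates` that decodes `data` cleanly.

    Hypothetical heuristic, not the Babel team's method. Order matters:
    UTF-8 rejects most byte sequences that are not UTF-8, so it goes first;
    single-byte charsets accept almost any input, so they go last.
    """
    for charset in candidates:
        try:
            data.decode(charset)  # strict mode: raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return charset
    return None  # nothing fit; a real detector would need more evidence


if __name__ == "__main__":
    print(guess_charset("Grüße".encode("utf-8")))         # utf-8
    print(guess_charset("Grüße".encode("windows-1252")))  # windows-1252
    print(guess_charset(b"\x81"))  # None: 0x81 is unassigned in Python's windows-1252
    sjis = "日本語".encode("shift_jis")
    print(guess_charset(sjis, ("utf-8", "shift_jis", "windows-1252")))  # shift_jis
```

Because charsets such as iso-8859-1 and iso-8859-2 assign a character to every byte, strict decoding alone cannot tell a German page from a Czech one; separating them would need statistics about which characters actually occur, which this sketch does not attempt.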


W3C Bert Bos, i18n coordinator
Webmaster
Last updated 12 February 2002