Two-letter or three-letter ISO language codes


Should I use two-letter or three-letter ISO language codes in language tags?

The Internet and the Web use language tags to indicate the natural language of text in protocols and formats such as HTML, XHTML, XML, and HTTP. In the past, language tag values were defined by RFC 3066, Tags for the Identification of Languages (and its predecessor, RFC 1766), and they began with either an ISO 639-1 two-letter language code or an ISO 639-2 three-letter code.

For some languages there were both two-letter and three-letter alternatives in the ISO codes. (And for some languages there were even two three-letter alternatives to choose from.) French, for example, has the code fr in ISO 639-1 and both fra and fre in ISO 639-2. People were sometimes confused about which ISO code they should use in a language tag.


The answer is neither!

The current IETF specification describing how to create language tags is known as BCP 47. It no longer sends you to the ISO code lists. Instead, you should look up the appropriate subtags in the IANA Language Subtag Registry. This registry contains only one subtag per language, so there is no longer any ambiguity.
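As a rough illustration of what looking up a subtag involves, here is a minimal Python sketch that reads registry-style records and collects the language subtags. The sample text is an abbreviated, hand-written excerpt in the registry's record layout (records separated by %% lines, one "Field: value" pair per line), not a copy of the live file, and the function names are hypothetical. Real records carry more fields and may wrap long values across lines, which this sketch does not handle.

```python
# Abbreviated, illustrative excerpt in the registry's record layout.
# Real records contain additional fields; consult the live registry for details.
SAMPLE_REGISTRY = """\
%%
Type: language
Subtag: fr
Description: French
%%
Type: language
Subtag: haw
Description: Hawaiian
%%
Type: region
Subtag: FR
Description: France
"""


def parse_records(text):
    """Split registry-style text into a list of field dictionaries."""
    records = []
    for chunk in text.split("%%"):
        fields = {}
        for line in chunk.strip().splitlines():
            # Each line is "Field: value"; lines without a colon are skipped.
            name, sep, value = line.partition(":")
            if sep:
                fields[name.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


def language_subtags(text):
    """Map each language subtag to its description, ignoring other record types."""
    return {
        record["Subtag"]: record["Description"]
        for record in parse_records(text)
        if record.get("Type") == "language"
    }


print(language_subtags(SAMPLE_REGISTRY))
```

Note that the sample contains a single language record for French (fr) and a single one for Hawaiian (haw); the region record FR is a different kind of subtag and is filtered out.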

Although you now need to look in the IANA Language Subtag Registry rather than the ISO code lists, the language tags you have been using so far don't need to change (as long as you followed the 'shortest code' rule, that is, you used the two-letter code wherever one existed). Only the place where you look up the codes has changed.

The IANA Language Subtag Registry still uses, and keeps up to date with, codes from the ISO standards, but the registry maintainers take care that there is just one subtag for any language: either a two-letter or a three-letter subtag.
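The effect of the 'one subtag per language' policy can be sketched in a few lines of Python. The table and the function name below are hypothetical and cover only a handful of languages for illustration; in practice the authoritative answer comes from the registry itself, not from a hand-made mapping like this one.

```python
# Hypothetical, deliberately tiny table: ISO 639-2 three-letter codes that have
# an ISO 639-1 two-letter equivalent. French and German each have two
# three-letter codes in ISO 639-2, and both map to the same two-letter subtag.
THREE_TO_TWO_LETTER = {
    "fra": "fr",
    "fre": "fr",
    "deu": "de",
    "ger": "de",
    "eng": "en",
}


def preferred_language_subtag(iso_code):
    """Return the two-letter subtag when the table has one, else the code itself.

    Assumption: any code missing from the table has no two-letter equivalent.
    That holds for this example only; check the registry for real data.
    """
    code = iso_code.lower()
    return THREE_TO_TWO_LETTER.get(code, code)


for code in ("fre", "fra", "fr", "haw"):
    print(code, "->", preferred_language_subtag(code))
```

Here fre, fra, and fr all resolve to fr, while haw (Hawaiian, which has no two-letter code) stays as the three-letter subtag haw.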

For more information about how to use the new language tag syntax and the registry, see the articles Language Tags in HTML and XML and Choosing a Language Tag.