W3C International Internationalization

This page is no longer maintained and may be inaccurate. For more up-to-date information, see the Internationalization Activity home page.


Identifiers in programming languages have usually been restricted to ASCII. Some programming languages are case-sensitive, some are not. A more recent programming language like Java allows identifiers drawn from a much wider character repertoire. Web formats also use non-ASCII identifiers: HTML forms identify buttons by name; RDF gives names to properties of resources; XML allows elements and attributes in a document to have non-ASCII names.
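As a small illustration of a wider identifier repertoire, the sketch below uses Python rather than Java (a substitution made only for this example) and asks the language's built-in `str.isidentifier()` which names it would accept. The sample names and the `classify` helper are hypothetical.

```python
# Ask Python which strings it accepts as identifiers.
# The candidate names are made up for this illustration.
candidates = [
    "total",               # plain ASCII
    "gr\u00f6\u00dfe",     # German, with o-umlaut and sharp s
    "\u5909\u6570",        # Japanese, two Han characters
    "2fast",               # starts with a digit
    "my-name",             # contains a hyphen
]

def classify(names):
    """Map each name to True if Python accepts it as an identifier."""
    return {name: name.isidentifier() for name in names}

result = classify(candidates)
for name, ok in result.items():
    print(f"{name!r}: {'accepted' if ok else 'rejected'}")
```

The non-ASCII names are accepted on the same terms as the ASCII one; the two rejected names fail for structural reasons, not because of their script.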

Restricting identifiers to ASCII encourages assumptions that do not extend easily to other scripts. For example, in ASCII every letter has exactly one uppercase and one lowercase variant. Beyond ASCII, there are scripts that have no concept of case, there are single letters whose case equivalent is a pair of letters, and uppercase/lowercase equivalents can depend on the language. This is why XML, for example, is case-sensitive.
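The case-mapping problems described above can be observed with Python's built-in string methods. This is only a sketch: the `case_report` helper and the choice of sample characters are assumptions for illustration, and Python's mappings are language-independent, so language-specific rules (such as Turkish uppercasing of "i") are not reproduced here.

```python
def case_report(ch):
    """Summarize how a single character behaves under case mapping."""
    upper = ch.upper()
    return {
        "upper": upper,
        # More than 1 means one letter maps to several letters.
        "upper_len": len(upper),
        # False for characters from scripts without case.
        "has_case": ch.upper() != ch or ch.lower() != ch,
        # False when upper-then-lower does not return the original.
        "round_trip": upper.lower() == ch,
    }

samples = {
    "ASCII letter a": "a",
    "German sharp s": "\u00df",
    "Han character": "\u6f22",
    "Turkish dotless i": "\u0131",
}

reports = {label: case_report(ch) for label, ch in samples.items()}
for label, report in reports.items():
    print(label, report)
```

The sharp s uppercases to two letters, the Han character has no case at all, and the dotless i uppercases to "I", which lowercases back to the ordinary dotted "i". A case-insensitive comparison of identifiers would have to decide how to treat each of these.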

Keyboard limitations are often cited as a potential problem. But even for a language that uses thousands of characters, standard software exists for entering those characters from an ordinary keyboard.

ASCII is not totally unambiguous. People have learned to distinguish between 'l' and '1', or 'O' and '0', in ASCII. People using other scripts similarly know where characters can be difficult to identify, and can handle them as long as the differences are visible.

Things that are visually indistinguishable may be written in two different ways: for example, an accented letter can be encoded as a single character or as a base character followed by a floating (combining) accent.
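A minimal sketch of this ambiguity, using Python's standard `unicodedata` module. The letter "e" with an acute accent is chosen here only as an example; the variable names are illustrative.

```python
import unicodedata

precomposed = "\u00e9"     # single character: e with acute accent
decomposed = "e\u0301"     # base letter e + combining acute accent

# The two strings look the same on screen but differ as code point sequences.
same_code_points = precomposed == decomposed
lengths = (len(precomposed), len(decomposed))

# Converting both to one normalization form (NFC here) makes them equal.
same_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(same_code_points, lengths, same_after_nfc)
```

A system that compares identifiers code point by code point would treat the two spellings as different names, even though a user cannot tell them apart.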

To make internationalized identifiers work, it has been suggested that there should be a single convention that system designers are encouraged to adhere to. That way, the likelihood of mistakes by users, or surprises for them, is minimized.
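One way such a convention could look in practice is a name table that converts every identifier to a single normalization form before storing or looking it up. The sketch below is an assumption for illustration only: the `IdentifierTable` class is hypothetical, and the choice of NFC as the default form is made here, not stated in the text above.

```python
import unicodedata

class IdentifierTable:
    """Hypothetical name table that keeps every key in one normalized form."""

    def __init__(self, form="NFC"):
        self._form = form        # normalization form applied to all names
        self._entries = {}

    def _key(self, name):
        # Every name passes through the same normalization step.
        return unicodedata.normalize(self._form, name)

    def define(self, name, value):
        self._entries[self._key(name)] = value

    def lookup(self, name):
        return self._entries[self._key(name)]

    def __len__(self):
        return len(self._entries)

table = IdentifierTable()
table.define("caf\u00e9", 1)            # precomposed spelling
found = table.lookup("cafe\u0301")      # combining-accent spelling
print(found, len(table))
```

Because both spellings reduce to the same key, a user who types the name either way reaches the same entry, which is the kind of surprise the convention is meant to avoid.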

Martin Dürst has written a proposal (draft-duerst-i18n-norm-04.txt, expired) in that direction.

The issue affects not just the Web, but the whole Internet. Martin Dürst's draft is discussed on the URI mailing list (uri@bunyip.com).

W3C Bert Bos
Last updated $Date: 2008/05/07 17:37:06 $