World Wide Web Consortium Issues Critical Internationalization Recommendation

"Character Model for the World Wide Web - Fundamentals" Brings Unified Approach to Using Characters on the Web

15 February 2005 -- The World Wide Web Consortium (W3C) has published the "Character Model for the World Wide Web: Fundamentals" as a W3C Recommendation. It provides a well-defined and well-understood way for Web applications to transmit and process the characters of the world's languages.

This architectural Recommendation gives authors of specifications, software developers, and content developers a common reference, enabling interoperable text manipulation on the World Wide Web. It builds on the Universal Character Set, defined jointly by the Unicode Standard and ISO/IEC 10646. Topics include use of the terms 'character', 'encoding' and 'string', a reference processing model, choice and identification of character encodings, character escaping, and string indexing.

The goal of the Character Model for the World Wide Web is to facilitate use of the Web by all people, regardless of their language, script, writing system, and cultural conventions, in accordance with the W3C goal of universal access.

Unicode Brings the Universal Character Set to the Web

At the core of the character model is the Universal Character Set (UCS). The model allows text written in the world's scripts (and on different platforms) to be exchanged, read, and searched by Web users around the world. Unicode was chosen because it provides a way of referencing characters independent of the encoding of the text, it is being updated and completed carefully, and it is widely accepted and implemented by industry.

W3C adopted Unicode as the document character set for HTML in HTML 4.0. The same approach was later used for Recommendations such as XML 1.0 and CSS Level 2. W3C specifications and applications now use Unicode as the common reference character set.

New Specification Clarifies Character Usage on the Web

As the number of Web applications increases, the need for a shared character model has become more critical. Unicode is the natural choice as the basis for that shared model, especially as application developers begin to consolidate their encoding options. However, applying Unicode to the Web requires additional specifications; this is the purpose of the W3C Character Model series.

Some aspects particular to the Web that receive more explanation in the series include:

  • Choice of Unicode encoding forms (UTF-8, UTF-16, UTF-32)
  • Counting characters, measuring string length in the presence of variable-length character encodings and combining characters
  • Duplicate encodings of characters (e.g., precomposed vs. decomposed)
  • Use of escape mechanisms to represent characters
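
To illustrate why these points need explicit treatment, the following minimal Python sketch measures one short string in several units and resolves one escape. It is not taken from the Recommendation; the sample text, the `count_units` helper, and the choice of an HTML numeric character reference as the escape mechanism are all assumptions made for illustration.

```python
import html

# Hypothetical sample: "é" written as 'e' + U+0301 (combining acute accent),
# followed by U+1F600, a character outside the Basic Multilingual Plane.
sample = "e\u0301\U0001F600"


def count_units(text: str) -> dict:
    """Return the length of `text` measured in four different units."""
    return {
        "code_points": len(text),                           # Python str length
        "utf8_bytes": len(text.encode("utf-8")),            # 8-bit code units
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf32_units": len(text.encode("utf-32-le")) // 4,  # 32-bit code units
    }


counts = count_units(sample)

# An escape names a character without depending on the document's encoding.
# Here an HTML numeric character reference stands in for U+00E9.
unescaped = html.unescape("caf&#xE9;")

print(counts)
print(unescaped == "caf\u00e9")
```

A reader would perceive the sample as two characters, yet it is three code points, four UTF-16 code units, and seven UTF-8 bytes. Any specification that talks about string length or string indexing therefore has to say which unit it means.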

Series Documents to Be Completed in 2005

Today's Recommendation is the first in a set of three documents. In development are "Character Model for the World Wide Web 1.0: Normalization," specifying early uniform normalization and string identity matching for text manipulation, and "Character Model for the World Wide Web 1.0: Resource Identifiers," specifying IRI conventions.
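
As a rough illustration of the problem that normalization and string identity matching address, the Python sketch below compares a precomposed and a decomposed spelling of the same word. It is an assumption-laden example, not the algorithm defined in the Normalization document: the `same_text` helper is hypothetical, and the use of Unicode Normalization Form C (NFC) is a choice made here for illustration.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (an assumed choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # U+00E9, a single precomposed code point
decomposed = "cafe\u0301"   # 'e' followed by U+0301, combining acute accent

print(precomposed == decomposed)           # code point comparison
print(same_text(precomposed, decomposed))  # comparison after normalization
```

The two strings look identical when rendered but differ code point by code point, so a plain comparison treats them as different text. Normalizing both sides first makes the comparison agree with what the reader sees.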

Industry Leaders Key in Development of Character Model Series

The Character Model was developed by the W3C Internationalization Activity's Working Group (now the W3C Internationalization Core Working Group) with the help of the W3C Internationalization Interest Group. W3C Members participating in the Working Group include BBC, Boeing, Ecole Mohammadia d'Ingénieurs, IBM, Microsoft, Siemens, Sun Microsystems, and webMethods.

About the World Wide Web Consortium [W3C]

The W3C was created to lead the Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. It is an international industry consortium jointly run by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) in the USA, the European Research Consortium for Informatics and Mathematics (ERCIM), headquartered in France, and Keio University in Japan. Services provided by the Consortium include a repository of information about the World Wide Web for developers and users, and various prototype and sample applications to demonstrate use of new technology. More than 350 organizations are Members of W3C. To learn more, see


Contact Americas and Australia --
Janet Daly, +1.617.253.5884
Contact Europe, Africa and Middle East --
Marie-Claire Forgue, +33.492.38.75.94
Contact Asia --
Yasuyuki Hirakawa, +81.466.49.1170
