W3C Internationalization (I18n) Activity: Making the World Wide Web truly world wide!



W3C I18N Techniques: Developing specifications

This page lists links to resources on the W3C Internationalization Activity site and elsewhere that help specification developers take internationalization issues into account while developing their specifications. It is a subpage of the techniques index.

On this page: Characters, Text direction, Resource identifiers, Date & time


Choosing a definition of 'character'

How to's

Defining a Reference Processing Model

    • Specifications MUST define text in terms of Unicode characters, not bytes or glyphs.
    • For their textual data objects specifications MAY allow use of any character encoding which can be transcoded to a Unicode encoding form.
    • Specifications MAY choose to disallow or deprecate some character encodings and to make others mandatory. Independent of the actual character encoding, the specified behavior MUST be the same as if the processing happened as follows:
      • The character encoding of any textual data object received by the application implementing the specification MUST be determined and the data object MUST be interpreted as a sequence of Unicode characters - this MUST be equivalent to transcoding the data object to some Unicode encoding form, adjusting any character encoding label if necessary, and receiving it in that Unicode encoding form.
      • All processing MUST take place on this sequence of Unicode characters.
      • If text is output by the application, the sequence of Unicode characters MUST be encoded using a character encoding chosen among those allowed by the specification.
    • If a specification is such that multiple textual data objects are involved (such as an XML document referring to external parsed entities), it MAY choose to allow these data objects to be in different character encodings. In all cases, the Reference Processing Model MUST be applied to all textual data objects.
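The model above can be sketched in a few lines of Python. This is a minimal, non-normative illustration that assumes a hypothetical specification whose permitted input and output encodings are listed in `ALLOWED_INPUT` and `ALLOWED_OUTPUT`; the helper names `canonical`, `receive`, `process`, and `emit` are also illustrative. Bytes are decoded once on input, every processing step sees only a `str` (a sequence of Unicode code points), and encoding happens only on output.

```python
import codecs

def canonical(label: str) -> str:
    # Map an encoding label such as "UTF8" or "latin-1" to Python's canonical
    # codec name, so differently spelled labels compare equal.
    # Unknown labels raise LookupError.
    return codecs.lookup(label).name

# Hypothetical encoding policy for an imaginary specification (not from any W3C spec).
ALLOWED_INPUT = {canonical(n) for n in ("utf-8", "utf-16", "iso-8859-1", "shift_jis")}
ALLOWED_OUTPUT = {canonical(n) for n in ("utf-8", "utf-16")}

def receive(data: bytes, label: str) -> str:
    """Determine the encoding and interpret the bytes as Unicode characters."""
    name = canonical(label)
    if name not in ALLOWED_INPUT:
        raise ValueError(f"input encoding not allowed: {label}")
    # Strict decoding: malformed input raises UnicodeDecodeError
    # instead of being silently altered.
    return data.decode(name)

def process(text: str) -> str:
    """Example processing step that works only on characters, never on bytes:
    collapse runs of whitespace into a single space."""
    return " ".join(text.split())

def emit(text: str, label: str = "utf-8") -> bytes:
    """Encode output using one of the encodings the policy allows."""
    name = canonical(label)
    if name not in ALLOWED_OUTPUT:
        raise ValueError(f"output encoding not allowed: {label}")
    return text.encode(name)

if __name__ == "__main__":
    original = "café  naïve"
    # The same text arriving in three different encodings yields the same
    # characters, so processing gives the same result each time.
    for enc in ("utf-8", "utf-16", "iso-8859-1"):
        raw = original.encode(enc)
        result = process(receive(raw, enc))
        print(f"{enc}: {len(raw)} bytes in, {len(result)} characters, out: {emit(result)!r}")
```

In this sketch the encoding label is simply passed in as a parameter; a real specification would also have to say how the encoding of each textual data object is determined (for example, from a protocol header or an in-band declaration).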
How to's

Choosing character encodings

How to's
Background reading
  • What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents? W3C article.
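A quick way to see the distinction that article draws: in HTML and XML, a numeric character reference such as `&#233;` always names a Unicode code point, whatever encoding the file itself is stored in. The sketch below uses Python's standard `html.unescape` (which applies HTML rules) inside a hypothetical helper, `decode_document`. U+4E2D cannot be stored directly in ISO-8859-1 or ASCII, yet the reference still resolves to it.

```python
import html

# ASCII-only markup containing two numeric character references:
# &#233; is U+00E9 (é) and &#x4E2D; is U+4E2D (中).
MARKUP = "caf&#233; &#x4E2D;"

def decode_document(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode bytes with the document's encoding,
    then resolve character references against Unicode."""
    return html.unescape(raw.decode(encoding))

if __name__ == "__main__":
    # Three different storage encodings, one result: the references
    # point into Unicode, not into the file's encoding.
    for enc in ("utf-8", "iso-8859-1", "ascii"):
        print(enc, decode_document(MARKUP.encode(enc), enc))
```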

Indexing strings

How to's
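A specification that indexes into strings has to say which unit it counts. As a minimal illustration (the helper `string_lengths` is hypothetical), the same three-character string has a different length in UTF-8 bytes, UTF-16 code units, and Unicode code points, so an offset defined in one unit points somewhere else if it is interpreted in another:

```python
def string_lengths(s: str) -> dict:
    """Return the length of s measured in three different units."""
    return {
        "utf8_bytes": len(s.encode("utf-8")),
        # utf-16-le adds no byte order mark; each code unit is 2 bytes.
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        # A Python str is indexed by code point.
        "code_points": len(s),
    }

if __name__ == "__main__":
    # "a", U+00E9 (é), and U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP)
    sample = "a\u00e9\U0001D11E"
    print(string_lengths(sample))
```

User-perceived characters (grapheme clusters) are yet another possible unit: `"e\u0301"` is two code points but displays as a single é.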

Text direction

Defining markup that supports bidirectional text

How to's

Resource identifiers

Date & time

Choosing a date format

How to's

Contact: Richard Ishida (ishida@w3.org).

Content last changed 2013-02-06 16:23 GMT