Internationalization techniques:
Authoring HTML & CSS

This page lists links to resources on the W3C Internationalization Activity site and elsewhere that help you author HTML and CSS for internationalization. It is one of several techniques pages.

Characters

Getting started

Background reading

Choosing and applying a character encoding

Best practices checklist
  • Choose UTF-8 for all content. more
  • If you really can't use a Unicode encoding, use only those legacy encodings listed in the Encoding specification. more
  • Avoid the following encodings: UTF-16, UTF-32, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB (Windows code page 1361), CESU-8, UTF-7, BOCU-1, SCSU, and any encoding based on ISO-2022 or EBCDIC. more
See also

This section is specifically about how to choose a character encoding for your content and ensure that the content is in that encoding.

For information about how to declare the encoding so that the browser knows how to read your content see Declaring the character encoding for HTML and Declaring the character encoding for your CSS stylesheet.

See also the dedicated section about Changing to UTF-8.

How to's
Useful reference links
  • If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Spec links
Background reading
  • Are corporate Web sites using Unicode right now? This article is somewhat outdated, now that Unicode accounts for around 80% of pages on the Web.

  • What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?

Changing to UTF-8

Best practices checklist
  • Save the data as UTF-8, don't just change the encoding declaration. more
  • Declare the encoding in your page. more
  • Ensure that your server does the right thing. more
See also

This section is specifically about how to migrate your content to the UTF-8 (Unicode) encoding. For more general advice see Choosing and applying a character encoding.

For information about how to declare the encoding so that the browser knows how to read your content see Declaring the character encoding for HTML and Declaring the character encoding for your CSS stylesheet.

How to's
Background reading
  • What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?

Declaring the character encoding for HTML

Best practices checklist
  • Use the HTTP header if it is available. more
  • Always use an in-document encoding declaration, even if you are also using the HTTP header. more
  • Ensure that the encoding declaration fits within the first 1024 bytes of the page. more
  • If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification. more
  • Do not use the charset attribute on a or link elements. more
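
The checklist above can be sketched as a minimal page. This is only an illustration: the title and body text are placeholders, and the file itself is assumed to be saved as UTF-8.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declare the encoding first, so that it falls well within
       the first 1024 bytes of the page -->
  <meta charset="utf-8">
  <title>Example page</title>
</head>
<body>
  <p>Text saved as UTF-8: café, 日本語, العربية.</p>
</body>
</html>
```

Where the HTTP header is available, the matching declaration would be Content-Type: text/html; charset=utf-8. The in-document declaration is kept as well, so that the page is still read correctly when it is opened from disk.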
See also

This section is specifically about how to declare the character encoding of your HTML page.

For advice about which encoding to choose, see Choosing and applying a character encoding.

For further advice about setting the character encoding on the server, see Setting the HTTP charset parameter and Setting character encoding information using .htaccess.

How to's
Useful reference links
  • If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Spec links
Background reading
  • Introduces doctypes, mime-types, and the influence of standards- vs. quirks-mode on character encoding declarations.

  • Tutorial style article that gathers together and organizes pointers to articles that, taken together, help you understand how to handle the essential aspects of authoring HTML and CSS related to characters and character encodings.

Declaring the character encoding for a CSS style sheet

Best practices checklist
  • If you use UTF-8 as the character encoding for your style sheets and your HTML pages, and declare that encoding in your HTML, there is no need to declare the encoding for your style sheet. more
  • If you use @charset, ensure that nothing (except a BOM) comes before it in the style sheet, and use the exact syntax. more
  • If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification. more
  • Do not use the charset attribute on a or link elements. more
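
A minimal sketch of how the points above might look in practice. The file name styles.css is hypothetical, and the comment describes what that file would contain if it declared its own encoding.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- No charset attribute on the link element -->
  <link rel="stylesheet" href="styles.css">
  <!--
    If styles.css is UTF-8, like this page, it needs no declaration.
    If it does declare its encoding, the rule must be the very first
    thing in the file (only a BOM may precede it), written exactly as:
        @charset "utf-8";
  -->
  <title>Example page</title>
</head>
<body>
  <p>Styled content.</p>
</body>
</html>
```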
See also

This section is specifically about how to declare the character encoding of your CSS style sheet.

For advice about which encoding to choose, see Choosing and applying a character encoding.

For further advice about setting the character encoding on the server, see Setting the HTTP charset parameter and Setting character encoding information using .htaccess.

How to's
Useful reference links
  • If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Spec links

Using escapes to represent characters

Best practices checklist
  • Avoid using escapes whenever possible. UTF-8 can represent all the characters you need directly. more
  • Use escapes for invisible or ambiguous characters. more
  • Use CSS escapes for CSS embedded in HTML, rather than HTML escapes. more
  • Always use Unicode codepoints for the numeric part of a character escape. Do not use codepoint values of non-Unicode encodings. more
  • Use a single escape (representing the Unicode codepoint value) for supplementary characters. Do not escape surrogate character pairs. more
  • Ensure that all href attribute values have escaped ampersands in query parameters, ie. &amp; rather than just &. more
  • Avoid named character entities in XHTML. more
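
The fragment below is a sketch of several of these points together. The URL, class name, and sample text are invented for illustration.

```html
<!-- An invisible character (U+200F RIGHT-TO-LEFT MARK) written as an
     escape, so that it can be seen in the source -->
<p>The mark follows this colon:&#x200F;</p>

<!-- A supplementary character (U+20000) written as a single escape
     using its Unicode code point, not as two surrogate escapes -->
<p>&#x20000;</p>

<!-- The ampersand between query parameters is escaped -->
<p><a href="search?lang=en&amp;q=test">Search</a></p>

<!-- CSS embedded in HTML uses a CSS escape (\2192), not an HTML escape -->
<style>
  .next::after { content: "\2192"; }
</style>
<p class="next">Continue</p>
```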
How to's
Useful reference links
Spec links

Checking the encoding of a document

How to's
Useful reference links

Handling the byte-order mark (BOM)

Best practices checklist
  • If you use the byte-order mark with UTF-8-encoded pages, check that any scripts and back-end processes can handle the BOM. more
  • If you ignored the advice above and encoded your page as UTF-16, always ensure that it starts with a BOM. more
How to's
Useful reference links
Spec links

Handling character normalization

Best practices checklist
  • Ensure that all HTML class names and CSS selectors are saved using the same Unicode normalization form (NFC is recommended). more
How to's
Useful reference links

Handling encoding issues in forms

Best practices checklist
  • Use UTF-8 for the character encoding of your page. more
  • Consider checking on the server that form data is arriving in UTF-8. more
How to's
  • What is the best way to deal with encoding issues in forms that may use multiple languages and scripts?

Using Unicode control codes

Best practices checklist
  • Don't use Unicode control characters if there is markup to do the same job. more
  • Use character escapes to represent control codes, so that they are visible. more
See also

If you represent control codes using character escapes, see also Using escapes to represent characters for more information.

How to's

Working around unavailable characters/glyphs

How to's

Using non-ASCII web addresses

Useful reference links
Spec links
Background reading

Language

Getting started

Background reading

Declaring the overall language of a page

Best practices checklist
  • Always declare the default language for text in the page using attributes on the html tag. more
  • Do NOT use the meta element with the content attribute set to Content-Language. more
  • Use language attributes rather than HTTP to declare the default language for 'text processing' (ie. when language needs to be known for things such as font choice, styling, spell-checking, hyphenation, quote mark styling, etc.). more
  • Do not declare the default language of a document in the body element; use the html element instead. more
  • Where a document contains content aimed at speakers of more than one language, decide whether you want to declare one language in the html tag, or leave the languages undefined until later.
  • Where a document contains content aimed at speakers of more than one language, try to divide the document linguistically at the highest possible level, and declare the appropriate language for each of those divisions.
  • For HTML, use the lang attribute only; for XHTML 1.0 served as text/html, use the lang and xml:lang attributes; and for XHTML served as XML, use the xml:lang attribute only. more
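
As a rough illustration of the checklist above, a French-language HTML page might declare its default language like this. The content is a placeholder.

```html
<!DOCTYPE html>
<!-- The default language goes on the html element, not on body
     and not in a meta element -->
<html lang="fr">
<head>
  <meta charset="utf-8">
  <title>Bienvenue</title>
</head>
<body>
  <p>Bonjour tout le monde.</p>
</body>
</html>
```

For XHTML 1.0 served as text/html, the opening tag would carry both attributes, for example lang="fr" xml:lang="fr".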
See also

For detailed advice about how to select the right language tags, see Choosing language values.

See also Declaring metadata about the language of the intended audience.

How to's
Background reading
Spec links
Tests

Identifying in-document language changes

Best practices checklist
  • When the page contains content in another language, add a language attribute to an element surrounding that content. more
  • For HTML, use the lang attribute only; for XHTML 1.0 served as text/html, use the lang and xml:lang attributes; and for XHTML served as XML, use the xml:lang attribute only. more
  • If the text in attribute values and element content is in different languages, consider using a nested approach. more
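
A short sketch of both cases, assuming a page whose default language is English. The link target is hypothetical.

```html
<!-- A phrase in another language, marked up on the element around it -->
<p>The Japanese word for "thank you" is <span lang="ja">ありがとう</span>.</p>

<!-- Nested approach: the title attribute text is English, so it sits on
     an outer element; the Spanish link text gets its own lang value -->
<p>
  <span title="Spanish version">
    <a lang="es" href="es/index.html">Español</a>
  </span>
</p>
```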
See also

See also Declaring the overall language of a page.

For detailed advice about how to select the right language tags, see Choosing language values.

How to's
Background reading
Spec links

Choosing language tags

Best practices checklist
  • Use subtags as defined by BCP 47 for language attribute values. more
  • Use the shortest possible language tag values. more
  • Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified and Traditional Chinese, respectively. more
  • Use the subtag zxx when the text is known to be not in any language. more
  • When the language is undetermined and you have to label it, use lang="". more
  • If you are serving XML, and the format you are using supports it, use xml:lang="", otherwise use xml:lang="und" when the language is undetermined and you have to label it. more
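
The following fragment sketches how these tag choices might appear in HTML. The sample text and the part number are invented.

```html
<!-- Script subtags for Simplified and Traditional Chinese -->
<p lang="zh-Hans">简体中文</p>
<p lang="zh-Hant">繁體中文</p>

<!-- Shortest possible tag: no region subtag unless it is really needed -->
<p lang="fr">Texte en français.</p>

<!-- zxx: the content is known not to be in any language -->
<p>Part number: <code lang="zxx">X7GQ-92KLM</code></p>

<!-- Empty value: the language is undetermined but has to be labelled -->
<blockquote lang="">(text of unknown language supplied at run time)</blockquote>
```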
How to's
Useful reference links
Spec links

Declaring metadata about the language(s) of the intended audience

Best practices checklist
  • Consider using a Content-Language HTTP header to declare metadata about the language(s) of the intended audience of a document. more
  • Where a document contains content aimed at speakers of more than one language, use the HTTP Content-Language header with a comma-separated list of language tags. more
See also

This section is specifically about setting metadata for the document as an object. For information about declaring the language of the document for text-processing purposes, see Declaring the overall language of a page.

For detailed advice about how to select the right language tags, see Choosing language values.

How to's
Spec links

Indicating the language of a link destination

Best practices checklist
  • When pointing to a resource in another language, consider the pros and cons before indicating the language of the target document. more
  • If you want to indicate that the target document of an a element is in another language, consider the pros and cons before using hreflang with CSS. more
  • Do not use flag icons to indicate languages. more
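
If, after weighing the pros and cons, you do decide to expose hreflang to readers, one possible approach is a CSS attribute selector. This is a sketch only; the link target is hypothetical and the bracketed presentation is an arbitrary choice.

```html
<style>
  /* Append the language tag after any link that declares hreflang */
  a[hreflang]::after {
    content: " [" attr(hreflang) "]";
  }
</style>

<p>The <a href="rapport.html" hreflang="fr">annual report</a> is available.</p>
```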
See also

For advice about how to select the right language tags, see Choosing language values.

See also Linking to localized content in the Navigation section.

How to's
Spec links

Setting & changing browser language preferences

How to's

Using Accept-Language for locale setting

How to's

Markup & text

Getting started

How to's

Working with composite strings and string re-use

Best practices checklist
  • Use a topic-comment approach whenever possible. more
  • Avoid sentence-like arrangements when they contain substrings that are predefined translatable text or numeric text. more
  • Use sentence-like arrangements with care if you have non-numeric and non-translatable text substrings (ie. text created at runtime). more
  • Where the parts of a composite message appear in separate locations, provide the translator with contextual information to show how the various parts of a composite message relate to each other. more
  • Provide information to the translator, where needed, to clarify what a substring represents. more
  • When requested by the localization group, be prepared to provide information about the size of each substring. more
  • Strings should be reused where text is always used in exactly the same context, or where the string is a self-contained, independent sentence or phrase. more
  • Reused strings must not refer to more than one text, graphic or conceptual context. more
  • If in doubt as to whether a string is a good candidate for re-use, don't. more
  • If re-used strings will be displayed in fixed-size display boxes of varying sizes, ensure that the translations will all fit in the smallest display box. more
How to's
  • Why you need to be very careful about splitting up and reusing text on-screen. The linguistic differences between languages can lead to real headaches for localizers and may in some cases make a reasonable translation impossible to achieve.

  • Things to be aware of if you plan to use the same text string in different places on your site or user interface.

Useful reference links

Using ruby markup

See also

This section is specifically about how to use markup for ruby annotations. For information about styling ruby see Styling ruby text.

How to's
  • Discusses how to use ruby markup in HTML5, and has pointers to what currently works in browsers.
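
For orientation, a minimal example of HTML5 ruby markup might look like the following. It is a sketch rather than a statement of what any particular browser supports.

```html
<!-- Base text 東京 annotated with its reading in hiragana.
     The rp elements supply parentheses as a fallback for
     browsers that do not render ruby. -->
<p>
  <ruby>東京<rp>(</rp><rt>とうきょう</rt><rp>)</rp></ruby>
</p>
```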

Background reading
  • What are 'ruby' annotations?

  • Mainly based on a standard for Japanese layout, JIS X 4051, however, it also addresses areas which are not covered by JIS X 4051.

  • A summary of how bopomofo is used and the implications for support on the Web.

  • Discussion about what is needed in the HTML5 specification, and possibly other markup vocabularies, to adequately support ruby markup. It looks at a number of use cases and how well they are supported by the various markup models.

  • Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)

Spec links

Using b and i tags

Best practices checklist
  • Use the class attribute on a b or i element to identify why the element is being used. more
  • Consider whether other elements might be more applicable than the b or i element because they carry the right semantics. more
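
A brief sketch of the idea. The class names are made up for illustration; the point is that they record why the element was used, so that a localizer can restyle each use separately.

```html
<style>
  /* Example of restyling one use of i for Japanese content, where
     italics are often inappropriate; the choice is illustrative */
  i.taxonomy:lang(ja) { font-style: normal; }
</style>

<p>The domestic cat (<i class="taxonomy">Felis catus</i>) is a small mammal.</p>
<p>Press <b class="keyword">Save</b> to continue.</p>

<!-- Where emphasis is the real meaning, em carries the right semantics -->
<p>You <em>must</em> save before closing.</p>
```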
How to's
Spec links

Working with form controls

See also

In the Characters section see Handling encoding issues in forms.

In the section Text Direction see Managing text direction in form controls.

In the section Styling & Layout see Working with names and Working with date formats.

How to's
  • As part of a form, I have a list of terms in a drop-down box. Why are they not correctly sorted when I translate the items in the list?

Indicating what should and should not be translated

Best practices checklist
  • Use the translate attribute on an element to prevent its content being translated by online translation services or by computer-assisted translation tools. more
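
A minimal sketch of the translate attribute in use. The button name and file path are invented.

```html
<!-- The button label and the file path must stay as they are,
     so they are excluded from translation -->
<p>Click the <span translate="no">Resume</span> button on the toolbar.</p>
<p>Edit the file <code translate="no">config/settings.ini</code>.</p>
```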
How to's
Spec links

Text direction

Getting started

How to's

Setting up a right-to-left page

Best practices checklist
  • Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction. more
  • Add dir="rtl" to the html tag any time the overall document direction is right-to-left. more
  • Don't add dir="rtl" to the body tag. more
  • If you need to avoid the scroll bar moving on some browsers, put dir on the head element and a div just inside the body element. more
  • Use logical order, not visual ordering for Hebrew, and choose an appropriate encoding. more
  • If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8. more
  • Do not use CSS styling to control directionality in HTML. Use markup. more
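
As a rough illustration, a right-to-left Arabic page following the checklist above might start like this. The text is a placeholder.

```html
<!DOCTYPE html>
<!-- dir goes on the html element, not on body, and the direction
     is set with markup rather than CSS -->
<html lang="ar" dir="rtl">
<head>
  <meta charset="utf-8">
  <title>مثال</title>
</head>
<body>
  <p>مرحبا بالعالم</p>
</body>
</html>
```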
See also

This section is about setting up the default direction for a whole page. For information about working with text direction changes inside the document see Changing the direction of a block element and Mixing text direction inline.

How to's
Spec links
Tests

Setting direction on block elements

Best practices checklist
  • Add the dir attribute to a block element to change base direction. more
  • Do not use CSS styling to control directionality in HTML. Use markup. more
  • Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction. more
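
A short sketch, assuming the surrounding page is left-to-right. Only the block whose base direction differs carries a dir attribute.

```html
<p>This paragraph inherits the left-to-right base direction of the page.</p>

<!-- The base direction changes here, so dir is added to the block -->
<blockquote lang="he" dir="rtl">
  <p>שלום עולם!</p>
</blockquote>

<p>No markup is needed here, because the direction has not changed.</p>
```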
See also

For information about setting up the default direction for a whole page see Setting up a right-to-left page.

See also Managing text direction in form controls.

How to's
Spec links
Tests

Managing text direction in form controls

Best practices checklist
  • Add dir="auto" to input tags to automatically align text to the correct side of an input field. more
  • Add dir="auto" to textarea and pre tags to make paragraphs align to the left or right according to the initial strong character. more
  • Consider using the dirname attribute to pass information to the server about the direction of text in a text or search form control. more
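
The following form is a sketch of these three points. The action URL and the field names are hypothetical.

```html
<form action="/search" method="get">
  <!-- dir="auto" aligns the typed text according to its first strong
       character; dirname submits the direction as an extra field,
       here q.dir with the value ltr or rtl -->
  <input type="search" name="q" dir="auto" dirname="q.dir">

  <!-- In a textarea, each paragraph is aligned separately -->
  <textarea name="comment" dir="auto" dirname="comment.dir"></textarea>

  <button type="submit">Send</button>
</form>
```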
How to's
Spec links
Tests

Mixing text direction inline

Best practices checklist
  • Tightly wrap every opposite-direction phrase in markup that sets its base direction. more
  • HTML4: If you know the phrase's direction, or can work it out for injected text, use the dir attribute to set the direction of the phrase. more
  • HTML4: If the tightly-wrapped phrase is followed inline (possibly after some intervening neutral characters) by a number or a logically separate opposite-direction phrase, then add a directional mark (RLM or LRM) immediately after the markup of that phrase. more
  • HTML5: If you know the phrase's direction, or can work it out for injected text, wrap the phrase in a bdi element and add a dir attribute with rtl or ltr. more
  • HTML5: If you don't know the phrase's direction, ie. unknown text that will be injected at run time, then either wrap the phrase in bdi (no dir attribute needed), or if the phrase is tightly wrapped by an element already, just add dir="auto" to that element. more
  • Use Unicode control characters for bidirectional control only for attribute text or element text that allows no internal markup. more
  • Consider using Unicode control characters to set the base direction around bidirectional text that will be displayed as tooltips, page titles, or on JavaScript dialog boxes. more
  • Do not leave white space at the end of inline elements that mark a directional boundary. more
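
The cases above can be sketched as follows, assuming a left-to-right paragraph context. The names and titles are sample data.

```html
<!-- HTML5, direction known: bdi with a dir attribute -->
<p>The title is <bdi dir="rtl" lang="ar">مدخل إلى C++</bdi> in Arabic.</p>

<!-- HTML5, direction unknown (text injected at run time):
     bdi with no dir attribute -->
<ul>
  <li><bdi>إيان</bdi>: 3 posts</li>
  <li><bdi>Anna</bdi>: 5 posts</li>
</ul>

<!-- HTML4 approach: dir on a tightly wrapping span, followed by a
     directional mark because a number comes next. LRM is used here
     because the surrounding paragraph is left-to-right. -->
<p><span dir="rtl" lang="he">פעילות הבינאום</span>&lrm; - 3 reviews</p>
```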
How to's
Spec links
Tests

Handling parentheses and other mirrored characters

Best practices checklist
  • Treat mirrored characters as if the word 'left' in the character name meant 'opening', and 'right' meant 'closing'. more
How to's

Overriding the Unicode bidirectional algorithm

Best practices checklist
  • Use the bdo element to force the directionality of a sequence of inline characters. more
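
One situation where an override might be wanted is showing the order in which characters are stored, rather than the order in which they are normally displayed. A minimal sketch:

```html
<!-- Without bdo the Hebrew word displays right-to-left as usual.
     With bdo dir="ltr" the same characters are forced to display
     left-to-right, in the order they are stored. -->
<p>Normal display: <span lang="he">שלום</span></p>
<p>Stored order: <bdo dir="ltr" lang="he">שלום</bdo></p>
```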
How to's
Spec links
Tests

Styling & layout

Preparing for text expansion during translation

Best practices checklist
  • Ensure that your graphic backgrounds can automatically expand with the text they are related to, avoid highly constrained spaces, and anticipate that the box containing your text may grow during translation. more
How to's
Background reading
  • Overview of text expansion issues.

  • Do I need to worry because display capabilities (screen sizes, number of colors, etc.) of computers vary in other countries?

  • Douglas Bowman's article in A List Apart about how to layer background images, allowing them to slide over each other to create certain effects. (A note from the editors: While brilliant for its time, this article no longer reflects modern best practices.)

Styling by language

Best practices checklist
  • Use :lang to set language-specific styling. more
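
A small sketch of the :lang selector. The font names and sizes are assumptions chosen for illustration, not recommendations.

```html
<style>
  /* Applies to any element whose language is Japanese, whether the
     language was declared on the element itself or inherited */
  :lang(ja) { font-family: "Hiragino Sans", "Yu Gothic", sans-serif; }

  /* Arabic text often benefits from a larger size and more leading */
  :lang(ar) { font-size: 120%; line-height: 1.8; }
</style>

<p>English text with <span lang="ja">日本語</span> and
   <span lang="ar" dir="rtl">العربية</span> inline.</p>
```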
See also

Related sections include Using attributes to declare language, and Choosing language values.

How to's
Spec links
Tests

Styling lists

How to's
  • Provides cut and paste templates for a large number of international counter styles that can be used for ordered lists and other such counters.

Spec links
  • Allows you to convert ASCII numbers into other representations that can be used for ordered list counters, headings, etc, using the algorithms described by CSS3 Counter Styles.

  • Preview of upcoming proposals for CSS3 written in 2003.

Tests

Creating vertical text

Spec links
  • Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.

Managing line breaks

How to's
  • Mozilla documentation describing how to use the CSS hyphens property. Also includes a table of supported languages in browsers.

Spec links
Tests

Justifying and aligning text

How to's
  • Mozilla documentation describing how to use the CSS hyphens property. Also includes a table of supported languages in browsers.

Spec links

Styling ruby text

See also

This section is specifically about styling ruby text. For more information about markup for ruby see Using ruby markup.

How to's
  • Introduction to styling ruby with CSS3 Ruby Module. In W3C article, Ruby Markup and Styling.

Spec links
Background reading
  • What is 'ruby'?

  • Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)

Applying various script-specific typographic conventions

  • Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.

Using fonts & webfonts

How to's

Working with date formats

How to's
  • How do I prepare my web pages to display varying international date formats?

Working with personal names

Best practices checklist
  • Ask yourself whether you really need to have separate fields for given name and family name. more
  • Make input fields long enough to enter long names, and ensure that if the name is displayed on a web page later there is enough space for it. more
  • Avoid limiting the field size for names in your database. more
  • Try to avoid using the labels 'first name' and 'last name' in non-localized forms. more
  • Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where you ask the user to enter the part(s) of their name that you need to use for a specific purpose. more
  • Ask separately, when setting up a profile for example, how that person would like you to address them. more
  • If you have separate fields for parts of a person's name, ensure that you label clearly which parts you want where. more
  • Be careful about assumptions built into algorithms that pull out the parts of a name automatically. more
  • Be as clear as possible about telling people how to specify their name. more
  • Don't assume that a single letter name is an initial. more
  • Don't require that people supply a family name. more
  • Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names. more
  • Don't require names to be entered all in upper case. more
  • Allow the user to enter a name with spaces. more
  • Don't assume that members of the same family will share the same family name. more
  • It may be better for a form to ask for 'Previous name' rather than 'Maiden name' or 'née'. more
  • If you hope to get Latin- or ASCII-only, you need to tell the user. more
  • You may want to store the name in both Latin and native scripts, in which case you probably need to ask the user to submit their name in both native script and Latin-only form, using separate fields. more
  • If you do accept non-ASCII names, you should use a Unicode character encoding (eg. UTF-8) in your pages, your back end databases and in all the software code in between. more
How to's
  • How do people's names differ around the world, and what are the implications of those differences on the design of forms, databases, ontologies, etc. for the Web?

Navigation

Getting started

Background reading

Linking to localized content

Best practices checklist
  • Use server-based, language-related content negotiation to point the user to the page that matches their browser preferences, but also add links to each page so that the user can change languages easily if they prefer. more
  • Consider how to indicate to the user where the in-page language links are, and if the page is available in a long list of languages, consider whether or not to use something like a select control (and if so, how to make it obvious what its function is). more
  • Locate pull-down menus or selection lists at or near the top of the page. more
  • Use a recognizable image alongside a pull-down menu to indicate that it is a control which will take the user to localized pages. Do not use text. more
  • Consider using the size attribute to display the first set of options in a select control. more
  • Translate the links or options into the target language. more
  • Encode your page as UTF-8, so that it supports the necessary characters. more
  • Decide whether it is a problem that a user won't have fonts for all the list items or menu options. If it is, use JavaScript menus or some other graphic-based approach. more
  • Decide whether to add a description alongside each option, using the language of the current page, so that users can tell what the native word means. more
  • Find the most appropriate way of ordering the list of options. more
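
If you do settle on a select control, the points above might come together roughly as follows. The action URL, the icon file name, and the list of languages are all hypothetical.

```html
<!-- Placed at or near the top of the page -->
<form action="/set-language" method="get">
  <!-- A recognizable image, rather than text, signals what the
       control is for; the alt text keeps it accessible -->
  <label for="lang-select">
    <img src="language-icon.svg" alt="Language" width="24" height="24">
  </label>

  <!-- Each option is written in its own language and labelled with
       lang; the page is assumed to be saved and served as UTF-8 -->
  <select id="lang-select" name="lang">
    <option value="de" lang="de">Deutsch</option>
    <option value="en" lang="en" selected>English</option>
    <option value="es" lang="es">Español</option>
    <option value="ja" lang="ja">日本語</option>
  </select>

  <button type="submit">OK</button>
</form>
```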
See also

See also Indicating the language of a link destination in the Language section.

How to's
  • If my site contains alternative language versions of the same page, what can I do to help the user see the page in their preferred language?

  • What are the best practices for using pull-down menus based on the select element to direct visitors to localized content?

  • On some Web pages you’ll find country flags as symbols for languages. This article explains why this approach is problematic, and what you should do instead.

Using content negotiation

Best practices checklist
  • Use server-based, language-related content negotiation to point the user to the page that matches their browser preferences, but also add links to each page so that the user can change languages easily if they prefer. more
  • If the user switches to a different language, offer them the opportunity to remember that choice and serve up subsequent pages in that language, overriding their browser settings. more
See also

See also Indicating the language of a link destination in the Language section.

See also the Language section on the Server Setup page.

How to's
Background reading