Internationalization techniques:
Authoring web pages

This page lists links to resources on the W3C Internationalization Activity site and elsewhere that help you author HTML and CSS for internationalization.

You are not expected to read this page from top to bottom. Instead, go directly to the topics that interest you.

Characters

Getting started

Background reading

Choosing and applying a character encoding

See also

This section is specifically about how to choose a character encoding for your content and ensure that the content is in that encoding.

For information about how to declare the encoding so that the browser knows how to read your content, see Declaring the character encoding for HTML and Declaring the character encoding for a CSS style sheet.

See also the dedicated section about Changing to UTF-8.

  • Choose UTF-8 for all content.
  • If you really can't use a Unicode encoding, use only those legacy encodings listed in the Encoding specification.
  • Avoid the following encodings: UTF-16, UTF-32, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB (Windows code page 1361), encodings based on ISO-2022 or EBCDIC, as well as CESU-8, UTF-7, BOCU-1, and SCSU.
How to's
Useful reference links
  • If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Spec links
Background reading
  • Are corporate Web sites using Unicode right now? This article is somewhat outdated, now that Unicode accounts for around 97% of pages on the Web.

  • What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?

Changing to UTF-8

See also

This section is specifically about how to migrate your content to the UTF-8 (Unicode) encoding. For more general advice see Choosing and applying a character encoding.

For information about how to declare the encoding so that the browser knows how to read your content, see Declaring the character encoding for HTML and Declaring the character encoding for a CSS style sheet.

  • Save the data as UTF-8; don't just change the encoding declaration.
  • Declare the encoding in your page.
  • Ensure that your server does the right thing.
How to's
Background reading
  • What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?

Declaring the character encoding for HTML

See also

This section is specifically about how to declare the character encoding of your HTML page.

For advice about which encoding to choose, see Choosing and applying a character encoding.

For further advice about setting the character encoding on the server, see Setting the HTTP charset parameter and Setting character encoding information using .htaccess.

  • Use the HTTP header if it is available.
  • Always use an in-document encoding declaration, even if you are also using the HTTP header.
  • Ensure that the encoding declaration fits within the first 1024 bytes of the page.
  • If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
  • Do not use the charset attribute on a or link elements.
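
As a minimal sketch of the in-document declaration described above (assuming an HTML page that the editor has actually saved as UTF-8; the title and body text are placeholders), the meta element can sit at the very start of head so that it falls well within the first 1024 bytes:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Encoding declaration first, so it falls within the first 1024 bytes. -->
  <meta charset="utf-8">
  <title>Placeholder title</title>
</head>
<body>
  <p>Placeholder content, saved by the editor as UTF-8.</p>
</body>
</html>
```

If the server also sends an HTTP header, it should agree with the in-document declaration, for example Content-Type: text/html; charset=utf-8.
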
How to's
Useful reference links
  • If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Spec links
Background reading
  • Introduces doctypes, mime-types, and the influence of standards- vs. quirks-mode on character encoding declarations.

  • Tutorial style article that gathers together and organizes pointers to articles that, taken together, help you understand how to handle the essential aspects of authoring HTML and CSS related to characters and character encodings.

Declaring the character encoding for a CSS style sheet

See also

This section is specifically about how to declare the character encoding of your CSS style sheet.

For advice about which encoding to choose, see Choosing and applying a character encoding.

For further advice about setting the character encoding on the server, see Setting the HTTP charset parameter and Setting character encoding information using .htaccess.

  • If you use UTF-8 as the character encoding for your style sheets and your HTML pages, and declare that encoding in your HTML, there is no need to declare the encoding for your style sheet.
  • If you use @charset, ensure that nothing (except a BOM) comes before it in the style sheet, and use the exact syntax.
  • If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
  • Do not use the charset attribute on a or link elements.
How to's
Useful reference links
  • If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Spec links

Using escapes to represent characters

  • Avoid using escapes whenever possible. UTF-8 supports all the characters you need.
  • Use escapes for invisible or ambiguous characters.
  • Use CSS escapes for CSS embedded in HTML, rather than HTML escapes.
  • Always use Unicode codepoints for the numeric part of a character escape. Do not use codepoint values of non-Unicode encodings.
  • Use a single escape (representing the Unicode codepoint value) for supplementary characters. Do not escape surrogate character pairs.
  • Ensure that all href attribute values have escaped ampersands in query parameters, ie. &amp; rather than just &.
  • Avoid named character entities in XHTML.
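
The guidelines above can be illustrated with a short page. This is only a sketch: the link target, the sample text, and the choice of U+20BB7 as the supplementary character are assumptions made for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Escape examples</title>
  <style>
    /* CSS embedded in HTML uses a CSS escape (U+201C), not an HTML escape. */
    blockquote::before { content: "\201C"; }
  </style>
</head>
<body>
  <!-- Ordinary characters are typed directly; UTF-8 needs no escapes for them. -->
  <p>Dvořák, 東京, Ελλάδα</p>

  <!-- An invisible character (U+00A0 NO-BREAK SPACE) is escaped so that it shows in the source. -->
  <p>10&#xA0;km</p>

  <!-- A supplementary character (U+20BB7) is written as one escape of its code point,
       not as two surrogate escapes. -->
  <p>&#x20BB7;</p>

  <!-- Ampersands between query parameters are written as &amp; -->
  <p><a href="search?q=i18n&amp;lang=en">Search</a></p>

  <blockquote>Quoted text</blockquote>
</body>
</html>
```
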
How to's
Useful reference links
Spec links

Checking the encoding of a document

How to's
Useful reference links

Handling the byte-order mark (BOM)

  • If you use the byte-order mark with UTF-8-encoded pages, check that any scripts and back-end processes can handle the BOM.
  • If you ignored the advice above and encoded your page as UTF-16, always ensure that it starts with a BOM.
How to's
Useful reference links
Spec links

Handling character normalization

  • Ensure that all HTML class names and CSS selectors are saved using the same Unicode normalization form (NFC is recommended).
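
To show why the normalization form matters, here is a sketch in which the class name and selectors are written with escapes purely so that the code points are visible (in real content you would normally type the characters directly). The class name is a made-up example.

```html
<!DOCTYPE html>
<html lang="fr">
<head>
  <meta charset="utf-8">
  <title>Normalization example</title>
  <style>
    /* NFC form: U+00E9 (precomposed e with acute). Matches the class below. */
    .caf\E9 { color: green; }
    /* NFD form: e followed by U+0301 (combining acute). Looks identical when
       rendered, but is a different code point sequence and does not match. */
    .cafe\301 { color: red; }
  </style>
</head>
<body>
  <!-- The class name uses the precomposed U+00E9, ie. NFC. -->
  <p class="caf&#xE9;">Menu</p>
</body>
</html>
```

Selectors are compared code point by code point, so only the first rule applies to the paragraph.
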
How to's
Useful reference links

Handling encoding issues in forms

  • Use UTF-8 for the character encoding of your page.
  • Consider checking on the server that form data is arriving in UTF-8.
How to's
  • What is the best way to deal with encoding issues in forms that may use multiple languages and scripts?

Using Unicode control codes

See also

If you represent control codes using character escapes, see also Using escapes to represent characters for more information.

  • Don't use Unicode characters if there is markup to do the same job.
  • Use character escapes to represent control codes, so that they are visible.
How to's

Working around unavailable characters/glyphs

How to's

Using non-ASCII web addresses

Useful reference links
Spec links
Background reading

Language

Getting started

Background reading

Declaring the overall language of a page

See also

For detailed advice about how to select the right language tags, see Choosing language tags.

See also Declaring metadata about the language of the intended audience.

  • Always declare the default language for text in the page using attributes on the html tag.
  • Do NOT use the meta element with the http-equiv attribute set to Content-Language.
  • Use language attributes rather than HTTP to declare the default language for 'text processing' (ie. when language needs to be known for things such as font choice, styling, spell-checking, hyphenation, quote mark styling, etc.).
  • Do not declare the default language of a document in the body element; use the html element.
  • Where a document contains content aimed at speakers of more than one language, decide whether you want to declare one language in the html tag, or leave the languages undefined until later.
  • Where a document contains content aimed at speakers of more than one language, try to divide the document linguistically at the highest possible level, and declare the appropriate language for each of those divisions.
  • For HTML use the lang attribute only; for XHTML 1.0 served as text/html use the lang and xml:lang attributes; and for XHTML served as XML use the xml:lang attribute only.
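
The three cases in the last guideline can be sketched as follows. Only the opening html tag is shown for each case, and French is an arbitrary example language.

```html
<!-- HTML: lang only -->
<html lang="fr">

<!-- XHTML 1.0 served as text/html: both lang and xml:lang -->
<html lang="fr" xml:lang="fr" xmlns="http://www.w3.org/1999/xhtml">

<!-- XHTML served as XML: xml:lang only -->
<html xml:lang="fr" xmlns="http://www.w3.org/1999/xhtml">
```
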
How to's
Background reading
Spec links
Tests

Identifying in-document language changes

See also

See also Declaring the overall language of a page.

For detailed advice about how to select the right language tags, see Choosing language tags.

  • When the page contains content in another language, add a language attribute to an element surrounding that content.
  • For HTML use the lang attribute only; for XHTML 1.0 served as text/html use the lang and xml:lang attributes; and for XHTML served as XML use the xml:lang attribute only.
  • If the text in attribute values and element content is in different languages, consider using a nested approach.
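
A minimal sketch of both techniques, assuming an English page. The link target is a hypothetical file name.

```html
<!-- A phrase in another language gets its own lang attribute. -->
<p>The French word for cat is <span lang="fr">chat</span>.</p>

<!-- Nested approach: the title attribute text is English, the link text is Spanish,
     so each is labelled on a separate element. -->
<p><span title="Spanish" lang="en"><a lang="es" href="index.es.html">Español</a></span></p>
```
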
How to's
Background reading
Spec links

Choosing language tags

  • Use subtags as defined by BCP 47 for language attribute values.
  • Use the shortest possible language tag values.
  • Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified and Traditional Chinese, respectively.
  • Use the subtag zxx when the text is known to be not in any language.
  • When the language is undetermined and you have to label it, use lang="".
  • If you are serving XML, and the format you are using supports it, use xml:lang="", otherwise use xml:lang="und" when the language is undetermined and you have to label it.
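
The tag choices above might look like this in HTML. The sample strings are placeholders chosen for illustration.

```html
<!-- Script subtags distinguish Simplified and Traditional Chinese. -->
<p lang="zh-Hans">简体中文</p>
<p lang="zh-Hant">繁體中文</p>

<!-- Shortest possible tag: fr, rather than fr-FR, unless the region matters. -->
<p lang="fr">Bonjour</p>

<!-- zxx: content that is known not to be in any language. -->
<p>Part number: <code lang="zxx">X9F3-KQ27</code></p>

<!-- Empty value: the language is undetermined. -->
<p lang="">(text whose language could not be identified)</p>
```
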
How to's
Useful reference links
Spec links

Declaring metadata about the language(s) of the intended audience

See also

This section is specifically about setting metadata for the document as an object. For information about declaring the language of the document for text-processing purposes, see Declaring the overall language of a page.

For detailed advice about how to select the right language tags, see Choosing language tags.

  • Consider using a Content-Language HTTP header to declare metadata about the language(s) of the intended audience of a document.
  • Where a document contains content aimed at speakers of more than one language, use the HTTP Content-Language header with a comma-separated list of language tags.
How to's
Background reading
Spec links

Indicating the language of a link destination

See also

For advice about how to select the right language tags, see Choosing language tags.

See also Linking to localized content in the Navigation section.

  • When pointing to a resource in another language, consider the pros and cons before indicating the language of the target document.
  • If you want to indicate that the target document of an a element is in another language, consider the pros and cons before using hreflang with CSS.
  • Do not use flag icons to indicate languages.
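
If, after weighing the pros and cons, you do decide to use hreflang with CSS, one possible shape is sketched below. The file name and the bracketed marker text are assumptions, not a recommended convention.

```html
<style>
  /* Append a textual marker (not a flag icon) to links whose target is in French. */
  a[hreflang="fr"]::after { content: " [fr]"; }
</style>

<p><a href="page.fr.html" hreflang="fr" lang="fr">Version française</a></p>
```
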
How to's
Spec links

Setting & changing browser language preferences

How to's

Using Accept-Language for locale setting

How to's

Markup & text

Getting started

How to's

Using b and i tags

  • Use the class attribute on a b or i element to identify why the element is being used.
  • Consider whether other elements might be more applicable than the b or i element because they carry the right semantics.
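
A sketch of what this can look like. The class names are invented for illustration; the point is that the styling hangs off the class, so a localizer can restyle one usage (say, taxonomic names) without affecting the others.

```html
<style>
  i.taxonomy { font-style: italic; }
  b.product  { font-weight: bold; }
</style>

<!-- The class says why i is used here. -->
<p>The domestic cat (<i class="taxonomy" lang="la">Felis catus</i>) is a small mammal.</p>
<p>Introducing the <b class="product">Example Widget</b>.</p>

<!-- Where the intent is stress emphasis, em carries the right semantics. -->
<p>You <em>must</em> save the file as UTF-8.</p>
```
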
How to's
Spec links

Using ruby markup

See also

This section is specifically about how to use markup for ruby annotations. For information about styling ruby see Styling ruby text.

How to's
  • Discusses how to use ruby markup in HTML5, and has pointers to what currently works in browsers.
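
For orientation, a minimal sketch of ruby markup in HTML, using a Japanese place name with its reading as the annotation. The rp elements supply fallback parentheses for browsers that do not render ruby.

```html
<p lang="ja">
  <ruby>東京<rp>(</rp><rt>とうきょう</rt><rp>)</rp></ruby>
</p>
```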

Useful reference links
Background reading
  • What are 'ruby' annotations?

  • A summary of how bopomofo is used and the implications for support on the Web.

  • Discussion about what is needed in the HTML5 specification, and possibly other markup vocabularies, to adequately support ruby markup. It looks at a number of use cases and how well they are supported by the various markup models.

  • Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)

Spec links
Tests

Working with form controls

See also

In the Characters section see Handling encoding issues in forms.

In the Bidirectional text section see Managing text direction in form controls.

In the Styling & layout section see Working with personal names and Working with date formats.

How to's
  • As part of a form, I have a list of terms in a drop-down box. Why are they not correctly sorted when I translate the items in the list?

Working with strings in JavaScript & databases

  • Use a topic-comment approach whenever possible.
  • Avoid sentence-like arrangements when they contain substrings that are predefined translatable text or numeric text.
  • Use sentence-like arrangements with care if you have non-numeric and non-translatable text substrings (ie. text created at runtime).
  • Where the parts of a composite message appear in separate locations, provide the translator with contextual information to show how the parts relate to each other.
  • Provide information to the translator, where needed, to clarify what a substring represents.
  • When requested by the localization group, be prepared to provide information about the size of each substring.
  • Strings should be reused where text is always used in exactly the same context, or where the string is a self-contained, independent sentence or phrase.
  • Reused strings must not refer to more than one text, graphic or conceptual context.
  • If in doubt as to whether a string is a good candidate for reuse, don't reuse it.
  • If reused strings will be displayed in fixed-size display boxes of varying sizes, ensure that every translation fits in the smallest box.
How to's
  • Why you need to be very careful about splitting up and reusing text on-screen. The linguistic differences between languages can lead to real headaches for localizers and may in some cases make a reasonable translation impossible to achieve.

  • Things to be aware of if you plan to use the same text string in different places on your site or user interface.

Useful reference links

Indicating what should and should not be translated

  • Use the translate attribute on an element to prevent its content being translated by online translation services or by computer-assisted translation tools.
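
A minimal sketch, assuming a user-interface label that must stay in its original form. The class name is hypothetical.

```html
<p>Click the <span class="uilabel" translate="no">Resume</span> button to continue.</p>
```
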
How to's
Spec links
Tests

Styling & layout

Getting started

Preparing for text expansion during translation

  • Ensure that your graphic backgrounds can automatically expand with the text they are related to, avoid highly constrained spaces, and anticipate that the box containing your text may grow during translation.
How to's
Background reading
  • Overview of text expansion issues.

  • Do I need to worry because display capabilities (screen sizes, number of colors, etc.) of computers vary in other countries?

  • Douglas Bowman's article in A List Apart about how to layer background images, allowing them to slide over each other to create certain effects. (A note from the editors: While brilliant for its time, this article no longer reflects modern best practices.)

Styling by language

See also

Related sections include Using attributes to declare language, and Choosing language tags.

  • Use :lang to set language-specific styling.
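
A sketch of :lang in use. The font names and the size adjustment are assumptions chosen only to show the mechanism.

```html
<style>
  /* Applies to any element whose language is Japanese, however it was declared. */
  :lang(ja) { font-family: "Hiragino Sans", "Yu Gothic", sans-serif; }
  /* Applies to Arabic content, including content nested inside other elements. */
  :lang(ar) { font-size: 120%; }
</style>

<p>English text followed by <span lang="ja">日本語のテキスト</span>.</p>
<p lang="ar" dir="rtl">مرحبا بالعالم</p>
```
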
How to's
Spec links
Tests

Using logical property styles

  • Use CSS logical properties wherever possible, so as to facilitate localization into right-to-left and vertically-set scripts.
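
As an illustration, the same rule below produces a leading border and indent on the left in a left-to-right paragraph and on the right in a right-to-left one. The class name is hypothetical.

```html
<style>
  .note {
    margin-inline-start: 2em;             /* rather than margin-left */
    border-inline-start: 4px solid gray;  /* rather than border-left */
    padding-block: 0.5em;                 /* rather than padding-top and padding-bottom */
  }
</style>

<p class="note">Left-to-right note.</p>
<p class="note" dir="rtl" lang="ar">مرحبا بالعالم</p>
```
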
How to's
Spec links
Tests

Styling counters for lists, etc.

  • Use the CSS @counter-style rule to define or modify counters used for list markers, figure numbering, chapter headings, etc.
  • Don't assume that all writing systems prefer a ragged edge at the line end. Fully-justified text is the default for some scripts/languages.
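
A small sketch of the @counter-style rule mentioned above. The style name example-arabic-indic-paren is invented; it extends the predefined arabic-indic style and changes only the suffix.

```html
<style>
  @counter-style example-arabic-indic-paren {
    system: extends arabic-indic;  /* reuse the predefined digits */
    suffix: ") ";                  /* closing parenthesis after the number */
  }
  ol.steps { list-style-type: example-arabic-indic-paren; }
</style>

<ol class="steps">
  <li>First item</li>
  <li>Second item</li>
</ol>
```
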
How to's
  • How to define your own counter styles when the pre-defined styles aren't fitting your needs.

  • Cut-and-paste code snippets for a large number of international counter styles that can be used for ordered lists and other such counters.

Useful reference links
Spec links
Tests

Managing line breaks

See also

Hyphenation affects line-breaking, but has its own section here. Line-breaking behaviour is also closely associated with justification. For the latter, see Justifying and aligning text.

  • Since default line-breaking rules vary by language, always correctly label your content for language.
How to's
  • word-break: Specifies whether or not the browser should insert line breaks wherever the text would otherwise overflow its content box due to a lack of spaces. Particularly useful for Chinese and Japanese. Values include break-all and keep-all.

  • line-break: For Chinese, Japanese, or Korean (CJK), specifies how (or if) to break lines when working with punctuation and symbols. Values include strict, normal, loose, and anywhere.
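
A minimal sketch using the CSS word-break and line-break properties, whose values match those listed above. The class names and sample text are assumptions, and the content is labelled for language as the guideline requires.

```html
<style>
  .strict-breaks { line-break: strict; }    /* stricter rules around punctuation and symbols */
  .keep-words    { word-break: keep-all; }  /* do not break between CJK characters */
</style>

<p lang="ja" class="strict-breaks">日本語のテキスト</p>
<p lang="ko" class="keep-words">한국어 텍스트</p>
```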

Useful reference links
Background reading
  • High level summary of various typographic strategies for wrapping text at the end of a line, for a variety of scripts.

Spec links
Tests

Hyphenation

See also

This section is specifically about hyphenation. For more general information about line breaking see Managing line breaks.

  • Since CSS hyphenation only works if content is labelled for language, always do that. Since hyphenation rules are language-specific, ensure that the language is labelled correctly.
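
For example, a sketch assuming German content (the class name is hypothetical). Without the lang attribute the browser has no hyphenation rules to apply.

```html
<style>
  .hyphenate { hyphens: auto; }
</style>

<p lang="de" class="hyphenate">Die Donaudampfschifffahrtsgesellschaft hat ihren Sitz in Wien.</p>
```
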
How to's
Useful reference links
Spec links
Tests

Justifying and aligning text

See also

Justification behaviour is closely associated with line-breaking and hyphenation. For more information on those topics, see Managing line breaks.

  • Wherever possible use start and end values for the CSS text-align property, rather than left and right. Only use left and right on the rare occasions when the alignment has to remain as is, regardless of language.
  • Only use text-align when you really need to override the alignment produced by the current base direction. Don't litter your markup or stylesheet with unnecessary alignment calls.
  • Avoid using HTML attributes with values of left and right. Instead add selectors to your CSS stylesheet. This allows you to use logical properties, but also makes it much easier to change things during localisation.
  • Use CSS property names that include the words 'start' and 'end', rather than 'left', 'right', 'top', and 'bottom'. Eg. margin-inline-start and margin-block-start.
  • Don't assume that all writing systems prefer a ragged edge at the line end. Fully-justified text is the preferred default for some scripts/languages.
  • Since justification rules vary by language, always correctly label your content for language.
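
A sketch of these alignment guidelines. The class names are hypothetical, and browser support for the text-justify property varies, so treat that declaration as illustrative.

```html
<style>
  /* end follows the base direction: right in LTR text, left in RTL text. */
  .byline { text-align: end; }
  /* Justified text, with the justification method stated explicitly. */
  .body-text { text-align: justify; text-justify: inter-character; }
</style>

<p class="byline">Aligned to the right in this left-to-right paragraph.</p>
<p class="byline" dir="rtl" lang="ar">مرحبا بالعالم</p>
<p class="body-text" lang="ja">日本語のテキスト</p>
```
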
How to's
  • text-align: Specifies the horizontal alignment of an inline or table-cell box, including the value justify, which is used to turn on justification.

  • text-justify: Defines what type of justification should be applied to text when it is justified (ie. when text-align:justify is set). Values include inter-word and inter-character.

Useful reference links
Background reading
  • High level summary of various typographic strategies for fully justifying text on a line and in a paragraph for a variety of scripts, and some advice for authors and implementers.

Spec links
Tests

Creating vertical text

How to's
Spec links
  • Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.

Styling ruby text

See also

This section is specifically about styling ruby text. For more information about markup for ruby see Using ruby markup.

How to's
  • Discusses how to use CSS styling to affect the rendering of ruby content.

  • ruby-align: Defines the distribution of the different ruby elements over the base.

Useful reference links
Background reading
  • What is 'ruby'?

  • Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)

Spec links
Tests
  • Includes tests for ruby-position, ruby-align, ruby-merge, and ruby autohide

Applying various script-specific typographic conventions

  • Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.

Using fonts & webfonts

How to's

Working with date formats

How to's
  • How do I prepare my web pages to display varying international date formats?

Working with personal names

  • Ask yourself whether you really need to have separate fields for given name and family name.
  • Make input fields long enough to enter long names, and ensure that if the name is displayed on a web page later there is enough space for it.
  • Avoid limiting the field size for names in your database.
  • Try to avoid using the labels 'first name' and 'last name' in non-localized forms.
  • Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where you ask the user to enter the part(s) of their name that you need to use for a specific purpose.
  • Ask separately, when setting up a profile for example, how that person would like you to address them.
  • If you have separate fields for parts of a person's name, ensure that you label clearly which parts you want where.
  • Be careful about assumptions built into algorithms that pull out the parts of a name automatically.
  • Be as clear as possible about telling people how to specify their name.
  • Don't assume that a single letter name is an initial.
  • Don't require that people supply a family name.
  • Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names.
  • Don't require names to be entered all in upper case.
  • Allow the user to enter a name with spaces.
  • Don't assume that members of the same family will share the same family name.
  • It may be better for a form to ask for 'Previous name' rather than 'Maiden name' or 'née'.
  • If you hope to get Latin- or ASCII-only, you need to tell the user.
  • You may want to store the name in both Latin and native scripts, in which case you probably need to ask the user to submit their name in both native script and Latin-only form, using separate fields.
  • If you do accept non-ASCII names, you should use a Unicode character encoding (eg. UTF-8) in your pages, your back end databases and in all the software code in between.
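
Several of these points can be combined in a form like the sketch below. The action URL, field names, and label wording are assumptions; note the single full-name field, the generous field width, the absence of length or character restrictions, and the separate question about how to address the person.

```html
<form action="/profile" method="post" accept-charset="utf-8">
  <p>
    <label for="fullname">Full name</label>
    <input type="text" id="fullname" name="fullname" autocomplete="name" size="50">
  </p>
  <p>
    <label for="callme">What should we call you?</label>
    <input type="text" id="callme" name="callme" autocomplete="nickname" size="30">
  </p>
  <p><button type="submit">Save</button></p>
</form>
```
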
How to's
  • How do people's names differ around the world, and what are the implications of those differences on the design of forms, databases, ontologies, etc. for the Web?

Bidirectional text

Getting started

How to's

Setting up a right-to-left page

See also

This section is about setting up the default direction for a whole page. For information about working with text direction changes inside the document see Setting direction on block elements and Mixing text direction inline.

  • Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
  • Add dir="rtl" to the html tag any time the overall document direction is right-to-left.
  • Don't add dir="rtl" to the body tag.
  • If you need to avoid the scroll bar moving on some browsers, put dir on the head element and a div just inside the body element.
  • Use logical, not visual, ordering for Hebrew, and choose an appropriate encoding.
  • If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8.
  • Do not use CSS styling to control directionality in HTML. Use markup.
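
A minimal sketch of a right-to-left page, assuming Hebrew content saved as UTF-8 in logical order. The direction is set once, in markup, on the html element.

```html
<!DOCTYPE html>
<html dir="rtl" lang="he">
<head>
  <meta charset="utf-8">
  <title>שלום</title>
</head>
<body>
  <p>שלום עולם</p>
</body>
</html>
```
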
How to's
Spec links
Tests

Setting direction on block elements

See also

For information about setting up the default direction for a whole page see Setting up a right-to-left page.

See also Managing text direction in form controls.

  • Add the dir attribute to a block element to change base direction.
  • Do not use CSS styling to control directionality in HTML. Use markup.
  • Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
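
For instance, a sketch of an Arabic quotation inside an otherwise left-to-right page. Only the block whose base direction differs carries a dir attribute.

```html
<p>The sign said:</p>
<blockquote dir="rtl" lang="ar">مرحبا بالعالم</blockquote>
<p>The surrounding text continues left-to-right with no extra markup.</p>
```
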
How to's
Spec links
Tests

Managing text direction in form controls

  • Add dir="auto" to input tags to automatically align text to the correct side of an input field.
  • Add dir="auto" to textarea and pre tags to make paragraphs align to the left or right according to the initial strong character.
  • Consider using the dirname attribute to pass information to the server about the direction of text in a text or search form control.
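
A sketch of these attributes together. The action URL and the field names are assumptions; with dirname set, the browser submits an additional field (here title.dir and comment.dir) whose value is ltr or rtl.

```html
<form action="/comments" method="post">
  <p>
    <label for="title">Title</label>
    <input type="text" id="title" name="title" dir="auto" dirname="title.dir">
  </p>
  <p>
    <label for="comment">Comment</label>
    <textarea id="comment" name="comment" dir="auto" dirname="comment.dir"></textarea>
  </p>
  <p><button type="submit">Send</button></p>
</form>
```
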
How to's
Spec links
Tests

Mixing text direction inline

  • Tightly wrap every opposite-direction phrase in markup that sets its base direction.
  • If you know the phrase's direction, wrap it in an element with a dir attribute. If you don't already have an element around the text, use span or bdi.
  • If you don't know the phrase's direction, ie. unknown text that will be injected at run time, then either wrap the phrase in bdi (no dir attribute needed), or if the phrase is tightly wrapped by an element already, just add dir="auto" to that element.
  • To bulletproof the code for Edge or legacy browsers, if the tightly-wrapped phrase is followed inline (possibly after some intervening neutral characters) by a number, or is one of a list of separate phrases with the same direction, then add a directional mark (RLM or LRM) immediately after the markup of that phrase.
  • Only use Unicode control characters for bidirectional control in attribute text or element text that allows no internal markup.
  • Consider using Unicode control characters to set the base direction around bidirectional text that will be displayed as tooltips, page titles, or on JavaScript dialog boxes.
  • Do not leave white space at the end of inline elements that mark a directional boundary.
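
The first three guidelines might be sketched like this in a left-to-right page. The sample phrases stand in for real content, and the bdi content represents text injected at run time.

```html
<!-- Known direction: wrap the phrase tightly, with no trailing space inside the span. -->
<p>The title is <span dir="rtl" lang="ar">مرحبا بالعالم!</span> in Arabic.</p>

<!-- Unknown direction: bdi isolates the text and needs no dir attribute. -->
<p>Posted by <bdi>اسم المستخدم</bdi>: 3 comments</p>

<!-- Unknown direction, element already present: add dir="auto" to it. -->
<p>Now reading: <cite dir="auto">שלום עולם</cite></p>
```
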
How to's
Spec links
Tests

Handling parentheses and other mirrored characters

  • Treat the names of mirrored characters as if the word 'left' meant 'opening' and the word 'right' meant 'closing'.
How to's

Overriding the Unicode bidirectional algorithm

  • Use the bdo element to force the directionality of a sequence of inline characters.
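
One plausible use, sketched below, is showing Hebrew characters in the order in which they are stored rather than the order in which they would normally be displayed.

```html
<p>Displayed normally: <span lang="he" dir="rtl">שלום</span></p>
<p>Forced left-to-right, ie. in stored order: <bdo dir="ltr" lang="he">שלום</bdo></p>
```
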
How to's
Spec links
Tests

You can link to this page and open specific items by using the open parameter in the URL. For example, authoring-html.en?open=language&open=langvalues will automatically open the sections Language and Choosing language tags. The necessary parameter values are shown to the right of each heading. These are links, to help you create a URL for sharing. The query ?open=all expands all sections.
