Internationalization techniques:
Authoring HTML & CSS

This page lists links to resources on the W3C Internationalization Activity site and elsewhere that help you author HTML and CSS for internationalization. It is one of several techniques pages.

Characters
Getting started
Background reading
Choosing and applying a character encoding
See also

This section is specifically about how to choose a character encoding for your content and ensure that the content is in that encoding.

For information about how to declare the encoding so that the browser knows how to read your content see Declaring the character encoding for HTML and Declaring the character encoding for your CSS stylesheet.

See also the dedicated section about Changing to UTF-8.

  • Choose UTF-8 for all content.
  • If you really can't use a Unicode encoding, use only those legacy encodings listed in the Encoding specification.
  • Avoid the following encodings: UTF-16, UTF-32, CESU-8, UTF-7, BOCU-1, SCSU, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB (Windows code page 1361), and encodings based on ISO-2022 or EBCDIC.
How to's
Useful reference links
  • If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Spec links
Background reading
  • Are corporate Web sites using Unicode right now?  This article is somewhat outdated, now that Unicode accounts for around 80% of pages on the Web.

  • What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?

Changing to UTF-8
See also

This section is specifically about how to migrate your content to the UTF-8 (Unicode) encoding. For more general advice see Choosing and applying a character encoding.

For information about how to declare the encoding so that the browser knows how to read your content see Declaring the character encoding for HTML and Declaring the character encoding for your CSS stylesheet.

  • Save the data as UTF-8, don't just change the encoding declaration.
  • Declare the encoding in your page.
  • Ensure that your server does the right thing.
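The first guideline above can be illustrated with a minimal Python sketch. It assumes you already know the legacy encoding of the stored bytes (windows-1252 here is only an example), and the function name transcode_to_utf8 is hypothetical. The point is that the bytes themselves must change, not just the declaration.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8."""
    # Strict decoding may raise UnicodeDecodeError when the assumed source
    # encoding cannot represent these bytes, but it cannot catch every
    # wrong guess, so check the result by eye as well.
    text = data.decode(source_encoding)
    return text.encode("utf-8")


legacy_bytes = "café".encode("windows-1252")  # b'caf\xe9'
utf8_bytes = transcode_to_utf8(legacy_bytes, "windows-1252")
print(utf8_bytes)  # b'caf\xc3\xa9'
```

After converting, the page and server declarations still need to say UTF-8, as the remaining guidelines describe.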
How to's
Background reading
  • What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?

Declaring the character encoding for HTML
See also

This section is specifically about how to declare the character encoding of your HTML page.

For advice about which encoding to choose, see Choosing and applying a character encoding.

For further advice about setting the character encoding on the server, see Setting the HTTP charset parameter and Setting character encoding information using .htaccess.

  • Use the HTTP header if it is available.
  • Always use an in-document encoding declaration, even if you are also using the HTTP header.
  • Ensure that the encoding declaration fits within the first 1024 bytes of the page.
  • If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
  • Do not use the charset attribute on a or link elements.
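As a rough way to check the 1024-byte guideline above, the following Python sketch looks for a complete meta encoding declaration in the first 1024 bytes of a page. The function name early_charset_label and the regular expression are assumptions made for illustration. This is a loose approximation, not the encoding sniffing algorithm that browsers implement.

```python
import re
from typing import Optional

# Loose pattern covering <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
# The trailing [^>]*> requires the whole tag to fit inside the slice.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)[^>]*>",
    re.IGNORECASE,
)


def early_charset_label(page: bytes, limit: int = 1024) -> Optional[str]:
    """Return the declared encoding label if it sits within `limit` bytes."""
    match = _META_CHARSET.search(page[:limit])
    return match.group(1).decode("ascii").lower() if match else None


page = b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>Test</title>'
print(early_charset_label(page))  # utf-8
```

A result of None means either that there is no in-document declaration or that it appears too late in the file.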
How to's
Useful reference links
  • If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Spec links
Background reading
  • Introduces doctypes, mime-types, and the influence of standards- vs. quirks-mode on character encoding declarations.

  • Tutorial style article that gathers together and organizes pointers to articles that, taken together, help you understand how to handle the essential aspects of authoring HTML and CSS related to characters and character encodings.

Declaring the character encoding for a CSS style sheet
See also

This section is specifically about how to declare the character encoding of your CSS stylesheet.

For advice about which encoding to choose, see Choosing and applying a character encoding.

For further advice about setting the character encoding on the server, see Setting the HTTP charset parameter and Setting character encoding information using .htaccess.

  • If you use UTF-8 as the character encoding for your style sheets and your HTML pages, and declare that encoding in your HTML, there is no need to declare the encoding for your style sheet.
  • If you use @charset, ensure that nothing (except a BOM) comes before it in the style sheet, and use the exact syntax.
  • If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
  • Do not use the charset attribute on a or link elements.
How to's
Useful reference links
  • If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Spec links
Using escapes to represent characters
  • Avoid using escapes whenever possible. UTF-8 supports all the characters you need.
  • Use escapes for invisible or ambiguous characters.
  • Use CSS escapes for CSS embedded in HTML, rather than HTML escapes.
  • Always use Unicode codepoints for the numeric part of a character escape. Do not use codepoint values of non-Unicode encodings.
  • Use a single escape (representing the Unicode codepoint value) for supplementary characters. Do not escape surrogate character pairs.
  • Ensure that all href attribute values have escaped ampersands in query parameters, ie. &amp; rather than just &.
  • Avoid named character entities in XHTML.
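For cases where an escape really is needed, the sketch below shows one way to build escapes from the Unicode codepoint, including a supplementary character written as a single escape rather than as a surrogate pair. The helper names are hypothetical. The last line uses the standard library's html.escape to turn a bare ampersand in a query string into &amp;.

```python
import html


def html_escape_char(ch: str) -> str:
    """Hexadecimal HTML numeric character reference for one character."""
    return f"&#x{ord(ch):X};"


def css_escape_char(ch: str) -> str:
    """CSS escape for one character; the trailing space ends the escape."""
    return f"\\{ord(ch):X} "


supplementary = "\U000233B4"  # a CJK ideograph outside the Basic Multilingual Plane
print(html_escape_char(supplementary))  # &#x233B4;
print(css_escape_char(supplementary))   # \233B4 (followed by a space)
print(html.escape("search?q=tea&lang=en"))  # search?q=tea&amp;lang=en
```

Python strings are sequences of codepoints, so ord() returns the full supplementary value here and no surrogate handling is needed.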
How to's
Useful reference links
Spec links
Checking the encoding of a document
How to's
Useful reference links
Handling the byte-order mark (BOM)
  • If you use the byte-order mark with UTF-8-encoded pages, check that any scripts and back-end processes can handle the BOM.
  • If you ignored the advice above and encoded your page as UTF-16, always ensure that it starts with a BOM.
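When checking whether back-end processes will meet a BOM, it can help to test the first bytes of a file directly. The following is a minimal sketch using the BOM constants from Python's codecs module. The function name detect_bom is hypothetical, and the sketch deliberately handles only UTF-8 and UTF-16 signatures, so a UTF-32 little-endian file would be misreported as UTF-16.

```python
import codecs
from typing import Optional

# None of these three signatures is a prefix of another,
# so the order of the checks does not matter.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
)


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for signature, name in _BOMS:
        if data.startswith(signature):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbf<!DOCTYPE html>"))  # utf-8
print(detect_bom(b"<!DOCTYPE html>"))              # None
```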
How to's
Useful reference links
Spec links
Handling character normalization
  • Ensure that all HTML class names and CSS selectors are saved using the same Unicode normalization form (NFC is recommended).
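To see why this matters, the short sketch below compares a class name written with a precomposed character against the same name written with a combining mark. The two strings look identical but only match after both are normalized to NFC. The helper name nfc is an assumption; unicodedata.normalize is part of the Python standard library.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


precomposed = "caf\u00e9"   # é as a single codepoint
decomposed = "cafe\u0301"   # e followed by a combining acute accent

print(precomposed == decomposed)            # False
print(nfc(precomposed) == nfc(decomposed))  # True
```

A selector saved in one form will not match a class name saved in the other, which is why both files need the same normalization form.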
How to's
Useful reference links
Handling encoding issues in forms
  • Use UTF-8 for the character encoding of your page.
  • Consider checking on the server that form data is arriving in UTF-8.
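One simple way to perform the server-side check is to attempt a strict UTF-8 decode of the raw submitted bytes. This sketch assumes your framework gives you access to the undecoded form value; the function name is_valid_utf8 is hypothetical.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"caf\xc3\xa9"))  # True: é encoded as UTF-8
print(is_valid_utf8(b"caf\xe9"))      # False: é encoded as ISO-8859-1
```

Pure ASCII input passes this check whatever the sender intended, so a positive result only shows that the bytes are consistent with UTF-8.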
How to's
  • What is the best way to deal with encoding issues in forms that may use multiple languages and scripts?

Using Unicode control codes
See also

If you represent control codes using character escapes, see also Using escapes to represent characters for more information.

  • Don't use Unicode control characters if there is markup to do the same job.
  • Use character escapes to represent control codes, so that they are visible.
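The second guideline can be sketched as a small helper that rewrites invisible directional controls as numeric character references, so that they show up in the source. The set of characters below is illustrative rather than exhaustive, and the function name show_controls is hypothetical.

```python
# A few invisible directional control characters (LRM, RLM, the embedding,
# override and isolate controls). This set is illustrative, not exhaustive.
_CONTROLS = {
    "\u200E", "\u200F",
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def show_controls(text: str) -> str:
    """Replace invisible control characters with HTML numeric escapes."""
    return "".join(
        f"&#x{ord(ch):X};" if ch in _CONTROLS else ch for ch in text
    )


print(show_controls("abc\u200Fdef"))  # abc&#x200F;def
```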
How to's
Working around unavailable characters/glyphs
How to's
Using non-ASCII web addresses
Useful reference links
Spec links
Background reading
Language
Getting started
Background reading
Declaring the overall language of a page
See also

For detailed advice about how to select the right language tags, see Choosing language values.

See also Declaring metadata about the language of the intended audience.

  • Always declare the default language for text in the page using attributes on the html tag.
  • Do NOT use the meta element with the content attribute set to Content-Language.
  • Use language attributes rather than HTTP to declare the default language for 'text processing' (ie. when language needs to be known for things such as font choice, styling, spell-checking, hyphenation, quote mark styling, etc.).
  • Do not declare the default language of a document in the body element; use the html element.
  • Where a document contains content aimed at speakers of more than one language, decide whether you want to declare one language in the html tag, or leave the languages undefined until later.
  • Where a document contains content aimed at speakers of more than one language, try to divide the document linguistically at the highest possible level, and declare the appropriate language for each of those divisions.
  • For HTML use the lang attribute only; for XHTML 1.0 served as text/html use the lang and xml:lang attributes; and for XHTML served as XML use the xml:lang attribute only.
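The last guideline above maps each way of serving a document to a set of language attributes. A template helper could encode that mapping as in the sketch below. The function name and the three serialization labels are assumptions for illustration, and the tag is assumed to be a valid BCP 47 language tag.

```python
def language_attributes(tag: str, serialization: str) -> str:
    """Return the language attribute(s) for the html element.

    `serialization` is one of the hypothetical labels "html",
    "xhtml-as-html" (XHTML 1.0 served as text/html) or "xhtml-as-xml".
    """
    if serialization == "html":
        return f'lang="{tag}"'
    if serialization == "xhtml-as-html":
        return f'lang="{tag}" xml:lang="{tag}"'
    if serialization == "xhtml-as-xml":
        return f'xml:lang="{tag}"'
    raise ValueError(f"unknown serialization: {serialization}")


print(f"<html {language_attributes('fr', 'html')}>")  # <html lang="fr">
```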
How to's
Background reading
Spec links
Identifying in-document language changes
See also

See also Declaring the overall language of a page.

For detailed advice about how to select the right language tags, see Choosing language values.

  • When the page contains content in another language, add a language attribute to an element surrounding that content.
  • For HTML use the lang attribute only; for XHTML 1.0 served as text/html use the lang and xml:lang attributes; and for XHTML served as XML use the xml:lang attribute only.
  • If the text in attribute values and element content is in different languages, consider using a nested approach.
How to's
Background reading
Spec links
Choosing language tags
  • Use subtags as defined by BCP 47 for language attribute values.
  • Use the shortest possible language tag values.
  • Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified and Traditional Chinese, respectively.
  • Use the subtag zxx when the text is known to be not in any language.
  • When the language is undetermined and you have to label it, use lang="".
  • If you are serving XML, and the format you are using supports it, use xml:lang="", otherwise use xml:lang="und" when the language is undetermined and you have to label it.
How to's
Useful reference links
Spec links
Declaring metadata about the language(s) of the intended audience
See also

This section is specifically about setting metadata for the document as an object. For information about declaring the language of the document for text-processing purposes, see Declaring the overall language of a page.

For detailed advice about how to select the right language tags, see Choosing language values.

  • Consider using a Content-Language HTTP header to declare metadata about the language(s) of the intended audience of a document.
  • Where a document contains content aimed at speakers of more than one language, use the HTTP Content-Language header with a comma-separated list of language tags.
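As a small illustration of the comma-separated form, the sketch below builds the header as a name and value pair that a server framework could send. The function name content_language_header is hypothetical, and the language tags are only examples.

```python
from typing import Iterable, Tuple


def content_language_header(tags: Iterable[str]) -> Tuple[str, str]:
    """Build a Content-Language header from a list of language tags."""
    return ("Content-Language", ", ".join(tags))


name, value = content_language_header(["de", "fr", "it"])
print(f"{name}: {value}")  # Content-Language: de, fr, it
```

This header describes the intended audience only. The language of the text itself is still declared with language attributes in the markup.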
How to's
Spec links
Indicating the language of a link destination
See also

For advice about how to select the right language tags, see Choosing language values.

See also Linking to localized content in the Navigation section.

  • When pointing to a resource in another language, consider the pros and cons before indicating the language of the target document.
  • If you want to indicate that the target document of an a element is in another language, consider the pros and cons before using hreflang with CSS.
  • Do not use flag icons to indicate languages.
How to's
Spec links
Setting & changing browser language preferences
How to's
Using Accept-Language for locale setting
How to's
Markup & text
Getting started
How to's
Working with composite strings and string re-use
  • Use a topic-comment approach whenever possible.
  • Avoid sentence-like arrangements when they contain substrings that are predefined translatable text or numeric text.
  • Use sentence-like arrangements with care if you have non-numeric and non-translatable text substrings (ie. text created at runtime).
  • Where the parts of a composite message appear in separate locations, provide the translator with contextual information to show how the various parts of a composite message relate to each other.
  • Provide information to the translator, where needed, to clarify what a substring represents.
  • When requested by the localization group, be prepared to provide information about the size of each substring.
  • Strings should be reused where text is always used in exactly the same context, or where the string is a self-contained, independent sentence or phrase.
  • Reused strings must not refer to more than one text, graphic or conceptual context.
  • If in doubt as to whether a string is a good candidate for re-use, don't.
  • If re-used strings will be displayed in fixed-size display boxes of varying sizes, ensure that every translation will fit in the smallest display box.
How to's
  • Why you need to be very careful about splitting up and reusing text on-screen. The linguistic differences between languages can lead to real headaches for localizers and may in some cases make a reasonable translation impossible to achieve.

  • Things to be aware of if you plan to use the same text string in different places on your site or user interface.

Useful reference links
Using ruby markup
See also

This section is specifically about how to use markup for ruby annotations. For information about styling ruby see Styling ruby text.

How to's
  • Discusses how to use ruby markup in HTML5, and has pointers to what currently works in browsers.

Background reading
  • What are 'ruby' annotations?

  • Mainly based on a standard for Japanese layout, JIS X 4051; however, it also addresses areas that are not covered by JIS X 4051.

  • A summary of how bopomofo is used and the implications for support on the Web.

  • Discussion about what is needed in the HTML5 specification, and possibly other markup vocabularies, to adequately support ruby markup. It looks at a number of use cases and how well they are supported by the various markup models.

  • Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)

Spec links
Using b and i tags
  • Use the class attribute on a b or i element to identify why the element is being used.
  • Consider whether other elements might be more applicable than the b or i element because they carry the right semantics.
How to's
Spec links
Working with form controls
See also

In the Characters section see Handling encoding issues in forms.

In the section Text Direction see Managing text direction in form controls.

In the section Styling & Layout see Working with names and Working with date formats.

How to's
  • As part of a form, I have a list of terms in a drop-down box. Why are they not correctly sorted when I translate the items in the list?

Indicating what should and should not be translated
  • Use the translate attribute on an element to prevent its content being translated by online translation services or by computer-assisted translation tools.
How to's
Spec links
Text direction
Getting started
How to's
Setting up a right-to-left page
See also

This section is about setting up the default direction for a whole page. For information about working with text direction changes inside the document see Changing the direction of a block element and Mixing text direction inline.

  • Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
  • Add dir="rtl" to the html tag any time the overall document direction is right-to-left.
  • Don't add dir="rtl" to the body tag.
  • If you need to avoid the scroll bar moving on some browsers, put dir on the head element and a div just inside the body element.
  • Use logical, not visual, ordering for Hebrew, and choose an appropriate encoding.
  • If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8.
  • Do not use CSS styling to control directionality in HTML. Use markup.
How to's
Setting direction on block elements
See also

For information about setting up the default direction for a whole page see Setting up a right-to-left page.

See also Managing direction in form controls.

  • Add the dir attribute to a block element to change base direction.
  • Do not use CSS styling to control directionality in HTML. Use markup.
  • Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
How to's
Managing text direction in form controls
  • Add dir="auto" to input tags to automatically align text to the correct side of an input field.
  • Add dir="auto" to textarea and pre tags to make paragraphs align to the left or right according to the initial strong character.
  • Consider using the dirname attribute to pass information to the server about the direction of text in a text or search form control.
How to's
Mixing text direction inline
  • Tightly wrap every opposite-direction phrase in markup that sets its base direction.
  • If you know the phrase's direction, wrap it in an element with a dir attribute. If you don't already have an element around the text, use span or bdi.
  • If you don't know the phrase's direction, ie. unknown text that will be injected at run time, then either wrap the phrase in bdi (no dir attribute needed), or if the phrase is tightly wrapped by an element already, just add dir="auto" to that element.
  • To bulletproof the code for Edge or legacy browsers, if the tightly-wrapped phrase is followed inline (possibly after some intervening neutral characters) by a number, or is one of a list of separate phrases with the same direction, then add a directional mark (RLM or LRM) immediately after the markup of that phrase.
  • Only use Unicode control characters for bidirectional control in attribute text or element text that allows no internal markup.
  • Consider using Unicode control characters to set the base direction around bidirectional text that will be displayed as tooltips, page titles, or on JavaScript dialog boxes.
  • Do not leave white space at the end of inline elements that mark a directional boundary.
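For text of unknown direction that is injected at run time, a server-side template helper might wrap the value in bdi before inserting it into the page. The sketch below is one hedged way to do that. The function name wrap_bdi is hypothetical, and html.escape from the standard library guards against markup in the injected value.

```python
import html


def wrap_bdi(text: str) -> str:
    """Tightly wrap injected text of unknown direction in a bdi element."""
    # No white space is added inside the element, in line with the
    # guideline about directional boundaries.
    return f"<bdi>{html.escape(text)}</bdi>"


print(wrap_bdi("Tom & Jerry"))  # <bdi>Tom &amp; Jerry</bdi>

# A Hebrew user name, written with escapes so the source stays readable.
hebrew_name = "\u05E9\u05DC\u05D5\u05DD"
print(f"<p>{wrap_bdi(hebrew_name)} - 3 reviews</p>")
```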
How to's
Handling parentheses and other mirrored characters
  • Treat mirrored characters as if the word 'left' in the character name meant 'opening', and 'right' meant 'closing'.
How to's
Overriding the Unicode bidirectional algorithm
  • Use the bdo element to force the directionality of a sequence of inline characters.
How to's
Creating vertical text
How to's
  • Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.

Styling & layout
Getting started
Preparing for text expansion during translation
  • Ensure that your graphic backgrounds can automatically expand with the text they are related to, avoid highly constrained spaces, and anticipate that the box containing your text may grow during translation.
How to's
Background reading
  • Overview of text expansion issues.

  • Do I need to worry because display capabilities (screen sizes, number of colors, etc.) of computers vary in other countries?

  • Douglas Bowman's article in A List Apart about how to layer background images, allowing them to slide over each other to create certain effects. (A note from the editors: While brilliant for its time, this article no longer reflects modern best practices.)

Styling by language
See also

Related sections include Using attributes to declare language, and Choosing language values.

  • Use :lang to set language-specific styling.
How to's
Styling lists
How to's
  • Provides cut and paste templates for a large number of international counter styles that can be used for ordered lists and other such counters.

  • Allows you to convert ASCII numbers into other representations that can be used for ordered list counters, headings, etc, using the algorithms described by CSS3 Counter Styles.

  • Preview of upcoming proposals for CSS3 written in 2003.

Managing line breaks
How to's
  • Mozilla documentation describing how to use the CSS hyphens property. Also includes a table of supported languages in browsers.

Justifying and aligning text
How to's
  • Mozilla documentation describing how to use the CSS hyphens property. Also includes a table of supported languages in browsers.

Styling ruby text
See also

This section is specifically about styling ruby text. For more information about markup for ruby see Using ruby markup.

How to's
  • Introduction to styling ruby with CSS3 Ruby Module. In W3C article, Ruby Markup and Styling.

Spec links
Background reading
  • What is 'ruby'?

  • Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)

Applying various script-specific typographic conventions
  • Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.

Using fonts & webfonts
How to's
Working with date formats
How to's
  • How do I prepare my web pages to display varying international date formats?

Working with personal names
  • Ask yourself whether you really need to have separate fields for given name and family name.
  • Make input fields long enough to enter long names, and ensure that if the name is displayed on a web page later there is enough space for it.
  • Avoid limiting the field size for names in your database.
  • Try to avoid using the labels 'first name' and 'last name' in non-localized forms.
  • Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where you ask the user to enter the part(s) of their name that you need to use for a specific purpose.
  • Ask separately, when setting up a profile for example, how that person would like you to address them.
  • If you have separate fields for parts of a person's name, ensure that you label clearly which parts you want where.
  • Be careful about assumptions built into algorithms that pull out the parts of a name automatically.
  • Be as clear as possible about telling people how to specify their name.
  • Don't assume that a single letter name is an initial.
  • Don't require that people supply a family name.
  • Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names.
  • Don't require names to be entered all in upper case.
  • Allow the user to enter a name with spaces.
  • Don't assume that members of the same family will share the same family name.
  • It may be better for a form to ask for 'Previous name' rather than 'Maiden name' or 'née'.
  • If you hope to get Latin- or ASCII-only, you need to tell the user.
  • You may want to store the name in both Latin and native scripts, in which case you probably need to ask the user to submit their name in both native script and Latin-only form, using separate fields.
  • If you do accept non-ASCII names, you should use a Unicode character encoding (eg. UTF-8) in your pages, your back end databases and in all the software code in between.
How to's
  • How do people's names differ around the world, and what are the implications of those differences on the design of forms, databases, ontologies, etc. for the Web?

You can link to this page and open specific items by using the open parameter in the URL. For example, authoring-html.en?open=language&open=langvalues will automatically open the sections Language and Choosing language tags. The necessary parameter values are shown to the right of each heading.