W3C I18N Techniques: Authoring (X)HTML & CSS
This page lists links to resources on the W3C Internationalization Activity site and elsewhere that help you perform particular tasks. It is a sub-page of the techniques index. Although the page title refers to 'authoring', the page is also relevant to (X)HTML that is produced by scripts and tools, and that may be derived from an authored document in a different format (such as an XML file).
Characters
In this section
- Getting started
- Choosing a character encoding
- Declaring the character encoding in an X/HTML document
- Declaring the character encoding in a CSS style sheet
- Declaring the character encoding on the server
- Using escapes to represent characters
- Changing the encoding of a document
- Using non-ASCII web addresses
Getting started
Background information
-
Character encodings for beginners
What is a character encoding, and why should I care? W3C article.
-
Introducing character sets and encodings
W3C Getting Started article.
-
Unicode, character sets, coded character sets, character encodings, the document character set, and character escapes. Part of a W3C tutorial.
-
Brief overview of encoding declarations. W3C article.
-
W3C article.
-
W3C article.
Choosing a character encoding
How to's
-
W3C best practices document (Characters and Encodings).
-
In W3C tutorial, Character sets & encodings in XHTML, HTML and CSS.
-
Guidelines for the migration of software and data to Unicode. W3C article.
-
Display problems caused by the UTF-8 BOM
When using UTF-8 encoded pages in some user agents, I get an extra line or unwanted characters at the top of my web page or included file. How do I remove them? W3C article.
-
Setting encoding in web authoring applications
How do I set character encoding in my web authoring application? W3C article.
Background reading
-
What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents? W3C article.
Other references
-
Are corporate Web sites using Unicode right now? W3C article.
Declaring the character encoding in an X/HTML document
How to's
-
Declaring the document encoding
In W3C tutorial, Character sets & encodings in XHTML, HTML and CSS.
Background reading
-
The influence of standards- vs. quirks-mode on character encoding declarations. W3C article.
Declaring the character encoding in a CSS style sheet
How to's
-
CSS character encoding declarations
W3C FAQ-based article.
-
In W3C tutorial, Character sets & encodings in XHTML, HTML and CSS.
Declaring the character encoding on the server
How to's
-
Declaring the document encoding
In W3C tutorial, Character sets & encodings in XHTML, HTML and CSS.
-
Setting the HTTP charset parameter
Notes on the charset HTTP header, server setup, and how to generate headers using scripts. W3C article.
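As a rough illustration of generating the header from a script, the sketch below is a minimal WSGI application in Python that sends the charset parameter in the HTTP Content-Type header. The function name app and the page body are placeholders invented for this example, not taken from the article.

```python
def app(environ, start_response):
    """Minimal WSGI app that declares UTF-8 in the HTTP Content-Type header."""
    # Encode the page with the same encoding that the header declares.
    body = "<!DOCTYPE html><title>Test</title><p>café</p>".encode("utf-8")
    headers = [
        # The charset parameter should match the bytes actually sent.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The point of the sketch is that the declared charset and the encoding used to produce the bytes are set in the same place, so they are less likely to drift apart.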
Background reading
-
The influence of standards- vs. quirks-mode on character encoding declarations. W3C article.
Using escapes to represent characters
How to's
-
Using character entities and NCRs
What are character entity and NCR escapes, and when should I use them? W3C article.
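To make the difference between a character entity and a numeric character reference (NCR) concrete, here is a small Python sketch. The helper names to_hex_ncr and to_decimal_ncr are made up for this illustration; html.unescape is from the Python standard library.

```python
import html


def to_hex_ncr(char):
    """Return the hexadecimal numeric character reference for one character."""
    return f"&#x{ord(char):X};"


def to_decimal_ncr(char):
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(char)};"


# A named entity, a hex NCR and a decimal NCR can all stand for the same character.
decoded = [html.unescape(s) for s in ("&eacute;", "&#xE9;", "&#233;")]
```

All three escapes in the last line decode to the single character é; which form is appropriate, and whether an escape is needed at all, is the subject of the article.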
Changing the encoding of a document
How to's
-
Changing an (X)HTML page encoding to UTF-8
How do I change the encoding of my (X)HTML pages to UTF-8? W3C FAQ-based article.
-
Display problems caused by the UTF-8 BOM
When using UTF-8 encoded pages in some user agents, I get an extra line or unwanted characters at the top of my web page or included file. How do I remove them? W3C article.
-
Setting encoding in web authoring applications
How do I set character encoding in my web authoring application? W3C article.
Background reading
-
What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents? W3C article.
Checking the encoding of a document
How to's
-
How can I check the character encoding information sent in the HTTP header of a web document? W3C article.
-
Checking the character encoding using the validator
How can I check that the character encoding of my document is correct using the W3C HTML Validator? W3C article.
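One possible way to look at the encoding declared in the HTTP header from a script is sketched below in Python. The function names are invented for this example; the sketch only reads the charset parameter of Content-Type and does not verify that the document bytes really use that encoding.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    # Returns the charset in lower case, or None if no charset is declared.
    return msg.get_content_charset()


def fetch_declared_charset(url):
    """Request a URL and return the charset declared in its HTTP header."""
    with urlopen(url) as response:
        return response.headers.get_content_charset()
```

For instance, charset_from_content_type("text/html; charset=UTF-8") gives "utf-8", and a header without a charset parameter gives None.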
Using non-ASCII web addresses
Background reading
-
An Introduction to Multilingual Web Addresses
How IDN and IRIs work, aimed at content authors and general users who want to understand the basics without too many gory technical details. W3C article.
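As a simplified sketch of the two conversions involved, the Python below turns a non-ASCII host name into its ASCII (Punycode) form and percent-encodes a non-ASCII path. The function names and the example address are assumptions for illustration, and Python's built-in idna codec follows the older IDNA 2003 rules, so results for some characters may differ from what current browsers do.

```python
from urllib.parse import quote


def host_to_ascii(hostname):
    """Convert an internationalized host name to its ASCII (Punycode) form."""
    return hostname.encode("idna").decode("ascii")


def path_to_uri(path):
    """Percent-encode the UTF-8 bytes of non-ASCII characters in a path."""
    return quote(path, safe="/")


ascii_host = host_to_ascii("bücher.example")  # xn--bcher-kva.example
ascii_path = path_to_uri("/résumé")           # /r%C3%A9sum%C3%A9
```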
Language
In this section
- Getting started
- Using attributes to declare language
- Declaring metadata about the language of the intended audience
- Declaring language for documents aimed at speakers of more than one language
- Choosing language values
- Identifying in-document language changes
- Indicating the language of a link destination
- Using Accept-Language for locale setting
Getting started
Background information
-
W3C Getting Started article.
-
Declaring Language in XHTML and HTML
W3C tutorial.
-
How to choose the right attribute values. W3C article.
-
W3C article.
-
Why use the language attribute?
A number of useful reasons. W3C article.
Using attributes to declare language
Best practices
How to's
-
Declaring Language in XHTML and HTML
W3C tutorial.
-
Using attributes to declare language
In W3C best practices document (Specifying Language in XHTML and HTML Content)
-
Declaring metadata about the language of the intended audience
In W3C best practices document (Specifying Language in XHTML and HTML Content)
Background reading
-
Why use the language attribute?
A number of useful reasons. W3C article.
Other references
-
Specifying the language of content: the lang attribute
lang attribute definition in HTML 4.01 (in HTML 4.01 spec, section 8.1)
-
xml:lang attribute definition in XML 1.0. (in XML 1.0 spec, section 2.12)
-
Clarify natural language usage
Express natural language in a document. (in Web Content Accessibility Guidelines, Guideline 4)
-
Identifying changes in language
Use lang attribute when language changes in a document (in Web Content Accessibility Techniques for HTML, section 2.1)
-
Identifying the primary language
Use lang attribute on html tag (in Web Content Accessibility Techniques for HTML, section 2.2)
-
The lang and xml:lang Attributes
xml:lang and lang attribute definitions in XHTML 1.0 (section C.7)
-
Content-Language definition in HTTP 1.1 (section 14.12)
-
Specifying the language of content: the lang attribute
What HTML 4.01 says about Content-Language: only that the HTML language attribute has higher precedence (in HTML 4.01 spec, section 8.1)
Tests
-
Automatic font assignment for CJK text
W3C test page
-
Automatic font assignment for CJK text
W3C test results
-
W3C test page
-
Document-level language declaration
W3C test results
Declaring metadata about the language of the intended audience
Best practices
How to's
-
Declaring Language in XHTML and HTML
W3C tutorial.
-
Declaring metadata about the language of the intended audience
In W3C best practices document (Specifying Language in XHTML and HTML Content)
Background reading
-
Using HTTP and meta for language information
Should I declare the language of my XHTML document using a language attribute, the Content-Language HTTP header, or a Content-Language meta element? W3C article.
Other references
-
Content-Language in the HTTP 1.1 specification (section 14.12)
-
Specifying the language of content: the lang attribute
Content-Language in the HTML specification, which says only that the HTML language attribute has higher precedence (section 8.1)
Declaring language for documents aimed at speakers of more than one language
How to's
-
Mechanisms for declaring language in HTML
In W3C best practices document (Specifying Language in XHTML and HTML Content)
-
Declaring Language in XHTML and HTML
W3C tutorial.
Background reading
-
Why use the language attribute?
A number of useful reasons. W3C article.
-
Using HTTP and meta for language information
Should I declare the language of my XHTML document using a language attribute, the Content-Language HTTP header, or a Content-Language meta element? W3C article.
Other references
-
Content-Language in the HTTP 1.1 specification (section 14.12)
Choosing language values
Best practices
How to's
-
W3C best practices document (Specifying Language in XHTML and HTML Content)
-
How to choose the right attribute values. W3C article.
-
Specifying language tag values
W3C tutorial (Declaring Language in XHTML and HTML)
-
How do I use language markup in HTML or XML content when I don't know the language, or the content is non-linguistic? W3C article.
-
Two-letter or three-letter language codes
Should I use two-letter or three-letter ISO language codes in language tags? W3C article.
Particularly useful links
-
IANA's language tag registry.
-
Language Subtag Registry lookup tool
User friendly interface to IANA's language tag registry by Richard Ishida.
-
Points to a document containing both RFC 4646 (Tags for the Identification of Languages) and RFC 4647 (Matching of Language Tags).
-
RFC 4646 Tags for the Identification of Languages
The specification.
-
RFC 4647 Matching of Language Tags
The specification.
Other references
-
Specifying the language of content: the lang attribute
lang in the HTML 4.01 spec (section 8.1)
-
xml:lang in the XML spec (section 2.12)
-
RFC 3066 Tags for the Identification of Languages
The previous IETF document that used to define how to use language tags to identify languages, now obsolete.
-
Understanding the New Language Tags
Overview of RFC 3066bis by one of its authors. W3C article (now historic).
-
ISO 3166: Codes for Country Names
ISO country codes
-
ISO 639: Codes for the Representation of Names of Languages
ISO language codes
Test data
-
Automatic font assignment for CJK text
W3C test page
-
Automatic font assignment for CJK text
W3C test results
Identifying in-document language changes
Best practices
How to's
-
Using attributes to declare language
In W3C best practices document (Specifying Language in XHTML and HTML Content)
-
Declaring Language in XHTML and HTML
W3C tutorial.
-
Declaring metadata about the language of the intended audience
In W3C best practices document (Specifying Language in XHTML and HTML Content)
Background reading
-
Why use the language attribute?
A number of useful reasons. W3C article.
Other references
-
Specifying the language of content: the lang attribute
lang attribute definition in HTML 4.01 (in HTML 4.01 spec, section 8.1)
-
xml:lang attribute definition in XML 1.0. (in XML 1.0 spec, section 2.12)
-
Clarify natural language usage
Express natural language in a document. (in Web Content Accessibility Guidelines, Guideline 4)
-
Identifying changes in language
Use lang attribute when language changes in a document (in Web Content Accessibility Techniques for HTML, section 2.1)
-
The lang and xml:lang Attributes
xml:lang and lang attribute definitions in XHTML 1.0 (section C.7)
Indicating the language of a link destination
Best practices
How to's
-
How to indicate the language of a link destination
W3C best practices document (Specifying Language in XHTML & HTML Content)
-
The :before and :after pseudo-elements
:before and :after in the CSS 2.1 spec (section 12.1)
Other references
-
hreflang in the HTML spec (section 12.2)
Test data
-
W3C test page
-
W3C test results
Using Accept-Language for locale setting
How to's
-
Accept-Language used for locale setting
Is it a good idea to use the HTTP Accept-Language header to determine the locale of the user? W3C article.
-
How do I prepare my web pages to display varying international date formats? W3C article.
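For readers who want to see what the Accept-Language header actually carries, the following Python sketch parses it into language ranges ordered by their q weights. The function name is hypothetical and the parsing is deliberately simplified. The sketch says nothing about whether the result is a sound basis for choosing a locale, which is the question the first article above addresses.

```python
def parse_accept_language(header):
    """Return (language_range, q) pairs sorted by descending q value."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        q = 1.0  # default weight when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((lang.strip(), q))
    # sorted() is stable, so ranges with equal weight keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


prefs = parse_accept_language("fr-CH, fr;q=0.9, en;q=0.8, *;q=0.5")
```

Here prefs lists fr-CH first (weight 1.0), then fr, en and the wildcard. The header reflects browser language settings, which is not necessarily the same thing as the user's locale.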
Markup & text
Getting started
Background information
-
Quick tips: Presentation vs. content
W3C article.
-
W3C article.
Working with composite strings and string re-use
Best practices
How to's
-
Working with Composite Messages
W3C article.
-
Re-using Strings in Scripted Content
W3C article.
Non-W3C references
-
<Insert Title Here> (or, Variables in Interface Language)
Article by Chris Noessel illustrating a number of examples where composite messages can cause problems.
-
Text Fragmentation and Reuse in User Interfaces
Composite messages and text-reuse in software user interfaces. The basis for articles in this section. (Multilingual Computing article)
-
Globalization Step-by-Step, String Handling
Microsoft-specific article on handling composite messages in Win32 and .NET (Microsoft article)
Using ruby markup
How to's
-
Overview of the Ruby Annotation specification (in W3C tutorial, Ruby Markup and Styling)
Background reading
-
What is 'ruby'? (W3C article)
-
CJKV Information Processing
Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)
Other references
-
Ruby Annotation Recommendation
W3C Recommendation
-
W3C Working Draft
-
Ruby Annotation in XHTML 1.1 spec (section 3, bottom of the page)
-
Sample module implementations of the Ruby Annotation Specification in several schemas (W3C Personal Note)
Test data
-
W3C test pages
-
Description of support for ruby markup and styling in browsers (in W3C tutorial, Ruby Markup and Styling)
Using Unicode control codes in text
How to's
-
HTML, XHTML, XML and Control Codes
How do I handle control codes (i.e. the 'C0' U+0000-U+001F and 'C1' U+007F-U+009F ranges) in XML, XHTML and HTML? W3C article.
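A minimal Python sketch for locating such control codes in a string follows. The name find_control_codes is invented here, and exempting tab, line feed and carriage return is an assumption of this example (those are the C0 characters that XML 1.0 permits); the article itself should be consulted for what to do with any characters found.

```python
import re

# C0 controls except tab, line feed and carriage return, plus U+007F-U+009F.
CONTROL_CODES = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]")


def find_control_codes(text):
    """Return (index, 'U+XXXX') for each control code found in text."""
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in CONTROL_CODES.finditer(text)
    ]


found = find_control_codes("a\x00b\tc\x85")
```

In this example the tab is ignored and the two reported positions are the U+0000 at index 1 and the U+0085 at index 5.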
Working around unavailable characters/glyphs
How to's
-
What to do if a Unicode character or font glyph is missing. W3C article.
Text direction
Getting started
Background information
-
What you need to know about the bidi algorithm and inline markup
W3C article.
-
Quick tips: Right-to-left text
W3C article.
Making bidi localization easier
Best practices
How to's
-
Authoring with localization in mind
In Authoring HTML: Handling Right-to-left Scripts.
Setting up a right-to-left page
Best practices
How to's
-
In tutorial, Creating HTML Pages in Arabic, Hebrew & Other Right-to-Left Scripts.
-
Setting up a right-to-left page
In techniques document, Authoring HTML: Handling Right-to-left Scripts
-
Unicode controls vs. markup for bidi support
To correctly format bidi text in (X)HTML or XML content, should I use Unicode control codes or markup? FAQ-based article.
-
CSS vs. markup for bidi support
Should I use CSS or markup to correctly format Unicode-based bidirectional (bidi) text in HTML and XML-based markup languages? FAQ-based article.
Changing the direction of a block element
Best practices
How to's
-
How to use the dir attribute. Handling tables. In tutorial, Creating HTML Pages in Arabic, Hebrew & Other Right-to-Left Scripts.
-
Changing the directionality of a block element
In Best Practices for Authoring HTML: Handling Right-to-left Scripts.
-
Unicode controls vs. markup for bidi support
To correctly format bidi text in (X)HTML or XML content, should I use Unicode control codes or markup? FAQ-based article.
-
CSS vs. markup for bidi support
Should I use CSS or markup to correctly format Unicode-based bidirectional (bidi) text in HTML and XML-based markup languages? FAQ-based article.
Mixing text direction inline
Best practices
How to's
-
What you need to know about the bidi algorithm and inline markup
Article.
-
In tutorial, Creating HTML Pages in Arabic, Hebrew & Other Right-to-Left Scripts.
-
In Best Practices for Authoring HTML: Handling Right-to-left Scripts.
-
Why does my browser collapse spaces between Latin and Arabic/Hebrew text? W3C article.
-
CSS vs. markup for bidi support
Should I use CSS or markup to correctly format Unicode-based bidirectional (bidi) text in HTML and XML-based markup languages? FAQ-based article.
-
Unicode controls vs. markup for bidi support
To correctly format bidi text in (X)HTML or XML content, should I use Unicode control codes or markup? FAQ-based article.
-
Using Unicode controls for bidi text
If I'm unable to use markup to correctly order bidirectional text, what can I do? FAQ-based article.
Handling parentheses and other mirrored characters
Best practices
How to's
-
In tutorial, Creating HTML Pages in Arabic, Hebrew & Other Right-to-Left Scripts.
-
Handling parentheses & other mirrored characters
In Best Practices for Authoring HTML: Handling Right-to-left Scripts.
Overriding the Unicode bidirectional algorithm
How to's
-
In tutorial, Creating HTML Pages in Arabic, Hebrew & Other Right-to-Left Scripts.
-
Overriding the Unicode bidirectional algorithm
In Best Practices for Authoring HTML: Handling Right-to-left Scripts.
-
In What you need to know about the bidi algorithm and inline markup.
Styling & layout
Getting started
Background information
-
Quick tips: Presentation vs. content
W3C article.
Preparing for text expansion
How to's
-
Background images that support localization
How can I ensure that when text expands in translation the background images will still work? W3C article.
Background reading
-
Overview of text expansion issues. W3C article.
-
Do I need to worry because display capabilities (screen sizes, number of colors, etc.) of computers vary in other countries? W3C article.
Styling by language
How to's
-
Styling using the lang attribute
Compares :lang, lang |= and lang= selectors, and assesses current usability (W3C article)
Other references
-
The language pseudo-class :lang
CSS2 Specification (section 5.11.4)
-
CSS2 Specification (section 5.8)
-
CSS2 Specification (section 5.8.3)
-
CSS2 Specification (section 5.9)
Test data
-
W3C test page
-
W3C test results
Numbering lists
Other references
-
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Test data
-
W3C test pages.
-
Test results: list-style-type set to armenian
W3C test results.
-
Test results: list-style-type set to georgian
W3C test results.
-
Test results: list-style-type set to lower-greek
W3C test results.
Creating vertical text
Other references
-
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Managing line breaks
Other references
-
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Test data
-
W3C test pages.
-
Test results: Line breaking and spaces
W3C test results.
-
Test results: Line breaking, opening & closing punctuation and non-starters
W3C test results.
Justifying and aligning text
Other references
-
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Test data
-
W3C test pages.
-
Test results: Line breaking and spaces
W3C test results.
-
Test results: Line breaking, opening & closing punctuation and non-starters
W3C test results.
Styling ruby text
How to's
-
Introduction to styling ruby with CSS3 Ruby Module. In W3C article, Ruby Markup and Styling.
Background reading
-
What is 'ruby'? (W3C article)
-
CJKV Information Processing
Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)
Other references
-
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
-
W3C Working Draft
Applying various script-specific typographic conventions
Other references
-
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Forms
Getting started
Background information
-
W3C article.
Handling encoding issues
How to's
-
What is the best way to deal with encoding issues in forms that may use multiple languages and scripts? W3C article.
Sorting select lists
How to's
-
As part of a form, I have a list of terms in a drop-down box. Why are they not correctly sorted when I translate the items in the list? W3C article.
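The problem can be reproduced with a short Python sketch. The country names and the helper rough_sort_key are assumptions made up for this example, and the key function is only a crude approximation: real sort order is language-specific and is normally handled by a proper collation library.

```python
import unicodedata


def rough_sort_key(label):
    """Crude key: drop combining marks and ignore case. Not real collation."""
    decomposed = unicodedata.normalize("NFD", label)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


# German translations of Austria, Germany, Spain, left in the English order.
translated = ["Österreich", "Deutschland", "Spanien"]

by_code_point = sorted(translated)               # Ö sorts after all ASCII letters
by_rough_key = sorted(translated, key=rough_sort_key)
```

Leaving the items in source-language order gives an unsorted list, and a plain code point sort pushes Österreich to the end. Only the key-based sort puts it between Deutschland and Spanien, which is where a German reader would look for it.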
Navigation
Getting started
Background information
-
W3C article.
-
Monolingual vs. multilingual web sites
What are the trade-offs between international sites that are monolingual vs. multilingual? W3C article.
-
International & multilingual web sites
What is an "international" or a "multilingual" web site? W3C article.
Linking to localized content
How to's
-
Using <select> to Link to Localized Content
What are the best practices for using pull-down menus based on the select element to direct visitors to localized content? W3C article.
Using content negotiation
Background information
-
Monolingual vs. multilingual Web sites
W3C article.
How to's
-
When to use language negotiation
Argues that content negotiation is always a good idea, but that it is not sufficient alone. W3C article.
Cultural issues
Getting started
Background information
-
Quick tips: Images, animations & examples
W3C article.
-
How do I prepare my web pages to display varying international date formats? W3C article.
Working with local data formats
How to's
-
How do I prepare my web pages to display varying international date formats? W3C article.
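To show why numeric date formats need care, this small Python sketch formats one date in three ways. The specific patterns are chosen for illustration only and are not taken from the article.

```python
from datetime import date

d = date(2005, 3, 4)  # 4 March 2005

us_style = d.strftime("%m/%d/%Y")   # month first
uk_style = d.strftime("%d/%m/%Y")   # day first
iso_style = d.isoformat()           # year-month-day, as in ISO 8601
```

The first two results, 03/04/2005 and 04/03/2005, are each a valid but different date when read under the other convention, whereas the ISO form 2005-03-04 is unambiguous.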
Troubleshooting
Blank lines on a page
How to's
-
Display problems caused by the UTF-8 BOM
When using UTF-8 encoded pages in some user agents, I get an extra line or unwanted characters at the top of my web page or included file. How do I remove them? W3C article.
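If the file can be processed with a script, one possible approach is sketched below in Python. The function name strip_utf8_bom is invented for this example; whether removing the signature is the right fix for a given setup is discussed in the article.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# When decoding to text, the "utf-8-sig" codec skips a leading BOM if present.
text = b"\xef\xbb\xbf<p>Hello</p>".decode("utf-8-sig")
```

Only a BOM at the very start of the data is removed, so the sketch would need to be applied to each included file separately.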
Changing the encoding of a page in an editor
How to's
-
Setting encoding in web authoring applications
How do I set character encoding in my web authoring application? W3C article.
Setting & changing browser language preferences
How to's
-
Setting language preferences in a browser
How do I check or change the language settings of my browser? W3C article.
Loss of spacing between RTL and LTR text
How to's
-
Why does my browser collapse spaces between Latin and Arabic/Hebrew text? W3C article.
Checking the encoding of a document
How to's
-
How can I check the character encoding information sent in the HTTP header of a web document? W3C article.
-
Checking the character encoding using the validator
How can I check that the character encoding of my document is correct using the W3C HTML Validator? W3C article.