Language on the Web

Intended audience: anyone who is new to internationalization and needs guidance on topics to consider and ways to get into the material on the site.

Updated 2009-05-01 12:47

This page provides some orientation for newcomers to Web internationalization who don't really know where to start. The aim is to ease you gently into some of the material on the site.

You can find a selection of more detailed articles through the "Learn more" links. Once you have some ideas from this page, you will probably go straight to the topic index, the techniques index, or the site search.

What's it about?

Learn more...

"Why declare language?" discusses in a little more detail why content authors and developers should declare language information.

HTML and XML-based formats allow you to declare the natural language (i.e. the human language rather than the programming language) of a document or a range of text, so that tools and applications can use that information for language-sensitive tasks. These include applying appropriate fonts or other styling, switching voice in text-to-speech, and spell-checking.

The more content is tagged, and tagged correctly, with language information, the more useful and pervasive such applications will become.

It is also possible to use language information provided by a browser or user agent to ensure that, where a choice exists, users receive data in their preferred language.


Declaring the language

Content authors need to know how to declare the language of a document or range of text in the Web technology they are dealing with. Most XML-based formats, such as XHTML, SVG and SSML, use the xml:lang attribute defined by the XML specification, but other markup may also apply, such as the lang attribute in HTML.
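
As an illustration of how a tool might consume these declarations, the following Python sketch reads xml:lang from a small XML fragment and works out which language is in effect for each element, assuming the usual rule that a declaration applies to an element and its descendants until overridden. The sample markup, the element names and the function name languages_in_effect are invented for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document: English by default, with Thai and French ranges.
SAMPLE = """<doc xml:lang="en">
  <p>The Thai word for hello is <span xml:lang="th">สวัสดี</span>.</p>
  <p xml:lang="fr">Bonjour</p>
</doc>"""


def languages_in_effect(element, inherited=None):
    """Yield (tag, language) for an element and its descendants, in document order.

    An element without its own xml:lang inherits the language of its parent.
    """
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang
    for child in element:
        yield from languages_in_effect(child, lang)


root = ET.fromstring(SAMPLE)
result = list(languages_in_effect(root))
for tag, lang in result:
    print(tag, lang)
```

The first paragraph inherits "en" from the document element, while the span and the second paragraph carry their own declarations.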

Content authors should consider whether they need to distinguish between declaring the language of a document or range of text so that tools can process it, and declaring the intended audience of a resource (i.e. metadata). You may need to apply slightly different approaches for these in (X)HTML, for example. You should also avoid confusing language declarations with script and character encoding declarations.

Content authors and webmasters also need to know how to use values for languages in a standard way. The current standard approach for W3C specifications is to use the rules expressed in BCP 47. This replaces earlier specifications such as RFC 3066 and RFC 1766, and goes beyond information available in the ISO language and country standards. You should also use the IANA Language Subtag Registry to look up language tags, rather than the ISO specifications.
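
To give a feel for the structure of such language values, here is a deliberately simplified Python sketch that splits a tag into language, script and region subtags and applies the conventional casing. It is an assumption-laden illustration, not a BCP 47 validator: it ignores extended language subtags, variants, extensions, private use and grandfathered tags, and it does not check subtags against the IANA registry. The name split_language_tag is invented for this example.

```python
import re

# Simplified shape only: language[-script][-region].
# Real BCP 47 tags allow more subtag types than this pattern covers.
TAG_PATTERN = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def split_language_tag(tag):
    """Return a dict of subtags with conventional casing, or None if the shape is wrong."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    return {
        "language": match.group("language").lower(),  # e.g. "zh"
        "script": script.title() if script else None,  # e.g. "Hant"
        "region": region.upper() if region else None,  # e.g. "TW" or "419"
    }


print(split_language_tag("zh-hant-tw"))
print(split_language_tag("es-419"))
print(split_language_tag("english"))
```

A tag that passes this shape check may still be meaningless; whether each subtag actually exists is a question for the registry.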

Webmasters or people working with server settings may also need to set up the server so that it sends language-related HTTP information with requested files. This is typically done as part of content negotiation, where the server sends the user one of several alternative versions of a document depending on the settings of the user's browser.


Navigating around sites using language information

When an HTTP request is made to a server, the requesting user agent usually sends information about language preferences. The server can then use this information to return a version of the document in the appropriate language if such an alternative is available.
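
The following Python sketch shows one way a server-side script might act on these preferences. It assumes the preferences arrive in an Accept-Language header with optional q weights, and it picks a version by progressively shortening each requested tag, loosely modelled on the lookup scheme in RFC 4647. The function names, the list of available versions and the default are all hypothetical; real servers usually provide their own negotiation mechanisms.

```python
def parse_accept_language(header):
    """Return (language_range, quality) pairs from an Accept-Language value, best first."""
    ranges = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        language_range = parts[0].strip().lower()
        if not language_range:
            continue
        quality = 1.0  # a range without a q parameter has full weight
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable weight as "not wanted"
        ranges.append((quality, position, language_range))
    # Highest weight first; equal weights keep the order given in the header.
    ranges.sort(key=lambda entry: (-entry[0], entry[1]))
    return [(language_range, quality) for quality, _, language_range in ranges]


def choose_language(header, available, default):
    """Pick one of the available language versions, falling back to the default."""
    by_lowercase = {tag.lower(): tag for tag in available}
    for language_range, quality in parse_accept_language(header):
        if quality <= 0 or language_range == "*":
            continue
        candidate = language_range
        while candidate:
            if candidate in by_lowercase:
                return by_lowercase[candidate]
            # Drop the last subtag: "fr-ch" becomes "fr", then "".
            candidate = candidate.rpartition("-")[0]
    return default


print(choose_language("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5", ["en", "de"], "en"))
```

Falling back to a default rather than refusing the request means that a visitor always receives some version of the document.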

End-users should know how to check that their language preferences are correctly set, and how to change them if not.

Webmasters should know how to set up their server to manage language-based content negotiation.

Web designers and developers dealing with multilingual sites should consider how to guide visitors to the right resources.


Using language information for styling

On most browsers, you can use CSS selectors to apply different styling according to the language of a range of text. For example, within an English document you can assign embedded Thai text a particular font and appropriate line height adjustments anywhere it appears, just by the fact that the content is labeled as Thai.
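
The matching behind this kind of selector can be sketched in Python. The sketch assumes the prefix rule used by the CSS :lang() selector, where a range such as "th" matches text labeled "th" or "th-TH" but not "en". The rule table, the font name and the line height are hypothetical values chosen only for illustration.

```python
def lang_matches(selector_range, element_lang):
    """True if the element's language equals the range or starts with it plus a hyphen."""
    selector_range = selector_range.lower()
    element_lang = element_lang.lower()
    return element_lang == selector_range or element_lang.startswith(selector_range + "-")


# Hypothetical style rules keyed by language range.
STYLE_RULES = [
    ("th", {"font-family": "Tahoma, sans-serif", "line-height": "1.8"}),
]


def styles_for(element_lang):
    """Collect the declarations of every rule whose range matches the language."""
    styles = {}
    for language_range, declarations in STYLE_RULES:
        if lang_matches(language_range, element_lang):
            styles.update(declarations)
    return styles


print(styles_for("th-TH"))
print(styles_for("en"))
```

Because the match depends only on the language label, the Thai styling applies wherever Thai text appears, with no need for a class name on each range.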

In CSS, style sheet developers can use selectors with the content property, where it is supported, to automatically indicate the language of a link target.


Designing language information into markup formats

Schema or specification developers should consider whether the format they are developing includes markup to enable authors to identify the document's main language and any changes in language within the document.

Schema or specification developers should also be clear when it is appropriate to use xml:lang in XML-based formats, and when they should create a different attribute or element to specify language information.

Learn more...

Schema developers
Defining markup for labelling natural language

By: Richard Ishida, W3C.

Content first published 2007-03-08. Last substantive update 2009-05-01 12:47 GMT. This version 2011-10-10 10:37 GMT.

For the history of document changes, search for gs-language in the i18n blog.