Language on the Web

Intended audience: anyone who is new to internationalization and needs guidance on topics to consider and ways to get into the material on the site.

This page provides some orientation for newcomers to Web internationalization who don't really know where to start.

By listing a number of articles we have created that relate to a particular topic area, we hope to help you see how things fit together, and give you a starting point for further exploration of the topic and the other related articles on the site.

We also try to indicate, in broad brush strokes, who would be interested in which aspects of the topic.

After reading these resources, you can find more detailed information using the topic index, techniques index or the search box on this page.

What's it about?

HTML and XML-based formats allow you to declare the natural language (i.e. human rather than programming language) of a document or a range of text so that tools and applications can use that information for language-sensitive tasks. These include applying appropriate fonts or other styling, switching voice in text-to-speech, and spell-checking.

The more content is tagged, and tagged correctly, with language information, the more useful and pervasive such applications will become.

It is also possible to use language information provided by a browser or user agent to ensure that, where a choice exists, users receive data in their preferred language.

Declaring the language

Content authors need to know how to declare the language of a document or range of text in the Web technology they are dealing with. Most XML-based formats, such as XHTML, SVG and SSML, use the xml:lang attribute defined by the XML specification, but there may be other markup, such as the lang attribute in HTML.
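As an illustration of what a tool can do with such declarations, the following minimal sketch reads the lang and xml:lang attributes from a small XHTML fragment using Python's standard xml.etree.ElementTree module. The sample markup and the helper name declared_languages are made up for this example, and the sketch simplifies matters by treating an empty attribute value the same as a missing one.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document: English overall, with one French word.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <body>
    <p>The French word for cat is <span lang="fr" xml:lang="fr">chat</span>.</p>
  </body>
</html>"""


def declared_languages(element, inherited=None):
    """Yield (local tag name, language) pairs, inheriting from ancestors."""
    # Prefer xml:lang, fall back to lang, then to the nearest ancestor's value.
    lang = element.get(XML_LANG) or element.get("lang") or inherited
    yield element.tag.split("}")[-1], lang
    for child in element:
        yield from declared_languages(child, lang)


root = ET.fromstring(SAMPLE)
result = list(declared_languages(root))
for tag, lang in result:
    print(tag, lang)
```

The span element reports fr while every other element inherits en from the root, which is the behaviour that language-sensitive tools rely on.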

Content authors should consider whether they need to distinguish between declaring the language of a document or range of text so that tools can process it, and declaring the intended audience of a resource (i.e. metadata). You may need to apply slightly different approaches for these in (X)HTML, for example. You should also avoid confusing language declarations with script and character encoding declarations.

Content authors and webmasters also need to know how to use values for languages in a standard way. The current standard approach for W3C specifications is to use the rules expressed in BCP 47. This replaces earlier specifications such as RFC 3066 and RFC 1766, and goes beyond the information available in the ISO language and country standards. You should also use the IANA Language Subtag Registry to look up language tags, rather than the ISO specifications.
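To give a feel for how BCP 47 language tags are put together, here is a rough sketch that splits a tag into its language, script and region subtags and applies the conventional casing. The function name split_language_tag is hypothetical, and the logic is deliberately simplified: it does not handle extended language subtags, extensions, private-use or grandfathered tags, and it does not check subtags against the IANA Language Subtag Registry.

```python
def split_language_tag(tag):
    """Split a simple language tag into language, script, region and the rest.

    Simplified sketch: only the common language[-script][-region] shape is
    recognised; any later subtags (variants and so on) are collected in 'other'.
    """
    subtags = tag.split("-")
    parts = {
        "language": subtags[0].lower(),
        "script": None,
        "region": None,
        "other": [],
    }
    for subtag in subtags[1:]:
        at_front = parts["region"] is None and not parts["other"]
        is_script = len(subtag) == 4 and subtag.isalpha()
        is_region = (len(subtag) == 2 and subtag.isalpha()) or (
            len(subtag) == 3 and subtag.isdigit()
        )
        if is_script and at_front and parts["script"] is None:
            parts["script"] = subtag.title()   # conventional casing, e.g. Hant
        elif is_region and at_front:
            parts["region"] = subtag.upper()   # conventional casing, e.g. TW
        else:
            parts["other"].append(subtag.lower())
    return parts


print(split_language_tag("zh-hant-tw"))
print(split_language_tag("es-419"))
```

Tags are case-insensitive for matching purposes; the casing applied here is only the customary presentation.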

Using language information for styling

In browsers that support them, you can use language-based selectors in a style sheet to choose which style to apply based on the language of the current text. For example, within an English document you can assign embedded Thai text a particular font and appropriate line height adjustments anywhere it appears, simply because the content is labeled as Thai.
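The matching rule behind the CSS :lang() selector is simple enough to sketch: a selector value matches when the element's language is the same as that value, or begins with that value immediately followed by a hyphen, ignoring case. The function name lang_selector_matches below is an assumption made for illustration, and the sketch models only this basic rule, not a full CSS engine.

```python
def lang_selector_matches(selector_value, element_lang):
    """Approximate the basic matching rule of the CSS :lang() selector.

    A value such as 'th' matches an element whose language is 'th' or a
    longer tag starting with 'th-' (for example 'th-TH'), case-insensitively.
    """
    if not element_lang:
        return False  # no language information, so nothing to match on
    wanted = selector_value.lower()
    actual = element_lang.lower()
    return actual == wanted or actual.startswith(wanted + "-")


# Thai text embedded in an English document would pick up a :lang(th) rule.
print(lang_selector_matches("th", "th"))
print(lang_selector_matches("th", "th-TH"))
print(lang_selector_matches("th", "en"))
```

Because the match works on the language of the content rather than on a class name or a particular element, a single rule covers every piece of Thai text wherever it appears.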

In CSS, style sheet developers can use these selectors with the content property, where it is supported, to automatically indicate the language of a link target.

Setting language preferences for communicating with the server

When an HTTP request is made to a server, the requesting user agent usually sends information about language preferences. The server can then use this information to return a version of the document in the appropriate language if such an alternative is available.
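The language preferences usually arrive in the HTTP Accept-Language request header, as a comma-separated list of language tags with optional q (quality) weights. The sketch below shows one plausible way a server-side script could order those preferences and pick from the languages it has available. The function names, the fallback of trimming subtags from the right, and the sample header are assumptions for illustration; real servers apply their own negotiation rules.

```python
def parse_accept_language(header):
    """Return (tag, quality) pairs, most preferred first, dropping q=0 entries."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        piece = item.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        params = params.strip()
        quality = 1.0  # a missing q parameter means full preference
        if params.lower().startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unreadable weight as "not acceptable"
        prefs.append((tag.strip().lower(), quality, position))
    # Highest quality first; ties keep the order used in the header.
    prefs.sort(key=lambda p: (-p[1], p[2]))
    return [(tag, q) for tag, q, _ in prefs if q > 0]


def choose_language(header, available, default):
    """Pick the best available language, or the default if nothing matches."""
    by_lower = {lang.lower(): lang for lang in available}
    for tag, _ in parse_accept_language(header):
        if tag == "*":
            return default
        candidate = tag
        while candidate:
            if candidate in by_lower:
                return by_lower[candidate]
            # Assumed fallback: drop the last subtag (fr-ch becomes fr).
            candidate = candidate.rpartition("-")[0]
    return default


header = "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7"
print(parse_accept_language(header))
print(choose_language(header, ["en", "fr"], "en"))
```

Falling back to a default language, rather than returning an error, keeps the content reachable for users whose preferred languages are not available.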

End-users should know how to check that their language preferences are correctly set, and how to change them if not.

Webmasters should know how to set up their server to manage language-based content negotiation.

Web designers and developers dealing with multilingual sites should consider how to guide visitors to the right resources.

Designing language information into markup formats

Schema or specification developers should consider whether the format they are developing includes markup to enable authors to identify the document's main language and any changes in language within the document.

Schema or specification developers should also be clear when it is appropriate to use xml:lang in XML-based formats, and when they should create a different attribute or element to specify language information.

Author: Richard Ishida, W3C.

Content first published 2007-03-08. Last substantive update 2007-03-08 18:12 GMT. This version 2007-04-13 15:49 GMT.

Page location: http://www.w3.org/International/getting-started/language.en.php

For the history of document changes, search for gs-language in the i18n blog.