W3C

Primary Language in HTML

World Wide Web Consortium Note 13-March-1998

This version:
http://www.w3.org/TR/1998/NOTE-html-lan-19980313.html
Latest Version:
http://www.w3.org/TR/NOTE-html-lan
Editor:
M.T. Carrasco Benitez [CAR] <manuel.carrasco@emea.eudra.org>

Status of this document

This document is a NOTE made available by the W3 Consortium for discussion only. This indicates no endorsement of its content, nor that the Consortium has had any editorial control in its preparation, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by the NOTE.

This document recommends how to mark the primary language(s) in an HTML document. It can be considered a clarification of the HTML 4.0 Specification [HTML40]; in particular, it does not contradict the HTML 4.0 Specification. The objective is to establish a best practice in this field; at present there is some confusion.

Abstract

In HTML elements, the lang attribute specifies the natural language. This document is mostly concerned with how to specify the primary language(s) (there could be more than one) and the base language (there is only one) in HTML documents.

Overview

Most existing documents are monolingual. Linguistic versions (e.g., translations) of the same text are often kept as separate documents. This is indeed the most sensible approach.

Some documents are bilingual and a few are trilingual or n-lingual. Bilingual documents are usually short; i.e., a few paragraphs. N-lingual documents are usually very short; i.e., a few sentences.

The main reason for the existence of n-lingual documents is political; i.e., in certain situations it is not politically correct to assume a base language. A common practice is to have one small document that is a menu of languages, as on the Europa server of the European Commission [EUR].

Another approach to choosing the language is to set the client (e.g., the browser) to the preferred language(s). The client transmits the language(s) in the Accept-Language field of HTTP, and the server responds with an appropriate document. For example, the Spanish version will be presented if the language preferences (in the browser) are Spanish and French and the document is available (on the server) in French, German and Spanish.
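
The negotiation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm of any particular server: the function names parse_accept_language and choose_language are hypothetical, and the tie-breaking rule (ranges with equal q-values are tried in the order listed) is an assumption chosen so that the Spanish and French example yields Spanish.

```python
def parse_accept_language(header):
    """Return (language-range, q) pairs from an Accept-Language value, best first.

    Hypothetical helper. Assumption: ranges with equal q-values keep
    the order in which they appear in the header.
    """
    ranges = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        params = params.strip()
        quality = 1.0  # a range without a q parameter has quality 1
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((tag.strip().lower(), quality, position))
    # Highest quality first; header position breaks ties (assumption).
    ranges.sort(key=lambda r: (-r[1], r[2]))
    return [(tag, q) for tag, q, _ in ranges if q > 0]


def choose_language(accept_language, available):
    """Pick one of the available language versions, or None if nothing matches."""
    available_by_lower = {lang.lower(): lang for lang in available}
    for tag, _ in parse_accept_language(accept_language):
        if tag == "*":
            # Wildcard: any version will do; take the first one (assumption).
            return available[0] if available else None
        if tag in available_by_lower:
            return available_by_lower[tag]
        # A range such as "fr" also matches a more specific tag such as "fr-CA".
        for lower, original in available_by_lower.items():
            if lower.startswith(tag + "-"):
                return original
    return None


# Browser prefers Spanish, then French; server has French, German and Spanish.
print(choose_language("es, fr", ["fr", "de", "es"]))
```

A real server would also need to decide what to send when nothing matches (here the sketch simply returns None), for instance a default version or a menu of languages.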

Where to specify the primary language(s)

There should be one recommended place to specify the primary language(s). It is recommended that the primary language(s) be specified in a META element. For example:
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Language" Content="fr">
<TITLE>Mon doc</TITLE>
</HEAD>
<BODY>
Je suis un Berlinois.
</BODY>
</HTML>

The value of the Content attribute of the META element is the same as the value of the Content-Language header in HTTP; i.e., a comma-separated list of language codes. For example:

<META HTTP-EQUIV="Content-Language" Content="fr,en">

These language codes are the same as those used in the lang attribute of some HTML elements. For example:

<BODY LANG=fr>

The language codes are defined in [RFC1766]. See also 8.1.1 Language codes of the HTML 4.0 Specification [HTML40] and [RFC2068].

The order of the languages in the Content-Language value is significant. The first language in the list is the base language of the document; i.e., any text not re-specified with the lang attribute is in the base language.
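
As an illustration, the following Python sketch reads the META element from a document and derives the primary language(s) and the base language. It assumes the first Content-Language META element is the one that counts; the names ContentLanguageFinder, primary_languages and base_language are hypothetical and used only for this example.

```python
from html.parser import HTMLParser


class ContentLanguageFinder(HTMLParser):
    """Collect the language codes of the first Content-Language META element."""

    def __init__(self):
        super().__init__()
        self.languages = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "meta" or self.languages:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-language":
            content = attributes.get("content") or ""
            # The value is a comma-separated list; order is preserved.
            self.languages = [c.strip() for c in content.split(",") if c.strip()]


def primary_languages(html_text):
    """Return the primary language(s) in the order given by the META element."""
    finder = ContentLanguageFinder()
    finder.feed(html_text)
    finder.close()
    return finder.languages


def base_language(html_text):
    """Return the first listed language, i.e. the base language, or None."""
    languages = primary_languages(html_text)
    return languages[0] if languages else None


document = (
    '<HTML><HEAD><META HTTP-EQUIV="Content-Language" Content="fr,en">'
    "<TITLE>Mon doc</TITLE></HEAD><BODY>Je suis un Berlinois.</BODY></HTML>"
)
print(primary_languages(document), base_language(document))
```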

A document that contains only minor fragments in other languages should not list more than one language in the META element. The rules for classifying a document as monolingual, bilingual or n-lingual are the same as for printed books.

The reasons for recommending the META element, as opposed to the lang attribute on the HTML element, are:

A lang attribute in the HTML element overrides the language specified in the META element. The inheritance rules are in 8.1.2 Language information and text direction of the HTML 4.0 Specification [HTML40].
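
The precedence can be illustrated with a small Python sketch. It is a simplification of the inheritance rules in [HTML40], not a full implementation: the function name effective_language is hypothetical, and the caller is assumed to supply the lang values of the enclosing elements, from the HTML element down to the element that contains the text.

```python
def effective_language(ancestor_langs, meta_content_language=None):
    """Return the language that applies to a piece of text.

    ancestor_langs: lang attribute values from the outermost element (HTML)
        to the innermost one, with None where no lang attribute is set.
    meta_content_language: Content value of the Content-Language META
        element, e.g. "fr,en", or None if the document has none.
    """
    # The closest element carrying a lang attribute wins.
    for lang in reversed(ancestor_langs):
        if lang:
            return lang
    # Otherwise fall back to the base language: the first code in the META value.
    if meta_content_language:
        codes = [c.strip() for c in meta_content_language.split(",") if c.strip()]
        if codes:
            return codes[0]
    return None


# No lang attributes anywhere: the META base language applies.
print(effective_language([None, None], "fr,en"))
# <HTML LANG=de> overrides the META element.
print(effective_language(["de", None], "fr,en"))
```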

Acknowledgment

The recommendations are the rough consensus from the mailing list www-international@w3.org [LIST] of the W3C and a meeting during the Unicode Conference in Mainz in March 1997.

In particular, thanks to

References

[CAR]
M.T. Carrasco Benitez. http://dragoman.org/
[EUR]
Europa. http://europa.eu.int/
[HTML40]
HTML 4.0 Specification. http://www.w3.org/TR/REC-html40/
In particular:
2.3.1 Internationalization
5.1 The Document Character Set
7.4.4 Meta data
8 Language information and text direction
[LIST]
http://www.w3.org/International/O-misc-mlists.html
[RFC1766]
Tags for the Identification of Languages, H. Alvestrand, March 1995.
Available at http://ds.internic.net/rfc/rfc1766.txt
[RFC2068]
Hypertext Transfer Protocol -- HTTP/1.1, R. Fielding, J. Gettys, J. Mogul, H. Frystyk Nielsen and T. Berners-Lee, January 1997.
Available at http://ds.internic.net/rfc/rfc2068.txt
In particular:
3.10 Language Tags
12 Content Negotiation
12.3 Transparent Negotiation
14.4 Accept-Language
14.13 Content-Language
14.43 Vary
15.7 Privacy Issues Connected to Accept Headers