ChangeProposals/ContentLanguages
HTML5 Change Proposal (ISSUE 88):
Let multiple language tags continue to be legal
Leif Halvard Silli, 23 April 2010 (updated 30 April 2010 and 12 May 2010; on 23 June 2010, added some clarifications, more information on Risks and Negative Effects, and a new Positive Effect).
Summary
- Multiple language tags (a comma-separated list) in the http-equiv="Content-Language" meta element continue to be legal.
- Conformance checkers will emit a warning whenever, and only if, a fallback language actually kicks in (and as long as <html> contains the lang="*" attribute, fallback never kicks in).
- The warning will kick in regardless of whether the fallback language is caused by a server-side Content-Language HTTP header or a http-equiv="Content-Language" meta element.
Note: Neither this proposal, nor any of the other proposals on the table, affects HTML’s conditions for when a fallback language kicks in.
Rationale
Rationale: Conformance checking and warnings are in place, but should be about the correct things.
The problems with the current specification (the zero edit proposal) are:
- That it offers no carrot for doing the right thing.
  - While the fallback language effect stops as soon as the author adds lang on the root element, the spec requires conformance checkers to continue whining until the http-equiv="Content-Language" meta element has been removed.
- That it prevents authors from legally using multiple values to replicate the language fallback effect of doing the same thing in a HTTP header, whether they want to replicate the effect of multiple tags or a single tag.
  - That no language gets set, as HTML5 requires for multiple tags whether they occur in HTTP or in http-equiv, is still an effect. The spec is therefore incorrect when it says about the latter that “for instance it only supports one language”.
  - Also, consider Firefox’s Page Info panel. Consider some CMSes. Consider simply authors themselves. All of these can today use http-equiv for referring to what the HTTP Content-Language is/was meant to be.
- That it underlines the confusion that may exist today about the nature of lang versus Content-Language, by requiring:
  - different syntax rules for features that are expected to be identical (HTTP and http-equiv)
  - similar syntax rules for features that are different (http-equiv and lang)
  - a warning message which asks authors to “use lang instead”, as if they were juxtaposable alternatives.
Note: The alternative proposal about totally forbidding http-equiv="Content-Language" has the exact same effects as the zero-edit proposal. The only advantage of a total forbidding proposal is that it is more consistent (though not fully consistent, because it treats HTTP differently from meta). From that perspective, it is a bit easier to deal with. However, the total forbidding proposal also increases the gap between what is permitted and what actually works; from that angle it is worse than the zero-edit proposal.
Instead of the above, this change proposal proposes:
- The zero-edit proposal’s warning about using lang instead of Content-Language should be changed into a warning which informs the author that a fallback language measure has kicked in, and which recommends that authors create a language declaration (via lang) rather than relying on the fallback feature. This warning should be shown regardless of whether the fallback comes from http-equiv or from the higher level (HTTP). Justification: Since it is a fallback feature, and with other semantics, there is no guarantee that the author has used it for the language effect.
- To hold the syntax rules of HTTP (which permit multiple language tags) as the conforming ones (rather than those of lang, which forbid multiple languages) will have the effect of underlining that lang and Content-Language have different purposes. For instance, since the fallback algorithm doesn’t kick in whenever multiple languages are used in the pragma or on the server, there would not be any warning in these cases.
- A carrot: what we want from authors is that they rely on lang (and xml:lang) for specifying the language. When the author does that, he/she should get an immediate reward in the form of removal of the conformance warning.
Details
Spec changes throughout the document
Replace the following expression, everywhere it occurs
pragma-set default language
with the following
pragma-set locale language
Spec changes to section 4.2.5.3 Pragma directives:
Replace the following text
This pragma sets the pragma-set default language. Until the pragma is successfully processed, there is no pragma-set default language.
with the following
This pragma contains a Content-Language list, whose semantics and syntax are defined in the HTTP spec. [HTTP] An HTML5 parser processes this list into a known or unknown pragma-set locale language. Until the pragma is successfully processed, there cannot be a pragma-set locale language. The Content-Language list may also be defined in a HTTP header, and will then result in a known or unknown HTTP header-set locale language. When a document is lacking a language declaration in the form of the lang or xml:lang attribute on the root element, the document’s locale language (pragma-set or HTTP-set) is consulted by the user agent and used as fallback value for the primary document language. Validators are required to emit a warning whenever the locale language is used as fallback for the primary document language, see section 3.2.3.3 The lang and xml:lang attributes and the informative comment below.
The following info about the HTTP semantics and Content-Language usage, is informative:
- That there is no Content-Language list (as a http-equiv pragma or a HTTP header) means that the document targets all users regardless of their language preference and regardless of their ability to actually read the document language. This is often the simplest and best option.
- That there is a Content-Language list (as a http-equiv pragma or a HTTP header) means that the target audience is narrowed down to the users that are expected to prefer the language(s) on the list. Note: The Content-Language list should be defined on the HTTP server side, to be fully effective.
- The HTML parser processing is only a side effect of the HTTP semantics – authors should not define the Content-Language list according to its parser effect, but according to its semantics.
- Examples of semantically meaningful use of the Content-Language list:
  - An English document localized – but not translated – for presentation to all European Union citizens: the Content-Language could list one language tag per language used in the European Union.
  - An English document localized – but not translated – for German users: the Content-Language list could list a single language tag – 'de'.
  - An English document is localized for British English users: the Content-Language lists a single language tag – 'en'.
  - A document in Queen's English is targeted at US citizens – with the Content-Language set to 'en-US'.
- Usage warnings: Only example number 3 would parse into a locale language value that is actually useful as a primary document language. The first example would parse into a harmless 'unknown' locale language value, while the second and fourth examples would end up, to a large degree and to a noticeable degree respectively, unusable as the primary document language. Hence the validator warnings described under section 3.2.3.3.
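The processing of a Content-Language list into a known or unknown locale language can be sketched in Python. This is a minimal sketch, not spec text: the function names split_content_language and locale_language are hypothetical, None stands in for an 'unknown' locale language, and the tags are not checked against BCP 47.

```python
from typing import List, Optional


def split_content_language(value: str) -> List[str]:
    """Split a comma-separated Content-Language list into language tags."""
    # Whitespace around each tag is dropped; empty entries are ignored.
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def locale_language(value: Optional[str]) -> Optional[str]:
    """Return the single listed tag, or None when the locale language is unknown.

    Assumption of this sketch: 'unknown' covers a missing list, an empty
    list, and a list of two or more tags.
    """
    if value is None:
        return None
    tags = split_content_language(value)
    return tags[0] if len(tags) == 1 else None
```

Under these assumptions, a single-tag list such as 'de' or 'en-US' yields that tag as the locale language, while a list with one tag per European Union language yields an unknown locale language.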
Delete the following text
Conformance checkers will include a warning if this pragma is used. Authors are encouraged to use the lang attribute instead.[HTTP]
(Instead a warning is shown which is related to language declaration – see proposed change to section 3.2.3.3 The lang and xml:lang attributes under the next sub header, below.)
After the following text,
the content attribute must have a value consisting of a valid BCP 47 language tag
then add the following:
, or a comma separated list of two or more BCP 47 language tags
Delete the following text:
This pragma is not exactly equivalent to the HTTP Content-Language header, for instance it only supports one language.
Spec changes to section 3.2.3.3 The lang and xml:lang attributes:
Correct the terminology used in this paragraph
If none of the node's ancestors, including the root element, have either attribute set, but there is a pragma-set default language set, then that is the language of the node. If there is no pragma-set default language set, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language instead. In the absence of any such language information, and in cases where the higher-level protocol reports multiple languages, the language of the node is unknown, and the corresponding language tag is the empty string.
like this (the corrected words are emphasized):
If none of the node's ancestors, including the root element, have either attribute set, but there is a pragma-set locale language set, then that is the language of the node. If there is no pragma-set locale language set, then language information from a higher-level protocol (such as a HTTP header-set locale language), if any, must be used as the final fallback language instead. In the absence of any such language information, and in cases where the higher-level protocol reports multiple locale languages, the language of the node is unknown, and the corresponding language tag is the empty string.
And after the above paragraph, then add the following NOTE:
NOTE: Conformance checkers will include a warning whenever it is necessary to use the pragma-set locale language or the HTTP header-set locale language as the primary language of an element, for the simple reason that the document’s locale language may not correspond to the primary document language, see info note about the Content-Language pragma. Authors are encouraged to eliminate the need to use the locale language as fallback, by adding a lang or xml:lang attribute on the root element.
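The fallback order in the corrected paragraph, together with the warning condition in the NOTE, can be sketched as follows. This is an illustrative sketch only: node_language and its parameters are hypothetical names, each locale language argument is assumed to be already processed into a single tag or None (unknown or absent), and an unknown pragma-set locale language is assumed to count as not set, so that the HTTP header is consulted next.

```python
from typing import Optional, Tuple


def node_language(
    declared_lang: Optional[str],
    pragma_locale: Optional[str],
    http_locale: Optional[str],
) -> Tuple[str, bool]:
    """Return (language tag, warning needed) for a node.

    declared_lang is the lang/xml:lang value inherited from the nearest
    ancestor (or the node itself), or None if no such attribute is set.
    """
    if declared_lang is not None:
        # An explicit declaration wins; no fallback, hence no warning.
        return declared_lang, False
    if pragma_locale is not None:
        # Fallback to the pragma-set locale language: warn.
        return pragma_locale, True
    if http_locale is not None:
        # Final fallback to the HTTP header-set locale language: warn.
        return http_locale, True
    # No usable language information: unknown, expressed as the empty string.
    return "", False
```

In this sketch the warning is tied to the fallback actually being used, not to the mere presence of a Content-Language pragma or header.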
Impact
Positive Effects
- More positive: authors can get rid of the warning by adding something, namely <html lang="*">, which is better than a focus on removal of the (overall) harmless Content-Language meta element.
- More stable: the same syntax as before continues to be permitted.
- More permissive: authors, CMSes and browsers can continue to take advantage of HTTP-EQUIV’s ability to reference what the HTTP header is/was supposed to be, including replicating its fallback effect.
- More correct: the difference between lang and Content-Language is pointed out, while the link between http-equiv and HTTP is emphasized.
- More useful: a warning that a fallback feature has kicked in is more useful than a warning which focuses on one of the places where the fallback language could potentially kick in from. Why tell the author to “please use lang instead” if the author has already made sure that the lang attribute is in place?
- Has a positive side effect: Encouragement to place a lang attribute on the start tag of the html element will lead authors to actually type in the html root element, instead of relying on the parser to generate it for them.
Negative Effects
This change proposal does not offer a simple “just cut off your left hand” solution to the problem at hand.
One could claim that to completely forbid the Content-Language meta element is a straightforward solution, easy to teach and learn. Likewise, HTML5’s current solution is also quite simple (for specification and validator developers): always show either a warning (in case of just one language tag) or an error (in case of multiple language tags) whenever the Content-Language meta element is used.
The justification for the more complicated approach of this change proposal, however, is that it is both more accurate and a better compromise. It is more accurate because it does not conceal the problems by introducing an artificial technical and semantic difference between Content-Language from the HTTP header and Content-Language inside the http-equiv meta element. Instead it requires authors (as well as those who teach internationalization of HTML) to think and understand, and gives them the opportunity to do so. It is a better compromise because it will lead conformance checkers to display significantly fewer error and warning messages than the zero-edit proposal or the ‘totally forbidden’ proposal would (according to Opera MAMA, 13% of Web pages include the Content-Language meta element). It also has a more meaningful warning, focusing on semantics and effect rather than on syntax.
Conformance Classes Changes
- For UAs: none, compared with the change that HTML5 already requires.
- For validators: They must validate a comma separated list as conforming. They must check when the fallback language algorithm is activated.
- For the HTML5 spec: see the Details section above.
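The validator requirements above could be approached along the following lines. This is a rough sketch under stated assumptions, not a description of any existing conformance checker: LanguageProbe and fallback_warning_needed are hypothetical names, only the lang attribute of the html start tag is inspected (xml:lang is ignored), a multi-tag pragma is assumed to count as not set so that the HTTP header value is consulted next, and language tags are not validated against BCP 47.

```python
from html.parser import HTMLParser
from typing import Optional


class LanguageProbe(HTMLParser):
    """Collect the root lang attribute and the first Content-Language pragma."""

    def __init__(self) -> None:
        super().__init__()
        self.root_lang: Optional[str] = None
        self.pragma: Optional[str] = None
        self._seen_root = False

    def handle_starttag(self, tag: str, attrs) -> None:
        attributes = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "html" and not self._seen_root:
            self._seen_root = True
            self.root_lang = attributes.get("lang")
        elif tag == "meta" and self.pragma is None:
            if (attributes.get("http-equiv") or "").lower() == "content-language":
                self.pragma = attributes.get("content")


def fallback_warning_needed(document: str, http_header: Optional[str] = None) -> bool:
    """Return True when a locale language would be used as fallback."""
    probe = LanguageProbe()
    probe.feed(document)
    probe.close()
    if probe.root_lang is not None:
        return False  # language declared on the root element: no fallback
    for value in (probe.pragma, http_header):
        if value is None:
            continue
        tags = [tag.strip() for tag in value.split(",") if tag.strip()]
        if len(tags) == 1:
            return True  # a single tag is used as the fallback language
    return False  # comma-separated lists are accepted and trigger no fallback
```

A comma-separated list passes without complaint in this sketch, and adding lang to the html start tag silences the warning even when the meta element stays in place.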
Risks
Conclusion: Based on the following analysis, the risks are negligible and certainly lower than those of always showing either a warning or an error (the “zero edit” proposal) or always showing an error (the “completely forbidden” proposal).
Analysis: To evaluate the risks, one must evaluate how authors are likely to react to this change proposal.
- Whenever a validator detects that a fallback language is in effect, this change proposal requires the validator to ask the author (via a warning message) to consider expressing the document language via lang (and xml:lang) on the root element instead of relying on a fallback mechanism.
- Authors are then meant to either ignore the fallback language warning (if the author knows what he/she is doing) or to do one of the following:
  - Either add a lang (and xml:lang) attribute on the root element, to get rid of the warning – this is the simple solution that we hope most authors will take.
  - Or delete the Content-Language meta element and/or HTTP header, without simultaneously adding lang (and xml:lang).
  - Or change the value of the Content-Language meta element and/or HTTP header from a single language tag to two or more language tags, without simultaneously adding lang (and xml:lang).
  Any of the above 3 options will make the warning go away.
- If the author does understand the problem, the author is also likely to understand the warning and to know how to fix it. An author who is aware of the CSS :lang(*) selector is also likely to be aware of lang and xml:lang.
- However, to authors who don't understand the problem, deleting the cause of the warning without simultaneously adding lang (and xml:lang) will no doubt sometimes present itself as the simplest solution. Such a deletion could possibly, from time to time, lead to loss of language information for the user, though certainly not nearly as often as under the zero-edit proposal and the “make http-equiv=Content-Language completely forbidden” proposal, since both of those proposals would lead to a conformance warning or error message every time the meta element occurs.
- This proposal, in combination with more and more deployment of HTML5-compatible user agents, could perhaps also lead to a rise in the number of Content-Language HTTP headers and http-equiv elements containing multiple language tags. (Legacy user agents are not likely to cause such an increase, due to their buggy support.) However, the negative effects on legacy user agents are seldom experienced in practice. And as users upgrade to HTML5-compatible browsers, these already unquantifiable but seldom-seen effects will only become more and more negligible.
References
- Section 14.12 Content-Language of RFC 2616
- HTML4’s general HTTP-EQUIV explanation
- HTML4, section 8.1.2 Inheritance of language codes