ChangeProposals/ContentLanguages

From HTML WG Wiki

HTML5 Change Proposal (ISSUE 88):
Let multiple language tags continue to be legal

Leif Halvard Silli, on the 23rd of April 2010 (updated on 30th of April).

Summary

  • Multiple language tags (a comma-separated list) in the http-equiv Content-Language pragma continue to be legal.
  • Conformance checkers will emit a warning whenever, and only when, the fallback language algorithm kicks in.
  • The fallback warning will be emitted regardless of whether the fallback language comes from the HTTP Content-Language header or from the http-equiv Content-Language pragma.

Rationale

The problems with the current specification are:

  1. That it prevents authors from legally using multiple values to replicate the language fallback effect of doing the same thing in an HTTP header.
    • HTML5 requires that no language be set when multiple tags are present, whether they occur in HTTP or in http-equiv, and that is still an effect. The spec is therefore incorrect in claiming about the latter that “for instance it only supports one language”.
  2. That it prevents http-equiv from being used as a reference to what the HTTP Content-Language is/was meant to be.
    • Consider Firefox's Page Info panel, some CMSes, or simply authors themselves.
  3. That it reinforces the confusion that may exist today about the nature of lang versus Content-Language, by requiring:
    • different syntax rules for features that are expected to be identical (HTTP and http-equiv)
    • similar syntax rules for features that are different (http-equiv and lang)
    • a warning message which asks authors to “use lang instead”, as if the two were interchangeable alternatives.

Conformance checking and warnings have their place, but they should be about the right things.

  1. The current warning about using lang instead of Content-Language should be changed into a warning which informs authors that a fallback language measure has kicked in, and which recommends that they create a language declaration (via lang) rather than rely on the fallback feature. This warning should be shown regardless of whether the fallback comes from http-equiv or from the higher level (HTTP). Justification: since Content-Language is a fallback feature with different semantics, there is no guarantee that the author has used it for the language effect.
  2. Holding the syntax rules of HTTP (which permit multiple language tags) as the conforming ones, rather than those of lang (which forbid multiple languages), will underline that lang and Content-Language have different purposes. For instance, since the fallback algorithm doesn’t kick in when multiple languages are used in the pragma or on the server, there would not be any warning in these cases.
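
The warning behaviour argued for above can be sketched roughly as follows. This is only an illustration, not checker code: the function names are hypothetical, and the order of precedence (pragma first, then HTTP) is an assumption about how the fallback language algorithm consults its sources. A value containing a comma is treated as setting no language, as described in the Rationale.

```python
from typing import Optional


def single_tag(value: Optional[str]) -> Optional[str]:
    """Return the tag if value holds exactly one language tag, else None."""
    if value is None:
        return None
    value = value.strip()
    # A comma means a list of tags; in that case no language gets set.
    if not value or "," in value:
        return None
    return value


def fallback_warning(root_lang: Optional[str],
                     pragma: Optional[str],
                     http_header: Optional[str]) -> Optional[str]:
    """Return a warning message if a fallback language is in effect."""
    if root_lang is not None:
        # An explicit lang on the root element: no fallback, no warning.
        return None
    # Assumed precedence: the pragma is consulted before the HTTP header.
    for source, value in (("http-equiv pragma", pragma),
                          ("HTTP header", http_header)):
        tag = single_tag(value)
        if tag is not None:
            return (f"Document language falls back to '{tag}' from the "
                    f"{source}; declare it with lang on the root element.")
    return None
```

With this sketch, a document that has lang on the root element never triggers the warning, and neither does one whose pragma and header both carry several tags. The warning appears only when a single tag from either source actually becomes the document language.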

Details

Proposed spec changes, to section 4.2.5.3 Pragma directives:

Replace the following text

Conformance checkers will include a warning if this pragma is used. Authors are encouraged to use the lang attribute instead. [HTTP]

with the following

The semantics of this pragma, as well as of the HTTP Content-Language header, are different from the semantics of the lang attribute. [HTTP] Thus, there is no guarantee that the author consciously used either of them for setting the language. Therefore, conformance checkers will include a warning whenever HTML5’s fallback language algorithm is activated, whether it is the higher-level protocol or this pragma that kicks in. The warning informs authors which language the document falls back to, and encourages them not to rely on the fallback feature but instead to use the lang attribute explicitly on the root element.

After the following text,

the content attribute must have a value consisting of a valid BCP 47 language tag

add the following:

, or a comma separated list of two or more BCP 47 language tags

Delete the following text:

This pragma is not exactly equivalent to the HTTP Content-Language header, for instance it only supports one language.
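
As a rough illustration of the proposed syntax rule, the content attribute could be checked along these lines. The function name is hypothetical, and the regular expression is a deliberately simplified shape check (letters first, then hyphen-separated alphanumeric subtags of up to eight characters); it is an assumption made for brevity and is not a substitute for real BCP 47 validation.

```python
import re

# Simplified shape check only: a primary subtag of 1-8 letters followed by
# hyphen-separated alphanumeric subtags of 1-8 characters each.
# Assumption: this approximates, and does not replace, BCP 47 validation.
_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def parse_content_language(content: str) -> list[str]:
    """Split a content attribute value into its language tags.

    Raises ValueError if any list item is empty or not tag-shaped.
    """
    tags = [item.strip() for item in content.split(",")]
    for tag in tags:
        if not _TAG.fullmatch(tag):
            raise ValueError(f"not a language tag: {tag!r}")
    return tags
```

Under this sketch, both a single tag and a list of two or more tags are accepted, while an empty item (as in a doubled comma) is rejected.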


Impact

Positive Effects

  1. More stable: same syntax as before continues to be permitted.
  2. More permissive: authors, CMSes and browsers can continue to take advantage of HTTP-EQUIV’s ability to reference what the HTTP header is/was supposed to be, including replicating its fallback effect.
  3. More correct: the difference between lang and Content-Language is pointed out, while the link between http-equiv and HTTP is emphasized.
  4. More useful: a warning that a fallback feature has kicked in is more useful than a warning which focuses on one of the places where the fallback language could potentially come from. Why tell authors to “use lang instead” if the author has already made sure that the lang attribute is in place?

Negative Effects

none

Conformance Classes Changes

  • For UAs: none, compared with the change that HTML5 already requires.
  • For validators: they must accept a comma-separated list as conforming, check that the HTTP Content-Language header and the HTTP-EQUIV pragma are identical, and detect when the fallback language algorithm is activated.
  • For the HTML5 spec: see the Details section above.
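
The validator check that the HTTP header and the pragma are identical might look something like the sketch below. The function names are hypothetical. Comparing case-insensitively reflects that language tags are case-insensitive; treating a different order of tags as a mismatch is an assumption, since this proposal does not say whether order matters.

```python
def tag_list(value: str) -> list[str]:
    """Split a comma-separated list and lowercase each tag for comparison."""
    return [item.strip().lower() for item in value.split(",")]


def declarations_match(http_header: str, pragma: str) -> bool:
    """True if both declarations list the same tags in the same order."""
    return tag_list(http_header) == tag_list(pragma)
```

A checker that regards order as insignificant could compare sets of tags instead of lists.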

Risks

There is a risk that multiple language tags cause legacy UAs to report that the document is in a meaningless language. However, this risk is low, and authors can avoid it by using the lang and xml:lang attributes. This change proposal ensures that authors will continue to be encouraged to use lang, and not Content-Language, for setting the language.

References

  • Section 14.12 Content-Language of RFC 2616
  • HTML4’s general HTTP-EQUIV explanation
  • HTML4, section 8.1.2 Inheritance of language codes