Difference between revisions of "ChangeProposals/ContentLanguages"


Revision as of 02:58, 11 May 2010

HTML5 Change Proposal (ISSUE 88):
Let multiple language tags continue to be legal

Leif Halvard Silli, on the 23rd of April 2010 (updated on 30th of April).

Summary

  • Multiple language tags (a comma separated list) in http-equiv Content-Language continue to be legal
  • Conformance checkers will emit a warning whenever – and only if – the fallback language algorithm kicks in
  • The fallback warning will kick in regardless of whether it comes from HTTP or Content-Language

Rationale

The problems with the current specification are

  1. That it prevents authors from legally using multiple values to replicate the language fallback effect that the same values have in an HTTP header.
    • That no language gets set, which is what HTML5 requires for multiple tags whether they occur in HTTP or in http-equiv, is still an effect. The spec is therefore incorrect in claiming about the latter that “for instance it only supports one language”.
  2. That it prevents http-equiv from being used as a reference to what the HTTP Content-Language is/was meant to be.
    • Consider Firefox’s Page Info panel. Consider some CMSes. Consider simply authors themselves.
  3. That it underlines the confusion that may exist today, about the nature of lang versus content-language, by requiring:
    • different syntax rules for features that are expected to be identical (HTTP and http-equiv)
    • similar syntax rules for features that are different (http-equiv and lang)
    • a warning message which asks authors to “use lang instead” – as if they were juxtaposable alternatives.

Conformance checking and warnings are in place, but should be about the correct things.

  1. The current warning about using lang instead of Content-Language should be changed into a warning which informs authors that a fallback language measure has kicked in, and which recommends that they create a language declaration (via lang) rather than relying on the fallback feature. This warning should be shown regardless of whether the fallback comes from http-equiv or from the higher level (HTTP). Justification: Since it is a fallback feature with different semantics, there is no guarantee that the author has used it for the language effect.
  2. Holding the syntax rules of HTTP (which permit multiple language tags) as the conforming ones, rather than those of lang (which forbid multiple languages), will have the effect of underlining that lang and Content-Language have different purposes. For instance, since the fallback algorithm doesn’t kick in whenever multiple languages are used in the pragma or on the server, there would not be any warning in these cases.

Details

Proposed spec changes, to section 4.2.5.3 Pragma directives:

Replace the following expression, everywhere it occurs

pragma-set default language

with the following

single pragma-set audience language


Replace the following text

This pragma sets the pragma-set default language.

with the following

This pragma often, but not always, results in a single pragma-set audience language. Until the pragma is successfully processed, there is no single pragma-set audience language.

This pragma must describe the specific natural language – or languages – that identifies the intended audience of the document. It may contain a single language tag, or several comma separated language tags. Its semantics and purpose are described in the HTTP spec. [HTTP]

In practice, the intended audience language is most often represented by a single language tag, in which case the pragma will result in a single pragma-set audience language which, when language info from xml:lang or lang is absent, gets consulted in order to serve as primary language info for the element in question, see 3.2.3.3 The lang and xml:lang attributes.

Note: When a single pragma-set audience language is absent, the higher protocol (HTTP), if any, gets consulted to check whether it in a similar fashion sets a single header-set audience language.

The single pragma-set or header-set audience language is often identical with the most useful primary language declaration for the text of the document itself because audience language and document language will often be one and the same. And this is also why it is allowed to function as fallback language information in this way. However, this single audience language fallback would for instance not be correct if one was dealing with an English document that was aimed at a French audience. Hence, authors are not allowed to rely on the pragma-set fallback language, as a way to declare the primary document language, but should instead explicitly declare document language information via the lang or xml:lang attribute on the root element – see section 3.2.3.3.


Delete the following text

Conformance checkers will include a warning if this pragma is used. Authors are encouraged to use the lang attribute instead.[HTTP]

(Instead a warning is shown which is related to language declaration – see proposed change to section 3.2.3.3 The lang and xml:lang attributes under the next sub header, below.)

After the following text,

the content attribute must have a value consisting of a valid BCP 47 language tag

then add the following:

, or a comma separated list of two or more BCP 47 language tags
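
As a rough illustration of the proposed syntax rule, the following Python sketch splits a content value on commas and reports whether it yields a single pragma-set audience language. The function names and the tag pattern are assumptions made for this example only; in particular, the pattern is a loose stand-in and much more permissive than the full BCP 47 grammar.

```python
import re

# Simplified stand-in for a BCP 47 check (an assumption of this sketch):
# a 2-8 letter primary subtag followed by optional 1-8 character subtags.
_TAG = re.compile(r"^[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*$")


def parse_content_language(value):
    """Return the list of language tags in value, or None if any item is malformed."""
    tags = [item.strip() for item in value.split(",")]
    if any(not _TAG.match(tag) for tag in tags):
        return None
    return tags


def single_audience_language(value):
    """Return the tag if value holds exactly one well-formed tag, else None."""
    tags = parse_content_language(value)
    if tags is not None and len(tags) == 1:
        return tags[0]
    return None
```

Under this reading, a value such as "en, fr" is conforming but produces no single audience language, so it never feeds the fallback.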

Delete the following text:

This pragma is not exactly equivalent to the HTTP Content-Language header, for instance it only supports one language.

Proposed spec changes, to section 3.2.3.3 The lang and xml:lang attributes:

Correct the terminology used in this paragraph

If none of the node's ancestors, including the root element, have either attribute set, but there is a pragma-set default language set, then that is the language of the node. If there is no pragma-set default language set, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language instead. In the absence of any such language information, and in cases where the higher-level protocol reports multiple languages, the language of the node is unknown, and the corresponding language tag is the empty string.

like this (the corrected words are emphasized):

If none of the node's ancestors, including the root element, have either attribute set, but there is a single pragma-set audience language set, then that is the language of the node. If there is no single pragma-set audience language set, then language information from a higher-level protocol (such as a single HTTP header-set audience language), if any, must be used as the final fallback language instead. In the absence of any such language information, and in cases where the higher-level protocol reports multiple audience languages, the language of the node is unknown, and the corresponding language tag is the empty string.
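
The fallback order in the corrected paragraph can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the representation of ancestors as a list of attribute values, and the use of tag lists for the pragma and the header are all invented for the example and are not taken from the spec.

```python
def node_language(ancestor_langs, pragma_tags, header_tags):
    """Resolve the language of a node.

    ancestor_langs: lang/xml:lang values for the node and its ancestors,
        nearest first, with None where the attribute is not set.
    pragma_tags, header_tags: lists of language tags from the http-equiv
        pragma and the HTTP header, or None when absent.
    """
    # 1. The nearest lang or xml:lang attribute wins.
    for lang in ancestor_langs:
        if lang is not None:
            return lang
    # 2. Otherwise use the single pragma-set audience language, if there is one.
    if pragma_tags is not None and len(pragma_tags) == 1:
        return pragma_tags[0]
    # 3. Otherwise use the single HTTP header-set audience language, if there is one.
    if header_tags is not None and len(header_tags) == 1:
        return header_tags[0]
    # 4. No information, or multiple audience languages: language is unknown.
    return ""
```

Note that a pragma listing several languages does not set a single pragma-set audience language, so in this sketch the HTTP header is still consulted in that case.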

And after the above paragraph, then add the following NOTE:

NOTE: Conformance checkers will include a warning whenever it is both possible and necessary to use the single pragma-set audience language or the single HTTP header-set audience language as the primary language of an element, for the simple reason that the audience language might not correspond to the primary document language. Authors are encouraged to eliminate the need to use the audience language as fallback, by adding a lang or xml:lang attribute on the root element.
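
One way a conformance checker might decide when to emit this warning is sketched below in Python. The function name, its parameters and the wording of the message are assumptions for illustration; the proposal does not prescribe any of them. "Necessary" is modelled as the root element lacking lang and xml:lang, and "possible" as a source supplying exactly one tag.

```python
def fallback_warning(root_has_lang, pragma_tags, header_tags):
    """Return a warning string when the audience language is used as fallback, else None."""
    if root_has_lang:
        # Fallback is not necessary: the author declared the language explicitly.
        return None
    sources = (("http-equiv pragma", pragma_tags), ("HTTP header", header_tags))
    for source, tags in sources:
        if tags is not None and len(tags) == 1:
            # Fallback is possible: exactly one audience language is available.
            return (
                f"Document language falls back to '{tags[0]}' from the {source}; "
                "consider adding a lang attribute on the root element."
            )
    # Fallback is not possible: no tags, or multiple audience languages.
    return None
```

In this sketch a document with several tags in both the pragma and the header triggers no warning, which matches the rationale given earlier in the proposal.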

Impact

Positive Effects

  1. More stable: same syntax as before continues to be permitted.
  2. More permissive: authors, CMS-es and browsers can continue to take advantage of HTTP-EQUIV’s ability to reference what the HTTP header is/was supposed to be, including replicating its fallback effect.
  3. More correct: the difference between lang and Content-Language is pointed out, while the link between http-equiv and HTTP is emphasized.
  4. More useful: a warning that a fallback feature has kicked in is more useful than a warning which focuses on one of the places where the fallback language could potentially kick in from. Why tell authors to “use lang instead” if the author has already made sure that the lang attribute is in place?

Negative Effects

none

Conformance Classes Changes

  • For UAs: none, compared with the change that HTML5 already requires.
  • For validators: They must validate a comma separated list as conforming. They must check that HTTP Content-Language and HTTP-EQUIV are identical. They must check when the fallback language algorithm is activated.
  • For the HTML5 spec: see the Details section above.

Risks

In legacy UAs, there is a risk that multiple language tags cause them to report that the document is in a meaningless language. However, this is a low risk. And authors can avoid it by using the lang and xml:lang attributes. This change proposal ensures that authors will continue to be encouraged to use lang, and not Content-Language, for setting the language.

References

  • RFC 2616, section 14.12 Content-Language
  • HTML4’s general HTTP-EQUIV explanation
  • HTML4, section 8.1.2 Inheritance of language codes