This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 9264 - There should be a link/border between META content-language algorithm and HTTP content-language headers
Status: RESOLVED NEEDSINFO
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: PC All
Importance: P3 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://dev.w3.org/html5/spec/semantic...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-18 11:36 UTC by Leif Halvard Silli
Modified: 2010-10-04 13:57 UTC
CC List: 6 users

See Also:


Attachments

Description Leif Halvard Silli 2010-03-18 11:36:05 UTC
Challenge: When a node's language is set to "unknown" via an empty lang="" attribute, then user agents should respect that and not search for a fallback language in a META@content-language and/or the content-language header from the server. (See Bug 9263)

HOWEVER, there is a practical problem: Firefox and Safari (Gecko and WebKit) currently break this rule, and instead apply any language tag they may find in the content-language HTTP header/META element as the language of such elements.

Test case:  http://software.hixie.ch/utilities/js/live-dom-viewer/saved/406

For WebKit, Konqueror and Chrome, the cure for this is to provide two (2) META elements: one that defines the content-language(s), and then an empty one:

<!DOCTYPE html>
<html  lang="">
<meta http-equiv="Content-Language" content="en">
<meta http-equiv="Content-Language" content="">
<p>Webkit, Konqueror and Chrome think the language is unknown here.

Test case: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/407

However, this doesn't cure it for Mozilla browsers. First, they don't care whether the empty META comes first or last. But they do require that the "empty" META contain white space!

<!DOCTYPE html>
<html  lang="">
<meta http-equiv="Content-Language" content="en">
<meta http-equiv="Content-Language" content="                                 ">
<p>Mozilla browsers think the language is unknown here.

Test case: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/408

The problem is that this is invalid in the current version of HTML5.

In HTML4 and XHTML, nothing forbids the content="" attribute of the <META> Content-Language element from containing only white space.

But HTML5 says that it is the *first* META pragma that defines the language. The second META should be ignored.

PROPOSAL: 

Variant 1: As long as the document's *first* <META> Content-Language element fulfills the HTML5 requirements (whatever they end up looking like), the next/last <META> Content-Language element should be allowed to have content that cancels the fallback language effect. Namely, it should be allowed to contain white space. (Or a comma, or something else that works.)

Variant 2: A <META> Content-Language element should be allowed to contain white space. (Or a comma, or something else that cancels the fallback language effect.)
Comment 1 Leif Halvard Silli 2010-03-18 11:59:05 UTC
(In reply to comment #0)

> The problem is that this is invalid in the current version of HTML5.
> 
> In HTML4 and XHTML, there is no requirement that the content="" attribute of
> the <META> Content-Language element doesn't contain only white-space.

Whereas HTML5 does not permit a value that contains only white space.
Comment 3 Ms2ger 2010-03-18 20:24:11 UTC
Solution: don't add the meta element in the first place? If Gecko and WebKit don't match the specification, I suggest filing bugs in their respective bug trackers.
Comment 4 Leif Halvard Silli 2010-03-18 22:16:49 UTC
(In reply to comment #3)
> Solution: don't add the meta element in the first place? If Gecko and WebKit
> don't match the specification, I suggest filing bugs in their respective bug
> trackers.

Sorry, but that is not a solution: as I explained above, it doesn't matter to Mozilla browsers whether the content-language value comes from the server or from the meta element.

Forbidding the META content-language element would only be a solution for Chrome, WebKit and Konqueror.
Comment 5 Leif Halvard Silli 2010-03-18 23:23:16 UTC
(In reply to comment #0)
   ...
> PROPOSAL: 
> Variant 1:  [...]
> Variant 2: []

Variant 3: Stick a finger in the soil (orient yourself to reality):

Is it realistic that all browsers will start to look at the first META element? When? Currently, all browsers that look at the META content-language element look at the *last* one, if there is more than one.

And of these (which includes IE), only Mozilla makes use of this element when it contains more than one language.
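A minimal Python sketch of why the two-META workaround from the description depends on which element a user agent reads. The helper name `select_meta_content` and its two policy strings are hypothetical: "first" follows the HTML5 reading quoted in the description (the first META pragma defines the language), "last" follows the browser behaviour reported in this comment. It is a simplification that ignores everything except document order.

```python
from typing import List, Optional


def select_meta_content(values: List[str], policy: str) -> Optional[str]:
    """Pick which META Content-Language content value a user agent uses.

    values: content="" values of the META elements, in document order.
    policy: "first" (HTML5 reading) or "last" (reported browser behaviour).
    Hypothetical helper, for illustration only.
    """
    if not values:
        return None
    if policy == "first":
        return values[0]
    if policy == "last":
        return values[-1]
    raise ValueError(f"unknown policy: {policy!r}")


# The workaround from the description: a real tag followed by a blank META.
metas = ["en", "   "]
print(repr(select_meta_content(metas, "first")))  # 'en'
print(repr(select_meta_content(metas, "last")))   # '   '
```

Under a first-wins rule the trailing blank META is simply ignored, so the workaround can only have an effect in user agents that read the last element.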

Hence a solution to this problem could be:
 
A) Forget about requiring user agents to look at the first META content-language element.
B) On an edge, validators should give a warning *especially* when META content-language contains only one language.
C) The spec should advise that, whenever the server and/or META is used to send the content language, and the document needs an empty lang="" to set an element to "unknown language" (in other words, when you want full control), there should always be a final, whitespace-filled META content-language element in the code.

In particular: if we really ask all user agents to change and look at the first META content-language element, then we should take a deeper look at it and make more, and more useful, changes than that.
Comment 6 Ian 'Hixie' Hickson 2010-04-01 21:56:43 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: Changing the language to work around fixable minor browser bugs is not the right way to do it. We'd be better off fixing the browsers. Please file bugs with the relevant browser vendors. If we assume that the browsers aren't going to do what the spec says, then there's no point having the spec.
Comment 7 Leif Halvard Silli 2010-04-05 02:31:10 UTC
New info for the editor to consider. I also changed the title of this bug. 

The real issue here is not about user agent bugs (even if those play a role as well) but about an undefined spot in the spec.

According to: 
http://dev.w3.org/html5/spec/semantics.html#attr-meta-http-equiv-content-language

]]
Until the pragma is successfully processed, there is no pragma-set default language.
     [ snip ]
2. If the meta element has no content attribute, or if that attribute's value
     is the empty string, then abort these steps.
3. If the element's content attribute contains a U+002C COMMA character (,) 
     then abort these steps.
[[

So, if the pragma processing is aborted because the content attribute is the empty string or because the element contains multiple languages (and thus a comma), then there is no pragma-set default language.

And since there is no pragma-set default language, the HTTP header "must be used as the final fallback language", as described here:
http://dev.w3.org/html5/spec/Overview.html#the-lang-and-xml:lang-attributes 

]]
If there is no pragma-set default language set, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language instead.
[[
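The fallback chain described above can be sketched in Python as follows. Only steps 2 and 3 come from the quoted algorithm; the snipped remaining steps are approximated here (an assumption) by trimming surrounding whitespace, and the function names are hypothetical. `None` stands for "no pragma-set default language", in which case the HTTP Content-Language value is used.

```python
from typing import Optional

# Assumed whitespace set: HTML "space characters" (space, tab, LF, FF, CR).
HTML_SPACE = " \t\n\f\r"


def pragma_default_language(content: Optional[str]) -> Optional[str]:
    """Return the pragma-set default language, or None if none is set."""
    # Step 2: no content attribute, or the empty string: abort.
    if content is None or content == "":
        return None
    # Step 3: the value contains a U+002C COMMA: abort.
    if "," in content:
        return None
    # Assumption: the snipped remaining steps are approximated by
    # trimming surrounding whitespace from the value.
    return content.strip(HTML_SPACE)


def fallback_language(content: Optional[str],
                      http_header: Optional[str]) -> Optional[str]:
    """Language used for text that has no lang attribute of its own."""
    pragma = pragma_default_language(content)
    if pragma is not None:
        return pragma
    # No pragma-set default language: HTTP is the final fallback.
    return http_header


print(fallback_language("", "en"))        # en  (empty content: HTTP wins)
print(fallback_language("de, fr", "en"))  # en  (comma: HTTP wins)
print(fallback_language("de", "en"))      # de
```

The first two calls show the asymmetry this comment points out: an empty or multi-language META does not mean "unknown"; it hands the decision back to the server.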

So, on the one hand, you insist on making http-equiv content-language equivalent to the lang="*" attribute, through the author requirement that it contain only a single BCP 47 language tag. On the other hand, there is no congruence between the semantics of

    <HTML LANG="<empty-string>">

and the semantics of

    <META http-equiv="content-language" content="<empty-string>">

The lack of congruence is, however, offset by the fact that this is in line with how Mozilla-based browsers behave; until IE8, they were the only ones to actually listen to the HTTP server.

Nevertheless, given the restrictions on the content model of the content attribute of the META content-language element (compared to HTML4), this creates an interoperability problem:

* The author might not have full control over what language the server sends out. How can he/she make sure that the HTTP server's Content-Language header does not affect the perceived language of the document, if the intent for some reason is not to use or rely on the @lang attribute? There is no way (except providing false language information inside the META declaration, which really isn't an option).
* This in turn creates a discrepancy between user agents that do listen to the Content-Language header from the HTTP server and those that do not. The former will not treat an empty META declaration as the end of the story; these browsers will instead pick the language from the server. For the latter, an empty META declaration will in practice be equivalent to an empty lang="" attribute on the root element. As of today, Mozilla browsers listen to the HTTP server, while Safari, for example, does not. This may create problems for those who do use server-sent Content-Language headers. Even if all future user agents started to listen to the server, there would still be problems with legacy browsers. And, more importantly, there would still be problems with the lack of control over what the server sends out.

To solve these problems, I propose that authors be allowed to use WHITESPACE inside the META content-language declaration, in order to explicitly set the content-language pragma to an unknown language:

              <meta http-equiv="content-language" content="<WHITESPACE>"> 

This already has the wanted effect in Mozilla-based browsers: it makes them stop listening to the server. (WHITESPACE is just a proposal. From a technical point of view, it could also be a hyphen or an underscore character.)
 
This requires a change to the META content-language algorithm (http://dev.w3.org/html5/spec/semantics.html#attr-meta-http-equiv-content-language ).

I suggest the following text - or something equivalent - to be added after step 2 and before the current step 3, in the content-language pragma algorithm:

         ]] If the sole content of the element's content attribute is white space, then explicitly set the pragma-set default language to unknown and end the algorithm. [[
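A Python sketch of the algorithm with the proposed step inserted between steps 2 and 3. The function names, the use of the empty string to represent the explicitly "unknown" language (mirroring lang=""), and the approximation of the snipped remaining steps as whitespace trimming are all assumptions for illustration. Element-level lang handling is reduced to a single optional attribute value.

```python
from typing import Optional

HTML_SPACE = " \t\n\f\r"  # assumed: HTML "space characters"
UNKNOWN = ""  # hypothetical representation of "language explicitly unknown"


def pragma_with_proposal(content: Optional[str]) -> Optional[str]:
    """Content-Language pragma with the proposed whitespace step.

    Returns a language tag, UNKNOWN, or None (no pragma-set default).
    """
    # Step 2: no content attribute, or the empty string: abort.
    if content is None or content == "":
        return None
    # Proposed step: a whitespace-only value sets the language to unknown.
    if content.strip(HTML_SPACE) == "":
        return UNKNOWN
    # Step 3: the value contains a U+002C COMMA: abort.
    if "," in content:
        return None
    # Assumption: snipped remaining steps approximated by trimming whitespace.
    return content.strip(HTML_SPACE)


def resolve_language(lang_attr: Optional[str], content: Optional[str],
                     http_header: Optional[str]) -> Optional[str]:
    """lang attribute first, then the pragma, then the HTTP header."""
    # A lang attribute, even lang="", is final: no fallback is searched.
    if lang_attr is not None:
        return lang_attr
    pragma = pragma_with_proposal(content)
    if pragma is not None:
        return pragma  # includes UNKNOWN: the HTTP header is not consulted
    return http_header


print(repr(resolve_language(None, "   ", "en")))  # '' (unknown, HTTP ignored)
print(repr(resolve_language(None, "", "en")))     # 'en' (still falls back)
print(repr(resolve_language("", "de", "en")))     # '' (lang="" wins)
```

Placing the check before step 3 means a whitespace-only value never reaches the comma test or the HTTP fallback, which matches the effect this comment reports Mozilla-based browsers already show.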
Comment 8 Leif Halvard Silli 2010-04-05 03:42:25 UTC
(In reply to comment #7)
>The lack of congruency is however made up by the facgt that  this is in line
>with how Mozilla based browsers behave [..]

Of course, this only holds true when the META declaration is the empty string. If it is a list of comma-separated languages, then Mozilla browsers do not behave as the spec says.
Comment 9 Ian 'Hixie' Hickson 2010-04-12 23:42:28 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Did Not Understand Request
Change Description: no spec change
Rationale: I don't quite understand. Could you provide a URL to a page that is affected by this problem so that I can study how it affects authors in the real world (i.e. outside of theoretical test cases)?