This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 9263 - Incorrect language determination algorithm
Summary: Incorrect language determination algorithm
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: PC All
Importance: P3 normal
Target Milestone: LC
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://dev.w3.org/html5/spec/Overview...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-18 10:27 UTC by Leif Halvard Silli
Modified: 2010-10-04 14:48 UTC
CC List: 5 users

Description Leif Halvard Silli 2010-03-18 10:27:54 UTC
Section '3.2.3.3 The lang and xml:lang attributes' says:

]]
Setting the attribute to the empty string indicates that the primary language is unknown. [BCP47]
[[

General comment: Please look through the text in this section and remove the ambiguities around the wordings "unknown", "absence of any language information", etc.

Please specify what it means that the lang is unknown. Should the user agent accept that the lang is unknown? Or should it go looking for a language? Note that the last step of the language determination algorithm of the same section says:

]]
 In the absence of any language information, and in cases where the higher-level protocol reports multiple languages, the language of the node is unknown (the empty string).
[[

Should a user agent consider an empty lang="" as "absence of any language information"? Or should it consider that it means that the language is "unknown"? The above sentence should say that the language is "unknown" also when the lang="" attribute is set to the empty string. The user agent should then abort the language detection algorithm and set the language of the node to "unknown".

Proposal: I think that user agents should, internally, distinguish between an empty lang="" that sets the language to "unknown" and the case where no language information can be found.
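
To illustrate the proposal, here is a minimal Python sketch of how an implementation might keep the two states apart internally. The names LangSource, LangResult and classify are hypothetical and do not come from the spec or from any browser; the sketch only assumes that the nearest lang attribute value is available as a string, or as None when no attribute exists.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class LangSource(Enum):
    """Hypothetical internal states; not taken from the spec."""
    EXPLICIT = auto()          # a non-empty lang attribute was found
    DECLARED_UNKNOWN = auto()  # lang="" was found: the author says "unknown"
    NOT_FOUND = auto()         # no lang attribute on the node or its ancestors


@dataclass(frozen=True)
class LangResult:
    source: LangSource
    tag: str = ""  # language tag; stays empty unless source is EXPLICIT


def classify(attr_value: Optional[str]) -> LangResult:
    """Classify the nearest lang attribute value (None means no attribute)."""
    if attr_value is None:
        return LangResult(LangSource.NOT_FOUND)
    if attr_value == "":
        return LangResult(LangSource.DECLARED_UNKNOWN)
    return LangResult(LangSource.EXPLICIT, attr_value)
```

With this shape, both DECLARED_UNKNOWN and NOT_FOUND carry an empty tag, but only NOT_FOUND would leave room for a later fallback.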


Comments in more detail, on the language determination algorithm:

]] To determine the language of a node,  [[

PROBLEM: What is the language of a node *before* the user agent starts looking for its language? Is it "unknown"? If it is "unknown", what should then happen when the user agent detects that the nearest lang="" attribute contains the empty string? Should it go looking for the next non-empty lang attribute and/or for a content-language header? Or should it stop looking? (Answer: It should stop looking.)

Please make clearer what the user agent should do when the lang attribute contains the empty string.
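
The "stop looking" behaviour asked for here can be sketched in Python as follows. This is only an illustration under stated assumptions: Node is a hypothetical stand-in for a DOM element, nearest_lang is an invented helper, and xml:lang is left out for brevity.

```python
from typing import Dict, Optional


class Node:
    """Hypothetical stand-in for a DOM element."""

    def __init__(self, attrs: Optional[Dict[str, str]] = None,
                 parent: Optional["Node"] = None) -> None:
        self.attrs = attrs or {}
        self.parent = parent


def nearest_lang(node: Optional[Node]) -> Optional[str]:
    """Return the value of the nearest lang attribute, or None if absent.

    An empty string is returned as-is: the walk stops at lang="" instead
    of continuing to a farther ancestor with a non-empty value.
    """
    while node is not None:
        if "lang" in node.attrs:
            return node.attrs["lang"]
        node = node.parent
    return None


# Example tree: <html lang="en"> containing <section lang=""> containing <p>
root = Node({"lang": "en"})
section = Node({"lang": ""}, parent=root)
para = Node(parent=section)
```

Here nearest_lang(para) yields the empty string from the section element and never reaches the "en" on the root.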

]] 
If no explicit language is given for any ancestors of the node, including the root element, but there is a pragma-set default language set, then that is the language of the node.
[[

Comment: If the @lang attribute is set to the empty string, does this then count as "no explicit language is given"? Or does it mean that an explicit "unknown language" has been set? (I suggest that it should be the latter.)

]]
If there is no pragma-set default language, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language. In the absence of any language information, and in cases where the higher-level protocol reports multiple languages, the language of the node is unknown (the empty string).
[[

Please make clear that the pragma-set language and/or the higher-level protocol MUST NOT be used as fallback language whenever the lang="" attribute has been set to the empty string. (Currently, Firefox and Safari violate this.)

Concretely, I suggest saying something like "then the language of the node is equal to unknown (equal to the empty string)" instead of the current "the language of the node is unknown (the empty string)".
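
The fallback order requested above could look roughly like the Python sketch below. It is an illustration, not spec text: language_of_node and its parameters are hypothetical, nearest_lang_attr is assumed to be None when no lang attribute exists on the node or any ancestor, and an empty pragma value is treated as "no pragma", which is an assumption of this sketch.

```python
from typing import Optional, Sequence


def language_of_node(nearest_lang_attr: Optional[str],
                     pragma_language: Optional[str],
                     protocol_languages: Sequence[str]) -> str:
    """Return the language of a node; the empty string means "unknown"."""
    # A lang attribute was found, even an empty one: it decides the result,
    # so lang="" blocks every fallback below.
    if nearest_lang_attr is not None:
        return nearest_lang_attr
    # No lang attribute anywhere: fall back to the pragma-set default.
    if pragma_language:
        return pragma_language
    # Final fallback: the higher-level protocol, but only when it reports
    # exactly one language.
    if len(protocol_languages) == 1:
        return protocol_languages[0]
    # No information, or several protocol languages: unknown.
    return ""
```

For example, language_of_node("", "fr", ["de"]) returns the empty string, whereas language_of_node(None, "fr", ["de"]) returns "fr".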

Test case showing that Mozilla and WebKit wrongly ignore a lang="" set to the empty string, and instead go looking for the pragma and/or the HTTP header:

 http://software.hixie.ch/utilities/js/live-dom-viewer/saved/406
Comment 2 Ian 'Hixie' Hickson 2010-04-02 00:00:58 UTC
*Please file one bug per issue.*

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please file new bugs. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: I've tried to clarify this section. If it is still unclear, please file a new bug for each unclear bit; please do not reopen this bug.
Comment 3 Ian 'Hixie' Hickson 2010-04-02 00:10:29 UTC
http://html5.org/tools/web-apps-tracker?from=4942&to=4943