This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 10661 - use an ISO 639-2 specified language for HTML5 documents
Summary: use an ISO 639-2 specified language for HTML5 documents
Status: RESOLVED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: contributor
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords: a11y
Duplicates: 18816
Depends on:
Blocks:
 
Reported: 2010-09-20 15:36 UTC by Gregory J. Rosmaita
Modified: 2015-01-12 18:46 UTC
CC List: 13 users

Description Gregory J. Rosmaita 2010-09-20 15:36:36 UTC
PROBLEM: not all the components of HTML5 use the correct ISO 639-2 
natural language indicator in the HEAD of HTML5 documents 

for example, the main HTML5 spec uses:

<html lang="en-US-x-Hixie" class="split index">

W3C Technical Report Publication Policy, (Pubrules), suggests "en" or 
"en-us" (nothing personal, fellow anglophones, but W3C documents follow 
"en-us" rules as far as spelling, etc.)

SOLUTION: use either lang="en" or lang="en-us"
Comment 1 Ms2ger 2010-09-21 11:11:55 UTC
I don't understand what the problem is.
Comment 2 Gregory J. Rosmaita 2010-09-21 13:32:14 UTC
(In reply to comment #1)
> I don't understand what the problem is.

currently the HTML5 spec declares lang="en-US-x-Hixie" whereas it should declare a recognized natural language declaration for the document such as:

lang="en"

or 

lang="en-us"

this is essential for proper processing of the natural language contained in the document -- screen reader users, for example, whose first language is not english but who understand english may have their screen reader set to auto-switch natural language based on the natural language declared for that page;
likewise, someone whose first language uses a non-latin alphabet relies on the natural language declaration for the page in order to properly render the page's content

lang="en-US-x-Hixie" may be "cute" but cute has no place in a standard which itself should comply with standards, and the standard for W3C publications is to use either lang="en" or lang="en-us"
Comment 3 Henri Sivonen 2010-09-21 14:30:46 UTC
(In reply to comment #2)> is essential for proper processing of the natural language contained in the> document -- screen reader users,Is this a practical problem? That is, are there screen readers in use that don't properly ignore language subtags they don't know about. If there are, have you filed bugs against those screen readers about implementing RFC 5646 properly?(FWIW, I think using private use language subtags in a standard is questionable, but if software fails to ignore unrecognized subtags and fails to pay attention to the standard subtags (en and US), that's a bigger problem that needs to go in the appropriate bug databases.)
Comment 4 Henri Sivonen 2010-09-21 14:42:55 UTC
Let's try that again with proper line breaks:
(In reply to comment #2)
> is essential for proper processing of the natural language contained in the
> document -- screen reader users,

Is this a practical problem? That is, are there screen readers in use that don't properly ignore language subtags they don't know about. If there are, have you filed bugs against those screen readers about implementing RFC 5646 properly?

(FWIW, I think using private use language subtags in public is questionable, but if software fails to ignore unrecognized subtags and fails to pay attention to the standard subtags (en and US), that's a bigger problem that needs to go in the appropriate bug databases.)
Comment 5 Leif Halvard Silli 2010-09-21 16:44:40 UTC
(In reply to comment #4)

> (FWIW, I think using private use language subtags in public is questionable,
> but if software fails to ignore unrecognized subtags and fails to pay attention
> to the standard subtags (en and US), that's a bigger problem that needs to go
> in the appropriate bug databases.)


Where is it defined that user agents must ignore the 'x-*' part? I don't think that anyone can infer what 'en-us-x-hixie' means. There is no semantic difference between 'en-us-x-hixie' and 'en-us-x-myscript' or 'en-us-x-my-invented-orthogra-phy'. It would perhaps be smart of user agents to - by default - ignore the '-x-whatever' part. But I don't think it is said anywhere that they should.

> Is this a practical problem? That is, are there screen readers in use that
> don't properly ignore language subtags they don't know about. If there are,
> have you filed bugs against those screen readers about implementing RFC 5646
> properly?

First one needs to know what the proper treatment of such tags is. From BCP47:

]]
2.2.  Language Subtag Sources and Interpretation
  
   …

   o  The single-letter subtag 'x' introduces a sequence of private use
      subtags.  The interpretation of any private use subtag is defined
      solely by private agreement and is not defined by the rules in
      this section or in any standard or registry defined in this
      document.
[[
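
To make the quoted rule concrete, here is a minimal sketch of separating the standard part of a tag from its private use part. The function name split_private_use is hypothetical, and the sketch assumes a well-formed tag, in which a single-letter 'x' subtag can only be the private use singleton. It shows only where the boundary lies; it does not settle whether a user agent should ignore what follows.

```python
from typing import List, Tuple


def split_private_use(tag: str) -> Tuple[str, List[str]]:
    """Split a language tag at the 'x' singleton (hypothetical helper).

    Returns the part before 'x' and the list of private use subtags.
    Assumes the tag is well-formed; no validation is attempted.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]  # tags are case-insensitive
    if "x" in lowered:
        i = lowered.index("x")
        return "-".join(subtags[:i]), subtags[i + 1:]
    return tag, []


standard, private = split_private_use("en-US-x-Hixie")
print(standard, private)
```

For the tag under discussion, the standard part is "en-US" and the only private use subtag is "Hixie", whose meaning is left to private agreement.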
Comment 6 Ian 'Hixie' Hickson 2010-09-28 06:51:08 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: Please obtain a sense of humour and then try again.
Comment 7 Gregory J. Rosmaita 2010-09-28 11:13:15 UTC
(In reply to comment #6)
> Status: Rejected
> Change Description: no spec change
> Rationale: Please obtain a sense of humour and then try again.

i once had a sense of humor -- then i started working on HTML5... regardless of humor, the natural language definition of the w3c document should be either "en" or "en-us"
Comment 8 Ian 'Hixie' Hickson 2010-09-28 18:01:04 UTC

Status: Rejected
Change Description: no spec change
Rationale: No new information added since the last resolution.
Comment 9 Ms2ger 2010-09-30 17:08:49 UTC
Still no new information added since the last resolution.
Comment 10 Gregory J. Rosmaita 2010-09-30 17:36:18 UTC
(In reply to comment #9)
> Still no new information added since the last resolution.

in the w3c manual of style, it is written:

in section 11.2 "Spelling" [http://www.w3.org/2001/06/manual/#Parts]
QUOTE
# Spell-check using a U.S. English dictionary. Append ",spell" to a
W3C URI to invoke W3C's spell checker.
# Free dictionaries are also available on the Ispell home page [ISPELL]
for UNIX and the Excalibur home page [EXCAL] for Mac OS.
# W3C uses Merriam-Webster's Collegiate® Dictionary, 10th Edition [M-W],
on the Web as the spelling arbiter because it is free, on-line, and
available to every technical report author and editor. If a word does
not appear there, use the American Heritage® Dictionary, 4th Edition
[AH]. Other dictionaries are used as needed (for example, Random House
and Webster's unabridged, Oxford and Oxford Concise).
# W3C uses U.S. English (e.g., "standardise" should read "standardize"
and "behaviour" should read "behavior").
UNQUOTE

and in section 11.9 "Markup" [http://www.w3.org/2001/06/manual/#Markup]

QUOTE
* Give each page lang="en-US" on the html element for HTML, or
xml:lang="en-US" lang="en-US" on the html element for XHTML 1.0.
* Use the span element and lang and xml:lang attributes for
language changes within a page.
UNQUOTE

yes, the style guide contains recommendations, not requirements, but i do 
find it compelling to this discussion, though, that the document simply 
states, "W3C uses U.S. English"
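
The markup recommendation quoted above can be checked mechanically. The following is a rough sketch using Python's standard html.parser module; the names HtmlLangFinder, declared_language and matches_style_guide are hypothetical, and the comparison assumes that language tags are matched case-insensitively.

```python
from html.parser import HTMLParser
from typing import Optional


class HtmlLangFinder(HTMLParser):
    """Records the lang attribute of the first html start tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def declared_language(markup: str) -> Optional[str]:
    """Return the lang value declared on the html element, if any."""
    finder = HtmlLangFinder()
    finder.feed(markup)
    return finder.lang


def matches_style_guide(lang: Optional[str]) -> bool:
    """True if the value is exactly en-US, ignoring case (assumption)."""
    return lang is not None and lang.lower() == "en-us"


lang = declared_language('<html lang="en-US-x-Hixie" class="split index">')
print(lang, matches_style_guide(lang))
```

Under this strict comparison the spec's current tag does not match the style guide's wording, even though its standard subtags are en and US.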
Comment 11 David Singer 2010-09-30 20:35:37 UTC
Minor comments.

1) the title of the bug is wrong.  en-us-x-hixie does use an ISO 639 (not even 639-2) language code ('en'). It's actually a request to use a BCP-47 language tag without private use subtags.

2) the private use subtags should not detract from the meaning of the standard prefix. systems that 'get confused' by that subtag have a bug.  if they are not aware of a private agreement assigning meaning to that subtag, then they can (and should) ignore it.

3) en-us-x-hixie is en-us.  The private use subtag tells you *more*, and its presence cannot remove information from the standard use subtags.

4) little jokes are fine.  if they cause problems, it's also fine to make one and drop it.  we don't need heat on either side.
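
As an illustration of points 2 and 3, here is a sketch of a fallback in the style of the RFC 4647 "lookup" scheme, which progressively truncates a tag from the end until something matches. The function name lookup and the list of supported voices are assumptions made for the example; this is not a description of how any particular screen reader behaves.

```python
from typing import Iterable, Optional


def lookup(tag: str, supported: Iterable[str]) -> Optional[str]:
    """Find the closest supported tag by truncating from the end.

    Hypothetical helper modelled on RFC 4647 lookup; comparison is
    case-insensitive and the supported spelling is returned.
    """
    available = {s.lower(): s for s in supported}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        subtags.pop()
        # A single-character subtag left at the end (such as 'x')
        # is removed as well before the next comparison.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


voices = ["en-US", "fr"]  # assumed set of installed voices
print(lookup("en-US-x-Hixie", voices))
```

A consumer that falls back this way treats en-us-x-hixie as en-us when it knows nothing about the private use subtag, which is the behaviour argued for above.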
Comment 12 Joshue O Connor 2010-10-12 15:36:47 UTC

The Bug Triage Sub Team have reviewed this Bug and feel it is not a TF priority, not related to the spec features but a minor accessibility issue of the spec itself. The use of en-us-x-hixie as a language tag seems inappropriate for a formal specification, but not worth TF effort. We suggest Gregory takes the advice to file bugs against user agents that don't handle the language tag as per spec.
Comment 13 Ian 'Hixie' Hickson 2010-11-11 23:01:00 UTC

Status: Rejected
Change Description: no spec change
Rationale: there is no problem here
Comment 14 Edward O'Connor 2015-01-12 18:46:39 UTC
*** Bug 18816 has been marked as a duplicate of this bug. ***