<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>10661</bug_id>
          
          <creation_ts>2010-09-20 15:36:36 +0000</creation_ts>
          <short_desc>use an ISO 639-2 specified language for HTML5 documents</short_desc>
          <delta_ts>2015-01-12 18:46:39 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>pre-LC1 HTML5 spec (editor: Ian Hickson)</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WONTFIX</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>a11y</keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Gregory J. Rosmaita">oedipus</reporter>
          <assigned_to>contributor</assigned_to>
          <cc>faulkner.steve</cc>
    
    <cc>hsivonen</cc>
    
    <cc>ian</cc>
    
    <cc>joshue.oconnor</cc>
    
    <cc>laura.lee.carlson</cc>
    
    <cc>mike</cc>
    
    <cc>Ms2ger</cc>
    
    <cc>oedipus</cc>
    
    <cc>public-html-a11y</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>singer</cc>
    
    <cc>xn--mlform-iua</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>39116</commentid>
    <comment_count>0</comment_count>
    <who name="Gregory J. Rosmaita">oedipus</who>
    <bug_when>2010-09-20 15:36:36 +0000</bug_when>
    <thetext>PROBLEM: not all the components of HTML5 use the correct ISO 639-2 
natural language indicator on the html element of HTML5 documents

for example, the main HTML5 spec uses:

&lt;html lang=&quot;en-US-x-Hixie&quot; class=&quot;split index&quot;&gt;

The W3C Technical Report Publication Policy (Pubrules) suggests &quot;en&quot; or 
&quot;en-us&quot; (nothing personal, fellow anglophones, but W3C documents follow 
&quot;en-us&quot; conventions for spelling, etc.)

SOLUTION: use either lang=&quot;en&quot; or lang=&quot;en-us&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39130</commentid>
    <comment_count>1</comment_count>
    <who name="Ms2ger">Ms2ger</who>
    <bug_when>2010-09-21 11:11:55 +0000</bug_when>
    <thetext>I don&apos;t understand what the problem is.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39131</commentid>
    <comment_count>2</comment_count>
    <who name="Gregory J. Rosmaita">oedipus</who>
    <bug_when>2010-09-21 13:32:14 +0000</bug_when>
    <thetext>(In reply to comment #1)
&gt; I don&apos;t understand what the problem is.

currently the HTML5 spec declares lang=&quot;en-US-x-Hixie&quot;, whereas it should use a recognized natural language declaration for the document, such as:

lang=&quot;en&quot;

or 

lang=&quot;en-us&quot;

a recognized declaration is essential for proper processing of the natural language contained in the document -- screen reader users, for example, whose first language is not 
english but who understand english, may have their screen reader set to auto-switch natural language based on the natural language declared for that page;
likewise, someone whose first language uses a non-latin alphabet relies on the natural language declaration for the page in order to properly render the page&apos;s content 

lang=&quot;en-US-x-Hixie&quot; may be &quot;cute&quot; but cute has no place in a standard which itself should comply with standards, and the standard for W3C publications is to use either lang=&quot;en&quot; or lang=&quot;en-us&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39133</commentid>
    <comment_count>3</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2010-09-21 14:30:46 +0000</bug_when>
    <thetext>(In reply to comment #2)&gt; is essential for proper processing of the natural language contained in the&gt; document -- screen reader users,Is this a practical problem? That is, are there screen readers in use that don&apos;t properly ignore language subtags they don&apos;t know about. If there are, have you filed bugs against those screen readers about implementing RFC 5646 properly?(FWIW, I think using private use language subtags in a standard is questionable, but if software fails to ignore unrecognized subtags and fails to pay attention to the standard subtags (en and US), that&apos;s a bigger problem that needs to go in the appropriate bug databases.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39135</commentid>
    <comment_count>4</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2010-09-21 14:42:55 +0000</bug_when>
    <thetext>Let&apos;s try that again with proper line breaks:
(In reply to comment #2)
&gt; is essential for proper processing of the natural language contained in the
&gt; document -- screen reader users,

Is this a practical problem? That is, are there screen readers in use that don&apos;t properly ignore language subtags they don&apos;t know about? If there are, have you filed bugs against those screen readers about implementing RFC 5646 properly?

(FWIW, I think using private use language subtags in public is questionable, but if software fails to ignore unrecognized subtags and fails to pay attention to the standard subtags (en and US), that&apos;s a bigger problem that needs to go in the appropriate bug databases.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39154</commentid>
    <comment_count>5</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2010-09-21 16:44:40 +0000</bug_when>
    <thetext>(In reply to comment #4)

&gt; (FWIW, I think using private use language subtags in public is questionable,
&gt; but if software fails to ignore unrecognized subtags and fails to pay attention
&gt; to the standard subtags (en and US), that&apos;s a bigger problem that needs to go
&gt; in the appropriate bug databases.)


Where is it defined that user agents must ignore the &apos;x-*&apos; part? I don&apos;t think that anyone can infer what &apos;en-us-x-hixie&apos; means. There is no semantic difference between &apos;en-us-x-hixie&apos; and &apos;en-us-x-myscript&apos; or &apos;en-us-x-my-invented-orthogra-phy&apos;. It would perhaps be smart of user agents to ignore the &apos;-x-whatever&apos; part by default. But I don&apos;t think it is said anywhere that they should.

&gt; Is this a practical problem? That is, are there screen readers in use that
&gt; don&apos;t properly ignore language subtags they don&apos;t know about. If there are,
&gt; have you filed bugs against those screen readers about implementing RFC 5646
&gt; properly?

First, one needs to know what the proper treatment of such tags is. From BCP47:

]]
2.2.  Language Subtag Sources and Interpretation
  
   …

   o  The single-letter subtag &apos;x&apos; introduces a sequence of private use
      subtags.  The interpretation of any private use subtag is defined
      solely by private agreement and is not defined by the rules in
      this section or in any standard or registry defined in this
      document.
[[</thetext>
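A minimal Python sketch of the syntax quoted above, assuming only two RFC 5646 rules: the "x" singleton starts the private use sequence, and each private use subtag is 1 to 8 ASCII letters or digits. The function name split_private_use is hypothetical and not taken from BCP 47 or any user agent. It only separates the private use subtags so that a consumer with no private agreement can set them aside; it does not validate the rest of the tag.

```python
import re
from typing import List, Tuple

# Assumption: private use subtags are 1 to 8 ASCII letters or digits (RFC 5646 grammar).
PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def split_private_use(tag: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: split a BCP 47 tag at the "x" singleton.

    Returns the part before the private use sequence and the list of
    private use subtags (empty if there is none). Raises ValueError if
    the private use sequence is empty or contains a malformed subtag.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]  # subtags are case-insensitive
    if "x" not in lowered:
        return tag, []
    i = lowered.index("x")
    private = subtags[i + 1:]
    if not private or not all(PRIVATE_SUBTAG.fullmatch(s) for s in private):
        raise ValueError(f"malformed private use sequence in {tag!r}")
    return "-".join(subtags[:i]), private
```

Because a private use subtag may not exceed eight characters, a longer word such as "orthography" has to be split across two subtags, as in the "-x-my-invented-orthogra-phy" example above.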
  </long_desc><long_desc isprivate="0" >
    <commentid>39668</commentid>
    <comment_count>6</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-28 06:51:08 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: Please obtain a sense of humour and then try again.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39715</commentid>
    <comment_count>7</comment_count>
    <who name="Gregory J. Rosmaita">oedipus</who>
    <bug_when>2010-09-28 11:13:15 +0000</bug_when>
    <thetext>(In reply to comment #6)
&gt; Status: Rejected
&gt; Change Description: no spec change
&gt; Rationale: Please obtain a sense of humour and then try again.

i once had a sense of humor -- then i started working on HTML5... regardless of humor, the natural language definition of the w3c document should be either &quot;en&quot; or &quot;en-us&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39768</commentid>
    <comment_count>8</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-28 18:01:04 +0000</bug_when>
    <thetext>Status: Rejected
Change Description: no spec change
Rationale: No new information added since the last resolution.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40214</commentid>
    <comment_count>9</comment_count>
    <who name="Ms2ger">Ms2ger</who>
    <bug_when>2010-09-30 17:08:49 +0000</bug_when>
    <thetext>Still no new information added since the last resolution.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40225</commentid>
    <comment_count>10</comment_count>
    <who name="Gregory J. Rosmaita">oedipus</who>
    <bug_when>2010-09-30 17:36:18 +0000</bug_when>
    <thetext>(In reply to comment #9)
&gt; Still no new information added since the last resolution.

in the w3c manual of style, it is written:

in section 11.2 &quot;Spelling&quot; [http://www.w3.org/2001/06/manual/#Parts]
QUOTE
# Spell-check using a U.S. English dictionary. Append &quot;,spell&quot; to a
W3C URI to invoke W3C&apos;s spell checker.
# Free dictionaries are also available on the Ispell home page [ISPELL]
for UNIX and the Excalibur home page [EXCAL] for Mac OS.
# W3C uses Merriam-Webster&apos;s Collegiate® Dictionary, 10th Edition [M-W],
on the Web as the spelling arbiter because it is free, on-line, and
available to every technical report author and editor. If a word does
not appear there, use the American Heritage® Dictionary, 4th Edition
[AH]. Other dictionaries are used as needed (for example, Random House
and Webster&apos;s unabridged, Oxford and Oxford Concise).
# W3C uses U.S. English (e.g., &quot;standardise&quot; should read &quot;standardize&quot;
and &quot;behaviour&quot; should read &quot;behavior&quot;).
UNQUOTE

and in section 11.9 &quot;Markup&quot; [http://www.w3.org/2001/06/manual/#Markup]

QUOTE
* Give each page lang=&quot;en-US&quot; on the html element for HTML, or
xml:lang=&quot;en-US&quot; lang=&quot;en-US&quot; on the html element for XHTML 1.0.
* Use the span element and lang and xml:lang attributes for
language changes within a page.
UNQUOTE

yes, the style guide contains recommendations, not requirements, but i do 
find it compelling, for the purposes of this discussion, that the document simply 
states, &quot;W3C uses U.S. English&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40292</commentid>
    <comment_count>11</comment_count>
    <who name="David Singer">singer</who>
    <bug_when>2010-09-30 20:35:37 +0000</bug_when>
    <thetext>Minor comments.

1) the title of the bug is wrong.  en-us-x-hixie does use an ISO 639 (not even 639-2) language code (&apos;en&apos;). It&apos;s actually a request to use a BCP-47 language tag without private use subtags.

2) the private use subtags should not detract from the meaning of the standard prefix. systems that &apos;get confused&apos; by that subtag have a bug.  if they are not aware of a private agreement assigning meaning to that subtag, then they can (and should) ignore it.

3) en-us-x-hixie is en-us.  The private use subtag tells you *more*, and its presence cannot remove information from the standard use subtags.

4) little jokes are fine.  if they cause problems, it&apos;s also fine to make one and drop it.  we don&apos;t need heat on either side.</thetext>
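The point that en-us-x-hixie is en-us can be illustrated with the Lookup fallback from RFC 4647 (section 3.4), which progressively truncates a tag from the right and removes a trailing singleton such as "x" together with the subtag that follows it. The Python sketch below treats the document's lang value as the language range and a hypothetical list of available voices or resources as the candidate tags. It is a simplified illustration under those assumptions (no default fallback, no wildcard handling), not the behavior of any particular screen reader.

```python
from typing import Iterable, Optional


def lookup(tag: str, available: Iterable[str]) -> Optional[str]:
    """Simplified RFC 4647 Lookup: truncate the tag from the right until it
    matches one of the available tags, comparing case-insensitively."""
    # BCP 47 tags are case-insensitive, so index the candidates by lowercase form.
    known = {t.lower(): t for t in available}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in known:
            return known[candidate]
        subtags.pop()  # drop the rightmost subtag
        # A singleton such as "x" cannot end a tag, so drop it as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None  # no match; a real consumer would apply its own default here
```

With available tags ["en-US", "fr"] the sketch returns "en-US" for "en-US-x-Hixie", and with ["en", "de"] it returns "en", so a consumer following this fallback still recovers the standard subtags.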
  </long_desc><long_desc isprivate="0" >
    <commentid>41121</commentid>
    <comment_count>12</comment_count>
    <who name="Joshue O Connor">joshue.oconnor</who>
    <bug_when>2010-10-12 15:36:47 +0000</bug_when>
    <thetext>

The Bug Triage Sub Team have reviewed this Bug and feel it is not a TF priority: it is not related to the spec&apos;s features but is a minor accessibility issue of the spec document itself. The use of en-us-x-hixie as a language tag seems inappropriate for a formal specification, but not worth TF effort. We suggest Gregory take the advice to file bugs against user agents that don&apos;t handle the language tag as per spec.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>42465</commentid>
    <comment_count>13</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-11-11 23:01:00 +0000</bug_when>
    <thetext>Status: Rejected
Change Description: no spec change
Rationale: there is no problem here</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>117030</commentid>
    <comment_count>14</comment_count>
    <who name="Edward O&apos;Connor">eoconnor</who>
    <bug_when>2015-01-12 18:46:39 +0000</bug_when>
    <thetext>*** Bug 18816 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>