<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>26951</bug_id>
          
          <creation_ts>2014-10-02 09:06:28 +0000</creation_ts>
          <short_desc>why do these examples of &lt;html&gt; lack the lang attribute?</short_desc>
          <delta_ts>2014-10-02 13:17:30 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>HTML5 spec</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>https://html.spec.whatwg.org/#structure-of-this-specification</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          <dependson>26942</dependson>
          
          <everconfirmed>1</everconfirmed>
          <reporter name="steve faulkner">faulkner.steve</reporter>
          <assigned_to name="steve faulkner">faulkner.steve</assigned_to>
          <cc>contributor</cc>
    
    <cc>ian</cc>
    
    <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>zcorpan</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>112561</commentid>
    <comment_count>0</comment_count>
    <who name="steve faulkner">faulkner.steve</who>
    <bug_when>2014-10-02 09:06:28 +0000</bug_when>
    <thetext>+++ This bug was initially created as a clone of Bug #26942 +++

Specification: https://html.spec.whatwg.org/multipage/introduction.html
Multipage: https://html.spec.whatwg.org/multipage/#structure-of-this-specification
Complete: https://html.spec.whatwg.org/#structure-of-this-specification
Referrer: https://html.spec.whatwg.org/multipage/

Comment:
why do these examples of &lt;html&gt; lack the lang attribute?

Posted from: 24.22.56.84
User agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112562</commentid>
    <comment_count>1</comment_count>
    <who name="steve faulkner">faulkner.steve</who>
    <bug_when>2014-10-02 09:37:43 +0000</bug_when>
    <thetext>Regardless of how many people do it, it's best practice, and it's useful for user agents such as AT (assistive technology) that use the lang attribute to load the correct pronunciation dictionaries for a page. Thanks for calling this out; easily fixed. https://github.com/w3c/html/commit/fd501aa4b6167338bd994609e89d267fd2f1b422

A grep of data from 2013 indicates that lang use is widespread: https://docs.google.com/spreadsheet/ccc?key=0AlVP5_A996c5dENJVkl4ZngxS0ZTZHVvbHdQYWQ2Zmc&amp;usp=sharing</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112563</commentid>
    <comment_count>2</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2014-10-02 10:07:10 +0000</bug_when>
    <thetext>It doesn&apos;t tell you how often lang is used correctly. lang=&quot;en&quot; in particular is often used on non-English pages due to copy/paste from &quot;best practice&quot; examples...</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112565</commentid>
    <comment_count>3</comment_count>
    <who name="steve faulkner">faulkner.steve</who>
    <bug_when>2014-10-02 10:12:03 +0000</bug_when>
    <thetext>(In reply to Simon Pieters from comment #2)
&gt; It doesn&apos;t tell you how often lang is used correctly. lang=&quot;en&quot; in
&gt; particular is often used on non-English pages due to copy/paste from &quot;best
&gt; practice&quot; examples...

I am in the process of looking at the data to check usage; will update and add advice as appropriate.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112574</commentid>
    <comment_count>4</comment_count>
    <who name="steve faulkner">faulkner.steve</who>
    <bug_when>2014-10-02 12:37:16 +0000</bug_when>
    <thetext>(In reply to Simon Pieters from comment #2)
&gt; It doesn&apos;t tell you how often lang is used correctly. lang=&quot;en&quot; in
&gt; particular is often used on non-English pages due to copy/paste from &quot;best
&gt; practice&quot; examples...

So I did some digging in the latest available data from webdevdata (around 100,000 pages) and found that approx. 1 in 3 pages (33,000) had at least one lang attribute. I manually perused the code of approx. 100 of those, looking at how it was used, and found that in approx. 95%+ of them the lang attribute correctly reflected the language of the page.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112575</commentid>
    <comment_count>5</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2014-10-02 12:42:02 +0000</bug_when>
    <thetext>How many of those were non-English content?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112576</commentid>
    <comment_count>6</comment_count>
    <who name="steve faulkner">faulkner.steve</who>
    <bug_when>2014-10-02 12:42:44 +0000</bug_when>
    <thetext>(In reply to Simon Pieters from comment #5)
&gt; How many of those were non-English content?

Approx. a third.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112578</commentid>
    <comment_count>7</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2014-10-02 12:58:45 +0000</bug_when>
    <thetext>So on github...

https://github.com/search?l=html&amp;q=%28búsqueda+OR+nombre+OR+contraseña%29&amp;ref=searchresults&amp;type=Code&amp;utf8=✓

4,341,523 Spanish HTML pages with &lt;html lang...&gt;

https://github.com/search?l=html&amp;q=&quot;html+lang+en&quot;+%28búsqueda+OR+nombre+OR+contraseña%29&amp;ref=searchresults&amp;type=Code&amp;utf8=✓

4,142,691 of those (95%) specify &lt;html lang=en&gt;

https://github.com/search?l=html&amp;q=&quot;html+lang+es&quot;+%28búsqueda+OR+nombre+OR+contraseña%29&amp;ref=searchresults&amp;type=Code&amp;utf8=✓

87,594 (2%) specify &lt;html lang=es&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112579</commentid>
    <comment_count>8</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2014-10-02 13:13:08 +0000</bug_when>
    <thetext>(In reply to Simon Pieters from comment #7)
&gt; So on github...
&gt; 
&gt; https://github.com/search?l=html&amp;q=%28búsqueda+OR+nombre+OR+contraseña%29&amp;ref=searchresults&amp;type=Code&amp;utf8=✓

Sorry, wrong link.

https://github.com/search?l=html&amp;q=&quot;html+lang&quot;+%28búsqueda+OR+nombre+OR+contraseña%29&amp;ref=searchresults&amp;type=Code&amp;utf8=✓</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112580</commentid>
    <comment_count>9</comment_count>
    <who name="steve faulkner">faulkner.steve</who>
    <bug_when>2014-10-02 13:17:30 +0000</bug_when>
    <thetext>(In reply to Simon Pieters from comment #7)
&gt; So on github...
&gt; 
&gt; https://github.com/search?l=html&amp;q=%28búsqueda+OR+nombre+OR+contraseña%29&amp;ref=searchresults&amp;type=Code&amp;utf8=✓
&gt; 
&gt; 4,341,523 Spanish HTML pages with &lt;html lang...&gt;
&gt; 
&gt; https://github.com/search?l=html&amp;q=&quot;html+lang+en&quot;+%28búsqueda+OR+nombre+OR+contraseña%29&amp;ref=searchresults&amp;type=Code&amp;utf8=✓
&gt; 
&gt; 4,142,691 of those (95%) specify &lt;html lang=en&gt;
&gt; 
&gt; https://github.com/search?l=html&amp;q=&quot;html+lang+es&quot;+%28búsqueda+OR+nombre+OR+contraseña%29&amp;ref=searchresults&amp;type=Code&amp;utf8=✓
&gt; 
&gt; 87,594 (2%) specify &lt;html lang=es&gt;

I am sure you can find all sorts of cruft on GitHub. I think it's more worthwhile to look at published pages actually used by the masses, rather than GitHub files generally only used/viewed by the person who created them.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>