[Bug 16166] i18n-ISSUE-138: Make lang and xml:lang synonyms in HTML5

https://www.w3.org/Bugs/Public/show_bug.cgi?id=16166

--- Comment #1 from Simon Pieters <simonp@opera.com> 2012-03-01 08:28:04 UTC ---
I ran a search on the dotnetdotcom.org data.

$ grep -aPo "<[^>]+xml:lang[^>]+>" web200904 > xmllang.txt

I removed all line breaks in xmllang.txt and then replaced all ">" with ">\n".
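
The normalization step described above can be sketched in Python. This is only an illustration of the described transformation, not the tool actually used; the function name one_tag_per_line and the sample string are hypothetical.

```python
def one_tag_per_line(text):
    """Join all lines, then start a new line after every '>'."""
    # Dropping line breaks also fuses a tag name with an attribute
    # that originally started on the following line.
    joined = text.replace('\r', '').replace('\n', '')
    return joined.replace('>', '>\n')

# Hypothetical input: the first tag is split across two lines.
sample = '<html\nxml:lang="en"><p xml:lang="fr">'
print(one_tag_per_line(sample))
```

Because line breaks are removed rather than replaced with a space, a tag written across several lines ends up with its name glued to the next attribute, which may explain entries such as htmlxml:lang="fr" in the tally further down.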

68202 tags have xml:lang (but potentially also lang).

I then ran this Python script to filter out lines that also have a lang attribute:

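#!/usr/bin/python
import re

# Copy every tag that lacks a plain lang attribute into onlyxmllang.txt.
# Mode 'w' so that re-running the script does not append duplicates.
with open('xmllang.txt', 'r') as f, open('onlyxmllang.txt', 'w') as o:
    for line in f:
        # The \s before "lang" keeps "xml:lang=" itself from matching.
        if re.search(r'\slang\s?=', line):
            continue
        o.write(line)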
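
To make the filter's behaviour concrete, here is a small sketch that applies the same regular expression to a few hand-written tags. The helper name has_plain_lang and the example tags are assumptions made for illustration; they are not taken from the dotnetdotcom.org data.

```python
import re

# Same pattern as in the filter script above.
LANG_ATTR = re.compile(r'\slang\s?=')

def has_plain_lang(tag):
    """Return True if the pattern finds something that looks like lang=."""
    return LANG_ATTR.search(tag) is not None

# Hypothetical tags covering the interesting cases.
examples = [
    '<html xml:lang="en">',                     # xml:lang only
    '<html xml:lang="en" lang="en">',           # both attributes
    '<html xml:lang="en" LANG="en">',           # uppercase attribute name
    '<p xml:lang="en" title="set lang = en">',  # "lang =" inside a value
]
for tag in examples:
    print(has_plain_lang(tag), tag)
```

The pattern is case-sensitive and does not parse attributes, so an uppercase LANG is treated as missing and text inside an attribute value can count as a match. The counts should therefore be read as approximate.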


10245 tags have xml:lang but not lang. What are those tags?

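#!/usr/bin/python
import re

# Count how often each tag name occurs in onlyxmllang.txt.
tags = {}
with open('onlyxmllang.txt', 'r') as f:
    for line in f:
        m = re.match(r'<([^\s]+)', line)
        if m is None:
            continue  # skip blank or malformed lines instead of crashing
        tag = m.group(1)
        tags[tag] = tags.get(tag, 0) + 1
# Mode 'w' so that re-running the script does not append duplicates.
with open('onlyxmllangtags.txt', 'w') as o:
    for tag in tags:
        o.write(tag + ': ' + str(tags[tag]) + '\n')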


feed: 5
rdf:RDFxmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"xmlns:dc="http://purl.org/dc/elements/1.1/"xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"xmlns="http://purl.org/rss/1.0/"xmlns:foaf="http://xmlns.com/foaf/0.1/"xmlns:content="http://purl.org/rss/1.0/modules/content/"xml:lang="ja">: 1
!--: 5
h2: 16
h3: 1
dc:title: 5
blink: 1
meta: 190
htmlxmlns=: 6
rdf:li: 5
dc:publisher: 4
!DOCTYPE: 4
dc:subject: 2
span: 163
img: 24
caption: 1
li: 27
content: 5
",: 2
HTML: 59
th: 1
xs:documentation: 811
input: 5
!--<rdf:RDF: 10
Segment: 27
dcterms:isPartOf: 4
body: 93
rdf:RDF: 7
head: 5
acronym: 35
?php++require_once: 1
td: 16
link: 17
abbr: 90
address: 1
em: 3
strong: 1
table: 1
!--<html: 1
rss: 1
a: 105
i: 2
title: 1
html: 7965
summary: 1
htmlxml:lang="fr": 3
p: 430
META: 24
div: 58
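
The tally above is in arbitrary dictionary order. A minimal sketch of the same count using collections.Counter, which can list the most frequent tags first, follows. The function name count_tag_names and the three sample lines are hypothetical and stand in for the contents of onlyxmllang.txt.

```python
import re
from collections import Counter

def count_tag_names(lines):
    """Count the leading tag name of each line, skipping non-matching lines."""
    counts = Counter()
    for line in lines:
        m = re.match(r'<(\S+)', line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Hypothetical stand-in for the lines of onlyxmllang.txt.
sample = [
    '<html xml:lang="en">\n',
    '<p xml:lang="fr">\n',
    '<html xml:lang="de">\n',
]
# most_common() yields (tag, count) pairs, highest count first.
for tag, n in count_tag_names(sample).most_common():
    print('%s: %d' % (tag, n))
```

Tag names are still compared case-sensitively here, so html and HTML would be counted separately, as they are in the list above.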

Received on Thursday, 1 March 2012 08:28:08 UTC