Web authoring statistics by Google

Hello everyone,

I came across an interesting statistical analysis of web pages by Google:
http://code.google.com/webstats/index.html

Google has analyzed the elements and attributes used on websites they index.
Some of the results might be interesting for the working group and can be
used as input for the techniques. I'll give a few examples:

* Longdesc isn't in the top 10 most used attributes for IMG. Even ISMAP is
used more often. ALT is used in only about 75% of cases.
* The BR element is used more than the P element, although semantically a P
should be far more common than a BR. Perhaps we can include a common
failure: using BRs instead of Ps to separate paragraphs. 
* TH is used in only a fraction of the tables out there. This could mean
most tables are layout tables or that people don't know how to properly
mark up data tables.
* Elements like FONT and B are still very popular, as are deprecated
presentational attributes such as bgcolor. In contrast, class and id
don't seem to get used much. To me, this indicates that a lot of websites
still control the presentation from HTML instead of CSS.
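
For anyone who wants to check their own pages for the patterns listed above,
here is a minimal sketch using only Python's standard html.parser module. It
is not how Google produced its numbers; the class name MarkupStats and the
sample markup are made up for illustration. It counts start tags, the
attributes used on IMG, and tables that contain no TH.

```python
from collections import Counter
from html.parser import HTMLParser


class MarkupStats(HTMLParser):
    """Hypothetical helper: tally a few markup patterns in one document."""

    def __init__(self):
        super().__init__()
        self.elements = Counter()    # start tags seen, keyed by lowercase name
        self.img_attrs = Counter()   # attribute names seen on IMG elements
        self.tables_without_th = 0   # closed tables that never contained a TH
        self._table_has_th = []      # one flag per currently open TABLE

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        self.elements[tag] += 1
        if tag == "img":
            # Count each attribute once per element.
            self.img_attrs.update({name for name, _ in attrs})
        elif tag == "table":
            self._table_has_th.append(False)
        elif tag == "th" and self._table_has_th:
            self._table_has_th[-1] = True

    def handle_endtag(self, tag):
        if tag == "table" and self._table_has_th:
            if not self._table_has_th.pop():
                self.tables_without_th += 1


# Made-up sample: a layout table, BRs used as paragraph breaks,
# one IMG without ALT, and one real data table with a TH.
sample = """
<table><tr><td>
<font size="2"><b>News</b></font><br><br>
First paragraph<br><br>Second paragraph
<img src="a.gif"><img src="b.gif" alt="Logo">
</td></tr></table>
<table><tr><th>Year</th></tr><tr><td>2006</td></tr></table>
<p>Proper paragraph</p>
"""

stats = MarkupStats()
stats.feed(sample)
stats.close()

print("BR vs P:", stats.elements["br"], "vs", stats.elements["p"])
print("IMG with ALT:", stats.img_attrs["alt"], "of", stats.elements["img"])
print("Tables without TH:", stats.tables_without_th)
```

A table without TH is only a hint, not proof, that it is a layout table, so
the counts are a starting point for manual review rather than a verdict.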

Yvette Hoitink
Heritas, Alphen aan den Rijn, the Netherlands
E-mail: y.p.hoitink@heritas.nl
WWW: http://www.heritas.nl 

Received on Saturday, 28 January 2006 13:50:59 UTC