“Ciel ! Ma page invalide” or how to be caught!

We often recommend that Web developers create good Web pages and follow Web standards. But do we live up to our own quality criteria? How much do we eat our own dog food? So we recently ran the Log Validator script on the whole QA Web site, and we found out that we were invalid! Time to apply our own little method: improve the quality of the Web site step by step.

Results for module basic

Here are the 630 most popular documents overall for the W3C QA Space.

Results for module HTMLValidator

Here are the 4 most popular invalid document(s) that I could find in the logs for the W3C QA Space.

Rank Hits #Error(s) Address
20 42 18 http://www.w3.org/QA/2006/01/failed_commitments.html
63 15 3 http://www.w3.org/QA/WG/2004/06/QAH-issues
86 9 1 http://www.w3.org/QA/Tools/LogValidator/Manual-Modules

Conclusion: You asked for 100 invalid HTML documents but I could only find 4 by processing (all the) 100 document(s) in your logs. This means that about 4% of your most popular documents were invalid. NOTE: I stopped after processing 100 documents: Maybe you could set MaxDocuments to a higher value?

Here are the 49 most popular document(s) with broken links that I could find in the logs for the W3C QA Space.

Rank Hits #Error(s) Address
6 327 4 http://www.w3.org/QA/
9 199 1 http://www.w3.org/QA/2002/04/valid-dtd-list.html
11 110 1 http://www.w3.org/QA/Activity.html
16 68 1 http://www.w3.org/QA/IG/
18 65 1 http://www.w3.org/QA/Library/
22 52 1 http://www.w3.org/QA/2005/08/specgl-errata
24 48 11 http://www.w3.org/QA/TheMatrix
25 47 1 http://www.w3.org/QA/Tools/qa-dev
28 44 1 http://www.w3.org/QA/2006/03/minutes_of_qa_ig_f2f_at_the_w3.html
30 43 1 http://www.w3.org/QA/Tools/LogValidator/
31 43 1 http://www.w3.org/QA/2006/02/buy_standards_compliant_web_si.html
32 42 1 http://www.w3.org/QA/WG/qaframe-primer
33 42 1 http://www.w3.org/QA/2006/02/ruby_annotation_to_change_the.html
34 42 1 http://www.w3.org/QA/2006/01/failed_commitments.html
36 41 1 http://www.w3.org/QA/2006/02/content_negotiation.html
37 41 3 http://www.w3.org/QA/archive/w3cqa_news/technology_101/
38 41 1 http://www.w3.org/QA/2006/03/
40 41 3 http://www.w3.org/QA/2006/02/
41 41 3 http://www.w3.org/QA/archive/technology/css/
42 41 1 http://www.w3.org/QA/WG/
43 41 2 http://www.w3.org/QA/archive/technology/http/
45 40 3 http://www.w3.org/QA/archive/web_spotting/opinions_editorial/
46 39 1 http://www.w3.org/QA/Tips/iso-date
50 37 1 http://www.w3.org/QA/2002/04/Web-Quality
52 36 2 http://www.w3.org/QA/Tips/use-links
53 36 4 http://www.w3.org/QA/archive/web_spotting/tutorials/
54 35 9 http://www.w3.org/QA/archive/w3cqa_news/publications/
55 35 7 http://www.w3.org/QA/archive/w3cqa_news/tools/
56 33 2 http://www.w3.org/QA/2006/01/quality_assurance_interest_gro.html
57 33 5 http://www.w3.org/QA/archive/w3cqa_news/qaig_life/
58 32 4 http://www.w3.org/QA/archive/w3cqa_news/meetings/
60 32 1 http://www.w3.org/QA/WG/2005/01/test-faq
61 32 5 http://www.w3.org/QA/archive/technology/html/
63 28 1 http://www.w3.org/QA/Agenda/
67 25 1 http://www.w3.org/QA/Tools/LogValidator
73 21 1 http://www.w3.org/QA/2002/09/Step-by-step
75 20 1 http://www.w3.org/QA/2002/07/WebAgency-Requirements
77 18 1 http://www.w3.org/QA/2004/08/QAH-charter.html
78 18 1 http://www.w3.org/QA/Activity
80 17 1 http://www.w3.org/QA/IG/charter.html
84 15 1 http://www.w3.org/QA/WG
85 15 1 http://www.w3.org/QA/WG/2004/06/QAH-issues
86 15 1 http://www.w3.org/QA/Tips/uri-manage
87 14 2 http://www.w3.org/QA/2004/08/QAH-qapd-text.html
89 13 1 http://www.w3.org/QA/Library
91 13 11 http://www.w3.org/QA/WG/qawg-issues-html
96 13 1 http://www.w3.org/QA/IG
97 13 1 http://www.w3.org/QA/Tools/qa-dev/
100 12 4 http://www.w3.org/QA/WG/qaframe-spec-extech

Conclusion: I had to check 100 document(s) in order to find 49 HTML documents with broken links. This means that about 49% of your most popular documents need fixing. NOTE: I stopped after processing 100 documents: Maybe you could set MaxDocuments to a higher value?
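Both conclusions above point at the MaxDocuments setting, which lives in the Log Validator's configuration file alongside the choice of processing modules. A minimal sketch of such a file follows; the directive names are taken from the Log Validator manual, but you should verify them against the version you have installed, and the paths and server name here are purely illustrative:

```
# Sketch of a Log Validator configuration file.
# Directive names per the Log Validator manual -- check your installed version.

# Which access log(s) to mine for popular documents
LogFiles /var/log/httpd/access_log

# The site being surveyed (illustrative)
ServerName www.example.org

# Raise this to survey more than the first N popular documents,
# as the report's NOTE suggests
MaxDocuments 500

# Processing modules to run over the most popular documents
UseValidationModule W3C::LogValidator::HTMLValidator
UseValidationModule W3C::LogValidator::LinkChecker
```

With a higher MaxDocuments, the tool keeps walking down the popularity-ranked list instead of stopping at the first 100 documents.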

Next Step? Fixing!

We will run this program again every month and see whether we practice what we preach. When issues prove more difficult to overcome, we will explain the solutions we found and adopted to solve them.

20 thoughts on ““Ciel ! Ma page invalide” or how to be caught!”

  1. “How much do we eat our own dog food?”

    A funny one! Is this translated from a French saying?

    What tool were you using to automatically check those pages? Maybe I’m just missing an important feature of the W3C’s own validator!

    Interesting to see those pages failed validation, although it’s not something to be embarrassed about. The errors were few and far between, thanks to being built upon solid, standards-compliant foundations in the first place. It’s good you shared that even the W3C has some very minor skeletons in the closet!

  2. :D I just got my website to validate, then switched to dreamweaver, and it all got invalid again :( Notepad Rocks!


  3. I see that among the 4 «most popular invalid documents» (you’ve got to love that formulation!) one dates back to 2004/06.

    How far back in time do you think one should go? I mean, I know W3C documents are meant to be kept at the same place for a long time (Cool URIs etc.), but for the rest of the world, what do you think?

  4. Stephane: I would say you should keep a historical page for as long as possible; how long that actually is remains your decision. What you have to think about when deleting a page is what a person will think of you when they receive a 404 error.

    I’m also pretty certain Search Engines appreciate lots of content, so the longer you keep those old pages the better.

    Not sure whether it would be worth making them validate after so long though, that’s something else that would require careful consideration: Advantage Vs. Cost.

  5. Today’s lesson? Always Validate Before You Put It Online.

    Actually, no, it’s Shut The Hell Up And Walk The Talk First, but I will not go into this right now ;-)

    Even in programming, you are told to write a large program step by step, bit by bit, always testing what you already have to make sure it works, before you add more code. Otherwise you’ll find yourself with 2500 lines of code generating 500 errors, and you’ll have no idea where to start fixing them.

    Similarly, the validation process shouldn’t be treated as an afterthought; fixing 2 or 3 errors at a time, 50 times, is definitely less intellectually and psychologically daunting than trying to fix 100 to 150 validation errors in a huge document! In my opinion one should make validation an integral part of the Web development and site design process.

  6. In reply to Tom and “Notepad Rocks!”
    Yes indeed it does, but there is a tag that stops a large number of HTML editors from changing your tags with their so-called auto-correct features, and here it is. Not sure if it will help you with Dreamweaver, but it is worth a try.

    Hope that is of help to someone :)

  7. Mike: there was progress even before the article was posted: the invalid markup was already being fixed. But is it really embarrassing? :) On a site with hundreds, thousands of documents, most of them with markup written by hand, having only a handful of invalid documents isn’t that embarrassing, especially since we took care of them.

    Liam: The tool used to check the whole QA site, as mentioned in the article, is called the Log Validator. There’s a link in the article leading to that software’s page.

    Stephane: The age of a resource matters little, I think. First, as Liam mentioned, keeping old documents around is a situation that benefits everyone (no rogue 404s, search engines notice it and give you “credit” for it…). Deciding whether you want to check these old resources when cleaning up the whole site should not be a question of age, however, but a question of whether these resources are being accessed: it may be that some old resources are extremely popular, more so than recent ones… Which is why the Log Validator works by popularity of resources, not by their age.

  8. I’m sure it’s been considered, but why should validation be something we have to remember or forget? Could it not be automated? Here are some ideas; what do you think:

    1) w3cFTP – Could automatically validate every page you upload?

    2) w3cLive – Not just your ordinary I-met-the-standards-once button, but a script which actually validates a page on each hit, to then determine whether to display the validation gif or not.

    3) w3cAgent – Similar to a broken link checker. A script to be distributed which can periodically check for: broken links, speed testing, xhtml validation, css validation, WAI,… whatever else. Take note of the Web Developer Toolbar for Firefox.
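    Something like the hypothetical w3cAgent above can be sketched in a few lines. The W3C Markup Validator's HTTP interface reports its verdict in an X-W3C-Validator-Status response header (Valid, Invalid or Abort), so a periodic checker only needs one request per page and a look at that header; the page list and the exact endpoint usage below are illustrative, and worth verifying against the current service:

```python
import urllib.parse
import urllib.request

# The W3C Markup Validator's checking endpoint.
VALIDATOR = "https://validator.w3.org/check"

def parse_status(headers):
    """Read the validator's verdict from its response headers.

    The Markup Validator sets X-W3C-Validator-Status to
    'Valid', 'Invalid' or 'Abort'; anything else means the
    verdict could not be determined.
    """
    return headers.get("X-W3C-Validator-Status", "Unknown")

def check_page(url):
    """Ask the validator about one page and return its verdict."""
    query = urllib.parse.urlencode({"uri": url})
    req = urllib.request.Request(f"{VALIDATOR}?{query}", method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return parse_status(resp.headers)
```

    Run from cron over a list of pages, printing or mailing any page whose verdict is not "Valid", this gives a crude but working version of idea 3.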

  9. If only it were just those pages…

    Just think of the whole W3School site. It’s a horror!

    Good luck!

  10. Hello K,

    W3School is a project that has no relation whatsoever to the W3C. This comment is interesting, because I was wondering some time ago whether a few volunteers would like to build a catalogue of all the elements and attributes of W3C technologies (HTML 4.01 included), so as to make a reference for all outside software.

  11. Emil: the “we likes” was more of a private joke than a mistake… But congratulations, you were apparently the first to notice it.

  12. Interesting to see those pages failed validation, It’s good you shared that even the W3C has some – very minor – skeletons in the closet!

  13. indeed it does but there is a tag that suppresses a large number of html editors from changing your tags with their so-called auto correct features and here it is. Not sure if it will help you with Dreamweaver but it is worth a try.

  14. This is really funny: pure code/standards evangelists caught with messy HTML :) I guess it would suffice if Google required clean code in order to be indexed, and the quality of (X)HTML/CSS would improve.

  15. Well, here it is years later, and I notice my comment #2006-05-17 was stripped of the useful information that I gave. Not much to do about that, but thanx Jason #2006-09-04 for reiterating my comment almost word for word. At first I thought it is too bad that comment likewise does not have the tag I was referring to, and now I cannot remember what it was.

    Oh well, you are better off not using Dreamweaver or the like and just sticking to Notepad anyway, and the meta tag was surely not a standard, so spreading it would only make the tag-soup epidemic worse. I do wonder whatever happened with these tests, though, and where the W3C stands with valid pages. This page is still invalid: it has a DOCTYPE of XHTML 1.0 Strict, and the tables lack the summary attribute.

    1. The W3C Web site is not perfect; some pages are invalid. The last time we ran a survey across all the pages, not many were invalid.

      Now there is the question of conformance to the HTML specification or to other standards like WAI, etc. We do not score as high there, I’m pretty sure. We could certainly improve in some areas. Suggested fixes are welcome, when they are offered politely with an example.
