
link test suite

Writing a link checker looks simple enough. There are quite a few of them, including of course the now venerable W3C Link Checker. Ditto for web spiders and indexers. There are so many of them that it has to be a sign there is nothing very complex behind them. Basically, each of those programs just parses HTML documents, finds links, and follows them. Lather, rinse, repeat. No?
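To make the "parse, find links, follow" loop concrete, here is a minimal sketch in Python. It is not how the W3C Link Checker works; the names `LinkExtractor`, `extract_links` and `crawl` are made up for illustration, and the `fetch` callable (returning the HTML text, or `None` when a page cannot be retrieved) is an assumption that keeps the example free of real network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs found in href and src attributes (naive on purpose)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references, then drop the #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html, base_url):
    """Parse one document and return the links it contains."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first lather, rinse, repeat; returns URLs that fetch() could not get."""
    seen = {start_url}
    queue = deque([start_url])
    broken = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        html = fetch(url)
        if html is None:  # assumption: None means "could not be retrieved"
            broken.append(url)
            continue
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return broken


# Usage with an in-memory "web" instead of real HTTP requests.
pages = {
    "http://example.org/": '<a href="a.html">a</a> <a href="/missing">m</a>',
    "http://example.org/a.html": '<a href="/">home</a> <img src="a.html#top">',
}
print(crawl("http://example.org/", pages.get))
```

Even this toy version has to make decisions (what counts as a link, what to do with fragments, when to stop) that a real checker cannot settle so casually.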

Unfortunately, trouble and the devil are always in the details. Quite apart from the fact that parsing HTML in the wild is a challenge in itself, there are tons of ways for such software to fail at its seemingly simple job.

To see a little more clearly through all this, we have started hacking together a "Link Test Suite", with a harness to run it against our W3C link checker, for a start. Join the thread on the public tools hacking list for more information (and some musings on unit testing in Python), or check out the source. As usual, this is all under the W3C open source license, so do use, contribute, hack, or just take the code and run away with it.
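For a rough idea of what such a harness can look like, here is a small sketch using Python's `unittest`. It is not the actual Link Test Suite code: the `CASES` table, the `CaseHandler` server and the `check_status` stand-in for the checker under test are all hypothetical. The idea is simply to serve known responses from a local server and compare what the checker reports with what was expected.

```python
import threading
import unittest
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical test cases: request path -> status the local server answers with.
CASES = {"/ok": 200, "/gone": 410, "/missing": 404}


class CaseHandler(BaseHTTPRequestHandler):
    """Answer each known path with its scripted status code."""

    def do_GET(self):
        self.send_response(CASES.get(self.path, 404))
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the test output quiet


def check_status(url):
    """Stand-in for the checker under test: return the HTTP status for url."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


class LinkSuite(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Port 0 lets the operating system pick a free port.
        cls.server = HTTPServer(("127.0.0.1", 0), CaseHandler)
        cls.base = "http://127.0.0.1:%d" % cls.server.server_port
        threading.Thread(target=cls.server.serve_forever, daemon=True).start()

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()

    def test_cases(self):
        for path, expected in CASES.items():
            with self.subTest(path=path):
                self.assertEqual(check_status(self.base + path), expected)
```

Saved as a module, this can be run with `python -m unittest`. Swapping `check_status` for a call into a real checker is where the interesting failures would start to show up.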

Filed by olivier Théreaux on January 28, 2008 2:54 AM in Tools


This blog is written by W3C staff and working group participants,
 and maintained by Karl Dubost and olivier Thereaux.