After Tim Berners-Lee invented the World Wide Web at CERN, he decided to gather
pointers to the information becoming available through his Web browser
under three categories: by access, by geographical location, and by subject.
The latter became known as the WWW Virtual Library.
History
I continued the project from 1993 on; a major step came that same year, when it became clear that the administration of the WWW Virtual Library had to be distributed among many participants if we were to keep up with the phenomenal growth of the Web.
The originality of the WWW Virtual Library lies in its architecture: it is decentralized, distributed among well over 100 sites, offers several classifications (including one that follows the Library of Congress standards), and grants its maintainers broad editorial independence.
About 200 people from all over the world are now involved in the project,
with one full-time coordinator since January 1996. According
to WebCrawler, as of April 1996 we are the 14th most-referenced URL on
the Web.
Perspectives
The WWW Virtual Library often holds a much larger database on a specific subject than Yahoo: compare Zoos in the Virtual Library (over 200 references) with Yahoo (where I could find only one "pet" zoo).
Now, let's suppose we want to use one of those fancy indexes. Take AltaVista, said to be the most exhaustive index currently in service (I admit it was a friend at DEC who said so), and type "rowing": we get ten thousand references, in almost complete disorder. The last part of the filtering is in fact left to us; we now have to wander through all these documents, which were ranked purely by the occurrence of the keyword we entered.
Or supposedly so: three months ago, I was searching for a program from Washington University called wu-ftpd on MetaCrawler (which queries many different indexes). After I entered "Washington University ftp ftpd site wu-ftpd", I was startled to see that AltaVista returned the Playboy Home Page among its ten listed results! (They have since fixed this particular "feature", whatever it was due to.)
It is fairly easy to trick an index into showing your page most of the time, whatever the reader types as keywords, and even easier for the programmer of the index to slightly alter the results of a search. For now I want to believe this was an unintentional mistake (I hope they didn't find a way to access my subconscious!). Still, as more and more commercial interests enter the World Wide Web, such incidents may at some point become intentional, as well as more subtle. After all, the robots that build the indexes' databases don't ask for permission before storing your files, and one could argue for feeding them extra words so that your document is found more easily. (Next time you search for something on the Web, don't be too surprised if the first document presented to you is the WWW Virtual Library project ;-) .)
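To make the trick concrete, here is a minimal sketch (in Python, with invented sample pages) of the occurrence-only ranking described above, and of how padding a page with a repeated keyword games it:

```python
# Naive index ranking: score each document purely by how many times
# the query keyword occurs in it (the sample pages are invented).
def rank(documents, keyword):
    scores = [(doc.lower().count(keyword.lower()), name)
              for name, doc in documents.items()]
    # Highest occurrence count first; no notion of relevance or quality.
    return [name for score, name in sorted(scores, reverse=True)]

documents = {
    "rowing-club": "Our rowing club meets on the river every Saturday.",
    "stuffed-page": "rowing " * 50 + "Buy our unrelated products here!",
}

# The keyword-stuffed page outranks the genuinely relevant one.
print(rank(documents, "rowing"))  # ['stuffed-page', 'rowing-club']
```

With nothing but an occurrence count to go on, the index has no way to tell the padded page from the useful one, which is exactly why such results arrive "in almost complete disorder".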
Anyway, as the amount of information available on the Web keeps expanding, dumb indexes may not be the ultimate solution. They already often return over 100,000 answers to a request, and it is clear that we are entering a world where too much information kills information.
At the WWW Virtual Library, we believe that the only solution to this sorting problem is the good old human touch and common sense, which are so hard to teach computers. One of our most active members, Dr. Matthew Ciolek, maintains a fascinating page on information quality.
Meanwhile, we continue to extend our coverage of all fields of information present on the Internet.