The World Wide Web Virtual Library Project

Context

The World Wide Web (WWW) is the universe of network-accessible information. Its main features are universal access and unconstrained topology, and it is independent of geography, platform, etc.

After WWW was invented at CERN by Tim Berners-Lee, he decided to gather pointers to the information that became available through his world wide web browser under three categories: by access, by geographical location, and by subject.

History

The subject catalogue became known as the WWW Virtual Library.

I continued the project from 1993 on, and took a major step in 1993 when it became clear that the administration of the WWW Virtual Library had to be distributed among many participants if we wanted to keep up with the phenomenal growth of the WWW.

The originality of the WWW Virtual Library is its architecture: it is decentralized, distributed among well over 100 sites, offers several classifications (including one that follows the Library of Congress standards), and grants its maintainers considerable editorial independence.

About 200 people from all over the world are now involved in the project, with one full-time coordinator since January 1996. According to WebCrawler, as of April 1996 we are the 14th most-referenced URL on the Web.

Perspectives

Strengths of the concept

Good scalability

Whenever one of the volunteers working on the WWW Virtual Library starts having trouble maintaining an area, we encourage him to take the same step that we took: split his own area into several subjects and delegate them to other people. This works surprisingly well, as many people around the world are willing to maintain a list of references for a specific subject, either as a hobby or because it is their job anyway, or both. Becoming part of the WWW Virtual Library gives them and their work substantial additional credibility and visibility among the Internet community. Two good examples of delegation inside a subject area are Engineering and Asian Studies.
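The delegation step described above can be pictured as a tree of subject areas, each with its own maintainer. The following is only a minimal sketch under that assumption: the class name SubjectArea, its fields, the delegate method, and the example URLs are all hypothetical and do not describe any software actually used by the WWW Virtual Library.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectArea:
    """One node of a hypothetical subject catalogue (all names are illustrative)."""
    name: str
    maintainer: str
    links: list[str] = field(default_factory=list)
    children: list["SubjectArea"] = field(default_factory=list)

    def delegate(self, sub_name: str, new_maintainer: str, links: list[str]) -> "SubjectArea":
        """Move the given links into a new sub-area run by another maintainer."""
        # Only links this area actually holds can be handed over.
        moved = [url for url in links if url in self.links]
        self.links = [url for url in self.links if url not in moved]
        child = SubjectArea(sub_name, new_maintainer, moved)
        self.children.append(child)
        return child

    def maintainers(self) -> set[str]:
        """Everyone responsible for this area or any area below it."""
        people = {self.maintainer}
        for child in self.children:
            people |= child.maintainers()
        return people


# Hypothetical example: the Sports maintainer hands rowing links to someone else.
sports = SubjectArea(
    "Sports", "alice",
    ["http://example.org/rowing", "http://example.org/chess"],
)
water = sports.delegate("Water sports", "bob", ["http://example.org/rowing"])
```

Because each delegation only adds a child node, the parent maintainer's workload shrinks while the catalogue as a whole keeps growing, which is the scalability argument made above.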

Good expertise

The people of the WWW Virtual Library usually maintain only one specific subject, which in most cases they teach or at least study. This gives us a tremendous edge over other catalogues such as Yahoo!, where the centralized maintenance is done by people whose expertise is in general cataloguing and who are therefore less able to make the appropriate editorial decisions.

Good reliability

The WWW Virtual Library is located on over 100 sites, and we are currently defining schemes for redundant mirroring of the same information over several continents.

The human touch

The WWW Virtual Library is quite unique in this regard. Thanks to its considerable editorial independence, browsing its pages is very similar to browsing a collection of magazines, which, while not always exhaustive, are well presented and well ordered. Let me take a specific example: suppose we are looking for information on rowing. From the Virtual Library home page, a click on sports, then on water sports, brings up the document; see how much nicer it is to look at this document than to browse through Yahoo in search of the same information. Frankly, sometimes I wonder if Yahoo isn't still on a gopher server! To say the least, browsing through Yahoo reminds me of my old phonebook...

Meanwhile, the WWW Virtual Library often contains a much larger database on a specific subject than Yahoo (for example, compare Zoos in the Virtual Library, with over 200 references, to Yahoo, where I could find only one "pet" zoo).

Now, let's suppose we want to use one of those fancy indexes. If we take Altavista, which is said to be the most exhaustive index currently in service (I admit, it was a friend from DEC who said that), and type "rowing", we get ten thousand references in almost complete disorder. I hope it is clear that the last part of the filtering is actually left to us: we now have to wander through all these documents, which were listed in an order based only on the occurrence of the keyword we entered.

Or supposedly so: three months ago, I was searching for a program by the Washington University called ftpd on MetaCrawler (which searches many different indexes), and after entering "Washington University ftp ftpd site wu-ftpd" I was startled to see that Altavista included the Playboy Home Page among its 10 listed results! (They have now fixed this particular "feature", whatever it was due to.)

It is fairly easy to trick an index into showing your page most of the time, whatever keywords the reader types, and even easier for the programmer of the index to slightly alter the results of a search, but for now I want to believe this was an unintentional mistake (I hope they didn't find a way to access my subconscious!). Anyway, as more and more commercial interests enter the World Wide Web, it is possible that such incidents will become intentional, as well as more subtle, at some point. After all, the robots that build the indexes' databases don't ask for permission before storing your files in them, so one could argue that it is fair to feed them extra words so that your document is found more easily. (Next time you search for something on the Web, don't be too surprised if the first document presented to you is the WWW Virtual Library project ;-) )
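To make the point about keyword occurrence concrete, here is a toy sketch of a ranking that relies only on how often the query words appear in a page, and of how padding a page with extra words distorts it. This is an illustration under simplifying assumptions, not a description of how Altavista or any real index works; the function names and the sample pages are hypothetical.

```python
import re
from collections import Counter


def keyword_score(text: str, query: str) -> int:
    """Count how many times the query words occur in the text."""
    words = Counter(re.findall(r"[a-z0-9-]+", text.lower()))
    return sum(words[term] for term in query.lower().split())


def rank(pages: dict[str, str], query: str) -> list[str]:
    """Return page names ordered by occurrence count, highest first."""
    return sorted(pages, key=lambda name: keyword_score(pages[name], query), reverse=True)


# Hypothetical pages: one genuinely about rowing, one merely padded with the word.
pages = {
    "rowing-club": "Rowing club news: rowing regattas and rowing training.",
    "unrelated": "Buy our product! " + "rowing " * 20,
}

print(rank(pages, "rowing"))  # the padded page is listed first
```

The genuine page mentions the keyword three times and the padded page twenty times, so a ranking based on occurrence alone puts the irrelevant page on top. Nothing in the count reflects whether the page is actually useful to the reader.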

Anyway, as the amount of information available on the Web is still continuously expanding, dumb indexes might not be the ultimate solution. They already often return over 100,000 answers to a request, and it is clear that we are then entering a world where too much information kills information.

At the WWW Virtual Library, we believe that the only solution to this sorting problem is the good old human touch and common sense, so hard to teach to computers. One of our most active members, Dr. Matthew Ciolek, maintains a fascinating page on information quality.

Weaknesses of the concept

Holes: Although the WWW Virtual Library is the oldest catalogue on the World Wide Web, having been created along with the WWW by the same person, Tim Berners-Lee, back at CERN in the early 90s, we have not yet managed to find volunteers to cover all the fields for which information is available on the Web. But we're working on it!
Quality inconsistencies: This is a more serious problem. Among the fields currently maintained inside the WWW Virtual Library, some are clearly maintained better than others, and we need to work on a long-term solution. When several people or groups apply to maintain a given field, one solution could be to convene a committee of experts in the field to assign responsibility for the subject, on a temporary basis, to the appropriate group.

These weaknesses are only temporary, and are due to the extreme youth of both the WWW Virtual Library and its parent, the WWW.

Developments

We are currently working hard on a scheme to provide efficient inter-subject communication, so that navigation inside the WWW Virtual Library becomes extremely easy and natural. There will be many new ways of accessing the right page of the WWW Virtual Library, including an index.

Meanwhile, we continue to complete our coverage of all fields of information present on the Internet.

Conclusion

We are not the only catalogue of the Internet
Your help is welcome!
Please visit us at http://www.w3.org/vl
Thank you for your attention.

Arthur Secret (vlib@w3.org)