Virtual Software Library

P. Lega, Z. Turk Ph.D., C. Webster, M. Barzun

The ability to download free software, updates, drivers, patches, and demos of commercial software remains one of the main motives for connecting to the Internet. This is not surprising: unlike many other types of Internet data (such as news, reviews, video clips, and stock quotes), which all have proven and effective traditional distribution channels to compete with (magazines, TV, radio), software's most efficient distribution channel is clearly the Internet. We can safely expect that the role of the network in the distribution of software will grow, and that we will see a dramatic increase in the number of commercial software packages distributed on the Internet as well as an increase in the number of independent software authors who are able to sell directly through the Internet.

Perhaps the most important advantage that the Internet has over other information distribution channels is that it reduces information overload by offering just-in-time information delivery (as opposed to the just-in-case paradigm we experience in books and magazines). This means that if and when we need a certain kind of information, we go out and ask some service to find it for us. The information the user brings into this search can be broken into two parts: 1) whom you ask and 2) how precise your question is. The better the information one includes in the question, the more likely it is that the most relevant information is found. Another factor (3) that determines the success of the search is the type of database being searched. For example, a search for a paper on demographic policies in SE Asia is likely to achieve more relevant results if the database searched is one edited by qualified librarians and not simply one consisting of every document on the Web.

Before the development of the VSL (Virtual Software Library) started in 1994, there were two ways to find software on the Internet. The first was to know the file name and then ask Archie which computers that program could be found on. Archie is a system that collects information on which files are stored on what servers. The second way was to go to a server "known" to be specialized in a certain type of software and try to find the file by looking into the catalogue of that site (structured edited information in a proprietary format).

The design goals behind the VSL were (a) to create one engine that could find the most relevant software available on the Internet, (b) to make use of the edited, structured information on the files, which was ready and available on the specialized servers, (c) to help make access to those files easy, reliable and fast, and (d) to do this in such a way, both technically and organizationally, that any piece of software could appear in the VSL. We assume that the trend toward software being written as components (Java, ActiveX, etc.) will dramatically increase the number of software files distributed over the Internet. We do not believe that a single organization can possibly maintain a high-quality catalogue of all of that and keep it up to date. On the contrary, we are convinced that there are specialists out there who do a great job of cataloguing a certain type of software such as games, winsock utilities, ms-dos files, etc. We are sure no one can make a catalogue of IBM software as well as IBM can - but we are also convinced that there should be one place to look both for IBM's and for other independent software vendors' products. This makes the VSL conceptually very different from someone who might simply buy a few CD-ROMs and put them on the Web under their own label. The catalogs included in the VSL keep full credit for their work.

The initial information schema of the VSL (Turk, 1995) allowed for such distributed cataloguing, registration, and downloading of software, but centralized the search services. The VSL database has been freely available to several search sites (called VSL Front Desks) on 4 continents. After the VSL was taken over by c|net: the computer network in May 1995, the next major step was the publication of a standard format for the description of software archives (VSL-OF1, the Virtual Software Library Open Format 1.0).
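The split between distributed cataloguing and centralized search can be illustrated with a minimal sketch. It is not the VSL implementation and does not follow the VSL-OF1 format; the record fields, the `Entry` class, and the `build_index` and `search` functions are all hypothetical names chosen for illustration. Each archive supplies its own catalogue, the central service merges them into one word index, and every hit keeps the name of the catalogue it came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One catalogued file. All field names are illustrative assumptions."""
    archive: str      # catalogue that described the file (keeps its credit)
    host: str         # server the file can be downloaded from
    path: str         # location of the file on that server
    description: str  # text supplied by the archive's own cataloguers


def build_index(catalogues):
    """Merge per-archive catalogues into one word -> entries index.

    `catalogues` maps an archive name to a list of
    (host, path, description) tuples.
    """
    index = {}
    for archive, records in catalogues.items():
        for host, path, description in records:
            entry = Entry(archive, host, path, description)
            # Index each distinct word once per entry.
            for word in set(description.lower().split()):
                index.setdefault(word, []).append(entry)
    return index


def search(index, query):
    """Return the entries whose description contains every query word."""
    words = query.lower().split()
    if not words:
        return []
    candidates = index.get(words[0], [])
    return [
        entry for entry in candidates
        if all(w in set(entry.description.lower().split()) for w in words)
    ]
```

A front desk in this sketch would call `build_index` once on the harvested catalogues and then answer queries with `search`; the `archive` field on each hit is what lets the result page credit the original cataloguers.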

Several major software companies such as Adobe, IBM, Intel, Lotus, SGI, and Borland have reorganized their archives so that they could be included in the VSL. Recent developments include the ability to register single files as well as whole archives, and support for richer file-level information.

Over the last year of explosive growth, the VSL team gained the following insights into creating a huge meta-archive by harvesting externally generated index and archive information via the VSL-OF1 standard:

Most archivists have limited resources locally to create and maintain accurate and useful content data. We have felt a clear need for the meta-archive to provide tools that its members can use to help generate data for the archive system.
A meta-archive software system must be flexible and able to "adopt" and remotely manage the indices of otherwise non-indexed archives.
A meta-archive system must support both the tremendous power of terse broadband searching and the value added by assembling and editorially enhancing its more popular and timely content.
Maintaining a consistent interface for using and participating in this system has helped greatly, but the larger challenge has been to provide the supporting automation tools and resources to fuel the archive's growth.

We have started solving the problem of poorly described content by implementing tools that allow in-house editorial staff to add value to the harvested information. This feature is key when attempting to present meaningful results to the end user from sparsely annotated remote archives. This differentiation from most software search engines makes the VSL more useful to less technical end users.
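One way to picture this editorial layer is as an overlay on top of the harvested records. The sketch below is an assumption about how such a tool could work, not a description of the in-house tools themselves: the dictionary fields and the `annotate` function are hypothetical. Editorial notes are kept separately, keyed by host and path, and replace a harvested field only where the editor actually supplied a value, so the harvested data is never modified.

```python
def annotate(harvested, editorial):
    """Overlay editorial fields on harvested records.

    `harvested` is a list of dicts with at least "host" and "path" keys.
    `editorial` maps (host, path) to a dict of fields written by staff.
    Returns new records; the inputs are left unchanged.
    """
    merged = []
    for record in harvested:
        notes = editorial.get((record["host"], record["path"]), {})
        # Ignore empty editorial values so they never erase harvested data.
        overrides = {key: value for key, value in notes.items() if value}
        merged.append({**record, **overrides})
    return merged
```

Keeping the overlay separate means that a fresh harvest of the remote archive can be merged again with the same editorial notes.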

A balance of interface standards and flexible modes of archive participation appears to be the key to the success of a meta-archive with a broad user base.


This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.