Lycos: Distributed Indexing


Draft: submitted to MIT Distributed Indexing/Searching Workshop
Not for citation


Position paper for Distributed Indexing/Searching Workshop at MIT

Charles Kollar, John Leavitt, Michael Mauldin
Lycos, Inc.
555 Grant St #350
Pittsburgh, PA 15219
kollar@lycos.com, jrrl@lycos.com, fuzzy@lycos.com
412-261-6660

ADEQUACY OF ROBOTS.TXT

Although Lycos has always supported the current robot exclusion standard, we note two deficiencies. First, the existing standard as written contains ambiguities about the exact syntax, leaving implementors to make their own choices. Second, the server-based model breaks down for sites with weak central support, for example home pages hosted by on-line services, company sites, and universities.

We will propose a draft rewriting of the current standard (reducing the ambiguity without adding new features) at the workshop.

We also suggest two extensions to the exclusion standard:

1. Use of the meta tag in HTML to provide robot access information specific to that file. That way, a user who can control only his or her own directory can still exclude files from robot access.

2. The ability to exclude a file or directory from robot access using a file naming convention (e.g., a prefix of "ns-" or "prv-" on a path element would stand for "no spidering").
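
As a minimal sketch of how a robot might honor both proposed extensions, the Python below checks a URL for the example prefixes and a page for an exclusion meta tag. It assumes a tag of the form <meta name="robots" content="noindex">, a syntax this paper does not specify, and it treats "ns-" and "prv-" as the prefixes only because they are the examples given above. All function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Prefixes taken from the example in the text; the actual convention is undecided.
NO_SPIDER_PREFIXES = ("ns-", "prv-")


def path_excluded(url: str) -> bool:
    """Return True if any path element starts with a "no spidering" prefix."""
    elements = urlsplit(url).path.split("/")
    return any(e.lower().startswith(NO_SPIDER_PREFIXES) for e in elements if e)


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags.

    The tag name and the "noindex" token are assumptions, not part of the proposal.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def page_excluded(html: str) -> bool:
    """Return True if the page itself asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

A robot would call path_excluded before fetching a URL and page_excluded after fetching it, so neither check requires the page author to have access to the server's robots.txt file.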

SUPPORT FOR MULTI-ENGINE SEARCH

Support for search-combining services such as MetaCrawler or SavvySearch is controversial among commercial search services such as Lycos. Commercial services pay for the substantial costs of collecting and indexing information by placing advertisements on the output results, and to the extent that meta-searchers strip those ads from the results, this practice is theft.

The best way to keep meta-searchers from being denied access to the underlying search services is to agree on a model for carrying ads along with the content, and for each meta-searcher to identify itself to the original server (in the user-agent field, for example) so that the server knows the request is coming from a meta-searcher.
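
The identification step might look like the following sketch. The "(meta-searcher)" marker in the User-Agent value and the list of known agent names are assumptions made for illustration; the paper proposes only that the user-agent field be used, not any particular format. No request is actually sent.

```python
from urllib.request import Request

# Hypothetical list; the text defines no registry of meta-searcher agent names.
KNOWN_META_SEARCHERS = ("metacrawler", "savvysearch")


def build_meta_request(url: str, agent_name: str) -> Request:
    """Client side: build a request that announces itself as a meta-searcher."""
    return Request(url, headers={"User-Agent": f"{agent_name} (meta-searcher)"})


def is_meta_searcher(user_agent: str) -> bool:
    """Server side: decide from the User-Agent value whether a meta-searcher is calling."""
    ua = user_agent.lower()
    return "meta-searcher" in ua or any(name in ua for name in KNOWN_META_SEARCHERS)
```

A search service could use is_meta_searcher to choose a result format that carries its advertisements in an agreed way, rather than refusing the request outright.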

DUBLIN META-DATA

Lycos acknowledges the usefulness of standardizing field names for common meta-data elements such as subject, title, author, and so forth. However, history shows that consensus at any useful level of detail is difficult or impossible, so we would recommend either wholesale acceptance of the Dublin Core, wholesale acceptance of a competing standard, or skipping this task.

We offer one caveat from our experience with large free-form document collections such as those found on the web. If users are allowed to attach their own indexing terms as meta-data, there will be some amount of gamesmanship by authors attempting to make their documents come out first in ranked retrievals. Our philosophy is that documents are their own best description, and the inclusion of "invisible" search terms invites this practice of "spam-dexing".

ACCEPTANCE OF SOIF AS A STANDARD

Lycos is still evaluating the usefulness of SOIF as a standard, and takes no position at this time.

INTRANET ISSUES

One issue that must be addressed when dealing with the Intranet is that of compartmentalized access. On the Internet, public usually means public, and all documents can be treated identically. Within a typical corporation, there are pockets of documents that are viewable only by certain individuals, and these policies must be respected by any corporate-wide information resource.
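
One way to respect such policies is to filter search hits against the requesting user's permissions before returning them. The sketch below assumes a simple group-based model in which each document lists the groups allowed to view it and an empty list means public; the Document type, its fields, and the group names are hypothetical, and a real corporate system would take its policies from the existing access controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    url: str
    # Groups allowed to view the document; an empty set means it is public.
    allowed_groups: frozenset = frozenset()


def visible_results(hits, user_groups):
    """Keep only the hits that the user is permitted to see, preserving rank order."""
    groups = set(user_groups)
    return [doc for doc in hits if not doc.allowed_groups or doc.allowed_groups & groups]
```

Filtering at query time rather than at indexing time lets a single index serve every user, at the cost of checking permissions on each search.
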
This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.


Last updated 19-Apr-96 by fuzzy@cmu.edu