Lycos: Distributed Indexing
Draft: submitted to MIT Distributed Indexing/Searching Workshop
Not for citation
Position paper for Distributed Indexing/Searching Workshop at MIT
ADEQUACY OF ROBOTS.TXT
Although Lycos has always supported the current
robot exclusion standard,
we note two deficiencies. First, the existing standard as written contains
ambiguities about the exact syntax, leaving implementors to make their own
choices. Second, the server-based model breaks down for sites with weak
central support, e.g., home pages hosted by on-line services, company sites,
and universities.
We will propose a draft rewriting of the current standard (reducing the
ambiguity without adding new features) at the workshop.
We also suggest two extensions to the exclusion standard:
1. Use of the meta tag in HTML to provide file-specific robot access
information for that file. That way a user who can only control his or her
own directory can still exclude a file from robot access.
2. The ability to exclude a file or directory from robot access using a
file naming convention (e.g., a prefix of "ns-" or "prv-" on a path element
would stand for "no spidering").
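The two extensions above can be sketched from the robot's side as follows.
This is only an illustration under stated assumptions: the prefixes are the
examples given in the text, while the meta tag name ("robots"), its
"noindex" value, and the function names are hypothetical, since this paper
does not fix a syntax.

```python
from html.parser import HTMLParser

# Prefixes taken from the examples in the text; any final list is undecided.
NO_SPIDER_PREFIXES = ("ns-", "prv-")


def path_excluded(path):
    """Return True if any path element starts with a "no spidering" prefix."""
    return any(
        element.startswith(NO_SPIDER_PREFIXES)
        for element in path.split("/")
        if element
    )


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    The tag name and directive vocabulary are assumptions for illustration.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def page_excluded(html):
    """Return True if the page itself asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

A robot would call path_excluded before fetching a URL and page_excluded
after fetching it, so neither check requires access to a server-wide file.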
SUPPORT FOR MULTI-ENGINE SEARCH
Support for search-combining services such as MetaCrawler or SavvySearch is
controversial among commercial search services such as Lycos. Commercial
services pay for the substantial costs of collecting and indexing
information by placing advertisements on their results pages, and to the
extent that meta-searchers strip those ads from the results, this practice
is theft.
The best way to keep meta-searchers from being denied access to the
underlying search services is to agree on a model for carrying ads along
with the content, and to identify to the original server (in the user-agent
field, for example) that the request is coming from a meta-searcher.
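As a minimal sketch of the identification step, a meta-searcher could label
its forwarded queries in the user-agent field and the original server could
test for that label. The agent string, the "meta-searcher" token, and the
function names below are hypothetical; no such convention has been agreed.

```python
from urllib.request import Request

# Hypothetical agent string; the text only suggests using the user-agent field.
META_SEARCH_AGENT = "ExampleMetaSearch/0.1 (meta-searcher)"


def build_forwarded_query(search_url):
    """Build (but do not send) a request that declares itself a meta-searcher."""
    return Request(search_url, headers={"User-Agent": META_SEARCH_AGENT})


def is_meta_searcher(user_agent):
    """Server-side check, assuming the agreed token is "meta-searcher"."""
    return "meta-searcher" in user_agent.lower()
```

A server that recognizes the token could then return results in whatever
ad-carrying form the two parties have agreed on, rather than refusing the
request.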
DUBLIN META-DATA
Lycos acknowledges the usefulness of standardizing field names for common
meta-data elements such as subject, title, author, and so forth. However,
history shows that consensus at any useful level of detail is difficult or
impossible, so we would recommend either wholesale acceptance of the Dublin
Core, wholesale acceptance of a competing standard, or skipping this task.
One caveat comes from our experience with large free-form documents such as
those found on the web: if users are allowed to attach their own indexing
terms to meta-data, there will be some amount of gamesmanship by authors
attempting to make their documents come out first in ranked retrievals. Our
philosophy is that documents are their own best description, and the
inclusion of "invisible" search terms invites this practice of
"spam-dexing".
ACCEPTANCE OF SOIF AS A STANDARD
Lycos is still evaluating the usefulness of SOIF as a standard, and takes no
position at this time.
INTRANET ISSUES
One issue that must be addressed when dealing with the Intranet is that of
compartmentalized access. On the Internet, public usually means public, and
all documents can be treated identically. Within a typical corporation,
there are pockets of documents that are viewable only by certain
individuals, and these policies must be respected by any corporate-wide
information resource.
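One way to respect such policies is to filter search results against the
permissions of the person asking. The sketch below assumes a simple
group-based model in which each document lists the groups allowed to view
it and an empty list means "public"; the Document type, its fields, and the
example URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    url: str
    # Groups allowed to view the document; empty means public (an assumption).
    allowed_groups: frozenset = field(default_factory=frozenset)


def visible_results(results, user_groups):
    """Keep only the hits that the user is permitted to see."""
    groups = set(user_groups)
    return [
        doc for doc in results
        if not doc.allowed_groups or doc.allowed_groups & groups
    ]
```

For example, a user in the "engineering" group searching an index that
contains a public handbook and an "hr"-only payroll page would see only the
handbook.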
This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.
Last updated 19-Apr-96 by fuzzy@cmu.edu