SPIDERING BOF REPORT
Report by Michael Mauldin (Lycos)
(later edited by Michael Schwartz)

While the overall workshop goal was to determine areas where standards
could be pursued, the Spidering BOF attempted to reach actual standards
agreements about some immediate-term issues facing robot-based search
services, at least among spider-based search service representatives who
were in attendance at the workshop (Excite, InfoSeek, and Lycos).  The
agreements fell into four areas, but we report only three of them here
because the fourth area concerned a KEYWORDS tag that many workshop
participants felt was not appropriate for specification by this BOF
without the participation of other groups that have been working on that
issue.

The remaining three areas were:

1.  ROBOTS meta-tag

	<META NAME="ROBOTS"
              CONTENT="ALL | NONE | NOINDEX | NOFOLLOW">

	default = empty = "ALL"
	"NONE" = "NOINDEX, NOFOLLOW"

        The CONTENT value is a comma-separated list of terms:
        ALL, NONE, INDEX, NOINDEX, FOLLOW, NOFOLLOW.

        Discussion: This tag is meant for users who cannot control the
        robots.txt file at their sites.  It gives them a last chance to
        keep their content out of search services.  It was decided not to
        add syntax to allow robot-specific permissions within the meta-tag.

        INDEX means that robots are welcome to include this page in
        search services. 

        FOLLOW means that robots are welcome to follow links from this
        page to find other pages.

        So a value of "NOINDEX" allows the subsidiary links to be explored,
        even though the page is not indexed.  A value of "NOFOLLOW" allows the
        page to be indexed, but no links from the page are explored (this may
        be useful if the page is a free entry point into pay-per-view content,
        for example).  A value of "NONE" tells the robot to ignore the page.
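
        The rules above can be sketched in Python as follows.  This is only
        an illustration, not part of the agreement: the function name
        parse_robots_content is hypothetical, and the report does not say
        how case, surrounding whitespace, unknown terms, or conflicting
        terms are to be handled.  The sketch assumes terms are
        case-insensitive, whitespace is ignored, unknown terms are skipped,
        and a later term overrides an earlier one.

```python
def parse_robots_content(content):
    """Return (index, follow) permissions for a ROBOTS CONTENT value.

    Hypothetical helper; the handling of case, whitespace, unknown
    terms, and conflicting terms is an assumption, not from the report.
    """
    # default = empty = "ALL": the page may be indexed and its links followed.
    index, follow = True, True
    for term in content.split(","):
        term = term.strip().upper()
        if term == "ALL":
            index, follow = True, True
        elif term == "NONE":
            # "NONE" = "NOINDEX, NOFOLLOW"
            index, follow = False, False
        elif term == "INDEX":
            index = True
        elif term == "NOINDEX":
            index = False
        elif term == "FOLLOW":
            follow = True
        elif term == "NOFOLLOW":
            follow = False
        # Any other term (including the empty string) is skipped (assumption).
    return index, follow


if __name__ == "__main__":
    for value in ["", "NOINDEX", "NOFOLLOW", "NONE"]:
        print(repr(value), parse_robots_content(value))
```

        A robot using such a function would add the page to its index only
        when the first returned value is true, and would queue the page's
        links only when the second is true.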

2.  DESCRIPTION meta-tag

	<META NAME="DESCRIPTION" CONTENT="...text...">

        The intent is that the text can be used by a search service when
        printing a summary of the document.  The text should not contain
        any formatting information.
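
        As an illustration of how a search service might pick up this text,
        here is a minimal Python sketch built on the standard library's
        html.parser module.  The names MetaTagCollector and
        description_for_summary are hypothetical, and collapsing runs of
        whitespace before display is an assumption of the sketch rather
        than something the BOF specified.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Keep the first occurrence of each name (assumption).
            self.meta.setdefault(name.upper(), content)


def description_for_summary(html_text):
    """Return the DESCRIPTION text for a summary line, or None if absent."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    text = collector.meta.get("DESCRIPTION")
    if text is None:
        return None
    # Collapse whitespace so the summary prints on one line (assumption).
    return " ".join(text.split())


if __name__ == "__main__":
    page = '<head><META NAME="DESCRIPTION" CONTENT="Spidering BOF report"></head>'
    print(description_for_summary(page))
```

        A page without a DESCRIPTION tag yields None here, leaving the
        search service free to fall back on whatever summary method it
        already uses.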

3.  Other issues with ROBOTS.TXT

        These are issues recommended for future standards discussion that
        could not be resolved within the scope of this workshop.
 
        - Ambiguities in the current specification
                   http://www.kollar.com/robots.html
 
	- A means of canonicalizing sites, using:
                   HTTP-EQUIV	HOST
                   ROBOTS.TXT	ALIAS
 
        - Ways of supporting multiple robots.txt files per site ("robotsN.txt")
 
        - Ways of advertising content that should be indexed (rather than
          just restricting content that should not be indexed)

        - Flow control information: retrieval interval or maximum
          connections open to server