It is quite unlikely that the current paradigm of multiple Internet search facilities, each running a competing robot that is trying, on its own, to cover the web, will remain viable in the face of the continuing observed growth.

There are a variety of strategies with which this problem could be attacked. It is important to consider, for each strategy, its business implications as well as its technical viability. There is no point cooking up a dream technology that will seriously damage the interests of Netscape, Microsoft, and the flock of recently-IPO-funded Internet Index purveyors, because no such technology has any hope of adoption.

I propose an overview presentation that attempts to enumerate these strategies and briefly outlines the technology and business implications, pro and contra, of each. Some of the approaches that will be included in this tour are:

1. Dividing the crawling problem

   1.a Geographically
   1.b By subject domain
   1.c By network domain

   This obviously has to be "Plan A". Since the problem is too big for everyone to solve on their own, basic computer science suggests partitioning it. Each partitioning brings with it a set of advantages and problems.

2. Unifying the robots

   Internet search facilities compete (presumably) on the basis of standard information retrieval metrics such as precision and recall. But they all start at the same point: a lot of web pages. Is there a case to be made for centralizing crawling activity, and sharing its results?

3. Crawling less

   There is a lot of stuff on the web that needs to be crawled every day. There is a lot of other stuff that needs to be crawled only once [e.g. the text of the U.S. Declaration of Independence and of "Pride and Prejudice"], and there is a very large amount of stuff that arguably never needs to be crawled at all. Is there a basis for joint work in formalizing some metrics here?

4. Sharing the burden with the providers

   Being indexed is a net benefit to the providers of information. Given that current crawling strategies are likely to break down in the face of Internet growth, it is reasonable to ask them to shoulder a small share of the effort. Some ways they could do this are:

   4.1 Provider-push crawl requests
   4.2 Provider-generated metadata [subject category, volatility, etc.]
   4.3 Canonical URL notification - a huge win in terms of duplicate control

My personal position is that, to some degree, application of all of these strategies is essential for success.

I am the chief designer and implementor of the Open Text Index and in particular its crawling/indexing subsystem, "Firewalker".

 - Tim