It is quite unlikely that the current paradigm of multiple Internet search facilities, each running a competing robot that is trying, on its own, to cover the web, will remain viable in the face of the continuing observed growth.

There are a variety of strategies with which this problem could be attacked. It is important to consider, for each strategy, its business implications as well as its technical viability. There is no point cooking up a dream technology that will seriously damage the interests of Netscape, Microsoft, and the flock of recently-IPO-funded Internet Index purveyors, because no such technology has any hope of adoption.

I propose an overview presentation that attempts to enumerate these strategies and briefly outlines the technology and business implications, pro and contra, of each. Some of the approaches that will be included in this tour are:

1. Dividing the crawling problem

   1.a Geographically
   1.b By subject domain
   1.c By network domain

   This obviously has to be "Plan A". Since the problem is too big for everyone to solve on their own, basic computer science suggests partitioning it. Each partitioning brings with it a set of advantages and problems.

2. Unifying the robots

   Internet search facilities compete (presumably) on the basis of standard information retrieval metrics such as precision and recall. But they all start at the same point: a lot of web pages. Is there a case to be made for centralizing crawling activity, and sharing its results?

3. Crawling less

   There is a lot of stuff on the web that needs to be crawled every day. There is a lot of other stuff that needs to be crawled only once [e.g. the text of the U.S. Declaration of Independence and of "Pride and Prejudice"], and there is a very large amount of stuff that arguably never needs to be crawled at all. Is there a basis for joint work in formalizing some metrics here?

4. Sharing the burden with the providers

   Being indexed is a net benefit to the providers of information. Given that current crawling strategies are likely to break down in the face of Internet growth, it is reasonable to ask them to shoulder a small share of the effort. Some ways they could do this are:

   4.1 Provider-push crawl requests
   4.2 Provider-generated metadata [subject category, volatility, etc.]
   4.3 Canonical URL notification - a huge win in terms of duplicate control

My personal position is that, to some degree, application of all of these strategies is essential for success.

I am the chief designer and implementor of the Open Text Index and in particular its crawling/indexing subsystem, "Firewalker".

 - Tim