Back-link annotation set scenario


I've had a small amount of interaction with one or two members
of the Foresight Institute.  The Foresight Institute was founded by
K. Eric Drexler, the author of "Engines of Creation", a book primarily
about nanotechnology.

They are extremely interested in back-links.  A back-link is a link
from a target document back to its referencing documents.  For example,
if documents A, B, and C each contain hypertext links to a document
D, then D's back-link list consists of the URLs of A, B, and C.
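Computing back-link lists amounts to inverting the forward-link graph.
The minimal sketch below assumes the forward links are already known as
a hypothetical mapping from each document's URL to the URLs it links
to; the function name and the example.com URLs are illustrative only.

```python
from collections import defaultdict

def build_backlinks(forward_links):
    """Invert {source_url: [target_url, ...]} into the back-link lists
    {target_url: [source_url, ...]} (hypothetical helper)."""
    backlinks = defaultdict(list)
    for source, targets in forward_links.items():
        # A document may link to the same target several times;
        # record each referencing document only once per target.
        for target in dict.fromkeys(targets):
            backlinks[target].append(source)
    return dict(backlinks)

# The example from the text: A, B, and C each link to D.
forward = {
    "http://example.com/A": ["http://example.com/D"],
    "http://example.com/B": ["http://example.com/D"],
    "http://example.com/C": ["http://example.com/D"],
}
print(build_backlinks(forward))
# {'http://example.com/D': ['http://example.com/A', 'http://example.com/B', 'http://example.com/C']}
```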

The back-link problem has been discussed in the hypertext literature
for a long time, and annotation sets offer a potential solution.  It
is also an example of a truly gigantic annotation set!


In this scenario, an organization decides to provide a back-link
annotation set as a public service to the Internet.  The back-link
database is going to be seeded by a well-behaved Web spider (i.e.,
one that obeys robots.txt directives).  Each time the Web
spider encounters a hypertext link from a document A to document B,
it adds A's URL and document title to B's back-link list.  In addition,
people can manually feed additional documents into the back-link database
via simple CGI scripts.
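Both the spider and the manual CGI path feed the same operation: for
each link from A to B, add A's URL and title to B's back-link list.  A
minimal in-memory sketch follows, assuming a hypothetical
BackLinkDatabase class and a hypothetical spider_may_fetch helper; the
robots.txt check uses Python's standard urllib.robotparser module.  A
real server would persist this data rather than keep it in a dict.

```python
from urllib.robotparser import RobotFileParser

class BackLinkDatabase:
    """Hypothetical store: target URL -> {source URL: source title}."""

    def __init__(self):
        self._backlinks = {}

    def record(self, source_url, source_title, target_urls):
        # Used by both the spider and the manual-submission CGI script:
        # every link A -> B adds A's URL and title to B's back-link list.
        for target in target_urls:
            self._backlinks.setdefault(target, {})[source_url] = source_title

    def backlinks(self, target_url):
        # (url, title) pairs for the documents that link to target_url.
        return sorted(self._backlinks.get(target_url, {}).items())

def spider_may_fetch(robots_txt, user_agent, url):
    # A well-behaved spider consults the site's robots.txt before fetching.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

robots = "User-agent: *\nDisallow: /private/\n"
print(spider_may_fetch(robots, "BackLinkSpider", "http://example.com/private/a.html"))  # False
print(spider_may_fetch(robots, "BackLinkSpider", "http://example.com/a.html"))          # True

db = BackLinkDatabase()
db.record("http://example.com/a.html", "Document A", ["http://example.com/b.html"])
print(db.backlinks("http://example.com/b.html"))
# [('http://example.com/a.html', 'Document A')]
```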

For example, if author A wishes to write a refutation of author
B's document, A simply inserts a hypertext link from the refutation
(document A) to document B.  Next, document A's URL is submitted to the
back-link annotation set server via a simple CGI script.  Since
document A has a link to document B, A's URL is added to B's
back-link list.  Anybody who subscribes to the back-link annotation
set will see a back-link annotation on document B that contains the
title of A's refutation.
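When a URL is submitted, the server has to pull the title and outgoing
links out of the submitted document.  The sketch below assumes the
document has already been fetched as HTML text (fetching is omitted)
and that the store is a plain dict; process_submission and
LinkAndTitleParser are hypothetical names.  It uses the standard
html.parser and urllib.parse modules, resolving relative links against
the submitted URL and dropping #fragments so that a link to part of B
counts as a link to B.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and the href of every <a> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Make the link absolute and strip any #fragment.
                absolute, _ = urldefrag(urljoin(self.base_url, href))
                self.links.append(absolute)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def process_submission(submitted_url, html_text, backlinks):
    """Hypothetical CGI handler body: add submitted_url (with its title)
    to the back-link list of every document it links to."""
    parser = LinkAndTitleParser(submitted_url)
    parser.feed(html_text)
    parser.close()
    title = parser.title.strip() or submitted_url  # fall back to the URL
    for target in parser.links:
        backlinks.setdefault(target, {})[submitted_url] = title
    return backlinks

page = """<html><head><title>A Refutation of B</title></head>
<body>I disagree with <a href="b.html#claim">this claim</a>.</body></html>"""
print(process_submission("http://example.com/a/refutation.html", page, {}))
# {'http://example.com/a/b.html': {'http://example.com/a/refutation.html': 'A Refutation of B'}}
```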

At times, people will encounter back-links in the back-link annotation
set that are no longer valid (i.e., document A has been deleted or no
longer references document B).  Another simple CGI script lets
people report when a back-link has gone stale.  Alternatively,
whenever the Web spider revisits document A, it can discover the
stale back-link on its own.
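Whether the trigger is a user report or a spider revisit, the cleanup
step is the same: re-fetch A and remove A from the back-link list of
every document it no longer links to (or from all of them if A is
gone).  A minimal sketch, assuming the same hypothetical dict-based
store and a caller that supplies A's current links, or None if A could
not be retrieved:

```python
def revalidate_source(backlinks, source_url, current_links):
    """Drop stale back-links contributed by source_url (hypothetical helper).

    backlinks:     {target_url: {source_url: title}}
    current_links: URLs source_url links to now, or None if the document
                   could no longer be retrieved (e.g. it was deleted).
    Returns the sorted list of targets whose back-link was removed.
    """
    live = set() if current_links is None else set(current_links)
    removed = []
    # Iterate over a copy so entries can be deleted while scanning.
    for target, sources in list(backlinks.items()):
        if source_url in sources and target not in live:
            del sources[source_url]
            removed.append(target)
            if not sources:
                # No referencing documents remain; forget the target.
                del backlinks[target]
    return sorted(removed)

db = {
    "http://example.com/b.html": {"http://example.com/a.html": "Document A"},
    "http://example.com/c.html": {"http://example.com/a.html": "Document A",
                                  "http://example.com/x.html": "Document X"},
}
# A was edited and now links only to c.html:
print(revalidate_source(db, "http://example.com/a.html", {"http://example.com/c.html"}))
# ['http://example.com/b.html']
```

This scan touches every entry, which is fine for a sketch but not at
Web scale; a real server would presumably also keep a forward index
from each source to its targets so only the affected lists are visited.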

Users of the back-link annotation set simply subscribe to it and
follow any back-link annotations that look interesting.

Given how big the Web currently is and how big it is likely to grow,
it does not take much imagination to see that a back-link annotation
set is going to be mongo big.  Any organization that undertakes the
effort of constructing such an annotation set is going to have to
commit serious amounts of resources to pull it off, i.e., many machines
with fat pipes and big disk farms distributed worldwide, plus a very
sophisticated distributed Web spider.


Even though I wrote up this scenario, I would like to propose that
we agree in advance that supporting such gigantic annotation sets
not be *required* by any design we do.  If we do come up with a design
that *can* support such a large annotation set, that would be great,
but let's not contort the design to support such an extreme design