A cybermap is a map of connections between documents (or parts of large documents) which he builds by looking at text alone. He builds a keyword vector for each document, and then generates a "similarity" matrix holding a value for each document pair. (See his paper for the algorithms.) Keywords are weighted by their power to distinguish documents.
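As a rough illustration of the keyword-vector and similarity step, here is a minimal sketch. The note does not give the actual formulas (they are in the paper), so the TF-IDF weighting and cosine similarity used here are assumptions standing in for them, and the function names keyword_vectors, cosine and similarity_triples are hypothetical.

```python
import math
from collections import Counter


def keyword_vectors(docs):
    """Return one {word: weight} dict per document.

    Assumption: TF-IDF weighting, chosen only because it rewards words
    that distinguish documents. The paper's weighting may differ.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for words in tokenized for word in set(words))
    vectors = []
    for words in tokenized:
        tf = Counter(words)
        # A word found in every document gets weight 0: it distinguishes nothing.
        vectors.append({w: c * math.log(n / df[w]) for w, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors (assumed similarity measure)."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_triples(docs):
    """Return (similarity, doc1, doc2) for every pair, strongest first."""
    vecs = keyword_vectors(docs)
    triples = [
        (cosine(vecs[i], vecs[j]), i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
    ]
    return sorted(triples, reverse=True)
```

Every pair of documents is compared, so the work grows with the square of the number of documents, which fits the remark that generating the similarity matrix is the expensive part.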
Given the list of similarities, he uses a fast and simple method to arrange the documents first into clusters, then into one big tree, the minimum spanning tree for the similarity matrix. The algorithm takes the (similarity, doc1, doc2) triples in order of descending similarity and uses them to make links between related documents. He limits the links to one per document to avoid getting a mess. When every document has exactly one link, there are a number of separate trees, which are his clusters. He then links the clusters, starting with the one created last (and therefore having the weakest links), finding the strongest similarity between a document in that cluster and a document in some other cluster. The result is a single tree. He picks as root node the document which had the highest weight of interesting words.
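The clustering and linking steps described above might be sketched as follows. This is one reading of the description, not his code: it assumes "one link per document" means each document is linked once, to the strongest partner available when its turn comes; that a merged cluster keeps the creation order of the older cluster; and that a triple is supplied for every pair of documents. The name build_tree is hypothetical, and choosing the root is left out.

```python
def build_tree(triples, n_docs):
    """Return the tree edges as (doc1, doc2) pairs.

    triples: (similarity, doc1, doc2) for every document pair (assumed).
    """
    ordered = sorted(triples, reverse=True)
    cluster_of = {}  # document -> cluster id
    members = {}     # cluster id -> set of documents; ids grow in creation order
    edges = []

    # Phase 1: give every document one link, strongest pairs first.
    for sim, a, b in ordered:
        if a in cluster_of and b in cluster_of:
            continue  # both documents already have their link
        if a not in cluster_of and b not in cluster_of:
            cid = len(members)  # neither is linked yet: a new cluster is created
            members[cid] = {a, b}
            cluster_of[a] = cluster_of[b] = cid
        else:
            linked, new = (a, b) if a in cluster_of else (b, a)
            cid = cluster_of[linked]
            members[cid].add(new)
            cluster_of[new] = cid
        edges.append((a, b))
        if len(cluster_of) == n_docs:
            break  # every document is linked; the clusters are complete

    # Phase 2: join the clusters, starting with the one created last.
    while len(members) > 1:
        last = max(members)
        # Strongest pair with exactly one document inside the last cluster.
        sim, a, b = next(
            t for t in ordered
            if (cluster_of[t[1]] == last) != (cluster_of[t[2]] == last)
        )
        other = cluster_of[b] if cluster_of[a] == last else cluster_of[a]
        for doc in members.pop(last):
            cluster_of[doc] = other
            members[other].add(doc)
        edges.append((a, b))
    return edges
```

A document with no link yet has no edges at all, so linking it can never close a loop, and each phase-2 edge joins two separate clusters. The result therefore always has one edge fewer than there are documents, as a tree should.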
Peter has tested his work on a simple catalogue of dinosaurs, using a Macintosh, and on a set of 50 documents using MIT's Connection Machine. He uses large quantities of computer power to generate the similarity matrix.
Peter has also used his system on mail messages and news articles. Unfortunately, it is not very easy to add new material incrementally. This could be an interesting sort of tool for generating a browsable, intelligible web out of a mess of mail, news and random documents.
Tim