PetaPlex Project

The PetaPlex Project is funded by the US Intelligence Community to develop feasible architectures for very large-scale digital libraries, to meet the future needs of the community and those of large-scale commercial applications. The specific goal of the current phase is to develop an architecture capable of scaling to 20 petabytes on-line, with subsecond response time for access to random, fine-grained, URN-specified objects, at a sustained rate in excess of 30 million transactions per second. The current statement of work calls for integrating one million 20 GB disks into a coherent system that can attain these performance objectives at acceptable cost. To achieve this level of throughput, the current prototype resolves URNs (finds, fetches, and displays or executes the named object) in a single packet round-trip and a single seek. To achieve cost feasibility, the architecture is "massively simple": it consists only of simple, commodity-cost, COTS technologies that enable near-automatic construction and maintenance of the system. A principal part of the architecture is full-text search of the hypermedia-structured database for many concurrent searches, on the order of 100,000 ongoing searches at any time. The scheme being explored is highly parallelized for incrementally maintaining the indexes, conducting searches, and storing results in persistent and accessible form.
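
The stated targets can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only and not part of the project: it assumes the disk size is 20 gigabytes in decimal units, that load spreads evenly over all disks, and that each transaction costs exactly one seek, as the single-seek design suggests. All names in it are hypothetical.

```python
# Back-of-the-envelope check of the stated targets.
# Assumptions (not from the project itself): decimal units,
# perfectly even load across disks, one seek per transaction.

DISK_COUNT = 1_000_000             # "one million" disks
DISK_CAPACITY_BYTES = 20 * 10**9   # 20 GB per disk, decimal gigabytes
TARGET_TPS = 30_000_000            # "in excess of 30 million" transactions/s
SEEKS_PER_TRANSACTION = 1          # single-seek URN resolution


def capacity_petabytes(disk_count, disk_capacity_bytes):
    """Total raw capacity in decimal petabytes (10**15 bytes)."""
    return disk_count * disk_capacity_bytes / 10**15


def per_disk_tps(target_tps, disk_count):
    """Transactions each disk must serve per second under even load."""
    return target_tps / disk_count


def seek_budget_ms(tps_per_disk, seeks_per_transaction):
    """Average time available per seek, in milliseconds."""
    return 1000.0 / (tps_per_disk * seeks_per_transaction)


if __name__ == "__main__":
    total_pb = capacity_petabytes(DISK_COUNT, DISK_CAPACITY_BYTES)
    tps_each = per_disk_tps(TARGET_TPS, DISK_COUNT)
    budget = seek_budget_ms(tps_each, SEEKS_PER_TRANSACTION)
    print(f"raw capacity: {total_pb:.0f} PB")
    print(f"load per disk: {tps_each:.0f} transactions/s")
    print(f"seek budget: {budget:.1f} ms")
```

Under these assumptions, one million disks supply the 20 petabytes exactly, and each disk must sustain about 30 transactions per second, leaving roughly 33 ms per seek. That budget shows why a design needing more than one seek per transaction would require proportionally more disks to reach the same rate.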

This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.