P2P and the W3C

See also Peer-to-Peer vs. Client-Server and a related thread on w3t@w3.org.

Introduction

Peer-to-peer (P2P) technology is destined to play a highly significant role in the near future. Protocols currently in development constitute a fundamental reworking of the client-server model upon which the Internet of today has largely been built. This document outlines some of the more interesting features and technologies of P2P networks, and considers their relevance to the W3C. In particular, it shows how some of the problems commonly encountered on the Web are resolved in a well-designed peer-based network.

URI Persistence

One of the biggest problems with the Web, and a problem that the W3C has struggled to address, is that of URI persistence. We've even promoted a maxim to get the point across: "Cool URIs don't break". But despite our efforts, it is unlikely that we will ever win this battle. Even if we manage to convince the world to standardize on redirects and date spaces, we will never be able to prevent businesses from shutting down, organizations from running out of money, or people from dying. As long as a client-server model serves as the Web's foundation, URIs are going to break.

On a well-designed P2P network, however, links effectively never break. Since content is duplicated as it is requested, it remains on the network for as long as people continue to be interested in it. Less popular files may take longer to retrieve (more nodes must be queried before the file is found), but as long as even one individual periodically requests a given resource, it will always be available to the rest of the network. A given resource may ultimately be lost due to lack of requests (given finite disk space, some means of garbage collection must exist), but in practice this should not happen until interest in the resource has long since passed.
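
By way of illustration, the following sketch (in Python, with an invented Node class and URI scheme; it does not describe any real P2P implementation) shows how a resource can be cached at every node that handles a request, so that popular content replicates with demand and survives the loss of its original host:

    # Hypothetical sketch: a node that caches every resource it retrieves on
    # behalf of a peer, so content spreads with demand. Not a real protocol.

    class Node:
        def __init__(self, name, neighbors=None):
            self.name = name
            self.neighbors = neighbors or []   # other Node objects
            self.store = {}                    # local cache: uri -> content

        def request(self, uri, visited=None):
            """Return the content for uri, caching a copy locally if found."""
            visited = visited or set()
            visited.add(self.name)
            if uri in self.store:              # already held locally
                return self.store[uri]
            for peer in self.neighbors:        # otherwise ask neighboring nodes
                if peer.name in visited:
                    continue
                content = peer.request(uri, visited)
                if content is not None:
                    self.store[uri] = content  # cache as it passes back through
                    return content
            return None

    # After A requests a file held only by C, copies exist on A and B as well,
    # so the resource survives even if C later leaves the network.
    c = Node("C"); c.store["p2p://example"] = b"some content"
    b = Node("B", [c])
    a = Node("A", [b])
    assert a.request("p2p://example") == b"some content"
    assert "p2p://example" in a.store and "p2p://example" in b.store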

Hash Keys and Content Integrity

Content Hash Keys (CHKs) are loosely similar to date spaces, a somewhat arbitrary URI naming convention used to promote URI persistence. Unlike date spaces, however, hash keys are not arbitrary at all; they are statistically unique character strings generated from the content of the resource in question. Files inserted into a P2P network under names that correspond to a hash of their content gain content persistence as well as URI persistence. If the content of a file changes, it no longer has the same name, so existing URIs continue to refer to the original file rather than the modified one. This solves another problem commonly encountered on the Web, where the contents of a URI change unpredictably, sometimes to the detriment of those making the request. (Incidentally, content hashing is essential on a network where data may pass through many hands before reaching its final destination. If there were no way to verify data integrity, the network would quickly find itself at the mercy of malicious nodes and unreliable connections.)
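
The idea can be made concrete with a short sketch. SHA-1 and the "chk@" URI form below are assumptions chosen for the example, not the exact key format of any particular network; the point is simply that the name is derived from the bytes themselves and can be recomputed by any peer to verify what it received:

    # Illustrative sketch: derive a content hash key from a file's bytes.
    # SHA-1 and the "chk@" URI form are assumptions made for this example.

    import hashlib

    def content_hash_key(data: bytes) -> str:
        """Name a resource by a hash of its own content."""
        return "chk@" + hashlib.sha1(data).hexdigest()

    def verify(uri: str, data: bytes) -> bool:
        """A receiving peer recomputes the hash to check data integrity."""
        return content_hash_key(data) == uri

    original = b"Version 1 of the document."
    modified = b"Version 2 of the document."

    key = content_hash_key(original)
    print(key)                          # the key for version 1
    print(content_hash_key(modified))   # an entirely different key

    assert verify(key, original)        # intact content checks out
    assert not verify(key, modified)    # altered content is detected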

Public Keys and Private Domains

A second file naming scheme seen on P2P networks uses public key cryptography to create "namespaces" under which content can be arranged hierarchically, similar to filesystem directories. This is done to address the immutability of content under the CHK naming convention. If a source can be trusted, one may want to use a URI that can reference mutable content. It may also be desirable to search for other content that has been inserted under the same public key (implying the content has the same author). Public key content insertion allows authors to stake out unique "domains" in a P2P network, and allows peers to build a database of trusted content providers. This addresses one further shortcoming of the Web: Short of cracking someone's public key, there is no way to perform the equivalent of DNS spoofing in a P2P network.
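
The following sketch illustrates the principle using RSA signatures via the Python cryptography package; the key type, signature scheme, and "ssk@" URI form are assumptions made for this example rather than the mechanism of any specific P2P network:

    # Illustrative sketch of a signed "namespace": content published under a
    # public key can be updated by its author, and any peer can check that an
    # update really came from the key's holder. The key type, padding scheme,
    # and "ssk@" URI form are assumptions for this example only.

    import hashlib
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    author_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = author_key.public_key()

    def namespace_uri(pub, path: str) -> str:
        """Name content by a fingerprint of the author's public key plus a path."""
        der = pub.public_bytes(serialization.Encoding.DER,
                               serialization.PublicFormat.SubjectPublicKeyInfo)
        return "ssk@" + hashlib.sha1(der).hexdigest() + "/" + path

    def sign(content: bytes) -> bytes:
        return author_key.sign(content, padding.PKCS1v15(), hashes.SHA256())

    def verify(pub, content: bytes, signature: bytes) -> bool:
        try:
            pub.verify(signature, content, padding.PKCS1v15(), hashes.SHA256())
            return True
        except Exception:
            return False

    uri = namespace_uri(public_key, "articles/p2p-and-the-w3c")
    update = b"revised edition of the article"
    signature = sign(update)

    assert verify(public_key, update, signature)     # accepted: signed by the author
    assert not verify(public_key, b"forged edition", signature)   # rejected otherwise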

The Anarchist's Internet

Though the term may have negative connotations, Internet anarchy is really about giving power to individuals and small organizations, leveling the playing field with corporate monoliths. Because of the costs associated with maintaining a site with high storage and/or bandwidth requirements, it is difficult to compete against entities with vast sums of money at their disposal. On a P2P network, however, everyone is equal in this regard. Ample disk space and bandwidth are available to anyone with something interesting to say, not just those who can afford them. A user with no more than a dial-up connection to a local ISP can disseminate information as effectively as a mega-corporation with mirrors around the globe.

Security Through Obscurity

There are many ways of compromising an individual's rights to security and privacy on the Web, largely because it is very easy to identify the origin and destination of the content being transferred. Individuals have essentially no way of preventing determined parties from tracking the content they request, nor any well-established way of providing content without fear of being identified. Under the less autocratic governments enjoyed by most W3C members, we may not see the danger in this, but despotic speech and press restrictions still exist in many countries around the world.

In contrast to the client-server Web, peer-based distribution models are capable of heavily obscuring the data transfer process. This is accomplished by having content requests forwarded from peer to peer, and then having the content passed back through every node in the chain. For example, a malicious individual intercepting a content transfer between two nodes, say D -> E, would not be able to conclude that E made the request, or that D was the content provider. From a network traffic standpoint it is much more likely that some remote node Z requested the resource, node A provided it, and D and E were simply unknowing middlemen. [Note: If nodes make no data checks, or if data is encrypted, holding D or E responsible for the content being transferred is equivalent to holding a mail carrier responsible for the contents of the mail. Still, it has been theorized that sophisticated network analysis could potentially reveal providers and/or requesters under this model, and there are efforts to find ways of thwarting such attacks.]
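
A toy model of the forwarding chain, with an invented Node class and a fixed route chosen purely for clarity (no real routing algorithm is implied), may make the point concrete:

    # Hypothetical sketch of chained request forwarding. Each node knows only
    # the neighbor that handed it the request, never the original requester or
    # the ultimate provider, and content is passed back through every hop.

    class Node:
        def __init__(self, name, store=None):
            self.name = name
            self.store = store or {}
            self.next_hop = None        # fixed forwarding chain, for simplicity

        def handle(self, uri, came_from):
            # This node cannot tell whether `came_from` originated the request
            # or is merely relaying it on someone else's behalf.
            print(f"{self.name}: request for {uri} received from {came_from}")
            if uri in self.store:
                return self.store[uri]
            if self.next_hop is None:
                return None
            return self.next_hop.handle(uri, came_from=self.name)

    # Request path Z -> E -> D -> A; only A actually holds the resource, and
    # the data flows back A -> D -> E -> Z through every intermediate node.
    a = Node("A", {"p2p://doc": b"payload"})
    d, e, z = Node("D"), Node("E"), Node("Z")
    z.next_hop, e.next_hop, d.next_hop = e, d, a

    # An observer watching only the D -> E transfer cannot conclude that E
    # made the request or that D provided the content: here Z requested it
    # and A supplied it.
    assert z.handle("p2p://doc", came_from="Z's own user") == b"payload"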

Conclusion

Recent P2P developments are bringing the technology very close to the "breakthrough threshold" where it will begin to gain wide public acceptance. Much of this will be due to the significant advantages P2P offers over the traditional Web. Some of the Web's most obstinate problems that we try to address at the W3C have already been solved in P2P networks, despite their relative immaturity. But the sophistication of recent P2P developments has remained below the horizon of the major standards bodies, the W3C included.

It has been brought up in other discussions that the W3C might serve as a host to third party projects and standardization efforts. Considering that it is unlikely we would be able to form an in-house P2P development team that could compete with the grassroots open-source efforts, this may be an ideal opportunity to test the feasibility of bringing one or more existing external projects under the auspices of the W3C. If we were to act early, we would have the opportunity to approach the groups of our choice; and if we were successful with our persuasions, the W3C would establish a strong presence in the nascent P2P standards and development arenas.


The technical information above was gleaned from numerous sources, though mostly from the mailing list and web page of The Freenet Project, where some of the most sophisticated P2P developments are currently underway. W3C team members might find particular interest in the Freenet Protocol.


Michael Carmack, 2001/05/16