Technical and Privacy Challenges for Integrating FOAF into Existing Applications

Abstract

The appeal of Friend-of-a-Friend (FOAF)-enabled applications, and thus to a large extent the success of FOAF itself, depends on having enough people creating and maintaining FOAF files that the network achieves critical mass. While initial FOAF adoption by individual users on their Web sites (or using certain publishing tools) has been encouraging, chances for widespread adoption will be greatly increased if existing companies and organizations with large user bases can be persuaded to offer FOAF support for their existing users' data. In particular, social networking and contact management services represent an important target for the FOAF community because they are building and maintaining exactly the set of data that FOAF describes (personal information and links to other people). It is thus vital that we understand what hurdles these services face when considering the addition of FOAF-enabled features.

In this paper, we examine a number of barriers to adoption which derive from specific technical and privacy issues facing organizations that are considering integrating FOAF support into their current applications. The issues discussed here emerged from research and engineering efforts with FOAF at a startup (Plaxo, Inc.) and represent both practical problems and ethical dilemmas, each of which will need to be addressed by any organization considering the use of FOAF. We believe that widespread adoption of FOAF will only be possible once organizations have a roadmap that is both technically clear and economically and ethically sound. The goal of this paper is thus to articulate the issues, discuss some possible responses, and elicit community feedback in order to craft real solutions.

Introduction

The Friend-of-a-Friend (FOAF) project is an instance of a popular and growing trend in Internet technology development towards open, machine-readable access to data and computer programs distributed across the world. The general idea is that if it's easy to get programs to talk to other programs and share data, without requiring that the engineers of each program form an explicit partnership, a whole new class of "Semantic Web" services can be built that automatically stitch together disparate aspects of life. This in theory should decrease friction and redundancy in the user experience (e.g., having to enter the same information twice for two different services) and enable novel extensions to existing services to be built in a distributed fashion (e.g., visualizing your contact list as a network).

In the case of FOAF, the innovation is to provide a standard format (based on RDF) for describing a person's contact information, as well as his/her relationships to other people. (FOAF is also used for describing a person's projects and other material, but for the purposes of this paper we will focus on contact information and links to other people, since this is the core data that enables a network of people to be assembled.) In addition to providing your contact information in the form of a business card or on your Web page, if you make it available in FOAF, other programs can find and interpret it automatically. This would make it easy to pop up a quick biographical sketch of a person anytime you visit a page they made, or to add that person's contact information to your address book. In addition, if you list your friends and associates in your FOAF file, programs can go find these people, and if they too use FOAF, those programs can show you their contact info and their friends, and so on across an emergent worldwide social network.

Proprietary social networks like Friendster, Orkut, LinkedIn, and others have attracted millions of users to fill out a personal profile and link to their friends, usually with the objective of dating or business networking. The success of these networks depends on having enough people join that you can find people you're interested in—a small network holds little value because you don't know or care about any of the people on it. Thus as new networking services are released, there is pressure to grow them as quickly as possible. Given the time-consuming and repetitive task of entering your contact information and friend list into yet-another-social-networking-service (YASNS), people have called upon these sites to offer import and export of member data in a standard format such as FOAF. In February 2004, tribe.net announced their intention to support FOAF, which sparked a fervor of additional calls for similar services to follow suit.

In theory, offering FOAF support for a social networking service could have several benefits in addition to easing user acquisition (input). By allowing members to publish their data in FOAF (export), third parties could build additional features that leverage and extend the value of the original network. Just as many services now offer open APIs to support the development of plug-ins, opening the data on social networks would let people build new and exciting applications that members could immediately take advantage of.

The author of this paper is a senior software engineer at Plaxo, a company that helps people keep their address books up-to-date by automatically synchronizing contact information between friends and associates. Members each maintain their own contact information and decide with whom they want to share it. If Todd has permission to see Ryan's home information, for example, then when Ryan gets a new apartment and updates his Plaxo cards, Todd's address book will automatically be updated with Ryan's new address. Plaxo is not strictly a social networking service (though we're often considered one): like most social networking services, members provide Plaxo with their own contact information and their address book, but unlike most of these sites, a member's address book is kept strictly private (since it contains potentially sensitive contact information) and thus there is no social browsing from friend to friend.

At Plaxo, we've long been interested in the FOAF project, since Plaxo maintains exactly the data that FOAF describes, and our goal is to create a global and ubiquitous contact network that is available in all applications requiring contact information. The frustration mentioned above at having to enter contact information over and over again is precisely the problem that Plaxo is designed to solve. And while initially we chose to focus on integrating with Microsoft Outlook and Outlook Express because of their large market share, we would like people to be able to Plaxo-enable all of their favorite applications. In support of this vision, we created a SOAP API that partners can use to talk to the Plaxo network. We also created a number of internal prototypes that use FOAF to respond to contact information requests (input) and publish members' contact information and address books on the Web (output). These prototypes quickly raised a number of technical issues and privacy concerns for which we couldn't find easy answers, and which prevented us from releasing most of this work publicly (though we have released limited FOAF output for Plaxo members).

After Tribe's announcement renewed the clamor for companies to support FOAF, we decided to ask the FOAF community for advice on how to solve the problems we had encountered. We published an article on Plaxo's Blog entitled "Plaxo and FOAF: What's the right model?" which started a lively and productive discussion. The goal of this paper is to elaborate on the issues that we found while trying to answer the rallying cry of the FOAF supporters and to continue discussing solutions.

In short, our conclusion is that in order to get more companies to answer that cry, the FOAF community should work to develop a practical "FOAF integration guide" that explains how to add FOAF-enabled features and provides a compelling case for doing so. This involves addressing both technical challenges (how to make it work) as well as privacy concerns (how to make it acceptable). The balance of this paper explores these issues in turn, concluding with a discussion of their implications for the companies and for FOAF itself.

Technical Challenges

Even if companies understand the potential benefits of supporting FOAF, there are still a number of technical issues that must be addressed before support becomes practical. These issues range in scale from logistic details to additional infrastructure requirements. Confronting them is a necessary part of "scaling up" FOAF to tackle today's large scale applications of social software.

Extensibility

While the data Plaxo stores for each member closely resembles the data stored in a FOAF file, Plaxo members store more contact fields in their cards than FOAF currently supports (examples include job title and birthday). Plaxo stores essentially the same set of contact fields used by Microsoft Outlook and the standard vCard format (it's worth noting that Plaxo members can publish their contact information as a vCard that always stays up-to-date). When importing or exporting FOAF, we try to map the contact fields that have an equivalent representation (e.g. mbox, phone, and workplaceHomepage), but ideally members should be able to preserve the full richness of their contact information (and the contact information in their address book) when moving in and out of FOAF. While the FOAF specification could be amended to store additional contact fields, the more general problem is that any service will have some of its own special fields that aren't covered in the standard. The solution is either to adopt a lowest-common-denominator approach in which extra information is simply lost, or to provide a standard mechanism for extending FOAF in such a way that this information can be preserved and eventually understood by additional services.

Since FOAF is based on RDF, and commonly represented in XML documents that can mix tags from multiple namespaces, in principle this should be possible. For example, there's a W3C submission on representing vCards in RDF that's used by eventSherpa (among others) to augment the FOAF files they generate for members (see, for example, Paul Cowles's profile). However, the proposed schema appears not to have been updated since February 2001, and it has not been adopted as a recommended standard yet. There is also ContactML, but it too appears to have limited support and no official status.
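To make the idea of mixing namespaces concrete, here is a minimal Python sketch, not Plaxo's actual exporter. It maps the contact fields that have a FOAF equivalent and keeps everything else in a separate extension namespace rather than dropping it. The internal field names, the EXT namespace URI, and the card_to_foaf helper are assumptions made up for illustration; only the FOAF and RDF namespace URIs and the FOAF property names are real.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
EXT = "http://example.com/contact-ext#"  # hypothetical extension namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)
ET.register_namespace("ext", EXT)

# Hypothetical internal field names mapped to FOAF property names.
FOAF_FIELDS = {
    "name": "name",
    "email": "mbox",
    "work_phone": "phone",
    "work_url": "workplaceHomepage",
}

# FOAF properties whose value is a resource URI, with the URI prefix to add.
RESOURCE_PREFIX = {"mbox": "mailto:", "phone": "tel:", "workplaceHomepage": ""}


def card_to_foaf(card):
    """Serialize a dict of contact fields as RDF/XML, preserving unmapped fields."""
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF}}}Person")
    for field, value in card.items():
        prop = FOAF_FIELDS.get(field)
        if prop is None:
            # No FOAF equivalent: keep the field under the extension namespace.
            ET.SubElement(person, f"{{{EXT}}}{field}").text = value
        elif prop in RESOURCE_PREFIX:
            element = ET.SubElement(person, f"{{{FOAF}}}{prop}")
            element.set(f"{{{RDF}}}resource", RESOURCE_PREFIX[prop] + value)
        else:
            ET.SubElement(person, f"{{{FOAF}}}{prop}").text = value
    return ET.tostring(root, encoding="unicode")


print(card_to_foaf({"name": "Ryan", "email": "ryan@example.com", "job_title": "Engineer"}))
```

A consumer that understands only FOAF can ignore the ext: elements, while a service that knows the extension can round-trip the full card. The choice of a private namespace here is exactly the kind of ad hoc decision that an agreed repository of extensions would replace.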

To address extensibility, we believe the FOAF community should do two things. First, it should establish a set of "best practices" and examples of how to extend FOAF when additional data needs to be represented. If there are clear limits to what data people should and shouldn't try to add to a FOAF file, those should be articulated. Second, the community should maintain a repository for common extensions to FOAF, so that if a number of organizations want to represent a similar set of additional fields, those fields can either be added to the FOAF standard itself, or else standard extensions can be agreed upon so that sites can continue to read each other's output. To ignore the need for extensibility and instead focus only on sharing the current information expressible in FOAF would be a mistake, as it would prevent full integration with existing services and stifle innovation. However, to go about extending FOAF in a haphazard and disorganized manner would risk losing the standard format that makes FOAF exciting in the first place.

Permissions

FOAF files today are static and public—users decide what information to include and everyone in the world can see that information. In contrast, Plaxo members can set up detailed permissions about which users can see what of their information. Most members allow people that already know their e-mail address to see their public (business) information, but restrict access to their private (home) information to a select group of friends and associates to which they grant explicit permission. Furthermore, the contents of a member's address book are completely hidden from other members, though one could also imagine letting members opt to share their address book with trusted contacts. Given the strong emphasis that Plaxo places on protecting its members' privacy and giving them complete control over what information is shared and with whom, it is difficult to release much if any of their data as open FOAF files available to any Web surfer.

For Plaxo's feature that lets people publish their contact information on a Web page (mentioned above), members check whether to publish their business information or personal information (or both) and a URL is generated with a special key that will only display the requested information. This is also how HowdyCard handles privacy: users can keep several cards, each with their own password/URL to hand out to qualified recipients. A similar approach could be taken for publishing FOAF files, but it has a number of drawbacks. First, it results in several different URLs being generated to share different amounts of information (one for business information, another for personal information, etc.). This is cumbersome and potentially confusing, both for users ("which URLs do I give to whom?") and for computer programs like aggregators that might use the data ("how do these files relate to one another?"). Second, there is nothing directly preventing unauthorized users from viewing the protected information; it is up to members to keep their private URLs private, but this can be difficult when trying to share them with friends over e-mail and the Web, which are generally insecure and prone to leaving a trail.
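The keyed-URL approach can be sketched in a few lines of Python. This is an illustration under assumed names (the KeyedCardPublisher class, the cards.example.com host, and the section labels are all hypothetical), not a description of how Plaxo or HowdyCard implement it. It shows both drawbacks directly: each combination of sections yields a different URL, and anyone holding the URL gets the data.

```python
import secrets


class KeyedCardPublisher:
    """Hands out one unguessable URL per published view of a member's card."""

    def __init__(self):
        # Maps each secret key to (member_id, sections that key unlocks).
        self._views = {}

    def publish(self, member_id, sections):
        key = secrets.token_urlsafe(16)  # random, hard-to-guess token
        self._views[key] = (member_id, frozenset(sections))
        return f"https://cards.example.com/{member_id}?key={key}"

    def sections_for(self, member_id, key):
        """Return the sections a key unlocks, or nothing if the key is unknown."""
        entry = self._views.get(key)
        if entry is None or entry[0] != member_id:
            return frozenset()
        return entry[1]


publisher = KeyedCardPublisher()
business_url = publisher.publish("ryan", ["business"])
full_url = publisher.publish("ryan", ["business", "personal"])
```

Note that sections_for never asks who is making the request. Possession of the key is the only check, which is why a forwarded e-mail or a logged URL is enough to leak the protected view.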

One solution that has been proposed by useful inc. is to encrypt sensitive information using PGP and to store it in a separate file that is linked to the main FOAF file using the rdfs:seeAlso relation and the wot vocabulary. The idea is that anyone can see the public information, but only people with the right private keys can decrypt the additional sensitive information, and there is only one URL for everyone (since the extra information is linked to from within the main document). This is a clever technique, but it requires that everyone who wants access to the private information get a public/private key pair and publish the public key in advance. Furthermore the sensitive information needs to be encrypted using all of the public keys of would-be recipients. This means that if you want to give a new person access to your personal information, you need to get their public key and re-encrypt your sensitive file with this additional key. An alternative might be to use a single private key as a "password" and to give this key to trusted contacts, but while this simplifies publishing and permissioning, it ends up being more like the original scheme of distributing secret URLs (which are essentially passwords) and relying on security through obscurity.

Like extensibility, adding support for permissions and restricted viewing is essential for breaking FOAF out of a one-size-fits-all model and allowing more complex services to embrace FOAF without compromising data richness or privacy. We don't have a clear solution at this time and community feedback will be essential in deciding how to leverage existing technologies and how much responsibility should be placed on FOAF itself.

Authentication

The main difficulty in adding a permission model to FOAF is that there's no authentication of the person viewing a given FOAF file. Since FOAF files are static Web pages, they're available to any person (or computer program) that happens upon them, and there's no way to look up who's requesting to see a member's FOAF file and what permissions they've been granted. Within Plaxo, all our members are authenticated by their e-mail addresses (which are verified with a round-trip) and they each have a password, so when one member wants to look at another member's contact information, Plaxo knows exactly what information to show. Thus Plaxo internally supports restricted access to information without requiring extra URLs, private keys, or other shared passwords. If Plaxo were to let members publish their contact information using FOAF, we could provide a single URL for each member and require that other members log in before getting access to the FOAF file. Once a viewer had logged in, we could dynamically generate the FOAF file with the information that viewer was authorized to see.
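As a rough sketch of what generating a FOAF file per viewer might involve, the Python below selects the fields an authenticated viewer may see before any serialization happens. The Member class, its field groupings, and the fields_visible_to helper are hypothetical and much simpler than a real permission model; in particular, this sketch assumes any authenticated viewer may see business information, whereas the text above ties that to already knowing the member's e-mail address.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    email: str
    open_fields: dict = field(default_factory=dict)  # visible to anyone
    business: dict = field(default_factory=dict)     # visible to authenticated viewers
    personal: dict = field(default_factory=dict)     # visible only to granted viewers
    personal_grants: set = field(default_factory=set)  # e-mails granted personal access


def fields_visible_to(owner, viewer_email):
    """Return the owner's fields that this viewer may see.

    viewer_email is None for an unauthenticated request.
    """
    visible = dict(owner.open_fields)
    if viewer_email is None:
        return visible
    visible.update(owner.business)
    if viewer_email in owner.personal_grants:
        visible.update(owner.personal)
    return visible


ryan = Member(
    email="ryan@example.com",
    open_fields={"name": "Ryan"},
    business={"work_phone": "555-0100"},
    personal={"home_phone": "555-0199"},
    personal_grants={"todd@example.com"},
)
print(fields_visible_to(ryan, "todd@example.com"))
```

The single URL stays the same for every viewer; only the result of fields_visible_to changes. The hard part, which this sketch takes for granted, is knowing that viewer_email has actually been verified.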

This is fine for Plaxo members viewing each other's contact information, but the premise of FOAF is that it's an open and distributed standard in which no one company or organization is in control. If a non-Plaxo member wants to view a member's FOAF file, Plaxo has no choice but to just present public information. What's needed is an authentication scheme that's as distributed as FOAF itself—one in which users can grant each other permissions in a standard format, and people that want access to private information first authenticate using a standard mechanism. The complex and well-studied technical details of distributed authentication schemes are outside of the scope of this paper, but see for instance the Liberty Alliance project or Drupal. In any case, some form of distributed authentication is required to properly enable permissions on top of FOAF files.

One interesting approach to distributed permissions for contact information is offered by clink systems. Each user is identified by a "contact link" (or "clink" for short), which is essentially a URL like joseph.plaxo.com. Rather than being granted by a central authority, anyone can make a clink for themselves using their own domain. Users grant each other permissions by attaching each other's clinks to their contact info. So to reuse the example above, Ryan can give Todd permission to his contact info by attaching Todd's clink to it. When Todd requests Ryan's information, his clink is there so he is granted access (assuming that both Todd and Ryan are using clink-enabled servers). Todd still has to log in to a clink server to access any data that's been granted to him (authentication is handled via public-key encryption) so he can distribute his clink without compromising his own privacy. This is essentially a simplified and customized form of PKI authentication, but it still requires that everyone generate and distribute clinks (that don't change) and maintain private passwords. It also requires that everyone hosting sensitive data run a clink server to handle authentication. It is thus somewhere in between a data standard like FOAF and a web service like an API.

Privacy Challenges

While the technical logistics of extending FOAF and adding authenticated permission controls are certainly important and need to be worked out in more detail, the more fundamental issue holding back widespread adoption of FOAF is privacy. As described above, FOAF files are inherently public and essentially make accessible to everyone a person's contact information and address book. Many current social networking and related services have found that their customers regard some of this information as highly sensitive, and thus provide members with a great deal of control over what gets shared with whom. For example, Orkut lets members restrict most of their information to just their list of friends, or additionally all friends of friends (as opposed to making it available to everyone). Different fields can be marked with different levels of permission, so members can decide individually what information is particularly sensitive. Similarly, Plaxo lets members decide separately who has access to their work information and their home information, and a member's address book (both who's in it and their respective contact information) is kept strictly private. These privacy safeguards are essential for users to trust and feel comfortable using these services. While the previous section discussed technical challenges in implementing a permission system on top of FOAF, this section discusses what's at stake with sharing data in the first place.
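The per-field permission levels described above (friends, friends of friends, everyone) can be illustrated with a short Python sketch. This is only a guess at the general shape of such a check, not Orkut's implementation; the level names, the friends dictionary, and both helper functions are assumptions.

```python
# Which relationships to the profile owner satisfy each permission level.
ALLOWED = {
    "everyone": {"self", "friend", "friend_of_friend", "stranger"},
    "friends_of_friends": {"self", "friend", "friend_of_friend"},
    "friends": {"self", "friend"},
}


def relationship(friends, owner, viewer):
    """Classify the viewer relative to the owner using a friend-list mapping."""
    if viewer == owner:
        return "self"
    direct = friends.get(owner, set())
    if viewer in direct:
        return "friend"
    if any(viewer in friends.get(friend, set()) for friend in direct):
        return "friend_of_friend"
    return "stranger"


def visible_profile(profile, friends, owner, viewer):
    """Filter a profile of {field: (value, level)} down to what the viewer may see."""
    rel = relationship(friends, owner, viewer)
    return {
        name: value
        for name, (value, level) in profile.items()
        if rel in ALLOWED[level]
    }


friends = {"ryan": {"todd"}, "todd": {"ryan", "ann"}, "ann": {"todd"}}
profile = {
    "name": ("Ryan", "everyone"),
    "city": ("Palo Alto", "friends_of_friends"),
    "phone": ("555-0100", "friends"),
}
print(visible_profile(profile, friends, "ryan", "ann"))
```

A static FOAF file has no equivalent of the viewer argument, so it can only ever express the "everyone" level.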

Deciding how much to share

Currently Plaxo, and organizations with similar privacy standards, would be unable to publish much of their members' data as FOAF—particularly the contents of members' address books—without violating their own privacy policies. FOAF files without any foaf:knows links are of limited value since they are isolated nodes that cannot be connected to the larger network. Even if we only published biographical data in FOAF, it would still be difficult to automatically release much information, since even what members designate "public" is currently understood to mean "available to other authenticated members of Plaxo that already know my e-mail address". This is a much stricter standard than "anyone with a Web browser". Each service has its own privacy policy, but few if any would allow member data to be released publicly without explicit user consent. Thus FOAF-support will likely need to be an opt-in feature, in which little or no information is made available by default, but members can elect to share more information if they see it as valuable.

While requiring users to opt-in to making their information available as FOAF is necessary to ensure privacy, it introduces a number of additional challenges and drawbacks. First, it will realistically mean that only a small fraction of the network will end up with FOAF files, since opting-in requires learning about the feature, understanding the benefits, and going through an activation process. At Plaxo, we have found that most users tend to keep the default settings we provide, and relatively few "power-users" spend a lot of time exploring the other options that are available. Given that the impetus for getting major social networking sites to support FOAF is to dramatically increase the number of available FOAF files, the challenge is to protect privacy without destroying the value that FOAF adoption was supposed to create in the first place.

Even if members are told that they can publish their information as FOAF, it may be difficult to explain the immediate benefit of doing so. Currently there are very few compelling applications that take advantage of FOAF, largely because there are very few FOAF users driving such applications to be built. This is a classic chicken-and-egg problem, in which the draw to get users to embrace FOAF depends in a sense on their having already embraced it. Furthermore, every feature that is added to a service is a risk: it complicates the interface, dilutes the existing feature set, and requires explanation and support. While awareness of FOAF has spread quickly in its short life, in a mass consumer service it is a safe assumption that the vast majority of users will have never heard of it. Thus it will be asking a lot of companies to potentially compromise their members' privacy—even if it just means asking their users to opt-in to such a service—given that the take-rate and immediate benefits are both likely to be low.

Data ownership vs. privacy

The conclusion from above is that in order to protect privacy, users will generally be required to opt-in to sharing their contact information and address books using FOAF. However, even getting the consent of the address book owner is not enough to remove privacy from the equation. If members publish the names and contact information of people in their address books, they are also potentially compromising the privacy of those contacts, who were never given the opportunity to object. There is a subtle but important tension between the rights of the data owner (in this case the owner of an address book) and the rights of the person whom the data describes. For example, if Todd wants to publish Ryan's contact information on the Web using FOAF, Ryan may object on the grounds that his privacy is being compromised. While his objection is understandable, should Ryan be able to tell Todd what he can and can't do with his own address book? Both parties are somewhat entitled, and clearly it can't be both ways—the decision must ultimately fall to either the owner of the address book or to the person contained therein.

According to US law, the information in a person's address book is the property of the owner of the address book and not the property of the individuals to whom the information pertains. In the case of Plaxo, no one can demand that their information be removed from someone else's address book: once it's there, it's the property of the owner. To make a physical analogy, once you give someone your business card, they can do with it what they want, and you can't demand it back. It seems ludicrous to suggest that you should be able to break into someone's house and steal back your business card, but in the digital world it becomes less clear, and some have called for an alternative scheme in which the person described by a piece of data is always in control of it.

At Plaxo, we sometimes get requests from a non-member that we remove all copies of his/her information from our members' address books on the grounds that he/she never consented to our storing of it in the first place. Our standard response is to remind the requestor of US data ownership law, but as a courtesy we contact the members and request that they remove his/her data anyway. To date, our members have always been happy to accommodate such requests, so the conflict between ownership and privacy has remained fairly benign.

This delicate balance is upset if ownership of data now potentially entails publishing that data on the Web. When using Plaxo, members' address books are only available to themselves and to Plaxo (so we can synchronize them between multiple computers and provide secure online access). Thus the major objection of people asking for removal of their information is either ideological or out of mistrust for Plaxo as a company. It is not, in other words, a reaction to any harm currently being done, but only an attempt to prevent future harm. If members are allowed to share their address books as FOAF, suddenly non-members may find their personal contact information being published on the Web, which can be demonstrably harmful. While it wouldn't technically break any laws for Plaxo to let members publish their address books on the Web, it would likely exacerbate a sensitive privacy concern, and thus we are wary of doing so.

The challenge for the FOAF community is to articulate a middle path by which people can create and publish FOAF files, including the list of people they know, while respecting the privacy of those people whose information is released in the process. For example, members of Ecademy can publish their information and contact list in FOAF, but the only information that gets included about other members is their name and e-mail address—enough to establish a link, but not enough to give away sensitive contact information. In fact, even the e-mail address is encrypted using a one-way hash (SHA1). This makes it possible to look up a given e-mail address in a member's FOAF contact list (by comparing the hashed addresses) but the original e-mail address can't be recovered by a stranger and used for spamming. The advantage of this approach is that each member decides how much information about themselves to share, and the foaf:knows links are only used to "wire up" the network. The disadvantage is that members can't fully export their address books in FOAF—they lose all the extra contact fields in the process.
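The hashed lookup described above is easy to sketch in Python. The FOAF property for this is foaf:mbox_sha1sum, which is the SHA1 digest of the mailbox URI including its "mailto:" prefix. The function names and example addresses below are assumptions for illustration, and nothing here reflects how Ecademy actually implements its export.

```python
import hashlib


def mbox_sha1sum(email):
    """Hash a mailbox the way foaf:mbox_sha1sum expects: SHA1 of the mailto: URI."""
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


def contact_list_contains(published_hashes, email):
    """Check whether a known address appears in a published list of hashes."""
    return mbox_sha1sum(email) in published_hashes


# What a member's exported contact list might expose: hashes only.
published = {mbox_sha1sum("ryan@example.com"), mbox_sha1sum("todd@example.com")}

print(contact_list_contains(published, "ryan@example.com"))
print(contact_list_contains(published, "someone.else@example.com"))
```

Someone who already knows an address can confirm the link, while someone who sees only the hash cannot read the address back out of it. The protection does depend on the address not being easy to guess, since anyone can hash candidate addresses and compare.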

E-mail vs. SHA1

The sensitivity of publishing e-mail addresses is particularly high given the current prevalence of spam. An early innovation of FOAF was to offer SHA1-hashed e-mail addresses as an alternative to displaying them raw. As mentioned above, this is sufficient for verifying a link between two known e-mail addresses, but it prevents unknown addresses from being leaked. Clearly, however, e-mail addresses need to be shared at some point if people want to communicate with one another. Thus, a sub-challenge in sorting out when and how much information to display is developing a set of "best practices" for when to display raw addresses and when to hash them.

In the case of Plaxo, links between members are established by e-mail address (if I have your e-mail address in my address book and you join Plaxo and register that address, I can get your public information) so in all cases where data is being shared, the e-mail address is already known. Thus, we have not needed to hash or otherwise encrypt addresses, and in fact having raw addresses is critical to establishing connections. The compromise of Ecademy is to display raw addresses for users' own contact information (if they so desire), but to always use SHA1 for entries in their contact lists. This is basically avoiding the issue of ownership vs. privacy by side-stepping it (though someone could still complain that even their name was being published). But it follows the philosophy that a FOAF document is primarily about the author, and while the foaf:knows links can be used to establish relationships with other people, they are not intended to store their contact information.

If everyone has their own FOAF file, of course, there is no need to store information about other people because they are storing it themselves, but until this is the case, restricting the use of foaf:knows means deliberately dropping data that would otherwise be available. Many social networking sites require you to import your address book and store several contact fields for each entry. If FOAF is supposed to make it easy to maintain your data across several such sites, such a restriction may run counter to its original intent.

Conclusions

Widespread adoption of FOAF would be greatly aided if existing applications with large user bases could be convinced to let their users publish their contact information and address books using FOAF. Advocacy alone, for instance stating to these companies that they should embrace open standards or die, is unlikely to be sufficient because there are real technical and privacy issues involved in such a release of information. Thus, the best course of action for the FOAF community is to carefully consider these problems and develop a set of technical solutions and privacy-conscious best practices that they can clearly articulate. These include supporting extensibility, adding permissions with authentication, deciding what information should be shared in what circumstances, and respecting the rights and wishes of both the people who own the data and the people whom that data describes.

Developing a FOAF integration roadmap will do more than just make it easier for companies to offer FOAF support. It will also serve to direct the evolution of FOAF itself. The issues raised above suggest several underlying tensions in the design of FOAF. Is FOAF just a static data format, or could it evolve to be more like an API, including a mechanism for granting permissions and authenticating requestors? Should foaf:knows links be used only to point to other FOAF files, or can they safely be used to richly represent a user's address book? How should FOAF trade off the use of raw e-mail addresses and SHA1 sums? These questions require pragmatic answers for FOAF to transition from a research project to a mainstream technology. But they also offer novel research challenges that will provide the fuel for a new phase of experimentation.

Submitted: 22 July 2004 (revised 15 August) by Joseph Smarr (joseph-foaf@plaxo.com).