ICCS, School of Informatics
University of Edinburgh
Now that the Social Web has finally reached truly widespread adoption, the question remains: Why can't users have their data back? To flip the question, what could application developers and companies create if they had access to the masses of social data currently spread throughout the Web? Luckily, the building blocks that could open up these closed walls of data exist right now, from OpenID to OAuth to FOAF; all that is missing is a strategy for putting them together. As has been said before, you may not be interested in strategy, but strategy is interested in you. Or to be more particular, in your data.
Social data portability and privacy are usually viewed as opposing forces. Yet data portability and privacy are mutual benefits that a framework for a mature Social Web could bring users. A "walled garden of data" fundamentally leads to less security and privacy for users. The lack of portability does not imply privacy: the data of users may be data-mined, and is all too portable in another sense, since it can easily be sold to parties unknown to the users without their explicit knowledge. This lack of portability has also made it common practice for many social web services to ask users for direct access to their e-mail inboxes, a highly insecure practice that can easily lead to attacks. For a company, it is clearly less dangerous, both legally and from a business perspective, to get explicit permission from users before data-mining their data, sharing it with other companies, or revealing it in public on the Web. Furthermore, since the social networks of users are spread throughout the Web in a fragmentary manner, companies that wish to do better profile analysis in order to improve services like targeted advertising would also benefit from a standard for social data portability. There are two components to this story: the first is about how users provide authentication to services wanting their data, and the second is about how the social graph of the user is accessed and shared once authentication has been granted.
The authentication story asks: "How can a user employ a single login to securely access multiple services, in order to download, modify, and upload social data?" Luckily, these components already exist and are rapidly reaching widespread usage in the form of OpenID and OAuth. OpenID is a way for a user to "log in once" to a service provider and then have their log-in details verified by a trusted identity provider. (An OpenID username is a globally unique URI, which should sound familiar to some in the W3C, as Tim Berners-Lee once said, "Everything of importance deserves a URI. Go ahead and give yourself a URI. You deserve it!") OAuth is a standard way for developers to offer their services via an API without forcing their users to expose their passwords: instead of handing over a password, the user is explicitly asked whether a consuming service, such as Facebook, may retrieve their details from the provider. One great benefit of this approach is that it does not have to follow a "download" and "upload" model of social data; other applications can access social data without that data ever leaving its server, or have the social data "streamed" to them. Once access to a social network has been secured via OpenID and the authority of the user, the next question is: "What API should be used?"
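The essence of this delegated-access pattern can be illustrated with a minimal sketch. This is a toy model, not the actual OAuth protocol: all class and method names here (`Provider`, `grant`, `revoke`, `get_contacts`) are illustrative assumptions, and real OAuth adds request/access token exchanges and signatures. The point it demonstrates is that the consuming service holds only a revocable, scoped token, never the user's password.

```python
import secrets

# Toy sketch of OAuth-style delegated access: the consumer never sees
# the user's password, only a revocable, scoped token.
class Provider:
    def __init__(self):
        self._tokens = {}                          # token -> (user, scope)
        self._contacts = {"alice": ["bob", "carol"]}

    def grant(self, user, scope):
        """Called only after the user explicitly approves the request."""
        token = secrets.token_hex(8)
        self._tokens[token] = (user, scope)
        return token

    def revoke(self, token):
        """The user can withdraw access at any time."""
        self._tokens.pop(token, None)

    def get_contacts(self, token):
        user, scope = self._tokens.get(token, (None, None))
        if scope != "read-contacts":
            raise PermissionError("token missing or out of scope")
        return self._contacts[user]

provider = Provider()
token = provider.grant("alice", "read-contacts")  # user consents once
print(provider.get_contacts(token))               # consumer presents the token
provider.revoke(token)                            # access ends without a password change
```

Because the token is scoped, the consumer can read contacts but nothing else, and revoking it cuts off access without the user changing any credentials.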
The problem with social networking is that we are faced with a plethora of APIs. While some devices like mobile phones, which are strategically placed near the contact list, may still have the chance to standardize their social data API, many Web 2.0 services already have their own APIs, ranging from the plethora of "contacts" APIs available to more concerted interoperable social networking APIs such as OpenSocial. Furthermore, many services already allow import and export of their data using time-tested standards like vCard, and many Web sites now expose some portion of their data using microformats like XFN and hCard. Different sites also store different types of data in profiles and describe social networks in different ways. What is necessary is a way for the social graphs of all these services to become interoperable, without forcing the developers of any of these services to change their API or their preferred data format.
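A sketch of what such interoperability looks like in practice: two heterogeneous sources, a vCard-style record and an XFN-style hyperlink relationship, are mapped into one neutral set of (subject, property, value) statements without either source format changing. The field and property names below are illustrative assumptions, not any standard vocabulary.

```python
# Sketch: mapping two heterogeneous sources into one abstract graph of
# (subject, property, value) triples. Field names are illustrative.

def from_vcard(record):
    """Map a vCard-like record (FN, EMAIL fields) to abstract statements."""
    uri = record["uri"]
    triples = set()
    if "FN" in record:
        triples.add((uri, "name", record["FN"]))
    if "EMAIL" in record:
        triples.add((uri, "email", record["EMAIL"]))
    return triples

def from_xfn(source_uri, rel, target_uri):
    """XFN encodes relationships as rel attributes on hyperlinks."""
    if "friend" in rel.split():
        return {(source_uri, "knows", target_uri)}
    return set()

graph = (from_vcard({"uri": "http://example.org/alice",
                     "FN": "Alice", "EMAIL": "alice@example.org"})
         | from_xfn("http://example.org/alice", "friend met",
                    "http://example.org/bob"))
print(sorted(graph))
```

Each service keeps its native format; only the mapping into the shared abstract model has to be agreed on.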
Furthermore, one of the worst things that could happen to any process of social network standardization would be for a committee to get together and push out some ad-hoc "specification" of a new API, along with an arbitrarily delimited data model. The world of social networking is vast: from blood types on Mixi to wine reviews on Corkd, the types of data different kinds of users find important vary hugely, and the number of use-cases is vast. What precisely is the core, if any, of social data that should be standardized, and what other components should be left to develop in a decentralized manner? What any upcoming standardization activity could learn from the HTML5 Working Group at the W3C is the central importance of basing design decisions on empirical data and actual running implementations. With social networking, however, this will be more difficult: while the HTML5 Working Group can simply ask a search engine for a centralized tally of the popularity and correct usage of various HTML elements, there is no centralized collection of data about the Social Web. Before any data model is standardized based on "common sense", we need a serious game-plan to sample both social web service providers and users, to determine what types of data and use-cases we are working towards. The landscape of social networking is changing so quickly, though, that any empirical data will likely be out of date, so any standard for social data must be extensible in a principled manner.
The second part of the story, where there is far less agreement than on the authentication story, is the data model: What data model should we use for interoperable social networking? This is precisely the wrong question, for if we are to respect the work already done, the question should be: What kind of abstract data model can we standardize? Ideally, this abstract data model should map to all the existing APIs, as well as to existing formats such as vCard and XFN, and be expressible in data formats as diverse as XML and JSON. Furthermore, whatever abstract data model we use should be fundamentally extensible. The only data model that fulfills these dual constraints, of being both extensible and free of bias towards any particular encoding, is the Resource Description Framework (RDF), the foundational language of the Semantic Web. First, RDF is a graph-based data model, and so is well-suited for describing any social graph. Second, it uses URIs as globally unique identifiers, much like OpenID. Third, it is defined primarily on the level of semantics, allowing the data itself to take any form (XML, JSON, APIs). As a graph-based data model that allows URI-based merging of data, it makes perfect sense for "mashing up" different social networking sites. Furthermore, it already has a host of supporting technologies, from GRDDL to convert microformat data to RDF, to SPARQL for querying the social graph. At the least, one useful exercise would be to see how far RDF can go in modeling the concrete social networking APIs and formats out in the real world, and the RDF vocabulary FOAF provides a good starting place.
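Why a graph model with global identifiers suits "mashing up" social sites can be shown in a few lines. In this sketch, two services' graphs are modeled as sets of triples; because both identify Alice by the same URI, merging is just set union, with duplicate statements collapsing automatically. The URIs are invented and the property names only loosely follow FOAF terms.

```python
# Sketch of URI-based graph merging: shared global identifiers make
# combining two services' social graphs a simple set union.
ALICE = "http://example.org/alice"

service_a = {
    (ALICE, "foaf:name", "Alice"),
    (ALICE, "foaf:knows", "http://example.org/bob"),
}
service_b = {
    (ALICE, "foaf:name", "Alice"),                      # duplicate: merges away
    (ALICE, "foaf:knows", "http://example.org/carol"),  # new edge from service B
}

merged = service_a | service_b
friends = {o for s, p, o in merged if s == ALICE and p == "foaf:knows"}
print(sorted(friends))
```

No per-pair import/export code was needed: had the two services used local user IDs instead of URIs, the merge would have required a record-linkage step first.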
However, one key use-case will be the ability of users to give and restrict access to their data, and most people will want to do this on the basis of groups and constraints. While RDF is a simple language, it does not by itself express constraints such as "If my friend is from my hometown and does not work with me, give them access to personal photos." This sort of definition of groups by constraints can be layered over RDF data by using the W3C's upcoming RIF (Rule Interchange Format) standard, which for our purposes essentially adds variables and conjunctions to RDF. So, although the foundation should just be standardization of some level of an abstract model of social data, as the Social Web matures there should be a way for services and even users to express access constraints over their social data.
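The hometown rule above can be sketched as a conjunction of conditions with a variable, evaluated over a triple store. This is only an illustration of the idea of rules over graph data, not RIF syntax or semantics (RIF rules are declarative, and negation is handled more carefully there); the vocabulary is invented.

```python
# Sketch of a rule with a variable and conjunction, in the spirit of RIF:
# "if ?f is my friend, ?f's hometown is my hometown, and ?f is not my
# coworker, grant ?f access to personal photos."
ME = "me"
graph = {
    (ME, "knows", "dana"), (ME, "knows", "erin"),
    (ME, "hometown", "Edinburgh"),
    ("dana", "hometown", "Edinburgh"),
    ("erin", "hometown", "Edinburgh"),
    (ME, "worksWith", "erin"),
}

def objects(g, s, p):
    """All values o such that (s, p, o) is in the graph."""
    return {o for s2, p2, o in g if s2 == s and p2 == p}

def photo_viewers(g, me):
    my_towns = objects(g, me, "hometown")
    return {f for f in objects(g, me, "knows")            # ?f is my friend
            if objects(g, f, "hometown") & my_towns       # same hometown
            and f not in objects(g, me, "worksWith")}     # and not a coworker

print(photo_viewers(graph, ME))   # dana qualifies; erin is a coworker
```

The group is defined intensionally, so it updates itself as the underlying graph changes: if erin stops being a coworker, she gains photo access without anyone editing a list.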
Yet the Semantic Web has its own host of problems, and despite years of research, very little work has gone into maturing the social side of the Semantic Web. This points to a fundamental flaw in the design of RDF as it stands today: while publishing data in the wild may do for publicly available data, this model of deployment would be a failure for the Social Web. OpenID and OAuth can ameliorate this to some extent, but it would be better if provenance were built into the data model itself. Furthermore, as social networking data becomes detached from any particular service, it becomes increasingly important to track down where the data came from and what has been done to it. This is a major hurdle for RDF, which does not have such facilities built in; however, neither does any other format. While this leans more towards research, one powerful idea might be to use the applications of rules, which form proofs as in proof-carrying code, to track the movement and modification of social data. Ideally, whenever you receive some social data, you should be able to look at the data itself and "follow your nose" to figure out where it came from and verify its authenticity. How precisely such provenance can be easily included in RDF is a topic for research.
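One possible shape for such built-in provenance, continuing the triple sketches above, is to extend each triple to a quad whose fourth element names the source graph, so a consumer can always ask where a statement came from. This is an assumption-laden illustration of one design direction (similar in spirit to named graphs), not a feature of RDF as standardized, and it covers only origin, not the proof-carrying verification the paragraph above points towards.

```python
# Sketch: provenance rides along with the data as a fourth element
# naming the source graph each statement was asserted in.
quads = {
    ("alice", "knows", "bob",   "http://service-a.example/graph"),
    ("alice", "knows", "carol", "http://service-b.example/graph"),
}

def provenance(quads, s, p, o):
    """Return every source that asserts the statement (s, p, o)."""
    return {src for s2, p2, o2, src in quads
            if (s2, p2, o2) == (s, p, o)}

print(provenance(quads, "alice", "knows", "bob"))
```

Even after the two graphs are merged, each statement still answers "where did you come from?", which is the follow-your-nose property the text asks for.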
Almost all the work on the Social Web has come from outside the W3C. However, the W3C, despite its reputation for being semi-closed and sometimes slow, has a number of advantages. First, the W3C has a number of options for interaction, many of which, such as Interest Groups and Incubator Groups, are completely open; if a large number of separate activities need to be started, a task force could be created to co-ordinate them. Second, it has a well-earned reputation as a vendor-neutral space, something that is badly needed in the social networking arena, and anything that does get standardized must go through the W3C's Royalty-Free patent process. Third, the W3C has a process that guarantees, via a charter and staff contact time, that at least some progress will be made and that any results will get international review, and the W3C's blessing could increase the chances that services still on the fence adopt private and interoperable social data. Lastly, the W3C can co-operate with the Open Web Foundation and DataPortability.org to make sure efforts are not duplicated. For example, community-driven specifications from the Open Web Foundation could be sent to the W3C for further double-checking, review, and recommendation; this would lead to further uptake of those specifications and a mutually beneficial relationship. Good standards tend to be created by small groups or individuals, not by committees, and then brought for wider review to committees, which can fill in missing details and double-check edge-cases. So, at the very least, the W3C could serve as a vendor-neutral discussion forum to develop a detailed high-level game plan for future social networking efforts, and at most could even provide Recommendations once enough agreement has been reached.
What is a realistic goal? At least in the short term, it is unlikely that we can get organizations to commit to a single API, although this may be possible on mobile platforms. It is likely that we can create a common extensible meta-model that maps to the data existing APIs export. It is unlikely that a single standard for privacy and authentication on the Web can be passed, but the W3C should lend its support to crucial technologies like OpenID and OAuth, and follow a data-driven development model for social networking. It would be a mistake for the W3C to standardize a single monolithic technology architecture for the Social Web. However, a W3C Best Practices recommendation that a service could visibly claim to implement, much like the "Valid HTML" graphic, could show that services commit in practice (not just by joining mailing lists) to secure and interoperable social networking.
As mentioned before, everyone should win from this effort. The original hypertext Web succeeded because the previously closed information spaces of hypertext systems were combined into one seamless space of information. Much of our most important information is social, and there is no reason this data should not also be part of the one Web, while at the same time preserving privacy and security. Vendors, developers, and users will all benefit from a single Social Web. Large vendors that already possess lots of social data should be able to use a single Social Web to leverage the transfer and monetization of their data to smaller vendors. Small vendors can use it to avoid the "chicken-and-egg" problem of starting a social networking site without an existing social network, by accessing social networks from larger vendors and from users themselves. Researchers in academia have a host of problems around authentication, provenance, identity, and ease-of-use to tackle. And users can truly share and own their own social data.