StateOfSocialWeb

From Social Web XG Wiki
Revision as of 14:57, 18 August 2010 by Bblfish


State of the Social Web

[Christine comment June 30 during SWXG meeting: the introductory comments need to be revisited when we have finished writing the body of this section]

There has recently been an upsurge in work on making a distributed and secure social networking platform possible, fueled largely by discontent with the terms of service of existing social networking sites as regards the privacy of data. In general, the motive is to "open the social graph" (See original @@post by Brad Fitzpatrick of Livejournal) and "escape the walled gardens" (@@See diagram by Tim Berners-Lee@@). In this regard, walled gardens like the ubiquitous Facebook are data silos where user data can easily be entered, but can only be accessed and manipulated via proprietary interfaces for humans and machines. This effectively prevents users from moving from one social networking platform provider to another, creating a "wall" around their social data. This dismal situation is analogous to the early days of hypertext before the World Wide Web, when various systems stored hypertext in proprietary and incompatible formats without the ability to globally link and access hypertext data across systems, a situation solved by the creation of URIs and HTML. While hypertext data is often meant to be public, this is not the case for social data, and in any closed data silo run by a single proprietary vendor, the user de facto loses all privacy of their data to the vendor. Yet via a distributed social network built on the principles of Web architecture, social data can be given "first-class" citizenship on the Web, like hypertext and Linked Data.

@@SOCIAL WEB VS SOCIAL NETWORKING

However, a critical problem in realizing this vision of a distributed and secure social network is that any "distributed" social network will become yet another walled garden unless it is based on open and royalty-free standards. Via open standards, multiple social networking platforms, ranging from those of large vendors to simple personal websites, should be able to interoperate. However, these standards are currently scattered across various communities and are even mutually incompatible, so producing a single overview of the necessary technologies and standards is difficult. Without such an overview, it is virtually impossible for software developers working on decentralized social networking platforms such as Elgg and Diaspora to make their work interoperable. Therefore, we propose a set of simple and widely deployed standards grounded in the principles of Web architecture.

@@OPENID

One option is WebID. As the basic tenet of Web architecture is that everything can be identified with a URI, the core of our system is verification of identity by attaching public key certificates to URIs that refer to people, thereby making such a URI into a WebID. In particular, as the Semantic Web and Linked Data are also based on the principles of Web architecture, this allows the Social Web to be seamlessly integrated with the Web of Linked Data. Furthermore, WebIDs allow Linked Data to be accessible only after authentication instead of being public, a scenario that is widely applicable outside social networking (such as in data-based e-commerce and enterprise Linked Data).

For a few years social networking remained a small phenomenon on the Web in the form of sites such as Orkut, but with the arrival of Myspace, social networking took off, first among young people networking around music; then, with the eventual arrival of Facebook, social networking became nearly ubiquitous in every age group and country. The world of social networking is still fragmented geographically and demographically (for example, youth in France favor sites such as Skyblogs, and sites such as Hi5 are popular in Japan). Nevertheless, Facebook has a clear and rapidly growing global dominance in traditional social networking, while sites such as LinkedIn and Twitter specialize in and dominate other niches, and most other sites, such as Amazon and Google, are rapidly adding "social features". To avoid digital schizophrenia even among major providers, with users forced to re-enter data and adapt to differing terms of service, even users of only the major vendors have a case for wanting these vendors to implement the features of distributed social networking. By encouraging fluidity of data and access, major vendors can compete on user interfaces, Social Web applications, and services that let users market their data (See @@ which allows users to sell their purchase histories for money), rather than on data lock-in. As users should be able to easily authenticate and import data to Social Web platforms, smaller vendors, start-up companies, and non-profits can then also more easily provide services to users.

First, we will outline the principles of Web architecture and the requirements for an open and distributed Social Web. Then, we will dive into the technical details that meet each requirement. Next, using a sample data-set of Semantic Web researchers, we will show how fragmented people are across a sample of popular social networking sites. Finally, we will show how widely implemented (or not) the technical solutions to each requirement are, and detail further steps for deploying the Social Web of Linked Data.

Drivers for Social Web

Users will be attracted to the Social Web first by new offerings with new functionality or features and, secondarily, by the need for better (different) privacy protection systems.

[this text from the Social Web Frameworks document http://www.w3.org/2005/Incubator/socialweb/wiki/SocialWebFrameworks2#Social_Graph_Management_Today]

Users participating in the Social Web will have personae for different situations and contexts. This ability to set up and maintain multiple profiles, depending on the context of the social application, will be the driving force that expands participation in the Social Web both within and beyond the silos of existing social applications.

A Social Web user should be able to create and organize one or more different profiles at a virtual location, using the management interface of their choice. For example, a user might want to manage personal information such as home address, telephone number, and best friends on one profile management system and work-related information such as office address, office telephone number, and work colleagues on another, or may even want to store their entire profile locally or with a trusted third party. Today, no widely adopted open standards exist for doing this. The current “aggregator” approaches are short-term solutions akin to the “screen scraping” days of the 1980s and are unlikely to scale to the necessary size.

The new approach we propose allows the user to associate a specific Social Profile directly with Web 2.0 service providers. For example, your Friends Profile can be exposed to MySpace and Twitter, whereas your Work Profile can be exposed to Plaxo and LinkedIn. Traditional service providers can also use the same features: your Health Profile can be exposed to health care providers and your Citizen Profile to online government services. New players, large or small, can also interface to these Profile services and offer seamless access to a user's personal data. This approach opens up the market for targeted user profile management services to a wide range of stakeholders.

Privacy is still one of the major issues with online profiles managed by existing social networks, and users often believe that by setting their profile to “private” they will not be exposed to any threats (e.g., leaking of personal data) outside their social network. However, concepts such as "friends of friends" can expose personal data to a wider audience than the user may have expected. With a common user-managed policy framework used across all social applications, the user will be in direct control of access and usage permissions for their different profiles. Having a common framework also makes it easier for the user to understand the implications of their privacy and access control decisions.


As custodians of their own profiles, users can then decide which social applications can access which profile details by exposing one (or more) profiles to each provider. In this new architecture, exposing a profile is an explicit act, one the user is more aware of and can just as easily retract. [end of pasted in text]
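The explicit expose-and-retract model described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a proposed API: the class `ProfileManager`, its methods, and the provider names `social.example` and `jobs.example` are all hypothetical. The point it shows is that a provider only ever sees the union of the profiles the user has explicitly exposed to it, and that retraction is as simple as exposure.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileManager:
    """Hypothetical user-held profiles plus a record of which provider sees which."""

    profiles: dict[str, dict[str, str]] = field(default_factory=dict)
    # provider -> names of the profiles the user has chosen to expose to it
    exposures: dict[str, set[str]] = field(default_factory=dict)

    def expose(self, profile: str, provider: str) -> None:
        """Explicitly expose one of the user's profiles to a provider."""
        if profile not in self.profiles:
            raise KeyError(f"unknown profile: {profile}")
        self.exposures.setdefault(provider, set()).add(profile)

    def retract(self, profile: str, provider: str) -> None:
        """Withdraw a previously exposed profile; a no-op if it was never exposed."""
        self.exposures.get(provider, set()).discard(profile)

    def view_for(self, provider: str) -> dict[str, str]:
        """What the provider may read: only fields of profiles exposed to it."""
        view: dict[str, str] = {}
        for name in sorted(self.exposures.get(provider, set())):
            view.update(self.profiles[name])
        return view


pm = ProfileManager(profiles={
    "friends": {"nickname": "alice", "best_friend": "bob"},
    "work": {"office_phone": "+1-555-0100", "employer": "Example Corp"},
})
pm.expose("friends", "social.example")
pm.expose("work", "jobs.example")
print(pm.view_for("social.example"))  # {'nickname': 'alice', 'best_friend': 'bob'}
pm.retract("friends", "social.example")
print(pm.view_for("social.example"))  # {}
```

In a real deployment the "view" would be enforced by whoever hosts the profile data, not by trusting the provider; the sketch only models the user's decisions.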

Requirements for Distributed Social Networking

The requirements for distributed social networking are diverse, but can be separated into a common core of five components. Each component builds upon the last, but uses a slightly different technology stack. In particular, the five components are:

  • Authentication: A user should be able to securely authenticate to and access a Social Web platform without remembering a host of differing usernames and passwords. In particular, users should be required to divulge the minimal amount of information necessary to authenticate, and to divulge it as few times as possible.
  • Data Portability: Data should be easy both to import into and to export from Social Web platforms. This spares users from re-entering all their data every time they sign into a new Social Web platform, and in particular from the dangerous practice of giving a third-party vendor unlimited access to their e-mail account. As no single application can predict all the kinds of data that a Social Web application may need, the data format should be extensible, but it should capture at least a common core so that developers do not have to "re-invent" data formats.
  • Distributed Access: Authenticated users should be able to grant third-party Social Web applications access to their social data, but only to the parts of it that are necessary, and the third party should not copy and store that data unnecessarily.
  • Messaging: Authenticated users should be able to message each other across any Social Web platform. These messages may be either asynchronous (like e-mail) or synchronous (like chat), and it should be possible to send them across the network either publicly or privately (encrypted). Messages may be broadcast either to the entire Web (blog posts) or to a limited number of contacts (status updates). If necessary, the provenance of messages should be tracked across sites.
  • Policy-Aware: Users should be able to define who can access their data and under what conditions. Users should be able to define groups explicitly, or implicitly via conditions on the social data of the group's members, and grant access to these groups. Similarly, conditions may apply to the social data itself, such as licensing and attribution.
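The Policy-Aware requirement distinguishes explicitly defined groups from groups defined implicitly by conditions on members' social data. A minimal sketch of that distinction follows, assuming a requester's social data is available as a plain dictionary; the `Group` class, `may_access` function, resource names, and WebIDs are hypothetical illustrations, not part of any policy language mentioned in this report.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Group:
    """A group defined explicitly (listed WebIDs) and/or implicitly (a condition)."""

    name: str
    members: frozenset[str] = frozenset()
    # Implicit membership: a predicate over the requester's social data.
    condition: Optional[Callable[[dict], bool]] = None

    def contains(self, person: dict) -> bool:
        if person.get("webid") in self.members:
            return True
        return self.condition is not None and self.condition(person)


def may_access(person: dict, resource: str, policy: dict[str, list[Group]]) -> bool:
    """Allow access if the requester belongs to any group granted that resource."""
    return any(group.contains(person) for group in policy.get(resource, []))


ALICE = "https://alice.example/card#me"  # hypothetical WebIDs throughout
family = Group("family", members=frozenset({"https://bob.example/card#me"}))
# Implicit group: anyone whose (self-asserted) profile says they know Alice.
knows_alice = Group("knows-alice", condition=lambda p: ALICE in p.get("knows", []))
policy = {"photos": [family, knows_alice], "home-address": [family]}

carol = {"webid": "https://carol.example/#i", "knows": [ALICE]}
print(may_access(carol, "photos", policy))        # True: matches the implicit group
print(may_access(carol, "home-address", policy))  # False: not an explicit member
```

A self-asserted "knows" claim is weak evidence on its own; a real policy engine would want to check it against the owner's data or other trusted sources before granting access.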


Semantic Web technologies have many advantages, especially seamless integration with other sources of data in the Web of Linked Data, and they had a head start in the Social Web due to the early appearance of FOAF. However, much current Social Web work is happening in other communities that are not familiar with the Semantic Web. Rather than re-inventing a new technology for each component in the stack, we will inspect in detail how existing work can either be translated into or work with Semantic Web technologies.

[Christine Perey comment June 30: I agree that it is unlikely to be successful, and inappropriate to try to bring/translate everything which is happening in the Social Networking ecosystem into the SemWeb context. Rather, the question is how can W3C offer what has been achieved, proven useful to others.

If we agree with the above statement, then I question if it is beneficial to the report's target readers/audience to repeatedly bring up the Semantic Web in this report. The report needs to be "lighter" (i.e., less heavy handed) with respect to use of SemWeb end of comment by Christine Perey]

Authentication: from Passwords to WebIDs

Username and passwords

The most widely deployed identification technology on the Web is still the username/password pair. It is relatively easy to understand, but suffers from a number of drawbacks, the best known being:

  • the username has to be short and memorable, but there are more people than short and memorable names, and so often people arriving at a new web site will find that their name has been taken.
  • a user who has accounts on a number of different sites will tend to reuse the same username/password combination, making each account increasingly insecure.
  • the passwords users choose are often insecure, and secure passwords can be difficult to remember.
  • the web site therefore also requires a mechanism to allow the user to reset his password. This is usually done by identifying the user via his ability to receive an e-mail. The username/password can therefore be seen as a shortcut for the lengthier e-mail verification process (though the bootstrapping e-mail account is itself usually protected only by a username and password).

This means that creating a new account requires the user to do a number of things:

  • create a username
  • think up a password
  • verify an e-mail address

Browsers now make it easier for users to create different passwords for each web site by remembering them for the user. But this just displaces the issue, as it then becomes very difficult for her to change browsers or devices.

At the end of this process the user ends up with a new account. But this account contains only the information he just typed in. It is disconnected from the other accounts he may have, from the content developed there, and from the social networks. The user will have to recreate a list of friends on the new server - which he will only be able to do if those friends have accounts there. So the user ends up pretty much with a blank account.

Firefox Sync and Weave

Something like the Sync plugin could allow the user to copy passwords, browser preferences and bookmarks securely from one browser and device to another. It would of course need to be deployed on every platform and browser to be globally useful. Sync and Weave work simply by storing the bookmarks and passwords, encrypted and in a well-defined format, on a server at a URL. The end user then only needs to remember this URL and the one password protecting its contents to retrieve them on any other device that knows how to decrypt and read the content.
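The storage model just described (an opaque encrypted blob at a URL, unlocked by one remembered passphrase) can be sketched with the Python standard library. This is not the actual Sync/Weave protocol or format: the blob layout, the URL `https://sync.example/users/alice/storage`, and the in-memory `server` dictionary are hypothetical. Because the standard library has no block cipher, the sketch uses SHA-256 in counter mode as a stand-in stream cipher with an HMAC over the ciphertext; a real client should use a vetted cipher such as AES-GCM from a maintained library.

```python
import hashlib
import hmac
import json
import os


def _derive_keys(passphrase: str, salt: bytes) -> tuple[bytes, bytes]:
    # Stretch the single remembered passphrase into an encryption key and a MAC key.
    key = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000, dklen=64)
    return key[:32], key[32:]


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # SHA-256 in counter mode as a stand-in stream cipher (illustration only).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(data: dict, passphrase: str) -> dict:
    """Encrypt-then-MAC a JSON document into a blob the server cannot read."""
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = _derive_keys(passphrase, salt)
    plaintext = json.dumps(data).encode()
    ks = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, ks))
    tag = hmac.new(mac_key, salt + nonce + ciphertext, hashlib.sha256).hexdigest()
    return {"salt": salt.hex(), "nonce": nonce.hex(), "ciphertext": ciphertext.hex(), "hmac": tag}


def unseal(blob: dict, passphrase: str) -> dict:
    """Check integrity first, then decrypt; a wrong passphrase fails the MAC check."""
    salt, nonce = bytes.fromhex(blob["salt"]), bytes.fromhex(blob["nonce"])
    ciphertext = bytes.fromhex(blob["ciphertext"])
    enc_key, mac_key = _derive_keys(passphrase, salt)
    expected = hmac.new(mac_key, salt + nonce + ciphertext, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, blob["hmac"]):
        raise ValueError("wrong passphrase or tampered data")
    ks = _keystream(enc_key, nonce, len(ciphertext))
    return json.loads(bytes(c ^ k for c, k in zip(ciphertext, ks)))


# The "server" only stores opaque blobs keyed by URL (hypothetical URL).
server: dict[str, str] = {}
url = "https://sync.example/users/alice/storage"
server[url] = json.dumps(seal({"bookmarks": ["https://www.w3.org/"]}, "correct horse"))
# On another device, knowing only the URL and the passphrase:
print(unseal(json.loads(server[url]), "correct horse"))  # {'bookmarks': ['https://www.w3.org/']}
```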

The Weave project, of which Sync is a small component, also aims to integrate password-based authentication more closely into the browser by letting the browser create and update passwords automatically across web sites, as explained by Aza Raskin in a blog entry on the topic.
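The core of "the browser creates and updates passwords automatically" is a per-site store of strong random passwords. A minimal sketch follows; the `vault` dictionary and the functions `password_for` and `rotate` are hypothetical and do not describe Weave's actual implementation. A store like this is exactly the kind of data a Sync-style service would then encrypt and replicate between devices.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation

# Hypothetical in-browser store: site origin -> generated password.
vault: dict[str, str] = {}


def new_site_password(length: int = 20) -> str:
    """Cryptographically strong random password the user never has to remember."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def password_for(site: str) -> str:
    """Return the stored password for a site, creating a fresh one on first use."""
    if site not in vault:
        vault[site] = new_site_password()
    return vault[site]


def rotate(site: str) -> str:
    """Replace a site's password, e.g. when the browser updates it automatically."""
    vault[site] = new_site_password()
    return vault[site]


print(len(password_for("https://shop.example")))  # 20
```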

While something like the Weave project would indeed solve the problem of bad passwords and authentication, and could even work across devices if all devices can be made to agree on a format, it still leaves users needing to fill out their personal information again on each new social network. The browser could perhaps fill in personal information on the user's behalf, but it cannot link a user's identities across social networks.

OpenID

OpenID centralises the authentication step at the OpenID service provider. A user need only remember one globally unique identifier (be it a home page, as with the original OpenID, the URL of the service provider, or even just an e-mail address, as enabled by the FingerPoint and WebFinger protocols) and can then log in to any OpenID-enabled web site. Furthermore, OpenID comes with an attribute exchange protocol that allows the user to specify which data about himself should be sent to that web site.

The limitation of the attribute exchange specification is that it is restricted to the information that can be placed inside a URL (i.e., a maximum of 1024 characters), due to its use of redirects to pass information in the protocol. Another issue is that although the user has a global login, it is not an identity that other data links to. Finally, the complexity of the protocol means that it is not quite as RESTful as is needed to take full advantage of Web architecture. Still, it got most things right.
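To see why redirect-based attribute exchange hits a size ceiling, the sketch below builds an OpenID 2.0 `checkid_setup` request carrying an Attribute Exchange `fetch_request` as URL query parameters and compares its length with the 1024-character figure used in the text. The endpoint, claimed identifier, and return URL are hypothetical, and the thirty `http://rp.example/ax/...` type URIs are made up purely to show how fast the URL grows; the two `axschema.org` type URIs are the commonly used ones for e-mail and nickname.

```python
from urllib.parse import urlencode

AX_NS = "http://openid.net/srv/ax/1.0"
URL_LIMIT = 1024  # the limit assumed in the text above


def checkid_url(op_endpoint: str, claimed_id: str, return_to: str,
                attributes: dict[str, str]) -> str:
    """Build an OpenID 2.0 indirect request (GET redirect) with an AX fetch request."""
    params = {
        "openid.ns": "http://specs.openid.net/auth/2.0",
        "openid.mode": "checkid_setup",
        "openid.claimed_id": claimed_id,
        "openid.identity": claimed_id,
        "openid.return_to": return_to,
        "openid.ns.ax": AX_NS,
        "openid.ax.mode": "fetch_request",
        "openid.ax.required": ",".join(attributes),  # the aliases being requested
    }
    for alias, type_uri in attributes.items():
        params[f"openid.ax.type.{alias}"] = type_uri
    return op_endpoint + "?" + urlencode(params)


small = checkid_url(
    "https://op.example/server", "https://alice.example/", "https://rp.example/return",
    {"email": "http://axschema.org/contact/email",
     "nickname": "http://axschema.org/namePerson/friendly"},
)
# Hypothetical attribute type URIs, just to show how quickly the URL grows.
many = {f"attr{i}": f"http://rp.example/ax/attr{i}" for i in range(30)}
large = checkid_url("https://op.example/server", "https://alice.example/",
                    "https://rp.example/return", many)
print(len(small) <= URL_LIMIT, len(large) <= URL_LIMIT)  # True False
```

OpenID 2.0 also defines an HTML form (POST) redirection for indirect messages that are too long for a URL, so the ceiling applies to the plain-redirect path sketched here.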

need to fill in more here

WebID

The WebID Protocol (aka foaf+ssl) uses the TLS stack, available in browsers since 1997, for identification and authentication of an agent in a global namespace. To do this, the agent (a diverse group that includes human beings, animals, organisations, robots and other social entities) is referred to via a URI, constructed in such a way that a description of that agent can be retrieved via the URI itself. Authentication uses client-side certificates and exactly the same protocol stack as is used for global commercial transactions. As a result, the following are possible:

  • the user need not remember, or even type, a global identifier (his WebID) in order to log in to a new site: it is locked into the client certificate
  • the user need not create a new password per web site: this is dealt with by the public/private key cryptography built into TLS
  • the protocol is very efficient: it requires at most one more connection over and above the connection to the desired web site
  • it fits in cleanly with web architecture

WebID does not rely on Certification Authorities to certify the user's identity, but uses a trick similar to that used by OpenID: the WebID document describing the user is itself used as confirmatory evidence. The global dereferenceability of the WebID enables other agents to describe trust relations, which the Relying Party can then use to evaluate a level of trust.
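The "WebID document as confirmatory evidence" check can be sketched as follows. Assumptions: the TLS layer has already verified that the client holds the private key and has handed over the certificate's subjectAltName URI and RSA public key; the profile has already been fetched and parsed into a set of triples (no RDF library is used); and the property names `cert:key`, `cert:modulus` and `cert:exponent` follow later W3C WebID drafts, since the terms have changed across drafts. The names `ClientCert`, `profile_keys`, `verify_webid`, and the WebID URIs are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

CERT = "http://www.w3.org/ns/auth/cert#"
Triple = tuple[str, str, object]


@dataclass(frozen=True)
class ClientCert:
    webid: str     # URI taken from the certificate's subjectAltName field
    modulus: int   # RSA public key carried in the certificate
    exponent: int


def profile_keys(triples: set[Triple], webid: str) -> set[tuple[int, int]]:
    """Collect the (modulus, exponent) pairs that the profile publishes for webid."""
    def value(node: object, prop: str) -> object:
        return next((o for s, p, o in triples if s == node and p == CERT + prop), None)

    keys = set()
    for s, p, key_node in triples:
        if s == webid and p == CERT + "key":
            modulus, exponent = value(key_node, "modulus"), value(key_node, "exponent")
            if modulus is not None and exponent is not None:
                # modulus is published as a hex string, exponent as an integer
                keys.add((int(str(modulus), 16), int(exponent)))
    return keys


def verify_webid(cert: ClientCert, fetch_profile: Callable[[str], set[Triple]]) -> bool:
    """Accept the claimed WebID only if its own profile lists the certificate's key."""
    return (cert.modulus, cert.exponent) in profile_keys(fetch_profile(cert.webid), cert.webid)


WEBID = "https://bob.example/profile#me"  # hypothetical WebID
profile = {
    (WEBID, CERT + "key", "_:key1"),
    ("_:key1", CERT + "modulus", "cb24ed85d64d794b"),  # toy-sized modulus
    ("_:key1", CERT + "exponent", 65537),
}
fetch = lambda uri: profile if uri == WEBID else set()  # stands in for an HTTP GET
print(verify_webid(ClientCert(WEBID, 0xCB24ED85D64D794B, 65537), fetch))  # True
print(verify_webid(ClientCert(WEBID, 0xDEADBEEF, 65537), fetch))          # False
```

Because the key is looked up in the document that the WebID itself dereferences to, no Certification Authority is needed to vouch for the binding between the URI and the key.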

It is furthermore extremely easy to create certificates in most desktop browsers: a one-click affair. Once this is done, even the smallest devices can be used to identify the user, as the data remains on the Web.

fill in more

Social Data: Co-ordinating the Core

Here we describe the kinds of social data that can be accessed from a WebID and the various protocols for doing so. The first attempt at making a distributed social network was the FOAF (Friend-of-a-Friend) RDF vocabulary for describing social contacts \cite{@@} in 2002, but it failed to take off outside the Semantic Web community due to a lack of authentication and features, despite being exported by one of the early social networking sites, LiveJournal. While the FOAF vocabulary describes social networking contacts, vCard, an older non-RDF format, defines the kinds of information typically found on a business card, such as name and address, some of which FOAF defines in a more idiosyncratic manner. Another recent attempt is the XFN (XHTML Friends Network) microformat, which embeds social contact relationships directly into HTML links using the @@ attribute, and which can be combined with the hCard microformat to embed vCard information in HTML. This kind of contact information is currently deployed by sites such as Myspace @@? and Twitter @@. An increasingly popular social network data standard is PortableContacts, a superset of vCard containing more social features such as @@ and @@. A superset even of PortableContacts is the set of data values accessible via the OpenSocial API @@. These formats have varying degrees of overlap, as seen in Table @@.
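XFN relationships are carried as space-separated values in the HTML `rel` attribute of links, so a consumer can recover a person's stated contacts with an ordinary HTML parser. The sketch below uses the standard library's `html.parser`; the set of values is the XFN 1.1 list, and the example page and URLs are hypothetical. Links without XFN values (such as the W3C link) are ignored.

```python
from html.parser import HTMLParser

# XFN 1.1 relationship values.
XFN_VALUES = {"me", "contact", "acquaintance", "friend", "met", "co-worker",
              "colleague", "co-resident", "neighbor", "child", "parent",
              "sibling", "spouse", "kin", "muse", "crush", "date", "sweetheart"}


class XFNParser(HTMLParser):
    """Collect (href, XFN relationships) for every <a> that carries XFN values."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, set[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        rels = set((attr.get("rel") or "").lower().split()) & XFN_VALUES
        if rels and attr.get("href"):
            self.links.append((attr["href"], rels))


def xfn_links(html: str) -> list[tuple[str, set[str]]]:
    parser = XFNParser()
    parser.feed(html)
    parser.close()
    return parser.links


page = """<ul>
  <li><a href="https://bob.example/" rel="friend met">Bob</a></li>
  <li><a href="https://twitter.com/alice" rel="me">My Twitter account</a></li>
  <li><a href="https://www.w3.org/">W3C</a></li>
</ul>"""
for href, rels in xfn_links(page):
    print(href, sorted(rels))
# https://bob.example/ ['friend', 'met']
# https://twitter.com/alice ['me']
```

The `rel="me"` links are what let a crawler connect one person's accounts across sites, which is the cross-network linking that the password-based approaches above cannot provide.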


In particular, an update of FOAF with the properties given in Table @@ would then make FOAF a superset (@@IMPORT MELVIN CHART) of each of these vocabularies, allowing RDF to cover the core of the Social Web and provide interoperability amongst most social contact data.
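To make the idea of FOAF as a common core concrete, the sketch below maps a few PortableContacts fields onto FOAF properties and serializes the result as N-Triples (which is also valid Turtle). The field-to-property mapping is an assumption made for illustration, not the mapping the table in this section will define; `poco_to_foaf`, `to_ntriples`, the `IRI` marker class, and the example contact are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class IRI(str):
    """Marks a triple's object as an IRI rather than a plain literal."""


def poco_to_foaf(entry: dict, subject: str) -> list[tuple[str, str, str]]:
    """Map a few PortableContacts fields onto FOAF (an assumed, partial mapping)."""
    triples = [(subject, RDF_TYPE, IRI(FOAF + "Person"))]

    def add(prop: str, value: str) -> None:
        triples.append((subject, FOAF + prop, value))

    if "displayName" in entry:
        add("name", entry["displayName"])
    if "nickname" in entry:
        add("nick", entry["nickname"])
    for part in ("givenName", "familyName"):
        if part in entry.get("name", {}):
            add(part, entry["name"][part])
    for email in entry.get("emails", []):
        add("mbox", IRI("mailto:" + email["value"]))
    for url in entry.get("urls", []):
        add("weblog" if url.get("type") == "blog" else "homepage", IRI(url["value"]))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize as N-Triples, escaping quotes, backslashes and line breaks in literals."""
    def term(value: str) -> str:
        if isinstance(value, IRI):
            return f"<{value}>"
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in triples)


entry = {  # hypothetical PortableContacts entry
    "displayName": "Alice Example",
    "name": {"givenName": "Alice", "familyName": "Example"},
    "emails": [{"value": "alice@example.org", "type": "home"}],
    "urls": [{"value": "https://alice.example/blog", "type": "blog"}],
}
print(to_ntriples(poco_to_foaf(entry, "https://alice.example/card#me")))
```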

@@FOAF, vCard, PortableContacts, OpenSocial

@@General principle of DanBri (see charter)

@@Encrypting data

API Access: OAuth

Importantly, distributed access to only a portion of this data can be granted in a secure manner via OAuth.

@@OAUTH @@OpenIDConnect?

Messaging: XMPP and Atom

@@XMPP, ATOM HTTP CHOICE

Lastly, we will note how social network messaging, ranging from blog comments to chat and both asynchronous and near real-time, can be coordinated via Atom, PubSubHubbub, and XMPP, giving special attention to activity stream updates.
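As a concrete picture of the asynchronous side, a status update can be expressed as an Atom entry, which a publisher would then place in a feed and push to subscribers (for example via PubSubHubbub or XMPP). The sketch uses the standard library's `xml.etree.ElementTree`; the function name `status_entry`, the `tag:` identifier, the author URI, and the choice to derive the title from the first 60 characters are hypothetical illustrations, not a profile defined by any of these specifications.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def status_entry(entry_id: str, author_name: str, author_uri: str,
                 text: str, when: datetime) -> str:
    """Serialize one status update as a standalone Atom entry document."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = text[:60]  # short title from the text
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = when.isoformat()
    author = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author, f"{{{ATOM}}}name").text = author_name
    ET.SubElement(author, f"{{{ATOM}}}uri").text = author_uri  # e.g. the author's WebID
    ET.SubElement(entry, f"{{{ATOM}}}content", type="text").text = text
    return ET.tostring(entry, encoding="unicode")


entry_xml = status_entry(
    "tag:social.example,2010:status/1",
    "Alice", "https://alice.example/card#me",
    "Reading the Social Web XG report",
    datetime(2010, 8, 18, 14, 57, tzinfo=timezone.utc),
)
print(entry_xml)
```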


Status Updates: OStatus

TODO@@: OStatus: ActivityStreams, PubSubHubbub

Privacy Policies

TODO@@XACML, AIR, and RIF

Lastly, integrating research on machine-readable policy languages may allow WebIDs to become privacy providers @@DAVERAGGET