
CrossCloud: Linked Data Saves the World

Sandro Hawke
20 March 2012
MIT Decentralized Information Group

Context for this Talk

This is an idea I've been incubating for a few years

May 2009 posting to

Not my day job at W3C, but folks there are supportive

I've had a little support from a private foundation

• I did a partial implementation last year (weekend/holiday project)

I'd like to work on this more - TimBL and I are looking for funding

The Problem

To work right, social apps need to reach everybody

• If I share my photos with you on Flickr ... you have to use Flickr

• This applies to every app that involves other people: scrabble, chess, chat, MMORPGs, FPSs, collaborative filtering, open source development, find friends nearby, shared calendars, ...

When an app reaches everyone:

• We all get to work together online

• But we are constrained by that one provider's limits

• It has monopoly power

When it only reaches some people:

• Other people are excluded

• We can't do things that require everybody

• We can't always do the things we want with the people we want

• Folks advocate for the monopoly, out of self-interest


The "standard" solution is standard protocols...

• email (RFC 821, 822)

• http (RFC 2616)

• xmpp (RFC 6120)

• ical (RFC 5546)

• vcard (RFC 2426)

But they

• are very expensive to create

• are even more expensive to change

• don't happen much (weak business case?)

• often don't work well (hard to iterate)

Is there a better way?

Let's Find A Technical Solution

Social apps are "shared-state apps"

• Alice and Bob are looking at different screens.

• But they see the same world-state represented

• Alice makes a change

• They both see it.

Let's figure out shared-state apps

Usually implemented as Client-Server

• Client is the UI (a Viewer and Controller)

• Data lives on a central server

The "PQR Model"

Formalize/Abstract that Client-Server interaction


• Post: Create/Update/Delete - change the world

• Query: express what you want to know about the world

• Results: get the detailed results of your queries

PQR Example: Chess

post each move each player makes

[ a chess:Move; 
    chess:inGame <>; 
    chess:player <>;
    chess:move chess:c4c5; 
    chess:seq 7 ]

query for the moves the other player makes

• Could do this with polling

• Or Pub/Sub. Query=Subscription / Results=Notify

results will be the set of all moves the other player has made

... or query for the games out there

... or query for the moves from both sides of a particular game
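The chess flow above can be sketched as a toy in-memory PQR node. The `PQRNode` class, its method names, and the tuple encoding of moves are illustrative assumptions; the talk leaves the actual interface and wire protocol open.

```python
# Toy in-memory PQR node for the chess example (illustrative only).

class PQRNode:
    def __init__(self):
        self.triples = []              # the shared world-state, as (s, p, o)

    def post(self, triples):          # P: change the world
        self.triples.extend(triples)

    def query(self, pattern):         # Q: express what you want to know
        # R: results are every triple matching the pattern (None = wildcard)
        return [t for t in self.triples
                if all(q is None or q == v for q, v in zip(pattern, t))]

node = PQRNode()

# post each move each player makes
node.post([("move7", "chess:inGame", "game1"),
           ("move7", "chess:player", "bob"),
           ("move7", "chess:move",   "chess:c4c5"),
           ("move7", "chess:seq",    7)])

# query for the moves the other player makes
bobs_moves = node.query((None, "chess:player", "bob"))
```

Polling would simply re-run `query` periodically; pub/sub would instead hold the pattern open and push each new matching triple as it is posted.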

PQR Example: Doom (FPS)

Map designer posts all the details of the map


• post each action done in the world

• query for walls, items, players, etc, nearby

• results will include all data the client UI needs

PQR Interface

Fits nicely with RDF

c = PQR.connect(user_identity, user_authentication)
c.post(rdf_graph)                        # P: change the world
r = c.query(rdf_graph_pattern, filters)  # Q: say what you want to know
for binding in r: ...                    # R: consume the results

By itself, this is nothing new.

Database Servers Do This

SQL (Oracle, DB2, MySQL, ...)

NoSQL (MongoDB, CouchDB, Cassandra, ...)

SPARQL 1.1 Server

(Most are not engineered for streaming new answers, though)

Can we decentralize?

A decentralized system where each node provides PQR


1. Each of the systems needs a good reason to participate

• people only adopt standards when they see the benefit

2. The system has to be fast enough, at scale

• acceptable latency: 1s turn-based; 0.1s "real time"

3. Information must only be shown to permitted parties

• read access control, secrecy, privacy

4. Query results must be correctly attributed

• write access control, data integrity, provenance

5. Posted information must remain available (until deleted)

• reliability, durability, high-availability

(What Else?)

Linked Data

Linked Data provides a PQR platform, in some situations

Post = make new RDF pages appear somewhere on the Web

Query = dereference all the URLs used in the graph pattern

Results = parse, merge, graph match, filter what came back

Limited Success

Consider a typical FOAF scenario

This works for { eg1:sandro foaf:knows ?x }

• deref of eg1:sandro will find the foaf:knows triples I put there

It MIGHT work for { ?x foaf:knows eg1:sandro }

• If I happened to put in back links.

• Does anyone do that? (no)

It won't work for {?x foaf:knows ?y}
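The dereference-based query sketched above, and the way it breaks down, can be shown in a few lines of Python. To stay self-contained, the Web is faked as a dict mapping URL to triples; real code would do HTTP GETs and parse RDF, and all URLs and names here are illustrative.

```python
# "Web" faked as URL -> list of (s, p, o) triples (illustrative data).
web = {
    "http://example.org/sandro": [
        ("http://example.org/sandro", "foaf:knows", "http://example.org/alice"),
    ],
}

def ld_query(pattern):
    # Query = dereference all the URLs used in the graph pattern
    merged = []
    for term in pattern:
        if isinstance(term, str) and term.startswith("http"):
            merged.extend(web.get(term, []))
    # Results = merge and graph-match what came back (None = wildcard)
    return [t for t in merged
            if all(q is None or q == v for q, v in zip(pattern, t))]

# Works: the subject is a URL we can dereference
friends = ld_query(("http://example.org/sandro", "foaf:knows", None))

# Fails: { ?x foaf:knows ?y } contains no URL to dereference
strangers = ld_query((None, "foaf:knows", None))    # → []
```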

RWLD Meeting Scheduling

A meeting-scheduling app, built on Read-Write Linked Data (RWLD), has the meeting info and times available

Bob wants to tell the system his selections...

• it provides a URL where he can post them

This works, but:

• Alice has to install meeting scheduling software

• Bob has to trust Alice's software to correctly count and remember his vote

RWLD Chess?

Who hosts all the real-time game data?

Who controls who can see what?

How do I find what games Alice is playing, chess or otherwise?

How long will they keep the game data?

What if I want to play an unsupported chess variation?

Host Your Own Data

Maybe Bob can post his actions to

• He can post his choices on Alice's Event1 poll

• He can post the moves he's making in various chess games

• He can post where he's shooting in his FPS

(Of course it's really apps that are making these posts)

... and somehow everyone finds these posts

Data Registration

This is the core of my proposal.

1. Some people run RDF trackers, eg

2. If you mint a URI for something, in its "home" document, put a triple saying which trackers you want used

<> reg:tracks "*" .

3. When you use a URI, ping its trackers with your data's source URL

4. To find data using a URI, query its trackers for source URLs.
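Steps 1-4 can be sketched with a toy tracker. The `Tracker` class, its `ping`/`query` methods, and the URLs are hypothetical; the proposal leaves tracker discovery and the ping format open.

```python
# Toy RDF tracker (step 1): remembers which source URLs mention which URIs.

class Tracker:
    def __init__(self):
        self.sources = {}                    # URI -> set of source URLs

    def ping(self, uri, source_url):         # step 3: register your data
        self.sources.setdefault(uri, set()).add(source_url)

    def query(self, uri):                    # step 4: find data using a URI
        return sorted(self.sources.get(uri, ()))

tracker = Tracker()

# Step 2 is a triple like  <> reg:tracks "*"  in event1's home document,
# naming this tracker; here we just use the tracker object directly.

# Bob uses Alice's event1 URI, so he pings its tracker with his data's URL:
tracker.ping("http://alice.example/event1", "http://bob.example/choices.ttl")

# Anyone querying the tracker for that URI now finds Bob's source URL:
found = tracker.query("http://alice.example/event1")
```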

Data Registration Diagram

Meeting Scheduling App (V2) has the meeting info and times available

bob published on

bob 'registers' his data

• he looks up alice's trackers

• he notifies them

• (they may verify he is publishing data about event1)

• they may notify others that are currently querying

• they will answer future queries, giving bob's data URL

Now Alice and Bob and everyone just hosts RDF data, and PQR works.

CrossCloud Diagram

Review of Requirements

1. Each of the players needs a selfish reason to play

• publishers - yes, want their data out OR get paid

• trackers - get paid

• consumers - yes, want to find data; may pay

2. The system has to be fast enough, at scale

• Millions of small vocabs - works smoothly

• A few vocabs with millions of sources - that'll fund server farms

3. Information must only be shown to permitted parties (cf read access)

• it's just web identification/authorization

• (well, plus whatever trackers leak)

Review of Requirements (cont'd)

4. Query results must be correctly attributed (cf write access)

• Querier has all the URLs they came from

• Could use SSL and webfinger to tie them to email addresses

5. Posted information must always be available (unless deletion was authorized)

• Folks who did the post get to pick the server(s) where it goes

• They also get to pick the vocabs, and hence the trackers

• They MIGHT be unreachable if excluded by vocabs (eg licensing)

A Few More Issues (1)

Merging Data (Social Merge)

• What if Bob and Alice say conflicting things?

Vocabulary Migration (Shims)

• How do we avoid being locked into Vocabularies, instead of apps?

RDF Sync

• Pub/Sub for changes in RDF data sources

Web Identity and Authorization

• WebID, OAuth2, etc, etc

Wire protocol / formats

A Few More Issues (2)


• Do we need to characterize large data sources?


• How do you refer to a Graph in a SPARQL Endpoint, etc?

High Availability

• Replication for hosts being down

• Replication for trackers being down

Business Models

• Can I query { ?s ?p ?o } if I'm willing to pay for it?


Prototype last year

• No logins

• No inference

• Sketchy protocol

• Demo: up/down votes of URLs; real-time browser-to-browser

Looking for funding

• Applying for Deshpande Innovation grant (reviewers needed)