TF-Graphs-UC/FOAF Use Case

From RDF Working Group Wiki

This is a draft of a use case from the FOAF project regarding named graphs. At this stage it outlines past experience, rather than constituting specific requirements for new standards.

Background

Since 2000, a network of cross-referenced "FOAF files" has been published on the public Web. These are typically in RDF/XML, and typically describe people and associated entities (groups, documents, images). Initially these were hand-crafted, then more were generated using utilities such as foaf-a-matic, and since 2003 many more have been automatically published from social network sites such as LiveJournal, My Opera and Hi5.

Since the earliest FOAF aggregators, it has been important to indicate the source, provenance and authorship of FOAF-based RDF documents. Several mechanisms have been explored.

Signed RDF

Provenance via PGP-signed documents, rdfweb-dev list (now foaf-dev), Mon Aug 21 01:07:26 UTC 2000

In these experiments, two things were explored:

  1. representing the 'who signed whose key' from the PGP 'web of trust' in RDF.
  2. representing in RDF the claim that some (typically but not necessarily RDF-based) document had been signed.

The latter practice was explored in more detail. Neither became widely adopted.

The central idiom used (see WOT vocabulary) was for a document to itself include a wot:assurance link that pointed from its own URI to the URI of a document that was emitted from PGP after signature.

So for example, Dan's doc http://danbri.org/foaf.rdf might have a wot:assurance link to http://danbri.org/foaf.rdf.asc

Edd Dumbill documents this early work, in particular in his second IBM developerWorks article:

Edd Dumbill
01 Aug 2002
XML Watch: Support online communities with FOAF
How the Friend-of-a-Friend vocabulary addresses issues of accountability and privacy

... for example:

<foaf:Person>
 <foaf:name>Edd Dumbill</foaf:name>
 <foaf:mbox rdf:resource="mailto:edd@xml.com" />
 <!-- personal, PGP signed, details here -->
 <rdfs:seeAlso>
   <rdf:Description rdf:about="http://example.org/edd/personal.rdf">
      <wot:assurance rdf:resource="http://example.org/edd/personal.rdf.asc" />
   </rdf:Description>
 </rdfs:seeAlso>
</foaf:Person>
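As a rough illustration of how a consumer might pick up the wot:assurance idiom, the following sketch pulls (document, signature) pairs out of RDF/XML using only Python's standard library. It is not a real RDF parser: it only recognises the nested rdf:Description form used in the example, and the wrapper element with namespace declarations (including the WOT namespace http://xmlns.com/wot/0.1/) is added here as an assumption so that the fragment parses on its own. The function name assurance_links is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
WOT = "http://xmlns.com/wot/0.1/"

# Assumption: the article's fragment, wrapped in rdf:RDF with namespaces declared.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:wot="http://xmlns.com/wot/0.1/">
  <foaf:Person>
    <foaf:name>Edd Dumbill</foaf:name>
    <rdfs:seeAlso>
      <rdf:Description rdf:about="http://example.org/edd/personal.rdf">
        <wot:assurance rdf:resource="http://example.org/edd/personal.rdf.asc" />
      </rdf:Description>
    </rdfs:seeAlso>
  </foaf:Person>
</rdf:RDF>"""


def assurance_links(rdfxml):
    """Return (document URI, signature URI) pairs found in the XML text."""
    root = ET.fromstring(rdfxml)
    pairs = []
    # Visit every element that names a resource via rdf:about ...
    for node in root.iter():
        about = node.get(f"{{{RDF}}}about")
        if about is None:
            continue
        # ... and collect its direct wot:assurance children.
        for link in node.findall(f"{{{WOT}}}assurance"):
            signature = link.get(f"{{{RDF}}}resource")
            if signature is not None:
                pairs.append((about, signature))
    return pairs


links = assurance_links(DOC)
```

An aggregator would then fetch each signature URI and check it against the document with a PGP tool; that step is outside this sketch.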

Edd's article also explores the requirements around storing this kind of data.

Note that at this time, RDF query had not been standardised, and most systems offered only basic triple-pattern matching. Furthermore, typical RDF storage systems did not include any quads mechanism. Edd's article shows how he implemented a provenance system on top of Dave Beckett's Redland DB (indicating authorship / source of triples, results of PGP checking etc). Subsequent changes to Redland made it possible to use its built-in provenance mechanism, and the standardisation of SPARQL allowed queries to be expressed that constrain the source of triples.
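To make the "quads" idea concrete, here is a minimal sketch of a store that keeps a source alongside each triple and supports pattern matching with the source as a fourth, optional constraint. It is not Redland's API or Edd's implementation; the QuadStore class, the ex:signatureVerified property and the urn:example:local-checks source are all invented for illustration.

```python
from collections import namedtuple

# A triple plus the source (document URI) it was read from.
Quad = namedtuple("Quad", "s p o source")


class QuadStore:
    def __init__(self):
        self.quads = set()

    def add(self, s, p, o, source):
        self.quads.add(Quad(s, p, o, source))

    def match(self, s=None, p=None, o=None, source=None):
        """Yield quads matching the pattern; None acts as a wildcard."""
        pattern = (s, p, o, source)
        for quad in self.quads:
            if all(want is None or want == got for want, got in zip(pattern, quad)):
                yield quad


store = QuadStore()
store.add("http://example.org/edd#me", "foaf:name", "Edd Dumbill",
          "http://example.org/edd/personal.rdf")
store.add("http://example.org/edd#me", "foaf:name", "Edward",
          "http://example.org/other.rdf")
# The outcome of a PGP check is itself recorded as a statement about a source.
store.add("http://example.org/edd/personal.rdf", "ex:signatureVerified", "true",
          "urn:example:local-checks")

# Which sources passed the (hypothetical) signature check?
verified = {q.s for q in store.match(p="ex:signatureVerified", o="true",
                                     source="urn:example:local-checks")}
# Only accept names stated in verified sources.
names = sorted(q.o for q in store.match(p="foaf:name") if q.source in verified)
```

In SPARQL terms, the source constraint plays roughly the role of a GRAPH pattern over a dataset of named graphs.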

Recent developments: WebID / FOAF+SSL

The FOAF "Web of Trust" Vocabulary is an early experiment, and ripe for revision, particularly to address new work around FOAF+SSL (also known as the WebID protocol), which uses the X.509 family of technologies rather than OpenPGP / GPG. Some wiki-style notes towards this revision are available.

W3C has chartered a WebID incubator group.

Related more general discussions continue on the foaf-protocols list.

Drafting Requirements

It may be that we have no specific technical requirements. This draft is a move towards writing down some detailed scenarios to determine what we need.

Firstly, SPARQL in its current form should be enough for us to ask questions that relate to information provenance.

Simple case:

"How old is Dan, according to people who are his colleagues" can be expressed in SPARQL.

The PGP/crypto-related techniques above can be used to add assurance to certain layers of data from the SPARQL store.

A W3C spec for Named Graphs beyond SPARQL might allow us to serialize complex datasets that interconnect claims with supporting evidence about those claims.

Working through the "How old is Dan, according to people who are his colleagues" case:

  1. We want values ?y for the foaf:age property of the thing ?x whose foaf:homepage is http://danbri.org/, as asserted by people whose foaf:workplaceHomepage ?h is the same as the current true value of that property for ?x.
  2. We want to serialize to some standard form our entire repository of relevant information, including who-said-what metadata, such that it could be reconstituted elsewhere and the same query be successfully run.
  3. We want where possible to cryptographically assure this kind of activity [vague,...].
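The first two steps above can be sketched over a plain list of quads. Everything in the data is invented for illustration: the ages, the urn:example: identifiers, the example.org documents, and the urn:example:trusted graph that stands in for "the current true value". Attaching each document to its author via foaf:maker in the trusted graph is also an assumption, and JSON lines are used only as a stand-in for whatever standard serialization is eventually chosen.

```python
import json

HOMEPAGE, AGE = "foaf:homepage", "foaf:age"
MAKER, WORK = "foaf:maker", "foaf:workplaceHomepage"
TRUSTED = "urn:example:trusted"  # hypothetical graph of facts we take as true

# (subject, predicate, object, source) with invented values.
QUADS = [
    ("_:a1", HOMEPAGE, "http://danbri.org/", "http://example.org/alice.rdf"),
    ("_:a1", AGE, "30", "http://example.org/alice.rdf"),
    ("http://example.org/alice.rdf", MAKER, "urn:example:alice", TRUSTED),
    ("urn:example:alice", WORK, "http://example.org/work", TRUSTED),
    ("_:b1", HOMEPAGE, "http://danbri.org/", "http://example.org/bob.rdf"),
    ("_:b1", AGE, "99", "http://example.org/bob.rdf"),
    ("http://example.org/bob.rdf", MAKER, "urn:example:bob", TRUSTED),
    ("urn:example:bob", WORK, "http://example.com/elsewhere", TRUSTED),
    ("urn:example:dan", HOMEPAGE, "http://danbri.org/", TRUSTED),
    ("urn:example:dan", WORK, "http://example.org/work", TRUSTED),
]


def objects(quads, s, p, source):
    return {o for (s2, p2, o, g) in quads if (s2, p2, g) == (s, p, source)}


def subjects(quads, p, o, source):
    return {s for (s, p2, o2, g) in quads if (p2, o2, g) == (p, o, source)}


def ages_according_to_colleagues(quads, homepage):
    """Step 1: (age, source) pairs asserted by authors sharing a workplace."""
    workplaces = set()
    for person in subjects(quads, HOMEPAGE, homepage, TRUSTED):
        workplaces |= objects(quads, person, WORK, TRUSTED)
    answers = set()
    for src in {g for (_, _, _, g) in quads if g != TRUSTED}:
        makers = objects(quads, src, MAKER, TRUSTED)
        is_colleague = any(objects(quads, m, WORK, TRUSTED) & workplaces
                           for m in makers)
        if not is_colleague:
            continue
        # Within this source, find the thing with that homepage and its age.
        for x in subjects(quads, HOMEPAGE, homepage, src):
            for age in objects(quads, x, AGE, src):
                answers.add((age, src))
    return answers


def dump(quads):
    """Step 2: serialize quads, one JSON array per line."""
    return "\n".join(json.dumps(q) for q in quads)


def load(text):
    return [tuple(json.loads(line)) for line in text.splitlines()]


original = ages_according_to_colleagues(QUADS, "http://danbri.org/")
restored = ages_according_to_colleagues(load(dump(QUADS)), "http://danbri.org/")
```

The point of the round trip is that the who-said-what information survives serialization, so the same question gets the same answer elsewhere. Step 3 (cryptographic assurance) is not covered by this sketch.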

A version of the original FOAF requirements draft, from 2000, has some relevant use cases:

While RDF is defined in terms of a rather abstract information model, our needs are rather practical. We want to be able to ask the Web sensible questions about common kinds of thing (documents, organisations, people) and get back sensible results. "Find me today's web page recommendations made by people who work for Medical organisations". "Find me recent publications by people I've co-authored documents with." "Show me critiques of this web page, and the home pages of the author of that critique"

Although old, these original use cases are not yet fully met by the current Semantic Web landscape. They may be worth rethinking, but for now these are offered as the draft RDF WG FOAF 'named graphs' use case: keeping track of several sources that combined can answer such queries.

  • The very earliest FOAF aggregators made the simplifying assumption that we could believe all triples, and that they could be harmlessly merged.
  • This soon proved ineffective. The next phase of FOAF aggregators partitioned triples by source, but also tended to believe each publisher (often using claims expressed in terms of the foaf:PersonalProfileDocument class).
  • Recent Social Web trends (OpenID/OAuth), FOAF+SSL and the original wot:assurance work point towards stronger checking of document claims and provenance.
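The three phases above can be contrasted as three small policies over the same input. This is a toy sketch: the feeds, the conflicting ages and the VERIFIED set (standing in for whatever signature or WebID check an aggregator performs) are all invented.

```python
from collections import defaultdict

# Triples as fetched, keyed by the document they came from (invented data).
FEEDS = {
    "http://example.org/alice.rdf": [("urn:example:dan", "foaf:age", "30")],
    "http://example.org/mallory.rdf": [("urn:example:dan", "foaf:age", "99")],
}
# Hypothetical result of checking each document's claims or signature.
VERIFIED = {"http://example.org/alice.rdf"}


def merge_all(feeds):
    """Phase 1: believe everything and merge it into one set of triples."""
    merged = set()
    for triples in feeds.values():
        merged.update(triples)
    return merged


def partition_by_source(feeds):
    """Phase 2: keep every claim, but remember which sources made it."""
    by_claim = defaultdict(set)
    for source, triples in feeds.items():
        for triple in triples:
            by_claim[triple].add(source)
    return dict(by_claim)


def verified_only(feeds, verified):
    """Phase 3: only merge triples from sources that passed a check."""
    return merge_all({s: t for s, t in feeds.items() if s in verified})


merged = merge_all(FEEDS)
partitioned = partition_by_source(FEEDS)
checked = verified_only(FEEDS, VERIFIED)
```

With the naive merge, the two conflicting ages sit side by side with no way to tell them apart; the partitioned view at least records who said what, and the checked view drops the unverified source.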

The expectation here is that the RDF WG named graph mechanism will make it possible for aggregates of FOAF-related RDF to be shared in a standard form, in addition to what we can do now, which is to expose them via SPARQL.

TODO

  • Work through the Dan/age or 2000-era cases with full test cases.