Why Graphs

From RDF Working Group Wiki

Why do we need a W3C Recommendation on how to work with multiple RDF graphs? This page has the answers. Each entry describes a situation where such a recommendation would be very helpful.

These situations (use cases) are selected as being simple, important to several people in the Working Group, and distinct (or seemingly distinct) in the features they require of the technology.

Use cases may include one or more solution designs as illustrations, to help people understand the use case. The illustrations are not necessarily actual proposed solutions. In particular, note that each use case has a simple solution using TriG, but the ways these solutions use TriG are probably incompatible with one another.

In theory, each proposed design should show how it can solve each of these use cases.

A much more detailed and complete list is TF-Graphs-UC. This simplified list is based on Sandro's Jan 4 email and then ongoing email and telecon discussions.

1 Shared Web Crawler

Several systems want to use the data gathered by one RDF crawler. They don't need previous versions of the data.

1.1 Simple Solution

The crawler publishes its data as TriG where the graph label is the URL from which the data was fetched.

For example, Example Corporation might run a crawler that publishes its accumulated data at http://example.com/all. A GET of that location would return a document like this:

 <http://www.w3.org/People/Berners-Lee/card> {
   # ... the triples recently fetched from that URL
 }
 <http://www.dbpedia.org/resource/Tim_Berners-Lee> {
   # ... the triples recently fetched from that URL
 }

In Richard Cyganiak's report of this use case, he said DERI uses N-Quads for this.
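
As a rough illustration of the N-Quads variant, the following Python sketch writes each fetched triple as a quad whose fourth term is the URL it was fetched from. The `crawled` dictionary, the `to_nquads` name, and the `http://example.org/...` triples are invented placeholders; nothing here describes how DERI's crawler actually works.

```python
# Hypothetical crawl result: source URL -> triples fetched from that URL.
# Terms are already in N-Triples syntax; the triples themselves are placeholders.
crawled = {
    "http://www.w3.org/People/Berners-Lee/card": [
        ("<http://example.org/s1>", "<http://example.org/p>", "<http://example.org/o1>"),
    ],
    "http://www.dbpedia.org/resource/Tim_Berners-Lee": [
        ("<http://example.org/s2>", "<http://example.org/p>", '"some literal"'),
    ],
}


def to_nquads(crawled):
    """Return one N-Quads line per triple, labelled with the URL it came from."""
    lines = []
    for source_url, triples in crawled.items():
        for s, p, o in triples:
            lines.append(f"{s} {p} {o} <{source_url}> .")
    return lines


nquads = to_nquads(crawled)
```

Because the graph label is the source URL, a fresh fetch of the same URL replaces the earlier quads, which matches the "no previous versions" requirement of this use case.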

2 Archiving Web Crawler

Several systems want to use the data gathered by one RDF crawler. They want the crawler to keep previous versions of the data. This might be used for showing users when particular parts of the data changed or for providing a consistent view of the crawled data at some point in the past.

2.1 Simple Solution

Use TriG with the graph label being some new identifier created at the time the retrieval was done. Some other data, in the default graph, connects that identifier with the URL used to fetch the content.

Example:

<http://crawler.example.org/r8571> { ... triples fetched in retrieval 8571 }
{ 
   <http://crawler.example.org/r8571> eg:source <http://example.org>;
                                      eg:date "2011-01-04T00:03:11"^^xs:dateTime
}
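
The bookkeeping this solution needs can be sketched in Python as follows. The counter-based numbering, the `record_retrieval` name, and the second retrieval date are assumptions made for illustration; the design only requires that each retrieval gets a new identifier.

```python
from datetime import datetime
from itertools import count

# Assumed numbering scheme: a simple counter, started at the number used in the example.
_next_id = count(8571)


def record_retrieval(source_url, fetched_at):
    """Mint a fresh graph label and the default-graph triples that describe it."""
    label = f"<http://crawler.example.org/r{next(_next_id)}>"
    stamp = fetched_at.strftime("%Y-%m-%dT%H:%M:%S")
    metadata = [
        (label, "eg:source", f"<{source_url}>"),
        (label, "eg:date", f'"{stamp}"^^xs:dateTime'),
    ]
    return label, metadata


# Two retrievals of the same URL (the second date is hypothetical).
first, first_meta = record_retrieval("http://example.org", datetime(2011, 1, 4, 0, 3, 11))
second, second_meta = record_retrieval("http://example.org", datetime(2011, 1, 5, 0, 3, 11))
```

Both retrievals point at the same `eg:source` but carry different graph labels, so the older snapshot stays available next to the newer one.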

3 Endorsement

A system wants to convey to another system in RDF that some person agrees with or disagrees with certain RDF triples.

3.1 Simple Solution

Use TriG with the graph label being an identifier for an RDF Graph (g-snap), so that it can be referred to in the default graph.

For example:

{ eg:sandro eg:endorses <g1> }
<g1> { ... the triples I'm endorsing ... }
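
A receiving system could resolve such an endorsement roughly as in the Python sketch below. The dictionary layout (with `None` standing for the default graph), the `endorsed_triples` name, and the single triple inside `<g1>` are assumptions for illustration; the example above elides the endorsed triples.

```python
# The dataset from the example: None stands for the default graph.
# The triple inside <g1> is a placeholder, since the example elides the real ones.
dataset = {
    None: {("eg:sandro", "eg:endorses", "<g1>")},
    "<g1>": {("eg:s", "eg:p", "eg:o")},
}


def endorsed_triples(dataset, person):
    """Collect the triples of every named graph that `person` endorses."""
    result = set()
    for s, p, o in dataset[None]:
        if s == person and p == "eg:endorses":
            result |= dataset.get(o, set())
    return result
```

Here `endorsed_triples(dataset, "eg:sandro")` returns the contents of `<g1>`, while a person with no `eg:endorses` statement gets an empty set.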

4 Separation of Inference

People run forward-chaining inference rules on their RDF data, and they want to be able to keep the inferred triples separable from the given ones. This allows them, among other things, to delete and regenerate the inferred triples after the underlying data changes.

This is a simpler use case than keeping the full derivation information, but may be enough for these design purposes.

4.1 Simple Solution

TriG or N-Quads where bnodes are allowed to be shared.

For example:

Alice knows Dan Brickley, who sometimes likes to not have a URI:

eg:Alice foaf:knows _:u1.
_:u1 foaf:mbox_sha1sum "70c053d15de49ff03a1bcc374e4119b40798a66e";
     foaf:name "Dan Brickley" .

From the FOAF spec and namespace document:

foaf:mbox_sha1sum rdfs:domain foaf:Agent .

By RDFS semantics, we can infer:

_:u1 rdf:type foaf:Agent .

Then a query for the foaf:name of every agent will show "Dan Brickley", whereas it would not without the inference. (The domain of foaf:name is owl:Thing.)

Now, how do we keep the inferred triple separate from the givens? In SPARQL, we could put the givens in one arbitrarily named graph and the conclusions in another. But if we dump and restore, we'll need to use a format that allows the bnode to be shared between those graphs.
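
The Dan Brickley example can be traced with a small Python sketch. It applies only the RDFS domain rule, keeps the conclusions in their own set, and shows that the query needs both sets. Representing graphs as sets of string tuples, and the `infer_domain_types` and `agent_names` helpers, are assumptions for illustration rather than a proposed design.

```python
# Given triples and schema from the example above, as (subject, predicate, object).
given = {
    ("eg:Alice", "foaf:knows", "_:u1"),
    ("_:u1", "foaf:mbox_sha1sum", '"70c053d15de49ff03a1bcc374e4119b40798a66e"'),
    ("_:u1", "foaf:name", '"Dan Brickley"'),
}
schema = {("foaf:mbox_sha1sum", "rdfs:domain", "foaf:Agent")}


def infer_domain_types(given, schema):
    """RDFS domain rule: if (p rdfs:domain c) and (s p o), conclude (s rdf:type c)."""
    domains = {(p, c) for (p, rel, c) in schema if rel == "rdfs:domain"}
    inferred = {
        (s, "rdf:type", c)
        for (s, p, _o) in given
        for (dp, c) in domains
        if p == dp
    }
    return inferred - given  # keep only genuinely new triples


def agent_names(*graphs):
    """foaf:name of every subject typed foaf:Agent in the merge of `graphs`."""
    merged = set().union(*graphs)
    agents = {s for (s, p, o) in merged if p == "rdf:type" and o == "foaf:Agent"}
    return sorted(o for (s, p, o) in merged if p == "foaf:name" and s in agents)


# The inferred graph is stored separately and can be dropped and recomputed.
inferred = infer_domain_types(given, schema)
```

The blank node `_:u1` appears in both `given` and `inferred`. The merge in `agent_names` only works because both sets use the same node, which is the property a dump format would have to preserve.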

5 SPARQL Backup and Restore

People running SPARQL systems want to be able to dump the contents of their database to a file and be able to restore it later. The format needs to be standardized so they can load it on a different vendor's SPARQL system, or give it to someone else to load on their SPARQL system.

5.1 Simple Solution

TriG, with no additional semantics:

{ eg:s eg:p eg:o . }
eg:g { eg:s eg:p eg:o. }

5.2 Other Designs

5.2.1 N-Quads, with no additional semantics

eg:s eg:p eg:o .
eg:s eg:p eg:o eg:g .

5.2.2 SPARQL Update Subset

As people do with SQL, have the dump format be a sequence of SPARQL Update statements which reconstruct the database. That is:

INSERT DATA { eg:s eg:p eg:o. }
INSERT DATA { GRAPH eg:g { eg:s eg:p eg:o. } }
  • Does not preserve bnode relationships across graphs, or within a single graph when that graph is split across several { } blocks.
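
A minimal Python sketch of this design, under the assumption that the store is held as a dictionary from graph label (`None` for the default graph) to a set of triples. The `dump_as_update` and `bnodes_shared_between_graphs` names are hypothetical. The second helper flags the blank nodes that the limitation above would affect.

```python
# Hypothetical store contents: graph label (None = default graph) -> set of triples.
store = {
    None: {("eg:s", "eg:p", "eg:o")},
    "eg:g": {("eg:s", "eg:p", "eg:o")},
}


def dump_as_update(store):
    """Emit one INSERT DATA statement per graph, default graph first."""
    statements = []
    for graph in sorted(store, key=lambda g: (g is not None, g or "")):
        body = " ".join(f"{s} {p} {o} ." for s, p, o in sorted(store[graph]))
        if graph is None:
            statements.append(f"INSERT DATA {{ {body} }}")
        else:
            statements.append(f"INSERT DATA {{ GRAPH {graph} {{ {body} }} }}")
    return statements


def bnodes_shared_between_graphs(store):
    """Return blank node labels that occur in more than one graph."""
    seen = {}
    for graph, triples in store.items():
        for triple in triples:
            for term in triple:
                if term.startswith("_:"):
                    seen.setdefault(term, set()).add(graph)
    return {label for label, graphs in seen.items() if len(graphs) > 1}
```

A dump tool could refuse to use this format, or at least warn, whenever `bnodes_shared_between_graphs` returns a non-empty set.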

5.3 Potential Complications

  1. Some SPARQL systems (e.g., 4store) maintain the default graph as a merge of the named graphs. Should the dump include every triple twice to show this? Or should there be an indicator that it is a dump of this kind of system? What happens if you try to restore a dump with content in the default graph into this kind of system?
  2. It seems important to keep the solution semantics-free, like the named graphs in SPARQL. Otherwise a SPARQL dump will be asserting things not asserted by the SPARQL database it conveys.
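
The restore question in the first complication can be made concrete with a short Python sketch. It assumes a dump is loaded as a dictionary from graph label (`None` for the default graph) to a set of triples, and it finds default-graph triples that a merge-style store could not represent. The `default_only_triples` name and both sample dumps are hypothetical.

```python
def default_only_triples(dump):
    """Triples in the default graph (key None) that appear in no named graph."""
    named = set()
    for graph, triples in dump.items():
        if graph is not None:
            named |= triples
    return dump.get(None, set()) - named


# A dump whose default graph is exactly the merge of its named graphs.
merge_style = {
    None: {("eg:s", "eg:p", "eg:o")},
    "eg:g": {("eg:s", "eg:p", "eg:o")},
}

# A dump with independent default-graph content.
independent = {
    None: {("eg:a", "eg:b", "eg:c")},
    "eg:g": {("eg:s", "eg:p", "eg:o")},
}
```

For `merge_style` the result is empty, so nothing is lost on restore. For `independent` the function returns the one default-graph triple that a store of the merge-of-named-graphs kind has no obvious place for.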