Named- and background graphs, triples vs quads, trust, etc.

(follow-up from an off-list discussion between Andy, Jeen and myself)

The current SPARQL document defines the concept of "RDF Datasets". An
RDF Dataset is defined as a union of RDF graphs: a single, unnamed
background graph and zero or more named graphs. The SPARQL document
explicitly states that the relationship between named and background
graphs is not defined, and it mentions two useful arrangements
(section 7.1 in the 2005/03/24 version of the editors' draft):

    1. place provenance information about the named graphs in the
       background graph

    2. include the RDF merge of the named graphs in the background graph.

The first point refers to an arrangement where the background graph is
separate from the named graphs.

SPARQL further allows one to query the names of graphs by using the
GRAPH keyword. Using the GRAPH keyword results in the querying of
triples from named graphs; omitting it results in the querying of the
background graph.

If the first arrangement of named and background graphs is considered,
then this query mechanism is essentially a mechanism for querying quads,
not triples! The graph name is no longer just an ignorable attribute of
a triple, but is now an essential part of it. It appears to me that there
is a mismatch between RDF and SPARQL here.

The second arrangement is an entirely different story. Here, graph names
are "merely" a triple grouping mechanism; the set of triples that is
queried does not depend on whether a query asks for the name of the
graph.
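
The difference between the two arrangements can be sketched with a toy
model in Python. This is only an illustration under stated assumptions,
not anything taken from the SPARQL draft: a graph is modelled as a set
of (subject, predicate, object) tuples, the RDF merge is approximated by
a plain set union (blank node renaming is ignored), and all names and
data (`named`, `background_1`, `background_2`, the `ex:` terms) are made
up for the example.

```python
# Toy model: a graph is a set of (subject, predicate, object) tuples.
# All names and data below are hypothetical, for illustration only.
named = {
    "http://example.org/g1": {("ex:a", "ex:knows", "ex:b")},
    "http://example.org/g2": {("ex:b", "ex:knows", "ex:c")},
}

# Arrangement 1: the background graph is separate from the named graphs
# and only holds provenance information about them.
background_1 = {
    ("http://example.org/g1", "ex:source", "ex:alice"),
    ("http://example.org/g2", "ex:source", "ex:bob"),
}

# Arrangement 2: the background graph is the merge of the named graphs
# (assumption: plain set union, blank node renaming is ignored).
background_2 = set().union(*named.values())


def query_without_graph(background):
    """{?s ?p ?o} without GRAPH: only the background graph is matched."""
    return sorted(background)


def query_with_graph_variable(named_graphs):
    """GRAPH ?g {?s ?p ?o}: one solution per (graph name, triple) pair."""
    return sorted(
        (g, s, p, o)
        for g, triples in named_graphs.items()
        for (s, p, o) in triples
    )


only_background_1 = query_without_graph(background_1)
only_background_2 = query_without_graph(background_2)
quads = query_with_graph_variable(named)
```

In this model the same GRAPH-less query returns only provenance triples
under the first arrangement, but every triple of every named graph under
the second; under the first arrangement the named-graph triples can only
be reached as (name, s, p, o) quads.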

My main concern with the current spec is that it leaves the choice of
the arrangement for RDF Datasets up to the implementer of the query
engine. These two arrangements seem to be largely incompatible with
each other, and as such this choice has the potential to split the RDF
community into two camps: an "RDF is quads" camp and an "RDF is triples
plus context" camp.

I see two possible ways to solve this issue:
   1/ standardize on a single arrangement (preferably the second), or
   2/ move the choice of arrangement into the query language.

Option 2 is probably the best solution, considering the many use cases
requiring different arrangements (trust issues and all...). Also, it
only requires small modifications to the current SPARQL spec in order to
allow the query writer to specify whether the query involves a specific
named graph, all named graphs, only the background graph, or the union
of them all. The required modifications are:

   - Make the GRAPH attribute truly optional so that it no longer
     influences which (set of) graph(s) is queried.

   - Allow variables in GRAPH attributes to be left unbound for triples
     that are in the background graph.

With these modifications, SPARQL would again be usable as a true triple
query language, while still offering the choice of graph arrangement but
without risking the interoperability of SPARQL-aware tools. The
following queries illustrate the effect of the proposed modifications:

Q1 will evaluate against the union of all graphs:

   Q1: SELECT * WHERE {?s ?p ?o}

Q2 will evaluate against the graph with name <URI>:

   Q2: SELECT * WHERE GRAPH <URI> {?s ?p ?o}

Q3 will again evaluate against the union of all graphs, leaving ?g
unbound for triples in the background graph and including multiple
solutions for triples that are in more than one graph:

   Q3: SELECT * WHERE GRAPH ?g {?s ?p ?o}

Q4 will evaluate against all named graphs:

   Q4: SELECT * WHERE GRAPH ?g {?s ?p ?o} FILTER bound(?g)

Q5 will evaluate against the background graph:

   Q5: SELECT * WHERE GRAPH ?g {?s ?p ?o} FILTER !bound(?g)

Optionally, a syntactic shortcut could be introduced for restricting
queries to the background graph (cf. Q5), e.g.:

   Q6: SELECT * WHERE GRAPH BACKGROUND {?s ?p ?o}
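
The intended results of Q1 to Q5 under the proposed modifications can be
sketched with a small Python model. This is a minimal sketch under
stated assumptions, not a definition of the semantics: graphs are sets
of (s, p, o) tuples, `None` stands for an unbound ?g, the union in Q1 is
treated as a set (so a triple present in several graphs appears once),
and the names `BACKGROUND`, `NAMED`, `all_solutions` and `q1` to `q5`
are hypothetical.

```python
# Toy model of the proposed semantics; names and data are hypothetical.
# A graph is a set of (s, p, o) tuples; None stands for an unbound ?g.
BACKGROUND = {("ex:a", "ex:type", "ex:Person")}
NAMED = {
    "http://example.org/g1": {("ex:a", "ex:knows", "ex:b")},
    "http://example.org/g2": {("ex:a", "ex:knows", "ex:b"),
                              ("ex:b", "ex:knows", "ex:c")},
}


def all_solutions():
    """Q3: every triple with its graph name; ?g is None for background triples."""
    rows = [(None, s, p, o) for (s, p, o) in BACKGROUND]
    for name, triples in NAMED.items():
        rows.extend((name, s, p, o) for (s, p, o) in triples)
    return rows


def q1():
    """Q1: no GRAPH, so the union of all graphs (assumed to be a set)."""
    return {(s, p, o) for (_, s, p, o) in all_solutions()}


def q2(uri):
    """Q2: only the triples of the graph named by uri."""
    return {(s, p, o) for (g, s, p, o) in all_solutions() if g == uri}


def q4():
    """Q4: Q3 filtered on bound(?g), i.e. named graphs only."""
    return [row for row in all_solutions() if row[0] is not None]


def q5():
    """Q5: Q3 filtered on !bound(?g), i.e. the background graph only."""
    return [row for row in all_solutions() if row[0] is None]
```

With this sample data, the triple shared by both named graphs yields two
solutions in Q3 and Q4 (one per graph name) but a single triple in Q1,
and Q5 returns only the background triple with ?g left unbound.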

I would like the DAWG to seriously consider this proposal. Keeping the
SPARQL spec as it is today can have disastrous effects on the
interoperability of SPARQL-aware tools.

Regards,

Arjohn Kampman

-- 
arjohn.kampman@aduna.biz
Aduna BV - http://aduna.biz/
Prinses Julianaplein 14-b, 3817 CS Amersfoort, The Netherlands
tel. +31-(0)33-4659987  fax. +31-(0)33-4659987

Received on Wednesday, 30 March 2005 16:10:33 UTC