Design:BasicFederatedQuery

From SPARQL Working Group
Jump to: navigation, search


Federated query is the ability to take a query and provide solutions based on information from many different sources.

A building block is the ability to have one query be able to issue a query on another SPARQL endpoint during query execution.

Status, To Do's

  • Allow SERVICE without argument?
  • Allow nesting of SERVICE?
  • Need to preserve order
  • Algebra

Definition and scope of feature

ISSUE: How to specify BasicFederatedQuery in a way that acknowledges optional nature of feature & security issues

Federated query is the ability to take a query and provide solutions based on information from many different sources. It is a hard problem in it's most general form and is the subject of continuing (and continuous) research.

A building block is the ability to have one query be able to issue a query on another SPARQL endpoint during query execution.

Different approaches are possible. One is to assume top-to-bottom order, so if a variable is bound above the invocation of the remote source then it is passed as a parameter. This is however a departure from SPARQL's declarative semantic and scope rules. It would be better if the federated subquery would leave all latitude to the implementation to decide on join order and would thus be more in the non-procedural spirit of SPARQL.

Ideally, federating queries should be transparent. With SPARQL, this comes at a high cost, specially if any of multiple end points may in principle match any of the triple patterns in a query. This means that information about colocation of data within one end point is hard to express. For this purpose, if a subquery in a SPARQL query is labeled with "SERVICE" then this means that the subquery should be sent as a whole, without attempts of dividing it into smaller fragments and that two SERVICE group patters or subqueries should not be merged into single request. Nevertheless these rules are not absolute, because optimizer may prove that some pattern is totally empty or produce a subset of the result of other subquery.

Syntax

Changes the the grammar. Examples are also useful here and in the defintion.

Grammar rules. Examples are always useful. See A.8 Grammar in SPARQL/2008.

Algebra operator

Describe algebra operators introduced. This is the definition of the operation together with it's evaluation.

See 12.4 SPARQL Algebra and 12.5 Evaluation Semantics

SPARQL evaluation is strict evaluation and in applicative order - in other words, logically, the arguments to a algebra operation are evaluated then the operator called on the results. Implementations often improve on this but must obtain the same results of this naive evaluation.

Mapping from AST to algebra

How do we get from the syntax (Abstract Syntax Tree) to the algebra expression. Often trivial but consider potential interactions with elements in groups (which get joined together) and Basic Graph Patterns and filter migration (filters act as is done after the BGP regards of where in a BGP they are written).

12.2 SPARQL Query.

Test Cases

Look in the local database of my books and find the author for each one as specified by web-accessible query SPARQL service.

PREFIX : <http://example/>
PREFIX  dc:     <http://purl.org/dc/elements/1.1/>
SELECT ?a
FROM <mybooks.rdf>
{
  ?b dc:title ?title .
  SERVICE <http://sparql.org/books>
    { ?s dc:title ?title . ?s dc:creator ?a }
}

Return the people whom Orri knows at both semanticweb.com and myopenlink.com:

select ?contact1 where {
  SERVICE <http://www.semanticweb.com/sparql>
    {select ?contact1 where { ?me foaf:nick "Orri" . ?me foaf:knows ?f . ?f foaf:name ?contact1 }}
  SERVICE <http://www.myopenlink.com/sparql>
     {select ?contact2 where { ?me foaf:nick "Orri" . ?me foaf:knows ?f . ?f foaf:name ?contact2 }}
  filter (?contact1 = ?contact2)
 }

or, equivalent query,

select ?contact where {
  SERVICE <http://www.semanticweb.com/sparql>
    {?me1 foaf:nick "Orri" ; foaf:knows ?f1 . ?f1 foaf:name ?contact }
  SERVICE <http://www.myopenlink.com/sparql>
     {?me2 foaf:nick "Orri" ; foaf:knows ?f2 . ?f2 foaf:name ?contact }
 }

It is clear from the use case that it makes little sense to look for contacts of Orri's IRI at myopenlink.com in the repository at semanticweb.com if both sites were known to assign different person URI's. Without the explicit partitioning of the patterns, it would be difficult to infer this, so single group graph pattern like

select ?contact
FROM SERVICE <http://www.semanticweb.com/sparql>
FROM SERVICE <http://www.myopenlink.com/sparql>
where {
    ?me1 foaf:nick "Orri" ; foaf:knows ?f1 . ?f1 foaf:name ?contact .
    ?me2 foaf:nick "Orri" ; foaf:knows ?f2 . ?f2 foaf:name ?contact .
 }

is definitely less efficient.

The query optimizer would have all latitude in determining which site had fewer contacts and putting this first in a loop join or for retrieving both in parallel and doing a hash intersection or any other possible execution plan. The plans' relative merit is entirely dependent on the expected cardinalities and access latencies of the sites.