FeDeRate/QueryProjectionThroughRules

From W3C Wiki

Query Projection Through Rules

Given a set of rules over some arbitrary data, how do we transform a query over the conclusions of those rules into a query over the ground facts?

Motivation

A Direct Mapping of Relational Data to RDF defines an RDF graph for a given relational database. This ontology of this "direct graph" is specifically constrained; it does not use the shared ontologies we'd like to see in the Semantic Web. RDF rules, like RIF frames rules, SWRL, N3, etc can transform this graph into a more SemWeb-friendly structure, re-using popular ontologies like FOAF. Projects like FeDeRate can transform SPARQL queries over the direct graph into SQL queries. This document discusses a process which can be used to transform SPARQL queries over the SemWeb-friendly graph into SPARQL queries over the direct graph.

Terms

  • rule: a mapping from one graph to another.
  • rule body: the part of the rule which matches the input graph.
  • rule head: the part of the rule which defines the inferred graph.
  • possible inference: the graphs which can be inferred from a given rule and an arbitrary input graph.
  • head query: a query in which all triples could be satisfied by some possible inference.
  • body query: a query in which all triples could be satisfied by an arbitrary input graph.
  • rule invocation: a mapping from variables in a rule head to terms in a head query.
  • query mapper: a function which maps head queries to body queries for some given rules.

Example Rule

 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 PREFIX empP: <http://hr.example/DB/Employee#>
 PREFIX task: <http://hr.example/DB/Task#>
 SHAREDVARS DRACONIAN
 INTERSECTION {<Emp_TaskToFoaf>}?emp {<Emp_TaskToFoaf>}?man
 
 <Emp_TaskToFoaf> 
   CONSTRUCT { ?emp  foaf:last_name  ?wname .
               ?emp  foaf:knows      ?man .
               ?man  foaf:last_name  ?mname }
   WHERE { ?emp  empP:lastName   ?wname .
           ?pair task:drone      ?emp .
           ?pair task:manager    ?man .
           ?man  empP:lastName   ?mname }

Syntax

  • (BASE | PREFIX)* — URI resolution details
  • Parameters* — These are some old names so don't read too much semantics into them.
    • "SHAREDVARS" — whether the query mapper assume arbitrary connectivity in between the rule heads.
    • "INTERSECTION" — a limited set of points where the rules heads may connect.
  • (ruleName? CONSTRUCT)* — an optional name and then a SPARQL CONSTRUCT.

Query Mapper

  • for each BGP in the query
    • for each triple Tq with terms {TT} matching a triple Th with variables Vth in rule head H,
      • test to see if there is an existing rule invocation Ri compatible with (where all Vth are unbound or bound to TT) the terms in the query
        • if yes, Ri += the the map from Vth to TT (the variables bindings which make Th match Tq).
        • if no, create a new Ri with the the map from Vth to TT.
  • mapped BGP is cross product of all rule body invocations with the bound terms substituted and with fresh variables (e.g. ?_Emp_TaskToFoaf_0_pair) for body variables not matched in the head. repeated invocations of a rule (with different bindings) is a disjunction.

Single Rule Head Example

 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
 SELECT ?lname {
   ?who  foaf:last_name    ?lname .
   ?who  foaf:knows        ?whom  .
   ?whom foaf:last_name    'Smith'^^xsd:string }

Matching this against the example rule, with no allowance for shared variables (by commenting out INTERSECTION), we get the expected solution:

 $ SPARQL -npm Emp_TaskToFoaf-noShare.map two.rq 
 SELECT ?lname  WHERE
 {
   ?_Emp_TaskToFoaf_2_pair <http://hr.example/DB/Task#drone> ?who .
   ?_Emp_TaskToFoaf_2_pair <http://hr.example/DB/Task#manager> ?whom .
   ?whom <http://hr.example/DB/Employee#lastName> "Smith"^^<http://www.w3.org/2001/XMLSchema#string>  .
   ?who <http://hr.example/DB/Employee#lastName> ?lname .   
 }

Matching against the rule example which allows ?who and ?whom to connect gives us several equivalent but pointless formulations:

 $ SPARQL -npm Emp_TaskToFoaf-noShare.map two.rq 
 SELECT ?lname  WHERE
 {
   {
     ?_Emp_TaskToFoaf_2_pair <http://hr.example/DB/Task#drone> ?who .
     ?_Emp_TaskToFoaf_2_pair <http://hr.example/DB/Task#manager> ?whom .
     ?whom <http://hr.example/DB/Employee#lastName> "Smith"^^<http://www.w3.org/2001/XMLSchema#string>  .
     ?who <http://hr.example/DB/Employee#lastName> ?lname .
   }
 UNION
   {
     ?_Emp_TaskToFoaf_0_pair <http://hr.example/DB/Task#drone> ?who .
     ?_Emp_TaskToFoaf_0_pair <http://hr.example/DB/Task#manager> ?whom .
     ?_Emp_TaskToFoaf_1_pair <http://hr.example/DB/Task#drone> ?whom .
     ?_Emp_TaskToFoaf_1_pair <http://hr.example/DB/Task#manager> ?_Emp_TaskToFoaf_1_man .
     ?_Emp_TaskToFoaf_1_man <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_1_mname .
     ?whom <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_0_mname .
     ?whom <http://hr.example/DB/Employee#lastName> "Smith"^^<http://www.w3.org/2001/XMLSchema#string>  .
     ?who <http://hr.example/DB/Employee#lastName> ?lname .
   }
 UNION
   {
     ?_Emp_TaskToFoaf_3_pair <http://hr.example/DB/Task#drone> ?whom .
     ?_Emp_TaskToFoaf_3_pair <http://hr.example/DB/Task#manager> ?who .
     ?_Emp_TaskToFoaf_4_pair <http://hr.example/DB/Task#drone> ?who .
     ?_Emp_TaskToFoaf_4_pair <http://hr.example/DB/Task#manager> ?whom .
     ?whom <http://hr.example/DB/Employee#lastName> "Smith"^^<http://www.w3.org/2001/XMLSchema#string>  .
     ?who <http://hr.example/DB/Employee#lastName> ?lname .
     ?who <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_4_wname .
     ?whom <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_4_mname .
   }
 UNION
   {
     ?_Emp_TaskToFoaf_5_pair <http://hr.example/DB/Task#drone> ?_Emp_TaskToFoaf_5_emp .
     ?_Emp_TaskToFoaf_5_pair <http://hr.example/DB/Task#manager> ?who .
     ?_Emp_TaskToFoaf_6_pair <http://hr.example/DB/Task#drone> ?who .
     ?_Emp_TaskToFoaf_6_pair <http://hr.example/DB/Task#manager> ?whom .
     ?whom <http://hr.example/DB/Employee#lastName> "Smith"^^<http://www.w3.org/2001/XMLSchema#string>  .
     ?who <http://hr.example/DB/Employee#lastName> ?lname .
     ?_Emp_TaskToFoaf_5_emp <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_5_wname .
     ?who <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_6_wname .
   }
 }

These are effectively equivalent; they just re-combine the pieces of the rule head in different ways to match the query. Their presence in the UNION breaks the cardinality of the mapped query.

Double Rule Head Example

Naturally, we need this connectivity to answer some queries. If we disallow graph continuity, we fail to match:

 SPARQL -npm Emp_TaskToFoaf-noShare.map two.rq 
 failed to match triples prefixed by "!" in
 SELECT ?lname 
 WHERE
 {
      ?who <http://xmlns.com/foaf/0.1/last_name> ?lname .
      ?who <http://xmlns.com/foaf/0.1/knows> ?whom .
 !    ?whom <http://xmlns.com/foaf/0.1/knows> ?whom2 .
 !    ?whom2 <http://xmlns.com/foaf/0.1/last_name> "Smith"^^xsd:string  .
 }

If we enable graph continuity, we get a useful disjoint (first below) plus the useless reformulations. Next to the mapped graph patterns are the bindings from variable in the Emp_TaskToFoaf rule head to term in the query. For instance, in the first solution below, the query triple ?who <http://xmlns.com/foaf/0.1/last_name> ?lname . matched the rule head ?emp foaf:last_name ?wname ., binding the rule variable ?emp to ?who.

 SPARQL -npm Emp_TaskToFoaf.map two.rq 
 SELECT ?lname 
 WHERE
 {
   {                                                                           #┌────────┬────────┬─────────┬───────┬─────────┐
     ?who <http://hr.example/DB/Employee#lastName> ?lname .                    #│ ?emp   │ ?man   │ ?mname  │ ?pair │ ?wname  │
     ?_Emp_TaskToFoaf_3_pair <http://hr.example/DB/Task#drone> ?who .          #│   ?who │  ?whom │      -- │    -- │  ?lname │
     ?_Emp_TaskToFoaf_3_pair <http://hr.example/DB/Task#manager> ?whom .       #│  ?whom │ ?whom2 │ "Smith" │    -- │      -- │
     ?whom <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_3_mname . #└────────┴────────┴─────────┴───────┴─────────┘
     ?whom <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_4_wname .
     ?_Emp_TaskToFoaf_4_pair <http://hr.example/DB/Task#drone> ?whom .
     ?_Emp_TaskToFoaf_4_pair <http://hr.example/DB/Task#manager> ?whom2 .
     ?whom2 <http://hr.example/DB/Employee#lastName> "Smith"^^xsd:string  .
   }
 UNION
   {                                                                           #┌────────┬────────┬─────────┬───────┬─────────┐
     ?who <http://hr.example/DB/Employee#lastName> ?lname .                    #│ ?emp   │ ?man   │ ?mname  │ ?pair │ ?wname  │
     ?_Emp_TaskToFoaf_0_pair <http://hr.example/DB/Task#drone> ?who .          #│   ?who │  ?whom │      -- │    -- │  ?lname │
     ?_Emp_TaskToFoaf_0_pair <http://hr.example/DB/Task#manager> ?whom .       #│  ?whom │ ?whom2 │      -- │    -- │      -- │
     ?whom <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_0_mname . #│ ?whom2 │     -- │      -- │    -- │ "Smith" │
     ?whom <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_1_wname . #└────────┴────────┴─────────┴───────┴─────────┘
     ?_Emp_TaskToFoaf_1_pair <http://hr.example/DB/Task#drone> ?whom .
     ?_Emp_TaskToFoaf_1_pair <http://hr.example/DB/Task#manager> ?whom2 .
     ?whom2 <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_1_mname .
     ?whom2 <http://hr.example/DB/Employee#lastName> "Smith"^^xsd:string  .
     ?_Emp_TaskToFoaf_2_pair <http://hr.example/DB/Task#drone> ?whom2 .
     ?_Emp_TaskToFoaf_2_pair <http://hr.example/DB/Task#manager> ?_Emp_TaskToFoaf_2_man .
     ?_Emp_TaskToFoaf_2_man <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_2_mname .
   }
 UNION
   {
     ?whom2 <http://hr.example/DB/Employee#lastName> "Smith"^^xsd:string  .    #┌────────┬────────┬─────────┬───────┬─────────┐
     ?_Emp_TaskToFoaf_5_pair <http://hr.example/DB/Task#drone> ?whom2 .        #│ ?emp   │ ?man   │ ?mname  │ ?pair │ ?wname  │
     ?_Emp_TaskToFoaf_5_pair <http://hr.example/DB/Task#manager> ?who .        #│ ?whom2 │   ?who │  ?lname │    -- │ "Smith" │
     ?who <http://hr.example/DB/Employee#lastName> ?lname .                    #│   ?who │  ?whom │      -- │    -- │      -- │
     ?who <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_6_wname .  #│  ?whom │ ?whom2 │      -- │    -- │      -- │
     ?_Emp_TaskToFoaf_6_pair <http://hr.example/DB/Task#drone> ?who .          #└────────┴────────┴─────────┴───────┴─────────┘
     ?_Emp_TaskToFoaf_6_pair <http://hr.example/DB/Task#manager> ?whom .
     ?whom <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_6_mname .
     ?whom <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_7_wname .
     ?_Emp_TaskToFoaf_7_pair <http://hr.example/DB/Task#drone> ?whom .
     ?_Emp_TaskToFoaf_7_pair <http://hr.example/DB/Task#manager> ?whom2 .
     ?whom2 <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_7_mname .
   }
 UNION
   {
     ?_Emp_TaskToFoaf_8_emp <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_8_wname .
     ?_Emp_TaskToFoaf_8_pair <http://hr.example/DB/Task#drone> ?_Emp_TaskToFoaf_8_emp .
     ?_Emp_TaskToFoaf_8_pair <http://hr.example/DB/Task#manager> ?who .
     ?who <http://hr.example/DB/Employee#lastName> ?lname .                    #┌────────┬────────┬─────────┬───────┬─────────┐
     ?who <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_9_wname .  #│ ?emp   │ ?man   │ ?mname  │ ?pair │ ?wname  │
     ?_Emp_TaskToFoaf_9_pair <http://hr.example/DB/Task#drone> ?who .          #│     -- │   ?who │  ?lname │    -- │      -- │
     ?_Emp_TaskToFoaf_9_pair <http://hr.example/DB/Task#manager> ?whom .       #│   ?who │  ?whom │      -- │    -- │      -- │
     ?whom <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_9_mname . #│  ?whom │ ?whom2 │ "Smith" │    -- │      -- │
     ?whom <http://hr.example/DB/Employee#lastName> ?_Emp_TaskToFoaf_10_wname .#└────────┴────────┴─────────┴───────┴─────────┘
     ?_Emp_TaskToFoaf_10_pair <http://hr.example/DB/Task#drone> ?whom .
     ?_Emp_TaskToFoaf_10_pair <http://hr.example/DB/Task#manager> ?whom2 .
     ?whom2 <http://hr.example/DB/Employee#lastName> "Smith"^^xsd:string  .
   }
 }

The most pressing question is, how can one detect which of these is redundant and which is needed for completeness? I expect the answer lies in the mapping tables (which are implemented as SPARQL result sets).

@@@Old Stuff

As I understand magic sets, you use them to model e.g. relational rules:

--  Management(worker name, department name, manager name)
CREATE VIEW Management FROM
  SELECT lackey.name AS worker, dept.name AS department, boss.name AS manager
    FROM Employees AS boss
         JOIN Departments AS dept ON boss.manages=dept.id
         JOIN Employees AS lackey ON lackey.department=dept.id


This, plus some schema (for position of vars) gets boiled down into a horn rule:

Management(WORKER, DEPARTMENT, MANAGER) <=
  Employees(NAME, _, _, DEPARTMENT)
  Departments(ID, _, NAME, _)
  Employees(NAME, ID)

A query on Management gets pushed into a join on the constituant tables (and if they are views, into *their* constituants…).

Given a schema:

CREATE TABLE Employee
  (empid INT, PRIMARY KEY (empid),
   firstName STRING,
   lastName STRING,
   birthday DATE,
   manager INT, FOREIGN KEY (manager) REFERENCES Employee(empid)
  );
CREATE TABLE Tasks
  (taskid INT, PRIMARY KEY (taskid),
   name STRING,
   lead INT, FOREIGN KEY (lead) REFERENCES Employee(empid)
  );
CREATE TABLE TaskAssignments
  (id INT PRIMARY KEY, PRIMARY KEY (id),
   task INT, FOREIGN KEY (task) REFERENCES Tasks(taskid),
   employee INT, FOREIGN KEY (employee) REFERENCES Employee(empid)
  );


, an interface graph definition:

PREFIX foaf : <http://xmlns.com/foaf/0.1/>
PREFIX empP : <http://hr.example/DB/Employee#>
CONSTRUCT {
  ?who foaf:first_name ?fname .
  ?who foaf:last_name ?lname .
} WHERE {
  ?who empP:firstName ?fname .
  ?who empP:lastName ?lname .
}


and a interface query:

PREFIX foaf : <http://xmlns.com/foaf/0.1/>
PREFIX xsd : <http://www.w3.org/2001/XMLSchema#>
SELECT ?emp {
  ?emp foaf:last_name ?name
  FILTER (?name = "Smith"^^xsd:string)
}


, FeDeRate can produce either an SQL query:

SELECT R_emp.empid AS emp
FROM Employee AS R_emp
WHERE R_emp.empid IS NOT NULL AND R_emp.lastName="Smith"


or an SQL triples view:

CREATE VIEW triples AS
  SELECT CONCAT("http://hr.example/DB/", "Employee", "/", "empid", ".", R_S.empid, "#record") AS S,
         "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>" AS P,
         "<http://xmlns.com/foaf/0.1/Person>" AS O
    FROM Employee AS R_S
    WHERE R_S.firstName IS NOT NULL AND R_S.lastName IS NOT NULL
UNION
  SELECT CONCAT("http://hr.example/DB/", "Employee", "/", "empid", ".", R_S.empid, "#record") AS S,
         "<http://xmlns.com/foaf/0.1/first_name>" AS P,
         CONCAT("'", R_S.firstName, "'^^<http://www.w3.org/2001/XMLSchema#string>") AS O
   FROM Employee AS R_S
  WHERE R_S.empid IS NOT NULL AND R_S.firstName IS NOT NULL
UNION
  SELECT CONCAT("http://hr.example/DB/", "Employee", "/", "empid", ".", R_S.empid, "#record") AS S,
         "<http://xmlns.com/foaf/0.1/last_name>" AS P,
         CONCAT("'", R_S.lastName, "'^^<http://www.w3.org/2001/XMLSchema#string>") AS O
    FROM Employee AS R_S
   WHERE R_S.empid IS NOT NULL AND R_S.lastName IS NOT NULL


This is done by creating a horn rule for each triple in the rules heads. Iterating through the triples in any BGP in the query (assuming non-variable predicates), we can find the corresponding rule, check if we have invoked that rule with that specific variable substitution. If it's "compatible", continue to the next triple; if there's no corresponding set of bindings for that rule, add a binding which makes the triple "true". See this in detail below.

The resulting set of rules and corresponding bindings are instantiated, and the graph is trimmed to get rid of parts that bind variables that are uninteresting to the query (@@ discuss soundness).

Best Matches

The reason that the rule can have multiple bindings is that a query can match the rules heads in multiple ways. The interface graph definition:

PREFIX foaf : <http://xmlns.com/foaf/0.1/>
PREFIX empP : <http://hr.example/DB/Employee#>
PREFIX task : <http://hr.example/DB/Task#>
CONSTRUCT {
  ?emp foaf:last_name ?wname .
  ?emp foaf:knows ?man .
  ?man foaf:last_name ?mname .
} WHERE {
  ?emp empP:lastName ?wname .
  ?pair task:drone ?emp .
  ?pair task:manager ?man .
  ?man empP:lastName ?mname .
}

has one rule, here called "rule1". The index is:

_ foaf:last_name _ => [(?emp foaf:last_name ?wname, rule1),
                       (?man foaf:last_name ?mname, rule1)] .
_ foaf:knows _ => [(?emp foaf:knows ?man, rule1)] .


For more rigor, examine a query which requires multiple invocations of the rule:

PREFIX foaf : <http://xmlns.com/foaf/0.1/>
PREFIX xsd : <http://www.w3.org/2001/XMLSchema#>
SELECT ?lname {
  ?who foaf:last_name ?lname .
  ?who foaf:knows ?whom .
  ?whom foaf:knows ?whom2 .
  ?whom2 foaf:last_name "Smith"^^xsd:string .
}


This query has predicates (foaf:last_name) which match the rule head in multiple places. The current algorithm recursively matches each triple in the query in as many places as possible, and then selects the solution with the smallest non-zero number of rule "invocations". `?who foaf:last_name ?lname` matches rule1 with either `?who=>?emp, ?lname=>?wname` or `?who=>?man, ?lname=>?mname`. The former requires two total rule invocations and the latter three, so the former is chosen. In more detail, that's:

rule1 ?emp ?wname ?man
?who ?lname ?whom
?whom --- ?whom2

vs:

rule1 ?emp ?wname ?man
?man ?mname ---
?who ?whom
?whom --- ?whom2

Inventing variable names for the variables referenced in rule1's body, but not in the head:

rule1 ?emp ?wname ?man ?mname
?who ?lname ?whom ---
?whom --- ?whom2 "Smith"xsd:string

we can instantiate rule1's body with the above variable substitutions. We trim the triples which do not lead to a variable selected or constrained in the user query, resulting in a stem query:

PREFIX empP : <http://hr.example/DB/Employee#>
PREFIX task : <http://hr.example/DB/Task#>
PREFIX xsd : <http://www.w3.org/2001/XMLSchema#>
SELECT ?lname WHERE {
  ?who empP:lastName ?lname .
  ?_0_pair task:drone ?who .
  ?_0_pair task:manager ?whom .
  ?_1_pair task:drone ?whom .
  ?_1_pair task:manager ?whom2 .
  ?whom2 empP:lastName "Smith"^^xsd:string .
}


which is can be converted to SQL.

Application to Query Federation

By associating each query with an endpoint, this algorithm can be used to insert service clauses into a submitted query. The completeness of the Best Match semantics above become an issue as you throw away opportunities for e.g. service one to bind first_name and service two to bind last_name; weird, but possible.

@@ noodling on query approach @@

If we formed a graph from the query:

  <who> foaf:last_name <lname> .
  <who> foaf:knows <whom> .
  <whom> foaf:knows <whom2> .
  <whom2> foaf:last_name "Smith"^^xsd:string .

we still need to add `<whom> foaf:last_name "XXX"xsd:string .` to get the rule head to match they query, (result:


┌────────┬─────────┬────────────────────────────────────────────────────┬──────────────────────────────────────────────────┐
│ ?emp   │ ?man    │ ?mname                                             │ ?wname                                           │
│  <who> │  <whom> │   "XXX"^^<http://www.w3.org/2001/XMLSchema#string> │                                          <lname> │
│ <whom> │ <whom2> │ "Smith"^^<http://www.w3.org/2001/XMLSchema#string> │ "XXX"^^<http://www.w3.org/2001/XMLSchema#string> │
└────────┴─────────┴────────────────────────────────────────────────────┴──────────────────────────────────────────────────┘


). If, instead of adding that phoney triple patter, we instead make all of the rule head triple patterns OPTIONAL (because the query pattern can be any subset of the CONSTRUCTed data):

SELECT * {
  OPTIONAL { ?emp foaf:last_name ?wname }
  OPTIONAL { ?emp foaf:knows ?man }
  OPTIONAL { ?man foaf:last_name ?mname }
}


, we get results which are dependent on the order of the triples evaluated in the match (as OPTIONALs are ordered):

┌─────────┬─────────┬────────────────────────────────────────────────────┬────────────────────────────────────────────────────┐
│ ?emp    │ ?man    │ ?mname                                             │ ?wname                                             │
│   <who> │  <whom> │                                                 -- │                                            <lname> │
│ <whom2> │   <who> │                                            <lname> │ "Smith"^^<http://www.w3.org/2001/XMLSchema#string> │
│ <whom2> │ <whom2> │ "Smith"^^<http://www.w3.org/2001/XMLSchema#string> │ "Smith"^^<http://www.w3.org/2001/XMLSchema#string> │
└─────────┴─────────┴────────────────────────────────────────────────────┴────────────────────────────────────────────────────┘


A possible solution to this is some sort of FULL JOIN semantics, but I have no idea then how to remove the redundant rows.