Design:Negation

From SPARQL Working Group
Revision as of 03:27, 29 September 2009 by Lfeigenb (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search


Negation

Definition and Scope of feature

Provide an explicit operator for negation in SPARQL, i.e. to express a graph pattern which must not match in the dataset.

This defines negation for the pattern based NOT EXISTS.

An alterntive design is the use of set MINUS, then it is the removal of certain query solutions from a set of other query solutions. See below.

Status

In discussion in the working group.

http://www.w3.org/2009/sparql/track/issues/29 - which negation design to choose?

Design : NOT EXISTS

One design is a binary NOT EXISTS operator as a graph pattern operation. This is the UNSAID design from the previous working group. The operator filters matches to the preceding (left hand side) pattern by taking each possibility, then seeing is the pattern (right hand side) matches for that set of bindings.

Examples : Graph Pattern Operator

All people, who do not know Simon

Dataset:

 <#alice> foaf:name "Alice"; 
          foaf:knows [foaf:name "Simon"] .
 <#bob> foaf:name "Bob" .

Query:

 SELECT ?name
 WHERE {
   ?x foaf:name ?name
   NOT EXISTS {
     ?x foaf:knows [foaf:name "Simon"]
   }
 }

returns "Bob".

Ordering issue: In the following, the NOT EXISTS always matches, because ?x is not yet bound and Alice matches the NOT EXISTS pattern. Hence, also Bob is not returned.

 SELECT ?name
 WHERE {
   NOT EXISTS {
     ?x foaf:knows [foaf:name "Simon"]
   }
   ?x foaf:name ?name
 }

This form is clearly readable also for nested NOT EXISTS:

All people, with only mutual friends (i.e. who do not foaf:know people, who do not foaf:know themselves)

 SELECT ?name
 WHERE {
   ?x foaf:name ?name
   NOT EXISTS {
     ?x foaf:knows ?y
     NOT EXISTS {
       ?y foaf:knows ?x
     }
   }
 }

Filter based

The graph operator acts to filter matches of the left hand side using the righ hand side pattern. In this sense, it is a FILTER, but unlike a SPARQL FILTER syntax it does not act logically over later triple patterns. In the translation to the SPARQL algebra, it does split the BGP at this point (c.f. OPTIONAL) and then becomes an algebra filter.

Functions EXISTS and NOT EXISTS for a FILTER could also be introduced:

All people, who do not know Simon

 SELECT ?name
 WHERE {
   ?x foaf:name ?name
   FILTER (
     !EXISTS {
       ?x foaf:knows [foaf:name "Simon"]
     }
   )
 }

As FILTER is evaluated in the end, this does not result in teh same ordering issues. Moreover, if we follow the initial assumption, that negated subqueries can not bind variables, they naturally correspond to subqueries.

A disadvantage here is, that this syntactic form is more difficult to detect and optimize if it is part of a complex filter expression.

This form becomes more difficult to read in the presence of nested negation:

All people, with only mutual friends (i.e. who do not foaf:know people, who do not foaf:know themselves)

 SELECT ?name
 WHERE {
   ?x foaf:name ?name
   FILTER (
     !EXISTS {
       ?x foaf:knows ?y
       FILTER (
         !EXISTS {
           ?y foaf:knows ?x
         }
       )
     }
   )
 }

Syntax

The keywords EXISTS and NOT EXISTS can be used both inside and outside FILTERs.

EXISTS is mainly for completeness.

GraphPatternNotTriples ::=  ... | ExistsPattern | NotExistsPattern
ExistsPattern ::= 'EXISTS' GroupGraphPattern
ExistsPattern::= 'NOT EXISTS' GroupGraphPattern

For for FILTER

BuiltInCall  	  ::=  ... | 'EXISTS' GroupGraphPattern | 'NOT EXISTS' GroupGraphPattern

This is (syntactically) ExistsPattern and NotExistsPattern again but a parser might trigger different internal structures so it is better to restate the forms.

Algebra Operator

There is a filter operator "exists" that takes a graph pattern. exists returns true/false depending on whether the pattern matches. No additional binding of variables occurs.

Mapping from AST to Algebra

 { ?s rdf:type <t> 
   NOT EXISTS { ?s <p> ?v }
   ?s :p ?o
 }

becomes a filter using (not (exists ...)) over the preceding pattern. That is, it is a filter in the algebra but it does not get moved around during translation to the algebra as it would if a FILTER in the syntax (without adding more syntax {} to control that).

 (join
   (filter (not (exists { ?s <p> ?v }))
      (bgp (triple (?s rdf:type <t>))))
   (bgp (triple (?s :p ?o))))

So NOT EXISTS as a graph pattern operator is equivalence to the filter form:

P1 NOT EXISTS P2

{P1 FILTER (NOT EXISTS P2)}

The additional { } puts the FILTER in the right place.

Test Cases

@@TODO


Alternative Design : MINUS

An alternative approach is based on set based MINUS, e.g. SeRQL. This assumes both operands of MINUS return the same variable bindings (in SQL terms, column compatible), in order to allow for the set based semantics. E.g. if the left hand side of MINUS binds 3 variables, the right hand side must bind exactly these three as well.

SQL has both MINUS and EXISTS condition.

Example in SeRQL (from here):

 SELECT title
 FROM {album} dc10:title {title}
 MINUS
 SELECT title
 FROM {album} dc10:title {title};
              dc10:creator {creator}
 WHERE creator like "Paul"
 USING NAMESPACE
     dc10 = <http://purl.org/dc/elements/1.0/>,
     dc11 = <http://purl.org/dc/elements/1.1/>

Distinguish MINUS from NOT EXISTS

Dataset:

 _:simon1 foaf:name "Simon" .
 _:simon2 foaf:name "Simon" .
          foaf:knows _:simon1 .
 _:alice  foaf:name "Alice" ; 
          foaf:knows _:simon1 .
 _:bob    foaf:name "Bob" .
 _:eve    foaf:knows _:simon1 .
 _:dan    foaf:name "Dan" ;
          foaf:knows _:simon2 .

UNSAID Query:

 SELECT ?who
 WHERE {
   ?who foaf:name ?name
   UNSAID {
     ?who  foaf:knows ?whom .
     ?whom foaf:name "Simon" .
   }
 }
?who !foaf:knows "Simon"
 ?who  ?name  ?whom
_:simon1 "Simon"
_:bob "Bob"

MINUS Query:

 SELECT ?who
 WHERE {
   ?who foaf:name ?name
   MINUS {
     ?who  foaf:knows ?whom .
     ?whom foaf:name "Simon" .
   }
 }
?x foaf:name ?name
 ?who  ?name
_:simon1 "Simon"
_:simon2 "Simon"
_:alice "Alice"
_:bob "Bob"
_:dan "Dan"
?who knows [ name "Simon" ]
 ?who  ?whom
_:simon2 _:simon1
_:alice _:simon1
_:eve _:simon1
_:dan _:simon2


Both return ( _:simon1, _:bob ). P1 MINUS P2 has more of a bottom-up evaluation so the temporary set may be larger (include folks like _:eve who know Simon, but don't have a foaf:name).

MINUS with unification on multiple columns

 SELECT ?who
 WHERE {
   ?who foaf:name ?name
   MINUS {
     ?who  foaf:knows ?whom .
     ?whom foaf:name ?name .
   }
 }
?x foaf:name ?name
 ?who  ?name
_:simon1 "Simon"
_:simon2 "Simon"
_:alice "Alice"
_:bob "Bob"
_:dan "Dan"
?who knows [ name "Simon" ]
 ?who  ?name  ?whom
_:simon2 "Simon" _:simon1
_:alice "Simon" _:simon1
_:eve "Simon" _:simon1
_:dan "Simon" _:simon2

Because the binding { ?who:simon2, ?name:"Simon" } appears in the MINUS set, that row gets removed, yielding the same results as

 SELECT ?who
 WHERE {
   ?who foaf:name ?name
   UNSAID {
     ?who  foaf:knows ?whom .
     ?whom foaf:name ?name .
   }
 }

@@Untested thesis: the difference in these two forms is only detectable in oddball queries with a OPTIONAL in P1 binds variable ?x differently than it does in an OPTIONAL in P2.