SPARQL OPTIONAL vs UNION treatment of sub-graph matching

As part of our work in implementing ontology-based querying, we have
developed an algebra for SPARQL which we have recently submitted for
publication. During the development of this query algebra, we have found
that the treatment of graph matching in OPTIONAL and UNION graph patterns
is, in our opinion, inconsistent, in particular with respect of the issue of
sub-graph matching - that is, in the issue of whether solutions to a graph
pattern can be sub-graphs of other solutions.
 
Consider, for example, the following OPTIONAL query (from the SPARQL spec):
 
Data:
@prefix foaf:       <http://xmlns.com/foaf/0.1/> .

@prefix rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .



_:a  rdf:type        foaf:Person .

_:a  foaf:name       "Alice" .

_:a  foaf:mbox       <mailto:alice@example.com> .

_:a  foaf:mbox       <mailto:alice@work.example> .



_:b  rdf:type        foaf:Person .

_:b  foaf:name       "Bob" .
OPTIONAL Query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name ?mbox

WHERE  { ?x foaf:name  ?name .

         OPTIONAL { ?x  foaf:mbox  ?mbox }

       }
OPTIONAL query Result:  

name	mbox	
"Alice"	<mailto:alice@example.com>	
"Alice"	<mailto:alice@work.example>	
"Bob"	(unbound)	
 
Now, consider, over the same data set, the following UNION query:
UNION Query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name ?mbox

WHERE  {    { ?x foaf:name  ?name }

            UNION 
            { ?x foaf:name  ?name . ?x  foaf:mbox  ?mbox }

       }
UNION Query Result, as per (our understanding of the) current SPARQL working
draft:
 
  
name	 mbox	
"Alice"	 (unbound)	
"Bob"	 (unbound)	
"Alice"	 <mailto:alice@example.com>	
"Alice"	 <mailto:alice@work.example>	

where the first two rows match the first part of the UNION pattern, and the
second two rows match the second part.

As can be seen, the results from the OPTIONAL and UNION queries are
different only in that the UNION query allows a sub-graph of another
solution, while OPTIONAL explicitly does not. While there is no requirement
in SPARQL that the two queries presented above produce the same results, we
argue that the implementation of query processors and optimizers for SPARQL
would be made simpler if either OPTIONAL or UNION is redefined so that both
the queries above yield the same result - and, therefore, so that OPTIONAL
can be defined in terms of UNION as follows:

P1 OPTIONAL P2  = P1 UNION { P1 . P2}

where both P1 and P2 are graph patterns, and where P1 . P2 is the
conjunction of P1 and P2.

In our opinion, the better solution would be to require that UNION graph
patterns state the requirement that sub-graphs of other solutions for the
graph pattern should be eliminated (much in the same way as OPTIONAL states
this requirement). As can be seen from above, the row that matches "Alice"
to ?name and leaves ?mbox unbound is at the very least useless, and
potentially dangerous (implying, as it does, that there is no mailbox for
"Alice"). 

Alternatively, both UNION and OPTIONAL could keep sub-graphs of other graphs
that match the pattern. In this second case, we would strongly suggest that
either a solution modifier or a graph pattern modifier be introduced so that
sub-graphs get eliminated from the final result at the query writer's
option.

Patrick Shironoshita
Ph: +1(305)670-5111 ext. 11
Fax: +1(305)670-7722
www.infotechsoft.com


 

Received on Friday, 27 October 2006 10:53:36 UTC