Re: shapes-ISSUE-30 (shape-and-data-graphs): Are shapes and data in the same graph? [SHACL Spec] from Holger Knublauch on 2015-04-09 (public-data-shapes-wg@w3.org from April 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Fri, 10 Apr 2015 09:32:04 +1000
To: public-data-shapes-wg <public-data-shapes-wg@w3.org>
Message-ID: <55270BF4.10003@topquadrant.com>

On 4/10/2015 0:33, Dimitris Kontokostas wrote:
> What is the point of supporting SPARQL if we cannot support SPARQL 
> endpoints? I thought one of the goals of having SPARQL as a syntax in 
> SHACL was to be able to move away from the RAM limitations and get the 
> benefits of the SPARQL query optimizations.
> Jena is a great tool and I use it a lot but when the dataset gets big 
> (a few GB) the validation speed gets exponentially slower compared to 
> Virtuoso.

This depends on how a library such as Jena is used. Jena does have its 
own SPARQL engine (ARQ) built-in, which by default works on basically 
every database by opening triple iterators and then doing all the FILTER 
processing in "client" memory. However, Jena can also be used so that 
it sends complete queries to a remote database, e.g. to a SPARQL 
endpoint (by creating custom Query and QueryExecution objects; there are 
also other hooks on the Algebra layer). From a specification's point of 
view this seems to be just an implementation detail. The spec only 
requires that there is a dataset; when a query gets executed on a 
given named graph, it's up to the implementation to decide how to 
execute that query - it may just pass it on to the database in a single 
transaction and thus use all the native goodies of that database.
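
To illustrate the second mode, here is a minimal sketch (in Python rather than Jena, and not part of any spec) of handing the complete query text to a remote endpoint, so that FILTER evaluation happens in the database instead of in client memory. It only builds the request URL as defined by the SPARQL 1.1 Protocol (`query`, `default-graph-uri` and `named-graph-uri` parameters). The endpoint address and graph names are made-up placeholders, and `build_remote_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_remote_query_url(endpoint, query, default_graphs=(), named_graphs=()):
    """Build a SPARQL 1.1 Protocol GET URL that carries the whole query.

    Nothing is evaluated locally: the query string, including any FILTER
    clauses, is passed on unchanged for the endpoint to execute.
    """
    params = [("query", query)]
    # Optional dataset description, as defined by the SPARQL Protocol.
    params += [("default-graph-uri", g) for g in default_graphs]
    params += [("named-graph-uri", g) for g in named_graphs]
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint and graph names (assumptions for illustration only).
url = build_remote_query_url(
    "http://example.org/sparql",
    "ASK { ?s ?p ?o FILTER(isIRI(?s)) }",
    named_graphs=["http://example.org/shapes"],
)
print(url)
```

Whether an implementation does this or iterates over triples locally is invisible at the level of the spec, which is the point made above.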

>
> There is already user story 34 that captures this need and I could 
> add many others if needed.

I am not doubting this, and of course supporting databases is very 
important!

>
> Nevertheless, a SPARQL endpoint can be considered as an RDF dataset 
> and named graphs can indeed be used to separate constraints and data.
>
> What needs to be defined by the WG is not the support of SPARQL 
> endpoints but whether the constraints and the data MUST be in the same 
> dataset or whether they can exist in separate datasets.
> The fact that Jena can merge two datasets in memory is just an 
> implementation optimization IMHO.

Agreed. And I currently don't see how we could support multiple 
datasets. I am currently experimenting with a design based on named graphs, 
similar to what Richard proposed - either via a variable GRAPH 
?shapesGraph or a dedicated special URI that would be used via GRAPH 
sh:ShapesGraph. Some queries currently do indeed need to jump back and 
forth between the default graph and that "Shapes Graph". There are 
multiple options on how to address these scenarios, but it is not clear 
to me yet what design will work best overall.
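
As a rough sketch of the two named-graph options, the helper below generates a query that matches instance data in the default graph and then steps into the shapes graph, either through the variable `?shapesGraph` (which the engine would have to pre-bind) or through a fixed URI such as sh:ShapesGraph. The property `ex:appliesToClass`, the `ex:` namespace and the function name are hypothetical and only stand in for whatever vocabulary the design ends up with.

```python
SH = "http://www.w3.org/ns/shacl#"


def build_validation_query(shapes_graph=None):
    """Return a SPARQL query that joins the default graph with the shapes graph.

    shapes_graph=None  -> use the variable form, GRAPH ?shapesGraph
    shapes_graph=<uri> -> use a dedicated URI, e.g. SH + "ShapesGraph"
    """
    target = "?shapesGraph" if shapes_graph is None else f"<{shapes_graph}>"
    return "\n".join([
        # ex:appliesToClass is a hypothetical property, used for illustration.
        "PREFIX ex: <http://example.org/ns#>",
        "SELECT ?node ?shape WHERE {",
        # Matched against the default graph (the data).
        "  ?node a ?class .",
        # Matched against the shapes graph only.
        f"  GRAPH {target} {{ ?shape ex:appliesToClass ?class . }}",
        "}",
    ])


# Option 1: variable graph name, to be bound by the engine.
print(build_validation_query())
# Option 2: dedicated special URI.
print(build_validation_query(SH + "ShapesGraph"))
```

Both variants keep everything in one dataset, so the query can still be sent to a database as a whole.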

Holger

Received on Thursday, 9 April 2015 23:33:29 UTC