Re: shapes-ISSUE-30 (shape-and-data-graphs): Are shapes and data in the same graph? [SHACL Spec]

On Fri, Apr 10, 2015 at 2:32 AM, Holger Knublauch <holger@topquadrant.com>
wrote:

> On 4/10/2015 0:33, Dimitris Kontokostas wrote:
>
>> What is the point of supporting SPARQL if we cannot support SPARQL
>> endpoints? I thought one of the goals of having SPARQL as a syntax in SHACL
>> was to be able to move away from the RAM limitations and get the benefits
>> of the SPARQL query optimizations.
>> Jena is a great tool and I use it a lot but when the dataset gets big (a
>> few GB) the validation speed gets exponentially slower compared to Virtuoso.
>>
>
> This depends on how a library such as Jena is used. Jena does have its own
> SPARQL engine (ARQ) built-in, which by default works on basically every
> database by opening triple iterators and then doing all the FILTER
> processing in "client" memory. However, Jena can also be used in a way that
> it sends complete queries to a remote database, e.g. through a SPARQL
> endpoint (create custom Query and QueryExecution objects, and there are other
> hooks on the Algebra layer). From a specification's point of view this
> seems to be just an implementation detail. The spec only specifies that it
> must have a dataset and then when a query gets executed on a given named
> graph, it's up to the implementation to decide how to execute that query -
> it may just pass it on to the database in a single transaction and thus use
> all the native goodies of that database.
>
>
>> There is already user story 34 that captures this need, and I could add
>> many others if needed.
>>
>
> I am not doubting this, and of course supporting databases is very
> important!
>
>
>> Nevertheless, a SPARQL endpoint can be considered as an RDF dataset and
>> named graphs can indeed be used to separate constraints and data.
>>
>> What needs to be defined by the WG is not the support of SPARQL endpoints
>> but whether the constraints and the data MUST be in the same dataset or
>> can exist in separate datasets.
>> The fact that Jena can merge two datasets in memory is just an
>> implementation optimization IMHO.
>>
>
> Agreed. And I currently don't see how we could support multiple datasets.
> I currently experiment with a design based on named graphs, similar to what
> Richard proposed - either via a variable GRAPH ?shapesGraph or a dedicated
> special URI that would be used via GRAPH sh:ShapesGraph. Some queries
> currently do indeed need to jump back and forth between the default graph
> and that "Shapes Graph". There are multiple options on how to address these
> scenarios, but it is not clear to me yet what design will work best overall.


I think you are referring to sh:valueShape and the sh:hasShape(?shape)
function, right? I don't see any other case that could be problematic.
In this case, I was waiting for a clear definition of recursion in
order to make a proposal, but I think we have many options to go with.
For example: If the data and the constraints are in the same graph we can
use the sh:hasShape() function you propose, otherwise use algorithm X to
execute the ShEx validation in multiple steps or Algorithm Y to convert the
ShEx shape into a (giant) SPARQL query similar to the ShEx 2 SPARQL [1].
If recursion is forbidden, things get much simpler and maybe - I need to
work on this first to say for sure - ShEx shapes could be just treated as
class shapes with an extra SPARQL filter.

We need to have a clear definition of the ShEx shapes to see our options
and we shouldn't limit the language design in advance.

Proposed resolution: Shapes and data are expected to exist in different
graphs unless specified otherwise.
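
To make the separation concrete, here is a minimal sketch of a check that
reads a shape from one named graph and the instance data from another. It is
only an illustration: the ex: vocabulary (ex:scopeClass, ex:requiredPredicate),
the graph IRIs, and the helper name build_validation_query are hypothetical
and not part of any SHACL draft.

```python
# Sketch only: builds a SPARQL query that reads the shape definition from one
# named graph and checks instance data in another. The ex: terms below are
# hypothetical placeholders, not SHACL vocabulary.

def build_validation_query(shapes_graph: str, data_graph: str) -> str:
    """Return a SPARQL query listing resources that lack a required property."""
    for iri in (shapes_graph, data_graph):
        # Reject empty values and characters not allowed inside <...>.
        if not iri or any(c in iri for c in '<>"{}|^`\\ \n\t'):
            raise ValueError(f"not a usable graph IRI: {iri!r}")
    return f"""\
PREFIX ex: <http://example.org/ns#>
SELECT ?focus ?predicate
WHERE {{
  GRAPH <{shapes_graph}> {{
    ?shape ex:scopeClass ?class ;
           ex:requiredPredicate ?predicate .
  }}
  GRAPH <{data_graph}> {{
    ?focus a ?class .
    FILTER NOT EXISTS {{ ?focus ?predicate ?value }}
  }}
}}
"""


if __name__ == "__main__":
    print(build_validation_query("http://example.org/shapes",
                                 "http://example.org/data"))
```

Because the whole check is a single query over one dataset, an implementation
could hand it to the database or endpoint as-is. If the shapes lived in a
different dataset, the first GRAPH block would have to be evaluated
separately and its bindings passed into the second.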

Dimitris


[1] http://www.w3.org/2013/ShEx/toSPARQL.html


>
>
> Holger
>
>
>


-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: http://aksw.org

Received on Friday, 10 April 2015 05:13:40 UTC