Re: ISSUE-23: Where should we query for class and subClassOf? from Irene Polikoff on 2016-01-20 (public-data-shapes-wg@w3.org from January 2016)

From: Irene Polikoff <irene@topquadrant.com>
Date: Tue, 19 Jan 2016 20:56:23 -0500
To: Holger Knublauch <holger@topquadrant.com>
Cc: public-data-shapes-wg@w3.org
Message-Id: <39A00575-4BF0-4BDD-9189-6845E9B56AF8@topquadrant.com>

I agree that requiring applications to copy triples from one graph to another before they can submit them to a SHACL engine complicates things, and it is a concern.

Can it be assumed as a default that both the shapes graph and the data graph are queried and then, for cases where the performance of this may be an issue, have a way to set an indicator that only the shapes graph should be queried?

My guess is that this will not often be a performance issue. Looking up the parent classes of a given class is an easy query, even for a large dataset.
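
As a rough illustration of that default, here is a minimal Python sketch that walks rdfs:subClassOf over the union of whichever graphs are passed in. It is not taken from the SHACL draft: the graphs are modeled as plain sets of (subject, predicate, object) tuples, the function name superclasses is hypothetical, and ex1:LivingThing is an invented class added next to the ex2:Cat / ex1:Animal example from Holger's message.

```python
# Hypothetical model: a graph is a set of (subject, predicate, object) tuples.
RDFS_SUBCLASSOF = "rdfs:subClassOf"


def superclasses(cls, *graphs):
    """Return all transitive superclasses of cls in the union of the graphs."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for graph in graphs:
            for s, p, o in graph:
                # Follow each subClassOf edge once; this also guards against cycles.
                if s == current and p == RDFS_SUBCLASSOF and o not in found:
                    found.add(o)
                    stack.append(o)
    return found


# Assumed example data: the shapes graph only knows the core ontology,
# while the data graph carries the ex2:Cat extension.
shapes_graph = {("ex1:Animal", RDFS_SUBCLASSOF, "ex1:LivingThing")}
data_graph = {("ex2:Cat", RDFS_SUBCLASSOF, "ex1:Animal")}

# Shapes graph only: the ex2:Cat extension is invisible.
print(sorted(superclasses("ex2:Cat", shapes_graph)))

# Default proposed above: query both graphs.
print(sorted(superclasses("ex2:Cat", shapes_graph, data_graph)))
```

With only the shapes graph, ex2:Cat has no known superclasses, so a shape scoped to ex1:Animal would not apply to it. Querying both graphs picks up the extension without copying any triples.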

Sent from my iPhone

> On Jan 19, 2016, at 8:09 PM, Holger Knublauch <holger@topquadrant.com> wrote:
> 
> (I predict that the separation between shapes and data graph will become a FAQ topic for SHACL. It may have been better to leave this topic out in version 1 of the standard, as many users will topple over it. Such things may break the standard's adoption because they complicate everything. In the end, the main benefit of having a shapes graph is optimizing performance (so that the data graph is not polluted), yet this may be considered premature optimization as engines can probably take care of this themselves.)
> 
> Anyway, your specific suggestion MAY work for me, as the shapes graph is a conceptual/logical entity only and engines may inject any number of triples into the shapes graph prior to validation. This doesn't make life easier though, so I am not sure.
> 
> Two issues come to my mind:
> 
> 1) if we assume the rdfs:subClassOf triples reside in the shapes graph only, then the user/engine needs to take care to consider extensibility: the data graph may include extensions of the core ontology that a shapes graph was developed against. For example someone may create a subclass ex2:Cat while the shape only knows about ex1:Animal. To prepare validation, some agent would need to make sure that ex2:Cat rdfs:subClassOf ex1:Animal is visible to the shapes graph.
> 
> 2) sh:class currently looks at the data graph. If we change the behavior of sh:scopeClass then we arguably would also need to change sh:class to walk the shapes graph.
> 
> Holger
> 
> 
>> On 20/01/2016 4:38 AM, Arthur Ryman wrote:
>> While reading the spec I noticed the following statement:
>> 
>> "To determine class membership, the rdf:type and rdfs:subClassOf
>> triples are queried in the data graph."
>> 
>> However, querying the data graph for class and subclass information is
>> inconsistent with the example SPARQL for determining which shapes are
>> classes:
>> 
>> "As syntactic sugar for the scenario above, SHACL includes a rule that
>> if a class is also a shape (in the shapes graph), then the
>> sh:scopeClass triple pointing at itself can be omitted. This rule is
>> illustrated by the following SPARQL CONSTRUCT query, which may be
>> executed over the shapes graph prior to validation, to produce the
>> implicit sh:scopeClass triples."
>> 
>> CONSTRUCT {
>> ?class sh:scopeClass ?class .
>> }
>> WHERE {
>> ?class rdfs:subClassOf*/rdf:type rdfs:Class .
>> ?class rdfs:subClassOf*/rdf:type sh:Shape .
>> }
>> 
>> I propose that in both cases we query the shapes graph, NOT the data graph.
>> 
>> Recall that we expect the application to provide a shapes graph and a
>> data graph as input to the SHACL validator. Therefore, the application
>> can always copy any rdfs:Class and rdfs:subClassOf triples into the
>> shapes graph. Although RDF does not require class definitions to be
>> separated from data instances, in practice these are often separated.
>> Both shapes and classes are more properly regarded as metadata than
>> data.
>> 
>> -- Arthur
> 
>

Received on Wednesday, 20 January 2016 01:57:03 UTC