Re: ISSUE-105: Prefixes in SPARQL fragments from Holger Knublauch on 2016-05-08 (public-data-shapes-wg@w3.org from May 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 9 May 2016 09:19:09 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <80936584-9a65-3869-7300-a040a58b7e04@topquadrant.com>

On 8/05/2016 20:20, Eric Prud'hommeaux wrote:
>
> On May 7, 2016 6:36 PM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
> >
> > A better explanation, perhaps:
> >
> >
> > On 5/6/16 10:13 PM, Holger Knublauch wrote:
> >>>
> >>> At the same time, I understand the immediate need. I think we need,
> >>> however, to clarify what we expect in the pre-SHACL process on both
> >>> shapes and data graphs. Already there is the decision that rdf:type
> >>> statements must be explicit in both graphs, even though that implies
> >>> some pre-processing.
> >>
> >>
> >> Usually no pre-processing of the data graph is needed. The language is
> >> designed so that it walks the subclass hierarchy in a couple of
> >> important places, making the (expensive and often even impractical)
> >> pre-processing of rdf:type triples unnecessary.
> >
> >
> > The requirement that rdf:type declarations be explicit, because no
> > inferencing will be done on domains and ranges in the data graph's
> > defined vocabulary, implies that in some cases the data graph will
> > need to be pre-processed so that the rdf:type declarations are there
> > in the graph. We have talked about this. SHACL has some expectations,
> > and how one gets one's data to meet those expectations is out of band.
> >
> > How does SHACL walk the subclass hierarchy unless that information
> > is in the data graph? Carrying those declarations in a data graph, in
> > my experience, would be unusual.
>
> What if validation takes a list of data graphs instead of a single 
> graph? One could stick the supplemental inferences from preprocessing 
> into a separate graph and thus not alter or have to copy the initial data.
>

While it is technically no problem to merge multiple data graphs into a 
new virtual union graph (see Jena's MultiUnion), SHACL would run into 
scalability issues if we allowed multiple data graphs as arguments. A 
SPARQL processor could no longer exploit the optimizations of a database 
and would instead need to break every BGP into multiple small SPO queries.
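
To make the concern concrete, here is a minimal sketch in plain Python of 
what evaluating a BGP over several separate graphs could look like. It is 
only an illustration under stated assumptions, not how Jena's MultiUnion 
or any particular SPARQL engine is implemented: graphs are plain sets of 
(s, p, o) tuples, and the names match, eval_bgp_over_union and the ex: 
terms are all hypothetical. The point is the lookup counter: every triple 
pattern has to be sent to every graph for every intermediate binding, and 
the join happens in the processor rather than inside one database.

```python
def match(graph, pattern, binding):
    """Yield extended bindings for one triple pattern against one graph."""
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Variable: bind it, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                # Constant term that does not match this triple.
                ok = False
                break
        if ok:
            yield new


def eval_bgp_over_union(graphs, bgp):
    """Evaluate a BGP pattern by pattern, asking each graph separately.

    Returns the solutions and the number of single-pattern lookups issued.
    Duplicate triples across graphs are not de-duplicated in this sketch.
    """
    lookups = 0
    bindings = [{}]
    for pattern in bgp:
        next_bindings = []
        for binding in bindings:
            for graph in graphs:
                lookups += 1  # one small SPO-style query per graph
                next_bindings.extend(match(graph, pattern, binding))
        bindings = next_bindings
    return bindings, lookups


# Hypothetical data: asserted triples plus a separate graph of inferences.
asserted = {
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:alice", "ex:name", "Alice"),
}
inferred = {
    ("ex:alice", "rdf:type", "ex:Person"),
}

bgp = [("?x", "rdf:type", "ex:Person"), ("?x", "ex:name", "?n")]
solutions, lookups = eval_bgp_over_union([asserted, inferred], bgp)
```

With two graphs and two triple patterns this already issues four 
separate lookups for a single solution, and the count grows with the 
number of graphs and intermediate bindings.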

Having said this, some RDF databases with inferencing support will apply 
the very technique that you are describing, and keep the inferences as a 
separate internal graph. I don't think that SHACL would need to be 
concerned with such implementation details though. It is a very common 
practice to have SPARQL queries that walk the class hierarchy with 
rdfs:subClassOf* at runtime, and our definition of SHACL should continue 
to do the same. Some SHACL implementations may apply inferencing instead 
and bypass the rdfs:subClassOf* trick, but that's up to the implementation.
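
As a rough illustration of the runtime walk, the following plain-Python 
sketch finds the instances of a class by following rdfs:subClassOf 
transitively instead of relying on materialized rdf:type triples. It 
corresponds loosely to a SPARQL pattern such as 
?x rdf:type/rdfs:subClassOf* ex:Person. The graph representation (a set 
of tuples), the function names and the ex: terms are assumptions made for 
the example, and it presumes the subclass triples are visible in the 
graph being queried.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASSOF = "rdfs:subClassOf"


def subclasses_of(graph, cls):
    """Return cls together with all of its transitive subclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if p == RDFS_SUBCLASSOF and o == current and s not in seen:
                seen.add(s)
                stack.append(s)
    return seen


def instances_of(graph, cls):
    """Return subjects typed as cls or as any transitive subclass of cls."""
    classes = subclasses_of(graph, cls)
    return {s for s, p, o in graph if p == RDF_TYPE and o in classes}


# Hypothetical graph: only the most specific rdf:type triples are asserted.
graph = {
    ("ex:Student", RDFS_SUBCLASSOF, "ex:Person"),
    ("ex:PhDStudent", RDFS_SUBCLASSOF, "ex:Student"),
    ("ex:alice", RDF_TYPE, "ex:PhDStudent"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:acme", RDF_TYPE, "ex:Company"),
}

people = instances_of(graph, "ex:Person")
```

Here ex:alice is found as an ex:Person even though no 
(ex:alice, rdf:type, ex:Person) triple was ever added to the graph.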

Holger

Received on Sunday, 8 May 2016 23:19:42 UTC