shapes-ISSUE-80 (Scheme URIs): Constraint to limit IRIs against scheme/namespace, possibly with dereferencing [SHACL Spec]

http://www.w3.org/2014/data-shapes/track/issues/80

Raised by: Holger Knublauch
On product: SHACL Spec

This requirement has come up several times, most recently in http://lists.w3.org/Archives/Public/public-rdf-shapes/2015Aug/0000.html
Given that it appears to be a genuine use case, we should consider adding support for this in the Core language.

The problem statement could be summarized as follows: some property values need to come from a controlled vocabulary with a given scheme URI or namespace, and validation should also make sure that the terms actually exist. Some of this data may originate from online resources.

As a minimum, I believe we should add a keyword such as sh:valueScheme (suggested by Miika Alonen). For example:

sh:property [
    sh:predicate org:memberOf ;
    sh:valueClass skos:Concept ;
    sh:valueScheme <http://id.loc.gov/authorities/names>
] .

would enforce the constraint that all IRI values of org:memberOf must start with "http://id.loc.gov/authorities/names". This would basically be a macro for a SPARQL STRSTARTS operation and is easy to implement. (A side question: would sh:valueScheme imply that all values must be IRIs?) I would vote for adding this in any case.
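
As a rough sketch of what such a macro might expand to, the following query reports the IRI values that fall outside the scheme. The variable names (?this, ?value) and the exact query shape are assumptions for illustration only, not proposed spec text.

```sparql
PREFIX org: <http://www.w3.org/ns/org#>

# Report each IRI value of org:memberOf that does not start with the scheme.
# ?this and ?value are illustrative variable names (assumption).
SELECT ?this ?value
WHERE {
    ?this org:memberOf ?value .
    FILTER (
        isIRI(?value) &&
        !STRSTARTS(STR(?value), "http://id.loc.gov/authorities/names")
    )
}
```

Dropping the isIRI(?value) test and negating only the prefix check on STR(?value) would also report literals whose lexical form lies outside the scheme, which is one possible answer to the side question above. Note also that a plain prefix test accepts any IRI that merely begins with the same characters, so a scheme value ending in "/" or "#" may be the safer choice.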

The more difficult question is what to do with dynamic lookup of resources. Phil Archer seems to suggest that the system should also go to the URL and make a live lookup to see if the site returns a 200 status. While this is certainly doable, I am concerned about performance, and in many deployment scenarios people may actually prefer to store the reference data in a controlled named graph.

The sh:valueScheme information could be used by a pre-processor to download the required triples. We could decide to leave this aspect outside of the spec. If the triples were downloaded into the main graph, the system could also validate the sh:valueClass against them. However, it is not always possible or efficient to modify the main query graph. Another option would be for a pre-processor to download the missing triples into a given named graph, separate from the main data graph. The system could then use something like GRAPH <...scheme...> { ... } to verify that the triple exists, which would also cover the use case in which the data is already downloaded.
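
To illustrate the named-graph option, here is a minimal sketch of such a check. It assumes the reference data has already been loaded into a graph named <http://example.org/graphs/lc-names>, which is a made-up name, and that "the term exists" means the value is typed as skos:Concept there. Both are assumptions for the sake of the example.

```sparql
PREFIX org:  <http://www.w3.org/ns/org#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Report each IRI value of org:memberOf that is not described as a
# skos:Concept in the separately maintained reference graph.
# The graph name below is hypothetical.
SELECT ?this ?value
WHERE {
    ?this org:memberOf ?value .
    FILTER (isIRI(?value))
    FILTER NOT EXISTS {
        GRAPH <http://example.org/graphs/lc-names> {
            ?value a skos:Concept .
        }
    }
}
```

A looser variant could replace the type triple with ?value ?p ?o, so that any statement about the term in the reference graph counts as evidence that it exists.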

This requires further discussion, but I want to capture it as an ISSUE to indicate that we do take user feedback seriously.

Received on Thursday, 13 August 2015 22:31:59 UTC