ISSUE-3: Graph Shape Association

From RDF Data Shapes Working Group

This page is about the various ways to make connections between RDF graphs or portions of RDF graphs and constraints (or shapes) to be validated against that graph or portions thereof. Issue 3 is concerned with this connection.

This page is not about how to select which part of the graph is to be validated against particular constraints or shapes. (The latter is the subject of the separate wiki page showing how nodes in an RDF graph are associated with shapes or constraints.)

Proposal

There are two ways of invoking a SHACL Engine.

In the first way there are two arguments. One argument is for SHACL structures. The other argument is for the data graph or dataset. The engine then gets SHACL constructs from the SHACL argument and checks that the data graph or dataset is valid with respect to those constructs. It is possible that the two arguments are the same, but there is no special processing in this case.

In the second way there is only one argument, which is an RDF graph or dataset. In the simple case, this RDF graph (or the default graph of the dataset) is used for SHACL structures and the graph or dataset is used as the data. However, if the RDF graph (or default graph of the dataset) has a SHACL pointer then the pointer is used to access an additional RDF graph which is merged with this graph to form the SHACL structures. If there is a data pointer then the pointer is used to access an RDF graph or dataset which is used *instead* of this graph as the data graph.
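
The one-argument invocation can be sketched as follows. This is only an illustration under stated assumptions, not part of the proposal: graphs are modelled as sets of (subject, predicate, object) string triples, the predicate names ex:shaclPointer and ex:dataPointer are hypothetical (the proposal does not name them), fetch is an assumed caller-supplied function that retrieves a graph given a URL, and merging is approximated by set union, which ignores blank node renaming.

```python
from typing import Callable, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

# Hypothetical predicate names; the proposal does not fix a vocabulary.
SHACL_POINTER = "ex:shaclPointer"
DATA_POINTER = "ex:dataPointer"


def _pointer(graph: Graph, predicate: str) -> Optional[str]:
    """Return the smallest (in string order) object of a triple using `predicate`, if any."""
    targets = sorted(o for (_, p, o) in graph if p == predicate)
    return targets[0] if targets else None


def resolve_single_argument(
    graph: Graph, fetch: Callable[[str], Graph]
) -> Tuple[Graph, Graph]:
    """Return (shapes_graph, data_graph) for the one-argument invocation."""
    # Simple case: the same graph supplies both the SHACL structures and the data.
    shapes = graph
    data = graph

    shacl_target = _pointer(graph, SHACL_POINTER)
    if shacl_target is not None:
        # The pointed-to graph is merged with this graph to form the SHACL structures.
        shapes = graph | fetch(shacl_target)

    data_target = _pointer(graph, DATA_POINTER)
    if data_target is not None:
        # The pointed-to graph is used instead of this graph as the data.
        data = fetch(data_target)

    return shapes, data
```

The two-argument invocation needs no such resolution step, since the caller supplies the SHACL structures and the data directly.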


This proposal handles the tricky cases where data is being returned from a service and other situations where there is a need to partly or wholly intertwine the SHACL constructs and the data.

A service could return an RDF graph that contains something like this:

The start node is X
The start shape is S
Look at <https:...> for the shape definitions and other SHACL stuff
... data ... data .. data

To verify this claim a SHACL engine would run the shapes starting at the start node and the start shape against all the data in this graph.


A service could also return a small RDF graph that contains something like this:

Look at <https:...> for the scoped shapes
Look at <http:...> for the data

To verify this claim a SHACL engine would check that the data satisfies the scoped shapes.

Note that in the first service example, whatever encoding is used to specify the start node, start shape, and pointer to the other SHACL stuff ends up as part of the data that is being checked, whereas in the second service example no SHACL-related information need show up in the data being checked.

Different Ways of Associating Constraints or Shapes with RDF Graphs

Embedding

The simplest kind of connection is having the constraint or shape be part of the RDF graph. These constraints or shapes would have to be identifiable as constraints or shapes and there would have to be indications on how they are to be validated.

So validating the graph available at ex:graph would look within this graph for sets of triples that encode a constraint or shape and validate them against the graph or portions thereof.
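
A rough sketch of the lookup step for embedding is given below. It assumes, purely for illustration, that embedded shapes are identified by an rdf:type triple pointing at a hypothetical class ex:Shape, and that graphs are sets of (subject, predicate, object) string triples. A real encoding might identify shapes differently and would also need to follow nested nodes.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical marker class used to recognise embedded shapes.
SHAPE_CLASS = "ex:Shape"


def embedded_shapes(graph: Set[Triple]) -> Set[str]:
    """Return the nodes in the graph that are typed as shapes."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == SHAPE_CLASS}


def shape_triples(graph: Set[Triple], shape: str) -> Set[Triple]:
    """Return the triples whose subject is the given shape node."""
    return {t for t in graph if t[0] == shape}
```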

This kind of connection would be difficult with OWL constraints, because OWL constraints look just like OWL axioms. SPIN can work this way, with the RDF graph including SPIN triples linking classes to constraints that are also in the graph. ShEx does not appear to work this way.

Explicit Linking

Explicit links can be used to make connections from RDF graphs to documents containing constraints or shapes. This is similar to owl:imports. The RDF graph includes a triple whose object is the URL of a document containing constraints or shapes. (There might be some other way to get from the object of the triple to the constraints or shapes, but using the normal web mechanisms is the most natural way.) It is also possible to have an indirect link from the RDF graph to the constraints, i.e., the graph imports an ontology, which itself imports constraints or shapes.

So if an RDF graph containing data to be validated is available at ex:data, it would include a triple something like ex:data shape:imports ex:constraints and the document available at ex:constraints would include constraints or shapes to be validated against the graph.
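
Following explicit links, including indirect ones, might look like the sketch below. The assumptions are illustrative only: shape:imports is the placeholder property from the example above, owl:imports is followed as well so that a graph importing an ontology that in turn imports constraints is covered, graphs are sets of (subject, predicate, object) string triples, and fetch is an assumed caller-supplied function that retrieves the graph for a URL.

```python
from typing import Callable, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

# Link properties to follow; shape:imports is a placeholder name.
IMPORT_PREDICATES = {"shape:imports", "owl:imports"}


def collect_linked_documents(start_url: str, fetch: Callable[[str], Graph]) -> Set[str]:
    """Return the URLs of all documents reachable from start_url via import triples."""
    seen = {start_url}
    pending = [start_url]
    while pending:
        url = pending.pop()
        for (_, predicate, target) in fetch(url):
            # Visit each linked document once, so import cycles terminate.
            if predicate in IMPORT_PREDICATES and target not in seen:
                seen.add(target)
                pending.append(target)
    seen.discard(start_url)
    return seen
```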

OWL constraints can work this way, using an explicit link property other than owl:imports. SPIN uses spin:imports for this purpose.

[HK: I believe we need two properties, one for the traditional owl:imports and one that includes the constraints without running the actual tests over the union graph. The latter is important for performance reasons: you want to be able to run all tests on a single database, yet the constraint definitions should not be repeated in every database graph. (PFPS: Agreed. However, this document is not about how to run the constraints, just how to get to them.) SPIN has spin:imports for that scenario.]

Implicit Linking

Implicit links can be used to make connections from RDF graphs to documents containing constraints or shapes. This is similar to the "follow your nose" philosophy. An IRI in the RDF graph can be turned into the URL of a document containing constraints or shapes.

So if an RDF graph containing data to be validated is available at ex:data, the validation process would look at all of the IRIs in the graph (e.g., foaf:mbox and foaf:Person and ex1:Student and ex2:JohnSmith), try to use them to retrieve documents that contain constraints or shapes (perhaps the documents available at foaf:, ex1:, and ex2:), and then validate the graph or portions of the graph against these constraints.
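
One possible way to derive candidate documents from the IRIs in a graph is sketched below. The rule used here (drop the fragment if there is one, otherwise drop the last path segment) is an assumption chosen for illustration; the page does not prescribe how an IRI is turned into a document URL. Graphs are sets of (subject, predicate, object) string triples with full IRIs, and only http and https IRIs are considered.

```python
from typing import Iterable, Set, Tuple
from urllib.parse import urldefrag, urlsplit, urlunsplit

Triple = Tuple[str, str, str]


def namespace_document(iri: str) -> str:
    """Guess the document for an IRI by stripping its local name."""
    if "#" in iri:
        # Hash IRI: the document is the IRI without its fragment.
        return urldefrag(iri).url
    # Slash IRI: keep everything up to and including the last '/'.
    parts = urlsplit(iri)
    path = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def candidate_documents(graph: Iterable[Triple]) -> Set[str]:
    """Collect the guessed document URLs for every http(s) IRI in the graph."""
    return {
        namespace_document(term)
        for triple in graph
        for term in triple
        if term.startswith(("http://", "https://"))
    }
```

Each candidate would then be retrieved and inspected for constraints or shapes; retrieval failures and documents without constraints would simply be skipped.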

Just about any setup that does not need external control of the constraints or shapes can work this way. OWL constraints and SPIN fall into this category. ShEx probably doesn't, as it needs some notion of what shapes are supposed to be applied to.

No Linking

The constraint or shape validation mechanism itself could take multiple arguments, one of which is a set of constraints or shapes to be validated.

So if an RDF graph containing data to be validated is available at ex:data and some constraints or shapes are available at ex:constraints, the constraint validation mechanism could be called as validate(ex:data, ex:constraints).
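
A minimal sketch of such a call is shown below. The representation is hypothetical: each constraint is a Python function that takes the data (a set of (subject, predicate, object) string triples) and returns a list of violation messages, and every_person_has_mbox is an invented example constraint rather than anything defined by OWL, SPIN, or ShEx.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Constraint = Callable[[Set[Triple]], List[str]]


def validate(data: Set[Triple], constraints: Iterable[Constraint]) -> List[str]:
    """Run every constraint against the data and return all violation messages."""
    violations: List[str] = []
    for constraint in constraints:
        violations.extend(constraint(data))
    return violations


def every_person_has_mbox(data: Set[Triple]) -> List[str]:
    """Example constraint: each foaf:Person must have at least one foaf:mbox."""
    people = {s for (s, p, o) in data if p == "rdf:type" and o == "foaf:Person"}
    with_mbox = {s for (s, p, _) in data if p == "foaf:mbox"}
    return [f"{person} has no foaf:mbox" for person in sorted(people - with_mbox)]
```

Because the data and the constraints arrive as separate arguments, neither needs to contain any triple that refers to the other.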

Any kind of constraint or shape system could be easily changed to work this way.