Existing Systems

From RDF Data Shapes Working Group

Several existing systems have been proposed as potential starting points for the working group's output. Each is given below with a high-level overview and pointers for further reading. The systems appear in alphabetical order.

LDOM

LDOM (Linked Data Object Model) is an RDF-based modeling language. It is compatible with Linked Data principles and brings some object-oriented concepts to the Web. LDOM makes it possible to define classes together with their associated properties, and to specify additional constraints that valid members of those classes should fulfill. These additional constraints are expressed using SPARQL. The structural definitions of associated properties, however, can also be used by tools that lack a SPARQL processor. Besides a vocabulary for structural constraints, LDOM includes modularization features for defining new modeling terms (SPARQL functions and templates) based on executable semantics.

Further reading: LDOM Primer

OWL Constraints (Stardog ICV)

OWL Constraints verifies an RDF graph (which may contain an RDFS ontology) against a set of constraints, each of which is an OWL axiom. Verification succeeds if each constraint is true in the model of the RDF graph under the closed-world and unique-name assumptions. This model is essentially what most people think of as the informal meaning of an RDF graph.

A typical axiom is of the form SubClassOf(class description), which requires that each instance of class satisfies the requirements of description. For example, SubClassOf(ex:person ObjectExactCardinality(2 ex:child)) requires that each instance of ex:person in the model has precisely two values for the ex:child property. Other kinds of axioms can be used to check other kinds of constraints, including constraints against particular nodes in the RDF graph or constraints on properties.
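The SubClassOf example can be made concrete with a minimal sketch in Python. It is not Stardog ICV's implementation. The sketch models a graph as a set of (subject, predicate, object) strings, performs no RDFS inference, and checks only this one axiom shape. It shows how the closed-world and unique-name assumptions turn the axiom into a count over the graph.

```python
# Minimal sketch: an RDF graph as a set of (subject, predicate, object) strings.
Triple = tuple[str, str, str]


def instances_of(graph: set[Triple], cls: str) -> set[str]:
    """Nodes explicitly typed as cls (this sketch performs no RDFS inference)."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


def exact_cardinality_violations(
    graph: set[Triple], cls: str, prop: str, n: int
) -> list[tuple[str, int]]:
    """Check SubClassOf(cls ObjectExactCardinality(n prop)) under the
    closed-world assumption (values absent from the graph do not exist) and
    the unique-name assumption (distinct IRIs denote distinct individuals)."""
    violations = []
    for node in sorted(instances_of(graph, cls)):
        values = {o for (s, p, o) in graph if s == node and p == prop}
        if len(values) != n:
            violations.append((node, len(values)))
    return violations


graph = {
    ("ex:alice", "rdf:type", "ex:person"),
    ("ex:alice", "ex:child", "ex:carol"),
    ("ex:alice", "ex:child", "ex:dave"),
    ("ex:bob", "rdf:type", "ex:person"),
    ("ex:bob", "ex:child", "ex:erin"),
}
print(exact_cardinality_violations(graph, "ex:person", "ex:child", 2))
# [('ex:bob', 1)]
```

Under the usual open-world reading of OWL, ex:bob's single known child would not be an error, because a second child could exist without being stated. The closed-world check reports it.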

A version of OWL Constraints is implemented in Stardog ICV. The ideas underlying OWL Constraints originate in proposals for constraints in OWL itself, such as "Opening, Closing Worlds - On Integrity Constraints". "Using Description Logics for RDF Constraint Checking and Closed-World Recognition" presents an overview of how these ideas are modified to work in an RDF context. "Validating RDF with OWL Integrity Constraints" shows how Stardog ICV was designed.

RDFUnit

RDFUnit verifies an RDF graph against a set of constraints in the form of SPARQL queries. Verification succeeds if each SPARQL query produces no results when run on the RDF graph. Each query must contain the SPARQL variable ?resource, which denotes the RDF node involved in the error.

To provide a high-level constraint editing interface, RDFUnit can read constraints defined in OWL, IBM Resource Shapes (RS) and Dublin Core Description Set Profiles (DSP). OWL constraints are interpreted under the closed-world and unique-name assumptions, and RDFUnit supports a set of common OWL axioms. RDFUnit supports the whole IBM Resource Shapes specification, but only for shapes bound to a class through the oslc:describes property. For DSP, RDFUnit supports a limited subset of the specification. A set of generators translates these high-level constraints into RDFUnit SPARQL constraints. RDFUnit also supports SPARQL patterns, which further facilitate the reuse of SPARQL queries. A predefined set of patterns and generators is available here.
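The split between patterns and generators can be illustrated with a minimal sketch in Python. The pattern text, the `$prop`/`$max` placeholders and the function name are hypothetical; this is not RDFUnit's actual pattern library or syntax. The generator fills a parameterized SPARQL body for an OWL-style max-cardinality restriction. The result binds ?resource for each violating node and has no query form.

```python
from string import Template

# Hypothetical pattern: "a resource has more than $max distinct values of $prop".
# The body binds ?resource for each violating node and has no query form,
# mirroring the RDFUnit convention described above.
MAX_CARDINALITY_PATTERN = Template(
    "{ { SELECT ?resource (COUNT(DISTINCT ?o) AS ?cnt) "
    "WHERE { ?resource <$prop> ?o } GROUP BY ?resource } "
    "FILTER (?cnt > $max) }"
)


def generate_max_cardinality(prop: str, max_card: int) -> str:
    """Hypothetical generator: instantiate the pattern for an OWL
    max-cardinality restriction on prop."""
    return MAX_CARDINALITY_PATTERN.substitute(prop=prop, max=max_card)


print(generate_max_cardinality("http://example.org/child", 2))
```

Keeping the pattern separate from the generator lets one pattern serve several input languages (OWL, RS, DSP). This sketch does not claim that RDFUnit organizes its own patterns exactly this way.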

RDFUnit SPARQL queries are not complete SPARQL queries: they omit the query form (SELECT, ASK or CONSTRUCT), e.g. {?resource ?p ?o FILTER(...)}. The query form and the requested variables are generated at runtime, depending on the user's reporting preferences and the constraint's decorations. RDFUnit supports four reporting formats. These may transform the query body into an ASK {}, SELECT COUNT(DISTINCT ?resource) {}, SELECT DISTINCT ?resource {} or SELECT DISTINCT ?resource ?decor1 ?decor2 ?decor3 {} query.
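The wrapping step can be sketched in Python as follows, assuming a body that already contains ?resource. The mode names "status", "count", "resources" and "annotated" are hypothetical labels for the four formats above, not RDFUnit option names. The count form uses the SPARQL 1.1 `(... AS ?total)` alias, which the standard requires for aggregate projections.

```python
def build_query(body: str, mode: str, decorations: tuple[str, ...] = ()) -> str:
    """Wrap an RDFUnit-style query body (no query form) for one reporting mode.

    The mode names are hypothetical labels for the four formats listed above.
    """
    if mode == "status":  # only whether any violation exists
        return f"ASK {body}"
    if mode == "count":  # number of distinct violating resources
        return f"SELECT (COUNT(DISTINCT ?resource) AS ?total) {body}"
    if mode == "resources":  # the violating resources themselves
        return f"SELECT DISTINCT ?resource {body}"
    if mode == "annotated":  # resources plus extra decoration variables
        extra = " ".join(decorations)
        return f"SELECT DISTINCT ?resource {extra} {body}"
    raise ValueError(f"unknown reporting mode: {mode}")


body = "{ ?resource <http://example.org/age> ?age FILTER (?age < 0) }"
for mode in ("status", "count", "resources"):
    print(build_query(body, mode))
print(build_query(body, "annotated", ("?age",)))
```

Because the same body serves every format, the user can switch from a quick ASK check to a detailed report without rewriting any constraint.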

In the general case, the user specifies a set of constraints, and RDFUnit validates an RDF graph against them. Constraints written in OWL, RS or DSP are first translated into RDFUnit's constraint specifications, and then validation is performed. RDFUnit also has an auto-discovery mode. In this mode it scans the RDF graph and identifies all used namespaces. It then dereferences them and acquires the constraints they define in OWL (usually the case), RS or DSP.
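The first step of auto-discovery, identifying the namespaces a graph uses, can be sketched in Python. The sketch assumes a namespace is everything up to an IRI's last '#' or '/'. That is a common convention, not a rule, and RDFUnit's own heuristic may differ. Dereferencing and constraint extraction are not shown.

```python
from typing import Optional


def namespace_of(iri: str) -> Optional[str]:
    """Cut an IRI after its last '#' or '/' (a common convention, not a rule)."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut > 0 else None


def used_namespaces(graph: set[tuple[str, str, str]]) -> set[str]:
    """Collect the namespaces of every HTTP(S) IRI used in the graph."""
    namespaces = set()
    for triple in graph:
        for term in triple:
            if term.startswith(("http://", "https://")):
                ns = namespace_of(term)
                if ns is not None:
                    namespaces.add(ns)
    return namespaces


graph = {
    ("http://example.org/alice",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
}
for ns in sorted(used_namespaces(graph)):
    print(ns)
```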

Resource Shapes

Proposed Version

The basic operation of OSLC Resource Shapes is validation. Validation takes as input a shape S, an RDF graph G and a node N in G, and determines whether (G,N) conforms to S. Shapes have identifiers and can refer to each other, so validation is a recursive process. This basic operation can be invoked directly. It can also be invoked based on the service that produced G in response to a request related to N. Shapes can check the types of nodes in the graph, their properties, and the occurrence and value of properties.

Previous Version

A Resource Shape describes the expected contents of an RDF graph. Take a resource shape S, an RDF graph G and a non-literal node N in G. Given these inputs, a conformant Resource Shape processor P produces a message M. M states whether (G,N) conforms to S, or the extent to which it fails to conform to S. The simplest type of processor P is a validator that produces a boolean message M, which is TRUE if and only if (G,N) conforms to S. A validator thus defines a function P(G,N,S) = Verdict, where Verdict is TRUE or FALSE.

The OSLC Resource Shape specification describes the behavior of a validator informally since the target audience of the specification is application developers. This informality has not caused problems in practice. However, implementation experience with validators has shown there are some corner cases in the specification which would benefit from a more careful explanation.

In practice, a simple TRUE/FALSE message is not helpful to application developers, since their task is normally to create valid RDF graphs. When the output is FALSE, the developer should be told where the problems occur. It is therefore useful to create processors that output detailed error messages. The Resource Shape specification does not define error messages and leaves them to processor implementations.

The Resource Shape specification defines several types of constraints. For example, the occurrence of a property X in G can be constrained to be exactly one, zero or one, one or more, or zero or more. Given (G, N), define the function count(G,N,X) as follows:

count(G,N,X) = #{O | (N,X,O) in G}

Then the occurrence constraints are:

oslc:Exactly-one <=> count(G,N,X) = 1
oslc:Zero-or-one <=> count(G,N,X) <= 1
oslc:One-or-more <=> count(G,N,X) >= 1
oslc:Zero-or-more <=> count(G,N,X) >= 0 (which is always TRUE)
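The count function and the four occurrence conditions translate directly into code. The following Python sketch uses the prefixed names above as dictionary keys. A real processor would use the full oslc: IRIs and would report which property failed.

```python
Triple = tuple[str, str, str]


def count(graph: set[Triple], n: str, x: str) -> int:
    """count(G,N,X) = #{O | (N,X,O) in G}."""
    return len({o for (s, p, o) in graph if s == n and p == x})


# Occurrence values (as prefixed names) mapped to the conditions listed above.
OCCURS = {
    "oslc:Exactly-one": lambda c: c == 1,
    "oslc:Zero-or-one": lambda c: c <= 1,
    "oslc:One-or-more": lambda c: c >= 1,
    "oslc:Zero-or-more": lambda c: c >= 0,  # always true
}


def check_occurrence(graph: set[Triple], n: str, x: str, occurs: str) -> bool:
    """True if node n satisfies the occurrence constraint for property x."""
    return OCCURS[occurs](count(graph, n, x))


graph = {
    ("ex:bug1", "dcterms:title", '"Crash on start"'),
    ("ex:bug1", "ex:tag", '"ui"'),
    ("ex:bug1", "ex:tag", '"startup"'),
}
print(check_occurrence(graph, "ex:bug1", "dcterms:title", "oslc:Exactly-one"))  # True
print(check_occurrence(graph, "ex:bug1", "ex:tag", "oslc:Zero-or-one"))  # False
```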

Most of the constraints in the specification can be formalized in a similar way. The meaning of the constraints should be clear from the specification except for a few corner cases.

Some of the constraints may involve more than one graph. For example, it is possible to assert that in the triple (N,X,O) the node O is a URI of an information resource that itself has an RDF graph G' that conforms to a shape S' (see oslc:valueShape). G' is typically associated with O by sending an HTTP GET request to O with an Accept header that specifies some RDF content type (Turtle, JSON-LD, RDF/XML, etc.). However, an implementation might use other mechanisms, e.g. a cached copy or a named graph in a quad store. In that case the processor is validating a finite collection of tuples (G_i, N_i, S_i), where i = 1, ..., n.

The specification also defines several mechanisms for associating a node N with a graph G. The simplest mechanism applies when the graph contains one or more triples of the form (N, oslc:instanceShape, S), where S is a URI that identifies a Resource Shape information resource (see oslc:instanceShape). The processor must then validate each (G,N,S) such that (N, oslc:instanceShape, S) is in G.
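This selection rule can be sketched in Python. The sketch leaves the validator P(G,N,S) as a parameter. `has_title` is a hypothetical toy stand-in for a real shape check, used only to make the example runnable.

```python
from typing import Callable

Triple = tuple[str, str, str]
# Placeholder for the validator P(G, N, S); any checker with this signature fits.
Validator = Callable[[set[Triple], str, str], bool]


def instance_shape_pairs(graph: set[Triple]) -> list[tuple[str, str]]:
    """Every (N, S) such that (N, oslc:instanceShape, S) is in G."""
    return sorted((s, o) for (s, p, o) in graph if p == "oslc:instanceShape")


def failing_instance_shapes(graph: set[Triple], validate: Validator):
    """Validate each (G, N, S) selected by oslc:instanceShape; return failures."""
    return [(n, s) for (n, s) in instance_shape_pairs(graph)
            if not validate(graph, n, s)]


def has_title(graph: set[Triple], node: str, shape: str) -> bool:
    # Hypothetical toy validator: every shape just requires a dcterms:title.
    return any(s == node and p == "dcterms:title" for (s, p, o) in graph)


graph = {
    ("ex:bug1", "oslc:instanceShape", "ex:BugShape"),
    ("ex:bug1", "dcterms:title", '"Crash on start"'),
    ("ex:bug2", "oslc:instanceShape", "ex:BugShape"),
}
print(failing_instance_shapes(graph, has_title))  # [('ex:bug2', 'ex:BugShape')]
```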

Another mechanism applies when the shape S asserts that the object of some property X conforms to another shape S' (using oslc:valueShape). Then for every (N,X,O) in G, the graph G' associated with O must conform to S', i.e. P(G',O,S') = TRUE.
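The recursion through oslc:valueShape can be sketched in Python under several assumptions. The shape model in `SHAPES` is hypothetical: each shape has required properties plus value-shape links. A `fetch` function stands in for however G' is obtained for O (HTTP GET, a cache, a named graph). Treating a revisited (node, shape) pair as conforming is just one simple way to stop cycles.

```python
from typing import Callable, Optional

Triple = tuple[str, str, str]
Graph = set[Triple]

# Hypothetical shape model: properties that must occur at least once, plus
# oslc:valueShape-style links from a property to the shape of its objects.
SHAPES = {
    "ex:BugShape": {"required": ["dcterms:title"],
                    "value_shapes": {"ex:assignee": "ex:PersonShape"}},
    "ex:PersonShape": {"required": ["foaf:name"], "value_shapes": {}},
}


def conforms(g: Graph, node: str, shape_id: str,
             fetch: Callable[[str], Optional[Graph]], seen=None) -> bool:
    """P(G, N, S) with recursion through value shapes.

    fetch(O) returns the graph G' associated with O, or None if none is found.
    """
    seen = set() if seen is None else seen
    if (node, shape_id) in seen:  # assume conformance on cycles (one choice)
        return True
    seen.add((node, shape_id))
    shape = SHAPES[shape_id]
    for prop in shape["required"]:
        if not any(s == node and p == prop for (s, p, o) in g):
            return False
    for prop, value_shape in shape["value_shapes"].items():
        for (s, p, o) in g:
            if s == node and p == prop:
                g2 = fetch(o)  # G' for the object O
                if g2 is None or not conforms(g2, o, value_shape, fetch, seen):
                    return False
    return True


remote = {"ex:alice": {("ex:alice", "foaf:name", '"Alice"')}}
bug = {("ex:bug1", "dcterms:title", '"Crash"'),
       ("ex:bug1", "ex:assignee", "ex:alice")}
print(conforms(bug, "ex:bug1", "ex:BugShape", remote.get))  # True
```

Here `remote.get` plays the role of dereferencing. An assignee whose graph cannot be obtained makes the outer resource fail to conform.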

Another mechanism applies when a web application provides a service description document that links some of its URIs to shapes (see oslc:resourceShape). For example, a creation factory is a URI that accepts HTTP POST requests to create new information resources. The body of these requests is an RDF graph G. Suppose the service description links the creation factory to a shape S. S identifies a set of zero or more RDF types (see oslc:describes). If S describes type T and (N,rdf:type,T) is in G, then (G,N) must conform to S; otherwise the application will fail the creation request.
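The creation-factory rule can be sketched in Python. `factory_shapes` and `describes` are hypothetical inputs standing for what the service description and oslc:describes provide. `has_title` is a toy validator used only for illustration. HTTP handling is omitted; the function only decides whether the request should succeed.

```python
from typing import Callable

Triple = tuple[str, str, str]


def required_checks(graph: set[Triple], factory_shapes: list[str],
                    describes: dict[str, set[str]]) -> list[tuple[str, str]]:
    """(N, S) pairs to validate: S is linked to the creation factory,
    S describes type T (oslc:describes), and (N, rdf:type, T) is in G."""
    checks = []
    for shape in factory_shapes:
        for t in sorted(describes.get(shape, set())):
            for (n, p, o) in sorted(graph):
                if p == "rdf:type" and o == t:
                    checks.append((n, shape))
    return checks


def accept_creation(graph: set[Triple], factory_shapes: list[str],
                    describes: dict[str, set[str]],
                    validate: Callable[[set[Triple], str, str], bool]) -> bool:
    """True if every required check passes; otherwise the POST should fail."""
    return all(validate(graph, n, s)
               for (n, s) in required_checks(graph, factory_shapes, describes))


def has_title(graph, node, shape):
    # Hypothetical toy validator: every shape just requires a dcterms:title.
    return any(s == node and p == "dcterms:title" for (s, p, o) in graph)


describes = {"ex:BugShape": {"ex:Bug"}}
posted = {("ex:new", "rdf:type", "ex:Bug")}  # no title yet
print(accept_creation(posted, ["ex:BugShape"], describes, has_title))  # False
```

A shape that describes zero types selects no nodes, so it imposes no checks on the request body.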

ShExC

The ShEx API takes a starting node, a schema, and a flag that says whether the shapes in the schema are to be interpreted as closed or open: validate(R, S, flag). The starting node R is valid with respect to the start shape S of the schema when all of the following are met:

  • For each rule r in S, the number n of triples R P O whose P and O match the property/object constraints in r is within r's cardinality bounds.
  • If the rule is a reference (it has '@' before the name of another shape s), the validity of O is tested against that shape: validate(O, s).
  • If the schema is "closed", then no node tested against a shape during validation (e.g. R) has any triple with that node as subject other than the triples matched by that shape's rules.

A common use of this API is a "find-types" utility which tests all nodes against all shapes.
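The three conditions and the find-types utility can be illustrated with a heavily simplified Python sketch. The `Rule` model is hypothetical and is not ShExC syntax. The sketch follows the conditions as stated above and does not reproduce full ShEx semantics, for instance how real ShEx partitions triples among rules. The `assumed` set is one simple way to stop recursion between shapes that refer to each other.

```python
from dataclasses import dataclass
from typing import Optional

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Rule:
    predicate: str
    ref: Optional[str] = None            # '@Shape' reference, if any
    values: Optional[frozenset] = None   # allowed objects; None means any object
    min_count: int = 1
    max_count: Optional[int] = 1         # None means unbounded


def validate(graph: set[Triple], node: str, shape: str,
             schema: dict[str, list[Rule]], closed: bool,
             assumed: frozenset = frozenset()) -> bool:
    """validate(R, S, flag) following the three conditions listed above."""
    if (node, shape) in assumed:  # break cycles between recursive shapes
        return True
    assumed = assumed | {(node, shape)}
    outgoing = [(p, o) for (s, p, o) in graph if s == node]
    matched = set()
    for rule in schema[shape]:
        hits = [(p, o) for (p, o) in outgoing
                if p == rule.predicate
                and (rule.values is None or o in rule.values)
                and (rule.ref is None
                     or validate(graph, o, rule.ref, schema, closed, assumed))]
        if len(hits) < rule.min_count or (
                rule.max_count is not None and len(hits) > rule.max_count):
            return False
        matched.update(hits)
    # Closed shapes: every triple about the node must be matched by some rule.
    return not closed or all(t in matched for t in outgoing)


def find_types(graph: set[Triple], schema: dict[str, list[Rule]], closed=False):
    """Test every node in the graph against every shape."""
    nodes = sorted({s for (s, _, _) in graph} | {o for (_, _, o) in graph})
    return {n: [sh for sh in schema if validate(graph, n, sh, schema, closed)]
            for n in nodes}


schema = {
    "IssueShape": [Rule("ex:state", values=frozenset({"ex:open", "ex:closed"})),
                   Rule("ex:reportedBy", ref="PersonShape")],
    "PersonShape": [Rule("foaf:name"),
                    Rule("foaf:knows", ref="PersonShape", min_count=0, max_count=None)],
}
graph = {
    ("ex:issue1", "ex:state", "ex:open"),
    ("ex:issue1", "ex:reportedBy", "ex:alice"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
}
print(validate(graph, "ex:issue1", "IssueShape", schema, closed=False))  # True
print(validate(graph, "ex:issue1", "IssueShape", schema, closed=True))   # False
```

With the closed flag set, ex:alice's unmatched foaf:mbox triple makes her fail PersonShape. That failure in turn makes ex:issue1 fail IssueShape.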

SPIN

SPIN verifies an RDF graph against a document containing classes connected to constraints in the form of SPARQL queries. The verification succeeds if each SPARQL query with ?this bound to instances of the class produces no results when run on the RDF graph. The query does not have to contain a ?this variable, thus allowing global constraints.

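For example, the triple person spin:constraint "ASK { FILTER (spl:objectCount(?this, child) > 2) }" requires each instance of person to be the subject of at most two different triples with predicate child. For any instance with more than two such triples, the ASK query returns true, which signals a violation.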
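How such a constraint is evaluated can be mimicked in plain Python. No SPIN engine or SPARQL processor is involved. `object_count` imitates spl:objectCount, and `more_than_two_children` stands in for the ASK query. Each instance of the class is bound to ?this in turn, and a true result is reported as a violation.

```python
Triple = tuple[str, str, str]


def object_count(graph: set[Triple], subject: str, predicate: str) -> int:
    """Mimics spl:objectCount: triples with this subject and predicate."""
    return sum(1 for (s, p, _) in graph if s == subject and p == predicate)


def more_than_two_children(graph: set[Triple], this: str) -> bool:
    """Stand-in for ASK { FILTER (spl:objectCount(?this, child) > 2) }."""
    return object_count(graph, this, "child") > 2


def violations(graph: set[Triple], cls: str, ask) -> list[str]:
    """Bind ?this to each instance of cls; an ASK result of true is a violation."""
    instances = sorted(s for (s, p, o) in graph if p == "rdf:type" and o == cls)
    return [this for this in instances if ask(graph, this)]


graph = {
    ("ann", "rdf:type", "person"),
    ("ann", "child", "c1"), ("ann", "child", "c2"), ("ann", "child", "c3"),
    ("ben", "rdf:type", "person"),
    ("ben", "child", "c4"),
}
print(violations(graph, "person", more_than_two_children))  # ['ann']
```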

In addition to checking constraints for a whole graph, SPIN also supports checking constraints for a given resource only, in which case the engine will walk up the class hierarchy to discover all "relevant" constraints. These constraints may walk into adjacent resources if needed.
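Discovering the "relevant" constraints for a single resource can be sketched in Python. Constraints are represented as IRIs attached with spin:constraint. In real SPIN they are SPARQL queries or template calls, so the IRIs here are hypothetical stand-ins. The sketch walks rdfs:subClassOf upward from each rdf:type of the resource and ignores any global constraints.

```python
Triple = tuple[str, str, str]


def superclasses(graph: set[Triple], cls: str) -> set[str]:
    """cls plus every class reachable through rdfs:subClassOf (cycle-safe)."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(o for (s, p, o) in graph
                         if s == c and p == "rdfs:subClassOf")
    return seen


def relevant_constraints(graph: set[Triple], resource: str) -> list[str]:
    """Constraints attached via spin:constraint to any type of the resource
    or to any superclass of those types."""
    classes = set()
    for (s, p, o) in graph:
        if s == resource and p == "rdf:type":
            classes |= superclasses(graph, o)
    return sorted(o for (s, p, o) in graph
                  if p == "spin:constraint" and s in classes)


graph = {
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "spin:constraint", "ex:maxTwoChildren"),
    ("ex:Student", "spin:constraint", "ex:hasStudentId"),
    ("ex:Course", "spin:constraint", "ex:hasTitle"),
    ("ex:bob", "rdf:type", "ex:Student"),
}
print(relevant_constraints(graph, "ex:bob"))  # ['ex:hasStudentId', 'ex:maxTwoChildren']
```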

In the context of the WG, SPIN is superseded by LDOM.