This is a new attempt to provide a semantics for datasets. It is designed to cover a decent number of use cases, and further use cases can be covered by proper "semantic extensions". The formal semantics only describes what can be deduced from, or assumed to be true of, a dataset. It does not describe a mechanism by which an RDF dataset is affected. Implementations are free to ignore the semantics and manipulate the syntactic structure only (e.g., parsers, editors).
In order to decide how to interpret a dataset, it must first be determined how plain RDF graphs are interpreted. According to the existing standards, there are several ways of interpreting a set of triples, each tied to what SPARQL calls an entailment regime. As of today, the standard entailment regimes are Simple Entailment, RDF Entailment, RDFS Entailment, D Entailment, OWL Entailment with Direct Semantics, OWL Entailment with RDF-Based Semantics, and RIF Entailment.
The entailment regime is determined by the application and should be either fixed by it and described in the documentation, or changeable through the application setup.
Informal description of the semantics
Given an entailment regime E, the dataset is interpreted in the following way:
- the default graph has the same meaning as an isolated RDF graph according to the regime E. So, a dataset with no <name,graph> pair can be identified with a plain RDF graph. This allows us to treat RDF graphs as if they were datasets, with minimal abuse of notation.
- the <n,g> pairs are interpreted as a relationship between the "name" n (or the resource it denotes) and a certain graph, not necessarily g. This relationship is considered to be true for a dataset if the graph related to n E-entails g.
This means that the name n (or the resource it denotes) is associated with a graph that asserts at least as much as g under the chosen entailment regime.
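As an illustration of this reading, here is a minimal Python sketch. It assumes ground graphs only (no blank nodes) and Simple Entailment, in which case entailment between graphs reduces to set inclusion. The function name `simple_entails`, the variable names, and the `ex:` IRIs are hypothetical and not part of the proposal.

```python
# A triple is a (subject, predicate, object) tuple; a graph is a frozenset of triples.
# Assumption: all graphs are ground (no blank nodes), so Simple Entailment
# between two graphs reduces to set inclusion.

def simple_entails(g1, g2):
    """Return True if ground graph g1 simply entails ground graph g2."""
    return g2 <= g1

# Hypothetical graph that an interpretation associates with the name ex:n.
associated = frozenset({
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
})

# Graph g appearing in the dataset pair <ex:n, g>.
g = frozenset({("ex:alice", "ex:knows", "ex:bob")})

# The pair holds: the associated graph entails g without being equal to g.
pair_holds = simple_entails(associated, g)

# A pair whose graph contains a triple absent from the associated graph fails.
g_other = frozenset({("ex:carol", "ex:knows", "ex:alice")})
other_pair_holds = simple_entails(associated, g_other)
```

In this sketch the graph in the pair acts as a lower bound on what the associated graph says; the associated graph may contain additional triples.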
Issue 1: can the entailment regime of the default graph be different from that of the <name,graph> pairs?
Issue 2: do we want to allow an entailment regime that is "weaker" than Simple Entailment? Something like the "no-semantics" in one of our previous proposals.
Issue 3: can a dataset declare what semantics it assumes, instead of letting the application decide in all cases? This was proposed as a possible extension in a previous proposal.
Issue 4: should the relationship be between the "name" and a graph, or between the resource denoted by the "name" and a graph? The latter can be made a proper semantic extension of the former, but not the other way around. See also Issue 6 below.
Issue 5: this semantics does not completely cover the "graph quote" use case, where one wants to say explicitly that the graph is quoted, that is, that the terms used are important in addition to their meaning.
Let E be an entailment regime and V a vocabulary of IRIs and literals. An E-dataset-interpretation over vocabulary V is a pair I = <Id,IGEXT> such that:
- Id is an E-interpretation over vocabulary V;
- IGEXT is a function from a set of IRIs in V to the set of RDF graphs.
Issue: IGEXT could be a function from the set of resources defined by Id, instead of a set of IRIs.
Further, I is extended into a function assigning truth values to graphs, <name,graph> pairs, and datasets as follows:
- for a graph G, I(G) is true iff Id(G) is true;
- for an IRI n and RDF graph g, I(<n,g>) is true iff IGEXT(n) is defined and E-entails g;
- for a dataset D=(DG,<n1,G1>,...,<nk,Gk>), I(D) is true iff I(DG) is true and for all i in 1..k, I(<ni,Gi>) is true.
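The three truth conditions above can be sketched in Python as follows. This is an illustrative sketch under strong simplifying assumptions, not a reference implementation: graphs are ground, E is Simple Entailment (so entailment is set inclusion), and the E-interpretation Id is stood in for by the set of ground triples it makes true. The class `DatasetInterpretation`, its method names, and the `ex:` IRIs are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Entails = Callable[[Graph, Graph], bool]


def ground_simple_entails(g1: Graph, g2: Graph) -> bool:
    # Assumption: ground graphs only, so Simple Entailment is set inclusion.
    return g2 <= g1


class DatasetInterpretation:
    """Sketch of I = <Id, IGEXT> for a fixed entailment relation E."""

    def __init__(self, facts: Graph, igext: Dict[str, Graph], entails: Entails):
        # Assumption: Id is represented by the ground triples it makes true.
        self.facts = facts
        # IGEXT as a partial function from IRIs to graphs.
        self.igext = igext
        self.entails = entails

    def graph_true(self, g: Graph) -> bool:
        # I(G) is true iff Id(G) is true.
        return g <= self.facts

    def pair_true(self, n: str, g: Graph) -> bool:
        # I(<n,g>) is true iff IGEXT(n) is defined and E-entails g.
        return n in self.igext and self.entails(self.igext[n], g)

    def dataset_true(self, default: Graph, named: List[Tuple[str, Graph]]) -> bool:
        # I(D) is true iff I(DG) is true and every I(<ni,Gi>) is true.
        return self.graph_true(default) and all(
            self.pair_true(n, g) for n, g in named
        )


facts = frozenset({("ex:s", "ex:p", "ex:o"), ("ex:s", "ex:q", "ex:o")})
g_n = frozenset({("ex:a", "ex:b", "ex:c"), ("ex:a", "ex:b", "ex:d")})
interp = DatasetInterpretation(facts, {"ex:n": g_n}, ground_simple_entails)

default_graph = frozenset({("ex:s", "ex:p", "ex:o")})
named = [("ex:n", frozenset({("ex:a", "ex:b", "ex:c")}))]
satisfied = interp.dataset_true(default_graph, named)

# IGEXT is undefined for ex:m, so even a pair with the empty graph is false.
unnamed = [("ex:m", frozenset())]
unsatisfied = interp.dataset_true(default_graph, unnamed)
```

Note that the triples of a named graph do not need to be true in Id: in the example, `satisfied` is true although none of the triples of the graph paired with `ex:n` appear in `facts`.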
Issue 6: if IGEXT maps resources to graphs rather than IRIs to graphs, then the second item must be replaced by "for an IRI n and RDF graph g, I(<n,g>) is true iff IGEXT(Id(n)) is defined and E-entails g;".
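To make the difference between the two variants concrete, the following Python sketch contrasts IGEXT(n) with IGEXT(Id(n)). It again assumes ground graphs and Simple Entailment (set inclusion), and it models the IRI-denotation part of Id as a plain dictionary. All names (`pair_true_iri`, `pair_true_resource`, `denotes`, the `ex:` IRIs, and the resource label `r1`) are hypothetical.

```python
def entails(g1, g2):
    # Assumption: ground graphs and Simple Entailment, i.e. set inclusion.
    return g2 <= g1


def pair_true_iri(n, g, igext):
    # Variant of the definition: IGEXT maps IRIs to graphs.
    return n in igext and entails(igext[n], g)


def pair_true_resource(n, g, denotes, igext):
    # Variant of Issue 6: IGEXT maps resources to graphs, looked up via Id(n).
    resource = denotes.get(n)
    return resource in igext and entails(igext[resource], g)


g = frozenset({("ex:a", "ex:b", "ex:c")})

# Two IRIs that denote the same resource r1 (hypothetical denotation map).
denotes = {"ex:n1": "r1", "ex:n2": "r1"}

# IRI-based IGEXT: only ex:n1 has an associated graph.
igext_by_iri = {"ex:n1": g}
# Resource-based IGEXT: the graph is attached to the resource r1.
igext_by_resource = {"r1": g}

iri_n1 = pair_true_iri("ex:n1", g, igext_by_iri)
iri_n2 = pair_true_iri("ex:n2", g, igext_by_iri)
res_n1 = pair_true_resource("ex:n1", g, denotes, igext_by_resource)
res_n2 = pair_true_resource("ex:n2", g, denotes, igext_by_resource)
```

In the resource-based variant, co-denoting names necessarily share the same associated graph, so `res_n1` and `res_n2` agree; in the IRI-based variant they can differ, as `iri_n1` and `iri_n2` do here.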