This is a new attempt to provide a semantics of datasets. It is designed to cover a decent number of use cases, and more use cases can be covered by proper "semantic extensions". The formal semantics only describes what can be deduced from, or assumed to be true of, a dataset. It does not describe a mechanism by which an RDF dataset is affected. Implementations are free to ignore the semantics and manipulate the syntactic structure only (e.g., parsers, editors).
The first part of this page reflects a semantics that some of us (AZ, RC, IH) have agreed on. The second part contains some explicit issues that the group may want to formally vote on to make the decision process clearer; the semantics put forward reflects our choices on those issues, but the group may want to decide otherwise.
In order to decide how to interpret a dataset, it must first be determined how plain RDF graphs are interpreted. According to the existing standards, there are several ways of interpreting a set of triples, each tied to what SPARQL calls an entailment regime. As of today, the standard entailment regimes are Simple Entailment, RDF Entailment, RDFS Entailment, D Entailment, OWL Entailment with Direct Semantics, OWL Entailment with RDF-Based Semantics and RIF Entailment.
The entailment regime is determined by the application and should be either fixed by it and described in the documentation, or changeable through the application setup.
Informal description of the semantics
Given an entailment regime E, the dataset is interpreted in the following way:
- the default graph has the same meaning as an isolated RDF graph according to the regime E. So, a dataset with no <name,graph> pair can be identified with a plain RDF graph. This allows us to treat RDF graphs as if they were datasets, with a minimal abuse of notation.
- the <n,g> pairs are interpreted as a relationship between the resource that the "name" n denotes and a certain graph, not necessarily g. This relationship is considered to be true for a dataset if the graph related to the resource denoted by n E-entails g.
This means that the resource denoted by the name n is associated with a graph that asserts at least as much as the graph g, according to the chosen entailment regime.
Let E be an entailment regime and V a vocabulary of IRIs and literals. An E-dataset-interpretation over vocabulary V is a pair I = <Id,IGEXT> such that:
- Id is an E-interpretation over vocabulary V;
- IGEXT is a (possibly partial) function from the set of resources defined by Id to the set of RDF graphs.
Further, I is extended into a function assigning truth values to graphs, <name,graph> pairs and datasets as follows:
- for a graph G, I(G) is true iff Id(G) is true;
- for an IRI n and RDF graph g, I(<n,g>) is true iff IGEXT(Id(n)) is defined and E-entails g;
- for a dataset D=(DG,<n1,G1>,…,<nk,Gk>), I(D) is true iff I(DG) is true and for all i in 1…k, I(<ni,Gi>) is true.
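The three truth conditions above can be sketched in code. The following is only a minimal illustration, not part of the proposal: it assumes ground graphs (no blank nodes), for which Simple Entailment reduces to set inclusion, and it models Id by two hypothetical stand-ins, a `denote` mapping from IRIs to resources and a `satisfies` predicate on graphs. All class, function and parameter names are made up for the example.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def simple_entails_ground(g1: Graph, g2: Graph) -> bool:
    """Assumption: both graphs are ground, so g1 simply entails g2 iff g2 is a subset of g1."""
    return g2 <= g1


class DatasetInterpretation:
    """Sketch of I = <Id, IGEXT> for a fixed entailment regime E."""

    def __init__(
        self,
        denote: Dict[str, Hashable],        # stand-in for Id on IRIs
        satisfies: Callable[[Graph], bool],  # stand-in for Id(G) being true
        igext: Dict[Hashable, Graph],        # IGEXT, a partial map from resources to graphs
        entails: Callable[[Graph, Graph], bool] = simple_entails_ground,
    ) -> None:
        self.denote = denote
        self.satisfies = satisfies
        self.igext = igext
        self.entails = entails

    def graph_true(self, g: Graph) -> bool:
        # I(G) is true iff Id(G) is true.
        return self.satisfies(g)

    def pair_true(self, n: str, g: Graph) -> bool:
        # I(<n,g>) is true iff IGEXT(Id(n)) is defined and E-entails g.
        assigned = self.igext.get(self.denote.get(n))
        return assigned is not None and self.entails(assigned, g)

    def dataset_true(self, default_graph: Graph, named: Iterable[Tuple[str, Graph]]) -> bool:
        # I(D) is true iff I(DG) is true and every I(<ni,Gi>) is true.
        return self.graph_true(default_graph) and all(
            self.pair_true(n, g) for n, g in named
        )


# Hypothetical example data.
t0 = ("ex:s", "ex:p", "ex:o")
t1 = ("ex:a", "ex:p", "ex:b")
t2 = ("ex:a", "ex:q", "ex:c")
world = frozenset({t0})  # triples the toy Id makes true

interp = DatasetInterpretation(
    denote={"ex:g1": "r1", "ex:g2": "r2"},
    satisfies=lambda g: g <= world,
    igext={"r1": frozenset({t1, t2})},  # IGEXT is undefined on r2
)
```

With this toy interpretation, the pair <ex:g1, {t1}> is true because the graph assigned to r1 contains t1, while any pair named ex:g2 is false because IGEXT is undefined on r2.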
Issues that the group may have to vote on to make the decision process clean
Issue 0: Should we say anything about the semantics of RDF datasets at all?
Issue 1: Can the entailment regime of the default graph be different from the one of the <name,graph> pairs? More generally, could we assign a different entailment regime to each individual <name,graph> pair? Note that the SPARQL Entailment document allows for that, and it would be relatively easy, mathematically, to extend the semantics to do that. However, the result would be fairly complicated.
Issue 2: Do we want to allow an entailment regime that is “weaker” than Simple Entailment? Something like the "no-semantics" in one of our previous proposals. Note that this is relevant only if the answer to Issue 1 is “yes”. Otherwise this amounts to not applying any semantics at all to the dataset, which does not require any further formalism.
Issue 3: Can a dataset declare what semantics it assumes, instead of letting the application decide in all cases? This was proposed as a possible extension in a previous proposal. Another alternative explored in the past was the usage of extra predicates in TriG.
Issue 4: Should the relationship be between "name" and graph or between resource denoted by "name" and graph? The latter can be made a proper semantic extension of the former, but not the opposite. In formal terms, IGEXT could map IRIs to graphs, and not resources; in which case the formalism would refer to IGEXT(n) instead of IGEXT(Id(n)).
Issue 5: In <n,G>, does n denote G, or may n denote any resource? Note that this is related to Issue 4. In terms of the terminology used in the current RDF Semantics, the current semantics is not “denoting”, because Id does not map n to any graph.
Issue 6: Is it sufficient for the truth of I(<n,g>) that IGEXT(Id(n)) E-entails g, or should we require that IGEXT(Id(n)) be equivalent to g under E-entailment? This is open-graph versus closed-graph semantics.
Issue 7: Should the truth of a named graph require that the named graph satisfies the default graph?
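To make the choice in Issue 6 concrete, the two candidate truth conditions for a <name,graph> pair can be compared side by side. This is only an illustrative sketch under the assumption of ground graphs and Simple Entailment (where entailment is set inclusion); the function names and the example triples are hypothetical.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def entails(g1: Graph, g2: Graph) -> bool:
    """Assumption: ground graphs under Simple Entailment, i.e. set inclusion."""
    return g2 <= g1


def pair_true_open(assigned: Graph, g: Graph) -> bool:
    # Open-graph reading: the graph assigned by IGEXT only has to entail g.
    return entails(assigned, g)


def pair_true_closed(assigned: Graph, g: Graph) -> bool:
    # Closed-graph reading: the assigned graph and g must entail each other.
    return entails(assigned, g) and entails(g, assigned)


# Hypothetical graph assigned by IGEXT to the resource denoted by some name n.
assigned = frozenset({("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:q", "ex:c")})
# Hypothetical graph g appearing in the pair <n,g> of a dataset.
g = frozenset({("ex:a", "ex:p", "ex:b")})
```

Here the open-graph condition accepts the pair, since the assigned graph contains every triple of g, whereas the closed-graph condition rejects it, since the assigned graph also contains a triple that g does not entail.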