Difference between revisions of "TF-Graphs/Minimal-dataset-semantics"

From RDF Working Group Wiki

Revision as of 17:10, 10 September 2012

This is a new attempt to provide a semantics of datasets. It is designed to cover a decent number of use cases, and further use cases can be covered by proper "semantic extensions". The formal semantics only describes what can be deduced from or assumed to be true of a dataset. It does not describe a mechanism by which an RDF dataset is affected. Implementations are free to ignore the semantics and manipulate the syntactic structure only (e.g., parsers, editors).

The first part of this page reflects a semantics that some of us (AZ, RC, IH) have agreed on. The second part contains some explicit design issues that the group may want to consider; the semantics put forward reflects choices on those issues, but the group may want to decide otherwise.

Semantics

In order to decide how to interpret a dataset, it must be determined how plain RDF graphs are interpreted. According to the existing standards, there are several ways of interpreting a set of triples, each being tied to what SPARQL calls an entailment regime. As of today, the standard entailment regimes are Simple Entailment, RDF Entailment, RDFS Entailment, D Entailment, OWL Entailment with Direct Semantics, OWL Entailment with RDF-Based Semantics and RIF Entailment.

The entailment regime is determined by the application and should be either fixed by it and described in the documentation, or changeable through the application setup.

Informal description of the semantics

Given an entailment regime E, the dataset is interpreted in the following way:

  • the default graph has the same meaning as an isolated RDF graph according to the regime E. So, a dataset with no <name,graph> pair can be identified with a plain RDF graph. This allows us to treat RDF graphs as if they were datasets, with a minimal abuse of notation.
  • the <n,g> pairs are interpreted as a relationship between the resource the "name" n denotes and a certain graph, not necessarily g. This relationship is considered to be true for a dataset if the graph in relation with n E-entails g.

This means that the resource denoted by the name n is associated with a graph that has at least the truth of the graph g, according to the chosen entailment regime.

Model-theoretic semantics

Let E be an entailment regime and V a vocabulary of IRIs and literals. An E-dataset-interpretation over vocabulary V is a pair I = <Id,IGEXT> such that:

  • Id is an E-interpretation over vocabulary V;
  • IGEXT is a function from the set of resources defined by Id to the set of RDF graphs.

Further, I is extended into a function assigning truth values to graphs, <name,graph> pairs and datasets as follows:

  • for a graph G, I(G) is true iff Id(G) is true;
  • for an IRI n and RDF graph g, I(<n,g>) is true iff IGEXT(Id(n)) is defined and E-entails g;
  • for a dataset D=(DG,<n1,G1>,…,<nk,Gk>), I(D) is true iff I(DG) is true and for all i in 1…k, I(<ni,Gi>) is true.
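As a rough illustration of the three truth conditions above, the following Python sketch fixes E to simple entailment over ground graphs (no blank nodes), where entailment reduces to set inclusion. The class name `DatasetInterpretation`, the `true_triples` set standing in for Id, and the dictionaries `denotation` and `igext` are assumptions made for this example and are not part of the proposal.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def entails(g1: Graph, g2: Graph) -> bool:
    """Simple entailment restricted to ground graphs: set inclusion."""
    return g2 <= g1


class DatasetInterpretation:
    """Hypothetical stand-in for I = <Id, IGEXT> (ground graphs only)."""

    def __init__(self, true_triples: Iterable[Triple],
                 denotation: Dict[str, str],
                 igext: Dict[str, Graph]) -> None:
        # Assumption: Id is summarised by the set of ground triples it makes true.
        self.true_triples = frozenset(true_triples)
        self.denotation = denotation  # IRI -> resource, i.e. Id(n)
        self.igext = igext            # resource -> graph, i.e. IGEXT

    def graph_true(self, g: Graph) -> bool:
        # I(G) is true iff Id(G) is true.
        return g <= self.true_triples

    def pair_true(self, n: str, g: Graph) -> bool:
        # I(<n,g>) is true iff IGEXT(Id(n)) is defined and entails g.
        resource = self.denotation.get(n)
        associated = self.igext.get(resource)
        return associated is not None and entails(associated, g)

    def dataset_true(self, default: Graph, named: Dict[str, Graph]) -> bool:
        # I(D) is true iff the default graph and every <n,g> pair are true.
        return self.graph_true(default) and all(
            self.pair_true(n, g) for n, g in named.items())


t = (":s", ":p", ":o")
interp = DatasetInterpretation(
    true_triples=[],                  # Id makes no triple true
    denotation={":g1": "r1"},
    igext={"r1": frozenset({t})},     # the resource r1 is associated with { :s :p :o }
)
named_only = interp.dataset_true(frozenset(), {":g1": frozenset({t})})
asserted = interp.dataset_true(frozenset({t}), {})
```

In this sketch `named_only` is true while `asserted` is false: the same interpretation satisfies the dataset that only names the triple but not the dataset that asserts it in the default graph.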

Design decisions that the group may have to consider to make the decision process clean

Design Decision 0: Should we say anything about the semantics of RDF datasets at all?

Design Decision 1: Can the entailment regime of the default graph be different from the one of the <name,graph> pairs? More generally, could we assign a different entailment regime to each individual <name,graph> pair? Note that the SPARQL Entailment document allows for that, and it would be relatively easy, mathematically, to extend the semantics to do that. However, the result would be fairly complicated.

Design Decision 2: Do we want to allow an entailment regime that is “weaker” than Simple Entailment? Something like the "no-semantics" in one of our previous proposals. Note that this is relevant only if the answer to Issue 1 is “yes”. Otherwise this amounts to not applying any semantics at all to the dataset, which does not require any further formalism.

Design Decision 3: Can a dataset declare what semantics it assumes, instead of letting the application decide in all cases? This was proposed as a possible extension in a previous proposal. Another alternative explored in the past was the usage of extra predicates in TriG.

Design Decision 4: Should the relationship be between "name" and graph or between resource denoted by "name" and graph? The latter can be made a proper semantic extension of the former, not the opposite. In formal terms, IGEXT could map IRIs to graphs, and not resources; in which case the formalism would refer to IGEXT(n) instead of IGEXT(Id(n)).

Design Decision 5: In <n,G>, does n denote G, or may n denote any resource? Note that this is related to Issue 4. In terms of the terminology used in the current RDF Semantics, the current semantics is not “denoting”, because Id does not map n to any graph.

Design Decision 6: Is it sufficient for the truth of I(<n,g>) that IGEXT(n) E-entails g, or should we require that IGEXT(n) is equivalent to g under E-entailment? This is open-graph versus closed-graph semantics.

Design Decision 7: Should the truth of a named graph require that the named graph satisfies the default graph?

Test cases

The following test cases are examples for entailments, non-entailments, equivalences and contradictions among RDF datasets.

Basics

Just like RDF graphs, RDF datasets are assumed to be expressions that have truth, that is, can be true or false.

Since RDF datasets are logical expressions, we can speak of the same logical relationships that hold between RDF graphs:

  • Entailment: A entails B if, whenever A is true, B is true as well.
  • Equivalence: Two RDF datasets A and B are equivalent if they always have the same truth value, that is, A entails B and B entails A.
  • Contradiction: One dataset is a contradiction if it cannot be true under any circumstances. Two RDF datasets A and B contradict each other if they cannot both be true.
  • Consistency: One dataset is consistent if it is not a contradiction. Dataset A is consistent with dataset B if A can be true or false regardless of the truth of B, and vice versa.

The truth of an RDF dataset is defined with respect to a graph extension, a relationship that associates RDF graphs with resources. Think of it as capturing a snapshot of the contents of all g-boxes in the universe of discourse.

An RDF dataset is true if its default graph is true and if all the named graphs are true. A named graph <n,G> is true if the resource denoted by n has an RDF graph that entails G as its graph extension. Note that it is not required that G be true.

Notation

These test cases assume the following notation.

This is an RDF dataset with one named graph and an empty default graph:

:g1 { :s :p :o }

This is an RDF dataset with one triple in the default graph and no named graphs:

{ :s :p :o }

This is an RDF dataset with one triple in the default graph, one triple in a named graph, and a second empty named graph:

{ :s :p :o }
:g1 { :s :p :o }
:g2 {}

This is an RDF graph (*not* an RDF dataset):

:s :p :o

The default graph is asserted

T1.1 Under simple dataset entailment:

{ :s :p :o }

entails

# This is an RDF graph, not an RDF dataset
:s :p :o

T1.2 Under OWL dataset entailment:

{ :o1 owl:differentFrom :o1 }

is a contradiction.

Entailment works within the default graph

T2.1 Under simple dataset entailment:

{ :s :p :o1, :o2 }

entails

{ :s :p :o1 }

T2.2 Under simple dataset entailment:

{ :s :p :o1 }

entails

{ :s :p [] }

T2.3 Under simple dataset entailment:

{ :s :p _:blank1 }

is equivalent to (and hence entails)

{ :s :p _:blank2 }

T2.4 Under simple dataset entailment:

{ :s :p _:blank1 }

is equivalent to (and hence entails)

{ :s :p _:blank1, _:blank2 }
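The test cases T2.1 to T2.4 can be checked mechanically with a small sketch of simple entailment between single graphs. It relies on the usual characterisation that G simply entails H when the blank nodes of H can be mapped to terms of G so that the result is a subgraph of G. The tuple encoding of triples, the `_:` prefix test, and the function name `simple_entails` are assumptions of this example; the search is brute force and only suitable for tiny graphs.

```python
from itertools import product


def is_blank(term):
    """Assumption: blank nodes are written with a '_:' prefix."""
    return term.startswith("_:")


def simple_entails(g, h):
    """Return True if graph g simply entails graph h.

    Tries every mapping from the blank nodes of h to terms of g and
    checks whether the mapped copy of h is a subset of g.
    """
    blanks = sorted({t for triple in h for t in triple if is_blank(t)})
    terms = sorted({t for triple in g for t in triple})
    for choice in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, choice))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in h}
        if instance <= g:
            return True
    return False


# T2.2: { :s :p :o1 } entails { :s :p [] }
t22 = simple_entails({(":s", ":p", ":o1")}, {(":s", ":p", "_:anon")})

# T2.4: both directions hold, so the two graphs are equivalent
one = {(":s", ":p", "_:blank1")}
two = {(":s", ":p", "_:blank1"), (":s", ":p", "_:blank2")}
t24 = simple_entails(one, two) and simple_entails(two, one)
```

The same function applies to the graphs inside named graphs in T4.1 to T4.4, since the test cases assume the same regime for the default graph and the named graphs.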

Named graphs are not asserted

T3.1 Under simple dataset entailment:

:g1 { :s :p :o }

does not entail

{ :s :p :o }

T3.2 Under simple dataset entailment:

:g1 { :s :p :o }

does not entail

# This is an RDF graph, not an RDF dataset
:s :p :o

T3.3 Under OWL dataset entailment:

:g1 { :o1 owl:differentFrom :o1 }

is consistent.

Entailment works within named graphs

T4.1 Under simple dataset entailment:

:g1 { :s :p :o1, :o2 }

entails

:g1 { :s :p :o1 }

T4.2 Under simple dataset entailment:

:g1 { :s :p :o1 }

entails

:g1 { :s :p [] }

T4.3 Under simple dataset entailment:

:g1 { :s :p _:blank1 }

is equivalent to (and hence entails)

:g1 { :s :p _:blank2 }

T4.4 Under simple dataset entailment:

:g1 { :s :p _:blank1 }

is equivalent to (and hence entails)

:g1 { :s :p _:blank1, _:blank2 }

Empty named graphs are trivially true

T5.1 Under simple dataset entailment:

:g1 { :s :p :o }

entails

:g1 {}

T5.2 Under simple dataset entailment:

:g1 {}

is equivalent to

# empty default graph, no named graphs

An RDF dataset is the conjunction of the default + named graphs

T6.1 Under simple dataset entailment:

:g1 { :s :p :o }

entails

# empty default graph, no named graphs

T6.2 Under simple dataset entailment:

:g1 { :s :p :o }
:g2 { :s :p :o }

entails

:g1 { :s :p :o }

T6.3 Under simple dataset entailment:

{ :s :p :o }
:g1 { :s :p :o }

entails

:g1 { :s :p :o }

T6.4 Under simple dataset entailment:

{ :s :p :o }
:g1 { :s :p :o }

entails

{ :s :p :o }
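The test cases T3.1, T5.1, T5.2 and T6.1 to T6.4 can be reproduced with a short sketch of dataset entailment. It assumes ground graphs under simple entailment, so that graph entailment is set inclusion, and it assumes that a name missing from a dataset behaves like a name paired with the empty graph (as T5.2 suggests). The pair encoding `(default, named)` and the function name `dataset_entails` are hypothetical; this is not a general decision procedure.

```python
def dataset_entails(d1, d2):
    """Check d1 |= d2 for ground datasets under simple entailment.

    A dataset is encoded as (default_graph, {name: graph}), with graphs
    given as frozensets of (s, p, o) tuples.
    """
    default1, named1 = d1
    default2, named2 = d2
    # The default graph of d1 must entail the default graph of d2.
    if not default2 <= default1:
        return False
    # Each named graph of d2 must be entailed by the graph with the same
    # name in d1; a missing name is treated as the empty graph.
    return all(g2 <= named1.get(n, frozenset()) for n, g2 in named2.items())


T = frozenset({(":s", ":p", ":o")})
EMPTY = frozenset()

t31 = dataset_entails((EMPTY, {":g1": T}), (T, {}))                  # T3.1
t51 = dataset_entails((EMPTY, {":g1": T}), (EMPTY, {":g1": EMPTY}))  # T5.1
t62 = dataset_entails((EMPTY, {":g1": T, ":g2": T}), (EMPTY, {":g1": T}))  # T6.2
```

Here `t31` is false (named graphs are not asserted), while `t51` and `t62` are true.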

Different named graphs do not contradict each other

T7.1 Under simple dataset entailment:

:g1 { :s :p :o1 }

is consistent with

:g1 { :s :p :o2 }

The same entailment regime is active in default and named graphs

T8.1 Under OWL dataset entailment:

{ :s :p :o1. :o1 owl:sameAs :o2 }

entails

{ :s :p :o2 }

T8.2 Under OWL dataset entailment:

:g1 { :s :p :o1. :o1 owl:sameAs :o2 }

entails

:g1 { :s :p :o2 }

Named graphs are independent from each other

T9.1 Under OWL dataset entailment:

:g1 { :s :p :o1 }
:g2 { :o1 owl:sameAs :o2 }

is consistent with, but does not entail

:g1 { :s :p :o1, :o2 }

Named graphs are independent from the default graph

T10.1 Under OWL dataset entailment:

{ :o1 owl:sameAs :o2 }
:g1 { :s :p :o1 }

is consistent with, but does not entail

:g1 { :s :p :o1, :o2 }

T10.2 Under OWL dataset entailment:

{ :s :p :o1 }
:g1 { :o1 owl:sameAs :o2 }

is consistent with, but does not entail

{ :s :p :o1, :o2 }
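The independence shown in T9.1, T10.1 and T10.2 can be pictured as running the reasoner on each graph separately. The sketch below implements only a tiny fragment of OWL reasoning, namely substitution of terms related by `owl:sameAs`, and applies it per graph. The function names `same_as_closure` and `close_dataset` and the tuple encoding are assumptions of this example; a real OWL entailment regime covers far more than this.

```python
SAME_AS = "owl:sameAs"


def same_as_closure(graph):
    """Saturate one graph by substituting terms related by owl:sameAs."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        equalities = {(s, o) for s, p, o in closed if p == SAME_AS}
        equalities |= {(o, s) for s, o in equalities}  # symmetry
        new = set()
        for a, b in equalities:
            for s, p, o in closed:
                if s == a:
                    new.add((b, p, o))
                if o == a:
                    new.add((s, p, b))
        if not new <= closed:
            closed |= new
            changed = True
    return closed


def close_dataset(default, named):
    """Close the default graph and each named graph independently."""
    return same_as_closure(default), {n: same_as_closure(g) for n, g in named.items()}


# T8.2: the equality and the triple are in the same named graph.
_, together = close_dataset(set(), {
    ":g1": {(":s", ":p", ":o1"), (":o1", SAME_AS, ":o2")},
})
# T9.1: the equality lives in a different named graph.
_, apart = close_dataset(set(), {
    ":g1": {(":s", ":p", ":o1")},
    ":g2": {(":o1", SAME_AS, ":o2")},
})
```

In `together`, the graph for `:g1` contains `:s :p :o2`; in `apart` it does not, because the equality in `:g2` is never applied to `:g1`.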

Indirection between graph name and graph

T11.1 Under OWL dataset entailment:

{ :g1 owl:differentFrom :g2 }
:g1 { :s :p :o }
:g2 { :s :p :o }

is consistent.


T11.2 Under OWL dataset entailment:

{ :g1 owl:sameAs :g2 }
:g1 { :s :p :o }

entails

{ :g1 owl:sameAs :g2 }
:g1 { :s :p :o }
:g2 { :s :p :o }

(Note: This entailment would not hold under the IRI-IGEXT version of the semantics; see Issue 4 above.)

Issue 0 test case: Do we define a semantics for RDF datasets?

All test cases above assume the answer “yes” to Issue 0. If the answer is “no”, then one cannot speak of RDF datasets in terms of entailment or contradiction. A separate notion of “equivalence” for RDF datasets would have to be defined; this would probably be “RDF dataset isomorphism”, that is, two datasets are equivalent if they only differ in their blank nodes.

If the answer to Issue 0 were “no”:

{ :s :p 42.0 }

would not be equivalent to (but consistent with):

{ :s :p +42.00 }
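The distinction at play is between comparing literals by lexical form and comparing them by value. A minimal Python illustration follows, using `decimal.Decimal` as an assumed stand-in for the value space of the two numeric literals; it only illustrates the distinction and says nothing about how an RDF implementation must compare literals.

```python
from decimal import Decimal

lexical_a = "42.0"
lexical_b = "+42.00"

# Purely syntactic comparison: the two lexical forms differ.
same_lexical_form = lexical_a == lexical_b

# Value-based comparison: both lexical forms map to the same number.
same_value = Decimal(lexical_a) == Decimal(lexical_b)
```

Without a semantics, only the first kind of comparison is available, so the two datasets are distinct.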

Issue 1: Different regime for default graphs and named graphs?

The test cases above assume that the same entailment regime holds for the default graph and for the named graphs. So, if a particular entailment holds in the default graph G, then it also holds in a named graph G.

Under OWL dataset semantics:

{ :o1 owl:differentFrom :o1 }
:g1 {}

is a contradiction.


Under OWL dataset semantics:

{}
:g1 { :o1 owl:differentFrom :o1 }

is a contradiction.


Under simple dataset semantics:

{ :o1 owl:differentFrom :o1 }
:g1 {}

is consistent.


Under simple dataset semantics:

{}
:g1 { :o1 owl:differentFrom :o1 }

is consistent.

Issue 2: No-Semantics

The proposal is to introduce a new “no-semantics” entailment regime, in which a graph G entails only isomorphisms of G.

Under no-dataset-semantics:

:g1 { :s :p :o1 }

contradicts

:g1 { :s :p :o2 }

Under no-dataset-semantics:

{ :s :p :o1 }

contradicts

{ :s :p :o2 }

The second case assumes that the same entailment regime applies to default graph and named graph; otherwise it could be made consistent by applying a normal entailment regime to the default graph, and no-semantics only to the named graphs.
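Under the proposed “no-semantics” regime, the entailment check between two graphs would reduce to a graph isomorphism test. The sketch below is one brute-force way to write such a test, trying every bijection between blank nodes. The function name `isomorphic`, the tuple encoding, and the `_:` prefix convention are assumptions of this example, and the approach only scales to very small graphs.

```python
from itertools import permutations


def blank_nodes(graph):
    """Assumption: blank nodes are written with a '_:' prefix."""
    return sorted({t for triple in graph for t in triple if t.startswith("_:")})


def isomorphic(g, h):
    """Return True if g and h differ only in the naming of blank nodes."""
    bg, bh = blank_nodes(g), blank_nodes(h)
    if len(g) != len(h) or len(bg) != len(bh):
        return False
    for perm in permutations(bh):
        mapping = dict(zip(bg, perm))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g}
        if renamed == set(h):
            return True
    return False


renaming_only = isomorphic({(":s", ":p", "_:a")}, {(":s", ":p", "_:b")})
different_object = isomorphic({(":s", ":p", ":o1")}, {(":s", ":p", ":o2")})
```

Here `renaming_only` is true and `different_object` is false. Graphs that are equivalent under simple entailment but not isomorphic, such as the pair in T2.4, would also be rejected by this test.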

Issue 3: Let the dataset announce its assumed entailment regime?

(No complete proposal for this is on the table yet, so no test case.)

Issue 4: Does the graph extension assign graphs to resources or to IRIs?

Under the RES-IGEXT variant of the semantics (formalized above):

{ :g1 owl:sameAs :g2 }
:g1 { :s :p :o }

entails

{ :g1 owl:sameAs :g2 }
:g1 { :s :p :o }
:g2 { :s :p :o }

Under the IRI-IGEXT variant of the semantics, this particular entailment would not hold, because the graph is associated with the IRI, and not with the resource that is declared as having two names.
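The difference between the two variants can be sketched as a difference in what the graph extension is keyed on. In the hypothetical Python encoding below, `denotation` plays the role of Id on IRIs and maps both names to one resource `"r"`, as `:g1 owl:sameAs :g2` requires; the dictionaries and function names are assumptions of this example, and graph entailment is simplified to set inclusion on ground graphs.

```python
G = frozenset({(":s", ":p", ":o")})

# Id restricted to the two names: owl:sameAs forces a single resource.
denotation = {":g1": "r", ":g2": "r"}

# RES-IGEXT: graphs are attached to resources.
res_igext = {"r": G}

# IRI-IGEXT: graphs are attached to IRIs. This particular extension
# satisfies the premise but attaches only the empty graph to :g2.
iri_igext = {":g1": G, ":g2": frozenset()}


def pair_true_res(n, g):
    """I(<n,g>) under RES-IGEXT: look up IGEXT(Id(n))."""
    return g <= res_igext[denotation[n]]


def pair_true_iri(n, g):
    """I(<n,g>) under IRI-IGEXT: look up IGEXT(n)."""
    return g <= iri_igext[n]


res_g2 = pair_true_res(":g2", G)
iri_g2 = pair_true_iri(":g2", G)
```

Both extensions make `:g1 { :s :p :o }` true, but only the resource-keyed one is forced to make `:g2 { :s :p :o }` true as well; the IRI-keyed extension shown here acts as a countermodel.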

Issue 5: Does the graph name denote the graph?

Under the semantics formalized above:

{ :g1 owl:differentFrom :g2 }
:g1 { :s :p :o }
:g2 { :s :p :o }

is consistent, because :g1 and :g2 can be associated with the same graph even though they denote different resources.

Under an alternative semantics where :g1 and :g2 directly denote the graphs, the above would be a contradiction.

Issue 6: Open-graph or closed-graph semantics

The test cases above assume the open-graph version of the semantics. Under the open-graph version with simple entailment:

:g1 { :s :p :o1, :o2 }

entails

:g1 { :s :p :o1 }

Under the alternative closed-graph version of the semantics:

:g1 { :s :p :o1, :o2 }

contradicts

:g1 { :s :p :o1 }
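The two versions can be compared by asking whether any single graph could be associated with `:g1` so that both named graphs are true at once. The sketch below assumes ground graphs under simple entailment, so that "entails" is set inclusion and "equivalent" is set equality, and it searches candidate graphs among the subsets of the union of the two graphs. The function names are hypothetical.

```python
from itertools import combinations


def subsets(triples):
    """Yield every subset of a set of triples as a frozenset."""
    items = sorted(triples)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def pair_true(associated, g, closed):
    """Truth of <n,g> given the graph associated with n.

    Open-graph: the associated graph entails g (set inclusion).
    Closed-graph: the associated graph is equivalent to g (set equality).
    """
    return associated == g if closed else g <= associated


def jointly_satisfiable(g_a, g_b, closed):
    """Is there one associated graph making both <n,g_a> and <n,g_b> true?"""
    return any(
        pair_true(candidate, g_a, closed) and pair_true(candidate, g_b, closed)
        for candidate in subsets(g_a | g_b)
    )


both = frozenset({(":s", ":p", ":o1"), (":s", ":p", ":o2")})
first = frozenset({(":s", ":p", ":o1")})

open_ok = jointly_satisfiable(both, first, closed=False)
closed_ok = jointly_satisfiable(both, first, closed=True)
```

In this sketch `open_ok` is true, since the larger graph itself serves as the associated graph, while `closed_ok` is false, matching the contradiction stated for the closed-graph version.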

Issue 7: Is the default graph universally true?

The test cases above assume that the default graph has a truth value, and it is asserted; however, its truth is not presumed in the named graphs.

Under OWL dataset semantics:

{ :o1 owl:sameAs :o2.
  :s :o :o1 }

entails

# This is an RDF graph, not an RDF dataset
:o1 owl:sameAs :o2.
:s :o :o1, :o2

Under OWL dataset semantics:

{ :o1 owl:sameAs :o2. }
:g1 { :s :o :o1 }

is consistent with, but does not entail

{ :o1 owl:sameAs :o2. }
:g1 { :s :o :o1, :o2 }

Issue 7 asks whether the truth of the default graph should be presumed in the named graphs. In that case, the second case would be an entailment.