Difference between revisions of "TF-Graphs/Minimal-dataset-semantics"

From RDF Working Group Wiki
m (Informal description of the semantics)
Line 1: Line 1:
This is a new attempt to provide a semantics of datasets. It is made such that it covers a descent amount of use cases and more use cases can be covered by proper "semantic extensions".  
+
This is a new attempt to provide a semantics of datasets. It is made such that it covers a descent amount of use cases and more use cases can be covered by proper "semantic extensions". The formal semantics only describes what can be deduced from or assumed to be true of a dataset. It does not describe a mechanism by which an RDF dataset is affected. Implementations are free to ignore the semantics and manipulate the syntactic structure only (e.g., parsers, editors).
The formal semantics only describes what can be deduced from or assumed to be true of a dataset. It does not describe a mechanism by which an RDF dataset is affected. Implementations are free to ignore the semantics and manipulate the syntactic structure only (e.g., parsers, editors).
+
  
=Entailment regime=
+
The first part of this page reflects a semantics that some of us (AZ, RC, IH) have agreed on. The second part contains some explicit issues that the group may want to formally vote on to make the decision process clearer; the semantics put forward reflects our choices on those issues, but the group may want to decide otherwise.
 +
 
 +
= Semantics =
  
 
In order to decide how to interpret a dataset, it must be determined how plain RDF graphs are interpreted. According to the existing standards, there are several ways of interpreting a set of triples, each being tied to what SPARQL calls an ''entailment regime''. As of today, the standard entailment regimes are [http://www.w3.org/ns/entailment/Simple Simple Entailment], [http://www.w3.org/ns/entailment/RDF RDF Entailment], [http://www.w3.org/ns/entailment/RDFS RDFS Entailment], [http://www.w3.org/ns/entailment/D D Entailment], [http://www.w3.org/ns/entailment/OWL-Direct OWL Entailment with Direct Semantics], [http://www.w3.org/ns/entailment/OWL-RDF-Based OWL Entailment with RDF Based Semantics] and [http://www.w3.org/ns/entailment/RIF RIF Entailment].
 
Line 8: Line 9:
 
The entailment regime is determined by the application and should be either fixed by it and described in the documentation, or changeable through the application setup.
 
  
=Informal description of the semantics=
+
== Informal description of the semantics ==
  
 
Given an entailment regime ''E'', the dataset is interpreted in the following way:
 
  
* the default graph has the same meaning as an isolated RDF graph according to the regime ''E''. So, a dataset with no <name,graph> pair can be identified with a plain RDF graph. This allows us to treat RDF graph as if they were datasets with minimal abuse of notations.
+
* the default graph has the same meaning as an isolated RDF graph according to the regime ''E''. So, a dataset with no <name,graph> pair can be identified with a plain RDF graph. This allows us to treat RDF graphs as if they were datasets with a minimal abuse of notations.
* the <''n'',''g''> pairs are interpreted as a relationship between the "name" ''n'' (or the resource it denotes) and a certain graph, not necessarily ''g''.  This relationship is considered to be true for a dataset, if the graph in relation with ''n'' ''E''-entails ''g''.
+
* the <''n'',''g''> pairs are interpreted as a relationship between the resource the "name" ''n'' denotes and a certain graph, not necessarily ''g''.  This relationship is considered to be true for a dataset if the graph in relation with ''n'' ''E''-entails ''g''.
  
This means that the name ''n'' (or the resource it denotes) is associated with a graph that has at least the truth of the graph ''g'', according to the chosen entailment regime.
+
This means that the resource denoted by the name ''n'' is associated with a graph that has at least the truth of the graph ''g'', according to the chosen entailment regime.
  
'''Issue 1:''' ''can the entailment regime of the default graph be different from the one of the <name,graph> pairs?''
+
== Model-theoretic semantics ==
  
'''Issue 2:''' ''do we want to allow an entailment regime that is "weaker" than Simple Entailment? Something like the "[[TF-Graphs/Dataset-semantics-2.0#RDF_no-semantics|no-semantics]]" in one of our previous proposals.''
+
Let ''E'' be an entailment regime and ''V'' a vocabulary of IRIs and literals. An ''E''-dataset-interpretation over vocabulary ''V'' is a pair ''I'' = <''I<sub>d</sub>'',IGEXT> such that:
  
'''Issue 3:''' ''can a dataset declare what semantics it assumes, instead of letting the application decide in all cases? This was proposed as a [[TF-Graphs/Dataset-semantics-2.0#Extensions|possible extension]] in a previous proposal.''
+
* ''I<sub>d</sub>'' is an ''E''-interpretation over vocabulary ''V'';
 +
* IGEXT is a function from the set of resources defined by ''I<sub>d</sub>'' to the set of RDF graphs.
  
'''Issue 4:''' ''should the relationship be between "name" and graph or between resource denoted by "name" and graph? The latter can be made a proper semantic extension of the former, not the opposite. See also Issue 6 below.
+
Further, ''I'' is extended into a function assigning truth values to graphs, <name,graph> pairs and dataset as follows:
  
'''Issue 5:''' ''this semantics does not completely covers the "graph quote" use case where one wants to explicitly say that the graph is quoted, that is, the terms used, in addition to their meaning, are important.''
+
* for a graph ''G'', ''I''(''G'') is true iff ''I<sub>d</sub>''(''G'') is true;
 +
* for an IRI ''n'' and RDF graph ''g'', ''I''(<''n'',''g''>) is true iff IGEXT(''I<sub>d</sub>''(''n'')) is defined and ''E''-entails ''g'';
 +
* for a dataset ''D''=(''DG'',<''n''<sub>1</sub>,''G''<sub>1</sub>>,…,<''n''<sub>''k''</sub>,''G''<sub>''k''</sub>>), ''I''(''D'') is true iff ''I(''DG'') is true and for all ''i'' in 1…''k'', ''I''(<''n''<sub>''i''</sub>,''G''<sub>''i''</sub>>) is true.
  
=Model-theoretic semantics=
+
= Issues that the group may have to vote on to make the decision process clean =
  
Let ''E'' be an entailment regime and ''V'' a vocabulary of IRIs and literals. An ''E''-dataset-interpretation over vocabulary ''V'' is a pair ''I'' = <''I<sub>d</sub>'',IGEXT> such that:
+
'''Issue 0:''' ''Should we say anything about the semantics of RDF datasets at all?''  
  
* ''I<sub>d</sub>'' is an ''E''-interpretation over vocabulary ''V'';
+
'''Issue 1:''' ''Can the entailment regime of the default graph be different from the one of the <name,graph> pairs? More generally, could we assign a different entailment regime to each individual <name,graph> pair?'' Note that the SPARQL Entailment document allows for that, and it would be relatively easy, mathematically, to extend the semantics to do that. However, the result would be fairly complicated.  
* IGEXT is a function from a set of IRIs in ''V'' to the set of RDF graphs.
+
  
'''Issue:''' ''IGEXT could be a function from the set of resources defined by ''I<sub>d</sub>'', instead of a set of IRIs.
+
'''Issue 2:''' ''Do we want to allow an entailment regime that is “weaker” than Simple Entailment? Something like the "[[TF-Graphs/Dataset-semantics-2.0#RDF_no-semantics|no-semantics]]" in one of our previous proposals.'' Note that this has any relevance only if the answer to Issue 1 is “yes”. Otherwise this amounts to not using any semantics at all to the dataset, which does not require any further formalism.
  
Further, ''I'' is extended into a function assigning truth values to graphs, <name,graph> pairs and dataset as follows:
+
'''Issue 3:''' ''Can a dataset declare what semantics it assumes, instead of letting the application decide in all cases? This was proposed as a [[TF-Graphs/Dataset-semantics-2.0#Extensions|possible extension]] in a previous proposal. Another alternative explored in the past was the usage of extra predicates in TriG.''
  
* for a graph ''G'', ''I''(''G'') is true iff ''I<sub>d</sub>''(''G'') is true;
+
'''Issue 4:''' ''should the relationship be between "name" and graph or between resource denoted by "name" and graph? The latter can be made a proper semantic extension of the former, not the opposite. In formal terms, IGEXT could map IRIs to graphs, and not resources; in which case the formalism would refer to IGEXT(''n'') instead of IGEXT(''I''<sub>d</sub>(''n''))''.
* for an IRI ''n'' and RDF graph ''g'', ''I''(<''n'',''g''>) is true iff IGEXT(''n'') is defined and ''E''-entails ''g'';
+
 
* for a dataset ''D''=(''DG'',<''n''<sub>1</sub>,''G''<sub>1</sub>>,...,<''n''<sub>''k''</sub>,''G''<sub>''k''</sub>>), ''I''(''D'') is true iff ''I(''DG'') is true and for all ''i'' in 1..''k'', ''I''(<''n''<sub>''i''</sub>,''G''<sub>''i''</sub>>) is true.
+
'''Issue 5:''' ''In <n,G>, does n denote G, or may n denote any resource?'' Note that this is related to Issue-4. In terms of the terminology used in the current RDF Semantics, the current semantics is not “denoting”, because I<sub>d</sub> does not map ''n'' to any graph.
 +
 
 +
'''Issue 6:''' ''Is it sufficient for the truth of I(<n,g>) that IGEXT(n) E-entails g, or should we require that IGEXT(n) is equivalent to g under E-entailment? This is open-graph versus closed-graph semantics.''
  
'''Issue 6:''' ''if IGEXT maps resources to graphs rather than IRIs to graphs, then the second item must be replaced by "for an IRI ''n'' and RDF graph ''g'', ''I''(<''n'',''g''>) is true iff IGEXT(''I''<sub>d</sub>(''n'')) is defined and ''E''-entails ''g'';".''
+
'''Issue 7:''' ''Should the truth of a named graph require that the named graph satisfies the default graph?''

Revision as of 13:29, 7 September 2012

This is a new attempt to provide a semantics for datasets. It is designed to cover a decent number of use cases, and further use cases can be covered by proper "semantic extensions". The formal semantics only describes what can be deduced from, or assumed to be true of, a dataset. It does not describe a mechanism by which an RDF dataset is affected. Implementations are free to ignore the semantics and manipulate the syntactic structure only (e.g., parsers, editors).

The first part of this page reflects a semantics that some of us (AZ, RC, IH) have agreed on. The second part contains some explicit issues that the group may want to formally vote on to make the decision process clearer; the semantics put forward reflects our choices on those issues, but the group may want to decide otherwise.

Semantics

In order to decide how to interpret a dataset, it must be determined how plain RDF graphs are interpreted. According to the existing standards, there are several ways of interpreting a set of triples, each being tied to what SPARQL calls an entailment regime. As of today, the standard entailment regimes are Simple Entailment, RDF Entailment, RDFS Entailment, D Entailment, OWL Entailment with Direct Semantics, OWL Entailment with RDF Based Semantics and RIF Entailment.

The entailment regime is determined by the application and should be either fixed by it and described in the documentation, or changeable through the application setup.
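
As an illustration only, an application could make its choice of regime explicit along the following lines. The IRIs are those of the standard entailment regimes named above; the names `EntailmentRegime`, `AppConfig` and `regime_from_iri`, and the choice of Simple Entailment as the default, are hypothetical and not part of any specification.

```python
from dataclasses import dataclass
from enum import Enum


class EntailmentRegime(Enum):
    """Standard entailment regimes, identified by their IRIs."""
    SIMPLE = "http://www.w3.org/ns/entailment/Simple"
    RDF = "http://www.w3.org/ns/entailment/RDF"
    RDFS = "http://www.w3.org/ns/entailment/RDFS"
    D = "http://www.w3.org/ns/entailment/D"
    OWL_DIRECT = "http://www.w3.org/ns/entailment/OWL-Direct"
    OWL_RDF_BASED = "http://www.w3.org/ns/entailment/OWL-RDF-Based"
    RIF = "http://www.w3.org/ns/entailment/RIF"


@dataclass
class AppConfig:
    """Hypothetical application setup.

    The regime has a documented default but can be changed
    when the application is configured.
    """
    regime: EntailmentRegime = EntailmentRegime.SIMPLE


def regime_from_iri(iri: str) -> EntailmentRegime:
    """Look up a regime by its IRI; raises ValueError for an unknown IRI."""
    return EntailmentRegime(iri)


# Default setup, then a setup that overrides the regime.
default_cfg = AppConfig()
rdfs_cfg = AppConfig(regime=regime_from_iri("http://www.w3.org/ns/entailment/RDFS"))
```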

Informal description of the semantics

Given an entailment regime E, the dataset is interpreted in the following way:

  • the default graph has the same meaning as an isolated RDF graph according to the regime E. So, a dataset with no <name,graph> pair can be identified with a plain RDF graph. This allows us to treat RDF graphs as if they were datasets, with a minimal abuse of notation.
  • the <n,g> pairs are interpreted as a relationship between the resource the "name" n denotes and a certain graph, not necessarily g. This relationship is considered to be true for a dataset if the graph in relation with n E-entails g.

This means that the resource denoted by the name n is associated with a graph that is at least as strong as g: whatever g asserts also follows from the associated graph under the chosen entailment regime.

Model-theoretic semantics

Let E be an entailment regime and V a vocabulary of IRIs and literals. An E-dataset-interpretation over vocabulary V is a pair I = <I<sub>d</sub>,IGEXT> such that:

  • I<sub>d</sub> is an E-interpretation over vocabulary V;
  • IGEXT is a function from the set of resources defined by I<sub>d</sub> to the set of RDF graphs.

Further, I is extended into a function assigning truth values to graphs, <name,graph> pairs and datasets as follows:

  • for a graph G, I(G) is true iff I<sub>d</sub>(G) is true;
  • for an IRI n and RDF graph g, I(<n,g>) is true iff IGEXT(I<sub>d</sub>(n)) is defined and E-entails g;
  • for a dataset D = (DG, <n<sub>1</sub>,G<sub>1</sub>>, …, <n<sub>k</sub>,G<sub>k</sub>>), I(D) is true iff I(DG) is true and for all i in 1…k, I(<n<sub>i</sub>,G<sub>i</sub>>) is true.
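
The three truth conditions above can be sketched in Python for one special case: ground graphs (no blank nodes, no literals) under Simple Entailment, where a graph entails g exactly when it contains every triple of g. Everything in the sketch is an assumption made for illustration, not part of the proposal: the `Interpretation` class, its fields (`denote` for the IRI-to-resource part of the interpretation, `iext` for property extensions, `igext` for IGEXT) and the example IRIs.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]   # (subject, predicate, object), all IRIs
Graph = frozenset[Triple]       # ground graphs only (assumption)


def simple_entails(big: Graph, small: Graph) -> bool:
    """For ground graphs, Simple Entailment reduces to containment."""
    return small <= big


@dataclass
class Interpretation:
    """Hypothetical encoding of a dataset interpretation (I_d plus IGEXT)."""
    denote: dict[str, str]                   # IRI -> resource
    iext: dict[str, set[tuple[str, str]]]    # property resource -> pairs
    igext: dict[str, Graph] = field(default_factory=dict)  # resource -> graph

    def triple_true(self, t: Triple) -> bool:
        s, p, o = t
        if not all(term in self.denote for term in t):
            return False  # a term outside the vocabulary makes the triple false
        pairs = self.iext.get(self.denote[p], set())
        return (self.denote[s], self.denote[o]) in pairs

    def graph_true(self, g: Graph) -> bool:
        """I(G): every triple of G holds in the underlying interpretation."""
        return all(self.triple_true(t) for t in g)

    def pair_true(self, n: str, g: Graph) -> bool:
        """I(<n,g>): the graph associated with the resource n denotes entails g."""
        if n not in self.denote:
            return False
        associated = self.igext.get(self.denote[n])
        return associated is not None and simple_entails(associated, g)

    def dataset_true(self, default: Graph, named: list[tuple[str, Graph]]) -> bool:
        """I(D): the default graph and every <name,graph> pair are true."""
        return self.graph_true(default) and all(
            self.pair_true(n, g) for n, g in named
        )


ex = "http://example.org/"
t1 = (ex + "a", ex + "p", ex + "b")
t2 = (ex + "a", ex + "p", ex + "c")
interp = Interpretation(
    denote={ex + "a": "A", ex + "b": "B", ex + "c": "C",
            ex + "p": "P", ex + "g1": "G1"},
    iext={"P": {("A", "B")}},
    igext={"G1": frozenset({t1, t2})},
)
dataset_ok = interp.dataset_true(frozenset({t1}), [(ex + "g1", frozenset({t2}))])
```

In this example the dataset is true even though the triple t2 is not true in the default interpretation: the pair only requires that the graph associated with the resource denoted by the name entails the named graph.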

Issues that the group may have to vote on to make the decision process clean

Issue 0: Should we say anything about the semantics of RDF datasets at all?

Issue 1: Can the entailment regime of the default graph be different from the one of the <name,graph> pairs? More generally, could we assign a different entailment regime to each individual <name,graph> pair? Note that the SPARQL Entailment document allows for that, and it would be relatively easy, mathematically, to extend the semantics to do that. However, the result would be fairly complicated.

Issue 2: Do we want to allow an entailment regime that is “weaker” than Simple Entailment? Something like the "no-semantics" in one of our previous proposals. Note that this is relevant only if the answer to Issue 1 is “yes”. Otherwise it amounts to not applying any semantics at all to the dataset, which does not require any further formalism.

Issue 3: Can a dataset declare what semantics it assumes, instead of letting the application decide in all cases? This was proposed as a possible extension in a previous proposal. Another alternative explored in the past was the use of extra predicates in TriG.

Issue 4: Should the relationship be between the "name" and the graph, or between the resource denoted by the "name" and the graph? The latter can be made a proper semantic extension of the former, but not the other way round. In formal terms, IGEXT could map IRIs, rather than resources, to graphs, in which case the formalism would refer to IGEXT(n) instead of IGEXT(I<sub>d</sub>(n)).
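
The practical difference between the two options appears when two IRIs denote the same resource. The following sketch assumes ground graphs under Simple Entailment (so entailment is set containment); all names in it (`denote`, `igext_by_resource`, `igext_by_iri`, the `ex:` identifiers) are hypothetical.

```python
# Two IRIs that the interpretation maps to the same resource (assumption).
denote = {"ex:n1": "R", "ex:n2": "R"}

g = frozenset({("ex:a", "ex:p", "ex:b")})

# Option A: IGEXT maps resources to graphs.
igext_by_resource = {"R": g}
# Option B: IGEXT maps IRIs to graphs.
igext_by_iri = {"ex:n1": g}


def pair_true_by_resource(n: str, graph: frozenset) -> bool:
    """Truth of <n,graph> when IGEXT is applied to the resource n denotes."""
    associated = igext_by_resource.get(denote.get(n))
    return associated is not None and graph <= associated


def pair_true_by_iri(n: str, graph: frozenset) -> bool:
    """Truth of <n,graph> when IGEXT is applied to the IRI n itself."""
    associated = igext_by_iri.get(n)
    return associated is not None and graph <= associated
```

With the resource-based option, <ex:n2,g> is true whenever <ex:n1,g> is, because both names denote R. With the IRI-based option the two pairs are independent, and here <ex:n2,g> is false.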

Issue 5: In <n,G>, does n denote G, or may n denote any resource? Note that this is related to Issue 4. In the terminology of the current RDF Semantics, the semantics proposed here is not “denoting”, because I<sub>d</sub> does not map n to any graph.

Issue 6: Is it sufficient for the truth of I(<n,g>) that IGEXT(I<sub>d</sub>(n)) E-entails g, or should we require that IGEXT(I<sub>d</sub>(n)) is equivalent to g under E-entailment? This is open-graph versus closed-graph semantics.
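
The two readings can be contrasted in a small sketch. It assumes ground graphs under Simple Entailment, where entailment is containment and equivalence is therefore equality of triple sets; the function names and example triples are hypothetical.

```python
def open_pair_true(associated: frozenset, g: frozenset) -> bool:
    """Open-graph reading: the associated graph entails g."""
    return g <= associated


def closed_pair_true(associated: frozenset, g: frozenset) -> bool:
    """Closed-graph reading: the associated graph and g entail each other."""
    return g <= associated and associated <= g


t1 = ("ex:a", "ex:p", "ex:b")
t2 = ("ex:a", "ex:p", "ex:c")
associated = frozenset({t1, t2})   # graph associated with the name
g = frozenset({t1})                # graph written in the dataset
```

Here the pair is true under the open-graph reading, since the associated graph contains t1, but false under the closed-graph reading, since the associated graph also contains t2.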

Issue 7: Should the truth of a named graph require that the named graph satisfies the default graph?