AnotherSpin

From RDF Working Group Wiki

Another Spin on RDF with Contexts

(7 June 2012) I just discovered that essentially the same proposal, with almost identical motivation, was made by Deb McGuinness, Ling Ding and Jie Bao TWO YEARS AGO (!). See http://www.slideshare.net/baojie_iowa/2010-0624-rdfcontext.

OK, describing the RDFC idea as finding a compromise between global and local views of IRI meanings, and as a revision of RDF itself, has raised some hackles and concerns. This Wiki page is a different way to spin essentially the same idea, making it sound a lot less threatening (I hope). However, apart from the wording used to describe it, this is exactly the same proposal.

We don't say we are redefining RDF, and we don't talk about RDFC. We don't use the term 'local' or talk about changing the meanings of IRIs. Instead, we talk about extending RDF with an option that allows users to describe and name their own semantic extensions, so that they and others can use them to express more things in RDF. It's a kind of do-it-yourself RDF-extending kit.

The idea of semantic extension is already introduced and sketchily defined in the 2004 specs, the relevant text being "Particular uses of RDF, including as a basis for more expressive languages such as DAML+OIL [DAML] and OWL [OWL], may impose further semantic conditions in addition to those described here, and such extra semantic conditions can also be imposed on the meanings of terms in particular RDF vocabularies. Extensions or dialects of RDF which are obtained by imposing such extra semantic conditions may be referred to as semantic extensions of RDF." So the proposal is now to take this idea and run a little further with it.

Although the terminology has changed, the formal machinery and its intended uses are almost exactly as in the earlier proposal, so I have kept the same IRI names for comparison. However, if this way of describing it is preferred, then we might want to change rdf:inherits to something like rdf:inExtension.

Semantic Extensions to RDF

A semantic extension, or simply an extension, to RDF represents a named public agreement to use a particular vocabulary of IRIs, called the reserved vocabulary of the extension, with a particular meaning defined by the extension. Semantic extensions must not violate the basic semantics of RDF, but they can extend it by imposing special meanings on IRIs. The OWL and RDFS standards are semantic extensions to RDF defined by their W3C specifications documents. (In these cases the reserved vocabulary constitutes a namespace, but this is not required.) Users may invent their own extensions and indicate them by an IRI. Then, including a triple

<> rdf:inherits C .

in an RDF graph, where C indicates the semantic extension, means that occurrences of any C-reserved IRIs in the rest of the graph are to be interpreted using the semantic constraints of that extension. Several rdf:inherits triples may be included in a graph, in which case all the indicated extensions apply, each to its own reserved vocabulary. When a graph includes one or more rdf:inherits triples, we will say that the graph inherits the semantic extensions, and that any IRIs in the graph which are not part of the reserved vocabulary of any inherited extension are unreserved.

Note, we have to refer to occurrences of IRIs, since the semantic constraints of the extension are required to apply only when the extension is explicitly inherited. The same IRI used in a graph which does not inherit the semantic extension is unreserved, and its meaning need not be affected by the semantic extension constraints.
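As a rough illustration of the reserved/unreserved distinction, here is a minimal Python sketch. It is not part of the proposal: the Extension record, the registry, the ex:Chem indicator and the chem: IRIs are all hypothetical, a graph is modelled as a plain set of string triples, and rdf:inherits objects that are missing from the registry are simply ignored.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
INHERITS = "rdf:inherits"   # property name used on this page
THIS_GRAPH = "<>"           # the graph itself, as written in the examples


@dataclass(frozen=True)
class Extension:
    """Hypothetical record for one semantic extension."""
    indicator: str                    # IRI used as the object of rdf:inherits
    reserves: Callable[[str], bool]   # decides reserved-vocabulary membership


def inherited(graph: Set[Triple], registry: Dict[str, Extension]) -> List[Extension]:
    """Return the known extensions named by the graph's rdf:inherits triples."""
    return [registry[o] for (s, p, o) in sorted(graph)
            if s == THIS_GRAPH and p == INHERITS and o in registry]


def classify(graph: Set[Triple], registry: Dict[str, Extension]) -> Dict[str, Set[str]]:
    """Map each term in the rest of the graph to the indicators that reserve it.

    An empty set means the term is unreserved in this graph.
    """
    exts = inherited(graph, registry)
    result: Dict[str, Set[str]] = {}
    for (s, p, o) in graph:
        if s == THIS_GRAPH and p == INHERITS:
            continue  # the inheritance triples themselves are not classified
        for term in (s, p, o):
            result[term] = {e.indicator for e in exts if e.reserves(term)}
    return result


# Hypothetical extension reserving every IRI that starts with "chem:".
registry = {"ex:Chem": Extension("ex:Chem", lambda iri: iri.startswith("chem:"))}
with_inherit = {("<>", "rdf:inherits", "ex:Chem"),
                ("ex:sample1", "chem:element", "chem:Carbon")}
without_inherit = {("ex:sample1", "chem:element", "chem:Carbon")}
```

The same IRI chem:Carbon is reserved in with_inherit and unreserved in without_inherit, which is why the classification is computed per graph and not once per IRI.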

We can think of the RDF specs themselves as determining an 'empty extension' which is considered to be inherited, by default, by all other extensions; and that a graph with no rdf:inherits triple is implicitly understood to inherit the empty extension. The common presumption that every IRI has a single unique global meaning can then be stated as the assumption that the empty extension reserves every IRI. It is possible to use RDF under this assumption, ignoring the extension machinery altogether (as RDF is used at present, following the 2004 specifications) but the extension mechanism also allows users to be explicit about which semantic assumptions they wish to inherit in their RDF assertions, and to relate various such presumptions of meanings to one another explicitly, if they wish to do so.

(***This presumes that there is a clear notion of a graph boundary, which is not true in current (2004) RDF, but is widely presumed. This is something we need to fix.***)

Note, we use the term 'indicates' because the indicating IRI might denote something else. In general, any IRI may be used to indicate a semantic extension, but it does not therefore denote or refer to or identify that extension. Semantic extensions are not considered to be RDF resources; they are not things in the universe of discourse of RDF. An IRI indicates a semantic extension simply by being used as the object of an rdf:inherits triple. The recommended practice is to have the IRI which is used to indicate an extension identify a document which defines the reserved vocabulary and semantic conditions of the extension; but any IRI, for example one which identifies a human person or an OWL class, may be used to indicate an extension.

(***This is to allow current practice where an IRI is used to 'label' an RDF graph in a dataset while also being used to denote something else. Several major RDF supporters use this, so we need to allow it.***)

It is possible that two extensions might impose inconsistent conditions on the same reserved vocabulary, so that a graph which inherits both of them cannot be satisfied. In this case the graph is considered to be RDF-inconsistent, and this situation may be flagged as an error.

A description of a semantic extension must define the reserved vocabulary of the extension, so that there is an algorithm which will determine, for any IRI, whether or not it is in the vocabulary; and may describe the intended meanings of these IRIs in a form which defines a set of RDF interpretations on the vocabulary as being those which satisfy the semantic conditions of the extension. A semantic extension may specify certain syntactic conditions on RDF graphs or combinations of RDF triples; in which case an algorithm must be defined which can determine, for any RDF graph, whether or not it satisfies these syntactic constraints of the extension. (*** Thinking of OWL-DL, obviously. ***)
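One possible way to picture what a description has to supply is sketched below in Python. The ExtensionDescription class, the ex:Mereology indicator, its two-IRI vocabulary and the syntactic rule are all invented for illustration. The only points taken from the text are that vocabulary membership must be decidable, that the semantic conditions are optional and may be informal, and that any syntactic conditions must come with a decision procedure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class ExtensionDescription:
    """Hypothetical container for the parts of an extension description."""
    indicator: str
    in_vocabulary: Callable[[str], bool]                            # required
    syntax_ok: Optional[Callable[[Iterable[Triple]], bool]] = None  # optional
    semantic_notes: str = ""                                        # may be informal or empty

    def accepts(self, graph: Iterable[Triple]) -> bool:
        """True when the graph meets the extension's syntactic constraints."""
        return True if self.syntax_ok is None else self.syntax_ok(graph)


# Invented example: a finite reserved vocabulary and one OWL-DL-flavoured rule
# saying that the property IRI may only appear in predicate position.
VOCAB = frozenset({"ex:partOf", "ex:Whole"})


def part_of_only_as_predicate(graph: Iterable[Triple]) -> bool:
    return all(s != "ex:partOf" and o != "ex:partOf" for (s, p, o) in graph)


mereology = ExtensionDescription(
    indicator="ex:Mereology",
    in_vocabulary=VOCAB.__contains__,
    syntax_ok=part_of_only_as_predicate,
    semantic_notes="ex:partOf is read as a transitive part-whole relation.",
)
```

A finite set gives a trivially terminating membership test; a namespace-prefix test would do equally well, since the text only asks that some algorithm exists.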

A (recommended) way to define the reserved vocabulary is to include an RDF graph in the documentation which uses all the IRIs in the reserved vocabulary, called the extension graph. This extension graph may also represent some of the semantic conditions on the vocabulary. However, semantic conditions may be defined in other ways, for example by stating conditions on interpretations directly, by providing axioms or rules, or in natural-language text. They may be omitted altogether, but of course the less information that is provided about the semantic conditions, the less useful the extension may be. We leave this situation under-defined deliberately, in anticipation of a scenario where a relatively under-defined extension gradually becomes clearer as it gets used by a community.

(***I believe that it is important to keep this under-defined, allowing informally expressed descriptions of extensions, to make it very easy for people to use extensions even if their definitions are informal. It also provides a semi-official way to connect RDF semantic content to what used to be called 'social meaning'. For example, the text might specify that a given class name IRI is interpreted to mean the class of all adult human beings, without giving any further explanation or being obliged to 'axiomatize' this idea.***)

A semantic extension which does not define its semantic constraints may be used as a public flag to draw attention to the fact that graphs which inherit that extension are all in explicit agreement concerning the meaning of the reserved vocabulary, whatever it may be. Of course, simply using a given set of IRIs should imply this agreement, but in practice this rule is sometimes unreliable, and the semantic extension mechanism provides an additional level of explicit confirmation of an intention to use a vocabulary in a strictly consistent manner.

The extension graph, if provided, must be true under the semantic conditions of the extension. An extension may be completely specified by its extension graph, in which case the semantic constraint is simply that the graph be true. In this case, called a graph extension, rdf:inherits has exactly the same meaning as owl:imports. This is understood to be the case when the extension is indicated by an IRI identifying an RDF graph, or a graph container, or a document which parses to a graph in some accepted RDF notation.

If the extension graph of an extension A inherits the extension B, then the extension A is called an extension of B, and includes all of the semantic constraints of B, plus (presumably) some others of its own. The reserved vocabulary of A may overlap with that of B, which indicates that A imposes further conditions on this part of the vocabulary. An extension of an extension B must not contradict any of the conditions imposed by B, but it may add further conditions which are understood to be conjoined ("and-ed") to the B conditions. For example, if A extends B and B specifies that the IRI x:Person is the class of human beings, then A may specify that this IRI is the class of American citizens, but it may not claim that it is the class of insects. An extension may be an extension of several other extensions.
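The conjoining of conditions along a chain of extensions can be sketched as follows. This is a toy model only: the indicators ex:A and ex:B, the PARENTS table and the dictionaries describing individuals are hypothetical, and a semantic condition on x:Person is reduced to a membership test on a candidate individual, which is much cruder than a condition on RDF interpretations.

```python
from typing import Callable, Dict, List

Constraint = Callable[[dict], bool]   # toy: a test on a candidate class member

# Hypothetical data: which extensions each extension graph inherits, and the
# conditions each extension places on x:Person.
PARENTS: Dict[str, List[str]] = {"ex:B": [], "ex:A": ["ex:B"]}
PERSON_CONSTRAINTS: Dict[str, List[Constraint]] = {
    "ex:B": [lambda thing: thing["species"] == "human"],
    "ex:A": [lambda thing: thing["citizenship"] == "US"],
}


def ancestors(ext: str) -> List[str]:
    """ext followed by everything it extends, directly or indirectly."""
    seen: List[str] = []
    stack = [ext]
    while stack:
        current = stack.pop()
        if current not in seen:       # guards against inheritance cycles
            seen.append(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def is_person(thing: dict, ext: str) -> bool:
    """Membership in x:Person under ext: every inherited condition must hold."""
    return all(check(thing)
               for e in ancestors(ext)
               for check in PERSON_CONSTRAINTS.get(e, []))


alice = {"species": "human", "citizenship": "US"}
pierre = {"species": "human", "citizenship": "FR"}
```

Under ex:B both individuals count as x:Person; under ex:A only alice does, because the condition from ex:A is and-ed to the one inherited from ex:B. A condition such as "is an insect" could never hold together with the inherited one, which is the toy counterpart of the rule that A must not contradict B.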

A special case

A graph may be asserted in itself; that is, in the graph extension defined by treating it as the extension graph of that extension. This means that its entire vocabulary is classified as reserved by it, and so has the effect of 'isolating' the IRIs in the graph from any meaning they might have in other graphs, allowing them to be interpreted locally in that particular graph as a unique context of meaning. We could call this a solipsist graph. It reproduces exactly the semantics of datasets as defined by Antoine. This technique may be found useful when sorting RDF content found "in the wild" into coherent groups, without being obliged to treat IRIs in separate groups as necessarily identical in meaning. This can be done using graph naming in a SPARQL dataset, using Sandro's trick for making names stick:

{ <u1> a rdf:Graph }

<u1> { <u1> rdf:inherits <u1> ...}

Semantics

(*** For semanticians, this is more like a 'punning' approach, different from the way the context semantics was defined. Although that was more in the spirit of the 2004 semantics, I think this is more intuitive and has better entailments. But we could go either way.***)

Let EXTV be the set of IRIs which indicate semantic extensions. An extension interpretation is a mapping J from EXTV to RDF interpretations, such that the vocabulary of J(x) is the reserved vocabulary of the extension indicated by x, and J(x) satisfies the semantic conditions of the extension indicated by x.

A given IRI may have a different meaning when it occurs in a graph which inherits an extension, so an interpretation mapping has to be considered now as defined on occurrences of IRIs rather than on IRIs themselves. Define an extended vocabulary to be a set of occurrences of IRIs. Say that an occurrence of an IRI in a triple in a graph which inherits X is constrained by X when the IRI is in the reserved vocabulary of X.

An extended interpretation I of an extended vocabulary V is an RDF interpretation IJ of the IRIs which occur in V together with an extension interpretation IEX. The interpretation of an occurrence O of an IRI is defined as

I(O) = IEX(X)(O) if O is constrained by X, otherwise IJ(O)

and the rest of the semantics (for triples, graphs, blank nodes, etc.) are exactly as in the 2004 semantic specifications.
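The case split in this definition can be illustrated with a small Python sketch. It rests on several assumptions that are not in the text: an RDF interpretation is reduced to a dictionary from IRIs to descriptive strings, the reserved vocabulary of an extension is taken to be the key set of its dictionary, the unconstrained case is read as applying the base interpretation to the occurrence's own IRI, and when several inherited extensions reserve the same IRI the first one listed is used (the text does not say how to combine them). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Occurrence:
    """One occurrence of an IRI, tagged with what its graph inherits."""
    iri: str
    inherits: Tuple[str, ...]   # indicator IRIs from the graph's rdf:inherits triples


# Toy stand-in for IJ: the base RDF interpretation of the IRIs.
BASE: Dict[str, str] = {"x:Person": "some class", "x:Bill": "Bill"}

# Toy stand-in for IEX: indicator IRI -> interpretation of its reserved vocabulary.
EXT: Dict[str, Dict[str, str]] = {
    "ex:Humans": {"x:Person": "the class of human beings"},
}


def constraining(occ: Occurrence) -> Optional[str]:
    """First inherited extension whose reserved vocabulary contains the IRI."""
    for x in occ.inherits:
        if occ.iri in EXT.get(x, {}):
            return x
    return None


def interpret(occ: Occurrence) -> str:
    """Use the extension's interpretation if constrained, else the base one."""
    x = constraining(occ)
    return EXT[x][occ.iri] if x is not None else BASE[occ.iri]


in_extension = Occurrence("x:Person", ("ex:Humans",))
outside = Occurrence("x:Person", ())
```

Here interpret(in_extension) and interpret(outside) differ even though both occurrences are of the same IRI, which is the reason the mapping is defined on occurrences.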

Note that extension interpretations are defined directly on the indicating IRIs, independently from any interpretation of that IRI.


The only way this semantics differs from that defined previously is that the machinery of indicating extensions by IRIs is here made completely independent from that which defines what resources the IRIs refer to, so equality reasoning does not apply to extensions:

A owl:sameAs B

{ <> rdf:inherits A . S P O . }

does not entail

{ <> rdf:inherits B . S P O . }

I actually think this is better, because allowing equality and class reasoning to apply to extensions could get things into an indescribable tangle where what IRIs mean depends upon obscure OWL reasoning.
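The separation between what an IRI denotes and which extension it indicates can be pictured with two independent lookup tables. The sketch below is only an analogy: the IRIs ex:A, ex:B and ex:P, the table contents, and the choice to give ex:B an empty reserved vocabulary are all invented for illustration.

```python
# ex:A owl:sameAs ex:B : both IRIs denote the same resource.
DENOTES = {"ex:A": "resource-1", "ex:B": "resource-1"}

# Reserved vocabularies, keyed on the indicating IRI itself (hypothetical data).
RESERVED = {"ex:A": frozenset({"ex:P"}), "ex:B": frozenset()}


def same_resource(a: str, b: str) -> bool:
    """Equality at the level of denotation, as owl:sameAs asserts."""
    return DENOTES[a] == DENOTES[b]


def reserved_by(indicator: str) -> frozenset:
    """Extension lookup goes through the IRI string, never through DENOTES."""
    return RESERVED[indicator]
```

Because reserved_by never consults DENOTES, knowing that ex:A and ex:B are the same resource says nothing about whether a graph inheriting ex:B constrains ex:P in the way a graph inheriting ex:A does.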

Why bother?

One might ask, why bother, since the effect of a semantic extension can already be achieved by defining a vocabulary of IRIs and specifying, more or less formally, what it is intended to mean, just as we do now? There are several responses to this.

First, the suggested machinery can be viewed simply as supplying some normative discipline to this existing practice, giving a standard way to refer to the defining documents for such a namespace and to allow explicit linking to the important semantic sources rather than relying on (what some may consider to be a mis-use of) the IRI de-hashing convention. It also allows for a decoupling of RDF semantic extension vocabularies from the HTTP/XML namespace conventions, since the reserved vocabulary can be any set of IRIs.

Second, this machinery can be used to go beyond the ontology-plus-namespace convention, by allowing for hierarchies of extensions to build upon one another in a series of semantic refinements, without needing to invent a whole new vocabulary. Suppose for example an ontology is published, and used, which defines a set of concepts in chemistry, but does not provide the notion of isotope. To use this ontology in a context in which isotopes – say, carbon-12 and carbon-14 – are distinguished, might require re-writing it to change the sense of "element", and this re-writing requires inventing an entire new namespace, which then needs to be related to the previous one, probably by mis-using owl:sameAs. With the current machinery, however, one could define an extension which introduces the concept of isotopes, re-defines chemical element to be the union of isotope classes, and retains the old terminology unchanged. Data can then be transferred to this new ontology simply by changing one inheritance triple in a large graph, with no internal re-writing of data required. Similar advantages accrue when ontologies are changed or updated; the old terminology can be re-used with a new semantic inheritance, even though a strict adherence to the "cool URIs" doctrine would require that the data be re-formulated using a new namespace, to track the change in meaning. (There is an existing use case. The semantic extension machinery is very similar to the 'microtheory' machinery developed by Guha and used extensively and successfully in the large-scale CYC ontology, largely in the way just outlined, to allow context-dependent refinements of meaning applied to a single vocabulary. See here for an extended description. For "context" read "extension".)

The "Cool URI" idea can be re-stated here as "Cool extensions". What the RDF Web needs in order to be stable and reliable is not stable URIs as such, but stable URIs-in-semantic-extensions; and for this to be possible, semantic extensions must be stable and reliably linked to IRIs. This will require a certain discipline to be adopted in coining and especially modifying semantic extension defining documents. Rather than changing the definition of a semantic extension, a new extension should be defined and linked to the older one. If possible, it should be an extension of the older version, but if this is not possible (as it often will not be) there should be some way to find the current version of any extension from any older ones. <At this point Pat is reduced to handwaving.>

Examples

Time-dependent properties and intervals.

Define an extension indicated by ex:TimeDependentProperty with the extension graph { ex:TimeDependentProperty a rdfs:Class } and the semantic constraint defined as follows. ex:TimeDependentProperty is a class of RDF properties which are time-dependent: that is, their truth depends upon the time and date at which they are asserted; and this time-dependent truth value is indicated by asserting them in any semantic extension indicated by an IRI which denotes a time and/or date. That is, if P is in this class and a triple of the form S P O is asserted in a graph which inherits an extension indicated by E, and E denotes a time and/or date, then P is understood to be true of the pair <S, O> at the time or date denoted by E.

(Note, this goes beyond anything that could be defined using an OWL ontology, but we have here defined it pretty crisply, and it could be baked even harder if required, using talk of 3-place relations.)

Then we can write time-stamped data by defining date extensions which extend ex:TimeDependentProperty, thus:

{ <> rdf:inherits ex:TimeDependentProperty .

ex:occupies a ex:TimeDependentProperty .

ex:date1 owl:sameAs "2012-01-08"^^xsd:date .

ex:date2 owl:sameAs "2012-01-09"^^xsd:date .

...}

date1 { <> rdf:inherits ex:date1 .

Bill ex:occupies room7 .

Arthur ex:occupies room10 .}

date2 { <> rdf:inherits ex:date2 .

Bill ex:occupies room10 .

Arthur ex:occupies room7 . }
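To show how a consumer might read the time-stamped graphs above, here is a minimal Python sketch. It is an illustration under assumptions, not a defined processing model: the graphs are transcribed by hand into lists of string triples, the DATES table stands in for resolving the owl:sameAs triples, and the function names timed_facts and where_is are invented.

```python
from datetime import date

TIME_DEPENDENT = {"ex:occupies"}          # members of ex:TimeDependentProperty
DATES = {"ex:date1": date(2012, 1, 8),    # transcribed from the owl:sameAs triples
         "ex:date2": date(2012, 1, 9)}

GRAPHS = {
    "date1": [("<>", "rdf:inherits", "ex:date1"),
              ("Bill", "ex:occupies", "room7"),
              ("Arthur", "ex:occupies", "room10")],
    "date2": [("<>", "rdf:inherits", "ex:date2"),
              ("Bill", "ex:occupies", "room10"),
              ("Arthur", "ex:occupies", "room7")],
}


def timed_facts(graph):
    """Yield (s, p, o, when) for each time-dependent triple in the graph."""
    whens = [DATES[o] for (s, p, o) in graph
             if s == "<>" and p == "rdf:inherits" and o in DATES]
    for (s, p, o) in graph:
        if p in TIME_DEPENDENT:
            for when in whens:
                yield (s, p, o, when)


def where_is(person, when):
    """Rooms occupied by person on the given date, across all graphs."""
    return sorted(o for g in GRAPHS.values()
                  for (s, p, o, w) in timed_facts(g)
                  if s == person and w == when)
```

Each yielded 4-tuple corresponds to the 3-place reading mentioned earlier: the property holds of the pair <S, O> at the date denoted by the inherited indicator. A graph that inherits no date extension yields no timed facts at all.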


///More to come///


Progressive refinement a la Cyc

Mutual agreement on standard definitions eg in ISO

Links to legal texts, government documents, etc..