Easy Keys

From OWL
Jump to: navigation, search


Background

Keys (aka, inverse functional datatype properties) are clearly of vital importance to many applications. Key reasoning in general in the context of OWL can be unfeasibly difficult (given what we currently know and anticpate). However, general inverse functional properties are almost always overkill. Instead of tackling the general problem, we propose a restricted version which meets important use cases. We aim for robust implementations quickly, so are willing to forgo feasible features that unduely complicate the implementation (as far as we can currently tell). As our implementation knowledge advances, we can update the spec to cover more features.

Basic details

Easy Keys should be:

  • Useful enough to be worth specifying
  • Low impact on implementations
  • Clear enough to specify

Often, applications merely require a key checking on explicit data (i.e., named individuals and known key values). This alone is a big and effective reduction in expressivity since it allows us to (mostly) use DL Safe rules to specify key semantics. DL Safe rules degrade graceful as we step down the overall expressiveness of the langauge, making this spec useful for RDF + rules implementations.

Keys in general have (or may have) the following properties

  1. Missing key values raise an error (optional; non first order)
  2. Functionality constraints on keys (optional)
  3. If X and Y have the same key values y, then X=Y. (essential; this is what it is to be a key!)

The first feature is not expressible directly in first order logic, thus not in OWL or OWL + DL Safe rules. So we forgo this for the moment. Functionality can be expressed in OWL and degrades to the known individual/value case just fine.

The third feature can be expressed as a DL Safe rule (assuming safety restrictions on datatype atoms), e.g., the following DL safe rule expresses that `keyProperty` is a key property:

 keyProperty(X, Z), keyProperty(Y, Z) -> X = Y. 

Key properties can be restricted to certain classes, e.g., the following DL safe rule expresses that `keyProperty` is a key property for the class `aClass`:

 keyProperty(X, K),  keyProperty(Y, K), aClass(X), aClass(Y) -> X = Y. 

Simple and Compound Keys

In the above examples, we had a single key property. If we want several properties together to act as a key, we talk about compound keys. For example, assume we want to express that having the same SS# and being the same age (since, who knows, we might reuse SS# every year, or every generation) implies being the same individual (that is, we want to say that `ss` and `age` act as a compound key). We can do this using the following DL safe rule:

ss(X, K1), age(X, K2), ss(Y, K1), age(Y, K2) -> X = Y.

Rough details

Syntax

One specifies that a datatype property is a key for a certain class with a keyAxiom:

keyPropertyExpression := dataPropertyExpression |
   objectPropertyExpression
keyAxiom := 'KeyFor' '(' { annotation} keyPropertyExpression { keyPropertyExpression } description')'

With this one can declare both simple keys:

 KeyFor(hasSS Person)
Editor's Note: Ick, this syntax isn't perfectly clear. It should be read as "hasSS is a key for the class Person".

and compound keys:

KeyFor(hasId hasAge Person)

In the later case, two people are the same if their have the same age and id and, if we want (but we don't have to), we can think of hasId, hasAge being declared as bing functional elsewhere in our ontology.

At the moment, we require that if there is only one property


RDF Mapping

For the RDF mapping, we reverse things a little bit. The class is in the subject position and we have a list of properties that make up the key. E.g.,

Person owl:hasKey (hasSS)

or

Person owl:hasKey (hasId, hasAge)

The list is, semantically speaking, unordered.

Semantics

The semantics of a key constraint

KeyFor (p1,...,pn, d1,...,dm,  C)

(where the pj are object properties and the dj are datatype properties) in an ontology O is defined as follows: in addition, for an interpretation I to be a model of O, I has to satisfy the following:

forall x,y:
          if
                 there exists z1,...zn,v1,...,vm such that:
                       C(x) and C(y) and HU(x) and HU(y) and HU(z1) and... HU(zn) and
                       p1(x,z1) and p1(y,z1) and ... pk(x,zk) and pk(y,zk) and
                       d1(x,v1) and d1(y,v1) and ...dm(x,vm) and dm(y,vm) 
          then 
                  x=y

where HU is a predicate holding of all elements of the Herbrand universe. That is, it holds of all individuals named in O.

The above statement can be captured by a (DL-safe) rule, for which we plan to make use of BLD http://www.w3.org/TR/rif-bld.

Spec Proposal

Keys (Syntax)

A key axiom KeyFor( KPE Class ) states that the key property expression KPE is a key for individuals of a certain class — that is, for each combination of values that satisfy KPE, there can be at most one (named) individual x which is an instance of class C such that x is connected by each of the properties in KPE to the particular satisfying value. In the case where k contains a single property and the class is owl:Thing, the expression is similar to an inverse functional property, though it can be a data property and it only applies to named individuals.

Note that keys apply only to named individuals, thus, all other things being equal, a key axiom is strictly weaker than a corresponding inverse functional property. For example, when KPE contains a single object property, OP, the axiom:

KeyFor(OP, owl:Thing)

is strictly weaker than the related axiom:

SubClassOf( owl:Thing MaxCardinality( 1 InverseOf( OP ) ) )

However, unlike inverse functionality, keys can be declared for data properties and sets of properties (which can be mixed data and object properties).

keyPropertyExpression := dataPropertyExpression |
   objectPropertyExpression
keyAxiom := 'KeyFor' '(' { annotation} keyPropertyExpression { keyPropertyExpression } description')'

Consider the ontology consisting of the following axioms.

Key( a:hasID "Person" ) Each person is identified by their ID.
PropertyAssertion( a:hasID a:Peter 777^^xsd:integer ) a:Peter has the id 7777 which is an integer.
PropertyAssertion( a:hasID a:Peter_Griffin 777^^xsd:integer ) a:Peter_Griffin has the id 777 which is an integer.

Since a:Peter and a:Peter_Griffin have the same key, they must be equal; that is, this ontology entails the assertion SameIndividual( a:Peter a:Peter_Griffin ).

Key( a:dateOfBirth "a:name" "address" "Person" ) Each person is uniquely identified by the combination of their date of birth, name, and address.
PropertyAssertion( a:hasID a:Peter 777^^xsd:integer )


class="name" | PropertyAssertion( a:hasID a:Peter 777^^xsd:integer )

a:Peter has the id 7777 which is an integer.
PropertyAssertion( a:hasID a:Peter_Griffin 777^^xsd:integer ) a:Peter_Griffin has the id 777 which is an integer.

RDF Mapping

  • KPE denotes an key property expression; a KPE is either an OPE or a DPE
Table 1. Transformation to Triples
Functional-Style Syntax S Triples Generated in an Invocation of T(S) Main Node of T(S)
KeyFor(KPE1...KPEn CE) T(CE) owl:hasKey T(SEQ KPE1...KPEn)
Table 15. Parsing of Axioms
If G contains this pattern... ...then the following axiom is added to O.
_:x rdf:type owl:AllDisjointProperties
_:x owl:members T(SEQ y1 ... yn)
{ DPE(yi) ≠ ε for each 1 ≤ i ≤ n }
KeyFor( KPE(y1) ... KPE(yn) )

Semantics

Discussion

Why are these keys easy? The reason is the usage of "HU" in the above description of the semantics: key constraints only act on individuals explicitly present in our ontology, but not on those whose existence is only implied. And key constraints "act" by inferring new equalities.

In the definition of the semantics, the predicate HU(.) is applied to all "abstract" variables, i.e., to all variables occurring in objectproperties, but not to datatype variables, i.e., those occurring in the second place of a datatype property. This is so because, for abstract variables, it is (a) necessary for decidability/DL-safety, and (b) it is easy to determine for which elements HU(.) holds -- but it would be more tricky and cause problems for datatype values, and is not necessary for decidability.

Why not easiest (easy-peasy) keys?

(Easy Keys reduce to Easy Peasy keys when you have no user defined datatypes on key properties.)

The easiest keys would be restrict the range of data valued variables to explicitly named data values in the ontology. That is to handle the abstract variables and the data variables in precisely the same way. One could then preprocess the ontology, looking for explicit data values, and treat them as the only substitution candidates for the variables.

@@Explain HU

Unfortunately, this approach leads to some very counterintuitive cases. Let's say we have a class:

Class: GoodEmployee EquivalentClass: hasId some integer

and that hasId is a key property. Now consider the following facts:

Individual: Mary
     Types: GoodEmployee
Individual: Sheevah
     Types: GoodEmployee.

In this ontology, thus far, nothing forces Mary to be the same as Sheevah.

But suppose we change the definition:

Class: GoodEmployee EquivalentClass: hasId some integer[ > 15, < 17 ]

Now, there's only one possible Id a GoodEmployee can have (i.e., 16), and Mary and Sheevah thus have the same id, but since "16" does not appear explicitly in the ontology, the key reasoning isn't triggered.

If that were all, perhaps it'd be tolerable, but no consider adding an additional fact:

Individual: Demeter
    Facts: hasAge "16"^^integer

Note that this is a completely unrelated fact. But since we now have "16" appearing explicitly in the ontology, the ontology entails Mary sameAs Sheevah.

(At the F2F, Jeremy observed that this sort of easier key could be hard for some implementation strategie. For example, a forward chaining rule engine might add a fact containing "16" during reasoning and then work very hard to avoid entailing the equality.)

Such anomalies are the primary reason for moving to slightly less easy keys. The slightly less easy keys introduces a fair bit of implementation challenge since you can get nontrivial interactions when you have large, but finite, enumerations of key values, e.g., integer[>16, <1006].

Why not full inverse functional data properties

If you restrict yourself to single key properties (i.e., no composite keys), then our Easy Keys yield a subset of the entailments you can get with inverse functional datatype properties (IFDPs). There are two main problems with IFDPs:

  1. While decidable, they are very hard to reason with. In fact, there are no implementations that we are aware of and no implementor we know of relishes the challenge. Without IFDPs, datatype reasoning can be confined to a single individual (asserted or generated) at a time. That is, you gather all the data constraints applicable to the individual, and then pass them to the datatype reasoner. If the datatype reasoner says that they are consistent, then you can mark that individual as checked, and (largely) forget about it. With IFDPs, you lose this locality -- essentially, you have to be ready to check all individuals with datapropeties at any time. Since in OWL, not all of the potential key values may be explicit, this involves maintaining a large constraint system that is constantly updated. Not Fun.
  2. It's not clear that modelers would know what to do with such a powerful feature, or even if they'd want such power in most cases. Clearly, they want keys, but it's not at all clear that they want IFDPs.

So, in essence, we have a lot of implementation work (and research!) for not obvious gain.

What do we miss with EasyKeys? Basically, we miss the ability to entail subsumptions through key constraints (at least, directly through key constraints; if we have nominals, for example, we can force subsumptions in specific cases). This is essentially the difference between full SWRL rules and DL Safe SWRL rules (see Bijan's series Understanding SWRL for an explication). The major use case we could think of for this is aligning database schemas. If you could reduce database schemas to OWL classes (including the key constraints as IFDPs), then your reasoner could show whether the two schemas were equivalent or disjoint. If one had keys that were integers over 1000, and the other had keys that were integers from 100 to 9999, you'd get an inconsistency.

To our minds, this is a very specialized application which, in fact, has never occurred. If it does, one can always appeal to IFDP in OWL Full as an extension. However, easy key reasoning (i.e., merging individuals with the same key) is very common and its lack has been identified to us by several users are being a hard blocker to adopting OWL.

Keys on BNodes

===Adding to Profiles===
Retrieved from "http://www.w3.org/2007/OWL/wiki/index.php?title=Easy_Keys&oldid=8476"