DRAFT $Id: paper.html,v 1.29 2002/11/18 17:01:20 sandro Exp $

Semantic Web Languages:
Fundamentals and Combinations

Sandro Hawke

W3C MIT/LCS
200 Technology Square, Cambridge, MA 02139, USA

Abstract

RDF's simple data model is incapable of directly expressing logically-complex information, such as negations or conditional facts. This weakness has been considered acceptable because more expressive languages can potentially be layered on top of RDF. Layering such languages has been shown to be difficult, however, and techniques for doing so are not widely understood. This paper presents the LX family of languages, which includes syntaxes for full first-order logic, RDF and a quotation mechanism. The common sub-language of triples is then used as a base for layering the more-expressive languages. Software which implements and tests this layering is discussed, along with early and potential applications of this Semantic Web "Logic Layer".

Keywords

RDF, First-Order Logic, FOL, Semantic Web, Logic Layer, Layering, Languages, OTTER, XSB, CWM, N3, Lbase, Semantics.

1. Introduction

In the emerging architecture of the Semantic Web, RDF acts as the standard language for representing information. Its basic model is seen as simple enough for widespread deployment, while its extensibility should allow arbitrarily complex and sophisticated uses to emerge, all neatly layered on the same basic interchange language. Where XML offers an extensible collection of languages which can use the same parsers and tools and have expressions from one XML language used neatly in expressions from another, RDF offers a single language in which one can express arbitrary facts, given a suitable (extensible) set of defined terms. In short, RDF may provide a common language which is, in practice, both simple enough to widely deploy and extensible enough to support all reasonable needs.

Doubts have emerged, however, over the possibility of layering highly-expressive logic languages on RDF [Patel-Schneider02]. Without such layering, RDF cannot be applied in many areas where formal knowledge representation has proved useful and many applications will need to use alternative languages, potentially fragmenting the Semantic Web. Since Patel-Schneider showed DAML+OIL [DAML+OIL] getting tangled in its layering on RDF, prospects have improved with the newer formalization of the OWL semantics [OWL Semantics], but the details of a proper layering for full first-order logic on RDF are still missing. McDermott and Dou's contribution [McDermott02] avoids serious tangles, but only by omitting essential components. LX, as presented here, provides a complete layering of full first-order logic on RDF.

This layering is defined using a traditional serial syntax for first-order logic, introduced in section 2 as "Vanilla." The dyadic-conjunctive-existential subset of this, commonly associated with RDF, is dubbed "Primordial" and forms a base for layering. RDF itself is more expressive and is formalized here as two extensions to Primordial called "Breadfruit" and "Flatbread". Like N-Triples [N-Triples], these are non-standard alternatives to the W3C-recommended XML serialization for RDF.

The fifth whimsically-named LX language, "Mushroom", introduces a syntax for reification, with quoting and unquoting of logic sentences. These languages have an obvious syntactic combination ("Full LX"), a full-featured knowledge representation language roughly comparable to N3 [N3] or KIF [KIF92], but the primary goal here is to define the languages we want to layer, and the language on which we want to layer.

The exact nature of "layering" as applied to languages is perhaps unclear. The analogy to protocol and API layerings requires some thought, if one is to make sense of the emerging Semantic Web architecture. In particular, in what sense can one "layer" a logic language on another logic language? The answer, that defining an external meaning for a logical constant is analogous to defining a type field in a packet, is explored in section 3.

Section 3 goes on to define such meanings for logical constants (that is, to create ontologies) which allow the more-expressive serial languages of section 2 to be layered on Primordial. These ontologies can be used in practice to describe (reify) and express RDF and FOL knowledge.

Section 4 presents preliminary results in developing software using these concepts, languages, and ontologies. In particular, a first-order axiomatization of the ontology for first-order logic was developed and used, with an automated theorem prover, to demonstrate the basic functionality of this approach.

Sections 5 and 6 address future work and current conclusions.

2. Serial Languages

LX (version 3) contains five closely related serial languages, all given whimsical but evocative names. They are all LR(1) languages defined on US-ASCII input strings and are meant to be simple, practical, and not novel.

2.1. Vanilla: FOL

The "Vanilla" serialization is an unsurprising syntax for first-order logic. It serves as a reference point, being only trivially different from the language of first-order logic (FOL) as described in numerous logic texts and widely implemented in reasoners.

Most of these trivial differences are due to typographic constraints; rather than using subscripted letters, Vanilla allows names more like those used in programming languages. This is not a new variation, so LX borrows a previous approaches; it simply uses the formula syntax from OTTER [OTTER], a prominent theorem prover, with minor changes.

Other options include a subset of Infix KIF [KIF98] (although the text warns that it "is not intended for use in the exchange of knowledge between computers") and Lbase, which is somewhat less traditional in its syntax which appears to be high-order, even through it is technically a first-order logic [Lbase].

The differences between Otter's formula syntax and Vanilla are:

  1. Otter has Prolog's syntactic sugar for lists: [first, second, third | rest]. (Breadfruit will re-introduce this to LX, but it does not belong in Vanilla, since it is not a standard feature of first-order logic.)
  2. Vanilla slightly constrains the syntax of identifiers, removing Otter's special identifiers (the ones starting with "$"), and limiting the use of special characters in identifiers. Specifically, Vanilla only allows identifiers like in C ([a-zA-Z_][a-zA-Z_0-9]*) and URI-References surrounded by angle brackets and single quotes, like
    '<http://www.example.com/animals#Dog>'
    There is no semantic difference between theses two classes of identifiers in Vanilla, but in section 3.2 an important difference at another level will be introduced.
  3. Logical connectives are allowed only between sentences in Vanilla, as in FOL. Their use inside OTTER terms is non-standard, and has no meaning to the reasoner.
  4. In some cases, the precedence rules Otter uses seem to be different from those documented. Vanilla follows the documentation.

Comparison of Syntaxes For First-Order Logic

Vanilla LX Equivalent Forms Name
P(a) | Q(a) P(a) ∨ Q(a)
P(a) v Q(a)
disjunction, boolean OR
P(a) & Q(a) P(a) ∧ Q(a)
P(a) · Q(a)
conjunction, boolean AND
-P(a) ¬P(a)
~P(a)
negation, boolean NOT
P(a) -> Q(a) P(a) ⊃ Q(a)
P(a) ⇒ Q(a)
material conditional, implication, boolean IF/THEN
P(a) <-> Q(a) P(a) ≡ Q(a)
P(a) ⇔ Q(a)
material equivalence, biconditional, boolean IFF
all x P(x) ∀x P(x)
(x)P(x)
universal quantification
exists x P(x) ∃x P(x) existential quantification
all x exists y (P(x,y) | Q(x,y)) ∀x ∃y (P(x,y) ∨ Q(x,y))
P(x) & Q(x) -> -R(x) (P(x) ∧ Q(x)) ⊃ ¬ R(x)

2.2. Primordial: Pure Triples

Primordial is the sub-language of Vanilla in which all atomic sentences use a dyadic (2-argument) predicate, there are no function terms, existential but not universal quantification is used, and conjunction is the only logical connective. This has the same expressive power as the more common existential-conjunctive sub-language of FOL, but is even more syntactically restrictive in disallowing logic functions.

2.3. Breadfruit: RDF as p(s,o)

Although RDF is very close to Primordial in spirit, it contains several elements which are not present in first-order logic and are thus absent from Primordial. Foremost of these is literals, which are character strings, data values represented by character strings, or character strings paired with a language identifier. In addition to literals, RDF also has two different kinds of sequences, a type construct, a reification construct, and some odder bits, depending what exactly one considers part of RDF.

The RDF Core Working Group has perhaps clouded the picture by adding these features in two different ways: for some (like literals), they have extended RDF's abstract syntax; for others (like reification), they have used semantic layering (as discussed in section 3, below) so the feature is visible only to systems which recognize certain URI-References. In LX, these extra-Primordial features are added purely by syntactically extending the language to create Breadfruit. (The Breadfruit ontology (section 3.3) takes the other route and allows the same expressiveness within Primordial using only layering.)

Breadfruit extends Primordial:

  1. Monadic (1-argument) predicates are returned, to represent classes.
  2. String literals using double-quote marks and \-escaping as in N-Triples [N-Triples] are added
  3. Prolog-style lists are added. These look like [] = the empty list, [x] = the singleton list with x in it, [a,b,c] = a 3-element list, [a,b|x] = a list starting with a and b, which are followed by the list x. (The "|" character occurs between terms here, not sentences, and so is distinct from its use in disjunction.)
  4. Various function terms are added, as detailed below.

RDF graphs are mapped to Breadfruit sentences as follows. This is similar to the Lbase mapping in Appending A of [RDF Semantics]:

  1. RDF type arcs turn into Class(instance) sentences.
  2. Other arcs turn into Predicate(subject, object) sentences.
  3. Plain, no-language literals turn into "..." escaped as in N-Triples.
  4. Plain literals with a language turn into langString(string, lang)
  5. Typed literals turn into dataValue(string, type) or xml(string) in the case of rdf:XMLLiteral
  6. Unnamed nodes turn into variables quantified as existential at the document-level
  7. URI-References turn into names of the form '<...>' (the single-quotes are part of the name).
  8. Collections made via rdf:first/rest/nil turn into Prolog-style [...] structures
  9. Bags turn into bag(list)
  10. Alts turn into alt(list).
  11. RDF SEQs with rdf:_nnn turn into seq(subj, list). Perhaps seq(X,X) for all X. The triple "_:x rdf:_2 _:b" would turn into "seq(x, [g1,b|g2])" where _:x and _:b mapped to x and b, and g1 and g2 were introduced existential variables. A clever processor might recognize more complete lists and manage to avoid generating extra terms like g1 and g2.
  12. RDF reification is handle similarly with { ... }. "_:x rdf:subject _:s" would turn into "stating(x, { g1(s, g2) }) ". The stating predicate is required to match the RDF Core notion that RDF statements not uniquely determined by their subject, predicate, and object.

2.4. Flatbread: RDF as rdf(s,p,o)

One potential drawback to Breadfruit is that RDF predicates are encoded as Vanilla predicates, which means (because Vanilla is first-order) they are not subject to quantification. While this may be desirable for some applications, it can frustrate others.

An alternative coding lowers the syntactic order by one step, much like the "holds" predicate in KIF: instead of each RDF triple being written as a dyadic atomic sentence, it is written as a triadic one with a fixed predicate. This is the approach taken in [DAML Axioms] and in the LX version 2 axioms.

This allows some kinds of axioms, such as those for RDFS, to be directly conjoined with the knowledge base, but it is only a one-level trick: the syntactic order-lowering of the knowledge base keeps it artificially separated from the axioms, preventing the complete self-referential semantics from being expressed. The facts and formulas in the knowledge base itself can no longer be understood directly by a conventional reasoner. (See also section 3.7.)

2.6. Mushroom: Reification

Mushroom LX introduces support for reification with two syntactic features:

  1. Terms may be of the form "{" sentence "}", and such terms denote the given logical sentence (closed formula) in abstract form. This syntax comes from n3 [N3], but its use of open formulas is not permitted here. What facts may be true of the sentence are both the syntactic ones (ontologized in section 3), and the logical ones, available as below;
  2. A truth predicate, WellFormedAndTrue, is true of a sentence (as above) if and only if it is both well-formed and true. The meaning of "well-formed" for Primordial+Mushroom is purely grammatical, but in section 3.6 its definition becomes more complex to avoid paradox as Vanilla is mixed in.

3. Layered Languages

3.1 The Concept of Layering

The division of labor in networked computer systems into protocol layers can be seen from two perspectives: (1) the implementation of each layer uses communication services provided by the lower layer, or, alternatively (2) the information content associated with each layer is embedded within the information content associated with the layer beneath it. In the first perspective, the TCP driver calls the IP driver which calls the Ethernet driver. In the second, the TCP information is embedded in the data portion of the IP information, which is itself embedded within the data portion of an Ethernet packet.

The second perspective is of interest here, since languages express information content. The layering in the Semantic Web "Wedding Cake" [Berners-Lee00] shows "Logic" on "Ontology vocabulary" on "RDF + rdfschema" on "XML + NS + xmlschema". It is hard to imagine how a message or packet with some logic content (perhaps an FOL expression), might carry it embedded inside some ontological content (whatever that might mean).

It seems the layering in this classic diagram is more figurative, speaking in terms of technologies requiring other "underlying" technologies. In this sense Logic-on-Ontology and Ontology-on-RDF makes sense. If we want to carry a Vanilla expression in a Primordial one, as detailed below, we do so by defining the meaning of certain URI-Refs. The URI-Refs with their definitions comprise an ontology, so in this sense, ontology is a vital intermediate layer.

Returning to the stricter sense of layering as embedding, how exactly do we layer a more-expressive logic on a less-expressive one? The obvious, traditional approach in language design is to extend the syntax, allowing the more-expressive language to be a syntactic superset of the less expressive one. The added syntactic constructs provide a way to say additional things. In this sense, Vanilla is strictly more expressive than Primordial, and we can imagine Vanilla to be layered on Primordial.

But this is not RDF layering and this is not protocol layering, where an Ethernet packet containing an IP packet is, after all, still an Ethernet packet. A Primordial expression extended in this manner to be a Vanilla one would no longer conform to Primordial syntax.

Ethernet packets are capable of carrying IP packets because of two features: the type field, which can be set to 0x800 to indicate IP, and the data field, which is opaque to the Ethernet layer and carries the IP packet. For a Primordial expression to carry a Breadfruit or Vanilla one, we need to use URI-Refs for both type and data.

In fact, a Primordial expression using URI-Refs from some ontologies has greater expressivity than an Ethernet packet. Like Ethernet, it can carry arbitrary (eg binary) data with associated type information, but it can also carry such information for multiple higher layers simultaneously.

This simultaneous carriage of information for higher layers allows multiple ones to sit directly on Primordial, instead of being stacked on top of each other, or next to each other above some multiplexing layer. It is like the IP packet, in addition to its main payload, also carrying small option fields, intended to speak to routers along the way. In Primordial and RDF, such options co-exist as equals; the message can carry many payloads in many languages to many receivers simultaneously.

3.2 The Power of External Names

But how does defining the meaning of URI-Refs allow arbitrary data to be carried in a Primordial expression or RDF graph? This is a question of communication. Consider a robotic agent, Alice, built on a knowledge base and reasoner (as in [AIMA]). A trusted user, Sam, sends a message to Alice, informing her of an obstacle in her path. Assume the simplified "telepathic" case, where Sam's message is simply inserted into Alice's knowledge base, ignoring issues of trust, parsing, indexicals, etc.

If Alice and Sam share a vocabulary for obstacles and locations, he can simply assert something like "location(largeGreenThing, location51)" and she will understand it. If their shared vocabulary does not contain a term for large red things, things get more complicated. He can make up a term (using some convention to prevent unintended name reuse), and say "location(sam_largeGreenThing, location51)", but of course Alice will know no more than that something is there. If he additionally asserts that sam_largeGreenThing has some particular mass and volume, using terms Alice does know, she should be able to act appropriately.

One of the basic notions of the Semantic Web, however, is that Sam need not tell Alice everything about sam_largeGreenThing whenever he uses the term. Instead, he can publish the necessary descriptions on the web, and Alice can retrieve it if necessary. He may include an rdfs:isDefinedBy declaration to give the retrieval address, or she may simply use the URI part of the URI-Reference as a "home" address.

If Sam's description or definition of sam_largeGreenThing is in the same language as his message to Alice, then there is no formal difference between the techniques: Alice ends up with the same definition in her KB (and can perform the same inferences) whether it was sent by Sam or she fetched it off the web.

But what if Sam's definition is in a different language? If his message is in Primordial but his definition on the web is in Vanilla, and Alice can understand both, then his message can, in effect, say things which cannot be said in Primordial. For example, he could define sam_ChristmasThing as something green or red; when he says there is a sam_ChristmasThing at location12, he has made a disjunctive assertion, even though the language of his message (Primordial) has no disjunction connective.

If this seems far fetched, consider that rdfs:isDefinedBy is defined in English. Its meaning is available on the web, but only to those of us who can understand English. How is this different from a meaning which is available only to those who can understand Vanilla?

The point is this: when you use name in a message which is to be directly incorporated into a knowledge base, the choice of a name goes from being arbitrary (as in a typical logical formula) to being crucial; if the name exactly matches some name already present in the KB or which is used again later, then the meaning of the message is tied to the meaning of other information in the KB.

To help keep this clear we use the follow convention when using LX expressions as messages or knowledge-base fragments: all long-form (URI-Ref) LX names are called "external" and conform to certain standards of meaning as in [RDF Concepts]; each short-form name is either "internal" (and must be rewritten to be distinct from all other internal names in any KB before merging) or is mapped to a long-form name.

The indication of which short-form names are to be internal and which are to be mapped (and how they are mapped) must be carried in message meta-data.

A suggested format for such meta-data:

@internal(<short-form name>).

@internal(<short-form prefix>*).

@external(<short-form name>, <URI-Reference>).

@external(<short-form prefix>*, <URI-Reference prefix>).

From this point on, assume the declaration @external(lx_*, "http://www.w3.org/2002/08/LX/v3/ont#") applies to this paper.

It is a flaw in the design of RDF (as of this writing) that while it has internal names and external names, it does not allow internal names to be used as predicates. The best workaround is to use external names from a private space (eg UUIDs). This technique is essentially Skolemization, and like Skolemization, is unfortunately only valid in an assertional context.

Perhaps the most interesting conclusion is that the entailment of an RDF document is of little interest compared to the new entailments of an agent's knowledge base when the RDF document is added. It is only in combination that we see the difference between internal and external names, as external names become points of attachment to distributed knowledge.

3.3 Breadfruit

As an example of layering-by-defining-terms, and to further formalize Breadfruit, consider a Breadfruit ontology. This introduces URI-Ref names which can be used in Primordial to express the same thing a Breadfruit expression.

Start with lists: instead of the [...] syntax, we can use three terms lx_first, lx_rest, and lx_nil. This is obvious, since this RDF uses vocabulary here, instead of syntax.

Where RDF uses syntax, however, we can use vocabulary. Let lx_zero denote the natural number zero and lx_succ be the 1-1 predicate connecting each natural number to the one after it. Now let lx_unicode_v3_2 be the mapping from numbers, acting as Unicode "code points", to their corresponding characters in version 3.2 of the Unicode standard. Using these terms and our list terms, we can identify every finite sequence of unicode characters in Primordial.

Is such a sequence an RDF plain, untagged literal? If not, we could introduce another predicate mapping it to such a literal, but let us assume it is.

An alternative vocabulary, using more terms and axioms but fewer triples, would be to use example_x00 through example_0xff to denote bytes, and then assemble the bytes into a UTF-8 encoding of a unicode string. The axioms for pairwise distinctness of the bytes would be tedious, however.

Functions such as langString and predicates such as seq are easy to map to vocabulary terms. As an example, the Breadfruit expression:

likes(sandro, langString(chat, enUS)).

where chat and enUS denote strings, which have been tediously phrased in terms of zero, succ, unicode, first, rest and nil, turns into the Primordial expression:

likes(sandro, lit1).
lx_langString1(lit1, chat).
lx_langString2(lit1, enUS).

The details are on the website [LX].

3.4 Vanilla and Breadfruit

Now it is easy to define a trivial encoding of Vanilla expressions using Primordial:

all x likes(sandro, x)

can be encoded in Breadfruit (and thus Primordial, as above) as vs1:

exists vs1 vanillaSerialization(vs1, "all x likes(sandro, x)").

This is unsatisfying in two ways: (1) the parts of the Vanilla sentence are not available directly as properties of vs1, so to use the sentence one needs be aware of all Vanilla's syntactic details, and (2) the sentence is described, but not asserted, so the Primordial expression does not actually say anything about "sandro" -- except that it's a string of characters in some sentence. These problems are addressed, respectively, in each of the next two sections.

3.5 Pure Vanilla

To provide an abstraction of the Vanilla syntax, and thus an abstraction of first-order logic syntaxes in general, we create an ontology. And RDF graph describing a Vanilla sentence ends up looking very much like the abstract syntax tree (AST) for the sentence.

This approach was developed by McDermott et al [McDermott01] using an earlier version of RDF (without LISP-style lists) and with somewhat more lenient requirements. That approach, as with LX version 2, did not provide substitutional opacity; the names used in a sentence were essentially lost. Its distinction between constants and variables is also subject harmful interactions with RDF semantics, as is its default handling of assertions, as discussed in the next section.

The full details of this ontology are too lengthy to include here, but are on the website [LX]. It has two forms: a natural language definition of each term, with appeal to standard literature for first-order logic, and an axiomatization using Vanilla/OTTER, as discussed in section 4.1. The approach is described here:

3.6 Vanilla and Mushroom

The Vanilla ontology above allows us to identify Vanilla sentences (and formulas, terms, etc) in Primordial, but not to assert them. A true Primordial expression (using only the Vanilla ontology) can tell us that S is the disjunction of P and Q, but it cannot assert S, telling us that P or Q is true.

McDermott and Dou found this sufficient; they considered any sentence which is not used in another sentence in the RDF graph to be asserted. [private communication] (Why else would it be in the KB, after all?)

Unfortunately, this kind of assertion-by-default interacts badly with RDF semantics. "An RDF graph entails all its subgraphs" [RDF Semantics], independent of higher layers. Higher layers should not encode information in an RDF graph such that a subgraph says something which is not entailed (in the logic of the higher layer) by the full graph.

For example, an RDF graph encoding "not S" (for some sentence S) in McDermott and Dou's style would RDF-entail the subgraph which contained only "S". "S" is a "top-level" sentence in the subgraph and, by the default rule, should be considered asserted. Thus, in the combined semantics of RDF and the McDermott and Dou ontology with implicit assertions, "not S" entails "S".

The first step toward a solution is to introduce an explicit truth predicate. Most simply, this could be a class of sentences which are well-formed Vanilla, are true, and contain no external names:

# Example of a Breadfruit expression with an embedded Vanilla one
example_WellFormedTrueVanillaWithoutExternalNames(s).
lx_vanillaSerialization(s, "...Vanilla expression goes here ...").

This is a one-level truth predicate, since Vanilla itself has no such predicate. Self-reference is impossible, because the Vanilla expression itself has no external names. (Or if it does, the Breadfruit expression is simply false.)

But this is unsatisfying; we want external names. As soon as we allow them, however, we open the door to paradoxes of self-reference:

# Paradox example
example_TrueVanilla('<http://example.com/kb#s1>').
lx_vanillaSerialization('<http://example.com/kb#s1>', 
                        "-example_TrueVanilla('<http://example.com/kb#s1>')").

This is a common problem in knowledge representation, with numerous approaches suggested [Guha91, Sowa93, Attardi95, Barwise95, Connolly00].

One solution is to explicitly disqualify negated references. Primordial itself has no negation, which only enters though truth predicates and can thus be controlled. This predicate might be called True-and-containing-no-negated-references.

Even more useful is the predicate proposed by [Perlis85] and used in some versions of KIF [KIF92 p.52], which is True-when-rewritten-to-contain-no-negated-references. Or even more precisely: able-to-be-rewritten-without-negated-references-and-then-true. So the value of this predicate on the Liar Paradox is false because such a rewrite is impossible, not because the paradoxical sentence is itself false.

If we consider suitability for such rewriting to be a part of well-formedness, then it becomes reasonable to simple called this WellFormedAndTrue, the essential part of the Mushroom language.

3.7. Fast vs. Full Reasoners

There appear to be two techniques for doing FOL reasoning with a Vanilla expression embedded in a Primordial expression. These two techniques have emerged in WebOnt discussion of OWL semantics as well [OWL Semantics], and may well be fundamental to Semantic Web languages.

Fast recognition-based reasoners start by doing a pattern-match on their input to extract the embedded sentences. Then they perform traditional, direct reasoning with those sentences.

Full axiom-based reasoners use the Primordial expression itself, along with axioms from the ontologies of the embedded languages.

It seems likely that "fast" reasoners will be much faster than "full" ones, but will be unable to perform some valid inferences involving self-referential structures.

The full axiom-based reasoning approach also clarifies the semantics of primordial expressions using terms from multiple ontologies

4. Software

Although the primary results of this work are conceptual, software has been developed to test the concepts explored here, and to begin to explore real applications of the Semantic Web Logic Layer.

4.1. Axioms

Although Vanilla is hardly a conventional programming language, the process of expressing the Vanilla ontology in FOL axioms (in Vanilla itself) turned out to be one of software development, where a FOL reasoner (OTTER) was used a a language interpreter, to check the work and provide desired output.

These axioms were tested on an ad hoc basis, but were later incorporated in the the test suite for loopback testing, below.

4.2. Modules

A partial implementation is available [LX] as a Python package, including modules for parsing the various serial languages, generating serializations, "describing" (reifying) Python structures (such as LX.Formulas) to a knowledge base, "recognizing" them (reconstructing Python structures out of information in a knowledge base), and interacting with various reasoners (as below).

4.3. Loopback Testing

The first general test is a "syntactic loopback" test: each Vanilla sentence is described (encoded) and asserted in an RDF graph; then a graph match is done to find any asserted sentence, extract their description, and serialize them. The test succeeds if one sentence is formed and its serialization matches the original Vanilla sentence.

This test has been passed on versions 1 and 2 of the ontology with a set of several dozen complex FOL sentences.

Then second general test is a "semantic loopback" test. Each test sentence is placed in an RDF graph (as in the syntactic loopback test); then a FOL reasoner is asked to prove the original sentence from that RDF graph and the axioms for Vanilla and Mushroom.

This test has been passed by version 2 of the ontologies (with Flatbread-style RDF) with a set of several dozen complex FOL sentences; it fails on some sentences with more than two existential and two universal variables, suggesting the presence of a bug in the quantification axioms.

4.4. Integration with Existing Reasoners

So far integration has been attempted with three broadly different reasoning systems, each using a different style, and with somewhat different goals.

OTTER [OTTER], an automated theorem prover, has been used, in general, to prove inconsistency and particular results (via refutational hyperresolution) and consistency (by terminating during hyperresolution). (Its companions MACE and ICGNS have been used to prove consistency of subsets of the axioms.) For automated testing and tool integration, OTTER has been hidden behind a Python knowledge-base driver, which forks and controls a otter process, parsing its output. Tighter integration with otter by delving into the code and turning the essential components into a library has been considered by not attempted.

XSB [XSB], a tabled Prolog system, promises to be an effective reasoner for Horn knowledge bases, and like otter could be used in a separate process or as a library, but so far has only been using in this effort in the recognizer; the Prolog style of pattern matching, especially with DCGs, makes the reconstruction of a reified sentences almost trivial.

CWM [Cwm], a experimental Semantic Web reasoner, has been modified to optionally store its knowledge base in an LX-friendly form, and use this to perform conversion between languages and reification and partial reificiation (flattening) of N3 and RDF. Plans are underway to integrate XSB and OTTER as CWM reasoners, using LX as the interchange language.

5. Applications and Future Work

There are two styles of applying LX: logic and metalogic. The logic applications are unsurprising: access control, database views and constraints, policies and preferences, ontology mapping, service coordination and planning, taxonomy, etc. These areas are sometimes better served by logics less expressive than FOL, such as RDFS or OWL. The less-expressive languages can be easier to understand and generally come with performance guarantees which are impossible with FOL. But sometimes FOL is necessary, sometimes it is more convenient, and often the performance costs are acceptable. LX provides FOL on the Semantic Web.

LX can also be used as a metalogic, defining the characteristics of other languages in a form usable by both humans and machines. An axiomatization of OWL, for instance, can help people understand its semantics. That same axiomatization in LX allows FOL reasoners to provide OWL inference. This approach may not make sense of OWL, which is expected to be standardized and widely implemented, but other experimental logics, if axiomatized in LX, can be used by LX reasoners on the Semantic Web without any additional software deployment.

6. Conclusions

When trying to say something in an language without sufficient expressive power, we have two choices: we can move to a more-expressive language (perhaps a super-language, adding only the features we need), or we can give additional meaning to linguistic elements which are present but were undefined.

The super-language (syntactic extension) approach is perhaps simpler and is quite common in the evolution of formal languages, but it lacks forward-compatibility: systems built to handle one language cannot generally handle extended versions. By contrast, the adding-definitions approach behaves like protocol layering: each component of the system handles the elements it knows and treats the rest as opaque.

Both approaches can coexist. Primordial, a language of pure triples, can be syntactically extended to be Vanilla first-order logic, or some external (URI-Ref) terms can be defined to allow Vanilla expressions to be described and then asserted. The first approach is probably easier for humans and domain-specific applications, while the second can be better supported by a shared infrastructure. Machine translation between the two may help smooth the difference.

RDF, being very close to Primordial, with internal and external names, is thus suited to be a foundation for other languages no matter which route is taken, although it remains important to avoid confusion between the alternatives or committing to one in a way which might socially, if not technically, exclude the other.

Acknowledgments

This work has been supported by the DARPA/DAML project under MIT/AFRL cooperative agreement number F30602- 00-2-0593.

Despite the author's affiliation with the W3C, this work is not on the W3C recommendation track. It is not the product of a W3C working group or interest group and should in no way be construed as reflecting the position of the W3C or its members.

The author is grateful to Pat Hayes, Bijan Parsia, Lynn Andrea Stein, Peter Patel-Schneider and especially Tim Berners-Lee and Dan Connolly for their advice and suggestions related to this work.

References

[AIMA] Stuart Russell and Peter Norvig: Artificial Intelligence: A Modern Approach. Prentice-Hall, New Jersey, 1995.

[Attardi95] Giuseppe Attardi and Maria Simi: A formalization of viewpoints. Fundamenta Informaticae, 23(3), 1995. Also technical report TR-93-062 at the International Computer Science institute, Berkeley, California 94704-1105. ftp://ftp.di.unipi.it/pub/Papers/attardi/fundamenta.ps.gz

[Berners-Lee00] Tim Berners-Lee: Semantic Web. Presentation at XML 2000 http://www.w3c.org/2000/Talks/1206-xml2k-tbl/slide10-0.html

[Connolly00] Dan Connolly: Notes on RDF and FOPC. http://www.w3.org/2000/07/hs78/FOPC

[Cwm] Tim Berners-Lee: Cwm.Website, http://www.w3.org/2000/10/swap/doc/cwm

[DAML Axioms] Richard Fikes, Deborah McGuinness: An Axiomatic Semantics for RDF, RDF-S, and DAML+OIL. W3C Note, http://www.w3.org/TR/2001/NOTE-daml+oil-axioms-20011218

[DAML+OIL] : DAML+OIL (March 2001) Reference Description. http://www.w3.org/TR/daml+oil-reference

[Guha92] Ramanathan Guha: Contexts: a formalization and some applications. Stanford University, Stanford, CA, 1992 http://www-formal.stanford.edu/guha/guha-thesis.ps

[KIF92] Michael R. Genesereth and Richard E. Fikes: Knowledge interchange format, version 3.0, reference manual. Technical Report Logic Group Report Logic-92-1, Stanford University, 1992. http://meta2.stanford.edu/kif/Hypertext/node35.html

[KIF98] : Knowledge Interchange Format. Draft proposed American National Standard (dpANS) NCITS.T2/98-004 http://logic.stanford.edu/kif/dpans.html#Infix

[Lbase] R.V. Guha and Patrick Hayes: LBase: Semantics for Languages of the Semantic Web. Draft http://www.w3.org/2002/06/lbase/

[Liar] Jon Barwise, John Etchemendy: The Liar: An Essay on Truth and Circularity. Oxford University Press; Reprint edition (April 1995)

[LX] Sandro Hawke: LX Logic Exchange. Website http://www.w3.org/2002/08/LX

[Marchiori] Massimo Marchiori and Janne Saarela: Query + Metadata + Logic = Metalog. Proceedings of Query Languages 1998 (QL'98), 1998, W3C/MIT. http://www.w3.org/TandS/QL/QL98/pp/metalog.html and, updated: http://www.w3.org/RDF/Metalog/CIKM-050299.html

[McDermott01] Drew McDermott, Jonathan Borden, Mark Burstein, Douglas Smith, and Richard Waldinger. : A Proposal for Encoding Logic in Rdf/daml. Technical reports, Yale CS, 2001. ftp://ftp.cs.yale.edu/pub/mcdermott/papers/noworry.ps.gz

[McDermott02] Drew V. McDermott, Dejing Dou : Representing Disjunction and Quantifiers in RDF. International Semantic Web Conference 2002: 250-263 http://www.cs.yale.edu/~dvm/papers/McDermottDou02.pdf

[N3] Tim Berners-Lee: Notation 3.Website, http://www.w3.org/DesignIssues/Notation3

[N-Triples] Jan Grant, Dave Beckett, eds: RDF Test Cases, Section 3, N-TriplesW3C Working Draft http://www.w3.org/TR/rdf-testcases/#ntriples

[OTTER94] William McCune: Otter 3.0 User's Guide. Technical Report ANL-94/6, January 1994. http://www-unix.mcs.anl.gov/AR/otter/.

[OWL Semantics] : Web Ontology Language (OWL) Abstract Syntax and Semantics. http://www.w3.org/TR/2002/WD-owl-semantics-20021108/

[Patel-Schneider02] Peter F. Patel-Schneider, Dieter Fensel : Layering the Semantic Web: Problems and Directions. International Semantic Web Conference 2002: 16-29. http://www.cs.vu.nl/~dieter/ftp/paper/layering.pdf

[Perlis85] Don Perlis: Languages with Self-Reference I: Foundations. (or: We Can Have Everything in First-Other Logic!) Artificial Intelligence, 25:301-322, 1985. http://www.cs.umd.edu/projects/active/papers/85/sref-I.ps.Z

[RDF Concepts] Graham Klyne, Jeremy J. Carroll, eds: Resource Description Framework (RDF): Concepts and Abstract SyntaxW3C Working Draft http://www.w3.org/TR/2002/WD-rdf-concepts-20021118/

[RDF Semantics] Patrick Hayes, Editor: RDF Semantics.W3C Working Draft http://www.w3.org/TR/2002/WD-rdf-mt-20021112/

[Sowa93] John Sowa: FOL vs HOL vs MML. E-mail to Interlingua list, 1993, http://www-ksl.stanford.edu/email-archives/interlingua.messages/455.html

[XSB02] Konstantinos Sagonas, Terrance Swift, David S. Warren, et al: The XSB System, Version 2.5. http://xsb.sourceforge.net/