Tim Berners-Lee
Date: @@@@, last change: $Date$
Status: personal view only. Editing status: first draft.

Semantic Web Layering

Introduction

The Semantic Web is an abstract space of data expressed using interoperable standards and Internet protocols, so as to allow machine processing. It is an application of the World Wide Web. The WWW is an abstract space of information made by the common use of the URI specification, and by the specifications of Internet protocols and formats including HTTP, HTML, XML and other standards. The Semantic Web (SW), like the WWW used for human-readable information (HW), is a web of documents, but unlike the HW those documents contain machine-processable data rather than human-centric multimedia information resources. The SW, like the HW, contains references between documents, but unlike the HW, those references are abstract references to concepts shared between documents. The SW and HW are alike in that in many cases they allow information about something to be retrieved by looking up its identifier using standard protocols, and they are alike in that they rely on the specifications of those protocols being shared by all using the web.

The initial roadmap [RM] called for several layers, starting with a data layer which could be employed without knowledge of higher layers. The higher layers would extend the Semantic Web language to express ontological relationships and processing rules. It also suggested that an overarching logical framework, allowing interchange between different rule systems, would be useful.

The initial RDF specification provides a very simple triple-based structure for information, which allows list-oriented or table-oriented data to be expressed in a common structure. (The RDF specification also provides other things such as "Bags" which we will ignore here).
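
As a rough illustration of how table-oriented data can be recast in a common triple structure, the following Python sketch turns each cell of a small table into one (subject, predicate, object) triple. The example.org URIs, the column names and the helper row_to_triples are assumptions invented for this illustration and are not part of the RDF specification. Cell values are kept as plain strings for brevity.

```python
# Hypothetical base URIs, invented for this sketch only.
ROW_BASE = "http://example.org/cars/"
COL_BASE = "http://example.org/terms#"


def row_to_triples(key, row):
    """Turn one table row into a set of (subject, predicate, object) triples.

    The row's key becomes the subject URI and each column name becomes
    a predicate URI. Cell values are kept as plain strings.
    """
    subject = ROW_BASE + key
    return {(subject, COL_BASE + column, value) for column, value in row.items()}


# A two-row, two-column table keyed by row identifier.
table = {
    "mycar": {"color": "blue", "wheels": "4"},
    "yourcar": {"color": "red", "wheels": "3"},
}

# The whole table is simply the union of the triples of its rows.
triples = set()
for key, row in table.items():
    triples |= row_to_triples(key, row)
```

Each cell of the table ends up as one independent triple, so data from differently shaped tables can be merged by a plain set union.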

An essential aspect of RDF is that every part of a triple is defined by a URI, an identifier in the WWW system. This makes it a web language: it uses not only globally unique identifiers, but the specific identifiers which have properties already defined by other W3C specifications. One of the features provided by URIs is essential both to the WWW and to the Semantic Web: the ability to create a dereferenceable URI, one which will allow any system which comes across it, given network and server availability, to retrieve a representation of the thing identified. Therefore RDF works as part of the Web suite of protocols in parallel with the HTTP protocol: RDF may be used in a document to express some properties of an object, while HTTP explains how one might find such an RDF document given the identifier. This is only one example of the value of the two specifications sharing the URI space, and it may be considered trivial by network engineers. However, the concept has been difficult to accept for some logicians not accustomed to formalizing their own systems as part of such open-ended weblike systems.

RDF Philosophy

While the RDF 1.0 specification is not very explicit about the philosophy, the author understands that, when an RDF document is a communication between two parties,

the import of the communication is the independent statement (logical conjunction) of the import of each triple, and
the import of each triple is defined by the specification of the predicate part of the triple.

The values of the other parts, subject and object, are parameters to the definition of the predicate. For example, if the triple, in subject, predicate, object order, is <mycar> <color> <blue> where each is a URI, then the specification of <color> defines the meaning. It may, for example, say that the meaning is that the subject has the color identified by the object.

Many people assume this basic philosophy, I feel, without realizing it. It is accepted that you can remove a triple from an RDF document and leave a valid RDF document. This follows from the semantics of the whole being the conjunction of the semantics of the parts. Similarly, they would consider it impossible to define a new thing "loud" such that anything which had this <color> made a loud noise. This would be the spec of the object overriding the specification of the predicate. The predicate implies that the object must be a color, and "loud" is not a color. The triple is contradictory: wrong. This clarity in RDF may be unexpected, as for example it does not exist in XML, where two new attributes in new namespaces can be defined to fight each other, their specs defining contradictory things, and the language providing no guidance for resolving the conflict between the two specifications.
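
The reading described above, in which a document is the conjunction of its triples and each triple is interpreted by its predicate, can be sketched in Python as follows. This is only an illustration under stated assumptions: the PREDICATE_SPECS table, the short names used in place of URIs, and the function import_of are all hypothetical.

```python
# Hypothetical predicate "specifications": each maps (subject, object)
# to a statement. Short names stand in for URIs in this sketch.
PREDICATE_SPECS = {
    "color": lambda s, o: f"{s} has the color identified by {o}",
    "owner": lambda s, o: f"{s} is owned by {o}",
}


def import_of(document):
    """Return the set of statements a document makes.

    The meaning of the whole is the conjunction of the meanings of the
    triples, modelled here as a set of independent statements. The
    predicate alone selects the rule; subject and object are parameters.
    """
    return {PREDICATE_SPECS[p](s, o) for (s, p, o) in document}


doc = {("mycar", "color", "blue"), ("mycar", "owner", "alice")}

# Removing a triple leaves a document that simply says less.
smaller = doc - {("mycar", "owner", "alice")}
```

Because the statements are independent, everything the smaller document says is also said by the original one.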

The way that specifications are written on the Internet is the stuff of normal engineering practice, and one would not expect it to need elaboration. The way documents on the Web can point to other documents by their identifiers, similarly, is naturally understood by many web engineers. Similarly, engineers have for years been used to specifications in which English is used to define terms, and math to express some of the constraints but not all of the meaning. However, the fact is that some of those from the knowledge representation community have found it unacceptable to work in a world with any outside connections. They have found it impossible to live in a world in which any more than the very basic axioms are defined in English (and even that is often glossed over). Some have found the whole architecture of the web, in its functioning by the specification of new terms in successive specifications, to be "nonsense". Therefore, it may be necessary to formalize this aspect of the web before there can be more general understanding.

The need for layering

RDF Schema (RDFS) defines a few terms on top of RDF for very common and useful concepts. That is, some URIs are given whose meaning is defined in the RDFS specification. One term is (using conventional shorthand) rdf:type. This acts as a gateway between binary relations (Properties, which are used in the predicate position) and unary predicates (Classes, which are used in the object position with rdf:type in the predicate position). RDFS provides a simple notion of class, including a subclass relation, and it introduces subproperty, range and domain properties. Together these provide a rather minimal system allowing classification of things and inheritance of properties.

It is minimalist, in the sense that there is no constraint that a class not be a member of a class, or that a class not also be a property. These protections are made in many practical systems; RDFS does not impose them itself, but the door is left open for more constrained practical systems to be developed.

The RDF Schema language as it stands does provide useful facilities, but it lacks many features which typical systems need to work in practice. A simple example is the ability to declare a property as being "unambiguous", which would allow a system to conclude that two concepts described in different parts of the web are in fact the same. This is one of the most fundamental operations in a semantic web in which anyone can write anything about anything, and it is often only later that one realizes that the whole @@. The most fundamental piece of metadata stored with a database is that a column is a primary key - an unambiguous property whose values can therefore be used to identify rows. However, the idea of throwing out RDF and inventing a new incompatible system for these new features would have gone against the whole idea of interoperability and a single semantic web.
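
To make the role of an "unambiguous" property concrete, here is a minimal Python sketch of the inference it licenses: two subjects that share a value for such a property are concluded to be the same thing. The function find_same, the short identifiers and the sample data are hypothetical and chosen only for illustration.

```python
def find_same(triples, unambiguous):
    """Return pairs of subjects that must denote the same thing.

    Two distinct subjects sharing the same value for an unambiguous
    property are concluded to be identical, much as two rows with the
    same primary key would be the same row.
    """
    seen = {}     # (predicate, object) -> first subject seen with it
    same = set()  # pairs (first subject, later subject)
    for s, p, o in sorted(triples):
        if p not in unambiguous:
            continue
        first = seen.setdefault((p, o), s)
        if first != s:
            same.add((first, s))
    return same


triples = {
    ("doc1#tim", "mailbox", "mailto:t@example.org"),
    ("doc2#author", "mailbox", "mailto:t@example.org"),
    ("doc1#tim", "color", "blue"),
    ("doc3#car", "color", "blue"),
}

# Only "mailbox" is declared unambiguous; sharing a color proves nothing.
pairs = find_same(triples, unambiguous={"mailbox"})
```

Here the two descriptions with the same mailbox are identified, while the two blue things are left distinct because "color" was not declared unambiguous.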

The assumption was that the ontology community, which had used, with description logics, a rather larger vocabulary for describing relations, probably had the experience to provide appropriate features. The Ontology work in W3C was started from a DARPA-funded group developing what DARPA called DAML[], but which was sufficiently close to OIL that it was merged to become DAML+OIL. The language was consciously taken as a starting point by the W3C WebOnt activity.

The Peter Patel-Schneider Problem

Unfortunately, the layering between the DAML+OIL language, under certain assumptions, and RDF, didn't work. A paper [] details the fact that a combination of the RDF and DAML axioms, under certain assumptions, causes a contradiction to be derivable from those axioms. The existence of this paradox, concludes the paper, shows that the layering concept was flawed and the semantic web roadmap was flawed: that the ontology language should be developed independently of RDF. The paper concluded with a picture of the Tower of Babel, perhaps in advocacy for a multitude of non-interoperable languages, rather than a consistent semantic web. This is generally known as the Peter Patel-Schneider Problem.

So at the time of writing, this problem threatens development of many layers of the semantic web. The RDF Core working group has been forced to look at the problem, and the idea of "Dark Triples" has been raised. Though this author was unable to find an original definition of dark triples, from the various opinions expressed [mailinglist] it appears that this is some attempt to make some RDF data simply not apply to OWL, to not exist for those using the OWL language. A working draft Abstract Syntax for OWL concludes that OWL can be expressed in RDF but that there must be "no other RDF triples". This seems to contravene the spirit of

The Peter Patel-Schneider paradox is reproduced here:

@@@

The assumption had been that (a) new

The PPS problem

The reason for the paradox

Possible solution directions

On paradoxes in general: you can't have both self-reference and negation. Paradoxes come in many forms, but strong analogies exist between, for example, the liar paradox ("This sentence is false") and the Russell Paradox (Is the class of all classes which are not members of themselves a member of itself or not?). The paradox arises from the attempts to meet the needs of RDF(S), which includes the rather self-referential characteristics that classes can contain classes, and properties can have properties, and so on, together with the needs of ontology systems which have negation in various forms, including cardinality constraints and disjoint classes.

To meet the latter at the expense of the former is described in [Abstract Syntax]. OWL becomes a separate language, with some surface similarity to RDF but no interchange of information between RDFS and OWL. Alas, there is no axiomatic semantics given in the abstract syntax document, but as the language is close to DAML+OIL, one would assume that the various incomplete DAML+OIL semantics could be used.

To preserve the former at the expense of the latter, one would remove from the language those (many) features which allow forms of negation:

cardinality
disjoint classes, but also
the arbitrary distinction between individual properties and datatype properties

but one would maybe be able to keep most of:

unambiguous (inverse functional) and unique (functional) properties
transitivity, inverse
restrictions (not using the things thrown out)
oneOf and lists

and put back in for example

daml:equivalentTo
rdf:Property

It is also possible to keep the concepts, but remove the axioms which allow the generation of them. In other words, the complement of a class could be kept in terms of its axiomatic semantics, without keeping the assumption that every class has a complement.

Is there any reason to prevent anyone from continuing to use rdf:Property?

Worries with abstract syntax

Division between datatype and individual properties. This points to a whole difference in assumptions about what things mean. I had assumed I could write :weight = "5". Use URIs anywhere that variables are. myconf: conf:start <http://conf.com/start#time>.
Use of rdfs:domain with datatypes. A datatype is a relationship between a representational string and an abstract value. This datatype model in the AS doesn't work. For example, to say p has range decimal allows "10" as a value, which would be allowed by binary too. What you want to say is how the string should be interpreted.
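
The point about datatypes can be illustrated with a small Python sketch in which a datatype is modelled as a function from a representational string to an abstract value. The DATATYPES table, its two entries and the function value_of are assumptions made for this illustration; they are not taken from the abstract syntax document.

```python
from decimal import Decimal

# Hypothetical datatype table: each datatype is a function from a
# representational string to an abstract value.
DATATYPES = {
    "decimal": lambda text: Decimal(text),
    "binary": lambda text: int(text, 2),
}


def value_of(text, datatype):
    """Interpret a representational string under a named datatype."""
    return DATATYPES[datatype](text)


# The same string is acceptable to both datatypes ...
as_decimal = value_of("10", "decimal")
as_binary = value_of("10", "binary")
# ... but it denotes a different abstract value under each of them.
```

Knowing only that "10" is an allowed string does not determine the value; what matters is which interpretation is applied to it.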

{?c a owl:Class} => {[owl:complement ?c] a owl:Class}.

{?c1 owl:complement ?c2} => { ?c2 owl:complement ?c1}.

{?x owl:type[owl:complement

Conclusion

The OWL system creates a subset of the semantic web in which restrictions are made, but within which one can do certain operations such as determining consistency of a dataset in polynomial time. The concepts used by OWL are in many cases subclasses and subProperties of general concepts which are well defined and useful in the wider web. In these cases, OWL should use the

In 2002/07 this was the case with rdf:type, but not with UnambiguousProperty, UniqueProperty, TransitiveProperty or equivalentTo. These should be restored.

References

Patel-Schneider, Peter, et al., (@@The PPS paradox paper), unfortunately a paper publication not available online.

@@ pointers from Jos

message re the above

Abstract syntax for OWL paper


Tim BL