Why RDF? Schema and ontology languages for RDF and XML

by Dan Brickley, $Id: schemarama.html,v 1.1 2002/03/13 18:44:06 danbri Exp $

Status of this document

I wanted to write something on the role of Web ontology languages in a world that already has XML DTDs, a dozen XML schema languages, RDF and RDF Schema. This is as far as I got, so I thought I'd put it online and maybe finish it later if anyone liked it. Ended up being mostly a broad brush account of RDF for an XML-ish audience. I wrote it as a single huge paragraph in about half an hour, and haven't polished it or tweaked it much since. -- Dan

Introduction

W3C's Resource Description Framework (RDF) provides a simple graph-based information model for Web applications that need to exchange data in a flexible yet predictable way. Since RDF data is typically encoded and exchanged as XML documents, RDF in effect provides a set of restrictions or design conventions for well-formed XML documents. XML provides us with the notions of both well-formed and valid documents. A valid document conforms to some specific DTD (or schema), whereas a merely 'well-formed' document is constrained only to have the basic element, attribute and content structure shared by all XML documents.
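The well-formed/valid distinction is easy to see with Python's standard library, whose `xml.etree.ElementTree` parser checks well-formedness only and never validates against a DTD. This is a minimal sketch; `is_well_formed` is a hypothetical helper name, and checking validity would need a validating parser, such as the DTD support in the third-party lxml package.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML.

    ElementTree only checks well-formedness; it does not check the
    document against any DTD or schema, so 'True' says nothing about validity.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Well-formed, even though no DTD says what <note> may contain.
print(is_well_formed("<note><to>Dan</to></note>"))  # True
# Not well-formed: the <to> element is never closed.
print(is_well_formed("<note><to>Dan</note>"))  # False
```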

RDF can be thought of as an attempt to find a middle ground between the strict notion of 'conformance to a named schema' and the much weaker 'unconstrained tag soup'. RDF is particularly suited to deploying XML on the World Wide Web [@ref], since the distributed nature of the Web often requires us to mix information from multiple sources and applications within a single document. While the notion of well-formedness and the XML Namespaces mechanism provide the basic infrastructure for mixed-namespace XML markup, RDF supplies a much-needed set of constraints and conventions for using them. RDF can be thought of as offering 'design patterns' for creators of mixed-namespace documents. By writing mixed-namespace XML in the style specified by the RDF Syntax recommendation, we reduce some of the unpredictability and variation associated with unconstrained use of well-formed XML. A well-formed XML document written in conformance with the RDF syntax is understood to encode an edge-labeled directed graph, whose nodes and edges may be labeled with Web resource identifiers (URIs).
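To make the 'edge-labeled directed graph' concrete, here is a minimal Python sketch that pulls triples out of a deliberately tiny subset of RDF/XML: `rdf:Description` elements with `rdf:about`, whose child elements carry either `rdf:resource` or literal text. It is not a conformant RDF/XML parser (no `rdf:ID`, blank nodes, typed node elements, abbreviations or containers). The `http://example.org/` URIs and the `ex:title`/`ex:author` properties are hypothetical, and objects are plain strings, so URIs and literals are not distinguished.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Mixed-namespace RDF/XML: the RDF namespace plus a hypothetical vocabulary.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/book/1">
    <ex:title>Example Title</ex:title>
    <ex:author rdf:resource="http://example.org/person/1"/>
  </rdf:Description>
</rdf:RDF>"""

def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' form into one URI string."""
    ns, local = tag[1:].split("}")
    return ns + local

def toy_parse(text: str) -> set:
    """Extract (subject, predicate, object) triples from the tiny subset above."""
    triples = set()
    root = ET.fromstring(text)
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:  # each child element is one labeled edge
            obj = prop.get(f"{{{RDF}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()  # literal value
            triples.add((subject, expand(prop.tag), obj))
    return triples

for triple in sorted(toy_parse(DOC)):
    print(triple)
```

Each printed tuple is one edge: the element name becomes the edge label, and `rdf:about` and `rdf:resource` supply the URIs of the nodes at either end.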

One consequence of this is that RDF/XML falls in the camp of 'data-oriented' rather than 'document-oriented' XML markup. This greatly simplifies storage, indexing and query requirements, since the RDF/XML spec spells out the few cases in which ordering is significant. In general, the node/edge/node triples that constitute an RDF graph can be considered without preserving information about the order in which they were encoded in some actual RDF/XML document. RDF parsers therefore discard much of the surface detail of an RDF/XML document, such as element order and the choice between equivalent abbreviated syntactic forms, when converting it into the abstract data graph. As a graph structure, RDF has great utility even without knowledge of the specific vocabularies used. While these vocabularies (as documented in RDF schemas associated with namespaces) can provide useful meta-information to support more sophisticated applications, RDF was designed to be useful in the absence of schema information. This characteristic of RDF can be thought of as a restriction of XML's notion of well-formedness, and as embodying the notion of 'semi-structured' or schema-less data.
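A minimal sketch of both points, assuming a graph is modeled as a Python set of `(subject, predicate, object)` string tuples (the `example.org` URIs are hypothetical): two encodings that list the same statements in different orders yield equal graphs, and a wildcard pattern match works without consulting any schema.

```python
EX = "http://example.org/"  # hypothetical namespace for this example

# The same two statements, "encoded" in two different orders.
doc_a = [
    (EX + "book/1", EX + "terms#title", "Example Title"),
    (EX + "book/1", EX + "terms#author", EX + "person/1"),
]
doc_b = list(reversed(doc_a))

# Treating the graph as a set of triples discards encoding order.
graph_a, graph_b = set(doc_a), set(doc_b)
print(graph_a == graph_b)  # True

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard.

    No schema is consulted: the same function works for any vocabulary.
    """
    return {
        (ts, tp, to) for (ts, tp, to) in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    }

print(match(graph_a, p=EX + "terms#author"))
```

Using a set also collapses duplicate triples, which matches the abstract model of a graph as a set of statements.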

If this is the case, i.e. if we can use RDF query languages, storage systems and APIs without need for RDF schemas, what use is RDF Schema? And what use might we make of more sophisticated extensions to RDF Schema such as DAML+OIL or W3C's new Web Ontology Language? RDF Schema provides a rather minimalist basis for RDF vocabulary description. RDF Schema documents are written in RDF/XML, and provide an RDF description of the components of an RDF vocabulary. The RDF information model consists of little more than the notion of edge-labeled graphs whose nodes and edges may be identified through RFC 2396 URI references.
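Because a schema is itself RDF, its triples can sit in the same graph as the data it describes and be queried with the same code. The sketch below assumes the schema has already been parsed into Python tuples; `rdfs:Class`, `rdf:Property`, `rdf:type` and `rdfs:comment` are real RDF/RDFS terms, while the `example.org` vocabulary and the `subjects_of_type` helper are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/terms#"  # hypothetical vocabulary namespace

# Vocabulary description: ordinary triples that happen to use RDF/RDFS terms.
schema = {
    (EX + "Book", RDF + "type", RDFS + "Class"),
    (EX + "Person", RDF + "type", RDFS + "Class"),
    (EX + "author", RDF + "type", RDF + "Property"),
    (EX + "author", RDFS + "comment", "Relates a work to its creator."),
}

# Instance data that uses the vocabulary.
data = {
    ("http://example.org/book/1", EX + "author", "http://example.org/person/1"),
}

graph = schema | data  # schema and data merge into a single graph

def subjects_of_type(graph, cls):
    """Everything the graph states is an instance of `cls`."""
    return {s for (s, p, o) in graph if p == RDF + "type" and o == cls}

print(sorted(subjects_of_type(graph, RDFS + "Class")))
print(sorted(subjects_of_type(graph, RDF + "Property")))
```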

RDF additionally makes the stronger claim that these graph data structures are not arbitrary computational constructs, but encodings of claims about the world. An RDF document is the kind of thing that (in some context) can be true or false. As a consequence, when working with RDF we often use two parallel sets of terminology. Considered as a graph data structure, we talk of 'nodes and arcs' (or edges); considered as a representational formalism that makes claims about the world, we talk of 'classes' of 'resource' and their 'properties'. When we hear that RDF is supposed to be 'semantic' or 'meaningful', the claim relates to this second set of terminology, and to the goal that RDF documents should be considered as 'saying things about the world'.

RDF is grounded in the world primarily in two ways: by the use of URI references to associate parts of the RDF graph with the things they represent or denote, and by the formal semantics of the vocabularies (RDF schemas) used in some particular graph. RDF Schema, as mentioned earlier, is based around the practice of describing RDF vocabularies using RDF. It adopts the perspective that the core concepts of RDF description ('class', 'property' etc.) are just more things that we might describe in RDF. Specifically, it allows a vocabulary designer to create RDF/XML descriptions of the classes and properties they use to model some particular target domain (e.g. people, documents, calendars, workflow...).
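One way a vocabulary designer models a domain is with class hierarchies via `rdfs:subClassOf`, which RDF Schema treats as transitive, with instances of a subclass also being instances of its superclasses. The sketch below computes just those two consequences by repeating until no new triples appear; it covers nothing else in RDF Schema, and the `Novel`/`Book`/`Work` classes and `rdfs_subclass_closure` helper are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/terms#"  # hypothetical vocabulary namespace
BOOK1 = "http://example.org/book/1"

graph = {
    (EX + "Novel", SUBCLASS, EX + "Book"),
    (EX + "Book", SUBCLASS, EX + "Work"),
    (BOOK1, RDF_TYPE, EX + "Novel"),
}

def rdfs_subclass_closure(graph):
    """Add triples implied by rdfs:subClassOf until a fixed point is reached.

    Only two consequences are handled: transitivity of subClassOf, and
    inheritance of rdf:type from a subclass to its superclass.
    """
    g = set(graph)
    while True:
        new = set()
        for (a, p1, b) in g:
            if p1 != SUBCLASS:
                continue
            for (c, p2, d) in g:
                # a subClassOf b, b subClassOf d  =>  a subClassOf d
                if p2 == SUBCLASS and c == b:
                    new.add((a, SUBCLASS, d))
                # c type a, a subClassOf b  =>  c type b
                if p2 == RDF_TYPE and d == a:
                    new.add((c, RDF_TYPE, b))
        if new <= g:
            return g
        g |= new

inferred = rdfs_subclass_closure(graph)
types = sorted(o for (s, p, o) in inferred if s == BOOK1 and p == RDF_TYPE)
print(types)
```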

This is where RDF shows some added value in comparison to traditional DTD-based approaches in the XML world. An XML schema or DTD will say exactly what can or can't be written within a certain class of XML document. RDF Schema, by contrast, focuses on the types of things such a document might describe, and on the types of properties (and relationships) such a description might want to call upon. RDF Schema 1.0 provides a way of defining classes and relationship types (properties). It also lets us say that some property "makes sense" in the context of specific named classes: for example, that 'author' is a relationship between books (or 'works') and people (or perhaps 'agents'). RDF Schema says: "there are resources and relationships; we name categories of resource, and categories of relationship, so we can say more about them."
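The 'author' example maps onto `rdfs:domain` and `rdfs:range`. The RDF Schema 1.0 drafts presented these as constraint properties; the sketch below instead follows the later RDF Semantics reading (entailment rules rdfs2 and rdfs3), where they license new `rdf:type` statements about the subject and object of each use of the property. It is a single-pass sketch, with hypothetical `example.org` classes and an `apply_domain_range` helper that is not any library's API.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/terms#"  # hypothetical vocabulary namespace

graph = {
    # Schema: 'author' relates books to people.
    (EX + "author", RDFS + "domain", EX + "Book"),
    (EX + "author", RDFS + "range", EX + "Person"),
    # Data: one authorship statement, with no explicit types.
    ("http://example.org/book/1", EX + "author", "http://example.org/person/1"),
}

def apply_domain_range(graph):
    """Infer rdf:type triples from rdfs:domain and rdfs:range declarations.

    A single pass, which is enough for this small example.
    """
    domains = {(p, c) for (p, q, c) in graph if q == RDFS + "domain"}
    ranges = {(p, c) for (p, q, c) in graph if q == RDFS + "range"}
    inferred = set()
    for (s, p, o) in graph:
        for (prop, cls) in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))  # subject belongs to the domain class
        for (prop, cls) in ranges:
            if p == prop:
                inferred.add((o, RDF_TYPE, cls))  # object belongs to the range class
    return graph | inferred

for triple in sorted(apply_domain_range(graph) - graph):
    print(triple)
```

Under the constraint reading, the same declarations would instead be used to flag an 'author' statement whose subject was not already known to be a Book.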

What more might we want to say about resource types and relationship types in a Web environment?

This is where Web ontology languages come in...

To be continued...?