The RDF Model, Rearticulated

Status

Draft, awaiting comments. (What issues are not addressed? What might not work?)

Purpose

The RDF model is a simple model of communication where entities express knowledge as three-part sentences using symbolic terms. It is characterized by its minimalist design, which is intended to simplify working with meta-information.

This document specifies a standard method of communicating which serves as a foundation for various vocabulary and syntax standards. Only issues necessary for interoperation of these higher layers are intended to be addressed here.

The Model

Identifiers

We use the RDF model to communicate about objects in some open universe of discourse. The objects are identified by URI-References, which are character strings with syntax and semantics defined in RFC 2396. The semantics (the mapping from strings to objects) are delegated by that standard to other standards and organizations.
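
To make the identifier syntax concrete, here is a minimal Python sketch (the function name split_uri_reference is hypothetical) that separates a URI-Reference into its URI and its optional fragment, treating the first "#" as the delimiter as RFC 2396 does. Note that nothing in these string operations says which object is identified; that mapping is delegated elsewhere.

```python
from typing import Optional, Tuple


def split_uri_reference(ref: str) -> Tuple[str, Optional[str]]:
    """Split a URI-Reference into (uri, fragment).

    The first "#" delimits the fragment. The fragment is None when no "#"
    is present, and "" when the "#" is followed by nothing.
    """
    uri, sep, fragment = ref.partition("#")
    return uri, (fragment if sep else None)


print(split_uri_reference("http://example.org/doc#intro"))  # ('http://example.org/doc', 'intro')
print(split_uri_reference("http://example.org/doc"))        # ('http://example.org/doc', None)
```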

Identifiers which are mapped to objects understood by all communicating parties are "terms" in a "vocabulary." It is often useful to publish some formal assertions (a schema or ontology) about the objects identified by the terms in a vocabulary.

Sentences

An RDF sentence is a collection of three identifiers fulfilling distinct roles. The roles have different names for different areas of application. A sentence may be considered a sequence, but the ordering of the roles is also dependent on the area of application. Particular syntaxes may specify terminology and ordering.

Ordering
  English Language      first          second                           third
  Formal Logic          second         first                            third
Terminology
  Natural Language      subject        predicate                        object
  Logic                 first term     binary relation                  second term
  Directed Graph        from node      label                            to node
  Programming Systems   object         property, attribute, key, index  value
  Functional            argument       function                         value
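
The table amounts to saying that the three roles stay fixed while their names and ordering change. A small sketch, assuming a hypothetical Sentence type and placeholder identifiers such as ex:alice, renders one sentence in the English-language and formal-logic orderings:

```python
from typing import NamedTuple


class Sentence(NamedTuple):
    """An RDF sentence: three identifiers in three distinct roles."""
    subject: str
    predicate: str
    object: str


def english_order(s: Sentence) -> str:
    # English-language ordering: subject, predicate, object.
    return f"{s.subject} {s.predicate} {s.object}"


def logic_order(s: Sentence) -> str:
    # Formal-logic ordering: the binary relation first, applied to
    # the first term and the second term.
    return f"{s.predicate}({s.subject}, {s.object})"


s = Sentence("ex:alice", "ex:knows", "ex:bob")
print(english_order(s))  # ex:alice ex:knows ex:bob
print(logic_order(s))    # ex:knows(ex:alice, ex:bob)
```

Both functions read the same three fields; only the presentation differs, which is the sense in which particular syntaxes "specify terminology and ordering."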

Semantics

An RDF sentence is said to be "true" if the object identified as its predicate is a relationship which exists between the objects identified as its subject and object, respectively. The meaning of a true sentence therefore depends entirely on the mapping of its identifiers to objects and on the meaning of the existence of the given relationship.
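
The truth condition can be sketched by modelling the mapping from identifiers to objects as a dictionary, and each relationship as the set of (first, second) pairs between which it exists. Both structures, and all identifiers used, are assumptions for illustration; real mappings are delegated as described under Identifiers.

```python
from collections.abc import Hashable
from typing import Dict, NamedTuple, Set, Tuple


class Sentence(NamedTuple):
    subject: str
    predicate: str
    object: str


def is_true(
    s: Sentence,
    denotes: Dict[str, Hashable],
    holds: Dict[Hashable, Set[Tuple[Hashable, Hashable]]],
) -> bool:
    """True iff the relationship identified by the predicate exists between
    the objects identified by the subject and object, in that order.

    An identifier missing from `denotes` has no known meaning here, so the
    resulting KeyError is deliberately left to propagate.
    """
    first = denotes[s.subject]
    relation = denotes[s.predicate]
    second = denotes[s.object]
    return (first, second) in holds.get(relation, set())


# Hypothetical mapping from identifiers to objects (here just Python strings).
denotes = {
    "ex:alice": "Alice",
    "ex:bob": "Bob",
    "ex:knows": "knows",
    "http://example.org/people#alice": "Alice",  # a second identifier for Alice
}
holds = {"knows": {("Alice", "Bob")}}

print(is_true(Sentence("ex:alice", "ex:knows", "ex:bob"), denotes, holds))  # True
print(is_true(Sentence("ex:bob", "ex:knows", "ex:alice"), denotes, holds))  # False
print(is_true(Sentence("http://example.org/people#alice", "ex:knows", "ex:bob"), denotes, holds))  # True
```

The second check is false because the roles are ordered. The third is true because its identifiers map to the same objects as the first, even though the strings differ.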

Operation

The fundamental operation in the RDF Model is the speech act of transmitting an RDF sentence, which asserts its truth for the sender at the moment transmission begins.

Syntaxes

Various RDF syntaxes are possible. Each syntax defines a mapping between a sequence of symbols in some alphabet (such as bits, bytes, or Unicode characters) and an unordered set of RDF sentences.
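
As an illustration of a syntax as a mapping from a symbol sequence to an unordered set, the following sketch parses a hypothetical line-based syntax (one sentence per line, three whitespace-separated URI-References). Two documents that differ in line order and repetition map to the same set, so the mapping is many-to-one.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]


def parse(text: str) -> FrozenSet[Triple]:
    """Map a document in a hypothetical line-based syntax to a set of sentences."""
    sentences = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines carry no sentences
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 identifiers, got {len(fields)}")
        subject, predicate, obj = fields
        sentences.add((subject, predicate, obj))
    return frozenset(sentences)


doc_a = "ex:alice ex:knows ex:bob\nex:bob ex:knows ex:carol\n"
doc_b = "ex:bob ex:knows ex:carol\n\nex:alice ex:knows ex:bob\nex:bob ex:knows ex:carol\n"

print(parse(doc_a) == parse(doc_b))  # True
print(len(parse(doc_b)))             # 2
```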

Guidance for Using the Model

Literals

Like all other objects, character strings must be identified by URI-References when used with RDF. The URI specifications define a data: URI scheme (RFC 2397) which can serve this purpose, although it need not be the only part of URI-Reference space which maps to character strings.
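
A minimal sketch of identifying a character string with a data: URI, assuming the text/plain media type with a UTF-8 charset and percent-encoding; the function names are hypothetical. Many different data: URIs can denote the same string, so this sketch always produces one particular form and only decodes that form.

```python
from urllib.parse import quote, unquote

# Assumption: the single canonical form this sketch produces and accepts.
PREFIX = "data:text/plain;charset=utf-8,"


def string_to_data_uri(s: str) -> str:
    """Identify a character string by a data: URI, percent-encoding its UTF-8 bytes."""
    return PREFIX + quote(s, safe="")


def data_uri_to_string(uri: str) -> str:
    """Recover the string from a URI produced by string_to_data_uri.
    Other media types and the base64 form are not handled."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a UTF-8 text/plain data: URI in the expected form")
    return unquote(uri[len(PREFIX):], encoding="utf-8")


u = string_to_data_uri("Hello, world")
print(u)                      # data:text/plain;charset=utf-8,Hello%2C%20world
print(data_uri_to_string(u))  # Hello, world
```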

It is also reasonable to talk about character strings without mapping them to identifiers. In fact, doing so can make it possible to talk about the parts of a string without decomposing the identifiers, a process which can complicate formal semantics.

An RDF syntax may of course provide convenient ways to identify commonly used objects, such as numbers, but how such shorthand maps into RDF sentences depends on some vocabulary.

Creating Identifiers

Some parts of the space of possible URI-References currently have no defined mapping to objects in any universe of discourse. For instance, http:foo#tag identifiers have no defined mapping when http:foo identifies an object of media type "text/plain" (or many other media types). It seems logical that whoever publishes http:foo should have the authority to define the mapping for such identifiers, but conceivably the IETF could dictate otherwise at some point in the future.
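
The dependency on media type can be sketched as a lookup. Both inputs here are hypothetical: the table of media types treated as having defined fragment semantics (only text/html in this sketch), and the record of what media type was published at each URI. The function returns None when nothing is known about the published object.

```python
from typing import Dict, Optional

# Hypothetical, illustrative table; not an authoritative registry.
FRAGMENT_SEMANTICS_DEFINED = {"text/html"}


def fragment_has_defined_mapping(ref: str, published: Dict[str, str]) -> Optional[bool]:
    """Decide whether `ref` (a URI-Reference with a fragment) has a defined mapping.

    `published` maps each URI (the part before "#") to the media type of the
    object it identifies. Returns None when that media type is unknown.
    """
    base, sep, _fragment = ref.partition("#")
    if not sep:
        raise ValueError(f"no fragment identifier in {ref!r}")
    media_type = published.get(base)
    if media_type is None:
        return None
    return media_type in FRAGMENT_SEMANTICS_DEFINED


published = {"http:foo": "text/plain", "http://example.org/page": "text/html"}
print(fragment_has_defined_mapping("http:foo#tag", published))                  # False
print(fragment_has_defined_mapping("http://example.org/page#tag", published))   # True
print(fragment_has_defined_mapping("http://example.org/other#tag", published))  # None
```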

Some other parts of the space of possible strings have been defined for delegated identifier creation that is not based on web publishing, such as the urn:oid:, tann:, and tag: schemes. None of these are Internet standards (or on track to become so, as of this writing).

Maybe this document should simply override the URI definition on some of these issues? It could do so, at the risk of conflicting semantically with possible future IETF work.

Anonymous Objects

Some RDF syntaxes provide a way to indicate which sentences are true without syntactically requiring URI-References for every field. The objects left without identifiers are sometimes called "anonymous resources" and are conventionally handled by having the parser simply make up new identifiers. There are two drawbacks to this technique:

  1. The syntax->table->syntax round trip can make the expression considerably harder to read, since the made-up identifiers appear where the original had none. This problem should be addressed by having syntaxes map essentially 1-1 to sets of sentences, if necessary using some vocabulary for expressing presentation issues separate from fundamental content.

  2. You can't parse the syntax the same way when using it as a query (where anonymous nodes are logically taken as existential variables). This can be addressed by giving the parser two modes (declaration and query), or by using some of the presentation information from (1) in constructing a query (in some query vocabulary).
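
The two parser modes can be sketched over a list of sentences in which anonymous nodes are marked with a "_:" prefix; that marker, and the made-up identifier prefix http://example.org/genid#, are assumptions of this sketch. In declaration mode each anonymous node gets a freshly made-up identifier; in query mode it is an existential variable, and the result is every binding under which all the pattern sentences appear in the data.

```python
import itertools
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

_fresh_ids = itertools.count(1)


def is_anonymous(term: str) -> bool:
    # Assumption of this sketch: anonymous nodes are written with a "_:" prefix.
    return term.startswith("_:")


def declare(pattern: List[Triple]) -> Set[Triple]:
    """Declaration mode: make up a new identifier for each anonymous node."""
    made_up: Dict[str, str] = {}

    def name(term: str) -> str:
        if not is_anonymous(term):
            return term
        if term not in made_up:
            # Hypothetical prefix for made-up identifiers.
            made_up[term] = f"http://example.org/genid#n{next(_fresh_ids)}"
        return made_up[term]

    return {(name(s), name(p), name(o)) for s, p, o in pattern}


def unify(pat: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pat` matches `fact`, or return None."""
    extended = dict(binding)
    for p, f in zip(pat, fact):
        if is_anonymous(p):
            if extended.setdefault(p, f) != f:
                return None  # variable already bound to a different identifier
        elif p != f:
            return None
    return extended


def query(pattern: List[Triple], data: Set[Triple]) -> List[Binding]:
    """Query mode: anonymous nodes are existential variables."""
    bindings: List[Binding] = [{}]
    for pat in pattern:
        next_bindings: List[Binding] = []
        for b in bindings:
            for fact in data:
                extended = unify(pat, fact, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


pattern = [("ex:alice", "ex:knows", "_:x"), ("_:x", "ex:knows", "ex:carol")]
data = {("ex:alice", "ex:knows", "ex:bob"), ("ex:bob", "ex:knows", "ex:carol")}

print(query(pattern, data))   # [{'_:x': 'ex:bob'}]
print(len(declare(pattern)))  # 2
```

In declaration mode the same pattern instead yields two sentences sharing one made-up identifier, which is why a parser has to know which mode it is in.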

Identification of Sentences

Sentences may be identified by identifying their three identifier strings in their roles. To have a URI-Reference identifier for a sentence requires some higher-level mechanism such as a defined URI syntax for combining three URI-References into one, or a vocabulary for describing sentences themselves (reification).
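
Both mechanisms can be sketched. The first function below builds a single identifier from three URI-References using a hypothetical URI syntax under http://example.org/sentence (no standard defines one), and the second inverts it; the third describes a sentence with the reification terms rdf:type, rdf:Statement, rdf:subject, rdf:predicate and rdf:object from the RDF Model and Syntax namespace.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

Triple = Tuple[str, str, str]

# Namespace of the RDF Model and Syntax vocabulary, including its reification terms.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def sentence_uri(sentence: Triple) -> str:
    """Hypothetical URI syntax (not defined by any standard) combining three
    URI-References into one; percent-encoding keeps it reversible."""
    s, p, o = sentence
    return "http://example.org/sentence?" + urlencode({"s": s, "p": p, "o": o})


def sentence_from_uri(uri: str) -> Triple:
    """Invert sentence_uri."""
    q = parse_qs(urlsplit(uri).query)
    return (q["s"][0], q["p"][0], q["o"][0])


def reify(sentence: Triple, sentence_id: str) -> List[Triple]:
    """Describe `sentence` with four sentences about `sentence_id`."""
    s, p, o = sentence
    return [
        (sentence_id, RDF + "type", RDF + "Statement"),
        (sentence_id, RDF + "subject", s),
        (sentence_id, RDF + "predicate", p),
        (sentence_id, RDF + "object", o),
    ]


sentence = ("ex:alice", "ex:knows", "ex:bob")
uri = sentence_uri(sentence)
print(uri)
for triple in reify(sentence, uri):
    print(triple)
```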

Context Dependency

With certain vocabularies, the truth of a sentence is relative to facts about the act of its transmission, such as the identity of the sender and the time transmission begins. The use of such context-dependent vocabulary terms shifts complexity from the sentences themselves to the systems receiving the sentences. While this is appropriate for many applications, it is strongly recommended that all context-dependent vocabulary definitions include a formal mapping to a context-independent vocabulary.

This also applies to state changes and logical non-monotonicity. If a sentence could be true at some time and not true at some other time, then it is using a context-dependent vocabulary, and a mapping to a context-independent vocabulary should be provided.
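
A minimal sketch of such a mapping, assuming two hypothetical context-dependent terms (one meaning "the sender of this transmission", one meaning "the moment transmission began") and a hypothetical identifier scheme for instants. Rewriting those terms using the transmission's context yields a sentence whose truth no longer depends on who sent it or when.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Tuple

Triple = Tuple[str, str, str]

# Hypothetical context-dependent terms: their referent depends on the act
# of transmission, not just on the sentence.
SENDER = "http://example.org/ctx#sender"
NOW = "http://example.org/ctx#now"


class Transmission(NamedTuple):
    sender: str         # URI-Reference identifying whoever sent the sentence
    began_at: datetime  # moment transmission began; should be timezone-aware


def time_uri(t: datetime) -> str:
    # Hypothetical context-independent identifier for an instant, in UTC.
    return "http://example.org/time#" + t.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def decontextualize(sentence: Triple, ctx: Transmission) -> Triple:
    """Replace context-dependent terms with context-independent identifiers."""
    def resolve(term: str) -> str:
        if term == SENDER:
            return ctx.sender
        if term == NOW:
            return time_uri(ctx.began_at)
        return term

    s, p, o = sentence
    return (resolve(s), resolve(p), resolve(o))


ctx = Transmission("http://example.org/people#alice",
                   datetime(2001, 3, 11, 19, 47, 59, tzinfo=timezone.utc))
print(decontextualize((SENDER, "http://example.org/vocab#wrote", "http://example.org/docs#draft"), ctx))
print(decontextualize(("http://example.org/vocab#meeting", "http://example.org/vocab#startsAfter", NOW), ctx))
```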

Remaining Issues (To Do)


Sandro Hawke
$Date: 2001/03/11 19:47:59 $