Copyright © 1997,1998,1999,2000 W3C (MIT, INRIA, Keio ), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
Important: this document should not be mistaken for a W3C Specification. The text below was crudely excerpted from the RDF Model and Syntax REC. Unlike that document, this current text is not a complete work.
This document has been prepared (solely as a personal contribution by the editor) as a strawman discussion document for consideration by the RDF Interest Group.
It was produced by taking the RDF Model and Syntax specification and removing most of the content that relates to the RDF 1.0 XML grammar, examples, and other material not directly relevant to the specification of the RDF data model. It should be noted that the initial version of this excerpted 'RDF Model Overview' was not produced with any great attention to detail, and serves solely as a 'proof of concept' or strawman sketch.
RDF implementors health warning: please don't use this work as a reference document, tempting as it may be. The sole use for this is to futher the discussion on www-rdf-interest and refine the RDF Issues List. It should also be emphasised that no commitment to any ongoing maintainance or refinement of this document has been made.
Comments on this discussion document may be sent to <www-rdf-interest@w3.org>, the mailing list of the RDF Interest Group.
The World Wide Web was originally built for human consumption, and although everything on it is machine-readable, this data is not machine-understandable. It is very hard to automate anything on the Web, and because of the volume of information the Web contains, it is not possible to manage it manually. The solution proposed here is to use metadata to describe the data contained on the Web. Metadata is "data about data" (for example, a library catalog is metadata, since it describes publications) or specifically in the context of this specification "data describing Web resources". The distinction between "data" and "metadata" is not an absolute one; it is a distinction created primarily by a particular application, and many times the same resource will be interpreted in both ways simultaneously.
The foundation of RDF is a model for representing named properties and property values. The RDF model draws on well-established principles from various data representation communities. RDF properties may be thought of as attributes of resources and in this sense correspond to traditional attribute-value pairs. RDF properties also represent relationships between resources and an RDF model can therefore resemble an entity-relationship diagram. (More precisely, RDF Schemas — which are themselves instances of RDF data models — are ER diagrams.) In object-oriented design terminology, resources correspond to objects and properties correspond to instance variables.
The RDF data model is a syntax-neutral way of representing RDF expressions. The data model representation is used to evaluate equivalence in meaning. Two RDF expressions are equivalent if and only if their data model representations are the same. This definition of equivalence permits some syntactic variation in expression without altering the meaning. (See Section 6. for additional discussion of string comparison issues.)
The basic data model consists of three object types:
Resources | All things being described by RDF expressions are called resources. A resource may be an entire Web page; such as the HTML document "http://www.w3.org/Overview.html" for example. A resource may be a part of a Web page; e.g. a specific HTML or XML element within the document source. A resource may also be a whole collection of pages; e.g. an entire Web site. A resource may also be an object that is not directly accessible via the Web; e.g. a printed book. Resources are always named by URIs plus optional anchor ids (see [ URI]). Anything can have a URI; the extensibility of URIs allows the introduction of identifiers for any entity imaginable. |
Properties | A property is a specific aspect, characteristic, attribute, or relation used to describe a resource. Each property has a specific meaning, defines its permitted values, the types of resources it can describe, and its relationship with other properties. This document does not address how the characteristics of properties are expressed; for such information, refer to the RDF Schema specification). |
Statements | A specific resource together with a named property plus the value of that property for that resource is an RDF statement. These three individual parts of a statement are called, respectively, the subject, the predicate, and the object. The object of a statement (i.e., the property value) can be another resource or it can be a literal; i.e., a resource (specified by a URI) or a simple string or other primitive datatype defined by XML. In RDF terms, a literal may have content that is XML markup but is not further evaluated by the RDF processor. There are some syntactic restrictions on how markup in literals may be expressed; see Section 2.2.1. |
RDF defines three types of container objects:
Note: The definitions of Bag and Sequence explicitly permit duplicate values. RDF does not define a core concept of Set, which would be a Bag with no duplicates, because the RDF core does not mandate an enforcement mechanism in the event of violations of such constraints. Future work layered on the RDF core may define such facilities.
To represent a collection of resources, RDF uses an additional resource that identifies the specific collection (an instance of a collection, in object modeling terminology). This resource must be declared to be an instance of one of the container object types defined above. The type property, defined below, is used to make this declaration. The membership relation between this container resource and the resources that belong in the collection is defined by a set of properties defined expressly for this purpose. These membership properties are named simply "_1", "_2", "_3", etc. Container resources may have other properties in addition to the membership properties and the type property. Any such additional statements describe the container; see Section 3.3, Distributive Referents, for discussion of statements about each of the members themselves.
A common use of containers is as the value of a property. When used in this way, the statement still has a single statement object regardless of the number of members in the container; the container resource itself is the object of the statement.
In addition to making statements about Web resources, RDF can be used for making statements about other RDF statements; we will refer to these as higher-order statements. In order to make a statement about another statement, we actually have to build a model of the original statement; this model is a new resource to which we can attach additional properties.
Statements are made about resources. A model of a statement is the resource we need in order to be able to make new statements (higher-order statements) about the modeled statement.
For example, let us consider the sentence
Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.
RDF would regard this sentence as a fact. If, instead, we write the sentence
Ralph Swick says that Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.
we have said nothing about the resource http://www.w3.org/Home/Lassila; instead, we have expressed a fact about a statement Ralph has made. In order to express this fact to RDF, we have to model the original statement as a resource with four properties. This process is formally called reification in the Knowledge Representation community. A model of a statement is called a reified statement.
To model statements RDF defines the following properties:
A new resource with the above four properties represents the original statement and can both be used as the object of other statements and have additional statements made about it. The resource with these four properties is not a replacement for the original statement, it is a model of the statement. A statement and its corresponding reified statement exist independently in an RDF graph and either may be present without the other. The RDF graph is said to contain the fact given in the statement if and only if the statement is present in the graph, irrespective of whether the corresponding reified statement is present.
To model the example above, we could attach another property to the reified statement (say, "attributedTo") with an appropriate value (in this case, "Ralph Swick").
...
Reification is also needed to represent explicitly in the model the statement grouping implied by Description elements. The RDF graph model does not need a special construct for Descriptions; since Descriptions really are collections of statements, a Bag container is used to indicate that a set of statements came from the same (syntactic) Description. Each statement within a Description is reified and each of the reified statements is a member of the Bag representing that Description. As an example, the RDF fragment
The RDF Model and Syntax specification shows three representations of the data model; as 3-tuples (triples), as a graph, and in XML. These representations have equivalent meaning. The mapping between the representations used in this specification is not intended to constrain in any way the internal representation used by implementations.
The RDF data model is defined formally as follows:
We can view a set of statements (members of Statements) as a directed labeled graph: each resource and literal is a vertex; a triple {p, s, o} is an arc from s to o, labeled by p. This is illustrated in figure 11.
Figure 11: Simple statement graph template
This can be read either
o is the value of p for s
or (left to right)
s has a property p with a value o
or even
the p of s is o
For example, the sentence
Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila
would be represented graphically as follows:
Figure 12: Simple statement graph
and the corresponding triple (member of Statements) would be
{creator, [http://www.w3.org/Home/Lassila], "Ora Lassila"}
The notation [I] denotes the resource identified by the URI I and quotation marks denote a literal.
Using the triples, we can explain how statements are reified (as introduced in Section 4). Given a statement
{creator, [http://www.w3.org/Home/Lassila], "Ora Lassila"}
we can express the reification of this as a new resource X as follows:
{type, [X], [RDF:Statement]}
{predicate, [X], [creator]}
{subject, [X], [http://www.w3.org/Home/Lassila]}
{object, [X], "Ora Lassila"}
From the standpoint of an RDF processor, facts (that is, statements) are triples that are members of Statements. Therefore, the original statement remains a fact despite it being reified since the triple representing the original statement remains in Statements. We have merely added four more triples.
The property named "type" is defined to provide primitive typing. The formal definition of type is:
|
Furthermore, the formal specification of reification is:
|
The resource r in the definition above is called the reified statement. When a resource represents a reified statement; that is, it has an RDF:type property with a value of RDF:Statement, then that resource must have exactly one RDF:subject property, one RDF:object property, and one RDF:predicate property.
As described in Section 3, it is frequently necessary to represent a collection of resources or literals; for example to state that a property has an ordered sequence of values. RDF defines three kinds of collections: ordered lists, called Sequences, unordered lists, called Bags, and lists that represent alternatives for the (single) value of a property, called Alternatives.
Formally, these three collection types are defined by:
|
To represent a collection c, create a triple {RDF:type, c, t} where t is one of the three collection types RDF:Seq, RDF:Bag, or RDF:Alt. The remaining triples {RDF:_1, c, r1}, ..., {RDF:_n, c, rn}, ... point to each of the members rn of the collection. For a single collection resource there may be at most one triple whose predicate is any given element of Ord and the elements of Ord must be used in sequence starting with RDF:_1. For resources that are instances of the RDF:Alt collection type, there must be exactly one triple whose predicate is RDF:_1 and that is the default value for the Alternatives resource (that is, there must always be at least one alternative).