Strawman: RDF-Model-Summary-1

Resource Description Framework: Data Model Summary

RDF Interest Group Discussion Document

This Version:: http://www.w3.org/2000/09/rdfmodel/1 $Date: 2000/09/08 13:45:48 $
Newest Version:: http://www.w3.org/2000/09/rdfmodel/
Editor (butcher ;-):: Dan Brickley <danbri@w3.org>, World Wide Web Consortium / ILRT
Rather crudely derived from REC-rdf-syntax-19990222, whose editors were:: Ora Lassila <ora.lassila@research.nokia.com>, Nokia Research Center
Ralph R. Swick <swick@w3.org>, World Wide Web Consortium

Status of This Document

Important: this document should not be mistaken for a W3C Specification. The text below was crudely excerpted from the RDF Model and Syntax REC. Unlike that document, this current text is not a complete work.

This document has been prepared (solely as a personal contribution by the editor) as a strawman discussion document for consideration by the RDF Interest Group.

It was produced by taking the RDF Model and Syntax specification and removing most of the content that relates to the RDF 1.0 XML grammar, examples, and other material not directly relevant to the specification of the RDF data model. It should be noted that the initial version of this excerpted 'RDF Model Overview' was not produced with any great attention to detail, and serves solely as a 'proof of concept' or strawman sketch.

RDF implementors health warning: please don't use this work as a reference document, tempting as it may be. The sole use for this is to further the discussion on www-rdf-interest and refine the RDF Issues List. It should also be emphasised that no commitment to any ongoing maintenance or refinement of this document has been made.

Comments on this discussion document may be sent to <www-rdf-interest@w3.org>, the mailing list of the RDF Interest Group.

1. Introduction
2. Basic RDF Model
3. Container Model
4. Modeling Statements and Statements about Statements
5. Formal Model for RDF
Glossary
Appendix: References

1. Introduction

The World Wide Web was originally built for human consumption, and although everything on it is machine-readable, this data is not machine-understandable. It is very hard to automate anything on the Web, and because of the volume of information the Web contains, it is not possible to manage it manually. The solution proposed here is to use metadata to describe the data contained on the Web. Metadata is "data about data" (for example, a library catalog is metadata, since it describes publications) or specifically in the context of this specification "data describing Web resources". The distinction between "data" and "metadata" is not an absolute one; it is a distinction created primarily by a particular application, and many times the same resource will be interpreted in both ways simultaneously.

2. Basic RDF Model

The foundation of RDF is a model for representing named properties and property values. The RDF model draws on well-established principles from various data representation communities. RDF properties may be thought of as attributes of resources and in this sense correspond to traditional attribute-value pairs. RDF properties also represent relationships between resources and an RDF model can therefore resemble an entity-relationship diagram. (More precisely, RDF Schemas — which are themselves instances of RDF data models — are ER diagrams.) In object-oriented design terminology, resources correspond to objects and properties correspond to instance variables.

The RDF data model is a syntax-neutral way of representing RDF expressions. The data model representation is used to evaluate equivalence in meaning. Two RDF expressions are equivalent if and only if their data model representations are the same. This definition of equivalence permits some syntactic variation in expression without altering the meaning. (See Section 6 of the RDF Model and Syntax specification for additional discussion of string comparison issues.)

The basic data model consists of three object types:

Resources	All things being described by RDF expressions are called resources. A resource may be an entire Web page, such as the HTML document "http://www.w3.org/Overview.html". A resource may be a part of a Web page; e.g. a specific HTML or XML element within the document source. A resource may also be a whole collection of pages; e.g. an entire Web site. A resource may also be an object that is not directly accessible via the Web; e.g. a printed book. Resources are always named by URIs plus optional anchor ids (see [URI]). Anything can have a URI; the extensibility of URIs allows the introduction of identifiers for any entity imaginable.
Properties	A property is a specific aspect, characteristic, attribute, or relation used to describe a resource. Each property has a specific meaning, defines its permitted values, the types of resources it can describe, and its relationship with other properties. This document does not address how the characteristics of properties are expressed; for such information, refer to the RDF Schema specification [RDFSchema].
Statements	A specific resource together with a named property plus the value of that property for that resource is an RDF statement. These three individual parts of a statement are called, respectively, the subject, the predicate, and the object. The object of a statement (i.e., the property value) can be another resource or it can be a literal; i.e., a resource (specified by a URI) or a simple string or other primitive datatype defined by XML. In RDF terms, a literal may have content that is XML markup but is not further evaluated by the RDF processor. There are some syntactic restrictions on how markup in literals may be expressed; see Section 2.2.1 of the RDF Model and Syntax specification.
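The three object types above can be sketched in code. The Python fragment below is only an illustration under stated assumptions: the class names Resource, Literal, and Statement, and the example.org URI used for the creator property, are hypothetical and are not defined by RDF.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """Anything named by a URI (optionally with an anchor id)."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A simple string value; its content is not evaluated further."""
    value: str


@dataclass(frozen=True)
class Statement:
    """A subject resource, a named property, and that property's value."""
    subject: Resource
    predicate: Resource               # properties are themselves resources
    obj: Union[Resource, Literal]     # the value: a resource or a literal


# Hypothetical URI for the "creator" property, for illustration only.
creator = Resource("http://example.org/terms#creator")
page = Resource("http://www.w3.org/Home/Lassila")

stmt = Statement(subject=page, predicate=creator, obj=Literal("Ora Lassila"))
```

Keeping Literal as a separate type from Resource makes it straightforward to enforce that only a resource appears in the subject position.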

3. Container Model

RDF defines three types of container objects:

Bag	An unordered list of resources or literals. Bags are used to declare that a property has multiple values and that there is no significance to the order in which the values are given. Bag might be used to give a list of part numbers where the order of processing the parts does not matter. Duplicate values are permitted.
Sequence	An ordered list of resources or literals. Sequence is used to declare that a property has multiple values and that the order of the values is significant. Sequence might be used, for example, to preserve an alphabetical ordering of values. Duplicate values are permitted.
Alternative	A list of resources or literals that represent alternatives for the (single) value of a property. Alternative might be used to provide alternative language translations for the title of a work, or to provide a list of Internet mirror sites at which a resource might be found. An application using a property whose value is an Alternative collection is aware that it can choose any one of the items in the list as appropriate.

Note: The definitions of Bag and Sequence explicitly permit duplicate values. RDF does not define a core concept of Set, which would be a Bag with no duplicates, because the RDF core does not mandate an enforcement mechanism in the event of violations of such constraints. Future work layered on the RDF core may define such facilities.

To represent a collection of resources, RDF uses an additional resource that identifies the specific collection (an instance of a collection, in object modeling terminology). This resource must be declared to be an instance of one of the container object types defined above. The type property, defined below, is used to make this declaration. The membership relation between this container resource and the resources that belong in the collection is defined by a set of properties defined expressly for this purpose. These membership properties are named simply "_1", "_2", "_3", etc. Container resources may have other properties in addition to the membership properties and the type property. Any such additional statements describe the container; see Section 3.3, Distributive Referents, of the RDF Model and Syntax specification for discussion of statements about each of the members themselves.

A common use of containers is as the value of a property. When used in this way, the statement still has a single statement object regardless of the number of members in the container; the container resource itself is the object of the statement.
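As a rough sketch of the container model just described, the following Python fragment builds the statements for a Bag. Several things here are assumptions made for illustration: statements are written as (predicate, subject, object) tuples, plain strings stand in for both resources and literals, and the example.org identifiers and the helper name container_triples are hypothetical.

```python
def container_triples(container, kind, members):
    """Return the statements describing a container resource and its members."""
    # The container resource is declared to be an instance of a container type.
    triples = [("RDF:type", container, kind)]
    for position, member in enumerate(members, start=1):
        # Membership properties are named _1, _2, _3, ...
        triples.append((f"RDF:_{position}", container, member))
    return triples


parts = container_triples(
    "http://example.org/parts-list",      # hypothetical container resource
    "RDF:Bag",
    ["part-17", "part-42", "part-17"],    # duplicate values are permitted
)

# When a container is the value of a property, the container resource itself
# is the single object of that statement, however many members it has.
uses = (
    "http://example.org/terms#hasParts",  # hypothetical property
    "http://example.org/widget",          # hypothetical subject
    "http://example.org/parts-list",
)
```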

4. Modeling Statements and Statements about Statements

In addition to making statements about Web resources, RDF can be used for making statements about other RDF statements; we will refer to these as higher-order statements. In order to make a statement about another statement, we actually have to build a model of the original statement; this model is a new resource to which we can attach additional properties.

Statements are made about resources. A model of a statement is the resource we need in order to be able to make new statements (higher-order statements) about the modeled statement.

For example, let us consider the sentence

Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.

RDF would regard this sentence as a fact. If, instead, we write the sentence

Ralph Swick says that Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.

we have said nothing about the resource http://www.w3.org/Home/Lassila; instead, we have expressed a fact about a statement Ralph has made. In order to express this fact to RDF, we have to model the original statement as a resource with four properties. This process is formally called reification in the Knowledge Representation community. A model of a statement is called a reified statement.

To model statements RDF defines the following properties:

subject	The subject property identifies the resource being described by the modeled statement; that is, the value of the subject property is the resource about which the original statement was made (in our example, http://www.w3.org/Home/Lassila).
predicate	The predicate property identifies the original property in the modeled statement. The value of the predicate property is a resource representing the specific property in the original statement (in our example, creator).
object	The object property identifies the property value in the modeled statement. The value of the object property is the object in the original statement (in our example, "Ora Lassila").
type	The value of the type property describes the type of the new resource. All reified statements are instances of RDF:Statement; that is, they have a type property whose object is RDF:Statement. The type property is also used more generally to declare the type of any resource, as described in Section 3, "Container Model".

A new resource with the above four properties represents the original statement and can both be used as the object of other statements and have additional statements made about it. The resource with these four properties is not a replacement for the original statement, it is a model of the statement. A statement and its corresponding reified statement exist independently in an RDF graph and either may be present without the other. The RDF graph is said to contain the fact given in the statement if and only if the statement is present in the graph, irrespective of whether the corresponding reified statement is present.

To model the example above, we could attach another property to the reified statement (say, "attributedTo") with an appropriate value (in this case, "Ralph Swick").
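The "Ralph Swick says" example can be sketched as a small set of tuples. This is an illustration only: the (predicate, subject, object) tuple layout, the use of plain strings for resources and literals, and the example.org name given to the reified statement are all assumptions rather than anything RDF prescribes.

```python
PAGE = "http://www.w3.org/Home/Lassila"
X = "http://example.org/statements#X"   # hypothetical name for the reified statement

# A set of (predicate, subject, object) tuples stands in for the RDF graph.
graph = {
    ("RDF:type", X, "RDF:Statement"),
    ("RDF:subject", X, PAGE),
    ("RDF:predicate", X, "creator"),
    ("RDF:object", X, "Ora Lassila"),
    ("attributedTo", X, "Ralph Swick"),
}

original = ("creator", PAGE, "Ora Lassila")

# Only the reified statement is present, so this graph records what Ralph
# said without itself containing the fact that Ora Lassila is the creator.
asserts_original = original in graph
```

Adding the original tuple to the set would make the graph contain the fact as well; the two are independent.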

...

Reification is also needed to represent explicitly in the model the statement grouping implied by Description elements. The RDF graph model does not need a special construct for Descriptions; since Descriptions really are collections of statements, a Bag container is used to indicate that a set of statements came from the same (syntactic) Description. Each statement within a Description is reified and each of the reified statements is a member of the Bag representing that Description. As an example, the RDF fragment

5. Formal Model for RDF

The RDF Model and Syntax specification shows three representations of the data model: as 3-tuples (triples), as a graph, and in XML. These representations have equivalent meaning. The mapping between the representations used in this specification is not intended to constrain in any way the internal representation used by implementations.

The RDF data model is defined formally as follows:

There is a set called Resources.
There is a set called Literals.
There is a subset of Resources called Properties.
There is a set called Statements, each element of which is a triple of the form
{pred, sub, obj}

Where pred is a property (member of Properties), sub is a resource (member of Resources), and obj is either a resource or a literal (member of Literals).
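The membership conditions above translate directly into a check. The sketch below assumes that the sets Resources, Literals, and Properties are available as Python sets of strings; the function name is_statement and the sample set contents are illustrative only.

```python
def is_statement(triple, resources, literals, properties):
    """Check a {pred, sub, obj} triple against the formal model."""
    pred, sub, obj = triple
    return (
        pred in properties                           # pred is a property
        and sub in resources                         # sub is a resource
        and (obj in resources or obj in literals)    # obj is a resource or a literal
    )


# Illustrative sets; Properties is a subset of Resources.
resources = {"creator", "http://www.w3.org/Home/Lassila"}
properties = {"creator"}
literals = {"Ora Lassila"}

ok = is_statement(
    ("creator", "http://www.w3.org/Home/Lassila", "Ora Lassila"),
    resources, literals, properties,
)
# A literal cannot appear in the subject position.
bad = is_statement(
    ("creator", "Ora Lassila", "http://www.w3.org/Home/Lassila"),
    resources, literals, properties,
)
```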

We can view a set of statements (members of Statements) as a directed labeled graph: each resource and literal is a vertex; a triple {p, s, o} is an arc from s to o, labeled by p. This is illustrated in figure 11.

Figure 11: Simple statement graph template

This can be read either

o is the value of p for s

or (left to right)

s has a property p with a value o

or even

the p of s is o

For example, the sentence

Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila

would be represented graphically as follows:

Figure 12: Simple statement graph

and the corresponding triple (member of Statements) would be

{creator, [http://www.w3.org/Home/Lassila], "Ora Lassila"}

The notation [I] denotes the resource identified by the URI I and quotation marks denote a literal.
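The graph view and the third of the readings given earlier can be sketched as follows. The tuple order matches the {pred, sub, obj} convention of this section; the function names to_graph and read_aloud, and the choice of a dictionary keyed by subject, are assumptions made for illustration and say nothing about how an implementation should store the model.

```python
from collections import defaultdict


def to_graph(statements):
    """Map each subject vertex to its outgoing (label, target) arcs."""
    arcs = defaultdict(list)
    for pred, sub, obj in statements:
        arcs[sub].append((pred, obj))   # an arc from sub to obj, labeled by pred
    return dict(arcs)


def read_aloud(triple):
    """Render a triple using the 'the p of s is o' reading."""
    pred, sub, obj = triple
    return f"the {pred} of {sub} is {obj}"


triple = ("creator", "http://www.w3.org/Home/Lassila", "Ora Lassila")
graph = to_graph([triple])
sentence = read_aloud(triple)
```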

Using the triples, we can explain how statements are reified (as introduced in Section 4). Given a statement

{creator, [http://www.w3.org/Home/Lassila], "Ora Lassila"}

we can express the reification of this as a new resource X as follows:

{type, [X], [RDF:Statement]}
{predicate, [X], [creator]}
{subject, [X], [http://www.w3.org/Home/Lassila]}
{object, [X], "Ora Lassila"}

From the standpoint of an RDF processor, facts (that is, statements) are triples that are members of Statements. Therefore, the original statement remains a fact despite it being reified since the triple representing the original statement remains in Statements. We have merely added four more triples.

The property named "type" is defined to provide primitive typing. The formal definition of type is:

There is an element of Properties known as RDF:type.
Members of Statements of the form {RDF:type, sub, obj} must satisfy the following: sub and obj are members of Resources. [RDFSchema] places additional restrictions on the use of type.

Furthermore, the formal specification of reification is:

There is an element of Resources, not contained in Properties, known as RDF:Statement.
There are three elements in Properties known as RDF:predicate, RDF:subject and RDF:object.
Reification of a triple {pred, sub, obj} of Statements is an element r of Resources representing the reified triple and the elements s₁, s₂, s₃, and s₄ of Statements such that
s₁: {RDF:predicate, r, pred}
s₂: {RDF:subject, r, sub}
s₃: {RDF:object, r, obj}
s₄: {RDF:type, r, [RDF:Statement]}

The resource r in the definition above is called the reified statement. When a resource represents a reified statement (that is, it has an RDF:type property with a value of RDF:Statement), that resource must have exactly one RDF:subject property, one RDF:object property, and one RDF:predicate property.
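A minimal sketch of the reification definition and the accompanying constraint follows. It assumes statements are (pred, sub, obj) tuples of plain strings; the function names reify and is_well_formed_reification and the placeholder resource name "X" are hypothetical.

```python
from collections import Counter


def reify(triple, r):
    """Return the four statements that reify a {pred, sub, obj} triple as r."""
    pred, sub, obj = triple
    return [
        ("RDF:predicate", r, pred),
        ("RDF:subject", r, sub),
        ("RDF:object", r, obj),
        ("RDF:type", r, "RDF:Statement"),
    ]


def is_well_formed_reification(statements, r):
    """True if r is typed RDF:Statement and has exactly one subject, predicate, and object."""
    counts = Counter(pred for pred, sub, _ in statements if sub == r)
    typed = ("RDF:type", r, "RDF:Statement") in statements
    return typed and all(
        counts[p] == 1 for p in ("RDF:subject", "RDF:predicate", "RDF:object")
    )


original = ("creator", "http://www.w3.org/Home/Lassila", "Ora Lassila")
# The original statement stays in the collection; four more are added.
statements = [original] + reify(original, "X")
```

Giving "X" a second RDF:subject statement would make is_well_formed_reification return False, which is the constraint stated in the paragraph above.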

As described in Section 3, it is frequently necessary to represent a collection of resources or literals; for example to state that a property has an ordered sequence of values. RDF defines three kinds of collections: ordered lists, called Sequences, unordered lists, called Bags, and lists that represent alternatives for the (single) value of a property, called Alternatives.

Formally, these three collection types are defined by:

There are three elements of Resources, not contained in Properties, known as RDF:Seq, RDF:Bag, and RDF:Alt.
There is a subset of Properties corresponding to the ordinals (1, 2, 3, ...) called Ord. We refer to elements of Ord as RDF:_1, RDF:_2, RDF:_3, ...

To represent a collection c, create a triple {RDF:type, c, t} where t is one of the three collection types RDF:Seq, RDF:Bag, or RDF:Alt. The remaining triples {RDF:_1, c, r₁}, ..., {RDF:_n, c, rₙ}, ... point to each of the members rₙ of the collection. For a single collection resource there may be at most one triple whose predicate is any given element of Ord and the elements of Ord must be used in sequence starting with RDF:_1. For resources that are instances of the RDF:Alt collection type, there must be exactly one triple whose predicate is RDF:_1 and that is the default value for the Alternatives resource (that is, there must always be at least one alternative).
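The constraints on collections can be sketched as a simple checker. This is an illustration under stated assumptions: statements are (pred, sub, obj) tuples of plain strings, elements of Ord are recognised by the textual pattern RDF:_n, and the function name check_container and its problem messages are invented for this example.

```python
import re

ORD = re.compile(r"RDF:_([1-9][0-9]*)")
KINDS = {"RDF:Seq", "RDF:Bag", "RDF:Alt"}


def check_container(statements, c):
    """Return a list of problems found for collection resource c (empty if none)."""
    problems = []
    types = {obj for pred, sub, obj in statements if sub == c and pred == "RDF:type"}
    kinds = types & KINDS
    if not kinds:
        problems.append("not typed as RDF:Seq, RDF:Bag, or RDF:Alt")

    # Collect the ordinal of every membership statement about c.
    ordinals = []
    for pred, sub, _ in statements:
        match = ORD.fullmatch(pred) if sub == c else None
        if match:
            ordinals.append(int(match.group(1)))

    distinct = sorted(set(ordinals))
    if len(ordinals) != len(distinct):
        problems.append("an ordinal property is used more than once")
    if distinct != list(range(1, len(distinct) + 1)):
        problems.append("ordinals are not used in sequence from RDF:_1")
    if "RDF:Alt" in kinds and ordinals.count(1) != 1:
        problems.append("RDF:Alt needs exactly one RDF:_1 member")
    return problems


good = [("RDF:type", "c", "RDF:Seq"), ("RDF:_1", "c", "a"), ("RDF:_2", "c", "b")]
gap = [("RDF:type", "c", "RDF:Bag"), ("RDF:_1", "c", "a"), ("RDF:_3", "c", "b")]
empty_alt = [("RDF:type", "c", "RDF:Alt")]
```

Here good passes with no problems, gap is reported because RDF:_2 is skipped, and empty_alt is reported because an Alternatives resource must have at least one alternative.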

Glossary

Arc: A representation of a property in a graph form; specifically the edges in a directed labeled graph.
Attribute: A characteristic of an object. In Chapter 6 of the RDF Model and Syntax specification this term refers to a specific XML syntactic construct; the name="value" portions of an XML tag.
Element: As used here, this term refers to a specific XML syntactic construct; i.e., the material between matching XML start and end tags.
Literal: The most primitive value type represented in RDF, typically a string of characters. The content of a literal is not interpreted by RDF itself and may contain additional XML markup. Literals are distinguished from Resources in that the RDF model does not permit literals to be the subject of a statement.
Node: A representation of a resource or a literal in a graph form; specifically, a vertex in a directed labeled graph.
Property: A specific attribute with defined meaning that may be used to describe other resources. A property plus the value of that property for a specific resource is a statement about that resource. A property may define its permitted values as well as the types of resources that may be described with this property.
Resource: An abstract object that represents either a physical object such as a person or a book or a conceptual object such as a color or the class of things that have colors. Web pages are usually considered to be physical objects, but the distinction between physical and conceptual or abstract objects is not important to RDF. A resource can also be a component of a larger object; for example, a resource can represent a specific person's left hand or a specific paragraph out of a document. As used in this specification, the term resource refers to the whole of an object if the URI does not contain a fragment (anchor) id or to the specific subunit named by the fragment or anchor id.
Statement: An expression following a specified grammar that names a specific resource, a specific property (attribute), and gives the value of that property for that resource. More specifically here, an RDF statement is a statement using the RDF/XML grammar specified in the RDF Model and Syntax specification.
Triple: A representation of a statement used by RDF, consisting of just the property, the resource identifier, and the property value in that order.

Appendix: References

[Dexter94]: F. Halasz and M. Schwarz. The Dexter Hypertext Reference Model. Communications of the ACM, 37(2):30-39, February 1994. Edited by K. Grønbæck and R. Trigg. http://www.acm.org/pubs/citations/journals/cacm/1994-37-2/p30-halasz/
[HTML]: HTML 4.0 Specification, Raggett, Le Hors, Jacobs eds, World Wide Web Consortium Recommendation; http://www.w3.org/TR/REC-html40.
[ISO10646]: ISO/IEC 10646. The applicable version of this standard is defined in the XML specification [XML].
[NAMESPACES]: Namespaces in XML; Bray, Hollander, Layman eds, World Wide Web Consortium Recommendation; http://www.w3.org/TR/1999/REC-xml-names-19990114.
[PICS]: PICS Label Distribution Label Syntax and Communication Protocols, Version 1.1, W3C Recommendation 31-October-96; http://www.w3.org/TR/REC-PICS-labels.
[RDFSchema]: Resource Description Framework (RDF) Schemas; Brickley, Guha, Layman eds., World Wide Web Consortium Working Draft; http://www.w3.org/TR/1998/WD-rdf-schema.
[RFC2119]: Key words for use in RFCs to Indicate Requirement Levels; S. Bradner, March 1997; RFC2119.
[Unicode]: The Unicode Standard. The applicable version of this standard is the version defined by the XML specification [XML].
[URI]: Uniform Resource Identifiers (URI): Generic Syntax; Berners-Lee, Fielding, Masinter, Internet Draft Standard, August 1998; RFC2396.
[XML]: Extensible Markup Language (XML) 1.0; World Wide Web Consortium Recommendation; http://www.w3.org/TR/REC-xml.
[XMLinHTML]: XML in HTML Meeting Report; Connolly, Wood eds.; World Wide Web Consortium Note; http://www.w3.org/TR/NOTE-xh.