W3C

RDF Primer

W3C Editor's Draft 24 November 2002

This version:
http://www.w3.org/TR/2002/WD-rdf-primer-20021124/
Latest version:
http://www.w3.org/TR/rdf-primer/
Previous version:
http://www.w3.org/TR/2002/WD-rdf-primer-20020426/
Editors:
Frank Manola, The MITRE Corporation, fmanola@mitre.org
Eric Miller, W3C, em@w3.org
Series Editor:
Brian McBride, Hewlett-Packard Laboratories, bwm@hplb.hpl.hp.com

Abstract

The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource. However, by generalizing the concept of a "Web resource", RDF can also be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web. RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning.

This Primer is designed to provide the reader with the basic knowledge required to effectively use RDF. It introduces the basic concepts of RDF and describes its XML syntax. It describes how to define RDF vocabularies using the RDF Vocabulary Description Language, and gives an overview of some deployed RDF applications. It also describes the content and purpose of other RDF specification documents.

Status of this Document

This is a W3C RDF Core Working Group Working Draft produced as part of the W3C Semantic Web Activity. This document incorporates material developed by the Working Group designed to provide the reader with the basic knowledge required to effectively use RDF in their particular applications.

This document is being released for review by W3C members and other interested parties to encourage feedback and comments. This is the current state of an ongoing work on the Primer.

This is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use it as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

In conformance with W3C policy requirements, known patent and IPR constraints associated with this Working Draft are detailed on the RDF Core Working Group Patent Disclosure page.

Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.

Table of Contents

  1. Introduction
  2. Making Statements About Resources
      2.3 The RDF Model
      2.4 Structured Property Values and Blank Nodes
      2.5 Typed Literals
      2.6 Concepts Summary
  3. An XML Syntax for RDF: RDF/XML
      3.1 Basic Principles
      3.2 Defining New RDF Resources
      3.3 RDF/XML Summary
  4. Other RDF Capabilities
      4.1 RDF Containers
      4.2 RDF Collections
      4.3 RDF Reification
      4.4 Miscellaneous RDF Facilities
            4.4.1 More on Structured Values: rdf:value
  5. Defining RDF Vocabularies: RDF Schema
      5.1 Defining Classes
      5.2 Defining Properties
      5.3 Interpreting RDF Schema Declarations
      5.4 Other Schema Information
      5.5 Richer Schema Languages
  6. Some RDF Applications: RDF in the Field
      6.1 Dublin Core Metadata Initiative
      6.2 PRISM
      6.3 XPackage
      6.4 RSS 1.0: RDF Site Summary
      6.5 CIM/XML
      6.6 Gene Ontology Consortium
  7. Other Parts of the RDF Specification
      7.1 RDF Semantics
      7.2 Test Cases
  8. References
      8.1 Normative References
      8.2 Informational References
  9. Acknowledgments

Appendices

  A. Uniform Resource Identifiers (URIs) Survival Guide
  B. Extensible Markup Language (XML) Survival Guide
  C. Changes


1. Introduction

The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource. However, by generalizing the concept of a "Web resource", RDF can also be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web. Examples include information about items available from online shopping facilities (e.g., information about specifications, prices, and availability), or the description of a Web user's preferences for information delivery.

RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created.

RDF is based on the idea of identifying things using Web identifiers (URIs), and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources, and their properties and values. To make this discussion somewhat more concrete as soon as possible, the group of statements "there is someone whose name is Eric Miller, whose email address is em@w3.org, and whose title is Dr." could be represented as the RDF graph in Figure 0:

Figure 0: An RDF Graph Describing Eric Miller

Figure 0 illustrates that RDF uses URIs to identify:

RDF also provides an XML-based syntax (called RDF/XML) for recording and exchanging these graphs. The following is a small chunk of RDF in RDF/XML corresponding to the graph in Figure 0:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns="http://www.w3.org/2000/10/swap/pim/contact#">

  <Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <fullName>Eric Miller</fullName>
    <mailbox rdf:resource="mailto:em@w3.org"/>
    <personalTitle>Dr.</personalTitle> 
  </Person>

</rdf:RDF>

Note that this RDF/XML also contains URIs, as well as properties like mailbox and fullName (in an abbreviated form), and their respective values em@w3.org and Eric Miller.
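To make the correspondence between this RDF/XML and the statements it records more concrete, the following Python sketch reads the example with the standard library's XML parser and lists one (subject, predicate, object) tuple per statement. It is only an illustration that handles the particular shape of this example; it is not a general RDF/XML parser, and the helper name expand is our own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.w3.org/2000/10/swap/pim/contact#">
  <Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <fullName>Eric Miller</fullName>
    <mailbox rdf:resource="mailto:em@w3.org"/>
    <personalTitle>Dr.</personalTitle>
  </Person>
</rdf:RDF>
"""


def expand(tag):
    # ElementTree reports a namespaced name as "{namespace}local".
    # Concatenating the two parts gives the full URI of the abbreviated name.
    namespace, local = tag[1:].split("}")
    return namespace + local


root = ET.fromstring(RDF_XML)
triples = []
for node in root:
    subject = node.get("{%s}about" % RDF_NS)
    # The element name of the described node (Person) states its type.
    triples.append((subject, RDF_NS + "type", expand(node.tag)))
    for prop in node:
        resource = prop.get("{%s}resource" % RDF_NS)
        # A property either points at another resource or carries text.
        value = resource if resource is not None else prop.text
        triples.append((subject, expand(prop.tag), value))

for triple in triples:
    print(triple)
```

Each abbreviated property name, such as mailbox, is expanded to a full URI by prefixing it with the default namespace declared on the rdf:RDF element.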

Like HTML, this RDF/XML is machine processable, and, using URIs, can link pieces of data across the Web. However, unlike conventional hypertext, RDF URIs can refer to any identifiable thing, including things that may not be directly retrievable on the Web (such as the person Eric Miller). The result is that in addition to describing such things as Web pages, we can also describe cars, businesses, people, news events, etc. In addition, RDF properties themselves have URIs, to precisely identify the kind of relationship that exists between the linked items.

The following documents contribute to the specification of RDF:

This Primer is intended to augment these other documents, to help information system designers and application developers understand the features of RDF and how to use them. In particular, the Primer is intended to answer such questions as:

  • What does RDF look like?
  • What information can RDF represent?
  • How is RDF information created, accessed, and processed?
  • How can existing information be combined with RDF?

The Primer is a non-normative document, which means that it does not provide a definitive specification of RDF. The examples and other explanatory material in the Primer are provided to help you understand RDF, but they may not always provide definitive or fully-complete answers. In such cases, you should refer to the relevant normative parts of the RDF specification. To help you do this, we provide links pointing to the relevant parts of the normative specifications.

2. Making Statements About Resources

RDF is intended to provide a simple way to state properties of (make assertions about) Web resources, e.g., Web pages. For example, imagine that we want to state the fact that someone named John Smith created a particular Web page. A straightforward way to state this in English would be in the form of a simple statement such as:

http://www.example.org/index.html has a creator whose value is John Smith

We've underlined parts of this statement to illustrate that, in order to describe the properties of something, we need ways to name, or identify, a number of things:

  • We need a way to identify the thing we want to describe (the Web page, in this case)
  • We need a way to identify a specific property (creator, in this case) of the thing that we want to describe
  • We need a way to identify the thing we want to assign as the value of this property (who the creator is), for the thing we want to describe

In this statement, we've used the Web page's URL (Uniform Resource Locator) to identify it. In addition, we've used the word "creator" to identify the property we want to talk about, and the two words "John Smith" to identify the thing (a person) we want to say is the value of this property.

We could state other properties of this Web page by writing additional English statements of the same general form, using the URL to identify the page, and words (or other expressions) to identify the properties and their values. For example, to specify the date the page was created, and the language in which the page is written, we could write the additional statements:

http://www.example.org/index.html has a creation-date whose value is August 16, 1999
http://www.example.org/index.html has a language whose value is English

RDF is based on the idea that the things we want to describe have properties which have values, and that resources can be described by making statements, similar to those above, that specify those properties and values. RDF uses a particular terminology for talking about the various parts of statements. Specifically, the part that identifies the thing the statement is about (the Web page in this example) is called the subject. The part that identifies the property or characteristic of the subject that the statement specifies (creator, creation-date, or language in these examples) is called the predicate, and the part that identifies the value of that property is called the object. So, taking the English statement

http://www.example.org/index.html has a creator whose value is John Smith

the RDF terms for the various parts of the statement are:

  • the subject is the URL http://www.example.org/index.html
  • the predicate is the word "creator"
  • the object is the words "John Smith"

However, while English is good for communicating between (English-speaking) humans, RDF is about making machine-processable statements. To make these kinds of statements suitable for processing by machines, we need two things:

  • a system of machine-processable identifiers that allows us to identify a subject, predicate, or object in a statement without any possibility of confusion with a similar-looking identifier that might be used by someone else on the Web.
  • a machine-processable language for representing these statements and exchanging them between machines.

Fortunately, the existing Web architecture provides us with both of the necessary mechanisms. The Web's Uniform Resource Identifier (URI) provides us with a way to uniquely identify anything we want to talk about in an RDF statement, and the Extensible Markup Language (XML) provides us with a format for representing and exchanging RDF statements. Appendix A and Appendix B briefly describe these mechanisms.

@@Now that Secs 2.1 and 2.2 are appendices, need a smooth segue here. In particular, need to briefly introduce URIrefs and QNames.@@

@@Mention that full discussion of RDF URIrefs (and the graph model as a whole) is in the Concepts document.@@

2.3 The RDF Model

Now that we've introduced URI references for identifying things we want to talk about on the Web, and XML as a machine-processable way of representing RDF statements, we can describe how RDF lets us use URIs to make statements about resources. In the introduction, we said that RDF was based on the idea of expressing simple statements about resources, where those statements are built using subjects, predicates, and objects. In RDF, we could represent our original English statement:

http://www.example.org/index.html has a creator whose value is John Smith

by an RDF statement having:

  • a subject http://www.example.org/index.html
  • a predicate http://purl.org/dc/elements/1.1/creator
  • and an object http://www.example.org/staffid/85740

Note how we have introduced URIrefs to identify not only the subject of the original statement, but also the predicate and object, instead of using the words "creator" and "John Smith", respectively. We'll discuss this further a bit later on.

RDF models statements as nodes and arcs in a graph. In this notation, a statement is represented by:

  • a node for the subject, labeled with its URIref
  • a node for the object, labeled with its URIref
  • an arc for the predicate, labeled with its URIref, directed from the subject node to the object node.

So the RDF statement above would be represented by the graph shown in Figure 1:

Figure 1: A Simple RDF Statement

Groups of statements are represented by corresponding groups of nodes and arcs. So if we wanted to also represent the additional statements

http://www.example.org/index.html has a creation-date whose value is August 16, 1999
http://www.example.org/index.html has a language whose value is English

we could, by introducing suitable URIrefs to name the properties "creation-date" and "language", use the graph shown in Figure 2:

Figure 2: Several Statements About the Same Resource

Figure 2 illustrates that the objects of RDF statements may be either resources identified by URIrefs, or constant values (called literals) represented by character strings, in order to represent certain kinds of property values (literals may not be the subjects or predicates of RDF statements). In drawing RDF graphs, nodes that represent resources identified by URIrefs are shown as ellipses, while nodes that represent literals are shown as boxes (labeled by the literal itself). RDF graphs can be described as "labeled directed graphs", since the arcs have labels, and are "directed" (point in a specific direction, from subject to object).

@@mention that a full discussion of RDF literals is in the Concepts document.@@

Sometimes it is not convenient to draw graphs when discussing them, so an alternative way of writing down the statements, called triples, is also used. In the triples notation, each statement in the graph is written as a simple triple of subject, predicate, and object node labels (either URIref or literal), in that order. The triples representing the three statements shown in Figure 2 would be written in full as:

<http://www.example.org/index.html> <http://purl.org/dc/elements/1.1/creator> <http://www.example.org/staffid/85740> .

<http://www.example.org/index.html> <http://www.example.org/terms/creation-date> "August 16, 1999" .

<http://www.example.org/index.html> <http://www.example.org/terms/language> "English" .

Each triple corresponds to a single arc in the graph, complete with the arc's beginning and ending nodes (the subject and object of the statement). Unlike the drawn graph (but like the original statements), the triples notation requires that a node be separately identified for each statement it appears in. So, for example, http://www.example.org/index.html appears three times (once in each triple) in the triples representation of the graph, but only once in the drawn graph. However, the triples represent exactly the same information as the drawn graph, and this is a key point: what is fundamental to RDF is the graph model of the statements. The notation used to represent or depict the graph is secondary.
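The relationship between the triples and the drawn graph can be sketched in a few lines of Python. The classes URIRef and Literal below are our own illustrative stand-ins (they do not come from any RDF library). They let us build the set of nodes and the list of arcs from the three triples and confirm that the Web page is written three times in the triples but is a single node in the graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    """A node or arc label that is a URI reference (drawn as an ellipse)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A constant value represented by a character string (drawn as a box)."""
    value: str


PAGE = URIRef("http://www.example.org/index.html")

triples = [
    (PAGE, URIRef("http://purl.org/dc/elements/1.1/creator"),
     URIRef("http://www.example.org/staffid/85740")),
    (PAGE, URIRef("http://www.example.org/terms/creation-date"),
     Literal("August 16, 1999")),
    (PAGE, URIRef("http://www.example.org/terms/language"),
     Literal("English")),
]

nodes = set()   # each distinct subject or object is one node
arcs = []       # each triple is one labeled, directed arc
for subject, predicate, obj in triples:
    nodes.add(subject)
    nodes.add(obj)
    arcs.append((subject, predicate, obj))

occurrences_in_triples = sum(1 for s, _, _ in triples if s == PAGE)
print(occurrences_in_triples, "mentions in the triples,",
      sum(1 for n in nodes if n == PAGE), "node in the graph")
```

Using distinct classes for URIrefs and literals keeps a literal and a URIref with the same characters from being mistaken for the same node.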

The full triples notation requires that URI references be written out completely, in angle brackets, which, as the example above illustrates, can result in very long lines. For convenience, we will use a shorthand way of writing triples in the rest of this Primer, and also in other RDF specifications. In this shorthand, we can substitute a QName without angle brackets as an abbreviation of a full URI reference. So, for example, if the QName prefix foo is mapped to the namespace URI http://example.org/somewhere/, then the QName foo:bar is shorthand for the URIref http://example.org/somewhere/bar. We will also make extensive use in these examples of several "well-known" QName prefixes (which we will use without explicitly specifying them each time), defined as follows:

prefix rdf:, namespace URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#
prefix rdfs:, namespace URI: http://www.w3.org/2000/01/rdf-schema#
prefix dc:, namespace URI: http://purl.org/dc/elements/1.1/
prefix daml:, namespace URI: http://www.daml.org/2001/03/daml+oil#
prefix ex:, namespace URI: http://www.example.org/ (or http://www.example.com/)
prefix xsd:, namespace URI: http://www.w3.org/2001/XMLSchema#

We will also use variations on the "example" prefix ex: as needed in the examples, where this will not cause confusion, for example,

prefix exterms:, namespace URI: http://www.example.org/terms/ (for terms used by our example organization),
prefix exstaff:, namespace URI: http://www.example.org/staffid/ (for our example organization's staff identifiers),
prefix ex2:, namespace URI: http://www.domain2.example.org/ (for a second example organization), and so on.

Using our new shorthand, we can write the previous set of triples as:

ex:index.html dc:creator exstaff:85740 .

ex:index.html exterms:creation-date "August 16, 1999" .

ex:index.html exterms:language "English" .
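The shorthand is purely a textual abbreviation, and expanding it is a matter of string concatenation. The following Python sketch shows one way to do this; the function name expand_term and the choice to pass literals and blank node identifiers through unchanged are our own assumptions for illustration.

```python
# Prefix-to-namespace mappings taken from the tables above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://www.example.org/",
    "exterms": "http://www.example.org/terms/",
    "exstaff": "http://www.example.org/staffid/",
}


def expand_term(term, prefixes=PREFIXES):
    """Expand a QName to a full URIref written in angle brackets."""
    # Quoted literals and node identifiers (_:name) are not QNames.
    if term.startswith('"') or term.startswith("_:"):
        return term
    prefix, local = term.split(":", 1)
    return "<" + prefixes[prefix] + local + ">"


shorthand = [
    ("ex:index.html", "dc:creator", "exstaff:85740"),
    ("ex:index.html", "exterms:creation-date", '"August 16, 1999"'),
    ("ex:index.html", "exterms:language", '"English"'),
]

full = [tuple(expand_term(t) for t in triple) for triple in shorthand]
for triple in full:
    print(" ".join(triple), ".")
```

Running this reproduces the three fully written triples given earlier in this section.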

The examples we've just given of RDF statements begin to illustrate some of the advantages of using URIrefs as RDF's basic way of identifying things. For instance, instead of identifying the creator of the Web page in our first example by the character string "John Smith", we've assigned him a URIref, in this case (using a URIref based on his employee number) http://www.example.org/staffid/85740 . An advantage of using a URIref in this case is that we can be more precise in our identification. That is, the creator of the page isn't the character string "John Smith", or any one of the thousands of people named John Smith, but the particular John Smith associated with that URIref (whoever created the URIref defines the association). Moreover, since we have a URIref for the creator of the page, it is a full-fledged resource, and we can record additional information about him, such as his name, and age, as in the graph shown in Figure 3:

Figure 3: More Information about John Smith

These examples also illustrate that RDF uses URIrefs as predicates in RDF statements. That is, rather than using character strings (or words) such as "creator" or "name" to identify properties, RDF uses URIrefs. Using URIrefs to identify properties is important for a number of reasons. First, it allows us to distinguish the properties we use from properties someone else may use that would otherwise be identified by the same character string. For instance, in our example, example.org uses "name" to mean someone's full name written out as a character string literal (e.g., "John Smith"), but someone else may intend "name" to mean something different (e.g., the name of a variable in a piece of program text). A program encountering "name" as a property identifier on the Web wouldn't necessarily be able to distinguish these uses. However, if example.org writes http://www.example.org/terms/name for its "name" property, and the other person writes http://www.domain2.example.org/genealogy/terms/name for hers, we can keep straight the fact that there are distinct properties involved (even if a program cannot automatically determine the distinct meanings). Another reason why it is important to use URIrefs to identify properties is that it allows us to treat RDF properties as resources themselves. Since properties are resources, we can record descriptive information about them (e.g., the English description of what example.org means by "name"), simply by adding additional RDF statements with the property's URIref as the subject.

Using URIrefs as subjects, predicates, and objects in RDF statements allows us to begin to develop and use a shared vocabulary on the Web, reflecting (and creating) a shared understanding of the concepts we talk about. For example, in the triple

ex:index.html  dc:creator  exstaff:85740 .

the predicate dc:creator, when fully expanded as a URIref, is an unambiguous reference to the "creator" attribute in the Dublin Core metadata attribute set, a widely-used set of attributes (properties) for describing information of all kinds. The writer of this triple is effectively saying that the relationship between the Web page (identified by http://www.example.org/index.html ) and the creator of the page (a distinct person, identified by http://www.example.org/staffid/85740 ) is exactly the concept identified by http://purl.org/dc/elements/1.1/creator . Moreover, anyone else, or any program, that understands http://purl.org/dc/elements/1.1/creator will know exactly what is meant by this relationship.

Of course, RDF's use of URIrefs doesn't solve all our problems because, for example, people can still use different URIrefs to refer to the same thing. However, the fact that these different URIrefs are used in the commonly-accessible "Web space" creates the opportunity both to identify equivalences among these different references, and to migrate toward the use of common references.

The result of all this is that RDF provides a way to make statements that applications can more easily process. Now an application can't actually "understand" such statements, of course, but it can deal with them in a way that makes it seem like it does. For example, a user could search the Web for all book reviews and create an average rating for each book. Then, the user could put that information back on the Web. Another web site could take that list of book rating averages and create a "Top Ten Highest Rated Books" page. Here, the availability and use of a shared vocabulary about ratings, and a shared group of URIrefs identifying the books they apply to, allows individuals to build a mutually-understood and increasingly-powerful (as additional contributions are made) "information base" about books on the Web. The same principle applies to the vast amounts of information that people create about thousands of subjects every day on the Web.
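As a rough sketch of the book review scenario, the Python fragment below averages ratings collected as triples. Everything in it is hypothetical: the book identifiers, the exterms:rating property, and the rating values are invented for illustration only. The point is that the computation relies on nothing but a shared property name and shared identifiers for the books.

```python
from collections import defaultdict

# Hypothetical statements gathered from several review sites.
reviews = [
    ("ex:book1", "exterms:rating", 4),
    ("ex:book1", "exterms:rating", 5),
    ("ex:book2", "exterms:rating", 3),
    ("ex:book2", "exterms:title", "An Example Book"),
]


def average_ratings(triples, rating_property="exterms:rating"):
    """Return the mean rating for every subject that has ratings."""
    ratings = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == rating_property:
            ratings[subject].append(obj)
    return {book: sum(values) / len(values) for book, values in ratings.items()}


averages = average_ratings(reviews)
# Highest rated first, as a "top rated" page might list them.
ranked = sorted(averages, key=averages.get, reverse=True)
print(ranked, averages)
```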

RDF statements are similar to a number of other formats for recording information, such as:

  • entries in a simple record or catalog listing describing the resource in a data processing system
  • rows in a simple relational database
  • simple assertions in formal logic

and information in these formats can be treated as RDF statements, allowing RDF to be used as a unifying model for integrating data from many sources.
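For instance, a row in a relational table can be read as a group of statements about one thing: the row's key identifies the subject, each remaining column names a property, and each cell supplies a value. The Python sketch below illustrates the idea. The table layout, the column names, and the function row_to_triples are assumptions made for this example, and real mappings usually need more care (for example, with null values and foreign keys).

```python
def row_to_triples(row, key_column, subject_prefix, property_prefix):
    """Turn one database row (a dict) into shorthand triples."""
    subject = subject_prefix + str(row[key_column])
    triples = []
    for column, value in row.items():
        if column == key_column:
            continue  # the key became the subject, not a property
        # Every cell value is written as a plain character string literal.
        triples.append((subject, property_prefix + column, '"%s"' % value))
    return triples


# A hypothetical row from a staff table.
row = {
    "staffid": 85740,
    "name": "John Smith",
    "address": "1501 Grant Avenue, Bedford, Massachusetts 01730",
}

staff_triples = row_to_triples(row, "staffid", "exstaff:", "exterms:")
for triple in staff_triples:
    print(" ".join(triple), ".")
```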

2.4 Structured Property Values and Blank Nodes

Things would be very simple if the only types of information we had to record about things were obviously in the form of the simple RDF statements we've illustrated so far. However, most real-world data involves structures that are more complicated than that, at least on the surface. For instance, in our original example, we recorded the date the Web page was created as a single exterms:creation-date property, with a simple character string literal as its value. However, suppose we wanted to show, as the value of the exterms:creation-date property, the month, day, and year as separate pieces of information? Or, in the case of John Smith's personal information, suppose we wanted to record his address. We might write the whole address out as a character string literal, as in the triple

exstaff:85740  exterms:address  "1501 Grant Avenue, Bedford, Massachusetts 01730" .

However, suppose we wanted to record John's address as a structure consisting of separate street, city, state, and Zip code values? How do we do this in RDF?

We can represent such structured information in RDF by considering the aggregate thing we want to talk about (like John Smith's address) as a resource, and then making statements about that new resource. So, in the RDF graph, in order to break up John Smith's address into its component parts, we create a new node to represent the concept of John Smith's address, and assign that concept a new URIref to identify it, say http://www.example.org/addressid/85740 (which we will abbreviate as exaddressid:85740). We then write RDF statements (create additional arcs and nodes) with that node as the subject, to represent the additional information, producing the graph shown in Figure 4:

Figure 4: Breaking Up John's Address

or the triples:

exstaff:85740      exterms:address  exaddressid:85740 .
exaddressid:85740  exterms:street   "1501 Grant Avenue" .
exaddressid:85740  exterms:city     "Bedford" .
exaddressid:85740  exterms:state    "Massachusetts" .
exaddressid:85740  exterms:Zip      "01730" .

Using this approach allows us to represent structured information in RDF, but it can involve generating numerous "intermediate" URIrefs to represent aggregate concepts such as John's address, concepts that may never need to be referred to directly from outside a particular graph, and thus don't, strictly speaking, require "universal" identifiers. In addition, in the drawing of the graph representing the group of statements shown in Figure 4, we don't really need the URIref we assigned to identify "John Smith's address", since we could just as easily have drawn the graph as in Figure 5:

Figure 5: Using a Blank Node

In Figure 5, which is a perfectly good RDF graph, we've used a node without a label to stand for the concept of "John Smith's address". This unlabeled node, or blank node, functions perfectly well in the drawing without needing a URIref, since the node itself provides the necessary connectivity between the various other parts of the graph. (Blank nodes were previously called anonymous resources in [RDF-MS].) However, we would need some form of explicit identifier for that node if we wanted to represent this graph as triples. To see this, we can try to write the triples corresponding to what is shown in the drawn graph. What we would get would be something like:

exstaff:85740  exterms:address  ??? .
???            exterms:street   "1501 Grant Avenue" .
???            exterms:city     "Bedford" .
???            exterms:state    "Massachusetts" .
???            exterms:Zip      "01730" .

where ??? stands for something that indicates the presence of the blank node. Since a complex graph might contain more than one blank node, we would also need a way to differentiate between the various blank nodes in the triples representation of the graph. To do this, we use node identifiers, having the form _:name, to indicate the presence of blank nodes in triples. For instance, in this example we might generate the node identifier _:johnaddress to refer to the blank node, in which case the resulting triples might be:

exstaff:85740  exterms:address  _:johnaddress .
_:johnaddress  exterms:street   "1501 Grant Avenue" .
_:johnaddress  exterms:city     "Bedford" .
_:johnaddress  exterms:state    "Massachusetts" .
_:johnaddress  exterms:Zip      "01730" .

In a triples representation of a graph, each distinct blank node in the graph is given a different node identifier. Unlike URIrefs and literals, node identifiers are not considered to be actual parts of the RDF graph (this can be seen by looking at the drawn graph in Figure 5 and noting that there is no node identifier used to label the blank node). Node identifiers only have significance within the triple representation of the graph, and only for the purpose of distinguishing one blank node from another (so that two groups of triples that differ only by re-naming their node identifiers are considered to represent identical RDF graphs). Node identifiers also have significance only within the triples representing a single graph (so that two different graphs with the same number of blank nodes might use the same node identifiers to distinguish them, and it would be unwise to assume that blank nodes from different graphs having the same node identifiers referred to the same resource). If it is expected that a node in a graph will need to be referenced from outside the graph, a URIref should be assigned to identify it.
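The statement that two groups of triples differing only in their node identifiers represent the same graph can be checked mechanically. The Python sketch below does so by brute force: it tries every one-to-one renaming of the blank nodes in one group and tests whether the result equals the other group. This is our own illustrative approach, workable only for a handful of blank nodes, and the function names are assumptions.

```python
from itertools import permutations


def blank_nodes(triples):
    """Collect the node identifiers (terms of the form _:name)."""
    return sorted({t for triple in triples for t in triple if t.startswith("_:")})


def same_graph(a, b):
    """True if a and b are equal up to renaming of node identifiers."""
    a, b = set(a), set(b)
    blanks_a, blanks_b = blank_nodes(a), blank_nodes(b)
    if len(a) != len(b) or len(blanks_a) != len(blanks_b):
        return False
    # Try every one-to-one renaming of a's blank nodes onto b's.
    for candidate in permutations(blanks_b):
        mapping = dict(zip(blanks_a, candidate))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in a}
        if renamed == b:
            return True
    return False


def address(node):
    return [
        ("exstaff:85740", "exterms:address", node),
        (node, "exterms:street", '"1501 Grant Avenue"'),
        (node, "exterms:city", '"Bedford"'),
        (node, "exterms:state", '"Massachusetts"'),
        (node, "exterms:Zip", '"01730"'),
    ]


print(same_graph(address("_:johnaddress"), address("_:b0")))
```

Replacing the blank node with a URIref such as exaddressid:85740 produces a different graph, because URIrefs, unlike node identifiers, are part of the graph itself.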

At the beginning of this section, we noted that we can represent aggregate structures, like John Smith's address, by considering the aggregate thing we want to talk about as a resource, and then making statements about that new resource. This example illustrates an important aspect of RDF: RDF directly represents only binary relationships, e.g. the relationship between John Smith and the literal representing his address. When we try to represent the relationship between John and the group of separate components of this address, we are dealing with an n-ary (n-way) relationship (in this case, n=5) between John and the street, city, state, and zip components. In order to represent such structures directly in RDF (e.g., considering the address as a group of street, city, state, and zip sub-components), we need to break this n-way relationship up into a group of separate binary relationships. Blank nodes give us one way to do this. Each time we have an n-ary relationship, we can choose one of the participants as the subject of the relationship (John in this case), and create a blank node to represent the rest of the relationship (John's address in this case). We can then represent the remaining participants in the relationship (such as the city in our example) as separate properties of the new resource represented by the blank node.
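The decomposition just described follows a fixed pattern, which the following Python sketch spells out. The function structured_value and its parameters are hypothetical names chosen for this illustration: it links the chosen subject to a blank node with one triple, then attaches each remaining participant to that blank node as a separate property.

```python
def structured_value(subject, prop, parts, node_id):
    """Break an n-ary relationship into binary ones via a blank node.

    subject and prop give the first binary relationship; parts maps each
    remaining property to its value; node_id names the blank node.
    """
    node = "_:" + node_id
    triples = [(subject, prop, node)]
    for part_property, value in parts.items():
        triples.append((node, part_property, value))
    return triples


john_address = structured_value(
    "exstaff:85740",
    "exterms:address",
    {
        "exterms:street": '"1501 Grant Avenue"',
        "exterms:city": '"Bedford"',
        "exterms:state": '"Massachusetts"',
        "exterms:Zip": '"01730"',
    },
    node_id="johnaddress",
)

for triple in john_address:
    print(" ".join(triple), ".")
```

The five-way relationship between John and the four address components becomes five binary statements, one per arc in Figure 5.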

Blank nodes also give us a way to more accurately make statements about resources that may not have URIs, but that are described in terms of relationships with other resources that do have URIs. For example, when making statements about a person, say Jane Smith, it may seem natural to use that person's email address as her URI, e.g., mailto:jane@example.org. However, this approach can cause a number of problems. One obvious problem is that Jane Smith's email address may change when she changes jobs, and so it may be hard to combine information about Jane recorded at different times. Another problem is that we may want to record information about Jane's mailbox (e.g., the server it is on) as well as about Jane herself (e.g., her current address), and using a URIref for Jane based on her email address makes it difficult to know which thing we're talking about. The same problem exists when a company's Web page URL, say http://www.example.com/, is used as the URI of the company itself. Once again, we may need to record information about the Web page (e.g., who created it and when) as well as about the company, and using http://www.example.com/ as an identifier for both makes it difficult to know which thing we're talking about.

The fundamental problem is that using Jane's email address as a stand-in for Jane isn't really accurate: Jane's email address identifies a mailbox, and Jane and her mailbox are not the same thing. When Jane herself doesn't have a URI, a blank node gives us a more accurate way of modeling this situation. We can represent Jane by a blank node, and give the blank node an exterms:emailaddress property having the URIref mailto:jane@example.org as its value. We can also assign the blank node an rdf:type property with a value of exterms:Person (we will discuss types in more detail in the following sections), an exterms:name property with a value of "Jane Smith", and any other descriptive information we might want to provide, as shown in the following triples:

_:jane  exterms:emailaddress   mailto:jane@example.org .
_:jane  rdf:type       exterms:Person .
_:jane  exterms:name   "Jane Smith" .
_:jane  exterms:empID  "23748" .
_:jane  exterms:age    "26" .

This says, accurately, that "there is a resource of type Person, whose electronic mailbox is identified by mailto:jane@example.org, whose name is 'Jane Smith', etc." That is, the blank node can be read as "there is a resource". Statements with that blank node as subject then provide information about the characteristics of that resource.

In practice, using blank nodes instead of URIrefs in these cases doesn't change the way we actually handle this kind of information very much. For example, if we know independently that an email address uniquely identifies someone at example.org (particularly if the address is unlikely to be reused), we can still use that fact to associate information about that person from multiple sources, even though the email address is not the person's URI. If we were to find another piece of RDF on the Web that describes a book and gives the author's contact information as the email address mailto:jane@example.org, we might reasonably conclude that the author's name is Jane Smith. The point is that saying something like "the author of the book is mailto:jane@example.org" is actually a shorthand for "the author of the book is someone whose email address is mailto:jane@example.org". Using a blank node to represent this "someone" is just a more accurate way to represent the real world situation. (Incidentally, some RDF-based schema languages allow specifying that certain properties are unique identifiers. This is discussed further in Section 5.5.)
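The association described above can be sketched in a few lines of Python. This is only an illustration, assuming triples are held as plain tuples and blank nodes are written as "_:label" strings; the exterms:authorContact property, the _:book node, and the helper names are hypothetical and do not come from any RDF vocabulary or library.

```python
# Triples as (subject, predicate, object) tuples; blank nodes are "_:label" strings.
MAILBOX = "mailto:jane@example.org"

# The description of Jane from the triples above (abridged).
staff_graph = [
    ("_:jane", "exterms:emailaddress", MAILBOX),
    ("_:jane", "rdf:type", "exterms:Person"),
    ("_:jane", "exterms:name", "Jane Smith"),
]

# A second, independent piece of RDF describing a book.
# exterms:authorContact is a hypothetical property used only for this sketch.
book_graph = [
    ("_:book", "exterms:authorContact", MAILBOX),
]


def names_for_mailbox(graph, mailbox):
    """Return the names of every resource whose exterms:emailaddress is mailbox."""
    owners = {s for (s, p, o) in graph if p == "exterms:emailaddress" and o == mailbox}
    return sorted(o for (s, p, o) in graph if s in owners and p == "exterms:name")


def author_names(books, people):
    """Look up author names by joining the two graphs on the mailbox URIref."""
    names = []
    for (s, p, o) in books:
        if p == "exterms:authorContact":
            names.extend(names_for_mailbox(people, o))
    return names
```

The join is made on the mailbox URIref, not on Jane's blank node, which is exactly the "someone whose email address is ..." reading described above. It is only justified if we know independently that the address identifies a single person.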

2.5. Typed Literals

In the last section, we described how to handle situations in which we needed to take property values represented by character string literals, and break them up into structured values that identify the individual parts of those property values. Using this approach, instead of, say, recording the date a Web page was created as a single exterms:creation-date property, with a single character string literal as its value, we could represent the value as a structure consisting of the month, day, and year as separate pieces of information. However, so far, we've followed the practice of representing any constant values that serve as objects in RDF statements by these simple untyped literals, even when we probably intend for the value of the property to be a number (e.g., the value of a year or age property) or some other kind of more specialized value.

For example, earlier in Figure 3, we illustrated an RDF graph recording information about John Smith. In that graph, we recorded the value of John Smith's exterms:age property as the literal "27", as shown in Figure 6:

Figure 6: Representing John Smith's Age

In this case, our hypothetical organization example.org probably intends for "27" to be interpreted as a number, rather than as the string consisting of the character "2" followed by the character "7". However, an application reading that literal "27" would only know how to do that if the application was explicitly given the information that the literal "27" was intended to represent a number, and knew which number the literal "27" was supposed to represent. The common practice in programming languages or database systems is to provide this kind of information by associating a datatype with the literal, in this case, a datatype like decimal or integer. An application that understands the datatype then knows, for example, whether the literal "10" is intended to represent the number ten, the number two, or the string consisting of the character "1" followed by the character "0", depending on whether the specified datatype is integer, binary, or string. In RDF, typed literals are used to provide this kind of information.

Using a typed literal, we could describe John Smith's age as being the integer number 27 using the triple:

<http://www.example.org/staffid/85740>  <http://www.example.org/terms/age> "27"^^<http://www.w3.org/2001/XMLSchema#integer> .

or, using our QName simplification for writing long URIs:

exstaff:85740  exterms:age  "27"^^xsd:integer .

or as shown in Figure 7:

Figure 7: A Typed Literal for John Smith's Age

Similarly, in the graph shown in Figure 2 describing information about a Web page, we recorded the value of the page's exterms:creation-date property as the character string literal "August 16, 1999". However, using a typed literal, we could describe the creation date of the Web page as being the date August 16, 1999, using the triple:

ex:index.html  exterms:creation-date  "1999-08-16"^^xsd:date .

or as shown in Figure 8:

Figure 8: A Typed Literal for a Web Page's Creation Date

As these examples illustrate, an RDF typed literal is formed by explicitly pairing a URIref identifying a particular datatype (in these examples, the datatypes integer and date from XML Schema Part 2: Datatypes [XML-SCHEMA2]) with a literal that the datatype uses to represent the intended value. In each case, this results in a single node in the RDF graph with the pair as its label.

Unlike typical programming languages and database systems, RDF has no built-in set of datatypes of its own, such as datatypes for integers, reals, strings, or dates. Instead, it relies on datatypes defined elsewhere that can be identified by a URIref. RDF typed literals simply provide a way to explicitly indicate, for a given literal, what datatype should be used to interpret it. As far as RDF is concerned, you can write any pair of URIref and literal you want as a typed literal. This gives RDF the flexibility to directly represent information coming from different sources without the need to perform type conversions between these sources and a native set of RDF datatypes. (Type conversions would still be required when moving information between systems with different datatype systems, but RDF would impose no extra conversions into and out of a native set of RDF types.)

The actual interpretation of a typed literal (determining the value it denotes) must be performed by an RDF processor that is programmed to "understand" that datatype. In particular, we've used XML Schema datatypes in the two examples we've just presented, and will be using XML Schema datatypes in most of our other examples as well (for one thing, XML Schema data types have URIrefs we can use to refer to them, specified in [XML-SCHEMA2]). XML Schema datatypes have a "first among equals" status in RDF. They are treated no differently than any other datatype, but they are expected to be the most widely used, and therefore the most likely to be interoperable among different software. As a result, it is expected that many RDF processors will be programmed to recognize these datatypes. However, RDF software could be programmed to process other sets of datatypes as well.
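As a rough illustration of what "understanding" a datatype involves, the following Python sketch maps a (literal, datatype URIref) pair to a value. It is a sketch under stated assumptions, not how any particular RDF processor works: the interpret function and the table of known datatypes are hypothetical, the binary datatype URIref is invented for the example, and Python's int() accepts some strings that xsd:integer does not.

```python
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical datatype URIref, invented for this sketch only.
EX_BINARY = "http://www.example.org/datatypes#binary"

# Datatypes this hypothetical processor has been programmed to understand.
# Each entry maps a literal (a string) to the value it denotes.
KNOWN_DATATYPES = {
    XSD + "integer": int,
    XSD + "string": str,
    XSD + "date": date.fromisoformat,
    EX_BINARY: lambda lexical: int(lexical, 2),
}


def interpret(lexical, datatype):
    """Return the value denoted by a typed literal, if the datatype is understood."""
    convert = KNOWN_DATATYPES.get(datatype)
    if convert is None:
        # Unknown datatype: all we can do is keep the pair as written.
        return (lexical, datatype)
    return convert(lexical)
```

With this table, interpret("10", XSD + "integer") gives the number ten, interpret("10", EX_BINARY) gives the number two, and interpret("10", XSD + "string") gives the two-character string, matching the distinction drawn earlier in this section.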

RDF datatype concepts borrow a conceptual framework from XML Schema datatypes [XML-SCHEMA2] to more precisely describe these datatype requirements. RDF's use of this framework is defined in RDF Concepts and Abstract Syntax [RDF-CONCEPTS].

@@Framework discussion removed@@

The flexibility provided by RDF typed literals comes at a price. For one thing, RDF has no way of knowing whether or not a URIref in a typed literal actually identifies a datatype. Moreover, even when a URIref does identify a datatype, RDF itself does not define the validity of pairing that datatype with a particular literal. This validity can only be determined by software built to understand that datatype. For example, you could write the triple:

exstaff:85740  exterms:age  "pumpkin"^^xsd:integer .

or the graph shown in Figure 9:

Figure 9: An Invalid Typed Literal for John Smith's Age

The typed literal in Figure 9 is valid RDF, but obviously an error as far as the xsd:integer datatype is concerned, since "pumpkin" is not defined as being in the lexical space of xsd:integer.

In general, RDF software may be called on to process RDF data that contains datatypes that it has not been programmed to understand, in which case there are some things the software will not be able to do. This includes recognizing whether or not a particular string represents a legal value for a particular datatype. In this case, RDF software not built to understand the xsd:integer datatype would not be able to recognize that "pumpkin" is not a valid xsd:integer.
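The difference between software that understands a datatype and software that does not can be sketched as follows. The check_literal function and its three-way result are assumptions made for this illustration; the pattern approximates the lexical space of xsd:integer (an optional sign followed by decimal digits).

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Lexical-space checks for the datatypes this hypothetical processor understands.
LEXICAL_CHECKS = {
    XSD_INTEGER: re.compile(r"[+-]?[0-9]+").fullmatch,
}


def check_literal(lexical, datatype):
    """Return True or False if the datatype is understood, or None if it is not."""
    check = LEXICAL_CHECKS.get(datatype)
    if check is None:
        # The software cannot tell whether the literal is legal for this datatype.
        return None
    return check(lexical) is not None
```

Here check_literal("pumpkin", XSD_INTEGER) is False, while the same call with a datatype missing from LEXICAL_CHECKS returns None: the triple is still valid RDF, but this software cannot say whether the literal is legal.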

2.6. Concepts Summary

Taken as a whole, RDF is simple: nodes-and-arcs diagrams interpreted as statements about things identified by URIrefs. This section has presented an introduction to these concepts. The normative (i.e., definitive) RDF specification defining these concepts is the RDF Concepts and Abstract Syntax [RDF-CONCEPTS], which should be consulted for further information. Together with the RDF Semantics [RDF-SEMANTICS], [RDF-CONCEPTS] defines the abstract syntax for RDF and its formal semantics (meaning). Additional background on the basic ideas underlying RDF, and its role in providing a general language for describing Web information, can be found in [WEBDATA].

@@Following para moved here from Concepts document sect 1.2 "Background reading", and needs to be fit in.@@

RDF draws upon ideas from knowledge representation, artificial intelligence and data management, including from Conceptual Graphs, logic-based knowledge representation, frames, and relational databases. Some possible sources of background information are [Sowa], [CG], [KIF], [Hayes], [Luger], [Gray].

@@Original text continues.@@

However, in addition to the basic techniques for representing RDF statements in diagrams (or triples) we've seen so far, it should be clear that we also need a way for people to define the vocabularies they intend to use in those statements, including:

  • defining types of things (like ex:Person)
  • defining properties (like ex:age and ex:creation-date), and
  • defining the types of things that can serve as the subjects or objects of statements involving those properties (such as specifying that the value of an ex:age property should always be an xsd:integer).

The basis for describing such vocabularies in RDF is the RDF Vocabulary Description Language 1.0: RDF Schema [RDF-VOCABULARY], which will be described in Section 5.

3. An XML Syntax for RDF: RDF/XML

As we described in Section 2, RDF's conceptual model is a graph. RDF provides an XML syntax for writing down and exchanging RDF graphs, called RDF/XML. Unlike triples, which are intended as a shorthand notation, RDF/XML is the normative syntax for writing RDF. RDF/XML is defined in the RDF/XML Syntax Specification [RDF-SYNTAX]. This section describes this RDF/XML syntax.

3.1. Basic Principles

We can illustrate the basic ideas behind the RDF/XML syntax using some of the examples we've presented already. Suppose we want to represent one of our initial statements:

http://www.example.org/index.html has a creation-date whose value is August 16, 1999

The RDF graph for this single statement, after assigning a URIref to the creation-date property, is shown in Figure 10:

with a triple representation of:

ex:index.html  exterms:creation-date  "August 16, 1999" .

Corresponding RDF/XML syntax for the graph in Figure 10 would be:

1. <?xml version="1.0"?>
2. <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.             xmlns:exterms="http://www.example.org/terms/">

4.   <rdf:Description rdf:about="http://www.example.org/index.html">
5.       <exterms:creation-date>August 16, 1999</exterms:creation-date>
6.   </rdf:Description>

7. </rdf:RDF>

(we have added line numbers to use in explaining the example).

This seems like a lot of overhead. We can understand better what is going on by considering each part of this XML in turn.

@@may be too much XML tutorial material in discussing lines 1, 2, and 3@@

Line 1, <?xml version="1.0"?>, is the XML declaration, which indicates that the following content is XML, and what version of XML it is.

Line 2 begins an rdf:RDF element. This indicates that the following XML content (starting here and ending with the </rdf:RDF> in Line 7) is intended to represent RDF. Following the rdf:RDF on this same line is an XML namespace declaration, represented as an xmlns attribute of the rdf:RDF start-tag. This declaration specifies that all tags in this content prefixed with rdf: are part of the namespace identified by the URIref http://www.w3.org/1999/02/22-rdf-syntax-ns#. This namespace is the source for the RDF-specific terms used in RDF/XML.

Line 3 specifies another XML namespace declaration, this time for the prefix exterms:. This is expressed as another xmlns attribute of the rdf:RDF element, and specifies that the namespace URIref http://www.example.org/terms/ is to be associated with the exterms: prefix. This namespace is the source for the specific terms defined by our example organization, example.org. The ">" at the end of line 3 indicates the end of the rdf:RDF start-tag. Lines 1-3 are general "housekeeping" necessary to indicate that we are defining RDF/XML content, and to identify the sources of the terms we are using.

Lines 4-6 provide the RDF/XML for the specific statement we're representing. An obvious way to talk about any RDF statement is to say it's a description, and that it's about the subject of the statement (in this case, about http://www.example.org/index.html), and this is the way RDF/XML represents the statement. The rdf:Description start tag in Line 4 indicates that we're starting a description of a resource, and goes on to identify the resource the statement is about (the subject of the statement) using the rdf:about attribute to specify the URIref of the subject resource. Line 5 provides a property element, with the QName <exterms:creation-date> as its tag, to hold the string literal August 16, 1999 that is the value of the creation-date property of the statement. It is nested within the containing rdf:Description element, indicating that this property applies to the resource specified in the rdf:about attribute of the rdf:Description element. The URIref of the creation-date property corresponding to the QName <exterms:creation-date> is obtained by appending the name creation-date to the URI of the exterms: prefix (http://www.example.org/terms/), giving http://www.example.org/terms/creation-date. Line 6 indicates the end of this particular rdf:Description element.

Finally, Line 7 indicates the end of the rdf:RDF element started on Line 2.

This example illustrates the basic ideas used by RDF/XML to encode an RDF graph as XML elements, attributes, element content, and attribute values. The URIref labels for properties and object nodes are written as XML QNames, consisting of a short prefix denoting a namespace URI, together with a local name denoting a namespace-qualified element or attribute, as described in Section 2.2. The (namespace URIref, local name) pair is chosen so that concatenating its two parts forms the original node URIref. The URIrefs of subject nodes are stored in XML attribute values. The nodes labeled by character string literals (which are always object nodes) become element text content or attribute values.
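The concatenation rule for QNames can be shown with a short Python sketch. The expand_qname helper and the NAMESPACES table are hypothetical names introduced for this illustration; the prefixes and namespace URIrefs are the ones declared in Lines 2 and 3 of the example.

```python
# Prefix-to-namespace bindings, as declared by the xmlns attributes in the example.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "exterms": "http://www.example.org/terms/",
}


def expand_qname(qname, namespaces=NAMESPACES):
    """Form a URIref by appending the local name to the prefix's namespace URIref."""
    prefix, local_name = qname.split(":", 1)
    return namespaces[prefix] + local_name
```

For instance, expand_qname("exterms:creation-date") yields http://www.example.org/terms/creation-date, the property URIref given above.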

@@In above, Section 2.2 now an appendix.@@

We could represent an RDF graph consisting of multiple statements in RDF/XML by using RDF/XML similar to Lines 4-6 in the previous example to separately represent each statement. For example, if we wanted to write the two statements:

ex:index.html  exterms:creation-date  "August 16, 1999" .
ex:index.html  exterms:language "English" .

we could write the RDF/XML as:

1.  <?xml version="1.0"?>
2.  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.              xmlns:exterms="http://www.example.org/terms/">

4.    <rdf:Description rdf:about="http://www.example.org/index.html">
5.        <exterms:creation-date>August 16, 1999</exterms:creation-date>
6.    </rdf:Description>

7.    <rdf:Description rdf:about="http://www.example.org/index.html">
8.        <exterms:language>English</exterms:language>
9.    </rdf:Description>

10. </rdf:RDF>

This is the same as our initial example, with the addition of lines 7-9, a second rdf:Description element to represent the second statement. We could represent an arbitrary number of additional statements in the same way, using a separate rdf:Description element for each additional statement. As this example illustrates, once the overhead of writing the XML and namespace declarations is dealt with, writing each additional RDF statement in RDF/XML is straightforward.

The RDF/XML syntax provides several abbreviations to make common uses easier to write. For example, it is typical for the same resource to be described with several properties and values at the same time, as in the example above. To handle this case, RDF/XML allows multiple property elements representing those properties to be nested within the rdf:Description element that identifies the subject resource. For example, if we wanted to represent our previous group of statements about http://www.example.org/index.html:

ex:index.html  dc:creator  exstaff:85740 .
ex:index.html  exterms:creation-date  "August 16, 1999" .
ex:index.html  exterms:language "English" .

whose graph (the same as Figure 2) is shown in Figure 11:

@@Syntax doc. Sec 2.3@@

Figure 11: Several Statements About the Same Resource

the RDF/XML syntax for the graph shown in Figure 11 could be written as:

@@Syntax doc. Sec 2.4@@

1.  <?xml version="1.0"?>
2.  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.              xmlns:dc="http://purl.org/dc/elements/1.1/"
4.              xmlns:exterms="http://www.example.org/terms/">

5.    <rdf:Description rdf:about="http://www.example.org/index.html">
6.         <exterms:creation-date>August 16, 1999</exterms:creation-date>
7.         <exterms:language>English</exterms:language>
8.         <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
9.    </rdf:Description>

10. </rdf:RDF>

(we have added line numbers again to use in explaining the example).

Compared with the previous two examples, we've added an additional namespace declaration (in Line 3), and an additional creator property element (in Line 8). In addition, we've nested the three property elements whose subject is http://www.example.org/index.html within a single rdf:Description element identifying that subject, rather than writing a separate rdf:Description element for each statement.

Line 8 also introduces a new form of property element. (The element tag also uses a different namespace prefix, the new namespace prefix dc: we defined in Line 3.) The exterms:language element in Line 7 is similar to the exterms:creation-date element we defined in the first example. Both these elements represent properties with character strings as property values, and such elements are specified by enclosing the character string within start- and end-tags corresponding to the property name. However, the dc:creator element on Line 8 represents a property whose value is another resource, rather than a character string. If we had written the URIref of this resource as a literal string within start- and end-tags in the same way as we wrote the literal values of the other elements, we would be saying that the value of the dc:creator element was the character string http://www.example.org/staffid/85740, rather than the resource identified by that string interpreted as a URIref. In order to indicate the difference, we've written the dc:creator element using what XML calls an empty element (it has no separate end tag), and defined the property value using an rdf:resource attribute within that empty element. The rdf:resource attribute indicates that its value is another resource, identified by its URIref. Because the URIref is being used as an attribute value, RDF/XML requires that we write out the full URIref, rather than abbreviating it as a QName, as we've done in writing element and attribute names.

It is important to understand that the RDF/XML in the example above is an abbreviation. The RDF/XML below, in which each statement is written separately, describes exactly the same RDF graph:

 <?xml version="1.0"?>
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:exterms="http://www.example.org/terms/">

   <rdf:Description rdf:about="http://www.example.org/index.html">
       <exterms:creation-date>August 16, 1999</exterms:creation-date>
   </rdf:Description>

   <rdf:Description rdf:about="http://www.example.org/index.html">
       <exterms:language>English</exterms:language>
   </rdf:Description>

   <rdf:Description rdf:about="http://www.example.org/index.html">
       <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
   </rdf:Description>

 </rdf:RDF>
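One way to convince ourselves that the abbreviated and unabbreviated forms describe the same graph is to extract the triples from each and compare them. The Python sketch below does this with the standard library's XML parser. It is not an RDF/XML parser: the triples helper is hypothetical and handles only the small subset of RDF/XML used so far (un-nested rdf:Description elements with rdf:about, and property elements with either literal content or an rdf:resource attribute). The XML declaration is omitted from the two documents for brevity.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PAGE = "http://www.example.org/index.html"

HEADER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:exterms="http://www.example.org/terms/">'
)

# The abbreviated form: three property elements nested in one rdf:Description.
abbreviated = HEADER + f"""
  <rdf:Description rdf:about="{PAGE}">
    <exterms:creation-date>August 16, 1999</exterms:creation-date>
    <exterms:language>English</exterms:language>
    <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
  </rdf:Description>
</rdf:RDF>"""

# The unabbreviated form: one rdf:Description per statement.
separate = HEADER + f"""
  <rdf:Description rdf:about="{PAGE}">
    <exterms:creation-date>August 16, 1999</exterms:creation-date>
  </rdf:Description>
  <rdf:Description rdf:about="{PAGE}">
    <exterms:language>English</exterms:language>
  </rdf:Description>
  <rdf:Description rdf:about="{PAGE}">
    <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdfxml):
    """Extract a set of triples from the simple RDF/XML subset described above."""
    result = set()
    for description in ET.fromstring(rdfxml).findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            # ElementTree reports tags as "{namespace}local"; joining the two
            # parts gives the property URIref.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                obj = ("uri", resource)
            else:
                obj = ("literal", prop.text or "")
            result.add((subject, predicate, obj))
    return result
```

Because the result is a set, triples(abbreviated) == triples(separate) holds: how statements are grouped into rdf:Description elements affects only the XML, not the graph.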

We will describe a few additional RDF/XML abbreviations in the following sections. However, you should consult [RDF-SYNTAX] for a more thorough description of the abbreviations that are available.

RDF/XML also allows us to represent graphs that include nodes that have no URIrefs, i.e., blank nodes. For example, Figure 12 (taken from [RDF-SYNTAX]) shows a graph saying "the document 'http://www.w3.org/TR/rdf-syntax-grammar' has a title 'RDF/XML Syntax Specification (Revised)' and has an editor, the editor has a name 'Dave Beckett' and a home page 'http://purl.org/net/dajobe/' ".

Figure 12: A Graph Containing a Blank Node

This illustrates an idea we discussed near the end of Section 2: the use of a blank node to represent something that does not have a URIref, but can be described in terms of other information. In this case, the blank node represents a person, the editor of the document, and the person is described by his name and home page.

RDF/XML provides several ways to represent blank nodes. These are described in [RDF-SYNTAX]. The approach we will illustrate here, and the most general approach, is to assign a blank node identifier (or bnodeID) to the blank node. A bnodeID serves to identify a blank node within a particular RDF/XML document but, unlike a URIref, is unknown outside the document in which it is assigned. A bnodeID is assigned to a blank node using an rdf:nodeID attribute in the rdf:Description element that describes the blank node. Using this approach, RDF/XML corresponding to Figure 12 could be written as follows:

@@Syntax doc. Sec 2.10@@

1.  <?xml version="1.0"?>
2.  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.              xmlns:dc="http://purl.org/dc/elements/1.1/"
4.              xmlns:exterms="http://example.org/stuff/1.0/">

5.     <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">
6.       <dc:title>RDF/XML Syntax Specification (Revised)</dc:title>
7.       <exterms:editor rdf:nodeID="abc"/>
8.     </rdf:Description>

9.     <rdf:Description rdf:nodeID="abc">
10.        <exterms:fullName>Dave Beckett</exterms:fullName>
11.        <exterms:homePage rdf:resource="http://purl.org/net/dajobe/"/>
12.    </rdf:Description>

13. </rdf:RDF>

In this example, the bnodeID is assigned to the blank node in Line 9, and used to reference it in Line 7. The advantage of using a bnodeID over some of the other approaches described in [RDF-SYNTAX] is that using a bnodeID allows the same blank node to be referred to in more than one place in the same RDF/XML document.

Finally, the typed literals we described in Section 2.5 may be used as property values instead of the character string literals we have used in the examples so far. A typed literal is represented in RDF/XML by adding an rdf:datatype attribute specifying a datatype URIref to the property element containing the literal.

@@Syntax doc. Sec 2.9@@

For example, to change the statement shown in Figure 10 to use a typed literal instead of a character literal for the creation-date property, the triple representation might be:

ex:index.html  exterms:creation-date  "1999-08-16"^^xsd:date .

and the corresponding RDF/XML syntax would be:

1. <?xml version="1.0"?>
2. <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.             xmlns:exterms="http://www.example.org/terms/">

4.   <rdf:Description rdf:about="http://www.example.org/index.html">
5.     <exterms:creation-date rdf:datatype=
         "http://www.w3.org/2001/XMLSchema#date">1999-08-16</exterms:creation-date>
6.   </rdf:Description>

7. </rdf:RDF>

In Line 5, a typed literal is given as the value of the exterms:creation-date property element by adding an rdf:datatype attribute to the element's start-tag to specify the datatype. The value of this attribute is the URIref of the datatype, in this case, the URIref of the XML Schema date datatype. Since this is an attribute value, the full URIref must be written out, rather than using the QName abbreviation xsd:date that we used in the triple. A literal appropriate to this datatype is then written as the element content, in this case, the literal 1999-08-16, which is the literal representation for August 16, 1999 in the XML Schema date datatype.

For the most part, we will continue to use XML-style (untyped) character literals in our examples. However, you should be aware that typed literals from appropriate datatypes, such as XML Schema datatypes, can always be used instead.

The facilities we have illustrated so far provide a simple but general way to serialize graphs in RDF/XML. Using these facilities, an RDF graph is written in RDF/XML as follows:

  • All blank nodes are assigned blank node identifiers.
  • Each node is listed in turn as the subject of an un-nested rdf:Description element, using an rdf:about attribute if the node has a URIref, or an rdf:nodeID attribute if the node is blank.
    For each triple with this node as subject, an appropriate property element is created, with either literal content (possibly empty), an rdf:resource attribute specifying the object of the triple (if the object node has a URIref), or an rdf:nodeID attribute specifying the object of the triple (if the object node is blank).
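The serialization steps listed above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not a complete serializer: the serialize and node_attribute helpers are hypothetical, blank nodes are assumed to have identifiers already (written "_:id"), properties are given as QNames whose prefixes appear in the namespace table, literals are written as ("literal", text) tuples, only nodes that occur as a subject get an rdf:Description element, and typed literals are not handled.

```python
from xml.sax.saxutils import escape, quoteattr


def node_attribute(node, uri_attribute):
    """Name a node with rdf:nodeID if it is blank ("_:id"), else with uri_attribute."""
    if node.startswith("_:"):
        return f"rdf:nodeID={quoteattr(node[2:])}"
    return f"{uri_attribute}={quoteattr(node)}"


def serialize(triples, namespaces):
    """Write (subject, property QName, object) triples as un-nested RDF/XML."""
    declarations = "".join(
        f" xmlns:{prefix}={quoteattr(uri)}" for prefix, uri in namespaces.items()
    )
    lines = [f"<rdf:RDF{declarations}>"]
    # Each subject node once, in first-seen order.
    subjects = list(dict.fromkeys(s for s, _, _ in triples))
    for subject in subjects:
        lines.append(f"  <rdf:Description {node_attribute(subject, 'rdf:about')}>")
        for s, prop, obj in triples:
            if s != subject:
                continue
            if isinstance(obj, tuple):
                # ("literal", text): the literal becomes element content.
                lines.append(f"    <{prop}>{escape(obj[1])}</{prop}>")
            else:
                # URIref or blank node object: an empty property element.
                lines.append(f"    <{prop} {node_attribute(obj, 'rdf:resource')}/>")
        lines.append("  </rdf:Description>")
    lines.append("</rdf:RDF>")
    return "\n".join(lines)


# The graph of Figure 12, with the blank node given the identifier "abc".
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "exterms": "http://example.org/stuff/1.0/",
}
DOCUMENT = "http://www.w3.org/TR/rdf-syntax-grammar"
FIGURE_12 = [
    (DOCUMENT, "dc:title", ("literal", "RDF/XML Syntax Specification (Revised)")),
    (DOCUMENT, "exterms:editor", "_:abc"),
    ("_:abc", "exterms:fullName", ("literal", "Dave Beckett")),
    ("_:abc", "exterms:homePage", "http://purl.org/net/dajobe/"),
]
```

Calling serialize(FIGURE_12, NAMESPACES) produces two rdf:Description elements, one identified with rdf:about and one with rdf:nodeID, in the same shape as the blank node example earlier in this section.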

Compared to some of the serialization approaches described in [RDF-SYNTAX], this simple serialization approach provides the most direct representation of the actual graph structure, and is particularly recommended for applications in which the output RDF/XML is to be used in further RDF processing.

3.2. Defining New RDF Resources

@@These aren't really *new* resources; look at Dave and Brian comments@@

So far, we've been describing resources that we imagine have been defined (and given URIrefs) already. For instance, in our initial examples, we've been providing descriptive information about example.org's web page, whose URIref was http://www.example.org/index.html. We referred to this resource (defined elsewhere) using an rdf:about attribute. However, obviously we also want to be able to introduce new resources. For example, suppose a company, example.com, wanted to provide an RDF-based catalog of its products as an RDF/XML document, identified by (and located at) http://www.example.com/2002/04/products. Within that resource, each product might be given a separate RDF description. This catalog, along with one of these descriptions (the catalog entry for a model of tent called the "Overnighter") might be written:

@@Syntax doc. Sec 2.14@@

1.   <?xml version="1.0"?>
2.   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.               xmlns:exterms="http://www.example.com/terms/">

4.     <rdf:Description rdf:ID="item10245">
5.          <exterms:model>Overnighter</exterms:model>
6.          <exterms:sleeps>2</exterms:sleeps>
7.          <exterms:weight>2.4</exterms:weight>
8.          <exterms:packedSize>14x56</exterms:packedSize>
9.     </rdf:Description>

  ...other product descriptions...

10.  </rdf:RDF>

(We've included the surrounding XML, RDF, and namespace information in lines 1 through 3, and line 10, but this information would only need to be defined once for the whole catalog, not repeated for each entry in the catalog).

This is similar to our previous examples in the way it represents the properties (model, sleeping capacity, weight) of the resource (the tent) being described. However, in line 4, the rdf:Description element has an rdf:ID attribute instead of an rdf:about attribute. Using rdf:ID indicates that we are using a fragment identifier, given by the value of the rdf:ID attribute ("item10245" in this case, which might be the catalog number used by example.com), as a shorthand for the complete URIref of the resource we want to describe. This fragment identifier item10245 will be interpreted relative to a base URI, in this case, the URI of the containing catalog. The full URIref for the tent is formed by taking the base URI (of the catalog), and appending #item10245 to it, giving the URIref http://www.example.com/2002/04/products#item10245.

The rdf:ID attribute is somewhat similar to the ID attribute in XML and HTML, in that it defines a label which can be used to refer to this resource. This label must be unique within the resource (in this case, the catalog) in which it is defined. Any other RDF within this catalog could refer to this resource (this particular catalog entry) by using the relative URIref #item10245 in an rdf:about attribute. This would be understood to refer to another resource defined within the catalog. We could also have introduced the URIref of the catalog entry itself by specifying rdf:about="#item10245" instead of rdf:ID="item10245" (i.e., by specifying the relative URIref directly). The two forms are essentially synonyms: the full URIref formed by RDF is the same in either case: http://www.example.com/2002/04/products#item10245.
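The way the two forms arrive at the same URIref can be checked with Python's standard URI resolution function. The two helper names below are hypothetical and exist only for this sketch; urljoin is the standard library's implementation of relative URI resolution.

```python
from urllib.parse import urljoin

# The URI of the catalog, which serves as the base URI by default.
CATALOG = "http://www.example.com/2002/04/products"


def uriref_from_id(base_uri, rdf_id):
    """rdf:ID="x" names the resource whose URIref is the base URI plus "#x"."""
    return urljoin(base_uri, "#" + rdf_id)


def uriref_from_about(base_uri, about):
    """rdf:about holds a URIref, which may be relative to the base URI."""
    return urljoin(base_uri, about)
```

Both uriref_from_id(CATALOG, "item10245") and uriref_from_about(CATALOG, "#item10245") give http://www.example.com/2002/04/products#item10245. An rdf:about value that is already an absolute URIref is returned unchanged.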

RDF located outside the catalog could refer to this catalog entry by using the full URIref, i.e., by concatenating the relative URIref #item10245 of the catalog entry to the base URI of the catalog, forming the absolute URIref http://www.example.com/2002/04/products#item10245. For example, an outdoor sports web site exampleRatings.com might use RDF to provide ratings of various tents. The (5-star) rating given to the tent we described earlier might then be represented on exampleRatings.com's web site as:

1.  <?xml version="1.0"?>
2.  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.              xmlns:sportex="http://www.exampleRatings.com/terms/">

4.    <rdf:Description rdf:about="http://www.example.com/2002/04/products#item10245">
5.         <sportex:ratingBy>Richard Roe</sportex:ratingBy>
6.         <sportex:numberStars>5</sportex:numberStars>
7.    </rdf:Description>
8.  </rdf:RDF>

In this example, line 4 uses an rdf:Description element with an rdf:about attribute whose value is the full URIref of the tent's catalog entry, defined by the earlier RDF description. The use of this URIref allows the tent being referred to in the rating to be precisely identified.

This example not only shows how new resources can be defined in RDF/XML; it also illustrates one of the basic architectural principles of the Web, which is that anyone should be able to say anything they want about existing resources [BERNERS-LEE98]. The example also illustrates that the RDF describing a particular resource does not need to be located all in one place; instead, it may be distributed throughout the web. This is true not only for examples like this one, in which one organization is rating or commenting on resources defined by another, but also for situations in which the original creator of a resource (or anyone else) wishes to amplify the description of that resource by providing additional information about it. This may be done either by modifying the original document in which the resource was defined, to add the properties and values needed to describe the additional information, or, as this example illustrates, by creating a separate document, and providing the additional properties and values in rdf:Description elements that refer to the original resource using rdf:about.

The previous example indicated that fragment identifiers such as #item10245 will be interpreted relative to a base URI. By default, this base URI would be the URI of the resource in which the fragment is used. However, in some cases it is desirable to be able to explicitly specify this base URI. For instance, suppose that in addition to the catalog located at http://www.example.com/2002/04/products, example.com wanted to provide a duplicate catalog on a mirror site, say at http://mirror.example.com/2002/04/products. This could create a problem, since if the catalog was retrieved from the mirror site, the URIref generated for our example tent would be http://mirror.example.com/2002/04/products#item10245, rather than http://www.example.com/2002/04/products#item10245, and hence apparently a different tent. To deal with this problem, RDF/XML supports XML Base [XML-BASE], which allows an XML document to specify a base URI other than the URI of the document itself. In this case, we would define the catalog as:

1.   <?xml version="1.0"?>
2.   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.               xmlns:exterms="http://www.example.com/terms/"
4.               xml:base="http://www.example.com/2002/04/products">

5.     <rdf:Description rdf:ID="item10245">
6.          <exterms:model>Overnighter</exterms:model>
7.          <exterms:sleeps>2</exterms:sleeps>
8.          <exterms:weight>2.4</exterms:weight>
9.          <exterms:packedSize>14x56</exterms:packedSize>
10.    </rdf:Description>

  ...other product descriptions...

11.  </rdf:RDF>

The xml:base declaration in line 4 specifies that the base URI for the content within the rdf:RDF element (until another xml:base attribute is specified) is http://www.example.com/2002/04/products, and all relative URIrefs cited within that content will be interpreted relative to that base, no matter where the actual content is located. As a result, the relative URIref of our tent, #item10245, will generate the same absolute URIref, http://www.example.com/2002/04/products#item10245, no matter where the catalog is located.
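The effect of the xml:base declaration can be checked with a few lines of Python. This is only a minimal sketch: it assumes that the standard library function urllib.parse.urljoin is an adequate stand-in for the URIref resolution an RDF/XML parser performs, and the helper name resolve_id is hypothetical.

```python
from urllib.parse import urljoin

# Base URI declared with xml:base in the catalog (line 4 of the example).
DECLARED_BASE = "http://www.example.com/2002/04/products"
# Location from which the mirror copy of the catalog is retrieved.
MIRROR_LOCATION = "http://mirror.example.com/2002/04/products"


def resolve_id(base_uri, rdf_id):
    """Resolve an rdf:ID value, treated as the relative URIref '#' + id."""
    return urljoin(base_uri, "#" + rdf_id)


# Without xml:base, the retrieval location acts as the base URI.
without_base = resolve_id(MIRROR_LOCATION, "item10245")
# With xml:base, the declared base is used wherever the document is located.
with_base = resolve_id(DECLARED_BASE, "item10245")

print(without_base)
print(with_base)
```

Only the second result matches the URIref that exampleRatings.com used in its rdf:about attribute, which is why the declaration matters for the mirror site.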

So far, we've been talking about a single product description, a particular model of tent, from example.com's catalog. However, example.com will probably offer several different models of tents, as well as multiple instances of other categories of products, such as backpacks, hiking boots, and so on. This idea of instances of things that can be classified into different kinds or categories is similar to the programming language concept of objects having different types or classes. RDF supports this concept by providing a predefined property, rdf:type. When an RDF resource is defined as having an rdf:type property, the value of that property is considered to be a resource that defines a category or class of things, and the original resource is considered to be an instance of that category or class. Using rdf:type, example.com might indicate that our product description is that of a tent as follows:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.com/terms/"
         xml:base="http://www.example.com/2002/04/products">

   <rdf:Description rdf:ID="item10245">
      <rdf:type rdf:resource="http://www.example.com/terms/Tent" />
      <exterms:model>Overnighter</exterms:model>
      <exterms:sleeps>2</exterms:sleeps>
      <exterms:weight>2.4</exterms:weight>
      <exterms:packedSize>14x56</exterms:packedSize>
   </rdf:Description>

  ...other product descriptions...

</rdf:RDF>

Note the use of the rdf:type property to indicate that the instance belongs to class Tent. In this case, we imagine that example.com has defined its classes as part of the same vocabulary that it uses to describe its other terms (such as the property exterms:weight), so we use the absolute URIref of the class to refer to it. If example.com had defined these classes in the product catalog itself, we could have used the relative URIref #Tent to refer to it.

RDF itself does not define a vocabulary for defining application-specific classes of things, like Tent in this example. Instead, such classes would be defined in an RDF Schema. The RDF Schema vocabulary is described in Section 5. Other vocabularies for defining classes can also be defined, such as the DAML+OIL and OWL languages described in Section 5.5. In addition, RDF defines several pre-defined types of its own for various purposes. These will be described in Section 4.

Since defining resources as instances of specific types is fairly common, the RDF/XML syntax provides a special abbreviation for instances defined as members of classes using the rdf:type property. In this abbreviation, the rdf:type property and value are removed, and the rdf:Description element name is replaced by the class name. An element written in this way is called a typed node. Using this abbreviation, example.com's tent from the example above could also be defined as:

@@Syntax doc. Sec 2.13; introduce term "typed node"@@

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.com/terms/"
         xml:base="http://www.example.com/2002/04/products">

   <exterms:Tent rdf:ID="item10245">
      <exterms:model>Overnighter</exterms:model>
      <exterms:sleeps>2</exterms:sleeps>
      <exterms:weight>2.4</exterms:weight>
      <exterms:packedSize>14x56</exterms:packedSize>
   </exterms:Tent>

  ...other product descriptions...

</rdf:RDF>

Both this abbreviation and the previous description of the tent (using the full <rdf:Description rdf:ID="item10245"> element) illustrate that RDF statements can be written in RDF/XML in a way that closely resembles the descriptions that might have been written directly in XML. This is an important consideration, given the increasing use of XML in all kinds of applications, since it suggests that RDF could be used in these applications without major changes in information structure being required, and that much deployed XML can be interpreted as RDF statements.
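The equivalence of the two forms can be illustrated with a short Python sketch. It is not an RDF/XML parser: it assumes every top-level element carries an rdf:ID attribute and that the rdf:RDF element declares xml:base, and it extracts only the rdf:type statements. The function name type_triples and the shortened documents are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML = "http://www.w3.org/XML/1998/namespace"

HEADER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:exterms="http://www.example.com/terms/" '
    'xml:base="http://www.example.com/2002/04/products">'
)

# Full form: rdf:Description with an explicit rdf:type property.
LONG_FORM = HEADER + """
  <rdf:Description rdf:ID="item10245">
    <rdf:type rdf:resource="http://www.example.com/terms/Tent"/>
    <exterms:model>Overnighter</exterms:model>
  </rdf:Description>
</rdf:RDF>"""

# Abbreviated form: the class name replaces rdf:Description.
ABBREVIATED_FORM = HEADER + """
  <exterms:Tent rdf:ID="item10245">
    <exterms:model>Overnighter</exterms:model>
  </exterms:Tent>
</rdf:RDF>"""


def type_triples(rdf_xml):
    """Return the set of (subject, rdf:type, class) triples in the document."""
    root = ET.fromstring(rdf_xml)
    base = root.get("{%s}base" % XML, "")
    triples = set()
    for node in root:
        subject = base + "#" + node.get("{%s}ID" % RDF)
        if node.tag == "{%s}Description" % RDF:
            # Explicit rdf:type property elements.
            for prop in node.findall("{%s}type" % RDF):
                triples.add((subject, RDF + "type", prop.get("{%s}resource" % RDF)))
        else:
            # The element name itself names the class.
            namespace, local_name = node.tag[1:].split("}")
            triples.add((subject, RDF + "type", namespace + local_name))
    return triples


print(type_triples(LONG_FORM) == type_triples(ABBREVIATED_FORM))
```

Under these assumptions both documents yield the same single rdf:type statement about the tent.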

@@Note often need to add an identifier to XML as subject; clarify how XML can be interpreted as RDF.@@

3.3. RDF/XML Summary

The examples above have illustrated some of the basic ideas behind the RDF/XML syntax. For a discussion of the basic principles behind the modeling of RDF statements in XML (known as striping), and other details and examples about writing RDF in XML, refer to the RDF/XML Syntax Specification [RDF-SYNTAX].

@@mention other abbreviations covered in syntax; more nested forms@@

4. Other RDF Capabilities

RDF provides a number of additional capabilities, including some built-in types and properties for representing groups of resources and RDF statements, and capabilities for deploying RDF information in the World Wide Web. These additional capabilities are described in the following sections.

4.1. RDF Containers

There is often a need to describe groups of things. For example, we might want to say that a book was created by several authors, or to list the students in a course, or the software modules in a package. RDF provides several pre-defined types and properties that can be used to describe such groups.

First, RDF provides three predefined types (together with some associated predefined properties) for describing containers. A container is a resource that contains things. The contained things are called members. The members of a container may be resources or literals. RDF defines three types of containers:

  • rdf:Bag
  • rdf:Seq
  • rdf:Alt

A Bag (a resource having type rdf:Bag) is a group of resources or literals, possibly including duplicate members, where there is no significance in the order of the members. For example, a Bag might be used to describe a group of part numbers in which the order of entry or processing of the part numbers does not matter.

A Sequence or Seq (a resource having type rdf:Seq) is a group of resources or literals, possibly including duplicate members, where the order of the members is significant. For example, a Sequence might be used to describe a group that must be maintained in alphabetical order.

An Alternative or Alt (a resource having type rdf:Alt) is a group of resources or literals that are alternatives (typically for a single value of a property). For example, an Alt might be used to describe alternative language translations for the title of a book, or to describe a list of alternative Internet sites at which a resource might be found. An application using a property whose value is an Alt container should be aware that it can choose any one of the members of the group as appropriate.

To describe a resource as being one of these types of containers, you give the resource an rdf:type property whose value is one of the pre-defined resources rdf:Bag, rdf:Seq, or rdf:Alt (whichever is appropriate). The container resource (which may either be a blank node or a resource with a URIref) denotes the group as a whole. The members of the container can be described by defining a container membership property for each member with the container resource as its subject and the member as its object. These membership properties have names of the form rdf:_n, where n is an integer, e.g., rdf:_1, rdf:_2, rdf:_3, and so on, and are used specifically for describing the members of containers. Container resources may also have other properties that describe the container, in addition to the container membership properties and the rdf:type property.

It is important to understand that while these types of containers are described using pre-defined RDF types and properties, any special meanings associated with these containers, e.g., that the members of an Alt container are alternative values, are only intended meanings. These specific container types, and their definitions, are provided with the aim of establishing a shared convention among those who need to describe groups of things. All RDF does is provide the types and properties that can be used to construct the RDF graphs to describe each type of container. RDF has no more built-in understanding of what a resource of type rdf:Bag is than it has of what a resource of type exterms:Tent (discussed in Section 3.2) is. In each case, applications must be written to behave according to the particular meaning involved for each type. This point will be expanded on in the following examples.

A typical use of a container is to indicate that the value of a property is a group of things. For example, to represent the sentence "Course 6.001 has the students Amy, Tim, John, Mary, and Sue", you could describe the course by giving it an s:students property whose value is a container of type rdf:Bag (the group of students) and then, using the container membership properties, describe the individual students as being members of that container, as in the RDF graph shown in Figure 14:

Figure 14: A Simple Bag Container

Since the value of the s:students property in this example is described as a Bag, there is no intended significance in the order given for the URIrefs of each student, even though the properties in the graph have integers in their names. It is up to applications creating and processing graphs that include rdf:Bag containers to ignore any (apparent) order in the names of the membership properties.

RDF/XML provides some special syntax and abbreviations to make it simpler to describe such containers. For example, the following RDF/XML describes the graph shown in Figure 14:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://example.edu/students/vocab#">

   <rdf:Description rdf:about="http://example.edu/courses/6.001">
      <s:students>
         <rdf:Bag>
            <rdf:li rdf:resource="http://example.edu/students/Amy"/>
            <rdf:li rdf:resource="http://example.edu/students/Tim"/>
            <rdf:li rdf:resource="http://example.edu/students/John"/>
            <rdf:li rdf:resource="http://example.edu/students/Mary"/>
            <rdf:li rdf:resource="http://example.edu/students/Sue"/>
         </rdf:Bag>
      </s:students>
   </rdf:Description>
</rdf:RDF>

Note that RDF/XML provides li as a convenience element to avoid having to explicitly number each membership property. The numbered properties rdf:_1, rdf:_2, and so on are generated from the li elements in forming the corresponding graph. The element name li was chosen to be mnemonic with the term "list item" from HTML. Note also the use of a <rdf:Bag> element within the <s:students> property element. The <rdf:Bag> element is an example of the typed node we saw earlier in Section 3.2, and is an abbreviation of an rdf:Description element, together with an rdf:type property, describing the Bag. Since no URIref is specified, the Bag is a blank node. Its nesting within the <s:students> property element is an abbreviated way of indicating that the blank node is the value of this property. These abbreviations are described further in [RDF-SYNTAX].
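The numbering of membership properties from li elements can be sketched in Python as follows. This is a simplified illustration rather than a full RDF/XML parser: it handles only rdf:li children that carry an rdf:resource attribute, uses a shortened Bag, and the function name membership_triples and the blank node label _:students are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A shortened version of the Bag from the example above.
BAG_XML = """<rdf:Bag xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:li rdf:resource="http://example.edu/students/Amy"/>
  <rdf:li rdf:resource="http://example.edu/students/Tim"/>
  <rdf:li rdf:resource="http://example.edu/students/John"/>
</rdf:Bag>"""


def membership_triples(container, label):
    """Turn each rdf:li child into a numbered rdf:_n membership triple."""
    triples = []
    counter = 0
    for child in container:
        if child.tag == "{%s}li" % RDF:
            counter += 1  # rdf:_1 for the first li, rdf:_2 for the second, ...
            member = child.get("{%s}resource" % RDF)
            triples.append((label, "rdf:_%d" % counter, member))
    return triples


bag = ET.fromstring(BAG_XML)
triples = membership_triples(bag, "_:students")
for subject, prop, member in triples:
    print(subject, prop, member)
```

The numbers come purely from document order, which is why an application processing a Bag should not read any significance into them.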

The graph structure for an rdf:Seq container, and the corresponding RDF/XML, are similar to those for an rdf:Bag (the only difference is in the type, rdf:Seq). Once again, although an rdf:Seq container is intended to describe a sequence, it is up to applications creating and processing the graph to appropriately interpret the sequence of integer-valued property names.

As an illustration of an Alt container, the sentence "The source code for X11 may be found at ftp.example.org, ftp.example1.org, or ftp.example2.org" could be expressed in the RDF graph shown in Figure 15:

Figure 15: A Simple Alt Container

The graph in Figure 15 could be written in RDF/XML as:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://example.org/packages/vocab#">

   <rdf:Description rdf:about="http://example.org/packages/X11">
      <s:DistributionSite>
         <rdf:Alt>
            <rdf:li rdf:resource="ftp://ftp.example.org"/>
            <rdf:li rdf:resource="ftp://ftp.example1.org"/>
            <rdf:li rdf:resource="ftp://ftp.example2.org"/>
         </rdf:Alt>
      </s:DistributionSite>
   </rdf:Description>
</rdf:RDF>

An Alt container is intended to have at least one member, identified by the property rdf:_1. This member is intended to be considered as the default or preferred value. Other than the member identified as rdf:_1, the order of the remaining elements is not significant.

The RDF in Figure 15 as written states simply that the value of the s:DistributionSite property is the Alt container resource itself. Any additional meaning that is to be read into this graph, e.g., that one of the members of the Alt container is to be considered as the value of the s:DistributionSite property, or that ftp://ftp.example.org is the default or preferred value, must be built into an application's understanding of how an Alt is intended to behave, and/or into the meaning defined for the particular property (s:DistributionSite in this case), which also must be understood by the application.
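As a rough illustration of what building this understanding into an application might look like, the following Python sketch reads the intended meaning of an Alt out of a list of triples. The tuple representation, the abbreviated name ex:X11 for the package resource, the blank node label _:sites, and the function name alternatives are all assumptions made for the example.

```python
ALT = "_:sites"  # hypothetical label for the blank node in Figure 15

GRAPH = [
    ("ex:X11", "s:DistributionSite", ALT),
    (ALT, "rdf:type", "rdf:Alt"),
    (ALT, "rdf:_1", "ftp://ftp.example.org"),
    (ALT, "rdf:_2", "ftp://ftp.example1.org"),
    (ALT, "rdf:_3", "ftp://ftp.example2.org"),
]


def alternatives(graph, subject, prop):
    """Return (preferred, others) for an Alt-valued property, else (None, [])."""
    for s, p, container in graph:
        is_alt = (container, "rdf:type", "rdf:Alt") in graph
        if s == subject and p == prop and is_alt:
            members = {
                p2: o
                for s2, p2, o in graph
                if s2 == container and p2.startswith("rdf:_")
            }
            # By convention, rdf:_1 identifies the default or preferred value.
            preferred = members.pop("rdf:_1", None)
            return preferred, sorted(members.values())
    return None, []


preferred, others = alternatives(GRAPH, "ex:X11", "s:DistributionSite")
print(preferred)
print(others)
```

Nothing in the graph itself forces this reading; the convention that rdf:_1 is the preferred member lives entirely in the application code.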

Alt containers are frequently used in conjunction with language tagging. For example, a work whose title has been translated into several languages might have its Title property pointing to an Alt container holding each of the language variants.

The distinction between the intended meanings of a Bag and an Alt can be further illustrated by considering the authorship of the book "Huckleberry Finn". The book has exactly one author, but the author has two names (Mark Twain and Samuel Clemens). Either name is sufficient to specify the author. Thus using an Alt container of the author's names more accurately represents the relationship than using a Bag (which might suggest there are two different authors).

Users are free to choose their own ways to describe groups of resources, rather than using the ones described here. These RDF containers are merely provided as common definitions that, if generally used, could help make data involving groups of resources more interoperable.

Sometimes there are clear alternatives to using these RDF container types. For example, a relationship between a particular resource and a group of other resources could be indicated by making the first resource the subject of multiple statements using the same property. This is structurally not the same as the resource being the subject of a single statement whose object is a container containing multiple members. In some cases, these two structures may have equivalent meaning, but in other cases they may not. The choice of which to use in a given situation should be made with this in mind.

Consider as an example the relationship between a writer and her publications. We might have the sentence:

Sue has written "Anthology of Time", "Zoological Reasoning", and "Gravitational Reflections".

In this case, there are three resources each of which was written independently by the same writer. This could be expressed using repeated properties as:

exstaff:Sue exterms:publication ex:AnthologyOfTime .
exstaff:Sue exterms:publication ex:ZoologicalReasoning .
exstaff:Sue exterms:publication ex:GravitationalReflections .

In this example there is no stated relationship between the publications other than that they were written by the same person. Each of the statements is an independent fact, and so using repeated properties would be a reasonable choice. However, this could just as reasonably be represented as a statement about the group of resources written by Sue:

exstaff:Sue exterms:publication _:z .
_:z rdf:type rdf:Bag .
_:z rdf:_1 ex:AnthologyOfTime .
_:z rdf:_2 ex:ZoologicalReasoning .
_:z rdf:_3 ex:GravitationalReflections .

On the other hand, the sentence:

The resolution was approved by the Rules Committee, having members Fred, Wilma, and Dino.

says that the committee as a whole approved the resolution; it does not necessarily state that each committee member individually voted in favor of the resolution. In this case, it would be potentially misleading to model this sentence as three separate exterms:approvedBy statements, one for each committee member, as shown below:

ex:resolution exterms:approvedBy ex:Fred .
ex:resolution exterms:approvedBy ex:Wilma .
ex:resolution exterms:approvedBy ex:Dino .

since these statements say that each member individually approved the resolution.

In this case, it would be better to model the sentence as a single exterms:approvedBy statement whose subject is the resolution and whose object is the committee itself. The committee resource could then be described as a Bag whose members are the members of the committee, as in the following triples:

ex:resolution exterms:approvedBy ex:rulesCommittee .
ex:rulesCommittee rdf:type rdf:Bag .
ex:rulesCommittee rdf:_1 ex:Fred .
ex:rulesCommittee rdf:_2 ex:Wilma .
ex:rulesCommittee rdf:_3 ex:Dino .

Finally, when using these RDF containers, it is important to understand that you are not constructing containers, as you would a programming language data structure; instead, you are describing containers (groups of things) that actually exist. For instance, in the Rules Committee example just given, the Rules Committee is an unordered group of people, whether you describe it in RDF that way or not. When you give the Rules Committee resource an rdf:type property whose value is rdf:Bag, you are simply describing the Rules Committee as having whatever characteristics you associate with things of type rdf:Bag, not constructing a particular data structure to hold the members of the group (you could indicate that the Rules Committee was a Bag without describing any members at all). Similarly, when you use the container membership properties, you are simply describing a container resource as having certain things as members. You are not necessarily saying that the things that you describe as members are the only members that exist. For example, the triples given above to describe the Rules Committee say only that Fred, Wilma, and Dino are members of the Bag, not that they are the only members of the Bag.
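The point that a description does not fix the complete membership can be sketched in Python by merging two sets of triples. The second graph, including the member ex:Barney, is purely hypothetical and is not part of the example above; the function name described_members is likewise an assumption.

```python
# Triples from the Rules Committee example above.
GRAPH_A = {
    ("ex:rulesCommittee", "rdf:type", "rdf:Bag"),
    ("ex:rulesCommittee", "rdf:_1", "ex:Fred"),
    ("ex:rulesCommittee", "rdf:_2", "ex:Wilma"),
    ("ex:rulesCommittee", "rdf:_3", "ex:Dino"),
}

# Hypothetical second graph, published somewhere else, describing the same Bag.
GRAPH_B = {
    ("ex:rulesCommittee", "rdf:_4", "ex:Barney"),
}


def described_members(graph, container):
    """Return the members that this graph describes for the container."""
    return {o for s, p, o in graph if s == container and p.startswith("rdf:_")}


members_a = described_members(GRAPH_A, "ex:rulesCommittee")
# Merging graphs is a set union of their triples.
members_merged = described_members(GRAPH_A | GRAPH_B, "ex:rulesCommittee")

print(sorted(members_a))
print(sorted(members_merged))
```

Nothing in the first graph is contradicted by the second; the merged graph simply describes one more member.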

4.2. RDF Collections

A limitation of the containers described in Section 4.1 is that there is no way to close them, i.e., to say "these are all the members of the container". This is because, while one graph may describe some of the members, there is no way to exclude the possibility that there is another graph somewhere that describes additional members. RDF provides support for describing groups containing only the specified members, in the form of RDF collections. An RDF collection is a group of things represented as a list structure in the RDF graph. This list structure is constructed using the predefined type rdf:List, the predefined properties rdf:first and rdf:rest, and the predefined resource rdf:nil.

To illustrate this, you could represent the sentence "The students in course 6.001 are Amy, Tim, and John" using the graph shown in Figure 16:

Figure 16: An RDF Collection (list structure)

For each member of the collection, such as s:Amy, there is a corresponding resource of type rdf:List. This list resource is linked to the collection member by an rdf:first property, and to the rest of the list by an rdf:rest property. The end of the list is indicated by an rdf:rest property whose value is the resource rdf:nil. This structure will be familiar to those who know the Lisp programming language. As in Lisp, the rdf:first and rdf:rest properties allow applications to traverse the structure.
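The list structure, and the way an application might traverse it, can be sketched in Python as follows. Triples are represented as plain tuples, the blank node labels _:l1, _:l2, and so on are hypothetical, and the function names collection_triples and traverse are assumptions made for this illustration.

```python
def collection_triples(members):
    """Return (head, triples) describing members as an rdf:List structure."""
    # One list node per member, followed by rdf:nil to close the list.
    nodes = ["_:l%d" % i for i in range(1, len(members) + 1)] + ["rdf:nil"]
    triples = []
    for i, member in enumerate(members):
        triples.append((nodes[i], "rdf:type", "rdf:List"))
        triples.append((nodes[i], "rdf:first", member))
        triples.append((nodes[i], "rdf:rest", nodes[i + 1]))
    return nodes[0], triples


def traverse(head, triples):
    """Follow rdf:first and rdf:rest from head until rdf:nil is reached."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    members = []
    node = head
    while node != "rdf:nil":
        members.append(first[node])
        node = rest[node]
    return members


head, triples = collection_triples(["s:Amy", "s:Tim", "s:John"])
for triple in triples:
    print(*triple)
print(traverse(head, triples))
```

Because the last node's rdf:rest is rdf:nil, a traversal knows when it has seen every member described by this list structure.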

RDF/XML provides a special notation to make it easier to describe collections. In RDF/XML, a collection is described by a property element that has the attribute rdf:parseType="Collection", and that contains a group of nested elements representing the members of the collection. The rdf:parseType="Collection" attribute indicates that the enclosed elements are to be used to create the corresponding list structure in the RDF graph.

To illustrate how t