RDF Primer

1. Introduction

The Resource Description Framework (RDF) is a general-purpose language for representing information in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, the copyright and syndication information about a Web document, the availability schedule for some shared resource, or the description of a Web user's preferences for information delivery. However, by generalizing the concept of a "Web resource", RDF can be used to represent information about anything that can be identified on the Web, such as information about items available from online shopping facilities (e.g., information about prices, publishers, and availability of books or recordings).

RDF provides a common framework for expressing this information in such a way that it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. Exchanging information between different applications means that the information may be made available to applications other than those for which it was originally created.

To make this discussion somewhat more concrete as soon as possible, the following is a small chunk of RDF in its XML serialization format.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns="http://www.w3.org/2000/10/swap/pim/contact#">
  <Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <mailbox rdf:resource="mailto:em@w3.org"/>
    <fullName>Eric Miller</fullName>
    <personalTitle>Semantic Web Activity Lead</personalTitle> 
  </Person>
</rdf:RDF>

This example roughly translates as a collection of statements "there is someone called Eric Miller, with the email address em@w3.org, and who is the Semantic Web Activity Lead". Note that the example contains what seem to be Web addresses, as well as some "properties" like "mailbox" and "fullName", and the values "em@w3.org", and "Eric Miller".
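As a rough illustration of how a program might pull these statements out of the XML, the following sketch uses Python's standard xml.etree.ElementTree module. It is not a conforming RDF parser: it handles only the shapes used in this one example, it ignores the statement implied by the Person element itself, and the helper names split_tag and read_statements are made up for this illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.w3.org/2000/10/swap/pim/contact#">
  <Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <mailbox rdf:resource="mailto:em@w3.org"/>
    <fullName>Eric Miller</fullName>
    <personalTitle>Semantic Web Activity Lead</personalTitle>
  </Person>
</rdf:RDF>"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    namespace, _, local = tag[1:].partition("}")
    return namespace, local


def read_statements(xml_text):
    """Return (subject, property URI, value) tuples for each described node."""
    statements = []
    root = ET.fromstring(xml_text)
    for node in root:
        subject = node.get("{%s}about" % RDF_NS)
        for prop in node:
            namespace, local = split_tag(prop.tag)
            # A property value is either a URI (rdf:resource) or literal text.
            value = prop.get("{%s}resource" % RDF_NS) or prop.text
            statements.append((subject, namespace + local, value))
    return statements


statements = read_statements(DOC)
for statement in statements:
    print(statement)
```

Each printed tuple pairs the URI of the person being described with the full URI of a property and that property's value.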

Like HTML, this form of information is machine processable, and links pieces of data across the Web. However, unlike conventional hypertext, RDF links can reference any identifiable things, including things that may or may not be Web-based data. The result is that in addition to describing Web pages, we can also convey information about cars, businesses, people, news events, etc. Further, RDF links themselves can be labeled, to indicate the kind of relationship that exists between the linked items.

The complete specification of RDF consists of a number of documents:

RDF Model Theory (and graph syntax)
RDF/XML syntax
RDF Schema (and datatypes)
RDF Test Cases
RDF Primer (this document)

This Primer is intended to augment the other parts of the RDF specification, to help information system designers and application developers understand the features of RDF, and how to use them. In particular, the Primer is intended to answer such questions as:

What information can RDF represent?
What does RDF look like?
How is RDF information created, accessed, and processed?
How can existing information be combined with RDF?

The Primer is a non-normative document, which means that it does not provide a definitive (from the W3C's point of view) specification of RDF. The examples and other explanatory material in this document are provided to help you understand RDF, but they may not always provide definitive or fully-complete answers. In such cases, you should refer to the relevant normative parts of the RDF specification. To help you do this, we provide links pointing to the relevant parts of the normative specifications.

2. Making Statements About Resources

RDF is intended to provide a simple way to state properties of (facts about) Web resources, e.g., Web pages. For example, imagine that we want to record the fact that someone named John Smith created a particular Web page. A straightforward way to state this fact in English would be in the form of a simple statement, e.g.:

http://www.example.org/index.html has a creator whose value is John Smith

This statement has three distinct parts (the URL, the word "creator", and the name "John Smith"), which illustrate that, in order to describe the properties of something, we need ways to name, or identify, a number of things:

We need a way to identify the thing we want to describe (the Web page, in this case)
We need a way to identify a specific property (the creator) of the thing that we want to describe
We need a way to identify the thing we want to assign as the value of this property (who the creator is), for the thing we want to describe

In this statement, we've used the Web page's URL (Uniform Resource Locator) to identify it. In addition, we've used the word "creator" to identify the property we want to talk about, and the two words "John Smith" to identify the thing (a person) we want to say is the value of this property.

We could state other properties of this Web page by writing additional English statements of the same general form, using the URL to identify the page, and words (or other expressions) to identify the properties and their values. For example, to specify the date the page was created, and the language in which the page is written, we could write the additional statements:

http://www.example.org/index.html has a creation-date whose value is August 16, 1999
http://www.example.org/index.html has a language whose value is English

(note the use of "August 16, 1999" to identify a date).

RDF is based on the idea that the things we want to describe have properties which have values, and that resources can be described by making statements, similar to those above, that specify those properties and values. RDF uses a particular terminology for talking about the various parts of statements. Specifically, the part that identifies the thing the statement is about (the Web page in this example) is called the subject. The part that identifies the property or characteristic of the subject that the statement specifies (creator, creation-date, or language in this case) is called the predicate, and the part that identifies the value of that property is called the object. So, taking the English statement

http://www.example.org/index.html has a creator whose value is John Smith

the RDF terms for the various parts of the statement are:

the subject is the URL http://www.example.org/index.html
the predicate is the word "creator"
the object is the words "John Smith"
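To make the three-part structure concrete, a statement can be sketched in a program as a simple record with three fields. The Statement type below is only an illustration introduced here (it is not part of RDF), and at this point the predicate and object are still plain words rather than machine-processable identifiers.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One statement: the thing described, the property, and the value."""
    subject: str
    predicate: str
    object: str


statement = Statement(
    subject="http://www.example.org/index.html",
    predicate="creator",
    object="John Smith",
)

# Reassemble the English form of the statement from its three parts.
sentence = "%s has a %s whose value is %s" % statement
print(sentence)
```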

However, while English is good for communicating between (English-speaking) humans, RDF is about making machine-processable statements. To make these kinds of statements suitable for processing by machines, we need two things:

a system of machine-processable identifiers that allows us to identify a subject, object, or predicate in a statement without any possibility of confusion with a similar-looking identifier that might be used by someone else on the Web.
a machine-processable format for representing these statements and exchanging them between machines.

Fortunately, the existing Web architecture provides us with both of the necessary mechanisms. The Web's Uniform Resource Identifier (URI) provides us with a way to uniquely identify anything we want to talk about in an RDF statement, and the Extensible Markup Language (XML) provides us with a format for representing and exchanging RDF statements. The next two sections briefly describe these mechanisms.

2.1 Identifiers: Uniform Resource Identifier (URI)

If we want to discuss something, we must first identify it. How else would anyone know what we are referring to? In everyday communication, identity is assigned in many ways: "Bob", "The Moon", "373 Whitaker Ave.", "California", "VIN 2745534", "today's weather", etc., and ambiguities are generally resolved in terms of a shared semantic context between the sender and the receiver. To identify "things" on the Web, we also use identifiers.

As we've seen, the Web already provides one form of identifier, the Uniform Resource Locator (URL). We used a URL in our original example to identify the Web page that John Smith created. A URL is a string that identifies a Web resource by representing its primary access mechanism (essentially, its network "location"). However, we would like to be able to record information about many things in addition to Web pages. In particular, we'd like to record information about lots of things that don't have URLs. For example, I don't have a URL, and yet my employer needs to record all sorts of things about me in order to pay my salary, keep track of the work that I've been doing, and so on. My doctor needs to record other sorts of things about me in order to keep track of my medical history, tests that have been performed (and the results, who performed them, and when), shots I've received, etc.

For many years, we've recorded information about lots of things that don't have URLs in files (both manual and automated), and the way we identify those things is by assigning them identifiers: values that we uniquely associate with the individual things. The identifiers we use to identify various kinds of things go by names like "Social Security Number", "Part Number", "license number", "employee number", "user-id", etc. In some cases, these identifiers (such as Social Security Numbers) are assigned by an official authority of some kind. In other cases, these identifiers are generated by a private organization or individual. In some cases, these identifiers have a national or international scope within which they are unique (a Social Security Number has national scope), while in other cases they may only be unique within a very limited scope (my employee number is only unique among the numbers assigned by my specific employer). Nevertheless, these identifiers serve, if used properly, to identify the things we want to talk about.

The Web provides its own form of identifier for these purposes, called the Uniform Resource Identifier (URI). URIs are similar to URLs, in that different persons or organizations can independently create them, and use them to identify things. However, unlike URLs, URIs are not limited to identifying things that have network locations, or use other computer access mechanisms. In fact, we can create a URI to refer to anything we want to talk about, including

network-accessible things, such as an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), or a collection of other resources.
things that are not network-accessible, such as human beings, corporations, and bound books in a library.
abstract concepts that don't physically exist, like the concept of a "creator".

URIs essentially constitute an infinite stock of names that can be used to identify things. No one person or organization controls who makes URIs or how they can be used. While some URI schemes (such as the http: scheme used by URLs) depend on centralized systems (such as DNS), other schemes (such as freenet:) are completely decentralized. This means that (as with any other kind of name), you don't need special authority or permission to create a URI for something, and you can create URIs for things you don't own (just as you can use whatever name you like for things you don't own in ordinary language). The URI is the foundation of the Web. While nearly every other part of the Web can be replaced, the URI cannot: it holds the Web together.
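The scheme is simply the part of a URI before the first colon. As a small sketch, Python's standard urllib.parse.urlsplit function can separate the scheme (and any fragment) from the rest of a URI; the three URIs used here are taken from the examples in this Primer.

```python
from urllib.parse import urlsplit

uris = [
    "http://www.example.org/index.html",
    "http://www.w3.org/People/EM/contact#me",
    "mailto:em@w3.org",
]

# Split each URI into its syntactic components.
parts = [urlsplit(uri) for uri in uris]
for uri, split in zip(uris, parts):
    print(uri, "-> scheme:", split.scheme, "fragment:", split.fragment or "(none)")
```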

Since the URI is such a general identification mechanism, capable of identifying anything, it should not be surprising that RDF uses URIs as its mechanism for identifying the subjects, objects, and predicates in statements. In fact, RDF defines a resource as anything that is identifiable by a URI, and hence using URIs allows RDF to describe practically anything, and to state relationships between such things as well. We'll see how this works just a bit further on. But before we do that, we need to introduce a way for RDF statements to be physically represented and exchanged.

@@Introduce additional URI schemes explicitly: URNs, mailto:; rather than just mentioning freenet. Cite the W3C page on addressing: http://www.w3.org/Addressing/, and on registered addressing schemes: http://www.w3.org/Addressing/schemes.html@@.

@@Also need to resolve where the fragment identifiers text goes. It logically goes here with the URI discussion, but the explanatory text uses N-Triples, which are not introduced until later.@@

2.2 Documents: Extensible Markup Language (XML)

XML was designed to allow anyone to design their own document format and then write a document in that format. Like HTML documents (Web pages), XML documents contain text. This text consists primarily of plain text content, and markup in the form of tags. This markup allows a processing program to interpret the various pieces of content (elements). In HTML, the set of permissible tags, and their interpretation, is defined by the HTML specification. However, XML allows users to define their own markup languages (tags and the structures in which they can appear) adapted to their own specific requirements. For example, the following is a simple passage marked up using an XML-based markup language:

<sentence><person href="http://example.com/#me">I</person> 
just got a new pet <animal>dog</animal>.</sentence>

Elements delimited by tags ("sentence", "person", etc.) are introduced to reflect a particular structure associated with the passage. These tags allow a program written with an understanding of these particular elements to properly interpret the passage.

This particular markup language uses the words "sentence," "person," and "animal" to attempt to convey meaning. These words do convey meaning to an English-speaking person reading the markup, or to a program specifically written to interpret this vocabulary. However, there is no built-in meaning here. For example, to non-English speakers, or to a program not written to understand this markup, the element "person" may mean absolutely nothing. Take the following for example:

<dfgre><reghh bjhb="http://example.com/#me">I</reghh> 
just got a new pet <yudis>dog</yudis>.</dfgre>

To a machine, this is the exact same structure as the previous example. However, it is no longer clear what is being said. Moreover, others may have used the same words in their own markup languages, but with completely different intended meanings. For example, "sentence" in another markup language might refer to the amount of time that a convicted criminal must serve in a penal institution. So additional mechanisms must be provided to help keep XML vocabulary straight.

To prevent confusion, it is necessary to uniquely identify markup elements. This is done in XML using XML Namespaces. A namespace is just a way of identifying a part of the Web (space) which acts as a qualifier for a specific set of names. A "namespace" is created for an XML markup language by creating a URI for it. By qualifying tag names with the URIs of their namespaces, anyone can create their own tags and properly distinguish them from tags created by others. A useful practice is to create a Web page to describe the markup language (and the intended meaning of the tags) and use the URL of that Web page as the URI for its namespace.

<my:sentence xmlns:my="http://example.org/xml/documents/">
   <my:person my:href="http://example.com/#me">I</my:person> 
just got a new pet <my:animal>dog</my:animal>.
</my:sentence>

Since everyone's tags have their own URIs, we don't have to worry about tag names conflicting. Two elements mean the same thing if they have the same URIs.
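A namespace-aware XML parser makes this qualification visible. In the sketch below, Python's standard xml.etree.ElementTree reports each element name in the form {namespace URI}local-name. The second namespace URI, http://example.org/xml/courts/, is a hypothetical one invented here to stand for someone else's "sentence" tag.

```python
import xml.etree.ElementTree as ET

DOC = """<my:sentence xmlns:my="http://example.org/xml/documents/">
   <my:person my:href="http://example.com/#me">I</my:person>
just got a new pet <my:animal>dog</my:animal>.
</my:sentence>"""

root = ET.fromstring(DOC)

# ElementTree expands every prefixed tag to '{namespace URI}local-name'.
qualified_names = [element.tag for element in root.iter()]
for name in qualified_names:
    print(name)

# Same local name, different (hypothetical) namespace URI: a different tag.
other = ET.fromstring('<s:sentence xmlns:s="http://example.org/xml/courts/"/>')
same_tag = root.tag == other.tag
print(same_tag)
```

The prefix itself (my: or s:) plays no part in the comparison; only the namespace URI and the local name do.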

RDF defines a specific XML markup language for use in writing down RDF information, and for exchanging it between machines. An example of this language was given in Section 1, and the language is described in Section 3.

@@Needs some brief additional explanation of the namespace mechanism, and how it's used.@@

2.3 The RDF Model

Now that we've introduced URIs for identifying things we want to talk about on the Web, and XML as a machine-processable way of representing RDF statements, we can describe how RDF lets us use URIs to make statements about resources. In the introduction, we said that RDF was based on the idea of expressing simple statements about resources, using subjects, predicates, and objects. In RDF, we could represent our original English statement:

http://www.example.org/index.html has a creator whose value is John Smith

by an RDF statement having:

a subject http://www.example.org/index.html
a predicate http://purl.org/dc/elements/1.1/creator
and an object http://www.example.org/staffid/85740

RDF models statements as nodes and arcs in a graph. In this notation, a statement is represented by a node for the subject, a node for the object, and a labeled arc between them for the predicate. So the RDF statement above would be represented by the graph:

Figure 1: A Simple RDF Statement

Collections of statements are represented by corresponding collections of nodes and arcs. So if we wanted to also represent the additional statements

http://www.example.org/index.html has a creation-date whose value is August 16, 1999
http://www.example.org/index.html has a language whose value is English

we could, introducing suitable URIs to name the properties "creation-date" and "language", use the following graph:

Figure 2: Several Statements About the Same Resource

This graph illustrates that RDF permits the objects in statements to be simple strings, if necessary to represent property values, as well as URIs. In drawing RDF graphs, nodes that represent URIs are shown as ellipses, while nodes that represent strings are shown as boxes. RDF graphs are technically "labeled directed graphs", since the arcs have labels, and are "directed" (point in a specific direction, from subject to object).

Sometimes it is not convenient to draw graphs, so an alternative way of writing down the statements, called N-Triples, can also be used. In the N-Triples notation, each statement in the graph is written as a simple triple of subject, predicate, and object, in that order. The N-Triples representing the above three statements would be written:

<http://www.example.org/index.html> 
<http://purl.org/dc/elements/1.1/creator> 
<http://www.example.org/staffid/85740> .

<http://www.example.org/index.html> 
<http://www.example.org/terms/creation-date> 
"August 16, 1999" .

<http://www.example.org/index.html> 
<http://www.example.org/terms/language> 
"English" .

Each triple corresponds to a single arc in the graph, complete with the arc's beginning and ending nodes (the subject and object of the statement). Unlike the drawn graph, the N-Triples notation requires that a node be separately identified for each statement it appears in. So, for example, http://www.example.org/index.html appears three times (once in each triple) in the N-Triples representation of the graph, but only once in the drawn graph.
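The relationship between the graph and its triples can be sketched in a few lines of Python. This is an illustration only: the Literal class and the to_ntriples function are hypothetical helpers introduced here, no escaping of special characters is attempted, and each triple is written on a single line.

```python
class Literal(str):
    """A plain string value; any other str in a triple is treated as a URI."""


def to_ntriples(triples):
    """Write each (subject, predicate, object) triple as one N-Triples line."""
    lines = []
    for subject, predicate, obj in triples:
        obj_text = '"%s"' % obj if isinstance(obj, Literal) else "<%s>" % obj
        lines.append("<%s> <%s> %s ." % (subject, predicate, obj_text))
    return lines


PAGE = "http://www.example.org/index.html"
triples = [
    (PAGE, "http://purl.org/dc/elements/1.1/creator",
     "http://www.example.org/staffid/85740"),
    (PAGE, "http://www.example.org/terms/creation-date", Literal("August 16, 1999")),
    (PAGE, "http://www.example.org/terms/language", Literal("English")),
]

lines = to_ntriples(triples)
for line in lines:
    print(line)

# In the drawn graph each distinct subject or object is a single node,
# while the triples mention the page once per statement.
nodes = {t[0] for t in triples} | {t[2] for t in triples}
mentions = sum(1 for t in triples if t[0] == PAGE)
print(len(nodes), "nodes;", mentions, "mentions of the page")
```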

These examples begin to illustrate some of the advantages of using URIs as RDF's basic way of identifying things. For instance, instead of identifying the creator of the Web page in our first example by the string "John Smith", we've assigned him a URI, in this case (using a URI based on his employee number) http://www.example.org/staffid/85740. An advantage of using a URI in this case is that we can be more precise in our identification. That is, the creator of the page isn't the string "John Smith", or any one of the thousands of people having "John Smith" as their name, but the particular John Smith associated with that URI (whoever created the URI defines the association). Moreover, since we have a URI for the creator of the page, it is a full-fledged resource, and we can record additional information about him, such as his name and age, as in the graph:

Figure 3: More Information about John Smith

The examples also illustrate that RDF uses URIs as predicates in RDF statements. That is, rather than using strings such as "creator" or "name" to identify properties, RDF uses URIs. Using URIs to identify properties is important for a number of reasons. First, it allows us to distinguish the properties we use from properties someone else may use that would otherwise be identified by the same text string. For instance, in our example, example.org uses "name" to mean someone's full name written out as a string (e.g., "John Smith"), but someone else may intend "name" to mean something different (e.g., the name of a variable in a piece of program text). A program encountering "name" as a property identifier on the Web wouldn't necessarily be able to distinguish these uses. However, if example.org writes http://www.example.org/terms/name for its "name" property, and the other person writes http://www.example.org/geneology/terms/name for hers, we can keep straight the fact that there are distinct properties involved (even if a program can't automatically determine the distinct meanings). Another reason why it is important to use URIs to identify properties is that it allows us to treat RDF properties as resources themselves. Since properties are resources, we can record descriptive information about them (e.g., the English description of what example.org means by "name"), simply by adding additional RDF statements with the property's URI as the subject.

Using URIs as subjects, objects, and predicates in RDF statements allows us to begin to develop and use a shared vocabulary on the Web, reflecting (and creating) a shared understanding of the concepts we talk about. For example, in the N-Triple

<http://www.example.org/index.html> 
<http://purl.org/dc/elements/1.1/creator> 
<http://www.example.org/staffid/85740> .

the predicate http://purl.org/dc/elements/1.1/creator is an unambiguous reference to the "creator" attribute in the Dublin Core metadata attribute set, a widely-used collection of attributes (properties) for describing information of all kinds. The writer of this triple is effectively saying that the relationship between the Web page (identified by http://www.example.org/index.html) and the creator of the page (a distinct person, identified by http://www.example.org/staffid/85740) is exactly the concept defined by http://purl.org/dc/elements/1.1/creator. Moreover, anyone else, or any program, that understands http://purl.org/dc/elements/1.1/creator will know exactly what is meant by this relationship.

As a result, RDF provides a way to make statements that applications can more easily process. Now an application can't actually "understand" such statements, of course, but it can deal with them in a way that makes it seem like it does. For example, a user could search the Web for all book reviews and create an average rating for each book. Then, the user could put that information back on the Web. Another web site could take that information (the list of book rating averages) and create a "Top Ten Highest Rated Books" page.

@@This discussion of machine-processability could use some further qualification and amplification.@@

RDF statements are similar to a number of other formats for recording information, such as:

entries in a simple record or catalog listing describing the resource in a data processing system.
rows in a simple relational database.
simple assertions in formal logic

and information in these formats can be treated as RDF statements, allowing RDF to be used as a unifying model for integrating data from many sources.
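As a sketch of this idea, one record (or one row of a table) can be turned into statements by treating the row's key as the subject, each column name as a property, and each cell as a value. The function row_to_triples is a hypothetical helper, and forming property URIs by appending column names to http://www.example.org/terms/ is an assumption made for this illustration.

```python
def row_to_triples(subject_uri, row, vocabulary):
    """Turn one record (column -> value) into (subject, predicate, value) triples."""
    return [(subject_uri, vocabulary + column, value) for column, value in row.items()]


row = {"creation-date": "August 16, 1999", "language": "English"}
triples = row_to_triples(
    "http://www.example.org/index.html",
    row,
    "http://www.example.org/terms/",
)
for triple in triples:
    print(triple)
```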

2.4 Structured Property Values

Things would be very simple if the only types of information we had to record about things were obviously in the form of the simple RDF statements we've illustrated so far. However, most real-world data involves structures that are more complicated than that, at least on the surface. For instance, in our original example, we recorded the date the Web page was created as a simple string value. However, suppose we wanted to record the month, day, and year as separate pieces of information? Or, in the case of John Smith's personal information, suppose we wanted to record his address. We might write the whole address out as a string, as in the N-Triple

<http://www.example.org/staffid/85740> 
<http://www.example.org/terms/address>
"1501 Grant Avenue, Bedford, Massachusetts 01730" .

However, suppose we wanted to use RDF to record the various pieces of information about his address as separate street, city, state, and Zip code values. How do we do this?

We can represent such structured information in RDF by considering the aggregate thing we want to talk about (like John Smith's address) as a separate resource, and then making separate statements about that new resource. So, in the RDF graph, in order to break up John Smith's address into its component parts, we create a new node to represent the concept of John Smith's address, and assign that concept a new URI to identify it, say http://www.example.org/addressid/85740. We then write RDF statements (create additional arcs and nodes) with that node as the subject, to represent the additional information, producing the graph below:

Figure 4: Breaking Up John's Address

or the N-Triples:

<http://www.example.org/staffid/85740> 
<http://www.example.org/terms/address> 
<http://www.example.org/addressid/85740> .

<http://www.example.org/addressid/85740> 
<http://www.example.org/terms/street> 
"1501 Grant Avenue" .

<http://www.example.org/addressid/85740> 
<http://www.example.org/terms/city> 
"Bedford" .

<http://www.example.org/addressid/85740> 
<http://www.example.org/terms/state> 
"Massachusetts" .

<http://www.example.org/addressid/85740> 
<http://www.example.org/terms/Zip> 
"01730" .

In the drawing of the graph above, the new URI we assigned to identify "John Smith's address" really serves no purpose, since we could just as easily have drawn the graph:

Figure 5: Using a Blank Node

In this drawing, which is a perfectly good RDF graph, we've used a node without a label to stand for the concept of "John Smith's address". This unlabeled node, or blank node, functions perfectly well in the drawing without needing a URI. However, we do need some form of explicit identifier for that node in order to represent this graph as N-Triples. To see this, we can try to write the N-Triples corresponding to what is shown in the drawn graph. What we would get would be something like:

<http://www.example.org/staffid/85740> 
<http://www.example.org/terms/address> 
??? .

??? 
<http://www.example.org/terms/street> 
"1501 Grant Avenue" .

??? 
<http://www.example.org/terms/city> 
"Bedford" .

??? 
<http://www.example.org/terms/state> 
"Massachusetts" .

??? 
<http://www.example.org/terms/Zip> 
"01730" .

where ??? stands for something that indicates the presence of the blank node. Since in a complex graph there might be more than one such blank node, we also need a way to differentiate between the various blank nodes in the corresponding N-Triples representation. To do this, the N-Triples notation uses a concept of node identifiers to identify blank nodes. These are temporary identifiers distinct from URIs (and having their own syntax in N-Triples) that are used to indicate the presence of blank nodes in the N-Triples representation. In this example, we might generate the node identifier _:johnaddress to refer to the blank node, in which case the resulting N-Triples might be:

<http://www.example.org/staffid/85740> 
<http://www.example.org/terms/address> 
_:johnaddress .

_:johnaddress 
<http://www.example.org/terms/street> 
"1501 Grant Avenue" .

_:johnaddress 
<http://www.example.org/terms/city> 
"Bedford" .

_:johnaddress 
<http://www.example.org/terms/state> 
"Massachusetts" .

_:johnaddress 
<http://www.example.org/terms/Zip> 
"01730" .

This is all there is to basic RDF: nodes-and-arcs diagrams interpreted as statements about concepts or digital resources represented by URIs. However, the need for standardized vocabularies for things like the properties "city" and "creator" is evident. The basis for such vocabularies in RDF is RDF Schema, which will be described in Section 4. Additional discussion of the basic ideas underlying the RDF data model, and its role in providing a general language for describing Web information, can be found in [WEBDATA].
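Returning to the blank node example, the following sketch shows how a program might generate a node identifier and then follow arcs through the blank node to reach the parts of the address. The helper names new_blank_node and values, and the _:b1 style of label, are assumptions made for this illustration; any label that is unique within the graph would serve.

```python
import itertools

_counter = itertools.count(1)


def new_blank_node():
    """Return a fresh node identifier of the form _:bN for use in this graph."""
    return "_:b%d" % next(_counter)


def values(graph, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


TERMS = "http://www.example.org/terms/"
john = "http://www.example.org/staffid/85740"
address = new_blank_node()

triples = [
    (john, TERMS + "address", address),
    (address, TERMS + "street", "1501 Grant Avenue"),
    (address, TERMS + "city", "Bedford"),
    (address, TERMS + "state", "Massachusetts"),
    (address, TERMS + "Zip", "01730"),
]

# Follow the address arc from John, then the city arc from the blank node.
(address_node,) = values(triples, john, TERMS + "address")
city = values(triples, address_node, TERMS + "city")
print(address_node, city)
```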

@@The intro to the idea of standardized vocabularies could be amplified a little.@@

@@Need to point out that blank nodes have only local scope, unlike URIs@@

@@Add an example showing a blank node as representing a Person without an assigned URI, but having local identifiers assigned to it as property values.@@

3. An XML Syntax for RDF

To summarize what we've said already, RDF models statements in terms of a graph consisting of nodes and arcs. The nodes can be labeled with URIs or string literals, or can be blank. The arcs connect the nodes, and all arcs are labeled with URIs. This graph is more precisely called a directed edge-labeled graph; each edge is an arc with a direction (an arrow) connecting two nodes. Each edge can also be described as a triple consisting of a subject node (at the blunt end of the arrow), a property arc, and an object node (at the sharp end of the arrow). The property arc is interpreted as an attribute, relationship, or predicate of the resource, with a value given by the content of the object node.

RDF also defines an XML syntax for writing down and exchanging RDF graphs. This syntax is defined in the RDF/XML Syntax Specification ([RDFXML]). In order to encode the graph in XML, the nodes and arcs are turned into XML elements, attributes, element content and attribute values. The URI labels for properties and object nodes are written in XML using XML Namespaces ([XML-NS]), which associate a namespace URI with a short prefix and allow element and attribute names (called local names) to be qualified by that namespace. Each (namespace URI, local name) pair is chosen such that concatenating the two forms the original node URI. The URIs labeling subject nodes are stored in XML attribute values. The nodes labeled by string literals (which are always object nodes) become element text content or attribute values.
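The concatenation rule can be sketched as follows. Splitting a URI after its last "#" or "/" is only a common convention assumed here, not something this Primer requires, and split_property_uri is a hypothetical helper; a real tool would also have to check that the local name is a legal XML name.

```python
def split_property_uri(uri):
    """Split a URI after its last '#' or '/' into (namespace URI, local name)."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


uri = "http://www.example.org/terms/creation-date"
namespace, local_name = split_property_uri(uri)
print(namespace, local_name)

# Concatenating the pair gives back the original URI.
rebuilt = namespace + local_name
print(rebuilt == uri)
```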

We can illustrate the basic ideas behind the RDF/XML syntax using some of the examples we've presented already. Suppose we want to represent one of our initial statements:

http://www.example.org/index.html has a creation-date whose value is August 16, 1999

The RDF graph for this statement, after assigning a URI to the creation-date property, is:

Figure 6: A Simple RDF Statement

with an N-Triple representation of:

<http://www.example.org/index.html> 
<http://www.example.org/terms/creation-date> 
"August 16, 1999" .

Corresponding RDF/XML syntax for this would be:

1. <?xml version="1.0"?>
2. <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.             xmlns:ex="http://www.example.org/terms/">

4.   <rdf:Description rdf:about="http://www.example.org/index.html">
5.       <ex:creation-date>August 16, 1999</ex:creation-date>
6.   </rdf:Description>
7. </rdf:RDF>

(we have added line numbers to use in explaining the example).

This seems like a lot of overhead. We can understand better what is going on by considering each part of this XML in turn.

Line 1, <?xml version="1.0"?>, is the XML declaration, which indicates that the following content is XML, and what version of XML it is.

Line 2 begins an rdf:RDF element. This indicates that the following XML content (starting here and ending with the </rdf:RDF> in Line 7) is intended to represent RDF. Following the rdf:RDF on this same line is an XML namespace declaration, represented as an xmlns attribute of the rdf:RDF start-tag. This declaration specifies that all tags in this content prefixed with rdf: are part of the namespace identified by the namespace name (a URI) http://www.w3.org/1999/02/22-rdf-syntax-ns#. This namespace is the source for the RDF-specific terms used in RDF/XML.

@@Not a URI with #; need to deal with this URI reference business@@

Line 3 specifies another XML namespace declaration, this time for the prefix ex:. This is expressed as another xmlns attribute of the rdf:RDF element, and specifies that the namespace name http://www.example.org/terms/ is to be associated with the ex: prefix. This namespace is the source for the specific terms defined by our example organization, example.org. The ">" at the end of line 3 indicates the end of the rdf:RDF start-tag. Lines 1-3 are general "housekeeping" necessary to indicate that we are defining RDF/XML content, and to identify the sources of the terms we are using.

Line 4 begins the RDF/XML for the specific statement we're representing. An obvious way to talk about this RDF statement is to say it's a description, and that it's about http://www.example.org/index.html. This is exactly the way the RDF/XML represents the statement. The rdf:Description element in Line 4 indicates that we're starting a description, and goes on to define the resource the statement is about (the subject of the statement) using the rdf:about attribute to specify the URI of the subject resource. The <ex:creation-date> element in Line 5 holds the value August 16, 1999 of the creation-date property of the statement. It is nested within the preceding rdf:Description element, indicating that this property applies to the resource specified in the containing rdf:Description element. An RDF processor would form the complete URI of the creation-date property by converting the ex: prefix to the namespace URI defined for it in Line 3, and appending creation-date to it.

Finally, Lines 6 and 7 indicate the ends of the rdf:Description and rdf:RDF elements, respectively.
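The way a processor recovers the statement from this XML can be sketched in a few lines of Python. This is only an illustration, not a conforming RDF/XML parser: it relies on the standard xml.etree.ElementTree module, which reports each qualified name as "{namespace URI}local name", and it handles only plain rdf:Description elements with string-valued properties. The helper names expand and simple_triples are our own and are not part of any RDF specification.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The RDF/XML from the example above, without the explanatory line numbers.
DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <ex:creation-date>August 16, 1999</ex:creation-date>
  </rdf:Description>
</rdf:RDF>"""


def expand(tag):
    """Concatenate the namespace URI and local name of an ElementTree tag."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def simple_triples(xml_text):
    """Yield (subject, property URI, string value) for each property element."""
    root = ET.fromstring(xml_text)
    for description in root.findall("{" + RDF_NS + "}Description"):
        subject = description.get("{" + RDF_NS + "}about")
        for prop in description:
            yield (subject, expand(prop.tag), prop.text)


for triple in simple_triples(DOC):
    print(triple)
```

The property URI produced for the ex:creation-date element is the namespace name declared for ex: with creation-date appended, which matches the property URI in the N-Triple representation given earlier.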

The RDF/XML syntax provides several abbreviations to make common uses easier to write. For example, it is typical for the same resource to be described with several properties and values at the same time. To handle this case, RDF/XML allows multiple child elements representing those properties to be nested within the rdf:Description element identifying the subject resource. For example, if we wanted to represent our previous collection of statements about http://www.example.org/index.html:

<http://www.example.org/index.html> 
<http://purl.org/dc/elements/1.1/creator> 
<http://www.example.org/staffid/85740> .

<http://www.example.org/index.html> 
<http://www.example.org/terms/creation-date> 
"August 16, 1999" .

<http://www.example.org/index.html> 
<http://www.example.org/terms/language> 
"English" .

whose graph (the same as Figure 2) is:

Figure 7: Several Statements About the Same Resource

the RDF/XML syntax for this would be:

1.  <?xml version="1.0"?>
2.  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.              xmlns:dc="http://purl.org/dc/elements/1.1/"
4.              xmlns:ex="http://www.example.org/terms/">

5.    <rdf:Description rdf:about="http://www.example.org/index.html">
6.         <ex:creation-date>August 16, 1999</ex:creation-date>
7.         <ex:language>English</ex:language>
8.         <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
9.    </rdf:Description>
10. </rdf:RDF>

(we have added line numbers again to use in explaining the example).

Compared with the previous example, we've added an additional namespace declaration (in Line 3), and two additional elements defining properties (in Lines 7 and 8). The ex:language element in Line 7 is similar to the ex:creation-date element we defined earlier. Both these elements represent properties with strings as property values, and such elements are specified by enclosing the string value within start- and end-tags corresponding to the property name. The dc:creator element on Line 8 illustrates the syntax used when the property value is another (existing) resource, rather than a string. In this case, the property is represented by what XML calls an empty element (it has no end tag), and the property value is defined by an rdf:resource attribute within that empty element. The rdf:resource attribute indicates that its value is another resource, identified by its URI. This element also uses a different namespace prefix, the new namespace prefix dc: we defined in Line 3.
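The difference between a string value and a resource value can be made concrete with a small Python sketch. It is a simplification under stated assumptions: it uses only the standard xml.etree.ElementTree module, handles only the two property forms discussed so far, and does no escaping of special characters in literals. The function name to_ntriples is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# The RDF/XML from the example above, without the explanatory line numbers.
DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <ex:creation-date>August 16, 1999</ex:creation-date>
    <ex:language>English</ex:language>
    <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
  </rdf:Description>
</rdf:RDF>"""


def to_ntriples(xml_text):
    """Return one N-Triples-style line per property element, in document order."""
    lines = []
    root = ET.fromstring(xml_text)
    for description in root.findall(RDF + "Description"):
        subject = description.get(RDF + "about")
        for prop in description:
            # "{namespace}local" becomes namespace + local.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get(RDF + "resource")
            if resource is not None:
                # An rdf:resource attribute means the value is another resource.
                obj = "<" + resource + ">"
            else:
                # Otherwise the element content is a string literal.
                obj = '"' + prop.text + '"'
            lines.append("<%s> <%s> %s ." % (subject, predicate, obj))
    return lines


for line in to_ntriples(DOC):
    print(line)
```

Apart from their order, the three lines produced correspond to the three statements listed before the example.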

So far, we've been describing resources that have been defined (and given URIs) already. For instance, in our initial examples, we've been providing descriptive information about example.org's web page, whose URI was http://www.example.org/index.html. We referred to this resource (defined elsewhere) using an rdf:about attribute. However, obviously we also want to be able to define new resources. For example, suppose a company, example.com, wanted to provide an RDF-based catalog of its products as an RDF/XML document, identified by http://www.example.com/2002/04/products. Within that resource, each product might be given a separate RDF description. An example of one of these descriptions (the catalog entry for a tent) might be:

1.   <?xml version="1.0"?>
2.   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.               xmlns:ex="http://www.example.com/terms/">

4.     <rdf:Description rdf:ID="10245">
5.          <ex:product>tent</ex:product>
6.          <ex:model>Overnighter</ex:model>
7.          <ex:sleeps>2</ex:sleeps>
8.          <ex:weightInKg>2.4</ex:weightInKg>
9.          <ex:packedSize>14x56</ex:packedSize>
10.    </rdf:Description>
11.  </rdf:RDF>

(We've included the surrounding XML, RDF, and namespace information in lines 1 through 3, and line 11, but this information would only need to be defined once for the whole catalog, not repeated for each entry in the catalog.)

This is similar to our previous examples in the way it represents the properties (model, sleeping capacity, weight) of the resource (the tent) being described. However, in line 4, the rdf:Description element has an rdf:ID attribute instead of an rdf:about attribute. Using rdf:ID indicates that we are describing a new resource, identified by the value of the rdf:ID attribute ("10245" in this case, which might be the catalog number used by example.com), rather than referring to an existing resource defined somewhere else. The rdf:ID attribute is somewhat similar to the ID attribute in XML and HTML, in that it defines a label which can be used to refer to this new resource. This label must be unique within the resource (in this case, the catalog) in which it is defined. Any other RDF within this catalog could refer to this new resource (this particular catalog entry) using the relative URI #10245. This would be understood to refer to another resource defined within the catalog.

RDF located outside the catalog could refer to this catalog entry by concatenating the relative URI #10245 of the catalog entry to the base URI of the catalog, forming the absolute URI http://www.example.com/2002/04/products#10245. For example, an outdoor sports web site exampleRatings.com might use RDF to provide ratings of various tents. The (5-star) rating given to the tent we described earlier might then be represented on exampleRatings.com's web site as:

1.  <?xml version="1.0"?>
2.  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.              xmlns:sportex="http://www.exampleRatings.com/terms/">

4.    <rdf:Description rdf:about="http://www.example.com/2002/04/products#10245">
5.         <sportex:ratingBy>Richard Roe</sportex:ratingBy>
6.         <sportex:numberStars>5</sportex:numberStars>
7.    </rdf:Description>
8.  </rdf:RDF>

In this example, line 4 uses an rdf:Description element with an rdf:about attribute, since it is referring to a resource defined somewhere else. The value of this attribute is the URI of the tent's catalog entry, defined in the earlier RDF description. The use of this URI allows the tent being referred to in the rating to be precisely identified.
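The relationship between the rdf:ID value used inside the catalog and the absolute URI used outside it can be sketched with Python's standard urllib.parse.urljoin function. The function name uri_for_id is our own, and we assume that the base URI of the catalog is simply the URI from which it is retrieved.

```python
from urllib.parse import urljoin

# Base URI of the catalog document, as given in the text.
CATALOG_BASE = "http://www.example.com/2002/04/products"


def uri_for_id(base_uri, rdf_id):
    """Resolve the relative URI '#' + rdf_id against the document's base URI."""
    return urljoin(base_uri, "#" + rdf_id)


tent_uri = uri_for_id(CATALOG_BASE, "10245")
print(tent_uri)
```

The resulting value is the URI that exampleRatings.com writes in its rdf:about attribute.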

This example not only shows how new resources can be defined in RDF/XML; it also illustrates one of the basic architectural principles of the Web, which is that anyone should be able to say anything they want about existing resources [BERNERS-LEE98]. The example also illustrates the fact that the RDF describing a particular resource does not need to be located all in one place; instead, it may be distributed throughout the web. This is true not only for examples like this one, in which one organization is rating or commenting on resources defined by another, but also for situations in which the original creator of a resource (or anyone else) wishes to amplify the description of that resource by providing additional information about it. This may be done either by modifying the original document in which the resource was defined, to add the properties and values needed to describe the additional information, or, as this example illustrates, by creating a separate document, and providing the additional properties and values in an rdf:Description element that refers to the original resource using rdf:about.

The RDF/XML syntax has many other capabilities. For example, the figure below (from the RDF/XML Syntax Specification) shows a graph saying "the document 'http://www.w3.org/TR/rdf-syntax-grammar' has a title 'RDF/XML Syntax Specification (Revised)' and has an editor, the editor has a name 'Dave Beckett' and a home page 'http://purl.org/net/dajobe/' ".

Figure 8: Graph for Another RDF/XML Example

In this case, the graph contains a blank node representing the editor (who apparently has not been given a URI). Some RDF/XML corresponding to this graph is:

1.  <?xml version="1.0"?>
2.  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.              xmlns:dc="http://purl.org/dc/elements/1.1/"
4.              xmlns:ex="http://example.org/stuff/1.0/">

5.     <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">
6.       <dc:title>RDF/XML Syntax Specification (Revised)</dc:title>
7.       <ex:editor rdf:parseType="Resource">
8.         <ex:fullName>Dave Beckett</ex:fullName>
9.         <ex:homePage rdf:resource="http://purl.org/net/dajobe/" />
10.      </ex:editor>
11.    </rdf:Description>
12. </rdf:RDF>

Much of this XML is similar to what we have seen before. What is new is in lines 7-10, which specify the blank node, and its properties and their values. Line 7 starts with ex:editor, indicating that it is defining the ex:editor property of the containing rdf:Description element (in Line 5). However, it uses an attribute rdf:parseType="Resource" to indicate that it is defining a new resource as the value of the ex:editor property. This resource is not given a name (there is no ID attribute), so it corresponds to a blank node. Within the ex:editor start and end tags (on lines 7 and 10), lines 8 and 9 define the ex:fullName and ex:homePage properties of this new resource, respectively. The end tag </ex:editor> on line 10 indicates the end of the information provided about this new resource.
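To see how the nested property elements turn into statements about a blank node, consider the following Python sketch. It is an illustration only: it uses the standard xml.etree.ElementTree module, supports just the constructs used in this example, and invents labels of the form _:b1 for blank nodes (the labels themselves carry no meaning and are our own convention here).

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# The RDF/XML from the example above, without the explanatory line numbers.
DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">
    <dc:title>RDF/XML Syntax Specification (Revised)</dc:title>
    <ex:editor rdf:parseType="Resource">
      <ex:fullName>Dave Beckett</ex:fullName>
      <ex:homePage rdf:resource="http://purl.org/net/dajobe/" />
    </ex:editor>
  </rdf:Description>
</rdf:RDF>"""


def triples(xml_text):
    """Return (subject, property, value) tuples, creating blank nodes as needed."""
    labels = itertools.count(1)
    found = []

    def describe(subject, element):
        for prop in element:
            predicate = prop.tag[1:].replace("}", "", 1)
            if prop.get(RDF + "parseType") == "Resource":
                # The value is a new, unnamed resource: a blank node.
                node = "_:b%d" % next(labels)
                found.append((subject, predicate, node))
                describe(node, prop)  # nested elements describe the blank node
            elif prop.get(RDF + "resource") is not None:
                found.append((subject, predicate, prop.get(RDF + "resource")))
            else:
                found.append((subject, predicate, prop.text))

    root = ET.fromstring(xml_text)
    for description in root.findall(RDF + "Description"):
        describe(description.get(RDF + "about"), description)
    return found


for triple in triples(DOC):
    print(triple)
```

The blank node appears once as the value of the ex:editor property and twice as the subject of the nested properties, which mirrors the shape of the graph.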

These examples have illustrated some of the basic ideas behind the RDF/XML syntax. For a discussion of the basic principles behind the modeling of RDF statements in XML (known as striping), and other abbreviations that can be used when writing RDF in XML, refer to the RDF/XML Syntax Specification.

4. Defining RDF Vocabularies: RDF Schema

RDF defines a simple data model for describing the properties of resources, and interrelationships among resources, in terms of named properties and values. However, RDF user communities also require the ability to specify that they are describing certain types or classes of resources, and which specific properties will be used to describe each of those types or classes. For example, the company example.com from our examples in Section 3 would want to define classes such as Tent, and properties such as model, weightInKg, and packedSize to describe them. Similarly, people interested in describing bibliographic resources would want to define classes such as Book or MagazineArticle, and describe them using properties such as author, title, and subject. Other applications might require defining classes such as Person and Company, and properties such as age, jobTitle, stockSymbol, and numberOfEmployees. The RDF data model itself provides no mechanisms for specifying these things. Instead, such classes and properties are defined in an RDF schema. The facilities for defining RDF schemas are specified in the RDF Schema Specification [RDFSCHEMA].

The RDF Schema specification does not specify a specific vocabulary of classes like Tent or Book, and properties like weightInKg or author. Instead, it specifies the mechanisms needed to define such classes and properties, and to control which classes and properties are used together (for example, you probably wouldn't want the property jobTitle to be used in the description of a Tent). In other words, the RDF Schema mechanism provides a basic type system for use in RDF models. The RDF Schema type system is somewhat similar to the type systems of object-oriented programming languages such as Java. For example, the RDF Schema type system allows resources to be defined as instances of one or more classes. In addition, it allows classes to be organized in a hierarchical fashion; for example a class Dog might be defined as a subclass of Mammal which is a subclass of Animal, meaning that any resource which is in class Dog is also considered to be in class Animal.

The RDF Schema specification uses the RDF data model itself to define the RDF type system, by providing a set of pre-defined RDF resources and properties that can be used to define user-specific classes and properties. These pre-defined RDF Schema resources effectively define the RDF Schema vocabulary, and become part of the RDF model of any description that uses them. We will illustrate these basic resources and properties in the following sections.

4.1. Defining Classes

A class in RDF Schema corresponds to the generic concept of a Type or Category, similar to the notion of a class in object-oriented programming languages such as Java. RDF classes can be defined to represent almost anything, such as Web pages, people, document types, databases or abstract concepts. Classes are defined using the pre-defined resources rdfs:Class and rdfs:Resource, and the pre-defined properties rdf:type and rdfs:subClassOf.

First of all, all things described in RDF are called resources, and are considered to be instances of the pre-defined class rdfs:Resource. As a result, rdfs:Resource is the most basic class in the RDF Schema type system.

The property rdf:type is used to indicate that a resource is a member of a class, and thus has all the characteristics that are to be expected of a member of that class. When a resource has an rdf:type property whose value is some specific class, we say that the resource is an instance of the specified class. The value of an rdf:type property for some resource is always another resource which is a class.

A class is a resource whose rdf:type property has a value which is the pre-defined resource rdfs:Class. So a new class, such as MotorVehicle, is defined by creating an RDF resource to represent the new class, and giving it an rdf:type property whose value is the pre-defined resource rdfs:Class.

The resource rdfs:Class itself has an rdf:type of rdfs:Class. Individual classes (for example, MotorVehicle) will always have an rdf:type property whose value is rdfs:Class (or some subclass of rdfs:Class, as described below). A resource may be an instance of more than one class.

A subset/superset relation between classes is defined using the pre-defined rdfs:subClassOf property. The rdfs:subClassOf property is transitive. This means that if class A is a subclass of some broader class B, and B is a subclass of C, then A is also implicitly a subclass of C. Consequently, resources that are instances of class A will also be instances of C, since A is a subset of both B and C. Only instances of rdfs:Class can have the rdfs:subClassOf property, and the property value is always a resource whose rdf:type is rdfs:Class. A class may be a subclass of more than one class.

The following example defines a simple class hierarchy. We first define a class MotorVehicle. We then define three subclasses of MotorVehicle, namely PassengerVehicle, Truck and Van. We then define a class MiniVan which is a subclass of both Van and PassengerVehicle.

Figure 9: A Simple Class Hierarchy

Some corresponding RDF/XML syntax, defining the new classes using the techniques for creating new resources described in Section 3, is shown below. Note the use of rdf:ID to assign names (relative URIs), such as MotorVehicle, to the new resources (classes in this case), which are then referred to in other class definitions within the same schema.

<rdf:RDF xml:lang="en"  
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"  
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<!-- Note: this RDF schema would typically be used in RDF instance data     
by referencing it with an XML namespace declaration, for example    
xmlns:xyz="http://www.w3.org/2000/03/example/vehicles#".  This allows    
us to use abbreviations such as xyz:MotorVehicle to refer    
unambiguously to the RDF class 'MotorVehicle'. -->

<rdf:Description rdf:ID="MotorVehicle">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
</rdf:Description>

<rdf:Description rdf:ID="PassengerVehicle">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Description>

<rdf:Description rdf:ID="Truck">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Description>

<rdf:Description rdf:ID="Van">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Description>

<rdf:Description rdf:ID="MiniVan">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="#Van"/>
  <rdfs:subClassOf rdf:resource="#PassengerVehicle"/>
</rdf:Description>

</rdf:RDF>
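Because rdfs:subClassOf is transitive, an application can work out every class that a given class implicitly belongs under by following the subclass links repeatedly. The Python sketch below does this for the hierarchy just defined. The dictionary is a hand-written stand-in for the schema (short names are used instead of full URIs), and the function name superclasses is our own.

```python
# Direct rdfs:subClassOf links from the schema above, using short names.
SUBCLASS_OF = {
    "MotorVehicle": {"Resource"},
    "PassengerVehicle": {"MotorVehicle"},
    "Truck": {"MotorVehicle"},
    "Van": {"MotorVehicle"},
    "MiniVan": {"Van", "PassengerVehicle"},
}


def superclasses(cls, subclass_of):
    """Return every class reachable from cls by following subclass links."""
    seen = set()
    pending = [cls]
    while pending:
        current = pending.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                pending.append(parent)
    return seen


print(sorted(superclasses("MiniVan", SUBCLASS_OF)))
```

An instance of MiniVan is therefore also an instance of Van, PassengerVehicle, MotorVehicle and Resource, even though only the first two are stated directly.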

4.2. Defining Properties

Properties are defined using the pre-defined resource rdf:Property, and the pre-defined properties rdfs:domain, rdfs:range, and rdfs:subPropertyOf.

All properties in RDF are defined as instances of the predefined class rdf:Property. A new property, such as weightInKg, is defined by creating an RDF resource to represent the new property, and giving it an rdf:type property whose value is the pre-defined resource rdf:Property.

RDF Schema also provides a mechanism for specifying simple constraints on the use of properties and classes in RDF data. The basic constraints are those that describe limitations on the types of values that are valid for some property, or on the classes to which it makes sense to assign such properties. Specifically:

A range constraint specifies that the value of a property should be a resource of a designated class. For example, a range constraint applied to the author property might specify that the value of an author property must be a resource of class Person.
A domain constraint specifies that a property may be used on resources of a certain class. For example, a domain constraint applied to the author property might specify that the author property could only originate from a resource that was an instance of class Book.

Domain and range constraints are specified using the predefined RDF properties rdfs:range and rdfs:domain.

The rdfs:range property is used to indicate the class(es) that the values of a property must be members of. The value of an rdfs:range property is always an rdfs:Class. Range constraints are only applied to properties.

A property may have zero, one, or more than one range property. If there is no range property, the class of the property value is unconstrained. If there is exactly one range property, the property value must be an instance of the specified class that is the value of the range property. If there is more than one range property, the property value must be an instance of all of the classes that are values of those range properties. For example, if we assert that property xyz:hasMother has both an rdfs:range of Female and an rdfs:range of Person, this means that any value of property xyz:hasMother must be both an instance of class Female and an instance of class Person.

The rdfs:domain property is used to indicate the class(es) on whose members some specified property can be used. As with the rdfs:range property, the value of an rdfs:domain property is always an rdfs:Class, and domain constraints are only applied to properties.

A property may have zero, one, or more than one domain property. If there is no domain property, the property may be used with any resource. If there is exactly one domain property, the property may only be used on resources that are instances of the specified class that is the value of the domain property. If there is more than one domain property, the property can only be used on resources that are instances of all of the classes that are values of those domain properties.
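The rules for zero, one, or several domain and range properties amount to a simple "instance of all listed classes" test. The following Python sketch shows one way a validating application might apply them; it is not behavior required by RDF Schema. We assume that the set of classes for each resource has already been worked out (including any implied superclasses), and the function name violations and the short class names are hypothetical.

```python
def violations(subject_classes, value_classes, domains=(), ranges=()):
    """Return a list of problems for one statement.

    An empty domains or ranges collection means "unconstrained".
    """
    problems = []
    # The subject must be an instance of every domain class.
    for cls in sorted(set(domains) - set(subject_classes)):
        problems.append("subject is not an instance of " + cls)
    # The value must be an instance of every range class.
    for cls in sorted(set(ranges) - set(value_classes)):
        problems.append("value is not an instance of " + cls)
    return problems


# xyz:hasMother has two range properties: Female and Person.
HAS_MOTHER_RANGES = {"Female", "Person"}

ok = violations({"Person"}, {"Female", "Person"}, ranges=HAS_MOTHER_RANGES)
bad = violations({"Person"}, {"Person"}, ranges=HAS_MOTHER_RANGES)
print(ok)
print(bad)
```

With no domain or range supplied, the function reports no problems, which corresponds to the unconstrained case described above.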

We can illustrate the use of these constraint properties by continuing with our earlier example of MotorVehicle. In this example, we define two properties: registeredTo and rearSeatLegRoom. The registeredTo property is applicable to any MotorVehicle and its value is a Person. For the sake of this example, rearSeatLegRoom only applies to instances of class PassengerVehicle. The value is a Number, which is the number of centimeters of rear seat legroom (we assume that the classes for Person and Number are defined elsewhere). These definitions are shown in the RDF/XML below:

<rdf:RDF xml:lang="en"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<rdf:Description rdf:ID="registeredTo">
  <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
  <rdfs:domain rdf:resource="#MotorVehicle"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/03/example/classes#Person"/>
</rdf:Description>

<rdf:Description rdf:ID="rearSeatLegRoom">
  <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
  <rdfs:domain rdf:resource="#PassengerVehicle"/> 
  <rdfs:range rdf:resource="http://www.w3.org/2000/03/example/classes#Number"/>
</rdf:Description>

</rdf:RDF>

As noted earlier, the RDF Schema type system is similar to the type systems of object-oriented programming languages such as Java. However, RDF differs from many programming language type systems in that instead of defining a class in terms of the properties its instances may have, an RDF schema defines properties in terms of the classes of resource to which they apply. The classes of resource to which properties apply are specified using domain and range constraints on the properties. For example, a classical object-oriented programming language might define a class Tent having an attribute called packedSize of type Literal. A corresponding RDF schema would define a class Tent, and a property packedSize having a domain of Tent and a range of Literal.

The difference between these approaches may not be immediately obvious, but it can be significant. In the programming language class definition, the attribute packedSize is part of the definition of class Tent, and applies only to instances of class Tent. Another class might also have an attribute called packedSize, but this would be considered a separate attribute. In other words, the scope of attribute definitions in most programming languages is restricted to the type in which they are defined. In RDF, on the other hand, property definitions are, by default, independent of class definitions, and have, by default, global scope (although they may optionally be restricted to apply only to certain classes using domain constraints). So, for example, an RDF schema could define a property packedSize without a domain constraint. This property could then be used to describe instances of any class that might be considered to have a packed size. One benefit of the RDF property-based approach is that it becomes easier to extend the use of property definitions to situations that might not have been anticipated by the original definer, provided the properties have not been made overly specific by domain constraints. (Of course, this is a "benefit" which must be used with care, to ensure that properties are not mis-applied in inappropriate situations.)

Although RDF Schema provides a mechanism for describing constraints such as domain and range constraints, it does not say whether or how an application must process the constraint information. For example, while an RDF schema might assert that an author property is used to indicate resources that are members of the class Person, it does not say whether or how an application should act in processing that class information. Different applications might use these constraints in different ways - e.g., a validator will look for errors, an interactive editor might suggest legal values, and a reasoning application might infer the class from other information and then announce any inconsistencies.

Now that we've shown how to define classes and properties using RDF Schema, we can see what instances corresponding to those definitions might look like. For example, the following is an instance of the PassengerVehicle class we defined above (which we assume is being defined in the same document as the schema), together with some hypothetical values for its registeredTo and rearSeatLegRoom properties. Note the use of the rdf:type property to indicate its class membership. Also note how we can apply a registeredTo property to this instance of PassengerVehicle, because PassengerVehicle is a subclass of MotorVehicle.

  <?xml version="1.0"?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:xyz="http://www.w3.org/2000/03/example/vehicles#">

    <rdf:Description rdf:ID="johnSmithsCar">
         <rdf:type rdf:resource="#PassengerVehicle"/>
         <xyz:registeredTo rdf:resource="http://www.example.org/staffid/85740"/>
         <xyz:rearSeatLegRoom>127</xyz:rearSeatLegRoom>
    </rdf:Description>
  </rdf:RDF>

The RDF/XML syntax provides a special abbreviation for instances defined as members of classes using the rdf:type property. In this abbreviation, the rdf:type property and value are removed, and the rdf:Description element name is replaced by the class name. Using this abbreviation, John's car from the example above could also be defined as:

  <?xml version="1.0"?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:xyz="http://www.w3.org/2000/03/example/vehicles#">

    <xyz:PassengerVehicle rdf:ID="johnSmithsCar">
         <xyz:registeredTo rdf:resource="http://www.example.org/staffid/85740"/>
         <xyz:rearSeatLegRoom>127</xyz:rearSeatLegRoom>
    </xyz:PassengerVehicle>
  </rdf:RDF>
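One way to convince ourselves that the abbreviated form says the same thing as the long form is to extract the statements from both and compare them. The Python sketch below does this with the standard xml.etree.ElementTree and urllib.parse modules. It is a simplification that supports only the constructs used in these two examples and does not distinguish literal values from resource values. It also assumes that both documents have the base URI http://www.w3.org/2000/03/example/vehicles (the schema's namespace name without the trailing #), so that the relative URI #PassengerVehicle and the element name xyz:PassengerVehicle identify the same class.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF = "{" + RDF_NS + "}"
BASE = "http://www.w3.org/2000/03/example/vehicles"  # assumed base URI

LONG_FORM = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xyz="http://www.w3.org/2000/03/example/vehicles#">
  <rdf:Description rdf:ID="johnSmithsCar">
    <rdf:type rdf:resource="#PassengerVehicle"/>
    <xyz:registeredTo rdf:resource="http://www.example.org/staffid/85740"/>
    <xyz:rearSeatLegRoom>127</xyz:rearSeatLegRoom>
  </rdf:Description>
</rdf:RDF>"""

SHORT_FORM = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xyz="http://www.w3.org/2000/03/example/vehicles#">
  <xyz:PassengerVehicle rdf:ID="johnSmithsCar">
    <xyz:registeredTo rdf:resource="http://www.example.org/staffid/85740"/>
    <xyz:rearSeatLegRoom>127</xyz:rearSeatLegRoom>
  </xyz:PassengerVehicle>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' form into namespace + local."""
    return tag[1:].replace("}", "", 1)


def triples(xml_text, base):
    """Return the set of (subject, property, value) tuples in the document."""
    found = set()
    for node in ET.fromstring(xml_text):
        subject = urljoin(base, "#" + node.get(RDF + "ID"))
        if node.tag != RDF + "Description":
            # Abbreviated form: the element name itself supplies the rdf:type.
            found.add((subject, RDF_NS + "type", uri(node.tag)))
        for prop in node:
            resource = prop.get(RDF + "resource")
            value = urljoin(base, resource) if resource is not None else prop.text
            found.add((subject, uri(prop.tag), value))
    return found


print(triples(LONG_FORM, BASE) == triples(SHORT_FORM, BASE))
```

Under the stated assumption about the base URI, both documents yield the same three statements about John's car.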

RDF Schema provides a way to specialize properties as well as classes. The property rdfs:subPropertyOf can be used to specify that one property is a specialization of another. A property may be a specialization of zero, one or more properties. If some property P2 is an rdfs:subPropertyOf another more general property P1, and if a resource A has a P2 property with a value B, this implies that the resource A also has a P1 property with value B. All rdfs:range and rdfs:domain properties that apply to an RDF property also apply to each of its sub-properties.

As an example, if the property biologicalFather is a subproperty of the broader property biologicalParent, and if Fred is the biologicalFather of John, then it is implied that Fred is also the biologicalParent of John.

<rdf:RDF xml:lang="en"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<rdf:Description rdf:ID="biologicalParent">
  <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
</rdf:Description>

<rdf:Description rdf:ID="biologicalFather">
  <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
  <rdfs:subPropertyOf rdf:resource="#biologicalParent"/>
</rdf:Description>

</rdf:RDF>
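The implication carried by rdfs:subPropertyOf can be sketched as a small inference loop in Python. This is only an illustration of the rule, using short names instead of URIs. We read "Fred is the biologicalFather of John" as the statement that the resource John has a biologicalFather property whose value is Fred; the function name infer is our own.

```python
# Direct rdfs:subPropertyOf links from the schema above, using short names.
SUB_PROPERTY_OF = {"biologicalFather": {"biologicalParent"}}


def infer(statements, sub_property_of):
    """Add (A, P1, B) for every (A, P2, B) where P2 is a subproperty of P1."""
    known = set(statements)
    changed = True
    while changed:  # repeat so that chains of subproperties are followed
        changed = False
        for subject, prop, value in list(known):
            for general in sub_property_of.get(prop, ()):
                implied = (subject, general, value)
                if implied not in known:
                    known.add(implied)
                    changed = True
    return known


stated = {("John", "biologicalFather", "Fred")}
print(sorted(infer(stated, SUB_PROPERTY_OF)))
```

The result contains both the original statement and the implied biologicalParent statement.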

4.3. Other Schema Information

RDF Schema also defines a number of other properties, which can be used to provide documentation and other information about an RDF schema or about instances. For example the rdfs:comment property can be used to provide a human-readable description of a resource. The rdfs:label property can be used to provide a more human-readable version of a resource's name. The rdfs:seeAlso property can be used to indicate a resource that might provide additional information about the subject resource. The rdfs:isDefinedBy property is a subproperty of rdfs:seeAlso, and can be used to indicate the resource that defines the subject resource. For further discussion of the use of these properties, you should consult the RDF Schema Specification.

4.4. Richer Schema Languages

RDF Schema provides basic capabilities for defining RDF vocabularies, but additional capabilities are also possible, and can be useful. These capabilities may be provided through further development of RDF Schema, or in other languages. For example, there is currently no way in RDF to indicate that the value of a Person's "age" property must be an integer (number of years); all literal values in RDF are currently strings (although an application is free to interpret a string "25" given as the value of an "age" property as being a number, RDF itself does not define this as the proper interpretation, or enforce this as a constraint when entering the value). An RDF Datatyping specification is currently under development, and may become part of the RDF specifications at some future time.

Other richer schema capabilities have also been identified as useful. For example:

cardinality constraints on properties, e.g., that a Person has exactly one father.
that a given property (such as hasAncestor) is transitive, e.g., that if A hasAncestor B, and B hasAncestor C, then A hasAncestor C.
that two different classes, defined in separate schemas, actually represent the same concept.
that two different instances, defined separately, actually represent the same individual.
the ability to define new classes in terms of combinations (e.g., unions and intersections) of other classes.

The additional capabilities mentioned above, in addition to others, are the targets of ontology languages such as DAML+OIL, and the language currently being developed by the W3C's Web-Ontology Working Group. Both these languages are based on RDF and RDF Schema (and DAML+OIL currently provides all the additional capabilities mentioned above). The intent of such languages is to provide additional machine-processable semantics for resources, that is, to make the machine representations of resources more closely resemble their intended real world counterparts. While such capabilities are not necessarily needed to build useful applications using RDF (see Section 6 for a description of several RDF applications), the development of such languages is a very active subject of work as part of the development of the Semantic Web.

5. RDF Containers

There is often a need to represent collections of things. For example, we might want to say that a book was created by several authors, or to list the students in a course, or the software modules in a package. RDF provides several pre-defined container types that can be used to do this.

A Bag (type rdf:Bag) is an unordered collection of resources or literals. A Bag is used to represent a collection that has multiple values, and there is no significance to the order in which the values are given. For example, a Bag might be used to represent a collection of part numbers in which the order of entry or processing of the part numbers does not matter. A Bag can contain duplicate values.

A Sequence (type rdf:Seq) is an ordered collection of resources or literals. A Sequence is used to represent a collection that has multiple values, and the order of the values is significant. For example, a Sequence might be used to represent a collection that must be maintained in alphabetical order. A Sequence can contain duplicate values.

An Alternative (type rdf:Alt) is a collection of resources or literals that represent alternative values (typically for a single value of a property). For example, an Alternative might be used to specify alternative language translations for the title of a book, or to provide a list of alternative Internet sites at which a resource might be found. An application using a property whose value is an Alternative collection should be aware that it can choose any one of the items in the collection as appropriate.

To represent a specific instance of one of these types of collections, you create a new resource, and give it an rdf:type property whose value is one of the pre-defined resources rdf:Bag, rdf:Seq, or rdf:Alt (whichever is appropriate). This new container resource represents the collection as a whole, and may either be a blank node or be given a URI. The members of the collection are then indicated by defining a membership property for each member that has the new container resource as its subject and the member resource as its object. These membership properties have the names rdf:_1, rdf:_2, rdf:_3, and so on, and are used specifically for defining the members of containers. Container resources may also have other properties that describe the container, in addition to the membership properties and the rdf:type property.

A typical use of a container is to represent the value of a property. For example, to represent the sentence "The students in course 6.001 are Amy, Tim, John, Mary, and Sue", the RDF graph might be:

Figure 10: A Simple Bag Container (SVG version)

This can be written in RDF/XML as:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://mycollege.edu/students/vocab#">

   <rdf:Description rdf:about="http://mycollege.edu/courses/6.001">
      <s:students>
         <rdf:Bag>
            <rdf:li rdf:resource="http://mycollege.edu/students/Amy"/>
            <rdf:li rdf:resource="http://mycollege.edu/students/Tim"/>
            <rdf:li rdf:resource="http://mycollege.edu/students/John"/>
            <rdf:li rdf:resource="http://mycollege.edu/students/Mary"/>
            <rdf:li rdf:resource="http://mycollege.edu/students/Sue"/>
         </rdf:Bag>
      </s:students>
   </rdf:Description>
</rdf:RDF>

Since the value of the s:students property is expressed as a Bag, there is no significance in the order given for the URIs of each student.

Note that the RDF/XML uses li as a convenience element to avoid having to explicitly number each membership property. The RDF processor will generate the numbered properties rdf:_1, rdf:_2, and so on from the li elements as necessary. The element name li was chosen to be mnemonic with the term "list item" from HTML.
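The expansion of li elements into numbered membership properties can be sketched in a few lines of Python. This is only an illustration using the standard library's generic XML parser, not a real RDF/XML processor: it handles only li elements that carry an rdf:resource attribute, and the blank node label "_:bag" and the function name expand_li are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

BAG_XML = """<rdf:Bag xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:li rdf:resource="http://mycollege.edu/students/Amy"/>
  <rdf:li rdf:resource="http://mycollege.edu/students/Tim"/>
  <rdf:li rdf:resource="http://mycollege.edu/students/John"/>
</rdf:Bag>"""

def expand_li(container_xml, container_id="_:bag"):
    """Turn each rdf:li child into a numbered membership triple."""
    root = ET.fromstring(container_xml)
    # ElementTree writes tags as "{namespace}local"; join them into one URI.
    type_uri = root.tag[1:].replace("}", "")
    triples = [(container_id, RDF + "type", type_uri)]
    n = 0
    for child in root:
        if child.tag == "{%s}li" % RDF:
            n += 1
            member = child.get("{%s}resource" % RDF)
            triples.append((container_id, "%s_%d" % (RDF, n), member))
    return triples

for triple in expand_li(BAG_XML):
    print(triple)
```

The first triple printed gives the container its rdf:type; the remaining three use rdf:_1, rdf:_2, and rdf:_3 in document order.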

As an illustration of an Alternative container, the sentence "The source code for X11 may be found at ftp.x.org, ftp.example.org, or ftp.example2.org" would have an RDF graph:

Figure 11: A Simple Alternative Container (SVG version)

This can be written in RDF/XML as:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://x.org/packages/vocab#">

   <rdf:Description rdf:about="http://x.org/packages/X11">
      <s:DistributionSite>
         <rdf:Alt>
            <rdf:li rdf:resource="ftp://ftp.x.org"/>
            <rdf:li rdf:resource="ftp://ftp.example.org"/>
            <rdf:li rdf:resource="ftp://ftp.example2.org"/>
         </rdf:Alt>
      </s:DistributionSite>
   </rdf:Description>
</rdf:RDF>

In this case, the value of the s:DistributionSite property is considered to be one of the members of the Alternative container, any one of which would be an acceptable value. An Alternative container is required to have at least one member. This member is identified by the property rdf:_1, and is intended to be the default or preferred value.

Alternative containers are frequently used in conjunction with language tagging. For example, a work whose title has been translated into several languages might have its Title property pointing to an Alternative container holding each of the language variants.

The examples above illustrate that the general structures of the RDF graphs for both Bags and Alternatives are the same (and they are also the same for Sequences); only the indicated rdf:type is different. RDF considers these types as essentially "hints" to a processing application on how to properly interpret the structures. This is because RDF processors are not in a position to control how an application actually uses these structures. For example, an RDF processor has no way to force an application to use the first member of an Alternative collection as a default value. Similarly, an RDF processor has no way to force an application to ignore order in processing a Bag.
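The following Python sketch illustrates the point that the interpretation belongs to the application. The container is represented simply as a list of (subject, predicate, object) tuples; the function names and the "_:sites" label are hypothetical, and the "default value" rule is implemented by the application code, not by anything in RDF.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def members(triples, container):
    """Return a container's members ordered by membership property number."""
    numbered = []
    for s, p, o in triples:
        if s == container and p.startswith(RDF + "_"):
            numbered.append((int(p[len(RDF) + 1:]), o))
    return [o for _, o in sorted(numbered)]

def default_value(triples, container):
    """Application convention: the lowest-numbered member of an Alt
    (rdf:_1 when present) is the default. Other types have no default."""
    is_alt = (container, RDF + "type", RDF + "Alt") in triples
    values = members(triples, container)
    return values[0] if is_alt and values else None

sites = [
    ("_:sites", RDF + "type", RDF + "Alt"),
    ("_:sites", RDF + "_2", "ftp://ftp.example.org"),
    ("_:sites", RDF + "_1", "ftp://ftp.x.org"),
]
print(default_value(sites, "_:sites"))  # ftp://ftp.x.org
```

Changing the rdf:type triple to rdf:Bag leaves the membership triples untouched; only the application's treatment of them differs.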

RDF processors are also limited in their ability to enforce structural constraints on these collections. For example, these structures explicitly permit duplicate values. RDF does not define a Set container, which would be a Bag with no duplicates, because RDF processors are not necessarily in a position to enforce a no-duplicates constraint (for example, a duplicate might exist somewhere else on the web, unknown to the processor). Also, if you create the membership properties yourself, RDF does not insist that the property numbers be contiguous starting with rdf:_1. For example, you could create a legal Bag with just the membership properties rdf:_3, rdf:_7, rdf:_8, and rdf:_11 (although an RDF processor would not generate these property names from a collection of rdf:li properties).
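As a small illustration of the numbering point, the Python sketch below extracts the numbers from a set of membership property URIs and checks whether they are contiguous. The helper names are hypothetical, and the check is something an application might choose to perform for its own purposes; RDF itself does not require it.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEMBERSHIP = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)$")

def membership_numbers(properties):
    """Return the sorted numbers of the rdf:_n properties in `properties`."""
    numbers = []
    for prop in properties:
        match = MEMBERSHIP.match(prop)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)

def is_contiguous(numbers):
    """True when the numbers run 1, 2, ..., n with no gaps."""
    return numbers == list(range(1, len(numbers) + 1))

sparse_bag = [RDF + "_3", RDF + "_7", RDF + "_8", RDF + "_11"]
numbers = membership_numbers(sparse_bag)
print(numbers)                 # [3, 7, 8, 11]
print(is_contiguous(numbers))  # False, yet the Bag is still legal RDF
```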

Users are also free to choose their own representations for collections, rather than using the ones described here. These RDF collections are merely provided as common definitions that, if generally used, would help make data involving collections more interoperable.

@@We've omitted the discussion from the M&S about the use of container objects vs. multiple statements with the same predicate, since the examples were rather forced.@@

6. Some RDF Applications: RDF in the Field

@@section intro TBD@@

6.1 Dublin Core Metadata Initiative

The Dublin Core is a set of "elements" (properties) for describing documents. The element set was originally developed at the March 1995 Metadata Workshop in Dublin, Ohio, and has subsequently been modified on the basis of later Dublin Core Metadata Workshops. The goal of the Dublin Core is to provide a minimal set of descriptive elements that facilitate the description and the automated indexing of document-like networked objects, in a manner similar to a library card catalog. The Core metadata set is intended to be suitable for use by resource discovery tools on the Internet, such as the "webcrawlers" employed by popular World Wide Web search engines. In addition, the Core is meant to be sufficiently simple to be understood and used by the wide range of authors and casual publishers who contribute information to the Internet. Dublin Core elements have become widely used in documenting Internet resources (we have already used the Dublin Core creator element in earlier examples). The current elements of the Dublin Core are defined in The Dublin Core Metadata Element Set, Version 1.1: Reference Description, and contain definitions for the following properties:

Title: A name given to the resource.
Creator: An entity primarily responsible for making the content of the resource.
Subject: The topic of the content of the resource.
Description: An account of the content of the resource.
Publisher: An entity responsible for making the resource available.
Contributor: An entity responsible for making contributions to the content of the resource.
Date: A date associated with an event in the life cycle of the resource.
Type: The nature or genre of the content of the resource.
Format: The physical or digital manifestation of the resource.
Identifier: An unambiguous reference to the resource within a given context.
Source: A Reference to a resource from which the present resource is derived.
Language: A language of the intellectual content of the resource.
Relation: A reference to a related resource.
Coverage: The extent or scope of the content of the resource.
Rights: Information about rights held in and over the resource.

Information using the Dublin Core elements may be represented in any suitable language (e.g., in HTML Meta elements). However, RDF is an ideal representation for Dublin Core information. The examples below represent the simple description of a set of resources in RDF using the Dublin Core vocabulary. Note that the specific Dublin Core RDF vocabulary shown here is not intended to be authoritative. The Dublin Core Reference Description is the authoritative reference.

Here is a description of a Web site home page using Dublin Core properties:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="http://www.dlib.org">
      <dc:title>D-Lib Program - Research in Digital Libraries</dc:title>
      <dc:description>The D-Lib program supports the community of people
       with research interests in digital libraries and electronic
       publishing.</dc:description>
      <dc:publisher>Corporation For National Research Initiatives</dc:publisher>
      <dc:date>1995-01-07</dc:date>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>Research; statistical methods</rdf:li>
          <rdf:li>Education, research, related topics</rdf:li>
          <rdf:li>Library use Studies</rdf:li>
        </rdf:Bag>
      </dc:subject>
      <dc:type>World Wide Web Home Page</dc:type>
      <dc:format>text/html</dc:format>
      <dc:language>en</dc:language>
    </rdf:Description>
</rdf:RDF>

Note that both RDF and Dublin Core define an (XML) element called "Description" (although here we've written the Dublin Core element in lower case). The XML namespace mechanism enables us to distinguish between these two elements (one is rdf:Description, and the other is dc:description). Also, as a matter of interest, if you access "http://purl.org/dc/elements/1.1/" in a Web browser (as of the current writing), you will get an RDF Schema declaration for the Dublin Core Element Set 1.1.
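The effect of the namespace mechanism can be seen by parsing an abbreviated version of the example with an ordinary namespace-aware XML parser. The Python sketch below uses the standard library's ElementTree, which reports each element name as "{namespace}local-name"; the shortened dc:description text is only for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.dlib.org">
    <dc:description>The D-Lib program supports the community.</dc:description>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(DOC)
outer = root[0]   # the rdf:Description element
inner = outer[0]  # the dc:description element

print(outer.tag)  # {http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description
print(inner.tag)  # {http://purl.org/dc/elements/1.1/}description
```

Because the two names expand to different namespace URIs, they would remain distinct even if both were spelled with the same capitalization.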

The second example describes a published magazine.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
    <rdf:Description rdf:about="http://www.dlib.org/dlib/may98/05contents.html">
      <dc:title>DLIB Magazine - The Magazine for Digital Library Research
        - May 1998</dc:title>
      <dc:description>D-LIB magazine is a monthly compilation of
       contributed stories, commentary, and briefings.</dc:description>
      <dc:contributor>Amy Friedlander</dc:contributor>
      <dc:publisher>Corporation for National Research Initiatives</dc:publisher>
      <dc:date>1998-01-05</dc:date>
      <dc:type>electronic journal</dc:type>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>library use studies</rdf:li>
          <rdf:li>magazines and newspapers</rdf:li>
        </rdf:Bag>
      </dc:subject>
      <dc:format>text/html</dc:format>
      <dc:identifier>urn:issn:1082-9873</dc:identifier>
      <dcterms:isPartOf rdf:resource="http://www.dlib.org"/>
    </rdf:Description>
 </rdf:RDF>

In this example, we've used (in the third line from the bottom) the Dublin Core qualifier isPartOf (from a separate namespace) to indicate that this magazine is "part of" the previously-described web site.

The third example is of a specific article in the magazine referred to in the previous example.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
    <rdf:Description rdf:about=
    "http://www.dlib.org/dlib/may98/miller/05miller.html">
      <dc:title>An Introduction to the Resource Description Framework</dc:title>
      <dc:creator>Eric J. Miller</dc:creator>
      <dc:description>The Resource Description Framework (RDF) is an
       infrastructure that enables the encoding, exchange and reuse of
       structured metadata. RDF is an application of XML that imposes needed
       structural constraints to provide unambiguous methods of expressing
       semantics. RDF additionally provides a means for publishing both
       human-readable and machine-processable vocabularies designed to
       encourage the reuse and extension of metadata semantics among
       disparate information communities. The structural constraints RDF
       imposes to support the consistent encoding and exchange of
       standardized metadata provides for the interchangeability of separate
       packages of metadata defined by different resource description
       communities. </dc:description>
      <dc:publisher>Corporation for National Research Initiatives</dc:publisher>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>machine-readable catalog record formats</rdf:li>
          <rdf:li>applications of computer file organization and
           access methods</rdf:li>
        </rdf:Bag>
      </dc:subject>
      <dc:rights>Copyright @ 1998 Eric Miller</dc:rights>
      <dc:type>Electronic Document</dc:type>
      <dc:format>text/html</dc:format>
      <dc:language>en</dc:language>
      <dcterms:isPartOf 
         rdf:resource="http://www.dlib.org/dlib/may98/05contents.html"/>
    </rdf:Description>
</rdf:RDF>

In this final example, we've also used the qualifier isPartOf, this time to indicate that this article is "part of" the previously-described magazine.
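Taken together, the three descriptions link the article to the magazine and the magazine to the web site. The Python sketch below shows how an application might follow those links. The triples are written out by hand from the examples above; the function name containers_of is hypothetical, and following the chain upward is an application decision, not something the isPartOf qualifier or RDF requires.

```python
IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

ARTICLE = "http://www.dlib.org/dlib/may98/miller/05miller.html"
MAGAZINE = "http://www.dlib.org/dlib/may98/05contents.html"
SITE = "http://www.dlib.org"

triples = [
    (ARTICLE, IS_PART_OF, MAGAZINE),
    (MAGAZINE, IS_PART_OF, SITE),
]

def containers_of(resource, triples):
    """Follow isPartOf links upward and return every enclosing resource.

    Simplification: assumes each resource has at most one isPartOf value.
    """
    parents = {s: o for s, p, o in triples if p == IS_PART_OF}
    chain = []
    seen = {resource}
    while resource in parents and parents[resource] not in seen:
        resource = parents[resource]
        seen.add(resource)
        chain.append(resource)
    return chain

print(containers_of(ARTICLE, triples))
```

For the article, this yields the magazine followed by the web site.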

6.2 PRISM

@@probably need to introduce the idea of "metadata" a little bit better; possibly under the DC section@@

PRISM (Publishing Requirements for Industry Standard Metadata) is a metadata specification developed in the publishing industry. Magazine publishers and their vendors formed the PRISM Working Group to identify the industry's needs for metadata and define a specification to meet them. Publishers want to use existing content in more ways in order to get a greater return on the investment made in creating it. Converting magazine articles to HTML for posting on the web is one example. Licensing it to aggregators like LexisNexis is another. All of these are "first uses" of the content; typically they all go live at the time the magazine hits the stands. The publishers also want their content to be "evergreen". It might be used in new issues, such as in a retrospective article. It could be used by other divisions in the company, such as in a book compiled from the magazine's photos, recipes, etc. Another use is to license it to outsiders, such as in a reprint of a product review, or in a retrospective produced by a different publisher. This overall goal requires a metadata approach which emphasizes discovery, rights tracking, and end-to-end metadata.

Discovery: Discovery is a general term for finding content which encompasses searching, browsing, content routing (described further in section [reference]), and other techniques. Discussions of discovery frequently center on a consumer searching a public web site. However, discovering content is much broader than that. The audience may be consumers, or it may be internal users such as researchers, designers, photo editors, licensing agents, etc. To assist discovery, PRISM provides elements for the topics, formats, genre, origin, and contexts of a resource. It also provides for categorizing resources using multiple subject description taxonomies.

Rights Tracking: Magazines frequently contain material licensed from others. Photos from a stock photo agency are the most common type of licensed material, but articles, sidebars, and all other types of content may be licensed. Simply knowing if content was licensed for one-time use, requires royalty payments, or is wholly-owned by the publisher is a struggle. PRISM provides elements for basic tracking of such rights. A separate namespace defined in the PRISM specification allows one to build descriptions of places, times, and industries where content may or may not be used.

End-to-end metadata: Most published content already has metadata created for it. Unfortunately, when content moves between systems, the metadata is frequently discarded, only to be re-created later in the production process at considerable expense. PRISM aims to reduce this problem by providing a specification that can be used in multiple stages in the content production pipeline. An important feature of the PRISM specification is its use of other existing specifications. Rather than create an entirely new thing, the group decided to use existing specifications as much as possible, and only define new things where needed. For this reason, the PRISM specification uses XML, RDF, Dublin Core, as well as various ISO formats and vocabularies.

A PRISM description may be as simple as a few elements from the Dublin Core with literal values. The example below describes a photograph, giving basic information on its title, photographer, format, etc.

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xml:lang="en-US">

 <rdf:Description rdf:about="http://wanderlust.com/2000/08/Corfu.jpg">
  <dc:title>Walking on the Beach in Corfu</dc:title>
  <dc:description>Photograph taken at 6:00 am on Corfu with two models
  </dc:description>
  <dc:creator>John Peterson</dc:creator>
  <dc:contributor>Sally Smith, lighting</dc:contributor>
  <dc:format>image/jpeg</dc:format>
 </rdf:Description>
</rdf:RDF>

PRISM also augments the Dublin Core to allow more detailed descriptions. The augmentations are defined in three new namespaces:

prism: This is the main namespace. Most of its elements are more specific versions of elements from the Dublin Core. For example, dc:date is extended by elements like prism:publicationTime, prism:releaseTime, prism:expirationTime, etc.

pcv: Currently, common practice for describing the subject(s) of an article is to supply appropriate-seeming keywords. Unfortunately, simple keywords do not make a great difference in retrieval performance, because different people will use different keywords [BATES96]. Best practice is to code the articles with subject terms from a "controlled vocabulary". The vocabulary should provide as many synonyms as possible for its terms. This way the controlled terms provide a meeting ground for the keywords supplied by the searcher and the indexer. The PRISM Controlled Vocabulary (pcv) namespace provides elements for specifying terms in a vocabulary, the relations between terms, and alternate names for the terms.

prl: Digital Rights Management is an area undergoing considerable upheaval. There are a number of proposals for rights management languages, but none are clearly favored throughout the industry. Because there was no clear choice to recommend, the PRISM Rights Language (PRL) was defined as an interim measure. It provides elements which let people say if an item can or can't be 'used', depending on conditions of time, geography, and industry. This is believed to be an 80/20 tradeoff which will help publishers begin to save money when tracking rights. It is not intended to be a general rights language, or allow publishers to automatically enforce limits on consumer uses of the content.
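The "meeting ground" idea behind a controlled vocabulary can be sketched in Python as a lookup from free-text keywords to a controlled term. The vocabulary below is hypothetical (in particular, the alternate name "Hellas" is invented for the example), and the sketch does not use the pcv elements themselves; it only shows why synonyms let the indexer's and the searcher's keywords arrive at the same term.

```python
GR = "http://prismstandard.org/vocabs/ISO-3166/GR"

# Hypothetical vocabulary: controlled term -> label and alternate names.
VOCABULARY = {
    GR: ["Greece", "Hellas"],
}

def build_index(vocabulary):
    """Map every lower-cased label to its controlled term."""
    index = {}
    for term, labels in vocabulary.items():
        for label in labels:
            index[label.lower()] = term
    return index

def to_controlled_term(keyword, index):
    """Return the controlled term for a keyword, or None if it is unknown."""
    return index.get(keyword.strip().lower())

INDEX = build_index(VOCABULARY)
# The indexer wrote "Greece"; the searcher typed "hellas". Both meet at GR.
print(to_controlled_term("Greece", INDEX) == to_controlled_term("hellas", INDEX))  # True
```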

PRISM uses RDF because of its abilities for dealing with descriptions of varying complexity. Currently, a great deal of metadata uses simple string values, such as

<dc:coverage>Greece</dc:coverage>

Over time we expect uses of the PRISM specification to become more sophisticated, moving from simple literal values to more structured values. In fact, that range of values is a situation we face now. Some publishers already use sophisticated controlled vocabularies; others are barely using manually-supplied keywords. Some examples of the different kinds of values that can be given are:

<dc:coverage>Greece</dc:coverage>

<dc:coverage
    rdf:resource="http://prismstandard.org/vocabs/ISO-3166/GR"/>

and

<dc:coverage>
  <pcv:Descriptor
      rdf:about="http://prismstandard.org/vocabs/ISO-3166/GR">
    <pcv:label xml:lang="en">Greece</pcv:label>
    <pcv:label xml:lang="fr">Grèce</pcv:label>
  </pcv:Descriptor>
</dc:coverage>

Note also that there are elements whose meanings are similar to, or subsets of, those of other elements. For example, the geographic subject of a resource could be given with

<prism:subject>Greece</prism:subject>
<dc:coverage>Greece</dc:coverage>

or

<prism:location>Greece</prism:location>

Any of those elements might use the simple literal value, or a more complex structured value. Such a range of possibilities cannot be adequately described by DTDs, or even by the newer XML Schemas. While there is a wide range of syntax to deal with, RDF's graph model has a simple structure - a list of 'triples'. Dealing with the metadata in the triples domain makes it much easier for older software to accommodate content with new extensions.
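A rough Python sketch of this point follows. The literal and structured forms of dc:coverage shown earlier are written out by hand as (subject, predicate, object) tuples; the subject is assumed to be the Corfu photograph from the earlier example, plain strings stand in for both literals and URIs, and the rdf:type statement for the pcv:Descriptor is omitted for brevity. The function represents an older application that knows only about dc:coverage.

```python
DC_COVERAGE = "http://purl.org/dc/elements/1.1/coverage"
PCV_LABEL = "http://prismstandard.org/namespaces/pcv/1.0/label"
GR = "http://prismstandard.org/vocabs/ISO-3166/GR"
PHOTO = "http://wanderlust.com/2000/08/Corfu.jpg"

# Simple literal value.
literal_form = [(PHOTO, DC_COVERAGE, "Greece")]

# Structured value: the extra statements hang off the vocabulary term.
structured_form = [
    (PHOTO, DC_COVERAGE, GR),
    (GR, PCV_LABEL, "Greece"),
]

def coverage_values(triples):
    """What an older application sees: only the dc:coverage statements."""
    return [o for s, p, o in triples if s == PHOTO and p == DC_COVERAGE]

print(coverage_values(literal_form))     # ['Greece']
print(coverage_values(structured_form))  # the URI of the vocabulary term
```

The same filter works on both forms; statements using properties the application does not recognize are simply passed over and do not break it.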

We will close this section with two final examples. The first says that the image (.../Corfu.jpg) can't be used (#none) in the tobacco industry (code 21 in SIC, the Standard Industrial Classifications).

<rdf:RDF xmlns:prism="http://prismstandard.org/namespaces/basic/1.0/"
         xmlns:prl="http://prismstandard.org/namespaces/prl/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">

 <rdf:Description rdf:about="http://wanderlust.com/2000/08/Corfu.jpg">
  <dc:rights  rdf:parseType="Resource"
         xml:base="http://prismstandard.org/vocabularies/1.0/usage.xml">
     <prl:usage rdf:resource="#none"/>
     <prl:industry rdf:resource="http://prismstandard.org/vocabs/SIC/21"/>
  </dc:rights>
 </rdf:Description>
</rdf:RDF>
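To suggest how an application might act on such a statement, here is a minimal Python sketch. It first shows how the relative reference "#none" resolves against the xml:base value given in the example, and then checks a proposed use against the rights statement. The dictionary layout, the function name may_use, and the rule that anything not explicitly forbidden is allowed are all assumptions made for this illustration; they are not taken from the PRISM specification.

```python
from urllib.parse import urljoin

BASE = "http://prismstandard.org/vocabularies/1.0/usage.xml"
USAGE_NONE = urljoin(BASE, "#none")  # how "#none" resolves against xml:base
TOBACCO = "http://prismstandard.org/vocabs/SIC/21"

# One dict per rights statement, mirroring the dc:rights node above.
rights = [{"usage": USAGE_NONE, "industry": TOBACCO}]

def may_use(rights, industry):
    """Return False if any rights statement forbids use in `industry`.

    Assumption: a use that no statement forbids is treated as allowed.
    """
    for statement in rights:
        if statement["usage"] == USAGE_NONE and statement["industry"] == industry:
            return False
    return True

print(USAGE_NONE)
print(may_use(rights, TOBACCO))  # False
```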

The second says that the photographer for the Corfu image was employee 3845, better known as John Peterson. It also says that the geographic subject of the photo is Greece. It does so by providing, not just a code from a controlled vocabulary, but a cached version of the information for that term in the vocabulary.

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:pcv="http://prismstandard.org/namespaces/pcv/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xml:base="http://wanderlust.com/">

  <rdf:Description rdf:about="/2000/08/Corfu.jpg">
    <dc:identifier rdf:resource="/content/2357845" />
    <dc:creator>
      <pcv:Descriptor rdf:about="/emp3845">
        <pcv:label>John Peterson</pcv:label>
      </pcv:Descriptor>
    </dc:creator>
    <dc:coverage>
      <pcv:Descriptor
          rdf:about="http://prismstandard.org/vocabs/ISO-3166/GR">
        <pcv:label xml:lang="en">Greece</pcv:label>
        <pcv:label xml:lang="fr">Grèce</pcv:label>
      </pcv:Descriptor>
    </dc:coverage>
  </rdf:Description>
</rdf:RDF>

6.3 XPackage

Many situations involve the need to maintain information about structured collections of resources and their associations that are, or may be, used as a unit. The XML Package (XPackage) specification provides a framework for defining such collections, called packages. XPackage specifies a framework for describing the resources included in such packages, the properties of those resources, their method of inclusion, and their relationships with each other. XPackage applications include specifying the stylesheets used by a document, declaring the images shared by multiple documents, indicating the author and other metadata of a document, describing how namespaces are used by XML resources, and providing a manifest for bundling resources into a single archive file.

The XPackage framework is based upon XML, RDF, and XLink, and provides two RDF vocabularies: one for general packaging descriptions, and another for describing XML-based resources. Although XPackage is an application of RDF, the package description document is defined by an XML Schema. This allows XPackage to be implemented as a general XML application without an RDF processor, while still maintaining RDF compliance of conforming documents. The XPackage framework also allows customization through extension and/or restriction.

One application of XPackage is the description of XHTML documents and their supporting resources. An XHTML document retrieved from a web site may rely on other resources such as stylesheets and image files that also need to be retrieved. However, the identities of these supporting resources may not be obvious without processing the entire document. Other information about the document, such as the name of its author, may also not be available without processing the document. XPackage allows such descriptive information to be stored in a standard way in a package description document containing RDF. The outer elements of a package description document describing such an XHTML document might look like the following example (with namespace declarations removed for simplicity):

<?xml version="1.0"?>
<xpackage:description>
  <rdf:RDF>

    (descriptions of individual resources go here)

  </rdf:RDF>
</xpackage:description>

Resources (such as the XHTML document, stylesheets, and images) are described within this package description document. The XHTML document resource itself is described using an RDF resource description element <xpackage:resource> from the XPackage ontology (vocabulary). Each resource description element may include RDF properties from various ontologies. In the example below, the document's MIME content type ("application/xhtml+xml") is defined using a standard XPackage property from the XPackage ontology, xpackage:contentType. Another property, the document's author (in this case, "Garret Wilson"), is described using a property from a custom ontology, the Dublin Core, resulting in a dc:creator property. XPackage itself specifies an extension property set specifically for XML-based resources, the XML ontology, including specifying XML namespaces and stylesheets used with the xmlprop:namespace and xmlprop:style properties, respectively.

    <!--doc.html-->
    <xpackage:resource rdf:about="urn:examples:xhtmldocument-doc">
      <rdfs:comment>The XHTML document.</rdfs:comment>
      <xpackage:location xlink:href="doc.html"/>
      <xpackage:contentType>application/xhtml+xml</xpackage:contentType>
      <xmlprop:namespace rdf:resource="http://www.w3.org/1999/xhtml"/>
      <xmlprop:style rdf:resource="urn:examples:xhtmldocument-stylesheet"/>
      <xmlprop:annotation rdf:resource="urn:examples:xhtmldocument-annotation"/>
      <dc:creator>Garret Wilson</dc:creator>
      <xpackage:manifest>
        <rdf:Bag>
          <rdf:li rdf:resource="urn:examples:xhtmldocument-stylesheet"/>
          <rdf:li rdf:resource="urn:examples:xhtmldocument-image"/>
        </rdf:Bag>
      </xpackage:manifest>
    </xpackage:resource>

The xpackage:manifest property indicates that both the stylesheet and image resources are necessary for processing; those resources are described separately within the package description document. The example stylesheet resource description below lists its location ("stylesheet.css") using the XPackage ontology xpackage:location property (which is compatible with XLink), and shows through use of the XPackage ontology xpackage:contentType property that it is a CSS stylesheet ("text/css").

    <!--stylesheet.css-->
    <xpackage:resource rdf:about="urn:examples:xhtmldocument-stylesheet">
      <rdfs:comment>The document stylesheet.</rdfs:comment>
      <xpackage:location xlink:href="stylesheet.css"/>
      <xpackage:contentType>text/css</xpackage:contentType>
    </xpackage:resource>
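Since each manifest entry refers to a resource that is described elsewhere in the package description document, an application might want to check that every entry can be resolved. The Python sketch below does this over a hand-written dictionary that stands in for the parsed descriptions; the dictionary layout and the function name unresolved are hypothetical, and the stylesheet is keyed by the identifier used in the manifest.

```python
# Hypothetical stand-in for parsed resource descriptions, keyed by URI.
descriptions = {
    "urn:examples:xhtmldocument-doc": {
        "location": "doc.html",
        "manifest": [
            "urn:examples:xhtmldocument-stylesheet",
            "urn:examples:xhtmldocument-image",
        ],
    },
    "urn:examples:xhtmldocument-stylesheet": {
        "location": "stylesheet.css",
        "manifest": [],
    },
}

def unresolved(descriptions, resource):
    """Return manifest entries that have no description in the package."""
    manifest = descriptions[resource]["manifest"]
    return [uri for uri in manifest if uri not in descriptions]

# The image has not been described in this abbreviated package, so it is reported.
print(unresolved(descriptions, "urn:examples:xhtmldocument-doc"))
```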

The full version of this example may be found in the XML Package specification.

6.4 Intelligent Routing

@@This section needs editing now that we have the prior application sections.@@

The world is full of information. Behind the millions of pages on the Internet's publicly visible part, the Web, there are many times as many documents flowing in and out of organizations via emails, cross-company networks and constant always-on information "feeds".

Every document that passes along the wires has to be inspected, processed or re-routed. A document simply written by one human being has to be read by another before anybody knows its worth or where it should be redirected. This is fine for a person-to-person email but, for information destined to a broad circulation, this can be expensive, often reducing the value of the information by raising its handling cost or simply making it late.

For example, when an individual subscribes to a source of news, it's usually on the understanding that everything in that feed is of interest and so everything will be delivered without question. For the distributor to sort out the interesting ones for you manually would be time-consuming, expensive and boring; so instead we accept dozens of emails and delete most of them every morning. And of course it is time-consuming, expensive and boring. Subscription to some less self-critical sources is a step to be taken very seriously.

When a company subscribes to a news feed, it may be risking a deluge of unwanted data. If it intends to circulate the information within the company or to a broad range of clients, it charges itself with checking every document by eye or investing in extra software technology. Without such protection, the company's networks will soon collapse under the load or its clients will consider themselves willfully "spammed" and withdraw their custom.

The redirection of such feeds is therefore a matter of utmost commercial sensitivity in a context of huge and increasing volumes and complexity of data. The technology concerned is "routing" and, in the most modern cases, relies on RDF.

The traditional need for human inspection of incoming documents comes from the fact that, on its own, text has no value. It only has value when you know what it is about, what authority it comes from, and who it is intended for. Everything else is, as we know, just material for spamming. For a software agent to recognize a document's worth it must have access to an evaluation that is consistently readable, whatever the format of the document, and is reliable in its description.

For those two objectives, we need an internationally standardized language and a globally recognized set of values. These are RDF, together with RDF Schemas such as defined by Dublin Core and PRISM. The longed-for independent evaluation takes the form of an associated RDF document.

Not that every document from every information source comes with its associated RDF description... yet. It is the case however that almost every serious source supplies some value-based annotation in the form of metadata, the significant content of RDF. For example, news feeds generally come in one of a selection of annotated formats, mostly based on XML, such as NewsML or XMLNews. Most standards-oriented companies are adding freely-accessible metadata to their document formats. Adobe, for example, recently announced XMP whereby metadata can be inserted into (and more importantly extracted from) PDF documents. The message from such companies is that, even if you can't understand or even have no right to read the contents, you are entitled to know enough to make an evaluation for your own use or for clients who can use the information. For freely available information, standards are the key.

Now this basic process (source embeds standard annotations: annotations are used to divert and sort documents) is certainly not new. Email (SMTP) and news (NNTP) protocols use standard keyword-value-pair headers which are fundamental to their operation: such documents are marked up according to known and publicized standards. What is new is to normalize all these local formats to a general one and thereby be able to appeal to a globally consistent set of values in making judgments.

For a universal router to do its job, it needs to cancel out these variations in format. Until the world adopts one standard, this will be a matter of tact and ingenuity, but the existence of a core standard is important here. When a slew of formats and value systems needs to be compared, it is safer to convert them all to one standard first and then compare, rather than doing it piecemeal, and that standard must be broader than all the others. Again, the RDF and RDF Schema standards are the natural choice.
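As a rough illustration of converting to one standard before comparing, the following Python sketch maps two local keyword-value formats onto a shared set of property URIs. The mapping tables, the `normalize` function, the `example.org` URI, and the choice of Dublin Core element for each local field are all assumptions made for this example; they are not a published mapping.

```python
# Illustrative only: the field-to-property mappings below are assumptions
# made for this sketch, not a published crosswalk.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mappings from two local keyword-value formats to one vocabulary.
MAPPINGS = {
    "mail": {"Subject": DC + "title", "From": DC + "creator", "Date": DC + "date"},
    "feed": {"headline": DC + "title", "byline": DC + "creator", "issued": DC + "date"},
}

def normalize(resource_uri, fmt, fields):
    """Convert local keyword-value pairs into (subject, property, value) triples."""
    mapping = MAPPINGS[fmt]
    return {
        (resource_uri, mapping[key], value)
        for key, value in fields.items()
        if key in mapping  # fields with no known mapping are skipped
    }

mail_item = normalize(
    "http://example.org/doc/1", "mail",
    {"Subject": "Financial history of Belize", "From": "A. Writer", "X-Mailer": "demo"},
)
feed_item = normalize(
    "http://example.org/doc/1", "feed",
    {"headline": "Financial history of Belize", "byline": "A. Writer"},
)

# Once both descriptions are in the common form they can be compared directly.
print(mail_item == feed_item)
```

The point of the sketch is only that comparison happens after conversion: each local format needs one mapping to the common vocabulary, rather than one mapping to every other format.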

An Information Router collects metadata and stores it (rather like an enormous RDF document describing maybe millions of resources at once). This metadata store holds the descriptions in exactly the terms of RDF. They can therefore be exported or imported as industry-standard RDF without loss or confusion. World-wide, repositories of metadata may be synchronized and refreshed by exchanging RDF. While humans are exchanging images, videos and news items, metadata servers are exchanging compact RDF evaluations of them (the images, videos and news items, not the humans).

The actual documents described, orders of magnitude larger than the metadata, can be stored elsewhere or just left where they are (located by URI, of course). The metadata is compact and loaded with value. Judgments about distributing material can be made in a context of values (standard predicate systems such as Dublin Core) and a vast number of alternatives, all without moving the actual documents around or indeed even looking at them, by computer or by human eye.

Judgments are made by applying RDF "queries" which test the value of a document to the reader: whether the subject is interesting, the content is suitable, the author respected, the source reliable, the document accessible, the cost reasonable, the language intelligible, the conclusion desirable, the format tractable, the medium handleable, and so on. The actual form of a query varies from product to product. (In any case, the consumer would be given a graphical way to express his or her wishes.)

In one case, a query takes the form of a modified RDF description which, if you like, asks to be proved or disproved by a body of metadata. So an RDF Description that stated that a document exists with the title "Financial history of Belize" can be viewed as a request to find such a document.
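One way to picture a query of this kind is as a description with its unknown parts left open, matched against the stored triples. The Python sketch below is a minimal illustration under stated assumptions: the `?name` convention for unknowns, the `solve` function, and the `example.org` URIs are hypothetical and do not describe any particular product's query form. As a simplification, any term beginning with `?` is treated as an unknown.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_LANGUAGE = "http://purl.org/dc/elements/1.1/language"

# A tiny metadata store: a set of (subject, property, value) triples.
# The example.org URIs and the values are made up for this sketch.
STORE = {
    ("http://example.org/doc/1", DC_TITLE, "Financial history of Belize"),
    ("http://example.org/doc/1", DC_LANGUAGE, "en"),
    ("http://example.org/doc/2", DC_TITLE, "Weather report"),
}

def is_variable(term):
    """Terms starting with '?' stand for the unknown parts of the description."""
    return term.startswith("?")

def solve(patterns, store, bindings=None):
    """Yield every assignment of unknowns that makes all patterns match the store."""
    bindings = bindings or {}
    if not patterns:
        yield dict(bindings)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in store:
        trial = dict(bindings)
        for want, have in zip(first, triple):
            if is_variable(want):
                # Bind the unknown, or reject if it is already bound differently.
                if trial.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            yield from solve(rest, store, trial)

# "A document exists with the title 'Financial history of Belize'."
query = [("?doc", DC_TITLE, "Financial history of Belize")]
matches = sorted(b["?doc"] for b in solve(query, STORE))
print(matches)
```

A query with several patterns that share an unknown (for example, a title pattern and a language pattern both using `?doc`) only succeeds when a single document satisfies all of them.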

The news distributor's server runs, in addition to the usual server software, one of these Information Router packages, applies queries on behalf of its clients and delivers just those documents that survive the evaluation.
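The delivery step can be sketched as a filter over metadata alone. In the Python sketch below, the subscriber names, the short property names (real descriptions would use property URIs), and the `survives` and `route` functions are hypothetical; the sketch only assumes that each subscriber has one stored query and that a document is delivered when its description satisfies every condition in that query.

```python
# Metadata only: each document URI maps to its description, and the documents
# themselves are never opened. All names and values here are hypothetical.
METADATA = {
    "http://example.org/news/1": {"subject": "finance", "language": "en"},
    "http://example.org/news/2": {"subject": "sport", "language": "en"},
    "http://example.org/news/3": {"subject": "finance", "language": "fr"},
}

# One stored query per subscriber: every listed property must take one of
# the accepted values for a document to be delivered.
SUBSCRIPTIONS = {
    "alice": {"subject": {"finance"}, "language": {"en", "fr"}},
    "bob": {"subject": {"sport"}},
}

def survives(description, query):
    """True when the description meets every condition in the query."""
    return all(description.get(prop) in accepted for prop, accepted in query.items())

def route(metadata, subscriptions):
    """Return, for each subscriber, the URIs of the documents to deliver."""
    return {
        subscriber: sorted(uri for uri, desc in metadata.items() if survives(desc, query))
        for subscriber, query in subscriptions.items()
    }

deliveries = route(METADATA, SUBSCRIPTIONS)
print(deliveries)
```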

If a complex multi-layered query describing just what it takes to please you is associated with your name as a subscriber, you can, using software available today, guarantee that what you get sent is exactly and only what you need. It's an end to spam, thanks to RDF.

6.5 RSS: RDF Site Summary 1.0

@@ TBD @@

7. Other Parts of RDF

@@section intro TBD@@

7.1 Model Theory

RDF is being developed as part of the W3C's Semantic Web Activity. As described in the Semantic Web Activity Statement:

The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. It is the idea of having data on the Web defined and linked in a way that it can be used for more effective discovery, automation, integration, and reuse across various applications. The Web can reach its full potential if it becomes a place where data can be shared and processed by automated tools as well as by people.

RDF is a language designed to support the Semantic Web, in much the same way that HTML is the language that helped initiate the original Web. In order to serve this purpose, the meaning of RDF statements must be defined in a very precise manner.

The RDF Model Theory document provides this precise definition, through what is technically called a "model-theoretic semantics". A model-theoretic semantics for a language assumes that the language refers to a 'world', and describes the minimal conditions that a world must satisfy in order to assign an appropriate meaning for every expression in the language. A particular world is called an interpretation, so that model theory might be better called 'interpretation theory'. The idea is to provide an abstract, mathematical account of the properties that any such interpretation must have, making as few assumptions as possible about its actual nature or intrinsic structure. The RDF model theory is couched in the language of set theory simply because that is the normal language of mathematics - for example, the model theory assumes that names denote things in a set IR called the 'universe' - but the use of set-theoretic language is not supposed to imply that the things in the universe are set-theoretic in nature.

The chief utility of such a semantic theory is not to suggest any particular processing model, or to provide any deep analysis of the nature of the things being described by the language (in our case, the nature of resources), but rather to provide a technical tool to analyze the semantic properties of proposed operations on the language; in particular, to provide a way to determine when they preserve meaning.

The RDF model theory treats RDF as a simple assertional language, in which each triple makes a distinct assertion, and the meaning of any triple is not changed by adding other triples. Based on the semantics defined in the model theory, it is simple to translate an RDF graph into a logical expression with essentially the same meaning.
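To give a feel for such a translation, the Python sketch below renders a small graph as a conjunction of binary atoms, with blank nodes turned into existentially quantified variables. The textual notation (`exists`, `and`), the `to_logic` function, and the abbreviated `ex:` names are assumptions made for this illustration; the Model Theory document is the authority on the actual correspondence.

```python
def to_logic(triples):
    """Render triples as one existentially quantified conjunction of binary atoms."""
    def is_blank(term):
        return term.startswith("_:")

    def render(term):
        # Blank nodes become variables; named terms stay as quoted constants.
        return term[2:] if is_blank(term) else f"'{term}'"

    ordered = sorted(triples)  # fixed order so the output is reproducible
    variables = sorted({t[2:] for s, _, o in ordered for t in (s, o) if is_blank(t)})
    atoms = [f"{p}({render(s)}, {render(o)})" for s, p, o in ordered]
    prefix = "".join(f"exists {v}. " for v in variables)
    return prefix + " and ".join(atoms)

# "ex:" names are abbreviations invented for this sketch; "_:a" is a blank node.
graph = {
    ("ex:doc", "ex:creator", "_:a"),
    ("_:a", "ex:name", "Eric Miller"),
}
print(to_logic(graph))
# prints: exists a. ex:name(a, 'Eric Miller') and ex:creator('ex:doc', a)
```

Because the result is a plain conjunction, adding a triple to the graph only adds another atom; the atoms already present are unchanged, which mirrors the statement that the meaning of a triple is not changed by adding other triples.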

@@Revised and/or additional discussion of the model theory to be added as time permits@@

7.2 Test Cases

The RDF Test Cases document supplements the textual RDF specifications with specific examples of RDF/XML syntax and the corresponding RDF graph triples. To describe these examples, it introduces the N-Triples notation used in earlier sections of the Primer. The test cases themselves are also published in machine-readable form at Web locations referenced by the Test Cases document, so developers can use these as the basis for some automated testing of RDF software.
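Automated testing of this kind might look roughly like the Python sketch below, which reads an expected result written in a restricted subset of N-Triples and compares it with the triples a parser produced. The `parse_ntriples` and `compare` functions are hypothetical helpers, not part of the Test Cases document. The sketch assumes one triple per line, no escape sequences in literals, and no language tags or datatypes, and it compares blank node labels literally, whereas a real comparison would have to allow for blank nodes being renamed.

```python
import re

# Restricted N-Triples subset: URI references, blank node labels, and plain
# literals without escapes. This is a simplification assumed for the sketch.
LINE = re.compile(
    r'^\s*(<[^>]*>|_:\w+)\s+(<[^>]*>)\s+(<[^>]*>|_:\w+|"[^"]*")\s*\.\s*$'
)

def parse_ntriples(text):
    """Parse the restricted subset into a set of (subject, predicate, object)."""
    triples = set()
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        triples.add(match.groups())
    return triples

def compare(expected_text, actual):
    """Report triples the parser under test missed or produced unexpectedly."""
    expected = parse_ntriples(expected_text)
    return {"missing": expected - actual, "unexpected": actual - expected}

# Expected triples for the contact example in the Introduction.
EXPECTED = """
<http://www.w3.org/People/EM/contact#me> <http://www.w3.org/2000/10/swap/pim/contact#fullName> "Eric Miller" .
<http://www.w3.org/People/EM/contact#me> <http://www.w3.org/2000/10/swap/pim/contact#mailbox> <mailto:em@w3.org> .
"""

# Stand-in for the output of the RDF/XML parser being tested.
actual = {
    ("<http://www.w3.org/People/EM/contact#me>",
     "<http://www.w3.org/2000/10/swap/pim/contact#fullName>",
     '"Eric Miller"'),
}
report = compare(EXPECTED, actual)
print(len(report["missing"]), len(report["unexpected"]))
```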

The Test Cases document also contains a number of "entailment tests", which indicate entailments (conclusions) that applications are allowed by the RDF specifications to draw from RDF data.

The test cases are not a complete specification of RDF, and are not intended to take precedence over the normative specification documents. However, they are intended to illustrate the intent of the RDF Core Working Group with respect to the design of RDF, and developers may find these test cases helpful should the wording of the specifications be unclear on any point of detail.

7.3 Reification

@@TBD@@

@@other parts may also be identified; also TBD@@

8. RDF As a Data Model

@@This is currently a placeholder (the title might change too) for a brief discussion of several related topics: (a) how RDF relates to XML (why you might want to use RDF, rather than using XML structures directly); (b) where RDF fits in the general world of data models, particularly its relationship to the relational data model, and related work on binary relational and other "semantic" data models. As part of this latter material, there will be pointers to some of the literature on database (schema) design (functional dependencies are highly relevant to RDF design), since analysis and design is going to be needed to develop robust RDF applications, and a lot of prior work exists on this subject that can be drawn from; (c) "identifier design": that deciding how to assign URIs to things is a design issue too, and some of the issues involved (e.g., options when different people assign different URIs to the same thing). The idea is mainly to point out the issues, and cite some sources for further reading. Depending on how the material turns out, it might be distributed in other sections instead of being placed here.@@

9. References

[BATES96] Indexing and Access for Digital Libraries and the Internet: Human, Database, and Domain Factors, Marcia J. Bates, 1996, http://is.gseis.ucla.edu/research/mjbates.html

[BERNERS-LEE98] What the Semantic Web can represent, Tim Berners-Lee, 1998, http://www.w3.org/DesignIssues/RDFnot.html

[DC] Dublin Core Metadata Initiative, http://dublincore.org/

[RDFMT] RDF Model Theory, W3C Working Draft, 14 February 2002, http://www.w3.org/TR/rdf-mt/

[RDFXML] RDF/XML Syntax Specification (Revised), W3C Working Draft, 18 December 2001, http://www.w3.org/TR/2001/WD-rdf-syntax-grammar-20011218/

[RDFTEST] RDF Test Cases, W3C Working Draft, 12 September 2001 (contains N-Triples), http://www.w3.org/TR/2001/WD-rdf-testcases-20010912/

[RDFSCHEMA] RDF Schema Specification 1.0 (editor's working draft), September 2001, http://www.w3.org/2001/sw/RDFCore/Schema/20010913/

[RDFISSUE] RDF Issue Tracking, http://www.w3.org/2000/03/rdf-tracking/

[RFC 2396] RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax, August 1998, http://www.isi.edu/in-notes/rfc2396.txt

[WEBDATA] Web Architecture: Describing and Exchanging Data, W3C Note, 7 June 1999, http://www.w3.org/1999/04/WebData

[XML] Extensible Markup Language (XML) 1.0, W3C Recommendation, 10 February 1998, http://www.w3.org/TR/1998/REC-xml-19980210.html

[XML-NS] Namespaces in XML, W3C Recommendation, 14 January 1999, http://www.w3.org/TR/REC-xml-names/

10. Acknowledgments

This document has benefited from inputs from many members of the RDF Core Working Group. Specific thanks to Dave Beckett, Dan Brickley, Ronald Daniel, Martyn Horner, Graham Klyne, Sean Palmer, Patrick Stickler, Aaron Swartz, and Garret Wilson, who provided valuable contributions to this document.

RDF Primer

Editors' Working Draft 24 April 2002
