Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. This Primer is designed to provide the reader with the basic knowledge required to effectively use RDF. It introduces the basic concepts of RDF and describes its XML syntax. It describes how to define RDF vocabularies using the RDF Vocabulary Description Language, and gives an overview of some deployed RDF applications. It also describes the content and purpose of other RDF specification documents.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a W3C Working Draft of the RDF Core Working Group and has been produced as part of the W3C Semantic Web Activity (Activity Statement).
In response to last call comments on the 23 January 2003 working draft of the Primer, the descriptions of RDF datatypes, containers, collections, and reification have been clarified, material on RDF vocabularies and XML literals has been added, and numerous other editorial changes have been made. Detailed changes from the 23 January 2003 working draft are described in the Changes section.
This Working Draft consolidates changes and editorial improvements undertaken in response to feedback received during the Last Call publication of the RDF Core specifications which began on 23 January 2003. A list of the Last Call issues addressed by the Working Group is also available. This document has been endorsed by the RDF Core Working Group.
This document is being released for review by W3C Members and other interested parties to encourage feedback and comments, especially with regard to how the changes made affect existing implementations and content.
In conformance with W3C policy requirements, known patent and IPR constraints associated with this Working Draft are detailed on the RDF Core Working Group Patent Disclosure page.
Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
1. Introduction
2. Making Statements About Resources
2.1 Basic Concepts
2.2 The RDF Model
2.3 Structured Property Values and Blank Nodes
2.4 Typed Literals
2.5 Concepts Summary
3. An XML Syntax for RDF: RDF/XML
3.1 Basic Principles
3.2 Abbreviating and Organizing RDF URIrefs
3.3 RDF/XML Summary
4. Other RDF Capabilities
4.1 RDF Containers
4.2 RDF Collections
4.3 RDF Reification
4.4 More on Structured Values: rdf:value
4.5 XML Literals
5. Defining RDF Vocabularies: RDF Schema
5.1 Defining Classes
5.2 Defining Properties
5.3 Interpreting RDF Schema Declarations
5.4 Other Schema Information
5.5 Richer Schema Languages
6. Some RDF Applications: RDF in the Field
6.1 Dublin Core Metadata Initiative
6.2 PRISM
6.3 XPackage
6.4 RSS 1.0: RDF Site Summary
6.5 CIM/XML
6.6 Gene Ontology Consortium
6.7 Describing Device Capabilities and User Preferences
7. Other Parts of the RDF Specification
7.1 RDF Semantics
7.2 Test Cases
8. References
8.1 Normative References
8.2 Informational References
9. Acknowledgments
A. More on Uniform Resource Identifiers (URIs)
B. More on the Extensible Markup Language (XML)
C. Changes
The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource. However, by generalizing the concept of a "Web resource", RDF can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web. Examples include information about items available from on-line shopping facilities (e.g., information about specifications, prices, and availability), or the description of a Web user's preferences for information delivery.
RDF is intended for situations in which this information needs to be processed by applications, rather than being only displayed to people. RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created.
RDF is based on the idea of identifying things using Web
identifiers (called Uniform Resource Identifiers,
or URIs), and describing resources in terms of simple
properties and property values. This enables RDF to represent
simple statements about resources as a graph of nodes
and arcs representing the resources, and their properties and
values. To make this discussion somewhat more concrete as soon as
possible, the group of statements "there is
a Person
identified by http://www.w3.org/People/EM/contact#me, whose name is
Eric Miller, whose email address is em@w3.org, and whose title is
Dr." could be represented as the RDF graph in Figure 1:
Figure 1 illustrates that RDF uses URIs to identify:
individuals, e.g., Eric Miller, identified by http://www.w3.org/People/EM/contact#me
kinds of things, e.g., Person, identified by http://www.w3.org/2000/10/swap/pim/contact#Person
properties of those things, e.g., mailbox, identified by http://www.w3.org/2000/10/swap/pim/contact#mailbox
values of those properties, e.g., mailto:em@w3.org as the value of the mailbox property (RDF also uses character strings such as "Eric Miller", and values from other datatypes such as integers and dates, as the values of properties)
RDF also provides an XML-based syntax (called RDF/XML) for recording and exchanging these graphs. Example 1 is a small chunk of RDF in RDF/XML corresponding to the graph in Figure 1:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>
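To make the correspondence between Example 1 and the graph in Figure 1 more concrete, the following minimal Python sketch (standard library only) reads Example 1 and lists its statements one per line, subject, predicate, then object, in the triples notation described in Section 2.2. It is an assumption-laden illustration, not a general RDF/XML parser: the helper names are hypothetical, it handles only the constructs Example 1 uses (a typed node element with rdf:about, and property elements with either text content or rdf:resource), and it does not escape literal text.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

EXAMPLE_1 = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>"""


def element_uri(tag):
    """ElementTree writes names as '{namespace}local'; RDF/XML concatenates the two."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def rdfxml_subset_to_triples(xml_text):
    """Hypothetical helper: covers only the RDF/XML constructs used in Example 1."""
    root = ET.fromstring(xml_text)
    lines = []
    for node in root:  # each child of rdf:RDF describes one resource
        subject = node.get("{%s}about" % RDF_NS)
        # A typed node element such as <contact:Person> also states an rdf:type.
        lines.append(f"<{subject}> <{RDF_NS}type> <{element_uri(node.tag)}> .")
        for prop in node:
            predicate = element_uri(prop.tag)
            resource = prop.get("{%s}resource" % RDF_NS)
            if resource is not None:  # the object is a URIref
                lines.append(f"<{subject}> <{predicate}> <{resource}> .")
            else:  # the object is a plain literal (no escaping is done here)
                lines.append(f'<{subject}> <{predicate}> "{prop.text}" .')
    return lines


for line in rdfxml_subset_to_triples(EXAMPLE_1):
    print(line)
```

The four printed statements (rdf:type, fullName, mailbox, personalTitle) are the arcs of Figure 1; the rdf:type statement comes from the element name contact:Person rather than from a separate property element.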
Note that this RDF/XML also contains URIs, as well as
properties like mailbox and fullName (in an
abbreviated form), and their respective values
mailto:em@w3.org and Eric Miller.
Like HTML, this RDF/XML is machine processable, and, using URIs, can link pieces of information across the Web. However, unlike conventional hypertext, RDF URIs can refer to any identifiable thing, including things that may not be directly retrievable on the Web (such as the person Eric Miller). The result is that in addition to describing such things as Web pages, RDF can also describe cars, businesses, people, news events, etc. In addition, RDF properties themselves have URIs, to precisely identify the relationships that exist between the linked items.
The following documents contribute to the specification of RDF:
This Primer is intended to provide an introduction to RDF and describe some existing RDF applications, to help information system designers and application developers understand the features of RDF and how to use them. In particular, the Primer is intended to answer such questions as:
The Primer is a non-normative document, which means that it does not provide a definitive specification of RDF. The examples and other explanatory material in the Primer are provided to help readers understand RDF, but they may not always provide definitive or fully-complete answers. In such cases, the relevant normative parts of the RDF specification should be consulted. To help in doing this, the Primer describes the roles these other documents play in the complete specification of RDF, and provides links pointing to the relevant parts of the normative specifications, at appropriate places in the discussion.
It should also be noted that these RDF documents update and clarify previously-published RDF specifications, the Resource Description Framework (RDF) Model and Syntax Specification [RDF-MS] and the Resource Description Framework (RDF) Schema Specification 1.0 [RDF-S]. As a result, there have been some changes in terminology, syntax, and concepts. This Primer reflects the newer set of RDF specifications given in the bulleted list of RDF documents cited above. Hence, readers familiar with the older specifications, and with earlier tutorial and introductory articles based on them, should be aware that there may be differences between the current specifications and those previous documents. The RDF Issue Tracking document [RDFISSUE] can be consulted for a list of issues raised concerning the previous RDF specifications, and their resolution in the current specifications.
RDF is intended to provide a simple way to make statements about Web resources, e.g., Web pages. This section describes the basic ideas behind the way RDF provides these capabilities (the normative specification describing these concepts is RDF Concepts and Abstract Syntax [RDF-CONCEPTS]).
Imagine trying to state that someone named John Smith created a particular Web page. A straightforward way to state this in English would be in the form of a simple statement such as:
http://www.example.org/index.html
has a creator whose value is John Smith
Parts of this statement are emphasized to illustrate that, in order to describe the properties of something, there need to be ways to name, or identify, a number of things:
In this statement, the Web page's URL (Uniform Resource Locator) is used to identify it. In addition, the word "creator" is used to identify the property, and the two words "John Smith" to identify the thing (a person) that is the value of this property.
Other properties of this Web page could be described by writing additional English statements of the same general form, using the URL to identify the page, and words (or other expressions) to identify the properties and their values. For example, the date the page was created, and the language in which the page is written, could be described using the additional statements:
http://www.example.org/index.html
has a creation-date whose value is August 16,
1999
http://www.example.org/index.html has a
language whose value is English
RDF is based on the idea that the things being described have properties which have values, and that resources can be described by making statements, similar to those above, that specify those properties and values. RDF uses a particular terminology for talking about the various parts of statements. Specifically, the part that identifies the thing the statement is about (the Web page in this example) is called the subject. The part that identifies the property or characteristic of the subject that the statement specifies (creator, creation-date, or language in these examples) is called the predicate, and the part that identifies the value of that property is called the object. So, taking the English statement
http://www.example.org/index.html
has a creator whose value is John Smith
the RDF terms for the various parts of the statement are:
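the subject is the URL http://www.example.org/index.html
the predicate is the word "creator"
the object is the phrase "John Smith"
However, while English is good for communicating between (English-speaking) humans, RDF is about making machine-processable statements. To make these kinds of statements suitable for processing by machines, two things are needed:
a system of machine-processable identifiers for identifying a subject, predicate, or object in a statement without any possibility of confusion with a similar-looking identifier that might be used by someone else on the Web
a machine-processable language for representing these statements and exchanging them between machines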
Fortunately, the existing Web architecture provides both these necessary facilities.
As illustrated earlier, the Web already provides one form of identifier, the Uniform Resource Locator (URL). A URL was used in the original example to identify the Web page that John Smith created. A URL is a character string that identifies a Web resource by representing its primary access mechanism (essentially, its network "location"). However, it is also important to be able to record information about many things that, unlike Web pages, do not have network locations or URLs.
The Web provides a more general form of identifier for these purposes, called the Uniform Resource Identifier (URI). URLs are a particular kind of URI. All URIs share the property that different persons or organizations can independently create them, and use them to identify things. However, URIs are not limited to identifying things that have network locations, or use other computer access mechanisms. In fact, a URI can be created to refer to anything that needs to be referred to in a statement, including
Because of this generality, RDF uses URIs as the basis of
its mechanism for identifying the subjects, predicates, and
objects in statements. To be more precise, RDF uses URI
references [URIS]. A URI reference
(or URIref) is a URI, together with an optional
fragment identifier at the end. For example, the URI
reference http://www.example.org/index.html#section2
consists of the URI http://www.example.org/index.html
and (separated by the "#" character) the fragment identifier
section2. RDF defines a resource as anything
that is identifiable by a URI reference, so using URIrefs
allows RDF to describe practically anything, and to state
relationships between such things as well. URIrefs and fragment
identifiers are discussed further in Appendix A, and in [RDF-CONCEPTS].
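The split of a URIref into a URI and a fragment identifier can be illustrated with Python's standard urllib.parse.urldefrag. The wrapper name split_uriref is hypothetical. Note that the split is purely syntactic: RDF itself treats the whole URIref as a single name.

```python
from urllib.parse import urldefrag


def split_uriref(uriref):
    """Split a URI reference into (URI, fragment identifier); the fragment is '' if absent."""
    result = urldefrag(uriref)
    return result.url, result.fragment


print(split_uriref("http://www.example.org/index.html#section2"))
# ('http://www.example.org/index.html', 'section2')
```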
To represent RDF statements in a machine-processable way,
RDF uses the Extensible
Markup Language [XML]. XML was
designed to allow anyone to design their own document format
and then write a document in that format. RDF defines a
specific XML markup language, referred to as RDF/XML,
for use in representing RDF information, and for exchanging it
between machines. An example of RDF/XML was given in Section 1. That example (Example 1) used tags such as
<contact:fullName> and
<contact:personalTitle> to delimit the text
content Eric Miller and Dr., respectively.
Such tags allow programs written with an understanding of what
the tags mean to properly interpret that content. Appendix B provides further background on
XML in general. The specific RDF/XML syntax used for RDF is
described in more detail in Section
3.
Section 2.1 has introduced RDF's basic statement concepts, the idea of using URI references to identify the things referred to in RDF statements, and RDF/XML as a machine-processable way to represent RDF statements. With that background, this section describes how RDF uses URIs to make statements about resources. The introduction said that RDF was based on the idea of expressing simple statements about resources, where each statement consists of a subject, a predicate, and an object. In RDF, the English statement:
http://www.example.org/index.html
has a creator whose value is John Smith
could be represented by an RDF statement having:
a subject http://www.example.org/index.html
a predicate http://purl.org/dc/elements/1.1/creator
an object http://www.example.org/staffid/85740
Note how URIrefs are used to identify not only the subject of the original statement, but also the predicate and object, instead of using the words "creator" and "John Smith", respectively (some of the effects of using URIrefs in this way will be discussed later in this section).
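RDF models statements as nodes and arcs in a graph. RDF's graph model is defined in [RDF-CONCEPTS]. In this notation, a statement is represented by:
a node for the subject
a node for the object
an arc for the predicate, directed from the subject node to the object node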
So the RDF statement above would be represented by the graph shown in Figure 2:
Groups of statements are represented by corresponding groups of nodes and arcs. So, to reflect the additional English statements
http://www.example.org/index.html
has a creation-date whose value is August 16,
1999
http://www.example.org/index.html has a
language whose value is English
in the RDF graph, the graph shown in Figure 3 could be used (using suitable URIrefs to name the properties "creation-date" and "language"):
Figure 3 illustrates that objects in RDF statements may be either URIrefs, or constant values (called literals) represented by character strings, in order to represent certain kinds of property values. Literals may not be used as subjects or predicates in RDF statements. In drawing RDF graphs, nodes that are URIrefs are shown as ellipses, while nodes that are literals are shown as boxes. (The simple character string literals used in these examples are called plain literals, to distinguish them from the typed literals to be introduced in Section 2.4. The various kinds of literals that can be used in RDF statements are defined in [RDF-CONCEPTS].)
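The rule that literals may appear only as objects can be expressed as a type check. The following is a minimal Python sketch; the class names URIRef, PlainLiteral and Statement are hypothetical, and the sketch deliberately ignores blank nodes (introduced in Section 2.3) and typed literals (Section 2.4).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class PlainLiteral:
    lexical: str  # just a character string, e.g. "English"


@dataclass(frozen=True)
class Statement:
    subject: URIRef
    predicate: URIRef
    obj: Union[URIRef, PlainLiteral]

    def __post_init__(self):
        # Literals may appear only in the object position.
        if not isinstance(self.subject, URIRef):
            raise TypeError("subject must be a URIref")
        if not isinstance(self.predicate, URIRef):
            raise TypeError("predicate must be a URIref")
        if not isinstance(self.obj, (URIRef, PlainLiteral)):
            raise TypeError("object must be a URIref or a literal")


page = URIRef("http://www.example.org/index.html")
language = URIRef("http://www.example.org/terms/language")
ok = Statement(page, language, PlainLiteral("English"))
try:
    Statement(PlainLiteral("English"), language, page)  # a literal as subject
except TypeError as err:
    print("rejected:", err)  # rejected: subject must be a URIref
```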
Sometimes it is not convenient to draw graphs when discussing them, so an alternative way of writing down the statements, called triples, is also used. In the triples notation, each statement in the graph is written as a simple triple of subject, predicate, and object, in that order. For example, the three statements shown in Figure 3 would be written in the triples notation as:
<http://www.example.org/index.html> <http://purl.org/dc/elements/1.1/creator> <http://www.example.org/staffid/85740> .
<http://www.example.org/index.html> <http://www.example.org/terms/creation-date> "August 16, 1999" .
<http://www.example.org/index.html> <http://www.example.org/terms/language> "English" .
Each triple corresponds to a single arc in the graph,
complete with the arc's beginning and ending nodes (the subject
and object of the statement). Unlike the drawn graph (but like
the original statements), the triples notation requires that a
node be separately identified for each statement it appears in.
So, for example, http://www.example.org/index.html
appears three times (once in each triple) in the triples
representation of the graph, but only once in the drawn graph.
However, the triples represent exactly the same information as
the drawn graph, and this is a key point: what is fundamental
to RDF is the graph model of the statements. The
notation used to represent or depict the graph is
secondary.
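The difference between the triples notation and the drawn graph can be checked mechanically. This Python sketch reads the three triples for Figure 3 and counts how often http://www.example.org/index.html is written out versus how many graph nodes there are. The regular expression is a simplifying assumption: it recognizes only <URIref> and "plain literal" terms, with no escapes, blank nodes, or typed literals.

```python
import re

# Simplified term pattern: <URIref> or "plain literal" (no escapes, no blank nodes).
TERM = re.compile(r'<[^>]*>|"[^"]*"')

FIGURE_3 = """
<http://www.example.org/index.html> <http://purl.org/dc/elements/1.1/creator> <http://www.example.org/staffid/85740> .
<http://www.example.org/index.html> <http://www.example.org/terms/creation-date> "August 16, 1999" .
<http://www.example.org/index.html> <http://www.example.org/terms/language> "English" .
"""


def parse_triples(text):
    """Group the terms of each statement into (subject, predicate, object) tuples."""
    terms = TERM.findall(text)
    return [tuple(terms[i:i + 3]) for i in range(0, len(terms), 3)]


def graph_nodes(triples):
    """Each distinct subject or object becomes exactly one node of the graph."""
    nodes = set()
    for subject, _, obj in triples:
        nodes.update((subject, obj))
    return nodes


triples = parse_triples(FIGURE_3)
page = "<http://www.example.org/index.html>"
print(sum(t.count(page) for t in triples))  # 3 mentions in the triples
print(len(graph_nodes(triples)))            # 4 nodes; the page is only one of them
```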
The full triples notation requires that URI references be
written out completely, in angle brackets, which, as the
example above illustrates, can result in very long lines on a page. For
convenience, the Primer uses a shorthand way of writing triples
(the same shorthand is also used in other RDF specifications).
This shorthand substitutes an XML qualified name
(or QName) without angle brackets as an abbreviation
for a full URI reference (QNames are discussed further in Appendix B).
A QName contains a prefix that has
been assigned to a namespace URI, followed by a colon, and then
a local name. The full URIref is formed
from the QName by appending the local name to the
namespace URI assigned to the prefix. So, for example, if the
QName prefix foo is assigned to the namespace URI
http://example.org/somewhere/, then the QName
foo:bar is shorthand for the URIref
http://example.org/somewhere/bar.
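The expansion rule just described is simple string concatenation, as this minimal Python sketch shows. The function name expand_qname and the choice to raise ValueError for an unknown prefix are assumptions made for illustration.

```python
def expand_qname(qname, prefixes):
    """Form the full URIref by appending the local name to the prefix's namespace URI."""
    prefix, colon, local = qname.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"no namespace URI assigned for {qname!r}")
    return prefixes[prefix] + local


PREFIXES = {"foo": "http://example.org/somewhere/"}
print(expand_qname("foo:bar", PREFIXES))  # http://example.org/somewhere/bar
```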
Primer examples will also use several "well-known" QName prefixes (without explicitly specifying them each time), defined as follows:
prefix rdf:, namespace URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#
prefix rdfs:, namespace URI: http://www.w3.org/2000/01/rdf-schema#
prefix dc:, namespace URI: http://purl.org/dc/elements/1.1/
prefix owl:, namespace URI: http://www.w3.org/2002/07/owl#
prefix ex:, namespace URI: http://www.example.org/ (or http://www.example.com/)
prefix xsd:, namespace URI: http://www.w3.org/2001/XMLSchema#
Obvious variations on the "example" prefix ex: will also be used as needed in the examples, for instance:
prefix exterms:, namespace URI: http://www.example.org/terms/ (for terms used by an example organization)
prefix exstaff:, namespace URI: http://www.example.org/staffid/ (for the example organization's staff identifiers)
prefix ex2:, namespace URI: http://www.domain2.example.org/ (for a second example organization)
and so on.
Using this new shorthand, the previous set of triples can be written as:
ex:index.html dc:creator exstaff:85740 .
ex:index.html exterms:creation-date "August 16, 1999" .
ex:index.html exterms:language "English" .
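Going in the other direction, from full triples to this shorthand, requires choosing a prefix for each URIref. Because ex: (http://www.example.org/) is itself a leading part of exterms: and exstaff:, a sensible choice is the longest matching namespace URI. The following Python sketch assumes that rule; the function name abbreviate is hypothetical, and the choice only affects how the triples are written, since RDF always works with the full URIrefs.

```python
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://www.example.org/",
    "exterms": "http://www.example.org/terms/",
    "exstaff": "http://www.example.org/staffid/",
}


def abbreviate(term, prefixes):
    """Rewrite '<URIref>' as prefix:local using the longest matching namespace URI."""
    if not term.startswith("<"):
        return term  # literals are left as they are
    uri = term[1:-1]
    matches = [p for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return term  # no prefix applies; keep the full form
    best = max(matches, key=lambda p: len(prefixes[p]))
    return best + ":" + uri[len(prefixes[best]):]


full = ("<http://www.example.org/index.html>",
        "<http://www.example.org/terms/creation-date>",
        '"August 16, 1999"')
print(" ".join(abbreviate(t, PREFIXES) for t in full), ".")
# ex:index.html exterms:creation-date "August 16, 1999" .
```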
Since RDF uses URIrefs instead of words to name things in statements, RDF refers to a set of URIrefs (particularly a set intended for a specific purpose) as a vocabulary. Often, the URIrefs in such vocabularies are organized so that they can be represented as a set of QNames using a common prefix. That is, a common namespace URIref will be chosen for all terms in a vocabulary, typically a URIref under the control of whoever is defining the vocabulary. URIrefs that are contained in the vocabulary are formed by appending individual local names to the end of the common URIref. This forms a set of URIrefs with a common prefix. For instance, as illustrated by the previous examples, an organization such as example.org might define a vocabulary consisting of URIrefs starting with the prefix http://www.example.org/terms/ for terms it uses in its business, such as "creation-date" or "product", and another vocabulary of URIrefs starting with http://www.example.org/staffid/ to identify its employees. RDF uses this same approach to define its own vocabulary of terms with special meanings in RDF. The URIrefs in this RDF vocabulary all begin with http://www.w3.org/1999/02/22-rdf-syntax-ns#, conventionally associated with the QName prefix rdf:.
The RDF Vocabulary Description Language (described in Section 5) defines an additional set of terms having URIrefs that begin with http://www.w3.org/2000/01/rdf-schema#, conventionally associated with the QName prefix rdfs:. (Where a specific QName prefix is commonly used in connection with a given set of terms in this way, the QName prefix itself is sometimes used as the name of the vocabulary. For example, someone might refer to "the rdfs: vocabulary".)
Using common URI prefixes provides a convenient way to organize the URIrefs for a related set of terms. However, this is just a convention. The RDF model only recognizes full URIrefs; it does not "look inside" URIrefs or use any knowledge about their structure. In particular, RDF does not assume there is any relationship between URIrefs just because they have a common leading prefix (see Appendix A for further discussion). Moreover, there is nothing that says that URIrefs with different leading prefixes cannot be considered part of the same vocabulary. A particular organization, process, tool, etc. can define a vocabulary that is significant for it, using URIrefs from any number of other vocabularies as part of its vocabulary.
In addition, sometimes an organization will use a vocabulary's namespace URIref as the URL of a Web resource that provides further information about that vocabulary. For example, as noted earlier, the QName prefix dc: will be used in Primer examples, associated with the namespace URIref http://purl.org/dc/elements/1.1/. In fact, this refers to the Dublin Core vocabulary described in Section 6.1. Accessing this namespace URIref in a Web browser will retrieve additional information about the Dublin Core vocabulary (specifically, an RDF schema). However, this is also just a convention. RDF does not assume that a namespace URI identifies a retrievable Web resource (see Appendix B for further discussion).
In the rest of the Primer, the term vocabulary will be used when referring to a set of URIrefs defined for some specific purpose, such as the set of URIrefs defined by RDF for its own use, or the set of URIrefs defined by example.org to identify its employees. The term namespace will be used only when referring specifically to the syntactic concept of an XML namespace (or in describing the URI assigned to a prefix in a QName).
URIrefs from different vocabularies can be freely mixed in RDF graphs. For example, the graph in Figure 3 uses URIrefs from the exterms:, exstaff:, and dc: vocabularies. Also, RDF imposes no restrictions on how many statements using a given URIref as predicate can appear in a graph to describe the same resource. For example, if the resource ex:index.html had been created by the cooperative efforts of several staff members in addition to John Smith, example.org might have written the statements:
ex:index.html dc:creator exstaff:85740 .
ex:index.html dc:creator exstaff:27354 .
ex:index.html dc:creator exstaff:00816 .
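Because an RDF graph is a set of triples, several statements can share the same subject and predicate while differing in their objects. The Python sketch below models the graph as a set of QName strings (a simplification; real URIrefs would be expanded) and uses a hypothetical helper, objects, to retrieve every dc:creator of ex:index.html. Adding an identical triple a second time leaves the set unchanged.

```python
# An RDF graph is a set of triples; this sketch uses a Python set of QName strings.
graph = {
    ("ex:index.html", "dc:creator", "exstaff:85740"),
    ("ex:index.html", "dc:creator", "exstaff:27354"),
    ("ex:index.html", "dc:creator", "exstaff:00816"),
}


def objects(graph, subject, predicate):
    """All objects of statements with the given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


graph.add(("ex:index.html", "dc:creator", "exstaff:85740"))  # duplicate: no effect
print(objects(graph, "ex:index.html", "dc:creator"))
# ['exstaff:00816', 'exstaff:27354', 'exstaff:85740']
```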
These examples of RDF statements begin to
illustrate some of the advantages of using URIrefs as RDF's
basic way of identifying things. For instance, in the first
statement, instead of
identifying the creator of the Web page by
the character string "John Smith", he has been assigned a URIref,
in this case (using a URIref based on his employee number)
http://www.example.org/staffid/85740 . An advantage of
using a URIref in this case is that the identification
of the statement's object can be more precise.
That is, the creator of the page is not the
character string "John Smith", or any one of the thousands of
people named John Smith, but the particular John Smith
associated with that URIref (whoever created the URIref defines
the association). Moreover, since there is a URIref to refer to
John Smith, he is a full-fledged resource, and
additional information can be recorded about him, simply by adding
additional RDF statements with John's URIref as the subject.
For example, Figure 4 shows some additional
statements giving John's name and age.
These examples also illustrate that RDF uses URIrefs as
predicates in RDF statements. That is, rather than
using character strings (or words) such as "creator" or "name"
to identify properties, RDF uses URIrefs. Using URIrefs to
identify properties is important for a number of reasons.
First, it distinguishes the properties one person may use from
different properties someone else may use that would otherwise be
identified by the same character string. For instance, in the
example in Figure
4, example.org uses "name" to mean someone's full name
written out as a character string literal (e.g., "John Smith"),
but someone else may intend "name" to mean something different
(e.g., the name of a variable in a piece of program text). A
program encountering "name" as a property identifier on the Web
(or merging data from multiple sources) would not necessarily be
able to distinguish these uses. However, if example.org writes
http://www.example.org/terms/name for its "name"
property, and the other person writes
http://www.domain2.example.org/genealogy/terms/name
for hers, it is clear that there are distinct
properties involved (even if a program cannot automatically
determine the distinct meanings). Also, using URIrefs to identify properties
enables the properties to be treated as resources themselves.
Since properties are resources, additional
information can be recorded about them (e.g., the English description of what
example.org means by "name"), simply by adding additional RDF
statements with the property's URIref as the subject.
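The effect of naming properties with URIrefs shows up when data from two sources is merged. In this Python sketch both sources use a property whose last path segment is "name"; the domain2 subject and value are hypothetical. Keyed by the bare word, the two properties collapse into one; keyed by full URIref, they stay distinct.

```python
# Two sources that both use a property called "name" (domain2 data is hypothetical).
example_org = {
    ("http://www.example.org/staffid/85740",
     "http://www.example.org/terms/name", "John Smith"),
}
domain2 = {
    ("http://www.domain2.example.org/genealogy/people/p1",       # hypothetical subject
     "http://www.domain2.example.org/genealogy/terms/name", "Smith"),  # hypothetical value
}

merged = example_org | domain2
predicates = {p for _, p, _ in merged}
local_words = {p.rsplit("/", 1)[-1] for p in predicates}

print(local_words)      # {'name'}: the bare word cannot tell them apart
print(len(predicates))  # 2: the full URIrefs remain distinct properties
```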
Using URIrefs as subjects, predicates, and objects in RDF statements supports the development and use of a shared vocabulary on the Web, since people can discover and begin using vocabularies already used by others to describe things, reflecting a shared understanding of those concepts. For example, in the triple
ex:index.html dc:creator exstaff:85740 .
the predicate dc:creator, when fully expanded as a
URIref, is an unambiguous reference to the "creator" attribute
in the Dublin Core metadata attribute set (discussed further in
Section 6.1), a widely-used set of
attributes (properties) for describing information of all
kinds. The writer of this triple is effectively saying that the
relationship between the Web page (identified by
http://www.example.org/index.html ) and the creator of
the page (a distinct person, identified by
http://www.example.org/staffid/85740 ) is exactly the
concept identified by
http://purl.org/dc/elements/1.1/creator.
Another
person familiar with the Dublin Core vocabulary,
or who finds out what dc:creator
means (say by looking up its definition on the Web)
will know what is meant by this relationship.
In addition, based on this understanding, people can
write programs to behave in accordance with
that meaning when processing triples containing the predicate
dc:creator.
Of course, this depends on increasing the general use of URIrefs to refer to things instead of using literals; e.g., using URIrefs like exstaff:85740 and dc:creator instead of character string literals like John Smith and creator.
Even then, RDF's use of URIrefs does not solve all identification
problems because, for example, people can still use different
URIrefs to refer to the same thing. However, the fact that
these different URIrefs are used in the commonly-accessible
"Web space" creates the opportunity both to identify
equivalences among these different references, and to migrate
toward the use of common references.
In addition, it is important to distinguish between any meaning that RDF itself associates with terms used in RDF statements, such as dc:creator in the previous example, and additional, externally-defined meaning that people (or programs written by those people) might associate with those terms.
As a language, RDF directly defines only the graph syntax of subject, predicate, and object triples, certain meanings associated with URIrefs in the rdf: vocabulary, and certain other concepts to be described later, as normatively defined in [RDF-CONCEPTS] and [RDF-SEMANTICS]. However, RDF does not define the meanings of terms from other vocabularies, such as dc:creator, that might be used in RDF statements. Specific vocabularies will be created, with specific meanings assigned to the URIrefs defined in them, externally to RDF. RDF statements using URIrefs from these vocabularies may convey the specific meanings associated with those terms to people familiar with these vocabularies, or to RDF applications written to process these vocabularies, without conveying any of these meanings to an arbitrary RDF application.
For example, people can associate meaning with a triple such as
ex:index.html dc:creator exstaff:85740 .
based on the meaning they associate with the appearance of the
word "creator" as part of the URIref dc:creator, or based on
their understanding of the specific definition of dc:creator
in the Dublin Core vocabulary.
However, to an arbitrary RDF application, the triple might as
well be something like
fy:joefy.iunm ed:dsfbups fytubgg:85740 .
as far as any built-in meaning is concerned. Similarly, any
natural language text describing the meaning of dc:creator
that might be found on the Web provides no
additional meaning that such a program can directly use.
Of course, URIrefs from a particular vocabulary can be used in RDF statements even though a given application may not be able to associate any special meanings with them.
For example,
generic RDF software would recognize that
the above expression is an RDF statement, that ed:dsfbups is the
predicate, and so on. It will simply not associate with the triple any special meaning that the vocabulary developer might have associated with a URIref like ed:dsfbups. Moreover, based on their understanding of a given vocabulary, people can write RDF applications to behave in accordance with the special meanings assigned to URIrefs from that vocabulary, even though that meaning will not be accessible to RDF applications not written in that way.
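The scrambled triple above is the original one with every letter replaced by the next letter of the alphabet (digits and punctuation unchanged). The following Python sketch reproduces that renaming and shows that a generic processor, one that reports only graph structure such as the number of statements, distinct subjects, and statements per predicate, gets identical results for both versions. The helper names and the choice of structural measures are illustrative assumptions.

```python
from collections import Counter


def shift_letters(text):
    """Replace each lowercase ASCII letter with the next one (a->b, ..., z->a)."""
    return "".join(
        chr((ord(c) - ord("a") + 1) % 26 + ord("a")) if "a" <= c <= "z" else c
        for c in text
    )


def structure(triples):
    """What a generic RDF processor can report without knowing any vocabulary."""
    per_predicate = Counter(p for _, p, _ in triples)
    return len(triples), len({s for s, _, _ in triples}), sorted(per_predicate.values())


original = [
    ("ex:index.html", "dc:creator", "exstaff:85740"),
    ("ex:index.html", "exterms:language", '"English"'),
]
scrambled = [tuple(shift_letters(t) for t in triple) for triple in original]

print(" ".join(scrambled[0]))  # fy:joefy.iunm ed:dsfbups fytubgg:85740
print(structure(original) == structure(scrambled))  # True
```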
The result of all this is that RDF provides a way to make
statements that applications can more easily process. An
application cannot actually "understand" such statements, as noted
above, any more than a database system "understands" terms like "employee" or "salary" in processing a query like SELECT NAME FROM EMPLOYEE WHERE SALARY > 35000.
However, if an application is appropriately written,
it can deal with RDF statements in a way that makes it seem
like it does understand them, just as a database system and its applications can do useful work in processing employee and payroll information without understanding "employee" and "payroll".
For example, a user could search the Web for all
book reviews and create an average rating for each book. Then,
the user could put that information back on the Web. Another
Web site could take that list of book rating averages and
create a "Top Ten Highest Rated Books" page. Here, the
availability and use of a shared vocabulary about ratings, and
a shared group of URIrefs identifying the books they apply to,
allows individuals to build a mutually-understood and
increasingly-powerful (as additional contributions are made)
"information base" about books on the Web. The same principle
applies to the vast amounts of information that people create
about thousands of subjects every day on the Web.
RDF statements are similar to a number of other formats for recording information, and information in these formats can be treated as RDF statements, allowing RDF to be used to integrate data from many sources.
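As a concrete illustration of treating another format as RDF statements, the following minimal Python sketch turns the rows of a relational-style table into triples. It assumes a hypothetical EMPLOYEE table and a hypothetical function rows_to_triples; the key column becomes the subject URIref (under the Primer's exstaff: namespace), each other column name becomes a predicate URIref (under exterms:), and each cell becomes a plain literal.

```python
# Namespaces reused from the Primer's examples; the table and function
# below are hypothetical and only illustrate the idea.
STAFF = "http://www.example.org/staffid/"
TERMS = "http://www.example.org/terms/"


def rows_to_triples(rows, key):
    """Treat each table row as a group of (subject, predicate, object) triples.

    The key column's value becomes the subject URIref, every other column
    name becomes a predicate URIref, and the cell value becomes a plain
    literal (kept as a string).
    """
    triples = []
    for row in rows:
        subject = STAFF + str(row[key])
        for column, value in row.items():
            if column == key:
                continue  # the key identifies the subject; it is not a property
            triples.append((subject, TERMS + column, str(value)))
    return triples


# A hypothetical EMPLOYEE table with a single row.
employee = [{"id": 85740, "name": "John Smith", "age": 27}]

for triple in rows_to_triples(employee, key="id"):
    print(triple)
```

Note that the age comes out as the plain literal "27"; nothing in the triple says it is a number.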
Things would be very simple if the only types of information
to be recorded about things were obviously in the form of the
simple RDF statements illustrated so far. However, most
real-world data involves structures that are more complicated
than that, at least on the surface. For instance, in the
original example, the date the Web page was created is recorded
as a single exterms:creation-date property, with a
plain literal as its value. However, suppose
the value of the exterms:creation-date property
needed to record
the month, day, and year as separate pieces of information? Or,
in the case of John Smith's personal information, suppose
John's address was being described. The whole address could be
written out as a plain literal, as in the triple
exstaff:85740 exterms:address "1501 Grant Avenue, Bedford, Massachusetts 01730" .
However, suppose John's address needed to be recorded as a structure consisting of separate street, city, state, and postal code values? How would this be done in RDF?
Structured information like this is represented in RDF by
considering the aggregate thing to be described (like
John Smith's address) as a resource, and then making statements
about that new resource. So, in the RDF graph, in order to
break up John Smith's address into its component parts,
a new node is created to represent the concept of John Smith's
address, with a new URIref to identify it,
say http://www.example.org/addressid/85740
(abbreviated as exaddressid:85740).
RDF statements (additional arcs and nodes) can then be
written with that
node as the subject, to represent the additional information,
producing the graph shown in Figure
5:
or the triples:
exstaff:85740 exterms:address exaddressid:85740 .
exaddressid:85740 exterms:street "1501 Grant Avenue" .
exaddressid:85740 exterms:city "Bedford" .
exaddressid:85740 exterms:state "Massachusetts" .
exaddressid:85740 exterms:postalCode "01730" .
This way of representing structured information in RDF can
involve generating numerous "intermediate" URIrefs
such as exaddressid:85740 to represent aggregate concepts such as
John's address. Such concepts may never need to be referred to
directly from outside a particular graph, and hence may not
require "universal" identifiers. In addition, in the
drawing of the graph representing the group of
statements shown in Figure 5,
the URIref assigned to identify "John Smith's
address" is not really needed, since the graph could just as easily
have been drawn as in Figure 6:
Figure 6, which is a perfectly good RDF graph, uses a node without a URIref to stand for the concept of "John Smith's address". This blank node serves its purpose in the drawing without needing a URIref, since the node itself provides the necessary connectivity between the various other parts of the graph. (Blank nodes were called anonymous resources in [RDF-MS].) However, some form of explicit identifier for that node is needed in order to represent this graph as triples. To see why, consider writing the triples corresponding to what is shown in Figure 6; the result would be something like:
exstaff:85740 exterms:address ??? .
??? exterms:street "1501 Grant Avenue" .
??? exterms:city "Bedford" .
??? exterms:state "Massachusetts" .
??? exterms:postalCode "01730" .
where ??? stands for something that indicates the presence
of the blank node. Since a complex graph might contain more
than one blank node, there also needs to be a way to differentiate
between these different blank nodes in a triples representation
of the graph. As a result, triples use blank
node identifiers, having the form _:name, to
indicate the presence of blank nodes. For instance,
in this example a blank node identifier _:johnaddress
might be used to refer to the blank node, in which
case the resulting triples might be:
exstaff:85740 exterms:address _:johnaddress .
_:johnaddress exterms:street "1501 Grant Avenue" .
_:johnaddress exterms:city "Bedford" .
_:johnaddress exterms:state "Massachusetts" .
_:johnaddress exterms:postalCode "01730" .
In a triples representation of a graph, each distinct blank node in the graph is given a different blank node identifier. Unlike URIrefs and literals, blank node identifiers are not considered to be actual parts of the RDF graph (this can be seen by looking at the drawn graph in Figure 6 and noting that the blank node has no blank node identifier). Blank node identifiers are just a way of representing the blank nodes in a graph (and distinguishing one blank node from another) when the graph is written in triple form. Blank node identifiers also have significance only within the triples representing a single graph (two different graphs with the same number of blank nodes might independently use the same blank node identifiers to distinguish them, and it would be incorrect to assume that blank nodes from different graphs having the same blank node identifiers are the same). If it is expected that a node in a graph will need to be referenced from outside the graph, a URIref should be assigned to identify it.
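Because blank node identifiers only have significance within one graph, software that combines graphs must keep blank nodes from different graphs apart even when they share an identifier. A minimal Python sketch of this, assuming terms are plain strings in the Primer's triple notation (blank nodes start with "_:", literals keep their quotes) and using a hypothetical function merge_graphs and a hypothetical second employee exstaff:12345:

```python
import itertools


def merge_graphs(*graphs):
    """Combine several graphs (sets of triples) into one.

    Blank node identifiers only mean something inside the graph that uses
    them, so each graph's identifiers are mapped to fresh ones before its
    triples are added to the result.
    """
    fresh = itertools.count(1)
    merged = set()
    for graph in graphs:
        renaming = {}  # this graph's blank node ids -> fresh ids
        for triple in graph:
            new_triple = []
            for term in triple:
                if term.startswith("_:"):
                    if term not in renaming:
                        renaming[term] = "_:b%d" % next(fresh)
                    term = renaming[term]
                new_triple.append(term)
            merged.add(tuple(new_triple))
    return merged


# Two hypothetical graphs that happen to use the same identifier _:addr.
g1 = {("exstaff:85740", "exterms:address", "_:addr"),
      ("_:addr", "exterms:city", '"Bedford"')}
g2 = {("exstaff:12345", "exterms:address", "_:addr"),
      ("_:addr", "exterms:city", '"Boston"')}

merged = merge_graphs(g1, g2)
```

A naive union such as g1 | g2 would instead treat both occurrences of _:addr as one node with two cities, which is exactly the incorrect assumption described above.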
The beginning of this section noted that aggregate structures, like John Smith's address, can be represented by considering the aggregate thing to be described as a separate resource, and then making statements about that new resource. This example illustrates an important aspect of RDF: RDF directly represents only binary relationships, e.g. the relationship between John Smith and the literal representing his address. Representing the relationship between John and the group of separate components of this address involves dealing with an n-ary (n-way) relationship (in this case, n=5) between John and the street, city, state, and postal code components. In order to represent such structures directly in RDF (e.g., considering the address as a group of street, city, state, and postal code components), this n-way relationship must be broken up into a group of separate binary relationships. Blank nodes provide one way to do this. For each n-ary relationship, one of the participants is chosen as the subject of the relationship (John in this case), and a blank node is created to represent the rest of the relationship (John's address in this case). The remaining participants in the relationship (such as the city in this example) are then represented as separate properties of the new resource represented by the blank node.
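The decomposition just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of any RDF specification: the function nary_to_binary and its blank node labels (_:n1, _:n2, ...) are hypothetical, and terms are written as strings in the Primer's triple notation.

```python
import itertools

_fresh = itertools.count(1)  # source of new blank node labels


def nary_to_binary(subject, link_property, components):
    """Break an n-ary relationship into binary statements via a blank node.

    `subject` is the participant chosen as the subject (e.g. John),
    `link_property` relates it to a new blank node standing for the rest of
    the relationship (e.g. his address), and each remaining participant
    becomes a separate property of that blank node.
    """
    node = "_:n%d" % next(_fresh)
    triples = [(subject, link_property, node)]
    for prop, value in components.items():
        triples.append((node, prop, value))
    return triples


address = {
    "exterms:street": '"1501 Grant Avenue"',
    "exterms:city": '"Bedford"',
    "exterms:state": '"Massachusetts"',
    "exterms:postalCode": '"01730"',
}

for triple in nary_to_binary("exstaff:85740", "exterms:address", address):
    print(*triple, ".")
```

The five participants (John plus four address components) end up in five binary statements, all connected through the one blank node.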
Blank nodes also provide a way to more accurately make
statements about resources that may not have URIs, but that are
described in terms of relationships with other resources that
do have URIs. For example, when making statements
about a person, say Jane Smith, it may seem natural to use a
URI based on that person's email address as her URI, e.g.,
mailto:jane@example.org. However, this approach can
cause problems. For example, it may also be necessary to record information
about Jane's mailbox (e.g., the server it is on) as well as
about Jane herself (e.g., her current address), and using a
URIref for Jane based on her email address makes it difficult
to know whether it is Jane or her mailbox that is being described.
The same problem exists when a company's Web page URL, say
http://www.example.com/, is used as the URI of the
company itself. Once again, it may be necessary to record information
about the Web page itself (e.g., who created it and when) as well as
about the company, and using http://www.example.com/
as an identifier for both makes it difficult to know which
of these is the actual subject.
The fundamental problem is that using Jane's
mailbox as a stand-in for Jane is not really
accurate: Jane and her mailbox are not the same thing, and
hence they should be identified differently. When Jane herself
does not have a URI, a blank node provides a more accurate way
of modeling this situation. Jane can be represented by a blank
node, and that blank node used as the subject of a statement
with exterms:mailbox as the property
and the URIref mailto:jane@example.org as
its value. The blank node could also be described with an
rdf:type property having a value of
exterms:Person (types are discussed in more detail
in the following sections), an exterms:name property
having a value of "Jane Smith", and any other
descriptive information that might be useful, as shown in
the following triples:
_:jane exterms:mailbox mailto:jane@example.org .
_:jane rdf:type exterms:Person .
_:jane exterms:name "Jane Smith" .
_:jane exterms:empID "23748" .
_:jane exterms:age "26" .
This says, accurately, that "there is a resource of type
exterms:Person, whose electronic mailbox is identified
by mailto:jane@example.org, whose name is Jane
Smith, etc." That is, the blank node can be read as "there
is a resource". Statements with that blank node as subject then
provide information about the characteristics of that
resource.
In practice, using blank nodes instead of URIrefs in these
cases does not change the way this kind of
information is handled very much. For example, if it is known
that an email address uniquely identifies someone at
example.org (particularly if the address is unlikely to be
reused), that fact can still be used to associate information
about that person from multiple sources, even though the email
address is not the person's URI. In this case, if some
RDF is found on the Web that describes a book, and
gives the author's contact information as
mailto:jane@example.org, it might be reasonable to conclude
that the author's name is Jane Smith. The point is that saying
something like "the author of the book is
mailto:jane@example.org" is typically a shorthand for
"the author of the book is someone whose mailbox is
mailto:jane@example.org". Using a blank node to
represent this "someone" is just a more accurate way to
represent the real world situation. (Incidentally, some
RDF-based schema languages allow specifying that certain
properties are unique identifiers of the resources they
describe. This is discussed further in
Section 5.5.)
Using blank nodes in this way can also help avoid the use of
literals in what might be inappropriate situations. For example,
in describing Jane's book, lacking a URIref to identify the author,
the publisher might have written (using the
publisher's own ex2terms: vocabulary):
ex2terms:book78354 rdf:type ex2terms:Book .
ex2terms:book78354 ex2terms:author "Jane Smith" .
However, the author of the book is not really the character string "Jane Smith", but a person whose name is Jane Smith. The same information might be more accurately given by the publisher using a blank node, as:
ex2terms:book78354 rdf:type ex2terms:Book .
ex2terms:book78354 ex2terms:author _:author78354 .
_:author78354 rdf:type ex2terms:Person .
_:author78354 ex2terms:name "Jane Smith" .
This essentially says "resource ex2terms:book78354
is of type ex2terms:Book, and its author is a resource of type
ex2terms:Person, whose name is Jane
Smith." Of course, in this particular case the publisher might
have assigned its own URIrefs to its authors instead of using blank nodes to
identify them, in order to encourage external
references to its authors.
The last section described how to handle situations
in which property values represented by plain
literals had to be broken up into structured values to
represent the individual parts of those literals. Using
this approach, instead of, say, recording the date a Web page
was created as a single exterms:creation-date
property, with a single plain literal as its value,
the value would be represented as a structure consisting of the month,
day, and year as separate pieces of information, using separate
plain literals to represent the corresponding values.
However, so
far, all constant values that serve as objects in RDF statements
have been represented by these plain
(untyped) literals, even when the intent is probably for the value
of the property to be a number (e.g., the value of a
year or age property) or some other kind of
more specialized value.
For example, Figure 4
illustrated an RDF graph recording information about John
Smith. That graph recorded the value of John Smith's
exterms:age property as the plain literal "27", as
shown in Figure 7:
In this case, the hypothetical organization example.org
probably intends for "27" to be interpreted as a number, rather
than as the string consisting of the character "2" followed by
the character "7"
(since the literal represents the value of an "age"
property). However, there is no information in Figure 7's graph
that explicitly indicates that "27" should be interpreted as
a number. Similarly, example.org also
probably intends for "27" to be interpreted as a decimal
number, i.e., the value twenty seven, rather than, say,
as an octal number, i.e., the value twenty three.
However, once again there is no information in Figure 7's graph that
explicitly indicates this. Specific applications might be written
with the understanding that they should
interpret values of the exterms:age property as decimal
numbers, but this would mean that proper interpretation of this
RDF would depend on information not explicitly provided
in the RDF graph, and hence on information that would not necessarily
be available to other applications that might need to interpret this RDF.
The common practice in
programming languages or database systems is to provide this
additional information about how to interpret a literal
by associating a datatype with the
literal, in this case, a datatype like decimal or
integer. An application that understands the datatype
then knows, for example, whether the literal "10" is intended
to represent the number ten, the number two,
or the string consisting of the character "1" followed by the
character "0", depending on whether the specified datatype is
integer, binary, or string. In RDF,
typed
literals are used to provide this kind of information.
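To make the role of the datatype concrete, here is a minimal Python sketch in which each datatype is reduced to its lexical-to-value mapping. The names "binary" and "octal" are hypothetical base-2 and base-8 datatypes used only for illustration (they are not XML Schema datatypes); the point is that the same string yields different values depending on which mapping is applied.

```python
# Hypothetical lexical-to-value mappings, keyed by illustrative names.
INTERPRET = {
    "integer": lambda s: int(s, 10),  # decimal digits
    "binary": lambda s: int(s, 2),    # base-2 digits (illustrative only)
    "octal": lambda s: int(s, 8),     # base-8 digits (illustrative only)
    "string": lambda s: s,            # the characters themselves
}

for name in ("integer", "binary", "string"):
    print(name, repr(INTERPRET[name]("10")))

# The earlier age example: "27" read as octal is twenty three.
print("octal", INTERPRET["octal"]("27"))
```

Run as written, this prints integer 10, binary 2, string '10', and octal 23, one per line.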
An RDF typed literal is formed by pairing a string with a URIref that identifies a particular datatype. This results in a single literal node in the RDF graph with the pair as the literal. The value represented by the typed literal is the value that the specified datatype associates with the specified string. For example, using a typed literal, John Smith's age could be described as being the integer number 27 using the triple:
<http://www.example.org/staffid/85740> <http://www.example.org/terms/age> "27"^^<http://www.w3.org/2001/XMLSchema#integer> .
or, using the QName simplification for writing long URIs:
exstaff:85740 exterms:age "27"^^xsd:integer .
or as shown in Figure 8:
Similarly, in the graph shown in Figure
3 describing information about a Web page, the
value of the page's exterms:creation-date property
was written as
the plain literal "August 16, 1999". However, using a typed
literal, the creation date of the Web page could be explicitly described as
being the date August 16, 1999, using the triple:
ex:index.html exterms:creation-date "1999-08-16"^^xsd:date .
or as shown in Figure 9:
Unlike typical programming languages and database systems,
RDF has no built-in set of datatypes of its own, such as
datatypes for integers, reals, strings, or dates.
Instead,
RDF typed literals simply provide a way to explicitly
indicate, for a given literal, what datatype should be used to
interpret it. The datatypes used in typed literals are defined
externally to RDF, and identified by their datatype
URIs.
(There is one exception: RDF defines a built-in datatype with the
URIref rdf:XMLLiteral to represent XML content as a literal
value. This datatype is defined in
[RDF-CONCEPTS], and its use is
described in Section 4.5.)
For instance, the examples in Figure 8
and Figure 9 use the datatypes integer and
date from the XML Schema datatypes defined in XML Schema Part 2:
Datatypes [XML-SCHEMA2].
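A typed literal is just a pairing of a string with a datatype URIref, and the QName xsd:integer is shorthand for a full URIref. The following Python sketch models that pairing and writes out the full form of the age triple shown earlier. The prefix table, the expand function, and the TypedLiteral class are assumptions made for this illustration; escaping of quotes inside the string is omitted.

```python
from typing import NamedTuple

# Prefix bindings assumed for this sketch, matching the Primer's abbreviations.
PREFIXES = {
    "exstaff": "http://www.example.org/staffid/",
    "exterms": "http://www.example.org/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(qname):
    """Expand a QName such as xsd:integer into its full URIref."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


class TypedLiteral(NamedTuple):
    lexical: str   # the character string, e.g. "27"
    datatype: str  # full URIref identifying the datatype

    def __str__(self):
        # Written as "string"^^<datatype URIref>; quote escaping omitted.
        return '"%s"^^<%s>' % (self.lexical, self.datatype)


age = TypedLiteral("27", expand("xsd:integer"))
print("<%s> <%s> %s ." % (expand("exstaff:85740"), expand("exterms:age"), age))
```

The printed line matches the fully written-out triple for John Smith's age given earlier in this section.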
An advantage of this approach is that it
gives RDF the flexibility to directly represent information
coming from different sources without the need to perform type
conversions between these sources and a native set of RDF
datatypes. (Type conversions would still be required when
moving information between systems having different sets of datatypes,
but RDF would impose no extra conversions into and out
of a native set of RDF datatypes.)
RDF datatype concepts are based on a conceptual framework from XML Schema datatypes [XML-SCHEMA2], as described in RDF Concepts and Abstract Syntax [RDF-CONCEPTS]. This conceptual framework defines a datatype as consisting of:
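- A set of values, called the value space, that literals of the datatype are intended to represent. For example, for the XML Schema datatype xsd:date, this set of values is a set of dates.
- A set of character strings, called the lexical space, that the datatype uses to represent its values. For example, the datatype xsd:date defines 1999-08-16 as being a legal way to write a literal of this type (as opposed, say, to August 16, 1999).
- A lexical-to-value mapping from the lexical space to the value space. For example, the lexical-to-value mapping for xsd:date determines that, for this datatype, the string 1999-08-16 represents the date August 16, 1999. The lexical-to-value mapping matters because the same character string may represent different values for different datatypes.
Not all datatypes are suitable for use in RDF. For a datatype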
to be suitable for use in RDF, it
must conform to the conceptual framework just described. This basically means
that, given a character string, the datatype must
unambiguously define
whether or not the string is in its lexical space, and
what value in its value space the string represents.
For example, the basic XML Schema datatypes
such as xsd:string, xsd:boolean, xsd:date,
etc. are suitable
for use in RDF. However, some of the built-in XML Schema datatypes
are not suitable for use in RDF. For example, xsd:duration does
not have a well-defined value space, and xsd:QName requires an
enclosing XML document context. Lists of the XML Schema datatypes
that are currently considered suitable and unsuitable for use in
RDF are given in [RDF-SEMANTICS].
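The two requirements just stated (decide whether a string is in the lexical space, and decide which value it represents) can be sketched for a simplified xsd:date in Python. This is not a complete implementation: it accepts only YYYY-MM-DD strings that name a real calendar day, whereas the full XML Schema definition also allows, for example, a leading minus sign, years with more than four digits, and an optional timezone.

```python
import datetime
import re

# Simplified lexical space for xsd:date: exactly YYYY-MM-DD.
_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def _parse(s):
    """Return the date s represents, or None if s is not a legal literal."""
    m = _DATE.fullmatch(s)
    if m is None:
        return None
    try:
        return datetime.date(*(int(g) for g in m.groups()))
    except ValueError:  # e.g. "1999-02-30" names no real day
        return None


def in_lexical_space(s):
    """True if s is in this (simplified) lexical space."""
    return _parse(s) is not None


def lexical_to_value(s):
    """Map a string in the lexical space to the date it represents."""
    value = _parse(s)
    if value is None:
        raise ValueError("%r is not in the lexical space of xsd:date" % s)
    return value


print(in_lexical_space("1999-08-16"))       # True
print(in_lexical_space("August 16, 1999"))  # False
print(lexical_to_value("1999-08-16"))       # 1999-08-16
```

Both questions get a yes-or-no answer for every input string, which is the property that makes a datatype usable in RDF.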
Since the value that a given typed literal denotes is defined
by the typed literal's datatype, and, with the exception of
rdf:XMLLiteral, RDF does not define any datatypes,
the actual interpretation of a
typed literal appearing in an RDF graph (e.g., determining
the value it denotes) must be performed by software
that is written to correctly process not only RDF, but the
typed literal's datatype as well. Effectively, this software must
be written to process an extended language that includes not
only RDF, but also the datatype, as part of its built-in
vocabulary.
This raises the issue of which datatypes will be generally available in
RDF software.
Generally, the XML Schema datatypes that are listed as suitable
for use in RDF in [RDF-SEMANTICS]
have a "first among equals" status in RDF.
As noted already, the examples in Figure 8 and
Figure 9
used some of these XML Schema datatypes, and the Primer will be
using these datatypes in
most of its other examples of typed literals as well (for one thing, XML Schema
datatypes already have assigned URIrefs that can be used to refer to them, specified
in [XML-SCHEMA2]). These XML Schema datatypes are
treated no differently than any other datatype, but they are
expected to be the most widely used, and therefore the most
likely to be interoperable among different software. As a
result, it is expected that much RDF software will also be
written to process these datatypes. However, RDF software
could be written to process other sets of datatypes as
well, assuming they were determined to be suitable for use
with RDF, as described already.
In general, RDF software may be called on to process RDF
data that contains references to datatypes that the software
has not been written to
process, in which case there are some things the software
will not be able to do.
For one thing, with the exception of rdf:XMLLiteral,
RDF itself does not define the URIrefs that identify datatypes.
As a result, RDF software, unless it has been written to recognize specific
URIrefs, will not be able to determine whether or not
a URIref written in a typed literal actually identifies a datatype.
Moreover, even when a URIref does identify a datatype, RDF
itself does not define the validity of pairing that datatype
with a particular literal. This validity can only be determined
by software written to correctly process that particular datatype.
For example, the typed literal in the triple:
exstaff:85740 exterms:age "pumpkin"^^xsd:integer .
or the graph shown in Figure 10:
is valid RDF, but obviously an error as far as the
xsd:integer datatype is concerned, since "pumpkin" is
not defined as being in the lexical space of
xsd:integer.
RDF software not written to
process the xsd:integer datatype would not be able
to recognize this error.
However, proper use of RDF typed literals provides more information about the intended interpretation of literal values, and hence makes RDF statements a better means of information exchange among applications.
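The difference between software that knows a datatype and software that does not can be sketched in Python. Everything here is an assumption for illustration: check_typed_literal is a hypothetical function, the application is assumed to know only xsd:integer (whose lexical space is an optional sign followed by decimal digits), and http://www.example.org/types/vegetable is a made-up datatype URIref.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Datatypes this hypothetical application was written to process, each
# with a lexical-space check. xsd:integer: optional sign, then digits.
KNOWN_DATATYPES = {
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
}


def check_typed_literal(lexical, datatype_uri):
    """Classify a typed literal as 'ok', 'ill-formed', or 'unknown datatype'."""
    pattern = KNOWN_DATATYPES.get(datatype_uri)
    if pattern is None:
        # RDF itself cannot say whether this pairing is valid.
        return "unknown datatype"
    return "ok" if pattern.fullmatch(lexical) else "ill-formed"


print(check_typed_literal("27", XSD + "integer"))       # ok
print(check_typed_literal("pumpkin", XSD + "integer"))  # ill-formed
print(check_typed_literal("pumpkin", "http://www.example.org/types/vegetable"))
```

The last call prints unknown datatype: without datatype-specific knowledge, the software can only report that it cannot judge the literal.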
Taken as a whole, RDF is basically simple: nodes-and-arcs diagrams interpreted as statements about things identified by URIrefs. This section has presented an introduction to these concepts. As noted earlier, the normative (i.e., definitive) RDF specification describing these concepts is RDF Concepts and Abstract Syntax [RDF-CONCEPTS], which should be consulted for further information. The formal semantics (meaning) of these concepts is defined in the (normative) RDF Semantics [RDF-SEMANTICS] document.
However, in addition to the basic techniques for describing things using RDF statements discussed so far, it should be clear that people or organizations also need a way to describe the vocabularies (terms) they intend to use in those statements, specifically, vocabularies for:
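- types of things (e.g., ex:Person),
- properties (e.g., ex:age and ex:creation-date), and
- the kinds of values those properties may take (e.g., specifying that the value of an ex:age property should always be an xsd:integer).
The basis for describing such vocabularies in RDF is the RDF Vocabulary Description Language 1.0: RDF Schema [RDF-VOCABULARY], which will be described in Section 5.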
Additional background on the basic ideas underlying RDF, and its role in providing a general language for describing Web information, can be found in [WEBDATA]. RDF draws upon ideas from knowledge representation, artificial intelligence, and data management, including Conceptual Graphs, logic-based knowledge representation, frames, and relational databases. Some possible sources of background information on these subjects include [SOWA], [CG], [KIF], [HAYES], [LUGER], and [GRAY].
As described in Section 2, RDF's conceptual model is a graph. RDF provides an XML syntax for writing down and exchanging RDF graphs, called RDF/XML. Unlike triples, which are intended as a shorthand notation, RDF/XML is the normative syntax for writing RDF. RDF/XML is defined in the RDF/XML Syntax Specification [RDF-SYNTAX]. This section describes this RDF/XML syntax.
The basic ideas behind the RDF/XML syntax can be illustrated using some of the examples presented already. Take as an example the English statement:
http://www.example.org/index.html
has a creation-date whose value is August 16,
1999
The RDF graph for this single statement, after assigning a
URIref to the creation-date property, is shown in Figure 11:
with a triple representation of:
ex:index.html exterms:creation-date "August 16, 1999" .
(Note that a typed literal is not used for the date value in this example. Representing typed literals in RDF/XML will be described later in this section.)
Example 2 shows the RDF/XML syntax corresponding to the graph in Figure 11:
1. <?xml version="1.0"?>
2. <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.             xmlns:exterms="http://www.example.org/terms/">
4.   <rdf:Description rdf:about="http://www.example.org/index.html">
5.       <exterms:creation-date>August 16, 1999</exterms:creation-date>
6.   </rdf:Description>
7. </rdf:RDF>
(Line numbers are added to help in explaining the example).
This seems like a lot of overhead. It is easier to understand what is going on by considering each part of this XML in turn (a brief introduction to XML is provided in Appendix B).
Line 1, <?xml version="1.0"?>, is the XML
declaration, which indicates that the following content is XML,
and what version of XML it is.
Line 2 begins an rdf:RDF element. This indicates
that the following XML content (starting here and ending with
the </rdf:RDF> in line 7) is intended to
represent RDF. Following the rdf:RDF on this same line
is an XML namespace declaration, represented as an
xmlns attribute of the rdf:RDF start-tag.
This declaration
specifies that all tags in this content
prefixed with rdf: are part of the namespace
identified by the URIref
http://www.w3.org/1999/02/22-rdf-syntax-ns#.
URIrefs beginning with the string
http://www.w3.org/1999/02/22-rdf-syntax-ns#
are used for terms from the RDF vocabulary.
Line 3 specifies another XML namespace declaration, this
time for the prefix exterms:. This is expressed as
another xmlns attribute of the rdf:RDF
element, and specifies that the namespace URIref
http://www.example.org/terms/ is to be associated with
the exterms: prefix.
URIrefs beginning with the string
http://www.example.org/terms/
are used for terms from the vocabulary defined by the example organization,
example.org.
The ">" at the end of line 3 indicates the end
of the rdf:RDF start-tag. Lines 1-3 are general
"housekeeping" necessary to indicate that this is
RDF/XML content, and to identify the namespaces being used
within the RDF/XML content.
Lines 4-6 provide the RDF/XML for the specific statement
shown in Figure
11. An obvious way to talk about any RDF
statement is to say it is a description, and that it is
about the subject of the statement (in this case,
about http://www.example.org/index.html), and this is the way
RDF/XML represents the statement. The rdf:Description
start-tag in line 4 indicates the start of a
description of a resource, and goes on to identify the
resource the statement is about (the subject of the
statement) using the rdf:about attribute to specify
the URIref of the subject resource.
Line 5 provides a
property element, with the QName
exterms:creation-date as its tag, to
represent the predicate and object of the statement.
The QName exterms:creation-date is chosen
so that appending
the local name creation-date to the URIref of the
exterms: prefix (http://www.example.org/terms/)
gives the statement's predicate URIref
http://www.example.org/terms/creation-date.
The content of this property element is the object of the
statement, the plain literal August 16, 1999
(the value of the creation-date property of the subject resource).
The property element is nested within the containing
rdf:Description element, indicating that this property
applies to the resource specified in the rdf:about
attribute of the rdf:Description element. Line 6
indicates the end of this particular rdf:Description
element.
Finally, Line 7 indicates the end of the rdf:RDF
element started on line 2.
Example 2 illustrates the basic ideas used by RDF/XML to encode an RDF graph as XML elements, attributes, element content, and attribute values. The URIrefs of predicates (as well as some nodes) are written as XML QNames, consisting of a short prefix denoting a namespace URI, together with a local name denoting a namespace-qualified element or attribute, as described in Appendix B. The (namespace URIref, local name) pair is chosen so that concatenating them forms the URIref of the original node or predicate. The URIrefs of subject nodes are written as XML attribute values (URIrefs of object nodes may sometimes be written as attribute values as well). Literal nodes (which are always object nodes) become element text content or attribute values. (Many of these options are described later in the Primer; all of these options are described in [RDF-SYNTAX]).
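Python's standard xml.etree.ElementTree module happens to expose exactly this concatenation: it reports a qualified element name as {namespace}local, so joining the two parts gives the predicate URIref. The following is a minimal sketch for Example 2 only, assuming every property element has plain text content; the function names are hypothetical, and the many other RDF/XML forms (rdf:resource, nested descriptions, relative URIrefs, and so on) are not handled.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

EXAMPLE_2 = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>August 16, 1999</exterms:creation-date>
  </rdf:Description>
</rdf:RDF>"""


def clark_to_uri(tag):
    """Turn ElementTree's '{namespace}local' form into namespace + local."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def literal_triples(xml_text):
    """Extract (subject, predicate, literal) triples from simple RDF/XML
    in which every property element has plain text content."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall("{%s}Description" % RDF):
        subject = desc.get("{%s}about" % RDF)  # the rdf:about attribute
        for prop in desc:
            triples.append((subject, clark_to_uri(prop.tag), prop.text))
    return triples


print(literal_triples(EXAMPLE_2))
```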
An RDF graph consisting of multiple statements can be represented in RDF/XML by using RDF/XML similar to Lines 4-6 in Example 2 to separately represent each statement. For example, to write the following two statements:
ex:index.html exterms:creation-date "August 16, 1999" .
ex:index.html exterms:language "English" .
the RDF/XML in Example 3 could be used:
1.  <?xml version="1.0"?>
2.  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.              xmlns:exterms="http://www.example.org/terms/">
4.    <rdf:Description rdf:about="http://www.example.org/index.html">
5.        <exterms:creation-date>August 16, 1999</exterms:creation-date>
6.    </rdf:Description>
7.    <rdf:Description rdf:about="http://www.example.org/index.html">
8.        <exterms:language>English</exterms:language>
9.    </rdf:Description>
10. </rdf:RDF>
Example 3 is the same as Example 2, with the addition of lines 7-9,
a second rdf:Description element to represent the
second statement. An arbitrary number of
additional statements could be written in the same way, using a separate
rdf:Description element for each additional statement.
As Example 3 illustrates, once the
overhead of writing the XML and namespace declarations is dealt
with, writing each additional RDF statement in RDF/XML is
straightforward.
The RDF/XML syntax provides a number of abbreviations to
make common uses easier to write. For example, it is typical
for the same resource to be described with several properties
and values at the same time, as in Example
3, where the resource ex:index.html is the subject
of several statements. To handle such cases, RDF/XML allows
multiple property elements representing those properties to be
nested within the rdf:Description element that
identifies the subject resource. For example, to
represent the following group of statements about
http://www.example.org/index.html:
ex:index.html dc:creator exstaff:85740 .
ex:index.html exterms:creation-date "August 16, 1999" .
ex:index.html exterms:language "English" .
whose graph (the same as Figure 3) is shown in Figure 12:
the RDF/XML shown in Example 4 could be written:
1.  <?xml version="1.0"?>
2.  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3.              xmlns:dc="http://purl.org/dc/elements/1.1/"
4.              xmlns:exterms="http://www.example.org/terms/">
5.    <rdf:Description rdf:about="http://www.example.org/index.html">
6.        <exterms:creation-date>August 16, 1999</exterms:creation-date>
7.        <exterms:language>English</exterms:language>
8.        <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
9.    </rdf:Description>
10. </rdf:RDF>
Compared with the previous two examples, Example 4 adds an additional namespace
declaration (in line 3), and an additional dc:creator
property element (in line 8). In addition, the
property elements for the three properties whose subject is
http://www.example.org/index.html are nested within a single
rdf:Description element identifying that subject,
rather than writing a separate rdf:Description element
for each statement.
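The abbreviation can be produced mechanically by grouping triples by subject. Here is a minimal Python sketch of such a writer, assuming a hypothetical triple format (subject, predicate, object, is_resource), a fixed prefix table matching Example 4, and predicate local names that are already valid XML names; it is not a general RDF/XML serializer.

```python
from collections import defaultdict
from xml.sax.saxutils import escape, quoteattr

# Namespace bindings assumed for this sketch (the ones Example 4 declares).
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "exterms": "http://www.example.org/terms/",
}


def to_qname(uri):
    """Split a predicate URIref into prefix:local using NAMESPACES."""
    for prefix, ns in NAMESPACES.items():
        if uri.startswith(ns):
            return prefix + ":" + uri[len(ns):]
    raise ValueError("no prefix declared for " + uri)


def serialize(triples):
    """Write RDF/XML with one rdf:Description element per subject."""
    by_subject = defaultdict(list)  # keeps subjects in first-seen order
    for s, p, o, is_resource in triples:
        by_subject[s].append((p, o, is_resource))
    out = ['<?xml version="1.0"?>', "<rdf:RDF"]
    out += ["    xmlns:%s=%s" % (p, quoteattr(ns)) for p, ns in NAMESPACES.items()]
    out[-1] += ">"
    for subject, props in by_subject.items():
        out.append("  <rdf:Description rdf:about=%s>" % quoteattr(subject))
        for p, o, is_resource in props:
            tag = to_qname(p)
            if is_resource:  # object is a resource: empty element + rdf:resource
                out.append("    <%s rdf:resource=%s/>" % (tag, quoteattr(o)))
            else:            # object is a plain literal: element content
                out.append("    <%s>%s</%s>" % (tag, escape(o), tag))
        out.append("  </rdf:Description>")
    out.append("</rdf:RDF>")
    return "\n".join(out)


PAGE = "http://www.example.org/index.html"
doc = serialize([
    (PAGE, "http://www.example.org/terms/creation-date", "August 16, 1999", False),
    (PAGE, "http://www.example.org/terms/language", "English", False),
    (PAGE, "http://purl.org/dc/elements/1.1/creator", "http://www.example.org/staffid/85740", True),
])
print(doc)
```

All three statements land inside a single rdf:Description, as in Example 4.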
Line 8 also introduces a new form of property element. (The
element tag also uses a different namespace prefix, the new
namespace prefix dc: defined in line 3.) The
exterms:language element in line 7 is similar to the
exterms:creation-date element used in Example 2. Both these elements represent
properties with plain literals as property values, and such
elements are written by enclosing the literal within start-
and end-tags corresponding to the property name. However, the
dc:creator element on line 8 represents a property
whose value is another resource, rather than a
literal. If the URIref of this resource were written as a
plain literal within start- and end-tags in the same way as
the literal values of the other elements, this would
say that the value of the dc:creator element was
the character string
http://www.example.org/staffid/85740, rather than the
resource identified by that literal interpreted as a URIref. In
order to indicate the difference, the
dc:creator element is written using what XML calls an
empty-element tag (it has no separate end-tag), and
the property value is written using an rdf:resource
attribute within that empty element. The rdf:resource
attribute indicates that the property element's value is
another resource, identified by its URIref. Because the URIref
is being used as an attribute value, RDF/XML requires
the URIref to be written out (as an absolute or relative URIref),
rather than abbreviating it as a
QName as was done in writing element and attribute
names (absolute and relative URIrefs are discussed in
Appendix A).
It is important to understand that the RDF/XML in Example 4 is an abbreviation. The RDF/XML in Example 5, in which each statement is written separately, describes exactly the same RDF graph (the graph of Figure 12):
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:exterms="http://www.example.org/terms/">
<rdf:Description rdf:about="http://www.example.org/index.html">
<exterms:creation-date>August 16, 1999</exterms:creation-date>
</rdf:Description>
<rdf:Description rdf:about="http://www.example.org/index.html">
<exterms:language>English</exterms:language>
</rdf:Description>
<rdf:Description rdf:about="http://www.example.org/index.html">
<dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
</rdf:Description>
</rdf:RDF>
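One way to confirm that Examples 4 and 5 describe the same graph is to reduce each document to its set of triples. The Python sketch below does this for only the constructs introduced so far (rdf:Description with rdf:about, property elements holding plain literal content, and rdf:resource), using the standard library's xml.etree.ElementTree. It is not a conforming RDF/XML parser; the function name triples and the ("uri", ...) / ("literal", ...) tagging of objects are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def triples(rdfxml):
    """Collect (subject, predicate, object) triples from RDF/XML that uses only
    rdf:Description/rdf:about, literal content, and rdf:resource (a sketch)."""
    graph = set()
    for desc in ET.fromstring(rdfxml).findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree reports a tag as "{namespace}local"; joining the two
            # parts gives the property's URIref.
            predicate = prop.tag[1:].replace("}", "")
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                obj = ("uri", resource)             # value is another resource
            else:
                obj = ("literal", prop.text or "")  # value is a plain literal
            graph.add((subject, predicate, obj))
    return graph

HEADER = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
          'xmlns:dc="http://purl.org/dc/elements/1.1/" '
          'xmlns:exterms="http://www.example.org/terms/">')
ABOUT = '<rdf:Description rdf:about="http://www.example.org/index.html">'
DATE = '<exterms:creation-date>August 16, 1999</exterms:creation-date>'
LANG = '<exterms:language>English</exterms:language>'
CREATOR = '<dc:creator rdf:resource="http://www.example.org/staffid/85740"/>'

# Example 4: all three property elements nested in one rdf:Description.
EXAMPLE_4 = HEADER + ABOUT + DATE + LANG + CREATOR + "</rdf:Description></rdf:RDF>"

# Example 5: a separate rdf:Description for each statement.
EXAMPLE_5 = (HEADER
             + "".join(ABOUT + p + "</rdf:Description>" for p in (DATE, LANG, CREATOR))
             + "</rdf:RDF>")
```

Both documents reduce to the same three triples. The dc:creator value comes out tagged as a URIref while the other two values are literals, which is exactly the distinction the rdf:resource attribute draws.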
The following sections will describe a few additional RDF/XML abbreviations. [RDF-SYNTAX] provides a more thorough description of the abbreviations that are available.
RDF/XML can also represent graphs that include nodes that have no URIrefs, i.e., the blank nodes described in Section 2.3. For example, Figure 13 (taken from [RDF-SYNTAX]) shows a graph saying "the document 'http://www.w3.org/TR/rdf-syntax-grammar' has a title 'RDF/XML Syntax Specification (Revised)' and has an editor, the editor has a name 'Dave Beckett' and a home page 'http://purl.org/net/dajobe/'".
This illustrates an idea discussed in Section 2.3: the use of a blank node to represent something that does not have a URIref, but can be described in terms of other information. In this case, the blank node represents a person, the editor of the document, and the person is described by his name and home page.
RDF/XML provides several ways to represent graphs
containing blank nodes. These are all described in [RDF-SYNTAX]. The approach
illustrated here, which is the most direct approach, is to
assign a blank node identifier to each blank node. A
blank node identifier serves to identify a blank node within a
particular RDF/XML document but, unlike a URIref, is unknown
outside the document in which it is assigned. A blank node is
referred to in RDF/XML using an rdf:nodeID attribute,
with a blank node identifier as its value, in places where the
URIref of a resource would otherwise appear. Specifically,
a statement with a blank node as its subject can be written in
RDF/XML using an rdf:Description element with
an rdf:nodeID attribute instead of an
rdf:about attribute. Similarly, a statement with a
blank node as its object can be written using a property
element with an rdf:nodeID attribute instead of an
rdf:resource attribute. Using rdf:nodeID, Example 6 shows the RDF/XML corresponding
to Figure 13:
1. <?xml version="1.0"?>
2. <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3. xmlns:dc="http://purl.org/dc/elements/1.1/"
4. xmlns:exterms="http://example.org/stuff/1.0/">
5. <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">
6. <dc:title>RDF/XML Syntax Specification (Revised)</dc:title>
7. <exterms:editor rdf:nodeID="abc"/>
8. </rdf:Description>
9. <rdf:Description rdf:nodeID="abc">
10. <exterms:fullName>Dave Beckett</exterms:fullName>
11. <exterms:homePage rdf:resource="http://purl.org/net/dajobe/"/>
12. </rdf:Description>
13. </rdf:RDF>
In Example 6, the blank node
identifier abc is used in line 9 to identify the blank
node as the subject of several statements, and is used in line
7 to indicate that the blank node is the value of a resource's
exterms:editor property. The advantage of using a
blank node identifier over some of the other approaches
described in [RDF-SYNTAX] is that
it allows the same blank node to be
referred to in more than one place in the same RDF/XML
document.
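The document-local scope of blank node identifiers can be illustrated with a parser that maps each rdf:nodeID to a blank node label that is fresh for each document it reads. In the minimal Python sketch below (handling only the constructs shown so far; the name parse and the tagged-tuple encoding of terms are illustrative assumptions), the two uses of abc in Example 6 resolve to one node. Parsing the same text twice and merging the results, however, yields two distinct editor nodes, because an identifier like abc carries no meaning outside the document in which it appears.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_counter = itertools.count()  # shared source of fresh blank node labels

def parse(rdfxml):
    """Return triples from simple RDF/XML, giving each rdf:nodeID a blank node
    that is fresh for this document (a sketch, not a full parser)."""
    labels = {}  # rdf:nodeID -> blank node; scoped to this one document

    def bnode(node_id):
        if node_id not in labels:
            labels[node_id] = ("bnode", next(_counter))
        return labels[node_id]

    graph = set()
    for desc in ET.fromstring(rdfxml).findall(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        subject = ("uri", about) if about is not None else bnode(desc.get(f"{{{RDF}}}nodeID"))
        for prop in desc:
            predicate = prop.tag[1:].replace("}", "")
            if prop.get(f"{{{RDF}}}resource") is not None:
                obj = ("uri", prop.get(f"{{{RDF}}}resource"))
            elif prop.get(f"{{{RDF}}}nodeID") is not None:
                obj = bnode(prop.get(f"{{{RDF}}}nodeID"))  # same ID, same node
            else:
                obj = ("literal", prop.text or "")
            graph.add((subject, predicate, obj))
    return graph

EXAMPLE_6 = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:exterms="http://example.org/stuff/1.0/">'
    '<rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">'
    '<dc:title>RDF/XML Syntax Specification (Revised)</dc:title>'
    '<exterms:editor rdf:nodeID="abc"/>'
    '</rdf:Description>'
    '<rdf:Description rdf:nodeID="abc">'
    '<exterms:fullName>Dave Beckett</exterms:fullName>'
    '<exterms:homePage rdf:resource="http://purl.org/net/dajobe/"/>'
    '</rdf:Description>'
    '</rdf:RDF>'
)
```

Keeping the blank nodes of separately parsed documents apart, as the merge here does, reflects the rule that a blank node identifier is unknown outside the document in which it is assigned.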
Finally, the typed literals described in Section 2.4 may be used as property
values instead of the plain literals used in the
examples so far. A typed literal is represented in RDF/XML by
adding an rdf:datatype attribute specifying a datatype
URIref to the property element containing the literal.
For example, to change the statement in Example 2 to use a typed literal instead
of a plain literal for the creation-date property, the
triple representation would be:
ex:index.html exterms:creation-date "1999-08-16"^^xsd:date .
with corresponding RDF/XML syntax shown in Example 7:
1. <?xml version="1.0"?>
2. <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3. xmlns:exterms="http://www.example.org/terms/">
4. <rdf:Description rdf:about="http://www.example.org/index.html">
5. <exterms:creation-date rdf:datatype=
"http://www.w3.org/2001/XMLSchema#date">1999-08-16
</exterms:creation-date>
6. </rdf:Description>
7. </rdf:RDF>
In line 5 of Example 7, a typed
literal is given as the value of the exterms:creation-date
property element by adding an rdf:datatype attribute
to the element's start-tag to specify the datatype. The value
of this attribute is the URIref of the datatype, in this case,
the URIref of the XML Schema date datatype. Since this
is an attribute value, the URIref must be written out, rather
than using the QName abbreviation xsd:date
used in the triple. A literal appropriate to this datatype is
then written as the element content, in this case, the literal
1999-08-16, which is the literal representation for
August 16, 1999 in the XML Schema date datatype.
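An application reading Example 7 can use the rdf:datatype URIref to decide how to interpret the element content. The Python sketch below assumes a small, hypothetical table (CONVERTERS) that maps a few XML Schema datatype URIrefs to standard-library constructors; real RDF toolkits support many more datatypes and check lexical forms more strictly.

```python
import datetime
import decimal
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical, deliberately small mapping from datatype URIrefs to Python
# constructors. date.fromisoformat reads plain YYYY-MM-DD values; this sketch
# does not attempt to handle xsd:date values that carry a time zone.
CONVERTERS = {
    XSD + "date": datetime.date.fromisoformat,
    XSD + "integer": int,
    XSD + "decimal": decimal.Decimal,
    XSD + "string": str,
}

def literal_value(prop):
    """Interpret the literal held by a property element."""
    text = prop.text or ""
    datatype = prop.get(f"{{{RDF}}}datatype")
    if datatype is None:
        return text                    # plain literal: just a character string
    return CONVERTERS[datatype](text)  # typed literal: lexical form -> value

EXAMPLE_7 = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:exterms="http://www.example.org/terms/">'
    '<rdf:Description rdf:about="http://www.example.org/index.html">'
    '<exterms:creation-date rdf:datatype="http://www.w3.org/2001/XMLSchema#date">'
    '1999-08-16</exterms:creation-date>'
    '</rdf:Description></rdf:RDF>'
)

# Locate the creation-date property element and interpret its value.
prop = next(ET.fromstring(EXAMPLE_7).iter("{http://www.example.org/terms/}creation-date"))
value = literal_value(prop)
```

With the datatype present, the literal 1999-08-16 can be understood as the date August 16, 1999 rather than as an arbitrary string of characters.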
In the rest of the Primer, the examples will use typed literals from appropriate datatypes rather than plain (untyped) literals, in order to emphasize the value of typed literals in conveying more information about the intended interpretation of literal values. (The exceptions will be that plain literals will continue to be used in examples taken from actual applications that do not currently use typed literals, in order to accurately reflect the usage in those applications.)
Example 7 illustrates that using typed literals requires writing an rdf:datatype attribute with
a URIref identifying the datatype for each element whose value is a typed literal. As noted earlier, RDF/XML requires that URIrefs used as attribute values
be written out, rather than abbreviated as a QName.
XML entities can be used in RDF/XML to improve readability
in such cases, by providing an additional abbreviation
facility for URIrefs. An XML entity declaration essentially
associates a name with a string of characters. When the entity
name is used elsewhere within an XML document, XML parsers replace
the entity name with the corresponding string. For example, the
ENTITY declaration (specified as part of a DOCTYPE
declaration at the beginning of the RDF/XML document):
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
assigns the name xsd to the string representing the
namespace URIref for XML Schema datatypes. This declaration allows
the full namespace URIref to be abbreviated elsewhere in the XML
document as the entity &xsd;. Using this abbreviation,
Example 7 could also be written as shown in
Example 8.
1. <?xml version="1.0"?>
2. <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
3. <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
4. xmlns:exterms="http://www.example.org/terms/">
5. <rdf:Description rdf:about="http://www.example.org/index.html">
6. <exterms:creation-date rdf:datatype="&xsd;date">1999-08-16
</exterms:creation-date>
7. </rdf:Description>
8. </rdf:RDF>
Note the DOCTYPE declaration in line 2 defining the entity,
and the use of that entity in line 6.
For readability purposes, the examples in the rest of the
Primer will use the XML entity &xsd; as just described.
XML entities are discussed
further in Appendix B.
As illustrated in Appendix B,
other URIrefs (and, more generally, other strings)
can also be abbreviated using XML entities.
However, the URIrefs for XML Schema datatypes are the only ones that will be
abbreviated in this way in Primer examples.
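Because entity expansion is performed by the XML parser, an RDF application never sees &xsd; itself: Examples 7 and 8 produce the same attribute value once parsed. The Python sketch below checks this with the standard library's xml.etree.ElementTree, which expands internal entities declared in the DOCTYPE; the BODY template and the helper datatype_of are illustrative names.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TERMS = "http://www.example.org/terms/"

# Shared document body; {dt} is filled in with the datatype attribute value.
BODY = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:exterms="http://www.example.org/terms/">'
    '<rdf:Description rdf:about="http://www.example.org/index.html">'
    '<exterms:creation-date rdf:datatype="{dt}">1999-08-16</exterms:creation-date>'
    '</rdf:Description></rdf:RDF>'
)

# Example 7 writes the datatype URIref out in full.
EXAMPLE_7 = BODY.format(dt="http://www.w3.org/2001/XMLSchema#date")

# Example 8 declares the xsd entity and uses &xsd; in the attribute value.
EXAMPLE_8 = ('<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>'
             + BODY.format(dt="&xsd;date"))

def datatype_of(rdfxml):
    """Return the rdf:datatype value as the application sees it after parsing."""
    prop = next(ET.fromstring(rdfxml).iter(f"{{{TERMS}}}creation-date"))
    return prop.get(f"{{{RDF}}}datatype")
```

The entity is purely a convenience for people reading and writing the XML; it leaves no trace in the resulting RDF graph.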
Although additional abbreviated forms for writing RDF/XML are available, the facilities illustrated so far provide a simple but general way to express graphs in RDF/XML. Using these facilities, an RDF graph is written in RDF/XML as follows:
- All blank nodes are assigned blank node identifiers.
- Each node is listed in turn as the subject of an un-nested rdf:Description element, using an rdf:about attribute if the node has a URIref, or an rdf:nodeID attribute if the node is blank.
- For each triple with this node as subject, an appropriate property element is created, with either literal content (plus an rdf:datatype attribute if the literal is typed), an rdf:resource attribute specifying the object of the triple (if the object node has a URIref), or an rdf:nodeID attribute specifying the object of the triple (if the object node is blank).
Compared to some of the more abbreviated approaches described in [RDF-SYNTAX], this simple approach provides the most direct representation of the actual graph structure, and is particularly recommended for applications in which the output RDF/XML is to be used in further RDF processing.
So far, the examples have assumed that the resources
being described have been given URIrefs already. For instance, the initial
examples provided descriptive information about
example.org's Web page, whose URIref was
http://www.example.org/index.html. This resource was identified
in RDF/XML
using an rdf:about attribute citing its full URIref.
Although RDF does not specify or control how URIrefs are
assigned to resources, sometimes it is desirable to achieve the
effect of assigning URIrefs to resources that are part
of an organized group of resources. For example, suppose a
sporting goods company, example.com, wanted to provide an
RDF-based catalog of its products, such as tents, hiking boots,
and so on, as an RDF/XML document, identified by (and located
at) http://www.example.com/2002/04/products. In that
resource, each product might be given a separate RDF
description. This catalog, along with one of these
descriptions, the catalog entry for a model of tent called the
"Overnighter", might be written in RDF/XML as shown in Example 9:
1. <?xml version="1.0"?>
2. <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
3. <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
4. xmlns:exterms="http://www.example.com/terms/">
5. <rdf:Description rdf:ID="item10245">
6. <exterms:model rdf:datatype="&xsd;string">Overnighter</exterms:model>
7. <exterms:sleeps rdf:datatype="&xsd;integer">2</exterms:sleeps>
8. <exterms:weight rdf:datatype="&xsd;decimal">2.4</exterms:weight>
9. <exterms:packedSize rdf:datatype="&xsd;integer">784</exterms:packedSize>
10. </rdf:Description>
...other product descriptions...
11. </rdf:RDF>
(The surrounding XML declaration, DOCTYPE declaration, rdf:RDF element, and namespace information in lines 1 through 4 and line 11 would only need to be provided once for the whole catalog, not repeated for each entry in the catalog.)
Example 9 is similar to previous
examples in the way it represents the properties (model,
sleeping capacity, weight) of the resource (the tent) being
described. However, in line 5, the rdf:Description
element has an rdf:ID attribute instead of an
rdf:about attribute. Using rdf:ID specifies
a fragment identifier, given by the
value of the rdf:ID attribute (item10245 in
this case, which might be the catalog number assigned by
example.com), as an abbreviation of the complete URIref of the
resource being described. The fragment identifier
item10245 will be interpreted relative to a base
URI, in this case, the URI of the containing catalog
document. The full URIref for the tent is formed by taking the
base URI (of the catalog), and appending the character
"#" (to
indicate that what follows is a fragment identifier) and then
item10245 to it, giving the absolute URIref
http://www.example.com/2002/04/products#item10245.
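The URIref construction just described can be written out directly. In the Python sketch below, resolve_rdf_id and resolve_rdf_about are hypothetical helper names. The sketch assumes the base URI is simply the catalog's URI (it ignores xml:base and other ways a base might be set) and, for comparison with the rdf:about form discussed in the next paragraph, resolves relative rdf:about values with the standard library's urllib.parse.urljoin.

```python
from urllib.parse import urljoin

def resolve_rdf_id(base, rdf_id):
    """rdf:ID="name" stands for the base URI (minus any fragment) + "#" + name."""
    return base.split("#", 1)[0] + "#" + rdf_id

def resolve_rdf_about(base, about):
    """rdf:about holds a URI reference, which may be relative to the base."""
    return urljoin(base, about)

# The catalog document's own URI serves as the base.
CATALOG = "http://www.example.com/2002/04/products"
tent = resolve_rdf_id(CATALOG, "item10245")
```

Under these assumptions, rdf:ID="item10245", rdf:about="#item10245", and the absolute URIref all identify the same resource.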
The rdf:ID attribute is somewhat similar to the ID
attribute in XML and HTML, in that it defines a name which must
be unique relative to the current base URI (in this example, that of the catalog).
In this case, the rdf:ID
attribute appears to be assigning a name (item10245)
to this particular kind of tent. Any other RDF/XML within this
catalog could refer to the tent by using either the absolute URIref
http://www.example.com/2002/04/products#item10245,
or the relative URIref #item10245. The
relative URIref would be understood as being a URIref defined relative to the
base URIref of the catalog. Using a similar abbreviation,
the URIref of the tent could also be given by specifying
rdf:about="#item10245" in the catalog entry (i.e., by
specifying the relative URIref directly) instead of
rdf:ID="item10245". As an abbreviation mechanism,
the two forms are essentially
synonyms: the full URIref formed by RDF/XML is the same in
either case:
http://www.example.com/2002/04/products#item10245.
However, using rdf:ID provides an additional check
when assigning a set of distinct names, since a given value of the
rdf:ID attribute can only appear once relative to the
same base URI (the catalog document, in this example). Using
either form, example.com would be giving the URIref for the
tent in a two-stage process, first assigning the URIref for the
whole catalog, and then using a relative URIref in the
description of the tent in the catalog to indicate the URIref
that has been assigned to this particular kind of tent.
Moreover, this use of a relative URIref can be thought of either
as being an abbreviation for a full URIref that has been
assigned to the tent independently of the RDF, or as being the
assignment of the URIref to the tent within the catalog.
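The additional check that rdf:ID provides can be made concrete. The Python sketch below is a hypothetical helper that reports rdf:ID values occurring more than once in a document, assuming a single base URI for the whole document (no xml:base). Repeated rdf:about="#item10245" values are not reported, because RDF/XML places no uniqueness requirement on rdf:about; item10246 is an invented catalog number used only for the test data.

```python
from collections import Counter
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def duplicate_rdf_ids(rdfxml):
    """Return the rdf:ID values that occur more than once in one document."""
    root = ET.fromstring(rdfxml)
    counts = Counter(
        el.get(f"{{{RDF}}}ID") for el in root.iter() if el.get(f"{{{RDF}}}ID") is not None
    )
    return sorted(name for name, n in counts.items() if n > 1)

def catalog(*descriptions):
    """Wrap hypothetical catalog entries in an rdf:RDF element."""
    return ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
            + "".join(descriptions) + "</rdf:RDF>")
```

A catalog maintainer could run such a check when adding entries, catching an accidental reuse of a catalog number that the rdf:about form would silently accept.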
RDF located outside the catalog could refer to this
tent by using the full URIref, i.e., by concatenating the
relative URIref #item10245 of the tent to the base URI
of the catalog, forming the absolute URIref
http://www.example.com/2002/04/products#item10245. For
example, an outdoor sports Web site exampleRatings.com might
use RDF to provide ratings of various tents. The (5-star)
rating given to the tent described in Example 9 might then be represented on
exampleRatings.com's Web site as shown in Example 10:
1. <?xml version="1.0"?>
2. <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
3. <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
4. xmlns:sportex="http://www.exampleRatings.com/terms/">
5. <rdf:Description rdf:about="http://www.example.com/2002/04/products#item10245">
6. <sportex:ratingBy rdf:datatype="&xsd;string">Richard Roe</sportex:ratingBy>
7. <sportex:numberStars rdf:datatype="&xsd;integer">5</sportex:numberStars>
8. </rdf:Description>
9. </rdf:RDF>
In Example 10, line 5 uses an
rdf:Description element with an rdf:about
attribute whose value is the full URIref of the tent. The use
of this URIref allows the tent being referred to in the rating
to be precisely identified.
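Because both sites use the same URIref for the tent, an application that reads the two documents can simply combine their triples, and the statements about item10245 line up. The Python sketch below illustrates this under simplifying assumptions: it handles only rdf:ID and rdf:about subjects with literal-valued properties, records lexical forms only (datatypes are ignored), abridges Example 9 to two properties, and uses a hypothetical base URI for the ratings document.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def parse(rdfxml, base):
    """Collect (subject, predicate, lexical form) triples (a sketch)."""
    graph = set()
    for desc in ET.fromstring(rdfxml).iter(f"{{{RDF}}}Description"):
        rdf_id = desc.get(f"{{{RDF}}}ID")
        if rdf_id is not None:
            subject = base + "#" + rdf_id  # rdf:ID is relative to the base
        else:
            subject = urljoin(base, desc.get(f"{{{RDF}}}about"))
        for prop in desc:
            predicate = prop.tag[1:].replace("}", "")
            graph.add((subject, predicate, prop.text))
    return graph

XSD_DTD = '<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>'

# Abridged Example 9, published at the catalog's URI.
CATALOG = XSD_DTD + (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:exterms="http://www.example.com/terms/">'
    '<rdf:Description rdf:ID="item10245">'
    '<exterms:model rdf:datatype="&xsd;string">Overnighter</exterms:model>'
    '<exterms:sleeps rdf:datatype="&xsd;integer">2</exterms:sleeps>'
    '</rdf:Description></rdf:RDF>'
)

# Example 10, published separately by exampleRatings.com.
RATINGS = XSD_DTD + (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:sportex="http://www.exampleRatings.com/terms/">'
    '<rdf:Description rdf:about="http://www.example.com/2002/04/products#item10245">'
    '<sportex:ratingBy rdf:datatype="&xsd;string">Richard Roe</sportex:ratingBy>'
    '<sportex:numberStars rdf:datatype="&xsd;integer">5</sportex:numberStars>'
    '</rdf:Description></rdf:RDF>'
)

merged = (parse(CATALOG, "http://www.example.com/2002/04/products")
          | parse(RATINGS, "http://www.exampleRatings.com/ratings"))  # hypothetical base
```

Nothing in the merge step needs to know that the two documents came from different organizations; the shared URIref is what connects the catalog entry and the rating.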
These examples illustrate several points. First, even though RDF does not specify or control how URIrefs are assigned to resources (in this case, the various tents and other items in the catalog), the effect of assigning URIrefs to resources in RDF can be achieved by combining a process (external to RDF) that identifies a single document (the catalog in this case) as the source for descriptions of those resources, with the use of relative URIrefs in descriptions of those resources within that document. For instance, example.com could use this catalog as the central source where its products are described, with the understanding that if a product's item number is not in an entry in this catalog, it is not a product known to example.com. (Note that RDF does not assume any particular relationship exists between two resources just because their URIrefs have the same base, or are otherwise similar. This relationship may be known to example.com, but it is not directly defined by RDF.)
These examples also illustrate one of the basic architectural principles of the Web, which is that anyone should be able to freely add information about an existing resource, using any vocabulary they please [BERNERS-LEE98]. The examples further illustrate that the RDF describing a particular resource does not need to be located all in one place; instead, it may be distributed throughout the Web. This is true not only for situations like this one, in which one organization is rating or commenting on a resource defined by another, but also for situations in which the original definer of a resource (or anyone else) wishes to amplify the description of that resource by providing additional information about it. This may be done by modifying the RDF document in which the resource was originally described, to add the properties and values needed t