RDF Primer

RDF - the Resource Description Framework - is a de facto metadata language developed by the W3C. Metadata is simply a term for "data about data". Metadata is used in all walks of life, from information in Web pages (title, author, and last-modified date) to information about books from online shopping facilities (price, publisher, availability). RDF is a common framework that lets people express this data in such a way that it can be interoperable. By choosing to use this common framework, you get the added benefit that you can use some of the many tools around (RDF parsers and processors) to maintain the data. This Primer is designed to give the reader the fundamentals required to use RDF effectively in their particular metadata applications.

Status of this document

This is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use it as reference material or to cite as other than "work in progress".

Introduction

There is a dedicated core community working on RDF, and it is very easy to get help on projects, to ascertain how RDF may or may not be able to help in your application scenario. This primer is not intended as a substitute for reading the specifications, or getting to grips with the work currently being done, but it is intended as a valuable resource for enabling you to find out:-

What does RDF look like?
How can I write/access/process RDF?
How does RDF affect me?

The key principles behind RDF are in fact very simple, and it is relatively easy to port current information models so that they use RDF. It is also just as easy to build new information systems from scratch using RDF.

RDF itself is related to many different academic and business environments and domains. Among the groups finding utility in RDF are librarians, logicians, database maintainers, knowledge representation communities, and news/information syndicators.

Enough procrastination; what does RDF "look" like? The following is a small chunk of RDF in XML format (don't worry if you don't know what XML is for the time being):-

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.w3.org/2000/10/swap/pim/contact#">
  <Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <mailbox rdf:resource="mailto:em@w3.org"/>
    <fullName>Eric Miller</fullName>
    <personalTitle>Semantic Web Activity Lead</personalTitle>
  </Person>
</rdf:RDF>

This example is just a representation of some simple data which roughly translates as "there is someone called Eric Miller, with the email address em@w3.org, and who is the Semantic Web Activity Lead". Note that there are Web addresses in there - the utility of which we shall explain later on - and some rather obvious things including some "properties" like "mailbox" and "fullName", and the values "em@w3.org", and "Eric Miller".
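To make "machine processable" a little more concrete, here is a minimal sketch in Python that reads the example with the standard library's XML parser and pulls out the subject and its properties. The function name `describe` is an assumption made for this illustration, and the code only handles the flat shape of this one example; it is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTACT_NS = "http://www.w3.org/2000/10/swap/pim/contact#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.w3.org/2000/10/swap/pim/contact#">
  <Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <mailbox rdf:resource="mailto:em@w3.org"/>
    <fullName>Eric Miller</fullName>
    <personalTitle>Semantic Web Activity Lead</personalTitle>
  </Person>
</rdf:RDF>"""


def describe(doc):
    """Return (subject URI, {property local name: value}) for the first Person.

    Hypothetical helper: it assumes exactly the flat layout of DOC above.
    """
    root = ET.fromstring(doc)
    person = root.find("{%s}Person" % CONTACT_NS)
    subject = person.get("{%s}about" % RDF_NS)
    props = {}
    for child in person:
        local = child.tag.split("}", 1)[1]
        # A property value is either a URI (rdf:resource) or literal text.
        props[local] = child.get("{%s}resource" % RDF_NS) or child.text
    return subject, props


subject, props = describe(DOC)
print(subject)
for name, value in props.items():
    print("  %s: %s" % (name, value))
```

The point of the sketch is only that the "properties" and "values" mentioned above can be recovered mechanically, without a human reading the document.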

The advantage of having this information in a machine processable format is that we can link bits of data across the Web. The twist in the plot is that instead of the simple "hyperlinks" that one would find in HTML (the links in the documents), we can link any "thing" to any other "thing". So, instead of talking about Web pages, and sites, we can talk about cars, business, personnel, news events... in fact, anything.

As we continue through the primer, we shall be addressing standard ways of modelling things in RDF, implementing systems, the relationship between RDF and the "Semantic Web", and discussing further resources and implementations for you to chase up.

Identifiers: Uniform Resource Identifier (URI)

If we want to discuss something, we must first identify it. How else will you know what one is referring to? In everyday communication, identity is assigned in many ways ("Bob", "The Moon", "373 Whitaker Ave.", "California", "VIN 2745534", "today's weather", etc.) and ambiguities are generally resolved through a semantic context shared by the sender and the receiver. To identify "things" on the Web, we also use identifiers. Because we use a uniform system of identifiers, and because each item identified is considered a "resource," we call these identifiers "Uniform Resource Identifiers" or URIs for short. We can give a URI to anything, and anything that has a URI can be said to be "on the Web". The URI is the foundation of the Web. While nearly every other part of the Web can be replaced, the URI cannot: it holds the Web together.

URIs are decentralized. No one person or organization controls who makes them or how they can be used. While some URI schemes (such as URL's http:) depend on centralized systems (such as DNS), other schemes (such as freenet:) are completely decentralized. This means that you don't need anyone's permission to create a URI.
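As a small illustration of what "scheme" means here, the following Python sketch splits a few URIs used in this primer into their parts with the standard library. Nothing is looked up on the network; the code only inspects the identifier strings.

```python
from urllib.parse import urlsplit

# URIs that appear in this primer; the scheme is the part before the first colon.
uris = [
    "http://www.w3.org/People/EM/contact#me",
    "mailto:em@w3.org",
    "http://purl.org/dc/elements/1.1/title",
]

for uri in uris:
    parts = urlsplit(uri)
    print(parts.scheme, "|", parts.netloc or "(no authority)", "|",
          parts.path, "|", parts.fragment or "(no fragment)")
```

Note that the `mailto:` URI has no host part at all, which is one reason why a URI is a more general notion than a Web page address.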

@@ segue needed @@

Documents: Extensible Markup Language (XML)

XML was designed to allow anyone to design their own document format and then write a document in that format. These document formats can include markup to enhance the meaning of the document's content. This markup is "machine-readable," that is, programs can read and understand the corresponding structure.

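The following is a simple passage marked up using an XML-based markup language:

<sentence>
<person href="http://example.com/#me">I</person> just got a new pet <animal>dog</animal>.
</sentence>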

Elements ("sentence", "person", etc.) are introduced to reflect a particular structure associated with the passage. As you might have guessed already, there is a problem here. I've used the words "sentence," "person," and "animal" in my markup language to convey meaning. But these are pretty common words so we should be OK, right? Wrong. To a non-English speaker, the element "person" may mean absolutely nothing. Take the following for example.

To a machine, it's the exact same structure. All of a sudden, it's no longer clear what one is trying to say. Also, what if others have used these same words in their own markup languages, but with completely different meanings? Perhaps "sentence" in another markup language refers to the amount of time that a convicted criminal must serve in a penal institution. How is my computer to keep these straight?

To prevent confusion, I must uniquely identify my markup elements. And what better way to identify them than with a Uniform Resource Identifier? To do this in XML, we use XML Namespaces. This way, anyone can create their own tags and mix them with tags made by others. A namespace is just a way of identifying a part of the Web (space) from which we derive the meaning of these names. I create a "namespace" for my markup language by creating a URI for it. I'll probably create a Web page to describe my markup language and use the URL of my Web page as the URI for my namespace. @@ reference The Professor and the Madman? @@

Since everyone's tags have their own URIs, we don't have to worry about tag names conflicting. Two elements mean the same thing if they have the same URI.

<my:sentence xmlns:my="http://example.org/xml/documents/">
<my:person my:href="http://example.com/#me">I</my:person> just got a new pet <my:animal>dog</my:animal>.
</my:sentence>
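A short Python sketch can show why namespaces remove the ambiguity. A namespace-aware parser identifies each element by its namespace URI plus its local name, so the prefix chosen by the author does not matter. The sketch uses the standard `xmlns:prefix="..."` declaration form; the second namespace URI (`http://example.org/xml/legal/`) is a made-up example standing in for someone else's vocabulary.

```python
import xml.etree.ElementTree as ET

MINE = """<my:sentence xmlns:my="http://example.org/xml/documents/">
<my:person>I</my:person> just got a new pet <my:animal>dog</my:animal>.
</my:sentence>"""

# Same namespace URI, different prefix: the element name is identical.
SAME = '<d:sentence xmlns:d="http://example.org/xml/documents/"/>'

# Same local name, different (hypothetical) namespace URI: a different element.
OTHER = '<law:sentence xmlns:law="http://example.org/xml/legal/"/>'

mine, same, other = (ET.fromstring(text) for text in (MINE, SAME, OTHER))

print(mine.tag)               # ElementTree writes names as {namespace}local
print(mine.tag == same.tag)   # the prefix is irrelevant
print(mine.tag == other.tag)  # the namespace URI is what distinguishes them
```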

The RDF Model

It's wonderful that we can create URIs and talk about them with our web pages. However, it'd be even better if we could talk about them in a way that computers could begin to process what we're saying. For example, it's one thing to say "I really like 'Weaving the Web.'" on a web discussion forum. But what would this mean to a computer?

RDF gives you a way to make statements that are machine-processable. Now the computer can't actually "understand" what you said, of course, but it can deal with it in a way that makes it seem like it does. For example, I could search the Web for all book reviews and create an average rating for each book. Then, I could put that information back on the Web. Another website could take that information (the list of book rating averages) and create a "Top Ten Highest Rated Books" page. RDF provides a way of recording knowledge so that applications can process it more easily.

RDF is really quite simple. If XML is the ASCII of the future, RDF can be thought of as a simple sentence grammar that allows people and applications to communicate more effectively. RDF can be said to consist purely of simple sentences (or statements). A set of predefined collections of "words" useful for constructing particular kinds of sentences is provided. These "words" are discussed in section @@ section @@.

An RDF statement consists of three parts:

a subject,
a predicate, and
an object.

Statements are about Web resources, so subjects and objects are URIs, machine-readable identifiers.

Objects can also be plain text strings. Saying the document "http://www.w3c.org/2001/sw/ has a title of 'Semantic Web Home Page'" is represented by the statement:

a subject "http://www.w3c.org/2001/sw/"
a predicate "title"
an object "Semantic Web Home Page"

To disambiguate the different predicates that can be used ("title" as in "title of book", or "title" as in "title of person" (Dr., Mr., Mrs., etc.)), every predicate must be given a URI. In this case, for example, we may choose a predicate that conveys the "title of book" sense: the Dublin Core predicate "http://purl.org/dc/elements/1.1/title". The statement then becomes

"http://www.w3c.org/2001/sw/", "http://purl.org/dc/elements/1.1/title", "Semantic Web Home Page" .

@@ figure @@

This demonstrates that URIs can be used to name not only concrete digital documents on the Web, but abstract entities as well (the meaning of 'title'). This turns out to be increasingly important when we begin to define additional relationships between predicates (e.g. company x's version of title and company y's version of title mean the same thing). More on this issue is discussed in Section @@ ?? @@.
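The three-part statement described above maps naturally onto a small data structure. The following Python sketch is one possible representation, not a prescribed one: the `Triple` type, its field names and the `to_ntriples` helper are all assumptions made for illustration. It records whether the object is a plain text string or a URI, and writes the statement out in an N-Triples-style line.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement (hypothetical representation for this sketch)."""
    subject: str                     # a URI
    predicate: str                   # a URI
    obj: str                         # a URI or a plain text string
    object_is_literal: bool = False


def to_ntriples(t):
    """Write one statement as an N-Triples-style line."""
    if t.object_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = '"%s"' % escaped
    else:
        obj = "<%s>" % t.obj
    return "<%s> <%s> %s ." % (t.subject, t.predicate, obj)


title = Triple(
    "http://www.w3c.org/2001/sw/",
    "http://purl.org/dc/elements/1.1/title",
    "Semantic Web Home Page",
    object_is_literal=True,
)
print(to_ntriples(title))
```

Because the predicate is a full URI rather than the bare word "title", two applications that both use the Dublin Core URI can be sure they mean the same property.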

In order to talk about non-digital resources, however, we must give them URIs. For example, to talk about the organization W3C (i.e., use it as the subject of a statement), we must give it a URI. Let's give it the URI "http://www.w3c.org/organization". We can now say things such as "http://www.w3c.org/2001/sw/ is a document. It was created by http://www.w3c.org/organization. http://www.w3c.org/organization is an organization. It has the name 'W3C'."

"http://www.w3c.org/2001/sw/", "http://purl.org/dc/elements/1.1/title", "Semantic Web Home Page" . @@ more @@

@@ figure @@

More complicated RDF expressions like this are usually represented as graphs, where the subjects and objects are nodes, and the predicates are edges.
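The graph view can be sketched in a few lines of Python: each subject becomes a node with a list of outgoing labeled edges. The statements below restate the W3C example from the preceding paragraphs. The Dublin Core `title` and `creator` URIs and `rdf:type` are real, but the `http://example.org/terms#` vocabulary (`Document`, `Organization`, `name`) is invented for this illustration, as is the helper name `build_graph`.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/terms#"   # hypothetical vocabulary, illustration only

PAGE = "http://www.w3c.org/2001/sw/"
ORG = "http://www.w3c.org/organization"

statements = [
    (PAGE, DC + "title", "Semantic Web Home Page"),
    (PAGE, RDF_TYPE, EX + "Document"),
    (PAGE, DC + "creator", ORG),
    (ORG, RDF_TYPE, EX + "Organization"),
    (ORG, EX + "name", "W3C"),
]


def build_graph(triples):
    """Map each subject node to its outgoing (edge label, target node) pairs."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


graph = build_graph(statements)
for node, edges in graph.items():
    print(node)
    for label, target in edges:
        print("   --%s--> %s" % (label, target))
```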

This is all there is to basic RDF - nodes-and-arcs diagrams interpreted as statements about concepts or digital resources represented by URIs. However, the need for standardized vocabularies for things like "organization" and the predicate "is a" is evident. The basis for such vocabularies in RDF is RDF Schema.

This specification provides the basic vocabulary to express relationships between terms: resources being instances of terms ("http://www.w3c.org/organization is an organization"), terms being subterms of other terms ("a hex-head bolt is a type of machine bolt") and so on.

It also provides means to restrict the usage of predicates: "is a parent of" only applies to persons, etc. The terms instance, subterm, applies to are the kind of terms defined by the RDF Schema specification.

Using the vocabulary provided by RDF Schema, it is easy to create your own semantically rich vocabularies.
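As a rough sketch of what the "subterm" relationship buys you, the Python below follows `rdfs:subClassOf` edges to work out every class a resource belongs to. The `rdf:type` and `rdfs:subClassOf` URIs are the real ones; the `http://example.org/hardware#` vocabulary, the extra `Fastener` class, the resource `bolt42` and the helper names are assumptions added for the example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/hardware#"   # hypothetical vocabulary

triples = {
    (EX + "HexHeadBolt", RDFS + "subClassOf", EX + "MachineBolt"),
    (EX + "MachineBolt", RDFS + "subClassOf", EX + "Fastener"),
    (EX + "bolt42", RDF + "type", EX + "HexHeadBolt"),
}


def superclasses(cls, triples):
    """All classes reachable from cls by following rdfs:subClassOf edges."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == RDFS + "subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(resource, triples):
    """Declared types of a resource plus every superclass of those types."""
    direct = {o for s, p, o in triples if s == resource and p == RDF + "type"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


for cls in sorted(types_of(EX + "bolt42", triples)):
    print(cls)
```

Only one type was stated for `bolt42`; the other two follow from the schema, which is the kind of conclusion an application can draw once the vocabulary itself is described in RDF.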

The RDF Syntax

Many RDF developers encounter the details of the RDF XML syntax at a relatively late stage. RDF distinguishes carefully between the edge-labeled graph information model and the encoding of this model in XML documents. This allows a lot of work to be done without familiarity with the XML syntax in which RDF is written. Some familiarity with the XML syntax is still valuable, and for developers familiar with XML in general and with the RDF graph model, this knowledge can be acquired fairly easily.

The notion of "striping" is a very useful conceptual tool for understanding RDF/XML: the RDF 1.0 syntax has been informally described as a "striped" graph encoding syntax. Striping is described in more detail below.

Tools for Learning

Two other tools are also useful when learning RDF/XML: parsers and visualisers. The first such tool was Janne Saarela's SiRPAC; there are now a large number of RDF parsers available, in a variety of programming languages. An RDF parser is a tool that takes an XML encoding ("serialization") of an RDF graph, and returns a textual or programmatic representation of the graph. Playing with an RDF parser such as ARP, the parser used by W3C's RDF Validation Service, makes it easy to experiment with RDF/XML files and see the associated node-edge-node triples that constitute the corresponding graph structure.

The other tool that can help an RDF developer get to grips with the syntax is GraphViz, or one of the GraphViz-based RDF visualization tools such as RDFViz. GraphViz is a graph visualisation toolkit. It can take descriptions of (various kinds of) graph and generate reasonably pretty pictures in various image formats. There are now a variety of filters that take the output from an RDF/XML parser and generate .dot input files for GraphViz. This can be incredibly useful when learning the RDF/XML syntax, or debugging RDF content. A GraphViz-based RDF visualizer is now also part of W3C's RDF Validator service.
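The filters mentioned above are conceptually simple. The following Python sketch is not one of those tools; it is a minimal illustration, with an assumed function name `to_dot`, of how a list of triples can be turned into GraphViz's .dot format: one labeled edge per triple.

```python
def to_dot(triples):
    """Render (subject, predicate, object) triples as a GraphViz digraph."""
    def quote(text):
        # Quote a node or label name, escaping backslashes and double quotes.
        return '"%s"' % text.replace("\\", "\\\\").replace('"', '\\"')

    lines = ["digraph rdf {"]
    for s, p, o in triples:
        lines.append("  %s -> %s [label=%s];" % (quote(s), quote(o), quote(p)))
    lines.append("}")
    return "\n".join(lines)


example = [
    ("http://www.w3c.org/2001/sw/",
     "http://purl.org/dc/elements/1.1/title",
     "Semantic Web Home Page"),
]
print(to_dot(example))
```

Saving the output to a file and running GraphViz's `dot` program over it (for instance `dot -Tpng graph.dot -o graph.png`) should produce a picture of the graph.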

So, armed with parsers, visualisation tools and the RDF syntax spec, all of which are available from the RDF home page, how can a content-producer get a quick feel for the structure of RDF/XML? The basic concept to understand when looking at the XML syntax is striping. This can give one a handle on the essential organising principle of RDF's XML syntax. It should be noted, however, that this emphasis is slightly contrary to the way the original RDF spec is organised.

A Striped Syntax

To learn how to read and write RDF in XML syntax, you need to feel comfortable with the graph-based information model at the heart of RDF: objects ('resources') linked together by typed relationships or 'properties'. And you need to be at ease with the way RDF tries to use names in URI syntax wherever possible, to name resources, their types ('classes') and their attributes and interrelationships ('properties'). If you're happy with all that, you'll also need some mental baggage from the XML side of things. RDF graphs are encoded in XML, and this encoding makes use of some features of XML. You need to know about the basic abstract structure of all XML documents: the tree of elements (some decorated with attribute/value pairs), and about the way these are manifested as nested hierarchies of opening and closing angle-bracketed "tags" in XML documents. You'll also perhaps have heard of the notion of a well-formed XML document, of 'namespaces', of DTDs, of XML Schemas and various other features. These are all good to know about, but the critical concepts to possess here are the notions of (i) well-formedness, and (ii) XML namespaces, backed up by general comfort with XML's elements/attributes/nesting structure. Having gotten this far, it isn't such a big leap to grasp the basic pattern that underlies the RDF/XML serialization syntax: striping.

An XML syntax for RDF specifies a strategy for encoding the node-edge-node structure that RDF cares about in terms of the (attribute-decorated) element hierarchy that XML cares about. There are a number of ways this can be done. RDF 1.0 adopts a style that we term 'striped'; other conventions have been proposed, but the focus here is on RDF 1.0. The XML syntax needs to map from RDF's URI-named resources, properties and classes (nodes, edge-types, node types... if you prefer a more visual terminology) into a class of well-formed XML documents. The XML namespace mechanism is used for this. So our main task here is to explain how the node-edge-node structures from RDF become element and attribute structures in XML. To do this, we can focus on the notion of striping and forget some annoying details for now.

Stripes and Graphs

While not all RDF/XML fits the pattern described here, a lot of it does. Additionally, the online validation service is your friend: it checks your syntax, and can generate tabular and graphical views of the graph so you can make sure you've written what you meant to write.

So, this is what we mean about striping.

Consider a graph of nodes, each with a type (ie. category or 'class'), and each having a bunch of named properties (relationships) connecting it to other nodes, which might be simply string-y values, or further nodes that are themselves at the sharp and/or blunt ends of various other edges in the graph. We need to create XML elements (possibly with associated attributes) that stand for these nodes and arcs. RDF's convention for doing this is called striped because, as you look at the XML element nesting structure, elements alternately represent nodes and edges.

Worked Example (s1.rdf)

Here we're saying, loosely, that "there exists a Person with a name, 'John', and that person 'livesWith' a Person that has a father that is a Person with a name 'Fred' ". That's all our example piece of RDF/XML tells us.

The RDF node-and-edge view of this is shown graphically below. To understand striping, we need to compare the abstract graph structure of RDF to the details of the XML nesting structure, ie. the way some elements are 'inside' (rather than alongside) others.

note: this RDF/XML example is numbered, to show the levels of XML nesting inside the rdf:RDF wrapper element.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns="http://example.com/some-dlg-schema#"> 

1:<Person>
2:   <name> John  </name>
2:   <livesWith>  
3:      <Person>
4:         <father>
5:            <Person>
6:               <name> Fred </name>
5:            </Person>
4:         </father>
3:      </Person>
2:   </livesWith>
1:</Person>

</rdf:RDF>

Graph structure

This RDF/XML encodes the graph depicted in the following diagram. Note that the blank nodes indicate resources that were mentioned but not explicitly named with URIs in the XML serialization.

Represented as triples, the graph is as follows:

RDF Graph as Triples

The RDF graph is a collection of triples that represent statements about the named properties of resources. The 'subject' denotes the resource described; the 'predicate' denotes a property of that resource, and the 'object' indicates a value of that property for the specified resource. Predicates correspond to edges in the graph, and to the even-numbered stripes in the XML document hierarchy shown here.

Number	Subject	Predicate	Object
1	genid:23334	http://example.com/some-dlg-schema#name	John
2	genid:23336	http://example.com/some-dlg-schema#name	Fred
3	genid:23336	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://example.com/some-dlg-schema#Person
4	genid:23335	http://example.com/some-dlg-schema#father	genid:23336
5	genid:23335	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://example.com/some-dlg-schema#Person
6	genid:23334	http://example.com/some-dlg-schema#livesWith	genid:23335
7	genid:23334	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://example.com/some-dlg-schema#Person

The same data, presented in the RDF Core WG's "ntriples" graph dump syntax is written as:

_:j23337 <http://example.com/some-dlg-schema#name> " John " .
_:j23339 <http://example.com/some-dlg-schema#name> " Fred " .
_:j23339 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/some-dlg-schema#Person> .
_:j23338 <http://example.com/some-dlg-schema#father> _:j23339 .
_:j23338 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/some-dlg-schema#Person> .
_:j23337 <http://example.com/some-dlg-schema#livesWith> _:j23338 .
_:j23337 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/some-dlg-schema#Person> .
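A dump in this style is easy to read back into a program. The Python sketch below uses a regular expression that covers only the simple forms seen in this example (URIs in angle brackets, `_:` blank node labels, and plain quoted strings). It is an illustrative subset, not a complete N-Triples parser, and the names `LINE`, `parse_line` and `DUMP` are assumptions.

```python
import re

# Subject: <uri> or _:label. Predicate: <uri>. Object: <uri>, _:label or "string".
LINE = re.compile(
    r'^\s*(<[^>]*>|_:\w+)\s+(<[^>]*>)\s+(<[^>]*>|_:\w+|"(?:[^"\\]|\\.)*")\s*\.\s*$'
)


def parse_line(line):
    """Split one simple N-Triples statement into (subject, predicate, object)."""
    m = LINE.match(line)
    if m is None:
        raise ValueError("not a simple N-Triples statement: %r" % line)
    return m.groups()


DUMP = """\
_:j23337 <http://example.com/some-dlg-schema#name> " John " .
_:j23339 <http://example.com/some-dlg-schema#name> " Fred " .
_:j23339 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/some-dlg-schema#Person> .
_:j23338 <http://example.com/some-dlg-schema#father> _:j23339 .
_:j23338 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/some-dlg-schema#Person> .
_:j23337 <http://example.com/some-dlg-schema#livesWith> _:j23338 .
_:j23337 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/some-dlg-schema#Person> .
"""

triples = [parse_line(line) for line in DUMP.splitlines() if line.strip()]
blank_nodes = sorted({s for s, p, o in triples if s.startswith("_:")})
print(len(triples), "statements about", len(blank_nodes), "unnamed resources")
```

The generated labels (`_:j23337` and so on) are local to one dump; a different run of a parser may assign different labels to the same unnamed nodes, as the table above shows.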

Walk-through

Here is an informal walk-through of the XML document's structure. The first level of XML elements, our first occurrence of Person, stands for a node (some specific instance of the type of thing we're calling 'Person'). And then the striping starts. The next level in, we see two XML elements: one is 'name', the other 'livesWith'. These stand not for nodes in the graph, but edges. The first is an edge labeled 'name' connecting our person to the node whose content is the string 'John'. The second is an edge labeled 'livesWith' that points from our first Person node to a second Person node.

So now we're into the third level of XML nesting, and the striping pattern means that this odd-numbered level of nesting is describing a node. Any XML sub-elements below it in the XML tree are, accordingly, representations of that Person's properties, ie. edges in the graph. We have one such edge, 'father', whose XML element contains the third 'Person' element (standing for a node of type Person). That element has just one sub-element, 'name', which provides a label for an edge connecting the third person to a node whose content is the string 'Fred'.

So to recap we've seen: a node (of type Person), with an edge ('name': John) and an edge ('livesWith') pointing at a node (of type Person) having an edge ('father') pointing at a node (of type Person) with an edge ('name': Fred).

The XML elements at the 1st, 3rd, and 5th levels of nesting all stand for individual nodes; in our scenario they happen to all be of the same type, Person. The XML elements at the 2nd, 4th, and 6th levels of nesting represent labeled edges in the graph, ie. RDF properties.

We alternate between node-describing and edge-describing XML elements, starting always with the description of a node. For node-describing elements the XML element name maps onto the type, or class of the resource represented by the node. For edge-describing elements, the XML element name supplies a label for the RDF property that connects the associated resources.

This is RDF striping. Understanding this basic representational convention is all you need to understand most RDF/XML examples you'll encounter.
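The alternation described in this walk-through can be written down as two functions that call each other: one for node-level elements and one for edge-level elements. The Python sketch below applies that idea to the worked example. It is a teaching aid under simplifying assumptions, not a conforming RDF/XML parser: it ignores attributes such as rdf:about and rdf:resource, ignores rdf:Description and rdf:parseType, invents its own labels (`_:n1`, `_:n2`, ...) for unnamed nodes, and strips the whitespace around literal text. All function names are made up for the example.

```python
import itertools
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

S1 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns="http://example.com/some-dlg-schema#">
<Person>
   <name> John </name>
   <livesWith>
      <Person>
         <father>
            <Person>
               <name> Fred </name>
            </Person>
         </father>
      </Person>
   </livesWith>
</Person>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' into 'namespace' + 'local'."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def walk_node(element, triples, ids):
    """Odd stripe: the element is a node; its child elements are edges."""
    node = "_:n%d" % next(ids)
    triples.append((node, RDF_NS + "type", uri(element.tag)))
    for edge in element:
        walk_edge(node, edge, triples, ids)
    return node


def walk_edge(subject, element, triples, ids):
    """Even stripe: the element is an edge to a nested node or to a literal."""
    children = list(element)
    if children:
        target = walk_node(children[0], triples, ids)
    else:
        target = (element.text or "").strip()
    triples.append((subject, uri(element.tag), target))


def striped_triples(document):
    """Walk every top-level node element inside the rdf:RDF wrapper."""
    triples, ids = [], itertools.count(1)
    for top in ET.fromstring(document):
        walk_node(top, triples, ids)
    return triples


for s, p, o in striped_triples(S1):
    print(s, p, o)
```

The seven triples it prints correspond, up to the choice of labels for the unnamed nodes, to the seven rows of the table in the previous section.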

Some observations (gory details and small print)

You can't tell, without starting at the top and counting on your fingers, whether an XML element in the RDF serialisation represents an edge or a node. But often you can cheat! Look again at the example, and notice that each element in the even-numbered layers of XML (the 'edge label' stripes) has a name beginning with a lower case letter. Many RDF vocabularies (including the core RDF specs themselves) adopt this convention: we name properties with a lower case name, and classes of thing with an upper case name (eg. 'Person').

I haven't mentioned the rdf:Description element. The RDF 1.0 Model and Syntax spec gives this a lot of attention when presenting the RDF syntax. Basically it can occur on any of the node-describing XML elements (ie. odd-numbered) in the striped syntax. It is redundant, and a bit confusing since apart from the option of putting rdf:Description on the node-describing elements, we can always map from the name of these nodes to an RDF type that is a class for the thing the node describes. In our example, 'Person'. So the existence of rdf:Description in the syntax complicates things. Whenever you see it, pretend you saw a node called 'Resource' instead; that way, you can read it as 'there exists a Resource...'.

We've said nothing about namespaces here yet. RDF uses the XML namespace mechanism to associate all these classes and properties with Web identifiers (URIs). We've also said nothing here about the use of XML attributes. Here's a short version. When you see an attribute on a node-level element, eg. on the 'Person' elements in the example above, it always stands for an RDF property, whose value is always written as a simple literal string. Except for some special cases, of course, otherwise things would be too simple. One special case is important: the rdf:about attribute. When you see rdf:about, this is RDF's way of telling you that we know a URI name for the thing concerned. Such attributes are not treated as properties, but are in a sense 'built in' to RDF at a deep level. The same goes for rdf:ID, xmlns:*, xml:lang, xml:base and probably some others. See the syntax spec for details. But the basic idea is: when you see attributes on a node-level XML element (the ones whose names often begin with capital letters), the attribute represents an edge pointing to a literal value.

Another important case: representing edges that point to nodes that are described elsewhere (within the same document but not within this part of the element tree; or elsewhere in the Web). For this, RDF has the rdf:resource attribute. This always appears at the edge-level of the XML document, ie. on elements that stand for edges rather than for nodes. Apart from that, it functions similarly to the rdf:about, in that it uses URIs to point off to a node instead of describing it inline.
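The attribute rules from the last two paragraphs can be sketched in Python as follows. The sample document, the two `http://example.com/people#...` URIs and the helper names are invented for the illustration. Treating every attribute in the rdf: and xml: namespaces as "special" is a deliberate simplification of the real rules, for which the syntax spec remains the authority.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:ex="http://example.com/some-dlg-schema#">
  <ex:Person rdf:about="http://example.com/people#john" ex:name="John">
    <ex:livesWith rdf:resource="http://example.com/people#mary"/>
  </ex:Person>
</rdf:RDF>"""


def uri(name):
    """Turn ElementTree's '{namespace}local' into 'namespace' + 'local'."""
    return name[1:].replace("}", "", 1) if name.startswith("{") else name


def node_triples(element):
    """Triples for one node-level element (simplified, see the note above)."""
    subject = element.get("{%s}about" % RDF_NS)   # URI name of the node
    triples = []
    for name, value in element.attrib.items():
        if name.startswith("{%s}" % RDF_NS) or name.startswith("{%s}" % XML_NS):
            continue   # rdf:about, rdf:ID, xml:lang, ... are not properties here
        triples.append((subject, uri(name), value))   # edge to a literal value
    for edge in element:
        target = edge.get("{%s}resource" % RDF_NS)
        if target is not None:
            triples.append((subject, uri(edge.tag), target))   # edge to a named node
    return triples


result = [t for node in ET.fromstring(DOC) for t in node_triples(node)]
for triple in result:
    print(triple)
```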

There are many other corner cases in the spec. RDF's rdf:parseType attribute, for example, complicates the simplistic striping model described here. But for many common cases, the notion of 'striped syntax' will provide some useful mental scaffolding that'll help you read the XML not just "as XML", but as an XML description of the abstract RDF graph. If in doubt, experiment with the free online parsing and visualisation service at W3C.

RDF in the Field

Dublin Core Metadata Initiative

@@ hmm... who can i get to help on this that I don't have working on other things... aaron? @@

XMP

@@ who from Adobe? @@

PRISM

@@ RonD can you provide? @@

Site Maps / Topic Maps

RSS RDF Site Summary 1.0

@@ Rael can you provide? @@

Intelligent Routing

The world is full of information. Behind the millions of pages on the Internet's publicly visible part, the Web, there are many times as many documents flowing in and out of organizations via emails, cross-company networks and constant always-on information `feeds'.

Every document that passes along the wires has to be inspected, processed or re-routed. A document simply written by one human being has to be read by another before anybody knows its worth or where it should be redirected. This is fine for a person-to-person email but, for information destined to a broad circulation, this can be expensive, often reducing the value of the information by raising its handling cost or simply making it late.

For example, when an individual subscribes to a source of news, it's usually on the understanding that everything in that feed is of interest and so everything will be delivered without question. For the distributor to sort out the interesting ones for you manually would be time-consuming, expensive and boring; so instead we accept dozens of emails and delete most of them every morning. And of course it is time-consuming, expensive and boring. Subscription to some less self-critical sources is a step to be taken very seriously.

When a company subscribes to a news feed, it may be risking a deluge of unwanted data. If it intends to circulate the information within the company or to a broad range of clients, it charges itself with checking every document by eye or investing in extra software technology. Without such protection, the company's networks will soon collapse under the load or its clients will consider themselves willfully `spammed' and withdraw their custom.

The redirection of such feeds is therefore a matter of utmost commercial sensitivity in a context of huge and increasing volumes and complexity of data. The technology concerned is `routing' and, in the most modern cases, relies on RDF.

The need, traditionally, for human inspection of incoming documents comes from the fact that, on its own, text has no value. It only has value when you know what it's about, what authority is its source and who it's intended for. Everything else is, as we know, just material for spamming. For a software agent to recognize a document's worth it must have access to an evaluation that is consistently readable, whatever the format of the document, and is reliable in its description.

For those two objectives, we need an internationally standardized language and a globally recognized set of values. These are RDF and RDF Schemas such as Dublin Core and PRISM. The longed-for independent evaluation takes the form of an associated RDF document.

Not that every document from every information source comes with its associated RDF description... yet. It is the case however that almost every serious source supplies some value-based annotation in the form of metadata, the significant content of RDF. For example, news feeds generally come in one of a selection of annotated formats, mostly based on XML, such as NewsML or XMLNews. Most standards-oriented companies are adding freely-accessible metadata to their document formats. Adobe, for example, recently announced XMP whereby metadata can be inserted into (and more importantly extracted from) PDF documents. The message from such companies is that, even if you can't understand or even have no right to read the contents, you are entitled to know enough to make an evaluation for your own use or for clients who can use the information. For freely available information, standards are the key.

Now this basic process (source embeds standard annotations: annotations are used to divert and sort documents) is certainly not new. Email (SMTP) and news (NNTP) protocols use standard keyword-value-pair headers which are fundamental to their operation: such documents are marked up according to known and publicized standards. What is new is to normalize all these local formats to a general one and thereby be able to appeal to a globally consistent set of values in making judgments.

For a universal router to do its job, it needs to cancel out these variations in format. Until the world adopts one standard, this will be a matter of tact and ingenuity but the existence of a core standard is important here. When a slew of formats and value-systems need to be compared it is safer to have one standard to convert to first and then to compare rather than do it piecemeal - and that standard must be broader than all the others. Again, RDF (and RDF Schemas) standards are the natural choice.

An Information Router collects metadata and stores it (rather like an enormous RDF document describing maybe millions of resources at once). This metadata store holds the descriptions in exactly the terms of RDF. They can therefore be exported or imported as industry-standard RDF without loss or confusion. World-wide, repositories of metadata may be synchronized and refreshed by exchanging RDF. While humans are exchanging images, videos and news items, metadata servers are exchanging compact RDF evaluations of them (the images, videos and news items, not the humans).

The actual documents described, orders of magnitude larger than the metadata, can be stored elsewhere or just left where they are (located by URI, of course). The metadata is compact and loaded with value. Judgments about distributing material can be made in a context of values (the standard predicate systems like Dublin Core) and a vast number of alternatives, all without moving the actual documents around or indeed even looking at them, by computer or by human eye.

Judgments are made by applying RDF `queries' which are testing the value of a document to the reader: whether the subject is interesting, the content is suitable, the author respected, the source reliable, the document accessible, the cost reasonable, the language intelligible, the conclusion desirable, the format tractable, the medium handleable, etc, etc. The actual form of a query varies from product to product. (In any case, the consumer would be given a graphical way to express his wishes.)

In one case, a query takes the form of a modified RDF description which, if you like, asks to be proved or disproved by a body of metadata. So an RDF Description that stated that a document exists with the title `Financial history of Belize' can be viewed as a request to find such a document.
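One very simple way to picture such a query is as a triple with blanks in it, matched against the metadata store. The Python sketch below is only an illustration of that idea and does not describe any particular product: the document URIs, the second title, and the `match` helper are made up, and `None` is used as the "anything" marker.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# A tiny stand-in for a metadata store (hypothetical document URIs).
metadata = [
    ("http://example.org/docs/1", DC_TITLE, "Financial history of Belize"),
    ("http://example.org/docs/2", DC_TITLE, "Weather report"),
]


def match(pattern, store):
    """Return every stored triple that agrees with the pattern.

    A None in the pattern matches any value in that position.
    """
    return [
        triple for triple in store
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# "A document exists with the title 'Financial history of Belize'": find it.
hits = match((None, DC_TITLE, "Financial history of Belize"), metadata)
for subject, _, _ in hits:
    print("deliver", subject)
```

Only the compact metadata is examined; the documents themselves stay where they are until a match says they are worth delivering.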

The news distributor's server runs, in addition to the usual server software, one of these Information Router packages, applies queries on behalf of its clients and delivers just those documents that survive the evaluation.

If a complex multi-layered query describing just what it takes to please you is associated with your name as a subscriber, you can, using software available today, guarantee that what you get sent is exactly and only what you need. It's an end to spam, thanks to RDF.

RDF Modules - Putting RDF to Work

@@ clarify that RDF is defined to be model, schema and syntax ... Separation of vocabulary modules across RDF Specs artifact of management, etc. @@

@@ note: does breaking down the semantics in terms of logical components make sense? I no longer am sure... ask working group for feedback. @@

Typing Vocabulary

RDF provides for simple typing. Using defined primitives in RDF, we can say that "Fido" is a type of "Dog", and that "Dog" is a sub class of "Animal". We can also create properties "haircolor", "tailsize", etc. used for describing characteristics of a class. Some slightly more "advanced" stuff such as creating ranges and domains for properties (e.g. "tailsize" can only be used for describing the class of "Dog", so if I see "tailsize" describing a "human" something is odd) is discussed in the following section on Constraint Vocabulary.

The simple typing vocabulary for RDF includes the notion of a "Resource" (rdfs:Resource), a "Class" (rdfs:Class), and the "Property" (rdf:Property). These are all "classes", in that terms may belong to these classes. For example, all terms in RDF are types of resource. To declare that something is a "type" of something else, we just use the rdf:type property.

rdfs:Resource rdf:type rdfs:Class .
rdfs:Class rdf:type rdfs:Class .
rdf:Property rdf:type rdfs:Class .
rdf:type rdf:type rdf:Property .

This simply says that "Resource is a type of Class, Class is a type of Class, Property is a type of Class, and type is a type of Property". These are all true statements.

It is quite easy to make up your own classes. For example, let's create a class called "Dog".

:Dog rdf:type rdfs:Class .

In XML/RDF this would be

  <rdfs:Class rdf:ID = "Dog" />

Now we can say that "Fido is a type of Dog".

:Fido rdf:type :Dog .

which would be represented in XML/RDF as

  <Dog rdf:ID = "Fido" />

We can also create properties quite easily by saying that a term is a type of rdf:Property, and then use those properties to describe our dog.

:haircolor rdf:type rdf:Property .

and which in XML/RDF would be used to describe a particular characteristic of our dog

  <Dog rdf:ID = "Fido">
    <haircolor>Brown</haircolor>
  </Dog>

RDF also has a few more properties that we can make use of: rdfs:subClassOf and rdfs:subPropertyOf. These allow us to say that one class or property is a sub class or sub property of another. For example, we might want to say that the class "Dog" is a sub class of the class "Animal". To do that, we simply say:

:Dog rdfs:subClassOf :Animal .

which would be represented in XML/RDF as

  <rdfs:Class rdf:ID = "Dog">
    <rdfs:subClassOf rdf:resource = "#Animal" />
  </rdfs:Class>

Hence, when we say that Fido is a Dog, we are also saying that Fido is an Animal. We can also say that there are other sub classes of Animal:-

:Human rdfs:subClassOf :Animal .
:Duck rdfs:subClassOf :Animal .

And then create new instances of those classes:-

:Bob rdf:type :Human .
:Quacky rdf:type :Duck .

And then we can invent another property, use that, and build up even more information...

:owns rdf:type rdf:Property .
:Bob :owns :Fido .
:Bob :owns :Quacky .
:Bob :name "Bob Fleming" .
:Quacky :name "Quacky" .

And so on. You can see that RDF is very simple, and yet allows one to build up knowledge bases of data in RDF very very quickly.
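To make the point about sub classes concrete, here is a minimal Python sketch of how a program might conclude that Fido is an Animal from the statements above. It is not how any particular RDF processor works: the triples are held as plain tuples of strings, and the helper names (superclasses, types_of, is_a) are made up for this example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# The statements from this section, written as (subject, predicate, object).
triples = {
    (":Dog", SUBCLASS, ":Animal"),
    (":Human", SUBCLASS, ":Animal"),
    (":Duck", SUBCLASS, ":Animal"),
    (":Fido", RDF_TYPE, ":Dog"),
    (":Bob", RDF_TYPE, ":Human"),
    (":Quacky", RDF_TYPE, ":Duck"),
    (":Bob", ":owns", ":Fido"),
    (":Bob", ":owns", ":Quacky"),
}


def superclasses(cls, triples):
    """All classes reachable from cls by following rdfs:subClassOf arcs."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(resource, triples):
    """Declared types of a resource plus every superclass of those types."""
    declared = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    inferred = set()
    for cls in declared:
        inferred |= superclasses(cls, triples)
    return declared | inferred


def is_a(resource, cls, triples):
    """True if the resource belongs to cls, directly or through sub classes."""
    return cls in types_of(resource, triples)


# Which of the things Bob owns are animals?
animals_owned_by_bob = sorted(
    o for s, p, o in triples
    if s == ":Bob" and p == ":owns" and is_a(o, ":Animal", triples)
)
print(animals_owned_by_bob)
```

Nothing in the data says directly that Fido or Quacky is an Animal; the answer falls out of following the rdfs:subClassOf statements.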

Constraint Vocabulary

The next concepts which RDF provides us, which are important to mention, are ranges and domains. Ranges and domains let us say what classes the subject and object of each property must belong to. For example, we might want to say that the property ":bookTitle" must always apply to a book, and have a literal value:-

:Book rdf:type rdfs:Class .
:bookTitle rdf:type rdf:Property .
:bookTitle rdfs:domain :Book .
:bookTitle rdfs:range rdfs:Literal .
:MyBook rdf:type :Book .
:MyBook :bookTitle "My Book" .

rdfs:domain always says what class the subject of a triple using that property belongs to, and rdfs:range always says what class the object of a triple using that property belongs to.
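As a rough illustration, the following Python sketch checks a set of statements against the declared domains and ranges. It follows the constraint-checking reading used in this section; other readings treat these declarations as a basis for inferring types rather than for rejecting data. The Literal class and the helper names are assumptions made for the example, and sub classes are deliberately ignored to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A literal value such as "My Book", as opposed to a named resource."""
    value: str


triples = [
    (":Book", "rdf:type", "rdfs:Class"),
    (":bookTitle", "rdf:type", "rdf:Property"),
    (":bookTitle", "rdfs:domain", ":Book"),
    (":bookTitle", "rdfs:range", "rdfs:Literal"),
    (":MyBook", "rdf:type", ":Book"),
    (":MyBook", ":bookTitle", Literal("My Book")),
]


def has_type(node, cls, triples):
    """True if node is a literal (for rdfs:Literal) or is declared to be a cls."""
    if cls == "rdfs:Literal":
        return isinstance(node, Literal)
    return (node, "rdf:type", cls) in triples


def violations(triples):
    """List (triple, reason) pairs that break a declared domain or range."""
    domains = {s: o for s, p, o in triples if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in triples if p == "rdfs:range"}
    problems = []
    for s, p, o in triples:
        if p in domains and not has_type(s, domains[p], triples):
            problems.append(((s, p, o), "subject is not a " + domains[p]))
        if p in ranges and not has_type(o, ranges[p], triples):
            problems.append(((s, p, o), "object is not a " + ranges[p]))
    return problems


print(violations(triples))

# A statement that misuses :bookTitle: the subject is not a :Book
# and the object is a resource rather than a literal.
bad = triples + [(":Fido", ":bookTitle", ":MyBook")]
for triple, reason in violations(bad):
    print(triple, reason)
```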

Descriptive Vocabulary

RDF also contains a set of properties for describing the particular terms (Classes, Properties) one is defining, by providing comments, labels, and the like. The two properties for doing this are rdfs:label and rdfs:comment, and an example of their use is:

:bookTitle rdf:type rdf:Property .
:bookTitle rdfs:label "bookTitle" .
:bookTitle rdfs:comment "the title of a book" .

It is best practice to always label and comment your new properties, classes, and other terms.

Collection Vocabulary

An example of how one might use the Collection Vocabulary in practice is discussed in section @@ section @@.

Reification Vocabulary

An example of how one might use the Reification Vocabulary in practice is discussed in section @@ section @@.

Putting RDF to Work

@@ striping? @@

Creating Instance Data

OK, so let's get our hands dirty and describe something.

Creating Schemas

Declaring Properties

Declaring Classes

Controlled Vocabularies

Data Types

Back to Instance Data: Mixing and Matching Vocabularies

Distributed Description: Joining the Web

<?xml version="1.0"?>
<!-- xmlns:my is a placeholder namespace for the WebPage class -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:dc="http://purl.org/dc/terms/"
            xmlns:my="http://example.org/my-terms#">
<my:WebPage rdf:about="http://www.w3.org/2001/sw/">
  <dc:title>W3C Semantic Web Activity</dc:title>
  <dc:creator rdf:resource="http://www.w3.org/People/EM/contact#me" />
</my:WebPage>
</rdf:RDF>

@@ have to figure out cwm circle and arrow rules for displaying text @@

<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns="http://www.w3.org/2000/10/swap/pim/contact#">

<Person rdf:about="http://www.w3.org/People/EM/contact#me">
  <mailbox rdf:resource="mailto:em@w3.org" />
  <fullName>Eric Miller</fullName>
  <personalTitle>Semantic Web Activity Lead</personalTitle>
  <company>W3C World Wide Web Consortium</company>
  <phone>614.763.1100</phone>
</Person>

</rdf:RDF>

@@ have to figure out cwm circle and arrow rules for displaying text @@
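The two documents above describe different things, but both mention the URI http://www.w3.org/People/EM/contact#me, and that shared URI is what lets their statements be joined. The Python sketch below shows the idea with hand-written triples rather than a real RDF/XML parser. It assumes the dc: prefix stands for the namespace declared in the first document, and the helper name values is invented for the example.

```python
# Namespace assumed for the dc: prefix, taken from the first document.
DC = "http://purl.org/dc/terms/"
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"

PAGE = "http://www.w3.org/2001/sw/"
EM = "http://www.w3.org/People/EM/contact#me"

# Statements from the first document (about the Web page).
page_doc = {
    (PAGE, DC + "title", "W3C Semantic Web Activity"),
    (PAGE, DC + "creator", EM),
}

# Some of the statements from the second document (about the person).
person_doc = {
    (EM, CONTACT + "fullName", "Eric Miller"),
    (EM, CONTACT + "mailbox", "mailto:em@w3.org"),
}

# Merging two sets of statements is just taking their union.
merged = page_doc | person_doc


def values(graph, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Follow the creator arc from the page, then the fullName arc from the creator.
creator_names = [
    name
    for creator in values(merged, PAGE, DC + "creator")
    for name in values(merged, creator, CONTACT + "fullName")
]
print(creator_names)
```

Neither document on its own can answer "what is the full name of the creator of this page?"; the merged set of statements can.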

Striping Introduction

Todo

RDF Data Model

1. Introduction

The basic difficulty in providing this software support is that the Web was originally aimed at providing its resources to people, not to other software, and so Web resources do not have descriptions of their meanings or capabilities that software can understand. For example, the meaning of a Web page is determined by human understanding of the screen content when the page is displayed in a browser. This meaning is inaccessible to a piece of software. As a result, software such as search engines must rely on such techniques as simple text matching, rather than being able to process Web resources based on an understanding of their true relationships to a user's intentions and needs.

The Semantic Web is going to change all that. The Semantic Web will enhance the Web's inter-linked information and service resources with software-interpretable descriptions of the resources' meanings, capabilities, and inter-relationships. These descriptions will allow tools such as agents, search engines, or service brokers to more automatically and reliably find and use appropriate resources in response to user requirements. At the same time, the Semantic Web creates the infrastructure for entirely new classes of agent-oriented capabilities. @@need some discussion of these other capabilities; also want to refer to apps involving non-Web resources

The W3C’s Resource Description Framework (RDF) is, as its name suggests, a framework (or approach) for describing Web resources. The approach is based on some very simple ideas, but when those ideas are taken together, and suitably generalized (as they are in RDF), they provide a means for describing practically anything, in a form that can be processed by software. The use of RDF (and richer approaches based on it) provides the basic technology for providing the software-interpretable descriptions required to support the Semantic Web.

At the same time, the use of RDF does not necessarily involve the use of inference, logic programming, or related technologies, as the term “semantic” might suggest. RDF also provides essential support for applications such as providing simple information about Web content (provenance, content ratings), defining privacy policies, specifying site maps, or supporting description-based Web service brokering.

@@Editorial notes: this version addresses various comments (mostly by Pat) on the original version, and preserves the original order of topic presentation. There have been some suggestions to move the description of URIs to the beginning, and then talk about Ntriples and graphs ("pure RDF"). A problem with that approach is that I think a primer on RDF ought to start off talking about RDF's general approach to describing things in terms of subjects, properties, and their values, rather than a discussion of how to identify things (in particular, the use of URIs to name properties doesn't have much significance until you talk about what a property is). I've got a version in the works that presents part of "The General Idea", then does the URI stuff, and then gets into triples, but the URI section is a major interruption to what I believe is the natural flow of the text (at least to someone with a database or other non-Web background). Anyway, we'll see.

2. The general idea

“the creator of [the particular Web page we’re talking about] is John Smith”

We’ve underlined parts of this statement to illustrate that, in order to describe the properties of something, we need ways to identify a number of things:

In this statement, in addition to using a URL to identify the Web page, we’ve used the word “creator” to identify the property we want to talk about, and the two words “John Smith” to identify the thing (a person) we want to say is the value of this property.

We can state other properties of this Web page by writing additional English statements of the same general form, using the URL to identify the page, and words (or other strings) to identify the properties and their values. For example, to specify the date the page was created, and the language in which the page is written, we could write the additional statements:

RDF assumes as its basic model that things have properties which have values, and that resources can be described by making statements, similar to those above, that specify those properties and values. RDF uses a particular terminology for talking about the various parts of statements. Specifically, the part that identifies the thing the statement is about (the Web page in this example) is called the subject. The part that identifies the property or characteristic of the subject that the statement specifies (creator, creation-date, or language in this case) is called the predicate, and the part that identifies the value of that property is called the object. So, taking the statement

RDF statements are similar to a number of other formats for recording information, such as:

RDF represents statements as nodes and arcs in a graph. In this notation, a statement is represented by a node for the subject, a node for the object, and a labeled arc between them for the predicate, as in:

Collections of statements are represented by corresponding collections of nodes and arcs. So the three statements we’ve given so far would be represented by the following graph:

The graph is technically a labeled directed graph, since the arcs have labels, and are “directed” (point in a specific direction, from subject to object).

Sometimes it is not convenient to draw graphs, so an alternative way of writing down the statements, called triples, can be used. In the triples notation, each statement in the graph is written as a simple triple of subject, predicate, and object, in that order. The triples representing the above three statements would be written:

<http://www.foobar.org/index.html>   creator              "John Smith"
<http://www.foobar.org/index.html>   creation-date   "August 16, 1999"
<http://www.foobar.org/index.html>   language          "English"

Each triple corresponds to a single arc in the graph, complete with the arc’s beginning and ending nodes (the subject and object of the statement). Unlike the drawn graph, the triple notation requires that a node be separately identified for each statement it appears in. So, for example, http://www.foobar.org/index.html appears three times (once in each triple) in the triple representation of the graph, but only once in the drawn graph.
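The relationship between the two notations can be sketched in Python. This is only one possible in-memory layout, chosen for illustration: the triples are tuples, and the graph is a dictionary from each subject node to the arcs that leave it.

```python
from collections import defaultdict

PAGE = "http://www.foobar.org/index.html"

# The three statements in triples notation.
triples = [
    (PAGE, "creator", "John Smith"),
    (PAGE, "creation-date", "August 16, 1999"),
    (PAGE, "language", "English"),
]

# Build the graph: each subject node maps to its outgoing (label, target) arcs.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

# The page is written out once per triple...
mentions = sum(1 for subject, _, _ in triples if subject == PAGE)

# ...but it is a single node in the graph, alongside the three value nodes.
nodes = set(graph) | {target for arcs in graph.values() for _, target in arcs}

print(mentions, len(graph), len(nodes))
```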

In each of the statements we’ve considered so far, the object has been a simple string (e.g., we've used “John Smith" to identify a particular person, and "August 16, 1999" to identify a particular date). In RDF, the objects in statements may be any kind of string. More importantly, the objects in RDF statements may also be the URLs of other Web resources. This allows us to represent not only the properties of individual resources, but also relationships between those resources and others. So, for example, we could represent the fact that the resource at http://www.barbaz.org/myprojects.html has the same creator as the resource at http://www.foobar.org/index.html by the statement

<http://www.foobar.org/index.html> sameCreatorAs <http://www.barbaz.org/myprojects.html>

Adding these two additional statements to the original ones would give us the graph shown below:

This graph illustrates another aspect of the way we represent RDF graphs in drawings: nodes that represent URIs are shown as ellipses, while nodes that represent strings are shown as boxes.

3. RDF identifiers

We first need to provide further detail about how RDF actually specifies the subjects, predicates, and objects of statements. So far, the identifiers we’ve used are:

We’ve recorded information about lots of things that don’t have URLs using files (both manual and automated) for many years, and the way we identify those things is by assigning them identifiers: values that we uniquely associate with the individual things. The identifiers we use to identify various kinds of things go by names like “Social Security Number”, “Part Number”, “license number”, “employee number”, “user-id”, etc. In some cases, these identifiers (such as Social Security Numbers) are assigned by an official authority of some kind. In other cases, these identifiers are generated by a private organization or individual. In some cases, these identifiers have a national or international scope within which they are unique (a Social Security Number has national scope), while in other cases they may only be unique within a very limited scope (my employee number is only unique among the numbers assigned by my specific employer). Nevertheless, these identifiers serve, if used properly, to identify the things we want to talk about.

As we’ve seen, the Web already provides one form of identifier, the Uniform Resource Locator (URL). A URL is a string that identifies a Web resource by representing its primary access mechanism (essentially, its network “location”). However, URLs are a subset of a more general and powerful concept, the Uniform Resource Identifier (URI). URIs (defined in [RFC2396]) are similar to URLs in that different persons or organizations can independently create them, and use them to identify things. However, unlike URLs, URIs are not limited to identifying things that have network locations, or use other computer access mechanisms. In fact, we can create a URI to refer to anything we want to talk about, including

Since the URI is such a general identification mechanism, capable of identifying anything, it should not be surprising that RDF uses URIs as its basic mechanism for identifying things. Specifically, RDF uses URIs to identify both subjects and objects in RDF statements (the objects in some statements, such as age values or names, will still be identified by strings). In fact, RDF defines a resource as anything that is identifiable by a URI, and hence using URIs allows RDF to describe practically anything, and to state relationships between such things as well.

Now that we have URIs to identify resources, we can be more complete and precise about recording information. For example, instead of identifying the creator of the Web page in our original example by the string “John Smith”, we can assign him a URI, say (using a URI based on his employee number) http://www.foobar.org/staffid/85740 . The RDF statement stating this fact would then have the graph:

<http://www.foobar.org/index.html> creator <http://www.foobar.org/staffid/85740>

One advantage of using a URI to identify the creator of the page in this example is that we can be more precise in our identification. That is, the creator of the page isn’t the string “John Smith”, or any one of the thousands of people having “John Smith” as their name, but the particular John Smith associated with that URI (whoever created the URI defines the association). Moreover, since we have a URI for the creator of the page, it is a full-fledged resource, and we can record additional information about him, such as his name, and age, as in the graph

<http://www.foobar.org/index.html>        creator     <http://www.foobar.org/staffid/85740>
<http://www.foobar.org/staffid/85740>    name       "John Smith"
<http://www.foobar.org/staffid/85740>    age          "27"

We've just shown how RDF uses URIs as subjects and objects in its statements. However, in these latest examples, we’ve still oversimplified something. RDF also uses URIs as predicates in RDF statements. That is, rather than using strings such as “creator” or “name” to identify properties, RDF uses URIs.

Using URIs to identify properties is important for a number of reasons. First, it allows us to distinguish the properties we use from properties someone else may use that would otherwise be identified by the same text string. For instance, in our example, foobar.org uses “name” to mean someone's full name written out as a string (e.g., “John Smith”), but someone else may intend "name" to mean something different (e.g., the name of a variable in a piece of program text). A program encountering “name” as a property identifier on the Web wouldn’t necessarily be able to distinguish these uses. However, if foobar.org writes http://www.foobar.org/terms/name for its “name” property, and the other person writes http://geneology.org/terms/name for hers, we can keep straight the fact that there are distinct properties involved (even if a program can't automatically determine the distinct meanings). Another reason why it is important to use URIs to identify properties is that it allows us to treat RDF properties as resources themselves. Since properties are resources, we can record descriptive information about them (e.g., the English description of what foobar.org means by “name”), simply by adding additional RDF statements with the property's URI as the subject.

Using URIs as subjects, objects, and predicates in RDF statements allows us to begin to develop and use a shared vocabulary on the Web, reflecting (and creating) a shared understanding of the concepts we talk about. For example, now that we know to use URIs (where we can) to identify all the parts of an RDF statement, we can write the statement “the creator of http://www.foobar.org/index.html is John Smith” as the triples

<http://www.foobar.org/index.html> <http://purl.org/dc/elements/1.1/creator> <http://www.foobar.org/staffid/85740> .
<http://www.foobar.org/staffid/85740> <http://www.foobar.org/terms/name> "John Smith" .

The URI http://purl.org/dc/elements/1.1/creator for the “creator” property in the first triple is an unambiguous reference to the “creator” attribute in the Dublin Core metadata attribute set, a widely-used collection of attributes (properties) for describing information of all kinds. The writer of this triple is effectively saying that the relationship between the Web page (identified by http://www.foobar.org/index.html ) and the creator of the page (a distinct person, identified by http://www.foobar.org/staffid/85740 ) is exactly the concept defined by http://purl.org/dc/elements/1.1/creator . Moreover, anyone else, or any program, which understands http://purl.org/dc/elements/1.1/creator will know exactly what is meant by this relationship.

Incidentally, the triples above, using URIs in the subject, predicate, and (where appropriate) object positions (and with periods at the ends of the lines), are now in a formal RDF notation called N-Triples, which is defined for linearizing RDF graphs.
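Writing triples out in this line-per-statement form is simple enough to sketch in Python. The functions below are illustrative only and do not implement the full notation: they wrap URIs in angle brackets, quote literals (escaping only backslashes and double quotes), and end each line with a period. The names term and to_ntriples, and the fourth "is this object a literal?" flag, are assumptions made for the example.

```python
def term(value, is_literal=False):
    """Format one part of a statement: <uri> or "literal"."""
    if is_literal:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    return "<" + value + ">"


def to_ntriples(statements):
    """Write (subject, predicate, object, object_is_literal) tuples, one per line."""
    lines = []
    for subject, predicate, obj, obj_is_literal in statements:
        parts = [term(subject), term(predicate), term(obj, obj_is_literal), "."]
        lines.append(" ".join(parts))
    return "\n".join(lines)


statements = [
    ("http://www.foobar.org/index.html",
     "http://purl.org/dc/elements/1.1/creator",
     "http://www.foobar.org/staffid/85740",
     False),
    ("http://www.foobar.org/staffid/85740",
     "http://www.foobar.org/terms/name",
     "John Smith",
     True),
]

output = to_ntriples(statements)
print(output)
```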

4. Complex Data

However, suppose we wanted to record the various pieces of information about his address as separate street, city, state, and Zip code values? How do we do this using RDF?

In RDF, we can represent such structured information by considering the aggregate thing we want to talk about (like John Smith's address) as a separate resource, and then making separate statements about that new resource. So, in the RDF graph, in order to break up John Smith’s address into its component parts, we create a new node to represent the concept of John Smith’s address, and assign that concept a new URI to identify it, say http://www.foobar.org/addressid/85740 . We then write RDF statements (create additional arcs and nodes) with that node as the subject, to represent the additional information, producing the graph below:

@@Note: this figure needs to be redone, with <http://www.foobar.org/addressid/85740> in the current blank address node in the graph

<http://www.foobar.org/staffid/85740>       <http://www.foobar.org/terms/address>   <http://www.foobar.org/addressid/85740> .
<http://www.foobar.org/addressid/85740> <http://www.foobar.org/terms/street>      "1501 Grant Avenue" .
<http://www.foobar.org/addressid/85740> <http://www.foobar.org/terms/city>         "Bedford" .
<http://www.foobar.org/addressid/85740> <http://www.foobar.org/terms/state>       "Massachusetts" .
<http://www.foobar.org/addressid/85740> <http://www.foobar.org/terms/Zip>          "01730" .
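A program can recover the structured address by following the address arc from John Smith to the new resource and then collecting the statements made about that resource. The Python sketch below shows one way to do this over the triples above; the helper names describe and address_of are invented for the example, and property URIs are shortened to their last path segment purely for readability.

```python
TERMS = "http://www.foobar.org/terms/"
JOHN = "http://www.foobar.org/staffid/85740"
ADDRESS = "http://www.foobar.org/addressid/85740"

triples = [
    (JOHN, TERMS + "address", ADDRESS),
    (ADDRESS, TERMS + "street", "1501 Grant Avenue"),
    (ADDRESS, TERMS + "city", "Bedford"),
    (ADDRESS, TERMS + "state", "Massachusetts"),
    (ADDRESS, TERMS + "Zip", "01730"),
]


def describe(subject, triples):
    """Map the last path segment of each property URI to its value for one subject."""
    return {p.rsplit("/", 1)[-1]: o for s, p, o in triples if s == subject}


def address_of(person, triples):
    """Follow the address arc from a person and describe the resource it points to."""
    for s, p, o in triples:
        if s == person and p == TERMS + "address":
            return describe(o, triples)
    return None


address = address_of(JOHN, triples)
print(address)
```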

In the drawing of the graph above, the new URI we assigned to identify "John Smith's address" really serves no purpose, since we could just as easily have drawn the graph