Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The Resource Description Framework (RDF) is a framework for representing information in the Web.
RDF Concepts and Abstract Syntax defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of design goals, key concepts, datatyping, character normalization and handling of URI references.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
Publication as a Proposed Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This is one document in a set of six (Primer, Concepts, Syntax, Semantics, Vocabulary, and Test Cases) intended to jointly replace the original Resource Description Framework specifications, RDF Model and Syntax (1999 Recommendation) and RDF Schema (1999 Proposed Recommendation). It has been developed by the RDF Core Working Group as part of the W3C Semantic Web Activity (Activity Statement, Group Charter) for publication on 15 December 2003.
The design expressed in earlier versions of these documents has been widely reviewed and satisfies the Working Group's chartered technical requirements. The Working Group has addressed all comments received, making changes as necessary. Many implementations have been reported, covering among them all features of the specification. Changes to this document since the Second Last Call Working Draft are detailed in the Appendix A.
W3C Advisory Committee Representatives are now invited to submit their formal review via Web form, as described in the Call for Review. Additional comments may be sent to a Team-only list, w3t-semweb-review@w3.org. The public is invited to send comments to www-rdf-comments@w3.org (archive) and to participate in general discussion of related technology on www-rdf-interest@w3.org (archive). The review period extends until 19 January 2004.
The W3C maintains a list of any patent disclosures related to this work.
The Resource Description Framework (RDF) is a framework for representing information in the Web.
This document defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. This abstract syntax is quite distinct from XML's tree-based infoset [XML-INFOSET]. It also includes discussion of design goals, key concepts, datatyping, character normalization and handling of URI references.
Normative documentation of RDF falls into the following areas:
Within this document, normative sections are explicitly labelled as such. Explicit notes are informative.
The framework is designed so that vocabularies can be layered. The RDF and RDF vocabulary definition (RDF schema) languages [RDF-VOCABULARY] are the first such vocabularies. Others (cf. OWL [OWL] and the applications mentioned in the primer [RDF-PRIMER]) are in development.
In section 2, the background rationale and design goals are introduced. Key concepts follow in section 3. Section 4 discusses URI references reserved for use by RDF.
Section 5 discusses datatypes. XML content of literals is described in section 5.1, and the abstract syntax is defined in section 6 of this document.
Section 7 discusses the role of fragment identifiers in URI references used with RDF.
RDF has an abstract syntax that reflects a simple graph-based data model, and formal semantics with a rigorously defined notion of entailment providing a basis for well founded deductions in RDF data.
The development of RDF has been motivated by the following uses, among others:
RDF is designed to represent information in a minimally constraining, flexible way. It can be used in isolated applications, where individually designed formats might be more direct and easily understood, but RDF's generality offers greater value from sharing. The value of information thus increases as it becomes accessible to more applications across the entire Internet.
The design of RDF is intended to meet the following goals:
RDF has a simple data model that is easy for applications to process and manipulate. The data model is independent of any specific serialization syntax.
Note: the term "model" used here in "data model" has a completely different sense to its use in the term "model theory". See [RDF-SEMANTICS] for more information about "model theory" as used in the literature of mathematics and logic.
RDF has a formal semantics which provides a dependable basis for reasoning about the meaning of an RDF expression. In particular, it supports rigorously defined notions of entailment which provide a basis for defining reliable rules of inference in RDF data.
The vocabulary is fully extensible, being based on URIs with optional fragment identifiers (URI references, or URIrefs). URI references are used for naming all kinds of things in RDF.
The other kind of value that appears in RDF data is a literal.
RDF has a recommended XML serialization form [RDF-SYNTAX], which can be used to encode the data model for exchange of information among applications.
RDF can use values represented according to XML schema datatypes [XML-SCHEMA2], thus assisting the exchange of information between RDF and other XML applications.
To facilitate operation at Internet scale, RDF is an open-world framework that allows anyone to make statements about any resource.
In general, it is not assumed that complete information about any resource is available. RDF does not prevent anyone from making assertions that are nonsensical or inconsistent with other statements, or the world as people see it. Designers of applications that use RDF should be aware of this and may design their applications to tolerate incomplete or inconsistent sources of information.
RDF uses the following key concepts:
The underlying structure of any expression in RDF is a collection of triples, each consisting of a subject, a predicate and an object. A set of such triples is called an RDF graph (defined more formally in section 6). This can be illustrated by a node and directed-arc diagram, in which each triple is represented as a node-arc-node link (hence the term "graph").
Each triple represents a statement of a relationship between the things denoted by the nodes that it links. Each triple has three parts:
The direction of the arc is significant: it always points toward the object.
The nodes of an RDF graph are its subjects and objects.
The assertion of an RDF triple says that some relationship, indicated by the predicate, holds between the things denoted by subject and object of the triple. The assertion of an RDF graph amounts to asserting all the triples in it, so the meaning of an RDF graph is the conjunction (logical AND) of the statements corresponding to all the triples it contains. A formal account of the meaning of RDF graphs is given in [RDF-SEMANTICS].
A node may be a URI with optional fragment identifier (URI reference, or URIref), a literal, or blank (having no separate form of identification). Properties are URI references. (See [URI], section 4, for a description of URI reference forms, noting that relative URIs are not used in an RDF graph. See also section 6.4.)
A URI reference or literal used as a node identifies what that node represents. A URI reference used as a predicate identifies a relationship between the things represented by the nodes it connects. A predicate URI reference may also be a node in the graph.
A blank node is a node that is not a URI reference or a literal. In the RDF abstract syntax, a blank node is just a unique node that can be used in one or more RDF statements, but has no intrinsic name.
A convention used by some linear representations of an RDF graph to allow several statements to reference the same unidentified resource is to use a blank node identifier, which is a local identifier that can be distinguished from all URIs and literals. When graphs are merged, their blank nodes must be kept distinct if meaning is to be preserved; this may call for re-allocation of blank node identifiers. Note that such blank node identifiers are not part of the RDF abstract syntax, and the representation of triples containing blank nodes is entirely dependent on the particular concrete syntax used.
Datatypes are used by RDF in the representation of values such as integers, floating point numbers and dates.
A datatype consists of a lexical space, a value space and a lexical-to-value mapping, see section 5.
For example, the lexical-to-value mapping for the XML Schema datatype xsd:boolean, where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:
Value Space | {T, F} |
---|---|
Lexical Space | {"0", "1", "true", "false"} |
Lexical-to-Value Mapping | {<"true", T>, <"1", T>, <"0", F>, <"false", F>} |
RDF predefines just one datatype rdf:XMLLiteral, used for embedding XML in RDF (see section 5.1).
There is no built-in concept of numbers or dates or other common values. Rather, RDF defers to datatypes that are defined separately, and identified with URI references. The predefined XML Schema datatypes [XML-SCHEMA2] are expected to be widely used for this purpose.
RDF provides no mechanism for defining new datatypes. XML Schema Datatypes [XML-SCHEMA2] provides an extensibility framework suitable for defining new datatypes for use in RDF.
Literals are used to identify values such as numbers and dates by means of a lexical representation. Anything represented by a literal could also be represented by a URI, but it is often more convenient or intuitive to use literals.
A literal may be the object of an RDF statement, but not the subject or the predicate.
Literals may be plain or typed :
Continuing the example from section 3.3, the typed literals that can be defined using the XML Schema datatype xsd:boolean are:
Typed Literal | Lexical-to-Value Mapping | Value |
---|---|---|
<xsd:boolean, "true"> | <"true", T> | T |
<xsd:boolean, "1"> | <"1", T> | T |
<xsd:boolean, "false"> | <"false", F> | F |
<xsd:boolean, "0"> | <"0", F> | F |
For text that may contain
markup, use typed literals
with type rdf:XMLLiteral.
If language annotation is required,
it must be explicitly included as markup, usually by means of an
xml:lang
attribute.
[XHTML] may be included within RDF
in this way. Sometimes, in this latter case,
an additional span
or div
element is needed to carry an
xml:lang
or lang
attribute.
The string in both plain and typed literals is recommended to be in Unicode Normal Form C [NFC]. This is motivated by [CHARMOD] particularly section 4 Early Uniform Normalization.
Some simple facts indicate a relationship between two things. Such a fact may be represented as an RDF triple in which the predicate names the relationship, and the subject and object denote the two things. A familiar representation of such a fact might be as a row in a table in a relational database. The table has two columns, corresponding to the subject and the object of the RDF triple. The name of the table corresponds to the predicate of the RDF triple. A further familiar representation may be as a two place predicate in first order logic.
Relational databases permit a table to have an arbitrary number of columns, a row of which expresses information corresponding to a predicate in first order logic with an arbitrary number of places. Such a row, or predicate, has to be decomposed for representation as RDF triples. A simple form of decomposition introduces a new blank node, corresponding to the row, and a new triple is introduced for each cell in the row. The subject of each triple is the new blank node, the predicate corresponds to the column name, and object corresponds to the value in the cell. The new blank node may also have an rdf:type property whose value corresponds to the table name.
As an example, consider Figure 6 from the [RDF-PRIMER]:
This information might correspond to a row in a table "STAFFADDRESSES", with a primary key STAFFID, and additional columns STREET, STATE, CITY and POSTALCODE.
Thus, a more complex fact is expressed in RDF using a conjunction (logical-AND) of simple binary relationships. RDF does not provide means to express negation (NOT) or disjunction (OR).
Through its use of extensible URI-based vocabularies, RDF provides for expression of facts about arbitrary subjects; i.e. assertions of named properties about specific named things. A URI can be constructed for any thing that can be named, so RDF facts can be about any such things.
The ideas on meaning and inference in RDF are underpinned by the formal concept of entailment, as discussed in the RDF semantics document [RDF-SEMANTICS]. In brief, an RDF expression A is said to entail another RDF expression B if every possible arrangement of things in the world that makes A true also makes B true. On this basis, if the truth of A is presumed or demonstrated then the truth of B can be inferred .
RDF uses URI references to identify resources and properties. Certain URI references are given specific meaning by RDF. Specifically, URI references with the following leading substring are defined by the RDF specifications:
Used with the RDF/XML serialization, this URI prefix string corresponds to XML namespace names [XML-NS] associated with the RDF vocabulary terms.
Note: this namespace name is the same as that used in the earlier RDF recommendation [RDF-MS].
Vocabulary terms in the rdf: namespace are listed in section 5.1 of the RDF syntax specification [RDF-SYNTAX]. Some of these terms are defined by the RDF specifications to denote specific concepts. Others have syntactic purpose (e.g. rdf:ID is part of the RDF/XML syntax).
The datatype abstraction used in RDF is compatible with the abstraction used in XML Schema Part 2: Datatypes [XML-SCHEMA2].
A datatype consists of a lexical space, a value space and a lexical-to-value mapping.
The lexical space of a datatype is a set of Unicode [UNICODE] strings.
The lexical-to-value mapping of a datatype is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:
A datatype is identified by one or more URI references.
RDF may be used with any datatype definition that conforms to this abstraction, even if not defined in terms of XML Schema.
Certain XML Schema built-in datatypes are not suitable for use within RDF. For example, the QName datatype requires a namespace declaration to be in scope during the mapping, and is not recommended for use in RDF. [RDF-SEMANTICS] contains a more detailed discussion of specific XML Schema built-in datatypes.
Note: When the datatype is defined using XML Schema:
RDF provides for XML content as a possible literal value. This typically originates from the use of rdf:parseType="Literal" in the RDF/XML Syntax [RDF-SYNTAX].
Such content is indicated in an RDF graph using a typed literal whose datatype is a special built-in datatype rdf:XMLLiteral, defined as follows.
Note: Not all values of this datatype are compliant with XML 1.1 [XML 1.1]. If compliance with XML 1.1 is desired, then only those values that are fully normalized according to XML 1.1 should be used.
Note: XML values can be thought of as the [XML-INFOSET] or the [XPATH] nodeset corresponding to the lexical form, with an appropriate equality function.
Note: RDF applications may use additional equivalence relations, such as
that which relates an
xsd:string
with an rdf:XMLLiteral
corresponding to
a single text node of the same string.
This section defines the RDF abstract syntax. The RDF abstract syntax is a set of triples, called the RDF graph.
This section also defines equivalence between RDF graphs. A definition of equivalence is needed to support the RDF Test Cases [RDF-TESTS] specification.
Implementation Note: This abstract syntax is the syntax over which the formal semantics are defined. Implementations are free to represent RDF graphs in any other equivalent form. As an example: in an RDF graph, literals with datatype rdf:XMLLiteral can be represented in a non-canonical format, and canonicalization performed during the comparison between two such literals. In this example the comparisons may be being performed either between syntactic structures or between their denotations in the domain of discourse. Implementations that do not require any such comparisons can hence be optimized.
An RDF triple contains three components:
An RDF triple is conventionally written in the order subject, predicate, object.
The predicate is also known as the property of the triple.
An RDF graph is a set of RDF triples.
The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph.
Two RDF graphs G and G' are equivalent if there is a bijection M between the sets of nodes of the two graphs, such that:
With this definition, M shows how each blank node in G can be replaced with a new blank node to give G'.
A URI reference within an RDF graph (an RDF URI reference) is a Unicode string [UNICODE] that:
The encoding consists of:
The disallowed octets that must be %-escaped include all those that do not correspond to US-ASCII characters, and the excluded characters listed in Section 2.4 of [URI], except for the number sign (#), percent sign (%), and the square bracket characters re-allowed in [RFC-2732].
Disallowed octets must be escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the 2-digit hexadecimal numeral corresponding to the octet value).
Two RDF URI references are equal if and only if they compare as equal, character by character, as Unicode strings.
Note: RDF URI references are compatible with the anyURI datatype as defined by XML schema datatypes [XML-SCHEMA2], constrained to be an absolute rather than a relative URI reference.
Note: RDF URI references are compatible with International Resource Identifiers as defined by [XML Namespaces 1.1].
Note: this section anticipates an RFC on Internationalized Resource Identifiers. Implementations may issue warnings concerning the use of RDF URI References that do not conform with [IRI draft] or its successors.
Note: The restriction to absolute URI references is found in this abstract syntax. When there is a well-defined base URI, concrete syntaxes, such as RDF/XML, may permit relative URIs as a shorthand for such absolute URI references.
Note: Because of the risk of confusion between RDF URI references that would be equivalent if derefenced, the use of %-escaped characters in RDF URI references is strongly discouraged. See also the URI equivalence issue of the Technical Architecture Group [TAG].
A literal in an RDF graph contains one or two named components.
All literals have a lexical form being a Unicode [UNICODE] string, which SHOULD be in Normal Form C [NFC].
Plain literals have a lexical form and optionally a language tag as defined by [RFC-3066], normalized to lowercase.
Typed literals have a lexical form and a datatype URI being an RDF URI reference.
Note: Literals in which the lexical form begins with a composing character (as defined by [CHARMOD]) are allowed however they may cause interoperability problems, particularly with XML version 1.1 [XML 1.1].
Note: When using the language tag, care must be taken not to confuse language with locale. The language tag relates only to human language text. Presentational issues should be addressed in end-user applications.
Note: The case normalization of language tags is part of the description of the abstract syntax, and consequently the abstract behaviour of RDF applications. It does not constrain an RDF implementation to actually normalize the case. Crucially, the result of comparing two language tags should not be sensitive to the case of the original input.
Two literals are equal if and only if all of the following hold:
Note: RDF Literals are distinct and distinguishable from RDF URI references; e.g. http://example.org as an RDF Literal (untyped, without a language tag) is not equal to http://example.org as an RDF URI reference.
The datatype URI refers to a datatype. For XML Schema built-in datatypes, URIs such as http://www.w3.org/2001/XMLSchema#int are used. The URI of the datatype rdf:XMLLiteral may be used. There may be other, implementation dependent, mechanisms by which URIs refer to datatypes.
The value associated with a typed literal is found by applying the lexical-to-value mapping associated with the datatype URI to the lexical form.
If the lexical form is not in the lexical space of the datatype associated with the datatype URI, then no literal value can be associated with the typed literal. Such a case, while in error, is not syntactically ill-formed.
Note: In application contexts, comparing the values of typed literals (see section 6.5.2) is usually more helpful than comparing their syntactic forms (see section 6.5.1). Similarly, for comparing RDF Graphs, semantic notions of entailment (see [RDF-SEMANTICS]) are usually more helpful than syntactic equality (see section 6.3).
The blank nodes in an RDF graph are drawn from an infinite set. This set of blank nodes, the set of all RDF URI references and the set of all literals are pairwise disjoint.
Otherwise, this set of blank nodes is arbitrary.
RDF makes no reference to any internal structure of blank nodes. Given two blank nodes, it is possible to determine whether or not they are the same.
RDF uses an RDF URI Reference, which may include a fragment identifier, as a context free identifier for a resource. RFC 2396 [URI] states that the meaning of a fragment identifier depends on the MIME content-type of a document, i.e. is context dependent.
These apparently conflicting views are reconciled by considering that a URI reference in an RDF graph is treated with respect to the MIME type application/rdf+xml [RDF-MIME-TYPE]. Given an RDF URI reference consisting of an absolute URI and a fragment identifier, the fragment identifer identifies the same thing that it does in an application/rdf+xml representation of the resource identified by the absolute URI component. Thus:
This provides a handling of URI references and their denotation that is consistent with the RDF model theory and usage, and also with conventional Web behavior. Note that nothing here requires that an RDF application be able to retrieve any representation of resources identified by the URIs in an RDF graph.
This document contains a significant contribution from Pat Hayes, Sergey Melnik and Patrick Stickler, under whose leadership was developed the framework described in the RDF family of specifications for representing datatyped values, such as integers and dates.
The editors acknowledge valuable contributions from the following: Frank Manola, Pat Hayes, Dan Brickley, Jos de Roo, Dave Beckett, Patrick Stickler, Peter F. Patel-Schneider, Jerome Euzenat, Massimo Marchiori, Tim Berners-Lee, Dave Reynolds and Dan Connolly.
Jeremy Carroll thanks Oreste Signore, his host at the W3C Office in Italy and Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo", part of the Consiglio Nazionale delle Ricerche, where Jeremy is a visiting researcher.
This document is a product of extended deliberations by the RDFcore Working Group, whose members have included: Art Barstow (W3C), Dave Beckett (ILRT), Dan Brickley (ILRT), Dan Connolly (W3C), Jeremy Carroll (Hewlett Packard), Ron Daniel (Interwoven Inc), Bill dehOra (InterX), Jos De Roo (AGFA), Jan Grant (ILRT), Graham Klyne (Nine by Nine), Frank Manola (MITRE Corporation), Brian McBride (Hewlett Packard), Eric Miller (W3C), Stephen Petschulat (IBM), Patrick Stickler (Nokia), Aaron Swartz (HWG), Mike Dean (BBN Technologies / Verizon), R. V. Guha (Alpiri Inc), Pat Hayes (IHMC), Sergey Melnik (Stanford University) and Martyn Horner (Profium Ltd).
This specification also draws upon an earlier RDF Model and Syntax document edited by Ora Lassilla and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha. RDF and RDF Schema Working Group members who contributed to this earlier work are: Nick Arnett (Verity), Tim Berners-Lee (W3C), Tim Bray (Textuality), Dan Brickley (ILRT / University of Bristol), Walter Chang (Adobe), Sailesh Chutani (Oracle), Dan Connolly (W3C), Ron Daniel (DATAFUSION), Charles Frankston (Microsoft), Patrick Gannon (CommerceNet), R. V. Guha (Epinions, previously of Netscape Communications), Tom Hill (Apple Computer), Arthur van Hoff (Marimba), Renato Iannella (DSTC), Sandeep Jain (Oracle), Kevin Jones, (InterMind), Emiko Kezuka (Digital Vision Laboratories), Joe Lapp (webMethods Inc.), Ora Lassila (Nokia Research Center), Andrew Layman (Microsoft), Ralph LeVan (OCLC), John McCarthy (Lawrence Berkeley National Laboratory), Chris McConnell (Microsoft), Murray Maloney (Grif), Michael Mealling (Network Solutions), Norbert Mikula (DataChannel), Eric Miller (OCLC), Jim Miller (W3C, emeritus), Frank Olken (Lawrence Berkeley National Laboratory), Jean Paoli (Microsoft), Sri Raghavan (Digital/Compaq), Lisa Rein (webMethods Inc.), Paul Resnick (University of Michigan), Bill Roberts (KnowledgeCite), i Tsuyoshi Sakata (Digital Vision Laboratories), Bob Schloss (IBM), Leon Shklar (Pencom Web Works), David Singer (IBM), Wei (William) Song (SISU), Neel Sundaresan (IBM), Ralph Swick (W3C), Naohiko Uramoto (IBM), Charles Wicksteed (Reuters Ltd.), Misha Wolf (Reuters Ltd.) and Lauren Wood (SoftQuad).
There were no substantive changes.
The following editorial changes have been made: