Copyright © 2006-2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
The last couple of years have witnessed a fascinating evolution: while the Web was initially built predominantly for human consumption, web content is increasingly consumed by machines which expect some amount of structured data. Sites have started to identify a page’s title, content type, and preview image to provide appropriate information in a user’s newsfeed when she clicks the “Like” button. Search engines have started to provide richer search results by extracting fine-grained structured details from the Web pages they crawl. In turn, web publishers are producing increasing amounts of structured data within their Web content to improve their standing with search engines.
A key enabling technology behind these developments is the ability to add structured data directly to HTML pages. RDFa (Resource Description Framework in Attributes) provides exactly that: a set of markup attributes that augment the visual information on the Web with machine-readable hints. In this Primer, we show how to express data using RDFa in HTML, and in particular how to mark up existing human-readable Web page content to express machine-readable data.
This document provides only a Primer to RDFa. The complete specification of RDFa, with further examples, can be found in the RDFa 1.1 Core specification [RDFA-CORE] and in the XHTML+RDFa 1.1 specification [XHTML-RDFA].
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document was previously published as a W3C Note by the RDFa Task Force. The responsibility of updating the document was then handed over to the RDFa Working Group in 2010. The RDFa Working Group's charter was expanded in March 2011 to include work on the RDF API in addition to continuing work on the RDFa specifications. Due to the expansion in scope, the RDFa Working Group was renamed to the RDF Web Applications Working Group.
This document was published by the RDF Web Applications Working Group as a Working Draft. This document is intended to become a W3C Note. If you wish to make comments regarding this document, please send them to public-rdfa-wg@w3.org. All feedback is welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The web is a rich, distributed repository of interconnected information. Until recently, it was organized primarily for human consumption. On a typical web page, an HTML author might specify a headline, then a smaller sub-headline, a block of italicized text, a few paragraphs of average-size text, and, finally, a few single-word links. Web browsers will follow these presentation instructions faithfully. However, only a human reader understands that the headline is a blog post title, that the sub-headline names the author, that the italicized text is the article's publication date, and that the single-word links are subject categories. Computers cannot tell these pieces of information apart, so the gap between what programs and humans understand is large.
Figure 1: On the left, what browsers see. On the right, what humans see. Can we bridge the gap so that browsers see more of what we see?
What if the browser, or any machine consumer such as a Web crawler, received information on the meaning of a web page’s visual elements? A dinner party announced on a blog could be easily copied to the user’s calendar, an author’s complete contact information to the user’s address book. Users could automatically recall previously browsed articles according to categorization labels (i.e., tags). A photo copied and pasted from a web site to a school report would carry with it a link back to the photographer, giving her proper credit. A link shared by a user to his social network contacts would automatically carry additional data pulled from the original web page: a thumbnail, an author, and a specific title. When web data meant for humans is augmented with hints meant for computer programs, these programs become significantly more helpful, because they begin to understand the data's structure.
RDFa allows HTML authors to do just that. Using a few simple HTML attributes, authors can mark up human-readable data with machine-readable indicators for browsers and other programs to interpret. A web page can include markup for items as simple as the title of an article, or as complex as a user's complete social network.
RDFa benefits from the power of RDF [RDF-PRIMER], the W3C’s standard for interoperable machine-readable data. However, readers of this document are not expected to understand RDF. Readers are expected to have a basic understanding of HTML.
Historically, RDFa 1.0 [RDFA-SYNTAX] was specified only for XHTML. RDFa 1.1 [RDFA-CORE] is the newest version and the one used in this document. RDFa 1.1 is specified for both XHTML [XHTML-RDFA] and HTML [HTML-RDFA]. RDFa also works in a variety of other XML-based languages like SVG [SVG12], ePub and the Open Document Format. While this document uses XHTML in all of the examples, RDFa 1.1 works well in HTML4 and HTML5. For simplicity, we use the term “HTML” throughout this document to refer to all of the HTML-family languages.
RDFa is based on attributes. While some HTML attributes (e.g., href, rel) have been re-used, some of the RDFa 1.1 attributes are new. This matters because some HTML or XHTML validators may not validate the code properly until they are updated to recognize the new RDFa 1.1 attributes. This is rarely a problem in practice, since browsers simply ignore attributes that they do not recognize. None of the RDFa-specific attributes has any effect on the visual display of the HTML content, so authors do not have to worry that pages marked up with RDFa will look any different to a human reader from pages without RDFa.
Consider Alice, a blogger who publishes a mix of professional and personal articles at http://example.com/alice. We will construct markup examples to illustrate how Alice can use RDFa. The complete markup of these examples is available on a dedicated page.
RDFa 1.1 offers a number of additional techniques that can significantly simplify markup. Authors are not required to use them, because the methods used in previous sections cover most markup requirements. However, when the RDFa content becomes more complicated, these techniques can help authors keep their markup simple.
In a number of simple use cases, such as the Open Graph Protocol, HTML authors will only use a single vocabulary. Rather than forcing the author to repeat the og: prefix on every piece of metadata, RDFa 1.1 introduces the vocab attribute, which lets the author declare a single vocabulary for a chunk of HTML. Thus, instead of:
<html prefix="og: http://ogp.me/ns#">
  <head>
    <title>The Trouble with Bob</title>
    <meta property="og:title" content="The Trouble with Bob" />
    <meta property="og:type" content="text" />
    <meta property="og:image" content="http://example.com/alice/bob-ugly.jpg" />
    ...
  </head>
  ...
Alice can write:
<html>
  <head vocab="http://ogp.me/ns#">
    <title>The Trouble with Bob</title>
    <meta property="title" content="The Trouble with Bob" />
    <meta property="type" content="text" />
    <meta property="image" content="http://example.com/alice/bob-ugly.jpg" />
    ...
  </head>
  ...
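Both versions should yield the same property URIs. The expansion can be sketched in Python, assuming only the two cases shown above: a value whose prefix has been declared is a prefixed name, and a bare term is appended to the vocab URI. The function name expand is hypothetical, and a real RDFa 1.1 processor applies further rules (term mappings, profiles, safe CURIEs) that this sketch ignores.

```python
from typing import Mapping, Optional


def expand(value: str, prefixes: Mapping[str, str], vocab: Optional[str] = None) -> str:
    """Expand an RDFa property value to a full URI (simplified sketch)."""
    if ":" in value:
        prefix, _, reference = value.partition(":")
        if prefix in prefixes:
            # Prefixed name such as og:title: namespace URI + local part.
            return prefixes[prefix] + reference
        # Unknown prefix: treat the value as an absolute URI (e.g. http://...).
        return value
    if vocab is not None:
        # Bare term such as title: resolved against the in-scope vocab URI.
        return vocab + value
    # A real processor would consult term mappings here or ignore the value.
    raise ValueError(f"cannot resolve term {value!r} without a vocab")


OGP = "http://ogp.me/ns#"

# First version: prefix="og: http://ogp.me/ns#" with property="og:title".
print(expand("og:title", {"og": OGP}))   # http://ogp.me/ns#title
# Second version: vocab="http://ogp.me/ns#" with property="title".
print(expand("title", {}, vocab=OGP))    # http://ogp.me/ns#title
```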
Of course, it’s still a good idea to mark up content in the HTML body and to reuse existing rendered text and links. The vocab attribute works in both the head and the body of an HTML document, whether or not the content attribute is used.
As Alice continues to mark up her page with more RDFa, she may use more and more terms from vocabularies like Dublin Core [DC11] and FOAF [FOAF]. She would like to avoid prefixes and have each short name she uses map to the appropriate vocabulary, but the vocab attribute was not designed to work with multiple vocabularies. To support both authoring simplicity and the use of terms from several vocabularies, RDFa 1.1 introduces the idea of bundling vocabularies into a single profile. This technique makes it particularly easy for HTML authors to combine multiple vocabularies using simple short-hand names.
To implement this feature, publishers can publish an RDFa profile document, which maps short names like name to full URIs like http://xmlns.com/foaf/0.1/name. The specifics of how to prepare such a profile document are out of scope for this simple introduction. However, the markup for authors who wish to use a profile is quite simple:
<div profile="http://example.org/profiles/alice">
  <span property="title">The trouble with Bob</span>
  <span property="name">Alice</span>
  ...
</div>
The markup above relies on Alice's profile to define short-hand meanings for both "title" and "name", which map to terms in the Dublin Core and FOAF vocabularies, respectively.
Profile documents can also define prefixes. That is, the author can use a profile that contains a number of prefix mappings; these mappings are equivalent to a set of prefix declarations in the blog post itself. Consider the following markup:
<div prefix="og: http://ogp.me/ns# dc: http://purl.org/dc/terms/">
  <h2 property="og:title">The trouble with Bob</h2>
  <h3 property="dc:creator">Alice</h3>
  ...
</div>
The author could replace the prefix attribute with a profile:
<div profile="http://example.org/profiles/prefixes">
  <h2 property="og:title">The trouble with Bob</h2>
  <h3 property="dc:creator">Alice</h3>
  ...
</div>
The example above assumes that the profile http://example.org/profiles/prefixes contains prefix definitions for og: and dc:. Replacing two prefix definitions with one profile reference does not greatly reduce the markup, but in more complex cases an author may need many more prefix definitions, and a profile then becomes very useful.
Profiles provide an easy way to bundle prefix and term declarations in one place. However, authors still have to refer to those profiles explicitly in their documents. Some vocabularies and their prefixes are very widely used by the Web community; the Dublin Core vocabulary is a good example. The prefixes for these common vocabularies are declared over and over again, and Web page authors sometimes forget to declare them at all.
To alleviate this issue, RDFa 1.1 defines the concept of default profiles. These profiles, maintained by the W3C, are always implicitly referred to by any RDFa 1.1 content. That is, the RDFa processor will automatically load these profiles first, for every page that is processed. Profile and prefix declarations in a document always override declarations made in a default profile, but if a web page author forgets to declare a common vocabulary such as Dublin Core or FOAF, the RDFa Processor will fall back to the declaration in the default profile.
In HTML, there are two default profiles: http://www.w3.org/profile/rdfa-1.1 and http://www.w3.org/profile/html-rdfa-1.1. Authors can consult each default profile to find out which prefixes and terms are included automatically.
Default profiles are a mechanism to correct RDFa documents whose authors accidentally forgot to declare common prefixes. While authors may rely on these default profiles being available for RDFa 1.1 documents, the prefixes they define may change over the course of 5 to 10 years. The best way to ensure that the prefixes in a document always map to what the author intended is to declare them with the prefix attribute. Do not depend on the default profile to declare common prefixes.
For example, the following markup does not declare the dc: prefix using either the prefix or the profile attribute:
<div>
  <h2 property="dc:date">2011-03-19</h2>
  <h3 property="dc:creator">Alice</h3>
  ...
</div>
However, an RDFa processor will still recognize the dc:date and dc:creator short-hand names and expand them to the corresponding URIs. It can do this because the dc prefix is part of the default http://www.w3.org/profile/rdfa-1.1 profile.
Since default profiles are meant to be a last-resort mechanism to help novice document authors, the markup above is not recommended. The rest of this document will utilize authoring best practices by declaring all prefixes in order to make the document author's intentions explicit.
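The precedence rule described in this section (declarations in the document override the default profile, which serves only as a fallback) maps naturally onto Python's collections.ChainMap. This is a sketch: the two default-profile entries below are assumed for illustration, and the authoritative list is the profile document itself.

```python
from collections import ChainMap

# Assumed subset of the default profile's prefix mappings, for illustration
# only; the authoritative list is http://www.w3.org/profile/rdfa-1.1 itself.
DEFAULT_PROFILE = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def resolve_curie(curie: str, document_prefixes: dict) -> str:
    """Expand a CURIE, preferring prefixes declared in the document."""
    # ChainMap searches its mappings left to right, so the document's own
    # declarations shadow the default profile, which is only a fallback.
    prefixes = ChainMap(document_prefixes, DEFAULT_PROFILE)
    prefix, _, reference = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix {prefix!r}")
    return prefixes[prefix] + reference


# No prefix attribute: dc: comes from the (assumed) default profile.
print(resolve_curie("dc:creator", {}))
# An explicit declaration in the document always wins.
print(resolve_curie("dc:creator", {"dc": "http://purl.org/dc/elements/1.1/"}))
```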
Alice would also like to make information about herself, such as her email address, phone number, and other details, easily available to her friends’ contact management software. This time, instead of describing the properties of a web page, she’s going to describe the properties of a person: herself. To do this, she adds deeper structure, so that she can connect multiple items that themselves have properties.
Alice already has contact information displayed on her blog.
<div>
  <p>Alice Birpemswick</p>
  <p>Email: <a href="mailto:alice@example.com">alice@example.com</a></p>
  <p>Phone: <a href="tel:+1-617-555-7332">+1 617.555.7332</a></p>
</div>
The Dublin Core vocabulary does not provide property names for describing contact information, but the Friend-of-a-Friend [FOAF] vocabulary does. In RDFa, it is common and easy to combine different vocabularies in a single page. Alice declares a prefix for the FOAF vocabulary and declares a foaf:Person. For this purpose, Alice uses typeof, an RDFa attribute that is specifically meant to declare a new data item with a certain type:
<div prefix="foaf: http://xmlns.com/foaf/0.1/" typeof="foaf:Person">
...
Alice realizes that she only intends to use the FOAF vocabulary at this point, so she uses the vocab attribute to further simplify her markup.
<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person">
...
Then, Alice indicates which content on the page represents her full name, email address, and phone number:
<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person">
  <p property="name">Alice Birpemswick</p>
  <p>Email: <a rel="mbox" href="mailto:alice@example.com">alice@example.com</a></p>
  <p>Phone: <a rel="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a></p>
</div>
Note that Alice did not specify an about attribute, as she did when adding blog entry metadata. If she does not declare what she is talking about, how does the RDFa Processor know what she is describing? In RDFa, the typeof attribute on the enclosing div implicitly sets the subject of the properties marked up within that div. That is, the name, email address, and phone number are associated with a new node of type foaf:Person. This node has no URI to identify it, so it is called a blank node, as shown in the figure:
Figure 6: A Blank Node: blank nodes are not identified by URI. Instead, many of them have an RDFa typeof attribute that identifies the type of data they represent. This approach (providing no name but adding a type) is particularly useful when listing a number of items on a page that have no permanent URL, e.g., calendar events, the authors of an article, or friends on a social network.
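To make the blank node concrete, here is a toy Python extractor built on the standard html.parser module. It is a sketch, not a conforming RDFa 1.1 processor: it understands only vocab, typeof, property and rel with href, and the class name ToyRDFaParser and the blank node label _:b1 are hypothetical. Run on Alice's contact markup, it prints one N-Triples-style line per statement; the type is recorded with the core RDF property rdf:type.

```python
from html.parser import HTMLParser

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class ToyRDFaParser(HTMLParser):
    """Toy extractor for vocab, typeof, property and rel/href only.

    Not a conforming RDFa 1.1 processor: about, prefix, resource, content,
    datatype, incomplete triples and base-URI resolution are all ignored,
    and a vocab is assumed to be in scope wherever terms are used.
    """

    def __init__(self):
        super().__init__()
        self.triples = []  # (subject, predicate, object) in N-Triples form
        self.stack = []    # one entry per open element
        self.bnode_count = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        parent = self.stack[-1] if self.stack else {"vocab": None, "subject": "<>"}
        vocab = a.get("vocab", parent["vocab"])
        subject = parent["subject"]
        if "typeof" in a:
            # typeof without about: this element describes a new blank node.
            self.bnode_count += 1
            subject = f"_:b{self.bnode_count}"
            self.triples.append((subject, RDF_TYPE, f"<{vocab}{a['typeof']}>"))
        child_subject = subject
        if "rel" in a and "href" in a:
            # rel links the current subject to the URI in href, which then
            # becomes the subject for any nested markup.
            target = f"<{a['href']}>"
            self.triples.append((subject, f"<{vocab}{a['rel']}>", target))
            child_subject = target
        if tag in VOID_TAGS:
            return  # void elements have no content and no end tag
        self.stack.append({
            "vocab": vocab,
            "subject": child_subject,
            "literal_subject": subject,
            "property": f"<{vocab}{a['property']}>" if "property" in a else None,
            "text": [],
        })

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries property.
        for entry in self.stack:
            if entry["property"]:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        entry = self.stack.pop()
        if entry["property"]:
            text = "".join(entry["text"]).replace("\\", "\\\\").replace('"', '\\"')
            self.triples.append((entry["literal_subject"], entry["property"], f'"{text}"'))


ALICE = """
<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person">
  <p property="name">Alice Birpemswick</p>
  <p>Email: <a rel="mbox" href="mailto:alice@example.com">alice@example.com</a></p>
  <p>Phone: <a rel="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a></p>
</div>
"""

parser = ToyRDFaParser()
parser.feed(ALICE)
parser.close()
for s, p, o in parser.triples:
    print(s, p, o, ".")
```

All four statements share the subject _:b1, which is exactly the unnamed, typed node drawn in the figure.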
RDF, the Resource Description Framework, is the abstract data representation we have drawn as graphs in the examples above. Each arrow in the graph is represented as a subject-property-object triple: the subject is the node at the start of the arrow, the property is the arrow itself, and the object is the node or literal at the end of the arrow. A set of such RDF triples is often called an “RDF graph”, and it is typically stored in a “Triple Store” or “Graph Store”.
Consider the first example graph:
The two RDF triples for this graph are written, using the Turtle syntax [TURTLE], as follows:
<http://www.example.com/alice/posts/trouble_with_bob>
    <http://purl.org/dc/elements/1.1/title> "The Trouble with Bob" ;
    <http://purl.org/dc/elements/1.1/creator> "Alice" .
Also, the TYPE arrows we drew are no different from other arrows. TYPE is just another property, one that happens to be a core RDF property: rdf:type. The rdf vocabulary is located at http://www.w3.org/1999/02/22-rdf-syntax-ns#. The contact information example from above should thus be diagrammed as:
The point of RDF is to provide a universal language for expressing data. A unit of data can have any number of properties that are expressed as URIs. These URIs can be reused by any publisher, much like any web publisher can link to any web page, even ones they did not create themselves. Given data, in the form of RDF triples, collected from various locations, and using the RDF query language SPARQL [RDF-SPARQL-QUERY], one can search for “friends of Alice’s who created items whose title contains the word ‘Bob’,” whether those items are blog posts, videos, calendar events, or other data types.
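To see what such a query does, here is a hand-written Python equivalent over a small in-memory list of triples. It is a sketch rather than SPARQL itself: the data, the ex: names and the function names are hypothetical, and the triples use prefixed names instead of full URIs for readability. A SPARQL engine evaluates the same pattern (Alice knows ?friend, ?item has creator ?friend, ?item has a title containing "Bob") as a join, which the nested loops below spell out.

```python
# Hypothetical sample data; prefixed names stand in for full URIs.
TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:post1", "dc:creator", "ex:bob"),
    ("ex:post1", "dc:title", "Bob's Reply"),
    ("ex:video7", "dc:creator", "ex:carol"),
    ("ex:video7", "dc:title", "Cooking with Carol"),
    ("ex:event3", "dc:creator", "ex:carol"),
    ("ex:event3", "dc:title", "Bob's birthday party"),
]


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is in the graph."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def friends_items_mentioning(person, word):
    # Roughly the SPARQL 1.1 pattern (PREFIX declarations omitted):
    #   SELECT ?friend ?item WHERE {
    #     ex:alice foaf:knows ?friend .
    #     ?item dc:creator ?friend ; dc:title ?title .
    #     FILTER(CONTAINS(?title, "Bob"))
    #   }
    results = []
    for friend in objects(person, "foaf:knows"):
        for item in subjects("dc:creator", friend):
            if any(word in title for title in objects(item, "dc:title")):
                results.append((friend, item))
    return results


print(friends_items_mentioning("ex:alice", "Bob"))
# [('ex:bob', 'ex:post1'), ('ex:carol', 'ex:event3')]
```

The query never asks whether an item is a blog post, a video or an event; it only follows properties, which is why data collected from different publishers can be combined this way.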
RDF is an abstract data model meant to maximize the reuse of vocabularies. RDFa is a way to express RDF data within HTML, in a way that is machine-readable, and by reusing the existing human-readable data in the document.
As Alice marks up her page with RDFa, she may discover the need to express data, such as her favorite photos, that is not covered by existing vocabularies. If she needs to, Alice can create a custom vocabulary suited for her needs. Once a vocabulary is created, it can be used in RDFa markup like any other vocabulary.
The instructions on how to create a vocabulary, also known as an RDF Schema, are available in Section 5 of the RDF Primer [RDF-SCHEMA]. At a high level, the creation of a vocabulary for RDFa involves:
1. Choosing a URI for the vocabulary, for example http://example.com/photos/vocab#.
2. Publishing a vocabulary document at that URI that defines the classes and properties of the vocabulary, for example the classes Photo and Camera, as well as the property takenWith that relates a photo to the camera with which it was taken.
3. Using the vocabulary in RDFa markup, for example with prefix="photo: http://example.com/photos/vocab#" and typeof="photo:Camera".
It is worth noting that anyone who can publish a document on the Web can publish a vocabulary and thus define new data fields they may wish to express. RDF and RDFa allow fully distributed extensibility of vocabularies.
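The vocabulary document for the photo example might look like the Turtle produced by the following Python sketch. Assumptions: Turtle is chosen as the serialization, and the rdfs:domain and rdfs:range statements are read off the description "relates a photo to the camera with which it was taken"; a real vocabulary would usually also carry labels and comments. The function name vocabulary_turtle is hypothetical.

```python
VOCAB_URI = "http://example.com/photos/vocab#"

# Terms named in the steps above; domain/range follow "relates a photo
# to the camera with which it was taken".
CLASSES = ["Photo", "Camera"]
PROPERTIES = {"takenWith": ("Photo", "Camera")}  # name -> (domain, range)


def vocabulary_turtle(base: str) -> str:
    """Serialize the vocabulary as a minimal RDF Schema in Turtle."""
    lines = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix photo: <{base}> .",
        "",
    ]
    for name in CLASSES:
        # "a" is Turtle shorthand for rdf:type.
        lines.append(f"photo:{name} a rdfs:Class .")
    for name, (domain, range_) in PROPERTIES.items():
        lines.append(f"photo:{name} a rdf:Property ;")
        lines.append(f"    rdfs:domain photo:{domain} ;")
        lines.append(f"    rdfs:range photo:{range_} .")
    return "\n".join(lines)


print(vocabulary_turtle(VOCAB_URI))
```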
There is a wide variety of tools that can be used to generate or process RDFa data. Good sources for these are the RDFa page of the W3C Semantic Web Wiki, or the RDFa Wiki’s implementation page. The RDFa Wiki also contains further examples and information on how to get involved.
This section contains a set of more advanced RDFa examples. They are provided to help the reader understand a few more RDFa usage patterns. Many of these examples describe not only how to encode data into RDF but also what an application might try to do with the data. Note that the implementation of those examples may require programmatic access to the RDFa content. Programmatic access to the data is provided via the RDFa API [RDFA-API].
Amy has enriched her band’s web site to include event information. She uses the Google Rich Snippets vocabulary to mark up information that search engines can use when displaying enhanced search results. Amy also uses some JavaScript code that she found on the web, which uses the RDFa API [RDFA-API] to extract the event information from a page and add an entry to a personal calendar.
Brian finds Amy’s web site through Google and opens the band’s page. He decides that he wants to go to the next concert. Brian is able to add the details to his calendar by clicking on the link that is automatically generated by the JavaScript tool. The JavaScript extracts the RDFa from the web page using the RDFa API [RDFA-API], and places the event into Brian's personal calendaring software, Google Calendar.
<div prefix="v: http://rdf.data-vocabulary.org/#" typeof="v:Event">
  <a rel="v:url" href="http://amyandtheredfoxies.example.com/events"
     property="v:summary">Tour Info: Amy And The Red Foxies</a>
  <span rel="v:location">
    <a typeof="v:Organization" rel="v:url" href="http://www.kammgarn.de/"
       property="v:name">Kammgarn</a>
  </span>
  <div rel="v:photo"><img src="foxies.jpg"/></div>
  <span property="v:summary">Hey K-Town, Amy And The Red Foxies will rock Kammgarn in October.</span>
  When:
  <span property="v:startDate" content="2009-10-15T19:00">15. Oct., 7:00 pm</span>-
  <span property="v:endDate" content="2009-10-15T21:00">9:00 pm</span>
  Category: <span property="v:eventType">concert</span>
</div>
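The calendar step in this scenario can be sketched in Python once the event values have been extracted. The sketch assumes extraction has already produced a plain dictionary of v:Event values copied from the markup above (in the scenario, the JavaScript tool does this in the browser); the function names, the UID and the PRODID value are hypothetical. It writes a minimal iCalendar (RFC 5545) event from the machine-readable content values rather than the human-readable "15. Oct., 7:00 pm".

```python
from datetime import datetime, timezone

# Values copied by hand from the markup above; in the scenario they would
# come from the extraction step.
EVENT = {
    "summary": "Tour Info: Amy And The Red Foxies",
    "url": "http://amyandtheredfoxies.example.com/events",
    "location": "Kammgarn",
    "startDate": "2009-10-15T19:00",
    "endDate": "2009-10-15T21:00",
}


def ics_text(value: str) -> str:
    """Escape a TEXT value as RFC 5545 requires (backslash, ; , and newline)."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def to_icalendar(event: dict, uid: str, stamp: datetime) -> str:
    """Build a one-event iCalendar object (no line folding; lines are short)."""
    fmt = "%Y%m%dT%H%M%S"
    start = datetime.fromisoformat(event["startDate"])  # floating local time
    end = datetime.fromisoformat(event["endDate"])
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//RDFa event sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp.astimezone(timezone.utc).strftime(fmt)}Z",
        f"DTSTART:{start.strftime(fmt)}",
        f"DTEND:{end.strftime(fmt)}",
        f"SUMMARY:{ics_text(event['summary'])}",
        f"LOCATION:{ics_text(event['location'])}",
        f"URL:{event['url']}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"  # iCalendar uses CRLF line endings


print(to_icalendar(EVENT, "foxies-20091015@amyandtheredfoxies.example.com",
                   datetime(2009, 10, 1, tzinfo=timezone.utc)))
```

Using the content attribute values is what makes this reliable: the visible date text is meant for people and would be hard to parse.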
Dale has a site that contains a number of images showcasing his photography. He has already used RDFa to add licensing information about the images to his pages by following the instructions provided by Creative Commons. Dale would like to display the correct Creative Commons icons for each image so that people can quickly determine which license applies to each image. He writes a few lines of JavaScript code that use the RDFa API [RDFA-API] to extract the URI of the applied license from the web page, and then uses that URI to load the matching license icon from the Creative Commons website.
<div prefix="cc: http://creativecommons.org/ns#">
  <img src="http://dale.example.com/images/image1.png"
       rel="cc:license"
       resource="http://creativecommons.org/licenses/by/3.0/us/"/>
  <a rel="cc:attributionURL" href="http://dale.example.com"
     property="cc:attributionName">Dale</a>
</div>
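The icon lookup can be sketched in Python, once the license URI has been extracted. The image host and the /l/&lt;license&gt;/&lt;version&gt;/&lt;jurisdiction&gt;/88x31.png path pattern are assumptions about where Creative Commons serves its button images and should be checked before use; the function name license_icon_url is hypothetical.

```python
from urllib.parse import urlparse

# Assumed host for Creative Commons button images; verify before relying on it.
ICON_BASE = "https://i.creativecommons.org/l/"


def license_icon_url(license_uri: str, size: str = "88x31") -> str:
    """Map a CC license URI to the URL of its button image (sketch)."""
    parsed = urlparse(license_uri)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc != "creativecommons.org" or len(parts) < 3 or parts[0] != "licenses":
        raise ValueError(f"not a Creative Commons license URI: {license_uri}")
    # ["licenses", "by", "3.0", "us"] -> "by/3.0/us"
    return ICON_BASE + "/".join(parts[1:]) + f"/{size}.png"


print(license_icon_url("http://creativecommons.org/licenses/by/3.0/us/"))
# https://i.creativecommons.org/l/by/3.0/us/88x31.png
```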
Mary is responsible for keeping the projects section of her company’s home page up-to-date. She wants to display info-boxes that summarize details about the members associated with each project. The information should appear when hovering the mouse over the link to each member's homepage. Since each member’s homepage is annotated with RDFa, Mary writes a script that requests the page’s content and extracts necessary information via the RDFa API.
To identify the different interest areas unambiguously, Mary decides to use URIs rather than plain text. She chooses terms defined by DBpedia, a dataset extracted from Wikipedia and published as RDF, which is widely used on the Semantic Web to identify concepts in the human world. She also makes use of a special RDFa attribute, resource, which plays the same role as href but does not give the browser a clickable link. This allows her to also include a reference to the human-readable Wikipedia page for each interest: when both resource and href appear on the same element, resource takes precedence in RDFa, while href still sends the person viewing the page to the human-readable Wikipedia article that corresponds to the DBpedia entry.
Finally, Mary uses the RDFa API [RDFA-API] to extract this information from the HTML source in order to populate the info-boxes.
<div prefix="dc: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/"
     about="#me" typeof="foaf:Person">
  <span property="foaf:name" content="Bob">My</span> interests are:
  <ol about="#me" typeof="foaf:Person" rel="foaf:interest">
    <li><a resource="http://dbpedia.org/resource/Semantic_Web"
           href="http://en.wikipedia.org/wiki/Semantic_Web"
           property="dc:title">Semantic Web</a></li>
    <li><a resource="http://dbpedia.org/resource/Facebook"
           href="http://en.wikipedia.org/wiki/Facebook"
           property="dc:title">Facebook</a></li>
    <li><a resource="http://dbpedia.org/resource/Twitter"
           href="http://en.wikipedia.org/wiki/Twitter"
           property="dc:title">Twitter</a></li>
  </ol>
</div>
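The precedence rule for link targets can be sketched in Python: RDFa 1.1 takes the object URI from resource if present, otherwise from href, otherwise from src, while a browser only ever follows href. This sketch treats attribute values as plain URIs (RDFa 1.1 also allows CURIEs in resource), and the names object_uri and InterestLinks are hypothetical.

```python
from html.parser import HTMLParser
from typing import Mapping, Optional

# Attributes that can supply the object URI, highest priority first
# (resource, then href, then src, as in RDFa 1.1).
OBJECT_ATTRIBUTES = ("resource", "href", "src")


def object_uri(attrs: Mapping[str, str]) -> Optional[str]:
    """Return the URI RDFa would use as the link target, if any."""
    for name in OBJECT_ATTRIBUTES:
        if name in attrs:
            return attrs[name]
    return None


class InterestLinks(HTMLParser):
    """Collect (RDF object, link a browser follows) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            self.links.append((object_uri(a), a.get("href")))


MARKUP = """
<li><a resource="http://dbpedia.org/resource/Semantic_Web"
       href="http://en.wikipedia.org/wiki/Semantic_Web"
       property="dc:title">Semantic Web</a></li>
"""

collector = InterestLinks()
collector.feed(MARKUP)
print(collector.links)
```

The machine-readable data points at DBpedia, while a reader who clicks the link lands on Wikipedia.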
Richard has created a site that lists his favourite restaurants and their locations. He does not want to generate code specific to the various mapping services on the Web. Instead of creating specific markup for Yahoo Maps, Google Maps, MapQuest, and Google Earth, he adds address information via RDFa to each restaurant entry. This enables him to build a general tool that extracts the address information and passes it to whichever mapping tool the user prefers.
<div prefix="vcard: http://www.w3.org/2006/vcard/ns# foaf: http://xmlns.com/foaf/0.1/" typeof="vcard:VCard">
  <span property="vcard:fn">Wong Kei</span>
  <span property="vcard:street-address">41-43 Wardour Street</span>
  <span property="vcard:locality">London</span>,
  <span property="vcard:country-name">United Kingdom</span>
  <span property="vcard:tel">020 74373071</span>
</div>
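Richard's service-independent tool can be sketched in Python: the extracted vCard properties are joined into one address query and URL-encoded into whatever template the user's chosen map service expects. The template below is hypothetical (a real tool would hold one per service), and the dictionary keyed by vCard property local names stands in for the output of an RDFa extraction step.

```python
from urllib.parse import quote_plus

# Hypothetical URL template; a real tool would offer one per map service.
MAP_TEMPLATE = "https://maps.example.com/search?q={query}"
ADDRESS_FIELDS = ("street-address", "locality", "country-name")


def map_link(vcard: dict, template: str = MAP_TEMPLATE) -> str:
    """Join the address parts in order and URL-encode them into the template."""
    query = ", ".join(vcard[field] for field in ADDRESS_FIELDS if field in vcard)
    return template.format(query=quote_plus(query))


# Values taken from the Wong Kei entry, keyed by vCard property local name.
wong_kei = {
    "fn": "Wong Kei",
    "street-address": "41-43 Wardour Street",
    "locality": "London",
    "country-name": "United Kingdom",
}
print(map_link(wong_kei))
# https://maps.example.com/search?q=41-43+Wardour+Street%2C+London%2C+United+Kingdom
```

Because only the template changes between services, the restaurant markup itself never has to mention any particular map provider.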
Marie is a chemist, researching the effects of ethanol on the spatial orientation of animals. She writes about her research on her blog and often makes references to chemical compounds. She would like any reference to these compounds to automatically have a picture of the compound's structure shown as a tooltip, and a link to the compound’s entry on the National Center for Biotechnology Information [NCBI] Web site. Similarly, she would like visitors to be able to visualize the chemical compound in the page using a new HTML5 canvas widget she has found on the web that combines data from different chemistry websites.
<div prefix="dbp: http://dbpedia.org/ontology/ fb: http://rdf.freebase.com/rdf/">
  My latest study about the effects of
  <span about="fb:en.ethanol" typeof="dbp:ChemicalCompound"
        property="fb:chemistry.chemical_compound.pubchem_id"
        content="702">ethanol</span>
  on mice's spatial orientation shows that ...
</div>
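The link to the compound's NCBI entry can be derived from the machine-readable PubChem identifier alone. A Python sketch, assuming two PubChem URL patterns (a compound page, and a PUG REST structure image) that should be verified against NCBI's documentation; it matches the property by its literal CURIE string rather than expanding prefixes, and the names CompoundFinder and compound_links are hypothetical.

```python
from html.parser import HTMLParser

PUBCHEM_ID = "fb:chemistry.chemical_compound.pubchem_id"
# Assumed PubChem URL patterns (compound page and PUG REST structure image).
PAGE = "https://pubchem.ncbi.nlm.nih.gov/compound/{cid}"
IMAGE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/PNG"


class CompoundFinder(HTMLParser):
    """Collect content values of elements whose property includes pubchem_id."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # property may hold several space-separated values.
        if PUBCHEM_ID in (a.get("property") or "").split() and a.get("content"):
            self.ids.append(a["content"])


def compound_links(cid: str) -> dict:
    """Build the page and structure-image URLs for one PubChem compound ID."""
    if not cid.isdigit():
        raise ValueError(f"unexpected PubChem ID: {cid!r}")
    return {"page": PAGE.format(cid=cid), "image": IMAGE.format(cid=cid)}


finder = CompoundFinder()
finder.feed('<span about="fb:en.ethanol" typeof="dbp:ChemicalCompound" '
            'property="fb:chemistry.chemical_compound.pubchem_id" '
            'content="702">ethanol</span>')
for cid in finder.ids:
    print(compound_links(cid))
```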
Dave is writing a browser plugin that filters product offers in a web page and displays an icon to buy the product or save it to a public wishlist. The plugin searches for any mention of product names, thumbnails, and offered prices. The information is listed in the URL bar as an icon, and upon clicking the icon, displayed in a sidebar in the browser. He can then add each item to a list that is managed by the browser plugin and published on a wishlist website.
Because many of his pages use the GoodRelations ontology, which is widely used to mark up products, Dave decides to make use of the vocab facility of RDFa to simplify his markup. He also forgets to declare the rdfs prefix, but since it is defined by the RDFa default profile, the data that he intended to express using the rdfs prefix will still be extracted by all conforming RDFa processors.
<div prefix="foaf: http://xmlns.com/foaf/0.1/">
  <div vocab="http://purl.org/goodrelations/v1#" about="#offering" typeof="Offering">
    <div rel="foaf:page" resource="http://www.amazon.com/Harry-Potter-Deathly-Hallows-Book/dp/0545139708"></div>
    <div property="rdfs:label">Harry Potter and the Deathly Hallows</div>
    <div property="rdfs:comment">In this final, seventh installment of the Harry Potter series, J.K. Rowling unveils in spectacular fashion the answers to the many questions that have been so eagerly awaited. The spellbinding, richly woven narrative, which plunges, twists and turns at a breathtaking pace, confirms the author as a mistress of storytelling, whose books will be read, reread and read again.</div>
    <div rel="foaf:depiction">
      <img src="http://ecx.images-amazon.com/images/I/51ynI7I-qnL._SL500_AA300_.jpg" />
    </div>
    <div rel="hasBusinessFunction" resource="http://purl.org/goodrelations/v1#Sell"></div>
    <div rel="hasPriceSpecification">Buy for
      <span typeof="UnitPriceSpecification">
        <span property="hasCurrency" content="USD">$</span>
        <span property="hasCurrencyValue">7.49</span>
      </span>
    </div>
    Pay via:
    <span rel="acceptedPaymentMethods" resource="http://purl.org/goodrelations/v1#PayPal">PayPal</span>
    <span rel="acceptedPaymentMethods" resource="http://purl.org/goodrelations/v1#MasterCard">MasterCard</span>
  </div>
</div>
At the time of publication, the active members of the RDFa Working Group were:
No normative references.