Please refer to the <strong>errata</strong> for this document, which may include some normative corrections.
<a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2010-2013 <a href="http://www.w3.org/"><abbr title="World Wide Web Consortium">W3C</abbr></a>® (<abbr title="Massachusetts Institute of Technology">MIT</abbr>, <abbr title="European Research Consortium for Informatics and Mathematics">ERCIM</abbr>, Keio, <a href="http://ev.buaa.edu.cn/">Beihang</a>), All Rights Reserved. <abbr title="World Wide Web Consortium">W3C</abbr> liability, trademark and document use rules apply.
The last couple of years have witnessed a fascinating evolution: while the Web was initially built predominantly for human consumption, web content is increasingly consumed by machines which expect some amount of structured data. Sites have started to identify a page's title, content type, and preview image to provide appropriate information in a user's newsfeed when she clicks the "Like" button. Search engines have started to provide richer search results by extracting fine-grained structured details from the Web pages they crawl. In turn, web publishers are producing increasing amounts of structured data within their Web content to improve their standing with search engines.
A key enabling technology behind these developments is the ability to add structured data to HTML pages directly. RDFa (Resource Description Framework in Attributes) is a technique that allows just that: it provides a set of markup attributes to augment the visual information on the Web with machine-readable hints. In this Primer, we show how to express data using RDFa in HTML, and in particular how to mark up existing human-readable Web page content to express machine-readable data.
This document provides only a Primer to RDFa 1.1. The complete specification of RDFa, with further examples, can be found in the RDFa 1.1 Core [<a class="bibref" href="#bib-RDFA-CORE">RDFA-CORE</a>], RDFa Lite [<a class="bibref" href="#bib-RDFA-LITE">RDFA-LITE</a>], XHTML+RDFa 1.1 [<a class="bibref" href="#bib-XHTML-RDFA">XHTML-RDFA</a>], and HTML5+RDFa 1.1 [<a class="bibref" href="#bib-HTML-RDFA">HTML-RDFA</a>] specifications.
delete: </div> delete: <div id="sotd" class="introductory section" typeof="bibo:Chapter" about="#sotd"> insert: </section>This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current <abbr title="World Wide Web Consortium">W3C</abbr> publications and the latest revision of this technical report can be found in the <abbr title="World Wide Web Consortium">W3C</abbr> technical reports index at http://www.w3.org/TR/.
This document was published by the RDFa Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to public-rdfa@w3.org (subscribe, archives). All comments are welcome.
Publication as a Working Group Note does not imply endorsement by the <abbr title="World Wide Web Consortium">W3C</abbr> Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 <abbr title="World Wide Web Consortium">W3C</abbr> Patent Policy. <abbr title="World Wide Web Consortium">W3C</abbr> maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the <abbr title="World Wide Web Consortium">W3C</abbr> Patent Policy.
delete: </div> delete: <div id="toc" typeof="bibo:Chapter" about="#toc" class="section"> insert: </section>
The web is a rich, distributed repository of interconnected information. Until recently, it was organized primarily for human consumption. On a typical web page, an HTML author might specify a headline, then a smaller sub-headline, a block of italicized text, a few paragraphs of average-size text, and, finally, a few single-word links. Web browsers will follow these presentation instructions faithfully. However, only a human reader understands that the headline is a blog post title, that the sub-headline indicates the author, that the italicized text is the article's publication date, and that the single-word links are subject categories. Computers do not grasp these distinctions; the gap between what programs and humans understand is large.
delete: <div class="figure c1"> delete: <a class="figurelink" href="diagrams/presentation-vs-semantics.svg"> insert: <figure class="figure c1" id="fig-presentation-vs.-semantics">What if the browser, or any machine consumer such as a Web crawler, received information on the meaning of a web page's visual elements? A dinner party announced on a blog could be copied to the user's calendar, an author's complete contact information to the user's address book. Users could automatically recall previously browsed articles according to categorization labels (i.e., tags). A photo copied and pasted from a web site to a school report would carry with it a link back to the photographer, giving him proper credit. A link shared by a user to his social network contacts would automatically carry additional data pulled from the original web page: a thumbnail, an author, and a specific title. When web data meant for humans is augmented with hints meant for computer programs, these programs become significantly more helpful, because they begin to understand the data's structure.
RDFa allows HTML authors to do just that. Using a few simple HTML attributes, authors can mark up human-readable data with machine-readable indicators for browsers and other programs to interpret. A web page can include markup for items as simple as the title of an article, or as complex as a user's complete social network.
delete: <div id="html-vs.-xhtml" typeof="bibo:Chapter" about="#html-vs.-xhtml" class="section"> insert: <section id="html-vs.-xhtml">Historically, RDFa 1.0 [<a class="bibref" href="#bib-RDFA-SYNTAX">RDFA-SYNTAX</a>] was specified only for XHTML. RDFa 1.1 [<a class="bibref" href="#bib-RDFA-CORE">RDFA-CORE</a>] is the newer version and the one used in this document. RDFa 1.1 is specified for both XHTML [<a class="bibref" href="#bib-XHTML-RDFA">XHTML-RDFA</a>] and HTML5 [<a class="bibref" href="#bib-HTML-RDFA">HTML-RDFA</a>]. RDFa 1.1 also works with other XML-based languages such as SVG [<a class="bibref" href="#bib-SVG11">SVG11</a>]. This document uses HTML in all of the examples; for simplicity, we use the term "HTML" throughout this document to refer to all of the HTML-family languages.
delete: </div> delete: <div id="validation" typeof="bibo:Chapter" about="#validation" class="section"> insert: </section> RDFa is based on attributes. While some HTML attributes (e.g., <code>href</code>, <code>src</code>) have been reused, other RDFa attributes are new. This matters because some (X)HTML validators may not properly validate the HTML code until they are updated to recognize the new RDFa attributes. This is rarely a problem in practice, since browsers simply ignore attributes that they do not recognize. None of the RDFa-specific attributes have any effect on the visual display of the HTML content, so authors do not have to worry about pages marked up with RDFa looking any different to a human reader from pages without RDFa.
We begin the introduction to RDFa with a subset of its features called RDFa Lite 1.1 [<a class="bibref" href="#bib-RDFA-LITE">RDFA-LITE</a>]. That subset was designed to cover most simple to moderate structured data markup tasks without burdening authors with additional complexity. Many Web authors will not need more than this minimal subset.
delete: <div id="the-first-steps--adding-machine-readable-hints-to-web-pages" typeof="bibo:Chapter" about="#the-first-steps--adding-machine-readable-hints-to-web-pages" class="section"> insert: <section id="the-first-steps-adding-machine-readable-hints-to-web-pages"> Consider Alice, a blogger who publishes a mix of professional and personal articles at http://example.com/alice. We will construct markup examples to illustrate how Alice can use RDFa. The complete markup of these examples is available on a dedicated page.
The previous example demonstrated how Alice can mark up text to make it machine-readable. She would also like to mark up the links in a machine-readable way, to express the type of link being described. RDFa lets the publisher add a "flavor", i.e., a label, to an existing clickable link that processors can understand. This way, the same markup serves both humans and machines.
In her blog's footer, Alice already declares her content to be freely reusable, as long as she receives due credit when her articles are cited. The HTML includes a link to a Creative Commons [<a class="bibref" href="#bib-CC-ABOUT">CC-ABOUT</a>] license:
<div class="example"><p>All content on this site is licensed under <a href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. ©2011 Alice Birpemswick.</p></div>
A human clearly understands this sentence, in particular the meaning of the link with respect to the current document: it indicates the document's license, the conditions under which the page's contents are distributed. Unfortunately, when Bob visits Alice's blog, his browser sees only a plain link that could just as well point to one of Alice's friends or to her CV. For Bob's browser to understand that this link actually points to the document's licensing terms, Alice needs to add some flavor: an indication of what kind of link this is.
She can add this flavor using the <code>property</code> attribute again. When the element also contains an <code>href</code> (or <code>src</code>) attribute, <code>property</code> is automatically associated with the value of that attribute rather than with the textual content of the <code>a</code> element. The value of <code>property</code> is <code>http://creativecommons.org/ns#license</code>, a term defined by Creative Commons:
<p>All content on this site is licensed under <a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
insert: </div> With this small update, Bob's browser will now understand that this link has a flavor: it indicates the blog's license:
delete: <div class="figure c1"> insert: <figure class="figure c1" id="fig-two-web-pages-connected-by-a-link-labeled-license-and-two-notes-with-a-license-relationship">Alice is quite pleased that she was able to add only structured-data hints via RDFa, never having to repeat the content of her text or the URL of her clickable links.
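The rule described above, that <code>property</code> takes its value from <code>href</code> or <code>src</code> when present and from the element's text otherwise, can be sketched in a few lines of Python. The helper name <code>property_object</code> and its dictionary-of-attributes input are hypothetical; the sketch ignores the other attributes (such as <code>content</code>, <code>datatype</code>, <code>typeof</code> and <code>resource</code>) that a conforming RDFa 1.1 processor also takes into account.

```python
def property_object(attrs, text):
    """Return ("iri", value) or ("literal", value) for an element with @property.

    Simplified sketch: only href, src and the element's text are considered.
    """
    for name in ("href", "src"):
        if name in attrs:
            # A link target becomes the value instead of the visible text.
            return ("iri", attrs[name])
    # Without href/src, the textual content of the element is the value.
    return ("literal", text)


license_link = {
    "property": "http://creativecommons.org/ns#license",
    "href": "http://creativecommons.org/licenses/by/3.0/",
}
print(property_object(license_link, " a Creative Commons License"))
print(property_object({"property": "http://purl.org/dc/terms/title"}, "The Trouble with Bob"))
```

The first call yields the license URL as the value; the second, having no link target, falls back to the text.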
delete: </div> delete: <div id="setting-a-default-vocabulary" typeof="bibo:Chapter" about="#setting-a-default-vocabulary" class="section"> delete: <h5> insert: </section> In a number of simple use cases, such as our example with Alice's blog, HTML authors will predominantly use a single vocabulary. However, while generating full URLs via a CMS is not a particular problem, typing them by hand is tedious and error-prone. To alleviate this problem, RDFa introduces the <code>vocab</code> attribute to let the author declare a single vocabulary for a chunk of HTML. Thus, instead of:
<html> <head> ... </head> <body> ... <h2 property="http://purl.org/dc/terms/title" >The Trouble with Bob</h2> <p>Date: <span property="http://purl.org/dc/terms/created" >2011-09-10</span></p> ... </body> </html>insert: </div>
Alice can write:
<div class="example"><html> <head> ... </head> <body vocab="http://purl.org/dc/terms/" > ... <h2 property="title" >The Trouble with Bob</h2> <p>Date: <span property="created" >2011-09-10</span></p> ... </body> </html></div>
Note how the property values are now single "terms"; these are simply concatenated to the URL defined via the <code>vocab</code> attribute. The attribute can be placed on any HTML element (not only on the <code>body</code> element as in the example), and its effect applies to all the elements below that point.
Default vocabularies and full URIs can be mixed at any time. For example, Alice could have written:
<div class="example"><html> <head> ... </head> <body vocab="http://purl.org/dc/terms/" > ... <h2 property="title" >The Trouble with Bob</h2> <p>Date: <span property="http://purl.org/dc/terms/created" >2011-09-10</span></p> ... </body> </html></div>
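Expanding a term with <code>vocab</code> amounts to string concatenation. The following is a minimal sketch under simplifying assumptions: the function name <code>expand_property</code> is hypothetical, any value containing a colon is treated as a full URL and used unchanged, and prefixed names are not handled. A bare term with no <code>vocab</code> in scope yields <code>None</code> here; a real processor follows the more detailed rules of RDFa 1.1 Core.

```python
from typing import Optional

DC = "http://purl.org/dc/terms/"


def expand_property(value: str, vocab: Optional[str]) -> Optional[str]:
    """Expand a property value into a full URL (simplified sketch)."""
    if ":" in value:
        # Already a full URL, e.g. http://purl.org/dc/terms/created: keep it.
        return value
    if vocab is None:
        # No default vocabulary in scope: this sketch simply gives up.
        return None
    # A bare term is concatenated to the vocab URL.
    return vocab + value


print(expand_property("title", DC))
print(expand_property("http://purl.org/dc/terms/created", DC))
```

Both calls produce Dublin Core URLs, which is why the mixed markup above yields the same data as the fully abbreviated version.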
Perhaps a more interesting example is the combination of the header with the licensing segment of her web page:
<div class="example"><html> <head> ... </head> <body vocab="http://purl.org/dc/terms/" > ... <h2 property="title" >The Trouble with Bob</h2> <p>Date: <span property="created" >2011-09-10</span></p> ... <p>All content on this site is licensed under <a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. ©2011 Alice Birpemswick.</p> </body> </html></div>
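The full URL for the license term is necessary because the <code>vocab</code> attribute on <code>body</code> makes Dublin Core the default vocabulary: a bare <code>license</code> term would be interpreted as a Dublin Core term. As an alternative, Alice could also have used the <code>vocab</code> attribute again: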
<html> <head> ... </head> <body vocab="http://purl.org/dc/terms/" > ... <h2 property="title" >The Trouble with Bob</h2> <p>Date: <span property="created" >2011-09-10</span></p> ... <p vocab="http://creativecommons.org/ns#" >All content on this site is licensed under <a property="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. ©2011 Alice Birpemswick.</p> </body> </html>insert: </div>
This works because the <code>vocab</code> attribute on the license paragraph overrides the one inherited from the <code>body</code> element.
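Because the nearest <code>vocab</code> wins, finding the vocabulary for an element means walking up from that element towards the root. A minimal sketch, assuming (hypothetically) that the chain of ancestors is already available as a list of attribute dictionaries; <code>vocab_in_scope</code> is an illustrative name, not part of any RDFa API.

```python
from typing import Optional, Sequence


def vocab_in_scope(ancestors: Sequence[dict]) -> Optional[str]:
    """Return the vocab applying to the last element of `ancestors`.

    `ancestors` holds attribute dicts from the outermost element down to the
    current element; the closest vocab attribute wins.
    """
    for attrs in reversed(ancestors):
        if "vocab" in attrs:
            return attrs["vocab"]
    return None


chain = [
    {"vocab": "http://purl.org/dc/terms/"},      # <body>
    {"vocab": "http://creativecommons.org/ns#"},  # <p> around the license link
    {"property": "license",
     "href": "http://creativecommons.org/licenses/by/3.0/"},  # <a>
]
print(vocab_in_scope(chain) + chain[-1]["property"])
print(vocab_in_scope(chain[:1]) + "title")
```

The license link resolves against the Creative Commons vocabulary, while an element directly under <code>body</code> still sees Dublin Core.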
The <code>vocab</code> attribute references structured data vocabularies, identified using URLs. RDFa does not limit the form of these URLs or the document formats accessible by dereferencing them; however, users <em class="rfc2119" title="SHOULD">SHOULD</em> aim to use widely shared, conventional values for identifying such vocabularies, following the conventions of case, spelling, etc. established by their publishers. insert: </p>
Of course, Alice's blog page may contain multiple entries. Sometimes Alice's sister Eve guest blogs, too. The front page of the blog lists the 10 most recent entries, each with its own title, author, and introductory paragraph. How, then, should Alice mark up the title of each of these entries individually, even though they all appear within the same web page? RDFa provides <code>resource</code>, an attribute for specifying the "context", i.e., the exact URL to which the contained RDFa markup applies:
<body vocab="http://purl.org/dc/terms/" > ... <div resource="/alice/posts/trouble_with_bob" > <h2 property="title" >The trouble with Bob</h2> <p>Date: <span property="created" >2011-09-10</span></p> <h3 property="creator" >Alice</h3> ... </div> ... <div resource="/alice/posts/jos_barbecue" > <h2 property="title" >Jo's Barbecue</h2> <p>Date: <span property="created" >2011-09-14</span></p> <h3 property="creator" >Eve</h3> ... </div> ... </body>insert: </div>
(Note that the example uses relative URLs; the value of <code>resource</code> can be any URL, relative or absolute.) We can represent this, once again, as a diagram connecting URLs to properties:
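Relative <code>resource</code> values are resolved against the document's URL, just like relative links. The following sketch assumes the page is served from http://example.com/alice with no <code>base</code> element, and builds the triples by hand from the markup above using Python's standard <code>urljoin</code>; the <code>posts</code> list is a hypothetical transcription, not parser output.

```python
from urllib.parse import urljoin

BASE = "http://example.com/alice"  # assumed URL of Alice's page
DC = "http://purl.org/dc/terms/"

# Hand transcription of the two entries above: (resource value, {term: text}).
posts = [
    ("/alice/posts/trouble_with_bob",
     {"title": "The trouble with Bob", "created": "2011-09-10", "creator": "Alice"}),
    ("/alice/posts/jos_barbecue",
     {"title": "Jo's Barbecue", "created": "2011-09-14", "creator": "Eve"}),
]

# Each resolved resource becomes the subject of the properties inside its div.
triples = [
    (urljoin(BASE, resource), DC + term, text)
    for resource, props in posts
    for term, text in props.items()
]

for triple in triples:
    print(triple)
```

Each post's properties attach to its own URL, so the two titles no longer collide even though they share one page.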
Alice can use the same technique to give her friend Bob proper credit when she posts one of his photos:
<div class="example"><div resource="/alice/posts/trouble_with_bob" > <h2 property="title" >The trouble with Bob</h2> ... The trouble with Bob is that he takes much better photos than I do: ... <div resource="http://example.com/bob/photos/sunset.jpg" > <img src="http://example.com/bob/photos/sunset.jpg" /> <span property="title" >Beautiful Sunset</span> by <span property="creator" >Bob</span>. </div> </div></div>
Notice how the innermost <code>resource</code> value, <code>http://example.com/bob/photos/sunset.jpg</code>, "overrides" the outer value <code>/alice/posts/trouble_with_bob</code> for all markup inside the containing <code>div</code>. Once again, here is a diagram that represents the underlying data of this new portion of markup:
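To see how nested <code>resource</code> values scope the markup, here is a toy extractor built on Python's standard <code>html.parser</code>. It is a hypothetical illustration, not a conforming RDFa processor: it understands only <code>vocab</code>, <code>resource</code> and <code>property</code>, assumes well-formed markup with explicit end tags, treats any value containing a colon as a full URL, collapses whitespace in text values, and assumes the page URL http://example.com/alice.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

VOID_TAGS = {"img", "link", "meta", "br", "hr", "input", "source"}


class TinyRDFaLite(HTMLParser):
    """Toy extractor for vocab/resource/property only (not a conforming
    RDFa 1.1 processor; typeof, prefix, content, datatype are ignored)."""

    def __init__(self, base):
        super().__init__(convert_charrefs=True)
        self.base = base
        # One frame per open element: the subject its children inherit, the
        # vocab in scope, and (for text-valued properties) what is collected.
        self.stack = [{"subject": base, "vocab": None, "props": [], "text": None}]
        self.triples = []

    def _expand(self, term, vocab):
        if ":" in term:
            return term  # treated as a full URL
        return vocab + term if vocab else None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        parent = self.stack[-1]
        vocab = a.get("vocab", parent["vocab"])
        subject = parent["subject"]
        props = [self._expand(t, vocab) for t in (a.get("property") or "").split()]
        props = [p for p in props if p]
        frame = {"subject": subject, "vocab": vocab, "props": [], "text": None}
        if props:
            target = a.get("resource") or a.get("href") or a.get("src")
            if target:
                # property + resource/href/src: the URL is the value.
                for p in props:
                    self.triples.append((subject, p, urljoin(self.base, target)))
            else:
                # Otherwise the element's text content is the value.
                frame["props"], frame["text"] = props, []
        elif "resource" in a:
            # resource alone sets the subject for everything inside.
            frame["subject"] = urljoin(self.base, a["resource"])
        if tag not in VOID_TAGS:
            self.stack.append(frame)

    def handle_startendtag(self, tag, attrs):
        # <img ... />: void elements never get a frame, so never pop one.
        self.handle_starttag(tag, attrs)
        if tag not in VOID_TAGS:
            self.handle_endtag(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or len(self.stack) == 1:
            return
        frame = self.stack.pop()
        if frame["text"] is not None:
            literal = " ".join("".join(frame["text"]).split())
            for p in frame["props"]:
                self.triples.append((frame["subject"], p, literal))

    def handle_data(self, data):
        # Text belongs to every enclosing element that is collecting a value.
        for frame in self.stack:
            if frame["text"] is not None:
                frame["text"].append(data)


page = """<body vocab="http://purl.org/dc/terms/">
<div resource="/alice/posts/trouble_with_bob">
  <h2 property="title">The trouble with Bob</h2>
  <div resource="http://example.com/bob/photos/sunset.jpg">
    <img src="http://example.com/bob/photos/sunset.jpg" />
    <span property="title">Beautiful Sunset</span> by <span property="creator">Bob</span>.
  </div>
</div>
</body>"""

parser = TinyRDFaLite("http://example.com/alice")
parser.feed(page)
parser.close()
for triple in parser.triples:
    print(triple)
```

The post's title attaches to the blog item, while the photo's title and creator attach to the photo URL, because the inner <code>resource</code> replaces the subject only for the markup it encloses.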
We have seen, in a <a href="#inks-with-flavor">previous section</a>, how Alice can use RDFa to include Creative Commons statements on her blog. However, the solution in that section assigned these statements <em>to the whole page</em>, and not to individual blog items. This may be an issue if the page includes <a href="#multiple-items">multiple items</a>. Indeed, Alice may be forced to repeat the relevant statements like this: insert: </p>
insert: <div class="example"><body vocab="http://purl.org/dc/terms/"> ... <div resource="/alice/posts/trouble_with_bob"> <h2 property="title">The trouble with Bob</h2> <p>Date: <span property="created">2011-09-10</span></p> <h3 property="creator">Alice</h3> ... <span class="hilite"> <p vocab="http://creativecommons.org/ns#">All content on this blog item is licensed under <a property="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. <span property="attributionName">©2011 Alice Birpemswick</span>.</p> </span> </div> ... <div resource="/alice/posts/jims_concert"> <h2 property="title">I was at Jim's concert the other day</h2> <p>Date: <span property="created">2011-10-22</span></p> <h3 property="creator">Alice</h3> ... <span class="hilite"> <p vocab="http://creativecommons.org/ns#">All content on this blog item is licensed under <a property="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. <span property="attributionName">©2011 Alice Birpemswick</span>.</p> </span> </div> ... </body> insert: </pre>
which may be tedious and error-prone. insert: </p>
<p>HTML+RDFa introduces the notion of "property copying" to alleviate this situation. Using it, Alice can "collect" a number of statements as a pattern and refer to that pattern from other parts of the page. This is done using the magic property <code>rdfa:copy</code> and the magic type <code>rdfa:Pattern</code>, as follows:</p>
<body vocab="http://purl.org/dc/terms/"> ... <div resource="/alice/posts/trouble_with_bob"> <h2 property="title">The trouble with Bob</h2> <p>Date: <span property="created">2011-09-10</span></p> <h3 property="creator">Alice</h3> ... <span class="hilite"> <link property="rdfa:copy" href="#ccpattern"/> </span> </div> ... <div resource="/alice/posts/jims_concert"> <h2 property="title">I was at Jim's concert the other day</h2> <p>Date: <span property="created">2011-10-22</span></p> <h3 property="creator">Alice</h3> ... <span class="hilite"> <link property="rdfa:copy" href="#ccpattern"/> </span> </div> ... <span class="hilite"> <div resource="#ccpattern" typeof="rdfa:Pattern"> <p vocab="http://creativecommons.org/ns#">All content on this blog item is licensed under <a property="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. <span property="attributionName">©2011 Alice Birpemswick</span>.</p> </div> </span> </body> insert: </pre>insert: </div>
(Alice may use CSS to hide the CC statements on screen if she wishes.) Conceptually, the effect of this structure is to "copy" all the RDFa statements appearing in the pattern in place of each <code>link</code> element, yielding the following structure: insert: </p>
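Conceptually, property copying is a rewrite of the extracted triples. The function below is a simplified sketch of that rewrite, not the normative HTML+RDFa algorithm: the name <code>expand_copies</code> is hypothetical, and it assumes the pattern and the posts have already been extracted into (subject, predicate, object) tuples with full URLs, using http://www.w3.org/ns/rdfa# as the namespace of <code>rdfa:copy</code> and <code>rdfa:Pattern</code>.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFA_COPY = "http://www.w3.org/ns/rdfa#copy"
RDFA_PATTERN = "http://www.w3.org/ns/rdfa#Pattern"
CC = "http://creativecommons.org/ns#"
DC = "http://purl.org/dc/terms/"

POST1 = "http://example.com/alice/posts/trouble_with_bob"
POST2 = "http://example.com/alice/posts/jims_concert"
PATTERN = "http://example.com/alice#ccpattern"


def expand_copies(triples):
    """Replace each (s, rdfa:copy, pattern) by the pattern's triples on s."""
    patterns = {s for s, p, o in triples if p == RDF_TYPE and o == RDFA_PATTERN}
    result = []
    for s, p, o in triples:
        if s in patterns:
            continue  # the pattern itself does not appear in the output
        if p == RDFA_COPY and o in patterns:
            # Copy every pattern triple except its rdfa:Pattern typing.
            result.extend(
                (s, pp, oo) for ss, pp, oo in triples
                if ss == o and not (pp == RDF_TYPE and oo == RDFA_PATTERN)
            )
        else:
            result.append((s, p, o))
    return result


triples = [
    (POST1, DC + "title", "The trouble with Bob"),
    (POST1, RDFA_COPY, PATTERN),
    (POST2, DC + "title", "I was at Jim's concert the other day"),
    (POST2, RDFA_COPY, PATTERN),
    (PATTERN, RDF_TYPE, RDFA_PATTERN),
    (PATTERN, CC + "license", "http://creativecommons.org/licenses/by/3.0/"),
    (PATTERN, CC + "attributionName", "©2011 Alice Birpemswick"),
]
for triple in expand_copies(triples):
    print(triple)
```

Each post ends up with its own license and attribution triples, while the <code>link</code> elements and the pattern node leave no trace in the result.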
Alice may want to add her personal data to her individual blog items, too. She decides to combine her FOAF data with the blog items:
<div class="example"><div vocab="http://purl.org/dc/terms/"> <div resource="/alice/posts/trouble_with_bob"> <h2 property="title">The trouble with Bob</h2> ... <h3 vocab="http://xmlns.com/foaf/0.1/" property="http://purl.org/dc/terms/creator" typeof="Person"> <span property="name">Alice Birpemswick</span>, Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a>, Phone: <a property="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a> </h3> ... </div> ... </div></div>
The structured data she generates looks like this:
delete: <div class="figure c1"> insert: <figure class="figure c1" id="fig-the-simple-blog-structure-extended-with-alice-s-foaf-data-as-blank-node"> Unfortunately, this solution is not optimal in two respects. First, notice that Alice had to use the full URI for the <code>creator</code> property: because the <code>vocab</code> attribute on the <code>h3</code> element sets FOAF as the default vocabulary, a bare <code>creator</code> term would have been misinterpreted as a FOAF term. We will come back to the issue of using several vocabularies in a later section.
The other issue is that Alice would like to design her Web page so that her personal data does not appear in each individual blog item but, rather, in one place, such as a footnote or a sidebar. That is, she would like to see something like:
delete: <div class="figure c1"> insert: <figure class="figure c1" id="fig-mock-up-of-alice-s-blog-page-design-with-blogs-on-the-left-and-personal-data-on-the-right">If the FOAF data were included in each blog item, Alice would have to write a complex set of CSS rules to achieve the visual effect she wants.
To solve this, Alice reuses the structure of her FOAF data but, this time, assigns it a separate URI using the <code>resource</code> attribute:
<div vocab="http://xmlns.com/foaf/0.1/" resource="#me" typeof="Person"> <p> <span property="name">Alice Birpemswick</span>, Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a>, Phone: <a property="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a> </p> ... </div>
insert: </div> It is considered good practice to use real URIs whenever possible, so Alice's new alternative should generally be preferred. With a real URI, it becomes possible to refer unambiguously to that particular piece of information, whereas doing so is more complicated with blank nodes.
delete: <p class="note"> insert: <div class="note"> The <code>resource="#me"</code> markup is a FOAF convention: the URL that represents the person Alice is <code>http://example.com/alice#me</code>. It should not be confused with Alice's homepage, <code>http://example.com/alice</code>. Of course, Alice could have used a different URI if, for example, her blog and her personal homepage were kept separate; she could have used <code>resource="http://alice.example.com/alice/home#myself"</code> instead of <code>resource="#me"</code>.
Using the explicit URI for her FOAF data, Alice can add a direct reference from the blog item, again using the <code>resource</code> attribute:
<div vocab="http://purl.org/dc/terms/"> <div resource="/alice/posts/trouble_with_bob"> <h2 property="title">The trouble with Bob</h2> <h3 property="creator" resource="#me" >Alice</h3> ... </div> </div> ... <div class="sidebar" vocab="http://xmlns.com/foaf/0.1/" resource="#me" typeof="Person"> <p> <span property="name">Alice Birpemswick</span>, Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a>, Phone: <a property="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a> </p> ... </div>
insert: </div> The <code>resource</code> attribute appears, in this case, together with <code>property</code> on the same element: in this situation, <code>resource</code> indicates the "target" of the relation. This attribute allows Alice to "distribute" the various parts of her structured data across her page. What she gets is a slightly modified version of the previous structure, where the only difference is the use of an explicit URI instead of a blank node:
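Once the creator link and the sidebar use the same URL, a consumer can connect them with a plain lookup on that URL, something that is harder to do reliably with blank nodes. A minimal sketch, assuming the triples for the markup above have already been extracted (by hand here) with <code>#me</code> resolved to http://example.com/alice#me; <code>creator_names</code> is a hypothetical helper.

```python
DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ME = "http://example.com/alice#me"  # resource="#me" resolved against the page URL
POST = "http://example.com/alice/posts/trouble_with_bob"

# Hand-extracted triples: the blog item and the sidebar share the ME node.
triples = [
    (POST, DC + "title", "The trouble with Bob"),
    (POST, DC + "creator", ME),
    (ME, RDF_TYPE, FOAF + "Person"),
    (ME, FOAF + "name", "Alice Birpemswick"),
    (ME, FOAF + "mbox", "mailto:alice@example.com"),
    (ME, FOAF + "phone", "tel:+1-617-555-7332"),
]


def creator_names(triples):
    """Map every item with a dc:creator to that creator's foaf:name."""
    names = {s: o for s, p, o in triples if p == FOAF + "name"}
    return {s: names.get(o) for s, p, o in triples if p == DC + "creator"}


print(creator_names(triples))
```

Adding further blog items that point to <code>#me</code> only adds more <code>dc:creator</code> triples; the lookup itself does not change.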
Using this approach, it becomes very easy to add references to the same data from different blog items, too:
<div class="example"><div vocab="http://purl.org/dc/terms/"> <div resource="/alice/posts/trouble_with_bob"> <h2 property="title">The trouble with Bob</h2> <h3 property="creator" resource="#me" >Alice</h3> ... </div> </div> ... <div vocab="http://purl.org/dc/terms/"> <div resource="/alice/posts/my_photos"> <h2 property="title">I will post my photos nevertheless…</h2> <h3 property="creator" resource="#me" >Alice</h3> ... </div> </div> ... <div class="sidebar" vocab="http://xmlns.com/foaf/0.1/" resource="#me" typeof="Person"> <p> <span property="name">Alice Birpemswick</span>, Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a>, Phone: <a property="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a> </p> ... </div></div>
Leading to the following structure:
delete: <div class="figure c1"> insert: <figure class="figure c1" id="fig-the-simple-blog-structure-with-two-blogs-extended-with-alice-s-foaf-data-with-an-explicit-uri"> insert: <div class="note"> Combined with <code>property</code>, the <code>resource</code> attribute plays exactly the same role as <code>href</code>, already used for "links with flavor", except that it does not provide a clickable link to the browser like <code>href</code> does. Also, the <code>resource</code> attribute can be used on any HTML element, whereas HTML allows <code>href</code> only on a few elements, such as <code>a</code> and <code>link</code>.
This issue and its solution resemble those discussed in the <a href="#patterns">section on property copying</a>. There is, however, a subtle but important difference between the two: the solution using the <code>resource</code> attribute introduces a new node in the graph, as shown in <a href="#fig12">Figure 12</a>, whereas copying the properties does not. Which of the two approaches to adopt often depends on the vocabulary being used. insert: </p>
The previous examples show that, for more complex cases, multiple vocabularies have to be used to express the various aspects of structured data. We have seen Alice using the Dublin Core, FOAF, and Creative Commons vocabularies, but there may be more. For example, Alice may want to add vocabulary elements defined by search engines on their schema.org site [<a class="bibref" href="#bib-SCHEMA">SCHEMA</a>].
Alice can either use full URLs for all the terms, or use the <code>vocab</code> attribute to abbreviate the terms of the predominant vocabulary. But in some cases the vocabularies cannot be separated easily, which makes the use of <code>vocab</code> awkward. Here is, for example, the kind of HTML she might end up with:
<html> <head> ... </head> <body vocab="http://schema.org/" > <div resource="/alice/posts/trouble_with_bob" typeof="BlogPosting" > <h2 property="http://purl.org/dc/terms/title" >The trouble with Bob</h2> ... <h3 property="http://purl.org/dc/terms/creator" resource="#me">Alice</h3> <div property="articleBody" > <p>The trouble with Bob is that he takes much better photos than I do:</p> </div> ... </div> ... </body> </html>insert: </div>
Note that the schema.org and Dublin Core terms are intertwined within a single blog post, so it becomes an arbitrary choice whether to use the <code>vocab</code> attribute for <code>http://purl.org/dc/terms/</code> or for <code>http://schema.org/</code>. We have seen the same problem in a previous section, where FOAF and Dublin Core terms were mixed.
To alleviate this problem, RDFa offers prefixed terms: a special <code>prefix</code> attribute can assign prefixes to represent URLs and, using those prefixes, the vocabulary elements themselves can be abbreviated. The <code>prefix:reference</code> syntax is used: the URL associated with <code>prefix</code> is simply concatenated with <code>reference</code> to create a full URL. (We have already used this convention to simplify our figures.) Here is how the HTML of the previous example looks when prefixes are used:
<html> <head> ... </head> <body prefix="dc: http://purl.org/dc/terms/ schema: http://schema.org/" > <div resource="/alice/posts/trouble_with_bob" typeof="schema:BlogPosting" > <h2 property="dc:title" >The trouble with Bob</h2> ... <h3 property="dc:creator" resource="#me">Alice</h3> <div property="schema:articleBody" > <p>The trouble with Bob is that he takes much better photos than I do:</p> </div> ... </div> </body> </html>insert: </div>
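A <code>prefix</code> attribute is a whitespace-separated sequence of <code>name:</code> and URL pairs, and expanding a prefixed term is again concatenation. The two helpers below are hypothetical, simplified sketches: they do not validate prefix names and leave any value with an undeclared prefix untouched.

```python
def parse_prefix_attr(value: str) -> dict:
    """Parse e.g. 'dc: http://purl.org/dc/terms/ schema: http://schema.org/'."""
    tokens = value.split()
    mappings = {}
    # Tokens alternate: 'name:' followed by the URL it stands for.
    for name, url in zip(tokens[0::2], tokens[1::2]):
        if name.endswith(":"):
            mappings[name[:-1]] = url
    return mappings


def expand_curie(value: str, prefixes: dict) -> str:
    """Concatenate the URL bound to the prefix with the reference part."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value  # undeclared prefix: left untouched in this sketch


prefixes = parse_prefix_attr("dc: http://purl.org/dc/terms/ schema: http://schema.org/")
print(expand_curie("dc:title", prefixes))
print(expand_curie("schema:BlogPosting", prefixes))
```

With the declaration from the <code>body</code> element above, <code>dc:title</code> and <code>schema:BlogPosting</code> expand to the same full URLs that the earlier, unabbreviated markup spelled out.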
Using prefixes can greatly reduce errors by concentrating the vocabulary choices in one place in the file. Just like <code>vocab</code>, the <code>prefix</code> attribute can appear anywhere in the HTML file, affecting only the elements below it. <code>prefix</code> and <code>vocab</code> can also be mixed, for example:
<html> <head> ... </head> <body vocab="http://purl.org/dc/terms/" prefix="schema: http://schema.org/" > <div resource="/alice/posts/trouble_with_bob" typeof="schema:BlogPosting" > <h2 property="title" >The trouble with Bob</h2> ... <h3 property="creator" resource="#me">Alice</h3> <div property="schema:articleBody" > <p>The trouble with Bob is that he takes much better photos than I do:</p> </div> ... </div> </body> </html>insert: </div>
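When <code>vocab</code> and <code>prefix</code> are mixed, each property or type value is interpreted according to its form: a bare term is appended to the default vocabulary, a value whose prefix is declared is expanded, and anything else containing a colon is taken as a full URL. A simplified sketch of that decision, with a hypothetical <code>resolve</code> helper:

```python
from typing import Optional


def resolve(value: str, prefixes: dict, vocab: Optional[str]) -> Optional[str]:
    """Interpret a property/typeof value as a term, prefixed name or full URL."""
    if ":" not in value:
        # Bare term: use the default vocabulary (None if there is none).
        return vocab + value if vocab else None
    prefix, _, reference = value.partition(":")
    if prefix in prefixes:
        # Declared prefix: expand the prefixed name.
        return prefixes[prefix] + reference
    # Anything else with a colon is taken as a full URL.
    return value


prefixes = {"schema": "http://schema.org/"}  # from prefix="schema: http://schema.org/"
vocab = "http://purl.org/dc/terms/"          # from vocab="http://purl.org/dc/terms/"
for value in ["title", "creator", "schema:BlogPosting", "schema:articleBody"]:
    print(value, "->", resolve(value, prefixes, vocab))
```

In the example above, <code>title</code> and <code>creator</code> therefore become Dublin Core URLs while the <code>schema:</code> values become schema.org URLs.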
html
element contains a large number of prefix declarations. The character encoding (i.e., UTF-8, UTF-16, ASCII, etc.) used for an HTML5 file is declared using a meta
element in the header. In HTML5 this meta declaration must fall within the first 512 bytes of the page, or the HTML5 processor (browser, parser, etc.) will try to detect the encoding using some heuristics. A very "long" html
tag may therefore lead to problems. One way of avoiding the issue is to place most of the prefix declarations on the body
element.
The previous example, in which the Dublin Core and schema.org vocabularies are used within the same blog post, raises another issue. It so happens that not only Dublin Core but also schema.org has a property called creator
. Because RDFa uses URLs to denote properties, this by itself is not a problem. However, Alice may want to use both properties in the same blog post. For example, she may want search engines to process her blog post while also letting Dublin Core aware applications, like catalogs, handle it. In that case, this is what she may have to do:
<html> <head> ... </head> <body prefix="dc: http://purl.org/dc/terms/ schema: http://schema.org/"> <div resource="/alice/posts/trouble_with_bob" typeof="schema:BlogPosting"> <h2 property="dc:title">The trouble with Bob</h2> ... <h3 property="dc:creator" resource="#me" ><span property="schema:creator" resource="#me" >Alice</span></h3> <div property="schema:articleBody"> <p>The trouble with Bob is that he takes much better photos than I do:</p> </div> ... </div> </body> </html>insert: </div>
This is a bit awkward. Fortunately, RDFa allows the value of a property
attribute to be a whitespace-separated list of values, i.e., she can also write:
<html> <head> ... </head> <body prefix="dc: http://purl.org/dc/terms/ schema: http://schema.org/"> <div resource="/alice/posts/trouble_with_bob" typeof="schema:BlogPosting"> <h2 property="dc:title">The trouble with Bob</h2> ... <h3 property="dc:creator schema:creator" resource="#me" >Alice</h3> <div property="schema:articleBody"> <p>The trouble with Bob is that he takes much better photos than I do:</p> </div> ... </div> </body> </html>
insert: </div> yielding the structure:
delete: <div class="figure c1"> insert: <figure class="figure c1" id="fig-the-simple-blog-structure-with-two-different-creator-properties"> insert: <p> Similarly to insert: <code> property insert: </code>
, insert: <code> typeof insert: </code>
also accepts a list of values. For example, schema.org also has a notion of a Person, similar to FOAF; Alice may choose to use both: insert: </p>
<div class="sidebar" prefix="http://xmlns.com/foaf/0.1/ schema: http://schema.org/" resource="#me" insert: <span class="hilite"> typeof="foaf:Person schema:Person" insert: </span> > <p> <span property="foaf:name">Alice Birpemswick</span>, Email: <a property="foaf:mbox" href="mailto:alice@example.com">alice@example.com</a>, Phone: <a property="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a> </p> ... </div> insert: </pre>
A number of vocabularies are very widely used by the Web community with well-known prefixes; the Dublin Core vocabulary is a good example. The prefixes for these common vocabularies tend to be declared over and over again, and sometimes Web page authors forget to declare them altogether.
To alleviate this issue, RDFa introduces the concept of an initial context that defines a set of default prefixes. The W3C maintains and regularly updates this list of predefined prefixes, which is known to the RDFa processor. Prefix declarations in a document always override the defaults. However, if a Web page author forgets to declare a common vocabulary such as Dublin Core or FOAF, the RDFa processor falls back to the defaults. The list of default prefixes is available on the Web for everyone to read.
For example, the following markup does not declare the dc:
prefix using a prefix
attribute:
<html> <head> ... </head> <body> <div> <h2 property="dc:title" >The trouble with Bob</h2> ... <h3 property="dc:creator" resource="#me">Alice</h3> ... </div> </body> </html>insert: </div>
However, an RDFa processor still recognizes the dc:title
and dc:creator
shorthands and expands the values to the corresponding URLs. The RDFa processor is able to do this because the dc
prefix is part of the default prefixes in the initial context.
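The override-then-fallback behavior of the initial context maps naturally onto a chained lookup. The Python sketch below uses collections.ChainMap so that the document's own declarations are searched before the defaults. INITIAL_CONTEXT is a hand-picked subset copied for illustration only. The authoritative list is the one published by the W3C. The function name expand is hypothetical.

```python
from collections import ChainMap

# A small subset of the RDFa 1.1 initial context, for illustration only.
INITIAL_CONTEXT = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "og": "http://ogp.me/ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(curie: str, declared=None) -> str:
    """Look the prefix up in the document's declarations first, then the defaults."""
    scope = ChainMap(declared or {}, INITIAL_CONTEXT)
    prefix, _, reference = curie.partition(":")
    return scope[prefix] + reference


# No prefix attribute: dc falls back to the initial context.
print(expand("dc:title"))
# An explicit declaration always wins over the default.
print(expand("dc:title", {"dc": "http://purl.org/dc/elements/1.1/"}))
```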
Default prefixes are used as a mechanism to correct RDFa documents in which authors accidentally forgot to declare common prefixes. Authors may rely on these prefixes being available for RDFa 1.1 documents, but the list may change over the course of 5-10 years. That said, the W3C's policy is that once a prefix is defined as part of a default profile, that particular prefix will not be changed or removed. Nevertheless, the best way to ensure that the prefixes document authors use always map to their intent is to use the prefix
attribute to declare these prefixes.
Since default prefixes are meant to be a last-resort mechanism to help novice document authors, the markup above is not recommended. The rest of this document follows authoring best practices and declares all prefixes, in order to make the document author's intentions explicit.
delete: </div> delete: </div> delete: </div> delete: <div id="going-deeper--rdfa-core" typeof="bibo:Chapter" about="#going-deeper--rdfa-core" class="section"> insert: </section>As we have seen in the previous sections, RDFa Lite is fairly powerful. Alice could indeed express complex sets of structured information. However, there are cases where the set of attributes presented so far does not cover all needs, or makes the resulting HTML structure a bit awkward and possibly error-prone. In those cases, additional RDFa attributes may come to the rescue; some of them are presented in this section.
delete: <p class="note"> insert: <div class="note">RDFa Lite does not define a separate class of RDFa processors. In other words conforming RDFa processors are supposed to handle all RDFa features, not only those listed used by RDFa Lite.
delete: <div id="using-the-content-attribute" typeof="bibo:Chapter" about="#using-the-content-attribute" class="section"> insert: </div> content
attribute
When creating her blog, Alice decided to use this simple structure to add Dublin Core information to her blog post (see also Figure 2):
insert: <div class="example"><html> <head> ... </head> <body> ... <h2 property="http://purl.org/dc/terms/title" >The Trouble with Bob</h2> <p>Date: <span property="http://purl.org/dc/terms/created" >2011-09-10</span></p> ... </body> </html>insert: </div>
However, to do that, Alice had to accept a small compromise. Although the string "2011-09-10" unambiguously identifies a date for a machine, it does not look very natural to a human reader. A native English reader would surely prefer something like "10th of September, 2011". Conversely, a machine could parse and interpret that string as a date too, but doing so is clearly more complicated. The problem is that, by default, RDFa uses the textual content of the element as the property value. This works well in most cases, but sometimes, as in this example, it has awkward consequences.
To alleviate this problem RDFa makes it possible to re-use the content
attribute of HTML. The blog entry could be written as follows:
<html> <head> ... </head> <body> ... <h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2> <p>Date: <span property="http://purl.org/dc/terms/created" content="2011-09-10" >10th of September, 2011</span></p> ... </body> </html>
insert: </div> The resulting structure is exactly the same as before (i.e., Figure 2). The difference is the presence of the content
attribute. It instructs the RDFa processor to override the default behavior of using the textual content, and to use the value of the content
attribute instead. Using this attribute, Alice can provide a more readable date while keeping an unambiguous value for machines that use the structured data.
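The rule "content wins over the element's text" can be demonstrated with Python's standard html.parser module. The class below is a hypothetical, deliberately simplified sketch, not an RDFa processor. It records each element that carries property and reports the value of content when present, and the accumulated text otherwise. Prefixes, datatypes, and subjects or objects given by resource or href are all ignored.

```python
from html.parser import HTMLParser


class PropertyValueSketch(HTMLParser):
    """Collect (property, value) pairs, preferring content over the element text."""

    VOID = {"meta", "link", "br", "img", "hr", "input"}

    def __init__(self):
        super().__init__()
        self.open = []    # stack of [tag, property, content, text parts]
        self.values = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in self.VOID:
            # Void elements have no text, so only content can carry a value.
            if "property" in a:
                self.values.append((a["property"], a.get("content", "")))
            return
        self.open.append([tag, a.get("property"), a.get("content"), []])

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for entry in self.open:
            entry[3].append(data)

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        while self.open:
            name, prop, content, parts = self.open.pop()
            if prop is not None:
                # The content attribute, when present, overrides the text.
                value = content if content is not None else "".join(parts)
                self.values.append((prop, value))
            if name == tag:
                break


snippet = (
    '<h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>'
    '<p>Date: <span property="http://purl.org/dc/terms/created" '
    'content="2011-09-10">10th of September, 2011</span></p>'
)
parser = PropertyValueSketch()
parser.feed(snippet)
parser.close()
for prop, value in parser.values:
    print(prop, "=", value)
```

The title comes from the heading text, while the date comes from content and not from "10th of September, 2011".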
The content
attribute has another important usage. The "traditional" approach to adding simple metadata to a Web page has been to use the document header through the link
and the meta
elements. While there is no problem using link
in RDFa Lite (which uses the href
attribute, i.e., it can be used to define "flavored" links), the fact that, in a conforming HTML file, the meta
element cannot have any text content means that the only way of using the header for such statements is to use the content
attribute. For example, using the meta
element is the approach suggested by Facebook for the Open Graph Protocol [OGP] vocabulary; i.e., if Alice wants to make use of the "Like" button in her posts, this is what she would add to her header:
<html> <head prefix="og: http://ogp.me/ns#" > ... <meta property="og:title" content="The Trouble with Bob" /> <meta property="og:type" content="text" /> <meta property="og:image" content="http://example.com/alice/bob-ugly.jpg" /> ... </head> <body> ... </body> </html>delete: <p class="note"> insert: </div>
In this example the prefix for the Open Graph Protocol vocabulary is defined via the prefix
attribute. Alas, many authors forget to do so. Fortunately, the og
prefix is part of the initial context for RDFa, so the resulting structured data will be the same even without the prefix declaration.
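To illustrate both points (meta carries its value only in content, and og is available from the initial context), here is a hypothetical Python sketch built on the standard html.parser module. It collects property/content pairs from meta elements in a head without a prefix attribute. It expands only the og prefix, using a one-entry copy of the initial context. Every other value is left untouched.

```python
from html.parser import HTMLParser

# Only the og entry of the RDFa initial context is reproduced here.
DEFAULT_PREFIXES = {"og": "http://ogp.me/ns#"}


def expand(curie: str) -> str:
    prefix, sep, reference = curie.partition(":")
    if sep and prefix in DEFAULT_PREFIXES:
        return DEFAULT_PREFIXES[prefix] + reference
    return curie  # anything else is left untouched in this sketch


class MetaPropertySketch(HTMLParser):
    """Collect the property/content pairs carried by <meta> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # meta is a void element: content is the only place for the value.
        if tag == "meta" and "property" in a and "content" in a:
            self.pairs.append((expand(a["property"]), a["content"]))


# The head from the example above, without the prefix attribute
head = (
    '<head>'
    '<meta property="og:title" content="The Trouble with Bob" />'
    '<meta property="og:type" content="text" />'
    '<meta property="og:image" content="http://example.com/alice/bob-ugly.jpg" />'
    '</head>'
)
parser = MetaPropertySketch()
parser.feed(head)
parser.close()
for prop, value in parser.pairs:
    print(prop, "=", value)
```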
Alice has already put license information on her page:
insert: <div class="example"><p>All content on this site is licensed under <a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>insert: </div>
but she would like to complete this by recording the date of her copyright statement as structured data, too. She can use the date
term of Dublin Core:
<p>All content on this site is licensed under <a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. ©<span property="dc:date" >2011</span> Alice Birpemswick.</p>
insert: </div> However, the value used for the date may be ambiguous for machines. Of course, if a program "knows" that http://purl.org/dc/terms/date
refers to a date, then it can find out that the string "2011" stands for a year. But some processors may, for example, provide a visual presentation of all the structured data on a specific page. Such a processor would like to use one "widget" to represent a year and another one to represent, say, an integer. How would it know which one to choose?
Alice may decide to be helpful by adding further information to that item in the form of a datatype. This information can be conveyed to the RDFa processor using the datatype
RDFa attribute as follows:
<p>All content on this site is licensed under <a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. ©<span property="dc:date" datatype="xsd:gYear" >2011</span> Alice Birpemswick.</p>
insert: </div> where xsd:gYear
stands for http://www.w3.org/2001/XMLSchema#gYear
, and is one of the standard datatypes defined by the W3C's Datatype specification [XMLSCHEMA11-2], which contains such types as booleans, integers, dates, and doubles. ( xsd
is one of the default prefixes for RDFa.)
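In the RDF output, the datatype travels with the literal, which is what lets a processor choose a suitable presentation. The Python sketch below writes a literal in Turtle notation, with or without a datatype, and expands the xsd prefix from its well-known URL. The WIDGETS table and both function names are hypothetical. They stand in for whatever a visualizing tool might actually do.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical mapping a visualizing processor might use to pick a widget
WIDGETS = {XSD + "gYear": "year picker", XSD + "integer": "number field"}


def typed_literal(value, datatype=None):
    """Write a literal in Turtle, optionally typed with a datatype URL."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    if datatype is None:
        return f'"{escaped}"'
    if datatype.startswith("xsd:"):
        datatype = XSD + datatype[len("xsd:"):]   # xsd is a default prefix
    return f'"{escaped}"^^<{datatype}>'


def widget_for(datatype=None):
    """Without a datatype, the processor can only fall back to plain text."""
    return WIDGETS.get(datatype, "text field")


print(typed_literal("2011"))               # untyped: just a string
print(typed_literal("2011", "xsd:gYear"))  # typed as a year
print(widget_for(XSD + "gYear"))
```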
about
Alice has used the following pattern to define structured data for the individual blog posts:
insert: <div class="example"><div resource="/alice/posts/trouble_with_bob"> <h2 property="title">The trouble with Bob</h2> <h3 property="creator" resource="#me">Alice</h3> ... </div>insert: </div>
The role of the resource
attribute in the div
element is to set the "context", i.e., the subject for all the subsequent statements. Also, when combined with the property
attribute, resource
can be used to set the "target", i.e., the object of the statement (much like href
).
This pattern is perfectly fine, but it may become too verbose in some cases. Suppose, for example, that Alice would like to set up a separate index page for all her blog posts, and that the only structured data she wants there is the titles of the posts. Following the same pattern, she would have to do something like:
insert: <div class="example"><ul> <li resource="/alice/posts/trouble_with_bob" ><span property="title" >The trouble with Bob</span></li> <li resource="/alice/posts/jos_barbecue" ><span property="title" >Jo's Barbecue</span></li> ... </ul>insert: </div>
This of course works, but it is a bit convoluted. Merging the information into one element, i.e.:
insert: <div class="example"><ul> <li resource="/alice/posts/trouble_with_bob" property="title">The trouble with Bob</li> ... </ul>insert: </div>
would not be correct; the combination of property
and resource
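would generate a different statement than intended. When resource appears on the same element as property, it provides the object of the statement rather than its subject. The markup would therefore say that the enclosing subject has the blog post's URL as its title, and the text "The trouble with Bob" would be ignored.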
RDFa introduces a separate attribute, called about
, that can be used as an alternative to resource
in setting the context. Using that attribute, Alice could write:
<ul> <li about="/alice/posts/trouble_with_bob" property="title">The trouble with Bob</li> <li about="/alice/posts/jos_barbecue" property="title">Jo's Barbecue</li> ... </ul>insert: </div>
The fundamental difference between about
and resource
is that the former is only used to set the context, whether combined with the property
attribute on the same element or not. This also means that, on an element without a property attribute, about
and resource
are interchangeable; i.e., in her original blog item, Alice could have chosen to write:
<div about="/alice/posts/trouble_with_bob" > <h2 property="title">The trouble with Bob</h2> <h3 property="creator" resource="#me">Alice</h3> ... </div>
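The difference between about and resource can be summarized as a small decision rule. The Python sketch below is a deliberately reduced model of the RDFa 1.1 processing rules. It considers only about, resource and property, and it ignores typeof, href, src, rel, content and datatype. The function name statement_roles and the PAGE value are hypothetical.

```python
from typing import Optional, Tuple

POST = "/alice/posts/trouble_with_bob"
PAGE = "/alice/posts/"   # hypothetical URL of the enclosing index page


def statement_roles(attrs: dict, parent_subject: str) -> Tuple[str, Optional[str]]:
    """Return (subject, object resource) for one element (simplified)."""
    if "property" in attrs:
        # about still sets the subject; resource now names the object.
        return attrs.get("about", parent_subject), attrs.get("resource")
    # Without property, either attribute simply sets the new subject.
    return attrs.get("about") or attrs.get("resource") or parent_subject, None


# <li about=... property="title">: the title text is attached to the post.
print(statement_roles({"about": POST, "property": "title"}, PAGE))
# <li resource=... property="title">: the post becomes the object instead.
print(statement_roles({"resource": POST, "property": "title"}, PAGE))
# Without property, about and resource are interchangeable.
print(statement_roles({"about": POST}, PAGE), statement_roles({"resource": POST}, PAGE))
```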
rel
Another pattern that Alice used in her code is as follows:
insert: <div class="example"><div vocab="http://xmlns.com/foaf/0.1/" resource="#me"> <ul> <li property="knows" resource="http://example.com/bob/#me" typeof="Person"> <a property="homepage" href="http://example.com/bob/"><span property="name">Bob</span></a> </li> <li property="knows" resource="http://example.com/eve/#me" typeof="Person"> <a property="homepage" href="http://example.com/eve/"><span property="name">Eve</span></a> </li> <li property="knows" resource="http://example.com/manu/#me" typeof="Person"> <a property="homepage" href="http://example.com/manu/"><span property="name">Manu</span></a> </li> </ul> </div>insert: </div>
Each "branch" in the list sets a separate object, and the same property ( foaf:knows
) is used to bind them to the same context. The property="knows"
had to be repeated in each list element to define the corresponding property. If this structure is generated by a CMS, this is of course not a problem. However, if such a structure is authored manually, it is clearly error-prone: the property name can be misspelled or forgotten.
Instead, Alice could use another RDFa attribute, namely rel
. With this attribute, the corresponding HTML would look like this:
<div vocab="http://xmlns.com/foaf/0.1/" resource="#me"> <ul rel="knows" > <li resource="http://example.com/bob/#me" typeof="Person"> <a property="homepage" href="http://example.com/bob/"><span property="name">Bob</span></a> </li> <li resource="http://example.com/eve/#me" typeof="Person"> <a property="homepage" href="http://example.com/eve/"><span property="name">Eve</span></a> </li> <li resource="http://example.com/manu/#me" typeof="Person"> <a property="homepage" href="http://example.com/manu/"><span property="name">Manu</span></a> </li> </ul> </div>
insert: </div> In contrast to property
, rel
never considers the textual content of an element (or the value of the content
attribute). Instead, if no clear target has been specified for a link via, e.g., a resource
or an href
attribute, the processor is supposed to go “down”, find one or more targets in the hierarchy, and use those. This is what happens in this case: the rel="knows"
attribute on the ul
element does not include any obvious target; however, the processor finds those in the individual li
elements and will use those. This type of pattern is typical for the usage of rel.
insert: </p>
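The "go down and find the targets" behavior of rel can be sketched with Python's standard html.parser module. In the hypothetical class below, an element with rel but no resource or href leaves its statements pending. The next descendant that names a resource (through resource or about) completes them, and its own children start with nothing pending. This is a reduced model of RDFa's incomplete-triple handling. It examines only rel, resource and about, and it resolves every rel term against the FOAF vocabulary declared in the example.

```python
from html.parser import HTMLParser

FOAF = "http://xmlns.com/foaf/0.1/"   # taken from the vocab attribute


class HangingRelSketch(HTMLParser):
    """Complete "hanging" rel statements with resources found further down."""

    VOID = {"meta", "link", "br", "img", "hr", "input"}

    def __init__(self):
        super().__init__()
        # Each entry: (tag, current subject, pending (subject, property) pairs)
        self.stack = [("#root", None, [])]
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        _, subject, pending = self.stack[-1]
        target = a.get("about") or a.get("resource")
        if target:
            # A new resource completes every statement left open above it.
            self.triples.extend((s, p, target) for s, p in pending)
            subject, pending = target, []
        if "rel" in a and "resource" not in a and "href" not in a:
            # No target here: leave the statement open for the descendants.
            pending = [(subject, FOAF + term) for term in a["rel"].split()]
        if tag not in self.VOID:
            self.stack.append((tag, subject, pending))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; the root entry is never removed.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


snippet = """
<div vocab="http://xmlns.com/foaf/0.1/" resource="#me">
  <ul rel="knows">
    <li resource="http://example.com/bob/#me" typeof="Person">
      <a property="homepage" href="http://example.com/bob/"><span property="name">Bob</span></a>
    </li>
    <li resource="http://example.com/eve/#me" typeof="Person">
      <a property="homepage" href="http://example.com/eve/"><span property="name">Eve</span></a>
    </li>
    <li resource="http://example.com/manu/#me" typeof="Person">
      <a property="homepage" href="http://example.com/manu/"><span property="name">Manu</span></a>
    </li>
  </ul>
</div>
"""
parser = HangingRelSketch()
parser.feed(snippet)
parser.close()
for triple in parser.triples:
    print(triple)
```

The single rel="knows" on the ul yields one foaf:knows statement per li, which matches what the repeated property="knows" produced before.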
In many situations, property
and rel
are interchangeable when the intended structured data involves (flavored) links. There are, however, subtle differences, for example around “chaining”, so the two must be used with care. The interested reader should consult the relevant section of the RDFa 1.1 specification for further details. In general, it is advisable to use property
, when possible.
RDFa benefits from the power of RDF [RDF-PRIMER], the W3C's standard for interoperable machine-readable data. Although readers of this document are not expected to understand RDF, some may be interested in how these two specifications interrelate.
RDF, the Resource Description Framework, is the abstract data representation we have drawn out as graphs in the examples above. Each arrow in the graph is represented as a subject-property-object triple: the subject is the node at the start of the arrow, the property is the arrow itself, and the object is the node or literal at the end of the arrow. A set of such RDF triples is often called an "RDF graph" and is typically stored in a "Triple Store" or "Graph Store".
Consider the first example graph:
delete: <div class="figure c1"> insert: <figure class="figure c1" id="fig-relationship-value-is-text">The two RDF triples for this graph, written in the Turtle syntax [TURTLE] for RDF, are as follows:
insert: <div class="example"><http://www.example.com/alice/posts/trouble_with_bob> <http://purl.org/dc/terms/title> "The Trouble with Bob" ; <http://purl.org/dc/terms/created> "2011-09-10" .insert: </div>
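The Turtle shown above groups the statements about one subject with ";" and ends the group with ".". The Python sketch below is a minimal illustration of that layout, not a full Turtle serializer. It handles only full URLs and plain string literals, with no prefixes, typed literals or language tags. The function names quote and to_turtle are hypothetical.

```python
from itertools import groupby


def quote(text: str) -> str:
    """Turtle string literal with backslashes and double quotes escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(triples) -> str:
    """Serialize (subject, property, literal) triples, one block per subject."""
    blocks = []
    by_subject = sorted(triples, key=lambda t: t[0])  # groupby needs adjacent keys
    for subject, group in groupby(by_subject, key=lambda t: t[0]):
        pairs = [f"<{prop}> {quote(value)}" for _, prop, value in group]
        blocks.append(f"<{subject}> " + " ;\n    ".join(pairs) + " .")
    return "\n".join(blocks)


TRIPLES = [
    ("http://www.example.com/alice/posts/trouble_with_bob",
     "http://purl.org/dc/terms/title", "The Trouble with Bob"),
    ("http://www.example.com/alice/posts/trouble_with_bob",
     "http://purl.org/dc/terms/created", "2011-09-10"),
]
print(to_turtle(TRIPLES))
```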
The TYPE arrows we drew are no different from other arrows. The TYPE is just another property that happens to be a core RDF property, namely rdf:type
. The rdf
vocabulary is located at http://www.w3.org/1999/02/22-rdf-syntax-ns#
. The contact information example from above should thus be diagrammed as:
The point of RDF is to provide a universal language for expressing data and relationships. A unit of data can have any number of properties that are expressed as URLs. These URLs can be reused by any publisher, much like any web publisher can link to any web page, even ones they did not create themselves. Given data in the form of RDF triples collected from various locations, one can use the RDF query language SPARQL [SPARQL-QUERY] to search for "friends of Alice's who created items whose title contains the word 'Bob'," whether those items are blog posts, videos, calendar events, or other data types.
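To give a feel for what such a query does, the Python sketch below answers the same question by pattern matching over a small set of triples. It uses plain Python in place of SPARQL, so it is not SPARQL syntax. Apart from the FOAF and Dublin Core property URLs, every URL and title in the data is hypothetical. The substring test is only an approximation of "contains the word".

```python
KNOWS = "http://xmlns.com/foaf/0.1/knows"
CREATOR = "http://purl.org/dc/terms/creator"
TITLE = "http://purl.org/dc/terms/title"
ALICE = "http://example.com/alice#me"
BOB = "http://example.com/bob/#me"

# Hypothetical triples, as if collected from several sites
triples = {
    (ALICE, KNOWS, BOB),
    ("http://example.com/bob/photos/1", CREATOR, BOB),
    ("http://example.com/bob/photos/1", TITLE, "Bob at the beach"),
    ("http://example.com/eve/video/7", CREATOR, "http://example.com/eve/#me"),
    ("http://example.com/eve/video/7", TITLE, "Bob's bad haircut"),
}


def objects(subject, prop):
    """All objects of the triples matching (subject, prop, ?)."""
    return {o for (s, p, o) in triples if s == subject and p == prop}


friends = objects(ALICE, KNOWS)
matches = sorted(
    item for (item, p, creator) in triples
    if p == CREATOR and creator in friends
    and any("Bob" in title for title in objects(item, TITLE))
)
print(matches)
```

Eve's video mentions Bob but is excluded, because in this data Eve is not among Alice's friends.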
RDF is an abstract data model meant to maximize the reuse of vocabularies. RDFa is a way to express RDF data within HTML in a machine-readable way, by reusing the existing human-readable data in the document.
delete: <div id="custom-vocabularies" typeof="bibo:Chapter" about="#custom-vocabularies" class="section"> insert: <section id="custom-vocabularies">As Alice marks up her page with RDFa, she may discover the need to express data, such as her favorite photos, that is not covered by existing vocabularies. If she needs to, Alice can create a custom vocabulary suited for her needs. Once a vocabulary is created, it can be used in RDFa markup like any other vocabulary.
The instructions on how to create a vocabulary, also known as an RDF Schema, are available in Section 5 of the RDF Primer [RDF-PRIMER]. At a high level, the creation of a vocabulary for RDFa involves:
1. Selecting a URL for the vocabulary, for example http://example.com/photos/vocab#.
2. Publishing a document at that URL that defines the terms of the vocabulary. For example, Alice might define the classes Photo and Camera, as well as the property takenWith that relates a photo to the camera with which it was taken.
3. Using the vocabulary in RDFa markup, either with the vocab attribute or with the prefix declaration mechanism. For example: prefix="photo: http://example.com/photos/vocab#" and typeof="photo:Camera".
It is worth noting that anyone who can publish a document on the Web can publish a vocabulary and thus define new data fields they may wish to express. RDF and RDFa allow fully distributed extensibility of vocabularies.
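For the photo example, the vocabulary document itself can be very small. The Python sketch below generates a minimal RDF Schema in Turtle for the classes Photo and Camera and the property takenWith. It assumes that takenWith relates a Photo (its rdfs:domain) to a Camera (its rdfs:range), as the description above suggests. The vocabulary URL is the example one from the text. The function name vocabulary_turtle is hypothetical. A real vocabulary would normally also carry labels and comments.

```python
VOCAB = "http://example.com/photos/vocab#"   # the example URL from the text


def vocabulary_turtle(classes, properties):
    """Return a minimal RDF Schema document in Turtle.

    properties maps a property name to a (domain class, range class) pair.
    """
    out = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix photo: <{VOCAB}> .",
        "",
    ]
    for name in classes:
        out.append(f"photo:{name} a rdfs:Class .")
    for name, (domain, range_) in properties.items():
        out.append(f"photo:{name} a rdf:Property ;")
        out.append(f"    rdfs:domain photo:{domain} ;")
        out.append(f"    rdfs:range photo:{range_} .")
    return "\n".join(out)


doc = vocabulary_turtle(["Photo", "Camera"], {"takenWith": ("Photo", "Camera")})
print(doc)
```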
delete: </div> delete: </div> delete: <div id="rdfa-tools" typeof="bibo:Chapter" about="#rdfa-tools" class="section"> insert: </section>There is a wide variety of tools that can be used to generate or process RDFa data. A good source for these is the RDFa page of the W3C Semantic Web Wiki, although some tools listed there may relate to a previous version of RDFa. Another source is the RDFa community site’s implementation page. Both sources are constantly evolving. The latter is part of a more general community site that contains further examples of using RDFa, general information, and information on how to get involved. In particular, RDFa fragments can be tested using the real-time RDFa 1.1 editor, which can also display a visual representation of the underlying structured data.
delete: </div> delete: <div id="acknowledgments" typeof="bibo:Chapter" about="#acknowledgments" class="section"> insert: </section>At the time of publication, the active members of the RDF Web Applications Working Group were:
Thanks also to Grant Robertson and Guus Schreiber who, though not part of the Working Group, have provided useful comments on earlier drafts of this note.
delete: </div> delete: <div id="references" class="appendix section" typeof="bibo:Chapter" about="#references"> insert: </section>