W3C

RDF/A Primer 1.0

Embedding RDF in XHTML

draft 09 January 2006

This version:
http://www.w3.org/2001/sw/BestPractices/HTML/2006-01-15-rdfa-primer
Latest version:
http://www.w3.org/2006/07/SWD/RDFa/primer
Previous version:
http://www.w3.org/2001/sw/BestPractices/HTML/2005-rdfa-primer
Editor:
Ben Adida, Creative Commons <ben@creativecommons.org>

Abstract

This document introduces the RDF/A syntax for expressing RDF metadata within XHTML. The reader is expected to be fairly familiar with XHTML, and somewhat familiar with RDF.

Status of this Document

This is an internal draft produced by the RDF-in-HTML task force [RDFHTML], a joint task force of the Semantic Web Best Practices and Deployment Working Group [SWBPD-WG] and HTML Working Group [HTML-WG].

This document is for internal review only and is subject to change without notice. This document has no formal standing within the W3C.

Table of Contents

1 Purpose of RDF/A and Preliminaries
2 A Scenario: The Shutr Photo Management System
3 Simple Metadata
    3.1 Literal Properties
    3.2 URI Properties
4 Beyond the Current Document
    4.1 Qualifying Other Documents
    4.2 Inheriting about
    4.3 Qualifying Chunks of Documents
    4.4 Compact URIs (CURIEs)
        4.4.1 Mixing CURIEs and URIs
        4.4.2 Which Attributes are Which?
        4.4.3 Back to Shutr
5 Bibliography


1 Purpose of RDF/A and Preliminaries

RDF/A is a set of attributes used to embed RDF in XHTML. An important goal of RDF/A is to achieve this RDF embedding without repeating existing XHTML content when that content is the metadata. Though RDF/A was initially designed for XHTML2, one should be able to use RDF/A with other XML dialects, e.g. XHTML1, SVG, given proper schema additions.

We note that RDF/A makes use of XML namespaces. In this document, we assume, for simplicity's sake, that the following namespaces are defined: dc for Dublin Core, foaf for FOAF, cc for Creative Commons, and xsd for XML Schema Definitions.

2 A Scenario: The Shutr Photo Management System

Consider a (fictional) photo management web site called Shutr, located at http://shutr.net. Users of Shutr can upload their photos at will, annotate them, organize them into albums, and share them with the world. They can choose to keep these photos private, or make them available for public consumption under licensing terms of their choosing.

The primary interface to Shutr is its web site and the XHTML it delivers. Since photos are contributed by users with a significant amount of built-in metadata (camera type, exposure, etc.) and additional, explicitly provided metadata (photo caption, license, photographer's name), Shutr may benefit from using RDF to express this rich metadata.

We explore how Shutr might use RDF/A to express this RDF metadata right in the XHTML it already publishes. We assume an additional XML namespace, shutr, which corresponds to URI http://shutr.net/rdf/shutr#.

3 Simple Metadata

The simplest structured metadata Shutr might want to expose is basic information about a photo album: the creator of the album, the date of creation, and its license. We consider literal properties first, and URI properties second. (We ignore photo-specific metadata for now, as that involves RDF statements about an image, which is not an XHTML document. We will, of course, get back to this soon.)

3.1 Literal Properties

A literal property is a string of text, e.g. "Ben Adida", a number, e.g. "28", or any other typed, self-contained datum that one might want to express as a metadata property.

Consider Mark Birbeck, a user of the Shutr system with username markb, and his latest photo album "Vacation in the South of France." This photo album resides at http://shutr.net/user/markb/album/12345. The XHTML document presented upon request of that URI includes the following XHTML snippet:

<h1>Photo Album #12345: Vacation in the South of France</h1>
<h2>created by Mark Birbeck</h2>

Notice how the rendered XHTML contains elements of the photo album's structured metadata. Using RDF/A, Shutr can mark up this XHTML to indicate these structured metadata properties without repeating the raw data:

<h1>Photo Album #12345: <span property="dc:title">Vacation in the South of France</span></h1>
<h2>created by <span property="dc:creator">Mark Birbeck</span></h2>

An RDF/A-aware browser would thus extract the following RDF triples:

<> dc:title "Vacation in the South of France"^^XMLLiteral .
<> dc:creator "Mark Birbeck"^^XMLLiteral .

(The ^^XMLLiteral notation, which denotes a datatype, will be explained shortly.)

One might wonder, given the above example, if the span element is required to attach RDF properties to rendered content. In fact, it is not: the property attribute can be used on any XHTML element. For example, if the original HTML did not include the explicit words "Photo Album #12345":

<h1>Vacation in the South of France</h1>
<h2>created by Mark Birbeck</h2>

Then the RDF/A might look like this:

<h1 property="dc:title">Vacation in the South of France</h1>
<h2>created by <span property="dc:creator">Mark Birbeck</span></h2>

and would yield the same RDF triples, of course.

A reader who knows about XML datatypes might, at this point in the presentation, wonder what datatype these values will have. Given the above RDF/A, "Vacation in the South of France" is an XML Literal. In some cases, this may not be appropriate. Consider an expanded HTML snippet which includes the photo album's creation date:

<h1>Vacation in the South of France</h1>
<h2>created by Mark Birbeck on 2006-01-02</h2>

A precise way to augment this HTML with RDF/A is:

<h1 property="dc:title">Vacation in the South of France</h1>
<h2>created by <span property="dc:creator">Mark Birbeck</span>
    on <span property="dc:date" type="xsd:date">2006-01-02</span></h2>

which would yield the following triples (note that the default datatype is XMLLiteral, which explains the first example above):

<> dc:title "Vacation in the South of France"^^XMLLiteral .
<> dc:creator "Mark Birbeck"^^XMLLiteral .
<> dc:date "2006-01-02"^^xsd:date .

Going further, Shutr realizes that 2006-01-02, while a correct xsd:date representation, is not exactly user-friendly. In this case, having the rendered data be the same as the structured data might not be the right answer. Shutr may instead opt for the following RDF/A:

<h1 property="dc:title">Vacation in the South of France</h1>
<h2>created 
  by <span property="dc:creator">Mark Birbeck</span>
  on <span property="dc:date" type="xsd:date"
           content="2006-01-02">
    January 2nd, 2006
     </span>
</h2>

The above XHTML will render the date as "January 2nd, 2006" but will yield the exact same triples as above. The use of the content attribute should be limited to cases where the rendered text is not well-enough structured to represent the metadata.
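
The rule described above (use the content attribute when present, otherwise the rendered text, with XMLLiteral as the default datatype) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a conforming RDF/A processor: the function name literal_triples is hypothetical, surrounding whitespace is trimmed, and an XMLLiteral is simplified to its text content.

```python
import xml.etree.ElementTree as ET

def literal_triples(xhtml_fragment, subject=""):
    """Collect (subject, property, value, datatype) for each element
    that carries a property attribute. Hypothetical helper."""
    root = ET.fromstring(xhtml_fragment)
    triples = []
    for element in root.iter():
        prop = element.get("property")
        if prop is None:
            continue
        # Prefer the machine-readable content attribute over the rendered text.
        value = element.get("content")
        if value is None:
            # Assumption: trim whitespace and ignore nested markup.
            value = "".join(element.itertext()).strip()
        # Assumption taken from the text: the default datatype is XMLLiteral.
        datatype = element.get("type", "XMLLiteral")
        triples.append((subject, prop, value, datatype))
    return triples

snippet = """<h2>created
  by <span property="dc:creator">Mark Birbeck</span>
  on <span property="dc:date" type="xsd:date"
           content="2006-01-02">
    January 2nd, 2006
  </span>
</h2>"""

for triple in literal_triples(snippet):
    print(triple)
```

The empty string stands for the current document, mirroring the <> subject in the triples above. The rendered text "January 2nd, 2006" never reaches the output, because the content attribute takes precedence.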

3.2 URI Properties

A URI property is one that is merely a reference to a web-accessible resource, e.g. an image, a PDF document, or another XHTML document, all reachable via the web.

As Mark Birbeck uploads many photo albums to Shutr, the site decides to build a user-profile page for him, a page that summarizes all of his albums and user profile information for others to see. This profile lives at http://shutr.net/user/markb. Thus, the dc:creator property should probably reference this URI. At the same time, Mark's name on the Shutr site should consistently link to this same URI in a clickable fashion.

The raw XHTML snippet might look like:

<h2>created by <a href="/user/markb">Mark Birbeck</a></h2>

Using the rel attribute, one can easily update this HTML to include an RDF/A statement:

<h2>created by <a rel="dc:creator" href="/user/markb">Mark Birbeck</a></h2>

This would then yield the expected triple:

<> dc:creator </user/markb> .

Similarly, Shutr may want to give its users the ability to license their photos to the world under certain specific conditions. For this purpose, there are numerous existing licenses, including those published by Creative Commons. Thus, if Mark Birbeck chooses to license his vacation album for others to reuse, Shutr might use the following XHTML snippet (currently -- January 2006 -- recommended by Creative Commons):

This document is licensed under a
<a href="http://creativecommons.org/licenses/by-nc/2.5/">
  Creative Commons Non-Commercial License
</a>.

This clickable link has an intended semantic meaning: it is the document's license. Using RDF/A can cement that meaning within the XHTML itself:

This document is licensed under a
<a rel="cc:license"
   href="http://creativecommons.org/licenses/by-nc/2.5/">
  Creative Commons Non-Commercial License
</a>.

Note the use of the rel attribute to indicate a URI property rather than a textual one. The use of this attribute goes hand in hand with an href attribute within the same element. This href attribute indicates the URI object of the RDF triple. Thus, the above RDF/A yields the following triple:

<> cc:license <http://creativecommons.org/licenses/by-nc/2.5/> .

Compared with other existing RDF mechanisms to indicate Creative Commons licensing (e.g. a parallel RDF/XML file, or inline RDF/XML within XHTML comments), the RDF/A approach provides Creative Commons and Shutr with a significant integrity advantage: the clickable link is the semantic link, and any change to the target will change both the human and machine views. Also, a simple copy-and-paste of the XHTML will carry through both the rendered and semantic data.

In both cases, the target URI may provide an XHTML document which includes further RDF/A statements. The Creative Commons license page, for example, may include RDF/A statements about its legal details.

4 Beyond the Current Document

The above examples casually swept under the rug the issue of the RDF subject: all the triples expressed were about the current document representing a photo album. However, not all RDF triples in a given XHTML2 document will be about that document itself. In RDF/A, the default subject is the current document, but it can easily be overridden using the about attribute.

4.1 Qualifying Other Documents

Shutr may choose to present many photos in a given XHTML page. In particular, at the URI http://shutr.net/user/markb/album/12345, all of the album's photos will appear inline. Metadata about each photo can be included simply by specifying an about attribute:

<ul>
  <li> <img src="/user/markb/photo/23456" />,
    <span about="/user/markb/photo/23456" property="dc:title">
      Sunset in Nice
    </span>
  </li>

  <li> <img src="/user/markb/photo/34567" />,
    <span about="/user/markb/photo/34567" property="dc:title">
      W3C Meeting in Mandelieu
    </span>
  </li>
</ul>

The above RDF/A yields the following triples:

</user/markb/photo/23456> dc:title "Sunset in Nice"^^XMLLiteral .

</user/markb/photo/34567> dc:title "W3C Meeting in Mandelieu"^^XMLLiteral .

This same approach applies to statements with URI objects. For example, each photo in the album has a creator and may have its own usage license.

<ul>
  <li> <img src="/user/markb/photo/23456" />,
    <span about="/user/markb/photo/23456" property="dc:title">
      Sunset in Nice
    </span>
    taken by photographer
    <a about="/user/markb/photo/23456" 
       rel="dc:creator"
       href="/user/markb">
      Mark Birbeck
    </a>,
    licensed under a
    <a about="/user/markb/photo/23456" rel="cc:license"
       href="http://creativecommons.org/licenses/by-nc/2.5/">
      Creative Commons Non-Commercial License
    </a>.
  </li>

  <li> <img src="/user/markb/photo/34567" /> 
    <span about="/user/markb/photo/34567" property="dc:title">
      W3C Meeting in Mandelieu
    </span>
    taken by photographer
    <a about="/user/markb/photo/34567"
       rel="dc:creator"
       href="/user/stevenp">
      Steven Pemberton
    </a>,
    licensed under a
    <a about="/user/markb/photo/34567" rel="cc:license"
       href="http://creativecommons.org/licenses/by/2.5/">
      Creative Commons Commercial License
    </a>.
  </li>
</ul>

This yields the following triples:

</user/markb/photo/23456>
        dc:title "Sunset in Nice"^^XMLLiteral .
</user/markb/photo/23456>
        dc:creator </user/markb> .
</user/markb/photo/23456>
        cc:license <http://creativecommons.org/licenses/by-nc/2.5/> .

</user/markb/photo/34567>
        dc:title "W3C Meeting in Mandelieu"^^XMLLiteral .
</user/markb/photo/34567>
        dc:creator </user/stevenp> .
</user/markb/photo/34567>
        cc:license <http://creativecommons.org/licenses/by/2.5/> .

4.2 Inheriting about

At this point, Shutr might begin to worry about the fast-growing size of its HTML document, given that the photo's URI must be repeated in the about attribute for every RDF property expressed. To address this issue, RDF/A allows the value of this attribute to be inherited from a parent element. In other words, if an element carries a rel or property attribute, but no about attribute, an RDF/A browser will determine the subject of the RDF statement by navigating up the parent hierarchy of that element until it finds an about, or until it gets to the root element, at which point the default is about="".

Thus, the markup for the above example can be simplified to:

<ul>
  <li about="/user/markb/photo/23456">
    <img src="/user/markb/photo/23456" />
    <span property="dc:title">
      Sunset in Nice
    </span>,
    taken by photographer 
    <a rel="dc:creator" href="/user/markb">
      Mark Birbeck
    </a>,
    licensed under a
    <a rel="cc:license"
       href="http://creativecommons.org/licenses/by-nc/2.5/">
      Creative Commons Non-Commercial License
    </a>.
  </li>

  <li about="/user/markb/photo/34567">
    <img src="/user/markb/photo/34567" />
    <span property="dc:title">
      W3C Meeting in Mandelieu
    </span>,
    taken by photographer 
    <a rel="dc:creator" href="/user/stevenp">
      Steven Pemberton
    </a>,
    licensed under a
    <a rel="cc:license"
       href="http://creativecommons.org/licenses/by/2.5/">
      Creative Commons Commercial License
    </a>.
  </li>
</ul>

which yields the same triples as the previous example, though, in this case, one can easily see the parallel to the corresponding N3 shorthand:

</user/markb/photo/23456> dc:title "Sunset in Nice"^^XMLLiteral ;
                          dc:creator </user/markb> ;
                          cc:license <http://creativecommons.org/licenses/by-nc/2.5/> .

</user/markb/photo/34567> dc:title "W3C Meeting in Mandelieu"^^XMLLiteral ;
                          dc:creator </user/stevenp> ;
                          cc:license <http://creativecommons.org/licenses/by/2.5/> .
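
The inheritance rule for about can be sketched as a walk up the element tree. The following Python is a minimal illustration, assuming well-formed markup without namespace prefixes on elements; the names subject_for and uri_triples are hypothetical, and only rel/href statements are handled.

```python
import xml.etree.ElementTree as ET

def subject_for(element, parent_of):
    """Walk up from element until an about attribute is found.
    Falls back to "" (the current document) at the root."""
    node = element
    while node is not None:
        about = node.get("about")
        if about is not None:
            return about
        node = parent_of.get(node)
    return ""

def uri_triples(xhtml_fragment):
    root = ET.fromstring(xhtml_fragment)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    triples = []
    for element in root.iter():
        rel, href = element.get("rel"), element.get("href")
        if rel is not None and href is not None:
            triples.append((subject_for(element, parent_of), rel, href))
    return triples

markup = """<ul>
  <li about="/user/markb/photo/23456">
    <a rel="dc:creator" href="/user/markb">Mark Birbeck</a>
    <a rel="cc:license"
       href="http://creativecommons.org/licenses/by-nc/2.5/">license</a>
  </li>
  <li>
    <a rel="cc:license"
       href="http://creativecommons.org/licenses/by/2.5/">license</a>
  </li>
</ul>"""

for triple in uri_triples(markup):
    print(triple)
```

The second list item deliberately carries no about attribute, so its license statement falls through to the default subject, the current document.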

4.3 Qualifying Chunks of Documents

While it makes sense for Shutr to have a whole web page dedicated to each photo album, it might not make as much sense to have a single page for each camera owned by a user. A single page that describes all cameras belonging to a single user is the more likely scenario. For this purpose, RDF/A provides ways to make metadata statements about chunks of documents using natural XHTML constructs.

Consider the page http://shutr.net/user/markb/cameras, which, as its URI implies, lists Mark Birbeck's cameras. Its HTML includes:

<ul>
  <li id="nikon_d200"> Nikon D200, purchased on 2004-06-01.
  </li>

  <li id="canon_sd550"> Canon Powershot SD550, purchased on 2005-08-01.
  </li>
</ul>

and the photo page will then include information about which camera was used to take each photo:

<ul>
  <li> <img src="/user/markb/photo/23456" />
    ...
    using the <a href="/user/markb/cameras#nikon_d200">Nikon D200</a>,
    ...
  </li>
...
</ul>

The RDF/A syntax for formally specifying the relationship is exactly the same as before, as expected:

<ul>
  <li about="/user/markb/photo/23456"> <img src="/user/markb/photo/23456" />
    ...
    using the <a rel="shutr:takenWith"
                 href="/user/markb/cameras#nikon_d200">Nikon D200</a>,
    ...
  </li>
...
</ul>

which generates the triple:

</user/markb/photo/23456> shutr:takenWith </user/markb/cameras#nikon_d200> .

Then, the XHTML snippet at http://shutr.net/user/markb/cameras is:

<ul>
  <li id="nikon_d200" about="#nikon_d200">
    <span property="dc:title" type="xsd:string">
      Nikon D200
    </span>
    purchased on
    <span property="dc:date" type="xsd:date">
      2004-06-01
    </span>
  </li>

  <li id="canon_sd550" about="#canon_sd550">
    <span property="dc:title" type="xsd:string">
      Canon Powershot SD550
    </span>
    purchased on
    <span property="dc:date" type="xsd:date">
      2005-08-01
    </span>
  </li>
</ul>

which then yields the following triples:

<#nikon_d200> dc:title "Nikon D200"^^xsd:string ;
              dc:date "2004-06-01"^^xsd:date .

<#canon_sd550> dc:title "Canon Powershot SD550"^^xsd:string ;
               dc:date "2005-08-01"^^xsd:date .
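
The triples on the two pages connect because a fragment-only reference on the cameras page and the longer reference on the photo page resolve to the same absolute URI. A short check with Python's standard urljoin illustrates this; we assume here that the photo list is served from the album URI given earlier.

```python
from urllib.parse import urljoin

cameras_page = "http://shutr.net/user/markb/cameras"
# Assumption: the photo list lives on the album page introduced earlier.
album_page = "http://shutr.net/user/markb/album/12345"

# Subject of the dc:title and dc:date triples, as written on the cameras page.
subject_on_cameras_page = urljoin(cameras_page, "#nikon_d200")

# Object of the shutr:takenWith triple, as written on the photo page.
object_on_album_page = urljoin(album_page, "/user/markb/cameras#nikon_d200")

print(subject_on_cameras_page)
print(subject_on_cameras_page == object_on_album_page)
```

Both references resolve to http://shutr.net/user/markb/cameras#nikon_d200, so an RDF store that merges the two pages sees a single camera resource.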

One immediately wonders whether the redundancy between the about and id attributes can be simplified. Partly for this purpose, RDF/A includes the elements link and meta, which behave in a special way: they only apply to their immediate parent element, even if an ancestor element bears an alternate about attribute.

<ul>
  <li id="nikon_d200">
    <meta property="dc:title" type="xsd:string">
      Nikon D200
    </meta>
    purchased on
    <meta property="dc:date" type="xsd:date">
      2004-06-01
    </meta>
  </li>

  <li id="canon_sd550">
    <meta property="dc:title" type="xsd:string">
      Canon Powershot SD550
    </meta>
    purchased on
    <meta property="dc:date" type="xsd:date">
      2005-08-01
    </meta>
  </li>
</ul>

One might now wonder how meta and link behave when their parent element doesn't have an id or about attribute. The result of such syntax is an RDF bnode, an advanced topic which we skip in this Primer.
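
As a rough illustration of the parent-only behavior of meta, the following Python sketch derives the subject from the immediate parent element. Several points are assumptions rather than statements of the syntax: the function name meta_triples is hypothetical, an id is mapped to the subject "#" + id by analogy with the about="#nikon_d200" example above, about is given precedence over id, and the bnode case is represented by None instead of being modelled.

```python
import xml.etree.ElementTree as ET

def meta_triples(xhtml_fragment):
    """Collect triples from meta elements, using only the immediate parent
    to determine the subject. Hypothetical helper."""
    root = ET.fromstring(xhtml_fragment)
    triples = []
    for parent in root.iter():
        for child in parent:
            if child.tag != "meta" or child.get("property") is None:
                continue
            if parent.get("about") is not None:
                subject = parent.get("about")      # assumed precedence
            elif parent.get("id") is not None:
                subject = "#" + parent.get("id")   # assumed mapping from id
            else:
                subject = None                     # bnode case, not modelled
            value = "".join(child.itertext()).strip()
            datatype = child.get("type", "XMLLiteral")
            triples.append((subject, child.get("property"), value, datatype))
    return triples

markup = """<ul>
  <li id="nikon_d200">
    <meta property="dc:title" type="xsd:string">Nikon D200</meta>
    purchased on
    <meta property="dc:date" type="xsd:date">2004-06-01</meta>
  </li>
</ul>"""

for triple in meta_triples(markup):
    print(triple)
```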

4.4 Compact URIs (CURIEs)

For Shutr, as for many other web publishers, the introduction of RDF/A attributes tends to increase the size of the XHTML noticeably, sometimes unnecessarily so: there is significant data duplication with full expression of URIs. We have already shown how judicious use of the about attribute can reduce the number of times an RDF subject is expressed. We have also shown how the use of link and meta elements can further reduce the use of the about attribute when attaching metadata to particular XHTML chunks.

We now address URI duplication, RDF/A's most significant data duplication issue, with Compact URIs, known as CURIEs. A CURIE, e.g. dc:title, is composed of a prefix, e.g. dc, followed by a colon, followed by a suffix, e.g. title. The compact URI is resolved by

  • resolving the prefix according to normal XML namespace resolution,
  • resolving the suffix as a relative URI against the base URI defined by the resolved prefix.

Note that QNames used for RDF properties are valid CURIEs, and resolve in exactly the same way. Thus dc:title and cc:license resolve as expected when dc and cc are correctly defined namespaces.
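
The two resolution steps can be sketched in Python as follows. The function name expand_curie and the explicit prefix table are illustrative assumptions; in a real document the prefixes come from xmlns declarations. The sketch joins the namespace URI and the suffix by plain concatenation, the way QName-based RDF property names are conventionally expanded. This is an assumption about the intended behavior, and it matters for namespace URIs that end in "#", where strict relative-reference resolution would discard the fragment.

```python
# Prefix table standing in for in-scope xmlns declarations (illustrative).
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "shutr": "http://shutr.net/rdf/shutr#",
}

def expand_curie(curie, namespaces):
    """Expand prefix:suffix into a full URI. Hypothetical helper."""
    prefix, separator, suffix = curie.partition(":")
    if not separator:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in namespaces:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    # Assumption: concatenate, as for QName expansion in RDF.
    return namespaces[prefix] + suffix

print(expand_curie("dc:title", NAMESPACES))
print(expand_curie("shutr:takenWith", NAMESPACES))
```

With these declarations, dc:title expands to http://purl.org/dc/elements/1.1/title and shutr:takenWith to http://shutr.net/rdf/shutr#takenWith.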

The differences to note between CURIEs and QNames are:

  • CURIEs allow any sequence of legal URI characters in the suffix, including, for example, digits only, dashes, slashes, etc.
  • CURIEs allow the empty string as a prefix, e.g. :next, in which case the base URI defaults to the default XML namespace, which is usually xhtml2 in our case.
  • CURIEs allow the underscore character _ as a prefix when referencing bnodes. More on this in the Advanced section.

4.4.1 Mixing CURIEs and URIs

One of the most important applications of CURIEs in RDF/A is the use of a CURIE/URI attribute, where either a normal URI or a CURIE can be used interchangeably. In order to differentiate between the two types, square brackets [] are used around a CURIE, whereas a URI is written normally.

For example, if Shutr wants to reference the Creative Commons license http://creativecommons.org/licenses/by/2.5/ in an attribute that accepts both CURIEs and URIs, it can use either:

... attr="http://creativecommons.org/licenses/by/2.5/" ...

or, assuming the namespace cclicenses has been properly defined:

... attr="[cclicenses:by/2.5/]" ...
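
A processor reading such an attribute only needs to check for the surrounding square brackets. The sketch below assumes that the cclicenses prefix is bound to http://creativecommons.org/licenses/ and that expansion is plain concatenation; the function name resolve_attribute is hypothetical.

```python
def resolve_attribute(value, namespaces):
    """Return the URI denoted by a mixed CURIE/URI attribute value.
    Hypothetical helper."""
    if value.startswith("[") and value.endswith("]"):
        # Bracketed form: strip the brackets and expand the CURIE.
        prefix, _, suffix = value[1:-1].partition(":")
        return namespaces[prefix] + suffix
    # Unbracketed form: the value is already a URI.
    return value

# Assumed binding for the cclicenses prefix.
namespaces = {"cclicenses": "http://creativecommons.org/licenses/"}

as_uri = resolve_attribute("http://creativecommons.org/licenses/by/2.5/", namespaces)
as_curie = resolve_attribute("[cclicenses:by/2.5/]", namespaces)
print(as_uri == as_curie)
```

Note that the suffix by/2.5/ contains slashes and digits, which a QName would not allow but a CURIE does.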

4.4.2 Which Attributes are Which?

In RDF/A, the property attributes property, rel, and rev are all CURIE-only, which ensures backwards compatibility with past uses of rel, e.g. rel="next". The about and href attributes, on the other hand, accept mixed CURIE/URI datatypes. This ensures compatibility with browsers that expect clickability for the href, and consistency between subject and object.

4.4.3 Back to Shutr

Thus, getting back to Shutr's photo list:

<ul>
  <li> <img src="/user/markb/photo/23456" />,
    Sunset in Nice,
    taken by
    <a href="/user/markb">
      Mark Birbeck
    </a>,
    licensed under a 
    <a href="http://creativecommons.org/licenses/by/2.5/">
      Creative Commons License
    </a>.
  </li>

  <li> <img src="/user/markb/photo/34567" />,
    W3C Meeting in Mandelieu,
    taken by
    <a href="/user/stevenp">
      Steven Pemberton
    </a>,
    licensed under a 
    <a href="http://creativecommons.org/licenses/by-nc/2.5/">
      Creative Commons Non-Commercial License
    </a>.
  </li>
</ul>

adding metadata to these photos with CURIEs can save significant space (over the non-CURIE use) as soon as there are a number of photos in the list:

<ul xmlns:cclic="http://creativecommons.org/licenses/" xmlns:photos="/user/markb/photo/">
  <li about="[photos:23456]"> <img src="/user/markb/photo/23456" />,
    <span property="dc:title">
      Sunset in Nice
    </span>,
    taken by
    <a rel="dc:creator" href="/user/markb">
      Mark Birbeck
    </a>,
    licensed under a 
    <a rel="cc:license"
       href="[cclic:by/2.5/]">
      Creative Commons License
    </a>.
  </li>

  <li about="[photos:34567]"> <img src="/user/markb/photo/34567" />,
    <span property="dc:title">
      W3C Meeting in Mandelieu
    </span>
    taken by 
    <a rel="dc:creator" href="/user/stevenp">
      Steven Pemberton
    </a>,
    licensed under a 
    <a rel="cc:license"
       href="[cclic:by-nc/2.5/]">
      Creative Commons Non-Commercial License
    </a>.
  </li>
</ul>

Of course, this assumes a browser that can parse CURIEs for clickable links. Initially, complete URIs may be preferable in the href attribute.

5 Bibliography

RDFHTML
RDF-in-HTML Task Force (See http://w3.org/2001/sw/BestPractices/HTML/.)
SWBPD-WG
Semantic Web Best Practices and Deployment Working Group (See http://w3.org/2001/sw/BestPractices/.)
HTML-WG
HTML Working Group (See http://w3.org/MarkUp/Group/.)