RDFa Use Cases: Scenarios for Embedding RDF in HTML

Introduction

Current web pages, written in HTML, contain significant inherent structured data. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites. An event on a web page can be directly imported into a user's desktop calendar. A license on a document can be detected so that the user is informed of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published as easily as the original photo itself, enabling structured search and sharing.

RDFa is a syntax that expresses RDF structured data in HTML. An important goal of RDFa is to achieve this RDF embedding without repeating existing HTML content when that content is the structured data.

This document presents the major use cases where embedding structured data in HTML using RDFa provides significant benefit. Each use case explores how publishers, tool builders, and consumers benefit from RDFa. In parallel, the reader is encouraged to look at the RDFa Primer and RDFa Syntax (to be published).

In this document, we consider Publishers, Tool Builders, and Users. For simplicity, we give our fictitious users first names whose first letter matches their role: Paul, Patrick, and Peter are publishers; Tim, Tod, and Tara are tool builders; Ursula and Ulrich are users.

Audience

This document assumes a reader who has reasonable experience with HTML and RDF, in particular N3 notation.

An Overview of the Use Cases

Use Case #1 — Basic Structured Blogging: Paul maintains a blog and wishes to "mark up" his existing page with structure so that tools can pick up his blog post tags, authors, titles, and his blogroll. In particular, his HTML blog should be usable as its own structured feed.
Use Case #2 — Publishing an Event - Overriding Some of the Rendered Data: Paul sometimes gives talks on various topics, and announces them on his blog. He would like to mark up these announcements with proper scheduling information, so that RDFa-enabled agents can automatically obtain the scheduling information and add it to the browsing user's calendar. Importantly, some of the rendered data might be more informal than the machine-readable data required to produce a calendar event. Also of importance: Paul may want to annotate his event with a combination of existing vocabularies and a new vocabulary of his own design.
Use Case #3 — Content Management Metadata: Tod sells an HTML-based content management system, where all documents are processed and edited as HTML, sent from one editor to another, and eventually published and indexed. He would like to build up the editorial metadata within the HTML document itself, so that it is easier to manage and less likely to be lost.
Use Case #4 — Self-Contained HTML Fragments: Tara runs a video sharing web site. When Paul wants to blog about a video, he can paste a fragment of HTML provided by Tara directly into his blog. The video is then available inline, in his blog, along with any licensing information (for example, a Creative Commons license) about the video.
Use Case #5 — Web Clipboard: Ursula is looking for a new apartment and some items with which to furnish it. She browses various RDFa-enabled web pages, including apartment listings, furniture stores, kitchen appliances, etc. Every time she finds an item she likes, she can point to it, extract the locally-relevant structured data expressed using RDFa, and transfer it to her apartment-hunting page, where it can be organized, sorted, and categorized. Any additional features of the HTML that are not structured, e.g. links to photos, are conserved by the transfer.
Use Case #6 — Semantic Wiki: Tim runs an RDFa-aware Semantic Wiki, where users contribute content in Wiki markup, using a WYSIWYG tool, or using HTML+RDFa. In all cases, the semantic wiki produces HTML+RDFa, so that users like Ursula can transfer the structured content from one semantic wiki (or any other RDFa source) to another semantic wiki (or any other RDFa destination). In particular, Ursula may be pasting her apartment-and-furnishing finds into her own Semantic Wiki.
Use Case #7 — Augmented Browsing for Scientists: Patrick writes a science blog where he discusses proteins, genes, and chemicals. Because he uses a fairly constrained hosting provider and has very little control over the layout, Patrick adds RDFa to indicate the scientific components he's working with. Ulrich, a scientist, can browse Patrick's site with an RDFa-aware browser and automatically cross-reference the proteins and genes that Patrick is talking about.
Use Case #8 — Advanced Data Structures: Patrick keeps a list of his scientific publications on his web site. Using the BibTex vocabulary, he would like to provide structure within this publications page so that Ulrich, who browses the web with an RDFa-aware client, can automatically extract this information and use it to cite Patrick's papers.
Use Case #9 — Publishing a RDF Vocabulary: Paul wants to publish a large vocabulary in RDFS and/or OWL. Paul also wants to provide a clear, human-readable description of the same vocabulary. Using RDFa, the terms themselves can be mixed with descriptive text in HTML. The RDFa engine can then extract the vocabulary in RDF/XML and/or N3 formats, to be used directly by RDF-aware applications (e.g., reasoners).

Use Case #1 — Basic Structured Blogging

Paul maintains a blog and wishes to "mark up" his existing page with structure so that tools can pick up his blog post tags, authors, titles, and his blogroll, and so that he does not need to maintain a parallel version of his data in "structured format." For this purpose, Paul chooses the FOAF and Dublin Core vocabularies.

Paul's starting HTML (before RDFa) is:

<html>
    <head><title>Paul's Blog</title></head>
    <body>
...
    <div id="www2007_talk">
    <h2>My WWW2007 Talk</h2>
    a post by Paul.
    <p>
        I'm giving a talk at the WWW2007 Conference about structured blogging.
    </p>

    </div>
...
    <div id="blogroll">
      <ul>
	<li> <a href="http://example.org/Tim#me">Tim</a></li>
	<li> <a href="http://example.org/Ursula#me">Ursula</a></li>
...
	<li> <a href="http://example.org/Tod#me">Tod</a></li>
      </ul>
    </div>
...
    </body>
</html>

(html)

and the desired RDF triples, meant to reuse the corresponding text and link targets from the HTML above, are:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
    
<#www2007_talk> dc:title "My WWW2007 Talk" ;
                dc:creator "Paul" .

<#me> foaf:knows <http://example.org/Tim#me> ;
      foaf:knows <http://example.org/Ursula#me> ;
      foaf:knows <http://example.org/Tod#me> .

(n3)

A user with an RDFa-aware browser can automatically pick up Paul's list of acquaintances. An RDFa-aware newsreader can use the HTML page itself as a newsfeed, rather than seek out a separate, parallel RSS or Atom file. Importantly, if Paul edits one of his blog posts, the corresponding structured data is also automatically updated. The RDFa-aware newsreader will automatically pick up the updated title, content, and tags.

Note that many blog publishing tools already provide an "auto-update" feature: they generate multiple output formats at the same time, either by "re-baking" the entire site when an update is committed, or by dynamically generating a response to each query. With RDFa, the blog publishing engine can be significantly simplified: it only needs to produce an HTML+RDFa output, which is usable by both humans and automated feed readers. In addition, the structured data now naturally includes information not typically seen in newsfeeds: the user's blogroll, contact information, geo-location, etc., which newsreaders can begin to pick up as they see fit.
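As a rough illustration of how a tool builder might consume such a page, the sketch below scans HTML for blogroll links and emits foaf:knows triples. It assumes, hypothetically, that Paul has marked each blogroll link with rel="foaf:knows" and that "#me" stands for Paul himself. The names BlogrollExtractor and extract_blogroll are illustrative only, and a real RDFa parser would resolve prefixes and subjects in a general way rather than hard-coding them.

```python
from html.parser import HTMLParser

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"


class BlogrollExtractor(HTMLParser):
    """Collect (subject, predicate, object) triples from marked-up links.

    Assumption: the publisher added rel="foaf:knows" to blogroll links,
    and the subject "#me" is a placeholder for the blog owner.
    """

    def __init__(self, subject="#me"):
        super().__init__()
        self.subject = subject
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values
        rels = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if "foaf:knows" in rels and href:
            self.triples.append((self.subject, FOAF_KNOWS, href))


def extract_blogroll(html_text, subject="#me"):
    """Return the foaf:knows triples found in html_text."""
    parser = BlogrollExtractor(subject)
    parser.feed(html_text)
    parser.close()
    return parser.triples


# Hypothetical RDFa-style version of Paul's blogroll.
SAMPLE = """
<div id="blogroll">
  <ul>
    <li><a rel="foaf:knows" href="http://example.org/Tim#me">Tim</a></li>
    <li><a rel="foaf:knows" href="http://example.org/Ursula#me">Ursula</a></li>
    <li><a href="http://example.org/unrelated">An ordinary link</a></li>
  </ul>
</div>
"""

if __name__ == "__main__":
    for triple in extract_blogroll(SAMPLE):
        print(triple)
```

Only the links that carry the marker produce triples; the ordinary link is ignored, which is what lets the same HTML serve both human readers and tools.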

Use Case #2 — Publishing an Event - Overriding Some of the Rendered Data

Paul sometimes gives talks on various topics, and announces them on his blog, as well as on a static page of his web site that archives all of the talks he's given. He would like to mark up these announcements with proper scheduling information, so that RDFa-enabled agents can automatically obtain the scheduling information and add it to the browsing user's calendar. Importantly, some of the rendered data might be more informal than the machine-readable data required to produce a calendar event. In addition, Paul wants to add structured data using a combination of existing Dublin Core, vcal, and some of his own terms he uses to categorize the audience of his talks.

Paul's HTML is:

...
<div id="www2007_talk">
  
    <h2>My WWW2007 Talk</h2>
    a post by Paul.

    <p>
        I'm giving a talk at the WWW2007 Conference about structured blogging,
	on the second day of the conference at 10. This will be one of my
	<a href="technical">more technical talks</a>.
    </p>

    </div>
...

(html)

and his desired RDF triples are:

@prefix cal: <http://www.w3.org/2002/12/cal/ical#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix paul: <http://example.org/Paul/ns#> .

<#www2007_talk> a cal:Vevent ;
                dc:title "My WWW2007 Talk" ;
                dc:creator "Paul" ;
                cal:summary "structured blogging" ;
                cal:dtstart "20070509T1000-0800" ;
                paul:audience <technical> .

(n3)

When Ursula points her RDFa-enabled web browser to Paul's blog, she notices a calendar icon next to the event description. By clicking on it, she gets the option to add the event to her calendar of choice. Note that the rendered HTML uses informal language to describe the scheduling of the event, i.e. "the second day of the conference," while the structured data contains the complete iCal timestamp. In addition, the structured data also contains the Dublin Core properties. Finally, though Ursula may not know what to make of the paul:audience predicate, she can quickly find out what this predicate means using typical RDF navigation.
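The step from extracted triples to a calendar entry can be sketched as follows. This is a minimal illustration, not a complete iCalendar producer: the function names, the PRODID value, and the UID (a hypothetical absolute URI for the blog post) are assumptions, and a full event would carry further properties such as DTSTAMP and would escape special characters in text values.

```python
from datetime import datetime, timezone


def to_ical_utc(timestamp):
    """Convert a compact timestamp such as "20070509T1000-0800" to UTC form."""
    parsed = datetime.strptime(timestamp, "%Y%m%dT%H%M%z")
    return parsed.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def build_vevent(properties):
    """Render a minimal VCALENDAR/VEVENT text from extracted properties."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//rdfa-use-case-sketch//EN",  # hypothetical identifier
        "BEGIN:VEVENT",
        "UID:" + properties["uid"],
        "DTSTART:" + to_ical_utc(properties["dtstart"]),
        "SUMMARY:" + properties["summary"],
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are separated by CRLF
    return "\r\n".join(lines) + "\r\n"


# Values taken from the triples above; the uid is a hypothetical absolute URI.
EVENT = {
    "uid": "http://example.org/Paul/blog#www2007_talk",
    "dtstart": "20070509T1000-0800",
    "summary": "structured blogging",
}

if __name__ == "__main__":
    print(build_vevent(EVENT))
```

The informal phrase in the rendered text ("the second day of the conference at 10") never reaches this code; only the machine-readable cal:dtstart value is used.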

Use Case #3 — Content Management Metadata

Tod sells an HTML-based content management system, where all documents are processed and edited as HTML, sent from one editor to another, and eventually published and indexed. He would like to build up the editorial metadata within the HTML document itself, so that the metadata is never lost.

For this purpose, Tod's software uses RDFa with non-rendered metadata. Peter, one of Tod's customers, runs Foo Magazine, which ships content to aggregators and business partners using HTML. As Peter performs editorial tasks using Tod's content management system, metadata properties are added to the document. These data are not rendered, but they can be extracted using a generic RDFa parser. Peter can thus insert a block of workflow and rights reuse metadata about the document and its components at a single point in the XHTML file and then ship the document off to a business partner.

Peter's baseline HTML, not including the RDFa, is:

<html>
  <head>
    <title>Add Some Tex Mex Sizzle to Your Kid's Lunch</title>
  </head>

  <body>
    <h1>Add Some Tex Mex Sizzle to Your Kid's Lunch</h1>
    <div id='recipe22143'>
      <h2>Amigo Corn Dogs</h2>
      <img id="pic9932" src="http://www.example.org/FooMagazine/img/342.jpg"/>
...
    </div>
  
    <div id='recipe13941'>
      <h2>EZ Bean Tacos</h2>
...
    </div>
...
  </body>
</html>

(html)

and his desired RDF triples are:

@prefix fm: <http://www.example.org/FooMagazine/ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix pr: <http://prismstandard.org/namespaces/1.2/basic/> .

<> fm:newsStandDate "2006-04-03" ;
   pr:coverDate "2007-02-24" .

<#recipe13941> fm:ComponentID "XZ3214" ;
               fm:ComponentType "Recipe" ;
               fm:RecipeID "r003423" .

<http://www.example.org/FooMagazine/img/342.jpg> dc:creator "Joe Smith" ;
                                                 pr:embargoDate "2007-03-12" .

(n3)
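To suggest how a business partner's tool might use this metadata once a generic RDFa parser has extracted it, the sketch below groups the triples by subject and lists the components still under embargo. The helper names are hypothetical, and the rule that an item is held while its pr:embargoDate lies in the future is an assumption made for illustration, not something defined by this document.

```python
from collections import defaultdict
from datetime import date

FM = "http://www.example.org/FooMagazine/ns#"
DC = "http://purl.org/dc/elements/1.1/"
PR = "http://prismstandard.org/namespaces/1.2/basic/"
IMAGE = "http://www.example.org/FooMagazine/img/342.jpg"

# The triples from the example above as (subject, predicate, literal) tuples.
# "" stands for the document itself; "#recipe13941" is a fragment of it.
TRIPLES = [
    ("", FM + "newsStandDate", "2006-04-03"),
    ("", PR + "coverDate", "2007-02-24"),
    ("#recipe13941", FM + "ComponentID", "XZ3214"),
    ("#recipe13941", FM + "ComponentType", "Recipe"),
    ("#recipe13941", FM + "RecipeID", "r003423"),
    (IMAGE, DC + "creator", "Joe Smith"),
    (IMAGE, PR + "embargoDate", "2007-03-12"),
]


def group_by_subject(triples):
    """Index triples as {subject: {predicate: object}}."""
    grouped = defaultdict(dict)
    for subject, predicate, obj in triples:
        grouped[subject][predicate] = obj
    return dict(grouped)


def embargoed(triples, today):
    """Return subjects whose pr:embargoDate is later than `today`.

    Assumed rule: an item may be published on or after its embargo date.
    """
    held = []
    for subject, properties in group_by_subject(triples).items():
        value = properties.get(PR + "embargoDate")
        if value and date.fromisoformat(value) > today:
            held.append(subject)
    return sorted(held)


if __name__ == "__main__":
    print(embargoed(TRIPLES, date(2007, 3, 1)))
```

Because the metadata travels inside the document, the partner needs no side channel to learn which components it may not yet publish.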

Use Case #4 — Self-Contained HTML Fragments

Tara runs a video sharing web site. Paul frequently blogs about videos. Some are his own, which he distributes exclusively, while others are videos from Tara's site which he reviews. When Paul wants to blog about a video from Tara's site, he can paste a fragment of HTML provided by Tara directly into his blog:

<div>
  <object width="425" height="350">
    <param name="movie" value="http://example.org/Tara/video_123"></param>
  </object>

  The US Constitution, a Documentary.

  available under a
  <a rel="license" href="http://creativecommons.org/licenses/by/2.5/">
    CC License
  </a>. Please provide credit to Tara.
</div>

(html)

Once augmented with RDFa, the HTML fragment above can be copied and pasted into Paul's blog post, carrying along with it the following triples:

@prefix cc: <http://creativecommons.org/ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix xhtml: <http://www.w3.org/1999/xhtml#> .

<http://example.org/Tara/video_123> dc:title "The US Constitution, a Documentary" ;
                                    xhtml:license <http://creativecommons.org/licenses/by/2.5/> ;
                                    cc:attributionName "Tara" .

(n3)

When Paul uses the HTML+RDFa markup provided by Tara, the video is then available inline, in his blog, along with this structured title and licensing information about the video. A user browsing Paul's blog with an RDFa-aware browser can tell that the video shared from Tara's site is licensed under Creative Commons.

Note specifically that the HTML+RDFa markup allows Paul to display, within a single HTML page, multiple videos, each with its own license, title, and other structured information. The videos excerpted from Tara's site may be available under a Creative Commons license, while Paul's own videos are licensed under different terms.

Note also that Tara has already used the XHTML reserved keyword license in the HTML rel attribute. RDFa should play along with these existing reserved words.
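A minimal sketch of how a browser extension might detect per-fragment licenses follows. It relies only on the rel="license" links already present in Tara's fragment and treats each top-level div as one fragment, which is a simplifying assumption. The class name LicenseFinder and the URL of Paul's own terms are hypothetical.

```python
from html.parser import HTMLParser


class LicenseFinder(HTMLParser):
    """Record rel="license" link targets, grouped by top-level <div>."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0
        self.fragments = []  # one list of license URLs per top-level <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth == 0:
                self.fragments.append([])  # a new fragment starts here
            self.div_depth += 1
        elif tag == "a" and self.div_depth > 0:
            attributes = dict(attrs)
            rels = (attributes.get("rel") or "").split()
            href = attributes.get("href")
            if "license" in rels and href:
                self.fragments[-1].append(href)

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


def licenses_by_fragment(html_text):
    """Return a list with the license URLs found in each top-level <div>."""
    finder = LicenseFinder()
    finder.feed(html_text)
    finder.close()
    return finder.fragments


# Tara's fragment followed by one of Paul's own videos (hypothetical terms URL).
PAGE = """
<div>
  The US Constitution, a Documentary. Available under a
  <a rel="license" href="http://creativecommons.org/licenses/by/2.5/">CC License</a>.
</div>
<div>
  Paul's own video, under
  <a rel="license" href="http://example.org/Paul/terms">different terms</a>.
</div>
"""

if __name__ == "__main__":
    print(licenses_by_fragment(PAGE))
```

Keeping the license scoped to its fragment is what allows several videos with different terms to coexist on one page.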

Use Case #5 — Web Clipboard

Ursula is looking for a new apartment and some items with which to furnish it. She browses various RDFa-enabled web pages, including apartment listings, furniture stores, kitchen appliances, etc. Every time she finds an item she likes, she can point to it, extract the locally-relevant structured data expressed using RDFa, and transfer it to her apartment-hunting page, where it can be organized, sorted, and categorized. Any additional features of the HTML that are not structured, e.g. links to photos, are conserved by the transfer.

Importantly, the structured data represented by the RDFa is easy to localize to a particular region of the rendered screen, so that Ursula can "point and click" her way to the structured data. The data Ursula aggregates can then be managed using any set of existing RDF tools for querying, sorting, and navigating.
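One way to picture "locally-relevant" data is as the set of triples that describe the item Ursula points at, together with any anonymous nodes hanging off it. The sketch below assumes the page's triples are already extracted as plain tuples, that blank nodes are written with a "_:" prefix, and that the listing vocabulary and URIs are hypothetical. It does not distinguish literals from URIs, which a real tool would need to do.

```python
def clip(triples, subject):
    """Return the triples describing `subject`, following blank-node objects."""
    selected = []
    pending = [subject]
    seen = set()
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                selected.append((s, p, o))
                # Blank nodes have no identity outside the page, so copy
                # their descriptions along with the item that refers to them.
                if o.startswith("_:"):
                    pending.append(o)
    return selected


EX = "http://example.org/listings/ns#"  # hypothetical vocabulary
APT42 = "http://example.org/listings/apt42"
APT77 = "http://example.org/listings/apt77"

PAGE_TRIPLES = [
    (APT42, EX + "rent", "1200"),
    (APT42, EX + "address", "_:addr"),
    ("_:addr", EX + "city", "Springfield"),
    (APT77, EX + "rent", "1500"),
]

if __name__ == "__main__":
    for triple in clip(PAGE_TRIPLES, APT42):
        print(triple)
```

Clipping the first listing carries its address along but leaves the second listing behind.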

Use Case #6 — Semantic Wiki

Tim runs an RDFa-aware Semantic Wiki (such as [SMW]), where users contribute content in Wiki markup, using a WYSIWYG tool, or using HTML+RDFa. In all cases, the semantic wiki produces HTML+RDFa, so that users like Ursula can transfer the structured content from one semantic wiki (or any other RDFa source) to another semantic wiki (or any other RDFa destination). In particular, Ursula may be pasting her apartment-and-furnishing finds, from Use Case #5 — Web Clipboard, into her own Semantic Wiki.

The key principle here is that HTML+RDFa should remain transferable, almost first class: structured data from one location can be transferred to another location, where it can be rendered as HTML, from where it can be once again extracted and transferred.

Use Case #7 — Augmented Browsing for Scientists

Patrick writes web-based science articles where he discusses proteins, genes, and chemicals. Because he uses a fairly constrained hosting provider and has very little control over the layout, Patrick adds RDFa to indicate the scientific components with which he's working. Note that Patrick clearly wants to reuse the large vocabularies already defined by the scientific community over the years, for example, in this case, the Uniprot vocabulary [UNIPROT]. Ulrich, a scientist, can then browse Patrick's site with an RDFa-aware browser and automatically cross-reference the proteins and genes that Patrick is talking about with his own data.

Specifically, Patrick may write the following blog post (pre-RDFa):

<div>
    Let's talk about the Corticotropin-lipotropin precursor protein, aka UPA3_HUMAN.
...
</div>

(html)

into which he would like to insert the following triples:

@prefix uniprot: <urn:lsid:uniprot.org:ontology:> .

uniprot:P30089 a uniprot:Protein ;
               uniprot:name "Corticotropin-lipotropin precursor" ;
               uniprot:mnemonic "UPA3_HUMAN" .

(n3)

Then, Ulrich, who runs an RDFa-aware web browser (e.g. Firefox with a GreaseMonkey plugin), is provided with automatic popups with additional information for all proteins, genes, etc. in Patrick's blog. In particular, Ulrich may get links to related proteins, genes, and publications, where Patrick only added a bit of static markup.

Of course, this kind of augmented scientific browsing should be useful even for large publishers of scientific data, e.g. NCBI [NCBI]. Relationships between genes, proteins, and scientific literature should be expressible as easily as the HTML that embodies the same ideas.

Use Case #8 — Advanced Data Structures

Patrick keeps a list of his scientific publications on his web site. Using the BibTex vocabulary, he would like to provide structure within this publications page so that Ulrich, who browses the web with an RDFa-aware client, can automatically extract this information and use it to cite Patrick's papers without having to transcribe the bibliographic information.

Two important features of the BibTex vocabulary are worth highlighting for this use case: the structure is more than one-level deep, and ordering counts (i.e. who is first author?).

Specifically, Patrick may have the following HTML (pre-RDFa):

...
<div>
  Embedding RDF in HTML,<br />
  by Patrick, Paul, and Peter, in Proceedings of WWW 2007.<br />
  Volume 25, Number 3, May 2007,  pages 6--9.
</div>
...

(html)

into which he would like to insert the following triples:

:Patrick2007 a bibtex:Article;
    bibtex:title "Embedding RDF in HTML" ;
    bibtex:author (
      [
        foaf:name "Patrick"
      ]
      [
        foaf:name "Paul"
      ]
      [
        foaf:name "Peter"
      ]
    );
    bibtex:journal [
      bibtex:name "Proceedings of WWW 2007"
    ] ;
    bibtex:volume "25" ;
    bibtex:number "3" ;
    bibtex:date [
        bibtex:year "2007" ;
        bibtex:month "5"
    ] ;
    bibtex:page [
        bibtex:startPage "6" ;
        bibtex:endPage "9"
    ] .

(n3)
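The parenthesized author list above is an RDF collection, which stands for a chain of rdf:first and rdf:rest triples ending in rdf:nil. The sketch below shows that expansion and how a client such as Ulrich's could recover the author order from it. For brevity it uses the author names directly as list members, whereas the example above uses blank nodes carrying foaf:name; the function names and the "_:l" node labels are illustrative assumptions.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST = RDF + "first"
RDF_REST = RDF + "rest"
RDF_NIL = RDF + "nil"


def list_to_triples(items, prefix="_:l"):
    """Expand an ordered list into rdf:first/rdf:rest triples.

    Returns (head, triples); an empty list is represented by rdf:nil.
    """
    if not items:
        return RDF_NIL, []
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        triples.append((nodes[i], RDF_FIRST, item))
        rest = nodes[i + 1] if i + 1 < len(items) else RDF_NIL
        triples.append((nodes[i], RDF_REST, rest))
    return nodes[0], triples


def walk_list(head, triples):
    """Recover the ordered members of a collection starting at `head`."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items = []
    node = head
    while node != RDF_NIL:
        items.append(first[node])
        node = rest[node]
    return items


if __name__ == "__main__":
    head, triples = list_to_triples(["Patrick", "Paul", "Peter"])
    # Triples form an unordered set; the order lives in the rdf:rest links.
    print(walk_list(head, list(reversed(triples))))
```

Since a set of triples has no inherent order, the first-author question can only be answered by following the chain, which is why this use case calls for more than flat, one-level properties.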

Use Case #9 — Publishing a RDF Vocabulary

Paul wants to publish a vocabulary in RDFS and/or OWL. He also wants to provide a clear, human-readable description of the same vocabulary. Using RDFa, the terms themselves can be mixed with descriptive text in HTML. The RDFa engine can then extract the vocabulary in RDF/XML and/or N3 formats, to be used directly by RDF-aware applications (e.g., reasoners).

Consider, specifically, the SKOS example in RDF/XML:

<skos:Concept rdf:about="http://example.com/Concept/0001">
  <skos:prefLabel>English cuisine</skos:prefLabel>
  <skos:altLabel>English dishes</skos:altLabel>
  <skos:altLabel xml:lang="fr">Cuisine anglaise</skos:altLabel>
  <skos:inScheme rdf:resource="http://example.com/thesaurus"/>
  <skos:broader rdf:resource="http://example.com/Concept/0002"/>
  <skos:related rdf:resource="http://example.com/Concept/0003"/>
</skos:Concept>

An HTML expression (pre-RDFa) of this SKOS concept might look like:

<div>
    <h2>English Cuisine</h2>
    (also called "English dishes", or, in French, "Cuisine anglaise")

    go up to the broader <a href="http://example.com/Concept/0002">Concept #2</a>,<br />
    
    visit the related <a href="http://example.com/Concept/0003">Concept #3</a>,<br />
    
    part of <a href="http://example.com/thesaurus">the Example Thesaurus</a>,<br />
    
</div>

Expressing a SKOS vocabulary may not require any features beyond those described in prior use cases. However, it is important to consider whether RDFa can indeed express the complexity of SKOS, which may not be captured in other examples.

Comparison to Microformats

Some RDFa use cases may be fulfilled by microformats [MF]. In particular, Use Case #1 and part of Use Case #2 can be achieved using XFN [XFN] and hCal [HCAL]. In such cases, microformats (possibly combined with GRDDL) provide a perfectly appropriate solution. However, in each of the use cases in this document, the microformat approach requires either building a new, complete vocabulary or mixing several vocabularies in a single application. While this may be simple for the basic existing microformats, it becomes prohibitive when large vocabularies are used, as in Use Case #7 — Augmented Browsing for Scientists, or simply when the number of vocabularies mixed within an application becomes too large. RDFa aims to make it easy to combine, remix, and extend existing vocabularies, thus fully enabling Use Case #2 — Publishing an Event - Overriding Some of the Rendered Data, Use Case #3 — Content Management Metadata, and Use Case #7 — Augmented Browsing for Scientists.

In addition, RDFa aims to define a single, non-domain-specific syntax, so that fragments of HTML+RDFa may be consistently interpretable. Specifically, consider Use Case #4 — Self-Contained HTML Fragments, Use Case #5 — Web Clipboard, and Use Case #6 — Semantic Wiki, where it is crucial that a single self-contained HTML fragment be complete enough to carry through the entire RDF structure.

Acknowledgments

The editors gratefully acknowledge contributions from:

Mark Birbeck
Jeremy Carroll
Ivan Herman
Steven Pemberton
Guus Schreiber
Ralph Swick
Elias Torres

Bibliography

SMW: Semantic Wikipedia (See http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation_english?publ_id=1055.)
MF: Microformats (See http://microformats.org/.)
XFN: XHTML Friends Network (See http://gmpg.org/xfn/.)
HCAL: hCalendar Microformat (See http://microformats.org/wiki/hcalendar.)
FOAF: The Friend of a Friend (FOAF) Project (See http://www.foaf-project.org/.)
SWD-WG: Semantic Web Deployment Working Group (See http://www.w3.org/2006/07/SWD/.)
RDFHTML: RDF-in-HTML Task Force (See http://www.w3.org/2001/sw/BestPractices/HTML/.)
SWBPD-WG: Semantic Web Best Practices and Deployment Working Group (See http://www.w3.org/2001/sw/BestPractices/.)
XHTML2-WG: XHTML2 Working Group, previously called HTML Working Group (See http://www.w3.org/MarkUp/.)
ICAL-RDF: RDF Calendar Interest Group Note (See http://www.w3.org/TR/rdfcal/.)
VCARD-RDF: Representing vCard Objects in RDF/XML (See http://www.w3.org/TR/vcard-rdf.)
UNIPROT: Uniprot - The Universal Protein Resource (See http://www.pir.uniprot.org/.)
NCBI: National Center for Biotechnology Information (See http://www.ncbi.nlm.nih.gov/.)

W3C Working Draft 30 March 2007