W3C

RDFa API

An API for extracting structured data from Web documents

W3C Working Draft 08 June 2010

This version:
http://www.w3.org/TR/2010/WD-rdfa-api-20100608/
Latest published version:
http://www.w3.org/TR/rdfa-api/
Latest editor's draft:
http://www.w3.org/2010/02/rdfa/sources/rdfa-dom-api/
Editors:
Benjamin Adrian, German Research Center for Artificial Intelligence GmbH
Manu Sporny, Digital Bazaar, Inc.
Mark Birbeck, Backplane Ltd.

Abstract

RDFa [RDFA-CORE] enables authors to publish structured information that is both human- and machine-readable. Concepts that have traditionally been difficult for machines to detect, like people, places, events, music, movies, and recipes, are now easily marked up in Web documents. While publishing this data is vital to the growth of Linked Data, using the information to improve the collective utility of the Web for humankind is the true goal. To accomplish this goal, it must be simple for Web developers to extract and utilize structured information from a Web document. This document details such a mechanism; an RDFa Document Object Model Application Programming Interface (RDFa DOM API) that allows simple extraction and usage of structured information from a Web document.

How to Read this Document

This section is non-normative.

This document is a detailed specification for an RDFa DOM API. The document is primarily intended for the following audiences:

For those looking for an introduction to the use of RDFa, or some real-world examples, please consult the RDFa Primer [RDFA-PRIMER].

If you are not familiar with RDF, you should read about the Resource Description Framework (RDF) [RDF-CONCEPTS] before reading this document. The [RDF-CONCEPTS] document outlines the core data model that is used by RDFa to express information.

If you are not familiar with RDFa, you should read and understand the [RDFA-CORE] specification. It describes how data is encoded in host languages using RDFa. A solid understanding of concepts in RDFa Core will inevitably help you understand how the RDFa DOM API works in concert with how the data is expressed in a host language.

If you are a Web developer and are already familiar with RDF and RDFa, and you want to programatically extract RDFa content from documents, then you will find the Concept Diagram and Developing with the API sections of most interest. It contains a handful of JavaScript examples on how to use the RDFa API.

Readers who are not familiar with the Terse RDF Triple Language [TURTLE] may want to read the specification in order to understand the short-hand RDF notation used in some of the examples.

This document uses the Web Interface Definition Language [WEBIDL] to specify all language bindings. If you intend to implement the RDFa DOM API you should be familiar with the Web IDL language [WEBIDL].

Examples may contain references to existing vocabularies and use abbreviations in CURIEs and source code. The following is a list of all vocabularies and their abbreviations, as used in this document:

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was published by the RDFa Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-rdfa-wg@w3.org (subscribe, archives). All feedback is welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1. Introduction

This section is non-normative.

RDFa provides a means to attach properties to elements in XML and HTML documents. Since the purpose of these additional properties is to provide information about real-world items, such as people, films, companies, events, and so on, properties are grouped into objects called Property Groups.

The RDFa DOM API provides a set of interfaces that make it easy to manipulate DOM objects that contain information that is also part of a Property Group. This specification defines these interfaces.

A document that contains RDFa effectively provides two data layers. The first layer is the information about the document itself, such as the relationship between the elements, the value of its attributes, the origin of the document, and so on, and this information is usually provided by the Document Object Model, or DOM [DOM-LEVEL-1].

The second data layer comprises information provided by embedded metadata, such as company names, film titles, ratings, and so on, and this is usually provided by RDFa [RDFA-CORE], Microformats [MICROFORMATS], DC-HTML, GRDDL, or Microdata.

Whilst this embedded information could be accessed via the usual DOM interfaces -- for example, by iterating through child elements and checking attribute values -- the potentially complex interrelationships between the data mean that it is more efficient for developers if they have access to the data after it has been interpreted.

For example, a document may contain the name of a person in one section and the phone number of the same person in another; whilst the basic DOM interfaces provide access to these two pieces of information through normal navigation, it is more convenient for authors to have these two pieces of information available in one property collection, reflecting the final Property Group.

All of this is achieved through the RDFa DOM API.

There are many scenarios in which the RDFa DOM API can be used to extract information from a Web document. The following sections describe a few of these scenarios.

1.1 Importing Data

Amy has enriched her band's web-site to include Google Rich Snippets event information. Google Rich Snippets are used to mark up information for the search engine to use when displaying enhanced search results. Amy also uses some JavaScript code that she found on the web that automatically extracts the event information from a page and adds an entry into a personal calendar.

Brian finds Amy's web-site through Google and opens the band's page. He decides that he wants to go to the next concert. Brian is able to add the details to his calendar by clicking on the link that is automatically generated by the Javascript tool. The Javascript extracts the RDFa from the web page and places the event into Brian's personal calendaring software - Google Calendar.

<div prefix="v: http://rdf.data-vocabulary.org/#" typeof="v:Event"> 
  <a rel="v:url" href="http://amyandtheredfoxies.example.com/events" 
     property="v:summary">Tour Info: Amy And The Red Foxies</a>
  
  <span rel="v:location">
  	<a typeof="v:Organization" rel="v:url" href="http://www.kammgarn.de/" property="v:name">Kammgarn</a>
  </span>
  <div rel="v:photo"><img src="foxies.jpg"/></div>
  <span property="v:description">Hey K-Town, Amy And The Red Foxies will rock Kammgarn in October.</span>
  When: 
  <span property="v:startDate" content="2009-10-15T19:00">15. Oct., 7:00 pm</span>-
  <span property="v:endDate" content="2009-10-15T21:00">9:00 pm</span>
  </span>

  Category: <span property="v:eventType">concert</span>
</div>

1.2 Enhanced Browser Interfaces

Dave is writing a browser plugin that filters product offers in a web page and displays an icon to buy the product or save it to a public wishlist. The plugin searches for any mention of product names, thumbnails, and offered prices. The information is listed in the URL bar as an icon, and upon clicking the icon, displayed in a sidebar in the browser. He can then add each item to a list that is managed by the browser plugin and published on a wishlist website.

<div prefix="rdfs: http://www.w3.org/2000/01/rdf-schema#
             foaf: http://xmlns.com/foaf/0.1/
             gr: http://purl.org/goodrelations/v1# 
             xsd: http://www.w3.org/2001/XMLSchema#">
 
  <div about="#offering" typeof="gr:Offering">
    <div property="rdfs:label" content="Harry Potter and the Deathly Hallows" xml:lang="en"></div>
    <div property="rdfs:comment" content="In this final, seventh installment of the Harry Potter series, J.K. Rowling 
    unveils in spectactular fashion the answers to the many questions that have been so eagerly 
    awaited. The spellbinding, richly woven narrative, which plunges, twists and turns at a 
    breathtaking pace, confirms the author as a mistress of storytelling, whose books will be read, 
    reread and read again." xml:lang="en"></div>
    <div rel="foaf:depiction" resource="http://ecx.images-amazon.com/images/I/51ynI7I-qnL._SL500_AA300_.jpg"></div>
    <div rel="gr:hasBusinessFunction" resource="http://purl.org/goodrelations/v1#Sell"></div>
    <div rel="gr:hasPriceSpecification">
      <div typeof="gr:UnitPriceSpecification">
        <div property="gr:hasCurrency" content="USD" datatype="xsd:string"></div>
        <div property="gr:hasCurrencyValue" content="7.49" datatype="xsd:float"></div>
      </div>
    </div>
    <div rel="gr:acceptedPaymentMethods" resource="http://purl.org/goodrelations/v1#PayPal"></div>
    <div rel="gr:acceptedPaymentMethods" resource="http://purl.org/goodrelations/v1#MasterCard"></div>
    <div rel="foaf:page" resource="http://www.amazon.com/Harry-Potter-Deathly-Hallows-Book/dp/0545139708"></div>
  </div>
   
   
</div>

1.3 Data-based Web Page Modification

Dale has a site that contains a number of images, showcasing his photography. He has already used RDFa to add licensing information about the images to his pages, following the instructions provided by Creative Commons. Dale would like to display the correct Creative Commons icons for each image so that people will be able to quickly determine which licenses apply to each image.

<div prefix="cc: http://creativecommons.org/ns#">
  <img src="http://dale.example.com/images/image1.png" 
       rel="cc:license" 
       resource="http://creativecommons.org/licenses/by/3.0/us/"/>
<a 
   href="http://dale.example.com" property="cc:attributionName" 
   rel="cc:attributionURL">Dale</a>
</div>   

1.4 Automatic Summaries

Mary is responsible for keeping the projects section of her company's home page up-to-date. She wants to display info-boxes that summarize details about the members associated with each project. The information should appear when hovering the mouse over the link to each member's homepage. Since each member's homepage is annotated with RDFa, Mary writes a script that requests the page's content and extracts necessary information via the RDFa DOM API.

<div prefix="dc: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/" >
<span about="#me" property="foaf:name" content="Bob">My<span> interests are:
<ol about="#me" typeof="foaf:Person">
<li rel="foaf:interests"><a href="facebook" rel="tag" property="dc:title">facebook</a></li>
<li rel="foaf:interests"><a href="opengraph" rel="tag" property="dc:title">opengraph</a></li>
<li rel="foaf:interests"><a href="semanticweb" rel="tag" property="dc:title">semanticweb</a></li>
</ol>
<p>
Please follow me on 
<span about="#me" rel="foaf:account">
<a href="http://twitter.com/bob" typeof="foaf:term_OnlineAccount" property="foaf:accountName">http://twitter.com/bob</a>.
</p>
</div>

1.5 Data Visualization

Richard has created a site that lists his favourite restaurants and their locations. He doesn't want to generate code specific to the various mapping services on the Web. Instead of creating specific markup for Yahoo Maps, Google Maps, MapQuest, and Google Earth, he instead adds address information via RDFa to each restaurant entry. This enables him to build on top of the structured data in the page as well as letting visitors to the site use the same data to create innovative new applications based on the address information in the page.

<div prefix="vc: http://www.w3.org/2006/vcard/ns# foaf: http://xmlns.com/foaf/0.1/" typeof="vc:VCard">
	<span property="vc:fn">Wong Kei</span>
	<span property="vc:street-address">41-43 Wardour Street</span>
	<span>
		<span property="vc:locality">London</span>, <span property="vc:country-name">United Kingdom</span>
	</span>
	<span property="vc:tel">020 74373071</span>
</div>

1.6 Linked Data Mashups

Marie is a chemist, researching the effects of ethanol on the spatial orientation of animals. She writes about her research on her blog and often makes references to chemical compounds. She would like any reference to these compounds to automatically have a picture of the compound's structure shown as a tooltip, and a link to the compound's entry on the National Center for Biotechnology Information [NCBI] Web site. Similarly, she would like visitors to be able to visualize the chemical compound in the page using a new HTML5 canvas widget she has found on the web that combines data from different chemistry websites.

<div prefix="dbp: http://dbpedia.org/ontology/ fb: http://rdf.freebase.com/rdf/" >
   My latest study about the effects of 
   <span about="[fb:en.ethanol]" 
      typeof="[dbp:ChemicalCompound]" 
      property="[fb:chemistry.chemical_compound.pubchem_id]" 
      content="702">ethanol</span> on mice's spatial orientation show that ...
</div>

2. Design Considerations

This section is non-normative.

RDFa 1.0 [RDFA-SYNTAX] has seen substantial growth since it became an official W3C Recommendation in October 2008. It has seen wide adoption among search companies, e-commerce sites, governments, and content management systems. There are numerous interoperable implementations and growth is expected to continue to rise with the latest releases of RDFa 1.1 [RDFA-CORE], XHTML+RDFa 1.1 [XHTML-RDFA], and HTML+RDFa 1.1 [HTML-RDFA].

In an effort to ensure that Web applications are able to fully utilize RDFa, this specification outlines an API and a set of interfaces that extract RDF Triples from Web documents or other document formats that utilize RDFa. The RDFa DOM API is designed with maximum code expressiveness and ease of use in mind. Furthermore, a deep understanding of RDF and RDFa is not necessary in order to extract and utilize the structured data embedded in RDFa documents.

Since there are many Web browsers and programming environments for the Web, the rapid adoption of RDFa requires an interoperable API that Web document designers can count on being available in all Web browsers. The RDFa DOM API provides a uniform and developer-friendly interface for extracting RDFa from Web documents.

Since most browser-based applications and browser extensions that utilize Web documents are written in JavaScript [ECMA-262], the implementation of the RDFa DOM API is primarily concerned with ensuring that concepts covered in this document are easily utilized in JavaScript.

While JavaScript is of primary concern, the RDFa DOM API specification is language independent and is designed such that DOM tool developers may implement it in many of the other common Web programming languages such as Python, Java, Perl, and Ruby. Objects that are defined by the RDFA DOM API are designed to work as seamlessly as possible with language-native types, operators, and program flow constructs.

2.1 Goals

The design goals that drove the creation of the APIs that are described in this document are:

Ease of Use and Expressiveness
While this should be a design goal for all APIs, special care is taken to ensure that developers can accomplish common tasks using a minimal amount of code. While execution speed is always an important factor to consider, it is secondary to minimizing the amount of work a developer must perform to extract and use data contained in the document.
Modularity and Pluggability
Each part of the API is modular and pluggable, meaning that data storage, parsers and query modules may be replaced by other developer-defined mechanisms as long as the interfaces listed in this document are supported.
DOM Orthogonality
Interfaces defined on the Document should match programming paradigms that are familiar to developers. For example, if one were to attempt to retrieve Element Nodes by Subject, the name of the method should be document.getElementsBySubject(), which mirrors the document.getElementsById() functionality that is a part of [DOM-LEVEL-1].
Support for Non-RDFa Parsers
Other languages that store data in the DOM are considered as first-class languages when it comes to extraction and support via the RDFa DOM API. Mechanisms like DC-HTML, eRDF, Microformats, and Microdata can be used to express structured data in DOM-based languages. It is a goal of this DOM API to ensure that information expressed in these languages can be extracted, using a developer-defined parser, and stored in the Data Store.
Low-level Access and the Freedom to Tinker
Providing an abstract API, while simpler, may not produce the kind of innovation that the semantics community would like to see. It is important to give developers access to the entire RDFa DOM API stack in order to ensure that each layer of the stack can be improved independently of a standards body.
Native Language Constructs
Data is exposed and processed in a way that is natural for Javascript and many other Web programming languages like Python, Ruby and even C++. For example, Property Groups can be exposed as native objects or dynamically accessible, associative arrays. Data Stores can be iterated over by providing an anonymous function or function pointer. By ensuring that programming language constructs are considered in the design of the API, we ensure that the API won't fight the language and thus, the developer.
Macros and Templates
Some of the mechanisms that underpin RDF are difficult to use in everyday programming. For example, having to type out an entire URI is not only laborious for a programmer, but also error prone and overly-verbose. RDFa Core [RDFA-CORE] introduces the concept of a Compact URI Expression, or CURIE. This API builds on the CURIE concept and allows IRIs to be expressed as CURIEs. The API should also provide short-cuts that reduce the amount of code that has to be repeated. Property Group Templates are one example of reducing repetitive code writing as it can be stored in a single variable and re-used for objects.

2.2 Concept Diagram

The following diagram describes the relationship between all concepts discussed in this document.

The RDFa DOM API Concept Stack

Diagram of RDFa DOM API Concepts

The lowest layer of the API defines the basic structures that are used to store information; IRI, Plain Literal, Typed Literal, Blank Node and finally the RDF Triple.

The next layer of the API, the Data Store, supports the storage of information.

The Data Parser and Data Query interfaces directly interact with the Data Store. The Data Parser is used to extract information from the Document and store the result in a Data Store. The Data Query interface is used to extract different views of data from the Data Store. The Property Group is an abstract, easily manipulable view of this information for developers. While Property Group objects can address most use cases, a developer also has access to the information in the Data Store at a basic level. Access to the raw data allows developers to create new views and ways of directly manipulating the data in a Data Store.

The highest layer to the API is the Document object and is what most web developers will use to retrieve Property Groups created from data stored in the document.

3. Developing with the API

This section is non-normative.

3.1 Basic Concepts

This section is non-normative.

This API provides a number of interfaces to enable:

3.1.1 The Basic API

The RDFa DOM API has a number of advanced methods that can be used to access the Data Store, Data Parser and Data Query mechanisms. Most web developers will not need to use the advanced methods - most will only require the following interfaces for most of their day-to-day development activities.

Retrieving Property Groups

document.getItemsByType(type)
Retrieves a list of Property Groups by their type, such as foaf:Person.
document.getItemBySubject(type)
Retrieves a single Property Group by its subject, such as http://example.org/people#bob.
document.getItemsByProperty(property, optional value)
Retrieves a list of Property Groups by a particular property and optional value that the Property Group contains.

Retrieving DOM Elements

document.getElementsByType(type)
Retrieves a list of DOM Nodes by the type of data that they express, such as foaf:Person.
document.getElementsBySubject(type)
Retrieves a list of DOM Nodes by the subject associated with the data that they express, such as http://example.org/people#bob.
document.getElementsByProperty(property, optional value)
Retrieves a list of DOM Nodes by a particular property and optional value that each expresses.

IRI Mapping

document.data.context.setMapping(prefix, iri)
Gets and sets short-hand IRI mappings that are used by the API, such as expanding foaf:Person to http://xmlns.com/foaf/0.1/Person.

Advanced Processing

document.data.query.select(query, template)
Retrieves an array of Property Groups based on a set of selection criteria.
document.data.store.filter(subject, predicate, object)
Filters a given Data Store by matching a given triple pattern.
document.data.parser.iterate(iterFunction)
Iterates through a DOM, passing every data item to the iteration function that is passed into the method.

3.1.2 Using the Basic API

The following section uses the markup shown below to demonstrate how to extract and use Property Groups using the RDFa DOM API. The following markup is assumed to be served from a document located at http://example.org/people.

<div prefix="foaf: http://xmlns.com/foaf/0.1/" about="#albert" typeof="foaf:Person">
  <span property="foaf:name">Albert Einstein</span>
</div>
Working with Property Groups

Retrieving Property Groups by Type

You can retrieve the object that is described above by doing the following:

var people = document.getItemsByType("http://xmlns.com/foaf/0.1/Person");
or you can specify a short-cut to use when specifying the IRI:
document.data.context.setMapping("foaf", "http://xmlns.com/foaf/0.1/");
var people = document.getItemsByType("foaf:Person");

Retrieving Property Groups by Subject

You can also get a Property Group by its subject:

var albert = document.getItemBySubject("http://example.org/people#albert");

You can also specify a relative IRI and the document IRI will be automatically pre-pended:

var albert = document.getItemBySubject("#albert");

Retrieving Property Groups by Property

You can get a list of Property Groups by their properties:

var peopleNamedAlbertEinstein = document.getItemByProperty("foaf:name", "Albert Einstein");

Using Property Groups

You can retrieve property values from Property Groups like so:

var albert = document.getItemBySubject("#albert");
var name = albert.get("foaf:name");

You can also specify values that you would like to map to Property Group attributes:

var albert = document.getItemBySubject("#albert", {"foaf:name": "name"});
var name = albert.name;
Managing Elements with Data

Retrieving Elements Containing Data by Type

You can retrieve the DOM Node that is described above by doing the following:

var elements = document.getElementsByType("http://xmlns.com/foaf/0.1/Person");
or you can specify a short-cut to use when specifying the IRI:
document.data.context.setMapping("foaf", "http://xmlns.com/foaf/0.1/");
var elements = document.getElementsByType("foaf:Person");

Retrieving Elements Containing Data by Subject

You can also get a list of Elements by the subject of data:

var elements = document.getElementsBySubject("http://example.org/people#albert");

You can also specify a relative IRI and the document IRI will be automatically pre-pended:

var elements = document.getElementsBySubject("#albert");

Retrieving Elements by Property

You can get a list of Elements by the properties and values that they declare:

var elements = document.getElementsByProperty("foaf:name", "Albert Einstein");

Modifying DOM Elements

You can modify elements that are returned just like any other DOM Node, for example:

The code above would change the color of all the areas of the page where the item's name is "Bob" to green.

3.2 Advanced Concepts

This section covers a number of concepts that go beyond basic everyday usage of the RDFa DOM API. The interfaces to the DOM API allow you to work with data at an abstract level, or query structured data and override key parts of the software stack in order to extend the functionality that the API provides.

3.2.1 Advanced Queries

The features available via a Query object will depend on the implementation. However, all conforming processors will provide the basic element selection mechanisms described here.

Querying by Type

Perhaps the most basic task is to select Property Groups of a particular type. The type of a Property Group is set in RDFa via the special attribute @typeof. For example, the following markup expresses a Property Group of type Person in the Friend-of-a-Friend vocabulary:

<div typeof="foaf:Person">
  <span property="foaf:name">Albert Einstein</span>
</div>

To locate all Property Groups that are people we would do the following:

var query = document.data.createQuery("rdfa", store);
var ar = query.select( { "rdf:type": "foaf:Person" } );

Note that the Query object has access to the mappings provided via the document.data.context object, so they can also be used in queries. It is also possible to write the same query in a way that is independent of any prefix-mappings:

var ar = query.select( { "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "http://xmlns.com/foaf/0.1/Person" } );
Querying by Property Value

The previous query selected all Property Groups of a certain type, but it did so by indicating that the property rdf:type should have a specific value. Queries can also specify other properties. For example, given the following mark-up:

<div typeof="foaf:Person">
  <span property="foaf:name">Albert Einstein</span> -
  <span property="foaf:myersBriggs">INTP</span>
  <a rel="foaf:workInfoHomepage" href="http://en.wikipedia.org/wiki/Albert_Einstein">More...</span>
</div>
<div typeof="foaf:Person">
  <span property="foaf:name">Mother Teresa</span> -
  <span property="foaf:myersBriggs">ISFJ</span>
  <a rel="foaf:workInfoHomepage" href="http://en.wikipedia.org/wiki/Mother_Teresa">More...</span>
</div>
<div typeof="foaf:Person">
  <span property="foaf:name">Marie Curie</span> - 
  <span property="foaf:myersBriggs">INTP</span>
  <a rel="foaf:workInfoHomepage" href="http://en.wikipedia.org/wiki/Marie_Curie">More...</span>
</div>

The following query demonstrates how a developer would select and use all Property Groups of type Person that also have a Myers Brigg's personality type of "INTP" (aka: The Architect):

var architects = query.select( {
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "http://xmlns.com/foaf/0.1/Person",
  "http://xmlns.com/foaf/0.1/myersBriggs": "INTP"
} );

var name = architects[0].get("http://xmlns.com/foaf/0.1/name");

As before, prefix-mappings can also be used:

var architects = query.select( {"rdf:type": "foaf:Person", "foaf:myersBriggs": "INTP"} );

var name = architects[0].get("foaf:name");

Directives to generate the Property Group object based on a template specified by the developer can also be used. In this case, all of the "INTP" personality types are gleaned from the page and presented as Property Groups containing each person's name and blog page:

var architects = query.select( {"rdf:type": "foaf:Person", "foaf:myersBriggs": "INTP"},
                               {"foaf:name": "name", "foaf:workInfoHomepage", "info" } );

var name = architects[0].name;
var infoWebpage = architects[0].info;

3.2.2 Direct Access to the Data Store

The RDFa DOM API allows a developer to not only query the Data Store at via the Data Query mechanism, it also allows a developer to get to the underlying data structures that represent the structured data at the "atomic level".

The filter interface is a part of the Data Store and enables a developer to filter a series of triples out of the Data Store. For example, to extract all triples about a particular subject, the developer could do the following:

var names = document.data.store.filter("http://example.org/people#benjamin");

Developers could also combine subject-predicate filters by doing the following:

var names = document.data.store.filter("http://example.org/people#benjamin", "foaf:nick");

The query above would extract all known nicknames for the subject as triples.

3.2.3 Direct Access to the Parser Stream

The .iterate interface will almost certainly see large changes in the next version of the RDFa API specification. Implementers are warned to not implement the interface and wait for the next revision of this specification.

The iterate interface can be used to process triples in the document as they are discovered. This interface is most useful for processing large amounts of data in low-memory environments.

var iter = document.data.parser.iterate("http://example.org/people#mark");
for(var triple = iter.next(); iter != null; triple = iter.next()) 
{
   // process each triple that is associated with http://example.org/people#mark
}

3.2.4 Overriding the Defaults

The major modules in the RDFa DOM API are meant to be overridden by developers in order to extend basic functionality as well as innovate new interfaces for the RDFa DOM API.

Overriding the Data Store

The API is designed such that a developer may override the default data store provided by the browser by providing their own. This is useful, for instance, if the developer wanted to create a permanent site-specific data store using Local Storage features in the browser, or allowing provenance information to be stored with each triple.

var mydatastore = new MyCustomDataStore();

document.data.store = mydatastore;
Overriding the Data Parser

Developers may create and specify different parsers for parsing structured data from the document that builds upon RDFa, or parses other languages not related to RDFa. For example, Microformats-specific parsers could be created to extract structured hCard data and store it in an object that is compatible with the Data Store interface.

var hcardParser = new MyHCardParser();

document.data.parser = hcardParser;
Overriding the Data Query

The query mechanism for the API can be overridden to provide different or more powerful query mechanisms. For example, by replacing the standard query mechanism, developers could provide a full SPARQL query mechanism:

var sparqlQuery = new MySparqlEngine();
document.data.query = sparqlQuery;

var books = document.data.query.select("SELECT ?book ?title WHERE { ?book <http://purl.org/dc/elements/1.1/title> ?title . }");

4. The Interfaces Specification

The following section contains all of the interfaces that developers are expected to implement.

4.1 The RDF Interfaces

RDFa is a syntax for expressing the RDF Data Model [RDF-CONCEPTS] in Web documents. The RDFa DOM API is designed to extract the RDF Data Model from Web documents. The following RDF Resources are utilized in this specification: Plain Literals, Typed Literals, IRI References (as defined in [IRI]) and Blank Nodes. The interfaces for each of these RDF Resources are detailed in this section. The types as exposed by the RDFa DOM API conform to the same data and comparison restrictions as specified in the RDF concepts specification [RDF-CONCEPTS] and the [IRI] specification.

Each RDF interface provides access to both the extracted RDFa value and the DOM Node from which the value was extracted. This allows developers to extract and use RDF triples from a host language and also manipulate the DOM based on the associated DOM Node. For example, an agent could highlight all strings that are marked up as foaf:name properties in a Web document.

The basic RDF Resource types used in the RDFa DOM API are:

An RDFa DOM API implementer must provide the basic types as described in this specification. An implementer may provide additional types and/or a deeper type or class hierarchy that includes these basic types.

4.1.1 IRI References

An IRI Reference in the RDFa DOM API points to a resource and is further defined in [IRI].

[Constructor(in DOMString value), 
 Constructor(in DOMString value, in Node origin), 
 Stringifies=value]
interface IRI {
    readonly attribute DOMString value;
    readonly attribute Node      origin;
};
Attributes
origin of type Node, readonly
The node that specifies the IRI's value in the RDFa markup.
No exceptions.
value of type DOMString, readonly
The lexical representation of the IRI reference.
No exceptions.

4.1.2 RDF Literals

An RDF Literal is an RDF Resource that represents lexical values in RDFa data. The two RDF Literals provided via the RDFa DOM API are PlainLiterals and TypedLiterals. For a given RDF Literal, either language or type information can be provided. If the type is set, the RDF Literal is a Typed Literal. If a type is not set, it is a Plain Literal.

PlainLiteral
RDF Literals that may contain language information about the given text. The language is specified as a text string as specified in [BCP47] (e.g., 'en', 'fr', 'de').
TypedLiteral
RDF Literals that contain type information about the given text. The type is always specified in the form of an IRI Reference (e.g., http://www.w3.org/2001/XMLSchema#DateTime).
Plain Literals

PlainLiterals have a string value and may specify a language.

The RDFa Working Group is considering whether plain literals should express literal values as UTF-8, or whether the encoding of the source document should be used instead. This section assumes that the encoding from the source document should be used.

[Constructor(in DOMString value), 
 Constructor(in DOMString value, in DOMString language), 
 Constructor(in DOMString value, in DOMString language, in Node origin), 
 Stringifies=value]
interface PlainLiteral {
    readonly attribute DOMString value;
    readonly attribute DOMString language;
    readonly attribute Node      origin;
};
Attributes
language of type DOMString, readonly
A two character language string as defined in [BCP47], normalized to lowercase.
No exceptions.
origin of type Node, readonly
The first node in the DOM tree that is associated with this PlainLiteral.
No exceptions.
value of type DOMString, readonly
The lexical value of the literal encoded in the character encoding of the source document. The value is extracted from an RDFa document using the algorithm defined in the RDFa Core Specification [RDFA-CORE], Section 7.5: Sequence, Step 11.
No exceptions.
Example

This section is non-normative.

The following example demonstrates a few common use cases of the PlainLiteral type.

>> var literal = document.data.store.createPlainLiteral('Harry Potter and the Half-Blood Prince', 'en');
>> print(literal.toString());
Harry Potter and the Half-Blood Prince
>> print(literal.value);
Harry Potter and the Half-Blood Prince
>> print(literal.language);
en
Typed Literals

A TypedLiteral has a string value and a datatype specified as an IRI Reference. TypedLiterals can be converted into native language datatypes of the implementing programming language by registering a Typed Literal Converter as defined later in the specification.

The datatype's IRI reference specifies the datatype of the text value, e.g., xsd:DataTime or xsd:boolean.

The RDFa DOM API provides a method to explicitly convert TypedLiteral values to native datatypes supported by the host programming language. Developers may write their own Typed Literal Converters in order to convert an RDFLiteral into a native language type. The converters are registered by using the registerTypeConversion() method. Default TypedLiteral converters must be supported by the RDFa DOM API implementation for the following XML Schema datatypes:

  • xsd:string
  • xsd:boolean
  • xsd:float
  • xsd:double
  • xsd:boolean
  • xsd:integer
  • xsd:long
  • xsd:date
  • xsd:time
  • xsd:dateTime
[Constructor(in DOMString value, in IRI type), 
 Constructor(in DOMString value, in IRI type, in Node origin), 
 Stringifies=value]
interface TypedLiteral {
    readonly attribute DOMString value;
    readonly attribute IRI       type;
    readonly attribute Node      origin;
    Any valueOf ();
};
Attributes
origin of type Node, readonly
The first node in the DOM tree that is associated with this TypedLiteral.
No exceptions.
type of type IRI, readonly
A datatype identified by an IRI reference
No exceptions.
value of type DOMString, readonly
The lexical value of the literal encoded in the character encoding of the source document. The value is extracted from an RDFa document using the algorithm defined in the RDFa Core Specification [RDFA-CORE], Section 7.5: Sequence, Step 11.
No exceptions.
Methods
valueOf
Returns a native language representation of this literal. The type conversion should be performed by translating the value of the literal using the IRI reference of the datatype to the closest native datatype in the programming language.
No parameters.
No exceptions.
Return type: Any
Example

This section is non-normative.

The following example demonstrates how a TypedLiteral representing a date is automatically converted to JavaScript's native DateTime object.

>> var literal = document.data.store.createTypedLiteral('2010-12-24', "xsd:date");
>> print(literal.toString());
2010-12-24
>> print(literal.value);
2010-12-24
>> print(literal.valueOf());
Fri Dec 24 2010 00:00:00 GMT+0100

4.1.3 Blank Nodes

A BlankNode is an RDF resource that does not have a corresponding IRI reference, as defined in [RDF-CONCEPTS]. The value of a BlankNode is not required to be the same for identical documents that are parsed at different times. The purpose of a BlankNode is to ensure that RDF Resources in the same document can be compared for equivalence by ID.

The reasoning behind how we stringify BlankNodes should be explained in more detail.

BlankNodes are stringified by concatenating "_:" to BlankNode.value

[Constructor, Stringifies]
interface BlankNode {
    readonly attribute DOMString value;
};
Attributes
value of type DOMString, readonly
The temporary identifier of the BlankNode. The value must not be relied upon in any way between two separate RDFa processing runs of the same document.
No exceptions.

Developers and authors must not assume that the value of a Blank Node will remain the same between two processing runs. Blank Node values are only valid for the most recent processing run on the document. Blank Nodes values will often be generated differently by different RDFa Processors.

Example

This section is non-normative.

The following example demonstrates the use of a BlankNode in a JavaScript implementation that uses incrementing numbers for the identifier.

>> var bna = document.data.store.createBlankNode(); // create a new BlankNode A
>> var bnb = document.data.store.createBlankNode(); // create a new BlankNode B
>> print(bna.toString());              // Stringify BlankNode A
_:1
>> print(bna.value);                   //  print value of BlankNode A
1
>> print(bnb.value);                   //  print value of BlankNode B
2

4.1.4 RDF Triples

The RDFTriple interface represents an RDF triple as specified in [RDF-CONCEPTS]. RDFTriple can be used by referring to properties, such as subject, predicate, and object. RDFTriple can also be used by referring to pre-defined indexes. The stringification of RDFTriple results in an N-Triples-based representation as defined in [N3].

[Constructor(in IRI subject, in IRI predicate, in IRI object),
 Constructor(in IRI subject, in IRI predicate, in PlainLiteral object),
 Constructor(in IRI subject, in IRI predicate, in TypedLiteral object),
 Constructor(in IRI subject, in IRI predicate, in BlankNode object),
 Constructor(in BlankNode subject, in IRI predicate, in IRI object),
 Constructor(in BlankNode subject, in IRI predicate, in PlainLiteral object),
 Constructor(in BlankNode subject, in IRI predicate, in TypedLiteral object),
 Constructor(in BlankNode subject, in IRI predicate, in BlankNode object),
 Stringifies, Null=Null]
interface RDFTriple {
    readonly attribute Object subject;
    readonly attribute Object predicate;
    readonly attribute Object object;
};
Attributes
object of type Object, readonly
The object associated with the RDFTriple.
No exceptions.
predicate of type Object, readonly
The predicate associated with the RDFTriple.
No exceptions.
subject of type Object, readonly
The subject associated with the RDFTriple.
No exceptions.
Example

This section is non-normative.

The following examples demonstrate the basic usage of an RDFTriple object. The creation of the triple uses the foaf:name CURIE, which is transformed into an IRI. The example assumes that a mapping has already been created for foaf. For more information on creating RDFa DOM API mappings, see the section on IRI Mappings.

>> var triple = document.data.store.createTriple('http://www.example.com#foo', 'foaf:name', \
   document.data.store.createPlainLiteral('foo'));    //create a new RDFTriple
>> print(triple.subject);        // print the property subject of the RDFTriple
http://www.example.com#foo
>> print(triple.toString());     // stringify the RDFTriple
<http://www.example.com#foo> <http://xmlns.com/foaf/0.1/name> "foo" .

4.2 The Structured Data Interfaces

A number of convenience objects and methods are provided by the RDFa DOM API to help developers manipulate RDF Resources more easily when writing Web applications.

The basic RDF interface types described earlier in this document are utilized by the following Structured Data Interfaces:

4.2.1 Data Context

Processing RDF data involves the frequent use of unwieldy IRI references and frequent type conversion. The DataContext interface is provided in order to simplify contextual operations such as shortening IRIs and converting RDF data into native language datatypes.

It is assumed that this interface is created and available before a document is parsed for RDFa data. For example, while operating within a Browser Context, it is assumed that the following lines of code are executed before a developer has access to the RDFa DOM API methods on the document object:

document.data.context = new DataContext();
document.data.context.setMapping("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
document.data.context.setMapping("xsd", "http://www.w3.org/2001/XMLSchema-datatypes#");

All of the code that sets up the default type converters for Browser Contexts that use Javascript should probably be in the code snippet above.

The following interface allows IRI mappings to be easily created and used by Web Developers at run-time. It also allows for conversion of RDF data into native language datatypes.

interface DataContext {
    void setMapping (in DOMString prefix, in DOMString iri);
    void registerTypeConversion (in DOMString iri, in TypedLiteralConverter converter);
};
Methods
registerTypeConversion
Registers a type converter from given IRI datatype to a native language dataype in the current programming language.
ParameterTypeNullableOptionalDescription
iriDOMStringA string specifying the IRI datatype. The string may be a CURIE. For example: http://www.w3.org/2001/XMLSchema-datatypes#integer or xsd:integer.
converterTypedLiteralConverterA function that converts the TypedLiteral's value into a native language datatype in the current programming language.
No exceptions.
Return type: void
setMapping
Registers a mapping from a prefix to an IRI . The given IRI must be a full IRI. For example, if a developer wants to specify the foaf IRI mapping, they would call setMapping("foaf", "http://xmlns.com/foaf/0.1/"). Calling the setMapping() method with a prefix value that does not exist results in the creation of a new mapping. Calling the method with a null IRI value will remove the mapping.
ParameterTypeNullableOptionalDescription
prefixDOMStringThe prefix to put into the mapping as the key. (e.g., foaf)
iriDOMStringThe IRI reference to place into the mapping as the mapped value of the given prefix. (e.g., "http://xmlns.com/foaf/0.1/")
No exceptions.
Return type: void
Automatic Type Conversion

TypedLiteralConverter is a callable interface that transforms the value of a TypedLiteral into a native language type in the current programming language. The type IRI of the TypedLiteral is used to determine the best mapping to the native language type.

Specifing Typed Literal Converters
[Callback]
interface TypedLiteralConverter {
    Any convertType (in DOMString value);
};
Methods
convertType
Returns the native language value of the passed literal value as a language-native type. If the given value cannot be converted, the given value must be returned.
ParameterTypeNullableOptionalDescription
valueDOMStringThe value to convert that is associated with the RDFTypedLiteral.
No exceptions.
Return type: Any

IRI mappings for all terms in the following vocabularies must be included: rdf and xsd.

Easy IRI Mapping

All methods that accept CURIEs as arguments in the RDFa DOM API must use the algorithm specified in RDFa Core, Section 7.4: CURIE and URI Processing [RDFA-CORE] for TERMorCURIEorURI. The prefix and term mappings are provided by the current document.data.context instance.

Example

This section is non-normative.

The following examples demonstrate how mappings are created and used via the RDFa DOM API.

// create new IRI mapping
document.data.context.setMapping("foaf", "http://xmlns.com/foaf/0.1/");

// The new IRI mapping is automatically used when CURIEs are expanded to IRIs
var people = document.getItemsByType("foaf:Person");

In the example above, the CURIE "foaf:Person" is expanded to an IRI with the value "http://xmlns.com/foaf/0.1/Person" in the getItemsByType method.

Example

This section is non-normative.

The following example demonstrates how a developer could register and use a TypedLiteralConverter.

>> document.data.context.registerTypeConversion('xsd:boolean', function(value) {return new Boolean(value);});
>> var literal = document.data.store.createTypedLiteral('1', 'xsd:boolean');
>> print(literal.toString());
1
>> print(literal.value);
1
>> print(literal.valueOf());
true

4.2.2 Data Store

The DataStore is a set of RDFTriple objects. It provides a basic getter as well as an indexed getter for retrieving individual items from the store. The Data Store can be used to create primitive types as well as store collections of them in the form of RDFTriples.

The forEach method is not properly defined in WebIDL - need to get input from the WebApps Working Group on how best to author this interface.

interface DataStore {
    readonly attribute unsigned long size;
    [IndexGetter]
    Object       get (in unsigned long index);
    boolean      add (in RDFTriple triple);
    IRI          createIRI (in DOMString iri, in optional Node node);
    PlainLiteral createPlainLiteral (in DOMString value, in optional DOMString? language, in optional Node origin);
    TypedLiteral createTypedLiteral (in DOMString value, in DOMString type, in optional Node origin);
    BlankNode    createBlankNode (in optional DOMString name);
    RDFTriple    createTriple (in Object subject, in Object predicate, in Object object);
    [Null=Null]
    DataStore    filter (in Object? subject, in optional IRI? predicate, in optional Object? object, in optional Node? element, in optional RDFTripleFilter filter);
    void         clear ();
    void         forEach (in function callback);
};
Attributes
size of type unsigned long, readonly
A non-negative integer that specifies the size, in RDFTriples, of the store.
No exceptions.
Methods
add
Adds an RDFTriple to the Data Store. Returns True if the RDFTriple was added to the store successfully.
ParameterTypeNullableOptionalDescription
tripleRDFTripleThe triple to add to the Data Store.
No exceptions.
Return type: boolean
clear
Clears all data from the store.
No parameters.
No exceptions.
Return type: void
createBlankNode
Creates a Blank Node given an optional name value.
ParameterTypeNullableOptionalDescription
nameDOMStringThe name of the Blank Node, which will be used when Stringifying the Blank Node.
No exceptions.
Return type: BlankNode
createIRI
Creates an IRI given a value and an optional Node.
ParameterTypeNullableOptionalDescription
iriDOMStringThe IRI reference's lexical value.
nodeNodeAn optional DOM Node to associate with the IRI.
No exceptions.
Return type: IRI
createPlainLiteral
Creates a Plain Literal given a value, an optional language and an optional DOM origin Node.
ParameterTypeNullableOptionalDescription
valueDOMStringThe value of the IRI. The value can be either a full IRI or a CURIE.
languageDOMStringThe language that is associated with the Plain Literal encoded according to the rules outlined in [BCP47].
originNodeThe DOM Node that should be associated with the Plain Literal
No exceptions.
Return type: PlainLiteral
createTriple
Creates an RDF Triple given a subject, predicate and object. If any incoming value does not match the requirements listed below, a Null value must be returned by this method.
ParameterTypeNullableOptionalDescription
subjectObjectThe subject value of the RDF Triple. The value must be either an IRI or a BlankNode.
predicateObjectThe predicate value of the RDF Triple.
objectObjectThe object value of the RDF Triple. The value must be an IRI, PlainLiteral, TypedLiteral, or BlankNode.
No exceptions.
Return type: RDFTriple
createTypedLiteral
Creates a Typed Literal given a value, a type and an optional associated DOM Node.
ParameterTypeNullableOptionalDescription
valueDOMStringThe value of the Typed Literal.
typeDOMStringThe IRI type of the Typed Literal. The argument can either be a full IRI or a CURIE.
originNodeThe DOM Node to associate with the Typed Literal.
No exceptions.
Return type: TypedLiteral
filter
Returns an DataStore, which consists of zero or more RDFTriple objects.
ParameterTypeNullableOptionalDescription
subjectObjectThe subject filter pattern which is used to filter RDFTriple objects. Values can be of type IRI, BlankNode, or Null. If the subject is a non-Null value, the RDFTriple must not be placed in the final output Array unless the given subject matches the RDFTriple's subject. If subject is set to Null, the filter must not reject any triple based on the subject.
predicateIRIThe predicate filter pattern which is used to filter RDFTriple objects. Values can be of type IRI or Null. If the predicate is a non-Null value, the RDFTriple must not be placed in the final output Array unless the given predicate matches the RDFTriple's predicate. If predicate is set to Null, the filter must not reject any triple based on the predicate.
objectObjectThe object filter pattern which is used to filter RDFTriple objects. Values can be of type IRI, TypedLiteral, PlainLiteral, BlankNode, or Null. If the object is a non-Null value, the RDFTriple must not be placed in the final output Array unless the given object matches the RDFTriple's object. If object is set to Null, the filter must not reject any triple based on object.
elementNodeThe parent DOM Node where filtering should start. The implementation must only consider RDF triples on the current DOM Node and its children.
filterRDFTripleFilterA user defined function, returning a true or false value, that determines whether or not an RDFTriple should be added to the final Array.
No exceptions.
Return type: DataStore
forEach
Calls the given callback for each item in the Data Store.
ParameterTypeNullableOptionalDescription
callbackfunctionA function that takes the following arguments: index, subject, predicate, object. The function is called for each item in the Data Store.
No exceptions.
Return type: void
get
Returns the RDFTriple object at the given index in the list.
ParameterTypeNullableOptionalDescription
indexunsigned longThe index of the RDFTriple in the list to retrieve. The value must be a positive integer value greater than or equal to zero and less than DataStore::length.
No exceptions.
Return type: Object
Example

This section is non-normative.

The following examples demonstrate the three mechanisms that are available for navigating through a DataStore; index getter-based iteration, array index-based iteration, and callback-based/functional iteration.

>> var store = document.data.store.filter(); // select all RDF triples from the current Data Store
>> print(store.size);                        // print the number of triples in the Data Store
3
>> for(var i=0; i<store.size; i++) {   // loop through filtered RDF triples
     print(store.get(i));                // print RDF triple by using the explicit get method
     print(store[i]);                    // print RDF triple by using the indexed property method
   }
<http://www.example.com#foo> <http://www.w3.org/2000/01/rdf-schema#label> "foo" .
<http://www.example.com#foo> <http://www.w3.org/2000/01/rdf-schema#label> "foo" .
<http://www.example.com#foo> <http://www.w3.org/2000/01/rdf-schema#label> "bar" .
<http://www.example.com#foo> <http://www.w3.org/2000/01/rdf-schema#label> "bar" .
<http://www.example.com#foo> <http://www.w3.org/2000/01/rdf-schema#label> "foobar" .
<http://www.example.com#foo> <http://www.w3.org/2000/01/rdf-schema#label> "foobar" .
>> function printObject(index, subject, predicate, object) { print(object) } // specify a callback function
>> store.forEach(printObject);           // call the callback function for each item in the array
foo
bar
foobar

4.2.3 Data Parser

The Data Parser is capable of processing a DOM Node and placing the parsing results into a Data Store. While this section specifies how one would parse RDFa data and place it into a Data Store, the interface is also intended to support the parsing and storage of various Microformats, eRDF, GRDDL, DC-HTML, and Microdata. Web developers that would like to write customer parsers may extend this interface.

interface DataParser {
    attribute DataStore store;
    [Null=Null]
    DataIterator iterate (in Object? subject, in optional DOMString? predicate, in optional Object? object, in optional Node? element, in optional RDFTripleFilter filter);
    boolean      parse (in Element domElement);
};
Attributes
store of type DataStore
The DataStore that is associated with the DataParser. The results of each parsing run will be placed into the store.
No exceptions.
Methods
iterate
Returns an DataIterator, which is capable of iterating through a set of RDF triples, one RDFTriple at a time. The DataIterator is most useful in small memory footprint environments, or in documents that contain a very large number of triples.
ParameterTypeNullableOptionalDescription
subjectObjectThe subject filter pattern which is used to filter RDFTriple objects. Values can be of type IRI, BlankNode, or Null. If the subject is a non-Null value, the RDFTriple must not be output via the DataIterator unless the given subject matches the RDFTriple's subject. An IRI value must be either a full IRI or a CURIE. If subject is set to Null, the filter must not reject any triple based on the subject.
predicateDOMStringThe predicate filter pattern which is used to filter RDFTriple objects. Values can be of type IRI or Null. If the predicate is a non-Null value, the RDFTriple must not be output via the DataIterator unless the given predicate matches the RDFTriple's predicate. An IRI value must be either a full IRI or a CURIE. If predicate is set to Null, the filter must not reject any triple based on the predicate.
objectObjectThe object filter pattern which is used to filter RDFTriple objects. Values can be of type IRI, TypedLiteral, PlainLiteral, BlankNode, or Null. If the object is a non-Null value, the RDFTriple must not be output via the DataIterator unless the given object matches the RDFTriple's object. An IRI value must be either a full IRI or a CURIE. If object is set to Null, the filter must not reject any triple based on object.
elementNodeThe parent DOM Node where filtering should start. The implementation must only consider RDF Triples on the current DOM Node and its children.
filterRDFTripleFilterA user defined function, returning a true or false value, that determines whether or not an RDFTriple should be output via the DataIterator.
No exceptions.
Return type: DataIterator
parse
Parses starting at the given DOM Element and populates the store with the information that is discovered. If a starting element isn't specified, or the value of the starting element is Null, then the document object must be used as the starting element.
Even though a specific DOM Element can be specified to start the process of placing RDFTriples into the DataStore, the entire document must be processed by an RDFa Processor due to context that may affect the generation of a set of triples. Specifying the DOM Element is useful when a subset of the document data is to be stored in the Data Store.
There are two ways to approach this mechanism. The first is to only parse the sub-tree, ignoring the context of the greater document. The second is to parse the entire document, but only store triples that are a part of the sub-tree.
ParameterTypeNullableOptionalDescription
domElementElementThe DOM Element that should trigger triple generation.
No exceptions.
Return type: boolean

4.2.4 Data Iterator

The DataIterator interface will almost certainly see large changes in the next version of the RDFa API specification. Implementers are warned to not implement the interface and wait for the next revision of this specification.

The DataIterator iterates through a DOM subtree and returns RDFTriples that match a filter function or triple pattern. A DOM Node can be specified so that only triples contained in the Node and its children will be a part of the iteration. The DataIterator is provided in order to allow implementers to provide a less memory intensive implementation for processing triples in very large documents.

A DataIterator is created by calling the document.data.parser.iterate() method.

interface DataIterator {
             attribute DataStore       store;
    readonly attribute Node            root;
    readonly attribute RDFTripleFilter filter;
    readonly attribute RDFTriple       triplePattern;
    RDFTriple next ();
};
Attributes
filter of type RDFTripleFilter, readonly
The RDFTripleFilter is a function that is provided by developers to filter RDFTriples in a subtree.
No exceptions.
root of type Node, readonly
The DOM Node that was used as the starting point for extracting RDFTriples.
No exceptions.
store of type DataStore
The DataStore that is associated with the DataIterator.
No exceptions.
triplePattern of type RDFTriple, readonly
An RDF triple pattern is a set of filter parameters that can be passed to an RDFTripleFilter to match particular triple patterns.
No exceptions.
Methods
next
Returns the next RDFTriple object that is found in the DOM subtree or NULL if no more RDFTriples match the filtering criteria.
No parameters.
No exceptions.
Return type: RDFTriple
Example

This section is non-normative.

The following examples describe the how various filter patterns can be applied to the DOM via document.data.parser.iterate().

>> var iter = document.data.parser.iterate();    // iterate over all RDF triples from current Web document as an DataIterator
>> for(var triple = iter.next(); iter != null; triple = iter.next()) {      // step through all RDF triples
     print(triple);              // print current RDF triple
   }
<http://www.example.com#foo> <http://www.w3.org/2000/01/rdf-schema#label> "foo" .
<http://www.example.com#foo> <http://www.w3.org/2000/01/rdf-schema#label> "bar" .
<http://www.example.com#foo> <http://www.w3.org/2000/01/rdf-schema#label> "foobar" .
>> var iter2 = document.data.parser.iterate("http://www.example.com#foo"); // iterate over triples associated with a single subject
>> var iter3 = document.data.parser.iterate(null, "foaf:name"); // iterate over all triples that contain a foaf:name predicate

4.2.5 Property Group

The PropertyGroup interface provides a view on a particular subject contained in the Data Store. The PropertyGroup aggregates the RDFTriples as a single language-native object in order to provide a more natural programming primitive for developers.

PropertyGroup attributes can be accessed in the following ways in Javascript:

// creates a PropertyGroup for the given subject
var person = document.getItemsBySubject("#bob");

// Access the property group attribute via complete IRI
var name = person.get("http://xmlns.com/foaf/0.1/name");

// Access the property group attribute via CURIE
var name = person.get("foaf:name");
interface PropertyGroup {
    attribute Element       origin;
    attribute Sequence[IRI] properties;
    Sequence[any] get (in DOMString predicate);
};
Attributes
origin of type Element
The DOM Element that specified the first type definition for this PropertyGroup.
No exceptions.
properties of type Sequence[IRI]
A list of all attributes that are available via this PropertyGroup.
No exceptions.
Methods
get
Returns a sequence of IRIs, BlankNodes, PlainLiterals, and/or TypedLiterals in the projection that have a predicate IRI that is equivalent to the given value.
ParameterTypeNullableOptionalDescription
predicateDOMStringA stringified IRI representing a predicate whose values are to be retrieved from the PropertyGroup. For example, using a predicate of http://xmlns.com/foaf/0.1/name will return a sequence of values that represent FOAF names in the PropertyGroup. The given predicate may also be a CURIE.
No exceptions.
Return type: Sequence[any]
Example

This section is non-normative.

The following examples demonstrate how to use the PropertyGroup interface.

>> document.data.context.setMapping("foaf", "http://xmlns.com/foaf/0.1/");
>> var ivan = document.getItemBySubject("http://www.ivan-herman.net/foaf#me"); // extract information about this subject as an PropertyGroup
>> print(ivan.properties);                  // print all projected property names
http://xmlns.com/foaf/0.1/name, 
http://xmlns.com/foaf/0.1/title, 
http://xmlns.com/foaf/0.1/surname, 
http://xmlns.com/foaf/0.1/givenname, 
http://xmlns.com/foaf/0.1/workInfoHomepage
>> ivan.get("foaf:name");                   // named property values of foaf:name
["Herman Iván", "Ivan Herman"]
>> ivan.get("foaf:title");                  // named property values of foaf:title
["Dr."]
>> ivan.get("foaf:workInfoHomepage");       // named property values of foaf:workInfoHomepage
["http://www.iswsa.org/", "http://www.iw3c2.org", "http://www.w3.org/2001/sw/#activity"]
The Origin Pointer

document.getItemsByType() returns a list of Property Groups that match the query. By default there is always a property called origin on the Property Group and all of its properties, which refers back to an element in the source document.

The origin property allows programmers to manipulate DOM objects based on the embedded metadata that they refer to. For example, to set a thin blue border on each Person, we could do this:

var people = document.getItemsByType("foaf:Person");
 
for (var i = 0; i < people.length; i++) {
  people[i].origin.style.border = "1px solid blue";
}
Property Group Templates

A query can be used to retrieve not only basic Property Groups, but can also specify how Property Groups are built by utilizing Property Group Templates.

For example, assume our source document contains the following event, marked up using the Google Rich Snippet Event format (example taken from the Rich Snippet tutorial, and slightly modified):

<div vocab="http://rdf.data-vocabulary.org/#" typeof="Event">
  <a href="http://www.example.com/events/spinaltap" rel="v:url" 
     property="summary">Spinal Tap</a>
  <img src="spinal_tap.jpg" rel="v:photo" />
  <span property="description">After their highly-publicized search for a new drummer, 
  Spinal Tap kicks off their latest comeback tour with a San Francisco show. </span>
 
  When:
  <span property="startDate" content="20091015T1900Z">Oct 15, 7:00PM</span>—
  <span property="endDate" content="20091015T2100Z">9:00PM</span>
</div>

To query for all Event Property Groups we know that we can do this:

var ar = query.select({ "rdf:type": "http://rdf.data-vocabulary.org/#Event" });

However, to build a special PropertyGroup that contains the summary, start date and end date, we need only do this:

var events = query.select({ "rdf:type": "http://rdf.data-vocabulary.org/#Event" },
                          {"rdf:type" : "type", "v:summary": "summary", 
                           "v:startDate": "start", "v:endDate": "end"} );

The second parameter is a PropertyGroup Template. Each key-value pair specifies an IRI to map to an attribute in the resulting PropertyGroup object.

Exposing the embedded data in each Property Group makes it easy to create an HTML anchor that will allow users to add the event to their Google Calendar, as follows:

var anchor, button, i, pg;
 
for (i = 0; i < events.length; i++) {
  // Get the Property Group:
  pg = events[i];
 
  // Create the anchor
  anchor = document.createElement("a");
 
  // Point to Google Calendar
  anchor.href = "http://www.google.com/calendar/event?action=TEMPLATE"
    + "&text=" + pg.summary + "&dates=" + pg.start + "/" + pg.end;
 
  // Add the button
  button = document.createElement("img");
  button.src = "http://www.google.com/calendar/images/ext/gc_button6.gif";
  anchor.appendChild(button);
 
  // Add the link and button to the DOM object
  pg.origin.appendChild(anchor);
}

The result will be that the event has an HTML a element at the end (and any Event on the page will follow this pattern):

<div vocab="http://rdf.data-vocabulary.org/#" typeof="Event">
  .
  .
  .
  <a href="http://www.google.com/calendar/event?action=TEMPLATE&text=Spinal+Tap&dates=20091015T1900Z/20091015T2100Z">
    <img src="http://www.google.com/calendar/images/ext/gc_button6.gif" />
  </a>
</div>

For more detailed information about queries see the Data Query interface.

4.2.6 Data Query

The Data Query interface provides a means to query a Data Store. While this interface provides a simple mechanism for querying a Data Store for RDFa, it is expected that developers will implement other query interfaces that conform to this Data Query interface for languages like SPARQL or other Domain Specific Language.

interface DataQuery {
    attribute DataStore store;
    Sequence[PropertyGroup] select (in Object? query, in optional Object template);
};
Attributes
store of type DataStore
The DataStore that is associated with the DataQuery.
No exceptions.
Methods
select
Generates a sequence of Property Groups that matches the given selection criteria.
ParameterTypeNullableOptionalDescription
queryObjectAn associative array containing predicates as keys and objects to match as values. If the query is null, every item in the Data Store that the query is associated with must returned.
templateObjectA template describing the attributes to create in each Property Group that is returned. The template is an associative array containing predicates as keys and attribute names that should be created in the returned PropertyGroup as values.
No exceptions.

4.3 The Document Interface

The RDFa DOM API is designed to provide a small, powerful set of interfaces that a developer may use to retrieve RDF triples from a Web document. The core interfaces were described in the previous two sections. This section focuses on the final RDFa API that most developers will utilize to generate the objects that are described in the RDF Interfaces and the Structured Data Interfaces sections. The following API is provided by this specification:

4.3.1 Document Interface Extensions

The following section describes all of the extensions that are necessary to enable manipulation of structured data within a Web Document.

interface Document {
    readonly attribute DocumentData data;
    boolean           hasFeature (in DOMString feature);
    PropertyGroupList getItemsByType (in DOMString type);
    PropertyGroupList getItemBySubject (in DOMString subject);
    PropertyGroupList getItemsByProperty (in DOMString property, in DOMString value);
    NodeList          getElementsByType (in DOMString type);
    NodeList          getElementsBySubject (in DOMString subject);
    NodeList          getElementsByProperty (in DOMString property, in DOMString value);
};
Attributes
data of type DocumentData, readonly
The DocumentData interface is useful for extracting and storing data that is associated with the Document.
No exceptions.
Methods
getElementsByProperty
Retrieves a list of Nodes objects based on the value of a given property.
ParameterTypeNullableOptionalDescription
propertyDOMStringA DOMString representing an IRI-based property. The string can either be a full IRI or a CURIE. If the string is a CURIE, the DataContext will be used to resolve the value.
valueDOMStringA DOMString representing the value to match against.
No exceptions.
Return type: NodeList
getElementsBySubject
Retrieves a NodeList consisting of Nodes that have explicitly specified the given subject.
ParameterTypeNullableOptionalDescription
subjectDOMStringA DOMString representing an IRI-based subject. The string can either be a full IRI or a CURIE. If the string is a CURIE, the DataContext will be used to resolve the value.
No exceptions.
Return type: NodeList
getElementsByType
Retrieves a list of Nodes based on the object type of the data that they specify.
ParameterTypeNullableOptionalDescription
typeDOMStringA DOMString representing an rdf:type to select against.
No exceptions.
Return type: NodeList
getItemBySubject
Retrieves a PropertyGroup object based on its subject.
ParameterTypeNullableOptionalDescription
subjectDOMStringA DOMString representing an IRI-based subject. The string can either be a full IRI or a CURIE. If the string is a CURIE, the DataContext will be used to resolve the value.
No exceptions.
Return type: PropertyGroupList
getItemsByProperty
Retrieves a list of PropertyGroup objects based on the values of a property.
ParameterTypeNullableOptionalDescription
propertyDOMStringA DOMString representing an IRI-based property. The string can either be a full IRI or a CURIE. If the string is a CURIE, the DataContext will be used to resolve the value.
valueDOMStringA DOMString representing the value to match against.
No exceptions.
Return type: PropertyGroupList
getItemsByType
Retrieves a list of PropertyGroup objects based on their rdf:type property.
ParameterTypeNullableOptionalDescription
typeDOMStringA DOMString representing an rdf:type to select against.
No exceptions.
Return type: PropertyGroupList
hasFeature
Checks to see whether or not the document object has the RDFa DOM API feature.
ParameterTypeNullableOptionalDescription
featureDOMStringThe feature string to use when checking to see if the Document interface has the RDFa DOM API interfaces. This value should be "rdfa 1.1".
No exceptions.
Return type: boolean

4.3.2 Document Data

The DocumentData interface is used to create structured-data related context, storage, parsing and query objects.

interface DocumentData {
    readonly attribute DataStore   store;
    readonly attribute DataContext context;
    readonly attribute DataParser  parser;
    readonly attribute DataQuery   query;
    DataContext createContext ();
    DataStore   createStore (in DOMString type);
    DataParser  createParser (in DOMString type, in DataStore store);
    DataQuery   createQuery (in DataStore store);
};
Attributes
context of type DataContext, readonly
The default DataContext for the document.
No exceptions.
parser of type DataParser, readonly
The default DataParser for the document.
No exceptions.
query of type DataQuery, readonly
The default DataQuery for the document.
No exceptions.
store of type DataStore, readonly
The default DataStore for the document.
No exceptions.
Methods
createContext
Creates a DataContext and returns it.
No parameters.
No exceptions.
Return type: DataContext
createParser
Creates a DataParser of the given type and returns it.
ParameterTypeNullableOptionalDescription
typeDOMStringThe type of DataParser to create. For example: "rdfa".
storeDataStoreThe DataStore to associate with the DataParser.
No exceptions.
Return type: DataParser
createQuery
Creates a DataQuery for the given store.
ParameterTypeNullableOptionalDescription
storeDataStoreThe DataStore to associate with the DataQuery.
No exceptions.
Return type: DataQuery
createStore
Creates a DataStore of the given type and returns it.
ParameterTypeNullableOptionalDescription
typeDOMStringThe type of DataStore to create.
No exceptions.
Return type: DataStore

4.3.3 Pattern Filters

An important goal of the RDFa DOM API is to help Web developers filter the set of RDF triples in a document down to only the ones that interest them. This section covers pattern-based filters. Pattern filters trigger off of one or more of the subject, predicate, or object properties in RDF triples. This section also introduces the interfaces for the other filter types.

Function Filters

Filter criteria may also be defined by the developer as a filter function. The RDFTripleFilter is a callable function that determines whether an RDFTriple should be included in the set of output triples.

[Callback, Null=Null]
interface RDFTripleFilter {
    boolean match (in RDFTriple triple);
};
Methods
match
A callable function that returns true if the input RDFTriple should be included in the output set, or false if the input RDFTriple should be rejected from the output set.
ParameterTypeNullableOptionalDescription
tripleRDFTripleThe triple to test against the filter.
No exceptions.
Return type: boolean
Example

This section is non-normative.

The examples below use the following HTML code:

<div id="start" about="http://dbpedia.org/resource/Albert_Einstein">

  <span property="foaf:name">Albert Einstein</span>

  <span property="dbp:dateOfBirth" datatype="xsd:date">1879-03-14</span>
  <div rel="dbp:birthPlace" resource="http://dbpedia.org/resource/Germany">
    <span class="hilite"><span property="dbp:conventionalLongName">Federal Republic of Germany</span></span>
  </div>
</div>

The parser.iterate() interface will almost certainly see large changes in the next version of the RDFa API specification. Implementers are warned to not implement the interface and wait for the next revision of this specification.

The following examples demonstrate the use of document.data.store.filter() and document.data.parser.iterate() in JavaScript.

>> var myFilter = new function(element, subject, predicate, object) {         // create a filter function that filters 
     if(subject.value.search(/http:\/\/xmlns.com\/foaf\/0\.1/) >= 0) {        // triples with predicates in the foaf namespace.
         return true;
     } 
   }
   
// start filtering at element with id=start
>> var store = document.data.store.filter(null, null, null, document.getElementById("start"), myFilter);
>> store.forEach(function (triple) {
      print(triple)
   }
<http://dbpedia.org/resource/Albert_Einstein> <http://xmlns.com/foaf/0.1/name> "Albert Einstein"


// start filtering by iterating at element with id=start
>> var iter = document.data.parser.iterate(null, null, null, document.getElementById("start"), myFilter);          
>> for(var triple=iter.next(); triple != null; triple = iter.next()) {
      print(triple)
   }
<http://dbpedia.org/resource/Albert_Einstein> <http://xmlns.com/foaf/0.1/name> "Albert Einstein"

5. The Initialization Process

The RDFa DOM API must be initialized before the Web developer has access to any of the methods that are defined in this specification. To initialize the API environment in a Browser-based environment, an implementor must do the following:

  1. create a default Store object, which will hold information obtained from parsing;
  2. create a defaul Parser object, passing it a pointer to a store;
  3. initiate parsing, to extract information from some object -- usually a DOM object -- and place it into the store;
  4. create a default Query object which can be used to interrogate the information placed in the store;

Some platforms may merge one or more of these steps as a convenience to developers. For example, a browser that supports this API may carry out the first four steps when a document loads, and then expose a Query interface to allow developers to access the Property Groups. Some approaches to this will be discussed in the next section, but before we look at those, we'll give a brief overview of how each of these phases would normally be accomplished.

5.1 Creating the Data Store

To create a store the createStore method is called:

document.data.store = document.data.createStore();

The store object created supports the Store interfaces providing methods to add metadata to the store. These methods are used during parsing to populate the store but they can also be used directly to add additional information. Examples of this are shown later.

5.2 Creating the Data Parser

Once a store has been created, the implementor should create a default parser:

document.data.parser = document.data.createParser("rdfa", store);

Note that an implementation may support many types of parser, so the specific parser required needs to be specified. For example, an implementation may also support a Microformats hCard parser:

var parser = document.data.createParser("hCard", store);

Implementations may also support different versions of a parser, for example:

var parser = document.data.createParser("rdfa1.0", store);
var parser = document.data.createParser("rdfa1.1", store);

Probably should have a URI to identify parsers rather than a string, since not only are there many different Microformats, but also, people may end up wanting to add parsers for RDF/XML, different varieties of JSON, and so on. However, if we treat the parameter here as a CURIE, then we can avoid having long strings. If we do that, then the version number would need to be elided with the language type: "rdfa1.0", "rdfa1.1", and so on.

5.3 Parsing the DOM

Once we have a parser, we can use it to extract information from sources that contain embedded data. In the following example we extract data from the Document object:

parser.parse( document );

Since the parser is connected to a store, the Property Groups obtained from processing the document are now available in the variable document.data.store.

A store can be used more than once for parsing. For example, if we wanted to apply an hCard Microformat parser to the same document, and put the extracted data into the same store, we could do this:

var store = document.data.createStore();
 
document.data.createParser("rdfa", store).parse();
document.data.createParser("hCard", store).parse();

The store will now contain Property Groups from the RDFa parsing, as well as Property Groups from the hCard parsing.

(If the developer wishes to reuse the store but clear it first, then the Store.clear() method can be used.)

Diagram: Show the connection between a Property Group and the DOM.

5.4 Creating the Data Query

Query objects are used to interrogate stores and obtain a list of DOM objects that are linked to Property Groups. Since there are a number of languages and techniques that can be used to express queries, we need to specify the type of query object that we'd like:

var query = document.data.createQuery("rdfa", store);

6. Future Discussion

This section is non-normative.

The current version of the RDFa DOM API focuses on filtering RDF triples. It also provides methods for filtering DOM Nodes that contain certain types of RDF triples.

The RDFa Working Group is currently discussing whether or not to include the following advanced functionality:

A. Acknowledgements

At the time of publication, the members of the RDFa Working Group were:

B. References

B.1 Normative references

[BCP47]
A. Phillips, M. Davis. Tags for Identifying Languages September 2009. IETF Best Current Practice. URL: http://tools.ietf.org/rfc/bcp/bcp47.txt
[HTML-RDFA]
Manu Sporny; et al. HTML+RDFa 04 March 2010. W3C Working Draft. URL: http://www.w3.org/TR/rdfa-in-html/
[IRI]
M. Duerst, M. Suignard. Internationalized Resource Identifiers (IRI). January 2005. Internet RFC 3987. URL: http://www.ietf.org/rfc/rfc3987.txt
[N3]
Tim Berners-Lee; Dan Connolly. Notation3 (N3): A readable RDF syntax. 14 January 2008. W3C Team Submission. URL: http://www.w3.org/TeamSubmission/2008/SUBM-n3-20080114/
[RDF-CONCEPTS]
Graham Klyne; Jeremy J. Carroll. Resource Description Framework (RDF): Concepts and Abstract Syntax. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210
[RDFA-CORE]
Shane McCarron; et al. RDFa Core 1.1: Syntax and processing rules for embedding RDF through attributes.22 April 2010. W3C Working Draft. URL: http://www.w3.org/TR/2010/WD-rdfa-core-20100422
[XHTML-RDFA]
Shane McCarron; et. al. XHTML+RDFa 1.1. 22 April 2010. W3C Working Draft. URL: http://www.w3.org/TR/WD-xhtml-rdfa-20100422

B.2 Informative references

[DOM-LEVEL-1]
Vidur Apparao; et al. Document Object Model (DOM) Level 1. 1 October 1998. W3C Recommendation. URL: http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/
[ECMA-262]
ECMAScript Language Specification, Third Edition. December 1999. URL: http://www.ecma-international.org/publications/standards/Ecma-262.htm
[MICROFORMATS]
Microformats. URL: http://microformats.org
[RDFA-PRIMER]
Mark Birbeck; Ben Adida. RDFa Primer. 14 October 2008. W3C Note. URL: http://www.w3.org/TR/2008/NOTE-xhtml-rdfa-primer-20081014
[RDFA-SYNTAX]
Ben Adida, et al. RDFa in XHTML: Syntax and Processing. 14 October 2008. W3C Recommendation. URL: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014
[SVGTINY12]
Scott Hayman; et al. Scalable Vector Graphics (SVG) Tiny 1.2 Specification. 22 December 2008. W3C Recommendation. URL: http://www.w3.org/TR/2008/REC-SVGTiny12-20081222
[TURTLE]
David Beckett, Tim Berners-Lee. Turtle: Terse RDF Triple Language January 2008. W3C Team Submission. URL: http://www.w3.org/TeamSubmission/turtle/
[WEBIDL]
Cameron McCormack. Web IDL. 19 December 2008. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2008/WD-WebIDL-20081219