On various RDF API-s…

The World Wide Web Consortium’s RDF Web Applications
Working Group
has published the first draft of the RDF Interfaces specification. This is a companion
specification to the upcoming RDF API and the recently published RDFa API.
Until recently the group publishing this specification was known
as the “RDFa Working Group”. After a need was identified for a
common set of programming APIs for working with structured data on
the Web, the RDFa Working Group was re-created as the RDF Web
Applications Working Group. This article explains how the
specifications that this group is producing work together to
create a common Semantic Web publishing and development
environment.

The Semantic Web has gained significant traction in the past few
years. The buzz around this year’s Semantic Technologies
conference, SemTech 2011, is a sign of the rapid growth of the
Semantic Web. The
amount of RDF data published on the Web is steadily growing
thanks, for example, to the Linked Open Data movement, the
eGovernmental initiatives, or the integration of RDFa into content
management systems and popular Web destinations. However, if one
looks at the applications that make use of that RDF data, most of
them can be characterized as “server-side”. The bulk of the work
of crawling, extracting, and processing the data is done
behind the scenes. The Web browser merely displays
information that has been created elsewhere. That is,
the Semantic Web has not yet reached the
world of “Web Applications”. Semantic Web applications running
within the browser and written almost exclusively in JavaScript
are still few and far between. Web Applications can be seen as
operating in a very different programming environment. Developers
have their own development styles, architectures, and distinct
communities. The Web Application development community, in
general, strives for much greater simplicity and a lower
barrier to entry than server-side programmers and developers.

Various Working Groups at W3C, as well as developers and groups
around the world, have been surveying this structured data
landscape. In many ways, the development of RDFa,
and its successful deployment, was the first step in this
new world. It is also no coincidence that an
updated version of RDFa is being finalized, taking into account
the needs and desires of the general Web development community.
The development of microformats and of microdata,
though not closely bound to RDF, is also part of this landscape.
There have also been passionate discussions on the pros and cons
of a JSON serialization of RDF in W3C’s recently formed RDF Working
Group. These discussions are still ongoing; see the Task Force’s
Wiki page for further details. All of these communities were
involved in the dialog and identification of a core missing
component for Web Applications bound to the Semantic Web: an API
to access RDF as well as structured data in general.

This leads to the inevitable question: What type of API should be
defined? The Working Group has discussed this question for a long
time. Should the API hide the complexities of RDF or not? Should
it focus on people that know RDF deeply or on people that don’t
know or care about RDF? After several attempts to answer each of
these questions, the group decided that a layered approach made the
most sense. That is, we must recognize that there are several
communities and we should provide for each of them, but in a way
that builds into a unified whole. Hence the layered approach to
the design of the RDF structured data APIs:

  1. The RDF Interfaces. A need was identified for
    a low-level API to expose RDF data to JavaScript. It does not
    contain any new concepts or abstractions. It provides a
    straightforward interface to RDF that those familiar with RDF
    will be comfortable using. This is the lowest level in the
    stack; it is called the RDF Interfaces
    specification, and it is the document that was just published by
    the W3C.
  2. The RDF API. Once the basic layer is in
    place, one can envisage different libraries building on top of
    the RDF Interfaces. One example would be a library for accessing
    SPARQL endpoints in JavaScript, much like Lee Feigenbaum’s
    sparql.js library. Other libraries could be
    built to handle SPARQL CONSTRUCT queries or, in the future,
    SPARQL 1.1 UPDATE. It is not the goal of the RDFWA WG to cover
    all possible libraries; innovation is best left to the
    communities across the Web. The RDFWA WG’s job is to build a
    basic framework where this innovation can happen. To provide a
    starting point, a simple API called the RDF API
    is currently under development. The goal is to provide an easy
    first step for Web Applications developers that want to mash up
    existing RDF data on the Web without having to dive too deeply
    into advanced concepts like RDF modeling, inferencing, and the
    other more complex aspects of RDF. Conceptually, the RDF API is
    based on the RDF Interfaces specification. Practically,
    developers only need to use the RDF API and can safely ignore
    the lower-level RDF Interfaces if they do not have the time or
    inclination to study RDF in depth.
  3. The RDFa API. Since this journey started with
    RDFa, and the initial set of requests was for an RDFa API, one
    is provided for Web Application developers. While the RDF API is
    a simple entry point for general mash-up applications, the
    structured data expressed in RDFa-family languages like HTML,
    SVG, or EPUB requires Document Object Model (DOM) features that
    do not fit nicely into the RDF API. One example is accessing the
    DOM nodes that contain a specific RDFa-encoded subject or
    predicate. This functionality and a set of simplified interfaces
    are provided by the RDFa API.

The rest of the article briefly touches on each of these layers.
For those that want to learn more, the original drafts provide a
more thorough introduction to each layer. While each document is
in the draft stage at the W3C, the RDFWA Working Group does not
expect large design changes in the coming months.

1. The RDF Interfaces layer

The goal of the RDF Interfaces specification is to provide
programmatic access to the core of RDF. That is, it contains
interfaces for Triples, Nodes identified by URIs, Blank Nodes, and
Literals. There are also interfaces for creating and managing
Graphs, including the ability to add and remove triples and merge
graphs. The general concept of an RDF parser is expressed, leaving
implementations to create the obvious RDF/XML, Turtle and RDFa
parsers as well as the less obvious parsers for Microdata or an
RDF conversion of well-known Microformats.

In general, the RDF Interfaces layer is fairly similar to
existing, widely used RDF libraries like RDFLib and Jena. However,
it is not the goal of the RDF Interfaces specification
to replace existing RDF libraries; instead, the goal is to provide
something similar for JavaScript (and, possibly, other Web
programming languages that do not have a common interface yet).
This goal of being optimized for JavaScript does
influence the definition of the interfaces. For example,
JavaScript is extremely flexible in the creation of closures and
anonymous callback functions. This flexibility is embraced as an
advantage and used in Web Applications. As a consequence, the RDF
Interfaces specification has methods on the Graph interface that
can receive anonymous functions. For example, the forEach
method executes a piece of code on each triple in a Graph. There is
also functionality that can automatically execute a
developer-supplied function every time a new triple is added to a
Graph. The forEach method is modelled after the
method with a similar name defined for JavaScript’s Array object.
The example below demonstrates how the RDF Interfaces
specification can be used to parse a Turtle document, merge in the
triples of an RDF/XML document, and operate on each triple in the
resulting graph:

graph = turtleparser.parse("http://www.example.org/turtle.ttl");
graph.addAll(rdfxmlparser.parse("http://www.example.org/turtle.rdf"));
graph.forEach(function(triple) {
  // Code to process each triple
});
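
The callback style described above can be tried without any RDF library. The following is only a minimal sketch: ToyGraph and its addAction, add, addAll and forEach methods are hypothetical stand-ins written for this illustration, triples are plain objects, and nothing here should be read as the draft specification's actual interface.

```javascript
// Toy stand-in for a Graph; an assumption for illustration, not the W3C API.
class ToyGraph {
  constructor() {
    this.triples = [];
    this.actions = []; // callbacks to run for every newly added triple
  }

  // Register a developer-supplied function that runs on each added triple.
  addAction(callback) {
    this.actions.push(callback);
    return this;
  }

  // Store the triple, then notify every registered action.
  add(triple) {
    this.triples.push(triple);
    this.actions.forEach(function (callback) {
      callback(triple);
    });
    return this;
  }

  // Merge every triple of another graph into this one.
  addAll(other) {
    other.forEach((triple) => this.add(triple));
    return this;
  }

  // Same calling style as Array.prototype.forEach; iterates over a copy
  // so that callbacks may modify the graph safely.
  forEach(callback) {
    this.triples.slice().forEach(function (triple) {
      callback(triple);
    });
  }
}

const FOAF = "http://xmlns.com/foaf/0.1/";
const seen = [];

// Record the predicate of every triple as it is added.
const graph = new ToyGraph().addAction(function (triple) {
  seen.push(triple.predicate);
});
graph.add({ subject: "http://www.example.org/#me", predicate: FOAF + "name", object: "Ivan" });

// Merging a second graph triggers the same action for each merged triple.
const other = new ToyGraph();
other.add({ subject: "http://www.example.org/#me", predicate: FOAF + "nick", object: "ivan" });
graph.addAll(other);

// Visit each triple with an anonymous function.
const objects = [];
graph.forEach(function (triple) {
  objects.push(triple.object);
});
```

After this runs, seen holds one predicate per added triple, including the triple that arrived through the merge, and objects holds the object of every triple in the merged graph.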

Data can also be removed from the graph. For example, to remove
triples whose object is the literal “Ivan”:

// You can write iterative code like the following
tripleArray = graph.toArray();
for (var i = tripleArray.length - 1; i >= 0; i--) {
  if (tripleArray[i].object.nominalValue == "Ivan") {
    graph.remove(tripleArray[i]);
  }
}
// You can also write something more JavaScript-like
graph.match(null, null, rdf.createLiteral("Ivan")).forEach(function(triple) {
  graph.remove(triple);
});
// Finally, there is a far more compact style for this specific operation,
// as offered by the Graph interface. Note also that the createLiteral("Ivan")
// can be replaced by a simple string
graph.removeMatches(null, null, "Ivan");
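
To make the pattern matching concrete, here is a small self-contained sketch in which null acts as a wildcard for a position in the triple. TinyGraph is a hypothetical toy written for this article's examples; it uses plain strings for all three positions and is not the interface defined in the draft.

```javascript
// Toy graph over plain-object triples; an assumption, not the draft API.
class TinyGraph {
  constructor(triples) {
    this.triples = triples.slice();
  }

  get length() {
    return this.triples.length;
  }

  // Return a new graph holding the triples that fit the pattern.
  // A null argument matches anything in that position.
  match(subject, predicate, object) {
    return new TinyGraph(this.triples.filter(function (t) {
      return (subject === null || t.subject === subject) &&
             (predicate === null || t.predicate === predicate) &&
             (object === null || t.object === object);
    }));
  }

  // Remove one triple, compared by object identity.
  remove(triple) {
    this.triples = this.triples.filter(function (t) {
      return t !== triple;
    });
    return this;
  }

  // Shortcut: remove everything that fits the pattern.
  removeMatches(subject, predicate, object) {
    this.match(subject, predicate, object).forEach((t) => this.remove(t));
    return this;
  }

  forEach(callback) {
    this.triples.slice().forEach(callback);
  }
}

const NAME = "http://xmlns.com/foaf/0.1/name";
const NICK = "http://xmlns.com/foaf/0.1/nick";

// Build a fresh copy of the sample data for each removal style.
function sample() {
  return new TinyGraph([
    { subject: "http://www.example.org/#a", predicate: NAME, object: "Ivan" },
    { subject: "http://www.example.org/#b", predicate: NAME, object: "Manu" },
    { subject: "http://www.example.org/#c", predicate: NICK, object: "Ivan" },
  ]);
}

// Style 1: match, then remove inside a callback.
const viaCallback = sample();
viaCallback.match(null, null, "Ivan").forEach(function (t) {
  viaCallback.remove(t);
});

// Style 2: the compact shortcut.
const viaShortcut = sample().removeMatches(null, null, "Ivan");
```

Both styles leave the same single triple behind, which is the point of offering the shortcut: it is the common case of the callback form written once.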

2. The RDF API layer

The goal of the RDF API layer is to provide an easy way to create
Web Applications that utilize RDF as mash-up data without
requiring the developer to understand the details of RDF. The
central concept in this layer is the Projection. A
Projection bundles a set of triples together that share a common
subject and makes it easy to get to the structured data using
properties as keys. For example, using the Friend-of-a-Friend
vocabulary, one could write the following:

// "ivan" is a Projection that has a specific URI as subject, which we retrieve
// by using a query interface defined in the RDF API.
// Retrieve all of the properties (aka: predicates) associated with Ivan.
props = ivan.getProperties();
// Get the foaf:name for the object, which returns the string "Ivan Herman":
name  = ivan.get("http://xmlns.com/foaf/0.1/name");
// Retrieve the foaf:homepage for the object, which will return
// the string "http://www.ivan-herman.net/"
homepage = ivan.get("http://xmlns.com/foaf/0.1/homepage"); 

Note how this interface hides the difference between a URI and a
Literal. This is intentional: for Web Application developers this
differentiation is sometimes hard to understand and to follow.
While RDF application programmers may care about this distinction,
the RDF API level of abstraction does not. If the difference
between Literals, URIs and other native RDF constructs is
important to your application, use the RDF Interfaces layer.
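
One way to picture a Projection is as a view that groups the triples of one subject by property and hands back plain values. The sketch below is an assumption made for illustration: ToyProjection, its getAll method, and the kind/value shape of the triple objects are hypothetical and are not taken from the draft.

```javascript
const FOAF_NS = "http://xmlns.com/foaf/0.1/";
const IVAN = "http://www.ivan-herman.net/foaf#me";

// Lower-layer view: each object records whether it is a URI or a literal.
const triples = [
  { subject: IVAN, predicate: FOAF_NS + "name",
    object: { kind: "literal", value: "Ivan Herman" } },
  { subject: IVAN, predicate: FOAF_NS + "homepage",
    object: { kind: "uri", value: "http://www.ivan-herman.net/" } },
  { subject: "http://www.example.org/#other", predicate: FOAF_NS + "name",
    object: { kind: "literal", value: "Someone Else" } },
];

// Hypothetical toy Projection: all triples sharing one subject, keyed by property.
class ToyProjection {
  constructor(subject, allTriples) {
    this.subject = subject;
    this.values = new Map();
    allTriples
      .filter((t) => t.subject === subject)
      .forEach((t) => {
        if (!this.values.has(t.predicate)) {
          this.values.set(t.predicate, []);
        }
        // Keep only the plain value; the URI/literal distinction is dropped here.
        this.values.get(t.predicate).push(t.object.value);
      });
  }

  // List the properties (predicates) known for this subject.
  getProperties() {
    return Array.from(this.values.keys());
  }

  // First value for a property, or undefined when there is none.
  get(property) {
    const found = this.values.get(property);
    return found ? found[0] : undefined;
  }

  // Every value for a property, as a fresh array.
  getAll(property) {
    return (this.values.get(property) || []).slice();
  }
}

const ivan = new ToyProjection(IVAN, triples);
const fullName = ivan.get(FOAF_NS + "name");
const homepage = ivan.get(FOAF_NS + "homepage");
```

Here both fullName and homepage come back as ordinary strings, even though one was stored as a literal and the other as a URI.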

The question of how we get to a Projection still remains. The RDF
API layer has another abstraction called DocumentData,
which holds a single Graph containing structured data in the
document. Using this object, a Projection can be retrieved. For
example:

// Get Ivan’s Friend-of-a-Friend data
ivanFoaf = data.parse(turtleparser, "http://www.ivan-herman.net/foaf.ttl");
// Ivan’s URI in the foaf file, the one that is labelled as a foaf:Person
ivan = ivanFoaf.getProjection("http://www.ivan-herman.net/foaf#me");
// An alternative is to retrieve the Projection by using the foaf:Person
// This code assumes that there is only one foaf:Person in the graph
ivan = ivanFoaf.getProjections("http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://xmlns.com/foaf/0.1/Person")[0];

While these examples use absolute URIs to access data, the RDF
Interfaces, RDF API and RDFa API allow you to use Compact URI
Expressions (or CURIEs) to simplify your code. The APIs allow the
developer to manage CURIE prefixes and terms.
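
The idea behind CURIE expansion is a simple prefix lookup. The following sketch shows the mechanism only; resolveCurie and the prefixes map are hypothetical names chosen for this illustration, not the prefix-management interface of the drafts.

```javascript
// Prefix table: short name on the left, namespace URI on the right.
const prefixes = new Map([
  ["foaf", "http://xmlns.com/foaf/0.1/"],
  ["rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"],
]);

// Expand "prefix:reference" into a full URI when the prefix is known.
// Anything else, including absolute URIs such as "http://...", is
// returned unchanged because its "prefix" is not in the table.
function resolveCurie(curie, prefixMap) {
  const colon = curie.indexOf(":");
  if (colon < 0) {
    return curie;
  }
  const prefix = curie.slice(0, colon);
  const reference = curie.slice(colon + 1);
  if (!prefixMap.has(prefix)) {
    return curie;
  }
  return prefixMap.get(prefix) + reference;
}

const nameUri = resolveCurie("foaf:name", prefixes);
const typeUri = resolveCurie("rdf:type", prefixes);
const untouched = resolveCurie("http://www.example.org/x", prefixes);
```

With such a table in place, "foaf:name" stands in for the long form "http://xmlns.com/foaf/0.1/name" used in the examples above.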

3. The RDFa API layer

This layer is provided to give access to the RDFa information in
a document and reuses most of the concepts from the RDF API, such
as Projections and DocumentData. This layer also defines several
extensions to the main Document interface of the DOM. Methods like
getElementsBySubject are provided to query for DOM Nodes containing
RDFa data.
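
To give a rough feel for a subject-based query, the sketch below walks a plain object tree that stands in for a document. It is deliberately simplified and rests on several assumptions: the tree shape, the elementsBySubject function, and the rule that only an exact about attribute counts are all hypothetical. Real RDFa processing also has to deal with CURIEs, inherited subjects, and other attributes.

```javascript
// A plain-object stand-in for a small DOM tree carrying RDFa attributes.
const page = {
  tag: "body", attrs: {}, children: [
    { tag: "div", attrs: { about: "http://www.ivan-herman.net/foaf#me" }, children: [
      { tag: "span", attrs: { property: "foaf:name" }, children: [] },
    ] },
    { tag: "div", attrs: { about: "http://www.example.org/#other" }, children: [] },
    { tag: "p", attrs: { about: "http://www.ivan-herman.net/foaf#me" }, children: [] },
  ],
};

// Collect, in document order, every node whose "about" attribute
// equals the requested subject.
function elementsBySubject(node, subject, found = []) {
  if (node.attrs.about === subject) {
    found.push(node);
  }
  node.children.forEach(function (child) {
    elementsBySubject(child, subject, found);
  });
  return found;
}

const matches = elementsBySubject(page, "http://www.ivan-herman.net/foaf#me");
const tags = matches.map(function (node) {
  return node.tag;
});
```

In a browser the result would be real DOM nodes, so an application could highlight or annotate the parts of the page that describe a given subject.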

When it comes to specifications, the devil is in the details. The
three drafts described in this article are starting to stabilize.
Several more drafts will be published, with feedback gathered from
various Web communities, public comments addressed, and
implementation feedback taken into account. However, the RDFWA Working Group has
provided a clear indication of the general direction and placed a
stake in the ground. It is now up to the Web community to provide
feedback and guide these specifications toward a common Semantic
Web development environment that will be useful to all Web
Application developers. If you would like to provide feedback on
the specifications, please send comments to the official RDFWA
Working Group mailing list.

(By Ivan Herman and Manu Sporny)

About Ivan Herman

Ivan Herman is the leader of the Digital Publishing Activity at W3C. For more details, see http://www.w3.org/People/Ivan/
