Combine the Web of Data and the Web of Documents (RDFa and Drupal 7)

8 November 2010, Shanghai

Tutorial description

Ever since the start of Semantic Web development, one recurring issue has been how to make various types of data available on the Semantic Web for, e.g., further integration. Technically, this means making the data available in RDF. One approach is to encode the data directly in one of the RDF serialization formats, e.g., RDF/XML or Turtle, but that approach does not really scale. Interfaces to databases are being developed that can, for example, provide on-the-fly conversion of data into RDF, often via SPARQL endpoints. Automatic or semi-automatic conversions exist for a number of other formats. In general, it has been recognized that one should not look for a single approach; rather, different types of data on the Web require their own, data-specific ways of expressing their content in RDF.

One obvious source of data on the Web is, of course, hypertext, i.e., (X)HTML and, more generally, XML dialects like SVG or ODF. HTML is not only used by end users to produce Web pages; in fact, most of the HTML pages visible through browsers are produced by various types of back-end processes, primarily content management systems (CMS). The information displayed through HTML is often just a reflection of the content of a database. HTML is, therefore, a natural vehicle for carrying data that can ultimately be processed by the Semantic Web.

The (traditional) Web community has also realized that it is beneficial to reveal more of the underlying information than what a browser can display for what is, essentially, human consumption. The Web 2.0 phenomenon brought a more interactive and participative Web to the fore, one that often relies on additional information and data. At the moment, getting to such data is complicated and often involves “scraping”, i.e., an ad hoc interpretation of the markup of individual web sites. Obviously, this is not a long-term solution.

This dual evolution (on the Semantic Web and on the more “traditional” Web) saw the emergence of a common approach, though with different techniques: (X)HTML pages that carry some sort of “meta” information about what is being displayed. This additional information is added to the (X)HTML content as elements or attributes that annotate the text. If those extra elements and attributes follow some general conventions, then specialized processors, reading the (X)HTML file and analyzing its structure, can automatically extract that extra information.

RDFa

RDFa [RDFA], defined by the World Wide Web Consortium, provides such a general approach. RDFa defines its own set of attributes that are used alongside some of the XHTML ones. Through the usage of these attributes, and the processing rules attached to them, an XML or an HTML file can be regarded as a full serialization of RDF. To Semantic Web experts this sounds very simple but, as usual, the devil is in the details: the definition of the attributes and their usage patterns must work well with the requirements of traditional HTML or XML authoring.
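
To make the idea of attribute-based extraction concrete, here is a minimal sketch in Python. It is not a conforming RDFa processor: it only handles the about, property and content attributes with xmlns-declared prefixes, it ignores nesting and scoping rules, and the sample page, its URLs and the class name TinyRDFaExtractor are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the URLs and the choice of vocabulary are illustrative.
SAMPLE = """
<div xmlns:dc="http://purl.org/dc/terms/" about="http://example.org/post/1">
  <h1 property="dc:title">Hello RDFa</h1>
  <span property="dc:created" content="2010-11-08">8 November 2010</span>
</div>
"""


class TinyRDFaExtractor(HTMLParser):
    """Collects (subject, predicate, literal) triples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}   # prefix -> namespace IRI, from xmlns:* attributes
        self.subject = None  # most recent @about value (no scoping: a simplification)
        self.pending = None  # predicate waiting for the element's text content
        self.triples = []

    def expand(self, curie):
        """Expand 'prefix:local' using the declared prefixes."""
        prefix, _, local = curie.partition(":")
        return self.prefixes.get(prefix, prefix + ":") + local

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Register prefix declarations before expanding any property value.
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            predicate = self.expand(attrs["property"])
            if "content" in attrs:
                # @content overrides the visible text as the literal value.
                self.triples.append((self.subject, predicate, attrs["content"]))
            else:
                self.pending = predicate

    def handle_data(self, data):
        text = data.strip()
        if self.pending is not None and text:
            self.triples.append((self.subject, self.pending, text))
            self.pending = None

    def handle_endtag(self, tag):
        # Do not let an unfilled predicate leak into later elements.
        self.pending = None


def extract_triples(markup):
    parser = TinyRDFaExtractor()
    parser.feed(markup)
    parser.close()
    return parser.triples
```

Calling extract_triples(SAMPLE) yields two triples about http://example.org/post/1: one for the title, taken from the element text, and one for the creation date, taken from the content attribute rather than from the human-readable date. A real processor, such as the RDFa Distiller mentioned in the speaker notes, implements the full processing rules of the Recommendation.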

RDFa has received significant attention over the past few years. Since the publication of the first version of the W3C Recommendation in 2008, major sites like BestBuy and Tesco have adopted RDFa as part of their portal pages; RDFa has also become an important tool for the publication of Open Government Data in the US, the UK, and elsewhere. This interest was triggered by the fact that both Google and Yahoo! "understand" RDFa and make it part of their core operation, though through different approaches. Lately, Facebook has also announced the usage of RDFa as part of its Open Graph Protocol.

Based on the experiences of these first few years of usage, as well as the ongoing discussions with the HTML5 group on how to incorporate RDFa into HTML5, a new Working Group has been recently formed at W3C to define an extension of RDFa (referred to as RDFa 1.1) that provides additional facilities to authors in general. This tutorial will cover both the published recommendation for RDFa and the latest results and plans defined for RDFa 1.1.

The slides for the RDFa section are available for download (in PDF format)

Drupal 7

As mentioned before, Web pages are rarely authored directly; rather, they are generated by content management systems (CMS). Among those, Drupal has helped many organizations to build their online presence and publish content on the Web. Started in 2001, this Open Source publishing platform has gained a lot of momentum and now powers many renowned sites for organizations such as the White House and the World Bank, to name only a few. Developed by a growing community of tens of thousands of developers and contributors, Drupal has always had a strong commitment to keeping up with the latest developments in Web standards such as XHTML, CSS, and ECMAScript, and has recently incorporated RDFa into its core engine. With an estimated half a million websites running on it, Drupal is ranked among the top three CMS platforms in use today [OSCMS]. Drupal's decision to follow the latest RDFa standard will ensure a sustainable future for the data contained in those sites, offering site administrators and editors easy and seamless integration of their content into the Semantic Web.

Drupal is a tool for developers and non-developers alike. One major focus of the Drupal 7 release cycle was usability, particularly for site administrators. Because of this focus, Drupal is a good tool for getting started quickly with RDFa. Drupal 7's internal data structures are mapped to RDF by default, so that data previously locked in silos governed by obscure schemas is now open to the Web of Data. In addition, Drupal 7's theme layer, which is responsible for rendering the HTML elements, can inject RDFa markup into the Web pages it generates in order to annotate each data item with the appropriate RDF information. The fields exposed through this RDFa are the title, the date of publication, author information, the main content, and any comments on the page.
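
The following Python sketch illustrates the general idea of a field-to-RDF mapping driving the generation of RDFa markup. It is not Drupal code (Drupal itself is written in PHP) and does not reproduce Drupal's mapping API; the names RDF_MAPPING, PREFIXES and render_node, the vocabulary terms chosen, and the example URL are all assumptions made for illustration. Only literal-valued fields are covered.

```python
from html import escape

# Illustrative mapping from field names to RDF properties (assumed, not
# a dump of Drupal's default configuration).
RDF_MAPPING = {
    "title": "dc:title",
    "created": "dc:date",
    "body": "content:encoded",
}

# Namespace IRIs for the prefixes used in the mapping above.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "sioc": "http://rdfs.org/sioc/ns#",
    "content": "http://purl.org/rss/1.0/modules/content/",
}


def render_node(uri, rdf_type, fields):
    """Render a dict of field values as an RDFa-annotated HTML fragment."""
    namespaces = " ".join(f'xmlns:{p}="{iri}"' for p, iri in PREFIXES.items())
    lines = [f'<div {namespaces} about="{escape(uri)}" typeof="{rdf_type}">']
    for name, value in fields.items():
        prop = RDF_MAPPING.get(name)
        if prop is None:
            # Unmapped fields are rendered as plain markup, without annotation.
            lines.append(f"  <div>{escape(value)}</div>")
        else:
            lines.append(f'  <div property="{prop}">{escape(value)}</div>')
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_node(
        "http://example.org/node/1",  # hypothetical page URI
        "sioc:Item",
        {"title": "Hello & welcome", "created": "2010-11-08", "summary": "A teaser"},
    ))
```

The point of the design is that the mapping lives in data rather than in the templates: changing which RDF property a field uses means editing one table entry, and the rendering step picks it up for every page.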

Other site-specific information can also be easily exported, including tags or other data items the site administrator adds to the site. For example, an event page could contain RDFa markup for the location, the date of the event, the organizers, and the participants. Drupal's user interface allows this sort of fine-tuned data structure to be defined by the site administrator. It also enables each element of this structure to be mapped to RDF through the same administrative user interface. If needed, developers can also use the Drupal RDF mapping API to create their own RDFa markup via custom modules, or add SPARQL modules to make the data available through SPARQL queries.

By lowering the barriers to entry to RDFa and allowing more people to publish RDFa content online, Drupal will help bootstrap the Semantic Web and provide more real world data on the emerging Web of Data [ISWC2009].

The slides for the Drupal 7 section are available for download (in PDF format)

The goals of this tutorial are:

Speakers

Ivan Herman

Ivan Herman is the Semantic Web Activity Lead at the World Wide Web Consortium; in this capacity, he supervises all the Semantic Web related groups at the W3C, including the work on RDFa. Beyond this general supervisory role, he has also been an active member of some of the technical Working Groups, including OWL, SPARQL, and indeed the RDFa Working Group. He has also developed one of the widely used RDFa processors (“RDFa Distiller”, nicknamed pyRdfa), which also has an experimental version that closely follows the development of RDFa 1.1.

Ivan has given Semantic Web tutorials at various conferences and workshops, including the WWW2008 conference in Beijing, China, and the Chinese Semantic Web Conference in Nanjing in 2009. In 2010 he will give an introductory Semantic Web tutorial at the SemTech conference in San Francisco, USA.

For more details on Ivan Herman see http://www.w3.org/People/Ivan/.

Stéphane Corlosquet

Stéphane Corlosquet has been the main driving force in incorporating Semantic Web capabilities into the Drupal CMS. His RDF CCK and evoc modules, contributed for Drupal 6, have evolved into functionality accepted as standard within the core of the upcoming Drupal 7. He co-authored the ISWC 2009 Best Semantic Web In Use Paper titled "Produce and Consume Linked Data with Drupal!". Stéphane recently finished an M.Sc. in Semantic Web at the Digital Enterprise Research Institute (DERI), Ireland, and joined the MassGeneral Institute for Neurodegenerative Disease (MIND), MGH, as a Software Engineer to work on the Science Collaboration Framework, a Drupal-based distribution for building online communities of researchers in biomedicine.

Stéphane has been a speaker at many Drupal conferences, mostly on the topic of RDF and Drupal. He also spoke at the ISWC2009 conference in Chantilly, VA. In 2010 he will give a tutorial on RDF and Drupal at the SemTech conference in San Francisco, USA.

For more details on Stéphane Corlosquet see http://openspring.net/.

Lin Clark

Lin Clark is a Master's student at the Digital Enterprise Research Institute of NUI Galway, Ireland, where she is researching the integration of Semantic Web technologies in Content Management Systems. She is an active member of the Drupal community, a Drupal core contributor, and was a contributor to the RDF in Drupal 7 initiative. She is also experienced in information architecture for large organizations; a site she planned won the 2009 Association for Computing Machinery (ACM) SIGUCCS Best Of Category award for Computing Services Public Web Site.

Lin has spoken about RDF and Drupal at many Drupal and Linked Data events. In 2010, she will give a tutorial on RDF and Drupal at the SemTech conference in San Francisco, USA.

For more details on Lin Clark see http://lin-clark.com/.

Bibliography

[RDFA]
Adida B., Birbeck M., McCarron S., Pemberton S. (Eds.): "RDFa in XHTML: Syntax and Processing", W3C Recommendation (2008).

[RDFA1.1 Core]
Adida B., Birbeck M., McCarron S., Herman I. (Eds.): "RDFa 1.1 Core: Syntax and processing rules for embedding RDF through attributes", W3C Working Draft (2010).

[OSCMS]
Water & Stone: "Open Source CMS Market Share".

[ISWC2009]
Corlosquet S., Delbru R., Clark T., Polleres A., and Decker S.: "Produce and consume linked data with Drupal!", ISWC2009 Conference Proceedings, pp. 763-778 (2009).