The long journey to RDFa 1.1…

RDFa 1.1 Core, RDFa 1.1 Lite, and XHTML+RDFa 1.1 have just been published as Web Standards, i.e., W3C Recommendations, accompanied by a new edition of the RDFa Primer. Although RDFa 1.1 is “merely” an update of the previous RDFa 1.0 standard (published in 2008), it is a significant milestone nevertheless. It has restructured RDFa 1.0 in terms of the host languages it can be used with, and has also added some important features.

It has been a long journey. The development of RDFa (and I include RDFa 1.0 in this) was slowed down more by “social” than by technical issues. Indeed, RDFa is at the crossroads of two different communities which, alas!, had very little interaction before. As its name suggests, RDFa is of course closely related to RDF, i.e., to the communities around the Semantic Web, Linked Data, RDF, etc. On the other hand, the very goal of RDFa is to add structured data to markup languages (primarily the HTML family, of course, but also SVG, Atom, etc.). This means that RDFa is also relevant to all these communities, often loosely referred to as the “Web Application” community. The interaction between these communities was not always easy, and was often characterized by misunderstandings, different engineering patterns, and different concerns. To make things even more difficult, RDFa was also caught in the middle of the XHTML2 vs. HTML5 controversy: after all, the first drafts of RDFa were developed alongside XHTML2 and, although the current RDFa has long moved away from this heritage, the image of being part of XHTML2 stayed.

But all this is behind us now, and should be relegated to history. In my view the result, RDFa 1.1, reflects a good balance between the concerns and usage patterns of these communities; and that is what really counts. RDFa 1.1 allows the usage of prefixed abbreviations for URIs (so-called CURIEs) that the RDF community has been using, and has been used to, for many years, but (in contrast to RDFa 1.0) their usage is now optional: authors may choose to use full URIs wherever and whenever they wish. By the way, prefixes for CURIEs no longer have to be defined through the @xmlns mechanism inherited from XML (this was probably the single biggest stumbling block around RDFa 1.0): instead, the usage of @xmlns is deprecated in favour of a dedicated @prefix attribute. In addition, a number of well-known vocabularies have predefined prefixes; authors are not required to define prefixes for, say, the Dublin Core, FOAF, Schema.org, or Facebook’s Open Graph Protocol terms; they are automatically recognized. Finally, beyond these facilities with prefixed terms, RDFa 1.1 authors also have the possibility to define a vocabulary for a markup fragment (via the @vocab attribute) and forget about URIs and prefixes altogether: simple terms in property names or types will automatically be assigned URIs in that vocabulary. This is particularly important when RDFa is used with a single vocabulary (Schema.org or OGP usage comes to mind again).
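To make these options concrete, here is a small sketch of the different ways a property can be named. It is only an illustration: the example.org URIs, the “ex” prefix, and the text content are placeholders I made up, not something taken from a real page.

```html
<!-- Hypothetical fragment; everything under example.org is a placeholder. -->
<div prefix="ex: http://example.org/vocab#" resource="http://example.org/post/1">
  <!-- 1. A full URI: no prefix is needed at all. -->
  <span property="http://purl.org/dc/terms/title">The long journey to RDFa 1.1</span>

  <!-- 2. A CURIE with a predefined prefix: "dc" does not have to be declared. -->
  <span property="dc:creator">A. Author</span>

  <!-- 3. A CURIE with an author-defined prefix, declared via @prefix above. -->
  <span property="ex:status">published</span>
</div>

<!-- 4. With @vocab, plain terms are enough: "Book" and "name" are
     resolved to http://schema.org/Book and http://schema.org/name. -->
<div vocab="http://schema.org/" typeof="Book">
  <span property="name">An Example Book</span>
</div>
```

The first three variants can be mixed freely within one fragment; the fourth is the one that matters most when a page sticks to a single vocabulary.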

The behaviour of @property has been made richer, which means that in many (most?) situations the structured data can be expressed with @property alone, without the usage of @rel or @rev (although the usage of the latter two is still possible). This increased simplicity is important for authors who are new to this world and may not, initially, grasp the difference between the classical usage of @property (i.e., literal objects) and @rel (i.e., URI References as objects). (Unfortunately, this change has created some corner-case backward incompatibilities with RDFa 1.0.)
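A short sketch may help to see the difference. The fragment below is my own illustration (the person, the “#me” identifier, and the example.org URI are invented); it assumes an RDFa 1.1 processor.

```html
<!-- Hypothetical fragment; the person and the URIs are placeholders. -->
<div vocab="http://xmlns.com/foaf/0.1/" resource="#me">
  <!-- No @href, @src, or @resource: the object is the literal "Jane Doe". -->
  <span property="name">Jane Doe</span>

  <!-- @href is present (and there is no @content or @datatype):
       in RDFa 1.1 the object is the URI, not the link text. -->
  <a property="homepage" href="http://example.org/jane/">My homepage</a>

  <!-- The classical form with @rel is still valid and yields the same triple. -->
  <a rel="homepage" href="http://example.org/jane/">My homepage</a>
</div>
```

The second element is also an example of the corner-case incompatibility: an RDFa 1.0 processor would have taken the text “My homepage” as a literal object there.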

There are also some other, though maybe less significant, improvements. For example, authors can also express (RDF) lists succinctly; this means that RDFa 1.1 can be used to describe, e.g., author lists for an article (where order counts a lot) or an OWL vocabulary. Also, an awkwardness in RDFa 1.0, related to XML Literals, has been removed.
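As a minimal sketch of the list facility, the fragment below uses the @inlist attribute defined by RDFa 1.1; the paper URI and the author names are invented for the example.

```html
<!-- Hypothetical fragment; the paper URI and the names are placeholders. -->
<p resource="http://example.org/paper/1">
  Authors:
  <!-- Each @inlist value is appended, in document order, to a single
       RDF list that becomes the object of dc:creator. -->
  <span property="dc:creator" inlist="">Alice Example</span>,
  <span property="dc:creator" inlist="">Bob Example</span>, and
  <span property="dc:creator" inlist="">Carol Example</span>
</p>
```

Without @inlist the same markup would produce three separate dc:creator statements, and the order of the authors would be lost in the resulting RDF.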

The structure of RDFa has also changed. Whereas the definition of RDFa 1.0 was closely intertwined with XHTML, RDFa 1.1 separates the core definition from what it calls “Host Languages”. This means that RDFa is defined in such a way that it can be adapted to all types of XML languages as well as to HTML5. There are separate specifications on how RDFa 1.1 applies to XHTML1 and to HTML5, as well as to XML in general; this means that RDFa 1.1 can also be used with SVG, Atom, or MathML, because those languages automatically inherit from the XML definitions.

Last but not least: the Working Group has also defined a separate “subset” language, called RDFa 1.1 Lite. This is not a separate RDFa 1.1 dialect, just an authoring subset of RDFa 1.1: one that makes it easy for authors to step into this world without being forced to use all the possibilities of RDFa 1.1 (i.e., RDF). It can be expected that a large percentage of RDFa usage can be covered by this subset, but it would also provide a good stepping stone when more complex structures (mixture of many different vocabularies, datatypes, more complex graph structures, etc.) are required.
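To give an idea of what this subset looks like in practice: RDFa 1.1 Lite restricts itself to five attributes, namely @vocab, @typeof, @property, @resource, and @prefix. The sketch below uses the first four; the person and the URIs are, again, invented placeholders.

```html
<!-- Hypothetical fragment using only RDFa 1.1 Lite attributes. -->
<div vocab="http://schema.org/" typeof="Person" resource="#jane">
  <!-- Literal-valued properties of the person identified by #jane. -->
  <span property="name">Jane Doe</span>,
  <span property="jobTitle">Technical editor</span>

  <!-- With @href the value of the property is the linked URI. -->
  <a property="url" href="http://example.org/jane/">home page</a>
</div>
```

Because Lite is a subset and not a dialect, any RDFa 1.1 processor handles this fragment as is, and an author can later add, e.g., @datatype or @rel to the same page without changing anything that is already there.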

As I said, it has been a long journey. Many people were involved in the work, both in the Working Group and through comments coming from the public and from major potential users. But now that the result is there, I can safely say: it was worth the effort. Recent figures on the adoption of structured data on the Web (see, for example, the reports published recently at the LDOW 2012 Workshop by Peter Mika and Tim Potter, as well as by Hannes Mühleisen and Christian Bizer) can be summarized by a simple statement: structured data in Web pages is now mainstream, thanks to its adoption by search engines (i.e., Schema.org) and by companies like Facebook. And RDFa 1.1 has a major role to play in this evolution.

If you are new to RDFa: the RDFa Primer is of course a good starting point, but it is also well worth checking out (and possibly contributing to!) the rdfa.info web site, which contains references to tools and documents; you can also try out small RDFa snippets there. Enjoy!

(Blog reproduced from my private site.)