W3C

What are the most widely used vocabularies in RDF(a) (a.k.a. default RDFa 1.1 vocabulary prefixes)?

I have already written about the concept of the RDFa 1.1 core default profile in a blog post published a few months ago, when the second Last Call of the RDFa 1.1 draft came out. This default profile is automatically included in any RDFa 1.1 content by an RDFa 1.1 processor (conceptually, that is; a processor would probably cache the content of this profile). The profile itself defines prefixes for a number of RDF vocabularies. This means that, for example, the following HTML+RDFa file:

<html>
<body>
<p about="xsd:maxExclusive" rel="rdf:type" resource="owl:DatatypeProperty">
An OWL Axiom: "xsd:maxExclusive" is a Datatype Property in OWL.
</p>
</body>
</html>

(note the missing prefix declarations!) will produce the RDF triple that one might expect, i.e.,

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
xsd:maxExclusive rdf:type owl:DatatypeProperty .
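
To make the mechanism concrete, here is a minimal Python sketch of how a processor might resolve the prefixed names in the example against a built-in prefix table. The names `DEFAULT_PREFIXES` and `expand_curie` are hypothetical, the table is only a three-entry excerpt, and the handling of unknown prefixes is a simplification rather than what the RDFa 1.1 specification prescribes.

```python
# Hypothetical excerpt of a default prefix table; the three URIs are the
# ones that appear in the Turtle output above.
DEFAULT_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie, local_prefixes=None):
    """Expand 'prefix:reference' into a full URI.

    Prefixes declared in the document (local_prefixes) take precedence
    over the default table.
    """
    prefixes = {**DEFAULT_PREFIXES, **(local_prefixes or {})}
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        # Simplification: leave anything we cannot resolve untouched.
        return curie
    return prefixes[prefix] + reference


# The subject, predicate and object of the example, without any
# prefix declaration in the document itself.
triple = tuple(
    expand_curie(c)
    for c in ("xsd:maxExclusive", "rdf:type", "owl:DatatypeProperty")
)
```

With this table, `triple` holds the three full URIs of the statement shown in Turtle above; a document that declares its own `xsd` prefix would override the default one.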

The major questions are, of course, (1) which vocabularies are included in such a default profile and (2) how often that list would change.

The answer to the second question is easier: a profile should not change often. Once every 6 months, maybe, or even less frequently, so that caching by the processors would be effective (some processors may even choose to copy, verbatim, the prefix definitions into their code). There is also a rule in the conformance section of the RDFa 1.1 document stating that the content of a profile can only grow; once a vocabulary is included, it should stay there (otherwise existing content will change, and that is not acceptable).
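
The "can only grow" rule is easy to state as a check. The sketch below is only an illustration of the rule as described here: the function name `is_valid_update` is hypothetical, and treating a changed URI for an existing prefix as a violation is my reading of "existing content will change", not a quotation from the specification.

```python
def is_valid_update(old_profile, new_profile):
    """Return True if new_profile keeps every prefix of old_profile,
    bound to the same URI. New prefixes may be added freely."""
    return all(
        new_profile.get(prefix) == uri
        for prefix, uri in old_profile.items()
    )


old = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Adding a prefix is fine.
grown = {**old, "foaf": "http://xmlns.com/foaf/0.1/"}

# Dropping a prefix would change the meaning of existing content.
shrunk = {"rdf": old["rdf"]}

# Rebinding a prefix to another URI (a made-up one here) would too.
rebound = {**old, "owl": "http://example.org/other-owl#"}

results = (
    is_valid_update(old, grown),
    is_valid_update(old, shrunk),
    is_valid_update(old, rebound),
)
```

A processor that copies the prefix definitions into its code could run a check of this kind whenever it refreshes its copy.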

The content issue is more complicated. Here, too, there is a set of vocabularies whose inclusion is fairly straightforward: all vocabularies published as part of a W3C Recommendation or Note should be part of the list. After all, these vocabularies have undergone a rigorous and general review by the community, as secured by the W3C process. (I.e., the example above should be fine.) However, that cannot be all: the real advantage of having a default profile is to include at least some of the widely used vocabularies. (After all, the really interesting examples are not like the one above but, rather, the general RDFa snippets like the ones I described in another, separate blog post.) So here is the real question: what are the, say, 8-10 vocabularies that are widely used on the Semantic Web, general in their topic, and used in RDFa applications?

As already described in my earlier blog post, there are several approaches that one could take. One is to go through some sort of registration mechanism like, for example, the W3C xpointer registry. However, that would not necessarily reflect the widespread usage of the vocabularies; after all, vocabulary owners may not want to go through the extra step of a registration process. After some discussion, the Working Group decided to try a different route, namely to use the information that search engines can provide on vocabularies. And I am happy to report that this has proven to be possible thanks to Péter Mika, from Yahoo!, and Giovanni Tummarello and his friends, from the Sindice team. Both search engines performed a crawl over several billion triples, collected the vocabulary URIs, and sorted them; finally, the top results were merged into a set of default profile prefixes. We deliberately chose to be very restrictive in the numbers, yielding 11 default prefixes beyond the W3C ones. Indeed, one has to take into account that new vocabularies will come up in the future and, if they appear at the top of the lists for new crawls in, say, a year from now, they will be added to the list. In other words, the list may grow, so it is better to stay small at the beginning. Of course, there are a number of technical details on how this list has been generated, how the crawl results were processed, etc.; these are all documented on the W3C site in case you are interested in the details.
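
For readers who want a feel for the "collect, sort, merge" step, here is a toy Python sketch. It is not the procedure that was actually used (that is documented on the W3C site): the function names, the choice of taking the union of the two top lists, and the cutoff are assumptions, and the vocabulary URIs and counts are made-up placeholders, not crawl data.

```python
from collections import Counter


def top_vocabularies(counts, n):
    """Return the n most frequently seen vocabulary URIs, most used first."""
    return [uri for uri, _ in Counter(counts).most_common(n)]


def merge_top(crawl_a, crawl_b, n):
    """Union of the two top-n lists, keeping first-seen order.

    Assumption: a vocabulary qualifies if it is in the top n of either crawl.
    """
    top_a = top_vocabularies(crawl_a, n)
    top_b = top_vocabularies(crawl_b, n)
    return top_a + [uri for uri in top_b if uri not in top_a]


# Placeholder data: vocabulary URI -> number of triples using it.
crawl_a = {
    "http://example.org/vocab/a#": 900,
    "http://example.org/vocab/b#": 500,
    "http://example.org/vocab/c#": 20,
}
crawl_b = {
    "http://example.org/vocab/b#": 700,
    "http://example.org/vocab/d#": 650,
    "http://example.org/vocab/a#": 10,
}

merged = merge_top(crawl_a, crawl_b, n=2)
```

Keeping the cutoff `n` small mirrors the decision to be restrictive at the beginning: later crawls can only add entries to the merged list.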

So, if the list becomes final (we still anticipate comments and feedback before freezing it), it is possible to write something like:

<div typeof="v:Review">
<span property="v:itemreviewed">L’Amourita Pizza</span>
Reviewed by
<span property="v:reviewer">Ulysses Grant</span> on
<span property="v:dtreviewed" content="2009-01-06">Jan 6</span>.
<span property="v:summary">Delicious, tasty pizza on Eastlake!</span>
<span property="v:description">L'Amourita serves up traditional wood-fired
Neapolitan-style pizza, brought to your table promptly and without fuss.
An ideal neighborhood pizza joint.</span>
Rating:
<span property="v:rating">4.5</span>.
Address:
<span property="vcard:street-address">111 Lake Drive</span>,
<span property="vcard:locality">WonderCity</span>,
<span property="vcard:postal-code">5555</span>,
<span property="vcard:country-name">Australia</span>.
</div>

without the need to specify the Google snippet or the vcard vocabularies; they are just there!

Of course, further thoughts, comments, etc., are very welcome!