What are the most widely used vocabularies in RDF(a) (a.k.a. default RDFa 1.1 vocabulary prefixes)?

I have already written about the concept of the RDFa 1.1 core default
profile in a blog post I published a few months ago, when the second
Last Call of the RDFa 1.1 draft was published. This default
profile is automatically included in any RDFa 1.1 content by an
RDFa 1.1 processor (conceptually, that is; a processor would
probably cache the content of this profile). The profile itself
defines prefixes for a number of RDF vocabularies. This means
that, for example, the following HTML+RDFa file:

<html>
<body>
<p about="xsd:maxExclusive" rel="rdf:type" resource="owl:DatatypeProperty">
An OWL Axiom: "xsd:maxExclusive" is a Datatype Property in OWL.
</p>
</body>
</html>

(note the missing prefix declarations!) will produce the RDF
triple that one might expect, i.e.,

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
xsd:maxExclusive rdf:type owl:DatatypeProperty .
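
The prefix resolution behind this example can be sketched in a few lines of Python. This is only an illustration, not how any particular RDFa processor is implemented: the table `DEFAULT_PREFIXES` and the function `expand_curie` are hypothetical names, the table holds only the three prefixes used above, and the rule that locally declared prefixes take precedence over the default ones is an assumption of this sketch.

```python
# Hypothetical subset of the default profile's prefix table; the three
# namespace URIs are the ones that appear in the Turtle output above.
DEFAULT_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie, local_prefixes=None):
    """Expand 'prefix:reference' into a full URI.

    Prefixes declared in the document (local_prefixes) are assumed to
    override the default ones. A string whose prefix is unknown is
    returned unchanged.
    """
    prefixes = {**DEFAULT_PREFIXES, **(local_prefixes or {})}
    prefix, separator, reference = curie.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + reference
    return curie


# The subject, predicate, and object of the example, with no prefix declarations.
triple = tuple(
    expand_curie(term)
    for term in ("xsd:maxExclusive", "rdf:type", "owl:DatatypeProperty")
)
```

With this table, `triple` holds the three full URIs of the triple shown in Turtle above.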

The major questions are, of course, (1) which vocabularies are
included in such a default profile and (2) how often that
list would change.

The answer to the second question is easier: a profile should not
change often. Once every 6 months, maybe, or even less frequently,
so that caching by the processors would be effective (some
processors may even choose to copy, verbatim, the prefix
definitions into their code). There is also a rule
in the conformance section of the RDFa 1.1 document stating that
the content of a profile can only grow; once a vocabulary is
included, it should stay there (otherwise existing content will
change, and that is not acceptable).
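
The "can only grow" rule is easy to express as a check between two versions of a prefix table. The following is a minimal sketch under that reading of the rule; the function name `is_valid_update` and the sample tables are made up for illustration and do not come from the RDFa 1.1 document.

```python
def is_valid_update(old_profile, new_profile):
    """Return True if new_profile only adds to old_profile.

    Every prefix in the old table must still be present and must map to
    the same namespace URI; new prefixes may be added freely.
    """
    return all(
        new_profile.get(prefix) == uri
        for prefix, uri in old_profile.items()
    )


# Illustrative tables only, not the actual profile content.
old = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
grown = {**old, "foaf": "http://xmlns.com/foaf/0.1/"}
shrunk = {"foaf": "http://xmlns.com/foaf/0.1/"}

print(is_valid_update(old, grown))   # adding a prefix is allowed
print(is_valid_update(old, shrunk))  # dropping "rdf" is not
```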

The content issue is more complicated. Here, too, there is a
set of vocabularies whose inclusion is fairly straightforward: all
vocabularies published as part of a W3C Recommendation or Note
should be part of the list. After all, these vocabularies have
undergone a rigorous and general review by the community, as
secured by the W3C process. (I.e., the example above should be
fine.) However, that cannot be all: the real advantage of having a
default profile is to include at least some of the widely used
vocabularies. (After all, the really interesting examples are not
like the one above but, rather, the general RDFa snippets like the
ones I described in another, separate blog post.) So here is the
real question: what are the, say, 8-10
vocabularies that are widely used on the Semantic Web, general in their
topic, and used in RDFa applications?

As already described in my earlier blog post, there are several
approaches that one could
take. One is to go through some sort of registration mechanism
like, for example, the W3C XPointer registry. However, that would
not necessarily reflect the
widespread usage of the vocabularies; after all, vocabulary owners
may not want to go through the extra step of the registration
process. After some discussions, the Working Group decided to try
a different route, namely to use information that search engines
can provide on vocabularies. And I am happy to report that this
has proven to be possible thanks to Péter Mika, from Yahoo!, and Giovanni Tummarello
and his friends, from the Sindice
team. Both search engines performed a crawl over several
billion triples, collected the vocabulary URIs, and sorted them;
finally, the top results were merged into a set of default
profile prefixes. We deliberately chose to be very
restrictive in the numbers, yielding 11 default prefixes beyond
the W3C ones. Indeed, one has to take into account that new
vocabularies will come up in the future and, if they appear at the top
of the lists for new crawls in, say, a year from now, they will be
added to the list. In other words, the list may grow, so it is better
to keep it small at the beginning. Of course, there are a number of technical
details on how this list has been generated, how the crawl results
were processed, etc.; these are all documented
on the W3C site in case you are interested in the details.
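
To give a rough idea of the kind of processing involved, here is a toy sketch of ranking vocabulary namespaces by usage and merging two top lists. It is not the procedure the search engine teams actually followed (that is in the documentation mentioned above); the helper names, the rule of splitting a term URI at its last `#` or `/`, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def namespace_of(term_uri):
    """Return the namespace part of a term URI (up to the last '#' or '/')."""
    cut = max(term_uri.rfind("#"), term_uri.rfind("/"))
    return term_uri[: cut + 1]


def top_namespaces(term_uris, n):
    """Return the n most frequently used namespaces among the given terms."""
    counts = Counter(namespace_of(uri) for uri in term_uris)
    return [namespace for namespace, _ in counts.most_common(n)]


def merge_rankings(*rankings):
    """Union of several top lists, keeping first-seen order, no duplicates."""
    merged = []
    for ranking in rankings:
        for namespace in ranking:
            if namespace not in merged:
                merged.append(namespace)
    return merged


# Made-up term URIs standing in for two separate crawls.
crawl_a = [
    "http://xmlns.com/foaf/0.1/name",
    "http://xmlns.com/foaf/0.1/knows",
    "http://purl.org/dc/terms/title",
]
crawl_b = [
    "http://purl.org/dc/terms/creator",
    "http://purl.org/dc/terms/title",
    "http://rdf.data-vocabulary.org/#Review",
]

candidates = merge_rankings(top_namespaces(crawl_a, 1), top_namespaces(crawl_b, 1))
```

Keeping `n` small for each crawl mirrors the decision to start with a short list that can grow later.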

So, if the list becomes final (we still anticipate comments and
feedback before freezing it), it is possible to write something
like:

<div typeof="v:Review">
<span property="v:itemreviewed">L’Amourita Pizza</span>
Reviewed by
<span property="v:reviewer">Ulysses Grant</span> on
<span property="v:dtreviewed" content="2009-01-06">Jan 6</span>.
<span property="v:summary">Delicious, tasty pizza on Eastlake!</span>
<span property="v:description">L'Amourita serves up traditional wood-fired
Neapolitan-style pizza, brought to your table promptly and without fuss.
An ideal neighborhood pizza joint.</span>
Rating:
<span property="v:rating">4.5</span>.
Address: <span property="vcard:street-address">111 Lake Drive</span>, <span property="vcard:locality">WonderCity</span>, <span property="vcard:postal-code">5555</span>, <span property="vcard:country-name">Australia</span>.
</div>

without the need to declare prefixes for the Google snippet or the vCard
vocabularies; they are just there!

Of course, further thoughts, comments, etc., are very welcome!

About Ivan Herman

Ivan Herman is the leader of the Digital Publishing Activity at W3C. For more details, see http://www.w3.org/People/Ivan/
