$Date: 2013-03-01 15:54:47 $
The set of vocabulary prefixes included in the RDFa 1.1 Default Profile is based on the general usage of those vocabularies on the Semantic Web. This general usage is established using search crawl data, courtesy of Sindice and of Yahoo!. This page describes the methodology used during the crawls as well as the possible post-processing steps.
The methodology used for both the Sindice and the Yahoo! cases was essentially the same, namely:
The most complex and possibly controversial step is 2.2 above. Here are the categories of vocabularies that were removed from the result set:
The rules used for the Sindice and the Yahoo! cases, respectively, are available for download. The final results of the two crawls and subsequent processing are also available; see the Sindice and the Yahoo! pages for further details.
Both crawl results have a relatively natural cut-off point for the vocabularies that should or should not be considered for a default profile, also taking into account that the number of default prefixes should not be very high (in the range of 10, considering that the list might grow as time goes by). For Yahoo! the cut-off value of 10 seems to be a natural choice. It is slightly less clear for the Sindice case, though; at present, the value of 12 has been used.
However, the two data sets should be considered together; an entry from one dataset that scores very low on the other should not be added. Based on this, the following algorithm is used:
This means that the number of final entries is under max(S,Y). (The Python script executing the merge is also available.) The current results are:
|Vocabulary URI||Effective Second Level Domains in the Yahoo! dataset||Effective Second Level Domains in the Sindice dataset|
This list has been included in the RDFa 1.1 Default Profile (also available in Turtle and RDF/XML). In most cases the prefixes are well known and widely used (e.g., foaf); in other cases the prefix.cc service was used to establish the default prefix (e.g., ctag for the http://commontag.org/ns# vocabulary).
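The merging principle described earlier (an entry from one dataset that scores very low on the other should not be added) can be illustrated with a minimal Python sketch. This is not the merge script referred to above. It assumes, for illustration only, that "scores very low" means "falls outside the other dataset's cut-off", so an entry is kept only when it appears within the cut-off of both ranked lists. The function name `merge_top_vocabularies`, its parameters, and the `example.org` URIs are hypothetical.

```python
def merge_top_vocabularies(sindice_ranked, yahoo_ranked, s_cut, y_cut):
    """Return the vocabularies that are within the cut-off of both lists.

    Assumption (not taken from the original merge script): an entry
    "scores very low" on a dataset if it is outside that dataset's cut-off.
    Both inputs are lists of vocabulary URIs, ordered from most to least used.
    """
    top_sindice = sindice_ranked[:s_cut]      # first S entries of the Sindice list
    top_yahoo = set(yahoo_ranked[:y_cut])     # first Y entries of the Yahoo! list
    # Keep the Sindice ordering; drop anything missing from the Yahoo! top list.
    return [uri for uri in top_sindice if uri in top_yahoo]


# Made-up rankings, for illustration only.
sindice = ["http://example.org/a#", "http://example.org/b#",
           "http://example.org/c#", "http://example.org/d#"]
yahoo = ["http://example.org/c#", "http://example.org/a#",
         "http://example.org/e#", "http://example.org/b#"]

merged = merge_top_vocabularies(sindice, yahoo, s_cut=3, y_cut=2)
print(merged)
```

Under this assumption the result can never contain more entries than the smaller of the two cut-off values, which is consistent with the bound of max(S,Y) stated above.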