From W3C Wiki

SWEO Community Project: Linking Open Data on the Semantic Web

Statistics on Data sets

This page collects statistics on Data sets that are available as Linked Data.

Please note: This page is outdated

To keep the LOD cloud diagram up to date, the Linking Open Data community effort has started collecting meta-information about linked datasets on CKAN, a registry of open data and content packages provided by the Open Knowledge Foundation.

The meta-information from CKAN (and not from this page) is used to draw the LOD cloud diagram and to maintain the statistics about the size of the Web of Linked Data on the ESW LOD frontpage.

The list of linked datasets for which we have already collected meta-information on CKAN is found here:

LOD dataset list

Basic statistics about these datasets are provided at:

LOD basic statistics

A guide on how to describe your dataset on CKAN is found here:

LOD CKAN Guidelines

So, if you publish a dataset yourself, or if you know detailed statistics for datasets that you use, please add them to CKAN and they will be included in the next revision of the LOD cloud.

If you want to analyze the LOD cloud, please note that the meta-information about the datasets is available via the CKAN API.
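As a rough illustration of reading dataset meta-information through the CKAN API, the sketch below builds a `package_search` request and extracts dataset names from the JSON response. It assumes a CKAN instance that exposes the Action API (version 3). The base URL `https://ckan.example.org` and the query `tags:lod` are placeholders, not the actual location or tagging scheme of the LOD metadata, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder base URL (assumption); substitute the CKAN instance
# that actually hosts the LOD dataset descriptions.
CKAN_BASE = "https://ckan.example.org"


def package_search_url(base, query, rows=10, start=0):
    """Build a CKAN Action API package_search URL for a query string."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response_text):
    """Extract the dataset names from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN request was not successful")
    return [pkg["name"] for pkg in payload["result"]["results"]]


def fetch_dataset_names(base, query, rows=10):
    """Run the search over HTTP (requires network access)."""
    with urlopen(package_search_url(base, query, rows)) as resp:
        return dataset_names(resp.read().decode("utf-8"))
```

With a real base URL in place, `fetch_dataset_names(CKAN_BASE, "tags:lod")` would return the names of the first matching datasets. Larger result sets can be paged through with the `start` parameter.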

Historic Version of this page

  • "Wrapper" denotes any "data set" which is not available as an RDF Dump, for which size cannot be accurately assessed, or for which triples are dynamically produced.
  • Note -- some links (e.g., those for RAE 2001 and "R e s e x") have been percent-escaped to get around spam-blocks in the ESW Wiki configuration. De-escaping these should allow loading, if they fail when simply clicked in this table.
Data set Size of the data set (number of triples) Wrapper? endpoint?
ACM (RKB) 12,644,052 N Y
Airport Data 754,585 N Y
AudioScrobbler 600,000,000 Y Y
BBC John Peel 277,000 N Y
BBC Music >10,000,000 N N
BBC Playcount Data 10,000 N Y
BBC Programmes 10,000,000 N Y
Budapest BME (RKB) 42,064 N Y
Bio2RDF:Affymetrix 45,560,115 N Y
Bio2RDF:BioCYC 18,699,622 N Y
Bio2RDF:EBI:ChEBI 7,376,253 N Y
Bio2RDF:DBpedia 190,790 N Y
Bio2RDF:GO 8,188,649 N Y
Bio2RDF:HGNC 1,208,802 N Y
Bio2RDF:KEGG:Compound 177,199 N Y
Bio2RDF:KEGG:Drug 116,822 N Y
Bio2RDF:KEGG:Enzyme 556,888 N Y
Bio2RDF:KEGG:Glycan 94,148 N Y
Bio2RDF:KEGG:Pathway 50,793,314 N Y
Bio2RDF:KEGG:Reaction 110,971 N Y
Bio2RDF:InterPro 534,077 N Y
Bio2RDF:iProClass 191,608,264 N Y
Bio2RDF:MGI 1,860,731 N Y
Bio2RDF:NCBI:GeneID 173,132,553 N Y
Bio2RDF:NCBI:HomoloGene 6,598,206 N Y
Bio2RDF:NCBI:OMIM 750,528 N Y
Bio2RDF:NCBI:PubChem 234,427 N Y
Bio2RDF:NCBI:PubMed 797,000,000 N Y
Bio2RDF:NCBI:UniSTS 3,020,035 N Y
Bio2RDF:OBO 4,507,016 N Y
Bio2RDF:Pathway Commons 28,052,098 N Y
Bio2RDF:PDB 918,736 N Y
Bio2RDF:Pfam 534,077 N Y
Bio2RDF:ProDom 534,077 N Y
Bio2RDF:PROSITE 534,077 N Y
Bio2RDF:Reactome 1,231,166 N Y
Bio2RDF:SGD 1,437,648 N Y
Bio2RDF:UniProt:Enzyme 36,109 N Y
Bio2RDF:UniProt:Taxonomy 3,876,099 N Y
Bio2RDF:UniProt:UniParc 490,000,000 N Y
Bio2RDF:UniProt:UniProtKB 242,000,000 N Y
Bio2RDF:UniProt:UniRef 338,602,962 N Y
CiteSeer (RKB) 8,294,523 N Y
CrunchBase 955,676 Y Y
Daily Med 170,000 N Y
DBLP (RKB) 21,461,906 N Y
data-gov 6,400,000,000 Y
data-gov wiki 5,074,932,510 Y
data.gov.uk 50,000,000 Y
DBLP Berlin 28,000,000
DBLP Hannover 30,000,000
DBpedia 409,000,000 N Y
Discogs >6,000,000 N Y
Diseasome 110,000 N Y
Doapspace 100,000
Drug Bank 1,200,000
ECS Southampton (RKB) 307,772 N Y
eprints (RKB) 2,975,291 N Y
Eurécom (RKB) 40,705 N Y
Eurostat 5,000
Flickrexporter 10,000,000
flickrwrappr 100,000 Y Y
FOAFprofiles 60,000,000
Freebase 100,000,000 N Y
Geonames 93,900,000
GeoSpecies 1,758,624 N Y
GovTrack 1,012,000
Guardian MP data 290,005 N Y
IBM (RKB) 44,466 N Y
IEEE (RKB) 111,442 N Y
IRIT Toulouse (RKB) 175,555 N Y
Jamendo 1,100,000
LAAS CNRS (RKB) 51,940 N Y
Lexvo.org 280,000 N N
LIBRIS 5,000,000
lingvoj 21,000+ N N
LinkedCT 9,809,330 N Y
LinkedGeoData 3,000,000,000 N N
LinkedMDB 6,148,121 N Y
Linked Periodicals 405,949 N Y
Magnatune 322,000 N Y
Moseley Festival 2006-09 1,268 N Y
Musicbrainz 60,000,000 N Y
MySpaceWrapper 12,500,000 Y Y
Nasa spaceflight data 96,940 N Y
Newcastle (RKB) 87,394 N Y
Open Archive Initiative (RKB) 216,428,311 N Y
OpenCalais 4,500,000 N Y
OpenCyc 1,600,000 N N
OpenGuides 10,000
Pisa (RKB) 42,927 N Y
ProductDb 1,700,000 N Y
Project Gutenberg 10,000
Pub Guide 5,000
QDOS 5,000
RAE 2001 (RKB) 2,716,252 N Y
RAMEAU 1,677,568 N N
RDF Book Mashup 100,000,000
RDFohloh 700,000 N Y
RDF-TCM 117,643 N Y
R e s e x (RKB) 46,663 N Y
Revyu 20,000
riese 3,000,000
SemanticWeb.org 50,000
SemWebCentral 5,000
SIDER 160,688 Y
SIOCprofiles 50,000
STW Thesaurus for Economics 107,000 N Y
SurgeRadio 218,000
SWConferenceCorpus, aka Dog Food Server 78,289 N Y
Telegraphis Data (Capitals) 2,584 Y Y
Telegraphis Data (Continents) 126 Y Y
Telegraphis Data (Countries) 8,592 Y Y
Telegraphis Data (Currencies) 2,598 Y Y
TripFS N/A Y Y
UMBEL 100,000
US Census Data 1,000,000,000 N Y
Virtuoso Sponger N/A (RDFizes any Web Resource) Y Y
W3CWordNet 710,000
Wikicompany 10,000
World Factbook 40,000
Yago 100,000 N Y
Yovisto - Academic Videos 3,661,151 N Y
Total 19,562,409,691
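For anyone who wants to re-check the figures in the table above, the following is a minimal sketch of how the flattened rows could be parsed and summed. It assumes each row has the form "name, triple count, optional Y/N flags" separated by whitespace. Lower bounds such as ">10,000,000" or "21,000+" are counted at their stated value, and rows without a numeric size (for example "N/A") are skipped. The function names are hypothetical, and the "Total" row should be excluded before summing.

```python
import re

# Matches a triple count such as "12,644,052", ">10,000,000" or "21,000+".
SIZE_RE = re.compile(r"^>?(\d{1,3}(?:,\d{3})*)\+?$")


def parse_row(line):
    """Split one table row into (name, triples, flags).

    triples is None when the row gives no numeric size.
    flags holds the trailing Y/N values as they appear; when fewer
    than two are present, the row does not say which column they fill.
    """
    tokens = line.split()
    flags = []
    # Peel the trailing Y/N flags off the end of the row.
    while tokens and tokens[-1] in ("Y", "N"):
        flags.insert(0, tokens.pop())
    triples = None
    if tokens and tokens[-1] == "N/A":
        tokens.pop()  # size explicitly not available
    elif tokens:
        match = SIZE_RE.match(tokens[-1])
        if match:
            triples = int(match.group(1).replace(",", ""))
            tokens.pop()
    return " ".join(tokens), triples, flags


def total_triples(lines):
    """Sum the triple counts of all rows that state a numeric size."""
    return sum(t for _, t, _ in map(parse_row, lines) if t is not None)
```

Because lower bounds are treated as exact values, a total computed this way is only an approximation of the real size of the data sets.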