Statistics on Data sets
This page collects statistics on Data sets that are available as Linked Data.
Please note: This page is outdated
To keep the LOD cloud diagram up to date, the Linking Open Data community effort has started to collect meta-information about linked datasets on CKAN, a registry of open data and content packages provided by the Open Knowledge Foundation.
The meta-information from CKAN (and not from this page) is used to draw the LOD cloud diagram and to maintain the statistics about the size of the Web of Linked Data on the ESW LOD front page.
The list of linked datasets for which we have already collected meta-information on CKAN is found here:
LOD dataset list
Basic statistics about these datasets are provided at:
LOD basic statistics
A guide on how to describe your dataset on CKAN is found here:
LOD CKAN Guidelines
So, if you publish a dataset yourself or if you know detailed statistics for datasets that you use, please add them to CKAN and they will be included in the next revision of the LOD cloud.
If you want to analyze the LOD cloud, please note that the meta-information about the datasets is available via the CKAN API.
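As a rough illustration of reading that meta-information programmatically, the following Python sketch builds a `package_search` request for the CKAN action API and extracts the custom fields ("extras") from a decoded response. The base URL `https://ckan.example.org`, the query `tags:lod`, and the extras key `triples` are placeholders assumed for the example, not values confirmed by this page; check the CKAN instance and the guidelines linked above for the real ones.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: placeholder base URL; replace it with the real CKAN instance.
CKAN_BASE_URL = "https://ckan.example.org"


def build_search_url(base_url, query, rows=100, start=0):
    """Build a package_search URL for the CKAN action API."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def extract_datasets(response):
    """Return (name, extras) pairs from a decoded package_search response."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    datasets = []
    for package in response["result"]["results"]:
        # Custom fields are returned as a list of {"key": ..., "value": ...}.
        extras = {e["key"]: e["value"] for e in package.get("extras", [])}
        datasets.append((package["name"], extras))
    return datasets


def fetch_datasets(base_url, query):
    """Perform the HTTP request and decode the JSON body (needs network)."""
    with urlopen(build_search_url(base_url, query), timeout=30) as handle:
        return extract_datasets(json.load(handle))
```

Calling `fetch_datasets(CKAN_BASE_URL, "tags:lod")` would return one `(name, extras)` pair per matching dataset, assuming the datasets carry that tag. Larger result sets need repeated calls with an increasing `start` value.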
Historic Version of this page
- "Wrapper" denotes any "data set" which is not available as an RDF Dump, for which size cannot be accurately assessed, or for which triples are dynamically produced.
- Note -- some links (e.g., those for RAE 2001 and "R e s e x") have been percent-escaped to get around spam-blocks in the ESW Wiki configuration. De-escaping these should allow loading, if they fail when simply clicked in this table.
| Data set | Size of the data set (number of triples) | Wrapper? | Endpoint? |
| ACM (RKB) | 12,644,052 | N | Y |
| Airport Data | 754,585 | N | Y |
| AudioScrobbler | 600,000,000 | Y | Y |
| BBC John Peel | 277,000 | N | Y |
| BBC Music | >10,000,000 | N | N |
| BBC Playcount Data | 10,000 | N | Y |
| BBC Programmes | 10,000,000 | N | Y |
| Budapest BME (RKB) | 42,064 | N | Y |
| Bio2RDF:Affymetrix | 45,560,115 | N | Y |
| Bio2RDF:BioCYC | 18,699,622 | N | Y |
| Bio2RDF:EBI:ChEBI | 7,376,253 | N | Y |
| Bio2RDF:DBpedia | 190,790 | N | Y |
| Bio2RDF:GO | 8,188,649 | N | Y |
| Bio2RDF:HGNC | 1,208,802 | N | Y |
| Bio2RDF:KEGG:Compound | 177,199 | N | Y |
| Bio2RDF:KEGG:Drug | 116,822 | N | Y |
| Bio2RDF:KEGG:Enzyme | 556,888 | N | Y |
| Bio2RDF:KEGG:Glycan | 94,148 | N | Y |
| Bio2RDF:KEGG:Pathway | 50,793,314 | N | Y |
| Bio2RDF:KEGG:Reaction | 110,971 | N | Y |
| Bio2RDF:InterPro | 534,077 | N | Y |
| Bio2RDF:iProClass | 191,608,264 | N | Y |
| Bio2RDF:MGI | 1,860,731 | N | Y |
| Bio2RDF:NCBI:GeneID | 173,132,553 | N | Y |
| Bio2RDF:NCBI:HomoloGene | 6,598,206 | N | Y |
| Bio2RDF:NCBI:OMIM | 750,528 | N | Y |
| Bio2RDF:NCBI:PubChem | 234,427 | N | Y |
| Bio2RDF:NCBI:PubMed | 797,000,000 | N | Y |
| Bio2RDF:NCBI:UniSTS | 3,020,035 | N | Y |
| Bio2RDF:OBO | 4,507,016 | N | Y |
| Bio2RDF:Pathway Commons | 28,052,098 | N | Y |
| Bio2RDF:PDB | 918,736 | N | Y |
| Bio2RDF:Pfam | 534,077 | N | Y |
| Bio2RDF:ProDom | 534,077 | N | Y |
| Bio2RDF:PROSITE | 534,077 | N | Y |
| Bio2RDF:Reactome | 1,231,166 | N | Y |
| Bio2RDF:SGD | 1,437,648 | N | Y |
| Bio2RDF:UniProt:Enzyme | 36,109 | N | Y |
| Bio2RDF:UniProt:Taxonomy | 3,876,099 | N | Y |
| Bio2RDF:UniProt:UniParc | 490,000,000 | N | Y |
| Bio2RDF:UniProt:UniProtKB | 242,000,000 | N | Y |
| Bio2RDF:UniProt:UniRef | 338,602,962 | N | Y |
| CiteSeer (RKB) | 8,294,523 | N | Y |
| CrunchBase | 955,676 | Y | Y |
| Daily Med | 170,000 | N | Y |
| DBLP (RKB) | 21,461,906 | N | Y |
| data-gov | 6,400,000,000 | Y | |
| data-gov wiki | 5,074,932,510 | Y | |
| data.gov.uk | 50,000,000 | Y | |
| DBLP Berlin | 28,000,000 | | |
| DBLP Hannover | 30,000,000 | | |
| DBpedia | 409,000,000 | N | Y |
| Discogs | >6,000,000 | N | Y |
| Diseasome | 110,000 | N | Y |
| Doapspace | 100,000 | | |
| Drug Bank | 1,200,000 | | |
| ECS Southampton (RKB) | 307,772 | N | Y |
| eprints (RKB) | 2,975,291 | N | Y |
| Eurécom (RKB) | 40,705 | N | Y |
| Eurostat | 5,000 | | |
| Flickrexporter | 10,000,000 | | |
| flickrwrappr | 100,000 | Y | Y |
| FOAFprofiles | 60,000,000 | | |
| Freebase | 100,000,000 | N | Y |
| Geonames | 93,900,000 | | |
| GeoSpecies | 1,758,624 | N | Y |
| GovTrack | 1,012,000 | | |
| Guardian MP data | 290,005 | N | Y |
| IBM (RKB) | 44,466 | N | Y |
| IEEE (RKB) | 111,442 | N | Y |
| IRIT Toulouse (RKB) | 175,555 | N | Y |
| Jamendo | 1,100,000 | | |
| LAAS CNRS (RKB) | 51,940 | N | Y |
| Lexvo.org | 280,000 | N | N |
| LIBRIS | 5,000,000 | | |
| lingvoj | 21,000+ | N | N |
| LinkedCT | 9,809,330 | N | Y |
| LinkedGeoData | 3,000,000,000 | N | N |
| LinkedMDB | 6,148,121 | N | Y |
| Linked Periodicals | 405,949 | N | Y |
| Magnatune | 322,000 | N | Y |
| Moseley Festival 2006-09 | 1,268 | N | Y |
| Musicbrainz | 60,000,000 | N | Y |
| MySpaceWrapper | 12,500,000 | Y | Y |
| Nasa spaceflight data | 96,940 | N | Y |
| Newcastle (RKB) | 87,394 | N | Y |
| Open Archive Initiative (RKB) | 216,428,311 | N | Y |
| OpenCalais | 4,500,000 | N | Y |
| OpenCyc | 1,600,000 | N | N |
| OpenGuides | 10,000 | | |
| Pisa (RKB) | 42,927 | N | Y |
| ProductDb | 1,700,000 | N | Y |
| Project Gutenberg | 10,000 | | |
| Pub Guide | 5,000 | | |
| QDOS | 5,000 | | |
| RAE 2001 (RKB) | 2,716,252 | N | Y |
| RAMEAU | 1,677,568 | N | N |
| RDF Book Mashup | 100,000,000 | | |
| RDFohloh | 700,000 | N | Y |
| RDF-TCM | 117,643 | N | Y |
| R e s e x (RKB) | 46,663 | N | Y |
| Revyu | 20,000 | | |
| riese | 3,000,000 | | |
| SemanticWeb.org | 50,000 | | |
| SemWebCentral | 5,000 | | |
| SIDER | 160,688 | | Y |
| SIOCprofiles | 50,000 | | |
| STW Thesaurus for Economics | 107,000 | N | Y |
| SurgeRadio | 218,000 | | |
| SWConferenceCorpus, aka Dog Food Server | 78,289 | N | Y |
| Telegraphis Data (Capitals) | 2,584 | Y | Y |
| Telegraphis Data (Continents) | 126 | Y | Y |
| Telegraphis Data (Countries) | 8,592 | Y | Y |
| Telegraphis Data (Currencies) | 2,598 | Y | Y |
| TripFS | N/A | Y | Y |
| UMBEL | 100,000 | | |
| US Census Data | 1,000,000,000 | N | Y |
| Virtuoso Sponger | N/A (RDFizes any Web Resource) | Y | Y |
| W3CWordNet | 710,000 | | |
| Wikicompany | 10,000 | | |
| World Factbook | 40,000 | | |
| Yago | 100,000 | N | Y |
| Yovisto - Academic Videos | 3,661,151 | N | Y |
| Total | 19,562,409,691 | | |
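The size column above mixes exact counts with lower bounds (">10,000,000", "21,000+") and "N/A" entries, so any aggregate over it depends on how those cells are interpreted. The Python sketch below shows one possible reading, assumed here for illustration only: lower bounds are counted at their stated value and cells without a number are skipped. The function names are hypothetical, and the sketch does not claim to reproduce how the Total row was computed.

```python
import re


def parse_triple_count(cell):
    """Convert a size cell to an int, or None when it states no number."""
    match = re.search(r"\d[\d,]*", cell)
    if match is None:
        return None  # e.g. "N/A"
    # Assumption: markers such as ">" or "+" are ignored and the bound
    # itself is used as the count.
    return int(match.group().replace(",", ""))


def sum_triple_counts(cells):
    """Sum all cells that carry a number, skipping the rest."""
    counts = (parse_triple_count(cell) for cell in cells)
    return sum(count for count in counts if count is not None)
```

For example, `parse_triple_count("21,000+")` yields `21000`, and `parse_triple_count("N/A (RDFizes any Web Resource)")` yields `None`.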