Project
|
Data Exposed
|
Size of Dump and Data Set
|
Archive URL
|
Publisher / Maintainer URI
|
Addgene
|
Addgene catalog (tab-delimited file)
|
1.1 MB
|
tab-delimited file
|
|
Allen Brain Atlas
|
Science Commons extract from ABA Web site, on or shortly before 26 Feb 2007
|
51 MB
|
dump file
|
|
Airport Data
|
SPARQL
|
754,585
|
|
Rob Styles
|
BAMS
|
BAMS
|
5.6 MB
|
bams-from-swanson-98-4-23-07.owl
|
|
BBC John Peel sessions from DBtune.org
|
holding data released during Hackday, 2007
|
277,000 triples
|
peel.tar.gz
|
[ URI?]
|
BBOP
|
All OBO ontologies
|
36 MB
|
obo-all.tgz
|
[URI]
|
BBOP
|
selected OBO ontologies, downloaded ~21 April 2007, augmented with inferred relations
|
2.6 MB
|
obo-in-owl.tgz
|
[URI]
|
Billion Triples Challenge Dataset 2008
|
various dumps
|
1 billion triples
|
download page
|
|
Billion Triples Challenge Dataset 2009
|
crawled Web data
|
1.14 billion triples
|
download page
|
|
Billion Triples Challenge Dataset 2010
|
crawled Web data
|
3.2 billion triples
|
download page
|
|
Bio2RDF
|
various bio- and gene-related datasets
|
10+ billion triples
|
download page
|
|
Bitzi
|
collaborative file-describing service
|
330,026 discrete files, 270MB uncompressed
|
dump directory
|
|
British Geological Survey (BGS) OpenGeoscience
|
1:625 000 Geology of the UK (DigMapGB), BGS geochronology and chronostratigraphy, BGS Lexicon of Named Rock Units
|
Approx. 840,000 triples (19 MB compressed, N-Triples)
|
data_bgs_ac_uk_ALL.zip
|
British Geological Survey (BGS)
|
|
Chef Moz
|
290,344 restaurants, 104,856 reviews, 59,243 links to reviews, 2,402 editors
|
size?
|
URL?
|
|
Data-gov Wiki
|
Datasets containing RDF data converted from datasets published at http://data.gov (and other sources). The datasets are clustered by dc:subject, e.g. government budget, environmental statistics, housing and population statistics, medical cost, energy consumption, public library statistics, and labor statistics.
|
5+ billion triples
|
dump directory
|
Tetherless World Constellation
|
DBpedia
|
Data set containing extracted data from Wikipedia. About 2.6 million concepts described by 247 million triples, including abstracts in 14 different languages
|
247 million triples
|
dump directory
|
|
DMOZ RDF Dump
|
DMOZ
|
size?
|
URL?
|
|
DOAP Store
|
provides daily generated dumps with all its DOAP project descriptions
|
size?
|
RDF/XML, N3
|
|
DOAPspace
|
All 55,000+ DOAP profiles available as RDF/XML DOAP. This includes all DOAP created by doapspace and all DOAP spidered.
|
size?
|
XML/RDF tarball
|
|
Entrez Gene
|
Select fields from Entrez Gene records
|
7.7 MB
|
gene-owl.tgz
|
[URI]
|
Entrez Gene Extract from [1]
|
Entrez Gene Extract from [2]
|
5.6 MB
|
gene-pubmed.ttl.tgz
|
[ URI?]
|
Freebase RDF Store
|
Views of Freebase topics, published following the principles of Linked Data. The extracts aggregate data from Wikipedia, MusicBrainz, IMDB, TVDB, Flickr, and more (tab-delimited file)
|
505 MB, compressed in bzip2 format
|
tab-separated values file
|
|
FlyAtlas
|
FlyAtlas and Affy D2 probe-to-gene
|
size?
|
FlyAtlas data and Affy D2 probe-to-gene
|
|
Fly-TED
|
Derived from data published by www.fly-ted.org; provides metadata on images depicting in situ hybridisation in D. melanogaster testes.
|
size?
|
RDF dump
|
|
Galen from co-ode.org
|
Galen from co-ode.org
|
1.9 MB
|
galen.tgz
|
[ URI?]
|
GeoSpecies Knowledge Base
|
Information on Biological Orders, Families, Species as well as species occurrence records and related data
|
1.888M Triples
|
geospecies.rdf.gz
|
Peter J. DeVries, UW - Madison
|
GO annotations from National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI)
|
GO annotations from National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI)
|
73 MB
|
goa-in-owl.tgz
|
[ URI?]
|
GovTrack.us
|
data about the U.S. Congress
|
13 million triples
|
dump directory
|
|
HCLSIG LODD group
|
various dumps
|
??
|
HCLSIG LODD-related datasets
|
|
Homologene
|
what?
|
626 KB
|
homologene.tgz
|
URI?
|
Jamendo from DBtune.org
|
data from the Jamendo website
|
1.1 million triples
|
jamendo-rdf.tar.gz
|
[ URI?]
|
Lexvo
|
Information about languages, words, characters, and other human language-related entities.
|
~20MB
|
RDF Dump
|
Gerard de Melo
|
LinkedCT
|
Linked Clinical Trials
|
9.8 million triples. 1.6GB NTriples dump
|
Dump Directory
|
Publisher / Maintainer
|
LinkedMDB
|
Linked Data about Movies
|
6.1 million triples. 850MB NTriples dump
|
Dump Directory
|
Publisher / Maintainer
|
Linked Sensor Data
|
Datasets for sensors and sensor observations, converted from weather data at Mesowest. Contains descriptions of 20 thousand weather stations and 160 million observations.
|
1.7 billion triples
|
download page
|
Kno.e.sis: Semantic Sensor Web
|
Magnatune from DBtune.org
|
data from the Magnatune label
|
Size?
|
magnatune_main.rdf
|
[ URI?]
|
MeSH headings
|
List of all associations of MeSH headings to papers indexed by Medline extracted from 2007 Medline baseline distribution
|
758 MB
|
medline-mesh.tgz
|
[URI]
|
MeSH titles
|
Extracted from 2007 Medline baseline distribution
|
670 MB
|
medline-titles.tgz
|
[URI]
|
MeSH pairs
|
NLM 2007 MeSH descriptor/qualifier pairs
|
13 MB
|
mesh-qualified-headings.ttl.gz
|
[URI]
|
MusicBrainz
|
—
|
Currently the zipped version of this data is 102MB
|
dump directory
|
|
Neurocommons text mining pilot
|
extracted from Temis software applied to 7% of Medline records
|
24 MB
|
neurocommons-text-mining.tgz
|
[URI]
|
NLM 2007 MeSH
|
NLM 2007 MeSH
|
13 MB
|
mesh-skos.tgz
|
[ URI?]
|
OpenCyc
|
OpenCyc Ontology
|
~1.6 million triples, ~150MB uncompressed
|
gzipped .owl file
|
Cycorp, Inc.
|
OSM Semantic Network
|
Geo-vocabulary utilised in OpenStreetMap. Linked to LinkedGeoData and WordNet W3C.
|
~130K triples
|
Core RDF Mappings RDF
|
Andrea Ballatore
|
Open Directory
|
—
|
size?
|
dump directory
|
|
Ordnance Survey
|
administrative geography data
|
size?
|
administrative geography RDF dump
|
|
RAMEAU subject headings
|
SKOS representation of the RAMEAU book indexing vocabulary, maintained by the French National Library (BnF)
|
130 MB uncompressed
|
download folder
|
Antoine Isaac and Rameau committee
|
Quotations Book
|
at least 42,000 famous quotations with author and subject
|
size?
|
QB's Quotes RDF
|
|
RKB Explorer Data
|
25 different domains, each with a separate data set. The data sets are focused on scientific research; these include DBLP, Citeseer, CORDIS, NSF, EPSRC, RAE2001, KISTI, UNLOCODE, Wordnet, voiD, OS.
|
~60 million triples
|
Use Semantic Web Sitemaps to find the URLs
|
Hugh Glaser
|
Rpm Find
|
data exposed?
|
expands to about 1.3GB
|
dump directory
|
|
Science Commons
|
A bridging ontology from Science Commons. It imports the other ontologies used in the prototype and defines the classes and relations used to represent gene records and their contents, as well as a few items that are referred to by imported data sources but are not available in a published ontology.
|
19 KB
|
sciencecommons.owl
|
[URI]
|
Semantic Bible
|
New Testament Names (NTNames), a semantic knowledge base describing each named thing in the New Testament
|
about 600 names
|
ontology definition, instance data
|
NTNames base URI
|
Semantic Web Dog Food
|
Metadata for several semantic web related conferences and workshops, including the most recent ISWC, ESWC and WWW events.
|
size?
|
URLs?
|
|
SIMILE Data Collection
|
various data sets, including the CIA's World Factbook, the Library of Congress's Thesaurus for Graphic Materials, the National Cancer Institute's cancer thesaurus, and the Web Consortium's Technical Reports
|
Size?
|
dump URL?
|
|
STW Thesaurus for Economics
|
Thesaurus for economics and business economics, including a classification of subject categories. Maintained by the German National Library of Economics (ZBW)
|
12 MB uncompressed
|
download page
|
ZBW
|
SwetoDblp
|
ontology focused on bibliography data of publications from DBLP with additions that include affiliations, universities, and publishers
|
11M triples
|
vocabulary, instances
|
|
TaxonConcept Knowledge Base
|
Species Concepts and related Biodiversity Informatics data
|
8.2M Triples
|
txn_base.rdf.gz species_01.rdf.gz species_02.rdf.gz
|
Peter J. DeVries, UW - Madison
|
|
Telegraphis Linked Open Data
|
Countries, Continents, Capitals, and Currencies collected from GeoNames and Wikipedia data
|
<10k triples apiece
|
Countries, Continents, Capitals, and Currencies
|
Pipian (Ian Jacobi)
|
Texai Lexicon
|
Machine-readable dictionary derived from WordNet 2.1, Wiktionary, the CMU Pronouncing Dictionary, and the OpenCyc lexicon. Each lexicon word sense entry contains links back to the source dictionary entry, and also to OpenCyc if the entry has been mapped to the Cyc ontology.
|
Size?
|
dump URL?
|
|
TCMGeneDIT Dataset
|
Traditional Chinese medicine, gene, and disease association dataset, plus a linkset mapping TCM gene symbols to Entrez Gene IDs created by Neurocommons
|
288 KB compressed
|
TCMGeneDIT_r3_ttl.tar.gz [3]
|
JunZhao
|
t4gm.info
|
Thesaurus for Graphic Materials
|
7.3MB uncompressed
|
RDF dump
|
Bradley P. Allen
|
UniProt
|
a large life sciences data set
|
300M+ triples
|
dump directory
|
|
U.S. Census data
|
population statistics at various geographic levels, from the U.S. as a whole down through states, counties, and sub-counties (roughly, cities and incorporated towns)
|
1 billion triples
|
dump available on request
|
JT
|
U.S. SEC data
|
corporate ownership
|
1.8 million triples
|
RDF dump
|
JT
|
van Assem et al.'s ontology
|
(used by output of MeSH to SKOS conversion)
|
2.2 KB
|
mesh07-eswc06.rdfs
|
[URI]
|
Wikipedia³
|
metadata extracted from Wikipedia
|
47 million triples
|
dump URL?
|
|
Yale Senselab
|
Yale Senselab
|
216 KB
|
senselab.tgz
|
[URI]
|
YAGO
|
The complete YAGO ontology
|
1 GB
|
YAGO as zipped RDFS
|
YAGO team at the Max-Planck Institute for Informatics
|
YAGO
|
The subClassOf hierarchy of the YAGO ontology
|
7 MB
|
YAGO hierarchy as zipped RDFS
|
YAGO team at the Max-Planck Institute for Informatics
|
World Bank Linked Data
|
World Bank data published using the Linked Data principles. It currently includes World Development Indicators, World Bank Finances, World Bank Projects and Operations, and World Bank Climate Change data.
|
160 million triples
|
See void:dataDump values
|
Sarven Capadisli
|
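Some rows in the table, such as World Bank Linked Data, point to `void:dataDump` values instead of a single archive link. As a rough illustration only, the sketch below pulls those values out of a VoID description that has already been serialized as N-Triples. The function name `data_dump_urls`, the sample text, and the `example.org` IRIs are hypothetical. The regular expression handles only simple lines whose subject, predicate, and object are all IRIs, so it is not a full N-Triples parser.

```python
import re

# Predicate IRI for void:dataDump in the VoID vocabulary.
VOID_DATA_DUMP = "http://rdfs.org/ns/void#dataDump"

# Matches a simple N-Triples line whose subject, predicate and object are all IRIs.
# Literals, blank nodes and comments are deliberately ignored in this sketch.
_IRI_TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def data_dump_urls(ntriples_text):
    """Return the object IRIs of every void:dataDump statement, in document order."""
    urls = []
    for line in ntriples_text.splitlines():
        match = _IRI_TRIPLE.match(line)
        if match and match.group(2) == VOID_DATA_DUMP:
            urls.append(match.group(3))
    return urls


# Hypothetical VoID description; the example.org IRIs are placeholders, not real dumps.
SAMPLE = """\
<http://example.org/void#ds> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/ns/void#Dataset> .
<http://example.org/void#ds> <http://rdfs.org/ns/void#dataDump> <http://example.org/dumps/part-1.nt.gz> .
<http://example.org/void#ds> <http://rdfs.org/ns/void#dataDump> <http://example.org/dumps/part-2.nt.gz> .
"""

if __name__ == "__main__":
    for url in data_dump_urls(SAMPLE):
        print(url)
```

For VoID files published as Turtle or RDF/XML, a real RDF parser would be the safer choice; the line-based approach here is only meant to show which statements to look for.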