DataSetRDFDumps

From W3C Wiki

Linked Data Sets (i.e., with Dereferenceable URIs) available as RDF Dumps

  • Please provide the URL for the directory containing the RDF dump files.
  • Please try to provide one directory or tarball per dump set, so that we can retrieve and load the entire contents of the URL to restore a snapshot.
  • Please include a Publisher/Maintainer URI, for use in constructing attribution triples.
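As a rough illustration of why the Publisher/Maintainer URI is requested, the sketch below builds attribution triples for one dump in N-Triples syntax. The choice of dcterms:publisher and void:dataDump is an assumption made for this example (the page does not prescribe a vocabulary), the function name attribution_triples is hypothetical, and the example.org URIs are placeholders.

```python
# Minimal sketch: emit attribution triples for a dump as N-Triples lines.
# Assumption: dcterms:publisher and void:dataDump are used as predicates;
# the page itself does not say which vocabulary to use.

DCTERMS_PUBLISHER = "http://purl.org/dc/terms/publisher"
VOID_DATA_DUMP = "http://rdfs.org/ns/void#dataDump"

# Characters that may not appear unescaped inside an N-Triples IRI.
FORBIDDEN_IN_IRI = '<>" {}|\\^`'


def iri(value):
    """Wrap a URI in angle brackets, rejecting characters N-Triples forbids."""
    if any(ch in FORBIDDEN_IN_IRI for ch in value):
        raise ValueError(f"not usable as an N-Triples IRI: {value!r}")
    return f"<{value}>"


def attribution_triples(dataset_uri, publisher_uri, dump_url):
    """Return two N-Triples lines: who publishes the data set, and where its dump lives."""
    subject = iri(dataset_uri)
    return [
        f"{subject} {iri(DCTERMS_PUBLISHER)} {iri(publisher_uri)} .",
        f"{subject} {iri(VOID_DATA_DUMP)} {iri(dump_url)} .",
    ]


if __name__ == "__main__":
    # Placeholder URIs, not taken from any row of the table.
    for line in attribution_triples(
        "http://example.org/dataset",
        "http://example.org/people/maintainer",
        "http://example.org/dumps/",
    ):
        print(line)
```

Loading these lines alongside the restored snapshot keeps the publisher attached to the data even after the dump has been copied away from its original URL.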
Project | Data Exposed | Size of Dump and Data Set | Archive URL | Publisher / Maintainer URI
Addgene | Addgene catalog (tab delimited file) | 1.1 MB | tab-delimited file |
Allen Brain Atlas | Science Commons extract from ABA Web site, on or shortly before 26 Feb 2007 | 51 MB | dump file |
Airport Data | SPARQL | 754,585 | | Rob Styles
BAMS | BAMS | 5.6 MB | bams-from-swanson-98-4-23-07.owl |
BBC John Peel sessions from DBtune.org | holding data released during Hackday, 2007 | 277,000 triples | peel.tar.gz | [URI?]
BBOP | All OBO ontologies | 36 MB | obo-all.tgz | [URI]
BBOP | selected OBO ontologies, downloaded ~21 April 2007, augmented with inferred relations | 2.6 MB | obo-in-owl.tgz | [URI]
Billion Triples Challenge Dataset 2008 | various dumps | 1 billion triples | download page |
Billion Triples Challenge Dataset 2009 | crawled Web data | 1.14 billion triples | download page |
Billion Triples Challenge Dataset 2010 | crawled Web data | 3.2 billion triples | download page |
Bio2RDF | various bio- and gene-related datasets | 2.7 billion triples | download page |
Bitzi | collaborative file describing service | 330,026 discrete files, 270MB uncompressed | dump directory |
British Geological Survey (BGS) OpenGeoscience | 1:625 000 Geology of the UK (DigMapGB), BGS geochronology and chronostratigraphy, BGS Lexicon of Named Rock Units | Approx 840,000 (19 MB compressed) | N-Triples data_bgs_ac_uk_ALL.zip | British Geological Survey (BGS)
Chef Moz | 290344 restaurants - 104856 reviews - 59243 links to reviews - 2402 editors | size? | URL? |
Data-gov Wiki | Datasets containing RDF data converted from datasets published at http://data.gov (and other sources). The datasets are clustered by dc:subject, e.g. government budget, environmental statistics, housing and population statistics, medical cost, energy consumption, public library statistics, and labor statistics. | 5+ billion triples | dump directory | Tetherless World Constellation
DBpedia | Data set containing extracted data from Wikipedia. About 2.6 million concepts described by 247 million triples, including abstracts in 14 different languages | 247 million triples | dump directory |
DMOZ RDF Dump | DMOZ | size? | URL? |
DOAP Store | provides daily generated dumps with all its DOAP project descriptions | size? | RDF/XML, N3 |
DOAPspace | All 55,000+ DOAP profiles available as RDF/XML DOAP. This includes all DOAP created by doapspace and all DOAP spidered. | size? | XML/RDF tarball |
Entrez Gene | Select fields from Entrez Gene records | 7.7 MB | gene-owl.tgz | [URI]
Entrez Gene Extract from [1] | Entrez Gene Extract from [2] | 5.6 MB | gene-pubmed.ttl.tgz | [URI?]
Freebase RDF Store | Freebase Views of Freebase Topics following the principles of Linked Data. The dataset extractions contain aggregated data from: Wikipedia, MusicBrainz, IMDB, TVDB, Flickr and more... (tab delimited file) | 505 Mbytes compressed in the Bzip2 format | tab-separated values file |
FlyAtlas | FlyAtlas and Affy D2 probe-to-gene | size? | FlyAtlas data and Affy D2 probe-to-gene |
Fly-TED | derived from data published by www.fly-ted.org; provides metadata on images depicting in situ hybridisation in D. melanogaster testes. | size? | RDF dump |
Galen from co-ode.org | Galen from co-ode.org | 1.9 MB | galen.tgz | [URI?]
GeoSpecies Knowledge Base | Information on Biological Orders, Families, Species as well as species occurrence records and related data | 1.888M Triples | geospecies.rdf.gz | Peter J. DeVries, UW - Madison
GO annotations from National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI) | GO annotations from National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI) | 73 MB | goa-in-owl.tgz | [URI?]
GovTrack.us | about the U.S. congress | 13 million triples | dump directory |
HCLSIG LODD group | various dumps | ?? | HCLSIG LODD-related datasets |
Homologene | what? | 626 KB | homologene.tgz | URI?
Jamendo from DBtune.org | data from the Jamendo website | 1.1 million triples | jamendo-rdf.tar.gz | [URI?]
Lexvo | Information about languages, words, characters, and other human language-related entities. | ~20MB | RDF Dump | Gerard de Melo
LinkedCT | Linked Clinical Trials | 9.8 million triples. 1.6GB NTriples dump | Dump Directory | Publisher / Maintainer
LinkedMDB | Linked Data about Movies | 6.1 million triples. 850MB NTriples dump | Dump Directory | Publisher / Maintainer
Linked Sensor Data | Datasets for sensors and sensor observations, converted from weather data at Mesowest. Contains descriptions of 20 thousand weather stations and 160 million observations. | 1.7 billion triples | download page | Kno.e.sis: Semantic Sensor Web
Magnatune from DBtune.org | data from the Magnatune label | Size? | magnatune_main.rdf | [URI?]
MeSH headings | List of all associations of MeSH headings to papers indexed by Medline, extracted from 2007 Medline baseline distribution | 758 MB | medline-mesh.tgz | [URI]
MeSH titles | Extracted from 2007 Medline baseline distribution | 670 MB | medline-titles.tgz | [URI]
MeSH pairs | NLM 2007 MeSH descriptor/qualifier pairs | 13 MB | mesh-qualified-headings.ttl.gz | [URI]
MusicBrainz | | Currently the zipped version of this data is 102MB | dump directory |
Neurocommons text mining pilot | extracted from Temis software applied to 7% of Medline records | 24 MB | neurocommons-text-mining.tgz | [URI]
NLM 2007 MeSH | NLM 2007 MeSH | 13 MB | mesh-skos.tgz | [URI?]
OpenCyc | OpenCyc Ontology | ~1.6 million triples, ~150MB uncompressed | gzipped .owl file | Cycorp, Inc.
OSM Semantic Network | Geo-vocabulary utilised in OpenStreetMap. Linked to LinkedGeoData and WordNet W3C. | ~130K triples | Core RDF Mappings RDF | Andrea Ballatore
Open Directory | | size? | dump directory |
Ordnance Survey | administrative geography data | size? | administrative geography RDF dump |
RAMEAU subject headings | SKOS representation of the RAMEAU book indexing vocabulary, maintained by the French National Library (BnF) | 130 MB uncompressed | download folder | Antoine Isaac and Rameau committee
Quotations Book | at least 42,000 famous quotations with author and subject | size? | QB's Quotes RDF |
RKB Explorer Data | 25 different domains, each with a separate data set. The data sets are focused on scientific research; these include DBLP, Citeseer, CORDIS, NSF, EPSRC, RAE2001, KISTI, UNLOCODE, Wordnet, voiD, OS. | ~60 million triples | Use Semantic Web Sitemaps to find the URLs | Hugh Glaser
Rpm Find | data exposed? | expands to about 1.3GB | dump directory |
Science Commons | A bridging ontology, from Science Commons, importing other ontologies used in the prototype, defining classes and relations used to represent gene records and their contents, as well as a few items referred to by imported data sources, but not available in a published ontology. | 19 KB | sciencecommons.owl | [URI]
Semantic Bible | (for New Testament Names) is a semantic knowledge base describing each named thing in the New Testament | about 600 names | ontology definition, instance data | NTNames base URI
Semantic Web Dog Food | Metadata for several semantic web related conferences and workshops, including the most recent ISWC, ESWC and WWW events. | size? | URLs? |
SIMILE Data Collection | various data sets including CIA's World Factbook, Library of Congress' Thesaurus of Graphic Materials, National Cancer Institute's cancer thesaurus, Web Consortium's Technical Reports | Size? | dump URL? |
STW Thesaurus for Economics | Thesaurus for economics and business economics, including a classification of subject categories. Maintained by the German National Library of Economics (ZBW) | 12 MB uncompressed | download page | ZBW
SwetoDblp | ontology focused on bibliography data of publications from DBLP with additions that include affiliations, universities, and publishers | 11M triples | vocabulary, instances |
TaxonConcept Knowledge Base | Species Concepts and related Biodiversity Informatics data | 8.2M Triples | txn_base.rdf.gz species_01.rdf.gz species_02.rdf.gz | Peter J. DeVries, UW - Madison
Telegraphis Linked Open Data | Countries, Continents, Capitals, and Currencies collected from GeoNames and Wikipedia data | <10k triples a piece | Countries, Continents, Capitals, and Currencies | Pipian (Ian Jacobi)
Texai Lexicon | machine readable dictionary derived from WordNet 2.1, Wiktionary, the CMU Pronouncing Dictionary and the OpenCyc lexicon. Each lexicon word sense entry contains links back to the source dictionary entry, and also to OpenCyc if the entry has been mapped to the Cyc ontology. | Size? | dump URL? |
TCMGeneDIT Dataset | Traditional Chinese medicine, gene and disease association dataset and a linkset mapping TCM gene symbols to Entrez Gene IDs created by Neurocommons | 288kb compressed | TCMGeneDIT_r3_ttl.tar.gz [3] | JunZhao
t4gm.info | Thesaurus for Graphic Materials | 7.3MB uncompressed | RDF dump | Bradley P. Allen
UniProt | a large life sciences data set | 300M+ triples | dump directory |
U.S. Census data | population statistics at various geographic levels, from the U.S. as a whole, down through states, counties, sub-counties (roughly, cities and incorporated towns) | 1 billion triples | dump available on request | JT
U.S. SEC data | corporate ownership | 1.8 million triples | RDF dump | JT
van Assem et al's ontology | (used by output of MeSH to SKOS conversion) | 2.2 KB | mesh07-eswc06.rdfs | [URI]
Wikipedia³ | metadata extracted from Wikipedia | 47 million triples | dump URL? |
Yale Senselab | Yale Senselab | 216 KB | senselab.tgz | [URI]
YAGO | The complete YAGO ontology | 1Gb | YAGO as zipped RDFS | YAGO team at the Max-Planck Institute for Informatics
YAGO | The subClassOf hierarchy of the YAGO ontology | 7Mb | YAGO hierarchy as zipped RDFS | YAGO team at the Max-Planck Institute for Informatics
World Bank Linked Data | The World Bank Data published using the Linked Data principles. Currently it has World Development Indicators, World Bank Finances, World Bank Projects and Operations, and World Bank Climate Change data. | 160 million triples | See void:dataDump values | Sarven Capadisli
  • Lots of others. Please feel free to add plenty :)