HCLS/HCLS dbpedia

From W3C Wiki

Mapping Life Science and Health Care ontologies and datasets to DBpedia and YAGO

Task Objectives

  1. Identifying possible mappings between HCLS data (such as the OBO ontologies, MeSH in SKOS, and NeuronDB) and DBpedia, and listing them on this wiki page together with queries that could be used for automated mapping.
  2. Choosing relations to use for the mappings (e.g., rdfs:seeAlso, owl:sameAs, owl:equivalentClass)
  3. Writing, publishing and running scripts to create the mappings
  4. Deciding how to publish the mappings on the web (downloadable .zip files, a SPARQL endpoint, a dedicated graph in the SPARQL endpoint of the HCLS KB, Linked Data, etc.)
  5. Publishing the mappings
  6. Agreeing on a community process to update the mappings periodically
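For objectives 2 to 4, a mapping script ultimately has to write one triple per matched pair. Below is a minimal Python sketch. It assumes the pairs have already been obtained (e.g. from a SPARQL query) and that N-Triples is an acceptable publishing format. The function name `to_ntriples` and the example pair are hypothetical.

```python
# Minimal sketch: serialize (DBpedia URI, target URI) mapping pairs as N-Triples.
# Which relation is appropriate for which dataset is still open (objective 2).
RELATIONS = {
    "seeAlso": "http://www.w3.org/2000/01/rdf-schema#seeAlso",
    "sameAs": "http://www.w3.org/2002/07/owl#sameAs",
    "equivalentClass": "http://www.w3.org/2002/07/owl#equivalentClass",
}


def to_ntriples(pairs, relation="seeAlso"):
    """Return N-Triples text with one line per distinct (subject, object) pair.

    Assumes both URIs in each pair are already valid IRIs; no escaping is done.
    """
    predicate = RELATIONS[relation]  # raises KeyError for unknown relation names
    # Sorting and de-duplicating keeps the output stable between runs.
    lines = [f"<{s}> <{predicate}> <{o}> ." for s, o in sorted(set(pairs))]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Illustrative pair only; not taken from an actual mapping run.
    example = [("http://dbpedia.org/resource/Insulin",
                "http://purl.uniprot.org/uniprot/P01308")]
    print(to_ntriples(example), end="")
```

Because the output is sorted and de-duplicated, two runs can be compared with a plain line diff. This might make the periodic updates in objective 6 easier to review.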

Participants

  • Matthias Samwald (DERI Galway)
  • Kingsley Idehen: interested in publishing the mappings, since DBpedia, UniProt, HCLS and YAGO are all available from Virtuoso instances hosted in close proximity at OpenLink
  • (add yourself if you participate)

Queries and mappings

List all properties in the DBpedia triplestore

 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 select distinct ?property where {
   ?property a rdf:Property .
 }
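To run queries like this one from a script rather than a web form, they can be sent with the SPARQL Protocol as an HTTP GET request that asks for JSON results. Below is a minimal Python sketch. It assumes the public endpoint is at http://dbpedia.org/sparql and that the endpoint honours the `Accept` header. The function names `sparql_request` and `run_query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia endpoint; another Virtuoso instance
# (e.g. the one hosting the HCLS KB) could be passed in instead.
ENDPOINT = "http://dbpedia.org/sparql"

LIST_PROPERTIES = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select distinct ?property where {
  ?property a rdf:Property .
}
"""


def sparql_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL Protocol GET request for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def run_query(query, endpoint=ENDPOINT):
    """Send the request and decode the SPARQL JSON results document."""
    with urllib.request.urlopen(sparql_request(query, endpoint)) as resp:
        return json.load(resp)
```

With network access, `run_query(LIST_PROPERTIES)["results"]["bindings"]` would hold one dictionary per row.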


List all properties that are used to describe proteins
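One possible approach assumes that resources with a value for http://dbpedia.org/property/uniprot are proteins, and then lists every property used on those resources. Below is a minimal Python sketch that builds such a query. The function name `properties_used_with` is hypothetical, and the "has a uniprot value" criterion is only an approximation of "is a protein".

```python
def properties_used_with(marker_property, limit=None):
    """Build a SPARQL query listing the distinct properties of all resources
    that have a value for http://dbpedia.org/property/<marker_property>."""
    query = (
        "select distinct ?property where {\n"
        f"  ?resource <http://dbpedia.org/property/{marker_property}> ?id .\n"
        "  ?resource ?property ?value .\n"
        "}"
    )
    if limit is not None:
        query += f"\nLIMIT {int(limit)}"  # optional cap on the result size
    return query


if __name__ == "__main__":
    print(properties_used_with("uniprot"))
```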

A simple query to get a feel for the domain and range of a property (replace SOME_PROPERTY_HERE with the local name of the property):

 select distinct * where {
   ?resource <http://dbpedia.org/property/SOME_PROPERTY_HERE> ?value
 }
 LIMIT 40
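The 40 sample rows can be turned into a rough picture of the range by tallying the ?value bindings by term type. Below is a minimal Python sketch. It assumes the results were requested in the SPARQL JSON results format. Virtuoso may report typed literals as `typed-literal`, and the sketch handles that case too. The function name is hypothetical. The domain side would need an additional `?resource a ?type` pattern in the query.

```python
from collections import Counter


def value_kinds(results, var="value"):
    """Tally the kind of term bound to `var` in a SPARQL JSON results
    document: 'uri', 'bnode', the literal's datatype IRI, or 'plain literal'."""
    kinds = Counter()
    for binding in results["results"]["bindings"]:
        term = binding.get(var)
        if term is None:            # variable unbound in this row
            continue
        if term["type"] in ("uri", "bnode"):
            kinds[term["type"]] += 1
        elif "datatype" in term:    # 'literal' or legacy 'typed-literal'
            kinds[term["datatype"]] += 1
        else:                       # untyped or language-tagged literal
            kinds["plain literal"] += 1
    return kinds
```

A result that is mostly plain literals (e.g. accession strings) suggests the property needs a URI template before it can be used for mapping.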


Properties that are of interest for mapping purposes

Prime candidates

http://dbpedia.org/property/uniprot

http://dbpedia.org/property/goCode

http://dbpedia.org/property/casNumber

http://dbpedia.org/property/casno (sometimes attached to the chembox identifier resources rather than to the resources themselves)

http://dbpedia.org/property/inchi

http://dbpedia.org/property/chebi

http://dbpedia.org/property/meshname

http://dbpedia.org/property/meshid

http://dbpedia.org/property/diseasesdb
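Most of these prime candidates hold identifier literals rather than URIs. Mapping them therefore needs one step per property that turns the literal into a URI in the target dataset. Below is a minimal Python sketch covering two of them. The URI templates are hypothetical and must be checked against how UniProt and the Gene Ontology actually name their resources. The assumed input formats of the DBpedia values are also unverified.

```python
import re

# Hypothetical URI templates; verify against the target datasets before use.
TEMPLATES = {
    "uniprot": "http://purl.uniprot.org/uniprot/{}",
    "goCode": "http://purl.obolibrary.org/obo/GO_{}",
}


def normalize(prop, raw):
    """Clean up an identifier literal as found in a DBpedia property value."""
    value = raw.strip()
    if prop == "goCode":
        # Accept both '0005524' and 'GO:0005524' and keep only the 7 digits.
        match = re.fullmatch(r"(?:GO:)?(\d{7})", value)
        return match.group(1) if match else None
    if prop == "uniprot":
        # Rough sanity check only (6 to 10 uppercase letters or digits),
        # not the full UniProt accession grammar.
        return value if re.fullmatch(r"[A-Z0-9]{6,10}", value) else None
    return None  # no normalization rule for other properties yet


def target_uri(prop, raw):
    """Return the target-dataset URI for a DBpedia identifier value, or None."""
    ident = normalize(prop, raw)
    if ident is None or prop not in TEMPLATES:
        return None
    return TEMPLATES[prop].format(ident)
```

Returning None instead of guessing means values that do not look like identifiers (free text in an infobox, for instance) are simply left unmapped.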

Scratchpad

http://dbpedia.org/property/iupacName http://dbpedia.org/property/molecularWeight http://dbpedia.org/property/casNumber http://dbpedia.org/property/casno (sometimes attached to the chembox identifier resources rather than to the resources themselves) http://dbpedia.org/property/pubchem http://dbpedia.org/property/smiles http://dbpedia.org/property/iupacname http://dbpedia.org/property/inchi http://dbpedia.org/property/chebi

http://dbpedia.org/property/meshname http://dbpedia.org/property/meshid

http://dbpedia.org/property/mgiid http://dbpedia.org/property/omim http://dbpedia.org/property/homologene

http://dbpedia.org/property/pmid http://dbpedia.org/property/doi

http://dbpedia.org/property/diseasesdb

http://dbpedia.org/property/icd10 often points to a separate resource derived from a wiki template, such as http://dbpedia.org/page/Arthritis/icd10/ICD10 (compare with http://en.wikipedia.org/wiki/Arthritis; the DBpedia representation is puzzling and does not seem usable as is)

http://dbpedia.org/property/regnum http://dbpedia.org/property/divisio http://dbpedia.org/property/ordo http://dbpedia.org/property/subfamilia http://dbpedia.org/property/tribus http://dbpedia.org/property/phylum http://dbpedia.org/property/genus

From proteins: http://dbpedia.org/property/interpro http://dbpedia.org/property/scop http://dbpedia.org/property/opmProtein http://dbpedia.org/property/pfam http://dbpedia.org/property/pdb http://dbpedia.org/property/prosite http://dbpedia.org/property/smart http://dbpedia.org/property/opmFamily http://dbpedia.org/property/name http://dbpedia.org/property/hgncid http://dbpedia.org/property/omim http://dbpedia.org/property/chromosome http://dbpedia.org/property/band http://dbpedia.org/property/entrezgene http://dbpedia.org/property/refseq http://dbpedia.org/property/arm http://dbpedia.org/property/uniprot http://dbpedia.org/property/umichopmProperty http://dbpedia.org/property/homologene http://dbpedia.org/property/mgiid http://dbpedia.org/property/ecNumber http://dbpedia.org/property/iubmbEcNumber http://dbpedia.org/property/goCode

Notes

  • The domains of some properties appear to be heterogeneous, so additional restrictions (e.g., on the YAGO classification of a resource) might be needed.
  • Some properties are rarely used (fewer than 200 times) and can be disregarded for the mappings.
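Both notes can be turned into a pre-filter that runs before a property is mapped. Below is a minimal Python sketch. It assumes the endpoint supports SPARQL 1.1 aggregates (`count`) and that the caller supplies the YAGO class URI. The function names are hypothetical.

```python
MIN_USES = 200  # threshold taken from the note above


def usage_count_query(prop, yago_class=None):
    """Count the triples that use a DBpedia property, optionally restricted
    to subjects typed with the given class (e.g. a YAGO class URI)."""
    type_pattern = f"  ?resource a <{yago_class}> .\n" if yago_class else ""
    return (
        "select (count(*) as ?n) where {\n"
        f"  ?resource <http://dbpedia.org/property/{prop}> ?value .\n"
        f"{type_pattern}"
        "}"
    )


def worth_mapping(count_results, threshold=MIN_USES):
    """Read ?n from the SPARQL JSON result of usage_count_query and
    report whether the property is used often enough to map."""
    n = int(count_results["results"]["bindings"][0]["n"]["value"])
    return n >= threshold
```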

Related resources
