HCLSIG BioRDF Subgroup/Tasks/Reagents/Status Reports/2006-04-17

From W3C Wiki

This note describes first attempts at translating the Alzforum Antibody Directory into RDF.

Science Background: http://en.wikipedia.org/wiki/Antibody

Goal of this round: review the contents of the database, start building a model (ontology), translate the database using it, and see what problems arise.

Technology: I am targeting OWL, using custom tools written in a Java-based Common Lisp that use the Pellet libraries.

Selected Issues:

  • Namespace - what form of URIs to use for the various entities
  • Model - how to model the various entities
    • The antibody isotype (e.g. IgG versus IgM), light versus heavy chain
    • Experimental Methods that this antibody can be used for (http://www.alzforum.org/res/com/ant/glossary.asp). Note that there are implicit relations between methods that should be modeled (e.g. more and less general terms)
    • Sample Preparations - e.g. frozen sections, cells, etc.
    • Construction - how the antibody was created
    • Epitopes - the part of the protein the antibody binds to
    • Source - how the antibody was created, and in which species
    • Reactivity/Specificity - which forms of the protein (e.g. phosphorylated) the antibody recognizes, in which species (and in which not)
  • Connecting to existing ontologies and vocabularies - which to use
    • Species: NCBI taxonomy?
    • Gene: LSID? NCBI Entrez URL?
    • Methods: PSI-MI is insufficient; SNOMED?
    • Epitope: Sequence Ontology? - but it offers no way to specify e.g. a positional range within a sequence
    • Company information - FOAF? vCard?
  • Parsing - Important information is in a restricted but varying subset of natural language
    • Gene name is sometimes a standard name, sometimes not (e.g. embedded Greek-letter XML entities)
    • Epitope descriptions (a variety of sorts of description)
    • Reactivity/Specificity (mostly species, but with negative information, such as "not mouse")
    • Applicable experimental methods (some methods were not in the glossary). May be a list, and may include negative information
    • Method Glossary (manipulated in Emacs into Lisp form, from which OWL is generated)
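
To make the parsing issues above concrete, here is a minimal sketch in Python (not the Lisp tooling actually used for this work). It decodes Greek-letter entities in a gene name and splits a reactivity string into positive and negative species. The function names, the input format ("Human, rat; not mouse"), and the small species lookup table are assumptions made for illustration; only the NCBI Taxonomy IDs themselves are real identifiers.

```python
import html
import re

# Assumed lookup table: the keys are illustrative free-text species names,
# the values are the corresponding NCBI Taxonomy IDs.
TAXON_IDS = {"human": 9606, "mouse": 10090, "rat": 10116}


def normalize_gene_name(raw):
    """Decode character entities (e.g. &beta;) and trim surrounding whitespace."""
    return html.unescape(raw).strip()


def parse_reactivity(text):
    """Split a free-text species list into (reactive, non_reactive, unrecognized).

    Items are separated by commas or semicolons; an item beginning with
    "not " is treated as negative information.
    """
    reactive, non_reactive, unrecognized = set(), set(), []
    for item in re.split(r"[,;]", text):
        item = item.strip().lower()
        if not item:
            continue
        negated = item.startswith("not ")
        name = item[4:].strip() if negated else item
        if name not in TAXON_IDS:
            # Keep anything we cannot map so it can be reviewed by hand.
            unrecognized.append(item)
        elif negated:
            non_reactive.add(TAXON_IDS[name])
        else:
            reactive.add(TAXON_IDS[name])
    return reactive, non_reactive, unrecognized


if __name__ == "__main__":
    print(normalize_gene_name(" A&beta; "))
    print(parse_reactivity("Human, rat; not mouse"))
```

Unrecognized items are returned rather than dropped, since the source text varies and a real translation would likely need manual review of whatever the simple rules miss.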