HCLSIG/PharmaOntology/Resources

From W3C Wiki

Existing Resources

Ontologies

1. SO-Pharm (Suggested Ontology for Pharmacogenomics)

  • License:
  • Notes

SO-Pharm is a knowledge base for tying together genotype (especially SNPs), drug and phenotype. It directly imports Coulet's SNP ontology, ChEBI and DO in OWL. Other ontologies were translated to OWL. Notably, Coulet drew his own UML diagrams for clinical trials.

Michel Dumontier's comments (pdf) are on page 6.

N-ary relations are handled by adding new classes. The property 'is composed of' is a subproperty of ro:has_part; its domain and range are each disjunctions of over 10 classes, and it is distinct from the SNP property 'isComposedOf'.

In the paper, a 'demylenised_patient' is modelled as a subclass of person with a conjunction of further restriction classes, one of which is:

only isEnrolledIn ( only isDefinedBy ( only isComposedOf oneOf { mercaptopurine_treatment } ) ).

So, if the patient is enrolled in anything (a 'clinical trial' by the range axiom), then that trial can only be defined by (isDefinedBy) things composed solely of mercaptopurine_treatment. He cannot also be enrolled in a dissimilar trial. This rule, if intended, might more reasonably be stated as a restriction on 'clinical trial', but doing so would contradict the comments in the ontology.
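The reading above can be checked with a minimal sketch. It assumes a closed-world toy model (OWL itself is open-world, so a reasoner would behave differently), and every individual name other than mercaptopurine_treatment is hypothetical.

```python
# Toy, closed-world reading of the nested universal restriction
#   only isEnrolledIn ( only isDefinedBy ( only isComposedOf oneOf { mercaptopurine_treatment } ) )
# Assumption: the facts dict lists every property assertion that exists.

def only(prop, filler, facts):
    """Return a predicate: every `prop`-successor of x satisfies `filler`."""
    def check(x):
        return all(filler(y) for y in facts.get((x, prop), ()))
    return check

def one_of(*members):
    """Return a predicate: x is one of the enumerated individuals."""
    allowed = set(members)
    return lambda x: x in allowed

def restriction(facts):
    """Build the three-level nested restriction over the given facts."""
    composed = only("isComposedOf", one_of("mercaptopurine_treatment"), facts)
    defined = only("isDefinedBy", composed, facts)
    return only("isEnrolledIn", defined, facts)

# Hypothetical individuals, for illustration only.
facts = {
    ("patient_a", "isEnrolledIn"): ["trial_1"],
    ("trial_1", "isDefinedBy"): ["protocol_1"],
    ("protocol_1", "isComposedOf"): ["mercaptopurine_treatment"],
    ("patient_b", "isEnrolledIn"): ["trial_1", "trial_2"],
    ("trial_2", "isDefinedBy"): ["protocol_2"],
    ("protocol_2", "isComposedOf"): ["placebo_treatment"],
}

satisfies = restriction(facts)
result_a = satisfies("patient_a")  # enrolled only in the mercaptopurine trial
result_b = satisfies("patient_b")  # also enrolled in a dissimilar trial
result_c = satisfies("patient_c")  # enrolled in nothing: vacuously satisfied
```

Note that a patient enrolled in no trial at all satisfies the restriction vacuously, which is the usual behaviour of a universal ('only') restriction.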

2. Epoch Clinical Trial Ontologies

  • License:
  • Notes:

Emphasis for the Epoch project is on a) tracking participants and b) tracking clinical specimens (pdf). As such, the head ontology has FacilitiesPlan and OperationalPlan classes, and has a particularly rich structure about SpecimenWorkflow, right down to hasShippingInstructions for SpecimenShipping.

I don't see the parts about tracking participants through trials. This may be because the ontology imports OrganizationOntology and ProtocolOntology, whose links are currently (5/09) broken.

There is a Clinical Trial Database that has been mapped to use the Epoch suite here and (pdf).

3. Human Disease Ontology (implemented with UMLS)

  • License:
  • Notes

4. Experiment Ontology (developed by Lilly) Document and OWL

  • License:
  • Notes

5. Computer-based Patient Record Ontology (see: wiki and attachment on that page)

  • License:
  • Notes

6. BioTop

  • License: Creative Commons - Attribution License.
  • Notes

Implemented in OWL-DL, this suite of upper ontologies consists of BioTop, ChemTop + 4 bridge ontologies to tie in BFO-RO. The suite tries to be independent of granularity, and as such has had to eschew use of the subclasses of bfo:IndependentContinuant. There are bridges to GO, Cell Ontology and ChEBI. There are no imports.

Quality and QualityRegion are distinct subclasses of Thing. Canonicity is a Quality and CanonicityRegion is a QualityRegion. CanonicalState is formed from

hasInherence some ( Canonicity and qualityLocated some CanonicalRegion ).

Role, also a direct descendant of Thing, is broken out into seven subclasses, none of them formed by property restrictions. ChemicalRole has Catalytic- and Reagent- subclasses; DrugRole is subclassed to Therapeutic- and HealthRelated-. All these are organized into a tree enforced by disjointness.

The descendants of ProcessualEntity do not have disjointness axioms. Causing, Complicating, Disrupting and ManagingCare are all subclasses, not formed by restriction.

BioTop has a selection of generic properties aligned with RO. Notable additions are hasAbstractPart and qualityLocated.

7. UMLS

  • License:
  • Notes

8. Drug Ontology

  • License: "License content equivalent to physician's drug reference was the primary source for populating this ontology."
  • Notes

"Contains important concepts such as generic drugs, brand names, classes of drugs, drug-drug interactions, drug-allergy interactions." (LSDIS) Holding classes, object and data properties, and individuals, this ontology seems particularly strong in its vocabulary for formularies, intake routes and dosage forms.

Created with an early version of Protege and free of any annotations, there are no URIs. 'property' is a class, among whose subclasses are dosage_form and intake_route. Individual 'capsule', and several others, are of type dosage_form. There are no comparable individuals which are of type intake_route. There are properties for Monograph IX class levels 1-3.

9. TopBio

  • License:
  • Notes

10. OBI

  • License:
  • Notes

11. PR&D Ontology from Lilly

  • License:
  • Notes

12. Relations Ontology

  • License:
  • Notes

RO is a small hierarchy of properties for OBO ontologies. The world is divided into continuants and processes, where "The terms 'continuant' and 'process' are generalizations of GO's 'cellular component' and 'biological process' but applied to entities at all levels of granularity, from molecule to whole organism." http://genomebiology.com/2005/6/5/R46

The native format for RO is OBO instead of OWL, and this has a few consequences. In the OWL mapping, in addition to the changes for obo:is_a and obo:instance_of, some important property characteristics disappear. For example, obo:part_of is transitive, reflexive and anti-symmetric, as seen in the downloadable ro.obo. But in the downloadable ro.owl, transitivity is the only surviving property characteristic. OBO anti-symmetry (R(x,y) and R(y,x) implies x=y) is a weaker characteristic than OWL asymmetry (R(x,y) implies ~R(y,x)). The OWL 2 New Features and Rationale discusses the mapping.
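The difference between the two characteristics can be illustrated with a minimal sketch over a finite relation. The entity names are hypothetical, and the relation is a toy stand-in for part_of that includes the reflexive pairs.

```python
# OBO anti-symmetry: R(x,y) and R(y,x) implies x = y.
# OWL asymmetry:     R(x,y) implies not R(y,x).

def is_antisymmetric(pairs):
    """True if any two-way link only ever occurs between x and itself."""
    relation = set(pairs)
    return all(x == y for (x, y) in relation if (y, x) in relation)

def is_asymmetric(pairs):
    """True if no pair occurs together with its reverse (so no x is related to itself)."""
    relation = set(pairs)
    return all((y, x) not in relation for (x, y) in relation)

# Hypothetical toy relation: a nucleus is part of a cell, and,
# because part_of is reflexive, each entity is part of itself.
part_of = {
    ("nucleus", "cell"),
    ("nucleus", "nucleus"),
    ("cell", "cell"),
}

antisymmetric = is_antisymmetric(part_of)
asymmetric = is_asymmetric(part_of)
```

Here the relation is anti-symmetric but not asymmetric: any reflexive pair R(x,x) violates asymmetry, so a reflexive part_of cannot simply be declared asymmetric in OWL.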

13. CDISC

  • License:
  • Notes

The SDTM (Study Data Tabulation Model) "is built around the concept of observations collected about subjects who participated in a clinical study." Observations are broken out into Findings, Events and Interventions.

Eric Neumann has argued that, in addition to describing Observations, SDTM should hold high-level concepts of Study and Subject. There should be provision for holding unique URIs provided by NCI Thesaurus, and the types of observations should be augmented with further URIs to express refined descriptions. While these suggestions have gone unheeded, SDTM presumably will become the standard for submitting clinical trial data to the FDA. SDTM data might well be a valuable addition to the LOD cloud, or at least to the RDF accessible from the translational medicine ontology.

Given the above, what can be done? Could the translational medicine ontology use SDTM data?

Study and subject do not have their own domains; they are relegated to 'identifier variables'. These identifier variables, notably STUDYID and USUBJID (www2.sas.com/proceedings/forum2008/207-2008.pdf), are "keys" for study and subject, and would be properties with rdfs:domain Observation. Under OWL 1 there is no provision for an inverse functional datatype property, so there would be no way to 'pivot' on the USUBJID in order to make visible the subject's genotype and biomarkers.

With OWL 2 "easy keys" (pdf), however, we can now isolate the subject, who is uniquely determined by a combination of properties whose domain is sdtm:Observation. Instead of having Observation as the only large-scale class, we can now have both Study and Subject as well, as Neumann suggested. The only downside is that easy keys apply only to named individuals in the graph; they cannot be used on bnodes. So, every member of both Study and Subject must be specified explicitly in the ABox component, which should not be too burdensome. In fact, these can now be URIs in the RDF, outside of SDTM itself.

So declarations of the domains, ranges and easy keys might look something like:

DataPropertyDomain( sdtm:studyProperty   sdtm:Observation )
DataPropertyDomain( sdtm:subjectProperty sdtm:Observation )
DataPropertyRange(  sdtm:studyProperty   <datatype_for_STUDYID> )
DataPropertyRange(  sdtm:subjectProperty <datatype_for_USUBJID> )
HasKey( transmed:Study   () ( sdtm:studyProperty ) )
HasKey( transmed:Subject () ( sdtm:studyProperty sdtm:subjectProperty ) )

where we have kept sdtm:, the namespace of TransMed's version of the SDTM ontology, separate from transmed: itself.

Since we need named instances of classes Study and Subject, we need:

ClassAssertion( transmed:Study   a:study123 )
ClassAssertion( transmed:Subject a:study123/subject456 )
DataPropertyAssertion( sdtm:studyProperty   a:study123 <STUDYID_for_DiseaseStudy123> )
DataPropertyAssertion( sdtm:subjectProperty a:study123/subject456 <USUBJID_for_DiseaseStudy123.subject456> )
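
The effect of such keys can be sketched as follows: two named individuals of the same class that agree on every key property are inferred to denote the same thing. This is a simplified illustration, not a reasoner. It assumes each key property has at most one value per individual, and all individual names and key values below are hypothetical.

```python
from collections import defaultdict

def same_as_by_key(individuals, class_name, key_props):
    """Group named individuals of `class_name` that share all key values.

    Returns a sorted list of groups (each a sorted list of names) that a
    HasKey axiom would force to be the same individual.
    """
    groups = defaultdict(list)
    for name, ind in individuals.items():
        if class_name not in ind["types"]:
            continue
        values = tuple(ind["props"].get(p) for p in key_props)
        if None in values:
            continue  # a key only applies when every key property has a value
        groups[values].append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

# Hypothetical named individuals drawn from two datasets.
individuals = {
    "a:subject456": {"types": {"Subject"},
                     "props": {"studyProperty": "S123", "subjectProperty": "U456"}},
    "b:patient9":   {"types": {"Subject"},
                     "props": {"studyProperty": "S123", "subjectProperty": "U456"}},
    "b:patient10":  {"types": {"Subject"},
                     "props": {"studyProperty": "S123", "subjectProperty": "U457"}},
}

merged = same_as_by_key(individuals, "Subject",
                        ("studyProperty", "subjectProperty"))
```

In this toy data, a:subject456 and b:patient9 share both key values and are grouped together, which is the 'pivot' on the subject that the key makes possible; b:patient10 remains distinct.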


14. HL-7

  • License:
  • Notes

Barry Smith's pithy HL7 blog provides the most readable and entertaining accounts of HL7.

Google "hoot72 hl7" for an ontology and a mapping from HL7 v2.x messages to RDF.

15. Sequence Ontology

  • License:
  • Notes

16. BioPAX and Wiki

  • License:
  • Notes

Real Entities are distinguished from the UtilityClasses. Disjointness is established for top subclasses under Entity. All 50 properties have domains and ranges. There are 18 universal restriction classes and 20 cardinality restrictions. The PhysicalEntity class is meant to accommodate instances in distinct states, such as post-translational changes to a protein, and which participate in interactions, whereas the EntityReference class unifies these physical states into a commonly used reference.

Michel Dumontier's comments (pdf) are at the bottom of page 9.

Pathway is an Entity, and PathwayStep is a UtilityClass. They can be related by the near-inverse properties pathwayOrder and stepProcess. nextStep establishes an ordering between PathwaySteps, and this is typically used to isolate steps in a biochemical pathway. Pathways need not be decomposed into a network of parts, but can instead be composed of a bag of interactions, and this is typically used for molecular reactions. cofactor, controlled and controller are notable subproperties of 'participant' to specify the nature of participants in interactions.
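
As a rough illustration of how nextStep induces an ordering, the sketch below sorts a set of PathwaySteps topologically. The step names are hypothetical, and representing nextStep as a plain dictionary is an assumption made for this example, not BioPAX's own serialization.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def order_steps(next_step):
    """Order steps so that each one precedes the steps it points to via nextStep."""
    predecessors = {step: set() for step in next_step}
    for step, followers in next_step.items():
        for follower in followers:
            predecessors.setdefault(follower, set()).add(step)
    # TopologicalSorter expects a mapping from node to its predecessors.
    return list(TopologicalSorter(predecessors).static_order())

# Hypothetical linear pathway: step_1 -> step_2 -> step_3.
next_step = {
    "step_1": ["step_2"],
    "step_2": ["step_3"],
    "step_3": [],
}

ordered = order_steps(next_step)
```

For a pathway modelled instead as a bag of interactions, there is no nextStep data and hence nothing to sort.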

The Interaction class should probably be an owl:equivalentClass to its immediate subclasses. "Since it is a highly abstract class in the ontology, instances of the interaction class should never be created." The same is true of the SequenceLocation and Xref classes.

17. Systems Biology Ontology

SBO classes give a vocabulary for describing the components of systems biology and their interactions. With the native source being OBO, there are no properties. Many of the classes look like they easily could be mapped or subclassed to other ontologies. But the most important of the major classes is 'mathematical expression'. For its subtree, the leaves and certain other classes bear annotation defs that hold MathML lambda expressions, especially for rate equations. The variables of these expressions are other SBO terms, and this ties the corpus of mathematically defined rate equations to their intended parameters.

There is some interest in bringing the Systems Biology Markup Language (SBML) into the HCLS.

18. GGF3

  • License:
  • Notes

19. PATO ('Phenotype, Attribute and Trait Ontology' or Phenotypic Quality Ontology)

  • License:
  • Notes

Native representation is OBO. PATO is primarily a class hierarchy of 2000 classes, without disjointness but meant to express, I believe, a tree partition. Each class is a term for a quality of a phenotype, independent of the phenotype that bears it. There are only 17 properties, some of which participate in existential restrictions. The top class 'quality' has subclasses for qualities inhering in continuants and in processes. A third subclass will be obsolete, and houses concepts of intensity, magnitude and deviation from normalcy. From George Gkoutos ppt: "Qualities inhere to entities: every entity comes with certain qualities, which exist as long as the entity exists."

Notably, there are classes for lacking or having extra processual parts, for being dysfunctional, for being unnecessary or insufficient, for having a variety of dispositions of variabilities, and for being absent from an organism.

20. Human Phenotype Ontology

  • License:
  • Notes

Native representation is OBO, meant to describe clinical abnormality. This is a hierarchy of 9000 classes without disjointness that is a graph instead of a tree. The three main partitions are graphs for organ abnormality, inheritance, and onset and clinical course. There are no properties.

Under inheritance, 'Autosomal dominant vs. multifactorial' is a subclass of Multifactorial. The clinical course vocabulary has categories of phenotypic variability that distinguish 'Highly variable phenotype and severity' from 'Highly variable phenotype, even within families'.

21. FMA

  • License:
  • Notes

22. NCI Thesaurus

  • License:
  • Notes

23. MedDRA Medical Dictionary for Regulatory Activities

  • License: Owned by the International Federation of Pharmaceutical Manufacturers Association (IFPMA); hosted presumably by Northrop Grumman. The core service for companies with revenue > $5B costs ~$63K.
  • Notes

24. SNOMED

  • License:
  • Notes

John Madden's discussion of SNOMED and OWL is here.

Alan Rector's paper (pdf) gives a good view of the 10-year future of biomedical ontologies. See, in particular, the Appendix on SNOMED/OWL.

25. Dolce

  • License:
  • Notes

26. IEDB

  • License:
  • Notes

27. OCRe (Ontology of Clinical Research)

  • License:
  • Notes

28. Trial Bank Project


Use Case Examples

1. Parkinson's Use Case with questions grouped by "role"

High-Level Picture

1. High-level picture of a patient and disease centric model