Talk:HCLS/ClinicalObservationsInteroperability/FDATherapeuticAreaOntologies

From W3C Wiki

Status of this Document

This is a project page for work on Semantic Web ontologies for FDA Therapeutic Areas.

Legend

This document uses some CSS styles to highlight frequently-searched text:

  • design decision
  • pending issue
  • ericP or Charlie to study
  • schema or OWL model
  • example instance data
  • inline comments

Goals

  • Discovery of smaller signals and more fine-grained strata.
  • Query/rule re-use between trials.
  • Concept re-use between Therapeutic Areas.
    • Minimize cognitive burden, particularly when crossing TAs.

Strategy

  • Interview SMEs -- Subject Matter Experts sign off on Concept Maps presenting a graphical representation of the concepts and relationships in a TA domain.
  • Model concepts in RDF/OWL.
    • Re-use BRIDG where possible
  • Define specific datatype and value sets.
    • Re-use ISO 21090 datatype definitions from O-RIM. These include complex types like uncertainty ranges, cf. dt:TS.precision -- are they needed?
    • dt:CD and dt:CDCoding appear to be parallel, e.g. both have a codeSystem; cf. the C-CDA example.
  • Test models with instance data, either invented or extracted from SDTM.
    • Note any place where arbitrary choices are made during synthesis -- these are points of diffusion.
    • Queries performed for examination are added to the ontology as endpoints.
    • Specific examination criteria aren't identified in Rheumatoid Arthritis, so we can't test this with data and queries.
  • Use existing controlled terminologies where possible:
    • If the appropriate term is defined elsewhere, we use the existing term and its definition. Will the SME-approved definition ever differ from that of the adopted term?
    • If no appropriate term is defined elsewhere, use skos:broader links to more general terms in order to (a) document terms, (b) shrink the search scope for data coded with that term, and (c) enable the same endpoints for post-market surveillance over clinical data (see the sketch below).
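
As a minimal sketch of that second case (the ta: and loinc: namespaces are illustrative placeholders, not established vocabularies):

 # Hypothetical TA-specific term, defined because no existing code is specific enough.
 ta:Week3SerumCreatinine
   a skos:Concept ;
   skos:definition "Serum creatinine measured three weeks after protocol start."@en ;
   # Link the narrower new term to the broader, widely-deployed concept so that data
   # coded with the new term still turns up when querying for the general concept.
   skos:broader loinc:_2160-0 .   # LOINC 2160-0, Creatinine [Mass/volume] in Serum or Plasma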

Design

Modularity

A principal goal of modeling clinical trial submissions in RDF is to leverage RDF's decentralized extensibility model, based on globally-unique, unambiguous identifiers for concepts and value sets. Sharing terms between submissions is a necessary component of enabling the pooling of trial data. Distributed development of TAs hinges on discoverability of terms. To that end, TA-specific ontologies are broken down into:

  • core ontologies -- shared datatypes, common structures like the evaluation of the effects of an intervention as a set of baseline observations, a performed intervention (e.g. substance administration), and a set of evaluation observations.
  • shared ontologies -- observations, interventions and value sets shared between TA ontologies.
  • TA ontologies -- capture the information required by the concept map.

For instance, RenalTransplantation includes observations related to kidney function from renal.ttl and procedures and rejection evaluation from transplant.ttl.
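
A minimal sketch of how the RenalTransplantation ontology might declare these modules via owl:imports (the ontology URIs are illustrative placeholders):

 # The RenalTransplantation TA ontology imports the modules it builds on.
 <http://example.org/fda-ta/RenalTransplantation>
   a owl:Ontology ;
   owl:imports <http://example.org/fda-ta/core> ,        # shared datatypes and common structures
               <http://example.org/fda-ta/renal> ,       # kidney-function observations (renal.ttl)
               <http://example.org/fda-ta/transplant> .  # transplant procedures and rejection evaluation (transplant.ttl)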

Basis on BRIDG

In the interest of optimizing interoperability and leveraging industry interest in BRIDG, the FDA TA ontologies are mostly derived from BRIDG classes.

Domain

BRIDG is specifically limited to clinical trials. Procedures and observations are derived from activity classes. These activity classes are scoped to stages of the clinical trial phases: definition, planning, scheduling and performance. For example, a clinical trial protocol may examine the effects of a particular study drug used in conjunction with a specific set of concomitant medications. The protocol includes prescribed administrations of these medications as bridg:DefinedSubstanceAdministrations, usually associated with specific daytime offsets from the start of the study. A CRO will begin a study, perhaps adapting these bridg:DefinedSubstanceAdministrations to bridg:PlannedSubstanceAdministrations @@I don't understand the principled delineation@@. When a participant enters the study, the calendar of bridg:DefinedActivities gets mapped to a calendar of bridg:ScheduledActivities. This will be populated with instances of e.g. bridg:ScheduledSubstanceAdministration. When the trial management software signals that it's time to perform some activity for a particular participant, the performed event will be associated with the defined, planned and scheduled events, for instance:

 <BobAdminDrugX-1> bridg:PerformedActivity.instantiatedDefinedActivity <DrugXProtocol-Admin1> .
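
To make the lifecycle concrete, the sketch below extends that triple through the defined/scheduled/performed chain. The resource names are invented, and the class and property names for the scheduled and performed steps are extrapolated from those cited above and from the naming convention described in the next section:

 # Protocol-level definition of the first DrugX administration
 <DrugXProtocol-Admin1> a bridg:DefinedSubstanceAdministration .

 # Scheduled instance of that defined activity for participant Bob
 <BobSchedDrugX-1>
   a bridg:ScheduledSubstanceAdministration ;
   bridg:ScheduledActivity.instantiatedDefinedActivity <DrugXProtocol-Admin1> .

 # The performed event points back at both the defined and the scheduled activity.
 <BobAdminDrugX-1>
   a bridg:PerformedSubstanceAdministration ;
   bridg:PerformedActivity.instantiatedDefinedActivity   <DrugXProtocol-Admin1> ;
   bridg:PerformedActivity.instantiatedScheduledActivity <BobSchedDrugX-1> .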

Property Naming

BRIDG is designed as a UML model and is expressed in RDF/OWL following the ODM 1.0 Specification. This has some impact on the ontology:

  • Properties are derived from bridg:attributeProperty or bridg:associationProperty.
  • attributeProperties include the name of the domain class in the property name, e.g. bridg:DefinedObservation.targetAnatomicSiteCode.
  • associationProperties include the names of both the domain and range classes, e.g. bridg:PerformedActivity.instantiatedDefinedActivity.
  • Following this naming convention means that property names are effectively unique across the ontology. There's no reuse of e.g. an instantiated property to tie a bridg:PerformedObservation to more than one of (bridg:DefinedActivity, bridg:PlannedActivity, bridg:ScheduledActivity).
    • strategy: derive bridg:PerformedActivity.instantiatedDefinedActivity, bridg:PerformedActivity.instantiatedPlannedActivity, etc. from mybridg:instantiated (sketched below).
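
A minimal sketch of that strategy, with mybridg: as a local extension namespace rather than part of BRIDG:

 # One generic property in a local extension namespace...
 mybridg:instantiated a owl:ObjectProperty .

 # ...with the convention-following BRIDG property names declared as specializations of it,
 # so a single query over mybridg:instantiated matches any of them.
 bridg:PerformedActivity.instantiatedDefinedActivity   rdfs:subPropertyOf mybridg:instantiated .
 bridg:PerformedActivity.instantiatedPlannedActivity   rdfs:subPropertyOf mybridg:instantiated .
 bridg:PerformedActivity.instantiatedScheduledActivity rdfs:subPropertyOf mybridg:instantiated .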

Structure

The majority of the data needed for efficacy analysis comes from bridg:PerformedObservationResults. BRIDG ties these back to bridg:PerformedObservations. The information in a Concept Map includes:

  • observations of interest
  • coding for results of these observations
  • coding for consequential diagnoses and assessments
  • concomitant medications
  • analysis covariates

BRIDG resources


Coding specificity

BRIDG, and indeed most of clinical care, relies on shared terms identified by a tuple of a coding system identifier, a term code, and sometimes a version. The O-RIM's RDF representation of the CD.Code ISO21090 type looks like:

 # Example instance data -- Sue's baseline measured in cerebral spinal fluid
 :labObs1234
   # Rheumatoid Factor Observations are defined in the Rheumatoid Arthritis ontology
   a ra:RheumatoidFactorObservation ;
   # All observations have a time
   core:hasObservationTime "2013-07-07T19:02:00Z"^^xsd:dateTime ;
   # We reuse the hl7 schema to capture the convention of a code and a code system (à la ISO 21090)
   hl7:coding [ dt:CDCoding.code "14034-3" ; dt:CDCoding.codeSystem "2.16.840.1.113883.6.1" ;
                dt:CDCoding.displayName "Rheumatoid factor:Arbitrary Concentration:Point in time:Cerebral spinal fluid:Quantitative" ; dt:CDCoding.codeSystemName "LOINC" ] ;
   # Instance data will of course have results...
   core:hasResultValue [ data:value 65 ; data:units ucum:u_mL ].

Clinical data interoperability, such as it is, relies mostly on two systems using the same code to capture an observation or procedure. The model restriction on renal:RheumatoidFactorObservation specifies that the coding have a code of some value. For a very precise restriction, the model for the above instance data looks like:

 # Definition of RheumatoidFactorObservation
 renal:RheumatoidFactorObservation
   rdfs:subClassOf
     # MUST have a LOINC code of 14034-3
     [ owl:onProperty hl7:coding ; owl:someValuesFrom :loinc-14034-3 ] .
 
 # loinc-14034-3 is anything with a LOINC code of 14034-3.
 :loinc-14034-3
   owl:equivalentClass [
     owl:intersectionOf (
       [ owl:onProperty dt:CDCoding.codeSystem ; owl:hasValue "2.16.840.1.113883.6.1" ]
       [ owl:onProperty dt:CDCoding.code       ; owl:hasValue "14034-3" ]
       ) ] .
 
 # Additional subclass information relating this code to other LOINC or EVS codes:
 :loinc-14034-7
   rdfs:subClassOf
     :loinc-14034-3 , :loinc-14034-9 , :evs-foo .

The subclass assertions at the bottom allow a query for a more general code (here :loinc-14034-3, :loinc-14034-9 or :evs-foo) to be answered by instance data carrying a more precise code such as :loinc-14034-7. The converse is not true. In order to ask specific questions of data submitted with more general codes, the submission would have to complement the codes with specifics supplying all six axes (site, methodology, units, ...). Use cases may motivate extending our Observation class to capture these axes (e.g. stratification by site or methodology), but this would be more work than using specific codes.
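
For example (a sketch, assuming :loinc-14034-7 is defined with hasValue restrictions analogous to :loinc-14034-3 above, and a store that performs OWL inference), a submission using the more precise code remains visible to queries phrased against the general classes:

 # Hypothetical submission coded with the more precise term
 :labObs9999
   a renal:RheumatoidFactorObservation ;
   hl7:coding [ dt:CDCoding.code "14034-7" ; dt:CDCoding.codeSystem "2.16.840.1.113883.6.1" ] ;
   core:hasResultValue [ data:value 42 ; data:units ucum:u_mL ] .

 # Under inference the coding node is classified as :loinc-14034-7 and therefore also
 # (via the rdfs:subClassOf assertions above) as :loinc-14034-3, :loinc-14034-9 and
 # :evs-foo, so queries for any of those more general classes retrieve :labObs9999.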

The FDA TA Ontologies can:

  1. Insist that submitted data contain the e.g. Observation type arcs from the TA. This makes the assertion of the type arc a speech act asserting compliance with an aspect of the TA.
  2. Identify one or more codes, any of which imply that the Observation is of some desired type.
  3. Specify that the e.g. Observation carry a particular code.

For the latter two, we must pick either existing terms or invent new terms with sufficient definitions that we have "clinical compatibility" between subjects in different studies. The modeling is the same either way, apart from an additional skos:broader assertion tying our specific term to a widely-deployed general term.
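
Option 2 can be expressed by letting the coding restriction range over a union of acceptable code classes. A minimal sketch, reusing :loinc-14034-3 and assuming :loinc-30231-5 is defined with analogous hasValue restrictions:

 # Any one of several codes implies the desired Observation type.
 renal:RheumatoidFactorObservation
   rdfs:subClassOf
     [ owl:onProperty hl7:coding ;
       owl:someValuesFrom [ owl:unionOf ( :loinc-14034-3 :loinc-30231-5 ) ] ] .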

LOINC

LOINC terms are excruciatingly precise (driven by both clinical care and liability use cases). They have a consistent coded description convention for the different axes. For instance, another study may measure rheumatoid factor with a sample taken at a different location, using a different measurement (mass concentration vs. volume concentration), with different units, etc. The LOINC Ontology (under development) captures these axes:

 :labObs5678 # Joe's baseline measured in body fluid
   a renal:RheumatoidFactorObservation ;
   core:hasObservationTime "2012-06-06T18:01:00Z"^^xsd:dateTime ;
   hl7:coding [ dt:CDCoding.code "30231-5" ; dt:CDCoding.codeSystem "2.16.840.1.113883.6.1" ; # or EVS or ...
                dt:CDCoding.displayName "Rheumatoid factor:ACnc:Pt:Body fld:Qn" ; dt:CDCoding.codeSystemName "LOINC" ] ;
   core:hasResultValue [ data:value 65 ; data:units ucum:u_mL ].
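
As a sketch of what "capturing the axes" might look like, the coded description above decomposes along LOINC's axes. The property names below are illustrative placeholders, not the published LOINC Ontology vocabulary:

 loinc:_30231-5
   loinc:component "Rheumatoid factor" ;
   loinc:property  "ACnc" ;      # arbitrary concentration
   loinc:timing    "Pt" ;        # point in time
   loinc:system    "Body fld" ;  # body fluid
   loinc:scale     "Qn" .        # quantitative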

Example usage:

  • Sue is in trial 1, Joe in trial 2.
  • A reviewer is looking for signals in the aggregate of these two studies.
  • Sue and Joe end up in some strata (by e.g. history and outcome).
  • Resulting meta-study now has participants from both studies, with different protocols (or protocol implementations) and potentially different e.g. measurement sites.

Desiderata

  1. What specificity provides the required clinical significance to study aggregation?
  2. What will inconvenience the sponsors?
  3. What will helpfully guide the sponsors?
  4. What's current practice?
    1. Which measurements start out with specific lab codes because those systems are in place in the care facilities?
    2. Which are mapped to LOINC from foreign coding systems or from no code system at all?

Scope

The current project is scoped to efficacy analysis, though of course much of the model incorporates data that would be needed for safety analysis. Initially, the project scope excluded protocol modeling. While the Concept Maps include the notion of concomitant medications, this property is not intrinsic to the medication but instead to the use of the medication in a particular study. It's not possible to model concomitant medications without modeling the protocol. There might be additional metadata in bridg:ProductRelationships.


Issues

tool choice

For viewing, presenting, developing, and mapping data to the ontologies, we choose n of:

  • Protege
  • TBC free
  • TBC standard
  • TBC maestro

For value set terms, tie back to provenance. Provide some organizer to group by e.g. NCI EVS, SNOMED, FDA-TA, etc.:


 Organizer   Provenance   Term
 FDA-TA      LOINC        term1
 FDA-TA      NCI EVS      term2
 FDA-TA      SNOMED       term3
 FDA-TA      NCI EVS      term4


TODO

integrate:

  • -check if adequately completed- Add definitions to e.g. MH: rollup of 9b 9c 9h.

interop issues

  • term specificity How much specificity is required for interoperability of observations between trials in a TA? Is any e.g. LOINC code for SCr good enough? Is the metric for interop whether one could transplant a patient from one study to another?
  • choice between NCI EVS CUIs vs. e.g. a more specific LOINC term and an even more specific LOINC term. Using a viable submission term reduces/eliminates the requirement for a terminology server. UMLS is volatile, breaking CUIs.
  • time points RenalTransplantation and ComplicatedUTI have intervals from protocol start. These are currently modeled as reusable DefinedObservations (which makes the TA sort of a template for protocols). What's the reuse utility of having two studies in the same TA share a Week3SCrObservation? What's the utility between different studies? (A minimal sketch follows.)
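
For the time-points issue, a minimal sketch of what such a reusable definition might look like. The offset property ta:daysFromProtocolStart and the code class :loinc-2160-0 are illustrative placeholders; the latter would be defined with hasValue restrictions like :loinc-14034-3 above (2160-0 being the LOINC code for creatinine in serum or plasma):

 # Sketch: a reusable, protocol-relative observation definition in a TA ontology.
 ta:Week3SCrObservation
   rdfs:subClassOf
     bridg:DefinedObservation ,
     [ owl:onProperty hl7:coding ; owl:someValuesFrom :loinc-2160-0 ] ,
     [ owl:onProperty ta:daysFromProtocolStart ; owl:hasValue 21 ] .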