This document provides simple but working examples of how clinical trial data can be expressed in, and take advantage of, the Semantic Web. The Semantic Web uses a triple-based data model called RDF. Here, I use a particularly mail-friendly syntax for RDF called "Turtle". The RDF in this document is from prot.ttl.

Different organizations define value sets, the enumerated sets of possible responses to a given case report form (CRF) question, by taking terms either from existing standard terminologies such as MedDRA, LOINC, SNOMED, CDISC, etc. or from organization-specific (private) terminologies. These examples assume that each terminology has its own namespace; for organization-specific terminologies, hypothetical namespaces have been defined (e.g. “pharma1”). Alternatively, organizations can use their existing domain names to establish namespaces for their defined terms. The examples use URLs that appear realistic (e.g. "http://www.w3.org/2013/02/ValueSet/prot/...") but were developed solely for the purpose of this discussion and do not represent actual namespaces. In real life, the terms would live in namespaces already set aside by http://ihtsdo.org/ (SNOMED), http://loinc.org/, etc.

We like to avoid repeating long URLs so we declare "prefixes" to simplify the text.

@prefix prot: <http://www.w3.org/2013/02/ValueSet/prot#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix LOINC: <http://www.w3.org/2013/02/ValueSet/LOINC#> .
@prefix pharma1: <http://www.w3.org/2013/02/ValueSet/pharma1#> .
@prefix pharma2: <http://www.w3.org/2013/02/ValueSet/pharma2#> .
@prefix terminology1: <http://www.w3.org/2013/02/ValueSet/terminology1#> .
@prefix terminology2: <http://www.w3.org/2013/02/ValueSet/terminology2#> .

If terminology1 has defined terms like 1+, 2+, etc. and terminology2 has terms like slight, moderate, etc., third (and fourth, ...) parties can define value sets that re-use those terms. These value sets serve as the answer sets for multiple CRF questions.

#################################################################
# Value Sets
#################################################################

pharma1:valueSet1234 owl:oneOf (                                    # the value set "pharma1:valueSet1234" includes the terms "1Plus" ... "4Plus" from "terminology1".
  terminology1:1Plus  terminology1:2Plus
  terminology1:3Plus  terminology1:4Plus ) .
pharma2:valueSetPick owl:oneOf (                                    # the value set "pharma2:valueSetPick" includes those plus "not_checked", "WNL" and "absent" from terminology2.
  terminology2:not_checked  terminology2:WNL  terminology2:absent
  terminology1:1Plus  terminology1:2Plus  terminology1:3Plus
  terminology1:4Plus ) .
pharma2:valueSetBoth owl:oneOf (                                    # the value set "pharma2:valueSetBoth" includes overlapping values from terminology1 and terminology2.
  terminology2:not_checked  terminology2:WNL  terminology2:absent   # more ontology statements are required to assert that, for a given question, 1Plus is the same as slight.
  terminology1:1Plus  terminology1:2Plus  terminology1:3Plus
  terminology1:4Plus  terminology2:slight  terminology2:moderate
  terminology2:severe  terminology2:very_severe ) .
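
The relationships among these three value sets can be sanity-checked outside of any RDF tooling. The following is a minimal Python sketch, not part of prot.ttl: it mirrors the owl:oneOf lists as plain sets of prefixed names (the variable and function names are made up for illustration) and shows which terms each set adds to the previous one.

```python
# Hypothetical mirror of the owl:oneOf lists above as sets of prefixed names.
VALUE_SET_1234 = frozenset({
    "terminology1:1Plus", "terminology1:2Plus",
    "terminology1:3Plus", "terminology1:4Plus",
})
# valueSetPick re-uses every term of valueSet1234 and adds three status terms.
VALUE_SET_PICK = VALUE_SET_1234 | {
    "terminology2:not_checked", "terminology2:WNL", "terminology2:absent",
}
# valueSetBoth additionally includes the graded terms from terminology2.
VALUE_SET_BOTH = VALUE_SET_PICK | {
    "terminology2:slight", "terminology2:moderate",
    "terminology2:severe", "terminology2:very_severe",
}


def terms_added(smaller, larger):
    """Return the terms that `larger` has beyond `smaller`, sorted for display."""
    return sorted(larger - smaller)


if __name__ == "__main__":
    print(terms_added(VALUE_SET_1234, VALUE_SET_PICK))
    print(terms_added(VALUE_SET_PICK, VALUE_SET_BOTH))
```

Because each set is a strict superset of the one before it, an answer that is valid for valueSet1234 is also valid for the other two, but not the other way around.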

We then have definitions for the semantics of, and permissible values of, questions that appear on CRFs:

#################################################################
# Questions
#################################################################

pharma1:Q_Extol rdfs:label "Exercise tolerance"  ;                               # Extol has a label of "Exercise Tolerance"
  rdfs:subClassOf prot:CRFQuestion , LOINC:pulmonary_functon_test ,           # Extol is a subclass of CRFQuestion and pulmonary_functon_test
     [ owl:onProperty prot:obsValue ; owl:allValuesFrom pharma1:valueSet1234 ] . # every Extol has an obsValue property with a value in valueSet1234.
pharma2:Q_Tread rdfs:label "Treadmill endurance" ;
  rdfs:subClassOf prot:CRFQuestion , LOINC:cardiac_function_test ,
     [ owl:onProperty prot:obsValue ; owl:allValuesFrom pharma2:valueSetPick ] .
pharma2:Q_Sleep rdfs:label "Sleep disturbance"   ;
  rdfs:subClassOf prot:CRFQuestion , LOINC:interruption_of_REM_sleep ,
     [ owl:onProperty prot:obsValue ; owl:allValuesFrom pharma2:valueSetBoth ] .

These definitions use a constraint language called OWL (the Web Ontology Language) and annotate the tests with some semantics. In this case, we're pretending that LOINC terms are human-readable. A study that records an answer outside the appropriate value set will now be flagged as inconsistent. For instance, terminology2:WNL is NOT a member of pharma1:valueSet1234 and is thus not a permitted value of a pharma1:Q_Extol result.
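
To make the membership test concrete, here is a small Python sketch of the same check done procedurally. It is only a closed-world approximation of what an OWL reasoner concludes from owl:allValuesFrom over an owl:oneOf list, and the dictionary and function names are hypothetical rather than part of prot.ttl.

```python
# Terms copied from the value sets above, grouped for re-use.
T1_GRADES = ["terminology1:1Plus", "terminology1:2Plus",
             "terminology1:3Plus", "terminology1:4Plus"]
T2_STATUS = ["terminology2:not_checked", "terminology2:WNL",
             "terminology2:absent"]
T2_GRADES = ["terminology2:slight", "terminology2:moderate",
             "terminology2:severe", "terminology2:very_severe"]

# Hypothetical lookup: question -> permitted obsValue terms.
PERMITTED = {
    "pharma1:Q_Extol": set(T1_GRADES),                          # pharma1:valueSet1234
    "pharma2:Q_Tread": set(T1_GRADES + T2_STATUS),              # pharma2:valueSetPick
    "pharma2:Q_Sleep": set(T1_GRADES + T2_STATUS + T2_GRADES),  # pharma2:valueSetBoth
}


def is_permitted(question, value):
    """True if `value` is in the value set attached to `question`."""
    return value in PERMITTED[question]


if __name__ == "__main__":
    print(is_permitted("pharma1:Q_Extol", "terminology1:1Plus"))
    print(is_permitted("pharma1:Q_Extol", "terminology2:WNL"))
```

The second call mirrors the terminology2:WNL case described above: the term is acceptable for pharma2:Q_Tread but not for pharma1:Q_Extol.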

The above representation essentially treats questions as classes of answer values. While effective for demonstration purposes, a more rigorous model separates questions from the classes of potential answers to those questions. The auxiliary document prot-questans.ttl uses this more rigorous model.

pharma1:Q_Extol rdfs:label "Exercise tolerance" ;
    a LOINC:pulmonary_functon_test ;
    prot:resultConstraints pharma1:Q_Extol_Result .
pharma1:Q_Extol_Result rdfs:subClassOf prot:CRFResult ,
     [ owl:onProperty prot:obsValue ; owl:allValuesFrom pharma1:valueSet1234 ] .

pharma2:Q_Tread rdfs:label "Treadmill endurance" ;
    a LOINC:cardiac_function_test ;
    prot:resultConstraints pharma2:Q_Tread_Result .
pharma2:Q_Tread_Result rdfs:subClassOf prot:CRFResult ,
    [ owl:onProperty prot:obsValue ; owl:allValuesFrom pharma2:valueSetPick ] .

pharma2:Q_Sleep rdfs:label "Sleep disturbance" ;
    a LOINC:interruption_of_REM_sleep ;
    prot:resultConstraints pharma2:Q_Sleep_Result .
pharma2:Q_Sleep_Result rdfs:subClassOf prot:CRFResult ,
     [ owl:onProperty prot:obsValue ; owl:allValuesFrom pharma2:valueSetBoth ] .

We'd like to query for the "compatible" data. Since these are all values with very weak semantics on their own, we must leverage the annotations attached to the question definitions above. These annotations, while distinct, can draw on a hierarchy within the terminology from which they were taken. I've used a rather flat sub-class-of hierarchy to illustrate this:

LOINC:pulmonary_functon_test rdfs:subClassOf LOINC:exercise_evaluation .
LOINC:cardiac_function_test rdfs:subClassOf LOINC:exercise_evaluation .

There is a large structure of study definitions with Case Report Forms connecting questions to protocols for acquiring e.g. a time series of values. This is not required for this example, but can be viewed to supply some context:

#################################################################
# CRFs
#################################################################

pharma1:study1 a prot:Study ;
    prot:version 1.0 ; # like MetaDataVersion@Name
    prot:crfs ( pharma1:crf1 pharma1:crf2 ) . # like ItemGroupDefs
pharma1:crf1 a prot:CRF ;
    prot:questionSet ( pharma1:initial pharma1:Q_Extol_set1 pharma1:follow ).
pharma1:Q_Extol_set1 a prot:TimeSequence ;
    prot:question pharma1:Q_Extol ;
    prot:atMinuteOffsets 1, 5, 15, 60 .

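A few additional axioms give the reasoner what it needs to draw sensible conclusions: prot:obsValue is an object property, every prot:CRFQuestion has exactly one obsValue, and the terms are declared pairwise distinct.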

#################################################################
# Ontological Statements
#################################################################

prot:obsValue a owl:ObjectProperty .
# prot:question a owl:ObjectProperty .
prot:CRFQuestion rdfs:subClassOf [ owl:onProperty prot:obsValue ; owl:cardinality 1 ] .

[ a owl:AllDifferent ;
  owl:distinctMembers ( terminology1:1Plus
                        terminology1:2Plus
                        terminology1:3Plus
                        terminology1:4Plus
                        terminology2:WNL
                        terminology2:absent
                        terminology2:moderate
                        terminology2:not_checked
                        terminology2:severe
                        terminology2:slight
                        terminology2:very_severe
#                        pharma1:bar
#                        pharma1:crf1
#                        pharma1:Q_Extol
#                        pharma1:foo
#                        pharma2:baz
#                        pharma2:crf2
#                        pharma2:crf3
#                        pharma2:Q_Sleep
#                        pharma2:Q_Tread
                      )
] .

#<http://www.w3.org/2013/02/ValueSet/prot> a owl:Ontology .
#pharma1:ExerciseTolerance a owl:Class .
#pharma1:valueSet1234 a owl:Class .
#
#prot:CRF a owl:Class .
#prot:CRFQuestion a owl:Class .
#pharma1:bar a owl:NamedIndividual .
#pharma1:foo a owl:NamedIndividual .
#pharma2:baz a owl:NamedIndividual .
#terminology1:1Plus a owl:NamedIndividual .
#terminology1:2Plus a owl:NamedIndividual .
#terminology1:3Plus a owl:NamedIndividual .
#terminology1:4Plus a owl:NamedIndividual .
#terminology2:WNL a owl:NamedIndividual .
#terminology2:absent a owl:NamedIndividual .
#terminology2:moderate a owl:NamedIndividual .
#terminology2:not_checked a owl:NamedIndividual .
#terminology2:severe a owl:NamedIndividual .
#terminology2:slight a owl:NamedIndividual .
#terminology2:very_severe a owl:NamedIndividual .
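
The owl:AllDifferent axiom matters because OWL does not assume that different names denote different things. Given owl:cardinality 1 and two obsValue terms on one result, a reasoner without that axiom could conclude that the two terms name the same individual rather than report a problem. The Python sketch below imitates that distinction; the function and its return strings are hypothetical, and this is not how a real reasoner is implemented.

```python
def check_single_value(values, all_different):
    """Imitate how 'cardinality 1' interacts with distinctness declarations.

    values        -- the obsValue terms recorded for one result
    all_different -- terms declared pairwise distinct (the owl:AllDifferent list)
    """
    names = set(values)
    if len(names) <= 1:
        return "consistent"
    # Two names that are both declared distinct cannot collapse into one value.
    known_distinct = names & set(all_different)
    if len(known_distinct) >= 2:
        return "inconsistent"
    # Otherwise the names may simply denote the same individual.
    return "inferred owl:sameAs"


if __name__ == "__main__":
    declared = ["terminology1:1Plus", "terminology1:2Plus"]
    recorded = ["terminology1:1Plus", "terminology1:2Plus"]
    print(check_single_value(recorded, declared))  # distinctness declared
    print(check_single_value(recorded, []))        # no distinctness declared
```

With the distinctness declaration in place, recording two grades for one question is flagged; without it, the same data would pass silently.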

Suppose we have acquired data for some combination of study, CRF, question and sampling time:

#################################################################
# Instantiations
#################################################################

pharma1:study1_crf1_Q_extol_t1 a pharma1:Q_Extol ; # study1_crf1_Q_extol_t1 instantiates Extol ("Exercise Tolerance" defined above).
    prot:obsValue terminology1:1Plus .             # study1_crf1_Q_extol_t1 has an obsValue of 1Plus (which is in valueSet1234).
pharma2:studyA_crf5_Q_tread_t1 a pharma2:Q_Tread ;
    prot:obsValue terminology1:2Plus .
pharma2:studyB_crf3_Q_sleep_t4 a pharma2:Q_Sleep ;
    prot:obsValue terminology1:3Plus .

A query for answers whose questions share a common parent class separates the results for exercise tolerance and treadmill endurance from those for sleep disturbance:

PREFIX : <http://www.w3.org/2013/02/ValueSet/prot#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX LOINC: <http://www.w3.org/2013/02/ValueSet/LOINC#>
SELECT ?t ?val {
  ?t :obsValue ?val ;
     a LOINC:exercise_evaluation .
}

yields the values from the related tests:

?t                               ?val
pharma1:study1_crf1_Q_extol_t1   terminology1:1Plus
pharma2:studyA_crf5_Q_tread_t1   terminology1:2Plus

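The above query requires runtime inferencing, because no result is directly typed as LOINC:exercise_evaluation. The same answer can be obtained entirely in SPARQL, without a reasoner, by using property paths: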

PREFIX : <http://www.w3.org/2013/02/ValueSet/prot#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX LOINC: <http://www.w3.org/2013/02/ValueSet/LOINC#>
SELECT ?t ?val {
  ?t :obsValue ?val ;
     rdf:type/rdfs:subClassOf* LOINC:exercise_evaluation
}
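
For readers who want to see what the property path does step by step, the following Python sketch walks the same pattern over a hand-copied subset of the triples from this document. It is an illustration under stated assumptions, not a SPARQL engine: the tuple representation and the helper names (objects, reaches, exercise_results) are hypothetical, and identifiers are kept exactly as spelled in the listings above.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
OBS_VALUE = "prot:obsValue"

# A hand-copied subset of the triples above, as (subject, predicate, object).
TRIPLES = [
    ("pharma1:Q_Extol", SUBCLASS, "LOINC:pulmonary_functon_test"),
    ("pharma2:Q_Tread", SUBCLASS, "LOINC:cardiac_function_test"),
    ("pharma2:Q_Sleep", SUBCLASS, "LOINC:interruption_of_REM_sleep"),
    ("LOINC:pulmonary_functon_test", SUBCLASS, "LOINC:exercise_evaluation"),
    ("LOINC:cardiac_function_test", SUBCLASS, "LOINC:exercise_evaluation"),
    ("pharma1:study1_crf1_Q_extol_t1", RDF_TYPE, "pharma1:Q_Extol"),
    ("pharma1:study1_crf1_Q_extol_t1", OBS_VALUE, "terminology1:1Plus"),
    ("pharma2:studyA_crf5_Q_tread_t1", RDF_TYPE, "pharma2:Q_Tread"),
    ("pharma2:studyA_crf5_Q_tread_t1", OBS_VALUE, "terminology1:2Plus"),
    ("pharma2:studyB_crf3_Q_sleep_t4", RDF_TYPE, "pharma2:Q_Sleep"),
    ("pharma2:studyB_crf3_Q_sleep_t4", OBS_VALUE, "terminology1:3Plus"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in TRIPLES."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def reaches(start, predicate, target):
    """True if `target` is zero or more `predicate` steps from `start` (the `*` operator)."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            todo.extend(objects(node, predicate))
    return False


def exercise_results(target="LOINC:exercise_evaluation"):
    """Rows (?t, ?val) where ?t rdf:type/rdfs:subClassOf* target and ?t prot:obsValue ?val."""
    rows = []
    for s, p, o in TRIPLES:
        if p == OBS_VALUE and any(
            reaches(cls, SUBCLASS, target) for cls in objects(s, RDF_TYPE)
        ):
            rows.append((s, o))
    return rows


if __name__ == "__main__":
    for t, val in exercise_results():
        print(t, val)
```

The sleep-disturbance result is excluded because its class chain ends at LOINC:interruption_of_REM_sleep without ever reaching LOINC:exercise_evaluation.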

$Revision: 1.9 $ of $Date: 2013/03/03 18:33:08 $ by $Author: eric $