Copyright © 2006-2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
CDISC's Study Data Tabulation Model (SDTM) is used to define the study components in terms of domains and observations for a given clinical trial study. However, using it for sets of biomarkers that serve to define surrogate endpoints and/or evidence of mechanism is currently either not possible or not well described. We intend to propose an augmentation of the SDTM model using RDF-OWL that will support the inclusion of biomarker data and genotyping from subjects, associated with known mechanisms and endpoint descriptors.
This HCLSIG task force focuses on the topic of “applying semantics to R&D Informatics efforts in support Drug Safety and Efficacy” within clinical trials, as well as post-market surveillance. We also intend to demonstrate how Semantic Web standards can be applied to issues related to these in the near-term. Specifically, the task force focuses on the following areas for scenarios and activities:
- Identify and address challenges and needs regarding Biomarkers and Pharmacogenomics, in coordination with FDA guidelines
- Semantic applications around Drug Safety: Signals and Notification
- Possible applications of the Semantic Web in Clinical Trial planning, management, analysis, and reporting (e.g., EDC and EHR Single-Source, data security, integrity)
- Facilitating electronic submissions as per the Common Technical Document (eCTD) specifications (http://www.fda.gov/cder/guidance/7087rev.htm)
This is a draft of an interest group note. It does not yet reflect endorsement by the Semantic Web in Health Care and Life Sciences Interest Group.
This HCLSIG task force (DSE) focuses on the topic of “applying semantics to R&D Informatics efforts in support Drug Safety and Efficacy” within clinical trials, as well as post-market surveillance. We also intend to demonstrate how Semantic Web standards can be applied to issues related to these in the near-term. Specifically, the task force focuses on the following areas for scenarios and activities [HCLS].
Digital data from both Non-Clinical (animal) and Clinical Studies (human) needs to be organized according to the following areas:
The tabular model proposed by SDTM allows defining the observation forms and codes, but several factors constrain its wider usage. Specifically, it needs a more precise way of describing codes (for example, as URIs) and of supporting optional and required extensions that depend on certain classes of studies. SDTM needs to be extended using a flexible model to incorporate key elements of translational medicine. This means that the inclusion of biomarker and genotype information must be addressed both efficiently (multiple sets of diverse measurements per subject per study) and scientifically (molecular, mechanistic, and phenotypic associations).
The objectives focused on a few key items related to the SDTM model and possible extensions to it:
- Develop and document Scenarios for some of the above identified areas
- Identify and validate some initial Best Practices for handling safety and efficacy information through semantics, which incorporate current vocabulary conventions
- Create one or more public Semantic Web-based Demonstrations (see Clinical Trial Demo)
- Coordination and collaboration with relevant organizations, possibly CDISC, ICH, HL7-RCRIM, EMEA, FDA, NCI-caBIG
The requirement to convert ODM/XML to RDF may not be best met by addressing SDTM elements as they stand, with data, metadata, codelists, and definitions all embedded in one study; instead, use references to metadata and definitions.
The use of information to improve the development of Efficacious and Safe Drugs rests on the proper and timely utilization of diverse information sets, and on the adoption of and compliance with well-defined policies. As information becomes more diverse and policies more central to the pharmaceutical industry, the development of information systems that are better suited to handle multiple information types (data and ontologies) while complying with defined policies (rules and actions) will become essential. Semantic Web technology standards offer potential solutions for:
- Aggregating Study Datasets around Biomarkers (and following eCTD guidelines)
- Enhancing management of non-clinical and clinical controlled vocabularies, which will certainly keep expanding and evolving (adaptability)
- Providing fast access to current safety information through semantic-enabled channels (Pharmacovigilance)
- Applying Rules, Integrity, and Security in support of policy compliance and management (HIPAA, CFR21Part11 and Sarbanes-Oxley)
The Study Data Tabulation Model (SDTM) is used to define the study components in terms of domains and observations for a given clinical trial study. However, using it for sets of biomarkers that serve to define surrogate endpoints and/or evidence of mechanism is not currently possible. We intend to propose an augmented SDTM model using RDF-OWL that will support the inclusion of biomarker data from subjects, associated with known mechanisms and endpoint descriptors.
The following examples are a work in progress (a collaborative whiteboard) showing how to define and organize clinical data in the style of the SDTM model using an RDF approach. N3 is used here to make editing and comprehension easier. Some basic syntactical rules are reviewed here:
@prefix cdisc: <http://www.cdisc.org/sdtm/vocab#> .
@prefix dse: <http://www.w3.org/2001/sw/hcls/dse#> .
@prefix nci: <http://nci.nih.gov/cadsr/vocabulary#> .
@prefix nist: <http://nist.gov/units#> .
@prefix time: <http://www.w3.org/2006/time#> .
# Sex Text Code: 'MALE', 'FEMALE', 'UNKNOWN', 'Intersex'
<http://clinic.com/study/T2271>
a cdisc:Study ;
cdisc:subject <http://clinic.com/study/T2271/subject/S83221> ;
cdisc:subject <http://clinic.com/study/T2271/subject/S74343> ;
... .
<http://clinic.com/study/T2271/subject/S83221>
a cdisc:Subject ;
nci:sex_code nci:Female ;
# here I assume cdisc:Diastolic_BP is a subproperty of cdisc:VSTest
cdisc:observation <http://clinic.com/study/T2271/subject/S83221/observation/O6622> ;
cdisc:observation <http://clinic.com/study/T2271/subject/S83221/observation/O6561> ;
... .
<http://clinic.com/study/T2271/subject/S83221/observation/O6622>
a cdisc:Diastolic_BP ;
cdisc:obs_context cdisc:patient_lying ;
cdisc:obs_value "98" ;
cdisc:obs_units nist:mmHg .
<http://clinic.com/study/T2271/subject/S83221/observation/O6561>
a cdisc:Pulse ;
cdisc:obs_context cdisc:patient_lying ;
cdisc:obs_value "64";
cdisc:obs_units nist:bpm .
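The study, subject, and observation statements above can also be read as a plain set of (subject, predicate, object) triples. The following is a minimal sketch in Python, using only the standard library, of how such a set might be queried for one subject's observations. It covers only a subset of the statements above; the namespace strings (with a trailing `#`) and the helper names `match` and `observations_for` are assumptions made for illustration and are not part of SDTM or of any RDF toolkit.

```python
# Plain-Python sketch of part of the study graph above; no RDF library is used.
# The namespace IRIs (with a trailing '#') are illustrative assumptions.
CDISC = "http://www.cdisc.org/sdtm/vocab#"
NIST = "http://nist.gov/units#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

STUDY = "http://clinic.com/study/T2271"
SUBJECT = STUDY + "/subject/S83221"
OBS_BP = SUBJECT + "/observation/O6622"
OBS_PULSE = SUBJECT + "/observation/O6561"

# Each triple is (subject, predicate, object); literals are kept as plain strings.
TRIPLES = {
    (STUDY, RDF_TYPE, CDISC + "Study"),
    (STUDY, CDISC + "subject", SUBJECT),
    (SUBJECT, RDF_TYPE, CDISC + "Subject"),
    (SUBJECT, CDISC + "observation", OBS_BP),
    (SUBJECT, CDISC + "observation", OBS_PULSE),
    (OBS_BP, RDF_TYPE, CDISC + "Diastolic_BP"),
    (OBS_BP, CDISC + "obs_value", "98"),
    (OBS_BP, CDISC + "obs_units", NIST + "mmHg"),
    (OBS_PULSE, RDF_TYPE, CDISC + "Pulse"),
    (OBS_PULSE, CDISC + "obs_value", "64"),
    (OBS_PULSE, CDISC + "obs_units", NIST + "bpm"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def observations_for(triples, subject):
    """Collect {observation IRI: (type, value, units)} for one subject."""
    result = {}
    for _, _, obs in match(triples, s=subject, p=CDISC + "observation"):
        obs_type = match(triples, s=obs, p=RDF_TYPE)[0][2]
        value = match(triples, s=obs, p=CDISC + "obs_value")[0][2]
        units = match(triples, s=obs, p=CDISC + "obs_units")[0][2]
        result[obs] = (obs_type, value, units)
    return result
```

Calling `observations_for(TRIPLES, SUBJECT)` walks from the subject to each linked observation, which is the same navigation the N3 statements express through `cdisc:observation`.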
Question: Is the mixing of domain-specific vocabularies with data record information a problem? Can it simply be resolved by using multiple ontologies?
Example Based on simulated Clinical Data from Stephen Dobson
<http://clinic.com/study/T2271/subject/4183542663506>
a cdisc:Subject ;
nci:sex_code nci:Female ;
cdisc:treatment <http://clinic.com/study/T2271/subject/4183542663506/observation/O2241> ;
cdisc:vitalSigns <http://clinic.com/study/T2271/subject/4183542663506/observation/O6561> ;
cdisc:adverseEvent <http://clinic.com/study/T2271/subject/4183542663506/observation/O6622> .
# ROUTE DRGGROUP DOSE pid treatment tpfday tptday
# IV B 7 MG 4183542663506 7mg then 14mg SEMWEB 6/11/84 7/11/84
<http://clinic.com/study/T2271/subject/4183542663506/observation/O2241>
a cdisc:Treatment ; # cdisc:Treatment is a subclass of cdisc:Observation
cdisc:design_arm <http://clinic.com/study/T2271/treated_B/double_dose> ;
dse:route cdisc:IV_route ;
dse:drug_group "B" ;
cdisc:dose "7" ;
cdisc:dose_units nist:mg ;
cdisc:treatment "7mg then 14mg SEMWEB" ;
cdisc:first_date "6/11/84" ;
cdisc:term_date "7/11/84" ;
... # How best to define Treatments and Experimental Design? Use cdisc:design_arm to link back to the design graph?
# VTLTEXT VTLRES VISIT_ID pid collday related
# Standing Diastolic BP (mmHg) 75 BASELINE 4183542663506 6/11/84 1
<http://clinic.com/study/T2271/subject/4183542663506/observation/O6561>
a cdisc:Vital_sign ; # cdisc:Vital_sign is a subclass of cdisc:Observation
cdisc:visit_id cdisc:BASELINE ;
cdisc:visit_date "6/11/84" ;
dse:obs_context [ cdisc:position cdisc:patient_standing ] ;
cdisc:diastolic [ a cdisc:StandingDiastolic_BP ;
dse:vtltext "Standing Diastolic BP (mmHg)" ;
dse:related_measure "1" ;
dse:obs_value "75" ;
dse:obs_units nist:mmHg
] .
# pid AEFDAY AETDAY AESEV AESEVT AESER AESERT PREFTEXT BODYTEXT
# 4183542663506 6 9 2 MODERATE 2 NO ABDOMEN ENLARGED BODY AS A WHOLE
<http://clinic.com/study/T2271/subject/4183542663506/observation/O6622>
a cdisc:Adverse_Event ; # cdisc:Adverse_Event is a subclass of cdisc:Observation
cdisc:visit_id cdisc:BASELINE ;
time:first_date "6" ;
time:term_date "9" ;
time:duration_days "2" ;
dse:severity AE:MODERATE ; # the AE: prefix would also need an @prefix declaration
dse:rating "2" ;
dse:RT "NO" ;
dse:prefText "ABDOMEN ENLARGED" ;
dse:bodyText "BODY AS A WHOLE" ;
dse:obs_context [ cdisc:position cdisc:patient_standing ] .
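The comment rows in the example above show the tabular source (column names followed by one data row) from which each observation was written by hand. As a rough sketch of how that step might be automated, the Python below reads the vital-signs row with the standard `csv` module and emits triples. The tab-separated layout, the function name `vital_sign_triples`, the namespace strings, the choice of the `dse:` namespace for the value, and the flattening of the nested blank node into direct properties of the observation are all assumptions made for illustration.

```python
import csv
import io

# Illustrative namespace IRIs (assumed, with a trailing '#').
CDISC = "http://www.cdisc.org/sdtm/vocab#"
DSE = "http://www.w3.org/2001/sw/hcls/dse#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
STUDY = "http://clinic.com/study/T2271"

# The vital-signs row from the comments above, assumed to be tab-separated.
TABLE = (
    "VTLTEXT\tVTLRES\tVISIT_ID\tpid\tcollday\trelated\n"
    "Standing Diastolic BP (mmHg)\t75\tBASELINE\t4183542663506\t6/11/84\t1\n"
)


def vital_sign_triples(row, obs_id):
    """Map one tabular vital-signs row to (subject, predicate, object) triples.

    The subject IRI is minted from the 'pid' column; obs_id is supplied by the
    caller because the table itself carries no observation identifier.
    """
    subject = f"{STUDY}/subject/{row['pid']}"
    obs = f"{subject}/observation/{obs_id}"
    return [
        (subject, CDISC + "vitalSigns", obs),
        (obs, RDF_TYPE, CDISC + "Vital_sign"),
        (obs, CDISC + "visit_id", CDISC + row["VISIT_ID"]),
        (obs, CDISC + "visit_date", row["collday"]),
        (obs, DSE + "vtltext", row["VTLTEXT"]),
        (obs, DSE + "obs_value", row["VTLRES"]),
        (obs, DSE + "related_measure", row["related"]),
    ]


rows = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))
triples = vital_sign_triples(rows[0], "O6561")
```

Minting the subject IRI from the `pid` column keeps the observation IRI and the subject that links to it consistent by construction, which is easy to get wrong when the statements are written by hand.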
Below, in the related resources section, two examples are attached from CDISC usage of NCI caDSR for the so-called SDTM Controlled Terminologies (see slides on NCIt and CDISC CT). These examples include the permissible values as strings to be incorporated in SDTM datasets, e.g.:
It is important to recognize the different approaches taken in 1) the CDISC SDTM standard, 2) the NCI Thesaurus, and 3) what I would like to see as Observation Types Ontologies (see more details below), and to consider how to relate these to existing terminologies such as LOINC codes and Clinical Findings in SNOMED CT.
This would enable the publication of observation types ontologies as formal descriptions of the required patient and measurement context, such as these for the measurement of blood pressure:
Such observation types ontologies could also be the place to describe the different classifications such as this type of observation 'is-a-Metabolic_Marker', according to nciOncology and 'is-a-Finding', according to SDTM general classes.
This demo illustrates how clinical trials data formatted as RDF can be visualized. It takes advantage of the Exhibit technology from MIT's SIMILE project, and shows how easy it is to merge and visualize aggregated data (graphs) through the Web. The data was obtained from four separate SDTM-generated Excel documents that were converted and merged using SIMILE's BABEL utility.
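The merging step is simple in principle because RDF graphs are sets of triples, so datasets that use the same URI for a subject combine by set union. The sketch below illustrates that idea in plain Python; it is not how Exhibit or Babel work internally, and the two small input graphs, the namespace strings, and the helper names `merge_graphs` and `describe` are assumptions made for illustration.

```python
# Illustrative namespace IRIs (assumed, with a trailing '#').
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CDISC = "http://www.cdisc.org/sdtm/vocab#"
NCI = "http://nci.nih.gov/cadsr/vocabulary#"

SUBJECT = "http://clinic.com/study/T2271/subject/S83221"
OBS = SUBJECT + "/observation/O6561"

# Two hypothetical source datasets that mention the same subject URI.
demographics = {
    (SUBJECT, RDF_TYPE, CDISC + "Subject"),
    (SUBJECT, NCI + "sex_code", NCI + "Female"),
}
vital_signs = {
    (SUBJECT, RDF_TYPE, CDISC + "Subject"),
    (SUBJECT, CDISC + "observation", OBS),
    (OBS, CDISC + "obs_value", "64"),
}


def merge_graphs(*graphs):
    """Merge graphs by set union; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def describe(graph, node):
    """Return {predicate: sorted objects} for every triple about one node."""
    description = {}
    for s, p, o in graph:
        if s == node:
            description.setdefault(p, []).append(o)
    return {p: sorted(objs) for p, objs in description.items()}


merged = merge_graphs(demographics, vital_signs)
```

After the merge, `describe(merged, SUBJECT)` returns the demographic and vital-sign statements together, with no join key other than the shared subject URI.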
CDISC's SDTM model can be mapped into an RDF-based model, provided that its key data entities (e.g., Subjects, Observations, Studies) are mapped to RDF-S or OWL types and then identified using URIs (especially those that cannot be represented by blank nodes). There are clear advantages to using such a model, since additional links and metadata can easily be added to any study set. Consideration should still be given to whether all current attributes are truly bound conceptually to their subject (true entity characteristics) or are contextually dependent (e.g., on the Study or Project). These may be handled by different sets of vocabulary and predicates.
The SDTM does provide an extension mechanism called Supplemental Qualifiers (SUPPQUAL). Such datasets consist of supplementary qualifiers extending the predefined and permissible variables for record qualifiers in the different SDTM domains. SDTM also has a way to relate records from different domains, for example relating the Pharmacokinetic Parameters to their Concentrations. However, CDISC does not provide a mechanism to define which qualifiers are required for each type of observation. This may be an area where appropriate Semantic Web and ontological contributions could effectively address these issues.
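To make the gap concrete, the sketch below shows one way a per-observation-type list of required qualifiers could be declared and checked. It is only an illustration in plain Python: the `REQUIRED_QUALIFIERS` table, its entries, and the function `missing_qualifiers` are hypothetical and do not come from SDTM or from any CDISC specification. In an ontology-based approach, the same constraints would more naturally be stated in the observation type definitions themselves.

```python
# Hypothetical table: which qualifiers each observation type must carry.
# These entries are illustrative assumptions, not SDTM rules.
REQUIRED_QUALIFIERS = {
    "Diastolic_BP": {"obs_context", "obs_value", "obs_units"},
    "Pulse": {"obs_value", "obs_units"},
}


def missing_qualifiers(obs_type, record):
    """Return the sorted names of required qualifiers absent from a record.

    Observation types without an entry in the table have no requirements.
    """
    required = REQUIRED_QUALIFIERS.get(obs_type, set())
    return sorted(required - set(record))


# A blood pressure record that omits the patient position context.
incomplete = {"obs_value": "98", "obs_units": "mmHg"}
problems = missing_qualifiers("Diastolic_BP", incomplete)
```

Here `problems` lists `obs_context` as missing, which is the kind of check that cannot currently be derived from the SDTM definitions alone.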
The more complex issues we have not yet addressed include an appropriate mapping of terminology codes and strings into OWL or SKOS defined URI entities. Many of these issues will probably be discussed and addressed in the new Clinical Observations Interoperability task force (http://esw.w3.org/topic/HCLS/ClinicalObservationsInteroperability/).
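As a small illustration of what such a mapping involves, the sketch below turns the sex code strings listed earlier ('MALE', 'FEMALE', 'UNKNOWN', 'Intersex') into URIs. Only `nci:Female` appears in the examples of this note; the other local names, the namespace string, and the function `sex_code_uri` are assumptions made for illustration, and a real mapping would need to follow the identifiers actually published by the terminology owner.

```python
# Illustrative namespace IRI (assumed, with a trailing '#').
NCI = "http://nci.nih.gov/cadsr/vocabulary#"

# Permissible sex code strings mapped to URIs. Apart from Female, the local
# names on the right are hypothetical.
SEX_CODE_URIS = {
    "MALE": NCI + "Male",
    "FEMALE": NCI + "Female",
    "UNKNOWN": NCI + "Unknown",
    "INTERSEX": NCI + "Intersex",
}


def sex_code_uri(text):
    """Map a permissible sex code string to its URI, ignoring case and padding."""
    key = text.strip().upper()
    if key not in SEX_CODE_URIS:
        raise ValueError(f"not a permissible sex code: {text!r}")
    return SEX_CODE_URIS[key]
```

Rejecting unknown strings, rather than minting a URI for them, keeps the controlled terminology closed, which matches the idea of a permissible value list.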
This concludes the SDTM Note. Further DSE discussions can be found at http://esw.w3.org/topic/HCLSIG/Drug_Safety_and_Efficacy.
The editor would like to thank the following Working Group members for authoring this document:
This document is a product of the Drug Safety and Efficacy Task Force of the HCLS Interest Group.
$Log: Note_DSE_20071108.html,v $
Revision 1.3 2007/11/09 15:35:33 eric
~ reflect that this in not a Note
~ well-formed
Revision 1.2 2007/11/09 13:52:58 eneumann2
something helpful
Revision 1.1 2007/11/05 04:33:51 eneumann2
Note_DSE_20071108.html
Revision 1.0 2007/11/05 18:39:11 eneumann