Drug Safety and Efficacy Note on CDISC's Study Data Tabulation Model (SDTM)

W3C HCLS Draft Interest Group Note 01 11 2007

Eric Neumann, Clinical Semantics Group
Kerstin Forsberg, AstraZeneca
Authors and Contributors:
see Acknowledgments


CDISC's Study Data Tabulation Model (SDTM) is used to define the study components of a given clinical trial in terms of domains and observations. However, using it for sets of biomarkers that serve to define surrogate endpoints and/or evidence of mechanism is currently either not possible or not well described. We intend to propose an augmentation of the SDTM model using RDF-OWL that will support the inclusion of biomarker and genotyping data from subjects, associated with known mechanisms and endpoint descriptors.

Task Force Charge

This HCLSIG task force focuses on the topic of “applying semantics to R&D Informatics efforts in support of Drug Safety and Efficacy” within clinical trials, as well as post-market surveillance. We also intend to demonstrate how Semantic Web standards can be applied to related issues in the near term. Specifically, the task force focuses on the following areas for scenarios and activities:

  1. Identifying and addressing challenges and needs regarding Biomarkers and Pharmacogenomics, in coordination with FDA guidelines
  2. Semantic applications around Drug Safety: Signals and Notification
  3. Possible applications of the Semantic Web in Clinical Trial planning, management, analysis, and reporting (e.g., EDC and EHR Single-Source, data security, integrity)
  4. Facilitating electronic submissions as per the Common Technical Document (eCTD) specifications (http://www.fda.gov/cder/guidance/7087rev.htm)

Status of this Document

This is a draft of an interest group note. It does not yet reflect endorsement by the Semantic Web in Health Care and Life Sciences Interest Group.


This HCLSIG task force (DSE) focuses on the topic of “applying semantics to R&D Informatics efforts in support of Drug Safety and Efficacy” within clinical trials, as well as post-market surveillance. We also intend to demonstrate how Semantic Web standards can be applied to related issues in the near term. The specific areas for scenarios and activities are those listed under the task force charge [HCLS].


Digital data from both Non-Clinical (animal) and Clinical Studies (human) needs to be organized according to the following areas:

The tabular model proposed by SDTM allows the observation forms and codes to be defined, but several factors constrain its wider use. Specifically, it needs a more precise way of describing codes (à la URIs) and of supporting optional and required extensions that depend on certain classes of studies. SDTM needs to be extended with a flexible model that incorporates key elements of translational medicine. This means that the inclusion of biomarker and genotype information must be addressed both efficiently (multiple sets of diverse measurements per subject per study) and scientifically (molecular, mechanistic, and phenotypic associations).
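
As a rough illustration of what "describing codes à la URIs" could mean in practice, the sketch below turns permissible code strings into URIs. The namespace is an assumption modeled on the `nci:` prefix used in the N3 example in this note, and the naming rule (capitalize, replace spaces with underscores) is likewise only a hypothetical convention, not a CDISC or NCI rule.

```python
from urllib.parse import quote

# Assumed namespace, modeled on the nci: prefix in this note's N3 example.
NCI_VOCAB = "http://nci.nih.gov/cadsr/vocabulary#"


def code_to_uri(code, namespace=NCI_VOCAB):
    """Map a permissible code string to a URI using a hypothetical naming rule."""
    local_name = code.strip().capitalize().replace(" ", "_")
    # Percent-encode anything that is not safe inside a URI fragment.
    return namespace + quote(local_name, safe="_")


# Code strings taken from the 'Sex Text Code' comment in the N3 example.
for code in ["MALE", "FEMALE", "UNKNOWN", "Intersex"]:
    print(code, "->", code_to_uri(code))
```

With a rule like this, the string 'FEMALE' and the resource `nci:Female` refer to the same thing, which is what allows codes from different tables to be compared by identity rather than by spelling.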

DSE Task Objectives

The objectives focused on a few key items related to the SDTM model and possible extensions to it:

  1. Develop and document Scenarios for some of the above identified areas
  2. Identify and validate some initial Best Practices for handling safety and efficacy information through semantics, which incorporate current vocabulary conventions
  3. Create one or more public Semantic Web-based Demonstrations (see Clinical Trial Demo)
  4. Coordinate and collaborate with relevant organizations, possibly CDISC, ICH, HL7-RCRIM, EMEA, FDA, NCI-caBIG

A requirement to convert ODM/XML to RDF may not be the right way to approach the problem. ODM/XML embeds data, metadata, codelists, and definitions within one study; an approach that addresses SDTM elements could instead use references to shared metadata and definitions.


The use of information to improve the development of Efficacious and Safe Drugs rests on the proper and timely utilization of diverse information sets, and on the adoption of and compliance with well-defined policies. As information becomes more diverse and policies more central to the pharmaceutical industry, the development of information systems that are better suited to handle multiple information types (data and ontologies) while complying with defined policies (rules and actions) will become essential. Semantic Web technology standards offer potential solutions for:

  1. Aggregating Study Datasets around Biomarkers (and following eCTD guidelines)
  2. Enhancing management of non-clinical and clinical controlled vocabularies, which will certainly keep expanding and evolving (adaptability)
  3. Providing fast access to current safety information through semantic-enabled channels (Pharmacovigilance)
  4. Applying Rules, Integrity, and Security in support of policy compliance and management (HIPAA, CFR21Part11 and Sarbanes-Oxley)

Use-Case Context

The Study Data Tabulation Model (SDTM) is used to define the study components of a given clinical trial in terms of domains and observations. However, using it for sets of biomarkers that serve to define surrogate endpoints and/or evidence of mechanism is not currently possible. We intend to propose an augmented SDTM model using RDF-OWL that will support the inclusion of biomarker data from subjects, associated with known mechanisms and endpoint descriptors.

Scenario 1: Genetic Diagnostic CT

  1. Phase IIb Clinical Trial Design that includes Amplichip CYP 450 and Colon Cancer Diagnostic Chips. The CYP 450 chip is used to identify which CYP alleles are present for drug metabolism and to screen candidates for placement in different CT arms; the Colon Cancer chips are used for linkage analysis of colon cancer contributors, and for population segmentation and responder analyses
  2. Candidate samples (for CYP-allele based recruitment) at last candidate screening visit
  3. Study Samples at second visit taken for colon marker analysis and possible later genome-wide analysis
  4. EDC on a CT study that will use genotype (biomarkers) to track potential tox signals
  5. Raw data SDTM bundling and linking of observations with genotype (from GeneChip data)
  6. Analysis of SDTM data between interventions and outcomes (and generation of ADaM); statistics, correlation, and association
  7. Final bundling of analyzed data and CT interpretations
  8. Storage of Clinical findings for clinical mining by other investigations

Scenario 2: Viewing merged clinical data from traditional tables as well as new genotypic forms as RDF graphs

  1. SDTM (SAS) Tables containing subject demographics, treatments, vital signs, and adverse events are produced and their common identifiers normalized.
  2. The tables are merged and converted into RDF structures (e.g., SIMILE's BABEL)
  3. These tables can be rendered and interactively viewed in browsers using SIMILE's Exhibit formatter
  4. Facets for various patient, treatment, and event categories are created using Exhibit
  5. Lenses are developed to view associations of AEs and demography data, as well as the timeline of the AEs
  6. Information of subject genetic polymorphisms is also converted into RDF and merged with the current set
  7. Associations between polymorphisms and AEs as well as treatments can be displayed
  8. Special subsets that are identified for further study can be "copied" as RDF from the viewer and stored separately
  9. The storage of Clinical findings can be used for further clinical mining by other investigations
  10. See also the Exhibit clinical trials demo
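
The normalize, merge, and add-genotype steps of this scenario can be sketched in a few lines of plain Python. This is not how Babel or Exhibit work internally; it only illustrates why normalizing identifiers makes the merge trivial once rows are expressed as triples. The base URI is modeled on the subject URIs used in the N3 example in this note, while the normalization rule, the column names, the `dm`/`ae`/`gt` prefixes, and the sample values are all invented for illustration.

```python
# Hypothetical base URI, modeled on the subject URIs in this note's N3 example.
BASE = "http://clinic.com/study/T2271/subject/"


def normalize_id(raw):
    """Assumed normalization rule: trim whitespace and upper-case the identifier."""
    return raw.strip().upper()


def rows_to_triples(rows, id_column, prefix):
    """Turn table rows (dicts) into a set of (subject, predicate, value) triples."""
    triples = set()
    for row in rows:
        subject = BASE + normalize_id(row[id_column])
        for column, value in row.items():
            if column == id_column or value in ("", None):
                continue  # skip the key column and empty cells
            triples.add((subject, prefix + ":" + column.lower(), str(value)))
    return triples


# Invented sample rows; the identifier is written differently in each table.
demographics = [{"PID": " s83221", "SEX": "FEMALE"}]
adverse_events = [{"pid": "S83221 ", "SEVERITY": "MODERATE"}]
genotypes = [{"PID": "S83221", "CYP2D6": "*1/*4"}]

# Merging graphs is a set union: triples about the same subject share one URI.
merged = (
    rows_to_triples(demographics, "PID", "dm")
    | rows_to_triples(adverse_events, "pid", "ae")
    | rows_to_triples(genotypes, "PID", "gt")
)

for triple in sorted(merged):
    print(triple)
```

Because all three tables resolve to the same subject URI, the genotype row added last attaches to the existing subject without any join logic, which is the property the scenario relies on when polymorphism data is merged into the current set.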

Scenario 2 Example in N3 of SDTM Model with context extensions

The following examples are a work in progress (a collaborative whiteboard) showing how to define and organize clinical data à la the SDTM model using an RDF approach. N3 is used here to make editing and comprehension easier. Some basic syntactical rules are reviewed here:

	@prefix cdisc: <http://www.cdisc.org/sdtm/vocab#> .
	@prefix dse: <http://www.w3.org/2001/sw/hcls/dse#> .
	@prefix nci: <http://nci.nih.gov/cadsr/vocabulary#> .
	@prefix nist: <http://nist.gov/units#> .
	@prefix time: <http://www.w3.org/2006/time#> .

	# Sex Text Code: 'MALE', 'FEMALE', 'UNKNOWN', 'Intersex'

	            a cdisc:Study ;
	            cdisc:subject <http://clinic.com/study/T2271/subject/S83221> ;
	            cdisc:subject <http://clinic.com/study/T2271/subject/S74343> ;
	        ...   .

	            a cdisc:Subject ;
	            nci:sex_code   nci:Female ;
	     # here I assume cdisc:Diastolic_BP is a subproperty of cdisc:VSTest
	            cdisc:observation <http://clinic.com/study/T2271/subject/S83221/observation/O6622> ;
	            cdisc:observation <http://clinic.com/study/T2271/subject/S83221/observation/O6561> ;
	    ...   .

	        a cdisc:Diastolic_BP ;  
	        cdisc:obs_context  cdisc:patient_lying ;
	        cdisc:obs_value  "98" ;
	        cdisc:obs_units  nist:mmHg .

	        a cdisc:Pulse ;         
	        cdisc:obs_context  cdisc:patient_lying ;
	        cdisc:obs_value  "64";
	        cdisc:obs_units  nist:bpm .	

Question: Is the mixing of domain-specific vocabularies with data record information a problem? Can it simply be resolved by using multiple ontologies?

Example Based on simulated Clinical Data from Stephen Dobson

	            a cdisc:Subject ;
	            nci:sex_code   nci:Female ;
	            cdisc:treatment <http://clinic.com/study/T2271/subject/4183542663506/observation/O2241> ;
	            cdisc:vitalSigns <http://clinic.com/study/T2271/subject/4183542663506/observation/O6561> ;
	            cdisc:adverseEvent <http://clinic.com/study/T2271/subject/4183542663506/observation/O6622> .

	# ROUTE        DRGGROUP        DOSE    pid     treatment       tpfday tptday
	# IV   B       7 MG    4183542663506   7mg then 14mg SEMWEB 6/11/84 7/11/84
	<http://clinic.com/study/T2271/subject/S83221/observation/O2241>
	        a cdisc:Treatment ;   # cdisc:Treatment is a subclass of cdisc:Observation
	        cdisc:design_arm  <http://clinic.com/study/T2271/treated_B/double_dose> ;
	        dse:route cdisc:IV_route ;
	        dse:drug_group "B" ;
	        cdisc:dose "7" ;
	        cdisc:dose_units nist:mg ;
	        cdisc:treatment "7mg then 14mg SEMWEB" ;
	        cdisc:first_date "6/11/84" ;
	        cdisc:term_date "7/11/84" ;

	... # How best to define Treatments and Experimental Design? Using cdisc:design_arm to link back to the design graph?

	# VTLTEXT      VTLRES  VISIT_ID        pid     collday related
	# Standing Diastolic BP (mmHg) 75      BASELINE 4183542663506 6/11/84 1
	<http://clinic.com/study/T2271/subject/S83221/observation/O6561>
	        a cdisc:Vital_sign ;   # cdisc:Vital_sign is a subclass of cdisc:Observation
	        cdisc:visit_id cdisc:BASELINE ;
	        cdisc:visit_date "6/11/84" ;
	        dse:obs_context  [ cdisc:position cdisc:patient_standing ] ;

	        cdisc:diastolic [ a cdisc:StandingDiastolic_BP ;
	           dse:vtltext  "Standing Diastolic BP (mmHg)" ;
	           dse:related_measure  "1" ;
	           dse:obs_value  "75" ;
	           dse:obs_units  nist:mmHg
	        ] .

	# 4183542663506        6       9       2       MODERATE        2       NO ABDOMEN ENLARGED BODY AS A WHOLE
	<http://clinic.com/study/T2271/subject/S83221/observation/O6622>
	        a cdisc:Adverse_Event ;   # cdisc:Adverse_Event is a subclass of cdisc:Observation
	        cdisc:visit_id cdisc:BASELINE ;
	        time:first_date "6" ;
	        time:term_date "9" ;
	        time:duration_days "2" ;
	        dse:severity AE:MODERATE ;
	        dse:rating "2" ;
	        dse:RT "NO" ;
	        dse:prefText "NO ABDOMEN ENLARGED" ;
	        dse:bodyText "BODY AS A WHOLE" ;
	        dse:obs_context  [ cdisc:position cdisc:patient_standing ] .
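
The flat records shown in the comments above were turned into N3 by hand. A minimal sketch of doing the same mechanically is given below, in plain Python with no RDF library. The column-to-predicate table is an assumption that loosely follows the vital-sign example; the sketch writes every value as a plain string literal and does not reproduce the nested blank nodes or the resource-valued `cdisc:BASELINE` of the hand-written version.

```python
# Assumed mapping from flat-file columns to predicates, loosely following the
# vital-sign example in this note. Columns not listed here (e.g. pid) are skipped.
COLUMN_TO_PREDICATE = {
    "VTLTEXT": "dse:vtltext",
    "VTLRES": "dse:obs_value",
    "VISIT_ID": "cdisc:visit_id",
    "collday": "cdisc:visit_date",
    "related": "dse:related_measure",
}


def n3_literal(value):
    """Quote a string as an N3 literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def row_to_n3(subject_uri, rdf_type, row):
    """Serialize one flat record as a single N3 statement about subject_uri."""
    parts = ["a " + rdf_type]
    for column, value in row.items():
        predicate = COLUMN_TO_PREDICATE.get(column)
        if predicate is not None:
            parts.append(predicate + " " + n3_literal(value))
    body = " ;\n".join("    " + part for part in parts)
    return "<" + subject_uri + ">\n" + body + " ."


row = {
    "VTLTEXT": "Standing Diastolic BP (mmHg)",
    "VTLRES": "75",
    "VISIT_ID": "BASELINE",
    "pid": "4183542663506",
    "collday": "6/11/84",
    "related": "1",
}
n3_text = row_to_n3(
    "http://clinic.com/study/T2271/subject/S83221/observation/O6561",
    "cdisc:Vital_sign",
    row,
)
print(n3_text)
```

Keeping the mapping in a table rather than in code makes the open modeling questions visible: deciding whether a column becomes a literal, a coded resource, or part of a nested context node is exactly the choice the hand-written examples are exploring.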

Discussion regarding CDISC SDTM Code lists

Below, in the related resources section, two examples are attached from CDISC usage of NCI caDSR for the so called SDTM Controlled Terminologies (see slides on NCIt and CDISC CT). These examples include the permissible values as strings to be incorporated in SDTM datasets, e.g:

It is important to recognize the different approaches taken in 1) the CDISC SDTM standard, 2) the NCI Thesaurus, and 3) what I would like to see as Observation Types Ontologies (see more details below), and to consider how to relate these to existing terminologies such as LOINC codes and Clinical Findings in SNOMED CT.

This would enable the publication of observation types ontologies as formal descriptions of the required patient and measurement context, such as these for the measurement of blood pressure:

Such observation types ontologies could also be the place to describe the different classifications such as this type of observation 'is-a-Metabolic_Marker', according to nciOncology and 'is-a-Finding', according to SDTM general classes.

Clinical Trial Data View Demo

This demo illustrates how clinical trials data formatted as RDF can be visualized. It takes advantage of MIT SIMILE's Exhibit technology, and shows how easy it is to merge and visualize aggregated data (graphs) through the Web. The data was obtained from four separate SDTM-generated Excel documents, which were converted and merged using SIMILE's BABEL utility.

Strategy and Technology

Related resources


CDISC's SDTM model can be mapped into an RDF-based model, provided that its key data entities (e.g., Subjects, Observations, Studies) are mapped to RDF-S or OWL types and then identified using URIs (especially those that cannot be defined by blank nodes). There are clear advantages to using such a model, since additional links and metadata can easily be added to any study set. Consideration should still be given to whether all current attributes are truly bound conceptually to their subject (true entity characteristics) or are contextually dependent (e.g., on the Study or Project). These cases may be handled by different sets of vocabulary and predicates.

The SDTM does provide an extension mechanism called Supplemental Qualifiers (SUPPQUAL). Such datasets consist of supplementary qualifiers extending the predefined and permissible variables for record qualifiers in the different SDTM domains. SDTM also has a way to relate records from different domains, for example relating the Pharmacokinetic Parameters to their Concentrations. However, CDISC does not provide a mechanism to define which qualifiers are required for each type of observation. This may be an area where appropriate Semantic Web and ontological contributions could be effective.
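
To make the missing mechanism concrete, the sketch below declares which qualifiers each observation type requires and reports the ones a record lacks. The requirement table is purely hypothetical; its type and property names are borrowed from the N3 examples in this note, not from any CDISC specification, and an ontology-based approach would express the same constraints declaratively rather than in code.

```python
# Hypothetical requirement table: observation type -> qualifiers that must be present.
# Names are borrowed from the N3 examples in this note, not from a CDISC specification.
REQUIRED_QUALIFIERS = {
    "cdisc:Diastolic_BP": {"cdisc:obs_context", "cdisc:obs_value", "cdisc:obs_units"},
    "cdisc:Pulse": {"cdisc:obs_value", "cdisc:obs_units"},
}


def missing_qualifiers(obs_type, observation):
    """Return the sorted list of required qualifiers absent from an observation."""
    required = REQUIRED_QUALIFIERS.get(obs_type, set())
    return sorted(required - set(observation))


complete_bp = {
    "cdisc:obs_context": "cdisc:patient_lying",
    "cdisc:obs_value": "98",
    "cdisc:obs_units": "nist:mmHg",
}
incomplete_pulse = {"cdisc:obs_value": "64"}

print(missing_qualifiers("cdisc:Diastolic_BP", complete_bp))
print(missing_qualifiers("cdisc:Pulse", incomplete_pulse))
```

An observation type with no entry in the table is treated as having no required qualifiers, which is a deliberate simplification in this sketch.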

The more complex issues we have not yet addressed include an appropriate mapping of terminology codes and strings into OWL or SKOS defined URI entities. Many of these issues will probably be discussed and addressed in the new Clinical Observations Interoperability task force (http://esw.w3.org/topic/HCLS/ClinicalObservationsInteroperability/).

HCLSIG/Drug Safety and Efficacy/SDTM Notes Draft 1.0 (last edited 2007-11-05 08:40:14 by 212)

Further Information

This concludes the SDTM Note. Further DSE discussions can be found at http://esw.w3.org/topic/HCLSIG/Drug_Safety_and_Efficacy.


Clinical Data Interchange Standards Consortium (CDISC) is an open, multidisciplinary, non-profit organization that has established worldwide industry standards to support the electronic acquisition, exchange, submission and archiving of clinical trials data and metadata for medical and biopharmaceutical product development. The mission of CDISC is to develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare.
[NCI Thesaurus]
NCI Thesaurus as a large OWL file, Mindswap, Maryland Information and Network Dynamics Lab Semantic Web Agents Project.
NCI Center for Bioinformatics, Welcome to the NCI Center for Bioinformatics.


The editor would like to thank the following Working Group members for authoring this document:

This document is a product of the Drug Safety and Efficacy Task Force of the HCLS Interest Group.


A proposed vision for healthcare research that would accelerate the development of new treatments through more direct use of R&D knowledge.
A biological characteristic that is objectively measured and evaluated as an indicator of normal biological or pathogenic processes or pharmacological responses to a therapeutic intervention; these can be obtained from microarray analysis, protein assays, or even clinical signs. (see FDA definition and EMEA definition)
Any clinical phenomenon that occurs during a study at a specific time and with a specific study subject. Adverse Events are an important subclass that need to be identified and addressed within studies.
Any observation made during a clinical study. This often is accompanied by an interpretation.
Any clinical procedure that is performed within a study. This may include a drug treatment, a diagnosis, a treatment discontinuation, or a surgical procedure.
The branch of pharmaceutics which deals with the influence of genetic variation on drug response, efficacy or toxicity.
The pharmacological practice of the detection, assessment and prevention of adverse effects, particularly long-term and short-term side effects, through the monitoring and analysis of data, even after a product has been launched (Postmarketing pharmacovigilance). For more info see http://en.wikipedia.org/wiki/Pharmacovigilance
Genetic variations at the DNA base level that exist in each organism, and which distinguish one individual from another. Often these affect propensity for a disease, drug metabolism, or drug response. They may be due to single nucleotide changes (SNPs), or to larger deletions, rearrangements or insertions.
[Safety Signals]
Any solid indicators that can provide early warning to adverse drug reactions.
[Translational Medicine]
A proposed vision of Drug R&D that fosters better association of animal and human studies in order to make better decisions in the development process, reducing both costs and time-to-market.

Change Log

$Log: Note_DSE_20071108.html,v $
Revision 1.3  2007/11/09 15:35:33  eric
~ reflect that this in not a Note
~ well-formed

Revision 1.2  2007/11/09 13:52:58  eneumann2
something helpful

Revision 1.1  2007/11/05 04:33:51  eneumann2

Revision 1.0  2007/11/05 18:39:11  eneumann