HCLSIG/Drug Safety and Efficacy/SDTM Notes Draft 1.0

From W3C Wiki

SDTM Notes Draft 1.0

Authors: Eric Neumann, Kerstin Forsberg, Bosse Andersson, Stephen Dobson


CDISC's Study Data Tabulation Model (SDTM) is used to define the study components in terms of domains and observations for a given clinical trial study. However, using it for sets of biomarkers that serve to define surrogate endpoints and/or evidence of mechanism is currently either not possible or not well described. We intend to propose an augmentation of the SDTM model using RDF-OWL that will support the inclusion of biomarker data and genotyping from subjects, associated with known mechanisms and endpoint descriptors.


Digital data from both Non-clinical (animal) and Clinical Studies needs to be organized according to the following areas:

  1. Study (sponsor, phase, design, objectives)
  2. Subject (selection criteria, background med info)
  3. Observations (classified as Findings, Events, and Interventions)

The tabular model proposed by SDTM allows the observation forms and codes to be defined, but several factors constrain its wider use. Specifically, it needs a more precise way of describing codes (à la URIs) and support for optional and required extensions that depend on certain classes of studies.

SDTM needs to be extended using a flexible model to incorporate key elements of translational medicine. This means the inclusion of biomarker and genotype information must be addressed both efficiently (multiple sets of diverse measurements per subject per study) and scientifically (molecular, mechanistic, and phenotypic associations).


The proposal is to work on a few key items regarding the extension of the SDTM model:

  • CDISC data repository terms from NCI are currently just strings, but should be URIs
  • SDTM Code lists should be both human readable AND machine processable
  • Consider transforming SDTM into a graph-extensible form (SDGM?)
  • Classify Observation types in SDTM per Events, Findings, etc.; offer a mechanism to associate patient measurement context (BP: 90/50; while: patient_is_laying_down)
  • Define URIs for SDTM entities and concepts with appropriate authority association
  • SDTM is currently stateless (was data updated?), so including state (version) information in the SDTM RDF model is important
  • Inclusion of provenance data from clinical (non-repudiability?)
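
As a rough illustration of the graph-extensible idea, the sketch below turns one tabular vital-signs record into subject/predicate/object triples, with an observation class, a version stamp, and the measurement context carried alongside the result. The `http://example.org/...` namespaces, the property names, and the sample record (column names modelled loosely on SDTM vital-signs variables) are all hypothetical placeholders, not part of SDTM or any published vocabulary.

```python
# Hypothetical namespaces; real URIs would need an agreed authority.
SDTM = "http://example.org/sdtm#"
OBS = "http://example.org/study1/obs/"


def row_to_triples(row, version):
    """Turn one tabular record into (subject, predicate, object) triples."""
    subject = f"<{OBS}{row['USUBJID']}-{row['VSSEQ']}>"
    triples = [
        # Classification of the observation (Finding, Event, Intervention).
        (subject, f"<{SDTM}observationClass>", f"<{SDTM}Finding>"),
        # Explicit state: which version of the record this is.
        (subject, f"<{SDTM}version>", f'"{version}"'),
    ]
    # Every column becomes a plain literal; no escaping is attempted here.
    for column, value in row.items():
        triples.append((subject, f"<{SDTM}{column}>", f'"{value}"'))
    return triples


def to_ntriples(triples):
    """Serialize the triples one per line, N-Triples style."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Illustrative record: diastolic pressure of 50 taken while lying down.
row = {"USUBJID": "S001", "VSSEQ": 1, "VSTESTCD": "DIABP",
       "VSORRES": 50, "VSPOS": "SUPINE"}
triples = row_to_triples(row, version="1")
print(to_ntriples(triples))
```

Because each fact is its own triple, additional context or provenance statements could be attached to the same subject without changing a fixed column layout.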

The requirement to convert ODM/XML to RDF may not be the right way to approach the problem. Rather than addressing SDTM elements with data, metadata, codelists and definitions all embedded in one study, use references to shared metadata and definitions instead.

Discussion regarding CDISC SDTM Code lists

Below, in the related resources section, two examples are attached showing what the current XML output looks like from CDISC's use of NCI caDSR for the so-called SDTM Controlled Terminologies. These include the permissible values as strings to be incorporated in SDTM datasets, e.g.:

  • Sex Text Code: 'MALE', 'FEMALE', 'UNKNOWN', 'Intersex'
  • Vital Signs Test Name Text Name: 'BMI', 'DIABP', 'TEMP', ...
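
A minimal sketch of what machine-processable checking of such string-based code lists could look like, assuming the permissible values are held in simple in-memory sets. The codelist keys (`SEX`, `VSTESTCD`) and the function name are hypothetical labels chosen for this example; only the values quoted above are included.

```python
# Permissible values copied from the examples above. The dictionary keys
# are hypothetical labels for this sketch.
CODELISTS = {
    "SEX": {"MALE", "FEMALE", "UNKNOWN", "Intersex"},
    # The source list is longer ("..."); only the quoted values appear here.
    "VSTESTCD": {"BMI", "DIABP", "TEMP"},
}


def invalid_values(codelist, values):
    """Return the values that are not permissible for the given codelist."""
    allowed = CODELISTS[codelist]
    return [v for v in values if v not in allowed]


# Plain string matching is case-sensitive, so 'female' is rejected.
print(invalid_values("SEX", ["MALE", "female", "Intersex"]))
```

The case-sensitivity in the usage line hints at why bare strings are fragile identifiers compared with URIs.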

It is important to recognise the different approaches in 1) the CDISC SDTM standard, 2) the NCI Thesaurus, and 3) what I would like to see as Observation Types Ontologies (see more details below), and to consider how to relate these to existing terminologies such as LOINC codes and Clinical Findings in SNOMED CT.

  • Today the CDISC SDTM standard uses short names/terms as strings, such as 'DIABP', for "the topic for the focus of the observation" in SDTM datasets, and also for categorical observation results, such as 'FEMALE' for the observation of Sex. In NCI caDSR these strings are linked to concept codes in the NCI Thesaurus, such as C25299 for the concept named diastolic_pressure, a sub-class of the super-class 'Personal Attribute' with a definition from UMLS.
  • Currently NCI publishes the content of the NCI Thesaurus as a large OWL file with URIs assigned to all NCI concepts. For example: http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Diastolic_Pressure (NB links to a very large OWL file). In the future this will probably be changed to: http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C25299 (NB links to a very large OWL file). Potentially this could be used as a URI for the concept describing the medical phenomenon of diastolic blood pressure.
  • In the future I would like to have namespaces with URIs assigned for groups of different Types of Observations as recordings of medical phenomena, but also as recordings of actions such as treatments and recordings of adverse events. This applies to basic ones such as the observation of blood pressure and other kinds of vital signs, but also to other more interesting "surrogate endpoints and/or evidence of mechanism".
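
The step from SDTM string to concept URI described above can be sketched as a simple lookup. This assumes the code-based URI form that the text anticipates for the NCI Thesaurus; only the DIABP to C25299 link comes from the text, and the table and function names are hypothetical.

```python
# Namespace of the NCI Thesaurus OWL file, as quoted in the text.
THESAURUS = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"

# SDTM short term -> NCI concept code. Only this one pair is given in
# the text; a real table would be derived from caDSR.
TERM_TO_CONCEPT = {"DIABP": "C25299"}


def concept_uri(term):
    """Return the concept URI for an SDTM term, or None if unmapped."""
    code = TERM_TO_CONCEPT.get(term)
    return None if code is None else THESAURUS + code


print(concept_uri("DIABP"))
```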

This would enable the publication of observation types ontologies as formal descriptions of the required patient and measurement context, such as these for the measurement of blood pressure:

  • the position of the patient at the time of measuring (sitting, lying, etc.),
  • the tilt of the surface on which the person is lying,
  • the variation in measured blood pressure with respiration,
  • the instrument used to measure the blood pressure,
  • the size of the cuff if a sphygmomanometer is used.

The above list is taken from an HL7 Watch blog posting by Barry Smith.
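
One way to picture such a formal description is as a typed record of the context items listed above, plus a check for which required items are missing from a given measurement. This is only a sketch: the class name, field names, and the choice of which fields are required are assumptions made for illustration, not taken from any published ontology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BloodPressureContext:
    """Context items from the list above; all names are hypothetical."""
    patient_position: Optional[str] = None      # sitting, lying, etc.
    surface_tilt_degrees: Optional[float] = None
    respiration_phase: Optional[str] = None
    instrument: Optional[str] = None
    cuff_size: Optional[str] = None              # if a sphygmomanometer is used


def missing_context(ctx, required):
    """Return the names of required context items that were not recorded."""
    return [name for name in required if getattr(ctx, name) is None]


ctx = BloodPressureContext(patient_position="lying",
                           instrument="sphygmomanometer")
print(missing_context(ctx, ["patient_position", "instrument", "cuff_size"]))
```

An observation types ontology could play the role of the `required` list here, stating per observation type which context must accompany a value.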

Such observation types ontologies could also be the place to describe the different classifications, such as this type of observation 'is-a-Metabolic_Marker' according to nciOncology and 'is-a-Finding' according to the SDTM general classes.

Strategy and Technology

  • RDF mappings
  • URI syntax and definitions
  • OWL Ontologies or SKOS Vocabularies
  • Data and Type Validators

Related resources