
Uncertainty Use Cases Encountered in HCLS

Hypothesis Uncertainty
- Mutations in the alpha-synuclein gene could cause Parkinson's disease
- Hypothesized relationships derived from statistical analysis of microarray data, qualified by p-values, confidence intervals, etc.
- Gene Ontology (GO) evidence codes supporting a particular GO annotation of a gene
- Evidence classes in the OBO Evidence Ontology
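GO evidence codes can act as a coarse certainty signal: an annotation inferred electronically (IEA) is usually treated as weaker support than one backed by a direct assay (IDA). A minimal sketch, assuming a subset of the GO codes grouped roughly as in the GO Consortium's documentation; the gene and term identifiers are hypothetical placeholders, not real annotations.

```python
# Subset of GO evidence codes mapped to broad categories (not exhaustive).
EVIDENCE_CATEGORY = {
    # Experimental evidence
    "EXP": "experimental", "IDA": "experimental", "IPI": "experimental",
    "IMP": "experimental", "IGI": "experimental", "IEP": "experimental",
    # Computational analysis
    "ISS": "computational", "ISO": "computational", "ISA": "computational",
    "ISM": "computational", "IGC": "computational", "RCA": "computational",
    # Author statements
    "TAS": "author_statement", "NAS": "author_statement",
    # Curator statements
    "IC": "curator_statement", "ND": "curator_statement",
    # Electronic annotation, not reviewed by a curator
    "IEA": "electronic",
}

def category(code: str) -> str:
    """Return the broad evidence category for a GO evidence code."""
    return EVIDENCE_CATEGORY.get(code, "unknown")

def filter_by_category(annotations, allowed):
    """Keep (gene, term, evidence_code) tuples whose code falls in an allowed category."""
    return [a for a in annotations if category(a[2]) in allowed]

# Hypothetical annotations for illustration only
annotations = [
    ("GENE_A", "term_1", "IDA"),
    ("GENE_A", "term_2", "IEA"),
    ("GENE_A", "term_3", "TAS"),
]

print(filter_by_category(annotations, {"experimental"}))
```

Which categories count as "certain enough" is a policy choice for the application, not something the codes decide by themselves.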
Interpretation/Classification Uncertainty
- The patient has elevated cholesterol, based on a reading of X mg/dl
- Given the same set of symptoms, Doctor X diagnoses mild disease while Doctor Y diagnoses severe disease
- True/false positive/negative rates of patient classifications and diagnoses, summarized by measures such as precision, recall (sensitivity), positive predictive value (PPV), negative predictive value (NPV), etc.
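The classification measures listed above all derive from the four cells of a confusion matrix. A minimal sketch with hypothetical counts; note that precision and PPV are the same quantity, TP / (TP + FP).

```python
from dataclasses import dataclass

@dataclass
class ConfusionCounts:
    tp: int  # true positives: diseased and classified positive
    fp: int  # false positives: not diseased but classified positive
    tn: int  # true negatives: not diseased and classified negative
    fn: int  # false negatives: diseased but classified negative

def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is zero
    return num / den if den else float("nan")

def metrics(c: ConfusionCounts) -> dict:
    return {
        "precision_ppv": _ratio(c.tp, c.tp + c.fp),
        "recall_sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
    }

# Hypothetical counts for a diagnostic test applied to 200 patients
print(metrics(ConfusionCounts(tp=45, fp=15, tn=130, fn=10)))
```

PPV and NPV depend on disease prevalence in the tested population, whereas sensitivity and specificity do not, so the same test can carry different certainty in different clinical settings.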
Prediction-oriented Uncertainty
- A person with a BRCA1 mutation has a 70% probability of developing breast cancer in the future
Belief-oriented Uncertainty
- To the best of our knowledge, a particular gene is believed not to be implicated in a particular disease
- The associated non-monotonicity: if more knowledge becomes available, the statement could be proven false.
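The non-monotonicity can be illustrated with a closed-world default (negation as failure): "not implicated" is concluded only because no association is known. In a monotonic logic, adding a fact never removes a conclusion; here it does. A minimal sketch; the gene, disease, and knowledge-base contents are hypothetical.

```python
def believed_not_implicated(gene, disease, known_associations):
    """Defeasible default: absence of a known association is read as 'not implicated'."""
    return (gene, disease) not in known_associations

knowledge = set()  # hypothetical knowledge base, initially with no associations
print(believed_not_implicated("GENE_A", "DISEASE_1", knowledge))  # True

knowledge.add(("GENE_A", "DISEASE_1"))  # new evidence becomes available
print(believed_not_implicated("GENE_A", "DISEASE_1", knowledge))  # False: belief retracted
```

RDF and OWL reasoning is open-world and monotonic, so a default of this kind has to be handled outside the standard semantics, for example by an application rule like the one sketched here.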
Data Source-based Uncertainty
- Samples from the same patient are analyzed by different labs. Lab 1 results show an 80% probability of Disease 1, whereas Lab 2 results show a 90% probability of the same disease.
- If the Cleveland Clinic states that Avandia is bad for diabetes patients, the statement carries a higher degree of certainty than the same statement made by an individual, Dr. X
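One simple way to combine the two lab estimates, and to let a more trusted source count for more, is a weighted linear opinion pool: the combined probability is a weighted average of the source probabilities. This is only one of several pooling schemes (logarithmic pooling and Bayesian updating are alternatives), and the trust weights below are hypothetical.

```python
def linear_pool(estimates):
    """Weighted linear opinion pool.
    estimates: list of (probability, weight) pairs; weights express source trust."""
    total_weight = sum(w for _, w in estimates)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    for p, _ in estimates:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return sum(p * w for p, w in estimates) / total_weight

# Lab 1 reports 0.8 and Lab 2 reports 0.9 for Disease 1, with equal trust
print(round(linear_pool([(0.8, 1.0), (0.9, 1.0)]), 3))  # 0.85
# Trusting Lab 2 (or an institution) three times as much shifts the result towards 0.9
print(round(linear_pool([(0.8, 1.0), (0.9, 3.0)]), 3))  # 0.875
```

How the weights are assigned (reputation, accreditation, past accuracy) is the hard part and is left open here.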
Data Uncertainty
- Approximate location of a clinical feature, e.g., the spatial location of a tumor in the human body as captured in a radiological image or other digital artifact
- Data inconsistency and incompleteness encountered in healthcare and drug databases
- Data uncertainty introduced by sampling errors, sampling rates, etc.
- Data uncertainty introduced by the limitations (e.g., least-count error) of devices measuring patient characteristics such as temperature
- Data uncertainty introduced by the limitations of instruments used to collect experimental data, e.g., microarrays
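Device resolution interacts with interpretation uncertainty (the elevated-cholesterol example) whenever a reading is compared against a threshold: if the reading's uncertainty interval straddles the threshold, the classification is not determined by the data. A minimal sketch, assuming the error is bounded by half the least count; the threshold and resolution in the usage lines are illustrative values, not clinical guidance.

```python
def classify_reading(value, resolution, threshold):
    """Classify a reading against a threshold, given the device's least count.
    Returns 'above' (at or above), 'below', or 'indeterminate'."""
    half = resolution / 2.0  # assumption: error is at most half the least count
    low, high = value - half, value + half
    if low >= threshold:
        return "above"
    if high < threshold:
        return "below"
    return "indeterminate"  # the interval straddles the threshold

# Illustrative values: threshold 240 mg/dL, device resolution 5 mg/dL
print(classify_reading(250, 5, 240))  # above
print(classify_reading(239, 5, 240))  # indeterminate
print(classify_reading(230, 5, 240))  # below
```

Reporting an explicit "indeterminate" result, rather than forcing a yes/no answer, keeps the measurement uncertainty visible to downstream reasoning.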

Patterns identified within the Use Cases

Belief statements made by researchers: interpretations, hypotheses, classification models
Data analytic uncertainties: sampling- or machine-induced
Data/metadata omission (too open-world): absence of relevant time and location information
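Because an open-world representation will not complain about absent context, missing time and location metadata has to be detected explicitly. A minimal validation sketch; the field names and records are hypothetical.

```python
REQUIRED_CONTEXT = ("observed_at", "location")  # hypothetical required fields

def missing_context(record):
    """Return the required context fields that are absent or empty in a record."""
    return [f for f in REQUIRED_CONTEXT if not record.get(f)]

# Hypothetical observation records
records = [
    {"id": "r1", "value": 38.2, "observed_at": "T1", "location": "ward 3"},
    {"id": "r2", "value": 37.9, "observed_at": None, "location": "ward 3"},
    {"id": "r3", "value": 39.1},
]

for r in records:
    gaps = missing_context(r)
    if gaps:
        print(r["id"], "missing", gaps)
```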

Proposed Solutions

Thresholding issues
KD45 and S5 (modal logics of belief and knowledge)
Named Graphs and RDF-based approaches
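For reference, the modal systems named under Proposed Solutions are built from the axioms below (together with modus ponens and necessitation). Reading the box as "the agent believes" gives KD45 = K + D + 4 + 5, the usual logic of belief; reading it as "the agent knows" gives S5 = K + T + 4 + 5, the usual logic of knowledge.

```latex
\begin{aligned}
&\mathsf{K}: && \Box(\varphi \rightarrow \psi) \rightarrow (\Box\varphi \rightarrow \Box\psi) \\
&\mathsf{D}: && \Box\varphi \rightarrow \neg\Box\neg\varphi \\
&\mathsf{T}: && \Box\varphi \rightarrow \varphi \\
&\mathsf{4}: && \Box\varphi \rightarrow \Box\Box\varphi \\
&\mathsf{5}: && \neg\Box\varphi \rightarrow \Box\neg\Box\varphi
\end{aligned}
```

The difference that matters for the belief statements above is axiom T: knowledge must be true, while belief only has to be consistent (D), so a KD45 belief such as "this gene is not implicated" can later turn out to be false.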

Some Thoughts on the Above

(Please feel free to delete these or add them to the main body.)

Much clinical research produces only uncertain knowledge. Clinical trials, especially trials of treatments, produce statistical associations between treatment and outcome. The resulting knowledge base contains conflicts and is defeasible: later knowledge may lead to a different conclusion.
One solution is to harness existing evidence-based medicine (EBM) approaches to ranking evidence. Knowing that a study is a randomized controlled trial (RCT) then allows one to infer greater certainty about its conclusions than if it were a case-control study.
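A minimal sketch of using a study-design ranking to resolve conflicting conclusions. The ordering is a simplified, illustrative hierarchy; real EBM schemes such as the Oxford CEBM levels or GRADE also consider study quality, consistency, and directness, and the design labels here are assumptions.

```python
# Illustrative evidence hierarchy: higher number = stronger study design.
EVIDENCE_RANK = {
    "systematic_review_of_rcts": 6,
    "rct": 5,
    "cohort_study": 4,
    "case_control_study": 3,
    "case_series": 2,
    "expert_opinion": 1,
}

def strongest_claim(claims):
    """Pick the claim backed by the highest-ranked study design.
    claims: list of (conclusion, study_design) pairs; unknown designs rank 0."""
    return max(claims, key=lambda c: EVIDENCE_RANK.get(c[1], 0))

# Hypothetical conflicting conclusions about the same treatment
claims = [
    ("treatment improves outcome", "case_control_study"),
    ("treatment has no effect on outcome", "rct"),
]
print(strongest_claim(claims))  # ('treatment has no effect on outcome', 'rct')
```

Picking a single winner is the crudest option; the same ranks could instead be used as weights when aggregating conflicting evidence.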
Reification (in a general, non-RDF sense) takes place at two or more points: a knowledge engineer claims that a study claims that X -> Y. Recording both levels may be important for trust, e.g., where conflict of interest is a concern.
The OBO Evidence Ontology seems to lack terms for clinical evidence; I may be wrong, but I could not find any. This seems to be an easy fix.
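Named graphs offer one way to express this nesting: the claim X -> Y sits in one graph, a second graph records that a study asserts the first, and a third records that a knowledge engineer asserts the second. A minimal stdlib sketch of that idea using plain Python sets of triples rather than an RDF library; the `ex:` IRIs and the `ex:assertedBy` / `ex:fundedBy` properties are hypothetical (a real deployment might use a standard vocabulary such as PROV-O's `prov:wasAttributedTo`).

```python
from collections import defaultdict

dataset = defaultdict(set)  # graph name -> set of (subject, predicate, object)

# Level 0: the scientific claim reported by a study
dataset["ex:g_study_claim"].add(("ex:X", "ex:causes", "ex:Y"))
# Level 1: the study asserts that graph; funding recorded for trust checks
dataset["ex:g_curation"].add(("ex:g_study_claim", "ex:assertedBy", "ex:Study1"))
dataset["ex:g_curation"].add(("ex:Study1", "ex:fundedBy", "ex:SponsorA"))
# Level 2: a knowledge engineer asserts the curation graph
dataset["ex:g_meta"].add(("ex:g_curation", "ex:assertedBy", "ex:KnowledgeEngineer1"))

def asserters(graph_name):
    """Follow ex:assertedBy links outward, innermost asserter first.
    Assumes at most one asserter per graph; a seen-set guards against cycles."""
    chain, current, seen = [], graph_name, set()
    while current not in seen:
        seen.add(current)
        hit = next(((g, o) for g, triples in dataset.items()
                    for s, p, o in triples
                    if s == current and p == "ex:assertedBy"), None)
        if hit is None:
            break
        current, agent = hit  # move to the graph holding the attribution
        chain.append(agent)
    return chain

def funders(study):
    """Collect funders of a study across all graphs (for conflict-of-interest checks)."""
    return {o for triples in dataset.values()
            for s, p, o in triples if s == study and p == "ex:fundedBy"}

print(asserters("ex:g_study_claim"))  # ['ex:Study1', 'ex:KnowledgeEngineer1']
print(funders("ex:Study1"))  # {'ex:SponsorA'}
```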