HCLS/ClinicalObservationsInteroperability/RIMRDFOWL

From W3C Wiki

RDF/OWL Representation of HL7/RIM v3.0

RDF/OWL representations of HL7/RIM v3.0 snippets corresponding to the data content requirements identified in the Functional Requirements documents will be entered here.

Participants: Dan Russler, Matt Moores, Helen Chen, Parsa Mirhaji, Rachel Richesson, Jyoti Pathak and Vipul Kashyap

Basic RIM Presentation by Dan Russler: Basic RIM Tutorial PDF Version

We are now working on a list of modeling constructs for Diabetes and Hypertension Data Requirements as identified from some clinical trial protocols. These include identifying information model elements, vocabulary concepts and data types from key standards such as HL7/RIM, Detailed Clinical Models (DCM), Clinical Document Architecture (CDA) and the Study Data Tabulation Model (SDTM):

Questions on HL7/RIM

Introduction: This project is a sub-project of the larger project that maps HL7 RIM-based healthcare domains to the life sciences domain models. We have noted that the mapping procedures need to be defined before the data from specific use cases can be applied to the larger project.

This sub-project is based on the "proposition" that all models are based on a set of "primitive propositions" like those represented in RDF triples. However, because models use differing representation techniques, it is difficult to examine the propositions asserted in each model. For example, models expressed in UML and models expressed in XML look very different visually but may express the same propositions, also sometimes known as "semantic relationships." These visual differences between models expressed in UML and in XML make it difficult for human analysts to compare and contrast these propositions, i.e. semantic relationships.

It is noted by the team that the two tools under consideration in this project, RDF and OWL, can both handle basic abstract modeling of propositions. However, RDF is best for expressing use case data because RDF records are designed to be manipulated in databases that hold billions of RDF instances, like those found in healthcare use cases. On the other hand, OWL allows more expressive propositions, like those that represent general knowledge rather than specific patient data elements. Knowledge that is expressed only once in OWL can be used many times to evaluate specific propositions about specific patients held in RDF database schemas. Using RDF to express propositions about specific resources on the web, and using OWL to reason about the general relationships of semantic equivalence between propositions, represents a best practice in the use of web ontology-based tools.

Therefore, a cataloging of the primitive propositions of each of the healthcare and life science models utilizing a common proposition schema suitable for database storage and retrieval, e.g. RDF, is one technique for developing a compare/contrast analysis of the primitive propositions in each model. Through direct human analysis of use case data, it should be possible to identify the propositions that are fully coherent in multiple models and also identify those propositions that are unique to a specific model.

This sub-project will also explore the ability of OWL reasoners to navigate across a combined set of RDF-based propositions from multiple models with the goal of identifying and locating web-based resources with specific semantic content. It will further explore the ability of OWL reasoners to support the mapping between the ontologies of individual models, e.g. is an asserted proposition about the observation class in one model exactly equivalent to an asserted proposition about the observation class in another model? In other words, is an asserted proposition about an observation on a patient, expressed in one web-based resource based on one model, exactly equivalent to an asserted proposition about the same observation on the same patient in another web-based resource utilizing a different model?

Finally, the question of whether these RDF-OWL-based searching capabilities are useful in various healthcare and life science interoperable communication scenarios will be explored.

Premises: A list of premises, i.e. assumptions and assertions, will be created at the beginning of this project to assert agreements on the meanings of terms, utilizations of terms, accepted theory, etc:

1) "Proposition" refers to the use of the term by Aristotle (384 BC – 322 BC).

2) The uses of propositions have been studied in many fields, including general philosophy, logic, mathematics, computer science, artificial intelligence, linguistics, semiotics, and many others. As a result, there have been slight variations on the application of the term "proposition" over the two thousand years the term "proposition" has existed.

3) Classically, the proposition includes two parts, the SUBJECT and the PREDICATE. The predicate is used to constrain the subject, e.g. "Socrates is a man." In this sentence, "Socrates" is the subject and "is a man" is the predicate. In this sentence, "man" is considered the object of the predicate, which itself is used to constrain the predicate.

4) A proposition may resolve to true or false. In other words, in itself, a proposition is not a statement of truth. For purposes of this project, the term "assertion" will be used to describe a proposition that is believed to be true by the author. In healthcare, propositions are often believed to be true within a specific range of probability. Therefore, one could argue whether the author is making a "proposition" or an "assertion" when making an observation on a patient. Best practice in healthcare is for authors to appropriately qualify the reliability of assertions. However, this qualification is often not performed in general practice. In addition, it is well understood that many assertions made in healthcare are actually false. Therefore, advanced healthcare providers are trained to treat "assertions" as "propositions" that must be re-evaluated before the provider adopts a "belief" about the proposition.

5) In mathematics, the proposition that "Four equals two plus two" is believed to be true. In this case, the subject is "Four" and the predicate is "equals two plus two." Again, the predicate constrains the subject. The object of the predicate in this example is "two plus two," which represents a constraint on "equals."

6) Resource Description Framework (RDF) is a schema that was designed to support understanding of the semantic content of resources on the Web.

7) The RDF Schema utilizes a slightly different definition of "predicate" than found in the predicates of classical propositions and in grammar.

The RDF schema may be simply represented with the notation:

SUBJECT | PREDICATE | OBJECT

where SUBJECT plays the same role as "subject of a sentence"

where OBJECT plays the same role as "direct object of a verb of a sentence"

where PREDICATE plays the same role between the subject and the direct object that the "verb of a sentence" plays

Therefore, the term "predicate" in RDF does NOT include the "object of the predicate" as used in grammar.

The classic proposition in logic is "Socrates is a man." Another classic proposition is that "Four equals two plus two"

These two propositions may be represented in RDF schema notation defined above as:

Socrates | is a | man

Four | equals | two plus two
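
As a minimal sketch of the SUBJECT | PREDICATE | OBJECT notation above, the two classic propositions can be held as plain Python tuples. The names Triple and format_triple are illustrative assumptions and are not taken from any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One proposition in the SUBJECT | PREDICATE | OBJECT notation."""
    subject: str
    predicate: str  # the verb only; it does NOT include the object
    object: str


def format_triple(t: Triple) -> str:
    """Render a triple in the pipe notation used on this page."""
    return f"{t.subject} | {t.predicate} | {t.object}"


# The two classic propositions from the text.
propositions = [
    Triple("Socrates", "is a", "man"),
    Triple("Four", "equals", "two plus two"),
]

for p in propositions:
    print(format_triple(p))
```

Keeping the predicate and the object in separate fields mirrors the point made above: in RDF the "predicate" is only the verb, while in grammar the predicate would be "is a man" as a whole.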

8) Healthcare records that are represented in RIM-based standards frequently use the Object Identifier (OID) standard to uniquely identify instances of medical record parts, e.g. CDA documents or parts of CDA documents. Therefore, a kind of "resource" in healthcare that may need to be semantically evaluated and located on the web may be a specific OID. An OID-based identifier is expressed in the form: root="2.16.840.1.113883.3.74.40.1.1.1" extension="62204". The RDF proposition that represents the location of such an OID may be expressed as:

root = "2.16.840.1.113883.3.74.40.1.1.1" extension="62204" | hasWebLocation | URI

where URI resolves to a resource location technique, such as the well-known URL, that can actually be used to find a specific resource on the web. In this case, this resource on the web might describe the demographics of a specific patient (since it was taken from a CDA document with a specific patient named as the record target of the document).

However, the OWL reasoner won't be able to tell what kind of resource this OID identifies without more information. The needed information may be described in a series of RDF propositions:

root = "2.16.840.1.113883.3.74.40.1.1.1" extension="62204" | RIM.Entity.ClassCode | PSN

where "PSN" is a code for "Person"

root = "2.16.840.1.113883.3.74.40.1.1.1" extension="62204" | RIM.Entity.LivingSubject.AdministrativeGenderCode.AdministrativeGenderValueSet | Female

From these two propositions in this example, we know that the resource at the URI location specified includes a "female person."
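
The lookup described in this premise might be sketched as follows. This is only an illustration under stated assumptions: the in-memory list stands in for an RDF database, the helper names objects_of and describe are hypothetical, and the URI is left as the same placeholder used in the text.

```python
# The identifier used as SUBJECT in the examples above.
OID = 'root="2.16.840.1.113883.3.74.40.1.1.1" extension="62204"'

CLASS_CODE = "RIM.Entity.ClassCode"
GENDER = ("RIM.Entity.LivingSubject.AdministrativeGenderCode."
          "AdministrativeGenderValueSet")

# Hypothetical in-memory store of the OID-specific RDF triples.
triples = [
    (OID, "hasWebLocation", "URI"),  # URI is a placeholder, as in the text
    (OID, CLASS_CODE, "PSN"),        # PSN is the code for Person
    (OID, GENDER, "Female"),
]


def objects_of(subject, predicate):
    """Return every OBJECT asserted for the given SUBJECT and PREDICATE."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(subject):
    """Combine the class and gender propositions into one description."""
    is_person = "PSN" in objects_of(subject, CLASS_CODE)
    genders = objects_of(subject, GENDER)
    if is_person and genders:
        return f"{genders[0].lower()} person"
    return "unknown"


print(describe(OID))  # prints "female person"
```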

9) Since, in the above approach, the many specific resources on the Web are directly described by many RDF triples, what is the role of OWL? Note that the SUBJECTs of most RDF triples in the above examples are URNs (OIDs in these examples). We could call this type of RDF triple an "OID-specific RDF triple." The relationships between the PREDICATEs and OBJECTs in this model might effectively define a knowledge base used to classify resources on the web. In other words, since the OIDs themselves are described by the PREDICATEs and OBJECTs of each RDF triple, OWL may be assigned the role of assisting reasoning across the relationships between the various "OID-specific" PREDICATEs and OBJECTs used in these RDF assertions in order to better understand the semantic contents of a resource on the web. Likewise, OWL might also assist in reasoning about relationships between OBJECTs and OBJECTs, and between PREDICATEs and PREDICATEs, used in OID-specific RDF triples.
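
One way to picture this division of labor is a relationship between OBJECTs that is stated once and then applied to many OID-specific triples. The sketch below is not an OWL reasoner; it only imitates a subclass-style inference in plain Python. The hierarchy (PSN under LIV under ENT), the shortened subject labels and the second subject are assumptions made for illustration.

```python
# General knowledge, stated once (a stand-in for OWL subclass axioms).
# Assumed hierarchy: Person (PSN) is a Living Subject (LIV) is an Entity (ENT).
SUBCLASS_OF = {"PSN": "LIV", "LIV": "ENT"}


def ancestors(code):
    """Return the code followed by every broader code it implies."""
    chain = [code]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


# Instance data: OID-specific triples (subject labels are hypothetical).
triples = [
    ("oid:62204", "RIM.Entity.ClassCode", "PSN"),
    ("oid:99999", "RIM.Entity.ClassCode", "ENT"),
]


def subjects_with_class(code):
    """Find subjects whose asserted class is the code or a narrower one."""
    return [s for s, p, o in triples
            if p == "RIM.Entity.ClassCode" and code in ancestors(o)]


# No triple says "LIV" directly; the match comes from the stated hierarchy.
print(subjects_with_class("LIV"))  # prints ['oid:62204']
```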

10) Since propositions may be "true" or "false," and since assertions are believed to be "true" by the author and only sometimes "true" by the reader, one needs to evaluate the use of propositions and assertions in "ontologies" before applying ontologies used by OWL to support reasoning about the semantic content of resources on the web.

11) Since the coining of the term "ontology" in the Middle Ages, the term has been used to describe the study of belief systems. Originally used to study the intersection of Physics (observation about the physical world) and Metaphysics (belief about the spiritual world), "ontology" has been used in modern times for the incorporation of assertions (propositions believed to be true) into knowledge bases and other published artifacts, e.g. artificial intelligence systems and healthcare vocabulary publications. Like Dante's ontology in his "Inferno," where he descends through a hierarchy of related sins, modern ontologies are usually organized into extensive collections of parent-child and sibling relationships. Modern ontologies in information sciences are now developed through a consensus-based process. Individuals submit propositions describing a set of parent-child and sibling relationships to a governing body for the published ontology. Then, the governing body declares the propositions to be assertions accepted as part of the belief system of the governing organization. These assertions in the published ontology may or may not be accepted as "true" by a wider audience.

12) In Healthcare and Life Sciences, many ontologies have been proposed by various authors and organizations. Some ontologies have had greater influence than others. In the healthcare domain, the HL7 Reference Information Model (RIM), which focuses at the upper level on "Act" and "Entity," has had wide influence. The Acts of the RIM can be traced to earlier HL7 artifacts designed in the 1980s, but "Act" also traces back centuries to Act-centered philosophies of science. The RIM has also been designed to work with independently published vocabulary systems, which contain their own ontologies. SNOMED is an example of an ontology published as a healthcare vocabulary with origins in pathology practices. In Life Sciences, John F. Sowa's upper ontology, published in his book on Knowledge Representation (1999) from his work dating to 1996, focuses on "Continuant" and "Occurrent." These concepts are featured in the Basic Formal Ontology (BFO) of the Open Biomedical Ontologies (OBO) project. However, the terms Continuant and Occurrent find their origins in the 1920s. For purposes of understanding the semantic content of resources on the web, an OWL reasoner will need to be able to reason across the assertions published in each of these various ontologies, a daunting task.

Outcomes:

Based on these assumptions, this sub-project will deliver two artifacts:

1) A set of "OID-based RDF Triples" that are used to directly describe resources, i.e. HL7 RIM-based objects identified by OIDs.

2) An OWL-based artifact that describes the relationships between the array of PREDICATEs and OBJECTs found in "OID-based RDF Triples" for HL7 RIM-based objects.

This sub-project will then combine artifacts with other sub-projects in an attempt to identify the assertions that are coherent (exactly semantically equivalent) and those that are not coherent (unique assertions or assertions that are not exactly semantically equivalent).
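
The coherence check described here could be approximated with set operations once predicates from different models are mapped onto a common form. The sketch below is a rough illustration only: the second model ("B"), its predicate names, the subject label and the equivalence table are all hypothetical, and a real comparison would need more than exact string equality.

```python
# Hypothetical equivalence table: predicate in model B -> RIM-style predicate.
EQUIVALENT_PREDICATE = {"B.subject.kind": "RIM.Entity.ClassCode"}

# Assertions from the RIM-based sub-project (subject label is hypothetical).
rim_assertions = {
    ("patient-1", "RIM.Entity.ClassCode", "PSN"),
    ("patient-1", "hasWebLocation", "URI"),
}

# Assertions from a hypothetical second model.
model_b_assertions = {
    ("patient-1", "B.subject.kind", "PSN"),
    ("patient-1", "B.subject.eyeColor", "brown"),
}


def normalise(assertions):
    """Rewrite predicates into the common form where a mapping is known."""
    return {(s, EQUIVALENT_PREDICATE.get(p, p), o) for s, p, o in assertions}


b_normalised = normalise(model_b_assertions)

coherent = rim_assertions & b_normalised       # asserted by both models
unique_to_rim = rim_assertions - b_normalised  # asserted only in the RIM form
unique_to_b = b_normalised - rim_assertions    # asserted only in model B

print(sorted(coherent))
```

In this sketch the equivalence table plays the part that the OWL-based artifact is expected to play: it records, once, which PREDICATEs are considered exactly equivalent.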

Finally, the challenges of creating useful OWL-based reasoners will be addressed. The specific challenge is an OWL reasoner that can reason across a large set of assertions from multiple ontologies in order to locate the available healthcare or life science web resources without returning too many or too few resource locations.

Relevant HL7 RIM resources:

Process:

1) Begin with data elements described in use cases

2) Map the data elements to equivalent HL7 RIM-based Clinical Care Structures (RMIMs)

3) Express the propositions from the HL7 RIM as RDF triples

4) Add the additional propositions expressed in the RIM-based Clinical Care Structures

5) Populate the RDF expressions with the specific instance values from the use cases, e.g. cholesterol 200 mg/dL
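
Step 5 of the process above might look like the following sketch, which fills a small set of triples with the cholesterol example. The predicate names follow the dotted style used earlier on this page but are assumptions, as are the populate helper and the "obs-1" subject label; they are not taken from the RIM itself.

```python
def populate(observation_id, name, value, unit):
    """Fill observation triples with instance values from a use case."""
    return [
        (observation_id, "RIM.Act.ClassCode", "OBS"),  # assumed predicate name
        (observation_id, "RIM.Act.Observation.Code", name),
        (observation_id, "RIM.Act.Observation.Value", f"{value} {unit}"),
    ]


# The cholesterol example from step 5, under a hypothetical subject label.
cholesterol = populate("obs-1", "cholesterol", 200, "mg/dL")

for triple in cholesterol:
    print(" | ".join(triple))
```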