Making Standard Healthcare Terminologies accessible to Semantic Web Applications

Proposal

Consensus on 'new' content vocabularies is a significant challenge to health care informatics. Development of formal, expressive medical record terminologies is severely stunted with the exception of existing terminology standards such as SNOMED-CT and the National Library of Medicine's UMLS. Such standards have the advantage of input from domain experts as well as (in the case of these two in particular) decent coverage of the biomedical domain. However, each is typically distributed in its own (often proprietary) format. This is the case even for terminology standards that do leverage an underlying, formal language.

In addition, such terminology systems are often distributed with licensing restrictions that prevent the kind of serendipitous re-use that is often a primary value proposition for adopting Semantic Web technologies. This can introduce issues related to authority and web presence. As a canonical example, the Linked Data principles expect that, for URIs used in RDF content, the publisher provides useful information when someone looks up a URI. For terminology standards with restrictive licenses, what URIs make sense as proxies for the concepts in the originating system? If HTTP URIs are used, should the domain name associated with the stewards of the standard be used, and how can such a scheme preserve the authority of the owners of the terminology while still adhering to best practices of Web Architecture?

This task group would only address the challenge of how one can extract Semantic Web representations from such terminology systems. By focusing on the method only, this bypasses the licensing issues but is still of value to users who are authorized to use the terminology and wish to leverage Semantic Web technologies in the process.

As a catalyst, existing mappings from SNOMED-CT and UMLS Semantic Network will be reviewed as primary input leading up to findings on re-usable methods for extracting Semantic Web representations.

Goal

The goal of this task group would be to identify use cases and methods for extracting Semantic Web representations (such as OWL) from existing, standard medical record terminologies. Such methods should be reproducible and, to the extent possible, not lossy (i.e., the extracted representations should capture the semantics of the originating terminology set as faithfully as possible). In addition, the task group would identify and document issues encountered along the way, such as those related to identification schemes and to the expressiveness of the relevant languages. Finally, the initial effort should start with SNOMED-CT and the UMLS Semantic Network and focus on a particular sub-domain (pharmacological classification, for instance) in order to investigate such methods in a controlled way.

Use Cases/Project Ideas

... Please add/suggest use cases here ...

"Patient-Of-A-Physician" (POAP)

Useful ontologies tend to be very domain-specific, but there is a subset of classes and properties that recur over and over again in healthcare, and it would be nice to have a small, tight, commonsense ontology of these, sort of a FOAF or Dublin Core for healthcare. For example, in healthcare we are always talking about (instances of) patient, physician, "healthcare delivery organization", encounter, etc. Sure, there are models of this nature in the HL7 RIM/CDA and elsewhere, and these would have to be taken into account and related to anything that might be done in HCLS. These would be for very broad use, and should be modeled without a lot of semantic detail. In fact, modeling them as RDF(S) classes/properties would be preferable to modeling them in OWL (which latter choice might commit anyone importing them to OWL).

This task is similar to the task described in Archetypal Patient Record Ontology. But the distinctive aspect that our task group needs to add/contribute is to push the centrality of "weaving" these classes into existing, widely-used ontologies/RDF vocabularies like FOAF, BFO, DOLCE, SWEET, FMA, Dublin Core etc. using appropriate subclassOf/sameAs/subPropertyOf relations and/or SKOS relations.
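As a rough illustration of what such "weaving" might look like, the following Python sketch emits a handful of N-Triples statements. The POAP namespace (http://example.org/poap#), the class names and the patientOf property are all hypothetical placeholders invented for this example; only the FOAF, RDF and RDFS namespaces refer to existing vocabularies. The classes are declared in plain RDFS, in keeping with the preference stated above.

```python
# All POAP names below are hypothetical; only the FOAF, RDF and RDFS
# namespaces refer to existing vocabularies.
POAP = "http://example.org/poap#"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def poap_triples():
    """Return N-Triples lines declaring a few POAP terms and their links to FOAF."""
    lines = []
    # Declare the core classes as plain RDFS classes (no OWL commitment).
    for name in ("Patient", "Physician", "HealthcareDeliveryOrganization", "Encounter"):
        lines.append(triple(POAP + name, RDF + "type", RDFS + "Class"))
    # Weave the classes into FOAF with rdfs:subClassOf.
    lines.append(triple(POAP + "Patient", RDFS + "subClassOf", FOAF + "Person"))
    lines.append(triple(POAP + "Physician", RDFS + "subClassOf", FOAF + "Person"))
    lines.append(triple(POAP + "HealthcareDeliveryOrganization",
                        RDFS + "subClassOf", FOAF + "Organization"))
    # One illustrative property relating a patient to a physician.
    lines.append(triple(POAP + "patientOf", RDF + "type", RDF + "Property"))
    lines.append(triple(POAP + "patientOf", RDFS + "domain", POAP + "Patient"))
    lines.append(triple(POAP + "patientOf", RDFS + "range", POAP + "Physician"))
    return lines


if __name__ == "__main__":
    print("\n".join(poap_triples()))
```

Whether subClassOf, sameAs or a SKOS mapping relation is the right link for a given pair of terms is exactly the kind of modeling decision the task group would need to discuss; the choices above are only one possibility.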

(Added 4/24/2009) This task is now rolled up into the Pharma task of creating a similar vocab, which is accessible at Pharma Ontology.

OWL/RDF version of UMLS Semantic Network (UMLSSN)

Vipul K., working with Olivier Bodenreider at NLM, has previously created an OWL interpretation of the UMLSSN and has published his experiences doing so, clearly showing that such a project is by no means simple and entails a lot of semantic and technical choices/compromises (1,2). In sum, this makes these kinds of ontologies "(re)interpretations" or "reworkings" of the original rather than "versions". Chimezie and John have also given this one a shot, and John certainly found this to be the case (Chimezie too?). Nevertheless, it would still be nice to get UMLSSN out there in some kind of "officially sanctioned" OWL version. That would require Vipul and the group taking up the project again with NLM (Olivier Bodenreider, presumably). Vipul, interested?

SNOMED and OWL

= Introduction =

SNOMED CT is a structured vocabulary of medicine that is edited/classified on the Apelon Ontylog platform. It has parent-child relations and a limited number of cross-hierarchy relations. Viewed from the description logic perspective, the set of constructors that best approximates its modeling framework is said to be EL++ (or, in some papers, EL or EL+), and SNOMED CT as a triplestore can indeed be successfully classified by the CEL classifier, which reasons in EL++. Kent Spackman and many others involved in the SNOMED effort have forcefully argued that EL++ (or something similar) is the most appropriate DL for SNOMED CT because the classification problem in EL++ can be solved in polynomial time. More complex DLs, including OWL DL, are considered less appropriate for the use case because the computational complexity is too high.

The complexity issue and the relation of various tractable DL fragments to OWL and its flavors have been discussed in (3). There's a lot to say, but it's important to be aware of the absence of negation, disjunction, (universal) value restriction and cardinality constructors (including functional properties) in EL++. I strongly agree that the computational complexity problem is enormously important and may well be dispositive; and that a language based on EL++ can be (and SNOMED has repeatedly field-proven itself to be) useful for navigation, semantically enhanced search and retrieval, and as a mapping target. It is at the same time clear that there are a substantial number of (clinically relevant) medical inferencing use cases for which these constructors are bound to be very important!

For example, being able to say (in a hypothetical EL++ ABox) that the patient does NOT have a particular symptom/finding/disease is a pretty important feature for clinical use. (SNOMED modelers have traditionally approached the semantics of "negative" assertions, in the absence of a true negation constructor, in an interesting way that would be a side discussion here, viz., 'nonexistent entity' (Meinongian) classes + disjointness statements.)
Possessing conjunction but lacking a negation operator, the language also cannot express implication (IF-THEN), because (if A then B)↔(¬(A∧¬B)). It is difficult to imagine real-world encodings of many (ABox) medical records in a language that cannot express if-then statements.
This also makes EL++ unsuitable as a medium for expressing rules-based knowledge (because rules have the form IF A THEN B), which is often the most convenient and natural form for encoding medical (TBox) knowledge. The designers of the OWL 2 profiles recognized the importance of rules in some applications by providing the OWL RL profile.
The absence of disjunction (OR) rules out (ABox) statements like "The patient has chest pain due to either rib fracture or myocardial infarction," yet statements of this kind are ubiquitous in medical records (ABoxes) and will certainly play an important role in inferencing applications.
In the absence of cardinality constructors, it is not possible to directly assert such knowledge as for example (TBox knowledge) that a normal human body has exactly one head, one heart, two arms and two legs, which is a significant limitation for inferencing on surgical and anatomical questions. SNOMED CT deals with constrained cardinalities in effect by a combination of subclassing and exhaustive enumeration; but this strategy brings with it maintenance challenges.
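To make the constructor discussion above a little more concrete, here is a minimal Python sketch that walks a class expression and reports which constructors fall outside the EL++-style fragment described in the text. The tuple encoding of expressions, the constructor keywords ("and", "some", "or", "not", "all", "exactly", ...) and the class/property names are assumptions made up for this illustration; they are not taken from SNOMED CT or from any OWL API. The last function simply checks, by truth table, the propositional equivalence quoted above between (if A then B) and ¬(A∧¬B).

```python
from itertools import product

# Constructors the text above lists as absent from EL++.
OUTSIDE_EL = {"not", "or", "all", "min", "max", "exactly"}
# Constructors this sketch treats as available (conjunction, existential restriction).
INSIDE_EL = {"and", "some"}


def unsupported_constructors(expr):
    """Return the set of constructors used in expr that are in OUTSIDE_EL.

    A named class is a plain string; a complex expression is a tuple whose
    first element is the constructor keyword (hypothetical encoding).
    """
    if isinstance(expr, str):
        return set()
    op, *args = expr
    if op not in OUTSIDE_EL and op not in INSIDE_EL:
        raise ValueError(f"unknown constructor: {op!r}")
    found = {op} if op in OUTSIDE_EL else set()
    for arg in args:
        # Only nested expressions are tuples; property names and numbers are skipped.
        if isinstance(arg, tuple):
            found |= unsupported_constructors(arg)
    return found


# "Chest pain due to either rib fracture or myocardial infarction" (needs disjunction).
CHEST_PAIN = ("and", "ChestPain",
              ("some", "dueTo", ("or", "RibFracture", "MyocardialInfarction")))
# "A normal human body has exactly one head" (needs a cardinality constructor).
NORMAL_BODY = ("and", "HumanBody", ("exactly", 1, "hasPart", "Head"))
# A definition using only conjunction and existential restriction.
HEART_DISEASE = ("and", "Disease", ("some", "findingSite", "HeartStructure"))


def implication_equivalence_holds():
    """Check (A -> B) <-> not (A and not B) over all four truth assignments."""
    return all(((not a) or b) == (not (a and not b))
               for a, b in product([False, True], repeat=2))


if __name__ == "__main__":
    for name, expr in [("CHEST_PAIN", CHEST_PAIN),
                       ("NORMAL_BODY", NORMAL_BODY),
                       ("HEART_DISEASE", HEART_DISEASE)]:
        print(name, sorted(unsupported_constructors(expr)))
```

A check of this kind could serve as a first pass when deciding which parts of a supplementary ontology fragment stay within the tractable fragment and which would need a more expressive reasoner.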

All this means that to support many medical use cases, particularly those involving 'heavier' inferencing, the OWL EL profile all by itself (and in particular, pure SNOMED CT ABoxes) will require supplementation. However, it is easy to imagine supplementary ontology fragments, written in a more expressive DL, that reference SNOMED CT classes/properties and (in some cases) extend them, in effect layering on top of SNOMED CT or other legacy terminologies.

Fortunately, it is precisely the vision of many in our community (including me, JM) that ontology building has the highest 'fitness-for-use', in general, when done in relatively small, intensively modeled fragments consisting of at most a few hundred classes/properties, heavily vetted by experts, and imported/used for inferencing only within the appropriate context, then unloaded. That is to say, in medicine the notion that the entire corpus of ontologies should be simultaneously loadable and co-classifiable is unreasonable, and won't happen in practice. Instead, we will have some relatively semantically-loose, high-level 'Rosetta-stone' ontologies whose role is to give some cross-referenceability among the lower-tier, more heavily modeled ontologies. The lower-tier ontologies will be imported for the purpose of accomplishing some context-dependent inferencing task, then unloaded. (In fact, I suspect we'll have many tiers of complexity, and many levels of modularity.) In the end, we'll have logical consistency on the small scale, and inconsistency on the big scale. But logical inconsistencies on the large scale are a fact of human life (okay, that's being flippant, and certainly it's not so easy as that. But you get the point--maybe we don't need computers to worry about inconsistency on the big scale, because we humans are very good at that, whereas computers excel at dealing with (in)consistency on the small scale).

For SNOMED CT, in particular, this would obviously work best if it were the case that EL++ were a sublogic of SHOIN(D), viz. OWL 1.0. That is to say, it would be ideal if all true inferences in EL++ were also true inferences in SHOIN(D); or put another way, if reasoning were monotonic from SNOMED CT to any hypothetical "SNOMED-OWL" extension ontology. Were this the case, then it would be possible to import SNOMED assertions directly into OWL ontologies, and to use SNOMED TBox knowledge in OWL ABox inferencing. Sadly, however, it is not the case that EL++ is a sublogic of SHOIN(D). Consequently, OWL 1.0 ontologies cannot directly make use of SNOMED CT knowledge. However, EL++ is reducible to RDFS (see 3), and OWL 1.0 is also reducible to RDFS, so if you want to use SNOMED CT knowledge in an OWL 1.0 inferencing environment, one mechanism would be to prepare an RDFS reduction of SNOMED CT (by eliminating some assertions), and to use this as part of the knowledge base.

Alternatively, the good news is that in OWL 2 this problem should be corrected: EL++ will be a sublogic of the logic of OWL 2, and one of the OWL 2 profiles will be based on EL++ (3,4).

= Specific use case proposals =

What that means for us is that if we want to show how SNOMED CT (or any other terminology/controlled vocabulary/ontology of a similar level of complexity) can function in a Semantic Web environment, then there appear to be at least three options for mapping SNOMED.

Look at a very simple rendering of SNOMED in RDF.
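As one possible reading of "a very simple rendering", the Python sketch below turns (source, relationship type, destination) rows into N-Triples, mapping the "is a" relationship to rdfs:subClassOf and every other relationship type to a property URI. Several things here are assumptions rather than anything prescribed by SNOMED CT or by this proposal: the proxy namespace http://example.org/sct/ is a placeholder (the choice of URIs is one of the open issues raised above), the concept and attribute identifiers in the sample rows are made up, and 116680003 is assumed to be the identifier of the SNOMED CT "Is a" relationship type.

```python
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical proxy namespace; which URIs to use is an open question.
BASE = "http://example.org/sct/"

# Assumed identifier of the SNOMED CT "Is a" relationship type.
IS_A = "116680003"


def to_ntriples(relationships, base=BASE):
    """Render (source_id, type_id, destination_id) rows as N-Triples lines.

    "Is a" rows become rdfs:subClassOf; every other relationship type is
    rendered as a direct triple using a property URI built from its id.
    """
    lines = []
    for source_id, type_id, destination_id in relationships:
        predicate = RDFS_SUBCLASS_OF if type_id == IS_A else base + type_id
        lines.append(f"<{base}{source_id}> <{predicate}> <{base}{destination_id}> .")
    return lines


# Made-up rows: concept 1001 is a kind of 1002, and is related to 1003
# through a (hypothetical) attribute with id 2001.
SAMPLE_ROWS = [
    ("1001", IS_A, "1002"),
    ("1001", "2001", "1003"),
]

if __name__ == "__main__":
    print("\n".join(to_ntriples(SAMPLE_ROWS)))
```

Note that a flat rendering like this is lossy: a non-hierarchical relationship in a DL reading is an existential restriction on a class, not a direct link between two classes, and any grouping of relationships is dropped. That trade-off is one of the issues the "not lossy" goal above is meant to surface.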

Deliverables

W3C Note briefly documenting issues, use cases, and a method for extracting Semantic Web representations from an existing, consensus health care terminology.

References

.. Links to related literature go here..

Participants

... Please add your name if you are interested in participation ...