HCLSIG/Terminology/MappingHealthCareTerminologyToOWL/ThreeOptionsForMapping

From W3C Wiki

Option 1

The first option is to consider SNOMED CT (SCT) as genuinely DL-based, i.e., as living in the description logic EL++.

One might ask which portions of EL++ are actually used by the current release of SCT. EL++ provides the following constructors:

  • someValuesFrom restrictions
  • conjunction
  • concept disjointness
  • hasValue restrictions
  • oneOf enumerations involving a single element,
  • complex inclusion axioms for object properties,
  • transitive properties, and
  • General Concept Inclusion axioms (GCIs).
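For reference, the EL++ constructors listed above correspond to the following description-logic notation. Here C and D are concepts, r and r_1, ..., r_k are roles (object properties), and a is an individual. This is the standard EL++ syntax, not something taken from SCT itself.

```latex
\begin{array}{ll}
\text{someValuesFrom restriction} & \exists r.C \\
\text{conjunction} & C \sqcap D \\
\text{concept disjointness} & C \sqcap D \sqsubseteq \bot \\
\text{hasValue restriction} & \exists r.\{a\} \\
\text{single-element oneOf} & \{a\} \\
\text{complex role inclusion} & r_1 \circ \cdots \circ r_k \sqsubseteq r \\
\text{transitive property} & r \circ r \sqsubseteq r \\
\text{GCI} & C \sqsubseteq D
\end{array}
```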

The currently published SCT corpus (July 2008) seems to contain formal assertions of the following types:

  • GCIs, expressed using the SCT is-a property (SCTID),
  • complex inclusion axioms for object properties, expressed as property hierarchies using is-a.

(English-language statements in the documentation that make such assertions as human-readable usage rules, rather than as computable statements, are not considered formal assertions. Note also that we are talking here about the published corpus of assertions comprising the SCT distribution, which constitutes an inferred hierarchy, i.e., one inferred by the Apelon classifier. We are NOT referring to the asserted model, which is not publicly accessible. The asserted model may or may not make use of constructors, properties and classes that are hidden in the public, inferred assertion corpus; this is unknown. What we do know is that there exists an is-a relation, which is intended to be transitive (although this is not stated in a formal, computable way), and a number of other Linkage Concepts. But there appears to be no Linkage Concept that expresses a someValuesFrom restriction, none that expresses logical AND, none that expresses the same thing as owl:disjointWith or owl:oneOf, and none that can be attached to a property to state that it is transitive.)
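Because the published corpus does not formally declare is-a to be transitive, a consumer who wants the intended transitive reading has to compute it. The following is a minimal sketch. It assumes the is-a rows have already been loaded as (child, parent) pairs. The concept names are hypothetical placeholders, not real SCT content or SCTIDs.

```python
from collections import defaultdict


def transitive_closure(is_a: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return every (descendant, ancestor) pair implied by reading is-a as transitive."""
    parents: dict[str, set[str]] = defaultdict(set)
    for child, parent in is_a:
        parents[child].add(parent)

    closure: set[tuple[str, str]] = set()
    for start in list(parents):
        # Depth-first walk upward from each concept, collecting every reachable ancestor.
        stack = list(parents[start])
        seen: set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # avoids revisiting shared ancestors and terminates on cycles
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure


# Hypothetical is-a rows as (child, parent); names are illustrative, not SCT content.
rows = [
    ("BacterialPneumonia", "Pneumonia"),
    ("Pneumonia", "LungDisease"),
    ("LungDisease", "Disease"),
]
inferred = transitive_closure(rows)
print(("BacterialPneumonia", "Disease") in inferred)  # True
print(len(inferred))  # 6
```

A DL reasoner would derive the same pairs from a transitivity axiom. The point here is that, without such an axiom in the corpus, this step has to be supplied by the consumer.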

Sidebar: There is a problem interpreting the meaning of SCT properties, i.e., what SCT calls Linkage Concepts. This is because SCT allows Linkage Concepts to be subclassed using the very same is-a relation that is used for non-linkage (unary) concepts. This differs from common usage in languages based on RDFS, where properties are instances of rdf:Property and property hierarchies are built with a relation distinct from rdfs:subClassOf, namely rdfs:subPropertyOf.
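To make the sidebar's problem concrete, one can split the is-a rows by whether they fall inside the Linkage Concept hierarchy. The following sketch assumes that all Linkage Concepts descend from a single root, here given the hypothetical name `LINKAGE_ROOT`. It flags any row that connects a Linkage Concept to a non-linkage concept, because such a row corresponds to neither rdfs:subClassOf nor rdfs:subPropertyOf. The rows are simplified and illustrative, not copied from a release.

```python
from collections import defaultdict

# Hypothetical name for the concept at the top of the Linkage Concept (attribute) hierarchy.
LINKAGE_ROOT = "LinkageConcept"


def split_is_a(is_a: list[tuple[str, str]]):
    """Split is-a rows into concept rows, linkage-concept rows, and rows crossing the boundary."""
    children: dict[str, set[str]] = defaultdict(set)
    for child, parent in is_a:
        children[parent].add(child)

    # Every concept reachable downward from the root is treated as a Linkage Concept.
    linkage = {LINKAGE_ROOT}
    stack = [LINKAGE_ROOT]
    while stack:
        node = stack.pop()
        for c in children.get(node, ()):
            if c not in linkage:
                linkage.add(c)
                stack.append(c)

    concept_rows, linkage_rows, crossing = [], [], []
    for child, parent in is_a:
        if child in linkage and parent in linkage:
            linkage_rows.append((child, parent))  # candidate rdfs:subPropertyOf
        elif child not in linkage and parent not in linkage:
            concept_rows.append((child, parent))  # candidate rdfs:subClassOf
        else:
            crossing.append((child, parent))  # fits neither RDFS relation
    return concept_rows, linkage_rows, crossing


# Simplified, illustrative rows (not copied from a release).
rows = [
    ("LinkageConcept", "SnomedRoot"),
    ("Attribute", "LinkageConcept"),
    ("FindingSite", "Attribute"),
    ("Pneumonia", "LungDisease"),
    ("LungDisease", "SnomedRoot"),
]
concepts, links, crossing = split_is_a(rows)
print(crossing)  # [('LinkageConcept', 'SnomedRoot')]
```

In this toy data, only one row is flagged: the row that places the Linkage Concept root under the overall root. That row treats a property-like concept as a subclass of an ordinary concept, which is the kind of statement that has no direct RDFS counterpart.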

This leads to the second option.

Option 2

Given the above, it really seems more accurate to say that the expressivity level of SCT approximates that of RDFS, which has the following constructors:

  • domains and ranges of properties,
  • object property inclusion axioms, and
  • subclass and equivalence relationships between named classes.

In fact, SCT does not include any computable domain or range assertions, so its expressivity is below that of RDFS.

(Note that here we are talking about the published corpus of assertions comprising SCT, which constitutes an inferred hierarchy, i.e. inferred by the Apelon classifier. We are NOT referring to the asserted model, which is not publicly accessible. The asserted model may make use of constructors, properties and classes that are hidden in the public, inferred assertion corpus, but this is unknown.)

So, in summary, for this option we treat the SCT corpus as a corpus of RDFS assertions, using the following mappings:

  ||SCT||RDF/RDFS||
  ||is-a||rdfs:subClassOf||
  ||is-a (Linkage Concepts)||rdfs:subPropertyOf||
  ||non-leaf concepts||rdfs:Class||
  ||leaf concepts||????? rdfs:Resource, or perhaps [rdf:type [owl:complementOf rdfs:Class]]||
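The following is a minimal sketch of the Option 2 table. It assumes is-a rows are given as (child, parent) pairs, together with a precomputed set of Linkage Concepts, for example the descendants of the attribute root. Triples are plain tuples of prefixed names. The `sct:` prefix is hypothetical. Leaf concepts are typed rdfs:Resource, which is only the first of the table's tentative choices for that row.

```python
def to_rdfs(is_a: list[tuple[str, str]], linkage: set[str]) -> set[tuple[str, str, str]]:
    """Translate SCT is-a rows into RDFS triples following the Option 2 mapping table."""
    triples: set[tuple[str, str, str]] = set()
    for child, parent in is_a:
        if child in linkage and parent in linkage:
            triples.add((f"sct:{child}", "rdfs:subPropertyOf", f"sct:{parent}"))
        elif child not in linkage and parent not in linkage:
            triples.add((f"sct:{child}", "rdfs:subClassOf", f"sct:{parent}"))
        # Rows linking a Linkage Concept to a non-linkage concept have no row in the table; dropped.

    has_children = {parent for _, parent in is_a}
    concepts = {node for row in is_a for node in row} - linkage
    for node in concepts:
        # Non-leaf concepts become rdfs:Class; leaves get the tentative rdfs:Resource.
        kind = "rdfs:Class" if node in has_children else "rdfs:Resource"
        triples.add((f"sct:{node}", "rdf:type", kind))
    return triples


# Hypothetical rows; FindingSite and Attribute play the role of Linkage Concepts.
rows = [
    ("BacterialPneumonia", "Pneumonia"),
    ("Pneumonia", "LungDisease"),
    ("FindingSite", "Attribute"),
]
for triple in sorted(to_rdfs(rows, linkage={"FindingSite", "Attribute"})):
    print(triple)
```

Dropping the boundary-crossing rows is a design choice made for this sketch. Another option would be to report those rows for manual review.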

Option 3

This brings us to the third option: avoid the issue of formal semantics for SCT entirely, and incorporate it using relations from the SKOS vocabulary. These relations do not assume any formal, model-theoretic semantics, and they do not treat SCT as DL-based. Instead, they accept it as a semi-formal controlled vocabulary.

Consider the following candidate mappings.

  ||SCT||SCTID||SKOS (2008)||Comment||
  ||is-a||tba||skos:broader||entails skos:broaderTransitive, see SKOS Primer 2008||
  ||is-a (Linkage Concepts)||tba||skos:broader||ditto||
  ||non-linkage concept||all||skos:Concept|| ||
  ||linkage concept||all||(1) skos:Concept or (2) a subproperty of skos:related, let's call it snomed:link, which corresponds to the concept LinkageConcept at the top of the attribute (= property) hierarchy in SCT.||I think the second choice is better. skos:related is symmetric, so the child property would have to be made non-symmetric. This is quite possible (in fact it is the default in OWL, where a subproperty does not inherit symmetry).||
  ||FSN||all||skos:prefLabel FSN||must be unique per (human) language. FSNs from the Concepts table are repeated in the Descriptions table; the repeated string needs to be filtered out.||
  ||Description||all (except repeats of FSN)||skos:altLabel Description|| ||
  ||Various versioning properties of concepts|| ||create various subproperties of skos:note or dc: properties in the snomed: namespace|| ||
  ||SNOMED Root|| ||top concept of the scheme (skos:hasTopConcept)|| ||
  ||SNOMED CT as a whole|| ||skos:ConceptScheme|| ||
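The following is a minimal sketch of the Option 3 table. It assumes the FSNs, Description rows and is-a rows have already been extracted from the release tables into plain Python structures. The sketch makes these choices:

  • It takes choice (2) for linkage concepts.
  • It uses a hypothetical `snomed:` prefix and a hypothetical scheme IRI `snomed:SCT`.
  • It represents literals as (text, language) pairs.
  • It omits the versioning-property row.

Entailments such as skos:broaderTransitive are left to a SKOS-aware tool.

```python
def to_skos(
    fsn: dict[str, str],
    descriptions: list[tuple[str, str]],
    is_a: list[tuple[str, str]],
    linkage: set[str],
    root: str,
    lang: str = "en",
) -> set[tuple]:
    """Emit SKOS triples, as (subject, predicate, object) tuples, following the Option 3 table."""
    scheme = "snomed:SCT"  # hypothetical IRI standing for SNOMED CT as a whole
    t: set[tuple] = {
        (scheme, "rdf:type", "skos:ConceptScheme"),
        (scheme, "skos:hasTopConcept", f"snomed:{root}"),
        # snomed:link stands for LinkageConcept itself; it is NOT declared symmetric.
        ("snomed:link", "rdfs:subPropertyOf", "skos:related"),
    }
    for cid, name in fsn.items():
        node = f"snomed:{cid}"
        if cid in linkage:
            t.add((node, "rdfs:subPropertyOf", "snomed:link"))  # choice (2) for linkage concepts
        else:
            t.add((node, "rdf:type", "skos:Concept"))
        t.add((node, "skos:prefLabel", (name, lang)))  # one FSN per concept per language
    for cid, term in descriptions:
        if term != fsn.get(cid):  # skip Description rows that merely repeat the FSN
            t.add((f"snomed:{cid}", "skos:altLabel", (term, lang)))
    for child, parent in is_a:
        # The table maps is-a to skos:broader for linkage concepts too ("ditto").
        t.add((f"snomed:{child}", "skos:broader", f"snomed:{parent}"))
    return t


# Illustrative inputs only; identifiers and strings are placeholders, not release data.
fsn = {"SnomedRoot": "SNOMED CT Concept", "Pneumonia": "Pneumonia (disorder)", "FindingSite": "Finding site"}
descriptions = [("Pneumonia", "Pneumonia (disorder)"), ("Pneumonia", "Pneumonia")]
triples = to_skos(fsn, descriptions, [("Pneumonia", "SnomedRoot")], linkage={"FindingSite"}, root="SnomedRoot")
print(len(triples))  # 11
```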

This final option (which I think may be the best) comes out of discussions in the group in which Xiaoshu was particularly insistent that the legacy terminology should not be "overinterpreted" by attributing to it any model other than its native one, not even the RDF model itself. This is especially relevant for SNOMED because, as originally conceived (and still, in fact), SNOMED does not have a model-theoretic (Tarskian-style) semantics. In fact, SNOMED purposely avoids discourse about the domain of its logic. In SNOMED terms, it is purely about the terminology model and not about the information model; or, as others have put it, SNOMED is purely about the T-Box and has nothing to say about the A-Box. Consequently, truth in SNOMED has two aspects. Intensional correctness is determined by a defined editorial process and officially fixed by the description and by the SNOMED synonymy (i.e., the Description table). Formal (logical) truth is established by what is in essence a proof-theoretic criterion: does it classify without errors in Ontylog? This is in contrast to a language with an extensional semantics, such as any RDF-family language. There, truth is determined by whether (a collection of) assertions in the language is (necessarily) in faithful correspondence with individuals and their relations in the domain. To SNOMED's credit, this means it is considerably more ambitious than RDF-family languages, in that it is highly intensional. For the Semantic Web, however, we would like to tone it down into something more tractable for our toolset.

So the point here is that "RDF(S)-ness" should not be attributed to SNOMED concepts. Indeed, SNOMED is clearly not RDF(S)-like in a great many ways. As alluded to above, the difference that always comes to mind first is that relations in SNOMED are concepts and have is-a relations with each other. In RDF-family languages, by contrast, properties are not instances of rdfs:Class and do not have rdfs:subClassOf relations with each other; rather, they are instances of rdf:Property and have rdfs:subPropertyOf relations with each other. So, to be faithful to SNOMED, we cannot really allow the use of RDFS constructions, and (as Xiaoshu points out) we should assume nothing more than the existence of nodes and relations.

Well, SKOS assumes a little more than that, but not a lot more. To reason over SNOMED content using Semantic Web tools, therefore, the best approach might be to create an image of SNOMED under a different namespace that allows some reconfiguration, and to leave SNOMED itself to SKOS.