Re: Neophyte question - modelling Alpha Taxonomies in RDF and Owl

On Dec 19, 2007, at 12:20 PM, Christopher Rose wrote:

>
> Hello All,
>
> Please forgive what may be a neophyte question, I hope I've found a
> forum where this question is relevant.  It's particularly a question
> about formulating ontologies in Owl and RDF.
>
> I'm modelling an actual scientific taxonomy of living creatures.  I
> want to model elements at each level of the taxonomy (a.k.a. taxa) as
> Owl classes, such that restrictions on properties I place on taxa at
> higher levels of the hierarchy are passed down to lower levels.  For
> instance I might have a restriction that every member of class Aves
> (which is a taxonomic class as well as an Owl Class) has wings, and I
> would expect then that a Class Strigiformes (strigiformes is the
> taxonomic order containing actual owls, the kind with wings) which I
> might later define to be subClassOf rdf:resource="#Aves" would have
> the same restriction.  This seems natural, to follow the intention of
> the language, and to model the expectations that human taxonomers (or
> 'systematists') might have.

This is a fairly standard way of representing alpha taxonomies: each
taxon is a class, and the taxa are connected by subclass axioms.
Individuals would be particular organisms, e.g. Orville the owl
(typically not named in the ontology, of course).
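
For concreteness, here is a minimal sketch of that pattern in RDF/XML.
The base URI http://example.org/taxa and the individual "orville" are
placeholders made up for illustration, not taken from any published
ontology. Given these axioms, a reasoner should infer that orville is
also an Aves and a Vertebrata.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/taxa">

  <!-- Each taxon is a class; the hierarchy is plain subclass axioms. -->
  <owl:Class rdf:ID="Vertebrata"/>

  <owl:Class rdf:ID="Aves">
    <rdfs:subClassOf rdf:resource="#Vertebrata"/>
  </owl:Class>

  <owl:Class rdf:ID="Strigiformes">
    <rdfs:subClassOf rdf:resource="#Aves"/>
  </owl:Class>

  <!-- An individual organism (hypothetical), asserted only as a Strigiformes. -->
  <owl:Thing rdf:ID="orville">
    <rdf:type rdf:resource="#Strigiformes"/>
  </owl:Thing>
</rdf:RDF>
```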

You only allude to the more difficult and interesting part: how do  
you represent Wing?

> But there is a lot of information regarding the class Aves which does
> not represent restrictions on the individuals (or subclasses) who may
> be members.  Much of that information is related to the class itself -
> a reference to the relevant papers defining the class (Linnaeus,
> 1758), common names associated with it, a serial number for the
> taxonomic unit bestowed by various scientific organizations, their
> level of acceptance of that taxon, etc.
>
> But if I understand Owl syntax correctly, I cannot simply use it to  
> say
>
> <owl:Class rdf:ID="Aves">
>   <rdfs:hasaSerialNumber rdf:about="174371">
>   <subclassOf rdf:resource="#Vertebrata">
> </owl:Class>

This is fine if you declare an AnnotationProperty (not in the rdfs
namespace) called hasASerialNumber (or better: has_serial_number).

The serial number applies to the resource Aves, not to the class
extent of Aves.

This works, with the proviso that AnnotationProperties are lacking in  
semantics. There is nothing to stop you giving Aves 3 different  
serial numbers, or no serial numbers, or sharing a serial number with  
Crocodiles.
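
A rough sketch of what that could look like in RDF/XML. The "uni"
prefix is borrowed from your later example, but the namespace URI
http://example.org/taxa# is my own placeholder, so substitute your own.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:uni="http://example.org/taxa#"
    xml:base="http://example.org/taxa">

  <!-- Declared as an annotation property, in your own namespace (assumed URI). -->
  <owl:AnnotationProperty rdf:ID="has_serial_number"/>

  <owl:Class rdf:ID="Vertebrata"/>

  <owl:Class rdf:ID="Aves">
    <rdfs:subClassOf rdf:resource="#Vertebrata"/>
    <!-- Says something about the resource Aves itself, not about individual birds. -->
    <uni:has_serial_number>174371</uni:has_serial_number>
  </owl:Class>
</rdf:RDF>
```

Note that a second has_serial_number line on Aves with a different
value would be accepted without complaint, which is the proviso above.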

> even if I do also define a property called hasaSerialNumber.  I can
> only place restrictions on properties in the class definition.   If I
> then instead write;
>
> <owl:Class rdf:ID="Aves">
>   <rdfs:subClassOf rdf:resource="#Vertebrata"/>
>   <rdfs:subClassOf>
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="#hasaSerialNumber"/>
>       <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">
>       174371
>       </owl:cardinality>
>     </owl:Restriction>
>   </rdfs:subClassOf>
> </owl:Class>
>
> This class will likely be empty, if I understand Owl correctly.  That
> is because any other taxa which I try to define as Classes, and make
> as subClassOf Aves, will certainly have their own serial numbers which
> will by definition be different, and therefore outside the restriction
> of the parent class.

It's worse. You seem to be saying that every instance of a bird has
174371 serial numbers.

Even forgiving the cardinality mistake, you're on the wrong track.
You don't want to use a restriction here unless you want to talk
about the instances, and I presume that here the instances are
spatiotemporal particulars such as Orville the owl and Flossie the
sheep.

I may be wrong, and it may be your intent to classify species rather
than organisms: in that case the leaf nodes of your taxonomies
(H sapiens, D melanogaster) would be individuals. I would recommend
against this.

> I can see that what I am really wishing for is that Owl classes might
> be more similar to OODBs, where I might define a class of taxonomic
> classes, and define slots or attributes on that class, which each
> taxonomic class might fill in differently.

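OWL has different semantics from OODBs: an OWL class is a set of
individuals, and axioms constrain its members rather than define slots
for the class to fill in.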

> I guess I could define a different property for a serial number at
> each level of the taxonomic hierarchy, but this is cumbersome (there
> are lots of them, more than the 8 you learned in school) and feels
> artificial, as the serial numbers are serial to all taxa, rather than
> to just the taxa at a specific level (the orders, or the classes,
> say.)  An even worse compromise would be to stuff the serial number
> and other information about the class itself in comments in the Owl
> Class.
> Another option that occurred to me was to create RDF Element to hold
> all of this data that is specifically about the individual Owl Class;
>
> <rdf:Description rdf:ID="Aves">
>    <uni:serialNumberITIS>174371</serialNumberITIS>
>    <uni:hasaParentClass rdf:resource="#Vertebrata"/>
>    <uni:hasLimbs>
>      <rdf:Bag>
>         <rdf:_1 rdf:resource="#wings"/>
>         <rdf:_2 rdf:resource="#legs"/>
>      </rdf:Bag>
>    </uni:hasLimbs>(etc.)

I'm wincing here, sorry. rdf:Bag, ouch.

I guess you can just toss out all the good stuff with OWL and use
RDFS. I'm not sure what inferences you intend to get with your bags
of limbs. They wouldn't propagate over hasaParentClass.

Don't use plurals in your anatomical entity names, unless you  
explicitly intend to denote a collection of legs or wings.  
Unfortunately it's too late to get Linnaeus to stick to this rule.

Far better to use classes; then you can be more expressive:

Aves SubClassOf hasLimb SOME wing
Aves SubClassOf hasLimb SOME leg

(sorry to switch syntaxes on you)
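
For reference, roughly the same two axioms in RDF/XML, as a sketch. The
base URI is a placeholder of mine, and I'm assuming hasLimb is an
object property and that wing and leg are classes.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/taxa">

  <owl:ObjectProperty rdf:ID="hasLimb"/>
  <owl:Class rdf:ID="wing"/>
  <owl:Class rdf:ID="leg"/>

  <owl:Class rdf:ID="Aves">
    <!-- Aves SubClassOf hasLimb SOME wing -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasLimb"/>
        <owl:someValuesFrom rdf:resource="#wing"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <!-- Aves SubClassOf hasLimb SOME leg -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasLimb"/>
        <owl:someValuesFrom rdf:resource="#leg"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
```

Any class later declared as a subclass of Aves picks up both
restrictions, which is the propagation you were asking for.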

Even better, dispense with hasLimb and use has_part. Or rather re-use
it, so you can interoperate with other bio-ontologies:
http://obofoundry.org/ro

However, there may be some advantages in turning your "Aves" into
instances, thus placing them in the domain of discourse and allowing
you to make more expressive statements about them. I'd recommend
proceeding carefully. More on this below.
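
To give a flavour of what the instance route buys you, here is one
possible sketch; it is not the only way to do it. Everything named
here (Taxon, has_parent_taxon, the *_taxon individuals, the namespace
URI) is invented for illustration. Because the serial number is now a
functional datatype property on an individual, a taxon with two
different serial numbers would make the ontology inconsistent.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:uni="http://example.org/taxa#"
    xml:base="http://example.org/taxa">

  <!-- Hypothetical class whose instances are the taxa themselves. -->
  <owl:Class rdf:ID="Taxon"/>

  <owl:ObjectProperty rdf:ID="has_parent_taxon">
    <rdfs:domain rdf:resource="#Taxon"/>
    <rdfs:range rdf:resource="#Taxon"/>
  </owl:ObjectProperty>

  <!-- Functional: at most one serial number per taxon. -->
  <owl:DatatypeProperty rdf:ID="has_serial_number">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
    <rdfs:domain rdf:resource="#Taxon"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#integer"/>
  </owl:DatatypeProperty>

  <uni:Taxon rdf:ID="Vertebrata_taxon"/>

  <uni:Taxon rdf:ID="Aves_taxon">
    <uni:has_parent_taxon rdf:resource="#Vertebrata_taxon"/>
    <uni:has_serial_number
        rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">174371</uni:has_serial_number>
  </uni:Taxon>
</rdf:RDF>
```

The price is that has_parent_taxon is just a property between
individuals, so nothing is inherited along it the way restrictions are
inherited along subClassOf.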

> and then somehow (?) associate each of the Owl Classes with the more
> general and capacious RDF description.  But again I feel like this is
> a contrivance that is forced on me by the syntax, and certainly not
> representative of how a human taxonomer would organize her own
> thoughts.
>
> It's certainly possible my frustration with this may stem from my
> incomplete (or inaccurate) understanding of the syntax of XML, RDF,
> RDF Schema, and Owl, and/or the intent of each of these.

There's a lot there, isn't there? My suggestion would be to try to
ignore XML, RDF and all syntax issues and learn OWL first.
Unfortunately I don't have any pointers. Most OWL guides put the
RDF/XML in your face.

>   But also it
> may come from these each being defined separately, and over a period
> of time, where an OOP language (C++, or Ruby, say) is defined all in a
> single stroke (I'm simplifying here, I realize.)

Forget everything you know about OOP, then try to tackle OWL.

If this proves impossible, learn the consequences of the difference
between the closed-world assumption (CWA) and the open-world
assumption (OWA) first, then instances in reality vs instances in
UML/Java.

>   Also it seems as
> though a lot of modelling facility has been sacrificed in order to
> make Owl and RDF more easily digestible to reasoners (the software
> kind).

Less than you may think for your problem above. You can have your  
serial numbers, as annotation properties. You just can't constrain  
them, at least with the above representation.

There are more serious expressivity constraints. You can't say that  
species can't interbreed (since "species" is just a taxonomic rank  
here, indicated with a logically invisible annotation property). And  
you can't formally state a monophyleticity property for a class.

There are some other interesting options. You could take a  
phylogenetic perspective and treat each taxon as denoting an instance  
of a spatiotemporal event. E.g. "Aves" as denoting the branching  
event instance or progenitor organism instance that gave rise to all  
extant actual Aves instances. Your ontology could be pretty minimal  
here - organism, birth. You can state your monophyleticity  
constraints in a DL-safe rules way (if this is important to you), you  
can account for evolutionary loss (tetrapod has_part limb - mostly)  
without getting into non-monotonicity.... fun, but diverging from a  
more traditional taxonomy.

> Any instruction or suggestions would be heartily appreciated.

It's been a while since I learned OWL, and I don't have decent
learning resources handy. I presume there must be an "OWL for OO
programmers" guide somewhere.

Or you can just cheat and skip straight to:

http://www.berkeleybop.org/ontologies/obo-all/ncbi_taxonomy/ncbi_taxonomy.owl
[warning - big file]

others may have similar translations for their favourite taxonomies.

cheers
chris

> Thanks sincerely,Chris
>
>
>

Received on Thursday, 20 December 2007 05:57:33 UTC