Re: Modeling Taxonomic Classifications in a World where a given Species can have many Classifications

Hi Peter
Doesn't your problem still exist when using SKOS?
 - Using skos:broader to infer a hierarchy doesn't stop users from making
sameAs relationships between two concepts at different depths of your
taxonomy, and thus creating the same problem for you, just defined in SKOS
rather than in a class hierarchy.
 - Also, one thing to note with SKOS is that it is triple-heavy, having both
inverseOf relationships and deepish property inheritance (broaderTransitive
<= semanticRelation). This means that in an OWL inferencing triple store you
will materialise something like n * n * 6 triples (taxon width * depth *
SKOS properties), which could turn out to be very large if your
dataset/taxonomy is deep, as I imagine it is quite wide (a few million
species?). It may not be a problem, but I thought it worth mentioning.
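
To put a rough number on that, here is a minimal Python sketch (standard library only) that materialises the SKOS hierarchy entailments for a toy balanced taxonomy and counts the resulting triples. The helper names build_taxonomy and materialise and the node naming are made up for illustration. It applies only the skos:broader / skos:narrower inverse, the broaderTransitive / narrowerTransitive closure and the semanticRelation super-property, so it covers five properties rather than the rough factor of 6 above, and a real OWL reasoner may well produce more.

```python
def build_taxonomy(branching, depth):
    """Return {child: parent} for a balanced tree (hypothetical toy data)."""
    parent = {}
    level = ["root"]
    for _ in range(depth):
        next_level = []
        for node in level:
            for i in range(branching):
                child = f"{node}/{i}"
                parent[child] = node  # asserted: child skos:broader node
                next_level.append(child)
        level = next_level
    return parent


def materialise(parent):
    """Return the set of asserted plus inferred SKOS hierarchy triples."""
    triples = set()
    for child, direct_parent in parent.items():
        # Asserted link and its owl:inverseOf counterpart.
        triples.add((child, "skos:broader", direct_parent))
        triples.add((direct_parent, "skos:narrower", child))
        # Walk up to the root: transitive closure plus super-properties.
        ancestor = direct_parent
        while ancestor is not None:
            triples.add((child, "skos:broaderTransitive", ancestor))
            triples.add((ancestor, "skos:narrowerTransitive", child))
            triples.add((child, "skos:semanticRelation", ancestor))
            triples.add((ancestor, "skos:semanticRelation", child))
            ancestor = parent.get(ancestor)
    return triples


if __name__ == "__main__":
    for branching, depth in [(2, 2), (3, 3), (4, 5)]:
        asserted = build_taxonomy(branching, depth)
        total = materialise(asserted)
        print(branching, depth, len(asserted), len(total))
```

For a tree of depth 2 with 2 children per node, the 6 asserted skos:broader triples become 52 once materialised, and the ratio of materialised to asserted triples grows as the tree gets deeper.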

sounds like a great project though :)
kind regards
Paul

Paul Wilton, Technical Architect
Ontoba Ltd <http://www.ontoba.com>
paul.wilton@ontoba.com




On Thu, Jan 26, 2012 at 3:39 PM, Peter DeVries <pete.devries@gmail.com> wrote:

> Hi Jerven,
>
> Thank you for your response. Your reasoning makes sense to me and I like
> the move to skos:broader and skos:narrower.
>
> The problem that I have with subClassing is that some groups have made
> sameAs links between *txn* concepts and subclassed concepts.
>
> This then entails the *txn* concepts within their subClass hierarchy in
> the LOD.
>
> So it is in this regard that I see them as potentially error prone.
>
> Ideally this linking should be done with something similar to SameAs but
> without entailment.
>
> For now I think the best alternative is skos:closeMatch.
>
> In some use cases these two linked entities can be interpreted as the same
> thing, but for other uses it might be best to consider them "different
> things".
>
> Until there are more nuanced versions of sameAs I think that
> skos:closeMatch allows end users to treat these linked entities as they see
> fit.
>
> I am glad you wrote and I would like to follow up in the future.
>
> I am currently in Woods Hole MA working with the EoL.org and
> GlobalNames.org and so there might be some opportunities that we could
> run past your group.
>
> Also note that the species concepts are still experimental and would
> probably benefit from your suggestions.
>
> Thanks Again,
>
> - Pete
>
>
>
> On Thu, Jan 26, 2012 at 7:46 AM, Jerven Bolleman <
> jerven.bolleman@isb-sib.ch> wrote:
>
>> Hi Peter,
>>
>> It's interesting to see this discussion. I would like to give a short
>> background on why we at UniProt used rdfs:subClassOf relations between
>> taxon IDs.
>> When this decision was made there were no property paths yet, but there
>> was RDFS inferencing. So the only way one could query for all bacterial
>> proteins was by having each bacterial species be a subClassOf the bacteria
>> kingdom thing.
>> e.g. select ?protein where {?protein :organism ?taxon . ?taxon
>> rdfs:subClassOf taxon:2}
>>
>> Now that there are property paths we no longer need RDFS inferencing to
>> answer these kinds of questions.
>> i.e. select ?protein where {?protein :organism ?taxon . ?taxon
>> skos:broader+ taxon:2}
>>
>> We could actually move away from using rdfs:subClassOf if we have a good
>> use case for this.
>> You can actually see that in this release of UniProt, where we introduced
>> skos:narrower into the taxonomy relations; next release we will add the
>> skos:broader links.
>>
>> Peter, I am baffled by one statement you made: why does the use of
>> rdfs:subClassOf relations make correct linking error prone?
>>
>> Regards,
>> Jerven Bolleman
>>
>> On Jan 25, 2012, at 11:27 PM, Peter DeVries wrote:
>>
>> > Hi,
>> >
>> > I have been trying to figure out the best way to deal with the
>> following problem.
>> >
>> > There are entities that we see as "species". (some argue if they are
>> real things or simply an artificial human construct.)
>> >
>> > I think that in general the species themselves see them as real and do
>> a pretty good job identifying other members of the same species.
>> >
>> > Putting that entire debate aside, we still need some way to deal with
>> the idea of a species as a typological construct so one can say things like:
>> >
>> > "This species was observed at this geolocation" or "There have been X
>> number of bird species observed in this natural area."
>> >
>> > Names change over time, and the same name string can be used for
>> different animal / plant species.
>> >
>> > So that is why I created LOD entities like these
>> >
>> > http://lod.taxonconcept.org/ses/iuCXz.html  (
>> http://lod.taxonconcept.org/ses/iuCXz.rdf )
>> >
>> > http://lod.taxonconcept.org/ses/v6n7p.html  (
>> http://lod.taxonconcept.org/ses/v6n7p.rdf )
>> >
>> > Since moving to this new model from my earlier GeoSpecies, I have been
>> trying to figure out how to deal with the following issue.
>> >
>> > A species can have multiple classifications. You can see this when you
>> compare many of the species in DBpedia to those in the NCBI taxonomy
>> (uniprot, bio2rdf)
>> >
>> > Uniprot and Bio2RDF model these as nested subclasses which makes
>> correct linking error prone.
>> >
>> > I think a better way to think of this: there are species and different
>> groups choose to organise them into classifications differently.
>> >
>> > So rather than organize these into nested subclasses, I am thinking
>> about the following pattern.
>> >
>> > Puma concolor
>> > txn:inGenus txn_mammalia_genera:Genus_Puma
>> > txn:inFamily txn_mammalia:Family_Felidae
>> > txn:inOrder  txn_mammalia:Order_Carnivora
>> >
>> > You can see this in this file
>> http://lod.taxonconcept.org/ontology/p01/Mammalia/species.owl
>> >
>> >     <owl:Class rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Species">
>> >      <txn:inClass rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Class_Mammalia"/>
>> >      <txn:inOrder rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Order_Carnivora"/>
>> >      <txn:inFamily rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Family_Felidae"/>
>> >      <txn:inGenus rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/genera.owl#Genus_Puma"/>
>> >      <rdfs:isDefinedBy rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/species.owl"/>
>> >     </owl:Class>
>> >
>> > And here http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl
>> >
>> >     <owl:Class rdf:about="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Family_Felidae">
>> >         <rdfs:label>Family Felidae</rdfs:label>
>> >         <rdf:type rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Mammal_Family"/>
>> >         <txn:commonName>Cats</txn:commonName>
>> >         <skos:closeMatch rdf:resource="http://purl.uniprot.org/taxonomy/9681"/>
>> >         <skos:closeMatch rdf:resource="http://dbpedia.org/resource/Felidae"/>
>> >         <txn:hasWikipediaArticle rdf:resource="http://en.wikipedia.org/wiki/Felidae"/>
>> >         <skos:broader rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Suborder_Feliformia"/>
>> >         <skos:narrower rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Felinae"/>
>> >         <skos:narrower rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Pantherinae"/>
>> >         <owl:sameAs rdf:resource="http://lod.geospecies.org/families/gSvIP"/>
>> >         <rdfs:isDefinedBy rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl"/>
>> >     </owl:Class>
>> >
>> > This allows SPARQL queries like the one here http://bit.ly/qssZOG based
>> on my classification without breaking queries where the link is to
>> DBpedia via a different predicate.
>> >
>> > For now I have simply linked these broadly to DBpedia using the
>> following
>> >
>> > <txn:inDBpediaClade rdf:resource="http://dbpedia.org/ontology/Mammal"/>
>> *I use clade because these don't always match Order => Order etc.
>> >
>> > I think this pattern allows a given species to exist in several
>> classifications, and allows those interested to move up and down the
>> taxonomy - all without breaking things in the LOD.
>> >
>> > I thought I would ask the list what they thought of this before I do
>> much more.
>> >
>> > I was also wondering if it would be better for me to use the
>> subproperties of SKOS properties that I have created in this draft ontology.
>> >
>> > http://lod.taxonconcept.org/ontology/taxnomen/index.owl
>> >
>> > Such as:
>> >  txn_nomen:narrowerTaxon
>> >  txn_nomen:broaderTaxon
>> >
>> > Which would be used this way
>> >
>> >     <owl:Class rdf:about="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Family_Felidae">
>> >         <rdfs:label>Family Felidae</rdfs:label>
>> >         <rdf:type rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Mammal_Family"/>
>> >         <txn:commonName>Cats</txn:commonName>
>> >         <skos:closeMatch rdf:resource="http://purl.uniprot.org/taxonomy/9681"/>
>> >         <skos:closeMatch rdf:resource="http://dbpedia.org/resource/Felidae"/>
>> >         <txn:hasWikipediaArticle rdf:resource="http://en.wikipedia.org/wiki/Felidae"/>
>> >         <txn_nomen:broaderTaxon rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Suborder_Feliformia"/>
>> >         <txn_nomen:narrowerTaxon rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Felinae"/>
>> >         <txn_nomen:narrowerTaxon rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl#Subfamily_Pantherinae"/>
>> >         <owl:sameAs rdf:resource="http://lod.geospecies.org/families/gSvIP"/>
>> >         <rdfs:isDefinedBy rdf:resource="http://lod.taxonconcept.org/ontology/p01/Mammalia/index.owl"/>
>> >     </owl:Class>
>> >
>> >
>> > And
>> > txn_nomen:narrowerRank
>> > txn_nomen:broaderRank
>> >
>> > Which is used this way
>> >
>> >     <owl:Class rdf:about="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Rank_Family">
>> >         <rdfs:label xml:lang="en">Rank Family</rdfs:label>
>> >         <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
>> >         <rdf:type rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#TaxonRank"/>
>> >         <txn_nomen:narrowerRank rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Subfamily"/>
>> >         <txn_nomen:broaderRank rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#Superfamily"/>
>> >         <owl:equivalentProperty rdf:resource="http://purl.org/ontology/wo/Family"/>
>> >         <rdfs:seeAlso rdf:resource="http://en.wikipedia.org/wiki/Family_%28biology%29"/>
>> >         <rdfs:seeAlso rdf:resource="http://www.bbc.co.uk/nature/family"/>
>> >         <vs:term_status>testing</vs:term_status>
>> >         <rdfs:isDefinedBy rdf:resource="http://lod.taxonconcept.org/ontology/taxnomen/index.owl#index.owl"/>
>> >     </owl:Class>
>> >
>> > Respectfully,
>> >
>> > - Pete
>> >
>> > P.S. Taxonomic Classification Ontologies like the ones listed above for
>> mammals will change over time as additional species are discovered and
>> their phylogeny is better understood.
>> >         What would be the best practices to handle things like this?
>> >
>> >
>> > --
>> >
>> ------------------------------------------------------------------------------------
>> > Pete DeVries
>> > Department of Entomology
>> > University of Wisconsin - Madison
>> > 445 Russell Laboratories
>> > 1630 Linden Drive
>> > Madison, WI 53706
>> > Email: pdevries@wisc.edu
>> > TaxonConcept  &  GeoSpecies Knowledge Bases
>> > A Semantic Web, Linked Open Data  Project
>> >
>> --------------------------------------------------------------------------------------
>>
>>
>
>
> --
>
> ------------------------------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> Email: pdevries@wisc.edu
> TaxonConcept <http://www.taxonconcept.org/>  &  GeoSpecies <http://about.geospecies.org/> Knowledge
> Bases
> A Semantic Web, Linked Open Data <http://linkeddata.org/>  Project
>
> --------------------------------------------------------------------------------------
>

Received on Thursday, 26 January 2012 16:45:07 UTC