problems with imports

I hope that this is an appropriate and acceptable message.  For some  
time now some tool developers and users have been having increasing  
difficulty with the owl import declaration.  I have a proposal but I  
would really like some advice on any misconceptions I might have,  
whether this is a good approach and how the OWL 1.1 standard will
address this issue.

The Problem:

Somewhere on the web, there is an ontology that contains the following  
statements:

    <?xml version="1.0"?>
    <rdf:RDF
         ...
         xml:base="http://purl.org/obo/owl/sequence">
      ...
      <owl:Ontology rdf:about="">

According to my reading of the specifications, this means that the  
ontology in question is called http://purl.org/obo/owl/sequence and
that the proper way to import this ontology is

	<owl:Ontology rdf:about="">
		<owl:imports rdf:resource="http://purl.org/obo/owl/sequence"/>
		...

(please forgive faulty rdf...).   I will call this method of writing  
an import "import by name".


However, when I went to the web page given by
http://purl.org/obo/owl/sequence I got a redirect followed by a not
found error.  So anyone who finds the importing ontology has no way of
finding the imported ontology.  When a developer or user knows where
the ontology is located, they can use various mechanisms to tell their
tool where to find it.  But these mechanisms differ from tool to tool
and are not compatible with one another.  In addition, in many cases,
ontology authors cannot control the ontologies that they wish to
import.

For these reasons, many ontology authors have taken to doing "import
by location", that is, writing the address of the file in the import
statement rather than the name of the ontology.  Tool builders are
finding themselves forced to accommodate this.  In my opinion, though,
import by location (or some hybrid) doesn't solve anything, as I argue
below.

Use Cases:

1. The internet is trusted, available and reliable, ontologies are  
never relocated and all the ontologies of interest are on the internet.

This is the one case where import by location shines and import by  
name does very badly.  With import by name, a person reading an  
ontology off the web may not be able to determine where to find the  
imported ontology.

I will lump the other use cases together but they may have important  
differences.

2. I am commuting home from work with no internet access and unzip a  
collection of owl files.

3. I am developing an application which may not have access to the  
internet and/or may not be willing to trust the internet even if it  
had access.

4. I have access to the internet but I want to edit several (it must
be more than one) ontologies that I have downloaded from the web.

5. Web servers, projects and organizations come and go and ontologies  
are relocated.

In these cases, to varying degrees,  import by name works very well  
and this is why I think it is the right choice.  (Cases 2 and 3 are  
close to my heart.)  Consider use case 2 because in some sense it is  
an extreme.  In this case - with import by name - I simply plop the  
owl files on my disk and my tool can easily determine which ontologies  
import which.  It just needs to parse the import statements and the  
ontology declarations from the files in question.
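
To make this concrete, here is a minimal Python sketch of what a tool
could do in use case 2.  It is only an illustration under some
assumptions: the files are RDF/XML, the ontology is declared as a
direct child of rdf:RDF, imports are written with rdf:resource, and
the function names (read_ontology_header, import_graph) are made up
for this example and do not come from any existing tool.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def read_ontology_header(rdf_xml):
    """Return (ontology name, imported names) for one RDF/XML document.

    Hypothetical helper: only looks at owl:Ontology elements that are
    direct children of the root and at owl:imports with rdf:resource.
    """
    root = ET.fromstring(rdf_xml)
    base = root.get(f"{{{XML_NS}}}base", "")
    name = None
    imports = []
    for onto in root.findall(f"{{{OWL_NS}}}Ontology"):
        about = onto.get(f"{{{RDF_NS}}}about")
        if about is None:
            continue
        # rdf:about="" resolves to the xml:base of the document.
        name = urljoin(base, about) or None
        for imp in onto.findall(f"{{{OWL_NS}}}imports"):
            target = imp.get(f"{{{RDF_NS}}}resource")
            if target is not None:
                imports.append(urljoin(base, target))
    return name, imports


def import_graph(documents):
    """Map each file to the files it imports, matching imports by name.

    `documents` maps a file name to its RDF/XML text.  An import whose
    name is not declared by any of the files is reported as None.
    """
    headers = {f: read_ontology_header(text) for f, text in documents.items()}
    file_for_name = {name: f for f, (name, _) in headers.items() if name}
    return {
        f: [file_for_name.get(n) for n in imports]
        for f, (_, imports) in headers.items()
    }
```

Calling import_graph on the contents of the unzipped files gives, for
each file, the list of local files it imports.  No network access and
no separate catalog file is involved.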

Import by location fares much worse in this case.  My tool has no way  
of figuring out which ontology imports which - it must be told. If the  
zip file only contains owl files then it is a human who must figure  
out the imports.  Import trees can also be pretty complicated, as they
have been in several recent examples.  This is aggravated when - as in
one case - the ontologies in question use different methods of
importing the same ontology.  This means that my zip file must include
a file that records *all* the different ways in which the owl files
are downloaded.  As different tools will use different versions of
this file, I will need to convert the file to all the different
formats.  This seems very awkward.

Proposal:

My proposal is to use import by name but to allow an annotation that  
provides a hint as to where to find the imported ontology.  Thus a  
good import of the http://purl.org/obo/owl/sequence ontology could be

    <owl:Ontology rdf:about="">
       <owl:imports>
          <owl:Ontology rdf:about="http://purl.org/obo/owl/sequence">
             <owl:hasPhysicalLocation>
                http://www.berkeleybop.org/ontologies/obo-all/sequence/sequence.owl
             </owl:hasPhysicalLocation>
          </owl:Ontology>
       ...

This hint can be ignored in many cases (e.g. use case 2) where the
hint is known to be wrong or it doesn't work.
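
One way a tool might order its lookups under this proposal is
sketched below in Python.  This is an assumption about how the hint
could be used, not something any specification says: the function
name resolve_import and the local_index argument (a mapping from
ontology name to a local file) are hypothetical, and the hint stands
for the text of the proposed location annotation.

```python
def resolve_import(name, local_index, hint=None):
    """Decide where to load an imported ontology from.

    Returns a (source, location) pair.  A locally known copy always
    wins, so the hint is ignored in the offline cases.  The hint is
    only a fallback, and the name itself is the last resort.
    """
    if name in local_index:
        return ("local", local_index[name])
    if hint is not None and hint.strip():
        # The annotation text may carry whitespace from the XML layout.
        return ("hint", hint.strip())
    return ("name", name)
```

With a local copy on disk the result is ("local", ...) whatever the
hint says.  With nothing on disk the tool falls back to the hint, and
only then to dereferencing the name.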

I think this needs to be a standard because it would need to be
understood by a variety of tools using different APIs and written in
different languages.  I think that it is important enough because
there is a significant amount of traffic devoted to problems related
to this sort of import problem.  In particular we are seeing traffic
about this subject on two tool forums where the tools are based on
entirely different OWL APIs with an independent ancestry.

-Timothy

Received on Sunday, 2 December 2007 04:07:39 UTC