Imports

From OWL
Revision as of 15:33, 18 April 2008 by Peter Patel-Schneider


Current situation

The proposal from Peter and Boris is to import by location. Either the ontologyURI of the imported ontology or the version information of the imported ontology must match the location.

The proposal from Alan is to import by location, in a breadth-first fashion, but to skip imports that have already resulted in importing the "same" ontology.

Examples

Simple import

At http://example.com/ontology1

Ontology(<http://example.com/ontology1>
  Import(<http://example.com/ontology2>))

at http://example.com/ontology2

Ontology(<http://example.com/ontology2>)

Versioned import (of current version)

At http://example.com/ontology1

Ontology(<http://example.com/ontology1> 
  Annotation(owl:versionInfo <http://example.com/ontology1.1>)
  Import(<http://example.com/ontology2>))

at http://example.com/ontology2

Ontology(<http://example.com/ontology2> 
  Annotation(owl:versionInfo <http://example.com/ontology2.1>))

Versioned import (of specific version)

At http://example.com/ontology1

Ontology(<http://example.com/ontology1>
  Annotation(owl:versionInfo <http://example.com/ontology1.1>)
  Import(<http://example.com/ontology2.1>))

at http://example.com/ontology2.1

Ontology(<http://example.com/ontology2> 
  Annotation(owl:versionInfo <http://example.com/ontology2.1>))

Clashing versions

When two ontologies in an imports closure ask for different versions of a third ontology, problems arise. These problems appear to be insoluble, as the dependency on a particular version can arise anywhere at any time. Probably the worst example is when an imported ontology is changed to depend on a particular version of the third ontology:

Initial situation:

At http://example.com/ontology1

Ontology(<http://example.com/ontology1>
  Annotation(owl:versionInfo <http://example.com/ontology1.1>)
  Import(<http://example.com/ontology2>)
  Import(<http://example.com/ontology3>))

at http://example.com/ontology2

Ontology(<http://example.com/ontology2>
  Annotation(owl:versionInfo <http://example.com/ontology2.1>)
  Import(<http://example.com/ontology3>))

at http://example.com/ontology3

Ontology(<http://example.com/ontology3>
  Annotation(owl:versionInfo <http://example.com/ontology3.1>))

Revised situation:

At http://example.com/ontology1

Ontology(<http://example.com/ontology1>
  Annotation(owl:versionInfo <http://example.com/ontology1.1>)
  Import(<http://example.com/ontology2>)
  Import(<http://example.com/ontology3>))

at http://example.com/ontology2

Ontology(<http://example.com/ontology2>
  Annotation(owl:versionInfo <http://example.com/ontology2.1>)
  Import(<http://example.com/ontology3.2>))

at http://example.com/ontology3

Ontology(<http://example.com/ontology3>
  Annotation(owl:versionInfo <http://example.com/ontology3.1>))

at http://example.com/ontology3.2

Ontology(<http://example.com/ontology3>
  Annotation(owl:versionInfo <http://example.com/ontology3.2>))
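
A tool could at least detect this kind of clash, even if it cannot resolve it. The following is a minimal Python sketch, not part of any proposal: it walks the imports of the revised situation breadth first and records every location from which each ontology URI was obtained. The DOCS table is a hand-written stand-in for web retrieval, and the function names are illustrative assumptions.

```python
from collections import deque

O1 = "http://example.com/ontology1"
O2 = "http://example.com/ontology2"
O3 = "http://example.com/ontology3"
O3_2 = "http://example.com/ontology3.2"

# Hypothetical stand-in for the Web in the revised situation:
# location -> (ontology URI declared there, locations it imports)
DOCS = {
    O1: (O1, [O2, O3]),
    O2: (O2, [O3_2]),
    O3: (O3, []),
    O3_2: (O3, []),
}


def locations_per_ontology(root, docs):
    """Map each ontology URI in the imports closure to the locations it came from."""
    found = {}
    seen = set()
    queue = deque([root])
    while queue:
        location = queue.popleft()
        if location in seen:
            continue
        seen.add(location)
        ontology_uri, imports = docs[location]
        found.setdefault(ontology_uri, set()).add(location)
        queue.extend(imports)
    return found


def clashes(found):
    """Keep only the ontology URIs that were reached at more than one location."""
    return {uri: locs for uri, locs in found.items() if len(locs) > 1}
```

Calling clashes(locations_per_ontology(O1, DOCS)) reports ontology3 as reached at both its unversioned and its version-specific location.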


Background

owl:imports is an important feature, yet it was controversial in the WebOnt working group, is at best loosely specified, and has shown some problems in practice. Furthermore, the OWL 1.1 version is (perhaps) somewhat different from the OWL 1.0 version. Additionally, since OWL 1.0 went to recommendation, XML inclusion became a recommendation. There is an OWLED task force devoted to improving the design of owl:imports as well.

Given all this, a proper rethink is in order.

Issues Pertaining to Imports

Some useful emails and blog posts:

OWL 1.1 Use Cases

These are use cases motivating additional design over the 1.0 design (rather than simply clarifying the 1.0 design).

Offline editing

An ontologist is editing their related ontologies O1 and O2, and using two other ontologies, O3 and O4, both on the Web. They are going off-line for a bit, and wish to copy all the relevant files to their hard disk and continue working.

When they go back on-line they wish to be able to continue working again, eventually publishing final versions of O1 and O2 on the web.

Note

This relates to imports because O1 and O2 will have some imports relation with each other, and also with O3 and O4, but in the offline case they are on the local hard disk rather than the Web.

Bijan's Scenario: Generating Variants

(This is to fulfill ACTION-38. I'm describing scenarios I've encountered and variants thereof.)

A lot of confusion surrounds the relation between 1) the xml:base (bases) of an RDF/XML document, 2) namespace declarations in that file, 3) the prefixes of terms in the ontology corresponding to that file, 4) the URI/L of that file (if any), 5) the URI at the object end of an imports statement, and 6) the identifier for the ontology (if any).

When working on OWL-S, esp. the "DLization" (i.e., coercing a version to OWL DL from OWL Full), I needed to work offline and on variants. That is, I'd have files (say process.owl and profile.owl) that I needed to change to make DL compliant, but one of them imports the other. Obviously, with location based imports I'd change the imports statement (and have to do so through all the imports chain). I would share these files as zipped directories or as published in my webspace with other members of the OWL-S coalition. (It was a bit annoying to share zips as you don't want file URIs in the imports statements, but since we tended to use xml:base to keep the term URIs "owl-s" looking, the base wasn't very helpful. So I would put things on the web.)

Given a decent redirect/catalog mechanism, name based importing would be ok, but it would need to be the case that not only my tool chain, but everyone else's needs to work reasonably as well. I guess I could have changed both the imports statement and the name of each ontology...but to what advantage I have no idea. Just means I need to change two pieces of information rather than one (and most of my ontology URIs just grab it from the base via the <owl:Ontology rdf:about=""> idiom).

Granted if I did two or three variants, most tools really needed me to rename the terms (so update namespaces and bases). It's not insurmountable, but it's a bit of book-keeping which, again, seems pointless. Suppose I wanted to explore the consequences of two different definitions (in context) of a term, foo:treetop. I could generate two ontologies with different names, and have two different terms for foo:treetop, foo1:treetop and foo2:treetop. Suppose in the end we decide on foo1:treetop which I then rename to foo:treetop.

This is tedious but perhaps workable. I really don't like it. But now consider the case where this ontology is also imported by some third ontology (which otherwise remains static). I have to harmonize all the references to treetop, the imports, the name of the ontology, etc. etc. etc.

From a text file perspective, if I'm very careful, I might be able to make this almost reasonable, but some of the tricks to avoid duplication (<owl:Ontology rdf:about="">) might get in the way. It doesn't seem worth the effort overall. From an IDE perspective, if my IDEs are all different and have different ways of setting the variants, it's pretty nasty.

This is straying from mere imports, but a lot of people like their canonical prefix (or "namespace") and their ontology ID to be the same. (RDF/XML idioms encourage this.)

Of course, having the imports id and the ontology id means I have no direct way to detect (from an RDF perspective) which axioms are associated with which ontology (or even which ontology came from which imports). This is, indeed, annoying, but name synching doesn't really help at all. (Nesting axioms in Ontology() structures does, of course.)

I guess for publishing using different names for variants of different ontologies isn't the worst thing in the world. It does feel unnatural in some cases.

Current Designs

OWL 1.0

OWL DL DL-Style Model Theoretic Semantics

Section 3.4 of the OWL Semantics and Abstract Syntax document:

[A]n owl:imports annotation also imports the contents of another OWL ontology into the current ontology. The imported ontology is the one, if any, that has as name the argument of the imports construct. (This treatment of imports is divorced from Web issues. The intended use of names for OWL ontologies is to make the name be the location of the ontology on the Web, but this is outside of this formal treatment.)

OWL Full

Section 5.3 of the OWL Semantics and Abstract Syntax document:

Definition: Let K be a collection of RDF graphs. K is imports closed iff for every triple in any element of K of the form x owl:imports u . then K contains a graph that is the result of the RDF processing of the RDF/XML document, if any, accessible at u into an RDF graph. The imports closure of a collection of RDF graphs is the smallest import-closed collection of RDF graphs containing the graphs.
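
The definition above amounts to a least fixed point, which can be sketched in Python as follows. This is only an illustration under stated assumptions: a graph is modelled as a frozenset of (subject, predicate, object) triples, and the `web` dictionary is a hypothetical stand-in for "the result of the RDF processing of the RDF/XML document, if any, accessible at u".

```python
OWL_IMPORTS = "owl:imports"


def imports_closure(graphs, web):
    """Return the smallest imports-closed collection containing `graphs`.

    `web` (an assumption of this sketch) maps a URI to the graph obtained
    by retrieving and parsing the document there; absent keys model "if any".
    """
    closure = set(graphs)
    worklist = list(graphs)
    while worklist:
        graph = worklist.pop()
        for _subject, predicate, target in graph:
            if predicate != OWL_IMPORTS:
                continue
            imported = web.get(target)
            # Nothing accessible at the target: the definition adds no graph.
            if imported is not None and imported not in closure:
                closure.add(imported)
                worklist.append(imported)
    return closure
```

Because graphs are only ever added once, the loop also terminates on cyclic imports.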

OWL DL RDFS-Compatible Semantics

Section 5.4 of the OWL Semantics and Abstract Syntax document:

Definition: Let T be the mapping from the abstract syntax to RDF graphs from Section 4.1. Let O be a collection of OWL DL ontologies and axioms and facts in abstract syntax form. O is said to be imports closed iff for any URI, u, in an imports directive in any ontology in O the RDF parsing of the document accessible on the Web at u results in T(K), where K is the ontology in O with name u.

Summary of OWL 1.0 Design

The situation for OWL DL is quite clear - the ontology retrievable at URI u is supposed to have name u. It is thus irrelevant whether imports is by location or by ontology name. For OWL Full, it is not required that the ontology at u have name u, but it is clear that imports is by location.

The end result is that in OWL 1.0, imports works like entire-document XML inclusion.

OWL 1.1

Section 3 of the OWL 1.1 Syntax Specification:

The structure of OWL 1.1 ontologies is shown in Figure 1. Each ontology is uniquely identified with an ontology URI. This URI need not be equal to the physical location of the ontology file. For example, a file for an ontology with a URI http://www.my.domain.com/example need not be physically stored in that location. A specification of a mechanism for physically locating an ontology from its ontology URI is not in scope of this specification.
Each ontology contains a possibly empty set of import declarations. An ontology O directly imports an ontology O' if O contains an import declaration whose value is the ontology URI of O'. The relation imports is defined as a transitive closure of the relation directly imports. The axiom closure of an ontology O is the smallest set containing all the axioms of O and of all ontologies that O imports. Intuitively, an import declaration specification states that, when reasoning with an ontology O, one should consider not only the axioms of O, but the entire axiom closure of O.
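
For comparison with the OWL 1.0 definitions, here is a minimal Python sketch of the axiom closure as described in the quoted text, where imports are followed by ontology URI rather than by location. The `Ontology` class and the use of plain strings for axioms are assumptions made for illustration; how an ontology is physically located from its URI is left out, as it is in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    uri: str
    imports: frozenset = frozenset()  # ontology URIs named in import declarations
    axioms: frozenset = frozenset()   # axioms, modelled here as opaque strings


def axiom_closure(root_uri, ontologies):
    """Collect the axioms of the root and of every ontology it (transitively) imports."""
    by_uri = {o.uri: o for o in ontologies}
    visited = set()
    axioms = set()
    stack = [root_uri]
    while stack:
        uri = stack.pop()
        if uri in visited:
            continue
        visited.add(uri)
        ontology = by_uri.get(uri)
        if ontology is None:
            # No known ontology has this URI; locating one is out of scope here.
            continue
        axioms |= ontology.axioms
        stack.extend(ontology.imports)
    return axioms
```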

Summary of OWL 1.1 Design

There is no treatment of importing in the OWL 1.1 Semantics document, which is a bit strange, so the controlling definition of importing in OWL 1.1 is in the SS&FS document. The definition is a bit unfinished, but the intent is clear that imports is by name, as in the OWL DL direct semantics, but without the intent that names and locations correspond.

XML Inclusions

XML inclusions are compatible with an imports-by-location mechanism, but they provide much more power. XML inclusions do not appear to be appropriate for any non-XML OWL syntax, as they are too tied to XML. Full XML inclusions might be appropriate for an XML-based OWL syntax, provided that imports-by-location is the imports paradigm that is wanted.


Comparison between OWL 1.0 and OWL 1.1

OWL 1.1 requires all ontologies to have URIs and requires the import to only import ontologies whose URI is the same as the object position of the imports statement. OWL 1.0 is looser than this. OWL 1.1 prevents putting a URI as the target of an imports with an anonymous ontology or one located somewhere other than the URI of the ontology itself [actually OWL 1.1 doesn't talk about the location of an ontology at all]. (At least, by default. A system registry could work around that.)

Proposals

Imports is by Location (with Version extension)

Synthesis

(pfps 16:03, 4 April 2008 (EDT) Here is my synthesis of the augmentations to my initial proposal, with a version proposal added, as modified at F2F2.)

Basic version

Imports in OWL 2 are "by location": if an ontology O contains a statement "Import( someURI )", then "someURI" specifies the location of the imported ontology---that is, an OWL 2 implementation SHOULD access the ontology at the location "someURI" using the standard Internet protocols. In some cases, however, an application might want to retrieve the imported ontology from some other location; this can be the case, for example, in order to implement off-line caching or a staging mechanism for ontologies prior to their publishing on the web. In this case, an OWL 2 implementation MAY choose to replace or augment the above described resolution mechanism with a mechanism that from "someURI" determines how to find the ontology. In any case, the ontology so found SHOULD have its ontologyURI equal to "someURI".
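
A rough Python sketch of the basic version, under stated assumptions: `documents` is a hypothetical table from location to the ontologyURI declared by the document found there (standing in for retrieval by the standard Internet protocols), and `relocate` is a hypothetical hook for the optional replacement or augmentation mechanism. Because the name match is only a SHOULD, the sketch reports a mismatch rather than rejecting the import.

```python
def find_import(some_uri, documents, relocate=None):
    """Return (location used, ontologyURI found there or None, conforms to the SHOULD)."""
    location = relocate(some_uri) if relocate is not None else None
    if location is None:
        # Default behaviour: "someURI" is itself the location.
        location = some_uri
    declared_uri = documents.get(location)
    return location, declared_uri, declared_uri == some_uri
```

An off-line cache could then be expressed as a dictionary lookup, for example relocate={"http://example.com/ontology/family": "file:/cache/family.owl"}.get, which falls back to the default whenever it has no entry.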

Versioning

Versioning is handled by a publishing methodology. The ontology with ontology URI "someURI" is published at "someURI" *and* at another location, which SHOULD be the value of an ontology annotation for owl:versionInfo. When a new version of the ontology with ontology URI "someURI" is created, it is published at "someURI" and also at a version-specific location. Other ontologies can import from "someURI" if the current version is wanted, or from the version-specific location if a particular version is required. This has the side effect of loosening name-location matching so that the ontology so found SHOULD have its ontologyURI or an owl:versionInfo annotation value equal to "someURI".

So the ontology

Ontology(<http://example.com/ontology/family>
  Annotation(owl:versionInfo <http://example.com/ontology/v1/family>) ... )

would be published at both http://example.com/ontology/family and http://example.com/ontology/v1/family. A new version

Ontology(<http://example.com/ontology/family>
  Annotation(owl:versionInfo <http://example.com/ontology/v2/family>) ... )

would be published at http://example.com/ontology/family, replacing the version previously there, and also at http://example.com/ontology/v2/family.

An ontology SHOULD NOT import multiple versions of the same ontology, i.e., different ontology documents with the same ontology URI but that do not share an owl:versionInfo annotation value.
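
The two checks in this section can be sketched in Python as follows. This is an illustration only; the function names and the representation of a document as a pair (ontology URI, set of owl:versionInfo values) are assumptions of the sketch.

```python
from itertools import combinations


def matches_location(location, ontology_uri, version_infos):
    """Loosened name-location matching: the ontologyURI or a versionInfo value equals the location."""
    return location == ontology_uri or location in version_infos


def multiple_versions(documents):
    """Return pairs of documents with the same ontology URI and no shared versionInfo value.

    `documents` is a list of (ontology_uri, set_of_version_infos) pairs.
    """
    return [
        (a, b)
        for a, b in combinations(documents, 2)
        if a[0] == b[0] and not (set(a[1]) & set(b[1]))
    ]
```

With the family example, the first version is acceptable at both http://example.com/ontology/family and http://example.com/ontology/v1/family, and importing both the v1 and v2 documents would be flagged by multiple_versions.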

Initial Version

Peter Patel-Schneider 09:40, 19 December 2007 (EST)

The basic driver for this proposal is that OWL is a SW language. I take this to mean, among other things, that there is one ontology for a given URI in the entire Semantic Web. Thus the basic idea is that the imports has to be by URI because otherwise there can be several ontologies with the same "name".

Imports is by location, so that in Import(URI), URI is an ontology location, not an ontology name. So an ontology could have Import(<http://foo.ex/ex>), which would import the ontology obtained by web retrieval from http://foo.ex/ex. The ontology at URI should (must?) have name URI (but see just below for a minor tweak). Ontologies need not have names, but unnamed ontologies should not be (are not) imported. The ontology at http://foo.ex/ex thus should look something like Ontology(<http://foo.ex/ex> ...).

To handle versioned ontologies, a solution very similar to the W3C publication methodology is used. The current version of the ontology is accessible at URI (as above). Ontologies with version information are also available at the location which is the location name plus the version. (The details of how to form the location are not yet specified.) So, an ontology might have something like Import(<http://foo.ex/ex-11>) to import version "11" of the above ontology. The ontology at http://foo.ex/ex-11 would look something like Ontology(<http://foo.ex/ex> Annotation(owl:versionInfo "11"^^xsd:string) ...).

Local uses of ontologies can be handled by:

  • If the ontologies are on the web, tools can make local caches, in the usual, invisible web manner, so that retrieval from a location is invisibly short-circuited to the cache.
  • Ontologies can be stored using file: URIs, which explicitly provide a local location (with, of course all the attendant problems).
  • Tools can also support an explicit caching mechanism, so that users can point to a retrieval area probably using a different URI (such as local: or even protege:).

Publishing an ontology on the web may require changing the name of the ontology, because its published web location was not known when the ontology was being developed. In the ideal case, the published location is known at the beginning, so the development can be done in a cache area using the published "location".

Proposal-jjc-variant

As with Peter's, maybe without the versioning rules (I've not really thought about that), but with the following rider:

When using ontologies from the Web, tools MAY, as always, need a local cache. In a typical cache, local files are used which are copies of remote ontologies retrieved with a Web GET operation. In this case, if the tool has access to the Web and the cache copy is out-of-date with respect to the Web copy, it SHOULD be replaced. Editing tools being used as part of a publication process MAY have local files which are being prepared for a Web PUT operation. In this case, if the tool has access to the Web, then Web copies of such resources SHOULD be ignored. To facilitate interoperation between tools using the same cache copies (both GET-cache and PUT-cache), the RDF vocabulary in appendix-TBD MAY be used (e.g. Jena location mapper).
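
Read as a decision rule, the rider might look like the following Python sketch. The function and its return values are hypothetical; only the GET-cache and PUT-cache distinction comes from the text above.

```python
def source_for(kind, online, cache_out_of_date=False):
    """Decide which copy of a resource a tool would use under the rider above.

    kind: "GET" for a copy retrieved from the Web, "PUT" for a local file
    being prepared for publication. Returns "web", "cache", or "local".
    """
    if kind == "PUT":
        # Files awaiting publication win: Web copies are ignored even when online.
        return "local"
    if kind == "GET":
        # A stale cached copy is replaced only when the Web is reachable.
        return "web" if online and cache_out_of_date else "cache"
    raise ValueError(f"unknown cache kind: {kind!r}")
```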

Boris's Proposal for handling imports

Here is another suggestion for dealing with imports. This is actually the wording that I propose to include as part of the structural specification document. It basically is in agreement with Peter and Jeremy; the main difference is that it does not specify the precise mode of operation in terms of caching. I believe that this is not just the problem of caching; rather, this seems to me to be a more general problem of locating one's ontologies.

Imports in OWL 1.1 are "by location and name": if an ontology O contains a statement "Import( someURI )", then "someURI" specifies the location of the imported ontology ---that is, an OWL 1.1 implementation should access the ontology at the location "someURI" using the standard Internet protocols. The ontology located in this way MUST have the ontologyURI equal to "someURI".

In some cases, however, an application might want to retrieve the imported ontology from some other location; this can be the case, for example, in order to implement some form of off-line ontology caching. In this case, an OWL 1.1 implementation MAY choose to replace the above described resolution mechanism with a proprietary mechanism that from "someURI" determines the ontology’s location. Regardless of what mechanism is used, the ontology located through it MUST have the ontologyURI equal to "someURI".

Imports is by ontology URI (with resolution of ontology URIs to physical URIs)

Boris Motik 17:56, 15 December 2007 (GMT)

Solution summary

If an ontology O imports an ontology O', then importation should be by ontology URI; the ontology O should contain the URI of the ontology O' and this can be different from the physical location of O'. Furthermore, the OWL 1.1 specification should provide common ways of resolving ontology URIs to physical URIs.

(Nomenclature convention: In OWL 1.1 Structural Specification, the term "ontology URI" is used to mean "ontology name". Hence, I shall use the term "ontology URI" from now on.)

Why not imports by location?

At the F2F, some people suggested that imports should be by location: if O imports O', then O should contain the location of O'. Furthermore, there was some discussion about whether the location of the imported ontology should be the same as the ontology URI of the imported ontology.

I believe that such a system is not particularly suited to typical scenarios in which OWL is used. I briefly list some of the problems that commonly arise from such a definition.

  1. Whereas finished ontologies may indeed be published on the Web at a location that is identical to the ontology URI, in order for someone to use the ontology, the ontology has to be copied locally. This invariably makes the physical ontology different from its ontology URI.
  2. Some people argue that the problems described under 1 can be thought of as caching. I agree with this view for ontologies that originally exist somewhere on the Web and are copied locally for reasoning. However, ontologies will often be developed locally, and will be published to the Web only after they are finished. While the ontology is being edited, its ontology URI is unlikely to be the same as its physical URI.
  3. There is a more general question whether the ontology URI and the physical location need to be the same. Imagine a user that is starting Protégé and clicking on "New Ontology". Protégé might then ask the user to choose a directory where the ontology is to be stored. If the ontology URI is to be equal to the physical location, then the ontology should be assigned a URI such as file:/C:/Temp/ontology.owl. It is natural to use the ontology URI to generate URIs of ontology entities; hence, as the user adds entities to the ontology, these entities will be called, e.g., file:/C:/Temp/ontology.owl#Person. This is undesirable.
  4. People often move files on their computer. If O were to import O' by location, then moving the ontology files breaks the imports. You then need to open the ontology in a text editor to fix the problem. Furthermore, if the ontology URI must be the same as the physical one, then moving the ontology on your computer breaks the validity of an ontology, unless you rename the ontology accordingly.
  5. Consider ontology repositories, such as Swoogle: there, the ontology URI is again unlikely to be the same as the physical location of an ontology.

To summarize, the physical URI and the ontology URI are different for most of the time when people are actually working with their ontologies. Therefore, the OWL 1.1 specification should take this into account and provide some direction and guidance to implementations about how to handle these situations correctly. I'm fine with viewing this as caching; however, I then believe we should standardize some common caching mechanisms across tools.

I would also like to mention that XML Schema -- a Web and W3C standard -- has correctly identified this problem, so the XML Schema standard takes these distinctions into account. Some people suggested at the F2F that this holds for WSDL (I don't know myself the details of WSDL so I can just repeat here what others have said).

The solution in more detail

If an ontology O imports an ontology O', then O should contain the ontology URI of O'. Furthermore, at any given point in time, the ontology and physical URIs of either of these two ontologies are allowed to be different. Each OWL 1.1 implementation is required to provide an oracle for mapping ontology to physical URIs.

This is roughly what the current OWL 1.1 draft says. I do agree, however, that this solution has an important drawback: there is no oracle that would be generally supported across implementations. This clearly hampers interoperability. Hence, in the rest of this e-mail, I shall describe a couple of oracles that might be made normative.


Oracle 1: File-based resolver

We could require each implementation to support at least the default oracle, which takes as input a file containing pairs of ontology and physical URIs. This file should have a trivial textual format, such as

<ontology URI><TAB><physical URI><CR/LF>

Alternatively, we might create a simple ontology instances of which would provide (ontology URI => physical URI) mappings.
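
Reading the trivial textual format takes only a few lines; a minimal Python sketch follows. The function name is an assumption, and the example URIs in the usage note are made up for illustration.

```python
def load_mappings(text):
    """Parse lines of the form <ontology URI><TAB><physical URI> into a dict."""
    mappings = {}
    for line in text.splitlines():  # splitlines() accepts CR/LF as well as LF
        if not line.strip():
            continue  # tolerate blank lines
        # A line without a TAB raises ValueError, which flags a malformed file.
        ontology_uri, physical_uri = line.split("\t", 1)
        mappings[ontology_uri.strip()] = physical_uri.strip()
    return mappings
```

The `get` method of the resulting dictionary can then serve as the oracle itself: load_mappings(text).get("http://www.my.domain.com/example") returns the physical URI, or None when the file has no entry.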


Oracle 2: Physical location hints a la XML Schema

In XML schema, imports are by schema name; however, in order to aid an implementation in locating the imported schema, the importing schema can contain location hints. For example, you can write in the importing schema

<import namespace="http://www.w3.org/1999/xhtml" schemaLocation="file:/c:/Temp/schema.xsd"/>


We might provide a solution that works in similar vein. We could change the OWL 1.1 structural specification and say that an ontology contains zero or more *importation records*, each of which consists of exactly one ontology URI and a list of zero or more physical URIs. Such records could be serialized into RDF as follows:

<O owl11:importationRecord _:x>
<_:x owl11:ontologyURI "the ontology URI of the imported ontology">
<_:x owl11:physicalURI "one of the physical URI hints">    <= repeated for each physical URI


The final ingredient: URI resolution strategy

Assume that an ontology URI O' is imported in an ontology O, and that we need to locate O' physically. An implementation should then do the following:

  1. An implementation should first try each physical URI specified in the importation record for O' in O. The physical URIs should be tried in the order specified in the importation record. As soon as an ontology is found at some physical URI, the algorithm terminates.
  2. An implementation should try application-specific oracles for resolving the ontology to the physical URI. An application is required to support at least the file-based oracle.
  3. If all this fails, the implementation should look for O' at the physical URI equal to O' -- that is, if everything else fails, the application should assume that the ontology URI is equal to the physical URI.

Regardless of how the ontology is found, the ontology URI specified in the ontology file should be exactly the same as O' -- that is, it should be illegal to resolve O' to some ontology file which contains some other ontology URI.
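
The three steps and the final check can be sketched in Python as below. All names are hypothetical: `hints` is the ordered list of physical URIs from the importation record, each element of `oracles` is a function from ontology URI to physical URI (or None), and `declared_uri_at` stands in for fetching a document and reading its ontology URI, returning None when nothing is found.

```python
def candidate_locations(ontology_uri, hints, oracles):
    """Yield physical URIs in the order the strategy tries them."""
    yield from hints                      # step 1: hints, in record order
    for oracle in oracles:                # step 2: application-specific oracles
        physical_uri = oracle(ontology_uri)
        if physical_uri is not None:
            yield physical_uri
    yield ontology_uri                    # step 3: assume name equals location


def locate(ontology_uri, hints, oracles, declared_uri_at):
    """Return the first physical URI where an ontology is found, or None."""
    for physical_uri in candidate_locations(ontology_uri, hints, oracles):
        declared = declared_uri_at(physical_uri)
        if declared is None:
            continue  # nothing there, try the next candidate
        if declared != ontology_uri:
            # Resolving to a file with another ontology URI is illegal.
            raise ValueError(
                f"{physical_uri} contains {declared}, expected {ontology_uri}"
            )
        return physical_uri
    return None
```

Using a generator keeps the oracles lazy, so they are consulted only if every hint fails, as in the numbered steps.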

Finishing notes

Note that condition 3 above essentially allows you to have your cake and eat it: if the ontology and the physical URIs are the same, then imports are "by name". I believe, however, that it is important to alert OWL 1.1 implementors to the fact that the ontology and the physical URIs are usually not the same and that they have to think how to handle this in practice. The main drawbacks of OWL 1.0 implementations arose due to the fact that the specification did not contain any information about this issue, so some developers just didn't think about it.

Imports is by location (but location sameAs name asserted). Imports processed breadth first

Alan Ruttenberg

General case I am concerned about is some graph of ontologies, with the same ontology imported possibly more than once. In general I do not have write access to most of the ontologies. I want to make the location different from the name for one of three reasons:

  • Some or all of the ontologies in the graph are on a local disk for editing
  • I want to override the location of one of the ontologies in the graph to use a different version (either to test a new version of an ontology that I am developing, or to revert to a saved version because a new version has been published with the same name and it is buggy)
  • I want to provide historical versions of an ontology and let users of my ontology easily choose one of them.

Proposed solution:

  • The ontology header must be the first element in a resource
  • Ontology name and location are assumed to be the same
  • If an ontology imports from location U1 and the name of the imported ontology is U2 then U1 sameAs U2 is asserted.
  • Ontologies are only loaded once.
  • Imports are processed in breadth first order. The queue of remaining imports is initialized with the resource to process.
  • To process imports, repeat the below until the queue is empty. Then load the resources whose locations are in AI.
    • O = pop queue
    • If O names an ontology that is sameAs any ontology named by an element of AI no further processing of the resource is done.
    • Otherwise, the names of the imports in O are collected into ii and O is added to AI
    • For each ii
      • Read the ontology element of the resource located by ii.
      • Assert that the ontology named ii is sameAs the ontology named in the header.
      • Add ii to the end of the queue.

The usual practice to override some set of imports will be to create a new OWL file that imports all the ontologies, by location, that we want to override.
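
The following Python sketch shows one possible reading of the procedure and of the override practice; it is not Alan's implementation. The `docs` table (location mapped to the name in the ontology header and the list of imported locations) is a hypothetical stand-in for reading resources, the sameAs assertions are kept in a small union-find structure, and the list `accepted` plays the role of AI.

```python
from collections import deque


def plan_imports(root, docs):
    """Return the locations to load, in breadth-first order, skipping "same" ontologies."""
    parent = {}  # union-find forest recording the sameAs assertions

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def assert_same(a, b):
        parent[find(a)] = find(b)

    accepted = []  # "AI" in the text
    queue = deque([root])
    while queue:
        location = queue.popleft()
        if any(find(location) == find(other) for other in accepted):
            continue  # the "same" ontology has already been accepted
        accepted.append(location)
        _name, imports = docs[location]
        for imported in imports:
            header_name, _ = docs[imported]    # read only the ontology header
            assert_same(imported, header_name)  # location sameAs name
            queue.append(imported)
    return accepted
```

If an override file imports a local fixed copy whose header names http://example.com/ontology3 and also imports an ontology that itself imports http://example.com/ontology3, the local copy is accepted first and the later import of the published location is skipped.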

Alan's Imports and Versioning Proposal of April 4

As a presentation