Decision Incubator Tools/ODP Tour

Revision as of 12:56, 20 April 2010 by Eblomqvi (Talk | contribs)


ODP Background

What are Ontology Design Patterns (ODPs)?

There are many different types of ODPs (see http://ontologydesignpatterns.org/wiki/OPTypes for a graphical illustration). The main idea is that ODPs encode best practices or solutions to common problems that can be reused, either intellectually or concretely as a reusable component. All ODPs are concerned with ontology engineering in some sense, but they have different focuses. For example, there are reasoning patterns that describe common reasoning services you may want to perform on an ontology to reach some goal, and alignment patterns that describe typical correspondences between elements of two or more ontologies. Hence, not all ODPs are directly targeted at the construction/design of an ontology. The types most interesting for us when creating vocabularies for decision making are probably the logical and content ODPs, and possibly also re-engineering ODPs and naming ODPs.


Re-engineering ODPs

Re-engineering ODPs describe patterns of transformations that can be used to transform non-ontological resources, e.g., thesauri, database schemas etc., into ontologies. See http://ontologydesignpatterns.org/wiki/Submissions:ReengineeringODPs for a list of suggested patterns.


Naming ODPs

In ontologies naming is much more important than for instance in software engineering (see also discussion in OWL tutorial). We can express some of the semantics of the domain through an ontology, but there will always be terms that indicate how to interpret classes and properties, i.e., the "base classes" by which others are defined, that are still implicitly defined through their naming and intensional descriptions. Additionally, there are good practices for forming URIs.


Logical ODPs

Logical ODPs describe solutions to problems of expressivity, i.e. "how do I express x in language y?". Examples are how to express n-ary relations, how to work around the issue that you cannot use classes as property values in OWL 1.0 etc. Examples from W3C can be found here (see the working group notes): http://www.w3.org/2001/sw/BestPractices/ There is also a catalogue from the University of Manchester: http://www.gong.manchester.ac.uk/odp/html/index.html and there are a few logical patterns (mostly anti-patterns) in the ODP portal: http://ontologydesignpatterns.org/wiki/Submissions:LogicalODPs Basically, if you want to model something and the language does not provide a direct solution, a Logical ODP can tell you how to do it. But Logical ODPs only consider logical constructs, i.e., they are abstract "ways to model".


Content ODPs

From a theoretical perspective, Content ODPs are instances (of combinations) of logical ODPs. This means that Content ODPs address similar issues to logical ODPs but from a more concrete perspective. Where logical ODPs only present how abstract logical structures can be applied to solve abstract problems of expressivity, Content ODPs present actual domain-dependent solutions to such problems. Domain-dependent here does not mean "valid in only one industry domain"; it is not mainly about industry domains. It means that these patterns contain actual concrete classes, properties, axioms etc. Hence, the domain can be very general, e.g., modelling time or events, but since the ODPs do contain actual classes they belong to some domain.

You can view Content ODPs as abstract solutions, described through a template with a set of headings (see the catalogue at: http://ontologydesignpatterns.org/wiki/Submissions:ContentOPs ), but they can also be seen as small concrete modelling components, i.e., highly reusable, well-specified, small ontologies. While a Logical ODP can be reused by intellectually understanding the idea of the solution and then modelling from scratch, the idea of Content ODPs is to import, specialize, compose and extend the small modules/small vocabularies provided through the OWL building blocks of the Content ODPs.


So how do I use Content ODPs for ontology design?

There is a method proposed specifically for reusing Content ODPs, called eXtreme Design (XD). There is a paper about it if you want a more detailed look (1). In summary, the idea is to use an agile and iterative approach for developing ontologies. XD is heavily based on reusing ODPs, so a divide-and-conquer paradigm is inherent, which also results in highly modular ontologies. To take a small example: a common way to start modelling an ontology is to list all the relevant terms, create classes for those, think about specializations and generalizations of those classes, then consider relations between classes, and finally restrictions and so on. This is a kind of "waterfall"-like approach to ontology design. XD takes a quite different approach: first divide your problem into pieces based on small stories that describe particular modelling issues, develop modules solving these small partial problems and carefully test them, then integrate each new module into the overall solution and test the overall solution before proceeding to the next part of the overall problem.

The current version of the method guidelines contains the following steps:

  • 1. Familiarize with the domain and task. - You need to know what the domain is about, and what the customer really wants to do, set the scope etc. Also agree on things like naming conventions, conventions for URIs etc.
  • 2. Collect requirements stories. - The customer writes small example stories exemplifying the type of information to be represented and what they want to do with it. The stories need to be prioritized.
  • 3. Select a story. - This is where the iteration starts. Developers usually work in pairs, so the team is divided into such development pairs, and each pair selects one small story to start with.
  • 4. Transform the story into CQs. - Competency Questions (CQs) are a common way to represent ontology requirements, but in principle this step just means extracting the actual requirements from the story. If you have a story like: "During 2009 10 people were employed at the computer science department, half of them worked on project x during the first quarter." you could derive CQs like: "Who worked on what project during what time?", "Who was employed at what department at what time?". However, these should of course express the real tasks that the customer intends the ontology to support, so if these are not things that the system should be able to provide to the user, then they are not good requirements even though they can be derived from the story. In addition to CQs you may want to represent additional constraints, e.g. in our example things like "a project always has at least one person working on it".
  • 5. Select one or more CQs - This is the start of the second, "inner", iteration, where the development pair selects among their own CQs, and chooses a coherent set to address in one iteration, i.e., the solution will end up in one module.
  • 6. Match CQs to GUCs of Content ODPs - The General Use Case (GUC) describes the requirements that are covered by the pattern, i.e. a set of very general requirements (usually expressed as CQs). The idea of this step is to try to find useful ODPs that cover some part of our problem. There is a search functionality in XD Tools, the XD Selector, and there is the possibility to view the detailed annotations of the ODPs, but from an "ease of use" perspective you may feel that it is easier to understand ODPs by accessing their page in the ODP portal.
  • 7. Select the Content ODPs to reuse - From the previous step you get a set of possible ODPs that cover parts of your CQs. Usually you don't have a one-to-one mapping between problem and Content ODP; there may be overlapping ODPs, ODPs on different levels of abstraction that all fit the task etc. So now you have to select a subset that fits your needs. A good rule of thumb is to prefer a more specific ODP over a general/abstract one, but also to avoid selecting a complex ODP for a very simple requirement, where it may even be better to model from scratch.
  • 8. Reuse and integrate selected Content ODPs. - This is the step of the actual modelling. Here you can use the XD Tools specialization wizard to help you in importing and specializing the ODPs, but don't forget that you usually also need to extend the solution in some way and connect ODPs together (if you are reusing several for your module). Below is a brief description of operations/tasks that can be useful.
    • Import - An operation "native" to OWL, adding owl:imports statements to your ontology. Importing means that ALL the statements of the imported ontology will be present in the ontology you import it into. You cannot change any imported statements, but you can add statements that refer to the imported elements. A common way to reuse a Content ODP is to import it into your ontology (module) and then proceed to specialize it.
    • Specialize - An operation that can be performed on classes or properties in OWL by adding rdfs:subClassOf or rdfs:subPropertyOf statements, i.e. creating more specific classes and properties. When we say that we "specialize" a Content ODP it means to specialize one or more of its elements, add more specific restrictions etc.
    • Extend - Usually there are some CQs, or parts of CQs, in your requirements list that are not covered by any Content ODP, or that are so simple that no ODP is needed (e.g. the CQ "What is the birth date of a certain person?" is a perfect candidate for modelling using a simple datatype property, rather than trying to reuse some ODP). So after specializing some ODPs, you may need to add some extra properties/classes/restrictions in order to cover the CQs completely.
    • Integrate/compose - A special case of extending your model, but that is often forgotten by inexperienced developers: if you reuse several ODPs most likely you need to add something to connect them, in order to support your CQ, e.g. a property or a restriction perhaps.
  • 9. Test and fix. - This is a very important step. As soon as you have finished the small module representing the CQs you selected (one or a couple), you need to test the model against the CQs. This means that you should transform the CQs into queries and use test instances to check if the queries give correct results. In most cases it is not actually running the queries that will point you at the mistakes, but rather trying to write them (then you discover if you've missed something or if there is some other issue). For an OWL model we can use the query language SPARQL to write these queries; such queries are usually called "unit tests". Test instances can be added based on the user story you had at the beginning, or invented. Another test that can be good to try, apart from the unit tests, is to run the reasoner. First of all you can make sure you have a consistent model, but if you also add some instances you can then see if you agree with all the new classifications etc. that the reasoner gives you. If you don't agree, e.g. your person "Mike" is classified to be of type dog, then you can start debugging from there, trying to analyze how this was inferred. For any error you find, don't proceed until you have removed it and all tests run without problems.
  • 10. Release module. - Make sure your module is properly documented before you "release" it, i.e. that all elements have an easy-to-understand human-readable label, and that you add comments to all elements that are not completely self-explanatory. Also annotate the module as a whole, for example using the annotation properties in the CPAnnotationSchema, where you have properties that for instance can store your requirements related to that module (i.e. you store your CQs as a string value with your ontology). This point is also the end of the iteration for the design pair. Now the pair goes back to see if they have any CQs left from the current story; if so they proceed from step 5. If not, but there are still stories left to be treated, they go back to step 3.
  • 11. Integrate, test and fix. - Once at least two modules have been released they need to be integrated into an overall solution, i.e. ontology. Depending on what overall architecture you want for the ontology you can do it in different ways, but one common way is to create a new ontology file for the "overall solution", and when new modules become available you import them there and integrate them with the rest. All kinds of problems can occur here. Sometimes it may be enough to just define some alignments, if there are for example overlaps, while in other cases you may want/need to do some refactoring of the model, e.g. to remove overlaps or solve inconsistencies. When you are done you should be able to run all tests from step 9 again (you may have to change the names in the SPARQL queries if you changed something) and all results should still be correct. Here you should also add any "global" constraints that may exist, e.g. restrictions from the customer that were not related to just one user story.
  • 12. Release new version of the ontology. - Finally, you release a new version of the ontology, properly annotated with comments etc.
  Note: XD is task-focused! This means that you should model what is in the current CQs, not less but not more either.
  It is important not to let things get out of hand by starting to model "the whole world", just because maybe it can be useful...

Now, just remember that this is not a prescriptive method, in the sense that you can skip steps or run them in parallel if you want, but this is a guideline that describes how we have worked with Content ODPs and reached good results.
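The iteration structure of the steps above can be sketched in a few lines of code. This is only one illustrative reading of the guideline, not part of XD or XD Tools: the function name run_xd, the log strings and the CQ labels are made up for the example, and it ignores that several pairs normally work in parallel.

```python
def run_xd(stories):
    """Walk through XD steps 3-12 for a single development pair.

    `stories` maps a story name to its CQ batches (lists of CQ labels);
    each batch stands for the coherent set of CQs chosen in step 5.
    """
    log = []
    released = []
    for story, batches in stories.items():               # step 3: select a story
        for batch in batches:                             # step 5: select one or more CQs
            module = "module[" + ", ".join(batch) + "]"   # steps 6-8: match, select, reuse ODPs
            log.append("test " + module)                  # step 9: unit tests and reasoner
            log.append("release " + module)               # step 10: document and release
            released.append(module)
            if len(released) >= 2:                        # step 11 needs at least two modules
                log.append("integrate %d modules, rerun all tests" % len(released))
                log.append("release ontology version")    # step 12
    return log


# Hypothetical input: one story whose CQs were grouped into two batches.
log = run_xd({"theater productions": [["CQ1", "CQ2", "CQ3"], ["CQ4", "CQ5"]]})
for entry in log:
    print(entry)
```

The inner loop corresponds to going back to step 5 while CQs remain, and the outer loop to going back to step 3 while stories remain.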

Content ODP Tutorial

So, let's run a little exercise to try out the Content ODPs and the XD methodology!


Below is the current context that you can imagine, and then a set of requirements to model (so let's assume that you have already selected this story, although the story is actually too long to be a really good one, and that you have agreed on the CQs with the customer).


Problem

Develop an OWL ontology starting from the below story and CQs. Note that the text is only there to help you understand the domain and the context of the ontology, but the actual modelling requirements are the competency questions and the contextual statement (the restriction). You can find the Content ODPs at this page or you can access them through the XD Tools plugin.

  Important: We are assuming that you will use OWL 1.0 (some of the patterns are not intended
  for OWL 2), and your model should be within OWL-DL. Treat the CQs and the contextual statement as the
  requirements of your model. The result of your modelling should represent these, but also conform to
  modelling "best practices", e.g. appropriate naming of concepts.


Context

The national association for promotion of theater in Italy wants to set up a web-based system for keeping track of details about theater productions and the actors at different theaters. In order to support reasoning about the productions, the system should be based on an ontology. Below are some typical situations that should be representable in the ontology, and requirements in the form of competency questions and contextual statements.


Story: theater productions

During each year a number of theatre festivals are held in cities around Italy. In January 2007 a festival called “Roma Loves Shakespeare” took place in Rome. Two different productions of “The Merchant of Venice” participated, one from a theatre in Pisa and the other from a theatre institute in Venice, featuring an ensemble of university art students. Other plays were Othello and a Midsummer Night’s Dream.

The Grand Theatre in Rome offers two theatre shows each evening during September and October 2009. The play set up in this period is the "Merchant of Venice", given through an ensemble of well-known Italian actors. The Merchant of Venice was written during 1596 to 1598 by William Shakespeare, and it has 5 distinct acts. The premier of this production at The Grand Theatre was on September 7. Il Gazzettino gave the setup of the play 5 stars in a recent review.

Fabio Bianchi is an Italian actor employed at the theatre since May 2004, he is a part of the ensemble setting up the Merchant of Venice and he plays the Duke of Venice but also a servant in one of the scenes. During the second and third week of September the role of Shylock is played by Arnold Schwarzenegger as a special guest actor.


Competency questions (CQs) and contextual statements of theater production

  1.  When did a certain theatre festival take place?
  2.  Where did a certain festival take place?
  3.  What plays could be seen during a certain theatre festival?
  4.  In what city is a certain theatre located?
  5.  In what country is a certain city located?
  6.  What play is the basis of a certain production?
  7.  Who are the members of a certain ensemble during a certain time?
  8.  What plays did a certain author write?
  9.  During what time period was a certain play written?
  10. How many acts does a particular play contain?
  11. When was the premier of a certain production?
  12. What is the “star rating” given by a certain newspaper for a certain production?
  13. At what time did a certain actor start working for a specific theatre?
  14. What roles does a certain person have within a certain production during a certain time?
  Contextual statement: 
  1. A production has exactly one premier.


Exercise

Above are the requirements and the scenario to address. So, how to start? In terms of the XD methodology, we are now at step 5, about to select one or more CQs to start solving. If you start reading the requirements, you realize that some "naturally" go together and some are more independent, i.e., 1-3 all talk about theater festivals and what they are about, 4-5 talk about locations, while 6 may be treated independently or together with 8-10, which are also about plays, etc. The idea is to decide on one CQ, or such a set of CQs (but it should be a SMALL set, i.e. no more than 3-4 at a time is advisable), to start working with. Let's start with CQs 1-3.

Step 6 then tells us to go look for Content ODPs. If we are using NTK (or TopBraid Composer) with the XD Tools this means we switch to the XD perspective and start browsing the available repositories, or search for patterns using the ODP selector view. When we mark an ODP in the list, either of the repository browser or the search results, we can see its annotations in a view to the right. If we are instead using the ODP portal to browse Content ODPs, we see the list of all proposed Content ODPs (note that they are all "proposals", i.e. suggestions by the community, but not always discussed and agreed so evaluate for yourself if it seems reasonable - the certification process of the portal has not been started yet), and we can click on the name of a pattern to get to see all the information about it (including a graphical illustration, which is not available through the XD Tools currently). Whatever method you use, you will probably be most interested in the information under the headings "domain", "intent" and "competency questions" (covered requirements) to start with.

Now, it is important to focus on the "modelling issue" rather than the individual pieces of the problem. So what are we going to model? A festival. What is a festival? Some kind of event. Basically a festival is something that connects some set of plays to some time and some location, according to our requirements (in reality a festival may be more things, e.g. have a name, but we care only about our requirements now). Let's look at some patterns and see if something fits. I go to http://ontologydesignpatterns.org/wiki/Submissions:ContentOPs (you can also use XD Tools) and look at the list of patterns. First I make a rough selection based on the intent and domain presented in the list: I find the co-participation pattern, which has something to do with events (domain is "general"), as well as n-ary participation and time-indexed participation. I also find the time-indexed situation, and the situation pattern itself (a festival could possibly be seen as a situation).

So then I proceed to investigate each of them. Co-participation has the event and things participating in the event, but the participants are objects. Are plays really objects? Probably not. N-ary participation has the event and models the participation as a separate class, representing it as an n-ary relation in order to express that something can participate in an event only during some particular time period. In our case, however, it is not the participation that has a time, it is the event itself. Time-indexed participation is very similar, also setting a time for the actual participation. So neither of those fits. Time-indexed situation seems more reasonable: here it is the actual situation that has a time, but that time is an interval. The situation pattern is just a projection of the n-ary relation logical pattern onto the domain of general situations, so this would work, but the pattern does not include the time, so we would have to add it.

For choosing between situation and time-indexed situation we need to figure out if the time here is a time interval or not. The time-indexed situation is actually just a composition of the situation and time interval patterns, so situation is a more general pattern than the composition. The requirements don't tell us whether we have an interval so in principle we have to ask the customer, but since this is an example we just assume the most reasonable thing. A theater festival would in my opinion possibly span several days, or at least some time interval during one day, so a time interval seems reasonable. Hence, I choose the time-indexed situation pattern. I now download the OWL building block and import it in an empty ontology, or use the XD Tools specialization wizard to create a new module where to import my result and then go through the specialization steps. Let's assume we specialize it without the wizard, so you get the idea of what to do.

First I create a new ontology module, i.e. a new ontology with its own URI. Remember that URIs should be resolvable, so we should use a URI that really exists, and then give the ontology module a good and descriptive name as well. I choose the URI: http://www.ontology.se/examples/TheaterFestival.owl because I will put the module in the folder "examples" on the ontology.se domain when I'm done. This URI now becomes the XML namespace of this ontology. When I have the empty ontology, I proceed to import the OWL building block of the pattern, i.e. timeindexedsituation.owl. The pattern locally defines a type of situation, and a couple of properties, but it also in turn imports the situation and time interval patterns, as well as the cpannotationschema, which defines a set of annotation properties for documenting patterns (and ontologies in general).
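Because imports are followed transitively, importing one building block pulls in everything that block imports in turn. The sketch below illustrates this with a plain dictionary rather than a real OWL API. Only timeindexedsituation.owl is named in the text above; the other file names and the helper imports_closure are assumptions made for the example.

```python
# Direct imports of each ontology. Only timeindexedsituation.owl is named in
# the tutorial; the other file names are assumed for illustration.
DIRECT_IMPORTS = {
    "TheaterFestival.owl": ["timeindexedsituation.owl"],
    "timeindexedsituation.owl": [
        "situation.owl",
        "timeinterval.owl",
        "cpannotationschema.owl",
    ],
}


def imports_closure(ontology, direct_imports=DIRECT_IMPORTS):
    """Return every ontology reachable from `ontology` through imports."""
    reached = set()
    pending = [ontology]
    while pending:
        current = pending.pop()
        for imported in direct_imports.get(current, []):
            if imported not in reached:
                reached.add(imported)
                pending.append(imported)
    return sorted(reached)


closure = imports_closure("TheaterFestival.owl")
print(closure)
```

So although the new module states a single import, the statements of four ontologies end up being visible in it.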

The next step is to start the specialization. The rule of thumb is to specialize all the most specific classes and properties of the pattern, as far as we need them. Already when we analyzed the pattern before selecting it we thought of the festival as a kind of situation, so this should most likely be a specialization of the class "TimeIndexedSituation". Let's add it as a subclass, TheaterFestival. Now we have the three things connected to the festival: the place, the time interval, and the plays that are shown. Time interval is already there as a class. Should we specialize it or not? That depends on whether we intend to keep the imported pattern or remove it later. If we intend to remove it later we should make a subclass so that the module is more self-contained, but otherwise it does not make much sense to have a particular type of interval just for festivals. I don't make a subclass; however, I add a restriction on the TheaterFestival class to say that it always has some time interval associated. We can see that it is the "atTime" property that connects a time-indexed situation to a time interval, so I use this and state that TheaterFestival is a subclass of all things that have some (this is owl:someValuesFrom) time interval. Why not equivalent? Because there are other things that have time intervals that are not theater festivals (a necessary but not sufficient condition)! Another possible condition would be that a theater festival has exactly one time interval.
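The difference between a subclass axiom (necessary condition) and an equivalence axiom (necessary and sufficient condition) can be pictured with ordinary sets. This is only an extensional illustration over a handful of invented individuals; OWL semantics is about all possible interpretations, not one fixed set of instances.

```python
# Invented individuals: one festival, plus one other thing with a time interval.
has_some_time_interval = {"RomaLovesShakespeare", "MerchantOfVeniceRunAutumn2009"}
theater_festivals = {"RomaLovesShakespeare"}

# subClassOf: every theater festival has some time interval (necessary condition).
is_subclass = theater_festivals <= has_some_time_interval

# Equivalence would also need the reverse direction (sufficient condition):
# everything with a time interval would have to be a theater festival.
is_equivalent = theater_festivals == has_some_time_interval

print(is_subclass, is_equivalent)
```

With an equivalence axiom, a reasoner would classify anything that has a time interval as a theater festival, which is the kind of unwanted inference that the test step is meant to catch.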

Now, let's deal with the location. I want to say that the TheaterFestival is held in a place. I create the class "Place" and then create a sub-property of the "forEntity" property, which intends to relate the involved entities to the time-indexed situation (in our case one such entity is the place). I name it "hasLocation". The forEntity property has an inverse, called "hasTimeIndexedSetting", so I make sure I create an inverse of hasLocation as a subproperty of this, e.g. "isLocationOf", and explicitly state that they are inverses of each other. Then I again add a restriction on the TheaterFestival class, to say that all festivals are located somewhere.
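To see what the sub-property and inverse declarations buy us, the following sketch derives the triples a reasoner would be expected to add from a single asserted statement. It is a toy forward-chaining loop over Python tuples, not an OWL reasoner; the property names come from the text, while the function name infer and the instance data are chosen for the example.

```python
# Property hierarchy and inverses as described in the text.
SUB_PROPERTY_OF = {
    "hasLocation": "forEntity",
    "isLocationOf": "hasTimeIndexedSetting",
}
INVERSE_OF = {
    "hasLocation": "isLocationOf",
    "isLocationOf": "hasLocation",
    "forEntity": "hasTimeIndexedSetting",
    "hasTimeIndexedSetting": "forEntity",
}


def infer(triples):
    """Add super-property and inverse triples until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            new = set()
            if p in SUB_PROPERTY_OF:
                new.add((s, SUB_PROPERTY_OF[p], o))
            if p in INVERSE_OF:
                new.add((o, INVERSE_OF[p], s))
            if not new <= inferred:
                inferred |= new
                changed = True
    return inferred


asserted = {("RomaLovesShakespeare", "hasLocation", "Rome")}
closure = infer(asserted)
for triple in sorted(closure):
    print(triple)
```

Stating the location once with hasLocation is therefore enough to make the festival retrievable through the pattern's general forEntity property and from the place through isLocationOf.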

Finally, we have the plays. I add a class "Play" and a property that is again a sub-property of the forEntity property. I create its inverse, and add a restriction on the TheaterFestival class.

Before leaving this problem and moving on to the next, I should test my module and check that I can create queries to retrieve the information corresponding to the CQs. First, I add some example instances (e.g. from the story) and then I write the following SPARQL query:

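  # Prefix URIs are assumptions; use the namespaces of your own module and its imports.
  PREFIX : <http://www.ontology.se/examples/TheaterFestival.owl#>
  PREFIX timeindexedsituation: <http://www.ontologydesignpatterns.org/cp/owl/timeindexedsituation.owl#>
  PREFIX timeinterval: <http://www.ontologydesignpatterns.org/cp/owl/timeinterval.owl#>
  SELECT ?festival ?start ?end ?place ?play
  WHERE {
    ?festival :hasPlay ?play .
    ?festival :hasLocation ?place .
    ?festival timeindexedsituation:atTime ?time .
    ?time timeinterval:hasIntervalEndDate ?end .
    ?time timeinterval:hasIntervalStartDate ?start .
  }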

With this small model I can see already when writing the query that it is going to give me the right results, but it is an important check. I also run the reasoner and check all the inferences I get, so that there is nothing strange there. Finally, when everything is ready I also check that I have annotated my module properly. All classes and properties should have labels, and to be safe also comments. Then the ontology could be commented as well, for instance using the cpannotationschema, to for example store the CQs corresponding to the module as the "covered requirements". My finished module can be found here: http://www.ontology.se/examples/TheaterFestival.owl
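To make the idea of a CQ "unit test" concrete without any RDF tooling, the sketch below evaluates the same five triple patterns over a few test instances using plain Python tuples. It is a toy pattern matcher, not SPARQL; namespaces are dropped, the function name match is made up, and the exact start and end dates are invented test values (the story only says January 2007).

```python
# Test instances: festival, place and plays come from the story;
# the exact start and end dates are invented.
TRIPLES = [
    ("RomaLovesShakespeare", "hasPlay", "TheMerchantOfVenice"),
    ("RomaLovesShakespeare", "hasPlay", "Othello"),
    ("RomaLovesShakespeare", "hasPlay", "AMidsummerNightsDream"),
    ("RomaLovesShakespeare", "hasLocation", "Rome"),
    ("RomaLovesShakespeare", "atTime", "January2007"),
    ("January2007", "hasIntervalStartDate", "2007-01-01"),
    ("January2007", "hasIntervalEndDate", "2007-01-31"),
]

# The same triple patterns as in the query; a leading "?" marks a variable.
PATTERNS = [
    ("?festival", "hasPlay", "?play"),
    ("?festival", "hasLocation", "?place"),
    ("?festival", "atTime", "?time"),
    ("?time", "hasIntervalEndDate", "?end"),
    ("?time", "hasIntervalStartDate", "?start"),
]


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)


rows = list(match(PATTERNS, TRIPLES))
for row in rows:
    print(row["?festival"], row["?start"], row["?end"], row["?place"], row["?play"])
```

The test passes if there is one row per play, each carrying the festival's place and interval. If, say, the property linking the festival to its place were missing from the model, the pattern could not even be written, which is exactly the kind of gap this step is meant to reveal.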

Time to select a new set of CQs. Let's take the next two (CQs 4-5), about locations. Again, I create a new ontology module, i.e. a new ontology with its own URI. Then I start to look for patterns... and so on.

Time for you to do some modelling on your own! Proceed to follow the XD iterations and after each iteration also try to integrate your modules, i.e. when you finish the locations-module, create a new empty ontology, import both modules so far, and connect them. Either you are satisfied with just defining an alignment (e.g. stating that some classes and/or properties are equivalent or sub/super classes/properties) or you decide to do some refactoring, i.e. to go back into one module or the other and make some changes, e.g. so that both modules don't define the same concept, but maybe one module instead refers to (or even imports) the other one. If you make changes, then make sure that all your SPARQL queries still run (i.e. you have to rewrite them as well to be able to test). In this way you build your ontology incrementally, with the focus on one problem at a time.

A suggested complete solution can be found here: http://www.ontology.se/examples/TheaterOntology.owl This is not the only correct solution, merely one possibility!


References

(1) Valentina Presutti, Enrico Daga, Aldo Gangemi, and Eva Blomqvist. eXtreme Design with content ontology design patterns. In: Proceedings of the Workshop on Ontology Patterns (WOP 2009), collocated with the 8th International Semantic Web Conference (ISWC-2009), Washington D.C., USA, 25 October, 2009. Vol. 516. CEUR Workshop Proceedings, 2009. Available at: http://ceur-ws.org/Vol-516/pap21.pdf