Use Case Documenting Axiom Formulation

From XG Provenance Wiki


Yolanda Gil


Paolo Missier

Provenance Dimensions

  • Primary: Attribution (Content)
  • Secondary: Entailment (Content), Argumentation/Justification (Content), Understanding (Use), Trust (Use)

Background and Current Practice

Much of the content of the semantic web is built by hand, including ontologies, linked data, and mashups. Many axioms and assertions are therefore formulated by a person based on their understanding of how to model the domain at hand. In creating those axioms, the developer often consults documents and sources, makes assumptions, and integrates information. It would be useful to record in enough detail which original sources were consulted, which pieces seemed contradictory or vague, which were dismissed, which additional hypotheses were formulated to complement the original sources, and ultimately how the axioms came about. However, this kind of information is not captured in current practice. Ontologies, assertions, and resources lack records that provide the rationale for their design, which makes it hard for others to reuse those ontologies and data. Such information would reveal, for example, which aspects or areas of the ontology users can be more confident about, because more resources were used to develop those areas or because more backing by sources is provided.

There are several other potential benefits to including this rationale within an ontology, such as supporting its maintenance, facilitating its integration with other ontologies, and integrating (or transferring) knowledge among heterogeneous systems.

This kind of information would also be useful to justify answers to end users not in terms of what reasoning steps were used but in terms of what initial knowledge the system had and where it came from. This type of justification is distinct from justifications of the reasoning itself, because here we are not interested in justifying the system's inferences but rather what sources were consulted to give the system its initial set of axioms.


A user wants to understand what an ontology axiom or an assertion means in the context of how it was created. This use case covers axioms and assertions that are created by hand.

Use Case Scenario

A user is trying to create a resource (e.g., a set of RDF triples or a mashup) listing ski resorts near their city for weekend trips. The user consults many sources and creates the resource, indicating that it results from combining information from four different sources. Source A is published by the visitor's bureau of the city and lists all nearby ski stations, but only within a 50 mile radius of the city. Source B is published by the state and shows two additional stations that are reasonably close. Source C shows the traffic patterns on weekends for the roads to the ski stations. Source D is a national weather source that shows the snow conditions over the winter and spring months. The user builds a small ontology of ski resort properties that is used to decide whether or not to include a given ski resort.

A second user discovers this resource and wonders why a certain ski station is not included. After checking the sources used to create the resource, he concludes that the missing ski station is in a bordering state and was therefore not included. He also finds that the traffic source used is unknown to him. This user can decide either to build on what the original user did or to dismiss it as not useful for his purposes.

A third user uses this resource and is surprised to find in it a ski station that he believes is very close to the city but does not get very much snow during the skiing season. He looks at the properties of the ski resorts that are defined in the ontology and finds the definition of the property for "snow conditions". The definition is attached to the Source D documentation, which discusses average snowfall. The user realizes that the creator of this resource did not take into account the true condition of the snow (e.g., packed powder, fresh powder) or the proportion of lifts that are open during the season. This explains why the ski station was included in the resource. The user decides to dismiss the resource because those criteria do not reflect his own.

Note that the original user who created the resource may have consulted many other documents besides sources A, B, C, and D. It may be useful to also record sources that were consulted but that the user did not find useful. This may help someone determine how thorough or informed the user was (or what aspects they focused on) and, based on that, decide whether they would find the resource useful.
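
As a rough illustration of the scenario, the record kept alongside the resource could list both the sources that were used and those that were consulted but set aside. The sketch below is only one possible shape for such a record: the class names `Source` and `ResourceProvenance`, their fields, and the extra source "E" with its reason are all hypothetical and are not part of the use case.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Source:
    """A document or dataset the creator looked at (hypothetical structure)."""
    label: str
    publisher: str
    coverage: str


@dataclass
class ResourceProvenance:
    """Sources behind a hand-built resource, including dismissed ones."""
    resource: str
    used: List[Source] = field(default_factory=list)
    dismissed: Dict[str, str] = field(default_factory=dict)  # label -> reason

    def record_used(self, source: Source) -> None:
        self.used.append(source)

    def record_dismissed(self, label: str, reason: str) -> None:
        self.dismissed[label] = reason

    def used_labels(self) -> List[str]:
        return [s.label for s in self.used]


prov = ResourceProvenance("ski resorts for weekend trips")
prov.record_used(Source("A", "city visitor's bureau", "stations within a 50 mile radius"))
prov.record_used(Source("B", "state", "two additional stations reasonably close"))
# The publisher of Source C is not stated in the scenario.
prov.record_used(Source("C", "not stated", "weekend traffic on roads to the stations"))
prov.record_used(Source("D", "national weather source", "snow conditions, winter and spring"))
# Source "E" and its reason are invented purely for illustration.
prov.record_dismissed("E", "consulted but not found useful")
```

With a record of this kind, the second user could call `prov.used_labels()` to see which sources were combined, and inspect `prov.dismissed` to judge how thorough the creator was.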

Problems and Limitations

In the simplest cases, axioms can be associated with the documents and resources (or portions) that back them up. In more complex cases, entire groups of axioms (graphs) may need to be associated with such provenance information.

The provenance of a set of axioms may be much larger than the axioms themselves. Mechanisms therefore need to be in place for managing this provenance: efficient reasoning on the axioms alone when provenance is not required, and inclusion of the provenance information when it is needed.
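
One way to picture these requirements is to keep the axioms and their provenance in separate structures, so that a reasoner can work on the bare axioms while provenance is looked up only on request. The following is a minimal sketch under that assumption, not a proposed design: the `AnnotatedStore` class, its method names, the `ex:` identifiers, and the "snow-model" group name are hypothetical. Provenance can be attached either to a single axiom or to a named group of axioms.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class AnnotatedStore:
    """Axioms kept apart from their provenance (illustrative only)."""

    def __init__(self) -> None:
        self._axioms: Set[Triple] = set()
        self._axiom_sources: Dict[Triple, List[str]] = {}
        self._groups: Dict[str, Set[Triple]] = {}
        self._group_sources: Dict[str, List[str]] = {}

    def add(self, axiom: Triple, sources: Iterable[str] = (),
            group: Optional[str] = None) -> None:
        """Store an axiom, optionally with its own sources and a group."""
        self._axioms.add(axiom)
        for src in sources:
            self._axiom_sources.setdefault(axiom, []).append(src)
        if group is not None:
            self._groups.setdefault(group, set()).add(axiom)

    def annotate_group(self, group: str, sources: Iterable[str]) -> None:
        """Attach sources to a whole group of axioms at once."""
        self._group_sources.setdefault(group, []).extend(sources)

    def plain_axioms(self) -> FrozenSet[Triple]:
        """Axioms only, with no provenance attached, for reasoning."""
        return frozenset(self._axioms)

    def provenance(self, axiom: Triple) -> List[str]:
        """Sources for one axiom: its own, then those of its groups."""
        found = list(self._axiom_sources.get(axiom, []))
        for group, members in self._groups.items():
            if axiom in members:
                found.extend(self._group_sources.get(group, []))
        return found


store = AnnotatedStore()
near = ("ex:StationX", "ex:nearCity", "true")
snow = ("ex:StationX", "ex:snowConditions", "ex:AverageSnowfall")
store.add(near, sources=["Source A"])
store.add(snow, group="snow-model")
store.annotate_group("snow-model", ["Source D"])
```

Here `store.plain_axioms()` returns only the triples, while `store.provenance(snow)` retrieves the backing source through the group annotation, which is how the third user in the scenario might trace the "snow conditions" property back to Source D.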

As the original resource evolves, the provenance information needs to be updated or extended accordingly.

Existing Work

The use case is described in terms of the use of semantic web ontologies and data, but its motivation comes from uses of ontologies for engineering problems. Consider, for example, a system developed to estimate the duration of specific engineering tasks, such as repairing a damaged road or leveling uneven terrain. Users invariably wanted explanations about where the answers came from in terms of the sources we consulted and the sources that we chose to pursue. They wanted to know whether well-known engineering manuals were consulted, which were given more weight, whether practical experience was considered to refine theoretical estimates, and what authoritative sources were consulted to decide among competing recommendations. In other words, the analysis process that knowledge engineers and developers perform is part of the rationale that needs to be captured in order to justify answers to user queries.

[Gil EKAW 02] describes a tool that enables knowledge base developers to keep track of the knowledge sources and intermediate knowledge fragments that result in a formalized piece of knowledge. The resulting ontology is enhanced with pointers that capture the rationale of its design and development.