PIL OWL Ontology

From Provenance WG Wiki
Revision as of 17:45, 18 October 2011 by Ssahoo2 (Talk | contribs)


Model Task Force


Satya has taken the lead in developing the OWL ontology for PIL. Others who are helping include:

  • Khalid Belhajjame (GMT)
  • James Cheney
  • Daniel Garijo
  • Tim Lebo (Eastern Time)
  • Deborah McGuinness (Eastern Time)
  • Luc Moreau
  • Stian Soiland-Reyes (GMT)

Background materials


The OWL ontology materials are in the Mercurial repository at http://dvcs.w3.org/hg/prov/file/tip/ontology

OWL Introduction

  • HTML documentation of the OWL model.

OWL Encoding

Examples and Test cases

@prefix prov: <http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#> .
@prefix ext:  <http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/examples/ontology-extensions/crime-file/crime.owl#> .

Meeting notes

Tools Used


  • Should we also be exposing the (Java?) code that produced the OWL file? Or was Protege used?
  • Can we move the comments from rdfs:label to rdfs:comment?
    • e.g. <rdfs:label rdf:datatype="&xsd;string">A BOB represents an identifiable characterized entity.</rdfs:label> should become rdfs:comment.
    • DanielG: I agree with this change. Also, we should add the labels for each class and the language (e.g., "Agent"@en).
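The label-to-comment change proposed above could be applied mechanically. The following is a minimal sketch using only the Python standard library, assuming that every existing rdfs:label in the file holds a description rather than a short name. The sample document and the http://example.org/onto# namespace are made up for illustration; the real OWL file is not reproduced here.

```python
import xml.etree.ElementTree as ET

RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical miniature of the OWL file; only the label text comes from the wiki page.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/onto#BOB">
    <rdfs:label>A BOB represents an identifiable characterized entity.</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def move_labels_to_comments(xml_text):
    """Parse RDF/XML and retag every rdfs:label element as rdfs:comment."""
    root = ET.fromstring(xml_text)
    # Materialise the matches first so retagging cannot disturb the iteration.
    for element in list(root.iter(f"{{{RDFS}}}label")):
        element.tag = f"{{{RDFS}}}comment"
    return root


root = move_labels_to_comments(SAMPLE)
comments = [el.text for el in root.iter(f"{{{RDFS}}}comment")]
remaining_labels = list(root.iter(f"{{{RDFS}}}label"))
```

The sketch does not add the short, language-tagged labels (such as "Agent"@en) that DanielG suggests; those would still have to be written by hand.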

Initial comments/suggestions about the ontology

  • Time can be reused from other ontologies instead of defining our own concept in the ontology.
    • Suggestions:
    • W3C's Time Ontology: Addresses time instants and intervals, so we could reuse it. (Daniel G)
  • Location can be reused from other popular ontologies instead of defining our own concept in the ontology.
    • Suggestions:
    • wgs_84: It is widely used already, it is simple and provides the concept SpatialThing to relate to anything with spatial extent. (Daniel G)
  • Missing relationship between generation/use/derivation and Time/Location. (Daniel G)
    • 2 different ways to address this issue:
    • Define subproperties (generatedAtTime, generatedAtLocation). This is simpler, but it would lose information (we assert the facts on the process execution rather than on the relationship itself). However, it looks better for inferring new knowledge. Example of modelling:


    • Make the properties n-ary. This would require declaring the properties as concepts in the ontology (and it may be more difficult to infer new knowledge). Example of how it would be modelled in OWL (n-ary relationship pattern):


  • Arities are missing (still to do).
  • Revision, Location, ProcessExecution are not subclasses of BOB in the current OWL document, but they are in the ontology spec.
  • Roles are not represented yet (Luc) - issue has been addressed
  • It would be good if names of relationships and "direction" were compatible with Model (see appendix A for conventions). Specifically, in the graphical notations, edges tend to point to the past. isUsedBy should become uses. (Luc)
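The trade-off between the two ways of relating generation/use to Time, discussed in the list above, can be made concrete with a small sketch. Plain Python tuples stand in for RDF triples. All identifiers (ex:uses, ex:usedAtTime, ex:execution, ex:usedBob, ex:atTime) are hypothetical and chosen only for this illustration; usedAtTime is modelled on the generatedAtTime suggestion.

```python
# Triples are (subject, predicate, object) tuples. Every identifier is illustrative.

# Alternative 1: the time is asserted directly on the process execution.
direct = {
    ("ex:pe1", "ex:uses", "ex:bob1"),
    ("ex:pe1", "ex:uses", "ex:bob2"),
    ("ex:pe1", "ex:usedAtTime", "ex:t1"),
    ("ex:pe1", "ex:usedAtTime", "ex:t2"),
}

# Alternative 2: each use is a node of its own (n-ary relationship pattern).
nary = {
    ("ex:use1", "ex:execution", "ex:pe1"),
    ("ex:use1", "ex:usedBob", "ex:bob1"),
    ("ex:use1", "ex:atTime", "ex:t1"),
    ("ex:use2", "ex:execution", "ex:pe1"),
    ("ex:use2", "ex:usedBob", "ex:bob2"),
    ("ex:use2", "ex:atTime", "ex:t2"),
}


def times_of_use_direct(triples, execution, bob):
    """Candidate times at which `execution` used `bob` under alternative 1."""
    if (execution, "ex:uses", bob) not in triples:
        return []
    # The times hang off the execution, so they cannot be tied to one BOB.
    return sorted(o for s, p, o in triples if s == execution and p == "ex:usedAtTime")


def times_of_use_nary(triples, execution, bob):
    """Times at which `execution` used `bob` under alternative 2."""
    by_execution = {s for s, p, o in triples if p == "ex:execution" and o == execution}
    by_bob = {s for s, p, o in triples if p == "ex:usedBob" and o == bob}
    use_nodes = by_execution & by_bob
    return sorted(o for s, p, o in triples if s in use_nodes and p == "ex:atTime")


direct_times = times_of_use_direct(direct, "ex:pe1", "ex:bob1")
nary_times = times_of_use_nary(nary, "ex:pe1", "ex:bob1")
```

With the direct properties, both t1 and t2 remain candidates for the use of bob1, which is the loss of information mentioned above. The n-ary pattern returns only t1, at the cost of an extra node and a longer query.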

Initial hierarchical diagram of PIL concepts

Hierarchy of concepts (without their relationships)


General diagram


Comments from the diagram:

  • I find it a bit confusing. I think it would be clearer to take the concepts as nodes in the graph and join them with the relationships instead of representing range and domain (or subclass) in the diagram. (Daniel G)
    • Interesting suggestions; could you draw up an example of "concepts as nodes" and "joining them with the relationships"? -TLebo
      • Yes, I was thinking about something like this (DanielG):


Characteristics of Object Properties

The table below summarizes the characteristics of the object properties that are defined in the OWL schema. Some of them may be subject to discussion. In particular, regarding the object properties isControlledBy, isGeneratedBy and isUsedBy, I didn't specify whether they are transitive or not. I am more inclined to specify that they are not transitive. However, one may argue that given that an agent can be a process execution, a process execution pe1 can be controlled by an agent pe2, which happens to be a process execution that is controlled by an agent ag, and that, therefore, ag (indirectly) controls pe1. The same argument can be applied to isGeneratedBy and isUsedBy. That said, I am not convinced these properties should be declared as transitive. (Khalid)

Property        Functional  Inverse functional  Transitive  Symmetric  Asymmetric  Reflexive  Irreflexive
isControlledBy  No          No                  ?           No         Yes         No         Yes
isDerivedFrom   No          No                  Yes         No         Yes         No         Yes
isGeneratedBy   Yes         No                  ?           No         Yes         No         Yes
isUsedBy        No          No                  ?           No         Yes         No         Yes
isPrecededBy    No          No                  Yes         No         Yes         No         Yes
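To make the transitivity argument about isControlledBy concrete, here is a minimal sketch that computes what declaring the property transitive would add. It uses the pe1, pe2 and ag individuals from Khalid's example; the function is a naive closure written for illustration, not how an OWL reasoner works internally.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a set of (subject, object) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # Chain a -> b and b -> d into a -> d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Asserted facts: pe1 isControlledBy pe2, pe2 isControlledBy ag.
controlled_by = {("pe1", "pe2"), ("pe2", "ag")}

# Facts that exist only if isControlledBy is declared transitive.
inferred = transitive_closure(controlled_by) - controlled_by
```

Under the transitive reading, the single extra fact is that pe1 is controlled by ag. Leaving the property non-transitive keeps only the two asserted facts, which is the option Khalid leans towards.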

Cardinalities of Object Properties

The figure below illustrates the cardinalities of the object properties defined so far in the OWL schema. All the cardinalities are zero-to-many, except the one associated with the isGeneratedBy property, which is zero-to-one because a BOB can be generated by at most one process execution.

Object Properties Cardinalities.PNG
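The zero-to-one cardinality of isGeneratedBy can be checked over instance data with a few lines of code. The sketch below is a closed-world check that assumes different names denote different individuals. An OWL reasoner behaves differently: for a functional property it would infer that the two process executions are the same individual unless they are declared different. The ex: identifiers are made up for illustration.

```python
from collections import defaultdict


def functional_violations(triples, prop):
    """Map each subject to its values for `prop` when it has more than one."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return {s: sorted(objs) for s, objs in values.items() if len(objs) > 1}


triples = {
    ("ex:bob1", "isGeneratedBy", "ex:pe1"),
    ("ex:bob1", "isGeneratedBy", "ex:pe2"),  # second generator for the same BOB
    ("ex:bob2", "isGeneratedBy", "ex:pe1"),
    ("ex:pe1", "isUsedBy", "ex:bob3"),       # other properties are ignored
}

violations = functional_violations(triples, "isGeneratedBy")
```

Here only ex:bob1 is reported, because it is the only BOB with two distinct generating process executions.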

Best Practices

Deborah mentioned the possibility of having a separate document for best practices. On this topic, some work has been done by the Semantic Web Best Practices and Deployment Working Group at http://www.w3.org/2001/sw/BestPractices/OEP, which may be worth looking at.


PROV OWL ontology component examples

RDF Graph for Crime File Scenario

RDF/XML notation

moved to http://dvcs.w3.org/hg/prov/file/tip/ontology/examples/ontology-extensions/crime-file/instances/example-1/crime.ttl

Visualization of the RDF graph



Crime File Ontology

moved to http://dvcs.w3.org/hg/prov/file/tip/ontology/examples/ontology-extensions/crime-file/crime.owl

Workflow example

Stian has generated an early example of representing the provenance of a Taverna workflow using this ontology.

An Axiomatic Semantics for RDF, RDF-S, and DAML+OIL


Design Proposals


Dealing with the issues of "uses" relationship

After the 8-08-2011 telecon, we agreed to address this issue with a new modelling alternative introduced by Satya: instead of making the Agent a direct participant in the process execution, we will create an intermediate class for the role of the participant agent. This approach combines the previous two and addresses the issues we had with them. The next image summarizes the modelling in the ontology.
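As a rough illustration of the intermediate role class, the sketch below links a process execution to a role node, and the role node to the agent that plays it. Plain Python tuples stand in for triples. The names ex:hadParticipant, ex:playedBy and ex:CustomerRole are hypothetical and are not taken from the ontology.

```python
# The agent is reached through a role node instead of being attached
# directly to the process execution. All identifiers are illustrative.
triples = {
    ("ex:pe1", "ex:hadParticipant", "ex:role1"),
    ("ex:role1", "rdf:type", "ex:CustomerRole"),
    ("ex:role1", "ex:playedBy", "ex:khalid"),
}


def agents_in(triples, execution):
    """List (agent, role types) pairs for the roles attached to `execution`."""
    roles = {o for s, p, o in triples if s == execution and p == "ex:hadParticipant"}
    result = []
    for role in sorted(roles):
        kinds = sorted(o for s, p, o in triples if s == role and p == "rdf:type")
        players = sorted(o for s, p, o in triples if s == role and p == "ex:playedBy")
        for player in players:
            result.append((player, kinds))
    return result


participants = agents_in(triples, "ex:pe1")
```

Because the role is a node of its own, further facts about the participation (for example when the role was held) can be attached to it without changing the agent or the process execution.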


According to this, our ontology diagram should be something like this:


Roles directly on the prov:used prov:Entity

   # The subject of this fragment is missing on the page; :subject is a placeholder.
   :subject
      a prov:Entity;
      prov:used [
         a prov:Entity;
         prov:actually :Khalid;
         a prov:Role, restaurant:Customer;
         time:begin :t1;
         time:end :t2
      ] .

Possible roles for every process step in the journalism example

(I've added ?? to the ones I'm not completely sure about. (DanielG))

  • government (gov) converts data (d1) to RDF (f1) at time (t1)
    • role of gov: ConverterRole??
    • role of the data: SourceRole??
    • role of the process execution in the outcome generation: GenerationRole
  • government (gov) generates provenance information (prov) regarding RDF (f1)
    • role of gov: GeneratorRole
    • role of the used data: ReferenceRole (data used as reference)
    • role of the process execution in the outcome generation: CreationRole??
  • government (gov) publishes RDF data (f1) along with its provenance (prov) on a portal with a license (li1); the rdf data is now available as a Web resource (r1)
    • role of gov: PublisherRole
    • role of f1: ReferenceRole
    • role of li1: LicensingRole ??
    • role of the process execution in the outcome generation: PublicationRole
  • analyst (alice) downloads a turtle serialization (lcp1) of the resource (r1) from government portal
    • role of alice: RequesterRole
    • role of the resource: RequestedResourceRole ??
    • role of the process execution in the outcome generation: CreationRole (since it creates the file on your computer)
  • analyst (alice) generates a chart (c1) from the turtle (lcp1) using some software (tools1) with statistical assumptions (stats1)
    • role of alice: GeneratorRole
    • role of lcp1: LicensingRole
    • role of tools1: ReferenceSoftwareRole ??
    • role of stats1: ReferenceRole
    • role of the process execution in the outcome generation (c1): CreationRole
  • newspaper (news) obtains image (img1) from freelancer, Carlos.
    • role of news: RequesterRole
    • role of Carlos: ProviderRole
    • role of the process execution in the outcome generation: ObtentionRole??
  • newspaper (news) publishes the incidence map (map1), chart (c1) and the image (img1) within a document (art1) written by (joe) using license (li2)
    • role of news: PublisherRole
    • role of map1: ReferenceRole
    • role of c1: ReferenceRole
    • role of img1: ReferenceRole/IllustrationRole?
    • role of the process execution in the outcome generation: PublicationRole
  • government (gov) publishes an update (d2) of data (d1) as a new Web resource (r2)
    • role of gov: PublisherRole
    • role of d2: UpdateRole
    • role of d1: UpdatedResourceRole??
    • role of the process execution in the outcome generation: PublicationRole
  • blogger (bob) downloads turtle (lcp2) of the resource (r2) from government portal, determines that it's a different version of the same data
    • role of bob: RequesterRole
    • role of lcp2: SerializedResourceRole
    • role of the process execution in the outcome generation: ObtentionRole??
  • blogger (bob) generates new chart (c2) based on the data (lcp2) using some software (tools2) with statistical assumptions (stats2)
    • role of bob: GeneratorRole
    • role of lcp2: ReferenceRole
    • role of tools2: UsedSoftwareRole
    • role of stats2: ReferenceRole
    • role of the process execution in the outcome generation: GenerationRole
  • blogger (bob) publishes the chart (c2) under an open license (li3).
    • role of bob: PublisherRole
    • role of li3: LicensingRole??
    • role of the process execution in the outcome generation: PublicationRole
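One possible way to record such role assignments as data is sketched below for a single step of the list above (bob generating chart c2). The role names are copied from that step. The dictionary layout and the helper function are assumptions made for this illustration, not part of the ontology.

```python
# One process step from the journalism example, with its role assignments.
step = {
    "description": "blogger (bob) generates new chart (c2) based on the data (lcp2)",
    "roles": {
        "bob": "GeneratorRole",
        "lcp2": "ReferenceRole",
        "tools2": "UsedSoftwareRole",
        "stats2": "ReferenceRole",
    },
    # Role of the process execution in the generation of the outcome (c2).
    "execution_role": "GenerationRole",
}


def participants_with_role(step, role):
    """Return the participants of `step` that play `role`, in sorted order."""
    return sorted(name for name, assigned in step["roles"].items() if assigned == role)


references = participants_with_role(step, "ReferenceRole")
```

Grouping participants by role this way makes it easy to spot roles that are reused across steps (ReferenceRole, PublisherRole) and roles that appear only once and may need review.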


Using named graphs to model Accounts

Provenance Containers

Using named graphs to model Provenance Containers