Difference between revisions of "PIL OWL Ontology"

From Provenance WG Wiki
(Steps taken for CR)
Line 312:
 
*** perl -pi -e "s|http://dvcs.w3.org/hg/prov/raw-file/default/model/prov-dm.html|http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/CR-prov-dm-20121211/Overview.html|g" releases/CR-prov-o-20121211/Overview.html
 
 
*** grep prov-dm releases/WD-prov-o-2012MMDD/Overview.html
 
 +
*** <code>hg add releases/CR-prov-o-20121211/Overview.html releases/CR-prov-o-20121211/diagrams/</code>
 +
*** <code>hg commit -m 'staged prov-o html'</code>
 
** Change links to DM's final resting place
 
  

Revision as of 17:53, 21 November 2012

Model Task Force

Participants

Satya has taken the lead in developing the OWL ontology for PIL. Others who are helping include:

  • Khalid Belhajjame (GMT)
  • James Cheney
  • Daniel Garijo (GMT-8h. I'm in California until Oct)
  • Tim Lebo (Eastern Time)
  • Deborah McGuinness (Eastern Time)
  • Luc Moreau
  • Stian Soiland-Reyes (GMT)

Materials

The OWL ontology materials are in the Mercurial repository at http://dvcs.w3.org/hg/prov/file/tip/ontology

Presentation of the ontology

Documentation of the OWL Formal Model

  • HTML documentation of the OWL model.

OWL Encoding

PROV-O URI namespace

Examples and Test cases

@prefix prov: <http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceOntology.owl#> .
@prefix ext:  <http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/examples/ontology-extensions/crime-file/crime.owl#> .

Background materials

http://owl.cs.manchester.ac.uk/validator/

RL compliance

http://www.cs.man.ac.uk/~ezolin/dl/ - Complexity of reasoning in Description Logics

https://www.w3.org/2011/prov/track/issues/265

How to stay RL?

  • Multiple parents are fine, but not unions.

http://www.w3.org/TR/owl2-profiles/#Feature_Overview_3

JAR checker

Stian built an OWL profile checker. This can be invoked using make:

cd ontology/
make

Or directly using java -jar:

ls bin/
  profilechecker.jar
java -jar bin/profilechecker.jar ProvenanceOntology.owl OWL2RLProfile


If everything is fine, there is no further output.

stain@ralph ~/src/provenance-wg/prov/ontology; make
java -jar bin/profilechecker.jar ProvenanceOntology.owl OWL2RLProfile

However, if I add that Element is a subclass of (Activity or Entity) I get:

stain@ralph ~/src/provenance-wg/prov/ontology; make
java -jar bin/profilechecker.jar ProvenanceOntology.owl OWL2RLProfile

Use of non-superclass expression in position that requires a
superclass expression:
ObjectUnionOf(<http://www.w3.org/ns/prov-o/Activity>
<http://www.w3.org/ns/prov-o/Entity>)
[SubClassOf(<http://www.w3.org/ns/prov-o/Element>
ObjectUnionOf(<http://www.w3.org/ns/prov-o/Activity>
<http://www.w3.org/ns/prov-o/Entity>)) in
<http://www.w3.org/ns/prov-o/>]
make: *** [test] Error 1

See https://github.com/stain/profilechecker for source code of the JAR. It is based on OWL API 3.2.4, and can also check against other (or all) profiles.

cd
java -jar bin/profilechecker.jar ProvenanceOntology.owl --all
OWL2DLProfile: OK
OWL2ELProfile: 64 violations
OWL2Profile: OK
OWL2QLProfile: 12 violations
OWL2RLProfile: OK

QL does not like irreflexive and functional. EL just goes crazy over annotation properties. Not sure if I need to include an RDFS owl.
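
When only the failing profiles matter, the --all summary above can be post-processed. The following is a minimal Python sketch, assuming the output consists of exactly the "Profile: OK" and "Profile: N violations" lines shown above. The function name failing_profiles is hypothetical and is not part of profilechecker.

```python
import re

# Sample text copied from the --all run shown above.
SAMPLE_OUTPUT = """\
OWL2DLProfile: OK
OWL2ELProfile: 64 violations
OWL2Profile: OK
OWL2QLProfile: 12 violations
OWL2RLProfile: OK
"""


def failing_profiles(output):
    """Return {profile name: violation count} for profiles that are not OK."""
    failures = {}
    for line in output.splitlines():
        # Assumed line format: "<ProfileName>: <N> violations"
        match = re.match(r"^(\w+): (\d+) violations?$", line.strip())
        if match:
            failures[match.group(1)] = int(match.group(2))
    return failures


print(failing_profiles(SAMPLE_OUTPUT))
```

Lines reporting OK are simply skipped, so an empty dictionary means every profile passed.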

Breaks RL: one startedAt

+        <rdfs:subClassOf>
+            <owl:Restriction>
+                <owl:onProperty rdf:resource="&prov;startedAt"/>
+                <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:cardinality>
+            </owl:Restriction>
+        </rdfs:subClassOf>


Use of non-superclass expression in position that requires a superclass expression: ObjectExactCardinality(1 <http://www.w3.org/ns/prov-o/startedAt> owl:Thing) [SubClassOf(<http://www.w3.org/ns/prov-o/Activity> ObjectExactCardinality(1 <http://www.w3.org/ns/prov-o/startedAt> owl:Thing)) in <http://www.w3.org/ns/prov-o/>]

Breaks RL: one TimeInstant on startedAt

+        <rdfs:subClassOf>
+            <owl:Restriction>
+                <owl:onProperty rdf:resource="&prov;startedAt"/>
+                <owl:onClass rdf:resource="&prov;TimeInstant"/>
+                <owl:qualifiedCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:qualifiedCardinality>
+            </owl:Restriction>
+        </rdfs:subClassOf>
Use of non-superclass expression in position that requires a superclass expression: ObjectExactCardinality(1 <http://www.w3.org/ns/prov-o/startedAt> <http://www.w3.org/ns/prov-o/TimeInstant>) [SubClassOf(<http://www.w3.org/ns/prov-o/Activity> ObjectExactCardinality(1 <http://www.w3.org/ns/prov-o/startedAt> <http://www.w3.org/ns/prov-o/TimeInstant>)) in <http://www.w3.org/ns/prov-o/>]

Breaks RL: Subclass of owl:Thing

To eliminate this error, remove "subclassOf Thing"

     <owl:Class rdf:about="&prov;Entity">
-        <rdfs:subClassOf rdf:resource="&owl;Thing"/>
         <rdfs:comment xml:lang="en">An identifiable characterized entity.</rdfs:comment>
         <prov:category>simple</prov:category>
         <rdfs:seeAlso rdf:resource="http://www.w3.org/2011/prov/wiki/ProvRDF#Entity"/>
Use of non-superclass expression in position that requires a superclass expression: owl:Thing [SubClassOf(<http://www.w3.org/ns/prov#Entity> owl:Thing) in <http://www.w3.org/ns/prov#>]

automated DM definitions

(Luc, 12 March)

Look at file: http://dvcs.w3.org/hg/prov/raw-file/default/model/prov-glossary.html

You need to add the following div element.

<div id="glossary_div" class="remove">
<!-- glossary loaded from glossary.js will be hooked up here,
    class remove, will remove this element from the final output.
-->
</div>

You need to add the following, which is generated automatically from the glossary.html file.

<script src="glossary.js" class="remove"></script>



You need to add a bit of javascript at the beginning
<script class="remove">
     // Copy each referenced glossary definition into the element that refers to it.
     function updateGlossaryRefs() {
       $('.glossary-ref').each(function(index) {
         var ref=$(this).attr('ref');
         var span=$(this).attr('withspan');
         // look up directly in the_glossary
         //$(this).html(the_glossary[ref]);
         // if glossary is in a string:
         $('#'+ref+'.glossary').contents().clone().appendTo($(this));
         $(this).attr('prov:hadOriginalSource',glossary_hg);
         if (span) {
           // replace the <dfn> with a <span class="dfn">, keeping its contents
           $(this).children('dfn').replaceWith(function(){return $('<span>').addClass('dfn').append($(this).contents())});
         }
       });
     }
     $(document).ready(function(){
       // if glossary is in a string:
       $('#glossary_div').html(glossary_string);
       updateGlossaryRefs();
     });

</script>

For each definition you want to extract, add a div element like the following:

diagram styling

Luc, Mar 28

Entities, activities and agents are represented as nodes, with oval, rectangular, and octagonal shapes, respectively. Usage, Generation, Derivation, and Activity Association are represented as directed edges.

Entities are laid out according to the ordering of their generation. We endeavor to show time progressing from left to right. This means that edges for Usage, Generation, Derivation, Association typically point leftwards.


Alternatively, time may progress downward if that is more convenient.

In terms of tooling, Paolo started with graffle, but at some point Luc started editing the SVG with Inkscape.

  • Entities are yellow ovals
    • Saturated: (255, 252, 135) or #FFFC87
    • Desaturated: (255, 253, 229)
  • Activities are blue rectangles
    • Saturated: (159, 177, 252) or #9FB1FC
    • Desaturated: (216, 217, 254)
  • Agents are orange pentagon houses (or upward isosceles triangles)
    • Saturated: (254, 211, 127) or #FED37F
    • Desaturated: (253, 238, 205)
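
The list above gives hex codes only for the saturated colors. The following is a minimal Python sketch of the RGB-to-hex conversion, which can be used to derive the desaturated hex codes for SVG editing. The RGB triples are copied from the list; the names rgb_to_hex and PALETTE are illustrative only.

```python
def rgb_to_hex(rgb):
    """Format an (r, g, b) tuple of 0-255 integers as #RRGGBB."""
    r, g, b = rgb
    return "#{:02X}{:02X}{:02X}".format(r, g, b)


# RGB values copied from the styling list above.
PALETTE = {
    "entity": {"saturated": (255, 252, 135), "desaturated": (255, 253, 229)},
    "activity": {"saturated": (159, 177, 252), "desaturated": (216, 217, 254)},
    "agent": {"saturated": (254, 211, 127), "desaturated": (253, 238, 205)},
}

for kind, shades in PALETTE.items():
    for shade, rgb in shades.items():
        print(kind, shade, rgb_to_hex(rgb))
```

The saturated values reproduce the hex codes already listed (#FFFC87, #9FB1FC, #FED37F), which gives a quick sanity check on the conversion.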

Prov-colors.png

W3C Style Guidelines

Steps taken for CR

TODO: change prov.owl to prov-o.owl

TODO: include snapshot MMDD in /ns

TODO: drop file extensions when referencing resources.

  • 0) Obtain MMDD (the target W3C publication date) from WG Chairs (e.g. 1211, 0724)

OWL:

  • 1) Make sure hg pull and hg status are clear.
  • 2) Fix URL references
  • 3) Create Turtle version: rapper -g -o turtle ProvenanceOntology.owl > prov-o.ttl
  • 4) Check that MMDD was replaced by running grep MMDD ProvenanceOntology.owl prov-o.ttl (it should find nothing)
  • 5) hg commit -m 'added prov-o owl:versionIRI'; hg tag WD-prov-o-2012MMDD; hg push
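
Steps 2 and 4 above can also be sketched in Python for anyone who prefers not to use perl and grep. This is only an illustration: the two URLs are taken from the perl command recorded in the diff excerpt at the top of this page, while the function names and the sample text are hypothetical.

```python
DRAFT_URL = "http://dvcs.w3.org/hg/prov/raw-file/default/model/prov-dm.html"
RELEASE_URL = (
    "http://dvcs.w3.org/hg/prov/raw-file/default/model/"
    "releases/CR-prov-dm-20121211/Overview.html"
)


def fix_url_references(text, old=DRAFT_URL, new=RELEASE_URL):
    """Step 2: point every draft reference at the release snapshot."""
    return text.replace(old, new)


def lines_with_placeholder(text, placeholder="MMDD"):
    """Step 4: list (line number, line) pairs that still contain the placeholder."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if placeholder in line
    ]


# Hypothetical two-line sample standing in for a real file.
sample = 'See <a href="' + DRAFT_URL + '">PROV-DM</a>\nversion 2012MMDD\n'
fixed = fix_url_references(sample)
print(fixed)
print(lines_with_placeholder(fixed))
```

An empty list from lines_with_placeholder corresponds to grep finding no matches, which is the state wanted before committing and tagging.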

Denis says (to point to other WG docs):

HTML:


When CR is published, change prov:wasRevisionOf <http://www.w3.org/TR/2012/WD-prov-o-20120724/prov.owl>; to the latest MMDD of CR.


Make sure http://www.w3.org/TR/prov-o/prov.owl will exist.

Ivan's notes on making the namespace dereferenceable: http://www.w3.org/mid/32A1ACAD-7D5C-45FB-9D0D-26A9C0365717@w3.org

(revert to draft: perl -pi -e 's|http://www.w3.org/TR/2012/WD-prov-dm-20120703/|http://dvcs.w3.org/hg/prov/raw-file/default/model/|g' ProvenanceOntology.owl)

Steps taken for LC

NOTE: These steps are done. The result is at https://dvcs.w3.org/hg/prov/raw-file/tip/ontology/releases/WD-prov-o-20120724/Overview.html The webmaster should take prov-o from there.

  • 0) Obtain MMDD (the target W3C publication date) from WG Chairs (e.g. 0703, 0711, 0724)

OWL:

  • 1) Make sure hg pull and hg status are clear.
  • 2) Fix URL references
  • 3) Create Turtle version: rapper -g -o turtle ProvenanceOntology.owl > prov.ttl
  • 4) Check that MMDD was replaced by running grep MMDD ProvenanceOntology.owl prov.ttl (it should find nothing)
  • 5) hg commit -m 'added prov-o owl:versionIRI'; hg tag WD-prov-o-2012MMDD; hg push

Denis says:

HTML:



Make sure http://www.w3.org/TR/prov-o/prov.owl will exist.

Ivan's notes on making the namespace dereferenceable: http://www.w3.org/mid/32A1ACAD-7D5C-45FB-9D0D-26A9C0365717@w3.org

(revert to draft: perl -pi -e 's|http://www.w3.org/TR/2012/WD-prov-dm-20120703/|http://dvcs.w3.org/hg/prov/raw-file/default/model/|g' ProvenanceOntology.owl)

Steps taken for WD2

Steps taken for FPWD


P.S. We also had a discussion thread on Oct 11 on the steps.


ctrl-alt-shift-s

Also purely technically, I saved the ReSpec HTML to a pure (non-javascript) HTML 4.0 document by using Ctrl-Alt-S inside Firefox (it does not do it correctly from Chrome).

Ctr-alt-shift-s-respec.png

Try Ctrl-Shift-Alt-S; that works in Chromium at least (but Chromium gives buggy HTML).

Note that the XHTML export does not work well as it messes up the CSS and the examples - perhaps this is fixable.


We copied this HTML and figures to the fpwd/ subfolder in mercurial - to avoid having to play with branches etc. This is the one I fed into the various validators.

I then handed the URL of that to the publication guys.


The outer "an example" is not needed:

        <div class="anexample">
           <div class="exampleOuter">
              <p>The following PROV-O describes the resources involved when creating a chart about crime statistics. The example uses only Starting Point terms and serves as a basis for elaboration that will be described in subsequent sections. In the example, Derek performs an aggregation of some government crime data, grouping by national regions that are described in a separate dataset by a civil action group.
              </p>
              <pre class="example">{% include "includes/eg16-journalism-simple-without-comments.ttl" %}
</pre>
          </div>
       </div>

Styling rules in prov-o html

First reference to a term:

<a href='#knownMembership' class="qname">prov:knownMembership</a>

Repeated reference within same narrative:

<span class="repeated">prov:derivedByInsertionFrom</span> 

All References to instance data:

<code>:c2</code>

suggestion:

I guess HTML-formally, we should use <code> for terms, <var> for
variables (<var>:ex1</var>), and <samp> for inline examples (<samp>:x
a prov:Collection</samp>) and <a><code> ..</code></a> for first use of
other term.

Tools to check conneg with

  • Pellet
  • Protege
  • TopBraid Composer
  • Tabulator

Meeting notes

Outstanding issues as of Dec 2011

This is the list of current issues for syncing PROV-O with DM's 2WD as of Dec 2011.

Should these be listed properly on the tracker?


Constraints defined in PROV-DM - distinguish between enforceable constraints and well-formed constraints

Accounts (and named graphs)+ examples - coordinate between the RDF WG (RDF1.1) and prov WG - make note in PROV-O html document.

Reworked list of terms to sync with PROV-DM:

  • prov:follows and prov:precedes properties missing.
  • Replace Recipe with Plan.
  • Add Person, Organization and SoftwareAgent as subtypes of Agent. Add equivalence between prov:Person and foaf:Person
  • Note records are missing in prov-o. Also the property "hasAnnotation".
  • wasAssociatedWith is missing in prov-o.
  • wasStartedBy and wasEndedBy relationships missing in prov-o.
    • Note (Daniel G): should wasAssociatedWith, wasStartedBy and wasEndedBy have qualified involvements? According to prov dm, they can be qualified with roles. In particular, wasAssociatedWith can be qualified with a plan: an agent executes an activity according to a plan. Should the plan be linked to the associatedwith or to the activity? Can activities have more than 1 plan? (To be discussed).
  • actedOnBehalfOf relationship missing.
  • prov:steps property (to qualify derivations) is missing. This implies creating a qualified involvement for the derivation too.
  • alternateOf and specializationOf properties are missing in prov-o. They replace wasComplementOf.
  • ProvenanceContainer should be renamed RecordContainer. Also, according to prov-dm a record container should not be defined as a prov-dm record (i.e, we should not include it in the ontology), because otherwise it could appear arbitrarily nested inside accounts.
    • I think we should remove this term (Daniel G)
  • tracedTo relationship missing.
  • wasControlledBy, wasEventuallyDerivedFrom, dependedOn, hadParticipant and wasScheduledAfter properties should be removed from prov-o.

Initial list:

  • Agents association
  • Plan (in place of Recipe)
  • wasScheduledAfter - overloaded property
  • Start/End of Activity
  • onBehalfOf relation between agents and an activity
  • Derivation - updated in DM, three kinds of derivation

Tools Used

Questions

  • Should we also be exposing the (Java?) code that produced the OWL file? Or was Protege used?
  • Can we move the comments from rdfs:label to rdfs:comment?
    • e.g. <rdfs:label rdf:datatype="&xsd;string">A BOB represents an identifiable characterized entity.</rdfs:label> should become rdfs:comment.
    • DanielG: I agree with this change. Also, we should add the labels for each class and the language (e.g., "Agent"@en).

Initial comments/suggestions about the ontology

  • Time can be reused from other ontologies instead of defining our own concept in the ontology.
    • Suggestions:
    • W3C's Time Ontology: Addresses time instants and intervals, so we could reuse it. (Daniel G)
  • Location can be reused from other popular ontologies instead of defining our own concept in the ontology.
    • Suggestions:
    • wgs_84: It is widely used already, it is simple and provides the concept SpatialThing to relate to anything with spatial extent. (Daniel G)
  • Missing relationship between generation/use/derivation and Time/Location. (Daniel G)
    • 2 different ways to address this issue:
    • Define subproperties (generatedAtTime, generatedAtLocation). This modelling is simpler, but it would lead to a loss of information (we assert the facts about the process execution rather than about the relationship itself). However, it looks better for inferring new knowledge:

Alternative2.png

    • Make the properties n-ary. This would require declaring the properties as concepts in the ontology (and it may make inferring new knowledge more difficult). Example of how it would be modelled in OWL (n-ary relationship pattern):

Alternative1.png

  • Arities missing (to do yet).
  • Revision, Location, ProcessExecution are not subclasses of BOB in the current OWL document, but they are in the ontology spec.
  • Roles are not represented yet (Luc) - issue has been addressed
  • It would be good if names of relationships and "direction" were compatible with Model (see appendix A for conventions). Specifically, in the graphical notations, edges tend to point to the past. isUsedBy should become uses. (Luc)

Initial hierarchical diagram of PIL concepts

Hierarchy of concepts (without their relationships)

InitialDiagram.png

General diagram

TimGenDiagram.png

Comments from the diagram:

  • I find it a bit confusing. I think it would be clearer to take the concepts as nodes in the graph and join them with the relationships instead of representing range and domain (or subclass) in the diagram. (Daniel G)
    • Interesting suggestions; could you draw up an example of "concepts as nodes" and "joining them with the relationships"? -TLebo
      • Yes, I was thinking about something like this (DanielG):

GlobalSchema.png

Characteristics of Object Properties

The table below summarizes the characteristics of the object properties that are defined in the OWL schema. Some of them may be subject to discussion. In particular, regarding the object properties isControlledBy, isGeneratedBy and isUsedBy, I didn't specify whether they are transitive or not. I am more inclined to specify that they are not transitive. However, one may argue that given that an agent can be a process execution, a process execution pe1 can be controlled by an agent pe2, which happens to be a process execution that is controlled by an agent ag, and that, therefore, ag (indirectly) controls pe1. The same argument can be applied to isGeneratedBy and isUsedBy. That said, I am not convinced these properties should be declared as transitive. (Khalid)


Property        Functional  Reverse functional  Transitive  Symmetric  Asymmetric  Reflexive  Irreflexive
isControlledBy  No          No                  ?           No         Yes         No         Yes
isDerivedFrom   No          No                  Yes         No         Yes         No         Yes
isGeneratedBy   Yes         No                  ?           No         Yes         No         Yes
isUsedBy        No          No                  ?           No         Yes         No         Yes
isPrecededBy    No          No                  Yes         No         Yes         No         Yes
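
Because several characteristics constrain one another, the table can be checked mechanically for contradictory combinations. The following is a minimal Python sketch, assuming Yes/No/? are encoded as True/False/None. The three rules are standard facts about binary relations, not something taken from the OWL schema itself, and the names COLUMNS, ROWS and contradictions are illustrative.

```python
# Column order follows the table above.
COLUMNS = (
    "functional", "reverse_functional", "transitive",
    "symmetric", "asymmetric", "reflexive", "irreflexive",
)

# Rows copied from the table: True = Yes, False = No, None = "?" (undecided).
ROWS = {
    "isControlledBy": (False, False, None, False, True, False, True),
    "isDerivedFrom": (False, False, True, False, True, False, True),
    "isGeneratedBy": (True, False, None, False, True, False, True),
    "isUsedBy": (False, False, None, False, True, False, True),
    "isPrecededBy": (False, False, True, False, True, False, True),
}


def contradictions(flags):
    """Return a list of problems found in one row of characteristics."""
    c = dict(zip(COLUMNS, flags))
    problems = []
    if c["asymmetric"] and c["irreflexive"] is False:
        problems.append("asymmetric implies irreflexive")
    if c["asymmetric"] and c["symmetric"]:
        problems.append("symmetric and asymmetric hold together only for the empty relation")
    if c["reflexive"] and c["irreflexive"]:
        problems.append("reflexive and irreflexive cannot both hold on a non-empty domain")
    return problems


for name, flags in ROWS.items():
    print(name, contradictions(flags) or "consistent")
```

Undecided transitivity (the ? entries) does not affect any of these rules, so the open question about transitivity can be settled separately.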

Cardinalities of Object Properties

The figure below illustrates the cardinalities of the object properties defined so far in the OWL schema. As you will notice, all the cardinalities are of type zero to many, except the one associated with the isGeneratedBy property, which is of type zero to one because a BOB can be generated by at most one process execution.
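
The zero-to-one cardinality on isGeneratedBy can be checked over instance data. The following is a minimal Python sketch, assuming the data is available as plain (subject, property, object) tuples and that different names denote different individuals. An OWL reasoner makes no such assumption and would instead infer that the two process executions are the same. The identifiers bob1, pe1 and so on are hypothetical.

```python
from collections import defaultdict


def functional_violations(triples, prop="isGeneratedBy"):
    """Return subjects that have more than one distinct value for prop."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == prop:
            values[subject].add(obj)
    return {s: sorted(objs) for s, objs in values.items() if len(objs) > 1}


# Hypothetical instance data: bob1 is generated by two process executions.
triples = [
    ("bob1", "isGeneratedBy", "pe1"),
    ("bob1", "isGeneratedBy", "pe2"),
    ("bob2", "isGeneratedBy", "pe1"),
    ("bob2", "isUsedBy", "pe3"),
]

print(functional_violations(triples))
```

The other properties are zero to many, so no comparable check is needed for them.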

Object Properties Cardinalities.PNG

Best Practices

Deborah mentioned the possibility of having a separate document for best practices. On this topic, there is some work that has been done by the Semantic Web Best Practices and Deployment Working Group at http://www.w3.org/2001/sw/BestPractices/OEP, which may be worth looking at.

Examples

PROV OWL ontology component examples

RDF Graph for Crime File Scenario

RDF/XML notation

moved to http://dvcs.w3.org/hg/prov/file/tip/ontology/examples/ontology-extensions/crime-file/instances/example-1/crime.ttl

Visualization of the RDF graph


Crime File Ontology

moved to http://dvcs.w3.org/hg/prov/file/tip/ontology/examples/ontology-extensions/crime-file/crime.owl

Workflow example

Stian has generated an early example of representing the provenance of a Taverna workflow using this ontology.

An Axiomatic Semantics for RDF, RDF-S, and DAML+OIL

http://www.daml.org/2001/03/axiomatic-semantics.html


An effort to define RDF semantics in terms of First order Logic as an alternative to the direct model theory http://www.w3.org/TR/2003/NOTE-lbase-20031010/ -Graham

Design Proposals

Roles

Dealing with the issues of "uses" relationship

After the 8-08-2011 telecon, we agreed to address this issue with a new modelling alternative introduced by Satya: instead of making the Agent a direct participant in the process execution, we will create an intermediate class for the role of the participant agent. This approach combines the previous two and addresses the issues we had with them. The next image summarizes the modelling in the ontology.

Alternative3.png

According to this, our ontology diagram should be something like this:

WhatItShouldBe.png


Roles directly on the prov:used prov:Entity

:my_pe 
   a prov:Entity;
   prov:used [
      a prov:Entity; 
      prov:actually :Khalid; 
      a prov:Role, restaurant:Customer;
      time:begin :t1; time:end :t2;
   ] 
.

Possible roles for every process step in the journalism example

(I've added ?? to the ones I'm not completely sure about (DanielG))

  • government (gov) converts data (d1) to RDF (f1) at time (t1)
    • role of gov: ConverterRole??
    • role of the data: SourceRole??
    • role of the process execution in the outcome generation : GenerationRole
  • government (gov) generates provenance information (prov) regarding RDF (f1)
    • role of gov: GeneratorRole
    • role of the used data: ReferenceRole (data used as reference)
    • role of the process execution in the outcome generation : CreationRole??
  • government (gov) publishes RDF data (f1) along with its provenance (prov) on a portal with a license (li1); the rdf data is now available as a Web resource (r1)
    • role of gov: PublisherRole
    • role of f1: ReferenceRole
    • role of li1: LicensingRole ??
    • role of the process execution in the outcome generation :PublicationRole
  • analyst (alice) downloads a turtle serialization (lcp1) of the resource (r1) from government portal
    • role of alice: RequesterRole
    • role of the resource: RequestedResourceRole ??
    • role of the process execution in the outcome generation : CreationRole (since it creates the file in your computer)
  • analyst (alice) generates a chart (c1) from the turtle (lcp1) using some software (tools1) with statistical assumptions (stats1)
    • role of alice: GeneratorRole
    • role of lcp1: LicensingRole
    • role of tools1: ReferenceSoftwareRole ??
    • role of stats1: ReferenceRole
    • role of the process execution in the outcome generation (c1): CreationRole
  • newspaper (news) obtains image (img1) from freelancer, Carlos.
    • role of news: RequesterRole
    • role of Carlos: ProviderRole
    • role of the process execution in the outcome generation: ObtentionRole??
  • newspaper (news) publishes the incidence map (map1), chart (c1) and the image (img1) within a document (art1) written by (joe) using license (li2)
    • role of news: PublisherRole
    • role of map1: ReferenceRole
    • role of c1: ReferenceRole
    • role of img1: ReferenceRole/IllustrationRole?
    • role of the process execution in the outcome generation:PublicationRole
  • government (gov) publishes an update (d2) of data (d1) as a new Web resource (r2)
    • role of gov: PublisherRole
    • role of d2: UpdateRole
    • role of d1: UpdatedResourceRole??
    • role of the process execution in the outcome generation:PublicationRole
  • blogger (bob) downloads turtle (lcp2) of the resource (r2) from government portal, determines that it's a different version of the same data
    • role of bob: RequesterRole
    • role of lcp2: SerializedResourceRole
    • role of the process execution in the outcome generation: ObtentionRole??
  • blogger (bob) generates new chart (c2) based on the data (lcp2) using some software (tools2) with statistical assumptions (stats2)
    • role of bob: GeneratorRole
    • role of lcp2: ReferenceRole
    • role of tools2: UsedSoftwareRole
    • role of stats2: ReferenceRole
    • role of the process execution in the outcome generation: GenerationRole
  • blogger (bob) publishes the chart (c2) under an open license (li3).
    • role of bob: PublisherRole
    • role of li3: LicensingRole??
    • role of the process execution in the outcome generation: PublicationRole

Accounts

Using named graphs to model Accounts

An alternative to model Roles


Provenance Containers

Using named graphs to model Provenance Containers