ProvRDF

From Provenance WG Wiki
Revision as of 01:54, 13 February 2012 by Ssahoo2 (Talk | contribs)

Introduction

This document gives a draft (detailed) translation from (some of) PROV-DM to PROV-O, and sketches how to go in the reverse direction (i.e. how to extract PROV-DM from an RDF graph that includes PROV-O data and possibly other RDF).

Note that I (jcheney) am not being careful about using a standardized RDF syntax, as I don't know any of them. I am just giving the flavor of what I have in mind.

Coverage: This is NOT complete; so far it is for illustration only. It covers many of the basic element and relation records. It does not cover: derivation, acts on behalf of, accounts, ....

Guideline: Include all RDF assertions associated with a DM assertion, even if some of them wind up being redundant/inferrable.

From PROV-DM to PROV-O

We define a translation from PROV-DM formulas to RDF conforming to PROV-O as follows.

There are some places where it's non-obvious (to jcheney) what to do, marked with "???".

Mapping coverage

The undersigned have reviewed DM WD3 and agree that all ASN signatures in WD3 appear as left hand sides of the rules shown on this page. Further, the rules here are in the same order as DM WD3 and no rules appear here without appearing in DM WD3.

  • Daniel Garijo (10-Feb-2012)
  • Add your name here (date)
  • and here (date)

The formulas are listed in an order that corresponds to the order given in PROV-DM WD3.

Translating element formulas

PROV-DM Element

Entity

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} entity(id,[attr_1=val_1,...,attr_n=val_n]) &\to & \left\{ \begin{array}{lcl} id & \texttt{a} & \texttt{prov:Entity}\ . \\ id & attr_1 & val_1 \ .\\ & \vdots\\ id & attr_n & val_n \ . \end{array}\right. \end{array} </math>
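
The entity rule can be sketched in a few lines of Python. This is only an illustration: representing triples as (subject, predicate, object) tuples of strings, the function name, and the ex: identifiers are all assumptions, not part of PROV-DM or PROV-O.

```python
# Sketch of the entity rule: one type triple plus one triple per attribute.
# Triples are plain (subject, predicate, object) tuples of strings; this
# representation and the function name are assumptions for illustration.

def entity_to_triples(ident, attrs):
    """Translate entity(id, [attr_1=val_1, ...]) into a list of triples."""
    triples = [(ident, "a", "prov:Entity")]
    for attr, val in attrs:
        triples.append((ident, attr, val))
    return triples


if __name__ == "__main__":
    # Hypothetical identifiers, printed in a Turtle-like form.
    for triple in entity_to_triples("ex:report", [("ex:title", '"Draft 1"')]):
        print(" ".join(triple), ".")
```

Following the guideline above, the type triple is always emitted, even when it could be inferred from other assertions.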

Uses before defined: prov:type

Activity

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl}

activity(id,[st],[et],[attr_1=val_1,...,attr_n=val_n]) &\to & \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:Activity}\ . \\
 id & \texttt{prov:startedAt} & st\ . \\
 id & \texttt{prov:endedAt} & et\ .\\
 id & attr_1 & val_1 \ .\\
    & \vdots\\
 id & attr_n & val_n \ .

\end{array}\right.

\end{array} </math>
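
A possible reading of the optional start and end times, sketched in Python. The rule does not say what to emit when st or et is missing; here a missing time simply produces no triple, which is an assumption. The tuple representation and the function name are also illustrative only.

```python
# Sketch of the activity rule with optional start and end times.
# Assumption: an omitted st or et yields no triple at all.

def activity_to_triples(ident, start=None, end=None, attrs=()):
    """Translate activity(id, [st], [et], [attrs]) into a list of triples."""
    triples = [(ident, "a", "prov:Activity")]
    if start is not None:
        triples.append((ident, "prov:startedAt", start))
    if end is not None:
        triples.append((ident, "prov:endedAt", end))
    # Remaining attribute-value pairs are attached directly to the activity.
    triples.extend((ident, attr, val) for attr, val in attrs)
    return triples
```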

Uses before defined: prov:type

Issues (LHS):

  • How are startedAt and endedAt distinct from other attribute-value pairs associated with Activity? (Satya)

Agent

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl}

agent(id, [attr_1=val_1,...,attr_n=val_n]) &\to & \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:Agent}\ . \\
 id & \texttt{a} & \texttt{ns:Type}\ .\\
 id & attr_1 & val_1\ .\\
    & \vdots\\
 id & attr_n & val_n\ .

\end{array}\right.

\end{array} </math>


prov:Person        rdfs:subClassOf prov:Agent .
prov:Organization  rdfs:subClassOf prov:Agent .
prov:SoftwareAgent rdfs:subClassOf prov:Agent .

[] a owl:AllDisjointClasses; 
   owl:members ( prov:Person prov:Organization prov:SoftwareAgent ).
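
To illustrate what the disjointness axiom rules out, here is a small Python check over a set of triples. It is not an OWL reasoner: it only looks at explicit type triples, and the tuple representation and function name are assumptions.

```python
# Sketch: flag nodes explicitly typed with more than one of the three
# agent subclasses declared pairwise disjoint above.

DISJOINT_AGENT_CLASSES = {"prov:Person", "prov:Organization", "prov:SoftwareAgent"}

def disjointness_violations(triples):
    """Return the sorted identifiers that carry two or more disjoint agent types."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == "a" and obj in DISJOINT_AGENT_CLASSES:
            types.setdefault(subject, set()).add(obj)
    return sorted(s for s, classes in types.items() if len(classes) > 1)
```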

Mentions but does not define:

  • prov:Person,
  • prov:Organization,
  • prov:SoftwareAgent .

Uses before defined:

  • prov:type
  • wasStartedBy
  • wasAssociatedWith
  • prov:role

Issues:

  • How should an agent be typed as Person, Organization, or SoftwareAgent? With a prov:type attribute? (The example shows this, but it is not stated explicitly.)

Note

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} note(id,[attr_1=val_1,...,attr_n=val_n]) &\to & \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{[~owl:unionOf~(~prov:Entity~prov:Activity~)~]}\ .\\
 id & attr_1 & val_1\ .\\
    & \vdots\\
 id & attr_n & val_n\ .\\
 attr_1 & \texttt{a} & \texttt{owl:AnnotationProperty}\ .\\
    & \vdots\\
 attr_n & \texttt{a} & \texttt{owl:AnnotationProperty}\ .

\end{array}\right. \end{array} </math>

Concerns:

  • Use of notes is reasonable for things like "GUI Color",
  • but NOT for the much heavier-duty use that DM offers (meta-provenance).

Uses before defined: hasAnnotation

Translating relation formulas

PROV-DM Relation

Generation

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} wasGeneratedBy([id],e,[a],[t],[attrs]) &\to& \left\{ \begin{array}{lcl}

 e & \texttt{prov:wasGeneratedBy} & a\ .\\
 e & \texttt{a} & \texttt{prov:Entity}\ . \\
 a & \texttt{a} & \texttt{prov:Activity}\ . \\
 a & \texttt{prov:generated} & e\ .\\
 a & \texttt{prov:hadQualifiedGeneration} & id\ .\\
 id & \texttt{a} & \texttt{prov:Generation}\ . \\
 id & \texttt{prov:hadQualifiedEntity} & e\ .\\
 id & attr_1 & val_1\ .\\
   & \vdots\\
 id & attr_n & val_n\ .\\
 e & \texttt{prov:wasGeneratedAt} & t\ .

\end{array}\right.

\end{array} </math>
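
The generation rule, including its optional arguments, might be sketched as follows. Several choices here are assumptions rather than anything the rule states: a fresh blank node stands in for a missing id, a missing activity or time produces no triple, and triples are (subject, predicate, object) tuples of strings.

```python
import itertools

# Counter used to mint blank node labels when the optional id is omitted
# (an assumption; the rule does not say what to do in that case).
_fresh = itertools.count(1)

def fresh_blank_node():
    return f"_:b{next(_fresh)}"

def generation_to_triples(entity, ident=None, activity=None, time=None, attrs=()):
    """Translate wasGeneratedBy([id], e, [a], [t], [attrs]) into triples."""
    qual = ident if ident is not None else fresh_blank_node()
    triples = [(entity, "a", "prov:Entity")]
    if activity is not None:
        triples += [
            (entity, "prov:wasGeneratedBy", activity),
            (activity, "a", "prov:Activity"),
            (activity, "prov:generated", entity),
            (activity, "prov:hadQualifiedGeneration", qual),
        ]
    triples += [
        (qual, "a", "prov:Generation"),
        (qual, "prov:hadQualifiedEntity", entity),
    ]
    # Attributes qualify the generation itself, not the entity.
    triples.extend((qual, attr, val) for attr, val in attrs)
    if time is not None:
        triples.append((entity, "prov:wasGeneratedAt", time))
    return triples
```

Both the binary edge (prov:wasGeneratedBy) and the qualified node (prov:Generation) are emitted when an activity is given, in line with the guideline of including redundant assertions.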

Issues:

  • An entity can only be generated once (Tim's claim, does DM say anything about it?):
prov:Entity rdfs:subClassOf [ a owl:Restriction; owl:onProperty prov:wasGeneratedAt; owl:maxCardinality 1 ] .
  • For PROV-O, the activity id cannot be optional. (Satya)
  • Having activity id as optional violates the DM requirement that all relations "have two primary elements" (Section 5.3, DM TPWD). (Satya)
  • Why is time [t] listed as a distinct attribute from other attribute-value pairs? Isn't time of generation yet another attribute? (Satya)

Usage

PROV-DM (eg) <math>\implies</math> PROV-O (eg)


<math> \begin{array}{lcl}

used([id],a,e,[t],[attrs]) &\to& \left\{ \begin{array}{lcl}

 a & \texttt{prov:used} & e\ .\\
 a & \texttt{a} & \texttt{prov:Activity}\ . \\
 e & \texttt{a} & \texttt{prov:Entity}\ . \\
 e & \texttt{prov:wasUsedAt} & t\ .\\
 a & \texttt{prov:hadQualifiedUsage} & id\ .\\
 id & \texttt{a} & \texttt{prov:Usage}\ . \\
 id & \texttt{prov:hadQualifiedEntity} & e\ .\\
 id & \texttt{prov:happenedAt??} & t\ .\\
 id & attr_1 & val_1\ .\\
    & \vdots\\
 id & attr_n & val_n\ .

\end{array}\right.

\end{array} </math>

Issues:

  • If activity id is optional for generation record, why is it not so for usage record? These two points need to be reconciled either way. (Satya)
  • Similar to generation, time can be "folded" into the "attribute" list. (Satya)

Agent Association

PROV-DM (eg) <math>\implies</math> PROV-O (eg)


<math> \begin{array}{lcl} wasAssociatedWith([id],act,ag,[p],[attrs]) &\to& \left\{ \begin{array}{lcl}

 act & \texttt{prov:wasAssociatedWith} & ag\ .\\
 act & \texttt{a} & \texttt{prov:Activity}\ . \\
 ag & \texttt{a} & \texttt{prov:Agent}\ . \\
 act & \texttt{prov:hadQualifiedAssociation} & id\ .\\
 id & \texttt{a} & \texttt{prov:Association}\ . \\
 id & \texttt{prov:hadQualifiedEntity} & ag\ .\\
 id & \texttt{prov:adoptedPlan} & p\ .\\
 id & attr_1 & val_1\ .\\
    &  \vdots\\
 id & attr_n & val_n\ .\\
 p & \texttt{a} & \texttt{prov:Plan}\ . 

\end{array}\right.

\end{array} </math>

Uses before defined: prov:type, prov:role

Issues:

  • All the descriptions associated with Plan in DM TPWD are in the context of Activity, so why should Plan be associated with both Activity and Agent? (Also raised as Issue-203 by Stephan.) (Satya)

Starting

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl}

wasStartedBy([id],a,ag,[attrs]) &\to& \left\{ \begin{array}{lcl}

 a & \texttt{prov:wasStartedBy} & ag .\\
 a & \texttt{a} & \texttt{prov:Activity} . \\
 ag & \texttt{a} & \texttt{prov:Agent} . \\
 a & \texttt{prov:hadQualifiedStart} & \texttt{prov:Start} . \\
 a & \texttt{prov:hadQualifiedEntity} & ag . \\
 id & attr_1 & val_1 .\\
    & \vdots\\
 id & attr_n & val_n .

\end{array}\right.

\end{array} </math>

We seemed to agree at F2F2 that (1) who started the activity and (2) when it was started would be recorded separately.

Note: we agreed to leave this relationship out of the first alignment

Issues:

  • The majority of uses for the qualified start should actually be Activities (Tim)

Ending

PROV-DM <math>\implies</math> [PROV-O]

<math> \begin{array}{lcl}

wasEndedBy([id],a,ag,[attrs]) &\to& \left\{ \begin{array}{lcl}

 a & \texttt{a} & \texttt{prov:Activity} . \\
 ag & \texttt{a} & \texttt{prov:Agent} .\\
 a & \texttt{prov:wasEndedBy} & ag .\\
 id & attr_1 & val_1 .\\
    & \vdots\\
 id & attr_n & val_n .

\end{array}\right.

\end{array} </math>

We seemed to agree at F2F2 that (1) who ended the activity and (2) when it was ended would be recorded separately.

Note: we agreed to leave this relationship out of the first alignment

Responsibility

PROV-DM <math>\implies</math> PROV-O

<math> \begin{array}{lcl} actedOnBehalfOf([id],ag2,ag1,[a],[attrs]) &\to& \left\{ \begin{array}{lcl}

 ag2 & \texttt{prov:actedOnBehalfOf} & ag1 . \\
 ag2 & \texttt{a} & \texttt{prov:Agent} . \\
 ag1 &  \texttt{a} & \texttt{prov:Agent} . \\
 ag2 & \texttt{prov:hadQualifiedDelegation} & id .\\
 id & \texttt{a} & \texttt{prov:Delegation} . \\
 id &  \texttt{prov:qualifiedEntity} & ag1 . \\
 id &  \texttt{prov:qualifiedActivity???} & a . \\
 id & attr_1 & val_1 .\\
    & \vdots\\
 id & attr_n & val_n .\\
 a &  \texttt{a} & \texttt{prov:Activity} .

\end{array}\right. \end{array} </math>

Uses before defined: prov:type, prov:role

Issues:

  • might be nice to rename "hadQualifiedEntity/Activity" to "involvedEntity/Activity"
  • DM uses "subordinate" and "responsible" as additional qualifications for the two agents, but it is not reflected in the ASN. (Satya)

Derivation

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

precise-1

<math> \begin{array}{lcl} wasDerivedFrom([id], e2, e1, a, g2, u1, [attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.

\end{array}\right.

\end{array} </math>
Issues:

  • Why should usage and generation "events" be associated with a derivation relation? (Satya)
  • If derivation is representing the relation between two entities, why should the intermediate activity(s) associated with the two entities be part of the derivation relation? (Satya)

imprecise-1

<math> \begin{array}{lcl} wasDerivedFrom([id], e2, e1, [t], [attrs]) &\to& \left\{ \begin{array}{lcl}

 e2 & \texttt{prov:wasDerivedFrom} & e1\ .\\
 e2 & \texttt{prov:hadQualifiedDerivation} & id\ .\\
 id & \texttt{prov:steps} & \texttt{prov:single}\ .\\
 id & attr_1 & val_1\ .\\
    &  \vdots\\
 id & attr_n & val_n\ .

\end{array}\right.

\end{array} </math>

imprecise-n

<math> \begin{array}{lcl} wasDerivedFrom([id], e2, e1, [t], [attrs]) &\to& \left\{ \begin{array}{lcl} TBD \end{array}\right.

\end{array} </math>

not in DM: consolidated derivation signature

<math> \begin{array}{lcl} wasDerivedFrom([id], e2, e1, [a, [g2], [u1]], [attrs]) &\to& \left\{ \begin{array}{lcl}

 e2 & \texttt{prov:wasDerivedFrom} & e1\ .\\
 e2 & \texttt{a} & \texttt{prov:Entity}\ . \\
 e1 & \texttt{a} & \texttt{prov:Entity}\ . \\
 a & \texttt{a} & \texttt{prov:Activity}\ . \\
 e2 & \texttt{prov:wasGeneratedBy} & a\ . \\
 a & \texttt{prov:used} & e1\ . \\
 e2 & \texttt{prov:hadQualifiedDerivation} & id\ .\\
 id & \texttt{a} & \texttt{prov:Derivation}\ . \\
 id & \texttt{prov:hadQualifiedEntity} & e1\ .\\
 id & \texttt{prov:hadQualifiedActivity} & a\ .\\
 id & attr_1 & val_1\ .\\
    &  \vdots\\
 id & attr_n & val_n\ .

\end{array}\right.

\end{array} </math>

Note: Daniel used the [a,[g2], [u1]] notation to indicate that a, g2, and u1 are optional, but if g2 or u1 are present, then a is required as well. (Imprecise and precise derivations are mixed, but we could separate them)
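
The nesting constraint in this note (g2 and u1 are only allowed when a is present) can be checked before any triples are emitted. The Python sketch below is illustrative only. It assumes that qualified triples are emitted only when an id is given, that the qualified derivation points at e1 (the signature has no agent argument), and that g2 and u1 are validated but not translated, since the rule does not say what to emit for them.

```python
# Sketch of the consolidated derivation rule with its nested optional arguments.

def derivation_to_triples(e2, e1, ident=None, activity=None,
                          generation=None, usage=None, attrs=()):
    """Translate wasDerivedFrom([id], e2, e1, [a, [g2], [u1]], [attrs])."""
    # Enforce the [a, [g2], [u1]] nesting: g2 or u1 require a.
    if activity is None and (generation is not None or usage is not None):
        raise ValueError("g2 and u1 may only be given together with a")
    triples = [
        (e2, "prov:wasDerivedFrom", e1),
        (e2, "a", "prov:Entity"),
        (e1, "a", "prov:Entity"),
    ]
    if activity is not None:
        triples += [
            (activity, "a", "prov:Activity"),
            (e2, "prov:wasGeneratedBy", activity),
            (activity, "prov:used", e1),
        ]
    if ident is not None:
        triples += [
            (e2, "prov:hadQualifiedDerivation", ident),
            (ident, "a", "prov:Derivation"),
            (ident, "prov:hadQualifiedEntity", e1),  # assumption: points at e1
        ]
        if activity is not None:
            triples.append((ident, "prov:hadQualifiedActivity", activity))
        triples.extend((ident, attr, val) for attr, val in attrs)
    return triples
```
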
Issues:

  • This should be the only derivation construct supported by PROV-O (leaving out all the optional information). (Satya)

AlternateOf

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} alternateOf(e1,e2,[attrs]) &\to& \left\{ \begin{array}{lcl}

 e1 & \texttt{prov:alternateOf} & e2 . \\
 ag2 & \texttt{prov:hadQualified???} & id .\\
 e1 & \texttt{a} & \texttt{prov:Entity} . \\
 e2 &  \texttt{a} & \texttt{prov:Entity} . \\
 ?? &  \texttt{prov:qualifiedEntity} & e2 . \\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>


Note: we agreed to leave this relationship out of the first alignment

SpecializationOf

PROV-DM (eg) <math>\implies</math> PROV-O (eg)


<math> \begin{array}{lcl} specializationOf(e1,e2,[attrs]) &\to& \left\{ \begin{array}{lcl}

 e1 & \texttt{prov:specializationOf} & e2 . \\
 ag2 & \texttt{prov:hadQualified???} & id .\\
 e1 & \texttt{a} & \texttt{prov:Entity} . \\
 e2 &  \texttt{a} & \texttt{prov:Entity} . \\
 ?? &  \texttt{prov:qualifiedEntity} & e2 . \\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Note: we agreed to leave this relationship out of the first alignment

Annotation

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl} hasAnnotation(id,[attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Note: we'll be using rdfs:comment and label to handle the annotations

Account

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl} Account(id,[asserter],[attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Note: we agreed to leave this relationship out of the first alignment

Record Container

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl} container namespaceDeclarations (record) + 'end container' &\to& \left\{ \begin{array}{lcl} \end{array}\right. \end{array} </math>

Note: we agreed to leave this relationship out of the first alignment

Time

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

Asserter

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

Location

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

Traceability

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl} tracedTo(id1,id2, [attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Activity Ordering

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} wasInformedBy(id1,id2, [attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Revision

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl} wasRevisionOf(id1,id2,[ag], [attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Attribution

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} wasAttributedTo(e,ag,[attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Quotation

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} wasQuotedFrom(e1,e2,[ag1],[ag2],[attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Summary

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} wasSummaryOf(e2,e1, [attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Original Source

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} hadOriginalSource(e1,e2, [attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

old monolithic list

<math> \begin{array}{lcl} wasGeneratedBy(id,e,a,attrs,t) &\to& \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:Generation} . \\
 id & \texttt{prov:hadQualifiedEntity} & e .\\
 a & \texttt{prov:hadQualifiedGeneration} & id .\\
 e & \texttt{prov:wasGeneratedAt} & t .\\
 e & \texttt{prov:wasGeneratedBy} & a .\\
 id & \texttt{prov:happenedAt???} & t .\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \\\\ used(id,e,a,attrs,t) &\to& \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:Usage} . \\
 id & \texttt{prov:hadQualifiedEntity} & e .\\
 a & \texttt{prov:hadQualifiedUsage} & id .\\
 id & \texttt{prov:happenedAt???} & t .\\
 a & \texttt{prov:used} & e .\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \\\\ wasStartedBy(id,a,ag,attrs) &\to& \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:ActivityStart???} . \\
 a & \texttt{prov:startedAt} & t .\\
 a & \texttt{prov:wasStartedBy} & ag .\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \\\\ wasEndedBy(id,a,ag,attrs) &\to& \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:ActivityEnd???} . \\
 a & \texttt{prov:endedAt} & t .\\
 a & \texttt{prov:wasEndedBy} & ag .\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \\\\

 alternateOf(e_1,e_2) &\to& 
  e_1~ \texttt{prov:alternateOf}~ e_2

\\\\ specializationOf(e_1,e_2) &\to& e_1 ~\texttt{prov:specializationOf} ~e_2 \\\\

hasAnnotation(r,n,attrs) &\to& 

\left\{ \begin{array}{lcl}

 r & \texttt{prov:hasAnnotation???} & \texttt{n} . \\
 ?? & attr_1 & val_1 .\\
 \vdots\\
 ?? & attr_n & val_n .

\end{array}\right. \end{array} </math>

Questions/problems

  1. The element formula for activities is the only one that mentions additional things besides attributes. This seems odd.
  2. It isn't obvious whether we should emit a triple saying that the plan element of an activity is a <math>\texttt{prov:Plan}</math>. I guess this can be inferred if we omit it?
  3. In the rule for note, there is no class we can assign to the id. (The obvious idea of using rdfs:comment doesn't work because there's no separate class for the comments, and the range of rdfs:comment is Literal.) Is this a problem? Proposed solution: add class prov:Note.
  4. wasGeneratedBy has a time which can be linked to the generated entity by <math>\texttt{prov:wasGeneratedAt}</math>, but I think the time should be linked directly to the id. Proposed solution: introduce <math>\texttt{prov:happenedAt}</math>, define <math>\texttt{prov:wasGeneratedAt}</math> as the composition of <math>\texttt{prov:happenedAt}</math> and <math>\texttt{prov:hadQualifiedEntity}</math>.
  5. used has a time and it's not obvious what this should be linked to in RDF and how. There is no relation for linking the used id to the time. Proposed solution: introduce <math>\texttt{prov:happenedAt}</math>.
  6. wasStartedBy and wasEndedBy are treated as events (and they have id's and attributes), but there is no class for them. Proposed solution: introduce <math>\texttt{prov:ActivityStart}</math> and <math>\texttt{prov:ActivityEnd}</math> as subclasses of QualifiedInvolvement.
  7. wasStartedBy and wasEndedBy rules have no obvious way to link the start and end time.
  8. In hasAnnotation, should the attributes be connected to r or to n? Given that the note n can have arbitrary attributes, why does hasAnnotation have additional attributes?
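
The composition proposed in item 4 can be sketched as a small inference step over a set of triples. Note that prov:happenedAt is only a proposed property, and the direction of the composition is read here as: if id has qualified entity e and id happened at t, then e was generated at t. The sketch also restricts the inference to nodes typed prov:Generation, because usages carry the same two properties; that restriction is an assumption, as are the tuple representation and the function name.

```python
# Sketch: derive prov:wasGeneratedAt from the proposed prov:happenedAt and
# prov:hadQualifiedEntity, for nodes typed as prov:Generation only.

def infer_was_generated_at(triples):
    """Return the set of inferred (entity, prov:wasGeneratedAt, time) triples."""
    triples = list(triples)
    generations = {s for s, p, o in triples
                   if p == "a" and o == "prov:Generation"}
    inferred = set()
    for gen, p1, entity in triples:
        if gen not in generations or p1 != "prov:hadQualifiedEntity":
            continue
        for subject, p2, time in triples:
            if subject == gen and p2 == "prov:happenedAt":
                inferred.add((entity, "prov:wasGeneratedAt", time))
    return inferred
```

Without the restriction to prov:Generation, the same chain would also assign a generation time to every entity that was merely used.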

From PROV-O to PROV-DM

Given an instance of PROV-O, we want to compute an instance of PROV-DM that has the "same meaning".

The basic idea is:

  1. For each node in the RDF graph, check whether the node is an instance of one of the PROV-O classes Entity, Agent, or Activity.
    1. For each such node, look for the appropriate edges in the prov: namespace needed to fill in the fields of the corresponding PROV-DM record.
    2. Any additional fields in other namespaces are added as attributes.
  2. For each of the edges / graph patterns corresponding to PROV-DM relations, look for the corresponding data and generate the appropriate relation.
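
The two steps above can be sketched in Python as follows. This is a rough illustration and not a complete extraction: it only handles explicit type triples (no inference), only the startedAt/endedAt fields of activities, and only two binary relations. The tuple representation, the record shapes, and all names are assumptions.

```python
# Step 1: find nodes typed Entity, Agent, or Activity and build element records.
# Step 2: turn selected prov: edges into relation records.

PROV_CLASSES = {"prov:Entity": "entity", "prov:Agent": "agent",
                "prov:Activity": "activity"}
# prov: edges that fill fields of an element record (only activities have any here).
ELEMENT_FIELDS = {"activity": {"prov:startedAt": "st", "prov:endedAt": "et"}}
# Binary edges treated as relations (a deliberately small, illustrative subset).
RELATION_EDGES = {"prov:used": "used", "prov:wasGeneratedBy": "wasGeneratedBy"}

def extract_element_records(triples):
    """Return (kind, id, fields, attrs) records, sorted by kind and id."""
    triples = set(triples)
    records = []
    for node, pred, cls in triples:
        if pred != "a" or cls not in PROV_CLASSES:
            continue
        kind = PROV_CLASSES[cls]
        fields, attrs = {}, []
        for s, p, o in triples:
            if s != node or (p == "a" and o in PROV_CLASSES):
                continue
            if p in ELEMENT_FIELDS.get(kind, {}):
                fields[ELEMENT_FIELDS[kind][p]] = o
            elif not p.startswith("prov:"):
                # Anything outside the prov: namespace becomes an attribute.
                attrs.append((p, o))
        records.append((kind, node, fields, sorted(attrs)))
    return sorted(records, key=lambda r: (r[0], r[1]))

def extract_relation_records(triples):
    """Return (relation, subject, object) records for the known binary edges."""
    return sorted((RELATION_EDGES[p], s, o)
                  for s, p, o in set(triples) if p in RELATION_EDGES)
```

Other prov: edges on an element are ignored in step 1 because they belong to relations, which step 2 handles.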

[TODO: Flesh this out!]