Difference between revisions of "ProvRDF"

From Provenance WG Wiki

Revision as of 00:57, 12 February 2012

Introduction

This document gives a draft (detailed) translation from (some of) PROV-DM to PROV-O, and sketches how to go in the reverse direction (i.e. how to extract PROV-DM from an RDF graph that includes PROV-O data as well as possibly other RDF).

Note that I (jcheney) am not being careful about using a standardized RDF syntax, as I don't know any of them. I am just giving the flavor of what I have in mind.

Coverage: This is NOT complete; so far it is for illustration only. It covers many of the basic element and relation records. It does not cover: derivation, acts on behalf of, accounts, ....

Guideline: Include all RDF assertions associated with a DM assertion, even if some of them wind up being redundant/inferrable.

From PROV-DM to PROV-O

We define a translation from PROV-DM formulas to RDF conforming to PROV-O as follows.

There are some places where it's non-obvious (to jcheney) what to do, marked with "???".

Mapping coverage

The undersigned have reviewed DM WD3 and agree that all ASN signatures in WD3 appear as left-hand sides of the rules shown on this page. Further, the rules here are in the same order as in DM WD3, and no rule appears here without appearing in DM WD3.

  • Daniel Garijo (10-Feb-2012)
  • Add your name here (date)
  • and here (date)

Translating element formulas

The formulas are listed in an order that corresponds to the order given in PROV-DM WD3.

Entity

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} entity(id,[attr_1=val_1,...,attr_n=val_n]) &\to & \left\{ \begin{array}{lcl} id & \texttt{a} & \texttt{prov:Entity}\ . \\ id & attr_1 & val_1 \ .\\ & \vdots\\ id & attr_n & val_n \ . \end{array}\right. \end{array} </math>

Uses before defined: prov:type
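
As a rough illustration of how the entity rule could be mechanized, the following Python sketch emits the triples for one entity record. Triples are plain tuples of strings rather than any standardized RDF syntax. The function name entity_to_triples and the identifiers ex:e1 and ex:color are hypothetical and are not part of PROV-DM or PROV-O.

```python
def entity_to_triples(id_, attrs=()):
    """Translate entity(id, [attr_1=val_1, ...]) into (subject, predicate, object) tuples.

    attrs is a sequence of (attribute, value) pairs, so an attribute may repeat.
    """
    triples = [(id_, "a", "prov:Entity")]
    # One triple per attribute-value pair, all with the entity id as subject.
    for attr, val in attrs:
        triples.append((id_, attr, val))
    return triples


# Hypothetical example record: entity(ex:e1, [ex:color="red"])
example = entity_to_triples("ex:e1", [("ex:color", "red")])
```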

Activity

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl}

activity(id,[st],[et],[attr_1=val_1,...,attr_n=val_n]) &\to & \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:Activity}\ . \\
 id & \texttt{prov:startedAt} & st\ . \\
 id & \texttt{prov:endedAt} & et\ .\\
 id & attr_1 & val_1 \ .\\
    & \vdots\\
 id & attr_n & val_n \ .

\end{array}\right.

\end{array} </math>

Uses before defined: prov:type

Issues (LHS):

  • How are startedAt and endedAt distinct from the other attribute-value pairs associated with an Activity? (Satya)
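
The activity rule can be sketched in the same style. The sketch assumes that an omitted optional start or end time simply produces no triple; the rule itself does not say what to do in that case. The function name activity_to_triples and the example identifiers are hypothetical.

```python
def activity_to_triples(id_, st=None, et=None, attrs=()):
    """Translate activity(id, [st], [et], [attrs]) into triples (tuples of strings)."""
    triples = [(id_, "a", "prov:Activity")]
    # Assumption: an absent optional time yields no triple at all.
    if st is not None:
        triples.append((id_, "prov:startedAt", st))
    if et is not None:
        triples.append((id_, "prov:endedAt", et))
    triples.extend((id_, attr, val) for attr, val in attrs)
    return triples


# Hypothetical example: an activity with a start time but no end time.
example = activity_to_triples("ex:a1", st="2012-02-12T00:57:00",
                              attrs=[("ex:host", "server1")])
```

Written this way, startedAt and endedAt differ from ordinary attributes only in that their predicates are fixed by the rule rather than supplied by the record.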

Agent

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl}

agent(id,[prov:type = ns:Type], [attr_1=val_1,...,attr_n=val_n]) &\to & \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:Agent}\ . \\
 id & \texttt{a} & \texttt{ns:Type}\ .\\
 id & attr_1 & val_1\ .\\
    & \vdots\\
 id & attr_n & val_n\ .

\end{array}\right.

\end{array} </math>


prov:Person        rdfs:subClassOf prov:Agent .
prov:Organization  rdfs:subClassOf prov:Agent .
prov:SoftwareAgent rdfs:subClassOf prov:Agent .

[] a owl:AllDisjointClasses; 
   owl:members ( prov:Person prov:Organization prov:SoftwareAgent ).

Mentions but does not define: prov:Person, prov:Organization, prov:SoftwareAgent .

Uses before defined: prov:type wasStartedBy wasAssociatedWith prov:role
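
The agent rule treats the prov:type attribute specially: it becomes a second typing triple instead of an ordinary attribute triple. A minimal sketch, assuming triples are tuples of strings and that the type is optional; agent_to_triples and the ex: identifiers are hypothetical names.

```python
def agent_to_triples(id_, type_=None, attrs=()):
    """Translate agent(id, [prov:type = ns:Type], [attrs]) into triples."""
    triples = [(id_, "a", "prov:Agent")]
    # The prov:type value is emitted as an additional class membership.
    if type_ is not None:
        triples.append((id_, "a", type_))
    triples.extend((id_, attr, val) for attr, val in attrs)
    return triples


# Hypothetical example: agent(ex:alice, [prov:type = prov:Person], [ex:name="Alice"])
example = agent_to_triples("ex:alice", "prov:Person", [("ex:name", "Alice")])
```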

Note

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} note(id,[attr_1=val_1,...,attr_n=val_n]) &\to & \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{[~owl:unionOf~(~prov:Entity~prov:Activity~)~]}\ .\\
 id & attr_1 & val_1\ .\\
    & \vdots\\
 id & attr_n & val_n\ .\\
 attr_1 & \texttt{a} & \texttt{owl:AnnotationProperty}\ .\\
    & \vdots\\
 attr_n & \texttt{a} & \texttt{owl:AnnotationProperty}\ .

\end{array}\right. \end{array} </math>

Concerns:

  • Use of notes is reasonable for things like "GUI Color",
  • but NOT for the much heavier-duty use that DM offers (meta-provenance).

Uses before defined: hasAnnotation

Translating relation formulas

Generation

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} wasGeneratedBy([id],e,[a],[t],[attrs]) &\to& \left\{ \begin{array}{lcl}

 e & \texttt{prov:wasGeneratedBy} & a\ .\\
 e & \texttt{a} & \texttt{prov:Entity}\ . \\
 a & \texttt{a} & \texttt{prov:Activity}\ . \\
 a & \texttt{prov:generated} & e\ .\\
 a & \texttt{prov:hadQualifiedGeneration} & id\ .\\
 id & \texttt{a} & \texttt{prov:Generation}\ . \\
 id & \texttt{prov:hadQualifiedEntity} & e\ .\\
 id & attr_1 & val_1\ .\\
   & \vdots\\
 id & attr_n & val_n\ .\\
 e & \texttt{prov:wasGeneratedAt} & t\ .

\end{array}\right.

\end{array} </math>

An entity can only be generated once (Tim's claim, does DM say anything about it?):

prov:Entity rdfs:subClassOf [ a owl:Restriction; owl:onProperty prov:wasGeneratedAt; owl:maxCardinality 1 ] .
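
The generation rule has several optional arguments, and the rule does not spell out what to emit when they are missing. The sketch below makes one possible choice explicit: triples that mention an absent argument are skipped, and attributes without an id are rejected because there is no subject to attach them to. These choices, the function name generation_to_triples, and the ex: identifiers are assumptions made for illustration.

```python
def generation_to_triples(e, id_=None, a=None, t=None, attrs=()):
    """Translate wasGeneratedBy([id], e, [a], [t], [attrs]) into triples."""
    if id_ is None and attrs:
        # Assumption: attributes need the generation id as their subject.
        raise ValueError("attributes require a generation id")
    triples = [(e, "a", "prov:Entity")]
    if a is not None:
        triples += [
            (e, "prov:wasGeneratedBy", a),
            (a, "a", "prov:Activity"),
            (a, "prov:generated", e),
        ]
    if id_ is not None:
        triples += [
            (id_, "a", "prov:Generation"),
            (id_, "prov:hadQualifiedEntity", e),
        ]
        if a is not None:
            triples.append((a, "prov:hadQualifiedGeneration", id_))
        triples += [(id_, attr, val) for attr, val in attrs]
    if t is not None:
        triples.append((e, "prov:wasGeneratedAt", t))
    return triples


# Hypothetical example with every optional argument present.
gen = generation_to_triples("ex:e1", id_="ex:g1", a="ex:a1",
                            t="2012-02-12T00:57:00", attrs=[("ex:port", "out")])
```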

Usage

PROV-DM (eg) <math>\implies</math> PROV-O (eg)


<math> \begin{array}{lcl}

used([id],a,e,[t],[attrs]) &\to& \left\{ \begin{array}{lcl}

 a & \texttt{prov:used} & e\ .\\
 a & \texttt{a} & \texttt{prov:Activity}\ . \\
 e & \texttt{a} & \texttt{prov:Entity}\ . \\
 a & \texttt{prov:hadQualifiedUsage} & id\ .\\
 id & \texttt{a} & \texttt{prov:Usage}\ . \\
 id & \texttt{prov:hadQualifiedEntity} & e\ .\\
 id & \texttt{prov:happenedAt???} & t\ .\\
 id & attr_1 & val_1\ .\\
    & \vdots\\
 id & attr_n & val_n\ .

\end{array}\right.

\end{array} </math>

Agent Association

PROV-DM (eg) <math>\implies</math> PROV-O (eg)


<math> \begin{array}{lcl} wasAssociatedWith([id],act,ag,[p],[attrs]) &\to& \left\{ \begin{array}{lcl}

 act & \texttt{prov:wasAssociatedWith} & ag\ .\\
 act & \texttt{a} & \texttt{prov:Activity}\ . \\
 ag & \texttt{a} & \texttt{prov:Agent}\ . \\
 act & \texttt{prov:hadQualifiedAssociation} & id\ .\\
 id & \texttt{a} & \texttt{prov:Association}\ . \\
 id & \texttt{prov:hadQualifiedEntity} & ag\ .\\
 id & \texttt{prov:adoptedPlan} & p\ .\\
 id & attr_1 & val_1\ .\\
    &  \vdots\\
 id & attr_n & val_n\ .

\end{array}\right.

\end{array} </math>

Uses before defined: prov:type, agent, prov:role

Starting

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl}

wasStartedBy([id],a,ag,[attrs]) &\to& \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:ActivityStart???} . \\
 a & \texttt{prov:startedAt} & t .\\
 a & \texttt{prov:wasStartedBy} & ag.\\
 id & attr_1 & val_1 .\\
 & \vdots\\
 id & attr_n & val_n .

\end{array}\right.

\end{array} </math>

We seemed to agree at F2F2 that 1) who started the activity and 2) when it was started would be separated.

Note: we agreed to leave this relationship out of the first alignment.

Ending

PROV-DM <math>\implies</math> [PROV-O]

<math> \begin{array}{lcl}

wasEndedBy([id],a,ag,[attrs]) &\to& \left\{ \begin{array}{lcl}

 a & \texttt{a} & \texttt{prov:Activity} . \\
 ag & \texttt{a} & \texttt{prov:Agent} .\\
 a & \texttt{prov:wasEndedBy} & ag.\\
 id & attr_1 & val_1 .\\
    & \vdots\\
 id & attr_n & val_n .

\end{array}\right.

\end{array} </math>

We seemed to agree at F2F2 that 1) who ended the activity and 2) when it was ended would be separated.

Note: we agreed to leave this relationship out of the first alignment.

Responsibility

PROV-DM <math>\implies</math> PROV-O

<math> \begin{array}{lcl} actedOnBehalfOf([id],ag2,ag1,[a],[attrs]) &\to& \left\{ \begin{array}{lcl}

 ag2 & \texttt{prov:actedOnBehalfOf} & ag1 . \\
 id & \texttt{a} & \texttt{prov:Delegation} . \\
 ag2 & \texttt{prov:hadQualifiedDelegation} & id .\\
 ag2 & \texttt{a} & \texttt{prov:Agent} . \\
 ag1 &  \texttt{a} & \texttt{prov:Agent} . \\
 id &  \texttt{prov:qualifiedEntity} & ag1 . \\
 id &  \texttt{prov:qualifiedActivity???} & a . \\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Derivation

PROV-DM (eg) <math>\implies</math> PROV-O (eg)


<math> \begin{array}{lcl} wasDerivedFrom([id], e2, e1, [a, [g2], [u1]], [attrs]) &\to& \left\{ \begin{array}{lcl}

 e2 & \texttt{prov:wasDerivedFrom} & e1\ .\\
 e2 & \texttt{a} & \texttt{prov:Entity}\ . \\
 e1 & \texttt{a} & \texttt{prov:Entity}\ . \\
 a & \texttt{a} & \texttt{prov:Activity}\ . \\
 e2 & \texttt{prov:wasGeneratedBy} & a\ . \\
 a & \texttt{prov:used} & e1\ . \\
 e2 & \texttt{prov:hadQualifiedDerivation} & id\ .\\
 id & \texttt{a} & \texttt{prov:Derivation}\ . \\
 id & \texttt{prov:hadQualifiedEntity} & e1\ .\\
 id & \texttt{prov:hadQualifiedActivity} & a\ .\\
 id & attr_1 & val_1\ .\\
    &  \vdots\\
 id & attr_n & val_n\ .

\end{array}\right.

\end{array} </math>

Note: I've used the [a,[g2], [u1]] notation to indicate that a, g2, and u1 are optional, but if g2 or u1 are present, then a is required as well. (Imprecise and precise derivations are mixed, but we could separate them)
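
The dependency between the optional arguments can be checked before any triples are emitted. The sketch below enforces only the stated constraint (g2 or u1 present implies a present); g2 and u1 do not appear on the right-hand side of the rule, so they produce no triples here. The sketch takes the qualified entity to be e1, which is an assumption, and derivation_to_triples and the ex: identifiers are hypothetical names.

```python
def derivation_to_triples(e2, e1, id_=None, a=None, g2=None, u1=None, attrs=()):
    """Translate wasDerivedFrom([id], e2, e1, [a, [g2], [u1]], [attrs]) into triples."""
    # g2 and u1 are only meaningful relative to an activity.
    if (g2 is not None or u1 is not None) and a is None:
        raise ValueError("g2 and u1 require the activity a")
    triples = [
        (e2, "prov:wasDerivedFrom", e1),
        (e2, "a", "prov:Entity"),
        (e1, "a", "prov:Entity"),
    ]
    if a is not None:  # precise derivation: the activity is known
        triples += [
            (a, "a", "prov:Activity"),
            (e2, "prov:wasGeneratedBy", a),
            (a, "prov:used", e1),
        ]
    if id_ is not None:
        triples += [
            (e2, "prov:hadQualifiedDerivation", id_),
            (id_, "a", "prov:Derivation"),
            (id_, "prov:hadQualifiedEntity", e1),  # assumption: e1 is the qualified entity
        ]
        if a is not None:
            triples.append((id_, "prov:hadQualifiedActivity", a))
        triples += [(id_, attr, val) for attr, val in attrs]
    return triples


# Hypothetical examples: an imprecise and a precise derivation.
imprecise = derivation_to_triples("ex:e2", "ex:e1")
precise = derivation_to_triples("ex:e2", "ex:e1", id_="ex:d1", a="ex:a1")
```

Separating imprecise and precise derivations, as the note suggests, would amount to splitting this function on whether a is given.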

AlternateOf

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} alternateOf(e1,e2,[attrs]) &\to& \left\{ \begin{array}{lcl}

 e1 & \texttt{prov:alternateOf} & e2 . \\
 e1 & \texttt{prov:hadQualified???} & id .\\
 e1 & \texttt{a} & \texttt{prov:Entity} . \\
 e2 &  \texttt{a} & \texttt{prov:Entity} . \\
 ?? &  \texttt{prov:qualifiedEntity} & e2 . \\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>


Note: we agreed to leave this relationship out of the first alignment.

SpecializationOf

PROV-DM (eg) <math>\implies</math> PROV-O (eg)


<math> \begin{array}{lcl} specializationOf(e1,e2,[attrs]) &\to& \left\{ \begin{array}{lcl}

 e1 & \texttt{prov:specializationOf} & e2 . \\
 e1 & \texttt{prov:hadQualified???} & id .\\
 e1 & \texttt{a} & \texttt{prov:Entity} . \\
 e2 &  \texttt{a} & \texttt{prov:Entity} . \\
 ?? &  \texttt{prov:qualifiedEntity} & e2 . \\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Note: we agreed to leave this relationship out of the first alignment.

Annotation

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl} hasAnnotation(id,[attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Note: we'll be using rdfs:comment and rdfs:label to handle the annotations.

Account

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl} Account(id,[asserter],[attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Note: we agreed to leave this relationship out of the first alignment.

Record Container

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl} container namespaceDeclarations (record) + 'end container' &\to& \left\{ \begin{array}{lcl} \end{array}\right. \end{array} </math>

Note: we agreed to leave this relationship out of the first alignment.

Time

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

Asserter

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

Location

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

Traceability

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl} tracedTo(id1,id2, [attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Activity Ordering

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} wasInformedBy(id1,id2, [attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Revision

PROV-DM (eg) <math>\implies</math> [PROV-O] (eg)

<math> \begin{array}{lcl} wasRevisionOf(id1,id2,[ag], [attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Attribution

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} wasAttributedTo(e,ag,[attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Quotation

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} wasQuotedFrom(e1,e2,[ag1],[ag2],[attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Summary

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} wasSummaryOf(e2,e1, [attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Original Source

PROV-DM (eg) <math>\implies</math> PROV-O (eg)

<math> \begin{array}{lcl} hadOriginalSource(e1,e2, [attrs]) &\to& \left\{ \begin{array}{lcl}

 TBD.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \end{array} </math>

Old monolithic list

<math> \begin{array}{lcl} wasGeneratedBy(id,e,a,attrs,t) &\to& \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:Generation} . \\
 id & \texttt{prov:hadQualifiedEntity} & e .\\
 a & \texttt{prov:hadQualifiedGeneration} & id .\\
 e & \texttt{prov:wasGeneratedAt} & t .\\
 e & \texttt{prov:wasGeneratedBy} & a .\\
 id & \texttt{prov:happenedAt???} & t .\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \\\\ used(id,e,a,attrs,t) &\to& \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:Usage} . \\
 id & \texttt{prov:hadQualifiedEntity} & e .\\
 a & \texttt{prov:hadQualifiedUsage} & id .\\
 id & \texttt{prov:happenedAt???} & t .\\
 a & \texttt{prov:used} & e .\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \\\\ wasStartedBy(id,a,ag,attrs) &\to& \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:ActivityStart???} . \\
 a & \texttt{prov:startedAt} & t .\\
 a & \texttt{prov:wasStartedBy} & ag.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \\\\ wasEndedBy(id,a,ag,attrs) &\to& \left\{ \begin{array}{lcl}

 id & \texttt{a} & \texttt{prov:ActivityEnd???} . \\
 a & \texttt{prov:endedAt} & t .\\
 a & \texttt{prov:wasEndedBy} & ag.\\
 id & attr_1 & val_1 .\\
 \vdots\\
 id & attr_n & val_n .

\end{array}\right. \\\\

 alternateOf(e_1,e_2) &\to& 
  e_1~ \texttt{prov:alternateOf}~ e_2

\\\\ specializationOf(e_1,e_2) &\to& e_1 ~\texttt{prov:specializationOf} ~e_2 \\\\

hasAnnotation(r,n,attrs) &\to& 

\left\{ \begin{array}{lcl}

 r & \texttt{prov:hasAnnotation???} & \texttt{n} . \\
 ?? & attr_1 & val_1 .\\
 \vdots\\
 ?? & attr_n & val_n .

\end{array}\right. \end{array} </math>

Questions/problems

  1. The element formula for activities is the only one that mentions additional things besides attributes. This seems odd.
  2. It isn't obvious whether we should emit a triple saying that the plan element of an activity is a <math>\texttt{prov:Plan}</math>. I guess this can be inferred if we omit it?
  3. In the rule for note, there is no class we can assign to the id. (The obvious idea of using rdfs:comment doesn't work because there's no separate class for the comments, and the range of rdfs:comment is Literal.) Is this a problem? Proposed solution: add class prov:Note.
  4. wasGeneratedBy has a time which can be linked to the generated entity by <math>\texttt{prov:wasGeneratedAt}</math>, but I think the time should be linked directly to the id. Proposed solution: introduce <math>\texttt{prov:happenedAt}</math>, define <math>\texttt{prov:wasGeneratedAt}</math> as the composition of <math>\texttt{prov:happenedAt}</math> and <math>\texttt{prov:hadQualifiedEntity}</math>.
  5. used has a time and it's not obvious what this should be linked to in RDF and how. There is no relation for linking the used id to the time. Proposed solution: introduce <math>\texttt{prov:happenedAt}</math>.
  6. wasStartedBy and wasEndedBy are treated as events (and they have ids and attributes), but there is no class for them. Proposed solution: introduce <math>\texttt{prov:ActivityStart}</math> and <math>\texttt{prov:ActivityEnd}</math> as subclasses of QualifiedInvolvement.
  7. wasStartedBy and wasEndedBy rules have no obvious way to link the start and end time.
  8. In hasAnnotation, should the attributes be connected to r or to n? Given that the note n can have arbitrary attributes, why does hasAnnotation have additional attributes?
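
The composition proposed in item 4 can be made concrete with a small sketch over triples represented as tuples of strings. Read literally, composing prov:hadQualifiedEntity with prov:happenedAt would also fire for Usage nodes if item 5 is adopted, so the sketch restricts the inference to nodes typed prov:Generation. That restriction, the function name infer_was_generated_at, and the example identifiers are assumptions, not part of the proposal.

```python
def infer_was_generated_at(triples):
    """Derive (e, prov:wasGeneratedAt, t) from a Generation node's entity and time."""
    generations = {s for s, p, o in triples
                   if p == "a" and o == "prov:Generation"}
    inferred = set()
    for g in generations:
        entities = [o for s, p, o in triples
                    if s == g and p == "prov:hadQualifiedEntity"]
        times = [o for s, p, o in triples
                 if s == g and p == "prov:happenedAt"]
        # Pair every qualified entity of g with every time recorded on g.
        inferred.update((e, "prov:wasGeneratedAt", t)
                        for e in entities for t in times)
    return inferred


# Hypothetical graph: one generation and one usage of the same entity.
graph = {
    ("ex:g1", "a", "prov:Generation"),
    ("ex:g1", "prov:hadQualifiedEntity", "ex:e1"),
    ("ex:g1", "prov:happenedAt", "2012-02-12T00:57:00"),
    ("ex:u1", "a", "prov:Usage"),
    ("ex:u1", "prov:hadQualifiedEntity", "ex:e1"),
    ("ex:u1", "prov:happenedAt", "2012-02-13T09:00:00"),
}
result = infer_was_generated_at(graph)
```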

From PROV-O to PROV-DM

Given an instance of PROV-O, we want to compute an instance of PROV-DM that has the "same meaning".

The basic idea is:

  1. For each node in the RDF graph, check whether the node is an instance of one of the PROV-O classes Entity, Agent, or Activity.
    1. For each such node, look for the appropriate edges in the prov: namespace needed to fill in the fields of the corresponding PROV-DM record.
    2. Any additional fields in other namespaces are added as attributes.
  2. For each of the edges / graph patterns corresponding to PROV-DM relations, look for the corresponding data and generate the appropriate relation.

[TODO: Flesh this out!]
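
As a starting point, the two steps above might look like the following sketch. It is deliberately partial: the graph is a set of string tuples, only prov:startedAt and prov:endedAt are treated as record fields, only two direct relations are recognized, and qualified (id-bearing) patterns are ignored. The names extract_records, ELEMENT_CLASSES, ELEMENT_FIELDS and RELATIONS, and the output record shapes, are all hypothetical.

```python
ELEMENT_CLASSES = {"prov:Entity": "entity", "prov:Agent": "agent",
                   "prov:Activity": "activity"}
# prov: edges that fill fixed fields of an element record (assumed subset).
ELEMENT_FIELDS = {"activity": ("prov:startedAt", "prov:endedAt")}
# Direct edges that map one-to-one onto a PROV-DM relation (assumed subset).
RELATIONS = {"prov:used": "used", "prov:wasGeneratedBy": "wasGeneratedBy"}


def extract_records(triples):
    """Extract PROV-DM-like records from a set of (s, p, o) string triples."""
    triples = sorted(triples)  # deterministic output order
    records = []
    # Step 1: one element record per node typed with a PROV-O element class.
    for node, pred, cls in triples:
        if pred != "a" or cls not in ELEMENT_CLASSES:
            continue
        kind = ELEMENT_CLASSES[cls]
        fields, attrs = {}, []
        for s, p, o in triples:
            if s != node or p == "a":
                continue
            if p in ELEMENT_FIELDS.get(kind, ()):
                fields[p] = o            # step 1.1: prov: edge fills a field
            elif not p.startswith("prov:"):
                attrs.append((p, o))     # step 1.2: other namespaces become attributes
        records.append((kind, node, fields, attrs))
    # Step 2: one relation record per recognized edge.
    for s, p, o in triples:
        if p in RELATIONS:
            records.append((RELATIONS[p], s, o))
    return records


# Hypothetical input graph.
graph = {
    ("ex:a1", "a", "prov:Activity"),
    ("ex:a1", "prov:startedAt", "t0"),
    ("ex:a1", "ex:label", "compile"),
    ("ex:a1", "prov:used", "ex:e1"),
    ("ex:e1", "a", "prov:Entity"),
}
records = extract_records(graph)
```

Nodes that are typed only implicitly (for example, the object of prov:used with no explicit prov:Entity triple) are not picked up by this sketch; handling them would require the inference rules of PROV-O.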