Using graphs to model Accounts

From Provenance WG Wiki
Revision as of 18:31, 15 January 2012 by Tlebo

author: Tim Lebo

Feedback is very much welcome (public-prov-wg@w3.org).

Previous materials

"named" was removed from this document's title to better reflect how Accounts should be treated in RDF, to align with RDF 1.1 WG's perspective, and to avoid confusion from many conflicting interpretations of "named" that have accumulated through years of development without a formal W3C specification.

Starting fresh

This discussion will work from the PROV-DM draft, attempt to create concrete RDF encodings of the examples given there, recast the examples to reflect realistic RDF use, and highlight some shortcomings that arise when using PROV-DM in a PROV-O context.

Three parts of an Account

prov-dm b984f67f3465 #account-example-1 shows an account that is reproduced below. An account:

  • is named with an identifier (ex:acc0),
  • cites an asserting agent (http://example.org/asserter), and
  • indicates the records that have been asserted by the agent.
account(ex:acc0,
        http://example.org/asserter, 
          entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
          ...
          wasDerivedFrom(e2,e1)
          ...
          activity(a0,t,,[prov:type="createFile"])
          ...
          wasGeneratedBy(e0,a0)     
          ...
          wasAssociatedWith(a4, ag5, [prov:role="communicator"])  )

An Account's records are a Graph

Since Record Containers provide an implicit account [1], let's start by focusing on encoding the records above without the account's name or asserter. The records shown above can be encoded in RDF (and have been). The following RDF uses 13 triples to reflect the entity, wasDerivedFrom, activity, wasGeneratedBy, and wasAssociatedWith records. (The RDF encoding is hardly complete; for discussion purposes all we need are a few triples.)

@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix time:    <http://www.w3.org/2006/time#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix skos:    <http://www.w3.org/2008/05/skos#> .
@prefix prov:    <http://www.w3.org/ns/prov-o/> .
@prefix ex:      <http://dvcs.w3.org/hg/prov/raw-file/898c71190fd5/ontology/components/Account/prov-dm-b984f67f3465-example.ttl#ex> .
@prefix :        <http://dvcs.w3.org/hg/prov/raw-file/898c71190fd5/ontology/components/Account/prov-dm-b984f67f3465-example.ttl#> .

:e0
   dcterms:description "entity(e0, [ prov:type='File', ex:path='/shared/crime.txt', ex:creator='Alice' ])";
   a prov:Entity;
   a :File;
   ex:path "/shared/crime.txt";
   ex:creator :Alice;
.

:e2
   dcterms:description "wasDerivedFrom(e2,e1)";
   prov:wasDerivedFrom :e1;
.

:a0
   dcterms:description "activity(a0,t,,[prov:type='createFile'])";
   a prov:Activity;
   a :CreateFile;
.

:e0
   dcterms:description "wasGeneratedBy(e0,a0)";
   prov:wasGeneratedBy :a0;
.

:a4
   dcterms:description "wasAssociatedWith(a4, ag5, [prov:role='communicator'])";
.

The text shown above and the identical text on prov hg can both be considered Record Containers because they fulfill the notions behind the ASN grammar definition (reproduced below): they start, specify some namespaces, specify some records, and then they stop. The same set of 13 triples somewhere else on prov hg (and in a different format) also fulfills the definition of a Record Container; the N-Triples file starts, specifies no namespace declarations, specifies some records (with namespaces), and then it stops. We could continue to proliferate these 13 triples into an assortment of other locations, formats, and access protocols, but we'll stop at three for discussion purposes.

recordContainer ::= 'container' namespaceDeclarations ( record ) + 'endContainer '

The common aspect of all three of these texts -- the 13 abstract triples -- is the important element of an account. They reflect what was asserted by some asserter, regardless of location or serialization. Here, we'll use a graph hash of the 13 triples (CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf) to name the RDF Abstract Graph that they constitute. To reinforce this, we'll construct a URI using the hash and describe the RDF Abstract Graph in some RDF:

<http://purl.org/twc/rdf-abstract-graph/CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf>
  a rdf:Graph;

  dcterms:description """The 13 abstract (independent of serialization) RDF triples that represent the 
provenance records in 
http://dvcs.w3.org/hg/prov/raw-file/b984f67f3465/model/ProvenanceModel.html#RecordContainer""";

  void:triples 13;

  rdfs:comment """CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf 
is a serialization-independent hash of the 13 abstract triples in this graph.""";

  rdfs:comment """The bitstream returned when dereferencing this URI awww:represents the abstract graph,
which may be in an arbitrary format.""";

  void:dataDump <http://dvcs.w3.org/hg/prov/raw-file/56929fdd1a45/ontology/components/Account/prov-dm-b984f67f3465-example.ttl>,
                <http://dvcs.w3.org/hg/prov/raw-file/56e1c6081026/ontology/components/Account/prov-dm-b984f67f3465-example.ttl.nt>;
.
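
The idea of naming an abstract graph by a serialization-independent hash can be sketched as follows. This is only an illustration under our own assumptions: the algorithm that produced CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf is not described here, so the sketch substitutes SHA-256 over sorted N-Triples lines, assumes the graph has no blank nodes, and uses a hypothetical base URI and hypothetical predicate URIs.

```python
import base64
import hashlib


def graph_hash(ntriples_lines):
    """Hash a set of triples regardless of their order or repetition.

    Assumptions (ours, not the document's): no blank nodes, and every
    serialization is first converted to identical N-Triples lines.
    """
    # A set discards duplicates; blank lines and comments carry no triples.
    triples = {
        line.strip()
        for line in ntriples_lines
        if line.strip() and not line.lstrip().startswith("#")
    }
    # Sorting fixes one canonical order, so file order no longer matters.
    canonical = "\n".join(sorted(triples)).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    # URL-safe base64 without padding can be placed in a URI path.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def graph_uri(ntriples_lines, base="http://example.org/rdf-abstract-graph/"):
    """Mint a URI for the abstract graph; the base URI is hypothetical."""
    return base + graph_hash(ntriples_lines)


# Two "files" holding the same triples in a different order.
turtle_order = [
    "<http://example.org/e2> <http://example.org/wasDerivedFrom> <http://example.org/e1> .",
    "<http://example.org/e0> <http://example.org/wasGeneratedBy> <http://example.org/a0> .",
]
other_order = list(reversed(turtle_order)) + ["", "# a comment"]
same_graph = graph_uri(turtle_order) == graph_uri(other_order)
```

Here same_graph is True because both inputs reduce to the same set of lines. Graphs with blank nodes would need a real canonicalization step, which this sketch deliberately leaves out.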

Asserting a graph

Now that we have the URI http://purl.org/twc/rdf-abstract-graph/CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf to awww:identify the Abstract RDF Graph of 13 triples, how do we relate it to the asserter and the name of the account?

The recent addition of wasAssociatedWith allows us to ascribe to an agent a muddled notion of "responsibility". Fortunately, "responsibility" is exactly what we need to make an assertion; it establishes the qualitative distinction between generating something and claiming something. Further, we can (and should!) reuse the core constructs of PROV-DM to phrase an account in terms of how it was created:

An Account is an Entity that was generated by an asserter during an assertion activity.

In PROV-O, the generated Entity is a set of abstract triples that use predicates and classes from the prov: namespace, among others.

So, reusing ex:acc0 and <http://example.org/asserter> from the DM example, a PROV-O encoding would resemble the following.

TODO use wasAttributedTo and a qualified wasAttributedTo if further details are needed.
ex:acc0
   a prov:AssertionActivity, prov:Activity;
   prov:wasAssociatedWith <http://example.org/asserter>;
   prov:generated         <http://purl.org/twc/rdf-abstract-graph/CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf>;
.

The PROV-O statements above relate the three elements of an Account that were enumerated in this document's introduction. An account...

  • is named with an identifier (ex:acc0),
  • cites an asserting agent (http://example.org/asserter), and
  • indicates the records that have been asserted by the agent.

When Accounts (i.e. AssertionActivities) are modeled as a specialization of Activities, we are able to express much more about the Account by reusing PROV-DM core, and without creating new constructs. Now, we can say as much as we want about the Account, such as when it was performed, how it was performed, if any witnesses were present, etc.

From Graph, up to location, and down to serialization

<http://purl.org/twc/rdf-abstract-graph/CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf> awww:identifies 13 specific RDF abstract triples that <http://example.org/asserter> claimed, regardless of location or serialization. Although this is conceptually clean, it is not practical since one cannot query abstract triples. To obtain the 13 triples we must do one of:

Each of these activities would involve their own, more mechanical, provenance that we are going to avoid for now. However, each kind of retrieval creates a new concrete form of the same abstract content and places it in a new location. Thus, we can "go lots of places" to see what <http://example.org/asserter> claimed. In the next section, we focus on how this would be done in a SPARQL triple store.

Benefiting FROM NAMED in SPARQL

It is imperative that any design for PROV-O Accounts leverage the FROM NAMED and GRAPH constructs that SPARQL provides. SPARQL triple stores are commonly used to accumulate RDF data from a variety of places so that their interrelations may be discovered and used. "Partitions" within a SPARQL triple store are called named graphs, which can now be described using the SPARQL 1.1 Service Description's RDFS vocabulary. The following RDF uses this vocabulary to describe a named graph created while this proposal was being developed.

<http://logd.tw.rpi.edu/sparql>
   a sd:Service;
   sd:defaultDatasetDescription [
      a sd:Dataset; # is an optional defaultGraph and 0 or more namedGraphs
      sd:namedGraph [
         sd:name  <http://purl.org/twc/rdf-abstract-graph/CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf>;
                  # The triple store owner could have chosen any sd:name, but we reuse the URI of the
                  # abstract graph for convenience. If this convention is not used, then we'll need an
                  # extra level of indirection in our query to find which named graph has the abstract 
                  # content (claims) we want.
         sd:graph <http://purl.org/twc/rdf-abstract-graph/CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf>;
      ];
   ];
.
<http://purl.org/twc/rdf-abstract-graph/CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf>
   void:triples 13;
.

Knowing about the sd:Service and how it names one of its graphs, we can query it with SPARQL:

prefix prov: <http://www.w3.org/ns/prov-o/>
select distinct ?entity
where {
  graph <http://purl.org/twc/rdf-abstract-graph/CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf> {
    ?entity a prov:Entity .
  }
}

Account assertions separate from claimed content

Knowing a little more about what is in the triple store, we can query only the content that has been claimed by <http://example.org/asserter>.

<http://logd.tw.rpi.edu/sparql>
   a sd:Service;
   sd:defaultDatasetDescription [
      a sd:Dataset; # is an optional defaultGraph and 0 or more namedGraphs
      sd:namedGraph [
         sd:name  <http://purl.org/twc/rdf-abstract-graph/CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf>;
                  # The triple store owner could have chosen any sd:name, but we reuse the URI of the
                  # abstract graph for convenience. If this convention is not used, then we'll need an
                  # extra level of indirection in our query to find which named graph has the abstract 
                  # content (claims) we want.
         sd:graph <http://purl.org/twc/rdf-abstract-graph/CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf>;
      ];
      sd:namedGraph [
         sd:name  <http://example.org/asserter>;
                  # Here, the triple store owner chose a different sd:name than that of the abstract graph
                  # available. This may have been done because it is helpful to group all assertions by
                  # their asserter.
         sd:graph <http://purl.org/twc/rdf-abstract-graph/A1kwBx91NULsklWGKC4aMPstVcgRuH1ZeDmhynahlZ-8>;
      ];
   ];
.
<http://purl.org/twc/rdf-abstract-graph/CEb-b5-djBpjqa-a9-Z7MECf2_KXmNsjT435R-XYF6hf>
   void:triples 13;
.
<http://purl.org/twc/rdf-abstract-graph/A1kwBx91NULsklWGKC4aMPstVcgRuH1ZeDmhynahlZ-8>
   void:triples 4;
.

prefix prov: <http://www.w3.org/ns/prov-o/>
select distinct ?entity
where {
  graph <http://example.org/asserter> {
    []
      a prov:AssertionActivity, prov:Activity;
      prov:wasAssociatedWith <http://example.org/asserter>;
      prov:generated         ?graph
    .
  }
  graph ?graph {
    ?entity a prov:Entity .
  }
}

It is important to note that the directness of this query (that prov:generated ?graph aligns with graph ?graph) requires the triple store maintainer to follow a naming convention for the second graph in the query. Otherwise, supplemental data relating the local graph name to the URI of the abstract graph will be required, along with additional graph patterns in the query itself.
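
The two-step lookup, and the extra indirection needed when the naming convention is not followed, can be sketched in plain Python. This is a minimal illustration rather than a triple store: the dataset is a dictionary from graph names to sets of triples, "graph:CEb" and "local:g1" are hypothetical short stand-ins for graph names, and the graph_names mapping plays the role of the supplemental data mentioned above.

```python
RDF_TYPE = "rdf:type"
ASSERTER = "http://example.org/asserter"


def entities_claimed_by(dataset, assertions_graph, asserter, graph_names=None):
    """Return the prov:Entity subjects found in graphs the asserter generated.

    graph_names optionally maps an abstract graph URI to the local name the
    store gave it; without it, the store is assumed to reuse the abstract URI.
    """
    graph_names = graph_names or {}
    assertions = dataset.get(assertions_graph, set())
    # Step 1: assertion activities associated with the asserter.
    activities = {
        s for (s, p, o) in assertions
        if p == "prov:wasAssociatedWith" and o == asserter
        and (s, RDF_TYPE, "prov:AssertionActivity") in assertions
    }
    # Step 2: the abstract graphs those activities generated.
    graphs = {o for (s, p, o) in assertions
              if p == "prov:generated" and s in activities}
    # Step 3: look inside each graph under its local name.
    found = set()
    for graph in graphs:
        local_name = graph_names.get(graph, graph)
        for (s, p, o) in dataset.get(local_name, set()):
            if p == RDF_TYPE and o == "prov:Entity":
                found.add(s)
    return sorted(found)


account_triples = {
    ("ex:acc0", RDF_TYPE, "prov:AssertionActivity"),
    ("ex:acc0", "prov:wasAssociatedWith", ASSERTER),
    ("ex:acc0", "prov:generated", "graph:CEb"),
}
claims = {(":e0", RDF_TYPE, "prov:Entity"), (":a0", RDF_TYPE, "prov:Activity")}

# Store that follows the convention: graph name equals abstract graph URI.
conventional = {ASSERTER: account_triples, "graph:CEb": claims}
# Store that chose its own name for the same content.
renamed = {ASSERTER: account_triples, "local:g1": claims}

direct = entities_claimed_by(conventional, ASSERTER, ASSERTER)
missed = entities_claimed_by(renamed, ASSERTER, ASSERTER)
mapped = entities_claimed_by(renamed, ASSERTER, ASSERTER,
                             graph_names={"graph:CEb": "local:g1"})
```

With the convention in place the lookup is direct; in the renamed store it finds nothing until the mapping from abstract graph to local name is supplied.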

Nested accounts

In the previous sections, we argued that an Account's provenance records constitute an abstract RDF graph that exists independent of location or serialization, and that a particular named graph in a particular triple store can be used to store one concrete form of the graph. We proposed to model Accounts using a special kind of Activity that relates the "responsible" asserter to the graph being asserted. These two sets of assertions -- the assertion of the account's existence versus the claims of the account -- were stored in separate named graphs, which allows claims to be queried according to any aspect of the assertion activity.

The second prominent Account example in the PROV-DM shows how Accounts can be nested. In this section, we reapply the design from before to show how graphs can be used to model nested accounts.

account(ex:acc3,
        http://example.org/asserter1, 
          entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
          activity(a0,t,,[prov:type="createFile"])
          wasGeneratedBy(e0,a0,[])  
          account(ex:acc4,
                  http://example.org/asserter2,
                    entity(e1, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="" ])
                    activity(a0,t,,[prov:type="copyFile"])
                    wasGeneratedBy(e1,a0,[ex:fct="create"])
                    wasComplementOf(e1,e0)))

The PROV-O resembles the previous two-graph example, but has a third graph. This time, we use TriG to encode the quads. The example is also on prov hg.

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix sd:      <http://www.w3.org/ns/sparql-service-description#> .
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix prov:    <http://www.w3.org/ns/prov-o/> .
@prefix ex:      <#ex> .
@prefix :        <#> .

ex:acc3
   a prov:Account;
   prov:wasAttributedTo <http://example.org/asserter1>;
   void:triples 13;
.

[] a sd:NamedGraph;
   sd:name  ex:acc3_claims;
   sd:graph ex:acc3;
.

ex:acc3_claims prov:specializationOf ex:acc3 .

ex:acc3_claims {
   :e0
      dcterms:description "entity(e0, [ prov:type='File', ex:path='/shared/crime.txt', ex:creator='Alice' ])";
      a prov:Entity;
      a :File;
      ex:path "/shared/crime.txt";
      ex:creator :Alice;
   .

   :a0
      dcterms:description "activity(a0,t,,[prov:type='createFile'])";
      a prov:Activity;
      a :CreateFile;
   .

   :e0
      dcterms:description "wasGeneratedBy(e0,a0,[])";
      a prov:Entity;
      prov:wasGeneratedBy :a0;
   .

   ex:acc4
      a prov:Account;
      prov:wasAttributedTo <http://example.org/asserter2>;
      void:triples 14;
   .

   # Note that this is provenance of provenance. ex:acc3 does not include the assertions of ex:acc4.
   # For the latter, ex:acc3 would need to include the triple "ex:acc3 void:subset ex:acc4 ." by way of ex:acc3_claims.
}

[] a sd:NamedGraph;
   sd:name  ex:acc4_claims;
   sd:graph ex:acc4;
.

ex:acc4_claims prov:specializationOf ex:acc4 .

ex:acc4_claims {
   :e1
      dcterms:description "entity(e1, [ prov:type='File', ex:path='/shared/crime.txt', ex:creator='Alice', ex:content='' ])";
      a prov:Entity;
      ex:path "/shared/crime.txt";
      ex:creator :Alice;
      ex:content "";
   .

   :a0
      dcterms:description "activity(a0,t,,[prov:type='copyFile'])";
      a prov:Activity;
      a :CopyFile;
   .

   :e1
      dcterms:description "wasGeneratedBy(e1,a0,[ex:fct='create'])";
      prov:wasGeneratedBy :a0;
      ex:fct :create;

      dcterms:description "wasComplementOf(e1,e0))";
      prov:wasComplementOf :e0;
   .

   :e1
      a prov:Entity;
   .
}
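
To make the nesting concrete, the following sketch walks from one claims graph to the accounts described inside it, and from each of those to its own claims graph via prov:specializationOf. It is a plain-Python illustration under our own assumptions (the dictionary layout and the function name are hypothetical), and it only mirrors the structure of the TriG above.

```python
def nested_accounts(dataset, default_graph, claims_name, depth=1, seen=None):
    """List (depth, account) pairs for accounts described in a claims graph."""
    seen = set() if seen is None else seen
    found = []
    for (s, p, o) in sorted(dataset.get(claims_name, set())):
        if p == "rdf:type" and o == "prov:Account" and s not in seen:
            seen.add(s)  # guard against accounts that describe each other
            found.append((depth, s))
            # Follow the account to the named graph(s) holding its claims.
            for (c, q, a) in sorted(default_graph):
                if q == "prov:specializationOf" and a == s:
                    found += nested_accounts(
                        dataset, default_graph, c, depth + 1, seen)
    return found


# Triples outside any named graph in the TriG example.
default_graph = {
    ("ex:acc3_claims", "prov:specializationOf", "ex:acc3"),
    ("ex:acc4_claims", "prov:specializationOf", "ex:acc4"),
}
# A few of the triples inside each named graph.
dataset = {
    "ex:acc3_claims": {
        (":e0", "rdf:type", "prov:Entity"),
        ("ex:acc4", "rdf:type", "prov:Account"),
    },
    "ex:acc4_claims": {
        (":e1", "rdf:type", "prov:Entity"),
    },
}

nesting = nested_accounts(dataset, default_graph, "ex:acc3_claims")
```

Starting from ex:acc3_claims, the walk finds ex:acc4 at depth 1 and nothing deeper, since ex:acc4_claims describes no further accounts.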

Outstanding issues

  • There is currently no standard way to describe how or why a triple store's named graph was created, how it is being maintained, and how it relates to other named graphs. This can be seen with the named graph whose sd:name is <http://example.org/asserter> in the section "Account assertions separate from claimed content".
    • The inability to describe these associations was discussed at RDF 1.1 F2F2, but seemed to be a low priority and may become a note if it gains enough motivation. The note would outline the commonly used patterns to organize named graphs in a triple store.
    • Fortunately, the SPARQL Service Description (sd:) gives us the vocabulary for the subjects of interest. So most of the solution would extend and combine sd:, void:, and prov:.
  • Scope of naming returned to PROV-DM with a vengeance, and named graphs and SPARQL do not (and should not) support it.
  • The need for sd:name'ing conventions is not pronounced enough in this discussion, but emphasizing it would be too much of a distraction.
  • If sd:names are not chosen appropriately, more descriptions and larger queries will be needed. The fact that the design handles more general distributed cases while looking and behaving like the conventional usage is not made clear here, but explaining it would again be a distraction.

Comments

Examples

Piece-wise accounts

http://www.w3.org/mid/EMEW3%7Cce5b59aaefea0954e60fddb6da813220o04DyB08L.Moreau%7Cecs.soton.ac.uk%7C4F05ABBC.4020608@ecs.soton.ac.uk

  Using prov-aq, I may retrieve the provenance of entity e1, and obtain:

  acc(ex:a1,
      http://ex/asserter1,
      entity(e1,[...])
      ...)

  Again, using prov-aq, I may retrieve the provenance of entity e2, and obtain:


  acc(ex:a1,
      http://ex/asserter1,
      entity(e2,[...])
      ...)

  Same account in the sense that it is generated by
  http://ex/asserter1 and named ex:a1, but different subset of
  records.

If ex:a1 is a sd:NamedGraph's sd:name, then we wouldn't know which (RDF abstract) sd:Graphs we would be allowed to merge. The SPARQL Service Description vocabulary does not tell us whether :ng1 and :ng2 are the same, because they only share the same sd:name.

:ng1 a sd:NamedGraph;
     sd:name ex:a1;
     sd:graph [ void:triples 10 ] .
:ng2 a sd:NamedGraph;
     sd:name ex:a1;
     sd:graph [ void:triples 10 ] .

If ex:a1 denotes a sd:Graph, then we would know that any triples ever asserted to be in it 'are' in that graph.

:ng1 a sd:NamedGraph;
     sd:name :anything;
     sd:graph ex:a1;
.
ex:a1 void:triples 10 .
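
The difference between the two readings can be sketched as a merge rule: piece-wise retrievals are combined only when they point at the same graph URI through sd:graph, never because they share an sd:name. The record layout and names below are hypothetical and only illustrate the rule.

```python
from collections import defaultdict


def merge_by_graph(named_graphs):
    """Union the triples of named graphs that denote the same sd:graph URI."""
    merged = defaultdict(set)
    for named_graph in named_graphs:
        graph = named_graph["graph"]
        # A blank-node graph (None here) gives us nothing to match on,
        # even if its sd:name equals another graph's sd:name.
        if graph is not None:
            merged[graph] |= named_graph["triples"]
    return dict(merged)


retrievals = [
    {"name": ":anything", "graph": "ex:a1",
     "triples": {(":e1", "rdf:type", "prov:Entity")}},
    {"name": ":something-else", "graph": "ex:a1",
     "triples": {(":e2", "rdf:type", "prov:Entity")}},
    {"name": "ex:a1", "graph": None,
     "triples": {(":e3", "rdf:type", "prov:Entity")}},
]

merged = merge_by_graph(retrievals)
```

The first two retrievals end up in one graph ex:a1 holding both entities, while the third is left out: it only shares a name, which is the ambiguous case described above.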