Re: Towards PROV-O Accounts

Hi, Luc.

Thanks for so much feedback and sorry for taking so long to respond.

Responses within.

On Jan 5, 2012, at 8:55 AM, Luc Moreau wrote:

> Hi Tim,
> 
> Thanks for your document on using graphs to model accounts.
> 
> There are a few issues that I think we should discuss since they
> potentially represent substantial differences with prov-dm.
> 
> 1. I have a problem with your statement:
> 
>   An Account is an Entity that was generated by an asserter during an
>   assertion activity.
> 
> This is not what prov-dm states at all.
> 
> Indeed, an account is a "thing in the world". We can then take
> multiple perspectives about that thing, which we can represent as
> entities for provenance purpose.  Having done that, then we can talk about
> the provenance of an account.
> 
> There may be multiple ways of looking at a given account acc:
>  - what account acc tells us about entity e
>  - acc with suitable anonymization
>  - acc with a cryptographic signature
>  - etc
> 
>  It is up to the provenance asserter to decide how to assert entities
>  and we can't do that for them.
>  That's why it is not right to say that an account *is* an entity.
> 
>  However, an account is a thing in the world and we can define perspectives
>  on it as entities.
> 

I appreciate your attempt to clarify this point, but I must admit I don't know how to make the distinction useful.
I see "Things" vs. "Entities" as semiotic referents and symbols, respectively.
I'm going to hope that I can recruit those in the group that have a better understanding than I do and seem to share my confusion.


> 
> 
> 2. Currently an asserter is not modelled as an agent.  There is a note
>   to that effect. Nobody has come back to this point.  Until we firm
>   up this issue, we won't be able to decide whether your modelling is
>   correct or not.

I find it difficult to conceive of an asserter that does not have any responsibility for what it stated.
And since responsibility is agency in DM, I would think the asserter must be an agent.

> 
> 3. I agree with you that if we have an entity for an account, we can
>   also explain how it is generated, etc.
> 
>   Maybe, it's easier to use attribution rather than introduce activity types.
> 
>    wasAttributedTo ( eIdentifier , agIdentifier optional-attribute-values )

I very much like this idea. Thank you for the suggestion.
As we discussed briefly a few telecons ago, would the DM be able to have qualified wasAttributedTo relations?
I think that it would be a natural question for a consumer, upon hearing that "account x was from agent y", to want to ask about how, when, or in what situation agent y stated those things (e.g., under oath in a courtroom, on Twitter at 2am on a Friday night, etc.).
I added https://www.w3.org/2011/prov/track/issues/216 so we can track this idea.

> 
> 4. I can't decide whether your graph hash is central to your encoding
>   or not.

The fact that I'm using a hash to name is not important, but _what_ I am naming _is_ important.
As long as one knows they are naming a set of abstract triples, they can choose any non-hash name they desire.
But they need to know that they are denoting the abstract triples.
I am using the hash to be clear about what I'm naming.
Adding or changing a triple would result in a new abstract graph and would thus need a new name.
Hashes exhibit this same characteristic, thus their employment.
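
To illustrate the property I mean, here is a minimal sketch (not the scheme used on the wiki page). It assumes the triples are ground (no blank nodes), serializes each one as a single line, and hashes the sorted lines. The function name graph_name, the base URI, and the http://ex/ terms are all hypothetical.

```python
import hashlib

def graph_name(triples, base="http://example.org/graph/"):
    """Derive a name from a set of ground triples (assumes no blank nodes)."""
    # One line per triple; sorting makes the name independent of triple order.
    lines = sorted(" ".join(t) + " ." for t in set(triples))
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    return base + digest

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Hypothetical example triples.
g1 = {
    ("<http://ex/e1>", "<http://ex/wasComplementOf>", "<http://ex/e0>"),
    ("<http://ex/e0>", RDF_TYPE, "<http://ex/Entity>"),
}
# Adding one triple yields a different abstract graph.
g2 = g1 | {("<http://ex/e1>", RDF_TYPE, "<http://ex/Entity>")}

same_name_any_order = graph_name(list(g1)) == graph_name(list(reversed(list(g1))))
changed_name = graph_name(g1) != graph_name(g2)
```

Any other naming scheme would do, as long as a changed set of triples gets a new name.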

> If it is part of the design, it doesn't match my view of accounts.
> 
>   Using prov-aq, I may retrieve the provenance of entity e1, and obtain:
> 
>   acc(ex:a1,
>       http://ex/asserter1,
>       entity(e1,[...])
>       ...)
> 
>   Again, using prov-aq, I may retrieve the provenance of entity e2, and obtain:
> 
> 
>   acc(ex:a1,
>       http://ex/asserter1,
>       entity(e2,[...])
>       ...)
> 
>   Same account in the sense that it is generated by
>   http://ex/asserter1 and named ex:a1, but different subset of
>   records.

That is fine, because the consumer is not naming the account.
The producer already has named it, and knew the full graph when they named it.
(Again, hashes need not be used, but any naming scheme should preserve the properties that they exhibit.)

> 
>   It's important to support that use case, since in that case, those
>   two account instances are telling us that they are the same, coming
>   from a same asserter, and can be merged without conflict (if the
>   original full account was without conflict).


This use case is not only supported, but also motivates modeling accounts as abstract graphs.
If instead we used the graph name when returning portions of an account, we wouldn't know which graphs we should merge.
I've added an example at http://www.w3.org/2011/prov/wiki/Using_graphs_to_model_Accounts#Piece-wise_accounts
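
As a rough sketch of the consumer's side of this use case (separate from the wiki example): assuming each response carries the account's name alongside the portion of triples returned, the consumer can union the portions that share a name. The function merge_by_account and the sample data are hypothetical.

```python
from collections import defaultdict

def merge_by_account(responses):
    """Union the triples of all responses that carry the same account name."""
    merged = defaultdict(set)
    for account, triples in responses:
        merged[account] |= set(triples)
    return dict(merged)

# Two portions of the same account, retrieved separately for e1 and e2,
# plus a portion of an unrelated account (all hypothetical).
about_e1 = ("ex:a1", {("ex:e1", "rdf:type", "prov:Entity")})
about_e2 = ("ex:a1", {("ex:e2", "rdf:type", "prov:Entity")})
other = ("ex:a2", {("ex:e3", "rdf:type", "prov:Entity")})

merged = merge_by_account([about_e1, about_e2, other])
```

Here merged["ex:a1"] holds the triples from both portions, while ex:a2 stays separate.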

> 
>   I don't know how these hashes work, given that these account examples
>   contain different records.

The account had already been named by the time the consumer got the two portions of it.
The URI of the account wouldn't change after the fact.


> 
> 5. Your nested account example:
> 
>  In acc4_claims, you write:
>   :e1
>      prov:wasComplementOf :e1;
> 
>  Shouldn't it be e0?
>   :e1
>      prov:wasComplementOf :e0;

Yes, I mis-transcribed. I changed it.


> 
>    How do you find which entity record this actually is?


:e1 or :e0?

I could resolve their URIs to get some descriptions.
I could query the current Dataset (0 or 1 default graphs and 0+ named graphs) which could be a trig file or a triple store, among other things.
In RDF, if :e1 or :e0 is ever mentioned or described, your "record" grew.

I must admit, I do not understand your continuing need to "find a record".
I'm starting to think that your "records" are bounded by an RDF Graph.
The semantic web is designed to let the subgraph around an entity transcend the boundaries of particular files, graphs, stores, etc.
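
A small sketch of what I mean by querying the Dataset, assuming it is held in memory as a mapping from graph name to a set of triples (the key "(default)" stands in for the default graph). The function describe and the sample triples are hypothetical.

```python
def describe(dataset, subject):
    """Return every (graph, predicate, object) about the subject, across all graphs."""
    return sorted(
        (graph, p, o)
        for graph, triples in dataset.items()
        for s, p, o in triples
        if s == subject
    )

# Hypothetical Dataset: one default graph and two named graphs.
dataset = {
    "(default)": {(":e1", "rdf:type", "prov:Entity")},
    "ex:acc4_claims": {(":e1", "prov:wasComplementOf", ":e0")},
    "ex:acc3_claims": {(":e0", "rdf:type", "prov:Entity")},
}

description = describe(dataset, ":e1")
```

The description of :e1 draws on two graphs here, and it would grow as soon as any other graph mentioned :e1 as a subject.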


> 
>   To be compliant with prov-dm, you should probably encode the example as
> 
>   :e1, ex:acc4
>      prov:wasComplementOf :e0 , ex:acc3;

This isn't valid Turtle or Trig.

> 
>   meaning what is asserted about e1 in account4 wasComplementOf what is asserted
>   about e0 in account3.

That is already stated (I abbreviate here). Remember that :e0 is an abbreviation for a full URI, so both occurrences of :e0 are the same URI (and thus refer to the same resource or "Thing"). Though, if anyone else asserts something about :e1 in their account, we wouldn't know if ex:acc3_claims was complementing that, too.

ex:acc4_claims {
   :e1 prov:wasComplementOf :e0;
}
ex:acc3_claims {
   :e0 ?p ?o
}



> 
> 6. In your example, is it problematic or not, to have two different
>   entity records containing the same entity identifier?

You named them the same in ASN, so I named them the same in URI. If they should not be the same, then I can change one of the URIs.

The serialization doesn't matter here. It's RDF triples.
Just as discussed in our last telecon, "record" doesn't mean anything in RDF.
I think you agreed that if anything, an RDF "record" was a subgraph. The union of these 5 triples below is a graph. One subject with five "predicate-object" pairs.
If the two accounts are not describing the same activity (or are offering different characterized perspectives), then one should be renamed and we could go as far as asserting that they are distinct.

The URI that :a0 is abbreviating is awww:identifying an Activity, a "Thing in the world", an awww:Resource. The URI is a symbol that denotes the Activity, and the Activity is being described (characterized) with attributes (triples). Resources can be awww:represented with a variety of concrete serializations; any serialization we receive when requesting a URI awww:represents the awww:Resource that the URI awww:identifies (and, "denotes").

see http://www.w3.org/TR/webarch/

The URI that ":a0" is abbreviating is NOT denoting the 3 (or 5) triples that use it as their subject. It is denoting the Activity.
There is no notion of "shape" or "container" (e.g., struct, array, row) that you keep wanting to impose on these triples.
One _could_ impose a shape by circumscribing a subgraph by way of a named graph -- AND defining some graph traversal operations seeded at some set of resources. But this is not common.
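
For what it's worth, one way such a shape could be imposed is sketched below. It assumes the named graph is a set of triples and that the traversal simply follows subject-to-object links from the seed resources; the function circumscribe and the sample graph are hypothetical, and other traversal rules are equally possible.

```python
from collections import deque

def circumscribe(graph, seeds):
    """Collect triples reachable from the seeds by following subject -> object links."""
    selected, visited = set(), set()
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        for s, p, o in graph:
            if s == node:
                selected.add((s, p, o))
                queue.append(o)  # objects that are never subjects add nothing
    return selected

# Hypothetical named graph; :e9 is not reachable from :a0.
graph = {
    (":a0", "rdf:type", "prov:Activity"),
    (":a0", "prov:used", ":e0"),
    (":e0", "rdf:type", "prov:Entity"),
    (":e9", "rdf:type", "prov:Entity"),
}

record = circumscribe(graph, [":a0"])
```

The resulting "record" is a product of the chosen graph and traversal rule, not something RDF itself defines.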


> 
> 
>     :a0
>      dcterms:description "activity(a0,t,,[prov:type='createFile'])";
>      a prov:Activity;
>      a :CreateFile;
> 
> 
>     :a0
>      dcterms:description "activity(a0,t,,[prov:type='copyFile'])";
>      a prov:Activity;
> 
> 
>    If no, then that's great.

It's not a problem if the URIs abbreviated by :a0 are awww:identifying (and "denoting") the same Activity - the "Thing in the world".

> What is your concern with scope then?

There is no scope for naming/denoting in RDF. The user's choice of URI establishes scope or ignores scope; that's up to them. URI naming is orthogonal to any use within Accounts (and thus, Graphs). If the same "Thing in the world" is mentioned by way of URI in two different accounts, then the two accounts are describing the same Thing in the world.
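
A sketch of the consequence, assuming accounts are held as a mapping from account name to a set of triples: any URI that appears as a subject in two accounts is, by plain string match, the same Thing being described twice. The function shared_subjects and the sample accounts are hypothetical.

```python
def shared_subjects(accounts):
    """Map each subject URI to the accounts describing it, keeping those in 2+ accounts."""
    seen = {}
    for account, triples in accounts.items():
        for s, _p, _o in triples:
            seen.setdefault(s, set()).add(account)
    return {s: accts for s, accts in seen.items() if len(accts) > 1}

# Hypothetical accounts that both mention http://ex/a0.
accounts = {
    "ex:acc3": {
        ("http://ex/a0", "rdf:type", "http://ex/CreateFile"),
        ("http://ex/e0", "rdf:type", "prov:Entity"),
    },
    "ex:acc4": {
        ("http://ex/a0", "rdf:type", "http://ex/CopyFile"),
    },
}

shared = shared_subjects(accounts)
```

If the two accounts did not mean the same Activity, the fix is to mint a different URI in one of them, not to scope the name by account.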



>    If yes, then what is the exact problem?


If you want two "string match" URIs mentioned in two different accounts to be denoting DIFFERENT "things in the world", I have concerns. It breaks the fundamental principles of the web.


-Tim

> 
> 
> Luc
> 
> 
> 
> 
> On 01/05/2012 03:35 AM, Timothy Lebo wrote:
>> prov-wg,
>> 
>> I have been working on some discussion [1] that is relevant to modeling Accounts in PROV-O.
>> 
>> It is incomplete, but I think ready for some initial feedback.
>> 
>> Modeling accounts is on the agenda for tomorrow's telecon [2], so I hope this can provide some discussion material.
>> 
>> Regards,
>> Tim
>> 
>> [1] http://www.w3.org/2011/prov/wiki/Using_graphs_to_model_Accounts
>> [2] http://www.w3.org/2011/prov/wiki/Meetings:Telecon2012.01.05
>> 
>> 
>>   
> 
> -- 
> Professor Luc Moreau
> Electronics and Computer Science   tel:   +44 23 8059 4487
> University of Southampton          fax:   +44 23 8059 2865
> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
> 
> 
> 

Received on Sunday, 15 January 2012 18:26:54 UTC