Re: Reification and Provenance modelling from Richard Cyganiak on 2011-09-22 (public-rdf-comments@w3.org from September 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 22 Sep 2011 18:49:33 +0100
To: Bob Ferris <zazi@smiy.org>
Cc: public-rdf-comments@w3.org
Message-Id: <DB158BCF-063D-41F0-9FE2-3996D3B8D00C@cyganiak.de>
Hi Bob,

On 21 Sep 2011, at 12:11, Bob Ferris wrote:
> I think that the important use cases are already covered in [1]. My specific one is powered by multiple information providers and requires an access control mechanism. Especially important for that use case is to be able to push back changes to its origins, i.e., if I have a resource description that is aggregated by information from multiple information providers, I need to know which statement is from which information provider and, furthermore, if single statements are spread over multiple graphs (views), I need to be able to handle changes on these statements as well.

Load the data from each information provider into a separate graph. Then create a single-triple graph for each triple, and assert {?g ex:isOriginalSourceOfTriple ?t} between the original graph and the single-triple graph. Whenever you merge or aggregate multiple graphs into a new graph, assert a new triple {?new_graph ex:containsTriple ?t}. This allows tracking of any triple back to its original source in order to update it.
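
A minimal sketch of this bookkeeping in Python, assuming a toy in-memory quad store instead of a real SPARQL endpoint. The class name `Dataset`, the methods `load`, `merge` and `origins`, and the generated `ex:t1`, `ex:t2`, ... graph names are hypothetical and only illustrate the idea; the two property names are the ones used above.

```python
from collections import defaultdict

# Property names from the scheme above; the ex: prefix is a placeholder.
IS_ORIGINAL_SOURCE = "ex:isOriginalSourceOfTriple"
CONTAINS_TRIPLE = "ex:containsTriple"


class Dataset:
    """Toy quad store: graph name -> set of (s, p, o) triples."""

    def __init__(self):
        self.graphs = defaultdict(set)
        self.metadata = set()  # statements about graphs, kept apart from the data
        self._counter = 0

    def load(self, source_graph, triples):
        """Load a provider's triples; give each one its own single-triple graph."""
        for triple in triples:
            self.graphs[source_graph].add(triple)
            self._counter += 1
            t = f"ex:t{self._counter}"  # hypothetical single-triple graph name
            self.graphs[t].add(triple)
            self.metadata.add((source_graph, IS_ORIGINAL_SOURCE, t))

    def merge(self, new_graph, source_graphs):
        """Aggregate source graphs into new_graph and record what it contains."""
        for g in source_graphs:
            for (src, p, t) in list(self.metadata):
                if src == g and p == IS_ORIGINAL_SOURCE:
                    self.graphs[new_graph] |= self.graphs[t]
                    self.metadata.add((new_graph, CONTAINS_TRIPLE, t))

    def origins(self, graph, triple):
        """Return the source graphs that a triple in `graph` came from."""
        ids = {t for (g, p, t) in self.metadata
               if g == graph and p == CONTAINS_TRIPLE and triple in self.graphs[t]}
        return {g for (g, p, t) in self.metadata
                if p == IS_ORIGINAL_SOURCE and t in ids}
```

Because two providers that assert the same triple get two different single-triple graphs, `origins` returns both of them, and a change can be pushed back to each source separately.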

>>>> SELECT * WHERE {
>>>>    TRIPLE ?t { ?s ?p ?o }
>>>>    ...
>>>> }
…
> SELECT ?t WHERE {
> 	?s ?p ?o ?t }

Both of these examples are equivalent except for order and an extra keyword and punctuation. There is no difference in complexity.

>>> My use case of my proposal is reification and how to relate single statements a.k.a. shortcut relations to its reification class instances.
>> 
>> Now we're getting somewhere. Can you explain why this use case of property reification isn't well-addressed by named graphs? An example might help.
> 
> I don't want to scramble this information into separate graphs, i.e., shortcut relations and reification class instances should be able to co-exist in one and the same graph.

You can leave everything in the original graph, and in addition create a new single-triple graph that contains only the reified triple. Use the graph IRI of the single-triple graph in place of a statement identifier.

>>>>> To make statements about them somewhere else we usually need an identifier to refer to them, or?
>>>> 
>>>> No, because graphs are literals, so one can repeat the literal to make statements about it.
>>> 
>>> Well, then I have the same disadvantage as in the existing Named Graph proposal, i.e., statements of one named graph do not have any semantic relation to identical statements of another named graph.
>> 
>> That's not true. The semantic relation between the statements is that they're identical. It's like using the literal number 1 in two different graphs, or the string "Bob". We don't need to assign an identifier to these literals in order to know that they're the same. Literals are self-denoting in RDF.
> 
> Okay, you are right. However, graphs can be more complex than a simple number- or string-typed literal. Furthermore, we would utilise these graphs for further processing of our model. Usually a literal can be seen as a kind of leaf in a graph representation, or?

This has nothing to do with the original question asked above. You still don't need an identifier to refer to a graph literal, because it's a literal, and they are self-denoting. The complexity of the literal doesn't matter for this as long as equality is well-defined (and it is for RDF graphs).

> Quoted from [2]:
> 
> "one can also decouple a reused statement by changing its statement
> identifier; i.e., the triple of the statement are still the same
> but the relation to the original statement might now be another e.g.,
> reflected by a provenance statement e.g., <#s20> :original <#s19>"
> 
> i.e., if I intend that a statement utilised in multiple graphs belongs together semantically, so that I really refer to that one statement, then I'll utilise the same statement identifier; otherwise, I'll utilise a different statement identifier (and if necessary I can still relate these statements to each other).

You can do the same with single-triple graphs.

> Let's imagine the following use case: you are trying to implement an algorithm that ranks information from multiple information providers. Before the aggregation and federation task, you would usually store the information fetched from different information providers separately. Therefore, you could utilise Named Graphs and statement identifiers. Different information providers can provide the same information, i.e., the same statements. However, to keep track of their origin you will maybe address them by different statement identifiers at the beginning.

The scheme I described at the beginning of this message could be used to handle this situation with named graphs.

>>> Real world knowledge description are then, at the moment with the existing SPARQL specification, not really query-able, if we have many isolated single-triple named graphs.
>> 
>> I don't understand what this means. Can you give me an example of such a knowledge description, and an example query that you cannot express in SPARQL if the data is organized in single-triple named graphs?
> 
> Let's take the multiple information providers scenario. If I would store the federated information still in separate graphs to keep track of the provenance, an information resource would not really be query-able, because single statements are isolated into separate graphs. (please keep the statement duplication proposal aside here)

Why would I keep that aside? It's how you solve that problem in SPARQL + single-triple graphs. My question was for an example that cannot be solved in SPARQL + single-triple graphs.

> However, by utilising statement identifiers I can still track the provenance and single statements are not scrambled into separate graphs and I can easily query this information by specifying the graph that contains all these statements.

You can do all that too by creating single-triple graphs and using ex:isOriginalSourceOfTriple/ex:containsTriple, as described at the beginning of this mail.

>> How would you represent these two options using statement identifiers?
> 
> Here is an example (following the syntax as introduced in [2]):

> 
> <#alice> :friend <#bob> <#s1> . # a statement that can be identified by statement identifier #s1
> <#alice> :friend <#bob> <#s2> . # a statement that can be identified by statement identifier #s2
> 
> <#g1> rdf:type rdfg:Graph <#s3> .
> <#g1> :contains <#s1> <#s4> . # a graph that contains the statement #s1
> 
> <#g2> rdf:type rdfg:Graph <#s5> .
> <#g2> :contains <#s1> <#s6> . # another graph that contains the statement #s1
> 
> <#g3> rdf:type rdfg:Graph <#s7> .
> <#g3> :contains <#s2> <#s8> . # a graph that contains the statement #s2
> 
> #g1 and #g2 contain the same statement (#s1)
> #g3 contains another statement (#s2)

Same with single-triple graphs:

   <#s1> { <#alice> :friend <#bob> }
   <#s2> { <#alice> :friend <#bob> }

   <#g1> { <#alice> :friend <#bob> }
   <#g2> { <#alice> :friend <#bob> }
   <#g3> { <#alice> :friend <#bob> }
 
   <#metadata> {
     <#g1> ex:containsTriple <#s1>.
     <#g2> ex:containsTriple <#s1>.
     <#g3> ex:containsTriple <#s2>.
   }
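
As a rough illustration of how this dataset answers the "same statement or merely an equal triple?" question, here is a small Python sketch. It assumes the dataset above is held as a plain dictionary from graph name to a set of triples; the helper names `graphs_sharing_statement` and `same_statement` are hypothetical.

```python
ALICE_FRIEND_BOB = ("<#alice>", ":friend", "<#bob>")

# The dataset from the example above: graph name -> set of triples.
dataset = {
    "<#s1>": {ALICE_FRIEND_BOB},
    "<#s2>": {ALICE_FRIEND_BOB},
    "<#g1>": {ALICE_FRIEND_BOB},
    "<#g2>": {ALICE_FRIEND_BOB},
    "<#g3>": {ALICE_FRIEND_BOB},
    "<#metadata>": {
        ("<#g1>", "ex:containsTriple", "<#s1>"),
        ("<#g2>", "ex:containsTriple", "<#s1>"),
        ("<#g3>", "ex:containsTriple", "<#s2>"),
    },
}


def graphs_sharing_statement(dataset, statement_graph):
    """Graphs recorded as containing the given single-triple graph."""
    return sorted(g for (g, p, t) in dataset["<#metadata>"]
                  if p == "ex:containsTriple" and t == statement_graph)


def same_statement(dataset, g_a, g_b):
    """True if both graphs contain the same identified statement,
    not merely a triple with equal subject, predicate and object."""
    def ids(g):
        return {t for (s, p, t) in dataset["<#metadata>"]
                if s == g and p == "ex:containsTriple"}
    return bool(ids(g_a) & ids(g_b))
```

Here `<#g1>` and `<#g3>` hold equal triples, yet `same_statement` tells them apart because the metadata graph links them to different single-triple graphs.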

>>>>> However, I believe that there is a strong antipathy for single-triple graphs.
>>>> 
>>>> This is not a technical argument.
>>> 
>>> The technical argument is the bad query handling with single-triple graphs (see above).
>> 
>> You mean stores that don't support mirroring the named graphs into the default graph?
> 
> I intended to address the query-ness issue, i.e., scrambled information (caused by "unnecessary" graph isolations) vs. composed information (produced by the utilisation of statement identifiers and statements that are de-coupled from their graph enclosure).

I maintain my claim that everything you can do with statement identifiers, you can do easily with single-triple named graphs too. I still have not seen anything that convinces me that your proposed scheme works any better than what we already can do with SPARQL today.

Best,
Richard



> 
>> That's not a complaint about the proposal, but a complaint about the state of implementations; and that's something we can't fix by writing something else into the spec.
>> 
>> Best,
>> Richard
> 
> Cheers,
> 
> 
> Bo
> 
> 
> [1] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs-UC
> [2] http://lists.w3.org/Archives/Public/public-rdf-comments/2011Jan/0001.html
>
Received on Thursday, 22 September 2011 17:50:16 UTC