Re: bNodes as graph identifiers (ISSUE-131) from Pat Hayes on 2013-06-02 (public-rdf-wg@w3.org from June 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Sun, 2 Jun 2013 01:08:50 -0500
To: Sandro Hawke <sandro@w3.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org, Steve Harris <steve.harris@garlik.com>
Message-Id: <45174E89-5150-4BB9-B703-047791172060@ihmc.us>
On Jun 1, 2013, at 6:35 PM, Sandro Hawke wrote:

> On 06/01/2013 11:17 AM, Pat Hayes wrote:
>> On Jun 1, 2013, at 9:31 AM, Sandro Hawke wrote:
>> 
>>> On 05/31/2013 08:12 PM, Pat Hayes wrote:
>>>> On May 31, 2013, at 4:44 PM, Sandro Hawke wrote:
>>>> 
>>>>> [ BTW, I'm CC'ing Steve, since he's not a member of the RDF Working Group any more.   Steve, if you want to reply to the group, please remember to send to public-rdf-comments@w3.org.]
>>>>> 
>>>>> On 05/31/2013 01:17 PM, Andy Seaborne wrote:
>>>>>> On 31/05/13 20:27, Sandro Hawke wrote:
>>>>>>> Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
>>>>>>> 
>>>>>>>> On 31/05/13 17:00, Sandro Hawke wrote:
>>>>>>>>> On 05/29/2013 01:47 PM, Steve Harris wrote:
>>>>>>>>>> [ as a side note I find it bizarre that I'm having to advocate NOT
>>>>>>>>>> changing a 14
>>>>>>>>>> year old, industrially deployed spec, at the 11th hour of the
>>>>>>>>>> standardisation
>>>>>>>>>> process, to add a feature that's used by a tiny minority of deployed
>>>>>>>>>> systems -
>>>>>>>>>> if anything was to strike an outsider as peculiar about this WGs
>>>>>>>>>> process, it
>>>>>>>>>> would surely be this feature ]
>>>>>>>>> I don't understand this complaint at all.  This Working Group is
>>>>>>>>> chartered to provide a standard mechanism for working with and
>>>>>>>> sharing
>>>>>>>>> multiple graphs.   In the chartering process in 2010, our various
>>>>>>>> inputs
>>>>>>>>> all said this was a very high priority.   A lot of folks said to add
>>>>>>>>> Named Graphs or fix reification or something like that.
>>>>>>>> Specifically, blank nodes for graph names, not datasets in general.
>>>>>>>> 
>>>>>>> What 14-year old spec do you think Steve was referring to?
>>>>>> Best to ask him (but I'm reading "RDF" and referring to blank nodes (them not being anonymous individuals, for better or worse), simple entailment, leanness etc.)
>>>>>> 
>>>>> It seems clear to me that he was referring to RDF (which became a REC in 1999, 14 years ago).    He seemed to be protesting changing it, so I was explaining why we were doing so.
>>>>> 
>>>>> If he was protesting changing the Working Group's design for handling graph identification -- well, that was only settled in October 2012, and has never been published as more than an Ordinary Working Draft.    So that's still relatively easy to change. (Relative to how it will be after Last Call, Candidate Recommendation, getting implementations, Proposed Recommendation, and Recommendation, let alone the passing of 14 years.)
>>>>> 
>>>>> It seems like maybe you or he are thinking I'm somehow proposing changing the semantics of blank nodes in RDF.   That's certainly not my intention.  Is there some way that allowing blank nodes to be used as "graph names" in datasets would have that effect?
>>>> No, it doesn't.
>>>> 
>>>> However, if we were to also add a sensible semantics for blank node graph labels, to the effect that using _:X as a label of the graph G effectively asserts the equation  _:X = G, then we would have to treat these blank nodes slightly more carefully. In effect, this would treat the graph labelling syntax as making an assertion involving the blank node, and that assertion would have the consequence that, for example, a dataset with a URI label (which does *not* make the analogous assertion of an equation, we have already decided):
>>> I am not proposing adding such semantics, although I see the appeal.
>> I am proposing it, as a condition on allowing bnodes as graph labels. Allowing bnode graph labels without a semantics is worse than not allowing them at all.
> 
> I hope you'll reconsider after we talk this through.

Possibly, but I doubt it. I have been pretty consistent on this now for quite a while, and I haven't seen any reason to change my mind.

> 
>>>   The main problem with doing so is that it would mean that if someone Skolemized the graph names in a dataset, they'd be losing these conditions, which might significantly change the meaning of the dataset.
>> Right. Skolemization does lose meaning, in fact.
> 
> As long as that's true, I don't see how we can be telling Steve, et al, to just Skolemize.  

If someone wants to only be using IRIs as labels then they cannot possibly be relying on any bnode semantic conditions, so skolemization will be no hardship to them. I don't get the sense that Steve is pining for a fully semantically justified metadata mechanism. 

>  I think it makes implementation pretty hard (but maybe there's a trick that makes it easy).
> 
>> But without this condition, some other odd things happen. For example, leaning (leanening?) a graph could eliminate a blank node that is being used as a graph label. Do you want to allow that?
> 
> My intuitive notion of leanness has to do with blank nodes being used/referred to.   A 'leaning' operation can't remove triples unless it knows that no reference exists from "outside" to the blank nodes.  

I have no idea what you mean here. What is a reference to a blank node? Leaning is defined in terms of instances of graphs.

>   That's easy to know when dealing with parsing/generating g-texts, but not when there might be references via API calls. With dataset syntaxes, it's doable, too, as long as we take into account the idea that blank nodes might be graph names.
> 
>> 
>>>    Also, I think our minimalist approach to dataset semantics is going to be confusing to people and I think having this additional twist, as simple as it is when one understands it, would be quite confusing to people who haven't quite grasped it.
>> The part that is confusing is the minimalist view of IRIs as graph labels. At least we have here an opportunity to have *some* graph labels that actually make semantic sense. How can it be confusing to be told that labelling a graph with a label means that the label refers to the graph??  What else could it possibly mean?
> 
> It could mean that it refers to the graph indirectly, in the same way IRI graph names do.

No, they don't refer to the graph ***at all***. There is no notion of "indirect reference" in RDF, in the current documents. Look, Sandro, PLEASE get this straight. The WG has taken a decision which implies that an IRI used as a graph name can refer to something other than the graph. That means that it DOES NOT REFER TO THE GRAPH. That is an absolute end of story regarding graph names and graphs. There is NO WAY in the current semantics to get around this. 

> 
> Having both direct reference and indirect reference in one system is pretty complicated.

It would be if that is what we would have. But in fact, what we would have is one (and only one) way to have graph names referring to their graphs, which is to use bnode graph labels. Without this we have no way to refer to graphs. Using IRIs as graph labels, the WG has decided, does not provide any referential connection between the label and the graph. 

>   If we only have indirect reference, then people can (nearly all the time) just think of it as reference and have things work.

I do not believe that it does work. 

Pat

> 
>     -- Sandro
> 
>> Pat
>> 
>>> I'm thinking it's high time for me to write that "Working With Datasets" or "Dataset Vocabularies" non-rec-track document. Soon...
>>> 
>>>    -- Sandro
>>> 
>>> 
>>>> { {ex:this rdf:type ex:foodle .}
>>>> ex:this { G } }
>>>> 
>>>> would not entail the similar dataset which has it as an instance:
>>>> 
>>>> { {_:x rdf:type ex:foodle .}
>>>> _:x { G } }
>>>> 
>>>> because the second, but not the first, really does assert that the graph G is a foodle.
>>>> 
>>>> Formally, say that a blank node used as a graph label is an "identifying blank node", and exclude identifying blank nodes from the definition of "instance". That covers what is needed (including definitions of "lean" and the simple-entailment interpolation lemma.)
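A minimal sketch of this restriction in Python, assuming a dataset is modelled as a set of (subject, predicate, object, label) tuples, with label None for the default graph. The helper names, the brute-force search over mappings, and the single triple standing in for the unspecified graph G are all illustrative assumptions, not part of any RDF specification or library.

```python
from itertools import product


def is_bnode(term):
    """Blank nodes are written as strings starting with '_:' in this sketch."""
    return term is not None and term.startswith("_:")


def bnodes(quads):
    """All blank nodes occurring anywhere in the dataset."""
    return {t for q in quads for t in q if is_bnode(t)}


def identifying_bnodes(quads):
    """Blank nodes that appear in the graph-label position of some quad."""
    return {label for (_, _, _, label) in quads if is_bnode(label)}


def substitute(quads, mapping):
    """Apply a blank-node mapping to every position of every quad."""
    return {tuple(mapping.get(t, t) for t in q) for q in quads}


def is_instance(candidate, general, protect_identifying):
    """True if some mapping of the blank nodes of `general` yields `candidate`.

    With protect_identifying=True, blank nodes used as graph labels are
    excluded from the mapping, as the proposal suggests.
    """
    free = bnodes(general)
    if protect_identifying:
        free -= identifying_bnodes(general)
    free = sorted(free)
    targets = sorted({t for q in candidate for t in q if t is not None})
    for values in product(targets, repeat=len(free)):
        if substitute(general, dict(zip(free, values))) == candidate:
            return True
    return False


# Hypothetical content for the graph G, which the example leaves unspecified.
G = ("ex:s", "ex:p", "ex:o")

iri_labelled = {("ex:this", "rdf:type", "ex:foodle", None), G + ("ex:this",)}
bnode_labelled = {("_:x", "rdf:type", "ex:foodle", None), G + ("_:x",)}

print(is_instance(iri_labelled, bnode_labelled, protect_identifying=False))
print(is_instance(iri_labelled, bnode_labelled, protect_identifying=True))
```

Under the usual definition the first call finds the mapping from _:x to ex:this. With identifying blank nodes excluded no mapping is available, so the IRI-labelled dataset no longer counts as an instance of the bnode-labelled one.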
>>>> 
>>>> Intuitively, this is just like what you would get if the labelling were done by adding a triple
>>>> 
>>>> _:x owl:sameAs {G}
>>>> 
>>>> to the default graph, where {G} is a graph literal. The labelling acts like a special kind of RDF triple, in effect.
>>>> 
>>>>>> BTW: Are graph literals in N3 tidy or not?
>>>>> I don't know what "tidy" means in this context.   Do you mean "lean"?    If so, I'd say that I think the usual style of N3 is to treat graphs as logical formulas, in which case there's no way to detect the presence of meaningless triples and see whether a graph is lean or not.    But there may be systems (even some predicates in cwm) which expose graphs at the syntactic level allowing this inspection; I'm not sure about that.   I'm fairly confident non-lean graphs would be described in N3 as entailing and being entailed by their lean versions, not being equal to their lean versions.    But as a matter of efficiency, systems might well be permitted to silently lean their graphs, since, as I say, this isn't something that would be detectable through the normal interface.
>>>> But again, be careful leaning a graph when the bnodes might be identifying bnodes. In general, you have to take into account *all* the triples that might contain a bnode before you can make decisions about leanness, and that already requires care in datasets where bnodes can be shared [1]. If we have the above labelling semantics, then the labelling itself is another kind of "triple" that has to be taken into account.
>>>> 
>>>> Pat
>>>> 
>>>> [1]  For example, consider
>>>> 
>>>> ex:g1 { ex:a ex:p ex:c .
>>>>              _:x ex:p ex:c . }
>>>> ex:g2 {_:x ex:q ex:c . }
>>>> 
>>>> looked at in isolation, g1 is non-lean, but if we take the union with the other graph containing the same bnode (which IMO we should) then the whole dataset *is* lean.
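The footnote's claim can be checked with a small Python sketch, assuming graphs are modelled as sets of (subject, predicate, object) string triples and that "lean" means no instance of the graph is a proper subgraph of it. The function names and the brute-force search are illustrative assumptions, not an efficient or standard algorithm.

```python
from itertools import product


def is_bnode(term):
    """Blank nodes are written as strings starting with '_:' in this sketch."""
    return term.startswith("_:")


def is_lean(triples):
    """A graph is lean if no instance of it is a proper subgraph of itself."""
    blanks = sorted({t for tr in triples for t in tr if is_bnode(t)})
    # An instance that is a subgraph can only use terms already in the graph.
    terms = sorted({t for tr in triples for t in tr})
    for values in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, values))
        image = {tuple(mapping.get(t, t) for t in tr) for tr in triples}
        if image < triples:  # proper subset: a redundant triple was absorbed
            return False
    return True


g1 = {("ex:a", "ex:p", "ex:c"), ("_:x", "ex:p", "ex:c")}
g2 = {("_:x", "ex:q", "ex:c")}

print(is_lean(g1))       # g1 on its own
print(is_lean(g1 | g2))  # union of the graphs sharing _:x
```

Taken alone, g1 collapses onto its first triple by mapping _:x to ex:a. In the union, that mapping would also produce ex:a ex:q ex:c, which is not present, so no proper-subgraph instance exists.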
>>>> 
>>>> 
>>>> 
>>>>>     -- Sandro
>>>>> 
>>>>> 
>>>>>>>>    Andy
>>>>> 
>>>> ------------------------------------------------------------
>>>> IHMC                                     (850)434 8903 or (650)494 3973
>>>> 40 South Alcaniz St.           (850)202 4416   office
>>>> Pensacola                            (850)202 4440   fax
>>>> FL 32502                              (850)291 0667   mobile
>>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Sunday, 2 June 2013 06:09:23 UTC