Re: queries, snowflakes and referential opacity (was Re: Three ideas)

Dear Thomas, 

>> * undisputed: no one has ever objected so far to this statement. Whether this means the statement is true we neither know nor care: simply, there has been no discussion on it. This is expressed as a plain RDF triple, e.g., [1]. 
>> * disputed: two or more different and incompatible opinions exist about this statement. We do not take sides: we represent them all as conjectural, and none as asserted. This is expressed with quoted triples (or, in my proposal, with conjectures), e.g., [2] and [3].
>> * settled: two or more different and incompatible opinions exist about this statement. Yet, maybe following the majority view of the scholars, we do take sides: we represent them all as conjectural, but at the same time one as asserted. I think this is different from simply re-expressing the one as a plain triple: a vase broken and glued back together is different from an intact vase. Once a doubt has been expressed, there is no going back. Settled situations are expressed with quoted triples for the losing statements, and an annotated triple (or, in my proposal, a collapsed conjecture) for the winning one, e.g., [4] and [5]. 
>> 
>> For instance, suppose that this is a dataset containing all types of statements:  
>> 
>> # no-one ever objected to this
>> [1]    :monaLisa      dc:creator :daVinci .   
>> 
>> # two opinions exist on this, and we take no side
>> [2] << :salvatorMundi dc:creator :daVinci     >> :accordingTo :MartinKemp .    
>> [3] << :salvatorMundi dc:creator :Boltraffio  >> :accordingTo :JacquesFranck .  
>> 
>> # this one used to be misattributed, but now scholars agree with Pietro Marani, so we assert this, too. 
>> [4] << :annunciation  dc:creator :Ghirlandaio >> :accordingTo :earlyScholars .  
>> [5a] << :annunciation  dc:creator :daVinci     >> :accordingTo :PietroMarani  .  
>> [5b]    :annunciation  dc:creator :daVinci.                                      
> 
> So it seems that you can realize three categories - undisputed, disputed and settled - with two syntactic variants: unasserted and asserted. But what if you still want to annotate statement [1] with a source? Would it then be a disputed claim? What about all the other possible ways to annotate or qualify a statement? It seems to me that an explicit annotation declaring a certain statement as disputed, undisputed, or whatever would provide a more extensible solution, more in line with how the semantic web works in general.

These two syntactic variants combine into three modes (plain triple, quoted triple and annotated triple), so I mapped our three categories onto them as best I could, admittedly with some forcing. One of my complaints about rdf-star is that annotated triples are not first-class entities, so I cannot quote an annotated triple ("Bruce thinks that Alice's hypothesis is correct"). 

Conjectures do separate simply quoted statements from settled statements. rdf-star does not. 
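
As a minimal sketch of the mapping discussed above, the following Python reads the three categories off the syntactic form alone. The tuple encoding, the names `asserted`, `quoted` and `classify`, and the choice to group statements by subject and predicate are assumptions made for illustration; they are not part of rdf-star or of the conjectures proposal.

```python
# Triples are (subject, predicate, object) tuples of plain strings.
# Plain (asserted) triples, mirroring [1] and [5b] in the example dataset.
asserted = {
    ("monaLisa", "dc:creator", "daVinci"),
    ("annunciation", "dc:creator", "daVinci"),
}

# Quoted triples paired with the source given via :accordingTo,
# mirroring [2], [3], [4] and [5a].
quoted = [
    (("salvatorMundi", "dc:creator", "daVinci"), "MartinKemp"),
    (("salvatorMundi", "dc:creator", "Boltraffio"), "JacquesFranck"),
    (("annunciation", "dc:creator", "Ghirlandaio"), "earlyScholars"),
    (("annunciation", "dc:creator", "daVinci"), "PietroMarani"),
]


def classify(asserted, quoted):
    """Label each (subject, predicate) pair by its syntactic form only."""
    has_asserted = {(s, p) for s, p, _ in asserted}
    has_quoted = {(s, p) for (s, p, _), _source in quoted}
    labels = {}
    for key in has_asserted | has_quoted:
        if key not in has_quoted:
            labels[key] = "undisputed"   # plain triple only
        elif key in has_asserted:
            labels[key] = "settled"      # quoted opinions plus one asserted
        else:
            labels[key] = "disputed"     # quoted opinions, nothing asserted
    return labels


labels = classify(asserted, quoted)
```

Note that a plain triple annotated with a single source would come out as "settled" under this purely syntactic reading, which is the ambiguity Thomas raises about statement [1].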

>> "Give me all the paintings somebody for any reason attributed to Leonardo da Vinci": this means both undisputed and disputed claims, so in this way:
>> 
>> SELECT DISTINCT ?painting WHERE {
>> { ?painting dc:creator :daVinci. } 
>> UNION 
>> { << ?painting dc:creator :daVinci >> :accordingTo ?anyOne . } 
>> }
> 
> This is the query I was referring to. To me this represents quite a complication of matters. However, two things:
> 
> 1) in this case your approach is not the culprit. A syntactic shortcut to facilitate querying for a triple in asserted and unasserted form would seem justified if qualified relations were to become a standard modelling technique.

I am all in favour of extending SPARQL's syntax to support conjectural queries. It is my understanding though that no extension of its semantics is needed, and that we only need, as you say, a syntactic shortcut. 
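
To illustrate why a purely syntactic shortcut may be enough, here is a small Python sketch that rewrites one triple pattern into the UNION of its asserted and quoted forms, producing ordinary SPARQL-star text. The function names `expand_conjectural` and `select_distinct` are hypothetical, as is the idea of doing the rewriting as string preprocessing; no existing SPARQL engine is claimed to offer this.

```python
def expand_conjectural(pattern, predicate=":accordingTo", var="?anyOne"):
    """Rewrite a triple pattern into the UNION of its two forms.

    Assumes `pattern` is a single simple triple pattern, optionally
    ending with a dot; literals that end with a dot are not handled.
    """
    core = pattern.strip().rstrip(".").strip()
    return (
        "{ " + core + " . }\n"
        "UNION\n"
        "{ << " + core + " >> " + predicate + " " + var + " . }"
    )


def select_distinct(variable, pattern):
    """Wrap the expanded pattern in a SELECT DISTINCT query."""
    return (
        "SELECT DISTINCT " + variable + " WHERE {\n"
        + expand_conjectural(pattern)
        + "\n}"
    )


query = select_distinct("?painting", "?painting dc:creator :daVinci .")
print(query)
```

The generated text is the same UNION query quoted above, so the shortcut changes only what the user types and not what the engine evaluates.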

> 2) in other mails I understood that you would make much wider use of unasserted embedded statements, to model practically anything that is qualified in any way - that’s what I was arguing about with you, and in such a broadly defined scenario the need to query for a UNION of both representations IMO would indeed present quite a burden.

Yes, true. In my mind a plainly stated triple represents the exception rather than the norm, since I truly believe most statements are not absolute but conceptually constrained in one way or another. But then, if all you have are unasserted embedded statements, all your queries end up being simple... a different kind of simple, but simple anyway, since you would not have to fight with UNIONs of asserted and non-asserted statements.  

>> In general, I am a newcomer in this part of the SW, but I am surprised at the abundant reliance on blank nodes for handling so many dark and unconfessable aspects of data representation, and this is no exception. Even though they do not have an IRI, they are still nodes, i.e. they represent entities that exist in the dataset, they can be counted, they affect and are affected by the overall ontology, etc., yet they seem to be used as duct tape is used in engineering, as the quick fix to keep any two random things together, good for every situation until we find something better. I am not sure I like it. 
> 
> They are a matter that is more complicated than it seems. Aidan Hogan’s "Everything you always wanted to know about blank nodes" is a good start if you’re looking for a thorough introduction. 
> Just two things:
> - they have counting semantics in SPARQL but not in RDF. In RDF they are existential in the FOL sense, but RDF semantics doesn’t REQUIRE leaning.
> - they are more than just duct tape, they are indeed a very elegant tool. They provide a means to add structure to graphs without adding much burden. We are very used to structures - lists, trees, tables - that are all not provided out of the box by a graph. Blank nodes help create them without much fuss and without distracting from the core issues we want to express.
> - and if you think about the elegance with which blank nodes allow you to make composite statements although RDF is strictly monotonic and no statement is allowed to rule into the meaning of another statement - that wouldn’t be possible any other way I guess.
> That’s three things actually.

Right right. Except that blank nodes are entities, i.e. things, and they are often of unknown types, and usually they express abstract concepts, such as lists, trees, tables, that are "natural" in the mind of the creator of the dataset, and not in my mind as a user. If I prefer my ontology to contain only "real" things, and not "concepts", then a blank node is complicated to allow. 

>> I am totally convinced that referential opacity in rdf-star pollutes the well. 
>> 
>> I am also convinced that the well is already dry and full of snakes. 
>> 
>> Once upon a time, many years ago, it was decided that different IRIs identify different entities. At the same time, people actively avoided creating a single, global repository where a single IRI for each common entity could be created, shared and re-used. Finally, they selected a higher layer of the semantic web, OWL, to handle the concept of sameness between entities, as if it were a weird ontological aspect of reality rather than a structural foundation of the representation model. 
>> 
>> Three bad decisions, IMHO. 
> 
> You are not "on the web", IMHO. This is a decentralized information system, which brings with it some burdens - but fewer burdens than chances, some argue.

I disagree. I have been playing with the web (the web of documents, truly) since its early days, and I have always been amazed by the totally disconnected expectation that different URIs represent different resources, when clearly this is not the case in real life: 

In real life, "http://www.site.com/" is the same resource as "http://www.site.com/index.html" and "http://195.116.25.1/" and "http://195.116.25.1/index.html". We could debate the corresponding "https://..." variants, but why must we still pretend that all of these are actually different? 

This is not a big deal with actual documents. For instance, Akoma Ntoso, the XML vocabulary for legislative documents I have been working on for the last 20 years, allows fully formed aliases for the main URI of a document, and transparently responds with a Manifestation (an actual document with a proper digital representation, e.g. "the Akoma Ntoso XML representation of the version dated 22/01/2022 of the law n. 156 of 13/05/2019") when you actually requested a FRBR Work (a conceptual document, e.g. "law n. 156 of 13/05/2019"). I think this is solid and correct both conceptually and with respect to the users' expectations. 

Not so in the Semantic Web, where different IRIs are different entities, and you can realign them only at the ontological level. In my mind, a simple "rdf:sameAs" instead of "owl:sameAs" would have sufficed, with the expectation that all triples associated with any of the URIs connected through it point to the SAME ENTITY (and not to different entities that happen to be spiritually joined in the mind of the readers, assuming that they use OWL). Sadly, this is not the case, so I do not use owl:sameAs at all and I do not fall prey to the dangers of referential opacity.  
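
What such a structural notion of sameness could look like is sketched below in Python: every alias is folded into one representative before any triple is read. Note that "rdf:sameAs" does not exist in RDF; it is the hypothetical property proposed above, and `merge_same`, the `ex:` names and the rule of keeping the lexicographically smallest identifier are likewise assumptions made for the example.

```python
def merge_same(triples, same_as="rdf:sameAs"):
    """Collapse all identifiers linked by `same_as` into one representative.

    Uses a small union-find; the representative of each group is its
    lexicographically smallest identifier. Objects are canonicalised
    the same way as subjects.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            small, large = sorted((ra, rb))
            parent[large] = small  # smaller identifier stays the root

    # First pass: record every sameness link.
    for s, p, o in triples:
        if p == same_as:
            union(s, o)

    # Second pass: rewrite the remaining triples onto representatives.
    return {(find(s), p, find(o)) for s, p, o in triples if p != same_as}


triples = [
    ("ex:leonardo", "rdf:sameAs", "ex:daVinci"),
    ("ex:monaLisa", "dc:creator", "ex:daVinci"),
    ("ex:annunciation", "dc:creator", "ex:leonardo"),
]
merged = merge_same(triples)
```

After merging, both paintings point to the single node "ex:daVinci", without any reasoner being involved.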

>> I hope I was clear now. 
> 
> Yes you were and I see why you don’t regard referential opacity as a big problem. But I disagree.

I think I showed up after the big discussions about Referential Opacity had taken place, so I am curious to know your point of view. 

Ciao

Fabio






--

Fabio Vitali                            Tiger got to hunt, bird got to fly,
Dept. of Computer Science        Man got to sit and wonder "Why, why, why?"
Univ. of Bologna  ITALY               Tiger got to sleep, bird got to land,
phone:  +39 051 2094872              Man got to tell himself he understand.
e-mail: fabio@cs.unibo.it         Kurt Vonnegut (1922-2007), "Cat's cradle"
http://vitali.web.cs.unibo.it/

Received on Thursday, 27 January 2022 11:23:02 UTC