Re: Agenda for June 14 Telcon - Revision 1 from Michael Hausenblas on 2011-06-14 (public-rdb2rdf-wg@w3.org from June 2011)

From: Michael Hausenblas <michael.hausenblas@deri.org>
Date: Tue, 14 Jun 2011 12:07:33 +0100
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: Eric Prud'hommeaux <eric@w3.org>, ashok.malhotra@oracle.com, public-rdb2rdf-wg@w3.org
Message-Id: <2A910E7E-9C7E-4575-BD44-712A95874DBC@deri.org>
> For ages I have been asking this WG how to rebuild the correct
> answers with explicit NULLs from your representation

This is, IMO, the core of the problem. You're asking rather than  
coming up with a concrete wording for the proposal.

Please, for the sake of getting this issue closed and meeting the  
September deadline for LC: Enrico, can you draft a concrete wording  
such as:


[[
   PROPOSAL: To resolve ISSUE-42, ...
]]


that we can discuss and hopefully resolve today?

If we fail to get this done today, I'm inclined to change the overall
timeline, because we have many more issues to resolve and simply
cannot afford to discuss one single issue (no matter how important
it is) until the cows come home.

This is not a scientific beauty contest. We're writing a spec, for
heaven's sake.

Cheers,
 Michael
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 14 Jun 2011, at 11:44, Enrico Franconi wrote:

> On 13 Jun 2011, at 23:16, Eric Prud'hommeaux wrote:
>
>> There is a fundamental difference between SPARQL and SQL users in  
>> that SQL users either prohibit a query from answering with NULLs:
>>   SELECT name, company           ┌──────┬─────────┐
>>     FROM Conctacts               │ name │ company │
>>    WHERE name="Sue"              ├──────┼─────────┤
>>      AND company IS NOT NULL     └──────┴─────────┘
>> or they write application code to skip over the NULLs, or, pretty
>> commonly, the UI paints an empty string and the interface user has
>> to guess whether it was a NULL or a company named "". The intent of
>> the query in this example was clearly to get the names of the
>> companies which Sue represents, for which neither NULL nor
>> r2rml:NULL nor "" is an acceptable answer.
>
> I claim that you can filter out NULLs, exactly as you would do in
> SQL. On what grounds do you claim that applications built on top of
> RDF data are different from applications built on top of an RDB with
> respect to the usage of NULLs? I don't see any evidence of a
> difference radical enough to justify your non-standard way of
> dealing with standard NULLs.
>
>> At any rate, I was just arguing that, given a tension between
>> putting the burden on the query author to incorporate
>> FILTER (?company != r2rml:NULL) into the above query, vs. requiring
>> the person who wants to see the NULL to know the schema:
>>  SELECT *                                            ┌──────┬─────────┐
>>   WHERE { ?who <Conctacts#name> "Sue"                │  who │ company │
>>   OPTIONAL { ?who <Conctacts#company> ?company } }   ├──────┼─────────┤
>>                                                      │  Sue │ UNBOUND │
>>                                                      └──────┴─────────┘
>> , I *think* the rest of the WG is in favor of the latter (hence
>> the claim of rough consensus).
>
> No, this doesn't work, since you would confuse an answer with a
> NULL value with an answer with a non-existing value. So the above
> query doesn't do the job you claim. For ages I have been asking this
> WG how to rebuild the correct answers with explicit NULLs from your
> representation (even with the schema), to no avail.
> So, please tell me explicitly how you get the right answer in the
> above case, with all the details (how the schema is used, how you
> distinguish the missing value from the NULL value, how this can be
> applied mechanically to general queries, etc.).
>
>>> That's why I am saying "This mapping for NULL values is arbitrary  
>>> since the WG has left unexplored its relationship with the  
>>> original meaning and behaviour of NULL values in the source RDB."
>
> I can repeat that :-)
>
>>> What I have been asking you for ages is to go through my three
>>> examples, see how your proposal would actually encode the answers,
>>> and show how this would lead to a generic recipe.
>
> This request still stands.
>
>>> My argument is that this will most likely be possible, but that it  
>>> will be overly complex since it will necessarily require the  
>>> ability to recognise whether a missing value is a NULL or not  
>>> (also in the answer set!).
>
> Let's see your answer to my question in bold above.
>
>>> Clearly, by having explicit NULL values this problem is avoided.
>>> Moreover, you can easily switch to the absent-NULL representation
>>> by just filtering out all the tuples with NULL values in one simple
>>> shot.
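The "one simple shot" conversion described above can be sketched as a single filter over triples. This is a minimal illustration under stated assumptions: triples are plain Python tuples, the IRIs are hypothetical, and the string r2rml:NULL is simply the marker name used in this thread, treated here as an opaque string.

```python
# Hypothetical explicit-NULL marker; the name is borrowed from this thread.
NULL = "r2rml:NULL"

# A toy explicit-NULL graph of (subject, predicate, object) triples.
explicit = {
    ("<Conctacts/1>", "<Conctacts#name>", '"Sue"'),
    ("<Conctacts/1>", "<Conctacts#company>", NULL),
    ("<Conctacts/2>", "<Conctacts#name>", '"Bob"'),
    ("<Conctacts/2>", "<Conctacts#company>", '"Acme"'),
}


def drop_nulls(graph):
    """Absent-NULL view: remove every triple whose object is the NULL marker."""
    return {t for t in graph if t[2] != NULL}


absent = drop_nulls(explicit)
print(len(explicit), len(absent))  # 4 3
```

Going the other way is not a local filter: deciding which absent triples were NULLs needs extra information, such as the schema, which is the point debated in the rest of this exchange.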
>>
>> In <http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#Comments_and_Proposal_by_Enrico>,
>> you asked how to discriminate between the direct graphs of
>>  ┌┤R├────────┐ and ┌┤R'├┐
>>  │ ID │    A │     │ ID │
>>  ├────┼──────┤     ├────┤
>>  │  1 │ NULL │     │  1 │
>>  └────┴──────┘     └────┘
>> , but we do that by knowing the schema, so the question doesn't help
>> us learn what a reasonable mapping is.
>
> This is too vague: "we do that by knowing the schema". As I said
> above, please explain explicitly how you proceed.
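To make the R versus R' question concrete, here is a minimal Python sketch of a hypothetical, simplified direct mapping that emits one triple per non-NULL cell and nothing for a NULL cell, as in the absent-NULL representation. Both tables are assumed to be named R in separate databases (the prime only distinguishes the two schemas), and literals are rendered with repr() as a simplification. The sketch shows that the two graphs coincide, and that the hypothetical helper nulls_from_schema can recover the NULL in R only when it is given R's column list.

```python
def direct_graph(table, rows):
    """Hypothetical simplified direct mapping: one triple per non-NULL cell."""
    triples = set()
    for row in rows:
        subject = f"<{table}/ID={row['ID']}>"
        for column, value in row.items():
            if value is not None:  # a NULL cell produces no triple at all
                triples.add((subject, f"<{table}#{column}>", repr(value)))
    return triples


def nulls_from_schema(table, columns, graph):
    """Return (subject, column) pairs that the schema declares but the graph lacks."""
    subjects = {s for (s, _, _) in graph}
    present = {(s, p) for (s, p, _) in graph}
    return {
        (s, c)
        for s in subjects
        for c in columns
        if (s, f"<{table}#{c}>") not in present
    }


# R has columns (ID, A) with A NULL; R' has only column ID.
graph_r = direct_graph("R", [{"ID": 1, "A": None}])
graph_r_prime = direct_graph("R", [{"ID": 1}])

print(graph_r == graph_r_prime)                      # True
print(nulls_from_schema("R", ["ID", "A"], graph_r))  # {('<R/ID=1>', 'A')}
print(nulls_from_schema("R", ["ID"], graph_r_prime)) # set()
```

This only covers reading a single table back out; whether such schema lookups can be applied mechanically to the answers of general queries is what Enrico is asking above, and the sketch does not settle that.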
>
>>  I instead propose that you ask questions of the ┤Conctacts├
>> database above and show how, even knowing the schema, the direct
>> graph doesn't give you realistic access to information. Remember,
>> this isn't a database interchange language, but instead a way to
>> give RDF users a useful view of relational data.
>
> I don't understand this point :-(
>
> cheers
> --e.
>
Received on Tuesday, 14 June 2011 11:08:03 UTC