Fwd: Re: ISSUE-9 Another question about Generate Blank Nodes from Souripriya Das on 2011-02-02 (public-rdb2rdf-wg@w3.org from February 2011)

From: Souripriya Das <souripriya.das@oracle.com>
Date: Wed, 02 Feb 2011 16:53:19 -0500
To: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <4D49D24F.9020902@oracle.com>
earlier attempt failed for some reason ...

-------- Original Message --------
Subject:  Re: ISSUE-9 Another question about Generate Blank Nodes
Date:  Wed, 02 Feb 2011 16:37:01 -0500
From:  Souripriya Das <souripriya.das@oracle.com>
To:  public-rdb2rdf-wg@w3.org



Yes, as Ivan pointed out, the Direct Mapping (DM) spec provides no way of sending triples generated from a row into different graphs.

Here are some random thoughts (aka ramblings) on the topic of URIs vs bNodes for the table-with-no-primary-key case:

Looking at it from the query processing point of view, I noted that we do not actually generate the triples.

Let us consider the following SPARQL query:

SELECT ?x ?fn ?ln ?amt
WHERE {
   ?x rdf:type ex:IOUs ;
      table:fname  ?fn ;
      table:lname  ?ln ;
      table:amount ?amt .
}

Only during query processing do we actually generate the subject (URI or bNode, depending on what we finally decide)
for the matching (virtual) triples corresponding to a solution. So, the query translates to

SELECT generateSubject('IOUs', fname, lname, amount)
      , '"' || fname || '"'
      , '"' || lname || '"'
      , '"' || amount || '"^^xsd:decimal'
   FROM IOUs;

where generateSubject represents a function that can return a URI or a bNode.
If IOUs had a primary key then it would return a URI.
But, we are considering the case where the table, IOUs in this case, does not have a primary key.
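
As a rough illustration only, here is a minimal Python sketch of what a generateSubject-style function might do. The name generate_subject, its parameters (primary_key, ordinal, use_bnode), and the IRI layout are assumptions that just mimic the examples in this thread; percent-encoding of values is left out.

```python
from typing import Mapping, Optional, Sequence


def generate_subject(table: str,
                     row: Mapping[str, object],
                     primary_key: Optional[Sequence[str]] = None,
                     ordinal: int = 1,
                     use_bnode: bool = False) -> str:
    """Return a subject term (IRI or bNode label) for one row.

    Hypothetical helper: the IRI layout only mimics the examples in this
    thread, and values are not percent-encoded.
    """
    if primary_key:
        # With a primary key, the key columns alone identify the row.
        key = ",".join(f"{col}={row[col]}" for col in primary_key)
        return f"<{table}/{key}#_>"
    if use_bnode:
        # No key: a bNode label that is only meaningful within one result.
        return f"_:{table}{ordinal}"
    # No key: all column values plus an ordinal that separates duplicates.
    values = ",".join(f"{col}={val}" for col, val in row.items())
    return f"<{table}/{values}:{ordinal}#_>"
```

Called with the row {"fname": "Bob", "lname": "Smith", "amount": 30} and ordinal=2, this sketch yields <IOUs/fname=Bob,lname=Smith,amount=30:2#_>.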

If a URI is generated for each (row) subject, then the solutions are:

?x                                          ?fn       ?ln      ?amt
--------------                              --------  -------- --------------------
<IOUs/fname=Bob,lname=Smith,amount=30:1#_>   "Bob"     "Smith"  "30"^^xsd:decimal
<IOUs/fname=Sue,lname=Jones,amount=20:1#_>   "Sue"     "Jones"  "20"^^xsd:decimal
<IOUs/fname=Bob,lname=Smith,amount=30:2#_>   "Bob"     "Smith"  "30"^^xsd:decimal
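
One way the ":1" / ":2" suffixes above could be assigned is with a per-value counter over the scanned rows. This is a sketch under assumptions: subjects_with_ordinals is a hypothetical helper, and the rows are taken in the order the table was shown earlier in the thread.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def subjects_with_ordinals(table: str,
                           columns: Sequence[str],
                           rows: Sequence[Tuple[object, ...]]) -> List[str]:
    """Build one IRI per row, numbering rows that share identical values."""
    seen: Counter = Counter()
    subjects = []
    for row in rows:
        seen[row] += 1  # 1 for the first occurrence, 2 for the second, ...
        values = ",".join(f"{col}={val}" for col, val in zip(columns, row))
        subjects.append(f"<{table}/{values}:{seen[row]}#_>")
    return subjects


rows = [("Bob", "Smith", 30), ("Sue", "Jones", 20), ("Bob", "Smith", 30)]
for subject in subjects_with_ordinals("IOUs", ("fname", "lname", "amount"), rows):
    print(subject)
```

Note that the ordinal depends on the order in which rows are scanned, and SQL does not guarantee that order without an ORDER BY, so which duplicate gets ":1" and which gets ":2" need not be stable across runs.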

If a bNode is generated for each (row) subject, then the solutions are:

?x                                          ?fn       ?ln      ?amt
--------------                              --------  -------- --------------------
_:b1                                        "Bob"     "Smith"  "30"^^xsd:decimal
_:b2                                        "Sue"     "Jones"  "20"^^xsd:decimal
_:b3                                        "Bob"     "Smith"  "30"^^xsd:decimal

The URIs returned for the subjects could be based on uniquely generated IDs
(implying that the user must understand that a subsequent run of the same query may return different URIs
for the subjects, and that a returned URI used directly in a subsequent query may not resolve to any rows):

      <IOUs/fname=Bob,lname=Smith,amount=30,rr:unique=123456:#_>  OR just <IOUs/rr:unique=123456>
      <IOUs/fname=Sue,lname=Jones,amount=20,rr:unique=234567:#_>  OR just <IOUs/rr:unique=234567>
      <IOUs/fname=Bob,lname=Smith,amount=30,rr:unique=345678:#_>  OR just <IOUs/rr:unique=345678>

If bNodes are returned, the subjects could look like this
(no special documentation is needed for use of returned bNodes in subsequent queries because that restriction is specified in SPARQL):
      _:IOUs123456
      _:IOUs234567
      _:IOUs345678

For each row retrieved from the table, we could generate a unique id and use it to create the subject
whenever the subject needs to be returned from the query. The translated query could reflect that:

SELECT generateSubject('IOUs', fname, lname, amount, generateUniqId())
      , '"' || fname || '"'
      , '"' || lname || '"'
      , '"' || amount || '"^^xsd:decimal'
   FROM IOUs;
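
To make the reuse restriction concrete, here is a small Python sketch of the per-run unique id idea. Everything in it is an assumption for illustration: generate_uniq_id stands in for generateUniqId and uses a random UUID, and run_query just pairs each row with a freshly generated subject in the short <IOUs/rr:unique=...> form.

```python
import uuid
from typing import List, Sequence, Tuple


def generate_uniq_id() -> str:
    """Hypothetical stand-in for generateUniqId(): a fresh random id per call."""
    return uuid.uuid4().hex


def generate_subject_with_id(table: str, as_bnode: bool = False) -> str:
    """Build a subject from a fresh id, as an IRI or as a bNode label."""
    uid = generate_uniq_id()
    return f"_:{table}{uid}" if as_bnode else f"<{table}/rr:unique={uid}>"


def run_query(table: str,
              rows: Sequence[Tuple[object, ...]],
              as_bnode: bool = False) -> List[tuple]:
    """Simulate one query run: each row gets a newly generated subject."""
    return [(generate_subject_with_id(table, as_bnode), *row) for row in rows]


rows = [("Bob", "Smith", 30), ("Sue", "Jones", 20), ("Bob", "Smith", 30)]
first_run = run_query("IOUs", rows)
second_run = run_query("IOUs", rows)
# Same column values in both runs, but different subjects each time.
print(first_run[0][0] != second_run[0][0])
```

In this sketch the two duplicate Bob/Smith rows get distinct subjects within a run, but no subject from the first run reappears in the second, whether it is an IRI or a bNode label.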

So, finally, whether the returned subject for a table with no primary key is a URI or a bNode, it carries the same restriction
regarding reuse, so we could consider allowing both and leave the choice to the implementation.

Thanks,
- Souri.

----- Original Message -----
From: ivan@w3.org
To: SOURIPRIYA.DAS@oracle.com
Cc: public-rdb2rdf-wg@w3.org
Sent: Wednesday, February 2, 2011 5:00:52 AM GMT -05:00 US/Canada Eastern
Subject: Re: ISSUE-9 Another question about Generate Blank Nodes

Souri,

Isn't it correct that your example does not apply to the Direct Mapping (which is what initiated the discussion)? AFAIK, there is no control over target graphs in the Direct Mapping.

If one uses R2RML, then of course the situation you describe applies, but that is another matter; afaik, you already refer to this problem in the document, and R2RML authors should be careful about what they are doing for that case. But, again, I do not think this applies to the Direct Mapping.

Cheers

Ivan



On Feb 2, 2011, at 03:55 , Souripriya Das wrote:

>  Let me use the same test to illustrate a problem with use of bNodes vs. IRIs:
>  http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_Test_Cases_v1#2duplicates0nulls
>
>  we have
>  ┌┤IOUs├─┬───────┬────────┐
>  │ fname │ lname │ amount │
>  │ Bob   │ Smith │     30 │
>  │ Sue   │ Jones │     20 │
>  │ Bob   │ Smith │     30 │
>  └───────┴───────┴────────┘
>  Here are the generated triples, without showing the actual destination graph:
>
>  Using IRIs:
>  <IOUs/fname=Bob,lname=Smith,amount=30:1#_>  <IOUs#fname>  "Bob" ;
>                                            <IOUs#lname>  "Smith" ;
>                                            <IOUs#amount>  30.0 .
>  <IOUs/fname=Sue,lname=Jones,amount=20:1#_>  <IOUs#fname>  "Sue" ;
>                                            <IOUs#lname>  "Jones" ;
>                                            <IOUs#amount>  20.0 .
>  <IOUs/fname=Bob,lname=Smith,amount=30:2#_>  <IOUs#fname>  "Bob" ;
>                                            <IOUs#lname>  "Smith" ;
>                                            <IOUs#amount>  30.0 .
>  Using bNodes:
>  _:b1 rdf:type ex:IOUs;
>          table:fname "Bob";
>          table:lname "Smith";
>          table:amount "30"^^xsd:integer.
>
>  _:b2 rdf:type ex:IOUs;
>          table:fname "Sue";
>          table:lname "Jones";
>          table:amount "20"^^xsd:integer.
>
>  _:b3 rdf:type ex:IOUs;
>          table:fname "Bob";
>          table:lname "Smith";
>          table:amount "30"^^xsd:integer.
>
>  Now suppose that the triples generated from the table go to more than one named graph:
>  - Graph <G1> gets the triples generated from the columns "fname" and "lname"
>  - Graph <G2> gets the triples generated from the column "amount"
>  - (this is not important here, but) assume that both graphs get the rdf:type triples
>
>  The following set of triples clearly represents different RDF data than the sets shown above, due to the graph-local scope of bNodes:
>
>  <G1>
>  {
>  _:b1 rdf:type ex:IOUs;
>          table:fname "Bob";
>          table:lname "Smith" .
>  _:b2 rdf:type ex:IOUs;
>          table:fname "Sue";
>          table:lname "Jones" .
>  _:b3 rdf:type ex:IOUs;
>          table:fname "Bob";
>          table:lname "Smith" .
>  }
>
>  <G2>
>  {
>  _:b1 rdf:type ex:IOUs;
>          table:amount "30"^^xsd:integer.
>  _:b2 rdf:type ex:IOUs;
>          table:amount "20"^^xsd:integer.
>  _:b3 rdf:type ex:IOUs;
>          table:amount "30"^^xsd:integer.
>  }
>
>  It is possible to introduce IRIs as go-betweens and also use owl:sameAs to get back to the original semantics of the relational data, but it gets complicated.
>
>  Thanks,
>  - Souri.
>
>  ----- Original Message -----
>  From: eric@w3.org
>  To: ashok.malhotra@oracle.com
>  Cc: auer@informatik.uni-leipzig.de, juanfederico@gmail.com, public-rdb2rdf-wg@w3.org
>  Sent: Tuesday, February 1, 2011 8:18:44 PM GMT -05:00 US/Canada Eastern
>  Subject: Re: ISSUE-9 Another question about Generate Blank Nodes
>
>  * ashok malhotra<ashok.malhotra@oracle.com>  [2011-02-01 15:33-0800]
>>  Let me see if I understand the problem.
>>
>>  Suppose we have a table with no primary key and many columns.
>>  A triple would be generated for each column in the table and all the triples
>>  for a row would be anchored by a blank node.  Is this correct?
>
>  Yes, for
>  http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_Test_Cases_v1#2duplicates0nulls
>  we have
>  ┌┤IOUs├─┬───────┬────────┐
>  │ fname │ lname │ amount │
>  │ Bob   │ Smith │     30 │
>  │ Sue   │ Jones │     20 │
>  │ Bob   │ Smith │     30 │
>  └───────┴───────┴────────┘
>  The direct graph Sören proposes would look like:
>
>  <IOUs/fname=Bob,lname=Smith,amount=30:1#_>  <IOUs#fname>  "Bob" ;
>                                            <IOUs#lname>  "Smith" ;
>                                            <IOUs#amount>  30.0 .
>  <IOUs/fname=Sue,lname=Jones,amount=20:1#_>  <IOUs#fname>  "Sue" ;
>                                            <IOUs#lname>  "Jones" ;
>                                            <IOUs#amount>  20.0 .
>  <IOUs/fname=Bob,lname=Smith,amount=30:2#_>  <IOUs#fname>  "Bob" ;
>                                            <IOUs#lname>  "Smith" ;
>                                            <IOUs#amount>  30.0 .
>
>
>>  So, the problem is that we have a blank node anchoring the triples for each
>>  row but, really, the blank nodes for each row represent different entities.  Is this correct?
>
>  Yes, with the caveat that they represent different rows in the
>  database. The extent to which these rows represent different entities
>  is a matter of database modeling, to be kept in mind when formulating
>  queries.
>
>
>>  If so, then I'm with Soeren.  We can improve the details of his solution but his
>>  direction seems right.
>
>  I believe a motivation is to be linked-data-friendly; that is to
>  identify things with URLs and to serve them when asked, as in
>   GET <IOUs/fname=Bob,lname=Smith,amount=30:2#_>  HTTP/1.0
>
>  The thing we specifically don't want to do is to give the world an
>  identifier which we have no way of resolving, either in response to
>  a GET, or in response to a SPARQL query e.g.:
>   ASK { <IOUs/fname=Bob,lname=Smith,amount=30:2#_>  <IOUs#amount>  30.0 }
>
>  If I use an Oracle rownum to tweak one of a set of identical rows, or
>  if the rows are not identical and I use that to tweak the identifier,
>  I can no longer honor queries about this row, even though the row
>  still exists. I believe it is easier to honor the LD requirements if
>  we tie the RDF identifiers directly to the SQL identifiers, and don't
>  advertise identifiers otherwise.
>
>
>>  All the best, Ashok
>>
>>  On 2/1/2011 2:11 PM, Sören Auer wrote:
>>>  Hi all,
>>>
>>>  In today's telecon several people (including Souri and me) supported the idea of abandoning the use of blank nodes. Is there any fundamental reason (besides philosophical views) to use blank nodes?
>>>  If not, I suggest we just generate IRIs for all resources. Of course this does not yet solve the problem of how they should be created, but we could follow this strategy:
>>>
>>>  * if there is a candidate key use the candidate key,
>>>  * if there is no candidate key, but an internal row identifier (e.g. Virtuoso always has one), use this row identifier,
>>>  * if neither exists, generate an identifier using a hash function over all values of the row + an incremented counter in case duplicate rows exist
>>>
>>>  Wouldn't this be a simple and effective solution to the problem?
>>>
>>>  Best,
>>>
>>>  Sören
>>>
>
>  --
>  -ericP
>
>


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 2 February 2011 21:55:22 UTC