Re: ISSUE-9 Another question about Generate Blank Nodes from Alexandre Bertails on 2011-02-01 (public-rdb2rdf-wg@w3.org from February 2011)

From: Alexandre Bertails <bertails@w3.org>
Date: Tue, 01 Feb 2011 18:46:35 -0500
To: Sören Auer <auer@informatik.uni-leipzig.de>
Cc: Juan Sequeda <juanfederico@gmail.com>, Eric Prud'hommeaux <eric@w3.org>, RDB2RDF Working Group WG <public-rdb2rdf-wg@w3.org>
Message-ID: <1296603995.26739.351.camel@simplet>

On Wed, 2011-02-02 at 00:33 +0100, Sören Auer wrote:
> On 01.02.2011 23:54, Alexandre Bertails wrote:
> >> * if there is a candidate key use the candidate key,
> >
> > What if there are several? What if there is a NULL value? Does it
> > really make sense to map a row to some arbitrary candidate key?
> 
> From my point of view it does, since then you can talk about (i.e. link
> from the Data Web to) that row.

The goal of the Direct Mapping is to give an RDF semantics to an
arbitrary relational database. Bringing the results to the Linked Data
world can be done later.

> 
> >> * if there is no candidate key, but an internal row identifier (e.g.
> >> Virtuoso always has one), use this row identifier,
> >
> > You may be tempted to use the row identifier, but it must remain
> > hidden as it's not exposed in SQL. It's only accessible to the database
> > vendor, not to those relying only on SQL, as in our prototype.
> 
> They might be available by means of a stored procedure or defined
> function. If they are, why not use them?

I'd rather not try to do that as we'll have some issues modeling them
both in the denotational semantics and in the rules.

> 
> >> * if neither one exists, generate an identifier using a hash function
> >> over all values of the row + an incremented counter in case duplicate
> >> rows exist
> >
> > "incremented counter" sounds like a side-effect to me :-) I'm interested
> > to know how you will simply translate that in a mathematical function
> > (ie. the output depends only on the input).
> 
> Why is that required? Since the duplicate rows are not distinguishable 
> anyway it also doesn't matter if their identifiers are permuted.

The purpose of a semantics is to define precisely the meaning of
something. It's suitable for proofs for example.

From experience, I can tell you that reasoning with side-effects is very
difficult. That's why people use real functions to define a semantics.

So I would say it became "required" as soon as people started speaking
about "semantics" or "algebra".
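
To make the point concrete, here is a minimal sketch of how a "hash +
counter" identifier could be written as a function of its input rather
than as a side-effect. It assumes the whole table is handed over as a
list of rows; row_hash and identify are hypothetical names, not taken
from the prototype or from any draft.

```python
import hashlib
from collections import Counter

def row_hash(row):
    """Deterministic digest of a row's values (NULL gets its own marker)."""
    text = "\x1f".join("\x00NULL" if v is None else str(v) for v in row)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

def identify(rows):
    """Map a list of rows to a list of (hash, occurrence index) pairs.

    The counter is local and rebuilt on every call, so the result
    depends only on the argument: same list in, same identifiers out.
    """
    seen = Counter()
    ids = []
    for row in rows:
        h = row_hash(row)
        ids.append((h, seen[h]))  # index among earlier identical rows
        seen[h] += 1
    return ids

# Hypothetical table with one duplicated row and a NULL value.
table = [("Bob", None), ("Sue", 18), ("Bob", None)]
ids = identify(table)
```

Note what the sketch does and does not give: the set of identifiers is
the same for any ordering of the rows, but which duplicate receives
which index depends on the order of the list, and a relational table
has no such order.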

> 
> > Anyway, the behaviour of such a URI mimics so much the semantics of a
> > Blank Node that I really prefer to see a real Blank Node instead.
> 
> I have the impression the semantics of blank nodes is rather unclear and 
> debated. From a practical point of view the only difference between a 
> blank node and an IRI is that blank nodes are not unique globally and
> thus not really usable with Linked Data (which to support is one of the 
> tasks per our charter).

If the Direct Mapping is part of a chain of transformations, this can be
achieved later in this chain.
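
As a rough sketch of such a chain, assuming triples are plain tuples of
strings: a first step emits one blank node per row, and a separate,
later step replaces blank nodes with IRIs. The names direct_map and
skolemize, the predicate shape and the base IRI are all hypothetical,
and this is not the Direct Mapping as specified.

```python
def direct_map(table_name, columns, rows):
    """Emit (subject, predicate, object) triples, one blank node per row."""
    triples = []
    for i, row in enumerate(rows):
        subject = f"_:{table_name}_{i}"  # label is local to this output
        for col, value in zip(columns, row):
            if value is not None:  # assumption: a NULL yields no triple
                triples.append((subject, f"<{table_name}#{col}>", f'"{value}"'))
    return triples

def skolemize(triples, base):
    """Later step in the chain: rewrite blank node labels as IRIs."""
    def rename(term):
        return f"<{base}{term[2:]}>" if term.startswith("_:") else term
    return [(rename(s), p, rename(o)) for s, p, o in triples]

# Hypothetical table without a key, containing two identical rows.
triples = direct_map("Tweets", ["tweeter", "text"],
                     [("bob", "hi"), ("bob", "hi")])
linked = skolemize(triples, "http://example.org/genid/")
```

The first step stays a function of the database alone; the choice of
IRIs is confined to the second step.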

Alexandre.

> 
> Have a good night,
> 
> Sören
>

Received on Tuesday, 1 February 2011 23:46:34 UTC