RDB2RDF Working Group Teleconference -- 21 Sep 2010

<trackbot> Date: 21 September 2010

<Ashok> meeting: RDB2RDF

<scribe> scribenick: mhausenblas

Agenda is at http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2010Sep/0048.html

Admin

PROPOSAL: Accept the minutes of last meeting, see http://www.w3.org/2010/09/14-rdb2rdf-minutes.html

<hhalpin> +1

RESOLUTION: WG has accepted the minutes from last time

Change name of language to SQRL?

Michael: let's focus on FPWD for now

<hhalpin> my main comment is to leave that to the editors

Ashok: second that

Any comments on FPWD

<Souri> I have not had much time to work on it due to OOW

Ashok: Souri, Seema and Richard are working on it - any comments?

ericP: still some issues re CVS, we're working on it

hhalpin: do all editors have CVS working?

cygri: I believe it should be working now, yes - sent credentials to ericP

Michael: I can assist, yes

hhalpin: Souri and Seema?

<ericP> cygri, all three of you will get email when the sysfolks have dealt with the request

<hhalpin> straight XHTML

cygri: editors will sort out, yes

Souri: got hhalpin mail but not yet tested it

<hhalpin> next week is fine for me.

Ashok: so question is - are we on track for a FPWD by end of month?

cygri: how much content would you expect?

Ashok: a skeleton would be sufficient

<ericP> it's a balance between getting early feedback and taking advantage of an opportunity to make a splash

cygri: will have this, with a lot of details not yet nailed down

Ashok: completely fine if FPWD has questions/comments in there

hhalpin: would be good to have something out there by end of week (heartbeat requirement)
... gather early feedback

<hhalpin> heartbeat check/schedule by Sept 30th.

R2RML Semantics and direct mapping

Ashok: I'd prefer to have the semantics right after FPWD
... so this week we have ericP on the call

<juansequeda> Eric has been talking with Marcelo (... right?)

Ashok: did you and juansequeda agree on semantics, yet?

ericP: trying to balance against the Datalog one
... couple of issues (PK, FK to tuple, etc.)

Ashok: what doc are you referring to?

ericP: the maths for creating URI is plain English (?)

<hhalpin> I would like to see EricP list out these issues in IRC.

ericP: still believe the set notation would be more sound

<juansequeda> yes

<juansequeda> having problems with my phone

<juansequeda> can you hear me?

<juansequeda> I'm going to redial

hhalpin: would be good if we go with the approach the majority of the WG prefers
... can ericP list the issues?

<juansequeda> I talked to several people at Microsoft

<juansequeda> they said either Datalog or even relational algebra

juansequeda: ericP was working with Marcelo

ericP: we exchanged a bunch of documents, yes

juansequeda: talked to MS people, for them Datalog is fine or relational algebra
... want to approach IBM people

soeren: think relational algebra would be easier to understand
... I already sent a draft

juansequeda: response I got was half half (between datalog and relational algebra)

hhalpin: more or less same in the DB group in Edinburgh

<hhalpin> the point is they can generally be made equivalent

juansequeda: seems to be two camps
... we can put them side by side

soeren: I like both but relational algebra would be easier to read

<hhalpin> These are the issues that need to be iterated.

ericP: I see quite some issues that might require a lot of hand waving (re bNodes, etc.)

<Souri> +1 to having both and then comparing side by side (taking into account EricP's comment about no-primary-key case handling)

juansequeda: need to think about the limitations mentioned by ericP
... would be good if ericP sends out a list
... of cases where he sees issues

MacTed: not sure if it is worth determining whether it is readable

<ericP> no pk: http://www.w3.org/2001/sw/rdb2rdf/directGraph/#no-pk

<scribe> ACTION: Eric to list issues re Datalog approach re PK, FK, etc [recorded in http://www.w3.org/2010/09/21-rdb2rdf-minutes.html#action01]

<trackbot> Created ACTION-72 - List issues re Datalog approach re PK, FK, etc [on Eric Prud'hommeaux - due 2010-09-28].

<hhalpin> What would be the identifier then?

juansequeda: talked with Marcelo; he was against using a bNode for this (see also RDF next steps workshop)

<MacTed> default primary key = ROWID, which is RDB implementation dependent

<MacTed> when there's no such, concatenation of all fields is sometimes (often) used as a fallback

ericP: I can give you examples where this is needed

<hhalpin> good point MacTed

<MacTed> if there's no unique on that concat, then problems arise ... but there are problems already
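A minimal sketch of the fallback MacTed describes, assuming a hypothetical table with no primary key and a hypothetical helper `concat_key` that joins all field values of a row. It only illustrates why the concatenation stops being a usable identifier once two rows are identical; the sample rows and the separator are assumptions, not something agreed on the call.

```python
from collections import Counter

def concat_key(row, sep="|"):
    """Fallback identifier: join every field value of the row into one string."""
    return sep.join(str(value) for value in row)

# Hypothetical table without a primary key; the two rows are identical.
debtors = [("Bob", "Smith", 30), ("Bob", "Smith", 30)]

keys = [concat_key(row) for row in debtors]

# Any key that occurs more than once cannot tell its rows apart.
collisions = {key: n for key, n in Counter(keys).items() if n > 1}

print(keys)
print(collisions)
```

If `collisions` is non-empty, the concatenation is not unique and some other per-row identifier is needed.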

<Souri> we would need to generate a unique bNode label for each row (uniqueness is limited to the destination graph)

ericP: just to give a scope on the problem - assume the RDB is a data warehouse; in this multiset case you run into issues

Ashok: re MacTed's comment

<MacTed> "implementation dependent" :-)

Ashok: ROWID can change, hence not possible

Souri: basic problem is that we don't have a PK, yet we need to identify each row
... can be limited to destination graph
... just thinking aloud now - if we say the ROWID is the bNode ID
... so the row could have moved
... seems like it would work out for SPARQL query
... but not totally sure

juansequeda: if there is no PK, how do I get all the data?

Souri: can't get it from the RDB
... the bNode could cluster all the relevant data
... in subject position

<Zakim> ericP, you wanted to suggest how a blank node is interpreted in SQL

ericP: in SQL-land I can't identify a non-PK row
... same in RDF-land
... but in SPARQL/SQL update I can do it

juansequeda: a bNode identifies something, but in SQL we don't have the same

Souri: I disagree; the row boundary is there
... but we need a unique subject

juansequeda: I agree, yes - will check back with Marcelo, as he has some reservations re this

<hhalpin> http://www.w3.org/2009/12/rdf-ws/papers/ws23

hhalpin: the semantics are a bit different though (PatH agrees with Marcelo re this)
... so, whatever is the simplest solution is fine with me (sort of ignoring the original semantics)

<Souri> From a practitioner's point of view, the main difference between a bNode and a URI is that one has a local (graph) scope and the other has global scope

ericP: so, hhalpin, which semantics are we talking about, then?

hhalpin: we should use it in a practical way

<ericP> Debtors:

<ericP> Bob Smith $30

<ericP> Debtors: { [ :fn "Bob" ; :ln "Smith" ; :amnt ] [ :fn "Bob" ; :ln "Smith" ; :amnt ] }

juansequeda: agree. need to identify and discuss issues

<ericP> SELECT SUM(amnt) FROM Debtors WHERE fn = "Bob" AND ln = "Smith"

<ericP> SELECT (SUM(amnt) AS ?a) { [ :fn "Bob" ; :ln "Smith"] }
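A rough sketch of the multiset issue behind the two queries above, under stated assumptions: the sample data (two identical rows with an amount of 30), the helpers `to_triples` and `sum_amnt`, and the bNode label formats are all hypothetical. An RDF graph is modelled as a plain Python set of triples, so identical triples collapse just as they would in a graph. SQL sums over both rows; the triple version only matches that result when each row gets its own subject.

```python
import sqlite3

# Assumed sample data: a Debtors table with no primary key and a duplicate row.
rows = [("Bob", "Smith", 30), ("Bob", "Smith", 30)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Debtors (fn TEXT, ln TEXT, amnt INTEGER)")
conn.executemany("INSERT INTO Debtors VALUES (?, ?, ?)", rows)
sql_sum = conn.execute(
    "SELECT SUM(amnt) FROM Debtors WHERE fn = ? AND ln = ?", ("Bob", "Smith")
).fetchone()[0]

def to_triples(rows, subject_for):
    """Map each row to three triples; the graph is a set, so duplicates collapse."""
    graph = set()
    for index, row in enumerate(rows):
        fn, ln, amnt = row
        s = subject_for(index, row)
        graph |= {(s, ":fn", fn), (s, ":ln", ln), (s, ":amnt", amnt)}
    return graph

def sum_amnt(graph, fn, ln):
    """Sum :amnt over every subject that has the given :fn and :ln."""
    has_fn = {s for (s, p, o) in graph if p == ":fn" and o == fn}
    has_ln = {s for (s, p, o) in graph if p == ":ln" and o == ln}
    subjects = has_fn & has_ln
    return sum(o for (s, p, o) in graph if p == ":amnt" and s in subjects)

# One fresh bNode label per row: the duplicate row survives.
per_row = to_triples(rows, lambda i, row: f"_:row{i}")
# Subject derived from the row's values: identical rows share a subject.
by_value = to_triples(rows, lambda i, row: "_:" + "_".join(map(str, row)))

print(sql_sum, sum_amnt(per_row, "Bob", "Smith"), sum_amnt(by_value, "Bob", "Smith"))
```

With per-row labels the graph holds six triples and the sum agrees with SQL; with value-derived subjects it holds three triples and the duplicate amount is lost.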

<hhalpin> i.e. the existential variable interpretation is a bit silly, but the blank nodes are generally used for "grouping" data that doesn't otherwise have an identifier.

ericP: on a per-use-case basis decide on issues/coverage

<Souri> if we generate bNode, we need to ensure the uniqueness is maintained within each destination graph, if we use URIs uniqueness has to be global

<hhalpin> that's a good point Souri - i.e. why we need to consider the blank node semantically as basically a unique identifier...

<hhalpin> however, again, that brings up the issue with MacTed's concat common practice

Souri: from a practitioner's POV, URI is a bit more complex - all we need is to ensure the 'label' we produce is unique in the destination graph

(scribe missed the details of Souri's explanation)

ericP: I guess the most complex scenario is with FK
... an FK has to reference a candidate key
... could be a bNode
... so unclear how to deal with the materialisation
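One way to picture the extra step when a foreign key points at a row that is identified only by a bNode, sketched under assumptions: the tables, column names, and helper `label_rows` are hypothetical, and labels are simply numbered per table. The referenced table is labelled first, an index from its candidate key values to the bNode labels is built, and the referencing rows then look up the label instead of minting an identifier from the key value.

```python
from itertools import count

def label_rows(rows, prefix):
    """Pair each row with a bNode label that is unique within this graph."""
    counter = count()
    return [(f"_:{prefix}{next(counter)}", row) for row in rows]

# Hypothetical referenced table: no primary key, but "name" is a candidate key.
departments = [
    {"name": "Sales", "city": "Boston"},
    {"name": "R&D", "city": "Galway"},
]
# Hypothetical referencing table: "dept" is a foreign key to departments.name.
employees = [
    {"ename": "Alice", "dept": "R&D"},
    {"ename": "Bob", "dept": "Sales"},
]

# Extra step: index the referenced rows' labels by candidate key value.
labelled_depts = label_rows(departments, "dept")
dept_by_key = {row["name"]: label for label, row in labelled_depts}

triples = []
for label, row in label_rows(employees, "emp"):
    triples.append((label, ":ename", row["ename"]))
    # Resolve the foreign key to the referenced row's bNode label.
    triples.append((label, ":dept", dept_by_key[row["dept"]]))

for triple in triples:
    print(triple)
```

The lookup fails with a `KeyError` when no referenced row matches, which is one of the edge cases that would need an explicit rule.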

Souri: tough case indeed, need to think more about it

ericP: it's doable, just an extra step

Souri: I was thinking of having a join condition - if there is nothing corresponding on the PK, then I don't know how to generate it

juansequeda: need to distinguish default mapping and customisation

ericP: right, and we might write some UC out of the default mapping

Souri: we need to explicitly say what we can handle (the editors)

hhalpin: edge cases need to be addressed (test cases)
... need to highlight these cases (not only formally)
... agree with MacTed's point there

juansequeda: agree as well

<Souri> I agree with Harry about the edge cases

<hhalpin> not hide edge-cases in formal semantics

<hhalpin> even if they are there, we need to warn implementers about them

Souri: need to be explicit about the edge cases, yes

ericP: the scenario I just gave above is actually a combination of two more basic ones

<ericP> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#no-pk

juansequeda: will send out paper about this and sync with ericP

(Souri explains another join-condition example)

<ericP> "The columns in the referencing table must be the primary key or other candidate key in the referenced table." — http://en.wikipedia.org/wiki/Foreign_key

<ericP> ('cause I can't paste from Date)

Ashok: ok, thanks for all the input - we have our action items, will not be here next week

Michael: I'm around

[adjourned]

trackbot, end telecon
