RDB2RDF Working Group Teleconference -- 24 Apr 2012

<trackbot> Date: 24 April 2012

<mhausenblas> scribenick: mhausenblas

<scribe> scribenick: Ashok

<mhausenblas> Zakim, who's here?

<joerg> joerg joined

Minutes of last meeting

<mhausenblas> PROPOSAL: Accept the minutes of last meeting http://www.w3.org/2012/04/03-RDB2RDF-minutes.html

<boris> +1

Minutes approved without objection

<mhausenblas> ericP are you gonna join us?

Implementation feedback

<mhausenblas> + DM cannot be implemented as an R2RML mapping http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012Apr/0021.html

<mhausenblas> + XSD mapping for binary columns http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012Apr/0020.html

<mhausenblas> + implementability for tables w/o primary key http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012Apr/0019.html

<mhausenblas> + using non-existing column in mapping http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012Apr/0018.html

<mhausenblas> + Unnamed columns in rr:sqlQuery http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012Apr/0017.html

Richard: There is an issue about blank nodes identifiers ...

<cygri> http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012Apr/0019.html

+ implementability for tables w/o primary key http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012Apr/0019.html

<cygri> "DM: implementability for tables w/o primary key"

Richard: The multiple blank nodes cannot be told apart in queries, etc.

<Souri> Example: two rows in a single table T1<lender,borrower,amount,tdate>: R1 => <John,Mary,100,01-Jan-2012>, R2 => <John, Mary,100,01-Jan-2012>

<cygri> SELECT * { ?s ?p ?o }

<cygri> SELECT ROWID AS s, "T1/borrower" AS p, borrower AS o FROM T1 UNION SELECT ROWID AS s, "T1/amount" AS p, amount AS o

<Souri> ROWNUM ?

<cygri> ROWNUM works in this simple case but not in more complex ones AFAIK
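A minimal sketch of the ROWID idea above, assuming SQLite (whose implicit rowid column is a vendor feature, not Core SQL) and the two duplicate rows from Souri's example. Two details differ from the chat line and are assumptions of this sketch: the predicate names are single-quoted string literals, and the second SELECT has an explicit FROM T1. It only shows that a per-row identifier keeps the duplicate rows apart, while the same UNION without it collapses them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (lender TEXT, borrower TEXT, amount INTEGER, tdate TEXT)")
row = ("John", "Mary", 100, "01-Jan-2012")
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?, ?)", [row, row])  # two identical rows

# Triple view keyed by SQLite's implicit rowid (vendor-specific assumption).
with_rowid = conn.execute(
    """
    SELECT rowid AS s, 'T1/borrower' AS p, borrower AS o FROM T1
    UNION
    SELECT rowid AS s, 'T1/amount' AS p, amount AS o FROM T1
    ORDER BY s, p
    """
).fetchall()

# Same view without a row identifier: UNION removes the duplicates.
without_rowid = conn.execute(
    """
    SELECT 'T1/borrower' AS p, borrower AS o FROM T1
    UNION
    SELECT 'T1/amount' AS p, amount AS o FROM T1
    """
).fetchall()

print(len(with_rowid), len(without_rowid))
```

With the row identifier the view has one pair of triples per row; without it, the two rows are indistinguishable in the result.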

Ashok: Can you write an appropriate SPARQL query? Translating to SQL is not part of our spec.

Richard: The fact that the nodes are indistinguishable is a problem ... this will show up in programming language

Ted: That is a schema problem

<juansequeda> Souri, you are correct

<Zakim> Souri, you wanted to ask if this is the correct DM based generated triples: _:b1 ex:borrower ex:John ; ex:amount 10 . _:b2 ex:borrower ex:John ; ex:amount 10 .

<juansequeda> souri, yes!
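To make the exchange above concrete, here is a small sketch (not the Direct Mapping algorithm itself) that mints one fresh blank node per row of a table without a primary key. The function name direct_map, the T1#column predicate naming, and the _:bN labels are hypothetical choices for illustration; the data is Souri's T1 example.

```python
from itertools import count

def direct_map(table, columns, rows):
    """Mint one fresh blank node label per row and emit (s, p, o) triples."""
    ids = count(1)
    triples = []
    for row in rows:
        subject = f"_:b{next(ids)}"
        for col, value in zip(columns, row):
            if value is not None:  # a NULL cell yields no triple
                triples.append((subject, f"{table}#{col}", value))
    return triples

columns = ("lender", "borrower", "amount", "tdate")
rows = [
    ("John", "Mary", 100, "01-Jan-2012"),
    ("John", "Mary", 100, "01-Jan-2012"),
]
triples = direct_map("T1", columns, rows)

subjects = {s for s, _, _ in triples}
# The set of (predicate, object) pairs attached to each blank node.
descriptions = {
    frozenset((p, o) for s2, p, o in triples if s2 == s) for s in subjects
}
print(len(subjects), len(descriptions))
```

The two rows give two blank nodes, but both carry exactly the same predicate/object pairs, which is the situation under discussion.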

Richard: The translation from SPARQL to SQL cannot be done

<juansequeda> it needs to be distinct in order to be query preserving. Otherwise, you couldn't do aggregation

Richard: I would recommend we relax the requirement and implementations can add identifying information

<Zakim> juansequeda, you wanted to ask Richard what is his proposal? and to

Juan: You would lose query preservation

Richard: You should generate a blank node for each row or a blank node for each set of unique values

<Souri> Unless I am mistaken, unique key of a row could be NULL

Ashok: Isn't this a real corner case?

Souri: Problem is we are crossing row boundaries

<mhausenblas> Michael: I tend to agree. It seems like a corner case to me as well. This just means we've experienced a limitation of our specs during CR.

<cygri> ericP, doesn't help you with Jena

<mhausenblas> Michael: If we can agree on corner case, we should discuss what would need to happen that we isolate this (add warning, remove TC, etc.)

<ericP> could you propose an API invocation to address?

<dmcneil> seems like you could also use "count" in SQL to see how many duplicates there are
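A rough sketch of the "count" suggestion, assuming SQLite and a hypothetical third row added for contrast: group on all columns, read off how many copies of each row exist, and expand each group into that many blank node labels. The _:gN_M label scheme is an assumption made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (lender TEXT, borrower TEXT, amount INTEGER, tdate TEXT)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?, ?, ?)",
    [
        ("John", "Mary", 100, "01-Jan-2012"),
        ("John", "Mary", 100, "01-Jan-2012"),
        ("Ann", "Bob", 50, "02-Jan-2012"),  # hypothetical extra row
    ],
)

# One result row per distinct tuple, with its multiplicity n.
groups = conn.execute(
    """
    SELECT lender, borrower, amount, tdate, COUNT(*) AS n
    FROM T1
    GROUP BY lender, borrower, amount, tdate
    ORDER BY lender
    """
).fetchall()

# Expand each group into n blank node labels, one per duplicate.
bnodes = []
for group_no, group in enumerate(groups, start=1):
    n = group[-1]
    bnodes.extend(f"_:g{group_no}_{i}" for i in range(1, n + 1))

print(groups)
print(bnodes)
```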

<Zakim> ericP, you wanted to say you can always fall back to materializing for this case

<juansequeda> PROPOSAL: If you have a db that has a table that does not have a primary key, we recommend to materialize the RDF

<cygri> juansequeda, -1. the only tables without PKs that I see in practice are gigantic ones, like log tables.

<MacTed> -1 to dropping the test case

<MacTed> +1 to warning "RDB data with these characteristics cannot be (usefully) dynamically (direct) mapped; materialization (replication + transformation) is recommended."

<juansequeda> +1 to MacTed

Michael: We need to decide if this is a corner case and can be isolated or we change the spec

<ericP> or they are inefficient in that case, which may well be acceptable

<cygri> PROPOSAL: "if the table has no primary key, the row node is a blank node. whether this blank node is identical to other blank nodes is undefined."

<Zakim> cygri, you wanted to ask Souri how to write that view in Core SQL 2008

Richard, where would you put those words?

<ericP> suppose the T1 table is large and i want to not fully materialize

<ericP> SELECT ?owes { ?s <borrower> "Bob" ; <amount> ?owes } =>

<ericP> SELECT borrower AS p, amount AS o FROM T1 WHERE borrower="Bob" then add the bnodes and a type triple for each of those bnodes

<ericP> you only process the selected rows
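The approach ericP describes can be sketched as follows, assuming SQLite, hypothetical sample rows, and a hypothetical helper named owes. The WHERE clause restricts work to the matching rows, and a blank node label is minted per returned row, so duplicates survive as separate solutions because a plain SELECT keeps bag semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (lender TEXT, borrower TEXT, amount INTEGER, tdate TEXT)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?, ?, ?)",
    [
        ("John", "Bob", 100, "01-Jan-2012"),
        ("John", "Bob", 100, "01-Jan-2012"),  # duplicate of the row above
        ("John", "Mary", 100, "01-Jan-2012"),
    ],
)

def owes(conn, borrower):
    """Answer SELECT ?owes { ?s <borrower> borrower ; <amount> ?owes }."""
    cursor = conn.execute("SELECT amount FROM T1 WHERE borrower = ?", (borrower,))
    solutions = []
    for n, (amount,) in enumerate(cursor, start=1):
        # Label minted on the fly; it is only meaningful within this result.
        solutions.append({"s": f"_:row{n}", "owes": amount})
    return solutions

bob = owes(conn, "Bob")
print(bob)
```

The labels are scoped to one result set, so this does not by itself give blank nodes that stay stable across queries.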

Ted: Richard has made an argument based on a SPARQL query ... that is out of our charter ... a warning is sufficient

<dmcneil> +q

Ted: The RDF group is questioning the model

David: Do you assume that 1 SPARQL query leads to 1 SQL query?

Richard: No

David: Outlines a solution using cursor to create distinguishable blank nodes

Michael: Do we agree this is a corner case?

<juansequeda> This is not a bug. it's an implementation issue. It can be implemented... maybe inefficiently, but it can

Richard: It is a corner case but we need to fix the bug ... I suggest a 1 sentence change

<cygri> juansequeda, it cannot be implemented except when dumping

<juansequeda> cygri, you can do post processing afterwards.

<cygri> juansequeda, some implementers are not interested in dumping the db

Eric: I'm still working on trying to create a case where this is a problem

<MacTed> I think that sentence is not sufficient, if we're going in that direction. I'm not entirely opposed to a change in that direction (optional blank-node distinction over multiple identical-content rows).

Richard: It is a corner case but we need to fix it
... we are not mapping to a proper RDF graph because the nodes are not distinguishable

<cygri> MacTed, RDF concepts says: "Given two blank nodes, it is possible to determine whether or not they are the same."

<ericP> [[

<ericP> SELECT ?borrower ?amount ?address {

<ericP> _:t1 <T1#date> ?date ; <T1#borrower> ?borrower ; <T1#amount> ?amount .

<ericP> _:t2 <T2#name> ?borrower ; <T2#address> ?address

<ericP> FILTER (?date < "2012-01-01"^^xsd:date)

<ericP> SELECT T1.borrower, T1.amount, T2.address

<ericP> FROM T1, T2

<ericP> WHERE T1.date < "2012-01-01"

<ericP> AND T1.borrower=T2.name

<ericP> ]]

<ericP> (still not arriving at something i can't implement)

<Zakim> juansequeda, you wanted to ask why is this long. It's just a difference between a lean and a non-lean RDF graph http://www.w3.org/TR/rdf-mt/#graphdefs

Juan: We need to preserve cardinality

<cygri> juansequeda, good observation. and note that leanification doesn't change the semantics
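A small sketch of the lean versus non-lean point, under a strong simplifying assumption: blank nodes occur only as subjects and every object is a ground term. In that restricted case, blank nodes with identical descriptions are redundant and can be merged. The function name merge_identical_bnodes is hypothetical, and this is not a general leaning algorithm. It shows why the merged graph entails the same things yet yields a different count.

```python
from collections import defaultdict

def merge_identical_bnodes(triples):
    """Collapse blank nodes whose (predicate, object) sets are identical."""
    descriptions = defaultdict(set)
    for s, p, o in triples:
        descriptions[s].add((p, o))
    representative = {}
    merged = set()
    for s in sorted(descriptions):
        key = frozenset(descriptions[s])
        # Only blank nodes are merged; other subjects are kept as they are.
        keep = representative.setdefault(key, s) if s.startswith("_:") else s
        merged.update((keep, p, o) for p, o in key)
    return merged

graph = {
    ("_:b1", "T1#borrower", "Mary"), ("_:b1", "T1#amount", 100),
    ("_:b2", "T1#borrower", "Mary"), ("_:b2", "T1#amount", 100),
}
lean = merge_identical_bnodes(graph)

# Counting loans (subjects with a borrower) before and after merging.
loans_before = len({s for s, p, o in graph if p == "T1#borrower"})
loans_after = len({s for s, p, o in lean if p == "T1#borrower"})
print(loans_before, loans_after)
```

The count drops from two loans to one after merging, which is the cardinality concern Juan raises.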

<Souri> [removing bNodes from Eric's query] ?x <T1#date> ?date ; <T1#borrower> ?borrower ; <T1#amount> ?amount . ?y <T2#name> ?borrower ; <T2#address> ?address . FILTER (?date < "2012-01-01"^^xsd:date)

<MacTed> +1 base64 change to hex

<mhausenblas> ACTION: Richard to come up with concrete example re "DM: implementability for tables w/o primary key" [recorded in http://www.w3.org/2012/04/24-RDB2RDF-minutes.html#action01]

<trackbot> Created ACTION-205 - Come up with concrete example re "DM: implementability for tables w/o primary key" [on Richard Cyganiak - due 2012-05-01].

<mhausenblas> (meeting adjourned)

<mhausenblas> trackbot, end telecon

<juansequeda> mhausenblas, so what about turning in test cases? We were supposed to do that next week. I guess it is postponed.
