Non-unique Tables

From RDB2RDF
Revision as of 04:15, 13 May 2012 by Eric (Talk | contribs)


SQL tables and views have 0 or more unique keys. Those with 0 unique keys may have repeated rows. This case is captured by test cases [1] and 2tables2duplicates0nulls.

An R2RML subjectMap is one of:

  • a constant RDF term - rr:constant <foo>
  • a column value - rr:column "PAGE" (where the PAGE column has URLs)
  • a generated IRI - rr:template "http..{EMPNO}"; rr:termType rr:IRI
  • a generated blank node - rr:template "{lname}|{amount}"; rr:termType rr:BlankNode

A table like:

IOUs
fname  lname  amount
Bob    Smith  30
Sue    Jones  20
Bob    Smith  30

Because the 1st and 3rd rows have the same values, no template string can create different subjects for the two rows. Because an RDF graph is a set of triples, the assertions generated for the repeated rows:

_:Smith30 <IOUs#amount> 30.
_:Smith30 <IOUs#amount> 30.

yield a single triple.
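The collapse can be illustrated with a minimal Python sketch. It is not an R2RML processor: the `rows` list, the `template_subject` helper, and the `{lname}{amount}` template (chosen so the labels match `_:Smith30` above) are assumptions made for illustration. A Python set stands in for the RDF graph.

```python
# Hypothetical in-memory copy of the IOUs table; rows 1 and 3 are identical.
rows = [
    {"fname": "Bob", "lname": "Smith", "amount": 30},
    {"fname": "Sue", "lname": "Jones", "amount": 20},
    {"fname": "Bob", "lname": "Smith", "amount": 30},
]


def template_subject(row, template="{lname}{amount}"):
    """Imitate a template-generated blank node: the label depends only on column values."""
    return "_:" + template.format(**row)


# A set models the RDF graph, so identical triples are stored only once.
graph = {(template_subject(row), "<IOUs#amount>", row["amount"]) for row in rows}

for triple in sorted(graph):
    print(triple)
```

Three rows go in, but the set holds only two triples, because the two Bob Smith rows produce the same subject, predicate, and object.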

Because the Direct Mapping (DM) generates a unique blank node for each row in a table with no primary key, the corresponding DM triples would have different subjects:

_:a <IOUs#amount> 30.
_:b <IOUs#amount> 30.
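A rough Python sketch of this per-row behavior follows. The `rows` list, the `per_row_subjects` helper, and the `_:rowN` labels are hypothetical names chosen for illustration; the only property carried over from the text is that every row gets its own blank node.

```python
# Hypothetical in-memory copy of the IOUs table; rows 1 and 3 are identical.
rows = [
    {"fname": "Bob", "lname": "Smith", "amount": 30},
    {"fname": "Sue", "lname": "Jones", "amount": 20},
    {"fname": "Bob", "lname": "Smith", "amount": 30},
]


def per_row_subjects(rows):
    """Give every row a fresh blank node label, regardless of its column values."""
    return [f"_:row{index}" for index, _ in enumerate(rows)]


subjects = per_row_subjects(rows)

# A set models the RDF graph; distinct subjects keep the repeated rows apart.
graph = {
    (subject, "<IOUs#amount>", row["amount"])
    for subject, row in zip(subjects, rows)
}

for triple in sorted(graph):
    print(triple)
```

Here the set keeps three triples, two of them with the object 30, because the subjects differ even though the row values are the same.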

This difference in subjects makes it impossible to write an R2RML mapping that reproduces the DM output for a table with potentially repeated rows. Richard's proposal is to relax this requirement so that the triples for repeated rows may share a single blank node, thus collapsing to one triple. When the person configuring R2RML knows the rows are unique (perhaps due to domain knowledge or constraints in the applications that populate the database), a default of generating the same blank node for repeated rows does no harm, because no repeated rows exist. Likewise, users who know that repeated rows are uninteresting to the application may also be served by a default of generating the same node.