Re: Proposal for the Direct Mapping from Eric Prud'hommeaux on 2011-08-05 (public-rdb2rdf-wg@w3.org from August 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 5 Aug 2011 20:09:37 +0200
To: Juan Sequeda <juanfederico@gmail.com>
Cc: Richard Cyganiak <richard@cyganiak.de>, Michael Hausenblas <michael.hausenblas@deri.org>, rdb2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <20110805180933.GF27646@w3.org>
* Juan Sequeda <juanfederico@gmail.com> [2011-08-03 19:22-0500]
> On Wed, Aug 3, 2011 at 12:46 PM, Eric Prud'hommeaux <eric@w3.org> wrote:
> 
> > * Richard Cyganiak <richard@cyganiak.de> [2011-08-02 23:32+0100]
> > > On 2 Aug 2011, at 15:19, Eric Prud'hommeaux wrote:
> > > >  • DM is for "all the tables in a database"
> > > >    I debated this; I didn't want to alarm folks who would think
> > > >    they'd have to expose everything if they didn't want to. The
> > > >    alternative is to parametrize; neither is terribly attractive. I
> > > >    guess "all tables" is fine.
> > >
> > > "all tables and views in the schema"?
> >
> > "each table and view in a database schema"?
> > done in two places (here and the definition below).
> >
> > > >  • s/an SQL/a SQL/
> > > >    This depends on whether you call it "S Q L" or "sequel". The SQL
> > > >    spec uses "an", e.g. "Effects of SQL-statements in an
> > > >    SQL-transaction".
> > >
> > > Ah, interesting point. R2RML uses “a SQL” but that's just my personal
> > > preference. I guess the spec should be considered authoritative on this.
> > >
> > > > [[
> > > > The Direct Mapping is a formula for creating an RDF graph from the
> > > > rows of each table in a database. A base IRI defines a web space for
> > > > the labels in this graph; all labels are generated by appending to the
> > > > base.
> > >
> > > There are no “labels” in an RDF graph. Let's please stick to the standard
> > > terminology from the specs.
> >
> > done
> > also s/attribute/column/ # ignoring the question of "fields"
> >
> > > > The functions scalar and reference extract the scalar and reference
> > > > attributes (those participating in a foreign key) respectively:
> > >
> > > Why does this have to be formulated as “functions”?
> >
> > Is there a more intuitive way to say that there's an exact mapping from the
> > input onto the outputs?
> > And isn't that exactly what an implementor wants to know?
> >
> > > > dfn scalars: the attributes in a table which are NOT in any foreign
> > > >   key.
> > >
> > > How about: The non-foreign key columns of a table are the columns which
> > > are not in any foreign key.
> >
> > Looking at it in-situ
> > <http://www.w3.org/2001/sw/rdb2rdf/directMapping/EGP#defn-scalars>, I'm not
> > convinced that the "definition X: X is..." redundancy will be helpful.
> >
> > > > dfn references: the attributes in a table's foreign keys.
> > >
> > > How about: The foreign key columns of a table are the columns which are
> > > in some foreign key.
> >
> > ditto
> >
> > > > SQL table and attribute identifiers compose RDF IRIs in the direct
> > > > graph. These identifiers are separated by the punctuation characters
> > > > '#', ',', '/' and '='. All SQL identifiers are escaped following
> > > > URL-encoding
> > > > <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> > > > except that only the above punctuation and the characters not
> > > > permitted in RDF IRIs are escaped.
> > >
> > > I'd define once: The URL-encoded form of a string is …
> > >
> > > And then explicitly state that the so-and-so IRI is the concatenation of
> > > base IRI, '/', URL-encoded form of the table name, and so on.
> > >
> > > (I recall discussions about using relative IRIs in the direct mapping. It
> > > might be easiest to limit that to the examples. “The example omits the base
> > > IRI for brevity, and uses relative IRIs. In the actual direct mapping graph,
> > > the base IRI would be prepended to all IRIs.”)
> >
> > Didn't attack yet. stuck in a todo.
> >
> > > > In the direct graph, there is an identifier for each row in a database
> > > > table. If the row is in a table with a primary key, this is formed
> > > > from the table name and the attribute names and values of each
> > > > attribute
> > > > in the primary key. If there is no primary key for the table, the row
> > > > identifier is a fresh blank node:
> > > >
> > > > dfn row identifier:
> > > >
> > > >   if the table has a primary key with attributes, the relative IRI for
> > > >   the row identifier is the concatenation of the table name, '/', and
> > > >   a ','-separated concatenation of each attribute name, '=', and the
> > > >   attribute value.
> > > >
> > > >   if the table has no primary key, the row identifier is a fresh blank
> > > >   node.
> > >
> > > This doesn't need to be repeated twice. I'd call it row IRI for maximum
> > > clarity.
> >
> > I'm not sure what's repeated. If you mean that there are two clauses, they
> > deal with different cases.
> > Re: "row IRI", we could say that "row identifier" is either a "row IRI" or
> > "row blank node". Proposed text?
> >
> > > > A (potentially unary) list of attribute names in a table form a
> > > > property IRI:
> > > >
> > > > dfn property IRI: the concatenation of the table name, '/', and a
> > > >   ','-separated concatenation of each attribute name, and a '#' at
> > > >   the end of the property IRI.
> > >
> > > This doesn't need to be repeated one-and-a-half times.
> >
> > The property IRI is simpler than the earlier definition (doesn't include
> > column values).
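
For illustration only, here is a minimal Python sketch of the two quoted
definitions (row identifier and property IRI), read literally. The names esc,
row_identifier and property_iri and the "_:bN" blank node labels are
hypothetical, and the escaping is a simplification: urllib's quote() escapes
more characters than the quoted rule asks for.

```python
from itertools import count
from urllib.parse import quote

_bnodes = count()  # hypothetical source of fresh blank node labels


def esc(value):
    # Percent-encode one SQL identifier or value. Simplification: with
    # safe="" quote() escapes every reserved character, which covers
    # '#', ',', '/' and '=' but is broader than the quoted rule.
    return quote(str(value), safe="")


def row_identifier(table, pk_columns, row):
    # No primary key: the row identifier is a fresh blank node.
    if not pk_columns:
        return f"_:b{next(_bnodes)}"
    # Primary key: table name, '/', then ','-separated name=value pairs.
    pairs = ",".join(f"{esc(c)}={esc(row[c])}" for c in pk_columns)
    return f"{esc(table)}/{pairs}"


def property_iri(table, columns):
    # Table name, '/', ','-separated column names, and a trailing '#'.
    return f"{esc(table)}/{','.join(esc(c) for c in columns)}#"
```

Both functions return relative IRIs; the base IRI would be prepended
separately. The property IRI takes no row argument, which is the sense in
which it is simpler than the row identifier.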
> >
> > > > The values in a row are mapped to RDF literals:
> > > >
> > > > dfn literal map: a mapping from an SQL value with a datatype to an RDF
> > > >   literal with an XML Schema datatype where the RDF literal has a
> > > >   lexical value equivalent to the SQL lexical value and the datatype
> > > >   mapping is found in this table:
> > > >
> > > > SQL         XSD datatype
> > > > _________   ____________
> > > > INT         http://www.w3.org/TR/xmlschema-2/#integer
> > > > FLOAT       http://www.w3.org/TR/xmlschema-2/#float
> > > > DATE        http://www.w3.org/TR/xmlschema-2/#date
> > > > TIME        http://www.w3.org/TR/xmlschema-2/#time
> > > > TIMESTAMP   http://www.w3.org/TR/xmlschema-2/#dateTime
> > > > CHAR        plain literal
> > > > VARCHAR     plain literal
> > > > STRING      plain literal
> > >
> > > This should use the standard SQL 2008 types, including BOOLEAN and BINARY
> > > string types. (Probably the Direct Mapping can re-use the outcome of R2RML
> > > ISSUE-48 here.)
> >
> > Labeled as an issue. Have you incorporated that into R2RML (when there's
> > not rr:datatype) so I can steal the text?
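
For illustration only, the quoted literal map can be sketched as a lookup
table in Python. The datatype IRIs are copied as written in the quoted table;
the names DATATYPES and literal_map are hypothetical, and types the table does
not list (BOOLEAN and the binary string types, for example) are rejected
rather than guessed.

```python
XSD = "http://www.w3.org/TR/xmlschema-2/#"

# SQL type -> datatype IRI, as listed in the quoted table.
# None stands for "plain literal" (no datatype).
DATATYPES = {
    "INT": XSD + "integer",
    "FLOAT": XSD + "float",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
    "CHAR": None,
    "VARCHAR": None,
    "STRING": None,
}


def literal_map(lexical, sql_type):
    # Return (lexical form, datatype IRI or None) for one SQL value.
    # Assumes the SQL lexical form is already a valid XSD lexical form.
    key = sql_type.upper()
    if key not in DATATYPES:
        raise ValueError(f"no mapping listed for SQL type {sql_type!r}")
    return (lexical, DATATYPES[key])
```

For example, literal_map("18", "INT") pairs "18" with the integer datatype
IRI, and literal_map("Bob", "VARCHAR") yields a plain literal.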
> >
> > > > The Direct Mapping is defined by a set of mapping functions from table
> > > > rows to RDF triples:
> > > >
> > > > dfn direct mapping: the set of RDF triples produced by invoking the
> > > >   <table mapping> on each table in a database.
> > >
> > > A minor stylistic point but I'd say: The direct mapping graph is the
> > > union of the table graphs for each table.
> > >
> > > > dfn table mapping: the set of RDF triples created by invoking the
> > > >   <row mapping> on each row in a table.
> > >
> > > I'd say, the table graph of a table is the union of the row graphs for
> > > each row.
> >
> > If I understand this, it implies the definition of table graph which might
> > then be defined in terms of row graphs. Is this your proposal?
> >
> > > > dfn row mapping: using a row identifier S for the row,
> > > >  the type triple:
> > > >    (S, rdf:type, <table type>)
> > > >  plus the scalar triples:
> > > >    for each attribute in the list of <scalars> where the attribute
> > > >      value is non-NULL:
> > > >      (S,
> > > >       the <property IRI> for the attribute,
> > > >       the <literal map> for the attribute value).
> > > >  plus the reference triples:
> > > >    for each list of attributes in the <non-unary references> where none
> > > >      of the attribute values are NULL:
> > > >      (S,
> > > >       the <property IRI> for the attributes,
> > > >       the <row identifier> for the referenced row)
> > > > ]]
> > >
> > > I'd decompose this a bit: The row graph of a row is a graph consisting of
> > > the following triples:
> > > - the row type triple
> > > - a data triple for each non-foreign key column where the data value is
> > >   non-null
> > > - a reference triple for each foreign key column ...
> > >
> > > And then:
> > >
> > > The row type triple of a row is an RDF triple with the following
> > > components:
> > > - subject: the row IRI of the row
> > > - predicate: rdf:type
> > > - object: the table class IRI of the row's table
> > >
> > > et cetera.
> >
> > I worked from this angle for a bit, but the challenging thing was ensuring
> > the same subject without introducing some sort of hand-waving about "the
> > current subject" or some such.
> > Recall that the containing table may not have a primary key (or even any
> > candidate keys).
> >
> 
> 
> Eric,
> 
> I agree with Richard on this one. Actually, we already have something like
> this (or practically identical)
> 
> http://www.w3.org/2001/sw/rdb2rdf/directMapping/#rules

Rev 1.4 is guaranteed to meet everyone's needs or your money back.
  http://www.w3.org/2001/sw/rdb2rdf/directMapping/EGP#defn-row%20graph
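
For illustration only, a minimal Python sketch of a row graph along the lines
quoted above, showing one way to share a single subject across all triples
even when the table has no primary key. All names here are hypothetical. It
assumes each foreign key references the primary key of the target table, that
the table type is simply the escaped table name, and it leaves object literals
as plain strings instead of applying the literal map.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def esc(value):
    # Simplified percent-encoding; broader than the quoted escaping rule.
    return quote(str(value), safe="")


def row_id(table, columns, values):
    pairs = ",".join(f"{esc(c)}={esc(v)}" for c, v in zip(columns, values))
    return f"{esc(table)}/{pairs}"


def property_iri(table, columns):
    return f"{esc(table)}/{','.join(esc(c) for c in columns)}#"


def row_graph(table, primary_key, foreign_keys, row, fresh_bnode):
    # Compute the subject once so every triple of the row shares it,
    # including the fresh blank node used when there is no primary key.
    if primary_key:
        s = row_id(table, primary_key, [row[c] for c in primary_key])
    else:
        s = fresh_bnode()
    triples = [(s, RDF_TYPE, esc(table))]  # type triple
    fk_columns = {c for fk in foreign_keys for c in fk["columns"]}
    for col, val in row.items():
        # Scalar triples: non-NULL columns that are in no foreign key.
        if col not in fk_columns and val is not None:
            triples.append((s, property_iri(table, [col]), str(val)))
    for fk in foreign_keys:
        vals = [row[c] for c in fk["columns"]]
        # Reference triples: only when none of the key values is NULL.
        if all(v is not None for v in vals):
            target = row_id(fk["ref_table"], fk["ref_columns"], vals)
            triples.append((s, property_iri(table, fk["columns"]), target))
    return triples
```

Called with a People row {"ID": 7, "fname": "Bob", "addr": 18}, primary key
["ID"], and a foreign key from addr to Addresses(ID), this yields a type
triple, two scalar triples, and one reference triple whose object is the
relative IRI Addresses/ID=18.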


> > > I know this might not be politically correct in RDF circles, but again
> > > I'll point out this post that I found very helpful when editing R2RML:
> > > http://ln.hixie.ch/?start=1140242962&count=1
> > >
> > > Best,
> > > Richard
> >
> > --
> > -ericP
> >

-- 
-ericP
Received on Friday, 5 August 2011 18:09:36 UTC