comparison of no-functional-change proposals for no-primary-key issue

* ashok malhotra <ashok.malhotra@oracle.com> [2012-05-16 05:59-0700]
> Eric:
> The statement that the DM provides default behavior appears in the DM spec, so it needs to
> be addressed there.

Apologies, quite right.

> I do not think there is disagreement with your points 1 to 3 but we need a succinct statement
> that captures the situation.  I have no real quarrel with the words Richard suggested except that
> I want to say the same thing in fewer words.

I understand your intent, but I think that some extra words are useful. Below are the three proposals that I see on the table. I've laid them out side-by-side (just below) and sequentially with long lines (further below). Changes are marked with >> and <<. The first included paragraph of the R2RML Intro is the same for all proposals; Ivan's proposal appends some text to it, and I've included it for context.

---- Ivan ----                    ---- Richard ----                 ---- Ashok ----
=DM Intro=                        =DM Intro=                        =DM Intro=
The Direct Mapping is intended    >>This specification has a        The Direct Mapping is intended
to provide a default behavior     companion, the R2RML mapping      to provide a default behavior
for R2RML: RDB to RDF Mapping     language [R2RML], that allows     for R2RML: RDB to RDF Mapping
Language [R2RML] >>for tables     the creation of customized        Language [R2RML]>>₁<<. It can
which have at least one unique    mapping from relational data      also be used to materialize
key<<. It can also be used to     to RDF. R2RML defines a           RDF graphs or define virtual
materialize RDF graphs or         relaxed variant of the Direct     graphs, which can be queried
define virtual graphs, which      Mapping intended as a default     by SPARQL or traversed by an
can be queried by SPARQL or       mapping for further               RDF graph API.]]
traversed by an RDF graph         customization.<< It can also
API.]]                            be used to materialize RDF        >>₁ Except in the case of
                                  graphs or define virtual          tables or views without a
                                  graphs, which can be queried      primary key.  In this case,
                                  by SPARQL or traversed by an      identical rows may be kept
                                  RDF graph API.]]                  distinct by the DM and
                                                                    collapsed into a single row
                                                                    by R2RML<<

=R2RML Intro=                     =R2RML Intro=                     =R2RML Intro=
This specification has a          This specification has a          This specification has a
companion that defines a          companion that defines a          companion that defines a
direct mapping from relational    direct mapping from relational    direct mapping from relational
databases to RDF [DM]. In the     databases to RDF [DM]. In the     databases to RDF [DM]. In the
direct mapping of a database,     direct mapping of a database,     direct mapping of a database,
the structure of the resulting    the structure of the resulting    the structure of the resulting
RDF graph directly reflects       RDF graph directly reflects       RDF graph directly reflects
the structure of the database,    the structure of the database,    the structure of the database,
the target RDF vocabulary         the target RDF vocabulary         the target RDF vocabulary
directly reflects the names of    directly reflects the names of    directly reflects the names of
database schema elements, and     database schema elements, and     database schema elements, and
neither structure nor target      neither structure nor target      neither structure nor target
vocabulary can be                 vocabulary can be                 vocabulary can be
changed. With R2RML on the        changed. With R2RML on the        changed. With R2RML on the
other hand, a mapping author      other hand, a mapping author      other hand, a mapping author
can define highly customized      can define highly customized      can define highly customized
views over the relational         views over the relational         views over the relational
data.                             data.                             data.

>>R2RML implementations are       >>==4.4 Default Mapping==
encouraged to provide a           An R2RML processor MAY include
default mapping equivalent to     an *R2RML default mapping
the Direct Mapping for tables     generator*. This is a facility
which have at least one unique    that introspects the schema of
key. For tables with no unique    the input database and
key and which have multiple       generates a *default mapping
identical rows, the output        document* intended for further
dataset produced by the           customization by a mapping
default mapping will be           author. The R2RML mapping
equivalent to the Direct          expressed in the default
Mapping over the unique rows      mapping document SHOULD be
in that table.<<                  such that its output is the
                                  Direct Graph [DM]
=R2RML 6.1=                       corresponding to the input
>>Because rr:IRI and              database.
rr:BlankNode subject labels
are generated from column         Duplicate row preservation:
values, R2RML mappings do not     For tables without a primary
preserve repeated rows in SQL     key, the Direct Graph requires
databases.<<                      that a fresh blank node is
                                  created for each row. This
                                  ensures that duplicate rows in
                                  such tables are
                                  preserved. This requirement is
                                  relaxed for R2RML default
                                  mappings: They MAY re-use the
                                  same blank node for multiple
                                  duplicate rows. This behaviour
                                  does not preserve duplicate
                                  rows. Implementations that
                                  provide default mappings based
                                  on the Direct Graph MUST
                                  document whether they preserve
                                  duplicate rows or not.<<

Again, in linear format with long lines:

---- Ivan ----
=DM Intro=
The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language [R2RML] >>for tables which have at least one unique key<<. It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.]]

=R2RML Intro=
This specification has a companion that defines a direct mapping from relational databases to RDF [DM]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.

>>R2RML implementations are encouraged to provide a default mapping equivalent to the Direct Mapping for tables which have at least one unique key. For tables with no unique key and which have multiple identical rows, the output dataset produced by the default mapping will be equivalent to the Direct Mapping over the unique rows in that table.<<

=R2RML 6.1=
>>Because rr:IRI and rr:BlankNode subject labels are generated from column values, R2RML mappings do not preserve repeated rows in SQL databases.<<


---- Richard ----
=DM Intro=
>>This specification has a companion, the R2RML mapping language [R2RML], that allows the creation of customized mapping from relational data to RDF. R2RML defines a relaxed variant of the Direct Mapping intended as a default mapping for further customization.<< It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.]]

=R2RML Intro=
This specification has a companion that defines a direct mapping from relational databases to RDF [DM]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.

>>==4.4 Default Mapping==
An R2RML processor MAY include an *R2RML default mapping generator*. This is a facility that introspects the schema of the input database and generates a *default mapping document* intended for further customization by a mapping author. The R2RML mapping expressed in the default mapping document SHOULD be such that its output is the Direct Graph [DM] corresponding to the input database.

Duplicate row preservation: For tables without a primary key, the Direct Graph requires that a fresh blank node is created for each row. This ensures that duplicate rows in such tables are preserved. This requirement is relaxed for R2RML default mappings: They MAY re-use the same blank node for multiple duplicate rows. This behaviour does not preserve duplicate rows. Implementations that provide default mappings based on the Direct Graph MUST document whether they preserve duplicate rows or not.<<


---- Ashok ----
=DM Intro=
The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language [R2RML]>>₁<<. It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.]]
>>₁ Except in the case of tables or views without a primary key.  In this case, identical rows may be kept distinct by the DM and collapsed into a single row by R2RML<<

=R2RML Intro=
This specification has a companion that defines a direct mapping from relational databases to RDF [DM]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.
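
For anyone who wants to see the cardinality difference that all three proposals are trying to describe, here is a rough Python sketch. It is not taken from any of the proposals or from either spec: the IOUs rows, the helper names, and the way the blank node labels are built from column values are illustrative assumptions, loosely following the IOUs example in the SPARQL queries quoted further below. It only contrasts "one fresh blank node per row" with "subject label derived from column values" for a table with no primary key.

```python
# Hypothetical IOUs table with no primary key; the first two rows are identical.
ROWS = [
    {"fname": "Bob", "amount": 5},
    {"fname": "Bob", "amount": 5},
    {"fname": "Sue", "amount": 7},
]

def to_triples(rows, subject_for):
    """Return a set of (subject, predicate, object) triples; an RDF graph is a set."""
    triples = set()
    for index, row in enumerate(rows):
        subject = subject_for(index, row)
        for column, value in row.items():
            triples.add((subject, f"IOUs#{column}", value))
    return triples

def fresh_blank_node(index, row):
    # Direct Mapping style (assumed labelling): a new blank node for every row.
    return f"_:row{index}"

def label_from_values(index, row):
    # R2RML style (assumed labelling): the label is derived from the column
    # values, so identical rows end up with the same subject.
    return "_:" + "_".join(str(row[column]) for column in sorted(row))

def total_owed(triples):
    """Sum the amounts per fname, similar in spirit to the GROUP BY query."""
    by_subject = {}
    for subject, predicate, obj in triples:
        by_subject.setdefault(subject, {})[predicate] = obj
    totals = {}
    for properties in by_subject.values():
        name = properties["IOUs#fname"]
        totals[name] = totals.get(name, 0) + properties["IOUs#amount"]
    return totals

dm_graph = to_triples(ROWS, fresh_blank_node)
r2rml_graph = to_triples(ROWS, label_from_values)

print(len(dm_graph), total_owed(dm_graph))
print(len(r2rml_graph), total_owed(r2rml_graph))
```

With these rows, the fresh-node graph keeps both of Bob's identical IOUs (six triples, Bob owes 10), while the value-derived labels collapse them into one subject (four triples, Bob owes 5). That lost row is the information loss users would need to be warned about.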


> Feel free to suggest text.
> All the best, Ashok
> 
> On 5/15/2012 8:23 PM, Eric Prud'hommeaux wrote:
> >* ashok malhotra<ashok.malhotra@oracle.com>  [2012-05-15 15:19-0700]
> >>I think we just need to fix the DM.  If you disagree, please indicate what else needs to be said.
> >But what exactly is broken in the DM?
> >
> >That's a somewhat glib question, but the point I made during the call today (which I thought actually caught some momentum) was this:
> >   1 The DM is able to preserve cardinality over tables with potentially repeated rows.
> >   2 R2RML is not able to preserve cardinality over tables with potentially repeated rows while staying within pure SQL (that is, you may be able to use e.g. rownums or assignable variables in different flavors of SQL, but in the SQL that we're targeting, the required behavior exceeds the expressivity of SQL).
> >   3 For every situation where an R2RML processor would be unable to produce a DM as a default behavior (that is, those where the DM preserved cardinality and R2RML does not), the users need to be warned that, because they have potentially repeated rows in non-unique tables, the R2RML representation will lose some of the information in the database.
> >   4 Future versions of R2RML will likely address this issue, enabling a generic R2RML processor to capture all of the information in repeated rows, and therefore to use the DM for these cases.
> >
> >This points to following Ivan's proposal<http://www.w3.org/mid/FD9565BB-380D-474B-9453-60C7CAF6072E@w3.org>  (add caveat text about when the DM is not the default or non-configured R2RML behavior). Adding text to the R2RML text in Ivan's proposal would help users understand the issues and the outcome. The current text is point 2 in Ivan's mail:
> >[[
> >2. Add to the R2RML document (probably in the intro part): "R2RML implementations are encouraged to provide a default mapping equivalent to the Direct Mapping for tables which have at least one unique key"
> >]]
> >Adding this text would specify the behavior when there is no unique key:
> >[[
> >For tables with no unique key and which have multiple identical rows, the output dataset produced by the default mapping will be equivalent to the Direct Mapping over the unique rows in that table.
> >]]
> >
> >It's possible that we'll want to s/mapping will be equivalent/mapping MAY be equivalent/ because the simple mapping for SPARQL queries analogous to conventional SQL queries, e.g.
> >   SELECT ?who ?owes { ?debt<IOUs#fname>  ?who ;<IOUs#amount>  ?owes }
> >or
> >   SELECT ?fname (SUM(?owes) AS ?payupnow) { ?debt<IOUs#fname>  ?fname ;<IOUs#amount>  ?owes } GROUP BY ?fname
> >would preserve cardinality unless one specifically invoked a subselect which grouped by all of the unique columns. (This consistency problem will arise in R2RML regardless of whether DM is relaxed to potentially lose cardinality.)
> >
> >
> >>The DM spec says:
> >>[[The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language<http://www.w3.org/TR/2012/CR-r2rml-20120223/>  [R2RML]<http://www.w3.org/TR/rdb-direct-mapping/#R2RML>. It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.]]
> >>
> >>Add an asterisk after the first sentence and a footnote.  The footnote says:
> >>[[Except in the case of tables or views without a primary key.  In this case, identical rows may be kept distinct
> >>by the DM and collapsed into a single row by R2RML]]
> >>
> >>R2RML says:
> >>[[This specification has a companion that defines a direct mapping from relational databases to RDF<http://www.w3.org/TR/rdb-direct-mapping/>  [DM<http://www.w3.org/TR/r2rml/#DM>]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.]]
> >>
> >>No change needed.
> >>-- 
> >>All the best, Ashok

-- 
-ericP

Received on Wednesday, 16 May 2012 20:22:06 UTC