Finalizing the duplicate-row-preservation proposal

Eric, Ivan, Ashok,

I've gone ahead and made a version of the R2RML ED with the proposed changes. This is of course not yet backed by a WG resolution, so it's all tentative. Diff here:
http://www.w3.org/2001/sw/rdb2rdf/r2rml/diffs/default-mapping.html

The relevant changes are in the new Section 4.4. Some wording is minimally different from the proposal linked in the thread below, to make it work in context. I also applied the clarifying change that I proposed deeper in the thread with Eric. I'm happy to discuss further improvements to the wording.
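
For anyone who wants the distinction spelled out, here is a rough Python sketch of the two behaviours for a table without a primary key. It is only an illustration, not spec text: the helper names (assign_row_nodes, to_triples), the "_:rowN" labels and the sample IOUs rows are all made up, and plain column names stand in for predicate IRIs.

```python
from itertools import count


def assign_row_nodes(rows, preserve_duplicates):
    """Pair each row of a key-less table with a blank node label.

    preserve_duplicates=True  -> a fresh blank node per row (Direct Graph behaviour)
    preserve_duplicates=False -> identical rows share one blank node (relaxed variant)
    """
    fresh = count(1)
    shared = {}  # row -> label, used only by the relaxed variant
    labelled = []
    for row in rows:
        if preserve_duplicates:
            node = f"_:row{next(fresh)}"
        else:
            if row not in shared:
                shared[row] = f"_:row{next(fresh)}"
            node = shared[row]
        labelled.append((node, row))
    return labelled


def to_triples(rows, columns, preserve_duplicates):
    """Build a set of (subject, predicate, object) triples; an RDF graph is a set."""
    return {
        (node, col, val)
        for node, row in assign_row_nodes(rows, preserve_duplicates)
        for col, val in zip(columns, row)
    }


# Hypothetical IOUs table with one duplicated row.
ious = [("Bob", 5), ("Bob", 5), ("Sue", 3)]
cols = ("fname", "owes")

preserving = to_triples(ious, cols, preserve_duplicates=True)
relaxed = to_triples(ious, cols, preserve_duplicates=False)

print(len(preserving), len(relaxed))
```

With a fresh node per row, the two identical Bob rows stay distinguishable in the graph; with a shared node they collapse into one set of triples, which is exactly the loss of cardinality under discussion.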

The proposal is also to add the following to the DM Introduction:

[[
This specification has a companion, the R2RML mapping language [R2RML], that allows the creation of customized mapping from relational data to RDF. R2RML defines a relaxed variant of the Direct Mapping intended as a default mapping for further customization.
]]
[R2RML] links to http://www.w3.org/2001/sw/rdb2rdf/r2rml/
“relaxed variant” links to http://www.w3.org/2001/sw/rdb2rdf/r2rml/#dfn-duplicate-row-preservation
“default mapping” links to http://www.w3.org/2001/sw/rdb2rdf/r2rml/#default-mappings
(final edits probably should actually link to http://www.w3.org/TR/r2rml/#...)

And strike the following sentence from the DM Introduction:

[[
The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language [R2RML].
]]

I also note that the DM editors still need to apply the changes here before we can have final documents:
http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012Apr/0070.html

Best,
Richard



On 18 May 2012, at 12:40, Ivan Herman wrote:

> Richard,
> 
> On May 18, 2012, at 13:19 , Richard Cyganiak wrote:
> 
>> Ivan,
>> 
>> On 18 May 2012, at 05:13, Ivan Herman wrote:
>>> I cannot quote Shakespeare here, but time is also of the essence.
>> 
>> “How poor are they that have not patience!”
>> 
>>> Ideally, I would love to have documents on the table on Tuesday on the basis of which the WG could vote to move to PR.
>> 
>> The proposal to define an optional non-cardinality-preserving version of the DM was opposed by Eric, Ashok and you. Everyone else could live with it.
>> 
>> Now we are exploring whether we can overcome this opposition by defining this optional version in the R2RML spec rather than in the DM spec. This requires a few extra sentences to explain why the hell the R2RML spec should talk about such matters at all. The way to do this is by introducing the notion of an “R2RML default mapping generator” into the R2RML spec. The concrete wording proposal is here:
>> 
>> http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012May/0084.html
>> 
>> Eric is ok with this approach, modulo some rewording of the proposed R2RML text that we are currently discussing. David and Ted +1'd it.
> 
> The proposal above is a reformulation of a proposal that I had put forward weeks ago, and you objected to... so it sounded obvious that I would be fine with it, modulo editorial discussions that I do not intend to be involved in. But you are right, I never explicitly +1-d it, so here you are:
> 
> +1 modulo editorial tweaks, to be agreed upon by the editors of the document.
> 
> My comments on timing were made under that assumption, i.e. that you and Eric may finalize the text by next Tuesday, and I would then be happy to vote on the advancement to PR.
> 
> Ivan
> 
>> 
>> Now the only remaining roadblock is that I don't know whether you and Ashok uphold your opposition, or whether you can live with this compromise!
>> 
>>> We missed the deadline to get a Rec for Semtech; maybe we can get a PR for Semtech, which is the second best thing. But that would need a formal WG vote ASAP...
>> 
>> Well, it would help tremendously if you could indicate whether the proposal as worded above works for you.
>> 
>> Best,
>> Richard
>> 
>> 
>>> 
>>> Ivan 
>>> 
>>> ---
>>> Ivan Herman
>>> Tel:+31 641044153
>>> http://www.ivan-herman.net
>>> 
>>> (Written on mobile, sorry for brevity and misspellings...)
>>> 
>>> 
>>> 
>>> On 18 May 2012, at 01:18, ashok malhotra <ashok.malhotra@oracle.com> wrote:
>>> 
>>>> Richard:
>>>> Since we seem to be converging on your proposal, could you send mail with the suggested words?
>>>> Eric's 3 column format is cool but I cannot cut and paste from it.
>>>> We are dealing with a corner case so we should not give too much importance to it with a large
>>>> number of words.  "Brevity" as Polonius says in Hamlet "is the soul of wit".
>>>> All the best, Ashok
>>>> 
>>>> On 5/17/2012 1:58 PM, Eric Prud'hommeaux wrote:
>>>>> just to be clear, i have every confidence that we are working towards the same design, and that you'll document it well. i am, however, happy to tool on the wording.
>>>>> 
>>>>> 
>>>>> * Richard Cyganiak<richard@cyganiak.de>  [2012-05-17 20:52+0100]
>>>>>> Eric,
>>>>>> 
>>>>>> Comments inline.
>>>>>> 
>>>>>> On 17 May 2012, at 04:53, Eric Prud'hommeaux wrote:
>>>>>>> I think I favor the explicitness of Richard's with a couple textual proposals below:
>>>>>>> 
>>>>>>> 
>>>>>>>> ---- Ivan ----                      ---- Richard ----                  ---- Ashok ----
>>>>>> Three-column side-by-side text? O_o
>>>>>> 
>>>>>>>> =DM Intro=                        =DM Intro=                         =DM Intro=
>>>>>>>> The Direct Mapping is intended    >>This specification has a         The Direct Mapping is intended
>>>>>>>> to provide a default behavior     companion, the R2RML mapping       to provide a default behavior
>>>>>>>> for R2RML: RDB to RDF Mapping     language [R2RML], that allows      for R2RML: RDB to RDF Mapping
>>>>>>>> Language [R2RML]>>for tables      the creation of customized         Language [R2RML]>>₁<<. It can
>>>>>>>> which have at least one unique    mapping from relational data       also be used to materialize
>>>>>>>> key<<. It can also be used to     to RDF. R2RML defines a            RDF graphs or define virtual
>>>>>>>> materialize RDF graphs or         relaxed variant of the Direct      graphs, which can be queried
>>>>>>>> define virtual graphs, which      Mapping intended as a default      by SPARQL or traversed by an
>>>>>>>> can be queried by SPARQL or       mapping for further                RDF graph API.
>>>>>>>> traversed by an RDF graph         customization.<<  It can also
>>>>>>>> API.                              be used to materialize RDF         >>₁ Except in the case of
>>>>>>>>                                   graphs or define virtual           tables or views without a
>>>>>>>>                                   graphs, which can be queried       primary key.  In this case,
>>>>>>>>                                   by SPARQL or traversed by an       identical rows may be kept
>>>>>>>>                                   RDF graph API.                     distinct by the DM and
>>>>>>>>                                                                      collapsed into a single row
>>>>>>>>                                                                      by R2RML<<
>>>>>>> Like Ashok, I was tempted to be explicit about what a "relaxed variant" is. As it turns out, it's identical to the DM over the unique rows.
>>>>>>> I think it might be a bit awkward, so I'm tempted to use Richard's wording directly,
>>>>>> This is just the introduction; the purpose is just to give a brief account of how the two specs relate. The imprecise phrase “relaxed variant” should be a link directly to the new section of R2RML, so anyone who wonders what it means just needs to click.
>>>>> works for me
>>>>> 
>>>>>>> but if folks think it's worth the extra noise, here's what I wrote:
>>>>>>> [[
>>>>>>> s/R2RML defines a relaxed variant of the Direct Mapping intended as a default mapping for further customization.
>>>>>>> /R2RML uses the Direct Mapping as a default mapping for further customization. For tables with no unique keys, R2RML implementations may use the Direct Mapping over only the unique rows in tables with no unique key.
>>>>>>> /
>>>>>> Yeah, this would be ok too, although it seems too much detail for the introduction.
>>>>> let's leave it out of DM.
>>>>> 
>>>>>>> The other minor mod is s/It can also/The Direct Mapping can also/ 'cause the antecedent has gotten stale by the time you get there.
>>>>>> The intention in my proposal was to move the sentence starting “It can also” before the sentence(s) that explains the R2RML relationship. Either way is ok.
>>>>>> 
>>>>>>>> are generated from column         Duplicate row preservation:
>>>>>>>> values, R2RML mappings do not     For tables without a primary
>>>>>>>> preserve repeated rows in SQL     key, the Direct Graph requires
>>>>>>>> databases.<<                      that a fresh blank node is
>>>>>>>>                                   created for each row. This
>>>>>>>>                                   ensures that duplicate rows in
>>>>>>>>                                   such tables are
>>>>>>>>                                   preserved. This requirement is
>>>>>>>>                                   relaxed for R2RML default
>>>>>>>>                                   mappings: They MAY re-use the
>>>>>>>>                                   same blank node for multiple
>>>>>>>>                                   duplicate rows. This behaviour
>>>>>>>>                                   does not preserve duplicate
>>>>>>>>                                   rows. Implementations that
>>>>>>>>                                   provide default mappings based
>>>>>>>>                                   on the Direct Graph MUST
>>>>>>>>                                   document whether they preserve
>>>>>>>>                                   duplicate rows or not.<<
>>>>>>> In order to make users' lives easier, let's add that they must be consistent about using the DM or the uniquified variant:
>>>>>>> 
>>>>>>> s/Graph MUST document whether
>>>>>>> /Graph MUST be consistent about whether or not duplicate rows are exposed in the output dataset, and document whether
>>>>>>> /
>>>>>> This seems imprecise. It says that *implementations* must be consistent. The language should make clear which of these is allowed:
>>>>>> 
>>>>>> • One implementation that supports multiple different DB engines, and generates a preserving default mapping for Oracle and a non-preserving for MySQL
>>>>>> 
>>>>>> • An implementation that has a switch where the user can choose the behaviour when invoking the default mapping generator
>>>>>> 
>>>>>> • An implementation that generates a preserving default mapping if and only if it knows that it has write access to the DB
>>>>>> 
>>>>>> • An implementation that generates a default mapping that preserves duplicate rows only in the unlikely case that a unique key (but no primary key) is present
>>>>>> 
>>>>>> • An implementation that generates a default mapping that preserves duplicate rows over base tables, but not over views
>>>>>> 
>>>>>> I think one could argue that all of these are reasonable and should be allowed, as long as it's properly documented and users know what's going on. But regardless, making the phrasing sufficiently precise to discriminate between these cases may make it too complicated to be worth it.
>>>>> How about just ruling out an implementation which preserves cardinality for some operations but treats the table as a set for others? For example, an implementation which provides a non-materialized view of a non-unique table mustn't treat the table as unique when answering queries with variable predicates ("SELECT ?s ?p ?o WHERE { ?s ?p ?o FILTER regex(?p, '^http://foo.example/db/IOUs/') }") but preserve cardinality when answering queries with fixed predicates ("SELECT ?who ?amount WHERE { ?x IOUs:fname ?who ; IOUs:owes ?amount }").
>>>>> 
>>>>> Any idea how to say that?
>>>>> 
>>>>>>> (This is a forward ref to output dataset, ugh.)
>>>>>> It wouldn't be the first one in the R2RML spec :-( One could make this Section 12 instead of 4.4 to avoid the forward ref, but I'm not sure that's better in the end.
>>>>>> 
>>>>>> Best,
>>>>>> Richard
>>>>>> 
>>>> 
>>> 
>> 
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Friday, 18 May 2012 14:37:11 UTC