Comments on Eric's Section 2 from Richard Cyganiak on 2010-11-07 (public-rdb2rdf-wg@w3.org from November 2010)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Sun, 7 Nov 2010 12:13:38 +0800
To: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <3AB73FFE-1FF7-429C-9AB6-1E38E4281273@cyganiak.de>
All,

I'm travelling and a few days behind the latest RDB2RDF news and  
continue to be baffled by events, especially the decision by Ashok and  
Thomas to abandon work on Eric's version of the direct mapping  
document in favour of the Juan/Marcelo version.

I had a checkout of Eric's version and reviewed it while on the plane,  
which now apparently was a waste of time, but I'll share the comments  
anyway.

Having read both documents, I think that Eric's is better written,  
gets the same information across in a more concise and accurate way,  
and has just sufficient examples to make everything clear. It deals  
with corner cases that are not addressed in the /alt version.  
Altogether I think that it's superior to the /alt document. I still  
don't understand why Juan and Marcelo have forked the document in the  
first place, but seriously I don't think that their changes have led  
to a superior Section 2 -- their version simply says the same things  
in a generally harder-to-digest style in more words.

For the record: If the issues that I list below can be addressed,
along with the three from the email I sent earlier, then I
support publication of an FPWD that consists of:

- Eric's sections 1 and 2
- followed by Eric's set semantics based formal approach
- and Juan/Marcelo's datalog based formal approach
- with an issue box explaining that both of these are work-in-progress  
candidates for the formal semantics.

And that's the last thing I intend to say about the direct mapping  
thingy until the three editors have managed to present the WG with a  
single version of the document endorsed by all of them.

Best,
Richard


Comments on Eric's draft

1. Section 2.1 is IMHO unnecessary and confuses more than it helps. I  
would move its first two sentences into the Introduction, and remove  
the rest, in particular the SPARQL example. The same goes for the
SPARQL example in 2.4; I would remove it. SPARQL query evaluation is a
completely different topic and requires a ton of knowledge that is not  
essential for understanding the default mapping, so I honestly don't  
see how this helps the average reader.

2. Section 2.2: The predicate for reference triples is described as:  
“an IRI composed of the stem, table name and column name and value for  
each column in the foreign key”. Why does it say “and
value”? The object is described as: “the subject created for the
referred triple”. Do you mean “referenced row”?

3. Please provide, in the text, a rationale for the “#_” at the end of
generated IRIs. In my opinion, the suffix is an entirely unnecessary and
useless complication. I see there is an issue box for that in the
document, that's great, but if you want to have the “#_” thing in the  
FPWD then there should be text stating why it is necessary. My  
proposal for FPWD would be to s/#_//g and state in the issue box that  
this is subject to more discussion.

4. Inconsistency: Section 2.2 states that predicate IRIs have hashes,  
while all the examples have slashes.

5. You should define the terms “row IRI” or “row identifier” and  
“column IRI”, and use them throughout, instead of saying sloppy things  
like “a IRI composed of the stem, table name and column name” or “the  
subject of the referenced row”. I think this is done pretty well in  
the directGraph/alt draft.
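
As a rough illustration of what naming these terms buys, here is a minimal
sketch with one helper per term. The stem, the "/" and "#" separators, the
"column=value" key layout and the "." between key columns are all assumptions
made for the example, not the forms the draft prescribes; the "#_" suffix is
left out, as proposed in comment 3.

```python
from urllib.parse import quote

# Hypothetical stem; the real one depends on where the database is published.
STEM = "http://example.com/db/"


def table_iri(table):
    """IRI for a table: the stem plus the percent-encoded table name."""
    return STEM + quote(table, safe="")


def row_iri(table, key):
    """'Row IRI': identifies one row by its primary key columns and values.

    `key` maps primary key column names to values, in key order.
    The 'col=value' layout joined with '.' is an assumed convention.
    """
    pairs = ".".join(
        f"{quote(col, safe='')}={quote(str(val), safe='')}"
        for col, val in key.items()
    )
    return f"{table_iri(table)}/{pairs}"


def column_iri(table, column):
    """'Column IRI': identifies a column; used as the predicate of a triple."""
    return f"{table_iri(table)}#{quote(column, safe='')}"


if __name__ == "__main__":
    print(row_iri("People", {"ID": 7}))
    print(column_iri("People", "fname"))
```

With the terms defined once, the mapping rules can simply say "the row IRI of
the referenced row" or "the column IRI" instead of spelling out the
composition each time.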

6. Why a reference to [SQL99]? I thought we had agreed to use SQL Core  
2008? You can copy the reference from the R2RML draft.

7. Both “URI” and “IRI” are used. I suppose it should be “IRI”  
everywhere?

8. In order to have an improved narrative in the section titles, I  
propose splitting 2.2 into one section “Identifiers for rows and  
columns” and one section “Row mapping rules”. (Not essential for FPWD)

9. Section 2.5: “Hierarchies” can refer to many things in an SQL  
context, so it's a bit hard to figure out what the section refers to.  
The first sentence should perhaps talk about “hierarchies of tables  
that represent specializations of the same concept” or something  
similar. The People table should perhaps be removed from the example,  
because it is not relevant to the example and makes understanding the  
relevant parts of the example harder.

10. Given that the question of many-to-many table mappings is an open  
issue, there should be at least a section about it that is empty  
except for an issue box. (I have more to say on this topic, but don't  
expect that discussion to be resolved before FPWD)

11. See my comments to Juan and Marcelo asking for inclusion of table  
IRIs and of a triple that associates each row to its table. I'd really  
like to see a proposal for this in the FPWD, but at least an issue box  
would be essential. I note that the directGraph/alt version already  
has this.
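
To make the request concrete, the following is a minimal sketch of the kind of
triple meant here, written out as one N-Triples line. Using rdf:type as the
predicate, the example stem, and the shape of the row IRI are assumptions for
illustration only; neither draft is being quoted.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical stem; the real one depends on where the database is published.
STEM = "http://example.com/db/"


def row_type_triple(table, row_iri):
    """Return an N-Triples line associating a row with its table.

    The table IRI is assumed to be the stem plus the table name, and
    rdf:type is an assumed choice of predicate.
    """
    table_iri = STEM + table
    return f"<{row_iri}> <{RDF_TYPE}> <{table_iri}> ."


if __name__ == "__main__":
    print(row_type_triple("People", "http://example.com/db/People/ID=7"))
```

A triple of this kind would let a query select all rows of a table without
having to know any of its column names.
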
Received on Sunday, 7 November 2010 04:14:10 UTC