Re: feedback on the current Use case & Requirements for RDB2RDF...

Hi Ahmed, thanks for the input. I'm trying to map these to specific textual suggestions, which are the most likely to capture the details needed for a decision by the working group.

* Ezzat, Ahmed <Ahmed.Ezzat@hp.com> [2010-05-09 05:26+0000]
> 
> Hi,
> 
> Thanks for putting the effort in generating this document.  Below is my feedback:
> 
> 1.      The use case section of the document looks fine.
> 
> 2.      The requirement section (3) needs work.  Below are some comments:
> *       New uncommon terminologies used which makes it difficult to read: putative, isomorphic, etc. Is isomorphic mapping is a requirement in our R2RML? People use today local + domain mapping and it works (uni-direction)?  Is isomorphic requirements comes from future looking and expect the same mapping to be bi-directional (read/write to the DBMS) and hence isomorphic mapping makes sense?  We need to discuss this one.

The generation of a direct graph would appear to be a requirement as it was the effect of the minimum D2R configuration.


> *       I thought a simple diagram like option-2 and option-3 in the diagram Juan sent out earlier seems reasonable to add.  P.S. option-1 is a special case of option-3.

LeeF had asked some questions teasing out the meaning of the diagrams. I was interested in the outcome of that conversation.


> *       In the relevant community people use local and domain ontology; why we are not using these terms to make it easier for readers?

#SHAPE refers to "shared" and "popular" ontologies, which I believe encompass domain ontologies as well as ontologies not focused on a specific domain (e.g. the use of FOAF to represent people in an employees table).


> *       Why we are using graphs and labels terms vs RDF tuples and identifiers terms?

What's an RDF tuple?
As for identifier vs. label, #LABELGEN has a description in terms of identifiers:
[[
LABELGEN - Label Generation
RDF identifiers for objects in the conceptual model can, in some cases, be generated from a transformation of the schema and data in a tuple representing that conceptual model.
]]

I tried "Identifier Generation" instead of "Label Generation", but I wouldn't know how to defend against the argument "you're not generating identifiers; you're using identifiers to generate labels." I thought that using the graph theory term was safer.
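
To illustrate what #LABELGEN describes, here is a minimal sketch in Python that derives a label for a row from the schema (table and key column names) and the data (key values). The function name, the base IRI, the "Employees" table and the "column=value" layout are all my own assumptions for illustration; nothing here is proposed syntax for R2RML.

```python
from urllib.parse import quote

def generate_label(base, table, key_columns, row):
    """Build an IRI for one row from the table name and its key values.

    Hypothetical helper: the IRI layout is an assumption, not from any spec.
    """
    # Percent-encode every component so that '/', ';' and '=' inside names
    # or values cannot be confused with the separators added here.
    parts = [
        f"{quote(col, safe='')}={quote(str(row[col]), safe='')}"
        for col in key_columns
    ]
    return f"{base}{quote(table, safe='')}/{';'.join(parts)}"

# Hypothetical row from an "Employees" table keyed on "id".
row = {"id": 7, "name": "Bob Smith", "dept": 23}
label = generate_label("http://example.com/db/", "Employees", ["id"], row)
print(label)
```

The same tuple always yields the same label, which is the property the requirement seems to be after.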


> *       Section 3.1.4, I am not clear what we are trying to say regarding database connection?  RDBMS has its own notion and I suspect in SPARQL there is well defined notion of end point.  Is the mapping language is involved in mapping RDBMS connections?

D2R, for example, has connection information like:
[[
map:MyDatabase a d2rq:Database;
 d2rq:jdbcDSN "jdbc:mysql://localhost/mydb";
 d2rq:jdbcDriver "com.mysql.jdbc.Driver";
 d2rq:username "user";
 d2rq:password "password".
]]
as does FeDeRate:
  http://swobjects.svn.sourceforge.net/viewvc/swobjects/trunk/tests/7tm_receptors/flat/receptors.map

One could argue that this is a combination of a map and connection information, and that we should define only the mapping language, but there is also a user benefit to being able to swap implementations and reuse the same configuration information. Perhaps cygri would refer to this as "standardizing httpd.conf". I'm ambivalent.
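
To make the map vs. connection distinction concrete, here is a rough Python sketch that partitions a configuration into the two parts. The four connection predicates are the ones from the D2R snippet above; the triples-as-tuples representation, the function name and the map:Employee class map are assumptions for illustration only.

```python
# Predicates carrying connection details in the D2R snippet above.
CONNECTION_PREDICATES = {
    "d2rq:jdbcDSN",
    "d2rq:jdbcDriver",
    "d2rq:username",
    "d2rq:password",
}

def split_config(triples):
    """Partition (subject, predicate, object) triples by predicate.

    Returns (connection, mapping). Anything that is not one of the
    connection predicates, including type declarations, counts as mapping.
    """
    connection = [t for t in triples if t[1] in CONNECTION_PREDICATES]
    mapping = [t for t in triples if t[1] not in CONNECTION_PREDICATES]
    return connection, mapping

triples = [
    ("map:MyDatabase", "rdf:type", "d2rq:Database"),
    ("map:MyDatabase", "d2rq:jdbcDSN", "jdbc:mysql://localhost/mydb"),
    ("map:MyDatabase", "d2rq:jdbcDriver", "com.mysql.jdbc.Driver"),
    ("map:MyDatabase", "d2rq:username", "user"),
    ("map:MyDatabase", "d2rq:password", "password"),
    # Hypothetical class map, added only so the mapping part is non-trivial.
    ("map:Employee", "rdf:type", "d2rq:ClassMap"),
    ("map:Employee", "d2rq:dataStorage", "map:MyDatabase"),
]

connection, mapping = split_config(triples)
print(len(connection), "connection triples,", len(mapping), "mapping triples")
```

If the language covered only the second list, each implementation would need its own way to supply the first.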


> *       Section 3.1.5 (MicroParsing): I assume the editor is referring to using RDBMS UDF for transformation processing.  This seems an implementation detail and should not be included in mapping language specification or requirement?

I take "user defined output processing functions" to mean user-defined functions within R2RML (as opposed to SQL). (This came from the original use cases wiki so I'm doing some interpretation here.) I've added:
[[
Microparsing can come in many flavors. The RDB2RDF Working Group would like feedback as to which, if any, functions are needed in version 1 and what expressivity should be available to user-defined functions within R2RML.
]]
to solicit community feedback.
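
As one possible flavor, and assuming micro-parsing means pulling structure out of a single column value, a user-defined function might look like the Python sketch below. The "Family, Given" column format, the function name and the output keys are hypothetical; how such a function would be declared in R2RML is exactly the open question.

```python
import re

def split_name(value):
    """Hypothetical user-defined function: split 'Family, Given' in two.

    Returns a dict with both parts, or None when the value has no comma
    (or nothing on one side of it) and so does not fit the assumed format.
    """
    match = re.fullmatch(r"\s*([^,]+?)\s*,\s*(.+?)\s*", value)
    if match is None:
        return None
    return {"familyName": match.group(1), "givenName": match.group(2)}

print(split_name("Smith, Bob"))
```

Each part could then be emitted as the object of its own triple.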


> *       Section 3.1.6 (TableParsing): I am not clear on this one - did it mean table mapped UDF? It looks like implementation detail and should not be part of mapping language requirements?
> *       Section 3.1.7 (NamedGraph): I am not clear? Do we mean a query can return multiple graphs similar t JDBC returning multiple result sets?  Needs clarification, but look this section after clarification needs to move to Section 3.2 as non-core requirements.

The SPARQL query language expresses constraints that graph patterns come from particular named graphs¹. This would seem like a core aspect of the expressivity.
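
For readers less familiar with this part of SPARQL, here is a toy Python model of the idea: a dataset keyed by graph name, and a lookup that only considers triples in the one graph asked for, roughly what a GRAPH clause does. The graph names, the data and the function are made up for illustration.

```python
# A toy dataset: graph name -> set of (subject, predicate, object) triples.
dataset = {
    "http://example.com/graphs/hr": {("ex:bob", "foaf:name", "Bob")},
    "http://example.com/graphs/sales": {("ex:bob", "ex:region", "West")},
}

def match_in_graph(dataset, graph, pattern):
    """Return the triples in one named graph that match the pattern.

    A None in the pattern is a wildcard, standing in for a variable.
    """
    triples = dataset.get(graph, set())
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

# Only the "hr" graph is consulted, so the "sales" triple is never seen.
hits = match_in_graph(dataset, "http://example.com/graphs/hr", ("ex:bob", None, None))
print(hits)
```

A mapping that cannot say which graph a triple lands in cannot answer such queries, which is why I read this as core.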


¹http://www.w3.org/TR/rdf-sparql-query/#queryDataset


> 3.      I suggest replacing current hybrid list of editors and authors with two lists:
> *       Authors: this should refer to all RDB2RDF group members listed in alphabetic order
> *       Editors: Eric and Michael

Happy to do this but I'll wait for someone to second as this is non-editorial.


> 4.      Finally, in few hours I am traveling in a  business trip and I will be back to the Bay Area late Wed. evening. I will miss this Tuesday meeting (regrets).  If Eric or Michale can handle this session - thanks.  I suggest the team to discuss input from all including the above points.  Hope Editors would capture/address the above issues and others, and generate a new version for final review.  Let us not rush and go out week or so earlier.
> 
> I will be very busy in this trip and will not be able to respond to emails next week at least a day or so after I come back.
> Regards,
> 
> Ahmed
> 
> 
> Ahmed K. Ezzat, Ph.D.
> HP Fellow, Strategic Innovation Architecture Manager,
> Business Intelligence Software Division
> Hewlett-Packard Corporation
> 11000 Wolf Road, Bldg 42 Upper, MS 4502, Cupertino, CA 95014-0691
> Office:   Email: Ahmed.Ezzat@hp.com  Off: 408-447-6380  Fax: 1408796-5427
> Personal: Email: AhmedEzzat@aol.com  Tel: 408-253-5062  Fax: 408-253-6271
-- 
-ericP

Received on Sunday, 9 May 2010 17:56:13 UTC