Comments on RDB2RDF Use Cases and Requirements for Mapping RDB to RDF, Ahmed Ezzat
Version reviewed: Apr 20, 2010 Revised 2010/05/11 17:21:15
I am skipping a few aspects/sections of the document that I think are fine for a DRAFT; instead I will focus on issues that should be addressed in the DRAFT. Below are some general observations:
- Motivation (Section 1.2):
· Having a standard RDB2RDF mapping would allow viewing the RDBMS data as RDF independent of the physical database hosting the data; it is like viewing a web page using different browsers.
· Having a standard mapping would simplify programming applications that access multiple database sources.
- Use Cases (Section 2): These seem reasonable. They could be improved by highlighting two scenarios:
· Automatic transformation of RDBMS data into RDF, either to apply SPARQL or to do reasoning on the RDF data. P.S. It is not clear why one would not use SQL directly on the RDBMS data and run analytics on the database data.
· Data Integration: this includes database federation scenarios and the integration of structured and unstructured data. I personally think that is where RDF should shine!
- Requirement Section (Section 3): This is where I have difficulties/issues:
· General: we do not want to go out with a feature and then retract it.
· This section is full of new, non-standard terminology and in a few instances is very confusing; examples are below.
· Functional Requirements (Sections 3.1.1 – 3.1.2): There is no smooth flow from the use cases to these requirements. I think the requirements can be:
a) Automatic Mapping, where the mapping language transforms the relevant database data to RDF so that the customer can apply SPARQL queries or do reasoning on the generated RDF, i.e., use case 2.
b) Mapping the database data to a given application domain. This scenario is needed for data integration applications, i.e., use cases 1, 3, 4.
P.S. The above are requirements. Specifying one-step or two-step mapping is not a requirement; it is an implementation detail.
· SQLGEN (Section 3.1.3): Translating SPARQL to SQL is not part of the charter and is well beyond the time frame we have. Justifying this requirement by the lack of a domain ontology (use case 4) is a stretch. First, there are tools for ontology generation. Second, why not use the automatic mapping approach? After all, you do not have a domain ontology! We need to discuss this requirement!
· LABELGEN (Section 3.1.4): How about calling this requirement "Generating globally unique identifiers" (use case 3)?
· SQL Data Types Supported (Section 3.1.5): we need to specify which SQL data types our mapping language will not support, or will not support initially. In addition, I see the draft citing only SQL-92 (ISO 9075). What about SQL-99 (ISO/IEC 9075-1:1999, 9075-2:1999, 9075-3:1999, and 9075-4:1999), which supports non-scalar data types, as well as SQL-2003, etc.?
· Connection (Section 3.1.6): I am not sure it is relevant – delete this one.
· Microparsing (Section 3.1.7): Too complicated, and I am not clear what it is. I am guessing we need to say: Custom Mapping to Unique Database Vendor Specific Data Types. In a sense this is comparable to most RPC systems allowing application programmers to provide their own serialization routines for scenarios like circular data structures. If the intention is what I think it is, this is an important requirement. Otherwise, we should add what I described as an additional requirement, and this section needs an explanation of what it means.
· Table Parsing (Section 3.1.8): It is not clear what an attribute-value table is. SQL tables are sets of attributes, and each tuple has a value for each attribute. Can you elaborate? If my recollection is accurate, there was a reference to columns that are strings. If that is the intention, it is not clear how you would translate free text to RDF. This is a significant piece of work in the unstructured world: typically you need either rules or a large corpus of data to learn from, and then you do entity extraction followed by semantic relationship discovery! If it really means translating free text into RDF, I suggest it be moved to the non-essential requirements or removed from the document.
· Named Graph (Section 3.1.9): I am not following the meaning of “creation of multiple named graphs within one mapping definition”.
· Non-Functional requirements (Section 3.2):
a) Metadata: I am not following what licensing information for accessing a database means. Typically access requires a user/password or an OS account. Can you elaborate?
b) Updates – update logs: I am not following. RDB2RML is a mapping language for data types, so what are the logs needed for? Is it to record the fact that a mapping happened? Can you elaborate?
· Application requirements (Section 3.3): I am not following what this section is meant to say; you might want to delete it.
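
To make requirement (a) under Functional Requirements and the identifier suggestion under LABELGEN more concrete, here is a minimal sketch of what I mean by automatic mapping. It is only an illustration under my own assumptions, not something taken from the draft: the base IRI, the emp table, and the helper names row_iri and table_to_triples are hypothetical, and Python with an in-memory SQLite database is used purely for convenience.

```python
import sqlite3
from urllib.parse import quote

# Hypothetical base IRI chosen for this illustration; not from the draft.
BASE = "http://example.org/db/"


def row_iri(table, pk_value):
    """Build a globally unique identifier for a row from table name and key."""
    return f"{BASE}{quote(table)}/{quote(str(pk_value))}"


def table_to_triples(conn, table, pk):
    """Turn every row of `table` into (subject, predicate, object) triples.

    The subject is derived from the primary key, each column becomes a
    predicate, and NULL values produce no triple.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    cols = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(cols, row))
        subject = row_iri(table, record[pk])
        for col, value in record.items():
            if value is None:
                continue  # a NULL is simply absent from the RDF view
            predicate = f"{BASE}{quote(table)}#{quote(col)}"
            triples.append((subject, predicate, value))
    return triples


# Tiny in-memory database standing in for the RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO emp VALUES (7, 'Ann', NULL)")

triples = table_to_triples(conn, "emp", "id")
for t in triples:
    print(t)
```

Nothing here depends on a domain ontology: the identifiers and predicates come from the schema alone, which is the point of requirement (a). Mapping to an application domain, requirement (b), would replace the generated predicates with terms from that domain's vocabulary.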