Re: Re-opening ISSUE-22 on vendor-specific SQL

On 31 May 2011, at 15:18, David McNeil wrote:
>> Hm … the names of the columns are already implied in the rest of the TriplesMap(s) that are using the query as their logical table. I'd be concerned about repeating this information once more in a new property.
> 
> Yes, I considered this, a couple of thoughts:
> 
> * If the usage pattern is to define a SQL query as a re-usable entity then it makes more sense to me to define the columns on the query. I think of this like defining a view in the mapping that can be used in many TriplesMaps. From this perspective I like the idea of centrally defining the columns on the view rather than trying to reverse engineer it from the way it is used in various places. If the columns are included with the query definition then it is analogous to a physical table which has a name and column names.

Well, this kind of makes sense, but it seems orthogonal to ISSUE-22. My system might be able to parse column names out of a vendor-specific SQL query anyway because my parser is great. Or my system might not be able to parse SQL queries even if they are valid SQL 2008 because it treats all queries as opaque. Requiring column information *only* for vendor-specific SQL assumes a very specific implementation approach. I'm not quite comfortable with that.
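
To illustrate that column names need not come from parsing at all, here is a minimal sketch. It assumes Python's sqlite3 module standing in for the target database, and the helper name column_names is hypothetical. The query is treated as an opaque string, wrapped in a subquery that returns no rows, and the column names are read from the cursor metadata. Note that the LIMIT 0 wrapper is itself not SQL 2008 and breaks on a trailing semicolon, so a real implementation would need a per-database variant.

```python
import sqlite3

def column_names(conn, query):
    """Return the result column names of a SQL query without parsing it.

    Hypothetical helper: `query` is treated as an opaque string.
    """
    # Wrap the query so that no rows are fetched; the database still
    # resolves the result columns and exposes them as cursor metadata.
    cur = conn.execute(f"SELECT * FROM ({query}) LIMIT 0")
    return [col[0] for col in cur.description]

# Illustrative schema only; not taken from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, dept TEXT)")

print(column_names(conn, "SELECT id, name AS label FROM emp"))
```

Whether an engine does this, parses the query itself, or asks the mapping author to list the columns is an implementation choice, which is the point being made above.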

> * I am looking down the road to general SQL composition. Say two SQL Queries are joined together. We need to know which column comes from which query. Perhaps this is tangling up a future concern that is not relevant to the current spec.

Yes, I'd see it as an R2RML 1.1 issue.

>> Devil's advocate: Just try to parse it as SQL 2008; if that fails, it's vendor-specific?
>> 
>> A problem I see is that the average mapping author may not be aware of the differences between SQL 2008 and the SQL dialect they use. So they would have trouble if asked to indicate which queries are vendor specific and which are not.
>> 
> 
> Yes, I think that is an entirely valid concern. On the other hand if a SQL query fails to parse as SQL 2008 it can often just mean that it is invalid.

Maybe.

Why would you want to parse the SQL at all in your R2RML implementation? I can see two reasons. Optimizations that require analyzing the query, and validation.

For optimization, it's actually unlikely to matter whether the query is SQL 2008. Maybe your optimization engine can deal with only a subset of SQL 2008, and treats complex SQL 2008 queries as opaque. Other systems might be specific to a single RDBMS engine (e.g., Oracle's implementation, I imagine) and will want to use a full parser for that dialect.

For validation, the vendor-specific flag is somewhat useful, because it allows an engine to treat parse failures as errors instead of warnings. But again it doesn't apply to single-RDBMS implementations. And for those implementations that are not bound to a single RDBMS, there will be market pressure that drives them towards being able to validate more than just core SQL 2008. And the ultimate way of validating a SQL query is by sending it to the server …
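
A rough sketch of the two validation approaches just mentioned, again assuming Python's sqlite3 module as a stand-in for the real database. The function names parse_failure_severity and validate_on_server are hypothetical, and the flag semantics shown are only one possible reading of the proposal, not anything the spec defines.

```python
import sqlite3

def parse_failure_severity(vendor_specific):
    """Severity of a failed SQL 2008 parse under the proposed flag.

    Assumed policy: an unflagged query that fails to parse is an error;
    a query flagged as vendor-specific only produces a warning.
    """
    return "warning" if vendor_specific else "error"

def validate_on_server(conn, query):
    """Return None if the database accepts the query, else the error text."""
    try:
        # EXPLAIN makes SQLite compile the statement without running it.
        conn.execute(f"EXPLAIN {query}").fetchall()
        return None
    except sqlite3.Error as exc:
        return str(exc)

# Illustrative schema only; not taken from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT)")

print(validate_on_server(conn, "SELECT id FROM emp"))
print(validate_on_server(conn, "SELEKT id FROM emp"))
```

The second approach needs no flag and no parser, but it does need a live connection at validation time.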

So I am still unconvinced that the vendor-specific flag solves more problems than it introduces.

I still prefer the proposal that said, essentially, “you can do vendor-specific queries but then you're on your own.” [1]

Best,
Richard

[1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0179.html

Received on Tuesday, 31 May 2011 19:37:18 UTC