PotentialSQLIssues

From RDB2RDF
Revision as of 17:24, 2 February 2010 by Eric (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

RDB2RDF HomeGenerated SQL

This page documents known performance killers in various DBs. Editor's, please include examples, and help flush out the very serious template below.

Queries on Sundays

database: Oracle description: queries performed on Sundays take until Monday to complete.

reason: Sunday is the optimizer's day off.

example:

 asdf
 jkl;

Trivial subquery on MySQL

This is a simple table with two columns, ID (int PK autoincrement) and value (varchar). It is filled with 10M random rows.

mysql> select * from t where id=1234567;
+---------+-----------------------------------------+
| id      | value                                   |
+---------+-----------------------------------------+
| 1234567 | rkyyxsmtygevumfhforzhnrobcntpingsmeqerk | 
+---------+-----------------------------------------+
1 row in set (0.04 sec)

mysql> select * from (select * from t) as t2 where id=1234567;
+---------+-----------------------------------------+
| id      | value                                   |
+---------+-----------------------------------------+
| 1234567 | rkyyxsmtygevumfhforzhnrobcntpingsmeqerk | 
+---------+-----------------------------------------+
1 row in set (2 min 14.99 sec)

The second query can be trivially optimised and executed in the same way as the first, but is slower by two orders of magnitude.

Self-joins are not optimised on MySQL

Same table as in previous case.

The first query has ten aliases of the table, which are then all joined to each other on the primary key, and a few hundred rows are selected (with LIKE). A value from each aliased table is returned. The main point is the high number of self-joins, which is typical when translating large SPARQL queries to SQL in the trivial way.

Because the joins are all on the PK, the joins can be trivially optimised away. The second query, which is obtained by optimising the first, returns exactly the same results, and runs three orders of magnitude faster.

mysql> select t1.value, t2.value, t3.value, t4.value, t5.value, t6.value, t7.value, t8.value, t9.value, t10.value
  from t as t1, t as t2, t as t3, t as t4, t as t5, t as t6, t as t7, t as t8, t as t9, t as t10
  where t1.id=t2.id and t1.id=t3.id and t1.id=t4.id and t1.id=t5.id and t1.id=t6.id and t1.id=t7.id and t1.id=t8.id and t1.id=t9.id and t1.id=t10.id and t10.value like 'nnn%';
581 rows in set (17.12 sec)

mysql> select value, value, value, value, value, value, value, value, value, value from t where value like 'nnn%';
581 rows in set (0.04 sec)