Pages

25 March, 2015

Readable and Fast Math in SQL

For my dissertation, I write a lot of SQL queries, doing some Math on the data. For instance, the following query computes the relative times from a numeric timestamp t, and scales the result up by 10000.

-- query 1
with Q    as (select t,v from csv_upload),
     Q_b  as (select min(t) as t_min, max(t) as t_max from Q)
select 10000 * (t - (select t_min from Q_b))
             / (select t_max - t_min from Q_b) as tr from Q

As you can see, I use CTEs to be able to read my code ;-). However, the select statements in the final subqueries, extracting scalar values from the computed relations with one record, impair the readability of the actual Math that is to be computed.

That is why modern SQL databases allow columns from parent subqueries to be used in nested child subqueries. The following query computes the same result.

-- query 2
with Q    as (select * from csv_upload),
     Q_b  as (select min(t) as t_min, max(t) as t_max from Q)
select (select 10000 * (t     - t_min)
                     / (t_max - t_min) from Q_b) as tr from Q

Finally, another, if not the best way of writing such queries is the following.

-- query 3
with Q    as (select * from csv_upload),
     Q_b  as (select min(t) as t_min, max(t) as t_max from Q)
select 10000 * (t     - t_min)
             / (t_max - t_min) as tr from Q,Q_b

Even though all three queries are very similar, and yield the same result, I saw notable differences in query execution time. In general, query 2 was a bit slower, and query 3 was a bit faster than the others.

Conclusion
For my queries, using nested columns improves readability but decreases performance. If you have computed relations with one record, such as the boundary subquery Qb, it is safe to join these relations with your data.

No comments: