I’ve recently come to the conclusion that Common Table Expressions (CTE) in SQL are almost always undesirable. CTE’s are a powerful feature. But unless they’re recursive, CTE’s are just replacements for views. And a query that uses a view is always going to benefit from better test-ability and reduction in complexity from decomposition, in addition to providing an opportunity for code reuse.
For example let’s examine a query for customers with an average order over $1000 with a CTE and then with a view.
with order_summary as ( select orders.customer_id , customers.customer_name , count(1) number_of_orders , sum(price) total_price from sales.orders left outer join sales.customers on customers.customer_id = orders.customer_id group by orders.customer_id, customers.customer_name ) select customer_id ,customer_name from order_summary where 1000 < (total_price / number_of_orders);
With a CTE this query is 13 lines. First it aggregates, then it filters. It is not the longest I’ve seen, but it should be shorter, and thousand line SQL statements don’t make good examples. A query using the view from my post on testing views is less than a third of the length.
select customer_id ,customer_name from sales.customer_orders where 1000 < (total_price / number_of_orders);
How do you test the CTE in the first version? You can’t without altering the main query or cutting-and-pasting code, or some other equally inefficient and error-prone process.
More importantly, CTEs are a smell that you have a database design problem. Code smells are small-scale anti-patterns which on their own aren’t serious enough to be defects, but which usually are indicators that related code is defective or brittle. CTE’s are a smell that stems from a lack of understanding of encapsulation and a team that doesn’t trust each other.
Chapter 7, “Building For Maintainability”, of Guernsey‘s excellent book Test Driven Database Development discusses exposing an interface and hiding the implementation. A unit of code should have a single responsibility. The use of CTEs should be discouraged as they always have a different responsibility than the main query. Those different levels of abstraction make programming much more difficult.
The bigger problem is that frequent use of CTE’s usually indicates that the developers don’t have permission to create views. Which in turn is a reflection of a dysfunction in the team. It is a truth universally acknowledged that better software results when everyone works together and trusts each other. If your database administrator is the only team member who can create database objects, then overuse of CTE’s is likely to be the least of the problems in the resulting software.
In the first paragraph I said that CTE’s are almost always undesirable. The only use case for CTE’s is recursion. I will recommend that any recursive CTE ought to be encapsulated into a view with accompanying tests.
Don’t repeat yourself. Write views once and cover them with tests. Then write the simplest, shortest queries that you can. You may be surprised by the size the productivity and morale benefits that will result.
Leave a Reply