Understanding recursive queries in PostgreSQL
Perhaps 3-4 years ago in an interview, I was asked a query to print a product hierarchy or something in the context of an e-commerce site database. I couldn’t, of course, as I barely had a handle on basic aggregations at that time.
To be fair, I still don’t know whether there’s a non-brain-twisting query that’d print the entire hierarchy of a tree-like system. However, I do know now have to do a subset of it – i.e., show all the results within a parent category, with several nested subcategories.
A typical example comes from employee data, where employees are also managers. We can represent such a situation in the table by adding another
CREATE TABLE employees ( employee_id serial PRIMARY KEY, full_name VARCHAR NOT NULL, manager_id INT );
So, yes, the top-level employees will have a
NULL or something, while all others will have a valid integer. Now, suppose we have to find all the people that are directly or indirectly managed by employee id
2. How do we do it in a single query? a
JOIN will not help us, simply because we don’t know how many times the query should run. We only know that it should start with id
2 and keep looking deeper until it reaches a dead-end in every search. Sounds a lot like recursion!
The nice thing here is that we don’t have to write a recursive query the way we write a function call (we also cannot, even if we try). SQL (and thus, PostgreSQL) has a special construct for such queries – the
RECURSIVE keyword and a well-defined structure.
It’s hard to explain without an example, so let’s see the query right away:
WITH RECURSIVE subordinates_of_2 AS ( SELECT employee_id, manager_id, full_name FROM employees WHERE employee_id = 2 UNION ALL SELECT e.employee_id, e.manager_id, e.full_name FROM employees e JOIN subordinates_of_2 s2 ON e.manager_id = s2.employee_id ) SELECT * FROM subordinates_of_2 WHERE employee_id <> 2;
WITH ... part is called a Common Table Expression (CTE), and can be thought of a name assigned to the parenthesized query that we reference in the final expression (
subordinates_of_2). Next comes the
RECURSIVE keyword, which tells the PostgreSQL engine that the expression coming next needs to be executed recursively. Recursively how?
To tell Postgres to execute the query recursively, we have to tell it three things:
- The base condition (which is always the case in recursion). This is the first part of the query that tells it to select the employee id
UNIONkeyword that precedes the recursive part basically tells Postgres to keep doing a
UNIONon all the intermediate rows generated as a result of the recursion.
- The recursive part of the query itself; in our case, it’s a
JOINthat tells SQL to match
manager_ids with the
employee_ids generated in the previous recursive calls.
Head hurts? Mine does too! As a programmer, it’s kinda hard to see how recursion gets executed in this query, but you’ll have to trust the engine on this one. The query starts by generating a list of employees with id
2, and then keeps
JOINing things until it doesn’t find any more employees who are managers. Since the final output includes the top-level manager as well, we’ve filtered it out using
WHERE employee_id <> 2.
In terms of syntax, that’s all there is to a recursive query. And yes, I admit it’s a bit of a brain-bender. But it solves a use case that’s pretty much impossible to achieve in a single query otherwise. So, let’s rejoice! 🥳