When working with databases, most of the time, all you need is
UPDATE (CRUD operations), few
WHERE clauses and that's about it. But, sometimes you will run into datasets and tasks, that will require you to use much more advanced features of SQL. One of them is CTE or common table expression and more specifically, Recursive CTE, which we can use to make recursive SQL queries. Let's look at how it works and what we can do with it!
Why Recursive Queries?
First of all, why is this actually even a thing? What are recursive queries good for? — In general recursive queries come in handy when working with self-referential data or graph/tree-like data structures. Just a few examples of these use cases are:
- Manager -> Subordinate (employee) relationship
- Category -> Subcategory -> Product relationship
- Graphs — Flight (Plane Travel) map
- Any taxonomy system — books, animals, genetics…
- Links between articles — for example on Wikipedia
- Comment section — for example threads on Reddit
For any of these examples, it would require you to build temporary tables or use cursors to actually get useful data from such datasets. Now that I hopefully persuaded you that recursive queries are pretty useful, let’s see what they are and how we can use them.
What Will We Need?
Recursive queries leverage something called CTE — Common Table expression, which is SQL
WITH clause, more commonly used for simplifying very complex queries (subqueries). Let's look at an example of that:
This is a very simple example, but you can surely imagine how this can be used to make very complicated queries containing multiple subqueries much more readable.
Apart from the syntax above, we will also need some data that we can use for our recursive queries. For that we will use Manager — Subordinate hierarchy using one self-referencing table, defined like so:
Here’s SQL to create such a table including some data to play with:
So, with that out of the way, let’s build some recursive queries!
To iterate is human, to recur, divine — L. Peter Deutsch
Let’s start with a formal structure of recursive SQL query:
It looks pretty simple, but doesn’t tell us much, so let’s see a human-readable example:
And the output of that query:
In this example, our non-recursive base case is just
SELECT 1 and recursive part is
SELECT n+1 FROM nums WHERE n+1 <= 10. This part creates recursion by referring to name of CTE (
nums) and unioning all the results. At the end we have
WHERE clause to terminate the recursion so that our query doesn't run infinitely.
Real World Examples
The previous section showed a very basic example that explains how it works but doesn’t really show how the recursive CTE can help us. So, let’s go through some examples using Manager — Subordinate hierarchy data from above:
Getting “manager tree” for some employee:
This query can be used to generate a list of all subordinates of some manager, which essentially is subtree starting from some specific employee. As a base case here, we select one employee (manager) by ID and we also include
level to indicate how many levels down we went in the tree. In the recursive part we
JOIN employees table to CTE (
managers) using manager IDs. Again, we include
level by incrementing level of results from the previous step of recursion. This is the output:
To better visualize what happens there, look at the following tree. The highlighted part is what is returned by query:
Another practical example using same data is computing degrees of separation between 2 employees:
The recursive CTE is quite similar to the one in the previous example. To simplify things, we only select ID and
degree (instead of level). Next, we need a query that looks up and tells us degrees of separation for 2 employees - to do so, we join
employees table to our previously built
rec table containing IDs and
degrees using IDs in respective tables. In the final
SELECT part we do some formatting to get nicer output and in
WHERE clause we optionally specify the other (second) employee for whom we want the degrees of separation. Output:
Again, playing with the same data, let’s find out what might be the job progression in the hypothetical company:
This time we run the recursion from the bottom up. This is achieved by flipping the
ON clause, compared to previous examples. To create readable output, we use
STRING_AGG function, which takes rows from above
SELECT and concatenates them using
> as a delimiter. This is what the query gives us:
Recursion on Graphs
Enough of that employee data set, let’s explore what we can to with graphs — graphs as a data structure are an ideal candidate for traversing using recursion. So, let’s first define a simple table and insert some data, so we have something to play with:
As an example, we will do the obvious thing — find paths as graph. For that, we will need a PostgreSQL special —
ARRAY data type. Arrays are not a common relational database feature but are very handy here for storing paths. If you are not familiar with this data type, then these are the things you need for understanding SQL below:
- Create array —
- Concatenate array —
ARRAY[value_1, value_2] || ARRAY[value_3]
"x" != ALL(ARRAY["a", "b", "c"])- compares
"x"to each value in array, results is
trueif all comparisons result in
Alright, now for the query:
The query above first selects all the direct neighbours in the non-recursive case. Then in recursive part, we build on the base case (and during subsequent iterations on previous recursive cases) and append available edge to existing paths and replace destination with the same edge. We do this for every row where the end of the existing path (
dest) is same as the start of the selected edge and where the new destination is not yet in the path (to prevent cycles). And this is the full output:
Aside from the useful things above, you can also use recursive CTE with infinite streams of data:
The recursive table is built iteratively with each step and each step produces a finite number of rows, therefore it’s possible to use
LIMIT to retrieve first
n rows from infinite query!
Bonus: Recursive VIEW
As a quick little bonus, I want to also show, that it is possible (as of PostgreSQL 9.3) to even create recursive views:
Even though this is just syntax sugar, it can make your life easier when working with some of recursive queries and data.
Recursion is a very powerful tool, that can solve some problems in a very elegant way. Using it with SQL is not so common though, so hopefully, this article showed you some ways for how to leverage it, in case you run into data that can only be processed recursively.
Recursion, however, can make your code less readable, so use it sparingly, especially with SQL, as it is often not that easy parse and understand even without recursion.