Recursive SQL Queries with PostgreSQL

By Martin Heinz

Martin Heinz
Photo by Ludde Lorentz on Unsplash

When working with databases, most of the time, all you need is SELECT, UPDATE (CRUD operations), few JOINs and WHERE clauses and that's about it. But, sometimes you will run into datasets and tasks, that will require you to use much more advanced features of SQL. One of them is CTE or common table expression and more specifically, Recursive CTE, which we can use to make recursive SQL queries. Let's look at how it works and what we can do with it!

Why Recursive Queries?

Self-referential data:

  • Manager -> Subordinate (employee) relationship
  • Category -> Subcategory -> Product relationship
  • Graphs — Flight (Plane Travel) map

Trees:

  • Any taxonomy system — books, animals, genetics…
  • Links between articles — for example on Wikipedia
  • Comment section — for example threads on Reddit

For any of these examples, it would require you to build temporary tables or use cursors to actually get useful data from such datasets. Now that I hopefully persuaded you that recursive queries are pretty useful, let’s see what they are and how we can use them.

What Will We Need?

This is a very simple example, but you can surely imagine how this can be used to make very complicated queries containing multiple subqueries much more readable.

Apart from the syntax above, we will also need some data that we can use for our recursive queries. For that we will use Manager — Subordinate hierarchy using one self-referencing table, defined like so:

Employee Table

Here’s SQL to create such a table including some data to play with:

So, with that out of the way, let’s build some recursive queries!

To iterate is human, to recur, divine — L. Peter Deutsch

Recursion

It looks pretty simple, but doesn’t tell us much, so let’s see a human-readable example:

And the output of that query:

In this example, our non-recursive base case is just SELECT 1 and recursive part is SELECT n+1 FROM nums WHERE n+1 <= 10. This part creates recursion by referring to name of CTE ( nums) and unioning all the results. At the end we have WHERE clause to terminate the recursion so that our query doesn't run infinitely.

Real World Examples

Getting “manager tree” for some employee:

This query can be used to generate a list of all subordinates of some manager, which essentially is subtree starting from some specific employee. As a base case here, we select one employee (manager) by ID and we also include level to indicate how many levels down we went in the tree. In the recursive part we JOIN employees table to CTE ( managers) using manager IDs. Again, we include level by incrementing level of results from the previous step of recursion. This is the output:

To better visualize what happens there, look at the following tree. The highlighted part is what is returned by query:

Another practical example using same data is computing degrees of separation between 2 employees:

The recursive CTE is quite similar to the one in the previous example. To simplify things, we only select ID and degree (instead of level). Next, we need a query that looks up and tells us degrees of separation for 2 employees - to do so, we join employees table to our previously built rec table containing IDs and degrees using IDs in respective tables. In the final SELECT part we do some formatting to get nicer output and in WHERE clause we optionally specify the other (second) employee for whom we want the degrees of separation. Output:

Again, playing with the same data, let’s find out what might be the job progression in the hypothetical company:

This time we run the recursion from the bottom up. This is achieved by flipping the ON clause, compared to previous examples. To create readable output, we use STRING_AGG function, which takes rows from above SELECT and concatenates them using > as a delimiter. This is what the query gives us:

Recursion on Graphs

Graph

As an example, we will do the obvious thing — find paths as graph. For that, we will need a PostgreSQL special — ARRAY data type. Arrays are not a common relational database feature but are very handy here for storing paths. If you are not familiar with this data type, then these are the things you need for understanding SQL below:

  • Create array — ARRAY[value_1, value_2]
  • Concatenate array — ARRAY[value_1, value_2] || ARRAY[value_3]
  • ALL function - "x" != ALL(ARRAY["a", "b", "c"]) - compares "x" to each value in array, results is true if all comparisons result in true

Alright, now for the query:

The query above first selects all the direct neighbours in the non-recursive case. Then in recursive part, we build on the base case (and during subsequent iterations on previous recursive cases) and append available edge to existing paths and replace destination with the same edge. We do this for every row where the end of the existing path ( dest) is same as the start of the selected edge and where the new destination is not yet in the path (to prevent cycles). And this is the full output:

Aside from the useful things above, you can also use recursive CTE with infinite streams of data:

The recursive table is built iteratively with each step and each step produces a finite number of rows, therefore it’s possible to use LIMIT to retrieve first n rows from infinite query!

Bonus: Recursive VIEW

Even though this is just syntax sugar, it can make your life easier when working with some of recursive queries and data.

Conclusion

Recursion, however, can make your code less readable, so use it sparingly, especially with SQL, as it is often not that easy parse and understand even without recursion.