This post is based on half of a talk I gave at !!con. If you want to watch the talk, you can find it here.
One not too well known aspect of SQL is that it’s Turing complete. This means any program you can write in a general purpose programming language, you can also write in SQL. In this post, I’ll show you how you can use some of the dark corners of SQL in order to generate fractals. We’ll ultimately wind up with a set of queries which will allow us to generate an entire class of fractals known as the “escape-time fractals”.
Common Table Expressions
Before we begin, I’m going to go over one very important SQL feature that I’m going to use a ton throughout this post. That feature is known as CTEs (common table expressions). CTEs effectively give you a way to define what are effectively variables in SQL. They are really nice for large SQL queries because they let you break down the query into smaller pieces.
To define a CTE, you use the WITH clause. Here’s a simple example of a CTE that counts how many numbers there are from 5 to 10
WITH numbers AS ( SELECT i FROM generate_series(5, 10) g(i) ) SELECT COUNT(*) FROM numbers;
To execute this query Postgres first runs the query
SELECT i FROM generate_series(5, 10) g(i)
After running that query Postgres aliases the results of that query as numbers. That means from then on, you use use numbers as if it were a table. The numbers “table” has in it the results of the previous query. This is just like we are saving the results of the generate_series query into the variable numbers. As I said before, this is really nice because it gives you a way to pull out portions of a large query into separate components.
Now that we’ve gone over CTEs, we can get to the meat of the post, generating fractals.
The Sierpinski Triangle
Let’s start with a relatively simple fractal, the Sierpinski triangle:
There are many different ways to generate the Sierpinski triangle. One that’s especially convenient is based on bitwise arithmetic. If you take a grid and fill in all the cells where the bitwise AND of the row number and column number is zero, you get the Sierpinski triangle. It’s not intuitive, but here’s a few examples demonstrating that this approach works:
2x2 1|* 0|** -- 01 4x4 3|* 2|** 1|* * 0|**** ____ 0123 8x8 7|* 6|** 5|* * 4|**** 3|* * 2|** ** 1|* * * * 0|******** -------- 01234567
We can see that as we increase the size of the grid, we get a better resolution version of the Sierpinski triangle.
To execute this algorithm in SQL, we can write a three part SQL query:
- First, we use the generate_series function and CROSS JOIN to produce one row per cell in the grid.
- Next, we use CASE WHEN (analogous to if-else in any other programming language) to assign a string to each row based on the bitwise AND of the row number and column number.
- Finally, we concatenate all the cells together using the string_agg function.
The first part is relatively straightforward. We use two calls to generate_series, one to generate the row numbers and a second to generate the column numbers. We can then use a CROSS JOIN to generate every pair of row number and column number. This gives us our grid:
WITH cells AS ( SELECT r, c FROM generate_series(0, 63) rows(r) CROSS JOIN generate_series(0, 63) cols(c) ) SELECT * FROM cells; r | c ----+---- 0 | 0 0 | 1 0 | 2 ... | ... 0 | 62 0 | 63 1 | 0 1 | 1 ... | ...
The next part is also straightforward. For each cell we assign a string based on the bitwise AND of the row number and column number. Let’s assign ‘**’ to cells that are part of the triangle and ‘ ‘ to cells that are not. In SQL this looks something like this:
WITH cells AS ( SELECT r, c FROM generate_series(0, 63) rows(r) CROSS JOIN generate_series(0, 63) cols(c) ), marked_points AS ( SELECT r, c, CASE WHEN r & c = 0 THEN '**' ELSE ' ' END AS marker FROM cells ) SELECT * FROM marked_points; r | c | marker ----+----+-------- 0 | 0 | ** 0 | 1 | ** ... | ...| ... 1 | 0 | ** 1 | 1 | 1 | 2 | ** 1 | 3 |
Now that we’ve got our marked points, the only thing left to do is to combine all the strings together. We can do this with two calls to the string_agg function. string_agg is an aggregation function that concatenates all the given strings together. We first use it on the values from each row to generate one string for each row. We then use it a second time to concatenate all the rows together. In the end this looks like this:
WITH cells AS ( SELECT r, c FROM generate_series(0, 63) rows(r) CROSS JOIN generate_series(0, 63) cols(c) ), marked_points AS ( SELECT r, c, CASE WHEN r & c = 0 THEN '**' ELSE ' ' END AS marker FROM cells ), combined_rows AS ( SELECT string_agg(marker, '') AS row_text FROM marked_points GROUP BY r ORDER BY r DESC ) SELECT string_agg(row_text, E'\n') FROM combined_rows;
The Mandelbrot Set
Now that we’ve got the Sierpinski triangle, the next question is whether or not we can generalize this code. If you look closely you will notice there’s one nice place where we can generalize the code. In the current version, we use bitwise AND to determine whether or not the cell should be part of the fractal:
... CASE WHEN r & c = 0 THEN '**' ELSE ' ' END AS marker ...
We can actually replace that test with any test we want. This means if, for a given fractal, we can come up with an easy test to determine whether or not a cell should be part of the fractal, we can produce a SQL query that generates it!
One well known fractal that has this property is the Mandelbrot set. You can test whether or not a point is in the Mandelbrot set by repeatedly running the following formula over each point (i in the formula is the imaginary number):
z = 0 z[n+1] = z[n]^2 + col + i * row
If z[n] does not diverge to infinity as n increases, the point is part of the Mandelbrot set. This is usually implemented by running the formula for a fixed number of iterations and checking whether z[n] has crossed some threshold.
To implement this test in SQL, we will need a way to express iteration. One way of doing iteration in SQL is what’s known as a “Recursive CTE“. A recursive CTE has similar syntax to a regular CTE, but is actually completely different. The rough idea with a recursive CTE is that you provide an “initial query” and a “recursive query”. It ultimately looks something like this: