Databases provide a great way to store and query large amounts of data quickly and efficiently. But what if you want to store something like a file system, where hierarchical links between the data are key?
In this article I’ll be looking at how to store and query data in SQL Server in such a way that using it hierarchically is simple.
This is specifically written for SQL Server, but the core functionality (CTEs) should work in many other SQL flavours including MySQL 8 and SQLite
For this example, I’ve constructed a very simple table “folder” as shown below.
CREATE TABLE folder ( folderid INT IDENTITY(1,1) PRIMARY KEY, parentid INT NULL, name VARCHAR(200) NOT NULL, created DATETIME NOT NULL DEFAULT(GETDATE()), lastupdate DATETIME NOT NULL DEFAULT(GETDATE()), FOREIGN KEY (parentid) REFERENCES folder(folderid) );
The properties of this table are:
- The table has a integer primary key “folderid” which, in this case, is auto-incrementing.
- It also has a foreign key “parentid” which may be NULL, indicating it has no parent and is thus a “root” folder, or an integer value referencing another row in the folder table by its primary key.
- Each folder has a name to more easily identify it, and created and last update datetime values for flavour.
For this example, I’ve added a series of rows to the table to show it working.
I’ve tried to name the child folders in such a way that it is easy to see its intended path quickly and easily. “Child 1-2” for example is the second child of “Child 1“, which in turn is the first child of the root folder “Root Folder 1“.
Querying the Data
The data as it’s shown in the above Table Example Data screenshot is the result of a straight SELECT statement. It shows all of the data correctly, and would be fine if you are looking to get ALL of the data simply.
However, consider the scenario where you only want the folder “Child 1” and its descendents. I can get “Child 1” itself easy enough with a basic SELECT:
SELECT folderid, parentid, name FROM folder WHERE folderid = 2;
I can also get its children with a similarly simple query. But what about its children’s children, or its children’s children’s children? And I want it all in one neat query with the results properly sorted. Time to introduce the Common Table Expression.
Common Table Expressions (CTE)
A CTE is used to create a temporary results set for the lifetime of one SELECT (or INSERT, UPDATE, DELETE or MERGE) statement, so like a temporary table. Where CTEs get their real power though is that they can be self-referential and recursive.
What that means in our scenario is that we can use a CTE to build a result set based on a recursive query, or automatically drill down into a hierarchical structure parent to child in a single query.
CTEs are defined with the “WITH” command, followed by a single SELECT (in this example) statement, such as:
WITH cte_name AS ( SELECT folderid, parentid FROM folder ) SELECT * FROM cte_name;
This simple example creates a CTE named “cte_name” and fills it with folderid and parentid from the folder table, and then selects the results of that. A fairly pointless example, but it shows the syntax so we can move on to something more complicated.
Get Folder Structure
Let’s look at our scenario again.
We want to be able to query the “folder” table to find one particular folder row and all of its descendents.
/* Using a Recursive Common Table Expression (CTE) to retrieve a hierarchical list of child folders starting from a source container given by @sourceid. */ DECLARE @sourceid INT = 2; WITH cte (folderid, parentid, name, created, lastupdate, level, path) AS ( SELECT folderid, parentid, name, created, lastupdate, 0 AS level, CAST(name AS VARCHAR(1000)) AS path FROM folder WHERE folderid = @sourceid UNION ALL SELECT c.folderid, c.parentid, c.name, c.created, c.lastupdate, cte.level + 1 AS level, CAST((cte.path + '/' + c.name) AS VARCHAR(1000)) AS path FROM folder c INNER JOIN cte ON cte.folderid = c.parentid ) SELECT folderid, parentid, name, created, lastupdate, level, path FROM cte ORDER BY path ASC
It may seem daunting at first, but this is really very simple. It will first retrieve the row with folderid matching the “@sourceid” variable.
It then performs a UNION to also select the children of the data in the CTE recursively, building the hierarchical tree of data. Note that the second SELECT statement performs an INNER JOIN on the CTE itself, linking the current result’s “parentid” field to the CTE result’s “folderid“.
Here you can see we also used the CTE’s SELECT statements to inject two additional useful fields into the data: “level” and “path”.
- The “level” field is set to “0” in the first SELECT as the source query, then during recursion on the second SELECT, 1 is added to the value at each depth. This allows you to quickly see how far down the hierarchy the row is. Folder with ID 15, for example, “Child-1-1-1” , has a level of 2 meaning it is the is the child of the child of the @sourceid.
- The “path” field is a concatenation of the name of each folder at each level with a “/” character between them, constructing something like a file path. The final SELECT then uses this path to sort the results so that we get an easy to parse list of results
Get Reverse Tree Branch
As an additional step, we can reverse this procedure.
Consider the slightly different scenario where you instead have the ID of a child folder, and want to get a list of all of its parents right up to the root.
/* Using a Recursive Common Table Expression (CTE) to retrieve a hierarchical folder list starting from a child container given by @childid and navigating up to the root. */ DECLARE @childid INT = 13; WITH cte(folderid, parentid, name, created, lastupdate, level) AS ( SELECT folderid, parentid, name, created, lastupdate, 0 AS level FROM folder WHERE folderid = @childid UNION ALL SELECT c.folderid, c.parentid, c.name, c.created, c.lastupdate, cte.level + 1 AS level FROM folder c INNER JOIN cte ON cte.parentid = c.folderid ) SELECT folderid, parentid, name, created, lastupdate, level FROM cte ORDER BY level ASC
The primary difference here is that during the recusive SELECT statement (the second one), it is the CTE’s “parentid” field which is linked to the current folder’s “folderid“, essentially reversing the direction of traversal.
Also note that, because we’re working backwards up the tree, the “path” field’s construction is not as simple so has been excluded.
This article shows a couple of quick and simple uses of Common Table Expressions for the purposes of building hierarchical data structures, but this only scratches the surface of what is possible with them.
Let me know on Twitter (@matt_is_ready) if you’ve found other creative uses.