He probably couldn’t imagine that we’d be blogging about him 1200 years later, but there’s one thing he would have little problem understanding about what makes today’s world tick: algorithms. That’s because he’s the guy algorithms were named for.Al-Khwarizmi’s work preceded the world of graphs (invented by fellow super-nerd Euler) by 900-some-odd years, so maybe just a little of this would be new to him, but he’d get the gist – just like you. That’s because you don’t have to be a super-nerd to understand the basics of graph algorithms, particularly when they’re used for search.
In this Graph Databases for Beginners blog series, I’ll take you through the basics of graph technology assuming you have little (or no) background in the space. In past weeks, we’ve tackled why graph technology is the future, why connected data matters, the basics (and pitfalls) of data modeling, why a database query language matters, the differences between imperative and declarative query languages and predictive modeling using graph theory.This week, we’ll be discussing different graph search algorithms and how they’re used, including Dijkstra’s algorithm and the A* algorithm. Our discussion will focus on what graph search algorithms do for you (and your business) without diving too deep into the mathematics of graph theory. There are two basic types of graph search algorithms: depth-first and breadth-first. The former type of algorithm travels from a starting node to some end node before repeating the search down a different path from the same start node until the query is answered. Generally, depth-first search is a good choice when trying to discover discrete pieces of information. They’re also a good strategic choice for general graph traversals.
The most classic or basic level of depth-first is an uninformed search, where the algorithm searches a path until it reaches the end of the graph, then backtracks to the start node and tries a different path.
On the contrary, dealing with semantically rich graph databases allows for informed searches, which conduct an early termination of a search if nodes with no compatible outgoing relationships are found. As a result, informed searches also have lower execution times.
(For the record, Cypher queries and Java graph traversals generally perform informed searches.)
Breadth-first search algorithms conduct searches by exploring the graph one layer at a time. They begin with nodes one level deep away from the start node, followed by nodes at depth two, then depth three, and so on until the entire graph has been traversed. Like so:
As you see above, a breadth-first search starts at a given node (marked
0) and then travels to every node that’s only one hop away (marked
1) before it travels to each node that’s two hops away (
2). Then, only after it’s traveled to all of the
2 nodes would it explore one more hop to the nodes marked
Like his super-nerd predecessors – al-Khwarizmi and Euler – Edsger Wybe Dijkstra was super into algorithms and super into graphs. That’s probably why he invented Dijkstra’s algorithm in twenty minutes. Like I said, super-nerd. The goal of Dijkstra’s algorithm is to conduct a breadth-first search with a higher level of analysis in order to find the shortest path between two nodes in a graph. Here’s how it works:
- Pick the start and end nodes and add the start node to the set of solved nodes with a value of 0. Solved nodes are the set of nodes with the shortest known path from the start node. The start node has a value of 0 because it is 0 path-length away from itself.
- Traverse breadth-first from the start node to its nearest neighbors and record the path length against each neighboring node.
- Pick the shortest path to one of these neighbors and mark that node as solved. In case of a tie, Dijkstra’s algorithm picks at random (Eeny, meeny, miny, moe, presumably).
- Visit the nearest neighbors to the set of solved nodes and record the path lengths from the start node against these new neighbors. Don’t visit any neighboring nodes that have already been solved, as we already know the shortest paths to them.
- Repeat steps three and four until the destination node has been marked solved.
Note that because this is a breadth-first search, Dijkstra’s algorithm must first search all still-possible paths, not just the first solution that it happens across. This principle is why Perth isn’t immediately ruled out as the shortest path.
Now that al-Khwarizmi and Euler have used Dijkstra’s algorithm to solve for all possible paths, they can rightly compare the two routes to Perth:
- via Adelaide at a cost of 50 hours, or
- via Darwin at a cost of 82 hours
For more information on how the A* algorithm is used in practice, consult Chapter 7 of Graph Databases from O’Reilly Media.Graph search algorithms help you (or your friendly computer sidekick) traverse a graph dataset in the most efficient means possible. But “most efficient” depends on the results you’re looking for – a breadth-first search isn’t the most efficient if your results are better suited to depth-first queries (and vice versa).
This is where your super-nerd friends (or even a helpful ebook, if you don’t have friends) come in handy. If they’re smart, they’ll match the right graph algorithm with the type of results you’re looking for.But now when they start talking breadth-first vs. depth-first vs. best-first you can tell them you know your A* from your Dijkstra and this isn’t your first rodeo. Go make al-Khwarizmi proud.
Click below to get your copy of the O’Reilly Graph Databases book and start using graph technology to solve real-world problems.
Catch up with the rest of the Graph Databases for Beginners series: