John Spilsbury, a Long cartographer, is credited with commercializing jigsaw puzzles around 1760 which at the time were made by painting a picture on a flat rectangular piece of wood and then cutting it into small pieces with a jigsaw. Since then, countless hours have been spent around the dining room table trying to force that one slightly big piece into what must be its only misshapen spot.
These puzzles, while simple in their nature and design, lend themselves to the amazing pattern recognition our brains are capable of. While they come in all degrees of difficulty, jigsaw puzzles can be solved by young and old alike. So I am interested in understanding how a computer can go about solving such a puzzle.
As my initial experiment, I found a 1000 piece puzzle set of New York City as seen below
On the left hand side, we see the assembled puzzle as portrayed on the box cover (the template) while on the right hand side we see a small gazebo cluster of pieces (the image). My goal in this exercise is to see how can I determine the coordinates where the image belongs in the template.
For this entire exercise I will be making use of the OpenCV library, particularly the Python bindings for it, so all code samples will be in Python.
As a first pass, I realized that the above two images captured by my phone complicate the problem too much, so as a result I went ahead and simplified the problem downloading the puzzle image from the publisher's’ website and cropping out a similar region around the gazebo from the downloaded image. This setup now eliminates the various difficulties introduced by orientation, image quality (i.e. glare) and proportions to the much simpler problem of given an image and a cropped image, can you find the cropped image’s coordinates relative to the original.
The first thought to solve this problem was then to take the cropped image, align it to the top left corner of the original image, and just slide it across the entire original image performing simple template matching as described here.
OpenCV provides a collection of different ways to template match as seen in the gist above, and all perform well as long as the image is a direct crop from the template. Once we start introducing real world artifacts, such as the ones seen in our case, template matching becomes really poor and unwieldy. I tinkered for a while trying to adjust and scale the original image set so that it conforms to the ideal world requirements for template matching but after a while figured there must be a better way which is more robust to the conditions of our case.
After some quick searching online, I came across the popular SIFT matching algorithm. Unlike template matching, SIFT claims to be invariant to scale and orientation, which was exactly what I was searching for. As long as the two images being compared maintained the same relative position between their features (which in our case was true as the images were not stretched or altered to my knowledge) SIFT would be able to detect and match a set of comparable features in both images.
As seen above, SIFT performs well in detecting the general location of the gazebo, even when using the template with various artifacts.
One further optimization to note would have been to perform SIFT on not the piece, but on the mask of the piece (basically everything inside it). In my tests, I noticed the edges of the piece, since they produce a high contrast against the white table, tend to be picked up as more significant features than the actual content inside the piece.
Running the same algorithm on a simpler kid’s puzzle showed even better results.
Here, we did not even need to worry about masking the piece edges as the internal features were strong enough.
While impressive, this approach is not perfect. In the case of a more homogenous (aka masochistic) puzzle where it is an image of a blue sky, or just a pattern, this approach would probably not be ideal. That being said, I was very impressed at how robust it truly performed when introducing images taken from my phone rather than the original image files.
OpenCV also deserves a lot of credit, as it provides easy handlers to execute the SIFT algorithm (much like the previous template matching approaches). Since OpenCV is available on both iOS and Android, it would be a fun and interesting exercise to port this to them and make a Jigsaw Puzzle Companion App. Other more practical use cases of the SIFT algorithm could be for object detection in videos or images which in turn lead to interesting applications of AR and VR.