In the past few months, Facebook has been filled with 3D photos. If you have not had the chance to see one, 3D photos are images inside a post which gently change perspective as you scroll the page, or as you move your mouse over them.

A few months prior to their introduction, Facebook had been testing a similar feature with 3D models. While it is easy to understand how Facebook can render 3D models and rotate them according to the mouse position, the same might not be as intuitive for 3D photos.

The technique that Facebook uses to create the illusion of three-dimensionality on two-dimensional pictures is sometimes known as height map displacement, and it relies on an optical phenomenon called parallax.

This is a two-part series. You can read all the posts here:

A link to the complete Unity package is available at the end of the tutorial.

## Understanding Parallax

If you have played Super Mario, you know exactly what parallax is. While Mario is running at a certain speed, distant objects in the background appear to move more slowly (below).

This effect creates the illusion that certain elements, like mountains and clouds, are farther away. It is effective because our brain relies strongly on parallax (among other visual cues) to estimate the distance of faraway objects.

❓ How does the brain estimate distance?
There are several proposed mechanisms that the human brain uses to estimate distances.

At short to medium range, distances are calculated by comparing how much the position of an object differs when seen from the left and the right eye. This is called stereoscopic vision, and is exceptionally common in nature.

For very distant objects, however, stereoscopic vision alone is not sufficient. Mountains, clouds and stars look too similar from each eye for us to notice any significant difference. This is when relative parallax comes into play. Objects in the background will appear to move less, compared to objects in the foreground. It’s their relative motion that allows us to establish their relative distance.

There are many other mechanisms involved in the perception of distance. Most notably, the atmospheric haze that tints distant features blue. Most of these atmospheric cues are completely missing on alien worlds, which is why it can be exceptionally hard to estimate the scale of objects on other planets and moons. YouTuber Alex McColgan explains this on his channel Astrum, showing how hard it is to guess the size of the lunar features seen in the video below.

## Parallax As Shifting

If you are familiar with linear algebra, you probably know how tricky and complex the mathematics of 3D rotations can be. That being said, there is a very easy way to understand parallax which involves nothing more than shifts.

Let’s imagine that we are looking at a cube (below). If we are perfectly aligned with its centre, the front and back faces will appear to our eyes as two squares of different sizes. That is perspective in a nutshell.

However, what happens if we shift the camera down or, equivalently, if we shift the cube up? By applying the same principles, we can see that the front and back faces appear to have shifted from their previous positions. More interestingly, they have shifted with respect to each other. The back face, which is farther from us, appears to have moved less.
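The cube example can be checked numerically with a small sketch. It assumes an idealised pinhole camera that only translates (no rotation); the function `project`, the cube dimensions and the camera offset are all illustrative choices, not values taken from the original figures.

```python
def project(point, camera_y=0.0, focal_length=1.0):
    """Pinhole projection of a (y, z) point onto the image plane.

    The camera sits at height camera_y and looks along +z; z is the
    distance from the camera to the point along the viewing axis.
    """
    y, z = point
    return focal_length * (y - camera_y) / z


# Top edge of a unit cube: front face 2 units away, back face 3 units away.
front_top = (0.5, 2.0)
back_top = (0.5, 3.0)

# Camera aligned with the centre of the cube.
front_centred = project(front_top)
back_centred = project(back_top)

# Move the camera down by 0.2 units and project again.
front_moved = project(front_top, camera_y=-0.2)
back_moved = project(back_top, camera_y=-0.2)

# How far each edge moved on the image plane.
front_shift = front_moved - front_centred
back_shift = back_moved - back_centred

print(f"front face shift: {front_shift:.4f}")
print(f"back face shift:  {back_shift:.4f}")
```

Both edges move in the same direction, but the edge of the back face moves by a smaller amount, which is the relative shift described above.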

If we want to calculate the actual positions of the cube’s vertices on our projected field of view, we need to deal with a good amount of trigonometry. However, that is not really necessary. If the movement of the camera is small enough, we can approximate the displacement of the vertices by offsetting them by an amount that depends on their distance, with nearer vertices moving more.

The only thing we need to establish is a scale: if we move X metres to the right, an object Y metres away from us appears to shift by Z metres. As long as X stays small, parallax becomes a problem of linear interpolation, not trigonometry. This ultimately means that we can simulate small 3D rotations by simply shifting pixels based on how far they are from the camera.
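One possible way to express this linear model is sketched below. It assumes that distance has been normalised into a "closeness" value between 0 (farthest) and 1 (nearest); the names `lerp` and `pixel_shift`, and the two scale constants, are hypothetical and would need tuning for a real image.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b, with t in [0, 1]."""
    return a + (b - a) * t


def pixel_shift(camera_offset, closeness, near_scale=0.05, far_scale=0.0):
    """Approximate shift of a pixel for a small camera movement.

    closeness:  1.0 for the nearest pixels, 0.0 for the farthest
                (an assumed convention).
    near_scale: shift per unit of camera movement for the nearest pixels.
    far_scale:  shift per unit of camera movement for the farthest pixels.
    """
    return camera_offset * lerp(far_scale, near_scale, closeness)


# The same camera movement shifts near pixels more than far ones.
for closeness in (0.0, 0.5, 1.0):
    print(closeness, pixel_shift(1.0, closeness))
```

The shift is linear in both the camera movement and the closeness, so no trigonometry is involved. The approximation is only meant to hold for small movements.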

## Generating Depth Maps

What Facebook does is, at its core, not too dissimilar from what is happening in Super Mario. Given a picture, certain pixels are shifted in the direction of the movement based on their distance from the camera. All that Facebook needs to create a 3D photo is the photo itself, and a map that tells how far each pixel is from the camera. Such a map is called, unsurprisingly, a depth map. Depending on the context, it can also be referred to as a height map.
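To make the idea concrete, here is a minimal sketch that displaces a single row of pixels using a depth map. It is not Facebook's implementation: the function `shift_row`, the `scale` parameter and the convention that a depth of 1.0 means "nearest" are all assumptions made for illustration. Each output pixel looks up its value from a source position that is displaced according to its depth, which is similar in spirit to how a texture lookup in a shader might work.

```python
def shift_row(pixels, depth, offset, scale=2.0):
    """Return a copy of a pixel row displaced according to a depth map.

    pixels: list of pixel values.
    depth:  list of floats in [0, 1], where 1.0 is nearest to the camera
            (an assumed convention; some depth maps use the opposite one).
    offset: viewer movement, roughly in [-1, 1].
    scale:  maximum displacement in pixels (hypothetical tuning value).
    """
    width = len(pixels)
    shifted = []
    for x in range(width):
        # Near pixels (depth close to 1) are displaced more than far ones.
        source_x = x + round(offset * scale * depth[x])
        # Clamp so that we never sample outside the row.
        source_x = max(0, min(width - 1, source_x))
        shifted.append(pixels[source_x])
    return shifted


row = ["a", "b", "c", "d", "e", "f"]
depth = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]

unmoved = shift_row(row, depth, offset=0.0)
moved = shift_row(row, depth, offset=1.0)
print(unmoved)
print(moved)
```

With no viewer movement the row is unchanged. With a movement of 1.0, only the positions marked as near in the depth map sample from a displaced location, while the far ones stay put.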

While taking pictures is a relatively easy task, generating a reliable depth map is a much more challenging problem. Modern devices rely on various techniques. The most common one involves the use of two cameras; each one takes a picture of the same subject, but from a slightly different perspective. This is the principle behind stereoscopic vision, which is how humans are able to perceive depth at short to medium range. The picture below shows how an iPhone 7 is able to create depth maps from two very close images.

The details of how such a reconstruction is done are explained in Instant 3D Photography, a paper that Peter Hedman and Johannes Kopf presented at SIGGRAPH 2018.
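As a rough illustration of why two cameras are enough to recover depth, the sketch below uses the textbook relation for an idealised pair of identical, perfectly aligned pinhole cameras: depth equals focal length times baseline divided by disparity. This is not necessarily what the iPhone or the paper above does; the function name and the numbers (a focal length of 1000 pixels and a baseline of 1 cm) are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Estimate the distance of a point from its disparity.

    disparity_px:    horizontal difference, in pixels, between the position
                     of the same point in the left and right images.
    focal_length_px: focal length of both cameras, in pixels.
    baseline_m:      distance between the two cameras, in metres.
    """
    if disparity_px <= 0:
        # No measurable difference: the point is too far away to tell.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


near = depth_from_disparity(10.0, focal_length_px=1000.0, baseline_m=0.01)
far = depth_from_disparity(2.0, focal_length_px=1000.0, baseline_m=0.01)
print(f"disparity of 10 px: {near:.2f} m")
print(f"disparity of  2 px: {far:.2f} m")
```

The smaller the disparity, the farther the point. Once the disparity drops below what the sensor can resolve, this method can no longer tell distant objects apart, which matches the short to medium range at which stereoscopic vision is useful.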

Once a reliable depth map is available, simulating the three-dimensionality of an image becomes an almost trivial task. The real limitation of this technique is that, even if a rough 3D model can be reconstructed, there is no information on how to render the parts that were occluded in the original photo. At the moment, this problem cannot be solved, which is why all the deformations seen in 3D photos are rather mild.
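The occlusion problem can be seen in a toy sketch that pushes each pixel of a row to a new position based on its depth. The function `forward_warp`, its `scale` parameter and the "1.0 means nearest" depth convention are assumptions made for illustration only. Positions that no pixel lands on are left as `None`: they correspond to parts of the scene that were hidden in the original photo.

```python
def forward_warp(pixels, depth, offset, scale=2.0):
    """Move every pixel of a row according to its depth.

    depth values are in [0, 1], with 1.0 nearest to the camera (assumed).
    Destinations that receive no pixel stay None.
    """
    width = len(pixels)
    warped = [None] * width
    nearest = [-1.0] * width  # depth of the pixel currently stored at each slot
    for x in range(width):
        target_x = x + round(offset * scale * depth[x])
        # When two pixels land on the same slot, keep the nearer one.
        if 0 <= target_x < width and depth[x] > nearest[target_x]:
            warped[target_x] = pixels[x]
            nearest[target_x] = depth[x]
    return warped


row = ["a", "b", "c", "d", "e", "f"]
depth = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]

warped = forward_warp(row, depth, offset=1.0)
holes = warped.count(None)
print(warped)
print(f"{holes} pixels have no known colour")
```

The foreground pixels move aside and cover part of the background, while the area they used to occupy is left empty. The larger the movement, the larger the gap, which is one way to see why keeping the deformation small hides the problem.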

## What’s Next…

This post introduced the concept of 3D photos, briefly explaining how modern smartphones are able to capture them. The next tutorial in this online course will show how to use those very same techniques to implement 3D photos in Unity using shaders.
