What is occlusion in AR, and how does the Selerio SDK help?
We introduced occlusion briefly when we announced the release of our SDK here. In this article, we take a deeper look at occlusion in AR and show how you can use the SDK to do some heavy lifting in your application. Occlusion commonly refers to objects blocking other objects behind them from a given viewpoint. In AR, occlusion is essential for immersive experiences: virtual objects should be displayed only where there are no physical objects between them and the camera. Unfortunately, this important feature is missing from today’s mobile AR applications.
How is occlusion handled today, and why is it hard?
To render virtual content accurately in a physical scene, an application needs to know which physical objects are in the scene and exactly where they are in the real world; this determines what needs to be occluded. Currently, application developers use a variety of workarounds. One technique is to place virtual content only in free spaces in the real world, avoiding occlusion altogether. Another is to create custom experiences, as shown below, where the user manually inputs the 3D surfaces that support occlusion.
There are a couple of ways occlusion in mobile AR can be addressed properly. One straightforward solution is to use depth cameras, which give us the 3D location of each pixel. This tells us what (if anything) lies between the camera and the virtual content. An even better solution is to aggregate this depth information across frames to generate the 3D geometry of the scene. Depth cameras have several drawbacks, however, and we will highlight just one: most phones do not have depth sensors.
Getting occlusion correct in AR is vital to maintain the illusion of immersive reality
This leaves us with the challenge of recovering depth (3D) information from camera (2D) images. It turns out this is a hard problem, and many researchers are working on it. Most proposed solutions are computationally expensive, with intensive use of GPUs, making them challenging to run on a mobile phone. This process of estimating the scene mesh, often called “3D reconstruction”, can be split into two sub-problems:
- Generate 3D depth information from 2D images, ideally in the form of a dense point cloud or a depth image
- Integrate the 3D information over several frames to generate geometry, in the form of a triangular mesh, in real time
Generating 3D depth information from 2D images can be done in myriad ways. ARKit/ARCore generate and provide 3D feature points (the tiny dots you see as you move around in the debug view of AR demos), but these are quite noisy and sparse. We can also use depth prediction networks (such as this) to infer depth, but the output tends to lack absolute scale and is limited by the training dataset. As mentioned earlier, running such networks on a mobile device is also compute-intensive. Assuming we have crossed this hurdle, we then have to use the 3D information to construct a geometric mesh, and to update it quickly and progressively as new depth information is gathered through mapping (moving around with your phone).
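To make the "sparse points plus learned prior" idea concrete, here is a minimal numpy sketch of one common densification approach (this is our illustration, not Selerio's actual algorithm, and `densify_depth` is a made-up name): fit a global scale and shift that align a network's relative depth map with the metric depths of the sparse feature points, then apply that correction to the whole map.

```python
import numpy as np

def densify_depth(predicted_depth, sparse_uv, sparse_z):
    """Align a relative depth map to sparse metric depths.

    predicted_depth : (H, W) relative depth from a network
    sparse_uv       : (N, 2) integer pixel coords (u, v) of feature points
    sparse_z        : (N,)   metric depths of those points, in meters
    """
    # Sample the predicted depth at the feature-point pixels.
    pred_at_pts = predicted_depth[sparse_uv[:, 1], sparse_uv[:, 0]]
    # Solve min ||a * pred + b - z||^2 for scale a and shift b.
    A = np.stack([pred_at_pts, np.ones_like(pred_at_pts)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, sparse_z, rcond=None)
    # Apply the recovered scale/shift to every pixel.
    return a * predicted_depth + b
```

Real systems refine this further (e.g. spatially varying corrections), but even a global least-squares fit shows how a handful of metric points can anchor a scale-free prediction.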
No points for guessing what comes next: we figured out a way to solve the problems stated above. We developed a clever way to use the sparse feature points in combination with a neural network to generate a dense depth map. We then use state-of-the-art TSDF (truncated signed distance function) volume integration to compute a 3D triangular mesh of the scene in real time. All of this happens on-device, and it is blazingly fast! You can see the meshing happen in real time below, running on an iPhone 7. But don’t take our word for it, take the sample app for a spin.
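For readers curious about the fusion step, here is a heavily simplified numpy sketch of TSDF integration along a single camera ray (our illustration, not the SDK's implementation). Each voxel keeps a running weighted average of its truncated signed distance to the observed surface; a full system runs this over a 3D grid and extracts the triangle mesh with marching cubes.

```python
import numpy as np

def integrate_frame(tsdf, weights, voxel_z, depth, trunc=0.05):
    """Fuse one depth measurement into a column of voxels along a ray.

    tsdf, weights : per-voxel running TSDF values and update weights
    voxel_z       : distance of each voxel from the camera, in meters
    depth         : measured depth along this ray, in meters
    trunc         : truncation distance, in meters
    """
    # Signed distance from each voxel to the observed surface,
    # truncated to [-trunc, trunc] and normalized to [-1, 1].
    sdf = np.clip(depth - voxel_z, -trunc, trunc) / trunc
    # Only update voxels in front of or just behind the surface.
    mask = voxel_z < depth + trunc
    # Weighted running average, as in classic TSDF fusion.
    new_w = weights[mask] + 1.0
    tsdf[mask] = (tsdf[mask] * weights[mask] + sdf[mask]) / new_w
    weights[mask] = new_w
    return tsdf, weights
```

The averaging is what makes the mesh robust: noisy per-frame depth estimates cancel out over time, and the surface sits where the fused TSDF crosses zero.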
With the mesh generated, we can now enable occlusion by applying special materials to the mesh (SceneKit on iOS to the rescue!). The tennis balls dropped on the couch get occluded when the camera moves down.
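Conceptually, such an occlusion material contributes the mesh's depth but not its color, so the renderer's per-pixel depth test hides any virtual fragment that lies behind real geometry. Here is a toy numpy sketch of that depth test (the names and the compositing function are ours, purely for illustration):

```python
import numpy as np

def composite(camera_rgb, scene_depth, virtual_rgb, virtual_depth):
    """Overlay virtual content on the camera image, hiding any
    virtual pixel that lies behind the reconstructed scene mesh.

    scene_depth   : (H, W) depth of real geometry, from the mesh
    virtual_depth : (H, W) depth of virtual content (inf where empty)
    """
    # A virtual pixel is visible only if it is closer than the real scene.
    visible = virtual_depth < scene_depth
    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out
```

In the couch example above, the ball's pixels have greater depth than the couch mesh once the camera moves down, so the test fails and the camera image shows through.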
Pretty impressive right? We thought so!
Remember, even Superman has his kryptonite! A few current limitations:
- The area of interaction has to be mapped beforehand. This step takes only a few seconds, and the mesh is updated continuously as you move.
- Tracking of moving (dynamic) objects, such as people in the scene, is not currently supported. Stay tuned: we are working to add this.
- We can handle reasonably large meshes (a living-room-sized scene works fine), but performance drops with very large scenes. Expect updates on this too.
If you have made it this far and have a question, why not join our Slack and ask?
If you are wondering how to integrate this into your AR app, worry not: keep an eye on the blog, where we will be releasing step-by-step guides. We also invite you to join our closed beta and get access to the API documentation.