Path Tracing with Babylon, Background and Implementation

Part 5 - Implementation (cont.)

In this part we’ll take a look at the path traced scene itself: how objects are either loaded from disk or server, or created from scratch inside the path tracer, how to transform these scene objects, and how we can use the underlying library (Babylon.js) to help us with all of these tasks.

Now we hopefully have most of the bootstrap setup code out of the way (3 scenes with 3 render targets and 3 cameras (or 2 if you can reuse the ortho camera)). We can now decide what type of scene environment we wish to render - small or large room?, indoors or outdoors?, mathematical shapes or triangle models?, etc.

I’ll briefly go over all of the above decisions and how we sometimes have to change our approach to filling in scenes with arbitrary objects when we’re dealing with ray/path tracing and the GPU.

Let’s take the most simple scene objects and work up to the most complex. First, let’s talk about spheres. There’s a reason why if you see 99 percent of ray and path tracers out there, they all begin with (and some end with) just spheres. Also, if you take a graphics course in college and your professor tells you that you have to write a ray tracer by the end of the course, chances are the first and possibly only scene geometry requirement is sphere (and maybe plane) rendering.

There are many reasons for this first choice for rendering, but perhaps the more important ones are that the ray-sphere intersection code is historically well-known and optimized, the sphere object is easier to grasp geometrically and algebraically, finding a sphere surface normal is a simple matter of 1 vector subtraction, and when the sphere is lit and different materials are experimented with, you can immediately and easily see the changes.

A sphere (which is really a special case of an ellipsoid), has a surface that can be defined implicitly-> x2 + y2 + z2 - r2 = 0, if I remember correctly (the 2’s mean squared and the r is the sphere radius). Any 3-dimensional point in space that fulfills this implicit equation is a point located exactly on the sphere surface. On the other hand, the ray itself is usually defined explicitly, or with a parametric equation -> ray = ray.origin + ray.direction * t, where t is the parameter that defines how far we are along the ray, assuming that the ray.direction vector is always normalized to a unit length of 1. Without going too deep into the math, some smart person long ago figured out that if you place the explicit ray equation into the implicit sphere equation, and after all the algebraic reductions are performed, it ends up becoming a simple quadratic equation. And methods to solve quadratic equations have been around for many centuries - it basically boils down to root finding. There usually are 2 roots - a smaller root and larger root, often referred to as t0 and t1 in ray tracing. Once we find t0 and t1 with a quadratic formula, we take the smaller of these 2 roots, (but only if it is positive!) and stick this t value back into the simple ray parametric equation mentioned above, and we have our intersection point in 3d space! Since spheres can be defined with implicit equations and solved with quadratic equations, they are part of a class called quadrics. Other quadrics include ellipsoids, cylinders, cones, paraboloids, hyperboloids, and hyperbolic paraboloids (whew, that last one is a mouthful, just to describe a saddle or Pringle potato chip, lol). Ray tracers can handle all of these quadric shapes quite well and they are pixel perfect when rendered, no need for extensive tesselation of triangles or other polygons. You might think a torus (donut) fits into this class, but unfortunately it is known as a quartic shape, which requires a massive amount of calculations to find the closest of 4 possible roots (hence the name quartic). There are some cubic(3 roots) shapes as well, including bezier patches and the like, but those are also prohibitively expensive to analytically ray trace. In the end, cubic and quartic shapes are best represented as triangle meshes, or handled with ray marching distances rather than analytically trying to find multiple roots.

All the shapes described so far have some kind of mathematical curvature, but at the other end of the spectrum are all of the shapes that have straight edges and flat faces: planes, boxes, triangles, and quads. Luckily, just as with quadric shapes, there are many well known and efficient routines to handle a ray’s intersection with these shapes.

When choosing which end of the shape spectrum to use for modeling scene geometry, it is important to note that even though mathematical implicit quadric shapes are fun to ray trace and cool to look at in abstract scenes, it is very difficult to model complex geometry like a human, animal, car, spacecraft, architectural interior, etc. using just those shapes. Eventually everyone realizes that to render arbitrarily complex objects, the objects need to be made of either triangles or voxels. Triangles are a natural choice for representing complex geometry- a triangle has the fewest vertices and edges of all polygons, and by definition, a triangle is embedded in a plane with a well-defined surface normal vector. A quad is a convenient shape also, and one frequently used for modeling software, but great care must be taken to prevent one of the quad’s vertices from slightly veering off its plane, thus making a degenerate, untraceable shape. Therefore, over the years triangles have become the dominant construction primitive in professional renderers. The 1 big issue with using triangles to model everything is that after about 100 triangles, it starts to bog a ray tracer down and after 1000 maybe, it slows to a crawl. In order to reduce this bottleneck, much research has been done to create efficient acceleration structures that create hierarchies of triangles and use a divide and conquer approach when dealing with possibly millions of triangles in a modern scene.

Likewise, much research has gone into voxel rendering. If you’ve ever played Minecraft, then you know what voxels look like! Now Minecraft is great and all, but when we say voxel rendering, what it usually refers to is SVO, or Sparse Voxel Octrees. In this rendering scheme, everything is a cube of varying sizes that are organized hierarchically, all the way from the entire scene universe bounding box, down to a grain of sand and smaller, even possibly at the molecular scale. Since there are very fast ray-box routines, these SVO renderers can be realtime also, while containing practically limitless resolution. Rather than the speed bottleneck, one of the major hurdles to overcome with voxel scene representation is storage. It takes many more voxels to create larger curved shapes, that could have easily been handled by a couple of mathematical shapes, or a handful of large arbitrarily rotated triangles. Another major hurdle is the streaming of such large voxel data sets into GPU memory on demand as a viewer navigates the scene. As of yet, I have not attempted ray/path tracing voxels and octrees, but I am currently studying them and will hopefully one day offer a side branch to my renderer that allows for minecraft type scenes as well as true SVO scenes with almost infinite detail and some kind of data streaming in the background.

Another popular alternative to analytical shape and triangle rendering is using distance approximations and instead moving the rays along in a step-wise fashion through the scene, being careful not to overstep and miss any object. This technique is called ray marching. Instead of tracing triangles or math shapes with analyitic t value finding, we can march rays against any kind of shape, as long as it has a distance approximation equation giving the distance from the ray’s origin to its own surface. Now one might be tempted to trace all shapes with this approach, but there is a big problem - ray marching is much slower than ray tracing analytically. However, if you don’t have the analytical implicit equation for an arbitrary shape, you must resort to either tesselating it with lots of triangles and using a BVH, or simply ray march it. Shapes that are great for ray marching include fractal shapes found in nature: mountains, water waves, clouds, etc. as well as purely mathematical fractal 3d shapes like a Mandlebulb. Check out my Terrain_Rendering demo where I do only ray marching for the entire scene. I even use ray marching to trace a torus in my geometry demos, because as previously mentioned, it is prohibitively expensive to analytically trace and find the closest of 4 possible roots of a quartic equation. In contrast, ray marching a torus can be done in about 5 lines of code!

So the next topic in this post that I wanted to cover is how to manipulate the objects once you get them into your scene and are ray tracing them. Let’s say you want to make a simple rotating box scene in which the box has been stretched and scaled. Well, the hard way to do it would be finding a complex ray vs. oriented-scaled-box intersection routine. Likewise, if a triangular mesh in your scene needs to be rotated, positioned, or scaled, you would have to go back and change the entire geometry vertex list and resulting BVH in order to trace an arbitrarily transformed model like that. That sounds like a lot of uneccessary work, and turns out, it is!

The solution that follows is one of my favorite things about ray tracing. Instead of transforming objects and using complex ray intersection routines on them, what would happen if we instead transformed the ray itself by the inverse matrix of the objects? It totally works! What’s happening is that the ray is transformed into the object space of the shape we wish to have translated, rotated, and scaled, and the end result looks like you had used the complex, expensive ray vs. transformed-shape routine all along! I won’t go deep in the math, but we can just use the object’s matrix inverse to multiply against our ray and we get instant, perfect transformation. To see this effect in action, please take a look at my Transforming_Quadric_Geometry demo. In the eyes of the path tracer, none of those shapes are moving at all. Instead the rays are reverse transformed, but the result is the same - although way more efficient!

Finally, how can a library/engine like Babylon.js help us when we want to add shapes and triangular meshes to a ray traced scene? When loading triangle data, we can rely on the loaders of Babylon to correctly and efficiently load an entire triangular mesh off of a server or off disk. Writing such a loader is no easy task. When it comes to creating objects on the fly, as previously mentioned, it is not as straightforward as calling new Sphere() or new Cylinder() and expecting to see anything. However, the underlying library can assist us greatly because we can use its notion of matrix manipulation and matrix math routines like matrix.rotate() and matrix.inverse() to apply to all the rays that are intersecting arbitrarily transformed objects, even entire rigid triangular meshes.

Doing matrix math like rotation and inversion in the shader is too expensive when WebGL is concerned. Therefore, it is better to do all of those manipulations every animation frame on the Javascript side, or in other words, with Babylon’s routines, and then send the results over to the ray tracing shader via matrix uniforms. The really nice thing is that once you have hooked everything up, you can call the familiar Babylon.js translate, rotate, scale, lookAt, etc. on your triangle mesh or math shape (just an empty transform in that instance), every animation frame if you want, and things will just work!

In fact, as far as the path tracer can tell, all it receives from our .js app is a bunch of transforms - think 3d gizmos in a scene editor - they don’t really tell you what the object looks like, but they contain all the info you need about how you want the object to be placed in the scene: its position, rotation, and scale.

The short of it is, if you want traditional math shapes in your scene, you can create them and trace them in the shader directly. Then use an empty transform (gizmo) inside the .js engine to manipulate those shapes, and finally their inverse transform to ray trace them. If you want arbitrarily complex triangle meshes, it’s best to use the Babylon.js loaders to first load in the data, then also use the engine to give a transform (gizmo again) to every model that you want to be moved or rotated, then finally you have to use some kind of data structure like a BVH to be able to efficiently trace all those triangles.
Lastly, if If you need shapes like those found in nature, and you can’t triangulate them, or you don’t have an easy math equation for their surfaces, it is best to ray march a distance to those objects with a distance approximation equation, and knowingly pay the speed penalty, but having the satisfaction of pixel perfect tracing of those arbitrarily complex, organic, possibly fractal objects.

I hope I have sufficiently demonstrated the differences between the traditional graphics geometry pipeline and ray tracing geometry, and the different mindset and approaches necessary for scene geometry rendering. In the next post, we’ll take a look at lighting the scene, and the differences between the library’s notion of lights and the path tracer’s definition and handling of lights.