And yes, outputting only the edge in the init pass still produces correct results, and could be used to draw an interior, exterior, or centered line if so wanted. Some of the links I posted above show examples of just that. But I don’t need that. I only want an exterior line. Plus the outline version of the init shader is slower.
Step 3: Profit!
The third and final pass takes the output of the second stage and gets the distance from the current pixel to the stored closest position. From that I can just use the target outline width to figure out the color that pixel should be. I.E.: if it’s less than the outline width, output the outline color, otherwise output a transparent value.
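As a rough illustration of that final pass, here is a minimal CPU-side sketch in Python rather than shader code. The function name and the tuple layout are hypothetical, and masking out the object's own interior is left out.

```python
import math

def outline_alpha_hard(pixel, closest, outline_width):
    """Return 1.0 (outline color) or 0.0 (transparent) for one pixel.

    `pixel` is the current pixel position and `closest` is the closest
    position stored by the flood passes, both as (x, y) in pixels.
    """
    dist = math.hypot(pixel[0] - closest[0], pixel[1] - closest[1])
    return 1.0 if dist < outline_width else 0.0

inside = outline_alpha_hard((10.5, 4.5), (7.5, 4.5), 5.0)   # 3 px from the edge
outside = outline_alpha_hard((10.5, 4.5), (2.5, 4.5), 5.0)  # 8 px from the edge
```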
But just doing the basic jump flood outline ends up a bit jagged, with no anti-aliasing.
So how do I support anti-aliasing? Well, for one, the third pass doesn’t use a hard on / off for the outline color output. Instead I let it use the distance to fade the edge over 1 pixel. That helps, but it’s not the biggest problem. The init shader pass I mentioned above is effectively aliasing the output of the mask, so even if I render that render texture with anti-aliasing enabled, it’ll still end up having a jagged edge since that’s all the init pass is outputting.
In the above example you can see some minor anti-aliasing in the step corners. That’s the anti-aliased distance fade. But that doesn’t help the edges that are almost straight with the screen vertical or horizontal. There the stair stepping is still quite obvious.
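A distance fade like that can be sketched as a one pixel wide linear ramp. This is only an illustration in Python: the function name is hypothetical, and centering the ramp on the outline width (the `+ 0.5`) is an assumption, not something taken from the actual shader.

```python
def outline_alpha_soft(dist, outline_width):
    """Fade the outline from opaque to transparent over one pixel."""
    # Assumed placement: the ramp is centered on the outline width, so a
    # pixel exactly `outline_width` away ends up half transparent.
    return min(1.0, max(0.0, outline_width - dist + 0.5))
```

With a 5 pixel outline, a pixel 4 pixels away is fully opaque, one exactly 5 pixels away is half transparent, and one 6 pixels away is fully transparent.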
I realized I could modify the init to make a guess at where the closest sub-pixel edge was based on the anti-aliased color. The basics of this is any mask texel that’s fully black I output the “no position” value. For any mask texel that’s white I output that pixel position. The values that aren’t fully black or white are the interesting ones. For that I check the texel values immediately around the current one and compare them to find the average direction.
For sub pixel position estimation, let’s just think about this on one axis to start. One thing to be mindful of here is that the position being output by a basic init pass is not actually on the geometry edge, but rather half a pixel inside the geometry edge.
This isn’t a problem though. It actually has some benefits we can come back to later. If I have an anti-aliased mask texel with a value of 0.5, it means the geometry was covering approximately half of that pixel. But from that single texel alone you don’t know which half. But we can make a good guess. By sampling the left and right texels we can estimate which direction the closest edge is in just by subtracting the right texel’s value from the left. If the right texel is 1 and left is 0, then I know the geometry is covering the right side. And we can adjust the half pixel inset position to be half a texel in that direction.
If the reverse is true then it’s covering the left side. If both the left and right values are the same, then the best guess we can make is that it’s centered on the pixel. I then do the same with the texels above and below. This gives me a directional offset to approximate the subpixel position, which I then add to the current pixel position and output in the init pass.
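The one axis case can be sketched in a few lines of Python. This is not the actual init shader. The function name is hypothetical, the sign convention (right minus left, so a positive value points toward the geometry) is my choice, and scaling the offset by one minus the coverage is an assumption, picked because it gives the half texel shift described above for a texel with 0.5 coverage.

```python
def estimate_edge_x(row, x):
    """Estimate the stored x position for texel `x` of a 1D coverage row.

    `row` holds anti-aliased mask values in 0..1. Returns None for the
    "no position" case.
    """
    def coverage(i):
        # Assumption: texels outside the mask count as empty.
        return row[i] if 0 <= i < len(row) else 0.0

    c = coverage(x)
    center = x + 0.5
    if c <= 0.0:
        return None      # fully black: no position
    if c >= 1.0:
        return center    # fully white: the texel center itself
    # Positive when the geometry is on the right, negative when it is on
    # the left, zero when both neighbors match (best guess: centered).
    direction = coverage(x + 1) - coverage(x - 1)
    # Assumed scaling: less coverage pushes the position further inward.
    return center + direction * (1.0 - c)
```

For the row `[0, 0, 0.5, 1, 1]` the half covered texel at index 2 gets the position 3.0. If the geometry really covers the right half of that texel, its edge sits at 2.5, so the estimate stays half a pixel inside the edge, just like the fully covered case.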
This looks quite good compared to the original brute force version. However just doing the horizontal and vertical axes meant some obvious artifacts in sharp corners where a single pixel could show up. Below you can see the corner has a “bubbled” look to it compared to the brute force approach. Though that too isn’t quite right and is a little too soft and rounded.
This is because on single pixel corners the estimation basically thinks this is a pixel floating by itself and doesn’t do any (or enough) offsetting. So I sampled the diagonals as well and added those with a slightly reduced weight. Then I normalize the resulting direction vector. If some of you are reading that and thinking “that sounds like a Sobel operator”: yep.
I had tried a couple of different weights on the diagonal and ended up recreating Sobel on accident.
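For reference, a Sobel based estimate could look something like the Python sketch below. It uses the standard Sobel weights (2 for the axis neighbors, 1 for the diagonals) and normalizes the result. The function name, the treatment of texels outside the mask as empty, the small-gradient cutoff, and the `1.0 - c` offset length are all assumptions for illustration, not the actual init shader.

```python
import math

def sobel_subpixel(mask, x, y):
    """Estimate a sub pixel position for texel (x, y) of a 2D coverage mask.

    `mask[y][x]` holds coverage in 0..1. Returns an (x, y) position in
    pixels, or None for the "no position" case.
    """
    h, w = len(mask), len(mask[0])

    def at(px, py):
        # Assumption: texels outside the mask count as empty.
        return mask[py][px] if 0 <= px < w and 0 <= py < h else 0.0

    c = at(x, y)
    center = (x + 0.5, y + 0.5)
    if c <= 0.0:
        return None
    if c >= 1.0:
        return center
    # Sobel gradient: points from low coverage toward high coverage.
    gx = (at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1)) \
       - (at(x - 1, y - 1) + 2 * at(x - 1, y) + at(x - 1, y + 1))
    gy = (at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1)) \
       - (at(x - 1, y - 1) + 2 * at(x, y - 1) + at(x + 1, y - 1))
    length = math.hypot(gx, gy)
    if length < 1e-4:
        return center    # no usable direction, e.g. an isolated texel
    scale = (1.0 - c) / length    # normalize, then apply the assumed offset
    return (center[0] + gx * scale, center[1] + gy * scale)
```

On a corner texel with geometry below and to the right, the estimate moves diagonally toward the geometry instead of staying near the texel center, which is the case the axis-only version handles poorly.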
This ends up being very close to the original brute force approach. And because of the sub pixel estimation and the fact it doesn’t fade out on anti-aliased corners, it may actually be closer to the ground truth.
Earlier I mentioned that the init pass outputs a position that is half a pixel inset from the real edge, and that this was actually an advantage. To see why, imagine a line that’s only a single pixel wide, or narrower.
In the above example, the estimated edge doesn’t line up with the real geometry. But we can only store a single position per texel, and we can only see that single texel line. If we were attempting to store the actual edge position, in the above case there are two edges and we’d have to pick one. The result would be the nearest distance wouldn’t be centered and the outline would be wider on one side than the other. For wide edges, that probably wouldn’t be obvious. But a 1 pixel outline? That would be obvious as only one side of the line would get an outline. This is why storing a half-pixel inset position is beneficial. We can still draw a correct 1 pixel outline on a 1 pixel wide shape by using a 1.5 pixel distance. This is something even the brute force method has a hard time with.
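The arithmetic behind that can be checked with a tiny one dimensional Python example. Assume a line exactly one pixel wide filling column 5 (so it spans x = 5 to x = 6), a hard distance cutoff instead of a fade, and a hypothetical helper that lists which pixel columns fall within the cutoff.

```python
def covered_columns(stored_x, threshold, columns):
    """Columns whose pixel center lies within `threshold` of `stored_x`."""
    return [i for i in columns if abs((i + 0.5) - stored_x) < threshold]

columns = range(0, 11)

# Half-pixel inset position: the center of the line's own texel.
# A 1 pixel outline then uses a 1.5 pixel distance.
inset = covered_columns(5.5, 1.5, columns)

# Storing a real edge instead means picking one of the two (here the left
# edge at x = 5.0) and using a 1 pixel distance.
left_edge = covered_columns(5.0, 1.0, columns)
```

With the inset position the result is columns 4, 5 and 6: the line itself plus one outline pixel on each side. With the left edge the result is only columns 4 and 5, so the right side of the line gets no outline at all.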
It’s not perfect. In the above image, if you look closely, you can see the outline fade out a little where it shouldn’t, slightly down from the top where the line transitions from 25% coverage to 50% coverage. Doing a little more work in the init could probably fix this, but this is a rare enough scenario that I’m happy leaving it as is. I mean, it’s pretty darn good as is, and the flaw is only obvious when zoomed in this far.
In comparison the brute force method just fades out entirely.
That’s clearly wrong, even when not zoomed in.
This is already incredibly fast, but it didn’t stop me from at least attempting some further optimizations.
I found only doing the diagonals and not doing a full Sobel gave very nearly identical results in most situations. And it was faster! This produced a compiled shader that had roughly half as many instructions! Nice optimization right?
Well, not really. It was technically faster, but trivially so. It was faster by about a single microsecond. That’s one millionth of a second faster, less than the margin of error in profiling. Less than half a percent of the effect’s cost for a single pixel radius outline. It also caused the lines to get slightly too wide on diagonal edges. So while it was faster, I still do the whole Sobel operation as it doesn’t meaningfully impact the performance. The init shader pass was also already the least expensive of the three, so this was more an attempt to squeeze water from a stone.
The jump flood passes take up most of the time, so that’d be a better place to look to optimize. And I did think of a pretty simple one here. The basic idea is you’re looking for the closest of the 9 samples by comparing the distance from the current pixel. Basically all examples you’ll find, including those I link above, use a length() call and compare the results. But you don’t need the linear distance. You can get away with the squared distance, which is cheaper to compute with a dot product. That removes 9 square roots from the shader, so that should be a good savings.
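The squared distance trick works because squaring doesn’t change which candidate is closest. Here is a small CPU-side sketch of one flood pass in Python that picks the closest of the 9 samples that way. The grid layout and function name are hypothetical, and this is an illustration of the technique rather than the actual shader.

```python
def jump_flood_pass(grid, step):
    """One jump flood pass.

    `grid[y][x]` holds a stored (px, py) position or None for "no
    position". Each texel keeps the closest of its 9 samples.
    """
    h, w = len(grid), len(grid[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best, best_d2 = None, float("inf")
            for oy in (-step, 0, step):
                for ox in (-step, 0, step):
                    sx, sy = x + ox, y + oy
                    if not (0 <= sx < w and 0 <= sy < h):
                        continue
                    cand = grid[sy][sx]
                    if cand is None:
                        continue
                    dx = cand[0] - (x + 0.5)
                    dy = cand[1] - (y + 0.5)
                    d2 = dx * dx + dy * dy    # dot(d, d): no square root
                    if d2 < best_d2:
                        best, best_d2 = cand, d2
            out[y][x] = best
    return out

# A single seed on an 8x8 grid, flooded with steps 4, 2, 1.
seed = (2.5, 3.5)
grid = [[None] * 8 for _ in range(8)]
grid[3][2] = seed
for step in (4, 2, 1):
    grid = jump_flood_pass(grid, step)
```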
Again, nope. I could not measure a difference here at all, not even a single microsecond. More or less any kind of clever shader change I attempted here resulted in no change, or made things slower. Both of these shader passes are completely hardware texture unit limited. I still use this optimization though, because it does no harm.
Render Texture Format
The biggest savings I got was to use a half precision (RenderTextureFormat.RGHalf) render texture instead of a full precision one (RenderTextureFormat.RGFloat). That cut the memory requirements in half while still having more than enough precision. That alone dropped the cost of the init pass from 47 μs to 28 μs, and the jump flood pass from 75 μs to 67 μs. Not a lot, but for a 50 pixel radius outline the entire effect dropped from 630 μs to 565 μs. That’s at least a measurable improvement of around 65 μs, even if they’re both effectively 0.6 ms. However R16G16_SFloat (the same format as RGHalf) had some weird precision issues, which caused the outline to be slightly offset in the subpixel position, even when not using the subpixel estimation. So I swapped to R16G16_SNorm instead, which removed the issues while still retaining the same memory footprint. This is ever so slightly slower than the […]
R16G16_UNorm also works, and is technically twice the precision for the 0.0 to 1.0 range I was using, but it requires a small amount of extra math to scale the encoded range that added about 8 μs to the same 50 pixel radius test. And you could use the R16G16_SNorm with a similar bit of extra math to get the same precision at the same relative cost increase. For 1080p, I didn’t think it was necessary.
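To put numbers on the “twice the precision” comment, here is a small Python sketch of 16 bit normalized quantization. It follows the usual normalized integer conversion (scale by 32767 or 65535 and round to nearest), but the helper names are hypothetical and it is not tied to any Unity API.

```python
import math

def encode_snorm16(v):
    """Quantize v in -1..1 to a signed 16 bit code."""
    return math.floor(min(1.0, max(-1.0, v)) * 32767 + 0.5)

def decode_snorm16(i):
    return max(-1.0, i / 32767)

def encode_unorm16(v):
    """Quantize v in 0..1 to an unsigned 16 bit code."""
    return math.floor(min(1.0, max(0.0, v)) * 65535 + 0.5)

def decode_unorm16(i):
    return i / 65535

# The "extra math" variant: stretch 0..1 over the whole signed range.
def encode_snorm16_remapped(v):
    return encode_snorm16(v * 2.0 - 1.0)

def decode_snorm16_remapped(i):
    return decode_snorm16(i) * 0.5 + 0.5

# Number of steps each option has available across the 0..1 range.
steps_snorm_direct = encode_snorm16(1.0) - encode_snorm16(0.0)
steps_unorm = encode_unorm16(1.0) - encode_unorm16(0.0)
steps_snorm_remapped = encode_snorm16_remapped(1.0) - encode_snorm16_remapped(0.0)

# Step size in pixels when 0..1 spans a 1920 pixel wide screen.
px_snorm_direct = 1920 / steps_snorm_direct
px_unorm = 1920 / steps_unorm
```

Storing 0.0 to 1.0 directly in SNorm only uses the positive half of the codes, so it has 32767 steps where UNorm has 65535 and remapped SNorm has 65534. Across 1920 pixels that is a step of roughly 0.06 pixels versus roughly 0.03 pixels.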
Separable Axis JFA
The next optimization came in the form of an idea proposed by Alan Wolfe (aka demofox), whose article I linked above.
This splits up each of the flood passes into two separate passes, each only doing one axis at a time. This nearly doubles the total number of passes required for the effect, but reduces the number of texture samples by a third. Amazingly it is faster! By about 35 μs total, bringing that 565 μs down to 530 μs. Honestly I have no idea if it’ll be faster on all GPUs though.
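The split can be sketched on the CPU in Python like this. Each step runs a horizontal pass that reads 3 samples and then a vertical pass that reads 3 samples, instead of one pass reading 9. The function names and grid layout are hypothetical, and this is a sketch of the idea rather than the actual shader passes.

```python
def nearest(candidates, x, y):
    """Pick the stored position closest to the center of texel (x, y)."""
    best, best_d2 = None, float("inf")
    for cand in candidates:
        if cand is None:
            continue
        dx, dy = cand[0] - (x + 0.5), cand[1] - (y + 0.5)
        d2 = dx * dx + dy * dy
        if d2 < best_d2:
            best, best_d2 = cand, d2
    return best

def axis_pass(grid, step, horizontal):
    """One single-axis flood pass: 3 samples per texel."""
    h, w = len(grid), len(grid[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            cands = []
            for o in (-step, 0, step):
                sx, sy = (x + o, y) if horizontal else (x, y + o)
                if 0 <= sx < w and 0 <= sy < h:
                    cands.append(grid[sy][sx])
            out[y][x] = nearest(cands, x, y)
    return out

def separable_jump_flood(grid, steps):
    for step in steps:
        grid = axis_pass(grid, step, horizontal=True)
        grid = axis_pass(grid, step, horizontal=False)
    return grid

seed = (2.5, 3.5)
start = [[None] * 8 for _ in range(8)]
start[3][2] = seed
flooded = separable_jump_flood(start, (4, 2, 1))
```

For three step sizes the full version reads 3 × 9 = 27 samples per texel, while the split version reads 6 × 3 = 18, which is the one third reduction.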
There is one more “easy” optimization. Like the Gaussian blur approach this technique lends itself well to relatively easy downsampling. For very wide outlines or higher resolutions it wouldn’t be too difficult to render the starting mask at a lower resolution, or downsample it for better quality, and run the JFA at that lower resolution. Especially with the added sub pixel estimation in the initialize pass. I did actually try this and it can really significantly improve performance as you would expect.
But also like the Gaussian blur it means some of the details get smoothed out. There’s also a gotcha with the final outline pass. You can’t just use bilinear sampling on the output from the jump flood passes as it’s interpolating an offset position. This produces very weird artifacts in any interior corners. So instead you need to add one more pass to decode the distance field to a texture for the final outline pass to sample from. And even then you may want to add some kind of higher quality bicubic or catmull-rom filtering to hide the linear artifacts of bilinear filtering. I don’t implement this. Might be a good option for supporting very high resolutions more easily.
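A tiny numeric example in Python shows why interpolating the stored positions goes wrong. The numbers are made up: two neighboring low resolution texels whose closest edges are on opposite sides, with the sample point halfway between them.

```python
import math

def lerp(a, b, t):
    return a + (b - a) * t

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Texel centers and the closest edge position each one stores.
center_a, closest_a = (4.0, 0.0), (0.0, 0.0)
center_b, closest_b = (6.0, 0.0), (10.0, 0.0)
sample = (5.0, 0.0)    # halfway between the two texels
t = 0.5

# Bilinear filtering the stored positions, then measuring the distance.
blended_position = (lerp(closest_a[0], closest_b[0], t),
                    lerp(closest_a[1], closest_b[1], t))
dist_from_blended_position = distance(sample, blended_position)

# Decoding each texel to a distance first, then filtering the distances.
dist_from_blended_distances = lerp(distance(center_a, closest_a),
                                   distance(center_b, closest_b), t)
```

The blended position lands exactly on the sample point, so the first approach reports a distance of 0, as if the sample were sitting on an edge. Filtering the decoded distances gives 4. The true distance is 5, so some linear filtering error remains, but nothing like the first result.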
A compute shader would likely be faster at this than the render texture approach I’m currently using. But that’s a task for another day.
So, obviously I failed at making a brute force outline that could compete with a more efficient approach. But I hope you enjoyed reading about my trip as much as I did. And now we have a much more efficient, and more adaptable outline technique. Mainly because the jump flood based approach can do so much more. For example you could use a gradient instead of a simple edge. Even an animated gradient if you really want to be fancy.
You could also do interesting things like render out a depth texture and composite the outline into the scene with the depth of the closest edge! But I’ll leave you to play with that.