JVM Anatomy Quark #23: Compressed References

The obvious limitation is the heap size. Once the heap gets larger than the threshold under which compressed references work, a surprising thing happens: references suddenly become uncompressed and take twice as much memory. Depending on how many references you have in the heap, this can significantly increase the perceived heap occupancy.
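To get a feel for the scale, here is a back-of-the-envelope sketch (not a JVM measurement) of the raw space taken by the reference slots alone: 4 bytes per compressed reference versus 8 bytes per uncompressed one. The 100M reference count is a made-up illustration, and headers and alignment padding would add more on top.

```java
// Back-of-the-envelope sketch: raw bytes taken by N reference slots,
// at 4 bytes (compressed) vs 8 bytes (uncompressed) per slot.
// Counts only the slots themselves, not object headers or padding.
public class RefSlotEstimate {
    static long slotBytes(long refs, int refSize) {
        return refs * refSize;
    }

    public static void main(String... args) {
        long refs = 100_000_000L; // hypothetical: 100M references in the heap
        System.out.println("compressed:   " + (slotBytes(refs, 4) >> 20) + " MB");
        System.out.println("uncompressed: " + (slotBytes(refs, 8) >> 20) + " MB");
    }
}
```

The slots alone double in size; the actual footprint increase also includes wider object headers, as the example below demonstrates.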

To illustrate that, let’s estimate how much heap is actually taken by allocating some objects, with a toy example like this:

import java.util.stream.IntStream;

public class RandomAllocate {
    static Object[] arr;

    public static void main(String... args) {
        int size = Integer.parseInt(args[0]);
        arr = new Object[size];
        IntStream.range(0, size).parallel().forEach(
            x -> arr[x] = new byte[(x % 20) + 1]
        );
        System.out.println("All done.");
    }
}

It is much more convenient to run this with Epsilon GC, which fails on heap exhaustion rather than trying to GC its way out. There is no point in GC-ing this example anyway, because all objects are reachable. Epsilon also prints heap occupancy stats for our convenience.

Let’s take a reasonable number of small objects. 800M objects sounds like enough? Run:

$ java -XX:+UseEpsilonGC -Xlog:gc -Xlog:gc+heap+coops -Xmx31g RandomAllocate 800000000
[0.004s][info][gc] Using Epsilon
[0.004s][info][gc,heap,coops] Heap address: 0x0000001000001000, size: 31744 MB, Compressed Oops mode: Non-zero disjoint base: 0x0000001000000000, Oop shift amount: 3
All done.
[2.380s][info][gc] Heap: 31744M reserved, 26322M (82.92%) committed, 26277M (82.78%) used

There, we took 26 GB to store those objects, good. Compressed references got enabled, so the references to those byte[] arrays are smaller now. But let’s suppose our friends who admin the servers said to themselves: "Hey, we have a gigabyte or two we can spare for our Java installation", and have bumped the old -Xmx31g to -Xmx33g. Then this happens:

$ java -XX:+UseEpsilonGC -Xlog:gc -Xlog:gc+heap+coops -Xmx33g RandomAllocate 800000000
[0.004s][info][gc] Using Epsilon
Terminating due to java.lang.OutOfMemoryError: Java heap space

Oopsies. Compressed references got disabled, because the heap size is too large. References became larger, and the dataset does not fit anymore. Let me say this again: the same dataset does not fit anymore, just because we requested an excessively large heap, even though we don’t actually use the extra space.

If we try to figure out the minimum heap size required to fit the dataset once we cross the 32 GB boundary, this turns out to be it:

$ java -XX:+UseEpsilonGC -Xlog:gc -Xlog:gc+heap+coops -Xmx36g RandomAllocate 800000000
[0.004s][info][gc] Using Epsilon
All done.
[3.527s][info][gc] Heap: 36864M reserved, 35515M (96.34%) committed, 35439M (96.13%) used

See, we used to take ~26 GB for the dataset, and now we are taking ~35 GB, about a 35% increase!
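A hedged arithmetic sketch can show where most of that increase comes from. Assuming typical HotSpot layouts — byte[] data starting at offset 16 with compressed references and 20 without, 8-byte object alignment, and a 4- vs 8-byte slot per element in arr (all of these are assumptions, not measurements) — we can estimate the average per-object cost:

```java
// Hedged arithmetic sketch: estimated footprint of 800M byte[(x % 20) + 1]
// instances plus their slots in the Object[] array, under compressed vs
// uncompressed references. Offsets 16/20 and 8-byte alignment are assumed
// typical HotSpot values, not measured ones.
public class FootprintDelta {
    static long align8(long v) {
        return (v + 7) & ~7L; // round up to 8-byte object alignment
    }

    // Average aligned size of byte[] instances with lengths 1..20,
    // given the offset at which array data starts (header + length field).
    static double avgArrayBytes(int baseOffset) {
        long total = 0;
        for (int len = 1; len <= 20; len++) {
            total += align8(baseOffset + len);
        }
        return total / 20.0;
    }

    public static void main(String... args) {
        long count = 800_000_000L;
        // per object: the byte[] itself, plus its reference slot in arr
        double compressed   = (avgArrayBytes(16) + 4) * count;
        double uncompressed = (avgArrayBytes(20) + 8) * count;
        System.out.printf("estimated compressed:   %.0f MB%n", compressed   / (1 << 20));
        System.out.printf("estimated uncompressed: %.0f MB%n", uncompressed / (1 << 20));
    }
}
```

Under these assumptions the sketch estimates roughly 26.2 GB for the compressed case, close to the measured 26277 MB, and roughly 31.7 GB for the uncompressed one; the remaining gap to the measured 35439 MB presumably goes to allocation slack and other overheads this simple model does not capture.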