In this quick article, we'll see how the JVM makes sure to collect the unreachable but cyclic references.
First, we'll explore different types of GC algorithms. After that, we're going to see how the cyclic references are handled in the JVM.
It's also worth mentioning that GC is not part of the JVM specification and is left to the discretion of the implementor. Therefore, each JVM implementation may have different GC strategies or none at all.
In this article, we're focusing on one specific JVM implementation: the HotSpot JVM. We also may use the JVM and HotSpot JVM terms interchangeably throughout the article.
2. Reference Counting
Reference counting GC algorithms associate a reference count with each object. These algorithms consider an object to be alive as long as the number of references to that object is greater than zero. Usually, the runtime stores the reference count in the object header.
In a very naive implementation, each new reference to an object should trigger an atomic reference count increment. Likewise, each new dereference should trigger an atomic decrement.
The Swift programming language uses a form of reference counting for memory management. Also, there is no GC algorithm based on reference counting in the JVM.
2.1. Pros and Cons
On the bright side, reference counting can distribute memory management costs throughout the application lifecycle, as there are (almost) no periodic GC hiccups. Also, it can potentially destroy the objects as soon as their reference count reaches zero and become garbage.
Reference counting is no free lunch, either. In the naive implementation, updating the reference count can be inefficient as we need to increment or decrement it atomically. Few optimizations can make reference counting more efficient in this regard, such as deferred or buffered reference counting approaches.
However, there is still one serious issue with reference counting: it can't reclaim cyclic references.
For instance, suppose object A refers to object B and vice versa. Even if A and B become unreachable from the rest of the object graph, their reference count will never reach zero. That's because they still hold a reference to each other.
As it turns out, these sorts of cyclic references are pretty common in computer science. For example, let's consider the following doubly-linked list. At first, another object has a reference to the list:
The linked list is reachable from the object D, so it shouldn't be collected, and the reference counts are aligned with this expectation. Now, suppose the object D itself becomes unreachable:
Even though the linked list is also unreachable now, the reference count for its components is more than one. Therefore, with this naive reference counting implementation, the runtime won't consider this linked list as garbage, even though it is.
3. Tracing GCs
Tracing collectors will determine the objects' reachability by tracing them from a set of root objects, known as GC roots. If an object is reachable from a root object, either directly or indirectly, then it will be considered alive. Others are unreachable and candidates for collection:
Here's how a simple tracing collector works. Starting from the GC roots, it traverses the object graph recursively until there are no more gray objects left to visit. In the end, it considers all the white objects unreachable and candidates for collection. This is a simple depiction of the tri-color marking algorithm.
We can think of GC roots as objects that we're sure are alive. For instance, these are some GC roots in Java and JVM:
- Local variables or anything stack frames are referring to right now. These variables are used by currently executing methods, so we don't want to collect them
- Live threads
- Static variables
- Classes loaded by the system classloader
- JNI locals and globals
Tracing collectors, as opposed to reference counting collectors, will perform the collection process periodically. So, for most of the time, allocations and assignments should work fast. However, when the GC kicks off, there might be some hiccups.
On the bright side, these GC algorithms won't suffer from cyclic references. Instead of counting the references to each object, they traverse the object graph starting from the GC roots. Therefore, even if there are some cyclic references, objects will be collected as long as they're unreachable, as shown in the above diagram.
Quite interestingly, using a backup tracing collector in tandem with a reference counting GC is one of the conventional approaches to fix the cyclic references in reference counting.
3.1. The HotSpot JVM
All GC implementations in the HotSpot JVM, as of this writing, are tracing collectors, including CMS, G1, and ZGC. So, the JVM won't suffer from the cyclic reference issue. That's the key takeaway from this article!
In this quick article, we saw how the JVM handles cyclic references.
For a more detailed treatment of garbage collection, it's highly recommended to check out the garbage collection handbook.