Stay with me, I’m going somwhere with this…
OK, so last time we talked about generational garbage collection, and the write barriers necessary to allow collecting the young generation by itself. This time we’re going to talk about two strategies for decreasing pause times.
The first strategy was for decreasing old generation pause times. The insight was that young generation pause times are short, but old generation pauses are too disruptive to the application. What if we could do some of the work (the marking of live objects) while the Java threads were running and thereby decrease old generation pause times? This work became the Concurrent Mark Sweep (CMS) collector. There’s one non-obvious problem with marking the heap while the Java threads are running. If you store a pointer into an already marked object you need to be sure that the object referenced by the stored pointer gets scanned. There are two ways to deal with this problem: incremental update, and Snapshot at The Beginning (SATB) CMS choose to use SATB. They enforced this with a write barrier that records updates to objects and makes sure that the references written to objects get scanned.
So, CMS does the marking of live objects concurrently with the Java threads. It stops the world to do the initial scanning of thread stacks and other roots, and then stops the world again to make sure all the updates were processed and to do the sweeping and updating of free lists. Because it sweeps up free space into free lists it has the downside of possibly leaving a fragmented heap where there’s enough space for an object, but the space isn’t contiguous. You can read more about the CMS collector here.
The second collector of that time was the parallel collector. You can read about it in the paper: Parallel Garbage Collection for Shared Memory Multiprocessors.
The basic idea here was that if you are going to stop the world to do a garbage collection it makes sense to use all the available processors. We used a dynamic load balancing algorithm called work stealing to partition the scanning of the object graph. We were faster on two processors that the optimized collector was on 1 and we scaled linearly from there.
Next up: Garbage First (G1) or Take what we learned from CMS, the parallel collector, and the Train algorithm and make “one collector to rule them all”.