memory management musings

Just putting my JavaOne/Strange Loop slides out there.

Slides from Java One

Stay with me, I’m going somewhere with this…

OK, so last time we talked about generational garbage collection, and the write barriers necessary to allow collecting the young generation by itself.  This time we’re going to talk about two strategies for decreasing pause times.


The first strategy was for decreasing old generation pause times.  The insight was that young generation pause times are short, but old generation pauses are too disruptive to the application.  What if we could do some of the work (the marking of live objects) while the Java threads were running and thereby decrease old generation pause times?  This work became the Concurrent Mark Sweep (CMS) collector.  There’s one non-obvious problem with marking the heap while the Java threads are running.  If you store a pointer into an already marked object you need to be sure that the object referenced by the stored pointer gets scanned.  There are two ways to deal with this problem: incremental update and Snapshot At The Beginning (SATB).  CMS chose to use SATB.  They enforced this with a write barrier that records updates to objects and makes sure that the references written to objects get scanned.
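To make the barrier idea concrete, here is a minimal sketch of an SATB-style pre-write barrier.  All the names (`SatbBarrier`, `writeField`, `satbQueue`) are hypothetical, and a real JVM does this in compiled code, not in Java, but the shape is the same: before a reference field is overwritten while marking is active, the old value is logged so the marker can still reach everything that was live when the snapshot was taken.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of an SATB pre-write barrier (all names hypothetical).
class SatbBarrier {
    static class Obj { Obj field; }

    // References captured by the barrier, drained later by the marking threads.
    static final Deque<Obj> satbQueue = new ArrayDeque<>();
    static boolean markingActive = true;

    // Every reference store goes through this barrier while marking is active.
    static void writeField(Obj holder, Obj newValue) {
        Obj old = holder.field;
        if (markingActive && old != null) {
            satbQueue.push(old);   // remember the snapshot-time referent
        }
        holder.field = newValue;   // the actual store
    }
}
```

The key point is that the barrier records the *old* value: the reference that existed at the start of marking is what must not be lost.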

So, CMS does the marking of live objects concurrently with the Java threads.  It stops the world to do the initial scanning of thread stacks and other roots, and then stops the world again to make sure all the updates were processed and to do the sweeping and updating of free lists.  Because it sweeps free space up into free lists, it has the downside of possibly leaving a fragmented heap: there may be enough total free space for an object, but not in one contiguous chunk.  You can read more about the CMS collector here.

The second collector of that time was the parallel collector. You can read about it in the paper:  Parallel Garbage Collection for Shared Memory Multiprocessors.  

The basic idea here was that if you are going to stop the world to do a garbage collection it makes sense to use all the available processors.  We used a dynamic load balancing algorithm called work stealing to partition the scanning of the object graph.  We were faster on two processors than the optimized collector was on one, and we scaled linearly from there.
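A toy illustration of the work-stealing idea, with all names made up for this sketch: each GC worker drains scanning tasks from its own deque, and when it runs dry it steals from the opposite end of another worker's deque, which keeps the workers out of each other's way.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy sketch of work stealing (sequentialized for clarity; names hypothetical).
class WorkStealing {
    // A worker scans its own tasks LIFO, then steals FIFO from a victim.
    static int runWorker(Deque<Integer> mine, Deque<Integer> victim) {
        int scanned = 0;
        while (true) {
            Integer task = mine.pollLast();      // own work: pop from the bottom
            if (task == null) {
                task = victim.pollFirst();       // steal from the victim's top
                if (task == null) break;         // everyone is out of work
            }
            scanned++;                           // "scan" the object (stand-in)
        }
        return scanned;
    }
}
```

Stealing from the opposite end of the victim's deque is the load-balancing trick: the owner and the thief rarely touch the same element, so contention stays low.  (This is the same deque discipline `java.util.concurrent.ForkJoinPool` later popularized.)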


Next up:  Garbage First (G1) or Take what we learned from CMS, the parallel collector, and the Train algorithm and make “one collector to rule them all”.

Only the Good Die Young

It’s called many things in the literature, but we’ll use “generational hypothesis”.  The insight is that most objects survive only for a short while, or a small number of GC cycles.  Conversely, any object that has survived several GC cycles will probably survive many more.  If your program behaves in this fashion, and a lot of Lisp and Java programs did, it makes sense to focus your GC effort on efficiently collecting those young objects.  You can do this by partitioning your heap into multiple generations: a small, frequently collected young generation and a larger, less frequently collected old generation.

The astute reader (all of you, I’m sure) will be wondering how we can collect the young generation without having to collect the much larger old generation.  We do this by keeping track of references from the old generation to the young generation.  When we garbage collect the young generation we also have to look at the objects in the old generation that point into the young generation.  We can keep a little cheat sheet of the locations of these objects, which is called a card table.  Each card represents an area of memory.  If there is an object in that memory that points to the young generation, then that card is marked.  When it is time to perform a young generation garbage collection you only need to scan the areas represented by marked cards, not the entire old generation.  This card table must be kept up to date, and the only way to do that is to add a tax to writes which updates the card table when appropriate.  We call this tax a write barrier.
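The card table itself is tiny.  Here is a minimal sketch, assuming 512-byte cards (the size HotSpot happens to use; everything else here, including the class and method names, is made up for illustration): the barrier just shifts the written address right and sets one byte.

```java
// Sketch of a card table and its write barrier (names hypothetical).
// Each byte in 'cards' covers a 512-byte region of the heap; a young GC
// scans only the regions whose cards are dirty.
class CardTable {
    static final int CARD_SHIFT = 9;              // 2^9 = 512-byte cards
    static final byte CLEAN = 0, DIRTY = 1;
    final byte[] cards;

    CardTable(long heapBytes) {
        cards = new byte[(int) (heapBytes >> CARD_SHIFT) + 1];
    }

    // The write barrier: dirty the card covering the written address.
    // This is the whole "tax" on every reference store.
    void postWriteBarrier(long address) {
        cards[(int) (address >> CARD_SHIFT)] = DIRTY;
    }

    boolean isDirty(long address) {
        return cards[(int) (address >> CARD_SHIFT)] == DIRTY;
    }
}
```

Notice the granularity trade-off: marking a card doesn’t say *which* object in those 512 bytes holds the young-generation pointer, only that something in that region might, so the collector rescans the whole card.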

The first high performance GCs in the JVM were generational collectors.  ExactVM had a semi-space collector and a mark/compact old generation.  HotSpot had an eden (where objects are allocated), a semi-space collector, and a mark/compact old generation.

Next up will be several approaches to making GC pauses shorter.

Kick Butt, Eat Lunch, and Take out the Garbage

That was the motto of the Sun Labs GC Research group which I joined back in 1998.  After spending 5 years working on Fortress, the best programming language no one has ever worked with, I’m back to doing GC work again.  I’d love to tell you about Shenandoah, the new GC we’re working on at Red Hat, but before I do I’m going to give you a GC history lesson.

Back in 1998 when I joined Sun, the only JVM that I knew of was the reference VM.  This VM had what was called a conservative garbage collector.  You had to guess whether values on the stack were pointers or not: if a value looked like a correctly aligned address in allocated memory, it was considered a pointer.  This GC has the benefit that it is easy to implement, but it has a couple of deficits.  The first downside is that you can’t move any object that is referenced from the stack.  This means that you can’t compact the live objects, and that you can’t do fast “pointer bump” allocation.  This is huge.  The cost of having to maintain free lists, and more importantly to find a free block at allocation time, was too high.  The other downside was that integers that looked like pointers could keep dead objects alive.
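The “guess” is really just an alignment-and-range check.  A minimal sketch, with an assumed 8-byte object alignment and hypothetical names throughout: any stack word that is aligned and falls inside the heap has to be treated as a pointer, which is exactly how an unlucky integer can pin a dead object.

```java
// Sketch of a conservative collector's pointer test (illustrative only).
class ConservativeScan {
    static final long ALIGN = 8;   // assumed object alignment

    // A stack word is treated as a pointer if it is properly aligned
    // and lands inside the allocated heap range [heapStart, heapEnd).
    static boolean looksLikePointer(long word, long heapStart, long heapEnd) {
        return word % ALIGN == 0 && word >= heapStart && word < heapEnd;
    }
}
```

Both failure modes from the text fall out of this check: the collector can never prove an aligned in-range word *isn’t* a pointer (so it can’t move the object it appears to reference), and a plain integer that happens to pass the check keeps whatever it “points” at alive.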

My group (I had little to do with this) ripped the guts out of the reference VM and added safepoints and stackmaps.  Stackmaps are a contract between the compiler and the garbage collector.  The compiler already has this information, so it only makes sense for it to share it with the GC.  Stackmaps tell you the type of every location on the stack.  Maintaining them all the time is expensive and inhibits many compiler optimizations, so we need safepoints.  For example, the compiler is allowed to keep pointers into the middle of objects between safepoints, but at safepoints the stack must be consistent.  Safepoints are frequent enough (allocations, backward branches, etc.) that waiting to get to a safepoint to GC is reasonable.

Safepoints and Stackmaps enabled the first generational copying collector which I’ll talk about in my next post.