How Garbage Collection Really Works

Java Memory Management

Java Memory Management, with its built-in garbage collection, is one of the language’s finest achievements. It allows developers to create new objects without worrying explicitly about memory allocation and deallocation, because the garbage collector automatically reclaims memory for reuse. This enables faster development with less boilerplate code, while eliminating memory leaks and other memory-related problems. At least in theory.

Ironically, Java garbage collection seems to work too well, creating and removing too many objects. Most memory-management issues are solved, but often at the cost of creating serious performance problems. Making garbage collection adaptable to all kinds of situations has led to a complex and hard-to-optimize system. In order to wrap your head around garbage collection, you need first to understand how memory management works in a Java Virtual Machine (JVM).

How Garbage Collection Really Works

Many people think garbage collection collects and discards dead objects. In reality, Java garbage collection is doing the opposite! Live objects are tracked and everything else designated garbage. As you’ll see, this fundamental misunderstanding can lead to many performance problems.

Let's start with the heap, which is the area of memory used for dynamic allocation. In most configurations the operating system allocates the heap in advance to be managed by the JVM while the program is running. This has a couple of important ramifications:

  • Object creation is faster because global synchronization with the operating system is not needed for every single object. An allocation simply claims some portion of a memory array and moves the offset pointer forward (see Figure 2.1). The next allocation starts at this offset and claims the next portion of the array.
  • When an object is no longer used, the garbage collector reclaims the underlying memory and reuses it for future object allocation. This means there is no explicit deletion and no memory is given back to the operating system.
New objects are simply allocated at the end of the used heapFigure 2.1: New objects are simply allocated at the end of the used heap.

 

All objects are allocated on the heap area managed by the JVM. Every item that the developer uses is treated this way, including class objects, static variables, and even the code itself. As long as an object is being referenced, the JVM considers it alive. Once an object is no longer referenced and therefore is not reachable by the application code, the garbage collector removes it and reclaims the unused memory. As simple as this sounds, it raises a question: what is the first reference in the tree?

Garbage-Collection Roots — The Source of All Object Trees

Every object tree must have one or more root objects. As long as the application can reach those roots, the whole tree is reachable. But when are those root objects considered reachable? Special objects called garbage-collection roots (GC roots; see Figure 2.2) are always reachable and so is any object that has a garbage-collection root at its own root.

There are four kinds of GC roots in Java:

  1. Local variables are kept alive by the stack of a thread. This is not a real object virtual reference and thus is not visible. For all intents and purposes, local variables are GC roots.
  2. Active Java threads are always considered live objects and are therefore GC roots. This is especially important for thread local variables.
  3. Static variables are referenced by their classes. This fact makes them de facto GC roots. Classes themselves can be garbage-collected, which would remove all referenced static variables. This is of special importance when we use application servers, OSGi containers or class loaders in general. We will discuss the related problems in the Problem Patterns section.
  4. JNI References are Java objects that the native code has created as part of a JNI call. Objects thus created are treated specially because the JVM does not know if it is being referenced by the native code or not. Such objects represent a very special form of GC root, which we will examine in more detail in the Problem Patterns section below.
GC Roots are objects that are themselves referenced by the JVM and thus keep every other object from being garbage collected.Figure 2.2: GC roots are objects that are themselves referenced by the JVM and thus keep every other object from being garbage-collected.

 

Therefore, a simple Java application has the following GC roots:

  • Local variables in the main method
  • The main thread
  • Static variables of the main class

Marking and Sweeping Away Garbage

To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. As you might intuit, it’s a straightforward, two-step process:

  1. The algorithm traverses all object references, starting with the GC roots, and marks every object found as alive.
  2. All of the heap memory that is not occupied by marked objects is reclaimed. It is simply marked as free, essentially swept free of unused objects.

Garbage collection is intended to remove the cause for classic memory leaks: unreachable-but-not-deleted objects in memory. However, this works only for memory leaks in the original sense. It’s possible to have unused objects that are still reachable by an application because the developer simply forgot to dereference them. Such objects cannot be garbage-collected. Even worse, such a logical memory leak cannot be detected by any software (see Figure 2.3). Even the best analysis software can only highlight suspicious objects. We will examine memory leak analysis in the Analyzing the Performance Impact of Memory Utilization and Garbage Collection section, below.

When objects are no longer referenced directly or indirectly by a GC root, they will be removed. There are no classic memory leaks. Analysis cannot really identify memory leaks, it can only hint at suspicious objectsFigure 2.3: When objects are no longer referenced directly or indirectly by a GC root, they will be removed. There are no classic memory leaks. Analysis cannot really identify memory leaks; it can only point out suspicious objects.
 
 
 
 
 

引用计数收集器

引用计数是垃圾收集器中的早期策略。在这种方法中,堆中每个对象(不是引用)都有一个引用计数。当一个对象被创建时,且将该对象分配给一个变量,该变量计数设置为1。当任何其它变量被赋值为这个对象的引用时,计数加1(a = b,则b引用的对象+1),但当一个对象的某个引用超过了生命周期或者被设置为一个新值时,对象的引用计数减1。任何引用计数为0的对象可以被当作垃圾收集。当一个对象被垃圾收集时,它引用的任何对象计数减1。

优点:引用计数收集器可以很快的执行,交织在程序运行中。对程序不被长时间打断的实时环境比较有利。

缺点: 无法检测出循环引用。如父对象有一个对子对象的引用,子对象反过来引用父对象。这样,他们的引用计数永远不可能为0.

var o = { 
  a: {
    b:2
  }
}; 
// 两个对象被创建,一个做为另一个的属性被引用,另一个被分配给变量o
// 很显然,没有一个可以被垃圾收集


var o2 = o; // o2变量是第二个对“这个对象”的引用
o = 1; // 现在,“这个对象”的原始引用o被o2替换了

var oa = o2.a; // 引用“这个对象”的a属性
// 现在,“这个对象”有两个引用了,一个是o2,一个是oa

o2 = "yo"; // 最初的对象现在已经是零引用了
// 他可以被垃圾回收了
// 然而它的属性a的对象还在被oa引用,所以还不能回收

oa = null; // a属性的那个对象现在也是零引用了
// 它可以被垃圾回收了
   public void buidDog()
        {
           Dog newDog = new Dog();
          Tail newTail = new Tail();
          newDog.tail = newTail;
         newTail.dog = newDog;
        }
        在这里,newTail中拿着对newDog的引用,newDog中拿着对newTail的引用。如果newDog要被回收,前提是newTail被先回收,这样才能释放对newDog的引用。但是反回过来,newTail要被回收的前提是newDog要被先回收。当buildDog函数退出后,看起来垃圾回收管理似乎就始终无法回收这两个实际已经不再需要的对象。

 

posted on 2013-06-13 07:53  brave_bo  阅读(194)  评论(0)    收藏  举报

导航