Studying Java's Four Reference Types (Part 1): the "Understanding Weak References" Blog Post
Some time ago I was interviewing candidates for a Senior Java Engineer position. Among the many questions I asked was "What can you tell me about weak references?" I wasn't expecting a detailed technical treatise on the subject. I would probably have been satisfied with "Umm... don't they have something to do with garbage collection?" I was instead surprised to find that out of twenty-odd engineers, all of whom had at least five years of Java experience and good qualifications, only two of them even knew that weak references existed, and only one of those two had actual useful knowledge about them. I even explained a bit about them, to see if I got an "Oh yeah" from anybody -- nope. I'm not sure why this knowledge is (evidently) uncommon, as weak references are a massively useful feature that has been around since Java 1.2 was released, over seven years ago.
Now, I'm not suggesting you need to be a weak reference expert to qualify as a decent Java engineer. But I humbly submit that you should at least know what they are -- otherwise how will you know when you should be using them? Since they seem to be a little-known feature, here is a brief overview of what weak references are, how to use them, and when to use them.
Strong references
First I need to start with a refresher on strong references. A strong reference is an ordinary Java reference, the kind you use every day. For example, the code:
StringBuffer buffer = new StringBuffer();
creates a new StringBuffer() and stores a strong reference to it in the variable buffer. Yes, yes, this is kiddie stuff, but bear with me. The important part about strong references -- the part that makes them "strong" -- is how they interact with the garbage collector. Specifically, if an object is reachable via a chain of strong references (strongly reachable), it is not eligible for garbage collection. As you don't want the garbage collector destroying objects you're working on, this is normally exactly what you want.
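My understanding: the references we ordinarily create with new are strong references, and an object cannot be garbage collected while it is reachable through them.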
When strong references are too strong
It's not uncommon for an application to use classes that it can't reasonably extend. The class might simply be marked final, or it could be something more complicated, such as an interface returned by a factory method backed by an unknown (and possibly even unknowable) number of concrete implementations. Suppose you have to use a class Widget and, for whatever reason, it isn't possible or practical to extend Widget to add new functionality.
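My understanding: we often use classes that cannot be extended, such as final classes, or interfaces returned by a factory method whose number of concrete implementations is unknown. Suppose you have to use a Widget class but cannot add new functionality to it.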
What happens when you need to keep track of extra information about the object? In this case, suppose we find ourselves needing to keep track of each Widget's serial number, but the Widget class doesn't actually have a serial number property -- and because Widget isn't extensible, we can't add one. No problem at all, that's what HashMaps are for:
serialNumberMap.put(widget, widgetSerialNumber);
This might look okay on the surface, but the strong reference to widget will almost certainly cause problems. We have to know (with 100% certainty) when a particular Widget's serial number is no longer needed, so we can remove its entry from the map. Otherwise we're going to have a memory leak (if we don't remove Widgets when we should) or we're going to inexplicably find ourselves missing serial numbers (if we remove Widgets that we're still using). If these problems sound familiar, they should: they are exactly the problems that users of non-garbage-collected languages face when trying to manage memory, and we're not supposed to have to worry about this in a more civilized language like Java.
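My understanding: suppose you need to keep track of each widget's serial number, but Widget has no serial number property and cannot be extended, so all you can do is use a HashMap. You then have to know with 100% certainty when a particular object's serial number is no longer needed (for example, because the object's lifetime has ended), so that you can remove its entry from the map. If you fail to remove the reference when you should, the garbage collector will never reclaim the object, which is a memory leak. If you remove the reference to an object you are still using too early, you will find that you have lost information.
Another common problem with strong references is caching, particularly with very large structures like images. Suppose you have an application which has to work with user-supplied images, like the web site design tool I work on. Naturally you want to cache these images, because loading them from disk is very expensive and you want to avoid the possibility of having two copies of the (potentially gigantic) image in memory at once.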
Because an image cache is supposed to prevent us from reloading images when we don't absolutely need to, you will quickly realize that the cache should always contain a reference to any image which is already in memory. With ordinary strong references, though, that reference itself will force the image to remain in memory, which requires you (just as above) to somehow determine when the image is no longer needed in memory and remove it from the cache, so that it becomes eligible for garbage collection. Once again you are forced to duplicate the behavior of the garbage collector and manually determine whether or not an object should be in memory.
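My understanding: another common problem with strong references is caching, for example an image cache. An image cache is supposed to stop us from loading the same image more than once, so it holds a reference to every image already in memory. With ordinary strong references, the reference itself keeps the image in memory, so the programmer (just as above) has to decide when to remove the reference from the cache before the object can be reclaimed. You end up no longer letting the GC manage collection for you and managing memory by hand instead.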
Weak references
A weak reference, simply put, is a reference that isn't strong enough to force an object to remain in memory. Weak references allow you to leverage the garbage collector's ability to determine reachability for you, so you don't have to do it yourself. You create a weak reference like this:
WeakReference<Widget> weakWidget = new WeakReference<Widget>(widget);
and then elsewhere in the code you can use weakWidget.get() to get the actual Widget object. Of course the weak reference isn't strong enough to prevent garbage collection, so you may find (if there are no strong references to the widget) that weakWidget.get() suddenly starts returning null.
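My understanding: a weak reference, simply put, is a reference that cannot guarantee that the object it points to stays in memory. You create a weak reference weakWidget to a widget object with WeakReference<Widget> weakWidget = new WeakReference<Widget>(widget), and call weakWidget.get() to obtain the Widget object. Once no strong reference to the widget remains, the widget object may be garbage collected, and weakWidget.get() starts returning null (because the widget object no longer exists).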
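As a minimal sketch of the pattern described above: the Widget class below is a hypothetical stand-in (the article never shows its definition), and describe() is an illustrative helper name. The point it tries to show is that the result of get() should be copied into a local strong reference and checked for null before use. Because the timing of garbage collection is not predictable, the sketch calls clear() to simulate a collected referent rather than relying on System.gc().

```java
import java.lang.ref.WeakReference;

public class WeakReferenceDemo {
    /** Hypothetical stand-in for the article's Widget class. */
    static final class Widget {
        final String name;
        Widget(String name) { this.name = name; }
    }

    /** Describes the referent, or reports that it is gone. */
    static String describe(WeakReference<Widget> ref) {
        // Copy into a local strong reference first, so the object cannot
        // disappear between the null check and the use.
        Widget target = ref.get();
        return (target != null) ? "alive: " + target.name : "collected";
    }

    public static void main(String[] args) {
        Widget widget = new Widget("widget-1");
        WeakReference<Widget> weakWidget = new WeakReference<>(widget);

        // widget is still strongly reachable through the local variable.
        System.out.println(describe(weakWidget));
        System.out.println("still holding " + widget.name);

        // clear() empties the reference by hand; it is used here only to
        // simulate what the garbage collector does at a time of its choosing.
        weakWidget.clear();
        System.out.println(describe(weakWidget));
    }
}
```

Calling get() once and keeping the result matters: two separate get() calls can return different answers if a collection happens in between.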
To solve the "widget serial number" problem above, the easiest thing to do is use the built-in WeakHashMap class. WeakHashMap works exactly like HashMap, except that the keys (not the values!) are referred to using weak references. If a WeakHashMap key becomes garbage, its entry is removed automatically. This avoids the pitfalls I described and requires no changes other than the switch from HashMap to a WeakHashMap. If you're following the standard convention of referring to your maps via the Map interface, no other code needs to even be aware of the change.
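My understanding: the simplest way to solve the widget problem is the built-in WeakHashMap class. The main difference between WeakHashMap and HashMap is that WeakHashMap refers to its keys (not its values) through weak references rather than strong ones. If a key object becomes garbage, meaning no strong reference to it remains anywhere else, its entry is removed from the map automatically. Switching from HashMap to WeakHashMap therefore avoids the problems described above. If you refer to your map through the Map interface (rather than through a HashMap reference), no other code needs to change.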
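The serial-number map from the article might look roughly like the sketch below. SerialNumberRegistry, register() and serialNumberOf() are names made up for illustration, and Widget is again a hypothetical placeholder class. The sketch only demonstrates the lookup while the key is alive; when an entry actually disappears after its key becomes unreachable depends on the garbage collector and is not shown.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class SerialNumberRegistry {
    /** Hypothetical stand-in for the non-extensible Widget class. */
    static final class Widget { }

    // Declared as Map, so callers never see that the keys are held weakly.
    private final Map<Widget, String> serialNumbers = new WeakHashMap<>();

    void register(Widget widget, String serialNumber) {
        serialNumbers.put(widget, serialNumber);
    }

    /** Returns the serial number, or null if the widget has no entry. */
    String serialNumberOf(Widget widget) {
        return serialNumbers.get(widget);
    }

    public static void main(String[] args) {
        SerialNumberRegistry registry = new SerialNumberRegistry();
        Widget widget = new Widget();
        registry.register(widget, "SN-0001");
        System.out.println(registry.serialNumberOf(widget));

        // Once the last strong reference is dropped, the entry becomes
        // eligible for removal; the timing is up to the garbage collector.
        widget = null;
    }
}
```

Two caveats worth keeping in mind: the values are still held strongly, so a value that refers back to its own key keeps the entry alive forever, and WeakHashMap compares keys with equals() and hashCode() rather than by identity.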
Reference queues
Once a WeakReference starts returning null, the object it pointed to has become garbage and the WeakReference object is pretty much useless. This generally means that some sort of cleanup is required; WeakHashMap, for example, has to remove such defunct entries to avoid holding onto an ever-increasing number of dead WeakReferences.
The ReferenceQueue class makes it easy to keep track of dead references. If you pass a ReferenceQueue into a weak reference's constructor, the reference object will be automatically inserted into the reference queue when the object to which it pointed becomes garbage. You can then, at some regular interval, process the ReferenceQueue and perform whatever cleanup is needed for dead references.
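My understanding: once a WeakReference starts returning null, the object it pointed to has been garbage collected and the WeakReference object itself is useless and should be cleaned up. WeakHashMap, for example, has to remove such dead entries so that it does not hold on to an ever-growing number of dead weak references. The ReferenceQueue class makes tracking dead weak references easy. If you pass a ReferenceQueue to the WeakReference constructor, the weak reference is placed on that queue when the object it points to becomes garbage, and you can then clean up the useless references at regular intervals.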
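The periodic cleanup described above could be sketched as follows. DeadReferenceCleaner, track(), cleanUp() and trackedCount() are hypothetical names, and the String labels stand in for whatever bookkeeping data needs releasing. In real use the garbage collector puts the references on the queue; the demo calls enqueue() by hand only so that it behaves the same on every run.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class DeadReferenceCleaner {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Bookkeeping data that must be released together with each reference.
    private final Map<Reference<?>, String> labels = new HashMap<>();

    /** Creates a weak reference registered with the queue and remembers its label. */
    WeakReference<Object> track(Object target, String label) {
        WeakReference<Object> ref = new WeakReference<>(target, queue);
        labels.put(ref, label);
        return ref;
    }

    /** Drains the queue without blocking; returns the number of dead references removed. */
    int cleanUp() {
        int removed = 0;
        Reference<?> dead;
        while ((dead = queue.poll()) != null) {
            labels.remove(dead);
            removed++;
        }
        return removed;
    }

    int trackedCount() {
        return labels.size();
    }

    public static void main(String[] args) {
        DeadReferenceCleaner cleaner = new DeadReferenceCleaner();
        Object target = new Object();
        WeakReference<Object> ref = cleaner.track(target, "example");

        // Simulation only: normally the collector enqueues the reference
        // after the target becomes weakly reachable.
        ref.enqueue();

        System.out.println("removed: " + cleaner.cleanUp());
        System.out.println("still tracked: " + cleaner.trackedCount());
    }
}
```

poll() returns immediately with null when the queue is empty, which suits a cleanup pass run from existing code paths; a dedicated cleanup thread would more likely block on remove().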
Different degrees of weakness
Up to this point I've just been referring to "weak references", but there are actually four different degrees of reference strength: strong, soft, weak, and phantom, in order from strongest to weakest. We've already discussed strong and weak references, so let's take a look at the other two.
My understanding: strong and weak references were covered above; soft and phantom references are covered next.
Soft references
A soft reference is exactly like a weak reference, except that it is less eager to throw away the object to which it refers. An object which is only weakly reachable (the strongest references to it are WeakReferences) will be discarded at the next garbage collection cycle, but an object which is softly reachable will generally stick around for a while.
SoftReferences aren't required to behave any differently than WeakReferences, but in practice softly reachable objects are generally retained as long as memory is in plentiful supply. This makes them an excellent foundation for a cache, such as the image cache described above, since you can let the garbage collector worry about both how reachable the objects are (a strongly reachable object will never be removed from the cache) and how badly it needs the memory they are consuming.
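My understanding: a soft reference is much like a weak reference, except that it is less eager to discard the object it points to. A weakly reachable object (one whose strongest references are weak references) is discarded in the next GC cycle, whereas a softly reachable object generally sticks around for a while. Soft references are not required to behave any differently from weak references, but in practice softly reachable objects are kept in memory as long as memory is plentiful. This makes soft references well suited to caches such as an image cache, because the garbage collector can weigh both how reachable the objects are (a strongly reachable object is never removed from the cache) and how urgently it needs the memory they occupy.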
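One way such a cache might be put together is sketched below. SoftCache and its loader function are illustrative names, not something from the article, and the byte array in main() merely stands in for an image loaded from disk. The cache reloads a value whenever its soft reference turns out to be empty.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader;

    /** The loader is called whenever a value is missing or has been cleared. */
    public SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public synchronized V get(K key) {
        SoftReference<V> ref = entries.get(key);
        // Hold the value in a local strong reference while working with it.
        V value = (ref != null) ? ref.get() : null;
        if (value == null) {
            // Either never cached, or cleared by the collector: load again.
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        // The byte array is a placeholder for decoded image data.
        SoftCache<String, byte[]> images = new SoftCache<>(path -> new byte[1024]);
        byte[] first = images.get("logo.png");
        byte[] second = images.get("logo.png");
        System.out.println("same instance: " + (first == second));
    }
}
```

This sketch never removes map entries whose soft references have been cleared; a fuller version would probably register the references with a ReferenceQueue and prune them.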
Phantom references
A phantom reference is quite different than either SoftReference or WeakReference. Its grip on its object is so tenuous that you can't even retrieve the object -- its get() method always returns null. The only use for such a reference is keeping track of when it gets enqueued into a ReferenceQueue, as at that point you know the object to which it pointed is dead. How is that different from WeakReference, though?
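My understanding: a phantom reference is quite different from both soft and weak references. Its hold on its object is so weak that its get() method always returns null (and an object that is pointed to only by phantom references, with no stronger reference to it, can be garbage collected). We can observe a phantom reference being placed on a ReferenceQueue, which tells us that the object it pointed to is dead. But how is that different from a weak reference?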
The difference is in exactly when the enqueuing happens. WeakReferences are enqueued as soon as the object to which they point becomes weakly reachable. This is before finalization or garbage collection has actually happened; in theory the object could even be "resurrected" by an unorthodox finalize() method, but the WeakReference would remain dead. PhantomReferences are enqueued only when the object is physically removed from memory, and the get() method always returns null specifically to prevent you from being able to "resurrect" an almost-dead object.
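My understanding: the difference lies in when the enqueuing happens. (1) A weak reference is enqueued as soon as the object it points to becomes weakly reachable. This happens before finalization and before the object is actually collected; in theory the object could even be resurrected by an unorthodox finalize() method, but the weak reference would stay dead. (2) A phantom reference is enqueued only once the object it points to has been physically removed from memory, and its get() method always returns null so that you cannot resurrect an object that is about to die.
What good are PhantomReferences? I'm only aware of two serious cases for them: first, they allow you to determine exactly when an object was removed from memory. They are in fact the only way to determine that. This isn't generally that useful, but might come in handy in certain very specific circumstances like manipulating large images: if you know for sure that an image should be garbage collected, you can wait until it actually is before attempting to load the next image, and therefore make the dreaded OutOfMemoryError less likely.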
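A small sketch of how a phantom reference is typically set up follows. PhantomDemo and canRetrieveReferent() are hypothetical names, and the plain Object stands in for something like a large image. Whether the reference is enqueued within the one-second wait depends entirely on the garbage collector, since System.gc() is only a request, so the sketch prints whatever it observes instead of assuming a result.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class PhantomDemo {
    /** Always false for a phantom reference: get() never hands out the referent. */
    static boolean canRetrieveReferent(PhantomReference<?> ref) {
        return ref.get() != null;
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object image = new Object(); // placeholder for a large image
        PhantomReference<Object> phantom = new PhantomReference<>(image, queue);

        // The referent is alive, yet get() still returns null.
        System.out.println("retrievable: " + canRetrieveReferent(phantom));

        image = null; // drop the only strong reference
        System.gc();  // a hint only; the JVM may ignore it

        // Wait up to one second for the collector to enqueue the reference.
        Reference<?> dead = queue.remove(1000);
        System.out.println("enqueued within 1s: " + (dead == phantom));
    }
}
```

Unlike a weak reference, a phantom reference is only useful together with a queue, because enqueuing is the only signal it ever gives.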
Second, PhantomReferences avoid a fundamental problem with finalization: finalize() methods can "resurrect" objects by creating new strong references to them. So what, you say? Well, the problem is that an object which overrides finalize() must now be determined to be garbage in at least two separate garbage collection cycles in order to be collected. When the first cycle determines that it is garbage, it becomes eligible for finalization. Because of the (slim, but unfortunately real) possibility that the object was "resurrected" during finalization, the garbage collector has to run again before the object can actually be removed. And because finalization might not have happened in a timely fashion, an arbitrary number of garbage collection cycles might have happened while the object was waiting for finalization. This can mean serious delays in actually cleaning up garbage objects, and is why you can get OutOfMemoryErrors even when most of the heap is garbage.
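My understanding: phantom references have two benefits. (1) They let you determine exactly when an object has been removed from memory, and they are in fact the only way to do so. This is not generally useful, but it helps in specific situations such as manipulating large images: if you know an image should be garbage collected, you can wait until its phantom reference has been enqueued (get() is no help here, since it always returns null) before loading the next image, which makes an OutOfMemoryError less likely. (2) They avoid a fundamental problem with finalization: a finalize() method can resurrect an object by creating a new strong reference to it. An object that overrides finalize() must therefore be found to be garbage in at least two separate GC cycles before it can be collected. The first cycle determines that it is garbage and makes it eligible for finalization; because it may have been resurrected during finalization, a second cycle is needed before it can actually be removed. And because finalization may not run promptly, any number of GC cycles can pass while the object waits to be finalized. Cleanup of garbage objects can thus be seriously delayed, which is why an OutOfMemoryError can occur even when most of the heap is garbage.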
With PhantomReference, this situation is impossible -- when a PhantomReference is enqueued, there is absolutely no way to get a pointer to the now-dead object (which is good, because it isn't in memory any longer). Because PhantomReference cannot be used to resurrect an object, the object can be instantly cleaned up during the first garbage collection cycle in which it is found to be phantomly reachable. You can then dispose whatever resources you need to at your convenience.
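Arguably, the finalize() method should never have been provided in the first place. PhantomReferences are definitely safer and more efficient to use, and eliminating finalize() would have made parts of the VM considerably simpler. But, they're also more work to implement, so I confess to still using finalize() most of the time. The good news is that at least you have a choice.
My understanding: with phantom references this situation cannot arise. Once a phantom reference has been enqueued, there is no way at all to obtain a pointer to the dead object (which is good, because the object is no longer in memory). Because a phantom reference cannot be used to resurrect an object, the object can be cleaned up in the first GC cycle in which it is found to be phantom reachable, and you can then release whatever resources you need to at your convenience. Whether finalize() should exist at all is debatable: compared with finalizers, phantom references are safer and more efficient, and removing finalize() would have made the VM simpler.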
Conclusion
I'm sure some of you are grumbling by now, as I'm talking about an API which is nearly a decade old and haven't said anything which hasn't been said before. While that's certainly true, in my experience many Java programmers really don't know very much (if anything) about weak references, and I felt that a refresher course was needed. Hopefully you at least learned a little something from this review.