Using urllib2's HTTPResponse prevents memory from being reclaimed (memory leak)

  • Environment where the problem occurs: Python 2.7.1(X) and earlier, on Windows (or CentOS)

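The problem originates at line 1174 of lib/urllib2.py (Python 2.7.1). The assignment r.recv = r.read stores a bound method of r on r itself, which forms a reference cycle, so that even calling gc.collect() does not release the HTTPResponse and the objects associated with it (they can be inspected through gc.garbage).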

    r.recv = r.read  # bound method of r stored on r itself: this creates the cycle
    fp = socket._fileobject(r, close=True)

    resp = addinfourl(fp, r.msg, req.get_full_url())
    resp.code = r.status
    resp.msg = r.reason
    return resp

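This bug was reported on the official Python bug tracker long ago (see the two issues below), but it had not been officially fixed when this was written. The two threads do, however, offer workarounds.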

 

http://bugs.python.org/issue1208304

http://bugs.python.org/issue7464
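
The workarounds from those threads are not reproduced here. As a rough illustration of the general idea only, the sketch below rebuilds the same self-reference on a stand-in class and then removes it by hand before dropping the object. FakeResponse and make_response are hypothetical names invented for this example; they are not part of urllib2 or httplib, and the immediate release shown relies on CPython's reference counting.

```python
import gc
import weakref


class FakeResponse(object):
    """Hypothetical stand-in for the response object; not the real httplib class."""

    def read(self, amt=None):
        return b""


def make_response():
    r = FakeResponse()
    r.recv = r.read  # same shape as the urllib2 line: r -> bound method -> r
    return r


gc.disable()  # rule out an automatic collection during the experiment
try:
    # Without the fix: the cycle keeps the object alive after the last name is gone.
    r = make_response()
    probe = weakref.ref(r)
    del r
    leaked_without_fix = probe() is not None

    # With the fix: remove the self-reference first, then drop the object.
    r = make_response()
    probe = weakref.ref(r)
    del r.recv
    del r
    released_with_fix = probe() is None
finally:
    gc.enable()
    gc.collect()  # reclaim the first, still-cyclic object

print("leaked without fix: %s" % leaked_without_fix)
print("released with fix:  %s" % released_with_fix)
```

Whether the real response object tolerates having recv removed depends on when it is done, so this should be read as a picture of the mechanism rather than a drop-in patch.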


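  • As a generalization: Python code written like the following (a mistake I made in my own code) produces the same kind of cycle and therefore the same memory leak.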
    class T(object):
        def __init__(self):
            self.test = self.test0  # instance -> bound method -> instance

        def test0(self, d={}):
            d['a'] = 1

Running it in the Python shell gives:

    Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import gc
    >>> gc.set_debug(gc.DEBUG_LEAK)
    >>> class T(object):
    ...     def __init__(self):
    ...         self.test = self.test0
    ...
    ...     def test0(self, d={}):
    ...         d['a'] = 1
    ...
    >>> t=T()
    >>> del t
    >>> gc.collect()
    gc: collectable <T 0260D870>
    gc: collectable <instancemethod 01DCFDF0>
    gc: collectable <dict 0260EA50>
    3
    >>> for _item in gc.garbage:
    ...     print _item
    ...
    <__main__.T object at 0x0260D870>
    <bound method T.test0 of <__main__.T object at 0x0260D870>>
    {'test': <bound method T.test0 of <__main__.T object at 0x0260D870>>}

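The objects that cannot be released by reference counting are the three listed above (highlighted in red in the original post): the T instance, the bound method, and the instance dict. Note that gc labels them collectable; they show up in gc.garbage because gc.DEBUG_LEAK includes gc.DEBUG_SAVEALL, which saves collected objects in that list instead of freeing them. Two functions in the gc module show why the cycle forms.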
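
To separate the effect of the debug flags from the cycle itself, the sketch below runs the same create, delete, collect sequence twice: once with gc.DEBUG_SAVEALL (one of the flags that gc.DEBUG_LEAK turns on) and once with no debug flags. The helper count_saved_instances is a name made up for this example, and the snippet clears gc.garbage, so it is best run in a fresh interpreter.

```python
import gc


class T(object):
    def __init__(self):
        self.test = self.test0  # instance -> bound method -> instance

    def test0(self, d={}):
        d['a'] = 1


def count_saved_instances():
    """Create and drop one T, collect, and count T instances left in gc.garbage."""
    t = T()
    del t
    gc.collect()
    return sum(1 for obj in gc.garbage if isinstance(obj, T))


old_flags = gc.get_debug()
gc.collect()  # start from a clean state

gc.set_debug(gc.DEBUG_SAVEALL)  # collected objects are kept in gc.garbage
saved_with_saveall = count_saved_instances()

gc.set_debug(0)       # no debug flags
del gc.garbage[:]     # let go of the saved objects so they can really be freed
gc.collect()
saved_with_defaults = count_saved_instances()

gc.set_debug(old_flags)

print("T instances in gc.garbage with DEBUG_SAVEALL: %d" % saved_with_saveall)
print("T instances in gc.garbage without debug flags: %d" % saved_with_defaults)
```

With no debug flags the cyclic objects are freed by the collector and gc.garbage stays empty of T instances, so for this class the cost of the cycle is delayed reclamation rather than a permanent leak.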

    >>> t2=T()
    >>> gc.get_referrers(t2)
    [<bound method T.test0 of <__main__.T object at 0x0260D890>>, {'__builtins__': <module '__builtin__' (built-in)>, 't2': <__main__.T object at 0x0260D890>, '__package__': None, 'gc': <module 'gc' (built-in)>, 'T': <class '__main__.T'>, '__name__': '__main__', '__doc__': None, '_item': {'test': <bound method T.test0 of <__main__.T object at 0x0260D870>>}}]
    >>> for _item in gc.get_referrers(t2):
    ...     print _item
    ...
    <bound method T.test0 of <__main__.T object at 0x0260D890>>
    {'__builtins__': <module '__builtin__' (built-in)>, 't2': <__main__.T object at 0x0260D890>, '__package__': None, 'gc': <module 'gc' (built-in)>, 'T': <class '__main__.T'>, '__name__': '__main__', '__doc__': None, '_item': {...}}
    >>> for _item in gc.get_referents(t2):
    ...     print _item
    ...
    {'test': <bound method T.test0 of <__main__.T object at 0x0260D890>>}
    <class '__main__.T'>
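
gc.get_referrers: "Return the list of objects that directly refer to any of objs."
Here it returns the objects that refer to t2, among them the bound method <bound method T.test0 of <__main__.T object at 0x0260D890>>.

gc.get_referents: "Return a list of objects directly referred to by any of the arguments."
Here it returns the objects that t2 refers to, among them its instance dict, which holds that same bound method under the key 'test'. Together the two results show the cycle: t2 -> its dict -> bound method -> t2.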
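
The same inspection can be done in code rather than by reading the printed lists. The sketch below filters gc.get_referrers for bound methods whose __self__ is the object in question; stored_self_methods is a hypothetical helper written for this example, and T2 here is a cut-down class without the self-assignment, used only for contrast.

```python
import gc
import types


class T(object):
    def __init__(self):
        self.test = self.test0  # stores a bound method on the instance

    def test0(self, d={}):
        d['a'] = 1


class T2(object):
    def test0(self, d={}):
        d['a'] = 1


def stored_self_methods(obj):
    """Return the live bound methods whose __self__ is obj."""
    return [ref for ref in gc.get_referrers(obj)
            if isinstance(ref, types.MethodType) and ref.__self__ is obj]


t = T()
t2 = T2()
cyclic = stored_self_methods(t)
plain = stored_self_methods(t2)

# The cycle is closed when the method that refers to t is also held by t.
closes_cycle = len(cyclic) == 1 and vars(t)['test'] is cyclic[0]

print("T:  %d self-bound method(s) alive, cycle closed: %s" % (len(cyclic), closes_cycle))
print("T2: %d self-bound method(s) alive" % len(plain))
```

gc.get_referrers scans every object tracked by the collector, so a check like this is meant for debugging sessions, not for hot code paths.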
  • The following cases do not produce a cycle:
    class T2(object):
        def __init__(self):
            pass

        def test(self):
            # looked up on each call; no bound method is stored on the instance
            return self.test0()

        def test0(self, d={}):
            d['a'] = 1


    class T3(object):
        def __init__(self):
            # test0 is a classmethod, so the stored bound method refers to T3, not to self
            self.test = self.test0

        @classmethod
        def test0(cls, d={}):
            d['a'] = 1

        kkk = test0
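
A quick way to compare the three patterns is to switch the cyclic collector off and see whether an instance disappears as soon as its last name is deleted. The sketch below does that with a weak reference; freed_by_refcount is a hypothetical helper written for this example, and the result relies on CPython's reference counting.

```python
import gc
import weakref


class T(object):
    def __init__(self):
        self.test = self.test0  # bound to the instance: cycle

    def test0(self, d={}):
        d['a'] = 1


class T2(object):
    def test(self):
        return self.test0()  # nothing stored on the instance

    def test0(self, d={}):
        d['a'] = 1


class T3(object):
    def __init__(self):
        self.test = self.test0  # bound to the class, not the instance

    @classmethod
    def test0(cls, d={}):
        d['a'] = 1

    kkk = test0


def freed_by_refcount(cls):
    """True if an instance of cls is reclaimed without the cyclic collector."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        obj = cls()
        probe = weakref.ref(obj)
        del obj
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()


results = dict((cls.__name__, freed_by_refcount(cls)) for cls in (T, T2, T3))
gc.collect()  # reclaim the T instance left behind by the check

for name in ("T", "T2", "T3"):
    print("%s freed by reference counting alone: %s" % (name, results[name]))
```

Only T should report False: its instance is kept alive by the bound method it stores on itself, while T2 stores nothing and T3 stores a method whose __self__ is the class.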

 



 

 

 

 

posted @ 2013-06-25 00:01  大料