Basic Windbg - 4.Out Of Memory的分析及诊断方法

    这里省略了part3,因为还没想好要写啥,我的项目就出现了问题。CASE刚刚开好,虽然还没有技术人员和我联系,但是我还是要把自己的分析过程写下来。
    首先,什么是Out Of Memory?就是内存溢出,简称OOM(下边我就用这个简称了啊!)。说白了,就是程序想用内存的时候,OS没有那么多内存可以分配了,然后就抱OOM错误了。
    首先介绍一下我这个项目的情况,基于exchange 2007+sp1+hmc+web service call通过一个winform的模拟测试程序,单线程添加信息:循环25万次,每循环一次创建一个公司,开通邮件域名,并创建5个帐号,每个帐号都开通邮件服务。现在循环到100次左右的时候,w3wp.exe的内存占用为private bytes:100Mvirtual bytes:300M。听兄弟讲,他们做过类似的测试,当循环到400-500个的时候,会出现OOM的问题。

    既然是OOM,我们当然要介绍一个超级cool的工具:debugdiag!(这个工具以后再介绍,因为要贴N多图,实在痛苦……)通过debugdiagmemory leakdump239M),发现有如下问题:mscorwks泄漏了35M左右的内存。
    现在转到windbg中来,我们首先看命令!eeheap,它一共有两个参数:
0:000> !help eeheap
-------------------------------------------------------------------------------
!EEHeap [-gc] [-loader]

    首先看一下-gc的参数,!eeheap -gc。这个命令表明,我们程序占用托管堆的大小:
0:000> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x01ff3f30
generation 1 starts at 0x01f5df5c
generation 2 starts at 0x019c1000
ephemeral segment allocation context: (0x020bcbf8, 0x020bd46c)
 segment    begin allocated     size
000ead50 7a72c43c  7a74d318 0x00020edc(134876)
000e62a0 790d55a0  790f4b50 0x0001f5b0(128432)
019c0000 019c1000  020bd46c 0x006fc46c(7324780)
Large object heap starts at 0x029c1000
 segment    begin allocated     size
029c0000 029c1000  02cfbbd8 0x0033abd8(3386328)
Total Size  0xa774d0(10974416)
------------------------------
GC Heap Size  0xa774d0(10974416)
    dump大小为239M,托管堆大小为11M不到,差别很大的!剩下的内存在哪里?现在来看一个新的命令,!address,运行后,会有一坨又一坨的输出,我们关心的是后面的summary,如下:

-------------------- Usage SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots) Pct(Busy)   Usage
    89b8000 (  141024) : 06.72%    43.69%    : RegionUsageIsVAD
   6c4b4000 ( 1774288) : 84.61%    00.00%    : RegionUsageFree
    61d4000 (  100176) : 04.78%    31.03%    : RegionUsageImage
     7fc000 (    8176) : 00.39%    02.53%    : RegionUsageStack
      20000 (     128) : 00.01%    00.04%    : RegionUsageTeb
    4791000 (   73284) : 03.49%    22.70%    : RegionUsageHeap
          0 (       0) : 00.00%    00.00%    : RegionUsagePageHeap
       1000 (       4) : 00.00%    00.00%    : RegionUsagePeb
       1000 (       4) : 00.00%    00.00%    : RegionUsageProcessParametrs
       1000 (       4) : 00.00%    00.00%    : RegionUsageEnvironmentBlock
       Tot: 7fff0000 (2097088 KB) Busy: 13b3c000 (322800 KB)

-------------------- Type SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots)  Usage
   6c4b4000 ( 1774288) : 84.61%   : <free>
    66c3000 (  105228) : 05.02%   : MEM_IMAGE
    30a8000 (   49824) : 02.38%   : MEM_MAPPED
    a3d1000 (  167748) : 08.00%   : MEM_PRIVATE

-------------------- State SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots)  Usage
    ee99000 (  244324) : 11.65%   : MEM_COMMIT
   6c4b4000 ( 1774288) : 84.61%   : MEM_FREE
    4ca3000 (   78476) : 03.74%   : MEM_RESERVE

    上面的信息比较有意思,RegionUsageImage代表的是dlls占用的内存,一共是100M;RegionUsageheap,代表的是NT heaps,一共是73M;MEM_COMMIT和MEM_RESERVE加起来,是virtual memory,他俩的合计是322M。我们还少看了什么?!eeheap还有一个参数,是-loader,运行一下后,会有N长的结果,我们看一部分。
!eeheap –loader
Domain 2: fbd98
LowFrequencyHeap: 03ba0000(2000:2000) 03d80000(10000:10000) 03ee0000(10000:10000) 040b0000(10000:10000) 04ea0000(10000:10000) 04f70000(10000:f000) 05190000(10000:10000) 很黄很暴力。。。 1edc0000(10000:10000) 1ee00000(10000:3000) Size: 0x1ce2000(30285824)bytes.
Wasted: 0x13000(77824)bytes.
HighFrequencyHeap: 03ba2000(8000:8000) 03e10000(10000:f000) 04aa0000(10000:10000) 050c0000(10000:f000) 05180000(10000:10000) 05430000(10000:10000) 056d0000(10000:10000) 很黄很暴力 1eba0000(10000:10000) 1ee30000(10000:3000) Size: 0x7a9000(8032256)bytes.
Wasted: 0x2000(8192)bytes.

StubHeap: 03baa000(2000:1000) Size: 0x1000(4096)bytes.
Virtual Call Stub Heap:
  IndcellHeap: Size: 0x0(0)bytes.
  LookupHeap: Size: 0x0(0)bytes.
  ResolveHeap: Size: 0x0(0)bytes.
  DispatchHeap: Size: 0x0(0)bytes.
  CacheEntryHeap: 03bc3000(3000:1000) Size: 0x1000(4096)bytes.
Total size: 0x248c000(38322176)bytes

一共占用了38M内存。继续看下面的内容,我居然发现了
9998module!!!
Module Thunk heaps:
Module 699aa000: Size: 0x0(0)bytes.
Module 648e8000: Size: 0x0(0)bytes.
Module 01963a34: Size: 0x0(0)bytes.
Module 66366000: Size: 0x0(0)bytes.
=============很黄很暴力======================
Module 1ee30010: Size: 0x0(0)bytes.
Module 1ee30634: Size: 0x0(0)bytes.
Module 1ee30c44: Size: 0x0(0)bytes.
Module 1ee3124c: Size: 0x0(0)bytes.
Total size: 0x0(0)bytes
--------------------------------------
Total LoaderHeap size: 0x2a3f000(44298240)bytes
=======================================


    问题基本出来了,居然在内存里面有将近1万个module!随便dump出来一个,看看。这里又有一个命令:!dumpmodule
随便
dump出来一个看,类似如下信息:
0:000> !dumpmodule 1ee30010
Name: qrt-x6cw, Version=4.0.460.0, Culture=neutral, PublicKeyToken=null
Attributes: PEFile
Assembly: 1cf04e28
LoaderHeap: 00000000
TypeDefToMethodTableMap: 1edcc454
TypeRefToMethodTableMap: 1edcc46c
MethodDefToDescMap: 1edcc4c4
FieldDefToDescMap: 1edcc530
MemberRefToDescMap: 1edcc54c
FileReferencesMap: 1edcc634
AssemblyReferencesMap: 1edcc638
MetaData start address: 1ed20848 (4256 bytes)

    这里可能看不到啥,那么我们加一个参数-mt来看看

0:000> !dumpmodule -mt 1ee30010
Name: qrt-x6cw, Version=4.0.460.0, Culture=neutral, PublicKeyToken=null
Attributes: PEFile
Assembly: 1cf04e28
LoaderHeap: 00000000
TypeDefToMethodTableMap: 1edcc454
TypeRefToMethodTableMap: 1edcc46c
MethodDefToDescMap: 1edcc4c4
FieldDefToDescMap: 1edcc530
MemberRefToDescMap: 1edcc54c
FileReferencesMap: 1edcc634
AssemblyReferencesMap: 1edcc638
MetaData start address: 1ed20848 (4256 bytes)

Types defined in this module

      MT    TypeDef Name
------------------------------------------------------------------------------
1ee3059c 0x02000003 Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderCreateUserResponseData
1ee3046c 0x02000006 Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializerContract

Types referenced in this module

      MT    TypeRef Name
------------------------------------------------------------------------------
69e58eac 0x01000002 System.Xml.Serialization.XmlSerializationReader
69e59b84 0x01000004 System.Xml.Serialization.XmlSerializerImplementation
03e10b4c 0x01000005 Microsoft.Provisioning.WebServices.HostedActiveDirectory.CreateUserResponseData
69e32be0 0x01000006 System.Xml.XmlReader
790fd8cc 0x01000007 System.Collections.Hashtable
790f8a94 0x0100000e System.Object
69e35c38 0x01000013 System.Xml.XmlQualifiedName
79103c18 0x01000014 System.Boolean
69e3a5b0 0x01000015 System.Xml.XmlNameTable

    出现了System.Xml.Serialization,大家熟悉吗?我们转过头来看debugdiag分析的call stack。

Call stack sample 1

Address   0x0c790000
Allocation Time   00:05:04 since tracking started
Allocation Size    Bytes

Function   Source   Destination
mscorjit!norls_allocator::nraAllocNewPage+52      
mscorjit!norls_allocator::nraAlloc+19      mscorjit!norls_allocator::nraAllocNewPage
mscorjit!jitNativeCode+84      mscorjit!norls_allocator::nraAlloc
mscorjit!CILJit::compileMethod+3d      mscorjit!jitNativeCode
0x79E9C9E0      
mscorjit!Compiler::impExpandInline+1aa      
mscorjit!Compiler::fgMorphTree+40      
mscorjit!Compiler::fgMorphStmts+63      mscorjit!Compiler::fgMorphTree
mscorjit!Compiler::fgMorphBlocks+79      mscorjit!Compiler::fgMorphStmts
mscorjit!Compiler::fgMorph+60      mscorjit!Compiler::fgMorphBlocks
mscorjit!Compiler::compCompile+5f      mscorjit!Compiler::fgMorph
mscorjit!Compiler::compCompile+2d8      mscorjit!Compiler::compCompile
mscorjit!jitNativeCode+b8      mscorjit!Compiler::compCompile
mscorjit!CILJit::compileMethod+3d      mscorjit!jitNativeCode
0x79EED3FF      
System.Xml.Serialization.TempAssembly.InvokeReader(System.Xml.Serialization.XmlMapping, System.Xml.XmlReader, System.Xml.Serialization.XmlDeserializationEvents, System.String)      
System.Xml.Serialization.TempAssembly.InvokeReader(System.Xml.Serialization.XmlMapping, System.Xml.XmlReader, System.Xml.Serialization.XmlDeserializationEvents, System.String)      
System.Xml.Serialization.XmlSerializer.Deserialize(System.Xml.XmlReader, System.String, System.Xml.Serialization.XmlDeserializationEvents)      System.Xml.Serialization.TempAssembly.InvokeReader(System.Xml.Serialization.XmlMapping, System.Xml.XmlReader, System.Xml.Serialization.XmlDeserializationEvents, System.String)
System.Xml.Serialization.XmlSerializer.Deserialize(System.Xml.XmlReader, System.String)      System.Xml.Serialization.XmlSerializer.Deserialize(System.Xml.XmlReader, System.String, System.Xml.Serialization.XmlDeserializationEvents)
System.Xml.Serialization.XmlSerializer.Deserialize(System.IO.Stream)      System.Xml.Serialization.XmlSerializer.Deserialize(System.Xml.XmlReader, System.String)
Microsoft.Provisioning.Sdk.Xml.Serialization.ProvisioningObjectFactory.Convert[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.__Canon)      System.Xml.Serialization.XmlSerializer.Deserialize(System.IO.Stream)
Microsoft.Provisioning.WebServices.ServiceBase.Submit[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.__Canon, System.String, System.String, Boolean)      Microsoft.Provisioning.Sdk.Xml.Serialization.ProvisioningObjectFactory.Convert[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.__Canon)
Microsoft.Provisioning.WebServices.HostedActiveDirectory.Service.CreateOrganization(Microsoft.Provisioning.WebServices.HostedActiveDirectory.CreateOrganizationRequest, Boolean)      
0x79E7BE1B      
0x7A11E56E      
System.Web.Services.Protocols.LogicalMethodInfo.Invoke(System.Object, System.Object[])      
System.Web.Services.Protocols.WebServiceHandler.Invoke()      System.Web.Services.Protocols.LogicalMethodInfo.Invoke(System.Object, System.Object[])
0x79F35FC8      
System.Threading._TimerCallback.TimerCallback_Context(System.Object)      
System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)      
webengine!HashtableIUnknown::AddCallback+a      
webengine!HttpCompletion::ProcessRequestInManagedCode+1a3      
webengine!HttpCompletion::ProcessRequestInManagedCode+1a3      
webengine!HttpCompletion::ProcessCompletion+3e      webengine!HttpCompletion::ProcessRequestInManagedCode
webengine!CorThreadPoolWorkitemCallback+18      
0x79F70305      
0x79F8BC87      
kernel32!BaseThreadStart+34

    看到这里,基本差不多。偶认为是exchange2007内部的代码问题。此话怎讲?从头说,在System.Xml.Serialization下面,有一个by design的“bug”。我们用reflector看XmlSerializer的构造代码:

 1  this.tempAssembly = cache[defaultNamespace, type];
 2        if (this.tempAssembly == null)
 3        {
 4            lock (cache)
 5            {
 6                this.tempAssembly = cache[defaultNamespace, type];
 7                if (this.tempAssembly == null)
 8                {
 9                    XmlSerializerImplementation implementation;
10                    Assembly assembly = TempAssembly.LoadGeneratedAssembly(type, defaultNamespace, out implementation);
11                    if (assembly == null)
12                    {
13                        this.mapping = new XmlReflectionImporter(defaultNamespace).ImportTypeMapping(type, null, defaultNamespace);
14                        this.tempAssembly = GenerateTempAssembly(this.mapping, type, defaultNamespace);
15                    }

16                    else
17                    {
18                        this.mapping = XmlReflectionImporter.GetTopLevelMapping(type, defaultNamespace);
19                        this.tempAssembly = new TempAssembly(new XmlMapping[] this.mapping }, assembly, implementation);
20                    }

21                }

22                cache.Add(defaultNamespace, type, this.tempAssembly);
23            }

24
    为了加快运行速度,xmlserializer做了cache。在代码中,也许我们有N多的type,那么每个type在这里都做了一个assembly,都把assembly放到了cache中。这样,本来没问题。但是如果type有几千个,有几万个,那么就有几千几万个temp assembly出现。这些assembly都很小,也许只有100个字节,也许是0字节。但是注意的是,内存分配时,是按照“块”来分配的。假如说每个块最小为4k大小。那么即使只分配一个字节的内存,我们也要申请一个4k的page。那么,如果我们的小块非常非常多,那么我们身子缩小N倍后,你会发现内存里面四处都是小窟窿。这些窟窿加起来很大,但是别人就是不能用。因为这4k内存,必须要连续的块。
    so,当我们发现w3wp.exe仅仅3、4百兆的时候,就报OOM了,就是这个原因。

    关于这个“bug”,可以看msdn这个kb:http://support.microsoft.com/kb/886385/en-us

    至于,是不是这个原因,我也需要等gtec的人确认。现在的工作是,先把这个心得写上。如果有错误,以后会修正。
posted @ 2008-01-15 15:47 鞠强 阅读(...) 评论(...) 编辑 收藏

hello

world