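The test center reported that when certain queries run, GS sometimes hangs. Clients cannot even log in, and the only way out is to kill the dllhost process by hand.
I have been debugging this for three days and the result is still unclear.
The program hangs, so my first thought was to capture a hang dump.
But this hang is actually caused by a memory access problem.
I spent two days on that path without much progress.
So today I started capturing crash dumps instead.
The problem lies in the stack below: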
019fd744 774ef5ba 00090000 00000000 00000014 ntdll!RtlAllocateHeap+0x58a
019fd758 774ed5a4 775d22ac 00000014 019fd804 ole32!CRetailMalloc_Alloc+0x16
019fd768 776da201 00000014 77686f6c 77686e7c ole32!CoTaskMemAlloc+0x13
019fd804 776d4285 00000001 00000001 00000001 clbcatq!CSLTComs::GetTable+0x9c0
019fd8e8 776d4347 00000000 77686e88 77686e78 clbcatq!CSimpleTableDispenser::GetClientOrServerTable+0x61f
019fd910 776a3712 776f7998 77686e88 77686e78 clbcatq!CSimpleTableDispenser::GetServerTable+0x24
019fd940 776a422e 77686e78 019fd978 00000002 clbcatq!CComCLBCatalog::GetReadTable+0x39
019fda38 776a7596 77686e78 7769c7b4 776818c8 clbcatq!CComClass::FindCorrectClass+0x178
019fdac4 77684f32 01083e94 775cfbfc 00000000 clbcatq!CComClass::Init+0x12c
019fdaf4 77684ecb 01083e94 00000000 00000000 clbcatq!CComClass::Create+0x5e
019fdb1c 774e7872 0009c588 00000000 01083e94 clbcatq!CComCLBCatalog::GetClassInfoW+0x23
019fdb98 774ec033 775cfbf4 00000017 00000000 ole32!CComCatalog::GetClassInfoInternal+0x1f0
019fdbbc 774ec069 775cfbf8 00000017 00000000 ole32!CComCatalog::GetClassInfoW+0x22
019fdbdc 774ec0a1 01083e94 019fdbf8 00000000 ole32!GetClassInfoFromClsid+0x2d
019fdbfc 774f132c 01083e94 019fdcd0 00000000 ole32!LookForConfiguredClsid+0x33
019fdce4 774f1275 01083e94 00000000 00000005 ole32!ICoCreateInstanceEx+0x109
019fdd0c 774f1244 01083e94 00000000 00000005 ole32!CComActivator::DoCreateInstance+0x28
019fdd30 774f14eb 01083e94 00000000 00000005 ole32!CoCreateInstanceEx+0x23
019fdd60 01036c59 01083e94 00000000 00000005 ole32!CoCreateInstance+0x3c
019fdd9c 010794c5 019ff0b0 01079608 019fddd0 DO_SYSPUBC!CreateComObject+0x1d
019fddd0 01081ce9 00000000 7c96248b 010a23fc DO_SYSPUBC!DealWithException+0x45
019ff10c 010817b3 0011f3b0 019ff164 000eeb30 DO_SYSPUBC!TDBOperation.GetMainAreaData+0xf9
019ff124 010827cd 0011f3b0 019ff1a8 010827da DO_SYSPUBC!TDBOperation.GetData+0x1b
019ff14c 774fc169 010a2930 000f6e04 037a480c DO_SYSPUBC!TDO_SysPub.GetData+0x35
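The frames in red (the DO_SYSPUBC ones at the bottom of the stack) are our code, but reading the source I could not find the bug. Frustrating!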
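A stack like the one above can be scanned mechanically for the first frame that is not in a system module. The following is a minimal sketch, not part of the original investigation: the list of system modules and the helper name first_app_frame are assumptions made for illustration, and the regular expression only covers the 32-bit kb layout shown here.

```python
import re

# Assumption: these are the operating-system modules seen in the trace;
# any other module is treated as application code.
SYSTEM_MODULES = {"ntdll", "ole32", "clbcatq"}

# One kb line: ChildEBP, RetAddr, three arguments, then module!symbol+offset.
FRAME_RE = re.compile(
    r"^(?:[0-9a-fA-F]{8}\s+){5}(?P<module>[^!\s]+)!(?P<symbol>\S+)$"
)


def first_app_frame(kb_lines):
    """Return (module, symbol) of the innermost non-system frame, or None."""
    for line in kb_lines:
        match = FRAME_RE.match(line.strip())
        if match and match.group("module").lower() not in SYSTEM_MODULES:
            return match.group("module"), match.group("symbol")
    return None


sample = [
    "019fdd60 01036c59 01083e94 00000000 00000005 ole32!CoCreateInstance+0x3c",
    "019fdd9c 010794c5 019ff0b0 01079608 019fddd0 DO_SYSPUBC!CreateComObject+0x1d",
    "019fddd0 01081ce9 00000000 7c96248b 010a23fc DO_SYSPUBC!DealWithException+0x45",
]
print(first_app_frame(sample))
```

Frames are listed innermost first, so the first match is the application frame closest to the failing allocation.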
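I feel this one is almost solved. Once it is, I will write up the whole process in detail.
I picked up a few things over these two days and jotted them down, so I am posting them here too.
They are not detailed, and maybe only I can make sense of them.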
0. Microsoft public symbols URL
srv*C:\symcache*http://msdl.microsoft.com/download/symbols
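The path above has the form srv*<local cache>*<symbol server>. As a small sketch, the string can be composed from its two parts and handed to a debugger through the _NT_SYMBOL_PATH environment variable. The helper name build_symbol_path is made up for illustration.

```python
import os


def build_symbol_path(cache_dir, server_url):
    """Compose a symbol-server path of the form srv*<cache>*<server>."""
    return "srv*{}*{}".format(cache_dir, server_url)


path = build_symbol_path(
    r"C:\symcache", "http://msdl.microsoft.com/download/symbols"
)

# Environment for a debugger started from this script; the current
# process environment itself is left unchanged.
debugger_env = dict(os.environ, _NT_SYMBOL_PATH=path)
print(path)
```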
1. Debugging Windows in a virtual machine
windbg -b -k com:pipe,port=\\.\pipe\com_1,resets=0
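2. When a deadlock occurs
!locks
To inspect critical sections
!cs
The !cs extension displays one or more critical sections, or the entire critical section tree.
It can show more detailed information about a single critical section, such as its DebugInfo.
3. Symbol problems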
Often, when you're debugging an application, you encounter the message Using Export symbols for XXX. This message should raise a flag to you. The message means that either the debugger couldn't locate the symbols or the symbols weren't the right version for a given file. In either case, the debugger will revert to using built-in information called an export table. An export table is a list of all functions in a DLL that are available for other .dll or .exe files to call (i.e., all the functions the DLL exports to other programs). The debugger reads the information from the export table and builds the symbolic information from that information, which might be fine in some cases.
However, here's where you run into trouble. Assume that you have a DLL with five exported functions and six more functions that are just for this DLL to use. The order in which the functions are written in the code is random, so you have export and private functions mixed up. As far as the debugger is concerned, all the memory between Export Function 1 and Export Function 2 belongs to Export Function 1. However, halfway through this code is Private Function 1. If you're in this code, the debugger reports that you're still in the export function. If you have messages stating that you're using export symbols, take the information you get from the debug with a grain of salt, because it might not be correct. Get the proper symbols for the file in question.
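The misattribution described above can be shown with a toy model: resolving an address means picking the nearest known function start at or below it, so with only the export table a private function is reported as the preceding export. This is a sketch with a made-up DLL layout. The offsets and function names are hypothetical, and a real debugger does considerably more than this.

```python
import bisect

# Hypothetical layout of one DLL: (start offset, name, is_exported).
FUNCTIONS = [
    (0x1000, "ExportFunction1", True),
    (0x1400, "PrivateFunction1", False),
    (0x1800, "ExportFunction2", True),
]


def resolve(address, functions):
    """Name the function with the greatest start offset <= address."""
    starts = [start for start, _, _ in functions]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, name, _ = functions[index]
    return "{}+0x{:x}".format(name, address - start)


# What the debugger knows when it falls back to the export table.
exports_only = [f for f in FUNCTIONS if f[2]]

print(resolve(0x1450, FUNCTIONS))      # with full symbols
print(resolve(0x1450, exports_only))   # with export symbols only
```

With full symbols the address lands in PrivateFunction1. With exports only, the same address is reported as a large offset into ExportFunction1, which is the kind of output to distrust.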
4. Checking whether the dumped program has pageheap enabled
!gflag
See the WinDbg help for the specific parameters.
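5.
Find the function that handles the exception,
then find the address of its argument that points to
typedef struct _EXCEPTION_POINTERS {
    PEXCEPTION_RECORD ExceptionRecord;
    PCONTEXT ContextRecord;
} EXCEPTION_POINTERS, *PEXCEPTION_POINTERS;
say it is 0x88888888.
Then run dd 0x88888888
The first two DWORDs are the address of the exception record and the address of the context record.
Use .exr to see the details of the exception
Use .cxr to switch to that context
After switching context, kb shows the call stack at the moment of the fault.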
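The structure is just two pointers laid end to end, which is why the first two DWORDs of the dd output are the values to feed to .exr and .cxr. Here is a minimal sketch of that decoding, assuming a 32-bit little-endian process. The byte values are invented for illustration and do not come from the dump.

```python
import struct


def read_exception_pointers(raw):
    """Unpack the two 32-bit little-endian pointers of EXCEPTION_POINTERS."""
    exception_record, context_record = struct.unpack_from("<II", raw, 0)
    return exception_record, context_record


# Hypothetical bytes, as if copied from the first two DWORDs shown by dd.
raw = struct.pack("<II", 0x019FE0A8, 0x019FE0C4)

record, context = read_exception_pointers(raw)
print(".exr 0x{:08x}".format(record))   # command to show the exception record
print(".cxr 0x{:08x}".format(context))  # command to switch to the saved context
```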
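6. pageheap is not a cure-all; in some situations it does not help.
The BSTR cache in OLE is a particular problem. If the BSTR cache is in use,
the first point of failure is inside the OLE layer,
so even with pageheap enabled the error is not caught right away.
There is also the 8-byte alignment problem.
Frustrating.
Setting OANOCACHE=1 (an environment variable) turns the automatic cache off.
unasign (presumably the pageheap /unaligned option) removes the 8-byte alignment.
The pageheap options are stored in the registry
and ultimately affect the system's heap manager.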
7. COM+
Capturing a dump in crash mode still works even though the process has not crashed.
Strange.
server.urlencode
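8. On WinDbg still failing to load the dbg file converted from the map file that the Delphi compiler produces
Turn on !sym noisy.
Then copy the dbg file into the directory that the reported symbol search path expects, and it loads.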
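9. COM+ never raises a second chance exception, because dllhost wraps every thread in an outermost try ... catch.
10. Even though dllhost has not crashed, adplus can still capture it.
This differs from what the training said: Event Viewer shows no record of a COM+ application crash,
yet adplus -crash still produces a dump.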
11.HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\COM3\DEBUG
"DebugBreakOnFailFast"="Y"
Adding this registry value is supposed to bring up a prompt dialog when COM+ crashes.
So why did it not appear in my experiment?
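To apply the setting from item 11 repeatably, the key and value can be written out as a .reg file and imported. This is a sketch only: the helper name reg_file_text is made up, it handles plain string values without escaping backslashes or quotes in names or data, and it says nothing about why the dialog failed to appear.

```python
def reg_file_text(key, values):
    """Render string values under one registry key in .reg file syntax."""
    lines = ["Windows Registry Editor Version 5.00", "", "[{}]".format(key)]
    for name, data in values.items():
        lines.append('"{}"="{}"'.format(name, data))
    return "\r\n".join(lines) + "\r\n"


text = reg_file_text(
    r"HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\COM3\DEBUG",
    {"DebugBreakOnFailFast": "Y"},
)
print(text)
```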