Using Stress Testing to Ensure Software Quality (3): The Corrupted Window Chain Problem

2010-09-10 21:14  王克伟

The corrupted window chain problem

Summary: process A inexplicably destroys a window that belongs to process B, and process B is doomed.

 

The investigation

2010.6.7/6.8/6.9

Bug2609:

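1. The exception occurs in the pimg.exe process. What does this application do?
It is the Pictures & Videos application.
dumpsource pimg.exe shows that its code is in \private\apps\pimg

2. However, the exception is thrown at L52 of the WndProc function in \private\shellw\gserver\pimgdll\camera\frame.cpp, so the failure apparently happens while calling a Camera-related function in pimgdll.dll.
The code for pimgdll.dll is in \private\shellw\gserver\pimgdll

3. According to the call stack, after Pimg.exe brought up the Camera UI, the Camera window procedure failed while handling message 0x0000c007: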

PIMGDLL!WndProc(HWND__ * 0x700ee5a0, unsigned int 0x0000c007, unsigned int 0x00000003, long 0x00000000) frame.cpp line 52 + 5 bytes 
GWES!WindowProcCallback(void * 0x0cff588e, long (HWND__ *, unsigned int, unsigned int, long)* 0x41e9ea5d, CWindow * 0x700ee5a0, unsigned int 0x0000c007, unsigned int 0x00000003, long 0x00000000, bool * 0xd986edcf) wbase.cpp line 3198 + 18 bytes 
GWES!CWindow::CallWindowProcW_I(CePtr_t<long (__cdecl*)(HWND__ *,unsigned int,unsigned int,long)> {...}, HWND__ * 0x41e9ea5d, unsigned int 0x0000c007, unsigned int 0x00000003, long 0x00000000, SendMsgEntry_t * 0x00000000) wbase.cpp line 3403 + 21 bytes 
GWES!MsgQueue::DispatchMessageW_I(const tagMSG * 0x0002eca8) msgque.cpp line 4967 + 32 bytes 
GWES!PixelDoubled_t::DispatchMessageW_I(const tagMSG * 0x0002eca8) pixeldouble.cpp line 2083 + 9 bytes 
COREDLL!DispatchMessageW(const tagMSG * 0x0002eca8) twinuser.cpp line 2947 + 9 bytes 
PIMGDLL!CMainWnd::Run() frame.cpp line 365 + 9 bytes 
PIMGDLL!CCamera::Run(HWND__ * 0x700a9ec0, CConfig * 0x003ae620) cameraui.cpp line 997 
PIMGDLL!StartCameraUI(HWND__ * 0x700a9ec0, wchar_t * 0x00031d00, int 0x00000104, int 0x00000000) cameraui.cpp line 143 //▲ here
FBROWSER!CFilePicker::OnCmdSelectCamera(_ITEMIDLIST * 0x003a9d00) fpicker.cpp line 1024 
FBROWSER!CFilePicker::OnCommand(unsigned int 0x00000fab, long 0x003a9d00) fpicker.cpp line 1139 
FBROWSER!CPicturePicker::OnCommand(unsigned int 0x00000fab, long 0x003a9d00) picturepicker.cpp line 280 + 13 bytes 
FBROWSER!CCoreBrowser::WndProc(HWND__ * 0x700a9ec0, unsigned int 0x00000111, unsigned int 0x00000fab, long 0x003a9d00) core.cpp line 649 + 13 bytes ... 
FBROWSER!StartFilePicker(tagOPENFILENAMEEX * 0x0002fb04, int 0x00000002, int 0x00000001) picturepicker.cpp line 609 + 19 bytes 
PIMG!WinMain(HINSTANCE__ * 0x0cff588e, HINSTANCE__ * 0x0cff588e, wchar_t * 0x0002fc70, HINSTANCE__ * 0x0cff588e) pimg.cpp line 46 + 13 bytes

StartCameraUI first initializes the Camera and then runs it:
    CCamera::Initialize(hwnd,g_hInstRes, pcfg, TRUE);
    ...
    CCamera::GetCamera()->Run(hwnd,pcfg);
But by the time execution reaches PIMGDLL!WndProc:
    CState* pState= CCamera::GetCamera()->GetState();
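the pointer returned by CCamera::GetCamera() is null. In other words, the static member s_pCamera declared in
\private\shellw\gserver\pimgdll\camera\cameraui.h is null.
Scenarios in which s_pCamera could be null:
    a. CState::HandleMessage (one of the Camera main window's message handlers) received WM_SIZE, WM_HIBERNATE, or WM_DESTROY,
       then called CCamera::SetState(CV_STT_QUIT) and destroyed the camera object itself,
       then the CCamera destructor ran and called ClearAll, which cleared s_pCamera.
    b. Initialize failed. This case can be ruled out, because both the log and the program logic indicate that the function had already completed successfully.
    c. Once (a) has been ruled out, look for other possibilities.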
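
Scenario (a) can be modelled outside Windows CE. The sketch below is a hypothetical, portable stand-in, not the pimgdll source: Camera, s_instance, and HandleMessage are invented names that only mimic CCamera, s_pCamera, and WndProc. It shows why a message that is dispatched after the singleton has torn itself down has to check the pointer before using it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CCamera: a singleton whose teardown clears
// the static pointer, as ClearAll does for s_pCamera in the notes above.
class Camera {
public:
    static Camera* Get() { return s_instance; }
    static void Create() { if (!s_instance) s_instance = new Camera(); }
    static void Quit() { delete s_instance; s_instance = nullptr; }
    int State() const { return state_; }
private:
    Camera() { assert(s_instance == nullptr); }  // only one instance at a time
    int state_ = 1;
    static Camera* s_instance;
};
Camera* Camera::s_instance = nullptr;

// Hypothetical window procedure. Returns true if the camera handled the
// message, false if the camera object was already gone.
bool HandleMessage(std::uint32_t msg) {
    Camera* camera = Camera::Get();
    if (camera == nullptr) {
        // Late message (for example 0xC007) after teardown: ignore it
        // instead of dereferencing a null pointer.
        return false;
    }
    (void)msg;  // the model does not look at the message value
    return camera->State() != 0;
}
```

Calling Camera::Create(), then Camera::Quit(), then HandleMessage(0xC007) returns false instead of crashing. A guard like this would only hide the symptom; it does not explain who destroyed the window.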

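4. When the failure occurred, PIMGDLL!WndProc had received message 0x0000c007. What is this message? The RegisterWindowMessage API documentation says:
messages from 0xC000 through 0xFFFF are for messages that need an identifier that is unique across applications; otherwise, messages from WM_USER through 0x7FFF can be used as private messages. So who sent message 0x0000c007?
Hopper? And what was it sent for? I searched all of the Cashmere code and found nothing that could be related to 0x0000c007 or 0x0000c000.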
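
To make the ranges in item 4 concrete, here is a small portable classifier. It is only a sketch: the enum and function names are made up for illustration, and the boundary values (WM_USER = 0x0400, WM_APP = 0x8000) are the ones documented for desktop Windows, which I assume also apply on Windows CE.

```cpp
#include <cstdint>

// Hypothetical names for the documented message identifier ranges.
enum class MessageRange {
    System,              // 0x0000 - 0x03FF: reserved for the system
    PrivateWindowClass,  // 0x0400 - 0x7FFF: WM_USER based private messages
    Application,         // 0x8000 - 0xBFFF: WM_APP based messages
    RegisteredString,    // 0xC000 - 0xFFFF: returned by RegisterWindowMessage
    Other                // anything above 0xFFFF
};

// Classifies a message identifier by the range it falls into.
MessageRange ClassifyMessage(std::uint32_t msg) {
    constexpr std::uint32_t kWmUser = 0x0400;  // WM_USER
    constexpr std::uint32_t kWmApp = 0x8000;   // WM_APP
    if (msg < kWmUser) return MessageRange::System;
    if (msg < kWmApp) return MessageRange::PrivateWindowClass;
    if (msg < 0xC000) return MessageRange::Application;
    if (msg <= 0xFFFF) return MessageRange::RegisteredString;
    return MessageRange::Other;
}
```

ClassifyMessage(0xC007) yields RegisteredString, so the mystery message was most likely obtained through RegisterWindowMessage by some component, and its numeric value by itself says nothing about which string was registered.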

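5. Could this bug be the following scenario?
Pimg.exe created Camera window A, and IE also created Camera window B (for some reason A and B are actually the same window). IE has already destroyed window B (which means A is destroyed too), and then Pimg.exe sends a message to window A.
But CCamera has in fact been destroyed together with window A, so calling a member of the destroyed CCamera object triggers an AV (access violation) exception. (The message-loop flag of window A, s_fRunMessageLoop, was not set to FALSE when the window was destroyed.
Could it be that the destruction of window A did not go through window A's window procedure, but through window B's?)
Some code to support the speculation:
(1). At L695 of \private\shellw\gserver\pimgdll\camera\cameraui.cpp we first new a CMainWnd object: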
    s_pCamera->m_pMWnd = new CMainWnd(hInstance,parent);
Then at L740 the window is created with the specified parent window:
    CBR(s_pCamera->m_pMWnd->Create(s_pCamera->m_hParent));
CMainWnd::Create is in
(2). In the CCamera class definition we see:
    // parent window. this window may not the same as passing one
    // if the camera/video window is active
    HWND m_hParent;
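If several applications (for example IE and Pimg) create a CCamera at the same time, how exactly do the different CCamera objects coexist? Is CCamera a single instance? What about the Camera window that belongs to each CCamera?
(3). CCamera is a different instance in each application and the instances cannot share data (for example, the value of s_fRunMessageLoop in the CCamera member m_pMWnd differs between IE and Pimg).
That is, IE and Pimg own different CCamera objects, yet the Camera window behind those objects is actually the same one.
That would explain the following observations: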

       hWnd value 0x700ee5a0. The memory is valid, and the member values are:
       m_hprcCreater: 0x0cff588e(pimg.exe) 
       m_hprcDestroyer: 0x0a4c0282(iexplorer.exe) 
       m_fBeginDestroyed = 1; 

and:

       The following call stack is from a thread (0x8f211b6) of iexplorer (0x0a4c0282): the IE process destroys the pimg process's camera window.

       0xd98cf618 K.COREDLL!xxx_WaitForSingleObject(0x0d680dab, 0xffffffff)  line 418 
       0xd98cf690 GWES!MsgQueue::GetEvent(0x80000001, 0x00000000, 0x00000000, 0x00000000, 0xd98cf6b8)  line 3746 + 10 bytes 
       0xd98cf700 GWES!MsgQueue::SendMessageWithOptions(0x700ee5a0, 0x00000002, 0x00000000, 0x00000000, 0x00000000)  line 4171 
       0xd98cf728 GWES!CWindow::SendDestroyMessages1()  line 1200 + 11 bytes 
       0xd98cf748 GWES!CWindow::DestroyWindow_I(0x700ee5a0)  line 1434  =====> 0x700ee5a0 is pimg camera window handle value. 
       0xd98cf764 GWES!CWindow::DestroyWindow_I(0x700c4c20)  line 1346 
       0x009efa4c BROWSUI!DialogBase::Run(0x700ae240, 0x009efa5c)  line 293 + 6 bytes 
       0x009efa60 BROWSUI!BrowserOptions::Run(0x700ae240)  line 123 

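On June 9 I verified that the iexplore.exe process had not loaded the pimgdll.dll module, and the log shows no record of the module ever being unloaded, so the speculation above probably does not hold.
I inspected the kernel object at 0x700ee5a0 myself with the command:
{,,coredll.dll}(CWindow*)0x700ee5a0
The result: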

-    {,,coredll.dll}(CWindow*)0x700ee5a0    0x700ee5a0 
+    m_ParentChild1    {...} 
    s_sigValid    0x574e4457 
    m_sig    0x574e4457 
+    m_pcwndOwner    0x00000000 
+    m_pcwndOwned    0x00000000 
+    m_pcwndNextOwned    0x700d7980 // this window was created by iexplorer.exe; what is this member used for?
+    m_pcwndRestore    0x00000000 
+    m_rc    {...} 
+    m_rcClient    {...} 
+    m_pgdiwnd    {...} 
+    m_pgdiwndClient    {...} 
+    m_pgdiwndClientUpdate    {...} 
+    m_rcRestore    {...} 
+    m_ptblProperties    0x00000000 
    m_dwState    0x00000000 
    m_psbii    0x00000000 
    m_pGestureStateManager    0x00000000 
+    m_pszName    0x700ee6a0 " // string bytes: FE 56 47 72 8C 54 C6 89 91 98 00 00
+    m_pmsgq    0x700883e0 
    m_himc    0x00032360 
    m_hprcHimcOwner    0x0cff588e 
    m_grfStyle    0x80000000 
    m_grfExStyle    0x80000000 
+    m_pwc    0x700ee544 
    m_lID    0x00000000 
    m_lUserData    0x00000000 
    m_hprcCreator    0x0cff588e //pimg.exe 
    m_hthdCreator    0x0d08607e 
-    m_WindowProcPtr    {...} 
-    CePtrBase_t    {...} 
    m_hProc    0x0cff588e // the window procedure is in pimg.exe
    m_Ptr    0x41e9ea5d WndProc(HWND__ *, unsigned int, unsigned int, long) 
    m_PtrLong    0x41e9ea5d 
    m_PtrUnsignedLong    0x41e9ea5d 
+    m_hrgnWindowRgn    0x00000000 
+    m_hrgnVisible    0x039a16f1 
+    m_hrgnUpdate    0x033e1e6b 
+    m_hrgnClientVisible    0x009e0fb0 
+    m_hrgnClientUpdate    0x00000000 
    m_pBackBuffer    0x00000000 
+    m_BlendFunction    {...} 
    m_crKey    0x00000000 
+    mLayeredWindowFlags    {...} 
+    m_hmenu    0x0a4c0282 
    m_hprcDestroyer    0x0a4c0282 //iexplorer.exe 
+    m    {...} 
    m_grfBitFields    0x000044d0 
    m_ullGuardGestureFlags    0x0000000000000000 
    m_rgdwExtraBytes    0x700ee680 

As we can see, its object in the kernel is quite complex.
Looking at the implementation of CreateWindowEx (that is, CWindowManager::CreateWindowExW_I) in private\winceos\coreos\gwe\winmgr\wmbase\wmbase.cpp, we find at L690:

    // Handle parent-child-owner issues 
    if ( grfStyle & WS_POPUP ) 
        { 
        grfStyle &= ~WS_CHILD;  //    Make sure we don't have both WS_POPUP and WS_CHILD. 

        //    If popup, owner was passed in as parent parameter. 
        pcwndOwner = (CWindow*)hwndParent; 

L724:

        hwndParent = NULL;      //    Make the new window a top level window. 

L860:            

        pcwnd->m_pcwndOwner = pcwndOwner; 
        //    Insert into owner's owned list. 
        pcwnd->m_pcwndNextOwned = pcwndOwner->m_pcwndOwned; 
        pcwndOwner->m_pcwndOwned = pcwnd; 

From private\winceos\coreos\gwe\winmgr\wmbase\wbase.cpp:

/* 
CWindow::Initialize 

Initializes a window. 

*/ 
BOOL 
CWindow:: 
Initialize( 
    CWindow *pcwndParent, 
    UINT32 cy, 
    UINT32 cx, 
    INT32 y, 
    INT32 x, 
    UINT32 grfStyle, 
    const WCHAR *szName, 
    UINT32 grfExStyle, 
    MsgQueue *pmsgq, 
    CWindowClass *pwc, 
    LONG lID 
    ) 
{ 
    BOOL Ret = FALSE; 
    const WCHAR *szCopy = NULL; 
    DWORD ProcessVersion; 

    AssertInUserOnce(); 

    // These will be filled in below.  NULL them here for finally clause. 
    Parent(pcwndParent); // stores pcwndParent as m_ParentChild1.m_pParent
    m_rc.left = x; 
    m_rc.top = y; 

    if ( Parent() ) 
        { 
        m_rc.left += Parent() -> m_rcClient.left; 
        m_rc.top  += Parent() -> m_rcClient.top ; 
        } 

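From the code above, for window 0x700ee5a0 the value of m_ParentChild1.m_pParent should be null while m_pcwndOwner should be non-null,
and m_pcwndNextOwned should be the same as m_pcwndOwner.m_pcwndOwned (or is that wrong?).
But m_ParentChild1.m_pParent is null and m_pcwndOwner is actually null as well.
Why?
The window that m_pcwndNextOwned points to should have been created by pimg.exe
(the call stack shows that the first argument passed in is the parent window of 0x700ee5a0, and m_pcwndOwner should have this value:
PIMGDLL!StartCameraUI(HWND__ * 0x700a9ec0, wchar_t * 0x00031d00, int 0x00000104, int 0x00000000) ),
but it was actually created by iexplorer.exe.
One explanation is that some process tampered with the kernel data of window 0x700ee5a0. (iexplorer.exe? NK.exe?)
However, when two independent processes each create a window with the same ClassName and the same WindowName, they get different window handles. So what is really going on?
I ran the following experiment:
start Pimg.exe normally, open the Camera, set a breakpoint at the beginning of the StartCameraUI function in
private\shellw\gserver\pimgdll\camera\cameraUI.cpp, and look at the Camera window's data: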
-    {,,coredll.dll}(CWindow*)0x700525c0    0x700525c0 
-    m_ParentChild1    {...} 
+    m_pNextSiblingBehind    0x70051880 
+    m_pNextSiblingInFront    0x70051c00 
+    m_pParent    0x700302a0 // a window of NK.EXE
+    m_pFirstChildFront    0x70052720 
+    m_pFirstChildBack    0x70052b80 
    s_sigValid    0x574e4457 
    m_sig    0x574e4457 
    s_sigValid    0x574e4457 
    m_sig    0x574e4457 
+    m_pcwndOwner    0x70042b00 // a window of pimg.exe
+    m_pcwndOwned    0x70051c00 // a window of pimg.exe
+    m_pcwndNextOwned    0x70050260 // a window of pimg.exe
+    m_pcwndRestore    0x00000000 
+    m_rc    {...} 
+    m_rcClient    {...} 
+    m_pgdiwnd    {...} 
+    m_pgdiwndClient    {...} 
+    m_pgdiwndClientUpdate    {...} 
+    m_rcRestore    {...} 
+    m_ptblProperties    0x00000000 
    m_dwState    0x00000000 
    m_psbii    0x00000000 
    m_pGestureStateManager    0x00000000 
+    m_pszName    0x700526c0 "Pictures & Videos" 
+    m_pmsgq    0x70045600 
    m_himc    0x000322e0 
    m_hprcHimcOwner    0x0bd704c6 
    m_grfStyle    0x90000000 
    m_grfExStyle    0x80000000 
+    m_pwc    0x70052304 
    m_lID    0x00000000 
    m_lUserData    0x00000000 
    m_hprcCreator    0x0bd704c6 //pimg.exe 
    m_hthdCreator    0x0bd804c6 
+    m_WindowProcPtr    {...} 
+    m_hrgnWindowRgn    0x00000000 
+    m_hrgnVisible    0x004a091f 
+    m_hrgnUpdate    0x00800971 
+    m_hrgnClientVisible    0x004a093d 
+    m_hrgnClientUpdate    0x00000000 
    m_pBackBuffer    0x00000000 
+    m_BlendFunction    {...} 
    m_crKey    0x00000000 
+    mLayeredWindowFlags    {...} 
+    m_hmenu    0x00000000 
    m_hprcDestroyer    0x00000000 
+    m    {...} 
    m_grfBitFields    0x00004447 
    m_ullGuardGestureFlags    0x0000000000000000 
    m_rgdwExtraBytes    0x700526a0 

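Question:
When creating a window, what happens if a window with the specified ClassName and WindowName already exists? (You get a different window handle.) For example: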

//*************************************************************************** 
// Function name    : CMainWnd::Create 
// Description      : create main window 
// Return type      : BOOL 
//*************************************************************************** 
BOOL CMainWnd::Create(HWND hParent) 
{ 
    BOOL fSuccess = TRUE; 
    WNDCLASS wc; 
    wc.style = CS_VREDRAW | CS_HREDRAW;             // Window style 
    wc.lpfnWndProc = WndProc;                       // Callback function 
    wc.cbClsExtra = 0;                              // Extra class data 
    wc.cbWndExtra = 0;                              // Extra window data 
    wc.hInstance = m_hInst;                         // Owner handle 
    wc.hIcon = NULL,                                // Application icon 
    wc.hCursor = LoadCursor(NULL, IDC_ARROW);       // Default cursor 
    wc.hbrBackground = GetBrush(); 
    wc.lpszMenuName =  NULL;                        // Menu name 
    wc.lpszClassName = CCamera::GetAppClassName();                  // Window class name 
    RegisterClass (&wc); 

    // Create main window. 
    int iMenuBarHeight = SHGetMetric(SHUI_MENUHEIGHT); 
    RECT rc; 

    SystemParametersInfo(SPI_GETWORKAREA, 0, &rc, 0); 

    if (hParent) 
    { 
        m_hParent = hParent; 
    } 
    DWORD Style = WS_POPUP | WS_VISIBLE; 
    m_hWnd = CreateWindowEx(WS_EX_CAPTIONOKBTN, CCamera::GetAppClassName(), CCamera::GetCamera()->GetAppName(), 
                    Style, 
                    rc.left, rc.top, 
                    RECTWIDTH(rc), 
                    RECTHEIGHT(rc) - iMenuBarHeight, 
                    m_hParent, 0, m_hInst, 0); 
    // parameters 
    if (!IsWindow (m_hWnd)) 
    { 
        fSuccess = FALSE; 
        goto Error; 
    } 

Tasks for June 8:
1. Find out in which scenarios s_pCamera is cleared.
2. Use CeDebugX to look at some of the data.

To be studied:
private\winceos\coreos\gwe\inc\window.hpp, the actual implementation of CWindow in GWE.

 

2010.6.10 
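Bug2609 is really bizarre, and there is a lot about it that I don't understand.
Some of the kernel data of the failing Camera window is as follows: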

m_pcwndOwner    0x00000000 
m_pcwndOwned    0x00000000 
m_pcwndNextOwned    0x700d7980 // created by iexplorer.exe
m_hprcCreator    0x0cff588e //pimg.exe
m_pszName    0x700ee6a0 " // string bytes: FE 56 47 72 8C 54 C6 89 91 98 00 00
m_hProc    0x0cff588e // the window procedure is in pimg.exe
m_hprcDestroyer    0x0a4c0282 //iexplorer.exe
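
The m_pszName bytes in this dump are not garbage. Read as little-endian UTF-16 they decode to U+56FE U+7247 U+548C U+89C6 U+9891, which spell 图片和视频 ("Pictures and Videos" in Chinese), presumably a localized caption of the same window. The helper below is a minimal sketch for checking this by hand; DecodeUtf16Le is a name I made up, and it deliberately ignores surrogate pairs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Interprets a byte buffer as little-endian UTF-16 code units and stops
// at the first 0x0000 terminator. Surrogate pairs are not combined; the
// bytes in the dump are all in the Basic Multilingual Plane.
std::vector<std::uint16_t> DecodeUtf16Le(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const std::uint16_t unit =
            static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        if (unit == 0) break;  // string terminator
        units.push_back(unit);
    }
    return units;
}
```

Feeding it the twelve bytes FE 56 47 72 8C 54 C6 89 91 98 00 00 gives five code units, matching the five characters above.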

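I also started the Camera UI normally from Pimg.exe via StartCameraUI and set a breakpoint in the Camera window procedure; some of its kernel data then looks like this: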

m_ParentChild1.m_pParent    0x700302a0 // a window of NK.EXE
m_pcwndOwner    0x70042b00 // a window of pimg.exe
m_pcwndOwned    0x70051c00 // a window of pimg.exe
m_pcwndNextOwned    0x70050260 // a window of pimg.exe
m_pszName    0x700526c0 "Pictures & Videos" 
m_hprcCreator    0x0bd704c6 //pimg.exe 

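The kernel data of the failing window is clearly somewhat corrupted: the Camera window created by pimg.exe was destroyed by iexplorer.exe, and fields such as m_pcwndOwner are unexpectedly null.
Can IE launch the Camera?
As far as I can see, every application that needs Camera functionality loads the pimgdll.dll module, and CCamera is a single instance that also manages the Camera window. Could the following be happening?
Pimg.exe created Camera window A, and IE also created Camera window B (for some reason A and B are actually the same window). IE has already destroyed window B (which means A is destroyed too), and then Pimg.exe sends a message to window A.
But CCamera has in fact been destroyed together with window A, so calling a member of the destroyed CCamera object triggers an AV exception.
Even if two independent applications create windows with the same ClassName and WindowName, they get different window handles (PS: strangely, spy shows only one window), and the kernel objects should be different too (to be verified).
Teacher, do you see any problem with this speculation? Or do you have a better suggestion?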

Continuing with Bug2609:
I wrote the simplest possible application with a single window, Test.exe, and dumped its process information:

Windows CE>!proc 0x0AA1007E 

+========================================= 
| test.exe 
|  
| Process Handle : 0x0aa1007e 
+========================================= 

   Cmdline        : 
   Process ptr    : 0x8c719354 
   Size           : 0x00003000 
   Base ptr       : 0x00010000 
   Lowest free VM : 0x000e0000 
   Time stamp     : 0xef74579d 
   File type      : Object Store 
   hTok           : 0x00000002 
   ASID           : 0x00 
   State          : (0) normal 
   Rsrc mod ptr   : 0x00000000 

   Module                 VM range                  Ref cnt  Module ptr 
   -------------------------------------------------------------------------- 
   aygshell.dll           0x41310000 - 0x41397000      9     0x84d2d404 
   ceperf.dll             0x43cb0000 - 0x43cbc000      9     0x8535d2a4 
   commctrl.dll           0x41f20000 - 0x41fb1000      9     0x8535d024 
   commctrl.dll.0409.mui   0x43ca0000 - 0x43ca2000     10     0x8535d15c 
   compime.dll            0x41160000 - 0x41193000      3     0x850274b4 
   coredll.dll            0x40030000 - 0x400f2000     19     0x8ffa9594 
   coredll.dll.0409.mui   0x43ba0000 - 0x43bb5000     19     0x8fadf540 
   lpcrt.dll              0x42170000 - 0x42175000     13     0x84bc0f30 
   ole32.dll              0x41d70000 - 0x41df3000     13     0x84bc0cc0 
   oleaut32.dll           0x41a90000 - 0x41ac9000     12     0x851a11e4 
   ossvcs.dll             0x41c30000 - 0x41c8f000     10     0x850d592c 
   rpcrt4.dll             0x420b0000 - 0x42112000     13     0x84bc0df8 
   shellres.dll           0x43d10000 - 0x43d23000      5     0x85027688 
   shellres.dll.0409.mui   0x43d30000 - 0x43d60000      6     0x85027724 
   shutil.dll             0x41800000 - 0x41807000      4     0x861a2a24 

   pThread     hThread      Priority       User Time   Run        Death      Current Proc      Thread Name 
                            curr:base                  State      State 
   ---------------------------------------------------------------------------------------------------------- 
   0x875e09c4  0x0aa4009a   251 : 251         367 ms   Blocked    Healthy    NK.EXE            WinMainCRTStartup 

Using the !win all command, I can also see the windows of the two Test.exe instances that I started:

"TryWindowName" 
hwnd=0x70047700 Class=TryClassName parent=0x700302a0 hThread=0x0ab200de hProcess=0x0aaf00de 
x=0 y=36 width=480 height=764 
Style=WS_VISIBLE WS_POPUP 
exstyle=WS_CAPTIONOKBTN 

"TryWindowName" 
hwnd=0x70051560 Class=TryClassName parent=0x700302a0 hThread=0x0aa4009a hProcess=0x0aa1007e 
x=0 y=0 width=0 height=0 
Style=WS_VISIBLE WS_POPUP 
exstyle=WS_CAPTIONOKBTN 

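Inspecting {,,coredll.dll}(CWindow*)0x70047700 and {,,coredll.dll}(CWindow*)0x70051560 in turn, everything looks normal,
and destroying the one that is not active also behaves normally. So would reproducing this bug require modifying the kernel data of one of the windows by some special means? By what means?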

 

2010.6.12/6.13
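Can the whole class of problems in which a window's creator process and destroyer process differ be fixed? How?
It is very likely a double free window handle problem.

Why is it that when iexplorer.exe opens a dialog, the window's creator process is shell32.exe, its creator thread is a thread of iexplorer.exe, and its destroyer is iexplorer.exe?
Answer:
iexplorer.exe makes a PSL (protected server library) call into shell32.exe. If shell32.exe creates a window during that PSL call, the window's creator process is shell32.exe, but its creator thread belongs to iexplorer.exe.
Normally the creator process and the creator thread are consistent, but inside a PSL call they can differ.
So I will try checking at L1400 of private\winceos\COREOS\gwe\winmgr\wmbase\wbase.cpp whether m_hprcCreator and m_hprcDestroyer are the same, and break if they are not.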

Question:
\private\winceos\COREOS\gwe\winmgr\wmbase\wmbase.cpp L860:

            pcwnd->m_pcwndOwner = pcwndOwner; 
            //    Insert into owner's owned list. 
            pcwnd->m_pcwndNextOwned = pcwndOwner->m_pcwndOwned; 
            pcwndOwner->m_pcwndOwned = pcwnd; 

How exactly is the window chain managed?
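
As a partial answer, the three assignments at L860 are a head insertion into a singly linked list: each owner keeps the head in m_pcwndOwned, and each owned window points to its next sibling through m_pcwndNextOwned. The model below is a simplified, hypothetical stand-in for CWindow (the struct, its fields, and both functions are my own illustration, not GWES code). It assumes that destroying an owner also destroys every window on its owned list, which matches the IE call stack earlier in these notes, where DestroyWindow_I(0x700c4c20) calls DestroyWindow_I(0x700ee5a0).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for GWES's CWindow. Only the fields
// discussed in these notes are modelled.
struct Window {
    Window(std::string n, unsigned process)
        : name(std::move(n)), creatorProcess(process) {}
    std::string name;
    unsigned creatorProcess;       // m_hprcCreator
    unsigned destroyerProcess = 0; // m_hprcDestroyer
    Window* owner = nullptr;       // m_pcwndOwner
    Window* owned = nullptr;       // m_pcwndOwned: head of this window's owned list
    Window* nextOwned = nullptr;   // m_pcwndNextOwned: next sibling in the owner's list
    bool destroyed = false;
};

// Mirrors the three assignments quoted from wmbase.cpp L860.
void InsertOwned(Window* owner, Window* wnd) {
    assert(owner != nullptr && wnd != nullptr);
    wnd->owner = owner;
    wnd->nextOwned = owner->owned;
    owner->owned = wnd;
}

// Assumption: destroying an owner destroys its owned windows first, all on
// behalf of the calling process. Windows whose creator differs from the
// caller are collected in crossProcess.
void Destroy(Window* wnd, unsigned callerProcess, std::vector<Window*>* crossProcess) {
    for (Window* w = wnd->owned; w != nullptr;) {
        Window* next = w->nextOwned;  // read before the node is torn down
        Destroy(w, callerProcess, crossProcess);
        w = next;
    }
    wnd->owned = nullptr;
    wnd->destroyed = true;
    wnd->destroyerProcess = callerProcess;
    if (wnd->creatorProcess != callerProcess) crossProcess->push_back(wnd);
}
```

If a Camera window created by one process is inserted into the owned list of a dialog that belongs to another process, destroying the dialog also destroys the Camera window, and crossProcess reports it. That is the creator/destroyer mismatch seen in the dumps.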

 

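My approach to solving stress-test bugs:
1. If a bug is very hard to analyze from a single occurrence, don't spend too much time on it: the bug itself may offer very little usable data, and much of that data may lead you in the wrong direction.
Instead, first find a way to narrow the scope of the problem so that the repro steps are simple and repeatable, then write automation tools and use them to collect more data. Then analyze that data and narrow the scope again.
2. For the corrupted window list problem, my approach is as follows:
a. Add the following code at L1303 of \private\winceos\coreos\gwe\winmgr\wmbase\wbase.cpp: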

    //Trace double free window handle problem 
    if (This.m_hprcDestroyer != This.m_hprcCreator) 
    { 
        HTHREAD hthdDestroyer = (HTHREAD)GetCurrentThreadId(); 
        if (hthdDestroyer != This.m_hthdCreator) 
        { 
            DebugBreak(); 
        } 
    } 

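It hit once today, as follows:
no exception had occurred at that point.
Window 0x70083e20 of Transcriber.exe had ended up in a window chain of the cprog.exe process, and at the time of the break cprog.exe was destroying this window: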

-    {,,coredll.dll}(CWindow*)0x70083e20    0x70083e20 
+    m_pcwndOwner    0x00000000 
+    m_pcwndOwned    0x00000000 
+    m_pcwndNextOwned    0x70083440 
+    m_pcwndRestore    0x00000000 
+    m_pszName    0x70083f40 "Transcriber" 
+    m_pmsgq    0x7006e060 
    m_himc    0x00000000 
    m_hprcHimcOwner    0x00000000 
    m_grfStyle    0x80000a04 
    m_grfExStyle    0x84090000 
+    m_pwc    0x7006eea4 
    m_lID    0x00000000 
    m_lUserData    0x00000000 
    m_hprcCreator    0x00d60776 //transcriber.exe 
    m_hthdCreator    0x03ba29f6 // a thread of transcriber.exe
    m_hprcDestroyer    0x01ce0066 //cprog.exe 

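After this window is destroyed there may be no problem at all, but it may also cause an AV exception in the transcriber.exe process, as in the earlier Hopper bugs 2609, 2652, and 2665.
b. But we still don't know how window 0x70083e20 of Transcriber.exe ended up in a window chain of the cprog.exe process, so add the following code at
L860 of \private\winceos\COREOS\gwe\winmgr\wmbase\wmbase.cpp: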

                //Trace double free window handle problem 
                if (pcwnd->m_hprcCreator != pcwndOwner->m_hprcCreator) 
                { 
                    if (pcwnd->m_hthdCreator != pcwndOwner->m_hthdCreator) 
                    { 
                        DebugBreak(); 
                    } 
                } 
(2010.7.8:
                //Trace double free window handle problem 
                if (0 != lstrcmp(pcwnd->m_pszName, L"Transcriber") && 
                    pcwnd->m_hprcCreator != pcwndOwner->m_hprcCreator) 
                { 
                    if (pcwnd->m_hthdCreator != pcwndOwner->m_hthdCreator) 
                    { 
                        DebugBreak(); 
                    } 
                } 
)
During verification the break hit several times; on inspection it turned out to be caused by code like this:
    HWND  hwnd = GetForegroundWindow(); 
    ... 
    MessageBox(hwnd, szMsgStr, CGR_APP_TITLE, uiType|MB_SETFOREGROUND);

 

2010.6.19
From the call stack of a dump file captured last night (see the approach for Bug2609 in the June 13 entry):

GWES!CWindowManager::CreateWindowExW_I(unsigned int 0x80000000, const wchar_t * 0x41f21468, const wchar_t * 0x42d60808, unsigned int 0x90000000, int 0x00000000, int 0x00000024, unsigned int 0x000001e0, unsigned int 0x000002b8, HWND__ * 0x00000000, HMENU__ * 0x00000000, HINSTANCE__ * 0x84aaeb08, void * 0x00000000, tagCREATESTRUCTW * 0x0002e9f4) wmbase.cpp line 865 
GWES!PixelDoubled_t::CreateWindowExW_I(unsigned int 0x80000000, const wchar_t * 0x41f21468, const wchar_t * 0x42d60808, unsigned int 0x90000000, int 0x00000000, int 0x00000024, unsigned int 0x000001e0, unsigned int 0x000002b8, HWND__ * 0x7007d0c0, HMENU__ * 0x00000000, HINSTANCE__ * 0x84aaeb08, void * 0x00000000, tagCREATESTRUCTW * 0x0002e9f4) pixeldouble.cpp line 1299 + 41 bytes 
COREDLL!CreateWindowExW(unsigned long 0x80000000, const wchar_t * 0x41f21468, const wchar_t * 0x42d60808, unsigned long 0x90000000, int 0x00000000, int 0x00000024, int 0x000001e0, int 0x000002b8, HWND__ * 0x7007d0c0▲handle B, HMENU__ * 0x00000000, HINSTANCE__ * 0x84aaeb08, void * 0x00000000) twinuser.cpp line 474 + 48 bytes 
PIMGDLL!CMainWnd::Create(HWND__ * 0x7007d0c0▲handle B) camera_ppc.cpp line 165 
PIMGDLL!CCamera::Initialize(HWND__ * 0x700a5e40▲handle A, HINSTANCE__ * 0x84aaeb08, CConfig * 0x0003c1e0, int 0x00000001) cameraui.cpp line 740 + 40 bytes 
PIMGDLL!StartCameraUI(HWND__ * 0x700a5e40▲handle A, wchar_t * 0x00031be0, int 0x00000104, int 0x00000000) cameraui.cpp line 133 + 17 bytes 
... 

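StartCameraUI intended the window with handle A to be the Camera's parent window, but by the end of the call chain it had become the window with handle B.
The kernel data for handle A is: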

-    {,,coredll.dll}(CWindow*)0x700a5e40    0x700a5e40 
+    m_ParentChild1    {...} 
    s_sigValid    0x574e4457 
    m_sig    0x574e4457 
+    m_pcwndOwner    0x00000000 //▲ top-level window of pimg.exe
+    m_pcwndOwned    0x700a69e0 
+    m_pcwndNextOwned    0x00000000 
+    m_pcwndRestore    0x700a5e40 
+    m_rc    {...} 
+    m_rcClient    {...} 
+    m_pgdiwnd    {...} 
+    m_pgdiwndClient    {...} 
+    m_pgdiwndClientUpdate    {...} 
+    m_rcRestore    {...} 
+    m_ptblProperties    0xd9edc320 
    m_dwState    0x00000000 
    m_psbii    0x00000000 
    m_pGestureStateManager    0x00000000 
+    m_pszName    0x700a5f60 "Pictures & Videos" 
+    m_pmsgq    0x700a5220 
    m_himc    0x000322e0 
    m_hprcHimcOwner    0x027b2ac6 
    m_grfStyle    0x10000000 
    m_grfExStyle    0x00000000 
+    m_pwc    0x700a5de4 
    m_lID    0x00000000 
    m_lUserData    0x000312e0 
    m_hprcCreator    0x027b2ac6 //▲ pimg.exe 
    m_hthdCreator    0x02d228be 
+    m_WindowProcPtr    {...} 
+    m_hrgnWindowRgn    0x00000000 
+    m_hrgnVisible    0x00841875 
+    m_hrgnUpdate    0x00920e3c 
+    m_hrgnClientVisible    0x049e128a 
+    m_hrgnClientUpdate    0x00000000 
    m_pBackBuffer    0x00000000 
+    m_BlendFunction    {...} 
    m_crKey    0x00000000 
+    mLayeredWindowFlags    {...} 
+    m_hmenu    0x00000000 
    m_hprcDestroyer    0x00000000 
+    m    {...} 
    m_grfBitFields    0x00000444 
    m_ullGuardGestureFlags    0x0000000000000000 
    m_rgdwExtraBytes    0x700a5f20 


The kernel data for handle B is:

-    {,,coredll.dll}(CWindow*)0x7007d0c0    0x7007d0c0 
+    m_ParentChild1    {...} 
    s_sigValid    0x574e4457 
    m_sig    0x574e4457 
+    m_pcwndOwner    0x00000000 //▲ top-level window of clock.exe
+    m_pcwndOwned    0x70087f00 
+    m_pcwndNextOwned    0x00000000 
+    m_pcwndRestore    0x70087f00 
+    m_rc    {...} 
+    m_rcClient    {...} 
+    m_pgdiwnd    {...} 
+    m_pgdiwndClient    {...} 
+    m_pgdiwndClientUpdate    {...} 
+    m_rcRestore    {...} 
+    m_ptblProperties    0x00000000 
    m_dwState    0x00000000 
    m_psbii    0x00000000 
    m_pGestureStateManager    0x00000000 
+    m_pszName    0x7007d1e0 "Clock & Alarms" 
+    m_pmsgq    0x700725e0 
    m_himc    0x00031c00 
    m_hprcHimcOwner    0x07b178b6 
    m_grfStyle    0x08c00000 
    m_grfExStyle    0x40000000 
+    m_pwc    0x70075384 
    m_lID    0x00000000 
    m_lUserData    0x00000000 
    m_hprcCreator    0x07b178b6 //▲clock.exe 
    m_hthdCreator    0x09d4c7fe 
+    m_WindowProcPtr    {...} 
+    m_hrgnWindowRgn    0x00000000 
+    m_hrgnVisible    0x060a0d58 
+    m_hrgnUpdate    0x05f21d01 
+    m_hrgnClientVisible    0x05941dac 
+    m_hrgnClientUpdate    0x00000000 
    m_pBackBuffer    0x00000000 
+    m_BlendFunction    {...} 
    m_crKey    0x00000000 
+    mLayeredWindowFlags    {...} 
+    m_hmenu    0x00000000 
    m_hprcDestroyer    0x00000000 
+    m    {...} 
    m_grfBitFields    0x00000544 
    m_ullGuardGestureFlags    0x0000000000000000 
    m_rgdwExtraBytes    0x7007d1a0 

And the Camera window that was created is:

-    {,,coredll.dll}(CWindow*)0x700e5d60    0x700e5d60 
+    m_ParentChild1    {...} 
    s_sigValid    0x574e4457 
    m_sig    0x574e4457 
+    m_pcwndOwner    0x7007d0c0 //▲ handle B; it should be handle A
+    m_pcwndOwned    0x00000000 
+    m_pcwndNextOwned    0x70087f00 
+    m_pcwndRestore    0x00000000 
+    m_rc    {...} 
+    m_rcClient    {...} 
+    m_pgdiwnd    {...} 
+    m_pgdiwndClient    {...} 
+    m_pgdiwndClientUpdate    {...} 
+    m_rcRestore    {...} 
+    m_ptblProperties    0x00000000 
    m_dwState    0x00000000 
    m_psbii    0x00000000 
    m_pGestureStateManager    0x00000000 
+    m_pszName    0x700e5c60 "Pictures & Videos" 
+    m_pmsgq    0x700a5220 
    m_himc    0x000322e0 
    m_hprcHimcOwner    0x027b2ac6 
    m_grfStyle    0x80000000 
    m_grfExStyle    0x80000000 
+    m_pwc    0x700e5c04 
    m_lID    0x00000000 
    m_lUserData    0x00000000 
    m_hprcCreator    0x027b2ac6 //▲ pimg.exe 
    m_hthdCreator    0x02d228be 
+    m_WindowProcPtr    {...} 
+    m_hrgnWindowRgn    0x00000000 
+    m_hrgnVisible    0x00ea201e 
+    m_hrgnUpdate    0x01be1ff4 
+    m_hrgnClientVisible    0x019a1ffd 
+    m_hrgnClientUpdate    0x00000000 
    m_pBackBuffer    0x00000000 
+    m_BlendFunction    {...} 
    m_crKey    0x00000000 
+    mLayeredWindowFlags    {...} 
+    m_hmenu    0x00000000 
    m_hprcDestroyer    0x00000000 
+    m    {...} 
    m_grfBitFields    0x00000404 
    m_ullGuardGestureFlags    0x0000000000000000 
    m_rgdwExtraBytes    0x700e5e40 

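Normally the handle of the Camera's parent window should be A, but it is actually B. Is that reasonable? Can we then guess that in some situations the Camera's parent window is a window of another EXE?
If so, that would account for Bug2609 and Bug2665 (where window A became a cpl window of shell32.exe).
Reasoning further, the problem is probably in the code near L708 of \private\shellw\gserver\pimgdll\camera\cameraui.cpp.

I have now changed the code at L698 of \private\shellw\gserver\pimgdll\camera\cameraui.cpp to: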

    else 
    { 
        ... 
        //if (!s_pCamera->m_hParent) 
        //{// after close the option dialog, the s_pCamera->m_hParen should have valide value, 
        // // otherwise, we force it equal to the parent passed in. 
        //    s_pCamera->m_hParent = parent; 
        //} 
    } 

    if (s_pCamera) 
    { 
        s_pCamera->m_hParent = parent; 
    } 

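Whether this fixes the corrupted window chain problem remains to be verified.

Another question still to be verified:
when DestroyWindow_I runs, is WM_DESTROY sent to the window? If not, it would make sense that s_fRunMessageLoop in Bug2609 was never set to FALSE; if it is, how do we explain that?
Was the 0xC007 message received before s_fRunMessageLoop was set to FALSE?
Also, who sent the 0xC007 message? Hopper?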

 

2010.6.23
Use the Tool Help API to get information about processes, for example:
CreateToolhelp32Snapshot
Process32First
Process32Next
see PROCESSENTRY32
For details, see:
public\common\oak\inc\toolhelp.h
private\winceos\coreos\toolhelp\toolhelp.c

 

2010.6.24
The root cause is:
the Camera window was destroyed as an owned window by another process.

When the options are opened, s_pCamera caches the window handle of the Video or Still options dialog in m_hDlgVideo or m_hDlgStill.
When the dialog exits, nothing sets those variables to NULL, so s_pCamera keeps an invalid window handle, or a handle that has since been reused by someone else.
Once the window handle has been reused by another process, CloseDlgOpt, invoked from CCamera::Initialize, finds the parent of that other process's window
and stores it in s_pCamera->m_hParent. After that, we create a window owned by that other process.
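
The handle-reuse trap can be reproduced in a few lines of portable C++. This is only a model under stated assumptions: HandleTable, Slot, and CachedHandleIsUsable are hypothetical names, and the slot-reuse policy (first free slot wins) is an assumption chosen to make reuse easy to trigger, not a description of how GWES allocates handles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical handle table that reuses a freed slot for the next window
// that gets created, so an old handle value can become valid again.
struct Slot {
    bool live = false;
    unsigned process = 0;
};

class HandleTable {
public:
    // Returns the handle (slot index) of the new window.
    std::size_t Create(unsigned process) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i].live) {
                slots_[i].live = true;
                slots_[i].process = process;
                return i;
            }
        }
        Slot s;
        s.live = true;
        s.process = process;
        slots_.push_back(s);
        return slots_.size() - 1;
    }
    void Destroy(std::size_t h) {
        assert(h < slots_.size());
        slots_[h].live = false;
    }
    bool IsLive(std::size_t h) const { return h < slots_.size() && slots_[h].live; }
    unsigned Owner(std::size_t h) const {
        assert(IsLive(h));
        return slots_[h].process;
    }
private:
    std::vector<Slot> slots_;
};

// True only if the cached handle is still live AND still belongs to the
// caching process. Liveness alone is not enough once handles are reused.
bool CachedHandleIsUsable(const HandleTable& table, std::size_t cached, unsigned self) {
    return table.IsLive(cached) && table.Owner(cached) == self;
}
```

In this model, a dialog handle cached by one process, destroyed, and then handed out again to a second process still passes a plain liveness check (the analogue of IsWindow) but fails the ownership check. Clearing the cached value when the dialog exits avoids the problem at its source; the ownership check is a second line of defence.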