[Repost] The .NET EXE loading process, as seen from the source code
I came across a good article on the Kanxue (PEDIY) forum and am posting it here to share with everyone.
http://bbs.pediy.com/showthread.php?threadid=31799
The "source code" here is of course not the source of the .NET Framework itself. Microsoft did, however, publish the source of a shared-source CLI code-named Rotor (SSCLI), which you can think of as a lightweight .NET Framework. Most importantly, the two work in broadly the same way. Today we use the Rotor source to look at the most basic step in program debugging: the dynamic loading of an exe file. As usual, the references come first, so that nobody accuses me of plagiarism: "Inside the Rotor CLI", and another book, "Shared Source CLI", which I could not find online. You will also need to download the sscli2.0 archive from the MSDN website.
As in Win32, the system provides a loader that reads the exe in. SSCLI ships its own example loader, clix.exe. For now we treat it as the system's default loader and look at its source (clix.cpp). The line to watch, highlighted in red in the original post, is the call to _CorExeMain2.
Code:
DWORD Launch(WCHAR* pFileName, WCHAR* pCmdLine)
{
    WCHAR exeFileName[MAX_PATH + 1];
    DWORD dwAttrs;
    DWORD dwError;
    DWORD nExitCode;

    ...
    // a series of file attribute checks is performed here
    ...

    if (dwError != ERROR_SUCCESS) {
        // We can't find the file, or there's some other problem. Exit with an error.
        fwprintf(stderr, L"%s: ", pFileName);
        DisplayMessageFromSystem(dwError);
        return 1; // error
    }

    nExitCode = _CorExeMain2(NULL, 0, pFileName, NULL, pCmdLine);

    // _CorExeMain2 never returns with success
    _ASSERTE(nExitCode != 0);
    DisplayMessageFromSystem(::GetLastError());
    return nExitCode;
}
Here we meet the famous CorExeMain. Remember that when you open a .NET PE file in a PE editor, it imports exactly one function, mscoree.dll!_CorExeMain? Odd: why is it not _CorExeMain2? That is just one small difference between Rotor and the commercial framework. If you disassemble mscoree.dll with IDA Pro, you will see that _CorExeMain() is nothing more than a relay. The code is as follows.
Code:
.text:79011B47    push    offset a_corexemain         ; "_CorExeMain"
.text:79011B4C    push    [ebp+hModule]               ; hModule
.text:79011B4F    call    ds:__imp__GetProcAddress@8  ; GetProcAddress(x,x)
.text:79011B55    test    eax, eax
.text:79011B57    jz      loc_79019B46
.text:79011B5D    call    eax
Once entered, it immediately calls _CorExeMain in mscorwks.dll. That function does much the same job as the _CorExeMain2 in Rotor mentioned above: it starts the initialization for loading the exe. All of this can be seen by comparing the disassembly with the source. Back in SSCLI, let us look at the code of _CorExeMain2() (ceemain.cpp).
Code:
__int32 STDMETHODCALLTYPE _CorExeMain2(   // Executable exit code.
    PBYTE pUnmappedPE,                    // -> memory mapped code
    DWORD cUnmappedPE,                    // Size of memory mapped code
    __in LPWSTR pImageNameIn,             // -> Executable Name
    __in LPWSTR pLoadersFileName,         // -> Loaders Name
    __in LPWSTR pCmdLine)                 // -> Command Line
{
    // This entry point is used by clix
    BOOL bRetVal = 0;

    //BEGIN_ENTRYPOINT_VOIDRET;

    // Before we initialize the EE, make sure we've snooped for all EE-specific
    // command line arguments that might guide our startup.
    HRESULT result = CorCommandLine::SetArgvW(pCmdLine);

    if (!CacheCommandLine(pCmdLine, CorCommandLine::GetArgvW(NULL))) {
        LOG((LF_STARTUP, LL_INFO10, "Program exiting - CacheCommandLine failed\n"));
        bRetVal = -1;
        goto exit;
    }

    if (SUCCEEDED(result))
        result = CoInitializeEE(COINITEE_DEFAULT | COINITEE_MAIN);

    if (FAILED(result)) {
        VMDumpCOMErrors(result);
        SetLatchedExitCode (-1);
        goto exit;
    }

    // This is here to get the ZAPMONITOR working correctly
    INSTALL_UNWIND_AND_CONTINUE_HANDLER;

    // Load the executable
    bRetVal = ExecuteEXE(pImageNameIn);
    ...
    ...
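Most of this code can be skipped; only two things matter. The first is initializing the EE (execution engine). Once that succeeds, ExecuteEXE is called with the file name as its argument. The listing also shows clearly which parameters _CorExeMain2() receives. ExecuteEXE() itself is short, and is again just a trampoline: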
Code:
BOOL STDMETHODCALLTYPE ExecuteEXE(HMODULE hMod)
{
    STATIC_CONTRACT_GC_TRIGGERS;

    _ASSERTE(hMod);
    if (!hMod)
        return FALSE;

    ETWTraceStartup::TraceEvent(ETW_TYPE_STARTUP_EXEC_EXE);
    TIMELINE_START(STARTUP, ("ExecuteExe"));

    EX_TRY_NOCATCH
    {
        // Executables are part of the system domain
        SystemDomain::ExecuteMainMethod(hMod);
    }
    EX_END_NOCATCH;

    ETWTraceStartup::TraceEvent(ETW_TYPE_STARTUP_EXEC_EXE+1);
    TIMELINE_END(STARTUP, ("ExecuteExe"));

    return TRUE;
}
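Stripped of tracing and contracts, the startup chain in the listings so far comes down to three steps: cache the command line, initialize the execution engine, then hand off to ExecuteEXE. The following is only a rough sketch of that control flow, not SSCLI code. FakeRuntime, StartupStep, ExecuteExeSketch and CorExeMain2Sketch are hypothetical names invented for illustration, and since the part of _CorExeMain2 after the exit label is not shown in the listing, the sketch reports which step was reached instead of guessing the real return value.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the runtime services used by the listings.
struct FakeRuntime {
    bool commandLineOk;    // stands in for CacheCommandLine succeeding
    bool engineInitOk;     // stands in for CoInitializeEE succeeding
    bool mainWasRun;       // set when the managed Main would have been run
    int  latchedExitCode;  // stands in for SetLatchedExitCode

    FakeRuntime()
        : commandLineOk(true), engineInitOk(true),
          mainWasRun(false), latchedExitCode(0) {}
};

// Which step of the startup sequence was reached (hypothetical enum).
enum class StartupStep { CommandLineFailed, EngineInitFailed, Executed };

// Mirrors ExecuteEXE: a thin trampoline that runs Main and reports success.
bool ExecuteExeSketch(FakeRuntime& rt, const std::wstring& imageName) {
    assert(!imageName.empty());
    rt.mainWasRun = true;  // the real code calls SystemDomain::ExecuteMainMethod
    return true;
}

// Mirrors the visible part of _CorExeMain2.
StartupStep CorExeMain2Sketch(FakeRuntime& rt, const std::wstring& imageName) {
    if (!rt.commandLineOk)
        return StartupStep::CommandLineFailed;

    if (!rt.engineInitOk) {
        rt.latchedExitCode = -1;  // as in the listing: latch -1 and bail out
        return StartupStep::EngineInitFailed;
    }

    ExecuteExeSketch(rt, imageName);
    return StartupStep::Executed;
}
```

Calling CorExeMain2Sketch on a default FakeRuntime reaches StartupStep::Executed and sets mainWasRun. Setting engineInitOk to false stops the sequence before Main runs, which is the point of the ordering in the original code.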
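In ExecuteEXE, again, only one line matters: SystemDomain::ExecuteMainMethod(hMod). Read literally, ExecuteMainMethod treats the file passed in as a module. In .NET the containment hierarchy is assembly > module > class > method. In other words, an assembly may contain several modules, and an executable assembly has exactly one Main method, located in one of those modules, which serves as the entry point.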
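To make the containment relation concrete, here is a small illustrative model of assembly > module > class > method with a search for the single entry point. It is a sketch under assumptions: the struct names and FindEntryPoint are hypothetical, and the search only illustrates the hierarchy. The real runtime reads the entry point from the file's metadata rather than scanning every method.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the containment hierarchy described above.
struct Method   { std::string name; bool isEntryPoint; };
struct Class    { std::string name; std::vector<Method> methods; };
struct Module   { std::string name; std::vector<Class> classes; };
struct Assembly { std::string name; std::vector<Module> modules; };

// Returns the entry-point method, or nullptr when the assembly has
// no entry point or (invalidly) more than one.
const Method* FindEntryPoint(const Assembly& assembly) {
    const Method* found = nullptr;
    for (const Module& mod : assembly.modules) {
        for (const Class& cls : mod.classes) {
            for (const Method& method : cls.methods) {
                if (!method.isEntryPoint)
                    continue;
                if (found != nullptr)
                    return nullptr;  // two entry points: reject
                found = &method;
            }
        }
    }
    return found;
}

// Builds a two-module sample assembly whose only entry point is Program::Main.
Assembly MakeSampleAssembly() {
    Assembly a;
    a.name = "hello";

    Module mainModule;
    mainModule.name = "hello.exe";
    Class program;
    program.name = "Program";
    program.methods.push_back(Method{"Main", true});
    program.methods.push_back(Method{"Helper", false});
    mainModule.classes.push_back(program);

    Module extraModule;
    extraModule.name = "extra.netmodule";
    Class util;
    util.name = "Util";
    util.methods.push_back(Method{"Format", false});
    extraModule.classes.push_back(util);

    a.modules.push_back(mainModule);
    a.modules.push_back(extraModule);
    return a;
}
```

With MakeSampleAssembly(), FindEntryPoint returns the Method named "Main". An assembly without such a method (a library, for example) yields nullptr, which corresponds to the "failed to find Main" error path seen later in ExecuteMainMethod.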
Next we follow SystemDomain::ExecuteMainMethod() into the code that does the actual work, Assembly::ExecuteMainMethod() (assembly.cpp).
Code:
INT32 Assembly::ExecuteMainMethod(PTRARRAYREF *stringArgs)
{
    CONTRACTL
    {
        INSTANCE_CHECK;
        THROWS;
        GC_TRIGGERS;
        MODE_ANY;
        ENTRY_POINT;
        INJECT_FAULT(COMPlusThrowOM());
    }
    CONTRACTL_END;

    HRESULT hr = S_OK;
    INT32 iRetVal = 0;

    BEGIN_ENTRYPOINT_THROWS;

    Thread *pThread = GetThread();
    MethodDesc *pMeth;
    {
        // This thread looks like it wandered in -- but actually we rely on it to keep the process alive.
        pThread->SetBackground(FALSE);

        GCX_COOP();

        pMeth = GetEntryPoint();
        if (pMeth) {
            RunMainPre();
            hr = ClassLoader::RunMain(pMeth, 1, &iRetVal, stringArgs);
        }
    }

    //RunMainPost is supposed to be called on the main thread of an EXE,
    //after that thread has finished doing useful work. It contains logic
    //to decide when the process should get torn down. So, don't call it from
    // AppDomain.ExecuteAssembly()
    if (pMeth) {
        if (stringArgs == NULL)
            RunMainPost();
    }
    else {
        StackSString displayName;
        GetDisplayName(displayName);
        COMPlusThrowHR(COR_E_MISSINGMETHOD, IDS_EE_FAILED_TO_FIND_MAIN, displayName);
    }

    if (FAILED(hr))
        ThrowHR(hr);

    END_ENTRYPOINT_THROWS;
    return iRetVal;
}
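The branching in the listing above is easy to lose among the macros, so here is a simplified sketch of it. If there is an entry point, run it. Run the post-Main step only when no argument array was supplied (that is, when the call comes from the process launch and not from something like AppDomain.ExecuteAssembly). If there is no entry point, raise an error. AssemblySketch, MainFn and the postRan flag are hypothetical names for illustration, and a std::runtime_error stands in for COMPlusThrowHR.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical signature for a "managed Main" in this sketch.
using MainFn = std::function<int(const std::vector<std::string>&)>;

struct AssemblySketch {
    std::string displayName;
    MainFn entryPoint;  // empty when the assembly has no Main
    bool postRan;       // records whether the RunMainPost step happened

    AssemblySketch(std::string name, MainFn fn)
        : displayName(std::move(name)), entryPoint(std::move(fn)), postRan(false) {}

    // stringArgs == nullptr models the process-launch case from the listing.
    int ExecuteMainMethod(const std::vector<std::string>* stringArgs) {
        if (!entryPoint) {
            // Stand-in for the COR_E_MISSINGMETHOD error in the listing.
            throw std::runtime_error("failed to find Main in " + displayName);
        }

        static const std::vector<std::string> noArgs;
        int result = entryPoint(stringArgs ? *stringArgs : noArgs);

        // Post-Main teardown logic only runs for the process's own launch.
        if (stringArgs == nullptr)
            postRan = true;

        return result;
    }
};
```

An AssemblySketch built with a lambda returning 7 yields 7 from ExecuteMainMethod(nullptr) and sets postRan. Passing an explicit argument vector leaves postRan false, and an empty MainFn triggers the exception.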
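In Assembly::ExecuteMainMethod the key steps are again two: set up the thread environment, then run the Main method. Next we go to clsload.cpp and look at ClassLoader::RunMain, which is also our last stop this time.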
Code:
HRESULT ClassLoader::RunMain(MethodDesc *pFD,
                             short numSkipArgs,
                             INT32 *piRetVal,
                             PTRARRAYREF *stringArgs /*=NULL*/)
{
    STATIC_CONTRACT_THROWS;
    _ASSERTE(piRetVal);

    DWORD cCommandArgs = 0;   // count of args on command line
    DWORD arg = 0;
    LPWSTR *wzArgs = NULL;    // command line args
    HRESULT hr = S_OK;

    *piRetVal = -1;

    // The exit code for the process is communicated in one of two ways. If the
    // entrypoint returns an 'int' we take that. Otherwise we take a latched
    // process exit code. This can be modified by the app via setting
    // Environment's ExitCode property.
    if (stringArgs == NULL)
        SetLatchedExitCode(0);

    if (!pFD) {
        _ASSERTE(!"Must have a function to call!");
        return E_FAIL;
    }

    CorEntryPointType EntryType = EntryManagedMain;
    ValidateMainMethod(pFD, &EntryType);

    if ((EntryType == EntryManagedMain) && (stringArgs == NULL)) {
        // If you look at the DIFF on this code then you will see a major change which is that we
        // no longer accept all the different types of data arguments to main. We now only accept
        // an array of strings.
        wzArgs = CorCommandLine::GetArgvW(&cCommandArgs);

        // In the WindowsCE case where the app has additional args the count will come back zero.
        if (cCommandArgs > 0) {
            if (!wzArgs)
                return E_INVALIDARG;
        }
    }

    ETWTraceStartup::TraceEvent(ETW_TYPE_STARTUP_MAIN);
    TIMELINE_START(STARTUP, ("RunMain"));

    EX_TRY_NOCATCH
    {
        MethodDescCallSite threadStart(pFD);

        PTRARRAYREF StrArgArray = NULL;
        GCPROTECT_BEGIN(StrArgArray);

        // Build the parameter array and invoke the method.
        if (EntryType == EntryManagedMain) {
            if (stringArgs == NULL) {
                // Allocate a COM Array object with enough slots for cCommandArgs - 1
                StrArgArray = (PTRARRAYREF) AllocateObjectArray((cCommandArgs - numSkipArgs), g_pStringClass);

                // Create Stringrefs for each of the args
                for (arg = numSkipArgs; arg < cCommandArgs; arg++) {
                    STRINGREF sref = COMString::NewString(wzArgs[arg]);
                    StrArgArray->SetAt(arg - numSkipArgs, (OBJECTREF) sref);
                }
            }
            else
                StrArgArray = *stringArgs;
        }

#ifdef STRESS_THREAD
        OBJECTHANDLE argHandle = (StrArgArray != NULL) ? CreateGlobalStrongHandle (StrArgArray) : NULL;
        Stress_Thread_Param Param = {pFD, argHandle, numSkipArgs, EntryType, 0};
        Stress_Thread_Start (&Param);
#endif

        ARG_SLOT stackVar = ObjToArgSlot(StrArgArray);

        if (pFD->IsVoid()) {
            // Set the return value to 0 instead of returning random junk
            *piRetVal = 0;
            threadStart.Call(&stackVar);
        }
        else {
            *piRetVal = (INT32)threadStart.Call_RetArgSlot(&stackVar);
            if (stringArgs == NULL) {
                SetLatchedExitCode(*piRetVal);
            }
        }

        GCPROTECT_END();

        fflush(stdout);
        fflush(stderr);
    }
    EX_END_NOCATCH

    ETWTraceStartup::TraceEvent(ETW_TYPE_STARTUP_MAIN+1);
    TIMELINE_END(STARTUP, ("RunMain"));

    return hr;
}
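This code mainly does the final preparation before the method runs, and then runs it. There are two cases: Main returns a value, or Main is void. What happens after that goes deep into the core of the framework, so I will write about it another day once I have read it. The code uses many COM definitions, which shows how closely .NET and COM are related. The debugger and profiler under .NET are even built directly on COM interfaces. I do not know COM well, though, so I cannot go deeper on this point.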
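The two preparation steps in RunMain can be shown in isolation: building the argument array while skipping the first numSkipArgs entries (the caller passes 1, which drops the program name), and choosing the exit code depending on whether Main is void. The sketch below covers only the command-line launch case (stringArgs == NULL in the listing). BuildManagedArgs, ExitCodes and FinishMain are hypothetical helper names, and plain std::wstring values stand in for managed strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copies the command-line entries from index numSkipArgs onward,
// like the loop that fills StrArgArray in the listing.
std::vector<std::wstring> BuildManagedArgs(const std::vector<std::wstring>& commandLine,
                                           std::size_t numSkipArgs) {
    std::vector<std::wstring> args;
    for (std::size_t i = numSkipArgs; i < commandLine.size(); ++i)
        args.push_back(commandLine[i]);
    return args;
}

// The value handed back to the caller and the latched process exit code.
struct ExitCodes {
    int returned;
    int latched;
};

// void Main: report 0 and leave the latched code alone (the application may
// have set it itself). int Main: report the result and latch it as well.
ExitCodes FinishMain(bool mainIsVoid, int mainResult, int latchedBefore) {
    ExitCodes codes;
    if (mainIsVoid) {
        codes.returned = 0;
        codes.latched = latchedBefore;
    } else {
        codes.returned = mainResult;
        codes.latched = mainResult;
    }
    return codes;
}
```

For a command line of hello.exe a b with one skipped argument, BuildManagedArgs yields the two strings a and b. FinishMain then shows why a void Main can still produce a non-zero process exit code: the latched value survives untouched.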
posted on 2009-10-14 14:13 John Connor