FLARE Script Series: Automating Obfuscated String Decoding

Translated from "FLARE Script Series: Automating Obfuscated String Decoding"

We are expanding our script series beyond IDA Pro. This post extends the FireEye Labs Advanced Reverse Engineering (FLARE) script series to an invaluable tool for the reverse engineer: the debugger. Just like IDA Pro, debuggers have scripting interfaces. For example, OllyDbg uses an asm-like scripting language, the Immunity debugger contains a Python interface, and Windbg has its own language. None of these options is ideal for rapidly creating string decoding debugger scripts. Both Immunity and OllyDbg only support 32-bit applications, and Windbg’s scripting language is specific to Windbg and, therefore, not as well known.
The pykd project was created to interface between Python and Windbg to allow debugger scripts to be written in Python. Because malware reverse engineers love Python, we built our debugger scripting library on top of pykd for Windbg.

Here we release a library we call flare-dbg. This library provides several utility classes and functions to rapidly develop scripts that automate debugging tasks within Windbg. Stay tuned for future blog posts that will describe additional uses for debugger scripts!

String Decoding

Malware authors like to hide the intent of their software by obfuscating their strings. Deobfuscating those strings quickly lets you figure out what the malware is doing.

As stated in Practical Malware Analysis, there are generally two approaches to deobfuscating strings: self-decoding and manual programming.
The self-decoding approach allows the malware to decode its own strings. Manual programming requires the reverse engineer to reprogram the decoding function logic. A subset of the self-decoding approach is emulation, where each assembly instruction execution is emulated. Unfortunately, library call emulation is required, and emulating every library call is difficult and may cause inaccurate results. In contrast, a debugger is attached to the actual running process, so all the library functions can be run without issue. Each of these approaches has its place, but this post teaches a way to use debugger scripting to automatically self-decode all obfuscated strings.
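To make the contrast concrete, here is a minimal sketch of the manual programming approach: the analyst rewrites the decoder by hand. It assumes, purely for illustration, that the decoder is a single-byte XOR; the post does not show the sample's real algorithm, and the names `xor_decode`, `KEY`, and `ENCODED` are hypothetical.

```python
def xor_decode(encoded: bytes, key: int) -> str:
    """Hand-written stand-in for a decoder: XOR each byte with a one-byte key."""
    return bytes(b ^ key for b in encoded).decode("ascii")


# Hypothetical test data: build an "obfuscated" blob so the example is self-contained.
KEY = 0x5A
ENCODED = bytes(b ^ KEY for b in b"Mozilla/5.0")

decoded = xor_decode(ENCODED, KEY)
print(decoded)
```

This works only as long as the reimplementation matches the malware's logic exactly, which is the cost that self-decoding with a debugger avoids.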

Challenge

To decode all obfuscated strings, we need to find the following: the string decoder function, every call to it, and all arguments to each of those calls. We then need to run the function and read out the result. The challenge is to do this in a semi-automated way.

Approach

The first task is to find the string decoder function and get a basic understanding of the inputs and outputs of the function. The next task is to identify each time the string decoder function is called and all of the arguments to each call. Without using IDA, a handy Python project for binary analysis is Vivisect. Vivisect contains several heuristics for identifying functions and cross-references. Additionally, Vivisect can emulate and disassemble a series of opcodes, which can help us identify function arguments. If you haven’t already, be sure to check out the FLARE scripting series post on tracking function arguments using emulation, which also uses Vivisect.

Introducing flare-dbg

The FLARE team is introducing a Python project, flare-dbg, that runs on top of pykd. Its goal is to make Windbg scripting easy. The heart of the flare-dbg project lies in the DebugUtils class, which contains several functions to handle:

Memory and register manipulation
Stack operations
Debugger execution
Breakpoints
Function calling

In addition to the basic debugger utility functions, the DebugUtils class uses Vivisect to handle the binary analysis portion.

Example

I wrote a simple piece of malware that hides strings by encoding them. Figure 1 shows an HTTP User-Agent string being decoded by a function I named string_decoder.

A cursory look at the string_decoder function identifies its arguments as an offset to an encoded string of bytes, an output address, and a length. The function can be described as the following C prototype:

Now that we have a basic understanding of the string_decoder function, we test decoding using Windbg and flare-dbg. We begin by starting the process with Windbg and executing until the program’s entry point. Next, we start a Python interactive shell within Windbg using pykd and import flaredbg.

Next, we create a DebugUtils object, which contains the functions we need to control the debugger.

We then allocate 0x3A bytes of memory for the output string. We use the newly allocated memory as the second parameter and set up the remainder of the arguments.

Finally, we call the string_decoder function at virtual address 0x401000 and read the output string buffer.

Having shown that we can decode a string with flare-dbg, let’s automate all calls to the string_decoder function. An example debugger script is shown in Figure 2. The full script is available in the examples directory in the GitHub repository.

Let’s break this script down. First, we identify the virtual address of the string decoder function and create a DebugUtils object. Next, we use the DebugUtils function get_call_list to find the three push arguments for each call to string_decoder.
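To give a feel for what recovering push arguments involves, the toy scanner below walks a code fragment that contains only `push imm32`, `push imm8`, and `call rel32` instructions and collects the values pushed before each call to a target function. It is not how get_call_list is implemented: flare-dbg relies on Vivisect, which can also resolve arguments that are pushed from registers, such as a stack buffer address. The function name `find_push_args` and all addresses and bytes are hypothetical.

```python
import struct


def find_push_args(code: bytes, base_va: int, fva: int, n_args: int):
    """Return [(call_va, [arg0, arg1, ...])] for calls to fva in a toy code blob."""
    calls = []
    pushes = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0x68:      # push imm32
            pushes.append(struct.unpack_from("<I", code, i + 1)[0])
            i += 5
        elif op == 0x6A:    # push imm8, sign-extended to 32 bits
            pushes.append(struct.unpack_from("<b", code, i + 1)[0] & 0xFFFFFFFF)
            i += 2
        elif op == 0xE8:    # call rel32, relative to the next instruction
            rel = struct.unpack_from("<i", code, i + 1)[0]
            target = (base_va + i + 5 + rel) & 0xFFFFFFFF
            if target == fva and len(pushes) >= n_args:
                # Arguments are pushed right to left, so reverse the last n pushes.
                calls.append((base_va + i, pushes[-n_args:][::-1]))
            pushes = []
            i += 5
        else:
            raise ValueError("unsupported opcode 0x%02X in toy scanner" % op)
    return calls


# Hypothetical fragment: push 0x3A; push 0x12FF00; push 0x40D840; call 0x401000
FVA = 0x401000
BASE = 0x401100
code = (
    b"\x6a\x3a"
    + b"\x68" + struct.pack("<I", 0x0012FF00)
    + b"\x68" + struct.pack("<I", 0x0040D840)
    + b"\xe8" + struct.pack("<i", FVA - (BASE + 17))
)

calls = find_push_args(code, BASE, FVA, 3)
for call_va, args in calls:
    print(hex(call_va), [hex(a) for a in args])
```

A linear scan like this breaks on anything beyond the three opcodes it knows, which is why a real tool needs a full disassembler and emulator.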

Once the call_list is generated, we iterate over all calling addresses and their associated arguments. In this example, the output string is decoded to the stack. Because we are only executing the string decoder function and won’t have the same stack setup as the malware, we must allocate memory for the output string. We use the third parameter, the length, to specify the size of the memory allocation. Once we allocate memory for the output string, we set the newly allocated memory address as the second parameter to receive the output bytes.

Finally, we run the string_decoder function by using the DebugUtils call function and read the result from our allocated buffer. The call function sets up the stack, sets any specified register values, and executes the function. Once all strings are decoded, the final step is to get these strings back into our IDB. The utils script contains utility functions to create IDA Python scripts. In this case, we output an IDA Python script that creates comments in the IDB.
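The loop described above can be sketched without a debugger by swapping in a stand-in object. In the sketch below, `FakeDebugUtils` is a hypothetical mock, not part of flare-dbg: only the names DebugUtils, get_call_list, and call come from this post, while `malloc`, `read_string`, `free`, the argument order of `call`, the XOR "decoder", and every address are assumptions made so the example runs anywhere.

```python
class FakeDebugUtils:
    """Hypothetical stand-in for flare-dbg's DebugUtils; no Windbg required."""

    def __init__(self, blobs, key):
        self._blobs = blobs          # encoded-string address -> encoded bytes
        self._key = key              # toy single-byte XOR key
        self._heap = {}              # fake allocation address -> bytearray
        self._next_va = 0x10000000

    def get_call_list(self, fva, n_args):
        # Pretend analysis found one call per blob with three pushed arguments:
        # [encoded offset, stack output buffer, length].
        calls = []
        for i, (offset, blob) in enumerate(sorted(self._blobs.items())):
            calls.append((0x401100 + 0x20 * i, [offset, 0x12FF00, len(blob)]))
        return calls

    def malloc(self, size):
        va = self._next_va
        self._heap[va] = bytearray(size)
        self._next_va += (size + 0xF) & ~0xF
        return va

    def call(self, fva, args, from_va=None):
        # Emulates the decoder: XOR the blob into the caller-supplied buffer.
        offset, out_va, length = args
        data = self._blobs[offset][:length]
        self._heap[out_va][:len(data)] = bytes(b ^ self._key for b in data)

    def read_string(self, va):
        return bytes(self._heap[va]).split(b"\x00")[0].decode("ascii")

    def free(self, va):
        del self._heap[va]


def decode_all(dbg, fva):
    """Allocate an output buffer per call, run the decoder, and collect results."""
    results = []
    for from_va, args in dbg.get_call_list(fva, 3):
        out_va = dbg.malloc(args[2])   # third argument is the length
        args[1] = out_va               # redirect output away from the stack
        dbg.call(fva, args, from_va)
        results.append((from_va, dbg.read_string(out_va)))
        dbg.free(out_va)
    return results


KEY = 0x5A
blobs = {
    0x40D000: bytes(b ^ KEY for b in b"Mozilla/5.0"),
    0x40D010: bytes(b ^ KEY for b in b"example.com"),
}
results = decode_all(FakeDebugUtils(blobs, KEY), 0x401000)
for from_va, text in results:
    print(hex(from_va), text)
```

With the real library, the mock would be replaced by a DebugUtils object attached to the live process, and the decoding would be done by the malware's own code rather than by the XOR stand-in.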

Running this debugger script produces the following output:

The output IDA Python script creates repeatable comments on all encoded string locations, as shown in Figure 3.
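As a rough illustration of this last step, the sketch below turns a list of decoded strings into the text of an IDA Python script that sets repeatable comments. It does not reproduce the helper in flare-dbg's utils script; `make_ida_comment_script` is a hypothetical name, the addresses are made up, and the generated lines assume the IDA 7-style `idc.set_cmt(ea, comment, rptble)` call.

```python
def make_ida_comment_script(decoded):
    """Build IDA Python source that adds one repeatable comment per string."""
    lines = ["import idc", ""]
    for va, text in decoded:
        # %r emits a quoted, escaped Python literal, so the output stays valid code.
        lines.append("idc.set_cmt(0x%X, %r, 1)" % (va, text))
    return "\n".join(lines) + "\n"


# Hypothetical (encoded string address, decoded string) pairs.
decoded = [(0x40D000, "Mozilla/5.0"), (0x40D010, "example.com")]
script = make_ida_comment_script(decoded)
print(script)
```

Writing the script to a file and running it inside IDA keeps the debugger session and the IDB loosely coupled: the debugger only has to emit text.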

Conclusion

Stay tuned for another debugger scripting series post that will focus on plugins! For now, head over to the flare-dbg GitHub project page to get started. The project requires pykd, winappdbg, and vivisect.

posted @ 2015-12-30 16:54 Lnju
