Binary Translation (PIN & DynamoRIO)
Original: http://blog.deniable.org/posts/binary-instrumentation/
Dynamic Binary Instrumentation Primer
Dynamic Binary Instrumentation (DBI) is a method of analyzing the behavior of a binary application at runtime through the injection of instrumentation code - Uninformed 2007
Introduction
The purpose of this post is to document my dive into the “world” of Dynamic Binary Instrumentation. I’ll cover some of the most well-known and widely used DBI frameworks: Pin, DynamoRIO, and Frida. Of these three, I’ll mainly focus on Pin. There are other DBI frameworks that I won’t touch at all, like Valgrind, Triton (uses Pin), QBDI, BAP, Dyninst, plus many others. You might want to have a look at them. Some are more mature, some less. Some have more features, some fewer. You’ll have to do some research yourself and see which ones fit your needs. Even though Valgrind is one of the most widely known and used DBI frameworks, it isn’t available for Windows. So, I won’t touch it at all.
In my vulnerability hunting adventures I’ve been focused on Windows. In fact, if you want to take the code I’ll present here and build it on Linux, it should be pretty straightforward, while the opposite wouldn’t be true. The reason is that building Pin or DynamoRIO on Windows can be a bit frustrating, especially if you aren’t motivated to do so.
I’m not an expert in this area (DBI). However, since the beginning of the year I’ve been doing some experiments around fuzzing, and I’ve read a lot about the subject. Hence, I’ll try to document some of what I learned for future reference. Possibly you’ll also find it useful. Note that my goal was to write a reference and not a tutorial.
The funny part is that I actually thought about doing “something” with Pin, or DynamoRIO, while trying to do some browser heap spraying. Basically, I wanted to monitor the memory allocations my code was producing. While I could do it inside a debugger, I thought, “why not use a DBI framework? Maybe I can learn something”. After all, debuggers are slow. To this day, I’m still unsure whether I prefer WinDbg or Pin for this anyway.
Instrumentation
According to Wikipedia, instrumentation refers to an ability to monitor or measure the level of a product’s performance, to diagnose errors and to write trace information. Programmers implement instrumentation in the form of code instructions that monitor specific components in a system (…). When an application contains instrumentation code, it can be managed using a management tool. Instrumentation is necessary to review the performance of the application. Instrumentation approaches can be of two types: Source instrumentation and binary instrumentation.
As stated above, there are two types of instrumentation: source instrumentation, which is not possible if you don’t have the source code of the software application, and binary instrumentation, which can be used with any software application assuming we can execute it. It turns out that most of the programs you run on a Windows operating system are closed source. This means that, in this post, I’ll be “talking” only about binary instrumentation, often called Dynamic Binary Instrumentation or Dynamic Binary Modification. Because words take too long, people usually use the acronym DBI, as I already did above.
In a one-line statement, Dynamic Binary Instrumentation is a technique that involves injecting instrumentation code into a running process. The instrumentation code will be entirely transparent to the application it has been injected into.
With a DBI framework, we can analyze the target binary execution step by step. However, note that the analysis only applies to executed code.
Dynamic Program Analysis
There are two types of program analysis: static and dynamic. We perform static analysis without running a computer program, while we perform dynamic analysis when we run it.
Citing Wikipedia again, Dynamic program analysis is the analysis of computer software that is performed by executing programs on a real or virtual processor. For dynamic program analysis to be effective, the target program must be executed with sufficient test inputs to produce interesting behavior. Use of software testing measures such as code coverage helps ensure that an adequate slice of the program’s set of possible behaviors has been observed.
Dynamic binary modification tools, like the frameworks mentioned earlier, introduce a layer between a running program and the underlying operating system, providing a unique opportunity to inspect and modify user-level program instructions while a program executes.
These systems are very complex internally. However, all the complexity is hidden behind an API that allows any user to quickly build a multitude of tools to aid software analysis. And that’s what I’ll try to show in this post, by sharing some code I wrote while playing with some DBI frameworks.
There are many reasons for us to observe and modify the runtime behavior of a computer program. Software and/or hardware developers, system engineers, bug hunters, malware analysts, end users, and so on will all have their own reasons. DBI frameworks provide access to every executed user-level instruction. Besides a potentially small runtime and memory overhead, the program will run identically to a native execution.
You can say that the main advantage of static analysis is that it ensures 100% code coverage. With dynamic analysis, to ensure high code coverage we’ll need to run the program many times, with different inputs, so the analysis takes different code paths. However, in some cases, the software applications are so big that it’s too costly to perform static analysis. I would say one complements the other. Even though static analysis is very boring, and dynamic analysis is (very) fun.
As I mentioned before, DBI frameworks operate directly on binaries/executables. We don’t need the source code of the program. We don’t need to (re)compile or (re)link the program. Obviously, this is a major advantage, as it allows us to analyze proprietary software.
A dynamic binary system operates at the same time as the “guest” program executes, performing all the requested/required modifications on the fly. This dynamic approach can also handle programs that generate code dynamically, that is, self-modifying code (even though it poses a big engineering challenge). If you “google” a bit you’ll actually find multiple cases where DBI frameworks are/were used to analyze malware with self-modifying code. As an example, check this presentation from last year’s Black Hat Europe. Or this post about how to unpack Skype with Pin.
DBI frameworks are used daily to solve computer architecture problems, and are heavily used in software engineering, program analysis, and computer security. Software engineers want to deeply understand the software they develop, and analyze its performance and runtime behavior in a systematic manner. One common use of DBI frameworks is emulating new CPU instructions. Since the dynamic binary system has access to every instruction before executing it, hardware engineers can actually use these systems to test new instructions that are currently unsupported by the hardware. Instead of executing a specific instruction, they can emulate the new instruction’s behavior. The same approach can be used to replace faulty instructions with the correct emulation of the desired behavior. Anyway, from a computer security perspective, a DBI system can be used for flow analysis, taint analysis, fuzzing, code coverage, test case generation, reverse engineering, debugging, vulnerability detection, and even crazy things like patching of vulnerabilities and automated exploit development.
There are two main ways of using a dynamic binary system. The first, and probably the most common, in computer security at least, is executing a program from start to finish under the control of the dynamic binary system. We use it when we want to achieve full system simulation/emulation because full control and code coverage are desired. In the second, we just attach to an already running program (exactly in the same way a debugger can be attached to, or detached from, a running program). This option might be useful if we are interested in figuring out what a program is doing at a specific moment.
Besides, most of the DBI frameworks have three modes of execution: interpretation mode, probe mode, and JIT mode. The JIT (just-in-time) mode is the most common implementation, and the most commonly used mode even when the DBI system supports more than one mode of execution. In JIT mode the original binary/executable is actually never modified or executed. The binary is seen as data, and a modified copy of the binary is generated in a new memory area (but only for the executed parts of the binary, not the whole binary). It is this modified copy that’s then executed. In interpretation mode, the binary is also seen as data, and each instruction is used as a key into a lookup table of alternative instructions that have the corresponding functionality (as implemented by the user). In probe mode, the binary is actually modified by overwriting instructions with new instructions. Even though this results in a low run-time overhead, it’s very limited on certain architectures (like x86).
Whatever the execution mode, once we have control over the execution of a program, through a DBI framework, we then have the ability to add instrumentation into the executing program. We can insert our code, the instrumentation, before and after blocks of code, or even replace them completely.
We can visualize how it works in the diagram below.
Also, there are different types of granularity.
- Instruction level
- Basic block level
- Function level
The granularity choice, as you can guess, will allow you to have more, or less, control over the execution of a program. Obviously, this will have an impact on performance. Also, note that instrumenting a program in its entirety is impractical in most cases.
Performance
You might be wondering what the performance impact is of modifying a running program on the fly as described above. Well, I have very limited experience to answer this question. However, after reading multiple papers, articles, and presentations, my understanding is that the overhead commonly observed depends on a number of factors. Anyway, as kind of expected, the modifications the user implements are responsible for the majority of the overhead. The number 30% is apparently accepted as a common average. I can’t really remember where I read this to mention the source, but I definitely read it somewhere. You’ll find it for sure in the References section anyway. Obviously, one of the first decisions that you, as a DBI user, will have to make is the amount of code coverage your needs require and the amount of performance overhead you’ll be able to accept as reasonable.
Pin
Pin is a DBI framework developed by Intel Corp. It allows us to build program analysis tools known as Pintools, for Windows, Linux, and OSX. We can use these tools to monitor, modify, and record the behavior of a program while it is running.
Pin is proprietary software. However, we can download and use it free of charge for non-commercial use. Besides the documentation and the binaries, Pin also includes source code for a large collection of sample Pintools. These are invaluable examples that we must consider, and definitely read, before developing any Pintool.
In my opinion, Pin is the easiest DBI framework to use. At least I felt it was easier to dive into its API than into the DynamoRIO one. Even though I didn’t spend too much time trying to learn other APIs besides these two, I had a look at a few others, like Valgrind, Triton, Dyninst, and Frida. The choice will always depend on what you intend to do, honestly.
If you want to create a commercial tool and distribute binary versions of it, Pin won’t be a good choice. If that’s not the case, Pin might be a very good choice, mainly because, based on the tests I did, Pin is stable and reliable. I had some issues running some programs under some DBI frameworks, mainly big programs like Office suites, games, and AV engines. Some DBI frameworks were failing miserably, some even with small applications.
Pin setup (Windows)
Pin setup on Linux is quite straightforward. However, on Windows systems, it can be a bit tricky. See below how to quickly set it up to get started in case you want to try the samples I’ll present in this post.
Get the latest Pin version from here, and unpack it on your C:\ drive, or wherever you want. For simplicity, I usually use C:\pin. I advise you to do the same if you plan to follow some of the experiments presented in this post.
The Pin zip file includes a big collection of sample Pintools under source/tools. The API is very easy to read and understand, as we’ll see. By the end of this post you should be able to read the source code of most of the samples without any struggle (well, kind of).
I like Visual Studio, and I’ll be using it to build “every” tool mentioned in this post. There’s one Pintool sample that’s almost ready to be built with Visual Studio. You’ll have to adjust only a couple of settings. However, I didn’t want to manually copy and rename files every time I wanted to create a new Pintool project. So I created a sample project, already tweaked and available here, that you can place under C:\pin\source\tools together with the following python script. The script was inspired by Peter’s script. However, since the way newer versions of Visual Studio save the settings has changed, I had to write a completely new script.
So, every time you want to build a new Pintool with Visual Studio, just do:
cd\
cd pin
python create_pintool_project.py -p <name_of_your_project>
You can then just click the project’s solution file and build your Pintool with Visual Studio without any pain. I used Visual Studio Professional 2015, but it will also work with Visual Studio 2017. I did a couple of builds with Visual Studio 2017 Enterprise without any issue.
Pin Visual Studio integration
We can add our Pintools as external tools to Visual Studio. This will allow us to run, and test, our Pintool without using the command line all the time. The configuration is very simple. From the Tools menu, select External tools and a dialog box will appear. Click the Add button and fill out the text input boxes according to the image below.
In the Title input text box, enter whatever you want. In the Command input text box, enter the full path to your pin.exe, so c:\pin\pin.exe in case you installed it under c:\pin. In Arguments, you must include all the arguments you want to pass to your Pintool. You’ll need at least the ones specified in the image above. The -t specifies where your Pintool is, and after the -- comes the target program you want to instrument.
After the setup, you can simply run your Pintool from the Tools menu as shown in the image below.
Click ok, and enjoy.
The Output window of Visual Studio will show whatever output your Pintool writes to stdout.
DynamoRIO
DynamoRIO is another DBI framework, originally developed in a collaboration between HP’s Dynamo optimization system and the Runtime Introspection and Optimization (RIO) research group at MIT. It allows us to build program analysis tools known as clients, for Windows and Linux. We can use these tools to monitor, modify, and record the behavior of a program while it is running.
DynamoRIO was first released as a proprietary binary toolkit in 2002 and was later open-sourced with a BSD license in 2009. Like Pin, it also comes with source code for multiple client samples. These are invaluable examples to get us started and playing with its API.
DynamoRIO is a runtime code manipulation system which allows code transformation on any part of the program as the program runs. It works as an intermediate platform between applications and the operating system.
As I said before, I didn’t find DynamoRIO’s API the most friendly and easy to use. However, if you plan to make a commercial version, and/or distribute binary versions, DynamoRIO might be the best option. One of its advantages is the fact that it is BSD licensed, which means free software. If that’s important for you, go with DynamoRIO.
Also note that it’s commonly accepted that DynamoRIO is faster than Pin (check the References section). However, it is equally accepted that Pin is more reliable than DynamoRIO, which I also personally experienced when running big software programs.
DynamoRIO setup (Windows)
To install DynamoRIO on Windows, simply download the latest Windows version from here (DynamoRIO-Windows-7.0.0-RC1.zip at the time of this writing) and, similarly to what we did with Pin, just unzip it under C:\dynamorio.
Building your own DynamoRIO projects on Windows can be a bit tricky though. You can try to follow the instructions here or the instructions here or, to avoid frustration, just… use my DynamoRIO Visual Studio template project.
As I said before, I like Visual Studio. I created a sample project, available here, already tweaked with all the includes and libs required (assuming you unzipped DynamoRIO in the directory I mentioned before). Then, more or less the same way we did with Pin, also download the following python script. Since the file structure of the project is a bit different, I couldn’t use the script I wrote before to clone a project, and I had to create a new one specific to DynamoRIO.
So, every time you want to build a new DynamoRIO client with Visual Studio, just do:
python create_dynamorio_project.py -p <name_of_your_project>
The command above assumes that both the Python script and the template project mentioned above are in the same folder.
You can then just click the project’s solution file and build your DynamoRIO client with Visual Studio without any pain. I used Visual Studio Professional 2015, but it will also work with Visual Studio 2017. I did a couple of builds with Visual Studio 2017 Enterprise without any issue.
DynamoRIO Visual Studio integration
We can also integrate DynamoRIO with Visual Studio, exactly the same way we did with Pin. Since the setup process is exactly the same, I’ll only leave the screenshot below and you can figure out how to do the rest.
Frida
Frida is a DBI framework developed mainly by Ole. It became very popular among the “mobile” community and gained a considerable group of contributors (now sponsored by NowSecure). Frida supports OSX, Windows, Linux, and QNX, and has an API available for multiple languages, like Python, C#, Swift, Qt/QML and C. Just like the DBI frameworks mentioned above, we can use Frida together with scripts to monitor, modify, and record the behavior of a program while it is running.
Frida is free (free as in free beer) and is very easy to install (see below). There are also many usage examples online that we can use to get started. Frida injects Google’s V8 engine into a process. Then, Frida core communicates with Frida’s agent (process side) and uses the V8 engine to run the JavaScript code (creating dynamic hooks).
Frida’s API has two main parts: the JavaScript API and the bindings API. I didn’t dive too deep into them and just used what I believe is the most popular one, the JavaScript API. I found it easy to use, very flexible, and I could use it to quickly write some introspection tools.
Even though Pin and DynamoRIO are the “main” and most mature DBI frameworks, Frida has some advantages. As mentioned above, it has bindings for more languages, and rapid tool development is a reality. It also has some disadvantages: less maturity, less documentation, less granularity than other frameworks, and consequently a lack of some functionality.
Frida setup (Windows)
Frida’s setup is very easy. Just download https://bootstrap.pypa.io/get-pip.py and then run:
python get-pip.py
And, to actually install Frida, type the following.
cd\
cd Python27\Scripts
pip.exe install frida
And that’s it, you are ready to go. Yes, you have to install Python before the steps above. However, I don’t know anyone who doesn’t have Python installed, so I just assume it’s already there.
Generic DBI usage
Before diving into some code, I’ll try to document in this section generic ways of using some of the DBI frameworks I mentioned before, more precisely Pin and DynamoRIO.
As mentioned before, the most common execution mode in a DBI system is JIT (just-in-time compilation). The JIT compiler will create a modified copy of chunks of instructions just before executing them, and these will be cached in memory. This mode of execution is the default in most of the DBI frameworks I had a look at, and is also generally accepted as the most robust execution model.
Also, as mentioned before, there are two main methods to control the execution of a program. The first is to run the entire program under the control of the DBI framework. The second is to attach to a program that is already running, just like a debugger.
Below is the standard way to run a program under the control of a DBI system. Our target/guest application is not directly launched from the command line. Instead, it is passed as an argument to the DBI system. The DBI system initializes itself, and then launches the program under its control and modifies the program according to the plug-in. The plug-in contains the actual user-defined code, that is, our instrumentation code. In Pin the plug-in is called a Pintool, in DynamoRIO it’s called a client, and in Frida I believe it’s simply called a script.
Pin, JIT mode:
pin.exe <pin args> -t <pintool>.dll <pintool args> -- target_program.exe <target_program args>
Pin, probe mode:
pin.exe -probe <pin args> -t <pintool>.dll <pintool args> -- target_program.exe <target_program args>
DynamoRIO, JIT mode:
drrun.exe -client <dynamorio client>.dll 0 "" target_program.exe <target_program args>
DynamoRIO, probe mode:
drrun.exe -mode probe -client <dynamorio client>.dll 0 "" target_program.exe <target_program args>
As we can see above, the way we launch Pin and DynamoRIO is not that different. On Linux systems, it’s pretty much the same (yes, remove the .exe, substitute .dll with .so, and that’s it).
Obviously, there are many other options that can be passed on the command line besides the ones shown above. For a full list check the help/man pages. Above are just the required options for reference.
Frida is a bit different, and we’ll see ahead how to use it.
If you want to attach to a running process, you can do it with Pin. As of today, however, attaching to a process with DynamoRIO is not supported. There are, though, two methods of running a process under DynamoRIO on Windows. You can read more about it here.
With Pin you can simply attach to a process by using the -pid argument as shown below.
pin.exe -pid <target pid> <other pin args> -t <pintool>.dll <pintool args>
User defined modifications
Regardless of the DBI framework we are using, each one provides an API that we can use to specify how we modify the target/guest program. The abstraction introduced by the API is used together with code usually written in C or C++ (or even JavaScript or Swift in the case of Frida) to create a plug-in (in the form of a shared library, as we saw above), which will then be “injected” into the running target/guest program by the DBI system. It will run in the same address space as the target/guest program.
This means that in order to use a DBI system, we need not only to know how to launch a target/guest program, as illustrated above, but also to be familiar with and understand the API exported by the framework we want to use.
Unfortunately, the APIs of these frameworks are very different. However, as we will see, the general concepts apply to most of them. As I mentioned before, I’ll be focusing mainly on Pin. I’ll also try to recreate more or less the same functionality with DynamoRIO and Frida, so we will also get a bit familiar with their APIs. Note that the API coverage won’t be by any means extensive. I advise you to check each DBI framework’s API documentation if you want to know more. By following this post you’ll simply get a sense of what’s available in each API, probably limited to the use case scenario I chose.
The idea behind any API is to hide the complexity of certain operations from the user, without removing any power to perform any task (including complex tasks). We can usually say that the easier the API is to use, the better it is.
All the APIs allow us to, in a certain way, iterate over the instructions the DBI system is about to run. This allows us to add, remove, modify, or observe the instructions prior to executing them. For example, my initial idea was to simply log (observe) all the calls to memory-related functions (malloc and free).
Not only can we introduce instructions to get profiling/tracing information about a program, we can also introduce complex changes, to the point of completely replacing certain instructions with a new implementation. Think, for example, of replacing all the malloc calls with your own malloc implementation (that, for example, introduces shadow bytes and so on).
In Pin most of the API routines are call based. This makes the API very user-friendly, at least for the way I think when I visualize the usage of a DBI system. In DynamoRIO it’s slightly different, although the same is also possible, as we will see. Basically, we register a callback to be notified when certain events occur (a call to malloc, for example). For performance reasons, Pin tries to inline these callbacks.
As we saw, most of the DBI frameworks support multiple operating systems and platforms. Most of the time, the APIs are the same and all the differences between operating systems are kept away from the user and handled “under the table”. However, there are still certain APIs that are specific to certain operating systems. You need to be aware of that.
It’s also important to distinguish between instrumentation and analysis code. Instrumentation code is applied to specific code locations, while analysis code is applied to events that occur at some point in the execution of the program. As stated on Wikipedia: Instrumentation routines are called when code that has not yet been recompiled is about to be run, and enable the insertion of analysis routines. Analysis routines are called when the code associated with them is run. In other words, instrumentation routines define where to insert instrumentation. Analysis routines define what to do when the instrumentation is activated.
The APIs of Pin, DynamoRIO, and Frida allow us to iterate over the target/guest program at different levels of granularity. That is, we can iterate over every single instruction (just before it executes), entire basic blocks, traces (multiple basic blocks), or the entire target/guest program (image).
Example tool
As I mentioned, while I was playing with heap spraying I felt the need to log all the memory allocations my code was performing. Since I felt a bit annoyed after doing this repeatedly with WinDbg, even with some automation, I thought about doing it with a DBI framework. More precisely, with Pin.
I remember that during one of Peter Van Eeckhoutte’s exploitation classes he mentioned he had written something similar. I looked at his GitHub and found his Pintool. I had a look at his code, but he used Visual Studio 2012, an old version of Pin, a different approach from what I had in mind, and had a different goal (I had in mind doing something else besides logging memory allocations), and things have changed a bit since then. So I decided to write my own Pintool instead of using, or modifying, his code. After all, it’s all about struggling and learning, not running tools. Later I realized that most of his code comes from the Pin documentation, and so does mine.
The goal was to write a Pintool, or a DynamoRIO client, and use it to detect critical memory issues such as memory leaks and double frees. Yes, in C/C++ programs. You may say that there are plenty of tools that already allow you to do that, and that’s probably true (in fact DynamoRIO comes with a couple of tools that can help here). The point here was to learn how to write my own tool, have fun, get familiar with DBI frameworks, and document my experiments for later reference. Possibly, it will also be used as a soft introduction to Dynamic Binary Analysis by people who don’t know where to start.
So, the “critical memory issues” I had in mind weren’t really that difficult to trace. After looking at some almost ready-to-go code samples I found in Pin’s documentation, I ended up expanding a bit the initial logging goal I had in mind, and added a couple of “features” to aid my vulnerability discovery capabilities.
As you know, some common memory (de)allocation problems in C/C++ programs are:
- Memory leaks
- Double frees
- Invalid frees
- Use after frees (Note: the code presented in this post doesn’t detect potential use after free issues. An improved version of this code, available here, does.)
I assume everyone knows what the problems listed above are. If you don’t, or you need a ‘refresh’, just click the links above.
At least the first three problems are very “easy” to detect with a Pintool or a DynamoRIO client. I’ll make a couple of assumptions. The target program is a single binary/executable file, and the only functions that I’ll track to allocate and free memory are malloc and free (calloc and realloc are just “special” versions of malloc anyway). Internally new and delete usually use malloc and free, so we are covered. I can simply “monitor” these calls. I won’t consider other functions like realloc, calloc, HeapAlloc, HeapFree, etc. (for now). Yes, for now, I’ll focus only on the generic malloc and free functions from the C Run-Time Library. On Windows, these functions, when called, will in turn call HeapAlloc and HeapFree.
Here’s a diagram showing the relationship of Windows API calls used to allocate process memory (from the book The Art of Memory Forensics, and used with authorization. Thanks to Andrew Case).
As we can see above, ideally we should actually be “monitoring” RtlAllocateHeap and RtlFreeHeap. However, we can ignore this for now. This way, if you just want to try this code on Linux or OSX, it’s mostly copy and paste. Later, in the main version of this tool, I’ll indeed be working only with the Windows Heap functions, or my Pintool won’t work with Internet Explorer, for example.
Whenever a program calls malloc, I’ll log its return value (that is, the address of the allocated memory region). Whenever a program calls free, I’ll match the address being freed against the addresses I saved before. If it has been allocated and not freed, I’ll mark it as freed. If it has been allocated and already freed, then we have a double free. If I have no record of that address being allocated, then we have a free of unallocated memory. Simple, huh? Finally, when the program exits, I can look at my records to detect memory addresses that have been allocated but not freed. This way I can also detect memory leaks.
As we’ll see, using a dynamic binary framework to achieve what’s described above can be done with very little effort. However, there are some issues that we’ll ignore to keep this post simple. As you can probably guess, the Heap Manager also plays a role here, and our tool might have to be Heap Manager specific if we don’t want to be flooded with false positives. Also, as mentioned before, this tool will tell us there’s a bug, but not exactly where. You can tell your tool to break/pause when an issue is found and attach a debugger. However, depending on the class of bug, it may still be very hard to find where the bug is and reproduce it.
While I was writing this blog post, a very interesting tool from Joxean Koret called membugtool was released during the EuskalHack 2018 conference. His tool does a bit more than mine (well, actually considerably more), and the code is certainly better than mine. Keep following this post if you want to learn more about Pin and other DBI frameworks, but don’t forget to check his tool later. I was actually very happy when I saw it released because it means my idea wasn’t complete nonsense. On top of that, Joxean Koret is a respected researcher that I’ve been following for quite a long time, mainly due to his awesome work on breaking Antivirus engines.
Target/Guest program (ExercisePin.exe)
To test our multiple dynamic binary analysis tools, I wrote the following nonsense program (I called it ExercisePin.exe). It’s quite clear that there are some memory leaks, an invalid free, and a potential double free (depending on our input).
#include <stdio.h>
#include <stdlib.h>

void do_nothing() {
    int *xyz = (int*)malloc(2); /* never freed: memory leak */
}

int main(int argc, char* argv[]) {
    free(NULL); /* free of an address that was never allocated */
    do_nothing();
    char *A = (char*)malloc(128 * sizeof(char));
    char *B = (char*)malloc(128 * sizeof(char)); /* never freed: memory leak */
    char *C = (char*)malloc(128 * sizeof(char));
    free(A);
    free(C);
    if (argc != 2)
        do_nothing();
    else
        free(C); /* double free when exactly one argument is passed */
    puts("done");
    return 0;
}
As you can see, it’s a very stupid program. I recommend testing your tools with real software to see how they behave. Also, check the previously mentioned project membugtool, since it includes a very nice set of tests; they made me lazy, so I didn’t even try to improve the code above or create new buggy sample programs. Depending on which compiler you use to build this sample, you might get different results. I built mine with Visual Studio, which has advantages and disadvantages. If you prefer, you can use Dev-C++ (which uses GCC), or cygwin (and install gcc or i686-w64-mingw32-gcc.exe), or even Embarcadero. Anyway, expect different results depending on the compiler you choose to build the target program.
Basic Pintool (MallocTracer)
In this first Pintool
example, I’m logging all the malloc
and free
calls. The instrumentation is added before and after the malloc
call and logs the parameter passed to the call and its return value. For the free
call we’ll only look at its parameter, and not at its return value. So the instrumentation is only added before the call. This Pintool
will not be very useful in big applications since it doesn’t really tell you where the issue is. Anyway, it is a good start and will serve the purpose of “showing” how the Pin
API
can be used.
We need to start by choosing which instrumentation granularity we’ll use. Have a look at the documentation for more details. I’ll be using Image
instrumentation
.
Image instrumentation lets the Pintool inspect and instrument an entire image, IMG, when it is first loaded. A Pintool can walk the sections, SEC, of the image, the routines, RTN, of a section, and the instructions, INS of a routine. Instrumentation can be inserted so that it is executed before or after a routine is executed, or before or after an instruction is executed. Image instrumentation utilizes the IMG_AddInstrumentFunction API call. Image instrumentation depends on symbol information to determine routine boundaries hence PIN_InitSymbols must be called before PIN_Init.
We start with some includes. To use the Pin API
we need to include pin.h
.
#include "pin.h"
#include <iostream>
#include <fstream>
#include <map>
The iostream
header is required for basic input/output operations, and the fstream
header is required because I’ll write the output of my Pintool
to a file. In small programs, we could live with the console output, however for big programs we need to save the output to a file. If you are instrumenting Internet Explorer
for example and playing with some JavaScript
code, the amount of malloc
and free
calls is impressive (well, RtlAllocateHeap
, and RtlFreeHeap
). In some big programs you might not even want to write to disk every time there’s a call due to performance reasons, but let’s ignore that to keep things simple.
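To give an idea of what “not writing to disk every time” could look like, here is a minimal standalone sketch that collects log lines in memory and only writes them out once a size threshold is reached. BufferedLog and bufferedLogDemo are hypothetical names, the threshold is arbitrary, and the sketch ignores multi-threaded targets, which would need locking.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: accumulates log lines in memory and writes them to the
// underlying stream only when the buffer grows past a threshold.
class BufferedLog {
public:
    BufferedLog(std::ostream &sink, std::size_t threshold)
        : sink_(sink), threshold_(threshold) {}

    ~BufferedLog() { flush(); }  // make sure nothing is lost at exit

    // Queue one line; flush if the buffer is large enough.
    void line(const std::string &text) {
        pending_ += text;
        pending_ += '\n';
        if (pending_.size() >= threshold_) flush();
    }

    // Write everything that is pending to the sink.
    void flush() {
        if (pending_.empty()) return;
        sink_ << pending_;
        sink_.flush();
        pending_.clear();
    }

    std::size_t pendingBytes() const { return pending_.size(); }

private:
    std::ostream &sink_;
    std::size_t threshold_;
    std::string pending_;
};

// Usage sketch with an in-memory sink standing in for the log file.
inline void bufferedLogDemo() {
    std::ostringstream sink;
    BufferedLog log(sink, 32);
    log.line("[*] malloc(2)");  // 14 bytes buffered, below the threshold
    assert(sink.str().empty());
    log.flush();                // explicit flush writes everything out
    assert(sink.str() == "[*] malloc(2)\n");
}
```

The trade-off is the usual one: fewer writes while the target runs, at the cost of losing the buffered lines if the process dies before the buffer is flushed.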
Additionally, I’ll use a map container to keep a log of all the memory allocated and freed. Check the References section to see how the C++ map container “works” if you aren’t used to writing code in C++. I’m not a developer, so my code can be a bit scary, but hopefully it works. Consider yourself warned.
I’ll also have some global variables. It’s very common to use global variables in a Pintool
, have a look at the samples provided to get a feeling of how they are most commonly used. In my case, I’ll use the following global variables.
map<ADDRINT, bool> MallocMap;
ofstream LogFile;
KNOB<string> LogFileName(KNOB_MODE_WRITEONCE, "pintool", "o", "memprofile.out", "Memory trace file name");
I already mentioned the map container above; again, have a look here if you don’t know how it works. The idea is to store the state of each allocation in this MallocMap. The ADDRINT type is defined in pin.h and, as you can guess, represents a memory address. It is mapped to a bool value: true means the address has been deallocated, false means it is still allocated.
The LogFile is the output file where I’ll save the output of the Pintool. Lastly, the KNOB variable. It is basically a switch supported by our Pintool (a way to pass command-line arguments to our Pintool). This KNOB allows us to specify the name of the log file through the “o” switch. Its default value is “memprofile.out”.
If we look at the main
function of the code samples, you’ll see that they are all very similar. And the one below is no exception.
int main(int argc, char *argv[])
{
PIN_InitSymbols();
PIN_Init(argc, argv);
LogFile.open(LogFileName.Value().c_str());
IMG_AddInstrumentFunction(CustomInstrumentation, NULL);
PIN_AddFiniFunction(FinalFunc, NULL);
PIN_StartProgram();
return 0;
}
I have to call PIN_InitSymbols before PIN_Init because I’m using Image instrumentation, which depends on symbol information. Then I open the log file for writing, and I call IMG_AddInstrumentFunction. The instrumentation function that I’ll be using is called CustomInstrumentation and is defined by me (it is not a Pin API function). You can call it whatever you want.
Then I call PIN_AddFiniFunction, which registers a function to be executed immediately before the application exits. In this case, my function is FinalFunc.
Finally, I call PIN_StartProgram to start executing my program. This function never returns.
So let’s have a look at my CustomInstrumentation()
function.
VOID CustomInstrumentation(IMG img, VOID *v)
{
for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
{
string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);
if (undFuncName == "malloc")
{
RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(allocRtn))
{
RTN_Open(allocRtn);
// Record Malloc size
RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);
// Record Malloc return address
RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
RTN_Close(allocRtn);
}
}
else if (undFuncName == "free")
{
RTN freeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(freeRtn))
{
RTN_Open(freeRtn);
RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
RTN_Close(freeRtn);
}
}
}
}
We need to “tell” Pin what the instrumentation routines are and when to execute them. The instrumentation routine above is called every time an image is loaded, and that is also where we “tell” Pin where to insert the analysis routines.
Basically, above, when we find the malloc or free routine in the image being loaded, we insert the analysis routines by using the RTN_InsertCall function.
The RTN_InsertCall
accepts multiple arguments, a variable number of arguments actually. Three are quite important, and you can easily guess which ones by looking at these calls. The first is the routine we want to instrument. The second is an IPOINT that determines where the analysis call is inserted relative to the instrumented object. And the third is the analysis routine to be inserted.
Also, note that all RTN_InsertCall
functions must be preceded by a call to RTN_Open
and followed by a call to RTN_Close
.
We can specify a list of arguments to be passed to the analysis routine, and this list must be terminated with IARG_END. As we can also guess by looking at the code, to pass the return value of malloc
to the analysis routine we use IARG_FUNCRET_EXITPOINT_VALUE. To pass the argument of the malloc
or free
calls to the analysis routine, we use IARG_FUNCARG_ENTRYPOINT_VALUE followed by the index of the argument. In our case, both are 0
(first and only argument).
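To make the “terminated with IARG_END” convention more concrete, here is a toy sketch of how a variadic function can walk such a list. ToyArg, ToyRequest, parseToyArgs, and parseToyArgsDemo are made-up names for illustration only; this is not how Pin is implemented, just the general sentinel-terminated pattern, where some entries carry an extra operand (like the argument index) and others do not.

```cpp
#include <cassert>
#include <cstdarg>
#include <vector>

// Toy argument kinds; they only mimic the shape of an IARG_* list and are
// not Pin's real definitions.
enum ToyArg { TOY_END = 0, TOY_FUNCARG_VALUE = 1, TOY_FUNCRET_VALUE = 2 };

struct ToyRequest {
    int kind;   // which value the analysis routine wants
    int index;  // argument index, only meaningful for TOY_FUNCARG_VALUE
};

// Reads (kind[, index]) entries until the TOY_END sentinel is reached.
inline std::vector<ToyRequest> parseToyArgs(int first, ...) {
    std::vector<ToyRequest> out;
    va_list ap;
    va_start(ap, first);
    for (int kind = first; kind != TOY_END; kind = va_arg(ap, int)) {
        ToyRequest req{kind, -1};
        if (kind == TOY_FUNCARG_VALUE)
            req.index = va_arg(ap, int);  // this kind carries one extra operand
        out.push_back(req);
    }
    va_end(ap);
    return out;
}

// Usage sketch, shaped like "argument value, index 0, end of list".
inline void parseToyArgsDemo() {
    std::vector<ToyRequest> reqs = parseToyArgs(TOY_FUNCARG_VALUE, 0, TOY_END);
    assert(reqs.size() == 1 && reqs[0].index == 0);
}
```

With a list like this, a missing terminator means the parser keeps reading past the arguments that were actually passed, which is why the list always has to be closed.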
All the Pin
functions that operate at the routine level start with RTN_
. Have a look at the RTN Routine Object
documentation here.
Also, all the Pin
functions that operate at the image
level start with IMG_
. Have a look at the IMG Image Object
documentation here.
The same applies to all the Pin
functions that operate at the symbol
level, they all (or almost all) start with SYM_
. Have a look at the SYM Symbol Object
documentation here.
You might be wondering how Pin finds malloc and free. Pin will use whatever symbol information is available: debug symbols from the target/guest program, PDB files, export tables, and dbghelp. There are two possible methods to locate the functions we want to instrument. We can use RTN_FindByName, or alternatively walk the symbols ourselves, which handles name mangling and multiple symbols (the method I used), as shown below.
for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
{
string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);
if (undFuncName == "malloc") // find the malloc function
After we find the calls (malloc
and free
in our example) we want to instrument, we “tell” Pin
which function must be called every time a malloc
call is executed.
// Record Malloc size
RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);
// Record Malloc return address
RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
If we look at the code above, we have two calls to RTN_InsertCall. In the first, we “tell” Pin
which function must be called before the malloc
call. In the second we “tell” Pin
which function must be called after the malloc
call. We want to log the allocation sizes and the return value of the malloc
call. So, we need both.
For the free
call, we are only interested in its parameter (the address of the memory to free).
RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
These three functions are very straightforward. First, before the malloc
call we just want to save the size of the memory being allocated.
VOID LogBeforeMalloc(ADDRINT size)
{
LogFile << "[*] malloc(" << dec << size << ")";
}
After the malloc call, we just want to save the returned address. However, as we can see below, we use the map container and, by using an iterator, we check whether the chunk of memory is being allocated for the first time. If so, we also log it.
VOID LogAfterMalloc(ADDRINT addr)
{
    // ADDRINT is an integer type, so compare against 0 rather than NULL
    if (addr == 0)
    {
        cerr << "[-] Error: malloc() return value was NULL. Heap full!?!" << endl;
        return;
    }
    map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
    if (it != MallocMap.end())
    {
        if (it->second)
            it->second = false; // Address reused after a free: mark it as allocated again
        else
            cerr << "[-] Error: allocating memory not freed!?!" << endl;
    }
    else
    {
        // First time we see this address: record it as allocated and log it
        MallocMap.insert(pair<ADDRINT, bool>(addr, false));
        LogFile << "\t\t= 0x" << hex << addr << endl;
    }
}
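Finally, when we free a chunk of memory, we check whether that address was already freed in order to detect double frees. Plus, if we don’t know the address being freed, then we are trying to free memory that wasn’t allocated before, which leads to undefined behavior. The one exception is free(NULL), which the C standard defines as a no-op, although this simple tool still reports it.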
VOID LogFree(ADDRINT addr)
{
map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
if (it != MallocMap.end())
{
if (it->second)
LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once." << endl; // Double free
else
{
it->second = true; // Mark it as freed
LogFile << "[*] free(0x" << hex << addr << ")" << endl;
}
}
else
LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl; // Freeing unallocated memory
}
Lastly, we have the call to FinalFunc
, which is executed just before the program ends. We basically verify if there’s memory that has been allocated but not freed, and we close our log file. The return of this function marks the end of the instrumentation.
VOID FinalFunc(INT32 code, VOID *v)
{
for (pair<ADDRINT, bool> p : MallocMap)
{
if (!p.second)
LogFile << "[*] Memory at address 0x" << hex << p.first << " allocated but not freed" << endl;
}
LogFile.close();
}
Simple.
The whole Pintool
code is below. You can also get the whole Visual Studio
project from GitLab here.
// Built on top of https://software.intel.com/sites/default/files/managed/62/f4/cgo2013.pdf (slide 33)
#include "pin.h"
#include <iostream>
#include <fstream>
#include <map>
map<ADDRINT, bool> MallocMap;
ofstream LogFile;
KNOB<string> LogFileName(KNOB_MODE_WRITEONCE, "pintool", "o", "memprofile.out", "Memory trace file name");
VOID LogAfterMalloc(ADDRINT addr)
{
    // ADDRINT is an integer type, so compare against 0 rather than NULL
    if (addr == 0)
    {
        cerr << "[-] Error: malloc() return value was NULL. Heap full!?!" << endl;
        return;
    }
    map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
    if (it != MallocMap.end())
    {
        if (it->second)
            it->second = false; // Address reused after a free: mark it as allocated again
        else
            cerr << "[-] Error: allocating memory not freed!?!" << endl;
    }
    else
    {
        // First time we see this address: record it as allocated and log it
        MallocMap.insert(pair<ADDRINT, bool>(addr, false));
        LogFile << "\t\t= 0x" << hex << addr << endl;
    }
}
VOID LogBeforeMalloc(ADDRINT size)
{
LogFile << "[*] malloc(" << dec << size << ")";
}
VOID LogFree(ADDRINT addr)
{
map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
if (it != MallocMap.end())
{
if (it->second)
LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once." << endl; // Double free
else
{
it->second = true; // Mark it as freed
LogFile << "[*] free(0x" << hex << addr << ")" << endl;
}
}
else
LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl;
}
VOID CustomInstrumentation(IMG img, VOID *v)
{
for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
{
string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);
if (undFuncName == "malloc")
{
RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(allocRtn))
{
RTN_Open(allocRtn);
// Record Malloc size
RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);
// Record Malloc return address
RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
RTN_Close(allocRtn);
}
}
else if (undFuncName == "free")
{
RTN freeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(freeRtn))
{
RTN_Open(freeRtn);
RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
RTN_Close(freeRtn);
}
}
}
}
VOID FinalFunc(INT32 code, VOID *v)
{
for (pair<ADDRINT, bool> p : MallocMap)
{
if (!p.second)
LogFile << "[*] Memory at address 0x" << hex << p.first << " allocated but not freed" << endl;
}
LogFile.close();
}
int main(int argc, char *argv[])
{
PIN_InitSymbols();
PIN_Init(argc, argv);
LogFile.open(LogFileName.Value().c_str());
IMG_AddInstrumentFunction(CustomInstrumentation, NULL);
PIN_AddFiniFunction(FinalFunc, NULL);
PIN_StartProgram();
return 0;
}
Here is what we get if we run it against our ExercisePin.exe binary (see the section Target/Guest Program).
C:\pin>pin -t c:\pin\source\tools\MallocTracer\Release\MallocTracer.dll -- ExercisePin.exe
done
C:\pin>type memprofile.out
[*] Freeing unallocated memory at address 0x0.
[*] malloc(2) = 0x564f68
[*] malloc(128) = 0x569b88
[*] malloc(128) = 0x569c10
[*] malloc(128) = 0x569c98
[*] free(0x569b88)
[*] free(0x569c98)
[*] malloc(2) = 0x564e78
[*] Memory at address 0x564e78 allocated but not freed
[*] Memory at address 0x564f68 allocated but not freed
[*] Memory at address 0x569c10 allocated but not freed
Or, if we pass any data as an argument to our ExercisePin.exe
…
C:\pin>pin.exe -t "C:\pin\source\tools\MallocTracer\Release\MallocTracer.dll" -- C:\TARGET\ExercisePin.exe moo
C:\pin>type memprofile.out
[*] Freeing unallocated memory at address 0x0.
[*] malloc(2) = 0x214f78
[*] malloc(128) = 0x218f98
[*] malloc(128) = 0x219020
[*] malloc(128) = 0x2190a8
[*] free(0x218f98)
[*] free(0x2190a8)
[*] Memory at address 0x2190a8 has been freed more than once (Double Free).
As we can see above, our Pintool was able to identify all the issues we were aware of in our test case. That is, the invalid free, the memory leaks, and a double free. The reason why we don’t see the memory leaks in the last output is that our binary crashes when the double free happens. The binary was built with Visual Studio, which adds some Heap integrity checks that make it crash. If you build ExercisePin.exe with gcc, or another compiler, the double free won’t be noticed and the program will keep running. However, if you build it with gcc, for example, you’ll see many other malloc and free calls from the C Run-Time Library initialization code. Hence, I didn’t use gcc, to make the output easier to follow.
Basic DynamoRIO client (MallocWrap)
We’ll create a DynamoRIO client
that mimics the Pintool
above. That is, we’ll log all the malloc
and free
calls. The same way, the instrumentation
is added before and after the malloc
call since we want to log the parameter passed to the call and its return value. For the free
call, we’ll only look at its parameter, and not at its return value. So the instrumentation is only added before the call.
We’ll use the drwrap DynamoRIO extension, which provides function wrapping and replacing support. drwrap uses the drmgr extension to ensure its events occur in the proper order.
We start with some “standard” includes, and to use the DynamoRIO
APIs
we need to include dr_api.h
.
#include "stdafx.h"
#include <fstream>
#include "dr_api.h"
#include "drmgr.h"
#include "drwrap.h"
using namespace std;
Additionally, we include the headers for the extensions mentioned above. That is, drmgr.h
and drwrap.h
. We’ll write the output of this DynamoRIO client
to a text file, hence the fstream
include. I won’t use a container in this example to keep track of the memory allocations. You can just copy and paste that functionality from the Pintool
above with slight modifications, so I’ll leave that for you as an exercise. In this example, we’ll simply log malloc
and free
calls to demonstrate how to use the DynamoRIO
API
to accomplish the same as before, where we used Pin
.
Then, we have the function declarations and some global variables.
static void event_exit(void);
static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data);
static void wrap_malloc_post(void *wrapcxt, void *user_data);
static void wrap_free_pre(void *wrapcxt, OUT void **user_data);
ofstream LogFile;
#define MALLOC_ROUTINE_NAME "malloc"
#define FREE_ROUTINE_NAME "free"
These are all cosmetic; we could have used these #defines in our Pintool too. We didn’t, simply because we don’t have to. Feel free to adopt the style you want. I built this example on top of this one, so I ended up using more or less the same “style”. If you plan to port your client or Pintool to other platforms, this can be considered good practice because it will make the changes easier.
Next, we have a function called module_load_event, which is a callback function registered with drmgr_register_module_load_event. DynamoRIO will call this function whenever the application loads a module. As you can see, not that different from Pin.
static void module_load_event(void *drcontext, const module_data_t *mod, bool loaded)
{
app_pc towrap = (app_pc)dr_get_proc_address(mod->handle, MALLOC_ROUTINE_NAME);
if (towrap != NULL)
{
bool ok = drwrap_wrap(towrap, wrap_malloc_pre, wrap_malloc_post);
if (!ok)
{
dr_fprintf(STDERR, "[-] Could not wrap 'malloc': already wrapped?\n");
DR_ASSERT(ok);
}
}
towrap = (app_pc)dr_get_proc_address(mod->handle, FREE_ROUTINE_NAME);
if (towrap != NULL)
{
bool ok = drwrap_wrap(towrap, wrap_free_pre, NULL);
if (!ok)
{
dr_fprintf(STDERR, "[-] Could not wrap 'free': already wrapped?\n");
DR_ASSERT(ok);
}
}
}
As we can see above, we use dr_get_proc_address to get the entry point of malloc. If it doesn’t return NULL (which signals failure), then we use drwrap_wrap to wrap the application function: wrap_malloc_pre() is called prior to every invocation of the original function (malloc), and wrap_malloc_post() is called after every invocation of the original function (malloc). Again, conceptually, very close to what we did with Pin.
We do the same with free
. However, as stated before we are only interested in the free
parameter and not its return value. So we only wrap the free
call prior to every invocation (wrap_free_pre
). Since we don’t care about its return value we just pass NULL
as the third parameter to drwrap_wrap
. With drwrap_wrap
one of the callbacks can be NULL
, but not both.
We then have the dr_client_main, which is, let’s say, our main
function. DynamoRIO
looks up dr_client_main
in each client library and calls that function when the process starts.
We have a pretty common “main”, with calls to dr_set_client_name (which sets information presented to users in diagnostic messages), dr_log (which simply writes to DynamoRIO's log file), and a couple of functions whose purpose you can guess from their names.
Additionally, drmgr_init and drwrap_init initialize the respective extensions. The dr_register_exit_event is pretty much the same as Pin’s PIN_AddFiniFunction: it registers a function to be executed immediately before the application exits.
Lastly, we have the call to drmgr_register_module_load_event
that we already mentioned above.
DR_EXPORT void dr_client_main(client_id_t id, int argc, const char *argv[])
{
LogFile.open("memprofile.out");
dr_set_client_name("DynamoRIO Sample Client 'wrap'", "http://dynamorio.org/issues");
dr_log(NULL, LOG_ALL, 1, "Client 'wrap' initializing\n");
if (dr_is_notify_on())
{
dr_enable_console_printing();
dr_fprintf(STDERR, "[*] Client wrap is running\n");
}
drmgr_init();
drwrap_init();
dr_register_exit_event(event_exit);
drmgr_register_module_load_event(module_load_event);
}
Next comes the function to be executed immediately before the application exits. Nothing special here.
static void event_exit(void)
{
drwrap_exit();
drmgr_exit();
}
And lastly, the callback functions already mentioned before. What’s relevant here? The call to drwrap_get_arg which, quoting the documentation, “Returns the value of the arg-th argument (0-based) to the wrapped function represented by wrapcxt. Assumes the regular C calling convention (i.e., no fastcall). May only be called from a drwrap_wrap pre-function callback. To access argument values in a post-function callback, store them in the user_data parameter passed between the pre and post functions.” And the call to drwrap_get_retval, which returns the return value of the wrapped function.
static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data)
{
/* malloc(size) or HeapAlloc(heap, flags, size) */
//size_t sz = (size_t)drwrap_get_arg(wrapcxt, 2); // HeapAlloc
size_t sz = (size_t)drwrap_get_arg(wrapcxt, 0); // malloc
LogFile << "[*] malloc(" << dec << sz << ")"; // log the malloc size
}
static void wrap_malloc_post(void *wrapcxt, void *user_data)
{
    // Keep the returned address pointer-sized so it is not truncated on 64-bit targets
    ptr_uint_t ret = (ptr_uint_t)drwrap_get_retval(wrapcxt);
    LogFile << "\t\t= 0x" << hex << ret << endl;
}
static void wrap_free_pre(void *wrapcxt, OUT void **user_data)
{
    // The first (and only) argument of free() is the address being released
    ptr_uint_t addr = (ptr_uint_t)drwrap_get_arg(wrapcxt, 0);
    LogFile << "[*] free(0x" << hex << addr << ")" << endl;
}
Very simple, and not that different from what we have seen before with Pin
.
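The user_data parameter that appears in these callbacks is what lets a pre-function hand a value to the matching post-function. The sketch below shows the pattern in plain C++ so it can be run without DynamoRIO; the harness (wrapped_malloc, the pre_cb_t and post_cb_t types, g_log, and the demo function) is hypothetical and only mimics the pre/post shape, it is not part of the drwrap API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>

// Stand-in callback types with the same pre/post shape as the wrappers above.
typedef void (*pre_cb_t)(std::size_t arg, void **user_data);
typedef void (*post_cb_t)(void *retval, void *user_data);

std::ostringstream g_log;  // in-memory stand-in for the log file

// Pre-callback: remember the requested size for the matching post-callback.
void malloc_pre(std::size_t size, void **user_data) {
    *user_data = reinterpret_cast<void *>(static_cast<std::uintptr_t>(size));
}

// Post-callback: the size and the returned address are now available together.
void malloc_post(void *retval, void *user_data) {
    std::size_t size =
        static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(user_data));
    g_log << "malloc(" << size << ") -> " << (retval ? "ok" : "NULL") << "\n";
}

// Toy harness: calls pre, the real malloc, then post, threading user_data through.
// Either callback may be omitted by passing nullptr.
void *wrapped_malloc(std::size_t size, pre_cb_t pre, post_cb_t post) {
    void *user_data = nullptr;
    if (pre) pre(size, &user_data);
    void *ret = std::malloc(size);
    if (post) post(ret, user_data);
    return ret;
}

// Usage sketch: one wrapped allocation, released right away.
void wrappedMallocDemo() {
    void *p = wrapped_malloc(128, malloc_pre, malloc_post);
    assert(p != nullptr);
    std::free(p);
}
```

In a real client the same idea would let the post-function log the size and the returned address on a single line, or store both in a container, without relying on global state shared between the two callbacks.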
The whole DynamoRIO client
code is below. You can also get the whole Visual Studio
project from GitLab here.
#include "stdafx.h"
#include <fstream>
#include "dr_api.h"
#include "drmgr.h"
#include "drwrap.h"
using namespace std;
static void event_exit(void);
static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data);
static void wrap_malloc_post(void *wrapcxt, void *user_data);
static void wrap_free_pre(void *wrapcxt, OUT void **user_data);
ofstream LogFile;
#define MALLOC_ROUTINE_NAME "malloc"
#define FREE_ROUTINE_NAME "free"
static void module_load_event(void *drcontext, const module_data_t *mod, bool loaded)
{
app_pc towrap = (app_pc)dr_get_proc_address(mod->handle, MALLOC_ROUTINE_NAME);
if (towrap != NULL)
{
bool ok = drwrap_wrap(towrap, wrap_malloc_pre, wrap_malloc_post);
if (!ok)
{
dr_fprintf(STDERR, "[-] Could not wrap 'malloc': already wrapped?\n");
DR_ASSERT(ok);
}
}
towrap = (app_pc)dr_get_proc_address(mod->handle, FREE_ROUTINE_NAME);
if (towrap != NULL)
{
bool ok = drwrap_wrap(towrap, wrap_free_pre, NULL);
if (!ok)
{
dr_fprintf(STDERR, "[-] Could not wrap 'free': already wrapped?\n");
DR_ASSERT(ok);
}
}
}
DR_EXPORT void dr_client_main(client_id_t id, int argc, const char *argv[])
{
LogFile.open("memprofile.out");
dr_set_client_name("DynamoRIO Sample Client 'wrap'", "http://dynamorio.org/issues");
dr_log(NULL, LOG_ALL, 1, "Client 'wrap' initializing\n");
if (dr_is_notify_on())
{
dr_enable_console_printing();
dr_fprintf(STDERR, "[*] Client wrap is running\n");
}
drmgr_init();
drwrap_init();
dr_register_exit_event(event_exit);
drmgr_register_module_load_event(module_load_event);
}
static void event_exit(void)
{
drwrap_exit();
drmgr_exit();
}
static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data)
{
/* malloc(size) or HeapAlloc(heap, flags, size) */
//size_t sz = (size_t)drwrap_get_arg(wrapcxt, 2); // HeapAlloc
size_t sz = (size_t)drwrap_get_arg(wrapcxt, 0); // malloc
LogFile << "[*] malloc(" << dec << sz << ")"; // log the malloc size
}
static void wrap_malloc_post(void *wrapcxt, void *user_data)
{
    // Keep the returned address pointer-sized so it is not truncated on 64-bit targets
    ptr_uint_t ret = (ptr_uint_t)drwrap_get_retval(wrapcxt);
    LogFile << "\t\t= 0x" << hex << ret << endl;
}
static void wrap_free_pre(void *wrapcxt, OUT void **user_data)
{
    // The first (and only) argument of free() is the address being released
    ptr_uint_t addr = (ptr_uint_t)drwrap_get_arg(wrapcxt, 0);
    LogFile << "[*] free(0x" << hex << addr << ")" << endl;
}
Here is what we get if we run it against our ExercisePin.exe binary (see the section Target/Guest Program).
C:\dynamorio\bin32>drrun.exe -client "C:\Users\bob\Desktop\WRKDIR\MallocWrap\Release\MallocWrap.dll" 0 "" c:\Users\bob\Desktop\ExercisePin.exe
[*] Client wrap is running
done
C:\dynamorio\bin32>type memprofile.out
[*] free(0x0)
[*] malloc(2) = 0x5a35d0
[*] malloc(128) = 0x5a9c50
[*] malloc(128) = 0x5a9cd8
[*] malloc(128) = 0x5a9d60
[*] free(0x5a9c50)
[*] free(0x5a9d60)
[*] malloc(2) = 0x5a34e0
We can extend this program to get the exact same functionality as our Pintool
and check for memory corruption bugs instead of logging the calls only. I’ll leave that as an exercise for you.
Basic Frida script (MallocLogger)
Frida is a fast-growing DBI framework, mainly used on mobile devices. I haven’t played much with mobile applications in a long time (that’s about to change, though); still, I wanted to give Frida a try because I heard good things about it, and it also supports Windows. The interesting part here is that Frida injects a JavaScript interpreter into the target/guest program. So, instead of writing C code, we’ll be writing JavaScript to instrument our program (actually, if we want, we can also use C or Swift). You can see this as an advantage or a disadvantage. If you are a vulnerability hunter and you like to poke around browsers, then this should be an advantage, I guess. It’s actually very interesting that we are writing instrumentation code to manipulate low-level instructions by using a high-level language.
You can find the JavaScript API
here. Anyway, the use case will be exactly the same as the ones we saw before.
While the instrumentation
code has to be written in JavaScript
(well, again, that’s not true but let’s use JavaScript
because it’s cool), the resulting tools can be written in either Python
or JavaScript
.
We’ll use Frida’s Interceptor to trace all malloc and free calls for a start. The target will be our ExercisePin.exe binary again. We’ll also try to produce an output close to that of our basic MallocTracer Pintool and MallocWrap DynamoRIO client, which means we’ll log the amount of memory requested, the address returned by malloc, and the argument of free.
Here’s the sample MallocLogger.py
Python
script.
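#!/usr/bin/env python
import frida
import sys

# Spawn the target suspended and attach to it
pid = frida.spawn(['ExercisePin.exe'])
session = frida.attach(pid)

# Load the JavaScript instrumentation into the target
with open('MallocLogger.js') as f:
    contents = f.read()
script = session.create_script(contents)
script.load()

# Let the target run and keep this script alive until stdin is closed
frida.resume(pid)
sys.stdin.read()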
And below is the instrumentation JavaScript
file, MallocLogger.js
.
// Interceptor for 'malloc'
Interceptor.attach(Module.findExportByName(null, 'malloc'),
{
// Log before malloc
onEnter: function (args) {
console.log("malloc(" + args[0].toInt32() + ")");
},
// Log after malloc
onLeave: function (retval) {
console.log("\t\t= 0x" + retval.toString(16));
}
});
// Interceptor for 'free'
Interceptor.attach(Module.findExportByName(null, 'free'),
{
onEnter: function (args) {
console.log("free(0x" + args[0].toString(16) + ")");
}
});
If we run this Python
script we get something like.
C:\Users\bob\Desktop\frida>python MallocLogger.py
free(0x0)
malloc(2)
= 0x984268
malloc(128)
= 0x9856d8
malloc(128)
= 0x985760
malloc(128)
= 0x9857e8
done
free(0x9856d8)
free(0x9857e8)
malloc(2)
= 0x984278
Interestingly enough, Frida also comes with a utility, frida-trace.exe, that allows us to do pretty much the same thing we did above while writing almost no code (besides adding a bit more information and tweaking the output).
C:\Users\bob\Desktop\frida>frida-trace -i malloc -i free .\ExercisePin.exe
Instrumenting functions...
malloc: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\malloc.js"
malloc: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\malloc.js"
free: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\free.js"
free: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\free.js"
Started tracing 4 functions. Press Ctrl+C to stop.
done
/* TID 0x1f84 */
125 ms free()
125 ms malloc()
125 ms malloc()
125 ms malloc()
125 ms malloc()
125 ms free()
125 ms free()
125 ms malloc()
Process terminated
If you look at the output above you can see that some JavaScript
handlers were auto-generated. We can just tweak this JavaScript
code to make the output look as before. If we open for example the file __handlers__\msvcrt.dll\malloc.js
we’ll see something like:
/*
* Auto-generated by Frida. Please modify to match the signature of malloc.
* This stub is currently auto-generated from manpages when available.
*
* For full API reference, see: http://www.frida.re/docs/javascript-api/
*/
{
/**
* Called synchronously when about to call malloc.
*
* @this {object} - Object allowing you to store state for use in onLeave.
* @param {function} log - Call this function with a string to be presented to the user.
* @param {array} args - Function arguments represented as an array of NativePointer objects.
* For example use Memory.readUtf8String(args[0]) if the first argument is a pointer to a C string encoded as UTF-8.
* It is also possible to modify arguments by assigning a NativePointer object to an element of this array.
* @param {object} state - Object allowing you to keep state across function calls.
* Only one JavaScript function will execute at a time, so do not worry about race-conditions.
* However, do not use this to store function arguments across onEnter/onLeave, but instead
* use "this" which is an object for keeping state local to an invocation.
*/
onEnter: function (log, args, state) {
log("malloc()");
},
/**
* Called synchronously when about to return from malloc.
*
* See onEnter for details.
*
* @this {object} - Object allowing you to access state stored in onEnter.
* @param {function} log - Call this function with a string to be presented to the user.
* @param {NativePointer} retval - Return value represented as a NativePointer object.
* @param {object} state - Object allowing you to keep state across function calls.
*/
onLeave: function (log, retval, state) {
}
}
We just need to tweak the onEnter and onLeave functions. For example:
/*
* Auto-generated by Frida. Please modify to match the signature of malloc.
* This stub is currently auto-generated from manpages when available.
*
* For full API reference, see: http://www.frida.re/docs/javascript-api/
*/
{
/**
* Called synchronously when about to call malloc.
*
* @this {object} - Object allowing you to store state for use in onLeave.
* @param {function} log - Call this function with a string to be presented to the user.
* @param {array} args - Function arguments represented as an array of NativePointer objects.
* For example use Memory.readUtf8String(args[0]) if the first argument is a pointer to a C string encoded as UTF-8.
* It is also possible to modify arguments by assigning a NativePointer object to an element of this array.
* @param {object} state - Object allowing you to keep state across function calls.
* Only one JavaScript function will execute at a time, so do not worry about race-conditions.
* However, do not use this to store function arguments across onEnter/onLeave, but instead
* use "this" which is an object for keeping state local to an invocation.
*/
onEnter: function (log, args, state) {
log("malloc(" + args[0].toInt32() + ")");
},
/**
* Called synchronously when about to return from malloc.
*
* See onEnter for details.
*
* @this {object} - Object allowing you to access state stored in onEnter.
* @param {function} log - Call this function with a string to be presented to the user.
* @param {NativePointer} retval - Return value represented as a NativePointer object.
* @param {object} state - Object allowing you to keep state across function calls.
*/
onLeave: function (log, retval, state) {
log("\t\t= 0x" + retval.toString(16));
}
}
Now, with the free handlers tweaked in the same way to log their argument, if we run the exact same command again we’ll get the following.
C:\Users\bob\Desktop\frida>frida-trace -i malloc -i free .\ExercisePin.exe
Instrumenting functions...
malloc: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\malloc.js"
malloc: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\malloc.js"
free: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\free.js"
free: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\free.js"
Started tracing 4 functions. Press Ctrl+C to stop.
done
/* TID 0x23e4 */
64 ms free(0x0)
64 ms malloc(2)
64 ms = 0x8a42a8
64 ms malloc(128)
64 ms = 0x8a57a0
64 ms malloc(128)
64 ms = 0x8a5828
64 ms malloc(128)
64 ms = 0x8a58b0
64 ms free(0x8a57a0)
64 ms free(0x8a58b0)
65 ms malloc(2)
65 ms = 0x8a42b8
Process terminated
We can extend this program to get the exact same functionality as our Pintool, and check for memory corruption bugs instead of only logging the calls. I’ll leave that as an exercise for you.
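One way to approach that exercise without touching the handlers any further is to post-process the frida-trace log. The sketch below is plain C++ and only assumes the line shapes visible in the output above ("malloc(N)", "= 0xADDR", "free(0xADDR)"); the names TraceReport and analyzeTrace are hypothetical and are not part of Frida. It assumes well-formed lines and a single thread, so treat it as a starting point rather than a finished checker.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Summary of suspicious events found while replaying a trace.
struct TraceReport {
    int doubleFrees = 0;                  // free() of an already freed block
    int invalidFrees = 0;                 // free() of an address never returned by malloc()
    std::vector<std::uint64_t> neverFreed; // blocks still live when the trace ends
};

// Parse the hexadecimal digits that start at position pos (just after "0x").
// Assumes at least one hex digit is present; std::stoull throws otherwise.
static std::uint64_t parseHex(const std::string& s, std::size_t pos) {
    return std::stoull(s.substr(pos), nullptr, 16);
}

TraceReport analyzeTrace(const std::vector<std::string>& lines) {
    std::map<std::uint64_t, bool> freed;  // address -> true once it has been freed
    TraceReport report;
    for (const std::string& line : lines) {
        std::size_t pos;
        if ((pos = line.find("free(0x")) != std::string::npos) {
            std::uint64_t addr = parseHex(line, pos + 7);
            if (addr == 0) continue;      // free(NULL) is a no-op
            auto it = freed.find(addr);
            if (it == freed.end()) ++report.invalidFrees;
            else if (it->second) ++report.doubleFrees;
            else it->second = true;
        } else if ((pos = line.find("= 0x")) != std::string::npos) {
            std::uint64_t addr = parseHex(line, pos + 4);
            if (addr != 0) freed[addr] = false;  // returned by malloc: live again
        }
    }
    for (const auto& entry : freed)
        if (!entry.second) report.neverFreed.push_back(entry.first);
    return report;
}
```

Fed the second trace shown above, this reports no double or invalid frees and three blocks that were never freed (0x8a42a8, 0x8a42b8 and 0x8a5828).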
Debugging
If you want to debug your Pintool you should use the -pause_tool switch and specify the number of seconds to wait until you attach the debugger to its process, as shown below.
C:\pin\source\tools\MallocTracer\Release>c:\pin\pin.exe -pause_tool 20 -t "C:\pin\source\tools\MallocTracer\Release\MallocTracer.dll" -- ExercisePin.exe
Pausing for 20 seconds to attach to process with pid 1568
To debug the Pintool I don’t actually use Visual Studio; I prefer WinDbg because I’m used to it and it is awesome. Once you attach to the process with WinDbg it’s very easy to set a breakpoint wherever you like in your Pintool. Below is a simple example of setting a breakpoint in the main function of my Pintool.
Microsoft (R) Windows Debugger Version 10.0.17134.12 X86
Copyright (c) Microsoft Corporation. All rights reserved.
*** wait with pending attach
Symbol search path is: srv*
Executable search path is:
ModLoad: 00080000 00087000 C:\pin\source\tools\MallocTracer\Release\ExercisePin.exe
ModLoad: 77800000 77980000 C:\Windows\SysWOW64\ntdll.dll
ModLoad: 769d0000 76ae0000 C:\Windows\syswow64\kernel32.dll
ModLoad: 76b50000 76b97000 C:\Windows\syswow64\KERNELBASE.dll
Break-in sent, waiting 30 seconds...
ModLoad: 54c20000 54f93000 MallocTracer.dll
It is now possible to set breakpoints in Pin tool.
Use "Go" command (F5) to proceed.
(620.12c0): Break instruction exception - code 80000003 (first chance)
eax=00000000 ebx=53833c8c ecx=76b6388e edx=00000000 esi=53833c8c edi=53833cb8
eip=76b6338d esp=01ad1930 ebp=0042e7e4 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
KERNELBASE!DebugBreak+0x2:
76b6338d cc int 3
0:000> lmf
start end module name
00080000 00087000 ExercisePin C:\pin\source\tools\MallocTracer\Release\ExercisePin.exe
54c20000 54f93000 MallocTracer MallocTracer.dll
769d0000 76ae0000 kernel32 C:\Windows\syswow64\kernel32.dll
76b50000 76b97000 KERNELBASE C:\Windows\syswow64\KERNELBASE.dll
77800000 77980000 ntdll C:\Windows\SysWOW64\ntdll.dll
0:000> lmDvmMallocTracer
Browse full module list
start end module name
54c20000 54f93000 MallocTracer (deferred)
Image path: MallocTracer.dll
Image name: MallocTracer.dll
Browse all global symbols functions data
Timestamp: Sat Jun 30 14:28:14 2018 (5B37F5EE)
CheckSum: 00000000
ImageSize: 00373000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
Information from resource tables:
0:000> x /D /f MallocTracer!a*
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
*** WARNING: Unable to verify checksum for MallocTracer.dll
54c549b8 MallocTracer!ASM_pin_wow64_gate (<no parameter info>)
54c5483c MallocTracer!ATOMIC_Increment16 (<no parameter info>)
54c547d0 MallocTracer!ATOMIC_Swap8 (<no parameter info>)
54c54854 MallocTracer!ATOMIC_Increment32 (<no parameter info>)
54e28b64 MallocTracer!ADDRINT_AtomicInc (<no parameter info>)
54c35e20 MallocTracer!atexit (<no parameter info>)
54c547fc MallocTracer!ATOMIC_Swap32 (<no parameter info>)
54c54740 MallocTracer!ATOMIC_SpinDelay (<no parameter info>)
54c533c0 MallocTracer!ATOMIC::LIFO_PTR<LEVEL_BASE::SWMALLOC::FREE_LIST_ELEMENT,3,LEVEL_BASE::ATOMIC_STATS>::PopInternal (<no parameter info>)
54e1a2b0 MallocTracer!abort (<no parameter info>)
54c54810 MallocTracer!ATOMIC_Copy64 (<no parameter info>)
54c547e4 MallocTracer!ATOMIC_Swap16 (<no parameter info>)
54c41710 MallocTracer!ATOMIC::LIFO_CTR<ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT,ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT_HEAP,1,32,unsigned __int64,ATOMIC::NULLSTATS>::Pop (<no parameter info>)
54c54824 MallocTracer!ATOMIC_Increment8 (<no parameter info>)
54c549bb MallocTracer!ASM_pin_wow64_gate_end (<no parameter info>)
54c5478c MallocTracer!ATOMIC_CompareAndSwap32 (<no parameter info>)
54c54750 MallocTracer!ATOMIC_CompareAndSwap8 (<no parameter info>)
54c41820 MallocTracer!ATOMIC::LIFO_CTR<ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT,ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT_HEAP,1,32,unsigned __int64,ATOMIC::NULLSTATS>::Push (<no parameter info>)
54c535a0 MallocTracer!ATOMIC::IDSET<7,LEVEL_BASE::ATOMIC_STATS>::ReleaseID (<no parameter info>)
54c547a8 MallocTracer!ATOMIC_CompareAndSwap64 (<no parameter info>)
54c3e660 MallocTracer!ATOMIC::EXPONENTIAL_BACKOFF<LEVEL_BASE::ATOMIC_STATS>::~EXPONENTIAL_BACKOFF<LEVEL_BASE::ATOMIC_STATS> (<no parameter info>)
54c5476c MallocTracer!ATOMIC_CompareAndSwap16 (<no parameter info>)
0:000> x /D /f MallocTracer!m*
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
54e21e20 MallocTracer!mbsinit (<no parameter info>)
54c6e450 MallocTracer!mmap (<no parameter info>)
54c3bb40 MallocTracer!malloc (<no parameter info>)
54e21db0 MallocTracer!memchr (<no parameter info>)
54e21e00 MallocTracer!mbrtowc (<no parameter info>)
54e26500 MallocTracer!mbrlen (<no parameter info>)
54e21e40 MallocTracer!mbsnrtowcs (<no parameter info>)
54e261b0 MallocTracer!mbrtoc32 (<no parameter info>)
54c38730 MallocTracer!main (<no parameter info>)
54e1a2f0 MallocTracer!memset (<no parameter info>)
54e26410 MallocTracer!mbstate_get_byte (<no parameter info>)
54e22010 MallocTracer!mbsrtowcs (<no parameter info>)
54e1a1a0 MallocTracer!memmove (<no parameter info>)
54e263e0 MallocTracer!mbstate_bytes_so_far (<no parameter info>)
54e1a2c0 MallocTracer!memcpy (<no parameter info>)
54c6e480 MallocTracer!munmap (<no parameter info>)
54e26420 MallocTracer!mbstate_set_byte (<no parameter info>)
0:000> bp 54c38730
0:000> g
Breakpoint 0 hit
eax=53833cb8 ebx=54f64000 ecx=00000000 edx=54f356c0 esi=54f6500a edi=54f65000
eip=54c38730 esp=01ad19f4 ebp=53833c8c iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
MallocTracer!main:
54c38730 55 push ebp
For DynamoRIO I’ll just point you to the official documentation, since the debugging process can be a bit trickier. Check the documentation here.
Pintool (WinMallocTracer)
As mentioned in the beginning, this post is all about Windows, which means it doesn’t really make sense to be tracking malloc and/or free. If we want to play with “real” Windows applications we need to trace the Windows Heap family of functions.
It’s a good time to look again at the diagram shown before that illustrates the relationship of Windows API calls used to allocate process memory (from the book The Art of Memory Forensics).
If we want to make sure we’ll always “see” the memory allocations performed by Windows applications, we should be looking for RtlAllocateHeap, RtlReAllocateHeap, RtlFreeHeap, VirtualAllocEx, and VirtualFreeEx.
The Pintool below looks at exactly these functions. If you play a bit with multiple applications you’ll realize that to accomplish “our” goal of tracking memory allocations we’ll face a lot of challenges. The code below tries to overcome some of them.
I won’t go into detail explaining the API calls used as I did before, mainly because they are mostly the same. I’ll leave the code here and you can go through it. Afterwards, I’ll simply mention some of the main differences compared to the basic Pintool presented before.
#include "pin.h"
#include <iostream>
#include <fstream>
#include <map>
map<ADDRINT, bool> MallocMap;
ofstream LogFile;
KNOB<string> LogFileName(KNOB_MODE_WRITEONCE, "pintool", "o", "memprofile.out", "Memory trace file name");
KNOB<string> EntryPoint(KNOB_MODE_WRITEONCE, "pintool", "entrypoint", "main", "Guest entry-point function");
KNOB<BOOL> EnumSymbols(KNOB_MODE_WRITEONCE, "pintool", "symbols", "0", "List Symbols");
BOOL start_trace = false;
VOID LogBeforeVirtualAlloc(ADDRINT size)
{
if (!start_trace)
return;
LogFile << "[*] VirtualAllocEx(" << dec << size << ")";
}
VOID LogAfterVirtualAlloc(ADDRINT addr)
{
if (!start_trace)
return;
if (addr == NULL)
{
cerr << "[-] Error: VirtualAllocEx() return value was NULL." << endl;
return;
}
map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
if (it != MallocMap.end())
{
if (it->second)
it->second = false;
else
cerr << "[-] Error: allocating memory not freed!?!" << endl;
}
else
{
MallocMap.insert(pair<ADDRINT, bool>(addr, false));
LogFile << "\t\t= 0x" << hex << addr << endl;
}
}
VOID LogBeforeVirtualFree(ADDRINT addr)
{
if (!start_trace)
return;
map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
if (it != MallocMap.end())
{
if (it->second)
LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once (Double Free)." << endl;
else
{
it->second = true; // Mark it as freed
LogFile << "[*] VirtualFreeEx(0x" << hex << addr << ")" << endl;
}
}
else
LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl;
}
VOID LogBeforeReAlloc(ADDRINT freed_addr, ADDRINT size)
{
if (!start_trace)
return;
// mark freed_addr as free
map<ADDRINT, bool>::iterator it = MallocMap.find(freed_addr);
if (it != MallocMap.end())
{
it->second = true;
LogFile << "[*] RtlHeapfree(0x" << hex << freed_addr << ") from RtlHeapRealloc()" << endl;
}
else
LogFile << "[-] RtlHeapRealloc could not find addr to free??? - 0x" << hex << freed_addr << endl;
LogFile << "[*] RtlHeapReAlloc(" << dec << size << ")";
}
VOID LogAfterReAlloc(ADDRINT addr)
{
if (!start_trace)
return;
if (addr == NULL)
return;
map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
if (it != MallocMap.end())
// the block stayed in place (or reuses a tracked address): mark it as live again
it->second = false;
else
// the block moved to an address we have not seen yet: start tracking it
MallocMap.insert(pair<ADDRINT, bool>(addr, false));
LogFile << "\t\t= 0x" << hex << addr << endl;
}
VOID LogBeforeMalloc(ADDRINT size)
{
if (!start_trace)
return;
LogFile << "[*] RtlAllocateHeap(" << dec << size << ")";
}
VOID LogAfterMalloc(ADDRINT addr)
{
if (!start_trace)
return;
if (addr == NULL)
{
cerr << "[-] Error: RtlAllocateHeap() return value was NULL." << endl;
return;
}
map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
if (it != MallocMap.end())
{
if (it->second)
it->second = false;
else
cerr << "[-] Error: allocating memory not freed!?!" << endl;
}
else
{
MallocMap.insert(pair<ADDRINT, bool>(addr, false));
LogFile << "\t\t= 0x" << hex << addr << endl;
}
}
VOID LogFree(ADDRINT addr)
{
if (!start_trace)
return;
map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
if (it != MallocMap.end())
{
if (it->second)
LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once (Double Free)." << endl;
else
{
it->second = true; // Mark it as freed
LogFile << "[*] RtlFreeHeap(0x" << hex << addr << ")" << endl;
}
}
else
LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl;
}
VOID BeforeMain() {
start_trace = true;
}
VOID AfterMain() {
start_trace = false;
}
VOID CustomInstrumentation(IMG img, VOID *v)
{
for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
{
string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);
if(EnumSymbols.Value())
{
LogFile << "" << undFuncName << "" << endl;
continue;
}
if (undFuncName == EntryPoint.Value().c_str())
{
RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(allocRtn))
{
RTN_Open(allocRtn);
RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)BeforeMain, IARG_END);
RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)AfterMain, IARG_END);
RTN_Close(allocRtn);
}
}
if (undFuncName == "RtlAllocateHeap")
{
RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(allocRtn))
{
RTN_Open(allocRtn);
// Record RtlAllocateHeap size
RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
IARG_FUNCARG_ENTRYPOINT_VALUE, 2, IARG_END);
// Record RtlAllocateHeap return address
RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
RTN_Close(allocRtn);
}
}
if (undFuncName == "RtlReAllocateHeap")
{
RTN reallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(reallocRtn))
{
RTN_Open(reallocRtn);
// Record RtlReAllocateHeap freed_addr, size
RTN_InsertCall(reallocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeReAlloc,
IARG_FUNCARG_ENTRYPOINT_VALUE, 2, IARG_FUNCARG_ENTRYPOINT_VALUE, 3, IARG_END);
// Record RtlReAllocateHeap return address
RTN_InsertCall(reallocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterReAlloc,
IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
RTN_Close(reallocRtn);
}
}
else if (undFuncName == "RtlFreeHeap")
{
RTN freeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(freeRtn))
{
RTN_Open(freeRtn);
RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
IARG_FUNCARG_ENTRYPOINT_VALUE, 2,
IARG_END);
RTN_Close(freeRtn);
}
}
if (undFuncName == "VirtualAllocEx")
{
RTN vrallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(vrallocRtn))
{
RTN_Open(vrallocRtn);
RTN_InsertCall(vrallocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeVirtualAlloc,
IARG_FUNCARG_ENTRYPOINT_VALUE, 2, IARG_END);
RTN_InsertCall(vrallocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterVirtualAlloc,
IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
RTN_Close(vrallocRtn);
}
}
if (undFuncName == "VirtualFreeEx")
{
RTN vrfreeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(vrfreeRtn))
{
RTN_Open(vrfreeRtn);
RTN_InsertCall(vrfreeRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeVirtualFree,
IARG_FUNCARG_ENTRYPOINT_VALUE, 1, IARG_END);
RTN_Close(vrfreeRtn);
}
}
}
}
VOID FinalFunc(INT32 code, VOID *v)
{
for (pair<ADDRINT, bool> p : MallocMap)
{
if (!p.second)
LogFile << "[*] Memory at address 0x" << hex << p.first << " allocated but not freed" << endl;
}
LogFile.close();
}
int main(int argc, char *argv[])
{
PIN_InitSymbols();
PIN_Init(argc, argv);
LogFile.open(LogFileName.Value().c_str());
LogFile << "## Memory tracing for PID = " << PIN_GetPid() << " started" << endl;
if (EnumSymbols.Value())
LogFile << "### Listing Symbols" << endl;
else
LogFile << "### Started tracing after '" << EntryPoint.Value().c_str() << "()' call" << endl;
IMG_AddInstrumentFunction(CustomInstrumentation, NULL);
PIN_AddFiniFunction(FinalFunc, NULL);
PIN_StartProgram();
return 0;
}
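As a reading aid for the listing above, the argument positions it passes to IARG_FUNCARG_ENTRYPOINT_VALUE can be collected in one table. The sketch below is plain C++ with no Pin dependency; the HookSpec type and the findHook helper are hypothetical names introduced for illustration. The indexes are the zero-based positions copied from the RTN_InsertCall lines, and -1 simply means the Pintool does not read that kind of argument for the function.

```cpp
#include <cstring>

// Which arguments the Pintool reads for each hooked function.
// Indexes are zero-based, as expected by IARG_FUNCARG_ENTRYPOINT_VALUE.
struct HookSpec {
    const char* name;
    int sizeArg;  // index of the size argument read by the tool, -1 if not read
    int addrArg;  // index of the address argument read by the tool, -1 if not read
};

static const HookSpec kHooks[] = {
    {"RtlAllocateHeap",    2, -1},  // (HeapHandle, Flags, Size)
    {"RtlReAllocateHeap",  3,  2},  // (HeapHandle, Flags, Ptr, Size)
    {"RtlFreeHeap",       -1,  2},  // (HeapHandle, Flags, Ptr)
    {"VirtualAllocEx",     2, -1},  // (hProcess, lpAddress, dwSize, ...)
    {"VirtualFreeEx",     -1,  1},  // (hProcess, lpAddress, ...)
};

// Return the table entry for an undecorated symbol name, or nullptr if the
// function is not one of the hooked ones.
const HookSpec* findHook(const char* name) {
    for (const HookSpec& hook : kHooks)
        if (std::strcmp(hook.name, name) == 0) return &hook;
    return nullptr;
}
```

A table like this could also replace the chain of string comparisons in CustomInstrumentation, although the original tool does not do that.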
This Pintool supports a couple of new options, defined by the two KNOB switches below.
KNOB<string> EntryPoint(KNOB_MODE_WRITEONCE, "pintool", "entrypoint", "main", "Guest entry-point function");
KNOB<BOOL> EnumSymbols(KNOB_MODE_WRITEONCE, "pintool", "symbols", "0", "List Symbols");
You can specify the entry-point function of the target/guest application you want to trace. Why is this useful? If you don’t, all the initialization code will also be traced and it becomes very hard to make sense of the Pintool’s output. Try it. By default, tracing starts only after the function main is called. Obviously, if our target/guest application doesn’t have a main function, we’ll end up with an empty output file.
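The gating described here boils down to a single flag that the entry-point hooks flip, mirroring start_trace, BeforeMain and AfterMain in the Pintool. The following is a minimal model in plain C++ with no Pin dependency; GatedLog and its member names are hypothetical and exist only to show why events outside the entry-point function never reach the log.

```cpp
#include <string>
#include <vector>

// Models the start_trace flag: events are kept only while the
// entry-point function is executing.
struct GatedLog {
    bool tracing = false;            // plays the role of start_trace
    std::vector<std::string> lines;  // stands in for the output file

    void enterEntryPoint() { tracing = true; }   // like BeforeMain()
    void leaveEntryPoint() { tracing = false; }  // like AfterMain()

    // Every analysis routine starts with the same check.
    void record(const std::string& event) {
        if (!tracing) return;
        lines.push_back(event);
    }
};
```

If enterEntryPoint is never called, which is what happens when the configured symbol does not exist in the target, every call to record is dropped and the log stays empty.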
Let’s look at a specific example: the Windows calc.exe. This binary doesn’t have a main function. So we run our Pintool as shown below.
C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -- calc.exe
We’ll get the following output.
## Memory tracing for PID = 1732 started
### Started tracing after 'main()' call
This is expected, since calc.exe doesn’t have a main function. So, if we want to trace calc.exe or any other binary, we’ll need to find its entry-point (or any other call after which we want to start our trace). We can open it in IDA, for example, or we can use the other KNOB switch (-symbols) as shown below to list all the symbols.
C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -symbols 1 -- calc.exe
And look at the output file (by default memprofile.out) to see if we can find the function we are looking for.
C:\pin> type memprofile.out
## Memory tracing for PID = 5696 started
### Listing Symbols
unnamedImageEntryPoint
InterlockedIncrement
InterlockedDecrement
InterlockedExchange
InterlockedCompareExchange
InterlockedExchangeAdd
KernelBaseGetGlobalData
unnamedImageEntryPoint
GetErrorMode
SetErrorMode
CreateIoCompletionPort
PostQueuedCompletionStatus
GetOverlappedResult
(...)
If you want to see the whole contents of the file you can find it here. The first line is quite interesting though, and it’s probably what we are looking for (unnamedImageEntryPoint). So we can use our Pintool as shown below.
C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -entrypoint unnamedImageEntryPoint -- calc.exe
And if we look at the output this time we’ll get something like:
C:\pin> type memprofile.out
## Memory tracing for PID = 6656 started
### Started tracing after 'unnamedImageEntryPoint()' call
[*] RtlAllocateHeap(32) = 0x4d9098
[*] RtlAllocateHeap(564) = 0x2050590
[*] RtlAllocateHeap(520) = 0x4dcb18
[*] RtlAllocateHeap(1024) = 0x4dd240
[*] RtlAllocateHeap(532) = 0x20507d0
[*] RtlAllocateHeap(1152) = 0x20509f0
[*] RtlAllocateHeap(3608) = 0x4dd648
[*] RtlAllocateHeap(1804) = 0x2050e78
[*] RtlFreeHeap(0x4dd648)
(...)
If you want to see the whole contents of the file you can find it here. As you’ll see, it’s still hard to read and make sense of the output. As I mentioned before, this Pintool can tell there’s a problem, but not where it is. I’ll try to improve the Pintool, and if you are interested you can follow its future developments here. At the very least, every time I detect an issue I’ll add a PIN_ApplicationBreakpoint (see here). In some cases it might still be very hard to locate the issue, but it’s a starting point. There are also a lot of false positives, as you can see in the output of calc.exe. To validate that the Pintool is actually working we can use the following sample target/guest (I called it ExercisePin2.exe).
#include <windows.h>
#include <stdio.h>
#define PAGELIMIT 80
int my_heap_functions(char *buf) {
HLOCAL h1 = 0, h2 = 0, h3 = 0, h4 = 0;
h1 = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 260);
h2 = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 260);
HeapFree(GetProcessHeap(), 0, h1);
h3 = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 520);
h4 = HeapReAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, h3, 1040);
HeapFree(GetProcessHeap(), 0, h4);
return 0;
}
int my_virtual_functions(char *buf) {
LPVOID lpvBase;
DWORD dwPageSize;
BOOL bSuccess;
SYSTEM_INFO sSysInfo; // Useful information about the system
GetSystemInfo(&sSysInfo); // Initialize the structure.
dwPageSize = sSysInfo.dwPageSize;
// Reserve pages in the virtual address space of the process.
lpvBase = VirtualAlloc(
NULL, // System selects address
PAGELIMIT*dwPageSize, // Size of allocation
MEM_RESERVE, // Allocate reserved pages
PAGE_NOACCESS); // Protection = no access
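if (lpvBase == NULL) {
// exit() takes an int status, not a message: print the error and bail out
fprintf(stderr, "VirtualAlloc reserve failed.\n");
return 1;
}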
bSuccess = VirtualFree(
lpvBase, // Base address of block
0, // Must be 0 when using MEM_RELEASE
MEM_RELEASE); // Release the whole region
return 0;
}
int main(void) {
my_heap_functions("moo");
my_virtual_functions("moo");
return 0;
}
You can find the Visual Studio project here. You can play with it and compare the output with what’s expected based on the ExercisePin2.c source code.
C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -symbols 1 -- C:\TARGET\ExercisePin2.exe
C:\pin> type memprofile.out
## Memory tracing for PID = 5600 started
### Listing Symbols
_enc$textbss$end
unnamedImageEntryPoint
main
my_heap_functions
my_virtual_functions
HeapAlloc
HeapReAlloc
HeapFree
GetProcessHeap
GetSystemInfo
(...)
The full output is here. Since the entry-point function is main, we can simply run the Pintool without passing anything to it.
C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -- C:\TARGET\ExercisePin2.exe
C:\pin> type memprofile.out
## Memory tracing for PID = 4396 started
### Started tracing after 'main()' call
[*] RtlAllocateHeap(260) = 0x41dd30
[*] RtlAllocateHeap(260) = 0x41de40
[*] RtlFreeHeap(0x41dd30)
[*] RtlAllocateHeap(520) = 0x41df50
[*] RtlHeapfree(0x41df50) from RtlHeapRealloc()
[*] RtlHeapReAlloc(1040) = 0x41df50
[*] RtlFreeHeap(0x41df50)
[*] VirtualAllocEx(327680) = 0x2410000
[*] VirtualFreeEx(0x2410000)
[*] Memory at address 0x41de40 allocated but not freed
As we can see, tracing memory calls is tricky, but achievable. I’ll try to add a few more things to this WinMallocTracer Pintool in the near future. Keep an eye on GitLab if you fancy.
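To check the ExercisePin2 trace above by hand, it can help to replay it against a small model of the bookkeeping the Pintool does with MallocMap. The sketch below is plain C++ with no Pin dependency; AllocTracker, FreeResult and the method names are hypothetical. One deliberate assumption differs from the listing: a successful realloc is modelled as "free the old address, then allocate the new one", so a block that moves to a new address keeps being tracked.

```cpp
#include <cstdint>
#include <map>
#include <vector>

enum class FreeResult { Ok, DoubleFree, NotAllocated };

// Tracks, per address, whether the block is currently freed (true) or live (false).
class AllocTracker {
public:
    // A successful allocation makes the address live; NULL results are ignored.
    void onAlloc(std::uint64_t addr) {
        if (addr != 0) freed_[addr] = false;
    }

    FreeResult onFree(std::uint64_t addr) {
        auto it = freed_.find(addr);
        if (it == freed_.end()) return FreeResult::NotAllocated;
        if (it->second) return FreeResult::DoubleFree;
        it->second = true;
        return FreeResult::Ok;
    }

    // Assumption: realloc == free(old) followed by alloc(new).
    // A failed realloc (NULL result) leaves the old block live.
    FreeResult onRealloc(std::uint64_t oldAddr, std::uint64_t newAddr) {
        if (newAddr == 0) return FreeResult::Ok;
        FreeResult result = onFree(oldAddr);
        onAlloc(newAddr);
        return result;
    }

    // Addresses that were allocated and never freed, in ascending order.
    std::vector<std::uint64_t> liveBlocks() const {
        std::vector<std::uint64_t> out;
        for (const auto& entry : freed_)
            if (!entry.second) out.push_back(entry.first);
        return out;
    }

private:
    std::map<std::uint64_t, bool> freed_;
};
```

Replaying the events from the output above (two allocations of 260 bytes, a free of 0x41dd30, the 520-byte allocation that is reallocated in place at 0x41df50 and then freed, and the VirtualAllocEx/VirtualFreeEx pair) leaves exactly one live block, 0x41de40, which matches the last line of the log.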
Final notes
Playing with a DBI framework is not that hard, as we saw; the challenge lies in doing it right, that is, handling all the corner cases efficiently. Something that looks fairly easy can become very challenging if we are going to do it right. The example tool I chose came from a specific need, and from a vulnerability discovery perspective DBI frameworks are indeed very useful. There’s a lot of room for improvement, and I plan to keep working on it.
Even though it was the Fuzzing subject that brought me here (that is, playing with DBI frameworks), I ended up not talking much about how the two relate. Keep in mind that a DBI tool per se won’t find many bugs unless you exercise as many code paths as possible. After all, a DBI system only modifies the code that’s executed. So, it’s easy to understand that we need to combine it with a coverage-guided Fuzzer to discover more bugs (preferably, exploitable ones).
DBI systems are here to stay. They emerged as a means of bypassing the restrictions imposed by binary code, that is, the lack of access to source code. The need to understand and modify the runtime behavior of computer programs is undeniable.
The field of dynamic binary modification is evolving very fast. New applications and new complex engineering challenges are appearing constantly, and static binary patching and hooking are “things” from the past.
This post documents the first steps if you want to get into this area. All the code snippets used are available at this GitLab repo. An improved version of the WinMallocTracer Pintool is available at this GitLab repo.