Binary Translation (PIN & DynamoRIO)

Original: http://blog.deniable.org/posts/binary-instrumentation/

Dynamic Binary Instrumentation Primer

July 25, 2018 64-minute read

Dynamic Binary Instrumentation (DBI) is a method of analyzing the behavior of a binary application at runtime through the injection of instrumentation code - Uninformed 2007

Introduction

The purpose of this post is to document my dive into the “world” of Dynamic Binary Instrumentation. I’ll cover some of the best-known and most widely used DBI frameworks: Pin, DynamoRIO, and Frida. Of these three, I’ll mainly focus on Pin. There are other DBI frameworks that I won’t touch at all, like Valgrind, Triton (uses Pin), QBDI, BAP, Dyninst, and many others. You might want to have a look at them. Some are more mature than others, and some have more features than others, so you’ll have to do some research yourself and see which ones fit your needs. Even though Valgrind is one of the most widely known and used DBI frameworks, it isn’t available for Windows, so I won’t touch it at all.

In my vulnerability hunting adventures I’ve been focused on Windows. In fact, if you want to take the code I’ll present here and build it on Linux, it should be pretty straightforward, while the opposite wouldn’t be true. The reason is that building Pin or DynamoRIO projects on Windows can be a bit frustrating, especially if you aren’t motivated to do so.

I’m not an expert in this area (DBI); however, since the beginning of the year I’ve been doing some experiments around fuzzing, and I’ve read a lot about the subject. Hence, I’ll try to document some of what I learned for future reference. Possibly you’ll also find it useful. Note that my goal was to write a reference and not a tutorial.

The funny part is that I actually thought about doing “something” with Pin, or DynamoRIO, while trying to do some browser Heap Spraying. Basically, I wanted to monitor the memory allocations my code was producing. While I could do it inside a debugger, I thought, “why not use a DBI framework? Maybe I can learn something”. After all, debuggers are slow. To this day, I’m still unsure whether I prefer WinDbg or Pin for this.

Instrumentation

According to Wikipedia, instrumentation refers to an ability to monitor or measure the level of a product’s performance, to diagnose errors and to write trace information. Programmers implement instrumentation in the form of code instructions that monitor specific components in a system (…). When an application contains instrumentation code, it can be managed using a management tool. Instrumentation is necessary to review the performance of the application. Instrumentation approaches can be of two types: Source instrumentation and binary instrumentation.

As stated above, there are two types of instrumentation: source instrumentation, which isn’t possible if you don’t have the source code of the software application, and binary instrumentation, which can be used with any software application we can execute. It turns out that most of the programs you run on a Windows operating system are closed source, which means that in this post I’ll be “talking” only about binary instrumentation, often called Dynamic Binary Instrumentation or Dynamic Binary Modification. Because the full names take too long to write, people usually use the acronym DBI, as I already did above.

In one sentence, Dynamic Binary Instrumentation is a technique that involves injecting instrumentation code into a running process. The instrumentation code is entirely transparent to the application it’s been injected into.

With a DBI framework, we can analyze the target binary’s execution step by step. However, note that the analysis only applies to code that actually executes.

Dynamic Program Analysis

There are two types of program analysis: static and dynamic. We perform static analysis without running a computer program, while we perform dynamic analysis by running it.

Citing Wikipedia again, Dynamic program analysis is the analysis of computer software that is performed by executing programs on a real or virtual processor. For dynamic program analysis to be effective, the target program must be executed with sufficient test inputs to produce interesting behavior. Use of software testing measures such as code coverage helps ensure that an adequate slice of the program’s set of possible behaviors has been observed.

Dynamic binary modification tools, like the frameworks mentioned earlier, introduce a layer between a running program and the underlying operating system, providing a unique opportunity to inspect and modify user-level program instructions while a program executes.

These systems are very complex internally. However, all the complexity is hidden behind an API that allows any user to quickly build a multitude of tools to aid software analysis. And that’s what I’ll try to show in this post, by sharing some code I wrote while playing with some DBI frameworks.

There are many reasons to observe and modify the runtime behavior of a computer program, and software and hardware developers, system engineers, bug hunters, malware analysts, end users, and so on will each have their own. DBI frameworks provide access to every executed user-level instruction. Apart from a potentially small runtime and memory overhead, the program runs identically to a native execution.

You could say that the main advantage of static analysis is that it ensures 100% code coverage. With dynamic analysis, to achieve high code coverage we’ll need to run the program many times, with different inputs, so the analysis takes different code paths. However, in some cases software applications are so big that static analysis is too costly. I would say one complements the other, even though static analysis is very boring and dynamic analysis is (very) fun.

As I mentioned before, DBI frameworks operate directly on binaries/executables. We don’t need the source code of the program, and we don’t need to (re)compile or (re)link it. Obviously, this is a major advantage, as it allows us to analyze proprietary software.

A dynamic binary system runs at the same time as the “guest” program, performing all the requested/required modifications on the fly. This dynamic approach can also handle programs that generate code dynamically, that is, self-modifying code (even though this poses a big engineering challenge). If you “google” a bit you’ll find multiple cases where DBI frameworks are/were used to analyze malware with self-modifying code. As an example, check this presentation from last year’s Black Hat Europe, or this post about how to unpack Skype with Pin.

DBI frameworks are used daily to solve computer architecture problems and are heavily used in software engineering, program analysis, and computer security. Software engineers want to deeply understand the software they develop and to analyze its performance and runtime behavior systematically. One common use of DBI frameworks is emulating new CPU instructions. Since the dynamic binary system has access to every instruction before executing it, hardware engineers can use these systems to test new instructions that the hardware doesn’t support yet: instead of executing a specific instruction, they can emulate the new instruction’s behavior. The same approach can be used to replace faulty instructions with a correct emulation of the desired behavior. From a computer security perspective, a DBI system can be used for flow analysis, taint analysis, fuzzing, code coverage, test case generation, reverse engineering, debugging, vulnerability detection, and even crazy things like patching vulnerabilities and automated exploit development.

There are two main ways of using a dynamic binary system. The first, and probably the most common (in computer security, at least), is executing a program from start to finish under the control of the dynamic binary system. We use it when we want full simulation/emulation, because full control and code coverage are desired. In the second, we just attach to an already running program (exactly the same way a debugger can be attached to, or detached from, a running program). This option might be useful if we are interested in figuring out what a program is doing at a specific moment.

In addition, most DBI frameworks have three modes of execution: interpretation mode, probe mode, and JIT mode. JIT (just-in-time) mode is the most common implementation, and the most commonly used mode even when the DBI system supports more than one. In JIT mode the original binary/executable is never actually modified or executed. The binary is treated as data, and a modified copy of it is generated in a new memory area (but only for the parts of the binary that execute, not the whole binary). It is this modified copy that is then executed. In interpretation mode, the binary is also treated as data, and each instruction is used as a key into a lookup table of alternative instructions that have the corresponding functionality (as implemented by the user). In probe mode, the binary is actually modified by overwriting instructions with new instructions. Even though this results in a low runtime overhead, it’s very limited on certain architectures (like x86, where variable-length instructions make it hard to overwrite one instruction with a jump without clobbering its neighbors).

Whatever the execution mode, once we control the execution of a program through a DBI framework, we can add instrumentation to the executing program. We can insert our code (the instrumentation) before and after blocks of code, or even replace them completely.

We can visualize how it works in the diagram below.

There are also different levels of granularity:

Instruction level
Basic block level
Function level

The granularity you choose will, as you can guess, give you more or less control over the execution of a program, and obviously this will have an impact on performance. Also, note that instrumenting a program in its entirety is impractical in most cases.

Performance

You might be wondering what the performance impact is of modifying a running program on the fly as described above. Well, my experience is too limited to answer this question. However, from the multiple papers, articles, and presentations I read, the overhead commonly observed really depends on many factors. Anyway, as you might expect, the modifications the user implements are responsible for most of the overhead. A figure of 30% seems to be commonly accepted as a typical average. I can’t remember where I read this, so I can’t cite the source, but I definitely read it somewhere; you’ll certainly find it in the References section anyway. Obviously, one of the first decisions that you, as a DBI user, will have to make is how much code coverage you need and how much performance overhead you can accept as reasonable.

Pin

Pin is a DBI framework developed by Intel Corp. It allows us to build program analysis tools known as Pintools, for Windows, Linux, and OSX. We can use these tools to monitor, modify, and record the behavior of a program while it is running.

Pin is proprietary software. However, we can download and use it free of charge for non-commercial use. Besides the documentation and the binaries, Pin also includes source code for a large collection of sample Pintools. These are invaluable examples that we must consider, and definitely read, before developing any Pintool.

In my opinion, Pin is the easiest DBI framework to use. At least I found it easier to dive into its API than into DynamoRIO’s. Even though I didn’t spend much time trying to learn other APIs besides these two, I had a look at a few others, like Valgrind, Triton, Dyninst, and Frida. Honestly, the choice will always depend on what you intend to do.

If you want to create a commercial tool and distribute binary versions of it, Pin won’t be a good choice. Otherwise, Pin might be a very good choice, mainly because, based on the tests I did, Pin is stable and reliable. I had some issues running some programs under some DBI frameworks, mainly big programs like Office suites, games, and AV engines. Some DBI frameworks failed miserably, some even with small applications.

Pin setup (Windows)

Setting up Pin on Linux is quite straightforward. On Windows, however, it can be a bit tricky. See below how to quickly set it up in case you want to try the samples I’ll present in this post.

Get the latest Pin version from here, and unpack it on your C:\ drive, or wherever you want. For simplicity, I usually use C:\pin. I advise you to do the same if you plan to follow some of the experiments presented in this post.

The Pin zip file includes a big collection of sample Pintools under source/tools. The API is very easy to read and understand as we’ll see. By the end of this post you should be able to read the source code of most of the samples without any struggle (well, kind of).

I like Visual Studio, and I’ll be using it to build “every” tool mentioned in this post. There’s one Pintool sample that’s almost ready to be built with Visual Studio; you only have to adjust a couple of settings. However, I didn’t want to manually copy and rename files every time I created a new Pintool project. So I created an already-tweaked sample project, available here, that you can place under C:\pin\source\tools, together with the following Python script. The script was inspired by Peter’s script. However, since newer versions of Visual Studio save project settings differently, I had to write a completely new script.

So, every time you want to build a new Pintool with Visual Studio, just do:

cd\
cd pin
python create_pintool_project.py -p <name_of_your_project>

You can then just click the project’s solution file and build your Pintool with Visual Studio without any pain. I used Visual Studio Professional 2015, but it will also work with Visual Studio 2017. I did a couple of builds with Visual Studio 2017 Enterprise without any issue.

Pin Visual Studio integration

We can add our Pintools as external tools to Visual Studio. This will allow us to run, and test, our Pintool without using the command line all the time. The configuration is very simple. From the Tools menu, select External tools and a dialog box will appear. Click the Add button and fill out the text input boxes according to the image below.

In the Title text box, enter whatever you want. In the Command text box, enter the full path to your pin.exe, that is, c:\pin\pin.exe if you installed it under c:\pin. In the Arguments text box, you must include all the arguments you want to pass to Pin and your Pintool. You’ll need at least the ones specified in the image above. The -t option specifies where your Pintool is, and whatever comes after -- is the target program you want to instrument.

After the setup, you can simply run your Pintool from the Tools menu as shown in the image below.

Click ok, and enjoy.

Visual Studio’s Output window will show whatever your Pintool writes to stdout.

DynamoRIO

DynamoRIO is another DBI framework, originally developed in a collaboration between HP (building on its Dynamo optimization system) and the Runtime Introspection and Optimization (RIO) research group at MIT. It allows us to build program analysis tools known as clients, for Windows and Linux. We can use these tools to monitor, modify, and record the behavior of a program while it is running.

DynamoRIO was first released as a proprietary binary toolkit in 2002 and was later open-sourced with a BSD license in 2009. Like Pin, it also comes with source code for multiple client samples. These are invaluable examples to get us started and playing with its API.

DynamoRIO is a runtime code manipulation system that allows code transformations on any part of the program as the program runs. It works as an intermediate platform between applications and the operating system.

As I said before, I didn’t find DynamoRIO’s API the friendliest or easiest to use. However, if you plan to make a commercial version and/or distribute binary versions, DynamoRIO might be the best option. One of its advantages is that it’s BSD licensed, which means it’s free software. If that’s important to you, go with DynamoRIO.

Also note that it’s commonly accepted that DynamoRIO is faster than Pin (check the References section). However, it’s equally accepted that Pin is more reliable than DynamoRIO, which I also experienced personally when running big software programs.

DynamoRIO setup (Windows)

To install DynamoRIO on Windows, simply download the latest Windows version from here (DynamoRIO-Windows-7.0.0-RC1.zip at the time of this writing) and, similarly to what we did with Pin, unzip it under C:\dynamorio.

Building your own DynamoRIO projects on Windows can be a bit tricky, though. You can try to follow the instructions here or the instructions here or, to avoid frustration, just… use my DynamoRIO Visual Studio template project.

As I said before, I like Visual Studio. I created an already-tweaked sample project with all the required includes and libs (assuming you unzipped DynamoRIO into the directory I mentioned before), available here. Then, more or less as we did with Pin, also download the following Python script. Since the project’s file structure is a bit different, I couldn’t use the script I wrote before to clone a project, and I had to create a new one specific to DynamoRIO.

So, every time you want to build a new DynamoRIO client with Visual Studio, just do:

python create_dynamorio_project.py -p <name_of_your_project>

The command above assumes that both the Python script and the template project mentioned above are in the same folder.

You can then just click the project’s solution file and build your DynamoRIO client with Visual Studio without any pain. I used Visual Studio Professional 2015, but it will also work with Visual Studio 2017. I did a couple of builds with Visual Studio 2017 Enterprise without any issue.

DynamoRIO Visual Studio integration

We can also integrate DynamoRIO with Visual Studio exactly the same way we did with Pin. Since the setup process is the same, I’ll only leave the screenshot below, and you can figure out the rest.

Frida

Frida is a DBI framework developed mainly by Ole. It became very popular in the “mobile” community and gained a considerable group of contributors (it’s now sponsored by NowSecure). Frida supports OSX, Windows, Linux, and QNX, and has an API available for multiple languages, like Python, C#, Swift, Qt/QML, and C. Just like the DBI frameworks mentioned above, we can use Frida together with scripts to monitor, modify, and record the behavior of a program while it is running.

Frida is free (free as in free beer) and very easy to install (see below). There are also many usage examples online that we can use to get started. Frida injects Google’s V8 engine into a process. Frida’s core then communicates with Frida’s agent (on the process side), which uses the V8 engine to run the JavaScript code (creating dynamic hooks).

Frida’s API has two main parts: the JavaScript API and the bindings API. I didn’t dive too deep into them and just used what I believe is the most popular one, the JavaScript API. I found it easy to use and very flexible, and I could use it to quickly write some introspection tools.

Even though Pin and DynamoRIO are the “main” and most mature DBI frameworks, Frida has some advantages. As mentioned above, it has bindings for more languages, and rapid tool development is a reality. It also has some disadvantages: it’s less mature, less documented, and offers less granularity than other frameworks, and consequently lacks some functionality.

Frida setup (Windows)

Frida's setup is very easy. Just download https://bootstrap.pypa.io/get-pip.py and then run:

python get-pip.py

And, to actually install Frida type the following.

cd\
cd Python27\Scripts
pip.exe install frida

And that’s it, you are ready to go. Yes, you have to install Python before the steps above. However, I don’t know anyone that doesn’t have Python installed so I just assume it’s already there.

Generic DBI usage

Before diving into some code, I’ll try to document in this section generic ways of using some of the DBI frameworks I mentioned before; more precisely, Pin and DynamoRIO.

As mentioned before, the most common execution mode in a DBI system is JIT (just-in-time) mode. The JIT compiler creates a modified copy of chunks of instructions just before executing them, and these copies are cached in memory. This execution mode is the default in most of the DBI frameworks I looked at, and it’s also generally accepted as the most robust execution model.

Also, as mentioned before, there are two main methods to control the execution of a program. The first is to run the entire program under the control of the DBI framework. The second is to attach to an already running program, just like a debugger.

Below is the standard way to run a program under the control of a DBI system. Our target/guest application isn’t launched directly from the command line. Instead, it’s passed as an argument to the DBI system, which initializes itself, launches the program under its control, and modifies it according to the plug-in. The plug-in contains the actual user-defined code, that is, our instrumentation code. In Pin the plug-in is called a Pintool, in DynamoRIO a client, and in Frida, I believe, simply a script.

PIN JIT mode.

pin.exe <pin args> -t <pintool>.dll <pintool args> -- target_program.exe <target_program args>

PIN Probe mode.

pin.exe -probe <pin args> -t <pintool>.dll <pintool args> -- target_program.exe <target_program args>

DynamoRIO JIT mode.

drrun.exe -client <dynamorio client>.dll 0 "" target_program.exe <target_program args>

DynamoRIO Probe mode.

drrun.exe -mode probe -client <dynamorio client>.dll 0 "" target_program.exe <target_program args>

As we can see above, launching Pin and DynamoRIO isn’t that different. On Linux systems it’s pretty much the same (yes, drop the .exe and replace .dll with .so, and that’s it).

Obviously, there are many other options that can be passed on the command line besides the ones shown above. For a full list check the help/man pages. Above are just the required options for reference.

Frida is a bit different, and we’ll see ahead how to use it.

If you want to attach to a running process, you can do it with Pin. As of today, however, attaching to a process with DynamoRIO isn’t supported, although there are two methods of running a process under DynamoRIO on Windows. You can read more about it here.

With Pin you can simply attach to a process by using the -pid argument as shown below.

pin.exe -pid <target pid> <other pin args> -t <pintool>.dll <pintool args>

User defined modifications

Regardless of the DBI framework we use, each one provides an API that we can use to specify how to modify the target/guest program. The abstraction introduced by the API is used together with code usually written in C or C++ (or even JavaScript or Swift in the case of Frida) to create a plug-in (in the form of a shared library, as we saw above), which is then “injected” into the running target/guest program by the DBI system. It runs in the same address space as the target/guest program.

This means that to use a DBI system we need not only to know how to launch a target/guest program, as illustrated above, but also to be familiar with and understand the API exported by the framework we want to use.

Unfortunately, the APIs of these frameworks are very different. However, as we’ll see, the general concepts apply to most of them. As I mentioned before, I’ll be focusing mainly on Pin. I’ll also try to recreate more or less the same functionality with DynamoRIO and Frida, so we’ll also get somewhat familiar with their APIs. Note that the API coverage won’t be extensive by any means. I advise you to check each DBI framework’s API documentation if you want to know more. By following this post you’ll simply get a sense of what’s available in the API, possibly limited to the use case scenario I chose.

The idea behind any API is to hide the complexity of certain operations from the user without removing the power to perform any task (including complex ones). Generally speaking, the easier an API is to use, the better it is.

All the APIs allow us, in one way or another, to iterate over the instructions the DBI system is about to run. This allows us to add, remove, modify, or observe the instructions before executing them. For example, my initial idea was simply to log (observe) all the calls to memory related functions (malloc and free).

We can not only introduce instructions to get profiling/tracing information about a program, but also introduce complex changes, to the point of completely replacing certain instructions with a new implementation. Think, for example, of replacing all the malloc calls with your own malloc implementation (one that, for example, introduces shadow bytes and so on).

Most of Pin’s API routines are callback based, which makes the API very user-friendly, at least given the way I think when I visualize the usage of a DBI system. DynamoRIO is slightly different, although, as we’ll see, this style is obviously also possible there. Basically, we register a callback to be notified when certain events occur (for example, a call to malloc). For performance reasons, Pin inlines the analysis code these callbacks insert when it’s simple enough.

As we saw, most DBI frameworks support multiple operating systems and platforms. Most of the time the APIs are the same, and all the differences between operating systems are kept away from the user and handled “under the hood”. However, some APIs are still specific to certain operating systems, so you need to be aware of that.

It’s also important to distinguish between instrumentation and analysis code. Instrumentation code is applied to specific code locations, while analysis code is applied to events that occur at some point in the execution of the program. As stated on Wikipedia, “Instrumentation routines are called when code that has not yet been recompiled is about to be run, and enable the insertion of analysis routines. Analysis routines are called when the code associated with them is run.” In other words, instrumentation routines define where to insert instrumentation, and analysis routines define what to do when the instrumentation is activated.

The APIs of Pin, DynamoRIO, and Frida allow us to iterate over the target/guest program at different levels of granularity: every single instruction (just before it executes), entire basic blocks, traces (multiple basic blocks), or the entire target/guest program (image).

Example tool

As I mentioned, while I was playing with Heap Spraying I felt the need to log all the memory allocations my code was performing. Since I got a bit annoyed after doing this repeatedly with WinDbg, even with some automation, I thought about doing it with a DBI framework; more precisely, with Pin.

I remember that during one of Peter Van Eeckhoutte’s exploitation classes he mentioned he had written something similar. I looked at his GitHub and found his Pintool. I had a look at his code, but since he used Visual Studio 2012, plus an old version of Pin, plus a different approach from what I had in mind, plus a different goal (I had in mind doing something else besides logging memory allocations), and things have changed a bit since then… I decided to write my own Pintool instead of using, or modifying, his code. After all, it’s all about struggling and learning, not running tools. Later I realized that most of his code comes from the Pin documentation, and so does mine.

The goal was to write a Pintool, or a DynamoRIO client, and use it to detect critical memory issues such as memory leaks and double frees. Yes, in C/C++ programs. You may say that there are plenty of tools that already let you do that, and that’s probably true (in fact, DynamoRIO comes with a couple of tools that can help here). The point here was to learn how to write my own tool, have fun, get familiar with DBI frameworks, and document my experiments for later reference. Perhaps it will also serve as a gentle introduction to Dynamic Binary Analysis for people who don’t know where to start.

So, the “critical memory issues” I had in mind weren’t really that difficult to trace. After looking at some almost ready-to-go code samples I found in Pin’s documentation, I ended up expanding my initial logging goal a bit and added a couple of “features” to aid my vulnerability discovery.

As you know, some common memory (de)allocation problems in C/C++ programs are:

Memory leaks
Double frees
Invalid frees
Use after free (note: the code presented in this post doesn’t detect potential use-after-free issues; an improved version of this code does, available here).

I assume everyone knows what the problems listed above are. If you don’t, or you need a ‘refresh’, just click the links above.

At least the first three problems are very “easy” to detect with a Pintool or a DynamoRIO client. I’ll make a couple of assumptions. The target program is a single binary/executable file, and the only functions I’ll track to allocate and free memory are malloc and free (calloc and realloc are just “special” versions of malloc anyway). Internally, new and delete use malloc and free, so we’re covered. I can simply “monitor” these calls. For now, I won’t consider other functions like realloc, calloc, HeapAlloc, HeapFree, etc., and will focus only on the generic malloc and free functions from the C Run-Time Library. On Windows, these functions in turn call HeapAlloc and HeapFree.

Here’s a diagram showing the relationship of Windows API calls used to allocate process memory (from the book The Art of Memory Forensics, and used with authorization. Thanks to Andrew Case).

As we can see above, ideally we should be “monitoring” RtlAllocateHeap and RtlFreeHeap. However, we can ignore this for now. This way, if you just want to try this code on Linux or OSX, it’s mostly copy and paste. Later, in the main version of this tool, I’ll indeed work only with the Windows heap functions, or my Pintool wouldn’t work with Internet Explorer, for example.

Whenever a program calls malloc, I’ll log the return value (that is, the address of the allocated memory region). Whenever a program calls free, I’ll match the address being freed against the addresses I saved before. If it has been allocated and not yet freed, I’ll mark it as freed. If it has been allocated and already freed, then we have a double free. If that address was never recorded as allocated, then we have a free of unallocated memory (an invalid free). Simple, huh? One detail: the heap may hand out a freed address again, so a new allocation at a known address simply marks it as allocated once more. Finally, when the program exits, I can look at my records to find memory addresses that were allocated but never freed. This way I can also detect memory leaks.

As we’ll see, using a dynamic binary framework to achieve what’s described above takes very little effort. However, there are some issues that we’ll ignore to keep this post simple. As you can probably guess, the Heap Manager also plays a role here, and our tool might have to be Heap Manager specific if we don’t want to be flooded with false positives. Also, as mentioned before, this tool will tell us there’s a bug, but not exactly where. You can tell your tool to break/pause when an issue is found and attach a debugger. However, depending on the class of bug, it may still be very hard to find where the bug is and reproduce it.

While I was writing this blog post, a very interesting tool from Joxean Koret called membugtool was released during the EuskalHack 2018 conference. His tool does a bit more than mine (well, actually considerably more), and the code is certainly better than mine. Keep following this post if you want to learn more about Pin and other DBI frameworks, but don’t forget to check his tool later. I was actually very happy when I saw it released because it means my idea wasn’t complete nonsense. On top of that, Joxean Koret is a respected researcher that I’ve been following for quite a long time, mainly due to his awesome work on breaking Antivirus engines.

Target/Guest program (ExercisePin.exe)

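To test our multiple dynamic binary analysis tools, I wrote the following nonsense program (I called it ExercisePin.exe). It’s quite clear that there are some memory leaks, an invalid free (strictly speaking, free(NULL) is a well-defined no-op in C, but it’s a handy way to exercise the “unallocated free” path), and a potential double free (depending on our input).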

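#include <stdio.h>
#include <stdlib.h>

void do_nothing() {
  int *xyz = (int*)malloc(2);    // Leak: never freed (and too small for an int anyway)
}

int main(int argc, char* argv[]) {
  free(NULL);                    // Reported as a free of unallocated memory (a no-op in C)

  do_nothing();                  // Leak

  char *A = (char*)malloc(128 * sizeof(char));
  char *B = (char*)malloc(128 * sizeof(char));    // Leak: B is never freed
  char *C = (char*)malloc(128 * sizeof(char));

  free(A);
  free(C);

  if (argc != 2)
    do_nothing();                // No argument: one more leak
  else
    free(C);                     // One argument: double free of C

  puts("done");
  return 0;
}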

As you can see, it’s a very stupid program. I recommend testing your tools with real software and seeing how they behave. Also, check the previously mentioned project membugtool, since it includes a very nice set of tests (which actually made me lazy: I didn’t even try to improve the code above or create new sample buggy programs). Depending on which compiler you use to build this sample, you might get different results. I built mine with Visual Studio, which has advantages and disadvantages. If you prefer, you can use Dev-C++ (which uses GCC), Cygwin (and install gcc or i686-w64-mingw32-gcc.exe), or even Embarcadero. Anyway, expect different results depending on the compiler you choose to build the target program.

Basic Pintool (MallocTracer)

In this first Pintool example, I’m logging all the malloc and free calls. The instrumentation is added before and after the malloc call and logs the parameter passed to the call and its return value. For the free call we’ll only look at its parameter, and not at its return value. So the instrumentation is only added before the call. This Pintool will not be very useful in big applications since it doesn’t really tell you where the issue is. Anyway, it is a good start and will serve the purpose of “showing” how the Pin API can be used.

We need to start by choosing which instrumentation granularity we’ll use. Have a look at the documentation for more details. I’ll be using Image instrumentation.

Image instrumentation lets the Pintool inspect and instrument an entire image, IMG, when it is first loaded. A Pintool can walk the sections, SEC, of the image, the routines, RTN, of a section, and the instructions, INS of a routine. Instrumentation can be inserted so that it is executed before or after a routine is executed, or before or after an instruction is executed. Image instrumentation utilizes the IMG_AddInstrumentFunction API call. Image instrumentation depends on symbol information to determine routine boundaries hence PIN_InitSymbols must be called before PIN_Init.

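We start with some includes. To use the Pin API we need to include pin.H.

#include "pin.H"       // Uppercase .H: this matters on case-sensitive file systems (e.g. Linux)
#include <iostream>
#include <fstream>
#include <map>
using namespace std;   // map, ofstream, string, cerr, etc. are used unqualified below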

The iostream header is required for basic input/output operations, and the fstream header is required because I’ll write the output of my Pintool to a file. In small programs, we could live with the console output, however for big programs we need to save the output to a file. If you are instrumenting Internet Explorer for example and playing with some JavaScript code, the amount of malloc and free calls is impressive (well, RtlAllocateHeap, and RtlFreeHeap). In some big programs you might not even want to write to disk every time there’s a call due to performance reasons, but let’s ignore that to keep things simple.
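To illustrate the performance point, here is a minimal C++ sketch of buffering log lines in memory and writing them out in batches. BufferedLog and its batch size are hypothetical choices, not something Pin provides; a real Pintool would also flush the buffer from its Fini function and guard it with a lock if the target is multithreaded.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect log lines in memory and write them in batches,
// so the hot malloc/free path doesn't touch the file on every call.
class BufferedLog {
public:
  BufferedLog(std::ostream &out, std::size_t maxLines) : out_(out), maxLines_(maxLines) {}
  ~BufferedLog() { flush(); }  // don't lose pending lines at shutdown

  void add(const std::string &line) {
    lines_.push_back(line);
    if (lines_.size() >= maxLines_)
      flush();                 // batch is full: write it out
  }

  void flush() {
    for (const auto &l : lines_)
      out_ << l << '\n';
    out_.flush();
    lines_.clear();
  }

  std::size_t pending() const { return lines_.size(); }

private:
  std::ostream &out_;
  std::size_t maxLines_;
  std::vector<std::string> lines_;
};
```

The batch size trades memory (and the risk of losing buffered lines if the target crashes) for fewer writes; for a crash-prone target like a double free under test, a small batch is probably the safer default.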

Additionally, I’ll use a map container to keep a log of all the memory allocated and freed. Check the References section to see how the C++ map container “works” if you aren’t used to writing code in C++. I’m not a developer, so my code can be a bit scary, but hopefully it works. Consider yourself warned.

I’ll also have some global variables. It’s very common to use global variables in a Pintool, have a look at the samples provided to get a feeling of how they are most commonly used. In my case, I’ll use the following global variables.

map<ADDRINT, bool> MallocMap;
ofstream LogFile;
KNOB<string> LogFileName(KNOB_MODE_WRITEONCE, "pintool", "o", "memprofile.out", "Memory trace file name");

I already mentioned the map container above; again, see the References section if you don’t know how it works. The idea is to store in this MallocMap the state of each allocation. The ADDRINT type is defined by the Pin headers and, as you can guess, represents a memory address. Each address is mapped to a bool value: true means the chunk has been freed, false means it is currently allocated.

The LogFile is the output file where I’ll save the output of the Pintool. Lastly, the KNOB variable. It is basically a switch supported by our Pintool (a way to pass command-line arguments to our Pintool). This KNOB allows us to specify the name of the log file through the “o” switch (-o on the command line). Its default value is “memprofile.out”.
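To make the KNOB’s role concrete, here is a toy C++ function that extracts the same information from a Pin-style command line: the value following -o among the tool’s own options, which sit between -t <tool> and the -- that starts the application’s command line, falling back to memprofile.out. The function outputFileName is hypothetical, and this is not how Pin parses its command line; it only illustrates which part of the command line a tool KNOB sees.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy illustration (not Pin's parser) of what the "o" KNOB gives us.
std::string outputFileName(const std::vector<std::string> &argv) {
  std::string name = "memprofile.out";           // the KNOB's default value
  bool inToolOptions = false;
  for (std::size_t i = 0; i < argv.size(); ++i) {
    if (argv[i] == "--")
      break;                                     // everything after this belongs to the application
    if (argv[i] == "-t") {
      inToolOptions = true;
      ++i;                                       // skip the tool path itself
      continue;
    }
    if (inToolOptions && argv[i] == "-o" && i + 1 < argv.size() && argv[i + 1] != "--")
      name = argv[++i];                          // tool option: -o <file>
  }
  return name;
}
```

So pin -t MallocTracer.dll -o trace.txt -- ExercisePin.exe would log to trace.txt, while omitting -o keeps the default, and a -o that appears after -- belongs to the application, not to the Pintool.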

If you look at the main function of the sample Pintools shipped with Pin, you’ll see that they are all very similar, and the one below is no exception.

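int main(int argc, char *argv[])
{
  PIN_InitSymbols();              // Must precede PIN_Init for Image instrumentation

  if (PIN_Init(argc, argv))       // Returns TRUE if the command line is invalid
  {
    cerr << KNOB_BASE::StringKnobSummary() << endl;
    return 1;
  }

  LogFile.open(LogFileName.Value().c_str());
  IMG_AddInstrumentFunction(CustomInstrumentation, NULL);
  PIN_AddFiniFunction(FinalFunc, NULL);
  PIN_StartProgram();             // Never returns

  return 0;
}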

I have to call PIN_InitSymbols before PIN_Init because I’m using Image instrumentation, which depends on symbol information. Then I open the log file for writing, and I call IMG_AddInstrumentFunction. The instrumentation function that I’ll be using is called CustomInstrumentation and is defined by me (not a Pin API function). You can call it whatever you want.

Then I have to call PIN_AddFiniFunction, which registers a function to be executed immediately before the application exits. In this case, my function is FinalFunc.

Finally, I call PIN_StartProgram to start executing the target program. This function never returns.

So let’s have a look at my CustomInstrumentation() function.

VOID CustomInstrumentation(IMG img, VOID *v) 
{
  for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
  {
    string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);

    if (undFuncName == "malloc")
    {
      RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

      if (RTN_Valid(allocRtn))
      {
        RTN_Open(allocRtn);

        // Record Malloc size
        RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
          IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);

        // Record Malloc return address
        RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
          IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);

        RTN_Close(allocRtn);
      }
    }
    else if (undFuncName == "free")
    {
      RTN freeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

      if (RTN_Valid(freeRtn))
      {
        RTN_Open(freeRtn);

        RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
          IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
          IARG_END);

        RTN_Close(freeRtn);
      }
    }
  }
}

We need to “tell” Pin which routines to instrument and when to execute our analysis code. The instrumentation routine above is called every time an image is loaded, and it “tells” Pin where to insert the analysis routines.

Basically, when we find the malloc or free routine in an image, we insert the analysis routines by using the RTN_InsertCall function.

RTN_InsertCall accepts a variable number of arguments. Three of them are quite important, and you can easily guess which ones by looking at these calls: the first is the routine we want to instrument, the second is an IPOINT that determines where the analysis call is inserted relative to the instrumented object, and the third is the analysis routine to be inserted.

Also, note that all RTN_InsertCall calls must be preceded by a call to RTN_Open and followed by a call to RTN_Close.

We can specify a list of arguments to be passed to the analysis routine, and this list must be terminated with IARG_END. As we can also guess by looking at the code, to pass the return value of malloc to the analysis routine we use IARG_FUNCRET_EXITPOINT_VALUE. To pass the argument of the malloc or free calls to the analysis routine, we use IARG_FUNCARG_ENTRYPOINT_VALUE followed by the index of the argument. In our case, both are 0 (first and only argument).

All the Pin functions that operate at the routine level start with RTN_. Have a look at the RTN Routine Object documentation here.

Also, all the Pin functions that operate at the image level start with IMG_. Have a look at the IMG Image Object documentation here.

The same applies to all the Pin functions that operate at the symbol level, they all (or almost all) start with SYM_. Have a look at the SYM Symbol Object documentation here.

You might be wondering how Pin finds malloc and free. Pin uses whatever symbol information is available: debug symbols from the target/guest program, PDB files, export tables, and dbghelp. There are two possible ways to find the functions we want to instrument. We can use RTN_FindByName, or alternatively walk the image’s symbols while handling name decoration/mangling and multiple symbols (the method I used), as shown below.

  for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
  {
    string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);

    if (undFuncName == "malloc") // find the malloc function
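Why undecorate at all? On 32-bit Windows, C symbol names usually carry a decoration: a __cdecl function like malloc is typically emitted as _malloc, __stdcall names get an @<bytes> suffix, and __fastcall names start with @. Comparing raw symbol names against "malloc" could therefore miss matches. The toy C++ function below (undecorateCName is hypothetical) strips only these C decorations, assuming 32-bit decorated input; PIN_UndecorateSymbolName also handles C++ name mangling, so treat this purely as an illustration of the problem.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy normalizer for 32-bit MSVC *C* decorations (not PIN_UndecorateSymbolName):
//   __cdecl    malloc -> _malloc
//   __stdcall  func   -> _func@8
//   __fastcall func   -> @func@8
std::string undecorateCName(const std::string &sym) {
  std::string name = sym;
  if (!name.empty() && (name[0] == '_' || name[0] == '@'))
    name.erase(0, 1);                                   // drop the leading '_' or '@'
  std::size_t at = name.rfind('@');
  if (at != std::string::npos && at + 1 < name.size()) {
    bool digits = true;
    for (std::size_t i = at + 1; i < name.size(); ++i)
      digits = digits && std::isdigit(static_cast<unsigned char>(name[i]));
    if (digits)
      name.erase(at);                                   // drop the trailing "@<argument bytes>"
  }
  return name;
}
```

On 64-bit Windows there is no leading underscore, so applying this toy to 64-bit symbols would wrongly turn a name like _malloc_dbg into malloc_dbg; that kind of platform detail is exactly why it’s better to let Pin do the undecoration.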

After we find the functions we want to instrument (malloc and free in our example), we “tell” Pin which function must be called every time malloc is executed.

        // Record Malloc size
        RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
          IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);

        // Record Malloc return address
        RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
          IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);

If we look at the code above, we have two calls to RTN_InsertCall. In the first, we “tell” Pin which function must be called before the malloc call. In the second we “tell” Pin which function must be called after the malloc call. We want to log the allocation sizes and the return value of the malloc call. So, we need both.

For the free call, we are only interested in its parameter (the address of the memory to free).

RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
  IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
  IARG_END);

The three analysis routines (LogBeforeMalloc, LogAfterMalloc, and LogFree) are very straightforward. First, before the malloc call we just want to save the size of the memory being allocated.

VOID LogBeforeMalloc(ADDRINT size)
{
  LogFile << "[*] malloc(" << dec << size << ")";
}

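After the malloc call, we want to record its return value, that is, the address of the newly allocated chunk. We look that address up in the map container using an iterator. If it’s new, we insert it as allocated. If it was previously freed, the heap is simply reusing the chunk, so we mark it as allocated again. If it is still marked as allocated, something is off (for example, we missed a free). In every case, we complete the log line started by LogBeforeMalloc.

VOID LogAfterMalloc(ADDRINT addr)
{
  if (addr == 0)    // ADDRINT is an integer type, so compare against 0 rather than NULL
  {
    LogFile << "\t\t= NULL" << endl;    // Terminate the "[*] malloc(size)" line
    cerr << "[-] Error: malloc() return value was NULL. Heap full!?!" << endl;
    return;
  }

  map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

  if (it != MallocMap.end())
  {
    if (it->second)
      it->second = false;    // A previously freed chunk is being reused
    else
      cerr << "[-] Error: allocating memory not freed!?!" << endl;
  }
  else
    MallocMap.insert(pair<ADDRINT, bool>(addr, false));    // First time we see this address

  // Always complete the line started by LogBeforeMalloc, even for reused chunks
  LogFile << "\t\t= 0x" << hex << addr << endl;
}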

Finally, when we free a chunk of memory, we check whether that address was already freed to detect double frees. Plus, if we don’t know the address being freed, then we are trying to free memory that wasn’t allocated before, which is undefined behavior (free(NULL) is the exception: the C standard defines it as a no-op).

VOID LogFree(ADDRINT addr) 
{
  map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

  if (it != MallocMap.end()) 
  {
    if (it->second) 
      LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once (Double Free)."  << endl; // Double free
    else 
    {
      it->second = true;    // Mark it as freed
      LogFile << "[*] free(0x" << hex << addr << ")" << endl;
    }
  }
  else 
    LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl;    // Freeing unallocated memory
}

Lastly, we have the call to FinalFunc, which is executed just before the program ends. We basically verify if there’s memory that has been allocated but not freed, and we close our log file. The return of this function marks the end of the instrumentation.

VOID FinalFunc(INT32 code, VOID *v) 
{
  for (pair<ADDRINT, bool> p : MallocMap) 
  {
    if (!p.second) 
      LogFile << "[*] Memory at address 0x" << hex << p.first << " allocated but not freed" << endl;
  }

  LogFile.close();
}

Simple.

The whole Pintool code is below. You can also get the whole Visual Studio project from GitLab here.

// Built on top of https://software.intel.com/sites/default/files/managed/62/f4/cgo2013.pdf (slide 33)

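#include "pin.H"       // Uppercase .H: this matters on case-sensitive file systems (e.g. Linux)
#include <iostream>
#include <fstream>
#include <map>
using namespace std;   // map, ofstream, string, cerr, etc. are used unqualified below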

map<ADDRINT, bool> MallocMap;
ofstream LogFile;
KNOB<string> LogFileName(KNOB_MODE_WRITEONCE, "pintool", "o", "memprofile.out", "Memory trace file name");

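VOID LogAfterMalloc(ADDRINT addr)
{
  if (addr == 0)    // ADDRINT is an integer type, so compare against 0 rather than NULL
  {
    LogFile << "\t\t= NULL" << endl;    // Terminate the "[*] malloc(size)" line
    cerr << "[-] Error: malloc() return value was NULL. Heap full!?!" << endl;
    return;
  }

  map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

  if (it != MallocMap.end())
  {
    if (it->second)
      it->second = false;    // A previously freed chunk is being reused
    else
      cerr << "[-] Error: allocating memory not freed!?!" << endl;
  }
  else
    MallocMap.insert(pair<ADDRINT, bool>(addr, false));    // First time we see this address

  // Always complete the line started by LogBeforeMalloc, even for reused chunks
  LogFile << "\t\t= 0x" << hex << addr << endl;
}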

VOID LogBeforeMalloc(ADDRINT size)
{
  LogFile << "[*] malloc(" << dec << size << ")";
}

VOID LogFree(ADDRINT addr) 
{
  map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

  if (it != MallocMap.end()) 
  {
    if (it->second) 
      LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once (Double Free)."  << endl; // Double free
    else 
    {
      it->second = true;    // Mark it as freed
      LogFile << "[*] free(0x" << hex << addr << ")" << endl;
    }
  }
  else 
    LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl;
}

VOID CustomInstrumentation(IMG img, VOID *v) 
{
  for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
  {
    string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);

    if (undFuncName == "malloc")
    {
      RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

      if (RTN_Valid(allocRtn))
      {
        RTN_Open(allocRtn);

        // Record Malloc size
        RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
          IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);

        // Record Malloc return address
        RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
          IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);

        RTN_Close(allocRtn);
      }
    }
    else if (undFuncName == "free")
    {
      RTN freeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

      if (RTN_Valid(freeRtn))
      {
        RTN_Open(freeRtn);

        RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
          IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
          IARG_END);

        RTN_Close(freeRtn);
      }
    }
  }
}

VOID FinalFunc(INT32 code, VOID *v) 
{
  for (pair<ADDRINT, bool> p : MallocMap) 
  {
    if (!p.second) 
      LogFile << "[*] Memory at address 0x" << hex << p.first << " allocated but not freed" << endl;
  }

  LogFile.close();
}

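int main(int argc, char *argv[])
{
  PIN_InitSymbols();              // Must precede PIN_Init for Image instrumentation

  if (PIN_Init(argc, argv))       // Returns TRUE if the command line is invalid
  {
    cerr << KNOB_BASE::StringKnobSummary() << endl;
    return 1;
  }

  LogFile.open(LogFileName.Value().c_str());
  IMG_AddInstrumentFunction(CustomInstrumentation, NULL);
  PIN_AddFiniFunction(FinalFunc, NULL);
  PIN_StartProgram();             // Never returns

  return 0;
}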

Running it against our ExercisePin.exe binary (see the section Target/Guest program) produces the following:

C:\pin>pin -t c:\pin\source\tools\MallocTracer\Release\MallocTracer.dll -- ExercisePin.exe
done

C:\pin>type memprofile.out
[*] Freeing unallocated memory at address 0x0.
[*] malloc(2)   = 0x564f68
[*] malloc(128)   = 0x569b88
[*] malloc(128)   = 0x569c10
[*] malloc(128)   = 0x569c98
[*] free(0x569b88)
[*] free(0x569c98)
[*] malloc(2)   = 0x564e78
[*] Memory at address 0x564e78 allocated but not freed
[*] Memory at address 0x564f68 allocated but not freed
[*] Memory at address 0x569c10 allocated but not freed

Or, if we pass any data as an argument to our ExercisePin.exe…

C:\pin>pin.exe -t "C:\pin\source\tools\MallocTracer\Release\MallocTracer.dll" -- C:\TARGET\ExercisePin.exe moo

C:\pin>type memprofile.out
[*] Freeing unallocated memory at address 0x0.
[*] malloc(2)   = 0x214f78
[*] malloc(128)   = 0x218f98
[*] malloc(128)   = 0x219020
[*] malloc(128)   = 0x2190a8
[*] free(0x218f98)
[*] free(0x2190a8)
[*] Memory at address 0x2190a8 has been freed more than once (Double Free).

As we can see above, our Pintool was able to identify all the issues we were aware of in our test case: the invalid free, the memory leaks, and the double free. We don’t see the memory leaks in the last output because our binary crashes when the double free happens. The binary was built with Visual Studio, which adds some Heap integrity checks that make it crash. If you build ExercisePin.exe with gcc, or another compiler, the double free won’t be noticed and the program will keep running. However, if you build it with gcc, for example, you’ll see many other malloc and free calls from the C Run-Time Library initialization code. Hence, I didn’t use gcc, to make the output easier to follow.

Basic DynamoRIO client (MallocWrap)

We’ll create a DynamoRIO client that mimics the Pintool above. That is, we’ll log all the malloc and free calls. As before, the instrumentation is added before and after the malloc call, since we want to log the parameter passed to the call and its return value. For the free call, we’ll only look at its parameter, and not at its return value, so the instrumentation is only added before the call.

We’ll use the drwrap DynamoRIO extension, which provides function wrapping and replacing support. drwrap uses the drmgr extension to ensure its events occur in the proper order.

We start with some “standard” includes, and to use the DynamoRIO APIs we need to include dr_api.h.

#include "stdafx.h"
#include <fstream>
#include "dr_api.h"
#include "drmgr.h"
#include "drwrap.h"
using namespace std;

Additionally, we include the headers for the extensions mentioned above. That is, drmgr.h and drwrap.h. We’ll write the output of this DynamoRIO client to a text file, hence the fstream include. I won’t use a container in this example to keep track of the memory allocations. You can just copy and paste that functionality from the Pintool above with slight modifications, so I’ll leave that for you as an exercise. In this example, we’ll simply log malloc and free calls to demonstrate how to use the DynamoRIO API to accomplish the same as before, where we used Pin.

Then, we have the function declarations and some global variables.

static void event_exit(void);
static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data);
static void wrap_malloc_post(void *wrapcxt, void *user_data);
static void wrap_free_pre(void *wrapcxt, OUT void **user_data);

ofstream LogFile;
#define MALLOC_ROUTINE_NAME "malloc"
#define FREE_ROUTINE_NAME "free"

These are all cosmetic; we could have used these #defines in our Pintool too. We didn’t, simply because we didn’t have to. Feel free to adopt whatever style you want. I built this example on top of DynamoRIO’s “wrap” sample client, so I ended up using more or less the same “style”. If you plan to port your client or Pintool to other platforms, this can be considered good practice, because it makes the changes easier.

Next, we have a function called module_load_event, which is a callback function registered with drmgr_register_module_load_event. DynamoRIO calls this function whenever the application loads a module. As you can see, not that different from Pin.

static void module_load_event(void *drcontext, const module_data_t *mod, bool loaded)
{
  app_pc towrap = (app_pc)dr_get_proc_address(mod->handle, MALLOC_ROUTINE_NAME);
  if (towrap != NULL)
  {
    bool ok = drwrap_wrap(towrap, wrap_malloc_pre, wrap_malloc_post);

    if (!ok)
    {
      dr_fprintf(STDERR, "[-] Could not wrap 'malloc': already wrapped?\n");
      DR_ASSERT(ok);
    }
  }

  towrap = (app_pc)dr_get_proc_address(mod->handle, FREE_ROUTINE_NAME);
  if (towrap != NULL)
  {
    bool ok = drwrap_wrap(towrap, wrap_free_pre, NULL);

    if (!ok)
    {
      dr_fprintf(STDERR, "[-] Could not wrap 'free': already wrapped?\n");
      DR_ASSERT(ok);
    }
  }
}

As we can see above, we use dr_get_proc_address to get the entry point of malloc. Note that it looks up the module’s exports, so it only finds malloc in modules that export it (such as the CRT DLLs). If it doesn’t return NULL (which indicates failure), we use drwrap_wrap to wrap the application function, calling wrap_malloc_pre() prior to every invocation of the original function (malloc) and wrap_malloc_post() after every invocation of the original function (malloc). Again, conceptually, very close to what we did with Pin.

We do the same with free. However, as stated before we are only interested in the free parameter and not its return value. So we only wrap the free call prior to every invocation (wrap_free_pre). Since we don’t care about its return value we just pass NULL as the third parameter to drwrap_wrap. With drwrap_wrap one of the callbacks can be NULL, but not both.

We then have the dr_client_main, which is, let’s say, our main function. DynamoRIO looks up dr_client_main in each client library and calls that function when the process starts.

We have a pretty common “main”, with calls to dr_set_client_name (which sets information presented to users in diagnostic messages), dr_log (which simply writes to DynamoRIO's log file), and a couple of functions whose purpose you can guess from their names.

Additionally, drmgr_init and drwrap_init initialize the respective extensions. dr_register_exit_event is pretty much the same as Pin’s PIN_AddFiniFunction: it registers a function to be executed immediately before the application exits.

Lastly, we have the call to drmgr_register_module_load_event that we already mentioned above.

DR_EXPORT void dr_client_main(client_id_t id, int argc, const char *argv[])
{
  LogFile.open("memprofile.out");

  dr_set_client_name("DynamoRIO Sample Client 'wrap'", "http://dynamorio.org/issues");
  dr_log(NULL, LOG_ALL, 1, "Client 'wrap' initializing\n");

  if (dr_is_notify_on()) 
  {
    dr_enable_console_printing();
    dr_fprintf(STDERR, "[*] Client wrap is running\n");
  }

  drmgr_init();
  drwrap_init();
  dr_register_exit_event(event_exit);
  drmgr_register_module_load_event(module_load_event);
}

Next is the function executed immediately before the application exits. Nothing special here.

static void event_exit(void)
{
  LogFile.close();   // Flush and close memprofile.out
  drwrap_exit();
  drmgr_exit();
}

And lastly, the callback functions already mentioned before. What’s relevant here? The call to drwrap_get_arg, which, as we can guess, “Returns the value of the arg-th argument (0-based) to the wrapped function represented by wrapcxt. Assumes the regular C calling convention (i.e., no fastcall). May only be called from a drwrap_wrap pre-function callback. To access argument values in a post-function callback, store them in the user_data parameter passed between the pre and post functions.” And the call to drwrap_get_retval, which obviously returns the return value of the wrapped function.

static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data)
{
  /* malloc(size) or HeapAlloc(heap, flags, size) */
  //size_t sz = (size_t)drwrap_get_arg(wrapcxt, 2); // HeapAlloc
  size_t sz = (size_t)drwrap_get_arg(wrapcxt, 0); // malloc

  LogFile << "[*] malloc(" << dec << sz << ")"; // log the malloc size
}

static void wrap_malloc_post(void *wrapcxt, void *user_data)
{
  // Use a pointer-sized integer: casting to int would truncate 64-bit addresses
  ptr_uint_t addr = (ptr_uint_t)drwrap_get_retval(wrapcxt);
  LogFile << "\t\t= 0x" << hex << addr << endl;
}

static void wrap_free_pre(void *wrapcxt, OUT void **user_data)
{
  ptr_uint_t addr = (ptr_uint_t)drwrap_get_arg(wrapcxt, 0);
  LogFile << "[*] free(0x" << hex << addr << ")" << endl;
}

Very simple, and not that different from what we have seen before with Pin.
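The pre/post split has a side effect: each malloc line is written in two halves, so output from another thread can land in between (the Pintool has the same issue). The documentation quoted above suggests the remedy: stash the argument in user_data in the pre callback and log the whole line in the post callback (in the client, that would mean setting *user_data = drwrap_get_arg(wrapcxt, 0) in wrap_malloc_pre). Below is a toy, standalone C++ model of that pattern; WrapCtx, call_wrapped_malloc, malloc_pre, malloc_post, and g_log are hypothetical names, and this is not how DynamoRIO implements wrapping. Producing the line in one callback narrows the window for interleaving, but it is not a substitute for a lock.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>

// Toy model of a wrapped call: the wrapper exposes the argument and return value.
struct WrapCtx {
  std::size_t arg0;   // first argument of the wrapped call
  void *retval;       // filled in after the call returns
};

using PreFn = void (*)(WrapCtx &, void **user_data);
using PostFn = void (*)(WrapCtx &, void *user_data);

// Calls pre, the real malloc, then post, passing user_data from pre to post.
void *call_wrapped_malloc(std::size_t size, PreFn pre, PostFn post) {
  WrapCtx ctx{size, nullptr};
  void *user_data = nullptr;
  if (pre) pre(ctx, &user_data);
  ctx.retval = std::malloc(ctx.arg0);
  if (post) post(ctx, user_data);
  return ctx.retval;
}

std::ostringstream g_log;  // stands in for LogFile

// Pre: stash the size instead of printing half a line.
void malloc_pre(WrapCtx &ctx, void **user_data) {
  *user_data = reinterpret_cast<void *>(static_cast<std::uintptr_t>(ctx.arg0));
}

// Post: emit the complete "[*] malloc(size) = 0x..." line in one place.
void malloc_post(WrapCtx &ctx, void *user_data) {
  std::size_t size = static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(user_data));
  g_log << "[*] malloc(" << std::dec << size << ")\t\t= 0x" << std::hex
        << reinterpret_cast<std::uintptr_t>(ctx.retval) << '\n';
}
```

Smuggling a size_t through a void* works because both are pointer-sized on the usual platforms; for more state than fits in a pointer, the pre callback would allocate a small struct and the post callback would free it.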

The whole DynamoRIO client code is below. You can also get the whole Visual Studio project from GitLab here.

#include "stdafx.h"
#include <fstream>
#include "dr_api.h"
#include "drmgr.h"
#include "drwrap.h"
using namespace std;

static void event_exit(void);
static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data);
static void wrap_malloc_post(void *wrapcxt, void *user_data);
static void wrap_free_pre(void *wrapcxt, OUT void **user_data);

ofstream LogFile;
#define MALLOC_ROUTINE_NAME "malloc"
#define FREE_ROUTINE_NAME "free"

static void module_load_event(void *drcontext, const module_data_t *mod, bool loaded)
{
  app_pc towrap = (app_pc)dr_get_proc_address(mod->handle, MALLOC_ROUTINE_NAME);
  if (towrap != NULL)
  {
    bool ok = drwrap_wrap(towrap, wrap_malloc_pre, wrap_malloc_post);

    if (!ok)
    {
      dr_fprintf(STDERR, "[-] Could not wrap 'malloc': already wrapped?\n");
      DR_ASSERT(ok);
    }
  }

  towrap = (app_pc)dr_get_proc_address(mod->handle, FREE_ROUTINE_NAME);
  if (towrap != NULL)
  {
    bool ok = drwrap_wrap(towrap, wrap_free_pre, NULL);

    if (!ok)
    {
      dr_fprintf(STDERR, "[-] Could not wrap 'free': already wrapped?\n");
      DR_ASSERT(ok);
    }
  }
}

DR_EXPORT void dr_client_main(client_id_t id, int argc, const char *argv[])
{
  LogFile.open("memprofile.out");

  dr_set_client_name("DynamoRIO Sample Client 'wrap'", "http://dynamorio.org/issues");
  dr_log(NULL, LOG_ALL, 1, "Client 'wrap' initializing\n");

  if (dr_is_notify_on()) 
  {
    dr_enable_console_printing();
    dr_fprintf(STDERR, "[*] Client wrap is running\n");
  }

  drmgr_init();
  drwrap_init();
  dr_register_exit_event(event_exit);
  drmgr_register_module_load_event(module_load_event);
}

static void event_exit(void)
{
  LogFile.close();   // Flush and close memprofile.out
  drwrap_exit();
  drmgr_exit();
}

static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data)
{
  /* malloc(size) or HeapAlloc(heap, flags, size) */
  //size_t sz = (size_t)drwrap_get_arg(wrapcxt, 2); // HeapAlloc
  size_t sz = (size_t)drwrap_get_arg(wrapcxt, 0); // malloc

  LogFile << "[*] malloc(" << dec << sz << ")"; // log the malloc size
}

static void wrap_malloc_post(void *wrapcxt, void *user_data)
{
  // Use a pointer-sized integer: casting to int would truncate 64-bit addresses
  ptr_uint_t addr = (ptr_uint_t)drwrap_get_retval(wrapcxt);
  LogFile << "\t\t= 0x" << hex << addr << endl;
}

static void wrap_free_pre(void *wrapcxt, OUT void **user_data)
{
  ptr_uint_t addr = (ptr_uint_t)drwrap_get_arg(wrapcxt, 0);
  LogFile << "[*] free(0x" << hex << addr << ")" << endl;
}

Running it against our ExercisePin.exe binary (see the section Target/Guest program) produces the following:

C:\dynamorio\bin32>drrun.exe -client "C:\Users\bob\Desktop\WRKDIR\MallocWrap\Release\MallocWrap.dll" 0 "" c:\Users\bob\Desktop\ExercisePin.exe
[*] Client wrap is running
done
C:\dynamorio\bin32>type memprofile.out
[*] free(0x0)
[*] malloc(2)   = 0x5a35d0
[*] malloc(128)   = 0x5a9c50
[*] malloc(128)   = 0x5a9cd8
[*] malloc(128)   = 0x5a9d60
[*] free(0x5a9c50)
[*] free(0x5a9d60)
[*] malloc(2)   = 0x5a34e0

We can extend this program to get the exact same functionality as our Pintool and check for memory corruption bugs instead of logging the calls only. I’ll leave that as an exercise for you.

Basic Frida script (MallocLogger)

Frida is a fast-growing DBI framework, mainly used on mobile devices. I haven’t played much with mobile applications in a long time (that’s about to change, though); still, I wanted to give Frida a try because I’d heard good things about it, and it also supports Windows. The interesting part here is that Frida injects a JavaScript interpreter into the target/guest program. So, instead of writing C code, we’ll be writing JavaScript to instrument our program (actually, if we want, we can also use C or Swift). You can see this as an advantage or a disadvantage. If you are a vulnerability hunter and you like to poke around browsers, then this should be an advantage, I guess. It’s actually very interesting that we are writing instrumentation code to manipulate low-level instructions by using a high-level language.

You can find the JavaScript API here. Anyway, the use case will be exactly the same as the ones we saw before.

While the instrumentation code has to be written in JavaScript (well, again, that’s not true but let’s use JavaScript because it’s cool), the resulting tools can be written in either Python or JavaScript.

We’ll use Frida’s Interceptor to trace all malloc and free calls for a start. The target will be our ExercisePin.exe binary again. We’ll also try to produce output close to that of our basic MallocTracer Pintool and MallocWrap DynamoRIO client, which means we’ll log the amount of memory requested, the return value of malloc, and the argument of free.

Here’s the sample MallocLogger.py Python script.

#!/usr/bin/env python
import frida
import sys

pid = frida.spawn(['ExercisePin.exe'])
session = frida.attach(pid)

with open('MallocLogger.js') as f:   # file name must match the script below on case-sensitive systems
    contents = f.read()
script = session.create_script(contents)
script.load()
frida.resume(pid)
sys.stdin.read()

And below is the instrumentation JavaScript file, MallocLogger.js.

// Interceptor for 'malloc'
Interceptor.attach(Module.findExportByName(null, 'malloc'),
    {
      // Log before malloc
      onEnter: function (args) {
        console.log("malloc(" + args[0].toInt32() + ")");
      },
      // Log after malloc
      onLeave: function (retval) {
        console.log("\t\t= 0x" + retval.toString(16));
      }
    });

// Interceptor for 'free'
Interceptor.attach(Module.findExportByName(null, 'free'),
    {
      onEnter: function (args) {
        console.log("free(0x" + args[0].toString(16) + ")");
      }
    });

If we run this Python script we get something like.

C:\Users\bob\Desktop\frida>python MallocLogger.py
free(0x0)
malloc(2)
                = 0x984268
malloc(128)
                = 0x9856d8
malloc(128)
                = 0x985760
malloc(128)
                = 0x9857e8
done
free(0x9856d8)
free(0x9857e8)
malloc(2)
                = 0x984278

Interestingly enough, Frida also comes with a utility, frida-trace.exe, that pretty much allows us to do the exact same thing we did above almost without writing any code (besides adding a bit more information and tweaking the output).

C:\Users\bob\Desktop\frida>frida-trace -i malloc -i free .\ExercisePin.exe
Instrumenting functions...
malloc: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\malloc.js"
malloc: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\malloc.js"
free: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\free.js"
free: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\free.js"
Started tracing 4 functions. Press Ctrl+C to stop.
done
           /* TID 0x1f84 */
   125 ms  free()
   125 ms  malloc()
   125 ms  malloc()
   125 ms  malloc()
   125 ms  malloc()
   125 ms  free()
   125 ms  free()
   125 ms  malloc()
Process terminated

If you look at the output above, you can see that some JavaScript handlers were auto-generated. We can simply tweak this JavaScript code to make the output look like before. For example, if we open the file __handlers__\msvcrt.dll\malloc.js, we'll see something like:

/*
 * Auto-generated by Frida. Please modify to match the signature of malloc.
 * This stub is currently auto-generated from manpages when available.
 *
 * For full API reference, see: http://www.frida.re/docs/javascript-api/
 */

{
  /**
   * Called synchronously when about to call malloc.
   *
   * @this {object} - Object allowing you to store state for use in onLeave.
   * @param {function} log - Call this function with a string to be presented to the user.
   * @param {array} args - Function arguments represented as an array of NativePointer objects.
   * For example use Memory.readUtf8String(args[0]) if the first argument is a pointer to a C string encoded as UTF-8.
   * It is also possible to modify arguments by assigning a NativePointer object to an element of this array.
   * @param {object} state - Object allowing you to keep state across function calls.
   * Only one JavaScript function will execute at a time, so do not worry about race-conditions.
   * However, do not use this to store function arguments across onEnter/onLeave, but instead
   * use "this" which is an object for keeping state local to an invocation.
   */
  onEnter: function (log, args, state) {
    log("malloc()");
  },

  /**
   * Called synchronously when about to return from malloc.
   *
   * See onEnter for details.
   *
   * @this {object} - Object allowing you to access state stored in onEnter.
   * @param {function} log - Call this function with a string to be presented to the user.
   * @param {NativePointer} retval - Return value represented as a NativePointer object.
   * @param {object} state - Object allowing you to keep state across function calls.
   */
  onLeave: function (log, retval, state) {
  }
}

We just need to tweak the onEnter and onLeave functions. For example:

/*
 * Auto-generated by Frida. Please modify to match the signature of malloc.
 * This stub is currently auto-generated from manpages when available.
 *
 * For full API reference, see: http://www.frida.re/docs/javascript-api/
 */

{
  /**
   * Called synchronously when about to call malloc.
   *
   * @this {object} - Object allowing you to store state for use in onLeave.
   * @param {function} log - Call this function with a string to be presented to the user.
   * @param {array} args - Function arguments represented as an array of NativePointer objects.
   * For example use Memory.readUtf8String(args[0]) if the first argument is a pointer to a C string encoded as UTF-8.
   * It is also possible to modify arguments by assigning a NativePointer object to an element of this array.
   * @param {object} state - Object allowing you to keep state across function calls.
   * Only one JavaScript function will execute at a time, so do not worry about race-conditions.
   * However, do not use this to store function arguments across onEnter/onLeave, but instead
   * use "this" which is an object for keeping state local to an invocation.
   */
  onEnter: function (log, args, state) {
    log("malloc(" + args[0].toInt32() + ")");
  },

  /**
   * Called synchronously when about to return from malloc.
   *
   * See onEnter for details.
   *
   * @this {object} - Object allowing you to access state stored in onEnter.
   * @param {function} log - Call this function with a string to be presented to the user.
   * @param {NativePointer} retval - Return value represented as a NativePointer object.
   * @param {object} state - Object allowing you to keep state across function calls.
   */
  onLeave: function (log, retval, state) {
    log("\t\t= 0x" + retval.toString(16));
  }
}

Now, after making a similar change to the free.js handlers, if we run the exact same command again, we get the following:

C:\Users\bob\Desktop\frida>frida-trace -i malloc -i free .\ExercisePin.exe
Instrumenting functions...
malloc: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\malloc.js"
malloc: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\malloc.js"
free: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\free.js"
free: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\free.js"
Started tracing 4 functions. Press Ctrl+C to stop.
done
           /* TID 0x23e4 */
    64 ms  free(0x0)
    64 ms  malloc(2)
    64 ms               = 0x8a42a8
    64 ms  malloc(128)
    64 ms               = 0x8a57a0
    64 ms  malloc(128)
    64 ms               = 0x8a5828
    64 ms  malloc(128)
    64 ms               = 0x8a58b0
    64 ms  free(0x8a57a0)
    64 ms  free(0x8a58b0)
    65 ms  malloc(2)
    65 ms               = 0x8a42b8
Process terminated

We can extend this program to get exactly the same functionality as our Pintool, and check for memory corruption bugs instead of only logging the calls. I'll leave that as an exercise for you.

Debugging

If you want to debug your Pintool, use the -pause_tool switch and specify how many seconds Pin should wait, which gives you time to attach the debugger to the process. For example:

C:\pin\source\tools\MallocTracer\Release>c:\pin\pin.exe -pause_tool 20 -t "C:\pin\source\tools\MallocTracer\Release\MallocTracer.dll" -- ExercisePin.exe
Pausing for 20 seconds to attach to process with pid 1568

I don't actually use Visual Studio to debug the Pintool; I prefer WinDbg because I'm used to it and it is awesome. Once you attach to the process with WinDbg, it's very easy to set a breakpoint wherever you like in your Pintool. Below is a simple example of setting a breakpoint on the main function of my Pintool.

Microsoft (R) Windows Debugger Version 10.0.17134.12 X86
Copyright (c) Microsoft Corporation. All rights reserved.

*** wait with pending attach
Symbol search path is: srv*
Executable search path is: 
ModLoad: 00080000 00087000   C:\pin\source\tools\MallocTracer\Release\ExercisePin.exe
ModLoad: 77800000 77980000   C:\Windows\SysWOW64\ntdll.dll
ModLoad: 769d0000 76ae0000   C:\Windows\syswow64\kernel32.dll
ModLoad: 76b50000 76b97000   C:\Windows\syswow64\KERNELBASE.dll
Break-in sent, waiting 30 seconds...
ModLoad: 54c20000 54f93000   MallocTracer.dll
It is now possible to set breakpoints in Pin tool.
Use "Go" command (F5) to proceed.
(620.12c0): Break instruction exception - code 80000003 (first chance)
eax=00000000 ebx=53833c8c ecx=76b6388e edx=00000000 esi=53833c8c edi=53833cb8
eip=76b6338d esp=01ad1930 ebp=0042e7e4 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
KERNELBASE!DebugBreak+0x2:
76b6338d cc              int     3
0:000> lmf
start    end        module name
00080000 00087000   ExercisePin C:\pin\source\tools\MallocTracer\Release\ExercisePin.exe
54c20000 54f93000   MallocTracer MallocTracer.dll
769d0000 76ae0000   kernel32 C:\Windows\syswow64\kernel32.dll
76b50000 76b97000   KERNELBASE C:\Windows\syswow64\KERNELBASE.dll
77800000 77980000   ntdll    C:\Windows\SysWOW64\ntdll.dll
0:000> lmDvmMallocTracer
start    end        module name
54c20000 54f93000   MallocTracer   (deferred)             
    Image path: MallocTracer.dll
    Image name: MallocTracer.dll
    Timestamp:        Sat Jun 30 14:28:14 2018 (5B37F5EE)
    CheckSum:         00000000
    ImageSize:        00373000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
0:000> x /D /f MallocTracer!a*

*** WARNING: Unable to verify checksum for MallocTracer.dll
54c549b8          MallocTracer!ASM_pin_wow64_gate (<no parameter info>)
54c5483c          MallocTracer!ATOMIC_Increment16 (<no parameter info>)
54c547d0          MallocTracer!ATOMIC_Swap8 (<no parameter info>)
54c54854          MallocTracer!ATOMIC_Increment32 (<no parameter info>)
54e28b64          MallocTracer!ADDRINT_AtomicInc (<no parameter info>)
54c35e20          MallocTracer!atexit (<no parameter info>)
54c547fc          MallocTracer!ATOMIC_Swap32 (<no parameter info>)
54c54740          MallocTracer!ATOMIC_SpinDelay (<no parameter info>)
54c533c0          MallocTracer!ATOMIC::LIFO_PTR<LEVEL_BASE::SWMALLOC::FREE_LIST_ELEMENT,3,LEVEL_BASE::ATOMIC_STATS>::PopInternal (<no parameter info>)
54e1a2b0          MallocTracer!abort (<no parameter info>)
54c54810          MallocTracer!ATOMIC_Copy64 (<no parameter info>)
54c547e4          MallocTracer!ATOMIC_Swap16 (<no parameter info>)
54c41710          MallocTracer!ATOMIC::LIFO_CTR<ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT,ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT_HEAP,1,32,unsigned __int64,ATOMIC::NULLSTATS>::Pop (<no parameter info>)
54c54824          MallocTracer!ATOMIC_Increment8 (<no parameter info>)
54c549bb          MallocTracer!ASM_pin_wow64_gate_end (<no parameter info>)
54c5478c          MallocTracer!ATOMIC_CompareAndSwap32 (<no parameter info>)
54c54750          MallocTracer!ATOMIC_CompareAndSwap8 (<no parameter info>)
54c41820          MallocTracer!ATOMIC::LIFO_CTR<ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT,ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT_HEAP,1,32,unsigned __int64,ATOMIC::NULLSTATS>::Push (<no parameter info>)
54c535a0          MallocTracer!ATOMIC::IDSET<7,LEVEL_BASE::ATOMIC_STATS>::ReleaseID (<no parameter info>)
54c547a8          MallocTracer!ATOMIC_CompareAndSwap64 (<no parameter info>)
54c3e660          MallocTracer!ATOMIC::EXPONENTIAL_BACKOFF<LEVEL_BASE::ATOMIC_STATS>::~EXPONENTIAL_BACKOFF<LEVEL_BASE::ATOMIC_STATS> (<no parameter info>)
54c5476c          MallocTracer!ATOMIC_CompareAndSwap16 (<no parameter info>)
0:000> x /D /f MallocTracer!m*

54e21e20          MallocTracer!mbsinit (<no parameter info>)
54c6e450          MallocTracer!mmap (<no parameter info>)
54c3bb40          MallocTracer!malloc (<no parameter info>)
54e21db0          MallocTracer!memchr (<no parameter info>)
54e21e00          MallocTracer!mbrtowc (<no parameter info>)
54e26500          MallocTracer!mbrlen (<no parameter info>)
54e21e40          MallocTracer!mbsnrtowcs (<no parameter info>)
54e261b0          MallocTracer!mbrtoc32 (<no parameter info>)
54c38730          MallocTracer!main (<no parameter info>)
54e1a2f0          MallocTracer!memset (<no parameter info>)
54e26410          MallocTracer!mbstate_get_byte (<no parameter info>)
54e22010          MallocTracer!mbsrtowcs (<no parameter info>)
54e1a1a0          MallocTracer!memmove (<no parameter info>)
54e263e0          MallocTracer!mbstate_bytes_so_far (<no parameter info>)
54e1a2c0          MallocTracer!memcpy (<no parameter info>)
54c6e480          MallocTracer!munmap (<no parameter info>)
54e26420          MallocTracer!mbstate_set_byte (<no parameter info>)
0:000> bp 54c38730
0:000> g
Breakpoint 0 hit
eax=53833cb8 ebx=54f64000 ecx=00000000 edx=54f356c0 esi=54f6500a edi=54f65000
eip=54c38730 esp=01ad19f4 ebp=53833c8c iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MallocTracer!main:
54c38730 55              push    ebp

For DynamoRIO, I'll just point you to the official documentation, since the debugging process is a bit trickier. Check the documentation here.

Pintool (WinMallocTracer)

As mentioned at the beginning, this post is all about Windows, which means it doesn't really make sense to track only malloc and/or free. If we want to play with "real" Windows applications, we need to trace the Windows Heap family of functions.

It's a good time to look again at the diagram shown earlier, which illustrates the relationship between the Windows API calls used to allocate process memory (from the book The Art of Memory Forensics).

If we want to make sure we’ll always “see” the memory allocations performed by Windows applications, we should be looking for RtlAllocateHeap, RtlReAllocateHeap, RtlFreeHeap, VirtualAllocEx, and VirtualFreeEx.

The Pintool below hooks exactly these functions. If you play a bit with multiple applications, you'll realize that accomplishing "our" goal of tracking memory allocations comes with a lot of challenges. The code below tries to overcome some of them.

I won't explain the API calls in detail as I did before, mainly because they are mostly the same. I'll leave the code here for you to go through, and afterwards I'll simply point out the main differences compared to the basic Pintool presented earlier.

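#include "pin.H"
#include <iostream>
#include <fstream>
#include <map>
#include <string>

using namespace std;  // map, string, ofstream, cerr, hex, dec... are used unqualified below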

map<ADDRINT, bool> MallocMap;
ofstream LogFile;
KNOB<string> LogFileName(KNOB_MODE_WRITEONCE, "pintool", "o", "memprofile.out", "Memory trace file name");
KNOB<string> EntryPoint(KNOB_MODE_WRITEONCE, "pintool", "entrypoint", "main", "Guest entry-point function");
KNOB<BOOL> EnumSymbols(KNOB_MODE_WRITEONCE, "pintool", "symbols", "0", "List Symbols");
BOOL start_trace = false;

VOID LogBeforeVirtualAlloc(ADDRINT size)
{
  if (!start_trace)
    return;

  LogFile << "[*] VirtualAllocEx(" << dec << size << ")";
}

VOID LogAfterVirtualAlloc(ADDRINT addr)
{
  if (!start_trace)
    return;

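  if (addr == 0)
  {
    cerr << "[-] Error: VirtualAllocEx() return value was NULL." << endl;
    LogFile << endl;  // terminate the pending "[*] VirtualAllocEx(size)" line
    return;
  }

  map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

  if (it != MallocMap.end())
  {
    if (it->second)
      it->second = false;   // a previously freed address is handed out again
    else
      cerr << "[-] Error: allocating memory not freed!?!" << endl;
  }
  else
    MallocMap.insert(pair<ADDRINT, bool>(addr, false));

  // Complete the "[*] VirtualAllocEx(size)" line for new and reused addresses alike
  LogFile << "\t\t= 0x" << hex << addr << endl;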
}

VOID LogBeforeVirtualFree(ADDRINT addr)
{
  if (!start_trace)
    return;

  map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

  if (it != MallocMap.end())
  {
    if (it->second)
      LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once (Double Free)." << endl;
    else
    {
      it->second = true;    // Mark it as freed
      LogFile << "[*] VirtualFreeEx(0x" << hex << addr << ")" << endl;
    }
  }
  else
    LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl;
}

VOID LogBeforeReAlloc(ADDRINT freed_addr, ADDRINT size)
{
  if (!start_trace)
    return;

  // mark freed_addr as free
  map<ADDRINT, bool>::iterator it = MallocMap.find(freed_addr);

  if (it != MallocMap.end())
  {
    it->second = true;
    LogFile << "[*] RtlHeapfree(0x" << hex << freed_addr << ") from RtlHeapRealloc()" << endl;
  }
  else
    LogFile << "[-] RtlHeapRealloc could not find addr to free??? - 0x" << hex << freed_addr << endl;

  LogFile << "[*] RtlHeapReAlloc(" << dec << size << ")";
}

VOID LogAfterReAlloc(ADDRINT addr)
{
  if (!start_trace)
    return;

  if (addr == 0)
  {
    LogFile << endl;  // reallocation failed: terminate the pending "[*] RtlHeapReAlloc(size)" line
    return;
  }

  map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

  if (it != MallocMap.end())
    it->second = false;   // block grown in place (or address reused): live again
  else
    MallocMap.insert(pair<ADDRINT, bool>(addr, false));  // block moved to an untracked address

  LogFile << "\t\t= 0x" << hex << addr << endl;
}

VOID LogBeforeMalloc(ADDRINT size)
{
  if (!start_trace)
    return;

  LogFile << "[*] RtlAllocateHeap(" << dec << size << ")";
}

VOID LogAfterMalloc(ADDRINT addr)
{
  if (!start_trace)
    return;

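  if (addr == 0)
  {
    cerr << "[-] Error: RtlAllocateHeap() return value was NULL." << endl;
    LogFile << endl;  // terminate the pending "[*] RtlAllocateHeap(size)" line
    return;
  }

  map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

  if (it != MallocMap.end())
  {
    if (it->second)
      it->second = false;   // a previously freed address is handed out again
    else
      cerr << "[-] Error: allocating memory not freed!?!" << endl;
  }
  else
    MallocMap.insert(pair<ADDRINT, bool>(addr, false));

  // Complete the "[*] RtlAllocateHeap(size)" line for new and reused addresses alike
  LogFile << "\t\t= 0x" << hex << addr << endl;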
}

VOID LogFree(ADDRINT addr)
{
  if (!start_trace)
    return;

  map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

  if (it != MallocMap.end())
  {
    if (it->second)
      LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once (Double Free)." << endl;
    else
    {
      it->second = true;    // Mark it as freed
      LogFile << "[*] RtlFreeHeap(0x" << hex << addr << ")" << endl;
    }
  }
  else
    LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl;
}

VOID BeforeMain() {
  start_trace = true;
}
VOID AfterMain() {
  start_trace = false;
}

VOID CustomInstrumentation(IMG img, VOID *v)
{
  for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
  {
    string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);

    if(EnumSymbols.Value())
    {
      LogFile << "" << undFuncName << "" << endl;
      continue;
    }

    if (undFuncName == EntryPoint.Value().c_str())
    {
      RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

      if (RTN_Valid(allocRtn))
      {
        RTN_Open(allocRtn);

        RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)BeforeMain, IARG_END);
        RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)AfterMain, IARG_END);

        RTN_Close(allocRtn);
      }
    }
    if (undFuncName == "RtlAllocateHeap")
    {
      RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

      if (RTN_Valid(allocRtn))
      {
        RTN_Open(allocRtn);
        
        // Record RtlAllocateHeap size
        RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
          IARG_FUNCARG_ENTRYPOINT_VALUE, 2, IARG_END);

        // Record RtlAllocateHeap return address
        RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
          IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
        
        RTN_Close(allocRtn);
      }
    }
    if (undFuncName == "RtlReAllocateHeap")
    {
      RTN reallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

      if (RTN_Valid(reallocRtn))
      {
        RTN_Open(reallocRtn);

        // Record RtlReAllocateHeap freed_addr, size
        RTN_InsertCall(reallocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeReAlloc,
          IARG_FUNCARG_ENTRYPOINT_VALUE, 2, IARG_FUNCARG_ENTRYPOINT_VALUE, 3, IARG_END);

        // Record RtlReAllocateHeap return address
        RTN_InsertCall(reallocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterReAlloc,
          IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);

        RTN_Close(reallocRtn);
      }
    }
    else if (undFuncName == "RtlFreeHeap")
    {
      RTN freeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

      if (RTN_Valid(freeRtn))
      {
        RTN_Open(freeRtn);

        RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
          IARG_FUNCARG_ENTRYPOINT_VALUE, 2,
          IARG_END);

        RTN_Close(freeRtn);
      }
    }
    if (undFuncName == "VirtualAllocEx")
    {
      RTN vrallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

      if (RTN_Valid(vrallocRtn))
      {
        RTN_Open(vrallocRtn);

        RTN_InsertCall(vrallocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeVirtualAlloc,
          IARG_FUNCARG_ENTRYPOINT_VALUE, 2, IARG_END);

        RTN_InsertCall(vrallocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterVirtualAlloc,
          IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);

        RTN_Close(vrallocRtn);
      }
    }
    if (undFuncName == "VirtualFreeEx")
    {
      RTN vrfreeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

      if (RTN_Valid(vrfreeRtn))
      {
        RTN_Open(vrfreeRtn);

        RTN_InsertCall(vrfreeRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeVirtualFree,
          IARG_FUNCARG_ENTRYPOINT_VALUE, 1, IARG_END);

        RTN_Close(vrfreeRtn);
      }
    }
  }
}

VOID FinalFunc(INT32 code, VOID *v)
{
  for (pair<ADDRINT, bool> p : MallocMap)
  {
    if (!p.second)
      LogFile << "[*] Memory at address 0x" << hex << p.first << " allocated but not freed" << endl;
  }

  LogFile.close();
}

int main(int argc, char *argv[])
{
  PIN_InitSymbols();
  PIN_Init(argc, argv);

  LogFile.open(LogFileName.Value().c_str());
  LogFile << "## Memory tracing for PID = " << PIN_GetPid() << " started" << endl;

  if (EnumSymbols.Value())
    LogFile << "### Listing Symbols" << endl;
  else
    LogFile << "### Started tracing after '" << EntryPoint.Value().c_str() << "()' call" << endl;
  
  IMG_AddInstrumentFunction(CustomInstrumentation, NULL);
  PIN_AddFiniFunction(FinalFunc, NULL);
  PIN_StartProgram();

  return 0;
}

This Pintool supports two new options, defined by the KNOB switches below.

KNOB<string> EntryPoint(KNOB_MODE_WRITEONCE, "pintool", "entrypoint", "main", "Guest entry-point function");
KNOB<BOOL> EnumSymbols(KNOB_MODE_WRITEONCE, "pintool", "symbols", "0", "List Symbols");

You can specify the entry-point function of the target/guest application you want to trace. Why is this useful? If you don't, all the initialization code is traced as well, and the output of our Pintool becomes very hard to make sense of (try it). By default, tracing starts only after the function main is called, and stops again when main returns. Obviously, if our target/guest application doesn't have a main function, we'll end up with an output file that contains nothing but its header lines.

Let's look at a specific example: the Windows calc.exe. This binary doesn't have a main function, so we run our Pintool as shown below.

C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -- calc.exe

We’ll get the following output.

## Memory tracing for PID = 1732 started
### Started tracing after 'main()' call

This is expected, since calc.exe doesn't have a main function. So, to trace calc.exe or any other such binary, we need to find its entry point (or any other call after which we want to start tracing). We can load it in IDA, for example, or use the other KNOB switch (-symbols), as shown below, to list all the symbols.

C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -symbols 1 -- calc.exe

And look at the output file (by default memprofile.out) to see if we can find the function we are looking for.

C:\pin> type memprofile.out
## Memory tracing for PID = 5696 started
### Listing Symbols
unnamedImageEntryPoint
InterlockedIncrement
InterlockedDecrement
InterlockedExchange
InterlockedCompareExchange
InterlockedExchangeAdd
KernelBaseGetGlobalData
unnamedImageEntryPoint
GetErrorMode
SetErrorMode
CreateIoCompletionPort
PostQueuedCompletionStatus
GetOverlappedResult
(...)

If you want to see the whole contents of the file, you can find it here. The first symbol listed (unnamedImageEntryPoint) is quite interesting, though, and it's probably what we are looking for. So we can use our Pintool as shown below.

C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -entrypoint unnamedImageEntryPoint -- calc.exe

And if we look at the output this time we’ll get something like:

C:\pin> type memprofile.out
## Memory tracing for PID = 6656 started
### Started tracing after 'unnamedImageEntryPoint()' call
[*] RtlAllocateHeap(32)   = 0x4d9098
[*] RtlAllocateHeap(564)    = 0x2050590
[*] RtlAllocateHeap(520)    = 0x4dcb18
[*] RtlAllocateHeap(1024)   = 0x4dd240
[*] RtlAllocateHeap(532)    = 0x20507d0
[*] RtlAllocateHeap(1152)   = 0x20509f0
[*] RtlAllocateHeap(3608)   = 0x4dd648
[*] RtlAllocateHeap(1804)   = 0x2050e78
[*] RtlFreeHeap(0x4dd648)
(...)

If you want to see the whole contents of the file, you can find it here. As you'll see, the output is still hard to read and make sense of. As I mentioned before, this Pintool can tell there's a problem, but not where it is. I'll try to improve the Pintool, and if you are interested you can follow its future development here. At the very least, every time I detect an issue I'll add a PIN_ApplicationBreakpoint (see here). In some cases it might still be very hard to locate the issue, but it's a starting point. There are also a lot of false positives, as you can see in the output for calc.exe. To validate that the Pintool actually works, we can use the following sample target/guest (I called it ExercisePin2.exe).

#include <windows.h>
#include <stdio.h>

#define PAGELIMIT 80

int my_heap_functions(char *buf) {
  LPVOID h1 = NULL, h2 = NULL, h3 = NULL, h4 = NULL;

  h1 = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 260);

  h2 = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 260);   // intentionally never freed (leak)
  
  HeapFree(GetProcessHeap(), 0, h1);

  h3 = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 520);

  h4 = HeapReAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, h3, 1040);

  HeapFree(GetProcessHeap(), 0, h4);
  return 0;
}

int my_virtual_functions(char *buf) {
  LPVOID lpvBase;
  DWORD dwPageSize;
  BOOL bSuccess;
  SYSTEM_INFO sSysInfo;         // Useful information about the system

  GetSystemInfo(&sSysInfo);     // Initialize the structure.
  dwPageSize = sSysInfo.dwPageSize;

  // Reserve pages in the virtual address space of the process.
  lpvBase = VirtualAlloc(
    NULL,                 // System selects address
    PAGELIMIT*dwPageSize, // Size of allocation
    MEM_RESERVE,          // Allocate reserved pages
    PAGE_NOACCESS);       // Protection = no access

  if (lpvBase == NULL) {
    printf("VirtualAlloc reserve failed.\n");
    return 1;
  }

  bSuccess = VirtualFree(
    lpvBase,       // Base address of block
    0,             // Must be 0 when using MEM_RELEASE
    MEM_RELEASE);  // Release the entire reserved region

  if (!bSuccess)
    printf("VirtualFree release failed.\n");

  return 0;
}

int main(void) {
  my_heap_functions("moo");
  my_virtual_functions("moo");

  return 0;
}

You can find the Visual Studio project here. You can play with it and compare the output with what's expected based on the ExercisePin2.c source code.

C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -symbols 1 -- C:\TARGET\ExercisePin2.exe
C:\pin> type memprofile.out
## Memory tracing for PID = 5600 started
### Listing Symbols
_enc$textbss$end
unnamedImageEntryPoint
main
my_heap_functions
my_virtual_functions
HeapAlloc
HeapReAlloc
HeapFree
GetProcessHeap
GetSystemInfo
(...)

The full output is here. Since the entry-point function is main, we can simply run the Pintool without any extra options.

C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -- C:\TARGET\ExercisePin2.exe
C:\pin> type memprofile.out
## Memory tracing for PID = 4396 started
### Started tracing after 'main()' call
[*] RtlAllocateHeap(260)    = 0x41dd30
[*] RtlAllocateHeap(260)    = 0x41de40
[*] RtlFreeHeap(0x41dd30)
[*] RtlAllocateHeap(520)    = 0x41df50
[*] RtlHeapfree(0x41df50) from RtlHeapRealloc()
[*] RtlHeapReAlloc(1040)    = 0x41df50
[*] RtlFreeHeap(0x41df50)
[*] VirtualAllocEx(327680)    = 0x2410000
[*] VirtualFreeEx(0x2410000)
[*] Memory at address 0x41de40 allocated but not freed

As we can see, tracing memory calls is tricky, but achievable. I'll try to add a few more things to this WinMallocTracer Pintool in the near future. Keep an eye on GitLab if you fancy.

Final notes

As we saw, playing with a DBI framework is not that hard; the challenge lies in doing it right, that is, handling all the corner cases efficiently. Something that looks fairly easy can become very challenging once we try to do it right. The example tool I chose came from a specific need, and from a vulnerability discovery perspective DBI frameworks are indeed very useful. There's a lot of room for improvement, and I plan to keep working on it.

Even though it was Fuzzing that brought me here (that is, to playing with DBI frameworks), I ended up not saying much about how the two relate. Keep in mind that a DBI tool per se won't find many bugs unless you exercise as many code paths as possible. After all, a DBI system only instruments the code that is actually executed. So it's easy to see why we need to combine it with a coverage-guided fuzzer to discover more bugs (preferably exploitable ones).

DBI systems are here to stay. They emerged as a means of working around the restrictions imposed by binary-only code, that is, the lack of access to source code. The need to understand and modify the runtime behavior of computer programs is undeniable.

The field of dynamic binary modification is evolving very fast. New applications and new complex engineering challenges are appearing constantly, and static binary patching and hooking are "things" of the past.

This post documents the first steps for anyone who wants to get into this area. All the code snippets used are available in this GitLab repo, and an improved version of the WinMallocTracer Pintool is available in this GitLab repo.

References (in no particular order)

Videos

Implementing an LLVM based Dynamic Binary Instrumentation framework
DEF CON 15 - Quist and Valsmith - Covert Debugging
HITBSecConf 2009 - Tavis Ormandy - Making Software Dumber
Ole André Vadla Ravnås - Frida: The engineering behind the reverse-engineering
Finding security vulnerabilities with modern fuzzing techniques (RuhrSec 2018) (multiple references to dynamic binary instrumentation)

posted on 2021-06-02 11:01 alexicob