- Into the House of Logic
- Should Reverse Engineering Be Illegal?
- Reverse Engineering Tools and Concepts
- Approaches to Reverse Engineering
- Methods of the Reverser
- Writing Interactive Disassembler (IDA) Plugins
- Decompiling and Disassembling Software
- Decompilation in Practice: Reversing helpctr.exe
- Automatic, Bulk Auditing for Vulnerabilities
- Writing Your Own Cracking Tools
- Building a Basic Code Coverage Tool
- Conclusion
Writing Your Own Cracking Tools
Reverse engineering is mostly a tedious sport consisting of thousands of small steps and encompassing bazillions of facts. The human mind cannot manage all the data needed to do this in a reasonable way. If you're like most people, you are going to need tools to help you manage all the data. There are quite a number of debugging tools available on the market and in freeware form, but sadly most of them do not present a complete solution. For this reason, you are likely to need to write your own tools.
Coincidentally, writing tools is a great way to learn about software. Writing tools requires a real understanding of the architecture of software: most important, how software tends to be structured in memory and how the heap and stack operate. Learning by writing tools is more efficient than a blind brute-force approach using pencil and paper. Your skills will be better honed by tool creation, and the larval stage (learning period) will not take as long.
x86 Tools
The most common processor in most workstations seems to be the Intel x86 family, which includes the 386, 486, and Pentium chips. Other manufacturers also make compatible chips. The chips are a family because they have a subset of features that are common to all the processors. This subset is called the x86 feature set. A program that is running on an x86 processor will usually have a stack, a heap, and a set of instructions. The x86 processor has registers that contain memory addresses. These addresses indicate the location in memory where important data structures reside.
The Basic x86 Debugger
Microsoft supplies a relatively easy-to-use debugging API for Windows. The API allows you to access debugging events from a user-mode program using a simple loop. The structure of the program is quite simple:
```cpp
DEBUG_EVENT dbg_evt;

m_hProcess = OpenProcess(PROCESS_ALL_ACCESS | PROCESS_VM_OPERATION, 0, mPID);
if(m_hProcess == NULL)
{
    _error_out("[!] OpenProcess Failed !\n");
    return;
}

// Alright, we have the process opened; time to start debugging.
if(!DebugActiveProcess(mPID))
{
    _error_out("[!] DebugActiveProcess failed !\n");
    return;
}

// Don't kill the process when the debugger thread exits.
// Note: DebugSetProcessKillOnExit requires Windows XP or later.
fDebugSetProcessKillOnExit(FALSE);

while(1)
{
    if(WaitForDebugEvent(&dbg_evt, DEBUGLOOP_WAIT_TIME))
    {
        // Handle the debug events.
        OnDebugEvent(dbg_evt);

        // DBG_CONTINUE reports the event as handled. For exceptions the
        // debugger did not cause (an access violation, for example),
        // DBG_EXCEPTION_NOT_HANDLED passes them to the target's own handlers.
        if(!ContinueDebugEvent(mPID, dbg_evt.dwThreadId, DBG_CONTINUE))
        {
            _error_out("ContinueDebugEvent failed\n");
            break;
        }
    }
    else
    {
        // A timeout is reported as ERROR_SEM_TIMEOUT (121); ignore it.
        DWORD err = GetLastError();
        if(ERROR_SEM_TIMEOUT != err)
        {
            _error_out("WaitForDebugEvent failed\n");
            break;
        }
    }

    // Exit if debugger has been disabled.
    if(FALSE == mDebugActive)
    {
        break;
    }
}

RemoveAllBreakPoints();
```
This code shows how you can connect to an already running process. You can also launch a process in debug mode by passing the DEBUG_PROCESS flag to CreateProcess. Either way, the debugging loop is the same: You simply wait for debug events. The loop continues until there is an error or the mDebugActive flag is set to FALSE. In either case, once the debugger exits, it is automatically detached from the process. If you are running on Windows XP or later, the debugger is detached gracefully and the target process can continue executing. If you are on an older version of Windows, the debugger API will kill the patient (the target process dies). In fact, it is considered quite annoying that the debugger API kills the target process on detach! In some people's opinion this was a serious design flaw of the Microsoft debugging API that should have been fixed in version 0.01. Fortunately, it was finally fixed in Windows XP.
On Breakpoints
Breakpoints are central to debugging. Elsewhere in the book you will find references to standard breakpoint techniques. A breakpoint can be issued using a simple instruction. The standard breakpoint instruction under x86 is interrupt 3 (INT 3). The nice thing about interrupt 3 is that it can be coded as a single byte of data, 0xCC. This means it can be patched over existing code with minimal concern for the surrounding code bytes. To set this breakpoint, copy the original byte to a safe location and replace it with the byte 0xCC.
Breakpoint instructions are sometimes grouped together into blocks and written to unused regions of memory that should never execute. Thus, if the program "accidentally" jumps to one of these locations, the debug interrupt will fire. You sometimes see this on the program stack in regions between stack frames.
Of course, interrupt 3 doesn't have to be the way a breakpoint is handled. It could just as easily be interrupt 1, or anything for that matter. The interrupts are software driven and the software of the OS decides how it will handle the event. This is controlled via the interrupt descriptor table (when the processor is running in protected mode) or the interrupt vector table (when running in real mode).
To set a breakpoint, you must first save the original byte you are replacing. Then, when you remove the breakpoint, you can put the saved byte back in its original location. The following code illustrates saving the original value before setting a breakpoint:
```cpp
////////////////////////////////////////////////////////////////////////////////
// Change the page protection so we can read the original target instruction,
// then change it back when we are done.
////////////////////////////////////////////////////////////////////////////////
MEMORY_BASIC_INFORMATION mbi;
VirtualQueryEx(m_hProcess,
               (void *)(m_bp_address),
               &mbi,
               sizeof(MEMORY_BASIC_INFORMATION));

DWORD dwOldProtect;
// Make the page accessible before reading from it.
if(!VirtualProtectEx(m_hProcess, mbi.BaseAddress, mbi.RegionSize,
                     PAGE_EXECUTE_READWRITE, &dwOldProtect))
{
    _error_out("VirtualProtect failed!");
    return NULL;
}

// Now read the original byte.
BOOL bRead = ReadProcessMemory(m_hProcess,
                               (void *)(m_bp_address),
                               &(m_original_byte),
                               1,
                               NULL);

// Change protection back, whether or not the read succeeded.
VirtualProtectEx(m_hProcess, mbi.BaseAddress, mbi.RegionSize,
                 mbi.Protect, &dwOldProtect);

if(!bRead)
{
    _error_out("[!] Failed to read process memory ! \n");
    return NULL;
}

// Compare as unsigned so the check also works if m_original_byte is a char.
if((unsigned char)m_original_byte == 0xCC)
{
    _error_out("[!] Multiple setting of the same breakpoint ! \n");
    return NULL;
}

SetBreakpoint();
```
The previous code alters the memory protection so we can read the target address. It stores the original data byte. The following code then overwrites the memory with a 0xCC instruction. Notice that we check the memory to determine whether a breakpoint was already set before we arrived.
```cpp
bool SetBreakpoint()
{
    char a_bpx = '\xCC';

    if(!m_hProcess)
    {
        _error_out("Attempt to set breakpoint without target process");
        return false;
    }

    ////////////////////////////////////////////////////////////////////////////////
    // Change the page protection so we can write, then change it back.
    ////////////////////////////////////////////////////////////////////////////////
    MEMORY_BASIC_INFORMATION mbi;
    VirtualQueryEx(m_hProcess,
                   (void *)(m_bp_address),
                   &mbi,
                   sizeof(MEMORY_BASIC_INFORMATION));

    DWORD dwOldProtect;
    if(!VirtualProtectEx(m_hProcess, mbi.BaseAddress, mbi.RegionSize,
                         PAGE_EXECUTE_READWRITE, &dwOldProtect))
    {
        _error_out("VirtualProtect failed!");
        return false;
    }

    BOOL bWritten = WriteProcessMemory(m_hProcess, (void *)(m_bp_address),
                                       &a_bpx, 1, NULL);
    DWORD dwWriteError = GetLastError();

    // Change protection back, whether or not the write succeeded.
    VirtualProtectEx(m_hProcess, mbi.BaseAddress, mbi.RegionSize,
                     mbi.Protect, &dwOldProtect);

    if(!bWritten)
    {
        char _c[255];
        sprintf(_c, "[!] Failed to write process memory, error %lu ! \n",
                dwWriteError);
        _error_out(_c);
        return false;
    }

    if(!m_persistent)
    {
        m_refcount++;
    }

    // Make sure the processor does not execute a stale copy of the old byte.
    FlushInstructionCache(m_hProcess, (void *)(m_bp_address), 1);
    return true;
}
```
The previous code writes a single 0xCC byte into the target process memory. As an instruction, this byte decodes as interrupt 3. We must first change the page protection of the target memory so that we can write to it. We change the protection back to the original value before allowing the program to continue. The API calls used here are fully documented in the Microsoft Developer Network (MSDN), and we encourage you to check them out there.
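The save-and-patch logic is easier to see without the Windows API calls around it. What follows is a minimal, portable sketch. It assumes that a `std::vector<uint8_t>` stands in for the target's code pages, so plain indexing replaces ReadProcessMemory and WriteProcessMemory. The `BreakpointTable` class is hypothetical and is not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// In-memory model of software breakpoints (hypothetical helper class).
// The vector stands in for the target's code; a real debugger would go
// through ReadProcessMemory/WriteProcessMemory instead.
class BreakpointTable {
public:
    static constexpr uint8_t kInt3 = 0xCC;  // x86 encoding of INT 3

    explicit BreakpointTable(std::vector<uint8_t>& memory) : mem_(memory) {}

    // Save the original byte, then patch in 0xCC. Refuses out-of-range
    // addresses and locations that already hold a breakpoint byte.
    bool set(std::size_t addr) {
        if (addr >= mem_.size()) return false;
        if (saved_.count(addr) != 0 || mem_[addr] == kInt3) return false;
        saved_[addr] = mem_[addr];
        mem_[addr] = kInt3;
        return true;
    }

    // Put the saved byte back and forget the breakpoint.
    bool remove(std::size_t addr) {
        auto it = saved_.find(addr);
        if (it == saved_.end()) return false;
        mem_[addr] = it->second;
        saved_.erase(it);
        return true;
    }

    bool is_set(std::size_t addr) const { return saved_.count(addr) != 0; }

private:
    std::vector<uint8_t>& mem_;
    std::map<std::size_t, uint8_t> saved_;  // address -> original byte
};
```

Keeping the saved bytes in a map keyed by address lets several breakpoints coexist. Comparing against 0xCC as an unsigned byte also avoids the sign-extension trap that a plain `char` comparison can fall into.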
Reading and Writing Memory
Once you have hit a breakpoint, the next task is usually to examine memory. If you want to use some of the debugging techniques discussed in this book, you need to examine memory for user-supplied data. Reading and writing memory is easily accomplished in the Windows environment using a simple API. You can query to see what kind of memory is available, and you can also read and write memory using routines that are similar to memcpy.
If you want to query a memory location to determine whether it's valid or what properties are set (read, write, nonpaged, and so on), you can use the VirtualQueryEx routine.
```cpp
////////////////////////////////////////////////////////
// Check that we can read the target memory address.
////////////////////////////////////////////////////////
bool can_read( CDThread *theThread, void *p )
{
    MEMORY_BASIC_INFORMATION mbi;

    if(VirtualQueryEx(theThread->m_hProcess, p, &mbi,
                      sizeof(MEMORY_BASIC_INFORMATION)) == 0)
    {
        return false;   // Query failed, e.g., address out of range.
    }

    if(mbi.State != MEM_COMMIT)
    {
        return false;
    }

    // PAGE_GUARD is a modifier bit combined with a base protection,
    // so it must be tested with a mask rather than compared with !=.
    if(mbi.Protect & (PAGE_GUARD | PAGE_NOACCESS))
    {
        return false;
    }

    return (mbi.Protect & (PAGE_READONLY | PAGE_READWRITE | PAGE_WRITECOPY |
                           PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE |
                           PAGE_EXECUTE_WRITECOPY)) != 0;
}
```
The example function will determine whether the memory address is readable. If you want to read or write to memory you can use the ReadProcessMemory and WriteProcessMemory API calls.
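The Protect field returned by VirtualQueryEx is a bit mask. A base protection such as PAGE_READWRITE can be combined with modifiers such as PAGE_GUARD, so an equality test like `Protect != PAGE_GUARD` never matches a real guard page. The sketch below contrasts an equality-based check with a mask-based one. To stay portable, it assumes local copies of the constant values from the Windows SDK header winnt.h instead of including windows.h. The `k...` names and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of winnt.h protection/state values (assumed, for portability).
constexpr uint32_t kPageNoAccess         = 0x01;
constexpr uint32_t kPageReadOnly         = 0x02;
constexpr uint32_t kPageReadWrite        = 0x04;
constexpr uint32_t kPageWriteCopy        = 0x08;
constexpr uint32_t kPageExecute          = 0x10;
constexpr uint32_t kPageExecuteRead      = 0x20;
constexpr uint32_t kPageExecuteReadWrite = 0x40;
constexpr uint32_t kPageExecuteWriteCopy = 0x80;
constexpr uint32_t kPageGuard            = 0x100;  // modifier bit
constexpr uint32_t kMemCommit            = 0x1000;

// Equality-based check: every test compares the whole Protect value,
// so a combined value such as READWRITE|GUARD slips through.
bool naive_check(uint32_t state, uint32_t protect) {
    return state == kMemCommit && protect != kPageReadOnly &&
           protect != kPageExecuteRead && protect != kPageGuard &&
           protect != kPageNoAccess;
}

// Mask-based readability test: committed, not a guard or no-access page,
// and at least one readable base protection bit present.
bool is_readable(uint32_t state, uint32_t protect) {
    if (state != kMemCommit) return false;
    if (protect & (kPageGuard | kPageNoAccess)) return false;
    const uint32_t readable = kPageReadOnly | kPageReadWrite | kPageWriteCopy |
                              kPageExecuteRead | kPageExecuteReadWrite |
                              kPageExecuteWriteCopy;
    return (protect & readable) != 0;
}
```

For a guard page over read/write memory, `naive_check` accepts the page while `is_readable` rejects it. The equality-based check also rejects PAGE_READONLY, which is perfectly readable.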
Debugging Multithreaded Programs
If the program has multiple threads, you can control the behavior of each individual thread (something that is very helpful when attacking more modern code). There are API calls for manipulating threads. Each thread has a CONTEXT: a data structure that holds the thread's processor state, including its registers and the current instruction pointer (EIP). By modifying and querying context structures, you can control and track all the threads of a multithreaded program. Here is an example of setting the instruction pointer of a given thread:
```cpp
// The thread must be stopped (for example, while a debug event is being
// handled); a running thread has no stable context to read or modify.
bool SetEIP(DWORD theEIP)
{
    CONTEXT ctx;
    HANDLE hThread = fOpenThread(THREAD_ALL_ACCESS, FALSE, m_thread_id);
    if(hThread == NULL)
    {
        _error_out("[!] OpenThread failed ! \n");
        return false;
    }

    ctx.ContextFlags = CONTEXT_FULL;
    if(!::GetThreadContext(hThread, &ctx))
    {
        _error_out("[!] GetThreadContext failed ! \n");
        CloseHandle(hThread);
        return false;
    }

    ctx.Eip = theEIP;
    ctx.ContextFlags = CONTEXT_FULL;
    if(!::SetThreadContext(hThread, &ctx))
    {
        _error_out("[!] SetThreadContext failed ! \n");
        CloseHandle(hThread);
        return false;
    }

    CloseHandle(hThread);
    return true;
}
```
From this example you can see how to read and set the thread context structure. The thread context structure is fully documented in the Microsoft header files (winnt.h). Note that the context flag CONTEXT_FULL is set before each get or set operation. On x86, CONTEXT_FULL selects the control, integer, and segment registers, which covers EIP and EFlags. The floating-point and debug registers require additional flags, such as CONTEXT_DEBUG_REGISTERS, or CONTEXT_ALL.
Remember to close your thread handle when you are finished with the operation; otherwise, you will leak handles. The example uses an API call called OpenThread. If you cannot link your program to OpenThread (for example, because your SDK is too old to declare it), you will need to import the call manually. This has been done in the example, which uses a function pointer named fOpenThread. To initialize fOpenThread, you must look up the function's address in KERNEL32.DLL at runtime:
```cpp
typedef HANDLE (__stdcall *FOPENTHREAD)
(
    DWORD dwDesiredAccess,   // Access right
    BOOL bInheritHandle,     // Handle inheritance option
    DWORD dwThreadId         // Thread identifier
);

FOPENTHREAD fOpenThread = NULL;

// Use the ANSI variant explicitly so this also compiles in UNICODE builds.
fOpenThread = (FOPENTHREAD) GetProcAddress(
    GetModuleHandleA("kernel32.dll"),
    "OpenThread");

if(!fOpenThread)
{
    _error_out("[!] failed to get openthread function!\n");
}
```
This is a particularly useful block of code because it illustrates how to define a function pointer type and resolve a DLL export manually at runtime with GetProcAddress. You may use variations of this syntax for almost any exported DLL function.
Enumerate Threads or Processes
Using the "toolhelp" API that is supplied with Windows you can query all running processes and threads. You can use this code to query all running threads in your debug target.
```cpp
// For the target process, build a
// thread structure for each thread.
// Requires <tlhelp32.h>.
HANDLE hProcessSnap = NULL;
hProcessSnap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, mPID);
if (hProcessSnap == INVALID_HANDLE_VALUE)
{
    _error_out("toolhelp snap failed\n");
    return;
}
else
{
    THREADENTRY32 the;
    the.dwSize = sizeof(THREADENTRY32);

    BOOL bret = Thread32First(hProcessSnap, &the);
    while(bret)
    {
        // Create a thread structure for threads owned by the target only.
        if(the.th32OwnerProcessID == mPID)
        {
            CDThread *aThread = new CDThread;
            aThread->m_thread_id = the.th32ThreadID;
            aThread->m_hProcess = m_hProcess;
            mThreadList.push_back(aThread);
        }
        bret = Thread32Next(hProcessSnap, &the);
    }

    // Release the snapshot handle.
    CloseHandle(hProcessSnap);
}
```
In this example, a CDThread object is built and initialized for each thread. The THREADENTRY32 structure returned for each thread contains several values of interest to a debugger, and we encourage you to consult the Microsoft documentation on this API. Note that the code checks the owner process identifier (PID) of each thread to make sure it belongs to the debug target process. This check is necessary because a TH32CS_SNAPTHREAD snapshot includes the threads of every process in the system; the process ID argument is ignored for this flag.
Single Stepping
Tracing the flow of program execution is very important when you want to know if the attacker (or maybe you) can control logic. For example, if the 13th byte of the packet is being passed to a switch statement, the attacker controls the switch statement by virtue of the fact that the attacker controls the 13th byte of the packet.
Single stepping is a feature of the x86 processor. A special flag in the EFLAGS register, the trap flag (TF), causes the processor to execute a single instruction and then raise a debug interrupt (interrupt 1). Using the single-step interrupt, a debugger can examine each and every instruction that is executing. You can also examine memory at each step using the routines listed earlier. In fact, this is exactly what a tool called The PIT does. [15] These techniques are all fairly simple, but when properly combined, they result in a very powerful debugger.
To put the processor into single step, you must set the single-step flag. The following code illustrates how to do this:
```cpp
// The trap flag is bit 8 of EFLAGS.
#define TF_BIT 0x100

bool SetSingleStep()
{
    CONTEXT ctx;
    HANDLE hThread = fOpenThread(THREAD_ALL_ACCESS, FALSE, m_thread_id);
    if(hThread == NULL)
    {
        _error_out("[!] Failed to Open the BPX thread !\n");
        return false;
    }

    // Read the thread's current register state.
    ctx.ContextFlags = CONTEXT_FULL;
    if(!::GetThreadContext(hThread, &ctx))
    {
        _error_out("[!] GetThreadContext failed ! \n");
        CloseHandle(hThread);
        return false;
    }

    // Set single step for this thread.
    ctx.EFlags |= TF_BIT;
    ctx.ContextFlags = CONTEXT_FULL;
    if(!::SetThreadContext(hThread, &ctx))
    {
        _error_out("[!] SetThreadContext failed ! \n");
        CloseHandle(hThread);
        return false;
    }

    CloseHandle(hThread);
    return true;
}
```
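Note that we set the trap flag through the thread context structure. The thread ID is stored in a variable called m_thread_id. To single step a multithreaded program, you must set the trap flag on every thread you want to trace. The trap flag is cleared when the single-step interrupt is delivered, so the debugger must set it again each time it wants to step another instruction.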
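The trap-flag mechanics can be modeled without real hardware. This sketch assumes a hypothetical `ToyThread` whose "instructions" are just lengths added to EIP. After each instruction executed with TF set, the model clears TF and calls the debugger's handler, mirroring the x86 single-step trap. It illustrates why a tracer has to re-arm TF on every step.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

constexpr uint32_t kTrapFlag = 0x100;  // TF is bit 8 of EFLAGS

uint32_t set_trap_flag(uint32_t eflags)   { return eflags | kTrapFlag; }
uint32_t clear_trap_flag(uint32_t eflags) { return eflags & ~kTrapFlag; }

// Hypothetical thread model: only EIP and EFLAGS matter here.
struct ToyThread {
    uint32_t eip = 0;
    uint32_t eflags = 0x202;  // IF set, plus the always-one reserved bit 1
};

// "Executes" a straight-line program given as instruction lengths. If TF was
// set when an instruction started, TF is cleared afterward and the debugger's
// single-step handler runs with the thread's new EIP.
void run(ToyThread& t, const std::vector<uint32_t>& insn_lengths,
         const std::function<void(ToyThread&)>& on_step) {
    for (uint32_t len : insn_lengths) {
        bool trapping = (t.eflags & kTrapFlag) != 0;
        t.eip += len;
        if (trapping) {
            t.eflags = clear_trap_flag(t.eflags);
            on_step(t);
        }
    }
}
```

A handler that calls `set_trap_flag` again sees every instruction boundary. A handler that does not sees only the first one, after which the thread runs freely.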
Patching
If you are using the kind of breakpoints described here, you have already experienced patching. By reading the original byte of an instruction and replacing it with 0xCC, you patched the original program! Of course, the technique can be used to patch in much more than a single instruction. Patching can be used to insert branching statements and new code blocks, and even to overwrite static data. Patching is one way that software pirates have cracked digital copy protection mechanisms. In fact, many interesting things are made possible by changing only a single jump statement. For example, if a program has a block of code that checks the license file, all the software pirate needs to do is insert a jump that branches around the license check. [16] If you are interested in software cracking, there are literally thousands of documents on the Net published on the subject. These are easily located by searching the Internet for "software cracking."
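Patching a single branch usually means overwriting one opcode byte. On x86, the short conditional jump JE is encoded as 0x74 followed by a signed 8-bit displacement. The short unconditional JMP is 0xEB with the same displacement layout, so changing 0x74 to 0xEB makes the branch always taken. The hypothetical helper below checks the expected original bytes before writing, so it refuses to patch the wrong build of a binary. It works on an in-memory byte buffer, not on a file or a live process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Overwrite bytes at `offset` only if the bytes currently there match
// `expected`. Returns false (and leaves the image untouched) otherwise.
bool patch_bytes(std::vector<uint8_t>& image, std::size_t offset,
                 const std::vector<uint8_t>& expected,
                 const std::vector<uint8_t>& replacement) {
    if (expected.size() != replacement.size()) return false;
    if (offset > image.size() || image.size() - offset < expected.size())
        return false;
    for (std::size_t i = 0; i < expected.size(); ++i)
        if (image[offset + i] != expected[i]) return false;
    for (std::size_t i = 0; i < replacement.size(); ++i)
        image[offset + i] = replacement[i];
    return true;
}
```

For example, in the bytes `85 C0 74 05` (`test eax, eax` followed by `je +5`), patching offset 2 from 0x74 to 0xEB turns the conditional jump into an unconditional one. The displacement byte is left untouched.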
Patching is an important skill to learn. It allows you, in many cases, to fix a software bug. Of course, it also allows you to insert a software bug. You may know that a certain file is being used by the server software of your target. You can insert a helpful backdoor using patching techniques. There is a good example of a software patch (patching the NT kernel) discussed in Chapter 8.
Fault Injection
Fault injection can take many forms [Voas and McGraw, 1999]. At its most basic, the idea is simply to supply strange or unexpected inputs to a software program and see what happens. Variations of the technique involve mutating the code and injecting corruption into the data heap or program stack. The goal is to cause the software to fail in interesting ways.
Under fault injection, software will always fail. The question is how it fails. Does the software fail in a way that allows an attacker to gain access to the system? Does the software reveal secret information? Does the failure result in a cascade failure that affects other parts of the system? Failures that do not cause damage to the system indicate a fault-tolerant system.
Fault injection is one of the most powerful testing methodologies ever invented, yet it remains one of the most underused by commercial software vendors. This is one of the reasons why commercial software has so many bugs today. Many so-called software engineers subscribe to the philosophy that a rigid software development process necessarily results in secure and bug-free code, but it ain't necessarily so. The real world has shown us repeatedly that without a solid testing strategy, code will always have dangerous bugs. It's almost amusing (from an attacker's perspective) to know that software testing is still receiving the most meager of budgets in most software houses today. This means the world will belong to the attackers for many years to come.
Fault injection on software input is a good way to test for vulnerabilities. The reason is simple: The attacker controls the software input, so it's natural to test every possible input combination that an attacker can supply. Eventually you are bound to find a combination that exploits the software, right?! [17]
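Testing every possible input is rarely feasible, but exhaustive single-byte mutation of a small seed input is, and it makes the idea concrete. The sketch assumes a hypothetical `parse_packet` target that dispatches on the 13th byte of a packet (index 12) through a four-entry handler table. Rather than invoking undefined behavior, the model reports an out-of-range index as a fault. The harness changes one byte at a time to every possible value and records which mutations make the target fail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

enum class Outcome { Ok, Fault };

// Hypothetical target: the 13th byte (index 12) selects one of four handlers.
// A buggy real program might index its table with no bounds check; this
// model reports the out-of-range index as a Fault instead.
Outcome parse_packet(const std::vector<uint8_t>& pkt) {
    if (pkt.size() < 13) return Outcome::Ok;  // too short: silently ignored
    uint8_t opcode = pkt[12];
    if (opcode >= 4) return Outcome::Fault;   // would read past the table
    return Outcome::Ok;
}

// Exhaustive single-byte fault injection: for every position, try every
// byte value and record the (position, value) pairs that cause a fault.
std::vector<std::pair<std::size_t, uint8_t>> inject_single_byte_faults(
    const std::vector<uint8_t>& seed,
    const std::function<Outcome(const std::vector<uint8_t>&)>& target) {
    std::vector<std::pair<std::size_t, uint8_t>> faults;
    for (std::size_t pos = 0; pos < seed.size(); ++pos) {
        for (int v = 0; v < 256; ++v) {
            std::vector<uint8_t> mutated = seed;
            mutated[pos] = static_cast<uint8_t>(v);
            if (target(mutated) == Outcome::Fault)
                faults.emplace_back(pos, static_cast<uint8_t>(v));
        }
    }
    return faults;
}
```

With a 16-byte all-zero seed, the harness reports 252 faults, all at index 12 (values 4 through 255). Against a real program, the target would run under a debugger or in a separate process, so that a genuine crash can be caught and classified.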
Process Snapshots
When a breakpoint fires, the program becomes frozen in mid run. All execution in all threads is stopped. It is possible at this point to use the memory routines to read or write any part of the program memory. A typical program will have several relevant memory sections. This is a snapshot of memory from the name server running BIND 9.02 under Windows NT:
```text
named.exe:
Found memory based at 0x00010000, size 4096
Found memory based at 0x00020000, size 4096
Found memory based at 0x0012d000, size 4096
Found memory based at 0x0012e000, size 8192
Found memory based at 0x00140000, size 184320
Found memory based at 0x00240000, size 24576
Found memory based at 0x00250000, size 4096
Found memory based at 0x00321000, size 581632
Found memory based at 0x003b6000, size 4096
Found memory based at 0x003b7000, size 4096
Found memory based at 0x003b8000, size 4096
Found memory based at 0x003b9000, size 12288
Found memory based at 0x003bc000, size 8192
Found memory based at 0x003be000, size 8192
Found memory based at 0x003c0000, size 8192
Found memory based at 0x003c2000, size 8192
Found memory based at 0x003c4000, size 4096
Found memory based at 0x003c5000, size 4096
Found memory based at 0x003c6000, size 12288
Found memory based at 0x003c9000, size 4096
Found memory based at 0x003ca000, size 4096
Found memory based at 0x003cb000, size 4096
Found memory based at 0x003cc000, size 8192
Found memory based at 0x003e1000, size 12288
Found memory based at 0x003e5000, size 4096
Found memory based at 0x003f1000, size 24576
Found memory based at 0x003f8000, size 4096
Found memory based at 0x0042a000, size 8192
Found memory based at 0x0042c000, size 8192
Found memory based at 0x0042e000, size 8192
Found memory based at 0x00430000, size 4096
Found memory based at 0x00441000, size 491520
Found memory based at 0x004d8000, size 45056
Found memory based at 0x004f1000, size 20480
Found memory based at 0x004f7000, size 16384
Found memory based at 0x00500000, size 65536
Found memory based at 0x00700000, size 4096
Found memory based at 0x00790000, size 4096
Found memory based at 0x0089c000, size 4096
Found memory based at 0x0089d000, size 12288
Found memory based at 0x0099c000, size 4096
Found memory based at 0x0099d000, size 12288
Found memory based at 0x00a9e000, size 4096
Found memory based at 0x00a9f000, size 4096
Found memory based at 0x00aa0000, size 503808
Found memory based at 0x00c7e000, size 4096
Found memory based at 0x00c7f000, size 135168
Found memory based at 0x00cae000, size 4096
Found memory based at 0x00caf000, size 4096
Found memory based at 0x0ffed000, size 8192
Found memory based at 0x0ffef000, size 4096
Found memory based at 0x1001f000, size 4096
Found memory based at 0x10020000, size 12288
Found memory based at 0x10023000, size 4096
Found memory based at 0x10024000, size 4096
Found memory based at 0x71a83000, size 8192
Found memory based at 0x71a95000, size 4096
Found memory based at 0x71aa5000, size 4096
Found memory based at 0x71ac2000, size 4096
Found memory based at 0x77c58000, size 8192
Found memory based at 0x77c5a000, size 20480
Found memory based at 0x77cac000, size 4096
Found memory based at 0x77d2f000, size 4096
Found memory based at 0x77d9d000, size 8192
Found memory based at 0x77e36000, size 4096
Found memory based at 0x77e37000, size 8192
Found memory based at 0x77e39000, size 8192
Found memory based at 0x77ed6000, size 4096
Found memory based at 0x77ed7000, size 8192
Found memory based at 0x77fc5000, size 20480
Found memory based at 0x7ffd9000, size 4096
Found memory based at 0x7ffda000, size 4096
Found memory based at 0x7ffdb000, size 4096
Found memory based at 0x7ffdc000, size 4096
Found memory based at 0x7ffdd000, size 4096
Found memory based at 0x7ffde000, size 4096
Found memory based at 0x7ffdf000, size 4096
```
You can read all these memory sections and store them. You can think of this as a snapshot of the program. If you allow the program to continue executing, you can freeze it at any time in the future using another breakpoint. At any point where the program is frozen, you can then write back the original memory that you saved earlier. This effectively "restarts" the program at the point where you took the snapshot. This means you can continually keep "rewinding" the program in time. For the rewind to be consistent, the register state of each thread (its CONTEXT, including EIP and ESP) must be saved and restored along with memory. State held outside the process, such as open files or network connections, is not rewound.
For automated testing, this is a powerful technique. You can take a snapshot of a program and restart it. After restoring the memory, you can then fiddle with memory, add corruption, or simulate different types of attack input. Then, once running, the program will act on the faulty input. You can apply this process in a loop and keep testing the same code with different perturbations of input. This automated approach can allow you to test millions of input combinations.
The following code illustrates how to take a snapshot of a target process. The code performs a query on the entire possible range of memory. For each valid location, the memory is copied into a list of structures:
```cpp
#include <windows.h>
#include <stdlib.h>
#include <list>

struct mb
{
    MEMORY_BASIC_INFORMATION mbi;
    char *p;
};

std::list<struct mb *> gMemList;

// Assumes a 32-bit target; hProcess is an open handle to the debug target.
void takesnap()
{
    DWORD start = 0;
    SIZE_T lpRead;

    while(start < 0xFFFFFFFF)
    {
        MEMORY_BASIC_INFORMATION mbi;
        if(VirtualQueryEx(hProcess, (void *)(ULONG_PTR)start, &mbi,
                          sizeof(MEMORY_BASIC_INFORMATION)) == 0)
        {
            break;  // No more queryable memory (e.g., above the user-mode range).
        }

        // Save committed, writable memory only. PAGE_GUARD is a modifier bit,
        // so it is tested with a mask rather than compared with !=.
        if( (mbi.State == MEM_COMMIT) &&
            (mbi.Protect & (PAGE_READWRITE | PAGE_WRITECOPY |
                            PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)) &&
            !(mbi.Protect & PAGE_GUARD) )
        {
            TRACE("Found memory based at %p, size %lu\n",
                  mbi.BaseAddress, (unsigned long)mbi.RegionSize);

            struct mb *b = new mb;
            b->mbi = mbi;
            b->p = (char *)malloc(mbi.RegionSize);

            lpRead = 0;
            if(!ReadProcessMemory(hProcess, mbi.BaseAddress, b->p,
                                  mbi.RegionSize, &lpRead))
            {
                TRACE("ReadProcessMemory failed %lu\nRead %lu",
                      GetLastError(), (unsigned long)lpRead);
            }
            if(mbi.RegionSize != lpRead)
            {
                TRACE("Read short bytes %lu != %lu\n",
                      (unsigned long)mbi.RegionSize, (unsigned long)lpRead);
            }
            gMemList.push_front(b);
        }

        // Stop if the next address would wrap past 0xFFFFFFFF.
        if(start + mbi.RegionSize < start)
            break;
        start += mbi.RegionSize;
    }
}
```
The code uses the VirtualQueryEx API call to walk memory from 0 to 0xFFFFFFFF. For each address queried, the call returns the size of the memory region that contains it, and the next query is placed just beyond that region. In this way the same memory region is not queried more than once. If the memory region is committed, then it is in use. We check that the memory is writable so that we only save memory regions that might be modified. Clearly, read-only memory is not going to be modified, so there is no reason to save it. If you want to be thorough, or if you suspect that the target program changes memory protections during execution, you can save all committed regions instead.
If you want to restore the program state, you can write back all the saved memory regions:
```cpp
void setsnap()
{
    std::list<struct mb *>::iterator ff = gMemList.begin();
    while(ff != gMemList.end())
    {
        struct mb *u = *ff;
        if(u)
        {
            SIZE_T lpBytes = 0;
            TRACE("Writing memory based at %p, size %lu\n",
                  u->mbi.BaseAddress, (unsigned long)u->mbi.RegionSize);

            if(!WriteProcessMemory(hProcess,
                                   u->mbi.BaseAddress,
                                   u->p,
                                   u->mbi.RegionSize,
                                   &lpBytes))
            {
                TRACE("WriteProcessMemory failed, error %lu\n", GetLastError());
            }

            if(lpBytes != u->mbi.RegionSize)
            {
                TRACE("Warning, write failed %lu != %lu\n",
                      (unsigned long)lpBytes, (unsigned long)u->mbi.RegionSize);
            }
        }
        ff++;
    }
}
```
The code to write back the memory is much simpler. It does not need to query the memory regions; it simply writes the memory regions back to their original locations.
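The snapshot logic can be exercised without a live process. This sketch assumes a hypothetical `AddressSpace` map whose regions tile the 32-bit address space, with `find(start)` standing in for VirtualQueryEx. It walks region by region, saves only committed, writable regions, and stops when the 32-bit address would wrap. As in the listings above, only memory is restored; thread contexts are not part of this model.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of a 32-bit address space: regions keyed by base address.
struct Region {
    uint32_t size;
    bool committed;
    bool writable;
    std::vector<uint8_t> bytes;  // the region's contents in this model
};
using AddressSpace = std::map<uint32_t, Region>;

struct SavedRegion {
    uint32_t base;
    std::vector<uint8_t> bytes;
};

// Region-by-region walk: each step jumps just past the current region,
// and the loop stops if the 32-bit addition wraps around.
std::vector<SavedRegion> take_snapshot(const AddressSpace& as) {
    std::vector<SavedRegion> snap;
    uint32_t start = 0;
    for (;;) {
        auto it = as.find(start);             // stands in for VirtualQueryEx
        if (it == as.end()) break;            // nothing mapped here: stop
        const Region& r = it->second;
        if (r.committed && r.writable)
            snap.push_back({start, r.bytes});
        if (r.size == 0 || start + r.size < start) break;  // no progress / wrap
        start += r.size;
    }
    return snap;
}

// Write every saved region back to its original base address.
void restore_snapshot(AddressSpace& as, const std::vector<SavedRegion>& snap) {
    for (const SavedRegion& s : snap)
        as.at(s.base).bytes = s.bytes;        // like WriteProcessMemory
}
```

After calling `take_snapshot`, you can corrupt a region and call `restore_snapshot` to rewind it. Saving only writable regions keeps the snapshot small, at the cost of missing any region whose protection the program changes later.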
Disassembling Machine Code
A debugger needs to be able to disassemble instructions. A breakpoint or single-step event will leave each thread of the target process pointing to some instruction. By using the thread CONTEXT functions you can determine the address in memory where the instruction lives, but this does not reveal the actual instruction itself.
The memory needs to be "disassembled" to determine the instruction. Fortunately you don't need to write a disassembler from scratch. Microsoft supplies a disassembler with the OS. This disassembler is used, for example, by the Dr. Watson utility when a crash occurs. We can borrow from this existing tool to provide disassembly functions in our debugger:
```cpp
HANDLE hThread = fOpenThread(THREAD_ALL_ACCESS,
                             FALSE,
                             theThread->m_thread_id);
if(hThread == NULL)
{
    _error_out("[!] Failed to Open the thread handle !\n");
    return FALSE;
}

DEBUGPACKET dp;
dp.context = theThread->m_ctx;
dp.hProcess = theThread->m_hProcess;
dp.hThread = hThread;

DWORD ulOffset = dp.context.Eip;

// Disassemble the instruction.
if(disasm(&dp, &ulOffset, (PUCHAR)m_instruction, FALSE))
{
    ret = TRUE;
}
else
{
    _error_out("error disassembling instruction\n");
    ret = FALSE;
}

CloseHandle(hThread);
```
A user-defined thread structure is used in this code. The context is obtained so we know which instruction is being executed. The disasm function call is published in the Dr. Watson source code and can easily be incorporated into your project. We encourage you to locate the source code to Dr. Watson to add the relevant disassembly functionality. Alternatively, there are other open-source disassemblers available that provide similar functionality.
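At its core, a disassembler maps the bytes at EIP to a mnemonic and an instruction length. The following toy linear-sweep decoder shows this for a handful of 32-bit x86 opcodes: NOP (0x90), RET (0xC3), INT 3 (0xCC), PUSH and POP of a register (0x50 to 0x5F), and the short jumps JE (0x74), JNE (0x75), and JMP (0xEB). It is a hypothetical illustration, not a replacement for the Dr. Watson code or any real disassembler. x86 has hundreds of opcodes, prefixes, and ModRM forms that this sketch ignores, and it prints unknown bytes as `db`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct Decoded {
    std::size_t length;  // bytes consumed
    std::string text;    // printable form
};

// Decode one instruction at code[pos]; `addr` is its address, used to
// resolve short jumps (target = address of next instruction + rel8).
Decoded decode_one(const std::vector<uint8_t>& code, std::size_t pos,
                   uint32_t addr) {
    static const char* kRegs[8] = {"eax", "ecx", "edx", "ebx",
                                   "esp", "ebp", "esi", "edi"};
    uint8_t op = code.at(pos);
    char buf[32];
    if (op == 0x90) return {1, "nop"};
    if (op == 0xC3) return {1, "ret"};
    if (op == 0xCC) return {1, "int 3"};
    if (op >= 0x50 && op <= 0x57) return {1, std::string("push ") + kRegs[op - 0x50]};
    if (op >= 0x58 && op <= 0x5F) return {1, std::string("pop ") + kRegs[op - 0x58]};
    if ((op == 0x74 || op == 0x75 || op == 0xEB) && pos + 1 < code.size()) {
        int8_t rel = static_cast<int8_t>(code[pos + 1]);
        uint32_t target = addr + 2 + static_cast<uint32_t>(static_cast<int32_t>(rel));
        const char* mnem = op == 0x74 ? "je" : (op == 0x75 ? "jne" : "jmp");
        std::snprintf(buf, sizeof buf, "%s 0x%x", mnem, static_cast<unsigned>(target));
        return {2, buf};
    }
    std::snprintf(buf, sizeof buf, "db 0x%02x", static_cast<unsigned>(op));
    return {1, buf};
}

// Linear sweep: decode one instruction, advance by its length, repeat.
std::vector<std::string> disassemble(const std::vector<uint8_t>& code,
                                     uint32_t base) {
    std::vector<std::string> out;
    std::size_t pos = 0;
    while (pos < code.size()) {
        Decoded d = decode_one(code, pos, base + static_cast<uint32_t>(pos));
        out.push_back(d.text);
        pos += d.length;
    }
    return out;
}
```

Because x86 instructions vary in length, the decoder must report how many bytes it consumed before the next instruction can be located. This is also why a breakpoint's 0xCC must always overwrite the first byte of an instruction.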