2.4 Process Memory Organization
A process is a program instance that has been loaded into memory and is managed by the operating system.
Process memory is generally organized into code, data, heap, and stack segments, as shown in Figure 2–12 (a). The code or text segment includes instructions and read-only data. It can be marked read-only so that modifying memory in the code section results in faults. The data segment contains initialized data, uninitialized data, static variables, and global variables. The heap is used for dynamically allocating process memory. The stack is a last-in, first-out (LIFO) data structure used to support process execution.
Figure 2–12. Process memory organization
The exact organization of process memory depends on the operating system, compiler, linker, and loader. Figure 2–12 (b) and (c) show possible process memory organizations under UNIX and Win32, respectively.
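As a rough illustration of these segments, the following sketch prints one representative address from each region. The function and variable names are hypothetical and do not come from the text. The addresses printed, and even their relative order, depend on the platform, the toolchain, and whether address space layout randomization is enabled, so no particular output should be expected.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;  /* data segment: initialized data */
int uninitialized_global;     /* data segment: uninitialized data, zero-filled */

/* Prints one representative address per region (hypothetical helper).
   Returns 0 on success, or -1 if the heap allocation fails. */
int print_region_addresses(void)
{
    static int static_local = 7;  /* data segment, despite its local scope */
    int stack_local = 0;          /* stack: automatic variable */
    int *heap_object = malloc(sizeof *heap_object);  /* heap */

    if (heap_object == NULL) {
        return -1;
    }
    *heap_object = 0;

    /* Converting a function pointer to an integer is implementation-defined;
       it is done here only to display an address inside the code segment. */
    printf("code:  0x%" PRIxPTR "\n", (uintptr_t)&print_region_addresses);
    printf("data:  %p (initialized global)\n", (void *)&initialized_global);
    printf("data:  %p (uninitialized global)\n", (void *)&uninitialized_global);
    printf("data:  %p (static local)\n", (void *)&static_local);
    printf("heap:  %p\n", (void *)heap_object);
    printf("stack: %p\n", (void *)&stack_local);

    free(heap_object);
    return 0;
}
```

Calling print_region_addresses() from main() on different systems is one way to see how much the layout varies between platforms.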
Stack Management
The stack supports program execution by maintaining automatic process-state data. If the main routine of a program, for example, invokes function A, which in turn invokes function B, function B will eventually return control to function A, which in turn will return control to the main routine. To return control to the proper location, the sequence of return addresses must be stored. A stack is well suited for maintaining this information because it is a dynamic LIFO data structure that can support any level of nesting within memory constraints. When a subroutine is called, the address of the next instruction to execute in the calling routine is pushed onto the stack. When the subroutine returns, this return address is popped from the stack and program execution jumps to the specified location. The information maintained in the stack reflects the execution state of the process at any given instant.
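The LIFO behavior described above can be sketched with an explicit array that holds return addresses. This is only a model: the type CallStack, the functions push_return() and pop_return(), and the fixed depth limit are assumptions made for illustration, and a real processor keeps this information in process memory rather than in a separate structure.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 16  /* hypothetical nesting limit for this model */

/* A toy call stack that stores only return addresses. */
typedef struct {
    unsigned return_addr[MAX_DEPTH];
    size_t depth;  /* number of addresses currently stored */
} CallStack;

/* Models a call: remember where the caller should resume. */
void push_return(CallStack *s, unsigned addr)
{
    assert(s->depth < MAX_DEPTH);  /* nesting is bounded by available memory */
    s->return_addr[s->depth++] = addr;
}

/* Models a return: resume at the most recently stored address. */
unsigned pop_return(CallStack *s)
{
    assert(s->depth > 0);
    return s->return_addr[--s->depth];
}
```

If main calls A with a return address of 0x100 and A calls B with a return address of 0x200, the first pop_return() yields 0x200 (back into A) and the second yields 0x100 (back into main), which is the reverse of the order in which the addresses were stored.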
In addition to the return address, the stack is used to store the arguments to the subroutine, as well as local (or automatic) variables. Information pushed onto the stack as a result of a function call is called a frame. The address of the current frame is stored in the frame or base pointer register. On Intel architectures, the ebp register is used for this purpose. The frame pointer is used as a fixed point of reference within the stack. When a subroutine is called, the frame pointer for the calling routine is also pushed onto the stack so that it can be restored when the subroutine exits.
Figure 2–13 shows the disassembly of a call to foo(MyInt, MyStrPtr) from Visual C++. The invocation consists of three steps.
- The second argument is moved into the eax register and pushed onto the stack (lines 1 and 2). Notice how these mov instructions use the ebp register to reference arguments and local variables on the stack.
- The first argument is moved into the ecx register and pushed onto the stack (lines 3 and 4).
- The call instruction pushes a return address (the address of the instruction following the call statement) onto the stack and transfers control to the foo() function (line 5).
void foo(int, char *);                // function prototype

int main(int argc, char *argv[]) {
    int MyInt = 1;                    // stack variable located at ebp-8
    char *MyStrPtr = "MyString";      // stack variable located at ebp-4
    ...
    foo(MyInt, MyStrPtr);             // call foo function
        1. mov  eax, [ebp-4]
        2. push eax
        3. mov  ecx, [ebp-8]
        4. push ecx
        5. call foo
        6. add  esp, 8
    ...
}
Figure 2–13. Disassembly of a function call
When control is returned to the return address, the stack pointer (SP) is incremented by eight bytes (line 6), which removes the two 4-byte arguments from the stack. The SP points to the top of the stack. The direction in which the stack grows depends on the implementation of the pop and push instructions for that architecture (that is, they either increment or decrement the SP). For many popular architectures, including IA-32, SPARC, and MIPS processors, the stack grows toward lower memory. On these architectures, incrementing the stack pointer is equivalent to popping the stack.
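The call sequence in Figure 2–13 can be modeled with a small simulation of a stack that grows toward lower memory. This is a sketch, not real machine behavior: the Machine type, the helper names, and the modeled address range are assumptions, and the sample values are taken from Table 2–1.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_LIMIT 0x0012FF00u  /* lowest modeled address (assumption) */
#define STACK_BASE  0x00130000u  /* one past the highest modeled address */

typedef struct {
    uint32_t esp;                                    /* stack pointer */
    uint32_t words[(STACK_BASE - STACK_LIMIT) / 4];  /* modeled stack memory */
} Machine;

/* Maps a modeled address to its storage slot. */
static uint32_t *slot(Machine *m, uint32_t addr)
{
    assert(addr >= STACK_LIMIT && addr < STACK_BASE && addr % 4 == 0);
    return &m->words[(addr - STACK_LIMIT) / 4];
}

/* push: decrement the SP, then store (the stack grows toward lower memory). */
void push(Machine *m, uint32_t value)
{
    m->esp -= 4;
    *slot(m, m->esp) = value;
}

/* pop: load, then increment the SP. */
uint32_t pop(Machine *m)
{
    uint32_t value = *slot(m, m->esp);
    m->esp += 4;
    return value;
}

/* Lines 1 to 5 of Figure 2-13: push the arguments right to left, then call. */
void push_arguments_and_call(Machine *m, uint32_t my_int,
                             uint32_t my_str_ptr, uint32_t return_addr)
{
    push(m, my_str_ptr);   /* lines 1-2: second argument */
    push(m, my_int);       /* lines 3-4: first argument */
    push(m, return_addr);  /* line 5: call pushes the return address */
}

/* The callee's ret followed by line 6 (add esp, 8) in the caller. */
uint32_t return_and_clean_up(Machine *m)
{
    uint32_t target = pop(m);  /* ret pops the return address */
    m->esp += 8;               /* caller discards the two 4-byte arguments */
    return target;
}
```

Starting with esp at 0x0012FF78, the three pushes leave esp at 0x0012FF6C, and after the return and the eight-byte adjustment esp is back at 0x0012FF78, which shows that incrementing the SP undoes the pushes on a stack that grows downward.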
Figure 2–14 shows the function prolog, the instructions that are executed when the function is invoked. The push instruction (line 1) pushes the ebp register, which still holds the frame pointer of the calling routine, onto the stack. The mov instruction (line 2) sets the frame pointer for the function (the ebp register) to the current stack pointer. On line 3, the function allocates a total of 28 bytes of space on the stack for local variables (24 bytes for LocalChar and 4 bytes for LocalInt).

void foo(int i, char *name) {
    char LocalChar[24];
    int LocalInt;
        1. push ebp
        2. mov  ebp, esp
        3. sub  esp, 28
    ...
Figure 2–14. Disassembly of a function prolog

Figure 2–15 shows the assembly language instructions used to return from the foo() function. These instructions can be viewed as the inverse of the prolog shown in Figure 2–14. The stack pointer (esp) is restored from the frame pointer (ebp) (line 1), which releases the space allocated for local variables. The original ebp is popped from the stack (line 2). The ret instruction pops a return address off the stack and transfers control to that location (line 3).

    ...
    return;
        1. mov esp, ebp
        2. pop ebp
        3. ret
}
Figure 2–15. Disassembly of a function epilog
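The effect of the prolog and epilog on the esp and ebp registers can be traced with a small model. This is a sketch under stated assumptions: the Machine type, the helper names, and the modeled address range are hypothetical, and the register values used in the trace are the ones that appear in Table 2–1.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_LIMIT 0x0012FF00u  /* lowest modeled address (assumption) */
#define STACK_BASE  0x00130000u  /* one past the highest modeled address */

typedef struct {
    uint32_t esp;                                    /* stack pointer */
    uint32_t ebp;                                    /* frame pointer */
    uint32_t words[(STACK_BASE - STACK_LIMIT) / 4];  /* modeled stack memory */
} Machine;

/* Maps a modeled address to its storage slot. */
static uint32_t *slot(Machine *m, uint32_t addr)
{
    assert(addr >= STACK_LIMIT && addr < STACK_BASE && addr % 4 == 0);
    return &m->words[(addr - STACK_LIMIT) / 4];
}

static void push(Machine *m, uint32_t value)
{
    m->esp -= 4;
    *slot(m, m->esp) = value;
}

static uint32_t pop(Machine *m)
{
    uint32_t value = *slot(m, m->esp);
    m->esp += 4;
    return value;
}

/* Figure 2-14: save the caller's frame pointer, set up a new frame,
   and reserve space for local variables. */
void prolog(Machine *m, uint32_t local_bytes)
{
    push(m, m->ebp);        /* 1. push ebp */
    m->ebp = m->esp;        /* 2. mov ebp, esp */
    m->esp -= local_bytes;  /* 3. sub esp, 28 (for foo) */
}

/* Figure 2-15: release the locals, restore the caller's frame pointer,
   and return the address that ret would jump to. */
uint32_t epilog(Machine *m)
{
    m->esp = m->ebp;  /* 1. mov esp, ebp */
    m->ebp = pop(m);  /* 2. pop ebp */
    return pop(m);    /* 3. ret */
}
```

With esp at 0x0012FF6C (pointing at the return address 0x401040) and ebp at 0x0012FF80 on entry, prolog(&m, 28) leaves ebp at 0x0012FF68 and esp at 0x0012FF4C. The matching epilog(&m) restores ebp to 0x0012FF80, returns 0x401040, and leaves esp at 0x0012FF70, where the first argument is still waiting to be removed by the caller.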
Table 2–1 shows a sample stack frame for foo(), which takes two arguments and contains two local variables. Low memory is at the top of the table, so in this illustration the stack grows upward, toward lower memory.
Table 2–1. Sample Stack Frame
foo(int i, char* name)

| Address | Value | Description | Len |
|---|---|---|---|
| 0x0012FF4C | ? | Last Local Variable - Integer - LocalInt | 4 |
| 0x0012FF50 | ? | First Local Variable - String - LocalChar | 24 |
| 0x0012FF68 | 0x12FF80 | Calling Frame of Calling Function: main() | 4 |
| 0x0012FF6C | 0x401040 | Return Address of Calling Function: main() | 4 |
| 0x0012FF70 | 1 | Arg: 1st argument: MyInt (int) | 4 |
| 0x0012FF74 | 0x40703C | Arg: 2nd argument: Pointer to MyString (char *) | 4 |
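The arithmetic behind Table 2–1 can be checked by describing the frame as a C structure whose fields appear in order of increasing address. This is only a bookkeeping sketch: the FooFrame type and its field names are hypothetical, pointers are modeled as 32-bit integers to match the table, and a real compiler is free to order or pad local variables differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_TOP 0x0012FF4Cu  /* lowest address in Table 2-1 (value of esp) */

/* The sample frame for foo(), lowest address first.
   Every field is 4-byte aligned, so no padding is expected. */
typedef struct {
    int32_t  local_int;       /* LocalInt */
    char     local_char[24];  /* LocalChar */
    uint32_t saved_ebp;       /* frame pointer of main(); ebp points here */
    uint32_t return_addr;     /* return address in main() */
    int32_t  arg_i;           /* first argument: MyInt */
    uint32_t arg_name;        /* second argument: pointer to "MyString" */
} FooFrame;

/* Address of a field, given its offset from the top of the frame. */
uint32_t field_address(size_t offset)
{
    return FRAME_TOP + (uint32_t)offset;
}

/* Offset of a field relative to ebp: locals are negative, arguments positive. */
int ebp_offset(size_t offset)
{
    return (int)offset - (int)offsetof(FooFrame, saved_ebp);
}
```

For example, field_address(offsetof(FooFrame, return_addr)) gives 0x0012FF6C, matching the table, and ebp_offset(offsetof(FooFrame, arg_i)) gives 8, which is why a function built this way reads its first argument at [ebp+8] and its locals at negative offsets from ebp.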