Writing Insecure C, Part 3
- Buffers and Strings
- When It All Goes Wrong
- Dropping Privileges
- Racing the Kernel
- Conclusions
Part 1 and Part 2 of this series pointed out some simple ways of writing security holes in C code, and discussed how to avoid them. This article concludes the series with a look at the most common source of security issues in C code—buffer handling—and some slightly more advanced techniques for introducing and avoiding security problems.
Buffers and Strings
Strings in C are a perennial cause of problems. When C was created, two ideas about the best way to implement strings were in competition. They are now referred to as C strings and Pascal strings, after the two languages that made them popular: a C string marks its end with a null byte, while a Pascal string stores its length in front of the characters. Languages like Lisp used a third implementation, in which strings were linked lists of characters (Erlang still uses this model).
Lisp-style strings have an obvious disadvantage. Every character needs one byte to store the character and four or eight bytes to store the address of the next one—up to nine bytes to store a single byte of data. This structure is far from ideal, but it makes splitting and concatenating strings very easy.
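To make the trade-off concrete, here is a minimal sketch of a Lisp-style string in C. The names struct char_cell, cell_push, cell_concat, and cell_length are invented for this illustration and don't come from any library.

```c
#include <stdlib.h>

/* Hypothetical cell type: one character plus a pointer to the next cell. */
struct char_cell {
    char c;
    struct char_cell *next;
};

/* Prepend a character. Returns the new head, or NULL if malloc() fails
   (in which case the existing list is left untouched). */
struct char_cell *cell_push(struct char_cell *head, char c)
{
    struct char_cell *cell = malloc(sizeof *cell);
    if (cell == NULL)
        return NULL;
    cell->c = c;
    cell->next = head;
    return cell;
}

/* Concatenate two lists by pointing the last cell of a at b.
   No characters are copied and no buffer size needs checking. */
struct char_cell *cell_concat(struct char_cell *a, struct char_cell *b)
{
    struct char_cell *last = a;
    if (a == NULL)
        return b;
    while (last->next != NULL)
        last = last->next;
    last->next = b;
    return a;
}

/* Count the cells in a list. */
size_t cell_length(const struct char_cell *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Concatenation never overflows anything, which is the attraction. The cost is the per-character overhead: on a typical 64-bit platform the compiler usually pads each cell as well, so sizeof(struct char_cell) tends to be 16 bytes rather than 9.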
More advanced models treat strings as linked lists of arrays of characters, allowing them to be joined easily.
All of these models can be (and have been) implemented in C, but the standard string functions still work on arrays of bytes.
Most of the "classic" string functions are basically impossible to use safely. (For this reason, the OpenBSD linker conveniently emits a warning when you use any of them.) The canonical example of a bad function is strcat(), which takes two pointers to C strings. The function scans along the first string until it finds a null terminator; it writes bytes from the second string there until it gets to the terminator in the second string. The caller has to make sure that enough space exists in the first string to store the second one.
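The check that strcat() leaves to the caller might look something like the following sketch. The helper name append_if_fits is hypothetical, and it assumes the caller knows the total size of the destination buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src to dst only if the whole result,
   including the terminator, fits in the dst_size bytes of dst.
   Returns false (and leaves dst unchanged) if it doesn't fit. */
bool append_if_fits(char *dst, size_t dst_size, const char *src)
{
    /* Look for dst's terminator without reading past the buffer. */
    const char *end = memchr(dst, '\0', dst_size);
    size_t used, need;

    if (end == NULL)
        return false;               /* dst isn't terminated within its buffer */

    used = (size_t)(end - dst);
    need = strlen(src);
    if (need >= dst_size - used)
        return false;               /* no room for src plus the terminator */

    memcpy(dst + used, src, need + 1);  /* copies the terminator too */
    return true;
}
```

With char buf[8] = "abc", appending "defg" succeeds and fills the buffer exactly; appending anything further is refused instead of writing past the end.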
A newer function, strncat(), was introduced to make this practice safer. It takes a third argument that limits how many bytes are copied from the second string. Provided the caller passes the space remaining in the first string (less one byte for the terminator), the function never runs off the end of the first string. It introduces a new problem, though: the function returns the destination pointer as its result, so you can't easily test whether it has truncated the result. That's a big problem if you're concatenating parts of a passphrase, for example.
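Because strncat() gives no indication of truncation, the caller has to work it out separately. The wrapper below is one way to do that; strncat_checked is a hypothetical name, and the sketch assumes dst already holds a terminated string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around strncat(). dst_size is the total size of
   the destination buffer. Returns true if all of src was appended and
   false if it was truncated (or if dst was already full). */
bool strncat_checked(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t room;

    if (used >= dst_size)
        return false;           /* nothing sensible can be appended */

    /* strncat() writes up to room bytes and then a terminator, so the
       limit must leave one byte spare. */
    room = dst_size - used - 1;
    strncat(dst, src, room);

    return strlen(src) <= room; /* false means src did not fit */
}
```

Note the extra strlen() calls: detecting truncation costs a second pass over both strings.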
OpenBSD introduced strlcat(), which is similar to strncat() but takes the total size of the destination buffer as its third argument and returns the length of the string it tried to create, that is, the sum of the lengths of both inputs. If the result of the function is greater than or equal to the third argument, truncation has occurred. This function can be found in the libc of each member of the BSD family (including Darwin/OS X). It was long absent from glibc because it's "inefficient BSD crap," according to the glibc maintainer of the time, although glibc added it in version 2.38. Fortunately, the BSD license allows you to copy the function from a BSD libc into your own code without problems.
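For readers on a platform without strlcat(), the following stand-in illustrates the semantics. It is a sketch written for this article, not the OpenBSD source, and the name sketch_strlcat is invented to avoid clashing with a real libc symbol.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with strlcat()-style semantics. size is the
   total size of the dst buffer. Returns the length of the string it
   tried to create; a result >= size means the output was truncated. */
size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
    const char *end = memchr(dst, '\0', size);
    size_t src_len = strlen(src);
    size_t dst_len, room, n;

    if (end == NULL)
        return size + src_len;  /* dst not terminated within size: append nothing */

    dst_len = (size_t)(end - dst);
    room = size - dst_len - 1;  /* bytes free before the terminator */
    n = src_len < room ? src_len : room;

    memcpy(dst + dst_len, src, n);
    dst[dst_len + n] = '\0';    /* result is always terminated */
    return dst_len + src_len;
}
```

A caller can then detect truncation in a single comparison, for example by testing whether sketch_strlcat(buf, part, sizeof buf) >= sizeof buf and treating that case as an error.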
The problems with strings in C are largely due to the fact that strings are just arrays, and arrays aren't bounds-checked. As such, most problems that affect strings affect arbitrary buffers.
One of the nastiest things in C99 is the design of variable-length arrays, which let you allocate small, dynamically sized arrays on the stack. You could always do this using alloca(), although the quality of alloca() implementations varies between platforms. The following are roughly equivalent:
int *a = alloca(sizeof(int) * n);
int a[n];
The difference is what happens if insufficient space exists to grow the stack to fit n integers. With the first line, some alloca() implementations will set a to NULL, which is irritating, but a condition you can check for, and one that causes an easy-to-debug crash if you just access the start of the array. Don't rely on this, though: many implementations, including the compiler built-ins on common platforms, report nothing and simply leave the behavior undefined. In the second line, if there's not enough stack space, a will point to... somewhere. Exactly where is entirely implementation-dependent. Therefore, if you use C99 variable-length arrays, it's impossible to check for stack overflows. In most cases, this isn't a problem. Small allocations are pretty much guaranteed to work, but if an attacker can influence the value of n, you may end up with an array pointing nowhere.
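One common way to sidestep the problem is to cap the amount of stack you are willing to use and fall back to malloc(), whose failure you can check, for anything larger. The sketch below assumes a hypothetical reverse_ints function that needs n integers of scratch space; the limit STACK_INTS_MAX is an arbitrary value chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { STACK_INTS_MAX = 256 };  /* hypothetical cap on stack scratch space */

/* Reverse data in place using a scratch copy. Small inputs use a
   fixed-size stack buffer; large ones use the heap.
   Returns 0 on success, -1 if the scratch buffer can't be allocated. */
int reverse_ints(int *data, size_t n)
{
    int fixed[STACK_INTS_MAX];
    int *scratch = fixed;
    size_t i;

    if (n > STACK_INTS_MAX) {
        if (n > SIZE_MAX / sizeof *scratch)
            return -1;          /* n * sizeof(int) would overflow */
        scratch = malloc(n * sizeof *scratch);
        if (scratch == NULL)
            return -1;          /* a failure we can actually detect */
    }

    if (n > 0)
        memcpy(scratch, data, n * sizeof *scratch);
    for (i = 0; i < n; i++)
        data[i] = scratch[n - 1 - i];

    if (scratch != fixed)
        free(scratch);
    return 0;
}
```

The stack usage is now bounded by a constant no matter what value of n an attacker supplies, at the cost of always reserving the full fixed buffer.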
This kind of thing is a serious problem due to the way in which the stack is traditionally implemented. In general, the "bottom" of the stack is at the top of the process's memory, and the stack grows downward, while the elements of an array are laid out at increasing addresses. If you have an array on the stack and go over the end, you're therefore writing toward, and then over, the caller's stack frame. Worse, on the way there you overwrite the current function's return address. If you use something like strcat() with the destination string on the stack, it's very easy to overwrite the return address, allowing an attacker to control where execution jumps after the function returns.
This issue is mitigated on modern systems (stack-protection schemes check that the area around the return address is intact before the function returns, and kill the process if it isn't), but it's still worth avoiding. Crashes are better than remote exploits, but aren't nearly as good as valid code.