Pointer Arithmetic
Pointers are usually the first major hurdle that beginning C programmers encounter, as they can prove quite difficult to understand. The rules involving pointer arithmetic, dereferencing and indirection, pass-by-value semantics, pointer operator precedence, and pseudo-equivalence with arrays can be challenging to learn. The following sections focus on a few aspects of pointer arithmetic that might catch developers by surprise and lead to possible security exposures.
Pointer Overview
You know that a pointer is essentially a location in memory (an address), so it's a data type that's necessarily implementation dependent. You could have strikingly different pointer representations on different architectures, and pointers could be implemented in different fashions even on the 32-bit Intel architecture. For example, you could have 16-bit code, or even a compiler that transparently supported custom virtual memory schemes involving segments. So assume this discussion refers to the common case: userland code built with GCC or Visual C++ for 32-bit Intel machines.
You know that pointers probably have to be unsigned integers because valid virtual memory addresses can range from 0x0 to 0xffffffff. That said, it seems slightly odd when you subtract two pointers. Wouldn't a pointer need to somehow represent negative values as well? It turns out that the result of the subtraction isn't a pointer at all; instead, it's a signed integer type known as a ptrdiff_t.
Pointers can be freely converted into integers and into pointers of other types with the use of casts. However, the compiler makes no guarantee that the resulting pointer or integer is correctly aligned or points to a valid object. Therefore, pointers are one of the more implementation-dependent portions of the C language.
Pointer Arithmetic Overview
When you do arithmetic with a pointer, what occurs? Here's a simple example of adding 1 to a pointer:
short *j;
j = (short *)0x1234;
j = j + 1;
This code has a pointer to a short named j. It's initialized to an arbitrary fixed address, 0x1234. This is bad C code, but it serves to get the point across. As mentioned previously, you can treat pointers and integers interchangeably as long as you use casts, but the results depend on the implementation. You might assume that after you add 1 to j, j is equal to 0x1235. However, as you probably know, this isn't what happens. j is actually 0x1236.
When C does arithmetic involving a pointer, it does the operation relative to the size of the pointer's target. So when you add 1 to a pointer to an object, the result is a pointer to the next object of that size in memory. In this example, the object is a short integer, which takes up 2 bytes (on the 32-bit Intel architecture), so the short following 0x1234 in memory is at location 0x1236. If you subtract 1, the result is the address of the short before the one at 0x1234, which is 0x1232. If you add 5, you get the address 0x123e, which is the fifth short past the one at 0x1234.
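The scaling described above can be observed without inventing a fixed address. The following is a minimal sketch, not part of the original example: the helper name short_step_bytes is hypothetical, and it assumes p points into an array with at least n elements so that p + n is a valid pointer.

```c
#include <stddef.h>

/* Hypothetical helper: returns the distance in bytes between p and p + n.
 * Assumes p points into an array of at least n shorts, so p + n is valid. */
size_t short_step_bytes(const short *p, size_t n)
{
    const char *start = (const char *)p;        /* byte view of p     */
    const char *end   = (const char *)(p + n);  /* byte view of p + n */

    /* Adding n to a short pointer moves it n * sizeof(short) bytes. */
    return (size_t)(end - start);
}
```

Called on an array such as short a[8], short_step_bytes(a, 1) yields sizeof(short), and short_step_bytes(a, 5) yields 5 * sizeof(short), which is 10 on the 32-bit Intel architecture discussed here.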
Another way to think of it is that a pointer to an object is treated as an array composed of one element of that object. So j, a pointer to a short, is treated like the array short j[1], which contains one short. Therefore, j + 2 would be equivalent to &j[2]. Table 6-11 shows this concept.
Table 6-11. Pointer Arithmetic and Memory
Pointer Expression | Array Expression | Address
j - 2 | &j[-2] | 0x1230, 0x1231
j - 1 | &j[-1] | 0x1232, 0x1233
j | j or &j[0] | 0x1234, 0x1235
j + 1 | &j[1] | 0x1236, 0x1237
j + 2 | &j[2] | 0x1238, 0x1239
j + 3 | &j[3] | 0x123a, 0x123b
j + 4 | &j[4] | 0x123c, 0x123d
j + 5 | &j[5] | 0x123e, 0x123f
Now look at the details of the important pointer arithmetic operators, covered in the following sections.
Addition
The rules for pointer addition are slightly more restrictive than you might expect. You can add an integer type to a pointer type or a pointer type to an integer type, but you can't add a pointer type to a pointer type. This makes sense when you consider what pointer addition actually does; the compiler wouldn't know which pointer to use as the base type and which to use as an index. For example, look at the following operation:
unsigned short *j;
unsigned long *k;
x = j + k;
This operation would be invalid because the compiler wouldn't know how to convert j or k into an index for the pointer arithmetic. You could certainly cast j or k into an integer, but the result would be unexpected, and it's unlikely someone would do this intentionally.
One interesting rule of C is that the subscript operator falls under the category of pointer addition. The C standard states that the subscript operator is equivalent to an expression involving addition in the following way:
E1[E2] is equivalent to (*((E1)+(E2)))
With this in mind, look at the following example:
char b[10];
b[4] = 'a';
The expression b[4] refers to the fifth object in the b character array. According to the rule, here's the equivalent way of writing it:
(*((b)+(4)))='a';
You know from your earlier analysis that b + 4, with b converted to a pointer to char, is the same as saying &b[4]; therefore, the expression would be like saying (*(&b[4])) or b[4].
Finally, note that the resulting type of the addition between an integer and a pointer is the type of the pointer.
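The equivalence between subscripting and pointer addition can be checked directly. The following is a small sketch; the function name subscript_forms_agree is hypothetical and exists only for illustration. Because addition is commutative, the rule also means the unusual spelling 4[b] names the same element as b[4].

```c
/* Hypothetical helper: returns 1 when every spelling below refers to
 * the same element of b, and 0 otherwise. */
int subscript_forms_agree(void)
{
    char b[10] = {0};

    b[4] = 'a';

    return &b[4] == b + 4      /* same address              */
        && *(b + 4) == 'a'     /* E1[E2] as *((E1) + (E2))  */
        && 4[b] == 'a';        /* operands swapped          */
}
```

You would expect this function to return 1 on any conforming compiler.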
Subtraction
Subtraction has similar rules to addition, except subtracting one pointer from another is permissible. When you subtract a pointer from a pointer of the same type, you're asking for the difference in the subscripts of the two elements. In this case, the resulting type isn't a pointer but a ptrdiff_t, which is a signed integer type. The C standard indicates it should be defined in the stddef.h header file.
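As a brief illustration of pointer subtraction, consider the following sketch. The helper name element_distance is hypothetical, and it assumes both pointers refer to elements of the same array.

```c
#include <stddef.h>

/* Hypothetical helper: returns the difference in subscripts between
 * two elements of the same array. The result is a signed ptrdiff_t,
 * so it can be negative when "to" precedes "from". */
ptrdiff_t element_distance(const short *from, const short *to)
{
    return to - from;
}
```

For an array short a[8], element_distance(&a[1], &a[6]) is 5 and element_distance(&a[6], &a[1]) is -5. Note that the result counts elements, not bytes.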
Comparison
Comparison between pointers works as you might expect: it considers the relative locations of the two pointers in the virtual address space. The resulting type is the same as with other comparisons: an int with a value of 1 or 0.
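A short sketch of a pointer comparison follows. The helper name is_before is hypothetical. It assumes both pointers refer to elements of the same array, because the C standard defines the relational operators only for pointers into the same object.

```c
/* Hypothetical helper: returns 1 if a refers to an earlier element
 * than b, and 0 otherwise. Assumes a and b point into the same array. */
int is_before(const int *a, const int *b)
{
    return a < b;   /* the comparison itself yields an int: 1 or 0 */
}
```

For an array int a[4], is_before(&a[0], &a[3]) is 1, and is_before(&a[3], &a[0]) is 0.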
Conditional Operator
The conditional operator (?:) can have pointers as its last two operands, and it has to reconcile their types much as it does when used with arithmetic operands. It does this by applying all the qualifiers that either pointer type has to the resulting type.
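The qualifier rule can be observed with a C11 _Generic selection, which picks a branch based on the type of an expression. This is only a sketch: the macro IS_CONST_INT_PTR and the function conditional_keeps_const are hypothetical names, and the code assumes a compiler with C11 support.

```c
/* Hypothetical macro: evaluates to 1 when e has type "const int *". */
#define IS_CONST_INT_PTR(e) _Generic((e), const int *: 1, default: 0)

/* Returns 1 if the conditional expression has type const int *,
 * regardless of which operand is selected at run time. */
int conditional_keeps_const(int pick_first)
{
    int value = 1;
    const int *cp = &value;   /* pointer to const int       */
    int *p = &value;          /* pointer to unqualified int */

    /* The result type carries the qualifiers of both operand types. */
    return IS_CONST_INT_PTR(pick_first ? cp : p);
}
```

The function returns 1 for either argument, because the type of the conditional expression is determined at compile time from both operands, not from the operand that ends up being selected.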
Vulnerabilities
Few vulnerabilities involving pointer arithmetic have been widely publicized, at least in the sense being described here. Plenty of vulnerabilities that involve manipulation of character pointers essentially boil down to miscounting buffer sizes, and although they technically qualify as pointer arithmetic errors, they aren't as subtle as pointer vulnerabilities can get. The more pernicious problems are those in which developers mistakenly perform arithmetic on pointers without realizing that their integer operands are being scaled by the size of the pointer's target. Consider the following code:
int buf[1024];
int *b = buf;

while (havedata() && b < buf + sizeof(buf)) {
    *b++ = parseint(getdata());
}
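The intent of b < buf + sizeof(buf) is to prevent b from advancing past buf[1023]. However, sizeof(buf) is the size of the array in bytes (4096 with 4-byte integers), and the addition treats that value as a count of int elements, so the check actually lets b advance as far as buf[4095]. Therefore, this code is potentially vulnerable to a fairly straightforward buffer overflow.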
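One way to express the intended bound is to compute the limit in elements rather than bytes. The following is a minimal sketch under stated assumptions: the function fill_buf and its src and nsrc parameters are hypothetical stand-ins for the havedata() and getdata() input source, which isn't shown in the original code.

```c
#include <stddef.h>

static int buf[1024];

/* Hypothetical replacement for the loop above: stores at most 1024
 * values from src into buf and returns how many were stored. */
size_t fill_buf(const int *src, size_t nsrc)
{
    /* Element count, not byte count: sizeof(buf) / sizeof(buf[0]). */
    int *end = buf + sizeof(buf) / sizeof(buf[0]);
    int *b = buf;
    size_t i = 0;

    while (i < nsrc && b < end)
        *b++ = src[i++];

    return (size_t)(b - buf);
}
```

With this bound, supplying more than 1024 input values stops the loop after buf[1023] is written, which is the behavior the original check was meant to have.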
Listing 6-29 allocates a buffer and then copies the first path component from the argument string into the buffer. There's a length check protecting the wcscat function from overflowing the allocated buffer, but it's constructed incorrectly. Because the strings are wide characters, the pointer subtraction done to check the size of the input (sep - string) returns the difference of the two pointers in wide characters; that is, the difference between the two pointers in bytes divided by 2, assuming 2-byte wide characters. The size variable, however, holds a byte count. Therefore, this length check succeeds as long as (sep - string) contains fewer than roughly (MAXCHARS * 2) wide characters, which could be twice as much space as the allocated buffer can hold.
Listing 6-29. Pointer Arithmetic Vulnerability Example
wchar_t *copy_data(wchar_t *string)
{
    wchar_t *sep, *new;
    int size = MAXCHARS * sizeof(wchar_t);   /* size is in bytes */

    new = (wchar_t *)xmalloc(size);
    *new = L'\0';

    if (*string != L'/') {
        wcscpy(new, L"/");
        size -= sizeof(wchar_t);
    }

    sep = wcschr(string, L'/');
    if (!sep)
        sep = string + wcslen(string);

    /* Flawed check: sep - string counts wide characters, size counts bytes */
    if (sep - string >= (size - sizeof(wchar_t))) {
        free(new);
        die("too much data");
    }

    *sep = L'\0';
    wcscat(new, string);

    return new;
}
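A version of the length check that avoids the mismatch keeps every quantity in the same unit. The following is a sketch, not the original author's fix: the name copy_first_component and the MAX_CHARS value are hypothetical, it uses malloc and a NULL return in place of the xmalloc and die helpers, and it leaves the input string unmodified.

```c
#include <stdlib.h>
#include <wchar.h>

#define MAX_CHARS 8   /* hypothetical capacity, in wide characters */

/* Returns a newly allocated copy of the first path component with a
 * leading '/', or NULL if it doesn't fit or allocation fails. */
wchar_t *copy_first_component(const wchar_t *string)
{
    size_t room = MAX_CHARS;   /* capacity in wide characters */
    size_t used = 0;           /* wide characters already stored */
    const wchar_t *sep;
    size_t len;
    wchar_t *out;

    /* Bytes appear only in the allocation itself. */
    out = malloc(room * sizeof(wchar_t));
    if (out == NULL)
        return NULL;
    out[0] = L'\0';

    if (*string != L'/')
        out[used++] = L'/';

    sep = wcschr(string, L'/');
    len = sep ? (size_t)(sep - string) : wcslen(string);

    /* Characters compared with characters; one slot kept for L'\0'. */
    if (len >= room - used) {
        free(out);
        return NULL;
    }

    wmemcpy(out + used, string, len);
    out[used + len] = L'\0';
    return out;
}
```

With MAX_CHARS set to 8, copy_first_component(L"abc/def") returns a buffer holding "/abc", whereas a seven-character component is rejected because the leading slash, the component, and the terminator would need nine wide characters.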