Arithmetic Boundary Conditions
You've learned that C's basic integer types have minimum and maximum possible values determined by their underlying representation in memory. (Typical ranges for 32-bit twos complement architectures were presented in Table 6-2.) So, now you can explore what can happen when you attempt to traverse these boundaries. Simple arithmetic on a variable, such as addition, subtraction, or multiplication, can result in a value that can't be held in that variable. Take a look at this example:
    unsigned int a;

    a=0xe0000020;
    a=a+0x20000020;

You know that a can hold a value of 0xE0000020 without a problem; Table 6-2 lists the maximum value of an unsigned 32-bit variable as 4,294,967,295, or 0xFFFFFFFF. However, when 0x20000020 is added to 0xE0000020, the result, 0x100000040, can't be held in a. When an arithmetic operation results in a value higher than the maximum possible representable value, it's called a numeric overflow condition.
Here's a slightly different example:
    unsigned int a;

    a=0;
    a=a-1;
The programmer subtracts 1 from a, which has an initial value of 0. The resulting value, -1, can't be held in a because it's below the minimum possible value of 0. This result is known as a numeric underflow condition.
Although these conditions might seem as though they would be infrequent or inconsequential in real code, they actually occur quite often, and their impact can be quite severe from a security perspective. The incorrect result of an arithmetic operation can undermine the application's integrity and often result in a compromise of its security. A numeric overflow or underflow that occurs early in a block of code can lead to a subtle series of cascading faults; not only is the result of a single arithmetic operation tainted, but every subsequent operation using that tainted result introduces a point where an attacker might have unexpected influence.
In the following sections, you look at arithmetic boundary conditions affecting unsigned integers and then examine signed integers.
Unsigned Integer Boundaries
Unsigned integers are defined in the C specification as being subject to the rules of modular arithmetic (see the "Modular Arithmetic" sidebar). For an unsigned integer that uses X bits of storage, arithmetic on that integer is performed modulo 2^X. For example, arithmetic on an 8-bit unsigned integer is performed modulo 2^8, or modulo 256. Take another look at this simple expression:

    unsigned int a;

    a=0xE0000020;
    a=a+0x20000020;

The addition is performed modulo 2^32, or modulo 4,294,967,296 (0x100000000). The result of the addition is 0x40, which is (0xE0000020 + 0x20000020) modulo 0x100000000.
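The modulo rule can be sketched in a few lines of C. This is only an illustration: the helper names (mod_pow2, add_mod, sub_mod, add_u32) are hypothetical, and the sketch assumes bit widths between 1 and 32 so that the intermediate result fits in a 64-bit variable.

```c
#include <stdint.h>

/* Reduce a 64-bit value modulo 2^bits (assumes 1 <= bits <= 32). */
static uint64_t mod_pow2(uint64_t value, unsigned bits)
{
    return value & ((UINT64_C(1) << bits) - 1);
}

/* Model addition on an unsigned integer that is `bits` wide. */
uint64_t add_mod(uint32_t a, uint32_t b, unsigned bits)
{
    return mod_pow2((uint64_t)a + b, bits);
}

/* Model subtraction on an unsigned integer that is `bits` wide. */
uint64_t sub_mod(uint32_t a, uint32_t b, unsigned bits)
{
    return mod_pow2((uint64_t)a - b, bits);
}

/* What a native 32-bit unsigned type does without any help. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}
```

With these helpers, add_mod(0xE0000020, 0x20000020, 32) and add_u32(0xE0000020, 0x20000020) both yield 0x40, and sub_mod(0, 1, 32) yields 0xFFFFFFFF, matching the overflow and underflow examples shown earlier.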
Another way to conceptualize it is to consider the extra bits of the result of a numeric overflow as being truncated. If you do the calculation 0xE0000020 + 0x20000020 in binary, you would have the following:
        1110 0000 0000 0000 0000 0000 0010 0000
    +   0010 0000 0000 0000 0000 0000 0010 0000
    = 1 0000 0000 0000 0000 0000 0000 0100 0000
The result you actually get in a is 0x40, which has a binary representation of 0000 0000 0000 0000 0000 0000 0100 0000.
You can see that it's the same as the result of the addition but without the highest bit. This isn't far from what's happening at the machine level. For example, Intel architectures have a carry flag (CF) that holds this highest bit. C doesn't have a mechanism for allowing access to this flag, but depending on the underlying architecture, it could be checked via assembly code.
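Even without access to the carry flag, portable C can recover the same information by comparing operands before or after the addition. The following is a minimal sketch, not a canonical implementation; the function names checked_add_u32 and add_wrapped_u32 are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Add two 32-bit unsigned values. Returns true and stores the sum
 * if it fits; returns false (leaving *sum untouched) if the
 * addition would wrap past 0xFFFFFFFF.
 */
bool checked_add_u32(uint32_t a, uint32_t b, uint32_t *sum)
{
    if (a > UINT32_MAX - b)
        return false;
    *sum = a + b;
    return true;
}

/* After-the-fact test: a wrapped sum is always smaller than an operand. */
bool add_wrapped_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b) < a;
}
```

Both checks flag 0xE0000020 + 0x20000020 as an overflow. The pre-check form is usually easier to audit because the wrapped value is never computed.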
Here's an example of a numeric overflow condition that occurs because of multiplication:
    unsigned int a;

    a=0xe0000020;
    a=a*0x42;
Again, the arithmetic is performed modulo 0x100000000. The result of the multiplication is 0xC0000840, which is (0xE0000020 * 0x42) modulo 0x100000000. Here it is in binary:
              1110 0000 0000 0000 0000 0000 0010 0000
    *         0000 0000 0000 0000 0000 0000 0100 0010
    = 11 1001 1100 0000 0000 0000 0000 1000 0100 0000
The result you actually get in a, 0xC0000840, has a binary representation of 1100 0000 0000 0000 0000 1000 0100 0000. Again, you can see how the higher bits that didn't fit into the result were effectively truncated. At a machine level, often it's possible to detect an overflow with integer multiplication as well as recover the high bits of a multiplication. For example, on Intel the imul instruction uses a destination object that's twice the size of the source operands when multiplying, and it sets the flags OF (overflow) and CF (carry) if the result of the multiplication requires a width greater than the source operand. Some code even uses inline assembly to check for numeric overflow (discussed in the "Multiplication Overflows on Intel" sidebar later in this chapter).
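In portable C, one way to approximate what the hardware offers is to perform the multiplication in a type twice as wide and then inspect the upper half. The sketch below assumes a 64-bit integer type is available; mul_wide_u32 and mul_overflows_u32 are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Split the full 64-bit product of two 32-bit values into halves. */
void mul_wide_u32(uint32_t a, uint32_t b, uint32_t *high, uint32_t *low)
{
    uint64_t product = (uint64_t)a * b;

    *high = (uint32_t)(product >> 32);  /* bits a 32-bit result loses */
    *low  = (uint32_t)product;          /* what a 32-bit result keeps */
}

/* The 32-bit multiplication overflows exactly when the high half is nonzero. */
bool mul_overflows_u32(uint32_t a, uint32_t b)
{
    uint32_t high, low;

    mul_wide_u32(a, b, &high, &low);
    return high != 0;
}
```

For 0xE0000020 * 0x42, the low half is 0xC0000840 and the high half is 0x39, which together form the full product 0x39C0000840 shown in binary above.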
You've seen examples of how arithmetic overflows could occur because of addition and multiplication. Another operator that can cause overflows is left shift, which, for this discussion, can be thought of as multiplication by a power of 2. It behaves much the same as multiplication, so an example hasn't been provided.
Now, you can look at some security exposures related to numeric overflow of unsigned integers. Listing 6-2 is a sanitized, edited version of an exploitable condition found recently in a client's code.
Listing 6-2. Integer Overflow Example
    u_char *make_table(unsigned int width, unsigned int height,
                       u_char *init_row)
    {
        unsigned int n;
        int i;
        u_char *buf;

        n = width * height;

        buf = (char *)malloc(n);
        if (!buf)
            return (NULL);

        for (i=0; i< height; i++)
            memcpy(&buf[i*width], init_row, width);

        return buf;
    }
The purpose of the make_table() function is to take a width, a height, and an initial row and create a table in memory where each row is initialized to have the same contents as init_row. Assume that users have control over the dimensions of the new table: width and height. If they specify large dimensions, such as a width of 1,000,000, and a height of 3,000, the function attempts to call malloc() for 3,000,000,000 bytes. The allocation likely fails, and the calling function detects the error and handles it gracefully. However, users can cause an arithmetic overflow in the multiplication of width and height if they make the dimensions just a bit larger. This overflow is potentially exploitable because the allocation is done by multiplying width and height, but the actual array initialization is done with a for loop. So if users specify a width of 0x400 and a height of 0x1000001, the result of the multiplication is 0x400000400. This value, modulo 0x100000000, is 0x00000400, or 1024. So 1024 bytes would be allocated, but then the for loop would copy init_row roughly 16 million too many times. A clever attacker might be able to leverage this overflow to take control of the application, depending on the low-level details of the process's runtime environment.
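One possible way to harden a function like the one in Listing 6-2 is to validate the dimensions with a division before multiplying them, so the product is never computed unless it is known to fit. The following is a sketch under stated assumptions, not the fix the original developers applied: the name make_table_checked and the MAX_TABLE_BYTES cap are invented for this example, and plain unsigned char stands in for u_char.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical policy limit on table size; choose one that suits the application. */
#define MAX_TABLE_BYTES ((size_t)64 * 1024 * 1024)

unsigned char *make_table_checked(unsigned int width, unsigned int height,
                                  const unsigned char *init_row)
{
    size_t n, i;
    unsigned char *buf;

    /* Dividing instead of multiplying keeps the check itself from wrapping. */
    if (width == 0 || height == 0 || width > MAX_TABLE_BYTES / height)
        return NULL;

    n = (size_t)width * height;   /* now known to be <= MAX_TABLE_BYTES */

    buf = malloc(n);
    if (!buf)
        return NULL;

    /* The index uses size_t so that i * width cannot wrap either. */
    for (i = 0; i < height; i++)
        memcpy(&buf[i * width], init_row, width);

    return buf;
}
```

With this check in place, a width of 0x400 and a height of 0x1000001 are rejected before any allocation happens, while ordinary small dimensions behave as before.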
Take a look at a real-world vulnerability that's similar to the previous example, found in the OpenSSH server. Listing 6-3 is from the OpenSSH 3.1 challenge-response authentication code: auth2-chall.c in the input_userauth_info_response() function.
Listing 6-3. Challenge-Response Integer Overflow Example in OpenSSH 3.1
    u_int nresp;
    ...
    nresp = packet_get_int();
    if (nresp > 0) {
        response = xmalloc(nresp * sizeof(char*));
        for (i = 0; i < nresp; i++)
            response[i] = packet_get_string(NULL);
    }
    packet_check_eom();

The nresp unsigned integer is user controlled, and its purpose is to tell the server how many responses to expect. It's used to allocate the response[] array and fill it with network data. During the allocation of the response[] array in the call to xmalloc(), nresp is multiplied by sizeof(char *), which is typically 4 bytes on a 32-bit platform. If users specify an nresp value that's large enough, a numeric overflow could occur, and the result of the multiplication could end up being a small number. For example, if nresp has a value of 0x40000020, the result of the multiplication with 4 is 0x80. Therefore, 0x80 bytes are allocated, but the following for loop attempts to retrieve 0x40000020 strings from the packet! This turned out to be a critical remotely exploitable vulnerability.
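The arithmetic behind this example can be reproduced with a small model. This sketch assumes a 32-bit target where sizeof(char *) is 4; PTR_SIZE_32, wrapped_alloc_size, and nresp_is_safe are hypothetical names and are not part of OpenSSH.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTR_SIZE_32 4u  /* assumed sizeof(char *) on a 32-bit target */

/* Size the vulnerable code would request: nresp * 4 modulo 2^32. */
uint32_t wrapped_alloc_size(uint32_t nresp)
{
    return (uint32_t)(nresp * PTR_SIZE_32);
}

/* True if nresp * 4 fits in 32 bits, so the allocation matches the loop. */
bool nresp_is_safe(uint32_t nresp)
{
    return nresp <= UINT32_MAX / PTR_SIZE_32;
}
```

Here wrapped_alloc_size(0x40000020) evaluates to 0x80, while nresp_is_safe(0x40000020) is false. A real server would more likely impose a much smaller protocol-level limit on the number of responses than the arithmetic maximum.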
Now turn your attention to numeric underflows. With unsigned integers, subtractions can cause a value to wrap under the minimum representable value of 0. The result of an underflow is typically a large positive number because of the modular nature of unsigned integers. Here's a brief example:

    unsigned int a;

    a=0x10;
    a=a-0x30;
Look at the calculation in binary:
      0000 0000 0000 0000 0000 0000 0001 0000
    - 0000 0000 0000 0000 0000 0000 0011 0000
    = 1111 1111 1111 1111 1111 1111 1110 0000

The result you get in a is the bit pattern for 0xffffffe0, which in twos complement representation is the correct negative value of -0x20. Recall that in modular arithmetic, if you advance past the maximum possible value, you wrap around to 0. A similar phenomenon occurs if you go below the minimum possible value: You wrap around to the highest possible value. Since a is an unsigned int type, it has a value of 0xffffffe0 instead of -0x20 after the subtraction. Listing 6-4 is an example of a numeric underflow involving an unsigned integer.
Listing 6-4. Unsigned Integer Underflow Example
    struct header {
        unsigned int length;
        unsigned int message_type;
    };

    char *read_packet(int sockfd)
    {
        int n;
        unsigned int length;
        struct header hdr;
        static char buffer[1024];

        if(full_read(sockfd, (void *)&hdr, sizeof(hdr))<=0){
            error("full_read: %m");
            return NULL;
        }

        length = ntohl(hdr.length);

        if(length > (1024 + sizeof (struct header) - 1)){
            error("not enough room in buffer\n");
            return NULL;
        }

        if(full_read(sockfd, buffer,
                     length - sizeof(struct header))<=0)
        {
            error("read: %m");
            return NULL;
        }

        buffer[sizeof(buffer)-1] = '\0';

        return strdup(buffer);
    }

This code reads a packet header from the network and extracts a 32-bit length field into the length variable. The length variable represents the total number of bytes in the packet, so the program first checks that the data portion of the packet isn't longer than 1024 bytes to prevent an overflow. It then tries to read the rest of the packet from the network by reading (length - sizeof(struct header)) bytes into buffer. This makes sense, as the code wants to read in the packet's data portion, which is the total length minus the length of the header.

The vulnerability is that if users supply a length less than sizeof(struct header), the subtraction of (length - sizeof(struct header)) causes an integer underflow and ends up passing a very large size parameter to full_read(). This error could result in a buffer overflow because at that point, read() would essentially copy data into the buffer until the connection is closed, which would allow attackers to take control of the process.
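A length check for code like Listing 6-4 needs a lower bound as well as an upper bound. The following is one way such a check might look; it is a sketch rather than the original program's fix, and the helper packet_data_length and the BUFFER_SIZE constant are assumptions introduced here.

```c
#include <stdbool.h>
#include <stddef.h>

struct header {
    unsigned int length;
    unsigned int message_type;
};

#define BUFFER_SIZE 1024u  /* mirrors the static buffer in Listing 6-4 */

/*
 * Validate the total packet length and compute the data length.
 * Returns false if the packet is shorter than its own header or
 * if the data portion would not fit in the buffer with its NUL.
 */
bool packet_data_length(unsigned int length, size_t *data_len)
{
    if (length < sizeof(struct header))
        return false;    /* the subtraction below would underflow */

    if (length - sizeof(struct header) > BUFFER_SIZE - 1)
        return false;    /* data portion too large for the buffer */

    *data_len = length - sizeof(struct header);
    return true;
}
```

Because the lower bound is tested first, the subtraction is only ever evaluated when its result is known to be non-negative.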
Signed Integer Boundaries
Signed integers are a slightly different animal. According to the C standard, the result of an arithmetic overflow or underflow with a signed integer is undefined, so it could potentially include a machine trap or fault, and an optimizing compiler is allowed to assume it never happens. However, on most common architectures, the underlying machine arithmetic wraps in a predictable way and doesn't result in any kind of exception. These boundary behaviors are a natural consequence of how twos complement arithmetic is implemented at the hardware level, and they should be consistent on mainstream machines.
If you recall, the maximum positive value that can be represented in a twos complement signed integer is one in which all bits are set to 1 except the most significant bit, which is 0. This is because the highest bit indicates the sign of the number, and a value of 1 in that bit indicates that the number is negative. When an operation on a signed integer causes an arithmetic overflow or underflow to occur, the resulting value "wraps around the sign boundary" and typically causes a change in sign. For example, in a 32-bit integer, the value 0x7FFFFFFF is a large positive number. Adding 1 to it produces the result 0x80000000, which is a large negative number. Take a look at another simple example:
    int a;

    a=0x7FFFFFF0;
    a=a+0x100;
The result of the addition is -0x7fffff10, or -2,147,483,408. Now look at the calculation in binary:
      0111 1111 1111 1111 1111 1111 1111 0000
    + 0000 0000 0000 0000 0000 0001 0000 0000
    = 1000 0000 0000 0000 0000 0000 1111 0000
The result you get in a is the bit pattern for 0x800000f0, which is the correct result of the addition, but because it's interpreted as a twos complement number, the value is actually interpreted as -0x7fffff10. In this case, a large positive number plus a small positive number resulted in a large negative number.
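The reinterpretation step can be modeled explicitly. To keep the sketch itself free of signed overflow, it performs the addition on unsigned values and then decodes the bit pattern by hand into a wider signed type; add_bits32 and as_signed32 are hypothetical helper names.

```c
#include <stdint.h>

/* Add as 32-bit unsigned values, yielding the raw 32-bit pattern. */
uint32_t add_bits32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}

/* Interpret a 32-bit pattern as a twos complement signed value. */
int64_t as_signed32(uint32_t bits)
{
    if (bits & UINT32_C(0x80000000))             /* sign bit set */
        return (int64_t)bits - INT64_C(0x100000000);
    return (int64_t)bits;
}
```

Here add_bits32(0x7FFFFFF0, 0x100) produces 0x800000F0, and as_signed32(0x800000F0) produces -2,147,483,408, which is -0x7FFFFF10.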
With signed addition, you can overflow the sign boundary by causing a positive number to wrap around 0x80000000 and become a negative number. You can also underflow the sign boundary by causing a negative number to wrap below 0x80000000 and become a positive number. Subtraction is identical to addition with a negative number, so you can analyze them as being essentially the same operation. Overflows during multiplication and shifting are also possible, and classifying their results isn't as easy. Essentially, the bits fall as they may; if a bit happens to end up in the sign bit of the result, the result is negative. Otherwise, it's not. Arithmetic overflows involving multiplication seem a little tricky at first glance, but attackers can usually make them return useful, targeted values.
Certain unexpected sign changes in arithmetic can lead to subtly exploitable conditions in code. These changes can cause programs to calculate space requirements incorrectly, leading to conditions similar to those that occur when crossing the maximum boundary for unsigned integers. Bugs of this nature typically occur in applications that perform arithmetic on integers taken directly from external sources, such as network data or files. Listing 6-5 is a simple example that shows how crossing the sign boundary can adversely affect an application.
Listing 6-5. Signed Integer Vulnerability Example
    char *read_data(int sockfd)
    {
        char *buf;
        int length = network_get_int(sockfd);

        if(!(buf = (char *)malloc(MAXCHARS)))
            die("malloc: %m");

        if(length < 0 || length + 1 >= MAXCHARS){
            free(buf);
            die("bad length: %d", length);
        }

        if(read(sockfd, buf, length) <= 0){
            free(buf);
            die("read: %m");
        }

        buf[length] = '\0';

        return buf;
    }

This example reads an integer from the network and performs some sanity checks on it. First, the length is checked to ensure that it's non-negative. Then the length is checked to ensure that it's less than MAXCHARS. However, in the second part of the length check, 1 is added to the length. This opens an attack vector: A value of 0x7FFFFFFF passes the first check (because it's greater than 0) and passes the second length check (as 0x7FFFFFFF + 1 is 0x80000000, which is a negative value). read() would then be called with an effectively unbounded length argument, leading to a potential buffer overflow situation.
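One way to avoid the problem is to move the arithmetic to the constant side of the comparison, so the untrusted value is never incremented. This is a minimal sketch: length_is_valid is a made-up helper, and the MAXCHARS value is purely illustrative because Listing 6-5 does not define it.

```c
#include <stdbool.h>

#define MAXCHARS 256  /* illustrative value only */

/*
 * Accept only lengths that leave room for the terminating NUL.
 * Comparing against MAXCHARS - 1 is equivalent to the intended
 * test length + 1 < MAXCHARS, but cannot overflow.
 */
bool length_is_valid(int length)
{
    return length >= 0 && length < MAXCHARS - 1;
}
```

With this form, 0x7FFFFFFF is rejected like any other oversized value.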
This kind of mistake is easy to make when dealing with signed integers, and it can be equally challenging to spot. Protocols that allow users to specify integers directly are especially prone to this type of vulnerability. To examine this in practice, take a look at a real application that performs an unsafe calculation. The following vulnerability was in the OpenSSL 0.9.6 codebase related to processing Abstract Syntax Notation (ASN.1) encoded data. (ASN.1 is a language used for describing arbitrary messages to be sent between computers, which are encoded using BER, its basic encoding rules.) This encoding is a perfect candidate for a vulnerability of this nature because the protocol deals explicitly with 32-bit integers supplied by untrusted clients. Listing 6-6 is taken from crypto/asn1/a_d2i_fp.c—the ASN1_d2i_fp() function, which is responsible for reading ASN.1 objects from buffered IO (BIO) streams. This code has been edited for brevity.
Listing 6-6. Integer Sign Boundary Vulnerability Example in OpenSSL 0.9.6l
    c.inf=ASN1_get_object(&(c.p),&(c.slen),&(c.tag),&(c.xclass),
                          len-off);
    ...
    {
        /* suck in c.slen bytes of data */
        want=(int)c.slen;
        if (want > (len-off))
        {
            want-=(len-off);
            if (!BUF_MEM_grow(b,len+want))
            {
                ASN1err(ASN1_F_ASN1_D2I_BIO,
                        ERR_R_MALLOC_FAILURE);
                goto err;
            }
            i=BIO_read(in,&(b->data[len]),want);

This code is called in a loop for retrieving ASN.1 objects. The ASN1_get_object() function reads an object header that specifies the length of the next ASN.1 object. This length is placed in the signed integer c.slen, which is then assigned to want. The ASN.1 object function ensures that this number isn't negative, so the highest value that can be placed in c.slen is 0x7FFFFFFF. At this point, len is the amount of data already read into memory, and off is the offset in that data to the object being parsed. So, (len-off) is the amount of data read into memory that hasn't yet been processed by the parser. If the code sees that the object is larger than the available unparsed data, it decides to allocate more space and read in the rest of the object.

The BUF_MEM_grow() function is called to allocate the required space in the memory buffer b; its second argument is a size parameter. The problem is that the len+want expression used for the second argument can be overflowed. Say that upon entering this code, len is 200 bytes, and off is 50. The attacker specifies an object size of 0x7FFFFFFF, which ends up in want. 0x7FFFFFFF is certainly larger than the 150 bytes of remaining data in memory, so the allocation code will be entered. Then 150 is subtracted from want to reflect the amount of data already read in, giving it a value of 0x7FFFFF69. The call to BUF_MEM_grow() will ask for len+want bytes, or 0x7FFFFF69 + 200. This is 0x80000031, which is interpreted as a large negative number.
Internally, the BUF_MEM_grow() function does a comparison to check its length argument against how much space it has previously allocated. Because a negative number is less than the amount of memory it has already allocated, it assumes everything is fine. So the reallocation is bypassed, and arbitrary amounts of data can be copied into allocated heap data, with severe consequences.
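A guard against this kind of sign-boundary crossing can test the operands against INT_MAX before adding them. The sketch below is a generic illustration and is not the change OpenSSL made; checked_add_nonneg is a hypothetical helper, and the values in the usage note assume a 32-bit int.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Add two non-negative ints. Returns false if either operand is
 * negative or if the sum would exceed INT_MAX; otherwise stores
 * the sum and returns true.
 */
bool checked_add_nonneg(int a, int b, int *sum)
{
    if (a < 0 || b < 0 || a > INT_MAX - b)
        return false;
    *sum = a + b;
    return true;
}
```

Applied to the scenario above, checked_add_nonneg(200, 0x7FFFFF69, &total) returns false, so the caller could reject the object instead of passing a negative size to the buffer-growing routine.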