Variables
A variable is a name for some bytes of memory in a program. When you assign a value to a variable, what you are really doing is storing that value in those bytes. Variables in a computer language are like the nouns in a natural language. They represent items or quantities in the problem space of your program.
C requires that you tell the compiler about any variables that you are going to use by declaring them. A variable declaration has the form
variabletype name;
C allows multiple variables to be declared in a single declaration:
variabletype name1, name2, name3;
A variable declaration causes the compiler to reserve storage (memory) for that variable. The value of a variable is the contents of its memory location. The next chapter describes variable declarations in more detail. It covers where variable declarations are placed, where the variables are created in memory, and the lifetimes of different classes of variables.
Integer Types
C provides the following types to hold integers: char, short, int, long, and long long. Table 1.1 shows the size in bytes of the integer types in 32- and 64-bit executables on Apple systems.
Table 1.1. The Sizes of Integer Types on iOS and Mac OS X
Type | 32-Bit | 64-Bit
char | 1 byte | 1 byte
short | 2 bytes | 2 bytes
int | 4 bytes | 4 bytes
long | 4 bytes | 8 bytes
long long | 8 bytes | 8 bytes
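The sizes in Table 1.1 can be checked for a particular build with the sizeof operator. The following is a minimal sketch rather than anything from the original text: the function names integerTypeSizesAreOrdered and sizeOfLong are invented for illustration, and the code assumes only a standard C compiler.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if char is one byte and each integer
   type is at least as large as the one before it, as in both columns
   of Table 1.1. */
int integerTypeSizesAreOrdered(void)
{
    return sizeof(char) == 1
        && sizeof(char) <= sizeof(short)
        && sizeof(short) <= sizeof(int)
        && sizeof(int) <= sizeof(long)
        && sizeof(long) <= sizeof(long long);
}

/* Hypothetical helper: the size of long in bytes. According to
   Table 1.1 this is the one type whose size differs between
   32-bit and 64-bit executables. */
size_t sizeOfLong(void)
{
    return sizeof(long);
}
```

Calling sizeOfLong() in a 32-bit executable and again in a 64-bit executable is a quick way to see the difference shown in the table.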
The char type is named char because it was originally intended to hold characters, but it is frequently used as an 8-bit integer type.
An integer type can be declared to be unsigned:
unsigned char a;
unsigned short b;
unsigned int c;
unsigned long d;
unsigned long long e;
When used alone, unsigned is taken to mean unsigned int:
unsigned a; // a is an unsigned int
An unsigned variable’s bit pattern is always interpreted as a non-negative number. If you assign a negative quantity to an unsigned variable, the value wraps around and the result is a large positive number. This is almost always a mistake.
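To make the wraparound concrete, here is a small sketch. The function name storeInUnsigned is hypothetical; the behavior it shows (conversion modulo UINT_MAX + 1) is what standard C specifies for assigning a signed value to an unsigned int.

```c
#include <limits.h>

/* Hypothetical helper: stores a signed int in an unsigned int the same
   way a plain assignment does, and returns the stored value. */
unsigned int storeInUnsigned(int value)
{
    unsigned int u = value;   /* implicit conversion; wraps if negative */
    return u;
}
```

With this sketch, storeInUnsigned(-1) returns UINT_MAX, the largest value an unsigned int can hold, while non-negative arguments come back unchanged.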
Floating-Point Types
C’s floating-point types are float, double, and long double. The sizes of the floating-point types are the same in both 32- and 64-bit executables:
float aFloat; // floats are 4 bytes
double aDouble; // doubles are 8 bytes
long double aLongDouble; // long doubles are 16 bytes
Floating-point values are always signed.
Truth Values
Ordinary expressions are commonly used for truth values. Expressions that evaluate to zero are considered false, and expressions that evaluate to non-zero are considered true (see the following sidebar).
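As an illustration of the rule, the following sketch uses a plain int as a condition. The function name isConsideredTrue is made up for this example.

```c
/* Hypothetical helper: reports how C treats an integer that is used
   as a condition. Returns 1 for "true" and 0 for "false". */
int isConsideredTrue(int value)
{
    if (value) {      /* no comparison needed: any non-zero value is true */
        return 1;
    }
    return 0;         /* only zero is false */
}
```

Note that negative values count as true as well: isConsideredTrue(-42) returns 1, and only isConsideredTrue(0) returns 0.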
Initialization
Variables can be initialized when they are declared:
int a = 9;
int b = 2*4;
float c = 3.14159;
char d = 'a';
A character enclosed in single quote marks is a character constant. It is numerically equal to the encoding value of the character. Here, the variable d has the numeric value of 97, which is the ASCII value of the character a.
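Because a character constant is just a small integer, it can take part in arithmetic. The sketch below assumes an ASCII-compatible character encoding, and the function names numericValueOf and nextLetter are invented for illustration.

```c
/* Hypothetical helper: the numeric value stored for a character. */
int numericValueOf(char c)
{
    return (int)c;
}

/* Hypothetical helper: the character whose encoding value is one higher.
   In ASCII, the lowercase letters are numbered consecutively. */
char nextLetter(char c)
{
    return (char)(c + 1);
}
```

Under the ASCII assumption, numericValueOf('a') is 97 and nextLetter('a') is 'b'.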
Pointers
A pointer is a variable whose value is a memory address. It “points” to a location in memory.
You declare a variable to be a pointer by preceding the variable name with an * in the declaration. The following code declares pointerVar to be a variable pointing to a location in memory that holds an integer:
int *pointerVar;
The unary & operator (“address-of” operator) is used to get the address of a variable so it can be stored in a pointer variable. The following code sets the value of the pointer variable b to be the address of the integer variable a:
1 int a = 9;
2
3 int *b;
4
5 b = &a;
Now let’s take a look at that example line by line:
- Line 1 declares a to be an int variable. The compiler reserves 4 bytes of storage for a and initializes them with a value of 9.
- Line 3 declares b to be a pointer to an int.
- Line 5 uses the & operator to get the address of a and then assigns a’s address as the value of b.
Figure 1.1 illustrates the process. (Assume that the compiler has located a beginning at memory address 1048880.) The arrow in the figure shows the concept of pointing.
Figure 1.1. Pointer variables
The unary * operator (called the “contents of” or “dereferencing” operator) is used to set or retrieve the contents of a memory location by using a pointer variable that points to that location. One way to think of this is to consider the expression *pointerVar to be an alias, another name, for the memory location whose address is stored in pointerVar. The expression *pointerVar can be used to either set or retrieve the contents of that memory location. In the following code, b is set to the address of a, so *b becomes an alias for a:
int a;
int c;
int *b;
a = 9;
b = &a;
c = *b; // c is now 9
*b = 10; // a is now 10
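A common use of the two operators together is a function that changes variables belonging to its caller. The following is a sketch only; the name swapInts is hypothetical.

```c
/* Hypothetical helper: exchanges the values of two ints owned by the
   caller. The caller passes the addresses of its variables. */
void swapInts(int *first, int *second)
{
    int temp = *first;   /* read the value that first points to */
    *first = *second;    /* write through first */
    *second = temp;      /* write through second */
}
```

A caller with two int variables x and y would write swapInts(&x, &y); after the call, x holds the old value of y, and y holds the old value of x.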
Pointers are used in C to reference dynamically allocated memory (see Chapter 2, “More about C Variables”). Pointers are also used to avoid copying large chunks of memory, such as arrays and structures (discussed later in this chapter), from one part of a program to another. For example, instead of passing a large structure to a function, you pass the function a pointer to the structure. The function then uses the pointer to access the structure. As you will see later in the book, Objective-C objects are always referenced by pointer.
Generic Pointers
A variable declared as a pointer to void is a generic pointer:
void *genericPointer;
A generic pointer may be set to the address of any variable type:
int a = 9;
void *genericPointer;
genericPointer = &a;
However, trying to obtain a value from a generic pointer is an error because the compiler has no way of knowing how to interpret the bytes at the address indicated by the generic pointer:
int a = 9;
int b;
void *genericPointer;
genericPointer = &a;
b = *genericPointer; // WRONG - won't compile
To obtain a value through a void* pointer, you must cast it to a pointer to a known type:
int a = 9;
int b;
void *genericPointer;
genericPointer = &a;
b = *((int*) genericPointer) ; // OK - b is now 9
The cast operator (int*) forces the compiler to consider genericPointer to be a pointer to an integer. (See Conversion and Casting later in the chapter.)
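Because a generic pointer carries no type information, code that receives one needs some other way to learn what it points to. The sketch below passes that information as a separate argument. The function name readNumber and its pointsToDouble parameter are hypothetical, and the approach is one possible convention, not the only one.

```c
/* Hypothetical helper: reads a number through a generic pointer.
   pointsToDouble must be non-zero if the pointer holds the address of
   a double, and zero if it holds the address of an int. C cannot check
   that the caller is telling the truth. */
double readNumber(const void *genericPointer, int pointsToDouble)
{
    if (pointsToDouble) {
        return *((const double *) genericPointer);
    }
    return (double) *((const int *) genericPointer);
}
```

If the flag does not match the variable whose address was passed, the cast still compiles, but the bytes are interpreted as the wrong type.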
C does not check to see that a pointer variable points to a valid area of memory. Incorrect use of pointers has probably caused more crashes than any other aspect of C programming.
Arrays
A C array is an ordered collection of elements of the same type. C arrays are declared by adding the number of elements in the array, enclosed in square brackets ([]), to the declaration, after the type and array name:
int a[100];
Individual elements of the array are accessed by placing the index of the element in [] after the array name:
a[6] = 9;
The index is zero-based. In the previous example, the legitimate indices run from 0 to 99. Access to C arrays is not bounds checked on either end. C will blithely let you do the following:
int a[100];
a[200] = 25;
a[-100] = 30;
Using an index outside of the array’s bounds lets you trash memory belonging to other variables, resulting in either crashes or corrupted data. Taking advantage of this lack of checking is one of the pillars of mischievous malware.
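Since C does not check bounds for you, code that takes an index from an untrusted source has to do the check itself. Here is a minimal sketch of such a check; the function name setElementChecked is invented for this example.

```c
/* Hypothetical helper: stores value at array[index] only if index is a
   legitimate index for an array of count elements.
   Returns 1 on success and 0 if the index is out of bounds. */
int setElementChecked(int array[], int count, int index, int value)
{
    if (index < 0 || index >= count) {
        return 0;             /* refuse rather than trash memory */
    }
    array[index] = value;
    return 1;
}
```

For the 100-element array a, setElementChecked(a, 100, 6, 9) stores the value and returns 1, whereas indices of 200 or -100 are rejected and leave memory untouched.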
The bracket notation is just a nicer syntax for pointer arithmetic. When the name of an array is used in an expression without the array brackets, it is treated as a pointer to the first element of the array. These two lines are completely equivalent:
a[6] = 9;
*(a + 6) = 9;
When compiling an expression using pointer arithmetic, the compiler takes into account the size of the type the pointer is pointing to. If a is an array of int, the expression *(a + 2) refers to the contents of the 4 bytes (one int worth) of memory at an address 8 bytes (two int) beyond the beginning of the array a. However, if a is an array of char, the expression *(a + 2) refers to the contents of 1 byte (one char worth) of memory at an address 2 bytes (two char) beyond the beginning of the array a.
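The scaling can be observed directly by measuring, in bytes, how far a + 2 is from a. This is an illustrative sketch with invented function names; it converts the int pointers to char pointers so that the subtraction counts bytes instead of elements.

```c
#include <stddef.h>

/* Hypothetical helper: distance in bytes between a + 2 and a
   when a is an array of int. */
size_t bytesSkippedInIntArray(void)
{
    int a[10];
    return (size_t)((char *)(a + 2) - (char *)a);
}

/* Hypothetical helper: the same measurement when a is an array of char. */
size_t bytesSkippedInCharArray(void)
{
    char a[10];
    return (size_t)((a + 2) - a);
}
```

The first function returns 2 * sizeof(int), which is 8 when an int is 4 bytes; the second returns 2.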
Multidimensional Arrays
Multidimensional arrays are declared as follows:
int b[4][10];
Multidimensional arrays are stored linearly in memory by rows. Here, b[0][0] is the first element, b[0][1] is the second element, and b[1][0] is the eleventh element.
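Using pointer notation:
b[i][j]
may be written as
*(*(b + i) + j)
Here, b + i points to row i, and the element itself lies at an offset of i*10 + j ints from the beginning of the array.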
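The row-by-row layout can be checked with a short sketch. The function name sameElement is hypothetical. It stores each element's linear position in the element itself and then reads one element back with bracket notation and with pointer notation, where the pointer form selects the row first and the column second.

```c
/* Hypothetical helper: returns 1 if b[i][j], its pointer-notation
   equivalent, and the linear position i*10 + j all agree. */
int sameElement(int i, int j)
{
    int b[4][10];
    int row, column;

    /* Fill the array so each element holds its own linear position. */
    for (row = 0; row < 4; row++) {
        for (column = 0; column < 10; column++) {
            b[row][column] = row * 10 + column;
        }
    }

    /* b + i points to row i; *(b + i) + j points to column j in that row. */
    return b[i][j] == *(*(b + i) + j) && b[i][j] == i * 10 + j;
}
```

For example, b[1][0] holds 10 in this sketch, which matches its place as the eleventh element when counting from zero.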
Strings
A C string is a one-dimensional array of bytes (type char) terminated by a zero byte. A constant C string is coded by placing the characters of the string between double quote marks (""):
"A constant string"
When the compiler creates a constant string in memory, it automatically adds the zero byte at the end. But if you declare an array of char that will be used to hold a string, you must remember to include the zero byte when deciding how much space you need. The following line of code copies the five characters of the constant string "Hello" and its terminating zero byte to the array aString:
char aString[6] = "Hello";
As with any other array, arrays representing strings are not bounds checked. Over-running string buffers used for program input is a favorite trick of hackers.
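The extra byte is easy to see by comparing strlen, which counts characters up to the zero byte, with sizeof, which counts all the bytes of storage. The sketch below also shows one way to guard a copy into a fixed-size buffer. The function names are hypothetical; strlen and memcpy are from the standard C library.

```c
#include <string.h>

/* Hypothetical helper: characters in "Hello", not counting the zero byte. */
size_t helloLength(void)
{
    return strlen("Hello");
}

/* Hypothetical helper: bytes occupied by "Hello", zero byte included. */
size_t helloStorage(void)
{
    return sizeof("Hello");
}

/* Hypothetical helper: copies source into buffer only if the string and
   its terminating zero byte both fit. Returns 1 on success, 0 otherwise. */
int copyStringChecked(char *buffer, size_t bufferSize, const char *source)
{
    size_t needed = strlen(source) + 1;   /* +1 for the zero byte */

    if (needed > bufferSize) {
        return 0;                         /* would over-run the buffer */
    }
    memcpy(buffer, source, needed);
    return 1;
}
```

helloLength() returns 5 and helloStorage() returns 6, which is why aString needs six elements. With a six-byte buffer, copyStringChecked accepts "Hello" but refuses "Hello!".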
A variable of type char* can be initialized to point to a constant string. You can set such a variable to point at a different string, but you can’t use it to modify a constant string:
char *aString = "Hello";
aString = "World";
aString[4] = 'q'; // WRONG - causes a crash, "World" is a constant
The first line points aString at the constant string "Hello". The second line changes aString to point at the constant string "World". The third line attempts to modify a constant string, which C leaves undefined; in practice it causes a crash, because constant strings are stored in a region of protected, read-only memory.
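If you do need to change the characters, one option is to declare an array instead of a pointer, so that the program gets its own writable copy of the string. This is a sketch under that assumption; the function name modifyArrayCopy is invented.

```c
#include <string.h>

/* Hypothetical helper: modifies a private copy of "World" and returns 1
   if the copy now reads "Worlq". */
int modifyArrayCopy(void)
{
    char aString[] = "World";   /* array of 6 chars, initialized by copying */

    aString[4] = 'q';           /* OK - the array belongs to this function */
    return strcmp(aString, "Worlq") == 0;
}
```

The difference from the pointer version is only in the declaration: char aString[] reserves storage and copies the constant into it, whereas char *aString merely points at the constant.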
Structures
A structure is a collection of related variables that can be referred to as a single entity. The following is an example of a structure declaration:
struct dailyTemperatures {
    float high;
    float low;
    int year;
    int dayOfYear;
};
The individual variables in a structure are called member variables or just members for short. The name following the keyword struct is a structure tag. A structure tag identifies the structure. It can be used to declare variables typed to the structure:
struct dailyTemperatures today;
struct dailyTemperatures *todayPtr;
In the preceding example, today is a dailyTemperatures structure, whereas todayPtr is a pointer to a dailyTemperatures structure.
The dot operator (.) is used to access individual members of a structure from a structure variable. The pointer operator (->) is used to access structure members from a variable that is a pointer to a structure:
todayPtr = &today;
today.high = 68.0;
todayPtr->high = 68.0;
The last two statements do the same thing.
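The -> operator is most often seen inside functions that receive a pointer to a structure instead of a copy of it. The following sketch repeats the structure declaration so that it stands alone; the function name recordReading is hypothetical.

```c
struct dailyTemperatures {
    float high;
    float low;
    int year;
    int dayOfYear;
};

/* Hypothetical helper: updates the day's extremes with a new reading.
   The structure is passed by pointer, so the caller's copy is modified
   and no structure is copied. */
void recordReading(struct dailyTemperatures *day, float reading)
{
    if (reading > day->high) {
        day->high = reading;
    }
    if (reading < day->low) {
        day->low = reading;
    }
}
```

A caller with a structure variable today would write recordReading(&today, 72.5f); inside the function, day->high refers to the same storage as today.high.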
Structures can have other structures as members. The previous example could have been written like this:
struct hiLow {
    float high;
    float low;
};
struct dailyTemperatures {
    struct hiLow tempExtremes;
    int year;
    int dayOfYear;
};
Setting the high temperature for today would then look like this:
struct dailyTemperatures today;
today.tempExtremes.high = 68.0;
typedef
The typedef declaration provides a means for creating aliases for variable types:
typedef float Temperature;
Temperature can now be used to declare variables, just as if it were one of the built-in types:
Temperature high, low;
typedefs just provide alternate names for variable types. Here, high and low are still floats. The term typedef is often used as a verb when talking about C code, as in “Temperature is typedef’d to float.”
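A short sketch shows the alias in use. The function names midpoint and temperatureIsFloatSized are invented for this example; the typedef itself is the one from the text.

```c
typedef float Temperature;

/* Hypothetical helper: the value halfway between two temperatures.
   The arithmetic is ordinary float arithmetic. */
Temperature midpoint(Temperature high, Temperature low)
{
    return (high + low) / 2;
}

/* Hypothetical helper: returns 1, because Temperature is only another
   name for float and therefore has the same size. */
int temperatureIsFloatSized(void)
{
    return sizeof(Temperature) == sizeof(float);
}
```

The compiler treats Temperature and float interchangeably, so midpoint can be called with plain float arguments without a cast.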
Enumeration Constants
An enum statement lets you define a set of integer constants:
enum woodwind {
    oboe,
    flute,
    clarinet,
    bassoon
};
The result of the previous statement is that oboe, flute, clarinet, and bassoon are constants with values of 0, 1, 2, and 3, respectively.
If you don’t like going in order from zero, you can assign the values of the constants yourself. Any constant without an assignment has a value one higher than the previous constant:
enum woodwind {
    oboe=100,
    flute=150,
    clarinet,
    bassoon=200
};
The preceding statement makes oboe, flute, clarinet, and bassoon equal to 100, 150, 151, and 200, respectively.
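The numbering rule can be confirmed with a few lines of code. In this sketch, the function name valueOfClarinet is hypothetical; the enum is the one from the text.

```c
enum woodwind {
    oboe=100,
    flute=150,
    clarinet,        /* no assignment: one higher than flute */
    bassoon=200
};

/* Hypothetical helper: the integer value the compiler gave to clarinet. */
int valueOfClarinet(void)
{
    return clarinet;
}
```

valueOfClarinet() returns 151, one more than the value assigned to flute.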
The name after the keyword enum is called an enumeration tag. Enumeration tags are optional. Enumeration tags can be used to declare variables:
enum woodwind soloist;
soloist = oboe;
Enumerations are useful for defining multiple constants, and for helping to make your code self-documenting, but they aren’t distinct types and they don’t receive much support from the compiler. The declaration enum woodwind soloist; shows your intent that soloist should be restricted to one of oboe, flute, clarinet, or bassoon, but unfortunately, the compiler does nothing to enforce the restriction. The compiler considers soloist to be an int, and it lets you assign any integer value to soloist without generating a warning:
enum woodwind {
    oboe,
    flute,
    clarinet,
    bassoon
};
enum woodwind soloist;
soloist = 5280; // No complaint from the compiler!
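If the restriction matters to your program, you have to enforce it yourself. One possible approach, sketched below with the invented function name isValidWoodwind, is to test a value against the list of constants before using it.

```c
enum woodwind {
    oboe,
    flute,
    clarinet,
    bassoon
};

/* Hypothetical helper: returns 1 if value is one of the four woodwind
   constants and 0 for any other integer. */
int isValidWoodwind(int value)
{
    switch (value) {
    case oboe:
    case flute:
    case clarinet:
    case bassoon:
        return 1;
    default:
        return 0;
    }
}
```

With this check in place, isValidWoodwind(5280) returns 0, so the bad assignment can be caught at run time even though the compiler accepts it.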