Why Move to 64-Bit?
One significant motivation for porting your code to 64-bit might be simply keeping up with the times. Beyond the other advantages I mentioned earlier, 64-bit mode gives an application a much larger address space and therefore access to far more memory, which may result in faster running time, particularly if your application has a significant number-crunching element.
Let's look now at some of the issues that must be tackled in moving from 32-bit to 64-bit C/C++ code. Perhaps less well-known among the merits and demerits of 64-bit code is the topic of data alignment. Alignment can cause some very knotty problems for the unwary, so that's a good place to start.
Alignment Issues
Alignment dictates the layout of data in memory. To illustrate alignment, imagine you have a simple structure such as this:
    // #pragma pack (1)
    struct AlignSample {
        unsigned size;
        void* pointer;
    } object;
Notice that I've commented out the pragma statement. This preprocessor statement instructs the compiler to enforce a given alignment in the structure. Packing can be important in situations where space is at a premium; for example, in certain specialized networking environments.
In this particular case, the instruction requests no byte-boundary padding in the structure. We'll uncomment the pragma shortly to see its effects. Now we want to investigate size and alignment as they relate to this super-simple structure. Let's do this in a generic way, using a simple template and a function, as follows:
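    #include <cstdio>
    #include <typeinfo>

    template <typename T>
    void sizeAndAlignment()
    {
        // sizeof and __alignof both yield a size_t, so print them with %zu
        printf("Size of type %zu\n", sizeof(T));
        printf("Alignment of type %s %zu\n", typeid(T).name(), __alignof(T));
    }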
This code uses run-time type information, also known as run-time type identification (RTTI), with typeid(T).name() to report the name of the type it was instantiated with.
RTTI is a standard C++ feature that can be used to acquire type information at run time. The typeid operator can be used on simple data types, such as integers and Booleans, or (as in this case) on template type parameters. The exact string returned by name() is implementation-defined.
Running the C++ Code
Now that we have a generic function, we can call it for a variety of data types. This means that we don't need a separate function for each data type of interest; the compiler generates the right version of the function for whatever type we name, which saves us some coding. Let's try it for the AlignSample structure shown earlier:
sizeAndAlignment<AlignSample>();
Here's the program output:
    Size of type 16
    Alignment of type 11AlignSample 8
In this case, we get to see the size (16 bytes), the type name as reported by typeid (11AlignSample, the compiler's mangled form of AlignSample), and the alignment requirement (8-byte boundary).
I now uncomment the pragma statement above the structure definition, which produces the following code:
    #pragma pack (1)
    struct AlignSample {
        unsigned size;
        void* pointer;
    } object;
Re-running the program produces this output:
    Size of type 12
    Alignment of type 11AlignSample 1
Notice that the now-packed structure size has shrunk from 16 to 12 bytes. Also, the alignment is now on a byte boundary. Clearly, this example modifies the alignment by using packing. It's worth noting that alignment can differ for specific data types when moving from 32-bit to 64-bit. For example, a pointer to void on a 32-bit processor has a size of 4 bytes and alignment of 4 bytes, whereas on a 64-bit platform, the same type has a size of 8 bytes and alignment of 8 bytes.
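To see where the extra bytes in the unpacked structure go, it can help to print the member offset alongside the size. The following is a minimal sketch rather than part of the original program: the helper names paddingBeforePointer and printLayout are made up for illustration, and it relies on the standard offsetof macro applied to the unpacked AlignSample structure.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdio>    // std::printf

struct AlignSample {
    unsigned size;
    void* pointer;
};

// Bytes of padding the compiler inserted between 'size' and 'pointer'.
// (Hypothetical helper, named for illustration only.)
constexpr std::size_t paddingBeforePointer() {
    return offsetof(AlignSample, pointer) - sizeof(unsigned);
}

// Prints the overall size, the offset of 'pointer', and the padding
// that precedes it. (Hypothetical helper, named for illustration only.)
void printLayout() {
    std::printf("sizeof(AlignSample) = %zu\n", sizeof(AlignSample));
    std::printf("offset of pointer   = %zu\n", offsetof(AlignSample, pointer));
    std::printf("padding after size  = %zu\n", paddingBeforePointer());
}
```

On a typical 64-bit build, a call to printLayout() would be expected to report 4 bytes of padding after size, and on a typical 32-bit build none, although the exact figures depend on the compiler and ABI.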
So a potential porting concern arises with any C/C++ code that makes assumptions about the location of such data in memory. This rule also applies to code that makes assumptions about the memory size of such structures. Watch out for alignment issues: they can be really hard to find and fix.
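One way to surface such assumptions early is to state them in the code, so that the build fails when they stop holding. The sketch below assumes a hypothetical record called WireHeader and a hypothetical helper wireHeaderSize (neither is from the original program), and it combines fixed-width types from <cstdint> with static_assert. The expected numbers are assumptions for a typical 64-bit target.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::uint32_t, std::uint64_t

// Hypothetical record used only for illustration.
struct WireHeader {
    std::uint32_t length;   // 4 bytes on every platform
    std::uint64_t offset;   // 8 bytes on every platform, unlike long or void*
};

// Layout assumptions, checked at compile time. The values are what a
// typical 64-bit ABI produces; other targets may differ, in which case
// the build stops here instead of misbehaving at run time.
static_assert(offsetof(WireHeader, offset) == 8,
              "unexpected padding before WireHeader::offset");
static_assert(sizeof(WireHeader) == 16,
              "unexpected size for WireHeader");

// Returns the number of bytes the record occupies, for callers that
// previously hard-coded the figure.
constexpr std::size_t wireHeaderSize() {
    return sizeof(WireHeader);
}
```

The design choice here is to fail loudly: a target that lays the record out differently trips the assertion at compile time, which is usually easier to diagnose than a corrupted read at run time.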
Pointer arithmetic is an allied area of concern that typically receives a lot more coverage online than alignment. Let's have a look at this issue next.
Pointer Arithmetic
The general rule for pointers is that if you increment the pointer by 1, then the associated address will be increased by the size of the underlying data type. As an example, on most 64-bit platforms, an int is 4 bytes. So, by definition, incrementing an int pointer increases the address by 4 bytes, the size of the underlying type. Let's see this principle in action with the following code:
    int a[5];
    a[0] = 0;
    a[1] = 1;
    a[2] = 2;
    a[3] = 3;
    a[4] = 4;
    int* b = &a[2];   // Point to the third element in the array, value = 2
    b++;              // Increment the pointer, value now = 3
    printf("Value of *b = %d\n", *b);
    printf("Sizeof int %zu\n", sizeof(int));                // sizeof yields a size_t
    printf("Pointer size %zu\n", sizeof(&a[4]));
    printf("Pointer difference %td\n", &a[4] - &a[3]);      // difference is a ptrdiff_t
    printf("Alignment of type %zu\n", __alignof(a[0]));
The above code produces this annotated output:
    Value of *b = 3 ------> Notice the increment operation to b, which points b at a[3]
    Sizeof int 4 ----------> The underlying data type is 4 bytes
    Pointer size 8 --------> This would be 4 on a 32-bit platform
    Pointer difference 1 ----> Subtracting adjacent pointers produces a simple index value
    Alignment of type 4 ----> Each array element is aligned on a 4-byte boundary
The increment operation allows us to move the pointer along to whatever array element is of interest to us. The pointer size is 8 bytes, and subtracting two pointers within an array gives us what might be called a “dimensionless” data-item view. Subtracting the address of element 3 from element 4 gives us the value 1; that is, one underlying data item.
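The distinction between counting in elements and counting in bytes can be made explicit with a short sketch. The function names elementDistance, byteDistance, and printDistances are hypothetical, chosen for illustration. The result of subtracting two pointers has the standard type std::ptrdiff_t, which printf prints with %td.

```cpp
#include <cstddef>   // std::ptrdiff_t
#include <cstdio>    // std::printf

// Distance between two pointers into the same array, measured in elements.
constexpr std::ptrdiff_t elementDistance(const int* first, const int* last) {
    return last - first;
}

// The same distance measured in bytes: element count times element size.
constexpr std::ptrdiff_t byteDistance(const int* first, const int* last) {
    return (last - first) * static_cast<std::ptrdiff_t>(sizeof(int));
}

// Prints both views of the distance between a[1] and a[4].
void printDistances() {
    int a[5] = {0, 1, 2, 3, 4};
    std::printf("elements: %td\n", elementDistance(&a[1], &a[4]));
    std::printf("bytes:    %td\n", byteDistance(&a[1], &a[4]));
}
```

With a 4-byte int, printDistances() would report a distance of 3 elements and 12 bytes. Both functions are only meaningful for pointers into the same array.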
Pointer Problems and the Strict Aliasing Rules
It's worth pointing out here that the strict aliasing rules make it undefined behavior to access an object through a pointer to an incompatible type. This is a fundamental tenet of C programming. There are exceptions: access through a char* (or unsigned char*) is allowed, and a void pointer may be cast back to the object's original type. These rules allow the compiler to assume that pointers to different types do not point to the same memory, which in turn enables optimizations. The strict aliasing rules don't, however, stop us making that all-too-common error of overrunning the end of an array. What happens if you violate the strict aliasing rules? Well, it's undefined; in other words, anything can happen.
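As an illustration of staying within these rules, the sketch below reads the bit pattern of a float without dereferencing a pointer of the wrong type. It copies the bytes with std::memcpy, a common and well-defined alternative. The function names floatBits, bitsToFloat, firstByte, and checkRoundTrip are hypothetical, and the code assumes that float is 32 bits wide.

```cpp
#include <cassert>   // assert
#include <cstdint>   // std::uint32_t
#include <cstring>   // std::memcpy

// Undefined behavior: reads a float object through an incompatible pointer type.
//   std::uint32_t bad = *reinterpret_cast<const std::uint32_t*>(&value);

// Well-defined alternative: copy the object representation byte by byte.
std::uint32_t floatBits(float value) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// The reverse conversion, again through memcpy.
float bitsToFloat(std::uint32_t bits) {
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Reading individual bytes through unsigned char* is permitted by the rules.
unsigned char firstByte(const float& value) {
    return *reinterpret_cast<const unsigned char*>(&value);
}

// Sanity check: converting to bits and back yields the original value
// (for any value that compares equal to itself, so not for NaN).
void checkRoundTrip(float value) {
    assert(bitsToFloat(floatBits(value)) == value);
}
```

Compilers generally optimize a small fixed-size memcpy like this into a plain register move, so the well-defined version need not cost anything at run time.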