Main Language Differences
We will now take a more structured look at the areas where C++ differs from Java and C#. Many of the language differences are due to C++'s compiled nature and commitment to performance. Thus, C++ does not check array bounds at run-time, and there is no garbage collector to reclaim unused dynamically allocated memory.
For the sake of brevity, C++ constructs that are nearly identical to their Java and C# counterparts are not reviewed. In addition, some C++ topics are not covered here because they are not necessary when programming using Qt. Among these are defining template classes and functions, defining union types, and using exceptions. For the whole story, refer to a book such as The C++ Programming Language by Bjarne Stroustrup (Addison-Wesley, 2000) or C++ for Java Programmers by Mark Allen Weiss (Prentice Hall, 2003).
Primitive Data Types
The primitive data types offered by the C++ language are similar to those found in Java or C#. Table D.2 lists C++'s primitive types and their definitions on the platforms supported by Qt 4.
Table D.2. Primitive C++ types
C++ Type | Description
bool | Boolean value
char | 8-bit integer
short | 16-bit integer
int | 32-bit integer
long | 32-bit or 64-bit integer
long long [*] | 64-bit integer
float | 32-bit floating-point value (IEEE 754)
double | 64-bit floating-point value (IEEE 754)
By default, the short, int, long, and long long data types are signed, meaning that they can hold negative values as well as positive values. If we only need to store nonnegative integers, we can put the unsigned keyword in front of the type. Whereas a short can hold any value between -32768 and +32767, an unsigned short goes from 0 to 65535. The right-shift operator >> has unsigned ("fill with 0s") semantics if its left operand is unsigned.
The bool type can take the values true and false. In addition, numeric types can be used where a bool is expected, with the rule that 0 means false and any non-zero value means true.
The char type is used for storing ASCII characters and 8-bit integers (bytes). When used as an integer, it can be signed or unsigned, depending on the platform. The types signed char and unsigned char are available as unambiguous alternatives to char. Qt provides a QChar type that stores 16-bit Unicode characters.
Instances of built-in types are not initialized by default. When we create an int variable, its value could conceivably be 0, but could just as likely be -209486515. Fortunately, most compilers warn us when we attempt to read the contents of an uninitialized variable, and we can use tools such as Rational PurifyPlus and Valgrind to detect uninitialized memory accesses and other memory-related problems at run-time.
In memory, the numeric types (except long) have identical sizes on the different platforms supported by Qt, but their representation varies depending on the system's byte order. On big-endian architectures (such as PowerPC and SPARC), the 32-bit value 0x12345678 is stored as the four bytes 0x12 0x34 0x56 0x78, whereas on little-endian architectures (such as Intel x86), the byte sequence is reversed. This makes a difference in programs that copy memory areas onto disk or that send binary data over the network. Qt's QDataStream class, presented in Chapter 12, can be used to store binary data in a platform-independent way.
Class Definitions
Class definitions in C++ are similar to those in Java and C#, but there are several differences to be aware of. We will study these differences using a series of examples. Let's start with a class that represents an (x, y) coordinate pair:
#ifndef POINT2D_H
#define POINT2D_H

class Point2D
{
public:
    Point2D() { xVal = 0; yVal = 0; }
    Point2D(double x, double y) { xVal = x; yVal = y; }

    void setX(double x) { xVal = x; }
    void setY(double y) { yVal = y; }
    double x() const { return xVal; }
    double y() const { return yVal; }

private:
    double xVal;
    double yVal;
};

#endif
The preceding class definition would appear in a header file, typically called point2d.h. The example exhibits the following C++ idiosyncrasies:
- A class definition is divided into public, protected, and private sections, and ends with a semicolon. If no section is specified, the default is private. (For compatibility with C, C++ provides a struct keyword that is identical to class except that the default is public if no section is specified.)
- The class has two constructors (one that has no parameters and one that has two). If we declared no constructor, C++ would automatically supply one with no parameters and an empty body.
- The getter functions x() and y() are declared to be const. This means that they don't (and can't) modify the member variables or call non-const member functions (such as setX() and setY()).
The preceding functions were implemented inline, as part of the class definition. An alternative is to provide only function prototypes in the header file and to implement the functions in a .cpp file. Using this approach, the header file would look like this:
#ifndef POINT2D_H
#define POINT2D_H

class Point2D
{
public:
    Point2D();
    Point2D(double x, double y);

    void setX(double x);
    void setY(double y);
    double x() const;
    double y() const;

private:
    double xVal;
    double yVal;
};

#endif
The functions would then be implemented in point2d.cpp:
#include "point2d.h"

Point2D::Point2D()
{
    xVal = 0.0;
    yVal = 0.0;
}

Point2D::Point2D(double x, double y)
{
    xVal = x;
    yVal = y;
}

void Point2D::setX(double x)
{
    xVal = x;
}

void Point2D::setY(double y)
{
    yVal = y;
}

double Point2D::x() const
{
    return xVal;
}

double Point2D::y() const
{
    return yVal;
}
We start by including point2d.h because the compiler needs the class definition before it can parse member function implementations. Then we implement the functions, prefixing the function name with the class name using the :: operator.
We have seen how to implement a function inline and now how to implement it in a .cpp file. The two approaches are semantically equivalent, but when we call a function that is declared inline, most compilers simply expand the function's body instead of generating an actual function call. This normally leads to faster code, but might increase the size of your application. For this reason, only very short functions should be implemented inline; longer functions should always be implemented in a .cpp file. In addition, if we forget to implement a function and try to call it, the linker will complain about an unresolved symbol.
Now, let's try to use the class.
#include "point2d.h"

int main()
{
    Point2D alpha;
    Point2D beta(0.666, 0.875);

    alpha.setX(beta.y());
    beta.setY(alpha.x());

    return 0;
}
In C++, variables of any type can be declared directly without using new. The first variable is initialized using the default Point2D constructor (the constructor that has no parameters). The second variable is initialized using the second constructor. Access to an object's member is performed using the . (dot) operator.
Variables declared this way behave like Java/C# primitive types such as int and double. For example, when we use the assignment operator, the contents of the variable are copied—not just a reference to an object. And if we modify a variable later on, any other variables that were assigned from it are left unchanged.
As an object-oriented language, C++ supports inheritance and polymorphism. To illustrate how it works, we will review the example of a Shape abstract base class and a subclass called Circle. Let's start with the base class:
#ifndef SHAPE_H
#define SHAPE_H

#include "point2d.h"

class Shape
{
public:
    Shape(Point2D center) { myCenter = center; }

    virtual void draw() = 0;

protected:
    Point2D myCenter;
};

#endif
The definition appears in a header file called shape.h. Since the class definition refers to the Point2D class, we include point2d.h.
The Shape class has no base class. Unlike Java and C#, C++ doesn't provide an Object class from which all classes are implicitly derived. Qt provides QObject as a natural base class for all kinds of objects.
The draw() function declaration has two interesting features: It contains the virtual keyword, and it ends with = 0. The virtual keyword indicates that the function may be reimplemented in subclasses. As in C#, C++ member functions aren't reimplementable by default. The bizarre = 0 syntax indicates that the function is a pure virtual function, that is, a function that has no default implementation and that must be implemented in subclasses. The concept of an "interface" in Java and C# maps to a class with only pure virtual functions in C++.
Here's the definition of the Circle subclass:
#ifndef CIRCLE_H
#define CIRCLE_H

#include "shape.h"

class Circle : public Shape
{
public:
    Circle(Point2D center, double radius = 0.5)
        : Shape(center)
    {
        myRadius = radius;
    }

    void draw()
    {
        // do something here
    }

private:
    double myRadius;
};

#endif
The Circle class is publicly derived from Shape, meaning that all public members of Shape remain public in Circle. C++ also supports protected and private inheritance, which restrict the access of the base class's public and protected members.
The constructor takes two parameters. The second parameter is optional and takes the value 0.5 if not specified. The constructor passes the center parameter to the base class's constructor using a special syntax between the function signature and the function body. In the body, we initialize the myRadius member variable. We could also have initialized the variable on the same line as the base class constructor initialization:
Circle(Point2D center, double radius = 0.5)
    : Shape(center), myRadius(radius)
{
}
On the other hand, C++ doesn't allow us to initialize a member variable in the class definition, so the following code is wrong:
    // WON'T COMPILE
private:
    double myRadius = 0.5;
};
The draw() function has the same signature as the virtual draw() function declared in Shape. It is a reimplementation and it will be invoked polymorphically when draw() is called on a Circle instance through a Shape reference or pointer. Unlike C#, C++ has no override keyword, nor does it have a super or base keyword that refers to the base class. If we need to call the base implementation of a function, we can prefix the function name with the base class name and the :: operator. For example:
class LabeledCircle : public Circle
{
public:
    void draw()
    {
        Circle::draw();
        drawLabel();
    }
    ...
};
C++ supports multiple inheritance, meaning that a class can be derived from several classes at the same time. The syntax is as follows:
class DerivedClass : public BaseClass1, public BaseClass2, ...,
                     public BaseClassN
{
    ...
};
By default, functions and variables declared in a class are associated with instances of that class. We can also declare static member functions and static member variables, which can be used without an instance. For example:
#ifndef TRUCK_H
#define TRUCK_H

class Truck
{
public:
    Truck() { ++counter; }
    ~Truck() { --counter; }

    static int instanceCount() { return counter; }

private:
    static int counter;
};

#endif
The static member variable counter keeps track of how many Truck instances exist at any time. The Truck constructor increments it. The destructor, recognizable by the tilde (~) prefix, decrements it. In C++, the destructor is automatically invoked when a statically allocated variable goes out of scope or when a variable allocated using new is deleted. This is similar to the finalize() method in Java, except that we can rely on it being called at a specific point in time.
A static member variable has a single existence in a class: Such variables are "class variables" rather than "instance variables". Each static member variable must be defined in a .cpp file (but without repeating the static keyword). For example:
#include "truck.h"

int Truck::counter = 0;
Failing to do this would result in an "unresolved symbol" error at link time. The instanceCount() static function can be accessed from outside the class, prefixed by the class name. For example:
#include <iostream>

#include "truck.h"

int main()
{
    Truck truck1;
    Truck truck2;

    std::cout << Truck::instanceCount() << " equals 2" << std::endl;

    return 0;
}
Pointers
A pointer in C++ is a variable that stores the memory address of an object (instead of storing the object directly). Java and C# have a similar concept, that of a "reference", but the syntax is different. We will start by studying a contrived example that illustrates pointers in action:
 1 #include "point2d.h"
 2 int main()
 3 {
 4     Point2D alpha;
 5     Point2D beta;
 6     Point2D *ptr;
 7     ptr = &alpha;
 8     ptr->setX(1.0);
 9     ptr->setY(2.5);
10     ptr = &beta;
11     ptr->setX(4.0);
12     ptr->setY(4.5);
13     ptr = 0;
14     return 0;
15 }
The example relies on the Point2D class from the previous subsection. Lines 4 and 5 define two objects of type Point2D. These objects are initialized to (0, 0) by the default Point2D constructor.
Line 6 defines a pointer to a Point2D object. The syntax for pointers uses an asterisk in front of the variable name. Since we did not initialize the pointer, it contains a random memory address. This is solved on line 7 by assigning alpha's address to the pointer. The unary & operator returns the memory address of an object. An address is typically a 32-bit or a 64-bit integer value specifying the offset of an object in memory.
On lines 8 and 9, we access the alpha object through the ptr pointer. Because ptr is a pointer and not an object, we must use the -> (arrow) operator instead of the . (dot) operator.
On line 10, we assign beta's address to the pointer. From then on, any operation we perform through the pointer will affect the beta object.
Line 13 sets the pointer to be a null pointer. C++ has no keyword for representing a pointer that does not point to an object; instead, we use the value 0 (or the symbolic constant NULL, which expands to 0). Trying to use a null pointer results in a crash with an error message such as "Segmentation fault", "General protection fault", or "Bus error". Using a debugger, we can find out which line of code caused the crash.
At the end of the function, the alpha object holds the coordinate pair (1.0, 2.5), whereas beta holds (4.0, 4.5).
Pointers are often used to store objects allocated dynamically using new. In C++ jargon, we say that these objects are allocated on the "heap", whereas local variables (variables defined inside a function) are stored on the "stack".
Here's a code snippet that illustrates dynamic memory allocation using new:
#include "point2d.h"

int main()
{
    Point2D *point = new Point2D;
    point->setX(1.0);
    point->setY(2.5);
    delete point;

    return 0;
}
The new operator returns the memory address of a newly allocated object. We store the address in a pointer variable and access the object through that pointer. When we are done with the object, we release its memory using the delete operator. Unlike Java and C#, C++ has no garbage collector; dynamically allocated objects must be explicitly released using delete when we don't need them anymore. Chapter 2 describes Qt's parent–child mechanism, which greatly simplifies memory management in C++ programs.
If we forget to call delete, the memory is kept around until the program finishes. This would not be an issue in the preceding example, because we allocate only one object, but in a program that allocates new objects all the time, this could cause the program to keep allocating memory until the machine's memory is exhausted. Once an object is deleted, the pointer variable still holds the address of the object. Such a pointer is a "dangling pointer" and should not be used to access the object. Qt provides a "smart" pointer, QPointer<T>, that automatically sets itself to 0 if the QObject it points to is deleted.
In the preceding example, we invoked the default constructor and called setX() and setY() to initialize the object. We could have used the two-parameter constructor instead:
Point2D *point = new Point2D(1.0, 2.5);
The example didn't require the use of new and delete. We could just as well have allocated the object on the stack as follows:
Point2D point;
point.setX(1.0);
point.setY(2.5);
Objects allocated like this are automatically freed at the end of the block in which they appear.
If we don't intend to modify the object through the pointer, we can declare the pointer const. For example:
const Point2D *ptr = new Point2D(1.0, 2.5);
double x = ptr->x();
double y = ptr->y();

// WON'T COMPILE
ptr->setX(4.0);
*ptr = Point2D(4.0, 4.5);
The ptr const pointer can be used only to call const member functions such as x() and y(). It is good style to declare pointers const when we don't intend to modify the object using them. Furthermore, if the object itself is const, we have no choice but to use a const pointer to store its address. The use of const provides information to the compiler that can lead to early bug detection and performance gains. C# has a const keyword that is very similar to that of C++. The closest Java equivalent is final, but it only protects variables from assignment, not from calling "non-const" member functions on it.
Pointers can be used with built-in types as well as with classes. In an expression, the unary * operator returns the value of the object associated with the pointer. For example:
int i = 10;
int j = 20;
int *p = &i;
int *q = &j;

std::cout << *p << " equals 10" << std::endl;
std::cout << *q << " equals 20" << std::endl;

*p = 40;

std::cout << i << " equals 40" << std::endl;

p = q;
*p = 100;

std::cout << i << " equals 40" << std::endl;
std::cout << j << " equals 100" << std::endl;
The -> operator, which can be used to access an object's members through a pointer, is pure syntactic sugar. Instead of ptr->member, we can also write (*ptr).member. The parentheses are necessary because the . (dot) operator has precedence over the unary * operator.
Pointers have a poor reputation in C and C++, to the extent that Java is often advertised as having no pointers. In reality, C++ pointers are conceptually similar to Java and C# references except that we can use pointers to iterate through memory, as we will see later in this section. Furthermore, the inclusion of "copy on write" container classes in Qt, along with C++'s ability to instantiate any class on the stack, means that we can often avoid pointers.
References
In addition to pointers, C++ also supports the concept of a "reference". Like a pointer, a C++ reference stores the address of an object. Here are the main differences:
- References are declared using & instead of *.
- The reference must be initialized and can't be reassigned later.
- The object associated with a reference is directly accessible; there is no special syntax such as * or ->.
- A reference cannot be null.
References are generally used when declaring parameters. For most types, C++ uses call-by-value as its default parameter-passing mechanism, meaning that when an argument is passed to a function, the function receives a brand new copy of the object. Here's the definition of a function that receives its parameters through call-by-value:
#include <cmath>

// <cmath> declares the std::abs() overloads for floating-point types
double manhattanDistance(Point2D a, Point2D b)
{
    return std::abs(b.x() - a.x()) + std::abs(b.y() - a.y());
}
We would then invoke the function as follows:
Point2D broadway(12.5, 40.0);
Point2D harlem(77.5, 50.0);
double distance = manhattanDistance(broadway, harlem);
C programmers avoid needless copy operations by declaring their parameters as pointers instead of as values:
double manhattanDistance(const Point2D *ap, const Point2D *bp)
{
    return std::abs(bp->x() - ap->x()) + std::abs(bp->y() - ap->y());
}
They must then pass addresses instead of values when calling the function:
double distance = manhattanDistance(&broadway, &harlem);
C++ introduced references to make the syntax less cumbersome and to prevent the caller from passing a null pointer. If we use references instead of pointers, the function looks like this:
double manhattanDistance(const Point2D &a, const Point2D &b)
{
    return std::abs(b.x() - a.x()) + std::abs(b.y() - a.y());
}
The declaration of a reference is similar to that of a pointer, with & instead of *. But when we actually use the reference, we can forget that it is a memory address and treat it like an ordinary variable. In addition, calling a function that takes references as arguments doesn't require any special care (no & operator).
All in all, by replacing Point2D with const Point2D & in the parameter list, we reduced the overhead of the function call: Instead of copying 256 bits (the size of four doubles), we copy only 64 or 128 bits, depending on the target platform's pointer size.
The previous example used const references, preventing the function from modifying the objects associated with the references. When this kind of side effect is desired, we can pass a non-const reference or pointer. For example:
void transpose(Point2D &point)
{
    double oldX = point.x();
    point.setX(point.y());
    point.setY(oldX);
}
In some cases, we have a reference and we need to call a function that takes a pointer, or vice versa. To convert a reference to a pointer, we can simply use the unary & operator:
Point2D point;
Point2D &ref = point;
Point2D *ptr = &ref;
To convert a pointer to a reference, there is the unary * operator:
Point2D point;
Point2D *ptr = &point;
Point2D &ref = *ptr;
References and pointers are represented the same way in memory, and they can often be used interchangeably, which raises the question of when to use which. On the one hand, references have a more convenient syntax; on the other hand, pointers can be reassigned at any time to point to another object, they can hold a null value, and their more explicit syntax is often a blessing in disguise. For these reasons, pointers tend to prevail, with references used almost exclusively for declaring function parameters, in conjunction with const.
Arrays
Arrays in C++ are declared by specifying the number of items in the array within brackets in the variable declaration after the variable name. Two-dimensional arrays are possible using an array of arrays. Here's the definition of a one-dimensional array containing ten items of type int:
int fibonacci[10];
The items are accessible as fibonacci[0], fibonacci[1], ..., fibonacci[9]. Often we want to initialize the array as we define it:
int fibonacci[10] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 };
In such cases, we can then omit the array size, since the compiler can deduce it from the number of initializers:
int fibonacci[] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 };
Static initialization also works for complex types, such as Point2D:
Point2D triangle[] = {
    Point2D(0.0, 0.0), Point2D(1.0, 0.0), Point2D(0.5, 0.866)
};
If we have no intention of altering the array later on, we can make it const:
const int fibonacci[] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 };
To find out how many items an array contains, we can use the sizeof() operator as follows:
int n = sizeof(fibonacci) / sizeof(fibonacci[0]);
The sizeof() operator returns the size of its argument in bytes. The number of items in an array is its size in bytes divided by the size of one of its items. Because this is cumbersome to type, a common alternative is to declare a constant and to use it for defining the array:
enum { NFibonacci = 10 };

const int fibonacci[NFibonacci] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 };
It would have been tempting to declare the constant as a const int variable. Unfortunately, some compilers have issues with const variables as array size specifiers. We will explain the enum keyword later in this appendix.
Iterating through an array is normally done using an integer. For example:
for (int i = 0; i < NFibonacci; ++i)
    std::cout << fibonacci[i] << std::endl;
It is also possible to traverse the array using a pointer:
const int *ptr = &fibonacci[0];
while (ptr != &fibonacci[10]) {
    std::cout << *ptr << std::endl;
    ++ptr;
}
We initialize the pointer with the address of the first item and loop until we reach the "one past the last" item (the "eleventh" item, fibonacci[10]). At each iteration, the ++ operator advances the pointer to the next item.
Instead of &fibonacci[0], we could also have written fibonacci. This is because the name of an array used alone is automatically converted into a pointer to the first item in the array. Similarly, we could substitute fibonacci + 10 for &fibonacci[10]. This works the other way around as well: We can retrieve the contents of the current item using either *ptr or ptr[0] and could access the next item using *(ptr + 1) or ptr[1]. This principle is sometimes called "equivalence of pointers and arrays".
To prevent what it considers to be a gratuitous inefficiency, C++ does not let us pass arrays to functions by value. Instead, they must be passed by address. For example:
#include <iostream>

void printIntegerTable(const int *table, int size)
{
    for (int i = 0; i < size; ++i)
        std::cout << table[i] << std::endl;
}

int main()
{
    const int fibonacci[10] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 };
    printIntegerTable(fibonacci, 10);
    return 0;
}
Ironically, although C++ doesn't give us any choice about whether we want to pass an array by address or by value, it gives us some freedom in the syntax used to declare the parameter type. Instead of const int *table, we could also have written const int table[] to declare a pointer-to-constant-int parameter. Similarly, the argv parameter to main() can be declared as either char *argv[] or char **argv.
To copy an array into another array, one approach is to loop through the array:
const int fibonacci[NFibonacci] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 };
int temp[NFibonacci];

for (int i = 0; i < NFibonacci; ++i)
    temp[i] = fibonacci[i];
For basic data types such as int, we can also use std::memcpy() (declared in <cstring>), which copies a block of memory. For example:
std::memcpy(temp, fibonacci, sizeof(fibonacci));
When we declare a C++ array, the size must be a constant. [*] If we want to create an array of a variable size, we have several options.
- We can dynamically allocate the array:
int *fibonacci = new int[n];
The new [] operator allocates a certain number of items at consecutive memory locations and returns a pointer to the first item. Thanks to the "equivalence of pointers and arrays" principle, the items can be accessed through the pointer as fibonacci[0], fibonacci[1], ..., fibonacci[n - 1]. When we have finished using the array, we should release the memory it consumes using the delete [] operator:
delete [] fibonacci;
- We can use the standard vector<T> class:
#include <vector>

std::vector<int> fibonacci(n);
Items are accessible using the [] operator, just like with a plain C++ array. With vector<T> (where T is the type of the items stored in the vector), we can resize the array at any time using resize() and we can copy it using the assignment operator. Classes that contain angle brackets (<>) in their name are called template classes.
- We can use Qt's QVector<T> class:
#include <QVector>

QVector<int> fibonacci(n);
QVector<T>'s API is very similar to that of vector<T>, but it also supports iteration using Qt's foreach keyword and uses implicit data sharing ("copy on write") as a memory and speed optimization. Chapter 11 presents Qt's container classes and explains how they relate to the Standard C++ containers.
You might be tempted to avoid built-in arrays whenever possible and use vector<T> or QVector<T> instead. It is nonetheless worthwhile understanding how the built-in arrays work because sooner or later you might want to use them in highly optimized code, or need them to interface with existing C libraries.
Character Strings
The most basic way to represent character strings in C++ is to use an array of chars terminated by a null byte ('\0'). The following four functions demonstrate how these kinds of strings work:
void hello1()
{
    const char str[] = {
        'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0'
    };
    std::cout << str << std::endl;
}

void hello2()
{
    const char str[] = "Hello world!";
    std::cout << str << std::endl;
}

void hello3()
{
    std::cout << "Hello world!" << std::endl;
}

void hello4()
{
    const char *str = "Hello world!";
    std::cout << str << std::endl;
}
In the first function, we declare the string as an array and initialize it the hard way. Notice the '\0' terminator at the end, which indicates the end of the string. The second function has a similar array definition, but this time we use a string literal to initialize the array. In C++, string literals are simply const char arrays with an implicit '\0' terminator. The third function uses a string literal directly, without giving it a name. Once translated into machine language instructions, it is identical to the previous two functions.
The fourth function is a bit different in that it creates not only an (anonymous) array, but also a pointer variable called str that stores the address of the array's first item. In spite of this, the semantics of the function are identical to the previous three functions, and an optimizing compiler would eliminate the superfluous str variable.
Functions that take C++ strings as arguments usually take either a char * or a const char *. Here's a short program that illustrates the use of both:
#include <cctype>
#include <iostream>

void makeUppercase(char *str)
{
    for (int i = 0; str[i] != '\0'; ++i)
        str[i] = std::toupper(str[i]);
}

void writeLine(const char *str)
{
    std::cout << str << std::endl;
}

int main(int argc, char *argv[])
{
    for (int i = 1; i < argc; ++i) {
        makeUppercase(argv[i]);
        writeLine(argv[i]);
    }
    return 0;
}
In C++, the char type normally holds an 8-bit value. This means that we can easily store ASCII, ISO 8859-1 (Latin-1), and other 8-bit-encoded strings in a char array, but that we can't store arbitrary Unicode characters without resorting to multibyte sequences. Qt provides the powerful QString class, which stores Unicode strings as sequences of 16-bit QChars and internally uses the implicit data sharing ("copy on write") optimization. Chapter 11 and Chapter 18 explain QString in more detail.
Enumerations
C++ has an enumeration feature for declaring a set of named constants similar to that provided by C# and recent versions of Java. Let's suppose that we want to store days of the week in a program:
enum DayOfWeek {
    Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday
};
Normally, we would put this declaration in a header file, or even inside a class. The preceding declaration is superficially equivalent to the following constant definitions:
const int Sunday = 0;
const int Monday = 1;
const int Tuesday = 2;
const int Wednesday = 3;
const int Thursday = 4;
const int Friday = 5;
const int Saturday = 6;
By using the enumeration construct, we can later declare variables or parameters of type DayOfWeek and the compiler will ensure that only values from the DayOfWeek enumeration are assigned to them. For example:
DayOfWeek day = Sunday;
If we don't care about type safety, we can also write
int day = Sunday;
Notice that to refer to the Sunday constant from the DayOfWeek enum, we simply write Sunday, not DayOfWeek::Sunday.
By default, the compiler assigns consecutive integer values to the constants of an enum, starting at 0. We can specify other values if we want:
enum DayOfWeek {
    Sunday = 628,
    Monday = 616,
    Tuesday = 735,
    Wednesday = 932,
    Thursday = 852,
    Friday = 607,
    Saturday = 845
};
If we don't specify the value of an enum item, the item takes the value of the preceding item, plus 1. Enums are sometimes used to declare integer constants, in which case we normally omit the name of the enum:
enum { FirstPort = 1024, MaxPorts = 32767 };
Another frequent use of enums is to represent sets of options. Let's consider the example of a Find dialog, with four checkboxes controlling the search algorithm (Wildcard syntax, Case sensitive, Search backward, and Wrap around). We can represent this by an enum where the constants are powers of 2:
    enum FindOption {
        NoOptions = 0x00000000,
        WildcardSyntax = 0x00000001,
        CaseSensitive = 0x00000002,
        SearchBackward = 0x00000004,
        WrapAround = 0x00000008
    };
Each option is often called a "flag". We can combine flags using the bitwise | or |= operator:
    int options = NoOptions;
    if (wildcardSyntaxCheckBox->isChecked())
        options |= WildcardSyntax;
    if (caseSensitiveCheckBox->isChecked())
        options |= CaseSensitive;
    if (searchBackwardCheckBox->isChecked())
        options |= SearchBackward;
    if (wrapAroundCheckBox->isChecked())
        options |= WrapAround;
We can test whether a flag is set using the bitwise & operator:
    if (options & CaseSensitive) {
        // case-sensitive search
    }
A variable of type FindOption can contain only one flag at a time. The result of combining several flags using | is a plain integer. Unfortunately, this is not type-safe: The compiler won't complain if a function expecting a combination of FindOptions through an int parameter receives Saturday instead. Qt uses QFlags<T> to provide type safety for its own flag types. The class is also available when we define custom flag types. See the QFlags<T> online documentation for details.
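The flag operations described above can be tried without any checkboxes. The following is a minimal sketch in plain C++ that stores the combination in an int, as in the text, rather than in QFlags<T>. The helpers makeOptions(), hasOption(), and clearOption() are hypothetical names, and the four bool parameters stand in for the checkbox states.

```cpp
enum FindOption {
    NoOptions      = 0x00000000,
    WildcardSyntax = 0x00000001,
    CaseSensitive  = 0x00000002,
    SearchBackward = 0x00000004,
    WrapAround     = 0x00000008
};

// Hypothetical helper: builds a combination of flags from four booleans.
int makeOptions(bool wildcard, bool caseSensitive, bool backward, bool wrap)
{
    int options = NoOptions;
    if (wildcard)
        options |= WildcardSyntax;
    if (caseSensitive)
        options |= CaseSensitive;
    if (backward)
        options |= SearchBackward;
    if (wrap)
        options |= WrapAround;
    return options;
}

// Hypothetical helper: tests a single flag with the bitwise & operator.
bool hasOption(int options, FindOption flag)
{
    return (options & flag) != 0;
}

// Hypothetical helper: clears a single flag using & together with ~.
int clearOption(int options, FindOption flag)
{
    return options & ~flag;
}
```

Because the combination is a plain int, nothing stops a caller from passing an unrelated integer, which is the type-safety gap that QFlags<T> is meant to close.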
Typedefs
C++ lets us give an alias to a data type using the typedef keyword. For example, if we use QVector<Point2D> a lot and want to save a few keystrokes (or are unfortunate enough to be stuck with a Norwegian keyboard and have trouble locating the angle brackets), we can put this typedef declaration in one of our header files:
typedef QVector<Point2D> PointVector;
From then on, we can use PointVector as a shorthand for QVector<Point2D>. Notice that the new name for the type appears after the old name. The typedef syntax deliberately mimics that of variable declarations.
In Qt, typedefs are used mainly for three reasons:
- Convenience: Qt declares uint and QWidgetList as typedefs for unsigned int and QList<QWidget *> to save a few keystrokes.
- Platform differences: Certain types need different definitions on different platforms. For example, qlonglong is defined as __int64 on Windows and as long long on other platforms.
- Compatibility: The QIconSet class from Qt 3 was renamed QIcon in Qt 4. To help Qt 3 users port their applications to Qt 4, QIconSet is provided as a typedef for QIcon when Qt 3 compatibility is enabled.
Type Conversions
C++ provides several syntaxes for casting values from one type to another. The traditional syntax, inherited from C, involves putting the resulting type in parentheses before the value to convert:
    const double Pi = 3.14159265359;
    int x = (int)(Pi * 100);
    std::cout << x << " equals 314" << std::endl;
This syntax is very powerful. It can be used to change the types of pointers, to remove const, and much more. For example:
    short j = 0x1234;
    if (*(char *)&j == 0x12)
        std::cout << "The byte order is big-endian" << std::endl;
In the preceding example, we cast a short * to a char * and we use the unary * operator to access the byte at the given memory location. On big-endian systems, that byte is 0x12; on little-endian systems, it is 0x34. Since pointers and references are represented the same way, it should come as no surprise that the preceding code can be rewritten using a reference cast:
    short j = 0x1234;
    if ((char &)j == 0x12)
        std::cout << "The byte order is big-endian" << std::endl;
If the data type is a class name, a typedef, or a primitive type that can be expressed as a single alphanumeric token, we can use the constructor syntax as a cast:
int x = int(Pi * 100);
Casting pointers and references using the traditional C-style casts is a kind of extreme sport, on par with paragliding and elevator surfing, because the compiler lets us cast any pointer (or reference) type into any other pointer (or reference) type. For that reason, C++ introduced four new-style casts with more precise semantics. For pointers and references, the new-style casts are preferable to the risky C-style casts and are used in this book.
- static_cast<T>() can be used to cast a pointer-to-A to a pointer-to-B, with the constraint that class B must be a subclass of class A. For example:
    A *obj = new B;
    B *b = static_cast<B *>(obj);
    b->someFunctionDeclaredInB();
If the object isn't an instance of B, using the resulting pointer can lead to obscure crashes.
- dynamic_cast<T>() is similar to static_cast<T>(), except that it uses run-time type information (RTTI) to check that the object associated with the pointer is an instance of class B. If this is not the case, the cast returns a null pointer. For example:
    A *obj = new B;
    B *b = dynamic_cast<B *>(obj);
    if (b)
        b->someFunctionDeclaredInB();
On some compilers, dynamic_cast<T>() doesn't work across dynamic library boundaries. It also relies on the compiler supporting RTTI, a feature that programmers can turn off to reduce the size of their executables. Qt solves these problems by providing qobject_cast<T>() for QObject subclasses.
- const_cast<T>() adds a const qualifier to, or removes one from, a pointer or reference. For example:
    int MyClass::someConstFunction() const
    {
        if (isDirty()) {
            MyClass *that = const_cast<MyClass *>(this);
            that->recomputeInternalData();
        }
        ...
    }
In the previous example, we cast away the const qualifier of the this pointer to call the non-const member function recomputeInternalData(). Doing so is not recommended and can normally be avoided by using the mutable keyword, as explained in Chapter 4.
- reinterpret_cast<T>() converts any pointer or reference type to any other such type. For example:
    short j = 0x1234;
    if (reinterpret_cast<char &>(j) == 0x12)
        std::cout << "The byte order is big-endian" << std::endl;
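To make the null-pointer behavior of dynamic_cast<T>() concrete, here is a minimal sketch with a hypothetical hierarchy in which both B and C derive from A. The function callIfB() is also hypothetical. The example assumes that RTTI is enabled, which is the default for most compilers.

```cpp
// Hypothetical hierarchy. The virtual destructor makes A polymorphic,
// which dynamic_cast requires in order to inspect the run-time type.
class A
{
public:
    virtual ~A() {}
};

class B : public A
{
public:
    int someFunctionDeclaredInB() const { return 42; }
};

class C : public A
{
};

// Returns B's value if obj really points to a B, or -1 otherwise.
int callIfB(A *obj)
{
    B *b = dynamic_cast<B *>(obj);
    if (b)
        return b->someFunctionDeclaredInB();
    return -1;   // obj points to some other subclass of A
}
```

With static_cast<B *>() in place of dynamic_cast<B *>(), the call on a C object would compile just as well but would have undefined behavior.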
In Java and C#, any reference can be stored as an Object reference if needed. C++ doesn't have any universal base class, but it provides a special data type, void *, that stores the address of an instance of any type. A void * must be cast back to another type (using static_cast<T>()) before it can be used.
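A short sketch of the void * round trip follows. The function names readInt() and roundTrip() are hypothetical. The important assumption is that the code casting the pointer back knows the real type, because the compiler cannot check it.

```cpp
// Hypothetical function that receives an opaque pointer. It must be told,
// by convention, that the pointer really refers to an int.
int readInt(void *data)
{
    int *value = static_cast<int *>(data);   // cast back before use
    return *value;
}

// Any object pointer converts to void * implicitly; no cast is needed here.
int roundTrip(int n)
{
    void *opaque = &n;
    return readInt(opaque);
}
```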
C++ provides many ways to cast types, but most of the time we don't even need a cast. When using container classes such as vector<T> or QVector<T>, we can specify the T type and extract items without casts. In addition, for primitive types, certain conversions occur implicitly (e.g., from char to int), and for custom types we can define implicit conversions by providing a one-parameter constructor. For example:
    class MyInteger
    {
    public:
        MyInteger();
        MyInteger(int i);
        ...
    };

    int main()
    {
        MyInteger n;
        n = 5;
        ...
    }
For some one-parameter constructors, the automatic conversion makes little sense. We can disable it by declaring the constructor with the explicit keyword:
    class MyVector
    {
    public:
        explicit MyVector(int size);
        ...
    };
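The difference between the two constructors shows up most clearly at call sites. The sketch below fills in minimal bodies for MyInteger and MyVector. The member names and the functions twice() and lengthOf() are assumptions made for the example, not part of the classes as described above.

```cpp
class MyInteger
{
public:
    MyInteger() : val(0) {}
    MyInteger(int i) : val(i) {}           // allows implicit int -> MyInteger
    int value() const { return val; }

private:
    int val;
};

class MyVector
{
public:
    explicit MyVector(int size) : len(size) {}   // no implicit conversion
    int size() const { return len; }

private:
    int len;
};

// Hypothetical functions taking the two types as parameters.
int twice(MyInteger n) { return 2 * n.value(); }
int lengthOf(const MyVector &v) { return v.size(); }

// twice(5) compiles: 5 is converted through MyInteger(int).
// lengthOf(5) does not compile: the constructor is explicit,
// so the caller must write lengthOf(MyVector(5)).
```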
Operator Overloading
C++ allows us to overload functions, meaning that we can declare several functions with the same name in the same scope, as long as they have different parameter lists. In addition, C++ supports operator overloading—the possibility of assigning special semantics to built-in operators (such as +, <<, and []) when they are used with custom types.
We have already seen a few examples of overloaded operators. When we used << to output text to cout or cerr, we didn't trigger C++'s left-shift operator, but rather a special version of the operator that takes an ostream object (such as cout and cerr) on the left side and a string (alternatively, a number or a stream manipulator such as endl) on the right side and that returns the ostream object, allowing multiple calls in a row.
The beauty of operator overloading is that we can make custom types behave just like built-in types. To show how operator overloading works, we will overload +=, -=, +, and - to work on Point2D objects:
    #ifndef POINT2D_H
    #define POINT2D_H

    class Point2D
    {
    public:
        Point2D();
        Point2D(double x, double y);

        void setX(double x);
        void setY(double y);
        double x() const;
        double y() const;

        Point2D &operator+=(const Point2D &other) {
            xVal += other.xVal;
            yVal += other.yVal;
            return *this;
        }
        Point2D &operator-=(const Point2D &other) {
            xVal -= other.xVal;
            yVal -= other.yVal;
            return *this;
        }

    private:
        double xVal;
        double yVal;
    };

    inline Point2D operator+(const Point2D &a, const Point2D &b)
    {
        return Point2D(a.x() + b.x(), a.y() + b.y());
    }

    inline Point2D operator-(const Point2D &a, const Point2D &b)
    {
        return Point2D(a.x() - b.x(), a.y() - b.y());
    }

    #endif
Operators can be implemented either as member functions or as global functions. In our example, we implemented += and -= as member functions, and + and - as global functions.
The += and -= operators take a reference to another Point2D object and increment or decrement the x- and y-coordinates of the current object based on the other object. They return *this, which denotes a reference to the current object (this is of type Point2D *). Returning a reference allows us to write exotic code such as
a += b += c;
The + and - operators take two parameters and return a Point2D object by value (not a reference to an existing object). The inline keyword allows us to put these function definitions in the header file. If the function's body had been longer, we would put a function prototype in the header file and the function definition (without the inline keyword) in a .cpp file.
The following code snippet shows all four overloaded operators in action:
    Point2D alpha(12.5, 40.0);
    Point2D beta(77.5, 50.0);

    alpha += beta;
    beta -= alpha;
    Point2D gamma = alpha + beta;
    Point2D delta = beta - alpha;
We can also invoke the operator functions just like any other functions:
    Point2D alpha(12.5, 40.0);
    Point2D beta(77.5, 50.0);

    alpha.operator+=(beta);
    beta.operator-=(alpha);
    Point2D gamma = operator+(alpha, beta);
    Point2D delta = operator-(beta, alpha);
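For readers who want to follow the arithmetic, here is a self-contained sketch that traces the four statements. The constructor bodies are assumptions (the header above only declares them), the setters are left out for brevity, and runSnippet() is a hypothetical wrapper around the statements.

```cpp
class Point2D
{
public:
    Point2D() : xVal(0.0), yVal(0.0) {}                  // assumed body
    Point2D(double x, double y) : xVal(x), yVal(y) {}    // assumed body

    double x() const { return xVal; }
    double y() const { return yVal; }

    Point2D &operator+=(const Point2D &other) {
        xVal += other.xVal;
        yVal += other.yVal;
        return *this;
    }
    Point2D &operator-=(const Point2D &other) {
        xVal -= other.xVal;
        yVal -= other.yVal;
        return *this;
    }

private:
    double xVal;
    double yVal;
};

inline Point2D operator+(const Point2D &a, const Point2D &b)
{
    return Point2D(a.x() + b.x(), a.y() + b.y());
}

inline Point2D operator-(const Point2D &a, const Point2D &b)
{
    return Point2D(a.x() - b.x(), a.y() - b.y());
}

// Hypothetical wrapper: runs the four statements and returns delta.
Point2D runSnippet()
{
    Point2D alpha(12.5, 40.0);
    Point2D beta(77.5, 50.0);

    alpha += beta;                  // alpha is now (90.0, 90.0)
    beta -= alpha;                  // beta is now (-12.5, -40.0)
    Point2D gamma = alpha + beta;   // gamma is (77.5, 50.0)
    Point2D delta = beta - alpha;   // delta is (-102.5, -130.0)
    (void)gamma;                    // silence "unused variable" warnings
    return delta;
}
```

Note that the compound operators modify their left operand, so the later statements see the updated values of alpha and beta.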
Operator overloading in C++ is a complex topic, but we can go a long way without knowing all the details. It is still important to understand the fundamentals of operator overloading because several Qt classes (including QString and QVector<T>) use this feature to provide a simple and more natural syntax for such operations as concatenation and append.
Value Types
Java and C# distinguish between value types and reference types.
- Value types: These are primitive types such as char, int, and float, as well as C# structs. What characterizes them is that they aren't created using new and the assignment operator performs a copy of the value held by the variable. For example:
    int i = 5;
    int j = 10;
    i = j;
- Reference types: These are classes such as Integer (in Java), String, and MyVeryOwnClass. Instances are created using new. The assignment operator copies only a reference to the object; to obtain a deep copy, we must call clone() (in Java) or Clone() (in C#). For example:
    Integer i = new Integer(5);
    Integer j = new Integer(10);
    i = j.clone();    // schematic: java.lang.Integer itself has no public clone()
In C++, all types can be used as "reference types", and those that are copyable can be used as "value types" as well. For example, C++ doesn't need any Integer class, because we can use pointers and new as follows:
    int *i = new int(5);
    int *j = new int(10);
    *i = *j;
Unlike Java and C#, C++ treats user-defined classes in the same way as built-in types:
    Point2D *i = new Point2D(5, 5);
    Point2D *j = new Point2D(10, 10);
    *i = *j;
If we want to make a C++ class copyable, we must ensure that our class has a copy constructor and an assignment operator. The copy constructor is invoked when we initialize an object with another object of the same type. C++ provides two equivalent syntaxes for this:
    Point2D i(20, 20);
    Point2D j(i);       // first syntax
    Point2D k = i;      // second syntax
The assignment operator is invoked when we assign to an existing variable using =:
    Point2D i(5, 5);
    Point2D j(10, 10);
    j = i;
When we define a class, the C++ compiler automatically provides a copy constructor and an assignment operator that perform member-by-member copying. For the Point2D class, this is as though we had written the following code in the class definition:
    class Point2D
    {
    public:
        ...
        Point2D(const Point2D &other)
            : xVal(other.xVal), yVal(other.yVal) { }

        Point2D &operator=(const Point2D &other) {
            xVal = other.xVal;
            yVal = other.yVal;
            return *this;
        }
        ...
    private:
        double xVal;
        double yVal;
    };
For some classes, the default copy constructor and assignment operator are unsuitable. This typically occurs if the class uses dynamic memory. To make the class copyable, we must then implement the copy constructor and the assignment operator ourselves.
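As an illustration of a class for which member-by-member copying is unsuitable, consider the following sketch of a hypothetical Label class that owns a character buffer allocated with new. The class and all its members are invented for this example. With the compiler-generated copy operations, two Label objects would end up sharing, and later deleting, the same buffer.

```cpp
#include <cstring>

// Hypothetical class that owns a heap-allocated character buffer.
class Label
{
public:
    explicit Label(const char *text) : data(duplicate(text)) {}

    // Copy constructor: allocate a separate buffer for the new object.
    Label(const Label &other) : data(duplicate(other.data)) {}

    // Assignment operator: copy first, then release the old buffer.
    // The self-assignment check avoids needless work for "a = a".
    Label &operator=(const Label &other)
    {
        if (this != &other) {
            char *copy = duplicate(other.data);
            delete [] data;
            data = copy;
        }
        return *this;
    }

    ~Label() { delete [] data; }

    const char *text() const { return data; }
    void setFirstChar(char c) { if (data[0] != '\0') data[0] = c; }

private:
    // Returns a freshly allocated copy of a null-terminated string.
    static char *duplicate(const char *text)
    {
        char *copy = new char[std::strlen(text) + 1];
        std::strcpy(copy, text);
        return copy;
    }

    char *data;
};
```

After Label b(a), modifying b leaves a untouched, because each object has its own buffer.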
For classes that don't need to be copyable, we can disable the copy constructor and assignment operator by making them private. If we accidentally attempt to copy instances of such a class, the compiler reports an error. For example:
    class BankAccount
    {
    public:
        ...
    private:
        BankAccount(const BankAccount &other);
        BankAccount &operator=(const BankAccount &other);
    };
In Qt, many classes are designed to be used as value classes. These have a copy constructor and an assignment operator, and are normally instantiated on the stack without new. This is the case for QDateTime, QImage, QString, and container classes such as QList<T>, QVector<T>, and QMap<K, T>.
Other classes fall in the "reference type" category, notably QObject and its subclasses (QWidget, QTimer, QTcpSocket, etc.). These have virtual functions and cannot be copied. For example, a QWidget represents a specific window or control on-screen. If there are 75 QWidget instances in memory, there are also 75 windows or controls on-screen. These classes are typically instantiated using the new operator.
Global Variables and Functions
C++ lets us declare functions and variables that don't belong to any classes and that are accessible from any other function. We have seen several examples of global functions, including main(), the program's entry point. Global variables are rarer, because they compromise modularity and thread reentrancy. It is still important to understand them because you might encounter them in code written by reformed C programmers and other C++ users.
To illustrate how global functions and variables work, we will study a small program that prints a list of 128 pseudo-random numbers using a quick-and-dirty algorithm. The program's source code is spread over two .cpp files.
The first source file is random.cpp:
    int randomNumbers[128];

    static int seed = 42;

    static int nextRandomNumber()
    {
        seed = 1009 + (seed * 2011);
        return seed;
    }

    void populateRandomArray()
    {
        for (int i = 0; i < 128; ++i)
            randomNumbers[i] = nextRandomNumber();
    }
The file declares two global variables (randomNumbers and seed) and two global functions (nextRandomNumber() and populateRandomArray()). Two of the declarations contain the static keyword; these are visible only within the current compilation unit (random.cpp) and are said to have static linkage (the C++ standard calls this internal linkage). The two others can be accessed from any compilation unit in the program; these have external linkage.
Static linkage is ideal for helper functions and internal variables that should not be used in other compilation units. It reduces the risks of having colliding identifiers (global variables with the same name or global functions with the same signature in different compilation units) and prevents malicious or otherwise ill-advised users from accessing the internals of a compilation unit.
Let's now look at the second file, main.cpp, which uses the global variable and the global function declared with external linkage in random.cpp:
    #include <iostream>

    extern int randomNumbers[128];

    void populateRandomArray();

    int main()
    {
        populateRandomArray();
        for (int i = 0; i < 128; ++i)
            std::cout << randomNumbers[i] << std::endl;
        return 0;
    }
We declare the external variables and functions before we call them. The external variable declaration (which makes an external variable visible in the current compilation unit) for randomNumbers starts with the extern keyword. Without extern, the compiler would think it has to deal with a variable definition, and the linker would complain because the same variable is defined in two compilation units (random.cpp and main.cpp). Variables can be declared as many times as we want, but they may be defined only once. The definition is what causes the compiler to reserve space for the variable.
The populateRandomArray() function is declared using a function prototype. The extern keyword is optional for functions.
Typically, we would put the external variable and function declarations in a header file and include it in all the files that need them:
    #ifndef RANDOM_H
    #define RANDOM_H

    extern int randomNumbers[128];

    void populateRandomArray();

    #endif
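The distinction between declaring and defining can also be tried out inside a single file. The sketch below uses a hypothetical variable callCount and a hypothetical function recordCall(). It declares each more than once but defines each exactly once.

```cpp
// Declarations: these may be repeated freely and reserve no storage.
extern int callCount;
extern int callCount;
int recordCall();           // function prototype; extern is implied
int recordCall();

// Definitions: exactly one of each is allowed in the whole program.
int callCount = 0;

int recordCall()
{
    ++callCount;
    return callCount;
}
```

Adding a second "int callCount = 0;" to this file would be rejected by the compiler; adding it to another compilation unit of the same program would be rejected by the linker.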
We have already seen how static can be used to declare member variables and functions that are not attached to a specific instance of the class, and now we have seen how to use it to declare functions and variables with static linkage. There is one more use of the static keyword that should be noted in passing. In C++, we can declare a local variable static. Such variables are initialized the first time the function is called and hold their value between function invocations. For example:
    int nextPrime()
    {
        static int n = 1;

        do {
            ++n;
        } while (!isPrime(n));
        return n;
    }
Static local variables are similar to global variables, except that they are only visible inside the function where they are defined.
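To see the value being retained across calls, the function can be paired with a simple primality test. The isPrime() implementation below is an assumption (plain trial division), since the text does not show one.

```cpp
// Assumed helper: trial division, adequate for small numbers.
bool isPrime(int n)
{
    if (n < 2)
        return false;
    for (int d = 2; d * d <= n; ++d) {
        if (n % d == 0)
            return false;
    }
    return true;
}

int nextPrime()
{
    static int n = 1;   // initialized once, on the first call only

    do {
        ++n;
    } while (!isPrime(n));
    return n;           // n keeps this value until the next call
}
```

Successive calls return 2, 3, 5, 7, and so on, because n is not reset to 1 on each call.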
Namespaces
Namespaces are a mechanism for reducing the risks of name clashes in C++ programs. Name clashes are often an issue in large programs that use several third-party libraries. In your own programs, you can choose whether you want to use namespaces.
Typically, we put a namespace around all the declarations in a header file to ensure that the identifiers declared in that header file don't leak into the global namespace. For example:
    #ifndef SOFTWAREINC_RANDOM_H
    #define SOFTWAREINC_RANDOM_H

    namespace SoftwareInc
    {
        extern int randomNumbers[128];

        void populateRandomArray();
    }

    #endif
(Notice that we have also renamed the preprocessor macro used to avoid multiple inclusions, reducing the risk of a name clash with a header file of the same name but located in a different directory.)
The namespace syntax is similar to that of a class, but it doesn't end with a semicolon. Here's the new random.cpp file:
    #include "random.h"

    int SoftwareInc::randomNumbers[128];

    static int seed = 42;

    static int nextRandomNumber()
    {
        seed = 1009 + (seed * 2011);
        return seed;
    }

    void SoftwareInc::populateRandomArray()
    {
        for (int i = 0; i < 128; ++i)
            randomNumbers[i] = nextRandomNumber();
    }
Unlike classes, namespaces can be "reopened" at any time. For example:
    namespace Alpha
    {
        void alpha1();
        void alpha2();
    }

    namespace Beta
    {
        void beta1();
    }

    namespace Alpha
    {
        void alpha3();
    }
This makes it possible to define hundreds of classes, located in as many header files, as part of a single namespace. Using this trick, the Standard C++ library puts all its identifiers in the std namespace. In Qt, namespaces are used for global-like identifiers such as Qt::AlignBottom and Qt::yellow. For historical reasons, Qt classes do not belong to any namespace but are prefixed with the letter 'Q'.
To refer to an identifier declared in a namespace from outside the namespace, we prefix it with the name of the namespace (and ::). Alternatively, we can use one of the following three mechanisms, which are aimed at reducing the number of keystrokes we must type.
- We can define a namespace alias:
    namespace ElPuebloDeLaReinaDeLosAngeles
    {
        void beverlyHills();
        void culverCity();
        void malibu();
        void santaMonica();
    }

    namespace LA = ElPuebloDeLaReinaDeLosAngeles;
After the alias definition, the alias can be used instead of the original name.
- We can import a single identifier from a namespace:
    int main()
    {
        using ElPuebloDeLaReinaDeLosAngeles::beverlyHills;

        beverlyHills();
        ...
    }
The using declaration allows us to access a given identifier from a namespace without having to prefix it with the name of the namespace.
- We can import an entire namespace with a single directive:
    int main()
    {
        using namespace ElPuebloDeLaReinaDeLosAngeles;

        santaMonica();
        malibu();
        ...
    }
With this approach, name clashes are more likely to occur. If the compiler complains about an ambiguous name (e.g., two classes with the same name defined in two different namespaces), we can always qualify the identifier with the name of the namespace when referring to it.
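The three mechanisms, and the ambiguity that a using directive can introduce, are combined in the sketch below. The namespaces Left and Right and all the function names are hypothetical.

```cpp
// Two hypothetical namespaces that both declare answer().
namespace Left
{
    int answer() { return 1; }
}

namespace Right
{
    int answer() { return 2; }
    int onlyInRight() { return 3; }
}

namespace R = Right;    // namespace alias

int viaAlias()
{
    return R::answer();             // same as Right::answer()
}

int viaUsingDeclaration()
{
    using Right::onlyInRight;       // imports a single identifier
    return onlyInRight();
}

int viaQualifiedNames()
{
    using namespace Left;
    using namespace Right;
    // An unqualified call to answer() would be ambiguous here, so we
    // qualify each call with the name of its namespace.
    return Left::answer() + Right::answer();
}
```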
The Preprocessor
The C++ preprocessor is a program that converts a .cpp source file containing # directives (such as #include, #ifndef, and #endif) into a source file that contains no such directives. These directives perform simple textual operations on the source file, such as conditional compilation, file inclusion, and macro expansion. Normally, the preprocessor is invoked automatically by the compiler, but most systems still offer a way to invoke it alone (often through a -E or /E compiler option).
- The #include directive expands to the contents of the file specified within angle brackets (<>) or double quotes (""), depending on whether the header file is installed at a standard location or is part of the current project. The file name may contain .. and / (which Windows compilers correctly interpret as a directory separator). For example:
#include "../shared/globaldefs.h"
- The #define directive defines a macro. Occurrences of the macro appearing after the #define directive are replaced with the macro's definition. For example, the directive
#define PI 3.14159265359
tells the preprocessor to replace all future occurrences of the token PI in the current compilation unit with the token 3.14159265359. To avoid clashes with variable and class names, it is common practice to give macros all-uppercase names. It is possible to define macros that take arguments:
#define SQUARE(x) ((x) * (x))
In the macro body, it is good style to surround all occurrences of the parameters with parentheses, as well as the entire body, to avoid problems with operator precedence. After all, we want 7 * SQUARE(2 + 3) to expand to 7 * ((2 + 3) * (2 + 3)), not to 7 * 2 + 3 * 2 + 3.
C++ compilers normally allow us to define macros on the command line, using the -D or /D option. For example:
CC -DPI=3.14159265359 -c main.cpp
Macros were very popular in the old days, before typedefs, enums, constants, inline functions, and templates were introduced. Nowadays, their most important role is to protect header files against multiple inclusions.
- Macros can be undefined at any point using #undef:
#undef PI
This is useful if we want to redefine a macro, since the preprocessor doesn't let us give a macro a different definition while it is still defined. It is also useful to control conditional compilation.
- Portions of code can be processed or skipped using #if, #elif, #else, and #endif, based on the numeric value of macros. For example:
    #define NO_OPTIM 0
    #define OPTIM_FOR_SPEED 1
    #define OPTIM_FOR_MEMORY 2

    #define OPTIMIZATION OPTIM_FOR_MEMORY

    ...

    #if OPTIMIZATION == OPTIM_FOR_SPEED
    typedef int MyInt;
    #elif OPTIMIZATION == OPTIM_FOR_MEMORY
    typedef short MyInt;
    #else
    typedef long long MyInt;
    #endif
In the preceding example, only the second typedef declaration would be processed by the compiler, resulting in MyInt being defined as a synonym for short. By changing the definition of the OPTIMIZATION macro, we get different programs. If a macro isn't defined, its value is taken to be 0.
Another approach to conditional compilation is to test whether a macro is defined. This can be done using the defined() operator as follows:
    #define OPTIM_FOR_MEMORY

    ...

    #if defined(OPTIM_FOR_SPEED)
    typedef int MyInt;
    #elif defined(OPTIM_FOR_MEMORY)
    typedef short MyInt;
    #else
    typedef long long MyInt;
    #endif
- For convenience, the preprocessor recognizes #ifdef X and #ifndef X as synonyms for #if defined(X) and #if !defined(X). To protect a header file against multiple inclusions, we wrap its contents with the following idiom:
    #ifndef MYHEADERFILE_H
    #define MYHEADERFILE_H

    ...

    #endif
The first time the header file is included, the symbol MYHEADERFILE_H is not defined, so the compiler processes the code between #ifndef and #endif. The second and any subsequent times the header file is included, MYHEADERFILE_H is defined, so the entire #ifndef ... #endif block is skipped.
- The #error directive emits a user-defined error message at compile time. This is often used in conjunction with conditional compilation to report an impossible case. For example:
    class UniChar
    {
    public:
    #if BYTE_ORDER == BIG_ENDIAN
        uchar row;
        uchar cell;
    #elif BYTE_ORDER == LITTLE_ENDIAN
        uchar cell;
        uchar row;
    #else
    #error "BYTE_ORDER must be BIG_ENDIAN or LITTLE_ENDIAN"
    #endif
    };
Unlike most other C++ constructs, where whitespace is irrelevant, preprocessor directives stand alone on a line and require no semicolon. Very long directives can be split across multiple lines by ending every line except the last with a backslash.
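The advice about parenthesizing macro parameters and the backslash continuation rule can both be checked with a small experiment. In the sketch below, BAD_SQUARE, CLAMP, and the three functions are hypothetical names added for the demonstration. BAD_SQUARE is deliberately written without parentheses to show what goes wrong.

```cpp
// Recommended form: every parameter and the whole body are parenthesized.
#define SQUARE(x) ((x) * (x))

// Deliberately unparenthesized variant, for comparison only.
#define BAD_SQUARE(x) x * x

// A longer macro split across lines; every line except the last
// ends with a backslash.
#define CLAMP(value, low, high) \
    ((value) < (low) ? (low) \
                     : ((value) > (high) ? (high) : (value)))

// Expands to 7 * ((2 + 3) * (2 + 3)), which is 175.
int goodResult()
{
    return 7 * SQUARE(2 + 3);
}

// Expands to 7 * 2 + 3 * 2 + 3, which is 23.
int badResult()
{
    return 7 * BAD_SQUARE(2 + 3);
}

// Restricts v to the range 0 to 100 using the multi-line macro.
int clampToPercent(int v)
{
    return CLAMP(v, 0, 100);
}
```

Even the parenthesized form evaluates its argument twice, so an inline function is usually the safer choice when the argument has side effects.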