Dealing with Data
In this chapter you'll learn about the following:
- Rules for naming C++ variables
- C++'s built-in integer types: unsigned long, long, unsigned int, int, unsigned short, short, char, unsigned char, signed char, and bool
- The climits file, which represents system limits for various integer types
- Numeric constants of various integer types
- Using the const qualifier to create symbolic constants
- C++'s built-in floating-point types: float, double, and long double
- The cfloat file, which represents system limits for various floating-point types
- Numeric constants of various floating-point types
- C++'s arithmetic operators
- Automatic type conversions
- Forced type conversions (type casts)
The essence of object-oriented programming (OOP) is designing and extending your own data types. Designing your own data types represents an effort to make a type match the data. If you do this properly, you'll find it much simpler to work with the data later. But before you can create your own types, you must know and understand the types that are built in to C++ because those types will be your building blocks.
The built-in C++ types come in two groups: fundamental types and compound types. In this chapter you'll meet the fundamental types, which represent integers and floating-point numbers. That might sound like just two types; however, C++ recognizes that no one integer type and no one floating-point type match all programming requirements, so it offers several variants on these two data themes. Chapter 4, "Compound Types," follows up by covering several types that are built on the basic types; these additional compound types include arrays, strings, pointers, and structures.
Of course, a program also needs a means to identify stored data. In this chapter you'll examine one method for doing so: using variables. Then, you'll look at how to do arithmetic in C++. Finally, you'll see how C++ converts values from one type to another.
Simple Variables
Programs typically need to store information, perhaps the current price of IBM stock, the average humidity in New York City in August, the most common letter in the U.S. Constitution and its relative frequency, or the number of available Elvis impersonators. To store an item of information in a computer, the program must keep track of three fundamental properties:
- Where the information is stored
- What value is kept there
- What kind of information is stored
The strategy the examples in this book have used so far is to declare a variable. The type used in the declaration describes the kind of information, and the variable name represents the value symbolically. For example, suppose Chief Lab Assistant Igor uses the following statements:
int braincount;
braincount = 5;
These statements tell the program that it is storing an integer and that the name braincount represents the integer's value, 5 in this case. In essence, the program locates a chunk of memory large enough to hold an integer, notes the location, assigns the label braincount to the location, and copies the value 5 into the location. These statements don't tell you (or Igor) where in memory the value is stored, but the program does keep track of that information, too. Indeed, you can use the & operator to retrieve braincount's address in memory. You'll learn about that operator in the next chapter, when you investigate a second strategy for identifying data: using pointers.
Names for Variables
C++ encourages you to use meaningful names for variables. If a variable represents the cost of a trip, you should call it cost_of_trip or costOfTrip, not just x or cot. You do have to follow a few simple C++ naming rules:
- The only characters you can use in names are alphabetic characters, numeric digits, and the underscore (_) character.
- The first character in a name cannot be a numeric digit.
- Uppercase characters are considered distinct from lowercase characters.
- You can't use a C++ keyword for a name.
- Names beginning with two underscore characters or with an underscore character followed by an uppercase letter are reserved for use by the implementation, that is, the compiler and the resources it uses. Names beginning with a single underscore character are reserved for use as global identifiers by the implementation.
- C++ places no limits on the length of a name, and all characters in a name are significant.
The next-to-last point is a bit different from the preceding points because using a name such as __time_stop or _Donut doesn't produce a compiler error; instead, it leads to undefined behavior. In other words, there's no telling what the result will be. The reason there is no compiler error is that the names are not illegal but rather are reserved for the implementation to use. The bit about global names refers to where the names are declared; Chapter 4 touches on that topic.
The final point differentiates C++ from ANSI C (C99), which guarantees only that the first 63 characters in a name are significant. (In ANSI C, two names that have the same first 63 characters are considered identical, even if the 64th characters differ.)
Here are some valid and invalid C++ names:
int poodle;       // valid
int Poodle;       // valid and distinct from poodle
int POODLE;       // valid and even more distinct
Int terrier;      // invalid -- has to be int, not Int
int my_stars3;    // valid
int _Mystars3;    // valid but reserved -- starts with underscore
int 4ever;        // invalid because starts with a digit
int double;       // invalid -- double is a C++ keyword
int begin;        // valid -- begin is a Pascal keyword
int __fools;      // valid but reserved -- starts with two underscores
int the_very_best_variable_i_can_be_version_112;  // valid
int honky-tonk;   // invalid -- no hyphens allowed
If you want to form a name from two or more words, the usual practice is to separate the words with an underscore character, as in my_onions, or to capitalize the initial character of each word after the first, as in myEyeTooth. (C veterans tend to use the underscore method in the C tradition, whereas Pascalians prefer the capitalization approach.) Either form makes it easier to see the individual words and to distinguish between, say, carDrip and cardRip, or boat_sport and boats_port.
Real-World Note: Variable Names
Schemes for naming variables, like schemes for naming functions, provide fertile ground for fervid discussion. Indeed, this topic produces some of the most strident disagreements in programming. Again, as with function names, the C++ compiler doesn't care about your variable names as long as they are within legal limits, but a consistent, precise personal naming convention will serve you well.
As in function naming, capitalization is a key issue in variable naming (see the sidebar "Naming Conventions" in Chapter 2, "Setting Out to C++"), but many programmers insert an additional level of information in a variable name: a prefix that describes the variable's type or contents. For instance, the integer myWeight might be named nMyWeight; here, the n prefix is used to represent an integer value, which is useful when you are reading code and the definition of the variable isn't immediately at hand. Alternatively, this variable might be named intMyWeight, which is more precise and legible, although it does include a couple of extra letters (anathema to many programmers). Other prefixes are commonly used in like fashion: str or sz might be used to represent a null-terminated string of characters, b might represent a Boolean value, p a pointer, and c a single character.
As you progress into the world of C++, you will find many examples of the prefix naming style (including the handsome m_lpctstr prefix, which denotes a class member value that contains a long pointer to a constant, null-terminated string of characters), as well as other, more bizarre and possibly counterintuitive styles that you may or may not adopt as your own. As in all the stylistic, subjective parts of C++, consistency and precision are best. You should use variable names to fit your own needs, preferences, and personal style. (Or, if required, choose names that fit the needs, preferences, and personal style of your employer.)
Integer Types
Integers are numbers with no fractional part, such as 2, 98, 5286, and 0. There are lots of integers, assuming that you consider an infinite number to be a lot, so no finite amount of computer memory can represent all possible integers. Thus, a language can represent only a subset of all integers. Some languages, such as standard Pascal, offer just one integer type (one type fits all!), but C++ provides several choices. This gives you the option of choosing the integer type that best meets a program's particular requirements. This concern with matching type to data presages the designed data types of OOP.
The various C++ integer types differ in the amount of memory they use to hold an integer. A larger block of memory can represent a larger range in integer values. Also, some types (signed types) can represent both positive and negative values, whereas others (unsigned types) can't represent negative values. The usual term for describing the amount of memory used for an integer is width. The more memory a value uses, the wider it is. C++'s basic integer types, in order of increasing width, are char, short, int, and long. Each comes in both signed and unsigned versions. That gives you a choice of eight different integer types! Let's look at these integer types in more detail. Because the char type has some special properties (it's most often used to represent characters instead of numbers), this chapter covers the other types first.
The short, int, and long Integer Types
Computer memory consists of units called bits. (See the "Bits and Bytes" sidebar, later in this chapter.) By using different numbers of bits to store values, the C++ types short, int, and long can represent up to three different integer widths. It would be convenient if each type were always some particular width for all systems, for example, if short were always 16 bits, int were always 32 bits, and so on. But life is not that simple, because no one choice is suitable for all computer designs. Instead, C++ offers a flexible standard with some guaranteed minimum sizes, which it takes from C. Here's what you get:
- A short integer is at least 16 bits wide.
- An int integer is at least as big as short.
- A long integer is at least 32 bits wide and at least as big as int.
Bits and Bytes
The fundamental unit of computer memory is the bit. Think of a bit as an electronic switch that you can set to either off or on. Off represents the value 0, and on represents the value 1. An 8-bit chunk of memory can be set to 256 different combinations. The number 256 comes from the fact that each bit has two possible settings, making the total number of combinations for 8 bits 2 x 2 x 2 x 2 x 2 x 2 x 2 x 2, or 256. Thus, an 8-bit unit can represent, say, the values 0 through 255 or the values -128 through 127. Each additional bit doubles the number of combinations. This means you can set a 16-bit unit to 65,536 different values and a 32-bit unit to 4,294,967,296 different values.
A byte usually means an 8-bit unit of memory. Byte in this sense is the unit of measurement that describes the amount of memory in a computer, with a kilobyte equal to 1,024 bytes and a megabyte equal to 1,024 kilobytes. However, C++ defines byte differently. The C++ byte consists of at least enough adjacent bits to accommodate the basic character set for the implementation. That is, the number of possible values must equal or exceed the number of distinct characters. In the United States, the basic character sets are usually the ASCII and EBCDIC sets, each of which can be accommodated by 8 bits, so the C++ byte is typically 8 bits on systems using those character sets. However, international programming can require much larger character sets, such as Unicode, so some implementations may use a 16-bit byte or even a 32-bit byte.
Many systems currently use the minimum guarantee, making short 16 bits and long 32 bits. This still leaves several choices open for int. It could be 16, 24, or 32 bits in width and meet the standard. Typically, int is 16 bits (the same as short) for older IBM PC implementations and 32 bits (the same as long) for Windows 98, Windows NT, Windows XP, Macintosh OS X, VAX, and many other minicomputer implementations. Some implementations give you a choice of how to handle int. (What does your implementation use? The next example shows you how to determine the limits for your system without your having to open a manual.) The differences between implementations for type widths can cause problems when you move a C++ program from one environment to another. But a little care, as discussed later in this chapter, can minimize those problems.
You use these type names to declare variables just as you would use int:
short score;      // creates a type short integer variable
int temperature;  // creates a type int integer variable
long position;    // creates a type long integer variable
Actually, short is short for short int and long is short for long int, but hardly anyone uses the longer forms.
The three types, int, short, and long, are signed types, meaning each splits its range approximately equally between positive and negative values. For example, a 16-bit int might run from -32,768 to +32,767.
If you want to know how your system's integers size up, you can use C++ tools to investigate type sizes with a program. First, the sizeof operator returns the size, in bytes, of a type or a variable. (An operator is a built-in language element that operates on one or more items to produce a value. For example, the addition operator, represented by +, adds two values.) Note that the meaning of byte is implementation dependent, so a 2-byte int could be 16 bits on one system and 32 bits on another. Second, the climits header file (or, for older implementations, the limits.h header file) contains information about integer type limits. In particular, it defines symbolic names to represent different limits. For example, it defines INT_MAX as the largest possible int value and CHAR_BIT as the number of bits in a byte. Listing 3.1 demonstrates how to use these facilities. The program also illustrates initialization, which is the use of a declaration statement to assign a value to a variable.
Listing 3.1 limits.cpp
// limits.cpp -- some integer limits
#include <iostream>
#include <climits>              // use limits.h for older systems
int main()
{
    using namespace std;
    int n_int = INT_MAX;        // initialize n_int to max int value
    short n_short = SHRT_MAX;   // symbols defined in limits.h file
    long n_long = LONG_MAX;

    // sizeof operator yields size of type or of variable
    cout << "int is " << sizeof (int) << " bytes." << endl;
    cout << "short is " << sizeof n_short << " bytes." << endl;
    cout << "long is " << sizeof n_long << " bytes." << endl << endl;

    cout << "Maximum values:" << endl;
    cout << "int: " << n_int << endl;
    cout << "short: " << n_short << endl;
    cout << "long: " << n_long << endl << endl;

    cout << "Minimum int value = " << INT_MIN << endl;
    cout << "Bits per byte = " << CHAR_BIT << endl;
    return 0;
}
Compatibility Note
The climits header file is the C++ version of the ANSI C limits.h header file. Some earlier C++ platforms have neither header file available. If you're using such a system, you must limit yourself to experiencing this example in spirit only.
Here is the output from the program in Listing 3.1, using Microsoft Visual C++ 7.1:
int is 4 bytes.
short is 2 bytes.
long is 4 bytes.

Maximum values:
int: 2147483647
short: 32767
long: 2147483647

Minimum int value = -2147483648
Bits per byte = 8
Here is the output for a second system, running Borland C++ 3.1 for DOS:
int is 2 bytes.
short is 2 bytes.
long is 4 bytes.

Maximum values:
int: 32767
short: 32767
long: 2147483647

Minimum int value = -32768
Bits per byte = 8
Program Notes
The following sections look at the chief programming features for this program.
The sizeof Operator and the climits Header File
The sizeof operator reports that int is 4 bytes on the base system, which uses an 8-bit byte. You can apply the sizeof operator to a type name or to a variable name. When you use the sizeof operator with a type name, such as int, you enclose the name in parentheses. But when you use the operator with the name of the variable, such as n_short, parentheses are optional:
cout << "int is " << sizeof (int) << " bytes.\n";
cout << "short is " << sizeof n_short << " bytes.\n";
The climits header file defines symbolic constants (see the sidebar "Symbolic Constants the Preprocessor Way," later in this chapter) to represent type limits. As mentioned previously, INT_MAX represents the largest value type int can hold; this turned out to be 32,767 for our DOS system. The compiler manufacturer provides a climits file that reflects the values appropriate to that compiler. For example, the climits file for Windows XP, which uses a 32-bit int, defines INT_MAX to represent 2,147,483,647. Table 3.1 summarizes the symbolic constants defined in the climits file; some pertain to types you have not yet learned.
Table 3.1 Symbolic Constants from climits
Symbolic Constant | Represents
CHAR_BIT | Number of bits in a char
CHAR_MAX | Maximum char value
CHAR_MIN | Minimum char value
SCHAR_MAX | Maximum signed char value
SCHAR_MIN | Minimum signed char value
UCHAR_MAX | Maximum unsigned char value
SHRT_MAX | Maximum short value
SHRT_MIN | Minimum short value
USHRT_MAX | Maximum unsigned short value
INT_MAX | Maximum int value
INT_MIN | Minimum int value
UINT_MAX | Maximum unsigned int value
LONG_MAX | Maximum long value
LONG_MIN | Minimum long value
ULONG_MAX | Maximum unsigned long value
Initialization
Initialization combines assignment with declaration. For example, the statement
int n_int = INT_MAX;
declares the n_int variable and sets it to the largest possible type int value. You can also use regular constants to initialize values. You can initialize a variable to another variable, provided that the other variable has been defined first. You can even initialize a variable to an expression, provided that all the values in the expression are known when program execution reaches the declaration:
int uncles = 5;                   // initialize uncles to 5
int aunts = uncles;               // initialize aunts to 5
int chairs = aunts + uncles + 4;  // initialize chairs to 14
Moving the uncles declaration to the end of this list of statements would invalidate the other two initializations because then the value of uncles wouldn't be known at the time the program tries to initialize the other variables.
The initialization syntax shown previously comes from C; C++ has a second initialization syntax that is not shared with C:
int owls = 101;   // traditional C initialization
int wrens(432);   // alternative C++ syntax, set wrens to 432
Remember
If you don't initialize a variable that is defined inside a function, the variable's value is undefined. That means the value is whatever happened to be sitting at that memory location prior to the creation of the variable.
If you know what the initial value of a variable should be, initialize it. True, separating the declaring of a variable from assigning it a value can create momentary suspense:
short year;    // what could it be?
year = 1492;   // oh
But initializing the variable when you declare it protects you from forgetting to assign the value later.
Symbolic Constants the Preprocessor Way
The climits file contains lines similar to the following:
#define INT_MAX 32767
Recall that the C++ compilation process first passes the source code through a preprocessor. Here #define, like #include, is a preprocessor directive. What this particular directive tells the preprocessor is this: Look through the program for instances of INT_MAX and replace each occurrence with 32767. So the #define directive works like a global search-and-replace command in a text editor or word processor. The altered program is compiled after these replacements occur. The preprocessor looks for independent tokens (separate words) and skips embedded words. That is, the preprocessor doesn't replace PINT_MAXIM with P32767IM. You can use #define to define your own symbolic constants, too. (See Listing 3.2.) However, the #define directive is a C relic. C++ has a better way of creating symbolic constants (using the const keyword, discussed in a later section), so you won't be using #define much. But some header files, particularly those designed to be used with both C and C++, do use it.
Unsigned Types
Each of the three integer types you just learned about comes in an unsigned variety that can't hold negative values. This has the advantage of increasing the largest value the variable can hold. For example, if short represents the range -32,768 to +32,767, the unsigned version can represent the range 0 to 65,535. Of course, you should use unsigned types only for quantities that are never negative, such as populations, bean counts, and happy face manifestations. To create unsigned versions of the basic integer types, you just use the keyword unsigned to modify the declarations:
unsigned short change;   // unsigned short type
unsigned int rovert;     // unsigned int type
unsigned quarterback;    // also unsigned int
unsigned long gone;      // unsigned long type
Note that unsigned by itself is short for unsigned int.
Listing 3.2 illustrates the use of unsigned types. It also shows what might happen if your program tries to go beyond the limits for integer types. Finally, it gives you one last look at the preprocessor #define statement.
Listing 3.2 exceed.cpp
// exceed.cpp -- exceeding some integer limits
#include <iostream>
#define ZERO 0      // makes ZERO symbol for 0 value
#include <climits>  // defines INT_MAX as largest int value
int main()
{
    using namespace std;
    short sam = SHRT_MAX;      // initialize a variable to max value
    unsigned short sue = sam;  // okay if variable sam already defined

    cout << "Sam has " << sam << " dollars and Sue has " << sue;
    cout << " dollars deposited." << endl
         << "Add $1 to each account." << endl << "Now ";
    sam = sam + 1;
    sue = sue + 1;
    cout << "Sam has " << sam << " dollars and Sue has " << sue;
    cout << " dollars deposited.\nPoor Sam!" << endl;
    sam = ZERO;
    sue = ZERO;
    cout << "Sam has " << sam << " dollars and Sue has " << sue;
    cout << " dollars deposited." << endl;
    cout << "Take $1 from each account." << endl << "Now ";
    sam = sam - 1;
    sue = sue - 1;
    cout << "Sam has " << sam << " dollars and Sue has " << sue;
    cout << " dollars deposited." << endl << "Lucky Sue!" << endl;
    return 0;
}
Compatibility Note
Listing 3.2, like Listing 3.1, uses the climits file; older compilers might need to use limits.h, and some very old compilers might not have either file available.
Here's the output from the program in Listing 3.2:
Sam has 32767 dollars and Sue has 32767 dollars deposited.
Add $1 to each account.
Now Sam has -32768 dollars and Sue has 32768 dollars deposited.
Poor Sam!
Sam has 0 dollars and Sue has 0 dollars deposited.
Take $1 from each account.
Now Sam has -1 dollars and Sue has 65535 dollars deposited.
Lucky Sue!
The program sets a short variable (sam) and an unsigned short variable (sue) to the largest short value, which is 32,767 on our system. Then, it adds 1 to each value. This causes no problems for sue because the new value is still much less than the maximum value for an unsigned integer. But sam goes from 32,767 to -32,768! Similarly, subtracting 1 from 0 creates no problems for sam, but it makes the unsigned variable sue go from 0 to 65,535. As you can see, these integers behave much like an odometer. If you go past the limit, the values just start over at the other end of the range. (See Figure 3.1.) C++ guarantees that unsigned types behave in this fashion. However, C++ doesn't guarantee that signed integer types can exceed their limits (overflow and underflow) without complaint, but that is the most common behavior on current implementations.
Figure 3.1 Typical overflow behavior for integers.
Beyond long
C99 has added a couple new types that most likely will be part of the next edition of the C++ Standard. Indeed, many C++ compilers already support them. The types are long long and unsigned long long. Both are guaranteed to be at least 64 bits and to be at least as wide as the long and unsigned long types.
Choosing an Integer Type
With the richness of C++ integer types, which should you use? Generally, int is set to the most "natural" integer size for the target computer. Natural size refers to the integer form that the computer handles most efficiently. If there is no compelling reason to choose another type, you should use int.
Now look at reasons why you might use another type. If a variable represents something that is never negative, such as the number of words in a document, you can use an unsigned type; that way the variable can represent higher values.
If you know that the variable might have to represent integer values too great for a 16-bit integer, you should use long. This is true even if int is 32 bits on your system. That way, if you transfer your program to a system with a 16-bit int, your program won't embarrass you by suddenly failing to work properly. (See Figure 3.2.)
Figure 3.2 For portability, use long for big integers.
Using short can conserve memory if short is smaller than int. Most typically, this is important only if you have a large array of integers. (An array is a data structure that stores several values of the same type sequentially in memory.) If it is important to conserve space, you should use short instead of int, even if the two are the same size. Suppose, for example, that you move your program from a 16-bit int DOS PC system to a 32-bit int Windows XP system. That doubles the amount of memory needed to hold an int array, but it doesn't affect the requirements for a short array. Remember, a bit saved is a bit earned.
If you need only a single byte, you can use char. We'll examine that possibility soon.
Integer Constants
An integer constant is one you write out explicitly, such as 212 or 1776. C++, like C, lets you write integers in three different number bases: base 10 (the public favorite), base 8 (the old Unix favorite), and base 16 (the hardware hacker's favorite). Appendix A, "Number Bases," describes these bases; here we'll look at the C++ representations. C++ uses the first digit or two to identify the base of a number constant. If the first digit is in the range 1-9, the number is base 10 (decimal); thus 93 is base 10. If the first digit is 0 and the second digit is in the range 1-7, the number is base 8 (octal); thus 042 is octal and equal to 34 decimal. If the first two characters are 0x or 0X, the number is base 16 (hexadecimal); thus 0x42 is hex and equal to 66 decimal. For hexadecimal values, the characters a-f and A-F represent the hexadecimal digits corresponding to the values 10-15. 0xF is 15 and 0xA5 is 165 (10 sixteens plus 5 ones). Listing 3.3 is tailor-made to show the three bases.
Listing 3.3 hexoct1.cpp
// hexoct1.cpp -- shows hex and octal constants
#include <iostream>
int main()
{
    using namespace std;
    int chest = 42;     // decimal integer constant
    int waist = 0x42;   // hexadecimal integer constant
    int inseam = 042;   // octal integer constant

    cout << "Monsieur cuts a striking figure!\n";
    cout << "chest = " << chest << "\n";
    cout << "waist = " << waist << "\n";
    cout << "inseam = " << inseam << "\n";
    return 0;
}
By default, cout displays integers in decimal form, regardless of how they are written in a program, as the following output shows:
Monsieur cuts a striking figure!
chest = 42 (42 in decimal)
waist = 66 (0x42 in hex)
inseam = 34 (042 in octal)
Keep in mind that these notations are merely notational conveniences. For example, if you read that the CGA video memory segment is B000 in hexadecimal, you don't have to convert the value to its base 10 equivalent, 45,056, before using it in your program. Instead, you can simply use 0xB000. But whether you write the value ten as 10, 012, or 0xA, it's stored the same way in the computer: as a binary (base 2) value.
By the way, if you want to display a value in hexadecimal or octal form, you can use some special features of cout. Recall that the iostream header file provides the endl manipulator to give cout the message to start a new line. Similarly, it provides the dec, hex, and oct manipulators to give cout the messages to display integers in decimal, hexadecimal, and octal formats, respectively. Listing 3.4 uses hex and oct to display the decimal value 42 in three formats. (Decimal is the default format, and each format stays in effect until you change it.)
Listing 3.4 hexoct2.cpp
// hexoct2.cpp -- display values in hex and octal
#include <iostream>
int main()
{
    using namespace std;
    int chest = 42;
    int waist = 42;
    int inseam = 42;

    cout << "Monsieur cuts a striking figure!" << endl;
    cout << "chest = " << chest << " (decimal)" << endl;
    cout << hex;    // manipulator for changing number base
    cout << "waist = " << waist << " hexadecimal" << endl;
    cout << oct;    // manipulator for changing number base
    cout << "inseam = " << inseam << " (octal)" << endl;
    return 0;
}
Here's the program output for Listing 3.4:
Monsieur cuts a striking figure!
chest = 42 (decimal)
waist = 2a hexadecimal
inseam = 52 (octal)
Note that code like
cout << hex;
doesn't display anything onscreen. Instead, it changes the way cout displays integers. Thus, the manipulator hex is really a message to cout that tells it how to behave. Also note that because the identifier hex is part of the std namespace and the program uses that namespace, this program can't use hex as the name of a variable. However, if you omitted the using directive and instead used std::cout, std::endl, std::hex, and std::oct, you could still use plain hex as the name for a variable.
How C++ Decides What Type a Constant Is
A program's declarations tell the C++ compiler the type of a particular integer variable. But what about constants? That is, suppose you represent a number with a constant in a program:
cout << "Year = " << 1492 << "\n";
Does the program store 1492 as an int, a long, or some other integer type? The answer is that C++ stores integer constants as type int unless there is a reason to do otherwise. Two such reasons are if you use a special suffix to indicate a particular type or if a value is too large to be an int.
First, look at the suffixes. These are letters placed at the end of a numeric constant to indicate the type. An l or L suffix on an integer means the integer is a type long constant, a u or U suffix indicates an unsigned int constant, and ul (in any combination of orders and uppercase and lowercase) indicates a type unsigned long constant. (Because a lowercase l can look much like the digit 1, you should use the uppercase L for suffixes.) For example, on a system using a 16-bit int and a 32-bit long, the number 22022 is stored in 16 bits as an int, and the number 22022L is stored in 32 bits as a long. Similarly, 22022LU and 22022UL are unsigned long.
Next, look at size. C++ has slightly different rules for decimal integers than it has for hexadecimal and octal integers. (Here decimal means base 10, just as hexadecimal means base 16; the term decimal does not necessarily imply a decimal point.) A decimal integer without a suffix is represented by the smallest of the following types that can hold it: int, long, or unsigned long. On a computer system using a 16-bit int and a 32-bit long, 20000 is represented as type int, 40000 is represented as long, and 3000000000 is represented as unsigned long. A hexadecimal or octal integer without a suffix is represented by the smallest of the following types that can hold it: int, unsigned int, long, or unsigned long. The same computer system that represents 40000 as long represents the hexadecimal equivalent 0x9C40 as an unsigned int. That's because hexadecimal is frequently used to express memory addresses, which intrinsically are unsigned. So unsigned int is more appropriate than long for a 16-bit address.
The char Type: Characters and Small Integers
It's time to turn to the final integer type: char. As you probably suspect from its name, the char type is designed to store characters, such as letters and numeric digits. Now, whereas storing numbers is no big deal for computers, storing letters is another matter. Programming languages take the easy way out by using number codes for letters. Thus, the char type is another integer type. It's guaranteed to be large enough to represent the entire range of basic symbols (all the letters, digits, punctuation, and the like) for the target computer system. In practice, most systems support fewer than 256 kinds of characters, so a single byte can represent the whole range. Therefore, although char is most often used to handle characters, you can also use it as an integer type that is typically smaller than short.
The most common symbol set in the United States is the ASCII character set, described in Appendix C, "The ASCII Character Set." A numeric code (the ASCII code) represents each character in the set. For example, 65 is the code for the character A, and 77 is the code for the character M. For convenience, this book assumes ASCII code in its examples. However, a C++ implementation uses whatever code is native to its host system, for example, EBCDIC (pronounced "eb-se-dik") on an IBM mainframe. Neither ASCII nor EBCDIC serves international needs that well, and C++ supports a wide-character type that can hold a larger range of values, such as are used by the international Unicode character set. You'll learn about this wchar_t type later in this chapter.
Try the char type in Listing 3.5.
Listing 3.5 chartype.cpp
// chartype.cpp -- the char type
#include <iostream>
int main()
{
    using namespace std;
    char ch;        // declare a char variable

    cout << "Enter a character: " << endl;
    cin >> ch;
    cout << "Holla! ";
    cout << "Thank you for the " << ch << " character." << endl;
    return 0;
}
Here's the output from the program in Listing 3.5:
Enter a character:
M
Holla! Thank you for the M character.
The interesting thing is that you type an M, not the corresponding character code, 77. Also, the program prints an M, not 77. Yet if you peer into memory, you find that 77 is the value stored in the ch variable. The magic, such as it is, lies not in the char type but in cin and cout. These worthy facilities make conversions on your behalf. On input, cin converts the keystroke input M to the value 77. On output, cout converts the value 77 to the displayed character M; cin and cout are guided by the type of variable. If you place the same value 77 into an int variable, cout displays it as 77. (That is, cout displays two 7 characters.) Listing 3.6 illustrates this point. It also shows how to write a character constant in C++: Enclose the character within two single quotation marks, as in 'M'. (Note that the example doesn't use double quotation marks. C++ uses single quotation marks for a character and double quotation marks for a string. The cout object can handle either, but, as Chapter 4 discusses, the two are quite different from one another.) Finally, the program introduces a cout feature, the cout.put() function, which displays a single character.
Listing 3.6 morechar.cpp
// morechar.cpp -- the char type and int type contrasted
#include <iostream>
int main()
{
    using namespace std;
    char ch = 'M';       // assign ASCII code for M to ch
    int i = ch;          // store same code in an int
    cout << "The ASCII code for " << ch << " is " << i << endl;

    cout << "Add one to the character code:" << endl;
    ch = ch + 1;         // change character code in ch
    i = ch;              // save new character code in i
    cout << "The ASCII code for " << ch << " is " << i << endl;

    // using the cout.put() member function to display a char
    cout << "Displaying char ch using cout.put(ch): ";
    cout.put(ch);

    // using cout.put() to display a char constant
    cout.put('!');

    cout << endl << "Done" << endl;
    return 0;
}
Here is the output from the program in Listing 3.6:
The ASCII code for M is 77
Add one to the character code:
The ASCII code for N is 78
Displaying char ch using cout.put(ch): N!
Done
Program Notes
In the program in Listing 3.6, the notation 'M' represents the numeric code for the M character, so initializing the char variable ch to 'M' sets ch to the value 77. The program then assigns the identical value to the int variable i, so both ch and i have the value 77. Next, cout displays ch as M and i as 77. As previously stated, a value's type guides cout as it chooses how to display that value, which is just another example of smart objects.
Because ch is really an integer, you can apply integer operations to it, such as adding 1. This changes the value of ch to 78. The program then resets i to the new value. (Equivalently, you can simply add 1 to i.) Again, cout displays the char version of that value as a character and the int version as a number.
The fact that C++ represents characters as integers is a genuine convenience that makes it easy to manipulate character values. You don't have to use awkward conversion functions to convert characters to ASCII and back.
Finally, the program uses the cout.put() function to display both ch and a character constant.
A Member Function: cout.put()
Just what is cout.put(), and why does it have a period in its name? The cout.put() function is your first example of an important C++ OOP concept, the member function. Remember that a class defines how to represent data and how to manipulate it. A member function belongs to a class and describes a method for manipulating class data. The ostream class, for example, has a put() member function that is designed to output characters. You can use a member function only with a particular object of that class, such as the cout object, in this case. To use a class member function with an object such as cout, you use a period to combine the object name (cout) with the function name (put()). The period is called the membership operator. The notation cout.put() means to use the class member function put() with the class object cout. You'll learn about this in greater detail when you reach classes in Chapter 10, "Objects and Classes." Now, the only classes you have are the istream and ostream classes, and you can experiment with their member functions to get more comfortable with the concept.
The cout.put() member function provides an alternative to using the << operator to display a character. At this point you might wonder why there is any need for cout.put(). Much of the answer is historical. Before Release 2.0 of C++, cout would display character variables as characters but display character constants, such as 'M' and 'N', as numbers. The problem was that earlier versions of C++, like C, stored character constants as type int. That is, the code 77 for 'M' would be stored in a 16-bit or 32-bit unit. Meanwhile, char variables typically occupied 8 bits. A statement like
char c = 'M';
copied 8 bits (the important 8 bits) from the constant 'M' to the variable c. Unfortunately, this meant that, to cout, 'M' and c looked quite different from one another, even though both held the same value. So a statement like
cout << '$';
would print the ASCII code for the $ character rather than simply display $. But
cout.put('$');
would print the character, as desired. Now, after Release 2.0, C++ stores single-character constants as type char, not type int. Therefore, cout now correctly handles character constants.
The cin object has a couple of different ways of reading characters from input. You can explore these by using a program that uses a loop to read several characters, so we'll return to this topic when we cover loops in Chapter 5, "Loops and Relational Expressions."
char Constants
You have several options for writing character constants in C++. The simplest choice for ordinary characters, such as letters, punctuation, and digits, is to enclose the character in single quotation marks. This notation stands for the numeric code for the character. For example, an ASCII system has the following correspondences:
'A' is 65, the ASCII code for A
'a' is 97, the ASCII code for a
'5' is 53, the ASCII code for the digit 5
' ' is 32, the ASCII code for the space character
'!' is 33, the ASCII code for the exclamation point
Using this notation is better than using the numeric codes explicitly. It's clearer, and it doesn't assume a particular code. If a system uses EBCDIC, then 65 is not the code for A, but 'A' still represents the character.
There are some characters that you can't enter into a program directly from the keyboard. For example, you can't make the newline character part of a string by pressing the Enter key; instead, the program editor interprets that keystroke as a request for it to start a new line in your source code file. Other characters have difficulties because the C++ language imbues them with special significance. For example, the double quotation mark character delimits strings, so you can't just stick one in the middle of a string. C++ has special notations, called escape sequences, for several of these characters, as shown in Table 3.2. For example, \a represents the alert character, which beeps your terminal's speaker or rings its bell. The escape sequence \n represents a newline. And \" represents the double quotation mark as an ordinary character instead of a string delimiter. You can use these notations in strings or in character constants, as in the following examples:
char alarm = '\a';
cout << alarm << "Don't do that again!\a\n";
cout << "Ben \"Buggsie\" Hacker\nwas here!\n";
Table 3.2 C++ Escape Sequence Codes
| Character Name | ASCII Symbol | C++ Code | ASCII Decimal Code | ASCII Hex Code |
|---|---|---|---|---|
| Newline | NL (LF) | \n | 10 | 0xA |
| Horizontal tab | HT | \t | 9 | 0x9 |
| Vertical tab | VT | \v | 11 | 0xB |
| Backspace | BS | \b | 8 | 0x8 |
| Carriage return | CR | \r | 13 | 0xD |
| Alert | BEL | \a | 7 | 0x7 |
| Backslash | \ | \\ | 92 | 0x5C |
| Question mark | ? | \? | 63 | 0x3F |
| Single quote | ' | \' | 39 | 0x27 |
| Double quote | " | \" | 34 | 0x22 |
The last line of the code example that precedes Table 3.2 produces the following output:
Ben "Buggsie" Hacker
was here!
Note that you treat an escape sequence, such as \n, just as a regular character, such as Q. That is, you enclose it in single quotes to create a character constant and don't use single quotes when including it as part of a string.
The newline character provides an alternative to endl for inserting new lines into output. You can use the newline character in character constant notation ('\n') or as a character in a string ("\n"). All three of the following move the screen cursor to the beginning of the next line:
cout << endl;       // using the endl manipulator
cout << '\n';       // using a character constant
cout << "\n";       // using a string
You can embed the newline character in a longer string; this is often more convenient than using endl. For example, the following two cout statements produce the same output:
cout << endl << endl << "What next?" << endl << "Enter a number:" << endl;
cout << "\n\nWhat next?\nEnter a number:\n";
When you're displaying a number, endl is a bit easier to type than "\n" or '\n', but, when you're displaying a string, ending the string with a newline character requires less typing:
cout << x << endl;      // easier than cout << x << "\n";
cout << "Dr. X.\n";     // easier than cout << "Dr. X." << endl;
Finally, you can use escape sequences based on the octal or hexadecimal codes for a character. For example, Ctrl+Z has an ASCII code of 26, which is 032 in octal and 0x1a in hexadecimal. You can represent this character with either of the following escape sequences: \032 or \x1a. You can make character constants out of these by enclosing them in single quotes, as in '\032', and you can use them as parts of a string, as in "hi\x1a there".
TIP
When you have a choice between using a numeric escape sequence or a symbolic escape sequence, as in \x8 versus \b, use the symbolic code. The numeric representation is tied to a particular code, such as ASCII, but the symbolic representation works with all codes and is more readable.
Listing 3.7 demonstrates a few escape sequences. It uses the alert character to get your attention, the newline character to advance the cursor (one small step for a cursor, one giant step for cursorkind), and the backspace character to back the cursor one space to the left. (Houdini once painted a picture of the Hudson River using only escape sequences; he was, of course, a great escape artist.)
Listing 3.7 bondini.cpp
// bondini.cpp -- using escape sequences
#include <iostream>
int main()
{
    using namespace std;
    cout << "\aOperation \"HyperHype\" is now activated!\n";
    cout << "Enter your agent code:________\b\b\b\b\b\b\b\b";
    long code;
    cin >> code;
    cout << "\aYou entered " << code << "...\n";
    cout << "\aCode verified! Proceed with Plan Z3!\n";
    return 0;
}
Compatibility Note
Some C++ systems based on pre-ANSI C compilers don't recognize \a. You can substitute \007 for \a on systems that use the ASCII character code. Some systems might behave differently, displaying the \b as a small rectangle rather than backspacing, for example, or perhaps erasing while backspacing, perhaps ignoring \a.
When you start the program in Listing 3.7, it puts the following text onscreen:
Operation "HyperHype" is now activated!
Enter your agent code:________
After printing the underscore characters, the program uses the backspace character to back up the cursor to the first underscore. You can then enter your secret code and continue. Here's a complete run:
Operation "HyperHype" is now activated!
Enter your agent code:42007007
You entered 42007007...
Code verified! Proceed with Plan Z3!
Universal Character Names
C++ implementations support a basic source character set, that is, the set of characters you can use to write source code. It consists of the letters (uppercase and lowercase) and digits found on a standard U.S. keyboard, the symbols, such as { and =, used in the C language, and a scattering of other characters, such as newline and space characters. Then there is a basic execution character set (that is, characters that can be produced by the execution of a program), which adds a few more characters, such as backspace and alert. The C++ Standard also allows an implementation to offer extended source character sets and extended execution character sets. Furthermore, those additional characters that qualify as letters can be used as part of the name of an identifier. Thus, a German implementation might allow you to use umlauted vowels and a French implementation might allow accented vowels. C++ has a mechanism for representing such international characters that is independent of any particular keyboard: the use of universal character names.
Using universal character names is similar to using escape sequences. A universal character name begins either with \u or \U. The \u form is followed by 4 hexadecimal digits, and the \U form by 8 hexadecimal digits. These digits represent the ISO 10646 code for the character. (ISO 10646 is an international standard under development that provides numeric codes for a wide range of characters. See "Unicode and ISO 10646," later in this chapter.)
If your implementation supports extended characters, you can use universal character names in identifiers, as character constants, and in strings. For example, consider the following code:
int k\u00F6rper;
cout << "Let them eat g\u00E2teau.\n";
The ISO 10646 code for ö is 00F6, and the code for â is 00E2. Thus, this C++ code would set the variable name to körper and display the following output:
Let them eat gâteau.
If your system doesn't support ISO 10646, it might display some other character for â or perhaps simply display the word gu00E2teau.
Unicode and ISO 10646
Unicode provides a solution to the representation of various character sets by providing standard numeric codes for a great number of characters and symbols, grouping them by type. For example, the ASCII code is incorporated as a subset of Unicode, so U.S. Latin characters such as A and Z have the same representation under both systems. But Unicode also incorporates other Latin characters, such as those used in European languages; characters from other alphabets, including Greek, Cyrillic, Hebrew, Arabic, Thai, and Bengali; and ideographs, such as those used for Chinese and Japanese. So far Unicode represents more than 96,000 symbols and 49 scripts, and it is still under development. If you want to know more, you can check the Unicode Consortium's website, at http://www.unicode.org.
The International Organization for Standardization (ISO) established a working group to develop ISO 10646, also a standard for coding multilingual text. The ISO 10646 group and the Unicode group have worked together since 1991 to keep their standards synchronized with one another.
signed char and unsigned char
Unlike int, char is not signed by default. Nor is it unsigned by default. The choice is left to the C++ implementation in order to allow the compiler developer to best fit the type to the hardware properties. If it is vital to you that char has a particular behavior, you can use signed char or unsigned char explicitly as types:
char fodo;              // may be signed, may be unsigned
unsigned char bar;      // definitely unsigned
signed char snark;      // definitely signed
These distinctions are particularly important if you use char as a numeric type. The unsigned char type typically represents the range 0 to 255, and signed char typically represents the range -128 to 127. For example, suppose you want to use a char variable to hold values as large as 200. That works on some systems but fails on others. You can, however, successfully use unsigned char for that purpose on any system. On the other hand, if you use a char variable to hold a standard ASCII character, it doesn't really matter whether char is signed or unsigned, so you can simply use char.
For When You Need More: wchar_t
Programs might have to handle character sets that don't fit within the confines of a single 8-bit byte (for example, the Japanese kanji system). C++ handles this in a couple of ways. First, if a large set of characters is the basic character set for an implementation, a compiler vendor can define char as a 16-bit byte or larger. Second, an implementation can support both a small basic character set and a larger extended character set. The usual 8-bit char can represent the basic character set, and another type, called wchar_t (for wide character type), can represent the extended character set. The wchar_t type is an integer type with sufficient space to represent the largest extended character set used on the system. This type has the same size and sign properties as one of the other integer types, which is called the underlying type. The choice of underlying type depends on the implementation, so it could be unsigned short on one system and int on another.
The cin and cout family consider input and output as consisting of streams of chars, so they are not suitable for handling the wchar_t type. The latest version of the iostream header file provides parallel facilities in the form of wcin and wcout for handling wchar_t streams. Also, you can indicate a wide-character constant or string by preceding it with an L. The following code stores a wchar_t version of the letter P in the variable bob and displays a wchar_t version of the word tall:
wchar_t bob = L'P';         // a wide-character constant
wcout << L"tall" << endl;   // outputting a wide-character string
On a system with a 2-byte wchar_t, this code stores each character in a 2-byte unit of memory. This book doesn't use the wide-character type, but you should be aware of it, particularly if you become involved in international programming or in using Unicode or ISO 10646.
The bool Type
The ANSI/ISO C++ Standard has added a new type (new to C++, that is), called bool. It's named in honor of the English mathematician George Boole, who developed a mathematical representation of the laws of logic. In computing, a Boolean variable is one whose value can be either true or false. In the past, C++, like C, has not had a Boolean type. Instead, as you'll see in greater detail in Chapters 5 and 6, "Branching Statements and Logical Operators," C++ interprets nonzero values as true and zero values as false. Now, however, you can use the bool type to represent true and false, and the predefined literals true and false represent those values. That is, you can make statements like the following:
bool isready = true;
The literals true and false can be converted to type int by promotion, with true converting to 1 and false to 0:
int ans = true;         // ans assigned 1
int promise = false;    // promise assigned 0
Also, any numeric or pointer value can be converted implicitly (that is, without an explicit type cast) to a bool value. Any nonzero value converts to true, whereas a zero value converts to false:
bool start = -100;      // start assigned true
bool stop = 0;          // stop assigned false
After the book introduces if statements (in Chapter 6), the bool type will become a common feature in the examples.