1.2 Variables
C++ is a strongly typed language (in contrast to many scripting languages). This means that every variable has a type and this type never changes. A variable is declared by a statement beginning with a type followed by a variable name with optional initialization—or a list thereof:
int    i1= 2;              // Alignment for readability only
int    i2, i3= 5;          // Note: i2 is not initialized
float  pi= 3.14159;
double x= -1.5e6;          // -1500000
double y= -1.5e-6;         // -0.0000015
char   c1= 'a', c2= 35;
bool   cmp= i1 < pi,       // -> true
       happy= true;
The two slashes // here start a single-line comment; i.e., everything from the double slashes to the end of the line is ignored. In principle, this is all that really matters about comments. So as not to leave you with the feeling that something important on the topic is still missing, we will discuss it a little more in Section 1.9.1.
1.2.1 Intrinsic Types
The most fundamental types in C++ are the Intrinsic Types listed in Table 1–1. They are part of the core language and always available.
Table 1–1: Intrinsic Types
Name | Semantics |
---|---|
char | letter and very short integer number |
short | rather short integer number |
int | regular integer number |
long | long integer number |
long long | very long integer number |
unsigned | unsigned versions of all the former |
signed | signed versions of all the former |
float | single-precision floating-point number |
double | double-precision floating-point number |
long double | long floating-point number |
bool | boolean |
The first five types are integer numbers of nondecreasing length. For instance, int is at least as long as short; i.e., it is usually but not necessarily longer. The exact length of each type is implementation-dependent; e.g., int could be 16, 32, or 64 bits. All these types can be qualified as signed or unsigned. The former has no effect on integer numbers (except char) since they are signed by default.
When we declare an integer type as unsigned, we will have no negative values but twice as many positive ones (plus one when we consider zero as neither positive nor negative). signed and unsigned can be considered adjectives for the noun int with int as the default noun when only the adjective is declared. The same applies for the adjectives short, long, and long long.
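The relation between the two ranges can be checked with std::numeric_limits from <limits>. The following is only a minimal sketch: the function name unsigned_has_twice_the_range is our own invention, and the check assumes the usual case that int and unsigned use the same number of value bits.

```cpp
#include <limits>

// Hypothetical helper (not part of any library): true if unsigned offers
// twice as many positive values as int, plus one.
constexpr bool unsigned_has_twice_the_range()
{
    // Computed in unsigned arithmetic to avoid signed overflow.
    return std::numeric_limits<unsigned>::max()
        == 2u * static_cast<unsigned>(std::numeric_limits<int>::max()) + 1u;
}

// Unsigned types start at zero; signed types reach below it.
static_assert(std::numeric_limits<unsigned>::min() == 0u, "unsigned has no negative values");
static_assert(std::numeric_limits<int>::min() < 0, "int has negative values");
```

On a platform with 32-bit int, the largest values are 2147483647 for int and 4294967295 for unsigned.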
The type char can be used in two ways: for letters and rather short numbers. Except for really exotic architectures, it almost always has a length of 8 bits. Thus, we can either represent values from -128 to 127 (signed) or from 0 to 255 (unsigned) and perform all numeric operations on them that are available for integers. When neither signed nor unsigned is declared, it depends on the implementation of the compiler which one is used. Using char or unsigned char for small numbers, however, can be useful when there are large containers of them.
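Both uses of char can be illustrated with a short sketch. The helper names plain_char_is_signed and sum_small are hypothetical; the first one merely reports what the implementation decided, and the second treats unsigned char values as small numbers.

```cpp
#include <climits>
#include <limits>

// Whether plain char is signed is decided by the implementation.
constexpr bool plain_char_is_signed()
{
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: sums small numbers stored compactly as unsigned char.
inline int sum_small(const unsigned char* values, int n)
{
    int sum= 0;
    for (int i= 0; i < n; ++i)
        sum+= values[i];     // each element is promoted to int before adding
    return sum;
}

static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");
static_assert(sizeof(char) == 1, "char is the unit in which sizeof counts");
```

Code that relies on negative values or on values above 127 should therefore spell out signed char or unsigned char instead of plain char.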
Logic values are best represented as bool. A boolean variable can store true and false.
The nondecreasing-length property applies in the same manner to floating-point numbers: float is shorter than or equally as long as double, which in turn is shorter than or equally as long as long double. Typical sizes are 32 bits for float and 64 bits for double. The size of long double varies more between platforms: common choices are 64 bits, an 80-bit extended format stored in 96 or 128 bits, and a true 128-bit format.
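The ordering of the lengths can be verified by the compiler itself. The sketch below uses sizeof and std::numeric_limits; the function template bits_of is a hypothetical helper of ours that reports the storage size of a type in bits.

```cpp
#include <climits>
#include <limits>

// Integer types: the storage never shrinks along this chain.
static_assert(sizeof(char) <= sizeof(short), "char vs. short");
static_assert(sizeof(short) <= sizeof(int), "short vs. int");
static_assert(sizeof(int) <= sizeof(long), "int vs. long");
static_assert(sizeof(long) <= sizeof(long long), "long vs. long long");

// Floating-point types: the precision (mantissa bits) never shrinks.
static_assert(std::numeric_limits<float>::digits <= std::numeric_limits<double>::digits,
              "double is at least as precise as float");
static_assert(std::numeric_limits<double>::digits <= std::numeric_limits<long double>::digits,
              "long double is at least as precise as double");

// Hypothetical helper: storage size of type T in bits.
template <typename T>
constexpr int bits_of()
{
    return static_cast<int>(sizeof(T)) * CHAR_BIT;
}
```

Printing bits_of<int>() or bits_of<long double>() is a quick way to find out what a given compiler and platform actually use.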
1.2.2 Characters and Strings
As mentioned before, the type char can be used to store characters:
char c= 'f';
We can also represent any letter whose code fits into 8 bits. It can even be mixed with numbers; e.g., 'a' + 7 usually leads to 'h' depending on the underlying coding of the letters. We strongly recommend not playing with this since the potential confusion will likely lead to a perceivable waste of time.
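To see why such mixing is confusing, consider the type of the sum. The sketch below assumes an ASCII-compatible character set for the result 'h', and shift_letter is a hypothetical helper of ours.

```cpp
#include <type_traits>

// In 'a' + 7, the char is promoted to int, so the sum is an int, not a char.
static_assert(std::is_same<decltype('a' + 7), int>::value, "char + int yields int");

// Hypothetical helper: moves a letter n positions forward in the character set.
inline char shift_letter(char c, int n)
{
    return static_cast<char>(c + n);   // explicit conversion back to char
}
```

With ASCII coding, shift_letter('a', 7) yields 'h', whereas printing 'a' + 7 directly shows the number 104.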
From C we inherited the opportunity to represent strings as arrays of char.
char name[8]= "Herbert";
These old C strings all end with a binary 0 as a char value. If the 0 is missing, algorithms keep going until the next memory location with a 0-byte is found. Another big danger is appending to strings: name has no extra space, and the additional characters overwrite some other data. Getting all string operations right (without corrupting memory or cutting off longer strings) is anything but trivial with these old strings. We therefore strongly recommend not using them except for literal values.
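That name has no extra space can be computed from the array size and the string length. This is a small sketch with a hypothetical helper free_chars_in_name; it only inspects the array and does not append anything.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of characters that could still be appended to
// the array name from the text without writing past its end.
inline std::size_t free_chars_in_name()
{
    char name[8]= "Herbert";   // 7 letters plus the terminating 0
    return sizeof(name) - 1 - std::strlen(name);
}
```

The function returns 0: the seven letters and the terminating 0 occupy all eight elements.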
The C++ compiler distinguishes between single and double quotes: 'a' is the character “a” (it has type char) and "a" is an array with a binary 0 as termination (i.e., its type is const char[2]).
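The two types can be checked at compile time. In this sketch, terminator is a hypothetical helper; note that decltype reports a reference for the string literal because the literal is an lvalue.

```cpp
#include <type_traits>

static_assert(std::is_same<decltype('a'), char>::value, "'a' is a char");
static_assert(std::is_same<decltype("a"), const char(&)[2]>::value,
              "\"a\" is an array of two const char");
static_assert(sizeof("a") == 2, "the letter plus the terminating 0");

// Hypothetical helper: the second element of "a" is the binary 0.
constexpr char terminator()
{
    return "a"[1];
}
```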
A much more convenient way to deal with strings is to use the class string (which requires that we include <string>):
#include <string>

int main()
{
    std::string name= "Herbert";
}
C++ strings use dynamic memory and manage it themselves. So if we append more text to a string, we don’t need to worry about memory corruption or cutting off strings:
name= name + ", our cool anti-hero"; // more on this later
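Put together, the declaration and the appending might look as follows. The function describe_herbert is a hypothetical wrapper of ours so that the result can be inspected.

```cpp
#include <string>

// Hypothetical helper wrapping the two statements from the text.
inline std::string describe_herbert()
{
    std::string name= "Herbert";
    name= name + ", our cool anti-hero";   // the string grows as needed
    return name;
}
```

The returned string has 27 characters although name started with only 7; the string object acquired the extra memory by itself.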
Many current implementations also optimize short strings (e.g., up to 16 bytes) by storing them not in dynamic memory but directly in the string object itself. This optimization can significantly reduce the number of expensive memory allocations and releases.
C++14 Since text in double quotes is interpreted as a char array, we need to be able to denote that the text should be considered a string. This is done with the suffix s, e.g., "Herbert"s. Unfortunately, it took us until C++14 to enable this. An explicit conversion like string("Herbert") was always possible. A lightweight constant view on strings was added in C++17 that we will show in Section 4.4.5.
1.2.3 Declaring Variables
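Declare variables as late as possible, usually right before using them for the first time and, whenever possible, not before you can initialize them. This makes programs more readable when they grow long. It also allows the compiler to use the memory more efficiently with nested scopes.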
C++11 C++11 can deduce the type of a variable for us, e.g.:
auto i4= i3 + 7;
The type of i4 is the same as that of i3 + 7, which is int. Although the type is automatically determined, it remains the same, and whatever is assigned to i4 afterward will be converted to int. We will see later how useful auto is in advanced programming. For simple variable declarations like those in this section, it is usually better to declare the type explicitly. auto will be discussed thoroughly in Section 3.4.
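That the deduced type stays fixed can be seen in a small sketch. The function auto_keeps_its_type is a hypothetical helper that wraps the declaration from above and a later assignment.

```cpp
#include <type_traits>

// Hypothetical helper: shows that a later assignment does not change the type.
inline int auto_keeps_its_type()
{
    int  i3= 5;
    auto i4= i3 + 7;    // deduced as int
    static_assert(std::is_same<decltype(i4), int>::value, "i4 is an int");
    i4= 3.9;            // converted to int: the fractional part is dropped
    return i4;
}
```

The function returns 3, not 3.9, because i4 remains an int.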
1.2.4 Constants
Syntactically, constants are like special variables in C++ with the additional attribute of constancy:
const int   ci1= 2;
const int   ci3;             // Error: no value
const float pi= 3.14159;
const char  cc= 'a';
const bool  cmp= ci1 < pi;
As they cannot be changed, it is mandatory to set their values in the declaration. The second constant declaration violates this rule, and the compiler will not tolerate such misbehavior.
Constants can be used wherever variables are allowed—as long as they are not modified, of course. On the other hand, constants like those above are already known during compilation. This enables many kinds of optimizations, and the constants can even be used as arguments of types (we will come back to this later in §5.1.4).
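As a brief preview of a constant used as an argument of a type, the following sketch passes a constant as the size of a std::array. The names dim and number_of_entries are hypothetical.

```cpp
#include <array>
#include <cstddef>

const int dim= 3;    // value known during compilation

// Hypothetical helper: the constant serves as an argument of the array type.
inline std::size_t number_of_entries()
{
    std::array<double, dim> v= {{1.0, 2.0, 3.0}};
    return v.size();
}
```

With a non-constant int in place of dim, the declaration of v would not compile because the size must be known during compilation.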
1.2.5 Literals
Literals like 2 or 3.14 are typed as well. Simply put, decimal integer literals are treated as int, long, or long long depending on the magnitude of the number. Every number with a dot or an exponent (e.g., 3e12 ≡ 3 · 10¹²) is considered a double.
Literals of other types can be written by adding a suffix from the following table:
Literal | Type |
---|---|
2 | int |
2u | unsigned |
2l | long |
2ul | unsigned long |
2.0 | double |
2.0f | float |
2.0l | long double |
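The entries of the table can be confirmed by the compiler. The following sketch is merely one way to do this, using decltype and std::is_same from <type_traits>:

```cpp
#include <type_traits>

static_assert(std::is_same<decltype(2),    int>::value,           "2 is an int");
static_assert(std::is_same<decltype(2u),   unsigned>::value,      "2u is an unsigned");
static_assert(std::is_same<decltype(2l),   long>::value,          "2l is a long");
static_assert(std::is_same<decltype(2ul),  unsigned long>::value, "2ul is an unsigned long");
static_assert(std::is_same<decltype(2.0),  double>::value,        "2.0 is a double");
static_assert(std::is_same<decltype(2.0f), float>::value,         "2.0f is a float");
static_assert(std::is_same<decltype(2.0l), long double>::value,   "2.0l is a long double");
static_assert(std::is_same<decltype(3e12), double>::value,        "3e12 is a double");
```

The suffixes can also be written in uppercase, which helps to tell the letter l in 2L from the digit 1.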
In most cases, it is not necessary to declare the type of literals explicitly since the implicit conversion (a.k.a. Coercion) between built-in numeric types usually yields the values that the programmer expects.
There are, however, four major reasons why we should pay attention to the types of literals.
Availability: The standard library provides a type for complex numbers where the type for the real and imaginary parts can be parameterized by the user:
std::complex<float> z(1.3, 2.4), z2;
Unfortunately, operations are only provided between the type itself and the underlying real type (and arguments are not converted here). As a consequence, we cannot multiply z with an int or a double but only with a float:
z2= 2 * z;       // Error: no int * complex<float>
z2= 2.0 * z;     // Error: no double * complex<float>
z2= 2.0f * z;    // Okay: float * complex<float>
Ambiguity: When a function is overloaded for different argument types (§1.5.4), an argument like 0 might be ambiguous whereas a unique match may exist for a qualified argument like 0u.
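A minimal sketch of such an ambiguity, with two hypothetical overloads named kind that we made up for this purpose:

```cpp
#include <string>

// Two hypothetical overloads, neither of which takes an int.
inline std::string kind(unsigned) { return "unsigned"; }
inline std::string kind(long)     { return "long"; }

inline std::string pick_overloads()
{
    // kind(0);   // would not compile: the int 0 converts equally well to both
    return kind(0u) + "/" + kind(0l);   // the suffix selects a unique match
}
```

The commented-out call is rejected as ambiguous, whereas the qualified literals 0u and 0l each match exactly one overload.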
Accuracy: The accuracy issue comes up when we work with long double. Since the non-qualified literal is a double, we might lose digits before we assign it to a long double variable:
long double third1= 0.3333333333333333333;     // may lose digits
long double third2= 0.3333333333333333333l;    // accurate
Nondecimal Numbers: Integer literals starting with a zero are interpreted as octal numbers, e.g.:
int o1= 042;    // int o1= 34;
int o2= 084;    // Error! No 8 or 9 in octals!
Hexadecimal literals can be written by prefixing them with 0x or 0X:
int h1= 0x42;    // int h1= 66;
int h2= 0xfa;    // int h2= 250;
C++14 C++14 introduces binary literals, which are prefixed with 0b or 0B:
int b1= 0b11111010; // int b1= 250;
C++14 To improve readability of long literals, C++14 allows us to separate the digits with apostrophes:
long d= 6'546'687'616'861'129l;
unsigned long ulx= 0x139'ae3b'2ab0'94f3;
int b= 0b101'1001'0011'1010'1101'1010'0001;
const long double pi= 3.141'592'653'589'793'238'462l;
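The different notations denote ordinary integer values, which the compiler can compare for us. The sketch below requires C++14 for the binary literals and the digit separators; from_binary is a hypothetical helper.

```cpp
static_assert(042 == 34, "octal 42 is decimal 34");
static_assert(0x42 == 66, "hexadecimal 42 is decimal 66");
static_assert(0xfa == 250 && 0b11111010 == 250, "same value in base 16 and base 2");
static_assert(1'000'000 == 1000000, "apostrophes do not change the value");

// Hypothetical helper: a binary literal with a digit separator.
constexpr int from_binary()
{
    return 0b1111'1010;
}
```

The apostrophes may be placed between any two digits; grouping by three or four digits is only a readability convention.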
C++17 Since C++17, we can even write hexadecimal floating-point literals:
float f1= 0x10.1p0f;    // 16.0625
double d2= 0x1ffp10;    // 523264
In these literals, the exponent is introduced with the character p. The exponent is mandatory; thus we needed p0 in the first example. Due to the suffix f, f1 is a float storing the value 16¹ + 16⁻¹ = 16.0625. These literals involve three bases: the pseudo-mantissa is a hexadecimal number that is scaled with a power of 2, whereby the exponent is given as a decimal number. Thus, d2 is 511 × 2¹⁰ = 523264. Hexadecimal floating-point literals seem, admittedly, a little nerdy at first, but they allow us to declare binary floating-point values without rounding errors.
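The two values can be verified at compile time, provided the compiler supports C++17. As a cross-check, the sketch also computes the second value with std::ldexp from <cmath>, which multiplies a number by a power of 2; scaled_by_ldexp is a hypothetical helper.

```cpp
#include <cmath>

// Hexadecimal floating-point literals require C++17.
constexpr float  f1= 0x10.1p0f;    // (16 + 1/16) * 2^0
constexpr double d2= 0x1ffp10;     // 511 * 2^10

static_assert(f1 == 16.0625f, "0x10.1p0f is exactly 16.0625");
static_assert(d2 == 523264.0, "0x1ffp10 is exactly 523264");

// Hypothetical helper: the same scaling expressed with std::ldexp.
inline double scaled_by_ldexp()
{
    return std::ldexp(511.0, 10);   // 511 * 2^10
}
```

Both comparisons are exact because neither value needs more bits than the mantissa of the respective type provides.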
String literals are typed as arrays of char:
char s1[]= "Old C style"; // better not
However, these arrays are anything but convenient, and we are better off with the true string type from the library <string>. It can be created directly from a string literal:
#include <string>

std::string s2= "In C++ better like this";
Very long text can be split into multiple sub-strings:
std::string s3= "This is a very long and clumsy text "
                "that is too long for one line.";
C++14 Although s2 and s3 have type string, they are still initialized with literals of type const char[]. This is not a problem here but might be in other situations where the type is deduced by the compiler. Since C++14, we can directly create literals of type string by appending an s:
f("I'm not a string");         // literal of type const char[]
f("I'm really a string"s);     // literal of type string
As before, we assume that the namespace std is used. To avoid importing the entire standard namespace, we can use sub-namespaces thereof, i.e., write at least one of the following lines:
using namespace std::literals;
using namespace std::string_literals;
using namespace std::literals::string_literals;
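The effect of the suffix on a deduced type can be sketched as follows (C++14 or later). The function deduced_string_size is a hypothetical helper; only the sub-namespace for string literals is imported.

```cpp
#include <string>
#include <type_traits>

using namespace std::string_literals;   // enables the suffix s only

// Without the suffix, the literal is an array of const char (16 letters + 0).
static_assert(std::is_same<decltype("I'm not a string"), const char(&)[17]>::value,
              "plain literal is a char array");

// Hypothetical helper: with the suffix, auto deduces std::string.
inline std::string::size_type deduced_string_size()
{
    auto text= "I'm really a string"s;
    static_assert(std::is_same<decltype(text), std::string>::value,
                  "literal with suffix s is a string");
    return text.size();   // member functions of string are available
}
```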
For more details on literals, see for instance [62, §6.2]. We will show how to define your own literals in Section 2.3.6.
C++11 1.2.6 Non-narrowing Initialization
Say we initialize a long variable with a long number:
long l2= 1234567890123;
This compiles just fine and works correctly when long takes 64 bits, as on most 64-bit platforms. When long is only 32 bits long (we can emulate this by compiling with flags like -m32), the value above is too large. However, the program will still compile (maybe with a warning) and run with another value, e.g., one where the leading bits are cut off.
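What cutting off the leading bits means for this value can be emulated on any platform with fixed-width types. The sketch uses unsigned types from <cstdint> because their conversion is well-defined; it is an illustration of the effect, not of what a particular compiler does with a 32-bit long. The name low_32_bits is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: keeps only the lowest 32 bits of a value. Conversion
// to an unsigned type is well-defined and wraps modulo 2^32.
constexpr std::uint32_t low_32_bits(std::uint64_t value)
{
    return static_cast<std::uint32_t>(value);
}

// 1234567890123 is 0x11F71FB04CB; without the leading 0x11F, 0x71FB04CB remains.
static_assert(low_32_bits(1234567890123) == 1912276171u, "leading bits are cut off");
```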
C++11 introduces an initialization that ascertains that no data is lost or, in other words, that the values are not Narrowed. This is achieved with the Uniform Initialization or Braced Initialization that we only touch upon here and expand on in Section 2.3.4. Values in braces cannot be narrowed:
long l= {1234567890123};
Now the compiler will check whether the variable l can hold the value on the target architecture. When using the braces, we can omit the equals sign:
long l{1234567890123};
The compiler’s narrowing protection allows us to verify that values do not lose precision in initializations. An ordinary initialization of an int by a floating-point number, in contrast, is allowed due to implicit conversion:
int i1= 3.14;       // compiles despite narrowing (our risk)
int i1n= {3.14};    // Narrowing ERROR: fractional part lost
The new initialization form in the second line forbids this because it cuts off the fractional part of the floating-point number. Likewise, assigning negative values to unsigned variables or constants is tolerated with traditional initialization but denounced in the new form:
unsigned u2= -3;       // Compiles despite narrowing (our risk)
unsigned u2n= {-3};    // Narrowing ERROR: no negative values
In the previous examples, we used literal values in the initializations and the compiler checks whether a specific value is representable with that type:
float f1= {3.14}; // okay
Well, the value 3.14 cannot be represented with absolute accuracy in any binary floating-point format, but the compiler can set f1 to the value closest to 3.14. When a float is initialized from a double variable (not a literal), we have to consider all possible double values and whether they are all convertible to float in a loss-free manner.
double d;
...
float f2= {d};    // narrowing ERROR
Note that the narrowing can be mutual between two types:
unsigned u3= {3};
int i2= {2};
unsigned u4= {i2};    // narrowing ERROR: no negative values
int i3= {u3};         // narrowing ERROR: not all large values
The types signed int and unsigned int have the same size, but not all values of each type are representable in the other.
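The following sketch collects braced initializations that the compiler accepts, with rejected ones left as comments. It also shows that a constant whose value is known during compilation is treated like a literal. The function accepted_initializations is a hypothetical helper.

```cpp
// Hypothetical helper: all active lines compile; the commented ones do not.
inline long accepted_initializations()
{
    long      l{1234567};     // fits into long on every platform
    float     f= {0.5};       // double literal within the range of float
    const int ci= 2;          // constant with a value known to the compiler
    unsigned  u= {ci};        // accepted: the compiler sees that 2 fits
    // int      n= {3.14};    // rejected: fractional part would be lost
    // unsigned m= {-3};      // rejected: no negative values

    return l + static_cast<long>(2 * f) + static_cast<long>(u);
}
```

Had ci been declared without const, the initialization of u would be a narrowing error like that of u4 above.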
1.2.7 Scopes
Scopes determine the lifetime and visibility of (nonstatic) variables and constants and contribute to establishing a structure in our programs.
1.2.7.1 Global Definition
Every variable that we intend to use in a program must have been declared with its type specifier at an earlier point in the code. A variable can be located in either the global or local scope. A global variable is declared outside all functions. After their declaration, global variables can be referred to from anywhere in the code, even inside functions. This sounds very handy at first because it makes the variables easily available, but when your software grows, it becomes more difficult and painful to keep track of the global variables’ modifications. At some point, every code change bears the potential of triggering an avalanche of errors.
Do not use global variables. If you do use them, sooner or later you will regret it because they can be accessed from the entire program and it is therefore extremely tedious to keep track of where global variables are changed, and when and how.
Global constants like
const double pi= 3.14159265358979323846264338327950288419716939;
are fine because they cannot cause side effects.
1.2.7.2 Local Definition
A local variable is declared within the body of a function. Its visibility/availability is limited to the { }-enclosed block of its declaration. More precisely, the scope of a variable starts with its declaration and ends with the closing brace of the declaration block.
If we define π in the function main:
#include <iostream>

int main ()
{
    const double pi= 3.14159265358979323846264338327950288419716939;
    std::cout << "pi is " << pi << ".\n";
}
the variable pi only exists in the main function. We can define blocks within functions and within other blocks:
int main ()
{
    {
        const double pi= 3.14159265358979323846264338327950288419716939;
    }
    std::cout << "pi is " << pi << ".\n";    // ERROR: pi is out of scope
}
In this example, the definition of pi is limited to the block within the function, and an output in the remainder of the function is therefore an error:
≫pi≪ is not defined in this scope.
because π is Out of Scope.
1.2.7.3 Hiding
When a variable with the same name exists in nested scopes, only one variable is visible. The variable in the inner scope hides the homonymous variables in the outer scopes (causing a warning in many compilers). For instance:
int main ()
{
    int a= 5;            // define a#1
    {
        a= 3;            // assign a#1, a#2 is not defined yet
        int a;           // define a#2
        a= 8;            // assign a#2, a#1 is hidden
        {
            a= 7;        // assign a#2
        }
    }                    // end of a#2's scope
    a= 11;               // assign to a#1 (a#2 out of scope)

    return 0;
}
Due to hiding, we must distinguish the lifetime and the visibility of variables. For instance, a#1 lives from its declaration until the end of the main function. However, it is only visible from its declaration until the declaration of a#2 and again after closing the block containing a#2. In fact, the visibility is the lifetime minus the time when it is hidden. Defining the same variable name twice in one scope is an error.
The advantage of scopes is that we do not need to worry about whether a variable is already defined somewhere outside the scope. It is just hidden but does not create a conflict. Unfortunately, the hiding makes the homonymous variables in the outer scope inaccessible. We can cope with this to some extent with clever renaming. A better solution, however, to manage nesting and accessibility is namespaces; see Section 3.2.1.
static variables are the exception that confirms the rule: they live until the end of the execution but are only visible in the scope. We are afraid that their detailed introduction is more distracting than helpful at this stage and have postponed the discussion to Section A.2.1.
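Without anticipating that discussion, a tiny sketch may illustrate the difference between lifetime and visibility. The function call_count is a hypothetical example of ours.

```cpp
// Hypothetical helper: counts how often it has been called.
inline int call_count()
{
    static int counter= 0;   // initialized once, lives until the program ends
    return ++counter;        // but is visible only inside this function
}
```

Each call returns a value one larger than the previous call, although counter cannot be accessed from outside the function.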