1.4 Expressions and Statements
C++ distinguishes between expressions and statements. Very casually, we could say that every expression becomes a statement if a semicolon is appended. However, we would like to discuss this topic a bit more.
1.4.1 Expressions
Let us build this recursively from the bottom up. Any variable name (x, y, z, ...), constant, or literal is an expression. One or more expressions combined by an operator constitute an expression, e.g., x + y or x * y + z. In several languages, such as Pascal, the assignment is a statement. In C++, it is an expression, e.g., x= y + z. As a consequence, it can be used within another assignment: x2= x= y + z. Assignments are evaluated from right to left. Input and output operations such as
std::cout << "x is " << x << "\n"
are also expressions.
A function call with expressions as arguments is an expression, e.g., abs(x) or abs(x * y + z). Therefore, function calls can be nested: pow(abs(x), y). Note that nesting would not be possible if function calls were statements.
Since an assignment is an expression, it can be used as an argument of a function: abs(x= y). The same holds for I/O operations such as those above, e.g.:
print(std::cout << "x is " << x << "\n", "I am such a nerd !");
Needless to say, this is not particularly readable and would cause more confusion than benefit.
An expression surrounded by parentheses is an expression as well, e.g., (x + y). As this grouping by parentheses precedes all operators, we can change the order of evaluation to suit our needs: x * (y + z) computes the addition first.
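The rules above can be tried out with a minimal sketch. The function names chained_assignment, output_yields_stream, and nested_and_grouped are hypothetical and serve only as illustration; the values passed in are arbitrary.

```cpp
#include <cmath>
#include <iostream>

// x2= x= y + z is parsed as x2= (x= (y + z)): the inner assignment is
// an expression whose result is then assigned to x2.
int chained_assignment(int y, int z)
{
    int x= 0, x2= 0;
    x2= x= y + z;
    return x2;
}

// An output operation is an expression as well: its result is the stream itself.
bool output_yields_stream(int x)
{
    std::ostream& result= (std::cout << "x is " << x << "\n");
    return &result == &std::cout;
}

// Function calls nest, and the parentheses group y + z before the multiplication.
double nested_and_grouped(double x, double y, double z)
{
    return std::pow(std::abs(x * (y + z)), 2.0);
}
```

That the output expression yields the stream again is what makes chaining several << operations possible.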
1.4.2 Statements
Any of the expressions above followed by a semicolon is a statement, e.g.:
x= y + z;
y= f(x + z) * 3.5;
A statement like
y + z;
is allowed despite having no effect (usually). During program execution, the sum of y and z is computed and then thrown away. Recent compilers optimize out such useless computations. However, it is not guaranteed that this statement can always be omitted. If y or z is an object of a user type, then the addition is also user defined and might change y or z or something else. This is obviously bad programming style (hidden side effect) but legitimate in C++.
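To see why such a statement cannot always be omitted, consider the following sketch. The type counted and its operator+ are hypothetical; the operator deliberately shows the discouraged style of a hidden side effect by counting how often it is called.

```cpp
// Hypothetical user type whose operator+ has a hidden side effect.
struct counted
{
    int value;
    static int additions;   // number of additions performed so far
};

int counted::additions= 0;

counted operator+(const counted& a, const counted& b)
{
    ++counted::additions;   // side effect: changes state outside a and b
    return counted{a.value + b.value};
}

int additions_after_discarded_sum()
{
    counted y{1}, z{2};
    y + z;                  // the sum is thrown away, the side effect remains
    return counted::additions;
}
```

Because the call of operator+ changes counted::additions, the compiler must keep the statement y + z; although its result is discarded.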
A single semicolon is an empty statement, and we can thus put as many semicolons after an expression as we want. Some statements do not end with a semicolon, e.g., function definitions. If a semicolon is appended to such a statement, it is not an error but just an extra empty statement. Nonetheless, some compilers print a warning in pedantic mode. Any sequence of statements surrounded by curly braces is a statement—called a Compound Statement.
The variable and constant declarations we have seen before are also statements. As the initial value of a variable or constant, we can use any expression; only an expression with the comma operator must be enclosed in parentheses, since a bare comma would start the next declaration. Other statements, to be discussed later, are function and class definitions, as well as control statements that we will introduce in the next sections.
With the exception of the conditional operator, program flow is controlled by statements. Here we will distinguish between branches and loops.
1.4.3 Branching
In this section, we will present the different features that allow us to select a branch in the program execution.
1.4.3.1 if-Statement
This is the simplest form of control and its meaning is intuitively clear, for instance in:
if (weight > 100.0)
    cout << "This is quite heavy.\n";
else
    cout << "I can carry this.\n";
Often, the else branch is not needed and can be omitted. Say we have some value in variable x and compute something on its magnitude:
if (x < 0.0)
    x= -x;
// Now we know that x >= 0.0 (post-condition)
The branches of the if-statement are scopes, rendering the following statements erroneous:
if (x < 0.0)
    double absx= -x;
else
    double absx= x;
cout << "|x| is " << absx << "\n"; // Error: absx out of scope
Above, we introduced two new variables, both named absx. They are not in conflict because they reside in different scopes. Neither of them exists after the if-statement, and accessing absx in the last line is an error. In fact, variables declared in a branch can only be used within this branch.
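One way to repair the erroneous snippet is to declare the variable before the if-statement so that it outlives both branches. The following is a minimal sketch; the function name print_magnitude is our own choice for illustration.

```cpp
#include <iostream>

double print_magnitude(double x)
{
    double absx;            // declared outside the branches, so it outlives them
    if (x < 0.0)
        absx= -x;
    else
        absx= x;
    std::cout << "|x| is " << absx << "\n";   // fine: absx is still in scope
    return absx;
}
```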
Each branch of if consists of one single statement. To perform multiple operations, we can use braces, as in the following example realizing Cardano’s method:
double D= q*q/4.0 + p*p*p/27.0;
if (D > 0.0) {
    double z1= ...;
    complex<double> z2 = ..., z3= ...;
    ...
} else if (D == 0.0) {
    double z1= ..., z2= ..., z3= ...;
    ...
} else { // D < 0.0
    complex<double> z1= ..., z2= ..., z3= ...;
    ...
}
In the beginning, it is helpful to always write the braces. Many style guides also enforce curly braces on single statements whereas the author prefers them without braces. Irrespective of this, it is highly advisable to indent the branches for better readability.
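The rule that a branch is a single statement can be checked with a small sketch. Both functions are hypothetical and differ only in the braces.

```cpp
// Without braces, only the first statement belongs to the if.
int without_braces(int x)
{
    int count= 0;
    if (x < 0)
        ++count;
    ++count;            // not part of the if: always executed
    return count;
}

// With braces, both increments form one compound statement.
int with_braces(int x)
{
    int count= 0;
    if (x < 0) {
        ++count;
        ++count;
    }
    return count;
}
```

For a non-negative x, the first function still increments once while the second one does not increment at all.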
if-statements can be nested, where each else is associated with the last open if. If you are interested in examples, have a look at Section A.2.2. Finally, we give you the following:
⇒ c++17/if_init.cpp
The if-statement was extended in C++17 with the possibility to initialize a variable whose scope is limited to the if-statement. This helps control the lifetime of variables; for instance, the result of an insertion into a map (see Section 4.1.3.5) is a pair of a kind of reference to the entry and a bool that tells whether the insertion was successful:
map<string, double> constants= {{"e", 2.7}, {"pi", 3.14}};
if (auto res = constants.insert({"h", 6.6e-34}); res.second)
    cout << "inserted " << res.first->first << " mapping to "
         << res.first->second << endl;
else
    cout << "entry for " << res.first->first << " already exists.\n";
We could have declared res before the if-statement and it would then exist until the end of the surrounding block—unless we put extra braces around the variable declaration and the if-statement.
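The same idea can be wrapped into a function, here as a sketch that assumes a compiler with C++17 support. The name try_insert and the returned messages are our own choices for illustration.

```cpp
#include <map>
#include <string>

// Tries to insert (key, value) and reports what happened.
std::string try_insert(std::map<std::string, double>& constants,
                       const std::string& key, double value)
{
    if (auto res= constants.insert({key, value}); res.second)
        return "inserted " + res.first->first;
    else
        return "entry for " + res.first->first + " already exists";
    // res is not accessible here anymore
}
```

Note that res is available in both branches, but not after the if-statement.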
1.4.3.2 Conditional Expression
Although this section describes statements, we would like to discuss the conditional expression here because of its proximity to the if-statement. The result of
condition ? result_for_true : result_for_false
is the second subexpression (i.e., result_for_true) when condition evaluates to true and result_for_false otherwise. For instance,
min= x <= y ? x : y;
corresponds to the following if-statement:
if (x <= y)
    min= x;
else
    min= y;
For a beginner, the second version might be more readable while experienced programmers often prefer the first form for its brevity.
?: is an expression and can therefore be used to initialize variables:
int x= f(a), y= x < 0 ? -x : 2 * x;
Calling functions with several selected arguments is easy with the operator:
f(a, (x < 0 ? b : c), (y < 0 ? d : e));
but quite clumsy with an if-statement. If you do not believe us, try it.
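For readers who would rather look than try, here is one possible sketch of both variants. The function f with three int parameters is hypothetical and only makes the selected arguments visible in the result.

```cpp
// Hypothetical function: encodes its arguments in one number.
int f(int a, int s, int t)
{
    return 100 * a + 10 * s + t;
}

int call_with_conditional(int a, int x, int y, int b, int c, int d, int e)
{
    return f(a, (x < 0 ? b : c), (y < 0 ? d : e));
}

// The same selection with if-statements needs one call per combination.
int call_with_if(int a, int x, int y, int b, int c, int d, int e)
{
    if (x < 0) {
        if (y < 0)
            return f(a, b, d);
        else
            return f(a, b, e);
    } else {
        if (y < 0)
            return f(a, c, d);
        else
            return f(a, c, e);
    }
}
```

With two selected arguments we already need four calls; each further selected argument doubles that number.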
In most cases it is not important whether an if or a conditional expression is used. So use what feels most convenient to you.
Anecdote: An example where the choice between if and ?: makes a difference is the replace_copy operation in the Standard Template Library (STL), §4.1. It used to be implemented with the conditional operator whereas if would be slightly more general. This “bug” remained undiscovered for approximately 10 years and was only detected by an automatic analysis in Jeremy Siek’s Ph.D. thesis [57].
1.4.3.3 switch Statement
A switch is like a special kind of if. It provides a concise notation when different computations for different cases of an integral value are performed:
switch(op_code) {
  case 0: z= x + y; break;
  case 1: z= x - y; cout << "compute diff\n"; break;
  case 2:
  case 3: z= x * y; break;
  default: z= x / y;
}
A somewhat surprising behavior is that the code of the following cases is also performed unless we terminate it with break. Thus, the same operations are performed in our example for cases 2 and 3. A compiler warning for (nonempty) cases without break is generated with -Wimplicit-fallthrough in g++ and clang++.
To avoid such warnings and to communicate to co-developers that the fall-through is intended, C++17 introduces the attribute [[fallthrough]]:
switch(op_code) {
  case 0: z= x + y; break;
  case 1: z= x - y; cout << "compute diff\n"; break;
  case 2: x= y; [[fallthrough]];
  case 3: z= x * y; break;
  default: z= x / y;
}
Also added in C++17 is the ability to initialize a variable in the switch-statement in the same way as in if.
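Both C++17 additions can be combined in one function, shown here as a sketch that assumes C++17 support. The name compute and the derivation of op_code from a hypothetical request number are our own assumptions.

```cpp
// Selects an operation by request % 5 (the mapping is only an assumption
// made for this example).
double compute(int request, double x, double y)
{
    double z= 0.0;
    switch (int op_code= request % 5; op_code) {   // op_code lives only in the switch
      case 0: z= x + y; break;
      case 1: z= x - y; break;
      case 2: x= y;
              [[fallthrough]];                     // intended: continue with case 3
      case 3: z= x * y; break;
      default: z= x / y;
    }
    return z;
}
```

For request 2, the assignment x= y is performed first and the multiplication of case 3 afterward.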
An advanced use of switch is found in Appendix A.2.3.
1.4.4 Loops
1.4.4.1 while- and do-while-Loops
As the name suggests, a while-loop is repeated as long as the given condition holds. Let us implement as an example the Collatz series that is defined by
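Algorithm 1–1: Collatz series
Input: $x_0$
while $x_i \neq 1$ do
    $x_i = \begin{cases} 3\,x_{i-1} + 1 & \text{if } x_{i-1} \text{ is odd} \\ x_{i-1} / 2 & \text{if } x_{i-1} \text{ is even} \end{cases}$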
If we do not worry about overflow, this is easily implemented with a while-loop:
int x= 19;
while (x != 1) {
    cout << x << '\n';
    if (x % 2 == 1)   // odd
        x= 3 * x + 1;
    else              // even
        x= x / 2;
}
Like the if-statement, the loop can be written without braces in case of a single statement.
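As a variation of the loop above, the following sketch counts the steps instead of printing the values. The function name collatz_steps is hypothetical, and we assume a start value of at least 1; using long long only postpones overflow and does not rule it out.

```cpp
// Number of steps until the Collatz series starting at x reaches 1.
// Assumes x >= 1.
int collatz_steps(long long x)
{
    int steps= 0;
    while (x != 1) {
        if (x % 2 == 1)     // odd
            x= 3 * x + 1;
        else                // even
            x= x / 2;
        ++steps;
    }
    return steps;
}
```

For the start value 1, the condition is false right away and the loop body is never executed.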
C++ also offers a do-while-loop. There the condition for continuation is tested at the end:
double eps = 0.001;
do {
    cout << "eps= " << eps << '\n';
    eps/= 2.0;
} while (eps > 0.0001);
The loop is performed at least once regardless of the condition.
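The difference between the two loops shows up when the condition is false from the start. The following sketch counts iterations; the names halvings_do_while and halvings_while as well as the parameter limit are our own choices.

```cpp
// Halves eps until it is not larger than limit; the test comes last.
int halvings_do_while(double eps, double limit)
{
    int iterations= 0;
    do {
        eps/= 2.0;
        ++iterations;
    } while (eps > limit);
    return iterations;
}

// Same computation, but the test comes first.
int halvings_while(double eps, double limit)
{
    int iterations= 0;
    while (eps > limit) {
        eps/= 2.0;
        ++iterations;
    }
    return iterations;
}
```

When eps is already below limit, the do-while version still performs one iteration whereas the while version performs none.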
1.4.4.2 for-Loop
The most common loop in C++ is the for-loop. As a simple example, we add two vectors and print the result afterward:
double v[3], w[]= {2., 4., 6.}, x[]= {6., 5., 4};
for (int i= 0; i < 3; ++i)
    v[i]= w[i] + x[i];
for (int i= 0; i < 3; ++i)
    cout << "v[" << i << "]= " << v[i] << '\n';
The loop head consists of three components:
The initialization
A continuation criterion
A step operation
The example above is a typical for-loop. In the initialization, we usually declare a new variable and initialize it with 0—this is the start index of most indexed data structures. The condition typically tests whether the loop index is smaller than a certain size while the last operation usually increments the loop index. In the example, we pre-incremented the loop variable i. For intrinsic types like int, it does not matter whether we write ++i or i++. However, it does for user types where the post-increment causes an unnecessary copy; cf. §3.3.2.5. To be consistent in this book, we always use a pre-increment for loop indices.
A very common beginner's mistake is to write conditions like i <= size(..). Since indices are zero based in C++, the index i == size(..) is already out of range. People with experience in Fortran or MATLAB need some time to get used to zero-based indexing. One-based indexing seems more natural to many and is also used in mathematical literature. However, calculations on indices and addresses are almost always simpler with zero-based indexing.
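A sketch of the vector addition with a size-dependent loop bound might look as follows. We use std::vector here instead of the built-in arrays from above, and the function name add is our own; both operands are assumed to have the same length.

```cpp
#include <cstddef>
#include <vector>

// Entry-wise sum; assumes that w and x have the same length.
std::vector<double> add(const std::vector<double>& w,
                        const std::vector<double>& x)
{
    std::vector<double> v(w.size());
    for (std::size_t i= 0; i < w.size(); ++i)   // i < size, not i <= size
        v[i]= w[i] + x[i];
    return v;
}
```

With i <= w.size() as condition, the last iteration would access the entries at index w.size(), which do not exist.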
As another example, we would like to compute the Taylor series of the exponential function:
$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$
up to the tenth term:
double x= 2.0, xn= 1.0, exp_x= 1.0;
unsigned long fac= 1;
for (unsigned long n= 1; n <= 10; ++n) {
    xn*= x;
    fac*= n;
    exp_x+= xn / fac;
    cout << "e^x is " << exp_x << '\n';
}
Here it was simpler to compute term 0 separately and start the loop with term 1. We also used less than or equal (<=) to ensure that the term x^10/10! is considered.
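The computation can also be packed into a function. The sketch below is a variation, not the code from above: instead of keeping the power and the factorial in separate variables, it updates the whole term in each iteration, which avoids overflow of an integer factorial for larger term numbers. The names exp_taylor and last_term are hypothetical.

```cpp
// Partial sum of the Taylor series of e^x with the terms 0, ..., last_term.
double exp_taylor(double x, unsigned last_term)
{
    double term= 1.0, sum= 1.0;         // term 0 is computed separately
    for (unsigned n= 1; n <= last_term; ++n) {
        term*= x / n;                   // x^n / n! from x^(n-1) / (n-1)!
        sum+= term;
    }
    return sum;
}
```

A call like exp_taylor(2.0, 10) corresponds to the loop above without the intermediate output.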
The for-loop in C++ is very flexible. The initialization part can be any expression, a variable declaration, or empty. It is possible to introduce multiple new variables of the same type. This can be used to avoid repeating the same operation in the condition, e.g.:
for (int i= begin(xyz), e= end(xyz); i < e; ++i) ...
Variables declared in the initialization are only visible within the loop and hide variables of the same names from outside the loop.
The condition can be any expression that is convertible to a bool. An empty condition is always true and the loop is repeated infinitely. It can still be terminated inside the body, as we will discuss in Section 1.4.4.4. We already mentioned that a loop index is typically incremented in the third subexpression of for. In principle, we can modify it within the loop body as well. However, programs are much clearer if it is done in the loop head. On the other hand, there is no limitation that only one variable is increased by 1. We can modify as many variables as we want using the comma operator (§1.3.5) and by any modification desired, such as
for (int i= 0, j= 0, p= 1; ...; ++i , j+= 4, p*= 2) ...
This is of course more complex than having just one loop index, but it is still more readable than declaring/modifying indices before the loop or inside the loop body.
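To observe how the three variables evolve, we can record their values from the last iteration, as in this sketch. The struct indices and the function last_iteration are hypothetical helpers for illustration.

```cpp
struct indices
{
    int i, j, p;
};

// Values of i, j, and p in the last of n iterations (all -1 if n <= 0).
indices last_iteration(int n)
{
    indices last{-1, -1, -1};
    for (int i= 0, j= 0, p= 1; i < n; ++i, j+= 4, p*= 2)
        last= {i, j, p};
    return last;
}
```

All three modifications in the loop head are performed after each iteration, from left to right.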
1.4.4.3 Range-Based for-Loop
A very compact notation is provided by the C++11 feature called the range-based for-loop. We will tell you more about its background once we come to the iterator concept (§4.1.2).
For now, we will consider it as a concise form to iterate over all entries of an array or other containers:
int primes[]= {2, 3, 5, 7, 11, 13, 17, 19};
for (int i : primes)
    std::cout << i << " ";
This will print the primes from the array separated by spaces. In C++20 we can initialize primes in the range-based loop:
for (int primes[]= {2, 3, 5, 7, 11, 13, 17, 19}; int i : primes)
    std::cout << i << " ";
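The range-based loop works the same way for other containers, as this C++11 sketch with std::vector suggests. The function names sum_of and double_all are hypothetical. The second function declares the loop variable as a reference, which is not covered above but allows us to modify the entries.

```cpp
#include <vector>

int sum_of(const std::vector<int>& values)
{
    int total= 0;
    for (int v : values)        // v is a copy of each entry
        total+= v;
    return total;
}

void double_all(std::vector<int>& values)
{
    for (int& v : values)       // v refers to the entry itself
        v*= 2;
}
```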
1.4.4.4 Loop Control
There are two statements to deviate from the regular loop evaluation:
break
continue
A break terminates the loop entirely, and continue ends only the current iteration and continues the loop with the next iteration, for instance:
for (...; ...; ...) {
    ...
    if (dx == 0.0)
        continue;
    x+= dx;
    ...
    if (r < eps)
        break;
    ...
}
In the example above, we assumed that the remainder of the iteration is not needed when dx == 0.0. In some iterative computations, it might be clear in the middle of an iteration (here when r < eps) that all work is already done.
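Since the loop above is only a skeleton, here is a complete sketch of the same control flow. Everything concrete in it is an assumption made for illustration: the function name applied_steps, the vector of increments, and the use of the magnitude of dx as the residual r.

```cpp
#include <cstddef>
#include <vector>

// Counts how many nonzero increments are applied before the loop stops.
int applied_steps(const std::vector<double>& increments, double eps)
{
    int applied= 0;
    double x= 0.0;
    for (std::size_t i= 0; i < increments.size(); ++i) {
        double dx= increments[i];
        if (dx == 0.0)
            continue;                       // rest of this iteration is skipped
        x+= dx;
        ++applied;
        double r= dx < 0.0 ? -dx : dx;      // assumed residual: magnitude of dx
        if (r < eps)
            break;                          // all work is done, leave the loop
    }
    return applied;
}
```

Zero increments are skipped by continue without ending the loop, whereas the first increment smaller than eps ends it entirely.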
1.4.5 goto
All branches and loops are internally realized by jumps. C++ provides explicit jumps called goto. However:
The applicability of goto is more restrictive in C++ than in C (e.g., we cannot jump over initializations); it still has the power to ruin the structure of our program.
Writing software without goto is called Structured Programming. However, the term is rarely used nowadays since this is taken for granted in high-quality software.