1.4 Expressions and Statements
C++ distinguishes between expressions and statements. Very casually, we could say that every expression becomes a statement if a semicolon is appended. However, we would like to discuss this topic a bit more.
1.4.1 Expressions
Let us build this recursively from the bottom up. Any variable name (x, y, z, . . . ), constant, or literal is an expression. One or more expressions combined by an operator constitute an expression, e.g., x + y or x * y + z. In several languages, such as Pascal, the assignment is a statement. In C++, it is an expression, e.g., x= y + z. As a consequence, it can be used within another assignment: x2= x= y + z. Assignments are evaluated from right to left. Input and output operations such as
std::cout << "x is " << x << "\n"
are also expressions.
A function call with expressions as arguments is an expression, e.g., abs(x) or abs(x * y + z). Therefore, function calls can be nested: pow(abs(x), y). Note that nesting would not be possible if function calls were statements.
Since assignment is an expression, it can be used as an argument of a function: abs(x= y). Or I/O operations such as those above, e.g.:
print(std::cout << "x is " << x << "\n", "I am such a nerd!");
Needless to say, this is not particularly readable and would cause more confusion than benefit. An expression surrounded by parentheses is an expression as well, e.g., (x + y). As this grouping by parentheses takes precedence over all operators, we can change the order of evaluation to suit our needs: x * (y + z) computes the addition first.
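The following is a minimal sketch of the points above: an assignment is an expression whose value can feed another assignment or a function argument, and parentheses change the evaluation order. The function names chained_assignment and grouped are made up for this illustration.

```cpp
#include <cmath>

// Hypothetical helper: an assignment is an expression, so its value
// can be used by another assignment or passed to a function.
double chained_assignment(double y, double z)
{
    double x, x2;
    x2= x= y + z;                // right to left: x= y + z first, then x2= x
    double a= std::abs(x= -x2);  // assignment used as a function argument
    return x2 + a + x;           // x2 + |-x2| + (-x2)
}

// Hypothetical helper: parentheses change the order of evaluation.
double grouped(double x, double y, double z)
{
    return x * (y + z);          // addition first, then multiplication
}
```

Calling chained_assignment(1.0, 2.0) stores 3 in x and x2, then overwrites x with -3 inside the call to std::abs.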
1.4.2 Statements
Any of the expressions above followed by a semicolon is a statement, e.g.:
x= y + z;
y= f(x + z) * 3.5;
A statement like
y + z;
is allowed despite being useless (most likely). During program execution, the sum of y and z is computed and then thrown away. Recent compilers optimize out such useless computations. However, it is not guaranteed that this statement can always be omitted. If y or z is an object of a user type, then the addition is also user-defined and might change y or z or something else. This is obviously bad programming style (hidden side effect) but legitimate in C++.
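To illustrate why such a statement cannot always be dropped, here is a small sketch with a user type whose addition has a hidden side effect. The type counted, the global counter additions, and the function discard_sum are all invented for this example; it shows the bad style described above, not a recommended design.

```cpp
// Hypothetical user type with a user-defined addition.
struct counted
{
    int value;
};

int additions= 0;   // global counter changed by operator+ (hidden side effect)

counted operator+(const counted& a, const counted& b)
{
    ++additions;    // observable effect that must survive optimization
    return counted{a.value + b.value};
}

// Returns how often operator+ ran while executing the "useless" statement.
int discard_sum()
{
    counted y{1}, z{2};
    additions= 0;
    y + z;          // the sum is thrown away, but the counter still changes
    return additions;
}
```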
A single semicolon is an empty statement, and we can thus put as many semicolons after an expression as we want. Some statements do not end with a semicolon, e.g., function definitions. If a semicolon is appended to such a statement it is not an error but just an extra empty statement. Nonetheless some compilers print a warning in pedantic mode. Any sequence of statements surrounded by curly braces is a statement—called a Compound Statement.
The variable and constant declarations we have seen before are also statements. As the initial value of a variable or constant, we can use any expression; only an expression with a comma operator must be enclosed in parentheses. Other statements, to be discussed later, are function and class definitions, as well as control statements that we will introduce in the next section.
With the exception of the conditional operator, program flow is controlled by statements. Here we will distinguish between branches and loops.
1.4.3 Branching
In this section, we will present the different features that allow us to select a branch in the program execution.
1.4.3.1 if-Statement
This is the simplest form of control and its meaning is intuitively clear, for instance in
if (weight > 100.0)
    cout << "This is quite heavy.\n";
else
    cout << "I can carry this.\n";
Often, the else branch is not needed and can be omitted. Say we have some value in variable x and compute something on its magnitude:
if (x < 0.0)
    x= -x;
// Now we know that x >= 0.0 (post-condition)
The branches of the if-statement are scopes, rendering the following statements erroneous:
if (x < 0.0)
    int absx= -x;
else
    int absx= x;
cout << "|x| is " << absx << "\n"; // absx already out of scope
Above, we introduced two new variables, both named absx. They are not in conflict because they reside in different scopes. Neither of them exists after the if-statement, and accessing absx in the last line is an error. In fact, variables declared in a branch can only be used within this branch.
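One possible way to repair the snippet, sketched here under the assumption that x is a double, is to declare the variable before the if-statement so that it outlives both branches. The function name magnitude is hypothetical.

```cpp
// Declaring absx in the enclosing scope keeps it alive after the branches.
double magnitude(double x)
{
    double absx;        // lives in the function scope, not in a branch
    if (x < 0.0)
        absx= -x;
    else
        absx= x;
    return absx;        // fine: absx is still visible here
}
```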
Each branch of if consists of one single statement. To perform multiple operations, we can use braces as in Cardano’s method:
double D= q*q/4.0 + p*p*p/27.0;
if (D > 0.0) {
    double z1= ...;
    complex<double> z2= ..., z3= ...;
    ...
} else if (D == 0.0) {
    double z1= ..., z2= ..., z3= ...;
    ...
} else { // D < 0.0
    complex<double> z1= ..., z2= ..., z3= ...;
    ...
}
In the beginning, it is helpful to always write the braces. Many style guides also enforce curly braces on single statements whereas the author prefers them without braces. Irrespective of this, it is highly advisable to indent the branches for better readability.
if-statements can be nested; each else is then associated with the last open if. If you are interested in examples, have a look at Section A.2.3. Finally, we give you the following:
1.4.3.2 Conditional Expression
Although this section describes statements, we like to talk about the conditional expression here because of its proximity to the if-statement. The result of
condition ? result_for_true : result_for_false
is the second sub-expression (i.e., result_for_true) when condition evaluates to true and result_for_false otherwise. For instance,
min= x <= y ? x : y;
corresponds to the following if-statement:
if (x <= y)
    min= x;
else
    min= y;
For a beginner, the second version might be more readable while experienced programmers often prefer the first form for its brevity.
?: is an expression and can therefore be used to initialize variables:
int x= f(a),
    y= x < 0 ? -x : 2 * x;
Calling functions with several selected arguments is easy with the operator:
f(a, (x < 0 ? b : c), (y < 0 ? d : e));
but quite clumsy with an if-statement. If you do not believe us, try it.
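For readers who want to try it, the sketch below contrasts both variants. The three-argument function f and the constant arguments 1 to 4 (standing in for b, c, d, and e) are placeholders chosen for this illustration only.

```cpp
// Hypothetical function standing in for f from the text.
int f(int a, int b, int c)
{
    return 100 * a + 10 * b + c;
}

// Conditional operator: one call, each argument is selected in place.
int call_with_conditional(int a, int x, int y)
{
    return f(a, (x < 0 ? 1 : 2), (y < 0 ? 3 : 4));
}

// if-statements: every combination needs its own call.
int call_with_if(int a, int x, int y)
{
    if (x < 0) {
        if (y < 0)
            return f(a, 1, 3);
        else
            return f(a, 1, 4);
    } else {
        if (y < 0)
            return f(a, 2, 3);
        else
            return f(a, 2, 4);
    }
}
```

With two selected arguments the if version already needs four calls; each further selected argument would double that number.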
In most cases it is not important whether an if or a conditional expression is used. So use what feels most convenient to you.
Anecdote: An example where the choice between if and ?: makes a difference is the replace_copy operation in the Standard Template Library (STL), §4.1. It used to be implemented with the conditional operator whereas if would be more general. This “bug” remained undiscovered for approximately 10 years and was only detected by an automatic analysis in Jeremy Siek’s Ph.D. thesis [38].
1.4.3.3 switch Statement
A switch is like a special kind of if. It provides a concise notation when different computations for different cases of an integral value are performed:
switch(op_code) {
  case 0: z= x + y; break;
  case 1: z= x - y; cout << "compute diff\n"; break;
  case 2:
  case 3: z= x * y; break;
  default: z= x / y;
}
A somewhat surprising behavior is that the code of the following cases is also performed unless we terminate it with break. Thus, the same operations are performed in our example for cases 2 and 3. An advanced use of switch is found in Appendix A.2.4.
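The fall-through behavior can be observed with a small sketch. The function compute wraps the switch from the text (without the output), and count_cases_run is an invented helper that counts how many case labels are executed when no break is present.

```cpp
// Hypothetical wrapper around the switch from the text.
double compute(int op_code, double x, double y)
{
    double z= 0.0;
    switch (op_code) {
      case 0: z= x + y; break;
      case 1: z= x - y; break;
      case 2:                      // no break: shares the code of case 3
      case 3: z= x * y; break;
      default: z= x / y;
    }
    return z;
}

// Without break, execution continues into the following cases.
int count_cases_run(int start)
{
    int n= 0;
    switch (start) {
      case 0: ++n;   // falls through
      case 1: ++n;   // falls through
      case 2: ++n;
    }
    return n;
}
```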
1.4.4 Loops
1.4.4.1 while- and do-while-Loops
As the name suggests, a while-loop is repeated as long as a certain condition holds. Let us implement as an example the Collatz series that is defined by
Algorithm 1–1: Collatz series
Input: x_0
while x_i ≠ 1 do
    x_i = 3 x_{i−1} + 1 if x_{i−1} is odd; x_i = x_{i−1} / 2 if x_{i−1} is even
As long as we do not worry about overflow, this is easily implemented with a while-loop:
int x= 19;
while (x != 1) {
    cout << x << '\n';
    if (x % 2 == 1) // odd
        x= 3 * x + 1;
    else            // even
        x= x / 2;
}
Like the if-statement, the loop can be written without braces when there is only one statement.
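As a variation, the same loop can be wrapped in a function that counts the steps instead of printing them. This is only a sketch: the name collatz_steps is invented, the parameter type long is an assumption to postpone overflow, and the input is assumed to be at least 1.

```cpp
// Counts the steps of the Collatz series until it reaches 1.
// Assumes x >= 1 and, like the loop in the text, ignores overflow.
int collatz_steps(long x)
{
    int steps= 0;
    while (x != 1) {
        if (x % 2 == 1)     // odd
            x= 3 * x + 1;
        else                // even
            x= x / 2;
        ++steps;
    }
    return steps;
}
```

For the start value 19 used above, the series needs 20 steps to reach 1.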
C++ also offers a do-while-loop. In this case, the condition for continuation is tested at the end:
double eps= 0.001;
do {
    cout << "eps= " << eps << '\n';
    eps/= 2.0;
} while (eps > 0.0001);
The loop is performed at least one time—even with an extremely small value for eps in our example.
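The difference to the while-loop shows up when the condition is false from the start. The two functions below are hypothetical helpers that count the executions of the loop body for a given start value of eps.

```cpp
// Body count of the do-while-loop: the condition is tested at the end.
int halvings_do_while(double eps)
{
    int n= 0;
    do {
        eps/= 2.0;
        ++n;
    } while (eps > 0.0001);
    return n;
}

// Body count of the corresponding while-loop: the condition is tested first.
int halvings_while(double eps)
{
    int n= 0;
    while (eps > 0.0001) {
        eps/= 2.0;
        ++n;
    }
    return n;
}
```

For eps= 0.001 both loops run four times; for a tiny value like 1e-9 the while-loop does not run at all, whereas the do-while-loop still runs once.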
1.4.4.2 for-Loop
The most common loop in C++ is the for-loop. As a simple example, we add two vectors and print the result afterward:
double v[3], w[]= {2., 4., 6.}, x[]= {6., 5., 4.};
for (int i= 0; i < 3; ++i)
    v[i]= w[i] + x[i];
for (int i= 0; i < 3; ++i)
    cout << "v[" << i << "]= " << v[i] << '\n';
The loop head consists of three components:
- The initialization;
- A continuation criterion; and
- A step operation.
The example above is a typical for-loop. In the initialization, we typically declare a new variable and initialize it with 0—this is the start index of most indexed data structures. The condition usually tests whether the loop index is smaller than a certain size and the last operation typically increments the loop index. In the example, we pre-incremented the loop variable i. For intrinsic types like int, it does not matter whether we write ++i or i++. However, it does for user types where the post-increment causes an unnecessary copy; cf. §3.3.2.5. To be consistent in this book, we always use a pre-increment for loop indices.
It is a very popular beginners’ mistake to write conditions like i <= size(..). Since indices are zero-based in C++, the index i == size(..) is already out of range. People with experience in Fortran or MATLAB need some time to get used to zero-based indexing. One-based indexing seems more natural to many and is also used in mathematical literature. However, calculations on indices and addresses are almost always simpler with zero-based indexing.
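A short sketch of the correct loop bound follows. The function sum and its pointer-plus-size interface are assumptions made for this example; the point is merely that the valid indices of n entries are 0 to n-1.

```cpp
#include <cstddef>

// Sums the first n entries of an array; valid indices are 0, ..., n-1.
double sum(const double* v, std::size_t n)
{
    double s= 0.0;
    for (std::size_t i= 0; i < n; ++i)   // i <= n would also read v[n]: out of range
        s+= v[i];
    return s;
}
```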
As another example, we would like to compute the Taylor series of the exponential function:
$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$
up to the tenth term:
double x= 2.0, xn= 1.0, exp_x= 1.0;
unsigned long fac= 1;
for (unsigned long i= 1; i <= 10; ++i) {
    xn*= x;
    fac*= i;
    exp_x+= xn / fac;
    cout << "e^x is " << exp_x << '\n';
}
Here it was simpler to compute term 0 separately and start the loop with term 1. We also used less-equal to ensure that the term x^10/10! is considered.
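To check the approximation, the loop can be turned into a function and compared with std::exp. This is a sketch with invented names (exp_taylor, taylor_error); unlike the snippet above, it stores the factorial as a double, an assumption made here so that larger term counts do not overflow an integer.

```cpp
#include <cmath>

// Approximates e^x by the Taylor series up to the term x^n / n!.
double exp_taylor(double x, unsigned n)
{
    double xn= 1.0, fac= 1.0, exp_x= 1.0;   // term 0 is handled separately
    for (unsigned i= 1; i <= n; ++i) {
        xn*= x;
        fac*= i;
        exp_x+= xn / fac;
    }
    return exp_x;
}

// Absolute deviation from the library function std::exp.
double taylor_error(double x, unsigned n)
{
    return std::abs(exp_taylor(x, n) - std::exp(x));
}
```

For x= 2.0 and ten terms, the deviation from std::exp is below 1e-4 and shrinks further when more terms are added.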
The for-loop in C++ is very flexible. The initialization part can be any expression, a variable declaration, or empty. It is possible to introduce multiple new variables of the same type. This can be used to avoid repeating the same operation in the condition, e.g.:
for (int i= xyz.begin(), end= xyz.end(); i < end; ++i) ...
Variables declared in the initialization are only visible within the loop and hide variables of the same names from outside the loop.
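The hiding can be made visible with a tiny sketch; the function hiding_demo is hypothetical and exists only to show that the outer variable keeps its value.

```cpp
// The loop variable i hides the outer i only inside the loop.
int hiding_demo()
{
    int i= 42, sum= 0;
    for (int i= 0; i < 3; ++i)   // a new i, visible only within the loop
        sum+= i;                 // adds 0 + 1 + 2
    return i + sum;              // the outer i is untouched: 42 + 3
}
```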
The condition can be any expression that is convertible to a bool. An empty condition is always true and the loop is repeated infinitely. It can still be terminated inside the body as we will discuss in the next section. We already mentioned that a loop index is typically incremented in the third sub-expression of for. In principle, we can modify it within the loop body as well. However, programs are much clearer if it is done in the loop head. On the other hand, there is no limitation that only one variable is increased by 1. We can modify as many variables as wanted using the comma operator (§1.3.5) and by any modification desired such as
for (int i= 0, j= 0, p= 1; ...; ++i, j+= 4, p*= 2) ...
This is of course more complex than having just one loop index but still more readable than declaring/modifying indices before the loop or inside the loop body.
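A complete loop of this kind might look as follows. The function weighted_sum and the quantity it accumulates are invented for illustration, and n is assumed to be small enough that p does not overflow an int.

```cpp
// Three loop variables advance in lockstep:
// i counts the iterations, j grows by 4, and p doubles.
long weighted_sum(int n)
{
    long sum= 0;
    for (int i= 0, j= 0, p= 1; i < n; ++i, j+= 4, p*= 2)
        sum+= j * p;      // j == 4 * i and p == 2^i in every iteration
    return sum;
}
```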
1.4.4.3 Range-Based for-Loop
A very compact notation is provided by the new feature called Range-Based for-Loop. We will tell you more about its background once we come to the iterator concept (§4.1.2).
For now, we will consider it as a concise form to iterate over all entries of an array or other containers:
int primes[]= {2, 3, 5, 7, 11, 13, 17, 19};
for (int i : primes)
    std::cout << i << " ";
This will print out the primes from the array separated by spaces.
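The same notation works for other containers as well. The sketch below assumes a std::vector<int> as the container and uses a made-up function name, sum_all.

```cpp
#include <vector>

// Adds up all entries of a container with a range-based for-loop.
int sum_all(const std::vector<int>& v)
{
    int s= 0;
    for (int i : v)     // i takes the value of each entry in turn
        s+= i;
    return s;
}
```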
1.4.4.4 Loop Control
There are two statements to deviate from the regular loop evaluation:
- break and
- continue.
A break terminates the loop entirely, and continue ends only the current iteration and continues the loop with the next iteration, for instance:
for (...; ...; ...) {
    ...
    if (dx == 0.0) continue;
    x+= dx;
    ...
    if (r < eps) break;
    ...
}
In the example above we assumed that the remainder of the iteration is not needed when dx == 0.0. In some iterative computations it might be clear in the middle of an iteration (here when r < eps) that work is already done.
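A self-contained sketch of both statements is given below. The function sum_until_zero and its rules (skip negative entries, stop at the first zero) are invented for this example and are not related to the iterative computation above.

```cpp
#include <vector>

// Sums the entries of v, skipping negative ones and stopping at the first zero.
int sum_until_zero(const std::vector<int>& v)
{
    int s= 0;
    for (int x : v) {
        if (x < 0) continue;   // ends only the current iteration
        if (x == 0) break;     // terminates the loop entirely
        s+= x;
    }
    return s;
}
```

For the entries 1, -2, 3, 0, 5 the result is 4: the -2 is skipped and the 5 is never reached.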
1.4.5 goto
All branches and loops are internally realized by jumps. C++ provides explicit jumps called goto. However:
The applicability of goto is more restrictive in C++ than in C (e.g., we cannot jump over initializations); it still has the power to ruin the structure of our program.
Writing software without goto is called Structured Programming. However, the term is rarely used nowadays as it is taken for granted in high-quality software.