Exception Handling and Fault Tolerance in C++: Defect Removal vs. Defect Survival
We would all like the software we develop to behave properly under both normal and abnormal conditions. In the best-case scenario, the software would perform properly even in adverse or hostile environments. If the software cannot perform all of its duties at an optimal level, then, at the very least, we want it to perform its core duties at some acceptable level. What’s needed is some way to make our software resilient to failure; in other words, insensitive to failures in hardware, software, or human operation.
Basic Terminology
In this article, our primary focus is on how the C++ exception handling mechanisms can be used to help achieve fault tolerance, and on where exception handlers should fit in the journey toward robust and reliable software. To get started, we need to establish a few ground rules. Because some key terms are commonly used in different ways, Table 1 gives simple definitions of these terms as they are used in this article.
Table 1 Basic definitions of key terms used in this article.
Term | Definition |
Defect | A flaw in any aspect of software or requirements that may contribute to the occurrence of one or more failures. |
Error | An inappropriate decision by a software engineer or programmer that leads to a defect in the software. |
Exception handling | A mechanism for managing exceptions (unanticipated conditions during program execution) that changes the normal flow of program execution. |
Failure | An unacceptable departure from the operation of a software element that occurs as a consequence of a fault. |
Fault | A defect in the software due to human error that causes failure when executed under particular conditions. |
Fault tolerance | A property that allows a program to survive and recover from the software failures caused by faults (defects) introduced into the software as a result of human error. |
Reliability | The ability of the software to perform a required function under specified conditions for a stated period of time. |
Robustness | The ability of the software to function under abnormal conditions. |
The extent to which software is able to minimize the effects of failure is a measure of its fault tolerance. Producing fault-tolerant software is one of the primary goals of any software engineering effort. However, the distinction between fault-tolerant software and well-tested software is often misunderstood or blurred. Sometimes the responsibilities and activities of software verification, software validation, and exception handling are erroneously interchanged. To work toward our goal of using the C++ exception handling mechanism to help us achieve fault-tolerant software, we must first be clear about where exception handling fits in the scheme of things.