4.2 In-Process Quality Metrics
Because our goal is to understand the programming process and to learn to engineer quality into the process, in-process quality metrics play an important role. In-process quality metrics are less formally defined than end-product metrics, and their practices vary greatly among software developers. On the one hand, in-process quality metrics simply means tracking defect arrival during formal machine testing for some organizations. On the other hand, some software organizations with well-established software metrics programs cover various parameters in each phase of the development cycle. In this section we briefly discuss several metrics that are basic to sound in-process quality management. In later chapters on modeling we will examine some of them in greater detail and discuss others within the context of models.
4.2.1 Defect Density During Machine Testing
Defect rate during formal machine testing (testing after code is integrated into the system library) is usually positively correlated with the defect rate in the field. Higher defect rates found during testing is an indicator that the software has experienced higher error injection during its development process, unless the higher testing defect rate is due to an extraordinary testing effortfor example, additional testing or a new testing approach that was deemed more effective in detecting defects. The rationale for the positive correlation is simple: Software defect density never follows the uniform distribution. If a piece of code or a product has higher testing defects, it is a result of more effective testing or it is because of higher latent defects in the code. Myers (1979) discusses a counterintuitive principle that the more defects found during testing, the more defects will be found later. That principle is another expression of the positive correlation between defect rates during testing and in the field or between defect rates between phases of testing.
This simple metric of defects per KLOC or function point, therefore, is a good indicator of quality while the software is still being tested. It is especially useful to monitor subsequent releases of a product in the same development organization. Therefore, release-to-release comparisons are not contaminated by extraneous factors. The development team or the project manager can use the following scenarios to judge the release quality:
If the defect rate during testing is the same or lower than that of the previous release (or a similar product), then ask: Does the testing for the current release deteriorate?
If the answer is no, the quality perspective is positive.
If the answer is yes, you need to do extra testing (e.g., add test cases to increase coverage, blitz test, customer testing, stress testing, etc.).
If the defect rate during testing is substantially higher than that of the previous release (or a similar product), then ask: Did we plan for and actually improve testing effectiveness?
If the answer is no, the quality perspective is negative. Ironically, the only remedial approach that can be taken at this stage of the life cycle is to do more testing, which will yield even higher defect rates.
If the answer is yes, then the quality perspective is the same or positive.
4.2.2 Defect Arrival Pattern During Machine Testing
Overall defect density during testing is a summary indicator. The pattern of defect arrivals (or for that matter, times between failures) gives more information. Even with the same overall defect rate during testing, different patterns of defect arrivals indicate different quality levels in the field. Figure 4.2 shows two contrasting patterns for both the defect arrival rate and the cumulative defect rate. Data were plotted from 44 weeks before code-freeze until the week prior to code-freeze. The second pattern, represented by the charts on the right side, obviously indicates that testing started late, the test suite was not sufficient, and that the testing ended prematurely.
FIGURE 4.2 Two Contrasting Defect Arrival Patterns During Testing
The objective is always to look for defect arrivals that stabilize at a very low level, or times between failures that are far apart, before ending the testing effort and releasing the software to the field. Such declining patterns of defect arrival during testing are indeed the basic assumption of many software reliability models. The time unit for observing the arrival pattern is usually weeks and occasionally months. For reliability models that require execution time data, the time interval is in units of CPU time.
When we talk about the defect arrival pattern during testing, there are actually three slightly different metrics, which should be looked at simultaneously:
The defect arrivals (defects reported) during the testing phase by time interval (e.g., week). These are the raw number of arrivals, not all of which are valid defects.
The pattern of valid defect arrivalswhen problem determination is done on the reported problems. This is the true defect pattern.
The pattern of defect backlog overtime. This metric is needed because development organizations cannot investigate and fix all reported problems immediately. This metric is a workload statement as well as a quality statement. If the defect backlog is large at the end of the development cycle and a lot of fixes have yet to be integrated into the system, the stability of the system (hence its quality) will be affected. Retesting (regression test) is needed to ensure that targeted product quality levels are reached.
4.2.3 Phase-Based Defect Removal Pattern
The phase-based defect removal pattern is an extension of the test defect density metric. In addition to testing, it requires the tracking of defects at all phases of the development cycle, including the design reviews, code inspections, and formal verifications before testing. Because a large percentage of programming defects is related to design problems, conducting formal reviews or functional verifications to enhance the defect removal capability of the process at the front end reduces error injection. The pattern of phase-based defect removal reflects the overall defect removal ability of the development process.
With regard to the metrics for the design and coding phases, in addition to defect rates, many development organizations use metrics such as inspection coverage and inspection effort for in-process quality management. Some companies even set up "model values" and "control boundaries" for various in-process quality indicators. For example, Cusumano (1992) reports the specific model values and control boundaries for metrics such as review coverage rate, review manpower rate (review work hours/number of design work hours), defect rate, and so forth, which were used by NEC's Switching Systems Division.
Figure 4.3 shows the patterns of defect removal of two development projects: project A was front-end loaded and project B was heavily testing-dependent for removing defects. In the figure, the various phases of defect removal are high-level design review (I0), low-level design review (I1), code inspection (I2), unit test (UT), component test (CT), and system test (ST). As expected, the field quality of project A outperformed project B significantly.
FIGURE 4.3 Defect Removal by Phase for Two Products
4.2.4 Defect Removal Effectiveness
Defect removal effectiveness (or efficiency, as used by some writers) can be defined as follows:
Because the total number of latent defects in the product at any given phase is not known, the denominator of the metric can only be approximated. It is usually estimated by:
Defects removed during the phase + defects found later
The metric can be calculated for the entire development process, for the front end (before code integration), and for each phase. It is called early defect removal and phase effectiveness when used for the front end and for specific phases, respectively. The higher the value of the metric, the more effective the development process and the fewer the defects escape to the next phase or to the field. This metric is a key concept of the defect removal model for software development. (In Chapter 6 we give this subject a detailed treatment.) Figure 4.4 shows the DRE by phase for a real software project. The weakest phases were unit test (UT), code inspections (I2), and component test (CT). Based on this metric, action plans to improve the effectiveness of these phases were established and deployed.
FIGURE 4.4 Phase Effectiveness of a Software Project