- In-Process Metrics for Software Testing
- In-Process Metrics and Quality Management
- Possible Metrics for Acceptance Testing to Evaluate Vendor-Developed Software
- How Do You Know Your Product Is Good Enough to Ship?
- Summary
- References
10.3 Possible Metrics for Acceptance Testing to Evaluate Vendor-Developed Software
Due to business considerations, a growing number of organizations rely on external vendors to develop software for their needs. These organizations typically conduct an acceptance test to validate the software. In-process metrics and detailed information for assessing the quality of the vendor's software are generally not available to the contracting organization. Therefore, useful indicators and metrics related to acceptance testing are important for assessing the software. Such metrics differ from the calendar-time-based metrics discussed in previous sections because acceptance testing is normally short, and there may be multiple code drops and, therefore, multiple mini acceptance tests in the validation process.
The IBM 2000 Sydney Olympics project was one such project, in which IBM evaluated vendor-delivered code to ensure that all elements of a highly complex system could be integrated successfully (Bassin, Biyani, and Santhanam, 2002). The summer 2000 Olympic Games was considered the largest sporting event in the world. For example, there were 300 medal events, 28 different sports, 39 competition venues, 30 accreditation venues, 260,000 INFO users, 2,000 INFO terminals, 10,000 news records, 35,000 biographical records, and 1.5 million historical records. There were 6.4 million INFO requests per day on average, and peak Internet hits reached 874.5 million per day.
For the Venue Results components of the project, Bassin, Biyani, and Santhanam developed and successfully applied a set of metrics for IBM's testing of the vendor software. The metrics were defined based on test case data and test case execution data; that is, when a test case was attempted for a given incremental code delivery, an execution record was created. Entries for a test case execution record included the date and time of the attempt, the execution status, the test phase, pointers to any defects found during execution, and other ancillary information. There were five categories of test execution status: pass, completed with errors, fail, not implemented, and blocked. A status of "fail" or "completed with errors" would result in the generation of a defect record. A status of "not implemented" indicated that the test case did not succeed because the targeted function had not yet been implemented, a consequence of the incremental code-delivery environment. The "blocked" status was used when the test case did not succeed because access to the targeted area was blocked by code that was not functioning correctly. Defect records were not created for these latter two statuses. The key metrics derived and used include the following:
Metrics related to test cases
Percentage of test cases attempted: used as an indicator of progress relative to the completeness of the planned test effort
Number of defects per executed test case: used as an indicator of code quality as the code progressed through the series of test activities
Number of failing test cases without defect records: used as an indicator of the completeness of the defect recording process
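As a concrete sketch, the three test-case metrics above can be computed from execution records of the kind described in the text. The record shape (a dict with `case`, `status`, and `defects` fields) and the sample data are illustrative assumptions, not taken from Bassin, Biyani, and Santhanam:

```python
# Hypothetical minimal execution records: one entry per test case attempt.
# "fail" and "completed with errors" should carry pointers to defect records.
executions = [
    {"case": "TC-1", "status": "pass",                  "defects": []},
    {"case": "TC-2", "status": "fail",                  "defects": ["D-17"]},
    {"case": "TC-3", "status": "completed with errors", "defects": []},  # defect record missing
    {"case": "TC-4", "status": "not implemented",       "defects": []},
]
planned = ["TC-1", "TC-2", "TC-3", "TC-4", "TC-5"]

FAILING = {"fail", "completed with errors"}

def pct_attempted(planned, executions):
    """Progress indicator: share of the planned test effort attempted so far."""
    attempted = {e["case"] for e in executions}
    return 100.0 * len(attempted & set(planned)) / len(planned)

def defects_per_executed_case(executions):
    """Code-quality indicator: defects found per executed test case."""
    executed = {e["case"] for e in executions}
    return sum(len(e["defects"]) for e in executions) / len(executed)

def failures_missing_defect_records(executions):
    """Completeness check on the defect-recording process: failing or
    error-completing executions with no defect record attached."""
    return [e["case"] for e in executions
            if e["status"] in FAILING and not e["defects"]]

print(pct_attempted(planned, executions))           # 80.0
print(defects_per_executed_case(executions))        # 0.25
print(failures_missing_defect_records(executions))  # ['TC-3']
```

The third function illustrates why the metric matters: a failing execution with no attached defect record is either a recording lapse or a defect that escaped the tracking system.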
Metrics related to test execution records
Success rate: The percentage of test cases that passed at the last execution was an important indicator of code quality and stability.
Persistent failure rate: The percentage of test cases that consistently failed or completed with errors was an indicator of code quality. It also enabled the identification of areas that represented obstacles to progress through test activities.
Defect injection rate: The authors used the percentage of test cases whose status went from pass to fail or error, fail to error, or error to fail as an indicator of the degree to which inadequate or incorrect code changes were being made. Again, the project involved multiple code drops from the vendor. When the status of a test case changed from one code drop to another, it indicated that a code change had been made.
Code completeness: The percentage of test executions that remained "not implemented" or "blocked" throughout the execution history was used as an indicator of the completeness of the coding of component design elements.
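The four execution-record metrics can likewise be sketched against a per-test-case status history across successive code drops. The `history` structure, the sample data, and the function names are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical execution history: for each test case, its statuses in
# chronological order across successive code drops.
history = {
    "TC-1": ["fail", "pass", "pass"],
    "TC-2": ["completed with errors", "completed with errors", "completed with errors"],
    "TC-3": ["pass", "fail", "pass"],
    "TC-4": ["not implemented", "blocked", "not implemented"],
}

FAILING = {"fail", "completed with errors"}
UNREACHED = {"not implemented", "blocked"}

def success_rate(history):
    """Percentage of test cases whose most recent execution passed."""
    return 100.0 * sum(runs[-1] == "pass" for runs in history.values()) / len(history)

def persistent_failure_rate(history):
    """Percentage of test cases that failed or completed with errors on every run."""
    persistent = sum(all(s in FAILING for s in runs) for runs in history.values())
    return 100.0 * persistent / len(history)

def defect_injection_rate(history):
    """Percentage of test cases with a pass-to-fail/error transition, or a
    fail/error flip, suggesting an inadequate or incorrect code change."""
    def injected(runs):
        for prev, curr in zip(runs, runs[1:]):
            if prev == "pass" and curr in FAILING:
                return True
            if prev in FAILING and curr in FAILING and prev != curr:
                return True
        return False
    return 100.0 * sum(injected(runs) for runs in history.values()) / len(history)

def code_completeness_gap(history):
    """Percentage of test cases stuck at 'not implemented' or 'blocked'
    throughout the history, i.e., targeted function never reachable."""
    stuck = sum(all(s in UNREACHED for s in runs) for runs in history.values())
    return 100.0 * stuck / len(history)

print(success_rate(history))             # 50.0  (TC-1 and TC-3 end in pass)
print(persistent_failure_rate(history))  # 25.0  (TC-2)
print(defect_injection_rate(history))    # 25.0  (TC-3: pass -> fail)
print(code_completeness_gap(history))    # 25.0  (TC-4)
```

Note that the injection metric depends only on status transitions between drops, which is what makes it computable without visibility into the vendor's change history.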
With these metrics and a set of in-depth defect analyses based on orthogonal defect classification, Bassin and associates were able to provide value-added reports, evaluations, and assessments to the project team.
These metrics merit serious consideration for software projects in similar environments. The authors contend that the underlying concepts are useful not only for vendor-delivered software, but also for projects with the following characteristics:
Testers and developers are managed by different organizations.
The tester population changes significantly, for skill or business reasons.
The development of code is iterative.
The same test cases are executed in multiple test activities.
It should be noted that these test case execution metrics require tracking at a very granular level. By definition, the unit of analysis is the execution of each individual test case. They also require the data to be thorough and complete. Inaccurate or incomplete data will have a much larger impact on the reliability of these metrics than on metrics based on higher-level units of analysis. Planning the implementation of these metrics must therefore address the issues related to the test and defect tracking system as part of the development process and project management system. Among the most important issues are cost and behavioral compliance with regard to recording accurate data. Finally, these metrics measure the outcome of test executions. When using them to assess the quality of the product to be shipped, the effectiveness of the test plan should be known or assessed a priori, and the framework of the effort/outcome model should be applied.