Problem 3: Inaccurate Perceptions
The final problem I noticed was a perception problem. Since we had scripted test cases, progress was measured by the number of test cases executed. I don’t want to suggest that this type of information is not valuable. It is. But it’s not the only information that’s valuable, and on this project it was being treated as though it were.
I think the fact that the number of scripts executed was the driving metric added to the urgency many testers felt to pass test cases (see problem 1 above) and move on to the next one as quickly as possible. That kind of metric builds the mentality, "We can’t spend time looking for bugs if we’re measured by how many test cases we actually execute."
In his article "Inside the Software Testing Quagmire," Paul Garbaczeski illustrates the perception problem beautifully. Paul asks an important question and follows it with analysis that I couldn’t disagree with more:
Are test cases comprehensive and repeatable; are they executed in a controlled environment?

You’re really asking: Is testing ad hoc or disciplined?
You’re trying to determine: If testing is effective.
Interpreting the response: There should be a set of repeatable test cases and a controlled test environment where the state of the software being tested and the test data are always known. Absent these, it will be difficult to discern true software defects from false alarms caused by flawed test practices.
A related symptom to check: If temporary testers are conscripted from other parts of the organization to "hammer" the software without using formal test cases, it means the organization is reacting to poor testing by adding resources to collapse the test time, rather than addressing the problem’s root causes.
This example illustrates many of the perception problems surrounding scripted testing. Many people believe that the scripts developed for an application can capture all the aspects of the application worth testing. But test scripts traditionally capture only the functionality specified in a requirements document, and there are many other aspects of an application we might want to test. Just check out the Product Elements list in the Satisfice Heuristic Test Strategy Model. How many of those elements do you create scripts for? If not all of them, are your scripts complete? Kaner and Bach talk more about the measurement problem and the impossibility of complete testing in their Black Box Software Testing course.
Paul Garbaczeski’s question asks for repeatability. Many people believe that if two people follow the same script, they’ll achieve the same result. However, different people following the same script sometimes get different results. As illustrated in problem 2 above, I found six defects using a test script with which one of the other testers (a very good tester, I might add) found none. James Bach has some great posts on repeatability that I use to guide my decision of when to repeat a test. Both are worth checking out:
The question also asks whether testing is ad hoc or disciplined. Many people believe that a tester who doesn’t document all test cases (completely?) is not a "disciplined" tester. I would offer another explanation. When I elect not to document a test case, it’s because I’m fighting bad test documentation, not because I’m undisciplined. I’m disciplined with my testing whether or not I document my test cases. For me, disciplined testing is not a matter of recording steps and results; that’s recordkeeping. Discipline in testing comes from remaining focused on delivering value to your project stakeholders. An ad hoc test is one run without a clear understanding of what value it delivers to the project team.
Garbaczeski’s conclusion that testers need formal test cases in order to be effective misses the point of testing entirely. It assumes that the person executing a script knows what was in the mind of the test designer when the test case was written. I’ve found that one of the most effective things you can do for a stale test group is to bring in a tester from another part of the organization, let her test in her own way (not using your scripts), see what she finds, and figure out why you didn’t find it. I would argue that conscripting temporary resources (testers, developers, or others) from other parts of the organization to collapse the test time is a risk regardless of what documents are available to those people when they arrive on the project. That approach can work, but its success most likely won’t depend on whether we documented all our test cases up front.