Exploratory Testing on Agile Teams
An Elusive Bug
It was near the end of the day, and I was working as a tester on an agile team on a high-profile web application. Two teams using Extreme Programming and Scrum were working on separate but interdependent web applications. While I was the lead tester on one of the teams, I was also providing testing guidance to the other team, and it was common for people from the other team—particularly the developers—to come and ask me questions. Our team's day had been productive, and I was hoping to finish up the afternoon with some quiet time at my desk to catch up on phone messages, email, and maybe some more test automation work before I went home.
Just as I settled in, one of the developers from the other agile team walked up to my desk. I could tell immediately that something was wrong: Jason was normally an upbeat, talented developer with a lot of experience with Extreme Programming and test-driven development and a wonderful, attractive energy. But as he approached, his shoulders were slumped, he was nervously stroking his chin, and he looked worried. He practically whispered to me, "Jonathan, I need your help." I dragged a chair to my workstation so we could pair up to solve the problem.
Jason explained that his team had a sporadic bug that had become a high-profile problem. Because the developers couldn't repeat it, they didn't think they could afford to spend time on it, and it was put on the back burner. Somewhere along the line, the bug had slipped through the cracks—until it cropped up in a demo of the software to a senior manager, putting the team's credibility at stake.
It turned out that I had originally found the bug while developing an automated test to generate transactions on our system, which was dependent on the system the other team was working on. The area of the system where the bug was occurring wasn't part of our application, but our application relied on this service, so I had filed a bug report and passed it on to their team. I hadn't heard anything about it in a while, and thought they were on top of it. Jason asked what I thought he should do. I told him that the bug wasn't sporadic for me, and I pulled up the automated test I had developed, ran the script, and demonstrated the failure. He broke out into a grin, thanked me, and rushed back to his computer with the stack trace.
"That was easy," I thought with relief.
Two minutes later, Jason came back to my desk, stack trace in hand. "I can't repeat it," he said. I started the test, and we watched it play back on my monitor. When it hit the problem area in the application, we got the same error with a stack trace. He ran back to his computer, and pulled in a pair partner to help. They tried again, several times, but without success. Since we were together in an open work environment, I pulled up a chair and peered over their shoulders as they struggled to reproduce the bug, realizing that it wasn't as straightforward as it appeared.
Back at my workstation, I ran my automated test again and the application crashed. Then I tried it manually several times, and realized I couldn't repeat it reliably. What had changed since I had written the test? The automated test was using test data and running through step by step, doing the same thing over and over. The error occurred in the fifth step of an involved workflow that took a tester several minutes to reach in the application, so there were several potential distractions and points of variance when running it by hand. To simplify, I created a new test by modifying the automated test so that it ran until it reached the area of the program where I was having problems, at which point I could take over by hand.
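The partially automated approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used on the project: the step names, the `run_until` helper, and the dictionary standing in for application state are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for application state; a real script would drive
# the web application through a browser or HTTP client instead.
State = Dict[str, str]
Step = Callable[[State], None]


def make_step(name: str, value: str) -> Step:
    """Build a step that records the data it 'entered' on a screen."""
    def step(state: State) -> None:
        state[name] = value
    return step


def run_until(steps: List[Step], stop_before: int, state: State) -> int:
    """Run every step before `stop_before` and return how many were run.

    The tester takes over by hand at step number `stop_before` (1-based).
    """
    completed = 0
    for step in steps[: stop_before - 1]:
        step(state)
        completed += 1
    return completed


# Assumed five-step workflow; the fifth step is where the failure appeared.
workflow = [make_step(f"step_{i}", f"fixed-data-{i}") for i in range(1, 6)]

state: State = {}
done = run_until(workflow, stop_before=5, state=state)
print(f"Automated {done} steps; continue manually from step {done + 1}.")
```

Because the first four steps always run with the same data, any variation in the outcome can be attributed to what the tester does by hand on the fifth screen.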
Jason came back and asked how things were going. I showed him the different results achieved when running the fully automated versus manual tests. He asked for the automated test that caused the problem, and went back and ran it with his pair partner. A few minutes later, he shouted over, "Got it!" They could also reproduce it every time with the automated test: "We're going to put some tracing on this to see where it's causing the problem while you work on getting a repeatable case manually." He hoped that even without a manual test case, they could use the automated test to find the source of the problem.
At this point, I was frustrated. Within minutes, my testing efforts had moved from a seemingly repeatable bug to a sporadic bug. It was time to challenge a few assumptions. The problem occurred after the fifth step in the workflow, which contained a screen that took several pieces of test data as inputs. Using my partially automated test, I started designing and implementing tests with different data inputs. I found that if I changed one particular input to a different type of data in the automated test, I stopped getting the failure. Having an inkling of what might be causing this problem, I returned to Jason, who was now working alone. He was running the test, looking at tracing information and growing more and more frustrated. "Something's wrong. I'm getting an error in an area of the application that isn't related to our code at all." I shared my hunch on the test data, so he tried it out using the test data I described, but didn't get a failure.
I reran my tests manually, and still couldn't repeat the failure. The automated test still managed to uncover the problem. I went back to the drawing board, and analyzed where we were with regard to risk, the time we had to work, and test ideas I had already generated. I started thinking about test ideas I had missed and factors that might be affecting the outcome. The biggest difference between the automated test and the manual tests I was running without success: me. I must have been doing something differently.
I realized that I might be varying my test data as I ran through the workflow. I decided to run the test completely manually, but use the test data directly from the automated test, copying and pasting the data directly from the test file. This strategy reproduced the bug consistently. As I looked at the test data in the script, I remembered something important: Two fields on the fifth page required data that was generated for us by another group. That generated test data was evaluated by a system with strict parameters, limiting what could be entered on that page. Back when I wrote the automated test, only one type of test data was working; there were problems with other kinds of test data. Eventually, as each of those test data items was fixed, everyone on the team started to vary the data used in tests. I had developed a habit of using a particular data set for manual testing that was different from the one in the older automated test. Another look at the test data in the automated test showed me that I had developed another habit: The automated test entered data in an optional field, which I sometimes left blank in my manual testing. If I used one particular data type in a required field and also entered data in the optional field, the application produced the error, regardless of whether I varied the rest of the data on the page. I had stopped being able to repeat it because I was rushing through the workflow, and forgetting something important about my tests when I got to the screen that had revealed the problem. My partially automated script helped keep me focused on the area that needed my attention and interaction. It also helped me to test more efficiently; I could now repeat the error very quickly using both automated and manual tests.
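The interaction described above, where one data type in a required field fails only when the optional field is also filled in, is the kind of defect that a small exhaustive combination check can expose. The sketch below is an illustration under stated assumptions: `submit_page`, the data type names, and the simulated failure rule are invented stand-ins for the real application, which is not described in that level of detail here.

```python
from itertools import product
from typing import List, Optional, Tuple


def submit_page(required_type: str, optional_value: Optional[str]) -> bool:
    """Hypothetical stand-in for the fifth screen; returns True on success.

    The failure rule is simulated to mirror the story: one data type in
    the required field fails only when the optional field is filled in.
    """
    return not (required_type == "type_a" and optional_value is not None)


def find_failing_combinations(
    required_types: List[str], optional_values: List[Optional[str]]
) -> List[Tuple[str, Optional[str]]]:
    """Try every pairing of the two inputs and collect the ones that fail."""
    return [
        (req, opt)
        for req, opt in product(required_types, optional_values)
        if not submit_page(req, opt)
    ]


failures = find_failing_combinations(
    ["type_a", "type_b"],   # assumed data types for the required field
    [None, "some text"],    # optional field left blank or filled in
)
print(failures)
```

Varying either input alone would pass in three of the four cases, which is why a tester who habitually leaves the optional field blank, or habitually uses the other data type, would never see the error.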
I went back to Jason and explained what was happening. "I'm using the same test data you're using and that code is unit-tested like you wouldn't believe," he said. "I can't figure out why your automated test is causing the data to run into an area of the program it isn't supposed to run into." I told him I could now repeat the error manually on command, so we paired at my computer. When I copied and pasted in the data I thought was problematic, he stared at it, and asked to see it again. He asked me to come back to his computer, and compared the test data from my automated test to the unit tests he was running. He copied each one and pasted them in a text editor. They were different. Only slightly different, but different. He ran an automated unit test with the test data I provided, and it failed. We had found our culprit. Now we worked on piecing together the puzzle, and as we talked about the failure, new information came to light.
It turned out that the administrators of a central system we all depended on had created two kinds of test data for one data category. One was the generic set that most people in the company used when testing the system. The other set was part of another project that had not been publicized, for security reasons. This new set of data had been released only to Jason's team, and they had written their unit tests using it. Because both data sets were referred to by the same name, we assumed that we were using the same data. The automated tests on the project I was working on hadn't been provided this new test data yet. Each of our automated test suites was using different test data, and we each had missed a crucial set of test cases due to the communication error. If this sporadic bug had made it out into the field, it would have been catastrophic. It would not have been sporadic in production; a large percentage of transactions in the system would have failed. Our problem stemmed from communication failures as well as unchecked assumptions. Our testing that afternoon helped reveal not only the bug, but these other problems within our teams.
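Two data sets that share a name and look alike are easier to compare mechanically than by eye. As a rough sketch of what Jason did by pasting both sets into a text editor, Python's standard `difflib` module can list the lines that differ. The records below are invented for illustration; the real test data is not shown in this account.

```python
import difflib
from typing import List


def differing_lines(a: List[str], b: List[str]) -> List[str]:
    """Return only the removed ('- ') and added ('+ ') lines of a comparison."""
    return [
        line
        for line in difflib.ndiff(a, b)
        if line.startswith("- ") or line.startswith("+ ")
    ]


# Hypothetical records: same data set name, slightly different content.
automated_test_data = ["account=1001", "category=STANDARD", "region=west"]
unit_test_data = ["account=1001", "category=STANDARD-2", "region=west"]

diff = differing_lines(automated_test_data, unit_test_data)
for line in diff:
    print(line)
```

A check like this, run whenever two teams believe they share a data set, turns the assumption "we are using the same data" into something that can be verified.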