Lessons Learned
We learned many lessons on this project. Many of them were positive, but hindsight showed that we should have done some things differently. The largest drawback to our approach was that we initially expected to automate the entire parallel testing effort, but we ended up not automating the GUI-level testing (screen-based rules for data verification, consistent behavior, and so on). One of the reasons we wanted to use production data was not just to test the end calculations performed on the transaction data but also to test the input constraints on the GUI. That entire aspect of the testing was left out. If we did it again, we would try to strike a balance between the two: use the performance test tools for the bulk of the testing, and then randomly select data for some GUI-level testing using the framework we had already created.
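As a rough illustration of that balance, the sketch below (in Java, with hypothetical names such as GuiSample.pick that were not part of our framework) shows one way to draw a small, reproducible random sample of production records for GUI-level spot checks while the performance tool continues to push the bulk of the data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class GuiSample {
    // Pick a small random subset of transaction records for GUI-level checks.
    // The seed makes the sample reproducible, so the same records can be
    // re-verified after a fix.
    static List<String> pick(List<String> allRecords, int sampleSize, long seed) {
        List<String> copy = new ArrayList<>(allRecords);
        Collections.shuffle(copy, new Random(seed));
        return copy.subList(0, Math.min(sampleSize, copy.size()));
    }
}
```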
We should have added more elegant response timing and designed the data access so that it could be pointed at different data sources. We didn't think to capture meaningful performance metrics until after the bulk of the framework had been created; it was only when we noticed the performance limitation that we thought to add that level of logging to the scripts. In addition, at one point we wanted to rerun a past data set to verify a bug fix. Unfortunately, we hadn't thought that far ahead, and we could neither re-create the environment nor point our framework at another data source.
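A minimal sketch of the indirection we wish we had built in from the start, assuming a Java-style framework and hypothetical names (TransactionSource, ArchivedTransactionSource, ResponseTimer): data access goes through an interface so it can be pointed at a live export or at an archived run, and a small timing helper captures response times for every step.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// A pluggable data source: the rest of the framework never knows where the data came from.
interface TransactionSource {
    List<String> loadTransactions() throws IOException;
}

// Point the framework at a flat-file export of production data...
class FileTransactionSource implements TransactionSource {
    private final Path exportFile;
    FileTransactionSource(Path exportFile) { this.exportFile = exportFile; }
    public List<String> loadTransactions() throws IOException {
        return Files.readAllLines(exportFile);
    }
}

// ...or at an archived data set from an earlier run, e.g. to re-verify a bug fix.
class ArchivedTransactionSource implements TransactionSource {
    private final Path archiveDir;
    private final String runId;
    ArchivedTransactionSource(Path archiveDir, String runId) {
        this.archiveDir = archiveDir;
        this.runId = runId;
    }
    public List<String> loadTransactions() throws IOException {
        return Files.readAllLines(archiveDir.resolve(runId + ".txt"));
    }
}

// Basic response timing around any step, so performance metrics exist from day one.
class ResponseTimer {
    static long timeMillis(Runnable step) {
        long start = System.nanoTime();
        step.run();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

With something like this in place, rerunning last month's data set becomes a configuration change rather than an attempt to reconstruct an environment.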
From a maintenance perspective, it might have been easier to write the data parser in a different language entirely (most likely Java, the primary language the developers were using). Even though the test tool's language was a full-featured programming language, we would have had more support if we had written that part of the framework outside of the test tool. We should also have had some form of archiving for the data: either basic version control on the source files or perhaps a set of XML files that could be used to regenerate our data easily in any format.
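The archiving idea could be as simple as writing each generated data set to a plain XML file and reading it back to emit whatever format a given tool needs. The fragment below is a hypothetical sketch using only the standard JDK XML APIs, not the parser we actually built; the file name and values are made up for illustration.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DataArchive {
    public static void main(String[] args) throws Exception {
        // Archive a (made-up) data set as XML.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("dataset");
        doc.appendChild(root);
        Element tx = doc.createElement("transaction");   // illustrative values only
        tx.setAttribute("id", "42");
        tx.setAttribute("amount", "105.00");
        root.appendChild(tx);
        File archive = new File("dataset-archive.xml");
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(archive));

        // Later, regenerate the same data in another format (here, CSV for a load tool).
        Document loaded = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(archive);
        NodeList txs = loaded.getElementsByTagName("transaction");
        for (int i = 0; i < txs.getLength(); i++) {
            Element e = (Element) txs.item(i);
            System.out.println(e.getAttribute("id") + "," + e.getAttribute("amount"));
        }
    }
}
```

Checking the XML files into basic version control would then give both the archive and the history we lacked.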
In addition, isolating the errors we found was very difficult. First we had to determine which data set had been used (a manually intensive step); then we had to reproduce the problem in the web application (which wasn't always possible because there was no load on the application). Looking back, we also should have done a better job of logging the actual data and test data used by each virtual user.
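A minimal sketch of the per-virtual-user logging we should have had, assuming the tool exposes a virtual-user id (the class and method names here are hypothetical): each iteration appends which data set and which records that user submitted, so a failure can be traced straight back to its input without manual digging.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.List;

class VirtualUserLog {
    private final Path logFile;
    VirtualUserLog(Path logFile) { this.logFile = logFile; }

    // Call once per iteration, before the virtual user submits its transactions.
    synchronized void record(int vuserId, String dataSetId, List<String> recordKeys)
            throws IOException {
        String line = String.format("%s vuser=%d dataset=%s records=%s%n",
                Instant.now(), vuserId, dataSetId, String.join(";", recordKeys));
        Files.write(logFile, line.getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```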
Finally, it's worth sharing that this method of testing was eventually abandoned. After some of the more technical testers left the project, it was too difficult for automation testers who were new to the project to figure out the framework and keep it up to date with the web application. When we discussed the issue with performance test expert Scott Barber, he commented that these types of tests often tend to be too time-consuming to develop to be treated as disposable, yet making them maintainable by anyone who was not part of their initial development is even more difficult. He concluded that this tends to force a choice between abandoning the tests and committing a more senior programmer/tester to the maintenance phase, rather than handing maintenance to a junior programmer. As a testament to this, even with the help of the developers, the code for our tests was too complex to hand over to the testers still on the project in a reasonable amount of time. As it turns out, most of those testers had never done performance testing and had never actually written code. To me, this speaks to a larger problem in the organization and in our industry.