- 29.1 Three Grains of Rice
- 29.2 Understanding Has to Grow
- 29.3 First Day Automated Testing
- 29.4 Attempting to Get Automation Started
- 29.5 Struggling with (against) Management
- 29.6 Exploratory Test Automation: Database Record Locking
- 29.7 Lessons Learned from Test Automation in an Embedded Hardware-Software Computer Environment
- 29.8 The Contagious Clock
- 29.9 Flexibility of the Automation System
- 29.10 A Tale of Too Many Tools (and Not Enough Cross-Department Support)
- 29.11 A Success with a Surprising End
- 29.12 Cooperation Can Overcome Resource Limitations
- 29.13 An Automation Process for Large-Scale Success
- 29.14 Test Automation Isn't Always What It Seems
29.14 Test Automation Isn’t Always What It Seems
Julian Harty, United Kingdom
Tester at large
I strongly believe testing can and should use automation appropriately, and conversely, we should be careful not to waste time and resources on automating garbage (e.g., ineffective, misguided, or useless tests). Also, we should beware of being beguiled by shiny automation for its own sake, and over the years, I’ve sadly met many people who believe, without foundation, that because they have automated tests, these tests are appropriate or sufficient. One of my self-assigned responsibilities as a test engineer is to challenge these flawed tests and retire as many as practical.
This anecdote includes several experience reports of test automation, both good and bad. Generally, I was directly involved in them, but sometimes the analysis was done by other team members. They are taken from companies I’ve worked with and for over the last 11 years. Project teams ranged from about 10 to 150 technical staff and typically ran for several years.
In every case, test automation was core to the project.
29.14.1 Just Catching Exceptions Does Not Make It a Good Test
A large global application included several APIs that allowed both internal and external groups to integrate with it. Java was the primary programming language. Over the years, before I was involved, hundreds of automated tests had been written for the respective APIs. For one API, the tests were written as a separate application, started from the command line; for the other, the open source JUnit framework was used. Each set of tests ran slowly, and several days were required to update the tests after each release from the application’s development team.
Our team of test engineers was asked to assume responsibility for both sets of tests. Each engineer was assigned to one set of tests. We spent several days just learning how to run each set of tests (the process was cumbersome, poorly documented, and unreliable). We then started reading through the source code. What we found horrified us: There was an incredible amount of poorly written, duplicated code (implying little or no software design or structure), and worst of all, the only thing each test did to determine success or failure was catch runtime exceptions (e.g., out of memory, network timeout). When an exception was caught, the test reported a failure.
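As an illustration of the pattern we found (the names here are invented; this is a reconstruction of the style, not the project's actual code), each "test" amounted to little more than this:

```java
// Reconstruction of the flawed pattern (hypothetical names): the only way this
// "test" can fail is if a runtime exception escapes, so a response full of
// wrong data still counts as a pass.
public class FlawedApiTest {

    public static void main(String[] args) {
        try {
            String response = callAccountApi("test-account-42"); // stand-in for a real API call
            // The response is never inspected, so any non-crashing answer "passes".
            System.out.println("PASS");
        } catch (Exception e) { // e.g., network timeout, out of memory
            System.out.println("FAIL: " + e);
        }
    }

    // Stub included only so the sketch compiles on its own.
    private static String callAccountApi(String accountId) {
        return "{\"id\":\"" + accountId + "\"}";
    }
}
```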
API tests should provide known inputs and confirm the results received are as expected without undesirable side effects or problems. For example, if we have an API for a calculator program, a typical method may be
result = divide(numerator, denominator);
A good test should check that the calculated result is within the acceptable error range for the calculation (for real numbers, the answer may be approximated, truncated, or rounded). It should also check what happens when invalid inputs (e.g., trying to divide by zero) are provided. For example, what should the result be, and should an exception be thrown? (And if so, which exception, and what should it contain?)
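As a minimal sketch of those checks in JUnit 4 (the divide() implementation below is a hypothetical stand-in for the real API, included only so the example is self-contained):

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class DivideTest {

    // Hypothetical stand-in for the calculator API under test.
    static double divide(double numerator, double denominator) {
        if (denominator == 0.0) {
            throw new ArithmeticException("division by zero");
        }
        return numerator / denominator;
    }

    @Test
    public void resultIsWithinAcceptableError() {
        // 1/3 cannot be represented exactly, so compare within a tolerance.
        assertEquals(0.333333, divide(1.0, 3.0), 1e-6);
    }

    @Test(expected = ArithmeticException.class)
    public void divisionByZeroThrowsTheDocumentedException() {
        divide(1.0, 0.0); // the test states exactly which exception is expected
    }
}
```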
After spending several more weeks working on the test automation code, we ended up deleting all the tests in one case and effectively rewriting the tests for the other API. In both cases, we decided to focus on enhancing the lower-level unit tests written by the developers of the respective APIs rather than propagating or sustaining inadequate tests written by testing “specialists.”
29.14.2 Sometimes the Failing Test Is the Test Worth Trusting
We decided to restructure our web browser–based tests because the existing tests had various problems and limitations, including high maintenance costs and poor reliability. The initial restructuring went well, and we also migrated from Selenium RC to WebDriver, which had a more compact and powerful API designed to make tests easier and faster to write. At this stage, the tests ran on a single machine, typically shared with the web application under test, when run by the automated build process.
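For illustration only (this snippet is not from the project, and the URL and element name are invented), a WebDriver test drives the browser directly through a handful of calls, whereas Selenium RC routed every command through a separate proxy server process:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class SearchSmokeTest {

    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver(); // talks to the browser directly
        try {
            driver.get("http://localhost:8080/"); // hypothetical test server address
            driver.findElement(By.name("q")).sendKeys("hello");
            driver.findElement(By.name("q")).submit(); // submits the enclosing form
            System.out.println("Page title after search: " + driver.getTitle());
        } finally {
            driver.quit();
        }
    }
}
```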
The tests took a long time to run (tens of minutes), which was much longer than our goal (of having them run within a few minutes). Thankfully, we had existing infrastructure to run the tests in parallel across banks of machines. The tests needed to connect to the appropriate test server, which was compiled and started by the build process, so the test engineer made what seemed to be the appropriate modifications to the automated tests to take advantage of the distributed testing infrastructure. Perplexingly, however, one of the tests failed every time he ran the tests using the distributed infrastructure.
Over the next few days, he dug into his code, the configuration scripts, and so on, but was unable to get the now embarrassingly obdurate test to pass. Finally, he discovered that a network configuration issue prevented any of the tests from reaching the newly built server; however, only one of the tests detected this! At this point, he was able to fix the network configuration issue and finally get the failing test to pass.
Several valuable lessons were learned:
- The other existing tests had effectively been worthless because they didn’t fail when they could not reach the server at all.
- Even expert engineers can be fooled for days when test results don’t conform to expectations.
- The failing test was actually the friend of the project because it exposed the problems with the rest of the—very flawed—tests.
One concept worth embracing is to consider how easily the current test could be fooled, or misled, into providing an erroneous result. For example, would an automated test for an email service detect missing menu options? Then consider how to strengthen the test so that it will not be fooled by this problem. While this concept can be applied iteratively to a given test, I suggest you limit yourself to addressing potentially significant problems; otherwise, your test code may take too long to write, maintain, and run.
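To make the email-service example concrete, here is a hedged sketch of what a strengthened check might look like (the URL, locator, and expected number of menu items are all invented assumptions):

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.List;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class MailMenuTest {

    @Test
    public void mainMenuOffersTheExpectedOptions() {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("http://localhost:8080/mail"); // hypothetical server under test
            // A test that only checked "no exception was thrown" would pass even if
            // this were an error page; checking the title and the menu items makes
            // the test much harder to fool.
            assertTrue("Unexpected page title: " + driver.getTitle(),
                       driver.getTitle().contains("Mail"));
            List<WebElement> menuItems =
                driver.findElements(By.cssSelector("#menu li")); // hypothetical locator
            assertEquals("Menu is missing options", 4, menuItems.size());
        } finally {
            driver.quit();
        }
    }
}
```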
29.14.3 Sometimes, Micro-Automation Delivers the Jackpot
In this story, 10 lines of Perl cracked open a critical nationwide system.
I learned many years ago that I’m not a brilliant typist. On one project, my poor typing helped expose a potential security issue when I accidentally mistyped some commands for a file transfer protocol in a telnet application. I wanted to explore the potential flaw more scientifically, but I continued to mistype commands in different ways, and my mistakes were hampering my ability to explore the application effectively. At the time, I lacked UNIX or Perl skills, so although writing an automated script to enter the commands seemed sensible, I was unsure whether it was worth spending the time to learn how to write a suitable script.
Salvation came in the form of a gnarly system administrator who knew both UNIX and Perl inside out. The system architect had unilaterally decreed that there were no security flaws in his file transfer protocol, and the system administrator saw a great opportunity to prove the architect wrong, so he immediately offered to write a simple command-line script in Perl that would start telnet and issue the various preliminary commands (those I’d been mistyping). The work took him less than 30 minutes.
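The original Perl script isn’t reproduced in this anecdote; purely as an illustration of the idea (the host, port, and commands below are invented), it did little more than open the connection, replay the preliminary commands I kept mistyping, and then hand control back to the keyboard. A comparable sketch in Java:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

// Sketch of the micro-automation idea (the original was roughly 10 lines of Perl):
// replay the error-prone preliminary commands, then hand over to the tester.
public class ProtocolProbe {

    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("target.example", 23)) { // hypothetical host and port
            PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
            BufferedReader in =
                new BufferedReader(new InputStreamReader(socket.getInputStream()));

            // Echo everything the server sends so the tester can see the responses.
            new Thread(() -> in.lines().forEach(System.out::println)).start();

            // The preliminary commands that were being mistyped (invented examples).
            out.println("USER tester");
            out.println("MODE transfer");

            // Pass whatever the tester types straight through to the server.
            BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in));
            String line;
            while ((line = keyboard.readLine()) != null) {
                out.println(line);
            }
        }
    }
}
```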
Once I had this basic script, I could experiment either by typing interactively after the script had completed the initial steps or by adding further file transfer commands to custom versions of the script. With this script, we eventually proved that there were serious issues in the implementation of the file transfer protocol that resulted in buffer overflows in the underlying program, which we could then exploit to compromise the security of the system. I also identified several design flaws in the software update mechanism and proved that these flaws allowed an attacker to effectively disable the entire nationwide system. Not bad for a few hours’ work (and a few days to get permission to reproduce the problems in various environments, including the live production system).