Quality Code through Software Testing: Where Do I Start?
Knowing how to approach a problem and having a plan of attack simplify testing and make it more effective. In this sense, test-driven development provides an easier framework in which to test because your tests are directed by what you want to implement next. Untested existing code presents a more formidable challenge because there are so many options for how to start.
This chapter provides guidance in the form of a reasoned approach to testing. The strategy is not radical or new, but it establishes a context in which the techniques in the remainder of the book can be applied. It should clear away some of the questions about testing strategy so that testing techniques can be used effectively.
This chapter—in fact, this whole book—focuses on automated testing. In the Agile Testing Quadrants identified by Lisa Crispin and Janet Gregory [AT], it is all about Q1 and Q2, the quadrants that support the team.
An Approach to Testing
No one has discovered the magic formula for perfect testing. This means that, as practitioners, we need to apply a combination of reasoning, experience, and intuition when we write tests. This section captures many of the concerns you need to consider when writing automated tests.
Understand Your Scope
The first question you need to ask yourself is, “What am I trying to test?” This broad question includes the need to answer several more detailed questions, such as the following.
- Which use cases are you trying to verify?
- Are you trying to test the full stack, an integrated subset, or a unit?
- What technologies are you trying to verify?
- What architectural layers are you trying to verify?
- Are you testing new code or clean, well-written existing code, or are you rescuing a legacy hair ball?
- Can you decompose the testing problem into usefully smaller, more tractable pieces?
A full stack test gives you the ultimate verification of the fully integrated system, but at a cost. Full stack tests often run slowly due to the presence of all of the component interactions, including network, database, and file system. Full stack tests often have difficulty distinguishing nuanced behaviors and can require complicated recipes to induce error conditions because of the larger number of conditions that you must manage to achieve those results. Finally, full stack tests may be fragile or difficult to implement because of user interface and timing interactions. The smaller the scope of the test you tackle, the less significant these factors become.
The technologies and architectural layers naturally relate to your scope, and they steer the testing tools you use and influence your options for test doubles. The testing frameworks you use to test a web interface are different from those for a ReST service, which are different from those for a Java class. Targeting the business layer may give you substitution options for the database, whereas testing stored procedures or ORM mappings may not.
The maturity and cleanliness of the code will guide your approach significantly. If you are starting a new project, you have the power to create the future. An existing exemplary code base makes your life easy, perhaps even boring, as you focus on the task at hand instead of reverse engineering and debugging. An unfamiliar spaghetti mess requires patience, insight, and a host of techniques from Michael Feathers [WEwLC], such as characterization testing and seam identification, before you can even start cleaning it up and adding features to it with confidence.
The more independent the parts of your system, the more likely you will feel confident testing those parts separately. Just like system decomposition, breaking your testing task into multiple subtasks will simplify your work and reduce your overall coupling.
A Conceptual Framework for Testing
I use two principles to guide the overall shape of my testing efforts. First and foremost, I focus on the purpose of the software I am testing. Second, I actively work to reduce the degree of coupling introduced by the tests. Let’s consider purpose now; we will discuss coupling in the next chapter.
The purpose of code applies differently at different levels. At a system level, the purpose of the code is the reason the software exists: its features, requirements, and use cases. Critical evaluation of the purpose helps to constrain your test cases and acceptance criteria. Is it important that a user interface element is a particular color or size or that it is aligned a particular way? You may not need to verify particular database fields if the value of the application is simply that the data or state is persisted and can be retrieved; exercising the retrieval and verifying the results may be sufficient.
At a module or component level, purpose refers to a function in the overall system. Integration tests satisfy the needs at this level. A module may overtly implement a feature, directly bearing the responsibility for the functionality the users expect. Alternatively, a module may implement an underlying, enabling design component. Either way, the module serves a role in the overall system. That role should be clear, and it needs to be tested.
At a unit- or isolation-test level, I like to think of the purpose as the value added by the code. Those who live in places with a value-added tax (VAT), such as Mexico or the European Union, may find this concept clear. A company only pays a VAT on the amount of value it adds to a product. Raw materials or component parts have a value when you receive them that is subtracted out of the value of the product you sell for purposes of taxation. Similarly, your code takes the libraries and collaborators on which it is built and adds an additional level of functionality or purpose for its consumers: the value added by that code.
Defining a unit test has caused considerable debate, but the value-added perspective gives us an alternative. Many have used definitions like that put forth by Michael Feathers stating what a unit test does not do. I feel this merely codifies some test-design heuristics as rules and prefer to use the value-added concept for an inclusive definition of a unit test.
- A unit test is a test that verifies the value added by the code under test. Any use of independently testable collaborators is simply a matter of convenience.
With this definition, use of an untestable method in the same class falls within the value added by the code under test. A testable method called from the code under test should have independent tests and therefore need not be part of the verification except to the extent that it adds value to the code under test. Use of other classes can be mocked, stubbed, or otherwise test doubled because they are independently testable. You can use databases, networks, or file systems at the expense of test performance, although I would not recommend it.
Every subset of the system, from the smallest method to the entire system itself, should have a well-defined purpose. If you find yourself struggling to test the subset, you may not know its purpose. If you do not know the purpose of the subset, it probably has some implementation, design, or architectural issues. You have just diagnosed current or future problems maintaining and extending your software simply by looking at it through a particular lens in order to test it.
State and Behavioral Testing
Classifying the testing task at hand aids in identifying the best approach in writing the tests. Determining whether a test will verify state or behavior usefully steers the testing process.
Pure state verification exercises the code and examines the resulting changes in data values associated with the operation. Mathematical functions that simply transform inputs to outputs exemplify the target of state verification. In a functional model, verification simply determines that the return values correlate correctly with the input parameters. In an object-oriented model, you additionally verify that the attributes of the object have changed (or not changed!) appropriately. State verification tests typically supply inputs and verify that the outputs correspond to the inputs in the expected ways.
Behavior verification checks that the code under test performs the right operations: It exhibits the right behavior in ways that go beyond data transformation. The purest target for behavioral verification is code that simply orchestrates activities, as in Listing 3-1. Such a method only calls other methods that are independently testable in order to coordinate their execution order and to channel the outputs of one to the inputs of another. Behavioral testing relies heavily on test doubles, frequently mocks.
Listing 3-1: A JavaScript example of a purely behavioral function using the jQuery.Deferred() implementation of the Promise pattern to coordinate asynchronous computation. It orchestrates the steps of the process, ensuring that dependencies are satisfied and parallel tasks are coordinated, but does no computation itself.
function submitTestScore() {
    verifyAllQuestionsAnswered();
    $.when(
        computeScore(),      // Returns deferred object
        allocateAttemptID()  // Server call returns deferred object
    ).done(
        sendScore            // Passed as a callback; runs after both resolve
    );
}
Highly decomposed software tends to be either behavioral or stateful. You will encounter cases that justifiably do a little of both, but the majority of the software you test will fit cleanly into one category or the other, guiding you toward the most effective style for testing it.
To Test or Not to Test
Although most of this book assumes you want or need to test your software, you should not blindly assume that is necessary. Perhaps counterintuitively, the primary purpose of automated tests is not to verify new functionality. After you create the test, each time it runs it mainly serves to verify that you did not break existing functionality.
But what if you do not care about preserving functionality? If you write code that will be thrown away, consider whether you should write automated tests for it. Prototypes, proofs of concept, demos, and experiments may all have short and limited lives.
On the other hand, code that you want to sustain should be tested. If you test your code properly, you will increase your ability to modify your code safely. Code that you expect to base your business on and code that you want other people to use should be developed sustainably. This also means that the throwaway code just mentioned should be brought under test, or rewritten under test, if you decide to productize it.
You also need to decide how to test your code, and economics plays a role in that decision. System and integration tests are usually cheaper and easier for highly coupled, low-quality code that changes infrequently. You will get more benefit applying unit tests to loosely coupled, high-quality code that changes often. The Automated Testing Pyramid presented in Chapter 1 assumes that you want to create sustainable software, but that may not always be the case for perfectly valid economic reasons. Or you may have inherited legacy code that needs rescue, in which case you will wrap it in characterization tests [WEwLC], essentially a form of system or integration test, until you can refactor it into a form more conducive to unit tests.