Code That's Easy to Test
The Software IC is a metaphor that people like to toss around when discussing reusability and component-based development. The idea is that software components should be combined just as integrated circuit chips are combined. This works only if the components you are using are known to be reliable.
Chips are designed to be tested—not just at the factory, not just when they are installed, but also in the field when they are deployed. More complex chips and systems may have a full Built-in Self Test (BIST) feature that runs some base-level diagnostics internally, or a Test Access Mechanism (TAM) that provides a test harness that allows the external environment to provide stimuli and collect responses from the chip.
We can do the same thing in software. Like our hardware colleagues, we need to build testability into the software from the very beginning, and test each piece thoroughly before trying to wire them together.
Unit Testing
Chip-level testing for hardware is roughly equivalent to unit testing in software—testing done on each module, in isolation, to verify its behavior. We can get a better feeling for how a module will react in the big wide world once we have tested it thoroughly under controlled (even contrived) conditions.
A software unit test is code that exercises a module. Typically, the unit test will establish some kind of artificial environment, then invoke routines in the module being tested. It then checks the results that are returned, either against known values or against the results from previous runs of the same test (regression testing).
Later, when we assemble our "software ICs" into a complete system, we'll have confidence that the individual parts work as expected, and then we can use the same unit test facilities to test the system as a whole. We talk about this large-scale checking of the system in Ruthless Testing.
Before we get that far, however, we need to decide what to test at the unit level. Typically, programmers throw a few random bits of data at the code and call it tested. We can do much better, using the ideas behind design by contract.
Testing Against Contract
We like to think of unit testing as testing against contract (see Design by Contract). We want to write test cases that ensure that a given unit honors its contract. This will tell us two things: whether the code meets the contract, and whether the contract means what we think it means. We want to test that the module delivers the functionality it promises, over a wide range of test cases and boundary conditions.
What does this mean in practice? Let's look at the square root routine we first encountered on page 114. Its contract is simple:
    require:
        argument >= 0;
    ensure:
        ((result * result) - argument).abs <= epsilon * argument;
This tells us what to test:
Pass in a negative argument and ensure that it is rejected.
Pass in an argument of zero to ensure that it is accepted (this is the boundary value).
Pass in values between zero and the maximum expressible argument and verify that the difference between the square of the result and the original argument is less than some small fraction of the argument.
Armed with this contract, and assuming that our routine does its own pre- and postcondition checking, we can write a basic test script to exercise the square root function.
    public void testValue(double num, double expected) {
        double result = 0.0;

        try {                     // We may throw a
            result = mySqrt(num); // precondition exception
        }
        catch (Throwable e) {
            if (num < 0.0)        // If input is < 0, then
                return;           // we're expecting the
            else                  // exception, otherwise
                assert(false);    // force a test failure
        }
        assert(Math.abs(expected - result) < epsilon * expected);
    }
Then we can call this routine to test our square root function:
    testValue(-4.0, 0.0);
    testValue( 0.0, 0.0);
    testValue( 2.0, 1.4142135624);
    testValue(64.0, 8.0);
    testValue(1.0e7, 3162.2776602);
This is a pretty simple test; in the real world, any nontrivial module is likely to be dependent on a number of other modules, so how do we go about testing the combination?
Suppose we have a module A that uses a LinkedList and a Sort. In order, we would test:
LinkedList's contract, in full
Sort's contract, in full
A's contract, which relies on the other contracts but does not directly expose them
This style of testing requires you to test subcomponents of a module first. Once the subcomponents have been verified, then the module itself can be tested.
If LinkedList and Sort's tests passed, but A's test failed, we can be pretty sure that the problem is in A, or in A's use of one of those subcomponents. This technique is a great way to reduce debugging effort: we can quickly concentrate on the likely source of the problem within module A, and not waste time reexamining its subcomponents.
Why do we go to all this trouble? Above all, we want to avoid creating a "time bomb"—something that sits around unnoticed and blows up at an awkward moment later in the project. By emphasizing testing against contract, we can try to avoid as many of those downstream disasters as possible.
Tip 48
Design to Test
When you design a module, or even a single routine, you should design both its contract and the code to test that contract. By designing code to pass a test and fulfill its contract, you may well consider boundary conditions and other issues that wouldn't occur to you otherwise. There's no better way to fix errors than by avoiding them in the first place. In fact, by building the tests before you implement the code, you get to try out the interface before you commit to it.
Writing Unit Tests
The unit tests for a module shouldn't be shoved in some far-away corner of the source tree. They need to be conveniently located. For small projects, you can embed the unit test for a module in the module itself. For larger projects, we suggest moving each test into a subdirectory. Either way, remember that if it isn't easy to find, it won't be used.
By making the test code readily accessible, you are providing developers who may use your code with two invaluable resources:
Examples of how to use all the functionality of your module
A means to build regression tests to validate any future changes to the code
It's convenient, but not always practical, for each class or module to contain its own unit test. In Java, for example, every class can have its own main. In all but the application's main class file, the main routine can be used to run unit tests; it will be ignored when the application itself is run. This has the benefit that the code you ship still contains the tests, which can be used to diagnose problems in the field.
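For instance, here's a minimal sketch of the per-class main technique, using a hypothetical Stack class (the class name, methods, and checks are ours, purely for illustration). The main routine runs the tests when the class is invoked directly, and is ignored when the class is used as part of a larger application:

    public class Stack {
        private final java.util.ArrayList<Object> items =
            new java.util.ArrayList<Object>();

        public void push(Object o) { items.add(o); }
        public Object pop()        { return items.remove(items.size() - 1); }
        public boolean isEmpty()   { return items.isEmpty(); }

        // Unit tests live with the code they test. Run "java Stack"
        // (with -ea to enable assertions) to execute them; since they
        // ship with the class, they can diagnose problems in the field.
        public static void main(String[] args) {
            Stack s = new Stack();
            assert s.isEmpty()         : "new stack should be empty";
            s.push("x");
            assert "x".equals(s.pop()) : "pop should return last push";
            assert s.isEmpty()         : "stack should be empty again";
            System.out.println("Stack: all tests passed");
        }
    }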
In C++ you can achieve the same effect (at compile time) by using #ifdef to compile unit test code selectively. For example, here's a very simple unit test in C++, embedded in our module, that checks our square root function using a testValue routine similar to the Java one defined previously:
    #ifdef _TEST_
    int main(int argc, char **argv)
    {
        argc--; argv++;          // skip program name

        if (argc < 2) {          // do standard tests if no args
            testValue(-4.0, 0.0);
            testValue( 0.0, 0.0);
            testValue( 2.0, 1.4142135624);
            testValue(64.0, 8.0);
            testValue(1.0e7, 3162.2776602);
        }
        else {                   // else use args
            double num, expected;

            while (argc >= 2) {
                num      = atof(argv[0]);
                expected = atof(argv[1]);
                testValue(num, expected);
                argc -= 2;
                argv += 2;
            }
        }
        return 0;
    }
    #endif
This unit test will either run a minimal set of tests or, if given arguments, allow you to pass data in from the outside world. A shell script could use this ability to run a much more complete set of tests.
What do you do if the correct response for a unit test is to exit, or abort the program? In that case, you need to be able to select the test to run, perhaps by specifying an argument on the command line. You'll also need to pass in parameters if you need to specify different starting conditions for your tests.
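One way to arrange this—a sketch only, with hypothetical test names—is to have main treat its first argument as the name of a single test to run, so that a test whose correct outcome is to terminate the program can't take the rest of the run down with it:

    public static void main(String[] args) {
        String test = (args.length > 0) ? args[0] : "standard";
        if (test.equals("testShutdown")) {
            // Correct behavior here is to exit; a driver script runs
            // this case in its own process and checks the exit code.
            testShutdown();
        } else {
            runStandardTests();   // the everyday, non-fatal tests
        }
    }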
But providing unit tests isn't enough. You must run them, and run them often. It also helps if the class passes its tests once in a while.
Using Test Harnesses
Because we usually write a lot of test code, and do a lot of testing, we'll make life easier on ourselves and develop a standard testing harness for the project. The main shown in the previous section is a very simple test harness, but usually we'll need more functionality than that.
A test harness can handle common operations such as logging status, analyzing output for expected results, and selecting and running the tests. Harnesses may be GUI driven, may be written in the same target language as the rest of the project, or may be implemented as a combination of makefiles and Perl scripts. A simple test harness is shown in the answer to Exercise 41 on page 305.
In object-oriented languages and environments, you might create a base class that provides these common operations. Individual tests can subclass from that and add specific test code. You could use a standard naming convention and reflection in Java to build a list of tests dynamically. This technique is a nice way of honoring the DRY principle—you don't have to maintain a list of available tests. But before you go off and start writing your own harness, you may want to investigate Kent Beck and Erich Gamma's xUnit at [URL 22]. They've already done the hard work.
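As a sketch of the reflection idea (not how xUnit itself is implemented), a runner might simply look for public, parameterless methods whose names start with test and invoke each one:

    import java.lang.reflect.Method;

    public class ReflectiveRunner {
        // Discover and run every public test...() method on the target
        // object, so no list of tests ever needs to be maintained.
        public static void runTests(Object target) throws Exception {
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().startsWith("test")
                        && m.getParameterTypes().length == 0) {
                    System.out.println("Running " + m.getName());
                    m.invoke(target);   // any exception is a failure
                }
            }
        }
    }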
Regardless of the technology you decide to use, test harnesses should include the following capabilities:
A standard way to specify setup and cleanup
A method for selecting individual tests or all available tests
A means of analyzing output for expected (or unexpected) results
A standardized form of failure reporting
Tests should be composable; that is, a test can be composed of subtests of subcomponents to any depth. We can use this feature to test selected parts of the system or the entire system just as easily, using the same tools.
Ad Hoc Testing
During debugging, we may end up creating some particular tests on the fly. These may be as simple as a print statement, or a piece of code entered interactively in a debugging or IDE environment.
At the end of the debugging session you need to formalize the ad hoc test. If the code broke once, it is likely to break again. Don't just throw away the test you created; add it to the existing unit test.
For example, using JUnit (the Java member of the xUnit family), we might write our square root test as follows:
    import java.util.Enumeration;
    import java.util.Vector;
    import junit.framework.Test;
    import junit.framework.TestCase;
    import junit.framework.TestSuite;

    public class JUnitExample extends TestCase {

        // Holds the (input, expected) pairs loaded in setUp()
        private final Vector testData = new Vector();

        public JUnitExample(final String name) {
            super(name);
        }

        protected void setUp() {
            // Load up test data...
            testData.addElement(new dblPair(-4.0, 0.0));
            testData.addElement(new dblPair(0.0, 0.0));
            testData.addElement(new dblPair(64.0, 8.0));
            testData.addElement(new dblPair(Double.MAX_VALUE,
                                            1.3407807929942597E154));
        }

        public void testMySqrt() {
            double num, expected;
            Enumeration elements = testData.elements();
            while (elements.hasMoreElements()) {
                dblPair p = (dblPair) elements.nextElement();
                num      = p.getNum();
                expected = p.getExpected();
                testValue(num, expected);
            }
        }

        public static Test suite() {
            TestSuite suite = new TestSuite();
            suite.addTest(new JUnitExample("testMySqrt"));
            return suite;
        }
    }
JUnit is designed to be composable: we could add as many tests as we wanted to this suite, and each of those tests could in turn be a suite. In addition, you have your choice of a graphical or batch interface to drive the tests.
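For example, a top-level suite could be assembled from other suites—the LinkedListTest and SortTest names below are hypothetical stand-ins for your own test classes:

    public static Test suite() {
        TestSuite all = new TestSuite();
        all.addTest(LinkedListTest.suite());   // a suite within a suite
        all.addTest(SortTest.suite());
        all.addTest(new JUnitExample("testMySqrt"));
        return all;
    }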
Build a Test Window
Even the best sets of tests are unlikely to find all the bugs; there's something about the damp, warm conditions of a production environment that seems to bring them out of the woodwork.
This means you'll often need to test a piece of software once it has been deployed—with real-world data flowing through its veins. Unlike a circuit board or chip, we don't have test pins in software, but we can provide various views into the internal state of a module, without using the debugger (which may be inconvenient or impossible in a production application).
Log files containing trace messages are one such mechanism. Log messages should be in a regular, consistent format; you may want to parse them automatically to deduce processing time or logic paths that the program took. Poorly or inconsistently formatted diagnostics are just so much "spew"—they are difficult to read and impractical to parse.
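The exact layout is up to you; as an illustration only, a single fixed delimiter and field order is enough to make every line trivially splittable by a script:

    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class TraceLog {
        private static final SimpleDateFormat FMT =
            new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");

        // One fixed format—timestamp | level | module | message—so a
        // script can split on the delimiter and never guess at fields.
        public static void log(String level, String module, String msg) {
            System.out.println(FMT.format(new Date())
                + " | " + level + " | " + module + " | " + msg);
        }
    }

    // Sample line: 2000-03-14 09:26:53.589 | INFO | sqrt | testMySqrt passed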
Another mechanism for getting inside running code is the "hot-key" sequence. When this particular combination of keys is pressed, a diagnostic control window pops up with status messages and so on. This isn't something you normally would reveal to end users, but it can be very handy for the help desk.
For larger, more complex server code, a nifty technique for providing a view into its operation is to include a built-in Web server. Anyone can point a Web browser to the application's HTTP port (which is usually on a nonstandard number, such as 8080) and see internal status, log entries, and possibly even some sort of a debug control panel. This may sound difficult to implement, but it's not. Freely available and embeddable HTTP Web servers are available in a variety of modern languages. A good place to start looking is [URL 58].
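As a sketch of how little code this can take, here's a status page built on the JDK's bundled com.sun.net.httpserver package (one embeddable server among many; the /status path and the counters it reports are placeholders for whatever internal state you want to expose):

    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;

    public class StatusServer {
        public static void start() throws Exception {
            HttpServer server =
                HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/status", (HttpExchange ex) -> {
                // Report whatever internal state is useful in the field
                byte[] body = "requests=42\nuptime=3600s\n".getBytes();
                ex.sendResponseHeaders(200, body.length);
                try (OutputStream os = ex.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();   // serves in the background until shutdown
        }
    }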
A Culture of Testing
All software you write will be tested—if not by you and your team, then by the eventual users—so you might as well plan on testing it thoroughly. A little forethought can go a long way toward minimizing maintenance costs and help-desk calls.
Despite its hacker reputation, the Perl community has a very strong commitment to unit and regression testing. The Perl standard module installation procedure supports a regression test by invoking
% make test
There's nothing magic about Perl itself in this regard. Perl makes it easier to collate and analyze test results to ensure compliance, but the big advantage is simply that it's a standard—tests go in a particular place, and have a certain expected output. Testing is more cultural than technical; we can instill this testing culture in a project regardless of the language being used.
Tip 49
Test Your Software, or Your Users Will
Related sections include:
The Cat Ate My Source Code
Orthogonality
Design by Contract
Refactoring
Ruthless Testing
Exercises
41. Design a test jig for the blender interface described in the answer to Exercise 17 on page 289. Write a shell script that will perform a regression test for the blender. You need to test basic functionality, error and boundary conditions, and any contractual obligations. What restrictions are placed on changing the speed? Are they being honored?