Where Does Unit Testing Fit In?
Unit testing is another tool that developers can use to test their own software. You will find out more about how unit tests are designed and written in Chapter 3, “How to Write a Unit Test,” but for the moment it is sufficient to say that unit tests are small pieces of code that test the behavior of other code. They set up the preconditions, run the code under test, and then make assertions about the final state. If the assertions are valid (that is, the conditions tested are satisfied), the test passes. Any deviation from the asserted state represents a failure, including exceptions that stop the test from running to completion.
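As a minimal sketch of what such a test looks like (written here with Apple’s XCTest framework; the Adder class and its add:to: method are invented for this example), consider the following:

```objc
#import <XCTest/XCTest.h>

// A hypothetical class under test, invented for this example.
@interface Adder : NSObject
- (NSInteger)add:(NSInteger)a to:(NSInteger)b;
@end

@implementation Adder
- (NSInteger)add:(NSInteger)a to:(NSInteger)b { return a + b; }
@end

@interface AdderTests : XCTestCase
@end

@implementation AdderTests

- (void)testAddingTwoNumbers {
    // Set up the preconditions.
    Adder *adder = [[Adder alloc] init];
    // Run the code under test.
    NSInteger result = [adder add:2 to:3];
    // Make an assertion about the final state.
    XCTAssertEqual(result, (NSInteger)5, @"2 + 3 should be 5");
}

@end
```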
In this way, unit tests are like miniature versions of the test cases written by integration testers: They specify the steps to run the test and the expected result, but they do so in code. This allows the computer to do the testing, rather than forcing the developer to step through the process manually. Automation is not the only benefit, though: A good test is also good documentation, because it describes the expectations the tester had of how the code under test would behave. A developer who writes a class for an application can also write tests to ensure that this class does what is required. In fact, as you will see in the next chapter, the developer can also write tests before writing the class that is being tested.
Unit tests are so named because they test a single “unit” of source code, which, in the case of object-oriented software, is usually a class. The terminology comes from the compiler term “translation unit,” meaning a single file that is passed to the compiler. This means that unit tests are naturally white-box tests, because they take a single class out of the context of the application and evaluate its behavior independently. Whether you choose to treat that class as a black box, and only interact with it via its public API, is a personal choice, but the effect is still to interact with a small portion of the application.
This fine granularity of unit testing makes it possible to get a very rapid turnaround on problems discovered through running the unit tests. A developer working on a class is often working in parallel on that class’s tests, so the code for that class will be at the front of her mind as she writes the tests. I have even had cases where I didn’t need to run a unit test to know that it would fail and how to fix the code, because I was still thinking about the class that the test was exercising. Compare this with the situation where a different person tests a use case that the developer might not have worked on for months. Even though unit testing means that a developer is writing code that will never ship as part of the application, this cost is offset by the benefit of discovering and fixing problems before they ever get to the testers.
Bug-fixing is every project manager’s worst nightmare: There’s some work to do, the product can’t ship until it’s done, but you can’t plan for it because you don’t know how many bugs exist and how long it will take the developers to fix them. Looking back at Table 1.1, you will see that the bugs fixed at the end of a project are the most expensive to fix, and that there is a large variance in the cost of fixing them. By factoring the time for writing unit tests into your development estimates, you can fix some of those bugs as you’re going and reduce the uncertainty over your ship date.
Unit tests will almost certainly be written by developers, because using a testing framework means writing code, working with APIs, and expressing low-level logic: exactly the things that developers are good at. However, it’s not necessary for the same developer to write a class and its tests, and there are benefits to separating the two tasks. A senior developer can specify the API for a class by expressing its expected behavior as a set of tests, then hand those tests to a junior developer. Given the tests, the junior developer can implement the class by successively making each test in the set pass.
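As a sketch of how that hand-off might look (the Stack class and its API are invented for this illustration, not taken from a real project), the senior developer could write the tests along with deliberately failing stubs, leaving the junior developer to fill in the implementation:

```objc
#import <XCTest/XCTest.h>

// The API being specified. Stack is a hypothetical class.
@interface Stack : NSObject
- (NSUInteger)count;
- (void)push:(id)object;
- (id)pop;
@end

@implementation Stack
// Deliberately useless stubs: every test below fails until the
// junior developer replaces these with a real implementation.
- (NSUInteger)count { return NSNotFound; }
- (void)push:(id)object {}
- (id)pop { return nil; }
@end

@interface StackSpecification : XCTestCase
@end

@implementation StackSpecification

// Each test codifies one expectation about the class's behavior.
- (void)testNewStackIsEmpty {
    Stack *stack = [[Stack alloc] init];
    XCTAssertEqual([stack count], (NSUInteger)0);
}

- (void)testPushedObjectComesBackFromPop {
    Stack *stack = [[Stack alloc] init];
    [stack push:@"hello"];
    XCTAssertEqualObjects([stack pop], @"hello");
}

@end
```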
This interaction can also be reversed. Developers who have been given a class to use or evaluate but who do not yet know how it works can write tests to codify their assumptions about the class and find out whether those assumptions are valid. As they write more tests, they build a more complete picture of the capabilities and behavior of the class. However, writing tests for existing code is usually harder than writing tests and code in parallel. Classes that make assumptions about their environment may not work in a test framework without significant effort, because dependencies on surrounding objects must be replaced or removed. Chapter 11, “Designing for Test-Driven Development,” covers applying unit testing to existing code.
Developers working together can even switch roles very rapidly: One writes a test that the other codes up the implementation for; then they swap, and the second developer writes a test for the first. However the programmers choose to work together, the effect is the same: A unit test or set of unit tests acts as a form of documentation, expressing one developer’s intent to another.
One key advantage of unit testing is that running the tests is automated. It may take as long to write a good test as to write a good plan for a manual test, but a computer can then run hundreds of unit tests per second. Developers can keep all the tests they’ve ever used for an application in their version control systems alongside the application code, and then run the tests whenever they want. This makes it very cheap to test for regression bugs: bugs that were fixed once but are reintroduced by later development work. Whenever you change the application, you should be able to run all the tests in a few seconds to ensure that you didn’t introduce a regression. You can even have the tests run automatically whenever you commit source code to your repository, using a continuous integration system as described in Chapter 4, “Tools for Testing.”
Repeatable tests do not just warn you about regression bugs. They also provide a safety net when you want to edit the source code without any change in behavior—when you want to refactor the application. The purpose of refactoring is to tidy up your app’s source or reorganize it in some way that will be useful in the future, but without introducing any new functionality, or bugs! If the code you are refactoring is covered by sufficient unit tests, you know that any differences in behavior you introduce will be detected. This means that you can fix up the problems now, rather than trying to find them before (or after) shipping your next release.
However, unit testing is not a silver bullet. As discussed earlier, there is no way that developers can meaningfully test whether they understood the requirements. If the same person wrote the tests and the code under test, each will reflect the same preconceptions and interpretation of the problem being solved by the code. You should also appreciate that no good metrics exist for quantifying the success of a unit-testing strategy. The only popular measurements—code coverage and number of passing tests—can both be changed without affecting the quality of the software being tested.
Going back to the concept that testing is supposed to reduce the risk associated with deploying the software to the customer, it would be really useful to have some reporting tool that could show how much risk has been mitigated by the tests that are in place. But the software can’t know how much risk you associate with any particular piece of code, so the measurements that are available are only approximations of that risk level.
Counting tests is a very naïve way to measure the effectiveness of a set of tests. Consider your annual bonus—if the manager uses the number of passing tests to decide how much to pay you, you could write a single test and copy it multiple times. It doesn’t even need to test any of your application code; a test that verifies the result "1==1" would add to the count of passing tests in your test suite. And what is a reasonable number of tests for any application? Can you come up with a number that all iOS app developers should aspire to? Probably not—I can’t. Even two developers each tasked with writing the same application would find different problems in different parts, and would thus encounter different levels of risk in writing the app.
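For instance, the following test (deliberately worthless, and written here only to illustrate the point) passes every time without touching any application code, yet it still adds one to the count of passing tests:

```objc
#import <XCTest/XCTest.h>

@interface BonusPaddingTests : XCTestCase
@end

@implementation BonusPaddingTests

// This test exercises no application code at all. Copy and paste it
// fifty times under different names and the suite reports fifty more
// passing tests, with no change in the quality of the software.
- (void)testThatOneEqualsOne {
    XCTAssertTrue(1 == 1);
}

@end
```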
Measuring code coverage partially addresses the problems with test counting by measuring the amount of application code that is being executed when the tests are run. This now means that developers can’t increase their bonuses by writing meaningless tests—but they can still just look for “low-hanging fruit” and add tests for that code. Imagine increasing code coverage scores by finding all of the @synthesize property definitions in your app and testing that the getters and setters work. Sure, as we’ll see, these tests do have value, but they still aren’t the most valuable use of your time.
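A sketch of such a low-value test might look like this (the Person class is invented for the example):

```objc
#import <XCTest/XCTest.h>

// A hypothetical model class with a synthesized property.
@interface Person : NSObject
@property (nonatomic, copy) NSString *name;
@end

@implementation Person
@synthesize name;
@end

@interface PersonTests : XCTestCase
@end

@implementation PersonTests

// This test raises the coverage figure by executing the synthesized
// accessors, but it mostly verifies that the compiler works, not that
// the application behaves correctly.
- (void)testNamePropertyRoundTrips {
    Person *person = [[Person alloc] init];
    person.name = @"Alice";
    XCTAssertEqualObjects(person.name, @"Alice");
}

@end
```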
In fact, code coverage tools are biased against more complicated code. The definition of “complex” here is a specific one from computer science, called cyclomatic complexity. In a nutshell, the cyclomatic complexity of a function or method is related to the number of loops and branches—in other words, the number of different paths through the code.
Take two methods: -methodOne has twenty lines with no if or switch statements, ?: expressions, or loops (in other words, it is minimally complex). The other method, -methodTwo:(BOOL)flag, has an if statement with ten lines of code in each branch. Fully covering -methodOne needs only one test, but fully covering -methodTwo: needs two, because each test exercises the code in only one of the two branches of the if condition. The code coverage tool just reports how many lines are executed—the same number, twenty, in each case—so the end result is that it takes more tests to improve the code coverage of more complex methods. But it is the complex methods that are likely to harbor bugs.
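Here is a sketch of the two methods just described, with the twenty straight-line statements abbreviated for brevity; the Calculator class name and the method bodies are invented for this illustration:

```objc
#import <XCTest/XCTest.h>

// Hypothetical class illustrating the two methods in the text.
@interface Calculator : NSObject
- (NSInteger)methodOne;
- (NSInteger)methodTwo:(BOOL)flag;
@end

@implementation Calculator

// No branches: cyclomatic complexity of 1, so one test runs every line.
- (NSInteger)methodOne {
    NSInteger total = 0;
    total += 2;
    total *= 3;
    return total;
}

// One if statement: two paths, so full line coverage needs two tests.
- (NSInteger)methodTwo:(BOOL)flag {
    if (flag) {
        return 1;
    } else {
        return -1;
    }
}

@end

@interface CalculatorTests : XCTestCase
@end

@implementation CalculatorTests

// A single test fully covers -methodOne...
- (void)testMethodOne {
    XCTAssertEqual([[Calculator new] methodOne], (NSInteger)6);
}

// ...but -methodTwo: needs one test per branch to reach the same coverage.
- (void)testMethodTwoWithYes {
    XCTAssertEqual([[Calculator new] methodTwo:YES], (NSInteger)1);
}

- (void)testMethodTwoWithNo {
    XCTAssertEqual([[Calculator new] methodTwo:NO], (NSInteger)-1);
}

@end
```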
Similarly, code coverage tools don’t do well at handling special cases. If a method takes an object parameter, it’s all the same to the coverage tool whether you test it with an initialized object or with nil. Both tests may well be useful, but that doesn’t matter as far as code coverage is concerned: Either one will run the lines of code in the method, so adding the other doesn’t increase the coverage.
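For example (the LabelForItem() function here is hypothetical, written only to make the point concrete), both of the following tests execute every line of the function, so either one on its own earns the maximum coverage score:

```objc
#import <XCTest/XCTest.h>

// A hypothetical function under test, with a special case for nil.
static NSString *LabelForItem(NSString *item) {
    return [NSString stringWithFormat:@"Item: %@", item ?: @"(none)"];
}

@interface LabelTests : XCTestCase
@end

@implementation LabelTests

// Either test alone runs every line of LabelForItem(), so the second
// adds nothing to the coverage figure, even though the nil case is
// exactly where a bug is most likely to hide.
- (void)testLabelForRealItem {
    XCTAssertEqualObjects(LabelForItem(@"widget"), @"Item: widget");
}

- (void)testLabelForNilItem {
    XCTAssertEqualObjects(LabelForItem(nil), @"Item: (none)");
}

@end
```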
Ultimately, you (and possibly your customers) must decide how much risk is present in any part of the code, and how much risk is acceptable in the shipping product. Even if the test metric tools worked properly, they could not take that responsibility away from you. Your aim, then, should be to test while you think the tests are being helpful—and conversely, to stop testing when you are not getting any benefit from the tests. When asked the question, “Which parts of my software should I test?” software engineer and unit testing expert Kent Beck replied, “Only the bits that you want to work.”