Sight Unseen: Pro Tips to Supercharge Your Automated Tests
Testing is an interesting subject. Everyone pays lip service to it, but I suspect that secretly no one wants to do it. I'm specifically talking about writing automated tests. Much of the available literature focuses on testing frameworks (xUnit, QuickCheck, Selenium, and so on) or methodologies (test-driven development, functional testing), but not much on testing techniques. This may sound reasonable, but by comparison the literature on writing production code is considerably richer: you can find all kinds of books and articles on design patterns, architecture, and algorithms. But apart from some pedantic stuff about mock versus stub objects, I haven't really come across a lot on the techniques of testing. I've always found learning a new technique to be far more valuable than learning a new framework.
Until a few years ago, I had pretty much assumed that I knew all there was to know about testing. It was a chore that simply had to be endured, with things like test-driven development (TDD) being occasional, interesting distractions. However, since then I've come to realize that what I don't know far outweighs what I do know. Visual testing is a technique I picked up from watching and imitating brilliant engineers over the years. While it may not be revolutionary, I've found it incredibly useful when attacking difficult testing problems. This article explains why.
Comparing Strings
Like many good techniques, visual testing is largely about giving you clear, concise, and exhaustive information about what happened. Here's a simple example:
@Test
public void sortSomeNumbers() {
  assertEquals("[1, 2, 3]", Sorter.sort(3, 2, 1).toString());
}
This test asserts that my program, Sorter, correctly sorts a list of three numbers. But the test is comparing strings, rather than asserting order in a list of numbers.
Since we're only testing string equality, it doesn't really matter if Sorter.sort() returns a list, an array, or some other kind of object, as long as its string form produces a result that we expect. This capability is incredibly powerful for a couple of reasons:
- You can instantly see when something is wrong by simply diffing two strings.
- You're free to change your mind about the underlying logic (repeatedly), and your test remains unchanged.
You might argue that the second point is achieved with a sufficiently abstract interface; this is largely true, but in many cases it's quite cumbersome. (Particularly with evolving code, I've found it quite painful.) And refactoring tools only take you so far. Using strings neatly sidesteps this problem.
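To make that flexibility concrete, here's one possible Sorter sketch; the class and its varargs signature are just guesses inferred from the test above, not a definitive implementation. It happens to return a List today, but it could return anything whose string form matches.

import java.util.Arrays;
import java.util.List;

public class Sorter {
  // One possible implementation: sort a copy and return it as a List.
  // The test only cares that the result's toString() is "[1, 2, 3]".
  public static List<Integer> sort(Integer... numbers) {
    Integer[] copy = numbers.clone();
    Arrays.sort(copy);
    return Arrays.asList(copy);
  }
}

If I later rework sort() to return, say, an immutable collection or a custom result type, the test keeps passing as long as the string form stays the same.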
Testing Parsers
Visual testing is primarily useful when you want to assert that things have happened in a particular sequence. The natural application is in parsing code. Parsers are usually complex to write. Reasoning about them and debugging them can sometimes be a nightmare, because parsers accept an incredibly wide range of inputs, with very strict rules about the output they produce, and these rules are often recursive or combinatorial in nature.
Keeping a running mental model of such code is a real challenge, so debugging parsers is no picnic. Here visual testing is a real boon:
public Node computation() {
  Node node = chain();

  // Production failed.
  if (null == node)
    return null;

  Computation computation = new Computation();
  computation.add(node);

  Node rightOp;
  Node operand;
  while ((rightOp = rightOp()) != null) {
    operand = group();
    if (null == operand)
      operand = chain();
    if (null == operand)
      break;

    rightOp.add(operand);
    computation.add(rightOp);
  }

  return computation;
}
This function parses binary expressions of the following form:
computation := chain (comprehension | (rightOp | chain) )*
Without going into too much detail, this grammar would parse strings of the following forms:
1 + 2
x - y
x * a.b()
...
The output of a parser is actually an intermediary form known as a parse tree, which resembles a graph of nodes that correspond to individual productions in the original text. (Check out the Kiwi Community's nice parse tree example.)
Since a parse tree is generally not the end result we want, it's not always easy to test. For example, a calculator program would use the parse tree as an intermediate form to compute values in an expression:
expression -> parse tree -> walk tree/compute -> output
If we plug in some values, it might look like this:
3 + 3 -> (comput (. 3) (+ (. 3))) -> [left (+) right branch] -> 6
Before we get to the end result, we need to make sure that our parser has correctly processed the input string and generated a valid parse tree.
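For the examples that follow, imagine the Nodes built by computation() have a minimal shape along these lines; this is a simplified sketch for illustration, with hypothetical field and accessor names, not the real parser's class.

import java.util.ArrayList;
import java.util.List;

// A simplified parse-tree node: a symbol plus an ordered list of children.
public class Node {
  private final String name;
  private final List<Node> children = new ArrayList<Node>();

  public Node(String name) {
    this.name = name;
  }

  public void add(Node child) {
    children.add(child);
  }

  public String name() {
    return name;
  }

  public List<Node> children() {
    return children;
  }
}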
The canonical method of testing a parser is to walk the parse tree much as you would when processing it normally; for our calculator, that would be similar to the step of computing a result. This kind of test is very tedious to write, as you must walk the entire tree and assert that each node is in the correct place, and you must do this for every permutation of your test data. In other words, you have to do almost the same work that the calculator itself does (walking the parse tree and processing it node by node) for every expression that you want to test.
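To see why this gets tedious, here's roughly what a walking-style test for the expression 1 + 2 might look like, using the simplified Node sketch above and a hypothetical Parser entry point:

@Test
public final void sumParsesByWalkingTheTree() {
  Node computation = new Parser("1 + 2").computation();

  // Locate and pin down every node by hand...
  Node left = computation.children().get(0);
  assertEquals(".", left.name());
  assertEquals("1", left.children().get(0).name());

  Node plus = computation.children().get(1);
  assertEquals("+", plus.name());
  Node right = plus.children().get(0);
  assertEquals(".", right.name());
  assertEquals("2", right.children().get(0).name());

  // ...and then repeat all of this for every permutation of test data.
}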
Of course there are some ways around this requirement. You can satisfy yourself with tests of certain invariants: assert that a binary operator (+, -, *, /) is always followed by a chain (1, 2, x, y), or something like that. This approach nearly works, but has serious limitations, and it runs somewhat contrary to how we think. It would be much easier to handle the test in terms of a desired end result, such as this expression:
2 * x / a.b() * 6
We could work backward from there. That option also better matches how bugs are handled in production: you receive bug reports about a specific failure case, and you want to make sure that the string is always parsed correctly henceforth by capturing it in a regression test.
Walking the parse tree can compound these issues by binding up a lot of test code to the structure of your data model. This problem goes back to the flexibility argument I made earlier: if you want to change the type of each Node or its contents to reflect new requirements (for instance, to support complex numbers rather than doubles), you can't do so easily without going back and changing the entire swath of tests that already exist for your parser. (And it's inordinately worse if you want to merge or split Nodes.)
Visual testing is a great help in this case. One scheme I'm fond of using is nominating a symbol for every type of Node and then generating a string from the resulting parse tree. Here's a sample test case from one of my projects:
@Test
public final void parseSimpleExpression() {
  compare("(computation (. a b c) (+ (. d e)))", "a.b.c + d.e");
}
This test asserts that the string form of my parse tree (whose input is on the right side) matches the expected test string (on the left side). In this case, parse trees break down into S-Expressions, which are basically lists of symbols enclosed in parentheses that represent branches of the parse tree. For instance, the expression a.b.c is broken down into the dereference chain (. a b c). You might recall a dereference chain from familiar languages such as Java or Ruby, where the variable c is resolved as a member of b, which is itself resolved as a member of variable a.
Similarly, the (+ (. d e)) grouping tells us that everything appearing inside this group should be added to the preceding expression. (Any Lisp fans out there? Note that these S-Expressions don't directly equate to Lisp forms; rather, they're a representational aid.)
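To give a flavor of how this scheme can be implemented, here's a sketch of a recursive renderer that emits the S-Expression form, again using the simplified Node from earlier, along with the sort of compare() helper the test above relies on. Both are illustrative assumptions, not my actual project code.

// Added to the simplified Node sketch: render this subtree as an
// S-Expression. Leaves print as bare symbols, branches as
// "(symbol child child ...)".
public String toSExpression() {
  if (children().isEmpty()) {
    return name();
  }
  StringBuilder out = new StringBuilder("(").append(name());
  for (Node child : children()) {
    out.append(' ').append(child.toSExpression());
  }
  return out.append(')').toString();
}

// The compare() helper in the test might then reduce to little more than:
private void compare(String expected, String input) {
  assertEquals(expected, new Parser(input).computation().toSExpression());
}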
Now imagine that you wanted to change from having generic binary operators to using specific numeric operators for each kind of computation. Your parse tree Nodes would change as follows:
BinaryOp -> PlusOp, MinusOp, DivideOp, MultiplyOp
And so on.
If you hadn't planned for this development, at the very least it would be a laborious change to all existing tests, and at worst it could cost you days of refactoring. In our visual test, however, nothing needs to change, because the symbols we use correctly denote the desired result. It's also extremely easy to see what went wrong in a very large parse tree, simply by looking at the difference between two strings.
Good IDEs will even highlight the errors for you when a test fails.
Concurrency
The parser use case was a real eye-opener for me. Since then, I've found visual testing to be indispensable, and the speed with which I can produce valid, useful test cases for my parsing code has risen considerably. Where visual testing really shines is in testing multithreaded code.
We might generalize our experience with parsers by saying that visual testing is a useful technique when working on any sort of code that involves a progression of states. In our parse tree, these states were encountered as we processed each subsequent part of an input. In that sense, visual testing acted as a semantic record of our running code.
This general idea also can be applied fruitfully to code that runs in several threads. It's particularly helpful when you want to ensure a "happens before" relationship between two events; in other words, when you want events to occur in a predetermined sequence. A string of text is quite naturally a match for such a sequence, because it's just a sequence of symbols itself:
@Test
public final void multipleThreadsOrderTest() {
  final ConcurrentLinkedQueue<String> result = new ConcurrentLinkedQueue<String>();

  // Multi-threaded service.
  MyOrderlyService my = new MyOrderlyService();
  my.registerCallback(new Runnable() {
    public void run() {
      result.add("1");
    }
  });
  my.registerCallback(new Runnable() {
    public void run() {
      result.add("2");
    }
  });

  my.perform(Executors.newFixedThreadPool(10 /* threads */));

  assertEquals("[1, 2]", result.toString());
}
This test verifies that MyOrderlyService fires two callbacks in the correct order. The service may do a number of things in parallel using the given thread-pool, but the idea is that we can easily verify that our callbacks are called in the order in which they were registered. This isn't a foolproof regression test, of course, especially given that multiple interacting threads can create nondeterministic outcomes. But it's still very useful. In practice, these kinds of tests can catch a number of subtle bugs fairly early.
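MyOrderlyService itself isn't the point of the technique, but to make the example self-contained, a bare-bones sketch might look like the following. The names and structure are assumptions; a real service would presumably fan other work out across the supplied pool, which is elided here.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;

// A bare-bones sketch of a service that fires callbacks in registration order.
public class MyOrderlyService {
  private final List<Runnable> callbacks = new ArrayList<Runnable>();

  public void registerCallback(Runnable callback) {
    callbacks.add(callback);
  }

  public void perform(ExecutorService pool) {
    // Parallel work would be submitted to the pool here (elided). The
    // callbacks themselves run sequentially, preserving their order.
    for (Runnable callback : callbacks) {
      callback.run();
    }
    pool.shutdown();
  }
}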
Furthermore, the idea that you can inspect a string and see the order of events in it is very compelling. Simply examining the string can often tell you exactly where to look in your code for a bug. For example, if we had three callbacks and the test consistently returned "[3, 2, 1]", we could be fairly confident that the synchronization logic is correct, but that perhaps the ordering logic is wrong.
Conclusions
So far, I've found visual testing to be remarkably useful. The natural domain of problems that visual testing can solve is also considerably large. Even in those cases where you might not think visual testing is all that useful, it can be a surprising help; with a simple number-sorting algorithm, for example, a quick inspection of a sequence of numbers will tell you exactly where your test is going wrong. Of course you don't want to rely on it too much; if you're writing visual tests for comparing dates, you should probably stop.
I hope that this article encourages you to think about testing techniques at large, more than just the one technique I've described here, and even to seek out your own techniques. Rather than frameworks, coverage metrics, or methodologies, it is testing techniques that will genuinely improve the quality of the code we write and the products we build, and provide real delight to the profession of engineering.