Monitoring Test Execution: Bug Advocacy and Finding the Right Test Metrics
Monitoring test execution is difficult to do and harder to do well. When I determine what I should be monitoring I base it on two things: what I need to manage my test project, and what the people I report to need to manage their projects. These can be especially difficult to determine for new projects, new types of projects, new project teams, agile projects, projects practicing rapid testing, or projects with a lot of para-functional testing (performance, security, usability, etc.). I know (most of the time) what information I need, but it’s always a struggle to figure out what’s helpful to everyone else. I find the best way to do this is to simply ask.
I was recently talking with a project manager and I asked him, “What types of information do you like to know during test execution?” His response was interesting in that it was made up of two very different parts. His first and primary concern was bug advocacy. The second thing he wanted was a test execution dashboard. In this article, I will take a closer look at what good bug advocacy is and then I’ll list some practical items to include in your execution dashboard.
Bug Advocacy
When I asked the project manager what he wanted, about three minutes into his fifteen-minute answer to my question (he’s a project manager after all), it was plain to see that he wanted someone involved with the test project to be the bug advocate. He wanted the test manager (or some other designated hitter) to hold a working knowledge of the current bugs in the system and to be able to articulate the risks, implications, and high-level details at any given moment.
Now, I’ll grant that’s a tall order to ask of anyone. If a system has 50 or more active bugs (and most have hundreds), how could anyone know about all of them? When I asked him about this he clarified his requirement. What he really wanted was someone who knows about all the new bugs in the system. In his context, the key project stakeholders meet every day during test execution cycles for daily defect status reports. His requirement stems from having to deal with blank looks and quiet mutters when he asks for more information on bugs. In his view, it’s difficult to prioritize and assess risk, assign developers, and look for trends when you don’t really know more than the headline for a defect. I think he might have a point.
So what does a bug advocate look like? Well, Cem Kaner happens to have a wonderfully rich guide to bug advocacy titled Bug Advocacy: How to win friends, influence programmers, and stomp bugs. I don’t think I can say it better, but for those looking for an immediate information fix, I might be able to say it shorter. I would encourage you to share Cem’s article with whoever the bug advocate is on your project.
In the project manager’s context, a bug advocate is someone who knows how to read a bug report and translate it into meaningful information for his or her various audiences. When talking with a project manager, the advocate might list some of the risks associated with the bug and how it relates to other problems that have been found. When talking with a development manager, the advocate may concentrate on some of the technical issues of the bug: how it was found, what tools were used to find it, and what the specifics of the error were. And when talking with a requirements manager, the advocate may focus on some of the implications of the bug (or the bug fix) for the requirements and any rework that may result.
By having someone who can speak intelligently about all of the new bugs for the project, the management team can make better-informed decisions (which is often a good thing) and hopefully can finish the meeting faster, allowing everyone to move on to more important tasks. A bug advocate should also be trained in identifying trends and should be able to articulate those trends as he or she finds them.
So that’s the first half of monitoring test execution: effective communication of the defects that are found during the test cycles. In the next section we will look at some tips for how to determine how far you have gone with your testing and how much farther you have to go.
Execution Dashboard
It would seem that in testing we have no shortage of metrics. While talking with my friendly local project manager about the metrics he preferred, we digressed into a discussion on metrics in general and why some are good, some are bad, and some don’t really matter at all. The problem with most of the metrics used in testing is that they are all open to interpretation. Or worse, any one metric by itself may be misleading.
The topic of the validity of any given metric could be a very large article all by itself. In addition, several people smarter than I have already written on the topic. Therefore, in this article we will just look at some of the more common metrics I have seen used and I will let you work out whether they are right for you. I have used all of these metrics at one point or another with varying degrees of success. All I can say is that they are all very context specific. If one is not working, drop it. If one seems like it might be meaningful, try it. Figure out what works for you and the people to whom you report.
An execution dashboard should provide ongoing evaluation and assessment of items relating to the application-under-test and the test project as a whole. Ideally, you will be looking to record the information necessary to diagnose the status of your testing effort and to give you a feel for the overall health of the application you are testing. With any of these metrics, we are trying to make ongoing summary evaluations of the perceived quality of the product.
Here are some of the testing metrics that I have included in my dashboard in the past:
- Requirements coverage
- Code coverage
- Session based coverage
- Planned vs. Executed
- Pass vs. Fail
- Defect submission rate, density, aging, verification rate, and trending
Requirements Coverage
This metric looks at your testing effort in terms of the requirements you are testing. It is typically only useful if all the requirements are completely cataloged, and it is often not practical unless you use some tool to assist in gathering it. Given that some cheap tools for this (and some expensive ones) can be found, I really can’t see why a team would want to do this manually.
This metric is often associated with a traceability matrix. A traceability matrix is an artifact that allows you to “trace” or link your requirements to your test cases (it’s this step that’s typically automated). In theory, one would desire 100% traceability. That is, for every requirement there is at least one test case that tests that requirement.
Once the traceability has been established, it might then be desirable to cross-reference test execution results with those requirements. So after a day of testing, one might be able to say, “Today we executed 30% of our test cases and 70% passed. Of the 30% that failed, the following requirements were found to be implemented incorrectly…”
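As a rough illustration of the idea, here is a minimal Python sketch of a traceability matrix cross-referenced with a day's execution results. The requirement IDs, test case IDs, and function names are all invented for the example; a real requirements tool would supply this data in its own format.

```python
# Hypothetical traceability matrix: requirement ID -> test case IDs that test it.
traceability = {
    "REQ-1": ["TC-1", "TC-2"],
    "REQ-2": ["TC-3"],
    "REQ-3": [],  # nothing traces to this requirement yet
}

# Hypothetical execution results; test cases not listed here were not run.
results = {"TC-1": "pass", "TC-2": "fail", "TC-3": "pass"}


def requirements_coverage(traceability):
    """Fraction of requirements that have at least one linked test case."""
    if not traceability:
        return 0.0
    covered = sum(1 for tests in traceability.values() if tests)
    return covered / len(traceability)


def requirements_with_failures(traceability, results):
    """Requirement IDs linked to at least one failed test case."""
    return sorted(
        req
        for req, tests in traceability.items()
        if any(results.get(tc) == "fail" for tc in tests)
    )


print(f"Traceability: {requirements_coverage(traceability):.0%}")
print("Requirements with failing tests:", requirements_with_failures(traceability, results))
```

A failed test case only points at a requirement worth investigating; whether the requirement is actually implemented incorrectly still takes a person reading the bug.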
Code Coverage
This metric is one of the older and better-known metrics for testing. Code coverage metrics are based on the premise that if a line of code is never executed, then you cannot have found the bugs in it. If you're new to working with code coverage (or even if you’re not), I would recommend Brian Marick's article How to Misuse Code Coverage.
Again, there are many good free (and expensive) tools available to help with the gathering of this metric. Most of them are self-explanatory, and I would recommend using one for reasons other than gathering test metrics. Aside from providing a sometimes-useful metric, I find them to be great in helping isolate certain types of bugs.
Session Based Coverage
Session based coverage is based on the practice of session-based test management. Session-based test management is a method for measuring and managing exploratory testing, and was developed at Satisfice Inc. The Satisfice website contains all sorts of material on the topic, and if you poke around enough you’ll find a section on session metrics.
In their session-based test management toolkit, they provide a method of reporting how test sessions map to coverage areas in the application. In session-based exploratory testing, a tester works in sessions (lasting a predefined length of time) that focus on a particular area of the application being tested (or on a particular type of testing). A coverage report based on sessions covered would then be an indication of test coverage from a tester's perspective: “What did I test today?”
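To make the idea of mapping sessions to coverage areas concrete, here is a small Python sketch. It is not the Satisfice toolkit or its report format; the session records, area names, and the session_coverage function are assumptions made up for illustration.

```python
# Hypothetical session log: one record per completed test session.
sessions = [
    {"tester": "A", "area": "login", "minutes": 90},
    {"tester": "B", "area": "reports", "minutes": 60},
    {"tester": "A", "area": "login", "minutes": 45},
]

# Hypothetical list of coverage areas the team agreed to track.
areas = ["login", "reports", "admin"]


def session_coverage(sessions, areas):
    """Count sessions and total minutes per coverage area.

    Areas with no sessions are kept in the summary so gaps stay visible.
    """
    summary = {area: {"sessions": 0, "minutes": 0} for area in areas}
    for session in sessions:
        entry = summary.setdefault(session["area"], {"sessions": 0, "minutes": 0})
        entry["sessions"] += 1
        entry["minutes"] += session["minutes"]
    return summary


for area, totals in session_coverage(sessions, areas).items():
    print(f"{area}: {totals['sessions']} session(s), {totals['minutes']} min")
```

Keeping the untouched areas in the output is deliberate: the zero rows are often the most useful part of the report.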
Planned vs. Executed
This metric is the ratio of test cases (or scripts, depending on terminology) executed to the number planned for a given period. That is, if I planned 100 test cases, and I executed 87 test cases, I have an 87% test execution rate. Used in conjunction with the other metrics listed, this one can often be useful in adding context to what the others mean.
This metric can also be useful in determining how much more testing you have to execute in terms of planned test cases (assuming you have an estimate of how long it takes to execute each remaining test case). If I have executed 50% of my testing and most of my test cases take the same amount of time to execute, all other things being equal, it should take me roughly the same amount of time to execute the remaining 50%. As a warning, all other things are rarely equal.
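The arithmetic above can be sketched in a few lines of Python. The function names and the 40-hour figure are assumptions for the example, and the estimate leans on the same simplification as the text: that every remaining test case takes about as long as the ones already run.

```python
def execution_rate(planned, executed):
    """Fraction of planned test cases that have been executed."""
    if planned <= 0:
        raise ValueError("planned must be positive")
    return executed / planned


def estimate_remaining_hours(planned, executed, hours_spent):
    """Naive estimate: assumes remaining test cases take the average time so far."""
    if executed <= 0:
        raise ValueError("need at least one executed test case to estimate")
    hours_per_test = hours_spent / executed
    return (planned - executed) * hours_per_test


# The example from the text: 100 planned, 87 executed.
print(f"Execution rate: {execution_rate(100, 87):.0%}")

# Halfway through after a hypothetical 40 hours of execution.
print(f"Estimated hours remaining: {estimate_remaining_hours(100, 50, 40):.1f}")
```

Blocked test cases, retests after fixes, and environment downtime are not modeled here, which is exactly why the estimate should be treated as a starting point.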
Pass vs. Fail
Much like execution coverage, pass/fail rates are fairly intuitive. Again, this metric is useful to add context to the other metrics. On its own it’s not very useful, but add it to defect metrics, execution metrics, or some of the other metrics and you might have a nice indicator for project health.
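Here is a short Python sketch of why the pass rate needs that context. The summarize function and the numbers are invented for illustration: a 95% pass rate reads very differently once you see that only 10% of the planned test cases have been run.

```python
def summarize(planned, passed, failed):
    """Report the pass rate alongside how much of the plan has been executed."""
    if planned <= 0:
        raise ValueError("planned must be positive")
    executed = passed + failed
    if executed == 0:
        raise ValueError("no test cases executed yet")
    return {
        "executed_pct": round(100 * executed / planned, 1),
        "pass_pct": round(100 * passed / executed, 1),
    }


# Hypothetical early-cycle numbers: 200 planned, 19 passed, 1 failed.
print(summarize(200, 19, 1))
```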
Defect Submission Rate, Density, Aging, Verification Rate, and Trending
These metrics focus on bugs. In general, I advocate these the most, simply because they tend to get people looking at bugs instead of metrics. Most defect-tracking tools provide automated methods of gathering all of these metrics.
Submission rates and verification rates should be self-explanatory. Defect density looks at the number of defects as a function of one or two defect attributes (such as distribution over functional area or quality risk compared to status or severity). Defect trends look at defect counts shown as a function over time. And defect aging reports are a special defect density report in which the defect counts are shown as a function of the length of time a defect remained in a given status (open, new, waiting-for-verification, etc.).
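Most defect-tracking tools will produce these reports for you, but the definitions are easy to see in a small Python sketch. The Defect record, its field names, and the sample bugs are assumptions for the example and do not reflect any particular tool's export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Defect:
    id: str
    area: str           # functional area
    severity: str
    status: str
    submitted: date     # date the defect was submitted
    status_since: date  # date the defect entered its current status


# Hypothetical defects for illustration.
defects = [
    Defect("BUG-1", "checkout", "high", "open", date(2024, 1, 2), date(2024, 1, 2)),
    Defect("BUG-2", "checkout", "low", "waiting-for-verification", date(2024, 1, 3), date(2024, 1, 8)),
    Defect("BUG-3", "search", "high", "open", date(2024, 1, 9), date(2024, 1, 9)),
]


def density(defects):
    """Defect counts as a function of two attributes: area and severity."""
    return Counter((d.area, d.severity) for d in defects)


def submissions_per_week(defects):
    """Trend: defects submitted per ISO (year, week)."""
    return Counter(tuple(d.submitted.isocalendar())[:2] for d in defects)


def aging(defects, today):
    """Defect counts by (status, whole weeks spent in that status)."""
    return Counter((d.status, (today - d.status_since).days // 7) for d in defects)


today = date(2024, 1, 10)
print("Density:", dict(density(defects)))
print("Submitted per week:", dict(submissions_per_week(defects)))
print("Aging:", dict(aging(defects, today)))
```

Whatever form the report takes, the counts are only a prompt to go and read the bugs behind them.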
Don’t Lose Focus
Remember the project manager who sent us down this path? He sent us down this path because this is the information he wants when we execute our testing. As you look at these metrics (and at bug advocacy, a critical part of test execution), think about how you can serve the project members who are using the testing services, but don’t forget about yourself. Make sure you think about what information is meaningful to you and gather that information as well. You have your own test project (and possibly test group) to run, and you might be interested in different information.
Finally, I would encourage you (if you have not done so already) to talk to your project manager and ask what he or she is looking for from you in terms of advocacy and metrics. You might also check with any development managers or other stakeholders who might need to use your information. A quick question or two can save tons of work (or rework) and keep you from gathering the wrong information or no information at all.