The Skeptic’s Guide to Project Management, Part IV: Evidence-Based Scheduling
"The history of Enterprise Resource Planning contains both successes and failures, though the failures have been especially notable. Scott Buckhout and his colleagues reported on a study of Enterprise Resource Planning (ERP) implementations in companies with more than $500 million in revenues. The average cost overrun was 179 percent, and the average schedule overrun was 230 percent."
~Information Systems Management in Practice, 6th Edition, Pg. 349
One hundred and seventy-nine percent. A 179 percent cost overrun means the final cost was nearly three times the initial estimate, and a 230 percent schedule overrun means the project took more than three times as long as planned. How does that happen?
The book itself gives us a little insight on page 81:
When Martin presented the $11 million business case for the SAP and e-business initiatives, the CEO asked, "How long will this take to deliver?" The answer: three years. "I need it in 18 months," replied the CEO. "It's very important to our strategy. We need this foundation in place."
Now think about the poor project team in this situation. They'll be encouraged to do more with less, to be aggressive and optimistic, and, should they end up delivering earlier than their own initial estimate (say, in 35 months instead of three years, and likely with a few tears along the way), they will still be told the project is nearly one hundred percent late. Yes, the team could take on contractors, but those contractors need training, and that training takes time away from the productive work of the existing team, which means that adding more staff to a late project tends to make it later still.
It’s not hard to see where those late and over-budget projects come from.
The Problem
Absent some sort of data, the CEO's idea of how long a project should take is just as good as anyone else's. To borrow a line often attributed to W. Edwards Deming: "Without data, you're just another person with an opinion."
This article is about evidence-based project management, a technique that allows you to generate predictions for a release based on historical data—not estimates, but predictions. Along the way I'll discuss quick and easy (and light! and fluffy!) ways to manage projects with evidence, then go on to list common objections and how to overcome them.
Evidence-Based Scheduling In a Nutshell
On an ideal project, no one ever gets interrupted. The information we need to do our jobs is right at our fingertips. No one ever has to wait for a decision, everyone always has the hardware and software they need, and no one ever gets sick, takes a vacation, has to step away to support someone else, or has a bad day. When we estimate knowledge work, we tend to estimate in "ideal days."
Yet we live in the real world.
We are good at estimation but poor at predicting this additional "real world" stress. Jeffries, Hendrickson, and Anderson call this stress the "load factor" on the project and suggest that the actual time to get anything done tends to be the product of the initial estimate and the load factor; they report load factors of 2x, 3x, and sometimes even 4x as typical in an IT organization.
To find out how long things will actually take, compare your estimates to actual performance to get load factor, then adjust future estimates. This idea is the basis of evidence-based scheduling.
Notice I am not suggesting that you try to get your estimates more accurate over time. In fact, quite the opposite: I am suggesting that we give up entirely on accurate estimates. Instead, we'll estimate in "ideal engineering days," then multiply those estimates by the load factor to get something to put on the project plan.
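To make the idea concrete, here is a minimal sketch in Python of how a team might derive its load factor from finished work and apply it to a fresh estimate. The numbers and function name are invented for illustration, not part of any published method.

```python
# Deriving a load factor from past work and applying it to a new estimate.
# All numbers here are made up for illustration.
ideal_day_estimates = [3, 5, 2, 8]     # what the team estimated, in ideal days
actual_elapsed_days = [7, 11, 5, 19]   # what the calendar actually showed

load_factor = sum(actual_elapsed_days) / sum(ideal_day_estimates)  # about 2.3

def predict_calendar_days(ideal_days: float) -> float:
    """Turn a fresh ideal-day estimate into a calendar-day prediction."""
    return ideal_days * load_factor

print(f"Load factor: {load_factor:.1f}")
print(f"A 10-ideal-day story will likely take ~{predict_calendar_days(10):.0f} calendar days")
```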
Technique #1: Evidence at the Project Level
The simplest of the evidence-based techniques, evidence at the project level requires only a bit of corporate memory. To perform it, look for past projects similar in size to your own and find out how late each one was, as a percentage. Once you have four or five examples, average that percentage of lateness. To create a complete project plan, add that percentage of extra time to the end of your existing schedule. If you don't have that data (say, at a brand-new company, or because you are a new hire working in isolation), you might consider the free Riskology tool developed by Tom DeMarco and Tim Lister.
This method produces a schedule similar to one created with a Critical Chain approach. The two start from different foundations, arrive at much the same place, and can be used together.
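Here is a short sketch of the arithmetic behind Technique #1, assuming five hypothetical past projects; the overrun percentages and schedule length are invented for illustration.

```python
# Technique #1: average the historical schedule overrun of similar projects
# and pad the current plan by that percentage. Data is hypothetical.
past_overruns_pct = [40, 85, 25, 60, 50]   # percent schedule overrun on similar past projects

average_overrun = sum(past_overruns_pct) / len(past_overruns_pct)  # 52.0
planned_weeks = 26                          # the current, optimistic schedule
buffered_weeks = planned_weeks * (1 + average_overrun / 100)

print(f"Average historical overrun: {average_overrun:.0f}%")
print(f"Plan on roughly {buffered_weeks:.0f} weeks rather than {planned_weeks}")
```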
Technique #2: Points (or Gummy Bears)
Accounting for risk at the project level is a good thing. If things start to go haywire within a project, the method provides a buffer at the end, but it doesn't really help the team steer to success. If you have a way to measure performance of tasks within the project, you can use those performance numbers to predict the outcome of that project itself using a technique called story points.
In order to get to story points, we have to do a fair bit of work. First, we slice the features very thin, into something like a User Story. Each "story" describes a feature, or an action a user might take, along with all the tasks needed to get that story complete, from experience design through programming and test. If we express the entire system as User Stories, we can build the system piece by piece, bringing the work to production quality every few weeks. This breaks the work down into micro-projects, called iterations, each anywhere from a week to two months long. If we track the number of "ideal days" for each story, we can calculate the number of ideal days the team accomplishes per iteration.
Once we finish an iteration, we track the number of story points accomplished, and that becomes the prediction for the next iteration. We call this technique "yesterday's weather," based on the old rule of thumb that the best predictor of today's weather is yesterday's. (Some teams use a three-iteration rolling average of points to smooth out vacations and other outliers.)
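A minimal sketch of "yesterday's weather," including the optional rolling average; the per-iteration point totals below are hypothetical.

```python
# "Yesterday's weather": forecast the next iteration's capacity from the
# points finished in recent iterations. Data is hypothetical.
completed_points = [14, 12, 10, 8, 7, 10, 5]   # points finished in each past iteration

def forecast_next_iteration(history: list[int], window: int = 3) -> float:
    """Average the most recent iterations to predict the next one."""
    recent = history[-window:] if len(history) >= window else history
    return sum(recent) / len(recent)

print(f"Last iteration alone: {completed_points[-1]} points")
print(f"Three-iteration rolling average: {forecast_next_iteration(completed_points):.1f} points")
```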
There is one problem with accomplishing only a fraction of the estimated "ideal days": management and customers tend to expect the ratio to be one to one. My colleague Catherine Powell told me about a team that was accomplishing 2.5 "ideal days" of work per two-week iteration, and about the disappointment and conflict that caused. Most experts suggest replacing the term "ideal days" with something else, such as "points," "story points," "engineering effort," or even something silly like "Gummy Bears."
Once you know the number of points available for the next iteration, you can tell the product owners, which lets them schedule only as much work as the data says can actually be accomplished. Thus the person who sets the priorities can shuffle the cards to match the number of allocated points. It's possible that is where the term "Gummy Bears" comes from: the technical team could give twenty Gummy Bears to the product owner to "spend." (Let's hope no one on the team is hungry, lest they disappear.)
Technique #3: Apply the data
Once you have established a standard iteration length (say two weeks), the total number of points to complete the project, and the points accomplished per iteration, you’ll have something like this:
Iteration | Points Required | Points Completed
1 | 250 | 14
2 | 250 | 26
3 | 250 | 36
4 | 250 | 44
5 | 250 | 51
6 | 250 | 61
7 | 260 | 66
8 | 260 | 71
9 | 260 | 83
10 | 260 | 94
11 | 260 | 101
12 | 285 | 111
13 | 285 | 121
With this information in hand, we can calculate velocity (the average number of points accomplished per iteration) and average scope creep (the number of points now required, minus the original requirement, divided by the number of iterations). Using these numbers, we can predict an optimistic number of remaining iterations as well as a pessimistic number that assumes scope creep continues:
Velocity | 121 / 13 | 9.3
Work Remaining | 285 - 121 | 164.0
Optimistic Iterations Remaining | 164.0 / 9.3 | 17.6
Typical Scope Creep | (285 - 250) / 13 | 2.7
Pessimistic Iterations Remaining | ((2.7 * 13) + 164.0) / 9.3 | 21.4
Plug calendar dates into those iteration counts and you can predict the project end date. This moves the discussion from one of opinion to one of management: if the sponsor wants to ship the project earlier, they will have to control scope, not apply pressure.
These kinds of hard numbers create evidence for project performance, hence the name: evidence-based scheduling.
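For readers who prefer to see the arithmetic run end to end, here is a small sketch that reproduces the calculations above and converts the result into calendar dates. The two-week iteration length and the choice to count forward from today are assumptions made for illustration.

```python
# Technique #3 arithmetic using the table above, plus a simple conversion
# from remaining iterations to calendar dates.
import math
from datetime import date, timedelta

points_required  = [250, 250, 250, 250, 250, 250, 260, 260, 260, 260, 260, 285, 285]
points_completed = [14, 26, 36, 44, 51, 61, 66, 71, 83, 94, 101, 111, 121]

done = len(points_completed)
velocity = points_completed[-1] / done                            # ~9.3 points per iteration
work_remaining = points_required[-1] - points_completed[-1]       # 164 points
scope_creep = (points_required[-1] - points_required[0]) / done   # ~2.7 points per iteration

optimistic = work_remaining / velocity                            # ~17.6 iterations
pessimistic = (work_remaining + scope_creep * done) / velocity    # ~21.4 iterations

iteration_length = timedelta(weeks=2)   # assumed two-week iterations
today = date.today()                    # assumed "counting starts now"
print(f"Optimistic finish:  {today + math.ceil(optimistic) * iteration_length}")
print(f"Pessimistic finish: {today + math.ceil(pessimistic) * iteration_length}")
```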
Technique #4: Track Completed Work-items
Doing estimates for each story in story points and calculating velocity takes time. That time takes away from essential practices like figuring out what to build, building it, then figuring out if it works. Wouldn't it be nice if there were a better way?
It turns out there is, with one caveat. If your team can size the work so that each piece takes about the same effort as any other piece, then you can count the number of work items accomplished per week and use that as a rough approximation for the next week. I've had a lot of success with this approach when working on maintenance projects, and it is the type of measurement that Kent Beck and Martin Fowler recommend in their book Planning Extreme Programming.
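Here is a minimal sketch of that counting approach; the weekly throughput and backlog size are assumed values, not data from any real project.

```python
# Technique #4: count similar-sized work items finished per week and use
# the recent rate to forecast the remaining backlog. Data is hypothetical.
items_closed_per_week = [9, 11, 8, 10]   # work items finished in each recent week
backlog_items = 47                        # similar-sized items still to do

weekly_rate = items_closed_per_week[-1]   # "yesterday's weather" again
weeks_remaining = backlog_items / weekly_rate

print(f"At about {weekly_rate} items per week, expect roughly {weeks_remaining:.0f} more weeks")
```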
The risk here is that the effort doesn't fall into a nice bell curve, or that the team spends so much time sizing the work that they lose the time they saved by not estimating. If you want to try this approach, I suggest tasks of no more than one day in duration, and don't fall into the trap of breaking the work up into separate testing, development, and requirements tasks. At the end of the project, the goal is a working system, not a large stack of completed tasks.
Objections to Evidence
Before I even typed up this section, in the back of my mind, I saw Jeff, that guy who hates change. (On your team, he might have a different name. In my life, his name has been Jeff.) I can even hear his words now, nearly a decade later, and predict what Jeff will say: "In theory, of course, this sounds great, but it will never work in practice."
You probably have a Jeff in your organization. Jeff won't be satisfied by logic, but he might be satisfied by data. The thing about data is that someone needs to collect it, and it needs to be accurate.
Collecting the data means work. If you collect data at the task level, someone needs to enter numbers into a spreadsheet, or keep sticky notes, or something else. All of this detracts from the essential work. Data collection causes immediate, certain, short-term pain, a kind that is much stronger than the delayed, uncertain benefits of keeping those numbers. That is the same sort of feedback loop that keeps people addicted to nicotine, overeating, and other bad habits, so don't discount it. (To overcome this, find ways to make the data gathering cheap; for example, use a project management tool that can generate graphs automatically from team progress.)
Accurate data is also critical. One source, who requested anonymity, told me how projects were tracked at his company. The team had to work on many projects, and it was important that each project be "on time" and "on budget." To keep projects on time, the team entered only the budgeted number of hours into the project-tracking software and logged the rest as "administrative work." As a result, every project looked on time, all the time! Of course, this kind of error corrupts the data and makes it useless. Project plans built on this kind of "historical data" are not evidence-based scheduling; they are more like a complex illusion. (You might not fall for this trap, but I have seen plenty of companies compare estimates to actuals yet fail to consider little things like vacations, team meetings, and sick days.)
A final common objection is in the measurement system itself. Terms like "points" and "velocity" imply some sense of rigor and measurement. These measures are really, at best, just the averaged opinion of the team. If the team wants to "prove" it is going faster, it's easy enough to double or triple the point count for all the stories in the next iteration, or slice the stories so they become smaller and smaller over time.
The key here is to remember that points can be a measure to predict performance. If more points become a goal instead of a measure, human nature will kick in and the purpose of the measurements will be lost. I knew a programmer who was going to be measured by how many change controls he did per week, so he entered a different change control ticket for each file that would be touched for each change. That measurement system didn't last long.
It’s worth saying twice: you use these measurement systems to learn about team performance and make it more visible. Using them to evaluate and control performance leads to dysfunction. (For a more appropriate way to measure performance, consider the auditing techniques Dr. Robert Austin recommends in his book Measuring and Managing Performance in Organizations.)
Likewise, if the hidden goal of management is to use an unrealistic schedule to create a sense of desperation and inspire performance, the evidence-based approach isn't going to be compatible with that. The counter to this behavior is to point back to the data. When people complain that the project is "late," go back to the data and ask: "What can we change? Scope? Staffing? Delivery dates? Because this trend line isn't lying. If your goal is (date), something needs to change; wishing won't make it so."
Finally: A Word of Caution
Evidence-based scheduling is not perfect. For one thing, it is entirely based on historic performance. If your company moves into an entirely different business arena, hires new staff who have never worked together before, and tries to implement a new technology no one has ever used, all bets are off for how well your team will perform. You can still gather evidence within that project, though, and could have a valid burn-up chart within a month or two.
Because the method is based on historic data, it is always possible that something will go wrong that has never gone wrong before. Author Nassim Nicholas Taleb calls these "Black Swans," suggesting that they are hard to predict in advance, seem obvious in retrospect, and tend to do a great deal of damage to companies trying to predict the future with mathematical models.
Black Swans can, and likely will, happen on projects. The core development team might catch a cold the week before go-live, the contract vendor you are working with might go out of business the month you really need them, or the cloud hosting provider you chose for its massive scale and redundancy might just have an outage.
These things happen. What evidence-based scheduling does, at its best, is give you some amount of confidence in your estimate—some reason that it is not "just your opinion."
And that can make a world of difference.