Analysis for Continuous Delivery: Five Core Practices
Continuous delivery is a software development strategy that optimizes your delivery process to get high-quality, valuable software delivered as quickly as possible. This approach allows you to validate your business hypothesis fast and then rapidly iterate by testing new ideas on users. Although Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation focuses on engineering practices, the concept of continuous delivery has implications for the whole product-delivery process, including the "fuzzy front end" and the design and analysis of features.
Here's the general principle: Rather than coming up with a bunch of features and planning a multi-month release, come up with new ideas continually and try them out individually on users. With enough thought, even big features or large-scale changes can be implemented as a series of smaller steps to get faster feedback, with the ability to change course or stop at any point if your idea is found wanting. With a cross-functional team working to deliver these small increments in hours or days, you can be more innovative than your competition and maximize your return on investment.
In this article, I'll discuss five practices of continuous delivery that can help you to create the most efficient path from hypothesis to continuous feedback:
- Start with a minimum viable product (MVP).
- Measure the value of your features.
- Perform just enough analysis up front.
- Do less.
- Include feature toggles in your stories.
Start with a Minimum Viable Product (MVP)
"If you are not embarrassed by [the first version of your product], you've launched too late!" Reid Hoffman, cofounder and chairman of LinkedIn (see "Ten Entrepreneurship Rules for Building Massive Companies").
If the start of your project is marked by a big requirements document landing on the project manager's desk, you've already failed. One of the key ideas that the Lean Startup movement has popularized is the minimum viable product (MVP), defined as the least possible amount of work you can do to test your business hypothesis.
Of course, people in manufacturing have been producing minimum viable products for decades; they're called prototypes. As with prototypes, you don't need to show the whole world your minimum viable product; you could expose it to a select group of beta users. It might not even be working software: you could create a pretotype instead, to gather data without writing a line of code.
The final version you reveal to the wider world may be a much more polished product, if that's important to your target audience. One company releases its MVP iPhone apps under a different brand. The point is simply to get statistically significant feedback on whether your business plan is likely to succeed; then, as a secondary goal, prove out the delivery process.
Crucially, working out how the MVP looks requires a cross-functional team consisting of representatives from both business and technology. Roles you want on that team include user experience (UX) design, analysis, testing, development, operations, and infrastructure. (Of course, one person could potentially play multiple roles, so you don't necessarily need an enormous committee to get something done.)
Since your minimum viable product is going to take a small team just a few weeks (and under no circumstances more than a few months) to build, you don't need a lot of ceremony at this point, because you're not betting the company or spending big wads of cash.
Measure the Value of Your Features
"Metrics are part of your product." John Allspaw, VP of Technical Operations, Etsy (see "Building Resilience in Web Development and Operations").
Another key lean concept is validated learning, in which you gather actionable metrics based on real usage of your product, without asking people. As TV's Dr. Gregory House likes to say, people lie, although, more charitably, you might say that they don't know what they want. Treat your users as experimental subjects rather than intelligent agents.
You need to be able to answer questions like these:
- Did the changes we made to the product improve signup, retention, revenue? Or is it time to pivot?
- Which version of our new feature came out better when we did A/B testing?
- All our system metrics look okay, but a user reports that our site isn't working. Are we down?
- Which features of our product are generating the most revenue?
You should be able to answer these questions without trawling through Apache logs, trying to instrument features retroactively, or running custom queries. These questions should be answerable by looking at a dashboard, and the information should be completely auditable.
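As an illustration of the A/B testing question above, here is a minimal sketch of one common way to compare two variants: a two-proportion z-test on conversion counts. The function name and the choice of test are assumptions made for this example; the article itself doesn't prescribe a particular statistical method.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (z, p_value), where z is positive when B converts better
    than A and p_value is the two-sided p-value under a normal
    approximation. All names here are illustrative.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Example: 100 of 1,000 users signed up with A, 150 of 1,000 with B.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone; where you set the threshold is a judgment call for your team.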
In his book The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses (Crown Business, 2011), Eric Ries tells the story of Grockit:
Following the lean manufacturing principle of kanban, […] Grockit changed the product prioritization process. Under the new system, user stories were not considered complete until they led to validated learning. Thus, stories could be cataloged as being in one of four states of development: in the product backlog, actively being built, done (feature complete from a technical point of view), or in the process of being validated. Validated was defined as "knowing whether the story was a good idea to have been done in the first place." This validation usually would come in the form of a split test showing a change in customer behavior but also might include customer interviews or surveys.
This kind of learning is only possible if metrics are built into the stories that are being played.
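One way to picture this is as a small state machine. The sketch below models the four states from the Grockit quote and treats a story as complete only once some learning has been recorded during validation. The class names, the `metric` field, and the methods are hypothetical; this is not Grockit's actual tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StoryState(Enum):
    BACKLOG = "in the product backlog"
    BUILDING = "actively being built"
    DONE = "done (feature complete from a technical point of view)"
    VALIDATING = "in the process of being validated"


@dataclass
class Story:
    title: str
    metric: str  # the metric this story is expected to move (assumed field)
    state: StoryState = StoryState.BACKLOG
    learning: Optional[str] = None

    def advance(self) -> None:
        """Move the story to the next state in the sequence."""
        order = list(StoryState)
        position = order.index(self.state)
        if position == len(order) - 1:
            raise ValueError("story is already being validated")
        self.state = order[position + 1]

    def record_learning(self, learning: str) -> None:
        """Record what the split test, interview, or survey showed."""
        if self.state is not StoryState.VALIDATING:
            raise ValueError("learning can only be recorded during validation")
        self.learning = learning

    @property
    def complete(self) -> bool:
        # "Done" is not enough: the story must have led to validated learning.
        return self.state is StoryState.VALIDATING and self.learning is not None


story = Story("Breakfast option for one hotel", metric="booking conversion")
for _ in range(3):
    story.advance()
print(story.complete)  # still False: nothing has been learned yet
story.record_learning("Split test showed no change in bookings")
print(story.complete)
```

Note that a negative result still completes the story: the goal is knowing whether it was a good idea, not confirming that it was.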
This principle might appear to be web-specific, but it is also true of embedded systems and user-installed products. All types of systems need to gather these kinds of metrics for remote debugging and fault-reporting purposes, as well as for understanding usage patterns.
Perform Just Enough Analysis Up Front
"You know when you are not doing iterative development when [the] team tries to complete specifications before programming." Bas Vodde, "History of Nokia Test."
Once you have an idea for a minimum viable product, you need to start delivering software. The first step is analysis. But having a backlog of fully analyzed stories is wasteful. To analyze stories fully, you need input from customers, developers, testers, UX, and users. If your team is spending time gathering this information, they're not working on delivering valuable functionality and getting real feedback from users based on working software.
How much analysis needs to be done up front? Before development starts on a story, we only care about three things:
- What is the marginal value of delivering the story?
- What is the marginal cost of delivering the story?
- Do we have enough information to begin delivering the story?
The first two questions are important so that we can decide, at the point that delivery capacity becomes available, which story the team should play next. In order to do this, we need to work out which story will maximize economic outcome. The final two questions are closely related, because the amount of information required to estimate the marginal cost of delivering a story is usually more than you need to begin delivering it. But as my colleague Peter Gillard-Moss once pointed out to me, you need at least one acceptance criterion.
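To make the three questions concrete, here is a minimal sketch of choosing the next story when capacity frees up. It assumes that "maximizing economic outcome" can be approximated by the ratio of marginal value to marginal cost, and that a story is ready once it has at least one acceptance criterion. The ratio, the class, and the field names are illustrative assumptions, not a formula from this article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CandidateStory:
    title: str
    marginal_value: float  # estimated value of delivering the story
    marginal_cost: float   # estimated cost of delivering the story
    acceptance_criteria: List[str] = field(default_factory=list)

    @property
    def ready(self) -> bool:
        # Enough information to begin: at least one acceptance criterion.
        return len(self.acceptance_criteria) >= 1


def next_story(backlog: List[CandidateStory]) -> Optional[CandidateStory]:
    """Pick the ready story with the best value-to-cost ratio, if any."""
    candidates = [s for s in backlog if s.ready and s.marginal_cost > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.marginal_value / s.marginal_cost)


backlog = [
    CandidateStory("Breakfast option, single hotel", 8.0, 2.0,
                   ["Guest can add breakfast when booking hotel X"]),
    CandidateStory("Breakfast option, all partners", 20.0, 15.0,
                   ["Every partner site shows the breakfast option"]),
    CandidateStory("Loyalty scheme", 50.0, 5.0),  # no acceptance criteria yet
]
chosen = next_story(backlog)
print(chosen.title if chosen else "nothing is ready to play")
```

The loyalty story has the best ratio on paper, but it isn't chosen because nobody has yet said how to tell when it is done.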
This discipline of doing just enough analysis needs to continue throughout the lifecycle of the project, with the emphasis on creating very small, incremental stories. This brings us to the next practice, doing less.
Do Less
"[If] you find yourself running out of room on the cards, use smaller cards." PhlIp (see "Re: [XP] Re: Token definition in User Stories").
Perhaps the most popular acronym in Agile analysis is Bill Wake's INVEST. Wake says that good stories are independent, negotiable, valuable, estimable, small, and testable. I want to focus in particular on "small."
People often think that features and stories are interchangeable. Sometimes people think of a feature as something that might take weeks to complete. I remember on one project being presented with "stories" that came in the form of Word documents that were many pages long.
There's a reason why XP told people to write the summary of a story on 3 × 5 index cards. Stories shouldn't take more than a couple of days to complete. Anything bigger than a week is way too long, and should be broken up into smaller bits. Why?
- To make sure that we're getting constant feedback on our work from users, so we can find out whether what we're doing is actually valuable
- To validate that we're actually getting work done (not just "dev complete," but releasable) so we can demonstrate that we're making real progress
- To prevent us from creating big batches of work that will be painful to integrate, test, and deploy
- To ensure that we're constantly testing and improving our delivery process
The usual objection is that you can't do anything valuable in a couple of days. I think this statement demonstrates both a lack of imagination and a misinterpretation of what constitutes value. The value of a story can only be measured by showing it to a user, as I mentioned previously. Sure, you're not going to complete a whole feature in a couple of days. But you can complete and get feedback on the kernel of the feature. For example, say you're working on a hotel booking site, and you want to add a feature to allow people to choose whether they want breakfast. Don't create this feature for all hotels or for all partner sites. Instead, start with a story that allows you to add that feature for a single hotel, with no configuration options, and get feedback for that possibility before you proceed further.
Whatever you do, don't decompose features into "stories" focused on one tier of the solution, such as a "story" to implement the persistence layer, another to do the business logic, and a third to implement the UI. Stories should always be small vertical slices. If you're going to have to do a bunch of integration work, focus on making the vertical slice as narrow as possible. For example, if you have to pass a series of messages to another system as part of a piece of functionality, your first story should be to pass the simplest possible message in order to drive out the end-to-end integration.
Forcing yourself to work out the true value of a feature by constantly stripping away functionality until you get to the smallest possible bit of functionality you can show to users (and thus learn from) is a difficult but tremendously valuable discipline. You can use that feedback to determine what small, incremental step (another couple of days' work) you should do next, or whether you should just stop working on the feature at all in its current form, because it's not as valuable as you thought.
Always keep in the front of your mind that the biggest source of waste in software delivery is unused functionality: more than half of all functionality developed is never or rarely used, according to one study.
Instead of asking, "What more do we need to put into this feature to make sure people are going to love it?" or "What extra features do we need to put into this release to create a really great product?" ask this: "Can we deliver what we have now? If not, why not?" Do less, so that you can focus your efforts instead on learning from your users.
You may not want to show your feature to the world until a number of stories have been played. There's always a tradeoff between getting something out and making sure that everything released is of sufficient quality (as determined by the user). That's when you need feature toggles, as described in the next section.
Include Feature Toggles in Your Stories
"Right now, out on Facebook.com, is all the code for every major thing we're gonna do in the next six months and beyond." Chuck Rossi (see TechCrunch discussion about Rossi's May 26, 2011 "Facebook Tech Talk").
If you want to increase your release frequency, it's essential to decouple your deployments from the act of taking a feature live. Feature toggles (also known as bits or switches) are a pattern whereby you can control who can see which features. This setup allows you to release new versions of your software that include partially complete features, containing perhaps a story or two of work, but nothing you'd want the general public using.
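In its simplest form, a feature toggle is just a named switch that the code checks before exposing a feature. The sketch below shows a deployed but hidden feature on the hotel booking page; the `FeatureToggles` class, the toggle name, and the page sections are all hypothetical names chosen for this example.

```python
class FeatureToggles:
    """A minimal in-memory toggle store; features are off unless enabled."""

    def __init__(self, enabled=None):
        self._enabled = set(enabled or [])

    def is_enabled(self, feature):
        return feature in self._enabled

    def enable(self, feature):
        self._enabled.add(feature)

    def disable(self, feature):
        self._enabled.discard(feature)


def booking_page_sections(toggles):
    """Return the sections shown on the booking page."""
    sections = ["room selection", "payment"]
    # The breakfast code is deployed, but stays invisible until toggled on.
    if toggles.is_enabled("breakfast-option"):
        sections.insert(1, "breakfast option")
    return sections


toggles = FeatureToggles()
print(booking_page_sections(toggles))  # the feature is deployed but hidden
toggles.enable("breakfast-option")     # taking it live needs no deployment
print(booking_page_sections(toggles))
```

In a real system the toggle state would live in configuration or a database rather than in memory, so that it can be changed without restarting the application.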
Facebook's Gatekeeper software allows Facebook to control dynamically who can see which features; for example, exposing a given feature to only 10% of Facebook users, or to people who work at Facebook, or to female users under the age of 25, or to UK residents, and so on. (The Gatekeeper even has a toggle that makes a feature visible to everyone except TechCrunch employees.) This capability allows the Facebook programmers to try a feature on only a certain demographic, and to roll it out incrementally over time.
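A targeted rollout of this kind could be sketched as follows. This is not how Gatekeeper is implemented, which the article doesn't describe; it is one plausible approach, assuming each user is hashed into a stable bucket from 0 to 99 and that named groups (such as staff) can be allowed explicitly. `RolloutRule`, `bucket`, and `is_visible` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutRule:
    percentage: int = 0                      # share of all users, 0 to 100
    allowed_groups: frozenset = frozenset()  # groups that always see the feature


def bucket(feature, user_id):
    """Map a user to a stable bucket in 0..99 for the given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_visible(feature, user_id, groups, rule):
    """Decide whether this user should see the feature."""
    if rule.allowed_groups & set(groups):
        return True
    # Hashing keeps the decision stable across requests, and raising the
    # percentage only ever adds users; nobody loses a feature they had.
    return bucket(feature, user_id) < rule.percentage


rule = RolloutRule(percentage=10, allowed_groups=frozenset({"staff"}))
print(is_visible("breakfast-option", "user-42", ["staff"], rule))  # staff always see it
print(is_visible("breakfast-option", "user-42", [], rule))
```

Including the feature name in the hash means that different features reach different slices of users, so the same 10% don't become the test subjects for everything.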
Feature toggles represent a crucial constraint on the way you break down features into stories. One common objection to the use of feature toggles goes like this: "Some of my stories spray controls all over the user interface. It's going to be a lot of work to add a toggle to make this story invisible to users." The answer? Decompose your features into stories in such a way that it's easy to add the feature toggles.
Feature toggles should be a first-class part of your stories. One team at Orbitz builds feature bits into their stories such that the first task they perform when they play a story is to add the feature bit for that story. The feature bit forms part of the value of the story, and of course the work to add the feature bit gets estimated along with the story. If the estimate for the task of adding the feature bit is high, it's a sign that you've decomposed your feature poorly.
In addition to enabling incremental delivery of functionality, feature toggles have other important applications. They make it possible to degrade your service gracefully (for example, by turning off some resource-intensive feature such as a recommendation engine) in the case of unexpected load. They also let you turn off a problematic feature when a release goes wrong, as an alternative to rolling back your deployment.
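Graceful degradation can reuse the same mechanism. The sketch below switches off a resource-intensive recommendations feature when load passes a limit and restores it when load drops. The load figure, the limit, and all function names are assumptions for illustration; in practice the decision might equally be made by an operator flipping the toggle by hand.

```python
def recommendations_for(user_id):
    """Stand-in for an expensive call to a recommendation engine."""
    return [f"hotel-{n}" for n in range(3)]


def shed_load_if_needed(toggles, requests_per_second, limit):
    """Turn recommendations off above the limit, and back on below it."""
    toggles["recommendations"] = requests_per_second <= limit


def home_page(user_id, toggles):
    """Build the home page, skipping optional features that are off."""
    page = {"search": True}
    if toggles.get("recommendations", False):
        page["recommendations"] = recommendations_for(user_id)
    return page


toggles = {"recommendations": True}
shed_load_if_needed(toggles, requests_per_second=900, limit=500)
print(home_page("user-42", toggles))  # core search still works under load
shed_load_if_needed(toggles, requests_per_second=200, limit=500)
print(home_page("user-42", toggles))
```

The page keeps serving its core function either way; only the optional, expensive part disappears.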
Conclusions
A common failure mode of software projects is what Don Reinertsen, in his book The Principles of Product Development Flow: Second Generation Lean Product Development (Celeritas, 2009), calls the large batch death-spiral, whereby product owners, in an attempt to ensure the success of their product, add more and more scope as the project progresses, in a vicious circle that leads to exponential growth of cost and schedule.
Continuous delivery allows teams to reduce dramatically the transaction cost of releasing high-quality software, so you can do it much more frequently, providing a much richer and faster feedback cycle from users back to product teams. But, in turn, you need to change the way you think about managing the flow of work through the software delivery process. In particular, if you're doing continuous delivery correctly, the technology people are no longer the constraint in terms of testing new ideas on users. With traditional delivery processes, we have to wait weeks or months to see our ideas turned into working software. By delivering small increments of functionality and getting feedback, we can constantly be thinking, "What should we try next?" No team that has achieved this transformation wants to go back to the old way of working.
Using traditional delivery methods, we had to be careful to select which ideas we would actually attempt to deliver, because the software delivery process was so expensive. Of course, that sifting process wasn't based on real data. With continuous delivery, we have what my colleague Darius Kumana calls an "airbag for innovation failure"; we can try crazily innovative ideas cheaply and safely at any stage in the evolution of the system, mitigating the risk if they don't work out by (for example) exposing them to only a small group of users. Continuous delivery liberates us by massively reducing the cost and risk of releasing software, putting analysts back where they belong: powering innovation.
Thanks to Chad Wathington, Elena Yatzeck, Dan McClure, and Darius Kumana for feedback on an earlier draft of this article.