The Importance of Measurement
One of the reasons that we find it difficult to discard bad ideas is that we don’t really measure our performance in software development very effectively.
Most metrics applied to software development are either irrelevant (velocity) or sometimes positively harmful (lines of code or test coverage).
In agile development circles it has been a long-held view that measuring software team or project performance is not possible. Martin Fowler wrote about one aspect of this in his widely read Bliki in 2003.2
Fowler’s point is correct; we don’t have a defensible measure for productivity, but that is not the same as saying that we can’t measure anything useful.
The valuable work carried out by Nicole Forsgren, Jez Humble, and Gene Kim in the “State of DevOps” reports3 and in their book Accelerate: The Science of Lean Software and DevOps4 represents an important step forward in being able to make stronger, more evidence-based decisions. They present an interesting and compelling model for the useful measurement of the performance of software teams.
Interestingly, they don’t attempt to measure productivity; rather, they evaluate the effectiveness of software development teams based on two key attributes. The measures are then used as a part of a predictive model. They cannot prove that these measures have a causal relationship with the performance of software development teams, but they can demonstrate a statistical correlation.
The measures are stability and throughput. Teams with high stability and high throughput are classified as “high performers,” while teams with low scores against these measures are “low performers.”
The interesting part is that when you analyze the activities of these high- and low-performing groups, they are consistently correlated: high-performing teams share common behaviors. Equally, if we look at the activities and behaviors of a team, we can predict their score against these measures. Some activities can be used to predict performance on this scale.
For example, if your team employs test automation, trunk-based development, deployment automation, and about ten other practices, their model predicts that you will be practicing continuous delivery. If you practice continuous delivery, the model predicts that you will be “high performing” in terms of software delivery performance and organizational performance.
Alternatively, if we look at organizations that are seen as high performers, then there are common behaviors, such as continuous delivery and being organized into small teams, that they share.
Measures of stability and throughput, then, give us a model that we can use to predict team outcomes.
Stability and throughput are each tracked by two measures.
Stability is tracked by the following:
Change Failure Rate: The rate at which a change introduces a defect at a particular point in the process
Failure Recovery Time: How long it takes to recover from a failure at a particular point in the process
Measuring stability is important because it is really a measure of the quality of work done. It doesn’t say anything about whether the team is building the right things, but it does measure their effectiveness in delivering software of measurable quality.
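To make the two stability measures concrete, here is a minimal sketch of how they might be computed from a team’s deployment records. The `Change` record shape, field names, and sample data are all hypothetical illustrations, not part of the research model itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Change:
    """One change released to production (hypothetical record shape)."""
    deployed_at: datetime
    caused_failure: bool
    recovered_at: Optional[datetime] = None  # set once a failure was resolved

def change_failure_rate(changes: list) -> float:
    """Fraction of released changes that introduced a defect."""
    return sum(c.caused_failure for c in changes) / len(changes)

def mean_recovery_time(changes: list) -> timedelta:
    """Average time from a failed deployment to recovery."""
    failures = [c for c in changes if c.caused_failure and c.recovered_at]
    total = sum((c.recovered_at - c.deployed_at for c in failures), timedelta())
    return total / len(failures)

# Hypothetical sample: four changes, two of which caused failures.
changes = [
    Change(datetime(2023, 5, 1, 9), False),
    Change(datetime(2023, 5, 2, 9), True, datetime(2023, 5, 2, 10)),
    Change(datetime(2023, 5, 3, 9), False),
    Change(datetime(2023, 5, 4, 9), True, datetime(2023, 5, 4, 12)),
]
print(change_failure_rate(changes))  # 0.5
print(mean_recovery_time(changes))   # 2:00:00 (mean of 1h and 3h)
```

The point of the sketch is only that both measures fall out of data most teams already have: a log of what was deployed, when, and what broke.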
Throughput is tracked by the following:
Lead Time: A measure of the efficiency of the development process. How long does it take for a single-line change to go from “idea” to “working software”?
Frequency: A measure of speed. How often are changes deployed into production?
Throughput is a measure of a team’s efficiency at delivering ideas, in the form of working software.
How long does it take to get a change into the hands of users, and how often is that achieved? This is, among other things, an indication of a team’s opportunities to learn. A team may not take those opportunities, but without a good score in throughput, any team’s chance of learning is reduced.
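The two throughput measures can be sketched in the same way. Again, the function names and the sample timestamps are illustrative assumptions; the underlying inputs are just a commit time, a deploy time, and a history of deployments.

```python
from datetime import datetime, timedelta

def lead_time(committed_at: datetime, deployed_at: datetime) -> timedelta:
    """Time for one change to go from commit to working software in production."""
    return deployed_at - committed_at

def deployment_frequency(deploy_times: list) -> float:
    """Deployments per day over the observed window."""
    window = max(deploy_times) - min(deploy_times)
    days = max(window.total_seconds() / 86400, 1.0)  # guard a single-day window
    return len(deploy_times) / days

# Hypothetical history: four deployments over a four-day window.
deploys = [
    datetime(2023, 5, 1), datetime(2023, 5, 2),
    datetime(2023, 5, 3), datetime(2023, 5, 5),
]
print(deployment_frequency(deploys))  # 1.0 deployments per day
print(lead_time(datetime(2023, 5, 1, 9), datetime(2023, 5, 1, 13)))  # 4:00:00
```

A team that shortens lead time and raises deployment frequency is, by these definitions, increasing how often it can put a change in front of users and learn from the result.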
These are technical measures of our development approach. They answer the questions “what is the quality of our work?” and “how efficiently can we produce work of that quality?”
These are meaningful ideas, but they leave some gaps. They don’t say anything about whether we are building the right things, only if we are building them right, but just because they aren’t perfect does not diminish their utility.
Interestingly, the correlative model that I described goes further than predicting team size and whether you are applying continuous delivery. The Accelerate authors have data that shows significant correlations with much more important things.
For example, organizations made up of high-performing teams, based on this model, make more money than organizations that don’t. This is data showing a correlation between a development approach and the commercial outcome for the company that practices it.
It also goes on to dispel a commonly held belief that “you can have either speed or quality but not both.” This is simply not true. Speed and quality are clearly correlated in the data from this research. The route to speed is high-quality software, the route to high-quality software is speed of feedback, and the route to both is great engineering.