Why Software Goes Wrong
We don’t want to overstate the seriousness of the poor state of enterprise software development, and we don’t think it can be overstated.
When discussing enterprise software system conditions with Fortune and Global companies, we quickly learn about their major pain points. These are always related to aged software that has undergone decades of maintenance, long after innovation took place. Most discussions identify that software development is considered a cost center to the business, which makes it that much more difficult to invest in improvements. Today, however, software should be a profit center. Unfortunately, the collective corporate mindset is stuck 30-plus years back when software was meant to make some operations work faster than manual labor.
A specific application (or subsystem) starts with a core business reason to be built. Over time, its core purpose will be enhanced or even altered considerably. Continuous additions of features can become so extensive that the application’s original purpose is lost and it likely means different things to different business functions, with the full diversity of those understandings not readily known. This often leads to many hands stirring the pot. Eventually, development shifts from strategic work to simply keeping the software running: fixing urgent bugs and patching data directly in the database to compensate for failures. New features are generally added slowly and gingerly in an attempt to avoid producing even more bugs. Even so, injecting new bugs is inevitable: With the ever-increasing level of system disorder and lost historical perspective, it’s impossible to determine the full impact a single given change will have on the greater body of software.
Teams admit that there is no clear and intentional expression of software architecture, either in individual applications (subsystems) or even overall in any large system. Where some sense of architecture exists, it is generally brittle and obsolete given advances in hardware design and operational environments such as the cloud. Software design is also unintentional, and thus appears to be nonexistent. In consequence, most ideas behind an implementation are implicit, committed to the memories of a few people who worked on it. Both architecture and design are by and large ad hoc and just plain weird. These unrecognized failures make for some really sloppy results due to slipshod work.
Just as dangerous as producing no well-defined architecture at all is introducing architecture for merely technical reasons. A fascination often exists among software architects and developers with regard to a novel development style relative to what they previously employed, or even a newly named software tool that is the subject of a lot of hype and industry buzz. This generally introduces accidental complexity because the IT professionals don’t fully understand what impacts their ill-advised decisions will have on the overall system, including its execution environment and operations. Yes, Microservices architecture and tools such as Kubernetes, although duly applicable in the proper context, drive a lot of unqualified adoption. Unfortunately, such adoption is rarely driven by insights into business needs.
The prolonged buildup of software model inaccuracies within the system from failure to perform urgent changes is described as the debt metaphor. In contrast, the accumulation from accepting uncontrolled changes to a system is known as software entropy. Both are worth a closer look.
Debt Metaphor
Decades ago, a very smart software developer, Ward Cunningham, who was at the time working on financial software, needed to explain to his boss why the current efforts directed toward software change were necessary [Cunningham]. The changes being made were not in any way ad hoc; in fact, they were quite the opposite. They would make the software look as if the developers had known all along what they were doing, and as if getting there had been easy. The specific technique they used is now known as software refactoring. In this case, the refactoring was done the way it was meant to be done: to reflect newly acquired business knowledge in the software model.
To justify this work, Cunningham needed to explain that if the team didn’t make adjustments to the software to match their increased learning about the problem domain, they would continue to stumble over the disagreement between the software that existed and their current, refined understanding. In turn, the continued stumbling would slow down the team’s progress on continued development, which is like paying interest on a loan. Thus, the debt metaphor was born.
Anyone can borrow money to do things sooner than they could without it. Of course, as long as the loan exists, the borrower will be paying interest. The primary reason for taking on debt in software is to be able to release sooner, on the understanding that the debt must be paid sooner rather than later. The debt is paid by refactoring the software to reflect the team’s newly acquired knowledge of the business needs. In the industry at that time, just as it is today, software was rushed out to users knowing that debt existed, but too often teams acted as if the debt never had to be paid off.
Of course, we all know what happens next. If debt continues to stack up and the person borrows more and more, all the borrower’s money goes to paying interest and they reach a point where they have zero buying power. Matters work the same way with software debt, because eventually developers deep in debt will be severely compromised. Adding new features will take longer and longer, to the point where the team will make almost no progress.
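The slowdown described here can be made concrete with a toy model. The sketch below is purely illustrative: `simulate_debt` is a hypothetical helper, and the capacity, interest rate, and debt added per iteration are assumptions chosen for the example, not measurements from any real team.

```python
def simulate_debt(iterations, capacity=10.0, new_debt=2.0, interest_rate=0.1):
    """Toy model of a team that takes on debt and never repays it.

    Each iteration the team has `capacity` units of effort. "Interest" is the
    effort lost to working around the gap between the code and the team's
    current understanding, modeled as `interest_rate * debt`. Whatever is
    left goes to new features. All figures are illustrative assumptions.
    """
    debt = 0.0
    feature_effort = []
    for _ in range(iterations):
        # Interest can consume at most the whole iteration.
        interest = min(capacity, interest_rate * debt)
        feature_effort.append(capacity - interest)
        # Shortcuts are taken again and nothing is refactored.
        debt += new_debt
    return feature_effort


if __name__ == "__main__":
    for i, effort in enumerate(simulate_debt(60), start=1):
        if i % 10 == 0:
            print(f"iteration {i:2d}: effort left for features = {effort:.1f}")
```

In this model the effort available for features shrinks every iteration until interest consumes all of it. Repaying debt would correspond to reducing `debt` through refactoring, which the sketch deliberately leaves out.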
One of the major problems with the contemporary understanding of the debt metaphor is that many developers think this metaphor supports deliberately delivering poorly designed and implemented software so as to deliver sooner. Yet, the metaphor doesn’t support that practice. Attempting that feat is more like borrowing on subprime loans with upward adjustable interest rates, which often results in the borrower becoming financially overextended to the point of defaulting. Debt is useful only as long as it is controlled; otherwise, it creates instability within the entire system.
Software Entropy
Software entropy is a different metaphor but closely related to the debt metaphor in terms of the software system conditions it describes. The word entropy is used in statistical mechanics in the field of thermodynamics to measure a system’s disorder. Without attempting to go too deep into this topic: “The second law of thermodynamics states that the entropy of an isolated system never decreases over time. Isolated systems spontaneously evolve towards thermodynamic equilibrium, the state with maximum entropy” [Entropy]. The software entropy metaphor names the condition of a software system where change is inevitable, and that change will cause increasing uncontrolled complexity unless a vigorous effort is made to prevent it [Jacobson].
Big Ball of Mud
An application or system like the one previously described has become known as a Big Ball of Mud. In terms of architecture, it has been further described as haphazardly structured; sprawling; sloppy; duct-taped-and-baling-wired; jungle; unregulated growth; repeated, expedient repair. “Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition” [BBoM].
It seems appropriate to describe the Big Ball of Mud “architecture” as the unarchitecture.
Throughout the remainder of this chapter, as well as in this book in general, we will key in on a few of these characteristics: haphazardly structured; unregulated growth; repeated, expedient repair; information shared promiscuously; all important information global or duplicated.
Where the Big Ball of Mud is the enterprise norm, organizations experience competitive paralysis, a condition that has spread across industries. It is quite common for large enterprises, which once enjoyed competitive distinction, to become hamstrung by systems with deep debt and nearly complete entropy.
You can easily contrast the Big Ball of Mud system in Figure 1.2 to that depicted in Figure 1.1. Of course, the segment of the system in Figure 1.1 doesn’t represent the number of features that are supported by the system in Figure 1.2, but clearly the architecture of the first system brings order, whereas the lack thereof in the second offers chaos.
Figure 1.2 The Big Ball of Mud might be classified as the unarchitecture.
These chaotic conditions limit teams to a few software releases per year, and those releases cause even worse problems than the releases of previous years. Individuals and the teams to which they belong tend to become indifferent and complacent because they know they can’t produce the change they see as necessary to turn things around. The next stage is becoming disillusioned and demoralized. Businesses facing this situation cannot innovate in software and continue to compete under such conditions. Eventually they fall victim to a nimble startup that can make significant strides forward, to the point where, within a few months to a few years, it can displace previous market leaders.
Running Example
From this point forward, we present a case study using an existing Big Ball of Mud and describe a situation where the affected business struggles to innovate as it faces the realities of the associated deep debt and entropy. Because you might already be tired of reading bad news, here’s a spoiler: The situation improves over time.
There is no better way to explain the issues every company has to face with software development than with examples borrowed from the real world. The example offered here as a case study in dealing with an existing Big Ball of Mud comes from the insurance industry.
At some point in life, just about everyone has to deal with an insurance company. People seek insurance policies for various reasons. Some policies address legal requirements, and some provide security for the future. These policies include protection for health; personal lines such as life, automobile, home, and mortgage; financial product investments; international travel; and even the loss of a favorite set of golf clubs. Policy product innovation in the field of insurance seems endless, where almost any risk imaginable can be covered. If there is a potential risk, you’re likely to find an insurance company that will provide coverage for it.
The basic idea behind insurance is that some person or thing is at risk of loss, and for a fee, the calculated financial value of the insured person or thing may be recovered when such loss occurs. Insurance is a successful business proposition due to the law of large numbers. This law says that given a large number of persons and things being covered for largely independent risks, the average loss across all of them stays close to its expected value. Because only a small, predictable share of those covered will suffer a loss in any period, fees can be set so that the total paid by all exceeds the actual payments made for losses. Also, the greater the probability of loss, the greater the fee that the insurance company will receive to provide coverage.
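The pooling effect behind the law of large numbers can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: every policy is independent and shares the same loss probability and payout, and the constants `LOSS_PROBABILITY` and `PAYOUT` and the function name are invented for illustration. Real actuarial models are far more involved.

```python
import random

LOSS_PROBABILITY = 0.02   # assumed chance that a policy has a claim in a year
PAYOUT = 10_000           # assumed payout per claim
EXPECTED_LOSS = LOSS_PROBABILITY * PAYOUT  # expected payout per policy


def average_loss_per_policy(policy_count, rng):
    """Simulate one year and return total payouts divided by policy count."""
    claims = sum(1 for _ in range(policy_count) if rng.random() < LOSS_PROBABILITY)
    return claims * PAYOUT / policy_count


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for n in (10, 1_000, 100_000):
        average = average_loss_per_policy(n, rng)
        print(f"{n:>7} policies: average loss per policy = {average:.2f} "
              f"(expected {EXPECTED_LOSS:.2f})")
```

With only a handful of policies, the average loss per policy swings widely from one simulated year to the next. As the pool grows, it settles near the expected loss, which is what allows an insurer to price premiums above that figure with some confidence.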
Imagine the complexity of the insurance domain. Is coverage for automobiles and homes the same? Does adjusting a few business rules that apply to automobiles make covering homes possible? Even if automobile and home policies might be considered “close enough” to hold a lot in common, think of the different risks involved in these two policy types.
Consider some example scenarios. There is a much higher possibility of an automobile striking another automobile than there is of a part of a house striking another house and causing damage. The likelihood of a fire occurring in a kitchen due to normal everyday use is greater than the likelihood of the car’s engine catching fire due to normal everyday use. As we can see, the difference between the two kinds of insurance isn’t subtle. When considering the variety of possible kinds of coverage, it requires substantial investment to provide policies that have value to those facing risk and that won’t be a losing proposition to the insurance company.
Thus, it’s understandable that the complexity among insurance firms in terms of business strategy, operations, and software development is considerable. That is why insurance companies tend to specialize in a small subset of insurable products. It’s not that they wouldn’t want to be a larger player in the market, but rather that costs could easily outweigh the benefits of competing in all possible segments. It’s not surprising, then, that insurance companies more often attempt to lead in insurance products in which they have already earned expertise. Even so, adjusting business strategies, accepting unfamiliar yet measured risks, and developing new products might be too lucrative an opportunity to pass up.
It is time to introduce NuCoverage Insurance. This fictitious company is based on real-world scenarios previously experienced by the authors. NuCoverage has become the leader in low-cost auto insurance in the United States. The company was founded in 2001 with a business plan to focus on providing lower-cost premiums for safe drivers. It saw a clear opportunity in focusing on this specific market, and it succeeded. The success came from the company’s ability to assess risks and premiums very accurately and offer the lowest-cost policies on the market. Almost 20 years later, the company is insuring 23% of the overall US market, but nearly 70% in the specialized lower-cost safe-driver market.
Current Business Context
Although NuCoverage is a leader in auto insurance, it would like to expand its business to other kinds of insurance products. The company has recently added home insurance and is working on adding personal lines of insurance. However, adding new insurance products proved more complex than originally anticipated.
While the development process of personal lines of insurance was ongoing, management had an opportunity to sign a partnership with one of the largest US banks, WellBank. The deal involves enabling WellBank to sell auto insurance under its own brand. WellBank sees great potential in selling auto insurance along with its familiar auto loans. Behind the WellBank auto insurance policies is NuCoverage.
Of course, there are differences between NuCoverage auto insurance products and the ones to be sold by WellBank. The most prominent differences relate to the following areas:
Premiums and coverages
Rules and premium price calculation
Risk assessment
Although NuCoverage has never before experienced this kind of partnership, the business leaders immediately saw the potential to expand their reach, and possibly even introduce a completely new and innovative business strategy. But in what form?
Business Opportunity
NuCoverage’s board of directors and executives recognized an even larger strategic opportunity than the WellBank partnership: They could introduce a white-label insurance platform that would support any number of fledgling insurers. Many types of businesses might potentially support selling insurance products under the business’s own brand. Each business best knows its customers and grasps what insurance products could be offered. The recently inked partnership with WellBank is just one example. NuCoverage can certainly identify other forward-thinking partners that would share the vision of selling insurance products under a white label.
For example, NuCoverage could establish partnerships with car makers that offer their own financing. When a customer purchases a car, the dealer could offer both financing and manufacturer-branded insurance. The possibilities are endless, because a company that cannot easily become an insurer itself can still benefit from the margins gained through insurance sales. In the long run, NuCoverage is considering diversifying with new insurance products such as motorcycle, yacht, and even pet insurance.
This possibility seemed very exciting to the board and executives, but when the software management team learned about the plans, a few of them had to swallow hard. The original auto insurance application was built quickly under a lot of pressure to deliver, which soon led to a Big Ball of Mud Monolith. As Figure 1.3 illustrates, with nearly 20 years of changes and deep unpaid debt, and the ongoing development of the system for the personal lines of insurance, the teams have reached a point of stifling unplanned complexity. The existing software will absolutely not support current business goals. All the same, development needs to answer the call.
Figure 1.3 The NuCoverage Big Ball of Mud. All business activities are intertwined with tangled software components that are in deep debt and near maximum entropy.
What NuCoverage must understand is that its business is no longer insurance alone. It was always a product company, but its products were insurance policies. Its digital transformation is leading the firm to become a technology company, and its products now include software. To that end, NuCoverage must start thinking like a technology product company and making decisions that support that positioning—not only as a quick patch, but for the long term. This is a very important shift in the company mindset. NuCoverage’s digital transformation cannot be successful if it is driven only by technology choices. Company executives will need to focus on changing the mindset of the organization’s members as well as the organizational culture and processes before they decide what digital tools to use and how to use them.