Five Questions
Good questions make us think, uncover details, and transform our understanding. I have found them to be a great tool when designing. These questions have helped me scope a system and dig into it. They are designed to ground us in concrete situations and to avoid grasping for ideals that often cause projects to fail.
Question 1: When Is the Best Time to Market?
The business, not the architect, decides the project timing. However, this is the first question that we, as architects, need to ask. Time can be our enemy because, often, even skills and money can’t change the timeline of the product.
Time to market is everything, and our designs must reflect this reality. When deadlines are strict, we can design knowing that we will be able to rewrite the system after the deadline.
In my experience, although time-to-market deadlines are often not negotiable, the features that go into that version are often flexible. I recommend working with a UX designer and a product manager to identify the minimal set of features the design must include, and then building them as fast as possible using the most straightforward approach.
Question 2: What Is the Skill Level of the Team?
Leadership is about working with your team. Some teams are so good they might handle the system without any help from you at all. However, leadership is needed when we work with less-than-perfect teams.
You go to war with the army you have, not the army you might want or wish to have at a later time.
—Donald Rumsfeld
Take a hard, realistic look at your team. It may consist of veteran superstars whom you have handpicked and hired over the years, of fresh hires, or of some mix of the two. You must pick an architecture your team can manage. For example, do not pick an event-driven architecture or a server based on CQRS (Command Query Responsibility Segregation) unless you have a few people who have done this before. Those kinds of architectures are costly to understand and debug, and they will most likely cost much more in the long run unless the team understands their finer details.
What if you feel in your gut that CQRS is the right solution, but you do not have experts on board to build that architecture? I recommend designing the current version using a simple architecture and starting a proof of concept (PoC) alongside it. That way, you can try out CQRS in the background with the person who is most likely to handle it, in the hope that you can adopt CQRS in the second version.
You might be thinking, “Can’t I train the team?” Yes, in some cases. For example, you might hire an expert who works shoulder to shoulder with the team for a few months. However, a deep understanding of most complex systems takes time. In my experience, giving someone a fingertip feel for performance, including the ability to handle details like concurrency or the LMAX Disruptor, takes at least a year, maybe two.¹
Similarly, you should pick the programming language based on the team. Programmers build up many skills around a given programming language, and switching languages is often tough.
Certain abilities, like security and user experience, are essential. So, the leader must find a way to cover these areas. Doing so could involve hiring a consultant or the leader personally stepping in to support the team and provide guidance.
In rare cases, the right choice could be to refuse to build certain software with a novice team. Alternatively, sometimes (in a startup, for instance) you can build a limited version (e.g., one that scales less), which can then be used to justify more investment later. Even then, you need to be very clear with the people who are making the investment and make sure they know the risks.
Question 3: What Is Our System’s Performance Sensitivity?
If a system operates close to the performance limits of a naive architecture, we say that the system is performance sensitive. Architectural considerations change significantly between systems that are sensitive and insensitive to performance.
The performance sensitivity of the system tells us how much leeway we have and how much precision we need. Achieving higher precision is like walking a tightrope; it gets disproportionately harder and needs experienced developers. Thus, performance-sensitive systems need exotic techniques, careful design, greater creativity, continuous performance measurements, and a feedback cycle. We need to test and identify unknowns through experiments as soon as possible. We must have a thin slice working end to end and invest early in collecting detailed metrics on the system's mechanics. We discuss this style of design in Chapter 3. All this adds complexity and cost.
In contrast, we can design performance-insensitive systems using a fingertip feel for performance and simple architectural choices. We discuss this approach in detail in Chapter 3. The answer to this question therefore significantly affects our architectural choices.
Note that many systems are performance insensitive. For example, using an open-source service framework such as Spring Boot and a database, you can easily implement a service that handles a few hundred requests per second. Even 50 requests per second adds up to 4.32 million requests per day. With most businesses, if you are getting that many requests, chances are that you are already successful and can afford to write the second and third versions of the system. Most systems never need to exceed this limit.
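The daily figure above follows from simple arithmetic. The following Python sketch reproduces it; the function name is hypothetical, and it assumes the rate is sustained evenly over 24 hours, which real traffic rarely is.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def requests_per_day(requests_per_second):
    """Daily volume for a constant request rate (assumes no peaks or quiet hours)."""
    return requests_per_second * SECONDS_PER_DAY


print(requests_per_day(50))  # 4320000, i.e., 4.32 million requests per day
```

Because real traffic has peaks, a system averaging this daily volume will see higher momentary rates, so the estimate is best read as a rough sense of scale rather than a capacity plan.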
Always ask a follow-up question: When we go beyond the limit of a trivial implementation (e.g., 50 requests per second), will we have enough money to rewrite the system? If the answer is yes, you can start with a simpler design and wait.
A much trickier scenario arises when use cases require operating within latency bounds. We discuss this topic in Chapter 3. However, in most cases a naive architecture can meet latency expectations on the order of a few seconds (e.g., 1 to 10 seconds).
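One way to make this reasoning concrete is a small decision helper. The sketch below is only an illustration: the class, the field names, and the 0.5 threshold for "close to the limit" are all assumptions, not values from the text, and treating an unaffordable rewrite as a reason to avoid the simple design is an interpretation of the question above.

```python
from dataclasses import dataclass


@dataclass
class LoadEstimate:
    """Rough numbers an architect might gather; every field is illustrative."""
    expected_peak_rps: float         # peak requests per second expected for this version
    naive_limit_rps: float           # what a simple, untuned design is believed to sustain
    can_fund_rewrite_at_limit: bool  # could the business afford a rewrite at that volume?


def start_with_simple_design(est: LoadEstimate, sensitivity_threshold: float = 0.5) -> bool:
    """Return True when starting simple and rewriting later looks reasonable.

    The threshold is an assumed cutoff for "operating close to the limit
    of a naive architecture"; pick your own based on measurements.
    """
    utilization = est.expected_peak_rps / est.naive_limit_rps
    is_performance_sensitive = utilization >= sensitivity_threshold
    return (not is_performance_sensitive) and est.can_fund_rewrite_at_limit


startup = LoadEstimate(expected_peak_rps=5, naive_limit_rps=50, can_fund_rewrite_at_limit=True)
print(start_with_simple_design(startup))  # True
```

The value of such a helper is less in the code than in forcing the two estimates, expected load and the naive limit, to be written down and revisited as measurements arrive.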
Question 4: When Can We Rewrite the System?
The fourth question helps us accept that we will rewrite the system eventually. For example, if you are a startup, do not try to build the architecture that you will need when you have a few billion users and hundreds of millions of dollars in revenue. When you get there, you will have enough money to rewrite the system several times over. Most successful systems have been rewritten many times over.
The common objection is that it would waste money to redo the system. Yes, it will cost, but to believe that you can think through all the details of a system as it will be in three to five years down the line is arrogance. There is so much uncertainty along the way. Chances are your system will not work for the first few moderate trials and will take longer to deliver.
Instead, be humble. Make the system work for the first 10,000 to 50,000 users, learn from them, and rewrite when the time is appropriate. Often, that time is not that far into the future. This approach helps us to be lean and simple, focusing on a few key problems, yet solving those problems properly. Do things that don’t scale!
Also, with modern IDEs, it is comparatively easy to refactor and redesign logic into a new structure. I believe we should plan to rewrite after key milestones (e.g., moving from a startup PoC to the first serious funding round, or passing a million users). Once we accept that we will rewrite, we often realize that many features or guarantees can be deferred to the next rewrite.
Question 5: What Are the Hard Problems?
With the line of thinking I am proposing, it is easy to forget hard problems or push them out to the future. This question guards against such procrastination. Sometimes, however, the hard problem is unrelated to the software, in which case it is someone else’s problem.
Most systems are part of a competitive landscape. We must, therefore, ask this question: What is our competitive advantage? If the competitive advantage is in the software, we have to work hard to achieve that. By definition, good competitive advantages are difficult. Otherwise, your competition would have already accomplished that or will do it once they figure it out. We can’t achieve sustainable competitive advantages by doing as little as possible.
If your hard problems do not give you competitive advantages, then there is a good chance you can learn how to solve them from others. Likely, others have solved them before, which can save you a lot of time and money. If a hard problem provides a competitive advantage, you must invest your time and energy in solving it. Invest in such problems as PoCs, independently of the system’s design.
You need to start this process as early as possible. To do so, we should first ask the question: What is the minimal implementation that tests the idea? Then we should conduct a PoC to test it. We should bring the PoCs into the system after eliminating uncertainties in the simplest way possible.
In summary, we need to identify hard problems and handle them differently. Postponing them is not advantageous. We should identify problems that need long-term work and start fixing them early on, giving us time to get them right.