4.2 Overall Architecture Structure
Before delving into the details of the overall structure, let us clarify how we use certain terminology. The terms module and component are frequently overloaded and used in different fashions in different writings. For us, a module is a code unit with coherent functionality. A component is an executable unit. A compiler or interpreter turns modules into binaries, and a builder turns the binaries into components. The development team thus directly develops modules. Components are built from the modules that development teams produce, so it is possible to speak of a team developing a component; it should be clear, however, that a team develops a component only indirectly, through its modules.
As we described in Chapter 1, development teams using DevOps processes are usually small and should have limited inter-team coordination. Small teams imply that each team has a limited scope in terms of the components they develop. When a team deploys a component, it cannot go into production unless the component is compatible with other components with which it interacts. This compatibility can be ensured explicitly through multi-team coordination, or it can be ensured implicitly through the definition of the architecture.
An organization can introduce continuous deployment without major architectural modifications. For example, the case study in Chapter 12 is fundamentally architecture-agnostic. Dramatically reducing the time required to place a component into production, however, requires architectural support:
- Deploying without the necessity of explicit coordination with other teams reduces the time required to place a component into production.
- Allowing different versions of the same service to be in production simultaneously lets individual team members deploy without coordinating with other members of their team.
- Rolling back a deployment in the event of errors allows for various forms of live testing.
Microservice architecture is an architectural style that satisfies these requirements. It is used in practice by organizations that have adopted, and in some cases inspired, many DevOps practices. Although project requirements may cause deviations from this style, it remains a good general basis for projects that are adopting DevOps practices.
A microservice architecture consists of a collection of services where each service provides a small amount of functionality and the total functionality of the system is derived from composing multiple services. In Chapter 6, we also see that a microservice architecture, with some modifications, gives each team the ability to deploy their service independently from other teams, to have multiple versions of a service in production simultaneously, and to roll back to a prior version relatively easily.
Figure 4.1 describes the situation that results from using a microservice architecture. A user interacts with a single consumer-facing service. This service, in turn, utilizes a collection of other services. We use the term service to refer to a component that provides a service and the term client to refer to a component that requests a service. A single component can be a client in one interaction and a service in another. In a system such as LinkedIn, the service depth may reach as much as 70 for a single user request.
Figure 4.1 User interacting with a single service that, in turn, utilizes multiple other services [Notation: Architecture]
Having an architecture composed of small services is a response to having small teams. Now we look at the aspects of an architecture that can be specified globally as a response to the requirement that inter-team coordination be minimized. We discuss three categories of design decisions that can be made globally as a portion of the architecture design, thus removing the need for inter-team coordination with respect to these decisions. The three categories are: the coordination model, management of resources, and mapping among architectural elements.
Coordination Model
If two services interact, the two development teams responsible for those services must coordinate in some fashion. Two details of the coordination model that can be included in the overall architecture are: how a client discovers a service that it wishes to use, and how the individual services communicate.
Figure 4.2 gives an overview of the interaction between a service and its client. The service registers with a registry. The registration includes a name for the service as well as information on how to invoke it, for example, an endpoint location as a URL or an IP address. A client can retrieve the information about the service from the registry and invoke the service using this information. If the registry provides IP addresses, it acts as a local DNS server (local because, typically, the registry is not open to the general Internet but is within the environment of the application). Netflix Eureka is an example of a cloud service registry that acts as a DNS server. The registry serves as a catalogue of available services, and can further be used to track aspects such as versioning, ownership, service level agreements (SLAs), etc., for the set of services in an organization. We discuss extensions to the registry further in Chapter 6.
Figure 4.2 An instance of a service registers itself with the registry; the client queries the registry for the address of the service and invokes the service. [Notation: Architecture]
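The register/lookup/invoke cycle of Figure 4.2 can be sketched as follows. This is a minimal illustration, not a real registry API: the class, the service name "inventory", and the endpoint URL are all invented for the example.

```python
class ServiceRegistry:
    """Maps a service name to the information needed to invoke the service."""

    def __init__(self):
        self._entries = {}               # name -> list of endpoints

    def register(self, name, endpoint):
        # A service instance announces itself under a well-known name,
        # supplying its invocation information (here, a URL).
        self._entries.setdefault(name, []).append(endpoint)

    def lookup(self, name):
        # A client retrieves the invocation information by name.
        instances = self._entries.get(name)
        if not instances:
            raise LookupError(f"no instance of {name!r} is registered")
        return instances[0]

registry = ServiceRegistry()
registry.register("inventory", "http://10.0.0.7:8080")   # service side
endpoint = registry.lookup("inventory")                   # client side
```

The client never hard-codes the endpoint; it depends only on the agreed-upon service name, which is exactly the coordination decision the architecture fixes globally.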
There will typically be multiple instances of a service, both to support a load too heavy for a single instance and to guard against failure. The registry can rotate among the registered instances to balance the load. That is, the registry acts as a load balancer as well as a registry. Finally, consider the possibility that an instance of a service may fail. In this case, the registry should not direct the client to the failed instance. Requiring the service to periodically renew its registration, or proactively checking the health of the service, guards against this failure. If the service fails to renew its registration within the specified period, it is removed from the registry. Because multiple instances of the service typically exist, the failure of one instance does not remove the service. The above-mentioned Netflix Eureka is an example of a registry offering load balancing; Eureka supports the requirement that services periodically renew their registration.
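The renewal-based failure guard and round-robin rotation can be sketched together. This is a simplified illustration in the spirit of Eureka's lease mechanism, not Eureka's actual API; the lease length, endpoints, and fake clock are assumptions made for the example.

```python
import itertools
import time

class Registry:
    """A registry that balances load (round-robin) and guards against
    failure by expiring registrations that are not renewed in time."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self._last_renewal = {}          # endpoint -> time of last renewal
        self._rotation = itertools.count()

    def renew(self, endpoint):
        # Called on initial registration and on every periodic renewal.
        self._last_renewal[endpoint] = self.clock()

    def next_instance(self):
        # Ignore instances whose lease expired (presumed failed), then
        # rotate among the survivors to balance the load.
        now = self.clock()
        live = [e for e, t in self._last_renewal.items()
                if now - t <= self.lease_seconds]
        if not live:
            raise LookupError("no live instance available")
        return live[next(self._rotation) % len(live)]

now = [0.0]                              # fake clock keeps the example deterministic
reg = Registry(lease_seconds=10.0, clock=lambda: now[0])
reg.renew("10.0.0.7:8080")
reg.renew("10.0.0.8:8080")
now[0] = 20.0                            # only the second instance renews in time
reg.renew("10.0.0.8:8080")
```

After the lease on the first instance lapses, `next_instance` returns only the surviving endpoint; clients are never directed to the presumed-failed instance.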
The protocol used for communication between the client and the service can be any remote communication protocol, for example, HTTP, RPC, SOAP, etc. The service can provide a RESTful interface or not. The remote communication protocol should be the only means for communication among the services. The details of the interface provided by the service still require cross-team coordination. When we discuss the example of Amazon later, we will see one method of providing this coordination. We will also see an explicit requirement for restricting communication among services to the remote communication protocol.
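As a minimal illustration of restricting communication to a remote protocol, the following sketch stands up a hypothetical JSON-over-HTTP "status" service and invokes it. The service name, path, and payload are invented for the example; only Python's standard-library HTTP machinery is used.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class StatusHandler(BaseHTTPRequestHandler):
    """A hypothetical service exposing a RESTful, JSON-over-HTTP interface."""

    def do_GET(self):
        body = json.dumps({"service": "status", "healthy": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass                             # keep the example quiet

# Port 0 asks the OS for a free port; the registry of Figure 4.2 would
# publish the resulting endpoint.
server = HTTPServer(("127.0.0.1", 0), StatusHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client invokes the service using only the remote protocol.
with urlopen(f"http://127.0.0.1:{server.server_port}/status") as resp:
    payload = json.load(resp)
server.shutdown()
```

Because the client sees nothing but HTTP and JSON, the service's internal implementation can change, or be redeployed, without cross-team coordination; only the interface contract itself still requires agreement.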
Management of Resources
Two types of resource management decisions can be made globally and incorporated in the architecture: provisioning/deprovisioning VMs and managing variation in demand.
Provisioning and Deprovisioning VMs
New VMs can be created in response to client demand or to failure. When the demand subsides, instances should be deprovisioned. If the instances are stateless (i.e., they do not retain any information between requests), a new instance can be placed into service as soon as it is provisioned. Similarly, if no state is kept in an instance, deprovisioning becomes relatively painless: After a cool-down period where the instance receives no new requests and responds to existing ones, the instance can be deprovisioned. The cool-down period should therefore be long enough for an instance to respond to all requests it received (i.e., the backlog). If you deprovision an instance due to reduced demand, the backlog should be fairly small; in any other case this action needs to be considered carefully. An additional advantage of a stateless service is that messages can be routed to any instance of that service, which facilitates load sharing among the instances.
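The cool-down step described above can be sketched as follows. The Instance class and its backlog are hypothetical stand-ins for a routing layer and real request handling.

```python
class Instance:
    """A stateless service instance being prepared for deprovisioning."""

    def __init__(self):
        self.accepting = True
        self.backlog = []            # requests received but not yet answered

    def start_cooldown(self):
        # The routing layer stops sending new requests to this instance.
        self.accepting = False

    def drain(self):
        # Existing requests complete during the cool-down period.
        while self.backlog:
            self.backlog.pop(0)      # stand-in for answering one request

    def safe_to_deprovision(self):
        # Safe only once no new work can arrive and the backlog is empty.
        return not self.accepting and not self.backlog

instance = Instance()
instance.backlog = ["req-1", "req-2"]    # small backlog, as the text expects
instance.start_cooldown()
instance.drain()
```

Because the instance holds no state between requests, nothing needs to be migrated before termination; draining the backlog is the only obligation.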
This leads to a global decision to maintain state external to a service instance. As discussed in Chapter 2, large amounts of application state can be maintained in persistent storage, small amounts of application state can be maintained by tools such as ZooKeeper, and client state should not be maintained on the provider's side anyway.
Determining which component controls the provisioning and deprovisioning of a new instance for a service is another important aspect. Three possibilities exist for the controlling component.
- A service itself can be responsible for (de)provisioning additional instances. A service can know its own queue lengths and its own performance in response to requests. It can compare these metrics to thresholds and (de)provision an instance itself if the threshold is crossed. Assuming that the distribution of requests is fair, in some sense, across all instances of the service, one particular instance (e.g., the oldest one) of the service can make the decision when to provision or deprovision instances. Thus, the service is allowed to expand or shrink capacity to meet demand.
- A client or a component in the client chain can be responsible for (de)provisioning instances of a service. For example, the client, based on the demands placed on it, may be aware that it will shortly make demands on the service that exceed a given threshold, and it can provision new instances of the service in advance.
- An external component monitors the performance of service instances (e.g., their CPU load) and (de)provisions an instance when the load reaches a given threshold. Amazon's autoscaling groups provide this capability, in collaboration with the CloudWatch monitoring system.
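A threshold rule of the kind the third option's external controller might apply can be sketched as follows. The thresholds and the use of a simple average are illustrative assumptions, not how Amazon's autoscaling actually decides.

```python
def desired_change(cpu_loads, high=0.75, low=0.25):
    """Decide whether an external controller should (de)provision an
    instance, given the CPU load (0.0 to 1.0) of each current instance."""
    average = sum(cpu_loads) / len(cpu_loads)
    if average > high:
        return +1                      # provision one more instance
    if average < low and len(cpu_loads) > 1:
        return -1                      # deprovision, but keep at least one
    return 0                           # load is within bounds; do nothing
```

A monitoring loop would call this periodically with fresh measurements, e.g. `desired_change([0.9, 0.8])` signals a scale-out, while `desired_change([0.1, 0.2])` signals a scale-in.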
Managing Demand
The number of instances of an individual service that exist should reflect the demand on the service from client requests. We just discussed several different methods for provisioning and deprovisioning instances, and these methods make different assumptions about how demand is managed.
- One method for managing demand is to monitor performance: when demand grows, that growth needs to be detected so that a new instance can be provisioned. Because it takes time to provision a new instance, it is important that the indicators are timely, and even predictive, to accommodate for that time. A further decision is how to implement the monitoring (e.g., whether it is done internally, by running a monitoring agent inside each service instance, or externally, by a specialized component). We discuss more details about monitoring in Chapter 7.
- Another possible technique is to use SLAs to control the number of instances. Each instance of the service guarantees through its SLAs that it is able to handle a certain number of requests with a specified latency. The clients of that service then know how many requests they can send and still receive a response within the specified latency. This technique has several constraints. First, it is likely that the requirements a client imposes on your service will depend on the requirements imposed on the client, so there is a cascading effect up through the demand chain. This cascading causes uncertainty in both the specification and the realization of the SLAs. A second constraint of the SLA technique is that each instance of your service may know how many requests it can handle, but the client has multiple instances of your service available. Thus, the provisioning component has to know how many instances of your service currently exist.
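Under the SLA technique, the provisioning component's sizing decision reduces to a simple calculation. The request rates below are invented for illustration.

```python
import math

def instances_needed(expected_requests_per_second, sla_requests_per_second):
    """How many instances the provisioning component must keep running if
    each instance guarantees, via its SLA, that it can handle
    sla_requests_per_second within the promised latency."""
    return math.ceil(expected_requests_per_second / sla_requests_per_second)

# A client expecting 950 requests/second, against instances whose SLA
# promises 200 requests/second each, needs 5 instances.
required = instances_needed(950, 200)
```

The cascading effect mentioned above enters through the first argument: the expected demand on your service is itself derived from the SLAs your clients have promised further up the chain.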
Mapping Among Architectural Elements
The final type of coordination decision that can be specified in the architecture is the mapping among architectural elements. We discuss two different types of mappings: work assignments and allocation. Both of these are decisions that are made globally.
- Work assignments. A single team may work on multiple modules, but having multiple development teams work on the same module requires a great deal of coordination among those development teams. Since coordination takes time, an easier structure is to package the work of a single team into modules and develop interfaces among the modules to allow modules developed by different teams to interoperate. In fact, the original definition of a module by David Parnas in the 1970s was as a work assignment of a team. Although not required, it is reasonable that each component (i.e., microservice) is the responsibility of a single development team. That is, the set of modules that, when linked, constitute a component are the output of a single development team. This does not preclude a single development team from being responsible for multiple components but it means that any coordination involving a component is settled within a single development team, and that any coordination involving multiple development teams goes across components. Given the set of constraints on the architecture we are describing, cross-team coordination requirements are limited.
- Allocation. Each component (i.e., microservice) will exist as an independent deployable unit. This allows each component to be allocated to a single (virtual) machine or container, or it allows multiple components to be allocated to a single (virtual) machine. The redeployment or upgrade of one microservice will not affect any other microservices. We explore this choice in Chapter 6.