Reliability, Resiliency and Recovery Design Patterns in Cloud Computing
- Resource Pooling
- Resource Reservation
- Hypervisor Clustering
- Redundant Storage
- Dynamic Failure Detection and Recovery
- Multipath Resource Access
- Redundant Physical Connection for Virtual Servers
- Synchronized Operating State
- Zero Downtime
- Storage Maintenance Window
- Virtual Server Auto Crash Recovery
- Non-Disruptive Service Relocation
- Resource Pooling
- Resource Reservation
- Hypervisor Clustering
- Redundant Storage
- Dynamic Failure Detection and Recovery
- Multipath Resource Access
- Redundant Physical Connection for Virtual Servers
- Synchronized Operating State
- Zero Downtime
- Storage Maintenance Window
- Virtual Server Auto Crash Recovery
- Non-Disruptive Service Relocation
Contingency planning efforts for continuity of operations and disaster recovery are concerned with designing and implementing cloud architectures that provide runtime reliability, operational resiliency, and automated recovery when interruptions are encountered, regardless of origin.
The patterns in this chapter address different aspects of these requirements. Starting with foundational patterns, such as Resource Pooling (99), Resource Reservation (106), Hypervisor Clustering (112), and Redundant Storage (119), which address basic failover and availability demands, the chapter continues with more specialized and complex patterns, such as Dynamic Failure Detection and Recovery (123) and Zero Downtime (143), which establish resilient cloud architectures that act as pillars for enterprise cloud solutions.
It is also worth noting that this set of patterns establishes and contributes to the availability leg of the security triad of confidentiality, integrity, and availability and is further complemented by several cloud security patterns in Chapters 8 and 9 in maximizing the reliability and resiliency potential by protecting against attacks that can compromise the availability of an organization’s cloud-hosted IT resources.
Resource Pooling
How can IT resources be organized to support dynamic sharing?
Problem |
When sharing identical IT resources for scalability purposes, it can be error-prone and burdensome to keep them fully synchronized on an on-going basis. |
Solution |
An automated synchronization system is provided to group identical IT resources into pools and to maintain their synchronicity. |
Application |
Resource pools can be created at different sizes and further organized into hierarchies to provide parent and child pools. |
Mechanisms |
Audit Monitor, Cloud Storage Device, Cloud Usage Monitor, Hypervisor, Logical Network Perimeter, Pay-Per-Use Monitor, Remote Administration System, Resource Management System, Resource Replication, Virtual CPU, Virtual Infrastructure Manager (VIM), Virtual RAM, Virtual Server |
Problem
When assembling identical IT resources for sharing and scalability purposes (such as when applying Shared Resources (17) and Dynamic Scalability (25)), the IT resources need to carefully be kept synchronized so that no one IT resource differs from another.
Manually establishing and maintaining the level of required synchronicity across collections of shared IT resources is challenging, effort-intensive and, most importantly, error-prone. Variances or disparity between shared IT resources can lead to inconsistent runtime behavior and cause numerous types of runtime exceptions.
Solution
Identical IT resources are grouped into resource pools and maintained by a system that automatically ensures they remain synchronized (Figure 4.1). The following items are commonly pooled:
- physical servers
- virtual servers
- cloud storage devices
- internetwork and networking devices
- CPUs
- memory (RAM)
Figure 4.1 A sample resource pool comprised of four sub-pools of CPUs, memory, cloud storage devices, and virtual network devices.
Dedicated pools can be created for each of these items, or respective pools can be further grouped into a larger pool (in which case each individual pool becomes a sub-pool).
Application
As stated previously, this pattern is primarily applied in support of Shared Resources (17) and Dynamic Scalability (25) in order to establish a reliable system of shared IT resource synchronization. The Resource Pooling pattern itself can be further supported by the application of Resource Reservation (106).
Provided here are common examples of resource pools:
Physical server pools composed of ready-to-go, networked servers installed with operating systems and any other necessary programs or applications.
Virtual server pools are usually configured using templates that cloud consumers can choose from, such as a pool of mid-tier Windows servers with 4 GBs of RAM or a pool of low-tier Ubuntu servers with 2 GBs of RAM.
Storage pools (or cloud storage device pools) that consist of file-based or block-based storage structures. Storage pools can contain empty or filled cloud storage devices. Often storage resource pools will take advantage of LUNs.
Network pools (or interconnect pools) are composed of different, preconfigured network connectivity devices. For example, a pool of virtual firewall devices or physical network switches can be created for redundant connectivity, load balancing, or link aggregation.
CPU pools are ready to be allocated to virtual servers. These are often broken down into individual processing cores (as opposed to pooling entire CPUs).
Pools of physical RAM that can be used in newly provisioned physical servers or to vertically scale physical servers.
Resource pools can grow to become complex, with multiple pools created for specific cloud consumers or applications. To help with the organization of diverse resource pools, a hierarchical structure can be established to create parent, sibling, and nested pools.
Sibling resource pools are normally drawn from the same collection of physical IT resources (as opposed to IT resources spread out over different data centers) and are isolated from one another so that each cloud consumer is only provided access to its respective pool (Figure 4.2).
Figure 4.2 Pools B and C are sibling pools taken from the larger Pool A that has been allocated to a cloud consumer. This is an alternative to taking the IT resources for Pool B and Pool C from a general reserve of IT resources that is shared throughout the cloud.
In the nested pool model, larger pools are divided into smaller pools of the same kind (Figure 4.3). Nested pools can be used to assign resource pools to different departments or groups within the same cloud consumer organization.
Figure 4.3 Nested Pools A.1 and A.2 are comprised of the same IT resources as Pool A, but in different quantities. Nested pools are generally used to provision cloud services that are rapidly instantiated using the same kind of IT resources with the same configuration settings.
After resources pools have been defined, multiple instances of IT resources from each pool can be created to provide an in-memory pool of “live” IT resources.
Mechanisms
- Audit Monitor – This mechanism monitors resource pool usage to ensure compliance with privacy and regulation requirements, especially when pools include cloud storage devices or data loaded into memory.
- Cloud Storage Device – Cloud storage devices are commonly pooled as a result of the application of this pattern.
- Cloud Usage Monitor – Various cloud usage monitors can be involved with the runtime tracking and synchronization required by IT resources within pools and by the systems managing the resource pools themselves.
- Hypervisor – The hypervisor mechanism is responsible for providing virtual servers with access to resource pools, and hosting virtual servers and sometimes the resource pools themselves. Hypervisors further can distribute physical computing capacity between the virtual servers based on each virtual server’s configuration and priority.
- Logical Network Perimeter – The logical network perimeter can be used to logically organize and isolate the resource pools.
- Pay-Per-Use Monitor – The pay-per-use monitor collects usage and billing information in relation to how individual cloud consumers use and are allocated IT resources from various pools.
- Remote Administration System – This mechanism is commonly used to interface with backend systems and programs in order to provide resource pool administration features via a front-end portal.
- Resource Management System – The resource management system mechanism supplies cloud consumers with the tools and permission management options to administer resource pools.
- Resource Replication – This mechanism can be used to generate new instances of IT resources for a given resource pool.
- Virtual CPU – This mechanism is used to allocate CPU to virtual servers, and also helps to determine whether a hypervisor’s physical CPU is being over-utilized or a virtual server requires more CPU capacity. When a system has more than one CPU or when hypervisors belong to the same cluster, their total CPU capacity can be aggregated into a pool and leveraged by virtual servers.
- Virtual Infrastructure Manager (VIM) – This mechanism enables pools of resources to be created on individual hypervisors, and can also aggregate the capacity of multiple hypervisors into a pool from where virtual CPU and memory resources can be assigned to virtual servers.
- Virtual RAM – This mechanism is used to allocate memory to virtual servers, and to measure the memory utilization of hypervisors and virtual servers. When more than one hypervisor is present, a pool encompassing the combined memory capacity of the hypervisors can be created. This mechanism is also used to identify whether more memory should be added to a virtual server.
- Virtual Server – This mechanism is associated with the Resource Pooling pattern in how virtual server hosted IT resources are provisioned and consumed by resource pools that are assigned to cloud consumers. Virtual servers themselves may also be pooled.