Zero Downtime
How can downtime of virtual servers be avoided or eliminated?
| Problem | It is challenging to provide zero downtime guarantees when a physical host acts as a single point of failure for virtual servers. |
| Solution | A fault tolerance system is established so that when a physical server fails, virtual servers are migrated to another physical server. |
| Application | A combination of virtual server fault tolerance, replication, clustering, and load balancing is applied, and all virtual servers are stored in a shared volume, allowing different physical hosts to access their files. |
| Mechanisms | Audit Monitor, Cloud Storage Device, Cloud Usage Monitor, Failover System, Hypervisor, Live VM Migration, Logical Network Perimeter, Physical Uplink, Resource Cluster, Resource Replication, Virtual CPU, Virtual Disk, Virtual Infrastructure Manager (VIM), Virtual Network, Virtual RAM, Virtual Server, Virtual Switch, Virtualization Agent, Virtualization Monitor |
Problem
A physical server naturally acts as a single point of failure for the virtual servers it hosts. As a result, when the physical server fails or is compromised, the availability of any (or all) hosted virtual servers can be affected. This makes the issuance of zero downtime guarantees by a cloud provider to cloud consumers challenging.
Solution
A failover system is established so that virtual servers are dynamically moved to different physical server hosts in the event that their original physical server host fails. For example, in Figure 4.30, Virtual Server A is dynamically moved to another physical server host.
Figure 4.30 Physical Server A fails, triggering the live VM migration program to dynamically move Virtual Server A to Physical Server B.
Application
Multiple physical servers are assembled into a group that is controlled by a fault tolerance system capable of switching activity from one physical server to another, without interruption. Resource cluster and live VM migration components are commonly part of this form of high availability cloud architecture.
The resulting fault tolerance assures that, in case of physical server failure, hosted virtual servers will be migrated to a secondary physical server. All virtual servers are stored on a shared volume (as per Persistent Virtual Network Configuration (227)) so that other physical server hosts in the same group can access their files.
Live storage replication can further be utilized to guarantee that virtual server files and hard disks remain available via secondary storage devices.
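The failover behavior described above can be illustrated with a minimal Python sketch. It is not the pattern's reference implementation: the `PhysicalServer` class, its `capacity` field, the `fail_over` function, and the "most free slots" placement rule are all hypothetical names and choices introduced for illustration. The sketch assumes virtual server files already reside on a shared volume, so only the hosting assignment changes.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalServer:
    """Hypothetical model of a physical host in a fault tolerance group."""
    name: str
    capacity: int  # assumed limit on how many virtual servers the host can run
    healthy: bool = True
    virtual_servers: list[str] = field(default_factory=list)

    def free_slots(self) -> int:
        return self.capacity - len(self.virtual_servers)


def fail_over(failed: PhysicalServer, group: list[PhysicalServer]) -> dict[str, str]:
    """Reassign every virtual server on `failed` to a healthy host in `group`.

    Returns a mapping of virtual server name to its new host name. The files
    are assumed to live on a shared volume, so no data is copied here.
    """
    failed.healthy = False
    placements: dict[str, str] = {}
    for vm in list(failed.virtual_servers):
        candidates = [
            host for host in group
            if host is not failed and host.healthy and host.free_slots() > 0
        ]
        if not candidates:
            raise RuntimeError(f"no healthy host with capacity for {vm}")
        # Illustrative placement rule: pick the host with the most free slots.
        target = max(candidates, key=PhysicalServer.free_slots)
        failed.virtual_servers.remove(vm)
        target.virtual_servers.append(vm)
        placements[vm] = target.name
    return placements


host_a = PhysicalServer("Physical Server A", capacity=2,
                        virtual_servers=["Virtual Server A"])
host_b = PhysicalServer("Physical Server B", capacity=2)
print(fail_over(host_a, [host_a, host_b]))
# prints {'Virtual Server A': 'Physical Server B'}
```

A real fault tolerance system would also have to preserve the running state of each virtual server during the move. The sketch only captures the placement decision.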
Mechanisms
- Audit Monitor – This mechanism may be required to ensure that the relocation of virtual servers does not relocate hosted data to prohibited locations.
- Cloud Storage Device – A cloud storage device is used to store virtual server network configuration data shared by the physical servers. It stores virtual servers and virtual disks in a central repository so that other available hypervisors can access the files and power on the failed virtual servers in case one of the hypervisors fails.
- Cloud Usage Monitor – Incarnations of this mechanism are used to monitor the actual IT resource usage of cloud consumers to help ensure that virtual server capacities are not exceeded.
- Failover System – The failover system can be used to switch from a failed primary physical server to a secondary physical server.
- Hypervisor – The hypervisor of each affected physical server hosts the affected virtual servers.
- Live VM Migration – When multiple instances of the same service or virtual server are provisioned for the purpose of redundancy and availability, this mechanism is used to seamlessly distribute different instances of the same service between different hypervisors to make sure one hypervisor will not become a single point of failure.
- Logical Network Perimeter – Logical network perimeters provide and maintain the isolation that is required to ensure that each cloud consumer remains within its own logical boundary subsequent to virtual server relocation.
- Physical Uplink – Physical uplinks are deployed in a redundant configuration so that the virtual servers and services do not lose connectivity to cloud service consumers if a physical uplink fails or becomes disconnected.
- Resource Cluster – The resource cluster mechanism is applied to create different types of active/active cluster groups that collaboratively improve the availability of virtual server-hosted IT resources.
- Resource Replication – This mechanism can create new virtual server and cloud service instances upon primary virtual server failure.
- Virtual CPU – The virtual CPU mechanism is used to provide CPU cycling, scheduling, and processing capabilities to the virtual servers.
- Virtual Disk – This mechanism is used to allocate local storage space to the hosted virtual servers.
- Virtual Infrastructure Manager (VIM) – This mechanism is used to control the availability and redundancy of the virtual servers and services, and it initiates the appropriate commands when the environment must be rebalanced or a new instance of a service or virtual server must be created.
- Virtual Network – This mechanism is used to connect virtual servers and the services hosted on top of them.
- Virtual RAM – This mechanism is used to establish access for the virtual servers and applications to the physical memory installed on the physical server.
- Virtual Server – This is the mechanism to which this pattern is primarily applied.
- Virtual Switch – This mechanism is used to connect hosted virtual servers to the physical network and external cloud service consumers using physical uplinks.
- Virtualization Agent – Virtual servers use this mechanism to send regular heartbeat messages to the hypervisor. A recovery process is initiated if the hypervisor does not receive heartbeats for an extended period of time.
- Virtualization Monitor – This mechanism is used to monitor the virtual servers’ availability and operational status.
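
The heartbeat check described for the virtualization agent can be sketched as follows. This is a simplified illustration, not how any particular hypervisor works: the `HeartbeatMonitor` class, its method names, and the 15-second timeout are assumptions. Timestamps are passed in explicitly so the example stays deterministic.

```python
class HeartbeatMonitor:
    """Hypothetical hypervisor-side tracker of virtual server heartbeats."""

    def __init__(self, timeout_seconds: float):
        # Assumed threshold after which a silent virtual server needs recovery.
        self.timeout_seconds = timeout_seconds
        self._last_seen: dict[str, float] = {}

    def record_heartbeat(self, virtual_server: str, now: float) -> None:
        """Store the time of the latest heartbeat from a virtual server."""
        self._last_seen[virtual_server] = now

    def overdue(self, now: float) -> list[str]:
        """Return the virtual servers whose last heartbeat is older than the timeout."""
        return sorted(
            vm for vm, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=15.0)
monitor.record_heartbeat("Virtual Server A", now=0.0)
monitor.record_heartbeat("Virtual Server B", now=10.0)
print(monitor.overdue(now=20.0))
# prints ['Virtual Server A']
```

In this sketch, the virtual servers returned by `overdue` are the ones for which a recovery process would be initiated.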