Virtual Server Auto Crash Recovery
In the event that a virtual server’s operating system crashes, how can the hosted cloud services be automatically recovered?
| Problem | A virtual server whose operating system suddenly fails needs its hosted cloud services to be recovered automatically. |
| Solution | The virtual server’s activity is constantly monitored and traced so that it can be recovered in the event of an operating system failure. |
| Application | The hypervisor uses specific techniques and mechanisms to check the operational status of the virtual server. |
| Mechanisms | Hypervisor, Virtualization Agent |
Problem
When the operating system of a virtual server fails or crashes, the cloud services that are hosted on this virtual server also become unavailable. This can in turn cause an outage or even an SLA breach, since some organizations have little to no tolerance for outages.
The following steps are shown in Figure 4.38:
- Cloud Service A is running on Virtual Server A.
- Cloud consumers suddenly cannot access the service.
- An investigation shows that Hypervisor A is working fine and has been allocating resources to Virtual Server A. However, Virtual Server A’s resource usage is zero. Further investigation reveals that its operating system has crashed, which is why Cloud Service A is not working.
Figure 4.38 Cloud Service A becomes suddenly inaccessible to a cloud consumer.
The system administrator has to manually reboot Virtual Server A in order to bring it back into operation.
Solution
Applying this pattern ensures that the hypervisor routinely checks the operational status of a given virtual server. If the virtual server is not running or shows no signs of operation after a certain length of time, the hypervisor automatically restarts it to recover it from the crash.
The scenario that results is illustrated in Figure 4.39 in the following steps:
- Hypervisor A is monitoring Virtual Server A’s operational status.
- Cloud Service A becomes unavailable due to an operating system failure on Virtual Server A.
- Hypervisor A becomes aware of the nonoperational status of Virtual Server A immediately.
- Hypervisor A restarts Virtual Server A.
- Service resumes without human intervention, and cloud consumers can access Cloud Service A again.
- Hypervisor A continues to monitor the operational status of Virtual Server A.
Figure 4.39 After the crash, Cloud Service A becomes available again.
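The monitoring and restart behavior described in this section can be sketched as a small watchdog. This is a minimal illustration, not the interface of any real hypervisor: the class name `VirtualServerWatchdog`, the `restart` callback, and the 30-second `timeout_s` default are all assumptions made for the example. Time is passed in explicitly so the logic is easy to follow and test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VirtualServerWatchdog:
    """Hypothetical hypervisor-side watchdog for a single virtual server."""

    restart: Callable[[], None]  # action that restarts the virtual server
    timeout_s: float = 30.0      # assumed "certain length of time" without activity
    last_seen: float = 0.0       # time of the last observed sign of operation
    restarts: int = 0            # number of automatic restarts issued so far

    def record_activity(self, now: float) -> None:
        """Note that the virtual server showed a sign of operation at `now`."""
        self.last_seen = now

    def check(self, now: float) -> bool:
        """Restart the server if it has been silent for longer than timeout_s.

        Returns True when a restart was issued by this check.
        """
        if now - self.last_seen > self.timeout_s:
            self.restart()
            self.restarts += 1
            # Give the restarted server a fresh window before the next check.
            self.last_seen = now
            return True
        return False
```

In this sketch, a routine check at 10 seconds after the last activity does nothing, while a check at 45 seconds exceeds the assumed timeout and invokes the restart callback without any operator involvement.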
Application
This pattern can be applied in different ways, depending on the brand and model of the hypervisor and the mechanism used to track the resource utilization of the virtual servers. Figure 4.40 illustrates the steps involved in applying the pattern.
Figure 4.40 The steps involved in applying this pattern are shown.
Different methods and mechanisms can be used to check the virtual server’s operational status. One is to install an agent inside the virtual server that reports back to the hypervisor. Another is for the hypervisor to check the virtual server’s resource usage, including memory and CPU usage, at pre-defined intervals. A third is to check the virtual server’s network and storage traffic to determine whether it is communicating over the network and whether it is accessing or requesting any storage.
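The three kinds of checks can be combined into a single status decision. The sketch below assumes a hypothetical `StatusSample` record collected by the hypervisor at each interval; the field names, the 15-second heartbeat age limit, and the rule that any one sign of activity counts as operational are illustrative choices, not the behavior of a specific product.

```python
from dataclasses import dataclass


@dataclass
class StatusSample:
    """One observation of a virtual server (all fields are hypothetical)."""

    agent_heartbeat_age_s: float  # seconds since the in-guest agent last reported
    cpu_percent: float            # CPU usage over the sampling interval
    memory_active_mb: float       # actively used memory over the interval
    network_bytes: int            # bytes sent or received since the previous sample
    storage_ops: int              # storage reads or writes since the previous sample


def signs_of_operation(sample: StatusSample, heartbeat_max_age_s: float = 15.0) -> list:
    """Return the names of the checks that observed activity."""
    signs = []
    if sample.agent_heartbeat_age_s <= heartbeat_max_age_s:
        signs.append("agent")      # agent inside the virtual server reported recently
    if sample.cpu_percent > 0 or sample.memory_active_mb > 0:
        signs.append("resources")  # CPU or memory is in use
    if sample.network_bytes > 0 or sample.storage_ops > 0:
        signs.append("traffic")    # network or storage traffic was seen
    return signs


def appears_operational(sample: StatusSample) -> bool:
    """Treat the virtual server as operational if any check saw activity."""
    return bool(signs_of_operation(sample))
```

A sample with a stale heartbeat, zero resource usage, and no traffic matches the crashed Virtual Server A from the problem scenario and yields no signs of operation, which is the condition under which the hypervisor would restart it.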
While this pattern ensures that virtual servers, applications, and services are operational and can be automatically recovered in the case of an operating system failure, it may also restart a virtual server as a result of a “false positive,” for example when a healthy virtual server happens to show no measurable activity during the check.
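One common way to reduce the risk of a false positive is to require several consecutive failed checks before restarting. The source does not prescribe this; the sketch below is an assumption-laden illustration, and the `RestartPolicy` name and the default of three failures are hypothetical.

```python
class RestartPolicy:
    """Hypothetical policy: restart only after several consecutive failed checks."""

    def __init__(self, failures_required: int = 3) -> None:
        self.failures_required = failures_required  # assumed threshold
        self.consecutive_failures = 0

    def observe(self, operational: bool) -> bool:
        """Record one check result; return True when a restart should be issued."""
        if operational:
            # Any sign of operation clears the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failures_required:
            # Start counting afresh once the restart has been requested.
            self.consecutive_failures = 0
            return True
        return False
```

The trade-off is recovery time: a higher threshold makes an unnecessary restart less likely, but a genuinely crashed virtual server stays down for more check intervals before it is restarted.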
Mechanisms
- Hypervisor – The hypervisor mechanism hosts the virtual servers and is responsible for making sure the virtual servers are up and running. Any failed or crashed virtual servers are restarted by this mechanism.
- Virtualization Agent – This mechanism establishes one-way communication via specialized messages that are sent by the virtual servers to the host hypervisor at frequent and regular intervals to confirm virtual server operation.