- The Cookbook for Setting Up a Serviceguard Package-less Cluster
- The Basics of a Failure
- The Basics of a Cluster
- The "Split-Brain" Syndrome
- Hardware and Software Considerations for Setting Up a Cluster
- Testing Critical Hardware before Setting Up a Cluster
- Setting Up a Serviceguard Package-less Cluster
- Constant Monitoring
- Chapter Review
- Test Your Knowledge
- Answers to Test Your Knowledge
- Chapter Review Questions
- Answers to Chapter Review Questions
25.2 The Basics of a Failure
Users connect to the application through an application, or application package, IP address, which removes the dependency between an individual server and an individual application. In the event of a "failure," the application is restarted on another node, which we will refer to as an "adoptive node." Ideally, we want each application to run on its prescribed machine for as long as possible, so that there is no need to restart it on an adoptive node. The following general points describe what constitutes a failure:
- A failure of all LAN communications: If we have a Standby LAN card, Serviceguard will use it. Otherwise, the application package will be moved to an adoptive node.
- Total system failure: The cluster will detect that a node is no longer functioning and restart the application package on an adoptive node.
- Application failure: The cluster monitors prescribed application processes. If such a process dies, Serviceguard has two options: restart the process a prescribed number of times, or restart the application on an adoptive node.
- Other critical resources fail: Serviceguard can be configured to use the Event Monitoring Service (EMS), which can monitor the state of critical system components and resources. Should one of these components or resources fail, the application package will be moved to an adoptive node.
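The decision logic in the list above can be summarized as a small Python sketch. This is only an illustrative model of the rules as described here, not Serviceguard code or its API; the names `Failure`, `Action`, `NodeState`, and `decide` are hypothetical and exist purely for this example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Failure(Enum):
    LAN = auto()          # all LAN communications lost on the node
    SYSTEM = auto()       # the node itself is no longer functioning
    APPLICATION = auto()  # a monitored application process died
    RESOURCE = auto()     # an EMS-monitored component or resource failed


class Action(Enum):
    USE_STANDBY_LAN = auto()
    RESTART_PROCESS = auto()
    MOVE_TO_ADOPTIVE_NODE = auto()


@dataclass
class NodeState:
    # Hypothetical view of a node, for illustration only.
    has_standby_lan: bool = False
    restarts_used: int = 0   # restarts already attempted for the process
    max_restarts: int = 0    # prescribed restart limit (0 = fail over at once)


def decide(failure: Failure, node: NodeState) -> Action:
    """Map a failure to the response described in the list above."""
    # A LAN failure is absorbed locally if a Standby LAN card exists.
    if failure is Failure.LAN and node.has_standby_lan:
        return Action.USE_STANDBY_LAN
    # A dead process may be restarted in place until the limit is reached.
    if failure is Failure.APPLICATION and node.restarts_used < node.max_restarts:
        return Action.RESTART_PROCESS
    # Everything else moves the package to an adoptive node.
    return Action.MOVE_TO_ADOPTIVE_NODE
```

For example, `decide(Failure.LAN, NodeState(has_standby_lan=True))` returns `Action.USE_STANDBY_LAN`, while the same failure on a node with no Standby LAN card returns `Action.MOVE_TO_ADOPTIVE_NODE`. Note that only two of the four cases can be handled without leaving the prescribed machine, which is why redundant LAN cards and a sensible restart limit help keep an application where it belongs.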