- The Cookbook for Setting Up a Serviceguard Package-less Cluster
- The Basics of a Failure
- The Basics of a Cluster
- The "Split-Brain" Syndrome
- Hardware and Software Considerations for Setting Up a Cluster
- Testing Critical Hardware before Setting Up a Cluster
- Setting Up a Serviceguard Package-less Cluster
- Constant Monitoring
- Chapter Review
- Test Your Knowledge
- Answers to Test Your Knowledge
- Chapter Review Questions
- Answers to Chapter Review Questions
Answers to Chapter Review Questions
A1: With an Active/Standby configuration, a node is designated as a Standby node and runs an application only when a failure occurs. If the same node is designated as the Standby for multiple applications, it may have to run several applications in the event of multiple failures. A Rolling Standby configuration also has an initial Standby node ready to take over in the event of a failure; the difference is that any node that sustains a failure subsequently becomes the Standby node once it is repaired. In this way, the responsibility of being the Standby node rolls over to whichever node is not currently running an application.
A2: Two startup files need to be modified when preparing nodes for a Serviceguard cluster (see the /etc/lvmrc excerpt after this list):
- /etc/lvmrc: This startup configuration file needs to be modified so that it does not activate all volume groups at boot time; Serviceguard will activate shared volume groups as necessary.
- /etc/fstab: Any filesystems that will be shared between nodes in the cluster must not be listed in /etc/fstab, because Serviceguard mounts these filesystems when it starts the associated applications.
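As a sketch of the /etc/lvmrc change (assuming the standard HP-UX layout of that file), the automatic volume-group activation flag is simply switched off; volume groups that remain private to the node can still be activated from the custom_vg_activation() function later in the same file:

    # /etc/lvmrc (excerpt) - stop every volume group being activated at boot
    AUTO_VG_ACTIVATE=0

    # Node-private volume groups can still be activated at boot by adding the
    # appropriate vgchange commands to custom_vg_activation() further down.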
A3: The cluster has four nodes. The following points can be made regarding the configuration:
- Using a serial heartbeat in a four-node cluster is not supported; a serial heartbeat is permitted only in a two-node cluster.
- Using a cluster-lock disk in a four-node cluster is supported but unusual; a cluster-lock disk is required for two-node clusters and optional for three- and four-node clusters.
- HEARTBEAT_INTERVAL is 6 seconds while NODE_TIMEOUT is 5 seconds. NODE_TIMEOUT must be greater than HEARTBEAT_INTERVAL (the usual guideline is at least twice it); with these values, nodes would be timed out before their next heartbeat packet was even due, so the cluster would be constantly reforming (see the configuration-file excerpt below).
The cluster configuration is invalid.
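For reference, the two timing parameters live in the cluster ASCII configuration file produced by cmquerycl, where the values are specified in microseconds. A sketch using the common defaults rather than the values in the question:

    HEARTBEAT_INTERVAL    1000000    # 1 second, expressed in microseconds
    NODE_TIMEOUT          2000000    # 2 seconds; keep this comfortably larger
                                     # than HEARTBEAT_INTERVAL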
A4: An HP-UX Real-Time priority of 20 is a very high priority (real-time priorities run from 0 to 127, with 0 being the highest) and gives a process a high probability of being executed whenever it needs to run. The cmcld process is the most important process in the cluster because it coordinates the sending and receiving of heartbeat packets. If this process cannot run, the node will not be able to send/receive heartbeat packets and will be deemed to have failed. This will cause a cluster reformation, and the node in question may end up instigating a Transfer Of Control (TOC). The implication for managing our own processes/applications is that if we run processes at a real-time priority of 20 or more important (an rtprio value of 20 or less), there is a possibility that the cmcld process will not be allowed to execute, causing Serviceguard to instigate a TOC because application processes are monopolizing the processors.
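As a quick illustration, we can check the priority cmcld is running at with standard ps, and start our own real-time work at a less important priority; the application path and the chosen rtprio value below are hypothetical:

    # Show cmcld and its scheduling priority (PRI column)
    ps -efl | grep '[c]mcld'

    # Launch an application at a real-time priority less important than cmcld's;
    # on HP-UX a larger rtprio value means a lower priority (range 0-127)
    rtprio 32 /opt/myapp/bin/app_server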
A5: AUTOSTART_CMCLD is stored in the startup configuration file /etc/rc.config.d/cmcluster, and its default value is 0. The parameter controls whether the node will attempt to rejoin the cluster after the system is rebooted (see the excerpt after this list). There are three reasons to leave the parameter set to 0:
- The cluster will normally be started the first time by the cmruncl command. Once the cluster is up and running, the nodes in the cluster should remain up as long as possible. If a node is rebooted, it must be for a reason. If a node does not rejoin the cluster automatically, i.e., AUTOSTART_CMCLD=0, this can indicate to the administrator(s) that something unexpected has happened to the node.
- If a node is experiencing hardware/software problems that cause it to reboot repeatedly and AUTOSTART_CMCLD=1, the node would attempt to rejoin the cluster again and again. Each attempt causes a cluster reformation, which can potentially mask other problems with individual nodes or with the cluster as a whole.
- When a node is started up after hardware/software maintenance, an administrator will often want to ensure that any hardware/software updates/changes have been effective before allowing the node to rejoin the cluster. Having AUTOSTART_CMCLD=0 allows the system to be rebooted as normal without attempting to rejoin the cluster, so the administrator can perform any additional configuration checks as necessary.
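The setting itself is a single variable in the Serviceguard boot-time configuration file; a minimal excerpt (the comments are illustrative):

    # /etc/rc.config.d/cmcluster (excerpt)
    # 0 = do not attempt to rejoin the cluster automatically after a reboot
    # 1 = attempt to rejoin the cluster automatically at boot time
    AUTOSTART_CMCLD=0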