- The Cookbook for Setting Up a Serviceguard Package-less Cluster
- The Basics of a Failure
- The Basics of a Cluster
- The "Split-Brain" Syndrome
- Hardware and Software Considerations for Setting Up a Cluster
- Testing Critical Hardware before Setting Up a Cluster
- Setting Up a Serviceguard Package-less Cluster
- Constant Monitoring
- Chapter Review
- Test Your Knowledge
- Answers to Test Your Knowledge
- Chapter Review Questions
- Answers to Chapter Review Questions
25.6 Testing Critical Hardware before Setting Up a Cluster
Let's start by looking at some "tips and tricks" of setting up our key hardware components. We'll begin with disk drives:
- Disk Drives: There are two scenarios I want to consider here:
  - Using VxVM disks:
    When using VxVM disks, we should give each disk an easily identifiable Disk Media name. In this way, when we deport/import disk groups, we can identify disks easily; remember, it is possible that device file names will not be consistent across machines in the cluster, so Disk Media Names will be the identifier in this case.
  - Using LVM disks:
    Using LVM disks poses its own problems. The identifier for an LVM disk is the device file. We need to know the device file name to be able to import the volume group into all nodes in the cluster. Here's what I do:
    - Set up the volume group on one node in the cluster. This includes all logical volumes and filesystems. Create all the mount points necessary.
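If you are starting from scratch, this step might look something like the following minimal sketch. The logical volume names (db and progs) match the map file we create later, but the sizes, filesystem type, and mount points are purely illustrative and should be adjusted to your configuration; the vgextend simply adds the second, alternate path to the same disk (a PVLink):
root@hpeos001[] # pvcreate /dev/rdsk/c12t0d1
root@hpeos001[] # mkdir /dev/vg01
root@hpeos001[] # mknod /dev/vg01/group c 64 0x010000
root@hpeos001[] # vgcreate /dev/vg01 /dev/dsk/c12t0d1
root@hpeos001[] # vgextend /dev/vg01 /dev/dsk/c13t0d1
root@hpeos001[] # lvcreate -L 1024 -n db /dev/vg01
root@hpeos001[] # lvcreate -L 512 -n progs /dev/vg01
root@hpeos001[] # newfs -F vxfs /dev/vg01/rdb
root@hpeos001[] # newfs -F vxfs /dev/vg01/rprogs
root@hpeos001[] # mkdir /db /progs
root@hpeos001[] # mount /dev/vg01/db /db
root@hpeos001[] # mount /dev/vg01/progs /progs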
    - Load all data/files into the appropriate filesystems and logical volumes.
    - Do not update the file /etc/fstab.
    - Test that you can see and access all data/files appropriately.
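A quick way to confirm the filesystems are mounted and their contents visible might be as simple as this, again using the hypothetical mount points from the sketch above:
root@hpeos001[] # bdf | grep vg01
root@hpeos001[] # ll /db /progs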
    - Create a map file (in preview mode) from the active volume group. Here is an example of the error message you will see:
root@hpeos001[] # vgexport -p -m /tmp/vg01.map /dev/vg01
vgexport: Volume group "/dev/vg01" is still active.
root@hpeos001[] # cat /tmp/vg01.map
1 db
2 progs
root@hpeos001[] #
This is not really an error; it's LVM just telling you that the volume group is still active. You will notice from the output above that the important part of this step is that the map file is created.
Something else to notice is that I haven't used the "-s" option to vgexport; this would write the concatenated "CPU-ID+VG-ID" into the map file. This seems like a good idea, but in my experience using it in a large configuration just causes slight problems later on, as we will see.
    - You can now distribute the map file to all nodes in the cluster in preparation for using vgimport to import the relevant disks into the volume group.
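How you distribute the map file is up to you. As a simple sketch, assuming remsh/rcp access is configured between the nodes (scp is equally valid if you use Secure Shell), it could be as basic as:
root@hpeos001[] # rcp /tmp/vg01.map hpeos002:/tmp/vg01.map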
    - The next problem is the fact that device files may be different on different nodes. The biggest problem is the Instance number of the interface the disk is connected to. Some administrators spend lots of time and effort to make the Instance numbers of all corresponding devices on a machine the same. That's fine by me if you want to take that route; see how we do it in Chapter 4, "Advanced Peripheral Configuration" (Rebuilding the ioinit File to Suit Your Needs). If you aren't going to spend all your time doing that, then you need to identify which disks are connected to which interfaces. You will need to work this out for all nodes in the cluster. See the example in Figure 25-2.
Figure 25-2 Identifying shared disks.
What we have is a single disk, dual-pathed from two hosts. In systems with lots of disks and multiple paths, e.g., through a SAN, you may have four paths per disk and possibly hundreds of disks. This example goes to show how you would identify which paths are "related" to a particular disk. Just looking at the device files will not show you much. Using a command like diskinfo won't yield much information either. We need to go back to our understanding of the layout of an LVM disk. At the beginning of a disk, we have the 8KB header used to point to the boot area. We then have the PVRA. The first element of the PVRA is an LVM Record, which identifies this disk as an LVM disk. The LVM Record is 8 bytes in size. So take 8KB + 8 bytes = 8200 (2008 in hex) bytes. If we read from that location, we find four interesting numbers in the PVRA; the CPU-ID, the PV-ID, the CPU-ID (again) and the VG-ID. If we were to read this information from any of the device files, we should see exactly the same information. I would use the following command to do this:
# echo "0x2008?4X" | adb /dev/dsk/cXtYdZ
adb moves to the prescribed address and prints four integers; in my case, I display them in hex (the reason for using hex will become apparent). In fact, if you look at the output below, I have run just that command on my first node, hpeos001:
root@hpeos001[] # echo "0x2008?4X" | adb /dev/dsk/c12t0d1
2008:           77A22A2C        3C9DFCD9        77A22A2C        3C9DFCD6
root@hpeos001[] # echo "0x2008?4X" | adb /dev/dsk/c13t0d1
2008:           77A22A2C        3C9DFCD9        77A22A2C        3C9DFCD6
If we look at the output from the commands run from the other node, hpeos002, the output should be the same.
root@hpeos002[] # echo "0x2008?4X" | adb /dev/dsk/c9t0d1
2008:           77A22A2C        3C9DFCD9        77A22A2C        3C9DFCD6
root@hpeos002[] # echo "0x2008?4X" | adb /dev/dsk/c10t0d1
2008:           77A22A2C        3C9DFCD9        77A22A2C        3C9DFCD6
These are the two nodes shown in Figure 25-2. As you can see, the relevant fields match up regardless of which device file we look at. In a large, complex environment, this can help you visualize which device files are "related" to which disks.
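In a configuration with hundreds of paths, you may prefer to script this check rather than run adb by hand against every device file. The loop below is only a sketch of that idea; it assumes all candidate disks appear under /dev/dsk, and any disk that is not an LVM disk will simply print meaningless values:
root@hpeos001[] # for DISK in /dev/dsk/c*t*d*
> do
>   echo "${DISK}: \c"
>   echo "0x2008?4X" | adb ${DISK} | tail -1
> done
Sorting that output on the PV-ID and VG-ID columns then groups together all the device files that point at the same physical disk.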
    - We can use vgimport to get the relevant disks into the relevant volume groups using the map file we distributed earlier.
# mkdir /dev/vg01
# mknod /dev/vg01/group c 64 0x010000
# vgimport -m /tmp/vg01.map /dev/vg01 /dev/dsk/c9t0d1 /dev/dsk/c10t0d1
# vgchange -a y /dev/vg01
# vgcfgbackup /dev/vg01
# vgchange -a n /dev/vg01
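Once the volume group is imported and activated, it is worth a quick check that the logical volume names from the map file have come across and that both paths to the disk are visible; something like the following (output not shown) would do:
root@hpeos002[] # vgdisplay -v /dev/vg01
root@hpeos002[] # strings /etc/lvmtab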
I want to say just a word regarding the "-s" option to vgexport/vgimport. I mentioned earlier that in my experience using this option would cause you slight problems. Here are the reasons why:
  - You are going to have to document the layout of your disks anyway, so why not do it now?
  - When we use the "-s" option with vgimport, vgimport must scan every disk on the system looking for matching CPU-ID+VG-ID values on the corresponding disks. When you have lots of disks, this can take many seconds; in fact, on some systems I have seen it take many minutes. While it is scanning all those disks, you are interrupting other important data-related I/O.
  - Conclusion: Get to know your hardware; it makes sense in the long run.
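For completeness, and purely as a sketch, the "-s" variant of the procedure would look something like this; vgimport then needs no device file arguments because it scans every disk on the system for the matching VGID, which is precisely the behavior described above (the /dev/vg01/group device file still needs to exist on the importing node first):
root@hpeos001[] # vgexport -p -s -m /tmp/vg01.map /dev/vg01
root@hpeos002[] # vgimport -s -m /tmp/vg01.map /dev/vg01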
    - It might be advisable to start to draw some form of diagram so that you know how your disks map to the device files. Documentation will certainly help later, as well as in any Disaster Recovery scenario.
- LAN cards:
  The important thing to remember here is to have a bridged network between your "active" LAN cards and your "standby" LAN cards. Before embarking on creating a cluster, you must ensure that you can linkloop from and to each LAN card in your bridged network; if not, Serviceguard will not allow you to proceed. It's a relatively simple process as long as you are confident in how your network has been physically constructed. Let's look at the simple example shown in Figure 25-3.
Figure 25-3 A bridged network.
We can see from Figure 25-3 that we have eliminated all the SPOFs for the Cluster Network. Now we need to check that our switch/hub/bridge has no filtering or blocking configured on the ports that we are using. If we are using two links between the switches, as shown, it may be a good idea to disconnect one link, perform the tests, swap the links over, and perform the tests again. It is not important which node you perform the test from because if one component is not working, it will show up. Some administrators perform the tests from both nodes "just to be sure." I don't mind that as a philosophy. I perform the tests on one node, and let you perform the tests on both of your nodes. Note that I am sending packets out of both interfaces and to both interfaces; so in our example, that means four linkloop tests. Below, I am performing the tests on two nodes configured similarly to the nodes in Figure 25-3.
root@hpeos001[] # lanscan
Hardware Station        Crd Hdw   Net-Interface  NM  MAC   HP-DLPI DLPI
Path     Address        In# State NamePPA        ID  Type  Support Mgr#
8/16/6   0x080009BA841B 0   UP    lan0 snap0     1   ETHER Yes     119
8/20/5/2 0x0800093D4C50 1   UP    lan1 snap1     2   ETHER Yes     119
root@hpeos001[] # linkloop -i 0 0x080009C269C6
Link connectivity to LAN station: 0x080009C269C6 -- OK
root@hpeos001[] # linkloop -i 0 0x080009E419BF
Link connectivity to LAN station: 0x080009E419BF -- OK
root@hpeos001[] # linkloop -i 1 0x080009C269C6
Link connectivity to LAN station: 0x080009C269C6 -- OK
root@hpeos001[] # linkloop -i 1 0x080009E419BF
Link connectivity to LAN station: 0x080009E419BF -- OK
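With more cards and more remote station addresses, a small loop saves typing. This is only a sketch using the two remote MAC addresses from the output above; substitute the PPAs and station addresses from your own lanscan output:
root@hpeos001[] # for PPA in 0 1
> do
>   for MAC in 0x080009C269C6 0x080009E419BF
>   do
>     linkloop -i ${PPA} ${MAC}
>   done
> done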
I am going to assume that you have worked out all the SPOFs we talked about earlier and that you are happy with your current IT processes, system performance, user access, and security issues. I am also going to assume that all nodes in the cluster are listed in your host lookup database.
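A final, trivial check on that last assumption is to make sure every node resolves the names of all cluster members consistently; the addresses below are purely illustrative:
root@hpeos001[] # grep hpeos /etc/hosts
192.168.0.201   hpeos001
192.168.0.202   hpeos002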