Sequence/Dependencies
The scripts you develop need to perform operations in the correct order to run successfully. Mapping out the dependencies helps you determine the sequence in which the steps must take place. The following list describes the major cluster configuration tasks and their dependencies; a skeleton script that ties the tasks together in this order appears after the list.
Shared storage configuration: If VxVM software is chosen as your volume manager, mirrored volumes can be created before the cluster software is loaded. However, because VxVM software is not cluster-aware, these volumes must be registered with the cluster after the cluster software is installed and configured. Solstice DiskSuite is cluster-aware and should be used to configure your shared storage after the cluster is operational.
Data agent package addition: The packages can be loaded after the Sun Cluster software core component packages are installed, while the cluster is either active or inactive.
NAFO configuration: This is not tied to any particular data agent or service, but it must be configured while the cluster node is active. For testing purposes, a NAFO group consisting of a single network adapter can be created using the system's primary network interface.
Cluster filesystem creation: Before the filesystem can be created, you must create the volumes it will be placed on, using either Solstice DiskSuite or VxVM software. If VxVM software is chosen, the Veritas disk group must be registered with the cluster before the filesystem is created.
Cluster filesystem mounting: The mount point directory must be created on all nodes before the filesystem can be mounted. The mount operation needs to be performed only once and can be run from any cluster node.
Virtual hostname resource: An entry for the hostname must exist in a name service or in /etc/hosts on each node, and at least one NAFO group must exist on each node.
Resource type registration: The data agent package must be installed on both cluster nodes. The registration command can be run from any active cluster node.
Resource group creation: This can be performed on any active node once all the resources that make up the group have been created.
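One way to capture this ordering in a script is to wrap each task in its own shell function and call the functions in dependency order. The following skeleton is only a sketch; the function names are hypothetical placeholders, and each stub would be filled in with the commands described in the remainder of this article.

#!/bin/sh
# Skeleton driver (hypothetical function names); each stub would be
# replaced with the commands shown later in this article.
check_cluster_status()      { :; }   # scstat -n status check
configure_shared_storage()  { :; }   # VxVM or Solstice DiskSuite mirroring
create_cluster_filesystem() { :; }   # newfs on the mirrored volume
mount_global_filesystem()   { :; }   # mkdir, vfstab entry, mount
register_resource_types()   { :; }   # scrgadm -a -t ...
create_resource_group()     { :; }   # scrgadm -a -g ..., then scswitch -Z

check_cluster_status
configure_shared_storage
create_cluster_filesystem
mount_global_filesystem
register_resource_types
create_resource_group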
Script Sequence for HA-NFS Agent
A helpful technique for determining the correct sequence in which to run your script commands is to map out the operations that need to be performed on each cluster node. FIGURE 2 shows the steps required on each cluster node to set up the HA-NFS data service; the dotted arrows indicate dependencies.
FIGURE 2 Data Service Setup Tasks and Dependencies for HA-NFS
The steps listed above the top arrow in FIGURE 2 are independent of each other and can be performed on either cluster node at any time. However, because it makes sense to execute all of the commands that have no dependencies on the first node before running a script on the second node, FIGURE 2 presents the steps in that order. This eliminates the need to go back and forth between the cluster nodes when the scripts are run.
Determining the Status of the Cluster
Any script should include a check of the cluster's current configuration, both to prevent the script from producing undesired results and to inform the user when the cluster is not in the state the script assumes. The scstat -n command is a convenient way to determine whether the cluster nodes are active or online. It can also be used to capture the names of the cluster nodes for use later in a script. The following is the output of the scstat -n command when it is run from the command line.
# scstat -n

-- Cluster Nodes --

                    Node name           Status
                    ---------           ------
  Cluster node:     alpha               Online
  Cluster node:     beta                Online
A shell script can be created to fetch the names of the cluster nodes and their current status, as shown in the following example.
STAT=`scstat -n | awk '/^ Cluster node:/ {print $3,$4}'`
NODE1=`echo $STAT | cut -d" " -f1`
NODE1STAT=`echo $STAT | cut -d" " -f2`
NODE2=`echo $STAT | cut -d" " -f3`
NODE2STAT=`echo $STAT | cut -d" " -f4`
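A script can then use the captured status values to stop early when the cluster is not in the expected state. The following check is a minimal sketch that builds on the variables above; the error text is illustrative.

if [ "$NODE1STAT" != "Online" -o "$NODE2STAT" != "Online" ]; then
    gettext "ERROR: both cluster nodes must be online\n"
    exit 1
fi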
Shared Storage Configuration
When the Solaris OE initializes, all devices physically connected to the system it is running on are probed, then mapped to logical device names. Working with logical device names, in the format of controller-target-disk-slice, is easier than referencing the physical path to the device, which can be very cryptic. However, the order in which devices are probed is determined by which bus slot the device controller is placed in and how the device is cabled.
While differences in logical device names have no ill effects, they must be taken into account when developing automated scripts, because you cannot assume that logical device names are consistent between clusters, even when the hardware components are identical. The luxadm command is useful for determining which physical device a logical device name refers to. The following is an example of the output from this command.
# luxadm inquiry /dev/rdsk/c1t0d0s0
INQUIRY:
  Physical path:
   /devices/pci@8,70000/SUNW,glc@1/fp@0,0/ssd@w50020f2,0:a,raw
  Vendor:             SUN
  Product:            T300
  Revision:           0116
  Serial Number       9717E44007
  Device type:        0x0 (Disk device)
  ...
From the output you can determine what type of storage device is assigned to a particular logical device name. If the shared storage is a Sun StorEdge T3 array, the associated logical device name can be extracted by examining all the device names appearing in the /dev/rdsk directory, searching for a product type of T300.
The following shell script routine can be used to search for the two entries in /dev/rdsk that match the product type of T300, the name Sun Microsystems uses to denote the Sun StorEdge T3 array.
find_T3() {
    ALL_DISKS=`scdidadm -l | awk '{print $2}' | cut -d: -f2`
    CANDIDATES=
    for i in $ALL_DISKS
    do
        DISKTYPE=`luxadm inquiry ${i}s0 2>/dev/null \
            | awk '/Product:/ {print $2}'`
        if [ "$DISKTYPE" = "T300" ]; then
            CANDIDATES="$CANDIDATES `basename $i`"
        fi
    done
    NDISK=`echo $CANDIDATES | wc -w`
    if [ $NDISK != "2" ]; then
        gettext "ERROR: The number of T3 arrays found is $NDISK \n"
        gettext "Number of T3 arrays must = 2\n"
        exit 1
    fi
    T3DISK1=`echo $CANDIDATES | awk '{print $1}'`
    T3DISK2=`echo $CANDIDATES | awk '{print $2}'`
}
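For example, a script might call the function and report which arrays were found before any destructive operations are attempted; the following two lines are purely illustrative.

find_T3
echo "Configuring shared storage on T3 arrays $T3DISK1 and $T3DISK2"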
Once the logical device name is determined, mirrored volumes can be created using either VxVM software or Solstice DiskSuite as described in the next section.
Creating a Shared Mirrored Volume with VxVM Software
The steps for creating a shared mirrored volume with VxVM software are:
Initialize the disks.
Create a disk group.
Add the shared disk to the disk group.
Create the mirrored volume.
Register the disk group with the cluster.
The following commands are used to perform these steps. The variables T3DISK1 and T3DISK2 are used to represent the two Sun StorEdge T3 arrays. The values for these two variables are obtained using the code in the find_T3() example.
/etc/vx/bin/vxdisksetup -i $T3DISK1
/etc/vx/bin/vxdisksetup -i $T3DISK2
vxdg init mydskgrp $T3DISK1
vxdg -g mydskgrp adddisk $T3DISK2
vxassist -g mydskgrp make hanfsvol1 500m layout=mirror
scconf -a -D type=vxvm,name=mydskgrp,nodelist=$NODE1:$NODE2
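Before moving on, a script can confirm that the volume exists and that the disk group was registered with the cluster. The following sketch assumes the standard vxprint and scstat commands are in the PATH; the checks are illustrative rather than exhaustive.

vxprint -g mydskgrp | grep hanfsvol1 > /dev/null 2>&1
if [ $? -ne 0 ]; then
    gettext "ERROR: volume hanfsvol1 was not created\n"
    exit 1
fi
scstat -D | grep mydskgrp > /dev/null 2>&1
if [ $? -ne 0 ]; then
    gettext "ERROR: disk group mydskgrp is not registered with the cluster\n"
    exit 1
fi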
After the shared storage is mirrored and registered with the cluster, you can create a filesystem on the mirrored volume as shown in the following command.
newfs /dev/vx/rdsk/mydskgrp/hanfsvol1
Creating Shared Mirror Storage with Solstice DiskSuite
Since Solstice DiskSuite disksets, the equivalent of VxVM disk groups, are cluster-aware, they do not need to be registered with the cluster. However, disksets are created by specifying the Disk ID (DID) numbers used to identify global devices, rather than logical device names. To view all the mappings of logical device names to DID devices, run the scdidadm command with the -L option, as shown in the following example.
# scdidadm -L
1    alpha:/dev/rdsk/c0t6d0    /dev/did/rdsk/d1
2    beta:/dev/rdsk/c2t0d0     /dev/did/rdsk/d2
2    alpha:/dev/rdsk/c1t0d0    /dev/did/rdsk/d2
3    beta:/dev/rdsk/c1t1d0     /dev/did/rdsk/d3
3    alpha:/dev/rdsk/c2t1d0    /dev/did/rdsk/d3
4    alpha:/dev/rdsk/c3t1d0    /dev/did/rdsk/d4
5    alpha:/dev/rdsk/c3t0d0    /dev/did/rdsk/d5
6    beta:/dev/rdsk/c0t6d0     /dev/did/rdsk/d6
7    beta:/dev/rdsk/c3t0d0     /dev/did/rdsk/d7
8    beta:/dev/rdsk/c3t1d0     /dev/did/rdsk/d8
Before a Solstice DiskSuite diskset can be created, the logical device names of the shared storage devices need to be translated into their equivalent DID device names. Using the function find_T3() (shown in a previous example), you can find the logical device names, then convert them to the equivalent DID device names by inserting the following lines into your script.
DID1=`scdidadm -l | grep $T3DISK1 | awk '{print $3}'`
DID2=`scdidadm -l | grep $T3DISK2 | awk '{print $3}'`
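If either grep fails to find a match, the corresponding variable is left empty and the metaset commands that follow would run with missing arguments. A minimal guard such as the following catches that case.

if [ -z "$DID1" -o -z "$DID2" ]; then
    gettext "ERROR: could not map the T3 arrays to DID devices\n"
    exit 1
fi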
The steps for creating a mirrored volume using Solstice DiskSuite software are:
Create a diskset, specifying the potential cluster node owners.
Add the two Sun StorEdge T3 arrays to the diskset.
Initialize a metadevice (submirror) on each of the two Sun StorEdge T3 arrays.
Create the disk mirror from the first metadevice.
Attach the second metadevice to the mirror.
Add the cluster nodes as mediator hosts for the diskset.
The following example shows the commands that perform these operations.
metaset -s mydskset -a -h $NODE1 $NODE2
metaset -s mydskset -a $DID1 $DID2
metainit -s mydskset mydskset/d0 1 1 ${DID1}s0
metainit -s mydskset mydskset/d1 1 1 ${DID2}s0
metainit -s mydskset mydskset/d100 -m mydskset/d0
metattach -s mydskset mydskset/d100 mydskset/d1
metaset -s mydskset -a -m $NODE1
metaset -s mydskset -a -m $NODE2
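After metattach returns, the second submirror resynchronizes in the background. If later steps should wait for the resync to finish, a script can poll metastat; this loop is only a sketch, and the string it greps for assumes the usual metastat resync message.

while metastat -s mydskset | grep "Resync in progress" > /dev/null 2>&1
do
    sleep 30
done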
You can now create the filesystem as shown in the following example.
newfs /dev/md/mydskset/rdsk/d100
Mounting the Global Filesystem
After a global filesystem is created on the mirrored shared storage devices, the mount command can be run from any cluster node. However, before the mount can take place, the mount point directory must be created on each cluster node and the entry that performs the mount must exist in each cluster node's /etc/vfstab file. The following example shows the commands run on each cluster node to perform these operations.
On beta:
mkdir /global/nfs
echo "/dev/vx/dsk/mydskgrp/hanfsvol1 \
/dev/vx/rdsk/mydskgrp/hanfsvol1 /global/nfs ufs 2 yes \
global,logging" >> /etc/vfstab
On alpha:
mkdir /global/nfs
echo "/dev/vx/dsk/mydskgrp/hanfsvol1 \
/dev/vx/rdsk/mydskgrp/hanfsvol1 /global/nfs ufs 2 yes \
global,logging" >> /etc/vfstab
mount /global/nfs
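A script can verify that the global mount actually succeeded before continuing. The check below assumes the Solaris mount output format, in which each line begins with the mount point.

mount | grep "^/global/nfs " > /dev/null 2>&1
if [ $? -ne 0 ]; then
    gettext "ERROR: /global/nfs is not mounted\n"
    exit 1
fi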
Once the global filesystem is mounted, application software can be placed in it and administrative directories and files can be created there. For the HA-NFS example, no additional application software is required, but you need to create some additional directories and configuration files, as shown in the following example.
On beta:
mount /global/nfs
mkdir /global/nfs/admin
mkdir /global/nfs/admin/SUNW.nfs
echo 'share -F nfs -o rw -d"HA-NFS" /global/nfs/data' > \
    /global/nfs/admin/SUNW.nfs/dfstab.nfs-res
mkdir /global/nfs/data
chmod 777 /global/nfs/data
Creating the Data Service Resource Group
When a data service failure is detected on an active cluster node, the data service is transferred, or failed over, to a working cluster node. The unit of failover in Sun Cluster 3.0 software is the resource group, a collection of the resources required for the data service to run. Resources are categorized by type. For example, the resource that represents the NFS data service is of type SUNW.nfs, and the resource that fails over the shared storage device is of type SUNW.HAStorage. Before a resource can be created, its type must be registered.
A logical hostname resource is required for every data service, because this name represents the IP address that clients use to access the service and must be transferable from one cluster node to another. Although it is considered a resource, no resource type is associated with it, so there is no type to register with the cluster.
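A script can verify the name service prerequisite before attempting to add the logical hostname resource. In the following sketch, hostname stands for whatever virtual hostname you choose, matching the placeholder used in the scrgadm example later in this section; checking the NAFO group prerequisite is left to the Public Network Management commands and is not shown.

LOGHOST=hostname    # placeholder for your virtual hostname
getent hosts $LOGHOST > /dev/null 2>&1
if [ $? -ne 0 ]; then
    gettext "ERROR: $LOGHOST is not in a name service or /etc/hosts\n"
    exit 1
fi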
The basic steps to create a resource group for a data service are:
Register the resource types of the resources used in the group, if not already registered.
Create a resource group specifying the cluster nodes that can run it along with the pathname of any administrative directory.
Add required resources to the resource group.
Bring the resource group online.
The following example illustrates those steps.
On beta:
scrgadm -a -t SUNW.nfs
scrgadm -a -t SUNW.HAStorage
scrgadm -a -g myresgrp -h alpha,beta -y \
    Pathprefix=/global/nfs/admin
scrgadm -a -L -g myresgrp -l hostname
scrgadm -a -j hares -g myresgrp -t SUNW.HAStorage -x \
    ServicePaths=/global/nfs -x AffinityOn=True
scrgadm -a -j nfsres -g myresgrp -t SUNW.nfs -y \
    Resource_dependencies=hares
scswitch -Z -g myresgrp
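Once scswitch -Z returns, the state of the resource group can be confirmed with scstat -g, which should show myresgrp online on one of the cluster nodes. The grep below is simply a convenience for picking out the relevant lines.

scstat -g | grep myresgrp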