- Installing the Oracle Solaris OS on a Cluster Node
- Securing Your Solaris Operating System
- Solaris Cluster Software Installation
- Time Synchronization
- Cluster Management
- Cluster Monitoring
- Service-Level Management and Telemetry
- Patching and Upgrading Your Cluster
- Backing Up Your Cluster
- Creating New Resource Types
- Tuning and Troubleshooting
Creating New Resource Types
As described in the section "Data Service and Application Agents" in Chapter 2, "Oracle Solaris Cluster: Features and Architecture," Oracle has a substantial list of supported agents that cover most of the applications in your data center. These application agents are maintained by Oracle and are extensively tested on each new release of both the Solaris Cluster software and the application itself. Even so, inevitably you will have an application that is not part of the existing agent portfolio.
Application Suitability
Before creating a resource type for your application, you must determine whether the application meets the criteria for being made highly available. The following list highlights the main points you must consider. For a complete list, see "Analyzing the Application for Suitability" in [SCDevGuide].
- Is your application crash-tolerant? This is important because in a highly available environment your application must be able to recover its data consistency without requiring manual intervention. If the application did require such intervention, then most of the benefits of a high-availability framework would be lost.
- Does your application rely on the physical node name of the machine, such as that resulting from calls to uname, gethostbyname, or equivalent interfaces? If so, then when the application moves to another cluster node, the dependency on the physical hostname will probably cause the application to fail. There is a work-around to this problem, which is to interpose the libschost.so.1 library. However, this work-around can sometimes raise support issues with application vendors.
- Can your application run on a multihomed system, that is, one with several public networks? Your application must be able to handle situations where IP addresses are configured and unconfigured from network adapters as services move around the cluster. This has consequences for the way your application binds to the network.
- Does your application use hard-coded path names for the location of its data? If so, then symbolic links might not be sufficient to ensure that the data is stored in a location that is compatible with using a failover or global file system. If the application renames a data file, it can break the symbolic links.
After you have determined that your application is suitable for being made highly available, you have several ways to achieve the necessary integration:
- You can use the Generic Data Service (GDS) directly and just supply the required parameters. Although this approach does not allow you to define any new extension properties, it is by far the simplest option.
- You can create a subclass of the GDS to create a completely new resource type. This option enables you to define one or more extension properties for your new resource type. This option is relatively simple and yet provides considerable flexibility.
- You can extend the GDS using the Advanced Agent Toolkit. Although this option does not create a new resource type, it does enable you to define one or more extension properties. This option is also relatively simple and provides considerable flexibility.
- You can use the GUI scdsbuilder tool and customize the resulting shell script or C source using the Resource Management API (RMAPI) and the Data Service Development Library (DSDL) APIs. If significant customization work is needed, this option might result in an increased maintenance burden.
- You can use the RMAPI or DSDL APIs directly to develop your resource type from scratch. This option trades the development and maintenance costs for ultimate flexibility and performance.
Each option is discussed in more detail in the following sections.
Generic Data Service
The Generic Data Service (GDS) is provided with the Solaris Cluster software. The SUNW.gds agent is packaged in the SUNWscgds package, which is installed as standard by the Solaris Cluster software installer program. The SUNW.gds agent is considered the preferred way to create both failover and scalable resources. The GDS itself is supported by Oracle, but you must support the scripts that you supply through the Start_command, Stop_command, Probe_command, and Validate_command properties.
By default, the SUNW.gds resource type is not registered, so you must register it before attempting to create a resource of that type. The commands in the following example show how to determine if the resource type is registered and then how to register it, if it is not already present.
Example 4.13. Registering the SUNW.gds Resource Type
Use the clresourcetype command to determine whether the SUNW.gds resource type needs to be registered.
# clresourcetype list | grep SUNW.gds
# clresourcetype register SUNW.gds
# clresourcetype list | grep SUNW.gds
SUNW.gds:6
In addition to the standard resource properties, the GDS agent has four properties to enable you to integrate your application: Start_command, Stop_command, Probe_command, and Validate_command. These properties are described in "Integrating Your Application-Specific Logic." By using the GDS as the basis for your application, you automatically benefit from all the patches and feature upgrades that the GDS receives.
Example 4.14 shows how you can use the GDS to make the X11 program xeyes highly available. You begin by creating a Start_command program. In this example, a script calls the full path name of the program with a parameter that is passed to the shell script. This script must exist on all the cluster nodes on which the application is intended to run.
Next, having checked that the SUNW.gds resource type is registered, you create the resource group. In this example, you allow the resource group's node list to default to all the cluster nodes.
Next, you create a resource to represent your program. In the example, the Start_command property is specified by the script you wrote (and which must exist on all nodes). The display parameter to use is also specified. Because this program does not listen on any network ports, you set the network_aware property to false. This means that the probe mechanism used will be the continued existence of the xeyes process that the Start_command program leaves running in the background. By default, any resource you create is enabled so that when the resource group is brought online, the resource is automatically started. To change the default, you can specify the -d argument to the clresource create command.
The last two steps instruct the RGM that it needs to control or manage the xeyes-rg resource group and then to bring that resource group online. The action of bringing the resource group online starts the resource because it was created in an enabled state.
Assuming you have allowed remote X11 clients to display on your X server using xhost and you have specified the correct X display to use (substitute a value suited to your environment for myhost:1.0), then the xeyes program will appear on your display. You can switch the resource group between nodes and the RGM will kill the xeyes process and restart it on the new node, phys-summer2, as the example shows.
Example 4.14. Creating a Simple, Highly Available xeyes Service
List the script that will be used to start the xeyes command.
# cat /tmp/start_xeyes
#!/bin/ksh
/usr/openwin/demo/xeyes -display $1 &
exit 0
Check that the SUNW.gds resource type is registered, and then create the resource group and resource that will control the xeyes service.
# clresourcetype list | grep SUNW.gds
SUNW.gds:6
# clresourcegroup create xeyes-rg
# clresource create -t SUNW.gds \
> -p start_command="/tmp/start_xeyes myhost:1.0" \
> -p network_aware=false \
> -g xeyes-rg xeyes-rs
Use the clresourcegroup command to bring the xeyes-rg resource group online.
# clresourcegroup manage xeyes-rg
# clresourcegroup online xeyes-rg
# clresourcegroup status xeyes-rg

=== Cluster Resource Groups ===

Group Name      Node Name        Suspended    Status
----------      ---------        ---------    ------
xeyes-rg        phys-summer1     No           Online
                phys-summer2     No           Offline

# clresourcegroup switch -n phys-summer2 xeyes-rg
# clresourcegroup status xeyes-rg

=== Cluster Resource Groups ===

Group Name      Node Name        Suspended    Status
----------      ---------        ---------    ------
xeyes-rg        phys-summer1     No           Offline
                phys-summer2     No           Online
To demonstrate how the GDS handles application failure, quit the xeyes program from your X display. You will notice that the RGM restarts the application almost instantaneously. The messages in /var/adm/messages (see Example 4.15) indicate that the RGM recognized the failure and restarted the service.
After the fault probe determines that the service is online, indicated by Service is online in /var/adm/messages, kill the process again. The resource has two properties that determine how many times it is restarted by the RGM within a certain time period. These properties are Retry_count and Retry_interval (see Example 4.16). After the specified number of failures, the built-in logic of the GDS determines that the current node is unhealthy and releases the service so that it can be started on another node. If the service also experiences problems on this node, then the RGM will not fail the service back to its original node unless the time period, in seconds, as defined by the resource group's Pingpong_interval property, has passed. Instead, the GDS attempts to keep the service running on the remaining node. This behavior is governed by another property called Failover_mode.
The purpose of the Pingpong_interval property is to prevent a service that fails to start from endlessly looping, resulting in the service migrating back and forth between cluster nodes. In a test environment, you might need to reset the value of Pingpong_interval to a lower value. Doing so enables you to restart your service once you have corrected any problems you encountered.
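For example, the following sketch lowers Pingpong_interval for the xeyes-rg resource group from the earlier example to 3 minutes; choose a value that suits your test cycle:

# clresourcegroup set -p Pingpong_interval=180 xeyes-rg
# clresourcegroup show -p pingpong_interval xeyes-rg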
Example 4.15. Sample RGM Messages
The /var/adm/messages file contains information on the state changes of the resource groups and resources in the cluster.
Nov 23 04:00:23 phys-summer2 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group xeyes-rg state on node phys-summer2 change to RG_ONLINE
Nov 23 04:01:23 phys-summer2 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource xeyes-rs status msg on node phys-summer2 change to <Service is online.>
Nov 23 04:01:25 phys-summer2 Cluster.PMF.pmfd: [ID 887656 daemon.notice] Process: tag="xeyes-rg,xeyes-rs,0.svc", cmd="/bin/sh -c /tmp/start_xeyes myhost:1.0", Failed to stay up.
Nov 23 04:01:25 phys-summer2 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource xeyes-rs status on node phys-summer2 change to R_FM_FAULTED
Nov 23 04:01:25 phys-summer2 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource xeyes-rs status msg on node phys-summer2 change to <Service daemon not running.>
Nov 23 04:01:25 phys-summer2 SC[,SUNW.gds:6,xeyes-rg,xeyes-rs,gds_probe]: [ID 423137 daemon.error] A resource restart attempt on resource xeyes-rs in resource group xeyes-rg has been blocked because the number of restarts within the past Retry_interval (370 seconds) would exceed Retry_count (2)
Nov 23 04:01:25 phys-summer2 SC[,SUNW.gds:6,xeyes-rg,xeyes-rs,gds_probe]: [ID 874133 daemon.notice] Issuing a failover request because the application exited.
Nov 23 04:01:25 phys-summer2 Cluster.RGM.global.rgmd: [ID 494478 daemon.notice] resource xeyes-rs in resource group xeyes-rg has requested failover of the resource group on phys-summer2.
Nov 23 04:01:25 phys-summer2 Cluster.RGM.global.rgmd: [ID 423291 daemon.error] RGM isn't failing resource group <xeyes-rg> off of node <phys-summer2>, because there are no other current or potential masters
Nov 23 04:01:25 phys-summer2 Cluster.RGM.global.rgmd: [ID 702911 daemon.error] Resource <xeyes-rs> of Resource Group <xeyes-rg> failed pingpong check on node <phys-summer1>. The resource group will not be mastered by that node.
Nov 23 04:01:25 phys-summer2 SC[,SUNW.gds:6,xeyes-rg,xeyes-rs,gds_probe]: [ID 969827 daemon.error] Failover attempt has failed.
Nov 23 04:01:25 phys-summer2 SC[,SUNW.gds:6,xeyes-rg,xeyes-rs,gds_probe]: [ID 670283 daemon.notice] Issuing a resource restart request because the application exited.
Example 4.16. Retry, Failover Mode, and Ping-pong Interval Properties
Use the clresource command to determine the property values of the xeyes-rs resource.
# clresource show \
> -p retry_count,retry_interval,failover_mode xeyes-rs

=== Resources ===

Resource:                                      xeyes-rs

  --- Standard and extension properties ---

  Retry_interval:                              370
    Class:                                     standard
    Description:                               Time in which monitor attempts to restart a
                                               failed resource Retry_count times.
    Type:                                      int

  Retry_count:                                 2
    Class:                                     standard
    Description:                               Indicates the number of times a monitor
                                               restarts the resource if it fails.
    Type:                                      int

  Failover_mode:                               SOFT
    Class:                                     standard
    Description:                               Modifies recovery actions taken when the
                                               resource fails.
    Type:                                      enum

# clresourcegroup show -p pingpong_interval xeyes-rg

=== Resource Groups and Resources ===

Resource Group:                                xeyes-rg
  Pingpong_interval:                           3600
In the preceding example, the X display used by xeyes can be changed only by disabling the resource and modifying its Start_command property. That restriction matters little here, because the xeyes program must be restarted anyway to point it at a different X server. However, it does make a difference in cases where a variable could otherwise be changed while the service is running, such as the debugging level to use or the directory for log files.
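For example, the following sketch redirects the xeyes resource to another display (the otherhost:1.0 display name is illustrative):

# clresource disable xeyes-rs
# clresource set -p Start_command="/tmp/start_xeyes otherhost:1.0" xeyes-rs
# clresource enable xeyes-rs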
To create a resource type that has new extension properties that can be changed when you need to change them, you need to either write your resource type from scratch or create a subclass of the GDS, as described in a later section.
Supporting New Applications Using the Advanced Agent Toolkit
Many application agents in the current Solaris Cluster software release are derived from the Advanced Agent Toolkit methodology [AdvGDSTlkit]: HA-PostgreSQL, HA-MySQL, and HA containers, to name three. All three use the SUNW.gds agent as their basis. However, in its raw form, the SUNW.gds agent has some limitations.
The rationale behind the toolkit is that all new application agents have many common requirements:
- They might require one or more extension properties.
- They must provide debugging information.
- They might need to disable the process-monitoring facility (pmfadm) for applications that leave no obvious child processes to monitor.
- They must supply a Start_command script, as a minimum, and possibly Stop_command, Probe_command, and Validate_command scripts.
The toolkit also simplifies much of the work needed to handle Oracle Solaris Zones and SMF. Thus, providing this extended framework enables your developers to focus on the application-specific integration work rather than on debugging the framework itself. After the work is complete, the new resource type is registered using a registration script.
Developing Resource Types by Creating a Subclass of the GDS
The advantage of creating a subclass of the GDS, rather than writing a new resource type from scratch, is that the new resource type inherits all the best practices that are already part of the standard GDS code. In addition, creating a subclass of the GDS enables you to create your own resource type extension properties while retaining the same level of flexibility as if you had started from scratch. Finally, your new resource type, which is a subclass of the GDS, has a distinct name, enabling you to easily distinguish resources of the new resource type. If you instead used the Advanced Agent Toolkit or the SUNW.gds agent, then you would have to determine what the resource is by examining the extension properties or reviewing the code. This step would be necessary because the resource type would be set to SUNW.gds, rather than MYCORP.appsvr, for example.
You create a subclass of the GDS by creating a resource type registration (RTR) file where the RT_basedir parameter is set to the directory containing binaries used by the standard GDS methods: Start, Stop, Validate, and so on. You then extend the RTR file by defining your own resource type extension properties. Finally, you set the method parameters in the RTR file to point to your scripts that override the standard GDS behavior.
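The following extract is a hedged sketch of what such an RTR file might look like for the hypothetical MYCORP.appsvr resource type mentioned earlier; the standard method slots reuse the GDS binaries, the validate_appsvr path is invented, and Config_dir is an invented extension property:

RESOURCE_TYPE = "appsvr";
VENDOR_ID = MYCORP;
RT_DESCRIPTION = "MYCORP Application Server Resource Type";
RT_VERSION = "1";
API_VERSION = 10;
RT_BASEDIR = /opt/SUNWscgds/bin;

Start = gds_svc_start;
Stop = gds_svc_stop;
Update = gds_update;
Monitor_start = gds_monitor_start;
Monitor_stop = gds_monitor_stop;
Monitor_check = gds_monitor_check;
# Override the standard GDS behavior with your own validate program
Validate = ../../MYCORPappsvr/bin/validate_appsvr;

# New extension property, invented for this sketch
{
        PROPERTY = Config_dir;
        EXTENSION;
        STRING;
        DEFAULT = "";
        TUNABLE = WHEN_DISABLED;
        DESCRIPTION = "Directory containing the application server configuration";
}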
Several existing Sun resource types are implemented this way, including the HA-Logical Domain agent (SUNW.ldom), which was covered in the section "Failover Guest Domains" in Chapter 3, "Combining Virtualization Technologies with Oracle Solaris Cluster Software."
The RTR file for the SUNW.ldom resource type is shown in Example 4.17. In this RTR file, the RT_basedir parameter is set to the standard directory for the GDS package, that is, /opt/SUNWscgds/bin. Of the standard methods, only Init, Boot, and Validate have been overridden, using programs located in the ../../SUNWscxvm/bin directory. Unlike a standard GDS resource type, the Start_command, Stop_command, Probe_command, and Validate_command properties are assigned fixed values and cannot be changed, as the TUNABLE = NONE settings indicate. Furthermore, each command, apart from Validate_command, is called with a consistent set of arguments, namely -R %RS_NAME -T %RT_NAME -G %RG_NAME. The %variable construct is similar to the $variable syntax found in shell scripts: when a resource of this type is instantiated, the names you assigned to the resource, resource type, and resource group are substituted as the arguments. For example, if you wrote a resource type called FOO.bar and then created a resource group called whizz-rg containing a resource called bang-rs of this type, the arguments passed would be -R bang-rs -T FOO.bar -G whizz-rg. With these arguments, you can then make calls to the RMAPI or DSDL APIs to retrieve or set properties.
In contrast to the Start_command, Stop_command, and Probe_command properties, the Validate_command property does not use this construct. Instead, the RGM passes the validate command all the properties listed for the resource type on the command line. Then the validate command parses this list and determines whether the configuration is valid.
Example 4.17. RTR File for the SUNW.ldom Resource Type
The following text shows some of the key parts of the RTR file for the SUNW.ldom resource type:
. . .
RESOURCE_TYPE = "ldom";
VENDOR_ID = SUNW;
RT_DESCRIPTION = "Sun Cluster HA for xVM Server SPARC Guest Domains";
RT_version ="1";
API_version = 10;
RT_basedir=/opt/SUNWscgds/bin;
Init = ../../SUNWscxvm/bin/init_xvm;
Boot = ../../SUNWscxvm/bin/boot_xvm;
Start = gds_svc_start;
Stop = gds_svc_stop;
Validate = ../../SUNWscxvm/bin/validate_xvm;
Update = gds_update;
Monitor_start = gds_monitor_start;
Monitor_stop = gds_monitor_stop;
Monitor_check = gds_monitor_check;
Init_nodes = RG_PRIMARIES;
Failover = FALSE;

# The paramtable is a list of bracketed resource property declarations
# that come after the resource-type declarations
# The property-name declaration must be the first attribute
# after the open curly of a paramtable entry
#
# The following are the system defined properties. Each of the system defined
# properties have a default value set for each of the attributes. Look at
# man rt_reg(4) for a detailed explanation.
#
{
        PROPERTY = Start_timeout;
        MIN = 60;
        DEFAULT = 300;
}
{
        PROPERTY = Stop_timeout;
        MIN = 60;
        DEFAULT = 300;
}
. . .
# This is an optional property. Any value provided will be used as
# the absolute path to a command to invoke to validate the application.
# If no value is provided, The validation will be skipped.
#
{
        PROPERTY = Validate_command;
        EXTENSION;
        STRING;
        DEFAULT = "";
        TUNABLE = NONE;
        DESCRIPTION = "Command to validate the application";
}
# This property must be specified, since this is the only mechanism
# that indicates how to start the application. Since a value must
# be provided, there is no default. The value must be an absolute path.
{
        PROPERTY = Start_command;
        EXTENSION;
        STRINGARRAY;
        DEFAULT = "/opt/SUNWscxvm/bin/control_xvm start -R %RS_NAME -T %RT_NAME -G %RG_NAME";
        TUNABLE = NONE;
        DESCRIPTION = "Command to start application";
}
# This is an optional property. Any value provided will be used as
# the absolute path to a command to invoke to stop the application.
# If no value is provided, signals will be used to stop the application.
#
# It is assumed that Stop_command will not return until the
# application has been stopped.
{
        PROPERTY = Stop_command;
        EXTENSION;
        STRING;
        DEFAULT = "/opt/SUNWscxvm/bin/control_xvm stop -R %RS_NAME -T %RT_NAME -G %RG_NAME";
        TUNABLE = NONE;
        DESCRIPTION = "Command to stop application";
}
# This is an optional property. Any value provided will be used as
# the absolute path to a command to invoke to probe the application.
# If no value is provided, the "simple_probe" will be used to probe
# the application.
#
{
        PROPERTY = Probe_command;
        EXTENSION;
        STRING;
        DEFAULT = "/opt/SUNWscxvm/bin/control_xvm probe -R %RS_NAME -G %RG_NAME -T %RT_NAME";
        TUNABLE = NONE;
        DESCRIPTION = "Command to probe application";
}
# This is an optional property. It determines whether the application
# uses network to communicate with its clients.
#
{
        PROPERTY = Network_aware;
        EXTENSION;
        BOOLEAN;
        DEFAULT = FALSE;
        TUNABLE = AT_CREATION;
        DESCRIPTION = "Determines whether the application uses network";
}
# This is an optional property, which determines the signal sent to the
# application for being stopped.
#
{
        PROPERTY = Stop_signal;
        EXTENSION;
        INT;
        MIN = 1;
        MAX = 37;
        DEFAULT = 15;
        TUNABLE = WHEN_DISABLED;
        DESCRIPTION = "The signal sent to the application for being stopped";
}
# This is an optional property, which determines whether to failover when
# retry_count is exceeded during retry_interval.
#
{
        PROPERTY = Failover_enabled;
        EXTENSION;
        BOOLEAN;
        DEFAULT = TRUE;
        TUNABLE = WHEN_DISABLED;
        DESCRIPTION = "Determines whether to failover when retry_count is exceeded during retry_interval";
}
# This is an optional property that specifies the log level GDS events.
#
{
        PROPERTY = Log_level;
        EXTENSION;
        ENUM { NONE, INFO, ERR };
        DEFAULT = "INFO";
        TUNABLE = ANYTIME;
        DESCRIPTION = "Determines the log level for event based traces";
}
{
        Property = Debug_level;
        Extension;
        Per_node;
        Int;
        Min = 0;
        Max = 2;
        Default = 0;
        Tunable = ANYTIME;
        Description = "Debug level";
}
{
        Property = Domain_name;
        Extension;
        String;
        Minlength = 1;
        Tunable = WHEN_DISABLED;
        Description = "LDoms Guest Domain name";
}
{
        Property = Migration_type;
        Extension;
        Enum { NORMAL, MIGRATE };
        Default = "MIGRATE";
        Tunable = ANYTIME;
        Description = "Type of guest domain migration to be performed";
}
{
        PROPERTY = Plugin_probe;
        EXTENSION;
        STRING;
        DEFAULT = "";
        TUNABLE = ANYTIME;
        DESCRIPTION = "Script or command to check the guest domain";
}
{
        PROPERTY = Password_file;
        EXTENSION;
        STRING;
        DEFAULT = "";
        TUNABLE = WHEN_DISABLED;
        DESCRIPTION = "The complete path to the file containing the target host password";
}
scdsbuilder GUI
To customize an agent beyond what is permitted by the GDS, you can use the Agent Builder command, scdsbuilder (see the scdsbuilder(1HA) man page). This command has three code generation options, and the resulting files are wrapped in a Solaris package that you can install on your cluster nodes:
- DSDL code (see the section "Data Service Development Library").
- ksh code, including all the necessary scha_control commands (see the section "Resource Management API"). With the ksh code, you are creating your own resource type.
- A ksh registration script for a GDS agent. Here, the code generates the appropriate clresource create command.
You can customize the resulting code to your specific needs. However, with the ksh registration script for the GDS agent, the scope for modification is limited. The example in Figure 4.7 shows the use of the third option.
Figure 4.7 Using the scdsbuilder GUI to create a new resource type
The scdsbuilder command starts the Solaris Cluster Agent Builder GUI, as shown in Figure 4.7. In this example, data has already been specified for each field available to the user. A short code of SUNW is specified for the vendor name, and tstgds is specified for the application name. This data is then used to generate both the name of the package that Agent Builder creates for you and the name of the resource type that you will subsequently use.
The information you provide in the other fields is used as follows:
- The RT version enables you to specify a version number for this resource type. You can identify which version of the agent you are running when it is placed into production.
- The working directory is used by Agent Builder as a working area in which it can create your package and write other associated, intermediate files.
- Your target application determines whether you select the scalable or failover option. If a particular instance of an application can run on multiple nodes at once without corrupting any of its data files, then you can select the scalable option. A good example of such an application is a web server. For all other applications, such as databases and file services, select the failover option.
- The Network Aware check box is used to determine whether any resource created using this resource type needs to have the port_list property set. The port_list property is then used by the GDS service to provide a simple probe mechanism.
- The source type option determines whether the resulting code uses the C programming language, ksh, or the GDS (see the section "SUNW.gds" in Chapter 2, "Oracle Solaris Cluster: Features and Architecture") to create the data service. To use the C option, you must have a C compiler installed on your system.
After you have entered the data and clicked on the Next button, you are presented with the screen shown in Figure 4.8.
Figure 4.8 Completing the resource type definition using scdsbuilder
Integrating Your Application-Specific Logic
You use the fields in this second screen to provide the location of the programs (which can be compiled executables or scripts) and their associated arguments that will be used to start, stop, probe, and validate your data service when it is installed on the target cluster nodes. For each program, you can set a time limit on how long it can take for the program to complete. If the program does not complete within the allocated time period, then the resource is placed into a failed state, such as STOP_FAILED.
You are required to provide a value only for the start program. All the other programs are optional. Any programs specified must exit with a return code of zero only when they have successfully completed their work. If they fail to perform their allotted task, they must return a value greater than 100. Values below that are used by the Solaris Cluster commands and have specific meanings (see the intro(1CL) man page).
The programs you assign to the start and stop commands must return successfully only when your target application has actually completed the relevant operation. If the stop command returns success while leaving the application, or parts of it, running, the cluster framework erroneously concludes that it is safe to start the resource group on another cluster node. In some instances, particularly when the application uses a global file system, this outcome could result in data corruption because the two instances of the application could write to their data files in an uncontrolled fashion.
If no stop command is provided, the process tree that results from the start command is terminated using the kill command.
The validate command enables you to check that your application is correctly configured on all the potential nodes on which it can run. Again, if the program determines that your application is misconfigured, the validate program must exit with a nonzero exit code.
The capability to incorporate a probe command is one of the key benefits of using the Solaris Cluster framework. A probe command enables you to write a program that determines the health of your application. As an example, if you are writing a probe for a database, you could test whether it can execute basic SQL statements, such as creating or deleting a table, or adding or deleting a record. If you do not provide a probe script, then default methods are used instead.
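A custom probe does not have to be elaborate. The following ksh sketch, in which the myappd daemon name is purely illustrative, checks only that the application's main process still exists and uses the exit-code conventions described earlier:

#!/bin/ksh
# Hypothetical probe: verify that the myappd daemon is still running.
# A real probe would also exercise the service, for example by
# issuing a trivial request and checking the reply.
if /usr/bin/pgrep -x myappd > /dev/null; then
        exit 0        # service appears healthy
else
        exit 101      # values above 100 report a probe failure
fi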
For non-network-aware applications, the process-monitoring command pmfadm (see the pmfadm(1M) man page) monitors the process tree spawned by your start command. Only if all the processes have failed will the cluster framework attempt to restart the service. Therefore, if your service consists of multiple processes and only one process fails, then pmfadm will not recognize this fault unless it causes all the other processes to fail as well. Consequently, if you need to monitor your application with a higher degree of granularity, you must provide a custom fault probe.
If the application is network-aware, then the default probe tries to open the port listed in the port_list property. Because this is a simple probe, it makes no attempt to retrieve any data. Even if the default probe successfully opens the ports, that does not necessarily indicate overall application health.
In the preceding example, you would install the package generated by scdsbuilder on all your cluster nodes. You would then register the new resource type so that you could create new resources of this type. When the RGM is requested to create a resource, it calls the validate command: /usr/local/bin/my_validate -o some_param. If that command succeeds and you enable the resource, the RGM calls the /usr/local/bin/my_start -r foo-rs -g bar-rg command. In both cases, the initial arguments are fixed, but you can modify them subsequently using the clresource command.
Resource Type Registration File
If you decide to write an agent from scratch using either the RMAPI or DSDL APIs, you must first describe the properties of your proposed resource type in a file known as the resource type registration (RTR) file. This file provides the RGM with details on which programs to call and which variables are required to control the particular application.
Example 4.18 shows an extract from the SUNW.LogicalHostname RTR file. As the example shows, all the programs for this resource type are located in the directory defined by RT_BASEDIR. The RTR file also defines programs that will, among other tasks, start, stop, and probe (Monitor_start) the logical IP address that the resource plumbs. These addresses are, in turn, defined in the HostnameList property.
The extension properties you define are all application-specific. They could, for example, refer to the location of the software binaries, that is, the application home directory. If a property has a default value, then you can define it in the RTR file to save your system administrator from having to override it each time he or she creates a resource of this type. Furthermore, you can place limits on what values certain properties can take and when they can be changed.
Example 4.18. Extract from the SUNW.LogicalHostname RTR File
The following text shows some of the key parts of the RTR file for the SUNW.LogicalHostname resource type:
#
# Copyright 1998-2008 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
#ident  "@(#)SUNW.LogicalHostname  1.20  08/05/20 SMI"
# Registration information and Paramtable for HA Failover IPaddress
#
# NOTE: Keywords are case insensitive, i.e. users may use any
# capitalization style they wish
#
RESOURCE_TYPE ="LogicalHostname";
VENDOR_ID = SUNW;
RT_DESCRIPTION = "Logical Hostname Resource Type";
SYSDEFINED_TYPE = LOGICAL_HOSTNAME;
RT_VERSION ="3";
API_VERSION = 2;
INIT_NODES = RG_PRIMARIES;
RT_BASEDIR=/usr/cluster/lib/rgm/rt/hafoip;
FAILOVER = TRUE;
# To enable Global_zone_override
GLOBAL_ZONE = TRUE;
START = hafoip_start;
STOP = hafoip_stop;
PRENET_START = hafoip_prenet_start;
VALIDATE = hafoip_validate;
UPDATE = hafoip_update;
MONITOR_START = hafoip_monitor_start;
MONITOR_STOP = hafoip_monitor_stop;
MONITOR_CHECK = hafoip_monitor_check;
PKGLIST = SUNWscu;
#
# Upgrade directives
#
#$upgrade
#$upgrade_from "1.0" anytime
#$upgrade_from "2" anytime

# The paramtable is a list of bracketed resource property declarations
# that come after the resource-type declarations
# The property-name declaration must be the first attribute
# after the open curly of a paramtable entry
#
# The Paramtable cannot contain TIMEOUT properties for methods
# that aren't in the RT
{
        PROPERTY = Start_timeout;
        MIN=360;
        DEFAULT=500;
}
. . .
# HostnameList: List of hostnames managed by this resource. All must be
# on the same subnet. If need > 1 subnet with a RG, create as many
# resources as there are subnets.
{
        PROPERTY = HostnameList;
        EXTENSION;
        STRINGARRAY;
        TUNABLE = AT_CREATION;
        DESCRIPTION = "List of hostnames this resource manages";
}
. . .
Resource Management API
The Resource Management API (RMAPI) is a set of low-level functions contained in the libscha.so library with both C and shell interfaces. All the function names provided by this interface are prefixed with scha_. The shell interfaces are listed in section 1HA of the Solaris Cluster manual pages.
The ksh scripts generated by Agent Builder are built from these commands, so you can insert additional lines in the generated code where the comments indicate. However, for greater control over the logic imposed on your application, you must write your application agent from scratch.
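As a rough sketch, the following ksh fragment shows the flavor of the shell interface; the myapp-rs and myapp-rg names are invented, and in a real method script they would arrive as the -R and -G arguments supplied by the RGM:

#!/bin/ksh
# Read the Retry_count standard property of the resource
RETRIES=$(scha_resource_get -O RETRY_COUNT -R myapp-rs -G myapp-rg)
echo "Retry_count is ${RETRIES}"

# Read a hypothetical Config_dir extension property; with the
# EXTENSION optag the property name is passed as a trailing argument
scha_resource_get -O EXTENSION -R myapp-rs -G myapp-rg Config_dir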
Data Service Development Library
The Data Service Development Library (DSDL) is a set of higher-level functions encapsulated in the libdsdev.so library that builds on the RMAPI functionality. This library can only be accessed using a C programming language interface. Consequently, it is potentially more time-consuming to write a complete application agent using this approach, although it does offer the greatest level of performance and flexibility.
If you used Agent Builder to create a resource type, you can customize it by inserting extra DSDL code where the comments indicate. Otherwise, you must write your agent from scratch.
All the function names provided by the library are prefixed with scds_ and are documented in section 3HA of the Solaris Cluster manual pages. The NFS agent source code [NFSAgent] serves as a good example of how these APIs are used. Using the nfs_svc_start.c source as a specific example, the library is initialized with scds_initialize(). Resource and resource group names are then retrieved using scds_get_resource_name() and scds_get_resource_group_name() calls, respectively. Finally, the status of the resource is set by the RMAPI scha_resource_setstatus() call. Most of the coding effort involved with using these interfaces is consumed by the logic that describes how the agent should behave in various failure scenarios. For example, how many times should the agent attempt to restart the service before giving up and potentially failing over? What should the agent do in response to a network failure?
One advantage of using the GDS is that all the best practices for service behavior are already in the logic of the code that makes up the agent, saving you from re-creating that code.
Useful Utilities for Building Custom Data Services
The Solaris Cluster software comes with two programs that you will find very useful if you create your resource type from scratch: hatimerun (see the hatimerun(1M) man page) and pmfadm.
hatimerun Command
Throughout the Start, Stop, Monitor_start, and Validate methods of your resource type, you will need to run various programs to perform the required logic steps. Because your goal is high availability, you cannot wait for a program that might never respond or return, whether that program has gone into a loop or is unable to retrieve some important data from the network, disk, or other program. Consequently, you must place time constraints on the duration of the program's execution. This is the function of the hatimerun command. It enables you to execute a program under its control and set a limit on the time it can take to respond. If the program in question fails to respond in a timely fashion, it is terminated by default.
The hatimerun command also enables you to leave the program running asynchronously in the background, change the exit code returned after a timeout, or use a particular signal to terminate your program.
The most common usage of this command is in your probe commands or in the steps leading up to stopping or starting your application.
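For example, the following sketch gives a (hypothetical) my_probe program 30 seconds to complete; if it has not exited by then, hatimerun terminates it and returns a nonzero exit code:

#!/bin/ksh
if hatimerun -t 30 /usr/local/bin/my_probe; then
        echo "probe passed"
else
        echo "probe failed or timed out"
fi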
pmfadm Command
If you write a custom probe for your service, you decide what constitutes a healthy service. The criteria might include application-specific checks to determine if the data it is delivering to potential clients is valid or timely. If the application consists of multiple processes, you might want to check that each process is running, using the ps command. All of these tests combine to give you the best assessment of your application's current health. However, your probe is scheduled to make its checks only at regular intervals. Even though you can tune these checks to occur at shorter intervals, doing so results in a greater load on your system. Consequently, you must wait, on average, half the probe period before your probe detects a situation where your application has completely failed, meaning that all the processes have exited. Once again, this does not help much toward your goal of high availability.
The solution is to use pmfadm, the process-monitoring facility command. When you start your application under pmfadm, it monitors all the processes your application spawns to a level that you determine. By default, it monitors all the application's child processes. If they all exit, pmfadm immediately restarts your application for you on the condition that it has not already exceeded a preset number of restarts within a certain time interval.
The most common usage of this command is in your start command to ensure that your key application processes are monitored and that complete failures are reacted to immediately.
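The following sketch illustrates typical usage; the myapp,svc tag and start_myapp script are invented, and the -n and -t options express the maximum number of restarts and the time window, in minutes, over which they are counted:

# pmfadm -c myapp,svc -n 2 -t 5 /usr/local/bin/start_myapp
# pmfadm -q myapp,svc
# pmfadm -s myapp,svc TERM

The first command starts the application under PMF control, the second queries whether the tag is still being monitored, and the third stops monitoring and sends the processes the TERM signal.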
libschost.so Library
Some applications store or make use of configuration information about the physical hostname of the server on which the application is running. Such applications will most likely fail when the application is placed in a resource group and moved between the nodes of a cluster. This failure occurs because calls to uname or gethostbyname produce different responses on the global zone of each cluster node. Oracle Application Server and the Oracle E-Business Suite are two examples of programs that risk such failures [LibHost].
To overcome this limitation, you use the LD_PRELOAD feature to enable the runtime linker to interpose the libschost.so.1 library in the dynamic linking process. The following example shows how this is done. You can use the same construct within your resource Start or Monitor_start (probe) methods, as required.
Example 4.19. How to Use the libschost.so.1 Library to Change the String Returned as the Hostname
Use the uname command to display the current hostname.
# uname -n
phys-winter1
Set the LD_PRELOAD_32, LD_PRELOAD_64, and SC_LHOSTNAME environment variables, and then rerun the uname command.
# LD_PRELOAD_32=$LD_PRELOAD_32:/usr/cluster/lib/libschost.so.1
# LD_PRELOAD_64=$LD_PRELOAD_64:/usr/cluster/lib/64/libschost.so.1
# SC_LHOSTNAME=myhost
# export SC_LHOSTNAME LD_PRELOAD_32 LD_PRELOAD_64
# uname -n
myhost