Service-Level Management and Telemetry
When you consolidate multiple services onto a Solaris Cluster installation, you must ensure that your service levels are met even when several services reside on the same cluster node. The Oracle Solaris OS has many features, such as resource controls and scheduler options, to help you meet this requirement. These resource allocations can be defined in the projects database, which is stored locally in /etc/project or held in the name service maps.
The Solaris Cluster software can bind both resource groups and resources to projects using the RG_project_name and Resource_project_name properties, respectively. The following example shows how to create a processor pool (containing four CPUs) that uses the fair share scheduler (FSS). The processor pool is then associated with the user.oracle project, which limits shared memory usage to 8 gigabytes. The FSS can be made the default scheduling class by using the dispadmin -d FSS command, as shown after the example.
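For illustration, an entry in the /etc/project database has the form projname:projid:comment:user-list:group-list:attributes. A sketch of an entry similar to the one created later in this section might look like the following (the comment text is illustrative, and the 8-gigabyte shared memory cap is expressed in bytes in the file itself):

user.oracle:4242:Oracle database project:::project.max-shm-memory=(privileged,8589934592,deny);project.pool=oracle_pool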
Example 4.9. Binding a Resource Group to a Project Associated with a Processor Pool
Determine the number of processors the system has using the psrinfo command.
Define a four-CPU processor set called oracle_pset and an associated pool called oracle_pool that uses the FSS in a temporary file, and then use the file as input to the poolcfg command.
# psrinfo | wc -l
      24
# cat /tmp/create_oracle_pool.txt
create pset oracle_pset ( uint pset.min = 1 ; uint pset.max = 4)
create pool oracle_pool
associate pool oracle_pool ( pset oracle_pset )
modify pool oracle_pool ( string pool.scheduler = "FSS" )
# poolcfg -f /tmp/create_oracle_pool.txt
Instantiate the configuration using the pooladm command.
# pooladm -c
# pooladm

system default
        string  system.comment
        int     system.version 1
        boolean system.bind-default true
        string  system.poold.objectives wt-load

        pool pool_default
                int     pool.sys_id 0
                boolean pool.active true
                boolean pool.default true
                string  pool.scheduler FSS
                int     pool.importance 1
                string  pool.comment
                pset    pset_default

        pool oracle_pool
                int     pool.sys_id 2
                boolean pool.active true
                boolean pool.default false
                string  pool.scheduler FSS
                int     pool.importance 1
                string  pool.comment
                pset    oracle_pset

        pset oracle_pset
                int     pset.sys_id 1
                boolean pset.default false
                uint    pset.min 1
                uint    pset.max 4
                string  pset.units population
                uint    pset.load 17
                uint    pset.size 4
                string  pset.comment

                cpu
                        int     cpu.sys_id 1
                        string  cpu.comment
                        string  cpu.status on-line

                cpu
                        int     cpu.sys_id 0
                        string  cpu.comment
                        string  cpu.status on-line

                cpu
                        int     cpu.sys_id 3
                        string  cpu.comment
                        string  cpu.status on-line

                cpu
                        int     cpu.sys_id 2
                        string  cpu.comment
                        string  cpu.status on-line

        pset pset_default
                int     pset.sys_id -1
                boolean pset.default true
. . .
Use the projadd command to create the user.oracle project, which caps shared memory usage and uses oracle_pool, and then bind the oracle-rg resource group to the project.
# projadd -p 4242 -K "project.max-shm-memory=(privileged,8GB,deny)" \
> -K project.pool=oracle_pool user.oracle
# su - oracle
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
$ id -p
uid=424242(oracle) gid=424242(oinstall) projid=4242(user.oracle)
$ exit
# clresourcegroup create -p RG_project_name=user.oracle oracle-rg
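As noted earlier, the FSS can be made the default scheduling class with the dispadmin command (the new default takes effect at the next boot unless running processes are moved with priocntl). An individual resource can likewise be bound to a project through its Resource_project_name property. The following is a sketch only; oracle-server-rs is a placeholder resource name:

# dispadmin -d FSS
# clresource set -p Resource_project_name=user.oracle oracle-server-rs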
Similarly, using the clzonecluster command (see the clzonecluster(1M) man page), you can bind zone clusters to pools, dedicate or limit the number of CPUs allocated to them, and limit the physical, swap, or locked memory they can use.
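For example, a sketch of an interactive clzonecluster session that dedicates CPUs to a zone cluster and caps its memory might look like the following. The zone-cluster name zc-oracle and the specific limits are placeholders; check the clzonecluster(1M) man page for the resources and properties supported by your release.

# clzonecluster configure zc-oracle
clzc:zc-oracle> add dedicated-cpu
clzc:zc-oracle:dedicated-cpu> set ncpus=2-4
clzc:zc-oracle:dedicated-cpu> end
clzc:zc-oracle> add capped-memory
clzc:zc-oracle:capped-memory> set physical=8G
clzc:zc-oracle:capped-memory> set swap=12G
clzc:zc-oracle:capped-memory> set locked=1G
clzc:zc-oracle:capped-memory> end
clzc:zc-oracle> commit
clzc:zc-oracle> exit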
Gathering Telemetry from the Solaris Cluster Software
The Solaris Cluster service-level management feature enables you to configure the Solaris Cluster software to gather telemetry data from your cluster. Using this feature, you can collect statistics on CPU, memory, swap, and network utilization of the cluster node as well as on resource groups and system components such as disks and network adapters. By monitoring system resource usage through the Solaris Cluster software, you can collect data that reflects how a service using specific system resources is performing. You can also discover resource bottlenecks, overloads, and even underutilized hardware resources. Based on this data, you can assign applications to nodes that have the necessary resources and choose which node each application should fail over to.
This feature must be set up using the clsetup command. The telemetry data is stored in its own Java DB database held on a failover or global file system that you must provide for its use. After the setup is complete, you can enable the telemetry on the resource groups, choose the attributes to monitor, and set thresholds. Figure 4.5 and Figure 4.6 show the type of output you can receive from using this feature.
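After the clsetup wizard has configured the telemetry infrastructure, the cltelemetryattribute command (see the cltelemetryattribute(1CL) man page) manages which attributes are collected; its enable and set-threshold subcommands turn on collection for particular attributes and attach thresholds to monitored objects. A minimal sketch of checking the current telemetry configuration follows; the exact options and attribute names for enabling attributes and setting thresholds should be verified against the man page:

# cltelemetryattribute status
# cltelemetryattribute show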
Figure 4.5 Alarm showing that the write I/O rate to disk d4 has exceeded the threshold set
Figure 4.6 Public network adapter utilization telemetry gathered using the service-level management feature
Figure 4.5 shows that an alarm has been generated because disk d4 has exceeded the threshold set for it.
Figure 4.6 shows the utilization of the public network adapters bge0 and bge1 on cluster node pbaital1.
The telemetry uses the RG_slm_type resource group property, which can be set to one of two values: automated or manual. The default value for the RG_slm_type property is manual. Unless the RG_slm_type property value is explicitly set to automated when a resource group is created, telemetry is not enabled for the resource group. If the resource group RG_slm_type property is changed, resource utilization monitoring begins only after the resource group is restarted.
When a resource group has the RG_slm_type property set to automated, the Resource Group Manager (RGM) internally generates a Solaris project to track the system resource utilization of all processes encapsulated by the resources of the resource group. This tracking happens regardless of whether the RG_project_name and Resource_project_name properties are set. For resource groups that have the RG_slm_type property set to automated, the telemetry can track only CPU usage, resident set size (RSS), and swap usage. Telemetry for other objects is gathered at the node, zone, disk, or network interface level, as appropriate.
See Example 8.9 in Chapter 8, "Example Oracle Solaris Cluster Implementations," for more information about how to set up, configure, and use the Solaris Cluster telemetry.
Using the Solaris Cluster Manager browser interface simplifies the process of configuring thresholds and viewing the telemetry monitoring data.
The following example shows the generated project name in the RG_SLM_projectname property. Unlike other resource group properties, you cannot set this property manually. Furthermore, if RG_slm_type is set to automated, the RG_project_name and Resource_project_name properties are ignored. Conversely, when RG_slm_type is set to manual, the processes of the resource group's resources are bound to the projects named in the RG_project_name and Resource_project_name properties, but the RGM does not track the system resources they use.
Example 4.10. The Effect of Setting the RG_slm_type Property to automated
Use the clresourcegroup command to show the property settings for the apache-1-rg resource group.
# clresourcegroup show -v apache-1-rg

=== Resource Groups and Resources ===

Resource Group:                                 apache-1-rg
  RG_description:                                  <NULL>
  RG_mode:                                         Failover
  RG_state:                                        Managed
  RG_project_name:                                 default
  RG_affinities:                                   <NULL>
  RG_SLM_type:                                     manual
  Auto_start_on_new_cluster:                       False
  Failback:                                        False
  Nodelist:                                        phys-winter1 phys-winter2
  Maximum_primaries:                               1
  Desired_primaries:                               1
  RG_dependencies:                                 <NULL>
  Implicit_network_dependencies:                   True
  Global_resources_used:                           <All>
  Pingpong_interval:                               3600
  Pathprefix:                                      <NULL>
  RG_System:                                       False
  Suspend_automatic_recovery:                      False

  --- Resources for Group apache-1-rg ---
. . .
Use the clresourcegroup command to set the RG_SLM_type property to automated.
# clresourcegroup set -p RG_SLM_type=automated apache-1-rg
# clresourcegroup show -v apache-1-rg

=== Resource Groups and Resources ===

Resource Group:                                 apache-1-rg
  RG_description:                                  <NULL>
  RG_mode:                                         Failover
  RG_state:                                        Managed
  RG_project_name:                                 default
  RG_affinities:                                   <NULL>
  RG_SLM_type:                                     automated
  RG_SLM_projectname:                              SCSLM_apache_1_rg
  RG_SLM_pset_type:                                default
  RG_SLM_CPU_SHARES:                               1
  RG_SLM_PSET_MIN:                                 0
  Auto_start_on_new_cluster:                       False
  Failback:                                        False
  Nodelist:                                        phys-winter1 phys-winter2
  Maximum_primaries:                               1
  Desired_primaries:                               1
  RG_dependencies:                                 <NULL>
  Implicit_network_dependencies:                   True
  Global_resources_used:                           <All>
  Pingpong_interval:                               3600
  Pathprefix:                                      <NULL>
  RG_System:                                       False
  Suspend_automatic_recovery:                      False

  --- Resources for Group apache-1-rg ---
. . .
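Because resource utilization monitoring begins only after the resource group is restarted, a restart of apache-1-rg would typically follow this property change. A minimal sketch:

# clresourcegroup restart apache-1-rg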