Backing Up Your Cluster
Although your data might be protected by hardware RAID or host-based mirroring software, and possibly even replicated to another site for disaster recovery purposes, you must still have a consistent, usable backup of the data on your cluster. The requirement is twofold: backing up the root disk and backing up the application data. Each has its own specific challenges.
Root Disk Backup
Your root disk contains the Oracle Solaris OS with numerous configuration files that the system requires to perform its tasks. Not all of these files are static. Many of the log files you need to retain for auditing and debugging purposes are highly dynamic. Therefore, you must achieve a consistent backup of your system so that you can restore your system successfully, if the need arises.
When using UFS for the root disk, only two methods are available for achieving a guaranteed consistent backup of the root file system partitions:
- Boot the system into single-user mode.
- Use both lockfs and fssnap while the system is at its normal run level.
Obviously, booting a node into single-user mode requires that you switch over all the services hosted on that node. Not only does this result in service outages, but it also means that the application might have to share resources on its new host node, which might degrade its performance somewhat. The lockfs/fssnap option seems better. However, these commands can cause the system to pause while data is flushed from the buffer cache and a consistent view is reached. If this pause is too long, it might have an adverse effect on the cluster framework. Furthermore, any real-time process prevents fssnap from locking the file system. Thus, with a Solaris Cluster installation, you must temporarily stop the xntpd daemon, which runs as a real-time process. However, other processes, such as the Oracle 10g or Oracle 11g Real Application Clusters frameworks, might make this approach unworkable.
After you have performed the backup, you can delete the snapshot and move on to the next partition on the root disk.
Example 4.12. Using lockfs and fssnap to Create a Consistent Root (/) File System Snapshot
Stop the xntpd daemon before locking the root (/) file system with the lockfs command.
# /etc/rc2.d/S74xntpd.cluster stop
# lockfs -f
Take a snapshot of the root (/) file system using the fssnap command before restarting the xntpd daemon.
# time fssnap -o backing-store=/spare_disk /
/dev/fssnap/0

real    0m19.370s
user    0m0.003s
sys     0m0.454s
# /etc/rc2.d/S74xntpd.cluster start

Perform backup...

# fssnap -d /dev/fssnap/0
Deleted snapshot 0.
For an Oracle Solaris ZFS file system, the situation is much more straightforward. By issuing a zfs snapshot command, you can create a consistent view of a file system that you can back up and restore with confidence. Using the -r flag allows you to create these snapshots recursively for all file systems below a certain mount point, further simplifying the process.
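As a minimal sketch, assuming the root pool is named rpool and using an illustrative snapshot name, the sequence might look like this:

# zfs snapshot -r rpool@backup
Perform backup...
# zfs destroy -r rpool@backup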
Backing Up Application Data on a Cluster
The first challenge with backing up application data when a service resides on a cluster is determining which cluster node the service is currently running on. If a failure has recently occurred, then the service might not be running on its primary node. If you are running Oracle RAC, the database is probably running on multiple nodes simultaneously. In addition, the data might be stored on raw disk or in Oracle's Automatic Storage Management (ASM), rather than in a file system. Consequently, any backup process must be capable of communicating with the node that currently hosts the application, rather than depending on the application being on a particular node, and potentially using application-specific backup procedures or software.
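For example, a backup script can query the current primary of a service's resource group before contacting a node. A minimal sketch, assuming a resource group named oracle-rg (the name is illustrative):

# clresourcegroup status oracle-rg

The output lists each node along with whether the group is online there, so the backup server can direct its client request to the node reporting Online.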
Although fssnap can be used in certain circumstances to achieve a consistent view of the root (/) file system partitions for backup, do not use it with failover UFS file systems. The pause in file system activity while the snapshot is being taken might result in the service fault probe detecting a fault and causing a service failover. Furthermore, fssnap cannot be used with global file systems (see the section "The Cluster File System" in Chapter 2, "Oracle Solaris Cluster: Features and Architecture") because fssnap must be run on the UFS mount point directly and works closely with the in-memory data structures of UFS. This means that the PxFS client and server (master) must interpret the fssnap ioctl system calls, but this capability is not currently present in PxFS.
Once more, the Oracle Solaris ZFS snapshot feature enables you to obtain a consistent view of the application data, so it is the simpler option when no application-specific backup tools are available.
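As with the root pool, a snapshot of the application dataset can be streamed directly to the backup medium with zfs send. A sketch, assuming an illustrative dataset named apppool/data and an illustrative target file:

# zfs snapshot apppool/data@backup
# zfs send apppool/data@backup > /backup/appdata-backup.zfs
# zfs destroy apppool/data@backup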
Many backup products are available from Oracle and from third-party sources. Many have application-specific integration features, for example, the ability to integrate with Oracle's RMAN backup function. Most products can back up data stored in any file system (UFS, ZFS, QFS, VxFS) that you might have configured in your cluster.
Highly Available Backup Servers
Performing regular, secure backups of your critical systems is clearly essential. This, in turn, means that the systems performing the backups must themselves be highly available; otherwise, they might not be able to complete a backup within the available time window. Although there is little you can do to make an individual tape drive more available, you can use tape libraries that house multiple tape drives. The availability problem then rests with the system that controls the backups.
A backup (master) server contains the backup configuration information: catalogs of previous backups, schedules for subsequent backups, and target nodes to be backed up. Just like any other service, this collection of data files and the programs that access it can be made highly available. Thus, a highly available service can be achieved by placing the configuration files on a highly available file system, hosted by one or more Solaris Cluster nodes, and encapsulating the backup server program in a suitable resource in a resource group.
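A minimal sketch of such a configuration, assuming the HAStoragePlus and generic data service (GDS) resource types are already registered and using illustrative resource names, mount points, and commands throughout:

# clresourcegroup create backup-rg
# clreslogicalhostname create -g backup-rg backup-lh
# clresource create -g backup-rg -t SUNW.HAStoragePlus \
-p FilesystemMountPoints=/backup/config backup-hasp-rs
# clresource create -g backup-rg -t SUNW.gds \
-p Start_command="/opt/backupsw/bin/start_master" \
-p Resource_dependencies=backup-hasp-rs backup-gds-rs
# clresourcegroup online -M backup-rg

Where a dedicated Solaris Cluster agent exists for your backup product, use it in place of the generic GDS wrapper shown here.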
The most common data center backup configuration uses SAN-attached tape libraries with multiple tape drives. You configure the master server to manage the backup by communicating with the client software installed on each target cluster node to be backed up. Instead of defining an entire physical server as a target, you use the logical host of the individual services that require their data to be backed up. The master server then contacts the appropriate physical node when the time comes to back up the data. If you need to back up the individual nodes, then you define the backup so that it covers only the file systems that constitute the root (/) file system. When the time comes to perform the backup, the master server directs the client to stream the necessary dataset to one or more tapes in the library.
Solaris Cluster agents are available for both the StorageTek Enterprise Backup software and Veritas NetBackup. If a Solaris Cluster agent is not available for your backup software, you can easily create one, as described in the next section.