Managing Backup and Recovery
Successful data center operations require good backup, restore, and recovery processes. Good processes are critical when a data center is providing highly available services.
The management server is the focal point of Sun Cluster 3.0 system recovery. Recovery procedures for the management server itself are required. Because the management server acts as the JumpStart server for the cluster nodes, the management server plays an important part in the recovery of a cluster node.
Restoring Management Server Files
The management server contains many different files: JumpStart software profiles for cluster nodes, copies of the Solaris OE used when installing clients, AnswerBook2 documentation, Sun Management Center software support files, and so on. Most of these files are static, so you can restore them from the distribution media. SYSLOG files, however, change regularly and must be backed up frequently; because they tend to be relatively small, frequent backups cost little.
Determining When to Perform Full or Incremental Backups
Using a local tape drive, perform a full backup of the management server whenever the file systems undergo major changes. For example, perform a full backup after updating the JumpStart software directory structure with a new release of the Solaris OE.
Perform incremental backups to save SYSLOG and configuration files regularly.
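The full-versus-incremental policy above can be sketched as a small helper that builds a dump command line. This is only an illustration under stated assumptions: it assumes the backups are taken with the Solaris ufsdump utility, which this section does not name, and the tape device /dev/rmt/0 and the file system paths are placeholders.

```python
def ufsdump_command(filesystem, major_change, tape_device="/dev/rmt/0"):
    """Return a ufsdump argument list for one file system.

    Level 0 is a full dump; level 1 saves only the files changed since
    the last level 0 dump. The 'u' option records the dump in
    /etc/dumpdates and 'f' names the dump device.
    """
    level = "0" if major_change else "1"
    return ["ufsdump", level + "uf", tape_device, filesystem]


# Full backup after the JumpStart directory structure was updated
# ("/export" is a hypothetical location for that directory tree).
print(" ".join(ufsdump_command("/export", major_change=True)))

# Routine incremental backup that picks up SYSLOG and configuration changes.
print(" ".join(ufsdump_command("/var", major_change=False)))
```

The helper only builds the command; running it from cron, and rotating tapes, is left to local practice.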
Recovering the Management Server
A set of DVDs that contain the software image installed on the management server in the factory enables you to rebuild the management server to the factory-installed state. If a catastrophic failure causes the loss of the management server operating environment, and you cannot recover the operating environment from backup tape, use this recovery process. Recovering to the factory-installed state by using the DVDs is significantly faster than reloading the dozens of packages installed on the management server from their distribution DVDs.
Reinstalling Cluster Nodes
A DVD image is not provided for the cluster nodes. Reinstall a cluster node by running JumpStart software against it from the management server. This technique is also useful for switching among the various database configurations available. Once the nodes are fully installed and operational, follow standard, documented procedures for routinely backing up the Solaris OE on each node.
Backing Up and Recovering the ORACLE Database
ORACLE offers a series of backup and recovery methods, each of which provides finer-grained recovery than the one preceding it. A "cold" backup requires shutting down all database instances and making a copy of each component (data files, control files, redo log files, and so on). A cold backup can recover the database only to the point in time at which the backup was taken. Individual tables can be backed up as of a point in time with the ORACLE Export utility and recovered with the Import utility. You can use these default backup methods as soon as the nodes are installed and configured.
Using Archive Log Mode
To provide for up-to-the-minute recovery, the DBA places the database in archive log mode, specifying a destination to which each instance copies its online redo log files when they are full. Setting up archive log mode allows the DBA to perform "hot" backups while each instance remains on-line.
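As a rough sketch of the setup described above, the following helper generates one archive destination parameter per instance and the SQL*Plus commands that switch the database to archive log mode. The instance names rac1 and rac2 and the /arch directories are hypothetical, and the sketch assumes a server parameter file that accepts per-instance entries. RAC-specific details, such as how the database must be mounted while the mode is changed, are left to the ORACLE documentation.

```python
def archive_dest_parameters(dest_by_instance):
    """Build one per-instance archive destination parameter line."""
    return [
        "%s.log_archive_dest_1='LOCATION=%s'" % (sid, dest)
        for sid, dest in sorted(dest_by_instance.items())
    ]


def archivelog_commands():
    """SQL*Plus commands that restart the database in mount state,
    enable archive log mode, and reopen the database."""
    return [
        "SHUTDOWN IMMEDIATE",
        "STARTUP MOUNT",
        "ALTER DATABASE ARCHIVELOG;",
        "ALTER DATABASE OPEN;",
        "ARCHIVE LOG LIST",  # reports the resulting mode and destination
    ]


# Hypothetical instance names and archive directories.
params = archive_dest_parameters({"rac1": "/arch/rac1", "rac2": "/arch/rac2"})
print("\n".join(params + archivelog_commands()))
```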
To perform a hot backup, the DBA writes a script, run from either instance, that places each tablespace, one at a time, in backup mode using the alter tablespace...begin backup command. The tablespace remains on-line while it is in backup mode.
The script then copies each component file of the tablespace either to tape or to another directory before taking the tablespace out of backup mode using alter tablespace...end backup. SQL can be issued to modify data in the tablespace while it is being backed up, but doing so requires additional redo log space. Therefore, we recommend that you perform hot backups when the database is likely to undergo few data modifications.
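The hot backup script described above might be generated along the following lines. This is a minimal sketch, not a complete backup procedure: the tablespace names, data file paths, and backup directory are hypothetical, and cp stands in for whatever copy command suits the storage in use (a tape utility, or dd for raw devices).

```python
def hot_backup_script(datafiles_by_tablespace, backup_dir):
    """Return SQL*Plus script lines that back up one tablespace at a time."""
    lines = []
    for tablespace, datafiles in datafiles_by_tablespace.items():
        lines.append("ALTER TABLESPACE %s BEGIN BACKUP;" % tablespace)
        for datafile in datafiles:
            # HOST runs an operating system command from within SQL*Plus.
            lines.append("HOST cp %s %s/" % (datafile, backup_dir))
        lines.append("ALTER TABLESPACE %s END BACKUP;" % tablespace)
    # Archive the current online redo log so that the redo generated
    # during the backup is available for recovery.
    lines.append("ALTER SYSTEM ARCHIVE LOG CURRENT;")
    return lines


# Hypothetical tablespaces and data files.
layout = {
    "USERS": ["/u01/oradata/users01.dbf"],
    "INDX": ["/u01/oradata/indx01.dbf", "/u01/oradata/indx02.dbf"],
}
print("\n".join(hot_backup_script(layout, "/backup/hot")))
```

Bracketing each tablespace separately keeps the time that any one tablespace spends generating extra redo as short as possible.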
Using ORACLE's Recovery Manager Utility
ORACLE's Recovery Manager utility (RMAN) performs hot backups without placing each tablespace into backup mode. An additional database must be created to store the RMAN recovery catalog. Alternatively, you can purchase ORACLE Data Guard software to set up a database replication site that is physically separate from the RAC cluster. A site running Data Guard software operates in a continuous recovery mode, constantly processing the redo log files it receives from the RAC instances. You can take advantage of the Data Guard software site to perform any necessary backups, leaving the cluster fully available around the clock.
Using ORACLE's Flashback Query
Oracle9i introduced a new feature called Flashback Query, which allows a user to run SQL queries against a table or set of tables and see their contents as they existed at a user-specified point in the past.
Flashback Query requires the database to be placed into automatic undo management mode, another new feature of Oracle9i. Using automatic undo management frees the DBA from having to create individual rollback segments (also known as undo segments) for each RAC instance. Instead, the DBA creates an undo tablespace for each instance, and Oracle9i takes care of creating and deleting rollback segments within each tablespace as needed.
To facilitate Flashback Query, the DBA specifies an undo retention period to indicate how far back an SQL statement may need to go, thus dictating the amount of undo information retained in each undo tablespace at any given time.
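To make the relationship between the retention period and a flashback query concrete, the sketch below generates the undo-related initialization parameters and a query against a past point in time. The instance names, undo tablespace names, and the emp table are hypothetical. The query uses the AS OF TIMESTAMP clause, which is available from Oracle9i Release 2; earlier Oracle9i releases use the DBMS_FLASHBACK package instead.

```python
from datetime import datetime


def undo_parameters(retention_hours, undo_tablespace_by_instance):
    """Build parameter lines for automatic undo management.

    undo_retention is expressed in seconds, so the retention period
    is converted from hours.
    """
    seconds = int(retention_hours * 3600)
    lines = ["*.undo_management=AUTO", "*.undo_retention=%d" % seconds]
    for sid, tablespace in sorted(undo_tablespace_by_instance.items()):
        lines.append("%s.undo_tablespace=%s" % (sid, tablespace))
    return lines


def flashback_query(table, as_of):
    """Return a query that reads a table as it existed at 'as_of'."""
    stamp = as_of.strftime("%Y-%m-%d %H:%M:%S")
    return ("SELECT * FROM %s AS OF TIMESTAMP "
            "TO_TIMESTAMP('%s', 'YYYY-MM-DD HH24:MI:SS');" % (table, stamp))


# Hypothetical instances and undo tablespaces; retain three hours of undo.
print("\n".join(undo_parameters(3, {"rac1": "UNDOTBS1", "rac2": "UNDOTBS2"})))
print(flashback_query("emp", datetime(2002, 1, 15, 9, 30, 0)))
```

A query can look back only as far as the undo still retained, so the retention period should cover the oldest point in time that users are expected to query.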