SAP R/3 Storage Management
Service-level agreements increasingly require that the system be available again within minutes, both after planned downtime such as backup, hardware/software maintenance, and R/3 upgrades, and after unplanned downtime such as error analysis and restore. Figure 20 shows a combination of advanced infrastructure solutions developed by SAP's Advanced Technology Group (ATG) in cooperation with their storage partners. In this scenario, the live data is constantly copied, and business continuation is allowed during the split (and resynchronization) of the mirror.
Figure 20 Split mirror backup and standby R/3 System.
Once the mirror is split, ATG creates additional copies for backup and a standby SAP R/3 System. This solution minimizes the impact on the live environment and offloads the backup activity from the live database server.
The split mirror solution was successfully implemented for a SAP R/3 database managed by DB2/390 and stored on one IBM Enterprise Storage Server (ESS), using the ESS's advanced functions: local FlashCopy and synchronous Peer-to-Peer Remote Copy (PPRC).
Customers who require remote data vaulting or need to scale to a larger database size can simply extend this split mirror solution with an additional ESS. The split mirror solution described in this part of the article is an exact counterpart of the solution implemented on IBM's RAMAC Virtual Array (RVA).
In this implementation, only logical volumes that contain DB2 LOGs, Bootstrap Dataset (BSDS), and ICF catalog are constantly mirrored, and all other logical volumes used by the R/3 database are synchronized only for backup and refresh of the standby system. With this setup, user or application logical errors are not copied immediately to the mirror. In case of a disaster, ATG can recover the database to a certain point in time, starting with the last backup and applying the LOGs of the live system. This process (called DB2 conditional restart) may be very time-consuming.
In a split mirror implementation that also supports remote data vaulting (two storage subsystems, described later), business continuation will be significantly impaired if the live storage subsystem is completely lost. To avoid having to perform a DB2 conditional restart in this situation, you need to perform constant PPRC of all logical volumes that belong to the R/3 database.
Environment Setup
Because split mirror backup/recovery is a high-availability solution, ATG highly recommends the use of two physically separated database hosts and two ESS storage systems. Each database host manages its own ESS storage system, and the fast resynchronization of the live database with a remote copy is managed by the ESS systems.
In ATG's tests, two LPARs running in a SYSPLEX environment were used instead of two physically separated database hosts. The live R/3 database was stored in one ESS cluster; the mirror and an additional copy were stored in the other ESS cluster (an overview of the ESS structure is given later in the article).
For the fast synchronization of the mirror, ATG used the synchronous Peer-to-Peer Remote Copy (PPRC) function. To adapt the PPRC commands to a particular environment, it is necessary to know the parameters they use and how to set their values, as shown in Table 1.
Table 1 PPRC Parameters
Parameter | Description
ser# | A unique serial number given to each ESS at the plant.
ssid | Subsystem ID of a logical subsystem (LSS).
lss ID | Two-digit number of a logical subsystem (LSS); the last two digits of the SSID.
linkaddr | Physical link to/from the storage system. Format: aaaa bb cc.
aaaa | Primary volume's 3990 cluster/interface (System Adapter ID, SAID).
bb | ESCON director "DESTINATION" address; 00 for a directly attached pair of DASD control units or for a static ESCON switch.
cc | Reserved address; always 00 for a 3990-06.
ccuu | Four-digit address of a device (logical volume) inside the ESS.
cca | Channel connection address of a logical volume (last two digits of ccuu).
serial | Name of a logical volume.
The logical volume addresses and names are defined by the customer's storage administrator, who provides those that fit the customer's naming conventions.
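To make the table concrete, the following hedged sketch shows how these parameters appear in the TSO commands that establish and query a single PPRC volume pair. The device number, SSIDs, ESS serial number (22399), channel connection addresses, and LSS IDs are invented placeholders, and the exact operand layout should be verified against the PPRC command documentation for the installed DFSMS level.

   /* Establish one PPRC volume pair; DEVN is the ccuu of the primary */
   /* volume, and PRIM/SEC combine the ssid, ser#, cca, and lss ID of */
   /* the primary and secondary volumes (see Table 1).                */
   CESTPAIR DEVN(X'8000') PRIM(X'2100' 22399 X'00' X'00') -
            SEC(X'2210' 22399 X'10' X'10') MODE(COPY) CRIT(NO)
   /* Report the status of the pair (DUPLEX once fully synchronized)  */
   CQUERY DEVN(X'8000') FORMAT

Once CQUERY reports the pair in duplex state, the two volumes are fully synchronized.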
ESS Structure and Connections
The ESS is divided into two symmetrically structured units called clusters. Each cluster can take over the workload of the other, if one is unavailable.
Host and PPRC Connections
The ESS is equipped with 16 host adapters (HAs) located in 4 HA bays. Each bay contains four HAs, and each HA has two ports. In ATG's OS/390 test environment, only ESCON connections were used for the host and PPRC connections.
Figure 21 shows the SAIDs of the ESCON ports of ATG's test ESS, viewed from the front. The highlighted ports are used for PPRC between the two ESS clusters. The primary logical partition (LPAR) on the host mainframe is connected to ESS cluster 1, and the secondary LPAR to ESS cluster 2, by eight ESCON connections each.
Figure 21 Overview of test ESS.
ATG chose this configuration to treat their test ESS as two separate storage systems. This is, however, not the recommended configuration.
In general, a host is connected to ports located in all four HA bays. This configuration increases availability: if one connection, an HA bay, or even a cluster is not available, the Common Parts Interconnect (CPI) bus still gives access to the data.
From the host or ESS storage system point of view, data is transferred over logical paths. The maximum number of logical paths per ESCON port is 64. To maximize PPRC I/O throughput, ATG used four ESCON connections (corresponding to 256 logical paths) between the two ESS clusters.
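Assuming the operand layout sketched after Table 1, the four PPRC links between the clusters might be defined with a single path command such as the following hedged sketch; the SSIDs, serial number, and link addresses are again invented placeholders, with each link address built from the SAID, ESCON destination address, and reserved byte listed in Table 1.

   /* Define the PPRC paths from the primary to the secondary LSS     */
   /* over four ESCON links (format of each link address: aaaa bb cc) */
   CESTPATH DEVN(X'8000') PRIM(X'2100' 22399) SEC(X'2210' 22399) -
            LINK(X'00A40000' X'00A80000' X'00AC0000' X'00B00000') -
            CGROUP(NO)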
Volume Layout and Definition of the Mirror
ATG's live database (consisting of the DB2 BSDS, LOGs, catalog, and directory; all R/3 pagesets, that is, table and index spaces; and the ICFCAT and user catalog) is spread across 48 logical volumes (3390-3). This set of volumes was named DBt2, as shown in Figure 22.
Figure 22 Volume layout.
Volume set DBt0 is the mirror of the live database, created by PPRC. Because PPRC copies tracks from one volume to another, the copy will differ from the original only in the device address. During the PPRC process that makes the DBt0 volumes accessible on the secondary side (RECOVER), ATG renames the volumes to their original names. In the end, the names and contents of the catalog datasets are unchanged and still reflect the volumes used by DBt2. Volume set DBt1 is a local FlashCopy (on the same cluster or storage system) of the mirror; it serves as the secondary runtime instance.
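One way to do the renaming is an ICKDSF REFORMAT job such as the hedged sketch below; the device address and the volume serials (DBT001 as the recovered name, SAP001 as the original name) are invented placeholders, and the exact operands should be checked against the ICKDSF documentation for the installed release.

   //* Give a recovered DBt0 volume back its original volume serial
   //* so that the catalog entries of the live database remain valid
   //RENAME   EXEC PGM=ICKDSF
   //SYSPRINT DD SYSOUT=*
   //SYSIN    DD *
     REFORMAT UNITADDRESS(9010) VERIFY(DBT001) VOLID(SAP001)
   /*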
To maximize I/O throughput, it is recommended to spread the volumes evenly across all available logical subsystems (LSSs). In ATG's environment, eight LSSs exist on each cluster (see Figure 21). Every LSS in cluster 1 contains six DBt2 volumes, and every LSS in cluster 2 contains six DBt0 and six DBt1 volumes.
Split Mirror Backup
The live system is in normal read/write operation. Only the LOG-volumes (DB2 LOGs, BSDS, ICFCAT, and user catalog) are in a constant synchronous PPRC connection. After the last resynchronization (BACKUP), all other PPRC volume pairs were suspended, and they are now accessible as normal simplex volumes, as shown in Figure 23.
Figure 23 Split mirror backup solution.
Split Mirror Backup Process
On the primary side, ATG suspends the LOG-volume pairs. This allows them to operate on the primary and secondary sides without any mutual influence. On the secondary side, the LOG-volumes are made simplex and are varied online together with all other DBt0 and DBt1 volumes (step 1).
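A hedged sketch of step 1 for a single LOG-volume pair follows; the device numbers, SSIDs, ESS serial number, and addresses are the same invented placeholders used above, and the exact PRIM/SEC operand layout should be verified against the PPRC command reference.

   /* Primary side: suspend the PPRC pair for one LOG volume          */
   CSUSPEND DEVN(X'8000') PRIM(X'2100' 22399 X'00' X'00') -
            SEC(X'2210' 22399 X'10' X'10')
   /* Secondary side: make the LOG volume a simplex volume again      */
   CRECOVER DEVN(X'9010') PRIM(X'2100' 22399 X'00' X'00') -
            SEC(X'2210' 22399 X'10' X'10')
   /* MVS operator command: bring the secondary volume online         */
   V 9010,ONLINE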
The next step is to FlashCopy the DBt0 volumes to the DBt1 volumes (step 2). This safety copy will help ATG recover from a disaster that may happen during the resynchronization of the mirror. A user or application logical error during the resynchronization would immediately make the mirror also inconsistent. The safety copy DBt1 will enable recovery of the database with a DB2 conditional restart on the secondary side, while the live system is available for error analysis.
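One way to drive this FlashCopy is a DFSMSdss full-volume copy, sketched here for a single volume pair; the volume serials DBT001 and DBT101 are invented placeholders, and on the ESS an eligible full-volume copy of this kind is carried out as a FlashCopy.

   //* Full-volume copy of one DBt0 volume to its DBt1 counterpart;
   //* COPYVOLID keeps the source volume serial on the target
   //FCOPY    EXEC PGM=ADRDSSU
   //SYSPRINT DD SYSOUT=*
   //SYSIN    DD *
     COPY FULL INDYNAM((DBT001)) OUTDYNAM((DBT101)) -
          COPYVOLID PURGE
   /*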
At this point in time, ATG is ready for the resynchronization of the mirror. All PPRC pairs are established again in RESYNC mode, which copies only those tracks updated during the period of suspension (step 3).
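For a single pair, the resynchronization might look like the following hedged sketch (same placeholder addresses as above):

   /* Re-establish the pair; MODE(RESYNC) copies only the tracks      */
   /* changed while the pair was suspended                            */
   CESTPAIR DEVN(X'8000') PRIM(X'2100' 22399 X'00' X'00') -
            SEC(X'2210' 22399 X'10' X'10') MODE(RESYNC)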
As soon as this process is 99% finished, ATG causes DB2 to suspend all writes (step 4) coming from the R/3 application. This DB2 function throttles down application writes, but no R/3 application process will recognize this; a process will slow down only until ATG causes DB2 to resume write operations.
Once the remaining 1% of updated tracks is also synchronized, ATG suspends all PPRC pairs (step 5). Immediately after the mirror is split, DB2 resumes all write operations (step 6).
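A hedged sketch of the critical sequence in steps 4 through 6 follows, assuming the write suspension is driven by DB2's SET LOG SUSPEND and SET LOG RESUME commands; -DB2T is an invented DB2 command prefix, the PPRC operands are the same placeholders used above, and the commands are shown simply in sequence, regardless of the interface from which each is issued.

   /* Step 4: suspend application writes (DB2 log write suspension)   */
   -DB2T SET LOG SUSPEND
   /* Step 5: split the mirror by suspending every PPRC volume pair   */
   CSUSPEND DEVN(X'8000') PRIM(X'2100' 22399 X'00' X'00') -
            SEC(X'2210' 22399 X'10' X'10')
   /* ...one CSUSPEND per remaining volume pair...                    */
   /* Step 6: resume application writes immediately after the split   */
   -DB2T SET LOG RESUME

Keeping the window between the suspend and resume commands short is what limits the impact on the R/3 application.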
NOTE
Only in the short period between step 4 and step 6 are application writes slowed down.
The Live System Is Now Back to Normal Read/Write Operation
On the secondary side, ATG now makes all DBt0 volumes simplex and varies them online together with all other DBt0 and DBt1 volumes (step 7). Because the DBt0 volumes are needed again as quickly as possible to reestablish the PPRC pairs for the LOG-volumes, ATG uses FlashCopy to copy all DBt0 volumes to DBt1 (step 8). As soon as the FlashCopy relationship is established (LOGICAL COMPLETE), FlashCopy allows read/write access although the tracks are not yet physically copied.
NOTE
Before a task can update a track on the source that has not yet been copied, FlashCopy copies the track to the target volume. Subsequent reads of this old track on the target volume are satisfied from the target volume. After some time, all tracks will have been copied to the target volume, and the FlashCopy relationship will end.
Immediately after the logical complete message, ATG again establishes the PPRC pairs in RESYNC mode (step 9) for the LOG volumes.
The Environment Is Now Back to Normal Processing
In ATG's test scenario, it is assumed that the standby system also has to be available as soon as possible. Therefore, after the first FlashCopy is physically completed, a second FlashCopy is started from DBt1 to DBt1' (step 10). ATG has to wait for the physical completion of the FlashCopy started in step 8 because a source and target volume can be involved in only one FlashCopy relationship at a time.
As soon as this second FlashCopy is logically completed, ATG restarts the standby system; after the copy is physically completed, the contents of the DBt1' volumes are moved to tape (step 11).
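Moving a DBt1' volume to tape can be done, for example, with a DFSMSdss full-volume dump; in the hedged sketch below, the volume serial DBT1P1 and the tape dataset name are invented placeholders.

   //* Full-volume dump of one DBt1' volume to tape
   //DUMP     EXEC PGM=ADRDSSU
   //TAPE     DD DSN=ATG.SPLITMIR.DBT1P1,UNIT=TAPE,DISP=(NEW,CATLG)
   //SYSPRINT DD SYSOUT=*
   //SYSIN    DD *
     DUMP FULL INDYNAM((DBT1P1)) OUTDDNAME(TAPE) ALLDATA(*) ALLEXCP
   /*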
Process Automation
The detailed description of the split mirror backup shows the complexity of this process. Of course, the process can be managed by system administrators, but to avoid additional points of failure, ATG has to concentrate on automating this process.
A single PPRC command is actually synchronous, but if a set of commands (for example, RECOVER for all volume pairs that contain the live database) is started under ICKDSF or TSO BATCH (IKJEFT01), ATG cannot be sure that all commands have finished successfully by the time the process regains control. The same is true for MVS commands such as VARY, which varies volumes ONLINE or OFFLINE.
ATG solved the latter problem with a WAIT loop that runs for a certain amount of time to ensure that the command has finished successfully. PPRC offers the QUERY command to get information about the progress of a command. This information is presented formatted or unformatted on an output screen, in the SYSLOG, or in a dataset.
From an operations point of view, this is not convenient because every QUERY call produces a large message output in the MVS system log. To find out, for example, whether all volume pairs are synchronized, a program that scans this output is needed.
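One way to make that output easier to scan, sketched here under the assumption that the query is run in batch TSO and that ATG.PPRC.QRYOUT is a pre-allocated (and invented) output dataset, is to direct SYSTSPRT to a dataset that the scanning program then examines for the pair status:

   //* Run the PPRC query under batch TSO and capture the output in
   //* a dataset for the scanning program
   //PPRCQRY  EXEC PGM=IKJEFT01
   //SYSTSPRT DD DSN=ATG.PPRC.QRYOUT,DISP=OLD
   //SYSTSIN  DD *
     CQUERY DEVN(X'8000') FORMAT
   /*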
Suspension of Application Writes
For a recovery, DB2 on OS/390 needs consistency points (checkpoints identified by the highest written RBA) that are established by the commands QUIESCE or ARCHIVE LOG, but these commands wait for units of work to commit. With SAP's recommended DB2 parameter settings, the commands will wait 15 seconds less than the resource timeout. This means that if there is continuous activity in the SAP R/3 System, the commands may never finish successfully.
Be aware that the SUSPEND log write command is not a DB2 checkpoint. This means that pages modified in DB2 buffers will not be flushed out to disk. But all log buffers will be written to disk, and, due to the log-write latch, all application writes will be suspended. After the successful execution of this command, all PGLOGRBAs in the table and index spaces are less than or equal to the RBA achieved by this command. The latter will enable a normal DB2 restart on the copied DBt1 volume set to get a consistent SAP R/3 database image.
Recovery
The backup process described earlier provides a physical dump of the live database as it existed at the moment of application write suspension. This dump is kept on tape and on the DBt1' volumes of the secondary storage system.
The contents of the DBt1 volumes differ from those of the DBt1' volumes in that, after the restart of the standby system, all modifications made by transactions that were open at the moment of write suspension are rolled back. The LOG volumes of the DBt0 set contain the contents of the live system's LOG volumes, and all other DBt0 volumes have the same contents as their corresponding DBt1' volumes. Immediately after application writes are resumed, the contents of all DBt0, DBt1, and DBt1' volumes (with the exception of the DBt0 LOG volumes) become outdated.
The kind of recovery scenario that a customer will apply depends on the customer's service-level agreements. The customer may decide to temporarily offer the users a read-only system (DBt1) and may try to find a reverse-engineering process that recovers a consistent live system.
If the error analysis or the reverse-engineering process takes too long, the customer may decide to go back to the last backup. In this case, the DBt1' image is copied back to DBt2, and DB2 is restarted. Because too much work may be lost in that scenario, the customer may instead decide to recover to a point in time closer to the disaster. In this case, the DBt0 image is copied back to DBt2, and the DB2 LOGs are applied. This DB2 conditional restart process can be very time-consuming.