IT Management Reference Guide

Jul 9, 2004

There are several methods available for recovering data that has been altered, deleted, damaged, or otherwise made inaccessible. The recovery technique used depends on how the data was backed up. Table 1 lists four common types of data backups. The first three are referred to as physical backups because operating system software or specialized program products copy the data as it physically resides on the disk, without regard to database structures or logical organization. The fourth is called a logical backup because database management software reads (that is, backs up) logical parts of the database, such as tables, schemas, data dictionaries, or indexes, and then writes the output to binary files. This may be done for the full database, for individual users, or for specific tables.

Physical offline backups require that all online systems, applications, and databases residing on the volume being backed up be shut down before the backup process starts. Full volume backups of several high-capacity disk drives may take many hours to complete and are normally done on weekends, when systems can be shut down for long periods. Incremental backups also require systems and databases to be shut down, but for much shorter periods. Because only the data that has changed since the last backup is copied, nightly incremental backups can usually be completed within a few hours.
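
The practical difference between a full and an incremental backup is which files get selected for copying. The following Python sketch models that selection under a simplifying assumption: a file's last-modified timestamp reliably indicates that it changed. The FileEntry type, the function names, and the sample paths are hypothetical; real backup software may instead rely on archive bits, change journals, or block-level change tracking.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileEntry:
    # Hypothetical catalog entry: a path on the volume and its last-modified time.
    path: str
    modified: datetime


def select_full(files):
    """A full backup copies every file on the volume."""
    return [f.path for f in files]


def select_incremental(files, last_backup):
    """An incremental backup copies only files changed since the last backup ran."""
    return [f.path for f in files if f.modified > last_backup]


volume = [
    FileEntry("/data/payroll.dat", datetime(2004, 7, 5, 22, 0)),
    FileEntry("/data/orders.dat", datetime(2004, 7, 8, 14, 30)),
    FileEntry("/data/archive.dat", datetime(2004, 6, 1, 9, 0)),
]
last_nightly = datetime(2004, 7, 7, 23, 0)

print(select_full(volume))                       # all three paths
print(select_incremental(volume, last_nightly))  # ['/data/orders.dat']
```

Because the incremental selection grows with the amount of changed data rather than with the size of the volume, the nightly run stays short even on large volumes.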

Table 1 Types of Data Backups

Type of Backup	Alternate Names
1. Physical full backup	Cold backup, full volume backup, full offline backup
2. Physical incremental backup	Incremental backup, incremental offline backup
3. Physical online backup	Online backup, hot backup, archive backup
4. Logical backup	Exporting files, exporting files into binary files

A physical online backup is a powerful technique that offers two distinct benefits:

Databases can remain open to users during the backup process.
Data can be recovered up to the last transaction processed.

The database environment must be running in archive mode for online backups to work properly. This means that each log file, once it is completely filled and before it is written over, is first copied to an archive file. During an online backup, table files are put into a backup state one at a time so that the operating system can back up the data in each one. Any changes made during the backup are temporarily held in the log files, and each table file is returned to its normal state after it has been backed up.
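
The archive-mode rule, that a filled log file is copied to an archive before it is written over, can be illustrated with a toy model of a small ring of log files. This is a hypothetical Python sketch, not any database product's logging code; CircularLog, its parameters, and the record strings are invented for illustration.

```python
class CircularLog:
    """Hypothetical model of online log files that are reused in rotation."""

    def __init__(self, files=3, capacity=2, archive_mode=True):
        self.files = [[] for _ in range(files)]  # the online log files
        self.capacity = capacity                 # records each log file can hold
        self.current = 0                         # index of the log file being written
        self.archive_mode = archive_mode
        self.archive = []                        # archived copies, oldest first

    def write(self, record):
        if len(self.files[self.current]) == self.capacity:
            self._switch()
        self.files[self.current].append(record)

    def _switch(self):
        if self.archive_mode:
            # Copy the fully filled log file to the archive; in this model the copy
            # is made immediately, so it always exists before the file is reused.
            self.archive.append(list(self.files[self.current]))
        # Move to the next log file in the ring and write over its old contents.
        self.current = (self.current + 1) % len(self.files)
        self.files[self.current] = []


for mode in (True, False):
    log = CircularLog(files=2, capacity=2, archive_mode=mode)
    for n in range(1, 8):
        log.write(f"change {n}")
    archived = [r for f in log.archive for r in f]
    online = [r for f in log.files for r in f]
    print(mode, len(set(archived + online)))  # True 7, then False 3
```

Without archive mode, changes 1 through 4 are simply overwritten, which is why rolling forward to the last transaction is only possible when archiving is enabled.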

Full recovery is accomplished by restoring the last full backup and the incremental backups taken since then, and then performing a forward recovery using the archive and log tapes. In Oracle databases, the log files are called redo logs; when they fill, they are copied to archive files before being written over, so that logging can continue. Sybase, IBM's DB2, and Microsoft's SQL Server have similar logging mechanisms based on checkpoints and transaction logs. Log files can also be shipped or transported to other locations to aid in disaster recovery.
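
The recovery order described above (last full backup, then incrementals, then forward recovery from the logs) can be sketched as follows. This is a simplified, hypothetical Python model: the database is a dictionary, deletions are not modeled, log records are assumed to carry increasing sequence numbers and to begin where the last incremental left off, and LogRecord, recover, and stop_at are illustrative names rather than any product's interface.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    # Hypothetical log record: a sequence number and the change it made.
    seq: int
    key: str
    value: str


def recover(full_backup, incrementals, log_records, stop_at=None):
    """Rebuild the database: full backup, then incrementals, then roll forward.

    full_backup and each incremental are dicts of key -> value; each incremental
    holds only what changed since the previous backup. stop_at is the last log
    sequence number to replay; None replays up to the last logged transaction.
    """
    db = dict(full_backup)                 # 1. restore the last full backup
    for inc in incrementals:               # 2. apply incrementals, oldest first
        db.update(inc)
    for rec in sorted(log_records, key=lambda r: r.seq):  # 3. forward recovery
        if stop_at is not None and rec.seq > stop_at:
            break
        db[rec.key] = rec.value
    return db


full = {"acct1": "100", "acct2": "250"}
incs = [{"acct1": "120"}, {"acct3": "75"}]
logs = [LogRecord(101, "acct2", "300"), LogRecord(102, "acct1", "90")]

print(recover(full, incs, logs))               # {'acct1': '90', 'acct2': '300', 'acct3': '75'}
print(recover(full, incs, logs, stop_at=101))  # {'acct1': '120', 'acct2': '300', 'acct3': '75'}
```

Without the log replay step, recovery stops at the last incremental backup; the logs are what carry the database forward to the last transaction.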

Replication is another form of backup, in which highly critical data is copied in close to real time to a remote location. Replication intervals can vary from just a few minutes to several hours. I assisted three recent clients in implementing replication schemes that were similar in concept but different in application. One replicated its critical data between Los Angeles and Las Vegas every thirty minutes. Another replicated its data every twenty minutes from coast to coast. The third replicated its crucial data every fifteen minutes between Southern California and Denver. The point is that replication schemes vary with a company's requirements and the costs it is willing to incur.
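
The replication interval largely determines how much recent data is at risk if the primary site is lost. Assuming periodic rather than continuous replication, the worst case is a failure just before a cycle finishes, which leaves the remote site with the previous completed copy: up to one full interval plus the transfer time is lost. The short Python sketch below applies that reasoning to the three intervals mentioned above; the function name and the transfer_minutes parameter are hypothetical.

```python
def worst_case_loss_minutes(interval_minutes, transfer_minutes=0):
    """Upper bound on minutes of updates lost with periodic replication.

    A failure just before a replication cycle completes leaves the remote site
    with the copy from the previous completed cycle, so up to one full interval
    plus the transfer time can be lost.
    """
    if interval_minutes <= 0 or transfer_minutes < 0:
        raise ValueError("interval must be positive and transfer time non-negative")
    return interval_minutes + transfer_minutes


schemes = {
    "Los Angeles to Las Vegas": 30,
    "Coast to coast": 20,
    "Southern California to Denver": 15,
}
for route, interval in schemes.items():
    print(f"{route}: up to {worst_case_loss_minutes(interval)} minutes of updates at risk")
```

Shortening the interval reduces this exposure but increases network traffic and cost, which is the trade-off each of the three companies made differently.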

Logical backups are less complicated, but more time consuming, to perform than physical backups. There are three advantages to performing logical backups in concert with physical backups:

Exports can be made online, enabling 24/7 applications and databases to remain operational during the copying process.
Small portions of a database can be exported and imported, enabling maintenance to be performed efficiently on only the data required.
Exported data can be imported into databases or schemas at a higher version level than the original database, allowing for testing at new software levels.
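
A logical export reads a table through the database engine and writes its definition and rows to a binary file, which can later be imported into a different database. The Python sketch below illustrates exporting and importing a single table; it is not Oracle's export/import utilities or any vendor's tool, but uses Python's built-in sqlite3 module with pickle as a stand-in file format. The function names, the orders and customers tables, and the orders.exp file name are hypothetical, and pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import sqlite3
import tempfile


def export_table(conn, table, path):
    """Logical export: read one table's definition and rows through SQL and
    write them to a binary file, independent of the on-disk layout."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no such table: {table}")
    rows = conn.execute(f'SELECT * FROM "{table}"').fetchall()
    with open(path, "wb") as f:
        pickle.dump({"table": table, "ddl": row[0], "rows": rows}, f)
    return len(rows)


def import_table(conn, path):
    """Recreate the exported table in another database and load its rows."""
    with open(path, "rb") as f:
        dump = pickle.load(f)
    table = dump["table"]
    conn.execute(dump["ddl"])
    if dump["rows"]:
        marks = ", ".join("?" * len(dump["rows"][0]))
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', dump["rows"])
    conn.commit()
    return len(dump["rows"])


# Export only the orders table from the source and import it into a new database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
src.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 19.95), (2, 5.00)])
src.commit()

with tempfile.TemporaryDirectory() as tmp:
    dump_path = os.path.join(tmp, "orders.exp")
    exported = export_table(src, "orders", dump_path)
    dst = sqlite3.connect(":memory:")
    imported = import_table(dst, dump_path)

print(exported, imported)                                          # 2 2
print(dst.execute("SELECT * FROM orders ORDER BY id").fetchall())  # [(1, 19.95), (2, 5.0)]
```

Because the export works at the level of tables and rows, the target database can be a different instance, or in a real product a newer software release, without any dependence on how the source stored its blocks on disk.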

Another approach to safeguarding data that is becoming more prevalent today is the disk-to-disk backup. As critical databases continue to grow and allowable backup windows continue to shrink, the advantages of this approach are rapidly helping to justify its obvious costs. The first advantage is a significant reduction in backup and recovery time, because copying directly to disk, and restoring from it, is typically much faster than working with tape. This benefit also applies to online backups: although they allow databases to remain open and accessible during backup processing, they still incur a performance hit, and this method noticeably reduces it.

Another advantage of disk-to-disk backups is that the stored copy can be used for other purposes, such as testing or report generation, which could impact database performance if done against the original data. Finally, this approach can actually improve the economics of tape backups. Copying the stored disk copy to tape can be scheduled at any time, provided the copy ends before the next disk backup begins. This flexibility may even reduce the investment needed in tape equipment, which can offset the cost of the additional disks.
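
The scheduling rule for copying the disk backup to tape is simple enough to state as a check. In this hypothetical Python sketch, the function names and the sample schedule are invented, and the tape copy's duration is assumed to be known in advance.

```python
from datetime import datetime, timedelta


def latest_tape_copy_start(next_disk_backup_start, copy_duration):
    """Latest time the disk-to-tape copy can start and still finish in time."""
    return next_disk_backup_start - copy_duration


def tape_copy_fits(disk_backup_end, next_disk_backup_start, copy_start, copy_duration):
    """The copy may start any time after the disk backup completes, but it must
    end before the next disk backup begins (assumed here to overwrite the
    staged disk copy)."""
    copy_end = copy_start + copy_duration
    return disk_backup_end <= copy_start and copy_end <= next_disk_backup_start


# Hypothetical schedule: nightly disk backup 01:00-02:00, tape copy takes 6 hours.
disk_end = datetime(2004, 7, 9, 2, 0)
next_disk_start = datetime(2004, 7, 10, 1, 0)
copy_time = timedelta(hours=6)

print(latest_tape_copy_start(next_disk_start, copy_time))  # 2004-07-09 19:00:00
print(tape_copy_fits(disk_end, next_disk_start, datetime(2004, 7, 9, 9, 0), copy_time))   # True
print(tape_copy_fits(disk_end, next_disk_start, datetime(2004, 7, 9, 20, 0), copy_time))  # False
```

In this example the tape copy has a seventeen-hour window in which to start, which is the scheduling freedom that lets tape drives be shared or reduced in number.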

A thorough understanding of the requirements and the capabilities of data backups, restores, and recovery is necessary for implementing a robust storage management process. Several other backup considerations need to be kept in mind when designing such a process, and these are listed in Table 2.

Table 2 Data Backup Considerations

1. Backup window
2. Restore times
3. Expiration dates
4. Retention periods
5. Recycle periods
6. Generation data groups
7. Offsite retrieval times
8. Tape density
9. Tape format
10. Tape packaging
11. Shelf life
12. Automation techniques

There are three key questions that need to be answered at the outset:

How much nightly backup window is available?
How long will it take to perform nightly backups?
Back to what point in time should recovery be made?

If the time needed to back up all the required data each night exceeds the offline backup window, then some form of online backup will be necessary. The recovery method, in turn, depends on whether data is to be restored to the last incremental backup or to the last completed transaction.
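
The first two questions can be combined into a rough feasibility check: estimate the nightly backup time from the data volume and sustained throughput, then compare it with the offline window. The Python sketch below assumes a constant throughput and ignores overheads such as tape mounts and verification; the function names and figures are hypothetical.

```python
def backup_hours(data_gb, throughput_gb_per_hour):
    """Estimated elapsed hours to copy data_gb at a sustained throughput."""
    return data_gb / throughput_gb_per_hour


def choose_backup_method(data_gb, throughput_gb_per_hour, offline_window_hours):
    """Return ("offline" or "online", estimated hours).

    If the nightly backup cannot finish inside the offline window, some form
    of online backup is needed.
    """
    needed = backup_hours(data_gb, throughput_gb_per_hour)
    method = "offline" if needed <= offline_window_hours else "online"
    return method, needed


print(choose_backup_method(600, 200, 4))   # ('offline', 3.0)
print(choose_backup_method(1200, 200, 4))  # ('online', 6.0)
```

Doubling the data volume in this example pushes the backup past a four-hour window, which is how growing databases tend to force the move to online backups.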

Expiration dates, retention periods, and recycle periods are related issues that concern how long data is intended to remain in existence. Weekly and monthly application jobs may create temporary data files that are designed to expire one week or one month, respectively, after the data is generated. Other files may need to be retained for several years for auditing purposes or to meet government regulations. Backup files on tape also fall into these categories. Expiration dates and retention periods are specified in the job control language (JCL) that defines how these files are created. Recycle periods relate to the elapsed time before backup tapes are reused.
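
Expiration and reuse dates follow from simple date arithmetic. The Python sketch below assumes that both the retention period and the recycle period are counted from the date the file or tape was written, and that a tape is never reused before its data expires; these rules, the function names, and the sample dates are assumptions for illustration, not the behavior of any particular tape management system.

```python
from datetime import date, timedelta


def expiration_date(created, retention_days):
    """A file or backup tape expires once its retention period has elapsed."""
    return created + timedelta(days=retention_days)


def reusable_on(created, retention_days, recycle_days):
    """Earliest date a backup tape may be reused.

    Assumption: the recycle period is counted from the date the tape was
    written, and a tape is never reused before its data has expired.
    """
    return max(expiration_date(created, retention_days),
               created + timedelta(days=recycle_days))


written = date(2004, 7, 9)
print(expiration_date(written, 7))   # 2004-07-16 (weekly file)
print(reusable_on(written, 30, 35))  # 2004-08-13 (monthly backup tape)
```

Taking the later of the two dates keeps a short recycle period from ever overriding a longer retention requirement.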

A generation data group (GDG) is a mainframe mechanism for creating successive versions, or generations, of a data file, much like the copies produced by backup jobs. Its advantage is the ability to restore back to a specific day with simple parameter changes to the job control language. Offsite retrieval time is the maximum time, set by contract, that the offsite tape storage provider may take to physically bring tapes to the data center after being notified.
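
The way a GDG lets a job restore a specific day can be modeled with relative generation numbers, where (0) is the newest generation, (-1) the one before it, and so on, and each generation has an absolute name ending in GnnnnV00. The following Python sketch is a toy model rather than the system catalog's actual logic: it ignores version numbers other than V00 and wraparound after generation 9999, and the class name and the BACKUP.PAYROLL data set name are hypothetical.

```python
class GenerationDataGroup:
    """Toy model of a GDG: a base name plus a rolling set of generations."""

    def __init__(self, base, limit):
        self.base = base
        self.limit = limit       # oldest generations roll off beyond this count
        self.generations = []    # absolute generation numbers, oldest first
        self.next_gen = 1

    def new_generation(self):
        """Catalog a new generation, as a nightly backup job writing (+1) would."""
        self.generations.append(self.next_gen)
        self.next_gen += 1
        if len(self.generations) > self.limit:
            self.generations.pop(0)  # roll off the oldest generation
        return self.resolve(0)

    def resolve(self, relative):
        """Translate a relative reference such as 0 or -3 into an absolute name."""
        if relative > 0 or -relative >= len(self.generations):
            raise LookupError(f"generation ({relative}) does not exist")
        gen = self.generations[relative - 1]
        return f"{self.base}.G{gen:04d}V00"


gdg = GenerationDataGroup("BACKUP.PAYROLL", limit=7)
for night in range(10):
    gdg.new_generation()

print(gdg.resolve(0))   # BACKUP.PAYROLL.G0010V00 (last night's backup)
print(gdg.resolve(-3))  # BACKUP.PAYROLL.G0007V00 (three nights earlier)
```

In JCL, restoring from three nights earlier amounts to changing a reference such as DSN=BACKUP.PAYROLL(0) to DSN=BACKUP.PAYROLL(-3); the job itself stays the same.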

Tape density, format, and packaging are characteristics that may change over time and consequently change recovery procedures. Density refers to how tightly bits are packed on the tape; it increases as technology advances and equipment is upgraded. Format refers to the number and configuration of tracks on the tape. Packaging refers to the size and shape of the enclosures used to house the tapes.

The shelf life of magnetic tape is sometimes overlooked and can become problematic for tapes with retention periods exceeding five or six years. Temperature, humidity, handling, frequent changes in the environment, the quality of the tape, and other factors can influence the actual shelf life of any given tape, but five years is a good rule of thumb to use for recopying long-retained tapes.
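
The five-year rule of thumb translates directly into a periodic check of the tape inventory. In this Python sketch the inventory format (volume serial plus date written), the function name, and the sample volumes are hypothetical, and the age threshold is a parameter so a shop can apply its own figure.

```python
from datetime import date


def tapes_due_for_recopy(tapes, today, max_age_years=5):
    """Return volume serials of tapes at or beyond the rule-of-thumb age."""
    due = []
    for volser, written in tapes:
        try:
            cutoff = written.replace(year=written.year + max_age_years)
        except ValueError:  # written on Feb 29 and the cutoff year is not a leap year
            cutoff = written.replace(year=written.year + max_age_years, day=28)
        if today >= cutoff:
            due.append(volser)
    return due


inventory = [
    ("T00101", date(1998, 3, 15)),
    ("T00245", date(1999, 7, 9)),
    ("T00390", date(2002, 1, 20)),
]
print(tapes_due_for_recopy(inventory, today=date(2004, 7, 9)))  # ['T00101', 'T00245']
```

Running such a check on a regular schedule turns recopying from an easily overlooked task into a routine item on the operations calendar.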

Mechanical tape loaders, automated tape library systems, and movable tape rack systems can all add a degree of labor-saving automation to the storage management process. As with any process automation, thorough planning and process streamlining need to precede the implementation of the automation.

This concludes the four-part series on storage management, which covered storage capacity, performance, reliability, and recoverability. Other sections of this Management Guide that relate to storage management include those on Capacity Planning and Improving High Availability.
