Steps to Developing an Effective Disaster-Recovery Process
The following 10 steps are required to develop an effective disaster-recovery process. The steps should be executed serially and in the order prescribed.
Step 1: Acquire Executive Support
The acquisition of executive support, particularly in the form of an executive sponsor, is the first step necessary for developing a truly robust disaster-recovery process. As mentioned earlier, there are many resources required to design and maintain an effective program, and these all need funding approval from senior management to initiate the effort and to see it through to completion.
Another reason that this support is important is that managers are typically the first to be notified when a disaster actually occurs. This sets off a chain of events involving management decisions about deploying the IT recovery team, declaring an emergency to the disaster-recovery service provider, notifying facilities and physical security, and taking whatever emergency preparedness actions may be necessary. By involving management early in the design process, and by securing their emotional as well as financial buy-in, you increase the likelihood of management understanding and flawlessly executing its role when a calamity occurs.
A disaster-recovery executive sponsor has several other responsibilities. One is selecting a process owner. Another is securing support from the managers of the cross-functional team members, to ensure that participants are properly chosen and committed to the program. These other managers may be direct reports, peers within IT, or, in the case of facilities, outside IT. Finally, the executive sponsor needs to demonstrate ongoing support by requesting and reviewing frequent progress reports, offering suggestions for improvement, questioning unclear elements of the plan, and resolving conflicts.
Step 2: Select a Process Owner
The process owner for disaster recovery is the most important individual involved with this process because of the many key roles this person plays. The process owner must assemble and lead the cross-functional team in such diverse activities as preparing the business impact analysis, identifying and prioritizing requirements, developing business-continuity strategies, selecting an outside service provider, and conducting realistic tests of the process. This person should exhibit several key attributes and be selected very carefully. Potential candidates include an operations supervisor, the data center manager, or even the infrastructure manager.
Step 3: Assemble a Cross-Functional Team
Representatives of appropriate departments from several areas inside and outside IT should be assembled into a cross-functional design team. Departments typically represented on this team include computer operations, applications development, server and systems administration, facilities, key customer departments, data security, physical security, and network operations. This team will work on requirements, conduct a business impact analysis, select an outside service provider, design the final overall recovery process, identify members of the recovery team, conduct tests of the recovery process, and document the plan.
Step 4: Conduct a Business Impact Analysis
Even the most thorough disaster-recovery plan cannot justify the expense of including every business process and application in the recovery. The inventory and prioritization of critical business processes should cover the entire company. Key IT customers should help coordinate this effort with the process owner to ensure that all critical processes are included. Processes that need to be resumed within 24 hours to prevent serious business impact, such as loss of revenue or major impact to customers, are rated as an A priority. Processes that need to be resumed within 72 hours are rated B, and those that can wait longer than 72 hours are rated C. These identifications and priorities will be used to propose business-continuity strategies.
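The A/B/C rating can be expressed as a small classification rule. The following is a minimal sketch, assuming each process is recorded with the longest outage it can tolerate; the `BusinessProcess` class, the `max_outage_hours` field, and the sample inventory are hypothetical and not part of the original process description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessProcess:
    name: str                 # hypothetical process name
    max_outage_hours: float   # longest outage tolerable before serious business impact


def recovery_priority(max_outage_hours: float) -> str:
    """Map a tolerable outage to the A/B/C rating described in the text."""
    if max_outage_hours <= 24:
        return "A"   # must be resumed within 24 hours
    if max_outage_hours <= 72:
        return "B"   # must be resumed within 72 hours
    return "C"       # can wait longer than 72 hours


def prioritize(processes):
    """Return (priority, process) pairs, most urgent first."""
    rated = [(recovery_priority(p.max_outage_hours), p) for p in processes]
    return sorted(rated, key=lambda pair: (pair[0], pair[1].max_outage_hours))


# Illustrative inventory; the names and hours are invented for this example.
inventory = [
    BusinessProcess("monthly reporting", 120),
    BusinessProcess("order entry", 8),
    BusinessProcess("payroll", 48),
]

for priority, process in prioritize(inventory):
    print(priority, process.name)
```

Sorting by rating and then by tolerable outage gives the cross-functional team an ordered list to work from when it proposes business-continuity strategies.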
Step 5: Identify and Prioritize Requirements
One of the first activities of the cross-functional team is to brainstorm the requirements for the process, which fall into business, technical, and logistical categories. Business requirements include defining the specific criteria for declaring a disaster and determining which processes are to be recovered and in what timeframes. Technical requirements include which types of platforms will be eligible as recovery devices for servers, disk, and desktops, and how much bandwidth will be needed. Logistical requirements include the amount of time allowed to declare a disaster and transportation arrangements at both the disaster site and the recovery site.
Step 6: Assess Possible Business-Continuity Strategies
Based on the business impact analysis and the list of prioritized requirements, the cross-functional team should propose and assess several alternative business-continuity strategies. These will likely include alternative remote sites within the company and geographic hot sites supplied by an outside provider.
Step 7: Choose Participants and Clarify Their Roles for the Recovery Team
The cross-functional team chooses the individuals who will participate in the recovery activities after any declared disaster. The recovery team may be similar to the cross-functional team but should not be identical. Additional members should include representatives from an outside service provider (if used), key customer representatives based on the prioritized business-impact analysis, and the executive sponsor. Once the recovery team is selected, it's imperative that each individual's role and responsibility be clearly defined, documented, and communicated.
Step 8: Document the Disaster-Recovery Plan
The last official activity of the cross-functional team is to document the disaster-recovery plan for use by the recovery team, which will then have responsibility for maintaining its accuracy, accessibility, and distribution. Documentation of the plan must also include up-to-date configuration diagrams of the hardware, software, and network components involved in the recovery.
Step 9: Plan and Execute Regularly Scheduled Tests of the Plan
Disaster-recovery plans should be tested a minimum of once per year. Infrastructures with world-class disaster-recovery programs test at least twice per year, and progressive companies test three or four times annually. During each test, a checklist should be maintained to record the disposition and duration of every task performed, for later comparison with those of the planned tasks. When first starting out, particularly in complex environments, consider developing a test plan that spans up to three years. Every six months the tests can become progressively more involved, starting with program and data restores, followed by processing loads and print tests, then initial network-connectivity tests, and eventually full network and desktop load and functionality tests.
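The comparison of recorded tasks against planned tasks can be sketched in a few lines. This is an illustration only: the `TaskResult` fields, the disposition values ("completed", "failed"), and the 10 percent overrun tolerance are assumptions made for the example, not values prescribed by the process.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task: str
    planned_minutes: int
    actual_minutes: int
    disposition: str  # assumed values: "completed", "failed", "skipped"


def review_checklist(results, tolerance=0.10):
    """Return (task, finding) pairs for tasks that did not go as planned.

    A task is flagged if it was not completed, or if it ran longer than
    planned by more than the given fraction (an assumed threshold).
    """
    findings = []
    for r in results:
        overrun = r.actual_minutes - r.planned_minutes
        if r.disposition != "completed":
            findings.append((r.task, f"disposition: {r.disposition}"))
        elif overrun > r.planned_minutes * tolerance:
            findings.append((r.task, f"overran plan by {overrun} min"))
    return findings


# Illustrative checklist entries; the tasks and timings are invented.
results = [
    TaskResult("restore data", 60, 90, "completed"),
    TaskResult("print test", 30, 31, "completed"),
    TaskResult("network connectivity", 45, 20, "failed"),
]

for task, finding in review_checklist(results):
    print(f"{task}: {finding}")
```

The flagged items give the team a concrete list of tasks that failed or took longer than planned, which can be reviewed once the test is over.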
Step 10: Conduct a Lessons-Learned Postmortem After Each Test
The intent of the lessons-learned postmortem is to review exactly how the test was executed as well as to identify what went well, what needs to be improved, and what enhancements or efficiencies could be added to improve future tests.