- (d) Case Study: Lessons Learned from a World-Wide Disaster Recovery Exercise, Part Two
This is the final installment of the four-part case study on conducting an operational recovery exercise. In part one I discussed preparing for and conducting the meeting with the business unit sponsor. In part two I described the weekly planning meetings and the structured walk-through of the exercise, and in part three I shared the compiled results of the exercise. In this part I show how the recovery exercise team captured, analyzed and followed up on several lessons learned.
Lessons Learned – What We Did Well
The recovery team conducted a lessons-learned session within one week of the exercise. I facilitated this meeting in the manner described in the earlier section: a round-robin method to solicit input on what we did well, followed by a nominal group technique to prioritize the responses. I asked each participant to rank their top seven responses. Responses ranked first received seven points, those ranked second received six points, and so on down to one point for a seventh-place ranking. After compiling the feedback, I wrote and distributed a brief analysis of the results. Table 1 displays this information. The total points received by each response determined its priority, as shown in the second column of the table, and the far-right column shows the distribution of the points each response received.
Table 1 Prioritized Actions Done Well
What Did We Do Well?

| # | Pts | Response | Distribution |
|---|---|---|---|
| 1 | 45 | Exercise preparation uncovered production problems. | 7,7,7,6,5,5,5,2,1 |
| 2 | 34 | Good communication prior to exercise. | 7,7,7,6,4,3 |
| 3 | 32 | Identified what needs to be fixed. | 7,6,5,4,4,4,1,1 |
| T4 | 27 | Set reasonable expectations prior to the exercise. | 6,6,5,4,2,2,2 |
| T4 | 27 | Achieved the goals of the exercise. | 7,7,6,4,2,1 |
| T4 | 27 | Good communication during the exercise. | 7,6,5,5,4 |
| 7 | 26 | Good identification of issues prior to exercise. | 7,5,5,4,3,2 |
| 8 | 23 | Good participation from cross-functional teams. | 6,5,5,4,2,1 |
| 9 | 21 | Identified the right owners to the right pieces. | 6,5,5,3,1,1 |
| T10 | 19 | Pre-planning and assumptions done well. | 6,3,3,3,2,1,1 |
| T10 | 19 | Good response to problems. | 7,4,4,2,2 |
| 12 | 17 | Agendas, action items and daily meetings were fruitful. | 6,6,2,2,1 |
| 13 | 14 | It was correct to make it a functional test. | 7,3,3,1 |
| 14 | 10 | Involved the right customers. | 6,3,1 |
| 15 | 9 | Daily and weekly meetings complemented each other. | 3,3,3 |
| 16 | 5 | Video communication was a plus. | 4,1 |
| 17 | 4 | Good leadership was displayed. | 4 |
| 18 | 3 | Geographic logistics added realism to the test. | 3 |
| 19 | 2 | Made good use of the bridge line. | 2 |
| 20 | 0 | Pulled servicing in early to get a feel for the process. | |
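The scoring behind Table 1 can be reproduced with a few lines of code. The following is a minimal sketch, not the tool the team actually used: the function names `points_for_rank` and `prioritize` are hypothetical, and the tie-handling rule (tied responses share a place and the following places are skipped) is an assumption inferred from the T4, T4, T4, 7 sequence in the table. The sample data is the first seven rows of Table 1.

```python
from collections import Counter

TOP_N = 7  # each participant ranks their top seven responses


def points_for_rank(rank: int, top_n: int = TOP_N) -> int:
    """First place earns top_n points, second earns top_n - 1, and so on."""
    if not 1 <= rank <= top_n:
        raise ValueError(f"rank must be between 1 and {top_n}")
    return top_n + 1 - rank


def prioritize(distributions: dict[str, list[int]]) -> list[tuple[str, int, str]]:
    """Return (place label, total points, response) rows, highest total first.

    Assumption: tied responses share a place and are labeled with a 'T'
    prefix, and the places after a tie are skipped, as the tables suggest.
    """
    totals = {response: sum(points) for response, points in distributions.items()}
    tie_counts = Counter(totals.values())
    # sorted() is stable, so tied responses keep their input order.
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    rows = []
    for response, total in ordered:
        # Place = 1 + number of responses with a strictly higher total.
        place = 1 + sum(1 for other in totals.values() if other > total)
        label = f"T{place}" if tie_counts[total] > 1 else str(place)
        rows.append((label, total, response))
    return rows


# Point distributions copied from the first seven rows of Table 1.
table_1_sample = {
    "Exercise preparation uncovered production problems.": [7, 7, 7, 6, 5, 5, 5, 2, 1],
    "Good communication prior to exercise.": [7, 7, 7, 6, 4, 3],
    "Identified what needs to be fixed.": [7, 6, 5, 4, 4, 4, 1, 1],
    "Set reasonable expectations prior to the exercise.": [6, 6, 5, 4, 2, 2, 2],
    "Achieved the goals of the exercise.": [7, 7, 6, 4, 2, 1],
    "Good communication during the exercise.": [7, 6, 5, 5, 4],
    "Good identification of issues prior to exercise.": [7, 5, 5, 4, 3, 2],
}

ranked = prioritize(table_1_sample)
for label, total, response in ranked:
    print(f"{label:>3}  {total:>3}  {response}")
```

Because the distribution column records the points each participant awarded rather than the raw ranks, summing a row's distribution gives its Pts value directly. The same function can be applied to the improvement suggestions gathered later in the session.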
Analysis
The fact that preparation for this exercise uncovered previously unknown production problems was by far the highest-rated response in this category, with 45 points. The next two highest-ranked entries, with 34 and 32 points, involved actions prior to the exercise (good communication) and during it (identifying what needs to be fixed). The next three responses tied for fourth place with 27 points each and involved activities prior to (setting reasonable expectations), during (good communication) and after (achieving the goals) the exercise. The next response ranked just below these three with 26 points and involved good identification of issues beforehand.
The next four responses were grouped closely and rounded out the top ten with 23 points, 21 points and a two-way tie at 19 points. These entries involved favorable impressions of participation, planning and problem-solving. The remaining responses touched on such items as meeting management, video teleconferencing and geographic logistics.
Lessons Learned – What We Could Do Better
In a manner similar to collecting input on 'What We Did Well', I facilitated gathering feedback on 'What Could We Do Better?', shown in Table 2. This information was extremely important in learning how to improve future exercises, and led to several follow-up action items, shown in Table 3.
This concludes the case study on an actual operational disaster recovery exercise. The effort was successful in that the team restored critical software applications at a designated recovery site within reasonable time-frames, and that business users were able to verify the functional operation of the software. A number of improvement suggestions came out of the exercise, and these will be put into practice during upcoming months. Another similar exercise will be conducted in about six months.
Table 2 Prioritized Improvement Suggestions
What Could We Do Better?

| # | Pts | Response | Distribution |
|---|---|---|---|
| 1 | 49 | Extend time between system build out and exercise. | 7,7,7,6,6,6,5,3,1,1 |
| T2 | 45 | Provide more time for application build out. | 7,7,7,6,6,6,5,1 |
| T2 | 45 | Evaluate timelines/scope for better schedule estimates. | 7,7,5,5,4,4,4,3,3,2,1 |
| 4 | 34 | Validate and pre-test systems prior to the exercise. | 7,6,6,6,5,3,1 |
| 5 | 30 | Improve assignment of roles. | 7,6,6,5,4,2 |
| 6 | 24 | Provide more clearly defined handoffs. | 5,5,4,3,3,2,2 |
| 7 | 20 | Limit those who want to act as leaders during exercise. | 6,4,3,2,2,2,1 |
| 8 | 19 | Build out applications sequentially, not all at once. | 7,5,3,3,1 |
| T9 | 18 | Increase participation by Enterprise Architecture. | 7,5,4,2 |
| T9 | 18 | Provide more time to plan/execute exercise properly. | 6,5,4,3 |
| 11 | 16 | Improve Friday meetings. | 5,5,3,2,1 |
| 12 | 13 | Ensure attendees prepare better for Friday meetings. | 4,4,2,2,1 |
| 13 | 12 | Test host files more thoroughly prior to exercise. | 4,4,3,1 |
| 14 | 10 | Increase attendance at weekly planning meetings. | 7,2,1 |
| T15 | 4 | Improve testing of VPN concentrator POC. | 4 |
| T15 | 4 | Improve terminology to clarify purpose of exercise. | 3,1 |
| 17 | 3 | Provide more machines for shadowing. | 2,1 |
| 18 | 0 | Improve use of bridge line. | |
Analysis
The three highest-ranked responses (with 49, 45 and 45 points) all dealt with time issues. It is clear that the majority of participants feel more time is needed in future exercises to ensure that software applications are fully built out and successfully tested prior to the exercise. This is included as one of the follow-up action items. Two other responses, ranked 8th with 19 points and in a tie for 9th with 18 points, also dealt with time issues.
Two other follow-up action items resulted from improvement suggestions raised in the lessons-learned session. The fourth-highest-ranked response, with 34 points, was to validate and pre-test systems, and the 13th entry, with 12 points, was to test host files more thoroughly.
The responses ranked 5th and 6th, with 30 and 24 points respectively, both involved exercise management: improving role assignments and providing more clearly defined handoffs. These responses also resulted in follow-up action items.
Table 3 Follow-up Action Items
| # | Date Assigned | Description | Responsible Person | ECD | Rev ECD | ACD |
|---|---|---|---|---|---|---|
| 1 | 6/09 | Resolve issue of security badge not allowing access into building D. | Associate A | 8/01 | | |
| 2 | 6/09 | Resolve Citrix access problem with home laptop computer. | Associate B | 8/01 | | 6/09 |
| 3 | 6/09 | Ensure adequate time is provided to staff for application build-outs. | Associate C | 8/01 | | |
| 4 | 6/09 | Ensure script that shuts down clusters is checked for successful completion. | Associate D | 8/01 | | |
| 5 | 6/09 | Ensure that host files are tested prior to exercise. | Associate E | 8/01 | | |
| 6 | 6/09 | Improve and clarify the assignment of roles for next exercise. | Associate F | 8/01 | | |
| 7 | 6/09 | Provide more clearly defined handoffs for next exercise. | Associate G | 8/01 | | |
| 8 | 6/09 | Emphasize proper handoffs during Walk-Thru simulation. | Associate H | 8/01 | | |
| 9 | | | | | | |
| 10 | | | | | | |
ECD = Estimated Completion Date; Rev ECD = Revised Completion Date; ACD = Actual Completion Date
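For teams that track follow-up items in a script rather than a spreadsheet, the columns of Table 3 map naturally onto a small record type. The sketch below is illustrative only: the `ActionItem` class and its field names are hypothetical, dates are kept as the strings shown in the table because the source does not say whether they are month/day or month/year, and the rule that a revised ECD supersedes the original ECD is an assumption rather than something the case study states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionItem:
    """One row of the follow-up table (hypothetical structure)."""
    number: int
    date_assigned: str
    description: str
    responsible: str
    ecd: str                       # Estimated Completion Date
    rev_ecd: Optional[str] = None  # Revised Completion Date, if any
    acd: Optional[str] = None      # Actual Completion Date, if completed

    @property
    def is_closed(self) -> bool:
        # An item counts as closed once an actual completion date is recorded.
        return self.acd is not None

    @property
    def target_date(self) -> str:
        # Assumption: a revised date, when present, replaces the original ECD.
        return self.rev_ecd or self.ecd


# The first three rows of Table 3.
items = [
    ActionItem(1, "6/09",
               "Resolve issue of security badge not allowing access into building D.",
               "Associate A", "8/01"),
    ActionItem(2, "6/09",
               "Resolve Citrix access problem with home laptop computer.",
               "Associate B", "8/01", acd="6/09"),
    ActionItem(3, "6/09",
               "Ensure adequate time is provided to staff for application build-outs.",
               "Associate C", "8/01"),
]

open_items = [item.number for item in items if not item.is_closed]
print("Open action items:", open_items)
```

A list like `open_items` could feed the agenda of the next planning meeting, so that unresolved items are reviewed before the following exercise.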