Ten Lessons Learned from Real-Life Disasters
The following lessons are the ones I consider most important, based on the disasters in which I participated. I have learned dozens of other lessons from running countless dry-run and simulation disaster exercises. Obvious tips, such as ensuring the personal safety of yourself, your loved ones, and your staff, are not elaborated here, nor are items such as testing the restore of your backups, verifying the accuracy of phone contacts, and ensuring that recovery hardware and software are compatible with the original versions. In light of the tragedy that Hurricane Katrina brought to New Orleans, I believe it is most helpful to relate lessons that were learned from real-life experiences. These were not simulations; these were not test exercises. These were true disasters from which we learned true lessons of improvement. My hope is that you will also learn from these experiences.
Part One - Five Lessons Learned During Disaster Planning
- Include representatives from all appropriate areas in the disaster planning process. These areas include technical, business, administrative, support, and vendor groups. Three of the most effective disaster recovery plans I have ever seen executed in an actual emergency had one thing in common: the developers of the plans made sure to include representatives from all appropriate functions across the company. In preparing for Hurricane Charley in 2004, a financial services client of mine included six areas of IT (voice networks, data networks, systems engineering, database, applications, and the help desk) and three key areas of business users. They also included human resources, purchasing, legal, facilities, and key vendors. These groups all worked well together, contributed valuable input, and identified, evaluated, and selected a good number of reasonable alternatives.
- Require multiple levels of management sign-off for plans. During the past several years I have developed and tested numerous IT disaster recovery plans for half a dozen clients. I always encourage clients to require multiple levels of management sign-off on the completed plans. While this may appear to be needless bureaucracy, it provides several benefits. Recovery team members who will be using the plans tend to make them more thorough and accurate when they see the support and buy-in of their managers. Lower-level managers seem to review the plans more carefully when they know higher-level managers will be holding them responsible. The key is to have the highest-level executive, usually the CIO or the Chief Risk Officer, hold all of the others accountable for disaster plans that can actually recover their business functions. Companies have used anywhere from one to four levels of management sign-off, but regardless of the number of levels, it is always the degree of accountability insisted on by the highest-level executive that has the greatest effect.
- Don't overlook vital hardcopy records. A financial services client of mine has a major regional business and data center in Tampa, Florida that was directly in the path of Hurricane Charley in August 2004. Their planning for the impending landfall and wave surges provided a valuable lesson in disaster preparedness. For several days recovery planners conducted meetings with a large cross-functional team representing all critical areas of IT and the business. Most of the emphasis was on technical decisions such as the switching of telephone lines, the redundancy of networks, the restoration of data, and the availability of recovery servers. With less than 48 hours left to evacuate the greater Tampa area, all technical provisions seemed to be in place.
During one of the last planning sessions, a representative from the loan origination division asked a simple question: "What about the loan documents?" Everyone turned to her in surprise as she explained that there were 10,000 loan documents in cardboard boxes on the first floor of their building, which was located ¼ mile from the Gulf, where eight-foot wave surges were predicted. Suddenly all of our planning shifted from technical recovery to vital hardcopy recovery, and all records were safely stored within 12 hours.
- Evaluate risk factors when selecting recovery sites. Many large companies these days have multiple recovery sites from which they can operate their business and IT functions. In the case of the financial services company whose staff had to evacuate Tampa due to Hurricane Charley in 2004, the firm had two choices for recovery sites: its facilities in Chicago and New York. Chicago had the larger facility for recovery and more hotel rooms for workers, and there was less concern about security. New York's facility was better wired for an IT recovery, but the city would be hosting the Republican Convention for the presidential election, and there was concern about security and hotel rooms. In the end, the team selected New York because its benefits outweighed its drawbacks. The process serves as an important lesson in evaluating all factors in a recovery site decision.
- Operationally test recovery plans. There are four common types of tests for business continuity and disaster recovery plans: validation, simulation, operational, and mock. Validation testing focuses on the accuracy and completeness of the data within the plan, such as phone numbers, version/release numbers, and model numbers. Simulation testing, sometimes called a table-top exercise, involves walking through the recovery actions for a disaster scenario that simulates an actual event. Improvements and refinements to the recovery plans usually result from these types of tests.
An operational test is the most involved type of test exercise because it involves bringing recovered systems up at a recovery site and testing the functionality and performance of the systems. In the various disaster recoveries in which I participated, operational testing had been conducted previously and made for a much smoother activation of plans during the actual disasters. This is not to say that unforeseen problems, such as changes in weather and unavailability of key personnel, do not arise. But operationally testing the plans ahead of time tends to minimize the impact of unforeseen events.
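Of the test types described above, validation testing lends itself most readily to automation, since it checks plan data rather than people or facilities. What follows is only a minimal sketch in Python, not a description of any tool used in the engagements discussed here. The record layout (`PlanContact`), the function names, the phone format, and the sample data are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class PlanContact:
    """Hypothetical contact entry in a recovery plan (illustrative fields only)."""
    name: str
    role: str
    phone: str


# Assumed format: North American numbers written as 813-555-0142.
PHONE_PATTERN = re.compile(r"^\d{3}-\d{3}-\d{4}$")


def validate_contacts(contacts):
    """Return a list of problems found in the plan's contact entries."""
    problems = []
    for c in contacts:
        label = c.name.strip() or c.role
        if not c.name.strip():
            problems.append(f"{c.role}: missing name")
        if not PHONE_PATTERN.match(c.phone):
            problems.append(f"{label}: malformed phone '{c.phone}'")
    return problems


def find_version_mismatches(production, recovery):
    """Compare version/release numbers recorded for production and the recovery site."""
    mismatches = []
    for component, prod_version in sorted(production.items()):
        rec_version = recovery.get(component)
        if rec_version is None:
            mismatches.append(f"{component}: not listed for recovery site")
        elif rec_version != prod_version:
            mismatches.append(
                f"{component}: production {prod_version}, recovery {rec_version}"
            )
    return mismatches


if __name__ == "__main__":
    # Made-up sample data, used only to exercise the checks.
    contacts = [
        PlanContact("Pat Lee", "Network lead", "813-555-0142"),
        PlanContact("", "DBA on call", "5550142"),
    ]
    for problem in validate_contacts(contacts):
        print(problem)
    for problem in find_version_mismatches(
        {"db": "9.2", "os": "5.1"}, {"db": "9.1"}
    ):
        print(problem)
```

A check like this only confirms that the entries are present and well formed. It cannot confirm that a phone number still reaches the right person, which is why validation testing also involves people reviewing the plan.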
These are five important lessons I learned while planning for actual disasters. Next week I will discuss five other important lessons I learned while participating in the recovery from real events, notably California earthquakes and Hurricane Charley.