There are many traits that distinguish a world-class infrastructure from a mediocre one. Table 1 below summarizes twelve of the most common of these characteristics as they might appear in world-class and mediocre infrastructures. In this first of a two-part series I describe the first six of these traits in more detail.
Executive Support – As we discussed in chapter six, executive support is one of the primary prerequisites for implementing a world-class infrastructure. Executive support does not mean merely approving budgets for hardware, software and human resources; executives in many firms with mediocre infrastructures readily approve budgets. What it does mean is an IT executive who actively participates in the planning, development and decision-making processes of Systems Management.
Active participation by executives can take many forms. It may involve executives taking the time to understand the challenges and obstacles of providing sound infrastructures. It may consist of managers helping to prioritize which functions of Systems Management are most important to their firms. It may result in executives backing up their staffs when negotiating reasonable service levels with customers, rather than the unrealistic ones that are more frequently agreed to. Finally, it may be the CIO or his representative ensuring that other departments within IT, notably applications development, actively support and comply with established infrastructure policies, procedures, and standards.
Table 1 Common Traits of World-Class and Mediocre Infrastructures

| World-Class Infrastructures | Mediocre Infrastructures |
|---|---|
| Totally Supported By Executive Management | Little or No Support From Executive Management |
| Meaningful Metrics Analyzed, Not Just Collected | Convenient, Not Necessarily Meaningful, Metrics Collected but Not Analyzed |
| Proactive Approach To Problem Solving, Change Management, Availability, Performance and Tuning, and Capacity Planning | Reactive Approach To Problem Solving, Change Management, Availability, Performance and Tuning, and Capacity Planning |
| Employees Well Trained | Employees Poorly Trained |
| Employees Well Equipped | Employees Poorly Equipped |
| Processes Designed With Robustness Throughout Them | Processes Designed With Little or No Robustness in Them |
| Help Desk Involves Call Management, Not Just Call Tracking | Help Desk Focuses on Call Tracking, Not Call Management |
| Employees Empowered To Make Decisions and Improvements | Employees Empowered Very Little, or Not At All |
| Standards Are Well Developed and Adhered To | Standards Poorly Developed With Little or No Enforcement |
| Technology Is Effectively Used To Automate Streamlined Processes | Technology Is Applied, if At All, Inappropriately |
| Functions of Systems Management Are Integrated | Little or No Integration of Systems Management Functions |
| Technical Goals Are Aligned With Business Goals | Little or No Alignment of Technical Goals With Business Goals |
Meaningful Metrics Analyzed – One of the most common characteristics I have observed over the years that differentiates well-managed infrastructures from poorly managed ones is their use of metrics. One of the first distinctions in this regard is between merely collecting data and establishing truly meaningful metrics derived from that data.
For example, almost all companies today collect some type of data about outages to their online systems, regardless of whether the systems are hosted on mainframes, client/server platforms, or the Internet. A typical metric may measure the percent uptime of a particular system over a given period of time against a target goal of, for instance, 99% uptime.
The data collected in this example may include the start and end times of the outage, the systems impacted, and the corrective actions taken to restore service. The metric itself is the computation of the percent uptime on a daily, weekly or monthly basis for each online system measured. Compiling the outage data into a more meaningful metric may involve separating percent uptime into prime-shift and off-shift figures, or reporting actual system downtime in minutes or hours rather than percent availability. A meaningful availability metric may also be a measure of output as defined by the customer. For example, we had a Purchasing Officer customer who requested that we measure availability based on the number of Purchase Orders his staff was able to process each week.
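As an illustration of turning raw outage records into the more meaningful metrics described above, the following Python sketch splits percent uptime into prime-shift and off-shift figures and also reports downtime in minutes. It is a minimal sketch under stated assumptions: the 08:00 to 17:00 prime-shift window, the function names, and the record format (a list of non-overlapping start/end datetime pairs) are hypothetical and do not come from the article.

```python
from datetime import datetime, timedelta

# Hypothetical prime-shift window (08:00 to 17:00); substitute your customers' definition.
PRIME_START_HOUR = 8
PRIME_END_HOUR = 17


def overlap_minutes(a_start, a_end, b_start, b_end):
    """Minutes shared by the intervals [a_start, a_end) and [b_start, b_end)."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return max((end - start).total_seconds() / 60, 0.0)


def availability_report(outages, period_start, period_end):
    """Downtime minutes and percent uptime, split into prime shift and off shift.

    outages: list of (start, end) datetime pairs; assumed not to overlap each other.
    """
    scheduled = {"prime": 0.0, "off": 0.0}
    down = {"prime": 0.0, "off": 0.0}
    day = datetime(period_start.year, period_start.month, period_start.day)
    while day < period_end:
        # Clip this calendar day and its prime shift to the reporting period.
        day_s, day_e = max(day, period_start), min(day + timedelta(days=1), period_end)
        prime_s = max(day + timedelta(hours=PRIME_START_HOUR), day_s)
        prime_e = min(day + timedelta(hours=PRIME_END_HOUR), day_e)
        day_min = (day_e - day_s).total_seconds() / 60
        prime_min = overlap_minutes(day_s, day_e, prime_s, prime_e)
        scheduled["prime"] += prime_min
        scheduled["off"] += day_min - prime_min
        # Attribute each outage's minutes on this day to prime shift or off shift.
        for o_start, o_end in outages:
            o_day = overlap_minutes(o_start, o_end, day_s, day_e)
            o_prime = overlap_minutes(o_start, o_end, prime_s, prime_e)
            down["prime"] += o_prime
            down["off"] += o_day - o_prime
        day += timedelta(days=1)

    report = {}
    for shift in ("prime", "off"):
        pct = 100.0 * (1 - down[shift] / scheduled[shift]) if scheduled[shift] else 100.0
        report[shift] = {"downtime_min": down[shift], "uptime_pct": round(pct, 2)}
    return report


# Hypothetical one-week example with two outages.
week = availability_report(
    [(datetime(2024, 1, 2, 16, 0), datetime(2024, 1, 2, 18, 0)),  # spans end of prime shift
     (datetime(2024, 1, 5, 2, 0), datetime(2024, 1, 5, 2, 30))],  # off shift only
    datetime(2024, 1, 1), datetime(2024, 1, 8),
)
print(week)
```

For this sample week the sketch attributes 60 minutes of downtime to the prime shift (98.41% uptime) and 90 minutes to the off shift (98.57% uptime). A single blended figure for the same week would hide the fact that one of the outages hit users during business hours.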
Instituting meaningful metrics helps improve the overall management of an infrastructure, but their ultimate value comes from analyzing them to reveal trends, patterns and relationships. This in-depth analysis can often lead to the root cause of problems and a more proactive approach to meeting service levels.
An example from an aerospace client illustrates this point. This firm was running highly classified data over expensive, encrypted network lines. High network availability was of paramount importance to ensure economical use of the costly lines as well as productive use of the highly paid specialists using them. At some point intermittent network outages began occurring, but they proved elusive to troubleshoot. Finally we trended the data and noticed a pattern that seemed to center on the afternoon of the third Thursday of every month.
This monthly pattern eventually led us and our suppliers to discover that our telephone carrier was performing routine line maintenance for disaster recovery on the third Thursday of every month. The switching involved in this maintenance produced just enough line interference to affect our highly sensitive encrypted lines. The maintenance was consequently modified to produce less interference, and the problem never recurred. Analyzing and trending the metrics data led us directly to the root cause and eventual resolution of the problem.
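The kind of trending that exposed the third-Thursday pattern can be approximated by bucketing outage start times by their position in the calendar. The Python sketch below shows one hypothetical way to do this. It is not the method the team actually used: the label format, the AM/PM split, the `min_count` cutoff, and the sample timestamps are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def calendar_bucket(ts):
    """Label a timestamp by weekday occurrence in its month, e.g. '3rd Thursday PM'."""
    ordinal = (ts.day - 1) // 7 + 1  # 1st through 5th occurrence of this weekday
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(ordinal, "th")
    half = "AM" if ts.hour < 12 else "PM"
    # %A gives the full weekday name (English in the default C locale).
    return f"{ordinal}{suffix} {ts.strftime('%A')} {half}"


def recurring_buckets(outage_starts, min_count=3):
    """Return (bucket, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(calendar_bucket(ts) for ts in outage_starts)
    return [(label, n) for label, n in counts.most_common() if n >= min_count]


# Hypothetical outage log: four third-Thursday afternoons plus unrelated noise.
outages = [
    datetime(2024, 1, 9, 9, 0),
    datetime(2024, 1, 18, 14, 5),
    datetime(2024, 2, 15, 15, 30),
    datetime(2024, 3, 4, 22, 10),
    datetime(2024, 3, 21, 13, 45),
    datetime(2024, 4, 18, 14, 20),
]
print(recurring_buckets(outages))
```

With this sample data the sketch returns a single recurring bucket, `('3rd Thursday PM', 4)`. A bucket like that is only a lead for investigation; as the carrier-maintenance story shows, the root cause still has to be confirmed with suppliers.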
Proactive Approach – World-class infrastructures employ a proactive approach to identifying and preventing potential problems that affect performance and availability. Marginal infrastructures are forced to take a more reactive approach toward problem solving. For example, a proactive strategy may use analysis of meaningful utilization metrics to predict when an out-of-capacity condition is likely to occur. Armed with this information, technicians can then decide whether to add capacity or to reschedule or reduce workloads to prevent outages or performance problems. A reactive approach allows no time to identify these conditions and make proactive decisions. Other performance and capacity indicators, such as memory swapping and network bandwidth, can similarly be analyzed to proactively identify and prevent bottlenecks and outages.
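One simple way to predict when an out-of-capacity condition is likely to occur is to fit a straight-line trend to recent utilization and project when it will cross a planning threshold. The Python sketch below uses ordinary least squares. The linear model, the weekly sampling, the function name, and the 85% threshold are assumptions for illustration; real workloads may call for more sophisticated models.

```python
def weeks_until_threshold(utilization, threshold=85.0):
    """Fit a least-squares line to weekly peak utilization (%) and estimate how many
    weeks after the last sample the trend reaches `threshold`.

    Returns None if the trend is flat or declining.
    """
    n = len(utilization)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)  # week index 0 .. n-1
    mean_x = sum(xs) / n
    mean_y = sum(utilization) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, utilization))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # no growth trend, so no projected crossing
    crossing = (threshold - intercept) / slope  # week index where the line hits threshold
    return max(crossing - (n - 1), 0.0)  # 0.0 means the trend is already past threshold


weekly_peak_cpu = [60, 62, 64, 66, 68, 70]  # hypothetical weekly peak utilization (%)
print(weeks_until_threshold(weekly_peak_cpu))
```

With this sample the fitted trend rises two percentage points per week and reaches 85% about 7.5 weeks after the last sample. That lead time is what allows technicians to add capacity or reschedule workloads before users notice a problem.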
Well-Trained Employees – World-class infrastructures invest heavily in training their staffs. This training may take the form of on-the-job training, onsite classroom instruction, offsite courses at local facilities, out-of-town classes, or vendors brought in to conduct customized training. Top-rated infrastructures often employ a buddy system, or one-on-one mentoring program, in which experienced senior-level technicians share both their knowledge and how to apply it with junior-level staff. Cross-training between infrastructure departments, such as operations and networks or system administration and database administration, is another effective method well-managed organizations use to optimize employee training.
Well-Equipped Employees – A trait of world-class infrastructures that parallels well-trained employees is ensuring those employees are also well-equipped. Less sophisticated shops sometimes sacrifice hardware and software tools in the name of cost savings. This is often a false economy that can drag out problem resolution times, lengthen outages, occasionally duplicate work efforts, and eventually frustrate key staff members to the point that they seek employment elsewhere.
While budget items need to be justified and managed, top-rated infrastructures usually find the means to provide the tools their technicians need. These tools may include pagers, cell phones, palmtop personal digital assistants, laptops, high-speed network connections at home, and specialized desktop software.
Robust Processes – World-class infrastructures know how to develop, design and maintain robust processes. Robust processes, and the characteristics that define them, are described at length in the previous part of this section under Developing Robust Processes.
This concludes the first part of this piece on the Twelve Traits of World-Class Infrastructures. In Part Two I describe the remaining six characteristics.