Part One: Getting Started
Problems are a fact of life in all data centers. The sheer complexity of systems and diversity of services offered today all but guarantee that problems will occur. One of the many factors that separate world-class infrastructures from mediocre ones is how well they manage the variety and volume of problems they encounter.
This three-part series discusses the entire realm of problem management. This first part begins with a commonly accepted definition of the process, followed by its interpretation and implications. It also discusses the scope of problem management and shows how it differs from change management and request management.
Definition of Problem Management
Regardless of how well designed its processes or how smooth running its operations, even a world-class infrastructure will occasionally miss its targeted levels of service. The branch of systems management that deals with the handling of these occurrences is called problem management, defined below.
problem management: a process to identify, log, track, resolve, and analyze events that adversely impact IT services
The identification of problems typically comes in the form of a trouble call from an end-user to a help desk facility, but problems may also be identified by programmers, analysts, or systems administrators. Problems are normally logged into a database for subsequent tracking, resolution, and analysis. The sophistication of the database and the depth of the analysis vary widely from shop to shop and are two key indicators of the relative robustness of a problem management process. I will discuss these two characteristics in more detail in part three of this series.
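As a rough illustration of the identify, log, track, resolve, and analyze steps in the definition above, here is a minimal Python sketch of a problem log. The class names, fields, and status values are hypothetical and chosen only for illustration; an actual problem database would look different in any given shop.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProblemRecord:
    """One logged problem; every field name here is an illustrative assumption."""
    ticket_id: int
    reported_by: str              # e.g. end-user, programmer, systems administrator
    description: str
    category: str                 # e.g. "network", "storage"
    status: str = "open"          # becomes "resolved" once a fix is recorded
    resolution: Optional[str] = None


@dataclass
class ProblemLog:
    """In-memory stand-in for the problem database."""
    records: dict = field(default_factory=dict)
    next_id: int = 1

    def log(self, reported_by, description, category):
        """Identify and log: create a record and store it under a new ticket id."""
        record = ProblemRecord(self.next_id, reported_by, description, category)
        self.records[record.ticket_id] = record
        self.next_id += 1
        return record

    def open_tickets(self):
        """Track: list every problem that has not yet been resolved."""
        return [r for r in self.records.values() if r.status == "open"]

    def resolve(self, ticket_id, resolution):
        """Resolve: record the fix and close the ticket."""
        record = self.records[ticket_id]
        record.status = "resolved"
        record.resolution = resolution
        return record

    def count_by_category(self):
        """Analyze: count logged problems per category."""
        return Counter(r.category for r in self.records.values())


log = ProblemLog()
first = log.log("end-user", "Cannot reach the network from desktop", "network")
log.log("systems administrator", "Nightly backup overran its window", "storage")
log.resolve(first.ticket_id, "Replaced faulty network cable")
```

Even a sketch this small shows where the two robustness indicators come in: the fields captured on each record reflect the sophistication of the database, and methods such as `count_by_category` reflect the depth of the analysis.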
Scope of Problem Management
Several of my clients have struggled with distinguishing the three closely related processes of problem management, change management, and request management. While the initial input to these three processes may be similar, the methods for managing a problem, a change, or a request for service typically differ significantly. As a result, the scope of what actually constitutes problem management also varies significantly from shop to shop. Most infrastructures do agree that first-level problem handling, commonly referred to as tier 1, is the minimum basis for problem management. Table 1 shows some of the more common variations of this scheme.
Table 1 Variations of Problem Management Schemes
Variation Number | Description
1 | Tier 1 reporting only; sometimes called incident management
2 | Tier 2 reporting only; sometimes called traditional problem management
3 | Tier 3 reporting only; sometimes called escalation management
4 | Major service disruption; sometimes called crisis management
5 | Tier 1 reporting and request management
6 | All tiers reporting and request management
7 | All tiers reporting and both request and change management
The most integrated approach is the last variation in the table, in which all three tiers of problem management are tightly coupled with change management and request management. Among other things this means that all calls concerning problems, changes, and service requests go through a centralized help desk and are logged into the same type of database.
The value of a centralized help desk can be significant, but if it is replacing segregated call numbers, the effort to integrate can be substantial. A few years ago I had a client with multiple help desks who wanted me to centralize them. Every time they deployed a new application they issued a new, temporary help desk number. The problem was that each temporary number became a permanent one. By the time I was assigned to the task, there were 11 separate help desk numbers in use. It took us almost a year to replace this scheme with a fully integrated, single-number customer service center (the slogan was 'one call does it all'). But the positive reaction of users to the new call center setup, and the value it provided, made it all worthwhile.
Distinguishing between Problem, Change, and Request Management
Problem, change, and request management are three infrastructure processes that are closely related but distinct. Changes sometimes cause problems or can be the result of a problem. Expanding a database that is running out of storage space may be a good, proactive change, but it may cause backup windows to extend into production time, resulting in a scheduling problem for operations.
The same is true of problems causing or being the result of changes. Request management is usually treated as a subset of problem management, but it applies to individuals requesting services or enhancements, such as a specialized keyboard, a file restore, an advanced-function mouse, extra copies of a report, or a larger monitor. Table 2 shows a general delineation of problem, change, and service requests.
Table 2 General Delineation of Problem, Change, and Service Requests
Problem Ticket | Problems with a desktop component used by a single customer. Any kind of production service interruption. Problems of an urgent nature impacting customers will generate an associated change ticket. Essentially a "fix when broke" approach to request management.
Change Ticket | Adding, deleting, or modifying any production hardware, software, facilities, or documentation impacting more than one customer. All changes are designated as either emergency changes or planned changes. An emergency change is an urgent, mandatory change impacting customers that must be implemented in less than 24 hours to restore accessibility, functionality, or acceptable performance to a production application or to a support service. All nonemergency changes are designated as planned changes and are assigned one of four priority levels.
Service Request | Responding to an operational service request. Adding, deleting, or modifying any production hardware, software, facilities, or documentation impacting one individual customer.
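The delineation in Table 2 can be read as a simple routing rule. The Python sketch below is one possible interpretation, not a definitive implementation: the `Request` fields and the `classify` function are hypothetical names introduced here for illustration, and a real help desk would refine the rules to fit its own environment.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical intake record; the fields are assumptions for illustration."""
    description: str
    is_problem: bool            # something is broken or a service is interrupted
    modifies_production: bool   # adds, deletes, or modifies production items
    customers_impacted: int
    urgent: bool = False


def classify(request):
    """Return the ticket types a call would generate under the Table 2 rules."""
    if request.is_problem:
        tickets = ["problem"]
        # Urgent problems impacting customers also generate a change ticket.
        if request.urgent:
            tickets.append("emergency change")
        return tickets
    # Production modifications impacting more than one customer are changes.
    if request.modifies_production and request.customers_impacted > 1:
        return ["emergency change" if request.urgent else "planned change"]
    # Everything else, including modifications for a single customer,
    # is handled as a service request.
    return ["service request"]


outage = Request("Network unreachable", True, False, 40, urgent=True)
expansion = Request("Add more disk volumes", False, True, 200)
restore = Request("Restore a file", False, False, 1)
```

Here `classify(outage)` yields both a problem ticket and an emergency change, `classify(expansion)` yields a planned change, and `classify(restore)` yields a service request. Planned changes would additionally carry one of the four priority levels, which this sketch leaves out.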
To further delineate the differences between problem, change, and service requests, I have provided in Table 3 some examples of the three types of requests taken from actual infrastructure environments.
Table 3 Examples of Problem, Change, and Service Requests
Problem Ticket | Desktop problems that are handled and resolved over the phone. Network accessibility from a desktop. Urgent problems will generate a change ticket if the problem is not resolvable over the phone. Resolve a functionality problem within a new or existing application.
Change Ticket | Add a backup server (Priority Level 4). Add more disk volumes (Priority Level 3). Upgrade server operating systems (Priority Level 2). Migrate to a new database architecture (Priority Level 1). Respond to an urgent problem (emergency). Add a new function that has never before existed in the production environment.
Service Request | Rerun a job. Reprint a report. Restore a file. Upgrade desktop components. File permissions. Delete an unused login, account, password, etc. Assign more disk space without adding more disk volumes. Install a new occurrence of an existing function into a production environment.
In the next update, I will present the first seven steps of the 11-step method of developing a world-class problem management process.