Issues
Let's focus on some of the most critical issues and the serious impact that each has on the organization. Based on the data gathered from our assessments, the most severe issues can be grouped under the categories of organization, people, and process.
Organizational Issues
Lack of a production control function (production QA, second-level system administration, process ownership, and so on) means that there is only one level of support for system and database administration functions; all problems go directly to senior technical staff. These staff members don't have enough time to do the job right: they can't properly plan and design the infrastructure because they're too busy fighting fires. In turn, this leads to a lack of ownership and accountability for critical enterprise-wide processes.
Lack of a tape librarian function. This means that system integrity is compromised and, in many cases, even minimal disaster-recovery provisions are nonexistent.
Silos of technologies. Many IT shops are organized around particular technologies (mainframe, AS400, NT, UNIX, Novell, and so on), with separate goals, objectives, and priorities for each technology. This causes poor communication and high barriers between groups, as well as duplicated system-management efforts for every technology. Very few companies are looking at enterprise-wide system-management solutions.
Roles and responsibilities not clearly defined. This results in overlapping job functions, poor morale, duplication of efforts, and confusion in the ranks during problem resolution, especially for help desk staff, who are tasked with resolving problems as quickly as possible but frequently don't have the authority to do what needs to be done.
Splitting the infrastructure group. Some IT shops split the infrastructure group between infrastructure development and production support, resulting in poor communication, poor morale, and turf battles.
Some organizations are structured to focus on high-visibility strategic projects and have a separate function to focus on daily production-support issues. This leads to difficulty in turning over projects from development to support. Technicians would prefer to work on new projects and provide analysis on the latest and greatest technology rather than be labeled as full-time production-support personnel.
System management tools not fully implemented, customized, and maintained. This leads to manual intervention, wasted costs, and wasted technical resources. Senior technical staff are spending 90-95% of their time fixing production problems.
Lack of a three-tier support model. Most IT shops we've visited only have two levels of support. Problems are recorded at the help desk or they're detected from operations; these problems are then quickly routed to senior technical staff. Senior technical staff are spending 90-95% of their time in a reactive mode. With a three-tier support model, senior technicians can spend 80% of their time on strategic initiatives. The three-tier support model provides these additional benefits:
Enhanced skills for junior and second-level support personnel. Organizations today need to breed senior technical staff within the organization as quickly as possible while continuing their external recruitment efforts.
Better turnaround for problem resolution.
The ability to fully provide analysis and implementation of enterprise systems-management solutions.
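To make the escalation path concrete, here is a minimal Python sketch of routing problems through three tiers. The tier names, the `Ticket` fields, the `can_resolve` callback, and the `senior_share` helper are all hypothetical illustrations, not a prescribed tool; the point is only that each problem is tried at the lowest tier able to handle it before it reaches senior technical staff.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, List, Optional


class Tier(IntEnum):
    """Hypothetical labels for the three support tiers, lowest first."""
    HELP_DESK = 1           # problems recorded at the help desk or detected by operations
    PRODUCTION_CONTROL = 2  # second-level support (the production control function)
    SENIOR_TECHNICAL = 3    # senior technical staff


@dataclass
class Ticket:
    """A problem record; the field names are illustrative assumptions."""
    summary: str
    resolved_at: Optional[Tier] = None
    history: List[Tier] = field(default_factory=list)


def route(ticket: Ticket, can_resolve: Callable[[Tier, Ticket], bool]) -> Optional[Tier]:
    """Escalate one tier at a time until some tier can resolve the ticket.

    `can_resolve` is a hypothetical stand-in for whatever skills and
    authority check the organization actually applies at each tier.
    Returns the resolving tier, or None if even senior staff could not close it.
    """
    for tier in Tier:  # iterates HELP_DESK, PRODUCTION_CONTROL, SENIOR_TECHNICAL
        ticket.history.append(tier)
        if can_resolve(tier, ticket):
            ticket.resolved_at = tier
            return tier
    return None


def senior_share(tickets: List[Ticket]) -> float:
    """Fraction of tickets that reached senior technical staff at all."""
    if not tickets:
        return 0.0
    reached = sum(1 for t in tickets if Tier.SENIOR_TECHNICAL in t.history)
    return reached / len(tickets)
```

In a two-tier shop there is no `PRODUCTION_CONTROL` step, so everything the help desk can't close lands directly on senior staff; a figure like `senior_share` is one possible way to watch that load fall as second-level skills grow.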
Ineffective architecture function. The architecture function has proven to be ineffective for designing infrastructures. The CIO might think otherwise because one of the architect's functions is to help design the proper infrastructure, but we have yet to see this strategy work effectively. The result is that infrastructure development lags further behind the needs of the customers and IT.
People Issues
Recruiting technical staff. Most of the companies we visited are not taking the time to breed technical expertise within the organization. Instead, they put all their energy into external recruiting, but the competition for skilled people is fierce. This leaves a big void: they need to start breeding skilled resources within the organization while continuing to look at external resources.
Retaining technical staff. With the focus on external recruiting, there is very little emphasis on retaining senior technical staff and training junior technical staff. Technical staff is in a constant firefighting/reactive mode, frustration is high due to the chaotic state of the infrastructure, and burnout is imminent.
Lack of a career development path for junior technical staff. Because the organization doesn't develop and promote the proper career path, career development is limited to daily problem resolution. This results in a lack of skilled technical resources when senior technical staff members leave and external resources can't be recruited quickly enough to fill the gap.
Culture barriers. Because many IT shops are structured around technology, the barriers between the technical staff supporting these different technologies are a huge problem in IT. NT people don't talk to UNIX people, mainframe people don't communicate with client/server staff, and so on. The staff must learn that a production system is a production system is a production system. If the systems you're supporting are critical to the success of the company, you must treat them all equally.
Poor communications within and external to IT. Communications within IT are extremely poor, especially between application development and the infrastructure-support organization. Application development's charter is to design, develop, and deploy applications into production as quickly as possible. The infrastructure-support organization's charter is to ensure reliability, availability, and serviceability (RAS) for the mission-critical production environment. Poor communication also results in these problems:
Wasted effort.
Inefficient use of resources.
Projects that take more time, resources, and money to implement.
Service levels that are difficult to maintain.
User frustration with IT.
Lack of data center staff mentoring in reliability, availability, and serviceability (RAS). There are distinct phases to designing your organization to support RAS:
Clearly distinguishing mission-critical from non-mission-critical functions.
Implementing a production-control function.
Structuring three levels of support.
Difficulty supporting mission-critical client/server applications regardless of platform/paradigm. It's crucial to take the best practices from the legacy environment and the most important methodologies from open systems to come up with the best of both worlds.
Process Issues
Lack of critical disciplines. Change Control, Problem Management, and Production Acceptance are three of the most critical processes or disciplines that are necessary to keep the organization flowing:
Change Control. A process that coordinates any change that can potentially impact the operational production environment.
Problem Management. A centralized process to manage and resolve user, network, application, and system problems.
Production Acceptance. A methodology to promote communication, standards, guidelines, and teamwork for deploying, implementing, and supporting mission-critical client/server distributed systems.
Three-fourths of the companies we studied didn't even have an enterprise-wide change control process. Problem management was not as bad; 90% of the companies had something that resembled problem management. That's the good news; the bad news is that 65% of those were broken. How can this be? Has anything and everything that came out of the mainframe environment been tossed aside? Unfortunately, the answer appears to be yes. Seventy-five percent of the companies we studied had a legacy environment. Why is everyone turning their backs on a very successful mission-critical, production-support environment? We can turn our backs on many other aspects of the mainframe era, but not on the way it provides RAS.
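Taking the problem-management figures together, and assuming (as the text implies) that the 65% applies to the 90% of companies that had some process, only about a third of all companies studied had problem management that actually worked:

```latex
\begin{aligned}
s_{\text{working}} &= 0.90 \times (1 - 0.65) = 0.90 \times 0.35 = 0.315 \approx 32\%\\
s_{\text{broken}}  &= 0.90 \times 0.65 = 0.585 \approx 59\%\\
s_{\text{none}}    &= 1 - 0.90 = 0.10 = 10\%
\end{aligned}
```

The three shares sum to 1, which is a quick check that the reading of the figures is consistent.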
Lack of metrics. Not one of the companies we studied had organization-wide metrics. About half of the companies had some sort of help desk metrics, and the 75% that had mainframe systems had some form of metrics for them.
Metrics are crucial to measure effectiveness in the client/server environment: If you can't measure it, there's no way you can manage it effectively. It's also important to establish internal metrics for each area of the operation as well as quarterly or semiannual ratings to compare the results based on cost and performance efficiencies. When companies get their infrastructure in order, it's imperative that they establish metrics.
Lack of a process to benchmark services. Once IT has built that elusive "world-class" infrastructure, the next step is to benchmark selected parts of that infrastructure against outside sources. Executive management always claims that IT spends too much; you can beat them to the punch by comparing your infrastructure costs with those of your competition and your vendors. (Hopefully your costs are lower.)
The first step is to document the extent of services and their related costs so that you can compare. The idea is to compare the cost of system administration support or network support with the costs of using vendors to provide this type of service. Divide your infrastructure into pieces and then go out and benchmark. Ask vendors what they would charge to support your mission-critical servers, network, and so on. (This is only a benchmark exercise, of course, as outsourcing any part of your client/server mission-critical environment is not recommended.) Once your infrastructure is cost effective, take a look at your competition.
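The piece-by-piece comparison described above can be sketched in a few lines of Python. The `ServiceCost` record, the function names, and the 100% threshold are hypothetical; the costs and quotes would come from your own cost documentation and vendor responses, and (as the text stresses) the result is a benchmark, not an outsourcing decision.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServiceCost:
    """Annual cost of one slice of the infrastructure (hypothetical fields)."""
    service: str          # e.g. "system administration", "network support"
    internal_cost: float  # what IT currently spends on this service
    vendor_quote: float   # what an outside vendor says it would charge (assumed > 0)


def benchmark(costs: List[ServiceCost]) -> Dict[str, float]:
    """Return, per service, internal cost as a percentage of the vendor quote.

    Values below 100 mean IT is cheaper than the quote; above 100, more expensive.
    """
    return {c.service: 100.0 * c.internal_cost / c.vendor_quote for c in costs}


def needs_attention(costs: List[ServiceCost], threshold: float = 100.0) -> List[str]:
    """Sorted list of services whose internal cost exceeds the quote beyond `threshold` percent."""
    ratios = benchmark(costs)
    return sorted(s for s, pct in ratios.items() if pct > threshold)
```

Expressing internal cost as a percentage of the quote keeps pieces of very different sizes on one scale, so the services that deserve attention first stand out regardless of their absolute budgets.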
Lack of a process to market and sell IT services. After benchmarking your IT services (hopefully, the results are better than those of external sources), document your services, associate a price with those services, and market and sell those services to your internal customers and senior executives. If customers see your costs versus the costs of external sources (assuming that yours is the less expensive bill), maybe they'll stop complaining.
Lack of service level agreements. Expectations are not properly documented between end users and IT. Once the organization is structured properly to support RAS and the minimum and sufficient processes are implemented, a formal agreement must be in place between IT and its customers.
Lack of a process to measure customer satisfaction. With the evolution of client/server computing and the torrid pace of change in information technology, maybe this item should read "No communications." In this networked era, there are few clear demarcations as to who does what to whom and when, and that is just within IT. What about the poor users? Are you communicating with them effectively? Only 5% of the 100+ companies we analyzed actually attempted to measure customer satisfaction. Scheduled survey forms are unpopular, and the usual response rate is less than 30%. But there are ways to reengineer this entire process, if you can find the time.