Understanding System Center Operations Manager
System Center Operations Manager (SCOM) 2007 R2 is the second product addressed in this chapter. SCOM monitors systems and alerts network administrators when something (a server, workstation, network device, application, and so forth) is not working as expected, such as being offline, in a failed state, or even running slower than normal. The SCOM management console, shown in Figure 1.4, provides details about the events and errors of the systems being monitored and managed by SCOM.
Figure 1.4 The System Center Operations Manager console.
In the past, system monitoring meant simply alerting when something was "down"; with System Center Operations Manager, the monitoring is proactive and alerts are triggered before problems cause a system to fail. SCOM proactively checks the operation of systems and devices, and when a device performs differently than normal (which many times is a precursor to a pending system failure), SCOM begins the alert and notification process.
Business Solutions Addressed by System Center Operations Manager
System Center Operations Manager helps an organization be proactive about system operations rather than waiting for a server or application to fail, incurring operational downtime, and then recovering from the failure. SCOM helps IT personnel ensure systems are running as expected. SCOM monitors the normal operation of servers, workstations, and applications to create a known baseline of how the systems are operating. When a system falls outside the norm of the baseline, something is wrong: downtime has not yet occurred, but the system or application is not running as it always does. IT personnel are then notified to review the situation and take corrective action.
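The baseline idea can be illustrated with a short Python sketch. This is not how SCOM implements baselining internally; it assumes a simple rule in which a reading is flagged when it falls more than a chosen number of standard deviations from the historical average. The function names, the sample response times, and the tolerance of 3.0 are all hypothetical.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical readings as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)


def is_out_of_baseline(value, baseline, tolerance=3.0):
    """Return True when a reading is more than `tolerance` standard
    deviations away from the baseline mean."""
    avg, spread = baseline
    if spread == 0:
        # A perfectly flat history: any different reading is a deviation.
        return value != avg
    return abs(value - avg) > tolerance * spread


# Hypothetical response times (milliseconds) gathered while the server was healthy.
history = [118, 122, 120, 119, 121, 123, 117, 120]
baseline = build_baseline(history)

print(is_out_of_baseline(121, baseline))  # a reading close to the usual values
print(is_out_of_baseline(310, baseline))  # server is up, but far slower than usual
```

The second reading comes from a system that is still online, which is exactly the situation described above: no downtime yet, but behavior that warrants a notification.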
SCOM also helps the IT department identify systems that should be replaced before others due to reliability issues. SCOM can keep track of system uptime and downtime and generate a report that ranks the reliability of systems based on their ongoing performance. If all systems are equal in terms of age or depreciation schedule, yet the organization will be replacing only a portion of them, the reports can be used to identify which systems should be replaced first.
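As a rough illustration of such a ranking, the following Python sketch orders systems from least to most reliable by the fraction of time they were up. The server names and hour counts are invented for the example, and the calculation is an assumption about how a reliability report could be derived, not a description of SCOM's own report logic.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of the reporting period during which the system was up."""
    total = uptime_hours + downtime_hours
    return uptime_hours / total if total else 0.0


def rank_by_reliability(records):
    """Return system names ordered from least to most reliable."""
    return sorted(records, key=lambda name: availability(*records[name]))


# Hypothetical (uptime_hours, downtime_hours) for one 720-hour reporting period.
records = {
    "FILE01": (715.0, 5.0),
    "WEB02": (690.0, 30.0),
    "SQL01": (719.5, 0.5),
}

for name in rank_by_reliability(records):
    up, down = records[name]
    print(f"{name}: {availability(up, down):.2%} available")
```

The systems at the top of the resulting list are the first candidates for replacement.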
SCOM also has the ability to monitor applications as if a user were accessing the application, not just based on whether a system is operational. A system can appear to be fully operational, yet when users try to log on to the system, they could get logon errors or terrible access performance. SCOM can automate this check by having a client system log on to a web server or an application server with stored credentials and validate that systems throughout the enterprise are more than operational and are serving users as expected.
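The following Python sketch shows the general shape of such a user-perspective check. It is a simplified illustration, not SCOM's mechanism: the scripted transaction is passed in as an ordinary function, and the names `run_synthetic_check`, `fast_logon`, and `rejected_logon` and the two-second threshold are hypothetical.

```python
import time


def run_synthetic_check(transaction, max_seconds, clock=time.perf_counter):
    """Run a scripted user transaction and classify the outcome.

    Returns "failed" if the transaction raises an exception, "degraded" if it
    succeeds but takes longer than max_seconds, and "healthy" otherwise.
    """
    start = clock()
    try:
        transaction()
    except Exception:
        return "failed"
    elapsed = clock() - start
    return "degraded" if elapsed > max_seconds else "healthy"


# Hypothetical stand-ins for a scripted logon against a web or application server.
def fast_logon():
    pass


def rejected_logon():
    raise PermissionError("logon rejected")


print(run_synthetic_check(fast_logon, max_seconds=2.0))
print(run_synthetic_check(rejected_logon, max_seconds=2.0))
```

The "degraded" result is the interesting one: the server answered, so a simple up/down check would pass, yet a user would experience the slow response.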
SCOM can also be used to produce reports that help auditors and regulators validate that the organization's IT operations meet regulatory compliance requirements. Automated report generation for information such as password attempt violations, service-level agreement details, encrypted data access validation, and the like makes SCOM more than just a monitoring tool, but an information compliance reporting tool.
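To make the idea of an aggregated compliance report concrete, here is a minimal Python sketch that totals failed password attempts per server from a consolidated list of events. The event fields (`server`, `type`, `user`) and the `failed_logon` label are hypothetical and do not reflect SCOM's actual data schema.

```python
from collections import Counter


def failed_logons_by_server(events):
    """Count failed logon events per server across all collected events."""
    return Counter(e["server"] for e in events if e["type"] == "failed_logon")


# Hypothetical events consolidated from several monitored servers.
events = [
    {"server": "DC01", "type": "failed_logon", "user": "jsmith"},
    {"server": "DC01", "type": "failed_logon", "user": "jsmith"},
    {"server": "EXCH01", "type": "service_restart", "user": "system"},
    {"server": "WEB02", "type": "failed_logon", "user": "guest"},
    {"server": "DC01", "type": "failed_logon", "user": "admin"},
]

for server, count in failed_logons_by_server(events).most_common():
    print(f"{server}: {count} failed password attempts")
```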
The bottom line is that SCOM helps IT personnel identify problems that need to be fixed before the problems create downtime that impacts the operations of the business. This is critical in keeping employees productive when the servers are internal, and helps an organization maintain business continuity when its servers host applications that generate revenue. A properly designed, implemented, and configured monitoring tool like SCOM can mean the difference between an organization focused on productivity and continuity and an organization that is constantly recovering from system failures.
Major Features of System Center Operations Manager
System Center Operations Manager 2007 R2 has hundreds of features and functions that an IT administrator can leverage as part of their system monitoring and proactive management practices; some of the major features in the product are as follows:
- Server and client system monitoring—Key to SCOM is its ability to monitor servers and client systems. Using an agent that installs on the system (or agentless if the administrator desires), information about the system(s) is reported back to the SCOM monitoring server with operational data tracked and logged on a continuous basis.
- Event correlation—SCOM is smart enough to know that when a wide area network (WAN) connection is down, the status of all of the servers and devices on the other side of the WAN connection becomes unknown. Rather than sending hundreds of alerts that SCOM has lost contact with every device on the other side of a WAN, SCOM instead sends a single alert that the WAN connection is down and that the devices on the other side of the WAN are in an unknown state.
- Event log collection—Key to regulatory compliance reporting is to note system changes as well as potential security violations. SCOM has the ability to collect event logs and syslogs from systems, consolidate the data, and provide reports on the aggregate of information such as failed password attempts against all monitored servers in the environment.
- System monitoring—Monitoring in SCOM is more than just noting that a system is up or down, but also the general response time of the system and applications running on the system. Specific applications can be monitored using SCOM, such as monitoring SharePoint servers, SQL servers, or Exchange servers, as shown in Figure 1.5.
Figure 1.5 System monitoring and alerting in SCOM of specific servers in an environment.
- Client system monitoring—Added in recent updates to SCOM is the ability to monitor and report on not just servers, but also client workstations in a network. Client system monitoring is commonly used to monitor and help manage and support critical client systems. A critical client system might be a laptop that belongs to a key executive, or it could be a workstation that serves as a print server or data collection device. Whatever the case, SCOM has the ability to monitor servers as well as client systems in the enterprise.
- Application monitoring—SCOM has the ability to monitor specific application and website URLs, not just to see if the servers are running or if the website is responding, but to actually confirm that the site is responding in a timely manner. This deep level of monitoring, shown in Figure 1.6, confirms response time and can even have test user accounts log on to session states to validate that a site or protected site is responding as expected.
Figure 1.6 Monitoring applications and web URLs in SCOM.
- Service-oriented management—Traditional system monitoring treated all systems the same: whether a system was the only one of its type in a network or one of multiple redundant nodes, any system failure would result in a page or alert. SCOM is service oriented, meaning that if multiple servers exist for redundancy, the administrator will not be urgently paged or alerted if one of many systems is down. As long as the service (such as email routing, web hosting, or domain authentication) continues to operate, a different level of response (such as an email notification instead of an urgent page) is triggered.
- Integrated solutions databases—For administrators debugging a problem, the process usually involves grabbing event errors out of the log files, going to Microsoft TechNet to research the information, finding the solution, and then going back to the server to try the solution. SCOM has Microsoft's TechNet information integrated into the system so that when an event occurs and shows up on the SCOM console, right there with the event error is the symptom information and recommended solution that an administrator would normally find in TechNet online. Additionally, SCOM not only tells the administrator what to do (such as stop and start a service), but also presents a Restart Service option on the SCOM console screen that the administrator can simply click to restart the service. If that solution solves the problem, SCOM allows the administrator to choose to have that solution (such as restarting the service) run automatically the next time the event occurs on any server in the environment. This self-healing process allows an organization to set processes that automatically trigger and resolve problems without having an administrator manually identify and perform a simple task.
- Service-level agreement (SLA) tracking and reporting—Many organizations define, publish, and manage to a specific service-level agreement metric, so when a network service is offline or degraded, the outage counts against the service level and the overall service-level agreement can be measured. SCOM has reports as well as a Dashboard view component that show administrators the status of system operations in the network.
- Reporting—With previous versions of SCOM, reporting was an external add-on. Effectively, if an administrator wanted a report on the status of systems, a separate report tool was run. With the latest releases of SCOM, the reports are available right within the SCOM console. From a common console, an administrator can monitor systems as well as generate reports on every managed system in the environment.
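The event correlation feature described in the list above can be sketched in a few lines of Python. This is only an illustration of the general technique, under the assumption that each device has at most one upstream component and that the dependency map contains no cycles; the device names and the `upstream` map are hypothetical, and SCOM's own correlation logic is not shown here.

```python
def root_cause(device, unreachable, upstream):
    """Walk up the dependency chain to the topmost unreachable component."""
    current = device
    while upstream.get(current) in unreachable:
        current = upstream[current]
    return current


def correlate(unreachable, upstream):
    """Group unreachable devices under the component that cut them off.

    Returns a dict mapping each root cause to the list of devices whose
    status is unknown because of it.
    """
    grouped = {}
    for device in sorted(unreachable):
        root = root_cause(device, unreachable, upstream)
        grouped.setdefault(root, [])
        if device != root:
            grouped[root].append(device)
    return grouped


# Hypothetical topology: three branch-office devices are reached through one WAN link.
upstream = {
    "BRANCH-DC01": "WAN-LINK-1",
    "BRANCH-FS01": "WAN-LINK-1",
    "BRANCH-PRN01": "WAN-LINK-1",
}
unreachable = {"WAN-LINK-1", "BRANCH-DC01", "BRANCH-FS01", "HQ-WEB03"}

for root, hidden in correlate(unreachable, upstream).items():
    if hidden:
        print(f"ALERT: {root} is down; status unknown for {', '.join(hidden)}")
    else:
        print(f"ALERT: {root} is down")
```

Four unreachable components produce two alerts rather than four: one for the WAN link, which also lists the devices behind it, and one for the unrelated headquarters server.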
Background on System Center Operations Manager
System Center Operations Manager 2007 R2 has over a decade of history at Microsoft, and the underlying technology had many years of history before Microsoft acquired it in 1999. From its early roots as Microsoft Operations Manager 2000 to what is now System Center Operations Manager 2007 R2, SCOM has come a long way.
Some of the major revisions and history of the product are as follows:
- NetIQ Enterprise Event Manager—System Center Operations Manager has its roots from a 1999 product acquisition Microsoft made from NetIQ. The product, NetIQ Enterprise Event Manager, was already a well-established tool for monitoring network environments and formed the basis of Microsoft's operations management offering.
- Microsoft Operations Manager (MOM) 2000—In 2000, Microsoft took the NetIQ product and rebranded it as Microsoft Operations Manager 2000, adding a little support for monitoring and managing the newly released Active Directory and Windows 2000 Server; however, for the most part, MOM 2000 was the NetIQ product with a new name. For the next five years, Microsoft released service packs and management packs to update the product to support all of the new Active Directory–supported products Microsoft was releasing, such as Exchange 2000 Server, Exchange Server 2003, SharePoint Portal Server 2001, SQL Server 2000, and the like.
- Microsoft Operations Manager (MOM) 2005—With the release of MOM 2005, Microsoft now had its first fully revised Microsoft monitoring and management product. Most organizations would consider this the Microsoft v2.0 of the product where core components such as event monitoring, event correlation, proactive monitoring, integration with TechNet support data, and the like made MOM 2005 a good Microsoft-focused monitoring and alerting product.
- System Center Operations Manager 2007—SCOM 2007 was a major improvement from Microsoft and one where the product was truly revised to meet the needs of enterprises. SCOM 2007 was now fully integrated with Active Directory so that servers and server roles (such as all Exchange front-end servers or all domain controllers) could be identified as a group. Role-based security was added so that there was more granular control over the views and tasks available to an administrator. Also added was an audit log collection system: auditors and regulators were looking for consolidated log information, and SCOM 2007 was able to extract log information and make it available for reporting.
- System Center Operations Manager 2007 SP1—SCOM 2007 SP1 included a rollup of all hotfixes for SCOM 2007 and support for Windows Server 2008 as the base operating system that SCOM could run on.
- System Center Operations Manager 2007 R2—For those who have been using SCOM for a long time, the release of SCOM 2007 R2 was seen as a huge turning point in making SCOM a truly enterprise monitoring and management solution. SCOM 2007 R2 added support for not only Windows-based servers and applications, but also non-Windows-based systems, with UNIX and Linux system monitoring. SCOM 2007 R2 can also granularly define Service Level Objectives, such as monitoring and assessing the response time of a specific logon procedure or web page view rather than simply pinging the system to see if it is up. In addition, scalability improved significantly: monitoring workloads can now be measured in the thousands of events per agent, allowing SCOM to reach into the largest data centers to manage Windows and non-Windows servers, network appliances and devices, and client systems throughout an enterprise.
What to Expect in the System Center Operations Manager Chapters
In this book, four chapters are dedicated to the System Center Operations Manager product. These chapters are as follows:
- Chapter 6, "Operations Manager Design and Planning"—This chapter covers the architectural design, server placement, role placement, and planning of the deployment of System Center Operations Manager 2007 R2 in the enterprise. The chapter addresses where to place management servers and where management packs fit in to SCOM for providing better data collection and reporting. This chapter also introduces the various server roles and how the server roles can be placed on a single server in a small environment or distributed to multiple servers, including best practices that have been found in combining certain roles and the logic behind combining roles even in the largest of enterprises.
- Chapter 7, "Operations Manager Implementation and Administration"—Chapter 7 dives into the installation process of SCOM along with routine administrative tasks commonly used in managing an SCOM environment. This includes the familiarization of the SCOM management console features and how an administrator would use the management console to perform ongoing tasks.
- Chapter 8, "Using Operations Manager for Monitoring and Alerting"—Chapter 8 gets into the meat of SCOM, focusing on core capabilities: monitoring individual servers and events, monitoring a collection of servers, and creating event correlation that associates a series of servers, network devices, and applications for a better monitored view of key applications and network resources. Many organizations tend to just turn on the basic monitoring that SCOM has, which is good, but that's not where the value is in SCOM. The value is in creating automation tasks so that when an event occurs, SCOM can automatically assess the problem, correlate the problem to other events, and send the IT administrator a specific notification or alert that will help the administrator better manage the environment as a whole. This chapter covers the process and digs into tips, tricks, and lessons learned, sharing best practices for monitoring and alerting in the enterprise.
- Chapter 9, "Using Operations Manager for Operations and Security Reporting"—The final chapter on SCOM in this book covers the reporting capabilities built in to SCOM. In earlier versions of the Operations Manager product, Crystal Reports was used as an external reporting tool that reached into the MOM databases to generate reports, which was cumbersome and really more of an afterthought for reporting. With SCOM 2007 R2, reporting is done through SQL Reporting Services and integrated right into the main SCOM console. Rather than seeing reporting as something some people use occasionally, SCOM's reporting takes management reports seriously as compliance officers, auditors, and executives want and need meaningful reports on the operations and management of their systems. SCOM 2007 R2 reporting provides out-of-the-box reports to track the most common business information reports needed out of the monitoring and security alerting system, with the ability to customize reports specific to the needs of the organization. This chapter covers the out-of-the-box reports as well as how an administrator can customize reports specific to their needs.
System Center Operations Manager 2007 R2 is a very powerful tool that helps network administrators proactively monitor their servers and network devices, both Microsoft and non-Microsoft, and address problems before downtime occurs. Jump to Chapters 6 through 9 of this book for specific deployment and configuration guidance on how SCOM can best be leveraged in your enterprise.