Measuring Your Current Performance Usage
Measuring the performance of your current servers tells you how much CPU, memory, disk, and network capacity they actually consume. With that information, you can size your virtual hardware properly and avoid bottlenecks on your ESX hosts. Do this before you start your project so that you do not run into surprises that cause problems during your deployment phase.
What to Measure
You should focus on four general performance categories: CPU, memory, disk, and network. Gather these metrics for a minimum of one week, and preferably for a full month. A longer collection period gives you a better understanding of performance trends that do not occur on a regular basis. It is also important to gather metrics during critical business cycles (for example, weekly payroll processing or a monthly reporting process) when usage may spike. The combined results will help determine your overall consolidation ratio (number of VMs per ESX host) and how many ESX servers you will need for the physical servers that you want to virtualize. Consolidation ratios can vary from as low as 2:1 to as high as 50:1, depending on the total resources that your VMs require and the size of your ESX host servers.
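To make the idea of a consolidation ratio concrete, the following Python sketch estimates how many hosts a set of measured servers might need. It is a rough illustration only: the dictionary keys, the `headroom` reserve, and the host capacities are hypothetical values chosen for the example, and a real sizing exercise would also account for disk and network.

```python
import math

def estimate_hosts(servers, host_cpu_mhz, host_mem_mb, headroom=0.2):
    """Return (hosts_needed, consolidation_ratio) for measured servers.

    Each server is a dict with hypothetical keys:
      cpus, cpu_mhz, avg_cpu_pct, mem_used_mb
    headroom is the fraction of each host held in reserve (an assumption).
    """
    if not servers:
        return 0, 0.0
    # CPU demand in MHz: processors x clock speed x average utilization
    cpu_demand = sum(
        s["cpus"] * s["cpu_mhz"] * s["avg_cpu_pct"] / 100 for s in servers
    )
    mem_demand = sum(s["mem_used_mb"] for s in servers)
    usable_cpu = host_cpu_mhz * (1 - headroom)
    usable_mem = host_mem_mb * (1 - headroom)
    # The scarcer resource decides how many hosts are needed
    hosts = max(
        math.ceil(cpu_demand / usable_cpu),
        math.ceil(mem_demand / usable_mem),
        1,
    )
    return hosts, len(servers) / hosts

# Example: twenty lightly used servers on hosts with 8 x 2600 MHz cores and 32 GB
measured = [
    {"cpus": 2, "cpu_mhz": 3000, "avg_cpu_pct": 8, "mem_used_mb": 1536}
    for _ in range(20)
]
hosts, ratio = estimate_hosts(measured, host_cpu_mhz=8 * 2600, host_mem_mb=32768)
print(f"{hosts} hosts, consolidation ratio {ratio:.0f}:1")
```

In this example memory, not CPU, is the limiting resource, which is a common outcome when average CPU usage is low.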
Let's go over the categories, some important metrics, and some guidelines on each one.
Measuring CPU Usage
Most Windows servers have very low overall CPU usage (< 10%), which is why virtualization is a great way to make full use of your hardware and reduce the number of physical servers in your environment. Average processor utilization is the best metric for measuring how busy a server actually is. It gives you an overall indication of how much processor capacity the physical server uses, which helps you plan your ESX host size. Most servers will peak near 100% at various times, but the peaks are not as important as the overall average utilization. High processor queue lengths can indicate a bottleneck on a physical server, which may disappear in a virtual environment because of the way the ESX hypervisor schedules and processes CPU requests. Table 1.1 lists the CPU metrics to watch to determine the amount of CPU usage on your servers.
Table 1.1. Important CPU Metrics
| Statistic | Description | Why This Is Important |
| --- | --- | --- |
| Processor queue length (average and maximum) | Processor queue length is the number of threads in the processor queue. There is a single queue for processor time even on computers with multiple processors or cores. Therefore, if a computer has multiple processors, you need to divide this value by the number of processors servicing the workload. | A sustained processor queue length of ten or more threads typically indicates a processor bottleneck. |
| % processor time (average and maximum) | % processor time is the percentage of elapsed time that the processor spends executing a non-idle thread. It is calculated by measuring how long the idle thread is active during the sample interval and subtracting that time from the interval duration; in other words, it is 100% minus the percentage of time the processor was idle. (Each processor has an idle thread that consumes cycles when no other threads are ready to run.) This counter is the primary indicator of processor activity and displays the average percentage of busy time observed during the sample interval. | This value indicates how much CPU your server is actually using, which you can use to plan the amount of CPU needed on a virtual host. |
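The queue-length guideline in Table 1.1 can be expressed as a short Python sketch. The function names are hypothetical, and two choices here are assumptions rather than rules from the table: the threshold of ten is applied after dividing by the number of processors, and "sustained" is interpreted as five consecutive samples.

```python
def per_processor_queue(queue_length, processors):
    """Divide the single system-wide queue by the processors servicing it."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return queue_length / processors

def sustained_cpu_bottleneck(queue_samples, processors,
                             threshold=10, min_consecutive=5):
    """Return True if the per-processor queue length stays at or above
    threshold for min_consecutive samples in a row."""
    run = 0
    for sample in queue_samples:
        if per_processor_queue(sample, processors) >= threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # the streak is broken, start counting again
    return False

# Example: the same samples look very different on 4 and on 8 processors
samples = [4, 48, 44, 40, 42, 45, 3]
print(sustained_cpu_bottleneck(samples, processors=4))
print(sustained_cpu_bottleneck(samples, processors=8))
```

Adjust `threshold` and `min_consecutive` to match your sample interval; a one-minute interval makes five consecutive samples a much stronger signal than a 15-second interval does.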
Measuring Memory Usage
The amount of physical memory that your servers actually use determines how much memory your ESX hosts need to support all the VMs running on them. It is possible to overcommit an ESX host (assign its VMs more memory than the host physically has), but this is not recommended in most cases because VM performance degrades once the host's physical memory has been used up. Table 1.2 lists the memory metrics to watch to determine the amount of memory usage on your servers.
Table 1.2. Important Memory Metrics
| Statistic | Description | Why This Is Important |
| --- | --- | --- |
| Available free memory (average and least) | Available MBytes is the amount of physical memory, in megabytes, immediately available for allocation to a process or for system use. It is equal to the sum of memory assigned to the standby (cached), free, and zero page lists. | This value indicates how much physical memory is not being used by your server. If you have excessive free memory then consider reducing the amount of RAM assigned to the server when moving it to a virtual host. |
| Pages/sec (average and maximum) | Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause systemwide delays. | This value counts the number of times per second that the computer must access virtual memory rather than physical memory. This number normally increases as available memory decreases. Too many pages/sec can cause excessive disk activity and create a disk bottleneck. This often indicates that a system does not have enough physical memory. |
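As a rough illustration of how the Available MBytes figures translate into a VM memory size, the following Python sketch subtracts the lowest observed free memory from installed memory and adds a buffer. The function names, the 25% buffer, and the 256 MB rounding step are assumptions made for the example, not recommendations from VMware or Microsoft.

```python
import math

def memory_in_use_mb(physical_mb, available_mb):
    """Memory actually used: installed memory minus Available MBytes."""
    return physical_mb - available_mb

def suggested_vm_memory_mb(physical_mb, min_available_mb,
                           buffer_pct=25, round_to_mb=256):
    """Suggest a VM memory size based on peak usage plus a safety buffer.

    Peak usage occurs when available memory is at its minimum.
    The result is never larger than the physical server's memory.
    """
    peak_used = memory_in_use_mb(physical_mb, min_available_mb)
    target = peak_used * (1 + buffer_pct / 100)
    rounded = math.ceil(target / round_to_mb) * round_to_mb
    return min(rounded, physical_mb)

# Example: an 8 GB server whose free memory never dropped below 5 GB
print(suggested_vm_memory_mb(physical_mb=8192, min_available_mb=5120))
```

Check Pages/sec before reducing memory this way: a server that is already paging heavily is not a candidate for a smaller allocation.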
Measuring Disk Usage
The important things to know about disk are how much space each server uses (disk space) and how much reading and writing each server does (transfer rate). Disk is the slowest of the resources because it relies on a mechanical device, and it is usually the first performance bottleneck in most systems. Therefore, it is important to understand how much disk activity your servers generate so that you can select a proper storage solution for your virtual hosts. It's also important to factor in the number of spindles (hard disks) in the redundant array of inexpensive disks (RAID) groups on your physical servers. A RAID group with more spindles will have better disk performance than one with fewer spindles. If you virtualize a physical server that has a ten-spindle RAID group, you may not get the same performance if your ESX host is configured with only a five-spindle RAID group. Table 1.3 lists the disk metrics to watch to determine the amount of disk usage on your servers.
Table 1.3. Important Disk Metrics
| Statistic | Description | Why This Is Important |
| --- | --- | --- |
| % disk time | % disk time is the percentage of elapsed time that the selected disk drive was busy servicing read or write requests. | Similar to % processor time, this can be useful in characterizing the workload and gives a general indication of how busy the disk is. |
| Average disk queue length | Average disk queue length is the average number of both read and write requests that were queued for the selected disk during the sample interval. | This tells you how many I/O operations are waiting for the hard disk to become available. This number should be as low as possible; a high number (> 5) can indicate an I/O bottleneck depending on the number of spindles (hard disks) in your RAID group. It's best to divide your average queue length by the number of spindles in your RAID group to get a more accurate number. |
| Disk bytes/sec | Disk bytes/sec is the rate at which bytes are transferred to or from the disk during write or read operations. | This provides information about the throughput of the disk system and how busy it is. |
| Disk transfers/sec | Disk transfers/sec is the rate of read and write operations on the disk. | This is the total number of read and write requests processed per second (commonly known as I/O operations per second or IOPS). Like disk bytes/sec, this also measures the throughput of the system. The difference is that this counter does not consider the size of the request, just the fact that it is a request. |
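Two simple calculations follow from the counters in Table 1.3: the queue length per spindle, and the average size of each I/O request (disk bytes/sec divided by disk transfers/sec). The Python sketch below shows both; the function names are hypothetical.

```python
def queue_per_spindle(avg_queue_length, spindles):
    """Normalize the average disk queue length by the spindles in the RAID group."""
    if spindles < 1:
        raise ValueError("spindles must be at least 1")
    return avg_queue_length / spindles

def average_io_size_kb(bytes_per_sec, transfers_per_sec):
    """Average request size in KB: throughput divided by IOPS."""
    if transfers_per_sec == 0:
        return 0.0  # an idle disk has no meaningful request size
    return bytes_per_sec / transfers_per_sec / 1024

# Example: a six-spindle RAID group moving 8 MB/sec at 256 IOPS
print(queue_per_spindle(avg_queue_length=12, spindles=6))
print(average_io_size_kb(bytes_per_sec=8_388_608, transfers_per_sec=256))
```

The average I/O size helps characterize the workload: many small requests stress IOPS, while fewer large requests stress raw throughput.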
Measuring Network Usage
Network is a resource that is typically plentiful in virtual hosts because you can easily put many multiport high-speed network interface cards (NICs) in your ESX servers. You should still identify any servers that generate a large amount of network traffic so that you can add extra NICs if needed. This information will also help you when you build and configure your virtual switches (vSwitches). In addition, network traffic between VMs that are on the same vSwitch does not go over the physical network (it travels along the host bus), which could reduce the amount of network traffic generated by your servers after they are virtualized. Table 1.4 lists the network metrics to watch to determine the amount of network usage on your servers.
Table 1.4. Important Network Metrics
| Statistic | Description | Why This Is Important |
| --- | --- | --- |
| Bytes total/sec | Bytes total/sec is the rate at which bytes are sent and received over each network adapter, including framing characters. Network Interface\Bytes Total/sec is the sum of Network Interface\Bytes Received/sec and Network Interface\Bytes Sent/sec. | This counter shows the amount of traffic through your network interface in bytes per second. |
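Because NIC speeds are quoted in megabits per second while this counter reports bytes per second, a conversion helps when judging how busy an adapter is. The following Python sketch uses hypothetical function names and assumes decimal megabits (1 Mbps = 1,000,000 bits per second). Since Bytes total/sec combines sent and received traffic, the result is only a rough estimate on a full-duplex link.

```python
def bytes_to_mbps(bytes_per_sec):
    """Convert bytes per second to megabits per second."""
    return bytes_per_sec * 8 / 1_000_000

def link_utilization_pct(bytes_per_sec, link_speed_mbps):
    """Approximate utilization of a NIC as a percentage of its link speed."""
    if link_speed_mbps <= 0:
        raise ValueError("link_speed_mbps must be positive")
    return bytes_to_mbps(bytes_per_sec) * 100 / link_speed_mbps

# Example: 12.5 million bytes/sec on a gigabit NIC
print(bytes_to_mbps(12_500_000))
print(link_utilization_pct(12_500_000, link_speed_mbps=1000))
```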
How to Measure It
You can determine your current performance usage in a number of ways:
- Use existing enterprise monitoring systems.
- Use operating system built-in performance monitoring tools (PerfMon).
- Use third-party analysis tools such as PlateSpin PowerRecon or Tek-Tools Profiler for VMware.
- Contact a VMware business partner and have them install the Capacity Planner tool in your environment.
VMware's Capacity Planner
Capacity Planner is a powerful tool that automatically collects all the relevant performance metrics on each Windows server in your environment and prepares a report that you can use to determine the hardware requirements for your virtual environment. It can identify trends in your environment and make recommendations for grouping physical servers on virtual hosts. It uses the built-in Microsoft performance counters and does not require an agent on each server that will be analyzed (it uses WMI and the Remote Registry service). The Enterprise dashboard screen from Capacity Planner, along with all the other available options, is shown in Figure 1.1.
Figure 1.1 Sample screen from VMware's Capacity Planner tool
Currently, Capacity Planner is provided by VMware to its business partners only and is not available to the general public. Most business partners will install and configure it in your environment for you for free as long as you plan on buying software/hardware and professional services from them for your virtualization project. Using Capacity Planner is the best method for collecting data from your servers and reporting on it, because it was developed specifically for infrastructure assessment and data analysis and will provide consolidation estimates, recommendations, and capacity assessments.
Beginning with vCenter Server version 2.5, a "lite" version of Capacity Planner was integrated into vCenter Server as a feature called Guided Consolidation. This utility uses a built-in wizard to discover physical systems and analyze them to prepare them to be converted into VMs. Once these systems have been analyzed, they can be converted into VMs by the built-in VMware Converter feature of vCenter Server 2.5. The data gathered by this utility is basic and does not use some of the more advanced metrics that the full version of Capacity Planner uses. It can analyze up to 100 systems simultaneously and reports only on average CPU and memory utilization. Because of its limitations, it is recommended that you use a more robust performance monitor for your initial implementation. We discuss this feature in detail in a later chapter.
Using Built-In Operating System Tools to Gather Server Performance Statistics
For your Windows servers, you can use the Windows built-in performance monitor utility (PerfMon) to measure your server's statistics. The downside to this method is that you will have to set up, collect, and review the statistics for each server individually, which can be time-consuming if you have many servers. Alternatively, you can set up a dedicated workstation or server to centrally monitor and collect statistics from each server.
Most Linux servers include only real-time statistics reporting tools by default. Consider using a free tool that provides historical performance reporting for Linux servers, such as Sysstat (http://pagesperso-orange.fr/sebastien.godard/).
If you do choose to use PerfMon to gather your statistics, the following steps will help you set up and configure it. Before you begin, if you are going to use a central workstation to collect statistics, make sure the Performance Logs and Alerts service on the workstation is configured to start with a domain account that has access to every server that you want to monitor:
- Load the PerfMon utility on a workstation or server (Administrative Tools > Performance).
- In the left pane, select Counter Logs (located under Performance Logs and Alerts).
- Select Action from the top menu (or right-click Counter Logs) and choose New Log Settings.
- Enter a descriptive name for your log settings.
- Click the Add Counters button.
- Choose the Select Counters from Computer option, and type the name of one of the servers you are going to monitor in the field below it. Be sure to include the \\ before the Windows server name.
- After you enter your server name, it will connect to it and display a list of available counters below it, as shown in Figure 1.2.
Figure 1.2 PerfMon Add Counters window
- Select the performance object whose counters you want to add (for example, Processor, Memory, or Network Interface), and then select the individual counter (for instance, Pages/sec). If All Instances is applicable and not grayed out, select it. The exception is Network Interface, where you should select the individual adapters so that you do not include the Loopback interface. Then click the Add button.
- Repeat this for every performance counter that you want to monitor on the server. The recommended counters you will want to add are listed here:
  - Memory: Available MBytes
  - Memory: Pages/sec
  - Processor: % Processor Time
  - System: Processor Queue Length
  - Network Interface: Bytes Total/sec
  - Physical Disk: % Disk Time
  - Physical Disk: Avg. Disk Queue Length
  - Physical Disk: Disk Bytes/sec
  - Physical Disk: Disk Transfers/sec
- After you have added all counters for a particular server, you can type in a new server name to continue adding counters for other servers.
- Click the Close button after you have added all counters.
- Select the data sample interval, as shown in Figure 1.3. The default is 15 seconds, which is an aggressive interval: the shorter sampling period captures more short-lived peaks. Consider changing this to a longer interval, between one and five minutes, so that you do not overwhelm the workstation and cause it to miss data from some of the servers.
Figure 1.3 PerfMon Log Settings window
- Click OK to save your custom log settings.
- Collection will automatically begin (as indicated when the icon turns green). The results will be written to a log file (for example, C:\PerfLogs\MyServers000001.blg). You can stop it at any time by selecting your log settings and selecting Action, Stop (or by right-clicking it and selecting Stop). When you stop a collection, the log file it has written to is no longer used; a new log file is created once you start it again.
- If you have stopped your collection, you can review it by selecting System Monitor in the left pane, and then clicking the Disk icon (View Log Data). Then, on the Source tab, select your log file that was created; optionally, you can change the time range. On the Data tab, add your performance counters for each server. On the General tab, select your view type (Graph, Histogram, or Report) and click OK. Your counter will be displayed, and you can see the minimum, maximum, and average results for each one, as shown in Figure 1.4.
Figure 1.4 PerfMon resulting historical data for each counter
- It's a good idea to test this for a short period (for example, one hour) and review the results to make sure it is working before you leave it running for a longer period of time.
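If you collect data from many servers, reviewing each counter in System Monitor becomes tedious. As an alternative, the following Python sketch computes the minimum, average, and maximum for every counter in a log that has been saved as comma-separated text (for example, by converting the binary log with the Windows relog utility, as in `relog MyServers000001.blg -f csv -o MyServers.csv`). The function name is hypothetical, and the sketch assumes the usual PerfMon CSV layout: a header row of counter paths, a timestamp in the first column, and blank cells for missing samples.

```python
import csv
from statistics import mean

def summarize_perfmon_rows(lines):
    """Return {counter_path: (minimum, average, maximum)}.

    lines is any iterable of CSV text lines, such as an open file.
    The first column (the timestamp) is ignored.
    """
    reader = csv.reader(lines)
    header = next(reader)
    counters = header[1:]
    values = {name: [] for name in counters}
    for row in reader:
        for name, cell in zip(counters, row[1:]):
            try:
                values[name].append(float(cell))
            except ValueError:
                pass  # blank or non-numeric sample, skip it
    return {
        name: (min(v), mean(v), max(v))
        for name, v in values.items()
        if v  # leave out counters that never produced a value
    }

# Example with three samples; the first Pages/sec sample is blank
sample_log = [
    r'"(PDH-CSV 4.0) (Eastern Standard Time)(300)","\\SRV1\Memory\Pages/sec"',
    '"01/05/2009 10:00:00.000"," "',
    '"01/05/2009 10:01:00.000","10.0"',
    '"01/05/2009 10:02:00.000","30.0"',
]
for counter, (low, avg, high) in summarize_perfmon_rows(sample_log).items():
    print(counter, low, avg, high)
```

To summarize a real file, open it with `open(path, newline="")` and pass the file object to the function. The results can be pasted directly into the server inventory spreadsheet described later in this section.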
Using Enterprise Monitoring Systems
If you are using an existing monitoring system, report only on the statistics that are relevant to sizing your virtual hosts. Too many statistics can make it harder to determine how busy a server is in each of the categories. Also, remember that after you convert your physical servers to VMs, your enterprise monitoring system may not report accurate statistics because of the differences inherent in virtual environments.
What to Do with the Data You Collect
After you have gathered your performance statistics, you should group your servers into three categories:
- High overall resource utilization
- Medium overall resource utilization
- Low overall resource utilization
Then identify the servers that have the highest resource utilizations in specific areas: CPU, memory, disk, and network. You should then review the servers in the high overall resource utilization category to make sure that virtualizing them makes sense. Also, do the same for the top few servers in each of the specific resource areas. When you've determined which servers you want to virtualize, you can move on to sizing your hardware to match your expected workload.
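The grouping described above can be sketched in Python as follows. This is only one possible approach: it assumes each resource has already been expressed as a percentage of capacity, takes a simple average as the overall figure, and uses cutoffs of 30% and 60% that are placeholders for whatever thresholds suit your environment. The function names and server names are hypothetical.

```python
def overall_category(usage, medium_at=30.0, high_at=60.0):
    """Classify a server as 'high', 'medium', or 'low' overall utilization.

    usage maps resource name to percent utilization, for example
    {"cpu": 8, "memory": 35, "disk": 12, "network": 5}.
    """
    score = sum(usage.values()) / len(usage)
    if score >= high_at:
        return "high"
    if score >= medium_at:
        return "medium"
    return "low"

def top_consumers(servers, resource, n=3):
    """Return the names of the n servers with the highest usage of one resource."""
    ranked = sorted(servers, key=lambda name: servers[name][resource], reverse=True)
    return ranked[:n]

# Example inventory with made-up measurements
servers = {
    "web01": {"cpu": 8, "memory": 35, "disk": 12, "network": 5},
    "sql01": {"cpu": 70, "memory": 85, "disk": 65, "network": 40},
    "file01": {"cpu": 15, "memory": 50, "disk": 55, "network": 20},
}
for name, usage in servers.items():
    print(name, overall_category(usage))
print(top_consumers(servers, "disk", n=2))
```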
It is helpful to put together a spreadsheet that contains the following information about your physical servers to help you add up the amount of CPU, memory, and disk needed for your ESX hosts:
- Server name
- Model
- Operating system
- Function
- Number of CPUs
- Speed of CPUs
- Total disk space
- Total disk space used
- Physical memory
Next, add your performance measurements to it:
- Average CPU usage (% processor time)
- Maximum CPU usage (% processor time)
- Average processor queue length
- Maximum processor queue length
- Average available free memory
- Minimum available free memory
- Average memory pages/sec
- Maximum memory pages/sec
- Average % disk time
- Maximum % disk time
- Average disk queue length
- Maximum disk queue length
- Average disk bytes/sec
- Maximum disk bytes/sec
- Average disk transfers/sec
- Maximum disk transfers/sec
- Average network bytes total/sec
- Maximum network bytes total/sec
Finally, add a ranking for each resource using a scale of one (least) to five (most), based on the averages for the measurements in each category. This ranking shows you where each server stands in usage for each of the resource areas. A server that has a high ranking in more than two of the following categories may not be a good virtualization candidate:
- CPU resource usage
- Memory resource usage
- Disk resource usage
- Network resource usage
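The ranking step can be automated once the averages are in your spreadsheet. The Python sketch below maps a measurement to a rank of one to five using four bucket edges and then applies the "more than two categories" guideline. The bucket edges and the decision to treat ranks of four and five as "high" are assumptions for illustration; choose values that fit your own measurements.

```python
from bisect import bisect_right

def rank(value, edges):
    """Map a measurement to a rank from 1 (least) to 5 (most).

    edges holds four ascending bucket boundaries; a value equal to an
    edge falls into the higher bucket.
    """
    if len(edges) != 4:
        raise ValueError("exactly four bucket edges are required")
    return bisect_right(edges, value) + 1

def poor_candidate(ranks, high_rank=4, max_high_categories=2):
    """Return True if more than max_high_categories resources rank high."""
    high = [name for name, r in ranks.items() if r >= high_rank]
    return len(high) > max_high_categories

# Example: hypothetical bucket edges for average % processor time
cpu_edges = (10, 25, 50, 75)
print(rank(8, cpu_edges), rank(60, cpu_edges), rank(80, cpu_edges))
print(poor_candidate({"cpu": 4, "memory": 5, "disk": 4, "network": 2}))
```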
When you are done, you will have a spreadsheet that contains an inventory of all your physical servers and the resource usage statistics that you can use to help size your ESX hosts properly. In the next chapter, we discuss sizing hardware for your ESX hosts.