Detecting Disk Subsystem Bottlenecks
With most Exchange systems, the disk subsystem has the most influence on performance.
The primary consideration with the disk subsystem is not its size, but its capability to handle multiple random reads and writes quickly. For example, when Exchange users open their inboxes, the set of properties in the default folder view must be read for approximately the first 20 messages. If the property information is not in the cache, it must be read from disk. Likewise, a message transferred from one server to another must be written to disk before the receiving server can acknowledge its receipt. (This is a safety measure that prevents message loss during power outages.) Now imagine the read and write activity created by 300 heavy email users on one server. Their combined requests would generate a heavy stream of random read and write traffic on the disk subsystem.
CAUTION
Sometimes you see extremely high % Disk Time values and conclude that your disk subsystem is causing a bottleneck. However, you should examine other overview counters before committing to any one direction. For example, when available memory drops to critical levels, Windows 2000 begins to page (write unused data or code to the hard drive to make room for more active programs). In a case of extreme RAM starvation, your disk subsystem can be reading and writing furiously and appear to be bottlenecked. Looking at the memory counters in PerfMon alongside the general disk counters will expose this illusion.
When you examine both memory and disk subsystem counters, you'll notice that during prolonged memory paging, disk activity increases. The solution is to add more memory, not to increase your disk subsystem capacity.
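One way to check this relationship on logged data is to compare disk activity during intervals of heavy paging with disk activity during the remaining intervals. The following Python fragment is a minimal sketch, not a PerfMon feature. It assumes you have exported two equally long sample series from a PerfMon log: a paging counter (Memory: Pages/sec is one candidate) and Physical Disk: % Disk Time. The function name and the paging_threshold parameter are hypothetical, and the threshold is left for you to choose because the text does not specify one.

```python
from statistics import mean


def disk_time_by_paging(pages_per_sec, disk_time, paging_threshold):
    """Average % Disk Time for heavy-paging and light-paging intervals.

    pages_per_sec and disk_time are parallel lists of samples taken at
    the same instants. paging_threshold is an assumption supplied by the
    caller; it is not a value taken from the text.
    Returns (avg_during_heavy_paging, avg_during_light_paging); either
    element is None when no samples fall into that group.
    """
    if len(pages_per_sec) != len(disk_time):
        raise ValueError("sample series must be the same length")

    heavy = [d for p, d in zip(pages_per_sec, disk_time) if p > paging_threshold]
    light = [d for p, d in zip(pages_per_sec, disk_time) if p <= paging_threshold]

    return (mean(heavy) if heavy else None, mean(light) if light else None)
```

If the first value is far higher than the second, the disk is probably busy because of paging, which points to memory rather than the disk subsystem.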
If you suspect that the server's disk subsystem is forming a bottleneck that slows down user requests, examine the following Windows 2000 Performance Monitor counters:
Physical Disk: % Disk Time. % Disk Time is the percentage of elapsed time that the selected disk drive is busy servicing read or write requests. In other words, this counter provides an indication of how busy your disk subsystem is over the time period that you're measuring in PerfMon. A consistent average over 95 percent indicates significant disk activity.
Physical Disk: Current Disk Queue Length. This counter measures the number of requests waiting to use the disk subsystem at the time the performance data was collected. Multispindle disk devices can have numerous requests active at any instant. Requests experience a delay directly proportional to the queue length minus the number of spindles on the disk. This counter should average less than 2 for good performance. Use the Current Disk Queue Length counter combined with the % Disk Time counter to get a good overview of your disk subsystem's workload.
Both counters can monitor either your server's physically installed disk spindles or RAID bundles.
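The two thresholds above can be applied to logged samples with a few lines of Python. This is a minimal sketch under stated assumptions: the function name, the dictionary keys, and the spindles parameter are hypothetical, and requiring both conditions before flagging a bottleneck is an illustrative choice rather than a rule from the text. The limits themselves (95 percent busy, an average queue length of 2) come from the counter descriptions.

```python
from statistics import mean

DISK_TIME_LIMIT = 95.0  # percent busy, from the % Disk Time description
QUEUE_LIMIT = 2.0       # average queued requests, from the queue length description


def assess_disk(disk_time_samples, queue_samples, spindles=1):
    """Summarize % Disk Time and Current Disk Queue Length samples."""
    avg_busy = mean(disk_time_samples)
    avg_queue = mean(queue_samples)

    # Delay is proportional to queue length minus spindle count,
    # so only requests beyond one per spindle count as excess.
    excess_queue = max(0.0, avg_queue - spindles)

    busy = avg_busy > DISK_TIME_LIMIT
    queued = avg_queue >= QUEUE_LIMIT

    return {
        "avg_disk_time": avg_busy,
        "avg_queue_length": avg_queue,
        "excess_queue": excess_queue,
        "busy": busy,
        "queued": queued,
        # Assumption: flag a bottleneck only when both counters agree.
        "suspect_bottleneck": busy and queued,
    }
```

For example, assess_disk([97, 99, 98], [4, 5, 3], spindles=2) reports an average busy time of 98 percent and an average queue of 4 requests, 2 more than the spindles can service at once, so it flags the disk as a suspect. Rule out memory paging before acting on that result.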