Addressing Disk Subsystem Bottlenecks
After you have established that the disk subsystem is the bottleneck slowing down your Exchange system, you can implement the following solutions to resolve the problem.
Again, these recommendations are guidelines only. However, they are guidelines that address the most common root causes of disk bottlenecks.
Separating All Transaction Logs
In earlier versions of Exchange, you could significantly enhance Information Store performance by placing the transaction log files on separate drives. These transaction logs are critical to the operation of the Exchange server and should be protected against failure by using hardware mirroring.
The same rule applies to Exchange 2000. In Exchange 2000, each transaction log set belongs to a storage group, and the number of transaction log drives in your server should equal the number of planned storage groups. The file system should be formatted with NTFS 5.0. It is recommended that each transaction log set be placed on a separate spindle for optimum performance.
Installing Additional Hard Disks
You can separate Windows 2000 processes (for example, paging file) and Exchange processes (for example, message-tracking logs) to enhance performance. You can also move the public and private Information Stores' transaction logs to separate disks or arrays for even better performance.
Overall, if you have a RAID subsystem, installing more drives coupled with a large RAM cache on the RAID controller yields faster throughput.
Installing Faster Hard Disks and Drive Controllers
Choose a disk with the lowest seek time available (seek time is the time required to move the disk drive's heads from one track of data to another). For small random I/O, the time spent seeking typically exceeds the time spent transferring data by roughly 10 to 1.
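The 10-to-1 figure can be sanity-checked with some quick arithmetic. The seek time, transfer rate, and I/O size below are assumed, typical-for-the-era values, not measurements from any particular drive:

```python
# Illustrative arithmetic (assumed figures): why seek time dominates
# for small random I/O on a mechanical disk.
seek_time_ms = 8.0          # assumed average seek time
transfer_rate_mb_s = 10.0   # assumed sustained transfer rate
io_size_kb = 8.0            # one small random I/O

# Time to actually move the data once the heads are in position.
transfer_time_ms = (io_size_kb / 1024) / transfer_rate_mb_s * 1000
ratio = seek_time_ms / transfer_time_ms
print(f"transfer: {transfer_time_ms:.2f} ms, seek/transfer ratio: {ratio:.0f}:1")
```

With these assumptions the drive spends about ten times longer positioning its heads than transferring the data, which is why lower seek time matters more than raw transfer rate for a random-access workload such as an Information Store.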
Determine what type of transfers the controller card performs: 16-bit, 32-bit, or 64-bit. The more bits per transfer operation, the faster the controller moves data. Ultra-SCSI technology, available from several vendors, offers an excellent combination of these features. The best way to judge is to run your own performance simulations by using the guidelines, counters, and utilities (such as Load Simulator) outlined in this article. Short of performing a full-scale simulation, your best bet is to find white papers from independent testing firms that profile the latest technology.
Using RAID Disk Striping to Increase Performance
Use RAID 0 (disk striping) to increase overall I/O capacity for random reads and writes; you need at least two physical drives. Use RAID 5 (disk striping with parity) for slightly slower performance but added fault tolerance; you need at least three physical drives.
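The capacity trade-off between the two levels can be sketched as follows. The function name and drive sizes are illustrative, not part of any real tool:

```python
def usable_capacity_gb(level, drives, drive_gb):
    """Usable capacity for RAID 0 (striping) and RAID 5 (striping with parity)."""
    if level == 0:
        if drives < 2:
            raise ValueError("RAID 0 needs at least two physical drives")
        return drives * drive_gb          # every drive holds data
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least three physical drives")
        return (drives - 1) * drive_gb    # one drive's worth of space holds parity
    raise ValueError("unsupported RAID level")

# Four assumed 18 GB drives:
print(usable_capacity_gb(0, 4, 18))  # 72 GB, no redundancy
print(usable_capacity_gb(5, 4, 18))  # 54 GB, survives one drive failure
```

RAID 5's parity costs one drive's worth of capacity and an extra parity write per update, which is the source of its slightly slower write performance.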
If you implement RAID at the hardware level, choose a controller card with a large (4MB) onboard cache. Several vendors offer complete solutions of this type.
Write-Caching Hard Disk Controllers
Recent advances in technology, coupled with growing demand for disk subsystem performance, make it increasingly likely that a server will use a write-caching disk controller. Write caching can affect the transactional integrity of the Exchange Extensible Storage Engine.
The write-ahead log mechanism of the message store's database generally requires that writes not be cached. If, for performance reasons, you decide to turn on caching, you must ensure that cached writes cannot be lost in a system failure. The controller must provide battery backup and other fault-tolerance mechanisms against every condition that could cause dirty (updated) pages in the write cache to be discarded. To meet these requirements, the write-caching hardware must be designed with messaging and database workloads in mind. Such designs include onboard battery backup, interception of the RST bus signal to prevent an uncontrolled reset of the caching controller, and mirrored or error-checking-and-correcting (ECC) memory.
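The write-ahead requirement can be illustrated with a toy sketch. This is not Exchange's actual implementation; the class, file names, and page store below are invented for illustration, but the ordering (force the log record to stable storage before updating the data page) is the principle the Extensible Storage Engine depends on:

```python
import os
import tempfile

class MiniWAL:
    """Toy write-ahead log: the log record must reach stable storage
    (os.fsync) BEFORE the data page is updated. A write-caching controller
    that acknowledges writes before they are durable silently breaks this
    ordering unless battery backup preserves the cache across failures."""

    def __init__(self, log_path):
        self.log = open(log_path, "ab")

    def commit(self, page_id, new_value, apply_page):
        record = f"{page_id}={new_value}\n".encode()
        self.log.write(record)
        self.log.flush()
        os.fsync(self.log.fileno())      # log record is durable on disk...
        apply_page(page_id, new_value)   # ...only then is the page updated

# Demo: a dict stands in for the database's data pages.
pages = {}
log_path = os.path.join(tempfile.mkdtemp(), "edb.log")
wal = MiniWAL(log_path)
wal.commit("page7", "hello", lambda pid, val: pages.__setitem__(pid, val))
```

If the system crashes after the fsync but before the page update, recovery can replay the log; if a caching controller discards the "durable" record instead, the database and log disagree and the store is corrupt. That is why the hardware safeguards listed above are mandatory when caching is enabled.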
Minding Memory Requirements in Exchange
When Exchange runs, it keeps only portions of the data it needs, referred to as pages, in memory at any one time. When it needs a page that is not in RAM (a page fault), Windows 2000 loads that page into physical memory from a peripheral device, usually the hard drive. An average instruction executes from memory in nanoseconds (billionths of a second), while hard drive seek and access times are measured in milliseconds. Retrieving a page from disk is therefore roughly 100,000 times slower than accessing it in memory.
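The 100,000-times figure follows directly from the unit conversion. The instruction time and disk access time below are assumed round numbers chosen to match the article's estimate:

```python
# Assumed round figures: ~100 ns per memory-resident instruction,
# ~10 ms to seek and read a page from the hard drive.
instruction_time_ns = 100
disk_access_ms = 10

# Convert milliseconds to nanoseconds (1 ms = 1,000,000 ns) and compare.
slowdown = (disk_access_ms * 1_000_000) / instruction_time_ns
print(f"page fault is about {slowdown:,.0f}x slower than a memory access")
```

This is why adding RAM to reduce page faults is often the cheapest disk-bottleneck fix of all: every fault avoided saves five orders of magnitude in latency.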