13.3 Array Controller Technologies and Capabilities
Several key capabilities and technologies are implemented in HP array controllers, including the following:
- Online spare drives
- Array capacity expansion
- Logical volume extension
- Online RAID-level migration
- Online stripe-size migration
- Hard drive failure prediction
- Dynamic sector repair
- Hot-plug drive support
- Automatic data recovery
- Array accelerator (read/write cache)
- Data protection
- Array performance tuning
It is important for Accredited Integration Specialists to understand these technologies and features.
13.3.1 Online Spare Drives
The online spare drive acts as a temporary replacement for a failed drive, as illustrated in Figure 13-2. One online spare drive can be added to any fault-tolerant logical drive (RAID 0 is not supported). An online spare may be assigned to more than one array, if efficient use of drive capacity is important. The capacity of the online spare must be at least as large as that of the other drives in the array. All HP Smart Array controllers support up to four online spare drives.
Figure 13-2 How an online spare drive works.
When a data drive fails, the online spare drive automatically starts to rebuild the data of the failed drive. After the online spare drive has been completely rebuilt, the failure of a second drive can be handled without data loss. A second drive most likely will not fail until the online spare drive has been rebuilt; nevertheless, only ADG can handle two simultaneous drive failures in all cases.
As soon as the failed drive is replaced, data is automatically rebuilt on the new drive. After data has been completely rebuilt on the new drive, the online spare switches back to its role as an online spare drive. This avoids roaming online spare drives.
Data is rebuilt to the online spare at a rate of 10 to 20 minutes per gigabyte, depending on the priority assigned to rebuilding and the total number of drives in the array.
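The rate quoted above can be turned into a rough planning estimate. The following is a minimal sketch in Python, assuming the rebuild time scales linearly with capacity; the function name and the 36GB example drive are illustrative and do not come from any HP tool.

```python
def rebuild_time_minutes(capacity_gb, min_per_gb_low=10, min_per_gb_high=20):
    """Return (best_case, worst_case) rebuild time in minutes.

    Assumes a linear rate of 10 to 20 minutes per gigabyte, as quoted in
    the text. Real times also depend on rebuild priority and drive count.
    """
    if capacity_gb < 0:
        raise ValueError("capacity_gb must be non-negative")
    return capacity_gb * min_per_gb_low, capacity_gb * min_per_gb_high


# Hypothetical example: a 36GB drive being rebuilt onto the spare.
low, high = rebuild_time_minutes(36)
print(f"36GB drive: {low / 60:.1f} to {high / 60:.1f} hours")
```

An estimate like this is only a starting point; a low rebuild priority on a busy server can stretch the time well beyond the upper bound.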
The online spare drive does not have to be partitioned or formatted. It is always active and running, even when it is not in use.
Insight Manager 7 can monitor the online spare drive just like all the other active drives.
The online spare drive is available for RAID 1, RAID 1+0, RAID 4, RAID 5, and RAID ADG.
Selecting a high rebuild priority results in reduced server performance while the rebuild is in progress. Setting the rebuild priority to low allows normal server performance, because rebuilding only occurs when the server is idle; rebuild time can be significantly longer depending on system activity.
13.3.2 Array Capacity Expansion
To perform an online array expansion, install a new drive in a hot-pluggable drive bay and use the ACU to add the new drive to an existing array. Figure 13-3 illustrates capacity expansion.
Figure 13-3 Array capacity expansion.
All data is relocated after the expansion process is started. Redistributing data across all the drives creates free space in each drive. These zones on all drives are then available to create a new logical drive or extend the capacity of an existing logical drive.
When the new logical drive is presented to the operating system after the expansion process, the operating system does not see a larger drive. It sees the old logical drive and a new logical drive. The expansion process is independent of the operating system. For example, if a 10GB logical volume is expanded from four drives to six drives, the operating system is unaware of this change.
Physical drive expansion does not create a larger logical drive, but creates a new logical drive. It is visible to the operating system after the expansion process is completed.
Drive array expansion is performed at the array controller level, not at the logical drive level. In most cases, all disk drives attached to a controller should be grouped together into a single array. This provides the most efficient use of RAID fault tolerance. Using the ACU, you can assign physical drives to an array and designate up to four drives per array controller as online spares.
Up to 32 logical drives can be defined with any HP Smart Array controller. All drives within an array should be the same size. If disks of higher capacity are installed within a single array, the extra capacity will not be available. Some operating systems support fewer than 32 logical drives.
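The effect of mixing drive sizes in one array can be checked with simple arithmetic. This sketch assumes that each drive contributes only as much capacity as the smallest drive in the array, which is how the paragraph above describes the behavior; the function names are hypothetical, and RAID overhead is deliberately ignored.

```python
def array_raw_capacity_gb(drive_sizes_gb):
    """Capacity the array can use before RAID overhead.

    Every drive contributes only as much as the smallest drive.
    """
    if not drive_sizes_gb:
        return 0
    return min(drive_sizes_gb) * len(drive_sizes_gb)


def unavailable_capacity_gb(drive_sizes_gb):
    """Capacity on larger drives that the array cannot use."""
    return sum(drive_sizes_gb) - array_raw_capacity_gb(drive_sizes_gb)


# Hypothetical example: two 36GB drives and one 72GB drive in one array.
sizes = [36, 36, 72]
print(array_raw_capacity_gb(sizes), unavailable_capacity_gb(sizes))
```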
Under Windows 2003, Windows 2000, Windows NT, Linux, and Novell NetWare, the ACU can be started online. The server does not have to be powered down when disks are configured.
The amount of time required to perform the online capacity expansion depends on several parameters, including drive speed, server processor speed, the amount of I/O work the server is doing, and the priority level of the capacity expansion.
The priority level can be changed from low (the default in ACU) to medium or high to expand the volumes as quickly as possible. Depending on these factors, the expansion process takes 10 to 15 minutes per gigabyte.
All current HP Smart Array controllers support online array expansion without data loss. Data reallocation runs as a background process. It can be assigned a high, medium, or low priority depending on the performance required when the data is reallocated. RAID protection is maintained throughout reallocation. The time required for data reallocation depends on the size of the logical drive.
13.3.3 Logical Volume Extension
Performing a drive extension is the process of growing the size of a logical drive. In this case, the increased size of the logical drive is reported to the operating system. Volume extension is illustrated in Figure 13-4.
Figure 13-4 Logical volume extension.
Only operating systems that support volume extension can use the added capacity without losing data.
Not all operating systems support online logical drive extension through the ACU.
Windows, NetWare, and other advanced operating systems support volume and logical drive extension, which enables you to add additional drives to an existing RAID set and extend the logical drive so that it displays as free space at the end of the same drive presented to the operating system.
Linux only supports volume and logical drive extension at the operating system level. It is not supported through the logical drive extension on the array controller.
You can use the Diskpart.exe command line utility, included with Windows Server 2003 or the Windows 2000 Resource Kit, to extend an existing partition into free space.
HP OpenView Storage Volume Growth enables dynamic expansion of volumes on Microsoft Windows 2000 or Windows Server 2003 basic disks.
Third-party software vendors have created utilities that can be used to repartition disks without data loss. Most of these utilities work offline.
Some operating systems require updates or service packs to support volume or logical drive extension. For example, Windows 2000 requires at least SP3 if you are using dynamic disks. For basic disks, Windows does not require SP3.
13.3.4 Online RAID-Level Migration
All current HP array controllers support RAID-level migration. You can easily migrate a logical drive to a new RAID level. There might need to be unused drive space available on the array for the migration to be possible, depending on the initial and final settings for the stripe size and RAID level.
Online RAID-level migration is illustrated in Figure 13-5.
Figure 13-5 Online RAID-level migration.
In a Windows or NetWare architecture, this can be performed online without disrupting system operation or causing data loss. Offline migration can be performed with any operating system.
13.3.5 Online Stripe-Size Migration
All current HP array controllers also support stripe-size migration. You can easily change the stripe size of an existing logical drive using the ACU. In a Windows and NetWare architecture, this can be performed online without disrupting system operation or causing data loss. The default data stripe size for controllers differs depending on which fault-tolerant RAID is used.
13.3.6 Hard Drive Failure Prediction Technology
HP pioneered failure prediction technology for hard disk drives in the form of monitoring tests run by Smart Array controllers. Called Monitoring and Performance (M&P) or Drive Parameter Tracking, these tests externally monitor hard drive attributes such as seek times, spin-up times, and media defects (more than 20 parameters) to detect changes that could indicate potential failure.
The flowchart in Figure 13-6 illustrates the process used by drive failure prediction technology.
Figure 13-6 Drive failure prediction process.
HP worked with the hard drive industry to help develop a diagnostic and failure prediction capability known as Self-Monitoring Analysis and Reporting Technology (S.M.A.R.T.). Over the years, as S.M.A.R.T. matured, HP used both M&P and S.M.A.R.T. to support hard drive failure prediction technology for Prefailure Warranty replacement of hard drives.
S.M.A.R.T. has now matured to the point that HP relies exclusively on it for hard drive failure prediction in support of the Prefailure Warranty.
Since 2001, HP has shipped SCSI hard drives configured to disable M&P tests on the Smart Array controllers. This eliminates false failure predictions and improves performance by eliminating the hourly M&P controller-initiated tests.
S.M.A.R.T. improves failure prediction technology by placing monitoring capabilities within the hard disk drive. These monitoring routines are more accurate than the original M&P tests because they are designed for a specific drive type and have direct access to internal performance, calibration, and error measurements. S.M.A.R.T. uses internal performance indicators and real-time monitoring and analysis to improve data protection and fault prediction capability beyond that of the original M&P tests. In addition, HP Smart Array controllers proactively scan the hard drive media during idle time and deal with any media defects detected.
S.M.A.R.T. can often predict a problem before failure occurs. HP Smart Array controllers will recognize a S.M.A.R.T. error code and notify the system of an impending hard drive failure. Insight Manager will be notified whenever a potential problem arises. HP drives that fail to meet expected criteria are eligible for replacement under the unique HP Prefailure Warranty.
13.3.7 Dynamic Sector Repair (DSR)
Under normal operation, even initially defect-free drive media can develop defects. This is a common phenomenon. The bit density and rotational speed of disks are increasing every year, and so is the likelihood of problems. Usually a drive can internally remap bad sectors without external help by using cyclic redundancy check (CRC) checksums stored at the end of each sector.
All Smart Array controllers perform a surface analysis as a background job when there is no other disk activity. Even a completely unreadable sector can be rebuilt and remapped by using the RAID capabilities of the controller.
DSR functions automatically with hardware-handled fault tolerance. DSR is unavailable when hardware fault tolerance is not used. It uses the fault tolerance of the drive subsystem to replace a bad sector with a spare sector. The correct data is written to the spare sector on the same drive.
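To see why fault tolerance makes it possible to rebuild an unreadable sector, consider a parity-based stripe. The sketch below assumes simple RAID 5 style XOR parity; the text does not describe the controller's internal algorithm, so this illustrates the general principle rather than HP's implementation. All names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_sector(surviving_sectors):
    """Recover the one missing member of a stripe.

    XOR of all surviving data sectors and the parity sector yields the
    contents of the unreadable sector.
    """
    return xor_blocks(surviving_sectors)


# Three data sectors and their parity (toy 4-byte sectors).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend data[1] is unreadable and rebuild it from the rest.
recovered = rebuild_sector([data[0], data[2], parity])
print(recovered == data[1])
```

Once the contents are recovered this way, they can be written to a spare sector, which is the remapping step the text describes. With mirroring (RAID 1 or 1+0), the equivalent step would simply be a copy from the mirror drive.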
DSR triggers automatically. The HP Smart Array controllers trigger DSR after 30 seconds of idle time.
When DSR detects a bad or a potentially bad sector, it relocates the data to a sector on a different track (as shown in Figure 13-7), just in case two sectors within the same track are bad.
Figure 13-7 How DSR works.
DSR does not affect disk subsystem performance because it runs as a background task. DSR discontinues when the operating system makes a request.
The disk drive activity LEDs flash when the DSR is running.
13.3.8 Hot-Plug Drive Support
Several of the advantages provided by Smart Array controllers require hot-pluggable SCSI drives. Without hot-pluggable drives, the following operations cannot be completed with the drive online:
- Replacement of a failed drive in a fault-tolerant array
- Addition of drives and arrays
- Expansion of arrays
Although HP supports non-hot-pluggable drives on all of its array controllers, they are not recommended. One of the primary advantages of array controllers is the ability to recover fully from a drive failure without taking the server offline. This capability requires the use of hot-pluggable drives in conjunction with an array controller.
13.3.8.1 HOT-PLUGGABLE DRIVE LEDS
The HP Smart Array controller firmware has been enhanced so that when the controller detects that an attached hot-pluggable hard drive has entered a degraded status, the amber LED on the hard drive flashes. This enhancement allows easier detection and replacement of the affected physical hard drive, especially when reported by a system management utility such as Insight Manager. The affected hot-pluggable hard drive remains online and displays the LED combinations listed in the following table.
Status | Condition |
---|---|
Online | On |
Drive Access | On, off, or blinking |
Drive Failure | Blinking amber |
This feature is not supported in a RAID 0 no-fault-tolerant configuration. The controller must be configured in a RAID 1, RAID 1+0, RAID 5, or RAID ADG fault-tolerant configuration.
13.3.9 Automatic Data Recovery
A Smart Array controller automatically detects whether a failed drive has been replaced. When the RAID level is set for 1, 1+0, 4, 5, or ADG, data is rebuilt automatically on the new drive. All you must do is replace the failed drive. In a system that supports hot-pluggable drives, this replacement can be done with the system up and running. The rebuild priority can be set and changed any time using the ACU.
When a drive fails, the following factors influence the data recovery time:
- Type and size of the drive
- RAID level
- Workload on the system
- Controller type
- HP Smart Array accelerator setting
- HP Smart Array drive-recovery priority level
If the system is in use during the drive rebuild, recovery time depends on the level of activity. Most systems should recover in nearly the same time with moderate activity as with no load, particularly RAID 1. RAID 5 is more sensitive to system load during the recovery period because of the considerably heavier I/O requirements of the failed system.
Selecting a high rebuild priority results in reduced server performance when the rebuild is in progress. Setting a low rebuild priority allows normal server performance, because rebuilding occurs only when the server is idle; rebuild time can be significantly longer depending on system activity.
13.3.10 Array Accelerator (Read/Write Cache)
The array accelerator on the Smart Array controllers dramatically improves I/O performance. Depending on the controller, it can have a size of 4, 16, 32, 64, 128, or 256MB.
The array accelerator uses an intelligent read-ahead algorithm that anticipates data needs and reduces wait time. It detects sequential read activity on single or multiple I/O threads and predicts what requests will follow. The data is gathered and stored in the high-speed cache. As soon as the data is requested by the operating system, the data is delivered 100 times faster than a disk can deliver data.
Whenever random-access patterns are detected, read-ahead is disabled because reading ahead data under random I/O slows down the system instead of making it faster.
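The idea of switching read-ahead on for sequential access and off for random access can be sketched as follows. This is a simplified illustration that tracks a single I/O stream and uses a hypothetical threshold of two consecutive sequential reads; the actual detection logic in the array accelerator is not described in the text.

```python
class ReadAheadDetector:
    """Decide whether read-ahead should be active for one I/O stream."""

    def __init__(self, threshold=2):
        # Hypothetical tuning value: sequential reads needed in a row
        # before prefetching is considered worthwhile.
        self.threshold = threshold
        self.next_expected = None
        self.run_length = 0

    def observe(self, start_block, block_count):
        """Record a read request; return True if read-ahead should be on."""
        if start_block == self.next_expected:
            self.run_length += 1   # request continues the previous one
        else:
            self.run_length = 0    # random jump: reset and stop prefetching
        self.next_expected = start_block + block_count
        return self.run_length >= self.threshold


detector = ReadAheadDetector()
pattern = [(0, 8), (8, 8), (16, 8), (500, 8)]
print([detector.observe(start, count) for start, count in pattern])
```

Handling multiple I/O threads, as the accelerator does, would require keeping one such tracker per detected stream.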
By default, the array accelerator cache capacity is equally divided between reads and writes. If your server application has significantly more reads than writes (or vice versa), you may need to change this setting to improve performance. This change can be accomplished online without rebooting the system. The optimal ratio setting is application-dependent.
If the disks are busy, new writes can be stored in the cache and written to the disk later when there is less activity (write-back). Some smaller blocks can usually be combined into larger blocks resulting in fewer but larger blocks written to the disk, thus improving performance.
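Combining small cached writes into larger ones can be illustrated with a short sketch. It assumes each pending write is described by a starting block and a block count, and it merges writes that touch or overlap; the representation and function name are hypothetical and are not taken from controller firmware.

```python
def coalesce_writes(pending):
    """Merge (start_block, block_count) writes that touch or overlap."""
    merged = []
    for start, count in sorted(pending):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # This write begins at or before the end of the previous one,
            # so extend the previous write instead of issuing a new one.
            last_start, last_count = merged[-1]
            new_end = max(last_start + last_count, start + count)
            merged[-1] = (last_start, new_end - last_start)
        else:
            merged.append((start, count))
    return merged


# Three adjacent 4-block writes and one distant write held in the cache.
print(coalesce_writes([(8, 4), (0, 4), (4, 4), (100, 2)]))
```

In this example, four cached requests become two disk writes, which is the "fewer but larger blocks" effect described above.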
The Smart Array 5300 controller is the only array controller family with upgradeable cache modules.
13.3.11 Data Protection
Data in a write cache demands special protection. The data protection features provided by HP array controllers are battery backup, the BBWC enabler, and recovery ROM.
13.3.11.1 BATTERY BACKUP AND BBWC ENABLER
All Smart Array controllers with a battery-backed write cache (BBWC) feature a removable memory module and a BBWC enabler. A short cable connects the memory module and the enabler. In the event of a server shutdown, without using tools you can remove the memory module, the enabler, and the hard drives and install them in another ProLiant server that supports BBWC. When the new server is powered on, an initialization process writes the preserved data to the hard drives.
In the event of a general power outage, the BBWC enabler protects data in the memory module, which holds both the read cache and the write cache. You can allocate the size of each cache with the ACU.
The batteries in the BBWC enabler are recharged continuously through a trickle-charging process whenever the system power is on. The batteries protect data in a failed server for up to three or four days, depending on the size of the memory module. Under normal operating conditions, the batteries last for three years before replacement is necessary.
The BBWC enabler consists of the following components:
- A battery module, which includes a charger and status indicators
- A field-installable battery cable
Depending on the HP ProLiant server platform, there are several mechanisms for deploying a BBWC enabler. The enabler might be:
- A standard feature
- Available as an option
- Bundled with a Smart Array 5i to 5i Plus controller upgrade
For more information on the HP Smart Array controllers, visit http://h18004.www1.hp.com/products/servers/proliantstorage/arraycontrollers/.
13.3.11.2 RECOVERY ROM
Smart Array controllers feature recovery ROM, which provides protection against firmware corruption.
The controller maintains two copies of firmware in ROM. Previous working firmware is maintained when new firmware is flashed to the controller. The controller will roll over to standby firmware if corruption occurs.
Recovery ROM reduces the risk of flashing new firmware to the controller.
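The fallback behavior of a dual-image ROM can be sketched as follows. The text does not say how the controller detects corruption, so this example assumes a stored CRC-32 value for each image purely for illustration; the function name and data layout are hypothetical.

```python
import zlib


def select_firmware(active, standby):
    """Return the name of the first firmware image that passes its check.

    Each image is a (payload_bytes, stored_crc32) pair. The CRC-32 check
    is an assumption made for this sketch, not a documented mechanism.
    """
    for name, (payload, stored_crc) in (("active", active), ("standby", standby)):
        if zlib.crc32(payload) == stored_crc:
            return name
    raise RuntimeError("both firmware images are corrupt")


# A freshly flashed image that was corrupted, and the previous good image.
new_image = (b"firmware-v2-corrupted", zlib.crc32(b"firmware-v2"))
old_image = (b"firmware-v1", zlib.crc32(b"firmware-v1"))
print(select_firmware(new_image, old_image))
```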
13.3.12 Array Performance Tuning
You can optimize the performance of an array in several ways, including the following:
- Choose a stripe size suitable for the type of data transfer common to the system.
- Change the fault-tolerance mode to one that requires less overhead.
- Enlarge the logical drive to span all four controller channels (depending on the controller).
- Change the read/write cache ratio in the Smart Array controller.
13.3.13 Disk Striping
To speed operations that retrieve data from disk storage, you can use disk striping to distribute volume segments across multiple disks. The most effective method is to distribute volume segments equally across the disks.
Striping improves disk response time by uniting multiple physical drives into a single logical drive. The logical drive is arranged so that blocks of data are written alternately across all physical drives in the logical array. The number of sectors per block is referred to as the striping factor.
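The alternating layout described above can be expressed as a small address calculation. This sketch assumes plain RAID 0 striping with no parity, where the striping factor is the number of sectors per block; the function name and the example geometry are illustrative only.

```python
def locate_sector(logical_sector, num_disks, striping_factor):
    """Map a logical sector to (disk_index, sector_on_disk).

    Assumes simple striping: consecutive blocks of `striping_factor`
    sectors are written to consecutive disks in round-robin order.
    """
    block = logical_sector // striping_factor   # which block of the volume
    offset = logical_sector % striping_factor   # position inside that block
    disk = block % num_disks                    # blocks rotate across disks
    row = block // num_disks                    # full passes over all disks
    return disk, row * striping_factor + offset


# Hypothetical geometry: 3 disks, 4 sectors per block.
for sector in (0, 4, 13):
    print(sector, locate_sector(sector, num_disks=3, striping_factor=4))
```

Because neighboring blocks land on different disks, a large sequential transfer keeps all drives busy at once, which is where the response-time improvement comes from.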
Depending on the array controller in use, the striping factor can be modified, usually with the manufacturer's system configuration utility. Many of the HP Smart Array controllers can be modified online with online utilities that indicate the status of the logical drives and arrays and display the completion percentage of the rebuild process. For NetWare, this utility is cpqonlin.nlm; for Windows, it is the ACU. The ACU for Linux is installed along with the ProLiant Support Paq (PSP). You can enable the ACU through the System Management home page using the command cpqacuxe.
To access the System Management home page, go to https://127.0.0.1:2381.
On HP controllers released before the Smart Array 3100ES, changes to stripe size are data-destructive. In addition, any change to the logical volume geometry (such as striping factor, volume size, or RAID level) can be data-destructive.
RAID 0 striping improves volume I/O because you can read data and write data concurrently to each disk. If one of the disks fails, the entire volume becomes unavailable. To provide fault tolerance, implement some of the fault-tolerant RAID levels supported by Smart Array controllers.
13.3.14 Optimizing the Stripe Size
Selecting the appropriate stripe (chunk) size is important to achieving optimum performance within an array. The stripe size is the amount of data that is read or written to each disk in the array when data requests are processed by the array controller.
The terms chunk, block, and segment are used interchangeably. Chunk is used most often when discussing storage.
The following table lists the available stripe sizes and their characteristics.
Fault-Tolerance Method | Available Stripe Sizes (KB) | Default Size (KB) |
---|---|---|
RAID 0 | 128, 256 | 128 |
RAID 1 or 1+0 | 8, 16, 32, 64, 128, 256 | 128 |
RAID 5 or RAID ADG | 8, 16, 32, 64 | 16 |
To choose the optimal stripe size, you should understand how the applications request data.
The default stripe size delivers good performance in most circumstances. When high performance is important, you might need to modify the stripe size.
If the stripe size is too large, there will be poor load balancing across the drives.
If the stripe size is too small, there will be many cross-stripe transfers (split I/Os) and performance will be reduced.
Split I/Os involve stripes split onto two disks, causing both disks to seek, rotate, and transfer data. The response time depends on the slowest disk. Split I/Os reduce the request rate because there are fewer drives to service incoming requests.
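Whether a given request becomes a split I/O depends on its offset, its length, and the stripe size. The following sketch assumes simple striping in which each stripe-sized chunk resides on one disk; the function names and example values are hypothetical.

```python
def disks_touched(offset_kb, length_kb, stripe_kb, num_disks):
    """Count the distinct disks a request touches under simple striping."""
    if length_kb <= 0:
        raise ValueError("length_kb must be positive")
    first_chunk = offset_kb // stripe_kb
    last_chunk = (offset_kb + length_kb - 1) // stripe_kb
    # A request cannot touch more disks than the array contains.
    return min(last_chunk - first_chunk + 1, num_disks)


def is_split_io(offset_kb, length_kb, stripe_kb, num_disks):
    """True if the request crosses a stripe boundary."""
    return disks_touched(offset_kb, length_kb, stripe_kb, num_disks) > 1


# A 64KB request at offset 0 on a four-disk array.
print(disks_touched(0, 64, stripe_kb=16, num_disks=4))    # small stripe
print(disks_touched(0, 64, stripe_kb=128, num_disks=4))   # large stripe
# A 16KB request that straddles a 128KB stripe boundary.
print(is_split_io(120, 16, stripe_kb=128, num_disks=4))
```

Running this kind of check against the typical request sizes of an application gives a quick sense of whether a candidate stripe size will cause many split I/Os.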
Type of Server Application | Suggested Stripe-Size Change |
---|---|
Mixed read/write | Accept the default value. |
Mainly read (such as database or Internet applications) | Larger stripe sizes work best. |
Mainly write (such as image-manipulation applications) | Smaller stripes for RAID 5, RAID ADG. Larger stripes for RAID 0, RAID 1, RAID 1+0. |
If you stripe disks on two or more SCSI controllers (called controller multiplexing), the operating system must calculate where to place data in relation to the striping, in addition to other calculations that contribute to processor overhead. For best performance, stripe disks only on the same controller or use an HP Smart Array controller with multiple channels and specific circuitry for handling these calculations.
A multichannel card uses only one interrupt. The HP Smart Array 5300 and 6400 series controllers feature two or more channels for enhanced performance and capacity.