DR on CPU/Memory Boards
CPU and memory boards can be connected, disconnected, configured, or unconfigured. You can view the status of these boards by using the DR module in the Sun MC software. In the following figure, boards N0.SB1 and N0.SB4 are in the disconnected, untested state, and are available to be configured into a domain.
FIGURE 3 CPU/Memory Component Table in Sun MC
To View DR Operations for CPU/Memory Boards
Log in to the domain as superuser.
Use the cfgadm(1M) command with its help option (-h) and the name of the CPU/Memory board to view the DR operations for the CPU/Memory board.
The following example shows the output of the command.
# cfgadm -h N0.SB2
Usage:
  cfgadm [-f] [-y|-n] [-v] [-o hardware_options] -c function ap_id [ap_id...]
  cfgadm [-f] [-y|-n] [-v] [-o hardware_options] -x function ap_id [ap_id...]
  cfgadm [-v] [-s listing_options] [-o hardware_options] [-a] [-l [ap_id|ap_type...]]
  cfgadm [-v] [-o hardware_options] -t ap_id [ap_type...]
  cfgadm [-v] [-o hardware_options] -h [ap_id|ap_type...]

Sbd specific commands/options:
  cfgadm [-o parsable] -l ap_id
  cfgadm [-o unassign|nopoweroff] -c disconnect ap_id
  cfgadm -t ap_id
  cfgadm -x assign ap_id
  cfgadm -x unassign ap_id
  cfgadm -x poweron ap_id
  cfgadm -x poweroff ap_id
Unconfiguring Memory
The entire amount of memory on a CPU/Memory board is a dynamic attachment point. The same DR mechanisms are used whether you unconfigure memory individually or as part of an unconfigure-disconnect operation on a CPU/Memory board.
When unconfiguring memory, the DR operation differentiates between permanent and nonpermanent memory. Permanent memory is nonpageable (that is, it contains kernel or OpenBoot PROM structures). All of the other memory is pageable and is considered nonpermanent. DR uses different mechanisms for pageable and nonpageable memory.
You can determine if memory is permanent or nonpermanent from the CLI by using the cfgadm -av | grep permanent command or by using the Sun MC DR module, as shown in the following figure.
FIGURE 4 Memory Component Table in Sun MC
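The permanent-memory check can also be scripted. The following is a minimal Python sketch, not part of the DR software, that extracts the total and permanent sizes from one memory line of cfgadm -av output. The SAMPLE line and its column layout are assumptions made for illustration; compare them with the output on your own domain and adjust the patterns as needed.

```python
import re

# Assumed sample of one `cfgadm -av` memory line. The real layout may
# differ between releases, so treat this string as illustrative only.
SAMPLE = (
    "N0.SB2::memory connected configured ok "
    "base address 0x0, 2097152 KBytes total, 279240 KBytes permanent"
)

_TOTAL = re.compile(r"(\d+)\s+KBytes\s+total")
_PERMANENT = re.compile(r"(\d+)\s+KBytes\s+permanent")


def memory_summary(line):
    """Return (ap_id, total_kbytes, permanent_kbytes) for one memory line.

    permanent_kbytes is 0 when the line reports no permanent memory;
    total_kbytes is None when no total size is found.
    """
    ap_id = line.split()[0]
    total = _TOTAL.search(line)
    permanent = _PERMANENT.search(line)
    return (
        ap_id,
        int(total.group(1)) if total else None,
        int(permanent.group(1)) if permanent else 0,
    )


print(memory_summary(SAMPLE))
```

A board whose memory line reports a nonzero permanent size is the one whose removal triggers the copy-rename operation.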
For nonpermanent memory, the memory pages are flushed back to a disk, moved to another memory location, or swapped out appropriately. The length of time this operation takes depends on the amount of memory being unconfigured and the system usage of the memory (for example, memory pages that are locked by processes). The Solaris OE and user applications are blocked from using these pages during the unconfigure process.
For permanent memory, a copy-rename operation is used because permanent memory contains critical kernel structures and cannot be swapped out. Before you can unconfigure permanent memory, nonpermanent memory on another CPU/Memory board must be unconfigured to make room for it. The amount of nonpermanent memory must be at least as large as the amount of memory on the board with permanent memory. Only the size of the memory is relevant; the memory layout is not.
After the DR software has a place for the permanent memory, the Solaris OE and user applications are quiesced (suspended), and the memory is copied to the new location. The memory controllers are then renamed appropriately. During the quiesce, all memory activity by the operating system is stopped, and all I/O operations and thread activity are paused. For most domains, only one CPU/Memory board contains permanent memory.
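The size rule for copy-rename can be expressed as a small check. The sketch below is an illustration in Python, not the algorithm the DR software uses: it lists the boards that hold no permanent memory and have at least as much memory as the board being removed. The board names and sizes in the example are hypothetical.

```python
def copy_rename_targets(boards, source):
    """List boards that could receive the permanent memory held on `source`.

    `boards` maps a board name to (total_mbytes, permanent_mbytes).
    A candidate holds no permanent memory itself and has at least as much
    memory as the source board. Only size is compared, not layout.
    """
    source_total, _ = boards[source]
    return sorted(
        name
        for name, (total, permanent) in boards.items()
        if name != source and permanent == 0 and total >= source_total
    )


# Hypothetical domain: SB1 holds the permanent memory, SB4 is too small.
boards = {"SB1": (2048, 833), "SB2": (2048, 0), "SB4": (1024, 0)}
print(copy_rename_targets(boards, "SB1"))  # ['SB2']
```

An empty result would mean that no single board in this simplified model is large enough to take over the permanent memory.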
Before unconfiguring nonpermanent memory, you must do the following:
If the domain was not booted with memory interleaving set to either within-board or within-cpu, change the memory interleaving to one of these values, then reboot the domain.
Before unconfiguring permanent memory, you must do the following:
Stop all real-time processes.
Ensure that all of the device drivers are Device Driver Interface (DDI) compliant.
The following table contains the requirements for the DDI functions on loaded device drivers with different types of attachment points.
TABLE 2 DDI Requirements for Device Drivers
Support | Permanent Memory | Nonpermanent Memory | I/O Devices
For Unconfiguring: | | |
DDI_DETACH | Yes | Yes | Yes
DDI_SUSPEND | Yes | |
DDI_RESUME | Yes | |
For Configuring: | | |
DDI_ATTACH | Yes | Yes | Yes
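The requirements in TABLE 2 can be captured as a lookup for use in a pre-check script. This Python sketch is an illustration only; the list of entry points a driver supports is something you would supply yourself, and the function name is hypothetical.

```python
# Entry points a loaded driver must support, transcribed from TABLE 2.
REQUIRED_DDI = {
    "permanent memory": {"DDI_DETACH", "DDI_SUSPEND", "DDI_RESUME", "DDI_ATTACH"},
    "nonpermanent memory": {"DDI_DETACH", "DDI_ATTACH"},
    "I/O devices": {"DDI_DETACH", "DDI_ATTACH"},
}


def missing_ddi(driver_supports, attachment_type):
    """Return the required entry points that the driver does not support."""
    return sorted(REQUIRED_DDI[attachment_type] - set(driver_supports))


# A driver with only attach and detach support blocks permanent-memory DR.
print(missing_ddi({"DDI_ATTACH", "DDI_DETACH"}, "permanent memory"))
```

The difference between the columns is the suspend and resume support, which is needed only when the operating system is quiesced for a permanent memory operation.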
The following table contains a description of the DDI, IPMP, and Traffic Manager support for common Sun drivers. See your service representative for updates to this list.
TABLE 3 DDI, IPMP, and Traffic Manager Support for Drivers
Driver | DDI | IPMP | Traffic Manager
hme | Yes | Yes | N/A
isptwo | Yes | N/A | TBD
qfe | Yes | Yes | N/A
ba | Yes | TBD | N/A
qlc | Yes | N/A | Yes
glm | Yes | N/A | TBD
Unconfiguration Commands
The unconfiguration commands for DR are the same for CPU/Memory boards with or without permanent memory. When you unconfigure permanent memory, you will see a message about the memory, and you must confirm the operation.
Unconfiguration Example
In this example, SB1 has 833 Mbytes of permanent memory, and SB2 has nonpermanent memory. If you unconfigure or disconnect SB1, or if you unconfigure the memory on SB1, the operating system will be quiesced. If you unconfigure or disconnect SB2, or unconfigure the memory on SB2, the system will not be quiesced.
FIGURE 5 shows the dynamic attachment points for memory in the domain. The location of the permanent memory depends on previous DR operations, the size of the kernel, and the memory distribution within the domain; therefore, the permanent memory does not have to be on the lowest numbered CPU/Memory board in the domain.
FIGURE 5 Memory Information for the Domain
FIGURE 6 shows the memory output after the memory on SB1 has been unconfigured. Comparing this figure with the previous figure, you can see that the memory base address for SB2 has changed from 0x2000000000 to 0x0. You can also see that the permanent memory has been moved from SB1 to SB2. The memory base address for SB1 is now the prior memory base address for SB2. The permanent memory base address, after the unconfigure operation, is still 0x0.
FIGURE 6 Memory Information After an Unconfigure Operation
Configuring Memory
Configuring memory into the domain is a simple operation that adds the memory to the Solaris OE memory structures. This memory can then be used by the Solaris OE and by user applications. You must set up interleaving within the board before configuring a CPU/Memory board or its memory into the domain.
DR Operation Time
The total amount of time used for unconfiguring and disconnecting a CPU/Memory board depends on two segments of time. The first segment is the length of time it takes for CPU/Memory resources to be removed from a domain. Understanding this length of time helps you determine the time required for a CPU/Memory board to be disconnected prior to performing upgrades or service actions.
The second segment of time is the length of time it takes for the system to be quiesced when permanent memory is unconfigured or disconnected. The quiesce period depends on the amount of time the DR software takes to perform the copy-rename operation and on the amount of time it takes to suspend and resume device drivers and processes. The total time is affected by the number of processors, the number of I/O device drivers that have to be suspended and resumed, and the amount of memory that needs to be copied.
When you unconfigure or disconnect a CPU/Memory board with used memory pages, the pages must be freed. This process is known as draining the memory. The drain time depends on how the operating system uses the pages, the number of pages that are locked by processes, and the amount of memory being unconfigured. System load and locking methods are specific to the system or application using the pages.
The total amount of time required for the drain and quiesce operations depends on the system configuration and the applications running on the domain.
The total amount of time used for connecting and configuring a CPU/Memory board depends on the time it takes to run the POST operation.
NOTE
Measuring the drain and quiesce time on production systems is important. It allows you to estimate the length of time needed for configuration changes and upgrades. It also helps to manage the impact of unconfiguring and disconnecting permanent memory.
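One simple way to collect such measurements is to time the DR command from a wrapper script. The following Python sketch is an assumption about how you might do this, not a tool supplied with the DR software. It times an arbitrary command with a monotonic clock; the demonstration uses a harmless stand-in command so that it runs anywhere.

```python
import subprocess
import sys
import time


def timed_run(argv):
    """Run a command and return (exit_status, elapsed_seconds)."""
    start = time.monotonic()
    completed = subprocess.run(argv, check=False)
    return completed.returncode, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in command. On a domain you might instead pass something like
    # ["cfgadm", "-c", "disconnect", "N0.SB1"], where the ap_id is only an
    # example and any confirmation prompt must be answered.
    status, seconds = timed_run([sys.executable, "-c", "pass"])
    print(f"exit={status} elapsed={seconds:.3f}s")
```

Timed this way, a disconnect of a board with permanent memory yields the sum of the drain and quiesce times, not the two segments separately.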
Time Length Examples
In the following examples, the systems are configured to run only the Solaris OE. This condition allows the examples to show clearly the influencing factors on the length of time needed for the DR operations.
In the first example, the number of CPU/Memory boards in the domain was increased stepwise, then the length of time for disconnecting the CPU/Memory board containing permanent memory was measured. The disconnect time is the sum of the time taken for memory drain and system quiesce. FIGURE 7 shows how adding CPUs to the domain increases the disconnect time.
FIGURE 7 CPUs as a Factor of the Disconnect Time
Each step of four CPUs corresponds to a 2-Gbyte step in total memory. The disconnect times show that as you increase the number of CPUs, you increase the disconnect time, because more CPUs have to be suspended and resumed during the copy-rename operation. The increase is not significant, so adding CPUs has a low impact on the disconnect time.
In the second example, the same configuration was used, but the number of CPUs was held constant at 24. The total amount of memory in the domain was increased stepwise, then the quiesce time was measured. To derive the quiesce time, debug kernels were used because with a production kernel, only the sum of the quiesce and drain time can be measured accurately.
FIGURE 8 shows that a static number of CPUs does not significantly impact the amount of time for the quiesce operation.
FIGURE 8 Memory as a Factor of the Quiesce Time
Removing Individual CPUs and Memory Banks
CPUs and memory are dynamic attachment points on a CPU/Memory board. Individual CPUs or memory banks cannot be unconfigured independently because of the association of the memory banks and CPUs and the fact that the entire memory on a CPU/Memory board is treated as a single dynamic attachment point.
The memory controller is implemented on the UltraSPARC III CPUs. Each CPU controls two of the eight memory banks on the CPU/Memory board. FIGURE 9 shows the dynamic attachment points for the CPUs and memory on a CPU/Memory board.
FIGURE 9 Dynamic CPU/Memory Attachment Points
However, with the use of DR and Sun Fire SSC commands, you can disable an individual memory bank or CPU, isolating it from domain usage.
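Because each CPU controls two memory banks, disabling a CPU also makes its banks unusable. The Python sketch below illustrates the arithmetic for one board. The bank numbering (CPU n controls banks 2n and 2n+1) is an assumption made for the example and might not match the physical layout of the board.

```python
BANKS_PER_CPU = 2  # from the text: each CPU controls two of the eight banks


def usable_banks(disabled_cpus, cpus=4):
    """Return the bank indices still usable when some CPUs are disabled.

    Assumes, for illustration only, that CPU n controls banks 2n and 2n+1.
    """
    usable = []
    for cpu in range(cpus):
        if cpu in disabled_cpus:
            continue
        first = cpu * BANKS_PER_CPU
        usable.extend(range(first, first + BANKS_PER_CPU))
    return usable


print(usable_banks({1}))  # [0, 1, 4, 5, 6, 7]
```

Disabling one CPU in this model removes two of the eight banks, leaving six for the domain.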
To Remove Individual CPUs and Memory Banks
You can remove individual CPUs and memory banks by using the disablecomponent command on the Sun Fire SSC.
Log in to the domain as superuser.
Use the cfgadm(1M) command or the Sun MC DR module to disconnect the CPU/Memory board.
This command removes the board from usage by the domain.
Use the disablecomponent command on the Sun Fire SSC to disable the individual component.
You can use the showcomponent command on the Sun Fire SSC to verify which components are enabled or disabled. The memory controlled by a CPU is not usable when the CPU is disabled.
Use the cfgadm(1M) command or the Sun MC DR module to configure the CPU/Memory board back into the domain.
All of the components, except those that are disabled, are configured back into the domain.
NOTE
Using DR, unconfiguring individual CPUs causes the entire memory on the CPU/Memory board to be lost. Using this procedure, you can minimize the loss of resources when you remove individual CPUs or memory banks. This procedure enables you to remove individual components so that scheduled maintenance can take place.
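The order of the procedure above can be summarized as a checklist. This Python sketch only builds the list of steps; it does not run anything. The argument syntax for disablecomponent is not given in this article, so the component argument is left as a placeholder, and the board and component names in the example are hypothetical.

```python
def removal_plan(ap_id, component):
    """Return (where, command) steps for isolating one CPU or memory bank.

    `where` is "domain" for commands run as superuser in the domain and
    "SSC" for commands run on the Sun Fire SSC.
    """
    return [
        ("domain", f"cfgadm -c disconnect {ap_id}"),
        ("SSC", f"disablecomponent {component}"),
        ("SSC", "showcomponent"),
        ("domain", f"cfgadm -c configure {ap_id}"),
    ]


for where, command in removal_plan("N0.SB2", "<component>"):
    print(f"{where}: {command}")
```

Keeping the location next to each command makes it harder to run a system controller command in the domain by mistake.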