Dynamic Reconfiguration
Dynamic reconfiguration (DR) is a powerful tool for allocating resources to a domain, and deallocating them from it, with minimal interruption. DR can add or remove CPU/Memory boards in a running domain and can move them between running domains. You can also use DR to hot-swap CPU/Memory boards and PCI boards. DR operations support hsPCI I/O assemblies and MaxCPU boards as well as CPU/Memory boards. DR is also very useful when servicing failed components, because they can be removed dynamically from the running domain.
Some applications can take better advantage of DR than others. To fully understand the impact of a DR operation on a domain, you need detailed knowledge of the domain's configuration and application workload. This section provides preferred practices to consider when designing and deploying a domain that will use DR. For more information about DR, refer to the online Sun documentation sets for DR, which include the Dynamic Reconfiguration Users Guide, Dynamic Reconfiguration Installation Guide, Release Notes, and the SMS Users Guide. This documentation is available at http://docs.sun.com, http://www.sun.com/products-n-solutions/hardware/docs, and http://sunsolve.sun.com
Before you use DR, complete a few basic validation tasks. Make sure that the following components are up-to-date and available: the proper Solaris OE version and update, the SMS version and patches, the CPU/Memory board FCode versions, and the SunMC version. You can also build some general guidelines into the design of a Sun Fire 15K/12K server to improve the chances of successful DR operations. These include spreading memory and CPUs evenly across all boards in the frame, and configuring applications with the expectation that both the OS and the application will need to be made inactive for a short period to complete the detach process.
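To see why an even spread matters, the following Python sketch computes the largest share of a domain's CPUs and memory held by any single board, which is the worst-case capacity the domain loses when that board is detached. The `Board` class, board names, and sizes are hypothetical illustrations, not SMS data structures.

```python
from dataclasses import dataclass


@dataclass
class Board:
    name: str       # illustrative slot label, e.g. "SB0" (hypothetical)
    cpus: int       # CPUs on this board
    memory_gb: int  # memory installed on this board, in GB


def worst_case_loss(boards):
    """Return (cpu_share, memory_share): the largest fraction of the domain's
    CPUs and memory that sits on any single board."""
    total_cpus = sum(b.cpus for b in boards)
    total_mem = sum(b.memory_gb for b in boards)
    cpu_share = max(b.cpus for b in boards) / total_cpus
    mem_share = max(b.memory_gb for b in boards) / total_mem
    return cpu_share, mem_share


if __name__ == "__main__":
    balanced = [Board("SB0", 4, 16), Board("SB1", 4, 16), Board("SB2", 4, 16)]
    skewed = [Board("SB0", 4, 32), Board("SB1", 2, 8), Board("SB2", 2, 8)]
    print(worst_case_loss(balanced))  # one third of CPUs and of memory
    print(worst_case_loss(skewed))    # half the CPUs, two thirds of memory
```

In the skewed layout, detaching SB0 would remove two thirds of the domain's memory, which the remaining workload is unlikely to tolerate; in the balanced layout, any single detach costs the same third.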
Additionally, when designing Sun Fire 15K/12K domains for DR, consider the following constraints. The only CPU/Memory board in a domain cannot be detached, and neither can a board that provides the only path to the boot array. All critical resources that will be detached must be redundant and must have alternate paths. Most importantly, the domain that is detaching a board must be able to tolerate fewer processors and less memory.
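The design constraints above can be expressed as simple checks. This Python sketch is a hypothetical model, not an SMS or cfgadm interface: each board records its CPU count and how many boot-array paths pass through it, and `min_cpus` is an assumed floor that the workload needs after the detach.

```python
from dataclasses import dataclass


@dataclass
class Board:
    name: str        # illustrative label, e.g. "SB0" or "IO0" (hypothetical)
    cpus: int        # 0 for an I/O assembly
    boot_paths: int  # paths to the boot array that run through this board


def detach_blockers(boards, target, min_cpus):
    """Return the reasons `target` should not be detached (empty list if none)."""
    board = next(b for b in boards if b.name == target)
    others = [b for b in boards if b.name != target]
    reasons = []
    # A domain cannot give up its only CPU/Memory board.
    if board.cpus > 0 and not any(b.cpus > 0 for b in others):
        reasons.append("last CPU/Memory board in the domain")
    # The boot array must stay reachable through some other board.
    if board.boot_paths > 0 and sum(b.boot_paths for b in others) == 0:
        reasons.append("only path to the boot array")
    # The workload must tolerate running on fewer processors.
    remaining = sum(b.cpus for b in others)
    if remaining < min_cpus:
        reasons.append(f"{remaining} CPUs would remain, workload needs {min_cpus}")
    return reasons


if __name__ == "__main__":
    domain = [Board("SB0", 4, 0), Board("SB1", 4, 0), Board("IO0", 0, 1)]
    print(detach_blockers(domain, "IO0", min_cpus=4))  # ['only path to the boot array']
    print(detach_blockers(domain, "SB1", min_cpus=4))  # []
    print(detach_blockers(domain, "SB1", min_cpus=6))  # ['4 CPUs would remain, workload needs 6']
```

Returning a list of reasons rather than a single boolean makes it easier to record in the DR runbook why a particular board was ruled out.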
On a tactical level, the following tasks must be completed and verified.
- Ensure that the board is flashed with the correct LPOST version. Check http://sunsolve.sun.com for the latest firmware versions.
- When adding a component, verify that the firmware level on the domain matches the firmware level on the component being added to the domain.
- Are any processes bound to CPUs on the CPU/Memory board being detached? (Use the pbind command to detect these processes.)
- Does the board involved in the DR operation contain any unsupported or third-party adapters?
- Are any real-time processes, such as NTP, running on the domain that is detaching a CPU/Memory board?
- Does the domain contain other boards that can receive the permanent memory from the board being detached?
- The proper Solaris /etc/system parameters must be in place to enable the kernel cage and to allow memory to be moved.
- The status of the component must show that it is available. Use the showboards command to check the component's status.
- When removing a component from a domain, use the cfgadm command to check whether the board contains permanent memory (nonpageable memory, such as kernel memory).
- Memory interleaving must also be set on the boards being detached.
- After the DR operation completes successfully, verify that the operating system has detected the change by using common UNIX commands such as psrinfo and prtconf | grep -i memory, as well as SMS commands such as showplatform and showdevices. All DR messages are logged in the /var/adm/messages file of the domain involved in the DR operation.
- Is the domain involved in the DR operation running a database that uses ISM segments?
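The post-operation check in the list above can be partly automated. The following Python sketch parses text captured from psrinfo and prtconf | grep -i memory before and after a DR attach, and compares the change with what was expected. It assumes the common output layouts (one processor per psrinfo line with the state, such as on-line, in the second column, and a prtconf line of the form "Memory size: N Megabytes"); verify these against your Solaris release before relying on it. The function names and sample strings are illustrative.

```python
import re


def online_cpus(psrinfo_text):
    """Count processors whose state column reads 'on-line' (assumed layout)."""
    count = 0
    for line in psrinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "on-line":
            count += 1
    return count


def memory_mb(prtconf_text):
    """Return N from a 'Memory size: N Megabytes' line (assumed format)."""
    match = re.search(r"Memory size:\s*(\d+)\s*Megabytes", prtconf_text)
    if match is None:
        raise ValueError("no 'Memory size' line found")
    return int(match.group(1))


def attach_visible(before, after, cpus_added, mb_added):
    """`before` and `after` are (psrinfo_text, prtconf_text) pairs captured
    around the DR attach; True if the OS reports the expected increase."""
    cpu_delta = online_cpus(after[0]) - online_cpus(before[0])
    mem_delta = memory_mb(after[1]) - memory_mb(before[1])
    return cpu_delta == cpus_added and mem_delta == mb_added


if __name__ == "__main__":
    # Illustrative captures, not output from a specific system.
    psr_before = ("0  on-line  since 01/05/2004 09:00:00\n"
                  "1  on-line  since 01/05/2004 09:00:00\n")
    psr_after = psr_before + ("2  on-line  since 01/05/2004 10:15:00\n"
                              "3  on-line  since 01/05/2004 10:15:00\n")
    before = (psr_before, "Memory size: 8192 Megabytes\n")
    after = (psr_after, "Memory size: 16384 Megabytes\n")
    print(attach_visible(before, after, cpus_added=2, mb_added=8192))  # True
```

The same captures can be reversed for a detach by passing negative expected deltas; the /var/adm/messages log remains the authoritative record of what DR actually did.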
Using Dynamic Reconfiguration With Oracle Databases
Many applications that run on Sun Fire 15K/12K servers use a database, most commonly Oracle. When using DR to move system board resources on a domain running Oracle, configure the domain with sufficient physical memory and swap space to contain the memory-resident components of both the new (attached) database image and the old (detached) images. If sufficient physical memory is not available at the time of the DR operation, the operating system returns an error, because the Solaris OE cannot obtain enough free memory within the domain to move the intimate shared memory (ISM) segments off the CPU/Memory board whose memory is being drained. In addition to absorbing the relocated ISM segments, the domain must also be able to hold the kernel, the Oracle instances, and shared memory segments. Therefore, when designing a domain for DR, make sure that the smallest set of system boards used for a domain running an Oracle instance can contain the kernel, the Oracle shadow processes, the ISM segments, and the shared memory (SGA) segments.
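The sizing rule above is simple arithmetic. This Python sketch computes the smallest number of identically populated boards whose combined memory can hold the listed memory-resident components. The component names and sizes are hypothetical, and the sketch assumes the SGA is the shared segment Oracle places in ISM, so it is counted once rather than twice.

```python
import math


def min_boards(components_gb, memory_per_board_gb):
    """Smallest number of identically populated boards whose combined memory
    holds all of the memory-resident components in `components_gb`."""
    required_gb = sum(components_gb.values())
    return math.ceil(required_gb / memory_per_board_gb)


if __name__ == "__main__":
    # Hypothetical sizes in GB; the SGA lives in ISM, so it is listed once.
    oracle_domain = {"kernel": 2, "shadow_processes": 6, "sga_in_ism": 40}
    n = min_boards(oracle_domain, memory_per_board_gb=16)
    print(n)      # 3 boards hold the 48 GB required
    print(n + 1)  # boards needed so that any one of them can be detached
```

If the domain must survive detaching one of those boards, it needs at least one board more than this minimum, because the remaining boards must still hold everything.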
Oracle databases use intimate shared memory, which allows a database to share data among processes within the shared memory space. ISM also locks the shared memory used by the database so that it is not swapped out. Because ISM segments cannot be paged out during the drain operation, they must be relocated to other physical memory on a remaining system board. You can identify the location of ISM segments with the cfgadm command, which reports them as permanent memory. For the DR operation to succeed, enough memory for the ISM segments must remain in the domain after the CPU/Memory board is detached. Even after ISM is relocated, pages that Oracle has locked for I/O operations remain inaccessible to the DR operation until the I/O completes. Be aware that this can slow the progress of a DR operation on a heavily loaded system.
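The relocation requirement can be sketched as a capacity check: pageable memory on the board being drained can be paged out, but locked memory such as ISM must fit into free memory on the remaining boards. This Python model is a simplification with hypothetical field names; it ignores, for example, pages temporarily locked by in-flight I/O, which delay the drain rather than change the capacity arithmetic.

```python
from dataclasses import dataclass


@dataclass
class BoardMemory:
    name: str
    total_mb: int
    used_mb: int    # memory currently in use on the board
    locked_mb: int  # portion of used_mb that cannot be paged out (e.g. ISM)


def relocation_shortfall(boards, target):
    """MB of locked memory on `target` that would not fit into the free memory
    of the remaining boards; 0 means the locked pages can be relocated."""
    source = next(b for b in boards if b.name == target)
    free_elsewhere = sum(b.total_mb - b.used_mb for b in boards if b.name != target)
    return max(0, source.locked_mb - free_elsewhere)


if __name__ == "__main__":
    boards = [
        BoardMemory("SB0", 16384, 12000, 2000),
        BoardMemory("SB1", 16384, 10000, 1000),
        BoardMemory("SB2", 16384, 14000, 9000),
    ]
    print(relocation_shortfall(boards, "SB2"))  # 0: 9000 MB fits in 10768 MB free
    boards[2].locked_mb = 12000
    print(relocation_shortfall(boards, "SB2"))  # 1232 MB short
```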
Oracle 9i, combined with the Solaris 8 OE or the Solaris 9 OE, offers a unique capability when used with DR. When memory is added by a DR operation, Oracle instances can dynamically recognize the additional memory within the domain without being restarted. This capability is provided by the Oracle 9i Dynamic SGA feature together with the support for dynamic intimate shared memory (DISM) added in the Solaris 8 OE. The key point is that, depending on the Solaris and Oracle versions in use, you might not need to restart some applications before they recognize the resources added by DR operations.
Implementing Dynamic Reconfiguration Procedures
When detaching components from a domain, you must consider the implications for the application. Before any DR operation is attempted, all DR activities should be fully tested with standard testing procedures and well documented. Configure the test domain with the same technologies and applications used in the production environment so that it simulates the production domain as closely as possible. Strict adherence to testing is critical because the detach operation must pause the operating system and drain memory pages used by the operating system and applications, temporarily suspending CPUs, processes, and devices. The testing must show that this pause and drain during detach does not adversely affect the applications and operating system on the domain. The testing must also confirm that the applications running in the domain from which a board is being removed can run effectively with less memory, fewer CPU resources, or less I/O bandwidth. DR operations with VERITAS Cluster Server and Sun Cluster software might require special provisions, and you should test these procedures as well. For example, domains that are part of an active cluster might need to be taken out of the cluster during the DR operation. Testing should also confirm that the boards being added to a domain are reliable. Therefore, the domain used for testing DR operations should exercise the exact set of boards proposed for the DR attach operation to the production domain.
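One part of this discipline, confirming that the test domain mirrors production, can be sketched as a set comparison. The function and the product and board names below are hypothetical; the sketch only reports software that production runs but the test domain lacks, and boards proposed for the production attach that were not exercised in testing.

```python
def parity_gaps(test_software, prod_software, tested_boards, boards_to_attach):
    """List reasons a DR test would not represent the production operation."""
    gaps = []
    missing = set(prod_software) - set(test_software)
    if missing:
        gaps.append("not in test domain: " + ", ".join(sorted(missing)))
    untested = set(boards_to_attach) - set(tested_boards)
    if untested:
        gaps.append("boards not tested: " + ", ".join(sorted(untested)))
    return gaps


if __name__ == "__main__":
    prod = {"Solaris 9", "Oracle 9i", "Sun Cluster"}
    test = {"Solaris 9", "Oracle 9i"}
    print(parity_gaps(test, prod, tested_boards={"SB3"}, boards_to_attach={"SB3", "SB4"}))
    # ['not in test domain: Sun Cluster', 'boards not tested: SB4']
```

An empty result does not prove the test was adequate; it only shows that the software stack and board set match what will be used in production.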