Best Practices
This section introduces the best practices for planning and implementing DR.
Architect Phase
This section describes the best practices for architecting a platform for DR. For requirements, refer to the first part of this two-part article series, "Dynamic Reconfiguration for High-End Servers: Part 1: Planning Phase."
Architect for DR From the Start
To perform DR operations, the domain must be properly configured for DR. Always architect the Sun Fire 15K/12K platform for DR from the start. This includes configuring duplicate boards with the same amount of memory and CPU resources. If practical, configure an additional hot-spare CPU/Memory board into the Sun Fire 15K/12K platform. Such a "floating" board can be attached to a domain before another board is detached from that domain, which lessens the impact of the detach on system memory.
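The capacity reasoning behind the floating board can be sketched in a few lines of Python. This is a minimal planning aid, not a Sun tool: the Board record, the board labels, and the CPU and memory figures are all hypothetical, and the check only compares totals against requirements that the administrator supplies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    name: str        # hypothetical slot label such as "SB0"
    cpus: int
    memory_gb: int


def boards_are_uniform(boards):
    """True if every board has the same CPU count and memory size."""
    return len({(b.cpus, b.memory_gb) for b in boards}) <= 1


def can_detach(boards, target, required_cpus, required_memory_gb):
    """True if the domain still meets its requirements without `target`."""
    remaining = [b for b in boards if b.name != target]
    if len(remaining) == len(boards):
        raise ValueError(f"board {target!r} is not in the domain")
    return (sum(b.cpus for b in remaining) >= required_cpus
            and sum(b.memory_gb for b in remaining) >= required_memory_gb)


# Hypothetical domain: two identical boards plus one floating spare.
domain = [Board("SB0", 4, 16), Board("SB1", 4, 16)]
floating = Board("SB2", 4, 16)

print(boards_are_uniform(domain + [floating]))        # True
print(can_detach(domain, "SB0", 8, 32))               # False
print(can_detach(domain + [floating], "SB0", 8, 32))  # True
```

In this example, the two-board domain cannot give up SB0 and still meet its requirements, but it can after the floating board has been attached.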
Operating System Readiness
Always duplicate critical slot 1 resources. The lowest numbered I/O board hosting the golden Input/Output Static Random Access Memory (IOSRAM) and the domain console Ethernet port are critical resources. Consider configuring a domain with two hsPCI boards. Install the boot device and primary domain network interface in the lowest numbered hsPCI in the domain.
Duplicate the hsPCI cards for IPMP/Sun StorEdge Traffic Manager Software and pre-test host bus adapters. Consider duplicating hsPCI cards for critical resources, for example, the boot disk and the production network. Use hsPCI cards that have been tested and are known to be good, because HPOST does not test I/O adapters.
Configure swap space as multiple partitions or files on disks attached to controllers hosted by different I/O boards. This practice allows any swap partition to be easily replaced using the swap command.
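The following sketch shows one way to check a planned swap layout against this practice. It is illustrative only: the mapping from swap device to I/O board is entered by hand, and the device paths and board labels are hypothetical.

```python
def swap_left_after_detach(swap_devices, io_board):
    """Return the swap devices that stay online if `io_board` is detached.

    `swap_devices` maps a swap device path to the label of the I/O board
    that hosts its controller.
    """
    return sorted(dev for dev, board in swap_devices.items()
                  if board != io_board)


def single_points_of_failure(swap_devices):
    """Return the I/O boards whose detach would leave the domain without swap."""
    boards = set(swap_devices.values())
    return sorted(b for b in boards
                  if not swap_left_after_detach(swap_devices, b))


# Hypothetical layout: one swap partition behind each of two I/O boards.
layout = {
    "/dev/dsk/c0t0d0s1": "IO0",
    "/dev/dsk/c1t0d0s1": "IO1",
}

print(swap_left_after_detach(layout, "IO0"))  # ['/dev/dsk/c1t0d0s1']
print(single_points_of_failure(layout))       # []
```

An empty result from single_points_of_failure means that any one I/O board can be detached while some swap remains. On a running domain, swap -l lists the configured swap areas, and swap -d and swap -a remove and add them.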
Security Readiness
Use appropriate available component lists (ACLs) for each domain, and set them up to protect hardware resources between domains. Create a separate login account for each domain's administrator, with appropriate platform and domain privileges, so that the administrator does not rely on the sms-svc account each time a DR operation (for example, addboard or deleteboard) is performed from the SC. This practice is important because sms-svc overwrites everything, including the ACL setup of each domain.
Application Readiness
Size the application memory appropriately. When using Oracle databases, consider the relationship of total memory size versus System Global Area (SGA) requirements. Smaller database SGAs are easier to move.
Configure applications to tolerate the pause (OS quiescence) required to test components.
Device Readiness
All I/O devices need full DDI support. As a best practice, spread memory equally across all boards.
Implement Phase
This section describes best practices for the implementation phase.
Sun Fire 15K/12K Server Installation
Ensure that all hardware, software, and firmware requirements presented in the first part of this two-part article series, "Dynamic Reconfiguration for High-End Servers: Part 1: Planning Phase," are satisfied.
Full Testing and Verification
As part of the Enterprise Installation Standards (EIS) process, basic DR tests are performed on each Sun Fire 15K/12K server installation. These basic DR tests consist of a DR detach of the CPU/Memory board with kernel memory and a subsequent DR attach of that board. While these tests are basic, they can verify the correct hardware, software, and firmware settings. After the EIS installation, further modifications are performed on the domains, for example, installing applications or hardening security.
It is best practice to test DR operations in a test domain before implementing them in a production environment. The test domain should be configured with the same hardware, software (OS including patches), firmware, and applications used in the production domain.
Test the DR operations regularly to make sure they are working properly, in particular after configuration changes, such as patch updates, have been made to the platform. A quick and effective verification is to perform a DR detach operation of the CPU/Memory board containing the kernel memory, followed by a DR attach of the same board.
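The detach-and-attach verification can be written down as a repeatable plan before it is run. The Python sketch below only builds and prints the command sequence; it does not execute anything. It assumes the SMS commands showboards, deleteboard, and addboard are run on the SC, and the option syntax shown, the domain tag "A", and the board label "SB2" are assumptions to check against the SMS reference manual for the installed release.

```python
import re

# Assumed label format for CPU/Memory boards (for example, "SB2").
BOARD_PATTERN = re.compile(r"SB\d+")


def dr_verification_plan(domain, board):
    """Return the commands for a detach-then-attach check, in order.

    Nothing is executed here; the caller decides how and when to run
    each command on the SC.
    """
    if not BOARD_PATTERN.fullmatch(board):
        raise ValueError(f"unexpected board label: {board!r}")
    return [
        ["showboards", "-d", domain],       # record the starting state
        ["deleteboard", board],             # DR detach of the board
        ["addboard", "-d", domain, board],  # DR attach of the same board
        ["showboards", "-d", domain],       # confirm the board is back
    ]


# Hypothetical domain tag and board label.
for command in dr_verification_plan("A", "SB2"):
    print(" ".join(command))
```

Keeping the plan as data makes it easy to paste into a runbook and to review before the test is carried out in a production domain.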
Documentation
All DR operations and procedures should be well documented (for example, in a runbook) for troubleshooting the domain DR processes.
Integration Into Change Management
After testing and documenting all DR operations that will later be executed in the production environment, integrate the DR process into the existing change management system.
Manage Phase
This section describes best practices for the manage phase.
What To Avoid
Do not attempt to DR detach CPU and memory resources from an already busy system. Schedule DR detach operations during non-peak hours, if possible. Also avoid configuring non-DR-safe devices or drivers, because they might cause the DR operation to fail if the OS needs to quiesce.
Continuous Change Management and Verification
DR is a major availability feature of the Sun Fire 15K/12K servers, providing significant system uptime benefits. To leverage these benefits, ensure that DR is working properly. Whenever a change in the domain is performed, it is strongly recommended that you verify the functionality of the DR operations. For example, a DR detach of the CPU/Memory board containing the kernel permanent memory followed by a DR attach of the same board is, in most cases, sufficient to verify the proper functioning of DR.
Increase User Acceptance
Always use, and encourage the use of, user-friendly DR tools. Sun Management Center software is recommended for DR because it simplifies the DR process and is less error-prone than other tools. The SMS addboard and deleteboard commands are easier to use than the Solaris cfgadm command and are generally better accepted.