- Recommendations for Applying Preferred Practices
- Principles of Mission-Critical Implementations
- Physical Environment
- Internal Network Planning
- External Network Planning
- System Controller Configuration
- Platform and Domain Administration
- Security
- Error Analysis and Diagnosis
- Platform and Domain Configuration
- Dynamic Reconfiguration
- References
- Related Resources
Principles of Mission-Critical Implementations
As a general rule, Sun Fire 15K/12K servers are designed to run large, mission-critical applications that require very high levels of availability, recoverability, serviceability, and manageability. Keep in mind that even the most advanced and well-tested technologies can and will fail, whether the cause is software, hardware, or operational procedures.
Consider the following basic design principles when using Sun Fire 15K/12K servers to achieve a successful implementation.
Configure these servers so that there is no single point of failure: build redundancy into components, and define simple, fast recovery procedures for the cases where built-in redundancy and failover do not work.
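To see why removing single points of failure matters, consider the standard availability formula for redundant components. The following is a minimal sketch, not a sizing tool: it assumes that component failures are independent and that failover always succeeds, which real systems only approximate (hence the need for recovery procedures). The function name and the 99 percent figure are illustrative assumptions, not values from Sun Fire documentation.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components in parallel.

    Assumes independent failures and perfect failover: the group is
    unavailable only when every copy is unavailable at the same time.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    unavailability = 1.0 - component_availability
    return 1.0 - unavailability ** copies


if __name__ == "__main__":
    # Illustrative figure only: a single component that is up 99% of the time.
    single = 0.99
    for n in (1, 2, 3):
        print(f"{n} copies: {parallel_availability(single, n):.6f}")
```

Under these assumptions, duplicating a 99 percent component yields 99.99 percent for the pair. Correlated failures (shared power, shared firmware, operator error) reduce the benefit in practice, which is why the recovery procedures remain necessary.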
Design the platform infrastructure on the assumption that maintenance windows might be nonexistent or minimal. The platform should therefore allow maintenance and servicing to be performed while systems are online.
Put in place low-level infrastructure and services that have very high availability (for example, 99.99 percent). This includes local area networks (LANs), wide area networks (WANs), naming and directory services, network file services, power, cooling, and storage area networks (SANs).
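The 99.99 percent figure can be translated into a concrete downtime budget, and it is worth seeing how that budget shrinks when an application depends on several such services at once. The sketch below uses simple arithmetic under two stated assumptions: a 365-day year, and services that fail independently and are all required (a serial dependency chain). The function names and the choice of five services are hypothetical, chosen only for illustration.

```python
import math
from typing import Iterable

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def annual_downtime_minutes(availability: float) -> float:
    """Expected minutes of downtime per year for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every service must be up.

    Assumes the services fail independently of one another.
    """
    return math.prod(availabilities)


if __name__ == "__main__":
    one_service = 0.9999
    print(f"One service: {annual_downtime_minutes(one_service):.2f} min/year")

    # Hypothetical example: an application that needs five such services
    # (for instance LAN, naming service, file service, power, and SAN).
    chain = serial_availability([one_service] * 5)
    print(f"Five services in series: {chain:.6f} "
          f"({annual_downtime_minutes(chain):.1f} min/year)")
```

A single service at 99.99 percent allows about 52.6 minutes of downtime per year. Five such services in series allow roughly 4.4 hours combined, which illustrates why each infrastructure layer needs a higher availability target than the application it supports.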
Select standard solutions for backup, recovery, storage, SAN, and networking. Avoid accumulating one-off, application-specific cases when building the platform infrastructure. Minimize the number of builds, platforms, versions, and tools used.
Choose technologies that are well established in the industry and well supported by their vendors. Avoid in-house customized solutions that must be maintained internally by the system administrators and IT staff. Avoid bleeding-edge technologies that do not have a track record.
Choose technologies and designs that are well understood and manageable by your IT staff. Design architectures that are simple, and reduce complexity whenever possible.
Invest in areas that have a large effect on availability and quality of service. For example, invest in operations management, testing procedures, on-site spares, high-end service contracts, IT staff training, change management, monitoring tools, and performance tools.
Implement security procedures and use security hardening toolkits to protect the servers' network access points and services. Test the security around the Sun Fire platforms with intrusion detection software and audits. Use established products and procedures for implementing security.
Develop, maintain, and test online operational procedures, and implement monitoring software from well-established vendors.