Campus Clusters Based on Sun Cluster Software
- Introduction
- Technology Options for Disaster Recovery Solutions
- Quick Checklist for Deployments
- Campus Cluster Maximum Distances
- Campus Cluster Topologies and Components
- Campus Cluster Configurations
- Performance in a Campus Cluster Environment
- Management Aspects of Campus Clusters
- Glossary
- Related Resources
This article highlights key technologies involved in spreading a SunPlex™ environment (enabled by Sun Cluster 3.0 software) across a company campus or distributed sites. It also describes the processes that data centers need to adopt to take advantage of campus clusters.
This article is targeted at IT architects and technical staff who want to understand, evaluate, and address single-site level disaster recovery for their data centers.
Introduction
Disasters do not happen often, but when they do occur, they are likely to have a significant impact on business in terms of lost revenue and service availability. Ensuring business continuity requires that enterprises deploy a multifaceted solution that includes several levels of disaster prevention and recovery technologies and well-documented procedures.
As part of a comprehensive, flexible, and scalable disaster recovery solution, campus clusters based on Sun Cluster 3.0 and newer versions can help protect service availability. With the SunPlex environment, enterprises can deliver higher service levels while helping to protect their critical business services from unavoidable risks, ranging from small interruptions such as power failures to major catastrophes such as earthquakes and fires.
Yet technology alone does not address all aspects of continuous service availability. While most enterprises deploy some type of disaster recovery technology to protect against hardware failures or isolated incidents, protecting against a major catastrophe requires a well-planned, comprehensive solution. To ensure the highest levels of business continuity, enterprises must invest in three essential components: people, processes, and products. A well-trained staff armed with thoroughly tested procedures and a robust cluster infrastructure such as Sun Cluster 3.0 is the best defense against detrimental service interruptions.
Cluster Evolution
The concept of clustering two or more redundant servers and related storage arrays was originally introduced to ensure higher levels of availability in mission-critical or compute-intensive environments. These original clusters were expensive to manage, complex to administer, and difficult to extend as needs changed. Consequently, their use was limited. As high-end servers became more affordable and more widely used by enterprises of all types, clustering technology evolved to provide much greater flexibility, scalability, and manageability with increasing levels of service availability.
Local clusters (that is, clusters where all of the nodes and storage subsystems are at the same site) play a major role in achieving business continuity by providing a solid level of continuous service availability. In the early days of clustering technology, shared storage subsystems were usually attached using SCSI technology. Because of SCSI cabling limits, the distance between cluster nodes was bounded by the total cable length from one server to the shared storage and on to the other server, which could not exceed 50 m.
While this configuration offers good protection against events such as node or disk crashes, it does not protect against events that could destroy or damage the facility site.
With the advent of Fibre Channel technology, it became possible to replicate data over much greater distances. Now, enterprises can deploy cluster nodes and storage in different buildings or even at different sites without changing the software infrastructure, applications, data, and storage subsystems, thus building extended clusters.
Cluster Limitations
One drawback of long cable runs between nodes or storage is increased latency, which can decrease performance dramatically. When distances grow further (for example, across a country) and latency increases accordingly, other types of data replication (for example, asynchronous mirroring) and other types of high availability or disaster recovery solutions (for example, a cluster of clusters) are needed to solve these problems.
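To give a feel for how distance translates into latency, the following minimal sketch estimates the propagation delay that cable length alone adds to each round trip. It assumes a signal speed of about 200,000 km/s in optical fiber (roughly two thirds of the speed of light in a vacuum); the function name and the sample distances are illustrative and do not come from Sun Cluster. Switches, protocol handshakes, and the storage devices themselves add further delay that is not modeled here.

```python
# Rough estimate of the latency that fiber distance adds to a synchronous write.
# Assumption: light travels through optical fiber at about 200,000 km/s,
# which works out to about 5 microseconds per kilometer in one direction.

FIBER_KM_PER_SECOND = 200_000.0  # assumed propagation speed in fiber


def round_trip_delay_ms(distance_km: float, round_trips: int = 1) -> float:
    """Return the propagation delay in milliseconds for the given round trips."""
    if distance_km < 0 or round_trips < 1:
        raise ValueError("distance must be >= 0 and round_trips must be >= 1")
    one_way_seconds = distance_km / FIBER_KM_PER_SECOND
    return 2 * one_way_seconds * round_trips * 1000.0


if __name__ == "__main__":
    # Illustrative distances: a machine room, a campus, a metro area, a country.
    for km in (0.05, 10, 100, 4000):
        print(f"{km:>7} km: {round_trip_delay_ms(km):.4f} ms per round trip")
```

Under this assumption, a 10 km campus link adds about 0.1 ms per round trip, while a 4000 km link adds about 40 ms. Because a synchronously mirrored write cannot complete until the remote copy acknowledges it, this delay is paid on every write, which is why asynchronous replication becomes attractive at long distances.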
Although extended clusters offer significant protection against disasters, they are not a complete disaster recovery solution. A cluster that has only one logical copy of data is still vulnerable to inconsistencies that might be introduced by faulty software or hardware, even if that data is mirrored. Common user errors, such as erroneously deleting database tables, can also cause a major disaster. In those cases, a tape backup or some other up-to-date copy of the data is invaluable for recovery.
Even cluster software can fail, especially in the case of a major disaster affecting the cluster infrastructure. For example, a campus cluster where all the nodes are located within a few kilometers may be subject to a major earthquake, knocking out utilities or otherwise affecting its operation. To protect against this possibility, most enterprises deploy a multifaceted solution to ensure continuous service availability.
People, Processes, and Products
Campus clusters are one of the best examples of where people, processes, and products must work together for the solution to deliver its maximum benefits. The entire integrated stack of products, from servers and storage subsystems to the operating environment and clustering software, forms only the base of a highly available campus cluster infrastructure. Well-trained, dedicated people must then administer the infrastructure. Processes that cover all aspects of disaster prevention and recovery must be in place. A well-prepared enterprise not only deploys a comprehensive solution but also verifies its processes, trains its staff, and tests the technologies regularly (at least annually). When personnel are trained, best practices are implemented, and sound technologies are deployed, companies can deliver the high levels of service continuity required to remain competitive in today's economy.
SunPlex Environment
Built around the Sun Cluster 3.0 solution, the Solaris™ Operating Environment (Solaris OE), and Sun™ server, storage, and network connectivity products, the SunPlex environment helps increase business service levels while decreasing the costs and risks of managing complex enterprise networks. Through the SunPlex environment, devices, file systems, and networks can operate seamlessly across a tightly coupled pool of resources, making it easy to deploy extended or campus clusters without changing the underlying infrastructure or applications. A campus cluster based on Sun Cluster 3.0 software is a cluster whose nodes are distributed across at least two physically separate sites.
Sun Cluster 3.0 software is designed to protect against single hardware or software failures such as node crashes or service interruptions. For greater reliability and performance, Sun Cluster 3.0 software is tightly integrated with the Solaris OE. This integration speeds up error detection time and makes the whole software stack more robust.
Depending on the failure, Sun Cluster 3.0 software either fails over the affected services to another node in the cluster or tries to restart them. In either case, the software's highest priority is to maintain data integrity regardless of what happens. This requirement drives the layout of the infrastructure and all of the algorithms in the product. It is also the reason why, in certain disaster scenarios, recovery procedures might need to be initiated manually so that data integrity is not jeopardized.
Standard monitoring agents are available for many best-of-breed databases and ERP applications. Agents for other services can be developed and deployed using either sophisticated APIs or easy-to-use utilities such as the SunPlex Agent Builder tool.
The Sun Cluster 3.0 software framework and associated algorithms do not change when deployed in a campus cluster. Service availability with data integrity is the primary goal. Depending on the actual requirements, the Sun Cluster 3.0 solution can form an excellent base for a disaster recovery solution, especially when combined with additional technologies, trained personnel, and well-developed management processes.