- Introduction
- Windows Clustering 101
- Forest Creation Process
- Installation
- Installation of Root Domain
- Quality Assurance
- Forest Preparation, DNS, and Exchange
- Installation of Bridgehead Servers and the Child Domain
- Installing DHCP and WINS Services
- Patching and Updating Domain Controllers
- Exchange Domain Preparation
- Creation of Initial Service and Administration Resources
- Clustering
- Time-Out
Windows Clustering 101
There was a time in the not-too-distant past when the thought of clustering Windows servers sent a chill down the spines of network engineers and caused them to go take out long-term care insurance. Those days are gone with the clustering services that are now built into the Windows Server 2003 operating system. Only Windows Server 2003, Enterprise Edition and Windows Server 2003, Datacenter Edition can create clusters. Windows Storage Server 2003 is a version of Enterprise Edition for clustering file share resources.
There are two parts to clustering a high-availability service or application, such as Exchange 2003, SQL Server 2000, or SQL Server 2005. The first part entails setting up the base cluster service and getting a virtual server going. The second part entails creating the resources that fail over on that virtual server. Most of the second part of clustering is dealt with in Part II of this book.
By the time you are ready to cluster Exchange or SQL Server, you will be able to fail over the virtual server resources from one node to the other and keep resources, like drives and network interface cards, under the control of the cluster.
The Cluster Model
With Windows Server 2003, you have three models from which to choose; they are built into the operating system and are, thus, supported by Microsoft. The third option may require third-party software. Table 6.1 discusses the models in order of increasing complexity.
The most common cluster model (and inherited from Windows 2000 and Windows NT) is the single quorum cluster model in which multiple nodes of a cluster share a single quorum resource. In this model, all nodes communicate with each other across a local interconnect, and all nodes share a common disk array (in a SAN or a SCSI enclosure).
Windows Server 2003 also introduces the concept of a single node cluster, which is a cluster comprised of a single node or server. A single node cluster can host cluster resources but, for obvious reasons, those resources cannot fail over to anything.
Then there is the geographic cluster, or so-called "geo-cluster," in which the nodes that comprise the cluster are separated over a geographic divide. A wide area network usually separates the nodes, and the geo-cluster nodes can be in different buildings or even across the country. The nodes do not share storage or a single quorum device.
The central repository of data in a cluster is the so-called quorum resource. You can think of the quorum as the brain center of the cluster. The idea of a cluster is to provide system or server redundancy. In other words, when a server in the cluster fails, the cluster service is able to transfer operations to a healthy node. This is called failover. The quorum resource data is persistent, and the quorum must survive node failure in the cluster or the resources cannot fail over to the healthy node and start up.
This is why, in a traditional single quorum resource cluster, the quorum cannot be placed on a device attached to a node of the cluster unless the cluster service can gain exclusive access to the device (and unless it can be moved or transferred upon node failure, which is technically possible even on a local disk resource, as we will soon see). There are two exceptions to this rule: the single node cluster and the so-called geo-cluster, a concept in clustering now possible with Windows Server 2003.
Each of the cluster models discussed employs a different quorum resource type. Table 6.1 discusses the models.
Table 6.1 Cluster Model Options
| Cluster Model | Application | Location of Cluster Configuration Data |
| --- | --- | --- |
| Single Node | Ideal for labs, testing, development, and hosting applications on a virtual server | The quorum resource maintains the cluster configuration data either on a cluster storage device (an external drive array) or on a local drive on the node. Setup requires selection of the Local Quorum resource type. |
| Single Quorum | Typical local Active-Passive and Active-Active clusters | The quorum resource maintains the cluster configuration data on the single cluster storage device to which all nodes in the cluster are connected. Setup of this model ratifies the Physical Disk resource type (or another storage-class resource type). The cluster installation will fail if this resource type does not test true as a viable quorum (we demonstrate this later in the chapter). |
| Majority Node Set | Geographically dispersed server clusters | Geographic clusters are separated over wide area networks; therefore, each node maintains its own copy of the cluster configuration data. The quorum resource ensures the cluster configuration data is kept consistent across the nodes. |
Single Node
Of particular interest is the Single Node cluster model, in which the quorum resource can be maintained on a storage device on the local node. The idea behind the single node model is novel. With previous versions of the operating system, it was impossible to establish a virtual server (the name users attach to) on a cluster comprising only one node. The single node cluster enables this. The Single Node cluster model is illustrated in Figure 6.1.
NOTE
This chapter covers the creation of a single quorum cluster. However, we do touch on the subject of geo-clusters in Chapter 10, "High Availability, High-Performance Exchange."
You can use the single node cluster for lab testing of applications that have been engineered for clustering. You can also use it to test access to storage devices, quorum resources, and so on. The lab or development work is, thus, used to migrate the cluster-aware application into production as a standard single quorum cluster. It is also possible to simply cluster the single node with other nodes at a later time. The resource groups are in place and all you need to do is configure fail-over policies for the groups.
Figure 6.1 Single Node cluster model.
A single node cluster can also be used to simply provide a virtual server that users connect to. The virtual server name and service thus survive the failure or replacement of any particular physical server. Both administrators and clients can see the virtual servers on the network, and they do not have to browse a list of actual servers to find file shares.
What happens when the server hosting the single node cluster and the virtual server fails? The Cluster service automatically restarts the applications and their dependent resources when the node is repaired. You can also use this service to automatically restart applications that would not otherwise be able to restart themselves.
For example, you can use this model to locate all the file and print resources in your organization on a single computer, establishing separate groups for each department. When clients from one department need to connect to the appropriate file or print share, they can find the share as easily as they would find an actual computer.
You can move the virtual server to a new node and end users never know the physical server behind the virtual server name has changed. The real NetBIOS name of the server is never used. The downside of this idea is downtime: moving the virtual server name to a new server requires an outage. Therefore, this is not suitable as a high-availability solution.
Single Quorum Cluster
This cluster model prescribes that the quorum resource maintain all cluster configuration data on a single cluster storage device that all nodes have the potential to control. As mentioned earlier, this is the cluster model available in previous versions of Windows. The Single Quorum cluster model is illustrated in Figure 6.2.
Microsoft discounts the perception that the cluster storage device can be a single point of failure, promoting the idea that a Storage Area Network (SAN), where there are often multiple, redundant paths from the cluster nodes to the storage device, mitigates the risk. We do not discount this model, but if you study how a SAN is built, you discover there is some truth to the claim that a SAN is a single point of failure.
You can indeed have multiple paths to the storage device (the "heart" of the SAN) as discussed in Chapters 3, "Storage for Highly Available Systems," and 4, "Highly Available Networks." However, the SAN controller is really nothing more than a server with an operating system that is dedicated to hosting the drive arrays in its enclosures. Unless you have redundant controllers, your SAN will fail if a component in the SAN controller fails. SAN memory can fail, its operating system can hang, the processors can be fried, and so on. Thus, to really eliminate every single point of failure in this model, you need two SANs on the back end. This idea opens a can of worms. After all, most IT shops do not budget for two SANs for every cluster. The SANs of today have many redundant components within their single footprint (usually a very large footprint) in the data center. To deploy two mirrored SANs on a cluster is not only a very expensive proposition, but it is technically very difficult to install and manage.
Figure 6.2 Single Quorum cluster model.
Majority Node Set
As mentioned, geo-cluster nodes can reside on opposite sides of the planet because each node maintains its own copy of the cluster configuration data. The quorum resource in the geo-cluster is called the Majority Node Set resource. Its job is to ensure the cluster configuration data is kept consistent across the different nodes; it is essentially a mirroring mechanism. The Majority Node Set cluster model is illustrated in Figure 6.3.
The quorum data is transmitted unencrypted over Server Message Block (SMB) file shares from one node to the others. Naturally, the cluster nodes are not connected to a common cluster disk array, which is the main idea behind this model.
Figure 6.3 Majority Node Set cluster model.
You can use a majority node set cluster in special situations, and it will likely require special third-party software and hardware offered by your Original Equipment Manufacturer (OEM), Independent Software Vendor (ISV), or Independent Hardware Vendor (IHV).
Let's look at an example. Let's say we create an 8-node geo-cluster. We could, for example, locate four nodes in one data center, say in Atlanta, and the other four nodes in another data center in Phoenix. This can be achieved, and you can still present a single point of access to your clients. At any time, a node in the geo-cluster can be taken offline, either intentionally or as a result of failure, and the cluster still remains available.
You can create these clusters without cluster disks. In other words, you can host applications that can fail over, but the data the applications need is replicated or mirrored to the quorum data repositories on the other nodes in the cluster. For example, we can use this model with SQL Server to keep a database state up-to-date with log shipping. In Chapter 10, we investigate the particular solutions offered by NSI Software: Double-Take and GeoCluster.
The majority node set is enticing, but there are disadvantages. For starters, if more than half the nodes fail at any one time, then the entire cluster fails. When this happens, we say the cluster has lost quorum. This fail-over limitation is in contrast to the Single Quorum cluster model discussed earlier, which will not fail until the last node in the cluster fails.
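To make the arithmetic concrete, here is a minimal Python sketch of that rule, assuming the standard strict-majority formula (more than half of the configured nodes must be online). The node counts are illustrative only; this is a conceptual model, not actual cluster code.

```python
def has_quorum(configured_nodes: int, online_nodes: int) -> bool:
    """A majority node set retains quorum only while a strict majority
    of the configured nodes remains online."""
    majority = configured_nodes // 2 + 1
    return online_nodes >= majority

# Hypothetical 8-node geo-cluster: a strict majority (at least 5 of 8 nodes)
# must remain online for the cluster to keep running.
for survivors in range(8, 3, -1):
    print(f"{survivors} of 8 nodes online -> quorum: {has_quorum(8, survivors)}")
```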
The Quorum Resource
Every cluster requires a resource which is designated as the quorum resource. The idea of the quorum is to provide a place to store configuration data for the cluster. Thus, when a cluster node fails, the quorum lives to service the new active node (or nodes) in the cluster. The quorum essentially maintains the configuration data the cluster needs to recover.
This data in the quorum is saved in the form of recovery logs. These logs record the changes that have been made to the cluster database. Each node in the cluster depends on the data in the cluster database for configuration and state.
A cluster cannot exist without the cluster database; a cluster comes into being as each node that joins it updates its private copy of the cluster database. When you add a node to an existing cluster, the Cluster service retrieves data from the other active nodes and uses it to expand the cluster. When you create the first node in a cluster, the creation process updates the cluster database with details about the new node. This is discussed in more detail in the section "Clustering" later in this chapter.
The quorum resource is also used by the cluster service to ensure the cluster is composed of an active collection of communicating nodes. If the nodes in the cluster can communicate normally with each other (across the cluster interconnect), then you have a cluster. Like all service databases on the Windows platform, the cluster database and the quorum resource logs can become corrupt. There are procedures to fix these resources and we cover this a little later in this chapter.
When you attempt to create a cluster, the first node in the cluster needs to gain control of the quorum resource. If it cannot see the resource (the quorum), then the cluster installation fails. We show you this later. In addition, a new node is allowed to join a cluster or remain in the cluster only if it can communicate with the node that controls the quorum resource.
Let's now look at how the quorum resource is used in a two-node cluster, which is the type of cluster we will build in the coming chapters.
When the first node in the cluster fails, the second node takes control of the cluster database and continues to write changes to it. When the first node recovers and a fail-back is initiated, ownership of the cluster database and the quorum resource is returned as part of the fail-back mechanism.
But what if the second node fails before the first is recovered? In such a case, the first node must first update its copy of the cluster database with the changes made by the second node before it failed. It does this using the quorum resource recovery logs.
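To illustrate that catch-up step, here is a small Python sketch. It is a conceptual model only; the key names and log entries are hypothetical and bear no relation to the on-disk format the Cluster service actually uses.

```python
# Conceptual model: the cluster database as a dictionary, and the quorum
# recovery log as the ordered list of changes made while this node was down.
stale_cluster_db = {"GroupA/owner": "NODE1", "GroupA/state": "online"}

recovery_log = [
    ("GroupA/owner", "NODE2"),   # node 2 took ownership after node 1 failed
    ("GroupB/owner", "NODE2"),   # a group created while node 1 was offline
    ("GroupB/state", "online"),
]

def replay(db, log):
    """Apply the logged changes in order so the recovering node catches up."""
    for key, value in log:
        db[key] = value
    return db

print(replay(stale_cluster_db, recovery_log))
```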
In the event the interconnect between the nodes fails, each node automatically assumes the other node has failed. Typically, both nodes then attempt to continue operating as the cluster, and what you now have is a state called split-brain syndrome. If both servers succeeded in operating the cluster, you would have two separate clusters claiming the same virtual server name and competing for the same disk resources. This is not a good condition for a system to find itself in.
The operating system prevents this scenario with quorum resource ownership. The node that succeeds in gaining control of the quorum resource wins and continues to present the cluster. In other words, whoever controls the "brain" wins. The other node submits, the fail-over completes, and the resources on the failed node are deactivated.
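The following Python sketch models that arbitration conceptually. The lock is merely a stand-in for the exclusive reservation a node places on the quorum device; the node names are hypothetical, and no real Windows API is being called.

```python
import threading

class QuorumResource:
    """Stand-in for the quorum device: only one node can hold it at a time."""
    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def try_arbitrate(self, node):
        # Non-blocking attempt to take exclusive control of the quorum.
        if self._lock.acquire(blocking=False):
            self.owner = node
            return True
        return False

quorum = QuorumResource()
for node in ("NODE1", "NODE2"):   # interconnect is down; each thinks it is alone
    if quorum.try_arbitrate(node):
        print(f"{node} controls the quorum and continues to present the cluster")
    else:
        print(f"{node} lost arbitration and shuts down its cluster resources")
```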
What constitutes a valid quorum resource? The quorum can be any resource that meets the following requirements:
- A single node must be able to gain physical control of it and defend that control.
- It must reside on physical storage that can be accessed by any node in the cluster.
- It must be formatted with the NTFS file system.
It is possible to create custom resource types as long as developers meet the arbitration and storage requirements specified in the API exposed by the Microsoft Software Development Kit. Let's now look at some deployment scenarios.
Deployment Scenarios
Let's discuss some example deployment schemes, namely the n-node fail-over scheme, the fail-over ring scheme, and the hot-standby server scheme.
In the n-node fail-over scheme, you deploy applications that are set up to be moved to a passive node when the primary node on a 2-node cluster fails. In this configuration, you limit the possible owners for each resource group. You will see how we do this in Part II of this book.
Let's consider the so-called N+I hot-standby server scheme. Here you reduce the overhead of the 2-node failover by adding a "spare" node (one for each cluster pair) to the cluster. This provides a so-called "hot-standby" server that is part and parcel of the cluster and equally capable of running the applications from each node pair in the event of a failure of both of the other nodes. Both of these solutions are called active/passive clusters: n-node and n-node+1 (or N+1).
As you create the N+1 cluster, you will discover it is a simple matter to configure a node as the spare. How you use a combination of the preferred owners list and the possible owners list depends on your application. You typically set the preferred node to the node that the application runs on by default, and you set the possible owners for a given resource group to the preferred node and the spare node.
Then there is the concept of a Failover Ring. Here you set up each node in the cluster to run an application instance. Let's assume we have an instance of SQL Server on each node of the cluster. In the event of a failure, the SQL Server on the failed node is moved to the next node in sequence. Actually, an instance of SQL Server is installed on every server. Fail-over simply activates the SQL Server instance, and it takes control of the databases stored on the SAN or SCSI array. We call this the Active-Active cluster.
You can also allow the server cluster to choose the fail-over node at random. You can do this with large clusters simply by not defining a preferred owners list for the resource groups. In other words, each resource group that has an empty preferred owners list fails over to any available node at random when the node currently hosting that group fails.
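The sketch below pulls these schemes together. It is illustrative Python only; the node names, lists, and the function itself are hypothetical and not part of any Windows interface. It shows how the possible owners list, the preferred owners list, and an empty preferred owners list combine to pick a fail-over target.

```python
import random

def pick_failover_node(failed_node, online_nodes, possible_owners, preferred_owners):
    """Choose where a resource group moves when its current node fails."""
    # Candidates must be online, allowed to host the group, and not the failed node.
    candidates = [n for n in online_nodes
                  if n in possible_owners and n != failed_node]
    if not candidates:
        return None                      # nowhere to go: the group stays offline
    # Honor the preferred owners list in order, as in an N+1 configuration.
    for node in preferred_owners:
        if node in candidates:
            return node
    # Empty preferred owners list: fail over to any capable node at random.
    return random.choice(candidates)

# Hypothetical N+1 setup: the group normally runs on NODE1, and NODE4 is the spare.
online = ["NODE2", "NODE3", "NODE4"]
print(pick_failover_node("NODE1", online,
                         possible_owners=["NODE1", "NODE4"],
                         preferred_owners=["NODE1", "NODE4"]))   # prints NODE4
```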
We will leave the clustering subject now and return to the creation of the infrastructure to support our clusters.