Introduction
This article aims to help application owners maintain continuous service delivery for single-server applications on the cloud. We focus on using workload migration to reduce downtime due to planned maintenance, which is the main contributor to downtime for single-server cloud applications. We describe tools, techniques, and best practices for minimizing service disruption while an application or its infrastructure is under maintenance.
When an application's primary system is unavailable, commonly because of maintenance by the service provider or the virtual-machine owner, the services it provides can be moved to a secondary location, a practice known as workload migration. Migration consists of moving an application to a secondary location that has previously been configured in the same way as the originating system; this includes ensuring that the data on the target system is complete and consistent with that of the originating system. Workload migration on the cloud differs from high availability (HA) and disaster recovery (DR) setups in that it is performed primarily for planned reasons.
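As a rough illustration of the consistency requirement, the following Python sketch compares two directory trees file by file before a switchover. It assumes, purely for illustration, that the application's state is a directory of files reachable from both systems; the function names `directory_digests` and `target_is_consistent` are hypothetical and do not come from any particular migration tool. State held in a database would need that database's own replication and verification tooling instead.

```python
import hashlib
from pathlib import Path


def directory_digests(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 digest of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def target_is_consistent(origin: Path, target: Path) -> bool:
    """Return True only if both trees hold the same files with identical contents."""
    return directory_digests(origin) == directory_digests(target)
```

A check of this kind would typically run after the final data synchronization and before traffic is redirected to the secondary location.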
We can divide applications into three kinds based on business context, as shown in Table 1.
Table 1
Application characterization.
Application Type | Typical Topology | Maintenance Goals
Lightly used | Usually only a single server | Migrate application to a secondary location during a maintenance activity, avoiding the inconvenience caused by maintenance outage
Heavily used | High availability (cluster) and disaster recovery topologies (active-passive) | Use DR site as secondary during maintenance at the primary site
Mission-critical | Multiple active servers distributed globally | Traffic management to the globally distributed systems ensures zero downtime due to maintenance or individual system failure
The primary focus of this article is applications that are relatively lightly used. Migration on the cloud gives these applications an economical way to continue to operate through maintenance windows, similar to the migration capabilities that heavily used and mission-critical applications enjoyed in the past.
Workload migration for single-server applications can be further distinguished from the approaches used by heavily used and mission-critical applications:
- High availability primarily focuses on maintaining availability in the event of a hardware failure. Its tools overlap somewhat with those used for migration, but many HA tools don't work over a high-latency network connection, such as an Internet link to a different geographic region. For example, WebSphere clustering needs to operate over a low-latency network.
- Disaster recovery shares goals and tools with application migration. However, DR tools and literature focus on mission-critical applications, whereas this article focuses on less critical applications and more economical methods.
This article does not aim to give lightly used applications the HA or DR capabilities for unplanned outages and disasters that heavily used and mission-critical applications require. Heavily used applications running on the cloud can achieve disaster recovery at a separate site in case of an outage, and the cloud enables mission-critical applications to run in a multi-node, active-active, geographically distributed topology that addresses high availability, disaster recovery, and maintenance simultaneously. Both of these topologies, however, incur greater development and infrastructure costs.
The conceptual diagram in Figure 1 shows the major parts of migrating a lightly used application to avoid a maintenance window.
Figure 1 Schematic diagram of migration concepts.