- Understanding the Model
- Identifying Key Components
- Applying Architectural Principles
- Applying the Model to FijiNet
Applying Architectural Principles
Supporting the key components of the sample ISP architectural model are architectural principles, shown earlier and again in FIGURE 6. Architectural principles are major design considerations that help you weigh the advantages and disadvantages of each design option, so that you arrive at a solution that best fits the business requirements, functional requirements, and available technology.
FIGURE 6 ISP Architectural Model
We categorize architectural principles into eight areas: scalability, availability, reliability, manageability, adaptability, security, performance, and open system. Although there are other design principles you might need to use or consider, we focus on these as most critical.
Consider each of these principles (and any others that apply) when evaluating design issues and trade-offs for key components. For example, apply scalability to different layers within an architecture. You could address it at the network, system, and application layers. Failing to address scalability at each layer could result in nonoptimal scalability for an architecture.
Ultimately, some architectural principles may not apply to your design. However, it's important initially to consider them as part of the design process, especially for large-scale environments with higher levels of complexity. For example, if cost is a significant design constraint, then adding expensive layers of redundancy to enhance availability is most likely not applicable.
Scalability
Scalability is the ability to add resources (for example, routers, switches, servers, memory, disks, and CPUs) to an architecture without redesigning it. A good design takes into account the need for scalability so that, within reason, new computing resources can be added on demand as a business grows and user demand increases. Some customers have a clear idea of their plans for growth and indicate so at the beginning; others may need you to suggest and build in scalability, based upon your interpretation of their current and future business requirements.
When you address scalability, we recommend using the following scaling models, depending upon which one is applicable to your design. These are simplified models that address scaling for both hardware and software at the same time during the architecture design process.
TABLE 1 Scaling Model for Servers

| Scaling Model | Vertical | Horizontal |
| --- | --- | --- |
| System Type | Single large system | Multiple small systems |
| Software Type | Multithreaded applications | Single-threaded applications |
| To Scale | Add CPU, memory, disk, and I/O | Add additional systems |
Both models apply to key components. Each major component within an infrastructure, for example, network, system, application, storage, etc., has its own scaling model.
Vertical Scalability
Multithreaded applications have a more complex scaling model. Typically, the first line of scaling for a multithreaded application is to achieve maximum vertical scalability within a single system by adding more resources such as CPU, memory, and I/O. Vertical scaling is appropriate for applications that scale well within a single large server, such as database servers.
TIP
Scale multithreaded applications vertically first. When maximum vertical scaling is achieved, scale the same applications using horizontal scaling techniques, for example, running the applications on multiple boxes behind a load balancer.
Horizontal Scalability
For single-threaded applications, the model for scaling is horizontal. In this model, the vertical scaling limits of a single server are avoided by distributing load across multiple servers. This technique is deployed at the system level by adding more servers to increase capacity.
TIP
Unlike multithreaded applications, single-threaded applications do not achieve optimal benefits from vertical scaling. For example, adding more memory benefits single-threaded applications; however, adding another CPU does not. Scaling horizontally can be done by running multiple instances on multiple boxes behind a load balancer.
In contrast to availability, which is designed for failover, the purpose of multiple system redundancy in scalability is to provide a model for adding resources to increase capacity.
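As a concrete illustration of the horizontal model, a minimal round-robin distributor can spread requests across interchangeable server instances. This is only a sketch; the class and server names are hypothetical, and a real load balancer would also perform health checks.

```python
from itertools import cycle

class RoundRobinPool:
    """Distribute requests across interchangeable servers (horizontal scaling)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rotation = cycle(self.servers)

    def next_server(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)

    def add_server(self, server):
        """Scale horizontally: add a server without redesigning the pool."""
        self.servers.append(server)
        self._rotation = cycle(self.servers)

pool = RoundRobinPool(["web1", "web2"])
# Requests alternate across the pool: web1, web2, web1, web2
assert [pool.next_server() for _ in range(4)] == ["web1", "web2", "web1", "web2"]
```

Adding capacity is then a matter of calling `add_server`, with no change to the distribution logic itself.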
Availability
Availability has many definitions within Internet architectures. In this book, it means that resources and access to those resources are available upon request. Availability design is predicated on removing any single point-of-failure within an architecture to ensure a desired level of uptime. Uptime is usually expressed as a percentage and often referred to as the "number of 9s." For example, most mission-critical systems have a desired uptime of "five 9s," meaning that the system is available 99.999 percent of the time.
TABLE 2 Availability Levels

| Uptime Percentage | Nines | Allowable Downtime Per Month |
| --- | --- | --- |
| 99.9999 | 6 | 0.043 minute |
| 99.999 | 5 | 0.43 minute |
| 99.99 | 4 | 4.32 minutes |
| 99.9 | 3 | 43 minutes |
| 99 | 2 | 7.2 hours |
We determined allowable downtime by using the following formula:

Availability = MTBF / (MTBF + MTTR)

where MTBF is mean time between failures and MTTR is mean time to repair.
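The formula and the downtime figures in TABLE 2 are easy to verify with a few lines of code; a 30-day month (43,200 minutes) is assumed here.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month

def availability(mtbf, mttr):
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

def allowable_downtime(availability_pct):
    """Minutes of allowable downtime per month at a given availability level."""
    return MINUTES_PER_MONTH * (1 - availability_pct / 100)

for pct in (99.9999, 99.999, 99.99, 99.9, 99):
    print(f"{pct}% -> {allowable_downtime(pct):.3f} minutes/month")
```

For example, five 9s (99.999 percent) allows roughly 0.43 minute of downtime per month, matching TABLE 2.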
For marketing reasons, many ISPs calculate the level of availability over a 12-month period instead of monthly. (This practice yields an overall higher average level of availability than calculating it monthly, because monthly calculations fluctuate from month to month.)
We calculate the availability monthly because system administrators typically perform maintenance monthly; therefore, monthly calculations are more beneficial for determining allowable downtime to perform maintenance and upgrades. This practice is fairly universal for system administrators of ISPs. Other reasons for calculating it on a monthly basis:
- Revenue, usage, statistics, spending, and similar figures are reported monthly.
- Waiting a full year to learn the level of availability is unrealistic.
A primary attribute of availability design is redundant hardware/software within the architecture, such as network, server, application, and storage.
TIP
Design in such a way that if a component fails, it does not cause the entire architecture to fail. To achieve this design objective, design using a modular approach, allowing components to be replaced at any time without affecting the availability of the system.
Availability design applies at four layers, covered in the following sections:
- Network layer
- System layer
- Application layer
- Data layer
Network Layer
At the network layer, availability can be achieved with redundant physical links to the Internet. This redundancy ensures that if there is a link failure, for example, due to hardware failure, access is still available via a surviving link. In addition, redundant network components such as routers, switches, load balancers, and firewalls are necessary to ensure access availability in the event of hardware failure. To enhance reliability at the network layer, remove all single points-of-failure from the network.
NOTE
For the Solaris™ Operating Environment (Solaris OE), IP Multipathing (IPMP) can be used to achieve redundant network connections from the same server to multiple switches.
System Layer
At the system layer, availability is achieved with redundant servers in stand-alone or cluster configurations.
For front-end servers such as those deployed in web farms, you can use load balancers to ensure availability in the event that one or more servers fail to respond to service requests.
In a cluster environment, two or more servers are configured to provide high availability. The number of nodes configured in a cluster is dependent upon the software and hardware. If one server fails, one of the surviving servers takes over and responds to service requests.
A fundamental difference between stand-alone servers and clustered servers is the ability to maintain session state. If a stand-alone server fails while a session is active, the client has to reestablish the connection. However, if a clustered server fails, the session state and connection are maintained by a standby server.
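The distinction can be sketched with a toy failover model in which cluster nodes share a session store. The shared in-memory store is a simplifying assumption for illustration; real clustering products replicate state in far more sophisticated ways.

```python
class Node:
    """A cluster member that can fail."""
    def __init__(self, name):
        self.name = name
        self.alive = True

class Cluster:
    """Toy cluster: session state lives in a store visible to all nodes,
    so a surviving node can continue a session after a failure."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.sessions = {}  # shared session store (simplifying assumption)

    def handle(self, session_id):
        """Route a request for a session to the first surviving node."""
        node = next((n for n in self.nodes if n.alive), None)
        if node is None:
            raise RuntimeError("no surviving nodes")
        self.sessions.setdefault(session_id, {"state": "active"})
        return node.name

cluster = Cluster([Node("node1"), Node("node2")])
cluster.handle("s1")              # served by node1
cluster.nodes[0].alive = False    # node1 fails mid-session
survivor = cluster.handle("s1")   # node2 takes over
assert survivor == "node2" and "s1" in cluster.sessions
```

Because the session record survives the node failure, the client does not have to reestablish its session, unlike the stand-alone case.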
NOTE
The cost of redundant servers and software licensing is extremely expensive for small- to mid-size ISPs. However, without it, ISPs may lose subscribers and revenue to competing ISPs because of subscriber dissatisfaction from service interruptions. Subscriber expectations for availability and reliability are usually high, and many competitors already offer high availability and reliability.
Application Layer
At the application layer, availability can be achieved with clustering and high availability software. You can configure applications with clusters or high availability to enhance availability in the event of service failure. Service failure and restart can be automatically invoked through service failure detection and monitoring. Also, you can enhance availability at the application layer by using a load balancer with multiple servers.
Data Layer
At the data layer, availability can be achieved with redundant storage arrays coupled with logical volumes. Redundant storage arrays allow data to be accessible in the event of a controller or storage array failure. Logical volumes and RAID (redundant array of independent disks) ensure data is accessible in the event of disk failure.
At the data layer, RAID 0+1 (striping and mirroring) or RAID 5 (striping with parity) achieves availability and reliability in case of disk failure. RAID 0+1 is the more expensive solution because twice the hardware (storage arrays and disks) is needed. However, the advantage is that no performance degradation occurs after a disk failure. RAID 5 can suffer performance degradation if a disk fails, because data has to be rebuilt from parity.
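The cost trade-off between the two RAID levels comes down to usable capacity. A back-of-envelope sketch, assuming equal-size disks (the eight-disk, 73 GB configuration is purely illustrative):

```python
def usable_capacity(raid_level, disks, disk_size_gb):
    """Usable capacity for RAID 0+1 (mirrored stripes) vs RAID 5 (striping with parity)."""
    if raid_level == "0+1":
        return disks // 2 * disk_size_gb   # half the disks hold mirror copies
    if raid_level == "5":
        return (disks - 1) * disk_size_gb  # one disk's worth of capacity goes to parity
    raise ValueError(raid_level)

# Eight 73 GB disks:
print(usable_capacity("0+1", 8, 73))  # 292 GB usable (50% overhead)
print(usable_capacity("5", 8, 73))    # 511 GB usable (12.5% overhead)
```

RAID 0+1 sacrifices half the raw capacity to mirroring, which is why it roughly doubles the hardware cost for a given amount of usable storage.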
Reliability
Reliability is best defined from the perspective of end users. Users want network services and servers to be available when they access them. Reliability for them is consistency of service uptime and availability. To users, a system is reliable when they do not frequently encounter busy signals on their modems, network connection error messages, etc.
From an architect's perspective, reliability is uptime and service response time for users, so that a system is available when users access services.
For businesses today, especially service providers, reliability of service has implications beyond customer satisfaction. Because service providers establish and maintain their reputations based on availability and reliability of their services, many of them require carrier-class grade high availability and reliability.
TIP
Reliability depends upon and is affected by the design for availability; therefore, your design for an ISP architecture should balance a customer's requirements for both availability and reliability, within any constraints imposed by customer or technology.
Reliability depends on the availability design: an infrastructure built on redundant servers increases reliability. A functionally componentized architecture has more intrinsic redundancy and fewer inherent single points-of-failure, and damage to an individual service is unlikely to affect other services.
Redundancy is thus a common construct underlying reliability, scalability, and availability.
Manageability
Manageability addresses how an infrastructure is managed throughout its life cycle. The key to manageability is to keep an architecture design simple yet effective, meeting all functional and business requirements without adding complexity. If a design is too complex and difficult to manage, operation and management failures become more likely, and troubleshooting becomes more difficult and time consuming. Also consider management tools, management plans, and methods of monitoring services. Ensure that devices and components that need to be monitored are in fact monitored: if a system goes down and nothing is monitoring the failing device or component, subscriber satisfaction is at risk, along with the associated costs and potential loss of revenue.
Adaptability
For any architecture, change during a life cycle is inevitable. An architecture must be adaptable enough to accommodate growth and changes in technology, business, and user needs. Within the customer's financial constraints and growth plans, design an architecture that allows for adaptability.
Modular architectures inherently support flexibility in two ways: individual components are themselves easily augmented, and, because components are independent, new components can be added without disturbing or revamping other components within an architecture.
Security
From a larger perspective, security is a combination of processes, products, and people. Security is achieved by establishing effective policies and implementing procedures that enforce policies. Security policies are useless without control over who has access to and can affect security on servers and services. Securing access requires establishing an appropriate authentication regime.
From an architecture perspective, security is access to network, system, and data resources.
At the network layer, security can be achieved with an access control list (ACL) on routers, packet filters, firewalls, and network-based intrusion detection systems (IDS).
At the system layer, security can be achieved with system hardening, access permission, host-based IDSs, scanners, and file checkers.
At the data layer, security can be achieved with authentication and authorization.
Functional decomposition (separating functional components) contributes to security by making it easy to build security around different components. In addition, if one component is compromised, the security breach may be more easily contained.
Adapting to evolving threats is a never-ending cycle of processes. The strategy of responding to security threats has to evolve as potential intruders gain knowledge and discover new attack techniques.
We recommend designing security strategies with great flexibility in approaches to provide the best security against present and future threats.
Performance
Although performance has multiple definitions, in this book we relate it to the "expected" response time after a user requests a service. Depending upon an ISP's requirements, response time may be critical or noncritical, and these distinctions may be further refined by service type.
Individual services use system resources, for example, memory, CPU, and I/O, in different ways. A modular architecture provides the ability to independently monitor and tune each service.
The causes of slow response times are many. For example, some common causes are network latency, server degradation, and application responsiveness. Degradation at any of these layers can result in poor overall performance.
A system is easier to tune when it is running only a few applications. When many applications are running on a system, they must share resources, and tuning becomes complicated and challenging.
TIP
Two products available from Sun are useful in managing resources: Solaris Resource Manager and Solaris Bandwidth Manager. The Solaris Resource Manager manages resources for users, groups, and enterprise applications. The Solaris Bandwidth Manager controls bandwidth allocated to applications, users, and organizations.
Open System
Ideally, design using an open system approach so that an architecture is not dependent upon a single hardware or software vendor. An architecture is less flexible when built upon proprietary specifications. Building upon a set of open system standards that are accepted by a recognized consortium provides greater flexibility for business changes and growth, such as adding users and services and integrating new technology.