Building N1™ Grid Solutions: Realizing the Vision
- Properties of an N1 Grid System
- Workload-to-Resources Mapping
- N1 Grid Systems Realized
- Summary
Chapter 2 discussed why there is a need to view the infrastructure on which services are deployed differently. This view leads to the definition of the N1 Grid vision, strategy, and architecture. This chapter discusses the desired properties, attributes, and potential routes to realizing N1 Grid systems. Although much of the text describes N1 Grid systems and the N1 Grid operating environment (N1 Grid OE), the definitions are neither implementations nor product descriptions, nor are they product names.
Properties of an N1 Grid System
One of the best ways of exploring the true nature of an N1 Grid system is to compare it with a more familiar class of systems (for example, a traditional symmetric multiprocessor or SMP computer system). A traditional SMP system is typically defined as the set of components that share a system bus, memory, and I/O, and that run under the control of an instance of an operating system (for example, the Solaris™ or Linux operating system).
An N1 Grid system is defined as a set of components (for example, servers, network switches, load balancers, firewalls, storage switches, storage controllers, and disks) under the control of a single instance of the N1 Grid OE. However, unlike a traditional operating system, the N1 Grid OE is distributed; that is, components of the operating environment run on various components within the N1 Grid system. An N1 Grid system is constrained to a single data center, although clusters of N1 Grid systems, either within a single data center or spread across a number of data centers, might exhibit many, if not all, of the attributes of a single N1 Grid system. Do not be distracted by the potential for geographically separated or clustered N1 Grid systems for now. Just like a traditional SMP system, an N1 Grid system is under the control of a single instance of an operating system.
An N1 Grid system consists of a set of platform resources, both hardware and software, within a single data center that is managed by a single instance of the N1 Grid OE. The N1 Grid OE itself is a part of the N1 Grid system, just as the Solaris™ Operating System (Solaris OS) is a part of a traditional server system. Platform resources include but are not limited to servers, service processors, network switches, load balancers, firewalls, storage switches, storage controllers, disks, hypervisors, and operating systems.
N1 Grid Operating Environment
In general, the N1 Grid OE is the software component of an N1 Grid system that manages the mapping of workload (that is, services) onto platform resources in line with a set of policies. Workloads include, but are not limited to:
- Multitier web services
- Traditional client-server applications
- Grid workloads
Policies reflect high-level business goals, related to cost and quality of service, such as:
- Priority (versus other services)
- Average transaction response time (for transactional workloads)
- Number of concurrent transactions or users
- Target completion time (for batch or non-interactive and non-transactional workloads)
- Acceptable outage (in hours per year or hours per month)
- Cost of service
Resources are system elements, such as:
- Compute elements (for example, discrete servers, with or without installed operating systems or hypervisors, and physically partitionable servers, such as Sun servers with dynamic system domains)
- Network elements (such as switches, routers, load balancers, and firewalls)
- Storage elements (such as disks, arrays, array controllers, and SAN fabric elements)
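The relationship among workloads, policies, and resources described above can be sketched as simple data types with a placement routine. This is purely a hypothetical illustration; the class and field names are assumptions for exposition, not N1 Grid OE APIs, and the greedy mapper stands in for what would be a far richer policy engine:

```python
from dataclasses import dataclass

@dataclass
class ServicePolicy:
    """High-level business goals for a service (hypothetical fields)."""
    priority: int                          # relative to other services (1 = highest)
    avg_response_time_ms: float            # for transactional workloads
    max_concurrent_users: int
    allowed_outage_hours_per_month: float
    max_monthly_cost: float

@dataclass
class Resource:
    """A managed platform element: compute, network, or storage."""
    name: str
    kind: str        # "compute" | "network" | "storage"
    capacity: int    # abstract capacity units

@dataclass
class Workload:
    """A service to be mapped onto resources per its policy."""
    name: str
    policy: ServicePolicy
    demand: int      # abstract capacity units required

def map_workloads(workloads, resources):
    """Greedy sketch: place higher-priority workloads first,
    on the first resource with enough free capacity."""
    placement = {}
    free = {r.name: r.capacity for r in resources}
    for w in sorted(workloads, key=lambda w: w.policy.priority):
        for r in resources:
            if free[r.name] >= w.demand:
                placement[w.name] = r.name
                free[r.name] -= w.demand
                break
    return placement
```

A real N1 Grid OE would weigh response-time, availability, and cost goals rather than a single capacity number; the sketch only shows that policy attributes, not server names, drive the mapping.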
The N1 Grid OE is unlike a traditional operating system in that it is inherently both distributed and hierarchical. Its components, as well as the resources it manages, are distributed across a network. The resources are not simply physical components. They can be aggregations of components, including software. Thus, a server with a traditional operating system (for example, the Solaris OS) or a cluster (for example, a Sun™ Cluster software environment) can be a resource, to which policy and the management of specific workloads might be delegated. The N1 Grid OE can deploy the traditional operating system or clustering software to create what is sometimes referred to as a metaresource, which it then manages.
The full realization of an N1 Grid OE (for instance, a single, integrated operating environment) is in the future. However, some key components of the operating environment are available as products today, and N1 Grid systems can be implemented in a basic form using today's technologies, coupled with appropriate architectures and operational models.
Heterogeneity Within N1 Grid Systems
Heterogeneity is an obvious requirement for N1 Grid systems because data centers rarely have a single vendor strategy for their infrastructure. N1 Grid solutions are also service-centric, and typically, not all of the components or tiers of a service are deployed on a homogeneous environment. To manage services holistically, a distributed, heterogeneous set of resources has to be managed. Thus, the N1 Grid OE builds on existing, open network computing protocols and standards and on extant application models.
An analogy with a more traditional computer system should help clarify. Typically, a systems vendor architects a computer system. When that system is manufactured, some of the components of the system are designed and manufactured by the systems vendor. These are typically components that reflect the core competency, differentiation, and value-add of that vendor. Some other components might be designed by the vendor, but manufactured by another party to the vendor's designs. Again, this might be required to provide optimized components that ensure a better integrated and more valuable system. However, implementation of that componentry might not be a core competency of the vendor. Finally, some components are commodity parts. The only differentiation between one vendor's product and another is price or performance. Functionally, they are identical.
Just as this is true when building a traditional computer, it is also true in the case of an N1 Grid system. An N1 Grid system and its operating environment can include both software and hardware components from multiple vendors, which are driven by the same architecture with successful implementation being based on common functional and interface specifications. N1 Grid systems can include third-party network and SAN fabric elements and compute elements, together with their hosted operating systems, such as an IBM UNIX server, a Dell Linux server, or a Hewlett Packard server running the Microsoft Windows software.
Functional Breakdown
An N1 Grid system consists of a collection of network-distributed resources:
- Compute elements (such as traditional servers, blades, or hardware-partitionable servers, such as the Sun Fire™ 15K server)
- Network elements (such as switches, routers, firewalls, and load balancers)
- Storage elements (such as disks, arrays, array controllers, and SAN switches)
These managed elements become components within the N1 Grid system, just like the individual processor in today's SMP, so that they are no longer managed directly. The traditional operating system manages processors on behalf of the user. The N1 Grid OE likewise manages these networked resources on behalf of the users of N1 Grid systems, mapping workload onto them in line with policies.
The workload that is managed is no longer the process or the thread. It becomes the service: a higher-level, more abstract entity that the business understands (for example, an online bookstore or an e-banking service). The policy that is used to manage that service is likewise more abstract.
Rather than specifying scheduling classes, priorities, or amounts of memory to be used by a service component, N1 Grid OE policies will eventually specify the business goals for the service, such as average transaction response time, number of concurrent customers, batch completion window, allowable outages per month, and desired cost. The N1 Grid OE will automatically translate between these higher-level abstractions and will automatically allocate the traditional resources to service components.
For the N1 Grid OE to be able to do this, it must mimic some of the activities that data center architects, managers, and administrators perform today. It must also capture the information they typically use. For example, the N1 Grid OE must have the following:
- Resource model: Provides a way of describing the set of resources available to the N1 Grid OE. An appropriate internal representation is required, along with the means of manually and automatically populating it.
- Service model: Describes services so that the dependencies between various service components, between service components and resources, and among multiple services can be captured. The dependencies range from simple ones, such as those required to instantiate or run a service component (for example, the right version of the operating system), to performance, scaling, availability, and security dependencies. An appropriate internal representation is required, along with the means to manually and automatically populate it.
- Provisioning mechanisms: Enable the mapping of workloads to resources within the context of the network. Within an N1 Grid system, any given service component has the potential to run on different elements and can be easily moved from one element to another to improve flexibility, availability, and utilization.
- Policy and goals model: Describes the policies and business goals that drive the system.
- Telemetry and control interfaces: Enable the state of the various system components, both the hosted service components and the compute, storage, and network elements on which they are deployed, to be monitored and controlled.
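These five functional areas can be viewed as abstract interfaces that a concrete operating environment would implement. The sketch below is a hypothetical rendering of that decomposition; all class and method names are invented for illustration and do not correspond to any shipped N1 Grid product interface:

```python
from abc import ABC, abstractmethod

class ResourceModel(ABC):
    """Catalog of the resources available to the operating environment."""
    @abstractmethod
    def discover(self):              # automatic population
        ...
    @abstractmethod
    def register(self, resource):    # manual population
        ...

class ServiceModel(ABC):
    """Captures dependencies among service components and resources."""
    @abstractmethod
    def add_dependency(self, component, depends_on):
        ...

class Provisioner(ABC):
    """Maps service components onto resources across the network."""
    @abstractmethod
    def deploy(self, component, resource):
        ...
    @abstractmethod
    def migrate(self, component, target_resource):
        ...

class PolicyModel(ABC):
    """Business goals that drive placement and scaling decisions."""
    @abstractmethod
    def goals_for(self, service):
        ...

class Telemetry(ABC):
    """Monitoring and control of elements and hosted service components."""
    @abstractmethod
    def observe(self, element):
        ...
    @abstractmethod
    def control(self, element, action):
        ...
```

Separating these concerns mirrors how data center staff already divide the work: inventory, architecture, deployment, service-level objectives, and operations monitoring.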
Resources
At the core of the N1 Grid vision is the concept that the system is now a network of resources and components, rather than a server connected to the network. This mandates a different perspective on the development and management of the resources. Specifically, resources are treated as components in a fabric, and the management silos observed today need to be avoided, at least from the architectural perspective. Although N1 Grid systems can use existing componentry, the vision is ultimately intended to drive the development of some of these components, optimizing them for operation within an N1 Grid system so that they inherently increase the value of that system. In the long term, this implies the development of standards for the representation of the fabric of resources, together with mechanisms that enable the automatic discovery of fabric components, how they are connected together, and how they can interact with one another.
Service-Centric Management
Central to the N1 Grid vision is the concept of moving from the management of low-level entities, such as servers, to the services or applications on which a business is built. By moving the focus of management to services and driving the delivery of those services using business goals, the intention is to remove the management headache associated with managing today's infrastructures.
For the N1 Grid OE to effectively manage the deployment of a service onto the underlying resources at its disposal, it must have an understanding of the nature of the service. FIGURE 3-1 shows a simplified service dependency graph, including the dependencies within a service, between different services within an enterprise, and between services hosted within different enterprises. The dependency graph must capture instantiation, performance, scaling, availability, and security dependencies, along with other types of dependencies.
Figure 3-1 Simplified Service Dependency Graph
For example, if a Tier 3 service consists of database, application server, and web server components, the N1 Grid OE would need to represent and resolve the following dependencies:
- Sets of instantiation dependencies for the service and its service components, for example:
  - The bookstore depends directly on the database, application server, and web server tiers.
  - The web server tier is running Apache version X.
  - The version of Apache requires Linux version Y running on IA32-based servers.
  - The web tier uses a load-balancing appliance from a specific vendor.
  - The application server tier uses the Sun Java™ Enterprise System application server version 6.0, which requires the Solaris 9 OS, update X, with patches A, B, and C.
  - The database server is version Z and requires the Solaris 8 OS, update X, with patches E and F.
  - The bookstore has external dependencies on a separate intrusion detection service and firewall service.
- Performance and scaling attributes and dependencies of the service, for example:
  - The web tier scales through replication, and the load is distributed using the load balancer.
  - The application server tier scales on large SMPs, up to 16 processors per instance, as well as across multiple servers using its internal cluster and load-balancing capabilities.
  - The database server tier scales as a single instance within an SMP system (perhaps up to 94 processors).
- Availability attributes of the service, such as:
  - The web tier is made highly available through replication and through a load balancer.
  - The application server tier is made highly available using its own clustering capabilities and replication.
  - The database is made highly available by ensuring that it runs only on compute elements with redundant components and within a Sun Cluster software environment.
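The instantiation dependencies above form a directed graph, and a valid start-up order is a topological sort of that graph. The sketch below models the bookstore example with Python's standard-library `graphlib` (Python 3.9+); the component names are shorthand for the tiers and platform versions in the example:

```python
from graphlib import TopologicalSorter

# Edges read: component -> the things it depends on (which must exist first).
deps = {
    "bookstore":      {"web_tier", "app_tier", "db_tier"},
    "web_tier":       {"apache_x", "load_balancer"},
    "apache_x":       {"linux_y"},           # Apache version X needs Linux version Y
    "app_tier":       {"app_server_6.0"},
    "app_server_6.0": {"solaris9_patched"},  # Solaris 9, update X, patches A, B, C
    "db_tier":        {"db_z"},
    "db_z":           {"solaris8_patched"},  # Solaris 8, update X, patches E, F
}

# static_order() yields dependencies before the components that need them.
order = list(TopologicalSorter(deps).static_order())
```

Because every other node is (transitively) a dependency of the bookstore, the bookstore service itself always comes last in the order; this is exactly the instantiation sequencing an administrator performs by hand today.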
These are simple examples. In reality, an N1 Grid OE service description would include information about where and how in the service to recover from certain types of component failures, how to ensure appropriate security, or information about the numbers of instances of service components required to cater to certain loads, derived from capacity-planning exercises.
In short, creating the description of the service is the formalization of what are today typically ad hoc processes within data centers. Processes typically vary, not only from one data center or enterprise to another, but often within the same data center. These processes are usually poorly documented and are often manually implemented by people working in disparate groups. Thus, their implementation might be both time consuming and error prone.
With N1 Grid systems and their focus on the description of the service, the emphasis shifts from the repetitive and often manual tasks associated with deployment and ongoing maintenance of components to actually defining and setting up the service properly so that ongoing maintenance can be optimized or automated. Representing a service as a dependency graph enables the manipulation of more abstract entities and provides finer-grained control.
The benefits of service-centric management include:
- It is more intuitive because it is focused on what the business cares about.
- It drives a new, sustainable operational model that is driven not by the nature of the underlying componentry, but by a consistent view of the services.
- It fundamentally changes the TCO dynamics of the data center because the components that people manage today become hidden by new tools, and ultimately by the N1 Grid OE.