Introduction to Storage Virtualization
1.1 Storage Virtualization Overview
The data storage industry is one of the most dynamic sectors in information technology today. Due largely to the introduction of high-performance networking between servers and storage assets, storage technology has undergone a rapid transformation as one innovation after another has pushed storage solutions forward. At the same time, the viability of new storage technologies is repeatedly affirmed by the rapid adoption of networked storage by virtually every large enterprise and institution. Businesses, governments, and institutions today depend on information, and information in its unrefined form as data ultimately resides somewhere on storage media. Applying new technologies to safeguard this essential data, facilitate its access, and simplify its management has readily understandable value.
Since the early 1990s, storage innovation has produced a steady stream of new technology solutions, including Fibre Channel, network-attached storage (NAS), server clustering, serverless backup, high-availability dual-pathing, point-in-time data copy (snapshots), shared tape access, storage over distance, iSCSI, CIM (Common Information Model)-based management of storage assets and transports, and storage virtualization. Each of these successive waves of technical advance has been accompanied by disruption to previous practices, vendor contention, over-hyping of what the new solution could actually do, and confusion among customers. Ultimately, however, each step in technical development settles into some useful application, and the marketing dust finally settles back into place.
No storage networking innovation has caused more confusion in today's market, however, than storage virtualization. In brief, storage virtualization is the logical abstraction of physical storage systems and thus, when well implemented, hides the complexity of physical storage devices and their specific requirements from management view. Storage virtualization has tremendous potential for simplifying storage administration and reducing costs for managing diverse storage assets.
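The abstraction can be illustrated with a small sketch. In the hypothetical Python fragment below, the class and device names are invented purely for illustration; the point is that several physical disks are presented to the host as one logical volume, with the translation of logical block addresses to physical devices hidden from management view.

```python
# Illustrative sketch only: a logical volume that concatenates several
# physical devices and hides their boundaries from the consumer.

class PhysicalDevice:
    def __init__(self, name, num_blocks):
        self.name = name
        self.num_blocks = num_blocks

    def read(self, block):
        # In a real system this would issue a SCSI read to the device.
        return f"{self.name}:block{block}"

class LogicalVolume:
    """Presents one contiguous block address space over many devices."""
    def __init__(self, devices):
        self.devices = devices

    def read(self, lba):
        # Walk the concatenated devices to find which one owns this LBA.
        offset = lba
        for dev in self.devices:
            if offset < dev.num_blocks:
                return dev.read(offset)
            offset -= dev.num_blocks
        raise ValueError("logical block address out of range")

# The host sees a single 3,000-block volume, not three separate disks.
volume = LogicalVolume([PhysicalDevice("disk_a", 1000),
                        PhysicalDevice("disk_b", 1000),
                        PhysicalDevice("disk_c", 1000)])
print(volume.read(2500))   # transparently served from disk_c
```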
Unlike previous new protocols or architectures, however, storage virtualization has no standard measure defined by a reputable organization such as INCITS (InterNational Committee for Information Technology Standards) or the IETF (Internet Engineering Task Force). The closest vendor-neutral attempt to make storage virtualization concepts comprehensible has been the work of the Storage Networking Industry Association (SNIA), which has produced useful tutorial content on the various flavors of virtualization technology. Still, storage virtualization continues to play the part of the proverbial elephant examined in the dark: long lines of vendors and customers, blinded by exaggerated marketing claims, lay hands on it and each walks away with a different impression. It is often difficult, therefore, to say exactly what the technology is or should be expected to do.
As might be expected, some of the confusion over storage virtualization is vendor-induced. Storage virtualization products vary considerably, as do their implementation methods. Vendors of storage arrays may host virtualization directly on the storage controller, while software vendors may port virtualization applications to servers or SAN appliances. Fabric switch manufacturers may implement virtualization services within the fabric in the form of smart switch technology. Some vendors carry virtualization commands and data along the same path between server and storage, while others split the control path and data path apart. Advocates of one or another virtualization method typically have sound reasons why their approach is best, while their competitors are ever willing to explain in even greater detail why it is not. The diversity of storage virtualization approaches alone forces customers into a much longer decision and acquisition cycle as they attempt to sort out the merits and demerits of the various offerings and to separate marketing hype from useful fact.
In addition, it is difficult to read data sheets or marketing collateral on virtualization products without encountering extended discussions about point-in-time data copying via snapshots, data replication, mirroring, remote extension over IP, and other utilities. Although storage virtualization facilitates these services, none are fundamentally dependent on storage virtualization technologies. The admixture of core storage virtualization concepts such as storage pooling with ancillary concepts such as snapshots contributes to the confusion over what the technology really does.
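A brief sketch may help separate the two ideas. The following simplified, hypothetical copy-on-write snapshot operates on any block device; nothing in it depends on a virtualization layer, although a virtualized volume is a convenient place to host such a service.

```python
# Simplified copy-on-write snapshot: the snapshot preserves the original
# contents of any block that is overwritten after the snapshot was taken.

class Volume:
    def __init__(self):
        self.blocks = {}            # block number -> current data
        self.snapshot_blocks = None # pre-change copies, once a snapshot exists

    def take_snapshot(self):
        self.snapshot_blocks = {}   # start tracking pre-change copies

    def write(self, block, data):
        if self.snapshot_blocks is not None and block not in self.snapshot_blocks:
            # First overwrite since the snapshot: save the old contents.
            self.snapshot_blocks[block] = self.blocks.get(block)
        self.blocks[block] = data

    def read(self, block, from_snapshot=False):
        if from_snapshot and self.snapshot_blocks is not None \
                and block in self.snapshot_blocks:
            return self.snapshot_blocks[block]
        return self.blocks.get(block)

vol = Volume()
vol.write(0, "original data")
vol.take_snapshot()
vol.write(0, "new data")
print(vol.read(0))                      # "new data"
print(vol.read(0, from_snapshot=True))  # "original data"
```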
Although storage virtualization technology has spawned new companies and products, virtualizing storage is not new. Even in open systems environments, elementary forms of virtual storage have been around for years. In 1987, for example, researchers Patterson, Gibson, and Katz at the University of California, Berkeley published a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)," which described a means of combining multiple disks and virtualizing them to the operating system as a single large disk. Although RAID technology was intended to enhance storage performance and provide data recoverability against disk failure, it also streamlined storage management by reducing disk administration from many physical objects to a single virtual one. Today, storage virtualization technologies leverage lower-level virtualizing techniques such as RAID, but they primarily focus on virtualizing higher-level storage systems and storage processes rather than discrete disk components.
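The address arithmetic behind such a virtual disk is straightforward. As a hedged illustration, the sketch below maps a logical block address on a RAID-0 (striped, nonredundant) virtual disk to a member disk and an offset; parity-based RAID levels add data recoverability on top of a similar mapping. The stripe size and disk count are arbitrary example values.

```python
# Illustrative RAID-0 address translation: map a logical block address on
# the virtual disk to (member disk, block on that disk).

def raid0_map(lba, num_disks, stripe_blocks):
    stripe_index = lba // stripe_blocks       # which stripe the LBA falls in
    offset_in_stripe = lba % stripe_blocks
    disk = stripe_index % num_disks           # stripes rotate across member disks
    disk_stripe = stripe_index // num_disks   # stripe position on that disk
    return disk, disk_stripe * stripe_blocks + offset_in_stripe

# Logical block 130 on a 4-disk array with 64-block stripes:
print(raid0_map(130, num_disks=4, stripe_blocks=64))   # -> (2, 2)
```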
The economic drivers for storage virtualization are very straightforward: reduce costs without sacrificing data integrity or performance. Computer systems in general are highly complex, too complex, in fact, to be administered at a discrete physical level. As computer technology has evolved, a higher proportion of CPU cycle time has been dedicated to abstracting the underlying hardware, memory management, input/output, and processor requirements from the user interface. Today, a computer user does not have to be conversant in assembly language programming to make a change in a spreadsheet. The interface and management of the underlying technology have been heavily virtualized.
Storage administration, by contrast, is still tedious, labor-intensive, and seemingly never-ending. The introduction of storage networking has centralized storage administrative tasks by consolidating dispersed direct-attached storage assets into larger, shared resources on a SAN. Fewer administrators can now manage more disk capacity and support more servers, but capacity for each server must still be monitored, logical units manually created and assigned, zones established and exported, and new storage assets manually brought online to service new application requirements. In addition, although shared storage represents a major technological advance over direct-attached storage, it has introduced its own complexity in terms of implementation and support. Complexity equates to cost. Finding ways to hide complexity, automate tedious tasks, streamline administration, and still satisfy the requirements of high performance and data availability saves money, and that is always the bottom line. That is the promise of storage virtualization, although many solutions today still fall far short of this goal.
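A rough sketch conveys the kind of manual sequence a virtualization layer aims to automate. Every class and method name below is invented for illustration and corresponds to no real vendor API; actual arrays and fabrics expose comparable operations through their own, mutually incompatible, management interfaces.

```python
# Hypothetical sketch of the provisioning steps (LUN creation, masking,
# zoning) that an administrator performs today and that a virtualization
# layer could automate. All names are illustrative only.

class Array:
    def __init__(self):
        self.luns, self.masking = [], {}
    def create_lun(self, size_gb):
        self.luns.append(size_gb)
        return len(self.luns) - 1                # LUN id
    def mask_lun(self, lun_id, host_wwpn):
        self.masking[lun_id] = host_wwpn         # only this host may see the LUN

class Fabric:
    def __init__(self):
        self.zones = []
    def create_zone(self, initiator_wwpn, target_wwpn):
        self.zones.append((initiator_wwpn, target_wwpn))

def provision(array, fabric, host_wwpn, array_wwpn, size_gb):
    lun_id = array.create_lun(size_gb)           # carve capacity from the array
    array.mask_lun(lun_id, host_wwpn)            # restrict visibility to the host
    fabric.create_zone(host_wwpn, array_wwpn)    # let initiator and target talk
    return lun_id

lun = provision(Array(), Fabric(), "10:00:00:00:c9:aa:bb:cc",
                "50:06:01:60:10:20:30:40", size_gb=100)
```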
Another highly advertised objective for storage virtualization is to overcome vendor interoperability issues. Storage array manufacturers comply with the appropriate SCSI and Fibre Channel standards for basic connectivity to their products. Each, however, also implements proprietary value-added utilities and features to differentiate its offerings in the market, and these, in turn, pose interoperability problems for customers with heterogeneous storage environments. Disk-to-disk data replication solutions, for example, are vendor-specific: EMC's version works only with EMC; IBM's only with IBM. By abstracting vendor-specific storage into a generic, vanilla form, storage virtualization products can provide data replication across vendor lines. In addition, it becomes possible to replicate data from higher-end storage arrays to much cheaper disk assets such as JBODs (just a bunch of disks), thus addressing both interoperability and economic issues.
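Seen from the virtualization layer, such replication is simply a block copy between two generic volumes. The sketch below is purely illustrative, with the vendor-specific back ends assumed to sit behind a common interface.

```python
# Illustrative only: replication at the virtualization layer copies blocks
# between two generic volumes; whether the back end is a high-end array or
# a JBOD is hidden beneath the same abstraction.

class GenericVolume:
    def __init__(self, label):
        self.label, self.blocks = label, {}
    def read(self, lba):
        return self.blocks.get(lba)
    def write(self, lba, data):
        self.blocks[lba] = data

def replicate(source, target, num_blocks):
    for lba in range(num_blocks):
        target.write(lba, source.read(lba))

primary = GenericVolume("high-end array")   # could front any vendor's array
replica = GenericVolume("JBOD")             # cheap disk behind the same abstraction
primary.write(0, "payroll data")
replicate(primary, replica, num_blocks=1)
print(replica.read(0))                      # "payroll data"
```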
The concept of a system-level storage virtualization strategy occurs repeatedly in vendor collateral. One of the early articulations was Compaq's Enterprise Network Storage Architecture (ENSA) and its description of a storage utility. According to the ENSA document, this technology would transform storage ". . . into a utility service that is accessed reliably and transparently by users, and is professionally managed with tools and technology behind the scenes. This is achieved by incorporating physical disks into a large consolidated pool, and then virtualizing application disks from the pool."
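The pooling described in the ENSA passage can be sketched in a few lines of illustrative Python; the disk capacities and one-gigabyte extent size below are arbitrary assumptions, not values from the ENSA document.

```python
# Illustrative storage pool: physical disks are consolidated into one pool
# of capacity, and virtual disks are carved from the pool in extents without
# the consumer knowing which physical disks lie behind them.

EXTENT_GB = 1   # allocation granularity (assumed)

class StoragePool:
    def __init__(self, physical_disks_gb):
        # Break every physical disk into 1 GB extents and pool them all.
        self.free_extents = [(disk, ext)
                             for disk, size in enumerate(physical_disks_gb)
                             for ext in range(size // EXTENT_GB)]

    def create_virtual_disk(self, size_gb):
        needed = size_gb // EXTENT_GB
        if needed > len(self.free_extents):
            raise RuntimeError("pool exhausted")
        extents, self.free_extents = (self.free_extents[:needed],
                                      self.free_extents[needed:])
        return extents   # the virtual disk is just a list of pooled extents

pool = StoragePool([146, 146, 300])          # three physical disks, in GB
vdisk = pool.create_virtual_disk(200)        # spans physical disk boundaries
print(len(vdisk), "extents allocated")       # 200 extents, drawn from several disks
```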
The operative words here are reliably and transparently. Technical remedies, like doctors, must first do no harm. Reliability implies that stored data is highly accessible, protected, and delivered at expected levels of performance. Transparency implies that the complexity of storage systems has been successfully masked from view and that tedious administrative tasks have been automated on the back end. The abstraction layer of storage virtualization therefore bears the heavy burden of preserving the performance and data integrity requirements of physical storage while reducing the intricate associations between physical systems to a simple utility outlet into which applications can be plugged. Part of the challenge is to get the abstraction apparition conjured into place; a greater challenge is to ensure that the mirage does not dissolve when unexpected events or failures occur in the physical world. Utilities, after all, are expected to provide continuous service regardless of demand. You shouldn't have to phone the power company every time you wish to turn on a light.
The notion of utility applied to storage and compute resources conveys not only reliability and transparency, but also ubiquity. The simpler a technology becomes, the more widely it may be deployed. Storage networking is still an esoteric technology and requires expertise to design, implement, and support. The substantial investment in research, standards definition, product development, testing, certification, and interoperability required to create operational SANs was in effect funded by large enterprise customers, who had the most pressing need and the budget to support new and complex storage solutions. Once a storage networking industry was established, however, shared storage expanded beyond the top-tier enterprises into mainstream businesses. Leveraging storage virtualization to create a storage utility model will accelerate the market penetration of SANs and, in combination with other technologies such as iSCSI, spread shared storage solutions to small and medium businesses as well.
Currently, all major storage providers have some sort of storage virtualization strategy in place, with varying degrees of implementation in products. Upon acquiring Compaq, Hewlett-Packard (HP) inherited the ENSA (and ENSA-2) storage utility white paper and has supplemented it with its Storage Grid and other initiatives. IBM has TotalStorage with SAN Volume Controller. EMC's Information Lifecycle Management (ILM) extends storage virtualization's reach throughout the creation and eventual demise of data. Hitachi Data Systems supports array-based storage virtualization on its 9000 series systems. Even Sun Microsystems has a component for pooling of storage resources within its N1 system virtualization architecture. These vendor-driven storage virtualization initiatives reflect both proactive and reactive responses to the customers' desire for simplified storage management and are being executed through both in-house development and acquisition of innovative startups.
In addition, multilateral partnerships are being forged between vendors of virtualization software, storage providers, SAN switch manufacturers, and even nonstorage vendors such as Microsoft to bring new storage virtualization solutions to market. Despite the high confusion factor (and often contributing to it), storage virtualization development has considerable momentum and continues to spawn a diversity of product offerings. This is typical of an evolutionary process, with initial variation of attributes, cross-pollination, inheritance of successful features, and ultimately a natural selection for the most viable within specific environments. Because storage virtualization is still evolving, it is premature to say which method will ultimately prevail. It is likely that storage virtualization will continue to adapt to a diversity of customer environments and appear in a number of different forms in the storage ecosystem.