2.3 Storage Software
Software is the key to all storage networking, and a wide variety of software runs throughout any storage network. Software exists on virtually every physical device that makes up a storage network, on the clients that ultimately use the data, and on every device between the two endpoints. It is safe to say that without the software, you might as well use the disk platters as Frisbees for all they are worth to the business.
2.3.1 Backup
Backup and archiving software typically resides between source data and target data repositories. In backup operations, data is moved from primary media to secondary media, typically tape, for storage. The secondary medium is then stored in a location physically distant from the source storage. In this way, in the event of a widespread disaster, either the backup media or the primary storage devices will be safe. With tape, you can restore your data to a previous snapshot; after a disaster, you would lose only the data that had changed between the last backup and the time of the disaster. Archiving tends to be a little more flexible, with the intent being to store snapshots of data for a certain period of time. Archives tend to be used for recordkeeping and account management, basically as a way to see the state of the books at a particular time. Archiving is also useful in programming environments, where a particular archive contains the state of the code at a particular time. A backup, by contrast, may be viewed as a degenerate version of an archive in which only one copy exists.
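The relationship between the two can be made concrete with a small model. The following is only a sketch: the SnapshotStore class, its max_copies parameter, and the sample figures are hypothetical names chosen for illustration, not part of any backup product. It treats a backup as an archive that retains a single copy.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SnapshotStore:
    """Keeps the most recent `max_copies` snapshots (hypothetical model)."""
    max_copies: int
    snapshots: deque = field(default_factory=deque)

    def take(self, label: str, data: dict) -> None:
        # Store a copy so later changes to `data` do not alter the snapshot.
        self.snapshots.append((label, dict(data)))
        while len(self.snapshots) > self.max_copies:
            self.snapshots.popleft()  # the oldest snapshot is discarded

    def restore(self, label: str) -> dict:
        for name, data in self.snapshots:
            if name == label:
                return dict(data)
        raise KeyError(f"no snapshot labeled {label!r}")

    def labels(self) -> list:
        return [name for name, _ in self.snapshots]


books = {"balance": 100}
backup = SnapshotStore(max_copies=1)    # backup: only the latest copy survives
archive = SnapshotStore(max_copies=12)  # archive: assumed one year of month-end states

for month, balance in [("jan", 100), ("feb", 250), ("mar", 175)]:
    books["balance"] = balance
    backup.take(month, books)
    archive.take(month, books)
```

After the loop, the archive can still answer what the books looked like in January, while the backup can restore only the March state.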
People sometimes confuse RAID level 1, mirroring, with backup and archiving. The first argument usually goes that because the user has a complete copy, or mirror, of data on a second disk, the data is backed up. This is not the case. Both mirrors are usually in the same location, so a disaster at that location will destroy both. The counterargument is that the mirror can be kept offsite. But this approach has two problems. First, propagation delay slows down reads and writes. Second, it doesn't prevent someone from accidentally deleting all the data with a single call to a delete function. In this case, the data is wiped out of both mirrors because the deletion is a valid operation.
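The second problem is easy to demonstrate with a toy model. In this sketch, MirroredVolume is a hypothetical class that stands in for a RAID 1 pair, and a plain dictionary copy stands in for the backup. It is not how a real RAID controller is implemented.

```python
class MirroredVolume:
    """Toy RAID 1 model: every valid operation is applied to both copies."""

    def __init__(self):
        self.mirrors = [{}, {}]

    def write(self, name, data):
        for mirror in self.mirrors:
            mirror[name] = data

    def delete_all(self):
        for mirror in self.mirrors:
            mirror.clear()


volume = MirroredVolume()
volume.write("ledger.db", b"q1 figures")

# A backup is a separate point-in-time copy that later operations do not touch.
backup_copy = dict(volume.mirrors[0])

volume.delete_all()  # accidental but valid, so it reaches both mirrors

surviving_in_mirrors = [len(mirror) for mirror in volume.mirrors]
```

Both mirrors end up empty, and only the separate copy still holds the file.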
Backup is typically based on policies that are set by the system administrator. Policies typically dictate the following:
The schedule to use for backup operations
The source and target for data
How much data to back up (all data, changed data, new data, etc.)
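A policy of this kind can be sketched as a small data structure. Everything named here (BackupPolicy, Scope, FileInfo, select_files, the cron-style schedule string, and the paths) is a hypothetical illustration rather than the interface of any particular backup product. The sketch also assumes that "changed" means modified since the last backup and "new" means created since the last backup.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    ALL = "all"          # full backup
    CHANGED = "changed"  # modified since the last backup (this includes new files)
    NEW = "new"          # created since the last backup


@dataclass(frozen=True)
class FileInfo:
    path: str
    created: float   # timestamps in arbitrary units
    modified: float


@dataclass(frozen=True)
class BackupPolicy:
    schedule: str  # e.g. a cron-style string; not interpreted in this sketch
    source: str
    target: str
    scope: Scope


def select_files(policy, files, last_backup):
    """Return the files under the policy's source that its scope selects."""
    under_source = [f for f in files if f.path.startswith(policy.source)]
    if policy.scope is Scope.ALL:
        return under_source
    if policy.scope is Scope.CHANGED:
        return [f for f in under_source if f.modified > last_backup]
    return [f for f in under_source if f.created > last_backup]


files = [
    FileInfo("/data/a", created=10, modified=10),
    FileInfo("/data/b", created=10, modified=150),
    FileInfo("/data/c", created=120, modified=120),
    FileInfo("/tmp/x", created=130, modified=130),
]
nightly = BackupPolicy("0 2 * * *", "/data/", "tape0", Scope.CHANGED)
selected = [f.path for f in select_files(nightly, files, last_backup=100)]
```

With the last backup taken at time 100, the nightly policy selects /data/b and /data/c and ignores both the unchanged file and the file outside the source.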
In traditional backup software, data flows from the source to the host that is running the backup process and then to the target device. Certain protocols and processes, however, are changing this approach. The most prevalent problem with backup is that the amount of data is increasing while the window for completing the operation is shrinking.
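A rough calculation shows why the window matters. The sketch below assumes a simplified model in which the backup streams at the rate of the slower of the two hops (source to host, host to target). The function names and all of the figures are illustrative assumptions, not measurements.

```python
def backup_hours(data_gb, source_to_host_mb_s, host_to_target_mb_s):
    """Hours needed to move data_gb through the backup host (1 GB = 1000 MB)."""
    # Data passes through the host, so the slower hop bounds the whole transfer.
    bottleneck_mb_s = min(source_to_host_mb_s, host_to_target_mb_s)
    seconds = data_gb * 1000 / bottleneck_mb_s
    return seconds / 3600


def fits_window(data_gb, source_to_host_mb_s, host_to_target_mb_s, window_hours):
    """True if the backup completes within the allowed window."""
    return backup_hours(data_gb, source_to_host_mb_s, host_to_target_mb_s) <= window_hours


# Assumed figures: 100 MB/s from the source, 50 MB/s to tape, an 8-hour window.
last_year = fits_window(1000, 100, 50, window_hours=8)  # about 5.6 hours
this_year = fits_window(2000, 100, 50, window_hours=8)  # about 11.1 hours
```

Under these assumptions, doubling the data from 1000 GB to 2000 GB pushes the job from about 5.6 hours to about 11.1 hours, which no longer fits the 8-hour window even though nothing else has changed.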
With creative use of storage networks, resource managers can deploy a series of mirrors and data caches to shorten or even eliminate the time that resources must be offline.
2.3.2 Hierarchical Storage Management
Hierarchical storage management (HSM) is the practice of treating all storage types as a single storage pool. Primary, secondary, and tertiary storage types are all leveraged to minimize the cost of storage and maximize the quality of service. The attributes of a particular storage device determine whether it is a primary, secondary, or tertiary device. For example, a high-speed disk-based device can be considered a primary device. It is expensive in terms of cost per byte. A tape drive, in contrast, has a low cost per byte but is very slow and thus serves as a tertiary device.
Typically, when storage capacity is in short supply, data is moved from a primary storage device to a secondary or tertiary storage device, such as a tape library. This move is usually based on the last-accessed date: files that have not been accessed for a long time are moved first. To users, the data still "appears" to be in their file space. The only noticeable difference is that when the file is accessed, it takes extra time to retrieve the data from the tape library.
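The migration rule can be sketched in a few lines. The ManagedFile class, the migrate_cold_files and read functions, the tier names, and the 30-day idle threshold are all assumptions made for this example. A real HSM product would also handle stub files, recalls, and error cases that are omitted here.

```python
from dataclasses import dataclass

DAY = 86_400  # seconds


@dataclass
class ManagedFile:
    name: str
    size: int             # bytes
    last_accessed: float  # seconds since some epoch
    tier: str = "primary"


def migrate_cold_files(files, bytes_needed, now, min_idle_seconds):
    """Move the least recently accessed primary files to tape until enough space is freed."""
    candidates = sorted(
        (f for f in files
         if f.tier == "primary" and now - f.last_accessed >= min_idle_seconds),
        key=lambda f: f.last_accessed,
    )
    freed, moved = 0, []
    for f in candidates:
        if freed >= bytes_needed:
            break
        f.tier = "tape"  # the name stays in the catalog; only the data location changes
        freed += f.size
        moved.append(f.name)
    return moved


def read(f):
    # Users see the same file either way; a recall from tape just takes longer.
    if f.tier == "tape":
        return f"{f.name}: recalled from tape"
    return f"{f.name}: read from disk"


now = 100 * DAY
files = [
    ManagedFile("report.doc", 400, last_accessed=10 * DAY),
    ManagedFile("old.log", 300, last_accessed=5 * DAY),
    ManagedFile("active.db", 500, last_accessed=99 * DAY),
]
moved = migrate_cold_files(files, bytes_needed=600, now=now, min_idle_seconds=30 * DAY)
```

Here old.log and report.doc move to tape, oldest first, while the recently used active.db stays on primary storage. All three files remain visible to the user.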
Hierarchical storage management is often used on mainframe computers. Heterogeneous storage networks do not use it as frequently, although HSM is being revived as a way to move and cache data in a storage network.
2.3.3 Operating System Software
Operating system software is a pivotal element in the management of any storage network. Clients make file requests via CIFS or NFS (or another file system protocol) to a server. The server is running either a general-purpose operating system or an operating system that is specialized for file servers. The operating system contains file systems, volume management capabilities, and block-level device drivers. How the operating system and the file-manipulation components are tuned can have a dramatic effect on the response time and efficiency of your storage network.
Closely related to the operating system is software that runs on top of it to mediate between management applications and the underlying software and hardware components. One example is a standards-based management software component, such as a Web-Based Enterprise Management (WBEM) layer. This capability is not built into the operating system but is required for manipulation of the underlying components. Other software components onboard a device that may be closely related to the operating system are Web servers, telnet daemons, and remote procedure call mechanisms for control point manipulation.
The client view of data is typically managed by a volume manager, a file system that uses the volumes exposed by the volume manager, and software that allows access to the data via a network file sharing protocol (such as NFS or CIFS). The volume manager is responsible for partitioning physical locations on media and creating logical groups of physical devices. The volume manager then moves data through a device driver (or block device) to the physical device. Above the volume manager is a file system that allows users to access the data in an application-centric, usable format.
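To illustrate the grouping of physical devices into one logical volume, the sketch below maps a logical block number to a device and a block on that device. It assumes simple concatenation, which is only one of several layouts a volume manager might use. ConcatenatedVolume, the device names, and the block counts are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedVolume:
    """One logical volume laid end to end across several physical devices."""

    def __init__(self, devices):
        # devices: non-empty list of (device_name, block_count) pairs
        self.devices = devices
        # ends[i] is the first logical block number past device i
        self.ends = list(accumulate(count for _, count in devices))

    @property
    def size(self):
        return self.ends[-1]

    def locate(self, logical_block):
        """Translate a logical block number into (device_name, physical_block)."""
        if not 0 <= logical_block < self.size:
            raise IndexError(f"block {logical_block} is outside the volume")
        index = bisect_right(self.ends, logical_block)
        start = self.ends[index - 1] if index else 0
        return self.devices[index][0], logical_block - start


volume = ConcatenatedVolume([("disk0", 100), ("disk1", 50), ("disk2", 200)])
first_on_disk1 = volume.locate(100)
last_block = volume.locate(349)
```

The file system above the volume sees a single run of 350 blocks. Logical block 100 lands on the first block of disk1, and the final block lands at offset 199 on disk2.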
For storage management, the file system and the volume manager provide views of how data is being used and how much storage each user has.
2.3.4 Storage Resource Management Applications
Storage resource managers (SRMs) provide a view of a complete storage network, from the physical characteristics of the topology all the way to the policies that guide the day-to-day operations of the storage network. SRMs typically provide the following:
A complete view of storage resources in the form of a graphical topology
Access to an integrated management console or individual management consoles from vendors of the various products being managed
A view of the scheduled backup/restore operations
Performance statistics on various network devices
Data about the performance of various storage devices
Access to log data or history data about the storage network
Some level of policy control for automated network event handling
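The last item, policy control for automated event handling, can be pictured as a set of rules matched against incoming events. This is a minimal sketch under stated assumptions: Event, Rule, handle, the event kinds, and the 90 percent threshold are all invented for illustration and do not represent the API of any SRM product or management framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    device: str
    kind: str     # hypothetical event kinds: "capacity_used", "link_down"
    value: float


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Event], bool]
    action: Callable[[Event], str]


def handle(event, rules):
    """Return the actions that the policy rules prescribe for this event."""
    return [rule.action(event) for rule in rules if rule.matches(event)]


rules = [
    Rule(
        "grow-when-nearly-full",
        matches=lambda e: e.kind == "capacity_used" and e.value > 0.90,
        action=lambda e: f"allocate more space on {e.device}",
    ),
    Rule(
        "reroute-on-link-failure",
        matches=lambda e: e.kind == "link_down",
        action=lambda e: f"reroute traffic around {e.device}",
    ),
]

actions = handle(Event("array1", "capacity_used", 0.95), rules)
```

An array reporting 95 percent utilization triggers the first rule, while the same array at 50 percent triggers nothing. A real SRM would carry out the action rather than return a description of it.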
The purpose of the storage resource manager is to manage all aspects of storage (software and hardware) from a single application. Traditionally, creating an SRM is easier if a homogeneous storage network is in place. Jiro and the Federated Management Architecture are designed to ease the job of building an SRM that fits into a heterogeneous environment.