Linux Filesystems 101
- Basic Data Storage Terminology and Concepts
- Improvements in Linux Filesystems
- Onward and Upward
Much of the computer industry is caught up in an endless quest for faster processors, higher video resolution, more memory, and bigger disks. Although a brighter, shinier computer system is always a good thing, better performance is not necessarily just a function of better hardware. As Linux demonstrates by its very existence, more powerful, more sophisticated, and better-designed operating system software can make any computer system more usable, more reliable, and thus a better resource for its user community.
The bottom line of any computer system is storing, retrieving, manipulating, and saving information. Filesystems, the generic term for the various types of structured data storage used on computer systems, play a huge, although largely invisible, role in supporting a community of happy users and calm system administrators. As a low-level operating system component, a filesystem is about as sexy as a skeleton, but life would get messy pretty quickly if it wasn't there or didn't work reliably.
Today's mobile workforce, ubiquitous networking, and high-availability requirements have brought about some interesting and beneficial alternatives in terms of how, where, and when data is read and written to long-term storage. To set the stage for exploring some of the more interesting filesystem developments that are available on Linux systems today, let's first examine the basics of storing data on disk drives that are physically connected to a computer.
Basic Data Storage Terminology and Concepts
A modern hard drive is composed of multiple platters that are mounted on a central spindle like the layers of a wedding cake. As the platters rotate, data is read from and written to them by disk heads that move in and out between the platters along their radius. These disk heads are mounted on a single assembly that is much like a comb, with a disk head mounted on the top and bottom of each of the teeth that move between the platters.
The top and bottom of each platter are organized into tracks, which are concentric circles of data on the surface of the platter. Each track is divided into sectors, also known as physical blocks, which are side-by-side chunks of data in the track. As a final organizational term, the tracks on all platter surfaces that are the same distance from the spindle are often referred to as cylinders. Some filesystems use cylinders to optimize disk access: because all of the disk heads move together, writing or reading data in parallel to or from disk blocks within a cylinder can be much faster than writing or reading data sequentially within a single track.
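To make the track, sector, and cylinder terminology concrete, here is a minimal sketch of the classic cylinder-head-sector (CHS) numbering scheme, which maps a position on the disk to a single linear block number. CHS addressing is not discussed in the text above; it is used here only as an illustration, and the geometry values (8 heads, 63 sectors per track) are hypothetical. Modern drives generally hide their real geometry behind linear block numbers.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Map a (cylinder, head, sector) address to a linear block number.

    Cylinders and heads count from 0; sectors conventionally count from 1.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


# Hypothetical geometry: 4 platters (8 surfaces, one head each), 63 sectors per track.
HEADS, SPT = 8, 63

# Every sector in cylinder 0 is numbered before any sector in cylinder 1,
# so consecutive block numbers stay inside one cylinder for as long as possible.
first_of_cyl0 = chs_to_lba(0, 0, 1, HEADS, SPT)
last_of_cyl0 = chs_to_lba(0, HEADS - 1, SPT, HEADS, SPT)
first_of_cyl1 = chs_to_lba(1, 0, 1, HEADS, SPT)
print(first_of_cyl0, last_of_cyl0, first_of_cyl1)  # prints: 0 503 504
```

With this numbering, a run of consecutive blocks can be read without moving the head assembly until the cylinder is exhausted, which is the property that cylinder-aware filesystems try to exploit.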
The terms sector and block are used interchangeably when discussing hardware, but block has a different meaning in "filesystem-speak," in which it is used as a logical unit rather than a physical one. In filesystems, a block is the fundamental allocation unit from which a filesystem is constructed. A filesystem therefore consists of a specific number of logical blocks. Block size differs between filesystems, but is always a whole-number multiple of the physical sector size. Block size is either predefined for a specific type of filesystem or set when a specific filesystem of that type is created.
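On a running Linux system, the logical block size of a mounted filesystem can be inspected from user space. The following is a minimal sketch using Python's standard `os.statvfs` call; the helper name `block_info` and the choice of `/` as the path are illustrative assumptions, and the numbers reported depend entirely on the filesystem that happens to be mounted there.

```python
import os


def block_info(path="/"):
    """Return the logical block size and block count of the filesystem holding path."""
    st = os.statvfs(path)
    return {
        "block_size": st.f_frsize,      # fundamental block size, in bytes
        "total_blocks": st.f_blocks,    # filesystem size, counted in those blocks
        "total_bytes": st.f_frsize * st.f_blocks,
    }


info = block_info("/")
print(info)
```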
There are two main goals in selecting an appropriate logical block size for a filesystem: minimizing the number of filesystem input/output (I/O) operations necessary to read and write files, and minimizing the amount of space wasted when files in the filesystem do not fill their last block. In real life, files vary widely in size. Selecting the "right" block size for a filesystem is therefore a compromise between wasting as little space as possible and minimizing the number of blocks that have to be allocated to store a file.
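The compromise described above can be sketched with a little arithmetic. The file sizes below are a made-up mix, and the helper names (`blocks_needed`, `wasted_bytes`, `summarize`) are hypothetical; the point is only to show how the block count and the wasted space pull in opposite directions as the block size grows.

```python
def blocks_needed(file_size, block_size):
    """Number of whole blocks a file of file_size bytes occupies."""
    return (file_size + block_size - 1) // block_size  # integer ceiling division


def wasted_bytes(file_size, block_size):
    """Unused bytes left over at the end of the file's last block."""
    return blocks_needed(file_size, block_size) * block_size - file_size


def summarize(file_sizes, block_size):
    """Return (total blocks allocated, total bytes wasted) for a set of files."""
    total_blocks = sum(blocks_needed(s, block_size) for s in file_sizes)
    total_waste = sum(wasted_bytes(s, block_size) for s in file_sizes)
    return total_blocks, total_waste


# Hypothetical mix of file sizes, in bytes.
file_sizes = [200, 1_500, 4_096, 10_000, 1_000_000]

for block_size in (1024, 4096, 16384):
    blocks, waste = summarize(file_sizes, block_size)
    print(f"{block_size:>6}-byte blocks: {blocks:>4} blocks allocated, {waste:>6} bytes wasted")
```

For this sample, moving from 1024-byte to 16384-byte blocks cuts the number of blocks from 994 to 66 but raises the wasted space from 2060 bytes to 65548 bytes. A real workload would shift the balance point, which is why many filesystems let the block size be chosen at creation time.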
Hard drives are divided into one or more physical partitions. Physical partitions can hold a single filesystem, can be redefined as a logical partition that can hold multiple filesystems, or can be combined together by software to form a pool of storage on which one or more logical filesystems can be created. The primary reasons for partitioning a hard drive include the following:
- To reduce the amount of time required to locate a specific piece of data on the drive. It simply takes less time (and less location information) to find a specific piece of data in a smaller pool of information.
- To limit the amount of data that can be lost or damaged if a disk is damaged, or if a specific filesystem becomes corrupted.
- To speed up administrative operations such as defragmentation, consistency checking, and filesystem repair.
- To simplify administrative operations, such as backups. It's simpler to back up partitions that will fit on a single tape or other backup medium because no operator intervention (such as switching tapes) is required. Multiple partitions also enable you to install system files and application programs on different partitions than user accounts. You can then back up the partition containing user accounts without also backing up vast amounts of relatively unchanging executables, system files, and so on.
Without some organization imposed on a disk drive, operating system software wouldn't know where to look for files or for the information that the filesystem uses to organize them. To make this easy to remember, here's one of my favorite bits of technical writing ever, a quote from a Hewlett-Packard manual shipped with one of its Unix (HP-UX) workstations in the mid-1980s:
On a clear disk, you can seek forever.
After a disk is partitioned, filesystems can be created on those partitions. During the filesystem-creation process, the higher-level data structures that the system will use to store, locate, and retrieve files and directories are created. Modern filesystems on physical or logical partitions consist of hierarchically organized files and directories. Just as files are containers for data, directories are containers for files and/or other directories. The classic analogy for hierarchical filesystems is a filing cabinet with multiple drawers containing files stored in folders. Folders can contain either files or other folders. It's therefore easy to find a specific file or folder by describing it in terms of the drawer and folders in which it is contained. Locating a file in a directory in a specific filesystem is exactly like saying "open drawer W, find folder X, look in there for folder Y, and pull out file Z."
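The filing-cabinet analogy can be sketched as a walk through nested containers. This is only a toy model under stated assumptions: the `cabinet` tree and the `lookup` helper are hypothetical, nested dictionaries stand in for directories, and strings stand in for file contents. A real filesystem uses on-disk structures rather than anything like this.

```python
# Toy hierarchy: dictionaries are directories, strings are file contents.
cabinet = {
    "W": {                      # the drawer
        "X": {                  # a folder in the drawer
            "Y": {              # a folder inside folder X
                "Z": "contents of file Z",
            },
        },
    },
}


def lookup(root, path):
    """Resolve a slash-separated path by descending one component at a time."""
    node = root
    for name in (part for part in path.split("/") if part):
        if not isinstance(node, dict):
            raise NotADirectoryError(f"cannot look for {name!r} inside a file")
        if name not in node:
            raise FileNotFoundError(name)
        node = node[name]
    return node


# "Open drawer W, find folder X, look in there for folder Y, and pull out file Z."
print(lookup(cabinet, "/W/X/Y/Z"))  # prints: contents of file Z
```

Each path component narrows the search to one container, so the work of finding a file depends on the depth of the path and the size of each directory along it rather than on the total number of files stored.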