- Read Write Speed
- Defragmentation
- Internal Fragmentation
- Trimming Unused Space
Defragmentation
Anyone who has used Microsoft Windows or DOS is familiar with defragmentation. It was a very common requirement under DOS for two reasons:
- The File Allocation Table (FAT) design kept free space in a linked list, so the only way to find a contiguous region of the desired size was to walk the entire free list. Instead, files were simply written into the first free block, then the next free block, and so on.
- DOS was not designed with caching in mind: writes went straight to the disk, so the operating system had no opportunity to see how much space a file was going to need before choosing where to put it.
Under a UNIX system, in contrast, free space was typically stored in some kind of tree structure arranged by size. Files were buffered in RAM for a short while before being written to the disk, and applications writing very large files typically seeked to the end first to give the operating system a hint about the final size.
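That size-hint trick is easy to reproduce. The following is a minimal sketch, assuming a POSIX system: it extends a freshly created file to its expected final size by seeking to the last byte and writing a single zero before the real data is written. The file name and the 1MB size are purely illustrative; modern code would more likely call posix_fallocate(), which asks the filesystem to reserve the space explicitly.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hint the eventual file size to the filesystem by seeking to the last
 * byte and writing a single zero: the traditional UNIX trick described
 * above. The expected_size value is an assumption for illustration. */
static int hint_file_size(int fd, off_t expected_size)
{
    if (lseek(fd, expected_size - 1, SEEK_SET) == (off_t)-1)
        return -1;
    if (write(fd, "", 1) != 1)          /* write one zero byte at the end */
        return -1;
    return lseek(fd, 0, SEEK_SET) == (off_t)-1 ? -1 : 0;
}

int main(void)
{
    int fd = open("large.out", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (hint_file_size(fd, 1024 * 1024) != 0) {  /* expect roughly 1MB */
        perror("hint_file_size");
        return 1;
    }

    /* ... the application now writes its data starting at offset 0 ... */
    close(fd);
    return 0;
}
```

The write of the single byte is what actually extends the file; a bare lseek() past the end only moves the file offset.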
The defragmentation process works by reading files that are scattered over the disk and rewriting them into a single block of free space. It is important because, as I've already mentioned, seeking on a mechanical disk is very expensive. If a 1MB file is spread across 2048 scattered 512-byte blocks, it will take significantly longer to read than if it sits in a single contiguous 1MB stretch.
A typical hard disk has a seek time somewhere in the 4-10ms range. At 4ms per seek, the drive is limited to 250 seeks per second. If you read a file that is scattered over the disk in 512-byte chunks, you are therefore limited to 250 × 512 bytes, or 125KB/s. In contrast, a modern laptop drive can easily top 30MB/s in linear reads.
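To make those numbers concrete, this small program runs the same back-of-the-envelope calculation using the figures from the text: 4ms per seek and 512 bytes transferred per seek.

```c
#include <stdio.h>

int main(void)
{
    const double seek_time  = 0.004;   /* 4ms average seek, from the text */
    const double block_size = 512.0;   /* bytes transferred per seek */

    double seeks_per_second = 1.0 / seek_time;              /* 250 */
    double bytes_per_second = seeks_per_second * block_size;

    printf("%.0f seeks/s x %.0f bytes = %.1f KB/s\n",
           seeks_per_second, block_size, bytes_per_second / 1024.0);
    return 0;
}
```

It prints "250 seeks/s x 512 bytes = 125.0 KB/s", roughly 250 times slower than the same drive reading linearly.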
This is one of the reasons that flash is popular with database servers. The limit there is the number of I/O operations per second (IOPS), rather than the raw linear throughput. Where a hard disk can typically sustain something on the order of 100IOPS, 1000IOPS is considered very low for solid state disks.
In fact, defragmentation can make some SSDs slower. A typical large SSD uses several chips for storage. Sometimes these are striped; sometimes they are concatenated. If your file is split between several chips, then the controller can sometimes read blocks from them in parallel, speeding up the access rate.
Most operating systems don't need explicit defragmentation anymore. They pick allocation strategies that minimize fragmentation in the first place, and often rearrange existing files in the background to keep it low.
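You can check how well the filesystem is doing at this. The sketch below is Linux-specific and assumes the FIEMAP ioctl is available (it is what the filefrag tool uses); it simply counts the extents, the contiguous runs of blocks, backing a file. A count of 1 means the file is completely unfragmented.

```c
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Report how many extents (contiguous runs of blocks) a file occupies. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct fiemap fm;
    memset(&fm, 0, sizeof fm);
    fm.fm_start = 0;
    fm.fm_length = ~0ULL;    /* map the whole file */
    fm.fm_extent_count = 0;  /* 0: only count extents, return none */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
        perror("FS_IOC_FIEMAP");
        return 1;
    }

    printf("%s: %u extent(s)\n", argv[1], fm.fm_mapped_extents);
    close(fd);
    return 0;
}
```

On a lightly used modern filesystem, even large files usually come back with only a handful of extents, which is why background or manual defragmentation rarely buys much.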
Windows is an exception, largely for psychological reasons. There was a user study performed a few years ago where users in a corporate Linux environment were given a defragmentation program that they could run. It spent a few hours displaying an animation, but didn't touch the disk. Most of the users reported that their computers were more responsive after running it. Even though most users don't know what defragmentation actually does, they expect to be able to do it to stop their computer becoming slower over time.
On a flash drive, both explicit and background defragmentation can be harmful. Rearranging files does little to speed up access, but it does consume some of the finite number of erase cycles that the flash cells can endure.