Supporting Solid State Disks
- Read Write Speed
- Defragmentation
- Internal Fragmentation
- Trimming Unused Space
In the last article, we looked at some of the technologies that are being used to implement solid state disks (SSDs). Now we'll look at how these technologies have driven changes in operating systems.
To begin with, it's worth reminding ourselves of what the characteristics of a typical mechanical disk are and the kinds of optimization that these have driven. Hard disks have a rotating platter and a disk head on an actuator arm that moves across it. Typically, they spin at somewhere in the 3-15 KRPM ballpark, so reading a complete ring is very fast. Moving the head sideways is much slower because it has to move, then settle.
In the past 20 years, hard disks have gone from around 20MB to around 2TB. In that time, average seek times have gone from around 10ms to around 5ms. Although the capacity has increased by five orders of magnitude, the seek time has only halved. At 5ms per seek, the maximum random read or write speed for a disk is around 100KB/s, writing one 512-byte block per seek and assuming that linear writes are infinitely fast. In contrast, sequential write operations can easily exceed 30MB/s with a modern disk.
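The 100KB/s figure follows directly from the seek time. As a rough sketch of the arithmetic (the function name is just for illustration, and the model ignores rotational latency and transfer time, as the text does):

```python
def random_throughput_bytes_per_s(seek_time_ms, block_size_bytes):
    """Upper bound on random I/O throughput when every block costs one seek."""
    seeks_per_second = 1000 / seek_time_ms
    return block_size_bytes * seeks_per_second


# Figures from the text: 5ms average seek, one 512-byte block per seek.
random_rate = random_throughput_bytes_per_s(5, 512)

# Sequential rate quoted in the text: roughly 30MB/s.
sequential_rate = 30 * 1024 * 1024

print(f"random:     {random_rate / 1024:.0f} KB/s")
print(f"sequential: {sequential_rate / 1024:.0f} KB/s")
print(f"ratio:      {sequential_rate / random_rate:.0f}x")
```

With these numbers the sequential case comes out a few hundred times faster than the random one, which is the gap the optimizations in the next paragraph are trying to close.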
Because of this massive difference in speeds, operating systems try very hard to avoid random writes. They use a variety of techniques to achieve this, including write queuing, journaling, and caching.
Read Write Speed
One of the most obvious differences between current SSDs and hard disks is that there is no longer a single read-write speed. A hard disk can typically both read and write data as fast as the platter spins under the head. Flash drives cannot. They can read data very quickly, but writing is a lot slower.
Most of the time, operating systems use 4KB memory pages for the filesystem cache. When you write 1 byte, they keep the change in memory for a bit and then periodically write the entire modified page out to disk. For a flash drive, this involves erasing an entire block, then rewriting its contents.
This is not the case if you are writing to an empty disk. You can write individual bytes of flash memory when they're in the unset state, but you can't then modify them without erasing the entire block (typically 128KB or more). This affects the ideal write pattern quite a lot.
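A toy model can make the write-once-then-erase rule concrete. This is only a sketch: the `FlashBlock` class and its method names are invented for illustration, erased bytes are represented as 0xFF, and a real controller works on pages and remaps blocks rather than rewriting them in place.

```python
ERASED = 0xFF  # value used here to represent an unset (erased) byte


class FlashBlock:
    """Toy erase block: bytes can be written once, then only reset by erase()."""

    def __init__(self, size=128 * 1024):
        self.data = bytearray([ERASED]) * size
        self.erase_count = 0

    def erase(self):
        """Reset every byte in the block to the unset state."""
        self.data = bytearray([ERASED]) * len(self.data)
        self.erase_count += 1

    def program(self, offset, payload):
        """Write payload, but only into bytes that are still unset."""
        end = offset + len(payload)
        if offset < 0 or end > len(self.data):
            raise IndexError("write falls outside the block")
        if any(b != ERASED for b in self.data[offset:end]):
            raise ValueError("bytes already written; erase the block first")
        self.data[offset:end] = payload

    def rewrite(self, offset, payload):
        """Modify written data: copy the block, erase it, write it all back."""
        end = offset + len(payload)
        if offset < 0 or end > len(self.data):
            raise IndexError("write falls outside the block")
        snapshot = bytearray(self.data)
        snapshot[offset:end] = payload
        self.erase()
        self.data[:] = snapshot


block = FlashBlock()
block.program(0, b"a")   # cheap: the target byte was unset
block.rewrite(0, b"b")   # expensive: the whole 128KB block is erased and rewritten
print(block.erase_count)
```

In this model, changing a single byte that has already been written costs a full erase and rewrite of the block, while writing into untouched space costs nothing extra.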
Partition alignment is very important for avoiding some of this overhead. Traditional hard disks use 512-byte blocks, and partitions are aligned on these boundaries, quite often on an odd number, for tedious historical reasons. The caching is usually done using memory pages relative to the start of the partition (you don't want to accidentally modify the master boot record, after all). If the partition starts on an odd offset, then one of these 4KB pages may overlap two 128KB blocks on the flash. When the operating system writes it back to the disk, the SSD controller must read 256KB, erase two blocks, then rewrite 256KB. This, as you can imagine, is slow.
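To see how a misaligned partition turns one 4KB write into a 256KB rewrite, the byte ranges can be worked out directly. The sketch below uses the sizes from the text; the start sectors are example values chosen for illustration (63 is the classic DOS-era offset, 2048 corresponds to 1MB alignment), and the function name is hypothetical.

```python
SECTOR = 512               # traditional hard disk block size
PAGE = 4096                # filesystem cache page size
ERASE_BLOCK = 128 * 1024   # flash erase block size used in the text


def erase_blocks_touched(partition_start_sector, page_index):
    """Return the erase blocks covered by one cached page of a partition."""
    start = partition_start_sector * SECTOR + page_index * PAGE
    end = start + PAGE - 1
    return list(range(start // ERASE_BLOCK, end // ERASE_BLOCK + 1))


# Page 24 of a partition starting at sector 63 straddles a block boundary.
misaligned = erase_blocks_touched(63, 24)

# The same page of a partition starting at sector 2048 stays inside one block.
aligned = erase_blocks_touched(2048, 24)

for label, blocks in (("misaligned", misaligned), ("aligned", aligned)):
    rewritten_kb = len(blocks) * ERASE_BLOCK // 1024
    print(f"{label}: erase blocks {blocks}, {rewritten_kb}KB rewritten")
```

With a 4KB-aligned start, a page can never cross an erase block boundary, because 128KB is an exact multiple of 4KB.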
Partition alignment is also a problem for newer rotating drives, because they use 4KB blocks internally and emulate 512-byte block access for older operating systems. Accessing 512-byte blocks starting somewhere other than a 4KB boundary (which most small files will be if they are on a partition with an odd start offset) requires loading 4KB blocks into the controller and then splitting them up.
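The same reasoning applies to 512-byte emulation on a drive with 4KB physical sectors. As an illustrative sketch (the function name and the example block addresses are assumptions, not taken from any drive's firmware), the following lists the physical sectors that a logical access only partly covers, which are the ones the controller has to load and split:

```python
LOGICAL = 512                        # emulated block size
PHYSICAL = 4096                      # block size used internally by the drive
PER_PHYSICAL = PHYSICAL // LOGICAL   # 8 logical blocks per physical block


def partial_physical_sectors(lba, count):
    """Physical sectors only partly covered by logical blocks [lba, lba + count)."""
    first, last = lba, lba + count - 1
    partial = []
    for phys in range(first // PER_PHYSICAL, last // PER_PHYSICAL + 1):
        lowest = phys * PER_PHYSICAL
        highest = lowest + PER_PHYSICAL - 1
        if first > lowest or last < highest:
            partial.append(phys)
    return partial


# A 4KB access (8 logical blocks) starting at an odd offset splits two sectors.
print(partial_physical_sectors(63, 8))

# The same access on a 4KB boundary covers exactly one physical sector.
print(partial_physical_sectors(64, 8))
```

An aligned 4KB access maps onto exactly one physical sector, so the drive has nothing to split.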
For a mechanical drive, seeking is very slow, so you want to write a linear block of data out, even if it means overwriting old data. With a flash drive, random writes that don't overwrite old data are preferable. These are almost exact opposite write patterns.