Tools and Strategies for Securing Hardware
We worry about computer security because important social processes keep migrating to distributed computerized settings. When considering security, it’s important to take a holistic view—because we care about the security of the social process itself, not only some component. However, in this chapter, we take a reductionist point of view and look at one of the components in particular: the hardware.
Historically, the main path toward thinking about hardware security probably came from considering the protection of computation from adversaries with direct physical access to the computing machinery. We’ve always liked framing this question in terms of dependency. Alice’s interests may depend on certain properties of computation X—perhaps integrity of its action or confidentiality of some key parameters. However, if X occurs on Bob’s machine, then whether these properties hold may depend on Bob. For example, if Bob’s machine is a standard PC and Bob is root, then he pretty much has free rein over X. He can see and modify data and code at will. As a consequence, preservation of Alice’s interests depends on the behavior of Bob, since Bob could subtly subvert the properties that Alice depends on. These circumstances force Alice to trust Bob, whether or not she wants to. If Bob’s interests do not coincide with Alice’s, this could be a problem.
This main path—reducing Alice’s dependency on Bob by modifying Bob’s computer—leads to several lines of inquiry. The most obvious is using hardware itself to protect data and computation. Another comes from considering that computing hardware is the underlying physical environment for computation. As such, the nature of the hardware can directly influence the nature of the computation it hosts. A quick glance at BugTraq [Sec06] or the latest Microsoft security announcements suffices to establish that deploying secure systems on conventional hardware has proved rather hard. This observation raises another question: If we changed the hardware, could we make it easier to solve this problem?
In this chapter, we take a long look at this exciting emerging space.
- Section 16.1 discusses how memory devices may leak secrets, owing to physical attack.
- Section 16.2 considers physical attacks and defenses on more general computing devices.
- Section 16.3 reviews some larger tools the security artisan can use when considering the physical security of computing systems.
- Section 16.4 focuses on security approaches that change the hardware architecture more fundamentally.
- Section 16.5 looks at some future trends regarding hardware security.
(The first author’s earlier book [Smi04c] provides a longer—but older—discussion of many of these issues. Chapter 3 in particular focuses on attacks.)
16.1 Data Remanence
One of the first challenges in protecting computers against adversaries with direct physical contact is protecting the stored data. Typically, one sees this problem framed as how a device can hide critical secrets from external adversaries, although the true problem is more general than this, as we discuss later. Potential attacks and defenses here depend on the type of beast we’re talking about.
We might start by thinking about data remanence: what data an adversary might extract from a device after its owner has tried to erase that data.
16.1.1 Magnetic Media
Historically, nonvolatile magnetic media, such as disks or once-ubiquitous tapes, have been notorious for retaining data after deletion. On a physical level, the contents of overwritten cells have been reputed to be readable via magnetic-force microscopy; however, a knowledgeable colleague insists that no documented case exists for any modern disk drive. Nonetheless, researchers (e.g., [Gut96]) and government standards bodies (e.g., [NCS91]) have established guidelines for overwriting cells in order to increase assurance that the previously stored data has been destroyed. (The general idea is to write a binary pattern, then its complement, then repeat many times.)
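To make this concrete, here is a minimal C sketch of the pattern-then-complement overwrite idea, assuming a POSIX environment. The pass count, patterns, and buffer size are illustrative only; real tools follow media-specific sequences such as those in [Gut96]. And, as the rest of this section explains, layers below the filesystem interface may keep copies this loop never touches.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>  /* fsync(); POSIX */

/* Sketch: overwrite the first `length` bytes of a file with a pattern,
 * then its complement, for `passes` passes. */
static int overwrite_file(const char *path, long length, int passes)
{
    unsigned char buf[4096];

    for (int p = 0; p < passes; p++) {
        /* alternate a pattern with its complement */
        memset(buf, (p % 2 == 0) ? 0xAA : 0x55, sizeof buf);

        FILE *f = fopen(path, "r+b");
        if (f == NULL)
            return -1;

        for (long done = 0; done < length; done += (long)sizeof buf) {
            size_t n = (length - done < (long)sizeof buf)
                           ? (size_t)(length - done)
                           : sizeof buf;
            if (fwrite(buf, 1, n, f) != n) {
                fclose(f);
                return -1;
            }
        }
        fflush(f);         /* push data out of stdio buffers... */
        fsync(fileno(f));  /* ...and ask the OS to push it to the device */
        fclose(f);
    }
    return 0;
}
```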
A complicating factor here is the existence of many layers of abstraction between a high-level request to delete data and what actually happens on the device in question. In between, many things could cause trouble.
- For example, a traditional filesystem usually breaks a file into a series of chunks, each sized to occupy a disk sector, and distributes these chunks on the disk according to various heuristics intended to improve performance. Some type of index table, perhaps in a sector of its own, indicates where each chunk is. In such a system, when the higher-level software deletes or even shrinks a file, the filesystem may respond by clearing the corresponding entry in the file’s index table and marking that sector as "free." However, the deleted data may remain on the disk, in this now-free sector, as the sketch following this list illustrates. (Issues such as this led to the object reuse worries of the Orange Book world of Chapter 2.)
- Journaling filesystems, a more advanced technology, make things even worse. Journaling filesystems treat the disk not as a place to store files so much as a place to store a log of changes to files. As with Word’s Fast Save option (see Chapter 13), the history of edits that resulted in a file’s current state may be available to the adversary inspecting the disk itself.
- Computing hardware has seen a sort of trickle-down (or perhaps smarting-down) effect, whereby traditionally "dumb" peripherals now feature their own processors and computing ability. Disk controllers are no exception to this trend, leading to yet another level of abstraction between the view the computing system sees and what actually happens with the physical media.
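To see how a "deleted" file can linger, consider this toy C model of the index-table scheme described in the first item above. All names and sizes are invented for illustration.

```c
#include <stdio.h>
#include <string.h>

#define NSECTORS    8
#define SECTOR_SIZE 16

/* Toy model: an index maps file IDs to sectors; a free list tracks
 * allocation. */
static char disk[NSECTORS][SECTOR_SIZE];  /* raw sector contents */
static int  index_table[NSECTORS];        /* file ID -> sector, -1 = none */
static int  sector_free[NSECTORS];        /* 1 = sector is "free" */

static void delete_file(int file_id)
{
    int s = index_table[file_id];
    index_table[file_id] = -1;  /* clear the file's index entry */
    sector_free[s] = 1;         /* mark its sector as free */
    /* Note what did NOT happen: disk[s] still holds the old bytes. */
}

int main(void)
{
    memset(index_table, -1, sizeof index_table);
    index_table[0] = 3;              /* file 0 lives in sector 3 */
    strcpy(disk[3], "secret data");

    delete_file(0);

    /* An adversary reading raw sectors still recovers the "deleted" data. */
    printf("sector 3 after delete: %s\n", disk[3]);
    return 0;
}
```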
16.1.2 FLASH
In recent years, semiconductor FLASH memory (e.g., in USB thumbdrives) has probably become more ubiquitous than magnetic media for removable storage. FLASH is also the standard nonvolatile storage in most embedded devices, such as cell phones and PDAs. The internal structure of a FLASH device is a bit more complex than that of other semiconductor memories (e.g., see [Nii95]). FLASH is organized into sectors, each usually on the order of tens or hundreds of kilobytes. When in "read" mode, the device acts as an ordinary ROM. To write a sector, the system must put the FLASH device into write mode, which requires writing a special sequence of bytes, essentially opcodes, to special addresses in the FLASH device. Typically, the stored bits can be programmed in only one direction (e.g., changed only from 1 to 0). To erase a sector (e.g., setting all its bits back to 1), another sequence of magic bytes must be written. Often, FLASH devices include the ability to turn a designated sector into ROM by wiring a pin a certain way at manufacture time.
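As an illustration, here is a minimal C sketch of such a magic-byte interface, modeled on the common JEDEC-style command sequences (the 0x555/0x2AA unlock cycles) that many NOR FLASH parts use. The base address, offsets, and opcodes here are assumptions for the sketch; any real part's datasheet specifies its own.

```c
#include <stdint.h>

#define FLASH_BASE 0x08000000u  /* hypothetical memory-mapped device */

static volatile uint8_t *const flash = (volatile uint8_t *)FLASH_BASE;

/* Program one byte: programming can only clear bits (1 -> 0). */
static void flash_program_byte(uint32_t offset, uint8_t value)
{
    flash[0x555] = 0xAA;    /* unlock cycle 1 */
    flash[0x2AA] = 0x55;    /* unlock cycle 2 */
    flash[0x555] = 0xA0;    /* "program" opcode */
    flash[offset] = value;  /* the actual write */
    /* real code would now poll the device's status bits for completion */
}

/* Erase one sector: sets every bit in the sector back to 1 (0xFF). */
static void flash_erase_sector(uint32_t sector_offset)
{
    flash[0x555] = 0xAA;
    flash[0x2AA] = 0x55;
    flash[0x555] = 0x80;          /* "erase" opcode */
    flash[0x555] = 0xAA;
    flash[0x2AA] = 0x55;
    flash[sector_offset] = 0x30;  /* erase the sector at this address */
    /* erase takes nontrivial time; real code polls for completion */
}
```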
FLASH poses two additional challenges for system implementers. First, writing and erasing sectors both take nontrivial time; a failure, such as a power interruption, during such an interval may leave the sector in an undetermined, corrupted state. Second, each FLASH cell endures only a relatively small number of erase-write cycles (e.g., on the order of 10,000) before wearing out.
These technical limitations lead to incredible acrobatics when designing a filesystem for FLASH (e.g., [GT05, Nii95]). In order to avoid wearing out the FLASH sectors, designers will use data structures that selectively mark bits to indicate dirty bytes within sectors and rotate usage throughout the sectors on the device. For fault tolerance, designers may try to make writes easy to undo, so that the old version of a file can be recovered if a failure occurs during the nontrivial duration of a write. Even relatively simple concepts, such as a directory or index table, get interesting—if you decide to keep one, then you’ll quickly wear out that sector, even if you’re clever with the rest of the files.
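Here is a minimal C sketch of the append-and-invalidate idea behind such designs, using an ordinary byte array as a stand-in for one FLASH sector; the record layout and names are invented for illustration. The trick is that marking a record superseded only clears bits, which FLASH permits without an erase.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096
#define RECORD_SIZE 32

/* One record in a log-structured FLASH store. A record is superseded by
 * clearing its valid flag (1 -> 0 only); the costly, wear-inducing erase
 * is deferred until the sector fills. */
struct record {
    uint8_t valid;                  /* 0xFF = live, 0x00 = superseded */
    uint8_t key;                    /* which logical item this holds */
    uint8_t data[RECORD_SIZE - 2];
};

static uint8_t sector[SECTOR_SIZE];  /* RAM stand-in for one FLASH sector */
static size_t next_free;             /* append point within the sector */

/* Write a new version of a record and invalidate the old one in place. */
static int update_record(uint8_t key, const uint8_t *data, size_t len,
                         struct record *old)
{
    if (next_free + sizeof(struct record) > SECTOR_SIZE)
        return -1;  /* sector full: erase it and rotate to another sector */

    struct record *r = (struct record *)&sector[next_free];
    r->valid = 0xFF;
    r->key   = key;
    memcpy(r->data, data, len < sizeof r->data ? len : sizeof r->data);
    next_free += sizeof(struct record);

    if (old != NULL)
        old->valid = 0x00;  /* bits cleared only; no erase required */

    /* Security consequence: the superseded record's bytes remain readable
     * in the sector until the deferred erase actually happens. */
    return 0;
}
```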
FLASH architecture has several consequences for security.
- Because of these log-structured and fault-tolerant contortions, old data may still exist in the device even if the higher levels of the system thought it was erased.
- Because an error in a product’s ROM can be expensive, at least one vendor includes an undocumented feature to rewrite the ROM sector by writing a magic series of bytes to the chip. (The complexity of the legitimate magic-byte interface makes it hard to otherwise discover such back doors.)
- Because of the large market demand for low-cost thumbdrives and the smarting down of computation into peripherals, much engineering has gone into commercial FLASH drives, leading to a gap between even the API the encapsulated device exposes and what actually happens with its internal state.
16.1.3 RAM
Random-access memory (RAM) is the standard medium for memory during active computation. Dynamic RAM (DRAM) stores each bit as an electrical charge in a capacitor. Since these charges tend to be short-lived, data remanence is not as much of an issue here. This short lifetime requires additional functionality: the device must continually read and restore the charges before they decay. (One wonders whether this continual processing of stored data might lead to side-channel exposures.) However, the capacitors do not take much real estate on the chip; as a consequence, DRAM tends to be favored when large amounts of memory are required.
In contrast, static RAM (SRAM) stores each bit via the state of a flip-flop, a small collection of logic gates. This approach takes more real estate but does not require the refresh machinery and the power it consumes. As a consequence, when a device needs memory with the properties of RAM (e.g., none of this sector business) but otherwise nonvolatile, it may end up using battery-backed SRAM, sometimes referred to as BBRAM.
SRAM, however, is not without remanence issues. Long-term storage of the same bit values can imprint those values into the cells, so that they reappear at power-up even after the memory has been cleared or left unpowered. Environmental factors, such as cold temperatures and radiation, can also cause imprinting. (Gutmann [Gut01] and Weingart [Wei00] both provide more discussion of these issues.)
16.1.4 The System
So far, we’ve discussed properties of the memory medium itself. However, the memory is embedded in the context of a larger system, and this larger context can lead to issues. For example, at many levels in the software stack, an optimizer might decide that a write to a data location that will never be read again is unnecessary and silently eliminate it (compilers call this dead-store elimination). This can undo a programmer’s efforts to clear sensitive data. Researchers at Stanford recently used a form of virtualization (see Section 16.4.2) to explore this issue of data lifetime in the context of an entire system—and uncovered many surprising cases of data living longer than the designers or programmers intended or believed [CPG+04].
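A minimal C sketch of this hazard, and one common workaround, follows; the function names are invented for illustration.

```c
#include <string.h>

/* The hazard: `password` is never read after the memset(), so a compiler
 * performing dead-store elimination may silently remove the scrub. */
void scrub_naive(void)
{
    char password[64];
    /* ... obtain and use the password ... */
    memset(password, 0, sizeof password);  /* may be optimized away */
}

/* One common workaround: write through a volatile-qualified pointer, which
 * the compiler must treat as observable. (C11's optional memset_s() and
 * platform functions such as explicit_bzero() address the same problem.) */
void scrub_careful(void)
{
    char password[64];
    /* ... obtain and use the password ... */
    volatile char *p = password;
    for (size_t i = 0; i < sizeof password; i++)
        p[i] = 0;
}
```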
16.1.5 Side Channels
Devices that instantiate computation must exist as physical machines in the real world. Because of this physical existence, computational actions the device takes can result in real-world physical actions that the designer can easily fail to foresee but that an adversary can exploit. We discussed many examples of this in Section 8.4.