1.3 CMOS Technology
CMOS is the dominant integrated circuit technology. In this section we introduce some basic concepts of CMOS that explain why it is so widespread, along with some of the challenges that arise from its inherent characteristics.
1.3.1 Power Consumption
Power consumption constraints
The huge chips that can be fabricated today are possible only because of the relatively tiny power consumption of CMOS circuits. Power consumption is critical at the chip level because much of the power is dissipated as heat, and chips have limited heat dissipation capacity. Even if the system in which a chip is placed can supply large amounts of power, most chips are packaged to dissipate no more than 10 to 15 watts before they suffer permanent damage (though some chips dissipate well over 50 watts thanks to special packaging). In the worst case, the power consumption of a logic circuit can limit the number of transistors we can effectively put on a single chip.
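The limit described above is essentially a division: the package's heat budget divided by the average power each transistor dissipates bounds how many transistors can be active on one chip. The sketch below assumes a 10 W budget (the low end of the range quoted above) and two hypothetical per-transistor power figures; the function name `max_transistors` and both power values are illustrative assumptions, not data for any real circuit family. Integer microwatts keep the arithmetic exact.

```python
def max_transistors(budget_uw: int, per_transistor_uw: int) -> int:
    """Upper bound on transistor count for a given heat budget.

    Both arguments are in microwatts so the division stays exact.
    """
    return budget_uw // per_transistor_uw


BUDGET_UW = 10 * 1_000_000  # assumed 10 W package limit, in microwatts

# Hypothetical average power per transistor for two circuit families.
LOW_POWER_UW = 1     # assumed low-power (CMOS-like) family
HIGH_POWER_UW = 100  # assumed high-powered family

print(max_transistors(BUDGET_UW, LOW_POWER_UW))   # 10000000
print(max_transistors(BUDGET_UW, HIGH_POWER_UW))  # 100000
```

Under these assumptions, a hundredfold increase in per-transistor power cuts the transistor budget by the same factor, so the higher-powered family forces the design onto more chips.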
Limiting the number of transistors per chip changes system design in several ways. Most obviously, it increases the physical size of a system. Using high-powered circuits also increases power supply and cooling requirements. A more subtle effect arises because the time required to transmit a signal between chips is much larger than the time required to send the same signal between two transistors on the same chip; as a result, some of the speed advantage of a higher-speed (and higher-power) circuit family is lost when the design must be split across more chips. Another subtle effect of decreasing the level of integration is that the electrical design of multi-chip systems is more complex: microscopic on-chip wires exhibit parasitic resistance and capacitance, while macroscopic wires between chips have capacitance and inductance, which can cause a number of ringing effects that are much harder to analyze.
The close relationship between power consumption and heat makes low-power design techniques essential knowledge for every CMOS designer. Low-energy design is especially important in battery-operated systems like cellular telephones, where total energy consumption determines battery life. Power and energy are not the same thing: power can be reduced simply by doing the same work more slowly, but energy must be saved by avoiding unnecessary work. We will see throughout the rest of this book that minimizing power and energy consumption requires careful attention to detail at every level of abstraction, from system architecture down to layout.
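The difference between saving power and saving energy can be made concrete with a small model. It assumes a task that needs a fixed number of clock cycles and a fixed energy per cycle (supply voltage held constant), so average power is energy divided by run time. The `Run` class and all numbers below are hypothetical, chosen only to show the trend.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical model of one task execution at a fixed supply voltage."""
    cycles: int                # clock cycles of work the task performs
    energy_per_cycle_j: float  # assumed constant energy per cycle, joules
    freq_hz: float             # clock frequency, hertz

    @property
    def time_s(self) -> float:
        return self.cycles / self.freq_hz

    @property
    def energy_j(self) -> float:
        # Energy depends only on how much work is done.
        return self.cycles * self.energy_per_cycle_j

    @property
    def power_w(self) -> float:
        # Average power is that energy spread over the run time.
        return self.energy_j / self.time_s


fast = Run(cycles=1_000_000_000, energy_per_cycle_j=1e-9, freq_hz=1e9)
slow = Run(cycles=1_000_000_000, energy_per_cycle_j=1e-9, freq_hz=5e8)  # same work, half the clock
lean = Run(cycles=500_000_000, energy_per_cycle_j=1e-9, freq_hz=1e9)    # half the work avoided

for name, run in [("fast", fast), ("slow", slow), ("lean", lean)]:
    print(f"{name}: {run.power_w:.2f} W, {run.energy_j:.2f} J, {run.time_s:.2f} s")
```

In this model, halving the clock drops power from 1.00 W to 0.50 W but leaves energy at 1.00 J because the run takes twice as long; only the `lean` run, which skips half the work, cuts energy to 0.50 J.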
As CMOS features become smaller, additional power consumption mechanisms come into play. Traditional CMOS consumes power when signals change but consumes only negligible power when idle. In modern CMOS, leakage mechanisms start to drain current even when signals are idle. In the smallest geometry processes, leakage power consumption can be larger than dynamic power consumption. We must introduce new design techniques to combat leakage power.
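A common first-order way to compare the two mechanisms (standard approximations, not formulas given in this section) is P_dyn = alpha * C * Vdd^2 * f for switching power, where alpha is the fraction of capacitance switched per cycle, and P_leak = Vdd * I_leak for leakage. The sketch below uses hypothetical values for switched capacitance, supply voltage, clock frequency, and leakage current. It shows that an idle circuit (alpha = 0) draws no dynamic power but still leaks, and that a large enough leakage current can exceed dynamic power.

```python
def dynamic_power(activity: float, cap_f: float, vdd: float, freq_hz: float) -> float:
    """First-order switching power, alpha * C * Vdd^2 * f, in watts."""
    return activity * cap_f * vdd ** 2 * freq_hz


def leakage_power(vdd: float, i_leak_a: float) -> float:
    """Static power drawn by leakage current even when no signal switches, in watts."""
    return vdd * i_leak_a


# Hypothetical chip parameters, chosen only for illustration.
C_SWITCHED = 1e-9  # total switchable capacitance, farads
VDD = 1.0          # supply voltage, volts
FREQ = 1e9         # clock frequency, hertz

# Assumed leakage currents for an older and a small-geometry process.
for label, i_leak in [("older process", 0.001), ("small-geometry process", 0.5)]:
    for activity in (0.1, 0.0):  # 0.0 models an idle circuit
        p_dyn = dynamic_power(activity, C_SWITCHED, VDD, FREQ)
        p_leak = leakage_power(VDD, i_leak)
        print(f"{label}, activity={activity}: dynamic={p_dyn:.3f} W, leakage={p_leak:.3f} W")
```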
1.3.2 Design and Testability
Design verification
Our ability to build large chips of unlimited variety introduces the problem of checking whether those chips have been manufactured correctly. Designers accept the need to verify or validate their designs to make sure that the circuits perform the specified function. (Some people use the terms verification and validation interchangeably; a finer distinction reserves verification for formal proofs of correctness, leaving validation to mean any technique that increases confidence in correctness, such as simulation.) Chip designs are simulated to ensure that the chip’s circuits compute the proper functions for a sequence of inputs chosen to exercise the chip.
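Validation by simulation amounts to running the design and a reference specification on the same input sequence and comparing their outputs. The sketch below assumes a gate-level full adder as a hypothetical design under test and ordinary integer addition as its specification; `full_adder_gates`, `full_adder_spec`, and `validate` are illustrative names. With only three inputs the stimulus set can be exhaustive; a real chip has far too many possible input sequences for that, which is why stimuli must be chosen to exercise the design.

```python
from itertools import product


def full_adder_gates(a: int, b: int, cin: int) -> tuple[int, int]:
    """Gate-level design under test: returns (sum, carry_out)."""
    s1 = a ^ b
    total = s1 ^ cin
    cout = (a & b) | (s1 & cin)
    return total, cout


def full_adder_spec(a: int, b: int, cin: int) -> tuple[int, int]:
    """Specification: a + b + cin split into (sum bit, carry bit)."""
    n = a + b + cin
    return n % 2, n // 2


def validate(design, spec, stimuli) -> list:
    """Simulate design and spec on each stimulus; return the mismatching inputs."""
    return [v for v in stimuli if design(*v) != spec(*v)]


stimuli = list(product((0, 1), repeat=3))  # all 8 input combinations
print(validate(full_adder_gates, full_adder_spec, stimuli))  # []
```

A deliberately buggy design that computes the carry as just `a & b` would instead return `[(0, 1, 1), (1, 0, 1)]`, the inputs on which it disagrees with the specification.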
Manufacturing test
But each chip that comes off the manufacturing line must also undergo manufacturing test—the chip must be exercised to demonstrate that no manufacturing defects rendered the chip useless. Because IC manufacturing tends to introduce certain types of defects and because we want to minimize the time required to test each chip, we can’t just use the input sequences created for design verification to perform manufacturing test. Each chip must be designed to be fully and easily testable. Finding out that a chip is bad only after you have plugged it into a system is annoying at best and dangerous at worst. Customers are unlikely to keep using manufacturers who regularly supply bad chips.
Defects introduced during manufacturing range from the catastrophic (contamination that destroys every transistor on the wafer) to the subtle (a single broken wire or a crystalline defect that kills only one transistor). While some bad chips can be found very easily, each chip must be thoroughly tested to find even subtle flaws that produce erroneous results only occasionally. Tests designed to exercise functionality and expose design bugs don’t always uncover manufacturing defects. We use fault models to identify potential manufacturing problems and determine how they affect the chip’s operation. The most common fault model is stuck-at-0/1: the defect causes a logic gate’s output to be always 0 (or 1), independent of the gate’s input values. We can often determine whether a logic gate’s output is stuck even if we can’t directly observe its outputs or control its inputs. We can generate a good set of manufacturing tests for the chip by assuming each logic gate’s output is stuck at 0 (then 1) and finding an input to the chip that causes different outputs when the fault is present and when it is absent. (Both the stuck-at-0/1 fault model and the assumption that faults occur only one at a time are simplifications, but they are often good enough to reject most faulty chips.)
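The test-generation procedure described above can be sketched on a tiny circuit. The netlist below, computing y = (a AND b) OR (NOT c), along with its gate names and helper functions, is hypothetical. Each gate output is forced to 0 and then to 1, and the code searches the input vectors for one where the faulty circuit’s primary output differs from the fault-free one. Exhaustive search is used only because the example has three inputs; its cost grows exponentially with the number of inputs, and production test generators rely on much more efficient algorithms.

```python
from itertools import product

# Hypothetical netlist: gate name -> (operation, input names), in topological order.
NETLIST = {
    "g1": ("AND", ("a", "b")),
    "g2": ("NOT", ("c",)),
    "g3": ("OR", ("g1", "g2")),
}
INPUTS = ("a", "b", "c")
OUTPUTS = ("g3",)  # only g3 is observable at the chip pins

OPS = {
    "AND": lambda xs: int(all(xs)),
    "OR": lambda xs: int(any(xs)),
    "NOT": lambda xs: 1 - xs[0],
}


def simulate(vector, fault=None):
    """Evaluate the netlist; fault=(gate, value) forces that gate's output."""
    values = dict(zip(INPUTS, vector))
    for gate, (op, ins) in NETLIST.items():
        values[gate] = OPS[op]([values[i] for i in ins])
        if fault is not None and fault[0] == gate:
            values[gate] = fault[1]  # inject the stuck-at value
    return tuple(values[o] for o in OUTPUTS)


def generate_tests():
    """For each single stuck-at fault, find one input vector that exposes it."""
    tests = {}
    for gate in NETLIST:
        for stuck in (0, 1):
            for vector in product((0, 1), repeat=len(INPUTS)):
                if simulate(vector) != simulate(vector, (gate, stuck)):
                    tests[(gate, stuck)] = vector
                    break
            else:
                tests[(gate, stuck)] = None  # no vector exposes this fault
    return tests


for fault, vector in generate_tests().items():
    print(fault, vector)
```

For example, a stuck-at-0 fault on the internal AND gate `g1` is exposed only by the vector (1, 1, 1): it drives `g1` to 1 and makes the NOT gate’s output 0, so the difference propagates to the observable output. This illustrates how a gate can be tested even though its own output is not directly visible.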
Testability as a design process
Unfortunately, not all chip designs are equally testable. Some faults may require long input sequences to expose; other faults may not be testable at all, even though they cause chip malfunctions that aren’t covered by the fault model. Traditionally, chip designers have ignored testability problems, leaving them to a separate test engineer who must find a set of inputs to adequately test the chip. If the test engineer can’t change the chip design to fix testability problems, his or her job becomes both difficult and unpleasant. The result is often poorly tested chips whose manufacturing problems are found only after the customer has plugged them into a system. Companies now recognize that the only way to deliver high-quality chips to customers is to make the chip designer responsible for testing, just as the designer is responsible for making the chip run at the required speed. Testability problems can often be fixed easily early in the design process at relatively little cost in area and performance. To do so, designers must understand testability requirements, analysis techniques that identify hard-to-test sections of the design, and design techniques that improve testability.
1.3.3 Reliability
Reliability is a lifetime problem
Earlier generations of VLSI technology were robust enough that testing chips at manufacturing time was sufficient to identify working parts: a chip either worked or it didn’t. In today’s nanometer-scale technologies, determining whether a chip works is more complex. Some mechanisms cause transient failures: occasional problems that are not repeatable. Other failure mechanisms, like overheating, cause permanent failures, but only after the chip has operated for some time. And more complex manufacturing problems produce faults that are harder to diagnose and may degrade performance rather than functionality.
Design-for-manufacturability
A number of techniques, referred to as design-for-manufacturability or design-for-yield, are in use today to improve the reliability of chips that come off the manufacturing line. We can make chips more reliable by designing circuits and architectures that reduce design stresses and check for problems. For example, heat is one major cause of chip failure. Proper power management circuitry can reduce the chip’s heat dissipation and reduce the damage caused by overheating. We also need to change the way we design chips. Some of the convenient levels of abstraction that served us well in earlier technologies are no longer entirely appropriate in nanometer technologies. We need to check more thoroughly and be willing to solve reliability problems by modifying design decisions made earlier.