The State of the Art
Dual-core CPUs were first made a commercial success by IBM with the POWER4 series some years ago. The idea is simple: most big-iron machines have a large number of CPUs, and if you put more than one core in each CPU package, you can reduce the physical size of the machine.
These days, Intel and AMD have jumped on the dual-core bandwagon and are racing toward quad cores and beyond. This is a logical development according to Moore's Law, one of the most misquoted observations in computing. Moore's Law states that the number of transistors that can be put on a CPU for a fixed financial investment doubles every 12 to 24 months. (The exact period varies, depending on when you ask Gordon Moore, but it is usually quoted as 18 months.) If you want to spend more money, you can add more transistors; the Extreme Edition Pentiums do this to provide more cache, for example.
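Under the usual 18-month reading, the law reduces to a simple doubling formula. A minimal sketch (the function name and the choice of a 1.5-year period are illustrative assumptions, not figures from any datasheet):

```python
def projected_transistors(initial_count, years, doubling_period=1.5):
    """Transistor budget after `years`, doubling once every
    `doubling_period` years (18 months by default)."""
    return initial_count * 2 ** (years / doubling_period)

# A decade of 18-month doublings multiplies the budget roughly a hundredfold:
print(round(projected_transistors(1, 10)))  # → 102
```

This hundredfold-per-decade growth is what makes the core-count projections later in this article plausible on paper.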
The question becomes what to do with these spare transistors. The Pentium II, released in 1997, used 7.5 million transistors. The Itanium 2, released in 2004, used 592 million. Most of these were cache. Adding cache to a CPU is a nice, easy way of using up transistors: cache is very simple, and adding more is only slightly more complicated than copying-and-pasting part of the chip design a few times. Unfortunately, extra cache runs into diminishing returns fairly quickly. Once the entire working set of a process fits in cache, adding more provides no benefit.
The next trick is to add more cores, effectively duplicating the entire CPU. Looking at the transistor counts for the two CPUs above, we see that in 2004 it would have been possible to produce an 80-core Pentium II. Within a decade, it will be economically feasible to produce a single chip with 5,000 P6 cores. Unfortunately, the power requirements of such a chip mean that it would need its own electricity substation, not to mention a steady supply of liquid nitrogen to cool it. Memory technology also looks likely to reach only the point where a few percent of those cores can be kept fed with data. Each core could have its own memory bus, but then you've got a minimum of 10,000 pins, or 320,000 for a 64-bit memory bus per core. Even designing the package in which such a chip would be distributed is a significant engineering challenge; designing a motherboard that would connect each RAM channel to a memory bank is a problem that would give most PCB designers recurring nightmares.
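The back-of-the-envelope arithmetic behind those figures is easy to reproduce. The transistor counts below are the ones quoted above; the pin estimate is a deliberately simplistic one-bus-per-core assumption that counts data pins only, ignoring address, control, and power pins:

```python
# Transistor counts quoted in the text:
P2_TRANSISTORS = 7.5e6        # Pentium II, 1997
ITANIUM2_TRANSISTORS = 592e6  # Itanium 2, 2004

# How many Pentium II cores fit in a 2004-sized transistor budget?
cores = int(ITANIUM2_TRANSISTORS // P2_TRANSISTORS)
print(cores)  # → 78, i.e. roughly an 80-core Pentium II

# A dedicated 64-bit memory bus for each of 5,000 cores needs
# at least 64 data pins per core:
print(5_000 * 64)  # → 320000
```

Even this data-pins-only estimate dwarfs the pin count of any real package, which is the point: the packaging and board-routing problems arrive long before the transistor budget runs out.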
Throwing more cache onto chips worked for a little while. Throwing more cores on will work for a little while longer. Eventually, however, a more intelligent solution will be required.