What Else Can Go on a CPU?
Floating-point coprocessors, memory management units, and vector units have already been absorbed into modern CPUs. Digital signal processors (DSPs) have been added to a number of embedded CPUs, and it seems likely that they'll find their way into consumer CPUs soon.
The first use of additional transistors was to add more execution units, enabling deeper pipelines and wider superscalar designs, and then more cache. Now we're adding entire homogeneous processing units: duplicates of the whole core. Each of these approaches only scales so far, though. The step from one to two cores is a huge improvement: it's rare for even a CPU-bound process on my computer to be allocated more than 75% of the CPU, and far more common for it to sit at around 50%, with the rest shared between other apps and the kernel, so a second core effectively gives that process a whole core to itself.
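This diminishing return is what Amdahl's law predicts: speedup is capped by the serial fraction of the program. A quick sketch in Go makes the numbers concrete; the 90% parallel fraction is an illustrative assumption, not a measurement.

    package main

    import "fmt"

    // speedup applies Amdahl's law: the speedup on n cores for a
    // program whose parallelizable fraction is p.
    func speedup(p float64, n int) float64 {
        return 1 / ((1 - p) + p/float64(n))
    }

    func main() {
        const p = 0.9 // assumed parallel fraction, for illustration
        for _, n := range []int{1, 2, 4, 32, 64} {
            fmt.Printf("%2d cores: %.2fx speedup\n", n, speedup(p, n))
        }
    }

With these assumptions, doubling from one core to two buys you an 82% speedup, but going from 32 cores to 64 buys you barely 12% more.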
Going from two to four cores will be a smaller improvement, but still a significant one. When you get up to 32 or 64 cores, things get more interesting. It's almost impossible to write shared-memory threaded code that scales to this degree and isn't too buggy to use. It's easier with an asynchronous message-passing approach, but the popular desktop-development APIs aren't designed around this model; a sketch of the style appears below.

And, realistically, very little desktop software will need this kind of power. Some will: video editing, for example, can consume about as much CPU power as you can throw at it for the foreseeable future. But the shrinkage of the high end will continue. These days, many people wouldn't notice much difference between a 1 GHz Athlon and a 3 GHz Core 2 Duo most of the time. The number of people who need the fastest computer available is already quite small, and the number who even need a medium-speed machine is going to shrink.
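To make the contrast concrete, here is a minimal message-passing sketch in Go, where workers share nothing and communicate only over channels. The square-summing workload is a stand-in for real work, not anything from a particular desktop API.

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        jobs := make(chan int)
        results := make(chan int)

        // One worker per core. Each worker communicates only via
        // channels, so there is no shared state to lock.
        for w := 0; w < runtime.NumCPU(); w++ {
            go func() {
                for n := range jobs {
                    results <- n * n // stand-in for real work
                }
            }()
        }

        // Feed in 100 jobs, then close the channel so workers exit.
        go func() {
            for i := 1; i <= 100; i++ {
                jobs <- i
            }
            close(jobs)
        }()

        sum := 0
        for i := 0; i < 100; i++ {
            sum += <-results
        }
        fmt.Println("sum of squares:", sum)
    }

Because no memory is shared, there are no locks to get wrong, and the same code runs unchanged on 2 cores or 64.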
As mobile computing and datacenter density continue to grow, power consumption is going to become more important. Imagine a 32-core CPU that lets you turn off cores when they're not in use. While mobile, you may well find that two or three cores are all you need.
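Today's closest software-side analogue is simply to confine a program to fewer cores and let the OS park the rest. A minimal sketch in Go, assuming a hypothetical onBattery flag; the actual battery check is platform-specific and omitted here.

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        fmt.Println("cores available:", runtime.NumCPU())

        // On battery, confine this program to two cores; the OS and
        // hardware can then power down the idle ones.
        onBattery := true // hypothetical flag, for illustration
        if onBattery {
            runtime.GOMAXPROCS(2)
        }
        fmt.Println("cores in use:", runtime.GOMAXPROCS(0))
    }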