Predicting the Future
A little-known fact about the semiconductor industry is that a significant fraction of its silicon goes into the crystal balls it gets through. Producing a generation of CPUs is a huge engineering endeavor, but whether it succeeds really depends on how well you predicted the future at the start of the process.
Depending on the complexity of the design, it typically takes 3–5 years to produce a new microarchitecture, so you need to make some guesses very early on. The first guess is how many transistors you can use. Moore’s Law gives a rough indication of the number of transistors on a chip for a fixed dollar value. Over the long term, it’s a fairly good guide, but the short term can bring quite a lot of fluctuation. You start by guessing how much you can sell the chip for when it’s produced. From this price, you can derive a rough transistor budget. If you guess too many transistors, your part will cost more than you planned. If you guess too few, you can always bolt on some extra cache or make the chips cheaper.
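The price-to-transistor estimate described above can be sketched in a few lines of Python. This is only an illustration: the function name, the starting figure of one million transistors per dollar, the $200 target price, and the two-year doubling period are all assumptions chosen to make the arithmetic easy to follow, not real industry data.

```python
def transistor_budget(target_price, transistors_per_dollar_today,
                      years_until_launch, doubling_period_years=2.0):
    """Rough transistor budget for a chip sold at target_price at launch.

    Assumes transistors per dollar double every doubling_period_years,
    which is a simplification of Moore's Law (hypothetical parameter).
    """
    growth = 2 ** (years_until_launch / doubling_period_years)
    return target_price * transistors_per_dollar_today * growth


# Hypothetical inputs: a $200 part, 1 million transistors per dollar today.
for years in (3, 4, 5):
    budget = transistor_budget(200, 1_000_000, years)
    print(f"launch in {years} years: about {budget / 1e6:.0f} million transistors")
```

Under these assumptions, moving the launch date from three years out to five years out doubles the budget, which gives a feel for how sensitive the estimate is to the schedule alone, before any error in the price guess is considered.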
This guesswork is further complicated by the fact that most designs last a few years, so you need to make sure that your design can take advantage of more transistors later. The first NetBurst chip, the Pentium 4, used 42 million transistors. The last ones in the series, branded as Pentium D, had more than 200 million. Over the lifespan of the design, the number of transistors increased by a factor of almost six.
Sounds difficult? Actually, that’s the (relatively) easy bit. The really hard part is guessing what people are going to want to buy in 3–5 years. A few generations ago, the simple answer to this was "fast." Of course, that’s not a complete answer, because "fast" depends a lot on your workload. The PowerPC 970 is incredibly fast if your workload consists entirely of independent multiply/add operations, but is slower on other workloads. In the late 1990s, Intel guessed wrong about what "fast" meant with the Pentium MMX. MMX instructions made integer operations much faster just as most compute-bound programs began to rely heavily on floating-point operations. AMD did very well in the next generation with its Athlon, which had very good floating-point performance.
These days, speed isn’t enough; efficiency is also important. The only market in which speed matters more than efficiency is the desktop. Laptops and handheld computers need efficiency so that they can run on battery power for longer, and servers or cluster nodes need efficiency to keep the power and air-conditioning requirements of the datacenter manageable.